Data on religion and politics in India

This repository provides highly localized statistics on religion and politics in India under an open license. I aim to cover Uttar Pradesh as comprehensively as possible, and the rest of India during general elections (see roadmap) and/or if other people contribute. A (potentially incomplete) list of academic use cases for this data is on Google Scholar; there is also a separate folder with examples to replicate.

Fortunately, recent transparency initiatives by the Election Commission of India in general and the Chief Electoral Officer of UP in particular now allow researchers to shift the central unit of quantitative political analyses from the constituency level to that of polling booths, stations, and villages (earlier, such data had to be interpolated or estimated). This official data is often not very user-friendly, though (think garbled, scanned PDFs). The purpose of this repository is to curate it in a more accessible format and to share the scraping and cleanup code for reference. The official data is then supplemented with estimates of religious demography based on the religious connotations of electors' names in the voter lists (see below).

From 2013 to 2015, the whole dataset was located on my personal website, and the blog there continues to provide bits and pieces of advice on how to use it, as do my various publications. This setup created unnecessary hurdles for collaboration, though, and posed its own challenges in terms of long-term availability. After pondering various options, I decided to move to GitHub entirely. Technically, the final dataset comes as a SQLite database with a number of relational tables:

| table | description |
| --- | --- |
| examples | Example queries that would replicate published papers based on this data |
| andhraid | ID matching and integration table for Andhra Pradesh (see below) |
| andhragis | GIS coordinates and other spatial characteristics of polling booths in Andhra Pradesh |
| andhrarolls2014 | Booth-level estimates of religious demography for 2014 across Andhra Pradesh |
| delhiid | ID matching and integration table for Delhi (see below) |
| delhigis | GIS coordinates and other spatial characteristics of polling booths in Delhi |
| delhirolls2014 | Booth-level estimates of religious demography for 2014 across Delhi |
| gujid | ID matching and integration table for Gujarat (see below) |
| gujgis | GIS coordinates and other spatial characteristics of polling booths in Gujarat |
| gujloksabha2014 | Booth-level (form 20) results for the 2014 Lok Sabha election from Gujarat |
| gujcandidates2014 | Candidates and their likely religion for the 2014 Lok Sabha election from Gujarat |
| gujrolls2014 | Booth-level estimates of religious demography for 2014 across Gujarat |
| harid | ID matching and integration table for Haryana (see below) |
| hargis | GIS coordinates and other spatial characteristics of polling booths in Haryana |
| harrolls2014 | Booth-level estimates of religious demography for 2014 across Haryana |
| karid | ID matching and integration table for Karnataka (see below) |
| kargis | GIS coordinates and other spatial characteristics of polling booths in Karnataka |
| karrolls2014 | Booth-level estimates of religious demography for 2014 across Karnataka |
| kerid | ID matching and integration table for Kerala (see below) |
| kergis | GIS coordinates and other spatial characteristics of polling booths in Kerala |
| kerrolls2014 | Booth-level estimates of religious demography for 2014 across Kerala |
| mpid | ID matching and integration table for Madhya Pradesh (see below) |
| mpgis | GIS coordinates and other spatial characteristics of polling booths in Madhya Pradesh |
| mprolls2014 | Booth-level estimates of religious demography for 2014 across Madhya Pradesh |
| mahaid | ID matching and integration table for Maharashtra (see below) |
| mahagis | GIS coordinates and other spatial characteristics of polling booths in Maharashtra |
| maharolls2014 | Booth-level estimates of religious demography for 2014 across Maharashtra |
| orid | ID matching and integration table for Orissa (see below) |
| orgis | GIS coordinates and other spatial characteristics of polling booths in Orissa |
| orrolls2014 | Booth-level estimates of religious demography for 2014 across Orissa |
| rajid | ID matching and integration table for Rajasthan (see below) |
| rajgis | GIS coordinates and other spatial characteristics of polling booths in Rajasthan |
| rajrolls2014 | Booth-level estimates of religious demography for 2014 across Rajasthan |
| upid | ID matching and integration table for Uttar Pradesh (see below) |
| upgis | GIS coordinates and other spatial characteristics of polling booths in Uttar Pradesh |
| upvidhansabha2007 | Booth-level (form 20) results for the 2007 Vidhan Sabha election in Uttar Pradesh |
| uploksabha2009 | Booth-level (form 20) results for the 2009 Lok Sabha election from Uttar Pradesh |
| upvidhansabha2012 | Booth-level (form 20) results for the 2012 Vidhan Sabha election in Uttar Pradesh |
| uploksabha2014 | Booth-level (form 20) results for the 2014 Lok Sabha election from Uttar Pradesh |
| upcandidates2007 | Candidates and their likely religion for the 2007 Vidhan Sabha election in Uttar Pradesh |
| upcandidates2009 | Candidates and their likely religion for the 2009 Lok Sabha election from Uttar Pradesh |
| upcandidates2012 | Candidates and their likely religion for the 2012 Vidhan Sabha election in Uttar Pradesh |
| upcandidates2014 | Candidates and their likely religion for the 2014 Lok Sabha election from Uttar Pradesh |
| upcandidates2017 | Candidates and their likely religion for the 2017 Vidhan Sabha election in Uttar Pradesh |
| uprolls2011 | Booth-level estimates of religious demography for 2011 across Uttar Pradesh |
| uprolls2012 | Booth-level estimates of religious demography for 2012 across Uttar Pradesh |
| uprolls2013 | Booth-level estimates of religious demography for 2013 across Uttar Pradesh |
| uprolls2014 | Booth-level estimates of religious demography for 2014 across Uttar Pradesh |
| uprolls2015 | Booth-level estimates of religious demography for 2015 across Uttar Pradesh |
| uprolls2016 | Booth-level estimates of religious demography for 2016 across Uttar Pradesh |

If you wish to recreate the whole database, the easiest way is to clone this repository in its entirety and then run the equivalent of two commands on your system: first cat combined-a.sql | sqlite3 combined.sqlite, then cat combined-b.sql | sqlite3 combined.sqlite. This will automatically create a new combined.sqlite file by running all table.sql files in the correct order. You can then extract your data from one or multiple tables for further processing using standard SQL commands.
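Once combined.sqlite exists, a quick way to see what it contains is to ask SQLite itself. The following is a minimal sketch in Python, assuming only the standard sqlite3 module; the helper names list_tables and column_names are made up for illustration and are not part of this repository.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def column_names(conn, table):
    """Return the column names of one table, as reported by SQLite."""
    # Table names cannot be bound as SQL parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    # PRAGMA table_info yields one row per column; index 1 holds the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
```

Called as list_tables(sqlite3.connect("combined.sqlite")), this should return the tables listed above, provided the build succeeded. The authoritative description of each variable remains the respective table README.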

If you wish to add or correct stuff in the dataset, you can either send me an informal email (see below) or, if sufficiently technically minded, create a pull request against this repository. If making corrections or merely adding more variables to an existing table, please update the respective README.md with an explanation, update table.sql with the necessary SQL code, and create a new table.csv dump (code for which should already be included in the table.sql). If adding entirely new tables, please follow this folder structure that applies to all tables:

  • table - a directory containing the scraping and cleanup code used to generate this table from raw data. Note that the raw data itself often cannot be redistributed for legal reasons and may not be available at its erstwhile URL anymore - a chief reason to curate this repository. If you want access to original raw data in order to check the scripts, drop me an email and we can arrange something.
  • table/README.md - a description of each variable in this table alongside notes on raw data sources, notes on accuracy, and, if relevant, additional license information.
  • table/LICENSE.md - a copy of the data license (which may be different from the database license at large, see below)
  • table/table.sql - a set of SQLite commands that you can use to add the table to your master database via combined-a.sql and combined-b.sql (see above; this might be split into several files if they get too large).
  • table/table.csv - a CSV dump of said table. I personally prefer to work straight from SQLite, but you may not (this might again be split into several files).
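Before opening a pull request, it may help to check that a table folder follows this structure. The sketch below is an assumption-laden convenience, not something shipped with this repository: it assumes the placeholder "table" stands for the table's own name (so the upid folder would hold upid.sql and upid.csv), and the function name missing_files is hypothetical.

```python
from pathlib import Path


def missing_files(repo_root, table):
    """List the expected files that are absent from one table folder."""
    folder = Path(repo_root) / table
    # Assumed naming: <table>/<table>.sql and <table>/<table>.csv.
    expected = ["README.md", "LICENSE.md", f"{table}.sql", f"{table}.csv"]
    return [name for name in expected if not (folder / name).is_file()]
```

An empty list means all four files are present. Tables whose SQL or CSV dump is split into several files would be reported as incomplete by this simple check, so treat the result as a hint only.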

One particularly important set of tables is the various "id" tables, which map the ID codes across the dataset against each other (there is one id table per state, regenerated after each addition to the dataset). Unfortunately, but necessarily, the Election Commission changes polling booth IDs and names once in a while, and the delimitation exercise of 2008 had an even starker impact on precincts. Consequently, you cannot simply assume that, for instance, booth 143 in constituency 47 of Uttar Pradesh in the uploksabha2014 table is the same entity as booth 143 in constituency 47 of Uttar Pradesh in the upvidhansabha2012 table. Likewise, spatial matching - for instance used to tell which district a given polling station falls into - has its own set of inaccuracies. So if you need to combine tables with a different set of ID codes, you need to look up what matches what in the state's id table (ID codes with the same name are directly compatible across tables within the same state).
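To illustrate the lookup, here is a minimal Python sketch of a join that goes through an id table instead of matching booth numbers directly. It relies only on the rule stated above, that ID codes with the same name are compatible. All table and column names in the demo (toy_id, toy_2012, toy_2014, booth_id_12, booth_id_14, votes_12, votes_14) and all values are invented for illustration; the real names are documented in each table's README.md.

```python
import sqlite3


def quote(identifier):
    """Quote a table or column name for safe use in SQL text."""
    return '"' + identifier.replace('"', '""') + '"'


def join_via_id_table(conn, id_table, left, left_keys, right, right_keys):
    """Join two tables through the id table that maps their ID codes."""
    # Each data table is matched to the id table on its own, same-named keys.
    on_left = " AND ".join(f"l.{quote(k)} = m.{quote(k)}" for k in left_keys)
    on_right = " AND ".join(f"r.{quote(k)} = m.{quote(k)}" for k in right_keys)
    sql = (
        f"SELECT l.*, r.* FROM {quote(id_table)} AS m "
        f"JOIN {quote(left)} AS l ON {on_left} "
        f"JOIN {quote(right)} AS r ON {on_right}"
    )
    return conn.execute(sql).fetchall()


def demo():
    """Toy data only: booth 143 of 2012 is renumbered to 150 in 2014."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE toy_id (booth_id_12 INTEGER, booth_id_14 INTEGER);
        CREATE TABLE toy_2012 (booth_id_12 INTEGER, votes_12 INTEGER);
        CREATE TABLE toy_2014 (booth_id_14 INTEGER, votes_14 INTEGER);
        INSERT INTO toy_id VALUES (143, 150), (144, 143);
        INSERT INTO toy_2012 VALUES (143, 500), (144, 300);
        INSERT INTO toy_2014 VALUES (143, 320), (150, 510);
    """)
    rows = join_via_id_table(
        conn, "toy_id", "toy_2012", ["booth_id_12"], "toy_2014", ["booth_id_14"]
    )
    return sorted(rows)
```

In the toy data, a naive join on the booth number would pair the 2012 booth 143 with the 2014 booth 143, which is a different entity; going through the mapping pairs it with booth 150 instead. The key arguments are lists so that compound keys, such as a constituency code plus a booth code, can be passed as well.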

The estimates of religious demography use an algorithm which is also on GitHub and described more fully in the following article of mine (upscaling was generously sponsored by the Oxford Advanced Research Computing unit):

Susewind, R. (2015). What's in a name? Probabilistic inference of religious community from South Asian names. Field Methods 27(4), 319-332.

Another useful source that complements this data is the set of GIS shapefiles for assembly segments and parliamentary constituencies included in the following dataset; the ID codes used therein are compatible with the *loksabha2014 tables (note that the polling booth localities as such are also directly embedded in the *gis tables, so you only need the shapefiles to map higher levels of aggregation):

Susewind, R. (2014). GIS shapefiles for India's parliamentary and assembly constituencies including polling booth localities. Published under a CC-BY-NC-SA 4.0 license. Available from http://dx.doi.org/10.4119/unibi/2674065.

The dataset in its entirety is licensed under an ODC Open Database license. This allows you to download, copy, use and redistribute it, as long as you attribute correctly, abstain from technical methods of copy protection, and most importantly make any additions and modifications publicly available on equal terms (preferably on this very repository). A number of tables in this dataset come with their own legal baggage, which is mentioned and explained further in their respective README.md and LICENSE.md files. Code used for crawling and compilation is subject to a CC-BY-NC-SA 4.0 license. In an academic context, I suggest you attribute using this reference:

Susewind, R. (2016). Data on religion and politics in India. Published under an ODbL 1.0 license. Available from https://github.com/raphael-susewind/india-religion-politics.

So I invite all to download and use this dataset for more localized quantitative analyses of political, religious and demographic dynamics in India in the spirit of Open Data sharing. Please let me know if you find the dataset useful and alert me to errors and mistakes. I provide this dataset without any guarantee - see troubleshooting notes for known general problems with this data, alongside the various table READMEs.

Raphael Susewind, mail@raphael-susewind.de, GPG key 10AEE42F