Biographical Data of Indian Politicians
Biographical data on candidates in national, state, and some local elections from archive.india.gov.in and myneta.info, along with the scripts used to retrieve the data. Data on the 15th Lok Sabha and on members of the Rajya Sabha as of June 2014 were used to produce this short note: (No) Missing daughters of Indian Politicians. Data from myNeta on all political candidates in national, state, and some local elections were used to analyze spousal income and movable and immovable assets by politician gender. (Analysis.)
Data on Indian MPs from the 'National Portal of India'
Get the Data
To get the data, download the scripts in the get_data/archive_india_gov folder to your computer. The scripts require Python 3.x and BeautifulSoup 4 to run. The package dependency is listed in get_data/archive_india_gov/requirements.txt. Once you have installed the dependencies, you can run the scripts.
To download web pages containing the information, run scrape_indian_gov.py:
The HTML files will be saved in
To parse and extract information from the HTML files, run extract_indian_gov.py:
python extract_indian_gov.py <dir>
The script outputs a CSV file, saving it as
The data were scraped in June, 2014 and November, 2015.
- 15th Lok Sabha (Scraped June, 2014)
- 16th Lok Sabha (Scraped November, 2015)
- Rajya Sabha 2014 (Scraped June, 2014)
- Rajya Sabha 2015 (Scraped November, 2015)
Data on All Candidates from myNeta
Select biographical and electoral data on candidates in national, state, and some local elections from myneta.info. The data were scraped in November, 2015.
Get the Data
There are three scripts. Why three? Information about gender is not provided on candidate pages and is integrated later. The three scripts are:
- india_mps.py to download basic profile data.
- india_mps_women.py to get information on gender.
- india_mps_gender.py to merge gender information into all three CSVs.
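The merge step can be illustrated with a minimal sketch. It assumes, hypothetically, that each candidate row carries a profile URL in a column named `url` and that the women-candidates file is a plain list of such URLs. Neither the column name nor the labels `F` and `M` come from the scripts themselves, and treating every candidate who is not on the women's list as male is a simplifying assumption.

```python
import csv
import io


def merge_gender(candidates, women_urls, url_field="url"):
    """Return copies of the candidate rows with a 'gender' column added.

    A row is labelled 'F' when its profile URL appears in women_urls and
    'M' otherwise (an assumption made for this sketch).
    """
    women = {u.strip() for u in women_urls if u.strip()}
    merged = []
    for row in candidates:
        out = dict(row)
        out["gender"] = "F" if row[url_field].strip() in women else "M"
        merged.append(out)
    return merged


# Tiny in-memory stand-ins for the two scraped files (made-up rows).
candidates_csv = io.StringIO(
    "name,url\n"
    "Candidate A,http://example.org/candidate?id=1\n"
    "Candidate B,http://example.org/candidate?id=2\n"
)
women_urls = ["http://example.org/candidate?id=2\n"]

rows = merge_gender(csv.DictReader(candidates_csv), women_urls)
for row in rows:
    print(row["name"], row["gender"])
```

Matching on the profile URL rather than on the name avoids mislabelling candidates who share a name.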
To begin using the scripts, install the requirements. Then download the scripts into a folder, and run scripts from the command line.
usage: india_mps.py [-h] [-o OUTPUT] [-n MAX_CONN] [-s FROM_STATE]
                    [-y FROM_YEAR] [-c FROM_CONSTITUENCY] [-t TYPE]
                    [--no-header]

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output CSV file name
  -n MAX_CONN, --max-conn MAX_CONN
                        Max concurrent connections
  -s FROM_STATE, --from-state FROM_STATE
                        Start from a specific state
  -y FROM_YEAR, --from-year FROM_YEAR
                        Start from a specific election year
  -c FROM_CONSTITUENCY, --from-constituency FROM_CONSTITUENCY
                        Start from a specific constituency
  -t TYPE, --type TYPE  Type (all|state|nation|local)
  --no-header           Output without header at the first row
python india_mps.py -o india-mps-all.csv
Get all women candidates
URL of all women candidates saved as:
To merge all candidates with gender, run:
- Each row is one politician per constituency per election year.
- Politician Name, Constituency, State, Party, Election Year, Whether They Won or Not, Type: State/National/Local
- Education, Age, Address, Self Profession, Spouse Profession
- Income Tax Return: Self Total Income, Spouse Total Income
- Self Movable Assets, Spouse Movable Assets:
  - cash --- for self and spouse
  - jewellery --- for self and spouse
  - totals --- for self and spouse
- Immovable Assets --- Self Totals, Spouse Totals
- Liabilities --- Self Totals, Spouse Totals
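As a rough illustration of how a file with this layout might be summarized, the sketch below averages one numeric column by gender. The column names `gender` and `spouse_total_income` are hypothetical placeholders, not the actual headers in the CSV, and the rows are made up. Blank or non-numeric cells are skipped rather than counted as zero.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_gender(rows, value_field, gender_field="gender"):
    """Average value_field within each gender, ignoring unusable cells."""
    groups = defaultdict(list)
    for row in rows:
        # Drop thousands separators before converting to a number.
        raw = (row.get(value_field) or "").replace(",", "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # blank or non-numeric entry: treat as missing
        groups[row[gender_field]].append(value)
    return {gender: mean(values) for gender, values in groups.items()}


# Made-up rows with hypothetical column names.
sample = io.StringIO(
    "name,gender,spouse_total_income\n"
    "Candidate A,M,100\n"
    "Candidate B,F,300\n"
    "Candidate C,F,\n"
    "Candidate D,F,500\n"
)
result = mean_by_gender(csv.DictReader(sample), "spouse_total_income")
print(result)
```

Skipping missing cells matters here because some columns are not available for every election year.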
There are missing data for election years before 2011:
- No Income Tax Return information, so no Self/Spouse Total Income
- No column for Spouse in the Liabilities
Note also that in a few elections, multiple candidates with the same name contest the same constituency.
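One way to find same-name candidates before analysis is to count how often each combination of name, constituency, state, and election year occurs. The sketch below assumes hypothetical column names (`name`, `constituency`, `state`, `year`) and uses made-up rows. It is not part of the repository's scripts.

```python
from collections import Counter


def find_repeated_names(rows, key_fields=("name", "constituency", "state", "year")):
    """Return key combinations that occur more than once, with their counts.

    Values are stripped and lower-cased so that trivial differences in
    spacing or capitalization do not hide a repeat.
    """
    counts = Counter(
        tuple(row[field].strip().lower() for field in key_fields) for row in rows
    )
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows with hypothetical column names.
rows = [
    {"name": "A Kumar", "constituency": "X", "state": "S", "year": "2014"},
    {"name": "a kumar ", "constituency": "X", "state": "S", "year": "2014"},
    {"name": "B Devi", "constituency": "X", "state": "S", "year": "2014"},
]
dupes = find_repeated_names(rows)
print(dupes)
```

Rows flagged this way can then be told apart using other columns, such as party or age.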
Scripts, figures, and writing are released under CC BY 2.0.