# Assignment Topic: Making Election Predictions  

##### If, in early 2014, you had to predict the upcoming Lok Sabha elections, what would your predictive model be? (Assume you want to predict the election outcome six months before the elections are held.)

To solve this problem, here are some steps to consider:

* Identify data sources. These will depend on which factors you think determine who wins an election.
* Identify the right technique or techniques to model that data.
* Make predictions, then validate and update your model.

## Submission details:

### Part 1: 
#### Identify the data sources and the variables of interest. Process the data and create a ‘clean’ data file(s).

##### Marks distribution: 20.

#### What needs to be submitted:

* Text file explaining which variables you chose and why, the source code, explanations, etc.
* The cleaned data file(s)
* Relevant R code, if any

Marks will be awarded based on the factors you consider important for model building, the data sources you identify, and how clean the final data is. The clean data must be submitted as CSV file(s).

### Part 2: 
#### Develop and present a predictive model.

##### Marks distribution: 20

#### What needs to be submitted:

* Text file explaining the model(s), why you did what you did, model performance, etc.
* Relevant R code, if any
 


In [1]:
import numpy as np
import pandas as pd

## Part 1
* Identify the data sources
* Variables of interest. 
* Process the data and create a ‘clean’ data file(s).


### Data Sources

#### Use the data from the website http://eci.nic.in/eci_main1/ElectionStatistics.aspx
We use the 2009 election results to predict the 2014 election results, assuming the prediction is made six months before the 2014 election.

##### The data we used:
* GENERAL ELECTION TO LOKSABHA, 2009, http://eci.nic.in/eci_main/StatisticalReports/candidatewise/GE2009.xls
  * Electors
  * Candidate-wise results

In [2]:
# Import the 2009 data directly from the ECI website
GE2009_URL = "http://eci.nic.in/eci_main/StatisticalReports/candidatewise/GE2009.xls"
# Sheet "electors": one row per constituency; the header starts after 3 rows
SrcDataElectors = pd.read_excel(GE2009_URL, sheet_name="electors", skiprows=3)
# Sheet "Cand_Wise": one row per candidate; the header starts after 5 rows
SrcDataCandidates = pd.read_excel(GE2009_URL, sheet_name="Cand_Wise", skiprows=5)

##### The data covers all states. We keep only the Uttar Pradesh rows for the analysis.

In [3]:
SrcDataElectors.loc[:,"STATE"].value_counts()

Uttar Pradesh                80
Maharashtra                  48
Andhra Pradesh               42
West Bengal                  42
Bihar                        40
Tamil Nadu                   39
Madhya Pradesh               29
Karnataka                    28
Gujarat                      26
Rajasthan                    25
Orissa                       21
Kerala                       20
Assam                        14
Jharkhand                    14
Punjab                       13
Chattisgarh                  11
Haryana                      10
NCT OF Delhi                  7
Jammu & Kashmir               6
Uttarakhand                   5
Himachal Pradesh              4
Arunachal Pradesh             2
Goa                           2
Tripura                       2
Manipur                       2
Meghalaya                     2
Sikkim                        1
Mizoram                       1
Chandigarh                    1
Puducherry                    1
Lakshadweep                   1
Dadra & 

In [16]:
SrcDataCandidates.loc[:,"State name"].value_counts()

Uttar Pradesh                1368
Tamil Nadu                    823
Maharashtra                   819
Bihar                         672
Andhra Pradesh                569
Madhya Pradesh                429
Karnataka                     427
West Bengal                   368
Gujarat                       359
Rajasthan                     346
Jharkhand                     249
Punjab                        218
Kerala                        217
Haryana                       210
Chattisgarh                   178
NCT OF Delhi                  160
Assam                         158
Orissa                        157
Jammu & Kashmir                81
Uttarakhand                    76
Himachal Pradesh               31
Puducherry                     28
Tripura                        19
Goa                            18
Manipur                        16
Chandigarh                     14
Andaman & Nicobar Islands      11
Meghalaya                      11
Arunachal Pradesh               8
Sikkim        

In [4]:
# Keep only the Uttar Pradesh rows from both sheets
# (the state column is named differently in each sheet)
UPElectors = SrcDataElectors.loc[SrcDataElectors["STATE"] == 'Uttar Pradesh', :]
UPCandidates = SrcDataCandidates.loc[SrcDataCandidates["State name"] == 'Uttar Pradesh', :]

In [5]:
UPElectors["STATE"].value_counts()

Uttar Pradesh    80
Name: STATE, dtype: int64

In [25]:
UPCandidates["State name"].value_counts()

Uttar Pradesh    1368
Name: State name, dtype: int64
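
The two counts above (80 constituencies, 1368 candidate rows) can be cross-checked against each other. The following is a minimal sketch, assuming `TOT_CONTESTANT` holds the number of candidates per constituency; the helper name `contestant_mismatches` and the toy values are hypothetical and only stand in for `UPElectors` and `UPCandidates`.

```python
import pandas as pd

# Toy stand-ins for UPElectors and UPCandidates (values are made up).
electors = pd.DataFrame({
    "PC NO": [1, 2],
    "TOT_CONTESTANT": [3, 2],
})
candidates = pd.DataFrame({
    "PC Number": [1, 1, 1, 2, 2],
    "Candidate Name": ["A", "B", "C", "D", "E"],
})

def contestant_mismatches(electors, candidates):
    """Return constituencies whose candidate-row count differs from TOT_CONTESTANT."""
    # Count candidate rows per constituency.
    counted = candidates.groupby("PC Number").size().rename("rows_found")
    # Align the counts with the electors sheet on the constituency number.
    check = electors.set_index("PC NO").join(counted)
    check["rows_found"] = check["rows_found"].fillna(0).astype(int)
    return check[check["rows_found"] != check["TOT_CONTESTANT"]]

mismatches = contestant_mismatches(electors, candidates)
print(len(mismatches))
```

An empty result suggests the two sheets agree; any rows returned point at constituencies worth inspecting before the join.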

In [6]:
UPElectors.columns

Index(['STATE CODE', 'STATE', 'PC NO', 'PARLIAMENTARY CONSTITUENCY',
       'Total voters', 'Total_Electors', 'TOT_CONTESTANT', 'POLL PERCENTAGE'],
      dtype='object')

In [28]:
UPCandidates.columns

Index(['ST_CODE', 'State name', 'Month', 'Year', 'PC Number', 'PC name',
       'PC Type', 'Candidate Name', 'Candidate Sex', 'Candidate Category',
       'Candidate Age', 'Party Abbreviation', 'Total Votes Polled',
       'Position'],
      dtype='object')

#### Joining the candidates and electors data into a single DataFrame

In [7]:
temp = pd.merge(UPCandidates, UPElectors, how='outer', left_on='PC Number', right_on='PC NO')

In [8]:
temp.columns

Index(['ST_CODE', 'State name', 'Month', 'Year', 'PC Number', 'PC name',
       'PC Type', 'Candidate Name', 'Candidate Sex', 'Candidate Category',
       'Candidate Age', 'Party Abbreviation', 'Total Votes Polled', 'Position',
       'STATE CODE', 'STATE', 'PC NO', 'PARLIAMENTARY CONSTITUENCY',
       'Total voters', 'Total_Electors', 'TOT_CONTESTANT', 'POLL PERCENTAGE'],
      dtype='object')
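
An outer join keeps unmatched rows silently, so it can help to verify the join. The sketch below uses two standard `pd.merge` options on made-up toy frames: `validate="many_to_one"` checks that each constituency number appears only once on the electors side, and `indicator=True` adds a `_merge` column that flags rows present on only one side. The toy frames are assumptions, not the real ECI data.

```python
import pandas as pd

# Toy stand-ins for UPCandidates and UPElectors (values are made up).
candidates = pd.DataFrame({
    "PC Number": [1, 1, 2],
    "Candidate Name": ["A", "B", "C"],
})
electors = pd.DataFrame({
    "PC NO": [1, 2],
    "Total_Electors": [1000, 2000],
})

merged = pd.merge(
    candidates, electors,
    how="outer", left_on="PC Number", right_on="PC NO",
    validate="many_to_one",  # raises MergeError if a PC NO repeats in electors
    indicator=True,          # adds "_merge": both / left_only / right_only
)

# Rows that did not find a partner on the other side.
unmatched = merged[merged["_merge"] != "both"]
print(len(merged), len(unmatched))
```

If `unmatched` is empty, the outer join behaves like an inner join for this data and no constituency was lost or left without electors information.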

In [9]:
UP2009BASEDATA = temp.loc[:,[ 'STATE CODE', 'State name', 'PC Number', 'PC name',
       'PC Type', 'Candidate Name', 'Candidate Sex', 'Candidate Category',
       'Candidate Age', 'Party Abbreviation', 'Total Votes Polled', 'Position',
       'Total voters', 'Total_Electors', 'TOT_CONTESTANT', 'POLL PERCENTAGE']]

In [10]:
UP2009BASEDATA

|  | STATE CODE | State name | PC Number | PC name | PC Type | Candidate Name | Candidate Sex | Candidate Category | Candidate Age | Party Abbreviation | Total Votes Polled | Position | Total voters | Total_Electors | TOT_CONTESTANT | POLL PERCENTAGE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | JAGDISH SINGH RANA | M | GEN | 53 | BSP | 354807 | 1 | 821122 | 1298132 | 15 | 63.254122 |
| 1 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | RASHEED MASOOD | M | GEN | 59 | SP | 269934 | 2 | 821122 | 1298132 | 15 | 63.254122 |
| 2 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | JASWANT SINGH SAINI | M | GEN | 38 | BJP | 99894 | 3 | 821122 | 1298132 | 15 | 63.254122 |
| 3 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | GAJAY SINGH | M | GEN | 50 | INC | 62593 | 4 | 821122 | 1298132 | 15 | 63.254122 |
| 4 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | HAJI MOHAMMED TAUSEEF | M | GEN | 45 | PECP | 9214 | 5 | 821122 | 1298132 | 15 | 63.254122 |
| 5 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | M.RASHID KHAN | M | GEN | 45 | IND | 8096 | 6 | 821122 | 1298132 | 15 | 63.254122 |
| 6 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | CHATTAR SINGH KASHYAP | M | GEN | 54 | VAJP | 4647 | 7 | 821122 | 1298132 | 15 | 63.254122 |
| 7 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | MASHKOOR | M | GEN | 49 | IND | 4433 | 8 | 821122 | 1298132 | 15 | 63.254122 |
| 8 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | NATHLU RAM | M | SC | 57 | IND | 2069 | 9 | 821122 | 1298132 | 15 | 63.254122 |
| 9 | S24 | Uttar Pradesh | 1 | Saharanpur | GEN | TEJVEER | M | GEN | 48 | IND | 1773 | 10 | 821122 | 1298132 | 15 | 63.254122 |


In [39]:
UP2009BASEDATAWINNERS = UP2009BASEDATA.loc[UP2009BASEDATA.Position == 1,:]

In [40]:
UP2009BASEDATAWINNERS = UP2009BASEDATAWINNERS.loc[:,['STATE CODE','PC Number','Candidate Name','Party Abbreviation','Total_Electors','POLL PERCENTAGE']]

In [42]:
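# Suffix the columns with "09" so they stay distinct after joining with 2014 data.
# Note: 'Winning Poll Percentage 09' carries POLL PERCENTAGE, which is a
# constituency-level figure (identical for every candidate in a constituency),
# not the winner's own share of the votes.
UP2009BASEDATAWINNERS.columns = ['STATE_CODE 09','PC Number 09','Winning Candidate 09','Winning Party 09','Total_Electors 09','Winning Poll Percentage 09']
UP2009BASEDATAWINNERS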

|  | STATE_CODE 09 | PC Number 09 | Winning Candidate 09 | Winning Party 09 | Total_Electors 09 | Winning Poll Percentage 09 |
|---|---|---|---|---|---|---|
| 0 | S24 | 1 | JAGDISH SINGH RANA | BSP | 1298132 | 63.254122 |
| 15 | S24 | 2 | TABASSUM BEGUM | BSP | 1282551 | 56.587067 |
| 30 | S24 | 3 | KADIR RANA | BSP | 1370117 | 54.435424 |
| 53 | S24 | 4 | SANJAY SINGH CHAUHAN | RLD | 1287070 | 55.017132 |
| 76 | S24 | 5 | YASHVIR SINGH | SP | 1196566 | 53.778062 |
| 90 | S24 | 6 | MOHAMMED AZHARUDDIN | INC | 1388525 | 54.807007 |
| 108 | S24 | 7 | JAYA PRADA NAHATA | SP | 1154544 | 52.503759 |
| 124 | S24 | 8 | DR. SHAFIQUR RAHMAN BARQ | BSP | 1290810 | 52.831710 |
| 135 | S24 | 9 | DEVENDRA NAGPAL | RLD | 1173915 | 60.224292 |
| 155 | S24 | 10 | RAJENDRA AGARWAL | BJP | 1508788 | 48.231494 |
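
In the Saharanpur rows shown earlier, `POLL PERCENTAGE` is 63.254122 for every candidate and equals `Total voters / Total_Electors` (821122 / 1298132), so it behaves like a turnout figure. If the winner's own share of the votes is wanted as a variable, one possible sketch is below. The column name `Vote Share Pct` is hypothetical, and dividing by `Total voters` is an assumption about the appropriate denominator.

```python
import pandas as pd

# Three rows copied from the Saharanpur output shown earlier.
rows = pd.DataFrame({
    "PC Number": [1, 1, 1],
    "Candidate Name": ["JAGDISH SINGH RANA", "RASHEED MASOOD", "JASWANT SINGH SAINI"],
    "Total Votes Polled": [354807, 269934, 99894],
    "Position": [1, 2, 3],
    "Total voters": [821122, 821122, 821122],
})

# Share of the votes cast in the constituency that each candidate received.
rows["Vote Share Pct"] = 100 * rows["Total Votes Polled"] / rows["Total voters"]

# Keep only the winner of each constituency.
winners = rows.loc[rows["Position"] == 1,
                   ["PC Number", "Candidate Name", "Vote Share Pct"]]
print(winners)
```

For this constituency the winner's share comes out at roughly 43 percent, well below the 63.25 percent stored under `Winning Poll Percentage 09`.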


### Import the data for 2014.

In [17]:
# Import the 2014 data directly from the ECI website (same sheet names as 2009)
GE2014_URL = "http://eci.nic.in/eci_main/StatisticalReports/candidatewise/LS-2014_ElectionResult.xls"
SrcDataElectors2014 = pd.read_excel(GE2014_URL, sheet_name="electors", skiprows=3)
SrcDataCandidates2014 = pd.read_excel(GE2014_URL, sheet_name="Cand_Wise", skiprows=3)

In [23]:
UPElectors2014 = SrcDataElectors2014.loc[SrcDataElectors2014.STATE == 'Uttar Pradesh',:]
UPCandidates2014 = SrcDataCandidates2014.loc[SrcDataCandidates2014["State name"]=='Uttar Pradesh',:]

In [28]:
temp = pd.merge(UPCandidates2014, UPElectors2014, how='outer', left_on='PC Number', right_on='PC NO')
UP2014BASEDATA = temp.loc[:,[ 'STATE CODE', 'State name', 'PC Number', 'PC name',
       'PC Type', 'Candidate Name', 'Candidate Sex', 'Candidate Category',
       'Candidate Age', 'Party Abbreviation', 'Total Votes Polled', 'Position',
       'Total voters', 'Total_Electors', 'TOT_CONTESTANT', 'POLL PERCENTAGE']]

In [43]:
UP2014TRAIN = pd.merge(UP2014BASEDATA,UP2009BASEDATAWINNERS, how='outer', left_on='PC Number', right_on='PC Number 09')
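
After this merge, every 2014 candidate row carries the 2009 winner of the same constituency. One way such columns might be used is to flag candidates whose party won the seat in 2009. This is only a sketch on made-up toy rows; the column name `Is 09 Winning Party` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for UP2014TRAIN (party values are made up).
train = pd.DataFrame({
    "PC Number": [1, 1, 2],
    "Party Abbreviation": ["BJP", "BSP", "SP"],
    "Winning Party 09": ["BSP", "BSP", "BSP"],
})

# True when the candidate's party is the one that won this seat in 2009.
train["Is 09 Winning Party"] = (
    train["Party Abbreviation"] == train["Winning Party 09"]
)
print(train)
```

Rows without a 2009 match (possible with an outer join) get `False`, because a comparison against a missing value is never true.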

In [44]:
UP2014TRAIN = UP2014TRAIN.loc[:,['STATE CODE', 'State name', 'PC Number', 'PC name', 'PC Type',
       'Candidate Name', 'Candidate Sex', 'Candidate Category',
       'Candidate Age', 'Party Abbreviation','Total_Electors', 'TOT_CONTESTANT', 'STATE_CODE 09', 'PC Number 09', 'Winning Candidate 09',
       'Winning Party 09','Total_Electors 09', 'Winning Poll Percentage 09']]
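
Part 1 asks for the clean data as CSV file(s). A minimal sketch of writing a frame such as `UP2014TRAIN` to disk follows; the file name and the temporary directory are assumptions, and the toy frame only stands in for the real data.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the cleaned frame (values are made up).
clean = pd.DataFrame({
    "PC Number": [1, 2],
    "Party Abbreviation": ["BSP", "SP"],
    "Total_Electors": [1000, 2000],
})

# Hypothetical output location; replace with the submission folder.
out_path = os.path.join(tempfile.mkdtemp(), "UP2014TRAIN.csv")

# index=False leaves out the row index, which would otherwise come back
# as an "Unnamed: 0" column when the file is read again.
clean.to_csv(out_path, index=False)

# Read the file back to confirm it round-trips.
round_trip = pd.read_csv(out_path)
print(list(round_trip.columns))
```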