The purpose of this notebook is to combine data from our two data sources.

Specifically, compare the following two webpages, which report different data on the same election:
- https://www.electionsireland.org/result.cfm?election=1977&cons=219
- https://www.irelandelection.com/election.php?elecid=11&electype=1&constitid=48

Note: the two pages report different quotas for this election.
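The two sites disagree on the quota, so one way to check a reported figure is to recompute it. Dáil elections use the Droop quota: divide the valid poll by one more than the number of seats, disregard any fraction, and add one. The sketch below assumes the valid poll for a constituency is known, for example as the sum of its first-preference votes. `droop_quota` and `quota_matches` are hypothetical helpers and are not part of either dataset.

```python
def droop_quota(valid_poll: int, seats: int) -> int:
    """Droop quota: floor(valid_poll / (seats + 1)) + 1."""
    if seats < 1:
        raise ValueError("seats must be at least 1")
    return valid_poll // (seats + 1) + 1


def quota_matches(reported_quota: int, valid_poll: int, seats: int) -> bool:
    """True if a reported quota agrees with the recomputed Droop quota."""
    return reported_quota == droop_quota(valid_poll, seats)


# Toy figures for illustration only (not taken from either website).
print(droop_quota(100, 4))        # 21
print(quota_matches(21, 100, 4))  # True
```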

For every election we want:
- Number of constituencies
- How many constituencies do we have vote data for?
- What was the quota?
- What was the votes/quota ratio on the first count?
- What was the lowest votes/quota?
- What was the highest votes/quota?
- Who transferred to whom (where transfer data exists)?
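Most of these questions reduce to a group-by over candidate rows. The following is a minimal sketch that assumes a frame shaped like `df1`: one row per candidate, with `constituency_name`, `first_pref_count`, `quota` and `pct_of_quota_reached_with_first_pref`.

- It groups by `election_date` rather than year, because a year can hold more than one general election. The `1982 (Feb)` label in `df2` is one such case.
- `election_summary` is a hypothetical helper, and the toy rows are made up.
- Quotas are per constituency, so they are listed per (election, constituency) pair.
- Transfers are not covered, since neither frame has transfer columns.

```python
import pandas as pd


def election_summary(df: pd.DataFrame, by: str = "election_date") -> pd.DataFrame:
    """Per-election constituency coverage and first-count votes/quota range.

    Assumes one row per candidate, with the df1 column names
    'constituency_name', 'first_pref_count' and
    'pct_of_quota_reached_with_first_pref' (first-count votes / quota).
    """
    ratio = "pct_of_quota_reached_with_first_pref"
    grouped = df.groupby(by)
    # Only rows that actually carry a first-preference count.
    with_votes = df[df["first_pref_count"].notna()].groupby(by)
    summary = pd.DataFrame({
        "n_constituencies": grouped["constituency_name"].nunique(),
        "n_constituencies_with_votes": with_votes["constituency_name"].nunique(),
        "min_votes_per_quota": grouped[ratio].min(),  # NaNs are skipped
        "max_votes_per_quota": grouped[ratio].max(),
    })
    # Elections with no vote data at all are absent from `with_votes`.
    summary["n_constituencies_with_votes"] = (
        summary["n_constituencies_with_votes"].fillna(0).astype(int)
    )
    return summary


# Toy rows for illustration only; elections E1/E2 and constituencies A/B/C are made up.
toy = pd.DataFrame({
    "election_date": ["E1", "E1", "E1", "E2", "E2"],
    "constituency_name": ["A", "A", "B", "A", "C"],
    "first_pref_count": [900, 700, None, 500, 600],
    "quota": [800, 800, 500, 400, 400],
    "pct_of_quota_reached_with_first_pref": [1.13, 0.88, None, 1.25, 1.5],
})
summary = election_summary(toy)
# The quota is set per constituency, so list it per (election, constituency).
quotas = toy.groupby(["election_date", "constituency_name"])["quota"].first()
print(summary)
print(quotas)
```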

In [53]:
import pandas as pd
import numpy as np

df1 = pd.read_parquet('DAIL_elections_master.parquet')
df2 = pd.read_parquet('ALL_CANDIDATES.parquet')
print(df1.columns)
print(df2.columns)

Index(['candidate', 'party', 'first_pref_count', 'first_pref_pct',
       'pct_of_quota_reached_with_first_pref', 'elected_on_count', 'status',
       'seat', 'gender', 'election_date', 'electorate', 'number_of_candidates',
       'seats_available', 'constituency_name', 'constituency_name_as_Gaeilge',
       'quota', 'outgoing'],
      dtype='object')
Index(['election', 'elected', 'party', 'first_pref_pct', 'first_pref_count',
       'first_pref_quota_ratio', 'year', 'candidate', 'constituency',
       'election_type'],
      dtype='object')
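Comparing these two listings by eye is error-prone. `pandas.Index` supports set operations, so shared and source-specific columns can be listed directly. The column names below are copied from the output above.

```python
import pandas as pd

# Column names copied from the df1.columns / df2.columns output above.
df1_cols = pd.Index([
    "candidate", "party", "first_pref_count", "first_pref_pct",
    "pct_of_quota_reached_with_first_pref", "elected_on_count", "status",
    "seat", "gender", "election_date", "electorate", "number_of_candidates",
    "seats_available", "constituency_name", "constituency_name_as_Gaeilge",
    "quota", "outgoing",
])
df2_cols = pd.Index([
    "election", "elected", "party", "first_pref_pct", "first_pref_count",
    "first_pref_quota_ratio", "year", "candidate", "constituency",
    "election_type",
])

shared = sorted(df1_cols.intersection(df2_cols))   # present in both sources
only_df2 = sorted(df2_cols.difference(df1_cols))   # candidates for renaming
print(shared)
print(only_df2)
```

The `only_df2` list includes `constituency` and `first_pref_quota_ratio`, which are the two columns renamed below.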


In [54]:
df1.head()

,candidate,party,first_pref_count,first_pref_pct,pct_of_quota_reached_with_first_pref,elected_on_count,status,seat,gender,election_date,electorate,number_of_candidates,seats_available,constituency_name,constituency_name_as_Gaeilge,quota,outgoing
0,Patrick Gaffney,Farmers,10875,0.3483,1.74,1,Made Quota,1.0,Male,1922-06-16,51012,6,4,Carlow Kilkenny,Ceatharlach Cill Chainnigh,6246,
1,W T Cosgrave,Pro-Treaty Sinn Féin,7071,0.2264,1.13,1,Made Quota,2.0,Male,1922-06-16,51012,6,4,Carlow Kilkenny,Ceatharlach Cill Chainnigh,6246,TD
2,Denis Gorey,Labour,6122,0.196,0.98,2,Made Quota,3.0,Male,1922-06-16,51012,6,4,Carlow Kilkenny,Ceatharlach Cill Chainnigh,6246,
3,General Gerald O'Sullivan,Pro-Treaty Sinn Féin,2681,0.0859,0.43,4,Made Quota,4.0,Male,1922-06-16,51012,6,4,Carlow Kilkenny,Ceatharlach Cill Chainnigh,6246,TD
4,Edward Aylward,Anti-Treaty Sinn Féin,3365,0.1078,0.54,4,Not Elected,,Male,1922-06-16,51012,6,4,Carlow Kilkenny,Ceatharlach Cill Chainnigh,6246,TD


In [55]:
df2.head()

,election,elected,party,first_pref_pct,first_pref_count,first_pref_quota_ratio,year,candidate,constituency,election_type
0,2004 Local Election - Thomastown,True,Labour Party,0.085,641,0.51,2004,Ann Phelan,Thomastown,LOCAL
1,2009 Local Election - Thomastown,True,Labour Party,0.156,1183,0.78,2009,Ann Phelan,Thomastown,LOCAL
2,2011 general election - Carlow–Kilkenny,True,Labour Party,0.109,8072,0.66,2011,Ann Phelan,Carlow–Kilkenny,GENERAL
3,2016 general election - Carlow–Kilkenny,False,Labour Party,0.063,4391,0.38,2016,Ann Phelan,Carlow–Kilkenny,GENERAL
0,1982 (Feb) general election - Carlow–Kilkenny,False,Fianna Fáil,0.017,907,0.1,1982,John McGuinness,Carlow–Kilkenny,GENERAL
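`df2.head()` shows the index label `0` twice, so the index is not a unique row identifier. A plausible cause is that `ALL_CANDIDATES.parquet` was assembled from several per-candidate frames, but that is a guess. A quick check, and the fix used below, on a toy frame reproducing the pattern:

```python
import pandas as pd

# Toy frame reproducing the pattern above: index label 0 appears twice.
toy = pd.DataFrame(
    {"candidate": ["Ann Phelan", "Ann Phelan", "John McGuinness"]},
    index=[0, 1, 0],
)
print(toy.index.is_unique)          # False

fixed = toy.reset_index(drop=True)  # renumber 0..n-1 and discard old labels
print(fixed.index.is_unique)        # True
print(list(fixed.index))            # [0, 1, 2]
```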


First, we align the column names of the two sources:

- `constituency` in df2 holds the same data as `constituency_name` in df1.
- `first_pref_quota_ratio` in df2 holds the same data as `pct_of_quota_reached_with_first_pref` in df1 (first-preference votes divided by the quota).
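By default, `DataFrame.rename` silently ignores mapping keys that are not present. A typo in the mapping would therefore leave a column unrenamed without any warning. Passing `errors="raise"` makes pandas raise a `KeyError` instead. The sketch uses a one-row toy frame built from Ann Phelan's 2011 row shown above.

```python
import pandas as pd

# Mapping from df2's column names to df1's, per the notes above.
COLUMN_MAP = {
    "first_pref_quota_ratio": "pct_of_quota_reached_with_first_pref",
    "constituency": "constituency_name",
}

toy = pd.DataFrame({
    "candidate": ["Ann Phelan"],
    "constituency": ["Carlow–Kilkenny"],
    "first_pref_quota_ratio": [0.66],
})
renamed = toy.rename(columns=COLUMN_MAP, errors="raise")
print(list(renamed.columns))

# A misspelled key is caught instead of being ignored.
raised = False
try:
    toy.rename(columns={"constituancy": "constituency_name"}, errors="raise")
except KeyError as exc:
    raised = True
    print("mapping key not found:", exc)
```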

In [56]:
df1 = df1.reset_index(drop=True)
print(df1.shape)

(4811, 17)


In [57]:
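# df1 has no year column, so derive it from the election date
df1['year'] = df1.election_date.apply(lambda date: date.year)
# Align df2's column names with df1's, then renumber its duplicated index
df2 = df2.rename(columns={
    'first_pref_quota_ratio': 'pct_of_quota_reached_with_first_pref',
    'constituency': 'constituency_name',
}).reset_index(drop=True)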
print(df2.shape)

(36243, 10)
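`year` alone does not identify an election. `df2` has a `1982 (Feb)` entry, and 1982 also had a general election in November. It may therefore be safer to key on the election label. `df2['election']` appears to follow the pattern `<election label> - <constituency>`. Assuming that pattern holds throughout the file, the label can be split off as follows. The values are copied from `df2.head()` above.

```python
import pandas as pd

# Values copied from the df2.head() output above.
elections = pd.Series([
    "2004 Local Election - Thomastown",
    "2011 general election - Carlow–Kilkenny",
    "1982 (Feb) general election - Carlow–Kilkenny",
])

# Split on the first " - " only. Constituency names here use an en dash (–),
# so they are not split; this assumes the label itself never contains " - ".
parts = elections.str.split(" - ", n=1, expand=True)
election_label = parts[0]
constituency = parts[1]
print(election_label.tolist())
print(constituency.tolist())
```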


In [61]:
df3 = df2.reindex(columns=set(df1.columns).union(set(df2.columns)))
df3.columns

Index(['first_pref_pct', 'gender', 'party', 'electorate', 'elected_on_count',
       'outgoing', 'seat', 'pct_of_quota_reached_with_first_pref', 'status',
       'first_pref_count', 'candidate', 'constituency_name',
       'constituency_name_as_Gaeilge', 'number_of_candidates',
       'seats_available', 'election_type', 'elected', 'election', 'quota',
       'election_date', 'year'],
      dtype='object')
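A Python `set` has no defined order, which is why the column listing above looks shuffled. String hashing is randomised per Python session by default, so the order can also change between runs. An order-preserving alternative keeps df1's columns first and appends the df2-only columns. `ordered_union` is a hypothetical helper.

```python
import pandas as pd


def ordered_union(first: pd.Index, second: pd.Index) -> list:
    """Columns of `first` in order, then columns only in `second`, in order."""
    seen = set(first)
    return list(first) + [c for c in second if c not in seen]


a = pd.Index(["candidate", "party", "quota"])
b = pd.Index(["election", "party", "candidate", "year"])
print(ordered_union(a, b))  # ['candidate', 'party', 'quota', 'election', 'year']
```

With the notebook's frames, this would be used as `df2.reindex(columns=ordered_union(df1.columns, df2.columns))`.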

In [63]:
df3 = df3[[
    'year', 'election_date', 'election_type', 'constituency_name',
    'candidate', 'party', 'elected', 'first_pref_count',
    'pct_of_quota_reached_with_first_pref', 'first_pref_pct', 'status',
    'elected_on_count', 'quota', 'number_of_candidates', 'seats_available',
]]
df3

,year,election_date,election_type,constituency_name,candidate,party,elected,first_pref_count,pct_of_quota_reached_with_first_pref,first_pref_pct,status,elected_on_count,quota,number_of_candidates,seats_available
0,2004,,LOCAL,Thomastown,Ann Phelan,Labour Party,True,641,0.51,0.085,,,,,
1,2009,,LOCAL,Thomastown,Ann Phelan,Labour Party,True,1183,0.78,0.156,,,,,
2,2011,,GENERAL,Carlow–Kilkenny,Ann Phelan,Labour Party,True,8072,0.66,0.109,,,,,
3,2016,,GENERAL,Carlow–Kilkenny,Ann Phelan,Labour Party,False,4391,0.38,0.063,,,,,
4,1982,,GENERAL,Carlow–Kilkenny,John McGuinness,Fianna Fáil,False,907,0.10,0.017,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36238,1950,,LOCAL,Coole,Patrick Weir,Independent,False,64,0.09,0.015,,,,,
36239,1979,,LOCAL,Gorey,Martin Connors,Labour Party,False,319,0.19,0.032,,,,,
36240,2016,,GENERAL,Mayo,Tom Moran,Solidarity - People Before Profit,False,576,0.05,0.009,,,,,
36241,1920,,LOCAL,West Ward,John Finan,Ratepayers,False,34,0.32,0.036,,,,,
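As built above, `df3` comes from `df2` alone. The df1-only columns such as `status`, `quota` and `elected_on_count` are empty, and none of df1's 4,811 rows are present. One way to stack the two sources is `pd.concat`, which aligns columns by name and fills missing ones with NaN.

The sketch below uses toy frames built from rows shown earlier. The `source` column is a hypothetical addition for tracing each row back to its file.

```python
import pandas as pd

# Toy frames standing in for df1 and df2 after the renaming step above.
df1_toy = pd.DataFrame({
    "candidate": ["Patrick Gaffney"],
    "constituency_name": ["Carlow Kilkenny"],
    "year": [1922],
    "quota": [6246],
})
df2_toy = pd.DataFrame({
    "candidate": ["Ann Phelan"],
    "constituency_name": ["Carlow–Kilkenny"],
    "year": [2011],
    "elected": [True],
})

combined = pd.concat(
    [df1_toy.assign(source="DAIL_elections_master"),
     df2_toy.assign(source="ALL_CANDIDATES")],
    ignore_index=True,  # fresh 0..n-1 index, avoiding duplicate labels
)
print(combined)
```

The two sources also spell constituency names differently (`Carlow Kilkenny` in df1, `Carlow–Kilkenny` in df2). Matching rows across sources would therefore need the names normalised first.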
