# Instructions

In this part of the assignment, you will prepare the data to analyze the "meaningful votes" for the European Union Withdrawal Agreement and carry out a classification task.

The government led by Prime Minister Theresa May made three attempts to pass a version of the withdrawal agreement (agreed in late 2018) through the House of Commons, and it failed all three times. The failures were due to the large number of rebels among Conservative MPs.

If you are not familiar with the background, you can rely on the following sources:

- Aidt, T., Grey, F. & Savu, A. The Meaningful Votes: Voting on Brexit in the British House of Commons. *Public Choice* (2019).
  - https://link.springer.com/article/10.1007/s11127-019-00762-9
  - An academic analysis of the votes
  - Its analysis is similar to the one you will carry out
- Wikipedia:
  - https://en.wikipedia.org/wiki/Parliamentary_votes_on_Brexit



There were three meaningful votes (see the links above), and the results are available here:

- Vote1: https://votes.parliament.uk/Votes/Commons/Division/562
- Vote2: https://votes.parliament.uk/Votes/Commons/Division/623
- Vote3: https://votes.parliament.uk/Votes/Commons/Division/664

I compiled the results of three meaningful votes, along with the [Revoke Article 50 and remain in the EU petition](https://petition.parliament.uk/archived/petitions/241584) (from Assignment 2), in a csv file.

## Your task

1. Get other datasets and merge them with the voting record data
2. Complete a machine learning task to predict rebels among Conservative MPs


In [1]:
import numpy as np
import pandas as pd
import json
from urllib.request import urlopen

# Get the main data from the GV918 data repository (5 percent)

Get the data hosted on:
https://github.com/University-of-Essex-Dept-of-Government/GV918-UK-politics-data

- The parliamentary votes and the petition outcomes are in `Data/df_meaningful_vote.csv`


In [2]:
!git clone https://github.com/University-of-Essex-Dept-of-Government/GV918-UK-politics-data.git

Cloning into 'GV918-UK-politics-data'...
remote: Enumerating objects: 57, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 57 (delta 0), reused 0 (delta 0), pack-reused 56
Receiving objects: 100% (57/57), 33.14 MiB | 22.98 MiB/s, done.
Resolving deltas: 100% (13/13), done.


# Other data sources

In this section, you will get the data from several sources and merge them with the main dataframe.



## Referendum votes, general election data (5 percent)

In this section, you will merge two additional datasets:

1. Election outcomes of 2017 (you can use the code below)
2. Constituency-level referendum results (we used this data in the previous class)

After merging, create a new variable: the number of petition signatures per registered voter (signature count divided by electorate).
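A minimal sketch of the merge-and-divide step, using toy frames whose names and values are hypothetical stand-ins for `df_meaningful_vote` and `df_const`. `validate="many_to_one"` makes pandas raise an error if `ons_id` is duplicated in the right-hand frame, and `indicator=True` reports which rows found no match, which is worth checking before dividing.

```python
import pandas as pd

# Toy stand-ins for df_meaningful_vote and df_const (hypothetical values).
votes = pd.DataFrame({
    "ons_id": ["E1", "E2", "E3"],
    "signature_count": [100, 250, 40],
})
const = pd.DataFrame({
    "ons_id": ["E1", "E2", "E4"],
    "electorate": [1000, 5000, 800],
})

# validate="many_to_one" raises a MergeError if ons_id repeats in `const`;
# indicator=True adds a _merge column saying whether each row found a match.
merged = pd.merge(votes, const, on="ons_id", how="left",
                  validate="many_to_one", indicator=True)

unmatched = merged.loc[merged["_merge"] == "left_only", "ons_id"].tolist()
print("Constituencies without an electorate figure:", unmatched)

# Petition signatures per registered voter; NaN where electorate is missing.
merged["sig_per_elec"] = merged["signature_count"] / merged["electorate"]
merged = merged.drop(columns="_merge")
print(merged)
```

The same two checks apply to each later merge in this notebook; an unexpected `left_only` row or a `MergeError` is cheaper to catch here than after model fitting.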

In [99]:
df_elec = pd.read_csv("/content/GV918-UK-politics-data/Data/HoC-GE2017-results-by-candidate.csv")
df_const = pd.read_csv("/content/GV918-UK-politics-data/Data/HoC-GE2017-constituency-results.csv")
df_const = df_const[['ons_id', 'electorate']]

In [100]:
df_elec.head()

Unnamed: 0,ons_id,ons_region_id,constituency_name,county_name,region_name,country_name,constituency_type,party_name,party_abbreviation,firstname,surname,gender,sitting_mp,former_mp,votes,share,change
0,W07000049,W92000004,Aberavon,West Glamorgan,Wales,Wales,County,Labour,Lab,Stephen,Kinnock,Male,Yes,Yes,22662,0.681195,0.192155
1,W07000049,W92000004,Aberavon,West Glamorgan,Wales,Wales,County,Conservative,Con,Sadie,Vidal,Female,No,No,5901,0.177378,0.058671
2,W07000049,W92000004,Aberavon,West Glamorgan,Wales,Wales,County,Plaid Cymru,PC,Andrew,Bennison,Male,No,No,2761,0.082993,-0.033208
3,W07000049,W92000004,Aberavon,West Glamorgan,Wales,Wales,County,UK Independence Party,UKIP,Caroline,Jones,Female,No,No,1345,0.040429,-0.117265
4,W07000049,W92000004,Aberavon,West Glamorgan,Wales,Wales,County,Liberal Democrat,LD,Cen,Phillips,Male,No,No,599,0.018005,-0.026312


In [101]:
df_elec['sitting_mp'].unique()

array(['Yes', 'No'], dtype=object)

In [102]:
df_elec = df_elec[df_elec['sitting_mp'] == 'Yes']
len(df_elec)

618
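The `sitting_mp` column appears to flag the incumbent at the time Parliament was dissolved rather than the candidate who won in 2017; that reading would explain why only 618 of the 650 constituencies survive the filter above. If so, `share` would describe a losing candidate wherever the incumbent lost, and seats where the incumbent stood down would get no election data at all. A minimal alternative, assuming the winner is simply the candidate with the most votes in each `ons_id`, is sketched below on hypothetical toy data. Either way, MPs elected at a later by-election (e.g. Janet Daby in Lewisham East) still inherit their predecessor's 2017 result.

```python
import pandas as pd

# Toy candidate-level results (hypothetical). Seat X: the incumbent lost;
# seat Y: the incumbent stood down, so nobody has sitting_mp == "Yes".
cands = pd.DataFrame({
    "ons_id":     ["X", "X", "Y", "Y"],
    "surname":    ["Old", "New", "A", "B"],
    "sitting_mp": ["Yes", "No", "No", "No"],
    "votes":      [900, 1100, 700, 300],
})

# Filtering on the incumbency flag keeps the loser in X and drops Y entirely.
by_flag = cands[cands["sitting_mp"] == "Yes"]

# Keeping the highest-vote candidate per seat gives the 2017 winner instead.
winners = cands.loc[cands.groupby("ons_id")["votes"].idxmax()]

print(by_flag[["ons_id", "surname"]])
print(winners[["ons_id", "surname"]])
```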

In [103]:
df_const.head()

Unnamed: 0,ons_id,electorate
0,W07000049,49892
1,W07000058,45251
2,S14000001,62130
3,S14000002,64964
4,S14000003,64146


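In [73]:
len(df_elec)  # executed before the sitting_mp filter (In [73] precedes In [102]), so this counts every candidate

3304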

In [74]:
len(df_const)

650

In [104]:
df_meaningful_vote = pd.read_csv("/content/GV918-UK-politics-data/Data/df_meaningful_vote.csv")
df_meaningful_vote.head()

Unnamed: 0,index,MemberId,Name,Party,MemberFrom,vote1,vote2,vote3,ons_code,signature_count_241584
0,0,8,Theresa May,Conservative,Maidenhead,1.0,1.0,1.0,E14000803,13559
1,1,15,David Lidington,Conservative,Aylesbury,1.0,1.0,1.0,E14000538,10129
2,2,18,Cheryl Gillan,Conservative,Chesham and Amersham,1.0,1.0,1.0,E14000631,13543
3,3,55,Desmond Swayne,Conservative,New Forest West,1.0,1.0,1.0,E14000828,7920
4,4,69,Oliver Heald,Conservative,North East Hertfordshire,1.0,1.0,1.0,E14000845,10974


In [105]:
df_meaningful_vote = df_meaningful_vote.rename(columns = {'ons_code': 'ons_id', 'signature_count_241584': 'signature_count'})
df_meaningful_vote.head()

Unnamed: 0,index,MemberId,Name,Party,MemberFrom,vote1,vote2,vote3,ons_id,signature_count
0,0,8,Theresa May,Conservative,Maidenhead,1.0,1.0,1.0,E14000803,13559
1,1,15,David Lidington,Conservative,Aylesbury,1.0,1.0,1.0,E14000538,10129
2,2,18,Cheryl Gillan,Conservative,Chesham and Amersham,1.0,1.0,1.0,E14000631,13543
3,3,55,Desmond Swayne,Conservative,New Forest West,1.0,1.0,1.0,E14000828,7920
4,4,69,Oliver Heald,Conservative,North East Hertfordshire,1.0,1.0,1.0,E14000845,10974


In [110]:
df_merged = pd.merge(df_meaningful_vote, df_const, on = 'ons_id', how = 'left')
df_merged

Unnamed: 0,index,MemberId,Name,Party,MemberFrom,vote1,vote2,vote3,ons_id,signature_count,electorate
0,0,8,Theresa May,Conservative,Maidenhead,1.0,1.0,1.0,E14000803,13559,76076
1,1,15,David Lidington,Conservative,Aylesbury,1.0,1.0,1.0,E14000538,10129,82546
2,2,18,Cheryl Gillan,Conservative,Chesham and Amersham,1.0,1.0,1.0,E14000631,13543,71654
3,3,55,Desmond Swayne,Conservative,New Forest West,1.0,1.0,1.0,E14000828,7920,68786
4,4,69,Oliver Heald,Conservative,North East Hertfordshire,1.0,1.0,1.0,E14000845,10974,75965
...,...,...,...,...,...,...,...,...,...,...,...
633,633,4698,Janet Daby,Labour,Lewisham East,0.0,0.0,0.0,E14000787,13684,68124
634,634,4358,Wendy Morton,Conservative,Aldridge-Brownhills,,1.0,1.0,E14000531,3025,60363
635,635,3928,Nick Smith,Labour,Blaenau Gwent,,0.0,0.0,W07000072,2264,51227
636,636,4491,Vicky Foxcroft,Labour,"Lewisham, Deptford",,0.0,0.0,E14000789,24819,78468


In [111]:
df_merged = pd.merge(df_merged, df_elec, on = 'ons_id', how = 'left')
df_merged

Unnamed: 0,index,MemberId,Name,Party,MemberFrom,vote1,vote2,vote3,ons_id,signature_count,...,party_name,party_abbreviation,firstname,surname,gender,sitting_mp,former_mp,votes,share,change
0,0,8,Theresa May,Conservative,Maidenhead,1.0,1.0,1.0,E14000803,13559,...,Conservative,Con,Theresa,May,Female,Yes,Yes,37718.0,0.647642,-0.010663
1,1,15,David Lidington,Conservative,Aylesbury,1.0,1.0,1.0,E14000538,10129,...,Conservative,Con,David,Lidington,Male,Yes,Yes,32313.0,0.549700,0.042960
2,2,18,Cheryl Gillan,Conservative,Chesham and Amersham,1.0,1.0,1.0,E14000631,13543,...,Conservative,Con,Cheryl,Gillan,Female,Yes,Yes,33514.0,0.606566,0.016049
3,3,55,Desmond Swayne,Conservative,New Forest West,1.0,1.0,1.0,E14000828,7920,...,Conservative,Con,Desmond,Swayne,Male,Yes,Yes,33170.0,0.668386,0.068935
4,4,69,Oliver Heald,Conservative,North East Hertfordshire,1.0,1.0,1.0,E14000845,10974,...,Conservative,Con,Oliver,Heald,Male,Yes,Yes,32587.0,0.586308,0.032652
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
633,633,4698,Janet Daby,Labour,Lewisham East,0.0,0.0,0.0,E14000787,13684,...,Labour,Lab,Heidi,Alexander,Female,Yes,Yes,32072.0,0.679477,0.122503
634,634,4358,Wendy Morton,Conservative,Aldridge-Brownhills,,1.0,1.0,E14000531,3025,...,Conservative,Con,Wendy,Morton,Female,Yes,Yes,26317.0,0.654082,0.133587
635,635,3928,Nick Smith,Labour,Blaenau Gwent,,0.0,0.0,W07000072,2264,...,Labour,Lab,Nick,Smith,Male,Yes,Yes,18787.0,0.580132,0.000010
636,636,4491,Vicky Foxcroft,Labour,"Lewisham, Deptford",,0.0,0.0,E14000789,24819,...,Labour,Lab,Vicky,Foxcroft,Female,Yes,Yes,42461.0,0.770449,0.167995


In [108]:
df_merged.columns

Index(['index', 'MemberId', 'Name', 'Party', 'MemberFrom', 'vote1', 'vote2',
       'vote3', 'ons_id', 'signature_count', 'electorate', 'ons_region_id',
       'constituency_name', 'county_name', 'region_name', 'country_name',
       'constituency_type', 'party_name', 'party_abbreviation', 'firstname',
       'surname', 'gender', 'sitting_mp', 'former_mp', 'votes', 'share',
       'change'],
      dtype='object')

In [112]:
df_merged['sig_per_elec'] = df_merged['signature_count'] / df_merged['electorate']
df_merged[['signature_count', 'electorate', 'sig_per_elec']]

Unnamed: 0,signature_count,electorate,sig_per_elec
0,13559,76076,0.178230
1,10129,82546,0.122707
2,13543,71654,0.189005
3,7920,68786,0.115140
4,10974,75965,0.144461
...,...,...,...
633,13684,68124,0.200869
634,3025,60363,0.050113
635,2264,51227,0.044195
636,24819,78468,0.316295


In [113]:
df_merged.columns


Index(['index', 'MemberId', 'Name', 'Party', 'MemberFrom', 'vote1', 'vote2',
       'vote3', 'ons_id', 'signature_count', 'electorate', 'ons_region_id',
       'constituency_name', 'county_name', 'region_name', 'country_name',
       'constituency_type', 'party_name', 'party_abbreviation', 'firstname',
       'surname', 'gender', 'sitting_mp', 'former_mp', 'votes', 'share',
       'change', 'sig_per_elec'],
      dtype='object')

In [124]:
df_merged.rename(columns = {'constituency_name': 'Constituency'}, inplace = True)
df_merged.columns

Index(['index', 'MemberId', 'Name', 'Party', 'MemberFrom', 'vote1', 'vote2',
       'vote3', 'ons_id', 'signature_count', 'electorate', 'ons_region_id',
       'Constituency', 'county_name', 'region_name', 'country_name',
       'constituency_type', 'party_name', 'party_abbreviation', 'firstname',
       'surname', 'gender', 'sitting_mp', 'former_mp', 'votes', 'share',
       'change', 'sig_per_elec'],
      dtype='object')

## 3. MPs' positions data (5 percent)

The last dataset to merge records each MP's position in the 2016 Brexit referendum (Leave or Remain). The data come from Aidt et al. (2019).
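This dataset has no `ons_id`, so the merge below joins on constituency names. Joining on free text is fragile: a spelling difference silently leaves NaN, and a duplicated name silently adds rows. If the two files spell some constituencies differently (an assumption; this has not been checked against the actual files), a normalised join key can help. The normalisation rules and the example names below are illustrative only.

```python
import unicodedata

import pandas as pd


def normalise_name(name: str) -> str:
    """Illustrative key: ASCII-fold accents, lower-case, '&' -> 'and',
    replace punctuation with spaces and collapse whitespace."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    folded = folded.lower().replace("&", " and ")
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in folded)
    return " ".join(cleaned.split())


# Hypothetical spelling variants of the same two constituencies.
left = pd.DataFrame({"Constituency": ["Ynys Mon", "Lewisham, Deptford"],
                     "share": [0.35, 0.77]})
right = pd.DataFrame({"Constituency": ["Ynys Môn", "Lewisham Deptford"],
                      "MP vote for Brexit": ["Remain", "Remain"]})

left["const_key"] = left["Constituency"].map(normalise_name)
right["const_key"] = right["Constituency"].map(normalise_name)

# validate="one_to_one" raises if a normalised key repeats on either side,
# which would otherwise silently add rows to the merged frame.
merged = pd.merge(left, right.drop(columns="Constituency"),
                  on="const_key", how="left", validate="one_to_one")
print(merged)
```

Comparing `Name_x` with `Name_y` after the merge, as done below, remains a useful second check.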

In [46]:
df_mp_positions = pd.read_csv("/content/GV918-UK-politics-data/Data/mp_positions-cleaned.csv")

In [115]:
df_mp_positions.head()

Unnamed: 0,Party,Name,MP vote for Brexit,Constituency
0,Con,Nigel Adams,Leave,Selby and Ainsty
1,Con,Bim Afolami,Remain,Hitchin and Harpenden
2,Con,Stuart Andrew,Leave,Pudsey
3,Con,Edward Argar,Remain,Charnwood
4,Con,Victoria Atkins,Remain,Louth and Horncastle


In [119]:
len(df_mp_positions['Constituency'].unique())

650

In [125]:
df_merged_final = pd.merge(df_merged, df_mp_positions, on = 'Constituency', how = 'left')
df_merged_final

Unnamed: 0,index,MemberId,Name_x,Party_x,MemberFrom,vote1,vote2,vote3,ons_id,signature_count,...,gender,sitting_mp,former_mp,votes,share,change,sig_per_elec,Party_y,Name_y,MP vote for Brexit
0,0,8,Theresa May,Conservative,Maidenhead,1.0,1.0,1.0,E14000803,13559,...,Female,Yes,Yes,37718.0,0.647642,-0.010663,0.178230,Con,Theresa May,Remain
1,1,15,David Lidington,Conservative,Aylesbury,1.0,1.0,1.0,E14000538,10129,...,Male,Yes,Yes,32313.0,0.549700,0.042960,0.122707,Con,David Lidington,Remain
2,2,18,Cheryl Gillan,Conservative,Chesham and Amersham,1.0,1.0,1.0,E14000631,13543,...,Female,Yes,Yes,33514.0,0.606566,0.016049,0.189005,Con,Cheryl Gillan,Leave
3,3,55,Desmond Swayne,Conservative,New Forest West,1.0,1.0,1.0,E14000828,7920,...,Male,Yes,Yes,33170.0,0.668386,0.068935,0.115140,Con,Desmond Swayne,Leave
4,4,69,Oliver Heald,Conservative,North East Hertfordshire,1.0,1.0,1.0,E14000845,10974,...,Male,Yes,Yes,32587.0,0.586308,0.032652,0.144461,Con,Oliver Heald,Remain
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
633,633,4698,Janet Daby,Labour,Lewisham East,0.0,0.0,0.0,E14000787,13684,...,Female,Yes,Yes,32072.0,0.679477,0.122503,0.200869,Lab,Janet Daby,Remain
634,634,4358,Wendy Morton,Conservative,Aldridge-Brownhills,,1.0,1.0,E14000531,3025,...,Female,Yes,Yes,26317.0,0.654082,0.133587,0.050113,Con,Wendy Morton,Remain
635,635,3928,Nick Smith,Labour,Blaenau Gwent,,0.0,0.0,W07000072,2264,...,Male,Yes,Yes,18787.0,0.580132,0.000010,0.044195,Lab,Nick Smith,Remain
636,636,4491,Vicky Foxcroft,Labour,"Lewisham, Deptford",,0.0,0.0,E14000789,24819,...,Female,Yes,Yes,42461.0,0.770449,0.167995,0.316295,Lab,Vicky Foxcroft,Remain


In [133]:
df_merged_final[['Party_x', 'Party_y', 'Name_x', 'Name_y']].head()

Unnamed: 0,Party_x,Party_y,Name_x,Name_y
0,Conservative,Con,Theresa May,Theresa May
1,Conservative,Con,David Lidington,David Lidington
2,Conservative,Con,Cheryl Gillan,Cheryl Gillan
3,Conservative,Con,Desmond Swayne,Desmond Swayne
4,Conservative,Con,Oliver Heald,Oliver Heald


In [127]:
df_merged_final.columns

Index(['index', 'MemberId', 'Name_x', 'Party_x', 'MemberFrom', 'vote1',
       'vote2', 'vote3', 'ons_id', 'signature_count', 'electorate',
       'ons_region_id', 'Constituency', 'county_name', 'region_name',
       'country_name', 'constituency_type', 'party_name', 'party_abbreviation',
       'firstname', 'surname', 'gender', 'sitting_mp', 'former_mp', 'votes',
       'share', 'change', 'sig_per_elec', 'Party_y', 'Name_y',
       'MP vote for Brexit'],
      dtype='object')

In [138]:
df_brexit = pd.read_csv('/content/GV918-UK-politics-data/Data/merged_brexit_data.csv')
df_brexit

Unnamed: 0,ons_id,ons_region_id,constituency_name,county_name,region_name,country_name,mp_firstname,mp_surname,con_pct,lab_pct,leave_pct,UnempConstRate,ConstPercentChangeOneYr,Age_percent
0,W07000049,W92000004,Aberavon,West Glamorgan,Wales,Wales,Stephen,Kinnock,0.206279,0.538262,0.601245,0.036222,0.023232558139534953,0.196541
1,W07000058,W92000004,Aberconwy,Clwyd,Wales,Wales,Robin,Millar,0.460913,0.397081,0.521971,0.026173,0.04615384615384621,0.274634
2,S14000001,S92000003,Aberdeen North,Scotland,Scotland,Scotland,Kirsty,Blackman,0.201401,0.132013,0.430922,0.033311,-,0.137109
3,S14000002,S92000003,Aberdeen South,Scotland,Scotland,Scotland,Stephen,Flynn,0.359306,0.084009,0.321431,0.019850,-,0.161628
4,S14000003,S92000003,Airdrie and Shotts,Scotland,Scotland,Scotland,Neil,Gray,0.176280,0.320024,0.398381,0.040079,-,0.171167
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
645,E14001059,E12000002,Wythenshawe and Sale East,Greater Manchester,North West,England,Michael,Kane,0.300699,0.532965,0.496481,0.039245,0.08571428571428563,0.149229
646,E14001060,E12000009,Yeovil,Somerset,South West,England,Marcus,Fysh,0.583665,0.063466,0.598655,0.021093,0.024390243902439046,0.241783
647,W07000041,W92000004,Ynys Mon,Gwynedd,Wales,Wales,Virginia,Crosbie,0.354536,0.300695,0.509420,0.029914,0.00014708045300770856,0.262096
648,E14001061,E12000003,York Central,North Yorkshire,Yorkshire and The Humber,England,Rachael,Maskell,0.278093,0.551702,0.388207,0.015197,0.0182648401826484,0.134920


In [144]:
df_brexit = df_brexit[['ons_id', 'con_pct', 'lab_pct', 'leave_pct', 'UnempConstRate']]
df_brexit.head()

Unnamed: 0,ons_id,con_pct,lab_pct,leave_pct,UnempConstRate
0,W07000049,0.206279,0.538262,0.601245,0.036222
1,W07000058,0.460913,0.397081,0.521971,0.026173
2,S14000001,0.201401,0.132013,0.430922,0.033311
3,S14000002,0.359306,0.084009,0.321431,0.01985
4,S14000003,0.17628,0.320024,0.398381,0.040079


# Machine learning

Using the dataset you have prepared, carry out the classification task below:

- Data: Conservative MPs' votes in the meaningful votes
- Output: rebellion in the meaningful votes
  - Rebel = a Conservative MP who voted no (if the logic is unclear, refer to Aidt et al. (2019))
- You can choose the inputs, but you should include at least:
  - Petition signatures per electorate
  - The MP's position in the referendum
  - The referendum outcome in the constituency
  - The MP's electoral strength, measured by vote share
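A minimal sketch of constructing the target on hypothetical toy rows. It assumes the vote columns are coded 1.0 for a vote in favour of the government's motion, 0.0 for a vote against, and NaN where no vote was recorded; this is consistent with Theresa May's row being 1.0 throughout, but the coding is not documented here. Non-voters are kept as NaN so they can be dropped separately for each vote rather than counted as loyal.

```python
import numpy as np
import pandas as pd

# Toy rows mimicking df_merged_final (hypothetical values).
mps = pd.DataFrame({
    "Party_x": ["Conservative", "Conservative", "Conservative", "Labour"],
    "vote1":   [1.0, 0.0, np.nan, 0.0],
    "vote2":   [1.0, 0.0, 1.0, 0.0],
})

# Restrict to Conservative MPs; .copy() avoids chained-assignment warnings.
con = mps[mps["Party_x"] == "Conservative"].copy()

for v in ["vote1", "vote2"]:
    # Voted against (0.0) -> rebel 1; voted for (1.0) -> 0; NaN stays NaN.
    con[f"rebel_{v}"] = con[v].map({1.0: 0, 0.0: 1})

# For a single-vote model, drop MPs with no recorded vote in that division.
vote1_data = con.dropna(subset=["rebel_vote1"])
print(con)
```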




## ML procedures (25 percent)

Now carry out the machine learning task, taking the following steps:

1. Train-test split
2. Data wrangling (including standardization)
3. Model fitting
   - Run multiple algorithms and explain each choice (i.e. why you think the algorithm is worth trying)
   - Carry out parameter tuning
4. Evaluate and compare the models
   - How does performance differ across algorithms?
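A minimal sketch of the four steps, assuming scikit-learn is available. To stay self-contained it runs on synthetic data: the column names mirror `df_analysis`, but the values and the rule generating the label are invented. With the real data, `X` would hold the Conservative MPs' features and `y` one rebel indicator. Logistic regression and a random forest are shown only as two contrasting options (a linear, interpretable baseline versus a non-linear ensemble), and the grids are illustrative rather than recommended. Scaling and encoding sit inside the `Pipeline`, so they are fitted on training folds only; balanced accuracy is used because rebels may be a minority class in some votes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Conservative-MP table (random values, NOT real data).
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "sig_per_elec": rng.uniform(0.02, 0.3, n),
    "leave_pct": rng.uniform(0.3, 0.7, n),
    "share": rng.uniform(0.4, 0.7, n),
    "MP vote for Brexit": rng.choice(["Leave", "Remain"], n),
})
# Hypothetical label rule, only so that the example contains some signal.
logit = 8 * (X["leave_pct"] - 0.5) + 2 * (X["MP vote for Brexit"] == "Leave").astype(int) - 1.5
p = 1 / (1 + np.exp(-logit.to_numpy()))
y = (rng.uniform(size=n) < p).astype(int)

# 1. Train-test split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# 2. Wrangling: standardize numeric columns, one-hot encode the MP's position.
numeric = ["sig_per_elec", "leave_pct", "share"]
prep = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["MP vote for Brexit"]),
])

# 3. Candidate models with small, illustrative hyper-parameter grids.
candidates = {
    "logistic": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"clf__n_estimators": [100], "clf__max_depth": [3, None]}),
}

# 4. Tune with 5-fold CV on the training set, then score once on the test set.
results = {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([("prep", prep), ("clf", model)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="balanced_accuracy")
    search.fit(X_train, y_train)
    results[name] = search.score(X_test, y_test)
    print(name, search.best_params_, round(results[name], 3))
```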


In [132]:
df_merged_final.columns

Index(['index', 'MemberId', 'Name_x', 'Party_x', 'MemberFrom', 'vote1',
       'vote2', 'vote3', 'ons_id', 'signature_count', 'electorate',
       'ons_region_id', 'Constituency', 'county_name', 'region_name',
       'country_name', 'constituency_type', 'party_name', 'party_abbreviation',
       'firstname', 'surname', 'gender', 'sitting_mp', 'former_mp', 'votes',
       'share', 'change', 'sig_per_elec', 'Party_y', 'Name_y',
       'MP vote for Brexit'],
      dtype='object')

In [146]:
df = df_merged_final[['ons_id', 'sig_per_elec', 'MP vote for Brexit', 'share']]
df.head()

Unnamed: 0,ons_id,sig_per_elec,MP vote for Brexit,share
0,E14000803,0.17823,Remain,0.647642
1,E14000538,0.122707,Remain,0.5497
2,E14000631,0.189005,Leave,0.606566
3,E14000828,0.11514,Leave,0.668386
4,E14000845,0.144461,Remain,0.586308


In [152]:
df_analysis = pd.merge(df, df_brexit, on = 'ons_id', how = 'left')
df_analysis.head()

Unnamed: 0,ons_id,sig_per_elec,MP vote for Brexit,share,con_pct,lab_pct,leave_pct,UnempConstRate
0,E14000803,0.178230,Remain,0.647642,0.577427,0.139524,0.450257,0.013594
1,E14000538,0.122707,Remain,0.549700,0.540429,0.253632,0.517879,0.015653
2,E14000631,0.189005,Leave,0.606566,0.554009,0.128688,0.449850,0.012641
3,E14000828,0.115140,Leave,0.668386,0.638353,0.131098,0.552577,0.014746
4,E14000845,0.144461,Remain,0.586308,0.565601,0.236846,0.514301,0.015471

In [151]:
df_analysis = df_analysis.dropna()
df_analysis

Unnamed: 0,ons_id,sig_per_elec,MP vote for Brexit,share,con_pct,lab_pct,leave_pct,UnempConstRate
0,E14000803,0.178230,Remain,0.647642,0.577427,0.139524,0.450257,0.013594
1,E14000538,0.122707,Remain,0.549700,0.540429,0.253632,0.517879,0.015653
2,E14000631,0.189005,Leave,0.606566,0.554009,0.128688,0.449850,0.012641
3,E14000828,0.115140,Leave,0.668386,0.638353,0.131098,0.552577,0.014746
4,E14000845,0.144461,Remain,0.586308,0.565601,0.236846,0.514301,0.015471
...,...,...,...,...,...,...,...,...
633,E14000787,0.200869,Remain,0.679477,0.215397,0.594912,0.353705,0.036879
634,E14000531,0.050113,Remain,0.654082,0.707895,0.203701,0.677963,0.026034
635,W07000072,0.044195,Remain,0.580132,0.190245,0.491810,0.620280,0.039571
636,E14000789,0.316295,Remain,0.770449,0.113838,0.708279,0.244262,0.037482


## Interpretations (10 percent)

Summarise the findings and discuss them in writing (at least 300 words). The discussion can address:

- Which algorithm worked best, and why?
- Which meaningful vote does the model explain best, and why?
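One way to approach the second question, sketched under assumptions: fit the same model and features to each vote's rebel indicator and compare mean cross-validated ROC AUC. The data and labels below are synthetic (each label is generated with a different amount of noise purely so that the scores differ). With the real data, the set of MPs with a recorded vote changes across the three votes, so the feature matrix should be subset separately for each vote.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: one shared feature matrix and three rebel labels
# (random values, NOT the real votes).
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({"leave_pct": rng.uniform(0.3, 0.7, n),
                  "sig_per_elec": rng.uniform(0.02, 0.3, n)})
labels = {
    "vote1": (X["leave_pct"] + rng.normal(0, 0.05, n) > 0.55).astype(int),
    "vote2": (X["leave_pct"] + rng.normal(0, 0.10, n) > 0.58).astype(int),
    "vote3": (X["leave_pct"] + rng.normal(0, 0.20, n) > 0.65).astype(int),
}

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Same model and features for every vote; compare mean cross-validated ROC AUC.
scores = {vote: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
          for vote, y in labels.items()}
for vote, auc in scores.items():
    print(f"{vote}: mean ROC AUC = {auc:.3f}")
```

ROC AUC is used here because it does not depend on a classification threshold, which makes it easier to compare votes whose share of rebels differs.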
