## Creating the Data Set for Analysis

This notebook combines data from three sources:
- Cleaned bill data from https://www.parl.ca/legisinfo/en/bills?parlsession=all, stored in 'data/cleaned_data.csv'
- Bill information scraped from the LEGISinfo database (https://www.parl.ca/legisinfo/en/bill/), stored in 'data/bill_info.csv'
- Information on members of Parliament from https://www.ourcommons.ca/members/en/search?parliament=all&caucusId=all&province=all&gender=all, stored in 'data/members_of_parliament.csv'


In [356]:
# Import Python libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [357]:
bill = pd.read_csv('data/bill_info.csv')
mps = pd.read_csv('data/members_of_parliament.csv')
data = pd.read_csv('data/cleaned_data.csv')

### Part 1a: Reformatting Bills Data

In [358]:
# Split each SponsorInfo string on newlines into four attributes
# (n must be passed by keyword in pandas 2.0 and later)
bill[['Id', 'Name', 'Title', 'Constituency']] = bill['SponsorInfo'].str.split('\n', n=3, expand=True)
bill.drop(columns = ['SponsorInfo'], inplace = True)

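# Keep only the text after the first colon in each field
# (splitting once preserves values that themselves contain a colon)
for col in bill.columns:
    bill[col] = bill[col].apply(lambda x: x.split(':', 1)[1].strip())
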
# Reformat the Name column to remove middle initials
# (this drops every space-separated token that contains a period)
def removeInitials(text):
    names = text.split(' ')
    names = [x for x in names if '.' not in x]
    return ' '.join(names)

bill['Name'] = bill['Name'].apply(removeInitials)

# Rename columns to specify bill sponsorship
column_names = {
    'Name': 'SponsorName',
    'Title': 'SponsorTitle'
}

bill.rename(columns = column_names, inplace = True)
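
To make the parsing steps above concrete, here is a minimal sketch on two made-up rows. It assumes each `SponsorInfo` value holds four newline-separated fields of the form `Label: value`; that layout is inferred from the code, not confirmed against the real file, and the sample names and the `remove_initials` helper are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real layout of 'SponsorInfo' is an assumption
sample = pd.DataFrame({
    'SponsorInfo': [
        'Id: 44-1/S-1\nName: Jane Q. Doe\nTitle: Senator\nConstituency:',
        'Id: 41-2/C-701\nName: John Roe\nTitle: Member of Parliament\nConstituency: Mount Royal',
    ]
})

# n=3 allows at most three splits, so each row yields exactly four pieces
parts = sample['SponsorInfo'].str.split('\n', n=3, expand=True)
parts.columns = ['Id', 'Name', 'Title', 'Constituency']

# Keep only the text after the first colon in each field
for col in parts.columns:
    parts[col] = parts[col].str.split(':', n=1).str[1].str.strip()

def remove_initials(text):
    """Drop every space-separated token that contains a period."""
    return ' '.join(tok for tok in text.split(' ') if '.' not in tok)

parts['Name'] = parts['Name'].apply(remove_initials)
print(parts)
```

Note that a missing constituency comes out as an empty string rather than `NaN`, and that the period test would also drop tokens such as "St." or "Jr." from a name.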

In [359]:
print(bill.shape)
display(bill.head())

(6761, 4)


Unnamed: 0,Id,SponsorName,SponsorTitle,Constituency
0,44-1/S-1,Yuen Pau Woo,Senator,
1,43-2/S-1,Marc Gold,Senator,
2,43-1/S-1,Joseph Day,Senator,
3,42-1/S-1,Yonah Martin,Senator,
4,41-2/S-1,Claude Carignan,Leader of the Government in the Senate,


### Part 1b: Reformatting MP Data

In [360]:
# Merge first and last name columns
mps['Name'] = mps['First Name'] + ' ' + mps['Last Name']

# Select only the political affiliation and name columns
mps = mps[['Political Affiliation', 'Name']]

In [361]:
display(mps.head())

Unnamed: 0,Political Affiliation,Name
0,Conservative,Ziad Aboultaif
1,Conservative,Scott Aitchison
2,Conservative,Dan Albas
3,Liberal,John Aldag
4,Liberal,Omar Alghabra


### Part 2: Merging Dataframes Together

In [362]:
# Left-join mps onto bill by sponsor name to attach each sponsor's political affiliation
bill = pd.merge(bill, mps, left_on = ['SponsorName'], right_on = ['Name'], how = 'left')

In [363]:
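# Check for null values
# There is no data on senator affiliation, so bills sponsored by senators will have null values.
# The row count also grows from 6761 to 6763 after the left merge, which means
# at least one sponsor name matches more than one row in mps.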

print(bill.isna().sum().to_string())
print(bill.shape)
display(bill)

Id                          0
SponsorName                 0
SponsorTitle                0
Constituency                0
Political Affiliation    1537
Name                     1527
(6763, 6)


Unnamed: 0,Id,SponsorName,SponsorTitle,Constituency,Political Affiliation,Name
0,44-1/S-1,Yuen Pau Woo,Senator,,,
1,43-2/S-1,Marc Gold,Senator,,,
2,43-1/S-1,Joseph Day,Senator,,,
3,42-1/S-1,Yonah Martin,Senator,,,
4,41-2/S-1,Claude Carignan,Leader of the Government in the Senate,,,
...,...,...,...,...,...,...
6758,41-2/C-699,Elizabeth May,Member of Parliament,Saanich—Gulf Islands,Green Party,Elizabeth May
6759,41-2/C-700,Christine Moore,Member of Parliament,Abitibi—Témiscamingue,NDP,Christine Moore
6760,41-2/C-701,Irwin Cotler,Member of Parliament,Mount Royal,Liberal,Irwin Cotler
6761,41-2/C-702,Scott Simms,Member of Parliament,Bonavista—Gander—Grand Falls—Windsor,Liberal,Scott Simms
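
The shapes printed above go from (6761, 4) before the merge to (6763, 6) after it. A left merge only adds rows when a key appears more than once on the right side, so it can help to look for repeated names and unmatched sponsors. The sketch below shows one way to do that on hypothetical miniature tables; the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of the bill and MP tables
bills = pd.DataFrame({
    'Id': ['b1', 'b2', 'b3'],
    'SponsorName': ['A Smith', 'B Jones', 'C Senator'],
})
members = pd.DataFrame({
    'Name': ['A Smith', 'A Smith', 'B Jones'],
    'Political Affiliation': ['Liberal', 'Conservative', 'NDP'],
})

# Names that occur more than once on the right side multiply left rows
dup_names = members.loc[members['Name'].duplicated(keep=False), 'Name'].unique()
print('Duplicated names:', list(dup_names))

# indicator=True adds a '_merge' column marking 'both' or 'left_only' rows
merged = bills.merge(members, left_on='SponsorName', right_on='Name',
                     how='left', indicator=True)
print(merged['_merge'].value_counts())
print(len(bills), '->', len(merged))
```

Passing `validate='many_to_one'` to `merge` is a stricter alternative: it raises an error when the right-hand keys are not unique instead of silently adding rows.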


In [364]:
# Merge data and bill
data = pd.merge(data, bill, on = ['Id'], how = 'left')

# Drop redundant name column
data.drop(columns = 'Name', inplace = True)

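# Replace NaN with the string 'None' in the political affiliation column
# (assign the result back; calling fillna with inplace on a selected column is unreliable in recent pandas)
data['Political Affiliation'] = data['Political Affiliation'].fillna('None')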

In [365]:
print(data.isna().sum().to_string())
print(data.shape)
display(data.head(10))

Id                         0
Code                       0
Title                      0
LatestStageName            0
ParliamentNumber           0
SessionNumber              0
BillType                   0
PersonName                 0
ReceivedRoyalAssent        0
Ongoing                    0
ReadingsPassed             0
BillOrigin                 0
FirstStageDate           975
LastStageDate            975
TimeDebated              975
SponsorName                0
SponsorTitle               0
Constituency               0
Political Affiliation      0
(6763, 19)


Unnamed: 0,Id,Code,Title,LatestStageName,ParliamentNumber,SessionNumber,BillType,PersonName,ReceivedRoyalAssent,Ongoing,ReadingsPassed,BillOrigin,FirstStageDate,LastStageDate,TimeDebated,SponsorName,SponsorTitle,Constituency,Political Affiliation
0,44-1/S-1,S-1,An Act relating to railways,First reading in the Senate,44,1,Senate Public Bill,,False,True,1,Senate,2021-11-22,2021-11-22,0 days,Yuen Pau Woo,Senator,,
1,43-2/S-1,S-1,An Act relating to railways,First reading in the Senate,43,2,Senate Public Bill,,False,False,1,Senate,2020-09-22,2020-09-22,0 days,Marc Gold,Senator,,
2,43-1/S-1,S-1,An Act relating to railways,First reading in the Senate,43,1,Senate Public Bill,,False,False,1,Senate,2019-12-04,2019-12-04,0 days,Joseph Day,Senator,,
3,42-1/S-1,S-1,An Act relating to railways,First reading in the Senate,42,1,Senate Public Bill,,False,False,1,Senate,2015-12-03,2015-12-03,0 days,Yonah Martin,Senator,,
4,41-2/S-1,S-1,An Act relating to railways,First reading in the Senate,41,2,Senate Government Bill,,False,False,1,Senate,2013-10-15,2013-10-15,0 days,Claude Carignan,Leader of the Government in the Senate,,
5,41-1/S-1,S-1,An Act relating to railways,First reading in the Senate,41,1,Senate Government Bill,,False,False,1,Senate,2011-06-02,2011-06-02,0 days,,Leader of the Government in the Senate,,
6,40-3/S-1,S-1,An Act relating to railways,First reading in the Senate,40,3,Senate Government Bill,,False,False,1,Senate,2010-03-02,2010-03-02,0 days,,Leader of the Government in the Senate,,
7,40-2/S-1,S-1,An Act relating to railways,First reading in the Senate,40,2,Senate Government Bill,,False,False,1,Senate,2009-01-25,2009-01-25,0 days,,Leader of the Government in the Senate,,
8,40-1/S-1,S-1,An Act relating to railways,First reading in the Senate,40,1,Senate Government Bill,,False,False,1,Senate,2008-11-18,2008-11-18,0 days,,Leader of the Government in the Senate,,
9,39-2/S-1,S-1,An Act relating to railways,First reading in the Senate,39,2,Senate Government Bill,,False,False,1,Senate,2007-10-15,2007-10-15,0 days,,Leader of the Government in the Senate,,
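
The null counts above report 0 for `SponsorName` and `Constituency`, yet several displayed rows are blank in those columns. That is what one would expect if the blanks are empty strings, since `isna()` does not count them. A small sketch of counting both kinds of missing value, using invented rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows mixing empty strings and real NaN values
df = pd.DataFrame({
    'SponsorName': ['Marc Gold', ''],
    'Constituency': ['', 'Mount Royal'],
    'TimeDebated': ['0 days', np.nan],
})

# isna() only sees NaN; empty strings are ordinary values
print(df.isna().sum())

# Count empty strings per column
empty_counts = (df == '').sum()
print(empty_counts)

# Treat empty strings as missing before counting
as_missing = df.replace('', np.nan).isna().sum()
print(as_missing)
```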


In [366]:
# Export data for analysis
data.to_csv('data/final_data.csv', index = False)
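
One thing to watch when this file is loaded again: depending on the pandas version, `read_csv` may treat the literal text `None` as a missing value by default, which would undo the fill applied earlier. The sketch below shows a round trip that keeps the string intact by restricting the missing-value markers to the empty string; the sample rows are hypothetical and an in-memory buffer stands in for 'data/final_data.csv'.

```python
import io
import pandas as pd

# Hypothetical rows; 'None' here is a string label, not a missing value
df = pd.DataFrame({
    'Id': ['44-1/S-1', '41-2/C-699'],
    'Political Affiliation': ['None', 'Green Party'],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read back, treating only the empty string as missing
buffer.seek(0)
restored = pd.read_csv(buffer, keep_default_na=False, na_values=[''])
print(restored['Political Affiliation'].tolist())
```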