# Congress and Bills Dataset
We associate members of the House of Representatives (HoR) with the respective bills they authored.

## Importing Datasets
The bills dataset in the `18th-congress-bills` folder and the HoR members data in the `congress-members-rip` folder are imported.

In [1]:
import pandas as pd

bills_df = pd.read_csv('18th-congress-bills/hor_bills.csv', encoding='utf-8')
reps_df = pd.read_csv('congress-members-rip/18th_congress_hor.csv', encoding='utf-8')

In [2]:
reps_df

Unnamed: 0,rep_name,rep_region,rep_subdistrict,partylist,link,name,party,votes,conf_date,district,region,subdistrict
0,"Abante, Bienvenido Jr. M.",Manila,6th District,,abante-b,"ABANTE, BIENVENIDO JR. MIRANDO",ASENSO,49795,14-May-2019,6th District,NATIONAL CAPITAL REGION,Manila
1,"Abaya, Francis Gerald Aguinaldo",Cavite,1st District,,abaya-fg,"ABAYA, FRANCIS GERALD AGUINALDO",LP,117031,16-May-2019,1st District,REGION IV-A,Cavite
2,"Abellanosa, Rodrigo A.",Cebu City,2nd District,,abellanosa-r,"ABELLANOSA, RODRIGO ABELLANA",LDP,163752,15-May-2019,2nd District,REGION VII,Cebu City
3,"Abu, Raneo ""Ranie"" E.",Batangas,2nd District,,abu-r,"ABU, RANEO ENRIQUEZ",NP,117205,16-May-2019,2nd District,REGION IV-A,Batangas
4,"Abueg-Zaldivar, Cyrille ""Beng"" F.",Palawan,2nd District,,abueg-zaldivar,"ABUEG, CYRILLE FABELLO",PPP,112960,17-May-2019,2nd District,REGION IV-B,Palawan
...,...,...,...,...,...,...,...,...,...,...,...,...
303,"Yu, Divina Grace C.",Zamboanga del Sur,1st District,,yu-d,"YU, DIVINA GRACE CABARDO",PDPLBN,164990,18-May-2019,1st District,REGION IX,Zamboanga del Sur
304,"Zamora, Manuel ""Way Kurat"" E.",Davao de Oro,1st District,,zamora-m,"ZAMORA, MANUEL ESQUIVEL",HNP,78257,17-May-2019,1st District,REGION XI,Davao de Oro
305,"Zamora, Ronaldo B.",San Juan City,Lone District,,zamora-r,"ZAMORA, RONALDO BAYAN",PDPLBN,35386,13-May-2019,Lone District,NATIONAL CAPITAL REGION,San Juan City
306,"Zarate, Carlos Isagani T.",,,BAYAN MUNA,zarate-c,,,,,,,


In [3]:
bills_df.iloc[9399]

ID                                                                            HB09411
Significance                                                                 National
Date Filed                                                                 2021-05-24
Full Title                          AN ACT\nPROVIDING FOR ADDITIONAL MECHANISMS TO...
Principal Author/s                  FUENTEBELLA, ARNIE B.;SAKALURAN, PRINCESS RIHA...
Date Read                                                                  2021-05-24
Primary Referral                                                     ECONOMIC AFFAIRS
Bill Status                         Approved by the House on 2021-06-01, transmitt...
Mother Bill Status                                                                NaN
Date Approved on Second Reading                                            2021-05-25
Date Approved on Third Reading                                             2021-06-01
Senate Bill Counterpart                               

Entries which are not house bills are removed.

In [4]:
bills_df = bills_df[bills_df['ID'].str[:2]=='HB']
bills_df


Unnamed: 0,ID,Significance,Date Filed,Full Title,Principal Author/s,Date Read,Primary Referral,Bill Status,Mother Bill Status,Date Approved on Second Reading,Date Approved on Third Reading,Senate Bill Counterpart,Date Acted Upon By the President,Republic Act No.,Author Count
0,HB00001,National,2019-07-01,"AN ACT INSTITUTIONALIZING A NATIONAL VALUES, E...","UNGAB, ISIDRO T.;CAYETANO, ALAN PETER ""COMPAÑE...",2019-07-23,BASIC EDUCATION AND CULTURE,Substituted by HB05829,Republic Act RA11476 enacted on 2020-06-25,,,,,,3
1,HB00002,National,2019-07-01,AN ACT CREATING THE DEPARTMENT OF OVERSEAS FIL...,"DUTERTE, PAOLO Z.;CAYETANO, MARIA LAARNI;CAYET...",2019-07-23,GOVERNMENT REORGANIZATION,Substituted by HB05832,Republic Act RA11641 enacted on 2021-12-30,,,,,,3
2,HB00003,National,2019-07-01,AN ACT PROVIDING FOR A NATIONAL PROGRAM TO SUP...,"CAYETANO, ALAN PETER ""COMPAÑERO"" S.",2019-07-23,WELFARE OF CHILDREN,Transmitted to the Committee on Appropriations...,,,,,,,1
3,HB00004,National,2019-07-01,AN ACT CREATING THE EMERGENCY RESPONSE DEPARTM...,"DUTERTE, PAOLO Z.;CAYETANO, ALAN PETER ""COMPAÑ...",2019-07-23,GOVERNMENT REORGANIZATION,Substituted by HB05989,"Approved by the House on 2020-09-22, transmitt...",,,,,,2
4,HB00005,National,2019-07-01,AN ACT INSTITUTIONALIZING MICROFINANCE PROGRAM...,"CAYETANO, ALAN PETER ""COMPAÑERO"" S.",2019-07-23,"MICRO, SMALL AND MEDIUM ENTERPRISE DEVELOPMENT","Pending with the Committee on MICRO, SMALL AND...",,,,,,,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10817,HB10829,,2022-03-23,"AN ACT CREATING THE ""COUNCIL FOR THE BUILT ENV...","RODRIGUEZ, RUFUS B.",,,Pending First Reading (Filed last 2022-03-23),,,,,,,1
10818,HB10830,,2022-03-23,AN ACT STRENGTHENING THE ARCHITECTURE PROFESSI...,"RODRIGUEZ, RUFUS B.",,,Pending First Reading (Filed last 2022-03-23),,,,,,,1
10819,HB10831,,2022-03-29,AN ACT POSTPONING THE DECEMBER 2022 BARANGAY A...,"CARDEMA, DUCIELLE",,,Pending First Reading (Filed last 2022-03-29),,,,,,,1
10820,HB10832,,2022-03-31,"AN ACT MANDATING ALL EXECUTIVE DEPARTMENTS, BU...","FERNANDEZ, DAN S.;CAYETANO, ALAN PETER ""COMPAÑ...",,,Pending First Reading (Filed last 2022-03-31),,,,,,,5


Make sure each bill has an associated author field.

In [5]:
bills_df[bills_df['Principal Author/s'].isna()]

Unnamed: 0,ID,Significance,Date Filed,Full Title,Principal Author/s,Date Read,Primary Referral,Bill Status,Mother Bill Status,Date Approved on Second Reading,Date Approved on Third Reading,Senate Bill Counterpart,Date Acted Upon By the President,Republic Act No.,Author Count
1596,HB01603,National,2019-07-08,AN ACT GRANTING DULY APPOINTED REGIONAL TRIAL ...,,2019-07-24,JUSTICE,Pending with the Committee on JUSTICE since 20...,,,,,,,0
9988,HB1-485,,2021-11-12,AN ACT INSTITUTIONALIZING A SYSTEM OF FREE FOR...,,,,Pending First Reading (Filed last 2021-11-12),,,,,,,0


Fill in author field of HB01603. Row 9988 is dropped since it is a duplicate of HB10485.

In [6]:
bills_df.loc[bills_df['ID']=='HB01603', 'Principal Author/s'] = 'Castelo, Precious Hipolito' # from https://hrep-website.s3.ap-southeast-1.amazonaws.com/legisdocs/basic_18/HB01603.pdf
bills_df.loc[bills_df['ID']=='HB01603', 'Author Count'] = 1
bills_df = bills_df.drop(9988, axis=0)

bills_df[bills_df['Principal Author/s'].isna()]

Unnamed: 0,ID,Significance,Date Filed,Full Title,Principal Author/s,Date Read,Primary Referral,Bill Status,Mother Bill Status,Date Approved on Second Reading,Date Approved on Third Reading,Senate Bill Counterpart,Date Acted Upon By the President,Republic Act No.,Author Count


There are 308 members of the HoR.

In [7]:
rep_names = reps_df['rep_name'].str.lower()
rep_names = rep_names.to_dict()
len(reps_df)

308

Mapping from the representative's name to an integer denoting their index and back are created.

In [8]:
num_to_rep = rep_names
rep_names = dict(zip(rep_names.values(), rep_names.keys()))
rep_names, len(rep_names)
rep_names

{'abante, bienvenido jr. m.': 0,
 'abaya, francis gerald aguinaldo': 1,
 'abellanosa, rodrigo a.': 2,
 'abu, raneo "ranie" e.': 3,
 'abueg-zaldivar, cyrille "beng" f.': 4,
 'abunda, maria fe r.': 5,
 'acop, resurreccion m.': 6,
 'acosta, gil "kabarangay jr." a.': 7,
 'acosta-alba, maria lourdes': 8,
 'adiong, ansaruddin abdul malik a.': 9,
 'advincula, alex "aa" l.': 10,
 'agabas, tyrone d.': 11,
 'agarao, benjamin jr. c.': 12,
 'aglipay, michael edgar y.': 13,
 'aguinaldo, naealla bainto': 14,
 'albano, antonio "tonypet" t.': 15,
 'albano, rodolfo b.': 16,
 'alfonso, samantha louise vargas': 17,
 'almario, joel mayo z.': 18,
 'alonte, marlyn "len" b.': 19,
 'alvarez, franz "chicoy" e.': 20,
 'alvarez, genaro jr. m.': 21,
 'alvarez, pantaleon d.': 22,
 'amante-matba, ma. angelica m.': 23,
 'amatong, isagani s.': 24,
 'andaya, marissa': 25,
 'angara, rommel rico t.': 26,
 'aragones, sol': 27,
 'arbison, munir m.': 28,
 'arenas, rose marie "baby" j.': 29,
 'arroyo, ma. lourdes t.': 30,
 

Import the model used to compare strings for similarity - cosine similarity.

In [9]:
from strsimpy.cosine import Cosine
cosine = Cosine(2)

The `associate_authors` function takes as input a list of authors, the mapping between authors and their integer, their cosine vectors and the similarity module.

This function associates the comma-delimited list of authors associated with a single bill to a representative - either an exact match or one that needs to be computed for by obtaining the string closest to each author.

The `closest_match` function computes for the similarity score between each HoR member and the author string. The HoR member with the greatest similarity profile value is obtained and associated with the author string.

In [10]:
def closest_match(rep_vec, rep_names, author, model, out, verbose=False):
    # inefficient - sort as binary tree, find insertion and survey neighboring nodes more efficient
    '''
    get name of representative closest to input
    - perform cosine similarity if no exact match
        - if entry it is most similar to has already been classified, get next best until one is found
    - return exact match otherwise
    '''
    assert(type(model) is not Cosine or type(rep_vec) is dict)
    rep_score = []
    for rep in rep_names.keys():
        if type(model) is Cosine:
            rep_score.append((rep, 1-model.similarity_profiles(model.get_profile(author), rep_vec[rep])))
        else:
            rep_score.append((rep, model.distance(author, rep)))
    reps_sorted_scores = sorted(rep_score, key=lambda x: x[1])
    for i in range(len(rep_score)):
        if out.get(rep_names[reps_sorted_scores[i][0]], None) is None:
            if verbose:
                print(reps_sorted_scores[i], author)
            return reps_sorted_scores[i][0]

def associate_authors(authors_str, rep_names, vectorized_reps, model, verbose=False):
    '''
    get name of representative closest to input
    - perform cosine similarity if no exact match
    - return exact match otherwise
    '''
    out = {}
    author_li = authors_str.lower().split(';')
    second_pass = []
    for author in author_li:
        author = author.strip()
        if author == '':
            continue
        # if author is an exact match to key and not yet classified
        if author in rep_names and rep_names[author] not in out:
            out[rep_names[author]]=1
        # if author is not an exact match to key OR was classified already,
        # save for second pass
        else:
            second_pass.append(author)
    
    for author in second_pass:
        author_match = closest_match(vectorized_reps, rep_names, author, model, out, verbose=verbose)
        out[rep_names[author_match]] = 1
    return list(out.keys())

Test `associate_authors` on one entry. The similarity score of each string passed to the `closest_match` function is shown below together with the string with best match.

In [11]:
vectorized_reps = {x:cosine.get_profile(x) for x in rep_names}
out = associate_authors(bills_df['Principal Author/s'].iloc[10787], rep_names, vectorized_reps, cosine, verbose=True)
len(out)

('zamora, manuel "way kurat" e.', 0.26843647904971013) zamora, way kurat
('villafuerte, luis raymund  jr. "lray" f.', 0.03607461457624028) villafuerte, luis raymund "lray" jr f.
('pacquiao, rogelio "ruel" d.', 0.13289003047588022) pacquiao, rogelio d.
('momo, romeo sr. s.', 0.0897008726364874) momo, romeo s.
('pacquiao, alberto "bobby" d.', 0.16112950719213903) pacquiao, alberto d.
('palma, wilter  "sharky" ii wee', 0.0) palma, wilter "sharky" ii wee


34

Apply the function to each bill to associate a list of HoR authors with them. Its progress can be seen below.

In [12]:
from tqdm.notebook import tqdm
from functools import partial

bills_df['Principal Author/s'] = bills_df['Principal Author/s'].fillna('')

const_func = partial(associate_authors, rep_names=rep_names, model=cosine, vectorized_reps=vectorized_reps)
bills_df['indiv_cong'] = list(tqdm(map(const_func, bills_df['Principal Author/s'].to_list()), total=len(bills_df)))

  0%|          | 0/10821 [00:00<?, ?it/s]

Expected runtime: ~2-2.5 minutes

Check for missed values - validate that all authors present in the `Principal Author/s` field are mapped to an HoR member.

In [13]:
bills_df['identified_tot'] = bills_df['indiv_cong'].map(lambda x: len(x))
newdf = bills_df[bills_df['identified_tot'] != bills_df['Author Count']]
(newdf['Author Count'] - newdf['identified_tot']).value_counts()

Series([], dtype: int64)

Compare authors and associated HoR members visually.

In [14]:
bills_df['Principal Author/s'], bills_df['indiv_cong'].map(lambda x: ';'.join([num_to_rep[cong] for cong in x]))

(0        UNGAB, ISIDRO T.;CAYETANO, ALAN PETER "COMPAÑE...
 1        DUTERTE, PAOLO Z.;CAYETANO, MARIA LAARNI;CAYET...
 2                      CAYETANO, ALAN PETER "COMPAÑERO" S.
 3        DUTERTE, PAOLO Z.;CAYETANO, ALAN PETER "COMPAÑ...
 4                      CAYETANO, ALAN PETER "COMPAÑERO" S.
                                ...                        
 10817                                  RODRIGUEZ, RUFUS B.
 10818                                  RODRIGUEZ, RUFUS B.
 10819                                    CARDEMA, DUCIELLE
 10820    FERNANDEZ, DAN S.;CAYETANO, ALAN PETER "COMPAÑ...
 10821                                    CARDEMA, DUCIELLE
 Name: Principal Author/s, Length: 10821, dtype: object,
 0        ungab, isidro t.;cayetano, alan peter "compañe...
 1        duterte, paolo z.;cayetano, maria laarni;cayet...
 2                      cayetano, alan peter "compañero" s.
 3        duterte, paolo z.;cayetano, alan peter "compañ...
 4                      cayetano, alan pete

See the notation used to associate each HoR representative - an integer corresponding to its index in `reps_df`.

In [15]:
bills_df['indiv_cong']

0               [284, 72, 180]
1                [104, 73, 72]
2                         [72]
3                    [104, 72]
4                         [72]
                 ...          
10817                    [231]
10818                    [231]
10819                     [67]
10820    [121, 72, 3, 73, 294]
10821                     [67]
Name: indiv_cong, Length: 10821, dtype: object

## Creating Features

In [16]:
import numpy as np

In [17]:
reps_df

Unnamed: 0,rep_name,rep_region,rep_subdistrict,partylist,link,name,party,votes,conf_date,district,region,subdistrict
0,"Abante, Bienvenido Jr. M.",Manila,6th District,,abante-b,"ABANTE, BIENVENIDO JR. MIRANDO",ASENSO,49795,14-May-2019,6th District,NATIONAL CAPITAL REGION,Manila
1,"Abaya, Francis Gerald Aguinaldo",Cavite,1st District,,abaya-fg,"ABAYA, FRANCIS GERALD AGUINALDO",LP,117031,16-May-2019,1st District,REGION IV-A,Cavite
2,"Abellanosa, Rodrigo A.",Cebu City,2nd District,,abellanosa-r,"ABELLANOSA, RODRIGO ABELLANA",LDP,163752,15-May-2019,2nd District,REGION VII,Cebu City
3,"Abu, Raneo ""Ranie"" E.",Batangas,2nd District,,abu-r,"ABU, RANEO ENRIQUEZ",NP,117205,16-May-2019,2nd District,REGION IV-A,Batangas
4,"Abueg-Zaldivar, Cyrille ""Beng"" F.",Palawan,2nd District,,abueg-zaldivar,"ABUEG, CYRILLE FABELLO",PPP,112960,17-May-2019,2nd District,REGION IV-B,Palawan
...,...,...,...,...,...,...,...,...,...,...,...,...
303,"Yu, Divina Grace C.",Zamboanga del Sur,1st District,,yu-d,"YU, DIVINA GRACE CABARDO",PDPLBN,164990,18-May-2019,1st District,REGION IX,Zamboanga del Sur
304,"Zamora, Manuel ""Way Kurat"" E.",Davao de Oro,1st District,,zamora-m,"ZAMORA, MANUEL ESQUIVEL",HNP,78257,17-May-2019,1st District,REGION XI,Davao de Oro
305,"Zamora, Ronaldo B.",San Juan City,Lone District,,zamora-r,"ZAMORA, RONALDO BAYAN",PDPLBN,35386,13-May-2019,Lone District,NATIONAL CAPITAL REGION,San Juan City
306,"Zarate, Carlos Isagani T.",,,BAYAN MUNA,zarate-c,,,,,,,


Decompose the list of authors associated with each bill as a one-to-many relationship.

In [18]:
bill_to_author = pd.DataFrame({'ID':np.repeat(bills_df.ID.values, bills_df.indiv_cong.str.len()),
                        'author_id':np.concatenate(bills_df.indiv_cong.values)})
bill_to_author

Unnamed: 0,ID,author_id
0,HB00001,284
1,HB00001,72
2,HB00001,180
3,HB00002,104
4,HB00002,73
...,...,...
40807,HB10832,72
40808,HB10832,3
40809,HB10832,73
40810,HB10832,294


Inner join the `reps_df` with `bills_df`, associating the authors' details with each bill as well.

In [19]:
reps_df['author_id'] = range(len(reps_df))
reps_df = reps_df.drop('author_id', axis=1)
bill_to_author_df = bill_to_author.join(reps_df, on='author_id', how='inner')
bill_to_author_df

Unnamed: 0,ID,author_id,rep_name,rep_region,rep_subdistrict,partylist,link,name,party,votes,conf_date,district,region,subdistrict
0,HB00001,284,"Ungab, Isidro T.",Davao City,3rd District,,ungab-i,"UNGAB, ISIDRO TOM",HNP,160732,14-May-2019,3rd District,REGION XI,Davao City
948,HB00497,284,"Ungab, Isidro T.",Davao City,3rd District,,ungab-i,"UNGAB, ISIDRO TOM",HNP,160732,14-May-2019,3rd District,REGION XI,Davao City
1429,HB00861,284,"Ungab, Isidro T.",Davao City,3rd District,,ungab-i,"UNGAB, ISIDRO TOM",HNP,160732,14-May-2019,3rd District,REGION XI,Davao City
1687,HB01061,284,"Ungab, Isidro T.",Davao City,3rd District,,ungab-i,"UNGAB, ISIDRO TOM",HNP,160732,14-May-2019,3rd District,REGION XI,Davao City
1709,HB01079,284,"Ungab, Isidro T.",Davao City,3rd District,,ungab-i,"UNGAB, ISIDRO TOM",HNP,160732,14-May-2019,3rd District,REGION XI,Davao City
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18930,HB08231,67,"Cardema, Ducielle Marie S.",,,DUTERTE YOUTH,cardema-d,,,,,,,
26745,HB09411,67,"Cardema, Ducielle Marie S.",,,DUTERTE YOUTH,cardema-d,,,,,,,
33805,HB10153,67,"Cardema, Ducielle Marie S.",,,DUTERTE YOUTH,cardema-d,,,,,,,
40805,HB10831,67,"Cardema, Ducielle Marie S.",,,DUTERTE YOUTH,cardema-d,,,,,,,


In [20]:
bill_to_author_df.columns

Index(['ID', 'author_id', 'rep_name', 'rep_region', 'rep_subdistrict',
       'partylist', 'link', 'name', 'party', 'votes', 'conf_date', 'district',
       'region', 'subdistrict'],
      dtype='object')

Inspect the unique values of each field.

In [21]:
bill_to_author_df.region.unique()

array(['REGION XI', 'NATIONAL CAPITAL REGION', 'REGION IV-B',
       'REGION VII', 'REGION V', 'REGION VI', 'REGION II', nan,
       'REGION III', 'REGION I', 'CORDILLERA ADMINISTRATIVE REGION',
       'REGION XIII', 'REGION IV-A', 'REGION IX', 'REGION X',
       'REGION VIII', 'REGION XII', 'BARMM'], dtype=object)

In [22]:
un = bill_to_author_df.partylist.unique()
un, len(un)

(array([nan, 'MAGDALO', 'AKO BICOL', 'ANG PROBINSYANO', 'AAMBIS-OWA',
        'GMA', 'MANILA TEACHERS', 'AKO PADAYON PILIPINO', 'ANAKALUSUGAN',
        'ATROL', '1-PACMAN', 'BH', 'KABAYAN', 'ACT-CIS', 'BA', 'AMIN',
        'DIWA', 'AN WARAY', 'BAHAY', 'TINGOG SINIRANGAN', 'ALONA', 'SAGIP',
        'AKO BISAYA', 'ACT-TEACHERS', 'ABANG LINGKOD', 'KABATAAN',
        'BAYAN MUNA', 'GABRIELA', 'HILRECA', 'RECOBODA', 'APEC',
        'MAGSASAKA', 'CIBAC', 'ROBINSYANO AKO', 'BHW', 'KALINGA', 'MARINO',
        'TUCP', 'ABONO', 'A TEACHER', 'BUHAY', 'KUSUG TAUSUG', 'GP', 'RAM',
        'CWS', 'COOP-NATCCO', 'DUMPER PTDA', 'TGP', 'AGAP', 'OFW FAMILY',
        'SENIOR CITIZENS', 'DUTERTE YOUTH'], dtype=object),
 52)

In [23]:
un = bill_to_author_df.party.unique()
(un, len(un))

(array(['HNP', 'NP', 'HTL', 'LP', 'NPC', 'PDPLBN', nan, 'PDP–Laban',
        'BILEG', 'LAKAS', 'NUP', 'NVTEÑO', 'PFP', 'BPP', 'CDP', 'KABAKA',
        'PPP', 'LDP', 'AKSYON', 'PRP', 'PMP', 'ASENSO', 'IND', 'APP'],
       dtype=object),
 24)

Create a column to indicate whether the representative associated with a bill is a Partylist representative.

In [24]:
bill_to_author_df['is_partylist'] = ~bill_to_author_df['partylist'].isna()
bill_to_author_df = bill_to_author_df.drop(['link', 'name', 'conf_date'], axis=1)
bill_to_author_df

Unnamed: 0,ID,author_id,rep_name,rep_region,rep_subdistrict,partylist,party,votes,district,region,subdistrict,is_partylist
0,HB00001,284,"Ungab, Isidro T.",Davao City,3rd District,,HNP,160732,3rd District,REGION XI,Davao City,False
948,HB00497,284,"Ungab, Isidro T.",Davao City,3rd District,,HNP,160732,3rd District,REGION XI,Davao City,False
1429,HB00861,284,"Ungab, Isidro T.",Davao City,3rd District,,HNP,160732,3rd District,REGION XI,Davao City,False
1687,HB01061,284,"Ungab, Isidro T.",Davao City,3rd District,,HNP,160732,3rd District,REGION XI,Davao City,False
1709,HB01079,284,"Ungab, Isidro T.",Davao City,3rd District,,HNP,160732,3rd District,REGION XI,Davao City,False
...,...,...,...,...,...,...,...,...,...,...,...,...
18930,HB08231,67,"Cardema, Ducielle Marie S.",,,DUTERTE YOUTH,,,,,,True
26745,HB09411,67,"Cardema, Ducielle Marie S.",,,DUTERTE YOUTH,,,,,,True
33805,HB10153,67,"Cardema, Ducielle Marie S.",,,DUTERTE YOUTH,,,,,,True
40805,HB10831,67,"Cardema, Ducielle Marie S.",,,DUTERTE YOUTH,,,,,,True


Drop unneeded columns and rename for clarity.

In [25]:
bill_to_author_df = bill_to_author_df.drop(['rep_region', 'rep_subdistrict'], axis=1)
bill_to_author_df = bill_to_author_df.rename({'subdistrict': 'province'}, axis=1)

In [26]:
bill_to_author_df['party'] = bill_to_author_df['party'].fillna(bill_to_author_df['partylist'])
bill_to_author_df = bill_to_author_df.drop(['partylist', 'votes'], axis=1)
bill_to_author_df

Unnamed: 0,ID,author_id,rep_name,party,district,region,province,is_partylist
0,HB00001,284,"Ungab, Isidro T.",HNP,3rd District,REGION XI,Davao City,False
948,HB00497,284,"Ungab, Isidro T.",HNP,3rd District,REGION XI,Davao City,False
1429,HB00861,284,"Ungab, Isidro T.",HNP,3rd District,REGION XI,Davao City,False
1687,HB01061,284,"Ungab, Isidro T.",HNP,3rd District,REGION XI,Davao City,False
1709,HB01079,284,"Ungab, Isidro T.",HNP,3rd District,REGION XI,Davao City,False
...,...,...,...,...,...,...,...,...
18930,HB08231,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,,True
26745,HB09411,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,,True
33805,HB10153,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,,True
40805,HB10831,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,,True


In [27]:
bill_to_author_df = bill_to_author_df.drop(['district'], axis=1)
bill_to_author_df

Unnamed: 0,ID,author_id,rep_name,party,region,province,is_partylist
0,HB00001,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
948,HB00497,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
1429,HB00861,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
1687,HB01061,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
1709,HB01079,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
...,...,...,...,...,...,...,...
18930,HB08231,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True
26745,HB09411,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True
33805,HB10153,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True
40805,HB10831,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True


Create dataset for exploratory data analysis.

In [28]:
bill_to_author_df_eda = bills_df.set_index('ID').join(bill_to_author_df.set_index('ID'), on='ID', how='right')
bill_to_author_df_eda.to_csv('congress_bill_association_eda.csv', encoding='utf-8', index=False)
bill_to_author_df_eda

Unnamed: 0_level_0,Significance,Date Filed,Full Title,Principal Author/s,Date Read,Primary Referral,Bill Status,Mother Bill Status,Date Approved on Second Reading,Date Approved on Third Reading,...,Republic Act No.,Author Count,indiv_cong,identified_tot,author_id,rep_name,party,region,province,is_partylist
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
HB00001,National,2019-07-01,"AN ACT INSTITUTIONALIZING A NATIONAL VALUES, E...","UNGAB, ISIDRO T.;CAYETANO, ALAN PETER ""COMPAÑE...",2019-07-23,BASIC EDUCATION AND CULTURE,Substituted by HB05829,Republic Act RA11476 enacted on 2020-06-25,,,...,,3,"[284, 72, 180]",3,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
HB00497,Local,2019-07-01,AN ACT PROVIDING FOR THE CONSTRUCTION OF A CIR...,"DEL MAR, RAUL V.;UNGAB, ISIDRO T.;MADRONA. ELE...",2019-07-23,PUBLIC WORKS AND HIGHWAYS,"Approved by the House on 2020-02-26, transmitt...",,2020-02-18,2020-02-26,...,,3,"[96, 284, 180]",3,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
HB00861,Local,2019-07-01,AN ACT CREATING A REGULAR LTO DISTRICT OFFICE ...,"HERNANDEZ, FERDINAND L.;UNGAB, ISIDRO T.;SARMI...",2019-07-23,TRANSPORTATION,Transmitted to the President on 2022-03-11,,2019-11-06,2019-11-12,...,,3,"[154, 284, 249]",3,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
HB01061,Local,2019-07-02,AN ACT CONVERTING THE STA. IGNACIA – GUIMBA PR...,"YAP, VICTOR A.;UNGAB, ISIDRO T.;MADRONA. ELEAN...",2019-07-24,PUBLIC WORKS AND HIGHWAYS,Republic Act RA11483 enacted on 2020-08-14,,2019-12-18,2020-01-27,...,RA11483,3,"[302, 284, 180]",3,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
HB01079,Local,2019-07-02,AN ACT MANDATING THE CONSTRUCTION OF A COASTAL...,"UNGAB, ISIDRO T.;SY-ALVARADO, ""KUYA"" JOSE ANTO...",2019-07-24,PUBLIC WORKS AND HIGHWAYS,"Approved by the House on 2020-02-26, transmitt...",,2020-02-18,2020-02-26,...,,3,"[284, 262, 180]",3,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
HB08231,National,2020-12-14,AN ACT TO OUTLAW THE COMMUNIST PARTY OF THE PH...,"CARDEMA, DUCIELLE",2020-12-16,NATIONAL DEFENSE AND SECURITY,Pending with the Committee on NATIONAL DEFENSE...,,,,...,,1,[67],1,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True
HB09411,National,2021-05-24,AN ACT\nPROVIDING FOR ADDITIONAL MECHANISMS TO...,"FUENTEBELLA, ARNIE B.;SAKALURAN, PRINCESS RIHA...",2021-05-24,ECONOMIC AFFAIRS,"Approved by the House on 2021-06-01, transmitt...",,2021-05-25,2021-06-01,...,,288,"[130, 147, 155, 270, 185, 134, 41, 78, 55, 165...",288,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True
HB10153,National,2021-09-07,AN ACT APPROPRIATING FUNDS FOR THE OPERATION O...,"FRASCO, VINCENT FRANCO ""DUKE"" D.;VILLANUEVA, N...",2021-09-07,APPROPRIATIONS,Republic Act RA11639 enacted on 2021-12-30,,2021-09-30,2021-09-30,...,,120,"[296, 194, 130, 43, 147, 152, 240, 270, 36, 77...",120,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True
HB10831,,2022-03-29,AN ACT POSTPONING THE DECEMBER 2022 BARANGAY A...,"CARDEMA, DUCIELLE",,,Pending First Reading (Filed last 2022-03-29),,,,...,,1,[67],1,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True


One-hot encode political parties for each candidate.

In [29]:
cols = pd.get_dummies(bill_to_author_df['party'].str.lower().replace('-','_').replace(' ', '_'), prefix='party')
cols

Unnamed: 0,party_1-pacman,party_a teacher,party_aambis-owa,party_abang lingkod,party_abono,party_act-cis,party_act-teachers,party_agap,party_ako bicol,party_ako bisaya,...,party_ppp,party_prp,party_ram,party_recoboda,party_robinsyano ako,party_sagip,party_senior citizens,party_tgp,party_tingog sinirangan,party_tucp
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
948,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1429,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1687,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1709,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18930,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26745,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
33805,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
40805,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Add the one-hot encoded party columns to the original dataset.

In [30]:
bill_to_author_df = bill_to_author_df.join(cols, how='inner')
bill_to_author_df

Unnamed: 0,ID,author_id,rep_name,party,region,province,is_partylist,party_1-pacman,party_a teacher,party_aambis-owa,...,party_ppp,party_prp,party_ram,party_recoboda,party_robinsyano ako,party_sagip,party_senior citizens,party_tgp,party_tingog sinirangan,party_tucp
0,HB00001,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
948,HB00497,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1429,HB00861,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1687,HB01061,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1709,HB01079,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18930,HB08231,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26745,HB09411,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0
33805,HB10153,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0
40805,HB10831,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Clean data by clarifying that "Taguig-Pateros" is the same as "Taguig" in terms of provincial districts.

In [31]:
bill_to_author_df.loc[bill_to_author_df['province']=='Taguig', 'province'] = 'Taguig-Pateros'
print(bill_to_author_df.province.unique(), len(bill_to_author_df.province.unique()))

['Davao City' 'Taguig-Pateros' 'Romblon' 'Cebu City' 'Albay' 'Sorsogon'
 'Muntinlupa City' 'Negros Occidental' 'Isabela' nan 'Tarlac'
 'Camarines Sur' 'Valenzuela City' 'Manila' 'Negros Oriental' 'Masbate'
 'Quirino' 'Ilocos Sur' 'Marikina City' 'Las Piñas City' 'Baguio City'
 'Surigao del Sur' 'Cebu' 'Capiz' 'Batangas' 'Zamboanga del Sur'
 'Quezon City' 'Camiguin' 'Iloilo' 'Parañaque City' 'Nueva Ecija'
 'Iloilo City' 'Makati City' 'Cavite' 'Pasay City' 'Camarines Norte'
 'Bohol' 'Navotas City' 'San Jose Del Monte City' 'Palawan' 'Leyte'
 'Misamis Oriental' 'Siquijor' 'Bulacan' 'Sultan Kudarat' 'Bataan' 'Rizal'
 'Sarangani' 'Pangasinan' 'Surigao del Norte' 'Northern Samar' 'Samar'
 'Bukidnon' 'Agusan del Norte' 'Pampanga' 'Laguna' 'Zamboanga City'
 'Cagayan de Oro City' 'Oriental Mindoro' 'Lapu-Lapu City' 'Malabon City'
 'Zamboanga Sibugay' 'Ifugao' 'Agusan del Sur' 'Ilocos Norte' 'Quezon'
 'Antipolo City' 'Occidental Mindoro' 'Zambales' 'Mountain Province'
 'Maguindanao' 'Benguet' 'L

One-hot encode region and province data for each candidate. Add the one-hot encoded region and province columns to the original dataset.

In [32]:
cols = pd.get_dummies(bill_to_author_df['province'].str.lower().str.replace(' ', '_'), prefix='province')
cols

Unnamed: 0,province_abra,province_agusan_del_norte,province_agusan_del_sur,province_aklan,province_albay,province_antipolo_city,province_antique,province_apayao,province_aurora,province_bacolod_city,...,province_surigao_del_sur,province_taguig-pateros,province_tarlac,province_tawi-tawi,province_valenzuela_city,province_zambales,province_zamboanga_city,province_zamboanga_del_norte,province_zamboanga_del_sur,province_zamboanga_sibugay
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
948,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1429,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1687,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1709,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18930,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26745,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
33805,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
40805,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [33]:
bill_to_author_df = bill_to_author_df.join(cols, how='inner')
bill_to_author_df

Unnamed: 0,ID,author_id,rep_name,party,region,province,is_partylist,party_1-pacman,party_a teacher,party_aambis-owa,...,province_surigao_del_sur,province_taguig-pateros,province_tarlac,province_tawi-tawi,province_valenzuela_city,province_zambales,province_zamboanga_city,province_zamboanga_del_norte,province_zamboanga_del_sur,province_zamboanga_sibugay
0,HB00001,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
948,HB00497,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1429,HB00861,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1687,HB01061,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1709,HB01079,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18930,HB08231,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26745,HB09411,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0
33805,HB10153,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0
40805,HB10831,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
bill_to_author_df['region'].str.replace(' ', '_').str.lower().value_counts()

national_capital_region             5120
region_iii                          3187
region_v                            2926
region_iv-a                         2803
region_x                            2488
region_i                            2188
region_vii                          2158
region_vi                           1874
region_ii                           1859
region_ix                           1307
region_viii                         1196
region_xi                           1159
region_xiii                         1036
cordillera_administrative_region     961
region_xii                           946
region_iv-b                          846
barmm                                318
Name: region, dtype: int64

In [35]:
print(bill_to_author_df.region.unique(), len(bill_to_author_df.region.unique()))
cols = pd.get_dummies(bill_to_author_df['region'].str.replace(' ', '_').str.lower())
cols = cols.rename({'cordillera_administrative_region': 'region_car', 'national_capital_region': 'region_ncr'})
cols

['REGION XI' 'NATIONAL CAPITAL REGION' 'REGION IV-B' 'REGION VII'
 'REGION V' 'REGION VI' 'REGION II' nan 'REGION III' 'REGION I'
 'CORDILLERA ADMINISTRATIVE REGION' 'REGION XIII' 'REGION IV-A'
 'REGION IX' 'REGION X' 'REGION VIII' 'REGION XII' 'BARMM'] 18


Unnamed: 0,barmm,cordillera_administrative_region,national_capital_region,region_i,region_ii,region_iii,region_iv-a,region_iv-b,region_ix,region_v,region_vi,region_vii,region_viii,region_x,region_xi,region_xii,region_xiii
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
948,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1429,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1687,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1709,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18930,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
26745,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
33805,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
40805,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [36]:
bill_to_author_df = bill_to_author_df.join(cols, how='inner')
bill_to_author_df

Unnamed: 0,ID,author_id,rep_name,party,region,province,is_partylist,party_1-pacman,party_a teacher,party_aambis-owa,...,region_iv-b,region_ix,region_v,region_vi,region_vii,region_viii,region_x,region_xi,region_xii,region_xiii
0,HB00001,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,1,0,0
948,HB00497,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1429,HB00861,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1687,HB01061,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1709,HB01079,284,"Ungab, Isidro T.",HNP,REGION XI,Davao City,False,0,0,0,...,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18930,HB08231,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0
26745,HB09411,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0
33805,HB10153,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0
40805,HB10831,67,"Cardema, Ducielle Marie S.",DUTERTE YOUTH,,,True,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Remove hot-encoded categories and authors field.

In [37]:
bill_to_author_df = bill_to_author_df.drop(['party', 'region', 'province', 'rep_name', 'author_id'], axis=1)

In [38]:
bill_to_author_df_consol = bill_to_author_df.groupby('ID').sum()
bill_to_author_df_consol

Unnamed: 0_level_0,is_partylist,party_1-pacman,party_a teacher,party_aambis-owa,party_abang lingkod,party_abono,party_act-cis,party_act-teachers,party_agap,party_ako bicol,...,region_iv-b,region_ix,region_v,region_vi,region_vii,region_viii,region_x,region_xi,region_xii,region_xiii
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
HB00001,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,1,0,0
HB00002,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
HB00003,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
HB00004,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
HB00005,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
HB10829,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
HB10830,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
HB10831,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
HB10832,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0


In [39]:
bills_df

Unnamed: 0,ID,Significance,Date Filed,Full Title,Principal Author/s,Date Read,Primary Referral,Bill Status,Mother Bill Status,Date Approved on Second Reading,Date Approved on Third Reading,Senate Bill Counterpart,Date Acted Upon By the President,Republic Act No.,Author Count,indiv_cong,identified_tot
0,HB00001,National,2019-07-01,"AN ACT INSTITUTIONALIZING A NATIONAL VALUES, E...","UNGAB, ISIDRO T.;CAYETANO, ALAN PETER ""COMPAÑE...",2019-07-23,BASIC EDUCATION AND CULTURE,Substituted by HB05829,Republic Act RA11476 enacted on 2020-06-25,,,,,,3,"[284, 72, 180]",3
1,HB00002,National,2019-07-01,AN ACT CREATING THE DEPARTMENT OF OVERSEAS FIL...,"DUTERTE, PAOLO Z.;CAYETANO, MARIA LAARNI;CAYET...",2019-07-23,GOVERNMENT REORGANIZATION,Substituted by HB05832,Republic Act RA11641 enacted on 2021-12-30,,,,,,3,"[104, 73, 72]",3
2,HB00003,National,2019-07-01,AN ACT PROVIDING FOR A NATIONAL PROGRAM TO SUP...,"CAYETANO, ALAN PETER ""COMPAÑERO"" S.",2019-07-23,WELFARE OF CHILDREN,Transmitted to the Committee on Appropriations...,,,,,,,1,[72],1
3,HB00004,National,2019-07-01,AN ACT CREATING THE EMERGENCY RESPONSE DEPARTM...,"DUTERTE, PAOLO Z.;CAYETANO, ALAN PETER ""COMPAÑ...",2019-07-23,GOVERNMENT REORGANIZATION,Substituted by HB05989,"Approved by the House on 2020-09-22, transmitt...",,,,,,2,"[104, 72]",2
4,HB00005,National,2019-07-01,AN ACT INSTITUTIONALIZING MICROFINANCE PROGRAM...,"CAYETANO, ALAN PETER ""COMPAÑERO"" S.",2019-07-23,"MICRO, SMALL AND MEDIUM ENTERPRISE DEVELOPMENT","Pending with the Committee on MICRO, SMALL AND...",,,,,,,1,[72],1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10817,HB10829,,2022-03-23,"AN ACT CREATING THE ""COUNCIL FOR THE BUILT ENV...","RODRIGUEZ, RUFUS B.",,,Pending First Reading (Filed last 2022-03-23),,,,,,,1,[231],1
10818,HB10830,,2022-03-23,AN ACT STRENGTHENING THE ARCHITECTURE PROFESSI...,"RODRIGUEZ, RUFUS B.",,,Pending First Reading (Filed last 2022-03-23),,,,,,,1,[231],1
10819,HB10831,,2022-03-29,AN ACT POSTPONING THE DECEMBER 2022 BARANGAY A...,"CARDEMA, DUCIELLE",,,Pending First Reading (Filed last 2022-03-29),,,,,,,1,[67],1
10820,HB10832,,2022-03-31,"AN ACT MANDATING ALL EXECUTIVE DEPARTMENTS, BU...","FERNANDEZ, DAN S.;CAYETANO, ALAN PETER ""COMPAÑ...",,,Pending First Reading (Filed last 2022-03-31),,,,,,,5,"[121, 72, 3, 73, 294]",5


In [40]:
reps_df.iloc[84]

rep_name                   Cuaresma, Luisa Lloren
rep_region                          Nueva Vizcaya
rep_subdistrict                     Lone District
partylist                                     NaN
link                                   cuaresma-l
name               CUARESMA, LUISA LLOREN QUIRINO
party                                         NUP
votes                                     114,338
conf_date                             14-May-2019
district                            Lone District
region                                  REGION II
subdistrict                         Nueva Vizcaya
Name: 84, dtype: object

Encode each author's participation as an adjacency list.

In [41]:
def adj_matrix(li, size):
    adj = [0 for _ in range(size)]
    for el in li:
        try:
            adj[el] = 1
        except:
            print(el)
    return adj
rep_count = len(rep_names.keys())
# skip individual congress members encoding for now
# bills_df[['cong_'+str(x) for x in range(rep_count)]] = bills_df['indiv_cong'].map(lambda x: adj_matrix(x, rep_count)).apply(pd.Series)
bills_df


Unnamed: 0,ID,Significance,Date Filed,Full Title,Principal Author/s,Date Read,Primary Referral,Bill Status,Mother Bill Status,Date Approved on Second Reading,Date Approved on Third Reading,Senate Bill Counterpart,Date Acted Upon By the President,Republic Act No.,Author Count,indiv_cong,identified_tot
0,HB00001,National,2019-07-01,"AN ACT INSTITUTIONALIZING A NATIONAL VALUES, E...","UNGAB, ISIDRO T.;CAYETANO, ALAN PETER ""COMPAÑE...",2019-07-23,BASIC EDUCATION AND CULTURE,Substituted by HB05829,Republic Act RA11476 enacted on 2020-06-25,,,,,,3,"[284, 72, 180]",3
1,HB00002,National,2019-07-01,AN ACT CREATING THE DEPARTMENT OF OVERSEAS FIL...,"DUTERTE, PAOLO Z.;CAYETANO, MARIA LAARNI;CAYET...",2019-07-23,GOVERNMENT REORGANIZATION,Substituted by HB05832,Republic Act RA11641 enacted on 2021-12-30,,,,,,3,"[104, 73, 72]",3
2,HB00003,National,2019-07-01,AN ACT PROVIDING FOR A NATIONAL PROGRAM TO SUP...,"CAYETANO, ALAN PETER ""COMPAÑERO"" S.",2019-07-23,WELFARE OF CHILDREN,Transmitted to the Committee on Appropriations...,,,,,,,1,[72],1
3,HB00004,National,2019-07-01,AN ACT CREATING THE EMERGENCY RESPONSE DEPARTM...,"DUTERTE, PAOLO Z.;CAYETANO, ALAN PETER ""COMPAÑ...",2019-07-23,GOVERNMENT REORGANIZATION,Substituted by HB05989,"Approved by the House on 2020-09-22, transmitt...",,,,,,2,"[104, 72]",2
4,HB00005,National,2019-07-01,AN ACT INSTITUTIONALIZING MICROFINANCE PROGRAM...,"CAYETANO, ALAN PETER ""COMPAÑERO"" S.",2019-07-23,"MICRO, SMALL AND MEDIUM ENTERPRISE DEVELOPMENT","Pending with the Committee on MICRO, SMALL AND...",,,,,,,1,[72],1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10817,HB10829,,2022-03-23,"AN ACT CREATING THE ""COUNCIL FOR THE BUILT ENV...","RODRIGUEZ, RUFUS B.",,,Pending First Reading (Filed last 2022-03-23),,,,,,,1,[231],1
10818,HB10830,,2022-03-23,AN ACT STRENGTHENING THE ARCHITECTURE PROFESSI...,"RODRIGUEZ, RUFUS B.",,,Pending First Reading (Filed last 2022-03-23),,,,,,,1,[231],1
10819,HB10831,,2022-03-29,AN ACT POSTPONING THE DECEMBER 2022 BARANGAY A...,"CARDEMA, DUCIELLE",,,Pending First Reading (Filed last 2022-03-29),,,,,,,1,[67],1
10820,HB10832,,2022-03-31,"AN ACT MANDATING ALL EXECUTIVE DEPARTMENTS, BU...","FERNANDEZ, DAN S.;CAYETANO, ALAN PETER ""COMPAÑ...",,,Pending First Reading (Filed last 2022-03-31),,,,,,,5,"[121, 72, 3, 73, 294]",5


In [42]:
li = {'source': [], 'target': []}
for src, dest_list in zip(bills_df['ID'].to_list(), bills_df['indiv_cong'].to_list()):
    for dest in dest_list:
        li['source'].append(src)
        li['target'].append(num_to_rep[dest])
graph_viz = pd.DataFrame(li)
graph_viz.to_csv('congress_bill_association_viz.csv', encoding='utf-8', index=False)

Drop redundant columns after adjacency matrix encoding. Remove cross-validation field `identified_tot`.

In [43]:
bills_df = bills_df.drop(['Principal Author/s', 'indiv_cong'], axis=1)

In [44]:
bills_df = bills_df.drop('identified_tot', axis=1)

In [45]:
bills_df.columns[:14]

Index(['ID', 'Significance', 'Date Filed', 'Full Title', 'Date Read',
       'Primary Referral', 'Bill Status', 'Mother Bill Status',
       'Date Approved on Second Reading', 'Date Approved on Third Reading',
       'Senate Bill Counterpart', 'Date Acted Upon By the President',
       'Republic Act No.', 'Author Count'],
      dtype='object')

In [46]:
df = bills_df.set_index('ID').join(bill_to_author_df_consol, on='ID')
df

Unnamed: 0_level_0,Significance,Date Filed,Full Title,Date Read,Primary Referral,Bill Status,Mother Bill Status,Date Approved on Second Reading,Date Approved on Third Reading,Senate Bill Counterpart,...,region_iv-b,region_ix,region_v,region_vi,region_vii,region_viii,region_x,region_xi,region_xii,region_xiii
ID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
HB00001,National,2019-07-01,"AN ACT INSTITUTIONALIZING A NATIONAL VALUES, E...",2019-07-23,BASIC EDUCATION AND CULTURE,Substituted by HB05829,Republic Act RA11476 enacted on 2020-06-25,,,,...,1,0,0,0,0,0,0,1,0,0
HB00002,National,2019-07-01,AN ACT CREATING THE DEPARTMENT OF OVERSEAS FIL...,2019-07-23,GOVERNMENT REORGANIZATION,Substituted by HB05832,Republic Act RA11641 enacted on 2021-12-30,,,,...,0,0,0,0,0,0,0,1,0,0
HB00003,National,2019-07-01,AN ACT PROVIDING FOR A NATIONAL PROGRAM TO SUP...,2019-07-23,WELFARE OF CHILDREN,Transmitted to the Committee on Appropriations...,,,,,...,0,0,0,0,0,0,0,0,0,0
HB00004,National,2019-07-01,AN ACT CREATING THE EMERGENCY RESPONSE DEPARTM...,2019-07-23,GOVERNMENT REORGANIZATION,Substituted by HB05989,"Approved by the House on 2020-09-22, transmitt...",,,,...,0,0,0,0,0,0,0,1,0,0
HB00005,National,2019-07-01,AN ACT INSTITUTIONALIZING MICROFINANCE PROGRAM...,2019-07-23,"MICRO, SMALL AND MEDIUM ENTERPRISE DEVELOPMENT","Pending with the Committee on MICRO, SMALL AND...",,,,,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
HB10829,,2022-03-23,"AN ACT CREATING THE ""COUNCIL FOR THE BUILT ENV...",,,Pending First Reading (Filed last 2022-03-23),,,,,...,0,0,0,0,0,0,1,0,0,0
HB10830,,2022-03-23,AN ACT STRENGTHENING THE ARCHITECTURE PROFESSI...,,,Pending First Reading (Filed last 2022-03-23),,,,,...,0,0,0,0,0,0,1,0,0,0
HB10831,,2022-03-29,AN ACT POSTPONING THE DECEMBER 2022 BARANGAY A...,,,Pending First Reading (Filed last 2022-03-29),,,,,...,0,0,0,0,0,0,0,0,0,0
HB10832,,2022-03-31,"AN ACT MANDATING ALL EXECUTIVE DEPARTMENTS, BU...",,,Pending First Reading (Filed last 2022-03-31),,,,,...,0,0,1,0,0,0,0,0,0,0


In [47]:
df.to_csv('18th_hor_bills_dataset.csv', encoding='utf-8')

In [48]:
[col for col in df.columns if any(df[col].isna())]

['Significance',
 'Full Title',
 'Date Read',
 'Primary Referral',
 'Bill Status',
 'Mother Bill Status',
 'Date Approved on Second Reading',
 'Date Approved on Third Reading',
 'Senate Bill Counterpart',
 'Date Acted Upon By the President',
 'Republic Act No.']