# SI 608 Project – Workspace
<span style="font-size: 18px;">General scratchpad workspace that preloads all the dataframes.</span>
<br>See <code>./modules</code> to review how libraries are installed and imported, and where the data is loaded, cleaned, and formatted. This workspace is only a convenience: make a copy and do whatever you'd like, or skip it entirely if that's preferable.

[OpenSecrets Data Dictionary Index](../../docs/open_source_data_dictionary.md)
<br><small><em>(View the index with markdown preview)</em></small>

## Environment

#### Settings
Configure certain behaviors in this notebook.

In [4]:
DISPLAY_DF = True # for showdf() -> df.head()
SAVE_DF = True # for savedf() -> pd.to_csv()

#### Initialize
Init file contains helper functions used throughout the project.

In [6]:
%run modules/init.ipynb

Initializing project...
pandas is already installed.
matplotlib is already installed.
networkx is already installed.
numpy is already installed.
...initialization complete.


#### Datasets

This module provides a single function for all of the *contribution* data from OpenSecrets.

In [337]:
%run modules/data.ipynb

Loading data module...
...data module loaded.


---
## Data

### 527 data

#### cmtes527

In [342]:
# OpenSecrets Data Definition: 527 Committees
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20527%20Cmtes.htm
columns_cmtes527 = ['cycle', 'rpt', 'ein', 'crp527name', 'affiliate', 'ultorg', 
                    'recipcode', 'cmteid', 'cid', 'eccmteid', 'party', 
                    'primcode', 'source', 'ffreq', 'ctype', 'csource', 'viewpt',
                    'comments', 'state']

if not os.path.exists('../../data/open_secrets/527/cmtes527_clean.csv'):
    process_data('../../data/open_secrets/527/cmtes527.csv', n_expected_fields=len(columns_cmtes527), headers=columns_cmtes527, show_errs=False)

df_cmtes527 = pd.read_csv('../../data/open_secrets/527/cmtes527_clean.csv', on_bad_lines='skip')

Reading line 3029 of 15149...
Reading line 6058 of 15149...
Reading line 9087 of 15149...
Reading line 12116 of 15149...
Reading line 15145 of 15149...
File saved as ../../data/open_secrets/527/cmtes527_clean.csv
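
The body of `process_data` is defined in the project modules and is not shown in this workspace. As a rough illustration only, the sketch below assumes it keeps rows whose field count equals `n_expected_fields` and suffixes each header with the file stem, which would match names such as `cycle__cmtes527` in the previews. The name `clean_csv_sketch` and the demo file are hypothetical, and the real helper may differ (for example, the real clean files also carry an index column).

```python
import csv
import os
import tempfile


def clean_csv_sketch(src_path, headers, n_expected_fields):
    """Hypothetical stand-in for process_data(): drop rows with the wrong field count."""
    stem = os.path.splitext(os.path.basename(src_path))[0]
    dst_path = os.path.join(os.path.dirname(src_path), f"{stem}_clean.csv")
    kept = dropped = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        # Suffix each header with the file stem, e.g. 'cycle__demo'.
        writer.writerow([f"{h}__{stem}" for h in headers])
        for row in csv.reader(src):
            if len(row) == n_expected_fields:
                writer.writerow(row)
                kept += 1
            else:
                dropped += 1
    return dst_path, kept, dropped


# Tiny demonstration on a throwaway file: the second row is one field short.
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "demo.csv")
with open(demo_src, "w", newline="", encoding="utf-8") as f:
    f.write("2002,Q302,AZ\n2008,Q308\n2000,Q400,AL\n")

demo_dst, demo_kept, demo_dropped = clean_csv_sketch(
    demo_src, headers=["cycle", "rpt", "state"], n_expected_fields=3
)
print(demo_dst, demo_kept, demo_dropped)
```

Returning the kept and dropped counts makes it easy to see how much of a file was discarded, which `show_errs=False` otherwise hides.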


In [344]:
showdf(df_cmtes527)

Unnamed: 0,cycle__cmtes527,rpt__cmtes527,ein__cmtes527,crp527name__cmtes527,affiliate__cmtes527,ultorg__cmtes527,recipcode__cmtes527,cmteid__cmtes527,cid__cmtes527,eccmteid__cmtes527,party__cmtes527,primcode__cmtes527,source__cmtes527,ffreq__cmtes527,ctype__cmtes527,csource__cmtes527,viewpt__cmtes527,comments__cmtes527,state__cmtes527
0,2002,Q302,861006189,American Electronics Assn,,American Electronics Assn,PB,,,,,C5000,WebPN,Q,F,,N,,AZ
1,2008,Q308,262108560,California 2008 GOP Delegation Corporate,,California 2008 GOP Delegation,RP,,,,R,Z5100,Name,Q,F,,C,,CA
2,2000,Q400,912101097,Alabama League of Environmental Action,,Alabama League of Environmental Action,PI,,,,,JE300,Name,Q,S,Name,L,,AL


#### expends527

In [346]:
# OpenSecrets Data Dictionary 527 Expenditure Data - from IRS Form 8872B
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20527%20Expenditures.htm
columns_expends527 = ['rpt', 'formid', 'schbid', 'orgname', 'ein', 'recipient', 
                    'recipientcrp', 'amount', 'date', 'expcode', 'source', 
                    'purpose', 'addr1', 'addr2', 'city', 'state', 'zip',
                    'employer', 'occupation']

if not os.path.exists('../../data/open_secrets/527/expends527_clean.csv'):
    process_data('../../data/open_secrets/527/expends527.csv', nrows=500, headers=columns_expends527, n_expected_fields=len(columns_expends527), show_errs=False)

df_expends527 = pd.read_csv('../../data/open_secrets/527/expends527_clean.csv', nrows=10000, on_bad_lines='skip')

File saved as ../../data/open_secrets/527/expends527_clean.csv


In [348]:
showdf(df_expends527)

Unnamed: 0,rpt__expends527,formid__expends527,schbid__expends527,orgname__expends527,ein__expends527,recipient__expends527,recipientcrp__expends527,amount__expends527,date__expends527,expcode__expends527,source__expends527,purpose__expends527,addr1__expends527,addr2__expends527,city__expends527,state__expends527,zip__expends527,employer__expends527,occupation__expends527
0,Q210,9595787,2016057,Republican State Leadership Cmte,50532524,VERIZON,Verizon Communications,125,04/16/2010,A70,@new,TELEPHONE,PO BOX 660720,,DALLAS,TX,75266,,
1,Q210,9595787,2016059,Republican State Leadership Cmte,50532524,VERIZON WIRELESS,Verizon Wireless,141,04/09/2010,A70,@new,CELL PHONE,PO BOX 25505,,LEHIGH VALLEY,PA,18002,,
2,Q210,9595787,2016062,Republican State Leadership Cmte,50532524,SCOTT WARD,Scott Ward,37,04/27/2010,A40,@new,"EXP. REIMB., TRAVEL","4817 41ST STREET, NW",,WASHINGTON,DC,20016,REPUBLICAN STATE LEADERSHIP COMMITTEE,PRESIDENT


#### rcpts527

In [354]:
# OpenSecrets Data Dictionary 527 Contribution Data - from IRS Form 8872A
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20527%20Receipts.htm
columns_rcpts527 = ['id', 'rpt', 'formid', 'schaid', 'contribid', 'contrib', 
                    'amount', 'date', 'orgname', 'ultorg', 'realcode', 
                    'recipid', 'recipcode', 'party', 'recipient', 'city', 'state',
                    'zip', 'zip4', 'pmsa', 'employer', 'occupation', 'ytd', 'gender', 'source']

if not os.path.exists('../../data/open_secrets/527/rcpts527_clean.csv'):
    process_data('../../data/open_secrets/527/rcpts527.csv', nrows=10000, headers=columns_rcpts527, n_expected_fields=len(columns_rcpts527), show_errs=False)

df_rcpts527 = pd.read_csv('../../data/open_secrets/527/rcpts527_clean.csv', nrows=10000, on_bad_lines='skip')

File saved as ../../data/open_secrets/527/rcpts527_clean.csv


In [356]:
showdf(df_rcpts527)

Unnamed: 0,id__rcpts527,rpt__rcpts527,formid__rcpts527,schaid__rcpts527,contribid__rcpts527,contrib__rcpts527,amount__rcpts527,date__rcpts527,orgname__rcpts527,ultorg__rcpts527,realcode__rcpts527,recipid__rcpts527,recipcode__rcpts527,party__rcpts527,recipient__rcpts527,city__rcpts527,state__rcpts527,zip__rcpts527,zip4__rcpts527,pmsa__rcpts527,employer__rcpts527,occupation__rcpts527,ytd__rcpts527,gender__rcpts527,source__rcpts527
0,981,Q210,9595837,2017490,,WEST LA DEMOCRATIC CLUB,1,04/18/2010,West La Democratic Club,,Z9600,270160261,PI,,ActBlue Technical Services,BURBANK,CA,91502,,4480,,,473,,Rept
1,982,Q210,9595837,2017492,,WINOGRAD FOR CONGRESS 2010,259,04/18/2010,Winograd For Congress 2010,,Z9600,270160261,PI,,ActBlue Technical Services,BURBANK,CA,91502,,4480,,,1049,,Rept
2,983,Q210,9595837,2017387,,FDL ACTION PAC,4,04/18/2010,Fdl Action Pac,,Z9600,270160261,PI,,ActBlue Technical Services,WASHINGTON,DC,20016,,8840,,,1524,,Rept


---
### Campaign Finance 22 data
#### cands22

In [358]:
# OpenSecrets Data Definition: Candidates
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20Candidates%20Data.htm
columns_cands22 = ['cycle', 'feccandid', 'cid', 'firstlastp', 'party', 'distidrunfor', 
                    'distidcurr', 'currcand', 'cyclecand', 'crpico', 'recipcode', 
                    'nopacs']

if not os.path.exists('../../data/open_secrets/CampaignFin22/cands22_clean.csv'):
    process_data('../../data/open_secrets/CampaignFin22/cands22.csv', headers=columns_cands22, n_expected_fields=len(columns_cands22), show_errs=False)

df_cands22 = pd.read_csv('../../data/open_secrets/CampaignFin22/cands22_clean.csv', on_bad_lines='skip')

# Remove party labels from names: '3', 'R', 'D', 'I', 'L', 'U', 'i'
df_cands22['firstlast__cands22'] = df_cands22['firstlastp__cands22'].apply(
    lambda x: x.replace(" (3)", "").replace(" (R)", "").replace(" (D)", "").replace(" (I)", "").replace(" (L)", "").replace(" (U)", "").replace(" (i)", "") if isinstance(x, str) else x
)

Reading line 1785 of 8928...
Reading line 3570 of 8928...
Reading line 5355 of 8928...
Reading line 7140 of 8928...
Reading line 8925 of 8928...
File saved as ../../data/open_secrets/CampaignFin22/cands22_clean.csv
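
The chained `.replace()` calls that strip party labels can also be written as a single regular expression. This is a minimal sketch of that alternative, not the notebook's own code; `strip_party_label` is a hypothetical helper name, and the label set is the one listed in the cell's comment.

```python
import re

# Matches a space followed by one of the party labels in parentheses.
PARTY_SUFFIX = re.compile(r" \((?:3|R|D|I|L|U|i)\)")


def strip_party_label(name):
    """Remove labels such as ' (R)' from a name; leave non-strings untouched."""
    if not isinstance(name, str):
        return name
    return PARTY_SUFFIX.sub("", name)


names = ["Thomas Lamb (3)", "Jerry Carl (R)", "Wes Lambert (R)", None]
cleaned = [strip_party_label(n) for n in names]
print(cleaned)
```

The same function could be passed to `Series.apply` in place of the lambda.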


In [360]:
showdf(df_cands22)

Unnamed: 0,cycle__cands22,feccandid__cands22,cid__cands22,firstlastp__cands22,party__cands22,distidrunfor__cands22,distidcurr__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,nopacs__cands22,firstlast__cands22
0,2022,H0AK00105,N00039029,Thomas Lamb (3),3,AK01,,,,,3N,,Thomas Lamb
1,2022,H0AL01055,N00044245,Jerry Carl (R),R,AL01,AL01,Y,Y,I,RW,,Jerry Carl
2,2022,H0AL01063,N00044288,Wes Lambert (R),R,AL01,,,,,RN,,Wes Lambert


#### cmtes22
*All cmtes, lead cmtes, pac cmtes*

In [362]:
# OpenSecrets Table Definition: Committee table
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20for%20Cmtes.htm
columns_cmtes22 = ['cycle', 'cmteid', 'pacshort', 'affiliate', 'ultorg', 'recipid', 
                    'recipcode', 'feccandid', 'party', 'primcode', 'source', 'sensitive',
                    'foreign', 'active']

if not os.path.exists('../../data/open_secrets/CampaignFin22/cmtes22_clean.csv'):
    process_data('../../data/open_secrets/CampaignFin22/cmtes22.csv', headers=columns_cmtes22, n_expected_fields=len(columns_cmtes22), show_errs=False)

df_cmtes22 = pd.read_csv('../../data/open_secrets/CampaignFin22/cmtes22_clean.csv', on_bad_lines='skip')

Reading line 3957 of 19788...
Reading line 7914 of 19788...
Reading line 11871 of 19788...
Reading line 15828 of 19788...
Reading line 19785 of 19788...
File saved as ../../data/open_secrets/CampaignFin22/cmtes22_clean.csv


**All cmtes**

In [640]:
print(len(df_cmtes22))
showdf(df_cmtes22)

19788


Unnamed: 0,cycle__cmtes22,cmteid__cmtes22,pacshort__cmtes22,affiliate__cmtes22,ultorg__cmtes22,recipid__cmtes22,recipcode__cmtes22,feccandid__cmtes22,party__cmtes22,primcode__cmtes22,source__cmtes22,sensitive__cmtes22,foreign__cmtes22,active__cmtes22
0,2022,C00000059,Hallmark Cards,,Hallmark Cards,C00000059,PB,,,C1400,WAA21,Y,0,1
1,2022,C00000422,American Medical Assn,American Medical Assn,American Medical Assn,C00000422,PB,,,H1100,WAA21,n,0,1
2,2022,C00000489,Teamsters Local 886,Teamsters Union,Teamsters Union,C00000489,PL,,,LT300,WAA21,n,0,1


**Split lead and non-lead cmtes**

In [367]:
# Goal: isolate transactions from non-lead PACs to lead PACs.
# Look up each pacid__pacs22 value to see whether it is a lead PAC,
# then drop every pacid__pacs22 that represents a lead PAC.
df_recipid_cmtes22 = df_cmtes22[['cmteid__cmtes22', 'recipid__cmtes22']]

# Lead PAC committee ids (recipid starts with 'N'), used for filtering.
df_recipid_lead_cmtes22 = df_recipid_cmtes22[df_recipid_cmtes22['recipid__cmtes22'].str.startswith('N', na=False)]
df_recipid_lead_cmtes22 = df_recipid_lead_cmtes22[['cmteid__cmtes22']]

# Non-lead PAC committee ids (recipid starts with 'C'), used for filtering.
df_recipid_pac_cmtes22 = df_recipid_cmtes22[df_recipid_cmtes22['recipid__cmtes22'].str.startswith('C', na=False)]
df_recipid_pac_cmtes22 = df_recipid_pac_cmtes22[['cmteid__cmtes22']]

In [369]:
# Lead pac committees (filtered).
df_lead_cmtes22 = df_cmtes22[df_cmtes22['cmteid__cmtes22'].isin(df_recipid_lead_cmtes22['cmteid__cmtes22'])]
df_lead_cmtes22.columns = df_lead_cmtes22.columns.str.replace(r'(.*?)__(.*)', r'\1_lead__\2', regex=True)
print(len(df_lead_cmtes22))
showdf(df_lead_cmtes22)

7247


Unnamed: 0,cycle_lead__cmtes22,cmteid_lead__cmtes22,pacshort_lead__cmtes22,affiliate_lead__cmtes22,ultorg_lead__cmtes22,recipid_lead__cmtes22,recipcode_lead__cmtes22,feccandid_lead__cmtes22,party_lead__cmtes22,primcode_lead__cmtes22,source_lead__cmtes22,sensitive_lead__cmtes22,foreign_lead__cmtes22,active_lead__cmtes22
25,2022,C00002592,Committee to Reelect Tom Foley,,Committee to Reelect Tom Foley,N00007967,DN,H6WA05023,D,Z1200,Rept,N,0,1
130,2022,C00012229,Alaskans for Don Young,,Alaskans for Don Young,N00007999,RI,H6AK00045,R,Z1100,Rept,N,0,1
140,2022,C00013128,Congressman Waxman Campaign Cmte,,Congressman Waxman Campaign Cmte,N00001861,DN,H6CA24048,D,Z1200,Rept,N,0,1


In [371]:
# Non-lead pac committees (filtered).
df_pac_cmtes22 = df_cmtes22[df_cmtes22['cmteid__cmtes22'].isin(df_recipid_pac_cmtes22['cmteid__cmtes22'])]
df_pac_cmtes22.columns = df_pac_cmtes22.columns.str.replace(r'(.*?)__(.*)', r'\1_pac__\2', regex=True)
print(len(df_pac_cmtes22))
showdf(df_pac_cmtes22)

12541


Unnamed: 0,cycle_pac__cmtes22,cmteid_pac__cmtes22,pacshort_pac__cmtes22,affiliate_pac__cmtes22,ultorg_pac__cmtes22,recipid_pac__cmtes22,recipcode_pac__cmtes22,feccandid_pac__cmtes22,party_pac__cmtes22,primcode_pac__cmtes22,source_pac__cmtes22,sensitive_pac__cmtes22,foreign_pac__cmtes22,active_pac__cmtes22
0,2022,C00000059,Hallmark Cards,,Hallmark Cards,C00000059,PB,,,C1400,WAA21,Y,0,1
1,2022,C00000422,American Medical Assn,American Medical Assn,American Medical Assn,C00000422,PB,,,H1100,WAA21,n,0,1
2,2022,C00000489,Teamsters Local 886,Teamsters Union,Teamsters Union,C00000489,PL,,,LT300,WAA21,n,0,1
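
The printed counts add up: 7247 lead committees plus 12541 non-lead committees equals the 19788 rows in `df_cmtes22`, so in this run the two prefix filters cover every row. A small check like the one below can confirm that no row falls outside both groups after a data refresh. It is a sketch on a toy frame; the fourth row with a missing `recipid` is hypothetical and only there to show what an unclassified row looks like.

```python
import pandas as pd

toy = pd.DataFrame({
    "cmteid__cmtes22": ["C00000059", "C00002592", "C00012229", "C00999999"],
    "recipid__cmtes22": ["C00000059", "N00007967", "N00007999", None],
})

recipid = toy["recipid__cmtes22"]
is_lead = recipid.str.startswith("N", na=False)  # candidate-linked committees
is_pac = recipid.str.startswith("C", na=False)   # all other committees

# Rows matching neither prefix would be silently lost by the split.
unclassified = toy[~(is_lead | is_pac)]
print(int(is_lead.sum()), int(is_pac.sum()), len(unclassified))
```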


#### pac_other22 – pacs to pacs
*All pacs, pac-to-pac, pac-to-cand*

**All pacs**

In [381]:
# OpenSecrets Data Definition for PAC to PAC Data (Pac_other table)
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20PAC%20to%20PAC%20Data.htm
columns_pac_other22 = ['cycle', 'fecrecno', 'filerid', 'donorcmte', 'contriblendtrans', 'city', 'state', 
                            'zip', 'fecoccemp', 'primcode', 'date', 'amount', 'recipid', 'party', 'otherid',
                            'recipcode', 'recipprimcode', 'amend', 'report', 'pg', 'microfilm', 'type',
                            'realcode', 'source']

if not os.path.exists('../../data/open_secrets/CampaignFin22/pac_other22_clean.csv'):
    process_data('../../data/open_secrets/CampaignFin22/pac_other22.csv', headers=columns_pac_other22, n_expected_fields=len(columns_pac_other22), show_errs=False)

df_pac_other22 = pd.read_csv('../../data/open_secrets/CampaignFin22/pac_other22_clean.csv', on_bad_lines='skip')

Reading line 206224 of 1031123...
Reading line 412448 of 1031123...
Reading line 618672 of 1031123...
Reading line 824896 of 1031123...
Reading line 1031120 of 1031123...
File saved as ../../data/open_secrets/CampaignFin22/pac_other22_clean.csv


In [382]:
df_pac_other22.head()

Unnamed: 0,cycle__pac_other22,fecrecno__pac_other22,filerid__pac_other22,donorcmte__pac_other22,contriblendtrans__pac_other22,city__pac_other22,state__pac_other22,zip__pac_other22,fecoccemp__pac_other22,primcode__pac_other22,date__pac_other22,amount__pac_other22,recipid__pac_other22,party__pac_other22,otherid__pac_other22,recipcode__pac_other22,recipprimcode__pac_other22,amend__pac_other22,report__pac_other22,pg__pac_other22,microfilm__pac_other22,type__pac_other22,realcode__pac_other22,source__pac_other22
0,2022,1011320230265253021,C00032979,Teamsters Union,BOHANNAN VICTORY FUND,IOWA CITY,IA,52245.0,,LT300,11/15/2022,-1000.0,C00811059,D,C00811059,DP,Z4200,A,30G,P,202212089550408190,24K,LT300,PAC
1,2022,1011320230265254787,C00032979,Teamsters Union,WARNOCK VICTORY FUND,WASHINGTON,DC,20003.0,,LT300,11/22/2022,15000.0,C00740597,D,C00740597,DP,Z4200,A,30G,P,202212089550408210,24K,LT300,PAC
2,2022,1011320230265254797,C00032979,Teamsters Union,BLUE POWER PAC,COLLINSWOOD,NJ,8108.0,,LT300,11/03/2022,5000.0,C00575894,D,C00575894,PI,J2100,A,30G,P,202212089550408207,24K,LT300,PAC
3,2022,1011320230265254799,C00032979,Teamsters Union,COMMONWEALTH PAC,MERRIFIELD,VA,22116.0,,LT300,11/03/2022,5000.0,C00498931,D,C00498931,PI,J2100,A,30G,P,202212089550408208,24K,LT300,PAC
4,2022,1011320230265254826,C00032979,Teamsters Union,RHODE ISLAND DEMOCRATIC STATE CMTE,PROVIDENCE,RI,92940.0,,LT300,11/03/2022,5000.0,C00136200,D,C00136200,DP,Z5200,A,30G,P,202212089550408209,24K,LT300,PAC
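
In the preview above, `zip__pac_other22` is parsed as a float, so a New Jersey ZIP shows as `8108.0`. If the clean file stores the value with its leading zero (an assumption, since the raw file is not shown here), passing a `dtype` for that column to `pd.read_csv` keeps it as text. A minimal sketch on inline data:

```python
import io

import pandas as pd

# Inline stand-in for the clean CSV; the '08108' spelling is assumed.
raw = (
    "city__pac_other22,state__pac_other22,zip__pac_other22\n"
    "COLLINSWOOD,NJ,08108\n"
    "IOWA CITY,IA,52245\n"
)

as_default = pd.read_csv(io.StringIO(raw))
as_text = pd.read_csv(io.StringIO(raw), dtype={"zip__pac_other22": str})

print(as_default["zip__pac_other22"].tolist())
print(as_text["zip__pac_other22"].tolist())
```

The same `dtype` mapping is useful for id-like columns such as `fecrecno` and `microfilm`, which are identifiers rather than quantities.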


In [385]:
# Identify the donor pacid.
# The donor is "otherid" if "type" starts with "1", and "filerid" if "type" starts with "2".
df_pac_other22['donorid__pac_other22'] = df_pac_other22.apply(
    lambda row: row['otherid__pac_other22'] if row['type__pac_other22'].startswith('1')
    else (row['filerid__pac_other22'] if row['type__pac_other22'].startswith('2') else None),
    axis=1
)
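
The rule implemented by this cell is: when `type` starts with `1` the donor is `otherid`, and when it starts with `2` the donor is `filerid`. The sketch below restates it as a plain function so it can be tested in isolation; `donor_id` is a hypothetical name, and the three example rows are taken from the `pac_other22` previews in this notebook.

```python
def donor_id(filerid, otherid, type_code):
    """Return the donor's id for one pac_other22 row, or None if the type is not recognised."""
    if not isinstance(type_code, str):
        # Missing type codes (NaN) cannot be classified.
        return None
    if type_code.startswith("1"):
        return otherid
    if type_code.startswith("2"):
        return filerid
    return None


# (filerid, otherid, type) triples copied from the previews.
examples = [
    ("C00032979", "C00811059", "24K"),
    ("C00803205", "H2IN09084", "16C"),
    ("C00793844", "S2NC00612", "20C"),
]
donors = [donor_id(*row) for row in examples]
print(donors)
```

Unlike the lambda in the cell, this version tolerates a missing `type` value instead of raising on `.startswith`.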

In [387]:
showdf(df_pac_other22)

Unnamed: 0,cycle__pac_other22,fecrecno__pac_other22,filerid__pac_other22,donorcmte__pac_other22,contriblendtrans__pac_other22,city__pac_other22,state__pac_other22,zip__pac_other22,fecoccemp__pac_other22,primcode__pac_other22,date__pac_other22,amount__pac_other22,recipid__pac_other22,party__pac_other22,otherid__pac_other22,recipcode__pac_other22,recipprimcode__pac_other22,amend__pac_other22,report__pac_other22,pg__pac_other22,microfilm__pac_other22,type__pac_other22,realcode__pac_other22,source__pac_other22,donorid__pac_other22
0,2022,1011320230265253021,C00032979,Teamsters Union,BOHANNAN VICTORY FUND,IOWA CITY,IA,52245.0,,LT300,11/15/2022,-1000.0,C00811059,D,C00811059,DP,Z4200,A,30G,P,202212089550408190,24K,LT300,PAC,C00032979
1,2022,1011320230265254787,C00032979,Teamsters Union,WARNOCK VICTORY FUND,WASHINGTON,DC,20003.0,,LT300,11/22/2022,15000.0,C00740597,D,C00740597,DP,Z4200,A,30G,P,202212089550408210,24K,LT300,PAC,C00032979
2,2022,1011320230265254797,C00032979,Teamsters Union,BLUE POWER PAC,COLLINSWOOD,NJ,8108.0,,LT300,11/03/2022,5000.0,C00575894,D,C00575894,PI,J2100,A,30G,P,202212089550408207,24K,LT300,PAC,C00032979


In [389]:
# # Notice that candidates are never filers in pac_other22.
# df_pac_other22[df_pac_other22['filerid__pac_other22'].str.startswith('N', na=False)]

**Pacs to pacs**

In [392]:
# Flows from pacs to pacs (non-lead/candidate)
df_pac_to_pac = df_pac_other22[~ df_pac_other22['recipid__pac_other22'].str.startswith('N', na=False)]

In [394]:
showdf(df_pac_to_pac)

Unnamed: 0,cycle__pac_other22,fecrecno__pac_other22,filerid__pac_other22,donorcmte__pac_other22,contriblendtrans__pac_other22,city__pac_other22,state__pac_other22,zip__pac_other22,fecoccemp__pac_other22,primcode__pac_other22,date__pac_other22,amount__pac_other22,recipid__pac_other22,party__pac_other22,otherid__pac_other22,recipcode__pac_other22,recipprimcode__pac_other22,amend__pac_other22,report__pac_other22,pg__pac_other22,microfilm__pac_other22,type__pac_other22,realcode__pac_other22,source__pac_other22,donorid__pac_other22
0,2022,1011320230265253021,C00032979,Teamsters Union,BOHANNAN VICTORY FUND,IOWA CITY,IA,52245.0,,LT300,11/15/2022,-1000.0,C00811059,D,C00811059,DP,Z4200,A,30G,P,202212089550408190,24K,LT300,PAC,C00032979
1,2022,1011320230265254787,C00032979,Teamsters Union,WARNOCK VICTORY FUND,WASHINGTON,DC,20003.0,,LT300,11/22/2022,15000.0,C00740597,D,C00740597,DP,Z4200,A,30G,P,202212089550408210,24K,LT300,PAC,C00032979
2,2022,1011320230265254797,C00032979,Teamsters Union,BLUE POWER PAC,COLLINSWOOD,NJ,8108.0,,LT300,11/03/2022,5000.0,C00575894,D,C00575894,PI,J2100,A,30G,P,202212089550408207,24K,LT300,PAC,C00032979


**Pacs to cands**

In [397]:
# Flows from pacs *directly* to individual candidates
df_pac_to_cand = df_pac_other22[df_pac_other22['recipid__pac_other22'].str.startswith('N', na=False)]

In [399]:
showdf(df_pac_to_cand)

Unnamed: 0,cycle__pac_other22,fecrecno__pac_other22,filerid__pac_other22,donorcmte__pac_other22,contriblendtrans__pac_other22,city__pac_other22,state__pac_other22,zip__pac_other22,fecoccemp__pac_other22,primcode__pac_other22,date__pac_other22,amount__pac_other22,recipid__pac_other22,party__pac_other22,otherid__pac_other22,recipcode__pac_other22,recipprimcode__pac_other22,amend__pac_other22,report__pac_other22,pg__pac_other22,microfilm__pac_other22,type__pac_other22,realcode__pac_other22,source__pac_other22,donorid__pac_other22
24,2022,1051020220264956692,C00803205,"Sodrel, Mike","SODREL, MICHAEL E",JEFFERSONVILLE,IN,47130.0,SODREL TRUCK LINES,Z1100,04/13/2022,210000.0,N00013869,R,H2IN09084,RL,Z1100,A,12P,P,202204219502250419,16C,Z1100,PAC,H2IN09084
25,2022,1051020220264956693,C00803205,"Sodrel, Mike","SODREL, MICHAEL E",JEFFERSONVILLE,IN,47130.0,SODREL TRUCK LINES,Z1100,04/13/2022,90000.0,N00013869,R,H2IN09084,RL,Z1100,A,12P,P,202204219502250419,16C,Z1100,PAC,H2IN09084
27,2022,1061620220264962191,C00793844,Charles K Moss US Senate Campaign,"MOSS, CHARLES K",RANDLEMAN,NC,27317.0,,Z1100,01/11/2022,841.0,N00050004,R,S2NC00612,RL,Z1100,A,Q1,P,202204180300404482,20C,Z1100,PAC,C00793844


#### pacs22 – pacs to cands

In [404]:
# Pacs22 – Lead pacs only.
# OpenSecrets Data Definition: PAC table (PACs to Candidates)
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20for%20PAC%20to%20Cands%20Data.htm
# "pacid", which represents "realcode" (industry or ideology), contributes "amount" to "cid", directly or indirectly ("di").
# NOTE: pacid__pacs22 never equals cid__pacs22 – no self-contributions.
columns_pacs22 = ['cycle', 'fecrecno', 'pacid', 'cid', 'amount', 'date', 'realcode', 
                            'type', 'di', 'feccandid']

if not os.path.exists('../../data/open_secrets/CampaignFin22/pacs22_clean.csv'):
    process_data('../../data/open_secrets/CampaignFin22/pacs22.csv', headers=columns_pacs22, n_expected_fields=len(columns_pacs22), show_errs=False)

df_pacs22 = pd.read_csv('../../data/open_secrets/CampaignFin22/pacs22_clean.csv', on_bad_lines='skip')

Reading line 151625 of 758125...
Reading line 303250 of 758125...
Reading line 454875 of 758125...
Reading line 606500 of 758125...
Reading line 758125 of 758125...
File saved as ../../data/open_secrets/CampaignFin22/pacs22_clean.csv


In [406]:
showdf(df_pacs22)

Unnamed: 0,cycle__pacs22,fecrecno__pacs22,pacid__pacs22,cid__pacs22,amount__pacs22,date__pacs22,realcode__pacs22,type__pacs22,di__pacs22,feccandid__pacs22
0,2022,1011020220264924462,C00406124,N00029513,500,10/18/2021,B4000,24K,D,H8IN07184
1,2022,1011320230265253020,C00032979,N00041569,-2500,11/21/2022,Z9600,24K,D,H8IL14174
2,2022,1011320230265254779,C00032979,N00041511,-5000,11/18/2022,Z9600,24K,D,H8MN08043
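
The note that `pacid__pacs22` never equals `cid__pacs22` can be checked directly, and the same frame can be summed per PAC. The sketch below uses only the three preview rows as toy data. Negative amounts appear in the preview, and how to treat them is left open here; the sum simply nets them.

```python
import pandas as pd

toy_pacs = pd.DataFrame({
    "pacid__pacs22": ["C00406124", "C00032979", "C00032979"],
    "cid__pacs22": ["N00029513", "N00041569", "N00041511"],
    "amount__pacs22": [500, -2500, -5000],
})

# True if any row has the same id on both sides of the transaction.
has_self_contribution = bool(
    (toy_pacs["pacid__pacs22"] == toy_pacs["cid__pacs22"]).any()
)

# Net amount per contributing PAC.
net_by_pac = toy_pacs.groupby("pacid__pacs22")["amount__pacs22"].sum()
print(has_self_contribution)
print(net_by_pac)
```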


In [590]:
# # Lead pac transactions
# df_lead_pacs22 = df_pacs22[df_pacs22['pacid__pacs22'].isin(df_recipid_lead_cmtes22['cmteid__cmtes22'])]
# print(len(df_lead_pacs22))
# showdf(df_lead_pacs22)

In [592]:
# # Non-lead pac transactions
# df_pac_pacs22 = df_pacs22[df_pacs22['pacid__pacs22'].isin(df_recipid_pac_cmtes22['cmteid__cmtes22'])]
# print(len(df_pac_pacs22))
# showdf(df_pac_pacs22)

#### indivs22

In [421]:
# OpenSecrets Data Definition: Individual Contribution Data
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20for%20Individual%20Contribution%20Data.htm
columns_indivs22 = ['cycle', 'fectransid', 'contribid', 'contrib_last', 'contrib_first', 'recipid', 'orgname', 
                    'ultorg', 'realcode', 'date', 'amount', 'street', 'city', 'state',
                    'zip', 'recipcode', 'type', 'cmteid', 'otherid', 'gender', 'microfilm',
                    'occupation', 'employer', 'source']

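# This dataset is huge and crashes my computer when loaded in full.
# Reading the whole file takes about 6.5 minutes, so only the first 1000 rows are loaded here.
# NOTE: in the preview below, each contributor name is a single quoted field ("LAST, FIRST")
# and every row ends with an empty field, so values after contrib_last appear shifted one
# column to the left of their headers. Verify this column list against the data dictionary.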

if not os.path.exists('../../data/open_secrets/CampaignFin22/indivs22_clean.csv'):
    process_data('../../data/open_secrets/CampaignFin22/indivs22.csv', headers=columns_indivs22, nrows=1000, n_expected_fields=len(columns_indivs22), show_errs=False)

df_indivs22 = pd.read_csv('../../data/open_secrets/CampaignFin22/indivs22_clean.csv', on_bad_lines='skip', nrows=1000)
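
Since the full file is too large to load at once, one option is to stream it with the `chunksize` parameter of `pd.read_csv` and keep only running aggregates in memory. This is a minimal sketch on inline toy data, not part of the project modules; the fourth row is hypothetical, and a real run would pass the clean file path and a much larger chunk size.

```python
import io

import pandas as pd

demo_csv = io.StringIO(
    "recipid__indivs22,amount__indivs22\n"
    "C00000935,35\n"
    "C00633404,15\n"
    "C00632398,10\n"
    "C00000935,5\n"
)

totals = {}
# Each chunk is an ordinary DataFrame holding at most `chunksize` rows.
for chunk in pd.read_csv(demo_csv, chunksize=2):
    sums = chunk.groupby("recipid__indivs22")["amount__indivs22"].sum()
    for recipid, amount in sums.items():
        totals[recipid] = totals.get(recipid, 0) + int(amount)

print(totals)
```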

In [423]:
showdf(df_indivs22)

Unnamed: 0,cycle__indivs22,fectransid__indivs22,contribid__indivs22,contrib_last__indivs22,contrib_first__indivs22,recipid__indivs22,orgname__indivs22,ultorg__indivs22,realcode__indivs22,date__indivs22,amount__indivs22,street__indivs22,city__indivs22,state__indivs22,zip__indivs22,recipcode__indivs22,type__indivs22,cmteid__indivs22,otherid__indivs22,gender__indivs22,microfilm__indivs22,occupation__indivs22,employer__indivs22,source__indivs22
0,2022,4061520221505841534,r0014256510,"DILLARD, DANIEL",C00000935,[24T Contribution],,Z9500,08/29/2021,35,,HIDALGO,TX,78557,DP,24T,C00401224,C00000935,M,202205149504021121,NOT EMPLOYED,NOT EMPLOYED,Rept,
1,2022,4061520221505841535,r0015503614,"WHITE, SCOTTO",C00633404,[24T Contribution],,Z9500,08/29/2021,15,,YONKERS,NY,10701,PI,24T,C00401224,C00633404,M,202205149507911003,NOT EMPLOYED,NOT EMPLOYED,Rept,
2,2022,4061520221505841536,p0003861308,"DOMINGUEZ, SALLIE",C00632398,[24T Contribution],,Z9500,08/29/2021,10,,CAMPBELL,CA,95011,PI,24T,C00401224,C00632398,F,202205149504046504,HEALTHCARE,MCDONALD FAMILY DENTAL,Rept,


---
### Expends22 data
#### expends22

In [915]:
# OpenSecrets Data Dictionary for Expenditure Data - from FEC electronic filings
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20Expenditures.htm
columns_expends22 = ['cycle', 'id', 'transid', 'crpfilerid', 
                     'recipcode', 'pacshort', 'crprecipname', 
                     'expcode', 'amount', 'date', 'city', 'state', 
                     'zip', 'cmteid_ef', 'candid', 'type',
                     'descrip', 'pg', 'elecother', 'enttype',
                     'source']

if not os.path.exists('../../data/open_secrets/Expend22/expends22_clean.csv'):
    process_data('../../data/open_secrets/Expend22/expends22.csv', headers=columns_expends22, nrows=1000, n_expected_fields=len(columns_expends22), show_errs=False)

df_expends22 = pd.read_csv('../../data/open_secrets/Expend22/expends22_clean.csv', on_bad_lines='skip', nrows=1000)

In [917]:
# All pac expenditures
showdf(df_expends22)

Unnamed: 0,cycle__expends22,id__expends22,transid__expends22,crpfilerid__expends22,recipcode__expends22,pacshort__expends22,crprecipname__expends22,expcode__expends22,amount__expends22,date__expends22,city__expends22,state__expends22,zip__expends22,cmteid_ef__expends22,candid__expends22,type__expends22,descrip__expends22,pg__expends22,elecother__expends22,enttype__expends22,source__expends22
0,2022,1,500252553,C00143230,DP,New York State Democratic Cmte,ActBlue,F40,1,05/01/2021,Somerville,MA,21443132,,,,Credit Card Processing Fee,,,ORG,@auto
1,2022,2,500252556,C00143230,DP,New York State Democratic Cmte,ActBlue,F40,10,05/02/2021,Somerville,MA,21443132,,,,Credit Card Processing Fee,,,ORG,@auto
2,2022,3,500252557,C00143230,DP,New York State Democratic Cmte,ActBlue,F40,4,05/02/2021,Somerville,MA,21443132,,,,Credit Card Processing Fee,,,ORG,@auto


In [919]:
# Lead pac expenditures.
df_lead__expends22 = df_expends22[df_expends22['crpfilerid__expends22'].str.startswith('N', na=False)]
print(len(df_lead__expends22))
showdf(df_lead__expends22)

460


Unnamed: 0,cycle__expends22,id__expends22,transid__expends22,crpfilerid__expends22,recipcode__expends22,pacshort__expends22,crprecipname__expends22,expcode__expends22,amount__expends22,date__expends22,city__expends22,state__expends22,zip__expends22,cmteid_ef__expends22,candid__expends22,type__expends22,descrip__expends22,pg__expends22,elecother__expends22,enttype__expends22,source__expends22
75,2022,76,500180910,N00050407,DW,Jeff Jackson for Congress,Authentic Campaigns,U20,25000,03/23/2021,Richmond,VA,232235302,,,,Digital List Acquisition,P2022,,ORG,temps
76,2022,77,500180240,N00050407,DW,Jeff Jackson for Congress,Stripe Inc,F40,290,03/09/2021,San Francisco,CA,941079105,,,,Credit Card Processing Fee,P2022,,ORG,@new
77,2022,78,500179650,N00050407,DW,Jeff Jackson for Congress,Caravan Llc,M40,4189,02/05/2021,Charlotte,NC,282735505,,,,Media Production Services,P2022,,ORG,@new


In [921]:
# Non-lead pac expenditures.
df_pac__expends22 = df_expends22[df_expends22['crpfilerid__expends22'].str.startswith('C', na=False)]
print(len(df_pac__expends22))
showdf(df_pac__expends22)

540


Unnamed: 0,cycle__expends22,id__expends22,transid__expends22,crpfilerid__expends22,recipcode__expends22,pacshort__expends22,crprecipname__expends22,expcode__expends22,amount__expends22,date__expends22,city__expends22,state__expends22,zip__expends22,cmteid_ef__expends22,candid__expends22,type__expends22,descrip__expends22,pg__expends22,elecother__expends22,enttype__expends22,source__expends22
0,2022,1,500252553,C00143230,DP,New York State Democratic Cmte,ActBlue,F40,1,05/01/2021,Somerville,MA,21443132,,,,Credit Card Processing Fee,,,ORG,@auto
1,2022,2,500252556,C00143230,DP,New York State Democratic Cmte,ActBlue,F40,10,05/02/2021,Somerville,MA,21443132,,,,Credit Card Processing Fee,,,ORG,@auto
2,2022,3,500252557,C00143230,DP,New York State Democratic Cmte,ActBlue,F40,4,05/02/2021,Somerville,MA,21443132,,,,Credit Card Processing Fee,,,ORG,@auto


---
### Lobby data
#### lob_agency

In [444]:
# OpenSecrets Data Definition for Lobbying Data: Lobby agencies
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_agency.htm
columns_lob_agency = ['uniqid', 'agencyid', 'agency']

if not os.path.exists('../../data/open_secrets/Lobby/lob_agency_clean.csv'):
    process_data('../../data/open_secrets/Lobby/lob_agency.csv', headers=columns_lob_agency, n_expected_fields=len(columns_lob_agency), show_errs=False)

df_lob_agency = pd.read_csv('../../data/open_secrets/Lobby/lob_agency_clean.csv', on_bad_lines='skip')

Reading line 819552 of 4097763...
Reading line 1639104 of 4097763...
Reading line 2458656 of 4097763...
Reading line 3278208 of 4097763...
Reading line 4097760 of 4097763...
File saved as ../../data/open_secrets/Lobby/lob_agency_clean.csv


In [445]:
showdf(df_lob_agency)

Unnamed: 0,uniqid__lob_agency,agencyid__lob_agency,agency__lob_agency
0,BB7367A7-7B60-4DED-AA2D-A94771A9EBE8,1,US Senate
1,BB7367A7-7B60-4DED-AA2D-A94771A9EBE8,2,US House of Representatives
2,04366C6F-B0CE-4C28-87BF-EE1CC8A9BB41,2,US House of Representatives


#### lob_bills

In [911]:
# OpenSecrets Data Definition for Lobbying Data: Lobby bills
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_bills.htm
columns_lob_bills = ['b_id', 'si_id', 'congno', 'bill_name']

if not os.path.exists('../../data/open_secrets/Lobby/lob_bills_clean.csv'):
    process_data('../../data/open_secrets/Lobby/lob_bills.csv', headers=columns_lob_bills, n_expected_fields=len(columns_lob_bills), show_errs=False)

df_lob_bills = pd.read_csv('../../data/open_secrets/Lobby/lob_bills_clean.csv', on_bad_lines='skip')
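# Drop the last two characters of each bill name.
# NOTE: in the preview below, b_id 's1461-117' is paired with bill_name 'S.14', so this
# slice may also be trimming digits; check the result against the raw file.
df_lob_bills['bill_name__lob_bills'] = df_lob_bills['bill_name__lob_bills'].apply(lambda x: x[:-2])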

In [913]:
showdf(df_lob_bills)

Unnamed: 0,b_id__lob_bills,si_id__lob_bills,congno__lob_bills,bill_name__lob_bills
0,s1461-117,2820018,117,S.14
1,hr463-117,2820018,117,H.R.4
2,s910-116,2820035,116,S.9
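
The `b_id` values in this preview look like a bill type, a bill number, and a Congress number (for example `s1461-117`). Assuming that pattern holds, which is inferred from these three rows only, a small parser gives an independent way to cross-check `bill_name__lob_bills`. `parse_b_id` is a hypothetical helper, and ids that do not fit the pattern return `None`.

```python
import re

# Letters (bill type), digits (bill number), a hyphen, digits (Congress number).
B_ID_PATTERN = re.compile(r"^([a-z]+)(\d+)-(\d+)$")


def parse_b_id(b_id):
    """Split an id like 's1461-117' into (bill_type, number, congress)."""
    match = B_ID_PATTERN.match(b_id) if isinstance(b_id, str) else None
    if match is None:
        return None
    bill_type, number, congress = match.groups()
    return bill_type, int(number), int(congress)


parsed = [parse_b_id(b) for b in ["s1461-117", "hr463-117", "s910-116"]]
print(parsed)
```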


#### lob_indus

In [448]:
# OpenSecrets Data Definition for Lobbying Data: Lobby industries
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_indus.htm
columns_lob_indus = ['client', 'sub', 'total', 'year', 'catcode']

if not os.path.exists('../../data/open_secrets/Lobby/lob_indus_clean.csv'):
    process_data('../../data/open_secrets/Lobby/lob_indus.csv', headers=columns_lob_indus, n_expected_fields=len(columns_lob_indus), show_errs=False)

df_lob_indus = pd.read_csv('../../data/open_secrets/Lobby/lob_indus_clean.csv', on_bad_lines='skip')

Reading line 61946 of 309730...
Reading line 123892 of 309730...
Reading line 185838 of 309730...
Reading line 247784 of 309730...
Reading line 309730 of 309730...
File saved as ../../data/open_secrets/Lobby/lob_indus_clean.csv


In [449]:
showdf(df_lob_indus)

Unnamed: 0,client__lob_indus,sub__lob_indus,total__lob_indus,year__lob_indus,catcode__lob_indus
0,National Assn for County Community & Econ Develop,National Assn for County Community & Econ Develop,0,1998,X3000
1,Fox Valley Technical College,Fox Valley Technical College,80000,2015,H5200
2,Employers Cncl on Flexible Compensation,Employers Cncl on Flexible Compensation,580000,2001,J9000


#### lob_issue

In [1251]:
# OpenSecrets Data Definition for Lobbying Data: Lobby issues
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_issues.htm
columns_lob_issue = ['si_id', 'uniqid', 'issueid', 'issue', 'specificissue', 'year']

if not os.path.exists('../../data/open_secrets/Lobby/lob_issue_clean.csv'):
    process_data('../../data/open_secrets/Lobby/lob_issue.csv', headers=columns_lob_issue, n_expected_fields=len(columns_lob_issue), show_errs=False)

df_lob_issue = pd.read_csv('../../data/open_secrets/Lobby/lob_issue_clean.csv', on_bad_lines='skip')

Reading line 1218547 of 6092739...
Reading line 2437094 of 6092739...
Reading line 3655641 of 6092739...
Reading line 4874188 of 6092739...
Reading line 6092735 of 6092739...
File saved as ../../data/open_secrets/Lobby/lob_issue_clean.csv


In [1255]:
showdf(df_lob_issue)

Unnamed: 0,si_id__lob_issue,uniqid__lob_issue,issueid__lob_issue,issue__lob_issue,specificissue__lob_issue,year__lob_issue
0,3001624,02e92bd6-0159-495e-9d00-8a490a0be8be,DIS,Disaster & Emergency Planning,Issues affecting manufacturer of railroad and ...,2022
1,3001625,02e92bd6-0159-495e-9d00-8a490a0be8be,ENV,Environment & Superfund,Issues affecting manufacturer of railroad and ...,2022
2,3001626,02e92bd6-0159-495e-9d00-8a490a0be8be,LBR,"Labor, Antitrust & Workplace",Issues affecting manufacturer of railroad and ...,2022


#### lob_issue_no_specific

In [497]:
# OpenSecrets Data Definition for Lobbying Data: Lobby issues (no specific issue)
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_issues.htm
columns_lob_issue_no_specific = ['si_id', 'uniqid', 'issueid', 'issue', 'year']

if not os.path.exists('../../data/open_secrets/Lobby/lob_issue_NoSpecficIssue_clean.csv'):
    process_data('../../data/open_secrets/Lobby/lob_issue_NoSpecficIssue.csv', headers=columns_lob_issue_no_specific, n_expected_fields=len(columns_lob_issue_no_specific), show_errs=False)

df_lob_issue_no_specific = pd.read_csv('../../data/open_secrets/Lobby/lob_issue_NoSpecficIssue_clean.csv', on_bad_lines='skip')

Reading line 601972 of 3009864...
Reading line 1203944 of 3009864...
Reading line 1805916 of 3009864...
Reading line 2407888 of 3009864...
Reading line 3009860 of 3009864...
File saved as ../../data/open_secrets/Lobby/lob_issue_NoSpecficIssue_clean.csv


In [498]:
showdf(df_lob_issue_no_specific)

Unnamed: 0,si_id__lob_issue_NoSpecficIssue,uniqid__lob_issue_NoSpecficIssue,issueid__lob_issue_NoSpecficIssue,issue__lob_issue_NoSpecficIssue,year__lob_issue_NoSpecficIssue
0,3001624,02e92bd6-0159-495e-9d00-8a490a0be8be,DIS,Disaster & Emergency Planning,2022
1,3001625,02e92bd6-0159-495e-9d00-8a490a0be8be,ENV,Environment & Superfund,2022
2,3001626,02e92bd6-0159-495e-9d00-8a490a0be8be,LBR,"Labor, Antitrust & Workplace",2022


#### lob_lobbying

In [455]:
# OpenSecrets Data Definitions for Lobbying Data: Lobbying
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_lobbying.htm
columns_lob_lobbying = ['uniqid','registrant_raw','registrant','isfirm','client_raw','client','ultorg','amount',
                        'catcode','source','self','includensfs','use',
                       'ind', 'year', 'type', 'typelong', 'affiliate']

if not os.path.exists('../../data/open_secrets/Lobby/lob_lobbying_clean.csv'):
    process_data('../../data/open_secrets/Lobby/lob_lobbying.csv', headers=columns_lob_lobbying, n_expected_fields=len(columns_lob_lobbying), show_errs=False)

df_lob_lobbying = pd.read_csv('../../data/open_secrets/Lobby/lob_lobbying_clean.csv', on_bad_lines='skip')

Reading line 308814 of 1544073...
Reading line 617628 of 1544073...
Reading line 926442 of 1544073...
Reading line 1235256 of 1544073...
Reading line 1544070 of 1544073...
File saved as ../../data/open_secrets/Lobby/lob_lobbying_clean.csv


In [456]:
showdf(df_lob_lobbying)

Unnamed: 0,uniqid__lob_lobbying,registrant_raw__lob_lobbying,registrant__lob_lobbying,isfirm__lob_lobbying,client_raw__lob_lobbying,client__lob_lobbying,ultorg__lob_lobbying,amount__lob_lobbying,catcode__lob_lobbying,source__lob_lobbying,self__lob_lobbying,includensfs__lob_lobbying,use__lob_lobbying,ind__lob_lobbying,year__lob_lobbying,type__lob_lobbying,typelong__lob_lobbying,affiliate__lob_lobbying
0,82c5f661-a637-45ad-a3a6-b5ba18cf8962,ASTRAZENECA PHARMACEUTICALS LP,AstraZeneca Pharmaceuticals,,ASTRAZENECA PHARMACEUTICALS LP,AstraZeneca Pharmaceuticals,AstraZeneca PLC,1370000.0,H4300,pac,x,,y,y,2021,q4a,FOURTH QUARTER AMENDMENT,
1,84ad3a9e-5864-4227-a802-e268fbf37237,"DAVID L. HORNE, LLC",David L Horne LLC,y,MULTIFAMILY LENDERS COUNCIL,Multifamily Lenders Council,Multifamily Lenders Council,15000.0,F4600,wda16,n,,y,y,2021,q4,FOURTH QUARTER REPORT,
2,85b111b1-5d2e-4107-bc24-0921316e29a5,ECHELON GOVERNMENT AFFAIRS,Echelon Government Affairs,y,THE ALBERS GROUP,Albers Group,Albers Group,10000.0,Y4000,,n,,y,y,2021,q4,FOURTH QUARTER REPORT,


#### lob_lobbyist

In [457]:
# OpenSecrets Data Definition for Lobbyists
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_lobbyists.htm
columns_lob_lobbyist = ['uniqid', 'lobbyist_lastname_std', 'lobbyist_firstname_std', 'lobbyist_lastname_raw', 
                     'lobbyist_firstname_raw', 'lobbyist_id', 'year', 'officialposition', 'cid', 'formercongmem']

if not os.path.exists('../../data/open_secrets/Lobby/lob_lobbyist_clean.csv'):
    process_data('../../data/open_secrets/Lobby/lob_lobbyist.csv', headers=columns_lob_lobbyist, n_expected_fields=len(columns_lob_lobbyist), show_errs=False)

df_lob_lobbyist = pd.read_csv('../../data/open_secrets/Lobby/lob_lobbyist_clean.csv', on_bad_lines='skip')

Reading line 699422 of 3497114...
Reading line 1398844 of 3497114...
Reading line 2098266 of 3497114...
Reading line 2797688 of 3497114...
Reading line 3497110 of 3497114...
File saved as ../../data/open_secrets/Lobby/lob_lobbyist_clean.csv


In [458]:
showdf(df_lob_lobbyist)

Unnamed: 0,uniqid__lob_lobbyist,lobbyist_lastname_std__lob_lobbyist,lobbyist_firstname_std__lob_lobbyist,lobbyist_lastname_raw__lob_lobbyist,lobbyist_firstname_raw__lob_lobbyist,lobbyist_id__lob_lobbyist,year__lob_lobbyist,officialposition__lob_lobbyist,cid__lob_lobbyist,formercongmem__lob_lobbyist
0,06C29C84-250F-478B-872A-2F647D9DC044,"O'BRIEN, LAWRENCE F. III","O'Brien, Lawrence F III",Y0000046486L,2004,,,n,,
1,3A22C685-EC94-46AA-9C45-4AA4A7044C28,"BRAGG, PATRICIA DUNMIRE","Bragg, Patricia Dunmire",Y0000020554L,2001,,,n,,
2,5CBE61EC-87F1-401E-9D57-620975C9A1F8,"COSTELLO, RYAN","Costello, Ryan",Y0000027292L,2002,,N00031064,y,,


#### lob_rpt

In [469]:
# OpenSecrets Data Definitions for Lobbying Data: Report types
# No documentation provided on OpenSecrets.com
columns_lob_rpt = ['typelong', 'typeshort']

if not os.path.exists('../../data/open_secrets/Lobby/lob_rpt_clean.csv'):
    process_data('../../data/open_secrets/Lobby/lob_rpt.csv', headers=columns_lob_rpt, n_expected_fields=len(columns_lob_rpt), show_errs=False)

df_lob_rpt = pd.read_csv('../../data/open_secrets/Lobby/lob_rpt_clean.csv', on_bad_lines='skip')

Reading line 19 of 95...
Reading line 38 of 95...
Reading line 57 of 95...
Reading line 76 of 95...
Reading line 95 of 95...
File saved as ../../data/open_secrets/Lobby/lob_rpt_clean.csv


In [470]:
showdf(df_lob_rpt)

Unnamed: 0,typelong__lob_rpt,typeshort__lob_rpt
0,MID-YEAR REPORT,m
1,MID-YEAR AMENDMENT,ma
2,MID-YEAR (NO ACTIVITY),mn


### IDs and categories
#### CRP_ID

In [474]:
install_if_needed('xlrd')
import xlrd

xlrd is already installed.


In [878]:
# Candidate ids
# This dataset is very different, so load it independently.
columns_crp_ids = ['blank_excel_column__crp_ids', 'cid__crp_ids', 'crpname__crp_ids', 'party__crp_ids', 'distidrunfor__crp_ids', 'feccandid__crp_ids'] # Blank excel column is necessary.
columns_crp_ids = dict(enumerate(columns_crp_ids))
df_crp_ids = pd.read_excel('../../data/open_secrets/CRP_IDs.xls', header=None, skiprows=15)
df_crp_ids = df_crp_ids.drop(df_crp_ids.columns[0], axis=1)
df_crp_ids = df_crp_ids.rename(columns=columns_crp_ids)

In [879]:
showdf(df_crp_ids)

Unnamed: 0,cid__crp_ids,crpname__crp_ids,party__crp_ids,distidrunfor__crp_ids,feccandid__crp_ids
0,N00034296,"Aalders, Tim",R,UT03,H2UT03280
1,N00047923,"Aazami, Shervin",D,CA32,H2CA30291
2,N00051397,"Abahsain, Jill",D,MN07,H2MN07162


#### CRP_Categories

In [885]:
from io import StringIO
crp_filepath = '../../data/open_secrets/CRP_Categories.txt'
with open(crp_filepath, 'r') as file:
    lines = file.readlines()

header_line_index = next(i for i, line in enumerate(lines) if line.startswith('Catcode'))
table_data = ''.join(lines[header_line_index:])
df_crp_cats = pd.read_csv(StringIO(table_data), sep='\t')
df_crp_cats.columns = df_crp_cats.columns.str.lower().str.replace(' ', '_')
df_crp_cats.columns = [col + '__crp_cats' for col in df_crp_cats.columns]

In [887]:
showdf(df_crp_cats)

Unnamed: 0,catcode__crp_cats,catname__crp_cats,catorder__crp_cats,industry__crp_cats,sector__crp_cats,sector_long__crp_cats
0,A0000,Agriculture,A11,Misc Agriculture,Agribusiness,Agribusiness
1,A1000,Crop production & basic processing,A01,Crop Production & Basic Processing,Agribusiness,Agribusiness
2,A1100,Cotton,A01,Crop Production & Basic Processing,Agribusiness,Agribusiness


---

## Sanity Checks

One-by-one tests for the U.S. House Ways & Means committee members:

In [503]:
other_pacid = 'C00000422' # American Medical Association

In [505]:
member_name = 'Adrian Smith'
member_cid = 'N00027623'
member_pid = 'C00412890'

In [507]:
member_name2 = 'Jodey Arrington'
member_cid2 = 'N00038285'
member_pid2 = 'C00588657'

In [509]:
# Member info
print(f"MEMBER: {member_name}, CID: {member_cid}, PID: {member_pid}")
print('-' * 100)
print("DATASET: cands22")
display(df_cands22[df_cands22['cid__cands22'] == member_cid].head(1))
print('-' * 100)
print("DATASET: cmtes22")
display(df_cmtes22[df_cmtes22['recipid__cmtes22'] == member_cid].head(1))
print('-' * 100)
print("DATASET: pacs22")
display(df_pacs22[df_pacs22['pacid__pacs22'] == member_pid].head(1))

MEMBER: Adrian Smith, CID: N00027623, PID: C00412890
----------------------------------------------------------------------------------------------------
DATASET: cands22


Unnamed: 0,cycle__cands22,feccandid__cands22,cid__cands22,firstlastp__cands22,party__cands22,distidrunfor__cands22,distidcurr__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,nopacs__cands22,firstlast__cands22
5241,2022,H6NE03115,N00027623,Adrian Smith (R),R,NE03,NE03,Y,Y,I,RW,,Adrian Smith


----------------------------------------------------------------------------------------------------
DATASET: cmtes22


Unnamed: 0,cycle__cmtes22,cmteid__cmtes22,pacshort__cmtes22,affiliate__cmtes22,ultorg__cmtes22,recipid__cmtes22,recipcode__cmtes22,feccandid__cmtes22,party__cmtes22,primcode__cmtes22,source__cmtes22,sensitive__cmtes22,foreign__cmtes22,active__cmtes22
2686,2022,C00412890,Adrian Smith for Congress,,Adrian Smith for Congress,N00027623,RW,H6NE03115,R,Z1100,Rept,N,0,1


----------------------------------------------------------------------------------------------------
DATASET: pacs22


Unnamed: 0,cycle__pacs22,fecrecno__pacs22,pacid__pacs22,cid__pacs22,amount__pacs22,date__pacs22,realcode__pacs22,type__pacs22,di__pacs22,feccandid__pacs22
498732,2022,4021720221409043210,C00412890,N00045245,1000,12/15/2021,Z1100,24K,D,H0KS03236


In [511]:
# Sanity check pacs22.
df_pacs22[df_pacs22['cid__pacs22'] == member_cid]['amount__pacs22'].sum()

1169840

In [513]:
# Sanity check pac_other22.
df_pac_other22[df_pac_other22['recipid__pac_other22'] == member_cid]['amount__pac_other22'].sum()

87678.0

In [523]:
# Sanity check indivs22.
df_indivs22[df_indivs22['recipid__indivs22'] == member_cid]['amount__indivs22'].sum()

0.0

In [521]:
# Final check w/ indivs22.
df_pacs22[df_pacs22['cid__pacs22'] == member_cid]['amount__pacs22'].sum() \
    + df_pac_other22[df_pac_other22['recipid__pac_other22'] == member_cid]['amount__pac_other22'].sum() \
    + df_indivs22[df_indivs22['recipid__indivs22'] == member_cid]['amount__indivs22'].sum()

1257518.0

In [519]:
# Final check w/o indivs22.
df_pacs22[df_pacs22['cid__pacs22'] == member_cid]['amount__pacs22'].sum() \
    + df_pac_other22[df_pac_other22['recipid__pac_other22'] == member_cid]['amount__pac_other22'].sum()

1257518.0
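
The per-dataset sums above can be collected into one small table, which also shows how many rows matched in each dataset. This is a sketch on synthetic frames: only the column names mirror the notebook, and the IDs, amounts, and the helper name `summarize_receipts` are invented for illustration.

```python
import pandas as pd

# Synthetic stand-ins for df_pacs22, df_pac_other22 and df_indivs22.
toy_pacs = pd.DataFrame({
    'cid__pacs22': ['N1', 'N1', 'N2'],
    'amount__pacs22': [1000, 2500, 500],
})
toy_pac_other = pd.DataFrame({
    'recipid__pac_other22': ['N1', 'C9'],
    'amount__pac_other22': [250.0, 100.0],
})
toy_indivs = pd.DataFrame({
    'recipid__indivs22': ['N2'],
    'amount__indivs22': [75.0],
})


def summarize_receipts(cid, sources):
    """Return matched row count and total amount per dataset for one ID."""
    rows = []
    for name, (df, id_col, amount_col) in sources.items():
        matched = df[df[id_col] == cid]
        rows.append({
            'dataset': name,
            'rows': len(matched),
            'total': float(matched[amount_col].sum()),
        })
    return pd.DataFrame(rows).set_index('dataset')


sources = {
    'pacs22': (toy_pacs, 'cid__pacs22', 'amount__pacs22'),
    'pac_other22': (toy_pac_other, 'recipid__pac_other22', 'amount__pac_other22'),
    'indivs22': (toy_indivs, 'recipid__indivs22', 'amount__indivs22'),
}
summary = summarize_receipts('N1', sources)
grand_total = summary['total'].sum()
```

Reporting the row count next to the total helps tell "no contributions" apart from "the filter matched nothing". A total of 0.0 with zero matched rows may be worth checking against the data dictionary for that ID column.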

---

## Ways and Means
#### Members

In [674]:
wm_dems = [
    'Richard Neal',
    'Brad Schneider',
    'Jimmy Panetta',
    'Bill Pascrell',
    'Linda Sánchez',
    'Terri Sewell',
    'Gwen Moore',
    'Tom Suozzi',
    'Mike Thompson',
    'John Larson',
    'Stephanie Murphy',
    'Donald Beyer',
    'Earl Blumenauer',
    'Brendan Boyle',
    'Judy Chu',
    'Danny Davis',
    'Suzan DelBene',
    'Lloyd Doggett',
    'Dwight Evans',
    'Jimmy Gomez',
    'Brian Higgins',
    'Steven Horsford',
    'Dan Kildee',
    'Ronald Kind'
]

df_wm_dem_members = pd.DataFrame(wm_dems, columns=['firstlastp'])
df_wm_dem = pd.merge(df_cands22, df_wm_dem_members, left_on='firstlast__cands22', right_on='firstlastp', how='inner')
df_wm_dem = df_wm_dem.drop(['firstlastp'], axis=1) # Keep cycle__cands22 here; it is dropped after the concat below.

In [676]:
print(f"len: {len(df_wm_dem)}")
showdf(df_wm_dem)

len: 18


Unnamed: 0,cycle__cands22,feccandid__cands22,cid__cands22,firstlastp__cands22,party__cands22,distidrunfor__cands22,distidcurr__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,nopacs__cands22,firstlast__cands22
0,2022,H0AL07086,N00030622,Terri Sewell (D),D,AL07,AL07,Y,Y,I,DW,,Terri Sewell
1,2022,H0CA32101,N00030600,Judy Chu (D),D,CA28,CA27,Y,Y,I,DW,,Judy Chu
2,2022,H0WA08046,N00030693,Suzan DelBene (D),D,WA01,WA01,Y,Y,I,DW,,Suzan DelBene
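
The inner merge returns fewer rows than there are names in the list, so some names did not match `firstlast__cands22` exactly. One way to find them is a left merge from the roster with `indicator=True`. The sketch below uses invented names and spellings; it does not claim to show how any real member appears in cands22.

```python
import pandas as pd

# Synthetic candidate table; names and IDs are invented.
toy_cands = pd.DataFrame({
    'cid__cands22': ['N1', 'N2', 'N3'],
    'firstlast__cands22': ['Ann Example', 'Robert Sample', 'Jose Nunez'],
})

# Hand-typed roster with a nickname and accented characters.
toy_roster = pd.DataFrame({
    'firstlastp': ['Ann Example', 'Bob Sample', 'José Núñez'],
})

# Left merge from the roster keeps every typed name;
# the _merge column says whether a match was found.
check = pd.merge(
    toy_roster,
    toy_cands,
    left_on='firstlastp',
    right_on='firstlast__cands22',
    how='left',
    indicator=True,
)
unmatched = check.loc[check['_merge'] == 'left_only', 'firstlastp'].tolist()
```

Each name in `unmatched` can then be looked up by hand in the candidate table and respelled in the list, or the merge could be switched to `cid` once the IDs are known.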


In [678]:
wm_rep = [
    'Jason Smith', 
    'Vern Buchanan',
    'Kevin Hern',
    'Lloyd Smucker',
    'Mike Kelly',
    'Nicole Malliotakis',
    'Randy Feenstra',
    'Jodey Arrington',
    'Greg Steube',
    'Ron Estes',
    'Adrian Smith',
    'Beth Van Duyne',
    'Carol Miller',
    'Michelle Fischbach',
    'Michelle Steel',
    'Mike Carey',
    'Blake Moore',
    'Brad Wenstrup',
    'Brian Fitzpatrick',
    'Claudia Tenney',
    'Darin LaHood',
    'David Kustoff',
    'David Schweikert',
    'Drew Ferguson',
    'Gregory Murphy'
]

df_wm_rep_members = pd.DataFrame(wm_rep, columns=['firstlastp'])
df_wm_rep = pd.merge(df_cands22, df_wm_rep_members, left_on='firstlast__cands22', right_on='firstlastp', how='inner')
df_wm_rep = df_wm_rep.drop(['firstlastp'], axis=1) # Keep cycle__cands22 here; it is dropped after the concat below.

In [680]:
print(f"len: {len(df_wm_rep)}")
showdf(df_wm_rep)

len: 24


Unnamed: 0,cycle__cands22,feccandid__cands22,cid__cands22,firstlastp__cands22,party__cands22,distidrunfor__cands22,distidcurr__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,nopacs__cands22,firstlast__cands22
0,2022,H0CA48198,N00044501,Michelle Steel (R),R,CA45,CA48,Y,Y,I,RW,,Michelle Steel
1,2022,H0IA04145,N00044011,Randy Feenstra (R),R,IA04,IA04,Y,Y,I,RW,,Randy Feenstra
2,2022,H0MN07091,N00045251,Michelle Fischbach (R),R,MN07,MN07,Y,Y,I,RW,,Michelle Fischbach


In [718]:
df_wm_members = pd.concat([df_wm_dem, df_wm_rep])
df_wm_members = df_wm_members.reset_index(drop=True)

In [720]:
df_wm_members = df_wm_members.drop(['cycle__cands22', 'firstlastp__cands22', 'distidrunfor__cands22', 'distidcurr__cands22', 'nopacs__cands22'], axis=1)

In [722]:
print(f"len: {len(df_wm_members)}")
showdf(df_wm_members)

len: 42


Unnamed: 0,feccandid__cands22,cid__cands22,party__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,firstlast__cands22
0,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell
1,H0CA32101,N00030600,D,Y,Y,I,DW,Judy Chu
2,H0WA08046,N00030693,D,Y,Y,I,DW,Suzan DelBene


#### Member lead pacs
Cands22 &larr; Cmtes22 (non-transactional details of the candidate's lead PAC) &larr; Pacs22 (contributions from the lead PAC to the candidate)

**Lead PACs without transactions**

In [771]:
# Note: candidates may have multiple committees for reconciling accounts across cycles.
df_wm_leadpacs = pd.merge(df_wm_members, df_lead_cmtes22, left_on='cid__cands22', right_on='recipid_lead__cmtes22', how='inner')
df_wm_leadpacs = df_wm_leadpacs[(df_wm_leadpacs['primcode_lead__cmtes22'] == 'Z1200') | (df_wm_leadpacs['primcode_lead__cmtes22'] == 'Z1100')] # Exclude abnormal leadpacs (e.g., Jason Smith for President)
df_wm_leadpacs = df_wm_leadpacs.drop(['cycle_lead__cmtes22', 'recipid_lead__cmtes22', 'recipcode_lead__cmtes22',
                                      'feccandid_lead__cmtes22', 'party_lead__cmtes22', 'primcode_lead__cmtes22',
                                      'source_lead__cmtes22', 'sensitive_lead__cmtes22', 'foreign_lead__cmtes22'], axis=1)

In [773]:
print(f"len: {len(df_wm_leadpacs)}")
df_wm_leadpacs.head(3)

len: 50


Unnamed: 0,feccandid__cands22,cid__cands22,party__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,firstlast__cands22,cmteid_lead__cmtes22,pacshort_lead__cmtes22,affiliate_lead__cmtes22,ultorg_lead__cmtes22,active_lead__cmtes22
0,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1
1,H0CA32101,N00030600,D,Y,Y,I,DW,Judy Chu,C00458125,Judy Chu for Congress,,Judy Chu for Congress,1
2,H0WA08046,N00030693,D,Y,Y,I,DW,Suzan DelBene,C00459099,DelBene for Congress,,DelBene for Congress,1
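
The lead PAC table has more rows than there are members, which fits the note that candidates may have multiple committees. A short sketch for listing those candidates is below, using synthetic rows with invented IDs. It also writes the two-code filter with `isin`, which should behave the same as the two equality tests joined with `|`.

```python
import pandas as pd

# Synthetic lead-committee rows; IDs are invented.
toy_leadpacs = pd.DataFrame({
    'cid__cands22': ['N1', 'N2', 'N2', 'N3'],
    'cmteid_lead__cmtes22': ['C10', 'C20', 'C21', 'C30'],
    'primcode_lead__cmtes22': ['Z1100', 'Z1100', 'Z1200', 'Z1100'],
})

# Keep only the two committee codes used in the notebook's filter.
kept = toy_leadpacs[toy_leadpacs['primcode_lead__cmtes22'].isin(['Z1100', 'Z1200'])]

# Count distinct committees per candidate and keep those with more than one.
cmtes_per_cand = kept.groupby('cid__cands22')['cmteid_lead__cmtes22'].nunique()
multi = cmtes_per_cand[cmtes_per_cand > 1]
```

This matters for any later merge keyed on the candidate ID: each extra committee row would repeat that candidate's contributions once more. That is an observation about the join shape, not a verified finding in the data.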


#### Lead PAC receipts

In [803]:
# Link lead pac details
# Keep the "di" field; for now, track both direct and indirect income:
# https://raw.githubusercontent.com/nketchum/si608exploration/refs/heads/main/docs/direct_vs_indirect_contrib.txt
df_wm_leadpac_flows = pd.merge(df_wm_leadpacs, df_pacs22, left_on='cid__cands22', right_on='cid__pacs22', how='inner')
df_wm_leadpac_flows = df_wm_leadpac_flows.drop(['cycle__pacs22', 'cid__pacs22', 'feccandid__pacs22'], axis=1)

In [805]:
# print(len(df_wm_leadpac_flows))
showdf(df_wm_leadpac_flows)

Unnamed: 0,feccandid__cands22,cid__cands22,party__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,firstlast__cands22,cmteid_lead__cmtes22,pacshort_lead__cmtes22,affiliate_lead__cmtes22,ultorg_lead__cmtes22,active_lead__cmtes22,fecrecno__pacs22,pacid__pacs22,amount__pacs22,date__pacs22,realcode__pacs22,type__pacs22,di__pacs22
0,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1,2022720231732787080,C00388157,2000,09/15/2022,M1000,24K,D
1,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1,2022820231732873080,C00806984,5000,10/01/2022,H3900,24K,D
2,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1,2022820231732873082,C00806984,5000,10/01/2022,H3900,24K,D
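
Since the `di` field is kept to track both kinds of income, the flows can be split by it per candidate. The sketch below assumes `'D'` marks direct and `'I'` marks indirect contributions, as the linked note describes. The rows are synthetic.

```python
import pandas as pd

# Synthetic flow rows; only the column names mirror df_wm_leadpac_flows.
toy_flows = pd.DataFrame({
    'cid__cands22': ['N1', 'N1', 'N1', 'N2'],
    'di__pacs22': ['D', 'D', 'I', 'D'],
    'amount__pacs22': [2000, 5000, 300, 1000],
})

# One row per candidate, one column per di value; a missing combination becomes 0.
by_di = (
    toy_flows
    .groupby(['cid__cands22', 'di__pacs22'])['amount__pacs22']
    .sum()
    .unstack(fill_value=0)
)
```

Keeping the two columns side by side makes it easy to drop the indirect column later without rerunning the merge.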


#### Leadpac detailed receipts
(Money flows + details of money flow)

In [859]:
# Incoming pac contributions to leadpac EXCEPT contributions from other leadpacs
df_wm_leadpac_flow_details = pd.merge(df_wm_leadpac_flows, df_pac_cmtes22, left_on='pacid__pacs22', right_on='cmteid_pac__cmtes22', how='inner')
df_wm_leadpac_flow_details = df_wm_leadpac_flow_details.drop(['cycle_pac__cmtes22', 'cmteid_pac__cmtes22', 'recipid_pac__cmtes22', 'sensitive_pac__cmtes22'], axis=1)

In [861]:
to_csv(df_wm_leadpac_flow_details)
showdf(df_wm_leadpac_flow_details)

DataFrame saved as CSV file in ../../outputs/df_wm_leadpac_flow_details.csv


Unnamed: 0,feccandid__cands22,cid__cands22,party__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,firstlast__cands22,cmteid_lead__cmtes22,pacshort_lead__cmtes22,affiliate_lead__cmtes22,ultorg_lead__cmtes22,active_lead__cmtes22,fecrecno__pacs22,pacid__pacs22,amount__pacs22,date__pacs22,realcode__pacs22,type__pacs22,di__pacs22,pacshort_pac__cmtes22,affiliate_pac__cmtes22,ultorg_pac__cmtes22,recipcode_pac__cmtes22,feccandid_pac__cmtes22,party_pac__cmtes22,primcode_pac__cmtes22,source_pac__cmtes22,foreign_pac__cmtes22,active_pac__cmtes22
0,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1,2022720231732787080,C00388157,2000,09/15/2022,M1000,24K,D,American Composites Manufacturers Assn,,American Composites Manufacturers Assn,PB,,,M1000,WAA21,0,1
1,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1,2022820231732873080,C00806984,5000,10/01/2022,H3900,24K,D,Axxess Technology Solutions,,Axxess Technology Solutions,PB,,,H3900,WAA22,0,1
2,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1,2022820231732873082,C00806984,5000,10/01/2022,H3900,24K,D,Axxess Technology Solutions,,Axxess Technology Solutions,PB,,,H3900,WAA22,0,1


#### Leadpac industry for each receipt

In [895]:
df_wm_leadpac_flow_details_ind = pd.merge(df_wm_leadpac_flow_details, df_crp_cats, left_on='primcode_pac__cmtes22', right_on='catcode__crp_cats', how='left')
df_wm_leadpac_flow_details_ind = df_wm_leadpac_flow_details_ind.drop(['catcode__crp_cats'], axis=1)

In [897]:
showdf(df_wm_leadpac_flow_details_ind)

Unnamed: 0,feccandid__cands22,cid__cands22,party__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,firstlast__cands22,cmteid_lead__cmtes22,pacshort_lead__cmtes22,affiliate_lead__cmtes22,ultorg_lead__cmtes22,active_lead__cmtes22,fecrecno__pacs22,pacid__pacs22,amount__pacs22,date__pacs22,realcode__pacs22,type__pacs22,di__pacs22,pacshort_pac__cmtes22,affiliate_pac__cmtes22,ultorg_pac__cmtes22,recipcode_pac__cmtes22,feccandid_pac__cmtes22,party_pac__cmtes22,primcode_pac__cmtes22,source_pac__cmtes22,foreign_pac__cmtes22,active_pac__cmtes22,catname__crp_cats,catorder__crp_cats,industry__crp_cats,sector__crp_cats,sector_long__crp_cats
0,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1,2022720231732787080,C00388157,2000,09/15/2022,M1000,24K,D,American Composites Manufacturers Assn,,American Composites Manufacturers Assn,PB,,,M1000,WAA21,0,1,Chemicals,N13,Chemical & Related Manufacturing,Misc Business,Misc Business
1,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1,2022820231732873080,C00806984,5000,10/01/2022,H3900,24K,D,Axxess Technology Solutions,,Axxess Technology Solutions,PB,,,H3900,WAA22,0,1,Health care Consultants,H03,Health Services/HMOs,Health,Health
2,H0AL07086,N00030622,D,Y,Y,I,DW,Terri Sewell,C00458976,Terri Sewell for Congress,,Terri Sewell for Congress,1,2022820231732873082,C00806984,5000,10/01/2022,H3900,24K,D,Axxess Technology Solutions,,Axxess Technology Solutions,PB,,,H3900,WAA22,0,1,Health care Consultants,H03,Health Services/HMOs,Health,Health
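
Because the industry merge is a left merge, receipts whose `primcode` has no entry in the category table stay in the frame with empty category columns. The sketch below lists those codes and then totals receipts by sector. The rows are synthetic and `'Q9999'` is a made-up code used only to force a miss.

```python
import pandas as pd

# Synthetic receipts and category table; only the column names mirror the notebook.
toy_receipts = pd.DataFrame({
    'pacid__pacs22': ['C1', 'C2', 'C3'],
    'amount__pacs22': [2000, 5000, 1000],
    'primcode_pac__cmtes22': ['M1000', 'H3900', 'Q9999'],
})
toy_cats = pd.DataFrame({
    'catcode__crp_cats': ['M1000', 'H3900'],
    'sector__crp_cats': ['Misc Business', 'Health'],
})

merged = pd.merge(
    toy_receipts,
    toy_cats,
    left_on='primcode_pac__cmtes22',
    right_on='catcode__crp_cats',
    how='left',
)

# Codes that found no category row.
missing_codes = (
    merged.loc[merged['sector__crp_cats'].isna(), 'primcode_pac__cmtes22']
    .unique()
    .tolist()
)

# Label the misses explicitly so they are not silently dropped by groupby.
sector_totals = (
    merged
    .assign(sector=merged['sector__crp_cats'].fillna('Unmatched'))
    .groupby('sector')['amount__pacs22']
    .sum()
    .sort_values(ascending=False)
)
```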


In [977]:
# df_crp_cats.head()

In [979]:
# df_lob_issue[(df_lob_issue['issue__lob_issue'] == 'Agriculture') & (df_lob_issue['specificissue__lob_issue'] == 'Agribusiness')]
# df_lob_issue[(df_lob_issue['issue__lob_issue'] == 'Chemical & Related Manufacturing')]

In [981]:
# df_lob_issue.sample()

In [983]:
# df_lob_issue['issueid__lob_issue'].unique()[:300]

In [985]:
# df_lob_issue[df_lob_issue['issue__lob_issue'] == 'Chemical & Related Manufacturing']

In [989]:
# showdf(df_lob_bills)

#### Output as text file

In [987]:
# Write the first row as text; the context manager closes the file even on error.
with open("../../outputs/df_wm_leadpac_flow_details.txt", "w") as text_file:
    text_file.write(df_wm_leadpac_flow_details.iloc[0].to_string())

---

#### Empirical Tests

In [869]:
# Confirm that Adrian Smith has received at least one contribution from Elevance Health (affiliate PAC is Blue Cross/Blue Shield).
df_wm_leadpac_flow_details[(df_wm_leadpac_flow_details['cid__cands22'] == 'N00027623') & (df_wm_leadpac_flow_details['pacid__pacs22'] == 'C00197228')]

Unnamed: 0,feccandid__cands22,cid__cands22,party__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,firstlast__cands22,cmteid_lead__cmtes22,pacshort_lead__cmtes22,affiliate_lead__cmtes22,ultorg_lead__cmtes22,active_lead__cmtes22,fecrecno__pacs22,pacid__pacs22,amount__pacs22,date__pacs22,realcode__pacs22,type__pacs22,di__pacs22,pacshort_pac__cmtes22,affiliate_pac__cmtes22,ultorg_pac__cmtes22,recipcode_pac__cmtes22,feccandid_pac__cmtes22,party_pac__cmtes22,primcode_pac__cmtes22,source_pac__cmtes22,foreign_pac__cmtes22,active_pac__cmtes22
49927,H6NE03115,N00027623,R,Y,Y,I,RW,Adrian Smith,C00412890,Adrian Smith for Congress,,Adrian Smith for Congress,1,4042820221473212259,C00197228,2500,03/10/2022,H3700,24K,D,Elevance Health,Blue Cross/Blue Shield,Blue Cross/Blue Shield,PB,,,H3700,WAA21,0,1
50046,H6NE03115,N00027623,R,Y,Y,I,RW,Adrian Smith,C00412890,Adrian Smith for Congress,,Adrian Smith for Congress,1,4062320221531156619,C00197228,2500,05/18/2022,H3700,24K,D,Elevance Health,Blue Cross/Blue Shield,Blue Cross/Blue Shield,PB,,,H3700,WAA21,0,1
50047,H6NE03115,N00027623,R,Y,Y,I,RW,Adrian Smith,C00412890,Adrian Smith for Congress,,Adrian Smith for Congress,1,4062320221531156620,C00197228,2500,05/25/2022,H3700,24K,D,Elevance Health,Blue Cross/Blue Shield,Blue Cross/Blue Shield,PB,,,H3700,WAA21,0,1


---

## Toy Network
Start at the candidate, find a donation from the \[lead]pac, trace it to the pac (if it came from the lead pac), then trace it to the company. *Do not delete anything in this section.*

In [1119]:
df_toy_network = pd.DataFrame()
df_full_network = pd.DataFrame()
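
The trace described above (donor PAC, then lead PAC, then candidate) can also be flattened into a weighted edge list. This is a sketch under assumptions, not the notebook's method: the rows are synthetic, the node labels are invented, and the `source`/`target`/`weight` column names are a choice made here.

```python
import pandas as pd

# Synthetic flow rows; only the column names mirror df_wm_leadpac_flow_details.
toy_flows = pd.DataFrame({
    'pacshort_pac__cmtes22': ['Donor PAC A', 'Donor PAC A', 'Donor PAC B'],
    'pacshort_lead__cmtes22': ['Lead Cmte X', 'Lead Cmte X', 'Lead Cmte X'],
    'firstlast__cands22': ['Candidate X', 'Candidate X', 'Candidate X'],
    'amount__pacs22': [2500, 2500, 1000],
})


def edges_between(df, source_col, target_col, amount_col='amount__pacs22'):
    """Sum amounts for each (source, target) pair and rename to edge columns."""
    return (
        df.groupby([source_col, target_col], as_index=False)[amount_col]
        .sum()
        .rename(columns={source_col: 'source', target_col: 'target', amount_col: 'weight'})
    )


# Donor PAC -> lead committee, then lead committee -> candidate.
pac_edges = edges_between(toy_flows, 'pacshort_pac__cmtes22', 'pacshort_lead__cmtes22')
cand_edges = edges_between(toy_flows, 'pacshort_lead__cmtes22', 'firstlast__cands22')
edges = pd.concat([pac_edges, cand_edges], ignore_index=True)
```

An edge list in this shape should be accepted by networkx's `from_pandas_edgelist`, though that step is not shown here.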

### From candidate to donor

Adrian Smith will be used in the toy networks. The full networks will include everyone.

**Candidate**

In [1123]:
# Adrian Smith
df_toy_network = df_cands22[df_cands22['cid__cands22'] == member_cid]
showdf(df_cands22[df_cands22['cid__cands22'] == member_cid])

Unnamed: 0,cycle__cands22,feccandid__cands22,cid__cands22,firstlastp__cands22,party__cands22,distidrunfor__cands22,distidcurr__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,nopacs__cands22,firstlast__cands22
5241,2022,H6NE03115,N00027623,Adrian Smith (R),R,NE03,NE03,Y,Y,I,RW,,Adrian Smith


In [1125]:
df_full_network = df_wm_members.copy()

**Candidate's lead pac**

In [1130]:
# Adrian Smith's lead pac cmte
df_toy_network = pd.merge(df_toy_network, df_lead_cmtes22[df_lead_cmtes22['recipid_lead__cmtes22'] == member_cid], left_on='cid__cands22', right_on='recipid_lead__cmtes22', how='inner')
showdf(df_lead_cmtes22[df_lead_cmtes22['recipid_lead__cmtes22'] == member_cid])

Unnamed: 0,cycle_lead__cmtes22,cmteid_lead__cmtes22,pacshort_lead__cmtes22,affiliate_lead__cmtes22,ultorg_lead__cmtes22,recipid_lead__cmtes22,recipcode_lead__cmtes22,feccandid_lead__cmtes22,party_lead__cmtes22,primcode_lead__cmtes22,source_lead__cmtes22,sensitive_lead__cmtes22,foreign_lead__cmtes22,active_lead__cmtes22
2686,2022,C00412890,Adrian Smith for Congress,,Adrian Smith for Congress,N00027623,RW,H6NE03115,R,Z1100,Rept,N,0,1


In [1114]:
df_full_network = pd.merge(df_full_network, df_cmtes22, left_on='cid__cands22', right_on='recipid__cmtes22', how='inner')

##### Reminder: in pac_other22, most rows are PAC-to-PAC (recipient ID starts with "C"); some are PAC-to-candidate (recipient ID starts with "N")

In [1192]:
# len(df_pac_other22)

In [1194]:
# len(df_pac_other22[df_pac_other22['recipid__pac_other22'].str.startswith('C', na=False)])  # na=False: do not count missing IDs

In [1196]:
# len(df_pac_other22[df_pac_other22['recipid__pac_other22'].str.startswith('N', na=False)])  # na=False: do not count missing IDs

In [1198]:
# df_pac_other22[df_pac_other22['recipid__pac_other22'].str.startswith('N', na=False)].sample()
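
The split between committee and candidate recipients can be counted in one pass by looking at the first character of the recipient ID. The sketch below runs on a synthetic column with one missing ID. Passing `na=False` to `str.startswith` keeps missing IDs out of both counts, so the three numbers add up to the row count.

```python
import pandas as pd

# Synthetic recipient IDs, including one missing value.
toy_pac_other = pd.DataFrame({
    'recipid__pac_other22': ['C001', 'C002', 'N001', None, 'C003'],
})
recip = toy_pac_other['recipid__pac_other22']

# Count every first character at once; the missing ID gets its own bucket.
prefix_counts = recip.str[0].value_counts(dropna=False)

# Explicit counts; na=False treats a missing ID as "does not start with".
n_cmte = int(recip.str.startswith('C', na=False).sum())
n_cand = int(recip.str.startswith('N', na=False).sum())
n_missing = int(recip.isna().sum())
```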

In [1190]:
df_pacs22[df_pacs22['pacid__pacs22'] == 'C00387464'].sample()

Unnamed: 0,cycle__pacs22,fecrecno__pacs22,pacid__pacs22,cid__pacs22,amount__pacs22,date__pacs22,realcode__pacs22,type__pacs22,di__pacs22,feccandid__pacs22
685570,2022,4102120221590178640,C00387464,N00048454,5000,08/03/2022,J2200,24K,D,S2WA00387


In [1188]:
df_pac_other22[df_pac_other22['filerid__pac_other22'] == 'C00387464'] # yes
# df_pac_other22[df_pac_other22['recipid__pac_other22'] == 'C00387464'] # yes

Unnamed: 0,cycle__pac_other22,fecrecno__pac_other22,filerid__pac_other22,donorcmte__pac_other22,contriblendtrans__pac_other22,city__pac_other22,state__pac_other22,zip__pac_other22,fecoccemp__pac_other22,primcode__pac_other22,date__pac_other22,amount__pac_other22,recipid__pac_other22,party__pac_other22,otherid__pac_other22,recipcode__pac_other22,recipprimcode__pac_other22,amend__pac_other22,report__pac_other22,pg__pac_other22,microfilm__pac_other22,type__pac_other22,realcode__pac_other22,source__pac_other22,donorid__pac_other22
3365,2022,4010920231666527579,C00387464,American Council of Engineering Cos,AMERICAN COUNCIL OF ENGINEERING COMPANIES (ACE...,WASHINGTON,DC,20005.0,,B4000,11/08/2022,5000.0,C00387464,R,C00010868,PI,J2200,N,30G,P,202212089547395102,18K,B4000,PAC,C00010868
3366,2022,4010920231666527580,C00387464,American Hospital Assn,AMERICAN HOSPITAL ASSOCIATION PAC,WASHINGTON,DC,20001.0,,H2100,11/08/2022,2500.0,C00387464,R,C00106146,PI,J2200,N,30G,P,202212089547395102,18K,H2100,PAC,C00106146
3367,2022,4010920231666527581,C00387464,American Property Casualty Insurance Assn,AMERICAN PROPERTY CASUALTY INSURANCE ASSOCIATI...,CHICAGO,IL,60631.0,,F3400,11/05/2022,1000.0,C00387464,R,C00692806,PI,J2200,N,30G,P,202212089547395102,18K,F3400,PAC,C00692806
3368,2022,4010920231666527582,C00387464,American Seniors Housing Assn,AMERICAN SENIORS HOUSING ASSOCIATION (SENIORS ...,WASHINGTON,DC,20015.0,,H2200,11/05/2022,2500.0,C00387464,R,C00325332,PI,J2200,N,30G,P,202212089547395103,18K,H2200,PAC,C00325332
3369,2022,4010920231666527583,C00387464,Charter Communications,CHARTER COMMUNICATIONS INC. POLITICAL ACTION C...,WASHINGTON,DC,20001.0,,C4500,11/05/2022,5000.0,C00387464,R,C00426775,PI,J2200,N,30G,P,202212089547395103,18K,C4500,PAC,C00426775
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1013407,2022,4112220221630609566,C00387464,Alamo PAC,MIKE LEE VICTORY FUND,SALT LAKE CITY,UT,84111.0,,J2200,10/19/2022,10000.0,C00767368,R,C00767368,RP,Z4100,N,12G,P,202210269541694941,24K,J2200,PAC,C00387464
1013408,2022,4112220221630609567,C00387464,Alamo PAC,OZ VICTORY FUND,ALEXANDRIA,VA,22313.0,,J2200,10/19/2022,15000.0,C00819177,R,C00819177,RP,Z4100,N,12G,P,202210269541694942,24K,J2200,PAC,C00387464
1013409,2022,4112220221630609568,C00387464,Alamo PAC,RUBIO VICTORY,MIAMI,FL,33153.0,,J2200,10/19/2022,10000.0,C00494617,R,C00494617,RP,Z4100,N,12G,P,202210269541694942,24K,J2200,PAC,C00387464
1013410,2022,4112220221630609570,C00387464,Alamo PAC,PEOPLES CHAMPION PAC,ATLANTA,GA,31150.0,,J2200,10/19/2022,5000.0,C00792069,R,C00792069,PI,J2400,N,12G,P,202210269541694943,24K,J2200,PAC,C00387464


In [1242]:
# Test a non-lead PAC and see whether it shows up in cmtes22 (this cell), then in pacs22 and pac_other22 (next two cells).
df_cmtes22[df_cmtes22['cmteid__cmtes22'] == 'C00822627']

Unnamed: 0,cycle__cmtes22,cmteid__cmtes22,pacshort__cmtes22,affiliate__cmtes22,ultorg__cmtes22,recipid__cmtes22,recipcode__cmtes22,feccandid__cmtes22,party__cmtes22,primcode__cmtes22,source__cmtes22,sensitive__cmtes22,foreign__cmtes22,active__cmtes22
18103,2022,C00822627,Masters Victory Cmte,,Masters Victory Cmte,C00822627,RP,,R,Z4100,Rept,N,0,1


In [1182]:
df_pacs22[df_pacs22['pacid__pacs22'] == 'C00822627']

Unnamed: 0,cycle__pacs22,fecrecno__pacs22,pacid__pacs22,cid__pacs22,amount__pacs22,date__pacs22,realcode__pacs22,type__pacs22,di__pacs22,feccandid__pacs22


In [1176]:
df_pac_other22[df_pac_other22['filerid__pac_other22'] == 'C00822627']

Unnamed: 0,cycle__pac_other22,fecrecno__pac_other22,filerid__pac_other22,donorcmte__pac_other22,contriblendtrans__pac_other22,city__pac_other22,state__pac_other22,zip__pac_other22,fecoccemp__pac_other22,primcode__pac_other22,date__pac_other22,amount__pac_other22,recipid__pac_other22,party__pac_other22,otherid__pac_other22,recipcode__pac_other22,recipprimcode__pac_other22,amend__pac_other22,report__pac_other22,pg__pac_other22,microfilm__pac_other22,type__pac_other22,realcode__pac_other22,source__pac_other22,donorid__pac_other22
3365,2022,4010920231666527579,C00387464,American Council of Engineering Cos,AMERICAN COUNCIL OF ENGINEERING COMPANIES (ACE...,WASHINGTON,DC,20005.0,,B4000,11/08/2022,5000.0,C00387464,R,C00010868,PI,J2200,N,30G,P,202212089547395102,18K,B4000,PAC,C00010868
3366,2022,4010920231666527580,C00387464,American Hospital Assn,AMERICAN HOSPITAL ASSOCIATION PAC,WASHINGTON,DC,20001.0,,H2100,11/08/2022,2500.0,C00387464,R,C00106146,PI,J2200,N,30G,P,202212089547395102,18K,H2100,PAC,C00106146
3367,2022,4010920231666527581,C00387464,American Property Casualty Insurance Assn,AMERICAN PROPERTY CASUALTY INSURANCE ASSOCIATI...,CHICAGO,IL,60631.0,,F3400,11/05/2022,1000.0,C00387464,R,C00692806,PI,J2200,N,30G,P,202212089547395102,18K,F3400,PAC,C00692806
3368,2022,4010920231666527582,C00387464,American Seniors Housing Assn,AMERICAN SENIORS HOUSING ASSOCIATION (SENIORS ...,WASHINGTON,DC,20015.0,,H2200,11/05/2022,2500.0,C00387464,R,C00325332,PI,J2200,N,30G,P,202212089547395103,18K,H2200,PAC,C00325332
3369,2022,4010920231666527583,C00387464,Charter Communications,CHARTER COMMUNICATIONS INC. POLITICAL ACTION C...,WASHINGTON,DC,20001.0,,C4500,11/05/2022,5000.0,C00387464,R,C00426775,PI,J2200,N,30G,P,202212089547395103,18K,C4500,PAC,C00426775
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1013407,2022,4112220221630609566,C00387464,Alamo PAC,MIKE LEE VICTORY FUND,SALT LAKE CITY,UT,84111.0,,J2200,10/19/2022,10000.0,C00767368,R,C00767368,RP,Z4100,N,12G,P,202210269541694941,24K,J2200,PAC,C00387464
1013408,2022,4112220221630609567,C00387464,Alamo PAC,OZ VICTORY FUND,ALEXANDRIA,VA,22313.0,,J2200,10/19/2022,15000.0,C00819177,R,C00819177,RP,Z4100,N,12G,P,202210269541694942,24K,J2200,PAC,C00387464
1013409,2022,4112220221630609568,C00387464,Alamo PAC,RUBIO VICTORY,MIAMI,FL,33153.0,,J2200,10/19/2022,10000.0,C00494617,R,C00494617,RP,Z4100,N,12G,P,202210269541694942,24K,J2200,PAC,C00387464
1013410,2022,4112220221630609570,C00387464,Alamo PAC,PEOPLES CHAMPION PAC,ATLANTA,GA,31150.0,,J2200,10/19/2022,5000.0,C00792069,R,C00792069,PI,J2400,N,12G,P,202210269541694943,24K,J2200,PAC,C00387464


**Contributions from pacs to the candidate's lead pac**

In [1132]:
# Adrian Smith's lead pac incoming contributions.
df_toy_network = pd.merge(df_toy_network, df_pacs22[df_pacs22['cid__pacs22'] == member_cid], left_on='cid__cands22', right_on='cid__pacs22', how='inner')
showdf(df_pacs22[df_pacs22['cid__pacs22'] == member_cid])

KeyError: 'cid_pac__pacs22'
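
A `KeyError` from `pd.merge` usually means one of the join keys is missing from its frame. As a minimal sketch, the keys can be checked before merging. The helper `check_merge_keys` and the toy frames are hypothetical and not part of the project modules.

```python
import pandas as pd

def check_merge_keys(left, right, left_on, right_on):
    """Return (side, key) pairs for join keys missing from their frame (hypothetical helper)."""
    missing = []
    if left_on not in left.columns:
        missing.append(("left", left_on))
    if right_on not in right.columns:
        missing.append(("right", right_on))
    return missing

# Toy stand-ins for df_toy_network and df_pacs22.
toy_network = pd.DataFrame({"cid__cands22": ["N00027623"]})
toy_pacs = pd.DataFrame({"cid__pacs22": ["N00027623"], "amount__pacs22": [2500]})

# The first call uses the key named in the KeyError; the second uses the real column.
bad_keys = check_merge_keys(toy_network, toy_pacs, "cid__cands22", "cid_pac__pacs22")
good_keys = check_merge_keys(toy_network, toy_pacs, "cid__cands22", "cid__pacs22")
print(bad_keys)
print(good_keys)
```

An empty list means both keys exist and the merge can proceed.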

In [1093]:
df_full_network = pd.merge(df_full_network, df_pacs22, left_on='cid__cands22', right_on='cid__pacs22', how='inner')

In [1117]:
df_toy_network.head()

Unnamed: 0,cycle__cands22,feccandid__cands22,cid__cands22,firstlastp__cands22,party__cands22,distidrunfor__cands22,distidcurr__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,nopacs__cands22,firstlast__cands22,cycle__cmtes22,cmteid__cmtes22,pacshort__cmtes22,affiliate__cmtes22,ultorg__cmtes22,recipid__cmtes22,recipcode__cmtes22,feccandid__cmtes22,party__cmtes22,primcode__cmtes22,source__cmtes22,sensitive__cmtes22,foreign__cmtes22,active__cmtes22
0,2022,H6NE03115,N00027623,Adrian Smith (R),R,NE03,NE03,Y,Y,I,RW,,Adrian Smith,2022,C00412890,Adrian Smith for Congress,,Adrian Smith for Congress,N00027623,RW,H6NE03115,R,Z1100,Rept,N,0,1


In [1060]:
# # ONLY FOR ILLUSTRATION: there are many C00456335 in the list.
# # For each of these I need to attach the pac info – Physicians Mutual Insurance
# df_pacs22[df_pacs22['pacid__pacs22'] == 'C00456335']

**Pac contributing to candidate's lead pac**

In [1063]:
# PAC ID contributing to Adrian (from row 0 above)
pacid_giving_to_adrian = df_pacs22[df_pacs22['cid__pacs22'] == member_cid].iloc[0]['pacid__pacs22']
pacid_giving_to_adrian

'C00456335'

In [1071]:
df_toy_network[df_toy_network['pacid__pacs22'] == pacid_giving_to_adrian]

Unnamed: 0,cycle__cands22,feccandid__cands22,cid__cands22,firstlastp__cands22,party__cands22,distidrunfor__cands22,distidcurr__cands22,currcand__cands22,cyclecand__cands22,crpico__cands22,recipcode__cands22,nopacs__cands22,firstlast__cands22,cycle__cmtes22,cmteid__cmtes22,pacshort__cmtes22,affiliate__cmtes22,ultorg__cmtes22,recipid__cmtes22,recipcode__cmtes22,feccandid__cmtes22,party__cmtes22,primcode__cmtes22,source__cmtes22,sensitive__cmtes22,foreign__cmtes22,active__cmtes22,cycle__pacs22,fecrecno__pacs22,pacid__pacs22,cid__pacs22,amount__pacs22,date__pacs22,realcode__pacs22,type__pacs22,di__pacs22,feccandid__pacs22
0,2022,H6NE03115,N00027623,Adrian Smith (R),R,NE03,NE03,Y,Y,I,RW,,Adrian Smith,2022,C00412890,Adrian Smith for Congress,,Adrian Smith for Congress,N00027623,RW,H6NE03115,R,Z1100,Rept,N,0,1,2022,2022720231732787170,C00456335,N00027623,2500,10/06/2022,F3100,24K,D,H6NE03115


In [1073]:
# Name of contributor from above.
pac_giving_to_adrian = df_cmtes22[df_cmtes22['cmteid__cmtes22'] == pacid_giving_to_adrian].iloc[0]['ultorg__cmtes22']
pac_giving_to_adrian

'Physicians Mutual Insurance'
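
The lookups above use `.iloc[0]`, which raises an `IndexError` when the filter matches no rows. A minimal sketch of a safer lookup follows. `first_value` is a hypothetical helper, and `C99999999` is a made-up id used only to show the no-match case.

```python
import pandas as pd

def first_value(df, key_col, key, value_col):
    """Return value_col from the first row where key_col == key, or None if no row matches."""
    matches = df.loc[df[key_col] == key, value_col]
    return matches.iloc[0] if not matches.empty else None

# Toy stand-in for df_cmtes22, using ids and names from the outputs above.
toy_cmtes = pd.DataFrame({
    "cmteid__cmtes22": ["C00456335", "C00412890"],
    "ultorg__cmtes22": ["Physicians Mutual Insurance", "Adrian Smith for Congress"],
})

found = first_value(toy_cmtes, "cmteid__cmtes22", "C00456335", "ultorg__cmtes22")
missing = first_value(toy_cmtes, "cmteid__cmtes22", "C99999999", "ultorg__cmtes22")  # made-up id
print(found, missing)
```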

In [1075]:
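# The industry of the contributor from above.
# NOTE: this merge raises the KeyError shown below. df_toy_network has no
# 'primcode_pac__cmtes22' column; the contributing pac's df_cmtes22 details
# would have to be merged in under that name first.
df_toy_network = pd.merge(df_toy_network, df_lob_indus[df_lob_indus['client__lob_indus'] == pac_giving_to_adrian], left_on='primcode_pac__cmtes22', right_on='catcode__lob_indus', how='left')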
showdf(df_lob_indus[df_lob_indus['client__lob_indus'] == pac_giving_to_adrian])

KeyError: 'primcode_pac__cmtes22'
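
The missing `primcode_pac__cmtes22` column suggests the contributing pac's committee details need to be attached under role-tagged names, since the lead pac's `*__cmtes22` columns are already in the frame. The sketch below shows one way to do that on toy data. The `add_role` helper and the `_pac` naming scheme are assumptions, and the contributor's `primcode` value is a placeholder because it does not appear in the outputs above.

```python
import pandas as pd

def add_role(column, role):
    """Insert a role tag before the '__table' suffix (hypothetical naming scheme)."""
    base, table = column.split("__", 1)
    return f"{base}_{role}__{table}"

# Toy stand-in for df_toy_network after the pacs22 merge.
toy_network = pd.DataFrame({
    "cid__cands22": ["N00027623"],
    "primcode__cmtes22": ["Z1100"],   # the lead pac's own primcode
    "pacid__pacs22": ["C00456335"],   # the contributing pac
})
# Toy stand-in for the contributor's row in df_cmtes22.
toy_cmtes = pd.DataFrame({
    "cmteid__cmtes22": ["C00456335"],
    "ultorg__cmtes22": ["Physicians Mutual Insurance"],
    "primcode__cmtes22": ["F3100"],   # placeholder value, assumed for illustration
})

# Rename every contributor column, e.g. primcode__cmtes22 -> primcode_pac__cmtes22.
pac_details = toy_cmtes.rename(columns=lambda c: add_role(c, "pac"))
merged = pd.merge(toy_network, pac_details,
                  left_on="pacid__pacs22", right_on="cmteid_pac__cmtes22", how="left")
print(sorted(merged.columns))
```

Renaming before the merge keeps the lead pac's and the contributor's columns distinct without relying on pandas' automatic `_x`/`_y` suffixes.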

In [None]:
df_full_network = pd.merge(df_full_network, df_lob_indus, left_on='primcode_pac__cmtes22', right_on='catcode__lob_indus', how='left')

**Industry of contrib pac giving to candidate's lead pac**

In [None]:
# Take one of the catcodes for this single contributor to Adrian.
catcode_giving_to_adrian = df_lob_indus[df_lob_indus['client__lob_indus'] == pac_giving_to_adrian].iloc[0]['catcode__lob_indus']
catcode_giving_to_adrian

In [None]:
# Catcode details for this single contributor to Adrian.
showdf(df_crp_cats[df_crp_cats['catcode'] == catcode_giving_to_adrian])

**Contributing pac's lobby**

In [None]:
# The contributor details, and their lobbying firms.
df_toy_network = pd.merge(df_toy_network, df_lob_lobbying[df_lob_lobbying['ultorg__lob_lobbying'] == pac_giving_to_adrian], left_on='ultorg_pac__cmtes22', right_on='ultorg__lob_lobbying', how='inner')

showdf(df_lob_lobbying[df_lob_lobbying['ultorg__lob_lobbying'] == pac_giving_to_adrian])

In [None]:
lobby_of_contrib_pac = df_lob_lobbying[df_lob_lobbying['ultorg__lob_lobbying'] == pac_giving_to_adrian].iloc[0]['uniqid__lob_lobbying']
lobby_of_contrib_pac

In [None]:
# Individuals associated with the particular lobby (empty in this example)
df_toy_network = pd.merge(df_toy_network, df_lob_lobbyist[df_lob_lobbyist['uniqid__lob_lobbyist'] == lobby_of_contrib_pac], left_on='uniqid__lob_lobbying', right_on='uniqid__lob_lobbyist', how='left')
                          
showdf(df_lob_lobbyist[df_lob_lobbyist['uniqid__lob_lobbyist'] == lobby_of_contrib_pac])

In [None]:
# # SAVE EXAMPLE OF LINKING LOBBYISTS AND LOBBIES
# # I found which lobbyists are part of bigger lobbies by extracting duplicated uniqids.
# df_lob_lobbyist[df_lob_lobbyist['uniqid__lob_lobbyist'].duplicated()].head(1)

In [None]:
# # SAVE EXAMPLE OF LINKING LOBBYISTS AND LOBBIES
# # You can take the lobbying uniqid and find individuals associated with this firm.
# df_lob_lobbying[df_lob_lobbying['uniqid__lob_lobbying'] == 'A5D185B7-3D2B-4C6B-B881-6385E3918B89']
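
The duplicated-uniqid idea in the notes above can be sketched on toy data. Note that `duplicated()` by default flags only the second and later occurrences, so `keep=False` is used here to return every lobbyist sharing a uniqid. The uniqid values, the names, and the `lobbyist__lob_lobbyist` column are made up for illustration.

```python
import pandas as pd

# Toy stand-in for df_lob_lobbyist; all values are invented.
toy_lobbyist = pd.DataFrame({
    "uniqid__lob_lobbyist": ["A1", "A1", "B2"],
    "lobbyist__lob_lobbyist": ["Lobbyist One", "Lobbyist Two", "Lobbyist Three"],
})

# keep=False marks all rows whose uniqid appears more than once.
shared = toy_lobbyist[toy_lobbyist["uniqid__lob_lobbyist"].duplicated(keep=False)]

# Number of lobbyists attached to each lobbying record.
team_sizes = toy_lobbyist.groupby("uniqid__lob_lobbyist").size()
print(shared)
print(team_sizes)
```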

### From candidate to recipient

In [None]:
# CID receiving Adrian's outgoing contribution (from row 0 above)
cid_receiving_from_adrian = df_pacs22[df_pacs22['pacid__pacs22'] == member_pid].iloc[0]['cid__pacs22']
cid_receiving_from_adrian

In [None]:
# Lead pac for the cid receiving Adrian's outgoing contribution.
showdf(df_cmtes22[df_cmtes22['recipid__cmtes22'] == cid_receiving_from_adrian])

---
## Toy Tests

In [None]:
# # SANDBOX TESTING.
# # Test two members on two donors.

# # 1. Select two or three members and their lead pacs.
# df_test_leadpacs = df_wm_leadpacs[(df_wm_leadpacs['cid__cands22'] == member_cid) | (df_wm_leadpacs['cid__cands22'] == member_cid2)]
# df_to_csv(df_test_leadpacs)
# df_test_leadpacs.head(3)

# # 2. Attach each individual revenue-producing transaction to its associated member.
# df_test_revenues = pd.merge(df_test_leadpacs, df_pacs22_revenues, left_on='cid__cands22', right_on='cid__pacs22', how='inner')
# df_to_csv(df_test_revenues)
# df_test_revenues.sample(5)

# # 3. Select two non-lead-pac committees to identify transactions, which are 
# #    the American Medical Association C00000422 and Physicians Mutual Insurance C00456335
# df_test_cmtes22 = df_cmtes22[(df_cmtes22['cmteid__cmtes22'] == 'C00000422') | (df_cmtes22['cmteid__cmtes22'] == 'C00456335')]
# df_to_csv(df_test_cmtes22)
# df_test_cmtes22.head(3) 

# # 4. Expand transactions by adding committee details to each one.
# df_test_comb = pd.merge(df_test_revenues, df_test_cmtes22, left_on='pacid__pacs22', right_on='cmteid__cmtes22', how='inner')
# df_to_csv(df_test_comb)
# df_test_comb.head(3)
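
The four sandbox steps above can be sketched end to end on toy frames. This is an illustration under assumptions, not the project's pipeline: the frames stand in for `df_wm_leadpacs`, `df_pacs22_revenues`, and `df_cmtes22`, the committee `C00000001` and its amount are made up, and `isin` replaces the chained `|` comparisons.

```python
import pandas as pd

member_cids = ["N00027623"]              # a second member cid would be added here
donor_ids = ["C00000422", "C00456335"]   # the two committees named in step 3

toy_leadpacs = pd.DataFrame({"cid__cands22": ["N00027623"],
                             "firstlast__cands22": ["Adrian Smith"]})
toy_revenues = pd.DataFrame({
    "pacid__pacs22": ["C00456335", "C00000001"],   # second id is made up
    "cid__pacs22": ["N00027623", "N00027623"],
    "amount__pacs22": [2500, 1000],                # second amount is made up
})
toy_cmtes = pd.DataFrame({
    "cmteid__cmtes22": ["C00000422", "C00456335"],
    "ultorg__cmtes22": ["American Medical Association", "Physicians Mutual Insurance"],
})

# 1. Select the members and their lead pacs.
test_leadpacs = toy_leadpacs[toy_leadpacs["cid__cands22"].isin(member_cids)]
# 2. Attach each transaction to its member.
test_revenues = pd.merge(test_leadpacs, toy_revenues,
                         left_on="cid__cands22", right_on="cid__pacs22", how="inner")
# 3. Select the donor committees.
test_cmtes = toy_cmtes[toy_cmtes["cmteid__cmtes22"].isin(donor_ids)]
# 4. Add committee details to each transaction; the inner join drops other donors.
test_comb = pd.merge(test_revenues, test_cmtes,
                     left_on="pacid__pacs22", right_on="cmteid__cmtes22", how="inner")
print(test_comb)
```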

---

## Experiments

**Leadpacs**

In [None]:
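# Get all lead pac committees (when recipid starts with 'N').
# na=False treats a missing recipid as a non-match instead of producing NaN in the mask.
df_leadpacs = df_cmtes22[df_cmtes22['recipid__cmtes22'].str.startswith('N', na=False)]
# Drop into a new frame; inplace=True on a filtered slice triggers SettingWithCopyWarning.
df_leadpacs = df_leadpacs.drop(columns=['cycle__cmtes22', 'affiliate__cmtes22', 'recipcode__cmtes22',
                                        'feccandid__cmtes22', 'party__cmtes22', 'primcode__cmtes22',
                                        'source__cmtes22', 'sensitive__cmtes22', 'foreign__cmtes22',
                                        'active__cmtes22'])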
df_to_csv(df_leadpacs)
showdf(df_leadpacs)

**Leadpacs contributions to controlling candidate**

In [None]:
df_leadpac_contribs = pd.merge(df_pacs22, df_leadpacs, left_on='pacid__pacs22', right_on='cmteid__cmtes22', how='inner')
df_to_csv(df_leadpac_contribs)
showdf(df_leadpac_contribs)

**Leadpacs contributions to recipient candidate, with recipient candidate details**

In [None]:
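# Join on cid__pacs22 (the recipient of each contribution); recipid__cmtes22 is the
# donating lead pac's own candidate, so it would attach the donor's details instead.
df_cands22_tmp = df_cands22[['cid__cands22', 'firstlast__cands22', 'party__cands22']]
df_leadpac_contribs_to_cands = pd.merge(df_leadpac_contribs, df_cands22_tmp, left_on='cid__pacs22', right_on='cid__cands22', how='inner')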
df_to_csv(df_leadpac_contribs_to_cands)
showdf(df_leadpac_contribs_to_cands)

**Pacs (non lead)**

In [None]:
df_pacs = df_cmtes22[df_cmtes22['recipid__cmtes22'].str.startswith('C')]
df_to_csv(df_pacs)
showdf(df_pacs)
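
The lead and non-lead filters both rely on the first letter of `recipid`. A minimal sketch on toy data shows the split, including how a missing `recipid` behaves. The committee `C00000000` with no `recipid` is made up to show that case. Without `na=False`, `str.startswith` returns NaN for missing values, and pandas refuses to filter with a mask that contains NaN.

```python
import pandas as pd

# Toy stand-in for df_cmtes22; the last row is invented.
toy_cmtes = pd.DataFrame({
    "cmteid__cmtes22": ["C00412890", "C00822627", "C00000000"],
    "recipid__cmtes22": ["N00027623", "C00822627", None],
})

recipid = toy_cmtes["recipid__cmtes22"]
lead = toy_cmtes[recipid.str.startswith("N", na=False)]       # candidate-linked committees
non_lead = toy_cmtes[recipid.str.startswith("C", na=False)]   # committee-linked committees
unknown = toy_cmtes[recipid.isna()]                           # rows matched by neither filter
print(len(lead), len(non_lead), len(unknown))
```

Checking that the three parts add up to the full frame is a quick way to confirm no committee was silently dropped.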

**Pacs and their contributions to other pacs**

In [None]:
df_pac_contribs = pd.merge(df_pacs22, df_pacs, left_on='pacid__pacs22', right_on='cmteid__cmtes22', how='inner')
df_to_csv(df_pac_contribs)
showdf(df_pac_contribs)
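
An inner merge silently drops contributions whose `pacid` has no matching committee. As a sketch, a left merge with pandas' `indicator=True` option makes those rows visible before committing to the inner join. The frames are toy stand-ins, and `C00000001` is a made-up id with no committee record.

```python
import pandas as pd

toy_pacs = pd.DataFrame({
    "pacid__pacs22": ["C00456335", "C00000001"],   # second id is made up
    "amount__pacs22": [2500, 1000],
})
toy_cmtes = pd.DataFrame({
    "cmteid__cmtes22": ["C00456335"],
    "ultorg__cmtes22": ["Physicians Mutual Insurance"],
})

# indicator=True adds a '_merge' column with 'both', 'left_only', or 'right_only'.
checked = pd.merge(toy_pacs, toy_cmtes, left_on="pacid__pacs22",
                   right_on="cmteid__cmtes22", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["pacid__pacs22", "amount__pacs22"]])
```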

**Pacs and their contributions to other pacs, with recipient pac details**

---

In [None]:
# # 1. Given our member, we follow donations like this one. 
# # pacid, a member of realcode, gives amount di ([in]directly) to cid
# df_pacs22[(df_pacs22['cid__pacs22'] == member_cid) & (df_pacs22['pacid__pacs22'] == other_pacid)]

In [None]:
# # 2. We find out who the donor is.
# # cmteid, aka affiliate of ultorg
# df_cmtes22[df_cmtes22['cmteid__cmtes22'] == 'C00000422']

In [None]:
# df_cmtes22[df_cmtes22['cmteid__cmtes22'] != df_cmtes22['recipid__cmtes22']]

In [None]:
# len(df_cmtes22)

In [None]:
# df_cmtes22[(df_cmtes22['cmteid__cmtes22'] == df_cmtes22['recipid__cmtes22']) & (df_cmtes22['recipid__cmtes22'].str.startswith('N'))]

In [None]:
# df_cmtes22[(df_cmtes22['cmteid__cmtes22'] != df_cmtes22['recipid__cmtes22']) & (df_cmtes22['recipid__cmtes22'].str.startswith('N'))]

In [None]:
# df_cmtes22[(df_cmtes22['cmteid__cmtes22'] == df_cmtes22['recipid__cmtes22']) & (df_cmtes22['recipid__cmtes22'].str.startswith('C'))]
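
The separate filters above (cmteid equal or not equal to recipid, crossed with the recipid prefix) can be summarized in a single table. The sketch below uses `pd.crosstab` on toy rows taken from ids that appear earlier in this notebook. The labels `self`/`other` and the series names are assumptions chosen for readability.

```python
import pandas as pd

# Toy stand-in for df_cmtes22.
toy_cmtes = pd.DataFrame({
    "cmteid__cmtes22": ["C00412890", "C00822627", "C00387464"],
    "recipid__cmtes22": ["N00027623", "C00822627", "C00387464"],
})

# Does the committee list itself as recipient, or someone else?
same = toy_cmtes["cmteid__cmtes22"] == toy_cmtes["recipid__cmtes22"]
recipient = same.map({True: "self", False: "other"}).rename("recipient")

# First letter of recipid: 'N' for a candidate, 'C' for a committee.
prefix = toy_cmtes["recipid__cmtes22"].str[0].rename("recipid_prefix")

summary = pd.crosstab(recipient, prefix)
print(summary)
```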

In [None]:
# # 3. Now we look up their ideology
# df_crp_cats[df_crp_cats['catcode'] == 'H1100']