# SI 608 Project – Workspace
<span style="font-size: 18px;">General scratchpad workspace that preloads all the dataframes.</span>
<br>See <code>./modules</code> to review how libraries are installed and imported, and where the data is loaded, cleaned, and formatted. This notebook is only a convenience: make a copy and do whatever you'd like, or skip it entirely if you prefer.

[OpenSecrets Data Dictionary Index](../../docs/open_source_data_dictionary.md)
<br><small><em>(View the index with markdown preview)</em></small>

## Environment

#### Settings
Configure certain behaviors in this notebook.

In [4]:
DISPLAY_DF = True # for showdf() -> df.head()
SAVE_DF = True # for to_csv() -> df.to_csv()

#### Initialize
The init notebook contains helper functions used throughout the project.

In [6]:
%run modules/init.ipynb

Initializing project...
pandas is already installed.
matplotlib is already installed.
networkx is already installed.
numpy is already installed.
...initialization complete.
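
The real `showdf()` and `to_csv()` helpers live in `modules/init.ipynb`, which is not shown here. As a rough idea of how the two settings above could gate them, the sketch below uses hypothetical stand-ins (`showdf_sketch`, `to_csv_sketch`). The signatures and the three-row preview are assumptions, not the project's implementation.

```python
import pandas as pd

DISPLAY_DF = True  # mirrors the notebook setting
SAVE_DF = True     # mirrors the notebook setting


def showdf_sketch(df, n=3):
    """Hypothetical stand-in for showdf(): preview the first n rows when DISPLAY_DF is on."""
    if DISPLAY_DF:
        return df.head(n)
    return None


def to_csv_sketch(df, path):
    """Hypothetical stand-in for to_csv(): write the frame only when SAVE_DF is on."""
    if SAVE_DF:
        df.to_csv(path, index=False)
        return True
    return False


demo = pd.DataFrame({"cycle": [2018] * 5, "amount": [1, 2, 3, 4, 5]})
preview = showdf_sketch(demo)
```

Setting `DISPLAY_DF = False` would make the preview a no-op, which is handy when running the whole notebook in one go.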


#### Datasets

This module provides a single function for all of the *contribution* data from OpenSecrets.

In [8]:
%run modules/data.ipynb

Loading data module...
...data module loaded.


---
## Data

### 527 data

#### cmtes527

In [12]:
# OpenSecrets Data Definition: 527 Committees
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20527%20Cmtes.htm
columns_cmtes527 = ['cycle', 'rpt', 'ein', 'crp527name', 'affiliate', 'ultorg', 
                    'recipcode', 'cmteid', 'cid', 'eccmteid', 'party', 
                    'primcode', 'source', 'ffreq', 'ctype', 'csource', 'viewpt',
                    'comments', 'state']

if not os.path.exists('../../data/open_secrets/527/cmtes527.csv'):
    process_data('../../data/open_secrets/527/cmtes527.txt', n_expected_fields=len(columns_cmtes527), headers=columns_cmtes527, show_errs=False)

df_cmtes527 = pd.read_csv('../../data/open_secrets/527/cmtes527.csv', on_bad_lines='skip')

Reading line 3029 of 15149...
Reading line 6058 of 15149...
Reading line 9087 of 15149...
Reading line 12116 of 15149...
Reading line 15145 of 15149...
Processed data saved as ../../data/open_secrets/527/cmtes527.csv
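
Every table in this notebook follows the same pattern: convert the raw `.txt` to `.csv` once, then read the `.csv`. That pattern could be wrapped in one function. The sketch below is an assumption about how that might look; `load_cached` and `toy_convert` are hypothetical names, and `toy_convert` only stands in for `process_data()` so the example runs on its own.

```python
import os
import tempfile

import pandas as pd


def load_cached(txt_path, columns, convert, **read_kwargs):
    """Convert txt_path to a sibling .csv once, then read the csv.

    `convert` is a stand-in for the project's process_data(); it is passed in
    so this sketch does not depend on modules/init.ipynb.
    """
    csv_path = os.path.splitext(txt_path)[0] + ".csv"
    if not os.path.exists(csv_path):
        convert(txt_path, headers=columns, n_expected_fields=len(columns))
    return pd.read_csv(csv_path, on_bad_lines="skip", **read_kwargs)


def toy_convert(txt_path, headers, n_expected_fields):
    """Toy converter (assumption): keep only rows with the expected field count."""
    csv_path = os.path.splitext(txt_path)[0] + ".csv"
    with open(txt_path) as src, open(csv_path, "w") as dst:
        dst.write(",".join(headers) + "\n")
        for line in src:
            if len(line.rstrip("\n").split(",")) == n_expected_fields:
                dst.write(line)


with tempfile.TemporaryDirectory() as tmp:
    txt = os.path.join(tmp, "demo.txt")
    with open(txt, "w") as fh:
        fh.write("1,US Senate\n2,US House\nbad-row\n")
    df_demo = load_cached(txt, ["agencyid", "agency"], toy_convert)
    csv_was_cached = os.path.exists(os.path.join(tmp, "demo.csv"))
```

Because the expected field count is always derived from the same column list that is passed as headers, the two cannot drift apart between tables.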


In [13]:
showdf(df_cmtes527)

Unnamed: 0,cycle__cmtes527,rpt__cmtes527,ein__cmtes527,crp527name__cmtes527,affiliate__cmtes527,ultorg__cmtes527,recipcode__cmtes527,cmteid__cmtes527,cid__cmtes527,eccmteid__cmtes527,party__cmtes527,primcode__cmtes527,source__cmtes527,ffreq__cmtes527,ctype__cmtes527,csource__cmtes527,viewpt__cmtes527,comments__cmtes527,state__cmtes527
0,2002,Q302,861006189,American Electronics Assn,,American Electronics Assn,PB,,,,,C5000,WebPN,Q,F,,N,,AZ
1,2008,Q308,262108560,California 2008 GOP Delegation Corporate,,California 2008 GOP Delegation,RP,,,,R,Z5100,Name,Q,F,,C,,CA
2,2000,Q400,912101097,Alabama League of Environmental Action,,Alabama League of Environmental Action,PI,,,,,JE300,Name,Q,S,Name,L,,AL
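
The preview shows every column renamed to `<field>__<table>` (for example `cycle__cmtes527`), which keeps names unique once tables are merged. How `process_data()` produces these names is not shown in this notebook. One way to get the same effect, offered here as an assumption, is pandas' `add_suffix`:

```python
import pandas as pd

columns_demo = ["cycle", "rpt", "ein"]
df_raw = pd.DataFrame([[2002, "Q302", 861006189]], columns=columns_demo)

# Append the table name to every column, matching the naming seen in the preview.
df_suffixed = df_raw.add_suffix("__cmtes527")
```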


#### expends527

In [15]:
# OpenSecrets Data Dictionary 527 Expenditure Data - from IRS Form 8872B
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20527%20Expenditures.htm
columns_expends527 = ['rpt', 'formid', 'schbid', 'orgname', 'ein', 'recipient', 
                    'recipientcrp', 'amount', 'date', 'expcode', 'source', 
                    'purpose', 'addr1', 'addr2', 'city', 'state', 'zip',
                    'employer', 'occupation']

if not os.path.exists('../../data/open_secrets/527/expends527.csv'):
    process_data('../../data/open_secrets/527/expends527.txt', nrows=500, headers=columns_expends527, n_expected_fields=len(columns_expends527), show_errs=False)

df_expends527 = pd.read_csv('../../data/open_secrets/527/expends527.csv', nrows=10000, on_bad_lines='skip')

Processed data saved as ../../data/open_secrets/527/expends527.csv


In [16]:
showdf(df_expends527)

Unnamed: 0,rpt__expends527,formid__expends527,schbid__expends527,orgname__expends527,ein__expends527,recipient__expends527,recipientcrp__expends527,amount__expends527,date__expends527,expcode__expends527,source__expends527,purpose__expends527,addr1__expends527,addr2__expends527,city__expends527,state__expends527,zip__expends527,employer__expends527,occupation__expends527
0,Q210,9595787,2016057,Republican State Leadership Cmte,50532524,VERIZON,Verizon Communications,125,04/16/2010,A70,@new,TELEPHONE,PO BOX 660720,,DALLAS,TX,75266,,
1,Q210,9595787,2016059,Republican State Leadership Cmte,50532524,VERIZON WIRELESS,Verizon Wireless,141,04/09/2010,A70,@new,CELL PHONE,PO BOX 25505,,LEHIGH VALLEY,PA,18002,,
2,Q210,9595787,2016062,Republican State Leadership Cmte,50532524,SCOTT WARD,Scott Ward,37,04/27/2010,A40,@new,"EXP. REIMB., TRAVEL","4817 41ST STREET, NW",,WASHINGTON,DC,20016,REPUBLICAN STATE LEADERSHIP COMMITTEE,PRESIDENT


#### rcpts527

In [18]:
# OpenSecrets Data Dictionary 527 Contribution Data - from IRS Form 8872A
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20527%20Receipts.htm
columns_rcpts527 = ['id', 'rpt', 'formid', 'schaid', 'contribid', 'contrib', 
                    'amount', 'date', 'orgname', 'ultorg', 'realcode', 
                    'recipid', 'recipcode', 'party', 'recipient', 'city', 'state',
                    'zip', 'zip4', 'pmsa', 'employer', 'occupation', 'ytd', 'gender', 'source']

if not os.path.exists('../../data/open_secrets/527/rcpts527.csv'):
    process_data('../../data/open_secrets/527/rcpts527.txt', nrows=10000, headers=columns_rcpts527, n_expected_fields=len(columns_rcpts527), show_errs=False)

df_rcpts527 = pd.read_csv('../../data/open_secrets/527/rcpts527.csv', nrows=10000, on_bad_lines='skip')

Processed data saved as ../../data/open_secrets/527/rcpts527.csv


In [19]:
showdf(df_rcpts527)

Unnamed: 0,id__rcpts527,rpt__rcpts527,formid__rcpts527,schaid__rcpts527,contribid__rcpts527,contrib__rcpts527,amount__rcpts527,date__rcpts527,orgname__rcpts527,ultorg__rcpts527,realcode__rcpts527,recipid__rcpts527,recipcode__rcpts527,party__rcpts527,recipient__rcpts527,city__rcpts527,state__rcpts527,zip__rcpts527,zip4__rcpts527,pmsa__rcpts527,employer__rcpts527,occupation__rcpts527,ytd__rcpts527,gender__rcpts527,source__rcpts527
0,981,Q210,9595837,2017490,,WEST LA DEMOCRATIC CLUB,1,04/18/2010,West La Democratic Club,,Z9600,270160261,PI,,ActBlue Technical Services,BURBANK,CA,91502,,4480,,,473,,Rept
1,982,Q210,9595837,2017492,,WINOGRAD FOR CONGRESS 2010,259,04/18/2010,Winograd For Congress 2010,,Z9600,270160261,PI,,ActBlue Technical Services,BURBANK,CA,91502,,4480,,,1049,,Rept
2,983,Q210,9595837,2017387,,FDL ACTION PAC,4,04/18/2010,Fdl Action Pac,,Z9600,270160261,PI,,ActBlue Technical Services,WASHINGTON,DC,20016,,8840,,,1524,,Rept


---
### Campaign Finance 18 data
#### cands18

In [21]:
# OpenSecrets Data Definition: Candidates
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20Candidates%20Data.htm
columns_cands18 = ['cycle', 'feccandid', 'cid', 'firstlastp', 'party', 'distidrunfor', 
                    'distidcurr', 'currcand', 'cyclecand', 'crpico', 'recipcode', 
                    'nopacs']

if not os.path.exists('../../data/open_secrets/CampaignFin18/cands18.csv'):
    process_data('../../data/open_secrets/CampaignFin18/cands18.txt', headers=columns_cands18, n_expected_fields=len(columns_cands18), show_errs=False)

df_cands18 = pd.read_csv('../../data/open_secrets/CampaignFin18/cands18.csv', on_bad_lines='skip')

# # Remove party labels from names: '3', 'R', 'D', 'I', 'L', 'U', 'i'
# df_cands18['firstlast__cands18'] = df_cands18['firstlastp__cands18'].apply(
#     lambda x: x.replace(" (3)", "").replace(" (R)", "").replace(" (D)", "").replace(" (I)", "").replace(" (L)", "").replace(" (U)", "").replace(" (i)", "") if isinstance(x, str) else x
# )

Reading line 1527 of 7639...
Reading line 3054 of 7639...
Reading line 4581 of 7639...
Reading line 6108 of 7639...
Reading line 7635 of 7639...
Processed data saved as ../../data/open_secrets/CampaignFin18/cands18.csv


In [22]:
showdf(df_cands18)

Unnamed: 0,cycle__cands18,feccandid__cands18,cid__cands18,firstlastp__cands18,party__cands18,distidrunfor__cands18,distidcurr__cands18,currcand__cands18,cyclecand__cands18,crpico__cands18,recipcode__cands18,nopacs__cands18
0,2018,H0AL02087,N00030768,Martha Roby (R),R,AL02,AL02,Y,Y,I,RW,
1,2018,H0AL03192,N00043592,Hannah Thompson (D),D,AL03,,,,,DN,
2,2018,H0AL05049,N00003042,Bud Cramer (D),D,AL05,,,,,DN,


#### cmtes18
*All cmtes, lead cmtes, pac cmtes*

In [24]:
# OpenSecrets Table Definition: Committee table
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20for%20Cmtes.htm
columns_cmtes18 = ['cycle', 'cmteid', 'pacshort', 'affiliate', 'ultorg', 'recipid', 
                    'recipcode', 'feccandid', 'party', 'primcode', 'source', 'sensitive',
                    'foreign', 'active']

if not os.path.exists('../../data/open_secrets/CampaignFin18/cmtes18.csv'):
    process_data('../../data/open_secrets/CampaignFin18/cmtes18.txt', headers=columns_cmtes18, n_expected_fields=len(columns_cmtes18), show_errs=False)

df_cmtes18 = pd.read_csv('../../data/open_secrets/CampaignFin18/cmtes18.csv', on_bad_lines='skip')

Reading line 3854 of 19270...
Reading line 7708 of 19270...
Reading line 11562 of 19270...
Reading line 15416 of 19270...
Reading line 19270 of 19270...
Processed data saved as ../../data/open_secrets/CampaignFin18/cmtes18.csv


**All cmtes**

In [26]:
print(len(df_cmtes18))
showdf(df_cmtes18)

19270


Unnamed: 0,cycle__cmtes18,cmteid__cmtes18,pacshort__cmtes18,affiliate__cmtes18,ultorg__cmtes18,recipid__cmtes18,recipcode__cmtes18,feccandid__cmtes18,party__cmtes18,primcode__cmtes18,source__cmtes18,sensitive__cmtes18,foreign__cmtes18,active__cmtes18
0,2018,C00000018,IRONWORKERS LOCAL UNION NO. 25 POLITICAL EDUCA...,,,C00000018,,H8TX22313,,,,,0,0
1,2018,C00000059,Hallmark Cards,,Hallmark Cards,C00000059,PB,,,C1400,WebAM,n,0,1
2,2018,C00000422,American Medical Assn,American Medical Assn,American Medical Assn,C00000422,PB,,,H1100,AFP88,n,0,1


**Split lead and non-lead cmtes**

In [28]:
# Goal: isolate transactions that flow from non-lead pacs to lead pacs.
# To do that, look up each pacid__pacs18 value here to see whether it is a lead pac,
# then drop every pacid__pacs18 that represents a lead pac.
df_recipid_cmtes18 = df_cmtes18[['cmteid__cmtes18', 'recipid__cmtes18']]

# Lead pac committees pacids, for filtering.
df_recipid_lead_cmtes18 = df_recipid_cmtes18[df_recipid_cmtes18['recipid__cmtes18'].str.startswith('N', na=False)]
df_recipid_lead_cmtes18 = df_recipid_lead_cmtes18[['cmteid__cmtes18']]

# Non-lead pac committees pacids, for filtering.
df_recipid_pac_cmtes18 = df_recipid_cmtes18[df_recipid_cmtes18['recipid__cmtes18'].str.startswith('C', na=False)]
df_recipid_pac_cmtes18 = df_recipid_pac_cmtes18[['cmteid__cmtes18']]
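
The split relies on the ID prefix: in this notebook, a `recipid` starting with `N` is treated as a lead (candidate) committee and one starting with `C` as a non-lead pac. A minimal sketch on a toy frame (the rows are invented for illustration) shows the two masks and why `na=False` matters when the column has missing values:

```python
import pandas as pd

toy = pd.DataFrame({
    "cmteid__cmtes18": ["C001", "C002", "C003", "C004"],
    "recipid__cmtes18": ["N001", "C002", None, "C004"],
})

# na=False turns missing recipids into False instead of NaN, so the mask stays boolean.
is_lead = toy["recipid__cmtes18"].str.startswith("N", na=False)
is_pac = toy["recipid__cmtes18"].str.startswith("C", na=False)

lead_ids = toy.loc[is_lead, "cmteid__cmtes18"].tolist()
pac_ids = toy.loc[is_pac, "cmteid__cmtes18"].tolist()

# Rows that match neither prefix (here, the missing recipid).
unclassified = int((~is_lead & ~is_pac).sum())
```

In the real data the counts printed in this notebook add up (6,516 lead + 12,754 non-lead = 19,270 total), so no committee is left unclassified.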

In [29]:
# Lead pac committees (filtered).
df_lead_cmtes18 = df_cmtes18[df_cmtes18['cmteid__cmtes18'].isin(df_recipid_lead_cmtes18['cmteid__cmtes18'])]
df_lead_cmtes18.columns = df_lead_cmtes18.columns.str.replace(r'(.*?)__(.*)', r'\1_lead__\2', regex=True)
print(len(df_lead_cmtes18))
showdf(df_lead_cmtes18)

6516


Unnamed: 0,cycle_lead__cmtes18,cmteid_lead__cmtes18,pacshort_lead__cmtes18,affiliate_lead__cmtes18,ultorg_lead__cmtes18,recipid_lead__cmtes18,recipcode_lead__cmtes18,feccandid_lead__cmtes18,party_lead__cmtes18,primcode_lead__cmtes18,source_lead__cmtes18,sensitive_lead__cmtes18,foreign_lead__cmtes18,active_lead__cmtes18
29,2018,C00002592,COMMITTEE TO REELECT TOM FOLEY ...,,,N00007967,DN,H6WA05023,D,,,N,0,0
140,2018,C00012229,Alaskans for Don Young,,Alaskans for Don Young,N00007999,RW,H6AK00045,R,Z1100,Rept,N,0,1
150,2018,C00013128,Congressman Waxman Campaign Cmte,,Congressman Waxman Campaign Cmte,N00001861,DN,H6CA24048,D,Z1200,Rept,N,0,1
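
The column rename above inserts `_lead` between the field name and the table suffix. A small sketch of the same regular expression on a few sample names shows what each capture group picks up (the third name is borrowed from another table purely to show a field that itself contains an underscore):

```python
import pandas as pd

cols = pd.Index(
    ["cmteid__cmtes18", "recipid__cmtes18", "contrib_last__indivs18"],
    dtype=object,
)

# Group 1 is the field name (shortest text before "__"); group 2 is the table name.
renamed = cols.str.replace(r"(.*?)__(.*)", r"\1_lead__\2", regex=True)
```

Names without a double underscore do not match the pattern and pass through unchanged.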


In [30]:
# Non-lead pac committees (filtered).
df_pac_cmtes18 = df_cmtes18[df_cmtes18['cmteid__cmtes18'].isin(df_recipid_pac_cmtes18['cmteid__cmtes18'])]
df_pac_cmtes18.columns = df_pac_cmtes18.columns.str.replace(r'(.*?)__(.*)', r'\1_pac__\2', regex=True)
print(len(df_pac_cmtes18))
showdf(df_pac_cmtes18)

12754


Unnamed: 0,cycle_pac__cmtes18,cmteid_pac__cmtes18,pacshort_pac__cmtes18,affiliate_pac__cmtes18,ultorg_pac__cmtes18,recipid_pac__cmtes18,recipcode_pac__cmtes18,feccandid_pac__cmtes18,party_pac__cmtes18,primcode_pac__cmtes18,source_pac__cmtes18,sensitive_pac__cmtes18,foreign_pac__cmtes18,active_pac__cmtes18
0,2018,C00000018,IRONWORKERS LOCAL UNION NO. 25 POLITICAL EDUCA...,,,C00000018,,H8TX22313,,,,,0,0
1,2018,C00000059,Hallmark Cards,,Hallmark Cards,C00000059,PB,,,C1400,WebAM,n,0,1
2,2018,C00000422,American Medical Assn,American Medical Assn,American Medical Assn,C00000422,PB,,,H1100,AFP88,n,0,1


#### pac_other18 – pacs to pacs
*All pacs, pac-to-pac, pac-to-cand*

**All pacs**

In [33]:
# OpenSecrets Data Definition for PAC to PAC Data (Pac_other table)
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20PAC%20to%20PAC%20Data.htm
columns_pac_other18 = ['cycle', 'fecrecno', 'filerid', 'donorcmte', 'contriblendtrans', 'city', 'state', 
                            'zip', 'fecoccemp', 'primcode', 'date', 'amount', 'recipid', 'party', 'otherid',
                            'recipcode', 'recipprimcode', 'amend', 'report', 'pg', 'microfilm', 'type',
                            'realcode', 'source']

if not os.path.exists('../../data/open_secrets/CampaignFin18/pac_other18.csv'):
    process_data('../../data/open_secrets/CampaignFin18/pac_other18.txt', headers=columns_pac_other18, n_expected_fields=len(columns_pac_other18), show_errs=False)

df_pac_other18 = pd.read_csv('../../data/open_secrets/CampaignFin18/pac_other18.csv', on_bad_lines='skip')

Reading line 35366 of 176830...
Reading line 70732 of 176830...
Reading line 106098 of 176830...
Reading line 141464 of 176830...
Reading line 176830 of 176830...
Processed data saved as ../../data/open_secrets/CampaignFin18/pac_other18.csv


In [34]:
# Identify the donor pacid.
# The "filerid" is the donor if "type" starts with "2".
# The "otherid" is the donor if "type" starts with "1".
df_pac_other18['donorid__pac_other18'] = df_pac_other18.apply(
    lambda row: row['otherid__pac_other18'] if row['type__pac_other18'].startswith('1')
    else (row['filerid__pac_other18'] if row['type__pac_other18'].startswith('2') else None),
    axis=1
)

In [35]:
# Identify the recipient lead pacid (starting with "C").
# The "filerid" is the recipient if "type" starts with "1".
# The "otherid" is the recipient if "type" starts with "2".
df_pac_other18['recippacid__pac_other18'] = df_pac_other18.apply(
    lambda row: row['otherid__pac_other18'] if row['type__pac_other18'].startswith('2')
    else (row['filerid__pac_other18'] if row['type__pac_other18'].startswith('1') else None),
    axis=1
)
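
The two row-wise `apply` calls above encode one rule and its mirror image. A vectorized alternative, sketched here with shortened column names, avoids the per-row Python call and tolerates a missing `type`. The first two toy rows reuse values from the preview below; the third is invented. Note that unmatched rows come out as `NaN` rather than `None`.

```python
import pandas as pd

toy = pd.DataFrame({
    "filerid": ["C00637983", "C00637983", "C00000001"],
    "otherid": ["S8RI00110", "C00499863", "C00000002"],
    "type": ["16C", "24G", None],
})

starts_1 = toy["type"].str.startswith("1", na=False)
starts_2 = toy["type"].str.startswith("2", na=False)

# Donor: otherid for type 1x, filerid for type 2x, missing otherwise.
toy["donorid"] = toy["otherid"].where(starts_1, toy["filerid"].where(starts_2))
# Recipient: the mirror image of the donor rule.
toy["recippacid"] = toy["filerid"].where(starts_1, toy["otherid"].where(starts_2))
```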

In [36]:
showdf(df_pac_other18)

Unnamed: 0,cycle__pac_other18,fecrecno__pac_other18,filerid__pac_other18,donorcmte__pac_other18,contriblendtrans__pac_other18,city__pac_other18,state__pac_other18,zip__pac_other18,fecoccemp__pac_other18,primcode__pac_other18,date__pac_other18,amount__pac_other18,recipid__pac_other18,party__pac_other18,otherid__pac_other18,recipcode__pac_other18,recipprimcode__pac_other18,amend__pac_other18,report__pac_other18,pg__pac_other18,microfilm__pac_other18,type__pac_other18,realcode__pac_other18,source__pac_other18,donorid__pac_other18,recippacid__pac_other18
0,2018,1010320180036112556,C00637983,"Nardolillo, Bobby",ROBERT A NARDOLLILO III,GREENE,RI,2827.0,NARDOLILLO FUNERAL HOME,Z1100,03/03/2017,2000.0,N00040819,R,S8RI00110,RN,Z1100,A,Q2,P,201707200200233820,16C,Z1100,PAC,S8RI00110,C00637983
1,2018,1010320180036112568,C00637983,Bobby for Senate,LEADERSHIP CONNECTICUT PAC,TRUMBULL,CT,6611.0,,Z1100,06/26/2017,280.0,C00499863,,C00499863,PI,J1100,A,Q2,P,201707200200233825,24G,Z1100,PAC,C00637983,C00499863
2,2018,1010320180036112716,C00443218,Wicker Majority Fund,WICKER MAJORITY FUND,JACKSON,MS,39205.0,,Z4100,09/30/2017,3323.0,N00003280,R,C00646380,RW,Z1100,A,Q3,G,201710200200357062,18G,Z4100,PAC,C00646380,C00443218


In [37]:
# # Notice that candidates are never filers in pac_other18.
# df_pac_other18[df_pac_other18['filerid__pac_other18'].str.startswith('N', na=False)]

**Pacs to pacs**

In [39]:
# Flows from pacs to pacs (non-lead/candidate)
df_pac_to_pac = df_pac_other18[~ df_pac_other18['recipid__pac_other18'].str.startswith('N', na=False)]

In [40]:
showdf(df_pac_to_pac)

Unnamed: 0,cycle__pac_other18,fecrecno__pac_other18,filerid__pac_other18,donorcmte__pac_other18,contriblendtrans__pac_other18,city__pac_other18,state__pac_other18,zip__pac_other18,fecoccemp__pac_other18,primcode__pac_other18,date__pac_other18,amount__pac_other18,recipid__pac_other18,party__pac_other18,otherid__pac_other18,recipcode__pac_other18,recipprimcode__pac_other18,amend__pac_other18,report__pac_other18,pg__pac_other18,microfilm__pac_other18,type__pac_other18,realcode__pac_other18,source__pac_other18,donorid__pac_other18,recippacid__pac_other18
1,2018,1010320180036112568,C00637983,Bobby for Senate,LEADERSHIP CONNECTICUT PAC,TRUMBULL,CT,6611.0,,Z1100,06/26/2017,280.0,C00499863,,C00499863,PI,J1100,A,Q2,P,201707200200233825,24G,Z1100,PAC,C00637983,C00499863
55,2018,1011920180036164101,C00326801,Tammy Baldwin for Senate,DEMOCRATIC PARTY OF WISCONSIN,MADISON,WI,53703.0,,Z1200,04/18/2017,0.0,C00019331,D,C00019331,DP,Z5200,A,Q2,P,201707180200203936,24K,Z1200,PAC,C00326801,C00019331
57,2018,1012620180036219483,C00632273,Patriots for Saccone,REPUBLICAN PARTY OF PA,,,,,Z1100,09/18/2017,800.0,C00044842,R,C00044842,RP,Z5100,A,Q3,P,201710100200281543,24K,Z1100,PAC,C00632273,C00044842


**Pacs to cands**

In [42]:
# Flows from pacs *directly* to indiv candidates
df_pac_to_cand = df_pac_other18[df_pac_other18['recipid__pac_other18'].str.startswith('N', na=False)]

In [43]:
showdf(df_pac_to_cand)

Unnamed: 0,cycle__pac_other18,fecrecno__pac_other18,filerid__pac_other18,donorcmte__pac_other18,contriblendtrans__pac_other18,city__pac_other18,state__pac_other18,zip__pac_other18,fecoccemp__pac_other18,primcode__pac_other18,date__pac_other18,amount__pac_other18,recipid__pac_other18,party__pac_other18,otherid__pac_other18,recipcode__pac_other18,recipprimcode__pac_other18,amend__pac_other18,report__pac_other18,pg__pac_other18,microfilm__pac_other18,type__pac_other18,realcode__pac_other18,source__pac_other18,donorid__pac_other18,recippacid__pac_other18
0,2018,1010320180036112556,C00637983,"Nardolillo, Bobby",ROBERT A NARDOLLILO III,GREENE,RI,2827.0,NARDOLILLO FUNERAL HOME,Z1100,03/03/2017,2000.0,N00040819,R,S8RI00110,RN,Z1100,A,Q2,P,201707200200233820,16C,Z1100,PAC,S8RI00110,C00637983
2,2018,1010320180036112716,C00443218,Wicker Majority Fund,WICKER MAJORITY FUND,JACKSON,MS,39205.0,,Z4100,09/30/2017,3323.0,N00003280,R,C00646380,RW,Z1100,A,Q3,G,201710200200357062,18G,Z4100,PAC,C00646380,C00443218
3,2018,1010320180036112854,C00443218,2017 Senators Classic Cmte,2017 SENATORS CLASSIC COMMITTEE,ALEXANDRIA,VA,22314.0,,Z4100,09/20/2017,9530.0,N00003280,R,C00637181,RW,Z1100,A,Q3,P,201710200200357063,18G,Z4100,PAC,C00637181,C00443218
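
The pac-to-pac and pac-to-cand frames are built from the same test and its negation, so together they cover every row exactly once. A sketch on invented rows makes one consequence visible: a row with a missing `recipid` fails the `N` test and therefore lands in the pac-to-pac frame.

```python
import pandas as pd

toy = pd.DataFrame({
    "recipid__pac_other18": ["N00040819", "C00499863", "N00003280", None],
    "amount__pac_other18": [2000.0, 280.0, 3323.0, 50.0],
})

# Compute the candidate mask once; its complement is the pac-to-pac set.
to_cand = toy["recipid__pac_other18"].str.startswith("N", na=False)
df_to_cand = toy[to_cand]
df_to_pac = toy[~to_cand]
```

If rows with a missing recipient should be excluded from both frames, they would need to be filtered out before the split.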


#### pacs18 – pacs to cands

In [45]:
# Pacs18 – Lead pacs only.
# OpenSecrets Data Definition: PAC table (PACs to Candidates)
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20for%20PAC%20to%20Cands%20Data.htm
# "pacid", which represents "realcode" (industry or ideology), contributes "amount" to "cid" "di" (directly or indirectly).
# NOTE: pacid__pacs18 never equals cid__pacs18 – no self-contributions.
columns_pacs18 = ['cycle', 'fecrecno', 'pacid', 'cid', 'amount', 'date', 'realcode', 
                            'type', 'di', 'feccandid']

if not os.path.exists('../../data/open_secrets/CampaignFin18/pacs18.csv'):
    process_data('../../data/open_secrets/CampaignFin18/pacs18.txt', headers=columns_pacs18, n_expected_fields=len(columns_pacs18), show_errs=False)

df_pacs18 = pd.read_csv('../../data/open_secrets/CampaignFin18/pacs18.csv', on_bad_lines='skip')

Reading line 70622 of 353110...
Reading line 141244 of 353110...
Reading line 211866 of 353110...
Reading line 282488 of 353110...
Reading line 353110 of 353110...
Processed data saved as ../../data/open_secrets/CampaignFin18/pacs18.csv


In [46]:
showdf(df_pacs18)

Unnamed: 0,cycle__pacs18,fecrecno__pacs18,pacid__pacs18,cid__pacs18,amount__pacs18,date__pacs18,realcode__pacs18,type__pacs18,di__pacs18,feccandid__pacs18
0,2018,4101920171458697439,C00411553,N00035278,1000,09/18/2017,H1100,24K,D,H4MA05084
1,2018,4061620171409955636,C00142711,N00004357,2500,05/15/2017,D2000,24K,D,H8WI01024
2,2018,4020620181503867907,C00439521,N00034584,2000,09/18/2017,J2200,24K,D,H4ME02234


In [47]:
# # Lead pac transactions
# df_lead_pacs18 = df_pacs18[df_pacs18['pacid__pacs18'].isin(df_recipid_lead_cmtes18['cmteid__cmtes18'])]
# print(len(df_lead_pacs18))
# showdf(df_lead_pacs18)

In [48]:
# # Non-lead pac transactions
# df_pac_pacs18 = df_pacs18[df_pacs18['pacid__pacs18'].isin(df_recipid_pac_cmtes18['cmteid__cmtes18'])]
# print(len(df_pac_pacs18))
# showdf(df_pac_pacs18)

#### indivs18

In [50]:
# OpenSecrets Data Definition: Individual Contribution Data
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20for%20Individual%20Contribution%20Data.htm
columns_indivs18 = ['cycle', 'fectransid', 'contribid', 'contrib_last', 'contrib_first', 'recipid', 'orgname', 
                    'ultorg', 'realcode', 'date', 'amount', 'street', 'city', 'state',
                    'zip', 'recipcode', 'type', 'cmteid', 'otherid', 'gender', 'microfilm',
                    'occupation', 'employer', 'source']

# This dataset is huge: reading the full file takes about 6.5 min and can crash the machine,
# so only the first 1,000 rows are processed and loaded here.

if not os.path.exists('../../data/open_secrets/CampaignFin18/indivs18.csv'):
    process_data('../../data/open_secrets/CampaignFin18/indivs18.txt', headers=columns_indivs18, nrows=1000, n_expected_fields=len(columns_indivs18), show_errs=False)

df_indivs18 = pd.read_csv('../../data/open_secrets/CampaignFin18/indivs18.csv', on_bad_lines='skip', nrows=1000)

Processed data saved as ../../data/open_secrets/CampaignFin18/indivs18.csv
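
When the full file is needed but does not fit comfortably in memory, one option is to read it in chunks and aggregate as you go. The sketch below is an illustration under assumptions, not part of the project: it writes a tiny invented CSV to a temporary folder and sums amounts per recipient using the `chunksize` parameter of `pd.read_csv`.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "indivs_demo.csv")
    # Toy file standing in for the processed indivs18.csv.
    pd.DataFrame({
        "recipid__indivs18": ["N1", "N1", "N2", "N2", "N1"],
        "amount__indivs18": [1000, 250, 1360, 40, 10],
    }).to_csv(path, index=False)

    partial_sums = []
    # chunksize makes read_csv yield DataFrames of at most that many rows.
    with pd.read_csv(path, chunksize=2, on_bad_lines="skip") as reader:
        for chunk in reader:
            partial_sums.append(
                chunk.groupby("recipid__indivs18")["amount__indivs18"].sum()
            )

# Combine the per-chunk totals into one total per recipient.
totals = pd.concat(partial_sums).groupby(level=0).sum()
```

Only one chunk is held in memory at a time, so peak memory depends on `chunksize` rather than on the file size.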


In [51]:
showdf(df_indivs18)

Unnamed: 0,cycle__indivs18,fectransid__indivs18,contribid__indivs18,contrib_last__indivs18,contrib_first__indivs18,recipid__indivs18,orgname__indivs18,ultorg__indivs18,realcode__indivs18,date__indivs18,amount__indivs18,street__indivs18,city__indivs18,state__indivs18,zip__indivs18,recipcode__indivs18,type__indivs18,cmteid__indivs18,otherid__indivs18,gender__indivs18,microfilm__indivs18,occupation__indivs18,employer__indivs18,source__indivs18
0,2018,1010320180036112450,q0000744792,"CAZZANI, SERAFINO V",N00040819,Cazzani Power Boats,,Y4000,08/04/2017,1000,,CRANSTON,RI,2920,RN,15,C00637983,,M,201710180200332243,,,,
1,2018,1010320180036112466,p0004650540,"JONES, KENNETH",N00040819,Kenneth Jones Construction,,B1500,08/07/2017,250,,WEST GREENWICH,RI,2817,RN,15,C00637983,,M,201710180200332248,,,Name,
2,2018,1010320180036112472,i3003283827@,"NARDOLILLO, KIM",N00040819,Nardolillo Funeral Home,,G5400,09/30/2017,1360,,NARRAGANSETT,RI,2882,RN,15,C00637983,,F,201710180200332250,,NARDOLILLO FUNERAL HOME,Name,


---
### Expends18 data
#### expends18

In [53]:
# OpenSecrets Data Dictionary for Expenditure Data - from FEC electronic filings
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20Expenditures.htm
columns_expends18 = ['cycle', 'id', 'transid', 'crpfilerid', 
                     'recipcode', 'pacshort', 'crprecipname', 
                     'expcode', 'amount', 'date', 'city', 'state', 
                     'zip', 'cmteid_ef', 'candid', 'type',
                     'descrip', 'pg', 'elecother', 'enttype',
                     'source']

if not os.path.exists('../../data/open_secrets/Expend18/expends18.csv'):
    process_data('../../data/open_secrets/Expend18/expends18.txt', headers=columns_expends18, nrows=1000, n_expected_fields=len(columns_expends18), show_errs=False)

df_expends18 = pd.read_csv('../../data/open_secrets/Expend18/expends18.csv', on_bad_lines='skip', nrows=1000)

Processed data saved as ../../data/open_secrets/Expend18/expends18.csv


In [54]:
# All pac expenditures
showdf(df_expends18)

Unnamed: 0,cycle__expends18,id__expends18,transid__expends18,crpfilerid__expends18,recipcode__expends18,pacshort__expends18,crprecipname__expends18,expcode__expends18,amount__expends18,date__expends18,city__expends18,state__expends18,zip__expends18,cmteid_ef__expends18,candid__expends18,type__expends18,descrip__expends18,pg__expends18,elecother__expends18,enttype__expends18,source__expends18
0,2018,1,D852477,N00026651,DN,Capri S Cafaro for Congress,"Cafaro, Capri",A60,397,01/10/2017,Hubbard,OH,44425,,H4OH14094,,Compliance Services,P2004,,CAN,@auto
1,2018,2,D852474,N00026651,DN,Capri S Cafaro for Congress,"Cafaro, Capri",A60,99,01/06/2017,Hubbard,OH,44425,,H4OH14094,,Compliance Services,P2004,,CAN,@auto
2,2018,3,D907626,N00026651,DN,Capri S Cafaro for Congress,"Cafaro, Capri",U00,26,01/26/2017,Hubbard,OH,44425,,H4OH14094,,Shipping,P2004,,CAN,tempU


In [55]:
# Lead pac expenditures.
df_lead__expends18 = df_expends18[df_expends18['crpfilerid__expends18'].str.startswith('N', na=False)]
print(len(df_lead__expends18))
showdf(df_lead__expends18)

93


Unnamed: 0,cycle__expends18,id__expends18,transid__expends18,crpfilerid__expends18,recipcode__expends18,pacshort__expends18,crprecipname__expends18,expcode__expends18,amount__expends18,date__expends18,city__expends18,state__expends18,zip__expends18,cmteid_ef__expends18,candid__expends18,type__expends18,descrip__expends18,pg__expends18,elecother__expends18,enttype__expends18,source__expends18
0,2018,1,D852477,N00026651,DN,Capri S Cafaro for Congress,"Cafaro, Capri",A60,397,01/10/2017,Hubbard,OH,44425,,H4OH14094,,Compliance Services,P2004,,CAN,@auto
1,2018,2,D852474,N00026651,DN,Capri S Cafaro for Congress,"Cafaro, Capri",A60,99,01/06/2017,Hubbard,OH,44425,,H4OH14094,,Compliance Services,P2004,,CAN,@auto
2,2018,3,D907626,N00026651,DN,Capri S Cafaro for Congress,"Cafaro, Capri",U00,26,01/26/2017,Hubbard,OH,44425,,H4OH14094,,Shipping,P2004,,CAN,tempU


In [56]:
# Non-lead pac expenditures.
df_pac__expends18 = df_expends18[df_expends18['crpfilerid__expends18'].str.startswith('C', na=False)]
print(len(df_pac__expends18))
showdf(df_pac__expends18)

907


Unnamed: 0,cycle__expends18,id__expends18,transid__expends18,crpfilerid__expends18,recipcode__expends18,pacshort__expends18,crprecipname__expends18,expcode__expends18,amount__expends18,date__expends18,city__expends18,state__expends18,zip__expends18,cmteid_ef__expends18,candid__expends18,type__expends18,descrip__expends18,pg__expends18,elecother__expends18,enttype__expends18,source__expends18
24,2018,25,VQZGNAC0357,C00586982,DP,Kamala Harris Victory Fund,"Giza, Alexander",N10,1100,01/03/2017,Culver City,CA,902323420,,,,Refund,,,IND,@new
36,2018,37,SB21B4554,C00387811,3P,America First Party/Mississippi,Planters Bank & Trust Co,A00,7,01/03/2017,Greenwood,MS,38935,,,,bank fees,,,ORG,@new
37,2018,38,SB21B15920,C00377002,3P,America First National Cmte,1st National Processing,F40,182,01/03/2017,Calabasas,CA,913023829,,,,credit card processing fees,,,ORG,@new


---
### Lobby data
#### lob_agency

In [58]:
# OpenSecrets Data Definition for Lobbying Data: Lobby agencies
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_agency.htm
columns_lob_agency = ['uniqid', 'agencyid', 'agency']

if not os.path.exists('../../data/open_secrets/Lobby/lob_agency.csv'):
    process_data('../../data/open_secrets/Lobby/lob_agency.txt', headers=columns_lob_agency, n_expected_fields=len(columns_lob_agency), show_errs=False)

df_lob_agency = pd.read_csv('../../data/open_secrets/Lobby/lob_agency.csv', on_bad_lines='skip')

Reading line 819552 of 4097763...
Reading line 1639104 of 4097763...
Reading line 2458656 of 4097763...
Reading line 3278208 of 4097763...
Reading line 4097760 of 4097763...
Processed data saved as ../../data/open_secrets/Lobby/lob_agency.csv


In [59]:
showdf(df_lob_agency)

Unnamed: 0,uniqid__lob_agency,agencyid__lob_agency,agency__lob_agency
0,BB7367A7-7B60-4DED-AA2D-A94771A9EBE8,1,US Senate
1,BB7367A7-7B60-4DED-AA2D-A94771A9EBE8,2,US House of Representatives
2,04366C6F-B0CE-4C28-87BF-EE1CC8A9BB41,2,US House of Representatives


#### lob_bills

In [61]:
# OpenSecrets Data Definition for Lobbying Data: Lobby bills
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_bills.htm
columns_lob_bills = ['b_id', 'si_id', 'congno', 'bill_name']

if not os.path.exists('../../data/open_secrets/Lobby/lob_bills.csv'):
    process_data('../../data/open_secrets/Lobby/lob_bills.txt', headers=columns_lob_bills, n_expected_fields=len(columns_lob_bills), show_errs=False)

df_lob_bills = pd.read_csv('../../data/open_secrets/Lobby/lob_bills.csv', on_bad_lines='skip')
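# Drop the last two characters of each bill name (.str keeps missing values as NaN).
# NOTE: in the preview below, b_id 's1461-117' appears as 'S.14', so confirm this slice is not removing digits.
df_lob_bills['bill_name__lob_bills'] = df_lob_bills['bill_name__lob_bills'].str[:-2]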

Reading line 1452405 of 7262028...
Reading line 2904810 of 7262028...
Reading line 4357215 of 7262028...
Reading line 5809620 of 7262028...
Reading line 7262025 of 7262028...
Processed data saved as ../../data/open_secrets/Lobby/lob_bills.csv


In [62]:
showdf(df_lob_bills)

Unnamed: 0,b_id__lob_bills,si_id__lob_bills,congno__lob_bills,bill_name__lob_bills
0,s1461-117,2820018,117,S.14
1,hr463-117,2820018,117,H.R.4
2,s910-116,2820035,116,S.9
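
The preview shows `b_id` `s1461-117` next to bill name `S.14`, which suggests the fixed two-character slice can remove digits when a value carries no trailing residue. A more conservative alternative strips only trailing characters that are not letters or digits. The sketch below assumes the residue consists of such characters; the sample strings are invented.

```python
import pandas as pd

names = pd.Series(["S.1461|,", "H.R.463", None], dtype=object)

# Remove only trailing characters that are not letters or digits.
cleaned = names.str.replace(r"[^0-9A-Za-z]+$", "", regex=True)
```

Values that are already clean pass through untouched, and missing values stay missing.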


#### lob_indus

In [64]:
# OpenSecrets Data Definition for Lobbying Data: Lobby industries
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_indus.htm
columns_lob_indus = ['client', 'sub', 'total', 'year', 'catcode']

if not os.path.exists('../../data/open_secrets/Lobby/lob_indus.csv'):
    process_data('../../data/open_secrets/Lobby/lob_indus.txt', headers=columns_lob_indus, n_expected_fields=len(columns_lob_indus), show_errs=False)

df_lob_indus = pd.read_csv('../../data/open_secrets/Lobby/lob_indus.csv', on_bad_lines='skip')

Reading line 61946 of 309730...
Reading line 123892 of 309730...
Reading line 185838 of 309730...
Reading line 247784 of 309730...
Reading line 309730 of 309730...
Processed data saved as ../../data/open_secrets/Lobby/lob_indus.csv


In [65]:
showdf(df_lob_indus)

Unnamed: 0,client__lob_indus,sub__lob_indus,total__lob_indus,year__lob_indus,catcode__lob_indus
0,National Assn for County Community & Econ Develop,National Assn for County Community & Econ Develop,0,1998,X3000
1,Fox Valley Technical College,Fox Valley Technical College,80000,2015,H5200
2,Employers Cncl on Flexible Compensation,Employers Cncl on Flexible Compensation,580000,2001,J9000


#### lob_issue

In [67]:
# OpenSecrets Data Definition for Lobbying Data: Lobby issues
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_issues.htm
columns_lob_issue = ['si_id', 'uniqid', 'issueid', 'issue', 'specificissue', 'year']

if not os.path.exists('../../data/open_secrets/Lobby/lob_issue.csv'):
    process_data('../../data/open_secrets/Lobby/lob_issue.txt', headers=columns_lob_issue, n_expected_fields=len(columns_lob_issue), show_errs=False)

df_lob_issue = pd.read_csv('../../data/open_secrets/Lobby/lob_issue.csv', on_bad_lines='skip')

Reading line 1218547 of 6092739...
Reading line 2437094 of 6092739...
Reading line 3655641 of 6092739...
Reading line 4874188 of 6092739...
Reading line 6092735 of 6092739...
Processed data saved as ../../data/open_secrets/Lobby/lob_issue.csv


In [68]:
showdf(df_lob_issue)

Unnamed: 0,si_id__lob_issue,uniqid__lob_issue,issueid__lob_issue,issue__lob_issue,specificissue__lob_issue,year__lob_issue
0,3001624,02e92bd6-0159-495e-9d00-8a490a0be8be,DIS,Disaster & Emergency Planning,Issues affecting manufacturer of railroad and ...,2022
1,3001625,02e92bd6-0159-495e-9d00-8a490a0be8be,ENV,Environment & Superfund,Issues affecting manufacturer of railroad and ...,2022
2,3001626,02e92bd6-0159-495e-9d00-8a490a0be8be,LBR,"Labor, Antitrust & Workplace",Issues affecting manufacturer of railroad and ...,2022


#### lob_issue_no_specific

In [70]:
# OpenSecrets Data Definition for Lobbying Data: Lobby issues (no specific issue)
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_issues.htm
columns_lob_issue_no_specific = ['si_id', 'uniqid', 'issueid', 'issue', 'year']

if not os.path.exists('../../data/open_secrets/Lobby/lob_issue_NoSpecficIssue.csv'):
    process_data('../../data/open_secrets/Lobby/lob_issue_NoSpecficIssue.txt', headers=columns_lob_issue_no_specific, n_expected_fields=len(columns_lob_issue_no_specific), show_errs=False)

df_lob_issue_no_specific = pd.read_csv('../../data/open_secrets/Lobby/lob_issue_NoSpecficIssue.csv', on_bad_lines='skip')

Reading line 601972 of 3009864...
Reading line 1203944 of 3009864...
Reading line 1805916 of 3009864...
Reading line 2407888 of 3009864...
Reading line 3009860 of 3009864...
Processed data saved as ../../data/open_secrets/Lobby/lob_issue_NoSpecficIssue.csv


In [71]:
showdf(df_lob_issue_no_specific)

Unnamed: 0,si_id__lob_issue_NoSpecficIssue,uniqid__lob_issue_NoSpecficIssue,issueid__lob_issue_NoSpecficIssue,issue__lob_issue_NoSpecficIssue,year__lob_issue_NoSpecficIssue
0,3001624,02e92bd6-0159-495e-9d00-8a490a0be8be,DIS,Disaster & Emergency Planning,2022
1,3001625,02e92bd6-0159-495e-9d00-8a490a0be8be,ENV,Environment & Superfund,2022
2,3001626,02e92bd6-0159-495e-9d00-8a490a0be8be,LBR,"Labor, Antitrust & Workplace",2022


#### lob_lobbying

In [73]:
# OpenSecrets Data Definitions for Lobbying Data: Lobbying
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_lobbying.htm
columns_lob_lobbying = ['uniqid', 'registrant_raw', 'registrant', 'isfirm', 'client_raw', 'client',
                        'ultorg', 'amount', 'catcode', 'source', 'self', 'includensfs', 'use',
                        'ind', 'year', 'type', 'typelong', 'affiliate']

if not os.path.exists('../../data/open_secrets/Lobby/lob_lobbying.csv'):
    process_data('../../data/open_secrets/Lobby/lob_lobbying.txt', headers=columns_lob_lobbying, n_expected_fields=len(columns_lob_lobbying), show_errs=False)

df_lob_lobbying = pd.read_csv('../../data/open_secrets/Lobby/lob_lobbying.csv', on_bad_lines='skip')

Reading line 308814 of 1544073...
Reading line 617628 of 1544073...
Reading line 926442 of 1544073...
Reading line 1235256 of 1544073...
Reading line 1544070 of 1544073...
Processed data saved as ../../data/open_secrets/Lobby/lob_lobbying.csv


In [74]:
showdf(df_lob_lobbying)

Unnamed: 0,uniqid__lob_lobbying,registrant_raw__lob_lobbying,registrant__lob_lobbying,isfirm__lob_lobbying,client_raw__lob_lobbying,client__lob_lobbying,ultorg__lob_lobbying,amount__lob_lobbying,catcode__lob_lobbying,source__lob_lobbying,self__lob_lobbying,includensfs__lob_lobbying,use__lob_lobbying,ind__lob_lobbying,year__lob_lobbying,type__lob_lobbying,typelong__lob_lobbying,affiliate__lob_lobbying
0,82c5f661-a637-45ad-a3a6-b5ba18cf8962,ASTRAZENECA PHARMACEUTICALS LP,AstraZeneca Pharmaceuticals,,ASTRAZENECA PHARMACEUTICALS LP,AstraZeneca Pharmaceuticals,AstraZeneca PLC,1370000.0,H4300,pac,x,,y,y,2021,q4a,FOURTH QUARTER AMENDMENT,
1,84ad3a9e-5864-4227-a802-e268fbf37237,"DAVID L. HORNE, LLC",David L Horne LLC,y,MULTIFAMILY LENDERS COUNCIL,Multifamily Lenders Council,Multifamily Lenders Council,15000.0,F4600,wda16,n,,y,y,2021,q4,FOURTH QUARTER REPORT,
2,85b111b1-5d2e-4107-bc24-0921316e29a5,ECHELON GOVERNMENT AFFAIRS,Echelon Government Affairs,y,THE ALBERS GROUP,Albers Group,Albers Group,10000.0,Y4000,,n,,y,y,2021,q4,FOURTH QUARTER REPORT,
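
Both `df_lob_issue` and `df_lob_lobbying` carry a `uniqid` column, which suggests that issues can be attached to filings by joining on it. The following is a minimal sketch under that assumption (it has not been verified against the data dictionary here). The rows are hypothetical stand-ins, not real records; only the column names come from the previews above.

```python
import pandas as pd

# Hypothetical stand-in rows; column names follow the previews above.
issues = pd.DataFrame({
    'uniqid__lob_issue': ['r1', 'r1', 'r1', 'r2'],
    'issueid__lob_issue': ['DIS', 'ENV', 'LBR', 'ENV'],
})
filings = pd.DataFrame({
    'uniqid__lob_lobbying': ['r1', 'r2', 'r3'],
    'client__lob_lobbying': ['Client A', 'Client B', 'Client C'],
})

# Left join keeps filings that have no matching issue rows (assumed join key: uniqid).
joined = filings.merge(
    issues,
    how='left',
    left_on='uniqid__lob_lobbying',
    right_on='uniqid__lob_issue',
)

# Count distinct issue codes per client; nunique() ignores the NaN left by unmatched filings.
issues_per_client = joined.groupby('client__lob_lobbying')['issueid__lob_issue'].nunique()
print(issues_per_client)
```

A left join is used so that filings without issue rows remain visible with a count of zero rather than silently dropping out.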


#### lob_lobbyist

In [76]:
# OpenSecrets Data Definition for Lobbyists
# https://www.opensecrets.org/resources/datadictionary/Data%20Dictionary%20lob_lobbyists.htm
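# NOTE: in the preview below, full names ("Last, First") fill the two *_std columns and the
# lobbyist ID appears under lobbyist_lastname_raw, so these headers look shifted relative to
# the data. Check them against the data dictionary before relying on the column names.
columns_lob_lobbyist = ['uniqid', 'lobbyist_lastname_std', 'lobbyist_firstname_std', 'lobbyist_lastname_raw',
                        'lobbyist_firstname_raw', 'lobbyist_id', 'year', 'officialposition', 'cid', 'formercongmem']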

if not os.path.exists('../../data/open_secrets/Lobby/lob_lobbyist.csv'):
    process_data('../../data/open_secrets/Lobby/lob_lobbyist.txt', headers=columns_lob_lobbyist, n_expected_fields=len(columns_lob_lobbyist), show_errs=False)

df_lob_lobbyist = pd.read_csv('../../data/open_secrets/Lobby/lob_lobbyist.csv', on_bad_lines='skip')

Reading line 699422 of 3497114...
Reading line 1398844 of 3497114...
Reading line 2098266 of 3497114...
Reading line 2797688 of 3497114...
Reading line 3497110 of 3497114...
Processed data saved as ../../data/open_secrets/Lobby/lob_lobbyist.csv


In [77]:
showdf(df_lob_lobbyist)

Unnamed: 0,uniqid__lob_lobbyist,lobbyist_lastname_std__lob_lobbyist,lobbyist_firstname_std__lob_lobbyist,lobbyist_lastname_raw__lob_lobbyist,lobbyist_firstname_raw__lob_lobbyist,lobbyist_id__lob_lobbyist,year__lob_lobbyist,officialposition__lob_lobbyist,cid__lob_lobbyist,formercongmem__lob_lobbyist
0,06C29C84-250F-478B-872A-2F647D9DC044,"O'BRIEN, LAWRENCE F. III","O'Brien, Lawrence F III",Y0000046486L,2004,,,n,,
1,3A22C685-EC94-46AA-9C45-4AA4A7044C28,"BRAGG, PATRICIA DUNMIRE","Bragg, Patricia Dunmire",Y0000020554L,2001,,,n,,
2,5CBE61EC-87F1-401E-9D57-620975C9A1F8,"COSTELLO, RYAN","Costello, Ryan",Y0000027292L,2002,,N00031064,y,,
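
In the preview above, values such as `Y0000046486L` sit under `lobbyist_lastname_raw__lob_lobbyist` rather than `lobbyist_id__lob_lobbyist`. A quick way to confirm which column actually holds the IDs is to test every column against an ID-like pattern. This is only a sketch: the pattern (`Y`, ten digits, `L`) is an assumption inferred from the three rows shown, and the frame is rebuilt by hand from that preview.

```python
import pandas as pd

cols = [
    'uniqid__lob_lobbyist', 'lobbyist_lastname_std__lob_lobbyist',
    'lobbyist_firstname_std__lob_lobbyist', 'lobbyist_lastname_raw__lob_lobbyist',
    'lobbyist_firstname_raw__lob_lobbyist', 'lobbyist_id__lob_lobbyist',
    'year__lob_lobbyist', 'officialposition__lob_lobbyist',
    'cid__lob_lobbyist', 'formercongmem__lob_lobbyist',
]
# Rows copied from the preview above; empty cells are written as ''.
rows = [
    ['06C29C84-250F-478B-872A-2F647D9DC044', "O'BRIEN, LAWRENCE F. III",
     "O'Brien, Lawrence F III", 'Y0000046486L', 2004, '', '', 'n', '', ''],
    ['3A22C685-EC94-46AA-9C45-4AA4A7044C28', 'BRAGG, PATRICIA DUNMIRE',
     'Bragg, Patricia Dunmire', 'Y0000020554L', 2001, '', '', 'n', '', ''],
    ['5CBE61EC-87F1-401E-9D57-620975C9A1F8', 'COSTELLO, RYAN',
     'Costello, Ryan', 'Y0000027292L', 2002, '', 'N00031064', 'y', '', ''],
]
preview = pd.DataFrame(rows, columns=cols)

# Assumed ID shape, inferred from the sample only: 'Y' + 10 digits + 'L'.
id_like = r'^Y\d{10}L$'

# Share of values in each column that match the ID pattern.
share = {
    col: preview[col].astype(str).str.match(id_like, na=False).mean()
    for col in preview.columns
}
id_column = max(share, key=share.get)
print(id_column)
```

Running the same check on the full `df_lob_lobbyist` would show whether the shift affects every row or only some of them.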


#### lob_rpt

In [79]:
# OpenSecrets Data Definitions for Lobbying Data: Report types
# No documentation provided on OpenSecrets.org
columns_lob_rpt = ['typelong', 'typeshort']

if not os.path.exists('../../data/open_secrets/Lobby/lob_rpt.csv'):
    process_data('../../data/open_secrets/Lobby/lob_rpt.txt', headers=columns_lob_rpt, n_expected_fields=len(columns_lob_rpt), show_errs=False)

df_lob_rpt = pd.read_csv('../../data/open_secrets/Lobby/lob_rpt.csv', on_bad_lines='skip')

Reading line 19 of 95...
Reading line 38 of 95...
Reading line 57 of 95...
Reading line 76 of 95...
Reading line 95 of 95...
Processed data saved as ../../data/open_secrets/Lobby/lob_rpt.csv


In [80]:
showdf(df_lob_rpt)

Unnamed: 0,typelong__lob_rpt,typeshort__lob_rpt
0,MID-YEAR REPORT,m
1,MID-YEAR AMENDMENT,ma
2,MID-YEAR (NO ACTIVITY),mn
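
`df_lob_rpt` looks like a lookup table from short report codes to long names. A minimal sketch of using it to validate the `type__lob_lobbying` codes follows, assuming those codes correspond to `typeshort__lob_rpt` (no documentation confirms this). The lookup rows are the three shown above; the filing rows are hypothetical.

```python
import pandas as pd

# The three lookup rows shown in the preview above.
rpt = pd.DataFrame({
    'typelong__lob_rpt': ['MID-YEAR REPORT', 'MID-YEAR AMENDMENT', 'MID-YEAR (NO ACTIVITY)'],
    'typeshort__lob_rpt': ['m', 'ma', 'mn'],
})
# Hypothetical filings; 'q4' is deliberately absent from the toy lookup.
lobbying = pd.DataFrame({
    'uniqid__lob_lobbying': ['a', 'b', 'c'],
    'type__lob_lobbying': ['m', 'ma', 'q4'],
})

# indicator=True adds a '_merge' column recording which side each row came from.
merged = lobbying.merge(
    rpt,
    how='left',
    left_on='type__lob_lobbying',
    right_on='typeshort__lob_rpt',
    indicator=True,
)

# Filings whose report code has no entry in the lookup table.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['uniqid__lob_lobbying', 'type__lob_lobbying']])
```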


### IDs and categories

#### CRP_ID

In [82]:
install_if_needed('xlrd')
import xlrd

xlrd is already installed.


In [83]:
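# Candidate IDs
# This dataset is an Excel workbook rather than delimited text, so load it directly with pandas.
# The placeholder for the sheet's blank first column keeps the enumerate() indices
# aligned with the column positions that read_excel(header=None) assigns.
columns_crp_ids = ['blank_excel_column__crp_ids', 'cid__crp_ids', 'crpname__crp_ids', 'party__crp_ids', 'distidrunfor__crp_ids', 'feccandid__crp_ids']
columns_crp_ids = dict(enumerate(columns_crp_ids))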
df_crp_ids = pd.read_excel('../../data/open_secrets/CRP_IDs.xls', header=None, skiprows=15)
df_crp_ids = df_crp_ids.drop(df_crp_ids.columns[0], axis=1)
df_crp_ids = df_crp_ids.rename(columns=columns_crp_ids)
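
The rename step above depends on the positional keys produced by `dict(enumerate(...))` matching the integer column labels that pandas assigns when `header=None`. A small sketch of that mechanism on an in-memory frame, using two rows from the preview that follows and a hand-built stand-in for the Excel sheet:

```python
import pandas as pd

# Stand-in for read_excel(header=None): integer column labels 0..5, blank first column.
raw = pd.DataFrame([
    [None, 'N00034296', 'Aalders, Tim', 'R', 'UT03', 'H2UT03280'],
    [None, 'N00047923', 'Aazami, Shervin', 'D', 'CA32', 'H2CA30291'],
])

names = ['blank_excel_column__crp_ids', 'cid__crp_ids', 'crpname__crp_ids',
         'party__crp_ids', 'distidrunfor__crp_ids', 'feccandid__crp_ids']

# {0: 'blank_excel_column__crp_ids', 1: 'cid__crp_ids', ...}
mapping = dict(enumerate(names))

# Dropping column 0 leaves labels 1..5, which still match keys 1..5 in the mapping.
df = raw.drop(raw.columns[0], axis=1).rename(columns=mapping)
print(list(df.columns))
```

Without the placeholder name, every key would be off by one and `cid__crp_ids` would land on the candidate-name column.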

In [84]:
showdf(df_crp_ids)

Unnamed: 0,cid__crp_ids,crpname__crp_ids,party__crp_ids,distidrunfor__crp_ids,feccandid__crp_ids
0,N00034296,"Aalders, Tim",R,UT03,H2UT03280
1,N00047923,"Aazami, Shervin",D,CA32,H2CA30291
2,N00051397,"Abahsain, Jill",D,MN07,H2MN07162


#### CRP_Categories

In [86]:
from io import StringIO
crp_filepath = '../../data/open_secrets/CRP_Categories.txt'
with open(crp_filepath, 'r') as file:
    lines = file.readlines()

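# Skip any lines before the table: it starts at the header row beginning with 'Catcode'.
header_line_index = next(i for i, line in enumerate(lines) if line.startswith('Catcode'))
table_data = ''.join(lines[header_line_index:])
df_crp_cats = pd.read_csv(StringIO(table_data), sep='\t')  # the table is tab-separated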
df_crp_cats.columns = df_crp_cats.columns.str.lower().str.replace(' ', '_')
df_crp_cats.columns = [col + '__crp_cats' for col in df_crp_cats.columns]
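
The cell above calls `next(...)` without a default, so it raises a bare `StopIteration` if no line starts with `Catcode`. A slightly more defensive variant is sketched below on an in-memory sample. The helper name `read_table_after_preamble` and the preamble text are hypothetical; the header names are inferred from the preview that follows.

```python
from io import StringIO

import pandas as pd


def read_table_after_preamble(text, header_prefix='Catcode', sep='\t'):
    """Parse a delimited table that follows free-text lines (hypothetical helper)."""
    lines = text.splitlines(keepends=True)
    # Default of None lets us raise a clearer error than StopIteration.
    idx = next((i for i, line in enumerate(lines) if line.startswith(header_prefix)), None)
    if idx is None:
        raise ValueError(f'No header line starting with {header_prefix!r} found')
    df = pd.read_csv(StringIO(''.join(lines[idx:])), sep=sep)
    df.columns = df.columns.str.lower().str.replace(' ', '_')
    return df


# Made-up preamble followed by two rows in the layout the preview suggests.
sample = (
    "Illustrative preamble line\n"
    "\n"
    "Catcode\tCatname\tCatorder\tIndustry\tSector\tSector Long\n"
    "A0000\tAgriculture\tA11\tMisc Agriculture\tAgribusiness\tAgribusiness\n"
    "A1100\tCotton\tA01\tCrop Production & Basic Processing\tAgribusiness\tAgribusiness\n"
)

df = read_table_after_preamble(sample)
print(df.columns.tolist())
```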

In [87]:
showdf(df_crp_cats)

Unnamed: 0,catcode__crp_cats,catname__crp_cats,catorder__crp_cats,industry__crp_cats,sector__crp_cats,sector_long__crp_cats
0,A0000,Agriculture,A11,Misc Agriculture,Agribusiness,Agribusiness
1,A1000,Crop production & basic processing,A01,Crop Production & Basic Processing,Agribusiness,Agribusiness
2,A1100,Cotton,A01,Crop Production & Basic Processing,Agribusiness,Agribusiness
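
Several of the lobbying frames carry a `catcode` column, so `df_crp_cats` can serve as a lookup for industry and sector labels. A minimal sketch, assuming `catcode__lob_indus` uses the same codes as `catcode__crp_cats`: the category rows come from the preview above, while the client rows and totals are hypothetical.

```python
import pandas as pd

# Category rows taken from the preview above.
cats = pd.DataFrame({
    'catcode__crp_cats': ['A0000', 'A1000', 'A1100'],
    'industry__crp_cats': ['Misc Agriculture',
                           'Crop Production & Basic Processing',
                           'Crop Production & Basic Processing'],
    'sector__crp_cats': ['Agribusiness', 'Agribusiness', 'Agribusiness'],
})
# Hypothetical clients and totals; 'Z9999' is a made-up code with no match.
indus = pd.DataFrame({
    'client__lob_indus': ['Client A', 'Client B', 'Client C', 'Client D'],
    'total__lob_indus': [100, 250, 50, 75],
    'catcode__lob_indus': ['A1000', 'A1100', 'A0000', 'Z9999'],
})

# Left join so rows with unknown category codes are kept (with NaN labels).
labeled = indus.merge(
    cats,
    how='left',
    left_on='catcode__lob_indus',
    right_on='catcode__crp_cats',
)

# dropna=False keeps the unmatched rows as their own NaN group in the totals.
by_industry = labeled.groupby('industry__crp_cats', dropna=False)['total__lob_indus'].sum()
print(by_industry)
```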
