## Some data cleaning

We now tidy up some of the data we are working with. Here are a few of the initial goals:
1. Identify which lobbying reports are relevant, and which we should remove from our consideration.
2. Remove a bunch of unimportant columns from the lobbying data.
3. Explicate and standardize the lobbying issue codes (sectors for lobbying activity).
4. Classify stocks bought by industry, i.e. match each stock to one or more sectors from 3.

We import the necessary packages here.

In [2]:
import numpy as np
import pandas as pd
import ast



### Which filing codes are important?

Let's look at a few codes. Recall that they are of the following form (omitting M and Y codes since there were no filings labelled as such).

In [3]:
filing_types=["RR","RA","Q1","Q1Y","1T","1TY","1A","1AY","1@","1@Y","Q2","Q2Y","2T","2TY","2A","2AY","2@","2@Y","Q3","Q3Y","3T","3TY","3A","3AY","3@","3@Y","Q4","Q4Y","4T","4TY","4A","4AY","4@","4@Y"]

In the end, registrations (RR and RA) will not be relevant to us, since the registration form (called form "LD-1" by the office of the Senate) does not report income or expenses. See, for example, https://lda.senate.gov/filings/public/filing/286588cd-7cfd-4a82-a4a0-c5bd9265f868/print/.

While it is tempting to ignore codes involving Y, since Y is supposed to represent no activity, it seems very possible that firms can report income or expenses when filing a Y code.

In [57]:
lobbying_2023_Q1Y=pd.read_csv("../../lobbying-local-data/LDA_data/Filings_2023/filings_2023_Q1Y.csv")

In [61]:
lobbying_2023_Q1Y["expenses"]=lobbying_2023_Q1Y["expenses"].fillna(0)
lobbying_2023_Q1Y["income"]=lobbying_2023_Q1Y["income"].fillna(0)

In [63]:
lobbying_2023_Q1Y.loc[(lobbying_2023_Q1Y["expenses"]!=0) | (lobbying_2023_Q1Y["income"]!=0)]

Unnamed: 0,url,filing_uuid,filing_type,filing_type_display,filing_year,filing_period,filing_period_display,filing_document_url,filing_document_content_type,income,...,registrant_different_address,registrant_city,registrant_state,registrant_zip,registrant,client,lobbying_activities,conviction_disclosures,foreign_entities,affiliated_organizations
4,https://lda.senate.gov/api/v1/filings/2f3f7ad4...,2f3f7ad4-aa65-4272-9748-cd6b2d486375,Q1Y,1st Quarter - Report (No Activity),2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/2...,text/html,10000.0,...,False,Washington,DC,20001.0,"{'id': 401106013, 'url': 'https://lda.senate.g...","{'id': 54273, 'url': 'https://lda.senate.gov/a...",[],[],[],[]
27,https://lda.senate.gov/api/v1/filings/d18f7814...,d18f7814-89bb-43d2-b069-2ec862e55fed,Q1Y,1st Quarter - Report (No Activity),2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/d...,text/html,0.0,...,True,Sperryvile,VA,22740.0,"{'id': 401103509, 'url': 'https://lda.senate.g...","{'id': 199224, 'url': 'https://lda.senate.gov/...",[],[],[],[]
28,https://lda.senate.gov/api/v1/filings/0afef369...,0afef369-ebe1-442e-9f82-2b123da38939,Q1Y,1st Quarter - Report (No Activity),2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/0...,text/html,15000.0,...,False,Centreville,VA,20122.0,"{'id': 286872, 'url': 'https://lda.senate.gov/...","{'id': 176548, 'url': 'https://lda.senate.gov/...",[],[],[],[]
32,https://lda.senate.gov/api/v1/filings/22cb9607...,22cb9607-85d5-48a0-a834-a593e69441f8,Q1Y,1st Quarter - Report (No Activity),2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/2...,text/html,30000.0,...,False,Dayton,OH,45440.0,"{'id': 401103216, 'url': 'https://lda.senate.g...","{'id': 205208, 'url': 'https://lda.senate.gov/...",[],[],[],[]
38,https://lda.senate.gov/api/v1/filings/3e0840b3...,3e0840b3-b361-43fb-8a18-31494f879e76,Q1Y,1st Quarter - Report (No Activity),2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/3...,text/html,30000.0,...,True,Washington,DC,20037.0,"{'id': 401036920, 'url': 'https://lda.senate.g...","{'id': 208525, 'url': 'https://lda.senate.gov/...",[],[],[],[]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2416,https://lda.senate.gov/api/v1/filings/a8d42fc9...,a8d42fc9-621b-4cf0-81fa-d2248f727d8f,Q1Y,1st Quarter - Report (No Activity),2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/a...,text/html,12000.0,...,False,Valencia,CA,91355.0,"{'id': 401103948, 'url': 'https://lda.senate.g...","{'id': 200458, 'url': 'https://lda.senate.gov/...",[],[],[],[]
2417,https://lda.senate.gov/api/v1/filings/b10e92f1...,b10e92f1-f075-455c-aceb-f826d020b561,Q1Y,1st Quarter - Report (No Activity),2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/b...,text/html,20000.0,...,False,Washington,DC,20005.0,"{'id': 283696, 'url': 'https://lda.senate.gov/...","{'id': 56600, 'url': 'https://lda.senate.gov/a...",[],[],[],[]
2427,https://lda.senate.gov/api/v1/filings/9678095e...,9678095e-ad4b-4c7e-a068-500641560c29,Q1Y,1st Quarter - Report (No Activity),2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/9...,text/html,40000.0,...,False,Washington,DC,20002.0,"{'id': 30837, 'url': 'https://lda.senate.gov/a...","{'id': 136234, 'url': 'https://lda.senate.gov/...",[],[],[],[]
2433,https://lda.senate.gov/api/v1/filings/9e0a0976...,9e0a0976-a2f1-4f66-9206-499d593a1c96,Q1Y,1st Quarter - Report (No Activity),2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/9...,text/html,40000.0,...,False,Washington,DC,20020.0,"{'id': 401105126, 'url': 'https://lda.senate.g...","{'id': 211572, 'url': 'https://lda.senate.gov/...",[],[],[],[]


If a filing is an amendment to a previous filing (i.e. the code involves an "A" or "@"), the information within should be incorporated into the previous filing, so that we don't double count this single datapoint.
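The rules discussed so far can be collected in a small helper. This is a minimal sketch that encodes only what is stated in this section (RR and RA are registrations, a trailing Y is supposed to mean no activity, and an "A" or "@" marks an amendment); the function name `describe_filing_type` is hypothetical and not part of the original notebook.

```python
def describe_filing_type(code):
    """Flag a filing type code using only the rules described in this section."""
    return {
        # RR and RA are the registration forms (LD-1), which carry no income or expenses.
        "registration": code in ("RR", "RA"),
        # Amendments contain an "A" or "@"; this includes RA.
        "amendment": "A" in code or "@" in code,
        # A trailing "Y" is supposed to mean no activity.
        "no_activity": code.endswith("Y"),
    }

# A few of the codes from filing_types above.
for code in ["RR", "RA", "Q1", "Q1Y", "1A", "1@Y"]:
    print(code, describe_filing_type(code))
```

Since Y filings can still report income or expenses, the `no_activity` flag is best treated as a label rather than a reason to discard a filing.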

In [4]:
lobbying_2023_RR=pd.read_csv("../../lobbying-local-data/LDA_data/Filings_2023/filings_2023_RR.csv")
lobbying_2023_RA=pd.read_csv("../../lobbying-local-data/LDA_data/Filings_2023/filings_2023_RA.csv")

Just to see what the columns are, and what a typical entry may look like, here's the row at position 57 of one of these dataframes.

In [84]:
lobbying_2023_RR.iloc[57]

url                             https://lda.senate.gov/api/v1/filings/07a08705...
filing_uuid                                  07a08705-6f68-4bac-afe0-005ae5def8cf
filing_type                                                                    RR
filing_type_display                                                  Registration
filing_year                                                                  2023
filing_period                                                       first_quarter
filing_period_display                                1st Quarter (Jan 1 - Mar 31)
filing_document_url             https://lda.senate.gov/filings/public/filing/0...
filing_document_content_type                                            text/html
income                                                                        NaN
expenses                                                                      NaN
expenses_method                                                               NaN
expenses_method_

A lot of these columns seem to be unimportant. At first glance, I don't think we will ever use: ["url", "filing_type_display", "filing_period_display", "filing_document_url", "filing_document_content_type", "filing_document_type", "expenses_method_display", "posted_by_name", "dt_posted", "registrant_country", "registrant_ppb_country", "registrant_address_1", "registrant_address_2", "registrant_different_address", "registrant_city", "registrant_state", "registrant_zip"]. Keeping the uuid is nice in case we need to refer to a specific filing for any reason. We will drop these columns later.
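When that time comes, the drop itself could look something like the sketch below. It runs on a made-up toy dataframe rather than the real filings, and the names `unused_columns`, `toy`, and `trimmed` are hypothetical.

```python
import pandas as pd

# Column names copied from the list above.
unused_columns = [
    "url", "filing_type_display", "filing_period_display", "filing_document_url",
    "filing_document_content_type", "filing_document_type", "expenses_method_display",
    "posted_by_name", "dt_posted", "registrant_country", "registrant_ppb_country",
    "registrant_address_1", "registrant_address_2", "registrant_different_address",
    "registrant_city", "registrant_state", "registrant_zip",
]

# Toy stand-in for a filings dataframe; the values are made up for illustration.
toy = pd.DataFrame({
    "url": ["u1"],
    "filing_uuid": ["abc"],
    "filing_type": ["RR"],
    "registrant_zip": [20001.0],
})

# errors="ignore" skips any listed column that a given file does not contain.
trimmed = toy.drop(columns=unused_columns, errors="ignore")
print(list(trimmed.columns))
```

Passing `errors="ignore"` is a design choice for the case where not every filings file has every column; without it, `drop` raises a `KeyError` on a missing name.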

For now, let's match up the amended filings with their corresponding originals. We use ast.literal_eval to access information from those entries which contain strings representing dictionaries. (See the section "Dealing with dictionaries in the DataFrames" below.) We add new columns to our dataframes holding the registrant id (senate id) and the client id. An amendment carries the same two ids as the filing it amends, which lets us match one to the other.

In [38]:
# Parse each stringified dictionary and pull out its "id" field.
lobbying_2023_RR["registrant_id"]=lobbying_2023_RR["registrant"].apply(lambda s: ast.literal_eval(s)["id"])
lobbying_2023_RR["client_id"]=lobbying_2023_RR["client"].apply(lambda s: ast.literal_eval(s)["id"])

In [37]:
# Same extraction for the registration amendments.
lobbying_2023_RA["registrant_id"]=lobbying_2023_RA["registrant"].apply(lambda s: ast.literal_eval(s)["id"])
lobbying_2023_RA["client_id"]=lobbying_2023_RA["client"].apply(lambda s: ast.literal_eval(s)["id"])

In [53]:
lobbying_2023_RR_toamend=pd.DataFrame()
for i in range(len(lobbying_2023_RA)):
    reg_client=[lobbying_2023_RA.iloc[i]["registrant_id"], lobbying_2023_RA.iloc[i]["client_id"]]
    df=lobbying_2023_RR.loc[(lobbying_2023_RR["registrant_id"]==reg_client[0]) & (lobbying_2023_RR["client_id"]==reg_client[1])]
    lobbying_2023_RR_toamend=pd.concat([lobbying_2023_RR_toamend,df])
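
The same matching can also be written as a single merge on the two id columns. The sketch below is one possible alternative, not the method used in this notebook; it runs on made-up toy frames (`rr`, `ra`) rather than the real filings, and all ids and uuids in it are invented.

```python
import pandas as pd

# Toy stand-ins for the RR and RA dataframes; ids and uuids are made up.
rr = pd.DataFrame({
    "filing_uuid": ["rr-1", "rr-2", "rr-3"],
    "registrant_id": [1, 1, 2],
    "client_id": [10, 11, 20],
})
ra = pd.DataFrame({
    "filing_uuid": ["ra-1", "ra-2", "ra-3"],
    "registrant_id": [1, 1, 3],
    "client_id": [10, 10, 30],
})

# Each (registrant, client) pair that has at least one amendment, listed once.
amended_pairs = ra[["registrant_id", "client_id"]].drop_duplicates()

# The inner merge keeps only registrations whose pair appears among the amendments.
rr_toamend = rr.merge(amended_pairs, on=["registrant_id", "client_id"], how="inner")
print(rr_toamend)
```

Unlike the loop, this keeps each registration once even when several amendments share its ids (as `ra-1` and `ra-2` do here), and it avoids growing a dataframe one `pd.concat` at a time.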


Let's sort this by registrant_id, followed by client_id.

In [55]:
lobbying_2023_RR_toamend.sort_values(["registrant_id","client_id"])

Unnamed: 0,url,filing_uuid,filing_type,filing_type_display,filing_year,filing_period,filing_period_display,filing_document_url,filing_document_content_type,income,...,registrant_state,registrant_zip,registrant,client,lobbying_activities,conviction_disclosures,foreign_entities,affiliated_organizations,registrant_id,client_id
204,https://lda.senate.gov/api/v1/filings/3199e1d1...,3199e1d1-4f27-441e-845f-cda2a7f46dbc,RR,Registration,2023,first_quarter,1st Quarter (Jan 1 - Mar 31),https://lda.senate.gov/filings/public/filing/3...,text/html,,...,DC,20006.0,"{'id': 682, 'url': 'https://lda.senate.gov/api...","{'id': 54272, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'MAN', 'general_issue_...",[],[],[],682,54272
3880,https://lda.senate.gov/api/v1/filings/202792f1...,202792f1-1b03-4653-9569-e18c087afca0,RR,Registration,2023,fourth_quarter,4th Quarter (Oct 1 - Dec 31),https://lda.senate.gov/filings/public/filing/2...,text/html,,...,DC,20005.0,"{'id': 795, 'url': 'https://lda.senate.gov/api...","{'id': 58093, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'COM', 'general_issue_...",[],[],[],795,58093
2653,https://lda.senate.gov/api/v1/filings/52508a4d...,52508a4d-7dc0-4698-bc1c-a20a179562b2,RR,Registration,2023,third_quarter,3rd Quarter (July 1 - Sep 30),https://lda.senate.gov/filings/public/filing/5...,text/html,,...,DC,20004.0,"{'id': 7257, 'url': 'https://lda.senate.gov/ap...","{'id': 56841, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'HCR', 'general_issue_...",[],[],[],7257,56841
3566,https://lda.senate.gov/api/v1/filings/0e461030...,0e461030-6f88-4167-96eb-de5ce5b9e6ed,RR,Registration,2023,fourth_quarter,4th Quarter (Oct 1 - Dec 31),https://lda.senate.gov/filings/public/filing/0...,text/html,,...,DC,20004.0,"{'id': 7257, 'url': 'https://lda.senate.gov/ap...","{'id': 57772, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'GOV', 'general_issue_...",[],"[{'name': 'YUKSEL YILDIRIM', 'contribution': '...",[],7257,57772
2372,https://lda.senate.gov/api/v1/filings/650b9e5a...,650b9e5a-b4d0-4def-8dae-fcf63eb78a26,RR,Registration,2023,third_quarter,3rd Quarter (July 1 - Sep 30),https://lda.senate.gov/filings/public/filing/6...,text/html,,...,DC,20007.0,"{'id': 15042, 'url': 'https://lda.senate.gov/a...","{'id': 56550, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'ECN', 'general_issue_...",[],[],[],15042,56550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3256,https://lda.senate.gov/api/v1/filings/757c4139...,757c4139-ec01-42f2-89d1-17bc7772bfaf,RR,Registration,2023,third_quarter,3rd Quarter (July 1 - Sep 30),https://lda.senate.gov/filings/public/filing/7...,text/html,,...,DC,20006.0,"{'id': 401107944, 'url': 'https://lda.senate.g...","{'id': 57453, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'CSP', 'general_issue_...",[],[],[],401107944,57453
3986,https://lda.senate.gov/api/v1/filings/f2cb79d8...,f2cb79d8-6df4-4327-b9d8-003e88c3a342,RR,Registration,2023,third_quarter,3rd Quarter (July 1 - Sep 30),https://lda.senate.gov/filings/public/filing/f...,text/html,,...,CA,94025.0,"{'id': 401108060, 'url': 'https://lda.senate.g...","{'id': 58238, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'SMB', 'general_issue_...",[],[],[],401108060,58238
3882,https://lda.senate.gov/api/v1/filings/1264d8e9...,1264d8e9-c889-4a3b-af09-d7a6d6405d0a,RR,Registration,2023,fourth_quarter,4th Quarter (Oct 1 - Dec 31),https://lda.senate.gov/filings/public/filing/1...,text/html,,...,FL,33140.0,"{'id': 401108086, 'url': 'https://lda.senate.g...","{'id': 58095, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'FAM', 'general_issue_...",[],[],[],401108086,58095
3958,https://lda.senate.gov/api/v1/filings/de25992d...,de25992d-b8fa-458e-9ef4-dafe39d50015,RR,Registration,2023,fourth_quarter,4th Quarter (Oct 1 - Dec 31),https://lda.senate.gov/filings/public/filing/d...,text/html,,...,DC,20007.0,"{'id': 401108110, 'url': 'https://lda.senate.g...","{'id': 58199, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'ENV', 'general_issue_...",[],[],[],401108110,58199


In [56]:
lobbying_2023_RA.sort_values(["registrant_id","client_id"])

Unnamed: 0,url,filing_uuid,filing_type,filing_type_display,filing_year,filing_period,filing_period_display,filing_document_url,filing_document_content_type,income,...,registrant_state,registrant_zip,registrant,client,lobbying_activities,conviction_disclosures,foreign_entities,affiliated_organizations,registrant_id,client_id
348,https://lda.senate.gov/api/v1/filings/aa865755...,aa865755-4d17-4399-ad70-5487c65029f2,RA,Registration - Amendment,2023,third_quarter,3rd Quarter (July 1 - Sep 30),https://lda.senate.gov/filings/public/filing/a...,text/html,,...,DC,20006,"{'id': 682, 'url': 'https://lda.senate.gov/api...","{'id': 51841, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'MAN', 'general_issue_...",[],[],[],682,51841
485,https://lda.senate.gov/api/v1/filings/9840a454...,9840a454-645c-40cb-86b0-8949aa1094be,RA,Registration - Amendment,2023,fourth_quarter,4th Quarter (Oct 1 - Dec 31),https://lda.senate.gov/filings/public/filing/9...,text/html,,...,DC,20006,"{'id': 682, 'url': 'https://lda.senate.gov/api...","{'id': 53690, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'IMM', 'general_issue_...",[],[],[],682,53690
476,https://lda.senate.gov/api/v1/filings/97b92a95...,97b92a95-345d-429b-b2ed-54ea50ea7fc0,RA,Registration - Amendment,2023,fourth_quarter,4th Quarter (Oct 1 - Dec 31),https://lda.senate.gov/filings/public/filing/9...,text/html,,...,DC,20006,"{'id': 682, 'url': 'https://lda.senate.gov/api...","{'id': 54272, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'CSP', 'general_issue_...",[],[],[],682,54272
352,https://lda.senate.gov/api/v1/filings/286588cd...,286588cd-7cfd-4a82-a4a0-c5bd9265f868,RA,Registration - Amendment,2023,third_quarter,3rd Quarter (July 1 - Sep 30),https://lda.senate.gov/filings/public/filing/2...,text/html,,...,DC,20006,"{'id': 682, 'url': 'https://lda.senate.gov/api...","{'id': 100528, 'url': 'https://lda.senate.gov/...","[{'general_issue_code': 'TAX', 'general_issue_...",[],[],[],682,100528
439,https://lda.senate.gov/api/v1/filings/79e28204...,79e28204-9f04-4d2a-a8ab-f83cef800b76,RA,Registration - Amendment,2023,fourth_quarter,4th Quarter (Oct 1 - Dec 31),https://lda.senate.gov/filings/public/filing/7...,text/html,,...,DC,20006,"{'id': 682, 'url': 'https://lda.senate.gov/api...","{'id': 101263, 'url': 'https://lda.senate.gov/...","[{'general_issue_code': 'HCR', 'general_issue_...",[],[],[],682,101263
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
378,https://lda.senate.gov/api/v1/filings/b3517459...,b3517459-911a-4f97-a58f-85e01ce350e3,RA,Registration - Amendment,2023,third_quarter,3rd Quarter (July 1 - Sep 30),https://lda.senate.gov/filings/public/filing/b...,text/html,,...,DC,20006,"{'id': 401107944, 'url': 'https://lda.senate.g...","{'id': 57453, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'CSP', 'general_issue_...",[],[],[],401107944,57453
490,https://lda.senate.gov/api/v1/filings/4e63bb22...,4e63bb22-ab56-4322-951d-983bde7d12e9,RA,Registration - Amendment,2023,third_quarter,3rd Quarter (July 1 - Sep 30),https://lda.senate.gov/filings/public/filing/4...,text/html,,...,CA,94025,"{'id': 401108060, 'url': 'https://lda.senate.g...","{'id': 58238, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'SCI', 'general_issue_...",[],[],[],401108060,58238
505,https://lda.senate.gov/api/v1/filings/1cd5b424...,1cd5b424-a273-42b8-8f75-904cded3e591,RA,Registration - Amendment,2023,fourth_quarter,4th Quarter (Oct 1 - Dec 31),https://lda.senate.gov/filings/public/filing/1...,text/html,,...,FL,33140,"{'id': 401108086, 'url': 'https://lda.senate.g...","{'id': 58095, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'FIN', 'general_issue_...",[],[],[],401108086,58095
519,https://lda.senate.gov/api/v1/filings/4282489a...,4282489a-7a96-43a2-98ab-e8b6444b70ae,RA,Registration - Amendment,2023,fourth_quarter,4th Quarter (Oct 1 - Dec 31),https://lda.senate.gov/filings/public/filing/4...,text/html,,...,DC,20007,"{'id': 401108110, 'url': 'https://lda.senate.g...","{'id': 58199, 'url': 'https://lda.senate.gov/a...","[{'general_issue_code': 'ENV', 'general_issue_...",[],[],[],401108110,58199


### Dropping unnecessary columns

### Dealing with dictionaries in the DataFrames

Some of these columns appear to have dictionaries as entries, namely "registrant" (i.e. the lobbyist), "client" (i.e. the entity lobbying), and "lobbying_activities" (the codes for the issues at hand). Let's look at these for the filing at position 57.

In [98]:
print(lobbying_2023_RR.iloc[57]["registrant"])
print()
print(lobbying_2023_RR.iloc[57]["client"])
print()
print(lobbying_2023_RR.iloc[57]["lobbying_activities"])

{'id': 323318, 'url': 'https://lda.senate.gov/api/v1/registrants/323318/', 'house_registrant_id': 39886, 'name': 'BRODY GROUP L.L.C. PUBLIC AFFAIRS', 'description': 'Public Affairs', 'address_1': '3299 K street NW Suite 401', 'address_2': None, 'address_3': None, 'address_4': None, 'city': 'Washington', 'state': 'DC', 'state_display': 'District of Columbia', 'zip': '20007', 'country': 'US', 'country_display': 'United States of America', 'ppb_country': 'US', 'ppb_country_display': 'United States of America', 'contact_name': 'MR. MIKE BRODY', 'contact_telephone': '+1 202-640-9135', 'dt_updated': '2024-01-26T20:13:12.964123-05:00'}

{'id': 53935, 'url': 'https://lda.senate.gov/api/v1/clients/53935/', 'client_id': 53935, 'name': 'BC ENGINEERED PRODUCTS INC.', 'general_description': 'manufacturing and defense industrial base supplier.', 'client_government_entity': None, 'client_self_select': False, 'state': 'NJ', 'state_display': 'New Jersey', 'country': 'US', 'country_display': 'United Sta

For "registrant" and "client", these entries are not actually dictionaries; they are strings representing dictionaries. For "lobbying_activities" we have a string representing a list of dictionaries.
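
To see the difference concretely, here is a minimal sketch on a shortened, made-up entry in the same shape as the "client" column (the name `EXAMPLE CLIENT` and the variable names are invented for illustration).

```python
import ast

# A shortened, made-up entry in the same shape as the "client" column.
raw = "{'id': 1, 'name': 'EXAMPLE CLIENT', 'client_government_entity': None}"
print(type(raw).__name__)

# ast.literal_eval evaluates Python literals only, so it does not run arbitrary code.
parsed = ast.literal_eval(raw)
print(type(parsed).__name__, parsed["name"])

# json.loads would reject this string because of the single quotes and None.
```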

In [131]:
print("Registrant:", ast.literal_eval(lobbying_2023_RR.iloc[57]["registrant"])["name"])
print()
print("Client:", ast.literal_eval(lobbying_2023_RR.iloc[57]["client"])["name"])
print()
activities=ast.literal_eval(lobbying_2023_RR.iloc[57]["lobbying_activities"]) #parse once; this is a list of dictionaries
for activity in activities:
    print(activity["general_issue_code"],activity["description"])


Registrant: BRODY GROUP L.L.C. PUBLIC AFFAIRS

Client: BC ENGINEERED PRODUCTS INC.

DEF Fiscal Year 2024 Defense Authorization and Appropriation bills, related to Defense Logisitics Agency part cost reduction.
BUD Fiscal Year 2024 Defense Authorization and Appropriation bills, related to Defense Logisitics Agency part cost reduction.
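
Looking ahead to the issue codes, the same parsing can be applied to a whole column to get one row per filing and issue code. This is a sketch under assumptions: the toy frame and its entries are made up, and `issue_codes` and `per_code` are hypothetical names not used elsewhere in this notebook.

```python
import ast
import pandas as pd

# Toy frame in the same shape as the filings; the entries are made up.
toy = pd.DataFrame({
    "filing_uuid": ["f-1", "f-2"],
    "lobbying_activities": [
        "[{'general_issue_code': 'DEF', 'description': 'a'}, "
        "{'general_issue_code': 'BUD', 'description': 'b'}]",
        "[]",
    ],
})

# Parse each string once, then keep only the issue codes.
toy["issue_codes"] = toy["lobbying_activities"].apply(
    lambda s: [activity["general_issue_code"] for activity in ast.literal_eval(s)]
)

# explode gives one row per (filing, issue code); filings with no activities get NaN.
per_code = toy[["filing_uuid", "issue_codes"]].explode("issue_codes")
print(per_code)
```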


In [81]:
df["filing_type"]

0       RR
1       RR
2       RR
3       RR
4       RR
        ..
4208    RR
4209    RR
4210    RR
4211    RR
4212    RR
Name: filing_type, Length: 4213, dtype: object

In [85]:
# df=lobbying_2023_RA.drop(["url","filing_uuid","filing_type_display","filing_period_display","filing_document_url","filing_document_content_type","dt_posted",'registrant_country',
#        'registrant_ppb_country', 'registrant_address_1',
#        'registrant_address_2', 'registrant_different_address',
#        'registrant_city', 'registrant_state', 'registrant_zip'], axis=1)

In [60]:
df["registrant"][0]

"{'id': 28721, 'url': 'https://lda.senate.gov/api/v1/registrants/28721/', 'house_registrant_id': 32570, 'name': 'NATIONAL TRUST FOR HISTORIC PRESERVATION', 'description': 'non-profit for historic preservation', 'address_1': '600 14th Street, N.W.', 'address_2': 'Suite 500', 'address_3': None, 'address_4': None, 'city': 'WASHINGTON', 'state': 'DC', 'state_display': 'District of Columbia', 'zip': '20005', 'country': 'US', 'country_display': 'United States of America', 'ppb_country': 'US', 'ppb_country_display': 'United States of America', 'contact_name': 'MR. CARL R. WOLF', 'contact_telephone': '+1 202-588-6254', 'dt_updated': '2024-01-02T10:35:51.502816-05:00'}"

This is a string, so we convert it to a dictionary using the ast package.

In [59]:
ast.literal_eval(df["registrant"][0])["name"]

'NATIONAL TRUST FOR HISTORIC PRESERVATION'

In [31]:
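# Assumed definitions (not shown in the original cells): the filing uuids of each dataframe.
RA_uuids,RR_uuids=lobbying_2023_RA["filing_uuid"],lobbying_2023_RR["filing_uuid"]
S=set(RA_uuids).intersection(set(RR_uuids))
S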

set()

In [27]:
lobbying_2023_RR_amended=pd.DataFrame()
for uuid in RA_uuids:
    lobbying_2023_RR_amended=pd.concat([lobbying_2023_RR_amended,lobbying_2023_RR.loc[lobbying_2023_RR.filing_uuid==uuid]], ignore_index=True)


In [28]:
lobbying_2023_RR_amended

Unnamed: 0,url,filing_uuid,filing_type,filing_type_display,filing_year,filing_period,filing_period_display,filing_document_url,filing_document_content_type,income,...,registrant_different_address,registrant_city,registrant_state,registrant_zip,registrant,client,lobbying_activities,conviction_disclosures,foreign_entities,affiliated_organizations


## Dropping the unimportant columns

## Lobbying issue codes