## Getting the data

In this Jupyter notebook, I want to keep track of a few things:
1. Where we are sourcing our data.
2. The method by which we obtain the data (e.g. scraping or using an API).
3. Some preliminary data cleaning and organization.

Recall that we are trying to identify a relationship between lobbying activity and the individual stock trades of congresspeople. Let's start with the lobbying data.

### Lobbying data

#### opensecrets.org

opensecrets.org is a non-profit watchdog organization that tracks money in U.S. politics. I think it is a good first place to look to learn about the different types of "money" in politics and to get broad-stroke data on each. For instance, it has information on:
1. Personal financial disclosures of congresspeople and estimates of their net worth.
2. Campaign contributions and fundraising data.
3. Political ads by industries, funded either through 527s (issue advocacy groups) or PACs (often businesses, labor unions, or ideological interests).
4. __Domestic and foreign lobbying efforts, in the traditional sense__.

This last one is what we are most interested in, at least for an initial pass.

NB: All of the data on opensecrets.org is itself aggregated from other sources. In particular, its lobbying data is taken from disclosures filed with the Senate Office of Public Records; we will look at this source next, since it is likely more granular.

We will use OpenSecrets' CRP API (named for the Center for Responsive Politics, the organization behind opensecrets.org) to access some of their data. Registering for an API key is easy: https://www.opensecrets.org/open-data/api. Rob Remington's Python client library, "opensecrets-crpapi", lets us interface with the CRP API from Python; see https://github.com/robrem/opensecrets-crpapi.
We now use this to download some data.
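Before installing the client, it may help to see roughly what it does: it sends HTTP GET requests to the API, passing the method name and your key as query parameters. The sketch below only builds such a URL and makes no network call. The base URL and the parameter names `method`, `apikey`, and `output` are assumptions based on the API's public documentation, and `orgSummary` is assumed to be the endpoint behind `crp.orgs.summary`; check these before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URL of the OpenSecrets API (the crpapi client keeps its own
# value in its BASE_URI attribute).
BASE_URI = "https://www.opensecrets.org/api/"


def build_request_url(method, apikey, **params):
    """Build a GET URL for one API method, asking for JSON output.

    The query-parameter names used here are assumptions, not confirmed API details.
    """
    query = {"method": method, "apikey": apikey, "output": "json", **params}
    return BASE_URI + "?" + urlencode(query)


# "YOUR_API_KEY" is a placeholder, not a real key.
print(build_request_url("orgSummary", "YOUR_API_KEY", id="D000000115"))
```

Compare this with the `crp.fetch('candSummary', cid=...)` call further down, which takes the same kind of method name plus keyword parameters.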


In [1]:
%pip install opensecrets-crpapi

Collecting opensecrets-crpapi
  Downloading opensecrets_crpapi-0.2.2-py2.py3-none-any.whl (5.5 kB)
Collecting httplib2
  Downloading httplib2-0.22.0-py3-none-any.whl (96 kB)
Installing collected packages: httplib2, opensecrets-crpapi
Successfully installed httplib2-0.22.0 opensecrets-crpapi-0.2.2
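The next cell imports `get_opensecrets_key` from a local module, `ed.API_keys`, whose contents aren't shown. One common pattern, sketched here as a hypothetical stand-in, keeps the key out of the notebook by reading it from an environment variable; the variable name `OPENSECRETS_API_KEY` is an assumption.

```python
import os


def get_opensecrets_key(env_var="OPENSECRETS_API_KEY"):
    """Hypothetical stand-in for ed.API_keys.get_opensecrets_key.

    Reads the API key from an environment variable so it never appears in
    the notebook itself.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your OpenSecrets API key.")
    return key
```

With this approach, you would set the variable in your shell (e.g. `export OPENSECRETS_API_KEY=...`) before launching Jupyter.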


In [2]:
from crpapi import CRP
from ed.API_keys import get_opensecrets_key  # local helper that returns the API key

crp = CRP(get_opensecrets_key())  # makes a CRP client object

# list every attribute of the client to see which endpoint groups it exposes
print(dir(crp))



['BASE_URI', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'apikey', 'candidates', 'committees', 'fetch', 'http', 'indexp', 'orgs']
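The full `dir()` listing is cluttered with dunder attributes inherited from `object`. A small helper that keeps only public, callable attributes gives a tidier view of what an object offers. It is demonstrated here on a toy class (an illustration only, not part of crpapi) so it runs without an API key. Note that endpoint groups such as `orgs` are attributes rather than methods, so they would need their own call, e.g. `public_methods(crp.orgs)`.

```python
def public_methods(obj):
    """Return the names of obj's public (non-underscore) callable attributes."""
    return [
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    ]


class ToyClient:
    """Illustrative stand-in for an API client."""

    BASE_URI = "https://example.org/api/"  # a plain attribute, not a method

    def fetch(self, method, **params):
        return {"method": method, **params}

    def _private_helper(self):
        return None


print(public_methods(ToyClient()))  # ['fetch']
```

In the notebook, `public_methods(crp.orgs)` is a quicker way to see which organization methods exist than calling a guessed name and hitting an `AttributeError`.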


In [21]:
crp.orgs.get('Microsoft')  # searches for organizations with 'Microsoft' in their name

{'@attributes': {'orgid': 'D000000115', 'orgname': 'Microsoft Corp'}}
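Here the search matched a single organization, so the client returned one dict. Because these responses appear to be converted from XML, a broader search term will likely return several matches as a list of dicts instead; that is an assumption worth checking against a real multi-match query. A small helper that accepts either shape keeps downstream code simple:

```python
def as_list(result):
    """Wrap a single record in a list; leave lists unchanged; map None to []."""
    if result is None:
        return []
    if isinstance(result, list):
        return result
    return [result]


def org_ids(result):
    """Map orgname -> orgid for every organization in a search result."""
    return {
        rec["@attributes"]["orgname"]: rec["@attributes"]["orgid"]
        for rec in as_list(result)
    }


# Same shape as the crp.orgs.get('Microsoft') output above.
single = {"@attributes": {"orgid": "D000000115", "orgname": "Microsoft Corp"}}
print(org_ids(single))  # {'Microsoft Corp': 'D000000115'}
```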

In [18]:
crp.orgs.summary('D000000115')  # returns summary information for Microsoft

{'cycle': '2024',
 'orgid': 'D000000115',
 'orgname': 'Microsoft Corp',
 'total': '3797244',
 'indivs': '2726546',
 'pacs': '0',
 'soft': '1050813',
 'tot527': '19885',
 'dems': '2311653',
 'repubs': '408590',
 'lobbying': '10544433',
 'outside': '0',
 'mems_invested': '0',
 'gave_to_pac': '230',
 'gave_to_party': '934897',
 'gave_to_527': '19885',
 'gave_to_cand': '1653469',
 'source': 'www.opensecrets.org/orgs/summary.php?id=D000000115'}
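Every value in this summary comes back as a string, including the dollar amounts. Before doing any arithmetic, such as comparing the `lobbying` total with the contribution totals, it helps to convert the digit-only fields to integers. A sketch using an abbreviated copy of the output above (note that `isdigit` only handles non-negative whole numbers, and that it also turns `cycle` into an int):

```python
def to_numbers(record):
    """Return a copy of record with all-digit string values converted to int."""
    return {
        key: int(value) if isinstance(value, str) and value.isdigit() else value
        for key, value in record.items()
    }


# Abbreviated copy of the crp.orgs.summary('D000000115') output above.
summary = {
    "cycle": "2024",
    "orgid": "D000000115",
    "orgname": "Microsoft Corp",
    "total": "3797244",
    "dems": "2311653",
    "repubs": "408590",
    "lobbying": "10544433",
}

nums = to_numbers(summary)
# Share of the two-party total that went to Democrats.
dem_share = nums["dems"] / (nums["dems"] + nums["repubs"])
print(f"lobbying: ${nums['lobbying']:,}; Democratic share: {dem_share:.1%}")
```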

In [11]:
crp.orgs.totals('D000000115')  # OrganizationsClient has no `totals` method, so this call fails

AttributeError: 'OrganizationsClient' object has no attribute 'totals'

In [3]:
# get a specific legislator by CID
cand = crp.candidates.get('N00007360')
print(cand['@attributes']['firstlast'])

# get the top contributors to a candidate for a specific cycle
contribs = crp.candidates.contrib('N00007360', '2016')
print(contribs[0]['@attributes']['org_name'])

# get fundraising information for a committee's members, by industry
cmte = crp.committees.cmte_by_ind('HARM', 'F10')
print(cmte[0]['@attributes']['member_name'])

# use fetch to access the endpoints more directly, without pre-parsed results
summ = crp.fetch('candSummary', cid='N00007360')
print(summ['summary']['@attributes']['first_elected'])

Nancy Pelosi
Facebook Inc
O'Rourke, Beto
1987
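Each record returned by the client nests its fields under an `'@attributes'` key, which is why every lookup above goes through `['@attributes']`. Flattening the records once keeps later code shorter. This is a sketch that assumes every record has that key; the sample records are hypothetical and only mimic the shape of the `contrib` output.

```python
def flatten(records):
    """Pull the '@attributes' dict out of each record, accepting one record or a list."""
    if isinstance(records, dict):
        records = [records]
    return [rec["@attributes"] for rec in records]


# Hypothetical records shaped like crp.candidates.contrib(...) output (not real data).
sample = [
    {"@attributes": {"org_name": "Org A", "indivs": "1000"}},
    {"@attributes": {"org_name": "Org B", "indivs": "800"}},
]
rows = flatten(sample)
print([row["org_name"] for row in rows])  # ['Org A', 'Org B']
```

With this helper, `flatten(contribs)[0]['org_name']` does the same job as `contribs[0]['@attributes']['org_name']`.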


In [30]:
print([c['@attributes']['org_name'] for c in contribs])
print(contribs[0]['@attributes']['indivs'])

['Facebook Inc', 'Google Inc', 'Intel Corp', 'Boeing Co', 'Oracle Corp', 'Certain Software Inc', 'Mackenzie Capital Management', 'Marcus & Millichap', 'Peter G Peterson Foundation', 'R&S Assoc']
6450
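To rank a candidate's top contributors by money from individuals, the amounts need to be treated as numbers. If they arrive as strings, as in the organization summary above, sorting them directly would compare text rather than values. A sketch on hypothetical, already-flattened rows (e.g. `[c['@attributes'] for c in contribs]`); the `total` field is assumed to exist alongside `indivs`:

```python
def rank_by(rows, field):
    """Sort flat contribution rows by a numeric field, largest first.

    Values may be strings such as "6450", so they are converted with int().
    """
    return sorted(rows, key=lambda row: int(row[field]), reverse=True)


# Hypothetical, already-flattened contribution rows (not real data).
rows = [
    {"org_name": "Org A", "total": "1200", "indivs": "700"},
    {"org_name": "Org B", "total": "900", "indivs": "900"},
]

for row in rank_by(rows, "indivs"):
    share = int(row["indivs"]) / int(row["total"])
    print(f"{row['org_name']}: {share:.0%} of contributions from individuals")
```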


### Lobbying disclosures from the Office of the Senate