# Lab 9: Social Network Analysis

In this lab, we'll build a social network of US congressional representatives and their major donors.

To do this, we'll use the API from https://www.opensecrets.org/, which collects public records on campaign donations to congressional representatives. To query their API, you'll need an API key. 

- Sign up for an account: https://www.opensecrets.org/api/admin/index.php?function=signup
- Then get your API key here: https://www.opensecrets.org/api/admin/index.php?function=user_data
- Check the documentation here: https://www.opensecrets.org/api/admin/index.php?function=user_api_list

API requests are executed over HTTP, so we'll import urllib and urllib3. Unfortunately, this code is pretty different in Python 2. If you can't run it, you can skip ahead and load the CSV file.

In [1]:
import urllib.parse  # explicit submodule import, needed for urlencode below
import urllib3

In [2]:
# generate the URL for an API call
def get_url(func,apikey,params):
    url = 'http://www.opensecrets.org/api/?method=%s&output=json&apikey=%s&%s' % \
    (func, apikey, urllib.parse.urlencode(params))
    return url
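To see what this helper produces, here is a minimal self-contained sketch that builds one URL and prints it. The key `DEMO_KEY` is a placeholder introduced for illustration, not a working API key.

```python
import urllib.parse

def get_url(func, apikey, params):
    """Build an OpenSecrets API URL (same logic as the cell above)."""
    query = urllib.parse.urlencode(params)
    return ('http://www.opensecrets.org/api/?method=%s&output=json&apikey=%s&%s'
            % (func, apikey, query))

# 'DEMO_KEY' is a placeholder, not a real key
example_url = get_url('getLegislators', 'DEMO_KEY', {'id': 'GA'})
print(example_url)
```

Using `urlencode` for the parameters means values with spaces or special characters are escaped for you.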

Paste in your own API key (free with an account on opensecrets.org).

In [3]:
apikey='your key goes here'

Let's start by checking out our Georgia legislators.

In [4]:
url=get_url('getLegislators',apikey,{'id':'GA'})

In [6]:
# in urllib3, you create a PoolManager to make calls 
http = urllib3.PoolManager()

In [7]:
response =http.request('GET',url)

In [8]:
response.data

b'{"response":{"legislator":[{"@attributes":{"cid":"N00035346","firstlast":"Buddy Carter","lastname":"CARTER","party":"R","office":"GA01","gender":"M","first_elected":"2014","exit_code":"0","comments":"","phone":"202-225-5831","fax":"202-226-2269","website":"http:\\/\\/buddycarter.house.gov","webform":"","congress_office":"432 Cannon House Office Building","bioguide_id":"C001103","votesmart_id":"","feccandid":"H4GA01039","twitter_id":"RepBuddyCarter","youtube_url":"","facebook_id":"congressmanbuddycarter","birthdate":"1957-09-06"}},{"@attributes":{"cid":"N00002674","firstlast":"Sanford Bishop","lastname":"BISHOP","party":"D","office":"GA02","gender":"M","first_elected":"1992","exit_code":"0","comments":"","phone":"202-225-3631","fax":"202-225-2203","website":"http:\\/\\/bishop.house.gov","webform":"http:\\/\\/bishop.house.gov\\/contact","congress_office":"2407 Rayburn House Office Building","bioguide_id":"B000490","votesmart_id":"26817","feccandid":"H2GA02031","twitter_id":"SanfordBish

Looks like JSON! Let's import the JSON library and create a decoder.

In [9]:
import json

In [10]:
dec = json.JSONDecoder()

In [11]:
# have to decode from utf-8 first
response_json = dec.decode(response.data.decode('utf-8'))
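As a side note, `json.loads` should give the same result as an explicit `JSONDecoder` and saves a line. The sketch below checks this on a tiny hand-written payload that only imitates the shape of the API response; it is not real API output.

```python
import json

# hand-written bytes shaped like the API response (illustrative, not real output)
raw = b'{"response": {"legislator": [{"@attributes": {"firstlast": "Buddy Carter", "office": "GA01"}}]}}'

text = raw.decode('utf-8')                  # bytes -> str
via_decoder = json.JSONDecoder().decode(text)
via_loads = json.loads(text)                # shortcut for the line above

print(via_loads['response']['legislator'][0]['@attributes']['office'])
```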

In [12]:
response_json.keys()

dict_keys(['response'])

In [13]:
ga_leg = response_json['response']

In [14]:
len(ga_leg['legislator'])

16

In [15]:
print(ga_leg['legislator'][2])

{'@attributes': {'office': 'GA03', 'fax': '202-225-2515', 'facebook_id': 'pages/foo/71389451419', 'website': 'http://westmoreland.house.gov', 'comments': 'Retired at end of 114th', 'feccandid': 'H4GA08067', 'birthdate': '1950-04-02', 'webform': 'https://westmoreland.house.gov/email-me', 'youtube_url': 'http://youtube.com/RepLynnWestmoreland', 'votesmart_id': '8001', 'phone': '202-225-5901', 'congress_office': '2202 Rayburn House Office Building', 'exit_code': '4', 'twitter_id': 'RepWestmoreland', 'party': 'R', 'firstlast': 'Lynn A Westmoreland', 'bioguide_id': 'W000796', 'cid': 'N00026163', 'lastname': 'WESTMORELAND', 'first_elected': '2004', 'gender': 'M'}}


In [16]:
for leg in ga_leg['legislator']:
    print (leg['@attributes']['firstlast'],leg['@attributes']['office'])

Buddy Carter GA01
Sanford Bishop GA02
Lynn A Westmoreland GA03
Hank Johnson GA04
John Lewis GA05
Tom Price GA06
Rob Woodall GA07
Austin Scott GA08
Doug Collins GA09
Jody B Hice GA10
Barry Loudermilk GA11
Richard W Allen GA12
David Scott GA13
Tom Graves GA14
David Perdue GAS1
Johnny Isakson GAS2
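Judging from this listing, the office codes seem to follow a pattern: a two-letter state, then either a district number (House) or `S` plus a seat number (Senate). The helper below is a sketch under that assumption; `parse_office` is a hypothetical name, and the pattern is inferred from the output above rather than taken from the API documentation.

```python
def parse_office(office):
    """Split an office code like 'GA01' or 'GAS1' into (state, chamber, number).

    Assumes the pattern seen in the Georgia listing; other codes may differ.
    """
    state, rest = office[:2], office[2:]
    if rest.startswith('S'):
        return state, 'Senate', int(rest[1:])
    return state, 'House', int(rest)

for code in ['GA01', 'GA14', 'GAS1', 'GAS2']:
    print(code, parse_office(code))
```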


Now, to get the contributors for each of these candidates, we need another call, `candContrib`, documented at https://www.opensecrets.org/api/?output=doc&method=candContrib.

The input to this call is the `cid` attribute of the legislator.

In [17]:
response =http.request('GET',get_url('candContrib',
                                     apikey,
                                     {'cid':ga_leg['legislator'][0]['@attributes']['cid']}))

In [18]:
response.data

b'{"response":{"contributors":{"@attributes":{"cand_name":"Buddy Carter (R)","cid":"N00035346","cycle":"2016","origin":"Center for Responsive Politics","source":"http:\\/\\/www.opensecrets.org\\/politicians\\/contrib.php?cid=N00035346&cycle=2016","notice":"The organizations themselves did not donate, rather the money came from the organization\'s PAC, its individual members or employees or owners, and those individuals\' immediate families."},"contributor":[{"@attributes":{"org_name":"Rite Aid Corp","total":"24250","pacs":"5000","indivs":"19250"}},{"@attributes":{"org_name":"Professional Compounding Centers of America","total":"18500","pacs":"18500","indivs":"0"}},{"@attributes":{"org_name":"General Dynamics","total":"17000","pacs":"10000","indivs":"7000"}},{"@attributes":{"org_name":"American Optometric Assn","total":"16000","pacs":"16000","indivs":"0"}},{"@attributes":{"org_name":"United Parcel Service","total":"12500","pacs":"12500","indivs":"0"}},{"@attributes":{"org_name":"Ameriso

In [19]:
response_json = dec.decode(response.data.decode('utf-8'))['response']

In [20]:
response_json['contributors']['contributor'][:3]

[{'@attributes': {'indivs': '19250',
   'org_name': 'Rite Aid Corp',
   'pacs': '5000',
   'total': '24250'}},
 {'@attributes': {'indivs': '0',
   'org_name': 'Professional Compounding Centers of America',
   'pacs': '18500',
   'total': '18500'}},
 {'@attributes': {'indivs': '7000',
   'org_name': 'General Dynamics',
   'pacs': '10000',
   'total': '17000'}}]
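Note that the dollar amounts come back as strings. If you want to do arithmetic with them, one option is to convert each record first. The sketch below uses the three records shown above; `to_record` is a hypothetical helper name.

```python
# the three contributor records shown above
contributors = [
    {'@attributes': {'org_name': 'Rite Aid Corp',
                     'total': '24250', 'pacs': '5000', 'indivs': '19250'}},
    {'@attributes': {'org_name': 'Professional Compounding Centers of America',
                     'total': '18500', 'pacs': '18500', 'indivs': '0'}},
    {'@attributes': {'org_name': 'General Dynamics',
                     'total': '17000', 'pacs': '10000', 'indivs': '7000'}},
]

def to_record(contributor):
    """Flatten one contributor entry and convert dollar amounts to int."""
    atts = contributor['@attributes']
    return {'org_name': atts['org_name'],
            'total': int(atts['total']),
            'pacs': int(atts['pacs']),
            'indivs': int(atts['indivs'])}

records = [to_record(c) for c in contributors]
top3_total = sum(r['total'] for r in records)
print(top3_total)
```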

Now we'll just create a social network between legislators and the top donors, ignoring the dollar amounts. In the terminology of [Easley and Kleinberg](https://www.cs.cornell.edu/home/kleinber/networks-book/networks-book.pdf), this is an **affiliation network**, because it's a bipartite network between individuals and affiliations that they share.
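To make the idea concrete, here is a small plain-Python sketch of an affiliation network and its projection onto legislators, where two legislators are linked if they share at least one donor. All names are made up for illustration.

```python
from itertools import combinations

# toy affiliation data: legislator -> set of donor organizations (made up)
affiliations = {
    'Legislator A': {'Org X', 'Org Y'},
    'Legislator B': {'Org Y', 'Org Z'},
    'Legislator C': {'Org W'},
}

# bipartite edge list: every edge joins a legislator to an organization
edges = [(leg, org) for leg, orgs in affiliations.items() for org in sorted(orgs)]

# projection: link two legislators whenever they share a donor
shared = {}
for a, b in combinations(sorted(affiliations), 2):
    common = affiliations[a] & affiliations[b]
    if common:
        shared[(a, b)] = common

print(edges)
print(shared)
```

networkx offers the same operation on a graph through `networkx.algorithms.bipartite.projected_graph`.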

In [23]:
import networkx as nx

In [24]:
G = nx.Graph()

In [25]:
ga_leg['legislator'][0]

{'@attributes': {'bioguide_id': 'C001103',
  'birthdate': '1957-09-06',
  'cid': 'N00035346',
  'comments': '',
  'congress_office': '432 Cannon House Office Building',
  'exit_code': '0',
  'facebook_id': 'congressmanbuddycarter',
  'fax': '202-226-2269',
  'feccandid': 'H4GA01039',
  'first_elected': '2014',
  'firstlast': 'Buddy Carter',
  'gender': 'M',
  'lastname': 'CARTER',
  'office': 'GA01',
  'party': 'R',
  'phone': '202-225-5831',
  'twitter_id': 'RepBuddyCarter',
  'votesmart_id': '',
  'webform': '',
  'website': 'http://buddycarter.house.gov',
  'youtube_url': ''}}

In [124]:
for legislator in ga_leg['legislator']:
    # add each legislator as a node, with party and office as node attributes
    G.add_node(legislator['@attributes']['firstlast'],
               party=legislator['@attributes']['party'],
               office=legislator['@attributes']['office'])
    response =http.request('GET',get_url('candContrib',
                                     apikey,
                                     {'cid':legislator['@attributes']['cid']}))
    response_json = dec.decode(response.data.decode('utf-8'))['response']
    for contributor in response_json['contributors']['contributor']:
        G.add_edge(legislator['@attributes']['firstlast'],
                  contributor['@attributes']['org_name'])

In [199]:
nx.write_gexf(G,'ga-reps.xml')

# Building the complete network

Now we'll get all representatives from all states. This requires more API calls than the 200 per day that opensecrets allows.

In [26]:
import pandas as pd

First we need to know all the possible state abbreviations.

In [31]:
df_codes = pd.read_csv('states.csv')
df_codes.tail()

Unnamed: 0,State,Abbreviation
46,Virginia,VA
47,Washington,WA
48,West Virginia,WV
49,Wisconsin,WI
50,Wyoming,WY


In [None]:
legislators = dict()
for code in df_codes['Abbreviation']:
    url=get_url('getLegislators',apikey,{'id':code})
    response =http.request('GET',url)
    legislators[code] = dec.decode(response.data.decode('utf-8'))['response']
    print(code,end=' ')

Now to convert this to a dataframe, and save it as CSV. This took a minute to figure out.

In [241]:
import itertools

In [267]:
delegations = [x['legislator'] for x in legislators.values()]

In [305]:
# delegations is a list of lists. this will flatten it
all_legislators = itertools.chain.from_iterable(delegations)

In [306]:
set([type(atts) for atts in all_legislators])

{dict, str}

In [308]:
leg_df = pd.DataFrame([atts['@attributes'] 
              for atts 
              in itertools.chain.from_iterable(delegations) 
              if type(atts)==dict])
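A plausible reading of the `{dict, str}` result is that a delegation with exactly one member comes back as a single dict rather than a list, so iterating over it yields its string keys. If so, the `type(atts)==dict` filter silently drops that member. The sketch below shows one way to keep them; `as_list` is a hypothetical helper, and the data is a made-up stand-in for `legislators` (the state code `XX` is not real).

```python
import itertools

def as_list(value):
    """Wrap a lone dict in a list so one-member delegations are kept."""
    return value if isinstance(value, list) else [value]

# toy stand-in for `legislators`; 'XX' mimics a one-member delegation
legislators = {
    'GA': {'legislator': [{'@attributes': {'firstlast': 'Rep One'}},
                          {'@attributes': {'firstlast': 'Rep Two'}}]},
    'XX': {'legislator': {'@attributes': {'firstlast': 'Delegate Three'}}},
}

delegations = [as_list(x['legislator']) for x in legislators.values()]
rows = [atts['@attributes']
        for atts in itertools.chain.from_iterable(delegations)]
print([row['firstlast'] for row in rows])
```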

In [311]:
leg_df.head(3)

Unnamed: 0,bioguide_id,birthdate,cid,comments,congress_office,exit_code,facebook_id,fax,feccandid,first_elected,...,gender,lastname,office,party,phone,twitter_id,votesmart_id,webform,website,youtube_url
0,L000578,1960-07-02,N00033987,,322 Cannon House Office Building,0,RepLaMalfa,530-534-7800,H2CA02142,2012,...,M,LAMALFA,CA01,R,202-225-3076,RepLaMalfa,29713,https://lamalfa.house.gov/contact/email-me,http://lamalfa.house.gov,https://youtube.com/RepLaMalfa
1,H001068,1964-02-18,N00033030,,1406 Longworth House Office Building,0,RepHuffman,202-225-5163,H2CA06259,2012,...,M,HUFFMAN,CA02,D,202-225-5161,RepHuffman,59849,https://huffman.house.gov/contact/email-me,http://huffman.house.gov,https://youtube.com/rephuffman
2,G000559,1945-01-24,N00030856,,2438 Rayburn House Office Building,0,repgaramendi,202-225-5914,H0CA10149,2009,...,M,GARAMENDI,CA03,D,202-225-1880,RepGaramendi,29664,https://garamendi.house.gov/contact-me/email-me,http://garamendi.house.gov,https://youtube.com/garamendiCA10


In [309]:
leg_df.to_csv('legislators.csv')

In [32]:
leg_df = pd.read_csv('legislators.csv',index_col=0)
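One side effect of the CSV round trip is visible below: `votesmart_id` comes back as `29713.0` instead of `29713`, because a column with missing values is read as floating point by default. If you want to keep the IDs as text, passing `dtype` to `read_csv` is one option. The sketch uses a two-row inline CSV in place of `legislators.csv`.

```python
import io
import pandas as pd

# two-row stand-in for legislators.csv; the second row has no votesmart_id
csv_text = "cid,votesmart_id\nN00033987,29713\nN00035346,\n"

default = pd.read_csv(io.StringIO(csv_text))
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'votesmart_id': str})

print(default['votesmart_id'].tolist())   # numeric column with a missing value
print(as_text['votesmart_id'].tolist())   # IDs kept as strings
```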

In [33]:
leg_df.head(3)

Unnamed: 0,bioguide_id,birthdate,cid,comments,congress_office,exit_code,facebook_id,fax,feccandid,first_elected,...,gender,lastname,office,party,phone,twitter_id,votesmart_id,webform,website,youtube_url
0,L000578,1960-07-02,N00033987,,322 Cannon House Office Building,0,RepLaMalfa,530-534-7800,H2CA02142,2012,...,M,LAMALFA,CA01,R,202-225-3076,RepLaMalfa,29713.0,https://lamalfa.house.gov/contact/email-me,http://lamalfa.house.gov,https://youtube.com/RepLaMalfa
1,H001068,1964-02-18,N00033030,,1406 Longworth House Office Building,0,RepHuffman,202-225-5163,H2CA06259,2012,...,M,HUFFMAN,CA02,D,202-225-5161,RepHuffman,59849.0,https://huffman.house.gov/contact/email-me,http://huffman.house.gov,https://youtube.com/rephuffman
2,G000559,1945-01-24,N00030856,,2438 Rayburn House Office Building,0,repgaramendi,202-225-5914,H0CA10149,2009,...,M,GARAMENDI,CA03,D,202-225-1880,RepGaramendi,29664.0,https://garamendi.house.gov/contact-me/email-me,http://garamendi.house.gov,https://youtube.com/garamendiCA10


In [34]:
leg_df.tail(3)

Unnamed: 0,bioguide_id,birthdate,cid,comments,congress_office,exit_code,facebook_id,fax,feccandid,first_elected,...,gender,lastname,office,party,phone,twitter_id,votesmart_id,webform,website,youtube_url
536,H001064,1952-07-29,N00031557,,425 Cannon House Office Building,0,CongressmanDennyHeck,202-225-0129,H0WA03161,2012,...,M,HECK,WA10,D,202-225-9740,RepDennyHeck,126058.0,https://dennyheck.house.gov/contact/email-me,http://dennyheck.house.gov,https://youtube.com/RepDennyHeck
537,C000127,1958-10-13,N00007836,,511 Hart Senate Office Building,0,senatorcantwell,202-228-0514,S8WA00194,2000,...,F,CANTWELL,WAS1,D,202-224-3441,SenatorCantwell,27122.0,http://www.cantwell.senate.gov/public/index.cf...,https://www.cantwell.senate.gov,https://youtube.com/SenatorCantwell
538,M001111,1950-10-11,N00007876,,154 Russell Senate Office Building,0,,202-224-0238,S2WA00189,1992,...,F,MURRAY,WAS2,D,202-224-2621,PattyMurray,53358.0,http://www.murray.senate.gov/public/index.cfm/...,https://www.murray.senate.gov/public,https://youtube.com/SenatorPattyMurray


Now we have to get the contributors for each of these legislators.

In [35]:
G = nx.Graph()

The following code block will query the contributions for all 500+ legislators in `leg_df`.

However, there's a rate limit of 200 queries per day, so you won't actually be able to run this all the way through.

In [39]:
failure_cases = list()
for _,legislator in leg_df.iterrows():
    # query for the contribution
    try:
        response =http.request('GET',get_url('candContrib',
                                             apikey,
                                             {'cid':legislator['cid']}))
        # decode and parse
        response_json = dec.decode(response.data.decode('utf-8'))['response']
        # iterate through contributors
        for contributor in response_json['contributors']['contributor']:
            # add labeled edges to network
            G.add_edge(legislator['cid'],
                       contributor['@attributes']['org_name'],
                       amount=contributor['@attributes']['total'])
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print("Failed on",legislator)
        failure_cases.append(legislator)

Failed on bioguide_id                                           G000556
birthdate                                          1958-03-13
cid                                                 N00028418
comments                                  Lost Senate primary
congress_office              303 Cannon House Office Building
exit_code                                                  20
facebook_id                                               NaN
fax                                              202-225-9742
feccandid                                           H6FL08213
first_elected                                            2012
firstlast                                        Alan Grayson
gender                                                      M
lastname                                              GRAYSON
office                                                   FL09
party                                                       D
phone                                            202-225-988
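Because of the daily limit, one way to finish the crawl is to cache each response on disk and resume on a later day. The following is only a sketch of that idea: `fetch_with_cache`, the cache file name, and the `fetch` callback are all hypothetical, and the demo uses a fake fetch function instead of the real API.

```python
import json
import os
import tempfile

def fetch_with_cache(cids, fetch, cache_path, budget=200):
    """Fetch data for each cid, skipping ones already in the cache file.

    Stops after `budget` new calls so one run stays under the daily limit.
    Returns the cache as a dict mapping cid -> response.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    calls = 0
    for cid in cids:
        if cid in cache:
            continue          # already fetched on an earlier run
        if calls >= budget:
            break             # out of quota for this run
        cache[cid] = fetch(cid)
        calls += 1
    with open(cache_path, 'w') as f:
        json.dump(cache, f)
    return cache

# demo: a fake fetch function and a budget of 2 calls per "day"
def fake_fetch(cid):
    return {'cid': cid}

cache_path = os.path.join(tempfile.mkdtemp(), 'contrib_cache.json')
cids = ['N1', 'N2', 'N3', 'N4', 'N5']
day1 = fetch_with_cache(cids, fake_fetch, cache_path, budget=2)
day2 = fetch_with_cache(cids, fake_fetch, cache_path, budget=2)
print(len(day1), len(day2))
```

In the real loop, `fetch` would wrap the `candContrib` request, and the edges would be added to `G` from the cache once every legislator has been fetched.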

In [41]:
len(list(nx.get_edge_attributes(G,'amount').items()))

5349

In [42]:
nx.write_gexf(G,'contributions.xml')

Check out `Lab 9 - Social network analysis.ipynb` to see what we can do with this network.