# Merge gathered data on social media accounts with OpenSecrets IDs
### Big picture:
We have two lists of current legislators that contain different sets of data, both from https://github.com/unitedstates/congress-legislators. We would like to match up these two lists and store selected fields from both in a new flat file.

## Basic setup

In [2]:
import json
import pandas as pd

## Load data

In [3]:
with open("../data/congress-legislators/alternate_formats/legislators-social-media.json",'r') as f:
    social_media = json.load(f)
with open("../data/congress-legislators/alternate_formats/legislators-current.json",'r') as f:
    current = json.load(f)

## Construct dataframes containing relevant information for each file

#### We'll match on bioguide ID, so we want that from both datasets
The fields we care about right now are name, state, opensecrets, bioguide, twitter, and twitter_id. We'll also get the rss_url of the latest term if they have it.

From legislators-current, we need name, state, opensecrets, and bioguide (plus rss_url of the latest term, if available).

In [19]:

rows = []
for c in current:
    d = {}
    d['name'] = c['name']['official_full']

    # index -1 selects the most recent term in the list of terms
    d['state'] = c['terms'][-1]['state']

    # not everyone has this field; .get returns None when the key is absent
    d['rss_url'] = c['terms'][-1].get('rss_url')

    # a missing opensecrets id is true of literally one guy: Luther Strange (fittingly)
    d['opensecrets'] = c['id'].get('opensecrets')

    d['bioguide'] = c['id']['bioguide']
    rows.append(d)

# DataFrame.append was removed in pandas 2.0, so build the frame from a list of dicts
current_df = pd.DataFrame(rows)

In [6]:
social_media[471]['social']

{'facebook': 'RepDanDonovan',
 'twitter': 'RepDanDonovan',
 'twitter_id': 3353670647,
 'youtube_id': 'UCT8-VskXvxCqDuSGzR5sWWg'}

From legislators-social-media, we need bioguide, twitter, and twitter_id.

In [7]:
rows = []
for i, sm in enumerate(social_media):
    d = {}
    d['bioguide'] = sm['id']['bioguide']

    # not everyone has twitter accounts
    try:
        d['twitter'] = sm['social']['twitter']
        # keep twitter_id as a string
        d['twitter_id'] = str(sm['social']['twitter_id'])
    except KeyError:
        d['twitter'] = None
        d['twitter_id'] = None
        # print the index of the people who don't have twitter
        print(i)

    rows.append(d)

# DataFrame.append was removed in pandas 2.0, so build the frame from a list of dicts
social_media_df = pd.DataFrame(rows)

46
93
264
303
306
382
388
520
531
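
The indexes printed above are the entries without a Twitter account. As a cross-check, the same information could be read off the dataframe itself. The following is a minimal sketch on a made-up three-row frame; `toy_social`, the bioguide IDs, and the handles are placeholders for illustration, not real data.

```python
import pandas as pd

# hypothetical stand-in for social_media_df; values are illustrative only
toy_social = pd.DataFrame([
    {'bioguide': 'X000001', 'twitter': 'RepExampleOne', 'twitter_id': '111'},
    {'bioguide': 'X000002', 'twitter': None, 'twitter_id': None},
    {'bioguide': 'X000003', 'twitter': 'RepExampleThree', 'twitter_id': '333'},
])

# boolean mask of rows with no twitter handle
no_twitter = toy_social['twitter'].isna()

# number of missing handles and the row positions they occupy
n_missing = int(no_twitter.sum())
missing_positions = toy_social.index[no_twitter].tolist()
print(n_missing, missing_positions)
```

Run against the real `social_media_df`, the positions should line up with the indexes printed by the loop, since the frame is built in the same order as the list.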


## Merge datasets on bioguide (left join)

In [8]:
merged = pd.merge(left=current_df, right=social_media_df, on='bioguide', how='left')
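
A left join keeps every row of the left frame and fills the right-hand columns with missing values where no bioguide matches. To see which rows went unmatched, `pd.merge` accepts `indicator=True`, and `validate='one_to_one'` raises an error if a bioguide is duplicated on either side. A small sketch with made-up frames (the names and IDs are hypothetical, and uniqueness of bioguide in the real data is an assumption, not something checked above):

```python
import pandas as pd

# hypothetical stand-ins for current_df and social_media_df
left = pd.DataFrame({'bioguide': ['X000001', 'X000002', 'X000003'],
                     'name': ['Example One', 'Example Two', 'Example Three']})
right = pd.DataFrame({'bioguide': ['X000001', 'X000003'],
                      'twitter': ['RepExampleOne', 'RepExampleThree']})

# left join; validate checks key uniqueness, indicator adds a '_merge' column
toy_merged = pd.merge(left=left, right=right, on='bioguide', how='left',
                      validate='one_to_one', indicator=True)

# rows present only in the left frame have '_merge' == 'left_only'
unmatched = toy_merged.loc[toy_merged['_merge'] == 'left_only', 'bioguide'].tolist()
print(unmatched)
```

If the indicator column is used on the real data, it would probably need to be dropped before writing the result so that it does not end up in the CSV.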

## Write result

In [16]:
merged.to_csv("../data/lincoln/current_social_media.csv", index=False)
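
One thing to keep in mind when reading this file back: `twitter_id` was stored as a string, but `pd.read_csv` infers column types, so a column of digits with some missing values comes back as floats unless a dtype is given. A hedged sketch using an in-memory buffer instead of the real path (the frame `toy` is a made-up stand-in for `merged`):

```python
import io
import pandas as pd

# hypothetical stand-in for the merged frame
toy = pd.DataFrame({'bioguide': ['X000001', 'X000002'],
                    'twitter_id': ['3353670647', None]})

# write to an in-memory buffer rather than to disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# default type inference turns the id column into floats
buffer.seek(0)
inferred = pd.read_csv(buffer)

# requesting str for the column keeps the ids as text
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype={'twitter_id': str})
print(inferred['twitter_id'].dtype, as_text.loc[0, 'twitter_id'])
```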