# Track your US Congress Representative and Senator

Ever wonder who represents you in the US House and Senate? How are they doing in Washington at addressing your concerns and representing your values? These days the problem is not a lack of information but too much of it. To make it easier to stay informed, I did some googling and coding to get the kind of information I want.

## Steps to set up before running Python
1. Get an API key from the ProPublica website (https://propublica.github.io/congress-api-docs/#congress-api-documentation)
2. Find your US Representative and Senators (https://www.govtrack.us/congress/members)
3. Get their Member IDs (https://www.congress.gov/help/field-values/member-bioguide-ids)
4. Get a New York Times API key (https://developer.nytimes.com/)

                                                        

## In my case, Florida
The Congress Rep in my district is Ted Yoho and his member ID is Y000065. 
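
As a quick sanity check before calling the API, the member ID can be validated and the bills URL built in one place. This is only a minimal sketch: the helper names `is_bioguide_id` and `member_bills_url` are made up for illustration, and the pattern (one capital letter followed by six digits) is an assumption inferred from IDs such as Y000065 and S000033.

```python
import re

# Assumption: member IDs look like one capital letter plus six digits (e.g. Y000065).
BIOGUIDE_PATTERN = re.compile(r'[A-Z]\d{6}')

def is_bioguide_id(member_id):
    """Return True if member_id matches the assumed member ID pattern."""
    return BIOGUIDE_PATTERN.fullmatch(member_id) is not None

def member_bills_url(member_id, bill_type='introduced'):
    """Build the ProPublica URL used in this notebook for a member's bills."""
    if not is_bioguide_id(member_id):
        raise ValueError('Unexpected member ID format: ' + repr(member_id))
    return ('https://api.propublica.org/congress/v1/members/'
            + member_id + '/bills/' + bill_type + '.json')

print(member_bills_url('Y000065'))
```

Catching a mistyped ID early gives a clearer error than an empty or failed API response.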

In [1]:
import requests
import pandas as pd

# You might need to run the following 3 lines once
# import nltk
# nltk.download('punkt')
# nltk.download('stopwords')
from nltk.tokenize import word_tokenize 
from nltk.corpus import stopwords
import string
import re


In [37]:
# You have to substitute your own API keys
headers = {
    'X-API-Key': 'YOUR-PROPUBLICA-KEY'
}
nyt_key='YOUR-NYT-KEY'
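
If you would rather not paste keys into the notebook, one option is to read them from environment variables. A minimal sketch, assuming the hypothetical variable names `PROPUBLICA_API_KEY` and `NYT_API_KEY` (any names would do) and a made-up helper `load_api_key`:

```python
import os

def load_api_key(env_name, placeholder):
    """Return the key stored in the environment variable env_name, or the placeholder if unset."""
    return os.environ.get(env_name, placeholder)

# PROPUBLICA_API_KEY and NYT_API_KEY are hypothetical variable names; pick your own.
headers = {'X-API-Key': load_api_key('PROPUBLICA_API_KEY', 'YOUR-PROPUBLICA-KEY')}
nyt_key = load_api_key('NYT_API_KEY', 'YOUR-NYT-KEY')
```

This keeps the keys out of a notebook you might share, while the rest of the code stays unchanged.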

In [12]:
# Retrieve recent Bills by a specific member
member_id='Y000065' # Ted Yoho, Florida
bill_type='introduced'
bills_url='https://api.propublica.org/congress/v1/members/'+member_id+'/bills/'+bill_type+'.json'
data=requests.get(bills_url,headers=headers).json()

In [13]:
bill_data=data['results'][0]['bills']
# Build one row per bill. Creating the DataFrame from a list of dicts avoids
# chained assignment (df.iloc[i][col]=...), which pandas does not guarantee to write through.
# Each bill record also has a 'primary_subject' field, which is not shown here.
rows=[]
for bill in bill_data:
    rows.append({'date':bill['introduced_date'],
                 'bill id':bill['bill_id'],
                 'title':bill['title'],
                 'co-sponsors':bill['cosponsors']})
df_bill=pd.DataFrame(rows,columns=['date','bill id','title','co-sponsors'])

pd.set_option('display.max_colwidth',200)
from IPython.display import display, HTML
display(df_bill)

| | date | bill id | title | co-sponsors |
|---|---|---|---|---|
| 0 | March 30, 2017 | hr1847-115 | To amend the Horse Protection Act to designate additional unlawful acts under the Act, strengthen penalties for violations of the Act, improve Department of Agriculture enforcement of the Act, and... | 208 |
| 1 | March 23, 2017 | hres223-115 | Calling on the People's Republic of China (PRC) to cease its retaliatory measures against the Republic of Korea in response to the deployment of the U.S. Terminal High Altitude Area Defense (THAAD... | 5 |
| 2 | March 23, 2017 | hconres40-115 | Expressing the sense of Congress that all direct and indirect subsidies that benefit the production or export of sugar by all major sugar producing and consuming countries should be eliminated. | 13 |
| 3 | March 20, 2017 | hr1643-115 | FEAA | 0 |
| 4 | March 16, 2017 | hr1592-115 | Holding Health Insurers Harmless Act | 0 |
| 5 | January 12, 2017 | hr512-115 | WINGMAN Act | 174 |
| 6 | January 10, 2017 | hr430-115 | State Sponsors of Terrorism Review Enhancement Act | 9 |
| 7 | January 4, 2017 | hr291-115 | TRUST Act | 12 |
| 8 | September 28, 2016 | hconres170-114 | Expressing support for the designation of a 'National Purebred Dog Day'. | 1 |
| 9 | July 14, 2016 | hr5908-114 | FEAA | 4 |


There you have it: the latest 20 bills introduced by Representative Yoho (the table above shows the first 10). I personally find some of them questionable (#4 "Holding Health Insurers Harmless Act" and #14 "Condemning and censuring President Barack Obama"?!). The number of co-sponsors also gives a rough hint of whether a bill is partisan or bipartisan.
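
To make the co-sponsor comparison easier, the bills can be ranked by that count. The sketch below uses a few rows copied from the table above and a hypothetical helper, `rank_by_cosponsors`. Keep in mind that a count alone does not say which parties the co-sponsors belong to, so it is only a first hint.

```python
# A few (bill id, title, co-sponsors) rows copied from the table above.
bills = [
    ('hr1643-115', 'FEAA', 0),
    ('hr1592-115', 'Holding Health Insurers Harmless Act', 0),
    ('hr512-115', 'WINGMAN Act', 174),
    ('hr430-115', 'State Sponsors of Terrorism Review Enhancement Act', 9),
    ('hr291-115', 'TRUST Act', 12),
]

def rank_by_cosponsors(rows):
    """Return rows sorted from most to fewest co-sponsors."""
    return sorted(rows, key=lambda row: row[2], reverse=True)

for bill_id, title, cosponsors in rank_by_cosponsors(bills):
    print(f'{cosponsors:>4}  {bill_id:<12} {title}')
```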

## Track Congress Rep/Senator in mainstream media such as New York Times

Assuming you already have the NYT API key (see Step 4 at the beginning), you can set up your NYT queries easily. The query options for the NYT API are documented at: https://developer.nytimes.com/article_search_v2.json#/README

I'm going to look at what the NYT has reported about Bernie Sanders (who doesn't miss Bernie?!).
I also want to use information about the bills Bernie introduced in my NYT search queries.

In [18]:
# Retrieve recent Bills by a specific member
member_id='S000033' # Bernie Sanders
bill_type='introduced'
bills_url='https://api.propublica.org/congress/v1/members/'+member_id+'/bills/'+bill_type+'.json'
data=requests.get(bills_url,headers=headers).json()
bill_data=data['results'][0]['bills']
# Build one row per bill. Creating the DataFrame from a list of dicts avoids
# chained assignment (df.iloc[i][col]=...), which pandas does not guarantee to write through.
rows=[]
for bill in bill_data:
    rows.append({'date':bill['introduced_date'],
                 'bill id':bill['bill_id'],
                 'title':bill['title'],
                 'co-sponsors':bill['cosponsors']})
df_bill=pd.DataFrame(rows,columns=['date','bill id','title','co-sponsors'])

pd.set_option('display.max_colwidth',200)
from IPython.display import display, HTML
display(df_bill)

| | date | bill id | title | co-sponsors |
|---|---|---|---|---|
| 0 | March 9, 2017 | s586-115 | Corporate Tax Dodging Prevention Act | 1 |
| 1 | March 2, 2017 | s495-115 | Medical Innovation Prize Fund Act | 0 |
| 2 | February 28, 2017 | s469-115 | Affordable and Safe Prescription Drug Importation Act | 21 |
| 3 | February 16, 2017 | s427-115 | Social Security Expansion Act | 1 |
| 4 | June 9, 2016 | s3044-114 | Puerto Rico Humanitarian Relief and Reconstruction Act | 0 |
| 5 | December 10, 2015 | s2391-114 | American Clean Energy Investment Act of 2015 | 2 |
| 6 | December 10, 2015 | s2398-114 | Clean Energy Worker Just Transition Act | 2 |
| 7 | December 10, 2015 | s2399-114 | Climate Protection and Justice Act of 2015 | 0 |
| 8 | November 5, 2015 | s2242-114 | Save Oak Flat Act | 3 |
| 9 | November 4, 2015 | s2237-114 | Ending Federal Marijuana Prohibition Act of 2015 | 0 |


Take bill #2 as an example. I extracted the bill's title and used its key words in a filter query on the body text of NYT articles. I also required that "Bernie Sanders" appear in the article body or headline. The search period runs from 2016-09-01 to 2017-04-02.

In [28]:
bill_number=2
bill_title=df_bill.iloc[bill_number]['title']
bill_id=df_bill.iloc[bill_number]['bill id']

# String wrangling to extract key words only
bill_title_token = word_tokenize(bill_title)
regex = re.compile('[%s]' % re.escape(string.punctuation)) # remove punctuation
new_title_nopunc=[]
for token in bill_title_token:
    new_token = regex.sub(u'', token)
    if not new_token == u'':
        new_title_nopunc.append(new_token)
        
new_title_nopunc.append(bill_id)
#new_title_nopunc.append("Sanders")
english_stopwords=set(stopwords.words('english')) # build the stop word set once
fq_words=[]
for word in new_title_nopunc:
    if word.lower() not in english_stopwords:
        fq_words.append('"'+str(word)+'"')
fq_words_list=" ".join(fq_words) # key words for the 'fq' (filter query) parameter of the NYT API
print('Original Bill title:  '+bill_title)
print('Query words in article body:  '+fq_words_list)

# prepare New York Times API
nyt_api = 'https://api.nytimes.com/svc/search/v2/articlesearch.json?'
query_term='q="Bernie Sanders"'
fq_term='&fq=body:('+ fq_words_list + ') AND news_desk:("Politics")'
#fq_term='&fq=body:('+ fq_words_list + ')'

begin_date = '20160901'
end_date = '20170402'
date = '&begin_date=' + begin_date + '&' + 'end_date=' + end_date + '&'
offset = "offset=0"
sort_order='&sort=newest' # query parameters take the form name=value
api_key='&api-key='+nyt_key

nyt_url = nyt_api + query_term + fq_term + date + offset + sort_order + api_key

# Fetch articles from NYT
articles=requests.get(nyt_url)
article_data=articles.json()
print('Fetched '+ str(len(article_data['response']['docs']))+' articles')

Original Bill title:  Affordable and Safe Prescription Drug Importation Act
Query words in article body:  "Affordable" "Safe" "Prescription" "Drug" "Importation" "Act" "s469-115"
Fetched 1 articles
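
The request above can also be assembled from a dictionary of parameters, so that quotes and spaces are percent-encoded explicitly instead of being left to string concatenation. This is a minimal sketch using only the standard library. `build_nyt_url` is a hypothetical helper, the parameter names are the ones already used in this notebook, and `sort=newest` is assumed to be the intended form of the sort option.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_nyt_url(api_key, person, body_words, begin_date, end_date):
    """Assemble an Article Search URL with percent-encoded query parameters."""
    params = {
        'q': '"' + person + '"',
        'fq': 'body:(' + body_words + ') AND news_desk:("Politics")',
        'begin_date': begin_date,
        'end_date': end_date,
        'sort': 'newest',
        'api-key': api_key,
    }
    return 'https://api.nytimes.com/svc/search/v2/articlesearch.json?' + urlencode(params)

url = build_nyt_url('YOUR-NYT-KEY', 'Bernie Sanders',
                    '"Affordable" "Safe" "Prescription" "Drug" "Importation" "Act" "s469-115"',
                    '20160901', '20170402')
print(url)
# Round trip: decoding the query string gives back the original values.
print(parse_qs(urlsplit(url).query)['fq'])
```

Passing the same dictionary to `requests.get(..., params=params)` should have a similar effect, since `requests` encodes parameters for you.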


In [32]:
formatted=[]
# Python 3 strings are already Unicode, so no .encode("utf8") is needed
for doc in article_data['response']['docs']:
    dic={}
#    dic['id']=doc['_id']
    dic['headline']=doc['headline']['main']
    dic['date']=doc['pub_date'][0:10] # keep only YYYY-MM-DD
    dic['snippet']=doc['snippet']
    dic['author']=doc['byline']['original']
    dic['URL']=doc['web_url']
    formatted.append(dic)

In [36]:
df_articles= pd.DataFrame.from_dict(formatted)
pd.set_option('max_colwidth',200)
from IPython.display import display, HTML
display(df_articles)
#HTML(df_articles.to_html())

| | URL | author | date | headline | snippet |
|---|---|---|---|---|---|
| 0 | https://www.nytimes.com/2016/11/09/us/politics/donald-trump-won-now-what.html | By ALAN RAPPEPORT and ALEXANDER BURNS | 2016-11-09 | Highlights of Hillary Clinton’s Concession Speech and President Obama’s Remarks | Hillary Clinton gave her concession speech on Wednesday hours after an early-morning conversation in which Mrs. Clinton conceded the presidential race to Donald J. Trump.... |
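
Not every search result necessarily carries every field, so the extraction step can be made more forgiving. The sketch below assumes that fields such as the byline may be missing or empty in some results (I have not checked this against the API documentation). `summarize_doc` and the sample record are hypothetical.

```python
def summarize_doc(doc):
    """Pick the display fields from one search result, tolerating missing keys."""
    byline = doc.get('byline') or {}      # missing, None, or empty all become {}
    headline = doc.get('headline') or {}
    return {
        'headline': headline.get('main', ''),
        'date': (doc.get('pub_date') or '')[:10],  # keep only YYYY-MM-DD
        'snippet': doc.get('snippet', ''),
        'author': byline.get('original') or '',
        'URL': doc.get('web_url', ''),
    }

# Hypothetical result with no byline, to show the fallback.
sample = {'headline': {'main': 'Example headline'},
          'pub_date': '2016-11-09T00:00:00Z',
          'web_url': 'https://www.nytimes.com/'}
print(summarize_doc(sample))
```

With a helper like this, one incomplete record does not raise a `KeyError` and stop the whole loop.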


A closer look at the NYT article reveals that the query matched only the words "Affordable" and "Sanders" in the article body, so it is not really a relevant match. The simple conclusion is that this search found no New York Times coverage of bill #2, the "Affordable and Safe Prescription Drug Importation Act".
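
One possible way to tighten the match is to search for the bill title as a single quoted phrase instead of separate words. This is only a sketch, assuming the `fq` parameter follows Lucene syntax, in which a quoted multi-word string is matched as a phrase. `phrase_filter` is a hypothetical helper, and I have not verified that it returns better results.

```python
def phrase_filter(title, field='body'):
    """Build an fq clause that asks for the whole title as one quoted phrase."""
    # Escape backslashes and double quotes so the title cannot break the phrase.
    escaped = title.replace('\\', '\\\\').replace('"', '\\"')
    return field + ':("' + escaped + '")'

fq_term = phrase_filter('Affordable and Safe Prescription Drug Importation Act')
print(fq_term)
```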

## Next Steps
- Find a way to build better search queries that use contextual phrases rather than separate words.
- Track State House Representatives and Senators with the API from https://legiscan.com/datasets
- Look up APIs for news outlets other than the NYT (e.g. https://newsapi.org/the-washington-post-api)