## Using NY Times APIs

What are APIs?

Structured ways people can give you their data.

Why?

Usually because they want to help web/mobile developers attract more users to their service.

Twitter's API, for example, isn't there to help researchers like you out.

Twitter wants developers to build apps that drive more eyeballs to its service.

![](https://raw.github.com/nealcaren/workshop_2014/master/notebooks/images/times_inequality.png)

No luck scraping the search results, though!

![](https://raw.github.com/nealcaren/workshop_2014/master/notebooks/images/no_luck.png)

In [1]:
import requests

Do me a favor and sign up to be a [developer](http://developer.nytimes.com) with the New York Times and get your own API key.
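Rather than pasting the key directly into the notebook, you can keep it in an environment variable. The sketch below assumes a variable named `NYT_API_KEY` (a hypothetical name; use whatever you set) and stops with a clear message if it is missing.

```python
import os

def load_api_key(var_name='NYT_API_KEY'):
    """Return the API key stored in the environment variable var_name.

    NYT_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a readable message instead of a rejected request later
        raise RuntimeError('Set the ' + var_name + ' environment variable to your NYT API key.')
    return key
```

With the variable set, `my_times_api_key = load_api_key()` keeps the key itself out of the notebook.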

In [2]:
my_times_api_key = 'YOUR_API_KEY'  # paste your own key here

APIs can be accessed like a normal URL, but these URLs are often long and complicated, and they contain values you'll want to change. For example, this URL returns information about the 10 most recent articles published between January 1 and October 15, 2017 that used the word "food":

[http://api.nytimes.com/svc/search/v2/articlesearch.json?sort=newest&begin_date=20170101&end_date=20171015&api-key=d20bc9ac37156ecc4cb3d78eb956201d%3A0%3A54059647&q=food&page=0](http://api.nytimes.com/svc/search/v2/articlesearch.json?sort=newest&begin_date=20170101&end_date=20171015&api-key=d20bc9ac37156ecc4cb3d78eb956201d%3A0%3A54059647&q=food&page=0)

The `requests` library lets you do this in a more civilized way: you put the parameters in a dictionary, and it builds the URL for you.
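To see roughly what happens under the hood, the standard library's `urllib.parse.urlencode` turns a dictionary into a query string. This is a sketch for illustration: `requests` does its own encoding, but the result is essentially the same kind of URL. The key here is a placeholder, not a real one.

```python
from urllib.parse import urlencode

base_url = 'http://api.nytimes.com/svc/search/v2/articlesearch.json'
payload = {'q': 'food',
           'begin_date': '20170101',
           'end_date': '20171015',
           'sort': 'newest',
           'page': 0,
           'api-key': 'YOUR_API_KEY'}  # placeholder, not a real key

# urlencode joins key=value pairs with '&' and escapes characters that aren't URL-safe
url = base_url + '?' + urlencode(payload)
print(url)
```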

In [3]:
payload = {'q'         : 'food',
           'begin_date': '20170101',
           'end_date'  : '20171015',
           'api-key'   : my_times_api_key,
           'sort'      : 'newest',
           'page'      : 0}   # each page holds 10 results, so page 0 is the first 10

base_url = 'http://api.nytimes.com/svc/search/v2/articlesearch.json'

In [4]:
r = requests.get(base_url, params=payload)

# r.url holds the full URL that requests built from base_url and payload

In [5]:
r.text

Because the key was rejected, the API returned only an error message rather than search results:

'{"message":"Invalid authentication credentials"}\n'

In [6]:
r.json()

{'message': 'Invalid authentication credentials'}
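Since an error comes back as ordinary JSON, it can be worth checking for it before digging into the results. A minimal sketch, assuming (as in the output above) that error bodies carry a `message` key and successful ones contain a `response` key; `check_response` is a hypothetical helper name.

```python
def check_response(data):
    """Return data['response'], or raise if the parsed JSON is an error message.

    Assumes errors look like {'message': ...} and successes contain 'response'.
    """
    if 'response' not in data:
        raise ValueError('API error: ' + str(data.get('message', data)))
    return data['response']
```

Calling `check_response(r.json())` on the invalid-key response above would raise `ValueError: API error: Invalid authentication credentials` instead of a confusing `KeyError` later on.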

In [7]:
json = r.json()


In [8]:
json.keys()

dict_keys(['message'])

In [11]:
json['status']

KeyError: 'status'

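With a valid key, the parsed response includes `status` and `response` keys; the error message above has neither, which is why `json['status']` raised a `KeyError`. Output from `json['response']` is omitted because it is very long.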

In [None]:
json['response']['docs']

In [None]:
json['response']['meta']['hits']
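Each item in `json['response']['docs']` is a dictionary describing one article. The sketch below pulls out a few fields; the field names (`pub_date`, `headline` with its `main` entry, `web_url`) reflect the Article Search v2 response as we understand it, so check them against your own output. The sample data is made up for illustration.

```python
def headlines(data):
    """Return (pub_date, main headline, web_url) tuples for one page of results.

    .get() with defaults guards against documents that lack a field.
    """
    rows = []
    for doc in data['response']['docs']:
        headline = doc.get('headline', {}).get('main', '')
        rows.append((doc.get('pub_date', ''), headline, doc.get('web_url', '')))
    return rows

# Made-up example shaped like an Article Search response
sample = {'status': 'OK',
          'response': {'docs': [{'pub_date': '2017-10-14T00:00:00Z',
                                 'headline': {'main': 'An Example Headline'},
                                 'web_url': 'https://www.nytimes.com/example'}],
                       'meta': {'hits': 1}}}
for row in headlines(sample):
    print(row)
```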

In [None]:
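from time import sleep

base_url = 'http://api.nytimes.com/svc/search/v2/articlesearch.json'

# Only the total hit count is needed, so the first page of results is enough
payload = {'q'      : 'food',
           'api-key': my_times_api_key,
           'sort'   : 'newest',
           'page'   : 0}

years = list(range(2010, 2017))  # 2010 through 2016
counts = []
for year in years:
    # Restrict the search to January 1 through December 31 of this year
    year_string = str(year)
    payload['begin_date'] = year_string + '0101'
    payload['end_date']   = year_string + '1231'
    r = requests.get(base_url, params=payload)
    r.raise_for_status()  # stop with a clear HTTP error on a bad key or rate limit
    json = r.json()
    count = json['response']['meta']['hits']
    counts.append(count)
    sleep(6)  # pause between calls to stay under the per-minute rate limit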
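If you do hit the rate limit, retrying after a pause is friendlier than crashing halfway through the loop. A minimal sketch, assuming the API signals rate limiting with HTTP status 429; `fetch_with_retry` is a hypothetical helper that takes any zero-argument function returning a response, and the sleep function is a parameter so it can be swapped out for testing.

```python
import time

def fetch_with_retry(fetch, max_tries=3, wait=6.0, sleep=time.sleep):
    """Call fetch() until its response is not HTTP 429, up to max_tries times.

    Waits a little longer after each rate-limited attempt and returns the last
    response either way.
    """
    response = None
    for attempt in range(max_tries):
        response = fetch()
        if response.status_code != 429:
            return response
        sleep(wait * (attempt + 1))  # back off: 6 s, then 12 s, ...
    return response
```

Inside the loop, `r = fetch_with_retry(lambda: requests.get(base_url, params=payload))` would replace the plain `requests.get` call.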

You can plot the yearly counts with matplotlib.

In [None]:
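%matplotlib inline
import matplotlib.pyplot as plt

plt.scatter(years, counts)
plt.ticklabel_format(useOffset=False)  # show full years on the x-axis, not an offset
plt.xlabel('Year')
plt.ylabel('Articles mentioning "food"')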
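To save the counts rather than just plot them, the standard `csv` module writes one row per year. This sketch writes to an in-memory buffer and uses made-up numbers so it runs on its own; in the notebook you would pass the `years` and `counts` from the loop and a file opened with `open('food_counts.csv', 'w', newline='')` (a hypothetical file name).

```python
import csv
import io

def write_counts(years, counts, out):
    """Write a header row, then one (year, count) row per year, to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(['year', 'food_articles'])
    for year, count in zip(years, counts):
        writer.writerow([year, count])

# Illustrative numbers only, not real article counts
buffer = io.StringIO()
write_counts([2010, 2011], [100, 120], buffer)
print(buffer.getvalue())
```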

Your turn. Modify the script above to output a CSV file with the monthly total of "food" articles. For an extra challenge, add a column with the number of "food security" articles.
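One piece of the exercise is generating the right `begin_date` and `end_date` for each month, including February in leap years. A hint rather than a full solution: `month_ranges` is a hypothetical helper that uses the standard `calendar.monthrange`, which returns the weekday of the first day and the number of days in the month.

```python
import calendar

def month_ranges(year):
    """Yield (begin_date, end_date) strings in YYYYMMDD form for each month of year."""
    for month in range(1, 13):
        last_day = calendar.monthrange(year, month)[1]  # number of days in this month
        yield ('%d%02d01' % (year, month), '%d%02d%02d' % (year, month, last_day))

for begin, end in month_ranges(2016):
    print(begin, end)
```

Each pair can go straight into `payload['begin_date']` and `payload['end_date']` in place of the yearly values.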