## 2. Authenticated API: Searching through articles 🗞

This time, we'll be dealing with an *authenticated* API, which requires you to identify yourself when you make requests. This is done with an *API key*, which you must sign up for with the provider of the service you are using. **APIs with keys are the rule rather than the exception.** Among other things, they enable user-based quotas (e.g. throttling, monthly limits) and, importantly, billing. (There will be no billing today.)

### 2.1. Getting a NYT API key

🏋️‍♂️ **Go to https://developer.nytimes.com/get-started and follow the instructions.** 🏋️‍♂️ You'll need to go through all the steps.

One step in the instructions is out of date: once you're signed in to your newly created account, select **"Apps"** (not "My Apps") from the user dropdown (pictured below).

<img src="img/nytdev_apps.jpg" width="600" />

When you create a new app, make sure to enable the *Article Search API*:

<img src="img/nytdev_asa_switch.png" width="600" />

Once you've hit "create", you should be looking at a page like the one below:

<img src="img/nytdev_created_app.png" width="600" />

Note the API key in the image above.

🏋️‍♂️ **Copy your API key, paste it into the cell below, and save it to the variable `NYT_API_KEY`.** We'll be using this variable (or constant, really, as the casing suggests) throughout the rest of this section.

In [None]:
NYT_API_KEY = 'YOUR_KEY_HERE' # replace with your API key
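
If you'd rather not leave your key sitting in a notebook you might share, one option is to read it from an environment variable instead. The sketch below is only an illustration: the helper `load_api_key` and the idea of storing the key in an environment variable called `NYT_API_KEY` are assumptions of ours, not something the NYT developer portal sets up for you.

```python
import os

def load_api_key(env=None, name='NYT_API_KEY', placeholder='YOUR_KEY_HERE'):
    """Return the key stored under `name`, or the placeholder if it is unset.

    `load_api_key` is a hypothetical helper, and the variable name is our own
    convention. `env` defaults to the real process environment.
    """
    env = os.environ if env is None else env
    return env.get(name, placeholder)

# Demonstration with a stand-in mapping instead of the real environment.
demo_key = load_api_key({'NYT_API_KEY': 'not-a-real-key'})
print(demo_key)
```

To use it for real, you would set the environment variable before starting Jupyter and then write `NYT_API_KEY = load_api_key()`.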

### 2.2. Making our first request

It turns out it's rather easy to get meaningful results out of this API: we just need to pass our key in the `api-key` query parameter.

To illustrate, let's search for articles on Laverne Cox.

In [None]:
import requests

response = requests.get(
    'https://api.nytimes.com/svc/search/v2/articlesearch.json',
    params={'q': 'laverne cox', 'api-key': NYT_API_KEY})
response.text[:5000]

Holy moly, that's just the first 5,000 characters of the response...

How can we figure out the structure in order to find what we want?

In [None]:
response.json().keys()

Let's see what `status` contains:

In [None]:
response.json()['status']

And the keys in `response`:

In [None]:
response.json()['response'].keys()

Let's see what's in `docs`.

⚠️ Caution: This will output a *ton* of text on screen. If you want to clear it, make sure the cell is selected, then go to `Edit > Clear Outputs`.

In [None]:
response.json()['response']['docs']
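
Dumping everything works, but it's hard to read. As an alternative, here is a minimal sketch of a helper that prints only the *shape* of a nested JSON structure: the keys and the type of each value, a few levels deep. The function `outline` is our own hypothetical helper, and the `sample` dictionary is a made-up stand-in that only mimics the keys we've seen so far; it is not real API output.

```python
def outline(value, depth=0, max_depth=3):
    """Return indented lines describing the keys and value types in `value`."""
    indent = '  ' * depth
    lines = []
    if isinstance(value, dict):
        for key, child in value.items():
            lines.append(f'{indent}{key}: {type(child).__name__}')
            if depth + 1 < max_depth:
                lines.extend(outline(child, depth + 1, max_depth))
    elif isinstance(value, list) and value:
        # Describe only the first element; we assume the rest look alike.
        lines.append(f'{indent}[0] of {len(value)}: {type(value[0]).__name__}')
        if depth + 1 < max_depth:
            lines.extend(outline(value[0], depth + 1, max_depth))
    return lines

# Made-up stand-in for response.json(), using only keys seen above.
sample = {'status': 'placeholder',
          'response': {'docs': [{'lead_paragraph': 'placeholder text'}]}}
print('\n'.join(outline(sample)))
```

With a real response, you'd call `outline(response.json())` and raise `max_depth` to look deeper.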

### 🏋️‍♂️Exercise🏋️‍♂️ 

**Print total word frequencies, in descending order, for the lead paragraphs of the first 5 pages of API search results for articles on climate change published in 2006 or later.** (Or pick your topic; it doesn't have to be climate change. It just has to have sufficient results.)

This breaks down into several parts:
* Restricting the date range of a search.
* Extracting the lead paragraph of an article from a search result.
* Getting different/multiple pages of search results.
* Gathering up all the lead paragraphs into a single place.
* Counting the word frequencies.

**Search for articles on climate change published in 2006 or later. Save the response.** *Hint:* Read the [API docs](https://developer.nytimes.com/docs/articlesearch-product/1/overview), especially the "Using Facets" section.

In [None]:
# START
response = requests.get(
    'https://api.nytimes.com/svc/search/v2/articlesearch.json',
    params={
        'q': 'climate change',
        'begin_date': '20060101',
        'api-key': NYT_API_KEY
    })
# END
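
The `begin_date` value above is a date written as `YYYYMMDD`. If your start date lives in a Python `date` object rather than a hand-typed string, a small helper can do the formatting. This is just a sketch, and `nyt_date` is a hypothetical name of ours.

```python
from datetime import date

def nyt_date(d):
    """Format a date as the YYYYMMDD string used for `begin_date` above."""
    return d.strftime('%Y%m%d')

# Same parameters as the search above, minus the API key.
params = {'q': 'climate change', 'begin_date': nyt_date(date(2006, 1, 1))}
print(params)
```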

**Extract the lead paragraph from the first search result.**

In [None]:
# START
response.json()['response']['docs'][0]['lead_paragraph']
# END
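
Indexing with `['lead_paragraph']` raises a `KeyError` if a result happens not to carry that field. We haven't verified whether that ever occurs with this API, so treat the following as a defensive sketch under that assumption. The helper `lead_paragraph_of` and the sample documents are made up for illustration.

```python
def lead_paragraph_of(doc):
    """Return the stripped lead paragraph of a result, or None if it is absent or blank."""
    text = doc.get('lead_paragraph')
    if not text or not text.strip():
        return None
    return text.strip()

# Made-up stand-ins for entries of response.json()['response']['docs'].
sample_docs = [
    {'lead_paragraph': '  Some opening sentence.  '},
    {'lead_paragraph': ''},
    {},
]
print([lead_paragraph_of(doc) for doc in sample_docs])
```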

**Fetch the second page of results.** Again, reading the docs can be handy, in particular for how pages are numbered.

In [None]:
# START
requests.get(
    'https://api.nytimes.com/svc/search/v2/articlesearch.json',
    params={
        'q': 'climate change',
        'begin_date': '20060101',
        'page': 1,  # pages are zero-indexed, so the second page is page 1
        'api-key': NYT_API_KEY
    })
# END
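
By now we've typed out nearly the same `params` dictionary a few times. One way to cut down on the repetition is a small function that builds it for us. This is only a sketch: `search_params` is a hypothetical helper of ours, and it uses just the parameters that already appear in this section.

```python
def search_params(query, api_key, page, begin_date=None):
    """Build the query parameters for one page of search results."""
    params = {'q': query, 'page': page, 'api-key': api_key}
    if begin_date is not None:
        params['begin_date'] = begin_date
    return params

# 'not-a-real-key' is a placeholder; pass NYT_API_KEY in practice.
print(search_params('climate change', 'not-a-real-key', page=3, begin_date='20060101'))
```

The result can be passed straight to `requests.get(..., params=...)`.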

Now, **make 5 separate API requests, one for each page of results, and save all the lead paragraphs into a list.**

In [None]:
ARTICLE_SEARCH_URL = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'
lead_paragraphs = [] # save lead paragraphs in this list
# START
for i in range(5):
    response = requests.get(
        ARTICLE_SEARCH_URL,
        params={
            'q': 'climate change',
            'begin_date': '20060101',
            'page': i,
            'api-key': NYT_API_KEY
        })
    for doc in response.json()['response']['docs']:
        lead_paragraphs.append(doc['lead_paragraph'])
# END
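
Since keyed APIs can throttle you, it may be worth pausing between requests when you loop over pages. Below is a sketch of the same loop written as a function that takes the "fetch one page" step as an argument, which also lets us try it out without touching the network. The names `collect_lead_paragraphs`, `fetch_page` and `fake_fetch` are ours, and the right pause length is an assumption you'd need to check against the provider's current limits.

```python
import time

def collect_lead_paragraphs(fetch_page, n_pages, pause_seconds=0.0):
    """Call fetch_page(0), ..., fetch_page(n_pages - 1) and gather lead paragraphs.

    `fetch_page` must return a parsed JSON payload shaped like the responses above.
    Results without a lead paragraph are skipped.
    """
    paragraphs = []
    for page in range(n_pages):
        payload = fetch_page(page)
        for doc in payload['response']['docs']:
            paragraph = doc.get('lead_paragraph')
            if paragraph:
                paragraphs.append(paragraph)
        if page < n_pages - 1:
            time.sleep(pause_seconds)  # be polite between requests
    return paragraphs

def fake_fetch(page):
    """Stand-in for a real request: returns made-up data for the given page."""
    return {'response': {'docs': [{'lead_paragraph': f'paragraph from page {page}'},
                                  {'lead_paragraph': None}]}}

print(collect_lead_paragraphs(fake_fetch, n_pages=3))
```

For real requests, `fetch_page` could be a function that calls `requests.get` with the appropriate `page` value and returns `response.json()`.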

**Join the `lead_paragraphs` into one long string, separated by spaces, and make it all lowercase. Save the results as `text`.**

In [None]:
# START
text = " ".join(lead_paragraphs).lower()
# END

We're going to use a small dose of regex magic to take the text and get the words out of it:

In [None]:
import re
words = re.findall(r'\w+', text)
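
It's worth knowing what `\w+` actually does before trusting the counts: it grabs runs of letters, digits and underscores, so apostrophes, hyphens and periods all split words. The sample sentence below is made up for illustration, and the second pattern is just one possible alternative (an assumption of ours, not the notebook's method) that keeps simple contractions together and drops digits.

```python
import re

sample = "Don't under-estimate the U.N.'s 2006 report.".lower()

# The pattern used above: punctuation splits tokens.
tokens = re.findall(r'\w+', sample)
print(tokens)
# ['don', 't', 'under', 'estimate', 'the', 'u', 'n', 's', '2006', 'report']

# Alternative: letters only, optionally followed by an apostrophe and more letters.
tokens_alt = re.findall(r"[a-z]+(?:'[a-z]+)?", sample)
print(tokens_alt)
# ["don't", 'under', 'estimate', 'the', 'u', 'n', 's', 'report']
```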

**Compute the word frequencies using a dictionary (or `collections.Counter`, if you're adventurous).**

In [None]:
# START
freqs = {}
for w in words:
    if w not in freqs:
        freqs[w] = 0
    freqs[w] += 1
# END
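
For the adventurous: here is a sketch of the same count using `collections.Counter` from the standard library. The word list is a tiny made-up sample so the result is easy to check by eye.

```python
from collections import Counter

sample_words = ['climate', 'change', 'the', 'climate', 'the', 'climate']

counts = Counter(sample_words)   # maps each word to its number of occurrences
top_two = counts.most_common(2)  # the two most frequent words, highest count first
print(top_two)
# [('climate', 3), ('the', 2)]
```

`most_common` also covers the sorting step: called with no argument, it returns every entry in descending order of frequency.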

**Display the entries, sorted in descending order by frequency.**

In [None]:
# START
sorted(freqs.items(), key=lambda x: -x[1])[:250]
# END