# NY Times API 

So far, we have worked with the requests library in Python. Using our knowledge of APIs, we are going to use the NY Times API to collect a set of data. Please note that the NY Times API does not return the full text of news articles. However, it does provide useful metadata such as subject terms, abstracts, updated dates, URLs to images, and other information a developer might otherwise seek by scraping the site.

The main steps for this tutorial are the following:
1. Fetch data from the NY Times API
2. Load JSON data using requests
3. Parse nested JSON for the title and date of each article
4. **Advanced**: Using a visualization library of your choice, visually represent the data.

The first step is to [sign up](https://developer.nytimes.com/signup) for an API key. Take a moment to look through the link. Notice that the available API key requests are separated by the _type_ of request. This is helpful for us as developers, but it also provides user metadata to the NY Times. For example, the NY Times can use client metadata from API key requests for analysis ([BIA](https://en.wikipedia.org/wiki/Business_intelligence)), for building future APIs, for improving existing ones, or for providing additional API wrappers. In an organizational setting, the metadata generated on the client side is just as important as the data received from the server. User data has helped the NY Times improve its developer tools by providing new APIs and corresponding wrappers and by improving data accessibility.

On the sign-up page, fill out your information to receive your API key. For the purposes of this tutorial, a Top Stories API key was requested and used. Feel free to pick any of the other available APIs. See the [list](https://developer.nytimes.com/) of available APIs to follow along with this tutorial. Once you have received the API key by e-mail, store it as a variable in your Python script.

One of the main reasons the NY Times API was selected for this course is that it makes the difference between the dynamic and static components of a URL easy to see. Dynamic components are the parts of the URL that change as new information is needed. The static component is the base URL, which usually stays the same no matter how much the requested information changes. To help you understand the static and dynamic components of a URL when making a GET request, the NY Times has made a helpful [GUI](https://developer.nytimes.com/top_stories_v2.json) available.
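As a minimal sketch of the static and dynamic split, the snippet below builds Top Stories URLs from a fixed base and a changing section name. The helper name `build_url`, the placeholder key `YOUR_API_KEY`, and the section names other than `politics` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Static component: the base shared by every Top Stories request.
BASE_URL = "https://api.nytimes.com/svc/topstories/v2"


def build_url(section, api_key):
    """Combine the static base URL with the dynamic section and API key."""
    query = urlencode({"api-key": api_key})
    return f"{BASE_URL}/{section}.json?{query}"


# Dynamic components: the section name (and the key) change per request.
# Section names other than "politics" are assumed here for illustration.
for section in ["politics", "science", "world"]:
    print(build_url(section, "YOUR_API_KEY"))
```

Only the section name differs between the printed URLs, which is the part you would swap out when requesting a different category.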

For this notebook the following modules will be used:

In [1]:
import requests
import pandas as pd
import json
import networkx
import sys

## 1. Fetching Data
After importing the required modules and retrieving the necessary API key, choose a parameter you want to work with. This can be a single parameter (e.g. politics) or a list of parameters to iterate through as you construct your URL. For simplicity, I have chosen a single parameter.

In [2]:
# sign up and get a key -- I chose Top Stories
key = "a1b6d14c03494629990937518357c475"
# within Top Stories, there is a list of parameters to work with. Choose one param or a list of params to iterate through.
url = 'https://api.nytimes.com/svc/topstories/v2/politics.json?q=&api-key=%s' % key
print(url)

https://api.nytimes.com/svc/topstories/v2/politics.json?q=&api-key=a1b6d14c03494629990937518357c475


## 2. Load the Data as a JSON

Now that we have constructed our URL, feel free to visit it and view the JSON object you will be requesting in this section. A [GET](https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods/GET) request is performed. Error handling is also put in place to ensure the right object is returned. This prevents confusion later on, because the JSON object loaded and returned here is used in parts 3 and 4.

In [3]:
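def data_fetch(url):
    try:
        r = requests.get(url, timeout=100)  # wait up to 100 seconds for the server to send data
        
        if r.status_code != 200:  # anything other than 200 means the request did not succeed
            r = 'N/A'
            return r
        else:
            obj = r.json()  # decode the JSON body into a Python dictionary
            return obj
        
    except requests.exceptions.RequestException as error:  # base class for Requests errors
        print(error)
        sys.exit(1)  # exit with a nonzero status to signal failure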

main_obj = data_fetch(url)
print(main_obj)

{'status': 'OK', 'copyright': 'Copyright (c) 2018 The New York Times Company. All Rights Reserved.', 'section': 'politics', 'last_updated': '2018-01-20T20:47:58-05:00', 'num_results': 12, 'results': [{'section': 'U.S.', 'subsection': 'Politics', 'title': 'Bitter Bickering Muddies the Path to Ending the Government Shutdown', 'abstract': 'With the government shut down and the two parties faulting each other, senators from both parties were looking for an agreement to end the crisis.', 'url': 'https://www.nytimes.com/2018/01/20/us/politics/government-shutdown-budget-talks.html', 'byline': 'By THOMAS KAPLAN and SHERYL GAY STOLBERG', 'item_type': 'Article', 'updated_date': '2018-01-20T19:53:12-05:00', 'created_date': '2018-01-20T11:07:56-05:00', 'published_date': '2018-01-20T11:07:56-05:00', 'material_type_facet': '', 'kicker': '', 'des_facet': ['United States Politics and Government', 'Shutdowns (Institutional)', 'Deferred Action for Childhood Arrivals'], 'org_facet': ['Senate'], 'per_face

Now that we have our JSON loaded, let's take a moment to understand the code above. On line 3, the GET request is slightly different from what we have come across so far. The [timeout](http://docs.python-requests.org/en/master/user/quickstart/#timeouts) argument tells Requests how long to wait for a response. In this example, 100 seconds has been designated, but the value can be smaller or larger. The 100 seconds is not a limit on the total time the script may take to download the data. Instead, it instructs Requests to raise an exception if no bytes have been received for 100 seconds.

We have also included a check to ensure that our request returns the correct status_code. Different [status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) indicate the different responses a server sends when a client requests data using the [Hypertext Transfer Protocol](https://developer.mozilla.org/en-US/docs/Web/HTTP). Since a 200 status code indicates that the request was successful, we check for any case where the response comes back with a different code.
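To make the status code check more concrete, here is a small sketch that uses Python's standard `http.HTTPStatus` enum to label a few codes. The helper `describe_status` is hypothetical, and the codes are chosen only for illustration; they are not a list of what the NY Times API returns.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short, readable label for an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    kind = "success" if 200 <= code < 300 else "not successful"
    return f"{code} {status.phrase} ({kind})"


for code in (200, 401, 404, 429):
    print(describe_status(code))
```

A label like this can be printed instead of silently returning a placeholder value, which makes it easier to see why a request failed.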

Line 12 catches any error that Requests could have raised. Instead of catching the base exception class, RequestException, this statement can be modified to look specifically for a ConnectionError. To write more robust code with better error handling in the future, take a look at the [Exceptions doc](http://docs.python-requests.org/en/master/api/#exceptions).
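One possible way to narrow the error handling is sketched below. The function name `fetch_json`, its injectable `get` parameter (added here so the function can be tried without a network call), and the 10 second timeout are assumptions for illustration, not part of the original notebook. More specific exceptions are listed before the base class so that each one gets its own message.

```python
import requests


def fetch_json(url, get=requests.get, timeout=10):
    """Return the parsed JSON body, or None if the request fails."""
    try:
        response = get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
        return response.json()
    except requests.exceptions.Timeout:
        print("The server took too long to respond.")
    except requests.exceptions.ConnectionError:
        print("Could not connect to the server.")
    except requests.exceptions.HTTPError as error:
        print(f"The server returned an error status: {error}")
    except requests.exceptions.RequestException as error:
        # Base class: anything else raised by Requests ends up here.
        print(f"Request failed: {error}")
    return None


# Example usage (performs a real request, so it is left commented out):
# data = fetch_json(url)
```

Returning `None` rather than exiting is a design choice in this sketch; it lets the caller decide whether a failed request should stop the script.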

Line 14 uses sys.exit() from Python's [sys](https://docs.python.org/3/library/sys.html) module. This module works with system-specific parameters and functions. Calling sys.exit() lets us exit from Python if the error occurs. Passing an argument is optional, but I have chosen to pass 1. By convention, an exit status of 0 is considered successful termination, and any other integer is regarded as ["abnormal termination"](https://docs.python.org/3/library/sys.html#sys.exit).
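Under the hood, sys.exit() works by raising a `SystemExit` exception that carries the exit status. The short sketch below captures that exception so the status can be inspected without ending the session; the helper `exit_code_of` is a hypothetical name used only for this demonstration.

```python
import sys


def exit_code_of(func):
    """Run func and return the exit status it requests via sys.exit, if any."""
    try:
        func()
    except SystemExit as exc:
        return exc.code
    return None  # func finished without calling sys.exit


print(exit_code_of(lambda: sys.exit(1)))  # status requested for an abnormal exit
print(exit_code_of(lambda: sys.exit(0)))  # status requested for a normal exit
```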

## 3. Parsing Nested JSON

A JSON object is contained within {}. Inside a JSON object, each key is a string paired with a value that can be a string, number, another object, array, boolean, or null. A nested JSON object occurs when the value of a key is itself another JSON object. A practical way to understand the construction of a large JSON object is to target the value of one key at a time. For the purposes of this tutorial, we are going to look for the titles of all the articles the API has returned under the Politics category in Top Stories.

First, let's examine the keys that appear in the JSON object. 
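A minimal sketch of that inspection is shown here on a trimmed copy of the response printed in section 2, so it runs without a network call. Most fields are omitted, and the variable names `sample` and `first` are chosen only for this example.

```python
import json

# A trimmed copy of the response printed in section 2; most fields are omitted.
sample = json.loads("""
{
  "status": "OK",
  "section": "politics",
  "num_results": 12,
  "results": [
    {
      "title": "Bitter Bickering Muddies the Path to Ending the Government Shutdown",
      "published_date": "2018-01-20T11:07:56-05:00",
      "org_facet": ["Senate"]
    }
  ]
}
""")

print(list(sample.keys()))   # keys of the outer object

first = sample["results"][0]  # "results" is a list of nested objects
print(list(first.keys()))    # keys of one article
print(first["org_facet"][0]) # a value nested inside a list inside an object
```

On the live response, the same calls would be `main_obj.keys()` and `main_obj['results'][0].keys()`.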

In [15]:
def parse_json(obj):
    # 'results' holds a list with one dictionary per article
    data = obj.get('results', [])

    # Collect the title, published_date, and abstract of each article
    title = []
    date = []
    ab = []

    for article in data:
        title.append(article.get('title'))
        date.append(article.get('published_date'))
        ab.append(article.get('abstract'))

    # Create a pandas DataFrame with one row per article
    df = pd.DataFrame({'Published Date': date, 'Article Title': title, 'Abstract': ab})
    return df

print(parse_json(main_obj))


                                             Abstract  \
0   With the government shut down and the two part...   
1   The vast machinery of the federal government b...   
2   Immigration policy, the issue that propelled P...   
3   Carl Higbie stepped down from the Corporation ...   
4   Representative Patrick Meehan, Republican of P...   
5   The early days of the federal government shutd...   
6   Representative Patrick Meehan, a Republican me...   
7   How Dana Loesch, a onetime Democrat, became a ...   
8   This week, only the tours went according to plan.   
9   Rene A. Boucher of Bowling Green, Ky., is expe...   
10  The deal fell apart later in the day when the ...   
11  For someone who once described himself as “ver...   

                                        Article Title  \
0   Bitter Bickering Muddies the Path to Ending th...   
1   Open, Closed or Something in Between: What a S...   
2   After Vowing to Fix Washington, Trump Is Mired...   
3   Trump Appointee Resigns an

In this step, we have sifted through a nested JSON object, extracted the specific elements needed, and reformatted the required information into a pandas DataFrame. DataFrames are central to working with data sets in Python. In this format, the data can now be used for further textual analysis and data visualization.

## 4. Visualization

The purpose of a data visualization usually depends on the case at hand, but good data visualization always aims to communicate your findings clearly to someone else. A visualization needs to be functional, clean, and easy to read. Graphs and charts should follow good design principles that convey the purpose of your project to a new audience. Data visualization is a vast and fast-growing field. For the purposes of this tutorial, we can use the nouns from the article titles or abstracts stored in the pandas DataFrame above to create a noun co-occurrence network.

In [16]:
# do noun chunks and then use networkx for graph
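The placeholder above is left unfinished in the notebook. As a rough sketch of the co-occurrence idea only, the snippet below counts how often pairs of words appear in the same title using the standard library. Two loud assumptions: a tiny stop-word list stands in for real noun extraction (which would need an NLP library), and the three titles are invented for illustration rather than taken from the API response.

```python
from collections import Counter
from itertools import combinations
import re

# Assumption: a tiny stop-word filter stands in for real noun extraction.
STOP_WORDS = {"the", "to", "a", "an", "of", "in", "on", "is", "or", "and"}


def keywords(title):
    """Lowercase a title and return its distinct non-stop-word tokens, sorted."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return sorted({t for t in tokens if t not in STOP_WORDS})


def cooccurrence_edges(titles):
    """Count how often each pair of keywords appears in the same title."""
    counts = Counter()
    for title in titles:
        # Sorted keywords give each pair one canonical (a, b) ordering.
        counts.update(combinations(keywords(title), 2))
    return counts


# Invented titles, used only to exercise the functions.
titles = [
    "Senate debates shutdown budget",
    "Shutdown budget talks stall in Senate",
    "Senate votes on immigration",
]

edges = cooccurrence_edges(titles)
for (a, b), weight in edges.most_common(3):
    print(a, b, weight)
```

With the real data, the titles could come from `df['Article Title']`, and the weighted pairs could then be loaded into a networkx graph, for example with `Graph.add_weighted_edges_from`, before drawing.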
