# New York Times Archive API Code

Elena Fernandez Fernandez

## Sources: 
* https://github.com/rochelleterman/scrape-interwebz/blob/master/1_APIs/3_api_workbook.ipynb
* https://medium.com/@danalindquist/using-new-york-times-api-and-jq-to-collect-news-data-a5f386c7237b
* https://stackabuse.com/saving-text-json-and-csv-to-a-file-in-python/

#### 1. Import libraries

In [1]:
import requests
import json
import csv

#### 2. Constructing API request

* You need your own API key, which you can request here: https://developer.nytimes.com/

In [2]:
your_key = " "  # paste your own API key here

# The URL path takes the year and the month: here 1910/1 is January 1910
url = "https://api.nytimes.com/svc/archive/v1/1910/1.json?api-key=" + your_key

r = requests.get(url)

json_data = r.json()
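
Because the year and the month are path segments of the request URL, it can be convenient to build the URL from parameters instead of editing the string by hand. The following is a minimal sketch, assuming a hypothetical helper named `build_archive_url` (it is not part of the original notebook or of any NYT library) and a placeholder key:

```python
def build_archive_url(year, month, api_key):
    """Return the Archive API URL for one month (hypothetical helper)."""
    if not 1 <= int(month) <= 12:
        raise ValueError("month must be between 1 and 12")
    # The month is written without a leading zero, as in the URL above.
    return ("https://api.nytimes.com/svc/archive/v1/"
            f"{int(year)}/{int(month)}.json?api-key={api_key}")

# Same month as the request above: January 1910. "YOUR_KEY" is a placeholder.
example_url = build_archive_url(1910, 1, "YOUR_KEY")
print(example_url)
```

The resulting string can be passed to `requests.get` exactly like `url` above.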


In [3]:
json_data

{'copyright': 'Copyright (c) 2019 The New York Times Company. All Rights Reserved.',
 'response': {'meta': {'hits': 8140},
  'docs': [{'web_url': 'https://query.nytimes.com/gst/abstract.html?res=980DE3DA1730E233A25752C0A9679C946196D6CF',
    'snippet': '',
    'print_page': '6',
    'blog': [],
    'source': 'The New York Times',
    'multimedia': [],
    'headline': {'main': 'PHIL LEWIS LOSES CASE.; Brooklyn Club Not Required to Pay Salary Claimed.',
     'kicker': '1',
     'content_kicker': None,
     'print_headline': None,
     'name': None,
     'seo': None,
     'sub': None},
    'keywords': [],
    'pub_date': '1910-01-01T00:00:00Z',
    'document_type': 'article',
    'type_of_material': 'Article',
    '_id': '4fc04fdb45c1498b0d2420fc',
    'word_count': 144,
    'score': 0},
   {'web_url': 'https://query.nytimes.com/gst/abstract.html?res=9E02E3DA1730E233A25752C0A9679C946196D6CF',
    'snippet': 'The Governments of the Far East occupied the attention of the American Political 

In [4]:
response_text = r.text


In [5]:
data = json.loads(response_text)

In [6]:
print(data.keys())

dict_keys(['copyright', 'response'])


In [7]:
data['response'].keys()

dict_keys(['meta', 'docs'])

In [8]:
docs = data['response']['docs']

In [9]:
len(docs)

8140

#### 3. Keep only the fields needed for the analysis

In [10]:
def format_articles(unformatted_docs):
    '''
    This function takes in a list of documents returned by the NYT API
    and parses the documents into a list of dictionaries
    with 'headline', 'date', and 'snippet' keys.
    '''
    formatted = []
    for i in unformatted_docs:
        dic = {}
        dic['headline'] = i['headline']['main']
        dic['date'] = i['pub_date']  # full timestamp, e.g. '1910-01-01T00:00:00Z'
        dic['snippet'] = i['snippet']
        formatted.append(dic)
    return formatted

In [11]:
all_formatted = format_articles(docs)
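
`format_articles` indexes each record directly, so it would raise a `KeyError` if a record lacked one of the fields. The sketch below is one possible defensive variant, under the assumption that empty strings are an acceptable stand-in for missing values. The name `format_articles_safe` is hypothetical, and the second sample record is made up to show the fallback. It also trims the timestamp to the date only.

```python
def format_articles_safe(unformatted_docs):
    """Like format_articles, but tolerant of missing fields (hypothetical variant)."""
    formatted = []
    for doc in unformatted_docs:
        headline = doc.get('headline') or {}
        formatted.append({
            'headline': headline.get('main') or '',
            # Keep only YYYY-MM-DD from e.g. '1910-01-01T00:00:00Z'.
            'date': (doc.get('pub_date') or '')[:10],
            'snippet': doc.get('snippet') or '',
        })
    return formatted

sample_docs = [
    {'headline': {'main': 'PHIL LEWIS LOSES CASE.; Brooklyn Club Not Required to Pay Salary Claimed.'},
     'pub_date': '1910-01-01T00:00:00Z',
     'snippet': ''},
    {'pub_date': '1910-01-02T00:00:00Z'},  # made-up record with missing fields
]
sample_formatted = format_articles_safe(sample_docs)
print(sample_formatted)
```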

#### 4. Save the data into a CSV file

In [12]:
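keys = list(all_formatted[0].keys())  # column names: headline, date, snippet
# newline='' lets the csv module handle line endings itself
with open('january1910.csv', 'w', encoding='utf-8', newline='') as output_file:
    dict_writer = csv.DictWriter(output_file, fieldnames=keys)
    dict_writer.writeheader()
    dict_writer.writerows(all_formatted)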
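
To check that a file like this was written as expected, it can be read back with `csv.DictReader`. The sketch below is self-contained: it uses a small made-up list of rows and a temporary file (both are assumptions for illustration) rather than the real download.

```python
import csv
import os
import tempfile

# Made-up rows with the same three columns as all_formatted.
rows = [
    {'headline': 'Example headline, with a comma', 'date': '1910-01-01T00:00:00Z', 'snippet': ''},
    {'headline': 'Second example', 'date': '1910-01-02T00:00:00Z', 'snippet': 'Some text'},
]

path = os.path.join(tempfile.mkdtemp(), 'sample.csv')

# Write the rows, using the keys of the first row as column names.
with open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Read the file back; each row becomes a dictionary keyed by column name.
with open(path, encoding='utf-8', newline='') as f:
    read_back = list(csv.DictReader(f))

print(len(read_back), read_back[0]['headline'])
```

The same read-back step should work on the CSV written above by replacing `path` with its file name.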