# NYT API 

Notebook implementing task on Slide 23 of Day 12

Our goal: collect NYT articles that we can compare with our video descriptions. Videos have dates (`video_timestamp`), so we want the articles to be from the same date.

1. Write a Python function that takes a date, for example, "2024-02-12", and returns the list of articles for that day (extracting it from the month’s archive).
2. Write some code that explores whether the fields "abstract" and "snippet" are always the same or often differ. Which one has more information?
3. Write a function that given one article (in its nested structure), creates a flat dictionary with keys that are relevant for analysis: either the abstract or snippet (see point 2); lead paragraph; headline; keywords concatenated via semicolon; pub_date; document_type; section_name; and type_of_material
4. Write another function that calls the function from point 3 on every article, to create a list of article dictionaries, and convert this list into a dataframe and then store it as a CSV file with the date-month in the title (this is important for point 5 below).

This notebook will create the CSV to be used with Step 5:

5. Once you have done all of these in the notebook, create a Python script that can be called with a date (from a TikTok video). First, the script checks whether a CSV with cleaned articles is already in our folder. If not, it first calls the API function to get the articles and then the function that converts them into a CSV. Then it loads the CSV into a dataframe and filters it to get the articles for the desired date. These articles will be used for the Semantic Similarity portion of the TikTok Project.

Note: Point 5 is not included in this Notebook

**Table of Contents**
1. [Extract List of Articles](#1)
2. [Explore Abstract vs. Snippet](#2)
3. [Create Flat Dictionary](#3)     
4. [Create List of Article Dictionaries and CSV](#4)


### 1. Extract List of Articles <a class="anchor" id="1"></a>
Write a Python function that takes a date, for example, "2024-02-12", and returns the list of articles for that day (extracting it from the month’s archive).

In [3]:
import requests

API_key = 'CvRk9Qjp9rbVhKThEcRSAphBVJYU5SDT'            # change as necessary

In [30]:
def get_articles(date):
    year, month, _ = date.split('-')
    
    # remove leading 0 if month is a single digit
    if month[0] == '0':
        month = month[1:]
    
    url = f"https://api.nytimes.com/svc/archive/v1/{year}/{month}.json?api-key={API_key}"
    
    response = requests.get(url)

    if response.status_code != 200:     # only continue for successful requests
        print('Error making request to API')
        return []                       # empty list, so callers can still loop over the result
    else:
        # extract the articles from the response JSON corresponding to date
        articles = response.json()['response']['docs']
        article_list = [article for article in articles if article['pub_date'][:10] == date]
        num_articles = len(article_list)
        
        print(f"Successfully got articles for {date}!")
        print(f"Number of articles: {num_articles} \n")
        return article_list


In [15]:
# test get articles function
data = get_articles('2024-02-12')

Successfully got articles for 2024-02-12!
Number of articles: 116
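
The filter in `get_articles` compares the first ten characters of `pub_date` with the requested date. The same filter can be sketched with `datetime` parsing, which also rejects a malformed date string. This is a minimal sketch: `filter_by_date` is a hypothetical helper that is not part of the notebook, the sample records are made up, and the timestamp format is an assumption based on the `pub_date` values this notebook prints (for example `2024-02-12T00:30:32+0000`).

```python
from datetime import datetime

# Assumed timestamp format, e.g. 2024-02-12T00:30:32+0000
PUB_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S%z"

def filter_by_date(articles, date):
    """Return the articles whose pub_date falls on `date` (a 'YYYY-MM-DD' string)."""
    # strptime raises ValueError if `date` is not a valid YYYY-MM-DD string
    target = datetime.strptime(date, "%Y-%m-%d").date()
    matches = []
    for article in articles:
        published = datetime.strptime(article["pub_date"], PUB_DATE_FORMAT)
        # .date() uses the offset written in the timestamp itself (+0000 here)
        if published.date() == target:
            matches.append(article)
    return matches

# made-up sample records, for illustration only
sample = [
    {"pub_date": "2024-02-12T00:30:32+0000"},
    {"pub_date": "2024-02-11T23:59:59+0000"},
]
print(len(filter_by_date(sample, "2024-02-12")))
```

Because the comparison uses the date as written in the timestamp, an article published late in the evening in another time zone may be filed under the next day.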


### 2. Explore Abstract vs. Snippet <a class="anchor" id="2"></a>
Write some code that explores whether the fields "abstract" and "snippet" are always the same or often differ. Which one has more information?

In [17]:
# see what abstract and snippet look like for one article
test_article = data[0]
print("Testing for first Article")
print("Abstract:", test_article['abstract'])
print("Snippet:", test_article['snippet'])

Testing for first Article
Abstract: A Cetaphil commercial showed a father and daughter connecting over football and the music superstar. But a social media influencer said the idea was stolen from her.
Snippet: A Cetaphil commercial showed a father and daughter connecting over football and the music superstar. But a social media influencer said the idea was stolen from her.


In [37]:
def explore_abstracts_and_snippets(date):
    """ helper function, takes date and explores similarity and differences between abstract and snippet"""
    
    data = get_articles(date)
    
    print('Now exploring abstracts vs. snippets...')
    num_same = 0
    num_diff = 0

    for article in data:
        abstract = article['abstract']
        snippet = article['snippet']
        name = article['headline']['main']

        if abstract == snippet:
            num_same += 1
        else:
            num_diff +=1
            print(f"\n\"{name}\" has differing abstract and snippet:")
            print('Abstract:', abstract)
            if snippet == '':
                print('Article has no snippet.')
            else:
                print('Snippet:', snippet)

    if num_diff == 0:
        print('\nAll articles have identical abstracts and snippets')
    else:
        print('\nNumber of articles where abstract and snippet are the same:', num_same)
        print('Number of articles where abstract and snippet are different:', num_diff, '\n')

In [36]:
test_date_list = ['2024-02-12', '2024-01-12', '2023-12-12']

for date in test_date_list:
    print(f"----- Exploring articles from {date} -----")
    explore_abstracts_and_snippets(date)

----- Exploring articles from 2024-02-12 -----
Successfully got articles for 2024-02-12!
Number of articles: 116 

Now exploring abstracts vs. snippets...
"Nothing Says ‘Be Mine’ Like a Chocolate Chip Cookie the Size of Your Face" has differing abstract and snippet:
Abstract: Have you ever baked a giant chocolate chip cookie in a skillet? If you haven’t, now is the time because Samantha Seneviratne’s recipe is utterly magical. Like everything she puts forth, Samantha’s giant cookie is perfectly balanced, a harmony of brown sugar, walnuts and chocolate chips seasoned with just enough salt to tame the sweetness. And smushing one giant cookie into a skillet is vastly easier and faster than forming individual cookies. It’s the kind of after-work treat that makes Monday altogether worth it.
Article has no snippet.

Number of articles where abstract and snippet are the same: 115
Number of articles where abstract and snippet are different: 1 

----- Exploring articles from 2024-01-12 -----
Su

It seems that the snippet is either a truncated version of the abstract or missing entirely (the article has an abstract but no snippet). Since the abstract carries at least as much information, let's use the abstract for part 3.
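
To check this observation more systematically, each article can be assigned to one of a few categories and the categories counted. The following is a minimal sketch: `classify_pair` and `summarize` are hypothetical helpers, the sample records are made up, and the prefix test is only a rough stand-in for "truncated" (a snippet that ends in an ellipsis would land in `other`).

```python
from collections import Counter

def classify_pair(abstract, snippet):
    """Label how an article's snippet relates to its abstract."""
    if abstract == snippet:
        return "identical"
    if snippet == "":
        return "no_snippet"
    if abstract.startswith(snippet):
        return "snippet_is_prefix"      # snippet looks like a cut-off abstract
    return "other"

def summarize(articles):
    """Count the labels over a list of article dictionaries."""
    return Counter(classify_pair(a["abstract"], a["snippet"]) for a in articles)

# made-up sample records, for illustration only
sample = [
    {"abstract": "A short summary.", "snippet": "A short summary."},
    {"abstract": "A longer summary with detail.", "snippet": ""},
    {"abstract": "A longer summary with detail.", "snippet": "A longer summary"},
]
print(summarize(sample))
```

Calling `summarize(get_articles(date))` for several dates would show how often each case occurs before committing to one field.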

### 3. Create Flat Dictionary <a class="anchor" id="3"></a>

Write a function that given one article (in its nested structure), creates a flat dictionary with keys that are relevant for analysis: either the abstract or snippet (see point 2); lead paragraph; headline; keywords concatenated via semicolon; pub_date; document_type; section_name; and type_of_material

In [48]:
# explore lead_paragraph field
test_article['lead_paragraph']

'When an advertisement for Cetaphil lotion was released online days before the Super Bowl, it drew rave reviews for a narrative that evoked a familiar story for parents, football fans and followers of Taylor Swift.'

In [43]:
keywords = []
for keyword in test_article['keywords']:
    keywords.append(keyword['value'])
keywords_concat = ';'.join(keywords)
print(keywords_concat)

Advertising and Marketing;Super Bowl;Cosmetics and Toiletries;Swift, Taylor;Kelce, Travis;Social Media;TikTok (ByteDance)


In [56]:
def flat_dictionary(article):
    """
    Given an article, creates a flat dictionary with the abstract, lead paragraph, headline, keywords,
    pub_date, document_type, section_name, and type_of_material
    """
    flat = {}    # avoid naming the variable `dict`, which would shadow the built-in type

    flat['abstract'] = article['abstract']
    flat['lead_paragraph'] = article['lead_paragraph']
    flat['headline'] = article['headline']['main']
    flat['pub_date'] = article['pub_date']
    flat['document_type'] = article['document_type']
    flat['section_name'] = article['section_name']
    flat['type_of_material'] = article['type_of_material']

    # get keywords: join this article's own values, not the global `keywords` list
    keywords_list = []
    for keyword in article['keywords']:
        keywords_list.append(keyword['value'])
    keywords_concat = ';'.join(keywords_list)
    flat['keywords'] = keywords_concat

    return flat

In [57]:
# test the function
flat_dictionary(test_article)

{'abstract': 'A Cetaphil commercial showed a father and daughter connecting over football and the music superstar. But a social media influencer said the idea was stolen from her.',
 'lead_paragraph': 'When an advertisement for Cetaphil lotion was released online days before the Super Bowl, it drew rave reviews for a narrative that evoked a familiar story for parents, football fans and followers of Taylor Swift.',
 'headline': 'Ad Nods to Taylor Swift and Football, Drawing Cheers and Criticism',
 'pub_date': '2024-02-12T00:30:32+0000',
 'document_type': 'article',
 'section_name': 'Business Day',
 'type_of_material': 'News',
 'keywords': 'Advertising and Marketing;Super Bowl;Cosmetics and Toiletries;Swift, Taylor;Kelce, Travis;Social Media;TikTok (ByteDance)'}
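
Several keyword values contain commas (for example `Swift, Taylor`), so the semicolon separator is what allows the string to be split back into individual keywords later. A minimal sketch of that round trip follows; `join_keywords` and `split_keywords` are hypothetical helpers, and the sketch assumes no keyword value contains a semicolon itself.

```python
def join_keywords(article):
    """Concatenate the 'value' of every keyword entry with semicolons."""
    return ";".join(keyword["value"] for keyword in article["keywords"])

def split_keywords(keywords_concat):
    """Recover the list of keywords; an empty string yields an empty list."""
    return keywords_concat.split(";") if keywords_concat else []

# trimmed-down article with two of the keyword values shown above
sample_article = {"keywords": [{"value": "Super Bowl"}, {"value": "Swift, Taylor"}]}

joined = join_keywords(sample_article)
print(joined)
print(split_keywords(joined))
```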

### 4. Create List of Article Dictionaries and CSV <a class="anchor" id="4"></a>
Write another function that calls the function from point 3 on every article, to create a list of article dictionaries, and convert this list into a dataframe and then store it as a CSV file with the date-month in the title (this is important for point 5 below).

In [62]:
import pandas as pd

def articles_to_csv(date):
    """
    Given a date, outputs a csv with all relevant information for each article
    Calls helper functions get_articles and flat_dictionary
    """
    # get data using helper function
    data = get_articles(date)

    # collect all article dictionaries 
    article_data =[]
    for article in data:
        article_dict = flat_dictionary(article)
        article_data.append(article_dict)

    # create dataframe and write to csv
    df = pd.DataFrame(article_data)
    df.to_csv(f"NYT_articles_{date}.csv", index=False)   # index=False: skip the unnamed row-index column

In [63]:
# test function
articles_to_csv('2024-02-12')

Successfully got articles for 2024-02-12!
Number of articles: 116 
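
The file name written by `articles_to_csv` is what the script in step 5 has to look for. The lookup itself can be sketched without any API call. This is a minimal sketch under stated assumptions: `csv_path` and `ensure_csv` are hypothetical helpers, and the builder passed in is assumed to write its CSV into the same folder that is being checked.

```python
from pathlib import Path

def csv_path(date, folder="."):
    """Path of the CSV for `date`, following the NYT_articles_<date>.csv naming used above."""
    return Path(folder) / f"NYT_articles_{date}.csv"

def ensure_csv(date, build_csv, folder="."):
    """Return the CSV path for `date`, calling build_csv(date) only if the file is missing."""
    path = csv_path(date, folder)
    if not path.exists():
        build_csv(date)     # e.g. articles_to_csv, assumed to write into `folder`
    return path
```

In the script, `ensure_csv('2024-02-12', articles_to_csv)` would return a path that can be passed to `pd.read_csv`; running it a second time for the same date would reuse the existing file.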

