# New York Times Books API

This notebook

* Uses the New York Times Books API to download the 
  Combined Print & E-Book Fiction best seller list for a given week. 
* Creates visualizations that show how many weeks each
  of the books has been on the list.

You need to 
**[sign up](https://developer.nytimes.com/get-started) for a NYT developer 
account**, register a new app, and copy the API key for your app into a
script called `config.py` in the same folder as this notebook. Save the
key as a string with the variable name `api_key`. 
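
As a minimal sketch, `config.py` only needs one assignment. The value shown here is a placeholder, not a real key; substitute the key from your own NYT developer app.

```python
# config.py
# Placeholder value (an assumption for illustration); paste your own key here.
api_key = "YOUR_NYT_API_KEY"
```

Keeping the key in its own file makes it easy to exclude from version control, so it is not shared along with the notebook.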

You'll probably also find the 
[NYT API documentation](https://developer.nytimes.com/apis) useful.
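
The cells below build a request URL and flatten the `results` → `books` part of the response into a table. The following offline sketch shows that shape without calling the API. The endpoint, date, list name, and field names are taken from this notebook; the key is a placeholder and every book record is invented sample data, not real best seller data.

```python
from urllib.parse import urlencode

import pandas as pd

# Values from the notebook's request cell; the key is a placeholder.
date = "2019-01-20"
list_name = "combined-print-and-e-book-fiction"
api_key = "YOUR_NYT_API_KEY"

# Build the URL the notebook requests (nothing is sent here).
base = f"https://api.nytimes.com/svc/books/v3/lists/{date}/{list_name}.json"
url = f"{base}?{urlencode({'api-key': api_key})}"
print(url)

# Hand-made stand-in for response.json(); every value below is invented.
sample = {
    "results": {
        "books": [
            {"title": "EXAMPLE TITLE A", "author": "Author A",
             "rank": 1, "rank_last_week": 2, "weeks_on_list": 5},
            {"title": "EXAMPLE TITLE B", "author": "Author B",
             "rank": 2, "rank_last_week": 1, "weeks_on_list": 12},
            {"title": "EXAMPLE TITLE C", "author": "Author C",
             "rank": 3, "rank_last_week": 0, "weeks_on_list": 1},
        ]
    }
}

# One row per book, then weeks on the list per title, longest first.
books = pd.json_normalize(sample["results"], record_path=["books"])
weeks = books.set_index("title")["weeks_on_list"].sort_values(ascending=False)
print(weeks)
```

With a real response in place of `sample`, the same two lines of pandas would produce the series that the bar charts later in the notebook draw from.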

In [1]:
import requests
import pandas as pd

try:
    from config import api_key
except ModuleNotFoundError:
    api_key = None
    print("config.py not found; set api_key to run the NYT request.")

config.py not found; set api_key to run the NYT request.




In [2]:
# Note that the Combined Print & E-Book Fiction list is published
# just once a week, but the API accepts any date, and returns the
# most recently published list.

date = "2019-01-20"
list_name = "combined-print-and-e-book-fiction"  # avoid shadowing the built-in `list`

In [3]:
if not api_key:
    data = {"results": {"books": []}}
    print("Skipping NYT request; api_key is missing.")
else:
    url = f"https://api.nytimes.com/svc/books/v3/lists/{date}/{list_name}.json"
    # Pass the key as a query parameter and fail loudly on HTTP errors.
    response = requests.get(url, params={"api-key": api_key}, timeout=30)
    response.raise_for_status()
    data = response.json()
    print(data)

Skipping NYT request; api_key is missing.


In [4]:
dataframe = pd.json_normalize(data['results'], record_path=['books'])
dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 0 entries
Empty DataFrame


In [5]:
columns = ["title", "author", "publisher", "description", "rank", "rank_last_week", "weeks_on_list"]
available = [c for c in columns if c in dataframe.columns]
if not available:
    print("Expected columns not found; using full dataframe.")
    df = dataframe.copy()
else:
    df = dataframe[available]
    if "title" in available:
        df = df.set_index("title")
df


Expected columns not found; using full dataframe.


In [6]:
if df.shape[1] == 0:
    print("No columns available for describe().")
else:
    # A bare expression inside an if/else block is not displayed, so print it.
    print(df.describe())


No columns available for describe().


In [7]:
numeric_df = df.select_dtypes(include="number")
if numeric_df.shape[1] == 0:
    print("No numeric columns available for correlation.")
else:
    # A bare expression inside an if/else block is not displayed, so print it.
    print(numeric_df.corr())


No numeric columns available for correlation.


In [8]:
import matplotlib.pyplot as plt
import matplotlib.style as style
style.use('seaborn-v0_8-pastel')
plt.rcParams.update({'font.size': 20, 'figure.figsize': (12, 8)})

In [9]:
required = ["rank", "weeks_on_list"]
missing = [c for c in required if c not in df.columns]
if missing:
    print(f"Skipping plot; missing columns: {missing}")
else:
    df.plot(kind="bar", x="rank", y="weeks_on_list", title="Longest Bestsellers by Current Ranking")


Skipping plot; missing columns: ['rank', 'weeks_on_list']


In [10]:
if "weeks_on_list" not in df.columns:
    print("Skipping plot; missing column: weeks_on_list")
else:
    df.plot(kind="barh", y="weeks_on_list", title="Longest Bestsellers by Title in Order of Ranking")


Skipping plot; missing column: weeks_on_list


In [11]:
required = ["author", "weeks_on_list"]
missing = [c for c in required if c not in df.columns]
if missing:
    print(f"Skipping plot; missing columns: {missing}")
else:
    df.plot(kind="barh", x="author", y="weeks_on_list", title="Longest Bestsellers by Author in Order of Ranking")


Skipping plot; missing columns: ['author', 'weeks_on_list']


In [12]:
required = ["publisher", "weeks_on_list"]
missing = [c for c in required if c not in df.columns]
if missing:
    print(f"Skipping plot; missing columns: {missing}")
else:
    df.groupby("publisher").weeks_on_list.sum().plot(kind="barh", title="Longest Bestsellers by Publisher")


Skipping plot; missing columns: ['publisher', 'weeks_on_list']


In [13]:
required = ["rank", "rank_last_week", "weeks_on_list"]
missing = [c for c in required if c not in df.columns]
if missing:
    print(f"Skipping plot; missing columns: {missing}")
else:
    df.plot(kind="barh", y=["rank", "rank_last_week", "weeks_on_list"], title="Weeks on List, Rank, and Last Week Rank by Title")


Skipping plot; missing columns: ['rank', 'rank_last_week', 'weeks_on_list']


In [14]:
numeric_df = df.select_dtypes(include="number")
if numeric_df.shape[1] == 0:
    print("Skipping plot; no numeric columns.")
else:
    # Reuse the Axes returned by plot(); plt.axes() would add a new, empty Axes.
    ax = numeric_df.plot(title="Correlation of Rank to Last Week's Rank and Weeks on List")
    ax.get_xaxis().set_visible(False)


Skipping plot; no numeric columns.


In [15]:
import statsmodels.api as sm

required = ["rank", "rank_last_week", "weeks_on_list"]
missing = [c for c in required if c not in df.columns]
if missing:
    print(f"Skipping regression; missing columns: {missing}")
    results = None
else:
    model_df = df[required].apply(pd.to_numeric, errors="coerce").dropna()
    if model_df.empty:
        print("Skipping regression; no numeric rows available.")
        results = None
    else:
        X = sm.add_constant(model_df[["rank", "rank_last_week"]])
        y = model_df["weeks_on_list"]
        results = sm.OLS(y, X).fit()
        # A bare expression inside an if/else block is not displayed, so print it.
        print(results.summary())

Skipping regression; missing columns: ['rank', 'rank_last_week', 'weeks_on_list']


In [16]:
if results is None:
    print("Skipping partial regression plots; regression model not available.")
else:
    fig = plt.figure(figsize=(12,8))
    fig = sm.graphics.plot_partregress_grid(results, fig=fig)

Skipping partial regression plots; regression model not available.
