# 20th Century Books and Their Movie Adaptations
Exploration of the popularity of titles around the time of their publication and movie adaptation.

By Kenneth Yang

# Introduction
## What are we looking for?
- 📖 How did a title's popularity trend when it was published?
- 🎥 How did a title's popularity trend when it was adapted into a movie?
- 📈 Did the movie adaptation leave a net increase in popularity of the title?

## What is Google Ngram Viewer and why are we using it?
- 🔎 [Google Books Ngram Viewer](https://books.google.com/ngrams/)
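- 📰 In the 20th century, most public discussion of a title happened in print (books, newspapers, and magazines), so how often it appears in the Google Books corpus works as a rough proxy for its popularity.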
- 📊 Makes pretty graphs for us to look at.

# Data Collection
- Define books, their publication dates, and their movie adaptations.
- Pull the data from Google Ngram Viewer.

## Publication dates of books and their movie adaptations

In [None]:
from collections import namedtuple

# Define a type to hold the publication dates of a book and its movie adaptation.
Published = namedtuple("Published", ["book", "movie"])

In [None]:
BOOKS = {
    "Gone with the Wind": Published(1936, 1939),
    "The Grapes of Wrath": Published(1939, 1940),
    "The Maltese Falcon": Published(1930, 1941),
    "To Kill a Mockingbird": Published(1960, 1962),
    "One Flew Over the Cuckoo's Nest": Published(1962, 1975),
    "The Godfather": Published(1969, 1972),
    "A Clockwork Orange": Published(1962, 1971),
}

## Pull data from Google Ngram Viewer

In [None]:
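from urllib.parse import quote_plus
from requests import get
def json_url(title: str, start=1900, end=2000):
    # Encode the title first; nested double quotes in an f-string fail before Python 3.12.
    content = quote_plus(title)
    return f"https://books.google.com/ngrams/json?content={content}&year_start={start}&year_end={end}&corpus=en&smoothing=0&case_insensitive=true"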
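
The title has to be URL-encoded before it goes into the query string; "One Flew Over the Cuckoo's Nest", for example, contains an apostrophe. As a minimal sketch, the same URL could also be assembled with the standard library's `urlencode`, which escapes spaces and punctuation. The helper name `build_json_url` is hypothetical and not part of this notebook; its parameters mirror the ones used in `json_url`.

```python
from urllib.parse import urlencode

NGRAM_JSON = "https://books.google.com/ngrams/json"

def build_json_url(title: str, start: int = 1900, end: int = 2000) -> str:
    """Hypothetical variant of json_url that lets urlencode do the escaping."""
    params = {
        "content": title,
        "year_start": start,
        "year_end": end,
        "corpus": "en",
        "smoothing": 0,
        "case_insensitive": "true",
    }
    return f"{NGRAM_JSON}?{urlencode(params)}"

print(build_json_url("One Flew Over the Cuckoo's Nest"))
```

Spaces become `+` and the apostrophe becomes `%27`, so the title cannot break the query string.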

In [None]:
responses = {}
for book in BOOKS:
    # Pull from Google Ngram Viewer.
    response = get(json_url(book))
    
    # Check for errors.
    response.raise_for_status()
    
    # Only keep the first timeseries data (the case-insensitive one).
    responses[book] = response.json()[0]["timeseries"]
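
For reference, the indexing above assumes the endpoint returns a JSON list with one entry per matched ngram, each holding a `timeseries` list with one value per year. The endpoint is not officially documented, so the sample payload below is an assumed shape with made-up values, and `first_timeseries` is a hypothetical helper that fails with a clear message when a title returns no matches.

```python
def first_timeseries(payload: list) -> list:
    """Return the timeseries of the first entry, or raise if nothing matched."""
    if not payload:
        raise ValueError("Ngram Viewer returned no data for this title")
    return payload[0]["timeseries"]

# Assumed response shape; the numbers are placeholders, not real Ngram data.
sample_payload = [
    {"ngram": "Example Title (All)", "timeseries": [0.0, 1e-9, 3e-9]},
    {"ngram": "Example Title", "timeseries": [0.0, 1e-9, 2e-9]},
]

print(first_timeseries(sample_payload))
```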

## Compute the popularity trends
The trend is the year-to-year numerical derivative of each title's Ngram frequency, computed as the difference between consecutive years.

In [None]:
from numpy import diff
trends = {}
for book, data in responses.items():
    trends[book] = diff(data)
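
Because `diff` returns one fewer value than its input, element `i` of a trend is the change from year `start + i` to year `start + i + 1`. A small sketch of looking up the change into a given year, using a made-up series and a hypothetical helper `change_into` (neither is part of this notebook):

```python
import numpy as np

START_YEAR = 1900  # matches the default start used in json_url

def change_into(trend, year: int, start: int = START_YEAR) -> float:
    """Change in frequency from year - 1 to year, read from a diff array."""
    index = year - start - 1
    if not 0 <= index < len(trend):
        raise IndexError(f"no trend value for {year}")
    return float(trend[index])

# Made-up frequencies for 1900..1904, not real Ngram data.
series = np.array([1.0, 1.5, 4.0, 3.0, 3.5])
trend = np.diff(series)  # four differences for five yearly values

print(change_into(trend, 1902))  # 2.5: the rise from 1901 to 1902
```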

# Display the Data

In [None]:
from IPython.display import IFrame

def graph_url(title: str, start=1900, end=2000):
    return json_url(title, start, end).replace("json", "interactive_chart", 1)

In [None]:
print(BOOKS["Gone with the Wind"])
IFrame(graph_url("Gone with the Wind"), width=900, height=500) 

In [None]:
IFrame(graph_url("The Grapes of Wrath"), width=900, height=500)

In [None]:
IFrame(graph_url("The Maltese Falcon"), width=900, height=500)

In [None]:
IFrame(graph_url("To Kill a Mockingbird"), width=900, height=500)

In [None]:
IFrame(graph_url("One Flew Over the Cuckoo's Nest"), width=900, height=500)

In [None]:
IFrame(graph_url("The Godfather"), width=900, height=500)

In [None]:
IFrame(graph_url("A Clockwork Orange"), width=900, height=500)
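
To put a number on the third question from the introduction, one option is to compare the average frequency in the few years before the movie with the average in the few years after it. The sketch below is one possible measure, not this notebook's method; `net_change`, the five-year window, and the sample series are all assumptions made for illustration.

```python
import numpy as np

def net_change(series, movie_year: int, start: int = 1900, window: int = 5) -> float:
    """Mean frequency in the `window` years after the movie minus the mean before it."""
    values = np.asarray(series, dtype=float)
    i = movie_year - start  # index of the movie's release year
    if i - window < 0 or i + 1 + window > len(values):
        raise ValueError("window extends beyond the series")
    before = values[i - window:i]          # the years leading up to the release
    after = values[i + 1:i + 1 + window]   # the years following the release
    return float(after.mean() - before.mean())

# Made-up series for 1900..1910 with a release in 1905, not real Ngram data.
sample = [1, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3]
print(net_change(sample, movie_year=1905))  # 2.0
```

With the data collected earlier, this could be called as `net_change(responses[book], BOOKS[book].movie)` for each book; a positive result would suggest the title was mentioned more often after the adaptation than before it.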