
# PEN America Banned Books -- pulling descriptions and publication dates
10/30/23

## Setup

In [None]:
# this performs line wrapping on output text in Colab

from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

In [None]:
# mount drive
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/My Drive/Data Science/Springboard assignments/Capstone Three/Banned Books')

Mounted at /content/drive


In [None]:
import numpy as np
import pandas as pd

## Load Pen America Banned Books

In [None]:
banned_df = pd.read_csv('PEN America Banned Books.csv', header=None)
columns = ['author', 'title']
banned_df.columns = columns
banned_df

Unnamed: 0,author,title
0,"Àbíké-Íyímídé, Faridah",Ace of Spades
1,"Acevedo, Elizabeth",Clap When You Land
2,"Acevedo, Elizabeth",The Poet X
3,"Acevedo, Elizabeth",The Poet X
4,"Acevedo, Elizabeth",The Poet X
...,...,...
2527,"Zia, Farhana",The Garden of My Imaan
2528,"Ziemke, Kristin",Read the World: Rethinking Literacy for Empath...
2529,"Zoboi, Ibi",American Street
2530,"Zoboi, Ibi",Black Enough: Stories of Being Young & Black i...


In [None]:
banned_df.drop_duplicates(inplace=True)
banned_df

Unnamed: 0,author,title
0,"Àbíké-Íyímídé, Faridah",Ace of Spades
1,"Acevedo, Elizabeth",Clap When You Land
2,"Acevedo, Elizabeth",The Poet X
6,"Aciman, André",Call Me By Your Name (Call Me By Your Name Ser...
7,"Acito, Marc","How I Paid for College: A Novel of Sex, Theft,..."
...,...,...
2527,"Zia, Farhana",The Garden of My Imaan
2528,"Ziemke, Kristin",Read the World: Rethinking Literacy for Empath...
2529,"Zoboi, Ibi",American Street
2530,"Zoboi, Ibi",Black Enough: Stories of Being Young & Black i...


In [None]:
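# remove leading and trailing whitespace from every string cell
# (applymap is deprecated in pandas >= 2.1 in favor of DataFrame.map)
banned_df = banned_df.applymap(lambda x: x.strip() if isinstance(x, str) else x)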

Let's clean up the dataframe. Some of the titles include subtitles and descriptions, which the API search won't match reliably. I will remove these subtitles/descriptions by dropping everything from the first colon, opening parenthesis, or opening bracket onward.

In [None]:
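# drop everything from the first colon, '(' or '[' onward, then trim the
# space that was left in front of it (e.g. 'Call Me By Your Name ')
banned_df['title'] = banned_df['title'].str.replace(r'[:\(\[].*$', '', regex=True).str.strip()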
banned_df

Unnamed: 0,author,title
0,"Àbíké-Íyímídé, Faridah",Ace of Spades
1,"Acevedo, Elizabeth",Clap When You Land
2,"Acevedo, Elizabeth",The Poet X
6,"Aciman, André",Call Me By Your Name
7,"Acito, Marc",How I Paid for College
...,...,...
2527,"Zia, Farhana",The Garden of My Imaan
2528,"Ziemke, Kristin",Read the World
2529,"Zoboi, Ibi",American Street
2530,"Zoboi, Ibi",Black Enough
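A quick way to sanity-check the pattern outside pandas is to run the same regex through Python's `re` module (`Series.str.replace` with `regex=True` applies essentially the same substitution). The sample titles below are hypothetical, shaped like the raw rows. Two things to notice: without `.strip()` the space before a parenthesis or bracket survives, and any title whose main part legitimately contains a colon would be cut short too.

```python
import re

# Same pattern as the pandas cell: drop everything from the first
# colon, opening parenthesis, or opening bracket to the end of the string.
SUBTITLE_PATTERN = re.compile(r'[:\(\[].*$')


def clean_title(raw_title: str) -> str:
    """Remove a subtitle/series note and any whitespace left behind."""
    return SUBTITLE_PATTERN.sub('', raw_title).strip()


# Hypothetical inputs shaped like the raw PEN America titles
samples = [
    'Call Me By Your Name (Example Series, Book 1)',
    'How I Paid for College: An Example Subtitle',
    'Read the World [example edition note]',
    'American Street',
]
cleaned = [clean_title(t) for t in samples]
print(cleaned)
# ['Call Me By Your Name', 'How I Paid for College', 'Read the World', 'American Street']
```

Cleaning can also turn two previously distinct rows into identical ones, which is why `drop_duplicates` is run a second time.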


In [None]:
banned_df.drop_duplicates(inplace=True)
banned_df

## Let's figure out how to query the Google Books API



In [None]:
import requests

# Google Books API key (redacted here; supply your own before running)
API_KEY = 'YOUR_API_KEY'

In [None]:
def get_book_json(title, author):
    base_url = 'https://www.googleapis.com/books/v1/volumes'
    params = {
        'q': f'intitle:{title}+inauthor:{author}',
        'key': API_KEY
    }

    response = requests.get(base_url, params=params)
    data = response.json()
    print(data)
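One detail of the query string worth checking: as far as I can tell, `requests` encodes `params` with the standard library's `urllib.parse.urlencode`, so a literal `+` typed into `q` is sent as `%2B` rather than as a space. The sketch below reproduces that encoding offline with a hypothetical `build_query` helper. The test query in this section still returned a result, so the API appears to tolerate the `%2B`, but a plain space is what the `+` in Google's own example URLs decodes to.

```python
from urllib.parse import urlencode


def build_query(title: str, author: str, sep: str = ' ') -> str:
    """Hypothetical helper: build a Google Books `q` string."""
    return f'intitle:{title}{sep}inauthor:{author}'


# '+' as written in the notebook vs. a plain space
original = urlencode({'q': build_query('The Poet X', 'Acevedo, Elizabeth', sep='+')})
spaced = urlencode({'q': build_query('The Poet X', 'Acevedo, Elizabeth')})

print(original)  # q=intitle%3AThe+Poet+X%2Binauthor%3AAcevedo%2C+Elizabeth
print(spaced)    # q=intitle%3AThe+Poet+X+inauthor%3AAcevedo%2C+Elizabeth
```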

In [None]:
# title = "All Boys Aren't Blue"
# author = "Johnson, George M."
# get_book_json(title, author)

In [None]:
def get_book_details(title, author):
    base_url = 'https://www.googleapis.com/books/v1/volumes'
    params = {
        'q': f'intitle:{title}+inauthor:{author}',
        'key': API_KEY
    }

    response = requests.get(base_url, params=params)

    if response.status_code == 200:
        data = response.json()
        if 'items' in data:
            book = data['items'][0]['volumeInfo']
            title = book.get('title', 'N/A')
            authors = ', '.join(book.get('authors', ['N/A']))
            description = book.get('description', 'N/A')
            published_date = book.get('publishedDate', 'N/A')
            print(f'Title: {title}')
            print(f'Authors: {authors}')
            print(f'Description: {description}')
            print(f'Published Date: {published_date}')
        else:
            print('Book not found.')
    else:
        print('Error fetching data from Google Books API.')
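The parsing half of `get_book_details` can be pulled out into a pure function and exercised without network access or an API key. `extract_volume_info` and the hand-written `fake_response` are hypothetical illustrations; the field names (`items`, `volumeInfo`, `title`, `authors`, `description`, `publishedDate`) are the ones the function above reads. Note that a match with no `description` field yields the string `'N/A'`, not a missing value.

```python
from typing import Optional


def extract_volume_info(data: dict) -> Optional[dict]:
    """Return title, authors, description and publishedDate from the first
    result of a Google Books volumes search response, or None if empty."""
    items = data.get('items')
    if not items:
        return None
    book = items[0].get('volumeInfo', {})
    return {
        'title': book.get('title', 'N/A'),
        'authors': ', '.join(book.get('authors', ['N/A'])),
        'description': book.get('description', 'N/A'),
        'published_date': book.get('publishedDate', 'N/A'),
    }


# Hand-written, abridged response (no description field on purpose)
fake_response = {
    'totalItems': 1,
    'items': [{'volumeInfo': {
        'title': "All Boys Aren't Blue",
        'authors': ['George M. Johnson'],
        'publishedDate': '2020-04-28',
    }}],
}
print(extract_volume_info(fake_response))
print(extract_volume_info({'totalItems': 0}))  # None
```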

In [None]:
# title = "All Boys Aren't Blue"
# author = "Johnson, George M."
# get_book_details(title, author)

Title: All Boys Aren't Blue
Authors: George M. Johnson
Description: *An Amazon Best Book of the Year optioned for television by Gabrielle Union!* In a series of personal essays, prominent journalist and LGBTQIA+ activist George M. Johnson explores his childhood, adolescence, and college years in New Jersey and Virginia. From the memories of getting his teeth kicked out by bullies at age five, to flea marketing with his loving grandmother, to his first sexual relationships, this young-adult memoir weaves together the trials and triumphs faced by Black queer boys. Both a primer for teens eager to be allies as well as a reassuring testimony for young queer men of color, All Boys Aren't Blue covers topics such as gender identity, toxic masculinity, brotherhood, family, structural marginalization, consent, and Black joy. Johnson's emotionally frank style of writing will appeal directly to young adults.
Published Date: 2020-04-28


## Let's pull descriptions and publishing dates from the Google Books API

In [None]:
# add empty columns to fill from the API
# (np.nan instead of the NaN alias, which was removed in NumPy 2.0)
banned_df['description'] = np.nan
banned_df['published_date'] = np.nan
banned_df

Unnamed: 0,author,title,description,published_date
0,"Àbíké-Íyímídé, Faridah",Ace of Spades,,
1,"Acevedo, Elizabeth",Clap When You Land,,
2,"Acevedo, Elizabeth",The Poet X,,
6,"Aciman, André",Call Me By Your Name,,
7,"Acito, Marc",How I Paid for College,,
...,...,...,...,...
2527,"Zia, Farhana",The Garden of My Imaan,,
2528,"Ziemke, Kristin",Read the World,,
2529,"Zoboi, Ibi",American Street,,
2530,"Zoboi, Ibi",Black Enough,,


In [None]:
def paste_book_details(title, author):
    """Query Google Books for one title/author and write the first result's
    description and publication date into the matching banned_df row(s)."""
    base_url = 'https://www.googleapis.com/books/v1/volumes'
    params = {
        'q': f'intitle:{title}+inauthor:{author}',
        'key': API_KEY
    }

    response = requests.get(base_url, params=params)

    if response.status_code == 200:
        data = response.json()
        if 'items' in data:
            book = data['items'][0]['volumeInfo']
            description = book.get('description', 'N/A')
            published_date = book.get('publishedDate', 'N/A')
            # match on the title/author we queried with, not the title Google
            # returns: the returned title can differ (capitalization, subtitle),
            # and overwriting `title` with it would then silently match no rows
            mask = (banned_df['title'] == title) & (banned_df['author'] == author)
            banned_df.loc[mask, 'description'] = description
            banned_df.loc[mask, 'published_date'] = published_date
        else:
            print(f'Title: {title} - Book not found.')
    else:
        print(f'Title: {title} - Error fetching data from Google Books API '
              f'(HTTP {response.status_code}).')

In [None]:
# title = 'Clap When You Land'
# author = 'Acevedo, Elizabeth'
# paste_book_details(title, author)

In [None]:
# banned_df.loc[banned_df['title']==title]
# it works!

Unnamed: 0,author,title,description,published_date
1,"Acevedo, Elizabeth",Clap When You Land,In a novel-in-verse that brims with grief and ...,2020-05-05
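Matching rows on the title the API *returns* only updates `banned_df` when that string is identical to the stored title. A difference in capitalization or an added subtitle silently matches nothing, which may explain some of the empty descriptions. The toy frame and the `returned_title` spelling below are hypothetical; the point is the comparison, which is safer on the queried title and author.

```python
import pandas as pd

# Toy frame standing in for banned_df
df = pd.DataFrame({
    'author': ['Aciman, André', 'Acito, Marc'],
    'title': ['Call Me By Your Name', 'How I Paid for College'],
    'description': [None, None],
})

queried_title, queried_author = 'Call Me By Your Name', 'Aciman, André'
returned_title = 'Call Me by Your Name'  # hypothetical API spelling

# Matching on the API's title finds nothing when capitalization differs
print((df['title'] == returned_title).sum())  # 0

# Matching on the values we queried with finds the intended row
mask = (df['title'] == queried_title) & (df['author'] == queried_author)
df.loc[mask, 'description'] = 'example description'
print(df['description'].notna().sum())  # 1
```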


In [None]:
banned_df.iloc[0]['author']

'Àbíké-Íyímídé, Faridah'

The Google Books API has quota limits: I'm limited to 1,000 queries per day and 100 queries per minute. So I will run the queries in batches of 50 and pause for a minute or two between batches.

In [None]:
# df_len = len(banned_df)

# import time
# batch_size = 50

# for i in range(0, df_len, batch_size):
#     for j in range(i, min(i + batch_size, df_len)):
#         title = banned_df.iloc[j]['title']
#         author = banned_df.iloc[j]['author']
#         paste_book_details(title, author)
#     time.sleep(60)
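The batching logic can also be written as a small reusable helper, which makes the pause easy to test with a fake `sleep`. `run_in_batches` is a hypothetical name, not part of any library; unlike the loops in this notebook, it skips the pause after the final batch. On the daily side, with more than 1,000 titles, the 1,000-queries-per-day quota alone means the full pull needs at least two days.

```python
import time
from typing import Callable, Sequence


def run_in_batches(items: Sequence, batch_size: int, pause_s: float,
                   fn: Callable, sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fn(item) for every item, sleeping pause_s seconds between
    consecutive batches (not after the last one). Returns the call count."""
    calls = 0
    for start in range(0, len(items), batch_size):
        if start > 0:
            sleep(pause_s)  # pause before every batch except the first
        for item in items[start:start + batch_size]:
            fn(item)
            calls += 1
    return calls


# Offline demo: record pauses instead of actually sleeping
pauses = []
seen = []
n = run_in_batches(list(range(120)), 50, 120, seen.append, sleep=pauses.append)
print(n, pauses)  # 120 [120, 120]
```

With the real data, a call could look like `run_in_batches(list(zip(banned_df['title'], banned_df['author'])), 50, 60, lambda ta: paste_book_details(*ta))` (untested against the live API).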

In [None]:
banned_df.head(50)

Unnamed: 0,author,title,description,published_date
0,"Àbíké-Íyímídé, Faridah",Ace of Spades,"Gossip Girl meets Get Out in Ace of Spades, a ...",2021-06-01
1,"Acevedo, Elizabeth",Clap When You Land,In a novel-in-verse that brims with grief and ...,2020-05-05
2,"Acevedo, Elizabeth",The Poet X,A National Book Award Longlist title! Fans of ...,2019-03-19
6,"Aciman, André",Call Me By Your Name,,
7,"Acito, Marc",How I Paid for College,,
8,"Ada, Alma Flor",My Name Is María Isabel,,
9,"Addasi, Maha",Time to Pray,,
10,"Adeyemi, Tomi",Children of Blood and Bone,,
11,"Adeyoha, Koja","47,000 Beads",When Peyton doesn't want to wear a dress or da...,2017
12,"Adichie, Chimamanda Ngozi",Half of a Yellow Sun,"From the award-winning, bestselling author of ...",2007-09-04


In [None]:
banned_df['description']

0       Gossip Girl meets Get Out in Ace of Spades, a ...
1       In a novel-in-verse that brims with grief and ...
2       A National Book Award Longlist title! Fans of ...
6                                                     NaN
7                                                     NaN
                              ...                        
2527                                                  NaN
2528                                                  NaN
2529                                                  NaN
2530                                                  NaN
2531                                                  NaN
Name: description, Length: 1656, dtype: object

In [None]:
banned_df.to_csv('banned_book_descriptions.csv', index=False)

In [None]:
banned_df = pd.read_csv('banned_book_descriptions.csv')
banned_df

Unnamed: 0,author,title,description,published_date
0,"Àbíké-Íyímídé, Faridah",Ace of Spades,"Gossip Girl meets Get Out in Ace of Spades, a ...",2021-06-01
1,"Acevedo, Elizabeth",Clap When You Land,In a novel-in-verse that brims with grief and ...,2020-05-05
2,"Acevedo, Elizabeth",The Poet X,A National Book Award Longlist title! Fans of ...,2019-03-19
3,"Aciman, André",Call Me By Your Name,,
4,"Acito, Marc",How I Paid for College,,
...,...,...,...,...
1651,"Zia, Farhana",The Garden of My Imaan,,
1652,"Ziemke, Kristin",Read the World,,
1653,"Zoboi, Ibi",American Street,,
1654,"Zoboi, Ibi",Black Enough,,
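Reloading the CSV changes two things. First, `to_csv(..., index=False)` does not write the old row labels, so `read_csv` assigns a fresh 0-based index; that is why André Aciman was row 6 before saving and is row 3 above. Second, `read_csv` treats the string `'N/A'` as missing by default (`keep_default_na=True`), so any `'N/A'` placeholders written by `paste_book_details` come back as NaN. A minimal round trip on a hypothetical two-row frame:

```python
import io

import pandas as pd

# A frame whose index has gaps, like banned_df after drop_duplicates
df = pd.DataFrame(
    {'title': ['The Poet X', 'Call Me By Your Name'],
     'description': ['A novel in verse', 'N/A']},
    index=[2, 6],
)

buf = io.StringIO()
df.to_csv(buf, index=False)  # row labels 2 and 6 are not written
buf.seek(0)
reloaded = pd.read_csv(buf)  # 'N/A' is parsed as NaN by default

print(list(reloaded.index))                   # [0, 1]
print(reloaded['description'].isna().tolist())  # [False, True]
```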


In [None]:
banned_df.isna().sum()

author               0
title                0
description       1138
published_date    1132
dtype: int64

In [None]:
# we have about 500 descriptions; let's start playing around with this

We've reached our daily API limit; we can make more requests another day.

## Iterate more requests

Now that it's another day, we can make more requests.

In [None]:
banned_df = pd.read_csv('banned_book_descriptions.csv')
banned_df.isna().sum()

author               0
title                0
description       1138
published_date    1132
dtype: int64

In [None]:
banned_df

Unnamed: 0,author,title,description,published_date
0,"Àbíké-Íyímídé, Faridah",Ace of Spades,"Gossip Girl meets Get Out in Ace of Spades, a ...",2021-06-01
1,"Acevedo, Elizabeth",Clap When You Land,In a novel-in-verse that brims with grief and ...,2020-05-05
2,"Acevedo, Elizabeth",The Poet X,A National Book Award Longlist title! Fans of ...,2019-03-19
3,"Aciman, André",Call Me By Your Name,,
4,"Acito, Marc",How I Paid for College,A deliciously funny romp of a novel about one ...,2005-08-02
...,...,...,...,...
1651,"Zia, Farhana",The Garden of My Imaan,,
1652,"Ziemke, Kristin",Read the World,,
1653,"Zoboi, Ibi",American Street,,
1654,"Zoboi, Ibi",Black Enough,,


In [None]:
import time

df_len = len(banned_df)
batch_size = 50

for i in range(0, df_len, batch_size):
    for j in range(i, min(i + batch_size, df_len)):
        # only query rows that don't have a description yet
        # (pd.isna handles both NaN and strings, so no try/except is needed)
        if pd.isna(banned_df.iloc[j]['description']):
            title = banned_df.iloc[j]['title']
            author = banned_df.iloc[j]['author']
            paste_book_details(title, author)
    # pause between batches to stay under the per-minute quota
    time.sleep(120)

In [None]:
banned_df.isna().sum()

author              0
title               0
description       620
published_date    620
dtype: int64

In [None]:
banned_df.to_csv('banned_book_descriptions.csv', index=False)

In [None]:
banned_df

Unnamed: 0,author,title,description,published_date
0,"Àbíké-Íyímídé, Faridah",Ace of Spades,"Gossip Girl meets Get Out in Ace of Spades, a ...",2021-06-01
1,"Acevedo, Elizabeth",Clap When You Land,In a novel-in-verse that brims with grief and ...,2020-05-05
2,"Acevedo, Elizabeth",The Poet X,A National Book Award Longlist title! Fans of ...,2019-03-19
3,"Aciman, André",Call Me By Your Name,,
4,"Acito, Marc",How I Paid for College,A deliciously funny romp of a novel about one ...,2005-08-02
...,...,...,...,...
1651,"Zia, Farhana",The Garden of My Imaan,,
1652,"Ziemke, Kristin",Read the World,,
1653,"Zoboi, Ibi",American Street,,
1654,"Zoboi, Ibi",Black Enough,,


In [None]:
banned_df.isna().sum()

author              0
title               0
description       620
published_date    620
dtype: int64

In [None]:
# over 1000 rows now have a description! (non-NaN, which includes any 'N/A' placeholders)
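Because `paste_book_details` stores `'N/A'` when a matched book has no description, the non-NaN count in memory can overstate how many real descriptions there are. A hypothetical helper that separates the two, shown on a toy frame:

```python
import numpy as np
import pandas as pd


def description_counts(df: pd.DataFrame) -> dict:
    """Split the description column into NaN, 'N/A' placeholders, and real text."""
    desc = df['description']
    return {
        'missing (NaN)': int(desc.isna().sum()),
        'placeholder N/A': int((desc == 'N/A').sum()),
        'real descriptions': int((desc.notna() & (desc != 'N/A')).sum()),
    }


toy = pd.DataFrame({'description': ['A novel-in-verse...', 'N/A', np.nan]})
print(description_counts(toy))
# {'missing (NaN)': 1, 'placeholder N/A': 1, 'real descriptions': 1}
```

Calling `description_counts(banned_df)` before saving to CSV would give the same split for the real data.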