# How I Collected the ALA Banned Book Data

By [Crystal Shearer](https://grrlofhighart.github.io/)

This notebook outlines my process for collecting and cleaning ALA banned book data from the ALA website. Each year the American Library Association's Office for Intellectual Freedom compiles a list of the Top 10 most challenged books. Lists from 2001 through 2023 are housed at https://www.ala.org/bbooks/frequentlychallengedbooks/top10/archive.

I was interested in trying out different collection methods throughout my project, so I decided to start with what seemed like the simplest method: Google Sheets. Since the data on the ALA website is formatted as an ordered list, I was able to use the IMPORTHTML function. I'll also be working with a large dataset that requires SQLite, so I'll be transferring the ALA book data to my SQL database for later analysis.
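For readers who haven't used it, `IMPORTHTML(url, query, index)` pulls the `index`-th list or table from a page into a sheet, where `query` is `"list"` or `"table"`. As a rough Python analogue of what that does for an ordered list, here is a minimal sketch using only the standard library. The `OrderedListParser` class and the sample HTML are made up for illustration and are not the ALA page's actual markup.

```python
from html.parser import HTMLParser


class OrderedListParser(HTMLParser):
    """Hypothetical helper: collect the text of each <li> inside each <ol>."""

    def __init__(self):
        super().__init__()
        self.lists = []        # one list of item strings per <ol> found
        self._in_ol = False
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            self.lists.append([])
            self._in_ol = True
        elif tag == "li" and self._in_ol:
            self.lists[-1].append("")
            self._in_li = True

    def handle_endtag(self, tag):
        if tag == "ol":
            self._in_ol = False
        elif tag == "li":
            self._in_li = False

    def handle_data(self, data):
        # Only keep text that sits inside an <li> of an <ol>
        if self._in_li:
            self.lists[-1][-1] += data


# Made-up markup standing in for one year's list
sample_html = """
<ol>
  <li>Title A; Author A; Reasons: reason one</li>
  <li>Title B; Author B; Reasons: reason two</li>
</ol>
"""

parser = OrderedListParser()
parser.feed(sample_html)
first_list = [item.strip() for item in parser.lists[0]]
print(first_list)
```

Each list item arrives as a single `title; author; reasons` string, which is why the cleaning steps later in this notebook have to split it apart.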

### Step 1: Importing all the necessary Libraries

In [2]:
# pandas for data manipulation
import pandas as pd
# sqlite3 for storage of the dataset
import sqlite3
from contextlib import closing
# re for working with regular expressions (strings)
import re

### Step 2: Importing the Google Sheet

In [3]:
# Function to convert Google Sheets url to a csv export url.
def convert_google_sheet_url(url):
    # Regular expression to capture the sheet ID and, when present, the gid.
    # The optional (?:\?[^#]*)? also accepts links shaped like /edit?gid=...#gid=...
    pattern = r'https://docs\.google\.com/spreadsheets/d/([a-zA-Z0-9-_]+)(/edit(?:\?[^#]*)?#gid=(\d+)|/edit.*)?'

    # Replace function to construct the new URL for CSV export
    # If gid is present in the URL, it includes it in the export URL, otherwise, it's omitted
    replacement = lambda m: f'https://docs.google.com/spreadsheets/d/{m.group(1)}/export?' + (f'gid={m.group(3)}&' if m.group(3) else '') + 'format=csv'

    # Replace using regex
    new_url = re.sub(pattern, replacement, url)

    return new_url
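Share links can carry the `gid` in the query string, in the fragment, or in both. As a cross-check on the regex approach, here is a sketch that builds the same kind of export URL with `urllib.parse` instead. The function name `to_csv_export_url` and the sample sheet IDs are hypothetical, and the sketch assumes the path always has the form `/spreadsheets/d/<sheet id>/...`.

```python
from urllib.parse import urlparse, parse_qs


def to_csv_export_url(url):
    """Hypothetical helper: build a CSV export URL, keeping the gid if present."""
    parts = urlparse(url)
    # Assumed path layout: /spreadsheets/d/<sheet id>/edit
    segments = parts.path.split("/")
    sheet_id = segments[segments.index("d") + 1]
    # The gid may live in the query string (?gid=...) or the fragment (#gid=...)
    gid = parse_qs(parts.query).get("gid") or parse_qs(parts.fragment).get("gid")
    export = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?"
    if gid:
        export += f"gid={gid[0]}&"
    return export + "format=csv"


# Made-up sheet IDs, one link per gid placement
print(to_csv_export_url("https://docs.google.com/spreadsheets/d/abc123/edit?gid=42#gid=42"))
print(to_csv_export_url("https://docs.google.com/spreadsheets/d/abc123/edit#gid=7"))
print(to_csv_export_url("https://docs.google.com/spreadsheets/d/abc123"))
```

Without a `gid`, the export URL has no tab selector, so it is worth confirming that the dataframe you get back comes from the tab you expect.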

In [4]:
# URL to my Google Sheet with book data imported using IMPORTHTML function
url = 'https://docs.google.com/spreadsheets/d/1Ef1j2eAi6eGx7iF9ItePMEbQr6PqvfwSmpaHKLEo-5Q/edit?gid=1299874103#gid=1299874103'

# Updated URL
updated_url = convert_google_sheet_url(url)

# Pull info from Google Sheet into dataframe
df = pd.read_csv(updated_url)
df.head()

Unnamed: 0,2023,2022,2021,2020,2019,2018,2017,2016,2015,2014,...,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001
0,Gender Queer: A Memoir; Maia Kobabe; Reasons: ...,Gender Queer: A Memoir; Maia Kobabe; Reasons: ...,Gender Queer: A Memoir; Maia Kobabe; Reasons: ...,"George; Alex Gino; Reasons: Challenged, banned...","George; Alex Gino; Reasons: challenged, banned...","George; Alex Gino; Reasons: banned, challenged...",Thirteen Reasons Why; Jay Asher; Reasons: Orig...,This One Summer; Mariko Tamaki; Reasons: chall...,Looking for Alaska; John Green; Reasons: offen...,The Absolutely True Diary of a Part-Time India...,...,And Tango Makes Three; Peter Parnell and Justi...,"ttyl; ttfn; l8r, g8r; Lauren Myracle; Reasons:...",And Tango Makes Three; Justin Richardson and P...,And Tango Makes Three; Justin Richardson and P...,And Tango Makes Three; Justin Richardson and P...,"It's Perfectly Normal: Changing Bodies, Growin...",The Chocolate War; Robert Cormier; Reasons: of...,The Agony of Alice; Phyllis Reynolds Naylor; R...,Harry Potter Series Box Set; J.K. Rowling; Rea...,Harry Potter Series Box Set; J.K. Rowling; Rea...
1,All Boys Aren't Blue: A Memoir-Manifesto; Geor...,All Boys Aren't Blue: A Memoir-Manifesto; Geor...,Lawn Boy; Jonathan Evison; Reasons: Banned and...,"Stamped: Racism, Antiracism, and You; Ibram X....",Beyond Magenta: Transgender and Nonbinary Teen...,A Day in the Life of Marlon Bundo; Jill Twiss;...,The Absolutely True Diary of a Part-Time India...,Drama; Raina Telgemeier; Reasons: challenged b...,Fifty Shades of Grey; E. L. James; Reasons: se...,Persepolis: The Story of a Childhood; Marjane ...,...,The Absolutely True Diary of a Part-Time India...,And Tango Makes Three; Peter Parnell and Justi...,His Dark Materials; Philip Pullman; Reasons: p...,The Chocolate War; Robert Cormier; Reasons: of...,Gossip Girl; Cecily Von Ziegesar; Reasons: hom...,Forever; Judy Blume; Reasons: offensive langua...,Fallen Angels; Walter Dean Myers; Reasons: off...,Harry Potter Series Box Set; J.K. Rowling; Rea...,The Agony of Alice; Phyllis Reynolds Naylor; R...,Of Mice and Men; John Steinbeck; Reasons: offe...
2,This Book is Gay; Juno Dawson; Reasons: LGBTQI...,The Bluest Eye; Toni Morrison; Reasons: Banned...,All Boys Aren't Blue: A Memoir-Manifesto; Geor...,All American Boys; Jason Reynolds and Brendan ...,A Day in the Life of Marlon Bundo; Jill Twiss;...,The Adventures Of Captain Underpants; Dav Pilk...,Drama; Raina Telgemeier; Reasons: This Stonewa...,George; Alex Gino; Reasons: challenged because...,I Am Jazz; Jessica Herthel and Jazz Jennings; ...,And Tango Makes Three; Justin Richardson and P...,...,Brave New World; Aldous Huxley; Reasons: insen...,The Perks of Being A Wallflower; Stephen Chbos...,"ttyl; ttfn; l8r, g8r; Lauren Myracle; Reasons:...",Olive's Ocean; Kevin Henkes; Reasons: offensiv...,The Agony of Alice; Phyllis Reynolds Naylor; R...,The Catcher in the Rye; J. D. Salinger; Reason...,Arming America: The Origins of a National Gun ...,Of Mice and Men; John Steinbeck; Reasons: offe...,The Chocolate War; Robert Cormier; Reasons: of...,The Chocolate War; Robert Cormier; Reasons: of...
3,The Perks of Being a Wallflower; Stephen Chbos...,Flamer; Mike Curato; Reasons:Banned and challe...,Out of Darkness; Ashley Hope Perez; Reasons: B...,"Speak; Laurie Halse Anderson; Reasons: Banned,...","Sex Is a Funny Word: A Book about Bodies, Feel...",The Hate U Give; Angie Thomas; Reasons: banned...,The Kite Runner; Khaled Hosseini; Reasons: Thi...,I Am Jazz; Jessica Herthel and Jazz Jennings; ...,Beyond Magenta: Transgender and Nonbinary Teen...,The Bluest Eye; Toni Morrison; Reasons: sexual...,...,"Crank; Ellen Hopkins; Reasons: drugs, offensiv...",To Kill A Mockingbird; Harper Lee; Reasons: of...,Scary Stories to Tell in the Dark; Alvin Schwa...,The Golden Compass; Philip Pullman; Reasons: r...,"The Earth, My Butt, and Other Big Round Things...",The Chocolate War; Robert Cormier; Reasons: se...,The Adventures Of Captain Underpants; Dav Pilk...,Arming America: The Origins of a National Gun ...,I Know Why the Caged Bird Sings; Maya Angelou;...,I Know Why the Caged Bird Sings; Maya Angelou;...
4,Flamer; Mike Curato; Reasons: LGBTQIA+ content...,Looking for Alaska; John Green; Reasons: Banne...,The Hate U Give; Angie Thomas; Reasons: Banned...,The Absolutely True Diary of a Part-Time India...,Prince & Knight; Daniel Haack; Reasons: challe...,Drama; Raina Telgemeier; Reasons: banned and c...,George; Alex Gino; Reasons: Written for elemen...,Two Boys Kissing; David Levithan; Reasons: cha...,The Curious Incident of the Dog in the Night-T...,It’s Perfectly Normal; Robie Harris; Reasons: ...,...,The Hunger Games; Suzanne Collins; Reasons: se...,Twilight; Stephenie Meyer; Reasons: religious ...,Bless Me Ultima; Rudolfo Anaya; Reasons: occul...,The Adventures of Huckleberry Finn; Mark Twain...,The Bluest Eye; Toni Morrison; Reasons: offens...,"Whale Talk; Chris Crutcher; Reasons: racism, o...",The Perks of Being a Wallflower; Stephen Chbos...,Fallen Angels; Walter Dean Myers; Reasons: dru...,Taming the Star Runner; S.E. Hinton; Reasons: ...,Summer of My German Soldier; Bette Greene; Rea...


In [5]:
# Replace NaN values with a placeholder entry so the string operations below don't raise an error
df = df.fillna('None; None; None')

# Remove special characters (DataFrame.map requires pandas 2.1+; older versions use applymap)
df = df.map(lambda x: re.sub(r'[\'\.+\`\|\#\:\’\*+]', '', x))
# df = df.map(lambda x: re.sub(r'[\-]', ' ', x))

# Remove parentheses and information within from dataframe
df = df.map(lambda x: re.sub(r'\([^()]*\)', '', x))

# Preview dataframe
df.head()

Unnamed: 0,2023,2022,2021,2020,2019,2018,2017,2016,2015,2014,...,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001
0,Gender Queer A Memoir; Maia Kobabe; Reasons LG...,Gender Queer A Memoir; Maia Kobabe; Reasons Ba...,Gender Queer A Memoir; Maia Kobabe; Reasons Ba...,"George; Alex Gino; Reasons Challenged, banned,...","George; Alex Gino; Reasons challenged, banned,...","George; Alex Gino; Reasons banned, challenged,...",Thirteen Reasons Why; Jay Asher; Reasons Origi...,This One Summer; Mariko Tamaki; Reasons challe...,Looking for Alaska; John Green; Reasons offens...,The Absolutely True Diary of a Part-Time India...,...,And Tango Makes Three; Peter Parnell and Justi...,"ttyl; ttfn; l8r, g8r; Lauren Myracle; Reasons ...",And Tango Makes Three; Justin Richardson and P...,And Tango Makes Three; Justin Richardson and P...,And Tango Makes Three; Justin Richardson and P...,"Its Perfectly Normal Changing Bodies, Growing ...",The Chocolate War; Robert Cormier; Reasons off...,The Agony of Alice; Phyllis Reynolds Naylor; R...,Harry Potter Series Box Set; J.K. Rowling; Rea...,Harry Potter Series Box Set; J.K. Rowling; Rea...
1,All Boys Arent Blue A Memoir-Manifesto; George...,All Boys Arent Blue A Memoir-Manifesto; George...,Lawn Boy; Jonathan Evison; Reasons Banned and ...,"Stamped Racism, Antiracism, and You; Ibram X. ...",Beyond Magenta Transgender and Nonbinary Teens...,A Day in the Life of Marlon Bundo; Jill Twiss;...,The Absolutely True Diary of a Part-Time India...,Drama; Raina Telgemeier; Reasons challenged be...,Fifty Shades of Grey; E. L. James; Reasons sex...,Persepolis The Story of a Childhood; Marjane S...,...,The Absolutely True Diary of a Part-Time India...,And Tango Makes Three; Peter Parnell and Justi...,His Dark Materials; Philip Pullman; Reasons po...,The Chocolate War; Robert Cormier; Reasons off...,Gossip Girl; Cecily Von Ziegesar; Reasons homo...,Forever; Judy Blume; Reasons offensive languag...,Fallen Angels; Walter Dean Myers; Reasons offe...,Harry Potter Series Box Set; J.K. Rowling; Rea...,The Agony of Alice; Phyllis Reynolds Naylor; R...,Of Mice and Men; John Steinbeck; Reasons offen...
2,This Book is Gay; Juno Dawson; Reasons LGBTQIA...,The Bluest Eye; Toni Morrison; Reasons Banned ...,All Boys Arent Blue A Memoir-Manifesto; George...,All American Boys; Jason Reynolds and Brendan ...,A Day in the Life of Marlon Bundo; Jill Twiss;...,The Adventures Of Captain Underpants; Dav Pilk...,Drama; Raina Telgemeier; Reasons This Stonewal...,George; Alex Gino; Reasons challenged because ...,I Am Jazz; Jessica Herthel and Jazz Jennings; ...,And Tango Makes Three; Justin Richardson and P...,...,Brave New World; Aldous Huxley; Reasons insens...,The Perks of Being A Wallflower; Stephen Chbos...,"ttyl; ttfn; l8r, g8r; Lauren Myracle; Reasons ...",Olives Ocean; Kevin Henkes; Reasons offensive ...,The Agony of Alice; Phyllis Reynolds Naylor; R...,The Catcher in the Rye; J. D. Salinger; Reason...,Arming America The Origins of a National Gun C...,Of Mice and Men; John Steinbeck; Reasons offen...,The Chocolate War; Robert Cormier; Reasons off...,The Chocolate War; Robert Cormier; Reasons off...
3,The Perks of Being a Wallflower; Stephen Chbos...,Flamer; Mike Curato; ReasonsBanned and challen...,Out of Darkness; Ashley Hope Perez; Reasons Ba...,"Speak; Laurie Halse Anderson; Reasons Banned, ...","Sex Is a Funny Word A Book about Bodies, Feeli...",The Hate U Give; Angie Thomas; Reasons banned ...,The Kite Runner; Khaled Hosseini; Reasons This...,I Am Jazz; Jessica Herthel and Jazz Jennings; ...,Beyond Magenta Transgender and Nonbinary Teens...,The Bluest Eye; Toni Morrison; Reasons sexuall...,...,"Crank; Ellen Hopkins; Reasons drugs, offensive...",To Kill A Mockingbird; Harper Lee; Reasons off...,Scary Stories to Tell in the Dark; Alvin Schwa...,The Golden Compass; Philip Pullman; Reasons re...,"The Earth, My Butt, and Other Big Round Things...",The Chocolate War; Robert Cormier; Reasons sex...,The Adventures Of Captain Underpants; Dav Pilk...,Arming America The Origins of a National Gun C...,I Know Why the Caged Bird Sings; Maya Angelou;...,I Know Why the Caged Bird Sings; Maya Angelou;...
4,"Flamer; Mike Curato; Reasons LGBTQIA content, ...",Looking for Alaska; John Green; Reasons Banned...,The Hate U Give; Angie Thomas; Reasons Banned ...,The Absolutely True Diary of a Part-Time India...,Prince & Knight; Daniel Haack; Reasons challen...,Drama; Raina Telgemeier; Reasons banned and ch...,George; Alex Gino; Reasons Written for element...,Two Boys Kissing; David Levithan; Reasons chal...,The Curious Incident of the Dog in the Night-T...,It’s Perfectly Normal; Robie Harris; Reasons n...,...,The Hunger Games; Suzanne Collins; Reasons sex...,Twilight; Stephenie Meyer; Reasons religious v...,Bless Me Ultima; Rudolfo Anaya; Reasons occult...,The Adventures of Huckleberry Finn; Mark Twain...,The Bluest Eye; Toni Morrison; Reasons offensi...,"Whale Talk; Chris Crutcher; Reasons racism, of...",The Perks of Being a Wallflower; Stephen Chbos...,Fallen Angels; Walter Dean Myers; Reasons drug...,Taming the Star Runner; S.E. Hinton; Reasons o...,Summer of My German Soldier; Bette Greene; Rea...
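To make the effect of the two substitutions concrete, here is a small sketch that applies the same patterns from the cleaning cell to a single string. The entry is made up for illustration and does not come from the ALA data.

```python
import re

# Made-up entry, used only to show what each substitution removes
entry = "Some Title: A Memoir (2019); A. Author; Reasons: reason one"

# Character-class pattern from the cleaning cell: strips ' . + ` | # : ’ *
no_special = re.sub(r'[\'\.+\`\|\#\:\’\*+]', '', entry)

# Parenthesis pattern from the cleaning cell: drops (...) and its contents
no_parens = re.sub(r'\([^()]*\)', '', no_special)

print(no_special)
print(no_parens)
```

Two side effects are worth knowing about: periods are removed everywhere, so initials lose their dots, and deleting a parenthetical leaves a stray space before the `;`, which the whitespace trim later in the notebook cleans up.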


### Step 3: Setting up the Database

I plan on collecting multiple datasets, one of which I expect will be quite large. To keep everything uniform, I decided to store my data in a SQLite database.

In [18]:
# Setup SQLite Database to store data
database = 'Book_DB.db'
conn = sqlite3.connect(database)
c = conn.cursor()

In [7]:
# Transfer copy of original dataframe to database
df.to_sql(name='ala_top_10_archive_original', con=conn, if_exists='replace', index=False)
# df.to_csv('ala_top_10_archive_original.csv', index=False)

13

### Step 4: Cleaning the ALA Book Data

In its current form the dataframe is hard to analyze: every cell packs a title, an author, and the challenge reasons into one string. I'd like to separate these into their own columns while keeping the year and list order intact for each book.

In [8]:
# Import table from sql database as dataframe
query = 'SELECT * FROM ala_top_10_archive_original'
sql_df = pd.read_sql(sql=query, con=conn)

# Preview dataframe
sql_df.head()

Unnamed: 0,2023,2022,2021,2020,2019,2018,2017,2016,2015,2014,...,2010,2009,2008,2007,2006,2005,2004,2003,2002,2001
0,Gender Queer A Memoir; Maia Kobabe; Reasons LG...,Gender Queer A Memoir; Maia Kobabe; Reasons Ba...,Gender Queer A Memoir; Maia Kobabe; Reasons Ba...,"George; Alex Gino; Reasons Challenged, banned,...","George; Alex Gino; Reasons challenged, banned,...","George; Alex Gino; Reasons banned, challenged,...",Thirteen Reasons Why; Jay Asher; Reasons Origi...,This One Summer; Mariko Tamaki; Reasons challe...,Looking for Alaska; John Green; Reasons offens...,The Absolutely True Diary of a Part-Time India...,...,And Tango Makes Three; Peter Parnell and Justi...,"ttyl; ttfn; l8r, g8r; Lauren Myracle; Reasons ...",And Tango Makes Three; Justin Richardson and P...,And Tango Makes Three; Justin Richardson and P...,And Tango Makes Three; Justin Richardson and P...,"Its Perfectly Normal Changing Bodies, Growing ...",The Chocolate War; Robert Cormier; Reasons off...,The Agony of Alice; Phyllis Reynolds Naylor; R...,Harry Potter Series Box Set; J.K. Rowling; Rea...,Harry Potter Series Box Set; J.K. Rowling; Rea...
1,All Boys Arent Blue A Memoir-Manifesto; George...,All Boys Arent Blue A Memoir-Manifesto; George...,Lawn Boy; Jonathan Evison; Reasons Banned and ...,"Stamped Racism, Antiracism, and You; Ibram X. ...",Beyond Magenta Transgender and Nonbinary Teens...,A Day in the Life of Marlon Bundo; Jill Twiss;...,The Absolutely True Diary of a Part-Time India...,Drama; Raina Telgemeier; Reasons challenged be...,Fifty Shades of Grey; E. L. James; Reasons sex...,Persepolis The Story of a Childhood; Marjane S...,...,The Absolutely True Diary of a Part-Time India...,And Tango Makes Three; Peter Parnell and Justi...,His Dark Materials; Philip Pullman; Reasons po...,The Chocolate War; Robert Cormier; Reasons off...,Gossip Girl; Cecily Von Ziegesar; Reasons homo...,Forever; Judy Blume; Reasons offensive languag...,Fallen Angels; Walter Dean Myers; Reasons offe...,Harry Potter Series Box Set; J.K. Rowling; Rea...,The Agony of Alice; Phyllis Reynolds Naylor; R...,Of Mice and Men; John Steinbeck; Reasons offen...
2,This Book is Gay; Juno Dawson; Reasons LGBTQIA...,The Bluest Eye; Toni Morrison; Reasons Banned ...,All Boys Arent Blue A Memoir-Manifesto; George...,All American Boys; Jason Reynolds and Brendan ...,A Day in the Life of Marlon Bundo; Jill Twiss;...,The Adventures Of Captain Underpants; Dav Pilk...,Drama; Raina Telgemeier; Reasons This Stonewal...,George; Alex Gino; Reasons challenged because ...,I Am Jazz; Jessica Herthel and Jazz Jennings; ...,And Tango Makes Three; Justin Richardson and P...,...,Brave New World; Aldous Huxley; Reasons insens...,The Perks of Being A Wallflower; Stephen Chbos...,"ttyl; ttfn; l8r, g8r; Lauren Myracle; Reasons ...",Olives Ocean; Kevin Henkes; Reasons offensive ...,The Agony of Alice; Phyllis Reynolds Naylor; R...,The Catcher in the Rye; J. D. Salinger; Reason...,Arming America The Origins of a National Gun C...,Of Mice and Men; John Steinbeck; Reasons offen...,The Chocolate War; Robert Cormier; Reasons off...,The Chocolate War; Robert Cormier; Reasons off...
3,The Perks of Being a Wallflower; Stephen Chbos...,Flamer; Mike Curato; ReasonsBanned and challen...,Out of Darkness; Ashley Hope Perez; Reasons Ba...,"Speak; Laurie Halse Anderson; Reasons Banned, ...","Sex Is a Funny Word A Book about Bodies, Feeli...",The Hate U Give; Angie Thomas; Reasons banned ...,The Kite Runner; Khaled Hosseini; Reasons This...,I Am Jazz; Jessica Herthel and Jazz Jennings; ...,Beyond Magenta Transgender and Nonbinary Teens...,The Bluest Eye; Toni Morrison; Reasons sexuall...,...,"Crank; Ellen Hopkins; Reasons drugs, offensive...",To Kill A Mockingbird; Harper Lee; Reasons off...,Scary Stories to Tell in the Dark; Alvin Schwa...,The Golden Compass; Philip Pullman; Reasons re...,"The Earth, My Butt, and Other Big Round Things...",The Chocolate War; Robert Cormier; Reasons sex...,The Adventures Of Captain Underpants; Dav Pilk...,Arming America The Origins of a National Gun C...,I Know Why the Caged Bird Sings; Maya Angelou;...,I Know Why the Caged Bird Sings; Maya Angelou;...
4,"Flamer; Mike Curato; Reasons LGBTQIA content, ...",Looking for Alaska; John Green; Reasons Banned...,The Hate U Give; Angie Thomas; Reasons Banned ...,The Absolutely True Diary of a Part-Time India...,Prince & Knight; Daniel Haack; Reasons challen...,Drama; Raina Telgemeier; Reasons banned and ch...,George; Alex Gino; Reasons Written for element...,Two Boys Kissing; David Levithan; Reasons chal...,The Curious Incident of the Dog in the Night-T...,It’s Perfectly Normal; Robie Harris; Reasons n...,...,The Hunger Games; Suzanne Collins; Reasons sex...,Twilight; Stephenie Meyer; Reasons religious v...,Bless Me Ultima; Rudolfo Anaya; Reasons occult...,The Adventures of Huckleberry Finn; Mark Twain...,The Bluest Eye; Toni Morrison; Reasons offensi...,"Whale Talk; Chris Crutcher; Reasons racism, of...",The Perks of Being a Wallflower; Stephen Chbos...,Fallen Angels; Walter Dean Myers; Reasons drug...,Taming the Star Runner; S.E. Hinton; Reasons o...,Summer of My German Soldier; Bette Greene; Rea...


In [9]:
# Stack all columns of the dataframe
df_stack = sql_df.melt()

# Preview stacked dataframe
df_stack.head()

Unnamed: 0,variable,value
0,2023,Gender Queer A Memoir; Maia Kobabe; Reasons LG...
1,2023,All Boys Arent Blue A Memoir-Manifesto; George...
2,2023,This Book is Gay; Juno Dawson; Reasons LGBTQIA...
3,2023,The Perks of Being a Wallflower; Stephen Chbos...
4,2023,"Flamer; Mike Curato; Reasons LGBTQIA content, ..."
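Here is a minimal sketch of what `melt()` does to a wide, one-column-per-year table, using a tiny made-up frame. The `var_name`/`value_name` arguments name the output columns directly, and the `rank` column is a hypothetical addition that records each book's position within its year, since the stacked index alone no longer shows it.

```python
import pandas as pd

# Tiny made-up wide table: one column per year, one row per list position
wide = pd.DataFrame({
    "2023": ["Book A", "Book B"],
    "2022": ["Book C", "Book D"],
})

# melt() stacks the columns: the old header goes to 'year', the cell to 'entry'
long_df = wide.melt(var_name="year", value_name="entry")

# Hypothetical extra column: position within each year's list, starting at 1
long_df["rank"] = long_df.groupby("year").cumcount() + 1

print(long_df)
```

Because `melt()` walks the columns in order and keeps the rows of each column together, counting rows within each year recovers the original list position.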


In [10]:
# Rename columns to make them easier to work with
df_stack = df_stack.rename(columns={'variable': 'year', 'value': 'drop1'})
# df_stack = df_stack[df_stack['drop1'] != ('None')]
df_stack

Unnamed: 0,year,drop1
0,2023,Gender Queer A Memoir; Maia Kobabe; Reasons LG...
1,2023,All Boys Arent Blue A Memoir-Manifesto; George...
2,2023,This Book is Gay; Juno Dawson; Reasons LGBTQIA...
3,2023,The Perks of Being a Wallflower; Stephen Chbos...
4,2023,"Flamer; Mike Curato; Reasons LGBTQIA content, ..."
...,...,...
294,2001,Fallen Angels; Walter Dean Myers; Reasons offe...
295,2001,Blood and Chocolate; Annette Curtis Klause; Re...
296,2001,None; None; None
297,2001,None; None; None


In [11]:
# Function to split a string once, at the first ';'
def split_column(name):
    return pd.Series(name.split(";", 1))

In [12]:
# Apply the split function to the "drop1" column using apply()
# Name the new columns accordingly
df_stack[['title', 'drop2']] = df_stack['drop1'].apply(split_column)

# Apply the split function to the new drop2 column using apply()
# Name the new columns accordingly
df_stack[['author', 'reasons']] = df_stack['drop2'].apply(split_column)
 
# Drop the columns labeled with 'drop' since they are no longer needed
df_stack.drop(columns=['drop1', 'drop2'], inplace=True)
df_stack

Unnamed: 0,year,title,author,reasons
0,2023,Gender Queer A Memoir,Maia Kobabe,"Reasons LGBTQIA content, claimed to be sexual..."
1,2023,All Boys Arent Blue A Memoir-Manifesto,George M. Johnson,"Reasons LGBTQIA content, claimed to be sexual..."
2,2023,This Book is Gay,Juno Dawson,"Reasons LGBTQIA content, sex education, claim..."
3,2023,The Perks of Being a Wallflower,Stephen Chbosky,"Reasons claimed to be sexually explicit, LGBT..."
4,2023,Flamer,Mike Curato,"Reasons LGBTQIA content, claimed to be sexual..."
...,...,...,...,...
294,2001,Fallen Angels,Walter Dean Myers,Reasons offensive language
295,2001,Blood and Chocolate,Annette Curtis Klause,"Reasons sexually explicit, unsuited to age group"
296,2001,,,
297,2001,,,
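One thing to watch with splitting on the first `;` is a title that contains semicolons itself, such as `ttyl; ttfn; l8r, g8r` in the data preview. A possible alternative, sketched below, is to split from the right instead. The helper name `split_entry` is hypothetical, and the approach assumes the reasons text never contains a `;`, which I have not verified for every row.

```python
def split_entry(entry):
    """Hypothetical helper: split 'title; author; reasons' from the right."""
    # rsplit with maxsplit=2 keeps any extra ';' inside the title
    parts = [part.strip() for part in entry.rsplit(";", 2)]
    # Pad short entries so the caller always gets three fields
    parts += [None] * (3 - len(parts))
    return tuple(parts)


# Title and author as they appear in the data preview; the reasons text is a placeholder
entry = "ttyl; ttfn; l8r, g8r; Lauren Myracle; Reasons placeholder"

left_to_right = entry.split(";", 1)[0]
title, author, reasons = split_entry(entry)

print(left_to_right)   # what a first-';' split treats as the title
print(title, "|", author, "|", reasons)
```

With a dataframe, the same idea could be applied through `apply`, much like `split_column` above, but checking a few multi-part titles by eye is still a good idea.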


In [13]:
# Drop the placeholder rows, whose title is 'None'
df_stack = df_stack[df_stack['title'] != 'None']
df_stack

Unnamed: 0,year,title,author,reasons
0,2023,Gender Queer A Memoir,Maia Kobabe,"Reasons LGBTQIA content, claimed to be sexual..."
1,2023,All Boys Arent Blue A Memoir-Manifesto,George M. Johnson,"Reasons LGBTQIA content, claimed to be sexual..."
2,2023,This Book is Gay,Juno Dawson,"Reasons LGBTQIA content, sex education, claim..."
3,2023,The Perks of Being a Wallflower,Stephen Chbosky,"Reasons claimed to be sexually explicit, LGBT..."
4,2023,Flamer,Mike Curato,"Reasons LGBTQIA content, claimed to be sexual..."
...,...,...,...,...
291,2001,The Catcher in the Rye,J.D. Salinger,"Reasons offensive language, unsuited to age g..."
292,2001,The Agony of Alice,Phyllis Reynolds Naylor,"Reasons sexually explicit, unsuited to age group"
293,2001,Go Ask Alice,Anonymous,"Reasons drugs, offensive language, sexually e..."
294,2001,Fallen Angels,Walter Dean Myers,Reasons offensive language


In [16]:
# Trim leading/trailing whitespace from dataframe
df_stack = df_stack.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
df_stack

Unnamed: 0,year,title,author,reasons
0,2023,Gender Queer A Memoir,Maia Kobabe,"Reasons LGBTQIA content, claimed to be sexuall..."
1,2023,All Boys Arent Blue A Memoir-Manifesto,George M. Johnson,"Reasons LGBTQIA content, claimed to be sexuall..."
2,2023,This Book is Gay,Juno Dawson,"Reasons LGBTQIA content, sex education, claime..."
3,2023,The Perks of Being a Wallflower,Stephen Chbosky,"Reasons claimed to be sexually explicit, LGBTQ..."
4,2023,Flamer,Mike Curato,"Reasons LGBTQIA content, claimed to be sexuall..."
...,...,...,...,...
291,2001,The Catcher in the Rye,J.D. Salinger,"Reasons offensive language, unsuited to age group"
292,2001,The Agony of Alice,Phyllis Reynolds Naylor,"Reasons sexually explicit, unsuited to age group"
293,2001,Go Ask Alice,Anonymous,"Reasons drugs, offensive language, sexually ex..."
294,2001,Fallen Angels,Walter Dean Myers,Reasons offensive language
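The `reasons` column still starts with the literal label `Reasons`. If a later analysis needs individual reasons, one option is sketched below. The helper `parse_reasons` is hypothetical and assumes the reasons are a comma-separated list; some entries in the preview look like full sentences, and a comma split would not be meaningful for those.

```python
def parse_reasons(text):
    """Hypothetical helper: drop the leading 'Reasons' label and split on commas."""
    if text.startswith("Reasons"):
        text = text[len("Reasons"):]
    return [reason.strip() for reason in text.split(",") if reason.strip()]


# Value copied from the 'reasons' column shown above
print(parse_reasons("Reasons sexually explicit, unsuited to age group"))
```

Checking the prefix rather than splitting on a space also copes with entries where the label runs straight into the text, as in `ReasonsBanned`.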


### Step 5: Save the clean data to the database for later

In [19]:
# Transfer cleaned dataframe to database
df_stack.to_sql(name='ala_top_10_archive_clean', con=conn, if_exists='replace', index=False)

234

In [20]:
# Close database connection
conn.close()
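Step 1 imports `closing` from `contextlib`, but the cells above open and close the connection by hand. One way to put that import to use is sketched below, so the connection is closed even if something raises partway through. The in-memory database, the `demo_books` table, and its rows are made up for this sketch and are separate from `Book_DB.db`.

```python
import sqlite3
from contextlib import closing

# ':memory:' is a throwaway in-memory database used only for this sketch
with closing(sqlite3.connect(":memory:")) as demo_conn:
    with demo_conn:  # commits on success, rolls back on error
        demo_conn.execute("CREATE TABLE demo_books (year TEXT, title TEXT)")
        demo_conn.executemany(
            "INSERT INTO demo_books VALUES (?, ?)",
            [("2023", "Book A"), ("2022", "Book B")],
        )
    rows = demo_conn.execute(
        "SELECT year, title FROM demo_books ORDER BY year DESC"
    ).fetchall()

print(rows)
```

Using a `sqlite3` connection as a context manager on its own only manages the transaction; it does not close the connection, which is why `closing` wraps it here.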