## About the Dataset
**Source:** https://www.goodreads.com/choiceawards/best-books-2023
<br>
The data was scraped from the results of the 15th Annual Goodreads Choice Awards, so every book in this dataset was nominated as one of the best books of 2023. The books are nominated in 15 categories: *Fiction, Historical Fiction, Mystery & Thriller, Romance, Romantasy, Fantasy, Science Fiction, Horror, Young Adult Fantasy & Science Fiction, Young Adult Fiction, Debut Novel, Nonfiction, Memoir & Autobiography, History & Biography, and Humor.*
<br>
##### **Dataset Features**
* **title** - name of the book
* **authors** - name of the author(s) of the book
* **genres** - genres under which the book is classified
* **rating** - average rating that the book received
* **num_of_ratings** - total number of ratings that the book received
* **num_of_votes** - total number of votes that the book received in the 15th Annual Goodreads Choice Awards
* **num_of_reviews** - total number of written reviews that the book received
* **num_of_pages** - number of pages that the book has
* **language** - primary language used in the book
* **awards** - award(s) given to the book in the 15th Annual Goodreads Choice Awards
* **publication_date** - date when the book was published
* **publisher** - name of the publisher of the book
* **isbn** - International Standard Book Number: a 13- or 10-digit identifier assigned to books and book-like publications that are published internationally
* **description** - summary of the book's content
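
The `isbn` values in this dataset combine both forms in one string, for example `9780593598832 (ISBN10: 0593598830)`. Below is a minimal sketch of how such a value could be split, assuming that exact layout. `ISBN_PATTERN` and `split_isbn` are hypothetical names used for illustration and are not part of this notebook.

```python
import re

# Hypothetical helper (not part of the original notebook): split a raw
# ISBN string of the form '<13 digits> (ISBN10: <10 chars>)' into two parts.
ISBN_PATTERN = re.compile(r'^(?P<isbn13>\d{13}) \(ISBN10: (?P<isbn10>[\dX]{10})\)$')

def split_isbn(raw):
    """Return (isbn13, isbn10), or (None, None) if the value does not match."""
    if not isinstance(raw, str):
        # Missing values are read by pandas as float NaN, not as strings
        return None, None
    match = ISBN_PATTERN.match(raw.strip())
    if match is None:
        return None, None
    return match.group('isbn13'), match.group('isbn10')

example = split_isbn('9780063327528 (ISBN10: 006332752X)')
missing = split_isbn(float('nan'))
```

An ISBN-10 may end in `X`, which is why the second group accepts that character in addition to digits.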

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px

In [None]:
df = pd.read_csv('goodreads_best_books_2023_raw.csv')

## Dataset Description

In [None]:
df.head()

Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,publication_date,publisher,isbn,description
0,Yellowface,"R.F. Kuang, R.F. Kuang","Fiction, Contemporary, Audiobook, Literary Fic...",3.79,496159,200722,67532,336,English,Winner for Best Fiction (2023),"May 16, 2023",William Morrow,,White lies. Dark humor. Deadly consequences… B...
1,Hello Beautiful,"Ann Napolitano, Ann Napolitano","Fiction, Historical Fiction, Audiobook, Romanc...",4.17,330125,60171,31119,416,English,Nominee for Best Fiction (2023),"March 14, 2023",The Dial Press,,An emotionally layered and engrossing story of...
2,The Wishing Game,"Meg Shaffer, Meg Shaffer","Fiction, Fantasy, Romance, Contemporary, Magic...",4.1,120980,57702,19061,304,English,Nominee for Best Fiction (2023),"May 30, 2023",Ballantine Books,9780593598832 (ISBN10: 0593598830),Make a wish. . . .\n\nLucy Hart knows better t...
3,Tom Lake,"Ann Patchett, Ann Patchett","Fiction, Audiobook, Literary Fiction, Romance,...",4.01,284095,53470,34993,309,English,Nominee for Best Fiction (2023),"August 1, 2023",Harper,9780063327528 (ISBN10: 006332752X),In this beautiful and moving novel about famil...
4,The Five-Star Weekend,"Elin Hilderbrand, Elin Hilderbrand","Fiction, Romance, Audiobook, Chick Lit, Contem...",4.06,180939,45859,12404,384,English,Nominee for Best Fiction (2023),"June 13, 2023","Little, Brown and Company",9780316258777 (ISBN10: 0316258776),From the #1 New York Times bestselling author ...


In [None]:
df.tail()

Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,publication_date,publisher,isbn,description
294,Misfit: Growing Up Awkward in the '80s,"Gary Gulman, Gary Gulman","Memoir, Humor, Nonfiction, Audiobook, Biograph...",4.14,2426,2533,348,304,English,,"September 19, 2023",Flatiron Books,9781250777065 (ISBN10: 1250777062),A tour de force of comedy and reflection about...
295,"Unreliable Narrator: Me, Myself, and Impostor ...","Aparna Nancherla, Aparna Nancherla","Memoir, Nonfiction, Humor, Essays, Audiobook, ...",3.63,1249,2257,151,304,English,Nominee for Best Humor (2023),"September 19, 2023",Viking,9781984879806 (ISBN10: 1984879804),A hilarious and insightful collection of essay...
296,America the Beautiful?: One Woman in a Borrowe...,"Blythe Roberson, Blythe Roberson","Nonfiction, Travel, Memoir, Humor, Nature, Aud...",3.77,2038,1866,385,304,English,Nominee for Best Humor (2023),"April 18, 2023",Harper Perennial,9780063115514 (ISBN10: 0063115514),"For writer and comedian Blythe Roberson, there..."
297,Alexandra Petri's US History: Important Americ...,"Alexandra Petri, Alexandra Petri","Humor, History, Nonfiction, Politics, Essays, ...",3.68,763,1137,197,326,English,Nominee for Best Humor (2023),"April 11, 2023",W. W. Norton & Company,9781324006435 (ISBN10: 1324006439),"A witty, absurdist satire of the last 500 year..."
298,"Not Funny: Essays on Life, Comedy, Culture, Et...","Jena Friedman, Jena Friedman","Humor, Nonfiction, Essays, Memoir, Comedy, Fem...",3.73,851,1072,147,256,English,Nominee for Best Humor (2023),"April 18, 2023",Atria/One Signal Publishers,9781982178284 (ISBN10: 1982178280),For fans of the perceptive comedy of Hannah Ga...


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   title             299 non-null    object 
 1   authors           299 non-null    object 
 2   genres            299 non-null    object 
 3   rating            299 non-null    float64
 4   num_of_ratings    299 non-null    int64  
 5   num_of_votes      299 non-null    int64  
 6   num_of_reviews    299 non-null    int64  
 7   num_of_pages      299 non-null    int64  
 8   language          299 non-null    object 
 9   awards            292 non-null    object 
 10  publication_date  299 non-null    object 
 11  publisher         293 non-null    object 
 12  isbn              274 non-null    object 
 13  description       299 non-null    object 
dtypes: float64(1), int64(4), object(9)
memory usage: 32.8+ KB


In [None]:
df.describe()

Unnamed: 0,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages
count,299.0,299.0,299.0,299.0,299.0
mean,4.02398,67598.66,19662.919732,8705.889632,391.541806
std,0.253603,137490.3,32406.337679,17259.77876,123.571301
min,3.45,405.0,935.0,122.0,192.0
25%,3.81,9926.5,3591.0,1619.5,320.0
50%,4.02,27317.0,10099.0,4213.0,367.0
75%,4.19,64843.0,24714.0,8790.0,431.5
max,4.75,1687287.0,397565.0,222338.0,1242.0


In [None]:
print(f'The dataset has {df.shape[1]} columns and {df.shape[0]} rows.')
print(f'Are there any duplicated rows? {df.duplicated().values.any()}')
print(f'Are there any null values? {df.isna().values.any()}')

The dataset has 14 columns and 299 rows.
Are there any duplicated rows? False
Are there any null values? True
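
The check above only says that nulls exist somewhere. A per-column count is one way to see where they are. The sketch below uses a small made-up frame (`toy`) rather than the real data, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'publisher': ['X', np.nan, 'Z'],
    'isbn': [np.nan, np.nan, '123'],
})

# Count missing values per column and keep only columns that have any
null_counts = toy.isna().sum()
null_counts = null_counts[null_counts > 0].sort_values(ascending=False)
print(null_counts)
```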


## Convert Data Type

In [None]:
# Convert publication date from object to date
df['publication_date'] = pd.to_datetime(df['publication_date'])
df['publication_date'].info()

<class 'pandas.core.series.Series'>
RangeIndex: 299 entries, 0 to 298
Series name: publication_date
Non-Null Count  Dtype         
--------------  -----         
299 non-null    datetime64[ns]
dtypes: datetime64[ns](1)
memory usage: 2.5 KB


## Clean Null Values

In [None]:
null_rows = df.isna().any(axis=1)
df[null_rows]

Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,publication_date,publisher,isbn,description
0,Yellowface,"R.F. Kuang, R.F. Kuang","Fiction, Contemporary, Audiobook, Literary Fic...",3.79,496159,200722,67532,336,English,Winner for Best Fiction (2023),2023-05-16,William Morrow,,White lies. Dark humor. Deadly consequences… B...
1,Hello Beautiful,"Ann Napolitano, Ann Napolitano","Fiction, Historical Fiction, Audiobook, Romanc...",4.17,330125,60171,31119,416,English,Nominee for Best Fiction (2023),2023-03-14,The Dial Press,,An emotionally layered and engrossing story of...
7,Maame,"Jessica George, Jessica George","Fiction, Contemporary, Audiobook, Literary Fic...",4.07,90554,21854,11681,320,English,"Nominee for Best Fiction (2023), Nominee for B...",2023-01-31,St. Martin's Press,,Shortlisted for the TikTok Book Awards in the ...
29,Looking for Jane,"Heather Marshall, Heather Marshall","Historical Fiction, Fiction, Historical, Canad...",4.37,43576,19468,5040,400,English,,2022-03-01,Atria Books,9781668013687 (ISBN10: 1668013681),A debut about three women whose lives are boun...
57,Don't Let Her Stay,"Nicola Sanders, Nicola Sanders","Thriller, Mystery Thriller, Mystery, Audiobook...",3.96,143495,4099,11219,283,English,Nominee for Best Mystery & Thriller (2023),2023-02-09,,,"Someone inside your house wants you dead, but ..."
66,King of Pride,"Ana Huang, Ana Huang","Romance, Contemporary Romance, Contemporary, A...",3.96,181921,37653,18550,358,English,Nominee for Best Romance (2023),2023-04-27,,,She's his opposite in every way...and the grea...
67,The Right Move,"Liz Tomforde, Liz Tomforde","Romance, Sports Romance, Sports, Contemporary,...",4.43,202314,36246,22584,499,English,Nominee for Best Romance (2023),2023-02-07,Golden Boy Publishing LLC,,"RYAN\n\nShe’s a distraction, that’s what she i..."
68,The Seven Year Slip,"Ashley Poston, Ashley Poston","Romance, Fiction, Contemporary, Magical Realis...",4.24,193867,30820,35205,352,English,Nominee for Best Romance (2023),2023-06-27,Berkley,,An overworked book publicist with a perfectly ...
74,Powerless,"Elsie Silver, Elsie Silver","Romance, Sports Romance, Contemporary Romance,...",4.09,180965,8486,18735,396,English,Nominee for Best Romance (2023),2023-02-10,,,Two childhood friends. Two broken hearts. One ...
78,A Long Time Coming,"Meghan Quinn, Meghan Quinn","Romance, Contemporary Romance, Contemporary, A...",4.27,73485,3918,6154,450,English,Nominee for Best Romance (2023),2023-01-10,Meghan Quinn,,"A witty take on a romantic comedy classic, My ..."


In [None]:
df[df['language'] != 'English']

Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,publication_date,publisher,isbn,description
119,VenCo,"Cherie Dimaline, Cherie Dimaline","Fantasy, Fiction, Witches, Magical Realism, Ad...",3.79,8611,2418,1629,400,9780063054899 (ISBN10: 0063054892),Nominee for Best Fantasy (2023),2023-02-07,William Morrow,,"Lucky St. James, a Métis millennial living wit..."
158,Her Little Flowers,"Shannon Morgan, Shannon Morgan","Horror, Gothic, Mystery, Fiction, Thriller, Pa...",3.81,3601,1911,604,368,9781496743886 (ISBN10: 1496743881),Nominee for Best Horror (2023),2023-07-25,Kensington,,Francine Thwaite has lived all her fifty-five ...


In [None]:
# Transfer ISBN values from the language column to the ISBN column,
# then mark the language as missing
for idx in (119, 158):
  df.at[idx, 'isbn'] = df.at[idx, 'language']
  df.at[idx, 'language'] = np.nan
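
The fix above relies on row labels found by inspection. As a sketch of a more general approach, the misplaced values could be detected by their shape instead, assuming a misplaced ISBN always starts with 13 digits. The frame `toy` below is a made-up stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Toy stand-in: the second row has an ISBN stored in the language column
toy = pd.DataFrame({
    'language': ['English', '9780063054899 (ISBN10: 0063054892)'],
    'isbn': ['9780593598832 (ISBN10: 0593598830)', np.nan],
})

# Rows whose language value starts with 13 digits look like a misplaced ISBN
misplaced = toy['language'].str.match(r'\d{13}', na=False)

# Move the value only where isbn is empty, then blank out the language
move = misplaced & toy['isbn'].isna()
toy.loc[move, 'isbn'] = toy.loc[move, 'language']
toy.loc[move, 'language'] = np.nan
```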

In [None]:
# See book titles with null values under the awards column
null_rows_awards = df['awards'].isna()
df[null_rows_awards].sort_values('title')

Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,publication_date,publisher,isbn,description
238,Build the Life You Want: The Art and Science o...,"Arthur C. Brooks, Oprah Winfrey, Arthur C. Brooks","Self Help, Nonfiction, Psychology, Personal De...",3.8,8544,2202,854,272,English,,2023-09-12,Portfolio,9780593545409 (ISBN10: 0593545400),You can get happier. And getting there will be...
29,Looking for Jane,"Heather Marshall, Heather Marshall","Historical Fiction, Fiction, Historical, Canad...",4.37,43576,19468,5040,400,English,,2022-03-01,Atria Books,9781668013687 (ISBN10: 1668013681),A debut about three women whose lives are boun...
205,Looking for Jane,"Heather Marshall, Heather Marshall","Historical Fiction, Fiction, Historical, Canad...",4.37,43576,20045,5040,400,English,,2022-03-01,Atria Books,9781668013687 (ISBN10: 1668013681),A debut about three women whose lives are boun...
294,Misfit: Growing Up Awkward in the '80s,"Gary Gulman, Gary Gulman","Memoir, Humor, Nonfiction, Audiobook, Biograph...",4.14,2426,2533,348,304,English,,2023-09-19,Flatiron Books,9781250777065 (ISBN10: 1250777062),A tour de force of comedy and reflection about...
148,Our Share of Night,"Mariana Enríquez, Megan McDowell, Pablo Gerard...","Horror, Fiction, Fantasy, Historical Fiction, ...",4.29,36310,14658,7224,588,English,,2019-11-27,Hogarth,9780451495143 (ISBN10: 0451495144),A woman’s mysterious death puts her husband an...
110,Stone Blind,"Natalie Haynes, Natalie Haynes","Mythology, Fantasy, Fiction, Greek Mythology, ...",3.84,46964,9336,6913,373,English,,2022-09-15,Harper,9780063258396 (ISBN10: 0063258390),"A fresh take on the story of Medusa, the origi..."
282,Tough Titties: On Living Your Best Life When Y...,"Laura Belgray, Laura Belgray","Nonfiction, Memoir, Humor, Self Help, Essays, ...",3.58,1565,18074,375,320,English,,2023-06-13,Hachette Books,9780306826047 (ISBN10: 0306826046),"From award-winning TV writer Laura Belgray, a ..."


In [None]:
# Update the awards column
df.at[238, 'awards'] = 'Nominee for Best Nonfiction (2023)'
df.at[29, 'awards'] = 'Nominee for Best Historical Fiction (2023), Nominee for Best Debut Novel (2023)'
df.at[205, 'awards'] = 'Nominee for Best Historical Fiction (2023), Nominee for Best Debut Novel (2023)'
df.at[294, 'awards'] = 'Nominee for Best Humor (2023)'
df.at[148, 'awards'] = 'Nominee for Best Horror (2023)'
df.at[110, 'awards'] = 'Nominee for Best Fantasy (2023)'
df.at[282, 'awards'] = 'Nominee for Best Humor (2023)'

## Create a Category Column

In [None]:
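# Use object dtype so the column can hold category strings
df.insert(10, 'category', pd.Series(np.nan, index=df.index, dtype='object'))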

In [None]:
category_awards = ['Best Fiction', 'Best Historical Fiction',
                    'Best Mystery & Thriller', 'Best Romance', 'Best Romantasy',
                    'Best Fantasy', 'Best Science Fiction', 'Best Horror',
                    'Best Young Adult Fantasy & Science Fiction',
                    'Best Young Adult Fiction', 'Best Debut Novel',
                    'Best Nonfiction', 'Best Memoir & Autobiography',
                    'Best History & Biography', 'Best Humor']

for category in category_awards:
  # regex=False treats the category name as literal text
  matched_indices = df[df['awards'].str.contains(category, regex=False)].index

  for index in matched_indices:
    if pd.isna(df.at[index, 'category']):
      df.at[index, 'category'] = category
    else:
      df.at[index, 'category'] = df.at[index, 'category'] + f', {category}'
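
As an alternative sketch, the category names could be pulled straight out of the award text with a regular expression. This assumes every award follows the pattern `... Best <category> (<year>)` seen in the data. Unlike the loop above, it lists categories in the order they appear in the `awards` string and already drops the `Best` prefix. The `awards` Series below is a small illustrative sample.

```python
import pandas as pd

# Small sample of award strings in the format used by this dataset
awards = pd.Series([
    'Winner for Best Fiction (2023)',
    'Nominee for Best Fiction (2023), Nominee for Best Debut Novel (2023)',
    'Nominee for Best Young Adult Fantasy & Science Fiction (2023)',
])

# Capture the text between 'Best ' and the ' (YYYY)' suffix of each award,
# then join multiple matches with ', '
categories = awards.str.findall(r'Best (.+?) \(\d{4}\)').str.join(', ')
print(categories.tolist())
```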

In [None]:
print('Are there null values under the category column?')
print(df['category'].isna().values.any())

Are there null values under the category column?
False


In [None]:
# Remove 'Best' in the categories
df['category'] = df['category'].str.replace('Best ', '', regex=False)

df['category'][:10]

0                 Fiction
1                 Fiction
2                 Fiction
3                 Fiction
4                 Fiction
5                 Fiction
6                 Fiction
7    Fiction, Debut Novel
8    Fiction, Debut Novel
9    Fiction, Debut Novel
Name: category, dtype: object

## Clean Duplicated Values

In [None]:
# Clean duplicated authors while keeping their original order
def unique_authors(authors):
  # dict.fromkeys drops repeated names and preserves first-seen order
  return ', '.join(dict.fromkeys(authors.split(', ')))

df['authors'] = df['authors'].apply(unique_authors)

df['authors']

0            R.F. Kuang
1        Ann Napolitano
2           Meg Shaffer
3          Ann Patchett
4      Elin Hilderbrand
             ...       
294         Gary Gulman
295    Aparna Nancherla
296     Blythe Roberson
297     Alexandra Petri
298       Jena Friedman
Name: authors, Length: 299, dtype: object

In [None]:
# Clean duplicated commas under genres
genres_to_fix = df[df['genres'].str.contains(', , , ,')]
display(genres_to_fix)

Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
18,Blackouts,Justin Torres,", , , , , , , , , , , , , , , , , , , , , , , ...",3.78,6286,2611,1142,306,English,Nominee for Best Fiction (2023),Fiction,2023-10-10,"Farrar, Straus and Giroux",9780374293574 (ISBN10: 0374293570),"From the bestselling author of We the Animals,..."
254,Tell Me Everything,Minka Kelly,", , , , , , , , , , , , , , , , , , , , , , , ...",4.31,22102,2973,2128,288,English,Nominee for Best Memoir & Autobiography (2023),Memoir & Autobiography,2023-05-02,Henry Holt and Co.,9781250852069 (ISBN10: 1250852064),"“A timely, urgent portrait of working-class Am..."
276,Rough Sleepers,Tracy Kidder,", , , , , , , , , , , , , , , , , , , , , , , ...",4.35,7360,2142,1083,320,English,Nominee for Best History & Biography (2023),History & Biography,2023-01-17,Random House,9781984801432 (ISBN10: 1984801430),"In Rough Sleepers, Tracy Kidder shows how one ..."


In [None]:
# Strip the leading run of empty entries (commas and spaces) from each value
df.loc[genres_to_fix.index, 'genres'] = genres_to_fix['genres'].str.lstrip(', ')

In [None]:
# Make a dataframe with no number of votes and no duplicated titles
df_no_votes = df.drop(columns='num_of_votes')
to_delete = df_no_votes[df_no_votes.duplicated('title')]['title']

for title in to_delete:
  # 'Powerless' is shared by two different books, so both rows are kept
  if title != 'Powerless':
    index_to_delete = df_no_votes[df_no_votes['title'] == title].index[1]
    df_no_votes = df_no_votes.drop(index_to_delete)

In [None]:
print(f'df_no_votes has {df_no_votes.shape[1]} columns and {df_no_votes.shape[0]} rows')

df_no_votes has 14 columns and 284 rows


In [None]:
# See all the rows of the original dataframe with duplicated titles
duplicated_titles = df[df.duplicated('title')]['title']
for title in duplicated_titles:
  display(df[df['title'] == title])

Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
74,Powerless,Elsie Silver,"Romance, Sports Romance, Contemporary Romance,...",4.09,180965,8486,18735,396,English,Nominee for Best Romance (2023),Romance,2023-02-10,,,Two childhood friends. Two broken hearts. One ...
164,Powerless,Lauren Roberts,"Fantasy, Romance, Young Adult, Fantasy Romance...",4.27,231109,28558,44529,523,English,Nominee for Best Young Adult Fantasy & Science...,Young Adult Fantasy & Science Fiction,2023-01-31,Simon & Schuster Books for Young Readers,9781665954884 (ISBN10: 1665954884),She is the very thing he’s spent his whole lif...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
20,Weyward,Emilia Hart,"Historical Fiction, Fantasy, Fiction, Magical ...",4.08,167621,62211,20083,329,English,"Winner for Best Historical Fiction (2023), Win...","Historical Fiction, Debut Novel",2023-02-02,St. Martin's Press,9781250280800 (ISBN10: 125028080X),"I am a Weyward, and wild inside.\n\n2019: Unde..."
200,Weyward,Emilia Hart,"Historical Fiction, Fantasy, Fiction, Magical ...",4.08,167621,45420,20083,329,English,"Winner for Best Historical Fiction (2023), Win...","Historical Fiction, Debut Novel",2023-02-02,St. Martin's Press,9781250280800 (ISBN10: 125028080X),"I am a Weyward, and wild inside.\n\n2019: Unde..."


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
112,Ink Blood Sister Scribe,Emma Törzs,"Fantasy, Fiction, Magical Realism, Magic, Book...",4.04,37537,6895,5845,416,English,"Nominee for Best Fantasy (2023), Nominee for B...","Fantasy, Debut Novel",2023-05-30,William Morrow,9780063253469 (ISBN10: 0063253461),"In this spellbinding debut novel, two estrange..."
201,Ink Blood Sister Scribe,Emma Törzs,"Fantasy, Fiction, Magical Realism, Magic, Book...",4.04,37537,32924,5845,416,English,"Nominee for Best Fantasy (2023), Nominee for B...","Fantasy, Debut Novel",2023-05-30,William Morrow,9780063253469 (ISBN10: 0063253461),"In this spellbinding debut novel, two estrange..."


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
9,Pineapple Street,Jenny Jackson,"Fiction, Audiobook, Contemporary, Literary Fic...",3.57,164555,17864,15391,304,English,"Nominee for Best Fiction (2023), Nominee for B...","Fiction, Debut Novel",2023-03-07,Pamela Dorman Books,9780593490693 (ISBN10: 059349069X),"Darley, the eldest daughter in the well-connec..."
202,Pineapple Street,Jenny Jackson,"Fiction, Audiobook, Contemporary, Literary Fic...",3.57,164560,29199,15392,304,English,"Nominee for Best Fiction (2023), Nominee for B...","Fiction, Debut Novel",2023-03-07,Pamela Dorman Books,9780593490693 (ISBN10: 059349069X),"Darley, the eldest daughter in the well-connec..."


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
7,Maame,Jessica George,"Fiction, Contemporary, Audiobook, Literary Fic...",4.07,90554,21854,11681,320,English,"Nominee for Best Fiction (2023), Nominee for B...","Fiction, Debut Novel",2023-01-31,St. Martin's Press,,Shortlisted for the TikTok Book Awards in the ...
203,Maame,Jessica George,"Fiction, Contemporary, Audiobook, Literary Fic...",4.07,90554,27680,11681,320,English,"Nominee for Best Fiction (2023), Nominee for B...","Fiction, Debut Novel",2023-01-31,St. Martin's Press,9781250282521 (ISBN10: 1250282527),A Today Show #ReadWithJenna Book Club Pick\n\n...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
146,The September House,Carissa Orlando,"Horror, Thriller, Fiction, Paranormal, Mystery...",3.92,32482,15783,5641,344,English,"Nominee for Best Horror (2023), Nominee for Be...","Horror, Debut Novel",2023-09-05,Berkley,9780593548615 (ISBN10: 0593548612),A woman is determined to stay in her dream hom...
204,The September House,Carissa Orlando,"Horror, Thriller, Fiction, Paranormal, Mystery...",3.92,32482,22346,5641,344,English,"Nominee for Best Horror (2023), Nominee for Be...","Horror, Debut Novel",2023-09-05,Berkley,9780593548615 (ISBN10: 0593548612),A woman is determined to stay in her dream hom...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
29,Looking for Jane,Heather Marshall,"Historical Fiction, Fiction, Historical, Canad...",4.37,43576,19468,5040,400,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2022-03-01,Atria Books,9781668013687 (ISBN10: 1668013681),A debut about three women whose lives are boun...
205,Looking for Jane,Heather Marshall,"Historical Fiction, Fiction, Historical, Canad...",4.37,43576,20045,5040,400,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2022-03-01,Atria Books,9781668013687 (ISBN10: 1668013681),A debut about three women whose lives are boun...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
28,Did You Hear About Kitty Karr?,Crystal Smith Paul,"Historical Fiction, Fiction, Historical, Audio...",3.76,48597,24150,4920,416,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2023-05-02,Henry Holt and Co.,9781250815309 (ISBN10: 1250815304),A multigenerational saga that traverses the gl...
207,Did You Hear About Kitty Karr?,Crystal Smith Paul,"Historical Fiction, Fiction, Historical, Audio...",3.76,48597,19698,4920,416,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2023-05-02,Henry Holt and Co.,9781250815309 (ISBN10: 1250815304),A multigenerational saga that traverses the gl...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
125,Chain-Gang All-Stars,Nana Kwame Adjei-Brenyah,"Fiction, Science Fiction, Dystopia, LGBT, Fant...",4.14,45774,25843,8257,367,English,"Nominee for Best Science Fiction (2023), Nomin...","Science Fiction, Debut Novel",2023-05-02,Pantheon,9780593317334 (ISBN10: 0593317335),Two top women gladiators fight for their freed...
209,Chain-Gang All-Stars,Nana Kwame Adjei-Brenyah,"Fiction, Science Fiction, Dystopia, LGBT, Fant...",4.14,45774,7645,8257,367,English,"Nominee for Best Science Fiction (2023), Nomin...","Science Fiction, Debut Novel",2023-05-02,Pantheon,9780593317334 (ISBN10: 0593317335),Two top women gladiators fight for their freed...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
8,The Collected Regrets of Clover,Mikki Brammer,"Fiction, Contemporary, Romance, Audiobook, Lit...",4.17,64843,18050,9469,314,English,"Nominee for Best Fiction (2023), Nominee for B...","Fiction, Debut Novel",2023-05-09,St Martin's Press,9781250284396 (ISBN10: 1250284392),What’s the point of giving someone a beautiful...
210,The Collected Regrets of Clover,Mikki Brammer,"Fiction, Contemporary, Romance, Audiobook, Lit...",4.17,64843,6979,9469,314,English,"Nominee for Best Fiction (2023), Nominee for B...","Fiction, Debut Novel",2023-05-09,St Martin's Press,9781250284396 (ISBN10: 1250284392),What’s the point of giving someone a beautiful...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
35,"Beyond That, the Sea",Laura Spence-Ash,"Historical Fiction, Fiction, Historical, Roman...",4.25,36380,5444,3665,351,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2023-03-21,Celadon Books,9781250854377 (ISBN10: 1250854377),"A sweeping, tenderhearted love story, Beyond T..."
211,"Beyond That, the Sea",Laura Spence-Ash,"Historical Fiction, Fiction, Historical, Roman...",4.25,36381,6605,3665,351,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2023-03-21,Celadon Books,9781250854377 (ISBN10: 1250854377),"A sweeping, tenderhearted love story, Beyond T..."


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
11,Shark Heart: A Love Story,Emily Habeck,"Fiction, Magical Realism, Fantasy, Romance, Au...",4.01,34916,6218,7509,416,English,"Nominee for Best Fiction (2023), Nominee for B...","Fiction, Debut Novel",2023-08-08,S&S/ Marysue Rucci Books,9781668006498 (ISBN10: 1668006499),"For Lewis and Wren, their first year of marria..."
212,Shark Heart: A Love Story,Emily Habeck,"Fiction, Magical Realism, Fantasy, Romance, Au...",4.01,34918,6041,7509,416,English,"Nominee for Best Fiction (2023), Nominee for B...","Fiction, Debut Novel",2023-08-08,S&S/ Marysue Rucci Books,9781668006498 (ISBN10: 1668006499),"For Lewis and Wren, their first year of marria..."


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
36,River Sing Me Home,Eleanor Shearer,"Historical Fiction, Fiction, Historical, Liter...",4.01,23727,4948,2981,322,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2023-01-19,Berkley,9780593548042 (ISBN10: 0593548043),Her search begins with an ending....\n\nThe ma...
214,River Sing Me Home,Eleanor Shearer,"Historical Fiction, Fiction, Historical, Liter...",4.01,23727,4095,2981,322,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2023-01-19,Berkley,9780593548042 (ISBN10: 0593548043),Her search begins with an ending....\n\nThe ma...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
37,Banyan Moon,Thao Thai,"Fiction, Historical Fiction, Contemporary, Lit...",3.92,23864,3762,2498,336,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2023-06-27,Mariner Books,9780063267107 (ISBN10: 0063267101),When Ann Tran gets the call that her fiercely ...
215,Banyan Moon,Thao Thai,"Fiction, Historical Fiction, Contemporary, Lit...",3.92,23864,3597,2498,336,English,"Nominee for Best Historical Fiction (2023), No...","Historical Fiction, Debut Novel",2023-06-27,Mariner Books,9780063267107 (ISBN10: 0063267101),When Ann Tran gets the call that her fiercely ...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
156,Monstrilio,Gerardo Sámano Córdova,"Horror, Fiction, Fantasy, Magical Realism, LGB...",4.15,12314,2595,2723,336,English,"Nominee for Best Horror (2023), Nominee for Be...","Horror, Debut Novel",2023-03-07,Zando,9781638930365 (ISBN10: 1638930368),A literary horror debut about a boy who transf...
216,Monstrilio,Gerardo Sámano Córdova,"Horror, Fiction, Fantasy, Magical Realism, LGB...",4.15,12314,3391,2723,336,English,"Nominee for Best Horror (2023), Nominee for Be...","Horror, Debut Novel",2023-03-07,Zando,9781638930365 (ISBN10: 1638930368),A literary horror debut about a boy who transf...


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
131,I Keep My Exoskeletons to Myself,Marisa Crane,"Fiction, Science Fiction, LGBT, Queer, Dystopi...",3.95,5438,3739,1060,352,English,"Nominee for Best Science Fiction (2023), Nomin...","Science Fiction, Debut Novel",2023-01-17,Catapult,9781646221295 (ISBN10: 164622129X),"In a United States not so unlike our own, the ..."
217,I Keep My Exoskeletons to Myself,Marisa Crane,"Fiction, Science Fiction, LGBT, Queer, Dystopi...",3.95,5438,3165,1060,352,English,"Nominee for Best Science Fiction (2023), Nomin...","Science Fiction, Debut Novel",2023-01-17,Catapult,9781646221295 (ISBN10: 164622129X),"In a United States not so unlike our own, the ..."


In [None]:
# Upon inspecting the indices of the duplicates above and checking the
# Goodreads website, the number of votes in the row with the lower index
# applies to the first award recorded under the awards column, and the number
# of votes in the row with the higher index applies to the second award.

# Separate the awards and category according to the number of votes
for title in duplicated_titles:
  # 'Powerless' is two different books with one award each, so skip it
  if title != 'Powerless':
    lower_index, higher_index = df[df['title'] == title].index[:2]

    award_list = df.at[lower_index, 'awards'].split(', ')
    category_list = df.at[lower_index, 'category'].split(', ')

    df.at[lower_index, 'awards'] = award_list[0]
    df.at[lower_index, 'category'] = category_list[0]

    df.at[higher_index, 'awards'] = award_list[1]
    df.at[higher_index, 'category'] = category_list[1]
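
The same rule (first occurrence of a title takes the first award, second occurrence takes the second) can be sketched without special-casing any title, assuming every book with two awards appears exactly twice. The frame `toy` is made up for illustration; only the `Maame` vote counts come from the data above.

```python
import pandas as pd

# Toy frame: 'Maame' appears twice with the same two awards, 'Other' once
toy = pd.DataFrame({
    'title': ['Maame', 'Other', 'Maame'],
    'num_of_votes': [21854, 100, 27680],
    'awards': [
        'Nominee for Best Fiction (2023), Nominee for Best Debut Novel (2023)',
        'Nominee for Best Humor (2023)',
        'Nominee for Best Fiction (2023), Nominee for Best Debut Novel (2023)',
    ],
})

# Position of each row among rows sharing its title: 0 = first, 1 = second
occurrence = toy.groupby('title').cumcount()
award_lists = toy['awards'].str.split(', ')

# Pick the award at that position; rows with a single award keep it
toy['awards'] = [
    awards[min(position, len(awards) - 1)]
    for awards, position in zip(award_lists, occurrence)
]
```

Clamping the position with `min` is what lets two different books that share a title, each with one award, pass through unchanged.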

## Save CSV Files

In [None]:
df_no_votes.to_csv('goodreads_best_books_2023.csv', index=False, encoding='utf-8')
df.to_csv('goodreads_2023_awards.csv', index=False, encoding='utf-8')
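
CSV does not store column types, so `publication_date` comes back as text unless it is parsed again on load. The sketch below shows the round trip on a one-row toy frame written to an in-memory buffer instead of a file.

```python
import io
import pandas as pd

# One-row toy frame with a datetime column, like publication_date above
toy = pd.DataFrame({
    'title': ['Yellowface'],
    'publication_date': pd.to_datetime(['2023-05-16']),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype when reading the CSV back
reloaded = pd.read_csv(buffer, parse_dates=['publication_date'])
```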

In [None]:
# Split the genres
df['genres'] = df['genres'].str.split(', ')
df_genre = df.explode('genres').reset_index(drop=True)

print(f'df_genre has {df_genre.shape[1]} columns and {df_genre.shape[0]} rows.')
df_genre.head()

df_genre has 15 columns and 2093 rows.


Unnamed: 0,title,authors,genres,rating,num_of_ratings,num_of_votes,num_of_reviews,num_of_pages,language,awards,category,publication_date,publisher,isbn,description
0,Yellowface,R.F. Kuang,Fiction,3.79,496159,200722,67532,336,English,Winner for Best Fiction (2023),Fiction,2023-05-16,William Morrow,,White lies. Dark humor. Deadly consequences… B...
1,Yellowface,R.F. Kuang,Contemporary,3.79,496159,200722,67532,336,English,Winner for Best Fiction (2023),Fiction,2023-05-16,William Morrow,,White lies. Dark humor. Deadly consequences… B...
2,Yellowface,R.F. Kuang,Audiobook,3.79,496159,200722,67532,336,English,Winner for Best Fiction (2023),Fiction,2023-05-16,William Morrow,,White lies. Dark humor. Deadly consequences… B...
3,Yellowface,R.F. Kuang,Literary Fiction,3.79,496159,200722,67532,336,English,Winner for Best Fiction (2023),Fiction,2023-05-16,William Morrow,,White lies. Dark humor. Deadly consequences… B...
4,Yellowface,R.F. Kuang,Thriller,3.79,496159,200722,67532,336,English,Winner for Best Fiction (2023),Fiction,2023-05-16,William Morrow,,White lies. Dark humor. Deadly consequences… B...


In [None]:
df_genre.to_csv('goodreads_2023_genres.csv', index=False, encoding='utf-8')
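
One use of the exploded frame is counting books per genre. The sketch below uses a two-book toy frame with shortened genre lists. It counts distinct titles so that a book listed under two award categories is not counted twice.

```python
import pandas as pd

# Toy frame with shortened, illustrative genre lists
toy = pd.DataFrame({
    'title': ['Yellowface', 'Tom Lake'],
    'genres': ['Fiction, Contemporary, Thriller', 'Fiction, Romance'],
})

# Same split-and-explode step as above: one row per (book, genre) pair
toy['genres'] = toy['genres'].str.split(', ')
toy_genre = toy.explode('genres').reset_index(drop=True)

# Number of distinct titles tagged with each genre
books_per_genre = (
    toy_genre.groupby('genres')['title']
    .nunique()
    .sort_values(ascending=False)
)
```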

In [None]:
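# level starts empty; object dtype lets it hold strings such as '1.2' later
data = {'level': np.nan,
        'ids': df['title'].tolist(),
        'labels': df['title'].tolist(),
        'parents': df['category'].tolist()}
df_category = pd.DataFrame(data).astype({'level': 'object'})
df_category.head()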

Unnamed: 0,level,ids,labels,parents
0,,Yellowface,Yellowface,Fiction
1,,Hello Beautiful,Hello Beautiful,Fiction
2,,The Wishing Game,The Wishing Game,Fiction
3,,Tom Lake,Tom Lake,Fiction
4,,The Five-Star Weekend,The Five-Star Weekend,Fiction


In [None]:
# Append the vote count to each label
df_category['labels'] = df_category['labels'] + ' - ' + df['num_of_votes'].astype(str) + ' votes'

In [None]:
categories = df['category'].unique().tolist()
for category in categories:
  df_category.loc[-1] = [np.nan, category, category, np.nan]
  df_category.index = df_category.index + 1
  df_category.sort_index(inplace=True)

In [None]:
for i in range(len(categories)):
  level = str(i + 1)
  df_category.at[i, 'level'] = level

In [None]:
# Number each book within its category, e.g. '3.1', '3.2', ...
n_categories = len(categories)
labels = df_category['labels'][:n_categories].tolist()
levels = df_category['level'][:n_categories].tolist()
for category, category_level in zip(labels, levels):
  count = 1
  for i in range(n_categories, len(df_category)):
    if df_category.at[i, 'parents'] == category:
      df_category.at[i, 'level'] = f'{category_level}.{count}'
      count += 1
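
Before exporting a parent/child table like this, it can help to check that the hierarchy is consistent. The sketch below runs two such checks on a three-row toy table with the same columns. The assumption is that whatever consumes the table expects every `parents` value to match an existing `ids` value and expects `ids` to be unique.

```python
import numpy as np
import pandas as pd

# Toy table with the same columns as df_category: one root and two children
toy_category = pd.DataFrame({
    'level': ['1', '1.1', '1.2'],
    'ids': ['Fiction', 'Yellowface', 'Hello Beautiful'],
    'labels': ['Fiction', 'Yellowface - 200722 votes', 'Hello Beautiful - 60171 votes'],
    'parents': [np.nan, 'Fiction', 'Fiction'],
})

# Root rows have no parent; every other row must point at an existing id
children = toy_category[toy_category['parents'].notna()]
orphans = children[~children['parents'].isin(toy_category['ids'])]

# Repeated ids (for example, a title listed under two categories) would
# make the hierarchy ambiguous
has_duplicate_ids = toy_category['ids'].duplicated().any()
```

On the real table, titles that appear under two award categories would show up as duplicate `ids`, so this check is worth running before the data is used downstream.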

In [None]:
df_category.to_csv('goodreads_2023_category.csv', index=False, encoding='utf-8')