# Goodreads: Science Fiction Books by Female Authors (Scraping to a CSV)

https://www.goodreads.com/list/show/6934.Science_Fiction_Books_by_Female_Authors

Scrape the fields below and save them as a CSV file.

|Field|Example|
|---|---|
|Rank|1|
|Title|The Handmaid's Tale|
|Author|Margaret Atwood|
|Score|score: 30,733|
|Votes|314 people voted|
|Rating|4.09 avg rating — 1,101,120 ratings|

The scraping itself is straightforward; the main difficulty is cleaning the data. Split the scraped strings into separate columns, tidying the existing ones and creating new ones like so:

|Before|After|
|---|---|
|A Wrinkle in Time (Time Quintet, #1)|A Wrinkle in Time|
|_Series_|Time Quintet|
|_Number in series_|1|
|score: 30,733|30733|
|4.09 avg rating — 1,101,120 ratings|4.09|
|_Number of ratings_|1101120|
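
The title rows of the table above can be split with a single regular expression. The sketch below uses only the standard library. The helper name `split_title` and the pattern are illustrative assumptions rather than part of the notebook, and the pattern only covers the `(Series, #N)` and `(Series #N)` shapes that appear on this list.

```python
import re

# Hypothetical pattern: a title, then an optional "(Series, #N)" or "(Series #N)" suffix.
TITLE_RE = re.compile(
    r"^(?P<title>[^(]*?)\s*"
    r"(?:\((?P<series>[^()#]*?),?\s*#(?P<number>\d+)\))?\s*$"
)


def split_title(raw):
    """Return (title, series, number_in_series); the last two are None if absent."""
    match = TITLE_RE.match(raw.strip())
    if match is None:
        # Unrecognised shape: keep the text untouched rather than guess.
        return raw.strip(), None, None
    number = match.group("number")
    return (
        match.group("title"),
        match.group("series"),
        int(number) if number is not None else None,
    )


print(split_title("A Wrinkle in Time (Time Quintet, #1)"))
print(split_title("The Left Hand of Darkness (Hainish Cycle #4)"))
print(split_title("Frankenstein"))
```

Using `\d+` rather than a single `\d` matters for entries such as `Memory (Vorkosigan Saga, #10)`, where the position in the series has two digits.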

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
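response = requests.get('https://www.goodreads.com/list/show/6934.Science_Fiction_Books_by_Female_Authors')
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
doc = BeautifulSoup(response.text, 'html.parser')  # name the parser explicitly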

In [3]:
ranks = doc.find_all(class_="number")
for rank in ranks:
    print(rank.text.strip())

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
89
91
92
93
94
95
96
97
98
99
100


In [4]:
titles = doc.find_all(class_="bookTitle")
for title in titles:
    print(title.text.strip())

The Handmaid's Tale
The Hunger Games (The Hunger Games, #1)
Frankenstein
A Wrinkle in Time (Time Quintet, #1)
The Left Hand of Darkness (Hainish Cycle #4)
Divergent (Divergent, #1)
Catching Fire (The Hunger Games, #2)
The Giver (The Giver, #1)
Kindred
The Dispossessed (Hainish Cycle #6)
Oryx and Crake (MaddAddam, #1)
Mockingjay (The Hunger Games, #3)
The Time Traveler's Wife
Doomsday Book (Oxford Time Travel, #1)
The Lathe of Heaven
Ancillary Justice (Imperial Radch #1)
Shards of Honour  (Vorkosigan Saga, #1)
To Say Nothing of the Dog (Oxford Time Travel, #2)
The Sparrow (The Sparrow, #1)
Dragonflight (Dragonriders of Pern, #1)
Parable of the Sower (Earthseed, #1)
The Warrior's Apprentice (Vorkosigan Saga, #2)
Barrayar (Vorkosigan Saga, #7)
The Host (The Host, #1)
Insurgent (Divergent, #2)
Memory (Vorkosigan Saga, #10)
The Children of Men
Starshine: Aurora Rising Book One (Aurora Rhapsody, #1)
The Year of the Flood  (MaddAddam, #2)
Crystal Singer (Crystal Singer, #1)
Mirror Dance (Vork

In [5]:
authors = doc.find_all(class_="authorName")
for author in authors:
    print(author.text.strip())

Margaret Atwood
Suzanne Collins
Mary Wollstonecraft Shelley
Madeleine L'Engle
Ursula K. Le Guin
Veronica Roth
Suzanne Collins
Lois Lowry
Octavia E. Butler
Ursula K. Le Guin
Margaret Atwood
Suzanne Collins
Audrey Niffenegger
Connie Willis
Ursula K. Le Guin
Ann Leckie
Lois McMaster Bujold
Connie Willis
Mary Doria Russell
Anne McCaffrey
Octavia E. Butler
Lois McMaster Bujold
Lois McMaster Bujold
Stephenie Meyer
Veronica Roth
Lois McMaster Bujold
P.D. James
G.S. Jennsen
Margaret Atwood
Anne McCaffrey
Lois McMaster Bujold
Octavia E. Butler
Lois McMaster Bujold
Octavia E. Butler
Madeleine L'Engle
Lois McMaster Bujold
Lois McMaster Bujold
Lois McMaster Bujold
Joan D. Vinge
Marissa Meyer
C.J. Cherryh
Anne McCaffrey
Deborah O'Neill Cordes
Anne McCaffrey
Lois McMaster Bujold
Lois McMaster Bujold
Madeleine L'Engle
Octavia E. Butler
Lois McMaster Bujold
C.J. Cherryh
Kate Wilhelm
Nancy Kress
James Tiptree Jr.
Tanith Lee
Lois McMaster Bujold
C.J. Cherryh
Emily St. John Mandel
C.J. Cherryh
Lois McMas

In [6]:
scores = doc.find_all(class_="smallText uitext")
for block in scores:
    # The first link in each block holds the score text
    print(block.find('a').text.strip())

score: 30,733
score: 28,553
score: 21,909
score: 18,720
score: 17,920
score: 13,326
score: 12,749
score: 12,399
score: 11,070
score: 10,731
score: 10,117
score: 9,807
score: 9,193
score: 8,935
score: 7,775
score: 7,336
score: 7,017
score: 6,890
score: 6,873
score: 6,802
score: 6,653
score: 6,087
score: 5,623
score: 5,274
score: 4,899
score: 4,682
score: 4,484
score: 4,394
score: 4,343
score: 4,235
score: 4,120
score: 4,108
score: 3,895
score: 3,842
score: 3,700
score: 3,683
score: 3,653
score: 3,618
score: 3,535
score: 3,440
score: 3,381
score: 3,361
score: 3,340
score: 3,092
score: 3,054
score: 2,904
score: 2,831
score: 2,755
score: 2,579
score: 2,547
score: 2,535
score: 2,526
score: 2,524
score: 2,504
score: 2,498
score: 2,469
score: 2,450
score: 2,316
score: 2,306
score: 2,203
score: 2,199
score: 2,191
score: 2,128
score: 2,038
score: 2,013
score: 2,006
score: 1,931
score: 1,914
score: 1,849
score: 1,819
score: 1,748
score: 1,698
score: 1,684
score: 1,683
score: 1,659
score: 1,572
s

In [7]:
scores[0].find_all('a')[1].text

'314 people voted'

In [8]:
# Votes: the second link in each block
for score in scores:
    print(score.find_all('a')[1].text)

314 people voted
292 people voted
224 people voted
196 people voted
184 people voted
138 people voted
133 people voted
129 people voted
116 people voted
112 people voted
107 people voted
104 people voted
98 people voted
96 people voted
84 people voted
81 people voted
75 people voted
74 people voted
72 people voted
74 people voted
71 people voted
65 people voted
60 people voted
56 people voted
54 people voted
51 people voted
49 people voted
45 people voted
47 people voted
47 people voted
47 people voted
47 people voted
46 people voted
43 people voted
42 people voted
43 people voted
43 people voted
43 people voted
43 people voted
38 people voted
40 people voted
39 people voted
34 people voted
36 people voted
37 people voted
36 people voted
32 people voted
32 people voted
31 people voted
30 people voted
30 people voted
30 people voted
29 people voted
29 people voted
31 people voted
31 people voted
28 people voted
30 people voted
29 people voted
27 people voted
22 people voted
27 people vo
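
The vote strings are never converted to numbers in this notebook. A minimal sketch of one way to do it follows. The helper `parse_votes` is a hypothetical name, and it assumes every string starts with the count, as in the output above.

```python
import re


def parse_votes(text):
    """Return the leading integer of a string such as '314 people voted'."""
    match = re.match(r"\s*(\d[\d,]*)", text)
    if match is None:
        raise ValueError(f"no vote count found in {text!r}")
    # Drop thousands separators in case a count ever exceeds 999.
    return int(match.group(1).replace(",", ""))


print(parse_votes("314 people voted"))
```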

In [9]:
rates = doc.find_all(class_="greyText")
for rate in rates:
    print(rate.text.strip())

(Goodreads Author)
4.09 avg rating — 1,105,358 ratings
and
Rate this book
4.33 avg rating — 5,743,136 ratings
and
Rate this book
3.78 avg rating — 1,024,143 ratings
and
Rate this book
4.01 avg rating — 904,957 ratings
and
Rate this book
4.06 avg rating — 98,909 ratings
and
Rate this book
(Goodreads Author)
4.21 avg rating — 2,604,504 ratings
and
Rate this book
4.29 avg rating — 2,201,771 ratings
and
Rate this book
(Goodreads Author)
4.12 avg rating — 1,536,227 ratings
and
Rate this book
4.23 avg rating — 72,456 ratings
and
Rate this book
4.21 avg rating — 71,722 ratings
and
Rate this book
(Goodreads Author)
4.01 avg rating — 197,962 ratings
and
Rate this book
4.03 avg rating — 2,072,962 ratings
and
Rate this book
(Goodreads Author)
3.96 avg rating — 1,447,313 ratings
and
Rate this book
4.03 avg rating — 43,277 ratings
and
Rate this book
4.11 avg rating — 44,029 ratings
and
Rate this book
(Goodreads Author)
3.98 avg rating — 66,308 ratings
and
Rate this book
(Goodreads Author)
4.11 avg 
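
The `greyText` class also matches unrelated elements, which is why the output above is interleaved with `(Goodreads Author)`, `and`, and `Rate this book`. One way to keep only the rating lines is to filter on their shape. The sketch below works on plain strings copied from that output, so it runs without a network request. `RATING_RE` and `parse_rating` are illustrative names, not part of the notebook.

```python
import re

# Matches e.g. "4.09 avg rating — 1,105,358 ratings"
RATING_RE = re.compile(r"^(\d\.\d\d) avg rating — ([\d,]+) ratings?$")


def parse_rating(text):
    """Return (average, count) for a rating line, or None for anything else."""
    match = RATING_RE.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2).replace(",", ""))


texts = [
    "(Goodreads Author)",
    "4.09 avg rating — 1,105,358 ratings",
    "and",
    "Rate this book",
    "4.33 avg rating — 5,743,136 ratings",
]
ratings = [parsed for parsed in map(parse_rating, texts) if parsed is not None]
print(ratings)
```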

In [10]:
rows = []
reads = doc.find_all('tr')

for read in reads:
    row = {}
    try:
        row['Rank'] = read.find(class_='number').text.strip()
        row['Book_Title'] = read.find(class_='bookTitle').text.strip()
        row['Author_Name'] = read.find(class_='authorName').text.strip()
        row['Score'] = read.select('a[onclick]')[0].text
        row['Votes'] = read.select('a[id]')[0].text
        row['Rating'] = read.find(class_='minirating').text.strip()
    except (AttributeError, IndexError):
        # find() returned None or select() matched nothing:
        # this <tr> is not a complete book row, so skip it
        continue
    rows.append(row)


df = pd.DataFrame(rows)
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings","score: 30,733",314 people voted
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings","score: 28,553",292 people voted
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings","score: 21,909",224 people voted
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings","score: 18,720",196 people voted
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings","score: 17,920",184 people voted
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings","score: 13,326",138 people voted
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings","score: 12,749",133 people voted
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings","score: 12,399",129 people voted
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings","score: 11,070",116 people voted
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings","score: 10,731",112 people voted
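
An alternative to wrapping the whole row in `try`/`except` is to look up each field separately and record `None` when it is missing, so one absent element does not hide the rest of the row. The sketch below illustrates the idea on a made-up HTML fragment that only mimics the class names used above. The fragment and the helper `text_of` are assumptions for illustration, not the real page structure.

```python
from bs4 import BeautifulSoup

# Made-up fragment reusing the class names from the cells above;
# the second row deliberately has no rating element.
html = """
<table>
  <tr><td class="number">1</td>
      <td><a class="bookTitle">The Handmaid's Tale</a>
          <a class="authorName">Margaret Atwood</a>
          <span class="minirating">4.09 avg rating — 1,105,358 ratings</span></td></tr>
  <tr><td class="number">2</td>
      <td><a class="bookTitle">Frankenstein</a>
          <a class="authorName">Mary Wollstonecraft Shelley</a></td></tr>
</table>
"""


def text_of(parent, css_class):
    """Return the stripped text of the first matching element, or None."""
    element = parent.find(class_=css_class)
    return element.get_text(strip=True) if element is not None else None


soup = BeautifulSoup(html, "html.parser")
rows = [
    {
        "Rank": text_of(tr, "number"),
        "Book_Title": text_of(tr, "bookTitle"),
        "Author_Name": text_of(tr, "authorName"),
        "Rating": text_of(tr, "minirating"),
    }
    for tr in soup.find_all("tr")
]
print(rows[1])
```

Missing values then show up as `None` in the resulting DataFrame, where they can be inspected instead of silently dropped.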


In [11]:
reads[0].select('a')[0].text

'\n\n'

In [12]:
reads[0].select('a[onclick]')[0].text

'score: 30,733'

In [13]:
df.to_csv('GoodReads.csv', index=False)

In [14]:
df.shape

(100, 6)

In [15]:
df.Score.head()

0    score: 30,733
1    score: 28,553
2    score: 21,909
3    score: 18,720
4    score: 17,920
Name: Score, dtype: object

In [16]:
len('score:')

6

In [17]:
# 'score: ' is 7 characters (the 6 counted above plus the space), so slice from index 7
df['score_clean'] = df.Score.str[7:].str.replace(',', '').astype(int)
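
Slicing at a fixed offset works as long as every value starts with exactly `score: `. A sketch of an alternative that extracts the digits by pattern instead is shown below on a small hand-built Series rather than the scraped DataFrame. The variable names are illustrative.

```python
import pandas as pd

raw = pd.Series(["score: 30,733", "score: 28,553", "score: 985"])

# Capture the run of digits and commas, wherever it sits in the string,
# then remove the thousands separators before converting.
score = (
    raw.str.extract(r"(\d[\d,]*)", expand=False)
       .str.replace(",", "", regex=False)
       .astype(int)
)
print(score.tolist())
```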

In [18]:
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,score_clean
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings","score: 30,733",314 people voted,30733
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings","score: 28,553",292 people voted,28553
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings","score: 21,909",224 people voted,21909
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings","score: 18,720",196 people voted,18720
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings","score: 17,920",184 people voted,17920
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings","score: 13,326",138 people voted,13326
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings","score: 12,749",133 people voted,12749
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings","score: 12,399",129 people voted,12399
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings","score: 11,070",116 people voted,11070
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings","score: 10,731",112 people voted,10731


In [19]:
df['score_clean'].mean()

4623.45

In [20]:
# Raw string so that \d and \. reach the regex engine unchanged
df['rating_clean'] = df.Rating.str.extract(r"(\d\.\d\d) ", expand=False)

In [21]:
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,score_clean,rating_clean
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings","score: 30,733",314 people voted,30733,4.09
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings","score: 28,553",292 people voted,28553,4.33
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings","score: 21,909",224 people voted,21909,3.78
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings","score: 18,720",196 people voted,18720,4.01
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings","score: 17,920",184 people voted,17920,4.06
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings","score: 13,326",138 people voted,13326,4.21
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings","score: 12,749",133 people voted,12749,4.29
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings","score: 12,399",129 people voted,12399,4.12
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings","score: 11,070",116 people voted,11070,4.23
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings","score: 10,731",112 people voted,10731,4.21


In [22]:
df.dtypes

Author_Name     object
Book_Title      object
Rank            object
Rating          object
Score           object
Votes           object
score_clean      int64
rating_clean    object
dtype: object

In [23]:
df['rating_clean'] = df.rating_clean.astype(float)

In [24]:
df.dtypes

Author_Name      object
Book_Title       object
Rank             object
Rating           object
Score            object
Votes            object
score_clean       int64
rating_clean    float64
dtype: object

In [25]:
df['number_of_ratings'] = df.Rating.str.extract(r"— (\d.*\d)", expand=False)
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,score_clean,rating_clean,number_of_ratings
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings","score: 30,733",314 people voted,30733,4.09,1105358
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings","score: 28,553",292 people voted,28553,4.33,5743136
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings","score: 21,909",224 people voted,21909,3.78,1024143
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings","score: 18,720",196 people voted,18720,4.01,904957
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings","score: 17,920",184 people voted,17920,4.06,98909
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings","score: 13,326",138 people voted,13326,4.21,2604504
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings","score: 12,749",133 people voted,12749,4.29,2201771
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings","score: 12,399",129 people voted,12399,4.12,1536227
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings","score: 11,070",116 people voted,11070,4.23,72456
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings","score: 10,731",112 people voted,10731,4.21,71722


In [26]:
df['number_of_ratings'] = df.number_of_ratings.str.replace(',', '').astype(int)
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,score_clean,rating_clean,number_of_ratings
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings","score: 30,733",314 people voted,30733,4.09,1105358
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings","score: 28,553",292 people voted,28553,4.33,5743136
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings","score: 21,909",224 people voted,21909,3.78,1024143
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings","score: 18,720",196 people voted,18720,4.01,904957
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings","score: 17,920",184 people voted,17920,4.06,98909
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings","score: 13,326",138 people voted,13326,4.21,2604504
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings","score: 12,749",133 people voted,12749,4.29,2201771
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings","score: 12,399",129 people voted,12399,4.12,1536227
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings","score: 11,070",116 people voted,11070,4.23,72456
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings","score: 10,731",112 people voted,10731,4.21,71722


In [27]:
df.dtypes

Author_Name           object
Book_Title            object
Rank                  object
Rating                object
Score                 object
Votes                 object
score_clean            int64
rating_clean         float64
number_of_ratings      int64
dtype: object

In [28]:
# Everything before the first "(", minus the space that precedes it
df['Book_Title_clean'] = df.Book_Title.str.extract(r"(^[^\(]*)", expand=False).str.strip()
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,score_clean,rating_clean,number_of_ratings,Book_Title_clean
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings","score: 30,733",314 people voted,30733,4.09,1105358,The Handmaid's Tale
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings","score: 28,553",292 people voted,28553,4.33,5743136,The Hunger Games
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings","score: 21,909",224 people voted,21909,3.78,1024143,Frankenstein
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings","score: 18,720",196 people voted,18720,4.01,904957,A Wrinkle in Time
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings","score: 17,920",184 people voted,17920,4.06,98909,The Left Hand of Darkness
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings","score: 13,326",138 people voted,13326,4.21,2604504,Divergent
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings","score: 12,749",133 people voted,12749,4.29,2201771,Catching Fire
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings","score: 12,399",129 people voted,12399,4.12,1536227,The Giver
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings","score: 11,070",116 people voted,11070,4.23,72456,Kindred
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings","score: 10,731",112 people voted,10731,4.21,71722,The Dispossessed


In [29]:
df['series'] = df.Book_Title.str.extract(r"\(([^\)]+)\)", expand=False)
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,score_clean,rating_clean,number_of_ratings,Book_Title_clean,series
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings","score: 30,733",314 people voted,30733,4.09,1105358,The Handmaid's Tale,
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings","score: 28,553",292 people voted,28553,4.33,5743136,The Hunger Games,"The Hunger Games, #1"
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings","score: 21,909",224 people voted,21909,3.78,1024143,Frankenstein,
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings","score: 18,720",196 people voted,18720,4.01,904957,A Wrinkle in Time,"Time Quintet, #1"
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings","score: 17,920",184 people voted,17920,4.06,98909,The Left Hand of Darkness,Hainish Cycle #4
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings","score: 13,326",138 people voted,13326,4.21,2604504,Divergent,"Divergent, #1"
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings","score: 12,749",133 people voted,12749,4.29,2201771,Catching Fire,"The Hunger Games, #2"
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings","score: 12,399",129 people voted,12399,4.12,1536227,The Giver,"The Giver, #1"
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings","score: 11,070",116 people voted,11070,4.23,72456,Kindred,
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings","score: 10,731",112 people voted,10731,4.21,71722,The Dispossessed,Hainish Cycle #6


In [30]:
# \d+ so that two-digit positions such as "#10" are captured whole
df['number_of_series'] = df.Book_Title.str.extract(r"#(\d+)", expand=False)
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,score_clean,rating_clean,number_of_ratings,Book_Title_clean,series,number_of_series
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings","score: 30,733",314 people voted,30733,4.09,1105358,The Handmaid's Tale,,
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings","score: 28,553",292 people voted,28553,4.33,5743136,The Hunger Games,"The Hunger Games, #1",1
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings","score: 21,909",224 people voted,21909,3.78,1024143,Frankenstein,,
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings","score: 18,720",196 people voted,18720,4.01,904957,A Wrinkle in Time,"Time Quintet, #1",1
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings","score: 17,920",184 people voted,17920,4.06,98909,The Left Hand of Darkness,Hainish Cycle #4,4
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings","score: 13,326",138 people voted,13326,4.21,2604504,Divergent,"Divergent, #1",1
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings","score: 12,749",133 people voted,12749,4.29,2201771,Catching Fire,"The Hunger Games, #2",2
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings","score: 12,399",129 people voted,12399,4.12,1536227,The Giver,"The Giver, #1",1
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings","score: 11,070",116 people voted,11070,4.23,72456,Kindred,,
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings","score: 10,731",112 people voted,10731,4.21,71722,The Dispossessed,Hainish Cycle #6,6


In [31]:
df.dtypes

Author_Name           object
Book_Title            object
Rank                  object
Rating                object
Score                 object
Votes                 object
score_clean            int64
rating_clean         float64
number_of_ratings      int64
Book_Title_clean      object
series                object
number_of_series      object
dtype: object
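
`number_of_series` is still `object` here because the regex returns strings and books outside a series have no match. A possible follow-up, assuming a pandas version that supports the nullable `Int64` extension dtype, is sketched below on a stand-in Series rather than the real column.

```python
import pandas as pd

# Stand-in for df['number_of_series']: strings, with None where there is no series.
number_in_series = pd.Series(["1", None, "4", "10"])

# to_numeric yields floats (NaN for the gap); Int64 then stores whole numbers plus <NA>.
as_int = pd.to_numeric(number_in_series).astype("Int64")
print(as_int.dtype)
print(as_int.tolist())
```

Keeping the column as integers avoids values such as `1.0` when the table is later written to CSV.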

In [32]:
df['Score'] = df.score_clean
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,score_clean,rating_clean,number_of_ratings,Book_Title_clean,series,number_of_series
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings",30733,314 people voted,30733,4.09,1105358,The Handmaid's Tale,,
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings",28553,292 people voted,28553,4.33,5743136,The Hunger Games,"The Hunger Games, #1",1
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings",21909,224 people voted,21909,3.78,1024143,Frankenstein,,
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings",18720,196 people voted,18720,4.01,904957,A Wrinkle in Time,"Time Quintet, #1",1
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings",17920,184 people voted,17920,4.06,98909,The Left Hand of Darkness,Hainish Cycle #4,4
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings",13326,138 people voted,13326,4.21,2604504,Divergent,"Divergent, #1",1
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings",12749,133 people voted,12749,4.29,2201771,Catching Fire,"The Hunger Games, #2",2
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings",12399,129 people voted,12399,4.12,1536227,The Giver,"The Giver, #1",1
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings",11070,116 people voted,11070,4.23,72456,Kindred,,
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings",10731,112 people voted,10731,4.21,71722,The Dispossessed,Hainish Cycle #6,6


In [33]:
df = df.drop(columns=['score_clean'])
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,rating_clean,number_of_ratings,Book_Title_clean,series,number_of_series
0,Margaret Atwood,The Handmaid's Tale,1,"4.09 avg rating — 1,105,358 ratings",30733,314 people voted,4.09,1105358,The Handmaid's Tale,,
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,"4.33 avg rating — 5,743,136 ratings",28553,292 people voted,4.33,5743136,The Hunger Games,"The Hunger Games, #1",1
2,Mary Wollstonecraft Shelley,Frankenstein,3,"3.78 avg rating — 1,024,143 ratings",21909,224 people voted,3.78,1024143,Frankenstein,,
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,"4.01 avg rating — 904,957 ratings",18720,196 people voted,4.01,904957,A Wrinkle in Time,"Time Quintet, #1",1
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,"4.06 avg rating — 98,909 ratings",17920,184 people voted,4.06,98909,The Left Hand of Darkness,Hainish Cycle #4,4
5,Veronica Roth,"Divergent (Divergent, #1)",6,"4.21 avg rating — 2,604,504 ratings",13326,138 people voted,4.21,2604504,Divergent,"Divergent, #1",1
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,"4.29 avg rating — 2,201,771 ratings",12749,133 people voted,4.29,2201771,Catching Fire,"The Hunger Games, #2",2
7,Lois Lowry,"The Giver (The Giver, #1)",8,"4.12 avg rating — 1,536,227 ratings",12399,129 people voted,4.12,1536227,The Giver,"The Giver, #1",1
8,Octavia E. Butler,Kindred,9,"4.23 avg rating — 72,456 ratings",11070,116 people voted,4.23,72456,Kindred,,
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,"4.21 avg rating — 71,722 ratings",10731,112 people voted,4.21,71722,The Dispossessed,Hainish Cycle #6,6


In [34]:
df['Rating'] = df.rating_clean
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,rating_clean,number_of_ratings,Book_Title_clean,series,number_of_series
0,Margaret Atwood,The Handmaid's Tale,1,4.09,30733,314 people voted,4.09,1105358,The Handmaid's Tale,,
1,Suzanne Collins,"The Hunger Games (The Hunger Games, #1)",2,4.33,28553,292 people voted,4.33,5743136,The Hunger Games,"The Hunger Games, #1",1
2,Mary Wollstonecraft Shelley,Frankenstein,3,3.78,21909,224 people voted,3.78,1024143,Frankenstein,,
3,Madeleine L'Engle,"A Wrinkle in Time (Time Quintet, #1)",4,4.01,18720,196 people voted,4.01,904957,A Wrinkle in Time,"Time Quintet, #1",1
4,Ursula K. Le Guin,The Left Hand of Darkness (Hainish Cycle #4),5,4.06,17920,184 people voted,4.06,98909,The Left Hand of Darkness,Hainish Cycle #4,4
5,Veronica Roth,"Divergent (Divergent, #1)",6,4.21,13326,138 people voted,4.21,2604504,Divergent,"Divergent, #1",1
6,Suzanne Collins,"Catching Fire (The Hunger Games, #2)",7,4.29,12749,133 people voted,4.29,2201771,Catching Fire,"The Hunger Games, #2",2
7,Lois Lowry,"The Giver (The Giver, #1)",8,4.12,12399,129 people voted,4.12,1536227,The Giver,"The Giver, #1",1
8,Octavia E. Butler,Kindred,9,4.23,11070,116 people voted,4.23,72456,Kindred,,
9,Ursula K. Le Guin,The Dispossessed (Hainish Cycle #6),10,4.21,10731,112 people voted,4.21,71722,The Dispossessed,Hainish Cycle #6,6


In [35]:
df['Book_Title'] = df.Book_Title_clean
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,rating_clean,number_of_ratings,Book_Title_clean,series,number_of_series
0,Margaret Atwood,The Handmaid's Tale,1,4.09,30733,314 people voted,4.09,1105358,The Handmaid's Tale,,
1,Suzanne Collins,The Hunger Games,2,4.33,28553,292 people voted,4.33,5743136,The Hunger Games,"The Hunger Games, #1",1
2,Mary Wollstonecraft Shelley,Frankenstein,3,3.78,21909,224 people voted,3.78,1024143,Frankenstein,,
3,Madeleine L'Engle,A Wrinkle in Time,4,4.01,18720,196 people voted,4.01,904957,A Wrinkle in Time,"Time Quintet, #1",1
4,Ursula K. Le Guin,The Left Hand of Darkness,5,4.06,17920,184 people voted,4.06,98909,The Left Hand of Darkness,Hainish Cycle #4,4
5,Veronica Roth,Divergent,6,4.21,13326,138 people voted,4.21,2604504,Divergent,"Divergent, #1",1
6,Suzanne Collins,Catching Fire,7,4.29,12749,133 people voted,4.29,2201771,Catching Fire,"The Hunger Games, #2",2
7,Lois Lowry,The Giver,8,4.12,12399,129 people voted,4.12,1536227,The Giver,"The Giver, #1",1
8,Octavia E. Butler,Kindred,9,4.23,11070,116 people voted,4.23,72456,Kindred,,
9,Ursula K. Le Guin,The Dispossessed,10,4.21,10731,112 people voted,4.21,71722,The Dispossessed,Hainish Cycle #6,6


In [36]:
df = df.drop(columns=['rating_clean','Book_Title_clean'])
df

Unnamed: 0,Author_Name,Book_Title,Rank,Rating,Score,Votes,number_of_ratings,series,number_of_series
0,Margaret Atwood,The Handmaid's Tale,1,4.09,30733,314 people voted,1105358,,
1,Suzanne Collins,The Hunger Games,2,4.33,28553,292 people voted,5743136,"The Hunger Games, #1",1
2,Mary Wollstonecraft Shelley,Frankenstein,3,3.78,21909,224 people voted,1024143,,
3,Madeleine L'Engle,A Wrinkle in Time,4,4.01,18720,196 people voted,904957,"Time Quintet, #1",1
4,Ursula K. Le Guin,The Left Hand of Darkness,5,4.06,17920,184 people voted,98909,Hainish Cycle #4,4
5,Veronica Roth,Divergent,6,4.21,13326,138 people voted,2604504,"Divergent, #1",1
6,Suzanne Collins,Catching Fire,7,4.29,12749,133 people voted,2201771,"The Hunger Games, #2",2
7,Lois Lowry,The Giver,8,4.12,12399,129 people voted,1536227,"The Giver, #1",1
8,Octavia E. Butler,Kindred,9,4.23,11070,116 people voted,72456,,
9,Ursula K. Le Guin,The Dispossessed,10,4.21,10731,112 people voted,71722,Hainish Cycle #6,6


In [37]:
df = df.drop(columns=['Author_Name', 'Rank', 'Votes'])
df

Unnamed: 0,Book_Title,Rating,Score,number_of_ratings,series,number_of_series
0,The Handmaid's Tale,4.09,30733,1105358,,
1,The Hunger Games,4.33,28553,5743136,"The Hunger Games, #1",1
2,Frankenstein,3.78,21909,1024143,,
3,A Wrinkle in Time,4.01,18720,904957,"Time Quintet, #1",1
4,The Left Hand of Darkness,4.06,17920,98909,Hainish Cycle #4,4
5,Divergent,4.21,13326,2604504,"Divergent, #1",1
6,Catching Fire,4.29,12749,2201771,"The Hunger Games, #2",2
7,The Giver,4.12,12399,1536227,"The Giver, #1",1
8,Kindred,4.23,11070,72456,,
9,The Dispossessed,4.21,10731,71722,Hainish Cycle #6,6


In [41]:
# Keep the leading run of word characters and spaces, then trim the trailing space
df['series'] = df.series.str.extract(r"(^[\w ]+)", expand=False).str.strip()
df

Unnamed: 0,Book_Title,Rating,Score,number_of_ratings,series,number_of_series
0,The Handmaid's Tale,4.09,30733,1105358,,
1,The Hunger Games,4.33,28553,5743136,The Hunger Games,1
2,Frankenstein,3.78,21909,1024143,,
3,A Wrinkle in Time,4.01,18720,904957,Time Quintet,1
4,The Left Hand of Darkness,4.06,17920,98909,Hainish Cycle,4
5,Divergent,4.21,13326,2604504,Divergent,1
6,Catching Fire,4.29,12749,2201771,The Hunger Games,2
7,The Giver,4.12,12399,1536227,The Giver,1
8,Kindred,4.23,11070,72456,,
9,The Dispossessed,4.21,10731,71722,Hainish Cycle,6
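
The notebook writes `GoodReads.csv` before any cleaning and later drops `Author_Name`, `Rank`, and `Votes`, so the cleaned table is never saved with all of the fields the exercise lists. A sketch of one way to build and save the full cleaned table is below. It runs on two hand-copied rows instead of the live page, and the function names `digits` and `clean`, the output column names, and the in-memory buffer are all assumptions made for illustration.

```python
import io

import pandas as pd

# Two rows copied from the scraped output above, in their raw string form.
raw = pd.DataFrame({
    "Rank": ["1", "4"],
    "Book_Title": ["The Handmaid's Tale", "A Wrinkle in Time (Time Quintet, #1)"],
    "Author_Name": ["Margaret Atwood", "Madeleine L'Engle"],
    "Score": ["score: 30,733", "score: 18,720"],
    "Votes": ["314 people voted", "196 people voted"],
    "Rating": ["4.09 avg rating — 1,105,358 ratings",
               "4.01 avg rating — 904,957 ratings"],
})


def digits(series, pattern):
    """Extract one group, drop thousands separators, convert to int."""
    extracted = series.str.extract(pattern, expand=False)
    return extracted.str.replace(",", "", regex=False).astype(int)


def clean(df):
    """Return a new DataFrame with one typed column per field."""
    # Named groups become the column names of the extracted DataFrame.
    parts = df["Book_Title"].str.extract(
        r"^(?P<Title>[^(]*?)\s*"
        r"(?:\((?P<Series>[^()#]*?),?\s*#(?P<Number>\d+)\))?\s*$"
    )
    return pd.DataFrame({
        "Rank": df["Rank"].astype(int),
        "Title": parts["Title"],
        "Series": parts["Series"],
        "Number_in_series": pd.to_numeric(parts["Number"]).astype("Int64"),
        "Author": df["Author_Name"],
        "Score": digits(df["Score"], r"(\d[\d,]*)"),
        "Votes": digits(df["Votes"], r"(\d[\d,]*)"),
        "Rating": df["Rating"].str.extract(r"(\d\.\d\d) avg", expand=False).astype(float),
        "Number_of_ratings": digits(df["Rating"], r"— ([\d,]+) ratings"),
    })


cleaned = clean(raw)
buffer = io.StringIO()  # swap for a filename to write to disk
cleaned.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Building a new DataFrame rather than overwriting and dropping columns one at a time keeps the raw strings available for checking the result.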
