# Goodreads: Science Fiction Books by Female Authors (Scraping to a CSV)

https://www.goodreads.com/list/show/6934.Science_Fiction_Books_by_Female_Authors

Scrape the fields below and save them as a CSV file.

|Field|Example|
|---|---|
|Rank|1|
|Title|The Handmaid's Tale|
|Author|Margaret Atwood|
|Score|score: 30,733|
|Votes|314 people voted|
|Rating|4.09 avg rating — 1,101,120 ratings|
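
Whatever route you take to scrape the page, the end goal is a table with these six columns. Here is a minimal sketch of that last step. It assumes the scraped values are collected into one dict per book. `pandas.DataFrame` turns the list of dicts into a table, and `to_csv` writes it out. The single row is the example row from the table above, and the filename in the comment is only a placeholder.

```python
import io

import pandas as pd

# One dict per book; the keys become the CSV column names.
# This row is the example from the field table, not freshly scraped data.
rows = [
    {
        "Rank": "1",
        "Title": "The Handmaid's Tale",
        "Author": "Margaret Atwood",
        "Score": "score: 30,733",
        "Votes": "314 people voted",
        "Rating": "4.09 avg rating — 1,101,120 ratings",
    },
]

df = pd.DataFrame(rows)

# In the notebook you would write to a real path, e.g.
# df.to_csv("female_scifi_books.csv", index=False)  # placeholder filename
# A StringIO buffer is used here so the sketch runs without touching the disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```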

This one is a little tougher, but most of the difficulty lies in cleaning the data! Separate and tidy the scraped values, cleaning up the existing columns and creating new ones like so:

|Before|After|
|---|---|
|A Wrinkle in Time (Time Quintet, #1)|A Wrinkle in Time|
|_Series_|Time Quintet|
|_Number in series_|1|
|score: 30,733|30733|
|4.09 avg rating — 1,101,120 ratings|4.09|
|_Number of ratings_|1101120|
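
The title split is the fiddliest part of the cleaning. One way to handle it is a regular expression that peels a trailing `(Series, #N)` group off the title. This is a sketch, not the only approach. It assumes the series group always ends in `#` followed by a number, with or without a comma before the `#`; both forms appear in the scraped titles, e.g. `(Time Quintet, #1)` and `(Hainish Cycle #4)`. Titles in any other shape are returned whole. `split_title` is a hypothetical helper name, and the series number comes back as a string so you can decide later whether to convert it with `int()` or `float()`.

```python
import re

# "Title (Series, #N)" or "Title (Series #N)"; the series part may not contain parentheses.
SERIES_PATTERN = re.compile(
    r"^(?P<title>.*?)\s*\((?P<series>[^()]*?),?\s*#(?P<number>[\d.]+)\)$"
)


def split_title(raw):
    """Return (title, series, number_in_series); the series fields are None for standalone books."""
    raw = " ".join(raw.split())  # collapse doubled spaces such as "Shards of Honour  (..."
    match = SERIES_PATTERN.match(raw)
    if match is None:
        return raw, None, None
    return match.group("title"), match.group("series"), match.group("number")


print(split_title("A Wrinkle in Time (Time Quintet, #1)"))
```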

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
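response = requests.get('https://www.goodreads.com/list/show/6934.Science_Fiction_Books_by_Female_Authors')
response.raise_for_status()  # stop early if Goodreads returns an error page
doc = BeautifulSoup(response.text, 'html.parser')  # name the parser explicitly to avoid bs4's warning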

In [3]:
ranks = doc.find_all(class_="number")
for rank in ranks:
    print(rank.text.strip())

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
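
Each field is read as its own list, so it's worth confirming that the ranks really run 1, 2, ..., N before lining the columns up. A small check like the hypothetical `check_ranks` helper below converts the rank strings to integers. It raises an error if any position is skipped or repeated.

```python
def check_ranks(rank_texts):
    """Return the ranks as ints, raising ValueError unless they run 1, 2, ..., N."""
    numbers = [int(text.strip()) for text in rank_texts]
    expected = list(range(1, len(numbers) + 1))
    mismatches = [
        (position, got)
        for position, (got, want) in enumerate(zip(numbers, expected))
        if got != want
    ]
    if mismatches:
        # Report the first few offending (position, value) pairs to make debugging easier.
        raise ValueError(f"unexpected ranks at (position, value): {mismatches[:5]}")
    return numbers


# In the notebook, using the `ranks` tags from the cell above:
# rank_numbers = check_ranks(rank.text for rank in ranks)
```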


In [4]:
titles = doc.find_all(class_="bookTitle")
for title in titles:
    print(title.text.strip())

The Handmaid's Tale
The Hunger Games (The Hunger Games, #1)
Frankenstein
A Wrinkle in Time (Time Quintet, #1)
The Left Hand of Darkness (Hainish Cycle #4)
Divergent (Divergent, #1)
Catching Fire (The Hunger Games, #2)
The Giver (The Giver, #1)
Kindred
The Dispossessed (Hainish Cycle #6)
Oryx and Crake (MaddAddam, #1)
Mockingjay (The Hunger Games, #3)
The Time Traveler's Wife
Doomsday Book (Oxford Time Travel, #1)
The Lathe of Heaven
Ancillary Justice (Imperial Radch #1)
Shards of Honour  (Vorkosigan Saga, #1)
To Say Nothing of the Dog (Oxford Time Travel, #2)
The Sparrow (The Sparrow, #1)
Dragonflight (Dragonriders of Pern, #1)
Parable of the Sower (Earthseed, #1)
The Warrior's Apprentice (Vorkosigan Saga, #2)
Barrayar (Vorkosigan Saga, #7)
The Host (The Host, #1)
Insurgent (Divergent, #2)
Memory (Vorkosigan Saga, #10)
The Children of Men
Starshine: Aurora Rising Book One (Aurora Rhapsody, #1)
The Year of the Flood  (MaddAddam, #2)
Crystal Singer (Crystal Singer, #1)
Mirror Dance (Vork

In [5]:
authors = doc.find_all(class_="authorName")
for author in authors:
    print(author.text.strip())

Margaret Atwood
Suzanne Collins
Mary Wollstonecraft Shelley
Madeleine L'Engle
Ursula K. Le Guin
Veronica Roth
Suzanne Collins
Lois Lowry
Octavia E. Butler
Ursula K. Le Guin
Margaret Atwood
Suzanne Collins
Audrey Niffenegger
Connie Willis
Ursula K. Le Guin
Ann Leckie
Lois McMaster Bujold
Connie Willis
Mary Doria Russell
Anne McCaffrey
Octavia E. Butler
Lois McMaster Bujold
Lois McMaster Bujold
Stephenie Meyer
Veronica Roth
Lois McMaster Bujold
P.D. James
G.S. Jennsen
Margaret Atwood
Anne McCaffrey
Lois McMaster Bujold
Octavia E. Butler
Lois McMaster Bujold
Octavia E. Butler
Madeleine L'Engle
Lois McMaster Bujold
Lois McMaster Bujold
Lois McMaster Bujold
Joan D. Vinge
Marissa Meyer
C.J. Cherryh
Anne McCaffrey
Deborah O'Neill Cordes
Anne McCaffrey
Lois McMaster Bujold
Lois McMaster Bujold
Madeleine L'Engle
Octavia E. Butler
Lois McMaster Bujold
C.J. Cherryh
Kate Wilhelm
Nancy Kress
James Tiptree Jr.
Tanith Lee
Lois McMaster Bujold
C.J. Cherryh
Emily St. John Mandel
C.J. Cherryh
Lois McMas

In [6]:
scores = doc.find_all(class_="smallText uitext")
for score in scores:
    # the first <a> in each span holds the "score: ..." text
    print(score.find('a').text.strip())

score: 30,733
score: 28,553
score: 21,909
score: 18,720
score: 17,920
score: 13,326
score: 12,749
score: 12,399
score: 11,070
score: 10,731
score: 10,117
score: 9,807
score: 9,193
score: 8,935
score: 7,775
score: 7,336
score: 7,017
score: 6,890
score: 6,873
score: 6,802
score: 6,653
score: 6,087
score: 5,623
score: 5,274
score: 4,899
score: 4,682
score: 4,484
score: 4,394
score: 4,343
score: 4,235
score: 4,120
score: 4,108
score: 3,895
score: 3,842
score: 3,700
score: 3,683
score: 3,653
score: 3,618
score: 3,535
score: 3,440
score: 3,381
score: 3,361
score: 3,340
score: 3,092
score: 3,054
score: 2,904
score: 2,831
score: 2,755
score: 2,579
score: 2,547
score: 2,535
score: 2,526
score: 2,524
score: 2,504
score: 2,498
score: 2,469
score: 2,450
score: 2,316
score: 2,306
score: 2,203
score: 2,199
score: 2,191
score: 2,128
score: 2,038
score: 2,013
score: 2,006
score: 1,931
score: 1,914
score: 1,849
score: 1,819
score: 1,748
score: 1,698
score: 1,684
score: 1,683
score: 1,659
score: 1,572
s
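
To turn `score: 30,733` into `30733`, it's enough to discard every non-digit character and convert the rest. This sketch assumes each score string contains exactly one number, which holds for every line printed above. `parse_score` is a hypothetical helper name.

```python
import re


def parse_score(text):
    """Turn 'score: 30,733' into the integer 30733."""
    # Drop the 'score:' label, spaces and thousands separators in one pass.
    digits = re.sub(r"\D", "", text)
    return int(digits)


print(parse_score("score: 30,733"))
```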

In [16]:
# votes: the second <a> in each score span holds the "N people voted" text
for score in scores:
    print(score.find_all('a')[1].text)

314 people voted
292 people voted
224 people voted
196 people voted
184 people voted
138 people voted
133 people voted
129 people voted
116 people voted
112 people voted
107 people voted
104 people voted
98 people voted
96 people voted
84 people voted
81 people voted
75 people voted
74 people voted
72 people voted
74 people voted
71 people voted
65 people voted
60 people voted
56 people voted
54 people voted
51 people voted
49 people voted
45 people voted
47 people voted
47 people voted
47 people voted
47 people voted
46 people voted
43 people voted
42 people voted
43 people voted
43 people voted
43 people voted
43 people voted
38 people voted
40 people voted
39 people voted
34 people voted
36 people voted
37 people voted
36 people voted
32 people voted
32 people voted
31 people voted
30 people voted
30 people voted
30 people voted
29 people voted
29 people voted
31 people voted
31 people voted
28 people voted
30 people voted
29 people voted
27 people voted
22 people voted
27 people vo
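
The vote strings could be cleaned one at a time in the same way. Once they are in a pandas column, though, a vectorized approach is often tidier. This sketch uses `Series.str.extract` to pull the leading number out of strings shaped like the output above. The three sample values are copied from that output.

```python
import pandas as pd

votes = pd.Series(["314 people voted", "292 people voted", "224 people voted"])

# Extract the leading number (commas allowed), strip the commas, then cast to int.
vote_counts = (
    votes.str.extract(r"^([\d,]+)", expand=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)
print(vote_counts.tolist())
```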

In [15]:
scores[0].find_all('a')[1].text

'314 people voted'

In [7]:
votes = doc.find_all(class_="smallText uitext")
votes

[<span class="smallText uitext">
 <a href="#" onclick="Lightbox.showBoxByID('score_explanation', 300); return false;">score: 30,733</a>,
               <span class="greyText">and</span>
 <a href="#" id="loading_link_377435" onclick="new Ajax.Request('/list/list_book/394876', {asynchronous:true, evalScripts:true, onFailure:function(request){Element.hide('loading_anim_377435');$('loading_link_377435').innerHTML = '&lt;span class=&quot;error&quot;&gt;ERROR&lt;/span&gt;try again';$('loading_link_377435').show();}, onLoading:function(request){;Element.show('loading_anim_377435');Element.hide('loading_link_377435')}, onSuccess:function(request){Element.hide('loading_anim_377435');Element.show('loading_link_377435');}, parameters:'authenticity_token=' + encodeURIComponent('z3ZOU2csDUWuJYU7z3hNg1mz5/ic3Dmr3cAnIGZVQ0j6cN47D5wtZFIobFqqSLoUf4cYrlvFoPLWvv03EVlEpA==')}); return false;">314 people voted</a><img alt="Loading trans" class="loading" id="loading_anim_377435" src="https://s.gr-assets.com
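
The dump shows that each `smallText uitext` span holds two links: the score first and the vote count second. That is why the earlier cells index `find_all('a')` by position. Gathering each field with a separate page-wide `find_all` only works if every list has exactly one entry per book, and the `greyText` output in the next cell shows how that assumption can break. A sturdier alternative is to walk the page one row at a time. The markup below is a hypothetical, trimmed-down row that reuses the class names seen in these outputs. Whether the real page wraps each book in a `<tr>` is an assumption to verify in the browser's inspector, and `extract_rows` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for one row; the real Goodreads page has more wrapping elements.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="number">1</td>
    <td>
      <a class="bookTitle" href="#"><span>The Handmaid's Tale</span></a>
      by <a class="authorName" href="#"><span>Margaret Atwood</span></a>
      <span class="greyText">(Goodreads Author)</span>
      <span class="greyText">4.09 avg rating — 1,101,120 ratings</span>
      <span class="smallText uitext">
        <a href="#">score: 30,733</a>, <span class="greyText">and</span>
        <a href="#">314 people voted</a>
      </span>
    </td>
  </tr>
</table>
"""


def extract_rows(doc):
    """Yield one dict per table row, taking every field from that same row."""
    for row in doc.find_all("tr"):
        rank = row.find(class_="number")
        if rank is None:  # skip rows that are not book entries
            continue
        # Same two-link layout as the dump: score first, votes second.
        score_link, votes_link = row.find(class_="smallText uitext").find_all("a")[:2]
        # Several greyText spans per row; keep the one carrying the rating.
        rating = next(
            span.text.strip()
            for span in row.find_all(class_="greyText")
            if "avg rating" in span.text
        )
        yield {
            "Rank": rank.text.strip(),
            "Title": row.find(class_="bookTitle").text.strip(),
            "Author": row.find(class_="authorName").text.strip(),
            "Score": score_link.text.strip(),
            "Votes": votes_link.text.strip(),
            "Rating": rating,
        }


rows = list(extract_rows(BeautifulSoup(SAMPLE_HTML, "html.parser")))
print(rows)
```

On the live page you would pass `doc` instead of the sample soup. Each dict then lines up with one book, even if a row has an extra `greyText` span.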

In [8]:
rates = doc.find_all(class_="greyText")
for rate in rates:
    print(rate.text.strip())

(Goodreads Author)
4.09 avg rating — 1,102,318 ratings
and
Rate this book
4.33 avg rating — 5,742,147 ratings
and
Rate this book
3.78 avg rating — 1,023,439 ratings
and
Rate this book
4.01 avg rating — 903,270 ratings
and
Rate this book
4.06 avg rating — 98,786 ratings
and
Rate this book
(Goodreads Author)
4.21 avg rating — 2,602,763 ratings
and
Rate this book
4.29 avg rating — 2,200,701 ratings
and
Rate this book
(Goodreads Author)
4.12 avg rating — 1,535,105 ratings
and
Rate this book
4.23 avg rating — 72,335 ratings
and
Rate this book
4.21 avg rating — 71,641 ratings
and
Rate this book
(Goodreads Author)
4.01 avg rating — 197,786 ratings
and
Rate this book
4.03 avg rating — 2,071,958 ratings
and
Rate this book
(Goodreads Author)
3.96 avg rating — 1,446,731 ratings
and
Rate this book
4.03 avg rating — 43,252 ratings
and
Rate this book
4.11 avg rating — 43,989 ratings
and
Rate this book
(Goodreads Author)
3.98 avg rating — 66,205 ratings
and
Rate this book
(Goodreads Author)
4.11 avg 
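
As the output shows, `greyText` also matches `(Goodreads Author)`, `and` and `Rate this book`. The rating lines therefore have to be picked out before they can be split into the average and the number of ratings. Here is a sketch using a regular expression. `parse_ratings` is a hypothetical helper, and the pattern assumes the `X avg rating ... N ratings` wording printed above.

```python
import re

# Group 1: the average (e.g. 4.09); group 2: the ratings count with commas (e.g. 1,102,318).
RATING_PATTERN = re.compile(r"([\d.]+) avg rating\D+([\d,]+) rating")


def parse_ratings(grey_texts):
    """Keep only the 'avg rating' lines and return (average, count) tuples."""
    parsed = []
    for text in grey_texts:
        match = RATING_PATTERN.search(text)
        if match is None:  # '(Goodreads Author)', 'and', 'Rate this book', ...
            continue
        average, count = match.groups()
        parsed.append((float(average), int(count.replace(",", ""))))
    return parsed


# In the notebook: parse_ratings(rate.text.strip() for rate in rates)
```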