# Introduction


## Introduction to the Topic: What Makes a Good Fantasy Novel?
<br>

'Fantasy fiction' refers to a subclass of fictional literature which is quite straightforward to define. 
One [online literary dictionary](https://literaryterms.net/fantasy/) defines "fantasy" as "a genre of fiction that concentrates on imaginary elements (the fantastic). This can mean magic, the supernatural, alternate worlds, superheroes, monsters, fairies, magical creatures, mythological heroes—essentially, anything that an author can imagine outside of reality."

A substantial proportion of fantasy fans probably know that this particular genre of novel is not particularly well-respected in the world of literature: [this blog](https://livingwriter.com/blog/avoid-fantasy-writing-cliches) draws attention to how the 'fantasy genre is rife with clichés that have been used, abused, and overused', and how there are so many bad fantasy novels which seem clichéd, tropey and unoriginal.

A 2015 <em>Wired</em> [interview](https://www.wired.com/2015/04/geeks-guide-kazuo-ishiguro/) with Nobel Prize-winning author Kazuo Ishiguro, titled "Why Are So Many People Snobby About Fantasy Fiction?", was devoted to precisely this topic. The famous British-Japanese author calls certain critics' dismissal of his book <em>The Buried Giant</em> just because "it has ogres in it" an example of "classic prejudice". Along with Ishiguro, authors such as J.R.R. Tolkien, Ursula K. Le Guin, Haruki Murakami, and Olga Tokarczuk, whose books are often classed as "fantasy" on Goodreads.com, are clearly highly accomplished. On the other hand, it is difficult to deny that fantasy fiction, like other kinds of 'genre' literature (including horror, sci-fi and romance), is plagued by an enormous amount of writing which is poor and clichéd, and which often reinforces sexist and racist prejudices: this [*Guardian* article](https://www.theguardian.com/books/booksblog/2018/mar/06/terry-goodkind-sexism-cover-shroud-eternity) discusses some of the more objectionable aspects of the genre.

This raises the question: what makes a *good* fantasy novel? What are the *most common features* of well-reviewed fantasy novels? This project will attempt to shed some light on this question by using several largely qualitative data analysis tools, such as Natural Language Processing, sentiment analysis and thematic analysis techniques, as well as *quantitative statistical analysis tools to see if there are correlations between certain themes and ratings* (**INSERT REFERENCE LATER!**) in order to highlight the central themes and patterns which occur in online reviews of the highest-rated and the lowest-rated fantasy novels.

This project was partly inspired by an MSc thesis written by Dalton J. Crutchfield at the University of Denver called "Using Natural Language Processing to Categorize Fictional Literature in an Unsupervised Manner" ([link here](https://digitalcommons.du.edu/etd/1741/)). Crutchfield notes that categorizing literature (e.g. 'this is science fiction', 'this is romance') is something which humans find easy, but that this 'intuitive' process is very difficult to formalize and implement using computers. The aim of his research was to 'mimic how a human may look deeper into a plot, find similar concepts like *certain words* being used' and 'the *type of words used*' (for instance, an adventure book having 'more verbs') via NLP techniques and packages such as nltk. The author of the thesis also cites a 2008 article by Richard Maker discussing the 'genre stigma', and how many readers 'believe they do not enjoy Fantasy novels, which, in actuality, will push them away from novels they may enjoy'. This leads Crutchfield to surmise that 'there are more classifications than just by genre' which are important in determining whether a reader will enjoy a work of fiction, such as the 'core structure' of the work or, as I might add, the author's writing style, sense of humour, etc. Therefore, in order to explore what makes a Fantasy novel successful, I also plan to use some of the NLP techniques introduced in that study, including tokenization and sentiment analysis, to determine which patterns and features make a Fantasy novel successful. This may involve training an unsupervised machine-learning model to identify patterns within the book descriptions and reviews in order to categorize the main topics/themes of successful novels.

## Aims and Objectives

### The main thematic questions which this project will focus on exploring are as follows:

1. **Which are the *x* highest-rated fantasy novels on a book-review site (such as Goodreads or Amazon), and which are the *x* lowest-rated?**
<br><br>
2. **Is there any correlation between the rating of a book classed as 'fantasy' (e.g. on *Goodreads*) and the co-genre under which the book is also classed?** For instance, are fantasy books which are also in the 'historical' genre rated higher than fantasy books which are also tagged as 'romance'?
<br> <br>
3. **What are the main themes and topics (e.g. "dragons", "magical schools") which occur in the book descriptions (blurbs) of the *x* highest-rated and the *x* lowest-rated fantasy novels?**
<br> <br>
4. **Which are the main themes and topics which emerge from professional literary reviews of the *x* highest-rated and the *x* lowest-rated fantasy novels?**
<br> <br>
5. **Which are the main themes and topics which emerge from the wider audience reviews (i.e. the comments on Goodreads or Amazon) of the *x* highest-rated and the *x* lowest-rated fantasy novels?**
<br>
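
As a rough illustration of question 2, the sketch below compares the mean rating of fantasy books grouped by co-genre. The records, genre tags and the `mean_rating_by_cogenre` helper are hypothetical and invented for this example; they are not scraped data, and a real analysis would also need a proper significance test.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records; titles, ratings and genre tags are illustrative only.
sample_books = [
    {"title": "Book A", "rating": 4.4, "genres": ["fantasy", "historical"]},
    {"title": "Book B", "rating": 3.8, "genres": ["fantasy", "romance"]},
    {"title": "Book C", "rating": 4.2, "genres": ["fantasy", "historical", "romance"]},
    {"title": "Book D", "rating": 3.6, "genres": ["fantasy", "romance"]},
]


def mean_rating_by_cogenre(books, base_genre="fantasy"):
    """Return the mean rating for every genre that co-occurs with base_genre."""
    ratings = defaultdict(list)
    for book in books:
        for genre in book["genres"]:
            if genre != base_genre:
                ratings[genre].append(book["rating"])
    return {genre: round(mean(values), 2) for genre, values in ratings.items()}


cogenre_means = mean_rating_by_cogenre(sample_books)
print(cogenre_means)
```

A book tagged with several co-genres contributes to each of them, which seems reasonable for a first look but means the groups are not independent.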

### The steps required to practically implement a discussion of the above questions are outlined below:

1. Determine how many fantasy novels should be considered for the set of best-reviewed novels and for the set of worst-reviewed novels.
<br><br>
2. Decide which dataset to use for the list of fantasy novels and their associated genres and ratings:
    - Should a ready-made dataset be chosen from a website such as Kaggle, or should the data be collected and cleaned using an API or web scraping techniques with Selenium and BeautifulSoup 4?
    - Discuss the advantages and disadvantages of these different approaches based on the datasets and APIs which are currently available for book-review websites.
<br><br>  
3. Sanitize the best/worst novels datasets and represent the information about each book in a tabular format:
    - For instance, individual fantasy books are often part of a larger 'series' (like *Harry Potter*).
    - Decide whether each entry in the table should include the individual fantasy book or just the name of the entire series.
    - Decide whether only books written in English should be included in the table/pandas DataFrame.
<br><br>  
4. Determine which NLP packages and tools to use in order to conduct thematic analysis on the top and bottom *x* fantasy novels:
    - Look into some alternative packages to nltk, such as spaCy or Gensim
    - A list of useful tools can be found here: (https://cruizbran.medium.com/top-python-libraries-for-nlp-eca6df4c9472)
    - Justify the choice of tools for this task
<br><br>
5. Analyse the main themes occurring in the blurb/description text for the top *x* and bottom *x* fantasy books.
<br><br>
6. For each of the top *x* books (and likewise for the *x* lowest-rated books), use either an API for a news source such as *The New York Times* or *The Guardian*, or web scraping tools, to search for professional reviews, and extract thematic information from the top 3-4 reviews found for each book.
<br><br>
7. For each of the top *x* books (and likewise for the *x* lowest-rated books), use an API (e.g. Reddit via the Python praw module) to search for layperson reviews of the selected books, and extract thematic information from the top 3-4 reviews found for each book.
<br><br>
8. Isolate the main themes discussed by (a) professional reviewers and (b) wider audience reviews/comments for the top *x* and bottom *x* fantasy novels.
<br><br>
9. Visualize/represent the main themes emerging from the top-rated and lowest-rated fantasy novels in some way, either using wordclouds or bar charts with themes on the x-axis and overall rating on the y-axis.
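
Steps 5 and 9 could start from something as simple as counting how many blurbs mention each word. The following is a minimal sketch using only the standard library; the blurbs, the tiny stop-word set and the `theme_counts` helper are all made up for illustration, and a real run would more likely use a proper tokenizer and stop-word list from one of the NLP packages discussed in step 4.

```python
import re
from collections import Counter

# Hypothetical blurbs, invented for illustration only.
top_blurbs = [
    "A young wizard enters a school of magic and befriends a dragon.",
    "The dragon war threatens the kingdom, and only magic can end it.",
    "An orphan discovers a hidden kingdom ruled by a dragon queen.",
]

# A deliberately tiny stop-word set; an assumption made for this sketch.
STOPWORDS = {"a", "an", "the", "and", "of", "by", "it", "can", "only", "end"}


def theme_counts(blurbs, top_n=5):
    """Count in how many blurbs each non-stop-word appears."""
    counts = Counter()
    for blurb in blurbs:
        tokens = re.findall(r"[a-z]+", blurb.lower())
        # Count each word once per blurb so one repetitive blurb cannot dominate.
        counts.update(set(tokens) - STOPWORDS)
    return counts.most_common(top_n)


print(theme_counts(top_blurbs))
```

The resulting (word, count) pairs could then be fed into a word cloud or bar chart for the visualization step.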

## The Data

The major shortcoming of this proposed analysis is that most of the novels which make up the modern 'Fantasy' genre were written in the latter half of the 20th century or in the 21st century, so they are still under copyright and their full texts are not freely available. This would not be the case if the project focused on themes in Shakespeare's plays or 19th-century realist novels, for which I could use a free web-based library such as Project Gutenberg.

As such, the analysis of the main features of the most successful fantasy novels will be limited to data such as the other genres under which each novel is categorized, product/book descriptions, professional critics' reviews, and discussion of the book on web forums and in comments sections. It should also be added that even if I had access to an electronic version of the top 50 or top 100 fantasy novels, processing and analysing such a large body of text would overwhelm the computational resources which I have access to for this project. However, a future project might want to apply tools and techniques from the domain of NLP to the actual texts of these highly-rated fantasy novels in order to verify the conclusions reached by this study.

The first stage in **determining the things people like most about a Fantasy novel** was to choose a dataset from a resource where users can rate books, and where the books are categorized by genre, so that books classed as 'Fantasy' can be selected. The most well-known web application dedicated to book reviews is Goodreads, which unfortunately does not have its own free API, as this was discontinued in 2020 [due to 'inactivity'](https://help.goodreads.com/s/article/Why-did-my-API-key-stop-working). Furthermore, Goodreads [does not permit users to view books sorted by rating](https://help.goodreads.com/s/question/0D51H00005JPAaaSAH/is-there-a-way-to-organize-books-in-a-list-by-rating). I found some ready-made datasets about science-fiction and fantasy books on [Kaggle](https://www.kaggle.com/datasets/michaelcai2021/goodreads-pop-science-fiction-and-fantasy-books), but most of them did not include a 'genre' column, which is a non-negotiable requirement for my analysis. As a result, I decided to use some of the more advanced web scraping techniques introduced in this course, such as "paging", to get the information for all the books on Goodreads which are classed as "Fantasy", as the "Fantasy" shelf extends over many pages.
I used the Python selenium package to automatically page through the "Fantasy" [shelf](https://www.goodreads.com/shelf/show/fantasy?page=1) on Goodreads.

## Web Scraping Goodreads to get the Set of 100 Top-Rated Fantasy Novels

Before I could determine the key features of the 100 top-rated fantasy novels on Goodreads, I first had to scrape the 'Fantasy' shelf on this website, getting the title, author, URL and rating out of 5 for each book. Unfortunately, when one navigates to the Fantasy shelf on Goodreads, the website does not let you go past the 25th results page (there are 50 books on each page, so 1250 books can be accessed in total), even though the website clearly shows that there are 100 pages of results. There does not seem to be an explanation for this anywhere online, except for one [forum post](https://help.goodreads.com/s/question/0D58V00006Vuqw5SAB/why-cant-i-browse-past-page-25) where someone else noticed the problem, confirming that it is not possible to navigate to page 26 for **any** book category/genre on Goodreads.

**Goodreads Fantasy shelf looks like it has 100 pages**
![image.png](attachment:image.png)

**What happens when you try to navigate to page 26 (look at the URL)**
![image-2.png](attachment:image-2.png)


The books on this Goodreads 'Fantasy' shelf are sorted not by rating, but instead by how many times the book was 'shelved' as 'Fantasy':
![image-3.png](attachment:image-3.png)

A book being ['shelved'](https://www.goodreads.com/questions/1240278-what-does-shelved-mean) by a genre means that a person has essentially tagged the book with this genre by adding it to a 'shelf' (list) with that genre name. As the books are displayed on the fantasy 'shelf' in descending order of how many people tagged them as 'fantasy', it could be argued that the 1250 most-shelved books (up to page 25) are a sufficient amount of data for understanding the features of the most popular and well-rated fantasy books, since the books at the inaccessible end of the list have not been 'shelved' by many users and are probably not that well-known. Nonetheless, it should be noted that the data used in this project unfortunately excludes the less popular fantasy novels on Goodreads, and there could be some novels which are less well-known but which have received very high ratings from critics. This is therefore a significant flaw in the dataset, which results from Goodreads' reluctance to allow users access to its full catalogue of books classed under a specific genre. As such, it would be a good idea for a further study to verify these findings by acquiring a more complete catalogue of fantasy novels, for instance by setting up a website or e-commerce site in order to be granted access to Amazon's API.

In [1]:
# Ref: https://dev.to/rahulkumarmalhotra/goodreads-scraping-using-python-and-selenium-343j
# Ref: https://stackoverflow.com/questions/70534875/typeerror-init-got-an-unexpected-keyword-argument-service-error-using-p

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys # Constants for special keys (e.g. Keys.ENTER); not needed by send_keys for plain text
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By 
from time import sleep
from random import random
from selenium.common.exceptions import NoSuchElementException

# File storing Goodreads login information (has to be kept separate and is not uploaded onto github for obvious security reasons)
import login

# Create the programmable browser
# Raw string so the backslashes in the Windows path are not treated as escape sequences
pathToChromeDriver = r'C:\Program Files\Chrome Driver\chromedriver.exe'
service = Service(pathToChromeDriver)
options = Options()
browser = webdriver.Chrome(service=service, options=options)
# Goodreads does not allow browsing more than the first page of a "Shelf" (e.g. the Fantasy shelf) unless the user is logged in
browser.get("https://www.goodreads.com/user/sign_in")

# Click the button in the login page which allows users to log in using their email and password, rather than Google/Facebook
signin_button = browser.find_element(By.CLASS_NAME, "authPortalSignInButton")
signin_button.click()

# Find the email and password input fields
login_email_field = browser.find_element(By.NAME, value="email")
login_password_field = browser.find_element(By.NAME, value="password")

# Use the WebElement 'send_keys' method to input the email address and password to log in to Goodreads
login_email_field.send_keys(login.email)
login_password_field.send_keys(login.password)

# Find the submit login info button and click it to log in to Goodreads
submit_button = browser.find_element(By.ID, value="signInSubmit")
submit_button.click()


# Add a wait time to manually solve the Captcha which Goodreads shows to automated logins
sleep(25)

# Now that we have logged in, it is possible to navigate from page 1 through 25 on the Fantasy shelf and scrape the data...
currentPage = 1 # first page of Fantasy shelf
maxPages = 25 # last page of Fantasy which is accessible
# URL which just has the current page number added to it for each iteration of the while-loop
url = "https://www.goodreads.com/shelf/show/fantasy?page="

# Stores a list of dict objects, with each dict containing title/URL-link/author/rating data for each book
books = []

while currentPage <= maxPages:  
    
    # Add a random wait time so we don't overload the website and we don't look like a bot
    sleep(random() * 3)
    
    # Get the current-desired page from the shelf
    browser.get(url + str(currentPage))
    
    # Get all the elements with class 'left' (the class for the div storing each book's information)
    book_divs = browser.find_elements(By.CLASS_NAME, "left")
    
    # Iterate over the book divs
    book_div_count = 0
    for b in book_divs:
        # Sometimes, the elements with the following class names cannot be found (maybe because the page is dynamically built)
        try:
            book = {
                'title': b.find_element(By.CLASS_NAME, "bookTitle").text,
                'link': b.find_element(By.CLASS_NAME, "bookTitle").get_attribute('href'),
                'author': b.find_element(By.CLASS_NAME, "authorName").find_element(By.TAG_NAME, "span").text,
                'rating': b.find_element(By.CLASS_NAME, "greyText").text
            }
            books.append(book)
            book_div_count += 1
        # Therefore, catch the exception, print a message and continue if the book info is missing
        except NoSuchElementException as ex:
            # Print out the page and element which is missing, so that this can be filled in manually later (I think there were
            # only about 2-3 items which were missing)
            print("Error - No such element found --> page: " + str(currentPage) + " item: " + str(book_div_count))
            print(ex)
            book_div_count += 1
            continue
            
       
    # Proceed to the next page
    currentPage += 1

    

Error - No such element found --> page: 4 item: 38
Message: no such element: Unable to locate element: {"method":"css selector","selector":".bookTitle"}
  (Session info: chrome=114.0.5735.134); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
Backtrace:
	GetHandleVerifier [0x0058A813+48355]
	(No symbol) [0x0051C4B1]
	(No symbol) [0x00425358]
	(No symbol) [0x004509A5]
	(No symbol) [0x00450B3B]
	(No symbol) [0x00449AE1]
	(No symbol) [0x0046A784]
	(No symbol) [0x00449A36]
	(No symbol) [0x0046AA94]
	(No symbol) [0x0047C922]
	(No symbol) [0x0046A536]
	(No symbol) [0x004482DC]
	(No symbol) [0x004493DD]
	GetHandleVerifier [0x007EAABD+2539405]
	GetHandleVerifier [0x0082A78F+2800735]
	GetHandleVerifier [0x0082456C+2775612]
	GetHandleVerifier [0x006151E0+616112]
	(No symbol) [0x00525F8C]
	(No symbol) [0x00522328]
	(No symbol) [0x0052240B]
	(No symbol) [0x00514FF7]
	BaseThreadInitThunk [0x7

In [2]:
books

[{'title': 'Harry Potter and the Philosopher’s Stone (Harry Potter, #1)',
  'link': 'https://www.goodreads.com/book/show/72193.Harry_Potter_and_the_Philosopher_s_Stone',
  'author': 'J.K. Rowling',
  'rating': 'avg rating 4.47 — 9,350,074 ratings — published 1997'},
 {'title': 'Harry Potter and the Chamber of Secrets (Harry Potter, #2)',
  'link': 'https://www.goodreads.com/book/show/15881.Harry_Potter_and_the_Chamber_of_Secrets',
  'author': 'J.K. Rowling',
  'rating': 'avg rating 4.43 — 3,628,391 ratings — published 1998'},
 {'title': 'Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)',
  'link': 'https://www.goodreads.com/book/show/5.Harry_Potter_and_the_Prisoner_of_Azkaban',
  'author': 'J.K. Rowling',
  'rating': 'avg rating 4.58 — 3,844,071 ratings — published 1999'},
 {'title': 'The Hobbit (The Lord of the Rings, #0)',
  'link': 'https://www.goodreads.com/book/show/5907.The_Hobbit',
  'author': 'J.R.R. Tolkien',
  'rating': 'avg rating 4.28 — 3,692,122 ratings — publis
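
The titles in the output above embed the series name and volume number in parentheses, which matters for the decision about whether to store individual books or whole series. A minimal sketch of splitting the two apart is shown below; the `split_series` helper and its pattern are my own illustration, and they assume only the "(Series, #N)" layout visible in this output, so other layouts would need extra handling.

```python
import re

# Assumed layout: "Title (Series Name, #N)"; titles without this suffix are left as-is.
SERIES_PATTERN = re.compile(
    r"^(?P<title>.+?)\s*\((?P<series>.+),\s*#(?P<number>[\d.]+)\)$"
)


def split_series(raw_title):
    """Split a scraped title into title, series name and volume number (as text)."""
    cleaned = raw_title.strip()
    match = SERIES_PATTERN.match(cleaned)
    if match is None:
        return {"title": cleaned, "series": None, "number": None}
    return {
        "title": match["title"],
        "series": match["series"],
        "number": match["number"],
    }


print(split_series("The Hobbit (The Lord of the Rings, #0)"))
print(split_series("Tigana"))
```

The volume number is kept as text rather than converted to a number, since the sketch makes no assumption about how non-integer volume labels should be interpreted.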

In [101]:
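# Missing books were on page 4, at (0-based) positions 38 and 39 down the page.
# Pages 1-3 hold the first 150 books (50 per page), so page 4 starts at list index 50*3.
missingIndex1 = (50*3) + 38
missingIndex2 = (50*3) + 39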

# Manually enter data for these two books into the dataset
books.insert(missingIndex1, {
    'title': "The Atlas Six (The Atlas, #1)",
    'link': 'https://www.goodreads.com/book/show/50520939-the-atlas-six',
    'author': 'Olivie Blake',
    'rating': 'avg rating 3.68 — 143,975 ratings — published 2020'
})

books.insert(missingIndex2, {
      'title': "Red Seas Under Red Skies (Gentleman Bastard, #2)",
      'link': 'https://www.goodreads.com/book/show/40604556-red-seas-under-red-skies',
      'author': 'Scott Lynch',
      'rating': 'avg rating 4.24 — 135,716 ratings — published 2007'
})


## Dealing with Missing 'Rating' Values

As mentioned above, the most important property of the data about the 1250 top-shelved fantasy books scraped from Goodreads is their **average rating**. However, Goodreads really does **not** make it easy for developers to scrape the rating data for books. The first problem, solved above by using the 'sleep' function, was that the website requires the user to fill in a Captcha when logging in through an automated browser instance with Selenium and ChromeDriver; without logging in, it is impossible to scrape more than one page of books. However, examining the data for the 1250 scraped books revealed a problem which was even more difficult to address. For each 'book' section on the webpage, the line which is supposed to hold the rating, number of ratings, and publication date is stored in an HTML element with the class name 'greyText', which I tried to access using the *b.find_element(By.CLASS_NAME, "greyText")* command.

The screenshot below shows the desired element (for the book 'Neverwhere' by Neil Gaiman) being 'inspected' in the web browser, with the text content "avg rating 4.17 — 504,608 ratings — published 1996" supposedly inside the 'span' tag with the 'greyText' class.
![image-2.png](attachment:image-2.png)

However, for some reason, trying to scrape this text containing the rating information did not work for most of the books stored as dicts in the 'books' list: instead of the 'avg rating xxx - xxx ratings - published xxxx' text, the value for the 'rating' and 'date_published' keys had been set to '(Goodreads Author)'. Obviously, without ratings stored for the majority of the books in this list, it would be impossible to find the 100 best-rated fantasy books!
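
To make the difference between a well-formed rating line and the '(Goodreads Author)' value concrete, here is a small sketch that parses the line into separate fields and returns `None` when the text does not have the expected shape. The `parse_rating_line` helper and its pattern are hypothetical; they assume the layout of the rating strings shown earlier in this notebook, and `\u2014` stands for the long dash which Goodreads uses as a separator.

```python
import re

# Assumed layout of the shelf's rating line; "\u2014" is the long-dash separator.
RATING_LINE = re.compile(
    r"avg rating (?P<avg>\d+\.\d+)\s*\u2014\s*(?P<count>[\d,]+) ratings?"
    r"(?:\s*\u2014\s*published (?P<year>\d{4}))?"
)


def parse_rating_line(text):
    """Return the parsed fields, or None if the text is not a rating line."""
    match = RATING_LINE.search(text or "")
    if match is None:
        return None
    year = match["year"]
    return {
        "avg_rating": float(match["avg"]),
        "num_ratings": int(match["count"].replace(",", "")),
        "year_published": int(year) if year else None,
    }


sample = "avg rating 4.47 \u2014 9,350,074 ratings \u2014 published 1997"
print(parse_rating_line(sample))
print(parse_rating_line("(Goodreads Author)"))
```

A `None` result could serve as the signal that a book still needs its rating fetched from its own page.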

The solution I found was to use urllib3 and BeautifulSoup4 to go to the URL of every book in 'books' which did not have the proper ratings information stored, and to scrape this information from the book's own page rather than from the fantasy 'shelf'. As going to each book's individual URL does not require either logging in or paging through a list of pages, I chose BeautifulSoup4 over Selenium for this task.

In [6]:
from bs4 import BeautifulSoup
import urllib3
from time import sleep
from random import random


# Allows for arbitrary requests while transparently keeping track of necessary connection pools for you.
http = urllib3.PoolManager()

# Keep track of how many books left to go... (total number of books is 1250)
x=0

# Create a function to scrape the rating from the book's URL: try to access the rating using HTML elems with different classes
def scrapeRating(book):
    rating = None
    # Try to scrape up to 3 times, as a request sometimes fails to return the rating elements for no obvious reason
    for _ in range(3):
        # Pause for a random amount of time to stop Goodreads thinking you're a bot
        sleep(random() * 3)
        # Get HTTP request from the book's stored URL
        response = http.request('GET', book['link'])
        # Get soup object from http response
        soup = BeautifulSoup(response.data, 'lxml')
        # Search soup object for divs with the class 'RatingStatistics__rating', which should store the avg rating
        rating_divs = soup.find_all("div", {"class": "RatingStatistics__rating"})
        # If something was found, get the first div (after Inspecting, I found that this will store the rating)
        if len(rating_divs) > 0:
            rating = rating_divs[0].get_text()
            # Convert string to number
            return float(rating)
        # If nothing was found, try another div also storing rating, with the class 'RatingStatistics__column'
        else:
            rating_divs = soup.find_all("div", {"class": "RatingStatistics__column"})
            # If these elements were found, then get the first one, and extract the avg rating from the long string storing
            # lots of different information in a sentence (rating will be the fourth 'word' in the string)
            if len(rating_divs) > 0:
                # Convert string to number ('rating_divs', not the undefined name 'ratings')
                rating = float(rating_divs[0]['aria-label'].split(' ')[3])
                return rating
    # Returns None if none of the other scraping techniques worked
    return rating

# Iterate over books list:
for book in books:  
    # Keep track of how many books left to go... (total number of books is 1250)
    print(x)
    # Check if the avg_rating for a book entry has not been properly scraped by checking 'rating' key
    if 'avg rating' not in book['rating']:
        book['rating'] = scrapeRating(book)
    x+=1

0
1
2
...

In [7]:
books

[{'title': 'Harry Potter and the Philosopher’s Stone (Harry Potter, #1)',
  'link': 'https://www.goodreads.com/book/show/72193.Harry_Potter_and_the_Philosopher_s_Stone',
  'author': 'J.K. Rowling',
  'rating': 'avg rating 4.47 — 9,350,074 ratings — published 1997'},
 {'title': 'Harry Potter and the Chamber of Secrets (Harry Potter, #2)',
  'link': 'https://www.goodreads.com/book/show/15881.Harry_Potter_and_the_Chamber_of_Secrets',
  'author': 'J.K. Rowling',
  'rating': 'avg rating 4.43 — 3,628,391 ratings — published 1998'},
 {'title': 'Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)',
  'link': 'https://www.goodreads.com/book/show/5.Harry_Potter_and_the_Prisoner_of_Azkaban',
  'author': 'J.K. Rowling',
  'rating': 'avg rating 4.58 — 3,844,071 ratings — published 1999'},
 {'title': 'The Hobbit (The Lord of the Rings, #0)',
  'link': 'https://www.goodreads.com/book/show/5907.The_Hobbit',
  'author': 'J.R.R. Tolkien',
  'rating': 'avg rating 4.28 — 3,692,122 ratings — publis

In [65]:
# Unfortunately, some 'ratings' fields are still empty (None) because the elements could not be scraped.
# Therefore, I decided to create another list storing the URL/links of the books which I could not scrape the web rating for.
# Then, I will manually enter the ratings using this list of book URLs into the books list
book_urls = []

for book in books:
    if book['rating'] is None:
        book_urls.append(book['link'])

# 132 books do not have the rating --> I have to add it manually :(
print(len(book_urls))

# This took hours and hours... but I had no other option than to enter the rating manually, as this list of books
# with missing ratings included some very important popular fantasy novels (e.g. 'The House in the Cerulean Sea', 'Fairy Tale')
for book in books:
    if book['link'] == 'https://www.goodreads.com/book/show/23437156-six-of-crows':
        book['rating'] = 4.50
    if book['link'] == 'https://www.goodreads.com/book/show/68428.The_Final_Empire':
        book['rating'] = 4.47
    if book['link'] == 'https://www.goodreads.com/book/show/26032825-the-cruel-prince':
        book['rating'] = 4.07
    if book['link'] == 'https://www.goodreads.com/book/show/50659468-a-court-of-mist-and-fury':
        book['rating'] = 4.63
    if book['link'] == 'https://www.goodreads.com/book/show/50659472-a-court-of-wings-and-ruin':
        book['rating'] = 4.45
    if book['link'] == 'https://www.goodreads.com/book/show/77197.Assassin_s_Apprentice':
        book['rating'] = 4.17
    if book['link'] == 'https://www.goodreads.com/book/show/22328546-red-queen':
        book['rating'] = 4.02
    if book['link'] == 'https://www.goodreads.com/book/show/14061957-ruin-and-rising':
        book['rating'] = 4.01
    if book['link'] == 'https://www.goodreads.com/book/show/1582996.City_of_Ashes':
        book['rating'] = 4.12
    if book['link'] == 'https://www.goodreads.com/book/show/45047384-the-house-in-the-cerulean-sea':
        book['rating'] = 4.43
    if book['link'] == 'https://www.goodreads.com/book/show/27774758-an-ember-in-the-ashes':
        book['rating'] = 4.26
    if book['link'] == 'https://www.goodreads.com/book/show/8490112-daughter-of-smoke-bone':
        book['rating'] = 3.99
    if book['link'] == 'https://www.goodreads.com/book/show/40275288-the-priory-of-the-orange-tree':
        book['rating'] = 4.23
    if book['link'] == 'https://www.goodreads.com/book/show/27883214-caraval':
        book['rating'] = 4.00
    if book['link'] == 'https://www.goodreads.com/book/show/8667848-a-discovery-of-witches':
        book['rating'] = 4.02
    if book['link'] == 'https://www.goodreads.com/book/show/53138095-a-court-of-silver-flames':
        book['rating'] = 4.44
    if book['link'] == 'https://www.goodreads.com/book/show/32718027-the-city-of-brass':
        book['rating'] = 4.15
    if book['link'] == 'https://www.goodreads.com/book/show/10025305-clockwork-prince':
        book['rating'] = 4.43
    if book['link'] == 'https://www.goodreads.com/book/show/18335634-clockwork-princess':
        book['rating'] = 4.56
    if book['link'] == 'https://www.goodreads.com/book/show/25526296-every-heart-a-doorway':
        book['rating'] = 3.82 
    if book['link'] == 'https://www.goodreads.com/book/show/12127750-the-mark-of-athena':
        book['rating'] = 4.47
    if book['link'] == 'https://www.goodreads.com/book/show/18798983-the-wrath-and-the-dawn':
        book['rating'] = 4.07
    if book['link'] == 'https://www.goodreads.com/book/show/18705209-the-blood-of-olympus':
        book['rating'] = 4.43
    if book['link'] == 'https://www.goodreads.com/book/show/3428935-the-warded-man':
        book['rating'] = 4.25
    if book['link'] == 'https://www.goodreads.com/book/show/22864842-the-queen-of-the-tearling':
        book['rating'] = 3.99  
    if book['link'] == 'https://www.goodreads.com/book/show/6644117-the-iron-king':
        book['rating'] = 3.88
    if book['link'] == 'https://www.goodreads.com/book/show/47624.Lirael':
        book['rating'] = 4.28
    if book['link'] == 'https://www.goodreads.com/book/show/9317452-rivers-of-london':
        book['rating']  = 3.87
    if book['link'] == 'https://www.goodreads.com/book/show/6339664-hush-hush':
        book['rating']  = 3.93
    if book['link'] == 'https://www.goodreads.com/book/show/25895524-red-sister':
        book['rating'] = 4.17
    if book['link'] == 'https://www.goodreads.com/book/show/3682.A_Great_and_Terrible_Beauty':
        book['rating'] = 3.79
    if book['link'] == 'https://www.goodreads.com/book/show/55987278-once-upon-a-broken-heart':
        book['rating'] = 4.13
    if book['link'] == 'https://www.goodreads.com/book/show/104089.Tigana':
        book['rating'] = 4.09
    if book['link'] == 'https://www.goodreads.com/book/show/13569581-blood-song':
        book['rating'] = 4.42
    if book['link'] == 'https://www.goodreads.com/book/show/17182126-steelheart':
        book['rating'] = 4.14
    if book['link'] == 'https://www.goodreads.com/book/show/6342491-the-demon-king':
        book['rating'] = 4.15
    if book['link'] == 'https://www.goodreads.com/book/show/53439886-how-the-king-of-elfhame-learned-to-hate-stories':
        book['rating'] = 4.22
    if book['link'] == 'https://www.goodreads.com/book/show/53457092-six-crimson-cranes':
        book['rating'] = 4.24
    if book['link'] == 'https://www.goodreads.com/book/show/36329818-legendary':
        book['rating'] = 4.21
    if book['link'] == 'https://www.goodreads.com/book/show/12954620-falling-kingdoms':
        book['rating'] = 3.78
    if book['link'] == 'https://www.goodreads.com/book/show/15790883-promise-of-blood':
        book['rating'] = 4.13
    if book['link'] == 'https://www.goodreads.com/book/show/30841984-kings-of-the-wyld':
        book['rating'] = 4.28
    if book['link'] == 'https://www.goodreads.com/book/show/24934065-rebel-of-the-sands':
        book['rating'] = 3.96
    if book['link'] == 'https://www.goodreads.com/book/show/45102.Ship_of_Destiny':
        book['rating'] = 4.27
    if book['link'] == 'https://www.goodreads.com/book/show/22443261-the-rithmatist':
        book['rating'] = 4.25
    if book['link'] == 'https://www.goodreads.com/book/show/23308084-the-rose-the-dagger':
        book['rating'] = 4.05
    if book['link'] == 'https://www.goodreads.com/book/show/30809786-a-reaper-at-the-gates':
        book['rating'] = 4.18
    if book['link'] == 'https://www.goodreads.com/book/show/10611.The_Eyes_of_the_Dragon':
        book['rating'] = 3.94
    if book['link'] == 'https://www.goodreads.com/book/show/68494.Perdido_Street_Station':
        book['rating'] = 3.97
    if book['link'] == 'https://www.goodreads.com/book/show/17699853-chain-of-gold':
        book['rating'] = 4.42
    if book['link'] == 'https://www.goodreads.com/book/show/51190882-the-empress-of-salt-and-fortune':
        book['rating'] = 3.99
    if book['link'] == 'https://www.goodreads.com/book/show/55401.Deadhouse_Gates':
        book['rating'] = 4.26
    if book['link'] == 'https://www.goodreads.com/book/show/56978100-the-girl-who-fell-beneath-the-sea':
        book['rating'] = 4.15
    if book['link'] == 'https://www.goodreads.com/book/show/2986865-eon':
        book['rating'] = 3.96
    if book['link'] == 'https://www.goodreads.com/book/show/59263.The_Golem_s_Eye':
        book['rating'] = 4.11
    if book['link'] == 'https://www.goodreads.com/book/show/30183.Marked':
        book['rating'] = 3.81
    if book['link'] == 'https://www.goodreads.com/book/show/110494.Living_Dead_in_Dallas':
        book['rating'] = 3.96
    if book['link'] == 'https://www.goodreads.com/book/show/2802316-shadow-kiss':
        book['rating'] = 4.34
    if book['link'] == 'https://www.goodreads.com/book/show/60177373-fairy-tale':
        book['rating'] = 4.15
    if book['link'] == 'https://www.goodreads.com/book/show/23127048-air-awakens':
        book['rating'] = 4.01
    if book['link'] == 'https://www.goodreads.com/book/show/153784.First_Test':
        book['rating'] = 4.24
    if book['link'] == 'https://www.goodreads.com/book/show/6585201-changes':
        book['rating'] = 4.50
    if book['link'] == 'https://www.goodreads.com/book/show/40158.The_Queen_of_Attolia':
        book['rating'] = 4.17
    if book['link'] == 'https://www.goodreads.com/book/show/29394.Cursor_s_Fury':
        book['rating'] = 4.36
    if book['link'] == 'https://www.goodreads.com/book/show/27366528-beneath-the-sugar-sky':
        book['rating'] = 3.86
    if book['link'] == 'https://www.goodreads.com/book/show/34219873-the-trials-of-morrigan-crow':
        book['rating'] = 4.34
    if book['link'] == 'https://www.goodreads.com/book/show/24094.Wolf_Speaker':
        book['rating'] = 4.20
    if book['link'] == 'https://www.goodreads.com/book/show/58293924-book-of-night':
        book['rating'] = 3.54
    if book['link'] == 'https://www.goodreads.com/book/show/13833.Emperor_Mage':
        book['rating'] = 4.28
    if book['link'] == 'https://www.goodreads.com/book/show/53375824-lore':
        book['rating'] = 3.78
    if book['link'] == 'https://www.goodreads.com/book/show/8058301-ghost-story':
        book['rating'] = 4.25
    if book['link'] == 'https://www.goodreads.com/book/show/12891107-king-of-thorns':
        book['rating'] = 4.19
    if book['link'] == 'https://www.goodreads.com/book/show/91989.Black_Powder_War':
        book['rating'] = 3.89
    if book['link'] == 'https://www.goodreads.com/book/show/20578940-the-iron-trial':
        book['rating'] = 3.95
    if book['link'] == 'https://www.goodreads.com/book/show/6479259-spirit-bound':
        book['rating'] = 4.33
    if book['link'] == 'https://www.goodreads.com/book/show/30095464-the-bone-witch':
        book['rating'] = 3.70
    if book['link'] == 'https://www.goodreads.com/book/show/74270.Luck_in_the_Shadows':
        book['rating'] = 4.09
    if book['link'] == 'https://www.goodreads.com/book/show/40550366-blood-honey':
        book['rating'] = 3.54
    if book['link'] == 'https://www.goodreads.com/book/show/23301545-the-sleeper-and-the-spindle':
        book['rating'] = 3.89
    if book['link'] == 'https://www.goodreads.com/book/show/42201962-the-deep':
        book['rating'] = 3.79
    if book['link'] == 'https://www.goodreads.com/book/show/46261182-the-awakening':
        book['rating'] = 3.92
    if book['link'] == 'https://www.goodreads.com/book/show/22522805-the-buried-giant':
        book['rating'] = 3.56
    if book['link'] == 'https://www.goodreads.com/book/show/11774272-the-killing-moon':
        book['rating'] = 3.95
    if book['link'] == 'https://www.goodreads.com/book/show/34197390-trickster-s-queen':
        book['rating'] = 4.28
    if book['link'] == 'https://www.goodreads.com/book/show/30145666-the-dark-prophecy':
        book['rating'] = 4.17
    if book['link'] == 'https://www.goodreads.com/book/show/20443207-the-winner-s-crime':
        book['rating'] = 4.10
    if book['link'] == 'https://www.goodreads.com/book/show/52504334-a-master-of-djinn':
        book['rating'] = 4.06
    if book['link'] == 'https://www.goodreads.com/book/show/23444482-the-traitor-baru-cormorant':
        book['rating'] = 4.06
    if book['link'] == 'https://www.goodreads.com/book/show/16303287-the-bane-chronicles':
        book['rating'] = 4.10
    if book['link'] == 'https://www.goodreads.com/book/show/10381195-moon-over-soho':
        book['rating'] = 4.09
    if book['link'] == 'https://www.goodreads.com/book/show/61198133-the-stolen-heir':
        book['rating'] = 4.07
    if book['link'] == 'https://www.goodreads.com/book/show/29008738-the-bird-and-the-sword':
        book['rating'] = 4.20
    if book['link'] == 'https://www.goodreads.com/book/show/112750.Darkfever':
        book['rating'] = 4.06
    if book['link'] == 'https://www.goodreads.com/book/show/31520883-a-sky-beyond-the-storm':
        book['rating'] = 4.33
    if book['link'] == 'https://www.goodreads.com/book/show/25100.The_Sandman_Vol_3':
        book['rating'] = 4.24
    if book['link'] == 'https://www.goodreads.com/book/show/169875.Searching_for_Dragons':
        book['rating'] = 4.27
    if book['link'] == 'https://www.goodreads.com/book/show/20443235-the-winner-s-kiss':
        book['rating'] = 4.22
    if book['link'] == 'https://www.goodreads.com/book/show/49826643-daughter-of-no-worlds':
        book['rating'] = 4.12
    if book['link'] == 'https://www.goodreads.com/book/show/1421990.Halfway_to_the_Grave':
        book['rating'] = 4.13
    if book['link'] == 'https://www.goodreads.com/book/show/34196663-vow-of-thieves':
        book['rating'] = 4.36
    if book['link'] == 'https://www.goodreads.com/book/show/393146.The_Naming':
        book['rating'] = 4.03
    if book['link'] == 'https://www.goodreads.com/book/show/9461562-the-cloud-roads':
        book['rating'] = 3.97
    if book['link'] == 'https://www.goodreads.com/book/show/92855.First_King_of_Shannara':
        book['rating'] = 3.97
    if book['link'] == 'https://www.goodreads.com/book/show/7488244-unearthly':
        book['rating'] = 4.01
    if book['link'] == 'https://www.goodreads.com/book/show/21570318-crimson-bound':
        book['rating'] = 3.63
    if book['link'] == 'https://www.goodreads.com/book/show/25740412-the-black-witch':
        book['rating'] = 4.11
    if book['link'] == 'https://www.goodreads.com/book/show/13316328-the-last-dragonslayer':
        book['rating'] = 3.87
    if book['link'] == 'https://www.goodreads.com/book/show/4703427-dust-of-dreams':
        book['rating'] = 4.3
    if book['link'] == 'https://www.goodreads.com/book/show/56980403-vespertine':
        book['rating'] = 4.13
    if book['link'] == 'https://www.goodreads.com/book/show/41716919-jade-war':
        book['rating'] = 4.44
    if book['link'] == 'https://www.goodreads.com/book/show/17332556-the-burning-sky':
        book['rating'] = 3.91
    if book['link'] == 'https://www.goodreads.com/book/show/24641800-the-demon-in-the-wood':
        book['rating'] = 3.93
    if book['link'] == 'https://www.goodreads.com/book/show/8559047-magic-slays':
        book['rating'] = 4.40
    if book['link'] == 'https://www.goodreads.com/book/show/68497.The_Scar':
        book['rating'] = 4.18
    if book['link'] == 'https://www.goodreads.com/book/show/116563.So_You_Want_to_Be_a_Wizard':
        book['rating'] = 3.84
    if book['link'] == 'https://www.goodreads.com/book/show/20168816-rogues':
        book['rating'] = 3.88
    if book['link'] == 'https://www.goodreads.com/book/show/13262783-every-day':
        book['rating'] = 3.91
    if book['link'] == 'https://www.goodreads.com/book/show/32667458-the-last-namsara':
        book['rating'] = 4.08
    if book['link'] == 'https://www.goodreads.com/book/show/13415554-the-assassin-and-the-pirate-lord':
        book['rating'] = 4.21
    if book['link'] == 'https://www.goodreads.com/book/show/56530123-glint':
        book['rating'] = 4.20
    if book['link'] == 'https://www.goodreads.com/book/show/8447255-the-crippled-god':
        book['rating'] = 4.50
    if book['link'] == 'https://www.goodreads.com/book/show/60766189-a-day-of-fallen-night':
        book['rating'] = 4.49
    if book['link'] == 'https://www.goodreads.com/book/show/7514925-tiger-lily':
        book['rating'] = 3.96
    if book['link'] == 'https://www.goodreads.com/book/show/30238163-ace-of-shades':
        book['rating'] = 3.85
    if book['link'] == 'https://www.goodreads.com/book/show/7735333-matched':
        book['rating'] = 3.63
    if book['link'] == 'https://www.goodreads.com/book/show/12751687-finale':
        book['rating'] = 4.13
    if book['link'] == 'https://www.goodreads.com/book/show/44774415-mooncakes':
        book['rating'] = 3.83
    if book['link'] == 'https://www.goodreads.com/book/show/290628.The_Darkest_Road':
        book['rating'] = 4.20
    if book['link'] == 'https://www.goodreads.com/book/show/23299513-the-shadow-queen':
        book['rating'] = 3.74
    if book['link'] == 'https://www.goodreads.com/book/show/40291564-storm-and-fury':
        book['rating'] = 4.09
    if book['link'] == 'https://www.goodreads.com/book/show/42133479-wicked-fox':
        book['rating'] = 3.75
    if book['link'] == 'https://www.goodreads.com/book/show/25103.The_Sandman_Vol_8':
        book['rating'] = 4.45


0
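
The chain of link checks above assigns one manually looked-up rating per URL. As a rough sketch of a more compact alternative, the same corrections could be kept in a dictionary keyed by link and applied in a single loop. The names `manual_ratings` and `apply_manual_ratings` are hypothetical (they do not appear in the notebook), only three of the entries above are reproduced, and the second sample book uses a made-up placeholder link.

```python
# Hypothetical mapping from Goodreads link to the manually looked-up rating.
# Only three entries are copied from the checks above; a full table would list them all.
manual_ratings = {
    'https://www.goodreads.com/book/show/104089.Tigana': 4.09,
    'https://www.goodreads.com/book/show/47624.Lirael': 4.28,
    'https://www.goodreads.com/book/show/6585201-changes': 4.50,
}

def apply_manual_ratings(books, ratings_by_link):
    """Overwrite the rating of every book whose link has a manual entry.

    Returns the number of books that were updated.
    """
    updated = 0
    for book in books:
        if book['link'] in ratings_by_link:
            book['rating'] = ratings_by_link[book['link']]
            updated += 1
    return updated

# Small stand-in for the scraped list (the second link is a made-up placeholder)
sample_books = [
    {'title': 'Tigana', 'link': 'https://www.goodreads.com/book/show/104089.Tigana', 'rating': None},
    {'title': 'Some other book', 'link': 'https://www.goodreads.com/book/show/0-placeholder', 'rating': 4.0},
]
updated_count = apply_manual_ratings(sample_books, manual_ratings)
print(updated_count, sample_books[0]['rating'])
```

Keeping the corrections in one dictionary means a new manual rating is a single added line, and books without an entry are left untouched.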


In [66]:
books

[{'title': 'Harry Potter and the Philosopher’s Stone (Harry Potter, #1)',
  'link': 'https://www.goodreads.com/book/show/72193.Harry_Potter_and_the_Philosopher_s_Stone',
  'author': 'J.K. Rowling',
  'rating': 'avg rating 4.47 — 9,350,074 ratings — published 1997'},
 {'title': 'Harry Potter and the Chamber of Secrets (Harry Potter, #2)',
  'link': 'https://www.goodreads.com/book/show/15881.Harry_Potter_and_the_Chamber_of_Secrets',
  'author': 'J.K. Rowling',
  'rating': 'avg rating 4.43 — 3,628,391 ratings — published 1998'},
 {'title': 'Harry Potter and the Prisoner of Azkaban (Harry Potter, #3)',
  'link': 'https://www.goodreads.com/book/show/5.Harry_Potter_and_the_Prisoner_of_Azkaban',
  'author': 'J.K. Rowling',
  'rating': 'avg rating 4.58 — 3,844,071 ratings — published 1999'},
 {'title': 'The Hobbit (The Lord of the Rings, #0)',
  'link': 'https://www.goodreads.com/book/show/5907.The_Hobbit',
  'author': 'J.R.R. Tolkien',
  'rating': 'avg rating 4.28 — 3,692,122 ratings — publis

In [80]:
# All rating keys now have values which are either floats or include 'avg rating' format
for book in books:
    if type(book['rating']) != float:
        if 'avg rating' not in book['rating']:
            print(book['rating'])

Now that every book has a rating filled in, the data needs two further cleaning steps:
1. As many fantasy novels are part of a series, we store the name of the series instead of the individual book title (e.g. 'The Kingkiller Chronicle' instead of 'The Name of the Wind'). We then keep only one book from each series: the one that received the highest rating. For the analysis of the most popular sub-genres/themes in fantasy fiction, we can assume that installments in the same series overlap heavily in the topics/creatures/myths they include, so one book per series is enough.
<br><br>
2. Books whose 'rating' is stored in a format such as 'avg rating 3.50 — 951,841 ratings — published 2016' need this value processed so that it contains only the floating-point rating.
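
As an illustration of the two steps, here is a minimal sketch that uses regular expressions for both. The function names `parse_rating` and `series_or_title` and the two patterns are assumptions made for this example, not the notebook's own helpers, and the sketch strips surrounding whitespace from the result.

```python
import re

# 'avg rating 3.50 ...' -> capture the decimal number after the words 'avg rating'
RATING_PATTERN = re.compile(r'avg rating (\d+\.\d+)')
# '... (Discworld, #28)' -> capture the series name before ', #<number>' in the final parentheses
SERIES_PATTERN = re.compile(r'\(([^()]*?),\s*#[^()]*\)\s*$')

def parse_rating(rating_info):
    """Return the average rating as a float, or None if the text has no 'avg rating' part."""
    match = RATING_PATTERN.search(rating_info)
    return float(match.group(1)) if match else None

def series_or_title(title):
    """Return the series name if the title names one, else the title without any '(...)' suffix."""
    match = SERIES_PATTERN.search(title)
    if match:
        return match.group(1).strip()
    return title.split('(')[0].strip()

print(parse_rating('avg rating 3.50 — 951,841 ratings — published 2016'))
print(series_or_title('The Amazing Maurice and His Educated Rodents (Discworld, #28)'))
print(series_or_title('The Hobbit (The Lord of the Rings, #0)'))
```

Returning `None` for an unparseable rating, rather than raising, makes it easy to list the books that still need a manual value.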

In [96]:
# How to extract text from between parentheses using regular expressions
# Ref: https://www.geeksforgeeks.org/python-extract-substrings-between-brackets/
import re
# Enables saving the original list/dictionary as a backup
from copy import deepcopy

# Extracts the title of book *series* from the whole title string
def findBookSeries(title):
    # If the title has a '#' symbol inside it on Goodreads, it means it is part of a series
    # E.g. 'The Amazing Maurice and His Educated Rodents (Discworld, #28)' --> let us store this just as 'Discworld'
    if '#' in title:
        series_name = re.findall(r'\(.*?\)', title)[0]
        # Get the series name before the ', #num' part
        # Get rid of the first '('/parenthesis with [1:]
        series_name = series_name.split(',')[0][1:]
        return series_name
    else:
        return title

# Remove all parentheses now series have been extracted, so we don't get duplicate titles with (Paperback) or (Hardback) after
def removeEverythingAfterParentheses(title):
    return title.split('(')[0]
    

# Extracts just the rating from the book information which is formatted as 'avg rating 3.97 — 27,632 ratings — published 1990'
def returnRating(rating_info):
    # Split on whitespace: the tokens are ['avg', 'rating', '3.97', ...], so the rating is the third token
    avg_rating = rating_info.split()[2]
    print(avg_rating)
    return float(avg_rating)

# This produces a fully independent copy from the original list (copies are made, not references to the original objects)
books_cleaned = deepcopy(books)

for book in books_cleaned:
    book['title'] = findBookSeries(book['title'])
    if '(' in book['title']:
        book['title'] = removeEverythingAfterParentheses(book['title'])
    # First check if rating is in the form 'avg rating 3.97 — 27,632 ratings — published 1990'
    if isinstance(book['rating'], str) and 'avg rating' in book['rating']:
        book['rating'] = returnRating(book['rating'])
            
print(books_cleaned)
    

4.47
4.43
4.58
4.28
4.56
4.50
4.58
4.62
4.44
4.38
4.41
4.47
4.56
4.23
4.54
4.15
4.02
4.33
3.64
4.07
4.25
4.52
3.50
4.07
4.00
4.26
4.01
4.29
4.14
4.33
4.14
3.99
4.21
4.05
3.58
3.72
3.92
4.10
3.73
4.20
3.98
4.26
4.09
3.99
4.04
4.27
3.91
3.85
4.27
4.31
3.96
4.07
4.01
4.27
4.24
3.86
4.10
4.23
3.54
4.24
4.12
3.96
4.13
4.05
4.17
4.33
3.99
4.23
4.06
4.28
4.33
4.10
3.92
4.34
3.96
4.18
4.06
4.33
3.79
3.99
4.15
4.16
4.12
4.26
4.41
4.18
4.32
4.46
4.09
3.68
4.24
4.16
4.31
4.01
3.89
4.28
3.90
4.55
4.00
4.02
4.40
4.28
4.25
4.12
4.01
4.39
4.04
3.95
3.92
4.26
4.49
4.35
4.24
4.07
4.08
4.10
4.01
4.17
4.06
4.14
4.14
4.13
4.31
3.98
4.05
4.23
4.26
4.06
4.05
3.78
4.19
4.17
4.24
3.97
3.96
4.23
3.96
3.77
4.16
4.19
4.00
3.89
4.28
4.21
4.15
4.27
4.29
4.34
4.21
3.85
4.00
4.22
4.17
4.31
4.11
4.15
3.85
4.25
4.16
4.05
4.19
4.02
4.22
4.11
4.12
4.19
3.88
4.18
4.24
4.15
4.05
4.01
4.06
3.93
4.08
4.21
3.75
4.10
3.99
4.05
4.17
4.10
4.14
4.54
3.87
3.99
4.14
3.98
4.06
4.10
4.13
4.19
3.99
4.74
4.23
4.35
4.13
4.08
4.12
4.13


In [105]:
# We have done quite a lot of work by now, by finally scraping all 1250 books (which took hours),
# then manually filling in unsuccessfully scraped rating values, then cleaning the title/rating fields...
# so we will save the data we are working with thus far!
import csv 

filename = 'scraped_books17june2023.csv'

def saveBooksToCsv(fn, books, header):
    # Open (or overwrite) the file named by fn and write one row per book
    with open(fn, 'w', encoding="utf-8", newline='') as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        for row in books:
            writer.writerow(row)



In [None]:
header = ["title", "link", "author", "rating"]
saveBooksToCsv(filename, books_cleaned, header)

In [2]:
# Code to open the books data from CSV file

import csv
filename = "fantasyBooksFromGoodreads.csv"

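loaded_books_from_file = []
def loadBooksFromFile(filename, books_list):
    # Read with the same encoding that saveBooksToCsv uses when writing
    with open(filename, 'r', encoding="utf-8", newline='') as data:
        reader = csv.DictReader(data)
        for book in reader:
            # csv returns every field as a string, so convert the rating back to a float
            book['rating'] = float(book['rating'])
            books_list.append(book)

loadBooksFromFile(filename, loaded_books_from_file)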
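
One detail of the save/load cycle is easy to miss: the `csv` module writes a float rating as text and reads every field back as a string, so the rating has to be converted again after loading. The sketch below shows this round trip using an in-memory buffer instead of a file on disk; the single sample row and the variable names are illustrative only.

```python
import csv
import io

header = ["title", "link", "author", "rating"]
books_to_save = [
    {"title": "Tigana",
     "link": "https://www.goodreads.com/book/show/104089.Tigana",
     "author": "Guy Gavriel Kay",
     "rating": 4.09},
]

# Write the rows to an in-memory text buffer (stands in for the CSV file)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=header)
writer.writeheader()
writer.writerows(books_to_save)

# Read the rows back: every value, including the rating, is now a string
buffer.seek(0)
raw_rows = list(csv.DictReader(buffer))
print(type(raw_rows[0]["rating"]).__name__)

# Convert the rating back to a float so it can be sorted and compared numerically
restored = [{**row, "rating": float(row["rating"])} for row in raw_rows]
print(restored == books_to_save)
```

Without the conversion, sorting by rating would compare strings rather than numbers.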


Now that the initial dataset storing book titles, authors, URLs and ratings from Goodreads has been scraped, the null values for the ratings have been manually filled in, and the data has been cleaned, we can begin working with pandas to see which are the 100 top-rated and 100 bottom-rated fantasy books on Goodreads.
<br> <br>
The *pandas* `drop_duplicates` function is really helpful here: we can sort the dataset by rating in descending order, then set the `subset` and `keep` parameters of `drop_duplicates` so that all rows with the same title (i.e. books in the same series) are dropped except the first one, which after sorting is the highest-rated book in the series.
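
To make the logic explicit, here is a plain-Python sketch of what sorting by rating and then keeping the first row per title amounts to. It is not the pandas implementation, and the helper name `keep_best_per_title` is hypothetical. The sample ratings are the ones shown for the first four books earlier, with titles already replaced by series names.

```python
def keep_best_per_title(books):
    """Return one book per title: the highest-rated one, ordered from best to worst."""
    # Step 1: sort from highest to lowest rating (like sort_values(ascending=False))
    ranked = sorted(books, key=lambda book: book['rating'], reverse=True)
    # Step 2: keep only the first occurrence of each title (like drop_duplicates(keep='first'))
    seen_titles = set()
    best = []
    for book in ranked:
        if book['title'] not in seen_titles:
            seen_titles.add(book['title'])
            best.append(book)
    return best

sample = [
    {'title': 'Harry Potter', 'rating': 4.47},
    {'title': 'Harry Potter', 'rating': 4.43},
    {'title': 'Harry Potter', 'rating': 4.58},
    {'title': 'The Lord of the Rings', 'rating': 4.28},
]
best_per_series = keep_best_per_title(sample)
print(best_per_series)
```

Because the list is sorted before duplicates are removed, "first" and "highest-rated" coincide, which is the property the pandas call relies on.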

In [13]:
import pandas as pd
# Show up to 150 rows when displaying a DataFrame (enough for the 150 top- and 150 lowest-rated books)
pd.set_option('display.max_rows', 150)

df = pd.DataFrame(loaded_books_from_file)
# Sort from highest to lowest
df2 = df.sort_values(by='rating', ascending=False)
# Delete all instances of a book series except highest-rated book
df3 = df2.drop_duplicates(subset='title', keep='first')

# 'Lord of the Rings' appears under several slightly different title strings (but it's the same book),
# so drop every such entry rated below the top-rated one (4.61)
lotr_duplicates = df3[df3['title'].str.contains('Lord of the Rings')]
lotr_duplicates_to_delete = lotr_duplicates[lotr_duplicates['rating'] < 4.61]
list_to_delete1 = lotr_duplicates_to_delete.index.values.tolist() # rows at index 13 and index 41 are duplicates of the first "Lord of the Rings" entry and should be dropped
df4 = df3.drop(list_to_delete1)

# Drop all instances of Sandman comics except first one by Neil Gaiman, so all rated less than 4.55 should be dropped
# Build the mask from df4 itself: indexing df4 with a mask built from df3 makes pandas warn about reindexing
sandman_duplicates = df4[df4['title'].str.contains('Sandman')]
sandman_duplicates_to_delete = sandman_duplicates[sandman_duplicates['rating'] < 4.55]
list_to_delete2 = sandman_duplicates_to_delete.index.values.tolist()
df5 = df4.drop(list_to_delete2)

# Get 150 top-rated books
top150books_df = df5.head(150)

# Get 150 worst-rated books (df5 is still sorted from highest to lowest, so these are the last rows)
lowest150books_df = df5.tail(150)

# Convert the pd DataFrames for 150 highest and lowest rated books to a list of dicts, 1 dict = 1 book
top150books_dict = top150books_df.to_dict('records')
lowest150books_dict = lowest150books_df.to_dict('records')

# Save both lists as CSV files
header = ["title", "link", "author", "rating"]
saveBooksToCsv('top150books.csv', top150books_dict, header)
saveBooksToCsv('lowest150books.csv', lowest150books_dict, header)


In [38]:
# Now that we have lists of dictionaries storing the title, author, link and rating for the 150 best- and worst-rated books,
# we can run more web scraping code to get the top 10 genres for each book in these lists.
# The web scraping code for genres is quite complicated to run, as it involves automating a browser with Selenium
# and clicking on multiple buttons one after another to finally reach all the genres that readers have categorized
# the book as. This can take many hours, so I wanted to scrape this data only for the 150 best- and 150 lowest-rated
# books first, as these are the books we are interested in analyzing, and it takes less time to scrape the data
# for 300 books than it does for 1250 books!
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
#from selenium.webdriver.common.keys import Keys # Allows logging in to Goodreads (sending username + password) via chromeDriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By 
from time import sleep
from random import random
from selenium.common.exceptions import NoSuchElementException

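# Create the programmable browser
# Raw string, so the backslashes in the Windows path are not treated as escape sequences
pathToChromeDriver = r'C:\Program Files\Chrome Driver\chromedriver.exe'
service = Service(pathToChromeDriver)
options = Options()
browser = webdriver.Chrome(service=service, options=options)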

def findGenresLinkForBook(browser, bookURL):
    browser.get(bookURL)
    btons = browser.find_elements(By.CLASS_NAME, "Button--tag-inline")
    # We want to find the button with class name "Button--tag-inline" which has an 'aria-label' attribute
    # with the content: "Show all items in the list"
    for btn in btons:
        if btn.get_attribute('aria-label') == "Show all items in the list":
            # Click the correct button
            # btn.click() --> does not work: ElementClickInterceptedException 
            # Fix - Ref: https://stackoverflow.com/questions/37879010/selenium-debugging-element-is-not-clickable-at-point-x-y
            browser.execute_script("arguments[0].click();", btn)
            # Sleep until next action
            sleep(random() * 2)
            # Now we need to find another btn... the one which says "...show all", and is hidden until the previous bton is clicked
            btons2 = browser.find_elements(By.CLASS_NAME, "Button--tag-inline")
            for btn2 in btons2:
                # The aria-label for the "show all" link is as written below
                if btn2.get_attribute('aria-label') == "Tap to show all top genres for this book":
                    # Return the URL link to the genre-page for the book from the "show all" link/button
                    book_genres_url = btn2.get_attribute('href')
                    print(book_genres_url)
                    return book_genres_url

def getGenresPageForBooks(browser, books):
    for book in books:
        # Up to 3 attempts, in case the elements are not returned the first time
        for i in range(0, 3):
            # Store the result under the 'genres_link' key (created if it doesn't exist yet)
            book['genres_link'] = findGenresLinkForBook(browser, book['link'])
            sleep(random() * 3)
            # Stop retrying as soon as a link has been found
            if book['genres_link'] is not None:
                break

    
getGenresPageForBooks(browser, top150books_dict)

# Get all the elements with class 'left' (the class for the div storing each book's information)
#book_divs = browser.find_elements(By.CLASS_NAME, "left")

https://www.goodreads.com/work/shelves/16482835-words-of-radiance
https://www.goodreads.com/work/shelves/96945623-fourth-wing-the-empyrean-1
https://www.goodreads.com/work/shelves/44600531-assassin-s-fate
https://www.goodreads.com/work/shelves/25272014-kingdom-of-ash
https://www.goodreads.com/work/shelves/89369-j-r-r-tolkien-4-book-boxed-set-the-hobbit-and-the-lord-of-the-rings
https://www.goodreads.com/work/shelves/42090179-crooked-kingdom
https://www.goodreads.com/work/shelves/6309710-clockwork-princess
https://www.goodreads.com/work/shelves/23811929-skin-game
https://www.goodreads.com/work/shelves/21539506-the-house-of-hades
https://www.goodreads.com/work/shelves/93481684-the-ballad-of-never-after
https://www.goodreads.com/work/shelves/40666827-the-sandman-vol-7-brief-lives
https://www.goodreads.com/work/shelves/10558806-a-memory-of-light
https://www.goodreads.com/work/shelves/95396305-tress-of-the-emerald-sea
https://www.goodreads.com/work/shelves/2502882-the-wise-man-s-fear
https:

https://www.goodreads.com/work/shelves/3797890-beyond-the-shadows
https://www.goodreads.com/work/shelves/781271-the-chronicles-of-narnia
https://www.goodreads.com/work/shelves/52062040-call-down-the-hawk
https://www.goodreads.com/work/shelves/24495476-murder-of-crows
https://www.goodreads.com/work/shelves/73710031-the-shadow-of-the-gods
https://www.goodreads.com/work/shelves/29234-by-the-sword
https://www.goodreads.com/work/shelves/2186426-homeland
https://www.goodreads.com/work/shelves/60177252-the-traitor-queen-the-bridge-kingdom-2-the-traitor-queen
https://www.goodreads.com/work/shelves/992628-the-princess-bride
https://www.goodreads.com/work/shelves/23840347-the-world-of-ice-and-fire-the-untold-history-of-westeros-and-the-game-o
https://www.goodreads.com/work/shelves/21979689-firefight
https://www.goodreads.com/work/shelves/1805413-legend
https://www.goodreads.com/work/shelves/90789229-babel-or-the-necessity-of-violence-an-arcane-history-of-the-oxford-tra
https://www.goodreads.com/

In [40]:
# Repeat to see if we can get the links that came back as 'None' the first time
for book in top150books_dict:
    if book['genres_link'] is None:
        book['genres_link'] = findGenresLinkForBook(browser, book['link'])


https://www.goodreads.com/work/shelves/2962492-harry-potter-series-box-set-harry-potter-1-7
https://www.goodreads.com/work/shelves/25126749-a-court-of-mist-and-fury
https://www.goodreads.com/work/shelves/4551489-the-last-olympian
https://www.goodreads.com/work/shelves/84765606-a-kingdom-of-flesh-and-fire
https://www.goodreads.com/work/shelves/68554383-our-violent-ends
https://www.goodreads.com/work/shelves/1742269-the-stand
https://www.goodreads.com/work/shelves/2792775-the-hunger-games
https://www.goodreads.com/work/shelves/46924845-tales-from-the-shadowhunter-academy


In [47]:
# All top books have genres link now!
# Do the same for lowest-rated books
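# Raw string, so the backslashes in the Windows path are not treated as escape sequences
pathToChromeDriver = r'C:\Program Files\Chrome Driver\chromedriver.exe'
service = Service(pathToChromeDriver)
options = Options()
browser = webdriver.Chrome(service=service, options=options)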
getGenresPageForBooks(browser, lowest150books_dict)


https://www.goodreads.com/work/shelves/6961290-forever
https://www.goodreads.com/work/shelves/80140444-only-a-monster
https://www.goodreads.com/work/shelves/2285229-the-city-of-ember
https://www.goodreads.com/work/shelves/2581-fairest
https://www.goodreads.com/work/shelves/41358233-the-sleeper-and-the-spindle
https://www.goodreads.com/work/shelves/58194201-the-merciful-crow
https://www.goodreads.com/work/shelves/24221752-snow-like-ashes
https://www.goodreads.com/work/shelves/27565413-rogues
https://www.goodreads.com/work/shelves/44080112-hunted
https://www.goodreads.com/work/shelves/53100842-ash-princess
https://www.goodreads.com/work/shelves/73806779-a-magic-steeped-in-poison
https://www.goodreads.com/work/shelves/13380425-the-last-dragonslayer
https://www.goodreads.com/work/shelves/57809962-witchmark
https://www.goodreads.com/work/shelves/3088909-the-ladies-of-grace-adieu-and-other-stories
https://www.goodreads.com/work/shelves/61870369-the-binding
https://www.goodreads.com/work/shel

https://www.goodreads.com/work/shelves/4021549-evermore
https://www.goodreads.com/work/shelves/2394716-gulliver-s-travels-into-several-remote-nations-of-the-world-in-four-par
https://www.goodreads.com/work/shelves/53292100-the-hazel-wood
https://www.goodreads.com/work/shelves/44916095-the-star-touched-queen
https://www.goodreads.com/work/shelves/57709985-wicked-saints
https://www.goodreads.com/work/shelves/74311549-the-wolf-and-the-woodsman
https://www.goodreads.com/work/shelves/1758112-confessions-of-an-ugly-stepsister
https://www.goodreads.com/work/shelves/91438756-book-of-night
https://www.goodreads.com/work/shelves/1479280-wicked-the-life-and-times-of-the-wicked-witch-of-the-west
https://www.goodreads.com/work/shelves/44394042-wintersong
https://www.goodreads.com/work/shelves/80304575-the-ex-hex
https://www.goodreads.com/work/shelves/2960716-shaman-s-crossing
https://www.goodreads.com/work/shelves/8624218-shades-of-milk-and-honey
https://www.goodreads.com/work/shelves/189503-beowul

In [50]:
# Repeat to see if we can get the links that came back as 'None' the first time
for book in lowest150books_dict:
    if book['genres_link'] is None:
        print(book)
        book['genres_link'] = findGenresLinkForBook(browser, book['link'])
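
The "try again if the result was `None`" pattern used in the cells above can be captured in one small helper. The sketch below is an assumption-labelled illustration rather than part of the notebook: `first_successful` and `flaky_lookup` are hypothetical names, and the flaky function only simulates a scraper that fails twice before returning a link, so no browser is needed to run it.

```python
from time import sleep

def first_successful(fetch, attempts=3, delay_seconds=0.0):
    """Call fetch() up to `attempts` times and return the first result that is not None."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        # Wait before the next try, but not after the final one
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None

# Simulated scraper: returns None on the first two calls, then a link
calls = {'count': 0}

def flaky_lookup():
    calls['count'] += 1
    if calls['count'] < 3:
        return None
    return 'https://www.goodreads.com/work/shelves/992628-the-princess-bride'

link = first_successful(flaky_lookup)
print(link, calls['count'])
```

In the notebook's setting, `fetch` could be a `lambda` wrapping `findGenresLinkForBook(browser, book['link'])`, with a small random delay between attempts as in the cells above.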

In [86]:
# Now write code to visit the Goodreads page for each book and get the top-10 genres listed for it
# On Goodreads, a "genre" refers to a shelf/list structure which a user has 'tagged' the book with.
# Therefore, some less-common shelves can be 'my-favourite-books' or 'cool-books' created by some user
# We are interested in the top 10 genres only, as they are the most widely recognized:
# many users have tagged the book with these genres/categories
# Raw string, so the backslashes in the Windows path are not treated as escape sequences
pathToChromeDriver = r'C:\Program Files\Chrome Driver\chromedriver.exe'
service = Service(pathToChromeDriver)
options = Options()
browser = webdriver.Chrome(service=service, options=options)
from selenium.common.exceptions import NoSuchElementException
        
        
def getGenresForOneBook(browser, book, unacceptable_genres):
    list_of_genres = []
    # Multiple attempts to scrape the page, as often the first one doesn't work
    for i in range(0, 3):
        # Go to book's URL on Goodreads
        browser.get(book['link'])
        # Find all HTML wrappers for the 'genres' section
        genre_wrappers = browser.find_elements(By.CLASS_NAME, "BookPageMetadataSection__genreButton")
        for genre_wrapper in genre_wrappers:
            # Find all the genres inside that 'genres' wrapper
            genre_containers = genre_wrapper.find_elements(By.TAG_NAME, "a")
            # For each genre-container, extract the URL/href and then the last part of URL to get name of genre
            for genre_container in genre_containers:
                genre_url = genre_container.get_attribute("href")
                # Get genre part of URL, so after "https://www.goodreads.com/genres/"
                genre = genre_url.split('https://www.goodreads.com/genres/')[1]
                if genre not in unacceptable_genres:
                    list_of_genres.append(genre)
        # If genres were successfully scraped, break out of the for-loop; otherwise try again, up to 3 tries
        if len(genre_wrappers) > 0:
            break
        sleep(random() * 2)
    book['genres'] = list_of_genres
    print(book['title'], book['genres'])


# Gets top genres for each book's URL page
def getGenresForAllBooks(browser, books):
    # Genres you don't want to add to the dataset: e.g. it's obvious that the fantasy books will be in the 'fantasy' genre!
    unacceptable_genres = ['fantasy', 'fiction', 'audiobook']
    for book in books:
        getGenresForOneBook(browser, book, unacceptable_genres)
    
getGenresForAllBooks(browser, top150books_dict)
        


The Stormlight Archive ['epic-fantasy', 'high-fantasy', 'adult', 'magic']
Harry Potter ['young-adult', 'magic', 'childrens', 'adventure', 'classics']
The Empyrean ['romance', 'dragons', 'fantasy-romance', 'new-adult']
The Fitz and the Fool ['epic-fantasy', 'dragons', 'high-fantasy', 'adventure']
Throne of Glass ['young-adult', 'romance', 'new-adult', 'fae', 'magic']
A Court of Thorns and Roses ['romance', 'young-adult', 'new-adult', 'fae', 'magic']
J.R.R. Tolkien 4-Book Boxed Set: The Hobbit and The Lord of the Rings  []
Six of Crows ['young-adult', 'romance', 'young-adult-fantasy', 'lgbt', 'magic']
The Infernal Devices ['young-adult', 'romance', 'paranormal', 'historical-fiction', 'steampunk', 'urban-fantasy']
The Dresden Files ['urban-fantasy', 'mystery', 'paranormal', 'magic']
The Heroes of Olympus []
Once Upon a Broken Heart ['romance', 'young-adult', 'young-adult-fantasy', 'fantasy-romance', 'magic']
The Sandman Vol. 7: Brief Lives  ['graphic-novels', 'comics', 'graphic-novels-com

Lore Olympus ['graphic-novels', 'romance', 'mythology', 'comics', 'retellings']
Grishaverse ['young-adult', 'short-stories', 'retellings', 'fairy-tales', 'young-adult-fantasy']
Red Country  []
Ender's Saga ['science-fiction', 'young-adult', 'classics', 'science-fiction-fantasy', 'dystopia']
The Riftwar Saga []
Immortals ['young-adult', 'magic', 'young-adult-fantasy', 'adventure', 'high-fantasy']
The Empire Trilogy ['epic-fantasy', 'science-fiction-fantasy', 'high-fantasy', 'magic', 'epic']
Kindred  ['historical-fiction', 'science-fiction', 'time-travel', 'classics', 'historical']
Gentleman Bastard ['adventure', 'high-fantasy', 'adult', 'epic-fantasy']
The Locked Tomb ['science-fiction', 'lgbt', 'queer', 'horror', 'lesbian']
The Lions of Al-Rassan  ['historical-fiction', 'historical-fantasy', 'historical', 'epic-fantasy', 'science-fiction-fantasy']
Cemetery Boys ['lgbt', 'young-adult', 'romance', 'queer', 'paranormal']
Warbreaker ['high-fantasy', 'epic-fantasy', 'adult', 'magic']
The Ka
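
The slug extraction in `getGenresForOneBook` assumes every link inside the genre wrapper starts with `https://www.goodreads.com/genres/`. A minimal sketch of a more defensive variant, using only the standard library, could look like the following. The names `genre_slug_from_url` and `filter_genres` are hypothetical and not part of the notebook; the `/genres/` path prefix is taken from the URL the notebook already splits on.

```python
from urllib.parse import urlparse

GENRE_PATH_PREFIX = "/genres/"  # assumed from the URL the notebook splits on


def genre_slug_from_url(url):
    """Return the genre slug from a Goodreads genre URL, or None if it is not one."""
    if not url:
        return None
    path = urlparse(url).path
    if not path.startswith(GENRE_PATH_PREFIX):
        return None
    slug = path[len(GENRE_PATH_PREFIX):].strip("/")
    return slug or None


def filter_genres(urls, unacceptable_genres):
    """Collect slugs from URLs, dropping unwanted genres and duplicates (order kept)."""
    genres = []
    for url in urls:
        slug = genre_slug_from_url(url)
        if slug and slug not in unacceptable_genres and slug not in genres:
            genres.append(slug)
    return genres
```

Returning `None` for unexpected links, rather than indexing into the result of `split`, means one stray link would not abort the scrape of a whole page.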

In [89]:
# Try again for books that didn't scrape properly
unacceptable_genres = ['fantasy', 'fiction', 'audiobook']
for book in top150books_dict:
    if len(book['genres']) == 0:
        getGenresForOneBook(browser, book, unacceptable_genres)

In [90]:
# Do the same for lowest-rated books
getGenresForAllBooks(browser, lowest150books_dict)

The Wolves of Mercy Falls ['young-adult', 'romance', 'paranormal', 'werewolves', 'paranormal-romance']
Cassidy Blake ['middle-grade', 'paranormal', 'young-adult', 'horror', 'ghosts']
Monsters ['young-adult', 'romance', 'young-adult-fantasy', 'paranormal', 'urban-fantasy', 'time-travel']
Book of Ember ['young-adult', 'dystopia', 'science-fiction', 'middle-grade', 'childrens']
Fairest  ['young-adult', 'fairy-tales', 'romance', 'middle-grade', 'retellings']
The Sleeper and the Spindle  ['graphic-novels', 'young-adult', 'fairy-tales', 'retellings', 'short-stories']
The Merciful Crow ['young-adult', 'young-adult-fantasy', 'romance', 'magic', 'lgbt']
Snow Like Ashes ['young-adult', 'romance', 'magic', 'young-adult-fantasy', 'high-fantasy']
Rogues  ['short-stories', 'anthologies', 'science-fiction', 'mystery', 'science-fiction-fantasy']
Hunted  ['young-adult', 'retellings', 'romance', 'fairy-tales', 'young-adult-fantasy']
Ash Princess Trilogy ['young-adult', 'romance', 'young-adult-fantasy', 

A Land Fit for Heroes ['lgbt', 'dark-fantasy', 'epic-fantasy', 'queer', 'high-fantasy']
The Serpent Gates ['lgbt', 'queer', 'lesbian', 'adult', 'high-fantasy']
Ravenspire ['young-adult', 'retellings', 'romance', 'fairy-tales', 'magic', 'dragons']
Practical Magic ['magical-realism', 'romance', 'witches', 'magic', 'paranormal']
Dark Olympus ['romance', 'mythology', 'retellings', 'adult', 'greek-mythology']
The Twilight Saga ['young-adult', 'romance', 'vampires', 'paranormal', 'paranormal-romance']
Mr. Penumbra's 24-Hour Bookstore ['mystery', 'books-about-books', 'contemporary', 'adult']
The Gilded Wolves ['young-adult', 'historical-fiction', 'historical', 'young-adult-fantasy', 'lgbt']
Before the Coffee Gets Cold ['magical-realism', 'contemporary', 'time-travel', 'japan', 'japanese-literature']
Sorcerer Royal ['historical-fiction', 'historical', 'magic', 'historical-fantasy', 'romance']
Rise of the Empress ['young-adult', 'retellings', 'young-adult-fantasy', 'romance', 'fairy-tales']
Cin

In [96]:
# Try again for books that didn't scrape properly
unacceptable_genres = ['fantasy', 'fiction', 'audiobook']
for book in lowest150books_dict:
    if len(book['genres']) == 0:
        print(book)
        getGenresForOneBook(browser, book, unacceptable_genres)
    

In [107]:
top150books_dict
header=["title", "link", "author", "rating", "genres"]
saveBooksToCsv("top150bookswithgenres.csv", top150books_dict, header)
saveBooksToCsv("lowest150bookswithgenres.csv", lowest150books_dict, header)
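
Because `genres` is a list, what ends up in the CSV depends on how `saveBooksToCsv` serialises it. Assuming it relies on Python's `csv` module and passes the list through unchanged (an assumption, since the function is defined elsewhere), the column would hold the list's string representation, which has to be parsed when the file is read back. The sketch below shows that round trip in memory; `parse_genres_cell` is a hypothetical helper, and the sample record reuses one line of the output printed above.

```python
import ast
import csv
import io


def parse_genres_cell(cell):
    """Turn a CSV cell such as "['romance', 'magic']" back into a list of strings."""
    if not cell:
        return []
    value = ast.literal_eval(cell)
    return list(value) if isinstance(value, (list, tuple)) else []


# Illustrative record only, shaped like the notebook's book dictionaries
sample_books = [
    {"title": "Warbreaker", "genres": ["high-fantasy", "epic-fantasy", "adult", "magic"]}
]

# Write to an in-memory buffer instead of a file; csv stores the list as str(list)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "genres"])
writer.writeheader()
writer.writerows(sample_books)

# Read the rows back and restore the genres column to a real list
buffer.seek(0)
restored = [
    dict(row, genres=parse_genres_cell(row["genres"]))
    for row in csv.DictReader(buffer)
]
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for data read from a file.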

In [142]:
# Now that we have the genres for each book, we also need each book's description,
# which again requires some fairly involved Selenium web scraping...
from selenium.webdriver.common.keys import Keys

# Raw string, so the backslashes in the Windows path are not treated as escape sequences
pathToChromeDriver = r'C:\Program Files\Chrome Driver\chromedriver.exe'
service = Service(pathToChromeDriver)
options = Options()
browser = webdriver.Chrome(service=service, options=options)

# Get the book description from Goodreads
def getDescriptionFromBookURL(book, browser):
    # Make up to 3 attempts: Goodreads is difficult to scrape, and the same code often fails on the first try
    for i in range(3):
        sleep(random() * 3)
        browser.get(book['link'])
        # Find all button containers with Button--inline class
        button_containers = browser.find_elements(By.CLASS_NAME, "Button--inline")
        # If no elements were found, try to run the for-loop again
        if len(button_containers) == 0:
            continue
        # If elements were found, then get the correct button-container with the unique aria-label
        for container in button_containers:
            if container.get_attribute("aria-label") == "Tap to show more book description": 
                # Unfortunately, the straightforward '.click()' function does not work on Goodreads
                # This is the workaround - ref: https://stackoverflow.com/questions/58255396/elementclickinterceptedexception-message-element-is-not-clickable-at-point-x
                container.send_keys(Keys.ENTER) # A more reliable way to click on a button
                # Wait, so site doesn't think this is a bot...
                sleep(random() * 3) 
        # Find elems with the class "Formatted" (the book description)
        descriptions = browser.find_elements(By.CLASS_NAME, "Formatted")
        if len(descriptions) == 0:
            continue
        else:
            # Get the first element in the list (it is the description) and its visible text
            description = descriptions[0].text
            print(description)
            book["description"] = description
            # Do not run for-loop again
            break
            
for book in top150books_dict:
    getDescriptionFromBookURL(book, browser)
    sleep(random() * 3)
    

Words of Radiance, Book Two of the Stormlight Archive, continues the immersive fantasy epic that The Way of Kings began.

Expected by his enemies to die the miserable death of a military slave, Kaladin survived to be given command of the royal bodyguards, a controversial first for a low-status "darkeyes." Now he must protect the king and Dalinar from every common peril as well as the distinctly uncommon threat of the Assassin, all while secretly struggling to master remarkable new powers that are somehow linked to his honorspren, Syl.

The Assassin, Szeth, is active again, murdering rulers all over the world of Roshar, using his baffling powers to thwart every bodyguard and elude all pursuers. Among his prime targets is Highprince Dalinar, widely considered the power behind the Alethi throne. His leading role in the war would seem reason enough, but the Assassin's master has much deeper motives.

Brilliant but troubled Shallan strives along a parallel path. Despite being broken in ways

Danger and betrayal, love and loss, secrets and enchantment are woven together in the breathtaking finale to the #1 New York Times bestselling Infernal Devices Trilogy, prequel to the internationally bestselling Mortal Instruments series.

THE INFERNAL DEVICES WILL NEVER STOP COMING

A net of shadows begins to tighten around the Shadowhunters of the London Institute. Mortmain plans to use his Infernal Devices, an army of pitiless automatons, to destroy the Shadowhunters. He needs only one last item to complete his plan: he needs Tessa Gray.

Charlotte Branwell, head of the London Institute, is desperate to find Mortmain before he strikes. But when Mortmain abducts Tessa, the boys who lay equal claim to her heart, Jem and Will, will do anything to save her. For though Tessa and Jem are now engaged, Will is as much in love with her as ever.

As those who love Tessa rally to rescue her from Mortmain’s clutches, Tessa realizes that the only person who can save her is herself. But can a sin

The Starks are scattered.

Robb Stark may be King in the North, but he must bend to the will of the old tyrant Walder Frey if he is to hold his crown. And while his youngest sister, Arya, has escaped the clutches of the depraved Cersei Lannister and her son, the capricious boy-king Joffrey, Sansa Stark remains their captive.

Meanwhile, across the ocean, Daenerys Stormborn, the last heir of the Dragon King, delivers death to the slave-trading cities of Astapor and Yunkai as she approaches Westeros with vengeance in her heart.
All year the half-bloods have been preparing for battle against the Titans, knowing the odds of victory are grim. Kronos's army is stronger than ever, and with every god and half-blood he recruits, the evil Titan's power only grows.

While the Olympians struggle to contain the rampaging monster Typhon, Kronos begins his advance on New York City, where Mount Olympus stands virtually unguarded. Now it's up to Percy Jackson and an army of young demigods to stop the L

The Eisner, Harvey, and Hugo Award-winning phenomenon continues, as new parents Marko and Alana travel to an alien world to visit their hero, while the family's pursuers finally close in on their targets.

Collects: Saga #13-18.
Commander Sam Vimes of the Ankh-Morpork City Watch had it all. But now he's back in his own rough, tough past without even the clothes he was standing up in when the lightning struck...

Living in the past is hard. Dying in the past is incredibly easy. But he must survive, because he has a job to do. He must track down a murderer, teach his younger self how to be a good copper, and change the outcome of a bloody rebellion. There's a problem: if he wins, he's got no wife, no child, no future.

A Discworld Tale of One City, with a full chorus of street urchins, ladies of negotiable affection, rebels, secret policemen, and other children of the revolution.

Truth! Justice! Freedom!
And a Hard-boiled Egg!
Brandon Sanderson creates worlds, and those worlds are linke

A mother struggling to repress her violent past,
A son struggling to grasp his violent future,
A father blind to the danger that threatens them all.

When the winds of war reach their peninsula, will the Matsuda family have the strength to defend their empire? Or will they tear each other apart before the true enemies even reach their shores?

High on a mountainside at the edge of the Kaigenese Empire live the most powerful warriors in the world, superhumans capable of raising the sea and wielding blades of ice. For hundreds of years, the fighters of the Kusanagi Peninsula have held the Empire’s enemies at bay, earning their frozen spit of land the name ‘The Sword of Kaigen.’

Born into Kusanagi’s legendary Matsuda family, fourteen-year-old Mamoru has always known his purpose: to master his family’s fighting techniques and defend his homeland. But when an outsider arrives and pulls back the curtain on Kaigen’s alleged age of peace, Mamoru realizes that he might not have much time to be

In Jade War, the sequel to the World Fantasy Award-winning novel Jade City, the Kaul siblings battle rival clans for honor and control over an Asia-inspired fantasy metropolis.

On the island of Kekon, the Kaul family is locked in a violent feud for control of the capital city and the supply of magical jade that endows trained Green Bone warriors with supernatural powers they alone have possessed for hundreds of years.

Beyond Kekon's borders, war is brewing. Powerful foreign governments and mercenary criminal kingpins alike turn their eyes on the island nation. Jade, Kekon's most prized resource, could make them rich - or give them the edge they'd need to topple their rivals.

Faced with threats on all sides, the Kaul family is forced to form new and dangerous alliances, confront enemies in the darkest streets and the tallest office towers, and put honor aside in order to do whatever it takes to ensure their own survival - and that of all the Green Bones of Kekon.

Jade War is the sec

An inheritance of shadows. A love in chains. An unconquerable foe.

Cordelia Carstairs is a Shadowhunter, a warrior trained since childhood to battle demons. When her father is accused of a terrible crime, she and her brother travel to London in hopes of preventing the family’s ruin. Cordelia’s mother wants to marry her off, but Cordelia is determined to be a hero rather than a bride. Soon Cordelia encounters childhood friends James and Lucie Herondale and is drawn into their world of glittering ballrooms, secret assignations, and supernatural salons, where vampires and warlocks mingle with mermaids and magicians. All the while, she must hide her secret love for James, who is sworn to marry someone else.

But Cordelia’s new life is blown apart when a shocking series of demon attacks devastate London. These monsters are nothing like those Shadowhunters have fought before—these demons walk in daylight, strike down the unwary with incurable poison, and seem impossible to kill. London is i

“Evil is a completely different creature, Mac. Evil is bad that believes it’s good.” — MacKayla Lane was just a child when she and her sister, Alina, were given up for adoption and banished from Ireland forever. — Twenty years later, Alina is dead and Mac has returned to the country that expelled them to hunt her sister’s murderer. But after discovering that she descends from a bloodline both gifted and cursed, Mac is plunged into a secret history: an ancient conflict between humans and immortals who have lived concealed among us for thousands of years.

What follows is a shocking chain of events with devastating consequences, and now Mac struggles to cope with grief while continuing her mission to acquire and control the Sinsar Dubh -- a book of dark, forbidden magic scribed by the mythical Unseelie King, containing the power to create and destroy worlds.

In an epic battle between humans and Fae, the hunter becomes the hunted when the Sinsar Dubh turns on Mac and begins mowing a dead

Jaenelle Angelline now reigns as Queen-protector of the Shadow Realm. No longer will the corrupt Blood slaughter her people and defile her lands. But where one chapter ends, a final, unseen battle remains to be written, and Jaenelle must unleash the terrible power that is Witch to destroy her enemies once and for all.

Even so, she cannot stand alone. Somewhere, long lost in madness, is Daemon, her promised Consort. Only his unyielding love can complete her Court and secure her reign. Yet, even together, their strength may not be enough to hold back the most malevolent of forces.
They come first.

My vision was growing dimmer, the blackness and ghosts closing in. I swore it was like I could hear Robert whispering in my ear: The world of the dead won't give you up a second time. Just before the light completely vanished, I saw Dimitri's face join Lissa's. I wanted to smile. I decided then that if the two people I loved most were safe, I could leave this world.

The dead could finally ha

Kazi and Jase have survived, stronger and more in love than ever. Their new life now lies before them--the Ballengers will be outlaws no longer, Tor's Watch will be a kingdom, and Kazi and Jase will meet all challenges side by side, together at last.

Tarisai has always longed for the warmth of a family. She was raised in isolation by a mysterious, often absent mother known only as The Lady. The Lady sends her to the capital of the global empire of Aritsar to compete with other children to be chosen as one of the Crown Prince’s Council of 11. If she’s picked, she’ll be joined with the other Council members through the Ray, a bond deeper than blood. That closeness is irresistible to Tarisai, who has always wanted to belong somewhere. But The Lady has other ideas, including a magical wish that Tarisai is compelled to obey: Kill the Crown Prince once she gains his trust. Tarisai won’t stand by and become someone’s pawn—but is she strong enough to choose a different path for herself?
The h

“The Trunchbull” is no match for Matilda!

Matilda is a little girl who is far too good to be true. At age five-and-a-half she's knocking off double-digit multiplication problems and blitz-reading Dickens. Even more remarkably, her classmates love her even though she's a super-nerd and the teacher's pet. But everything is not perfect in Matilda's world...

For starters she has two of the most idiotic, self-centered parents who ever lived. Then there's the large, busty nightmare of a school principal, Miss ("The") Trunchbull, a former hammer-throwing champion who flings children at will, and is approximately as sympathetic as a bulldozer. Fortunately for Matilda, she has the inner resources to deal with such annoyances: astonishing intelligence, saintly patience, and an innate predilection for revenge.

Roald Dahl was a spy, ace fighter-pilot, chocolate historian, and medical inventor. He was also the author of Charlie and the Chocolate Factory, Matilda, The BFG, and many more brilliant

Nevada Baylor is faced with the most challenging case of her detective career—a suicide mission to bring in a suspect in a volatile case. Nevada isn't sure she has the chops. Her quarry is a Prime, the highest rank of magic user, who can set anyone and anything on fire.

Then she's kidnapped by Connor "Mad" Rogan—a darkly tempting billionaire with equally devastating powers. Torn between wanting to run or surrender to their overwhelming attraction, Nevada must join forces with Rogan to stay alive.

Rogan's after the same target, so he needs Nevada. But she's getting under his skin, making him care about someone other than himself for a change. And, as Rogan has learned, love can be as perilous as death, especially in the magic world.
A pilot stranded in the desert awakes one morning to see, standing before him, the most extraordinary little fellow. "Please," asks the stranger, "draw me a sheep." And the pilot realizes that when life's events are too difficult to understand, there is no

The ruling Asharites of Al-Rassan have come from the desert sands, but over centuries, seduced by the sensuous pleasures of their new land, their stern piety has eroded. The Asharite empire has splintered into decadent city-states led by warring petty kings. King Almalik of Cartada is on the ascendancy, aided always by his friend and advisor, the notorious Ammar ibn Khairan — poet, diplomat, soldier — until a summer afternoon of savage brutality changes their relationship forever.

Meanwhile, in the north, the conquered Jaddites' most celebrated — and feared — military leader, Rodrigo Belmonte, driven into exile, leads his mercenary company south.

In the dangerous lands of Al-Rassan, these two men from different worlds meet and serve — for a time — the same master. Sharing their interwoven fate — and increasingly torn by her feelings — is Jehane, the accomplished court physician, whose own skills play an increasing role as Al-Rassan is swept to the brink of holy war, and beyond.

Haun

Originally titled Children’s and Household Tales, The Complete Grimm’s Fairy Tales contains the essential bedtime stories for children worldwide for the better part of two centuries. The Brothers Grimm, Jacob and Wilhelm, were German linguists and cultural researchers who gathered legendary folklore and aimed to collect the stories exactly as they heard them. 2012 marked the 200th anniversary of Grimm’s Fairy Tales, and what better way to celebrate than to include all 211 stories into the Knickerbocker Classic Series?

Featuring all your favorite classics, including “Hansel and Gretel,” “Cinderella,” “The Frog Prince,” “Rapunzel,” “Snow White,” “Rumpelstiltskin,” and dozens more, The Complete Grimm’s Fairy Tales is also accompanied by 40 color plates and 60 black and white illustrations from award-winning English illustrator Arthur Rackham, whose books and prints are now highly sought-after collectibles.

The third title in the Knickerbocker Classic series has 800 pages of classic fair

The final chapter in Mercedes Lackey's spellbinding fantasy trilogy! The Herald-Mage, Vanyel, and his Companion, Yfandes, are alone responsible for saving the once-peaceful kingdom of Valdemar from the forces of a master who wields a dark, forbidding magic. And if either Vanyel or Yfandes falters, both Valdemar and its Herald-Mage must pay the ultimate price.
A new queen has usurped the throne and is leading Cenaria into disaster. The country has become a broken realm with a threadbare army, little food and no hope. So Kylar Stern plans to reinstate his closest friend Logan as King, but can he really get away with murder?

In the north, the Godking's death has thrown Khalidor into civil war. To gain the upper hand, one faction attempts to raise the goddess Khali herself. But they are playing with volatile powers, and trigger conflict on a vast scale. Seven armies will converge to save - or destroy - an entire continent.

Kylar has finally learnt the bitter cost of immortality, and is f

The never-before-seen history of Westeros and the lands beyond. With hundreds of pages of all-new material from George R.R. Martin.
If the past is prologue, then George R.R. Martin’s masterwork—the most inventive and entertaining fantasy saga of our time—warrants one hell of an introduction. At long last, it has arrived with THE WORLD OF ICE AND FIRE.

George R.R. Martin, in collaboration with Elio M. García, Jr. and Linda Antonsson, has written a comprehensive history of the Seven Kingdoms, featuring the epic battles, bitter rivalries, and daring rebellions that lead up to the events in the bestselling A Song of Ice and Fire series.

Collected within this volume is the accumulated knowledge, scholarly speculation, and inherited folk tales of maesters and septons, maegi and singers, including over 170 full-colour illustrations and maps, family trees for the Houses Stark, Lannister and Targaryen, and in-depth explanations of the history and culture of Westeros.

This is the definitive c
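
Both `getGenresForOneBook` and `getDescriptionFromBookURL` repeat the same pattern: try up to three times, pausing for a random interval between attempts. As a rough sketch, that pattern could be factored into one small helper. The name `retry` and its parameters are hypothetical, not part of the notebook; the `pause` argument exists only so the waiting can be swapped out, for example when testing.

```python
from random import random
from time import sleep


def retry(action, attempts=3, max_pause=2.0, pause=sleep):
    """Call action() until it returns a non-empty result or attempts run out."""
    for attempt in range(attempts):
        result = action()
        if result:
            return result
        # Random pause before the next try, skipped after the final attempt
        if attempt < attempts - 1:
            pause(random() * max_pause)
    return None
```

A scraping step would then be written as a function that returns the scraped value, or something empty on failure, and passed in as `action`. A result of `None` from `retry` marks a book that still needs a later retry.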

In [147]:
# Do the same for lowest rated books
for book in lowest150books_dict:
    getDescriptionFromBookURL(book, browser)
    sleep(random() * 3)

then.
When Sam met Grace, he was a wolf and she was a girl. Eventually he found a way to become a boy, and their love moved from curious distance to the intense closeness of shared lives.

now.
That should have been the end of their story. But Grace was not meant to stay human. Now she is the wolf. And the wolves of Mercy Falls are about to be killed in one final, spectacular hunt.

forever.
Sam would do anything for Grace. But can one boy and one love really change a hostile, predatory world? The past, the present, and the future are about to collide in one pure moment--a moment of death or life, farewell or forever.
Cassidy Blake's parents are The Inspecters, a (somewhat inept) ghost-hunting team. But Cass herself can REALLY see ghosts. In fact, her best friend, Jacob, just happens to be one.

When The Inspecters head to ultra-haunted Edinburgh, Scotland, for their new TV show, Cass—and Jacob—come along. In Scotland, Cass is surrounded by ghosts, not all of them friendly. Then she me

Beauty knows the Beast’s forest in her bones—and in her blood. Though she grew up with the city’s highest aristocrats, far from her father’s old lodge, she knows that the forest holds secrets and that her father is the only hunter who’s ever come close to discovering them.

So when her father loses his fortune and moves Yeva and her sisters back to the outskirts of town, Yeva is secretly relieved. Out in the wilderness, there’s no pressure to make idle chatter with vapid baronessas…or to submit to marrying a wealthy gentleman. But Yeva’s father’s misfortune may have cost him his mind, and when he goes missing in the woods, Yeva sets her sights on one prey: the creature he’d been obsessively tracking just before his disappearance.

Deaf to her sisters’ protests, Yeva hunts this strange Beast back into his own territory—a cursed valley, a ruined castle, and a world of creatures that Yeva’s only heard about in fairy tales. A world that can bring her ruin or salvation. Who will survive: th

Just when Azalea should feel that everything is before her—beautiful gowns, dashing suitors, balls filled with dancing—it's taken away. All of it. And Azalea is trapped. The Keeper understands. He's trapped, too, held for centuries within the walls of the palace. So he extends an invitation.

Every night, Azalea and her eleven sisters may step through the enchanted passage in their room to dance in his silver forest, but there is a cost. The Keeper likes to keep things. Azalea may not realize how tangled she is in his web until it is too late.
The only daughter of a prominent samurai, Mariko has always known she’d been raised for one purpose and one purpose only: to marry. Never mind her cunning, which rivals that of her twin brother, Kenshin, or her skills as an accomplished alchemist. Since Mariko was not born a boy, her fate was sealed the moment she drew her first breath.

So, at just seventeen years old, Mariko is sent to the imperial palace to meet her betrothed, a man she did no

The year is 1806. England is beleaguered by the long war with Napoleon, and centuries have passed since practical magicians faded into the nation's past. But scholars of this glorious history discover that one remains: the reclusive Mr Norrell, whose displays of magic send a thrill through the country.

Proceeding to London, he raises a beautiful woman from the dead and summons an army of ghostly ships to terrify the French. Yet the cautious, fussy Norrell is challenged by the emergence of another magician: the brilliant novice Jonathan Strange.

Young, handsome and daring, Strange is the very antithesis of Norrel. So begins a dangerous battle between these two great men which overwhelms that between England and France. And their own obsessions and secret dabblings with the dark arts are going to cause more trouble than they can imagine.
Evelyn Hardcastle will be murdered at 11:00 p.m.

There are eight days, and eight witnesses for you to inhabit.

We will only let you escape once you 

Nita Callahan is at the end of her rope because of the bullies who've been hounding her at school... until she discovers a mysterious library book that promises her the chance to become a wizard. But she has no idea of the difference that taking the Wizard's Oath is going to make in her life. Shortly, in company with fellow beginner-wizard Kit Rodriguez, Nita's catapulted into what will be the adventure of a lifetime—if she and Kit can both live through it. For every wizard's career starts with an Ordeal in which he or she must challenge the one power in the universe that hates wizardry more than anything else: the Lone Power that invented death and turned it loose in the worlds. Plunged into a dark and deadly alternate New York full of the Lone One's creatures, Kit and Nita must venture into the very heart of darkness to find the stolen, legendary Book of Night with Moon. Only with the dangerous power of the wizardly Book do they have a chance to save not just their own lives, but the

It's one thing to learn to curtsy properly. It's quite another to learn to curtsy and throw a knife at the same time. Welcome to Finishing School.

Fourteen-year-old Sophronia is a great trial to her poor mother. Sophronia is more interested in dismantling clocks and climbing trees than proper manners--and the family can only hope that company never sees her atrocious curtsy. Mrs. Temminnick is desperate for her daughter to become a proper lady. So she enrolls Sophronia in Mademoiselle Geraldine's Finishing Academy for Young Ladies of Quality.

But Sophronia soon realizes the school is not quite what her mother might have hoped. At Mademoiselle Geraldine's, young ladies learn to finish...everything. Certainly, they learn the fine arts of dance, dress, and etiquette, but they also learn to deal out death, diversion, and espionage--in the politest possible ways, of course. Sophronia and her friends are in for a rousing first year's education.
The Shadow of the Torturer is the first volum

The #1 New York Times Bestselling Series
An Amazon Best YA Book of 2020
Glitter Magazine’s #1 Pick for Best YA of 2020
Optioned for Film by Universal

My whole world changed when I stepped inside the academy. Nothing is right about this place or the other students in it. Here I am, a mere mortal among gods…or monsters. I still can’t decide which of these warring factions I belong to, if I belong at all. I only know the one thing that unites them is their hatred of me.

Then there’s Jaxon Vega. A vampire with deadly secrets who hasn’t felt anything for a hundred years. But there’s something about him that calls to me, something broken in him that somehow fits with what’s broken in me.

Which could spell death for us all.

Because Jaxon walled himself off for a reason. And now someone wants to wake a sleeping monster, and I’m wondering if I was brought here intentionally—as the bait.
The stunningly original, must-read fantasy of 2018 follows two fiercely independent young women, centurie

All paths lead to war...

Marcus' hero days are behind him. He knows too well that even the smallest war still means somebody's death. When his men are impressed into a doomed army, staying out of a battle he wants no part of requires some unorthodox steps.

Cithrin is an orphan, ward of a banking house. Her job is to smuggle a nation's wealth across a war zone, hiding the gold from both sides. She knows the secret life of commerce like a second language, but the strategies of trade will not defend her from swords.

Geder, sole scion of a noble house, has more interest in philosophy than in swordplay. A poor excuse for a soldier, he is a pawn in these games. No one can predict what he will become.

Falling pebbles can start a landslide. A spat between the Free Cities and the Severed Throne is spiraling out of control. A new player rises from the depths of history, fanning the flames that will sweep the entire region onto The Dragon's Path-the path to war.
The year is 2059. Nineteen-yea

From the #1 New York Times best-selling author of The Darkest Minds comes a sweepingly ambitious, high-octane tale of power, destiny, love and redemption.

Every seven years, the Agon begins. As punishment for a past rebellion, nine Greek gods are forced to walk the earth as mortals, hunted by the descendants of ancient bloodlines, all eager to kill a god and seize their divine power and immortality.
Long ago, Lore Perseous fled that brutal world in the wake of her family's sadistic murder by a rival line, turning her back on the hunt's promises of eternal glory. For years she's pushed away any thought of revenge against the man--now a god--responsible for their deaths.

Yet as the next hunt dawns over New York City, two participants seek out her help: Castor, a childhood friend of Lore believed long dead, and a gravely wounded Athena, among the last of the original gods.

The goddess offers an alliance against their mutual enemy and, at last, a way for Lore to leave the Agon behind fo

Laurel was mesmerized, staring at the pale things with wide eyes. They were terrifyingly beautiful—too beautiful for words.

Laurel turned to the mirror again, her eyes on the hovering petals that floated beside her head. They looked almost like wings.

In this extraordinary tale of magic and intrigue, romance and danger, everything you thought you knew about faeries will be changed forever.
Wily, charming Kuni Garu, a bandit, and stern, fearless Mata Zyndu, the son of a deposed duke, seem like polar opposites. Yet, in the uprising against the emperor, the two quickly become the best of friends after a series of adventures fighting against vast conscripted armies, silk-draped airships, and shapeshifting gods. Once the emperor has been overthrown, however, they each find themselves the leader of separate factions—two sides with very different ideas about how the world should be run and the meaning of justice.
Betrothed since childhood to the prince of Mynaria, Princess Dennaleia has alw

Alternate cover for ISBN 9780425190371 (currently here).

The Owens sisters confront the challenges of life and love in this bewitching novel from New York Times bestselling author Alice Hoffman.

For more than two hundred years, the Owens women have been blamed for everything that has gone wrong in their Massachusetts town. Gillian and Sally have endured that fate as well: as children, the sisters were forever outsiders, taunted, talked about, pointed at. Their elderly aunts almost seemed to encourage the whispers of witchery, with their musty house and their exotic concoctions and their crowd of black cats. But all Gillian and Sally wanted was to escape.

One will do so by marrying, the other by running away. But the bonds they share will bring them back—almost as if by magic...
He was supposed to be a myth.
But from the moment I crossed the River Styx and fell under his dark spell... he was, quite simply, mine.

Society darling Persephone Dimitriou plans to flee the ultra-modern cit

"You must never do anything that might expose our secret. This means that, in general, you cannot form close bonds with humans. You can speak to us, and you can always commune with the Ocean, but you are deadly to humans. You are, essentially, a weapon. A very beautiful weapon. I won't lie to you, it can be a lonely existence, but once you are done, you get to live. All you have to give, for now, is obedience and time..."

The same speech has been given hundreds of times to hundreds of beautiful girls who enter the sisterhood of sirens. Kahlen has lived by these rules for years now, patiently waiting for the life she can call her own. But when Akinli, a human, enters her world, she can't bring herself to live by the rules anymore. Suddenly the life she's been waiting for doesn't seem nearly as important as the one she's living now.
The Black Tides of Heaven is one of a pair of standalone introductions to JY Yang's Tensorate Series. For more of the story you can read its twin novella Th

Ceony Twill arrives at the cottage of Magician Emery Thane with a broken heart. Having graduated at the top of her class from the Tagis Praff School for the Magically Inclined, Ceony is assigned an apprenticeship in paper magic despite her dreams of bespelling metal. And once she’s bonded to paper, that will be her only magic…forever.

Yet the spells Ceony learns under the strange yet kind Thane turn out to be more marvelous than she could have ever imagined—animating paper creatures, bringing stories to life via ghostly images, even reading fortunes. But as she discovers these wonders, Ceony also learns of the extraordinary dangers of forbidden magic.

An Excisioner—a practitioner of dark, flesh magic—invades the cottage and rips Thane’s heart from his chest. To save her teacher’s life, Ceony must face the evil magician and embark on an unbelievable adventure that will take her into the chambers of Thane’s still-beating heart—and reveal the very soul of the man.

From the imaginative 

It's Zinnia Gray's twenty-first birthday, which is extra-special because it's the last birthday she'll ever have. When she was young, an industrial accident left Zinnia with a rare condition. Not much is known about her illness, just that no one has lived past twenty-one.

Her best friend Charm is intent on making Zinnia's last birthday special with a full sleeping beauty experience, complete with a tower and a spinning wheel. But when Zinnia pricks her finger, something strange and unexpected happens, and she finds herself falling through worlds, with another sleeping beauty, just as desperate to escape her fate.
The Crescent Moon Kingdoms, home to djenn and ghuls, holy warriors and heretics, are at the boiling point of a power struggle between the iron-fisted Khalif and the mysterious master thief known as the Falcon Prince. In the midst of this brewing rebellion a series of brutal supernatural murders strikes at the heart of the Kingdoms. It is up to a handful of heroes to learn the

The city-state of Saraykeht dominates the Summer Cities. Its wealth is beyond measure; its port is open to all the merchants of the world, and its ruler, the Khai Saraykeht, commands forces to rival the Gods. Commerce and trade fill the streets with a hundred languages, and the coffers of the wealthy with jewels and gold. Any desire, however exotic or base, can be satisfied in its soft quarter. Blissfully ignorant of the forces that fuel their prosperity, the people live and work secure in the knowledge that their city is a bastion of progress in a harsh world. It would be a tragedy if it fell.

Saraykeht is poised on the knife-edge of disaster.

At the heart of the city's influence are the poet-sorcerer Heshai and the captive spirit, Seedless, whom he controls. For all his power, Heshai is weak, haunted by memories of shame and humiliation. A man faced with constant reminders of his responsibilities and his failures, he is the linchpin and the most vulnerable point in Saraykeht's grea

Some stories are so beautiful, so brutal, that they claw at your heart and refuse to let go. Welcome to the world of Wicked Saints—an epic, passionate novel that you won't soon forget. Prepare to meet:

A GIRL named Nadya, who hears the whisper of the gods inside her head.

A PRINCE surrounded by desperate suitors and deadly assassins

A MONSTER hidden behind pale, tortured eyes — and a smile that cuts like a knife

The paths of these three characters become entwined during a centuries-long war filled with sinners and saints, magic and mystery, and a star-crossed romance that threatens to tip the scales between dark and light... forever.
"You've long set your heart against it, Axl, I know. But it's time now to think on it anew. There's a journey we must go on, and no more delay..."

The Buried Giant begins as a couple set off across a troubled land of mist and rain in the hope of finding a son they have not seen in years.

Sometimes savage, often intensely moving, Kazuo Ishiguro's fi

Nevare Burvelle was destined from birth to be a soldier. The second son of a newly anointed nobleman, he must endure the rigors of military training at the elite King's Cavalla Academy--and survive the hatred, cruelty, and derision of his aristocratic classmates--before joining the King of Gernia's brutal campaign of territorial expansion. The life chosen for him will be fraught with hardship, for he must ultimately face a forest-dwelling folk who will not submit easily to a king's tyranny. And they possess an ancient magic their would-be conquerors have long discounted--a powerful sorcery that threatens to claim Nevare Burvelle's soul and devastate his world once the Dark Evening brings the carnival to Old Thares.
The fantasy novel you’ve always wished Jane Austen had written

Shades of Milk and Honey is exactly what we could expect from Jane Austen if she had been a fantasy writer: Pride and Prejudice meets Jonathan Strange & Mr. Norrell. It is an intimate portrait of a woman, Jane, 

In [163]:
# Strip all the newline characters from the book descriptions

def stripNewLines(book_dict):
    for book in book_dict:
        book['description'] = book['description'].replace('\n', '')
        
stripNewLines(top150books_dict)
stripNewLines(lowest150books_dict)
lowest150books_dict

[{'title': 'The Wolves of Mercy Falls',
  'link': 'https://www.goodreads.com/book/show/9409458-forever',
  'author': 'Maggie Stiefvater',
  'rating': 3.89,
  'genres': ['young-adult',
   'romance',
   'paranormal',
   'werewolves',
   'paranormal-romance'],
  'description': 'then.When Sam met Grace, he was a wolf and she was a girl. Eventually he found a way to become a boy, and their love moved from curious distance to the intense closeness of shared lives.now.That should have been the end of their story. But Grace was not meant to stay human. Now she is the wolf. And the wolves of Mercy Falls are about to be killed in one final, spectacular hunt.forever.Sam would do anything for Grace. But can one boy and one love really change a hostile, predatory world? The past, the present, and the future are about to collide in one pure moment--a moment of death or life, farewell or forever.'},
 {'title': 'Cassidy Blake',
  'link': 'https://www.goodreads.com/book/show/35403058-city-of-ghosts',
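The descriptions printed above show a side effect of deleting newlines outright: the words on either side of a line break end up fused, as in `then.When Sam`. A minimal sketch of one alternative, assuming the goal is readable single-line text, replaces each run of whitespace with a single space. The names `collapse_whitespace` and `normalize_descriptions` and the sample record are illustrative and not part of the notebook.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace (including newlines) with one space."""
    return " ".join(text.split())

def normalize_descriptions(book_dicts):
    """Return copies of the book dicts with single-line descriptions."""
    return [{**book, "description": collapse_whitespace(book["description"])}
            for book in book_dicts]

# Hypothetical sample record, shaped like the notebook's book dicts
sample = [{"title": "Example",
           "description": "then.\nWhen Sam met Grace,\n\nhe was a wolf."}]
print(normalize_descriptions(sample)[0]["description"])
```

Unlike `stripNewLines`, this version returns new dictionaries instead of modifying the originals, so the raw scraped text stays available for comparison.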
 

In [164]:
header = ["title", "link", "author", "rating", "genres", "description"]
saveBooksToCsv("top_rated19june23.csv", top150books_dict, header)
saveBooksToCsv("lowest_rated19june23.csv", lowest150books_dict, header)
lowest150books_dict


In [1]:
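import csv
import os

# Reload the saved book records; csv.DictReader returns every field as a string
def loadBooksFromFile(filename):
    books_dict = []
    # newline='' is the setting the csv module documentation recommends for CSV files
    with open(filename, 'r', encoding="utf8", newline='') as data:
        reader = csv.DictReader(data)
        for book in reader:
            # Convert the rating back to a number; 'genres' is still a string at this point
            book['rating'] = float(book['rating'])
            books_dict.append(book)
    return books_dict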

top_books = loadBooksFromFile("top_rated19june23.csv")
lowest_books = loadBooksFromFile("lowest_rated19june23.csv")
print(lowest_books)
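`csv.DictReader` returns every field as a string, so after reloading, `rating` needs the `float` conversion above and `genres` is no longer a list. A minimal sketch of restoring it follows. It assumes the list was handed straight to the `csv` module when saving, which stores its `str()` form (for example `"['young-adult', 'romance']"`); whether `saveBooksToCsv` does this is an assumption, and `parse_genres` is a hypothetical helper.

```python
import ast
import csv
import io

def parse_genres(raw):
    """Turn the stored text of a genre list back into a list of strings."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal: treat the whole field as one genre label
        return [raw] if raw else []
    return list(value) if isinstance(value, (list, tuple)) else [str(value)]

# Round trip through an in-memory CSV to mimic saving and reloading
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["title", "rating", "genres"])
writer.writeheader()
writer.writerow({"title": "Example", "rating": 3.89,
                 "genres": ["young-adult", "romance"]})
buffer.seek(0)

for row in csv.DictReader(buffer):
    row["rating"] = float(row["rating"])
    row["genres"] = parse_genres(row["genres"])
    print(row)
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for text read back from a file.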



In [2]:
# Ref: https://pypi.org/project/goodreadsscraper/#description
import goodreadsscraper as grs
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys # Allows logging in to Goodreads (sending username + password) via chromeDriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By 
from time import sleep
from random import random
import logging
log = logging.getLogger("my-logger")
from selenium.common.exceptions import NoSuchElementException
# Create the programmable browser
# Raw string so the backslashes in the Windows path are not read as escape sequences
pathToChromeDriver = r'C:\Program Files\Chrome Driver\chromedriver.exe'
service = Service(pathToChromeDriver)
options = Options()
#browser = webdriver.Chrome(service=service)

def get_ids(books_dict):
    for book in books_dict:
        book["id"] = grs.get_book_id(book["link"])

get_ids(top_books)
get_ids(lowest_books)


#args = Args(True, False, 20819685, False, False, False, True, (1, 29), False, "Could not get review")

#driver = grs.create_driver(args)
#reviews_page = grs.get_book_reviews_page(args, driver)
#reviews_list = grs.scrape_reviews_page(args, reviews_page, 1, 20)
#grs.get_book_page(args, driver)
# Experimenting with goodreadsscraper library
#grs.scrape_book_page(html_page="https://www.goodreads.com/book/show/20819685-the-bone-clocks")

In [3]:
class Args:
    def __init__(self, _logging, _isbn, _id, _btitle, _reviews_language, _reviews_simple, _reviews, _reviews_range,
                _reviews_output, _errors):
        self.logging = _logging
        self.isbn = _isbn
        self.id = _id
        self.btitle = _btitle
        self.reviews_language = _reviews_language
        self.reviews_simple = _reviews_simple
        self.reviews = _reviews
        self.reviews_range = _reviews_range
        self.reviews_output = _reviews_output
        self.errors = _errors
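
The `Args` constructor takes ten positional values, so a call such as `Args(True, False, "", False, ...)` is hard to read and easy to get wrong. A sketch of an equivalent container with keyword arguments and defaults is shown here. It assumes the scraper only reads these attributes from the object, which is what the notebook's own `Args` class already relies on. The class name `ScrapeArgs` is illustrative, and the defaults simply mirror the values passed to `Args` elsewhere in this notebook.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass
class ScrapeArgs:
    # Attribute names copied from the notebook's Args class
    id: Union[int, str] = ""
    logging: bool = True
    isbn: bool = False
    btitle: bool = False
    reviews_language: bool = False
    reviews_simple: bool = False
    reviews: bool = True
    reviews_range: Tuple[int, int] = (1, 29)
    reviews_output: bool = False
    errors: str = "Error: no such review"

# Only the values that differ from the defaults need to be spelled out
book_args = ScrapeArgs(id=20819685)
print(book_args)
```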


In [4]:
#from langdetect import detect # Ref: https://pypi.org/project/langdetect/
#reviews_list = grs.scrape_reviews_page(args, reviews_page, 1, 29)
#x = 1
#for review in reviews_list:
    #print(str(x) +":")
    #print(review.review)
    #if detect(review.review) != 'en':
        #reviews_list.remove(review)
        #print("Review not in English: removed")
    #print('\n')
    #x+=1

LOG : Scraping reviews's page for info [2023-06-19 23:21:02]
LOG : Scraping finished [2023-06-19 23:21:02]
1:


2:
For many, David Mitchell seems to be an untouchable. However, this was my first book written by this author. I read it based on the rave reviews I saw, but found it as close to unreadable as any I've come across.The first chapter is by far the best of the book, after this, the author goes off on tangents using language, terms, and words that were completely foreign to me.The story is all over the place and I was completely lost regarding character progression, relationships, and the overall storyline. The sections pertaining to the immortals was one of the worst pieces of writing I have ever experienced. I had waded my way through 500 something pages and was meandering about trying to connect the dots.I kept reading hoping that the proceeding pages would somehow make sense. Soon, I found myself skimming as I was oh, so tired. My feelings became one of "who cares." After 60



9:
"True metamorphosis doesn't come with flowcharts."Another genre-bending novel by David Mitchell also channels Stephen King and Carlos Ruiz Zafón. Did you just hear that? Yes, but Mitchell does nothing by mistake. It was evidently deliberate, and he mixes various castes of writing styles, although much less so than in CLOUD ATLAS and even THE THOUSAND AUTUMNS OF JACOB DE ZOET. Mitchell lures in mainstream readers, as well as his steadfast fans. I think he does one better, though, than the latter giants of the macabre. He not only advances the plot, he advances the reader."Power is crack cocaine for the ego and battery acid for the soul."Although Cloud Atlas remains my personal favorite of Mitchell's novels, I was no less astonished by the author's ability to chime all his previous books in THE BONE CLOCKS. Some authors, such as Coetzee, will name their protagonists after themselves in the later novels of their oeuvres. This is more and more common as novelists become established an



22:
I tried, I really did. Picked it up, put it down, picked it back up again. Made it to page 200, but this book is just not for me. Made my head hurt.


23:
نوشتن برای این کتاب سخته و بدون اسپویل ریز یا کمی درشت غیرممکن، و در عین حال اگر تا حالا دیوید میچل نخوندید، پیشنهاد نمی‌کنم همین‌طوری بدون پیش‌زمینه‌ برید سراغش چون سبک روایت خاصی داره که مورد پسند همه نمی‌تونه باشه. حداقل قبلش بهتره فیلم اطلس ابر رو ببینید. شاید فیلم شاهکاری نباشه ولی ارزش دیدن دارهتوی یه مصاحبه که میچل و نیل گیمن مهمان بودن، میچل خودش رو به عنوان یه فیک ناولیست معرفی می‌کنه و می‌گه من در اصل یه نویسنده ناولا هستم چون ساختار چیزی که می‌نویسم بین 80 تا 130 صفحه‌اسحالا کاری که میچل توی بیشتر کتاب‌هاش می‌کنه اینه که طبق ایده‌ی کلی‌ای که داره، میاد فصل‌های کتاب‌هاش رو به شکل ناولا می‌نویسهمنظور اینجا یه چیزی شبیه نغمه‌ی آتش و یخ نیست که هر روایت همزمان با داستان باقی شخصیت‌ها پیش بره. هر فصل شبیه یک کتاب مجزای صد و خورده‌ای صفحه‌اس و یه راوی داره، داستان اون شخصیت رو تقریباً از صفر شروع می‌کنه و در پایان اون فصل 
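
The sample output above contains a review that is not in English (item 23). Removing such items from a list while looping over that same list can skip elements, so a safer pattern is to build a new list. The sketch below assumes a detector callable with the same shape as `langdetect.detect` (text in, language code out). `looks_english` is a crude stand-in used only so the example runs without third-party packages, and `keep_english` is a hypothetical helper.

```python
def looks_english(text):
    """Stand-in detector (NOT langdetect): guesses 'en' for mostly-ASCII letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # Mirrors the 'no features in text' situation by signalling an error
        raise ValueError("no features in text")
    ascii_share = sum(ch.isascii() for ch in letters) / len(letters)
    return "en" if ascii_share > 0.9 else "other"

def keep_english(reviews, detect=looks_english, detect_errors=(ValueError,)):
    """Return a new list holding only the reviews detected as English."""
    kept = []
    for text in reviews:
        try:
            if detect(text) == "en":
                kept.append(text)
        except detect_errors:
            # Empty or symbol-only reviews cannot be classified, so skip them
            continue
    return kept

reviews = ["I tried, I really did.", "", "نوشتن برای این کتاب سخته", "Made my head hurt."]
print(keep_english(reviews))
```

With `langdetect` installed, passing `detect=detect` and `detect_errors=(LangDetectException,)` should follow the same pattern.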

In [8]:
from langdetect import detect
from langdetect import LangDetectException

# Scrapes first 25 to 29 reviews for each book
def scrapeReviewsForEachBook(books):
    # Instance of the Args class that the goodreadsscraper functions expect (see above)
    args = Args(True, False, "", False, False, False, True, (1, 29), False, "Error: no such review")
    # Create a webdriver using the goodreadsscraper package
    driver = grs.create_driver(args)
    for book in books:
        print(book["link"])
        # Scrape the reviews for this specific book
        scrapeReviewsForBook(book, driver)

def scrapeReviewsForBook(book, driver):
    # Store the reviews for the book in this list
    book_reviews = []
    # First, use the package's get_book_reviews_page to get the HTML page storing the first 29 reviews
    book_args = Args(True, False, book["id"], False, False, False, True, (1, 29), False, "Error: no such review")
    reviews_page = grs.get_book_reviews_page(book_args, driver)
    # Second, now that we have the reviews page, we can scrape individual reviews
    reviews = grs.scrape_reviews_page(book_args, reviews_page, 1, 29)
    for review in reviews:
        try:
            # Only store reviews written in English, as detected by the langdetect library
            if detect(review.review) == 'en':
                # Add this review to the temporary list (book_reviews)
                book_reviews.append(review.review)
        # LangDetectException is raised when there are 'no features in text'; skip that review
        except LangDetectException:
            continue
    # Store the list of reviews under the book["reviews"] key of that book-dict
    book["reviews"] = book_reviews
    
scrapeReviewsForEachBook(lowest_books)

LOG : Creating driver [2023-06-20 15:46:17]
LOG : Firefox [2023-06-20 15:46:17]
LOG : Driver created [2023-06-20 15:46:22]
https://www.goodreads.com/book/show/9409458-forever
LOG : Searching for book reviews's page [2023-06-20 15:46:22]
LOG : Trying to get reviews page [2023-06-20 15:46:22]
LOG : Waiting for page to load... [2023-06-20 15:46:25]
LOG : Search finished [2023-06-20 15:46:29]
LOG : Scraping reviews's page for info [2023-06-20 15:46:29]
LOG : Scraping finished [2023-06-20 15:46:33]
https://www.goodreads.com/book/show/35403058-city-of-ghosts
LOG : Searching for book reviews's page [2023-06-20 15:46:33]
LOG : Trying to get reviews page [2023-06-20 15:46:33]
LOG : Waiting for page to load... [2023-06-20 15:46:34]
LOG : Refreshing page... [2023-06-20 15:46:38]
LOG : Waiting for page to load... [2023-06-20 15:46:40]
LOG : Refreshing page... [2023-06-20 15:46:42]
LOG : Waiting for page to load... [2023-06-20 15:46:45]
LOG : Search finished [2023-06-20 15:46:47]
LOG : Scraping rev

LOG : Search finished [2023-06-20 15:48:54]
LOG : Scraping reviews's page for info [2023-06-20 15:48:54]
LOG : Scraping finished [2023-06-20 15:48:54]
https://www.goodreads.com/book/show/8428195-entwined
LOG : Searching for book reviews's page [2023-06-20 15:48:55]
LOG : Trying to get reviews page [2023-06-20 15:48:55]
LOG : Waiting for page to load... [2023-06-20 15:48:57]
LOG : Refreshing page... [2023-06-20 15:49:00]
LOG : Waiting for page to load... [2023-06-20 15:49:05]
LOG : Search finished [2023-06-20 15:49:07]
LOG : Scraping reviews's page for info [2023-06-20 15:49:07]
LOG : Scraping finished [2023-06-20 15:49:07]
https://www.goodreads.com/book/show/23308087-flame-in-the-mist
LOG : Searching for book reviews's page [2023-06-20 15:49:07]
LOG : Trying to get reviews page [2023-06-20 15:49:07]
LOG : Waiting for page to load... [2023-06-20 15:49:10]
LOG : Search finished [2023-06-20 15:49:13]
LOG : Scraping reviews's page for info [2023-06-20 15:49:13]
LOG : Scraping finished [202

LOG : Search finished [2023-06-20 15:51:48]
LOG : Scraping reviews's page for info [2023-06-20 15:51:48]
LOG : Scraping finished [2023-06-20 15:51:48]
https://www.goodreads.com/book/show/54467051-the-unbroken
LOG : Searching for book reviews's page [2023-06-20 15:51:48]
LOG : Trying to get reviews page [2023-06-20 15:51:48]
LOG : Waiting for page to load... [2023-06-20 15:51:53]
LOG : Search finished [2023-06-20 15:51:55]
LOG : Scraping reviews's page for info [2023-06-20 15:51:55]
LOG : Scraping finished [2023-06-20 15:51:55]
https://www.goodreads.com/book/show/116563.So_You_Want_to_Be_a_Wizard
LOG : Searching for book reviews's page [2023-06-20 15:51:56]
LOG : Trying to get reviews page [2023-06-20 15:51:56]
LOG : Waiting for page to load... [2023-06-20 15:51:58]
LOG : Search finished [2023-06-20 15:52:01]
LOG : Scraping reviews's page for info [2023-06-20 15:52:01]
LOG : Scraping finished [2023-06-20 15:52:01]
https://www.goodreads.com/book/show/55215339-a-touch-of-darkness
LOG : Se

LOG : Search finished [2023-06-20 15:54:40]
LOG : Scraping reviews's page for info [2023-06-20 15:54:40]
LOG : Scraping finished [2023-06-20 15:54:40]
https://www.goodreads.com/book/show/23197837-the-belles
LOG : Searching for book reviews's page [2023-06-20 15:54:41]
LOG : Trying to get reviews page [2023-06-20 15:54:41]
LOG : Waiting for page to load... [2023-06-20 15:54:44]
LOG : Search finished [2023-06-20 15:54:46]
LOG : Scraping reviews's page for info [2023-06-20 15:54:46]
LOG : Scraping finished [2023-06-20 15:54:47]
https://www.goodreads.com/book/show/111450.Quidditch_Through_the_Ages
LOG : Searching for book reviews's page [2023-06-20 15:54:47]
LOG : Trying to get reviews page [2023-06-20 15:54:47]
LOG : Waiting for page to load... [2023-06-20 15:54:49]
LOG : Search finished [2023-06-20 15:54:52]
LOG : Scraping reviews's page for info [2023-06-20 15:54:52]
LOG : Scraping finished [2023-06-20 15:54:52]
https://www.goodreads.com/book/show/301538.The_Darkness_That_Comes_Before
L

https://www.goodreads.com/book/show/53375824-lore
LOG : Searching for book reviews's page [2023-06-20 15:57:15]
LOG : Trying to get reviews page [2023-06-20 15:57:15]
LOG : Waiting for page to load... [2023-06-20 15:57:17]
LOG : Search finished [2023-06-20 15:57:20]
LOG : Scraping reviews's page for info [2023-06-20 15:57:20]
LOG : Scraping finished [2023-06-20 15:57:20]
https://www.goodreads.com/book/show/18079804-half-bad
LOG : Searching for book reviews's page [2023-06-20 15:57:20]
LOG : Trying to get reviews page [2023-06-20 15:57:20]
LOG : Waiting for page to load... [2023-06-20 15:57:24]
LOG : Search finished [2023-06-20 15:57:26]
LOG : Scraping reviews's page for info [2023-06-20 15:57:26]
LOG : Scraping finished [2023-06-20 15:57:26]
https://www.goodreads.com/book/show/8089.Rose_Daughter
LOG : Searching for book reviews's page [2023-06-20 15:57:27]
LOG : Trying to get reviews page [2023-06-20 15:57:27]
LOG : Waiting for page to load... [2023-06-20 15:57:29]
LOG : Search finishe

LOG : Search finished [2023-06-20 15:59:33]
LOG : Scraping reviews's page for info [2023-06-20 15:59:33]
LOG : Scraping finished [2023-06-20 15:59:34]
https://www.goodreads.com/book/show/13538873-mr-penumbra-s-24-hour-bookstore
LOG : Searching for book reviews's page [2023-06-20 15:59:34]
LOG : Trying to get reviews page [2023-06-20 15:59:34]
LOG : Waiting for page to load... [2023-06-20 15:59:35]
LOG : Refreshing page... [2023-06-20 15:59:39]
LOG : Waiting for page to load... [2023-06-20 15:59:42]
LOG : Search finished [2023-06-20 15:59:44]
LOG : Scraping reviews's page for info [2023-06-20 15:59:44]
LOG : Scraping finished [2023-06-20 15:59:45]
https://www.goodreads.com/book/show/39863498-the-gilded-wolves
LOG : Searching for book reviews's page [2023-06-20 15:59:45]
LOG : Trying to get reviews page [2023-06-20 15:59:45]
LOG : Waiting for page to load... [2023-06-20 15:59:47]
LOG : Refreshing page... [2023-06-20 15:59:50]
LOG : Waiting for page to load... [2023-06-20 15:59:53]
LOG : 

LOG : Waiting for page to load... [2023-06-20 16:01:59]
LOG : Search finished [2023-06-20 16:02:01]
LOG : Scraping reviews's page for info [2023-06-20 16:02:01]
LOG : Scraping finished [2023-06-20 16:02:01]
https://www.goodreads.com/book/show/6931344-the-near-witch
LOG : Searching for book reviews's page [2023-06-20 16:02:02]
LOG : Trying to get reviews page [2023-06-20 16:02:02]
LOG : Waiting for page to load... [2023-06-20 16:02:05]
LOG : Search finished [2023-06-20 16:02:08]
LOG : Scraping reviews's page for info [2023-06-20 16:02:08]
LOG : Scraping finished [2023-06-20 16:02:08]
https://www.goodreads.com/book/show/30969741-an-enchantment-of-ravens
LOG : Searching for book reviews's page [2023-06-20 16:02:08]
LOG : Trying to get reviews page [2023-06-20 16:02:08]
LOG : Waiting for page to load... [2023-06-20 16:02:10]
LOG : Search finished [2023-06-20 16:02:13]
LOG : Scraping reviews's page for info [2023-06-20 16:02:13]
LOG : Scraping finished [2023-06-20 16:02:13]
https://www.good

LOG : Search finished [2023-06-20 16:04:57]
LOG : Scraping reviews's page for info [2023-06-20 16:04:57]
LOG : Scraping finished [2023-06-20 16:04:57]
https://www.goodreads.com/book/show/34275232-the-hazel-wood
LOG : Searching for book reviews's page [2023-06-20 16:04:57]
LOG : Trying to get reviews page [2023-06-20 16:04:57]
LOG : Waiting for page to load... [2023-06-20 16:05:00]
LOG : Search finished [2023-06-20 16:05:03]
LOG : Scraping reviews's page for info [2023-06-20 16:05:03]
LOG : Scraping finished [2023-06-20 16:05:03]
https://www.goodreads.com/book/show/25203675-the-star-touched-queen
LOG : Searching for book reviews's page [2023-06-20 16:05:04]
LOG : Trying to get reviews page [2023-06-20 16:05:04]
LOG : Waiting for page to load... [2023-06-20 16:05:06]
LOG : Search finished [2023-06-20 16:05:09]
LOG : Scraping reviews's page for info [2023-06-20 16:05:09]
LOG : Scraping finished [2023-06-20 16:05:09]
https://www.goodreads.com/book/show/36118682-wicked-saints
LOG : Searchin
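
The log above shows one long sequential run over many book pages. A sketch of how such a loop could pause for a randomized interval between books and keep going when a single book fails is given below. `scrape_all`, `scrape_one`, and the delay values are hypothetical, and the real scraping call is replaced by a stub so the example is self-contained.

```python
import random
import time

def scrape_all(books, scrape_one, base_delay=2.0, jitter=1.0, sleep=time.sleep):
    """Call scrape_one(book) for each book, pausing between calls.

    Returns the list of (book, exception) pairs for books that failed.
    """
    failures = []
    for index, book in enumerate(books):
        try:
            scrape_one(book)
        except Exception as exc:  # keep the run alive; record what went wrong
            failures.append((book, exc))
        if index < len(books) - 1:
            # Random pause so requests are not sent at a fixed rhythm
            sleep(base_delay + random.random() * jitter)
    return failures

# Stub standing in for the notebook's per-book scraping function
def fake_scrape(book):
    if book["id"] == "bad":
        raise RuntimeError("page did not load")
    book["reviews"] = ["placeholder review"]

books = [{"id": "1"}, {"id": "bad"}, {"id": "3"}]
pauses = []  # collect the requested pauses instead of actually sleeping
failed = scrape_all(books, fake_scrape, sleep=pauses.append)
print(len(failed), len(pauses))
```

Returning the failures makes it possible to retry only the books that did not load, rather than repeating the whole run.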

In [10]:
import os
current_dir = os.getcwd()
#new_dir = "top_reviews"
path = os.path.join(current_dir, "lowest_reviews")
os.chdir(path)
current_dir = os.getcwd()
print(current_dir)

C:\Users\Ophelia\OneDrive\Documents\Programming with Data\MIDTERM\lowest_reviews


In [13]:
#for book in top_books:
   # new_directory = book["id"]
    #new_path = os.path.join(os.getcwd(), new_directory)
   # os.mkdir(new_path)
   # os.chdir(new_path)
   # book["review_path"] = os.getcwd()
    #review_count = 1
   # for review in book["reviews"]:
     ##   review_filename = "review_" + str(review_count) + ".txt"
      #  print(review_filename)
       # with open(review_filename, 'w', encoding='utf-8', newline='') as f:
        #    for line in review:
         #       f.writelines(line)
        #review_count += 1
    # Go back to 'reviews' folder
    #os.chdir('..')

# Stores reviews for each book in the list as individual text files, with each book having its own directory under 'top_reviews'
# or 'lowest_reviews'
def saveReviewsForAllBooks(books):
    for book in books:
        saveReviewsForOneBook(book)
        
# Stores a single book's reviews as text files in a directory named after the book's unique Goodreads ID 
def saveReviewsForOneBook(book):
    # Create a new directory to store that particular book's reviews
    new_directory = book["id"]
    # Form a path out of the current directory and the book's reviews sub-directory
    new_path = os.path.join(os.getcwd(), new_directory)
    # Create a new directory based on the above path
    os.mkdir(new_path)
    # Navigate into the particular book's folder/directory
    os.chdir(new_path)
    # Store the abs path to the book's reviews directory in a new dict-key called "review_path"
    book["review_path"] = os.getcwd()
    # Number each text file for each review beginning with 1
    review_count = 1
    for review in book["reviews"]:
        # Create filename for each individual review
        review_filename = "review_" + str(review_count) + ".txt"
        # Print to keep track of which review is being stored right now
        print(review_filename)
        # Write the review from book["reviews"] into the 'review_<number>.txt' file
        with open(review_filename, 'w', encoding='utf-8', newline='') as f:
            # Each review is a single string, so strip its newlines and write it in one call
            f.write(review.replace('\n', ''))
        # Increment the number for the next filename
        review_count += 1
    # After done storing all reviews for that book, navigate back to the "reviews"/parent folder
    os.chdir('..')
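
The functions above rely on `os.chdir`, so an exception partway through leaves the notebook in a different working directory, and `os.mkdir` fails if the folder already exists. A sketch of the same layout (one folder per book ID, one `review_<n>.txt` per review) using `pathlib` with explicit paths is shown below. `save_reviews` and the sample data are hypothetical.

```python
import tempfile
from pathlib import Path

def save_reviews(book, parent_dir):
    """Write each review of one book to parent_dir/<book id>/review_<n>.txt."""
    book_dir = Path(parent_dir) / str(book["id"])
    # exist_ok=True lets the cell be re-run without raising FileExistsError
    book_dir.mkdir(parents=True, exist_ok=True)
    for number, review in enumerate(book["reviews"], start=1):
        target = book_dir / f"review_{number}.txt"
        target.write_text(review.replace("\n", ""), encoding="utf-8")
    book["review_path"] = str(book_dir.resolve())
    return book_dir

# Demonstration in a throwaway directory with made-up data
with tempfile.TemporaryDirectory() as tmp:
    sample_book = {"id": "12345", "reviews": ["Loved it.\nTwice.", "Not for me."]}
    folder = save_reviews(sample_book, tmp)
    saved = sorted(p.name for p in folder.iterdir())
    first = (folder / "review_1.txt").read_text(encoding="utf-8")
    print(saved)
```

Because every path is built explicitly, the current working directory never changes, so a failed write cannot leave later cells pointing at the wrong folder.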


In [15]:
saveReviewsForAllBooks(lowest_books)

review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
revie

review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_

review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_28.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_28.txt
review_1.t

review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_28.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
revie

review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
r

review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
re

review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_28.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_1.txt
review_2.txt
review_3.txt
review_4.txt
review_5.txt
review_6.txt
review_7.txt
review_8.txt
review_9.txt
review_10.txt
review_11.txt
review_12.txt
review_13.txt
review_14.txt
review_15.txt
review_16.txt
review_17.txt
review_18.txt
review_19.txt
review_20.txt
review_21.txt
review_22.txt
review_23.txt
review_24.txt
review_25.txt
review_26.txt
review_27.txt
review_28.txt
review_1.txt
review_2.tx

## References

- [Goodreads Scraper Package](https://pypi.org/project/goodreadsscraper/#description)
- [LangDetect Package](https://pypi.org/project/langdetect/)