In [1]:
import pandas as pd
from langdetect import detect
import string
import data_collector
import parser
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
from collections import defaultdict
import pickle
import math
import numpy as np
import heapq
import re

# 1. Data Collection

## Get the list of the books
We already have the list of books in the pc, so we won't do it again.

Set to `True` both dirs, bests and links parameters to create the correct directories and download the txt containing all the html links.

In [2]:
data_collector.download_books(dirs=False, bests=False, links=False)

## 1.1 Crawl books
We already have all the htmls in the pc, so we won't do it again.

Set to `True` both the books and fails parameters to download all the html pages and remove the ones with broken pages.

In [3]:
data_collector.download_books(books=False, fails=False)

## 1.2 Parse downloaded pages
Set to `True` the create parameter to parse the downloaded html pages and create the tsv file.

In [4]:
parser.create_tsv(create=False)

In [46]:
df = pd.read_csv('parsed_books.tsv', sep='\t')

In [155]:
df.shape

(29959, 12)

In [156]:
df.head()

Unnamed: 0,bookTitle,bookSeries,bookAuthors,ratingValue,ratingCount,reviewCount,Plot,numberOfPages,PublishingDate,Characters,Setting,Url
0,The Hunger Games,The Hunger Games #1,Suzanne Collins,4.33,6408798.0,172554.0,"Could you survive on your own in the wild, wit...",374.0,September 14th 2008,Katniss Everdeen Peeta Mellark Cato (Hunger Ga...,"District 12, Panem Capitol, Panem Panem",https://www.goodreads.com/book/show/2767052-th...
1,Harry Potter and the Order of the Phoenix,Harry Potter #5,J.K. Rowling,4.5,2525157.0,42734.0,There is a door at the end of a silent corrido...,870.0,September 2004,Sirius Black Draco Malfoy Ron Weasley Petunia ...,Hogwarts School of Witchcraft and Wizardry Lon...,https://www.goodreads.com/book/show/2.Harry_Po...
2,To Kill a Mockingbird,To Kill a Mockingbird,Harper Lee,4.28,4527405.0,91802.0,The unforgettable novel of a childhood in a sl...,324.0,May 23rd 2006,Scout Finch Atticus Finch Jem Finch Arthur Rad...,"Maycomb, Alabama",https://www.goodreads.com/book/show/2657.To_Ki...
3,Pride and Prejudice,,Jane Austen,4.26,3017830.0,67811.0,Alternate cover edition of ISBN 9780679783268S...,279.0,October 10th 2000,Mr. Bennet Mrs. Bennet Jane Bennet Elizabeth B...,"United Kingdom Derbyshire, England England Her...",https://www.goodreads.com/book/show/1885.Pride...
4,Twilight,The Twilight Saga #1,Stephenie Meyer,3.6,4989910.0,104912.0,About three things I was absolutely positive.F...,501.0,September 6th 2006,Edward Cullen Jacob Black Laurent Renee Bella ...,"Forks, Washington Phoenix, Arizona Washington ...",https://www.goodreads.com/book/show/41865.Twil...


## 1.3 Dataset cleaning [preliminary steps]
Before actually jumping into the work itself, we want our dataframe to be clean, meaning that there are some preliminary steps we need to perform on it. First of all, missing data is something we should pay attention to. Lot's of rows are going to have missing data somewhere, and dealing with missing data it's not that nice. Notice that this will include different strategies for each of the column we will be considering (more details below). Then there is the problem with punctuation, stopwords, stems and so on so forth, so basic text data preprocessing. Let's make a brief recap:

1. **Missing data**
    - `bookTitle`: if a book is missing the title, then we can safely just remove the instance. In fact, books that are missing the title are actually missing all the informations, meaning that there is a problem with the GoodReads specific link. Also, even if a book was missing just the title, we wouldn't have a way to refer to it, thus it wouldn't be really useful considering we're building a search engine.
    - `bookSeries`, `Authors`, `Plot`, `PublishingDate`, `Characters`, `Setting`: if a book is missing one of the above mentioned columns, we can still include the book in the data, since the search engine could for example work with just the title. Obviously, we cannot just leave the values missing, since it would be really hard to perform any operation on that. These are all text columns, therefore the best way to address the missing values prolem is to replace NaNs with empty strings.
    - `ratingValue`, `NumberofPages`: TODO?
2. **Text data preprocessing**
    - Punctuation removal: this is the first step we want to perform, since it is going to make the next steps much easier (e.g., language detection will be easier if there aren't plots composed just by punctuation symbols).
    - Language detection: before doing anything else, we want to remove the books that present the books for which the plot isn't in english.
    - Stopwords removal (of the `Plot` column only)
    - Stemming (of the `Plot` column only)
    - Lowercase

### Missing values

#### Title
There are 774 books that are completely empty, and these corresponds to the ones that are missing the `bookTitle` column. If you give a look at the url, you can see that these are not given by our python script to download and parse the books, but actually from the fact that the link is broken. Also, you can see that all the books that are missing the `bookTitle` are also missing all the remaining data.

This means that we can safely just remove all the rows that are missing the `bookTitle` column.

In [3]:
n_missing = df[(df['bookTitle'].isna())].shape[0]
print('There are {} instances that are missing the `bookTitle` column.'.format(n_missing))
print()
df[(df['bookTitle'].isna())].head()

There are 774 instances that are missing the `bookTitle` column.



Unnamed: 0,bookTitle,bookSeries,bookAuthors,ratingValue,ratingCount,reviewCount,Plot,numberOfPages,PublishingDate,Characters,Setting,Url
311,,,,,,,,,,,,https://www.goodreads.com/book/show/40937505\r\n
370,,,,,,,,,,,,https://www.goodreads.com/book/show/30528535\r\n
379,,,,,,,,,,,,https://www.goodreads.com/book/show/30528544\r\n
789,,,,,,,,,,,,https://www.goodreads.com/book/show/40941582\r\n
1141,,,,,,,,,,,,https://www.goodreads.com/book/show/5295735\r\n


In [4]:
# Remove empty books
df = df[(df['bookTitle'].notna())]

#### Text data

In [5]:
str_columns = ['bookSeries', 'bookAuthors', 'Plot', 'PublishingDate', 'Characters', 'Setting']

for col in str_columns:
    df[col] = df[col].fillna('')

### Text data preprocessing

#### Punctuation removal

**Observations**:

There are several ways to remove punctuations, including the use of exernal libraries (like nltk). But actually the fastest way to perform punctuation removal is the use of the internal methong translate, which is programmed in C and therefore it's much faster than the other options (give a look to this [link](https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string) for a nice performance analysis of the various options).

In [12]:
def remove_punctuation(s):
    return s.translate(str.maketrans('', '', string.punctuation + '’—'))

def remove_punctuation_(s):
    return re.sub("[^\w\s]", " ", s)

In [161]:
for col in str_columns:
    if col == 'Plot':
        df[col] = df[col].apply(remove_punctuation_)
    else:
        df[col] = df[col].apply(remove_punctuation)

#### Language detection
There are four possibilities `Plot` column of a given book:
1. It is written in english
2. It is written in another language
3. It is empty
4. It contains symbols, numbers, and so on

We want to keep only the ones written in english or empty, so we are just going to discard the others.

In [162]:
def language(s):
    if s == '':
        return 'empty'
    try:
        return detect(s)
    except:
        return 'symbols'

In [163]:
df['plot_lang'] = df['Plot'].apply(language)

In [164]:
df = df[df['plot_lang'] == 'en'].drop(columns=['plot_lang'])

In [165]:
df.shape

(26126, 12)

#### Stopwords removal
We are not going to perform stopwords removal on all the columns, since we could remove important things (e.g., we don't want to remove anything from the names of the characters). The only column on which stopwords removal is necessary is `Plot`.

In [166]:
def remove_stopwords(s):
    stop_words = set(stopwords.words('english'))
    tokens = word_tokenize(s)
    return ' '.join([w for w in tokens if w not in stop_words])

In [167]:
df['Plot'] = df['Plot'].apply(remove_stopwords)

#### Stemming
As for the stopwords removal, the only column on which stemming is necessary is `Plot`.

In [168]:
def stemming(s):
    ps = PorterStemmer()
    tokens = word_tokenize(s)
    return ' '.join([ps.stem(w) for w in tokens])

In [169]:
df['Plot'] = df['Plot'].apply(stemming)
df['Plot'] = df['Plot'].apply(stemming)

#### Lowercase
On the other hand, we want all the string columns to be lowercase, so that our search engine won't have problems with upper/lower case differences.

In [170]:
for col in str_columns:
    df[col] = df[col].apply(lambda w: w.lower())

In [171]:
df.head()

Unnamed: 0,bookTitle,bookSeries,bookAuthors,ratingValue,ratingCount,reviewCount,Plot,numberOfPages,PublishingDate,Characters,Setting,Url
0,The Hunger Games,the hunger games 1,suzanne collins,4.33,6408798.0,172554.0,could surviv wild everi one make sure live see...,374.0,september 14th 2008,katniss everdeen peeta mellark cato hunger gam...,district 12 panem capitol panem panem,https://www.goodreads.com/book/show/2767052-th...
1,Harry Potter and the Order of the Phoenix,harry potter 5,jk rowling,4.5,2525157.0,42734.0,there door end silent corridor and haunt harri...,870.0,september 2004,sirius black draco malfoy ron weasley petunia ...,hogwarts school of witchcraft and wizardry lon...,https://www.goodreads.com/book/show/2.Harry_Po...
2,To Kill a Mockingbird,to kill a mockingbird,harper lee,4.28,4527405.0,91802.0,the unforgett novel childhood sleepi southern ...,324.0,may 23rd 2006,scout finch atticus finch jem finch arthur rad...,maycomb alabama,https://www.goodreads.com/book/show/2657.To_Ki...
3,Pride and Prejudice,,jane austen,4.26,3017830.0,67811.0,altern cover edit isbn 9780679783268sinc immed...,279.0,october 10th 2000,mr bennet mrs bennet jane bennet elizabeth ben...,united kingdom derbyshire england england hert...,https://www.goodreads.com/book/show/1885.Pride...
4,Twilight,the twilight saga 1,stephenie meyer,3.6,4989910.0,104912.0,about three thing i absolut posit first edward...,501.0,september 6th 2006,edward cullen jacob black laurent renee bella ...,forks washington phoenix arizona washington state,https://www.goodreads.com/book/show/41865.Twil...


In [172]:
df = df.reset_index(drop=True).reset_index()

### Save data

In [173]:
df.to_csv('clean_data.csv', index=False)

# 2. Search Engine

## 2.1 Conjunctive query

### 2.1.1 Create your index!

In [420]:
df = pd.read_csv('clean_data.csv')

In [3]:
# To save and load python dictionaries

def save_obj(obj, name):
    with open(name + '.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

def load_obj(name):
    with open(name + '.pkl', 'rb') as f:
        return pickle.load(f)

In [4]:
def term_index(documents):
    words = set()
    for s in documents:
        try:
            tokens = set(word_tokenize(s))
            words.update(tokens)
        except:
            continue
        
    term_index = {}
    for i, word in enumerate(words):
        term_index[word] = i
    return term_index

In [17]:
term_indexes = term_index(df['Plot'])

In [75]:
save_obj(term_indexes, 'vocabulary')

In [5]:
def inverted_index(documents, term_indexes):
    inv_index = defaultdict(list)
    for i, s in enumerate(documents):
        try:
            tokens = set(word_tokenize(s))
            for token in tokens:
                token_index = term_indexes[token]
                inv_index[token_index].append(i)
        except:
            continue
    return inv_index

In [20]:
inv_indexes = inverted_index(df['Plot'], term_indexes)

In [21]:
save_obj(inv_indexes, 'inverted_index')

### 2.1.2 Execute the query

In [421]:
term_indexes = load_obj('vocabulary')
inv_indexes = load_obj('inverted_index')
tfidf_indexes = load_obj('tfidf_index')

In [494]:
# Write it as a classs

class SimpleSearchEngine:
    def __init__(self, df, term_indexes, inv_indexes):
        self.df = df
        self.term_indexes = term_indexes
        self.inv_indexes = inv_indexes
        
    def search(self, query):
        # Since we performed stemming on the plot column of the dataframe, we need to
        # perform stemming also on the query. Otherwise, our results wouldn't be accurate
        ps = PorterStemmer()
        query_tokens = set([ps.stem(w) for w in word_tokenize(query)])

        # Create term indexes for the query
        # notice: if one of the query element doesn't appear in the term_indexes dictionary
        # we can safely say that the **conjunctive** query has to return nothing
        term_indexes_tokens = []
        for token in query_tokens:
            if token in self.term_indexes.keys():
                term_indexes_tokens.append(self.term_indexes[token])
            else:
                return pd.DataFrame(columns=['bookTitle', 'Plot', 'Url'])

        query_inv_indexes = {}
        for token_index in term_indexes_tokens:
            query_inv_indexes[token_index] = set(self.inv_indexes[token_index])

        # Since it is a conjuntive query, we need to intersect the results of each query token
        documents_id = sorted(set.intersection(*query_inv_indexes.values()))

        return pd.DataFrame(data=self.df[self.df['index'].isin(documents_id)])
    
    def execute_query(self, query):
        return self.search(query)[['bookTitle', 'Plot', 'Url']]
        

In [495]:
simple_SE = SimpleSearchEngine(df, term_indexes, inv_indexes)

In [496]:
simple_SE.execute_query('survival games')

Unnamed: 0,bookTitle,Plot,Url
0,The Hunger Games,could surviv wild everi one make sure live see...,https://www.goodreads.com/book/show/2767052-th...
221,Catching Fire,spark are ignit flame are spread and the capit...,https://www.goodreads.com/book/show/6148028-ca...
319,Mockingjay,the final book ground break hunger game trilog...,https://www.goodreads.com/book/show/7260188-mo...
337,Legend,what western unit state home republ nation per...,https://www.goodreads.com/book/show/9275658-le...
652,The Magus,thi dare literari thriller rich erot suspen on...,https://www.goodreads.com/book/show/16286.The_...
...,...,...,...
25067,The Manhattan Hunt Club,in manhattan hunt club john saul plumb depth m...,https://www.goodreads.com/book/show/6553.The_M...
25531,Love's Forbidden Flower,plea note thi new adult romanc novel involv tw...,https://www.goodreads.com/book/show/16189423-l...
25806,The Southpaw,the southpaw stori come age america way baseb ...,https://www.goodreads.com/book/show/413736.The...
25827,Devil's Own,after surviv slaveri aiden macalpin noth thoug...,https://www.goodreads.com/book/show/8705483-de...


## 2.2 Conjunctive query & Ranking score

### 2.2.1 Inverted index

In [497]:
def tfidf_inv_indexes(documents, term_indexes, inv_indexes, corpus=df):
    tfidf_indexes = defaultdict(dict)
    for doc_id, s in enumerate(documents):
        try:
            tokens = word_tokenize(s)
            tokens_set = set(tokens)
            n_tokens = len(tokens)
            norm = 0
            for token in tokens_set:
                token_index = term_indexes[token]
                # tf = n_times token appears in the document over the number of words of the document
                tf = s.count(token) / n_tokens
                # idf = log of the number of documents in the corpus over the number of documents in which the token appears
                idf = math.log10(len(corpus) / (len(inv_indexes[token_index])))
                tf_idf = tf * idf
                # we just computed the tfidf for the a particular token, for the document we're considering
                tfidf_indexes[token_index][doc_id] = tf * idf
                
                # Store also the norm for the document we're considering
                # which is sqrt of the sum of the squares
                norm += tf_idf ** 2
                
            # apply sqrt
            norm = np.sqrt(norm)
            # Normalize each document tfidf
            for token in tokens_set:
                token_index = term_indexes[token]
                tfidf_indexes[token_index][doc_id] /= norm
        except:
            continue
    return tfidf_indexes

In [426]:
tfidf_indexes = tfidf_inv_indexes(df['Plot'], term_indexes, inv_indexes)

In [427]:
save_obj(tfidf_indexes, 'tfidf_index')

### 2.2.2 Execute the query

In [498]:
class RankCalculator():
    
    # return [0-1]
    def rank(book):
        pass

class ByTfidf(RankCalculator):
    def __init__(self, tfidf_indexes):
        self.tfidf_indexes = tfidf_indexes
    
    def rank(self, book, query, token_ids):
        doc = book['index']
        tfidf = 0
        for token_id in token_ids:
            tfidf += self.tfidf_indexes[token_id][doc]
        return tfidf / np.sqrt(len(query.split()))
    
class ByRatingValue(RankCalculator):
    def __init__(self, df):
        self.max_rating = df['ratingValue'].max()
        
    def rank(self, book, query, token_ids):
        return book['ratingValue'] / self.max_rating
    
class ByRatingCount(RankCalculator):
    def __init__(self, df):
        self.max_rating = df['ratingCount'].max()
        
    def rank(self, book, query, token_ids):
        return book['ratingCount'] / self.max_rating
    
class ByTitleMatch(RankCalculator):
    def rank(self, book, query, token_ids):
        title_lenght = len(word_tokenize(book['bookTitle']))
        matches = 0
        for token in set(word_tokenize(query)):
            if token in book['bookTitle'].lower():
                matches += 1
        return matches / title_lenght
    
class WeightedRanks(RankCalculator):
    def __init__(self, calculators):
        self.calculators = calculators
        
    def rank(self, book, query, token_ids):
        total_weight = np.sum([weight for weight, _ in self.calculators])
        return np.sum([calculator.rank(book, query, token_ids) * weight / total_weight for weight, calculator in self.calculators])

    
    
    
class RankedSearchEngine:
    def __init__(self, df, term_indexes, inv_indexes, tfidf_indexes, simple_SE, score):
        self.df = df
        self.term_indexes = term_indexes
        self.inv_indexes = inv_indexes
        self.tfidf_indexes = tfidf_indexes
        self.simple_SE = simple_SE
        self.rank_calculator = rank_calculator
        
    def execute_query(self, query, k):
        # First stem the query
        ps = PorterStemmer()
        query_tokens = set([ps.stem(w) for w in word_tokenize(query)])
        
        # Extract the token indexes from the vocabulary
        tokens_ids = []
        for token in query_tokens:
            try:
                tokens_ids.append(self.term_indexes[token])
            except:
                return
        
        # Compute the simple conjunctive query to get the books in which the query appears
        conj_query = self.simple_SE.search(query)
        
        # Extract the documents id for these books
        conj_query_ids = conj_query.index
        # Compute the similiarity
        conj_query['Similarity'] = conj_query.apply(lambda t: self.rank_calculator.rank(t, query, tokens_ids), axis=1)

        # Use heaps to extract top k rows
        conj_query_list = conj_query[['bookTitle', 'Plot', 'Url', 'Similarity']].values.tolist()
        # return conj_query_list
        heapq.heapify(conj_query_list)
        max_k = heapq.nlargest(k, conj_query_list, key = lambda t: t[3])

        # Convert back to dataframe to show it
        max_k_df = pd.DataFrame(data=max_k, columns=['bookTitle', 'Plot', 'Url', 'Similarity'])

        return max_k_df

In [502]:
rank_calculator = WeightedRanks([
    (1, ByTfidf(tfidf_indexes)),
])



tfidf_se = RankedSearchEngine(df, term_indexes, inv_indexes, tfidf_indexes, simple_se, rank_calculator)

tfidf_se.execute_query('survival games', 10)

Unnamed: 0,bookTitle,Plot,Url,Similarity
0,The Warden,alic led normal life she wake find trap sick g...,https://www.goodreads.com/book/show/33655366-t...,0.350603
1,Devil's Own,after surviv slaveri aiden macalpin noth thoug...,https://www.goodreads.com/book/show/8705483-de...,0.27722
2,The Quillan Games,let the game begin quillan territori verg dest...,https://www.goodreads.com/book/show/215540.The...,0.249645
3,The Hunger Games,could surviv wild everi one make sure live see...,https://www.goodreads.com/book/show/2767052-th...,0.225134
4,Truth,from new york time usa today bestsel author al...,https://www.goodreads.com/book/show/16070018-t...,0.174925
5,The Books of the South,march south ghastli battl tower charm black co...,https://www.goodreads.com/book/show/2365730.Th...,0.173684
6,Cage of Darkness,while travel fren allyssa odar hijack ruthless...,https://www.goodreads.com/book/show/33893388-c...,0.16252
7,Warcross,for million log everi day warcross game way li...,https://www.goodreads.com/book/show/41014903-w...,0.150607
8,Becoming Noah Baxter,part two two part seri jay lili complet way on...,https://www.goodreads.com/book/show/18926659-b...,0.146844
9,Mockingjay,the final book ground break hunger game trilog...,https://www.goodreads.com/book/show/7260188-mo...,0.146466


# 3. Define a new score!

For this particular task, we didn't feel like a search engine like this could benefit from multiple type of information in the query. Instead, we tought about creating a new scoring function by analizing the reviews columns, not by over analyzing the query itself.

Our scoring function is based on the fact that, probably, the most rated books are the most important ones; moreover, the user is probably most interested in the ones with an high number of ratings. Now there is an issue: which kind of book would you prefer to see, one which has a small amount of ratings but with perfect rating value, or one which has a score which is pretty high (not perfect), but with lots of ratings?

For example, let's say we have three books in our query results, and we want to sort them by their ratings:

- **Book 1**: ⭐⭐⭐⭐⭐ 5/5 (10 total ratings) $\rightarrow$ 100%, 10 positive reviews, 0 negative reviews
- **Book 2**: ⭐⭐⭐⭐✨ 4.8/5 (50 total ratings) $\rightarrow$ 96%, 48 positive reviews, 2 negative reviews
- **Book 3**: ⭐⭐⭐⭐✨ 4.65/5 (200 total ratings) $\rightarrow$ 93%, 186 positive reviews, 14 negative reviews

Which one should be considered the **best** book in terms of ratings?

We probably all have the same confidence in saying that the more data we see, it gives us more confidence in a given rating. When we look at perfect ratings, more often than not they come from a tiny number of reviews, which makes it feel more plausible that things could have gone another way, and give a lower ratings. How can we make this intuition quantitative? How can we rationally reason about this trade-off? A good mathematical explanation about this is given by John Cook in his [blogpost](https://www.johndcook.com/blog/2011/09/27/bayesian-amazon/). However, we are not going to dive too much in its mathematical explanation (even if it is actually really interesting!).

In our example, to get the best book, we need to use **Rule of succession**, a formula introduced by Laplace in the 18th century. This is basically used to answer the following question: if we repeat an experiment that we know can result in a success or failure, $n$ times independently, and get $s$ successes, and $n-s$ failures, then what is the probability that the next repetition will succeed? And the answer is:

$$P(X_{{n+1}}=1\mid X_{1}+\cdots +X_{n}=s)= \frac{s + 1}{n+2}.$$

In the above example, it means that when we see the ratings, we should pretend like were 2 more reviews, one which is positive, one which is negative. For each book, this would result in the following:

1) For **book 1**, we would have 11 positive ratings and 1 negative rating, which would give $\frac{11}{12} \approx 91.7\%$.

2) For **book 2**, we would have 49 positive ratings and 3 negative rating, which would give $\frac{49}{52} \approx 94.2\%$.

3) For **book 3**, we would have 187 positive ratings and 15 negative rating, which would give $\frac{187}{202} \approx 92.6\%$.

These probabilities we see here, would be the probabilities of having a good experience with those given books, and it is actually what we can say to infer that the second option would be the best one.

In [507]:
rank_calculator = WeightedRanks([
    (8, ByTfidf(tfidf_indexes)),
    (3, ByRatingValue(df)),
    (10, ByRatingCount(df)),
    (5, ByTitleMatch())
])


rnkd_SE = RankedSearchEngine(df, term_indexes, inv_indexes, tfidf_indexes, simple_se, rank_calculator)

rnkd_SE.execute_query('survival games', 10)

Unnamed: 0,bookTitle,Plot,Url,Similarity
0,The Hunger Games,could surviv wild everi one make sure live see...,https://www.goodreads.com/book/show/2767052-th...,0.580531
1,Mockingjay,the final book ground break hunger game trilog...,https://www.goodreads.com/book/show/7260188-mo...,0.265922
2,Catching Fire,spark are ignit flame are spread and the capit...,https://www.goodreads.com/book/show/6148028-ca...,0.250651
3,The Quillan Games,let the game begin quillan territori verg dest...,https://www.goodreads.com/book/show/215540.The...,0.238758
4,Wicked Games,abbi lewi never pictur surviv game show endur ...,https://www.goodreads.com/book/show/10719342-w...,0.229486
5,The Warden,alic led normal life she wake find trap sick g...,https://www.goodreads.com/book/show/33655366-t...,0.202725
6,Devil's Own,after surviv slaveri aiden macalpin noth thoug...,https://www.goodreads.com/book/show/8705483-de...,0.173697
7,Truth,from new york time usa today bestsel author al...,https://www.goodreads.com/book/show/16070018-t...,0.156104
8,The Books of the South,march south ghastli battl tower charm black co...,https://www.goodreads.com/book/show/2365730.Th...,0.152318
9,Cage of Darkness,while travel fren allyssa odar hijack ruthless...,https://www.goodreads.com/book/show/33893388-c...,0.149313
