## Setup

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel  # TfidfVectorizer L2-normalizes rows by default, so the plain dot product equals cosine similarity and is cheaper to compute

In [2]:
book_df = pd.read_csv("../datasets/clean/filtered_datasets/Final/final_books.csv")
book_df.shape

(2332, 11)

---

## Tf-Idf setup + preprocessing

We set up TF-IDF with English stop words removed. Rows with empty descriptions were already removed from the dataset, so no missing-value handling is needed here.

Then we fit TF-IDF over the descriptions.

In [3]:
# Define a TF-IDF vectorizer that removes English stop words such as 'the' and 'a'
tfidf = TfidfVectorizer(stop_words='english')

# Filling empty descriptions is no longer needed because those rows were removed from the dataset
# book_df['description'] = book_df['description'].fillna('')

# Build the TF-IDF matrix by fitting and transforming the descriptions
tfidf_matrix = tfidf.fit_transform(book_df['description'])
tfidf_matrix.shape

(2332, 16492)
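To make the shape above easier to read, here is a minimal sketch of the same step on a toy corpus. The three descriptions and the `toy_*` names are hypothetical and are not taken from `final_books.csv`; only the `TfidfVectorizer(stop_words='english')` call mirrors the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy descriptions, standing in for book_df['description']
toy_descriptions = [
    "A hobbit travels to the mountain to reclaim the treasure",
    "The hobbit and the wizard fight a dragon",
    "A detective investigates a murder in the city",
]

toy_tfidf = TfidfVectorizer(stop_words='english')
toy_matrix = toy_tfidf.fit_transform(toy_descriptions)

# One row per description, one column per distinct term that survived stop-word removal
print(toy_matrix.shape)

# The learned vocabulary: stop words such as 'the' do not appear in it
print(sorted(toy_tfidf.vocabulary_))
```

The result is a sparse matrix, which is why 2332 descriptions over 16492 terms stay cheap to store.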

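Compute the linear kernel (pairwise dot products) over the TF-IDF matrix. Entry `[i, j]` is the similarity between the descriptions of books `i` and `j`.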

In [4]:
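# Note: this rebinds the name `linear_kernel` from the imported function to the similarity matrix,
# so re-running this cell raises a TypeError until the import cell is run again
linear_kernel = linear_kernel(tfidf_matrix, tfidf_matrix)
# TODO: look at other similarity functions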
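The claim that the linear kernel is enough here can be checked on a small example. The sketch below assumes a hypothetical toy corpus and compares scikit-learn's `linear_kernel` with `cosine_similarity`; it accesses both through the `pairwise` module so that it does not touch the notebook's own `linear_kernel` name.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise

# Hypothetical toy descriptions, not taken from the dataset
toy_descriptions = [
    "A hobbit travels to the mountain to reclaim the treasure",
    "The hobbit and the wizard fight a dragon",
    "A detective investigates a murder in the city",
]
toy_matrix = TfidfVectorizer(stop_words='english').fit_transform(toy_descriptions)

toy_dot = pairwise.linear_kernel(toy_matrix, toy_matrix)
toy_cos = pairwise.cosine_similarity(toy_matrix, toy_matrix)

# TfidfVectorizer uses norm='l2' by default, so every row has unit length
# and the dot product already is the cosine similarity
print(np.allclose(toy_dot, toy_cos))  # → True
```

If the vectorizer were created with `norm=None`, the two matrices would generally differ and `cosine_similarity` would be the appropriate call.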

We create a Series that maps each book title to its row index, so we can look up a book's position by name.

In [5]:
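indices = pd.Series(book_df.index, index=book_df['Book-Title'])
# drop_duplicates() would compare the (already unique) row indices, so de-duplicate on the title instead
indices = indices[~indices.index.duplicated(keep='first')]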
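Some titles occur more than once in this dataset (for example, "The Two Towers" appears in two rows of the output further down), so it helps to see how a title-indexed Series behaves in that case. This is a small sketch on a hypothetical three-row frame; `toy_books`, `toy_indices`, and `toy_unique` are illustrative names only.

```python
import pandas as pd

# Hypothetical miniature of book_df with one repeated title
toy_books = pd.DataFrame({
    'Book-Title': ['The Two Towers', 'Swan Song', 'The Two Towers'],
})

toy_indices = pd.Series(toy_books.index, index=toy_books['Book-Title'])

# drop_duplicates() compares the values (row indices), which are all distinct,
# so nothing is removed
print(len(toy_indices.drop_duplicates()))  # → 3

# A repeated title therefore returns every matching row index
print(toy_indices['The Two Towers'].tolist())  # → [0, 2]

# De-duplicating on the index keeps one row per title
toy_unique = toy_indices[~toy_indices.index.duplicated(keep='first')]
print(toy_unique['The Two Towers'])  # → 0
```

Keeping the first occurrence is an arbitrary choice; any rule works as long as a lookup returns a single position.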

---

## Generating recommendations

We define a function that takes a book title as input and returns the most similar books.

In [21]:
def get_recommendations(title, amount=10, sim_matrix=linear_kernel):
    """Return the `amount` books whose descriptions are most similar to `title`."""
    # Rows of sim_matrix follow the row order of book_df, so look the title up by row index
    # (this assumes the default RangeIndex produced by read_csv)
    matches = book_df.index[book_df['Book-Title'] == title]
    if len(matches) == 0:
        print(f"Book title '{title}' not found in the DataFrame.")
        return None
    idx = matches[0]  # use the first match when a title appears more than once

    # Pair every book position with its similarity to the query book
    sim_scores = list(enumerate(sim_matrix[idx]))

    # Sort by similarity, highest first; skip the top entry (the query book itself)
    # and keep the next `amount` books
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:amount+1]

    book_indices = [i[0] for i in sim_scores]
    return book_df[['Book-Title', 'categories', 'description']].iloc[book_indices]
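The ranking step inside the function can also be expressed with `np.argsort`. The following is a sketch on a hypothetical 4x4 similarity matrix, not a replacement for the function above; `toy_sim` and `top_k_similar` are made-up names. Unlike the `sim_scores[1:amount+1]` slice, it removes the query row by position rather than assuming the query sorts first.

```python
import numpy as np

# Hypothetical similarity matrix; entry [i, j] is the similarity of items i and j
toy_sim = np.array([
    [1.0, 0.2, 0.7, 0.0],
    [0.2, 1.0, 0.1, 0.5],
    [0.7, 0.1, 1.0, 0.3],
    [0.0, 0.5, 0.3, 1.0],
])

def top_k_similar(sim_matrix, idx, k):
    """Return the positions of the k rows most similar to row idx, excluding idx."""
    order = np.argsort(sim_matrix[idx])[::-1]  # positions sorted by descending similarity
    order = order[order != idx]                # drop the query row explicitly
    return order[:k].tolist()

print(top_k_similar(toy_sim, 0, 2))  # → [2, 1]
```

The explicit exclusion matters when another book has an identical description, since a tie at similarity 1.0 could otherwise cause the wrong row to be skipped.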

In [22]:
get_recommendations('The Lord of the Rings',amount=10)

| | Book-Title | categories | description |
|---|---|---|---|
| 844 | The Languages of Tolkien's Middle-Earth | Fiction | Explains the fourteen different languages and ... |
| 498 | The Hobbit: Or There and Back Again | Juvenile Fiction | A newly rejacketed edition of the classic tale... |
| 876 | Island of Bones | Fiction | When the bullet-ridden body of a woman, identi... |
| 145 | The Return of the King (The Lord of the Rings,... | Fantasy fiction | In a sleepy village in the Shire, a young hobb... |
| 1836 | The Return of the Shadow (The History of The L... | Fiction | In this sixth volume of The History of Middle-... |
| 949 | Phoenix: Terrible Swift Sword: Volume Two in t... | History | The second episode in this award-winning trilo... |
| 28 | The Two Towers (The Lord of the Rings, Part 2) | Fiction | Frodo Baggins, Sam, and a small band of compan... |
| 419 | The Two Towers (The Lord of the Rings, Part 2) | Fiction | The standard hardcover edition of the second v... |
| 671 | Swan Song | Fiction | An ancient evil roams the desolate landscape o... |
| 2173 | Fear of Falling: The Inner Life of the Middle ... | Social Science | A brilliant and insightful work that examines ... |


In [23]:
get_recommendations('Midnight Voices',10)

| | Book-Title | categories | description |
|---|---|---|---|
| 350 | Suffer the Children | Fiction | 100 years after a young girl's murder, childre... |
| 1840 | Life: A User's Manual | Fiction | Represents an exploration of the relationship ... |
| 2232 | The Jungle: The Uncensored Original Edition | Fiction | The horrifying conditions of the Chicago stock... |
| 634 | Human Croquet | Fiction | A dark yet moving exploration of a young girl'... |
| 1735 | The Railway Children (Penguin Popular Classics) | Fiction | When their father is sent away to prison, thre... |
| 1815 | The Waste Land and Other Writings (Modern Libr... | Poetry | First published in 1922, "The Waste Land" is T... |
| 1022 | The Diamond Age (Bantam Spectra Book) | Fiction | The story of an engineer who creates a device ... |
| 1406 | The Living Blood | Fiction | Struggling to rebuild her life after the disap... |
| 870 | Savannah Blues | Fiction | Landing a catch like Talmadge Evans III got El... |
| 345 | Thin Air | Fiction | When Lisa St. Claire, the beautiful young brid... |
