# Model Building, Approach 2. `SentenceTransformer` + `faiss`

### Getting ready

Importing libraries

In [1]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import pandas as pd
import sys
sys.path.insert(0, '../src')  # make the local helper module importable
from eda_utils import display_books_info


Loading the dataset

In [2]:
df = pd.read_csv(r'..\datasets\preprocessed.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41617 entries, 0 to 41616
Data columns (total 25 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   bookId            41617 non-null  object 
 1   title             41617 non-null  object 
 2   series            20420 non-null  object 
 3   author            41617 non-null  object 
 4   rating            41617 non-null  float64
 5   description       41617 non-null  object 
 6   language          41617 non-null  object 
 7   isbn              41617 non-null  object 
 8   genres            41617 non-null  object 
 9   characters        41617 non-null  object 
 10  bookFormat        41234 non-null  object 
 11  edition           3773 non-null   object 
 12  pages             40464 non-null  object 
 13  publisher         39307 non-null  object 
 14  publishDate       41298 non-null  object 
 15  firstPublishDate  25401 non-null  object 
 16  awards            41617 non-null  object

### Encoding the embeddings in `Google Colab`

Encoding the embeddings is compute-intensive, so I run that step in a `Google Colab` notebook with a `T4 GPU` runtime.

> [**Source code**](https://colab.research.google.com/drive/1GBtTwZAcCctheVXkqhp2jhIVQG_cCbnA?usp=sharing)
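The Colab notebook itself is not reproduced here. As a rough sketch of what such an encoding step can look like, the snippet below encodes a list of texts with `SentenceTransformer`, casts the result to the contiguous `float32` layout that `faiss` expects, and builds an index. The model name `all-MiniLM-L6-v2`, the flat L2 index type, the helper names, the file names, and the use of the `description` column are all assumptions for illustration, not necessarily what the linked notebook does.

```python
import numpy as np


def to_faiss_array(vectors):
    """Cast embeddings to the C-contiguous float32 2-D layout that faiss expects."""
    arr = np.ascontiguousarray(vectors, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_items, dim)")
    return arr


def encode_texts(texts, model_name="all-MiniLM-L6-v2", batch_size=64):
    """Encode texts into dense vectors (the model name is an assumed example)."""
    # imported lazily so to_faiss_array can be used without the model library
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    vectors = model.encode(texts, batch_size=batch_size, show_progress_bar=True)
    return to_faiss_array(vectors)


def build_flat_index(vectors):
    """Build an exact L2 index (assumed index type) over the vectors."""
    import faiss

    vectors = to_faiss_array(vectors)
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index


# Hypothetical usage in the Colab notebook (column and file names are assumptions):
# import faiss
# embeddings = encode_texts(df["description"].tolist())
# np.save("book_embeddings.npy", embeddings)
# faiss.write_index(build_flat_index(embeddings), "book_index.faiss")
```

Saving both the raw embeddings and the index lets the local notebook look up a query vector by row number without re-running the model.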

### Loading saved objects from `Google Colab`

Loading the embeddings

In [4]:
embeddings = np.load(r"..\assets\models\book_embeddings.npy")

Loading the `faiss` index

In [5]:
index = faiss.read_index(r"..\assets\models\book_index.faiss")
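
Because the embeddings and the index were produced in a separate notebook, it may be worth confirming that they still line up with the dataframe before querying. The helper below is a small sketch of such a check; `check_alignment` is a hypothetical name, and it takes plain numbers so it does not depend on how the objects were built.

```python
def check_alignment(n_rows, embeddings_shape, index_ntotal, index_dim):
    """Return a list of human-readable problems; an empty list means all sizes agree."""
    problems = []
    if len(embeddings_shape) != 2:
        problems.append(f"embeddings should be 2-D, got shape {embeddings_shape}")
        return problems
    n_vectors, dim = embeddings_shape
    if n_vectors != n_rows:
        problems.append(f"{n_vectors} embeddings for {n_rows} dataframe rows")
    if index_ntotal != n_vectors:
        problems.append(f"index holds {index_ntotal} vectors, embeddings hold {n_vectors}")
    if index_dim != dim:
        problems.append(f"index dimension {index_dim} != embedding dimension {dim}")
    return problems


# With the objects loaded above this would be called as:
# check_alignment(len(df), embeddings.shape, index.ntotal, index.d)
```

A mismatch here usually means the dataframe was re-filtered after the embeddings were computed, in which case row numbers no longer point at the right vectors.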

### Main function

Recommend function

In [6]:
def recommend(book_index, k=5):
    """
    Recommends k books similar to the given book.

    Args:
        book_index (int): The positional index of the book in the dataframe.
        k (int): The number of books to recommend.

    Returns:
        A list of dicts (one per recommended book), ordered from most to least similar.
    """

    # getting embedding vector by book_index
    query_vector = embeddings[book_index].reshape(1, -1)

    # getting k + 1 nearest vectors (+1 because the query book itself is normally among them)
    D, I = index.search(query_vector, k + 1)

    # dropping the query book by id rather than by position: with duplicate or tied
    # vectors it is not guaranteed to come first; -1 marks an unfilled result slot
    neighbors = [i for i in I[0] if i != book_index and i != -1][:k]

    # returning the selected rows as a list of records
    return df.iloc[neighbors].to_dict(orient='records')
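
To make the search step less opaque, here is a NumPy-only sketch of what an exact nearest-neighbour lookup does, assuming the index is a flat L2 index (the index type is not stated in this notebook, so that is an assumption). `brute_force_neighbors` is a hypothetical helper, not part of `faiss`; it is mainly useful for spot-checking results on a small sample.

```python
import numpy as np


def brute_force_neighbors(vectors, query_id, k=5):
    """Return ids of the k vectors closest (squared L2) to vectors[query_id], excluding itself."""
    vectors = np.asarray(vectors, dtype=np.float32)
    diffs = vectors - vectors[query_id]
    distances = np.einsum("ij,ij->i", diffs, diffs)  # squared L2 distance per row
    order = np.argsort(distances, kind="stable")  # nearest first
    order = order[order != query_id]  # drop the query by id, not by position
    return order[:k].tolist()


# Toy example: four 2-D points
toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
print(brute_force_neighbors(toy, query_id=0, k=2))  # [1, 2]
```

This scans every vector on each query, so it only scales to small samples; the index exists to make the same lookup fast on the full dataset.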

In [7]:
idx = int(df[df['title'].str.lower() == 'a clash of kings'.lower()].index.values[0])
idx

246
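
The lookup above raises an `IndexError` when no title matches. A slightly more defensive variant is sketched below; `find_title_index` is a hypothetical helper that works on any iterable of titles (for example `df['title']`) and returns the position of the first case-insensitive match.

```python
def find_title_index(titles, query):
    """Return the position of the first title equal to query, ignoring case and outer whitespace."""
    wanted = query.strip().casefold()
    for position, title in enumerate(titles):
        if isinstance(title, str) and title.strip().casefold() == wanted:
            return position
    raise KeyError(f"no book titled {query!r}")


titles = ["A Game of Thrones", "A Clash of Kings", "A Storm of Swords"]
print(find_title_index(titles, "a clash of kings"))  # 1
```

The returned value is a position, which matches the row number only while the dataframe keeps its default `RangeIndex`.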

In [10]:
recommendations = recommend(idx, k=10)
for book in recommendations:
    print("Title: " + book['title'])
    print("Author: " + book['author'])
    print("Description: " + book['description'])
    print()

Title: A Game of Thrones / A Clash of Kings
Author: George R.R. Martin
Description: 2 eBooks in 1! George R. R. Martin, a writer of unsurpassed vision, power, and imagination, has created a landmark of fantasy fiction. Now his two epic works, A Game of Thrones and A Clash of Kings are combined together in this eBook edition. Sweeping from a harsh land of cold to a summertime kingdom of epicurean plenty, A Game of Thrones tells a tale of lords and ladies, soldiers and sorcerers, assassins and bastards who come together in a time of grim omens. Here, an enigmatic band of warriors bear swords of no human metal, a tribe of fierce wildings carry men off into madness, a cruel young dragon prince barters his sister to win back his throne, a child is lost in the twilight between life and death, and a determined woman undertakes a treacherous journey to protect all she holds dear. Amid plots and counterplots, tragedy and betrayal, victory and terror, allies and enemies, the fate of the Starks h
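
Descriptions can run to several hundred words, which makes the printed list hard to scan. One option, sketched below with the standard-library `textwrap.shorten`, is to cap the description length; `format_book` and the character limit are illustrative choices, not part of the original notebook.

```python
import textwrap


def format_book(book, max_chars=200):
    """Format one recommendation record as three lines, shortening the description."""
    description = textwrap.shorten(
        str(book.get("description", "")), width=max_chars, placeholder="..."
    )
    return f"Title: {book['title']}\nAuthor: {book['author']}\nDescription: {description}"


sample = {
    "title": "A Clash of Kings",
    "author": "George R.R. Martin",
    "description": "word " * 100,
}
print(format_book(sample, max_chars=40))
```

`textwrap.shorten` cuts at word boundaries and collapses runs of whitespace, so the shortened text never ends in the middle of a word.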

In [31]:
#display_books_info(recommend(idx, k=10))