# Model Building, Approach 2. `SentenceTransformer` + `faiss`

### Getting ready

Importing libraries

In [1]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import pandas as pd
import sys

sys.path.insert(0, '../src')  # make the project's helper module importable
from eda_utils import display_books_info



Loading the dataset

In [2]:
df = pd.read_csv(r'..\datasets\dataset.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87687 entries, 0 to 87686
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   title         87687 non-null  object 
 1   author        87687 non-null  object 
 2   desc          87687 non-null  object 
 3   genre         87687 non-null  object 
 4   rating        87687 non-null  float64
 5   reviews       87687 non-null  int64  
 6   totalratings  87687 non-null  int64  
 7   pages         87687 non-null  int64  
 8   img           87687 non-null  object 
 9   link          87687 non-null  object 
 10  isbn          87687 non-null  object 
 11  score         87687 non-null  float64
dtypes: float64(2), int64(3), object(7)
memory usage: 8.0+ MB


### Encoding the embeddings in `Google Colab`

Encoding the embeddings takes some compute power, so I do it in a `Google Colab` notebook with a `T4 GPU` runtime and load the saved results here.

> [**Source code**](https://colab.research.google.com/drive/1GBtTwZAcCctheVXkqhp2jhIVQG_cCbnA?usp=sharing)

### Loading saved objects from `Google Colab`

Loading the embeddings

In [3]:
embeddings = np.load(r"..\datasets\book_embeddings.npy")

Loading the `faiss` index

In [4]:
index = faiss.read_index(r"..\datasets\book_index.faiss")
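
The recommendation step later in this notebook uses positions returned by the index to look up rows in `df`, so the dataframe, the embeddings array and the index have to line up row for row. A minimal sanity check, assuming the embeddings were encoded in the same row order as `dataset.csv`; `check_alignment` is a hypothetical helper, not part of the original notebook:

```python
import numpy as np


def check_alignment(n_rows, embeddings, n_index_vectors, index_dim):
    """Raise ValueError if the dataframe, embeddings and index disagree.

    Hypothetical helper: it takes plain numbers for the index side,
    so it can be exercised without loading faiss.
    """
    if embeddings.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {embeddings.shape}")
    if not (n_rows == embeddings.shape[0] == n_index_vectors):
        raise ValueError(
            f"row counts differ: dataframe={n_rows}, "
            f"embeddings={embeddings.shape[0]}, index={n_index_vectors}"
        )
    if embeddings.shape[1] != index_dim:
        raise ValueError(
            f"dimension mismatch: embeddings={embeddings.shape[1]}, index={index_dim}"
        )
    return True
```

In this notebook it could be called as `check_alignment(len(df), embeddings, index.ntotal, index.d)`, where `ntotal` and `d` are the vector count and dimensionality that a `faiss` index exposes.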

### Main function

Recommend function

In [25]:
def recommend(book_index, k=5):
  """
  Recommends k books similar to the given book.

  Args:
    book_index (int): The positional index of the book in the dataframe.
    k (int): The number of books to recommend.

  Returns:
    A list of dicts (one per recommended book), as produced by
    DataFrame.to_dict(orient='records').
  """

  # getting embedding vector by book_index (reshaped to a batch of one query)
  query_vector = embeddings[book_index].reshape(1, -1)

  # getting k + 1 similar vectors (+1 because the query book is itself in the index)
  D, I = index.search(query_vector, k + 1)

  # dropping the query book by id rather than by position, then keeping the top k
  neighbor_ids = [i for i in I[0] if i != book_index][:k]

  # returning the selected rows as a list of records
  return df.iloc[neighbor_ids].to_dict(orient='records')
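
To make the `index.search` call less opaque: for an exact (flat) index, the search amounts to ranking every stored vector by its distance to the query. Below is a NumPy sketch under the assumption of squared L2 distance; the notebook does not state which index type or metric `book_index.faiss` uses, and `brute_force_neighbors` is a hypothetical name. The sketch drops the query by id, which stays correct even when an identical vector (for instance a duplicate row) ranks ahead of the query itself.

```python
import numpy as np


def brute_force_neighbors(vectors, query_id, k=5):
    """Return ids of the k vectors closest to vectors[query_id], excluding it."""
    diffs = vectors - vectors[query_id]
    dists = np.einsum("ij,ij->i", diffs, diffs)  # squared L2 distance per row
    order = np.argsort(dists, kind="stable")     # closest first
    order = order[order != query_id]             # remove the query by id
    return order[:k]


# toy data standing in for the real embeddings
toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]], dtype=np.float32)
neighbors = brute_force_neighbors(toy, query_id=0, k=2)
print(neighbors)  # [1 2]
```

This version materializes a full copy of the embedding matrix on every call, so it is only meant to illustrate what the index computes, not to replace it.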

In [30]:
idx = int(df[df['title'].str.lower() == 'a clash of kings'.lower()].index.values[0])
idx

7864
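
The lookup above raises an `IndexError` when no title matches, because `.index.values[0]` is applied to an empty result. A small sketch of a more forgiving lookup follows; `find_book_index` is a hypothetical helper, and the toy frame only stands in for `df`:

```python
import pandas as pd


def find_book_index(frame, title):
    """Return the row position of the first case-insensitive title match."""
    wanted = title.strip().lower()
    mask = (frame["title"].str.strip().str.lower() == wanted).to_numpy()
    positions = mask.nonzero()[0]
    if len(positions) == 0:
        raise KeyError(f"no book titled {title!r}")
    return int(positions[0])


# toy frame standing in for the real dataset
toy_books = pd.DataFrame({"title": ["Dune", "A Clash of Kings", "a clash of kings "]})
toy_idx = find_book_index(toy_books, "a clash of kings")
print(toy_idx)  # 1
```

With the real data this would be `idx = find_book_index(df, 'A Clash of Kings')`. It returns a row position rather than an index label, which matches how `recommend` uses `embeddings[book_index]` and `df.iloc`.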

In [29]:
recommend(idx, k=2)

[{'title': 'A Clash of Kings',
  'author': "['George R.R. Martin']",
  'desc': 'A comet the color of blood and flame cuts across the sky. Two great leadersLord Eddard Stark and Robert Baratheonwho hold sway over an age of enforced peace are dead, victims of royal treachery. Now, from the ancient citadel of Dragonstone to the forbidding shores of Winterfell, chaos reigns. Six factions struggle for control of a divided land and the Iron Throne of the Seven Kingdoms, preparing to stake their claims through tempest, turmoil, and war. ,It is a tale in which brother plots against brother and the dead rise to walk in the night. Here a princess masquerades as an orphan boy; a knight of the mind prepares a poison for a treacherous sorceress; and wild men descend from the Mountains of the Moon to ravage the countryside. Against a backdrop of incest and fratricide, alchemy and murder, victory may go to the men and women possessed of the coldest steel...and the coldest hearts. For when kings clash

In [31]:
#display_books_info(recommend(idx, k=10))