# Model Building, Approach 2. `SentenceTransformer` + `faiss`

### Getting ready

Importing libraries

In [15]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import pandas as pd
import sys

# making the helper module in ../src importable
sys.path.insert(0, '../src')
from eda_utils import display_books_info

Loading the dataset

In [2]:
df = pd.read_csv('../datasets/dataset.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 87687 entries, 0 to 87686
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   title         87687 non-null  object 
 1   author        87687 non-null  object 
 2   desc          87687 non-null  object 
 3   genre         87687 non-null  object 
 4   rating        87687 non-null  float64
 5   reviews       87687 non-null  int64  
 6   totalratings  87687 non-null  int64  
 7   pages         87687 non-null  int64  
 8   img           87687 non-null  object 
 9   link          87687 non-null  object 
 10  isbn          87687 non-null  object 
 11  score         87687 non-null  float64
dtypes: float64(2), int64(3), object(7)
memory usage: 8.0+ MB


### Encoding the embeddings in `Google Colab`

Computing embeddings for all 87,687 books takes a fair amount of compute, so I do this step in a `Google Colab` notebook on a `T4 GPU` runtime and save the results for use here.

> [**Source link**](https://colab.research.google.com/drive/1GBtTwZAcCctheVXkqhp2jhIVQG_cCbnA?usp=sharing)
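The linked notebook is the reference for how the saved files were actually produced. As a rough sketch of what such an encoding step can look like, the block below assumes the `desc` column is the text being encoded, uses `all-MiniLM-L6-v2` purely as a placeholder model name, and builds an exact `IndexFlatL2`. None of these choices is confirmed by this notebook, and the helper names `encode_texts` and `build_flat_index` are hypothetical.

```python
import numpy as np


def encode_texts(model, texts, batch_size=64):
    """Encode a list of strings into a float32 matrix, one row per text.

    `model` is anything exposing a SentenceTransformer-style `encode` method.
    """
    vectors = model.encode(
        list(texts),
        batch_size=batch_size,
        show_progress_bar=False,
        convert_to_numpy=True,
    )
    # faiss works with C-contiguous float32 arrays
    return np.ascontiguousarray(vectors, dtype=np.float32)


def build_flat_index(vectors):
    """Build an exact L2 index over the rows of `vectors` (assumed index type)."""
    import faiss  # imported lazily so encode_texts can be used without faiss

    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index


# Assumed usage in Colab (model name and encoded column are placeholders):
#   from sentence_transformers import SentenceTransformer
#   import faiss
#   model = SentenceTransformer("all-MiniLM-L6-v2")
#   embeddings = encode_texts(model, df["desc"].tolist())
#   np.save("book_embeddings.npy", embeddings)
#   faiss.write_index(build_flat_index(embeddings), "book_index.faiss")
```

Keeping the row order of the embeddings identical to the row order of the dataframe matters, because the recommendation step later maps index positions straight back to dataframe rows.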

### Loading saved objects from `Google Colab`

Loading the embeddings

In [3]:
embeddings = np.load("../datasets/book_embeddings.npy")

Loading the `faiss` index

In [5]:
index = faiss.read_index("../datasets/book_index.faiss")
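Because the dataframe, the embeddings, and the index come from three separate files, it can be worth confirming that they still line up before using them together. The helper below is a minimal sketch of such a check; `check_alignment` is a hypothetical name, and it takes plain numbers so it does not depend on `faiss` itself. `index.ntotal` and `index.d` are the `faiss` attributes for the number of stored vectors and their dimensionality.

```python
import numpy as np


def check_alignment(n_rows, embeddings, index_ntotal, index_dim):
    """Return a list of problems found; an empty list means everything agrees."""
    if embeddings.ndim != 2:
        return [f"embeddings should be 2-D, got {embeddings.ndim}-D"]

    problems = []
    n_vectors, dim = embeddings.shape
    if n_vectors != n_rows:
        problems.append(f"{n_vectors} embeddings for {n_rows} dataframe rows")
    if index_ntotal != n_rows:
        problems.append(f"index holds {index_ntotal} vectors for {n_rows} rows")
    if index_dim != dim:
        problems.append(f"index dimension {index_dim} != embedding dimension {dim}")
    if embeddings.dtype != np.float32:
        problems.append(f"embeddings are {embeddings.dtype}, faiss expects float32")
    return problems


# Assumed usage in this notebook:
#   check_alignment(len(df), embeddings, index.ntotal, index.d)
```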

### Main function

Recommend function

In [17]:
def recommend(book_index, k=5):
  """
  Recommends k books similar to the given book.

  Args:
    book_index (int): The index of the book in the dataframe.
    k (int): The number of books to recommend.

  Returns:
    A dataframe containing the recommended books.
  """

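  # getting the embedding vector by book_index, shaped (1, dim) since faiss expects a 2D batch
  query_vector = embeddings[book_index].reshape(1, -1)

  # getting k + 1 nearest vectors, since the query book itself is expected among them
  _, I = index.search(query_vector, k + 1)

  # dropping the query book by index rather than by position, in case an identical embedding ranks ahead of it
  neighbours = [i for i in I[0] if i != book_index][:k]

  # returning a dataframe with the selected rows
  return df.iloc[neighbours]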
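To make the shape of the `index.search` result more concrete, here is a brute-force NumPy sketch of the same lookup. It assumes the saved index is an exact L2 index (for example `IndexFlatL2`, which reports squared distances); the actual index type is not stated in this notebook, and the function names are hypothetical.

```python
import numpy as np


def brute_force_search(vectors, query, k):
    """Mimic an exact L2 search: return (D, I) with shape (1, k) each."""
    diffs = vectors - query                      # query has shape (1, dim)
    dists = np.einsum("ij,ij->i", diffs, diffs)  # squared L2 distance per row
    order = np.argsort(dists, kind="stable")[:k]
    return dists[order][None, :], order[None, :]


def neighbours_excluding_self(vectors, book_index, k=5):
    """Return up to k nearest row indices, never including book_index itself."""
    query = vectors[book_index].reshape(1, -1)
    _, I = brute_force_search(vectors, query, k + 1)
    return [int(i) for i in I[0] if i != book_index][:k]


vectors = np.array([[0, 0], [1, 0], [0, 2], [5, 5]], dtype=np.float32)
print(neighbours_excluding_self(vectors, 0, k=2))  # [1, 2]
```

The query book is at distance 0 from itself, so it normally comes back first, which is why `k + 1` results are requested. If two books share an identical embedding, though, either one may rank first, so filtering by index is the safer way to drop the query book.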

In [19]:
# looking up the row of the first book whose title matches, ignoring case
idx = int(df[df['title'].str.lower() == 'steve jobs'].index.values[0])

In [20]:
display_books_info(recommend(idx))

Title: The Steve Jobs Way: iLeadership for a New Generation
Author: Jay Elliot, William L. Simon
Pages: 0
Link: https://goodreads.com/book/show/10589332-the-steve-jobs-way

Title: Insanely Simple: The Obsession That Drives Apple's Success
Author: Ken Segall
Pages: 240
Link: https://goodreads.com/book/show/13383957-insanely-simple

Title: I, Steve: Steve Jobs In His Own Words
Author: George Beahm
Pages: 160
Link: https://goodreads.com/book/show/12634780-i-steve

Title: Steve Jobs: Genius by Design
Author: Jason Quinn, Amit Tayal, Tayal,  Amit
Pages: 104
Link: https://goodreads.com/book/show/13536324-steve-jobs

Title: Steve Jobs: The Man Who Thought Different
Author: Karen Blumenthal
Pages: 320
Link: https://goodreads.com/book/show/12969593-steve-jobs
