## Recommendation systems 

Recommendation systems are algorithms and techniques designed to provide personalized suggestions to users. They are widely used on online platforms to improve the user experience and to help users discover relevant content, products, or services. There are several types of recommendation systems, including:



### Content-Based Filtering:

This approach recommends items similar to those a user has shown interest in, based on the characteristics of the items (such as genre or author) and the user's profile. It does not need other users' ratings: similarity comes from item attributes, not from rating patterns.
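Content-based filtering can be sketched in a few lines. This is a minimal illustration, assuming each book is described by a hand-made genre vector; the feature values and the helper `cosine_scores` are hypothetical and not part of this notebook. The user profile is the average feature vector of the liked books, and the other books are ranked by cosine similarity to it.

```python
import numpy as np

# Hypothetical item features: each row is a book, columns are genre flags
# (fantasy, mystery, romance). The values are made up for illustration.
item_features = np.array([
    [1.0, 0.0, 0.0],  # book 0: fantasy
    [1.0, 1.0, 0.0],  # book 1: fantasy + mystery
    [0.0, 1.0, 0.0],  # book 2: mystery
    [0.0, 0.0, 1.0],  # book 3: romance
])

def cosine_scores(profile, features):
    """Cosine similarity between a user profile vector and every item row."""
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(profile)
    return features @ profile / norms

# The user's profile is the average feature vector of the books they liked.
liked = [0]  # the user liked book 0
profile = item_features[liked].mean(axis=0)

scores = cosine_scores(profile, item_features)
scores[liked] = -np.inf  # never recommend a book the user already liked
ranking = np.argsort(-scores)
print(ranking[0])  # 1
```

Book 1 shares the fantasy feature with the liked book, so it is ranked first. No other user's ratings are involved.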

### Collaborative Filtering:

User-Based Collaborative Filtering: This method recommends items to a user based on the preferences of users who are similar to them.

Item-Based Collaborative Filtering: This method recommends items based on their similarity to items the user has already interacted with.
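The difference between the two variants can be shown on a tiny made-up rating matrix (users as rows, items as columns, 0 = not rated). This sketch assumes cosine similarity as the similarity measure: user-based filtering compares rows, item-based filtering compares columns.

```python
import numpy as np

# Toy user x item rating matrix (0 = not rated). Values are made up.
R = np.array([
    [5, 4, 0, 0],  # user 0
    [4, 5, 1, 0],  # user 1
    [0, 1, 5, 4],  # user 2
    [1, 0, 4, 5],  # user 3
], dtype=float)

def cosine_sim_matrix(M):
    """Pairwise cosine similarity between the rows of M."""
    unit = M / np.linalg.norm(M, axis=1, keepdims=True)
    return unit @ unit.T

user_sim = cosine_sim_matrix(R)    # user-based: compare rows (users)
item_sim = cosine_sim_matrix(R.T)  # item-based: compare columns (items)

# Ignore self-similarity, then find the closest other user / item
np.fill_diagonal(user_sim, -1)
np.fill_diagonal(item_sim, -1)
print(user_sim[0].argmax(), item_sim[0].argmax())  # 1 1
```

User 1 rates like user 0, so user 1's other favorites would be suggested to user 0; item 1 is rated like item 0 by the same users, so it would be suggested to anyone who liked item 0.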

### Hybrid Recommender Systems:

These systems combine multiple recommendation techniques, for example collaborative and content-based filtering, to provide more accurate and diverse recommendations.
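One common way to combine techniques is a weighted hybrid. The sketch below assumes both recommenders already produce scores on the same 0 to 1 scale; the scores, the helper `hybrid_scores`, and the weight `alpha` are illustrative and not taken from this notebook.

```python
import numpy as np

# Hypothetical scores for four candidate books from two recommenders,
# both already scaled to [0, 1]. Values are illustrative only.
content_scores = np.array([0.9, 0.2, 0.6, 0.1])
collab_scores = np.array([0.2, 0.8, 0.7, 0.3])

def hybrid_scores(content, collab, alpha=0.5):
    """Weighted hybrid: alpha * content + (1 - alpha) * collaborative."""
    return alpha * content + (1 - alpha) * collab

blended = hybrid_scores(content_scores, collab_scores, alpha=0.5)
order = np.argsort(-blended, kind="stable")  # best candidate first
print(order)  # [2 0 1 3]
```

Book 2 wins because both recommenders rate it reasonably well, even though neither ranks it first on its own.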

- Here we use a collaborative filtering approach: books are compared by the ratings they received from the same users (item-based collaborative filtering).

# Importing Libraries and Dataset

In [1]:
import pandas as pd
import numpy as np

### 1. Books

In [4]:
books = pd.read_csv("BX_Books.csv",
                   sep = ";", encoding= "latin-1")
books.head()

| | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher | Image-URL-S | Image-URL-M | Image-URL-L |
|---|---|---|---|---|---|---|---|---|
| 0 | 195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... | http://images.amazon.com/images/P/0195153448.0... |
| 1 | 2005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... | http://images.amazon.com/images/P/0002005018.0... |
| 2 | 60973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial | http://images.amazon.com/images/P/0060973129.0... | http://images.amazon.com/images/P/0060973129.0... | http://images.amazon.com/images/P/0060973129.0... |
| 3 | 374157065 | Flu: The Story of the Great Influenza Pandemic... | Gina Bari Kolata | 1999 | Farrar Straus Giroux | http://images.amazon.com/images/P/0374157065.0... | http://images.amazon.com/images/P/0374157065.0... | http://images.amazon.com/images/P/0374157065.0... |
| 4 | 393045218 | The Mummies of Urumchi | E. J. W. Barber | 1999 | W. W. Norton & Company | http://images.amazon.com/images/P/0393045218.0... | http://images.amazon.com/images/P/0393045218.0... | http://images.amazon.com/images/P/0393045218.0... |


In [5]:
books.shape

(271379, 8)

In [6]:
books.columns

Index(['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher',
       'Image-URL-S', 'Image-URL-M', 'Image-URL-L'],
      dtype='object')

In [7]:
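# .copy() makes books an independent DataFrame, so the in-place rename below cannot trigger a SettingWithCopyWarning
books = books[["ISBN", "Book-Title", "Book-Author", "Year-Of-Publication", "Publisher"]].copy()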
books.head()

| | ISBN | Book-Title | Book-Author | Year-Of-Publication | Publisher |
|---|---|---|---|---|---|
| 0 | 195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press |
| 1 | 2005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada |
| 2 | 60973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial |
| 3 | 374157065 | Flu: The Story of the Great Influenza Pandemic... | Gina Bari Kolata | 1999 | Farrar Straus Giroux |
| 4 | 393045218 | The Mummies of Urumchi | E. J. W. Barber | 1999 | W. W. Norton & Company |


In [8]:
books.rename(columns = {"Book-Title": "title", 
                        "Book-Author": "author", 
                        'Year-Of-Publication' : "year", 
                        'Publisher' : "publisher"},
            inplace = True)
books.head()

| | ISBN | title | author | year | publisher |
|---|---|---|---|---|---|
| 0 | 195153448 | Classical Mythology | Mark P. O. Morford | 2002 | Oxford University Press |
| 1 | 2005018 | Clara Callan | Richard Bruce Wright | 2001 | HarperFlamingo Canada |
| 2 | 60973129 | Decision in Normandy | Carlo D'Este | 1991 | HarperPerennial |
| 3 | 374157065 | Flu: The Story of the Great Influenza Pandemic... | Gina Bari Kolata | 1999 | Farrar Straus Giroux |
| 4 | 393045218 | The Mummies of Urumchi | E. J. W. Barber | 1999 | W. W. Norton & Company |


### 2. Users

In [None]:
# on_bad_lines="skip" replaces error_bad_lines=False, which was removed in pandas 2.0
users = pd.read_csv("BX-Users.csv",
                   sep=";", on_bad_lines="skip", encoding="latin-1")
users.head()

In [None]:
users.shape

In [None]:
users.rename(columns = {"User-ID" : "user_id", "Location": "location", "Age" : "age"}, inplace= True)
users.head()

### 3. Ratings

In [None]:
# on_bad_lines="skip" replaces error_bad_lines=False, which was removed in pandas 2.0
ratings = pd.read_csv("BX-Book-Ratings.csv",
                   sep=";", on_bad_lines="skip", encoding="latin-1")
ratings.head()

In [None]:
ratings.rename(columns = {"User-ID" : "user_id", "Book-Rating" : "rating"}, inplace= True)
ratings.head()

Let us check how many records each table has:

In [None]:
books.shape

In [None]:
users.shape

In [None]:
ratings.shape

## Setting the approach

We will use collaborative filtering and base recommendations on the preferences of users with similar tastes.

We will suggest books that you have not read but will probably like, because similar users liked them. Concretely, we compare books by the ratings they received from the same users, so books liked by the same people end up close to each other.

We will only consider ratings from users who have rated many books (knowledgeable users), so that the estimates are reliable.

We will create a matrix with books as rows, users as columns, and ratings as the cell values.

We will use these constraints:

1. We pick books that have at least 50 ratings (so that we can rely on their ratings).
2. We consider users who have given more than 200 ratings (knowledgeable users).
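The two constraints can be sketched as a small helper on toy data. The thresholds 1 and 2 stand in for 200 and 50, and `filter_ratings` is a hypothetical name, not a function from this notebook. Users are filtered first, so book counts only include ratings from the remaining users, which matches the order of the steps in this notebook.

```python
import pandas as pd

# Made-up ratings that already carry a title column; the notebook gets
# titles by merging with the books table, which this sketch skips.
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "title":   ["a", "b", "c", "a", "b", "a"],
    "rating":  [5, 3, 0, 4, 2, 1],
})

def filter_ratings(ratings, min_user_ratings, min_book_ratings):
    """Keep users with MORE than min_user_ratings ratings, then keep titles
    with AT LEAST min_book_ratings ratings among those users."""
    user_counts = ratings["user_id"].value_counts()
    active_users = user_counts[user_counts > min_user_ratings].index
    kept = ratings[ratings["user_id"].isin(active_users)]

    title_counts = kept["title"].value_counts()
    popular_titles = title_counts[title_counts >= min_book_ratings].index
    return kept[kept["title"].isin(popular_titles)]

result = filter_ratings(toy, min_user_ratings=1, min_book_ratings=2)
print(result)  # users 1 and 2, titles "a" and "b"
```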

### 1. Picking Users

In [None]:
ratings.head(2)

In [None]:
ratings["rating"].value_counts()

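So the ratings range from 0 to 10. (In the Book-Crossing dataset, 0 denotes an implicit interaction rather than an explicit score; explicit ratings go from 1 to 10.)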

In [None]:
ratings["user_id"].value_counts()

Roughly 100,000 unique users have rated at least one book.

We need to select the users who have rated more than 200 books.

In [None]:
ratings["user_id"].value_counts() > 200

In [None]:
x = ratings["user_id"].value_counts() > 200

In [None]:
x[x]

In [None]:
x[x].shape

Only 899 users meet this criterion.

These are the knowledgeable users whose ratings our model will learn from.

Let us take the user_id values of these users:

In [None]:
y = x[x].index
y

In [None]:
y[0]

In [None]:
ratings["user_id"].isin(y)

In [None]:
ratings = ratings[ratings["user_id"].isin(y)]
ratings

So we are left with a little over 500,000 ratings from these 899 users.

### Joining the books table to the ratings table

In [None]:
books.head()

We can join these two tables on ISBN:

In [None]:
ratings_with_books = ratings.merge(books, on = "ISBN")
ratings_with_books

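`merge` performs an inner join by default, so only rows whose ISBN appears in both tables are kept: ratings of ISBNs missing from `books` are dropped, and books that nobody rated do not appear.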
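A toy example of this behavior, with made-up ISBNs: the default `how="inner"` keeps only ISBNs present in both tables, and an outer join with `indicator=True` is one way to see what was dropped.

```python
import pandas as pd

# Made-up tables: ISBN "999" has a rating but no book, "333" has a book but no rating
ratings_toy = pd.DataFrame({"ISBN": ["111", "222", "999"], "rating": [5, 3, 4]})
books_toy = pd.DataFrame({"ISBN": ["111", "222", "333"], "title": ["A", "B", "C"]})

inner = ratings_toy.merge(books_toy, on="ISBN")  # how="inner" is the default
print(inner)  # only ISBNs 111 and 222 survive

# indicator=True adds a _merge column saying which table each row came from
audit = ratings_toy.merge(books_toy, on="ISBN", how="outer", indicator=True)
print(audit["_merge"].value_counts())
```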

### 2. Picking Books

Books must have at least 50 ratings:

In [None]:
ratings_with_books.groupby("title")["rating"].count().reset_index()

In [None]:
number_rating = ratings_with_books.groupby("title")["rating"].count().reset_index()

In [None]:
number_rating

In [None]:
number_rating.rename(columns = {"rating" : "number of ratings"}, inplace= True)

In [None]:
number_rating

Next, we join these counts back to `ratings_with_books` on the title:

In [None]:
final_rating = ratings_with_books.merge(number_rating, on = "title")
final_rating

In [None]:
final_rating = final_rating[final_rating["number of ratings"] >= 50]
final_rating

What remains are ratings from users who have given more than 200 ratings, on books that have at least 50 ratings.
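The count-then-merge steps above can also be written with `groupby(...).transform("count")`, which returns a Series aligned with the original rows, so no separate table or merge is needed. A sketch on made-up rows, with a threshold of 2 standing in for 50:

```python
import pandas as pd

# Made-up stand-in for ratings_with_books
rwb = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2],
    "title": ["X", "X", "X", "Y", "Z"],
    "rating": [5, 4, 0, 3, 2],
})

# transform("count") gives each row the number of ratings of its title,
# aligned with the original rows
rwb["number of ratings"] = rwb.groupby("title")["rating"].transform("count")
popular = rwb[rwb["number of ratings"] >= 2]  # 2 stands in for 50
print(popular)  # the three ratings of "X"
```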

### Removing duplicates

In [None]:
final_rating.duplicated(["user_id", "title"]).sum()

So some users have rated the same title more than once (possibly different editions of the same book, which have different ISBNs but the same title). We need to remove these duplicates from our records.

We will keep only the first rating.

In [None]:
final_rating.drop_duplicates(["user_id", "title"])

In [None]:
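# Reassign instead of inplace=True: final_rating is a filtered slice, and keep="first" is the default
final_rating = final_rating.drop_duplicates(["user_id", "title"])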

Let us see how many unique books remain:

In [None]:
len(final_rating["title"].unique())

So we are left with only 742 books to work with.

## Making a pivot table

We want the books as rows, the users as columns, and each user's rating of a book as the cell value.

In [None]:
book_pivot = final_rating.pivot_table(columns = "user_id", index = "title", values = "rating")
book_pivot

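So we are left with 742 books and 888 users to work with. There are fewer users than the 899 selected earlier because some of them did not rate any of the books that passed the 50-rating filter.

Most cells are NaN, because each user has rated only a few of these books. We fill them with 0 so that distances can be computed.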

In [None]:
book_pivot.fillna(0, inplace= True)

In [None]:
book_pivot
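On a toy table, the same layout can be built with `pivot_table`'s `fill_value` argument, which fills the missing cells with 0 in one step. This is a sketch on made-up ratings, not the notebook's data.

```python
import pandas as pd

# Made-up ratings with the notebook's column names
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "title": ["A", "B", "A", "C"],
    "rating": [5, 3, 4, 2],
})

# Titles as rows, users as columns, ratings as values (the layout of book_pivot).
# fill_value=0 fills the empty cells, like calling fillna(0) afterwards.
pivot = toy.pivot_table(index="title", columns="user_id", values="rating", fill_value=0)
print(pivot)
```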

## Transformation

We will use scikit-learn's `NearestNeighbors`, an unsupervised algorithm that, given a query point, finds the k closest points according to a distance metric (Euclidean by default). Unlike a clustering algorithm such as k-means, it does not group the data into clusters; it simply ranks points by their distance to the query.

Our pivot table is very sparse (most entries are zero), so we can store it more efficiently as a SciPy sparse matrix.

A Compressed Sparse Row (CSR) matrix is a popular data structure for efficiently storing and manipulating sparse matrices, that is, matrices in which most of the elements are zero. It keeps only the non-zero values, their column indices, and the position where each row starts, which saves a lot of memory.
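To see what CSR actually stores, here is a small 3 by 3 example with SciPy; the matrix values are arbitrary.

```python
import numpy as np
import scipy.sparse as sp

dense = np.array([
    [0, 0, 3],
    [4, 0, 0],
    [0, 5, 6],
])
m = sp.csr_matrix(dense)

print(m.data)     # non-zero values, row by row: [3 4 5 6]
print(m.indices)  # column of each stored value: [2 0 1 2]
print(m.indptr)   # row i occupies data[indptr[i]:indptr[i+1]]: [0 1 2 4]
```

Only 4 of the 9 entries are stored. For a matrix like `book_pivot`, where most users have rated only a handful of the books, the savings are much larger.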

In [None]:
import scipy.sparse as sp

In [None]:
csr_matrix = sp.csr_matrix(book_pivot)
csr_matrix

# Importing the Nearest Neighbors Algorithm

In [None]:
from sklearn.neighbors import NearestNeighbors

We create an instance of scikit-learn's `NearestNeighbors` class.

The `NearestNeighbors` class is used for unsupervised machine learning tasks, in particular for finding the nearest neighbors of points in a dataset.

In [None]:
model = NearestNeighbors(algorithm="brute")

The `algorithm` parameter is set to `"brute"`, so this instance uses a brute-force search: it computes the distance from the query to every data point and keeps the closest ones. This can be computationally expensive for large datasets. (For sparse input such as our CSR matrix, scikit-learn uses brute force regardless of this setting.)
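To make the brute-force idea concrete, the sketch below computes Euclidean distances by hand with NumPy on random toy data and compares them with `NearestNeighbors`, whose default metric (Minkowski with p=2) is Euclidean. The data is random and only for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.integers(0, 11, size=(8, 5)).astype(float)  # 8 toy "books" rated by 5 "users"
query = X[3].reshape(1, -1)

# Brute force by hand: distance from the query to every row, then sort
dists = np.sqrt(((X - query) ** 2).sum(axis=1))

# Default metric is Minkowski with p=2, i.e. Euclidean distance
nn = NearestNeighbors(algorithm="brute").fit(X)
sk_dist, sk_idx = nn.kneighbors(query, n_neighbors=3)

print(np.sort(dists)[:3])  # three smallest hand-computed distances
print(sk_dist[0])          # the same three distances from scikit-learn
print(sk_idx[0][0])        # normally 3: the query's own row comes first, at distance 0
```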

In [None]:
model.fit(csr_matrix)

In [None]:
model.n_neighbors

By default, `n_neighbors` is 5; we can override it for each query when calling `kneighbors`.

Given a book's rating vector and the number of neighbors we want, the model returns the most similar books, which we use as recommendations.

In [None]:
book_pivot

`iloc` is pandas' integer-location-based indexer: it selects rows and columns of a DataFrame or Series by their integer positions rather than by labels or boolean conditions.
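A small example of `iloc` next to label-based `loc`, and of the `reshape(1, -1)` used below to turn one row into the 2-D shape `kneighbors` expects. The DataFrame here is a made-up stand-in for `book_pivot`.

```python
import pandas as pd

# Made-up stand-in for book_pivot: titles as rows, user ids as columns
toy_pivot = pd.DataFrame(
    [[5, 0, 3], [0, 4, 0]],
    index=["Book A", "Book B"],
    columns=[101, 102, 103],
)

row_by_position = toy_pivot.iloc[1, :]     # second row, by integer position
row_by_label = toy_pivot.loc["Book B", :]  # the same row, by its index label
print(row_by_position.equals(row_by_label))  # True

# kneighbors expects a 2-D array (n_queries, n_features),
# so one row is reshaped from (3,) to (1, 3)
query = row_by_position.values.reshape(1, -1)
print(query.shape)  # (1, 3)
```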

In [None]:
book_pivot.iloc[0,:]

In [None]:
book_pivot.index.values

In [None]:
book_pivot.iloc[237,:]

In [None]:
book_pivot.iloc[237,:].values.reshape(1,-1)

In [None]:
distances, suggestions = model.kneighbors(book_pivot.iloc[237,:].values.reshape(1,-1), n_neighbors = 6)

In [None]:
distances

In [None]:
suggestions

So the model finds the 6 books nearest to the query in the high-dimensional space of user ratings. The nearest one is normally the query book itself (distance 0), which leaves 5 books for recommendations.

In [None]:
book_pivot.index[240]

In [None]:
for suggestion in suggestions:
    print(book_pivot.index[suggestion])

All of these books are related to Harry Potter.

Using nothing but users' ratings, collaborative filtering can make very good recommendations.

## Creating a usable function for this recommender system

In [None]:
book_pivot.index == "The Cradle Will Fall"

But we want its integer position:

In [None]:
np.where(book_pivot.index == "The Cradle Will Fall")

In [None]:
np.where(book_pivot.index == "The Cradle Will Fall")[0][0]
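An alternative to `np.where(...)[0][0]`, sketched here on a toy index: `pandas.Index.get_loc` returns the position of a unique label directly and raises `KeyError` for a missing title, instead of the less descriptive `IndexError` that `[0][0]` raises on an empty result.

```python
import pandas as pd

# Toy index standing in for book_pivot.index (titles are unique after pivot_table)
titles = pd.Index(["Book A", "The Cradle Will Fall", "Book C"])

pos = titles.get_loc("The Cradle Will Fall")
print(pos)  # 1

try:
    titles.get_loc("A Title That Is Not There")
except KeyError:
    print("title not found")
```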

In [None]:
def recommend_books(book_name):
    # Integer position of the requested title in the pivot table's index
    book_id = np.where(book_pivot.index == book_name)[0][0]
    # 6 neighbors: normally the book itself (distance 0) plus 5 recommendations
    distances, suggestions = model.kneighbors(book_pivot.iloc[book_id, :].values.reshape(1, -1), n_neighbors=6)
    # suggestions has shape (1, 6); map its single row of positions back to titles
    return list(book_pivot.index[suggestions[0]])

In [None]:
recommend_books("The Cradle Will Fall")

In [None]:
recommend_books("The Cradle Will Fall")[1:]  # drop the query book itself

# Saving the Required Objects as Pickle Files to Deploy with Streamlit

In [None]:
book_pivot

The app needs `book_pivot` to build the list of books a user can select from.

In [None]:
import pickle

In [None]:
# The with-block closes the file once the object is written
with open("book_pivot.pkl", "wb") as f:
    pickle.dump(book_pivot, f)

We also need the fitted model:

In [None]:
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
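To check that pickled objects can be loaded back, as the app would do at startup, here is a round-trip sketch. It uses toy stand-ins (`toy_pivot`, `toy_model`) and a temporary directory, so it does not overwrite the notebook's files or variables.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Toy stand-ins for book_pivot and model (made-up ratings)
toy_pivot = pd.DataFrame([[5.0, 0.0], [0.0, 4.0], [4.0, 1.0]], index=["A", "B", "C"])
toy_model = NearestNeighbors(algorithm="brute").fit(toy_pivot.values)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    with open(path, "wb") as f:  # the with-block closes the file after writing
        pickle.dump(toy_model, f)
    with open(path, "rb") as f:  # loading, as the app would at startup
        loaded = pickle.load(f)

# The reloaded model answers queries like the original one
_, idx = loaded.kneighbors(toy_pivot.iloc[[0]].values, n_neighbors=2)
print(list(toy_pivot.index[idx[0]]))  # ['A', 'C']
```

Pickled scikit-learn models are generally only guaranteed to load correctly with the same scikit-learn version that saved them, so it is safest to pin that version in the app's environment.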

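This is the function the app can use, with `book_pivot` and `model` loaded from the pickle files:

In [None]:
def recommend_books(book_name):
    # Integer position of the requested title in the pivot table's index
    book_id = np.where(book_pivot.index == book_name)[0][0]
    # 6 neighbors: normally the book itself (distance 0) plus 5 recommendations
    distances, suggestions = model.kneighbors(book_pivot.iloc[book_id, :].values.reshape(1, -1), n_neighbors=6)
    # suggestions has shape (1, 6); map its single row of positions back to titles
    return list(book_pivot.index[suggestions[0]])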

In [None]:
book_pivot.index.values