## Recommendation systems 

Recommendation systems are algorithms and techniques that provide personalized suggestions to users. They are widely used on online platforms to improve the user experience and to help users discover relevant content, products, or services. There are several types of recommendation systems, including:



### Content-Based Filtering:

This approach recommends items similar to those a user has shown interest in, based on the characteristics of the items and the user's profile. It does not rely on the ratings or behavior of other users.

### Collaborative Filtering:

User-Based Collaborative Filtering: This method recommends items to a user based on the preferences of users who are similar to them.

Item-Based Collaborative Filtering: This method recommends items based on their similarity to items the user has already interacted with.
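
The difference between the two flavours can be sketched on a toy rating matrix. This is only an illustration under assumptions: the numbers are made up, a 0 stands for "no rating", and cosine similarity is one common choice of similarity measure rather than the only one.

```python
import numpy as np

# Toy rating matrix (hypothetical data): rows are users, columns are items.
# A 0 means "no rating" in this sketch.
ratings = np.array([
    [5, 4, 0, 0],   # user 0
    [4, 5, 0, 1],   # user 1
    [0, 0, 5, 4],   # user 2
], dtype=float)

def cosine_similarity_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / norms
    return unit @ unit.T

# User-based view: compare users by the rows of the matrix.
user_sim = cosine_similarity_matrix(ratings)

# Item-based view: compare items by the columns (rows of the transpose).
item_sim = cosine_similarity_matrix(ratings.T)

# Is user 0 closer to user 1 than to user 2?
print(user_sim[0, 1] > user_sim[0, 2])
# Is item 0 closer to item 1 than to item 2?
print(item_sim[0, 1] > item_sim[0, 2])
```

The same matrix serves both methods. Only the axis along which similarity is computed changes.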

### Hybrid Recommender Systems:

These systems combine multiple recommendation techniques, for example collaborative and content-based filtering, to provide more accurate and diverse recommendations.

- We will use a collaborative filtering approach here.

# Importing Libraries and Dataset

In [1]:
import pandas as pd
import numpy as np

### 1. Books

In [2]:
# on_bad_lines="skip" replaces error_bad_lines=False, which was removed in pandas 2.0
books = pd.read_csv("BX_Books.csv",
                   sep = ";", on_bad_lines = "skip", encoding= "latin-1")
books.head()





Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...
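
To see what skipping bad lines does, the loading call can be tried on a small in-memory sample. This is a sketch under assumptions: the three data lines are invented, `on_bad_lines="skip"` is the spelling used by pandas 1.3 and later (older releases used `error_bad_lines=False`), and `dtype={"ISBN": str}` is an extra option, not part of the original call, that keeps leading zeros in ISBNs.

```python
import io
import pandas as pd

# Hypothetical sample in the same ";"-separated layout.
# The second data line has extra fields and cannot be parsed cleanly.
sample = io.StringIO(
    "ISBN;Book-Title;Book-Author\n"
    "0195153448;Classical Mythology;Mark P. O. Morford\n"
    "0000000000;Broken;Row;With;Too;Many;Fields\n"
    "0002005018;Clara Callan;Richard Bruce Wright\n"
)

sample_books = pd.read_csv(
    sample,
    sep=";",
    on_bad_lines="skip",   # drop rows with too many fields instead of raising
    dtype={"ISBN": str},   # assumption: read ISBNs as text to keep leading zeros
)
print(sample_books)
```

Skipping is silent, so it is worth comparing the resulting row count with the number of lines in the file when data completeness matters.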


In [3]:
books.shape

(271379, 8)

In [4]:
books.columns

Index(['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher',
       'Image-URL-S', 'Image-URL-M', 'Image-URL-L'],
      dtype='object')

In [5]:
books = books[["ISBN", "Book-Title",'Book-Author', 'Year-Of-Publication', 'Publisher' ]]
books.head()

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company


In [6]:
books.rename(columns = {"Book-Title": "title", 
                        "Book-Author": "author", 
                        'Year-Of-Publication' : "year", 
                        'Publisher' : "publisher"},
            inplace = True)
books.head()

Unnamed: 0,ISBN,title,author,year,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company


### 2. Users

In [7]:
users = pd.read_csv("BX-Users.csv",
                   sep = ";", on_bad_lines = "skip", encoding= "latin-1")
users.head()





Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [8]:
users.shape

(278858, 3)

In [9]:
users.rename(columns = {"User-ID" : "user_id", "Location": "location", "Age" : "age"}, inplace= True)
users.head()

Unnamed: 0,user_id,location,age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


### 3. Ratings

In [10]:
ratings = pd.read_csv("BX-Book-Ratings.csv",
                   sep = ";", on_bad_lines = "skip", encoding= "latin-1")
ratings.head()





Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [11]:
ratings.rename(columns = {"User-ID" : "user_id", "Book-Rating" : "rating"}, inplace= True)
ratings.head()

Unnamed: 0,user_id,ISBN,rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


Let us check how many records each table contains:

In [12]:
books.shape

(271379, 5)

In [13]:
users.shape

(278858, 3)

In [14]:
ratings.shape

(1149780, 3)

## Setting approach

We will use collaborative filtering: recommendations are derived from how users have rated books, not from the attributes of the books themselves.

We will suggest books that you have not read but might like because users with similar tastes liked them.

We will only consider ratings from users who have rated a lot of books (knowledgeable users), so that their ratings are a reliable signal.

We will create a matrix with users as columns, books as rows, and ratings as values.

We will use these constraints:

1. We will keep only books that have at least 50 ratings, so that we can rely on the ratings.
2. We will keep only users who have given more than 200 ratings (knowledgeable users).
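
The two constraints can be sketched on a miniature table before applying them to the real data. Everything here is an assumption made for illustration: the data is invented, and the thresholds are scaled down to 2 so that the effect is visible on seven rows.

```python
import pandas as pd

# Hypothetical miniature ratings table.
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3],
    "title":   ["A", "B", "C", "A", "B", "D", "A"],
    "rating":  [8, 0, 5, 7, 9, 0, 10],
})
MIN_USER_RATINGS = 2   # keep users with MORE than this many ratings
MIN_BOOK_RATINGS = 2   # keep books with AT LEAST this many ratings

# Step 1: keep only active users.
user_counts = toy["user_id"].value_counts()
active_users = user_counts[user_counts > MIN_USER_RATINGS].index
toy_active = toy[toy["user_id"].isin(active_users)]

# Step 2: among the remaining ratings, keep only frequently rated books.
book_counts = toy_active.groupby("title")["rating"].transform("count")
toy_final = toy_active[book_counts >= MIN_BOOK_RATINGS]

print(toy_final)
```

The order matters: counting book ratings after the user filter means a book needs enough ratings from the retained users, not from everyone.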

### 1. Picking Users

In [15]:
ratings.head(2)

Unnamed: 0,user_id,ISBN,rating
0,276725,034545104X,0
1,276726,0155061224,5


In [16]:
ratings["rating"].value_counts()

0     716109
8     103736
10     78610
7      76457
9      67541
5      50974
6      36924
4       8904
3       5996
2       2759
1       1770
Name: rating, dtype: int64

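So ratings range from 0 to 10. In the Book-Crossing dataset, a rating of 0 marks an implicit interaction (no explicit score was given), which is why it is by far the most common value.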

In [17]:
ratings["user_id"].value_counts()

11676     13602
198711     7550
153662     6109
98391      5891
35859      5850
          ...  
116180        1
116166        1
116154        1
116137        1
276723        1
Name: user_id, Length: 105283, dtype: int64

In total, 105,283 unique users have rated at least one book.

We need to select the users who have rated more than 200 books.

In [18]:
ratings["user_id"].value_counts() > 200

11676      True
198711     True
153662     True
98391      True
35859      True
          ...  
116180    False
116166    False
116154    False
116137    False
276723    False
Name: user_id, Length: 105283, dtype: bool

In [19]:
x = ratings["user_id"].value_counts() > 200

In [20]:
x[x]

11676     True
198711    True
153662    True
98391     True
35859     True
          ... 
274808    True
28634     True
59727     True
268622    True
188951    True
Name: user_id, Length: 899, dtype: bool

In [21]:
x[x].shape

(899,)

Only 899 users meet this threshold.

These are the prolific readers whose ratings our model will be built on.

Let us collect the user_id values of these users:

In [22]:
y = x[x].index
y

Int64Index([ 11676, 198711, 153662,  98391,  35859, 212898, 278418,  76352,
            110973, 235105,
            ...
            260183,  73681,  44296, 155916,   9856, 274808,  28634,  59727,
            268622, 188951],
           dtype='int64', length=899)

In [23]:
y[0]

11676

In [24]:
ratings["user_id"].isin(y)

0          False
1          False
2          False
3          False
4          False
           ...  
1149775    False
1149776    False
1149777    False
1149778    False
1149779    False
Name: user_id, Length: 1149780, dtype: bool

In [25]:
ratings = ratings[ratings["user_id"].isin(y)]
ratings

Unnamed: 0,user_id,ISBN,rating
1456,277427,002542730X,10
1457,277427,0026217457,0
1458,277427,003008685X,8
1459,277427,0030615321,0
1460,277427,0060002050,0
...,...,...,...
1147612,275970,3829021860,0
1147613,275970,4770019572,0
1147614,275970,896086097,0
1147615,275970,9626340762,8


So we are left with roughly 500,000 ratings from these 899 users.

### Joining books table to ratings table now

In [26]:
books.head()

Unnamed: 0,ISBN,title,author,year,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company


We can join these two tables on ISBN.

In [27]:
ratings_with_books = ratings.merge(books, on = "ISBN")
ratings_with_books

Unnamed: 0,user_id,ISBN,rating,title,author,year,publisher
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc
1,3363,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc
2,11676,002542730X,6,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc
3,12538,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc
4,13552,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc
...,...,...,...,...,...,...,...
487680,275970,1892145022,0,Here Is New York,E. B. White,1999,Little Bookroom
487681,275970,1931868123,0,There's a Porcupine in My Outhouse: Misadventu...,Mike Tougias,2002,Capital Books (VA)
487682,275970,3411086211,10,Die Biene.,Sybil GrÃ?Â¤fin SchÃ?Â¶nfeldt,1993,"Bibliographisches Institut, Mannheim"
487683,275970,3829021860,0,The Penis Book,Joseph Cohen,1999,Konemann


Because merge performs an inner join by default, ratings whose ISBN is missing from the books table are dropped, and so are books that none of these users rated.
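
To check what an inner join discards, an outer join with `indicator=True` can be used as an audit. The sketch below uses two invented tables: one rated ISBN is missing from the catalogue and one catalogue entry has no ratings.

```python
import pandas as pd

# Hypothetical tiny tables.
toy_ratings = pd.DataFrame({
    "user_id": [10, 11, 12],
    "ISBN": ["111", "222", "999"],
    "rating": [8, 0, 5],
})
toy_books = pd.DataFrame({
    "ISBN": ["111", "222", "333"],
    "title": ["Book One", "Book Two", "Book Three"],
})

# Default merge is an inner join: only ISBNs present in both tables survive.
inner = toy_ratings.merge(toy_books, on="ISBN")

# An outer join with indicator=True labels where each row came from.
audit = toy_ratings.merge(toy_books, on="ISBN", how="outer", indicator=True)
dropped = audit[audit["_merge"] != "both"]

print(len(inner))
print(dropped[["ISBN", "_merge"]])
```

Rows marked `left_only` are ratings without a matching book, and rows marked `right_only` are books without a rating.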

### 2. Picking books

Books must have at least 50 ratings:

In [28]:
ratings_with_books.groupby("title")["rating"].count().reset_index()

Unnamed: 0,title,rating
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,Beyond IBM: Leadership Marketing and Finance ...,1
4,Clifford Visita El Hospital (Clifford El Gran...,1
...,...,...
160275,Ã?Â?ber die Pflicht zum Ungehorsam gegen den S...,3
160276,Ã?Â?lpiraten.,1
160277,Ã?Â?rger mit Produkt X. Roman.,1
160278,Ã?Â?stlich der Berge.,1


In [31]:
number_rating = ratings_with_books.groupby("title")["rating"].count().reset_index()

In [32]:
number_rating

Unnamed: 0,title,rating
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,Beyond IBM: Leadership Marketing and Finance ...,1
4,Clifford Visita El Hospital (Clifford El Gran...,1
...,...,...
160275,Ã?Â?ber die Pflicht zum Ungehorsam gegen den S...,3
160276,Ã?Â?lpiraten.,1
160277,Ã?Â?rger mit Produkt X. Roman.,1
160278,Ã?Â?stlich der Berge.,1


In [33]:
number_rating.rename(columns = {"rating" : "number of ratings"}, inplace= True)

In [34]:
number_rating

Unnamed: 0,title,number of ratings
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,Beyond IBM: Leadership Marketing and Finance ...,1
4,Clifford Visita El Hospital (Clifford El Gran...,1
...,...,...
160275,Ã?Â?ber die Pflicht zum Ungehorsam gegen den S...,3
160276,Ã?Â?lpiraten.,1
160277,Ã?Â?rger mit Produkt X. Roman.,1
160278,Ã?Â?stlich der Berge.,1


We now join these counts back onto the ratings-with-books table:

In [35]:
final_rating = ratings_with_books.merge(number_rating, on = "title")
final_rating

Unnamed: 0,user_id,ISBN,rating,title,author,year,publisher,number of ratings
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
1,3363,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
2,11676,002542730X,6,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
3,12538,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
4,13552,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
...,...,...,...,...,...,...,...,...
487680,275970,1892145022,0,Here Is New York,E. B. White,1999,Little Bookroom,1
487681,275970,1931868123,0,There's a Porcupine in My Outhouse: Misadventu...,Mike Tougias,2002,Capital Books (VA),1
487682,275970,3411086211,10,Die Biene.,Sybil GrÃ?Â¤fin SchÃ?Â¶nfeldt,1993,"Bibliographisches Institut, Mannheim",1
487683,275970,3829021860,0,The Penis Book,Joseph Cohen,1999,Konemann,1


In [39]:
final_rating = final_rating[final_rating["number of ratings"] >= 50]
final_rating

Unnamed: 0,user_id,ISBN,rating,title,author,year,publisher,number of ratings
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
1,3363,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
2,11676,002542730X,6,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
3,12538,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
4,13552,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
...,...,...,...,...,...,...,...,...
236701,255489,0553579983,7,And Then You Die,Iris Johansen,1998,Bantam,50
236702,256407,0553579983,0,And Then You Die,Iris Johansen,1998,Bantam,50
236703,257204,0553579983,0,And Then You Die,Iris Johansen,1998,Bantam,50
236704,261829,0553579983,0,And Then You Die,Iris Johansen,1998,Bantam,50


These rows cover users who have given more than 200 ratings and books that have at least 50 ratings.

### Removing duplicates

In [43]:
final_rating.duplicated(["user_id", "title"]).sum()

2003

So some users have rated the same book more than once, and these repeated rows need to be removed.

We will keep only the first rating of each (user, book) pair.
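
The duplicate handling can be tried on a toy table first. The data is invented. Assigning the result back to a variable, rather than using `inplace=True` on a filtered DataFrame, is a design choice that avoids pandas' copy-of-a-slice warning.

```python
import pandas as pd

# Hypothetical ratings: user 1 rated book "A" twice.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "title": ["A", "A", "A", "B"],
    "rating": [9, 3, 7, 0],
})

# Count rows that repeat an earlier (user_id, title) pair.
n_dupes = toy.duplicated(["user_id", "title"]).sum()

# keep="first" (the default) retains the earliest row of each pair.
deduped = toy.drop_duplicates(["user_id", "title"], keep="first")

print(n_dupes)
print(deduped)
```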

In [44]:
final_rating.drop_duplicates(["user_id", "title"])

Unnamed: 0,user_id,ISBN,rating,title,author,year,publisher,number of ratings
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
1,3363,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
2,11676,002542730X,6,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
3,12538,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
4,13552,002542730X,0,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc,82
...,...,...,...,...,...,...,...,...
236701,255489,0553579983,7,And Then You Die,Iris Johansen,1998,Bantam,50
236702,256407,0553579983,0,And Then You Die,Iris Johansen,1998,Bantam,50
236703,257204,0553579983,0,And Then You Die,Iris Johansen,1998,Bantam,50
236704,261829,0553579983,0,And Then You Die,Iris Johansen,1998,Bantam,50


In [45]:
final_rating = final_rating.drop_duplicates(["user_id", "title"])


Let us see how many unique books remain.

In [49]:
len(final_rating["title"].unique())

742

So we are left with only 742 books to work with.

## Making a pivot

We want users as columns, book titles as rows, and each cell holding that user's rating of that book.
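
A small sketch shows the layout that `pivot_table` produces and where the missing values come from. The four ratings are invented for illustration.

```python
import pandas as pd

# Hypothetical long-format ratings.
toy = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "title": ["A", "B", "A", "B"],
    "rating": [9, 0, 7, 10],
})

# Rows are titles, columns are users, cells are ratings.
# pivot_table averages repeated (title, user_id) pairs by default.
toy_pivot = toy.pivot_table(index="title", columns="user_id", values="rating")

# Combinations that were never rated come out as NaN.
print(toy_pivot)
print(toy_pivot.isna().sum().sum())
```

Note that filling these NaN cells with 0 makes "not rated" indistinguishable from a recorded rating of 0. That is a simplification to keep in mind when reading the distances later.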

In [47]:
book_pivot = final_rating.pivot_table(columns = "user_id", index = "title", values = "rating")
book_pivot

user_id,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,274004,274061,274301,274308,274808,275970,277427,277478,277639,278418
title,,,,,,,,,,,,,,,,,,,,,
1984,9.0,,,,,,,,,,...,,,,,,0.0,,,,
1st to Die: A Novel,,,,,,,,,,,...,,,,,,,,,,
2nd Chance,,10.0,,,,,,,,,...,,,,0.0,,,,,0.0,
4 Blondes,,,,,,,,,,0.0,...,,,,,,,,,,
84 Charing Cross Road,,,,,,,,,,,...,,,,,,10.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,7.0,,...,,,,,,0.0,,,,
You Belong To Me,,,,,,,,,,,...,,,,,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,0.0,,,,,0.0,...,,,,,,0.0,,,,
Zoya,,,,,,,,,,,...,,,,,,,,,,


So we are left with 742 books and 888 users to work with.

The pivot contains NaN wherever a user has not rated a book, so we fill those cells with 0.

In [51]:
book_pivot.fillna(0, inplace= True)

In [52]:
book_pivot

user_id,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,274004,274061,274301,274308,274808,275970,277427,277478,277639,278418
title,,,,,,,,,,,,,,,,,,,,,
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2nd Chance,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4 Blondes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
84 Charing Cross Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,7.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Transformation

We will use scikit-learn's NearestNeighbors, which is not a clustering algorithm: it computes distances between the rows of a matrix and returns the K rows closest to a query row.

The pivot table is very sparse, meaning most of its entries are zero, so we can store it more efficiently with the CSR format from scipy.sparse.

A Compressed Sparse Row (CSR) matrix is a popular data structure for efficient storage and manipulation of sparse matrices, that is, matrices in which most of the elements are zero. It keeps only the non-zero values together with their column positions and row boundaries, which makes it memory-efficient.
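
To make the CSR layout concrete, here is a minimal sketch on an invented 3x4 matrix. It prints the three arrays that `scipy.sparse.csr_matrix` stores internally.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical dense matrix with mostly zeros.
dense = np.array([
    [9.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 10.0, 0.0],
    [0.0, 7.0, 0.0, 8.0],
])
sparse = sp.csr_matrix(dense)

print(sparse.data)     # non-zero values, row by row
print(sparse.indices)  # column index of each stored value
print(sparse.indptr)   # offsets into data/indices where each row starts
print(sparse.nnz)      # number of stored elements
```

Only the four non-zero entries are stored. For a matrix like the book pivot, where the vast majority of cells are zero, this saves a lot of memory.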

In [55]:
import scipy.sparse as sp

In [56]:
csr_matrix = sp.csr_matrix(book_pivot)
csr_matrix

<742x888 sparse matrix of type '<class 'numpy.float64'>'
	with 14942 stored elements in Compressed Sparse Row format>

# Importing the nearest-neighbors model

In [58]:
from sklearn.neighbors import NearestNeighbors

We create an instance of the NearestNeighbors class from scikit-learn.

The NearestNeighbors class is an unsupervised learner for finding the nearest neighbors of a point in a dataset.

In [59]:
model = NearestNeighbors(algorithm="brute")

The algorithm parameter is set to "brute," which means that this instance of NearestNeighbors will use a brute-force search to find the nearest neighbors. In a brute-force search, the algorithm directly computes the distances between data points to find the nearest neighbors, which can be computationally expensive for large datasets.
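
What the brute-force search computes can be reproduced by hand. The sketch below uses an invented 5x3 matrix and compares a manual Euclidean search in NumPy with `NearestNeighbors`. With default settings the metric is Minkowski with p=2, which is the Euclidean distance.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy data: 5 items described by 3 ratings each.
X = np.array([
    [9.0, 0.0, 0.0],
    [8.0, 1.0, 0.0],
    [0.0, 10.0, 0.0],
    [0.0, 9.0, 2.0],
    [0.0, 0.0, 7.0],
])
query = X[0].reshape(1, -1)

# Brute force by hand: Euclidean distance from the query to every row.
manual_dist = np.sqrt(((X - query) ** 2).sum(axis=1))
manual_order = np.argsort(manual_dist)[:3]

# The same search through scikit-learn.
nn = NearestNeighbors(n_neighbors=3, algorithm="brute").fit(X)
dist, idx = nn.kneighbors(query)

print(manual_order, idx[0])
```

The cost grows with the number of rows times the number of columns per query. That is acceptable for a few hundred books but becomes expensive for large catalogues.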

In [60]:
model.fit(csr_matrix)

NearestNeighbors(algorithm='brute')

In [68]:
model.n_neighbors

5

So it uses 5 nearest neighbors by default.

Given an input book and the number of recommendations we want, the model will now tell us which books to suggest.


iloc is a pandas indexer used for integer-location-based selection in a DataFrame or Series. It selects rows and columns by their integer positions rather than by labels or boolean conditions.
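
A short sketch contrasts position-based and label-based selection on an invented pivot, and shows the reshape that turns one row into a query for `kneighbors`.

```python
import pandas as pd

# Hypothetical miniature pivot: titles as rows, users as columns.
toy_pivot = pd.DataFrame(
    [[9.0, 0.0], [0.0, 10.0], [7.0, 8.0]],
    index=pd.Index(["1984", "2nd Chance", "Zoya"], name="title"),
    columns=pd.Index([254, 2276], name="user_id"),
)

by_position = toy_pivot.iloc[1, :]          # second row, all columns
by_label = toy_pivot.loc["2nd Chance", :]   # same row, selected by its title

# Turn the row into the 2-D shape (1, n_users) that kneighbors expects.
as_query = by_position.values.reshape(1, -1)
print(as_query.shape)
```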

In [73]:
book_pivot.iloc[0,:]

user_id
254       9.0
2276      0.0
2766      0.0
2977      0.0
3363      0.0
         ... 
275970    0.0
277427    0.0
277478    0.0
277639    0.0
278418    0.0
Name: 1984, Length: 888, dtype: float64

In [75]:
book_pivot.index.values

array(['1984', '1st to Die: A Novel', '2nd Chance', '4 Blondes',
       '84 Charing Cross Road', 'A Bend in the Road', 'A Case of Need',
       'A Child Called \\It\\": One Child\'s Courage to Survive"',
       'A Civil Action', 'A Cry In The Night',
       'A Darkness More Than Night', 'A Day Late and a Dollar Short',
       'A Fine Balance', 'A Great Deliverance',
       'A Heartbreaking Work of Staggering Genius',
       'A Is for Alibi (Kinsey Millhone Mysteries (Paperback))',
       'A Lesson Before Dying (Vintage Contemporaries (Paperback))',
       'A Man Named Dave: A Story of Triumph and Forgiveness',
       'A Man in Full', 'A Map of the World', 'A Painted House',
       'A Patchwork Planet', 'A Prayer for Owen Meany',
       'A Thin Dark Line (Mysteries & Horror)',
       "A Thousand Acres (Ballantine Reader's Circle)", 'A Time to Kill',
       "A Virtuous Woman (Oprah's Book Club (Paperback))",
       'A Walk to Remember', 'A Widow for One Year', 'A Wrinkle In Time',
      

In [70]:
book_pivot.iloc[237,:]

user_id
254       9.0
2276      0.0
2766      0.0
2977      0.0
3363      0.0
         ... 
275970    9.0
277427    0.0
277478    0.0
277639    0.0
278418    0.0
Name: Harry Potter and the Chamber of Secrets (Book 2), Length: 888, dtype: float64

In [77]:
book_pivot.iloc[237,:].values.reshape(1,-1)

array([[ 9.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  8.,  0.,  0.,  0.,  0.,  0.,  0.,
         8.,  0.,  0.,  8.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         9.,  0.,  0.,  0., 10.,  0.,  0., 10.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., 10.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  7.,  9.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  8.,  0.,  0.,  0.,  0.,  0.,  0.,  9.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0., 10.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0., 10.,  0.,  0.,  0.,  0.,  0.,  0.,  8.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.

In [79]:
distances, suggestions = model.kneighbors(book_pivot.iloc[237,:].values.reshape(1,-1), n_neighbors = 6)

In [80]:
distances

array([[ 0.        , 68.78953409, 69.5413546 , 72.64296249, 76.83098333,
        77.28518616]])

In [81]:
suggestions

array([[237, 240, 238, 241, 184, 536]])

So the model returns the 6 nearest books in the high-dimensional rating space. The first one is the query book itself (distance 0), which leaves 5 books to use as recommendations.

In [89]:
book_pivot.index[240]

'Harry Potter and the Prisoner of Azkaban (Book 3)'

In [90]:
for suggestion in suggestions:
    print(book_pivot.index[suggestion])

Index(['Harry Potter and the Chamber of Secrets (Book 2)',
       'Harry Potter and the Prisoner of Azkaban (Book 3)',
       'Harry Potter and the Goblet of Fire (Book 4)',
       "Harry Potter and the Sorcerer's Stone (Book 1)", 'Exclusive',
       'The Cradle Will Fall'],
      dtype='object', name='title')


The three closest suggestions are the other Harry Potter books, while the last two are unrelated titles at larger distances.

Using nothing but user ratings, collaborative filtering can already produce sensible recommendations.

## Creating a usable function for this recommender system

In [91]:
book_pivot.index == "The Cradle Will Fall"

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,

But we want its integer position in the index:

In [92]:
np.where(book_pivot.index == "The Cradle Will Fall")

(array([536]),)

In [94]:
np.where(book_pivot.index == "The Cradle Will Fall")[0][0]

536

In [95]:
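def recommed_books(book_name):
    # Integer position of the requested title in the pivot's index
    book_id = np.where(book_pivot.index == book_name)[0][0]
    # Ask for 6 neighbors: the closest one is the query book itself
    distances, suggestions = model.kneighbors(book_pivot.iloc[book_id,:].values.reshape(1,-1), n_neighbors = 6)
    suggestions_list = []
    # suggestions has shape (1, 6), so this loop runs once and appends one Index of titles
    for index in suggestions:
        suggestions_list.append(book_pivot.index[index])
    return suggestions_list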

In [96]:
recommed_books("The Cradle Will Fall")

[Index(['The Cradle Will Fall', 'Exclusive', 'The Long Road Home',
        'Eyes of a Child', 'Jacob Have I Loved', 'No Safe Place'],
       dtype='object', name='title')]

In [114]:
recommed_books("The Cradle Will Fall")[0][1:]

Index(['Exclusive', 'The Long Road Home', 'Eyes of a Child',
       'Jacob Have I Loved', 'No Safe Place'],
      dtype='object', name='title')
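
As a possible refinement, the function could return a plain list of titles, drop the query book itself, and handle unknown titles. The sketch below is not the notebook's function. `recommend_titles`, `toy_pivot`, and `toy_model` are hypothetical names, and the data is invented so that the example runs on its own.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-ins for book_pivot and model, built from toy data.
toy_pivot = pd.DataFrame(
    [[9, 8, 0, 0], [8, 9, 0, 1], [0, 0, 10, 9], [0, 1, 9, 10]],
    index=pd.Index(["Fantasy 1", "Fantasy 2", "Thriller 1", "Thriller 2"], name="title"),
    dtype=float,
)
toy_model = NearestNeighbors(algorithm="brute").fit(toy_pivot.values)

def recommend_titles(book_name, pivot, fitted_model, n_recommendations=2):
    """Return up to n_recommendations titles closest to book_name."""
    if book_name not in pivot.index:
        return []  # unknown title: nothing to recommend
    book_id = pivot.index.get_loc(book_name)
    query = pivot.iloc[book_id, :].values.reshape(1, -1)
    # Ask for one extra neighbor because the query book matches itself.
    _, neighbors = fitted_model.kneighbors(query, n_neighbors=n_recommendations + 1)
    titles = [pivot.index[i] for i in neighbors[0] if i != book_id]
    return titles[:n_recommendations]

print(recommend_titles("Fantasy 1", toy_pivot, toy_model))
```

Passing the pivot and the model as arguments, instead of reading them from global variables, makes the function easier to reuse once both objects are loaded from disk.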

# Saving the required objects as pickle files to deploy this in a Streamlit application


We will need book_pivot to build the list of books from which the user selects a title.

In [100]:
import pickle

In [103]:
with open("book_pivot.pkl", "wb") as f:
    pickle.dump(book_pivot, f)

We will also need the fitted model.

In [105]:
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
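
The save-and-load round trip that a deployed application would rely on can be sketched as follows. The pivot and model here are invented toy objects, and a temporary directory stands in for wherever the application keeps its files.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical stand-ins for book_pivot and model.
toy_pivot = pd.DataFrame(
    [[9.0, 0.0], [0.0, 10.0], [8.0, 1.0]],
    index=pd.Index(["A", "B", "C"], name="title"),
)
toy_model = NearestNeighbors(n_neighbors=2, algorithm="brute").fit(toy_pivot.values)

with tempfile.TemporaryDirectory() as tmp_dir:
    pivot_path = os.path.join(tmp_dir, "book_pivot.pkl")
    model_path = os.path.join(tmp_dir, "model.pkl")

    # Context managers close the files even if dump raises.
    with open(pivot_path, "wb") as f:
        pickle.dump(toy_pivot, f)
    with open(model_path, "wb") as f:
        pickle.dump(toy_model, f)

    # What the application side would do at start-up.
    with open(pivot_path, "rb") as f:
        loaded_pivot = pickle.load(f)
    with open(model_path, "rb") as f:
        loaded_model = pickle.load(f)

# The reloaded objects answer queries like the originals.
_, idx = loaded_model.kneighbors(loaded_pivot.iloc[[0]].values)
print(loaded_pivot.index[idx[0]].tolist())
```

Pickle files should only be loaded from trusted sources. They are most reliable when the loading environment uses the same pandas and scikit-learn versions as the one that wrote them.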

The same function can then be reused in the application:

In [106]:
def recommed_books(book_name):
    book_id = np.where(book_pivot.index == book_name)[0][0]
    distances, suggestions = model.kneighbors(book_pivot.iloc[book_id,:].values.reshape(1,-1), n_neighbors = 6)
    suggestions_list = []
    for index in suggestions:
        suggestions_list.append(book_pivot.index[index])
    return suggestions_list  

In [108]:
book_pivot.index.values

array(['1984', '1st to Die: A Novel', '2nd Chance', '4 Blondes',
       '84 Charing Cross Road', 'A Bend in the Road', 'A Case of Need',
       'A Child Called \\It\\": One Child\'s Courage to Survive"',
       'A Civil Action', 'A Cry In The Night',
       'A Darkness More Than Night', 'A Day Late and a Dollar Short',
       'A Fine Balance', 'A Great Deliverance',
       'A Heartbreaking Work of Staggering Genius',
       'A Is for Alibi (Kinsey Millhone Mysteries (Paperback))',
       'A Lesson Before Dying (Vintage Contemporaries (Paperback))',
       'A Man Named Dave: A Story of Triumph and Forgiveness',
       'A Man in Full', 'A Map of the World', 'A Painted House',
       'A Patchwork Planet', 'A Prayer for Owen Meany',
       'A Thin Dark Line (Mysteries & Horror)',
       "A Thousand Acres (Ballantine Reader's Circle)", 'A Time to Kill',
       "A Virtuous Woman (Oprah's Book Club (Paperback))",
       'A Walk to Remember', 'A Widow for One Year', 'A Wrinkle In Time',
      