### Types of Recommendation Systems

1. **Content-Based Recommendation System:**
   - **Description:** Content-based recommendation systems recommend items similar to those the user has liked or interacted with in the past. The recommendations are based on the attributes or features of the items themselves.
   - **Example:** Consider a movie streaming platform where users have rated movies they've watched. A content-based recommendation system can recommend similar movies based on attributes such as genre, actors, directors, and plot keywords. For example, if a user has watched and rated several action movies starring a particular actor, the system can recommend other action movies featuring the same actor.

2. **Collaborative Filtering Recommendation System:**
   - **Description:** Collaborative filtering recommendation systems make recommendations by analyzing the interactions and preferences of multiple users. They identify patterns and similarities among users or items to generate recommendations.
   - **Example:** In a collaborative filtering system for e-commerce, if User A and User B have similar purchase histories and preferences, the system can recommend products to User A that User B has previously purchased and liked. Similarly, if User A has purchased items that are frequently co-purchased with other items, the system can recommend those related items to User A.

3. **Hybrid Recommendation System:**
   - **Description:** Hybrid recommendation systems combine multiple recommendation techniques, such as content-based filtering, collaborative filtering, and other approaches, to provide more accurate and diverse recommendations.
   - **Example:** A music streaming service may use a hybrid recommendation system that combines collaborative filtering (based on user listening history and preferences) with content-based filtering (based on music genre, artist similarity, etc.). Additionally, it may incorporate contextual information such as user location, time of day, and mood to further personalize recommendations. This hybrid approach can offer more accurate and relevant music recommendations tailored to each user's individual preferences and context.


### In this project we will use collaborative filtering
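
As a rough illustration of the collaborative idea, the sketch below compares items purely by who rated them and how, without looking at any item attributes. The rating matrix and the helper `cosine_similarity_matrix` are made up for this example and are not part of the project code.

```python
import numpy as np

# Hypothetical toy data: rows are items, columns are users, 0 means "no rating".
toy_ratings = np.array([
    [5.0, 4.0, 0.0, 0.0],   # item 0
    [4.0, 5.0, 0.0, 1.0],   # item 1
    [0.0, 0.0, 5.0, 4.0],   # item 2
])

def cosine_similarity_matrix(matrix):
    """Return the pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    normalized = matrix / norms
    return normalized @ normalized.T

item_similarity = cosine_similarity_matrix(toy_ratings)

# For item 0, rank the other items by similarity (most similar first).
order = np.argsort(-item_similarity[0])
most_similar_to_item0 = [int(i) for i in order if i != 0]
print(most_similar_to_item0)
```

Items 0 and 1 were rated highly by the same two users, so item 1 ranks ahead of item 2, which shares no raters with item 0.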

## Import Libraries

In [105]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
import pickle

## Read Data

In [11]:
books = pd.read_csv('Data/BX-Books.csv', sep=';', on_bad_lines = 'warn', encoding='latin-1')
users = pd.read_csv('Data/BX-Users.csv', sep=';', on_bad_lines = 'warn', encoding='latin-1')
ratings = pd.read_csv('Data/BX-Book-Ratings.csv', sep=";", on_bad_lines = 'warn', encoding='latin-1')

Skipping line 43667: expected 8 fields, saw 10
Skipping line 51751: expected 8 fields, saw 9

  books = pd.read_csv('Data/BX-Books.csv', sep=';', on_bad_lines = 'warn', encoding='latin-1')
Skipping line 104319: expected 8 fields, saw 9
Skipping line 121768: expected 8 fields, saw 9

  books = pd.read_csv('Data/BX-Books.csv', sep=';', on_bad_lines = 'warn', encoding='latin-1')
Skipping line 150789: expected 8 fields, saw 9
Skipping line 157128: expected 8 fields, saw 9
Skipping line 180189: expected 8 fields, saw 9
Skipping line 185738: expected 8 fields, saw 9

  books = pd.read_csv('Data/BX-Books.csv', sep=';', on_bad_lines = 'warn', encoding='latin-1')
Skipping line 220626: expected 8 fields, saw 9
Skipping line 227933: expected 8 fields, saw 11
Skipping line 228957: expected 8 fields, saw 10
Skipping line 245933: expected 8 fields, saw 9
Skipping line 251296: expected 8 fields, saw 9
Skipping line 259941: expected 8 fields, saw 9
Skipping line 261529: expected 8 fields, saw 9

  boo
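
The "Skipping line" messages come from rows whose field count is larger than the header's. As a small sketch on made-up data (the CSV text below is hypothetical and not taken from BX-Books.csv), this shows how `on_bad_lines` drops such a row; `'skip'` drops the same rows as `'warn'` but without printing a message.

```python
import io

import pandas as pd

# Hypothetical CSV text: the third line has one field too many.
raw = "ISBN;Book-Title\n1;Good row\n2;Bad;row\n3;Another good row\n"

toy_books = pd.read_csv(io.StringIO(raw), sep=";", on_bad_lines="skip")
print(toy_books)
```

Only the two well-formed rows survive, which is why the loaded table can be slightly shorter than the raw file.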

In [12]:
print(f"Shape of books: {books.shape}")
print(f"Shape of users: {users.shape}")
print(f"Shape of ratings: {ratings.shape}")

Shape of books: (271360, 8)
Shape of users: (278858, 3)
Shape of ratings: (1149780, 3)


In [13]:
# The three tables have different numbers of rows; we will handle that later on.

## Books

In [14]:
books.head(5)

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-S,Image-URL-M,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


In [15]:
books.columns

Index(['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher',
       'Image-URL-S', 'Image-URL-M', 'Image-URL-L'],
      dtype='object')

There are three image URL columns. They link to small ('S'), medium ('M'), and large ('L') versions of the same cover image,
so we keep only 'Image-URL-L' and drop 'Image-URL-S' and 'Image-URL-M'.

In [16]:
books = books[['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher', 'Image-URL-L']]

In [17]:
books.head(5)

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...


## Users

In [18]:
users.head(5)

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [21]:
users['Age'].isnull().sum()

110762

About 40% of the users (110,762 of 278,858) have no Age value (null). Age is not important for this recommender, so we leave the column as it is.
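
The share of missing values can be computed directly instead of estimated by eye. A minimal sketch, using a hypothetical five-row stand-in for the `users` table:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the `users` table; the real one is read from BX-Users.csv.
toy_users = pd.DataFrame({
    "User-ID": [1, 2, 3, 4, 5],
    "Age": [np.nan, 18.0, np.nan, 17.0, 30.0],
})

missing_count = int(toy_users["Age"].isnull().sum())
missing_share = toy_users["Age"].isnull().mean()  # fraction of rows with no Age
print(f"{missing_count} missing ({missing_share:.1%})")
```

Applied to the real table, the same `.isnull().mean()` call would give 110762 / 278858, which is roughly 0.397.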

## Book Ratings

In [24]:
ratings.head(5)

Unnamed: 0,User-ID,ISBN,Book-Rating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [25]:
ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1149780 entries, 0 to 1149779
Data columns (total 3 columns):
 #   Column       Non-Null Count    Dtype 
---  ------       --------------    ----- 
 0   User-ID      1149780 non-null  int64 
 1   ISBN         1149780 non-null  object
 2   Book-Rating  1149780 non-null  int64 
dtypes: int64(2), object(1)
memory usage: 26.3+ MB


Not a single NULL value

## PreProcessing

There are a lot of ratings, but users with only a handful of ratings contribute little signal for training. So we keep only the users who have rated many books.

In [27]:
x = ratings['User-ID'].value_counts() > 200

In [29]:
x[x].shape

(899,)

So only 899 users have rated more than 200 books.

In [30]:
y = x[x].index
y

Index([ 11676, 198711, 153662,  98391,  35859, 212898, 278418,  76352, 110973,
       235105,
       ...
       260183,  73681,  44296, 155916,   9856, 274808,  28634,  59727, 268622,
       188951],
      dtype='int64', name='User-ID', length=899)

I will select ratings of those users only

In [31]:
print(f"ratings shape before selection: {ratings.shape}")
ratings = ratings[ratings['User-ID'].isin(y)]
print(f"ratings shape after selection: {ratings.shape}")

ratings shape before selection: (1149780, 3)
ratings shape after selection: (526356, 3)


The number of ratings dropped to a little under half (526,356 of 1,149,780).
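
The filtering steps above can be replayed on a tiny made-up table to see exactly which rows survive. The data and the threshold `min_ratings` are hypothetical; the notebook itself uses 200.

```python
import pandas as pd

# Hypothetical toy ratings: user 1 has 3 ratings, user 2 has 1, user 3 has 2.
toy_ratings = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 3, 3],
    "ISBN": ["a", "b", "c", "a", "b", "c"],
    "Book-Rating": [5, 0, 7, 8, 0, 9],
})

min_ratings = 2  # hypothetical threshold for the toy data

counts = toy_ratings["User-ID"].value_counts()
active_users = counts[counts > min_ratings].index  # strictly more than the threshold
filtered = toy_ratings[toy_ratings["User-ID"].isin(active_users)]
print(filtered.shape)
```

Because the comparison is a strict `>`, user 3 with exactly two ratings is dropped along with user 2.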

We have three CSV tables, but do we need all of them separately? Some columns are shared, so let's see which tables to merge.

In [37]:
books.head(2)

Unnamed: 0,ISBN,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-L
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...


In [36]:
users.head(2)

Unnamed: 0,User-ID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0


In [35]:
ratings.head(2)

Unnamed: 0,User-ID,ISBN,Book-Rating
1456,277427,002542730X,10
1457,277427,0026217457,0


In [38]:
ratings_with_books = ratings.merge(books, on='ISBN')

In [39]:
ratings_with_books.head(5)

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-L
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,http://images.amazon.com/images/P/002542730X.0...
1,277427,0026217457,0,Vegetarian Times Complete Cookbook,Lucy Moll,1995,John Wiley &amp; Sons,http://images.amazon.com/images/P/0026217457.0...
2,277427,003008685X,8,Pioneers,James Fenimore Cooper,1974,Thomson Learning,http://images.amazon.com/images/P/003008685X.0...
3,277427,0030615321,0,"Ask for May, Settle for June (A Doonesbury book)",G. B. Trudeau,1982,Henry Holt &amp; Co,http://images.amazon.com/images/P/0030615321.0...
4,277427,0060002050,0,On a Wicked Dawn (Cynster Novels),Stephanie Laurens,2002,Avon Books,http://images.amazon.com/images/P/0060002050.0...


In [40]:
ratings_with_books.shape

(487671, 8)

The merge is an inner join on 'ISBN', so it keeps only ratings whose ISBN also appears in books. That is why the row count dropped from 526,356 to 487,671: ratings with no matching book were discarded.
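
To check how many ratings a merge like this discards, one option is a left join with `indicator=True`, which labels every row as matched or unmatched. The sketch below uses hypothetical toy tables rather than the real `ratings` and `books`.

```python
import pandas as pd

# Hypothetical toy tables: ISBN "x" has a rating but no entry in the books table.
toy_ratings = pd.DataFrame({"ISBN": ["a", "b", "x"], "Book-Rating": [5, 0, 7]})
toy_books = pd.DataFrame({"ISBN": ["a", "b", "c"], "Book-Title": ["A", "B", "C"]})

inner = toy_ratings.merge(toy_books, on="ISBN")  # default how="inner"

# A left join with indicator=True adds a `_merge` column describing each row.
checked = toy_ratings.merge(toy_books, on="ISBN", how="left", indicator=True)
unmatched = int((checked["_merge"] == "left_only").sum())
print(len(toy_ratings), len(inner), unmatched)
```

The number of `left_only` rows equals the difference between the original and the inner-joined row counts when ISBNs are unique in the books table.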

In [46]:
num_rating = ratings_with_books.groupby('Book-Title')['Book-Rating'].count().reset_index()

In [48]:
num_rating.head(5)

Unnamed: 0,Book-Title,Book-Rating
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1
2,Apple Magic (The Collector's series),1
3,Beyond IBM: Leadership Marketing and Finance ...,1
4,Clifford Visita El Hospital (Clifford El Gran...,1


In [49]:
num_rating['Book-Rating'].value_counts()

Book-Rating
1      93585
2      27133
3      12476
4       6847
5       4340
       ...  
169        1
179        1
224        1
107        1
363        1
Name: count, Length: 179, dtype: int64

Some books have hundreds of ratings (for example 363), while 93,585 titles have only a single rating, which will not help when training the model.

In [50]:
num_rating.rename(columns={'Book-Rating':'Num-Of_Rating'},inplace=True)

In [52]:
num_rating.head(2)

Unnamed: 0,Book-Title,Num-Of_Rating
0,A Light in the Storm: The Civil War Diary of ...,2
1,Always Have Popsicles,1


Now we will merge this with ratings_with_books to filter out the useful ones only

In [53]:
final_rating = ratings_with_books.merge(num_rating, on='Book-Title')

In [54]:
final_rating.head(5)

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-L,Num-Of_Rating
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,http://images.amazon.com/images/P/002542730X.0...,82
1,277427,0026217457,0,Vegetarian Times Complete Cookbook,Lucy Moll,1995,John Wiley &amp; Sons,http://images.amazon.com/images/P/0026217457.0...,7
2,277427,003008685X,8,Pioneers,James Fenimore Cooper,1974,Thomson Learning,http://images.amazon.com/images/P/003008685X.0...,1
3,277427,0030615321,0,"Ask for May, Settle for June (A Doonesbury book)",G. B. Trudeau,1982,Henry Holt &amp; Co,http://images.amazon.com/images/P/0030615321.0...,1
4,277427,0060002050,0,On a Wicked Dawn (Cynster Novels),Stephanie Laurens,2002,Avon Books,http://images.amazon.com/images/P/0060002050.0...,13


In [55]:
# Keep only the books that received at least 50 ratings

final_rating = final_rating[final_rating['Num-Of_Rating'] >= 50]
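
As an aside, the count-then-merge-then-filter sequence can also be written with `groupby(...).transform("count")`, which attaches each title's rating count to its rows in one step. This is only a sketch on hypothetical toy data, with a made-up threshold `min_count` in place of 50.

```python
import pandas as pd

# Hypothetical toy data: title "A" has three ratings, "B" and "C" one each.
toy = pd.DataFrame({
    "User-ID": [1, 2, 3, 1, 2],
    "Book-Title": ["A", "A", "A", "B", "C"],
    "Book-Rating": [5, 0, 7, 8, 9],
})

min_count = 2  # hypothetical threshold for the toy data

# transform("count") broadcasts each title's rating count back to its rows.
toy["Num-Of_Rating"] = toy.groupby("Book-Title")["Book-Rating"].transform("count")
popular = toy[toy["Num-Of_Rating"] >= min_count]
print(popular["Book-Title"].unique().tolist())
```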

In [56]:
final_rating.head(5)

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-L,Num-Of_Rating
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,http://images.amazon.com/images/P/002542730X.0...,82
13,277427,0060930535,0,The Poisonwood Bible: A Novel,Barbara Kingsolver,1999,Perennial,http://images.amazon.com/images/P/0060930535.0...,133
15,277427,0060934417,0,Bel Canto: A Novel,Ann Patchett,2002,Perennial,http://images.amazon.com/images/P/0060934417.0...,108
18,277427,0061009059,9,One for the Money (Stephanie Plum Novels (Pape...,Janet Evanovich,1995,HarperTorch,http://images.amazon.com/images/P/0061009059.0...,108
24,277427,006440188X,0,The Secret Garden,Frances Hodgson Burnett,1998,HarperTrophy,http://images.amazon.com/images/P/006440188X.0...,79


In [57]:
final_rating.info()

<class 'pandas.core.frame.DataFrame'>
Index: 61853 entries, 0 to 487619
Data columns (total 9 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   User-ID              61853 non-null  int64 
 1   ISBN                 61853 non-null  object
 2   Book-Rating          61853 non-null  int64 
 3   Book-Title           61853 non-null  object
 4   Book-Author          61853 non-null  object
 5   Year-Of-Publication  61853 non-null  object
 6   Publisher            61853 non-null  object
 7   Image-URL-L          61853 non-null  object
 8   Num-Of_Rating        61853 non-null  int64 
dtypes: int64(3), object(6)
memory usage: 4.7+ MB


In [60]:
final_rating.shape

(61853, 9)

In [61]:
final_rating.drop_duplicates(['User-ID','Book-Title'],inplace=True)

In [62]:
final_rating.shape

(59850, 9)

In [64]:
final_rating.head(10)

Unnamed: 0,User-ID,ISBN,Book-Rating,Book-Title,Book-Author,Year-Of-Publication,Publisher,Image-URL-L,Num-Of_Rating
0,277427,002542730X,10,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley &amp; Sons Inc,http://images.amazon.com/images/P/002542730X.0...,82
13,277427,0060930535,0,The Poisonwood Bible: A Novel,Barbara Kingsolver,1999,Perennial,http://images.amazon.com/images/P/0060930535.0...,133
15,277427,0060934417,0,Bel Canto: A Novel,Ann Patchett,2002,Perennial,http://images.amazon.com/images/P/0060934417.0...,108
18,277427,0061009059,9,One for the Money (Stephanie Plum Novels (Pape...,Janet Evanovich,1995,HarperTorch,http://images.amazon.com/images/P/0061009059.0...,108
24,277427,006440188X,0,The Secret Garden,Frances Hodgson Burnett,1998,HarperTrophy,http://images.amazon.com/images/P/006440188X.0...,79
27,277427,0140067477,0,The Tao of Pooh,Benjamin Hoff,1983,Penguin Books,http://images.amazon.com/images/P/0140067477.0...,77
32,277427,014029628X,0,Girl in Hyacinth Blue,Susan Vreeland,2000,Penguin Books,http://images.amazon.com/images/P/014029628X.0...,91
36,277427,014100018X,0,Chocolat,Joanne Harris,2000,Penguin Books,http://images.amazon.com/images/P/014100018X.0...,103
38,277427,0142001740,0,The Secret Life of Bees,Sue Monk Kidd,2003,Penguin Books,http://images.amazon.com/images/P/0142001740.0...,209
56,277427,0312966091,0,Three To Get Deadly : A Stephanie Plum Novel (...,Janet Evanovich,1998,St. Martin's Paperbacks,http://images.amazon.com/images/P/0312966091.0...,105


We will build a pivot table of books against users so that similar rating patterns can be found for the collaborative recommendation system.

## Pivot Table

In [65]:
book_pivot = final_rating.pivot_table(columns='User-ID', index='Book-Title', values= 'Book-Rating')

In [66]:
book_pivot

Book-Title / User-ID,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,274004,274061,274301,274308,274808,275970,277427,277478,277639,278418
1984,9.0,,,,,,,,,,...,,,,,,0.0,,,,
1st to Die: A Novel,,,,,,,,,,,...,,,,,,,,,,
2nd Chance,,10.0,,,,,,,,,...,,,,0.0,,,,,0.0,
4 Blondes,,,,,,,,,,0.0,...,,,,,,,,,,
84 Charing Cross Road,,,,,,,,,,,...,,,,,,10.0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,,,,7.0,,,,,7.0,,...,,,,,,0.0,,,,
You Belong To Me,,,,,,,,,,,...,,,,,,,,,,
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,,,,,0.0,,,,,0.0,...,,,,,,0.0,,,,
Zoya,,,,,,,,,,,...,,,,,,,,,,


In [67]:
book_pivot.shape

(742, 888)

We replace NaN with 0 so that every cell is numeric. Note that this makes an unrated book indistinguishable from a rating of 0.

In [68]:
book_pivot.fillna(0, inplace=True)
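
The pivot and the fill can also be done in a single call through the `fill_value` argument of `pivot_table`. The sketch below uses hypothetical toy data and shows that, after filling, a real rating of 0 and a missing rating look identical.

```python
import pandas as pd

# Hypothetical toy ratings: user 2 gave book "A" a 0, user 3 never rated "A".
toy = pd.DataFrame({
    "User-ID": [1, 2, 1, 3],
    "Book-Title": ["A", "A", "B", "B"],
    "Book-Rating": [9, 0, 7, 10],
})

toy_pivot = toy.pivot_table(index="Book-Title", columns="User-ID",
                            values="Book-Rating", fill_value=0)
print(toy_pivot)
```

Note that `pivot_table` aggregates duplicate (title, user) pairs with the mean by default, so dropping duplicates first keeps each cell a single rating.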

In [69]:
book_pivot

Book-Title / User-ID,254,2276,2766,2977,3363,3757,4017,4385,6242,6251,...,274004,274061,274301,274308,274808,275970,277427,277478,277639,278418
1984,9.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1st to Die: A Novel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2nd Chance,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4 Blondes,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
84 Charing Cross Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,10.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Year of Wonders,0.0,0.0,0.0,7.0,0.0,0.0,0.0,0.0,7.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
You Belong To Me,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zen and the Art of Motorcycle Maintenance: An Inquiry into Values,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Zoya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Most cells are now 0 and only the non-zero ratings carry information, so we store the table with 'csr_matrix', which keeps only the non-zero entries.

## Training Model

In [72]:
# I will convert the book_pivot to csr_matrix
book_sparse = csr_matrix(book_pivot)

In [75]:
book_sparse

<742x888 sparse matrix of type '<class 'numpy.float64'>'
	with 14961 stored elements in Compressed Sparse Row format>

The matrix has 742 x 888 = 658,896 cells, but only 14,961 of them (about 2.3%) are stored as non-zero elements. All the others are 0.
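
The cell count and the share of stored elements can be read off the sparse matrix itself rather than worked out by hand. A minimal sketch on a hypothetical 3 x 4 matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy matrix with three non-zero ratings.
dense = np.array([
    [9.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 7.0, 0.0],
    [0.0, 10.0, 0.0, 0.0],
])
sparse = csr_matrix(dense)

total_cells = sparse.shape[0] * sparse.shape[1]
density = sparse.nnz / total_cells  # nnz is the number of stored elements
print(total_cells, sparse.nnz, round(density, 3))
```

Running the same two lines on `book_sparse` would report the total number of cells and the fraction that is actually stored.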

In [77]:
model = NearestNeighbors(algorithm= 'brute')

In [78]:
model.fit(book_sparse)

In [86]:
distance, suggestion = model.kneighbors(book_pivot.iloc[240,:].values.reshape(1,-1), n_neighbors=6 )

In [87]:
distance

array([[ 0.        , 62.08059278, 68.05145112, 71.49125821, 72.0624729 ,
        74.16198487]])

In [88]:
suggestion

array([[240, 238, 237, 241, 239, 688]])

In [89]:
book_pivot.iloc[237,:]

User-ID
254       9.0
2276      0.0
2766      0.0
2977      0.0
3363      0.0
         ... 
275970    9.0
277427    0.0
277478    0.0
277639    0.0
278418    0.0
Name: Harry Potter and the Chamber of Secrets (Book 2), Length: 888, dtype: float64

In [90]:
for i in range(len(suggestion)):
    print(book_pivot.index[suggestion[i]])

Index(['Harry Potter and the Prisoner of Azkaban (Book 3)',
       'Harry Potter and the Goblet of Fire (Book 4)',
       'Harry Potter and the Chamber of Secrets (Book 2)',
       'Harry Potter and the Sorcerer's Stone (Book 1)',
       'Harry Potter and the Order of the Phoenix (Book 5)', 'Tough Cookie'],
      dtype='object', name='Book-Title')


For one Harry Potter book, most of the suggestions are other Harry Potter books. That means our model is working.
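
The distances returned above are Euclidean, because `NearestNeighbors` defaults to the Minkowski metric with `p=2`. Euclidean distance is sensitive to how high the scores are, while cosine distance compares only the pattern of who rated what. The sketch below contrasts the two on a hypothetical toy matrix; switching the notebook's model to `metric="cosine"` is a possible variation, not something the original code does.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy matrix: rows are books, columns are users.
toy = np.array([
    [10.0, 9.0, 0.0, 0.0],   # book 0
    [5.0, 4.0, 0.0, 0.0],    # book 1: same readers as book 0, lower scores
    [0.0, 0.0, 8.0, 9.0],    # book 2: different readers
    [9.0, 10.0, 1.0, 0.0],   # book 3: same readers, similar scores
])
toy_sparse = csr_matrix(toy)

cosine_knn = NearestNeighbors(metric="cosine", algorithm="brute").fit(toy_sparse)
euclid_knn = NearestNeighbors(algorithm="brute").fit(toy_sparse)  # default metric

_, cos_idx = cosine_knn.kneighbors(toy_sparse[0], n_neighbors=3)
_, euc_idx = euclid_knn.kneighbors(toy_sparse[0], n_neighbors=3)

cosine_neighbours = [int(i) for i in cos_idx[0]]
euclid_neighbours = [int(i) for i in euc_idx[0]]
print(cosine_neighbours, euclid_neighbours)
```

Both metrics return book 0 itself first, but they disagree on the order of books 1 and 3, which is the kind of difference worth checking on the real data before choosing a metric.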

In [92]:
book_names = book_pivot.index
book_names

Index(['1984', '1st to Die: A Novel', '2nd Chance', '4 Blondes',
       '84 Charing Cross Road', 'A Bend in the Road', 'A Case of Need',
       'A Child Called \It\": One Child's Courage to Survive"',
       'A Civil Action', 'A Cry In The Night',
       ...
       'Winter Solstice', 'Wish You Well', 'Without Remorse',
       'Wizard and Glass (The Dark Tower, Book 4)', 'Wuthering Heights',
       'Year of Wonders', 'You Belong To Me',
       'Zen and the Art of Motorcycle Maintenance: An Inquiry into Values',
       'Zoya', '\O\" Is for Outlaw"'],
      dtype='object', name='Book-Title', length=742)

## Find URL

In [93]:
ids = np.where(final_rating['Book-Title'] == "Harry Potter and the Chamber of Secrets (Book 2)")[0][0]

In [95]:
final_rating.iloc[ids]['Image-URL-L']

'http://images.amazon.com/images/P/0439064872.01.LZZZZZZZ.jpg'

In [96]:
book_name = []
for book_id in suggestion:
    book_name.append(book_pivot.index[book_id])

In [99]:
book_name

[Index(['Harry Potter and the Prisoner of Azkaban (Book 3)',
        'Harry Potter and the Goblet of Fire (Book 4)',
        'Harry Potter and the Chamber of Secrets (Book 2)',
        'Harry Potter and the Sorcerer's Stone (Book 1)',
        'Harry Potter and the Order of the Phoenix (Book 5)', 'Tough Cookie'],
       dtype='object', name='Book-Title')]

In [101]:
ids_index = []
for name in book_name[0]: 
    ids = np.where(final_rating['Book-Title'] == name)[0][0]
    ids_index.append(ids)

In [103]:
for idx in ids_index:
    url = final_rating.iloc[idx]['Image-URL-L']
    print(url)

http://images.amazon.com/images/P/0439136369.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/0439139597.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/0439064872.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/043936213X.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/043935806X.01.LZZZZZZZ.jpg
http://images.amazon.com/images/P/0553578308.01.LZZZZZZZ.jpg
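
The two loops above can be collapsed into a single lookup by building a title-to-URL Series once. This is a sketch on hypothetical toy data (the `example.com` URLs are placeholders), not the notebook's own approach.

```python
import pandas as pd

# Hypothetical stand-in for `final_rating`, with one repeated title.
toy_final = pd.DataFrame({
    "Book-Title": ["A", "A", "B", "C"],
    "Image-URL-L": ["http://example.com/a.jpg", "http://example.com/a.jpg",
                    "http://example.com/b.jpg", "http://example.com/c.jpg"],
})

# One row per title, indexed by title, so a list of titles maps straight to URLs.
url_by_title = (toy_final.drop_duplicates("Book-Title")
                         .set_index("Book-Title")["Image-URL-L"])

suggested_titles = ["C", "A"]
urls = url_by_title.loc[suggested_titles].tolist()
print(urls)
```

Like the `np.where(...)[0][0]` version, this takes the URL from the first row found for each title.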


In [106]:
# Save the fitted model and supporting objects, closing each file after writing
artifacts = {'model': model, 'book_names': book_names, 'final_rating': final_rating, 'book_pivot': book_pivot}
for name, obj in artifacts.items():
    with open(f'Models/{name}.pkl', 'wb') as f:
        pickle.dump(obj, f)
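
The saved files can later be read back with `pickle.load`. The round trip below is a sketch that writes a small hypothetical index to a temporary directory, so it does not touch the real `Models/` folder.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical stand-in for `book_names`.
toy_names = pd.Index(["1984", "Zoya"], name="Book-Title")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "book_names.pkl")  # demo location only
    with open(path, "wb") as f:
        pickle.dump(toy_names, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored.equals(toy_names))
```

Pickle files should only be loaded from sources you trust, and a pickled scikit-learn model is best loaded with the same library version that saved it.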

## Testing Model

In [107]:
def recommend_book(book_name):
    # Locate the row of the searched title in the pivot table
    book_id = np.where(book_pivot.index == book_name)[0][0]
    # Ask for 6 neighbours: the searched book itself plus 5 suggestions
    _, suggestion = model.kneighbors(book_pivot.iloc[book_id, :].values.reshape(1, -1), n_neighbors=6)

    print(f"You searched '{book_name}'\n")
    print("The suggestion books are: \n")
    for title in book_pivot.index[suggestion[0]]:
        if title != book_name:
            print(title)
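
The function above prints its results and raises an `IndexError` when the title is not in the pivot table. A possible variation, sketched here on a hypothetical toy pivot with a made-up helper name `suggest_titles`, returns a list instead and handles unknown titles explicitly.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical toy pivot: rows are book titles, columns are users.
toy_pivot = pd.DataFrame(
    [[10.0, 9.0, 0.0], [9.0, 10.0, 0.0], [0.0, 0.0, 8.0], [8.0, 9.0, 1.0]],
    index=pd.Index(["A", "B", "C", "D"], name="Book-Title"),
)
toy_model = NearestNeighbors(algorithm="brute").fit(toy_pivot.values)

def suggest_titles(title, pivot, knn, n_suggestions=2):
    """Return up to `n_suggestions` neighbouring titles, or [] for an unknown title."""
    if title not in pivot.index:
        return []
    row = pivot.loc[[title]].values  # shape (1, n_users)
    _, idx = knn.kneighbors(row, n_neighbors=n_suggestions + 1)
    # Drop the searched title itself, which is normally the nearest neighbour.
    return [t for t in pivot.index[idx[0]] if t != title][:n_suggestions]

known = suggest_titles("A", toy_pivot, toy_model)
unknown = suggest_titles("Not in the table", toy_pivot, toy_model)
print(known, unknown)
```

Returning a list makes the result easier to reuse, for example to look up cover image URLs for each suggested title.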

In [108]:
book_name = "Harry Potter and the Chamber of Secrets (Book 2)"
recommend_book(book_name)

You searched 'Harry Potter and the Chamber of Secrets (Book 2)'

The suggestion books are: 

Harry Potter and the Goblet of Fire (Book 4)
Harry Potter and the Prisoner of Azkaban (Book 3)
Harry Potter and the Sorcerer's Stone (Book 1)
Exclusive
The Cradle Will Fall


In [121]:
book_name = book_names[50]
book_name

'Angels'

In [122]:
recommend_book(book_name)

You searched 'Angels'

The suggestion books are: 

Exclusive
No Safe Place
Long After Midnight
Lake Wobegon days
Pleading Guilty
