**About Book Crossing Dataset**<br>

This dataset has been compiled by Cai-Nicolas Ziegler in 2004, and it comprises of three tables for users, books and ratings. Explicit ratings are expressed on a scale from 1-10 (higher values denoting higher appreciation) and implicit rating is expressed by 0.

Reference: http://www2.informatik.uni-freiburg.de/~cziegler/BX/ 

**Objective**

This project entails building a Book Recommender System for users based on user-based and item-based collaborative filtering approaches.

#### Execute the below cell to load the datasets

In [0]:
import io
import pandas as pd
import numpy as np

In [683]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
path1 = "/content/drive/My Drive/Residency 5 -External Lab/books.csv"
path2 = "/content/drive/My Drive/Residency 5 -External Lab/ratings.csv"
path3 = "/content/drive/My Drive/Residency 5 -External Lab/users.csv"


In [685]:
#Loading data
books1 = pd.read_csv(path1, sep=";", error_bad_lines=False, encoding="latin-1")
books1.columns = ['ISBN', 'bookTitle', 'bookAuthor', 'yearOfPublication', 'publisher', 'imageUrlS', 'imageUrlM', 'imageUrlL']

b'Skipping line 6452: expected 8 fields, saw 9\nSkipping line 43667: expected 8 fields, saw 10\nSkipping line 51751: expected 8 fields, saw 9\n'
b'Skipping line 92038: expected 8 fields, saw 9\nSkipping line 104319: expected 8 fields, saw 9\nSkipping line 121768: expected 8 fields, saw 9\n'
b'Skipping line 144058: expected 8 fields, saw 9\nSkipping line 150789: expected 8 fields, saw 9\nSkipping line 157128: expected 8 fields, saw 9\nSkipping line 180189: expected 8 fields, saw 9\nSkipping line 185738: expected 8 fields, saw 9\n'
b'Skipping line 209388: expected 8 fields, saw 9\nSkipping line 220626: expected 8 fields, saw 9\nSkipping line 227933: expected 8 fields, saw 11\nSkipping line 228957: expected 8 fields, saw 10\nSkipping line 245933: expected 8 fields, saw 9\nSkipping line 251296: expected 8 fields, saw 9\nSkipping line 259941: expected 8 fields, saw 9\nSkipping line 261529: expected 8 fields, saw 9\n'
  interactivity=interactivity, compiler=compiler, result=result)


In [0]:

#Loading data
users1 = pd.read_csv(path3, sep=";", error_bad_lines=False, encoding="latin-1")
users1.columns = ['userID', 'Location', 'Age']


In [0]:
ratings1 = pd.read_csv(path2, sep=";", error_bad_lines=False, encoding="latin-1")
ratings1.columns = ['userID', 'ISBN', 'bookRating']

In [0]:
books = books1.copy(deep=True)

In [0]:
users = users1.copy(deep=True)

In [0]:
ratings = ratings1.copy(deep=True)

### Check no.of records and features given in each dataset

In [460]:
print(books.shape)

(271360, 8)


In [461]:
print(ratings.shape)

(1149780, 3)


In [462]:
print(users.shape)

(278858, 3)


## Exploring books dataset

In [691]:
books.head()

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


### Drop last three columns containing image URLs which will not be required for analysis

In [0]:
books.drop(columns=['imageUrlS','imageUrlM','imageUrlL'],inplace=True)

In [693]:
books.head()

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company


**yearOfPublication**

### Check unique values of yearOfPublication


In [694]:
books['yearOfPublication'].unique()

array([2002, 2001, 1991, 1999, 2000, 1993, 1996, 1988, 2004, 1998, 1994,
       2003, 1997, 1983, 1979, 1995, 1982, 1985, 1992, 1986, 1978, 1980,
       1952, 1987, 1990, 1981, 1989, 1984, 0, 1968, 1961, 1958, 1974,
       1976, 1971, 1977, 1975, 1965, 1941, 1970, 1962, 1973, 1972, 1960,
       1966, 1920, 1956, 1959, 1953, 1951, 1942, 1963, 1964, 1969, 1954,
       1950, 1967, 2005, 1957, 1940, 1937, 1955, 1946, 1936, 1930, 2011,
       1925, 1948, 1943, 1947, 1945, 1923, 2020, 1939, 1926, 1938, 2030,
       1911, 1904, 1949, 1932, 1928, 1929, 1927, 1931, 1914, 2050, 1934,
       1910, 1933, 1902, 1924, 1921, 1900, 2038, 2026, 1944, 1917, 1901,
       2010, 1908, 1906, 1935, 1806, 2021, '2000', '1995', '1999', '2004',
       '2003', '1990', '1994', '1986', '1989', '2002', '1981', '1993',
       '1983', '1982', '1976', '1991', '1977', '1998', '1992', '1996',
       '0', '1997', '2001', '1974', '1968', '1987', '1984', '1988',
       '1963', '1956', '1970', '1985', '1978', '1973', '1980'

As it can be seen from above that there are some incorrect entries in this field. It looks like Publisher names 'DK Publishing Inc' and 'Gallimard' have been incorrectly loaded as yearOfPublication in dataset due to some errors in csv file.


Also some of the entries are strings and same years have been entered as numbers in some places. We will try to fix these things in the coming questions.

### Check the rows having 'DK Publishing Inc' as yearOfPublication

In [695]:
books[(books['yearOfPublication']=='Gallimard') ^ (books['yearOfPublication']=='DK Publishing Inc')]

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
209538,078946697X,"DK Readers: Creating the X-Men, How It All Beg...",2000,DK Publishing Inc,http://images.amazon.com/images/P/078946697X.0...
220731,2070426769,"Peuple du ciel, suivi de 'Les Bergers\"";Jean-M...",2003,Gallimard,http://images.amazon.com/images/P/2070426769.0...
221678,0789466953,"DK Readers: Creating the X-Men, How Comic Book...",2000,DK Publishing Inc,http://images.amazon.com/images/P/0789466953.0...


### Drop the rows having `'DK Publishing Inc'` and `'Gallimard'` as `yearOfPublication`

In [0]:

books.drop([books.index[209538] , books.index[220731], books.index[221678]],inplace=True)


In [697]:
## Checking the dropped records 

books[(books['yearOfPublication']=='Gallimard') ^ (books['yearOfPublication']=='DK Publishing Inc')]

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher


In [698]:
print(books1.shape)
print(books.shape)

(271360, 8)
(271357, 5)


### Change the datatype of yearOfPublication to 'int'

In [699]:
books.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 271357 entries, 0 to 271359
Data columns (total 5 columns):
ISBN                 271357 non-null object
bookTitle            271357 non-null object
bookAuthor           271356 non-null object
yearOfPublication    271357 non-null object
publisher            271355 non-null object
dtypes: object(5)
memory usage: 12.4+ MB


In [0]:
books['yearOfPublication'] = books['yearOfPublication'].astype(np.int32)

In [701]:
books.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 271357 entries, 0 to 271359
Data columns (total 5 columns):
ISBN                 271357 non-null object
bookTitle            271357 non-null object
bookAuthor           271356 non-null object
yearOfPublication    271357 non-null int32
publisher            271355 non-null object
dtypes: int32(1), object(4)
memory usage: 11.4+ MB


In [702]:
books.dtypes

ISBN                 object
bookTitle            object
bookAuthor           object
yearOfPublication     int32
publisher            object
dtype: object

In [703]:
books['yearOfPublication'].unique()

array([2002, 2001, 1991, 1999, 2000, 1993, 1996, 1988, 2004, 1998, 1994,
       2003, 1997, 1983, 1979, 1995, 1982, 1985, 1992, 1986, 1978, 1980,
       1952, 1987, 1990, 1981, 1989, 1984,    0, 1968, 1961, 1958, 1974,
       1976, 1971, 1977, 1975, 1965, 1941, 1970, 1962, 1973, 1972, 1960,
       1966, 1920, 1956, 1959, 1953, 1951, 1942, 1963, 1964, 1969, 1954,
       1950, 1967, 2005, 1957, 1940, 1937, 1955, 1946, 1936, 1930, 2011,
       1925, 1948, 1943, 1947, 1945, 1923, 2020, 1939, 1926, 1938, 2030,
       1911, 1904, 1949, 1932, 1928, 1929, 1927, 1931, 1914, 2050, 1934,
       1910, 1933, 1902, 1924, 1921, 1900, 2038, 2026, 1944, 1917, 1901,
       2010, 1908, 1906, 1935, 1806, 2021, 2012, 2006, 1909, 2008, 1378,
       1919, 1922, 1897, 2024, 1376, 2037])

### Drop NaNs in `'publisher'` column


In [704]:
books['publisher'].unique()

array(['Oxford University Press', 'HarperFlamingo Canada',
       'HarperPerennial', ..., 'Tempo', 'Life Works Books', 'Connaught'],
      dtype=object)

In [0]:
books.dropna(subset=['publisher'],inplace=True)

In [706]:
## Checking the dropped records 

books[(books['publisher']==np.NaN) ]

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher


In [707]:
print(books1.shape)
print(books.shape)

(271360, 8)
(271355, 5)


## Exploring Users dataset

In [708]:
print(users.shape)
print(users1.shape)
users.head()

(278858, 3)
(278858, 3)


Unnamed: 0,userID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


### Get all unique values in ascending order for column `Age`

In [709]:
pd.DataFrame({'Age':users['Age'].unique()}).sort_values(by='Age',ascending=True)

Unnamed: 0,Age
47,0.0
57,1.0
79,2.0
72,3.0
84,4.0
90,5.0
93,6.0
83,7.0
82,8.0
64,9.0


Age column has some invalid entries like nan, 0 and very high values like 100 and above

### Values below 5 and above 90 do not make much sense for our book rating case...hence replace these by NaNs

In [710]:
users.head()

Unnamed: 0,userID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


In [0]:
users['Age'].replace(users[(users['Age']<5)^(users['Age']>90)]['Age'],np.NaN,inplace=True)

In [712]:
users[(users['Age']<5)^(users['Age']>90)]['Age']

Series([], Name: Age, dtype: float64)

### Replace null values in column `Age` with mean

In [713]:
users['Age'].head()

0     NaN
1    18.0
2     NaN
3    17.0
4     NaN
Name: Age, dtype: float64

In [714]:
users['Age'].mean()

34.72384041634689

In [0]:
users['Age'].replace(np.NaN,users['Age'].mean(),inplace=True)

In [716]:
users['Age'].head()

0    34.72384
1    18.00000
2    34.72384
3    17.00000
4    34.72384
Name: Age, dtype: float64

### Change the datatype of `Age` to `int`

In [0]:
users['Age'] =users['Age'].astype(np.int64)

In [718]:
users['Age'].dtypes

dtype('int64')

In [719]:
print(sorted(users.Age.unique()))

[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]


## Exploring the Ratings Dataset

### check the shape

In [720]:
print(ratings.shape)
print(ratings1.shape)
ratings.head()

(1149780, 3)
(1149780, 3)


Unnamed: 0,userID,ISBN,bookRating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [0]:
n_users = users.shape[0]
n_books = books.shape[0]

In [722]:
print(n_users,n_books)

278858 271355


In [723]:
ratings.head(5)

Unnamed: 0,userID,ISBN,bookRating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


### Ratings dataset should have books only which exist in our books dataset. Drop the remaining rows

In [724]:

ratings.head()

Unnamed: 0,userID,ISBN,bookRating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


In [725]:
books.head()

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton &amp; Company


In [726]:
users.head()

Unnamed: 0,userID,Location,Age
0,1,"nyc, new york, usa",34
1,2,"stockton, california, usa",18
2,3,"moscow, yukon territory, russia",34
3,4,"porto, v.n.gaia, portugal",17
4,5,"farnborough, hants, united kingdom",34


In [0]:
ratings_books = pd.merge(ratings,books,on='ISBN',how='inner')

In [728]:
print(books.shape)
print(ratings.shape)
print(ratings_books.shape)

(271355, 5)
(1149780, 3)
(1031130, 7)


In [729]:
ratings_books.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books
1,2313,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books
2,6543,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books
3,8680,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books
4,10314,034545104X,9,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books


### Ratings dataset should have ratings from users which exist in users dataset. Drop the remaining rows

In [0]:
ratings_books_users_df = pd.merge(ratings_books,users,on='userID',how='inner')

In [731]:
ratings_books_users_df.shape

(1031130, 9)

In [732]:
print(books.shape)
print(ratings.shape)
print(ratings_books.shape)

(271355, 5)
(1149780, 3)
(1031130, 7)


In [733]:
ratings_books_users_df.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
0,276725,034545104X,0,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,"tyler, texas, usa",34
1,2313,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,"cincinnati, ohio, usa",23
2,2313,0812533550,9,Ender's Game (Ender Wiggins Saga (Paperback)),Orson Scott Card,1986,Tor Books,"cincinnati, ohio, usa",23
3,2313,0679745580,8,In Cold Blood (Vintage International),TRUMAN CAPOTE,1994,Vintage,"cincinnati, ohio, usa",23
4,2313,0060173289,9,Divine Secrets of the Ya-Ya Sisterhood : A Novel,Rebecca Wells,1996,HarperCollins,"cincinnati, ohio, usa",23


In [0]:
ratings_books_users_df_cpy = ratings_books_users_df.copy(deep=True)

### Consider only ratings from 1-10 and leave 0s in column `bookRating`

In [0]:
ratings_books_users_df = ratings_books_users_df[ratings_books_users_df['bookRating']!=0]

In [736]:
ratings_books_users_df.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
1,2313,034545104X,5,Flesh Tones: A Novel,M. J. Rose,2002,Ballantine Books,"cincinnati, ohio, usa",23
2,2313,0812533550,9,Ender's Game (Ender Wiggins Saga (Paperback)),Orson Scott Card,1986,Tor Books,"cincinnati, ohio, usa",23
3,2313,0679745580,8,In Cold Blood (Vintage International),TRUMAN CAPOTE,1994,Vintage,"cincinnati, ohio, usa",23
4,2313,0060173289,9,Divine Secrets of the Ya-Ya Sisterhood : A Novel,Rebecca Wells,1996,HarperCollins,"cincinnati, ohio, usa",23
5,2313,0385482388,5,The Mistress of Spices,Chitra Banerjee Divakaruni,1998,Anchor Books/Doubleday,"cincinnati, ohio, usa",23


In [737]:
ratings_books_users_df_cpy[ratings_books_users_df_cpy['bookRating']==0]['bookRating'].count()

647291

In [738]:
print(ratings_books_users_df_cpy.shape)

(1031130, 9)


In [739]:
print(ratings_books_users_df.shape)

(383839, 9)


In [0]:
## 383839+647291 = 1031130

### Find out which rating has been given highest number of times

In [741]:
ratings_books_users_df.columns

Index(['userID', 'ISBN', 'bookRating', 'bookTitle', 'bookAuthor',
       'yearOfPublication', 'publisher', 'Location', 'Age'],
      dtype='object')

In [742]:
ratings_books_users_df.groupby('bookRating').count().sort_values(by='userID',ascending=False)

## rating 8 is given highest number of times - 91804

Unnamed: 0_level_0,userID,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
bookRating,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
8,91804,91804,91804,91803,91804,91804,91804,91804
10,71225,71225,71225,71225,71225,71225,71225,71225
7,66401,66401,66401,66401,66401,66401,66401,66401
9,60776,60776,60776,60776,60776,60776,60776,60776
5,45355,45355,45355,45355,45355,45355,45355,45355
6,31687,31687,31687,31687,31687,31687,31687,31687
4,7617,7617,7617,7617,7617,7617,7617,7617
3,5118,5118,5118,5118,5118,5118,5118,5118
2,2375,2375,2375,2375,2375,2375,2375,2375
1,1481,1481,1481,1481,1481,1481,1481,1481


### **Collaborative Filtering Based Recommendation Systems**

### For more accurate results only consider users who have rated atleast 100 books

In [743]:
ratings_books_users_df.columns

Index(['userID', 'ISBN', 'bookRating', 'bookTitle', 'bookAuthor',
       'yearOfPublication', 'publisher', 'Location', 'Age'],
      dtype='object')

In [0]:
user_grp = ratings_books_users_df.groupby(['userID']).count().sort_values(by=[ 'ISBN', 'bookRating', 'bookTitle', 'bookAuthor',
       'yearOfPublication', 'publisher', 'Location', 'Age'])

In [745]:
user_grp.head()

Unnamed: 0_level_0,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
userID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
9,1,1,1,1,1,1,1,1
12,1,1,1,1,1,1,1,1
16,1,1,1,1,1,1,1,1
19,1,1,1,1,1,1,1,1
22,1,1,1,1,1,1,1,1


In [0]:
userid = user_grp[user_grp['ISBN']>99].index

In [747]:
len(userid)

449

In [0]:
ratings_books_users100_df = ratings_books_users_df.loc[ratings_books_users_df['userID'].isin(userid)].copy(deep=True)

In [749]:
ratings_books_users100_df.shape

(103269, 9)

In [750]:
ratings_books_users_df.shape

(383839, 9)

In [751]:
ratings_books_users100_df.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
43,6543,446605484,10,Roses Are Red (Alex Cross Novels),James Patterson,2001,Warner Vision,"strafford, missouri, usa",34
47,6543,805062971,8,Fight Club,Chuck Palahniuk,1999,Owl Books,"strafford, missouri, usa",34
48,6543,345342968,8,Fahrenheit 451,RAY BRADBURY,1987,Del Rey,"strafford, missouri, usa",34
49,6543,446610038,9,1st to Die: A Novel,James Patterson,2002,Warner Vision,"strafford, missouri, usa",34
55,6543,61009059,8,One for the Money (Stephanie Plum Novels (Pape...,Janet Evanovich,1995,HarperTorch,"strafford, missouri, usa",34


### Generating ratings matrix from explicit ratings


#### Note: since NaNs cannot be handled by training algorithms, replace these by 0, which indicates absence of ratings

In [752]:
ratings_books_users100_df.isna().sum() 

userID               0
ISBN                 0
bookRating           0
bookTitle            0
bookAuthor           0
yearOfPublication    0
publisher            0
Location             0
Age                  0
dtype: int64

In [0]:
ratings_books_users100_df.fillna(0,inplace=True)

### Generate the predicted ratings using SVD with no.of singular values to be 50

In [754]:
pip install surprise



In [0]:
from collections import defaultdict
from surprise import SVD
from surprise import Dataset

In [0]:
from sklearn.model_selection import train_test_split

trainDF, tempDF = train_test_split(ratings_books_users100_df, test_size = 0.2, random_state = 100)

In [757]:
print(trainDF.shape, tempDF.shape)

(82615, 9) (20654, 9)


In [758]:
trainDF.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
425933,150979,0679460152,9,The Blackstone Chronicles,John Saul,1997,Random House,"greencastle, pennsylvania, usa",34
278665,60244,0393049566,7,Socrates Cafe: A Fresh Taste of Philosophy,Christopher Phillips,2001,W.W. Norton &amp; Company,"alvin, texas, usa",47
8536,98391,067104222X,9,Dangerous Dilemmas,Evelyn Palfrey,2001,Atria,"morrow, georgia, usa",52
145844,207782,0874060273,7,Barkley Come Home/26091236,Marilyn D Anderson,1985,Pages Publishing Group,"midland, texas, usa",28
264590,189835,1560549653,5,Letter from Peking,Pearl S. Buck,1992,Chivers Audio Books,"honolulu, hawaii, usa",34


In [759]:
tempDF.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
446631,197659,842304673,7,The Complete Book of Zingers,Croft M. Pentz,1990,Tyndale House Publishers,"indiana, pennsylvania, usa",49
135668,184299,345358791,8,2061: Odyssey Three,Arthur C. Clarke,1991,Del Rey Books,"omaha, nebraska, usa",31
84242,115003,1400031354,9,Tears of the Giraffe (No.1 Ladies Detective Ag...,Alexander McCall Smith,2002,Anchor,"asheville, north carolina, usa",43
193585,153662,380761319,10,The Shadow and the Star,Laura Kinsale,1991,Harpercollins,"ft. stewart, georgia, usa",44
103772,16795,394891139,6,The Glow-in-the-Dark Night Sky Book,Clint Hatchett,1988,Random House Trade,"mechanicsville, maryland, usa",47


In [0]:
testDF = tempDF.copy()

In [761]:
tempDF['bookRating'] = np.nan

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.


In [762]:
tempDF.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
446631,197659,842304673,,The Complete Book of Zingers,Croft M. Pentz,1990,Tyndale House Publishers,"indiana, pennsylvania, usa",49
135668,184299,345358791,,2061: Odyssey Three,Arthur C. Clarke,1991,Del Rey Books,"omaha, nebraska, usa",31
84242,115003,1400031354,,Tears of the Giraffe (No.1 Ladies Detective Ag...,Alexander McCall Smith,2002,Anchor,"asheville, north carolina, usa",43
193585,153662,380761319,,The Shadow and the Star,Laura Kinsale,1991,Harpercollins,"ft. stewart, georgia, usa",44
103772,16795,394891139,,The Glow-in-the-Dark Night Sky Book,Clint Hatchett,1988,Random House Trade,"mechanicsville, maryland, usa",47


In [0]:
testDF = testDF.dropna()

In [764]:
testDF.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
446631,197659,842304673,7,The Complete Book of Zingers,Croft M. Pentz,1990,Tyndale House Publishers,"indiana, pennsylvania, usa",49
135668,184299,345358791,8,2061: Odyssey Three,Arthur C. Clarke,1991,Del Rey Books,"omaha, nebraska, usa",31
84242,115003,1400031354,9,Tears of the Giraffe (No.1 Ladies Detective Ag...,Alexander McCall Smith,2002,Anchor,"asheville, north carolina, usa",43
193585,153662,380761319,10,The Shadow and the Star,Laura Kinsale,1991,Harpercollins,"ft. stewart, georgia, usa",44
103772,16795,394891139,6,The Glow-in-the-Dark Night Sky Book,Clint Hatchett,1988,Random House Trade,"mechanicsville, maryland, usa",47


In [0]:
rtings = pd.concat([trainDF, tempDF]).reset_index()

In [766]:
rtings.sample(10)

Unnamed: 0,index,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
27770,786233,123094,385299397,8.0,Childhood Rising: The Astrology of Your Mother...,Michael Lutin,1991,Bantam Dell Pub Group,"morrisville, north carolina, usa",45
4490,31861,11676,671748742,5.0,Left to Die,Dan Kurzman,1995,Pocket,"n/a, n/a, n/a",34
34607,66539,275970,61059072,9.0,The Last Continent (Discworld Novels (Paperback)),Terry Pratchett,2000,HarperTorch,"pittsburgh, pennsylvania, usa",46
97213,698388,212965,821734989,,Forbidden Ecstasy,Janelle Taylor,1991,Zebra Books,"akron,, ohio, usa",43
28525,602294,23902,385490992,6.0,The Street Lawyer,John Grisham,1998,Doubleday Books,"london, england, united kingdom",34
84980,193656,153662,380772574,,Enchanted,Elizabeth Lowell,1994,Avon,"ft. stewart, georgia, usa",44
42051,166948,204864,140186409,10.0,The Grapes of Wrath (20th Century Classics),John Steinbeck,1992,Penguin Books,"simi valley, california, usa",47
83841,197217,153662,877959765,,The intermarriage handbook: A guide for Jews &...,Judy Petsonk,1988,Arbor House,"ft. stewart, georgia, usa",44
82952,271427,7346,399147195,,P Is for Peril (Kinsey Millhone Mysteries (Har...,Sue Grafton,2001,Putnam Publishing Group,"sunnyvale, california, usa",49
51209,34790,11676,3596259924,10.0,Die Unertragliche Leichtigkeit des Seins...The...,Milan Kundera,1997,Distribooks Inc,"n/a, n/a, n/a",34


In [767]:
rtings.head()

Unnamed: 0,index,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
0,425933,150979,0679460152,9.0,The Blackstone Chronicles,John Saul,1997,Random House,"greencastle, pennsylvania, usa",34
1,278665,60244,0393049566,7.0,Socrates Cafe: A Fresh Taste of Philosophy,Christopher Phillips,2001,W.W. Norton &amp; Company,"alvin, texas, usa",47
2,8536,98391,067104222X,9.0,Dangerous Dilemmas,Evelyn Palfrey,2001,Atria,"morrow, georgia, usa",52
3,145844,207782,0874060273,7.0,Barkley Come Home/26091236,Marilyn D Anderson,1985,Pages Publishing Group,"midland, texas, usa",28
4,264590,189835,1560549653,5.0,Letter from Peking,Pearl S. Buck,1992,Chivers Audio Books,"honolulu, hawaii, usa",34


In [768]:
rtings['userID'].min()

2033

In [769]:
rtings = rtings.drop_duplicates()
rtings.shape

(103269, 10)

In [770]:
rtings[(rtings['userID']==2033)#&(rtings['ISBN']=='0393049566')
      ]

Unnamed: 0,index,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher,Location,Age
126,364083,2033,1891400495,10.0,A Simple Choice : A Practical Guide to Saving ...,Deborah Taylor-Hough,2000,Champion Press Ltd,"omaha, nebraska, usa",27
239,364038,2033,0882710583,8.0,Catholic Children's Bible,Mary Theola,1985,Regina Press Malhame &amp; Company,"omaha, nebraska, usa",27
571,363987,2033,0671025554,10.0,What's in a Name,Susan Osborn,1999,Pocket,"omaha, nebraska, usa",27
1424,363969,2033,0451458028,10.0,The Invisible Ring,Anne Bishop,2000,Roc,"omaha, nebraska, usa",27
1518,364004,2033,0716724022,7.0,Physical Chemistry,P. W. Atkins,1994,W.H. Freeman &amp; Company,"omaha, nebraska, usa",27
4143,364066,2033,0895779129,6.0,"Foods That Harm, Foods That Heal: An A - Z Gu...",Reader's Digest,1997,Readers Digest,"omaha, nebraska, usa",27
4310,363927,2033,0590353403,9.0,Harry Potter and the Sorcerer's Stone (Book 1),J. K. Rowling,1998,Scholastic,"omaha, nebraska, usa",27
6419,364010,2033,0786880007,8.0,Simplify Your Life : 100 Ways to Slow Down and...,Elaine St. James,1994,Hyperion,"omaha, nebraska, usa",27
7831,364052,2033,0886775639,8.0,"Winds of Change (The Mage Winds, Book 2)",Mercedes Lackey,1994,Daw Books,"omaha, nebraska, usa",27
8796,364063,2033,0886778603,10.0,The Children of Wrath (Renshai Chronicles),Mickey Zucker Reichert,1999,Daw Books,"omaha, nebraska, usa",27


In [0]:
R_df = rtings.pivot(index = 'userID', columns = 'ISBN', values = 'bookRating').fillna(0)

In [772]:
R_df.head()

ISBN,0000913154,0001046438,000104687X,0001047213,0001047973,000104799X,0001048082,0001053736,0001053744,0001055607,0001056107,0001845039,0001935968,0001944711,0001952803,0001953877,0002000547,0002005018,0002005050,0002005557,0002006588,0002115328,0002116286,0002118580,0002154900,0002158973,0002163713,0002176181,0002176432,0002179695,0002181924,0002184974,0002190915,0002197154,0002223929,0002228394,000223257X,0002233509,0002239183,0002240114,...,987960170X,9974643058,999058284X,9992003766,9992059958,9993584185,9994256963,9994348337,9997405137,9997406567,9997406990,999740923X,9997409728,9997411757,9997411870,9997412044,9997412958,9997507002,999750805X,9997508769,9997512952,9997519086,9997555635,9998914140,B00001U0CP,B00005TZWI,B00006CRTE,B00006I4OX,B00007FYKW,B00008RWPV,B000092Q0A,B00009EF82,B00009NDAN,B0000DYXID,B0000T6KHI,B0000VZEJQ,B0000X8HIE,B00013AX9E,B0001I1KOG,B000234N3A
userID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
2033,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2110,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2276,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4017,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [773]:
R_df.index

Int64Index([  2033,   2110,   2276,   4017,   4385,   5582,   6242,   6251,
              6543,   6575,
            ...
            269566, 270713, 271448, 271705, 273113, 274061, 274301, 275970,
            277427, 278418],
           dtype='int64', name='userID', length=449)

In [774]:
R_df.get_value(2033,'0451457781') 

  """Entry point for launching an IPython kernel.


8.0

In [775]:
R_df.shape

(449, 66572)

In [776]:
rtings['userID'].nunique()

449

In [777]:
rtings['ISBN'].nunique()

66572

In [0]:
from scipy.sparse.linalg import svds

In [0]:
U, sigma, Vt = svds(R_df, k = 50)

In [780]:
sigma

array([131.07954208, 132.44479902, 132.61470995, 133.96010817,
       134.94232624, 136.38117803, 137.0634911 , 138.04647807,
       140.45935247, 141.29908114, 142.26811037, 143.88305269,
       144.27243066, 144.93753168, 149.39109893, 149.62291223,
       149.94512384, 152.15710138, 152.98116567, 154.23600256,
       155.64958852, 156.98587955, 158.30450983, 161.41139495,
       164.36235669, 164.60938522, 166.22369888, 168.8872909 ,
       173.19509942, 174.99507662, 176.37245022, 178.41205733,
       180.20327794, 181.26833216, 184.19621481, 186.26397001,
       190.17666439, 194.12064112, 202.52424067, 206.23585733,
       210.1876945 , 219.80287636, 223.09823012, 232.70628393,
       237.36014895, 252.56483856, 257.35846413, 338.84909015,
       567.12180411, 605.76299262])

In [0]:
sigma = np.diag(sigma)

In [782]:
sigma

array([[131.07954208,   0.        ,   0.        , ...,   0.        ,
          0.        ,   0.        ],
       [  0.        , 132.44479902,   0.        , ...,   0.        ,
          0.        ,   0.        ],
       [  0.        ,   0.        , 132.61470995, ...,   0.        ,
          0.        ,   0.        ],
       ...,
       [  0.        ,   0.        ,   0.        , ..., 338.84909015,
          0.        ,   0.        ],
       [  0.        ,   0.        ,   0.        , ...,   0.        ,
        567.12180411,   0.        ],
       [  0.        ,   0.        ,   0.        , ...,   0.        ,
          0.        , 605.76299262]])

In [0]:
all_users_predicted_ratings = np.dot(np.dot(U, sigma), Vt)

In [0]:
preds_df = pd.DataFrame(all_users_predicted_ratings, columns = R_df.columns)

In [0]:
# preds_df.index.values


In [0]:
user_ids = R_df.index

In [0]:
pred_ids = preds_df.index.values

In [0]:
user_pred_map = dict(zip(user_ids,pred_ids))

In [0]:
################################## RECOMMENDATION ALGORITHM ##################################################################

In [0]:
def recommend_movies(userID,Recommendation_count):
  user_ids = R_df.index
  pred_ids = preds_df.index.values
  user_pred_map = dict(zip(user_ids,pred_ids))


  for user_ids, pred_row_number in user_pred_map.items():
    if user_ids == userID:
      pred_row_number
      # print(pred_row_number)
    
      user_pred_total = preds_df.loc[pred_row_number,:].sort_values(ascending=False)

      sorted_user_predictions = pd.DataFrame({'Predicted_Ratings':preds_df.loc[user_row_number].sort_values(ascending = False)})
      ## sorted_user_predictions.head()

      sorted_user_predictions.reset_index(inplace=True)
      ## sorted_user_predictions.head()

      ##  sorted_user_predictions.shape
      sorted_user_predictions_all = pd.merge(sorted_user_predictions,rtings,on='ISBN',how='inner')

      ##  sorted_user_predictions_all.shape
      ## Total_Predictions = sorted_user_predictions_all.shape[0]
      ## Total_Predictions

      Books_Rated = sorted_user_predictions_all[sorted_user_predictions_all['userID']==user_ids].dropna()

      Books_Not_Rated = sorted_user_predictions_all[((sorted_user_predictions_all['userID']==user_ids)&(sorted_user_predictions_all['bookRating'].isnull())^(sorted_user_predictions_all['userID']!=user_ids))]
      ##  Books_Not_Rated.shape[0]

      Books_Not_Rated_Unique = Books_Not_Rated[['Predicted_Ratings','bookTitle','bookAuthor', 'yearOfPublication','publisher']].drop_duplicates()

      ## Recommendation_count

      Recommendation_result = Books_Not_Rated_Unique.sort_values(by='Predicted_Ratings',ascending=False).head(Recommendation_count)

      print('UserID:- {0} , has already rated {1} books.'.format(user_ids, Books_Rated.shape[0]))
      print('Recommending the highest {0} predicted different ratings books not already rated by user {1}.'.format(Recommendation_count,user_ids))

      return(Recommendation_result)

In [853]:
np.array(sorted(rtings['userID'].unique()))

array([  2033,   2110,   2276,   4017,   4385,   5582,   6242,   6251,
         6543,   6575,   7286,   7346,   8067,   8245,   8681,   8890,
        10560,  11676,  11993,  12538,  12824,  12982,  13552,  13850,
        14422,  15408,  15418,  16634,  16795,  16966,  17950,  19085,
        21014,  23768,  23872,  23902,  25409,  25601,  25981,  26535,
        26544,  26583,  28591,  28634,  29259,  30276,  30511,  30711,
        30735,  30810,  31315,  31556,  31826,  32773,  33145,  35433,
        35836,  35857,  35859,  36299,  36554,  36606,  36609,  36836,
        36907,  37644,  37712,  37950,  38023,  38273,  38281,  39281,
        39467,  40889,  40943,  43246,  43910,  46398,  47316,  48025,
        48494,  49144,  49889,  51883,  52199,  52350,  52584,  52614,
        52917,  53220,  55187,  55490,  55492,  56271,  56399,  56447,
        56554,  56959,  59172,  60244,  60337,  60707,  63714,  63956,
        65258,  66942,  67840,  68555,  69078,  69389,  69697,  70415,
      

In [0]:
#x = recommend_movies(2033,12)
#x.index
#y = recommend_movies(278418,12)
#y.index

#print(x.index.intersection(y.index),len(x.index.intersection(y.index)))

In [864]:
recommend_movies(2110,12)

UserID:- 2110 , has already rated 85 books.
Recommending the highest 12 predicted different ratings books not already rated by user 2110.


Unnamed: 0,Predicted_Ratings,bookTitle,bookAuthor,yearOfPublication,publisher
0,0.285622,Purity in Death,J.D. Robb,2002,Berkley Publishing Group
15,0.253016,Face the Fire (Three Sisters Island Trilogy),Nora Roberts,2002,Jove Books
29,0.236891,Dance upon the Air (Three Sisters Island Trilogy),Nora Roberts,2003,Jove Books
48,0.212176,Jewels of the Sun (Irish Trilogy),Nora Roberts,2004,Jove Books
63,0.206198,Heart of the Sea (Irish Trilogy),Nora Roberts,2000,Jove Books
81,0.204699,The Lovely Bones: A Novel,Alice Sebold,2002,"Little, Brown"
152,0.196262,Tears of the Moon (Irish Trilogy),Nora Roberts,2000,Jove Books
169,0.188376,Witness in Death (Eve Dallas Mysteries (Paperb...,J. D. Robb,2004,Berkley Publishing Group
180,0.183306,Ceremony in Death (Eve Dallas Mysteries (Paper...,J. D. Robb,1997,Berkley Publishing Group
188,0.178672,Summer Pleasures,Nora Roberts,2002,Silhouette


In [859]:
recommend_movies(2033,12)

0
UserID:- 2033 , has already rated 98 books.
Recommending the highest 12 predicted ratings books not already rated by user 2033.


Unnamed: 0,Predicted_Ratings,bookTitle,bookAuthor,yearOfPublication,publisher
0,0.285622,Purity in Death,J.D. Robb,2002,Berkley Publishing Group
15,0.253016,Face the Fire (Three Sisters Island Trilogy),Nora Roberts,2002,Jove Books
29,0.236891,Dance upon the Air (Three Sisters Island Trilogy),Nora Roberts,2003,Jove Books
48,0.212176,Jewels of the Sun (Irish Trilogy),Nora Roberts,2004,Jove Books
63,0.206198,Heart of the Sea (Irish Trilogy),Nora Roberts,2000,Jove Books
81,0.204699,The Lovely Bones: A Novel,Alice Sebold,2002,"Little, Brown"
152,0.196262,Tears of the Moon (Irish Trilogy),Nora Roberts,2000,Jove Books
169,0.188376,Witness in Death (Eve Dallas Mysteries (Paperb...,J. D. Robb,2004,Berkley Publishing Group
180,0.183306,Ceremony in Death (Eve Dallas Mysteries (Paper...,J. D. Robb,1997,Berkley Publishing Group
188,0.178672,Summer Pleasures,Nora Roberts,2002,Silhouette


In [0]:
############################################  FINISH ###################################################################################

### Take a particular user_id

### Lets find the recommendations for user with id `2110`

#### Note: Execute the below cells to get the variables loaded

In [0]:
userID = 2110

In [0]:
user_id = 2 #2nd row in ratings matrix and predicted matrix

### Get the predicted ratings for userID `2110` and sort them in descending order

In [819]:
recommend_movies(2110,10)

UserID:- 2110 , has already rated 85 books.
Recommending the highest 10 predicted ratings books not already rated by user 2110.


Unnamed: 0,Predicted_Ratings,bookTitle,bookAuthor,yearOfPublication,publisher
0,0.285622,Purity in Death,J.D. Robb,2002,Berkley Publishing Group
15,0.253016,Face the Fire (Three Sisters Island Trilogy),Nora Roberts,2002,Jove Books
29,0.236891,Dance upon the Air (Three Sisters Island Trilogy),Nora Roberts,2003,Jove Books
48,0.212176,Jewels of the Sun (Irish Trilogy),Nora Roberts,2004,Jove Books
63,0.206198,Heart of the Sea (Irish Trilogy),Nora Roberts,2000,Jove Books
81,0.204699,The Lovely Bones: A Novel,Alice Sebold,2002,"Little, Brown"
152,0.196262,Tears of the Moon (Irish Trilogy),Nora Roberts,2000,Jove Books
169,0.188376,Witness in Death (Eve Dallas Mysteries (Paperb...,J. D. Robb,2004,Berkley Publishing Group
180,0.183306,Ceremony in Death (Eve Dallas Mysteries (Paper...,J. D. Robb,1997,Berkley Publishing Group
188,0.178672,Summer Pleasures,Nora Roberts,2002,Silhouette


### Create a dataframe with name `user_data` containing userID `2110` explicitly interacted books

In [0]:
user_data.head()

In [799]:
user_data.shape

(103, 10)

### Combine the user_data and and corresponding book data(`book_data`) in a single dataframe with name `user_full_info`

In [0]:
book_data.head()

In [0]:
user_full_info.head()

### Get top 10 recommendations for above given userID from the books not already rated by that user