**About Book Crossing Dataset**

This dataset was compiled by Cai-Nicolas Ziegler in 2004 and comprises three tables: users, books and ratings. Explicit ratings are expressed on a scale from 1 to 10 (higher values denoting higher appreciation), and an implicit rating is expressed by 0.

Reference: http://www2.informatik.uni-freiburg.de/~cziegler/BX/ 

**Objective**

This project entails building a Book Recommender System for users based on user-based and item-based collaborative filtering approaches.

#### Execute the below cell to load the datasets

In [1]:
#Import the pandas and numpy libraries
import pandas as pd
import numpy as np

In [2]:
#Load the data and assign column names to each dataframe
#on_bad_lines="warn" (pandas >= 1.3) skips malformed rows and reports them;
#it replaces error_bad_lines=False, which was removed in pandas 2.0
books = pd.read_csv("books.csv", sep=";", on_bad_lines="warn", encoding="latin-1")
books.columns = ['ISBN', 'bookTitle', 'bookAuthor', 'yearOfPublication', 'publisher', 'imageUrlS', 'imageUrlM', 'imageUrlL']

users = pd.read_csv('users.csv', sep=';', on_bad_lines="warn", encoding="latin-1")
users.columns = ['userID', 'Location', 'Age']

ratings = pd.read_csv('ratings.csv', sep=';', on_bad_lines="warn", encoding="latin-1")
ratings.columns = ['userID', 'ISBN', 'bookRating']

b'Skipping line 6452: expected 8 fields, saw 9\nSkipping line 43667: expected 8 fields, saw 10\nSkipping line 51751: expected 8 fields, saw 9\n'
b'Skipping line 92038: expected 8 fields, saw 9\nSkipping line 104319: expected 8 fields, saw 9\nSkipping line 121768: expected 8 fields, saw 9\n'
b'Skipping line 144058: expected 8 fields, saw 9\nSkipping line 150789: expected 8 fields, saw 9\nSkipping line 157128: expected 8 fields, saw 9\nSkipping line 180189: expected 8 fields, saw 9\nSkipping line 185738: expected 8 fields, saw 9\n'
b'Skipping line 209388: expected 8 fields, saw 9\nSkipping line 220626: expected 8 fields, saw 9\nSkipping line 227933: expected 8 fields, saw 11\nSkipping line 228957: expected 8 fields, saw 10\nSkipping line 245933: expected 8 fields, saw 9\nSkipping line 251296: expected 8 fields, saw 9\nSkipping line 259941: expected 8 fields, saw 9\nSkipping line 261529: expected 8 fields, saw 9\n'
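The "Skipping line" messages come from rows that contain more fields than the header declares. The following is a minimal sketch on made-up data, assuming pandas 1.3 or later (where `read_csv` accepts `on_bad_lines`); the sample columns and values are hypothetical and only illustrate how such rows are dropped.

```python
import io
import pandas as pd

# Hypothetical three-column sample; the second data row has one field too many.
raw = (
    "ISBN;bookTitle;publisher\n"
    "0001;First Book;Acme\n"
    "0002;Broken;Row;Acme\n"
    "0003;Third Book;Acme\n"
)

# on_bad_lines="skip" silently drops rows with too many fields.
# dtype=str keeps the leading zeros of the ISBN column.
sample = pd.read_csv(io.StringIO(raw), sep=";", on_bad_lines="skip", dtype=str)
print(sample)
```

Reading `ISBN` as a string is a design choice worth considering for the real files too, since ISBNs are identifiers rather than numbers.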


### Check the number of records and features in each dataset

In [3]:
#No of records and features in books
books.shape

(271360, 8)

In [4]:
#No of records and features in users
users.shape

(278858, 3)

In [5]:
#No of records and features in ratings
ratings.shape

(1149780, 3)

## Exploring books dataset

In [6]:
#Print the top 5 rows to look at the sample data
books.head()

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


### Drop last three columns containing image URLs which will not be required for analysis

In [7]:
#Retain the ISBN, bookTitle, bookAuthor, yearOfPublication and publisher features in the books data frame.
books = books[['ISBN','bookTitle','bookAuthor','yearOfPublication','publisher']]

In [8]:
#Look at the sample data of the books data frame
books.head()

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company


**yearOfPublication**

### Check unique values of yearOfPublication


In [9]:
#List the unique values of year. 
books['yearOfPublication'].unique()

array([2002, 2001, 1991, 1999, 2000, 1993, 1996, 1988, 2004, 1998, 1994,
       2003, 1997, 1983, 1979, 1995, 1982, 1985, 1992, 1986, 1978, 1980,
       1952, 1987, 1990, 1981, 1989, 1984, 0, 1968, 1961, 1958, 1974,
       1976, 1971, 1977, 1975, 1965, 1941, 1970, 1962, 1973, 1972, 1960,
       1966, 1920, 1956, 1959, 1953, 1951, 1942, 1963, 1964, 1969, 1954,
       1950, 1967, 2005, 1957, 1940, 1937, 1955, 1946, 1936, 1930, 2011,
       1925, 1948, 1943, 1947, 1945, 1923, 2020, 1939, 1926, 1938, 2030,
       1911, 1904, 1949, 1932, 1928, 1929, 1927, 1931, 1914, 2050, 1934,
       1910, 1933, 1902, 1924, 1921, 1900, 2038, 2026, 1944, 1917, 1901,
       2010, 1908, 1906, 1935, 1806, 2021, '2000', '1995', '1999', '2004',
       '2003', '1990', '1994', '1986', '1989', '2002', '1981', '1993',
       '1983', '1982', '1976', '1991', '1977', '1998', '1992', '1996',
       '0', '1997', '2001', '1974', '1968', '1987', '1984', '1988',
       '1963', '1956', '1970', '1985', '1978', '1973', '1980'

As the output above shows, this field contains some incorrect entries. The publisher names 'DK Publishing Inc' and 'Gallimard' appear to have been loaded as yearOfPublication because of errors in the CSV file.

Some entries are also strings, so the same year appears both as a number and as a string. We will fix these issues in the following steps.
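One way to locate the non-numeric entries without reading the whole array is `pd.to_numeric` with `errors="coerce"`. This is a minimal sketch on a toy column, not the notebook's own method; the values below are made up to mirror the mix of integers, numeric strings and a misplaced publisher name.

```python
import pandas as pd

# Toy column mixing ints, numeric strings and a misplaced publisher name.
years = pd.Series([2002, "1999", 0, "DK Publishing Inc", "0"], dtype=object)

# Values that cannot be parsed as numbers become NaN.
numeric_years = pd.to_numeric(years, errors="coerce")

# Rows whose original value was not a number at all.
bad_rows = years[numeric_years.isna()]

# Remaining values, unified to a single integer dtype.
clean_years = numeric_years.dropna().astype(int)

print(bad_rows)
print(clean_years.tolist())
```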

### Check the rows having 'DK Publishing Inc' as yearOfPublication

In [10]:
#2 rows with the yearOfPublication as 'DK Publishing Inc'
books[books['yearOfPublication'] == 'DK Publishing Inc' ]

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
209538,078946697X,"DK Readers: Creating the X-Men, How It All Beg...",2000,DK Publishing Inc,http://images.amazon.com/images/P/078946697X.0...
221678,0789466953,"DK Readers: Creating the X-Men, How Comic Book...",2000,DK Publishing Inc,http://images.amazon.com/images/P/0789466953.0...


In [11]:
#1 row with the yearOfPublication as 'Gallimard'
books[books['yearOfPublication'] == 'Gallimard']

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
220731,2070426769,"Peuple du ciel, suivi de 'Les Bergers\"";Jean-M...",2003,Gallimard,http://images.amazon.com/images/P/2070426769.0...


In [12]:
#1048 rows with the yearOfPublication as 0
books[books['yearOfPublication'] == '0']

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
196656,3442035368,Ich Gestehe,Heinz G. Konsalik,0,Wilhelm Goldmann Verlag GmbH
196678,0553124803,Being There,Jerzy Kosinski,0,Bantam Doubleday Dell
196680,888274387X,Vaniglia E Cioccolato,Modignani Casati,0,Sperling Paperback
196685,033368155X,Surreal Lives the Surrealists 1945,Ruth Brandon,0,Humanity Press/prometheus Bk
196734,0207158452,Games of the Strong,Glenda Adams,0,Harpercollins Publisher
...,...,...,...,...,...
261929,0760700702,100 Great Archaeological Discoveries,Paul G Bahn,0,Barnes Noble Inc
261930,0760701962,UFO's: A Scientific Debate,Carl Sagan,0,Barnes Noble Books
261931,0760706379,Only Way to Cross,John Maxtone Graham,0,Barnes Noble
262120,0880292288,New York Times Guide to Reference Materials,Mona Mccormick,0,Dorset House Publishing Co Inc


### Drop the rows having `'DK Publishing Inc'` and `'Gallimard'` as `yearOfPublication`

In [13]:
#Drop the rows have 'DK Publishing Inc' and 'Gallimard' as yearOfPublication
books.drop(books[books['yearOfPublication'] == 'Gallimard'].index, inplace=True)

In [14]:
books.drop(books[books['yearOfPublication'] == 'DK Publishing Inc'].index, inplace=True)

### Change the datatype of yearOfPublication to 'int'

In [15]:
#Convert yearOfPublication column as int
books[['yearOfPublication']] = books[['yearOfPublication']].astype(int)

In [16]:
#List the data types of the books dataframe
books.dtypes

ISBN                 object
bookTitle            object
bookAuthor           object
yearOfPublication     int64
publisher            object
dtype: object

### Drop NaNs in `'publisher'` column


In [17]:
#Count the NaN values in each column
books.isna().sum()

ISBN                 0
bookTitle            0
bookAuthor           1
yearOfPublication    0
publisher            2
dtype: int64

In [18]:
#Drop all rows that have NaN in any column
books=books.dropna(axis=0, how='any')

In [19]:
#Print the number of rows and columns
books.shape

(271354, 5)

## Exploring Users dataset

In [20]:
#Print user shape and a sample dataset
print(users.shape)
users.head()

(278858, 3)


Unnamed: 0,userID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


### Get all unique values in ascending order for column `Age`

In [21]:
#Get all unique values of Age
users['Age'].unique()

array([ nan,  18.,  17.,  61.,  26.,  14.,  25.,  19.,  46.,  55.,  32.,
        24.,  20.,  34.,  23.,  51.,  31.,  21.,  44.,  30.,  57.,  43.,
        37.,  41.,  54.,  42.,  50.,  39.,  53.,  47.,  36.,  28.,  35.,
        13.,  58.,  49.,  38.,  45.,  62.,  63.,  27.,  33.,  29.,  66.,
        40.,  15.,  60.,   0.,  79.,  22.,  16.,  65.,  59.,  48.,  72.,
        56.,  67.,   1.,  80.,  52.,  69.,  71.,  73.,  78.,   9.,  64.,
       103., 104.,  12.,  74.,  75., 231.,   3.,  76.,  83.,  68., 119.,
        11.,  77.,   2.,  70.,  93.,   8.,   7.,   4.,  81., 114., 230.,
       239.,  10.,   5., 148., 151.,   6., 101., 201.,  96.,  84.,  82.,
        90., 123., 244., 133.,  91., 128.,  94.,  85., 141., 110.,  97.,
       219.,  86., 124.,  92., 175., 172., 209., 212., 237.,  87., 162.,
       100., 156., 136.,  95.,  89., 106.,  99., 108., 210.,  88., 199.,
       147., 168., 132., 159., 186., 152., 102., 116., 200., 115., 226.,
       137., 207., 229., 138., 109., 105., 228., 18

The Age column has some invalid entries: NaN, 0, and implausibly high values of 100 and above.

### Ages below 5 and above 90 do not make much sense for book ratings, so replace them with NaN

In [22]:
users['Age'].loc[(users.Age > 90)].count()

430

In [23]:
#Replace Age values above 90 with NaN (assign the result back instead of using inplace on a column selection)
users['Age'] = users['Age'].mask(users['Age'] > 90)

In [24]:
users['Age'].loc[(users.Age > 90)].count()

0

In [25]:
users['Age'].loc[(users.Age < 5)].count()

882

In [26]:
#Replace Age values below 5 with NaN
users['Age'] = users['Age'].mask(users['Age'] < 5)

In [27]:
users['Age'].loc[(users.Age < 5)].count()

0

In [28]:
#Total number of rows with NaN values
users['Age'].isna().sum()

112074

In [29]:
#Mean of the age column
users['Age'].mean()

34.72384041634689

### Replace null values in column `Age` with mean

In [30]:
#Replace NaN values with the mean of the Age column
users['Age'] = users['Age'].fillna(users['Age'].mean())

In [31]:
users['Age'].isna().sum()

0

### Change the datatype of `Age` to `int`

In [32]:
#Change datatype of Age to int
users[['Age']] = users[['Age']].astype(int)

In [33]:
print(sorted(users.Age.unique()))

[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
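The Age clean-up steps (blank out implausible values, fill with the mean, cast to `int`) can be collected in one small function. This is a sketch under the same 5 to 90 bounds; `clean_age` is a hypothetical helper name and the sample ages are made up.

```python
import numpy as np
import pandas as pd

def clean_age(age, low=5, high=90):
    """Hypothetical helper: NaN outside [low, high], fill with the mean, cast to int."""
    valid = age.where(age.between(low, high))  # between() is inclusive on both ends
    return valid.fillna(valid.mean()).astype(int)

ages = pd.Series([np.nan, 18.0, 0.0, 244.0, 40.0, 5.0, 90.0])
cleaned = clean_age(ages)
print(cleaned.tolist())
```

Note that `astype(int)` truncates, so a mean of 38.25 is stored as 38, just as the notebook's mean of about 34.72 ends up as 34.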


## Exploring the Ratings Dataset

### Check the shape

In [34]:
#Print the ratings shape
ratings.shape

(1149780, 3)

In [35]:
n_users = users.shape[0]
n_books = books.shape[0]

In [36]:
#Sample ratings data
ratings.head(5)

Unnamed: 0,userID,ISBN,bookRating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


### Ratings dataset should have books only which exist in our books dataset. Drop the remaining rows

In [37]:
#True for ratings whose ISBN is missing from the books dataframe
cond = ~ratings['ISBN'].isin(books['ISBN'])

In [38]:
#Drop the ratings whose ISBN does not exist in the books dataframe
ratings.drop(ratings[cond].index, inplace = True)

In [39]:
#Print the rows and columns after dropping the rows
ratings.shape

(1031129, 3)

### Ratings dataset should have ratings from users which exist in users dataset. Drop the remaining rows

In [40]:
#True for ratings whose userID is missing from the users dataframe
cond1 = ~ratings['userID'].isin(users['userID'])

In [41]:
ratings[cond1]

Unnamed: 0,userID,ISBN,bookRating


In [42]:
#Print the number of rows and columns
#No rows were dropped: every rating belongs to a known user
ratings.shape

(1031129, 3)
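Both consistency checks can also be expressed as a single boolean mask. The sketch below uses toy dataframes (the `_toy` names and values are made up) to show which rows survive when a rating must reference a known book and a known user.

```python
import pandas as pd

books_toy = pd.DataFrame({"ISBN": ["A1", "B2"]})
users_toy = pd.DataFrame({"userID": [1, 2]})
ratings_toy = pd.DataFrame({
    "userID": [1, 2, 3, 1],
    "ISBN": ["A1", "ZZ", "B2", "B2"],
    "bookRating": [5, 0, 7, 9],
})

# Keep a rating only if both its ISBN and its userID are known.
keep = (
    ratings_toy["ISBN"].isin(books_toy["ISBN"])
    & ratings_toy["userID"].isin(users_toy["userID"])
)
filtered = ratings_toy[keep].reset_index(drop=True)
print(filtered)
```

Building a new dataframe from the mask avoids the in-place `drop` and leaves the original ratings untouched for later checks.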

### Keep only the explicit ratings (1-10) and leave out the 0s in column `bookRating`

In [43]:
ratings_c = ratings[ratings['bookRating'] > 0]

In [44]:
ratings_c

Unnamed: 0,userID,ISBN,bookRating
1,276726,0155061224,5
3,276729,052165615X,3
4,276729,0521795028,6
8,276744,038550120X,7
16,276747,0060517794,9
...,...,...,...
1149771,276704,0743211383,7
1149773,276704,0806917695,5
1149775,276704,1563526298,9
1149777,276709,0515107662,10


### Find out which rating has been given highest number of times

In [45]:
#Rating 8 has been given the highest number of times.
ratings_c['bookRating'].value_counts()

8     91803
10    71225
7     66401
9     60776
5     45355
6     31687
4      7617
3      5118
2      2375
1      1481
Name: bookRating, dtype: int64

### **Collaborative Filtering Based Recommendation Systems**

### For more accurate results, only consider users who have rated at least 100 books

In [46]:
import pandasql as ps

q1 = """SELECT userID FROM ratings_c group by userID having count(userID) > 100"""
output = ps.sqldf(q1, locals())
output

Unnamed: 0,userID
0,2033
1,2110
2,2276
3,4017
4,4385
...,...
435,274061
436,274301
437,275970
438,277427


In [47]:
q2 = """select * from ratings_c where userID in (SELECT userID FROM ratings_c group by userID having count(userID) > 100)"""
considered_users = ps.sqldf(q2, locals())

In [48]:
considered_users

Unnamed: 0,userID,ISBN,bookRating
0,277427,002542730X,10
1,277427,003008685X,8
2,277427,0060006641,10
3,277427,0060542128,7
4,277427,0061009059,9
...,...,...,...
102364,275970,185649814X,7
102365,275970,1860462588,8
102366,275970,1886411077,6
102367,275970,3411086211,10


In [49]:
min_user_ratings = 100
#Note: the strict > keeps users with more than 100 ratings (use >= to include exactly 100)
filter_users = ratings_c['userID'].value_counts() > min_user_ratings
filter_users = filter_users[filter_users].index.tolist()

df_new = ratings_c[(ratings_c['userID'].isin(filter_users))]

In [50]:
df_new

Unnamed: 0,userID,ISBN,bookRating
1456,277427,002542730X,10
1458,277427,003008685X,8
1461,277427,0060006641,10
1465,277427,0060542128,7
1474,277427,0061009059,9
...,...,...,...
1147587,275970,185649814X,7
1147592,275970,1860462588,8
1147599,275970,1886411077,6
1147611,275970,3411086211,10
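The SQL query and the `value_counts` filter select the same users. A third option, sketched here on toy data with a made-up threshold of 2, is `groupby(...).transform("size")`, which attaches each user's rating count to every row so the filter is a single comparison.

```python
import pandas as pd

ratings_toy = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2, 3],
    "ISBN": list("abcdef"),
    "bookRating": [8, 7, 9, 10, 5, 6],
})

min_user_ratings = 2  # toy threshold; the notebook uses 100

# Number of ratings per user, repeated on each of that user's rows.
counts = ratings_toy.groupby("userID")["bookRating"].transform("size")

# Same strict comparison as in the notebook: more than min_user_ratings.
active = ratings_toy[counts > min_user_ratings]
print(active)
```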


### Generating ratings matrix from explicit ratings


#### Note: since NaNs cannot be handled by training algorithms, replace these by 0, which indicates absence of ratings
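For reference, an explicit user-by-book ratings matrix with missing entries set to 0 can be built with `pivot`. This is a minimal sketch on toy data; the Surprise workflow in the following cells does not need such a matrix, since it works directly on the (user, item, rating) rows.

```python
import pandas as pd

ratings_toy = pd.DataFrame({
    "userID": [1, 1, 2, 3],
    "ISBN": ["A", "B", "B", "C"],
    "bookRating": [8, 5, 10, 7],
})

# Rows are users, columns are books; unrated cells become 0.
ratings_matrix = (
    ratings_toy.pivot(index="userID", columns="ISBN", values="bookRating")
    .fillna(0)
    .astype(int)
)
print(ratings_matrix)
```

On the full dataset this matrix is very sparse, so a dense pivot may be memory-hungry even after filtering to active users.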

In [51]:
from surprise import SVD

In [52]:
from surprise import Dataset
from surprise import Reader

In [53]:
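#Book-Crossing explicit ratings run from 1 to 10; Reader() defaults to rating_scale=(1, 5),
#which clips every estimate to 5 (as in the outputs recorded below)
reader = Reader(rating_scale=(1, 10))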

In [54]:
data = Dataset.load_from_df(df_new,reader)

In [55]:
trainset = data.build_full_trainset()

In [56]:
trainset

<surprise.trainset.Trainset at 0x128db0c10>

In [57]:
trainset.ur

defaultdict(list,
            {0: [(0, 10.0),
              (1, 8.0),
              (2, 10.0),
              (3, 7.0),
              (4, 9.0),
              (5, 8.0),
              (6, 8.0),
              (7, 6.0),
              (8, 8.0),
              (9, 7.0),
              (10, 8.0),
              (11, 10.0),
              (12, 10.0),
              (13, 10.0),
              (14, 8.0),
              (15, 8.0),
              (16, 10.0),
              (17, 9.0),
              (18, 9.0),
              (19, 8.0),
              (20, 9.0),
              (21, 9.0),
              (22, 9.0),
              (23, 9.0),
              (24, 9.0),
              (25, 8.0),
              (26, 8.0),
              (27, 7.0),
              (28, 7.0),
              (29, 7.0),
              (30, 10.0),
              (31, 9.0),
              (32, 8.0),
              (33, 9.0),
              (34, 8.0),
              (35, 9.0),
              (36, 7.0),
              (37, 5.0),
              (38, 7.0),
       

In [58]:
algo = SVD()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1333be5d0>

In [59]:
algo.pu

array([[-0.11828196, -0.10033351,  0.09115127, ...,  0.17052433,
        -0.27263561, -0.15676346],
       [ 0.14018715,  0.33563262,  0.00395494, ..., -0.21339383,
         0.09031221,  0.14829719],
       [-0.0706707 , -0.03641866, -0.26881021, ...,  0.28288831,
         0.26668156,  0.06442616],
       ...,
       [ 0.06460528,  0.1499728 , -0.21000103, ..., -0.06358624,
         0.07621348,  0.06506644],
       [-0.1817239 ,  0.25292419, -0.00810756, ...,  0.36136451,
         0.08790827,  0.36930542],
       [-0.06310561, -0.19182386, -0.21339864, ...,  0.21353649,
         0.18674069,  0.07034101]])

In [60]:
algo.qi

array([[ 0.04933839,  0.11555546,  0.03041628, ...,  0.08375198,
        -0.06451871, -0.08169348],
       [ 0.13402359, -0.05760209, -0.07614932, ..., -0.07644947,
         0.11480305,  0.03866117],
       [-0.015504  , -0.08419772, -0.04778079, ...,  0.11308174,
        -0.07614533, -0.09193206],
       ...,
       [-0.07377169, -0.00132842, -0.13247609, ..., -0.05088892,
         0.00303399, -0.0619367 ],
       [ 0.08755331,  0.02501367,  0.11010782, ...,  0.13731056,
         0.17102878,  0.0382447 ],
       [ 0.14616339, -0.04615544, -0.10926716, ..., -0.12238888,
         0.11471309,  0.0046623 ]])

### Generate the predicted ratings using SVD function from surprise library

In [61]:
testset = trainset.build_anti_testset()

In [62]:
predictions = algo.test(testset)

In [63]:
predictions

[Prediction(uid=277427, iid='0006542808', r_ui=7.82591409508738, est=5, details={'was_impossible': False}),
 Prediction(uid=277427, iid='0060392185', r_ui=7.82591409508738, est=5, details={'was_impossible': False}),
 Prediction(uid=277427, iid='0140367209', r_ui=7.82591409508738, est=5, details={'was_impossible': False}),
 Prediction(uid=277427, iid='0140546499', r_ui=7.82591409508738, est=5, details={'was_impossible': False}),
 Prediction(uid=277427, iid='0192816071', r_ui=7.82591409508738, est=5, details={'was_impossible': False}),
 Prediction(uid=277427, iid='0307022196', r_ui=7.82591409508738, est=5, details={'was_impossible': False}),
 Prediction(uid=277427, iid='0310912520', r_ui=7.82591409508738, est=5, details={'was_impossible': False}),
 Prediction(uid=277427, iid='0312850131', r_ui=7.82591409508738, est=5, details={'was_impossible': False}),
 Prediction(uid=277427, iid='0312860862', r_ui=7.82591409508738, est=5, details={'was_impossible': False}),
 Prediction(uid=277427, iid=
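`build_anti_testset()` returns every (user, item) pair that is absent from the training set, with the global mean rating filled in as a placeholder `r_ui`. That is why `r_ui` is the same value in every prediction above: it is not a true rating. A pure-Python sketch of the idea on toy data follows; the function below is an illustration, not Surprise's implementation.

```python
def build_anti_testset(user_ratings, all_items, fill):
    """Return (user, item, fill) for every item the user has not rated."""
    anti = []
    for user, rated in user_ratings.items():
        rated_items = {item for item, _ in rated}
        anti.extend((user, item, fill) for item in all_items if item not in rated_items)
    return anti

# Toy data: u1 rated A, u2 rated A and B.
user_ratings = {"u1": [("A", 8.0)], "u2": [("A", 6.0), ("B", 10.0)]}
all_items = ["A", "B", "C"]
global_mean = 8.0  # (8 + 6 + 10) / 3

anti = build_anti_testset(user_ratings, all_items, fill=global_mean)
print(anti)
```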

### Let's find the recommendations for the user with id `2110`

#### Note: Execute the below cells to get the variables loaded

In [64]:
userID = 2110

In [65]:
user_id = 2 #2nd row in ratings matrix and predicted matrix

### Get the predicted ratings for userID `2110` and sort them in descending order

In [66]:
from collections import defaultdict

In [67]:
#Function to get the predicted ratings for each user
def get_top_n(predictions, n=10):
    # First map the predictions to each user.
    top_n = defaultdict(list)
    for uid, iid, true_r, est, _ in predictions:
        top_n[uid].append((iid, est))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]

    return top_n
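When each user has a very large number of candidate books, sorting every list in full is more work than needed. A possible variant, sketched here with the hypothetical name `top_n_per_user`, uses `heapq.nlargest` to keep only the n best estimates; the toy tuples stand in for Surprise `Prediction` objects, which unpack the same way.

```python
import heapq
from collections import defaultdict

def top_n_per_user(predictions, n=10):
    """Hypothetical variant of get_top_n that avoids a full sort per user."""
    by_user = defaultdict(list)
    for uid, iid, _true_r, est, _details in predictions:
        by_user[uid].append((iid, est))
    # nlargest returns the n highest estimates in descending order.
    return {
        uid: heapq.nlargest(n, items, key=lambda pair: pair[1])
        for uid, items in by_user.items()
    }

# Toy predictions: (uid, iid, r_ui, est, details)
toy_predictions = [
    (1, "A", 8.0, 9.1, {}),
    (1, "B", 8.0, 7.0, {}),
    (1, "C", 8.0, 9.5, {}),
    (2, "A", 8.0, 6.0, {}),
]
toy_top = top_n_per_user(toy_predictions, n=2)
print(toy_top)
```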

In [68]:
#Prediction for each user
top_n = get_top_n(predictions, n=10)

In [69]:
top_n

defaultdict(list,
            {277427: [('0006542808', 5),
              ('0060392185', 5),
              ('0140367209', 5),
              ('0140546499', 5),
              ('0192816071', 5),
              ('0307022196', 5),
              ('0310912520', 5),
              ('0312850131', 5),
              ('0312860862', 5),
              ('0312923651', 5)],
             278418: [('002542730X', 5),
              ('003008685X', 5),
              ('0060006641', 5),
              ('0060542128', 5),
              ('0061009059', 5),
              ('0062507109', 5),
              ('0132220598', 5),
              ('0140283374', 5),
              ('014039026X', 5),
              ('0140390715', 5)],
             2033: [('002542730X', 5),
              ('003008685X', 5),
              ('0060006641', 5),
              ('0060542128', 5),
              ('0061009059', 5),
              ('0062507109', 5),
              ('0132220598', 5),
              ('0140283374', 5),
              ('014039026X', 5),
 

In [70]:
# Print the recommended items for user 2110
top_n[userID]

[('002542730X', 5),
 ('003008685X', 5),
 ('0060006641', 5),
 ('0060542128', 5),
 ('0061009059', 5),
 ('0062507109', 5),
 ('0132220598', 5),
 ('0140283374', 5),
 ('014039026X', 5),
 ('0140390715', 5)]

In [71]:
# Print the recommended items for user 2110
for uid, user_ratings in top_n.items():
    if uid == 2110:
        print(uid, [iid for (iid, _) in user_ratings])

2110 ['002542730X', '003008685X', '0060006641', '0060542128', '0061009059', '0062507109', '0132220598', '0140283374', '014039026X', '0140390715']


### Create a dataframe named `user_data` containing the books that userID `2110` explicitly interacted with

In [72]:
user_data = trainset.ur[trainset.to_inner_uid(2110)]

In [73]:
user_data = pd.DataFrame(user_data)

In [74]:
user_data.head()

Unnamed: 0,0,1
0,381,7.0
1,382,8.0
2,383,10.0
3,384,8.0
4,385,5.0


In [75]:
user_data.shape

(103, 2)

### Combine the user_data and the corresponding book data (`book_data`) in a single dataframe named `user_full_info`

In [94]:
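#Caution: trainset.ir is keyed by inner item id, so indexing it with an inner user id
#returns the ratings received by an unrelated item, not the books rated by user 2110
book_data = trainset.ir[trainset.to_inner_uid(2110)]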

In [95]:
book_data = pd.DataFrame(book_data)

In [96]:
book_data.head()

Unnamed: 0,0,1
0,0,7.0
1,19,8.0
2,46,7.0
3,157,8.0
4,186,5.0


In [97]:
book_data.shape

(9, 2)
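To obtain `user_full_info`, the inner item ids in `user_data` first need to be mapped back to raw ISBNs (Surprise offers `trainset.to_raw_iid` for this) and then joined with the books dataframe. The sketch below shows the join on stand-in data: the dictionary replaces the `to_raw_iid` lookup, and the ISBNs and titles are made up.

```python
import pandas as pd

# Stand-ins (assumptions for illustration only):
#   user_ratings  ~ trainset.ur[inner_uid], a list of (inner_item_id, rating)
#   inner_to_isbn ~ what trainset.to_raw_iid(inner_item_id) would return
user_ratings = [(381, 7.0), (382, 8.0)]
inner_to_isbn = {381: "0000000001", 382: "0000000002"}
books_toy = pd.DataFrame({
    "ISBN": ["0000000001", "0000000002"],
    "bookTitle": ["Title A", "Title B"],
})

user_data = pd.DataFrame(user_ratings, columns=["innerItemID", "bookRating"])
user_data["ISBN"] = user_data["innerItemID"].map(inner_to_isbn)

# A left join keeps every rated book even if its metadata is missing.
user_full_info = user_data.merge(books_toy, on="ISBN", how="left")
print(user_full_info)
```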


### Get top 10 recommendations for above given userID from the books not already rated by that user

In [77]:
#Collect and print the top 10 recommendations for user 2110
#(built from a list because DataFrame.append was removed in pandas 2.0)
recommended_isbns = [iid for (iid, _) in top_n[userID]]
recommendation = pd.DataFrame({'BookID': recommended_isbns})
for iid in recommended_isbns:
    print(iid)

002542730X
003008685X
0060006641
0060542128
0061009059
0062507109
0132220598
0140283374
014039026X
0140390715


In [88]:
#Books recommended
recommendation_books = books[books["ISBN"].isin(recommendation["BookID"])]
recommendation_books

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
669,0061009059,One for the Money (Stephanie Plum Novels (Pape...,Janet Evanovich,1995,HarperTorch
3739,002542730X,Politically Correct Bedtime Stories: Modern Ta...,James Finn Garner,1994,John Wiley & Sons Inc
33560,0060542128,When the Storm Breaks,Heather Lowell,2003,HarperTorch
104926,0140283374,Women in Love (Penguin Great Books of the 20th...,D. H. Lawrence,2000,Penguin Books
104989,0060006641,"On Writing Well, 25th Anniversary : The Classi...",William Zinsser,2001,HarperResource
179081,014039026X,The Prairie (Penguin Classics),James Fenimore Cooper,1987,Penguin Books
247016,003008685X,Pioneers,James Fenimore Cooper,1974,Thomson Learning
247018,0062507109,Inner Bonding: Becoming a Loving Adult to Your...,Margaret Paul,1992,HarperSanFrancisco
247019,0132220598,Dynamics of Motor-Skill Acquisition,Margaret D. Robb,1972,Prentice Hall
247020,0140390715,The Pathfinder (Penguin Classic),James Fenimore Cooper,1989,Penguin Books
