**About the Book-Crossing Dataset**

This dataset was compiled by Cai-Nicolas Ziegler in 2004 and comprises three tables: users, books, and ratings. Explicit ratings are expressed on a scale from 1 to 10 (higher values denote higher appreciation), and implicit ratings are expressed by 0.

Reference: http://www2.informatik.uni-freiburg.de/~cziegler/BX/ 

**Objective**

This project entails building a Book Recommender System for users based on user-based and item-based collaborative filtering approaches.
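
As a rough illustration of the two approaches named above, the sketch below computes cosine similarities on a tiny, made-up ratings matrix. The matrix `R` and the helper `cosine_similarity` are hypothetical and are not part of this notebook's pipeline; user-based filtering compares rows (users), while item-based filtering compares columns (books).

```python
import numpy as np

# Toy ratings matrix (hypothetical data): rows are users, columns are books,
# and 0 means "not rated".
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
])

def cosine_similarity(M):
    """Pairwise cosine similarity between the rows of M."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = M / norms
    return unit @ unit.T

user_sim = cosine_similarity(R)    # user-based: compare users (rows)
item_sim = cosine_similarity(R.T)  # item-based: compare books (columns)

print(np.round(user_sim, 3))
print(np.round(item_sim, 3))
```

In this toy matrix, users 0 and 1 come out more similar to each other than users 0 and 2, so a user-based recommender would weight user 1's ratings more heavily when predicting for user 0.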

#### Execute the cells below to load the datasets

In [92]:
%matplotlib inline

import pandas as pd
import numpy as np
import seaborn as sns

%config IPCompleter.greedy=True

In [93]:
#Loading data
books = pd.read_csv("books.csv", sep=";", error_bad_lines=False, encoding="latin-1")
books.columns = ['ISBN', 'bookTitle', 'bookAuthor', 'yearOfPublication', 'publisher', 'imageUrlS', 'imageUrlM', 'imageUrlL']

b'Skipping line 6452: expected 8 fields, saw 9\nSkipping line 43667: expected 8 fields, saw 10\nSkipping line 51751: expected 8 fields, saw 9\n'
b'Skipping line 92038: expected 8 fields, saw 9\nSkipping line 104319: expected 8 fields, saw 9\nSkipping line 121768: expected 8 fields, saw 9\n'
b'Skipping line 144058: expected 8 fields, saw 9\nSkipping line 150789: expected 8 fields, saw 9\nSkipping line 157128: expected 8 fields, saw 9\nSkipping line 180189: expected 8 fields, saw 9\nSkipping line 185738: expected 8 fields, saw 9\n'
b'Skipping line 209388: expected 8 fields, saw 9\nSkipping line 220626: expected 8 fields, saw 9\nSkipping line 227933: expected 8 fields, saw 11\nSkipping line 228957: expected 8 fields, saw 10\nSkipping line 245933: expected 8 fields, saw 9\nSkipping line 251296: expected 8 fields, saw 9\nSkipping line 259941: expected 8 fields, saw 9\nSkipping line 261529: expected 8 fields, saw 9\n'


In [94]:
users = pd.read_csv('users.csv', sep=';', error_bad_lines=False, encoding="latin-1")
users.columns = ['userID', 'Location', 'Age']

In [95]:
ratings = pd.read_csv('ratings.csv', sep=';', error_bad_lines=False, encoding="latin-1")
ratings.columns = ['userID', 'ISBN', 'bookRating']
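
The `error_bad_lines` argument used above was deprecated in pandas 1.3 and removed in pandas 2.0. On a recent pandas version, a loading step along the following lines may be needed instead. This is a minimal sketch on made-up in-memory data (the `raw` string and its columns are hypothetical), not a drop-in replacement tested against the real CSV files.

```python
import io
import pandas as pd

# Hypothetical sample in the same ';'-separated layout; the third line has
# an extra field and is therefore malformed.
raw = (
    "ISBN;title;year\n"
    "0111;Book A;2001\n"
    "0222;Book;B;2002\n"
    "0333;Book C;2003\n"
)

# on_bad_lines="skip" drops malformed rows (pandas >= 1.3).
# dtype={"ISBN": str} keeps leading zeros in the identifiers.
df = pd.read_csv(io.StringIO(raw), sep=";", on_bad_lines="skip", dtype={"ISBN": str})
print(df)
```

Reading `ISBN` as a string is a design choice worth considering here, since ISBNs are identifiers rather than numbers and may start with zeros or end in `X`.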

### Check the number of records and features in each dataset

In [96]:
books.info()
print ("\n books data shape: ",  books.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 271360 entries, 0 to 271359
Data columns (total 8 columns):
ISBN                 271360 non-null object
bookTitle            271360 non-null object
bookAuthor           271359 non-null object
yearOfPublication    271360 non-null object
publisher            271358 non-null object
imageUrlS            271360 non-null object
imageUrlM            271360 non-null object
imageUrlL            271357 non-null object
dtypes: object(8)
memory usage: 16.6+ MB

 books data shape:  (271360, 8)


In [97]:
users.info()
print ("\n users data shape: ",  users.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 278858 entries, 0 to 278857
Data columns (total 3 columns):
userID      278858 non-null int64
Location    278858 non-null object
Age         168096 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 6.4+ MB

 users data shape:  (278858, 3)


In [98]:
ratings.info()
print ("\n ratings data shape: ",  ratings.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1149780 entries, 0 to 1149779
Data columns (total 3 columns):
userID        1149780 non-null int64
ISBN          1149780 non-null object
bookRating    1149780 non-null int64
dtypes: int64(2), object(1)
memory usage: 26.3+ MB

 ratings data shape:  (1149780, 3)


## Exploring books dataset

In [99]:
books.head(5)

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher,imageUrlS,imageUrlM,imageUrlL
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...,http://images.amazon.com/images/P/0195153448.0...
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...,http://images.amazon.com/images/P/0002005018.0...
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...,http://images.amazon.com/images/P/0060973129.0...
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...,http://images.amazon.com/images/P/0374157065.0...
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...,http://images.amazon.com/images/P/0393045218.0...


## Observations:

### yearOfPublication should be an integer, but here it has the object datatype, which suggests a problem with the data. Let's explore...

### Drop last three columns containing image URLs which will not be required for analysis

In [100]:
books.drop(books.columns[[-1,-2,-3]], axis=1, inplace=True)

In [101]:
books.head()

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
0,195153448,Classical Mythology,Mark P. O. Morford,2002,Oxford University Press
1,2005018,Clara Callan,Richard Bruce Wright,2001,HarperFlamingo Canada
2,60973129,Decision in Normandy,Carlo D'Este,1991,HarperPerennial
3,374157065,Flu: The Story of the Great Influenza Pandemic...,Gina Bari Kolata,1999,Farrar Straus Giroux
4,393045218,The Mummies of Urumchi,E. J. W. Barber,1999,W. W. Norton & Company


In [102]:
# Checking for missing value 
print ("Count of null values in each feature:\n", books.isna().sum())

Count of null values in each feature:
 ISBN                 0
bookTitle            0
bookAuthor           1
yearOfPublication    0
publisher            2
dtype: int64


**yearOfPublication**

### Check unique values of yearOfPublication


In [103]:
books['yearOfPublication'].unique()

array([2002, 2001, 1991, 1999, 2000, 1993, 1996, 1988, 2004, 1998, 1994,
       2003, 1997, 1983, 1979, 1995, 1982, 1985, 1992, 1986, 1978, 1980,
       1952, 1987, 1990, 1981, 1989, 1984, 0, 1968, 1961, 1958, 1974,
       1976, 1971, 1977, 1975, 1965, 1941, 1970, 1962, 1973, 1972, 1960,
       1966, 1920, 1956, 1959, 1953, 1951, 1942, 1963, 1964, 1969, 1954,
       1950, 1967, 2005, 1957, 1940, 1937, 1955, 1946, 1936, 1930, 2011,
       1925, 1948, 1943, 1947, 1945, 1923, 2020, 1939, 1926, 1938, 2030,
       1911, 1904, 1949, 1932, 1928, 1929, 1927, 1931, 1914, 2050, 1934,
       1910, 1933, 1902, 1924, 1921, 1900, 2038, 2026, 1944, 1917, 1901,
       2010, 1908, 1906, 1935, 1806, 2021, '2000', '1995', '1999', '2004',
       '2003', '1990', '1994', '1986', '1989', '2002', '1981', '1993',
       '1983', '1982', '1976', '1991', '1977', '1998', '1992', '1996',
       '0', '1997', '2001', '1974', '1968', '1987', '1984', '1988',
       '1963', '1956', '1970', '1985', '1978', '1973', '1980'

## Observations:

### As can be seen above, this field contains some incorrect entries. It looks like the publisher names 'DK Publishing Inc' and 'Gallimard' were loaded as yearOfPublication because of errors in the CSV file.

### Also, some entries are strings while the same years appear as numbers elsewhere. We will fix these issues in the following steps.

### Check the rows having 'DK Publishing Inc' as yearOfPublication

In [104]:
# Get the count of non-integer values in the feature yearOfPublication.
(books.yearOfPublication.str.isdigit() == False).sum()

3

In [105]:
# Get the rows with non-integer values in the feature yearOfPublication.
books[books.yearOfPublication.str.isdigit() == False]

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
209538,078946697X,"DK Readers: Creating the X-Men, How It All Beg...",2000,DK Publishing Inc,http://images.amazon.com/images/P/078946697X.0...
220731,2070426769,"Peuple du ciel, suivi de 'Les Bergers\"";Jean-M...",2003,Gallimard,http://images.amazon.com/images/P/2070426769.0...
221678,0789466953,"DK Readers: Creating the X-Men, How Comic Book...",2000,DK Publishing Inc,http://images.amazon.com/images/P/0789466953.0...


In [106]:
(books['yearOfPublication'] == 'DK Publishing Inc').sum()

2

### Drop the rows having `'DK Publishing Inc'` and `'Gallimard'` as `yearOfPublication`

In [107]:
books.drop(books[books.yearOfPublication.str.isdigit() == False].index, inplace=True)

In [108]:
# Check the count of non-integer values in the feature yearOfPublication.
(books.yearOfPublication.str.isdigit() == False).sum()

0

### Change the datatype of yearOfPublication to 'int'

In [109]:
books['yearOfPublication'] = books['yearOfPublication'].astype('int64')

In [110]:
# Verify the dtypes of all the features
books.dtypes

ISBN                 object
bookTitle            object
bookAuthor           object
yearOfPublication     int64
publisher            object
dtype: object
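
The cleanup above (find non-numeric entries, drop them, cast to integer) can also be sketched with `pd.to_numeric`. The example below uses a small made-up Series that mimics the mix of integers, digit strings, and stray publisher names seen in `yearOfPublication`; it is an alternative formulation, not the code this notebook ran.

```python
import pandas as pd

# Hypothetical sample mirroring the mixed-type column.
years = pd.Series(
    [2002, "2001", "0", "DK Publishing Inc", "Gallimard", 1999], dtype=object
)

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
numeric = pd.to_numeric(years, errors="coerce")
bad_rows = numeric.isna()

# Keep only the parsable rows and cast them to integers.
clean = numeric[~bad_rows].astype("int64")

print("Unparsable entries:", int(bad_rows.sum()))
print(clean.tolist())
```

Note that a year of `0` parses successfully, so placeholder years like this would still need separate handling if they matter for the analysis.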

### Drop NaNs in `'publisher'` column

In [111]:
# Get the count of NAN values in publisher column.
(books.publisher.isna() == True).sum()

2

In [112]:
# Get the rows of NAN values in publisher column.
books[books.publisher.isna() == True]

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
128890,193169656X,Tyrant Moon,Elaine Corvidae,2002,
129037,1931696993,Finders Keepers,Linnea Sinclair,2001,


In [113]:
# Drop the rows having NAN values in publisher column.
books.drop(books[books.publisher.isna() == True].index, inplace=True)

In [114]:
# Check the count of NAN values in publisher column.
(books.publisher.isna() == True).sum()

0

## Exploring Users dataset

In [115]:
print(users.shape)
users.head()

(278858, 3)


Unnamed: 0,userID,Location,Age
0,1,"nyc, new york, usa",
1,2,"stockton, california, usa",18.0
2,3,"moscow, yukon territory, russia",
3,4,"porto, v.n.gaia, portugal",17.0
4,5,"farnborough, hants, united kingdom",


### Get all unique values in ascending order for column `Age`

In [116]:
UsersAge = users['Age'].unique()
UsersAge.sort()
UsersAge

array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,
        22.,  23.,  24.,  25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.,
        33.,  34.,  35.,  36.,  37.,  38.,  39.,  40.,  41.,  42.,  43.,
        44.,  45.,  46.,  47.,  48.,  49.,  50.,  51.,  52.,  53.,  54.,
        55.,  56.,  57.,  58.,  59.,  60.,  61.,  62.,  63.,  64.,  65.,
        66.,  67.,  68.,  69.,  70.,  71.,  72.,  73.,  74.,  75.,  76.,
        77.,  78.,  79.,  80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,
        88.,  89.,  90.,  91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,
        99., 100., 101., 102., 103., 104., 105., 106., 107., 108., 109.,
       110., 111., 113., 114., 115., 116., 118., 119., 123., 124., 127.,
       128., 132., 133., 136., 137., 138., 140., 141., 143., 146., 147.,
       148., 151., 152., 156., 157., 159., 162., 168., 172., 175., 183.,
       186., 189., 199., 200., 201., 204., 207., 20

## Observations:

### Age column has some invalid entries like nan, 0 and very high values like 100 and above

### Values below 5 and above 90 do not make much sense for our book rating case...hence replace these by NaNs

In [117]:
# Get the count of NAN values in Age column.
(users.Age.isna() == True).sum()

110762

In [118]:
# Get the count of rows where Age < 5 or Age > 90.
((users.Age < 5) | (users.Age > 90)).sum()

1312

In [119]:
# Replace Age with NaN where Age < 5 or Age > 90.
users.loc[((users.Age < 5) | (users.Age > 90)), 'Age'] = np.nan

In [120]:
# Get the count of NAN values in Age.
(users.Age.isna() == True).sum()

112074

### Replace null values in column `Age` with mean

In [121]:
users.loc[(users.Age.isna() == True), 'Age'] = users.Age.mean()

# Check the count of NaN values in the Age column.
(users.Age.isna() == True).sum()

0

### Change the datatype of `Age` to `int`

In [122]:
users['Age'] = users['Age'].astype('int64')
users.dtypes

userID       int64
Location    object
Age          int64
dtype: object

In [123]:
# Get the unique values of Age to ensure correctness.
print(sorted(users.Age.unique()))

[5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
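
The age cleanup above can be condensed into a few chained steps. The sketch below uses a made-up Series and, as an assumption that differs from the notebook, imputes with the median rather than the mean and rounds before casting instead of truncating.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including missing and implausible values.
age = pd.Series([np.nan, 18.0, 0.0, 35.0, 204.0, 62.0])

# Keep values in [5, 90]; everything else (including NaN) becomes NaN.
valid = age.where(age.between(5, 90))

# Impute missing values with the median of the valid ages (assumption:
# median instead of the mean used in the notebook).
filled = valid.fillna(valid.median())

# Round before casting so that fractional imputed values are not truncated.
age_int = filled.round().astype("int64")
print(age_int.tolist())
```

With roughly 40% of the real `Age` column missing, any single-value imputation concentrates a large share of users at one age, which is worth keeping in mind if age is later used as a feature.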


## Exploring the Ratings Dataset

### Check the shape

In [124]:
ratings.shape

(1149780, 3)

In [125]:
ratings.head(5)

Unnamed: 0,userID,ISBN,bookRating
0,276725,034545104X,0
1,276726,0155061224,5
2,276727,0446520802,0
3,276729,052165615X,3
4,276729,0521795028,6


### Ratings dataset should have books only which exist in our books dataset. Drop the remaining rows

In [126]:
print ("No of Rows in Ratings before irrelevant books drop: ", ratings.shape[0])
ratings = ratings.query("ISBN in @books.ISBN")
print ("No of Rows in Ratings after irrelevant books drop: ", ratings.shape[0])

No of Rows in Ratings before irrelevant books drop:  1149780
No of Rows in Ratings after irrelevant books drop:  1031130


### Ratings dataset should have ratings from users which exist in users dataset. Drop the remaining rows

In [127]:
print ("No of Rows in Ratings before irrelevant users drop: ", ratings.shape[0])
ratings = ratings.query("userID in @users.userID")
print ("No of Rows in Ratings after irrelevant users drop: ", ratings.shape[0])

No of Rows in Ratings before irrelevant users drop:  1031130
No of Rows in Ratings after irrelevant users drop:  1031130
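
The two filtering steps above can equivalently be written with boolean masks built from `Series.isin`. The sketch below runs on small hypothetical frames (`toy_books`, `toy_users`, `toy_ratings`) so that it does not touch the notebook's variables.

```python
import pandas as pd

# Hypothetical miniature versions of the three tables.
toy_books = pd.DataFrame({"ISBN": ["A1", "B2"]})
toy_users = pd.DataFrame({"userID": [1, 2, 3]})
toy_ratings = pd.DataFrame({
    "userID": [1, 2, 9, 3],
    "ISBN": ["A1", "ZZ", "B2", "B2"],
    "bookRating": [5, 0, 7, 8],
})

# Keep a rating only if both its book and its user are known.
known_book = toy_ratings["ISBN"].isin(toy_books["ISBN"])
known_user = toy_ratings["userID"].isin(toy_users["userID"])
filtered = toy_ratings[known_book & known_user].copy()

print(filtered)
```

Taking a `.copy()` of the filtered frame avoids pandas' chained-assignment warnings if the result is modified in place later.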


### Consider only explicit ratings from 1 to 10 and drop the 0s (implicit ratings) in column `bookRating`

In [128]:
print ("No of Rows in Ratings before 0 rating drop: ", ratings.shape[0])
ratings.drop(ratings[ratings.bookRating == 0].index, inplace=True)
print ("No of Rows in Ratings after 0 rating drop: ", ratings.shape[0])

No of Rows in Ratings before 0 rating drop:  1031130
No of Rows in Ratings after 0 rating drop:  383839


### Find out which rating has been given the highest number of times

In [129]:
#ratings.groupby('bookRating')['bookRating'].count()
print ("Rating given highest number of times: ", ratings.groupby('bookRating')['bookRating'].count().idxmax())

Rating given highest number of times:  8
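
The same question can be answered with `value_counts`, which also shows the full distribution at a glance. A minimal sketch on made-up ratings:

```python
import pandas as pd

# Hypothetical explicit ratings.
toy_book_ratings = pd.Series([8, 10, 8, 7, 8, 10])

# Count how often each rating occurs, then pick the most frequent one.
counts = toy_book_ratings.value_counts()
most_common = counts.idxmax()

print(counts)
print("Most frequent rating:", most_common)
```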


### **Collaborative Filtering Based Recommendation Systems**

### For more accurate results, only consider users who have rated at least 100 books

In [130]:
# Group the ratings by userID and get the count of each user ratings.
UserRatingCount = ratings.groupby('userID')['userID'].count()

# Convert to dataframe with userID and ratings counts for easy explorations.
UserRatingCount_df = pd.DataFrame({'userID':UserRatingCount.index, 'Count':UserRatingCount.values})

In [131]:
UserRatingCount_df.head(5)

Unnamed: 0,userID,Count
0,8,7
1,9,1
2,12,1
3,14,3
4,16,1


In [132]:
print("Users count before dropping users who have rated less than 100 books: ", UserRatingCount_df.shape[0])
UserRatingCount_df.drop(UserRatingCount_df[UserRatingCount_df.Count < 100].index, inplace=True)
print("Users count after dropping users who have rated less than 100 books: ", UserRatingCount_df.shape[0])

Users count before dropping users who have rated less than 100 books:  68091
Users count after dropping users who have rated less than 100 books:  449


In [133]:
print ("No of Rows in Ratings before dropping users who have rated less than 100 books: ", ratings.shape[0])
ratings = ratings.query("userID in @UserRatingCount_df.userID")
print ("No of Rows in Ratings after dropping users who have rated less than 100 books: ", ratings.shape[0])

No of Rows in Ratings before dropping users who have rated less than 100 books:  383839
No of Rows in Ratings after dropping users who have rated less than 100 books:  103269
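
The count-then-filter steps above can be collapsed into a single mask with `groupby(...).transform`. The sketch below uses hypothetical data and a threshold of 2 instead of 100 so that the effect is visible on six rows.

```python
import pandas as pd

# Hypothetical ratings: user 1 has three ratings, user 2 one, user 3 two.
toy = pd.DataFrame({
    "userID": [1, 1, 1, 2, 3, 3],
    "ISBN": list("abcdef"),
    "bookRating": [5, 6, 7, 8, 9, 10],
})

min_ratings = 2  # the notebook uses 100

# transform("size") broadcasts each user's rating count back onto every row.
per_row_count = toy.groupby("userID")["bookRating"].transform("size")
active = toy[per_row_count >= min_ratings]

print(active)
```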


### Generating ratings matrix from explicit ratings


#### Note: since NaNs cannot be handled by training algorithms, replace these by 0, which indicates absence of ratings

In [134]:
# We want the ratings matrix to have one row per user and one column per book.
# We can pivot ratings to get that and call the new variable R_df.
R_df = ratings.pivot(index = 'userID', columns ='ISBN', values = 'bookRating').fillna(0)
R_df.tail()

userID,0000913154,0001046438,000104687X,0001047213,0001047973,000104799X,0001048082,0001053736,0001053744,0001055607,...,B000092Q0A,B00009EF82,B00009NDAN,B0000DYXID,B0000T6KHI,B0000VZEJQ,B0000X8HIE,B00013AX9E,B0001I1KOG,B000234N3A
274061,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
274301,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
275970,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
277427,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
278418,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
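
To make the pivot step concrete, the sketch below builds a user-by-book matrix from four made-up ratings and measures how sparse it is before the missing entries are filled with 0. The frame `toy` is hypothetical.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, book) pair.
toy = pd.DataFrame({
    "userID": [1, 1, 2, 3],
    "ISBN": ["a", "b", "b", "c"],
    "bookRating": [8, 10, 6, 9],
})

# One row per user, one column per book; unrated pairs are NaN.
matrix = toy.pivot(index="userID", columns="ISBN", values="bookRating")

# Fraction of user/book pairs without a rating.
sparsity = matrix.isna().to_numpy().mean()

# Replace missing ratings with 0, as done in the notebook.
filled = matrix.fillna(0)

print(filled)
print("Sparsity:", round(sparsity, 3))
```

Keep in mind that after `fillna(0)` the factorization treats "not rated" as a rating of 0, which pulls predicted scores toward zero. Also, `pivot` raises an error if a user has rated the same ISBN more than once, so duplicates would need to be aggregated first.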


### Generate the predicted ratings using SVD with 50 singular values

In [135]:
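from scipy.sparse.linalg import svds
# Pass the underlying NumPy array; newer SciPy versions may not accept a DataFrame directly.
U, sigma, Vt = svds(R_df.values, k = 50)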

In [136]:
sigma = np.diag(sigma)

In [137]:
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt) 
preds_df = pd.DataFrame(all_user_predicted_ratings, columns = R_df.columns, index = R_df.index)

In [138]:
preds_df.head()

userID,0000913154,0001046438,000104687X,0001047213,0001047973,000104799X,0001048082,0001053736,0001053744,0001055607,...,B000092Q0A,B00009EF82,B00009NDAN,B0000DYXID,B0000T6KHI,B0000VZEJQ,B0000X8HIE,B00013AX9E,B0001I1KOG,B000234N3A
2033,0.025341,-0.002146,-0.001431,-0.002146,-0.002146,0.002971,-0.00392,0.007035,0.007035,0.012316,...,0.00018,0.000226,0.042081,-0.016804,-0.080028,0.004746,0.028314,0.00012,-0.001693,0.067503
2110,-0.010012,-0.003669,-0.002446,-0.003669,-0.003669,0.001075,0.00144,-0.0035,-0.0035,0.001612,...,-0.000363,0.000403,0.008142,0.001104,-0.029224,0.000999,0.002363,-0.000242,2.9e-05,-0.013059
2276,-0.015054,-0.015457,-0.010304,-0.015457,-0.015457,0.007281,-0.014033,0.011941,0.011941,0.011796,...,-0.000455,0.001907,0.047982,0.005737,0.117859,0.006945,0.003119,-0.000304,0.009009,-0.057692
4017,-0.021499,0.035602,0.023735,0.035602,0.035602,0.030307,0.024215,-0.001053,-0.001053,0.067579,...,0.002971,0.009912,0.086248,-0.008818,0.016154,0.028848,-0.000125,0.001981,0.031201,-0.046664
4385,0.002077,-0.007965,-0.00531,-0.007965,-0.007965,0.002947,0.003057,0.000231,0.000231,0.00608,...,0.00212,0.001597,-0.012181,0.00942,0.673459,0.002591,-0.008229,0.001413,0.004918,0.047773
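
The prediction step above is a rank-k reconstruction of the ratings matrix. The sketch below shows the same idea on a 4x4 made-up matrix, using `np.linalg.svd` (a full SVD that is then truncated) in place of `scipy.sparse.linalg.svds`, and k = 2 in place of 50. The matrix `R` is hypothetical.

```python
import numpy as np

# Hypothetical ratings matrix with two obvious taste groups.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

k = 2  # number of singular values to keep

# Full SVD; singular values come back in descending order.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Rank-k approximation: keep only the k largest singular triplets.
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of what was discarded.
err = np.linalg.norm(R - R_k)

print(np.round(R_k, 2))
print("Reconstruction error:", round(err, 4))
```

The reconstruction error equals the square root of the sum of the squared discarded singular values, so a larger k reproduces the original matrix more closely, while a smaller k smooths more aggressively.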


### Take a particular user_id

### Let's find the recommendations for the user with id `2110`

In [139]:
# This function returns the books with the highest predicted ratings that the specified user hasn’t already rated.
def recommend_books(predictions_df, userID, books_df, original_ratings_df, num_recommendations=5):
    
    # Take the user's row from the predictions matrix and sort it in descending order
    user_row_number = userID
    sorted_user_predictions = predictions_df.loc[user_row_number].sort_values(ascending=False)
    
    # Get the user's ratings and merge in the book information (title, author, year, publisher).
    user_data = original_ratings_df[original_ratings_df.userID == (userID)]
    user_full = (user_data.merge(books_df, how = 'left', left_on = 'ISBN', right_on = 'ISBN').
                     sort_values(['bookRating'], ascending=False)
                 )

    print ('User {0} has already rated {1} books.'.format(userID, user_full.shape[0]))
    print ('Recommending the {0} books with the highest predicted ratings that are not already rated.'.format(num_recommendations))
    
    # Recommend the books with the highest predicted ratings that the user hasn't rated yet.
    # The final iloc drops the trailing 'Predictions' column from the result.
    recommendations = (books_df[~books_df['ISBN'].isin(user_full['ISBN'])].
         merge(pd.DataFrame(sorted_user_predictions).reset_index(), how = 'left',
               left_on = 'ISBN',
               right_on = 'ISBN').
         rename(columns = {user_row_number: 'Predictions'}).
         sort_values('Predictions', ascending = False).
                       iloc[:num_recommendations, :-1]
                      )

    # Note: user_full is returned twice (first and last position).
    return user_full, recommendations, sorted_user_predictions, user_data, user_full

In [140]:
# Calling recommend_books returns the predictions and recommendations for a particular user.
already_rated, predictions, sorted_user_predictions, user_data, user_full = recommend_books(preds_df, 2110, books, ratings, 10)

User 2110 has already rated 103 books.
Recommending the 10 books with the highest predicted ratings that are not already rated.


### Note:
### The function recommend_books above can be called to get the predictions and details for any user.
### Instead of using the function, let's go through the same steps one by one.

### Get the predicted ratings for userID `2110` and sort them in descending order

In [141]:
# Instead of calling the function lets do it one by one...

# Get and sort the user's predictions
sorted_user_predictions = preds_df.loc[2110].sort_values(ascending=False)
sorted_user_predictions.head(10)

ISBN
059035342X    0.682444
0345370775    0.368946
0345384911    0.333624
043935806X    0.333209
044021145X    0.329336
0451151259    0.313295
0439139597    0.305088
0439064872    0.290587
0380759497    0.278563
0345353145    0.250941
Name: 2110, dtype: float64

### Create a dataframe named `user_data` containing the books that userID `2110` has explicitly rated

In [142]:
# Get the user's data from ratings dataframe
user_data = ratings[ratings.userID == (2110)]
user_data.head(10)

Unnamed: 0,userID,ISBN,bookRating
14448,2110,0060987529,7
14449,2110,0064472779,8
14450,2110,0140022651,10
14452,2110,0142302163,8
14453,2110,0151008116,5
14455,2110,015216250X,8
14457,2110,0345260627,10
14458,2110,0345283554,10
14459,2110,0345283929,10
14460,2110,034528710X,10


### Combine `user_data` and the corresponding book data (from `books`) in a single dataframe named `user_full_info`

In [143]:
user_full_info = (user_data.merge(books, how = 'left', left_on = 'ISBN', right_on = 'ISBN').
                     sort_values(['bookRating'], ascending=False))
user_full_info.head()

Unnamed: 0,userID,ISBN,bookRating,bookTitle,bookAuthor,yearOfPublication,publisher
76,2110,067166865X,10,STAR TREK YESTERDAY'S SON (Star Trek: The Orig...,A.C. Crispin,1988,Audioworks
52,2110,0590109715,10,"The Andalite Chronicles (Elfangor's Journey, A...",Katherine Applegate,1997,Apple
64,2110,0590629786,10,"The Visitor (Animorphs, No 2)",K. A. Applegate,1996,Scholastic
63,2110,0590629778,10,"The Invasion (Animorphs, No 1)",K. A. Applegate,1996,Scholastic
61,2110,059046678X,10,The Yearbook,Peter Lerangis,1994,Scholastic


### Get top 10 recommendations for above given userID from the books not already rated by that user

In [144]:
# Recommend the books with the highest predicted ratings that the user hasn't rated yet.
recommendations = (books[~books['ISBN'].isin(user_full_info['ISBN'])].
     merge(pd.DataFrame(sorted_user_predictions).reset_index(), how = 'left',
           left_on = 'ISBN',
           right_on = 'ISBN').
     rename(columns = {2110: 'Predictions'}).
     sort_values('Predictions', ascending = False).
                   iloc[:10, :-1]
                  )
recommendations

Unnamed: 0,ISBN,bookTitle,bookAuthor,yearOfPublication,publisher
1192,0345370775,Jurassic Park,Michael Crichton,1999,Ballantine Books
6184,0345384911,Crystal Line,Anne McCaffrey,1993,Del Rey Books
5458,043935806X,Harry Potter and the Order of the Phoenix (Boo...,J. K. Rowling,2003,Scholastic
455,044021145X,The Firm,John Grisham,1992,Bantam Dell Publishing Group
2031,0451151259,Eyes of the Dragon,Stephen King,1988,Penguin Putnam~mass
5383,0439139597,Harry Potter and the Goblet of Fire (Book 4),J. K. Rowling,2000,Scholastic
3413,0439064872,Harry Potter and the Chamber of Secrets (Book 2),J. K. Rowling,2000,Scholastic
976,0380759497,Xanth 15: The Color of Her Panties,Piers Anthony,1992,Eos
2435,0345353145,Sphere,MICHAEL CRICHTON,1988,Ballantine Books
6048,0451167317,The Dark Half,Stephen King,1994,Signet Book
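
The recommendation step boils down to three operations: drop the books the user has already rated, sort the remaining predictions, and look up the book details for the top N. The sketch below shows this on hypothetical data; `top_n`, `toy_preds`, `toy_rated`, and `toy_catalog` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hypothetical predicted scores for one user, indexed by ISBN.
toy_preds = pd.Series({"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1})
# Hypothetical set of ISBNs the user has already rated.
toy_rated = {"a", "c"}
# Hypothetical book catalog.
toy_catalog = pd.DataFrame({
    "ISBN": ["a", "b", "c", "d"],
    "bookTitle": ["A", "B", "C", "D"],
})

def top_n(preds, already_rated, catalog, n=2):
    """Return the n highest-scored books the user has not rated yet."""
    # Remove already-rated books; ignore ISBNs that have no prediction.
    candidates = preds.drop(labels=list(already_rated), errors="ignore")
    top = candidates.sort_values(ascending=False).head(n)
    # Turn the Series into a frame with ISBN and Predictions columns.
    top = top.rename("Predictions").rename_axis("ISBN").reset_index()
    # Attach book details; a left merge keeps the ranking order.
    return top.merge(catalog, on="ISBN", how="left")

result = top_n(toy_preds, toy_rated, toy_catalog)
print(result)
```

Filtering the predictions before merging, rather than merging the whole catalog first as the notebook does, keeps the intermediate frame small. Which is preferable depends on whether books without any prediction should appear in the result.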


## Note:
### For any other user, we can simply call the function recommend_books to get that user's predictions and recommendations.