# Book Recommender System Prototype Using Surprise

A project for my Codecademy Certified Data Scientist: Machine Learning Specialist professional certification.

Robert Hall

01/07/2025


### 1. Data Importation and Basic Exploration

In [2]:
import pandas as pd
book_ratings = pd.read_csv('goodreads_ratings.csv')
book_ratings.head()

Unnamed: 0,user_id,book_id,review_id,rating,review_text,date_added,date_updated,read_at,started_at,n_votes,n_comments
0,d089c9b670c0b0b339353aebbace46a1,7686667,3337e0e75701f7f682de11638ccdc60c,3,"Like Matched, this book felt like it was echoi...",Fri Apr 29 14:45:32 -0700 2011,Mon Feb 02 12:57:57 -0800 2015,Sat Jun 18 00:00:00 -0700 2011,Thu May 19 00:00:00 -0700 2011,0,0
1,6dcb2c16e12a41ae0c6c38e9d46f3292,18073066,7201aa3c1161f2bad81258b6d4686c16,5,"WOW again! 4,5 Stars \n So i wont forget to me...",Thu Aug 01 02:15:18 -0700 2013,Mon Nov 18 14:49:26 -0800 2013,Mon Aug 19 00:00:00 -0700 2013,Mon Aug 12 00:00:00 -0700 2013,16,14
2,244e0ce681148a7586d7746676093ce9,13610986,07a203f87bfe1b65ff58774667f6f80d,5,The second novel was hot & heavy. Not only in ...,Sun Nov 23 18:17:50 -0800 2014,Sat May 16 20:34:19 -0700 2015,Fri Dec 19 00:00:00 -0800 2014,Sun Nov 23 00:00:00 -0800 2014,0,0
3,73fcc25ff29f8b73b3a7578aec846394,27274343,8be2d87b07098c16f9742020ec459383,1,What a maddening waste of time. And I unfortun...,Mon Oct 31 08:29:06 -0700 2016,Wed Apr 26 16:06:28 -0700 2017,Wed Apr 26 16:06:28 -0700 2017,Sun Apr 23 00:00:00 -0700 2017,0,1
4,f8880e158a163388a990b64fec7df300,11614718,a29c4ba03e33ad073a414ac775266c5f,4,4.5 stars! \n This was an awesome read! \n So ...,Tue Mar 26 10:55:30 -0700 2013,Mon Sep 08 09:57:05 -0700 2014,Sun Apr 20 09:26:41 -0700 2014,Fri Apr 18 00:00:00 -0700 2014,0,0


In [3]:
#1. Print the dataset size as (rows, columns)
book_ratings.shape

(3500, 11)

In [4]:
#2. Distribution of ratings
book_ratings['rating'].value_counts()

rating
4    1278
5    1001
3     707
2     269
1     125
0     120
Name: count, dtype: int64
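The counts above can be converted to shares to see how much of the data the 0 ratings represent. This is a minimal sketch that uses only the counts printed above; the variable names are illustrative and not part of the original notebook.

```python
# Rating counts copied from the value_counts() output above.
rating_counts = {4: 1278, 5: 1001, 3: 707, 2: 269, 1: 125, 0: 120}

total = sum(rating_counts.values())  # should match the 3500 rows reported by .shape

# Share of each rating value, as a percentage of all rows.
shares = {rating: 100 * count / total for rating, count in rating_counts.items()}

for rating in sorted(shares, reverse=True):
    print(f"rating {rating}: {rating_counts[rating]:>5} rows ({shares[rating]:.1f}%)")

# Rows that would remain if the 0 ratings were dropped.
in_range_rows = total - rating_counts[0]
print("rows with a rating in 1-5:", in_range_rows)
```

The distribution is skewed toward 4 and 5, which is worth keeping in mind when interpreting error metrics later.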

### 2. Data Preparation for Surprise Recommender System

In [5]:
#3. Filter out ratings of 0, which fall outside the 1-5 rating scale
book_ratings = book_ratings[book_ratings['rating'] != 0]
book_ratings['rating'].value_counts()

rating
4    1278
5    1001
3     707
2     269
1     125
Name: count, dtype: int64
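An equivalent way to write the filter is to state the valid range explicitly and take a copy of the result. The sketch below runs on a small made-up frame (`toy_ratings` is hypothetical, not the Goodreads data) and is only one possible formulation.

```python
import pandas as pd

# Hypothetical toy data standing in for the ratings file.
toy_ratings = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "book_id": [10, 11, 12, 13],
    "rating": [0, 3, 5, 1],
})

# Keep only ratings inside the 1-5 scale; between() is inclusive on both ends.
# .copy() makes the result an independent DataFrame, which avoids
# SettingWithCopyWarning if columns are modified later.
filtered = toy_ratings[toy_ratings["rating"].between(1, 5)].copy()

print(filtered["rating"].value_counts().sort_index())
```

Stating the range also guards against any out-of-range value other than 0, should one ever appear in the data.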

### 3. Create a Surprise Reader and Load the Three Necessary Columns

In [6]:
#4. Prepare data for Surprise: build a Surprise Reader object
from surprise import Reader
reader = Reader(rating_scale=(1, 5))

In [7]:
#5. Load `book_ratings` into a Surprise Dataset (columns must be in user, item, rating order)
from surprise import Dataset
data = Dataset.load_from_df(book_ratings[['user_id','book_id','rating']], reader)

### 4. Split Dataset into Train and Test Sets

In [8]:
#6. Create an 80:20 train-test split and set the random state to 7
from surprise.model_selection import train_test_split
trainset, testset = train_test_split(data, test_size=.2, random_state=7)
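To make the 80:20 split concrete, here is a pure-Python illustration of a seeded shuffle split. It is not Surprise's implementation and will not reproduce the same assignment of rows; `shuffle_split` is a hypothetical helper written for this sketch.

```python
import random

def shuffle_split(rows, test_size=0.2, seed=7):
    """Shuffle a copy of `rows` and cut it into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed makes the split repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(test_size * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]

# 3380 is the number of ratings left after dropping the 0 ratings.
rows = list(range(3380))
train, test = shuffle_split(rows)
print(len(train), len(test))
```

With 3,380 ratings, a 20% test share corresponds to 676 test ratings and 2,704 training ratings.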

### 5. Build and Train a K-Nearest-Neighbors Collaborative Filter

In [9]:
#7. Use KNNBasic from Surprise to train a collaborative filter
from surprise import KNNBasic
nn_algo = KNNBasic()
nn_algo.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x11fb4a5a0>
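The log line above mentions the "msd" similarity. As I understand the Surprise documentation, this is based on the mean squared difference between two users' ratings on the items they have both rated, turned into a similarity as 1 / (msd + 1), and the basic KNN prediction is a similarity-weighted average over the nearest neighbors. The sketch below illustrates that idea on made-up data in pure Python; the function names and toy users are hypothetical, and details such as minimum neighbor counts are left out.

```python
def msd_similarity(ratings_a, ratings_b):
    """Similarity between two users from mean squared difference on co-rated items."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # no shared items, so no evidence of similarity
    msd = sum((ratings_a[i] - ratings_b[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)

def knn_predict(target, others, item, k=40):
    """Similarity-weighted average of the k most similar users' ratings for `item`."""
    neighbors = [
        (msd_similarity(target, other), other[item])
        for other in others
        if item in other
    ]
    neighbors = sorted(neighbors, reverse=True)[:k]
    weight = sum(sim for sim, _ in neighbors)
    if weight == 0:
        return None  # no usable neighbors; a real library falls back to a default
    return sum(sim * rating for sim, rating in neighbors) / weight

# Hypothetical toy users: {book: rating}
alice = {"A": 5, "B": 4}
bob = {"A": 5, "B": 4, "C": 5}    # identical to alice on shared books
carol = {"A": 1, "B": 2, "C": 1}  # very different from alice

print(msd_similarity(alice, bob))
print(msd_similarity(alice, carol))
print(knn_predict(alice, [bob, carol], "C"))
```

Because bob agrees with alice on every shared book, his rating of "C" dominates the weighted average, while carol's contributes very little.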

### 6. Evaluate Model Accuracy By Its Root Mean Squared Error (RMSE)

In [10]:
#8. Evaluate the recommender system
from surprise import accuracy
predictions = nn_algo.test(testset)
accuracy.rmse(predictions)

RMSE: 1.1105


1.110471008157185
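For reference, RMSE is the square root of the mean squared difference between true and estimated ratings. A minimal sketch on made-up pairs follows; the `rmse` helper and `toy_pairs` are illustrative and are not the notebook's test set.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, estimated) ratings, not taken from the notebook's test set.
toy_pairs = [(5, 4.0), (3, 4.0), (4, 4.0), (1, 3.0)]
print(rmse(toy_pairs))
```

On a 1-5 scale, an RMSE of about 1.11 means the estimates are off by roughly one star in the root-mean-square sense, with large misses penalized more heavily than small ones.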

### 7. Predict the Rating for "The Martian" by a User Who Gave "The Three-Body Problem" a Rating of 5

In [11]:
#9. Predict the rating for a user who gave "The Three-Body Problem" a rating of 5
print(nn_algo.predict('8842281e1d1347389f2ab93d60773d4d', '18007564').est)

3.8250739644970415
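The printed estimate deserves a sanity check. With 3,380 in-range ratings and `test_size=.2`, the training set would hold 2,704 ratings, and the estimate multiplied by 2,704 is almost exactly a whole number. That is what a plain mean of integer ratings over the training set would look like, which suggests (but does not prove) that the prediction fell back to a default rather than using neighbors. In Surprise this can happen when the user or item is unknown to the training set; one possible cause, stated here as an assumption, is that `book_id` values read by pandas are integers while the call passes the string `'18007564'`. The sketch below only checks the arithmetic.

```python
# Values taken from the notebook: 3380 ratings remain after filtering,
# and 20% of them go to the test set.
n_ratings = 3380
n_train = n_ratings - round(0.2 * n_ratings)

estimate = 3.8250739644970415  # value printed by nn_algo.predict(...).est

# If the estimate were a plain mean of integer ratings over the training set,
# estimate * n_train would be (very nearly) a whole number.
implied_sum = estimate * n_train
print(n_train, implied_sum)
print("looks like a training-set mean:", abs(implied_sum - round(implied_sum)) < 1e-6)

# In the notebook itself, the fallback can be checked directly (not run here;
# passing the book id as an integer is an assumption about how ids were loaded):
#   pred = nn_algo.predict('8842281e1d1347389f2ab93d60773d4d', 18007564)
#   print(pred.details['was_impossible'], pred.est)
```

If `was_impossible` is `True` for a given call, the returned estimate does not reflect that user's neighbors at all and should not be read as a personalized prediction.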
