# Introduction to Recommendation systems

Today there are so many products and services, and so much information about each one, that users and consumers can easily be overwhelmed. Recommendation systems were developed to help them choose. A recommender system measures the similarity between users and items and exploits that similarity to make recommendations.
The main benefits of a recommender system are as follows:
1. It makes it easier for the user/consumer to choose a product or service, which leads to a better user experience.
2. It boosts user interaction with the system or leads to higher sales. For example, Google News gets 40% more clicks due to recommendations.
3. It better matches products and services to user/consumer needs. At Amazon, 35% of products are sold due to recommendations.
4. It creates a more personalized experience for the user/consumer. At Netflix, for example, most of the rented movies come from recommendations.


# Types of Recommendation Systems
There are six main types of recommendation systems:
1. Popularity-based systems: recommend the items that most people have viewed or purchased and that are rated highly. The recommendations are not personalized.
2. Classification-model-based systems: use the features of the user and a classification algorithm to decide whether the user is interested in a product.
3. Content-based recommendations: rely on information about the content of the item rather than on user opinions. The main idea is that if a user likes an item, he or she will also like other, similar items.
4. Collaborative filtering: assumes that people like things similar to other things they like, and things that are liked by other people with similar taste. There are two main types: a) user-user and b) item-item.
5. Hybrid approaches: combine collaborative filtering, content-based filtering, and other approaches.
6. Association rule mining: association rules capture the relationships between items based on their patterns of co-occurrence across transactions.
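
To make the last item more concrete, the sketch below computes the two basic quantities behind an association rule, support and confidence, on a handful of made-up transactions. The data and the helper names `support` and `confidence` are assumptions for illustration and are not part of this project's pipeline.

```python
# Made-up shopping baskets, for illustration only.
transactions = [
    {"camera", "sd_card"},
    {"camera", "sd_card", "tripod"},
    {"camera", "tripod"},
    {"headphones"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) from co-occurrence counts."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

# Rule "camera -> sd_card": how often both appear, and how often
# sd_card appears given that camera is in the basket.
rule_support = support({"camera", "sd_card"})
rule_confidence = confidence({"camera"}, {"sd_card"})
print(rule_support, rule_confidence)
```

A rule with high support and high confidence suggests recommending the consequent item to customers who bought the antecedent.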

# About this Project
In this project I present ways to build a recommended list of products for a user, similar to what Amazon does, based on the review ratings of items. Given the item ratings, the goal is to create a list of the top 10 products to recommend to a given customer, based on the purchase history of other customers on the website. Amazon currently uses item-item collaborative filtering, which scales to massive datasets and produces high-quality recommendations in real time. Such a system is a kind of information filtering system that predicts the "rating" or preference a user would give to an item.
The expected output is a deployable model that can generate a list of recommended items for a given user, based on that user's purchase history as well as the ratings of similar items in reviews. Up to 10 items are recommended; items the customer has already purchased are removed from the list, so each customer receives at most 10 and at least 3 recommendations. There are many possible ways to produce such a list.




![recomendation example](https://miro.medium.com/max/6612/1*U8GGHEwDHzsCjidsHQImSQ.png)

Since recommender systems can essentially be viewed as finding the K nearest neighbours of a specific item, this project uses KNNWithMeans from the Surprise library. Four different ways of creating the recommended list are used, described below, and benchmarked against the great work done by Saurav Anand: https://www.kaggle.com/saurav9786/recommender-system-using-amazon-reviews/notebook

* First: A basic collaborative filtering algorithm (KNNBasic model).
* Second: A basic collaborative filtering algorithm, taking into account the mean ratings of each user (KNNWithMeans, the benchmark model).
* Third: A basic collaborative filtering algorithm, taking into account the z-score normalization of each user (KNNWithZScore model).
* Fourth: A basic collaborative filtering algorithm, taking into account a baseline rating (KNNBaseline model).

Learn more in **k-NN inspired algorithms**  https://surprise.readthedocs.io/en/stable/knn_inspired.html 
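
The "with means" idea can be sketched in a few lines of plain Python. The example below is a minimal item-based illustration under stated assumptions: the ratings are made up, the helper names are hypothetical, and plain cosine similarity is used in place of the `pearson_baseline` measure configured later in this notebook. It is not the Surprise implementation.

```python
import math

# Toy ratings: user -> {item: rating}. Made-up data for illustration.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "C": 1},  # has not rated B
}

def item_vector(item):
    """Ratings given to `item`, keyed by user."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def item_mean(item):
    vec = item_vector(item)
    return sum(vec.values()) / len(vec)

def cosine_sim(item_a, item_b):
    """Cosine similarity over users who rated both items (assumed measure)."""
    va, vb = item_vector(item_a), item_vector(item_b)
    common = set(va) & set(vb)
    if not common:
        return 0.0
    dot = sum(va[u] * vb[u] for u in common)
    norm_a = math.sqrt(sum(va[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(vb[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def predict_with_means(user, item, k=2):
    """Item mean plus the similarity-weighted average of the user's
    mean-centered ratings on the k most similar items they rated."""
    neighbors = [(cosine_sim(item, j), j) for j in ratings[user] if j != item]
    neighbors = sorted(neighbors, reverse=True)[:k]
    num = sum(s * (ratings[user][j] - item_mean(j)) for s, j in neighbors if s > 0)
    den = sum(s for s, j in neighbors if s > 0)
    if den == 0:
        return item_mean(item)  # no usable neighbors: fall back to the item mean
    return item_mean(item) + num / den

estimate = predict_with_means("u4", "B")
print(round(estimate, 3))
```

KNNBasic would drop the mean-centering, KNNWithZScore would additionally divide each centered rating by a standard deviation, and KNNBaseline would replace the means with baseline estimates.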


# Benchmark Model
The model used for benchmarking is as follows:
algo = KNNWithMeans(k=5, sim_options={'name': 'pearson_baseline', 'user_based': False})
The benchmark model uses item-based filtering to find the recommended items.
However, I would like to explore more options by considering three other types of algorithms available in the Surprise library. For each method I use RMSE as the accuracy metric and compare its predictions against the original benchmark model.
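
As a small illustration of the metric, RMSE is the square root of the mean squared difference between predicted and true ratings. The sketch below computes it in plain Python on made-up ratings; `rmse` here is a hypothetical helper written for this example, not the `surprise.accuracy.rmse` function used later.

```python
import math

def rmse(true_ratings, predicted_ratings):
    """Root mean squared error between two equal-length rating lists."""
    if len(true_ratings) != len(predicted_ratings) or not true_ratings:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(true_ratings, predicted_ratings)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up ratings on a 1-5 scale, for illustration only.
true_ratings = [5.0, 3.0, 4.0, 1.0]
predicted_ratings = [4.0, 3.0, 5.0, 3.0]

# Squared errors are 1, 0, 1 and 4, so the mean squared error is 1.5.
example_rmse = rmse(true_ratings, predicted_ratings)
print(round(example_rmse, 4))
```

Because the errors are squared before averaging, a single prediction that is off by 2 stars contributes as much as four predictions that are off by 1 star. A lower RMSE is better.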



# Attribute Information:

* userId : Every user identified with a unique id 
* productId : Every product identified with a unique id 
* Rating : Rating of the corresponding product by the corresponding user 
* timestamp : Time of the rating ( ignore this column for this exercise)



# Import Libraries 

In [None]:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
import math
import json
import time
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
import scipy.sparse
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
import warnings; warnings.simplefilter('ignore')
%matplotlib inline

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

# Load the Dataset and Add headers

In [None]:
electronics_data=pd.read_csv("/kaggle/input/amazon-product-reviews/ratings_Electronics (1).csv",names=['userId', 'productId','Rating','timestamp'])


In [None]:
# Display the data

electronics_data.head()


In [None]:

#Shape of the data
electronics_data.shape

In [None]:
#Taking subset of the dataset
electronics_data=electronics_data.iloc[:1048576,0:]

In [None]:
#Check the datatypes
electronics_data.dtypes

In [None]:
electronics_data.info()


In [None]:
#Five point summary 

electronics_data.describe()['Rating'].T


In [None]:
#Find the minimum and maximum ratings
print('Minimum rating is: %d' %(electronics_data.Rating.min()))
print('Maximum rating is: %d' %(electronics_data.Rating.max()))

The product ratings range from 1 to 5.

## Handling Missing values


In [None]:
#Check for missing values
print('Number of missing values across columns: \n',electronics_data.isnull().sum())



## Ratings

In [None]:
# Check the distribution of the rating
with sns.axes_style('white'):
    g = sns.catplot(x="Rating", data=electronics_data, aspect=2.0, kind='count')  # factorplot was renamed to catplot in seaborn 0.9
    g.set_ylabels("Total number of ratings")

Most users have given a rating of 5.

## Unique Users and products


In [None]:
print("Total data ")
print("-"*50)
print("\nTotal no of ratings :",electronics_data.shape[0])
print("Total No of Users   :", len(np.unique(electronics_data.userId)))
print("Total No of products  :", len(np.unique(electronics_data.productId)))

## Dropping the TimeStamp Column

In [None]:
#Dropping the Timestamp column

electronics_data.drop(['timestamp'], axis=1,inplace=True)

# Analyzing the rating

In [None]:
#Analysis of rating given by the user 

no_of_rated_products_per_user = electronics_data.groupby(by='userId')['Rating'].count().sort_values(ascending=False)

no_of_rated_products_per_user.head()

In [None]:
no_of_rated_products_per_user.describe()


In [None]:
quantiles = no_of_rated_products_per_user.quantile(np.arange(0,1.01,0.01), interpolation='higher')


In [None]:
plt.figure(figsize=(10,10))
plt.title("Quantiles and their Values")
quantiles.plot()
# quantiles with 0.05 difference
plt.scatter(x=quantiles.index[::5], y=quantiles.values[::5], c='orange', label="quantiles with 0.05 intervals")
# quantiles with 0.25 difference
plt.scatter(x=quantiles.index[::25], y=quantiles.values[::25], c='m', label = "quantiles with 0.25 intervals")
plt.ylabel('No of ratings by user')
plt.xlabel('Value at the quantile')
plt.legend(loc='best')
plt.show()

In [None]:
print('\n Number of users who rated 50 or more products : {}\n'.format(sum(no_of_rated_products_per_user >= 50)) )


# Popularity Based Recommendation

A popularity-based recommendation system works with trends: it recommends the items that are currently most popular. For example, if a product is bought by almost every new user, the system is likely to suggest that product to a user who has just signed up.

The problem with a popularity-based recommendation system is that it offers no personalization: even if you know the user's behaviour, you cannot tailor the recommendations accordingly.
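
The idea can be sketched without any library: count the ratings per product, drop rarely rated products, and return the same top items for every user. The rows, the threshold of 2, and the helper name `most_popular` are assumptions for illustration (the cells below use a threshold of 50 on the real data).

```python
from collections import defaultdict

# Made-up (user, product, rating) rows for illustration.
rows = [
    ("u1", "p1", 5), ("u2", "p1", 4), ("u3", "p1", 5),
    ("u1", "p2", 5),
    ("u2", "p3", 3), ("u3", "p3", 4),
]

def most_popular(rows, min_ratings=2, top_n=2):
    """Rank products by rating count, then mean rating; ignore rarely rated ones."""
    by_product = defaultdict(list)
    for _, product, rating in rows:
        by_product[product].append(rating)
    stats = [
        (product, len(rs), sum(rs) / len(rs))
        for product, rs in by_product.items()
        if len(rs) >= min_ratings
    ]
    stats.sort(key=lambda s: (s[1], s[2]), reverse=True)
    return [product for product, _, _ in stats[:top_n]]

# The same list is returned whoever the user is: no personalization.
popular = most_popular(rows)
print(popular)
```

Note that `p2` has a perfect mean rating but is excluded because it has only one rating, which is the reason for filtering on a minimum count.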


In [None]:
#Getting the new dataframe which contains only products that have received 50 or more ratings

new_df=electronics_data.groupby("productId").filter(lambda x:x['Rating'].count() >=50)

In [None]:
no_of_ratings_per_product = new_df.groupby(by='productId')['Rating'].count().sort_values(ascending=False)

fig = plt.figure(figsize=plt.figaspect(.5))
ax = plt.gca()
plt.plot(no_of_ratings_per_product.values)
plt.title('# RATINGS per Product')
plt.xlabel('Product')
plt.ylabel('No of ratings per product')
ax.set_xticklabels([])

plt.show()

In [None]:
#Average rating of the product 

new_df.groupby('productId')['Rating'].mean().head()

In [None]:
new_df.groupby('productId')['Rating'].mean().sort_values(ascending=False).head()


In [None]:
#Total no of rating for product

new_df.groupby('productId')['Rating'].count().sort_values(ascending=False).head()

In [None]:
ratings_mean_count = pd.DataFrame(new_df.groupby('productId')['Rating'].mean())


In [None]:
ratings_mean_count['rating_counts'] = pd.DataFrame(new_df.groupby('productId')['Rating'].count())


In [None]:
ratings_mean_count.head()


In [None]:
ratings_mean_count['rating_counts'].max()


In [None]:
plt.figure(figsize=(8,6))
plt.rcParams['patch.force_edgecolor'] = True
ratings_mean_count['rating_counts'].hist(bins=50)

In [None]:
plt.figure(figsize=(8,6))
plt.rcParams['patch.force_edgecolor'] = True
ratings_mean_count['Rating'].hist(bins=50)

In [None]:
plt.figure(figsize=(8,6))
plt.rcParams['patch.force_edgecolor'] = True
sns.jointplot(x='Rating', y='rating_counts', data=ratings_mean_count, alpha=0.4)

In [None]:
popular_products = pd.DataFrame(new_df.groupby('productId')['Rating'].count())
most_popular = popular_products.sort_values('Rating', ascending=False)
most_popular.head(30).plot(kind = "bar")

# Collaborative filtering (Item-Item recommendation)

Collaborative filtering is commonly used for recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix. This project uses the collaborative filtering (CF) approach.
CF is based on the idea that the best recommendations come from people who have similar tastes. In other words, it uses historical item ratings of like-minded people to predict how someone would rate an item.
Collaborative filtering has two sub-categories that are generally called **memory based** and **model-based** approaches.
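
The "like-minded people" idea can be illustrated with a tiny user-based sketch: find the user most similar to the target and suggest what that user rated highly. The ratings, the cosine similarity choice, and the helper names are assumptions for illustration; the models trained below are item-based and use Surprise.

```python
import math

# Made-up ratings: user -> {item: rating}.
ratings = {
    "alice": {"camera": 5, "tripod": 4, "headphones": 1},
    "bob":   {"camera": 5, "tripod": 5, "sd_card": 5},
    "carol": {"headphones": 5, "speaker": 4, "camera": 1},
}

def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend_from_nearest_user(target, min_rating=4):
    """Suggest items the most similar user rated highly and the target has not rated."""
    others = [u for u in ratings if u != target]
    nearest = max(others, key=lambda u: cosine(target, u))
    suggestions = sorted(
        item for item, r in ratings[nearest].items()
        if r >= min_rating and item not in ratings[target]
    )
    return nearest, suggestions

nearest_user, suggestions = recommend_from_nearest_user("alice")
print(nearest_user, suggestions)
```

Item-based filtering applies the same reasoning with the roles swapped: similarities are computed between items, using the users who rated both.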

**Memory Based Models**

We use KNN implementations to find the recommendations:

* First: A basic collaborative filtering algorithm (KNNBasic model).
* Second: A basic collaborative filtering algorithm, taking into account the mean ratings of each user (KNNWithMeans, the benchmark model).
* Third: A basic collaborative filtering algorithm, taking into account the z-score normalization of each user (KNNWithZScore model).
* Fourth: A basic collaborative filtering algorithm, taking into account a baseline rating (KNNBaseline model).


**Model Based**

we use
* SVD
* SlopeOne



In [None]:
from surprise import SVD,  SlopeOne
from surprise import KNNBaseline, KNNBasic, KNNWithMeans, KNNWithZScore

from surprise import Dataset
from surprise import accuracy
from surprise import Reader
# Note: this import shadows the sklearn train_test_split imported earlier
from surprise.model_selection import train_test_split

In [None]:
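#Reading the dataset
reader = Reader(rating_scale=(1, 5))
# load_from_df expects exactly three columns, in the order user, item, rating
data = Dataset.load_from_df(new_df[['userId', 'productId', 'Rating']], reader)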

In [None]:
#Splitting the dataset
trainset, testset = train_test_split(data, test_size=0.3,random_state=10)

# Memory Based model
Using KNNWithMeans from the Surprise package, which we benchmark against other types of KNN algorithms.

In [None]:
# Defining bsl_options and sim_options for all methods , 
# more info https://surprise.readthedocs.io/en/stable/prediction_algorithms.html#similarity-measures-configuration
bsl_options = {'method': 'als', 'n_epochs': 5, 'reg_u': 12, 'reg_i': 5 }
sim_options={'name': 'pearson_baseline', 'user_based': False}

In [None]:
# Use user_based true/false to switch between user-based or item-based collaborative filtering

algo_KNNWithMeans = KNNWithMeans(k=5,sim_options = sim_options , bsl_options = bsl_options)
predictions_KNNWithMeans = algo_KNNWithMeans.fit(trainset).test(testset)
rmse_KNNWithMeans = accuracy.rmse(predictions_KNNWithMeans)

# A basic collaborative filtering algorithm.

using KNNBasic from surprise package

In [None]:
# Use user_based true/false to switch between user-based or item-based collaborative filtering
algo_KNNBasic = KNNBasic(k=5,sim_options = sim_options , bsl_options = bsl_options)
predictions_KNNBasic = algo_KNNBasic.fit(trainset).test(testset)
rmse_KNNBasic = accuracy.rmse(predictions_KNNBasic)

# A basic collaborative filtering algorithm, taking into account the z-score normalization of each user.

using KNNWithZScore from surprise package

In [None]:
# Use user_based true/false to switch between user-based or item-based collaborative filtering
algo_KNNWithZScore = KNNWithZScore(k=5,sim_options = sim_options , bsl_options = bsl_options)
predictions_KNNWithZScore = algo_KNNWithZScore.fit(trainset).test(testset)
rmse_KNNWithZScore = accuracy.rmse(predictions_KNNWithZScore)

# A basic collaborative filtering algorithm taking into account a baseline rating. 
using KNNBaseline from surprise package.

In [None]:
# Use user_based true/false to switch between user-based or item-based collaborative filtering
algo_KNNBaseline = KNNBaseline(k=5,sim_options = sim_options , bsl_options = bsl_options)
predictions_KNNBaseline = algo_KNNBaseline.fit(trainset).test(testset)
rmse_KNNBaseline = accuracy.rmse(predictions_KNNBaseline)

# Model-based collaborative filtering system

These methods are based on machine learning and data mining techniques. The goal is to train models to be able to make predictions. For example, we could use existing user-item interactions to train a model to predict the top-5 items that a user might like the most. One advantage of these methods is that they are able to recommend a larger number of items to a larger number of users, compared to other methods like memory based approach. They have large coverage, even when working with large sparse matrices.
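
To show what "training a model" means here, the sketch below fits a tiny matrix factorization with stochastic gradient descent in plain Python. It is a simplified illustration under stated assumptions: the ratings are made up, there are no user or item bias terms, and the learning rate, regularization, and epoch count are arbitrary. It is not the Surprise `SVD` implementation used in the next cell.

```python
import math
import random

# Made-up (user index, item index, rating) triples.
observed = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
n_users, n_items, n_factors = 3, 3, 2

rng = random.Random(0)  # fixed seed so the run is repeatable
P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
global_mean = sum(r for _, _, r in observed) / len(observed)

def predict(u, i):
    """Global mean plus the dot product of the user and item factor vectors."""
    return global_mean + sum(pf * qf for pf, qf in zip(P[u], Q[i]))

def train_rmse():
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in observed) / len(observed))

rmse_before = train_rmse()
lr, reg = 0.01, 0.02  # assumed learning rate and regularization strength
for _ in range(300):
    for u, i, r in observed:
        err = r - predict(u, i)
        for f in range(n_factors):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)
rmse_after = train_rmse()
print(round(rmse_before, 3), round(rmse_after, 3))
```

Once the factors are learned, `predict(u, i)` gives an estimate for any user-item pair, including pairs that were never observed, which is what gives model-based methods their coverage.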

In [None]:
# Matrix Factorization-based SVD
from surprise import SVD

algo_SVD = SVD()
predictions_SVD = algo_SVD.fit(trainset).test(testset)
rmse_SVD = accuracy.rmse(predictions_SVD)

In [None]:
# SlopeOne (predicts from average rating differences between item pairs; not a matrix factorization method)
from surprise import  SlopeOne

algo_SlopeOne = SlopeOne()
predictions_SlopeOne = algo_SlopeOne.fit(trainset).test(testset)
rmse_SlopeOne = accuracy.rmse(predictions_SlopeOne)

# Comparing RMSE for Models
Item-based (memory-based) models:
* First: A basic collaborative filtering algorithm (KNNBasic model).
* Second: A basic collaborative filtering algorithm, taking into account the mean ratings of each user (KNNWithMeans model).
* Third: A basic collaborative filtering algorithm, taking into account the z-score normalization of each user (KNNWithZScore model).
* Fourth: A basic collaborative filtering algorithm, taking into account a baseline rating (KNNBaseline model).

Model-based:
* SVD
* SlopeOne

In [None]:
print("Item-based Model : Test Set")
print("KNNBasic Model" , rmse_KNNBasic)
print("KNNWithMeans Model" , rmse_KNNWithMeans)
print("KNNWithZScore Model" , rmse_KNNWithZScore)
print("KNNBaseline Model" , rmse_KNNBaseline)
print("Model-based Model : Test Set")
print("SVD Model" , rmse_SVD)
print("SlopeOne Model" , rmse_SlopeOne)



# Prediction List for a Particular User
This section shows the list of recommendations for a chosen user. Use different user IDs to see the recommendations produced by each of the methods above:

* KNNBasic
* KNNWithMeans
* KNNWithZScore
* KNNBaseline
* SVD
* SlopeOne


**Difference between Item-based methods and Model-Based methods**

For item-based models, **algo.get_neighbors(product_inner_index, k=number_of_nearest_neighbors)** returns the nearest neighbors of a product; if the user has purchased that product, its neighbors are recommended to the user. For the model-based methods, **algo.predict(user_id, product_id)** can be used to build a list of recommended items.
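
For the model-based route, turning per-item predictions into a recommendation list amounts to filtering out purchased items and sorting by the estimated rating. The sketch below shows this on made-up data; the `Prediction` tuple is a hypothetical stand-in with only three fields, and `top_n_per_user` is an illustrative helper, not a Surprise function.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a prediction record: (user id, item id, estimated rating).
Prediction = namedtuple("Prediction", ["uid", "iid", "est"])

def top_n_per_user(predictions, already_purchased, n=10):
    """Return {uid: [iid, ...]} with the n highest-estimated unpurchased items."""
    by_user = defaultdict(list)
    for p in predictions:
        if p.iid not in already_purchased.get(p.uid, set()):
            by_user[p.uid].append(p)
    return {
        uid: [p.iid for p in sorted(preds, key=lambda p: p.est, reverse=True)[:n]]
        for uid, preds in by_user.items()
    }

# Made-up predictions for one user; p4 was already purchased.
predictions = [
    Prediction("u1", "p1", 4.8), Prediction("u1", "p2", 3.1),
    Prediction("u1", "p3", 4.2), Prediction("u1", "p4", 4.9),
]
top_items = top_n_per_user(predictions, {"u1": {"p4"}}, n=2)
print(top_items)
```

Even though `p4` has the highest estimate, it is skipped because the user already owns it, so the list starts with `p1`.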

In [None]:
#Creating list of all unique users and products

all_user_ids = list(new_df['userId'].unique())

all_user_ids[:10]

all_products = list(new_df['productId'].unique())

all_products[:10]

In [None]:
#the number of all unique users present in the dataset
len(all_user_ids)

In [None]:
#choose the index of a particular user to generate the item recommendation list
#choose any index between 0 and len(all_user_ids) - 1

user_index = 11

uid = all_user_ids[user_index]


In [None]:
#chosen user ID
print("User chosen to generate recommendation list is " + str(uid))

In [None]:
# method KNNBasic
# trainset.ur maps an inner user id to a list of (inner item id, rating) tuples
items_purchased = trainset.ur[trainset.to_inner_uid(uid)]
purchased_iids = {inner_iid for inner_iid, _ in items_purchased}

print("Chosen user has purchased the following items ")
for inner_iid in purchased_iids:
    print(algo_KNNBasic.trainset.to_raw_iid(inner_iid))

# getting the 15 nearest neighbors of the first item purchased by the chosen user
KNN_Product = algo_KNNBasic.get_neighbors(items_purchased[0][0], 15)

recommendation_list = []
for product_iid in KNN_Product:
    if product_iid not in purchased_iids:  # skip items the user has already purchased
        recommendation_list.append(algo_KNNBasic.trainset.to_raw_iid(product_iid))
print("Recommended items for user " + str(uid) + " by KNNBasic \n", recommendation_list)

In [None]:
# method KNNWithMeans
# trainset.ur maps an inner user id to a list of (inner item id, rating) tuples
items_purchased = trainset.ur[trainset.to_inner_uid(uid)]
purchased_iids = {inner_iid for inner_iid, _ in items_purchased}

print("Chosen user has purchased the following items ")
for inner_iid in purchased_iids:
    print(algo_KNNWithMeans.trainset.to_raw_iid(inner_iid))

# getting the 15 nearest neighbors of the first item purchased by the chosen user
KNN_Product = algo_KNNWithMeans.get_neighbors(items_purchased[0][0], 15)

recommendation_list = []
for product_iid in KNN_Product:
    if product_iid not in purchased_iids:  # skip items the user has already purchased
        recommendation_list.append(algo_KNNWithMeans.trainset.to_raw_iid(product_iid))
print("Recommended items for user " + str(uid) + " by KNNWithMeans \n", recommendation_list)

In [None]:
# method KNNWithZScore
# trainset.ur maps an inner user id to a list of (inner item id, rating) tuples
items_purchased = trainset.ur[trainset.to_inner_uid(uid)]
purchased_iids = {inner_iid for inner_iid, _ in items_purchased}

print("Chosen user has purchased the following items ")
for inner_iid in purchased_iids:
    print(algo_KNNWithZScore.trainset.to_raw_iid(inner_iid))

# getting the 15 nearest neighbors of the first item purchased by the chosen user
KNN_Product = algo_KNNWithZScore.get_neighbors(items_purchased[0][0], 15)

recommendation_list = []
for product_iid in KNN_Product:
    if product_iid not in purchased_iids:  # skip items the user has already purchased
        recommendation_list.append(algo_KNNWithZScore.trainset.to_raw_iid(product_iid))
print("Recommended items for user " + str(uid) + " by KNNWithZScore \n", recommendation_list)

In [None]:
# method KNNBaseline
# trainset.ur maps an inner user id to a list of (inner item id, rating) tuples
items_purchased = trainset.ur[trainset.to_inner_uid(uid)]
purchased_iids = {inner_iid for inner_iid, _ in items_purchased}

print("Chosen user has purchased the following items ")
for inner_iid in purchased_iids:
    print(algo_KNNBaseline.trainset.to_raw_iid(inner_iid))

# getting the 15 nearest neighbors of the first item purchased by the chosen user
KNN_Product = algo_KNNBaseline.get_neighbors(items_purchased[0][0], 15)

recommendation_list = []
for product_iid in KNN_Product:
    if product_iid not in purchased_iids:  # skip items the user has already purchased
        recommendation_list.append(algo_KNNBaseline.trainset.to_raw_iid(product_iid))
print("Recommended items for user " + str(uid) + " by KNNBaseline \n", recommendation_list)

Model Based Methods recommendation lists

In [None]:
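# method SVD
# trainset.ur maps an inner user id to a list of (inner item id, rating) tuples
items_purchased = trainset.ur[trainset.to_inner_uid(uid)]
purchased_raw_ids = {algo_SVD.trainset.to_raw_iid(inner_iid) for inner_iid, _ in items_purchased}

print("Chosen user has purchased the following items ")
for raw_iid in purchased_raw_ids:
    print(raw_iid)

# predict a rating for every product the user has not purchased yet
Recommended_list = []
for product_id in all_products:
    if product_id not in purchased_raw_ids:
        Recommended_list.append(algo_SVD.predict(uid, product_id))

# rank by estimated rating and keep the 15 best products
Recommended_list.sort(key=lambda pred: pred.est, reverse=True)

print("Recommended items for user " + str(uid) + " by SVD \n")
[pred.iid for pred in Recommended_list[:15]]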

In [None]:
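# method SlopeOne
# trainset.ur maps an inner user id to a list of (inner item id, rating) tuples
items_purchased = trainset.ur[trainset.to_inner_uid(uid)]
purchased_raw_ids = {algo_SlopeOne.trainset.to_raw_iid(inner_iid) for inner_iid, _ in items_purchased}

print("Chosen user has purchased the following items ")
for raw_iid in purchased_raw_ids:
    print(raw_iid)

# predict a rating for every product the user has not purchased yet
Recommended_list = []
for product_id in all_products:
    if product_id not in purchased_raw_ids:
        Recommended_list.append(algo_SlopeOne.predict(uid, product_id))

# rank by estimated rating and keep the 15 best products
Recommended_list.sort(key=lambda pred: pred.est, reverse=True)

print("Recommended items for user " + str(uid) + " by SlopeOne \n")
[pred.iid for pred in Recommended_list[:15]]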

# Additional Analysis of the predictions 
In the following section, the predictions of each model are analyzed.


In [None]:
#To compare each model prediction a Predictions Dataframe needs to be created, the following functions are helpers
def get_Iu(uid):
    """ return the number of items rated by given user
    args: 
      uid: the id of the user
    returns: 
      the number of items rated by the user
    """
    try:
        return len(trainset.ur[trainset.to_inner_uid(uid)])
    except ValueError: # user was not part of the trainset
        return 0
    
def get_Ui(iid):
    """ return number of users that have rated given item
    args:
      iid: the raw id of the item
    returns:
      the number of users that have rated the item.
    """
    try: 
        return len(trainset.ir[trainset.to_inner_iid(iid)])
    except ValueError:
        return 0

In [None]:
#Predictions for KNNBasic

df_predictions_KNNBasic = pd.DataFrame(predictions_KNNBasic, columns=['uid', 'iid', 'rui', 'est', 'details'])
df_predictions_KNNBasic['Iu'] = df_predictions_KNNBasic.uid.apply(get_Iu)
df_predictions_KNNBasic['Ui'] = df_predictions_KNNBasic.iid.apply(get_Ui)
df_predictions_KNNBasic['err'] = abs(df_predictions_KNNBasic.est - df_predictions_KNNBasic.rui)

In [None]:
df_predictions_KNNBasic.head()

In [None]:
best_predictions = df_predictions_KNNBasic.sort_values(by='err')[:10]
worst_predictions = df_predictions_KNNBasic.sort_values(by='err')[-10:]

In [None]:
best_predictions

In [None]:
worst_predictions

In [None]:
print("\nTotal no of ratings :",df_predictions_KNNBasic.shape[0])
print("Total No of Users   :", len(np.unique(df_predictions_KNNBasic.uid)))
print("Total No of products  :", len(np.unique(df_predictions_KNNBasic.iid)))

In [None]:
#Predictions for KNNWithMeans

df_predictions_KNNWithMeans = pd.DataFrame(predictions_KNNWithMeans, columns=['uid', 'iid', 'rui', 'est', 'details'])
df_predictions_KNNWithMeans['Iu'] = df_predictions_KNNWithMeans.uid.apply(get_Iu)
df_predictions_KNNWithMeans['Ui'] = df_predictions_KNNWithMeans.iid.apply(get_Ui)
df_predictions_KNNWithMeans['err'] = abs(df_predictions_KNNWithMeans.est - df_predictions_KNNWithMeans.rui)

In [None]:
df_predictions_KNNWithMeans.head()

In [None]:
best_predictions = df_predictions_KNNWithMeans.sort_values(by='err')[:10]
worst_predictions = df_predictions_KNNWithMeans.sort_values(by='err')[-10:]

In [None]:
best_predictions

In [None]:
worst_predictions

In [None]:
print("\nTotal no of ratings :",df_predictions_KNNWithMeans.shape[0])
print("Total No of Users   :", len(np.unique(df_predictions_KNNWithMeans.uid)))
print("Total No of products  :", len(np.unique(df_predictions_KNNWithMeans.iid)))

In [None]:
#Predictions for KNNWithZScore

df_predictions_KNNWithZScore = pd.DataFrame(predictions_KNNWithZScore, columns=['uid', 'iid', 'rui', 'est', 'details'])
df_predictions_KNNWithZScore['Iu'] = df_predictions_KNNWithZScore.uid.apply(get_Iu)
df_predictions_KNNWithZScore['Ui'] = df_predictions_KNNWithZScore.iid.apply(get_Ui)
df_predictions_KNNWithZScore['err'] = abs(df_predictions_KNNWithZScore.est - df_predictions_KNNWithZScore.rui)

In [None]:
df_predictions_KNNWithZScore.head()

In [None]:
best_predictions = df_predictions_KNNWithZScore.sort_values(by='err')[:10]
worst_predictions = df_predictions_KNNWithZScore.sort_values(by='err')[-10:]

In [None]:
best_predictions

In [None]:
worst_predictions

In [None]:
print("\nTotal no of ratings :",df_predictions_KNNWithZScore.shape[0])
print("Total No of Users   :", len(np.unique(df_predictions_KNNWithZScore.uid)))
print("Total No of products  :", len(np.unique(df_predictions_KNNWithZScore.iid)))

In [None]:
#KNNBaseline
#Predictions for KNNBaseline

df_predictions_KNNBaseline = pd.DataFrame(predictions_KNNBaseline, columns=['uid', 'iid', 'rui', 'est', 'details'])
df_predictions_KNNBaseline['Iu'] = df_predictions_KNNBaseline.uid.apply(get_Iu)
df_predictions_KNNBaseline['Ui'] = df_predictions_KNNBaseline.iid.apply(get_Ui)
df_predictions_KNNBaseline['err'] = abs(df_predictions_KNNBaseline.est - df_predictions_KNNBaseline.rui)


In [None]:
df_predictions_KNNBaseline.head()

In [None]:
best_predictions = df_predictions_KNNBaseline.sort_values(by='err')[:10]
worst_predictions = df_predictions_KNNBaseline.sort_values(by='err')[-10:]

In [None]:
best_predictions

In [None]:
worst_predictions

In [None]:
print("\nTotal no of ratings :",df_predictions_KNNBaseline.shape[0])
print("Total No of Users   :", len(np.unique(df_predictions_KNNBaseline.uid)))
print("Total No of products  :", len(np.unique(df_predictions_KNNBaseline.iid)))

In [None]:
#Predictions for SVD

df_predictions_SVD = pd.DataFrame(predictions_SVD, columns=['uid', 'iid', 'rui', 'est', 'details'])
df_predictions_SVD['Iu'] = df_predictions_SVD.uid.apply(get_Iu)
df_predictions_SVD['Ui'] = df_predictions_SVD.iid.apply(get_Ui)
df_predictions_SVD['err'] = abs(df_predictions_SVD.est - df_predictions_SVD.rui)

df_predictions_SVD.head()

In [None]:
best_predictions = df_predictions_SVD.sort_values(by='err')[:10]
worst_predictions = df_predictions_SVD.sort_values(by='err')[-10:]

In [None]:
best_predictions

In [None]:
worst_predictions

In [None]:
print("\nTotal no of ratings :",df_predictions_SVD.shape[0])
print("Total No of Users   :", len(np.unique(df_predictions_SVD.uid)))
print("Total No of products  :", len(np.unique(df_predictions_SVD.iid)))

In [None]:
#Predictions for SlopeOne

df_predictions_SlopeOne = pd.DataFrame(predictions_SlopeOne, columns=['uid', 'iid', 'rui', 'est', 'details'])
df_predictions_SlopeOne['Iu'] = df_predictions_SlopeOne.uid.apply(get_Iu)
df_predictions_SlopeOne['Ui'] = df_predictions_SlopeOne.iid.apply(get_Ui)
df_predictions_SlopeOne['err'] = abs(df_predictions_SlopeOne.est - df_predictions_SlopeOne.rui)

df_predictions_SlopeOne.head()

In [None]:
best_predictions = df_predictions_SlopeOne.sort_values(by='err')[:10]
worst_predictions = df_predictions_SlopeOne.sort_values(by='err')[-10:]


In [None]:
best_predictions

In [None]:
worst_predictions

In [None]:
print("\nTotal no of ratings :",df_predictions_SlopeOne.shape[0])
print("Total No of Users   :", len(np.unique(df_predictions_SlopeOne.uid)))
print("Total No of products  :", len(np.unique(df_predictions_SlopeOne.iid)))