## Problem statement

The e-commerce business is quite popular today. Here, you do not need to take orders by going to each customer. A company launches its website to sell the items to the end consumer, and customers can order the products that they require from the same website. Famous examples of such e-commerce companies are Amazon, Flipkart, Myntra, Paytm and Snapdeal.

Suppose you are working as a Machine Learning Engineer in an e-commerce company named 'Ebuss'. Ebuss has captured a huge market share in many fields, and it sells the products in various categories such as household essentials, books, personal care products, medicines, cosmetic items, beauty products, electrical appliances, kitchen and dining products and health care products.

With the advancement in technology, it is imperative for Ebuss to grow quickly in the e-commerce market to become a major leader in the market because it has to compete with the likes of Amazon, Flipkart, etc., which are already market leaders.

As a senior ML Engineer, you are asked to build a model that will improve the recommendations given to the users given their past reviews and ratings. 


The steps to be performed for the first task are given below.

- Exploratory data analysis
- Data cleaning
- Text preprocessing
- Feature extraction: In order to extract features from the text data, you may choose from any of the methods, including bag-of-words, TF-IDF vectorization or word embedding.
- Training a text classification model: You need to build at least three ML models. You then need to analyse the performance of each of these models and choose the best model. At least three out of the following four models need to be built (Do not forget, if required, handle the class imbalance and perform hyperparameter tuning.). 
    1. Logistic regression
    2. Random forest
    3. XGBoost
    4. Naive Bayes

Out of these four models, you need to select one classification model based on its performance.

Building a recommendation system
As you learnt earlier, you can use the following types of recommendation systems.
1. User-based recommendation system
2. Item-based recommendation system

Your task is to analyse the recommendation systems and select the one that is best suited in this case. 

Once you get the best-suited recommendation system, the next task is to recommend 20 products that a user is most likely to purchase based on the ratings. <br/>

You can use the 'reviews_username' (one of the columns in the dataset) to identify your user. 
- Improving the recommendations using the sentiment analysis model

Now, the next task is to link this recommendation system with the sentiment analysis model that was built earlier (recall that we asked you to select one ML model out of the four options). Once you recommend 20 products to a particular user using the recommendation engine, you need to filter out the 5 best products based on the sentiments of the 20 recommended product reviews. 

In this way, you will get an ML model (for sentiments) and the best-suited recommendation system. 



# **Task 5: Building the Recommendation System**

We will build these two recommendation system
- User-based recommendation system
- Item-based recommendation system

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

In [2]:
from sklearn.model_selection import train_test_split, GridSearchCV


In [3]:
import Utility

In [4]:
from sklearn.metrics.pairwise import pairwise_distances, cosine_similarity

In [5]:
cl_df = pd.read_pickle("savedData/preprocessed-dataframe.pkl")
cl_df.head()

Unnamed: 0,id,brand,categories,manufacturer,name,reviews_date,reviews_rating,reviews_text,reviews_title,reviews_username,user_sentiment,reviews_preprocess_text,reviews_complete_text
0,AV13O1A8GV-KLJ3akUyj,Universal Music,"Movies, Music & Books,Music,R&b,Movies & TV,Mo...",others,Pink Friday: Roman Reloaded Re-Up (w/dvd),2012-11-30 06:21:45+00:00,5,i love this album. it's very good. more to the...,Just Awesome,joshua,1,awesome love album good hip hop side current p...,awesome love album good hip hop side current p...
1,AV14LG0R-jtxr-f38QfS,Lundberg,"Food,Packaged Foods,Snacks,Crackers,Snacks, Co...",others,Lundberg Organic Cinnamon Toast Rice Cakes,2017-07-09 00:00:00+00:00,5,Good flavor. This review was collected as part...,Good,dorothy w,1,good good flavor review collected part promotion,good good flavor review collect part promotion
2,AV14LG0R-jtxr-f38QfS,Lundberg,"Food,Packaged Foods,Snacks,Crackers,Snacks, Co...",others,Lundberg Organic Cinnamon Toast Rice Cakes,2017-07-09 00:00:00+00:00,5,Good flavor.,Good,dorothy w,1,good good flavor,good good flavor
3,AV16khLE-jtxr-f38VFn,K-Y,"Personal Care,Medicine Cabinet,Lubricant/Sperm...",others,K-Y Love Sensuality Pleasure Gel,2016-01-06 00:00:00+00:00,1,I read through the reviews on here before look...,Disappointed,rebecca,0,disappointed read reviews looking buying one c...,disappoint read review look buy one couple lub...
4,AV16khLE-jtxr-f38VFn,K-Y,"Personal Care,Medicine Cabinet,Lubricant/Sperm...",others,K-Y Love Sensuality Pleasure Gel,2016-12-21 00:00:00+00:00,1,My husband bought this gel for us. The gel cau...,Irritation,walker557,0,irritation husband bought gel us gel caused ir...,irritation husband buy gel us gel cause irrita...


In [6]:
cl_df[(cl_df["reviews_title"] == "unknown") | (cl_df["reviews_username"] == "unknown")].shape

(72, 13)

In [7]:
user_recommendation_df_columns = ['id', 'name', 'reviews_rating', 'reviews_username']

In [8]:
user_recommendation_df = cl_df[user_recommendation_df_columns]
user_recommendation_df.head()

Unnamed: 0,id,name,reviews_rating,reviews_username
0,AV13O1A8GV-KLJ3akUyj,Pink Friday: Roman Reloaded Re-Up (w/dvd),5,joshua
1,AV14LG0R-jtxr-f38QfS,Lundberg Organic Cinnamon Toast Rice Cakes,5,dorothy w
2,AV14LG0R-jtxr-f38QfS,Lundberg Organic Cinnamon Toast Rice Cakes,5,dorothy w
3,AV16khLE-jtxr-f38VFn,K-Y Love Sensuality Pleasure Gel,1,rebecca
4,AV16khLE-jtxr-f38VFn,K-Y Love Sensuality Pleasure Gel,1,walker557


In [9]:
user_recommendation_df.shape

(29255, 4)

### **Dividing the data in train and test**

In [10]:
train, test = train_test_split(user_recommendation_df, test_size=0.30, random_state=42)


In [11]:
print(train.shape)
print(test.shape)

(20478, 4)
(8777, 4)


In [12]:
user_recommendation_df_pivot = train.pivot_table(index='reviews_username', columns='id', values='reviews_rating').fillna(0)
user_recommendation_df_pivot.head()

id,AV13O1A8GV-KLJ3akUyj,AV14LG0R-jtxr-f38QfS,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmBrdGV-KLJ3adewb,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YnUMYglJLPUi8IJpK,...,AVpfrFDZLJeJML43Bmv0,AVpfrTyiLJeJML43BrSI,AVpfrfHF1cnluZ0-pRai,AVpfrgjFLJeJML43BvCc,AVpfs0tUilAPnD_xgqN2,AVpfsQoeilAPnD_xgfx5,AVpfshNsLJeJML43CB8q,AVpfthSailAPnD_xg3ON,AVpftikC1cnluZ0-p31V,AVpfv4TlilAPnD_xhjNS
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
01impala,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02dakota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0325home,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [13]:
user_recommendation_df_pivot.shape

(17790, 218)

### **Creating dummy train and test**
These dataset will be used for prediction:
- Dummy train will be used later for prediction of the products which has not been rated by the user. To ignore the products rated by the user, we will mark it as 0 during prediction. The products not rated by user is marked as 1 for prediction in dummy train dataset.
- Dummy test will be used for evaluation. To evaluate, we will only make prediction on the products rated by the user. So, this is marked as 1. This is just opposite of dummy_train.

In [14]:
# Copy the train dataset into dummy_train
recommendation_user_dummy_train = train.copy()


In [15]:
# The products not rated by user is marked as 1 for prediction.
recommendation_user_dummy_train['reviews_rating'] = recommendation_user_dummy_train['reviews_rating'].apply(lambda x: 0 if x>=1 else 1)


In [16]:
# Convert the dummy train dataset into matrix format
recommendation_user_dummy_train = recommendation_user_dummy_train.pivot_table(index='reviews_username', columns='id', values='reviews_rating').fillna(1)
recommendation_user_dummy_train.head()


id,AV13O1A8GV-KLJ3akUyj,AV14LG0R-jtxr-f38QfS,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmBrdGV-KLJ3adewb,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YnUMYglJLPUi8IJpK,...,AVpfrFDZLJeJML43Bmv0,AVpfrTyiLJeJML43BrSI,AVpfrfHF1cnluZ0-pRai,AVpfrgjFLJeJML43BvCc,AVpfs0tUilAPnD_xgqN2,AVpfsQoeilAPnD_xgfx5,AVpfshNsLJeJML43CB8q,AVpfthSailAPnD_xg3ON,AVpftikC1cnluZ0-p31V,AVpfv4TlilAPnD_xhjNS
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
01impala,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
02dakota,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
02deuce,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
0325home,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [17]:
product_column = "id"
user_column = "reviews_username"
value_column = "reviews_rating"
user_input = "manny"

#### **Cosine Similarity**
Cosine Similarity is a measurement that quantifies the similarity between two vectors. In this case it is reviews_rating column.

#### **Adjusted Cosine**
Adjusted cosine similarity is a modified version of vector-based similarity where we incorporate the fact that different users have different ratings schemes. In other words, some users might rate items highly in general, and others might give items lower ratings as a preference. To handle this nature from rating given by user , we subtract average ratings for each user from each user's rating for different products.

## **User Based Recommendation System**
### **User Similarity Matrix**
**Using adjusted Cosine** <br/>
Here, we are not removing the NaN values and calculating the mean only for the products rated by the user

In [18]:
# Create Pivot the train ratings data into matrix
# In this case columns are Products and the rows are usernames.
user_recommendation_df_pivot = train.pivot_table(index='reviews_username', columns='id', values='reviews_rating')
user_recommendation_df_pivot.head()

id,AV13O1A8GV-KLJ3akUyj,AV14LG0R-jtxr-f38QfS,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmBrdGV-KLJ3adewb,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YnUMYglJLPUi8IJpK,...,AVpfrFDZLJeJML43Bmv0,AVpfrTyiLJeJML43BrSI,AVpfrfHF1cnluZ0-pRai,AVpfrgjFLJeJML43BvCc,AVpfs0tUilAPnD_xgqN2,AVpfsQoeilAPnD_xgfx5,AVpfshNsLJeJML43CB8q,AVpfthSailAPnD_xg3ON,AVpftikC1cnluZ0-p31V,AVpfv4TlilAPnD_xhjNS
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,,,,,,,,,,,...,,,,,,,,,,
01impala,,,,,,,,,,,...,,,,,,,,,,
02dakota,,,,,,,,,,,...,,,,,,,,,,
02deuce,,,,,,,,,,,...,,,,,,,,,,
0325home,,,,,,,,,,,...,,,,,,,,,,


In [19]:
# user_recommendation_df_pivot.index.nunique()

In [20]:
mean = np.nanmean(user_recommendation_df_pivot, axis=1)
user_recommendation_df_subtracted = (user_recommendation_df_pivot.T-mean).T
user_recommendation_df_subtracted.head()

id,AV13O1A8GV-KLJ3akUyj,AV14LG0R-jtxr-f38QfS,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmBrdGV-KLJ3adewb,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YnUMYglJLPUi8IJpK,...,AVpfrFDZLJeJML43Bmv0,AVpfrTyiLJeJML43BrSI,AVpfrfHF1cnluZ0-pRai,AVpfrgjFLJeJML43BvCc,AVpfs0tUilAPnD_xgqN2,AVpfsQoeilAPnD_xgfx5,AVpfshNsLJeJML43CB8q,AVpfthSailAPnD_xg3ON,AVpftikC1cnluZ0-p31V,AVpfv4TlilAPnD_xhjNS
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,,,,,,,,,,,...,,,,,,,,,,
01impala,,,,,,,,,,,...,,,,,,,,,,
02dakota,,,,,,,,,,,...,,,,,,,,,,
02deuce,,,,,,,,,,,...,,,,,,,,,,
0325home,,,,,,,,,,,...,,,,,,,,,,


In [21]:
# Creating the User Similarity Matrix using pairwise_distance function.
user_correlation = 1 - pairwise_distances(user_recommendation_df_subtracted.fillna(0), metric='cosine')
user_correlation[np.isnan(user_correlation)] = 0
print(user_correlation)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


In [22]:
user_correlation.shape

(17790, 17790)

#### **Prediction - User User**
Doing the prediction for the users which are positively related with other users, and not the users which are negatively related as we are interested in the users which are more similar to the current users. <br/>
So, ignoring the correlation for values less than 0.
                                                                                                                                                                                                                                            

In [23]:
user_correlation[user_correlation<0]=0
user_correlation

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

Rating predicted by the user (for products rated as well as not rated) is the weighted sum of correlation with the product rating (as present in the rating dataset).

In [24]:
user_predicted_ratings = np.dot(user_correlation, user_recommendation_df_pivot.fillna(0))
print(user_predicted_ratings.shape)
user_predicted_ratings

(17790, 218)


array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [25]:
user_final_rating = np.multiply(user_predicted_ratings,recommendation_user_dummy_train)
user_final_rating.head()

id,AV13O1A8GV-KLJ3akUyj,AV14LG0R-jtxr-f38QfS,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmBrdGV-KLJ3adewb,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YnUMYglJLPUi8IJpK,...,AVpfrFDZLJeJML43Bmv0,AVpfrTyiLJeJML43BrSI,AVpfrfHF1cnluZ0-pRai,AVpfrgjFLJeJML43BvCc,AVpfs0tUilAPnD_xgqN2,AVpfsQoeilAPnD_xgfx5,AVpfshNsLJeJML43CB8q,AVpfthSailAPnD_xg3ON,AVpftikC1cnluZ0-p31V,AVpfv4TlilAPnD_xhjNS
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
01impala,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02dakota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0325home,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [26]:
Utility.save_object(user_final_rating, "user_final_rating")

# **Task 6: Recommendation of Top 20 Products to a Specified User `User-User`**

In [27]:
recommendations_user_user = user_final_rating.loc[user_input].sort_values(ascending=False)[0:20]
recommendations_user_user

id
AVpfPaoqLJeJML435Xk9    74.553884
AVpe41TqilAPnD_xQH3d    30.277626
AVpe59io1cnluZ0-ZgDU    26.673776
AVpf3VOfilAPnD_xjpun    13.253695
AVpf2tw1ilAPnD_xjflC    13.181836
AVpfJP1C1cnluZ0-e3Xy    11.502760
AVpfM_ytilAPnD_xXIJb     9.547001
AVpfR5m0LJeJML436K3W     7.078396
AVpf5Z1zLJeJML43FpB-     5.378422
AVpe8gsILJeJML43y6Ed     5.023649
AVpfOmKwLJeJML435GM7     4.904194
AVpfv4TlilAPnD_xhjNS     4.549390
AVpf63aJLJeJML43F__Q     4.143523
AVpfOIrkilAPnD_xXgDG     3.728256
AVpfPnrU1cnluZ0-g9rL     3.691839
AVpfEqruilAPnD_xUWDr     3.331550
AVpfRYbSilAPnD_xYkD4     3.263956
AVpe-PJnLJeJML43ziaj     3.263956
AVpfBU2S1cnluZ0-cJsO     3.263956
AVpe5JOgilAPnD_xQPfE     3.077287
Name: manny, dtype: float64

In [28]:
#display the top 20 product id, name and similarity_score 
final_recommendations_user = pd.DataFrame({'product_id': recommendations_user_user.index, 'similarity_score' : recommendations_user_user})
final_recommendations_user.reset_index(drop=True)
pd.merge(final_recommendations_user, train, on="id")[["id", "name", "similarity_score"]].drop_duplicates()

Unnamed: 0,id,name,similarity_score
0,AVpfPaoqLJeJML435Xk9,Godzilla 3d Includes Digital Copy Ultraviolet ...,74.553884
2313,AVpe41TqilAPnD_xQH3d,Mike Dave Need Wedding Dates (dvd + Digital),30.277626
2823,AVpe59io1cnluZ0-ZgDU,My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Di...,26.673776
3285,AVpf3VOfilAPnD_xjpun,Clorox Disinfecting Wipes Value Pack Scented 1...,13.253695
9206,AVpf2tw1ilAPnD_xjflC,Red (special Edition) (dvdvideo),13.181836
9662,AVpfJP1C1cnluZ0-e3Xy,Clorox Disinfecting Bathroom Cleaner,11.50276
11100,AVpfM_ytilAPnD_xXIJb,Tostitos Bite Size Tortilla Chips,9.547001
11291,AVpfR5m0LJeJML436K3W,Jason Aldean - They Don't Know,7.078396
11439,AVpf5Z1zLJeJML43FpB-,"Lysol Concentrate Deodorizing Cleaner, Origina...",5.378422
11542,AVpe8gsILJeJML43y6Ed,"Pendaflex174 Divide It Up File Folder, Multi S...",5.023649


In [29]:
def get_top_recommendation_users(users):
    new_df = pd.DataFrame()
    for user_input in users:
        try:
            temp_recommendations = user_final_rating.loc[user_input].sort_values(ascending=False)[0:20]
            temp = temp_recommendations.to_frame().reset_index()
            temp = temp[temp.columns.difference(['id'])].T
            temp.columns = [f"product{ind}" for ind in range(len(temp.columns))]
            temp.insert(0, "user", user_input)
            
            if len(new_df) != 0:
                new_df = pd.concat([new_df, temp], ignore_index=True, axis=0)
            else:
                new_df = temp

        except:
            continue

    return new_df

In [30]:
temp_user_recommendation_df = get_top_recommendation_users(train.reviews_username.unique())

In [31]:
Utility.save_object(temp_user_recommendation_df.sort_values(by=["product0"], ascending=False).head(10), "best_recommendation_users")

#### **Users with best product recommendations**

In [32]:
temp_user_recommendation_df.sort_values(by=["product0"], ascending=False).head(10)

Unnamed: 0,user,product0,product1,product2,product3,product4,product5,product6,product7,product8,...,product10,product11,product12,product13,product14,product15,product16,product17,product18,product19
6151,manny,74.553884,30.277626,26.673776,13.253695,13.181836,11.50276,9.547001,7.078396,5.378422,...,4.904194,4.54939,4.143523,3.728256,3.691839,3.33155,3.263956,3.263956,3.263956,3.077287
1449,moerena,74.067436,21.816133,19.066274,19.030876,16.617397,10.036487,10.016632,8.697712,8.413393,...,7.197199,5.606674,4.892243,4.656636,4.119252,3.891039,3.891039,3.86568,3.790994,3.448617
2769,nana,70.577337,56.694429,49.406397,31.322992,16.123417,13.976228,11.319356,10.725913,10.26042,...,9.314261,9.240835,8.892015,8.44256,7.782452,7.761607,7.014709,7.002242,6.841551,6.597396
1275,vicki,70.389129,57.651201,43.615622,34.720689,13.211525,11.145445,10.053928,9.047835,8.968361,...,8.576103,8.013447,7.987062,7.681255,6.595717,5.653617,5.212496,5.189562,5.157319,5.090876
2988,viewer,68.30013,57.014452,47.6262,39.359921,30.819626,13.134011,11.452792,10.194899,9.922057,...,8.717697,8.480747,8.432344,8.371707,7.822141,7.375848,7.278232,5.338724,4.829629,4.477209
2010,brandon,68.18143,54.455826,42.28923,30.394557,30.362211,11.004222,10.194899,9.611441,8.811906,...,8.685644,7.928822,7.782452,7.413893,7.278232,6.811862,5.82812,5.739426,5.317702,5.159947
3171,ronnie,67.857236,48.294413,31.752083,29.578719,13.730138,11.378891,10.699978,9.573047,9.570263,...,8.396605,8.294562,8.186615,7.845243,7.828421,7.768277,7.623787,7.337366,7.08611,6.73468
3896,thom,66.529931,53.025387,48.623608,41.477352,30.697828,12.84989,11.499159,11.06642,10.226556,...,9.17222,8.949494,8.418798,8.306439,7.337366,6.978337,6.613531,6.454689,6.413566,6.220085
9535,chas,66.529931,53.025387,48.623608,41.477352,30.697828,12.84989,11.499159,11.06642,10.226556,...,9.17222,8.949494,8.418798,8.306439,7.337366,6.978337,6.613531,6.454689,6.413566,6.220085
505,adam,66.096354,45.845239,40.217108,29.455278,29.377614,12.342892,10.981117,10.710738,9.453713,...,8.860351,8.724332,8.510261,8.228932,7.337366,7.273503,7.08611,6.978337,6.82434,6.613531


#### **Evaluation User-User**

In [33]:
# Find out the common users of test and train dataset.
common_users = test[test.reviews_username.isin(train.reviews_username)]
common_users.shape

(1892, 4)

In [34]:
common_users.head()

Unnamed: 0,id,name,reviews_rating,reviews_username
19958,AVpfJP1C1cnluZ0-e3Xy,Clorox Disinfecting Bathroom Cleaner,5,mommy2three
8516,AVpf3VOfilAPnD_xjpun,Clorox Disinfecting Wipes Value Pack Scented 1...,5,angie0104
18684,AVpfJP1C1cnluZ0-e3Xy,Clorox Disinfecting Bathroom Cleaner,4,babe
21876,AVpfMpZ51cnluZ0-f_L9,Chips Ahoy! Original Chocolate Chip - Cookies ...,5,alexis
25458,AVpfPaoqLJeJML435Xk9,Godzilla 3d Includes Digital Copy Ultraviolet ...,5,ross


In [35]:
# convert into the user-movie matrix.
common_user_based_matrix = pd.pivot_table(common_users,index=user_column, columns = product_column, values = value_column)
common_user_based_matrix.head()

id,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YneDPglJLPUi8IJyQ,AV1YqAaMGV-KLJ3adiDj,AV1YtGjdglJLPUi8IOfJ,AV1ZSp2uglJLPUi8IQFy,...,AVpfkak01cnluZ0-nJj6,AVpfluP1ilAPnD_xejxO,AVpfm8yiLJeJML43AYyu,AVpfoSS51cnluZ0-oVH9,AVpfov9TLJeJML43A7B0,AVpfpM2yilAPnD_xfmDG,AVpfr5cb1cnluZ0-pZFp,AVpfrFDZLJeJML43Bmv0,AVpfrfHF1cnluZ0-pRai,AVpftikC1cnluZ0-p31V
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,,,,,,,,,,,...,,,,,,,,,,
1234,,,,,,,,,,,...,,,,,,,,,,
1234567,,,,,,,,,,,...,,,,,,,,,,
1943,,,,,,,,,,,...,,,,,,,,,,
23jen,,,,,,,,,,,...,,,,,,,,,,


In [36]:
# Convert the user_correlation matrix into dataframe.
user_correlation_df = pd.DataFrame(user_correlation)
user_correlation_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,17780,17781,17782,17783,17784,17785,17786,17787,17788,17789
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [37]:
user_correlation_df[user_column] = user_recommendation_df_pivot.index
user_correlation_df.set_index(user_column,inplace=True)
user_correlation_df.head()

Unnamed: 0_level_0,0,1,2,3,4,5,6,7,8,9,...,17780,17781,17782,17783,17784,17785,17786,17787,17788,17789
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
01impala,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02dakota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0325home,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [38]:
list_name = common_users.reviews_username.tolist()

user_correlation_df.columns = user_recommendation_df_pivot.index.tolist()
user_correlation_df_1 =  user_correlation_df[user_correlation_df.index.isin(list_name)]

In [39]:
user_correlation_df_1.shape

(1588, 17790)

In [40]:
user_correlation_df_2 = user_correlation_df_1.T[user_correlation_df_1.T.index.isin(list_name)]

In [41]:
user_correlation_df_3 = user_correlation_df_2.T

In [42]:
user_correlation_df_3[user_correlation_df_3<0]=0

common_user_predicted_ratings = np.dot(user_correlation_df_3, common_user_based_matrix.fillna(0))
common_user_predicted_ratings

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [43]:
recommendation_user_dummy_test = common_users.copy()

recommendation_user_dummy_test[value_column] = recommendation_user_dummy_test[value_column].apply(lambda x: 1 if x>=1 else 0)
recommendation_user_dummy_test = pd.pivot_table(recommendation_user_dummy_test,index=user_column, columns = product_column, values = value_column).fillna(0)


In [44]:
recommendation_user_dummy_test.shape


(1588, 100)

In [45]:
common_user_based_matrix.head()

id,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YneDPglJLPUi8IJyQ,AV1YqAaMGV-KLJ3adiDj,AV1YtGjdglJLPUi8IOfJ,AV1ZSp2uglJLPUi8IQFy,...,AVpfkak01cnluZ0-nJj6,AVpfluP1ilAPnD_xejxO,AVpfm8yiLJeJML43AYyu,AVpfoSS51cnluZ0-oVH9,AVpfov9TLJeJML43A7B0,AVpfpM2yilAPnD_xfmDG,AVpfr5cb1cnluZ0-pZFp,AVpfrFDZLJeJML43Bmv0,AVpfrfHF1cnluZ0-pRai,AVpftikC1cnluZ0-p31V
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,,,,,,,,,,,...,,,,,,,,,,
1234,,,,,,,,,,,...,,,,,,,,,,
1234567,,,,,,,,,,,...,,,,,,,,,,
1943,,,,,,,,,,,...,,,,,,,,,,
23jen,,,,,,,,,,,...,,,,,,,,,,


In [46]:
recommendation_user_dummy_test.head()

id,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YneDPglJLPUi8IJyQ,AV1YqAaMGV-KLJ3adiDj,AV1YtGjdglJLPUi8IOfJ,AV1ZSp2uglJLPUi8IQFy,...,AVpfkak01cnluZ0-nJj6,AVpfluP1ilAPnD_xejxO,AVpfm8yiLJeJML43AYyu,AVpfoSS51cnluZ0-oVH9,AVpfov9TLJeJML43A7B0,AVpfpM2yilAPnD_xfmDG,AVpfr5cb1cnluZ0-pZFp,AVpfrFDZLJeJML43Bmv0,AVpfrfHF1cnluZ0-pRai,AVpftikC1cnluZ0-p31V
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1234,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1234567,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1943,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
23jen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [47]:
common_user_predicted_ratings = np.multiply(common_user_predicted_ratings,recommendation_user_dummy_test)


In [48]:
common_user_predicted_ratings.head()

id,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YneDPglJLPUi8IJyQ,AV1YqAaMGV-KLJ3adiDj,AV1YtGjdglJLPUi8IOfJ,AV1ZSp2uglJLPUi8IQFy,...,AVpfkak01cnluZ0-nJj6,AVpfluP1ilAPnD_xejxO,AVpfm8yiLJeJML43AYyu,AVpfoSS51cnluZ0-oVH9,AVpfov9TLJeJML43A7B0,AVpfpM2yilAPnD_xfmDG,AVpfr5cb1cnluZ0-pZFp,AVpfrFDZLJeJML43Bmv0,AVpfrfHF1cnluZ0-pRai,AVpftikC1cnluZ0-p31V
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1234,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1234567,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1943,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
23jen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


**Calculating the RMSE for only the products rated by user. For RMSE, normalising the rating to (1,5) range.**

In [49]:
#calculate RMSE

from sklearn.preprocessing import MinMaxScaler
from numpy import *

X  = common_user_predicted_ratings.copy() 
X = X[X>0]

scaler = MinMaxScaler(feature_range=(1, 5))
print(scaler.fit(X))
y = (scaler.transform(X))

print(y)

MinMaxScaler(feature_range=(1, 5))
[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]


In [50]:
common_users_ = pd.pivot_table(common_users, index = user_column, columns = product_column, values = value_column)


In [51]:
total_non_nan = np.count_nonzero(~np.isnan(y))


In [52]:
temp_test_df = get_top_recommendation_users(test.reviews_username.unique())

In [53]:
temp_test_df.sort_values(by=["product0"], ascending=False).head(10)

Unnamed: 0,user,product0,product1,product2,product3,product4,product5,product6,product7,product8,...,product10,product11,product12,product13,product14,product15,product16,product17,product18,product19
571,manny,74.553884,30.277626,26.673776,13.253695,13.181836,11.50276,9.547001,7.078396,5.378422,...,4.904194,4.54939,4.143523,3.728256,3.691839,3.33155,3.263956,3.263956,3.263956,3.077287
1555,vicki,70.389129,57.651201,43.615622,34.720689,13.211525,11.145445,10.053928,9.047835,8.968361,...,8.576103,8.013447,7.987062,7.681255,6.595717,5.653617,5.212496,5.189562,5.157319,5.090876
1181,brandon,68.18143,54.455826,42.28923,30.394557,30.362211,11.004222,10.194899,9.611441,8.811906,...,8.685644,7.928822,7.782452,7.413893,7.278232,6.811862,5.82812,5.739426,5.317702,5.159947
67,chas,66.529931,53.025387,48.623608,41.477352,30.697828,12.84989,11.499159,11.06642,10.226556,...,9.17222,8.949494,8.418798,8.306439,7.337366,6.978337,6.613531,6.454689,6.413566,6.220085
175,adam,66.096354,45.845239,40.217108,29.455278,29.377614,12.342892,10.981117,10.710738,9.453713,...,8.860351,8.724332,8.510261,8.228932,7.337366,7.273503,7.08611,6.978337,6.82434,6.613531
878,robert,65.163173,28.201435,25.575304,13.692511,12.172277,9.42449,7.580041,5.047125,4.701621,...,4.446839,4.356348,3.827659,3.75,3.75,3.75,3.651118,3.535534,3.535534,3.305324
1301,jimmy,64.589525,52.861891,38.052271,29.294426,28.944883,12.05996,10.348706,9.616993,9.39462,...,8.532749,7.952292,7.895069,7.647249,7.537273,7.103924,6.863324,6.605217,4.677476,4.488549
776,chrissy,63.843874,50.097914,43.462303,36.683365,28.375215,27.848502,10.226556,8.970554,8.966358,...,8.199462,7.644235,7.463914,7.440169,7.337366,6.705564,6.613531,6.561823,6.220085,4.553418
782,drew,63.555636,43.77483,29.829907,28.401334,13.234265,10.092538,9.932488,9.430843,8.860351,...,8.396605,8.240811,8.219839,8.133037,7.337366,7.294089,6.986209,6.82434,6.422285,6.289016
1221,joel,63.365408,51.886801,41.608215,28.723801,28.094492,10.092538,9.932488,8.610013,8.563271,...,8.292772,8.219839,7.860265,7.374785,7.337366,6.986209,6.422285,5.127423,5.044912,4.913918


#### **RMSE of `User-User`**

In [54]:
rmse_user_user = (sum(sum((common_users_ - y )**2))/total_non_nan)**0.5
print(rmse_user_user)

2.3931530272458366


## **Item Based Recommendation System**

### **Item Similarity Matrix**
Taking the transpose of the rating matrix to normalize the rating around the mean for different movie ID. In the user based similarity, we had taken mean for each user instead of each movie.

In [55]:
item_recommendation_df_pivot = train.pivot_table(index='reviews_username', columns='id',values='reviews_rating').T
item_recommendation_df_pivot.head()

reviews_username,00sab00,01impala,02dakota,02deuce,0325home,06stidriver,08dallas,1.11E+24,1085,1143mom,...,zpalma,zsazsa,zt313,zubb,zuttle,zwithanx,zxcsdfd,zxjki,zyiah4,zzz1127
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AV13O1A8GV-KLJ3akUyj,,,,,,,,,,,...,,,,,,,,,,
AV14LG0R-jtxr-f38QfS,,,,,,,,,,,...,,,,,,,,,,
AV16khLE-jtxr-f38VFn,,,,,,,,,,,...,,,,,,,,,,
AV1YGDqsGV-KLJ3adc-O,,,,,,,,,,3.0,...,,,,,,,,,,
AV1YlENIglJLPUi8IHsX,,,,,,,,,,,...,,,,,,,,,,


**Normalising the movie rating for each movie for using the Adujsted Cosine**

In [56]:
mean = np.nanmean(item_recommendation_df_pivot, axis=1)
item_recommendation_df_subtracted = (item_recommendation_df_pivot.T-mean).T
item_recommendation_df_subtracted.head()

reviews_username,00sab00,01impala,02dakota,02deuce,0325home,06stidriver,08dallas,1.11E+24,1085,1143mom,...,zpalma,zsazsa,zt313,zubb,zuttle,zwithanx,zxcsdfd,zxjki,zyiah4,zzz1127
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AV13O1A8GV-KLJ3akUyj,,,,,,,,,,,...,,,,,,,,,,
AV14LG0R-jtxr-f38QfS,,,,,,,,,,,...,,,,,,,,,,
AV16khLE-jtxr-f38VFn,,,,,,,,,,,...,,,,,,,,,,
AV1YGDqsGV-KLJ3adc-O,,,,,,,,,,-1.09375,...,,,,,,,,,,
AV1YlENIglJLPUi8IHsX,,,,,,,,,,,...,,,,,,,,,,


**Cosine similarity using pairwise distances approach**

In [57]:
# Item Similarity Matrix
item_correlation = 1 - pairwise_distances(item_recommendation_df_subtracted.fillna(0), metric='cosine')
item_correlation[np.isnan(item_correlation)] = 0
print(item_correlation)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 1. 0. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 0. 1.]]


**correlation only for which the value is greater than 0. (Positively correlated)**

In [58]:
item_correlation[item_correlation<0]=0
item_correlation

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])

#### **Prediction - `Item-Item`**

In [59]:
item_predicted_ratings = np.dot((item_recommendation_df_pivot.fillna(0).T),item_correlation)
item_predicted_ratings

array([[0.        , 0.        , 0.        , ..., 0.01190255, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.00364662, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.00911654, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.00729323, 0.        ,
        0.        ]])

In [60]:
item_predicted_ratings.shape

(17790, 218)

#### **Filtering the rating only for the products not rated by the user for recommendation**

In [61]:
item_final_rating = np.multiply(item_predicted_ratings, recommendation_user_dummy_train)
item_final_rating.head()



id,AV13O1A8GV-KLJ3akUyj,AV14LG0R-jtxr-f38QfS,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmBrdGV-KLJ3adewb,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1YnUMYglJLPUi8IJpK,...,AVpfrFDZLJeJML43Bmv0,AVpfrTyiLJeJML43BrSI,AVpfrfHF1cnluZ0-pRai,AVpfrgjFLJeJML43BvCc,AVpfs0tUilAPnD_xgqN2,AVpfsQoeilAPnD_xgfx5,AVpfshNsLJeJML43CB8q,AVpfthSailAPnD_xg3ON,AVpftikC1cnluZ0-p31V,AVpfv4TlilAPnD_xhjNS
reviews_username,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
00sab00,0.0,0.0,0.0,0.000604,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011903,0.0,0.0
01impala,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01816,0.0,...,0.001341,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02dakota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030267,0.0,...,0.002236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024213,0.0,...,0.001789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
0325home,0.0,0.0,0.0,0.0,0.0,0.0,0.002464,0.003665,0.002407,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.009117,0.0,0.0


In [62]:
Utility.save_object(item_final_rating, "item_final_rating")

# **Task 6: Recommendation of Top 20 Products to a Specified User `Item-Item`**

In [63]:
recommendations_item_item = item_final_rating.loc[user_input].sort_values(ascending=False)[0:20]
recommendations_item_item

id
AVpfOIrkilAPnD_xXgDG    0.221069
AVpfv4TlilAPnD_xhjNS    0.087152
AVpfkQkcLJeJML43_kEC    0.069913
AVpfthSailAPnD_xg3ON    0.068964
AVpf0pfrilAPnD_xi6s_    0.048274
AVpe6PCDLJeJML43yFQH    0.047824
AVpfOmKwLJeJML435GM7    0.032066
AVpf2tw1ilAPnD_xjflC    0.027454
AVpe8gsILJeJML43y6Ed    0.027205
AVpe59io1cnluZ0-ZgDU    0.024612
AVpe5c23LJeJML43xybi    0.024377
AVpf0thK1cnluZ0-r8vR    0.023955
AVpfCuzrilAPnD_xTroT    0.023391
AVpe31o71cnluZ0-YrSD    0.022211
AVpe7sl91cnluZ0-aI1Y    0.021966
AVpfM_ytilAPnD_xXIJb    0.021875
AVpe7GIELJeJML43yZfu    0.021788
AVpfov9TLJeJML43A7B0    0.020920
AVpe9W4D1cnluZ0-avf0    0.019541
AVpfDI3xilAPnD_xTz-k    0.019007
Name: manny, dtype: float64

In [64]:
#display the top 20 product id, name and similarity_score 
final_recommendations_item = pd.DataFrame({'product_id': recommendations_item_item.index, 'similarity_score' : recommendations_item_item})
final_recommendations_item.reset_index(drop=True)
pd.merge(final_recommendations_item, train, on="id")[["id", "name", "similarity_score"]].drop_duplicates()

Unnamed: 0,id,name,similarity_score
0,AVpfOIrkilAPnD_xXgDG,Alex Cross (dvdvideo),0.221069
106,AVpfv4TlilAPnD_xhjNS,Various - Red Hot Blue:Tribute To Cole Porter ...,0.087152
109,AVpfkQkcLJeJML43_kEC,Cococare 100% Natural Castor Oil,0.069913
111,AVpfthSailAPnD_xg3ON,"Musselman Apple Sauce, Cinnamon, 48oz",0.068964
116,AVpf0pfrilAPnD_xi6s_,Nearly Natural 5.5' Bamboo W/decorative Planter,0.048274
122,AVpe6PCDLJeJML43yFQH,Wagan Smartac 80watt Inverter With Usb,0.047824
126,AVpfOmKwLJeJML435GM7,Clear Scalp & Hair Therapy Total Care Nourishi...,0.032066
378,AVpf2tw1ilAPnD_xjflC,Red (special Edition) (dvdvideo),0.027454
834,AVpe8gsILJeJML43y6Ed,"Pendaflex174 Divide It Up File Folder, Multi S...",0.027205
1062,AVpe59io1cnluZ0-ZgDU,My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Di...,0.024612


#### **Best recommended users - item recommendation**

In [65]:
def get_top_recommendation_items(users):
    new_df = pd.DataFrame()
    for user_input in users:
        try:
            temp_recommendations = item_final_rating.loc[user_input].sort_values(ascending=False)[0:20]
            temp = temp_recommendations.to_frame().reset_index()
            temp = temp[temp.columns.difference(['id'])].T
            temp.columns = [f"product{ind}" for ind in range(len(temp.columns))]
            temp.insert(0, "user", user_input)
            
            if len(new_df) != 0:
                new_df = pd.concat([new_df, temp], ignore_index=True, axis=0)
            else:
                new_df = temp

        except:
            continue

    return new_df

In [66]:
temp_item_recommendation_df = get_top_recommendation_items(train.reviews_username.unique())

In [67]:
temp_item_recommendation_df.sort_values(by=["product0"], ascending=False).head(10)

Unnamed: 0,user,product0,product1,product2,product3,product4,product5,product6,product7,product8,...,product10,product11,product12,product13,product14,product15,product16,product17,product18,product19
8151,dfwatheartgirl,1.844855,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
504,cdguerrero,1.844855,0.023411,0.019665,0.013698,0.009192,0.004062,0.002464,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1991,gelon33,1.844855,0.023411,0.019665,0.013698,0.009192,0.004062,0.002464,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4649,rayray,1.844855,0.731307,0.2426,0.027092,0.024815,0.023539,0.019219,0.014117,0.013602,...,0.012134,0.011373,0.011037,0.010289,0.00962,0.00937,0.009147,0.009117,0.008516,0.008181
1757,kimmicha,1.844855,0.023411,0.019665,0.013698,0.009192,0.004062,0.002464,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8110,coupongirl63301,1.844855,0.023411,0.019665,0.013698,0.009192,0.004062,0.002464,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10921,traviemom,1.844855,0.023411,0.019665,0.013698,0.009192,0.004062,0.002464,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
14687,violagirl522,1.844855,0.023411,0.019665,0.013698,0.009192,0.004062,0.002464,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10920,momandteacher,1.844855,0.023411,0.019665,0.013698,0.009192,0.004062,0.002464,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13944,purple22,1.844855,0.023411,0.019665,0.013698,0.009192,0.004062,0.002464,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### **Evaluation - Item Item**
Evaluation will we same as you have seen above for the prediction. The only difference being, you will evaluate for the movie already rated by the user insead of predicting it for the movie not rated by the user.

In [68]:
common_items = test[test['id'].isin(train['id'])]
print(common_items.shape)
common_items.head()

(8759, 4)


Unnamed: 0,id,name,reviews_rating,reviews_username
10776,AVpf3VOfilAPnD_xjpun,Clorox Disinfecting Wipes Value Pack Scented 1...,5,brant
27425,AVpfRTh1ilAPnD_xYic2,Planes: Fire Rescue (2 Discs) (includes Digita...,4,kingsixx
1019,AV1YGDqsGV-KLJ3adc-O,Windex Original Glass Cleaner Refill 67.6oz (2...,5,grangolfer
21235,AVpfm8yiLJeJML43AYyu,Nexxus Exxtra Gel Style Creation Sculptor,1,rohzgirl
14861,AVpf3VOfilAPnD_xjpun,Clorox Disinfecting Wipes Value Pack Scented 1...,5,soccermom1


#### **Item Based Matrix**

In [69]:
common_item_based_matrix =  common_items.pivot_table(index='reviews_username', columns='id', values='reviews_rating').T
common_item_based_matrix.shape

(158, 8181)

In [70]:
item_correlation_df = pd.DataFrame(item_correlation)
item_correlation_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,208,209,210,211,212,213,214,215,216,217
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002029,0.0
4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [71]:
item_correlation_df['productId'] = item_recommendation_df_subtracted.index
item_correlation_df.set_index('productId',inplace=True)
item_correlation_df.head()

Unnamed: 0_level_0,0,1,2,3,4,5,6,7,8,9,...,208,209,210,211,212,213,214,215,216,217
productId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AV13O1A8GV-KLJ3akUyj,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AV14LG0R-jtxr-f38QfS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AV16khLE-jtxr-f38VFn,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AV1YGDqsGV-KLJ3adc-O,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002029,0.0
AV1YlENIglJLPUi8IHsX,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [72]:
item_correlation_df.columns = item_recommendation_df_subtracted.index.tolist()
item_correlation_df_1 =  item_correlation_df[item_correlation_df.index.isin(common_items['id'].tolist())]

In [73]:
item_correlation_df_2 = item_correlation_df_1.T[item_correlation_df_1.T.index.isin(common_items['id'].tolist())]
item_correlation_df_3 = item_correlation_df_2.T

In [74]:
item_correlation_df_3.head()

Unnamed: 0_level_0,AV16khLE-jtxr-f38VFn,AV1YGDqsGV-KLJ3adc-O,AV1YlENIglJLPUi8IHsX,AV1YmBrdGV-KLJ3adewb,AV1YmDL9vKc47QAVgr7_,AV1Ymf_rglJLPUi8II2v,AV1Yn94nvKc47QAVgtst,AV1Ynb3bglJLPUi8IJxJ,AV1YneDPglJLPUi8IJyQ,AV1YqAaMGV-KLJ3adiDj,...,AVpfov9TLJeJML43A7B0,AVpfpM2yilAPnD_xfmDG,AVpfqW4WilAPnD_xf7a_,AVpfr5cb1cnluZ0-pZFp,AVpfrFDZLJeJML43Bmv0,AVpfrTyiLJeJML43BrSI,AVpfrfHF1cnluZ0-pRai,AVpfrgjFLJeJML43BvCc,AVpfthSailAPnD_xg3ON,AVpftikC1cnluZ0-p31V
productId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
AV16khLE-jtxr-f38VFn,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AV1YGDqsGV-KLJ3adc-O,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003144,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002029
AV1YlENIglJLPUi8IHsX,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.006919,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AV1YmBrdGV-KLJ3adewb,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AV1YmDL9vKc47QAVgr7_,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [75]:
item_correlation_df_3[item_correlation_df_3<0]=0

common_item_predicted_ratings = np.dot(item_correlation_df_3, common_item_based_matrix.fillna(0))
common_item_predicted_ratings

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.02415616, 0.        , 0.00532649, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.03037509, 0.        , ..., 0.        , 0.00911654,
        0.05062516],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

In [76]:
common_item_predicted_ratings.shape

(158, 8181)

In [77]:
dummy_test_recommendation = common_items.copy()
dummy_test_recommendation['reviews_rating'] = dummy_test_recommendation['reviews_rating'].apply(lambda x: 1 if x>=1 else 0)
dummy_test_recommendation = dummy_test_recommendation.pivot_table(index='reviews_username', columns='id', values='reviews_rating').T.fillna(0)


In [78]:
common_item_predicted_ratings = np.multiply(common_item_predicted_ratings,dummy_test_recommendation)

**The products not rated is marked as 0 for evaluation. And make the item- item matrix representaion.**

In [79]:
common_items_ = common_items.pivot_table(index='reviews_username', columns='id', values='reviews_rating').T

In [80]:
X  = common_item_predicted_ratings.copy()
X = X[X>0]

# Applying MinMaxScaler
scaler = MinMaxScaler(feature_range=(1, 5))
print(scaler.fit(X))
y = (scaler.transform(X))

print(y)

MinMaxScaler(feature_range=(1, 5))
[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]


In [81]:
# Finding total non-NaN value
total_non_nan = np.count_nonzero(~np.isnan(y))

In [82]:
rmse_item_item = (sum(sum((common_items_ - y )**2))/total_non_nan)**0.5
print(rmse_item_item)

3.5701485985345913


### **Selecting Best Recommendation Model:**

In [83]:
print("RMSE of User Based Recommendation System: ", rmse_user_user)
print("RMSE of Item Based Recommendation System: ", rmse_item_item)

if rmse_user_user < rmse_item_item:
  print("User-User Based Model is  Recommended")
else:
  print("Item-Item Based Model is  Recommended")

RMSE of User Based Recommendation System:  2.3931530272458366
RMSE of Item Based Recommendation System:  3.5701485985345913
User-User Based Model is  Recommended


On comparing the RMSE values of User Based Recommender and Item Based Recommender, User based recommendation model seems to be better in this case, as it has a lower RMSE value `(~2.3)` as compared to Item-Item based recommendation systm. <br/>
**User Based Recommendation system is chosen**.