# Business Understanding

## Problem Statement

You are working as a Machine Learning Engineer at an e-commerce company named 'Ebuss', and you are required to build a model that improves the recommendations given to users based on their past reviews and ratings.

To do this, you need to build a sentiment-based product recommendation system in the following steps:

1. Data sourcing and sentiment analysis

2. Building a recommendation system

3. Improving the recommendations using the sentiment analysis model

4. Deploying the end-to-end project with a user interface
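Step 3 can be read as a re-ranking pass: the recommender proposes candidate products, and the sentiment model's predictions on each candidate's reviews decide which of them to surface. The sketch below is a minimal illustration under stated assumptions: the helper name `rerank_by_sentiment`, the score (share of a product's reviews predicted Positive) and the `top_k=5` cut-off are hypothetical choices, not taken from this notebook.

```python
from collections import defaultdict


def rerank_by_sentiment(candidates, review_predictions, top_k=5):
    """Re-rank recommended products by the share of their reviews predicted Positive.

    candidates: product names proposed by the recommender (e.g. the top 20).
    review_predictions: iterable of (product_name, predicted_label) pairs,
        where predicted_label is 'Positive' or 'Negative'.
    Hypothetical helper: the scoring rule is an assumption for illustration.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for product, label in review_predictions:
        total[product] += 1
        if label == "Positive":
            positive[product] += 1

    # Products with no predicted reviews get a score of 0.
    scores = {p: (positive[p] / total[p] if total[p] else 0.0) for p in candidates}
    # sorted() is stable, so tied products keep the recommender's order.
    return sorted(candidates, key=lambda p: scores[p], reverse=True)[:top_k]


predictions = [("A", "Positive"), ("A", "Negative"), ("B", "Positive"), ("C", "Negative")]
print(rerank_by_sentiment(["A", "B", "C"], predictions, top_k=2))  # ['B', 'A']
```

Ties keep the recommender's original order because Python's `sorted` is stable, also with `reverse=True`.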

## End Goals 

An end-to-end Jupyter Notebook containing the entire code of the recommendation system, including the following points:

* Data cleaning steps
* Text preprocessing
* Feature extraction
* Three ML models used to build the sentiment analysis model
* Two recommendation systems and their evaluations


Deployment of only one ML model and only one recommendation system obtained from the previous steps, along with the entire code needed to deploy the end-to-end project using Flask and Heroku.

# Data Understanding

In [49]:
#General
import numpy as np
import pandas as pd
import sys
from collections import Counter
import matplotlib.pyplot as plt
import string
import re

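#NLP
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')  # tokenizer models used by word_tokenize

#Stop words
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = set(stopwords.words('english'))  # set for fast membership checks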

#Lemmatization
nltk.download('wordnet')
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
lemmatizer = nltk.stem.WordNetLemmatizer()

#Stemming
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("english")

#Modelling Basics
from sklearn.model_selection import cross_val_score
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

#Dealing Imbalance & Model Save
from imblearn.over_sampling import SMOTE
import pickle

#Models
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

#Cosine Similarity
from sklearn.metrics.pairwise import pairwise_distances

#Model Accuracy
from sklearn.metrics import accuracy_score

#Min max scaler
from sklearn.preprocessing import MinMaxScaler

[nltk_data] Downloading package stopwords to C:\Users\Octillion
[nltk_data]     0017\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to C:\Users\Octillion
[nltk_data]     0017\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [50]:
df = pd.read_csv('input/sample30.csv')

In [51]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   id                    30000 non-null  object
 1   brand                 30000 non-null  object
 2   categories            30000 non-null  object
 3   manufacturer          29859 non-null  object
 4   name                  30000 non-null  object
 5   reviews_date          29954 non-null  object
 6   reviews_didPurchase   15932 non-null  object
 7   reviews_doRecommend   27430 non-null  object
 8   reviews_rating        30000 non-null  int64 
 9   reviews_text          30000 non-null  object
 10  reviews_title         29810 non-null  object
 11  reviews_userCity      1929 non-null   object
 12  reviews_userProvince  170 non-null    object
 13  reviews_username      29937 non-null  object
 14  user_sentiment        29999 non-null  object
dtypes: int64(1), object(14)
memory usage

In [52]:
df['user_sentiment'].value_counts()

Positive    26632
Negative     3367
Name: user_sentiment, dtype: int64

In [53]:
#Remove the review rows where username is null
df = df[df['reviews_username'].notna()]

In [54]:
#Remove the review rows where user sentiment is null
df = df[df['user_sentiment'].notna()].copy()

In [55]:
#Replace null review titles with a space
df['reviews_title'] = df['reviews_title'].fillna(' ')

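The three cleaning cells above can also be written as one function. This is a minimal sketch; the helper name `clean_reviews` is hypothetical. It calls `.copy()` after filtering so that the later column assignment writes to a new DataFrame rather than to a slice of the original, which avoids pandas' `SettingWithCopyWarning`.

```python
import pandas as pd


def clean_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's three cleaning steps in one pass (sketch).

    Drops rows whose username or user_sentiment is missing, then replaces
    a missing review title with a single space so it can be concatenated
    with the review text later.
    """
    cleaned = df.dropna(subset=["reviews_username", "user_sentiment"]).copy()
    cleaned["reviews_title"] = cleaned["reviews_title"].fillna(" ")
    return cleaned
```

In the notebook it would be applied as `df = clean_reviews(df)`.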
In [56]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29936 entries, 0 to 29999
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   id                    29936 non-null  object
 1   brand                 29936 non-null  object
 2   categories            29936 non-null  object
 3   manufacturer          29795 non-null  object
 4   name                  29936 non-null  object
 5   reviews_date          29896 non-null  object
 6   reviews_didPurchase   15931 non-null  object
 7   reviews_doRecommend   27395 non-null  object
 8   reviews_rating        29936 non-null  int64 
 9   reviews_text          29936 non-null  object
 10  reviews_title         29936 non-null  object
 11  reviews_userCity      1900 non-null   object
 12  reviews_userProvince  166 non-null    object
 13  reviews_username      29936 non-null  object
 14  user_sentiment        29936 non-null  object
dtypes: int64(1), object(14)
memory usage

In [57]:
df.head()

,id,brand,categories,manufacturer,name,reviews_date,reviews_didPurchase,reviews_doRecommend,reviews_rating,reviews_text,reviews_title,reviews_userCity,reviews_userProvince,reviews_username,user_sentiment
0,AV13O1A8GV-KLJ3akUyj,Universal Music,"Movies, Music & Books,Music,R&b,Movies & TV,Mo...",Universal Music Group / Cash Money,Pink Friday: Roman Reloaded Re-Up (w/dvd),2012-11-30T06:21:45.000Z,,,5,i love this album. it's very good. more to the...,Just Awesome,Los Angeles,,joshua,Positive
1,AV14LG0R-jtxr-f38QfS,Lundberg,"Food,Packaged Foods,Snacks,Crackers,Snacks, Co...",Lundberg,Lundberg Organic Cinnamon Toast Rice Cakes,2017-07-09T00:00:00.000Z,True,,5,Good flavor. This review was collected as part...,Good,,,dorothy w,Positive
2,AV14LG0R-jtxr-f38QfS,Lundberg,"Food,Packaged Foods,Snacks,Crackers,Snacks, Co...",Lundberg,Lundberg Organic Cinnamon Toast Rice Cakes,2017-07-09T00:00:00.000Z,True,,5,Good flavor.,Good,,,dorothy w,Positive
3,AV16khLE-jtxr-f38VFn,K-Y,"Personal Care,Medicine Cabinet,Lubricant/Sperm...",K-Y,K-Y Love Sensuality Pleasure Gel,2016-01-06T00:00:00.000Z,False,False,1,I read through the reviews on here before look...,Disappointed,,,rebecca,Negative
4,AV16khLE-jtxr-f38VFn,K-Y,"Personal Care,Medicine Cabinet,Lubricant/Sperm...",K-Y,K-Y Love Sensuality Pleasure Gel,2016-12-21T00:00:00.000Z,False,False,1,My husband bought this gel for us. The gel cau...,Irritation,,,walker557,Negative


**Apart from name, reviews_rating, reviews_title, reviews_text, reviews_username and user_sentiment, the other columns are not required.**

# Data Preparation

In [58]:
df_master = df[['reviews_username','name','reviews_rating','user_sentiment']].copy()
df_master['reviews'] = df['reviews_title'] + " " + df['reviews_text']
df_master.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29936 entries, 0 to 29999
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   reviews_username  29936 non-null  object
 1   name              29936 non-null  object
 2   reviews_rating    29936 non-null  int64 
 3   user_sentiment    29936 non-null  object
 4   reviews           29936 non-null  object
dtypes: int64(1), object(4)
memory usage: 1.4+ MB


In [59]:
df_master.head()

,reviews_username,name,reviews_rating,user_sentiment,reviews
0,joshua,Pink Friday: Roman Reloaded Re-Up (w/dvd),5,Positive,Just Awesome i love this album. it's very good...
1,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,Good Good flavor. This review was collected as...
2,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,Good Good flavor.
3,rebecca,K-Y Love Sensuality Pleasure Gel,1,Negative,Disappointed I read through the reviews on her...
4,walker557,K-Y Love Sensuality Pleasure Gel,1,Negative,Irritation My husband bought this gel for us. ...


In [60]:
#Remove the Hyperlinks
df_master['reviews'] = df_master['reviews'].apply(lambda x:re.sub(r"http\S+", "", x))

In [61]:
#Remove the numbers
df_master['reviews'] = df_master['reviews'].apply(lambda x:re.sub(r"[0-9]", "", x))

In [62]:
#Remove Punctuations/Special Characters
df_master['reviews'] = df_master['reviews'].apply(lambda x:''.join([i for i in x if i not in string.punctuation]))

In [63]:
#Lower case the text
df_master['reviews'] = df_master['reviews'].str.lower()

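The four text-cleaning cells (hyperlinks, digits, punctuation, lower case) can be folded into a single function and applied once per review. A sketch, assuming the hypothetical name `clean_text`; it removes punctuation with `str.translate`, which avoids building a Python list of characters for every review but deletes the same `string.punctuation` set as the original cell.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Sketch of the four cleaning cells as one function, in the same order."""
    text = re.sub(r"http\S+", "", text)  # drop hyperlinks
    text = re.sub(r"[0-9]", "", text)    # drop digits
    text = text.translate(_PUNCT_TABLE)  # drop punctuation
    return text.lower()                  # lower-case
```

`df_master['reviews'] = df_master['reviews'].apply(clean_text)` would replace the four cells while keeping their order.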
In [64]:
df_master.head()

,reviews_username,name,reviews_rating,user_sentiment,reviews
0,joshua,Pink Friday: Roman Reloaded Re-Up (w/dvd),5,Positive,just awesome i love this album its very good m...
1,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,good good flavor this review was collected as ...
2,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,good good flavor
3,rebecca,K-Y Love Sensuality Pleasure Gel,1,Negative,disappointed i read through the reviews on her...
4,walker557,K-Y Love Sensuality Pleasure Gel,1,Negative,irritation my husband bought this gel for us t...


In [65]:
#Tokenize & Remove the stop words
df_master['reviews'] = df_master['reviews'].apply(word_tokenize)
df_master['reviews'] = df_master['reviews'].apply(lambda x: [i for i in x if i not in stop])
df_master.head()

,reviews_username,name,reviews_rating,user_sentiment,reviews
0,joshua,Pink Friday: Roman Reloaded Re-Up (w/dvd),5,Positive,"[awesome, love, album, good, hip, hop, side, c..."
1,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,"[good, good, flavor, review, collected, part, ..."
2,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,"[good, good, flavor]"
3,rebecca,K-Y Love Sensuality Pleasure Gel,1,Negative,"[disappointed, read, reviews, looking, buying,..."
4,walker557,K-Y Love Sensuality Pleasure Gel,1,Negative,"[irritation, husband, bought, gel, us, gel, ca..."


### Lemmatization & Stemming

In [66]:
#Converting tokenized reviews back to strings
df_master['reviews'] = df_master['reviews'].apply(lambda x: " ".join(x))
df_master.head()

,reviews_username,name,reviews_rating,user_sentiment,reviews
0,joshua,Pink Friday: Roman Reloaded Re-Up (w/dvd),5,Positive,awesome love album good hip hop side current p...
1,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,good good flavor review collected part promotion
2,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,good good flavor
3,rebecca,K-Y Love Sensuality Pleasure Gel,1,Negative,disappointed read reviews looking buying one c...
4,walker557,K-Y Love Sensuality Pleasure Gel,1,Negative,irritation husband bought gel us gel caused ir...


In [67]:
#Tokenize and Lemmatize
def lemmatize_text(text):
    return [lemmatizer.lemmatize(w) for w in w_tokenizer.tokenize(text)]

df_master['reviews'] = df_master.reviews.apply(lemmatize_text)
df_master.head()

,reviews_username,name,reviews_rating,user_sentiment,reviews
0,joshua,Pink Friday: Roman Reloaded Re-Up (w/dvd),5,Positive,"[awesome, love, album, good, hip, hop, side, c..."
1,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,"[good, good, flavor, review, collected, part, ..."
2,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,"[good, good, flavor]"
3,rebecca,K-Y Love Sensuality Pleasure Gel,1,Negative,"[disappointed, read, review, looking, buying, ..."
4,walker557,K-Y Love Sensuality Pleasure Gel,1,Negative,"[irritation, husband, bought, gel, u, gel, cau..."


In [68]:
#Converting tokenized reviews back to strings
df_master['reviews'] = df_master['reviews'].apply(lambda x: " ".join(x))

In [69]:
#Tokenize and Stem
def stemming_text(text):
    return [stemmer.stem(w) for w in w_tokenizer.tokenize(text)]

df_master['reviews'] = df_master.reviews.apply(stemming_text)
df_master.head()

,reviews_username,name,reviews_rating,user_sentiment,reviews
0,joshua,Pink Friday: Roman Reloaded Re-Up (w/dvd),5,Positive,"[awesom, love, album, good, hip, hop, side, cu..."
1,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,"[good, good, flavor, review, collect, part, pr..."
2,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,"[good, good, flavor]"
3,rebecca,K-Y Love Sensuality Pleasure Gel,1,Negative,"[disappoint, read, review, look, buy, one, cou..."
4,walker557,K-Y Love Sensuality Pleasure Gel,1,Negative,"[irrit, husband, bought, gel, u, gel, caus, ir..."


In [70]:
#Converting tokenized reviews back to strings
df_master['reviews'] = df_master['reviews'].apply(lambda x: " ".join(x))

# Data Modeling

In [71]:
df_master.head()

,reviews_username,name,reviews_rating,user_sentiment,reviews
0,joshua,Pink Friday: Roman Reloaded Re-Up (w/dvd),5,Positive,awesom love album good hip hop side current po...
1,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,good good flavor review collect part promot
2,dorothy w,Lundberg Organic Cinnamon Toast Rice Cakes,5,Positive,good good flavor
3,rebecca,K-Y Love Sensuality Pleasure Gel,1,Negative,disappoint read review look buy one coupl lubr...
4,walker557,K-Y Love Sensuality Pleasure Gel,1,Negative,irrit husband bought gel u gel caus irrit felt...


In [72]:
X=df_master['reviews'].copy()
y=df_master['user_sentiment'].copy()
X.head()

0    awesom love album good hip hop side current po...
1          good good flavor review collect part promot
2                                     good good flavor
3    disappoint read review look buy one coupl lubr...
4    irrit husband bought gel u gel caus irrit felt...
Name: reviews, dtype: object

In [73]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=77)

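Because the labels are imbalanced (roughly 89% Positive), a purely random split can leave the train and test sets with slightly different class proportions. Passing `stratify` to `train_test_split` keeps those proportions equal. The sketch below uses toy labels and hypothetical variable names (`X_demo`, `y_demo`) so it does not overwrite the notebook's `X` and `y`; stratification is an option here, not what this notebook did.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels with roughly the same imbalance as user_sentiment (89% Positive).
y_demo = np.array(["Positive"] * 89 + ["Negative"] * 11)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=77, stratify=y_demo  # preserve class ratios
)

train_share = np.mean(y_tr == "Positive")
test_share = np.mean(y_te == "Positive")
print(f"Positive share  train={train_share:.3f}  test={test_share:.3f}")
```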
In [74]:
#TF-IDF Vectorizer
word_vectorizer = TfidfVectorizer(
    sublinear_tf=True,
    strip_accents='unicode',
    analyzer='word',
    token_pattern=r'\w{1,}',
    stop_words='english',
    ngram_range=(1,1),
    max_features=10000)
word_vectorizer.fit(X_train)
X_train = word_vectorizer.transform(X_train)

#Save the pickle file
with open('output/word_vec.pkl', 'wb') as f:
    pickle.dump(word_vectorizer, f)

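With `sublinear_tf=True` and scikit-learn's defaults (`smooth_idf=True`, `norm='l2'`), each weight is (1 + ln tf) × (ln((1 + n) / (1 + df)) + 1), where n is the number of documents and df the number of documents containing the term; every row is then scaled to unit length. The NumPy sketch below spells that formula out on a toy corpus. Whitespace tokenisation and the absence of stop-word removal are simplifications, so it mirrors the weighting only, not the vectorizer's full preprocessing; the helper name `tfidf_sublinear` is hypothetical.

```python
import numpy as np


def tfidf_sublinear(docs):
    """Sublinear TF-IDF with smoothed IDF and L2 row normalisation (sketch)."""
    tokenised = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})  # alphabetical, like scikit-learn
    index = {tok: j for j, tok in enumerate(vocab)}

    # Raw term counts: one row per document, one column per vocabulary term.
    counts = np.zeros((len(docs), len(vocab)))
    for i, toks in enumerate(tokenised):
        for tok in toks:
            counts[i, index[tok]] += 1

    # Sublinear tf: 1 + ln(tf) where the term occurs, 0 elsewhere.
    nonzero = counts > 0
    tf = np.zeros_like(counts)
    tf[nonzero] = 1 + np.log(counts[nonzero])

    # Smoothed idf: ln((1 + n) / (1 + df)) + 1.
    n_docs = len(docs)
    doc_freq = nonzero.sum(axis=0)
    idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1

    weights = tf * idf
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave empty documents as all-zero rows
    return vocab, weights / norms


vocab, weights = tfidf_sublinear(["good good flavor", "good review"])
print(vocab)  # ['flavor', 'good', 'review']
```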
In [75]:
counter = Counter(y_train)
print('Before',counter)
# oversampling the train dataset using SMOTE
smt = SMOTE()
X_train_sm, y_train_sm = smt.fit_resample(X_train, y_train)

counter = Counter(y_train_sm)
print('After',counter)

Before Counter({'Positive': 18601, 'Negative': 2354})
After Counter({'Positive': 18601, 'Negative': 18601})

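SMOTE balances the classes by creating synthetic minority samples instead of duplicating existing ones: for a minority point x_i it picks one of its k nearest minority-class neighbours x_nn and places a new point on the segment between them, x_new = x_i + λ(x_nn − x_i) with λ drawn uniformly from [0, 1). The NumPy sketch below illustrates that interpolation step on dense toy data. It is a simplified illustration with a hypothetical helper name (`smote_sample`), not imbalanced-learn's implementation, which also handles the sparse TF-IDF input used here.

```python
import numpy as np


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic minority sample by SMOTE-style interpolation (sketch)."""
    rng = np.random.default_rng(rng)
    i = rng.integers(len(minority))
    x_i = minority[i]

    # Euclidean distances from x_i to every minority point, excluding x_i itself.
    dists = np.linalg.norm(minority - x_i, axis=1)
    dists[i] = np.inf
    neighbours = np.argsort(dists)[:k]
    x_nn = minority[rng.choice(neighbours)]

    lam = rng.random()  # uniform in [0, 1)
    return x_i + lam * (x_nn - x_i)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies.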

### Random Forest

In [76]:
#Random Forest Model
classifier = RandomForestClassifier()
classifier.fit(X_train_sm, y_train_sm)

#Save Pickle File
filename = 'output/randomforest_model.pkl'
with open(filename, 'wb') as f:
    pickle.dump(classifier, f)

In [77]:
#Random Forest Model Prediction
filename = 'output/randomforest_model.pkl'
with open(filename, 'rb') as f:
    loaded_rfmodel = pickle.load(f)
r_pred = loaded_rfmodel.predict(word_vectorizer.transform(X_test))

In [78]:
#Random Forest Model Accuracy
rf_accuracy = accuracy_score(y_test, r_pred)
rf_accuracy

0.9059124819062465

### SVM

In [79]:
#SVM Model
model = LinearSVC()
model.fit(X_train_sm, y_train_sm)

LinearSVC()

In [80]:
#SVM Model Prediction
svm_pred = model.predict(word_vectorizer.transform(X_test))

In [81]:
#SVM Model Accuracy
rf_accuracy_svm = accuracy_score(y_test, svm_pred)
rf_accuracy_svm

0.8641576661841666

### XGBoost

In [82]:
#XGBoost Model Building
XGB = XGBClassifier(learning_rate=0.05,max_depth=5)
XGB.fit(X_train_sm,y_train_sm)



XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.05, max_delta_step=0, max_depth=5,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=8, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [83]:
#XGBoost Model Prediction
XGB_pred = XGB.predict(word_vectorizer.transform(X_test))

In [84]:
#XGBoost Model Accuracy
rf_accuracy_xgb = accuracy_score(y_test, XGB_pred)
rf_accuracy_xgb

0.8413317002560962

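**Random Forest is the best of the three models on this evaluation, with a test accuracy of about 0.906, compared with 0.864 for LinearSVC and 0.841 for XGBoost. Since roughly 89% of the reviews are Positive, a model that always predicts Positive would already score close to 0.89, so these accuracy figures should be read against that baseline.**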

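Accuracy on its own hides how well the minority Negative class is handled. A minimal sketch of a fuller summary, assuming the hypothetical helper name `summarise`: it reports accuracy, the majority-class baseline, and macro-averaged F1, which gives the Negative class the same weight as the Positive one. The toy predictions come from a model that always answers Positive.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def summarise(y_true, y_pred):
    """Report accuracy, the majority-class baseline, and macro F1 (sketch)."""
    y_true = np.asarray(y_true)
    labels, counts = np.unique(y_true, return_counts=True)
    majority = labels[np.argmax(counts)]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "majority_baseline": float(np.mean(y_true == majority)),
        # zero_division=0 scores a never-predicted class as 0 instead of warning.
        "macro_f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
    }


toy_true = ["Positive"] * 9 + ["Negative"]
toy_pred = ["Positive"] * 10
print(summarise(toy_true, toy_pred))
```

The always-Positive model matches the baseline on accuracy but gets a macro F1 below 0.5. For the trained models, `summarise(y_test, r_pred)` (and likewise `svm_pred`, `XGB_pred`) would give comparable numbers.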
# Recommendation System

In [85]:
df_recommend = df_master.copy()
df_recommend = df_recommend[['name', 'reviews_username', 'reviews_rating']]
df_recommend.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29936 entries, 0 to 29999
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   name              29936 non-null  object
 1   reviews_username  29936 non-null  object
 2   reviews_rating    29936 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 935.5+ KB


In [86]:
#Average the ratings when the same user reviewed the same product more than once
df_recommend = df_recommend.groupby(by=["reviews_username","name"]).mean().reset_index()
df_recommend.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27588 entries, 0 to 27587
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   reviews_username  27588 non-null  object 
 1   name              27588 non-null  object 
 2   reviews_rating    27588 non-null  float64
dtypes: float64(1), object(2)
memory usage: 646.7+ KB


In [87]:
#Splitting the Data into train and test
train, test = train_test_split(df_recommend, test_size=0.30, random_state=43)
print(train.shape)
print(test.shape)

(19311, 3)
(8277, 3)


In [88]:
df_pivot = train.pivot(
    index='reviews_username',
    columns='name',
    values='reviews_rating'
).fillna(0)

df_pivot.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,"2x Ultra Era with Oxi Booster, 50fl oz",4C Grated Parmesan Cheese 100% Natural 8oz Shaker,Africa's Best No-Lye Dual Conditioning Relaxer System Super,Alberto VO5 Salon Series Smooth Plus Sleek Shampoo,Alex Cross (dvdvideo),"All,bran Complete Wheat Flakes, 18 Oz.",Ambi Complexion Cleansing Bar,...,Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Wallmount Server Cabinet (450mm, 9 RU)","Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee","WeatherTech 40647 14-15 Outlander Cargo Liners Behind 2nd Row, Black",Weleda Everon Lip Balm,Wilton Black Dots Standard Baking Cups,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash,Yes To Grapefruit Rejuvenating Body Wash
reviews_username
02dakota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
06stidriver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
08dallas,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
09mommy11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [89]:
print('df_pivot :' + str(df_pivot.shape))
print('unique users :' + str(train['reviews_username'].nunique()))
print('unique products :' + str(train['name'].nunique()))

df_pivot :(17838, 248)
unique users :17838
unique products :248


In [90]:
# Copy the train dataset into dummy_train
dummy_train = train.copy()
dummy_train.head()

,reviews_username,name,reviews_rating
22234,ruth,Clorox Disinfecting Wipes Value Pack Scented 1...,5.0
24652,superd,Tostitos Bite Size Tortilla Chips,5.0
15512,lockwood,Godzilla 3d Includes Digital Copy Ultraviolet ...,5.0
22876,sdonovan724,Clorox Disinfecting Wipes Value Pack Scented 1...,5.0
18386,monica,Clear Scalp & Hair Therapy Total Care Nourishi...,2.0


In [91]:
# Rated products are marked 0; unrated products (filled with 1 after pivoting) are the ones to predict.
dummy_train['reviews_rating'] = dummy_train['reviews_rating'].apply(lambda x: 0 if x>=1 else 1)

In [92]:
# Convert the dummy train dataset into matrix format.
dummy_train = dummy_train.pivot(
    index='reviews_username',
    columns='name',
    values='reviews_rating'
).fillna(1)
dummy_train.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,"2x Ultra Era with Oxi Booster, 50fl oz",4C Grated Parmesan Cheese 100% Natural 8oz Shaker,Africa's Best No-Lye Dual Conditioning Relaxer System Super,Alberto VO5 Salon Series Smooth Plus Sleek Shampoo,Alex Cross (dvdvideo),"All,bran Complete Wheat Flakes, 18 Oz.",Ambi Complexion Cleansing Bar,...,Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Wallmount Server Cabinet (450mm, 9 RU)","Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee","WeatherTech 40647 14-15 Outlander Cargo Liners Behind 2nd Row, Black",Weleda Everon Lip Balm,Wilton Black Dots Standard Baking Cups,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash,Yes To Grapefruit Rejuvenating Body Wash
reviews_username
02dakota,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
02deuce,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
06stidriver,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
08dallas,1.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
09mommy11,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0

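`dummy_train` acts as a mask: 1 where the user has not rated a product (a candidate for recommendation) and 0 where they have. Since `train.pivot(...)` leaves unrated cells as NaN, the same mask can be built with `isna()`. A small sketch on hypothetical toy data:

```python
import pandas as pd

toy_ratings = pd.DataFrame(
    {
        "reviews_username": ["ann", "ann", "bob"],
        "name": ["Chips", "Wipes", "Chips"],
        "reviews_rating": [5.0, 3.0, 4.0],
    }
)

toy_pivot = toy_ratings.pivot(index="reviews_username", columns="name", values="reviews_rating")

# 1.0 where the user has not rated the product, 0.0 where a rating exists.
toy_mask = toy_pivot.isna().astype(float)
print(toy_mask)
```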

### Adjusted Cosine similarity

In [93]:
# Create a user-product matrix.
df_pivot = train.pivot(
    index='reviews_username',
    columns='name',
    values='reviews_rating'
)
df_pivot.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,"2x Ultra Era with Oxi Booster, 50fl oz",4C Grated Parmesan Cheese 100% Natural 8oz Shaker,Africa's Best No-Lye Dual Conditioning Relaxer System Super,Alberto VO5 Salon Series Smooth Plus Sleek Shampoo,Alex Cross (dvdvideo),"All,bran Complete Wheat Flakes, 18 Oz.",Ambi Complexion Cleansing Bar,...,Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Wallmount Server Cabinet (450mm, 9 RU)","Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee","WeatherTech 40647 14-15 Outlander Cargo Liners Behind 2nd Row, Black",Weleda Everon Lip Balm,Wilton Black Dots Standard Baking Cups,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash,Yes To Grapefruit Rejuvenating Body Wash
reviews_username
02dakota,,,,,,,,,,,...,,,,,,,,,,
02deuce,,,,,,,,,,,...,,,,,,,,,,
06stidriver,,,,,,,,,,,...,,,,,,,,,,
08dallas,,5.0,,,,,,,,,...,,,,,,,,,,
09mommy11,,,,,,,,,,,...,,,,,,,,,,


In [94]:
#Normalising each user's product ratings around a 0 mean
mean = np.nanmean(df_pivot, axis=1)
df_subtracted = (df_pivot.T-mean).T
df_subtracted.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,"2x Ultra Era with Oxi Booster, 50fl oz",4C Grated Parmesan Cheese 100% Natural 8oz Shaker,Africa's Best No-Lye Dual Conditioning Relaxer System Super,Alberto VO5 Salon Series Smooth Plus Sleek Shampoo,Alex Cross (dvdvideo),"All,bran Complete Wheat Flakes, 18 Oz.",Ambi Complexion Cleansing Bar,...,Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Wallmount Server Cabinet (450mm, 9 RU)","Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee","WeatherTech 40647 14-15 Outlander Cargo Liners Behind 2nd Row, Black",Weleda Everon Lip Balm,Wilton Black Dots Standard Baking Cups,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash,Yes To Grapefruit Rejuvenating Body Wash
reviews_username
02dakota,,,,,,,,,,,...,,,,,,,,,,
02deuce,,,,,,,,,,,...,,,,,,,,,,
06stidriver,,,,,,,,,,,...,,,,,,,,,,
08dallas,,0.0,,,,,,,,,...,,,,,,,,,,
09mommy11,,,,,,,,,,,...,,,,,,,,,,



In [96]:
# Create the user similarity matrix using the pairwise_distances function.
user_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric='cosine')
user_correlation[np.isnan(user_correlation)] = 0
print(user_correlation)

[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


In [97]:
user_correlation.shape

(17838, 17838)

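Most entries printed above are 0, and the data explain why: the training pivot has 17,838 users but only 19,311 ratings, so at most 1,473 users can have more than one rating, and at least 16,365 users (about 92%) rated a single product. After mean-centring, such a user's row is all zeros and cannot be similar to anyone; the same happens to a user who gave every product the same rating. The NumPy sketch below reproduces user-user adjusted cosine similarity on a toy matrix (NaN = unrated) to show the effect; the helper name `adjusted_cosine` is hypothetical.

```python
import numpy as np


def adjusted_cosine(ratings):
    """User-user adjusted cosine similarity; NaN marks an unrated cell (sketch)."""
    means = np.nanmean(ratings, axis=1, keepdims=True)
    centred = np.nan_to_num(ratings - means)         # unrated cells become 0 after centring
    norms = np.linalg.norm(centred, axis=1, keepdims=True)
    unit = centred / np.where(norms == 0, 1.0, norms)  # zero rows stay zero, no 0/0
    return unit @ unit.T


toy = np.array([
    [5.0, 3.0, np.nan],     # user A: two ratings, non-zero centred vector
    [4.0, 2.0, np.nan],     # user B: same pattern as A
    [np.nan, 5.0, np.nan],  # user C: one rating, centred vector is all zeros
])
print(adjusted_cosine(toy))
```

Users A and B come out perfectly similar, while user C's row and column are all zeros, mirroring the mostly-zero matrix above.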
## Predicting for User-User Based Recommendation

In [98]:
# Ignore negatively correlated users by clipping their similarity to 0
user_correlation[user_correlation<0]=0
user_correlation

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [99]:
user_predicted_ratings = np.dot(user_correlation, df_pivot.fillna(0))
user_predicted_ratings

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [100]:
user_predicted_ratings.shape

(17838, 248)

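`np.dot(user_correlation, df_pivot.fillna(0))` produces a similarity-weighted sum of other users' ratings rather than a value on the 1 to 5 rating scale, which is why scores such as 30.4 appear in the top-20 output further down. The sum works for ranking but favours products that many similar users happened to rate. A common alternative, sketched here for comparison only (it is not what this notebook uses), divides by the total similarity of the users who actually rated each product. The helper name `predict_scores` is hypothetical.

```python
import numpy as np


def predict_scores(similarity, ratings):
    """Weighted-sum and normalised weighted-average predictions (sketch).

    similarity: (n_users, n_users) non-negative user-user similarity.
    ratings:    (n_users, n_items) ratings, with 0 meaning "not rated".
    """
    weighted_sum = similarity @ ratings          # what the notebook computes
    rated = (ratings > 0).astype(float)
    sim_totals = similarity @ rated              # total similarity of the raters, per item
    with np.errstate(divide="ignore", invalid="ignore"):
        weighted_avg = np.where(sim_totals > 0, weighted_sum / sim_totals, 0.0)
    return weighted_sum, weighted_avg


toy_sim = np.array([[0.0, 0.8, 0.4], [0.8, 0.0, 0.0], [0.4, 0.0, 0.0]])
toy_ratings = np.array([[0.0, 0.0], [5.0, 4.0], [3.0, 0.0]])
ws, wa = predict_scores(toy_sim, toy_ratings)
print(ws[0], wa[0])
```

For user 0 the weighted sums are about 5.2 and 3.2, while the normalised averages (about 4.33 and 4.0) stay on the rating scale.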
In [101]:
user_final_rating = np.multiply(user_predicted_ratings,dummy_train)
user_final_rating.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,"2x Ultra Era with Oxi Booster, 50fl oz",4C Grated Parmesan Cheese 100% Natural 8oz Shaker,Africa's Best No-Lye Dual Conditioning Relaxer System Super,Alberto VO5 Salon Series Smooth Plus Sleek Shampoo,Alex Cross (dvdvideo),"All,bran Complete Wheat Flakes, 18 Oz.",Ambi Complexion Cleansing Bar,...,Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Wallmount Server Cabinet (450mm, 9 RU)","Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee","WeatherTech 40647 14-15 Outlander Cargo Liners Behind 2nd Row, Black",Weleda Everon Lip Balm,Wilton Black Dots Standard Baking Cups,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash,Yes To Grapefruit Rejuvenating Body Wash
reviews_username
02dakota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
06stidriver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
08dallas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
09mommy11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Recommending top 20

In [102]:
# Take the user ID as input.
#user_input = str(input("Enter your user name"))
user_input = '1234'
print(user_input)

1234


In [103]:
user_final_rating.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,"2x Ultra Era with Oxi Booster, 50fl oz",4C Grated Parmesan Cheese 100% Natural 8oz Shaker,Africa's Best No-Lye Dual Conditioning Relaxer System Super,Alberto VO5 Salon Series Smooth Plus Sleek Shampoo,Alex Cross (dvdvideo),"All,bran Complete Wheat Flakes, 18 Oz.",Ambi Complexion Cleansing Bar,...,Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Wallmount Server Cabinet (450mm, 9 RU)","Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee","WeatherTech 40647 14-15 Outlander Cargo Liners Behind 2nd Row, Black",Weleda Everon Lip Balm,Wilton Black Dots Standard Baking Cups,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash,Yes To Grapefruit Rejuvenating Body Wash
02dakota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
06stidriver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
08dallas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
09mommy11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [104]:
# Rank this user's predicted scores and keep the top 20 products
df_recommend_uu = pd.DataFrame(user_final_rating.loc[user_input].sort_values(ascending=False)[0:20])
df_recommend_uu

name,1234
Planes: Fire Rescue (2 Discs) (includes Digital Copy) (blu-Ray/dvd),30.396275
The Resident Evil Collection 5 Discs (blu-Ray),26.279946
My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Digital),19.498721
Red (special Edition) (dvdvideo),16.986419
Tostitos Bite Size Tortilla Chips,14.468224
Clorox Disinfecting Bathroom Cleaner,10.838323
Jason Aldean - They Don't Know,9.742555
"Coty Airspun Face Powder, Translucent Extra Coverage",6.947049
Alex Cross (dvdvideo),5.882843
Chester's Cheese Flavored Puffcorn Snacks,5.405528


In [105]:
recommended_uu = df_recommend_uu.index.to_list()
recommended_uu

['Planes: Fire Rescue (2 Discs) (includes Digital Copy) (blu-Ray/dvd)',
 'The Resident Evil Collection 5 Discs (blu-Ray)',
 'My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Digital)',
 'Red (special Edition) (dvdvideo)',
 'Tostitos Bite Size Tortilla Chips',
 'Clorox Disinfecting Bathroom Cleaner',
 "Jason Aldean - They Don't Know",
 'Coty Airspun Face Powder, Translucent Extra Coverage',
 'Alex Cross (dvdvideo)',
 "Chester's Cheese Flavored Puffcorn Snacks",
 '100:Complete First Season (blu-Ray)',
 'Hormel Chili, No Beans',
 "Burt's Bees Lip Shimmer, Raisin",
 'Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter',
 "Cheetos Crunchy Flamin' Hot Cheese Flavored Snacks",
 'Dark Shadows (includes Digital Copy) (ultraviolet) (dvdvideo)',
 'Windex Original Glass Cleaner Refill 67.6oz (2 Liter)',
 'Cuisinart174 Electric Juicer - Stainless Steel Cje-1000',
 'Pleasant Hearth Diamond Fireplace Screen - Espresso',
 'Beanitos Bean Chips, Simply Pinto Bean']
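
The "Recommending top 20" steps for both recommenders repeat the same lookup-and-sort, so a small helper could wrap it for reuse (for example in the Flask app). This is a sketch, not the notebook's code: `top_n_recommendations` is a hypothetical name, an unknown user returns an empty list instead of raising `KeyError`, and zero-scored products (already rated, or with no similar neighbours) are dropped, so fewer than `n` names may come back.

```python
import pandas as pd


def top_n_recommendations(final_rating: pd.DataFrame, user: str, n: int = 20) -> list:
    """Return up to `n` product names with the highest scores for `user`.

    Assumes `final_rating` is a users x products DataFrame in which products the
    user already rated have been masked to 0 (as in user_final_rating).
    """
    if user not in final_rating.index:
        # Unknown user: no collaborative-filtering signal, so nothing to recommend
        return []
    scores = final_rating.loc[user]
    # Drop zero scores (already rated, or no similar neighbours), then rank descending
    scores = scores[scores > 0].sort_values(ascending=False)
    return scores.head(n).index.to_list()
```

Under these assumptions, `top_n_recommendations(user_final_rating, '1234')` is intended to return the same names as `recommended_uu`, apart from any zero-scored products at the tail of the list.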

## Evaluation for User-User based recommendation

In [106]:
# Keep the test rows whose user also appears in the train set.
common = test[test.reviews_username.isin(train.reviews_username)]
common.shape

(1076, 3)

In [107]:
common.head()

Unnamed: 0,reviews_username,name,reviews_rating
25745,timothy,My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Di...,5.0
14641,lance,Godzilla 3d Includes Digital Copy Ultraviolet ...,4.0
11481,jean,Clorox Disinfecting Wipes Value Pack Scented 1...,5.0
24661,supergirl,L'or233al Paris Elvive Extraordinary Clay Reba...,4.0
18637,mrfrost,Mike Dave Need Wedding Dates (dvd + Digital),5.0


In [108]:
# Convert into a user x product rating matrix.
common_user_based_matrix = common.pivot_table(index='reviews_username', columns='name', values='reviews_rating')

In [109]:
common_user_based_matrix.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),"42 Dual Drop Leaf Table with 2 Madrid Chairs""",Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Australian Gold Exotic Blend Lotion, SPF 4","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter","BRIDGESTONE 130/70ZR18M/C(63W)FRONT EXEDRA G851, CRUISER RADL","Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",...,Toy Story Kids' Woody Accessory Kit,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
1234,,,,,,,,,,,...,,,,,,,,,,
123charlie,,,,,,,,,,,...,,,,,,,,,,
1witch,,,,,,,,,,,...,,,,,,,,,,
37f5p,,5.0,,,,,,,,,...,,,,,,,,,,
50cal,,,,,,,,,,,...,,,,,,,,,,


In [110]:
common_user_based_matrix.shape

(915, 113)

In [111]:
# Convert the user_correlation matrix into dataframe.
user_correlation_df = pd.DataFrame(user_correlation)

In [112]:
user_correlation_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,17828,17829,17830,17831,17832,17833,17834,17835,17836,17837
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [113]:
user_correlation_df.shape

(17838, 17838)

In [114]:
df_subtracted.shape

(17838, 248)

In [115]:
user_correlation_df['reviews_username'] = df_subtracted.index

user_correlation_df.set_index('reviews_username',inplace=True)
user_correlation_df.head()

reviews_username,0,1,2,3,4,5,6,7,8,9,...,17828,17829,17830,17831,17832,17833,17834,17835,17836,17837
02dakota,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
02deuce,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
06stidriver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
08dallas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
09mommy11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [116]:
user_correlation_df.shape

(17838, 17838)

In [117]:
common.head(3)

Unnamed: 0,reviews_username,name,reviews_rating
25745,timothy,My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Di...,5.0
14641,lance,Godzilla 3d Includes Digital Copy Ultraviolet ...,4.0
11481,jean,Clorox Disinfecting Wipes Value Pack Scented 1...,5.0


In [118]:
list_name = common.reviews_username.tolist()
user_correlation_df.columns = df_subtracted.index.tolist()
user_correlation_df_1 =  user_correlation_df[user_correlation_df.index.isin(list_name)]

In [119]:
user_correlation_df_2 = user_correlation_df_1.T[user_correlation_df_1.T.index.isin(list_name)]
user_correlation_df_3 = user_correlation_df_2.T
user_correlation_df_3.head()

reviews_username,1234,123charlie,1witch,37f5p,50cal,abbey,abby,acv4217,adam,aep1010,...,woody,woottos,xmom,xstr8edgex,yohnie1,yummy,yvonne,zipper,zippy,zitro
1234,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.288675,0.0
123charlie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1witch,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
37f5p,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50cal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [120]:
user_correlation_df_3.shape

(915, 915)
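
Cells 118 and 119 keep the rows for common users, transpose, keep the rows again and transpose back. Because the relabelled `user_correlation_df` carries the same usernames on both axes, one boolean mask applied to rows and columns expresses the same filter. A minimal sketch, with `restrict_similarity` as a hypothetical helper name:

```python
import pandas as pd


def restrict_similarity(sim: pd.DataFrame, keep) -> pd.DataFrame:
    """Keep only the rows and columns of a square similarity matrix whose labels are in `keep`.

    Assumes `sim.index` and `sim.columns` hold the same labels in the same order.
    A boolean mask preserves the matrix's own label order and ignores duplicates in `keep`.
    """
    mask = sim.index.isin(keep)
    return sim.loc[mask, mask]
```

Assuming labels match on both axes, `restrict_similarity(user_correlation_df, list_name)` should correspond to `user_correlation_df_3`, and the same helper would apply to `item_correlation_df` in the item-item evaluation.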

In [121]:
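# Clip negative similarities to 0, then score each (user, product) pair as a
# similarity-weighted sum of the common users' ratings (unrated = 0)
user_correlation_df_3[user_correlation_df_3<0]=0
common_user_predicted_ratings = np.dot(user_correlation_df_3, common_user_based_matrix.fillna(0))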
common_user_predicted_ratings

array([[0.        , 3.11004234, 0.        , ..., 0.        , 0.04848811,
        1.15470054],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 5.38675135, 0.        , ..., 0.        , 2.90949434,
        2.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])
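
Cell 121 scores each (user, product) pair as predicted[u, i] = sum over users v of sim[u, v] * rating[v, i], with unrated entries filled with 0 and negative similarities clipped to 0. The toy matrices below are made up purely to show that arithmetic, together with the masking step of cells 123 and 124:

```python
import numpy as np

# Toy user-user similarity matrix (3 users), negatives already clipped to 0
sim = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
])

# Toy ratings matrix (3 users x 2 products); 0 means "not rated" (as after fillna(0))
ratings = np.array([
    [5.0, 0.0],
    [4.0, 2.0],
    [0.0, 3.0],
])

# Same operation as cell 121: predicted[u, i] = sum_v sim[u, v] * ratings[v, i]
predicted = np.dot(sim, ratings)

# Masking step (the role of dummy_test): keep only entries the user actually rated,
# so each prediction can be compared with a true rating
rated = (ratings > 0).astype(float)
evaluated = predicted * rated
```

Because the scores are unnormalised sums, they can exceed 5 (user 0 gets 7.0 for product 0 here), which is why the notebook rescales them to the 1-5 range with `MinMaxScaler` before computing RMSE.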

In [122]:
print(common_user_predicted_ratings.shape)
print(common.shape)

(915, 113)
(1076, 3)


In [123]:
# Binary indicator: 1 where the user rated the product in the test set, 0 elsewhere
dummy_test = common.copy()

dummy_test['reviews_rating'] = dummy_test['reviews_rating'].apply(lambda x: 1 if x>=1 else 0)

dummy_test = dummy_test.pivot_table(index='reviews_username', columns='name', values='reviews_rating').fillna(0)

In [124]:
# Keep predictions only for (user, product) pairs that have a true test rating
common_user_predicted_ratings = np.multiply(common_user_predicted_ratings,dummy_test)

In [125]:
common_user_predicted_ratings.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),"42 Dual Drop Leaf Table with 2 Madrid Chairs""",Alex Cross (dvdvideo),"Aussie Aussome Volume Shampoo, 13.5 Oz","Australian Gold Exotic Blend Lotion, SPF 4","Aveeno Baby Continuous Protection Lotion Sunscreen with Broad Spectrum SPF 55, 4oz","Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter","BRIDGESTONE 130/70ZR18M/C(63W)FRONT EXEDRA G851, CRUISER RADL","Banana Boat Sunless Summer Color Self Tanning Lotion, Light To Medium",...,Toy Story Kids' Woody Accessory Kit,Tresemme Kertatin Smooth Infusing Conditioning,Vaseline Intensive Care Healthy Hands Stronger Nails,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee",Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash
1234,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
123charlie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1witch,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
37f5p,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50cal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Calculating RMSE

In [126]:
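from sklearn.preprocessing import MinMaxScaler
from numpy import *  # note: this star import shadows Python built-ins such as sum, any and all

# Keep only positive predictions (the masked, actually-rated entries); all other cells become NaN
X = common_user_predicted_ratings.copy()
X = X[X > 0]

# Rescale each column to the 1-5 rating scale (MinMaxScaler works column-wise and ignores NaNs when fitting)
scaler = MinMaxScaler(feature_range=(1, 5))
print(scaler.fit(X))
y = (scaler.transform(X))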

print(y)

MinMaxScaler(feature_range=(1, 5))
[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]
  data_min = np.nanmin(X, axis=0)
  data_max = np.nanmax(X, axis=0)


In [127]:
common_ = common.pivot_table(index='reviews_username', columns='name', values='reviews_rating')

In [128]:
# Count the scored entries (non-NaN values in y)
total_non_nan = np.count_nonzero(~np.isnan(y))

In [129]:
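# RMSE over the scored entries; np.nansum skips the NaN cells explicitly
# (the built-in sum() would iterate over the DataFrame's column labels instead)
rmse = np.sqrt(np.nansum((common_.to_numpy() - y) ** 2) / total_non_nan)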
print(rmse)

2.575502169496468
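
The RMSE above is taken over the entries that have both a true test rating and a scaled prediction. A reusable helper that makes this masking explicit, so the same code serves both the user-user and item-item evaluations, might look like the sketch below (`masked_rmse` is a hypothetical name):

```python
import numpy as np


def masked_rmse(actual, predicted) -> float:
    """RMSE over the cells where both `actual` and `predicted` are non-NaN.

    Both inputs are array-likes of the same shape (DataFrames are converted
    with np.asarray). NaN marks "no rating" or "no prediction".
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = ~np.isnan(actual) & ~np.isnan(predicted)
    if not mask.any():
        raise ValueError("no overlapping non-NaN cells to evaluate")
    diff = actual[mask] - predicted[mask]
    return float(np.sqrt(np.mean(diff ** 2)))
```

`masked_rmse(common_, y)` should reproduce the value above, assuming every non-NaN entry of `y` lies on a rated cell of `common_`, which the `dummy_test` masking is meant to guarantee.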


## Predicting for Item-Item based recommendation

In [130]:
df_pivot = train.pivot(
    index='reviews_username',
    columns='name',
    values='reviews_rating'
).T
df_pivot.head()

reviews_username,02dakota,02deuce,06stidriver,08dallas,09mommy11,1.11E+24,1085,10ten,11111111aaaaaaaaaaaaaaaaa,11677j,...,zsarah,zsazsa,zulaa118,zuttle,zwithanx,zxcsdfd,zxjki,zyiah4,zzdiane,zzz1127
0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,,,,,,,,,,,...,,,,,,,,,,
100:Complete First Season (blu-Ray),,,,5.0,,,,,,,...,,,,,,,,,,
2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,,,,,,,,,,,...,,,,,,,,,,
"2x Ultra Era with Oxi Booster, 50fl oz",,,,,,,,,,,...,,,,,,,,,,
4C Grated Parmesan Cheese 100% Natural 8oz Shaker,,,,,,,,,,,...,,,,,,,,,,


In [131]:
# Centre each product's ratings on that product's mean rating (NaNs are ignored)
mean = np.nanmean(df_pivot, axis=1)
df_subtracted = (df_pivot.T-mean).T
df_subtracted.head()

reviews_username,02dakota,02deuce,06stidriver,08dallas,09mommy11,1.11E+24,1085,10ten,11111111aaaaaaaaaaaaaaaaa,11677j,...,zsarah,zsazsa,zulaa118,zuttle,zwithanx,zxcsdfd,zxjki,zyiah4,zzdiane,zzz1127
0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,,,,,,,,,,,...,,,,,,,,,,
100:Complete First Season (blu-Ray),,,,0.268421,,,,,,,...,,,,,,,,,,
2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,,,,,,,,,,,...,,,,,,,,,,
"2x Ultra Era with Oxi Booster, 50fl oz",,,,,,,,,,,...,,,,,,,,,,
4C Grated Parmesan Cheese 100% Natural 8oz Shaker,,,,,,,,,,,...,,,,,,,,,,


In [132]:
# Item-item similarity: cosine similarity between mean-centred product rating vectors (missing ratings treated as 0)
item_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric='cosine')
item_correlation[np.isnan(item_correlation)] = 0
item_correlation

array([[ 1.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.        ,  1.        ,  0.        , ..., -0.00556526,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       ...,
       [ 0.        , -0.00556526,  0.        , ...,  1.        ,
         0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         1.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , ...,  0.        ,
         0.        ,  1.        ]])
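
Cells 131 and 132 centre each product's ratings on its mean, treat missing ratings as 0 and take the cosine similarity between product rows. A NumPy-only sketch of that computation follows; `centred_cosine_similarity` is a hypothetical name, and rows whose centred vector is all zeros (for example a product with a single rating) are assigned similarity 0 directly, similar in spirit to the NaN-to-0 reset in cell 132.

```python
import numpy as np


def centred_cosine_similarity(ratings: np.ndarray) -> np.ndarray:
    """Item-item similarity from an items x users matrix with NaN for "not rated".

    Each item's ratings are centred on that item's mean, missing entries are
    then treated as 0, and cosine similarity is taken between item rows.
    Rows whose centred vector is all zeros get similarity 0 (including the diagonal).
    """
    ratings = np.asarray(ratings, dtype=float)
    item_means = np.nanmean(ratings, axis=1, keepdims=True)
    centred = np.nan_to_num(ratings - item_means, nan=0.0)
    norms = np.linalg.norm(centred, axis=1)
    safe = np.where(norms == 0, 1.0, norms)   # avoid division by zero
    unit = centred / safe[:, None]            # all-zero rows stay all-zero
    return unit @ unit.T
```

Applied to `df_pivot.to_numpy()`, it is meant to mirror `item_correlation` before negative values are clipped; the diagonal entries of all-zero rows may be handled differently.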

In [133]:
item_correlation.shape

(248, 248)

In [134]:
# Keep only positively correlated products: set negative similarities to 0
item_correlation[item_correlation<0]=0
item_correlation

array([[1., 0., 0., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]])

In [135]:
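# Score for (user, product) = sum over products j of rating(user, j) * similarity(j, product), unrated = 0
item_predicted_ratings = np.dot((df_pivot.fillna(0).T),item_correlation)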
item_predicted_ratings

array([[0.        , 0.03219031, 0.        , ..., 0.00645327, 0.        ,
        0.01784792],
       [0.        , 0.02575225, 0.        , ..., 0.00516261, 0.        ,
        0.01427833],
       [0.        , 0.        , 0.        , ..., 0.0016896 , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.0016896 , 0.        ,
        0.        ],
       [0.        , 0.04219839, 0.        , ..., 0.01115828, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.00135168, 0.        ,
        0.        ]])

In [136]:
print(item_predicted_ratings.shape)
print(dummy_train.shape)

(17838, 248)
(17838, 248)


In [137]:
# Keep scores only for the products the user has not rated yet (dummy_train masks out rated ones)
item_final_rating = np.multiply(item_predicted_ratings,dummy_train)
item_final_rating.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,"2x Ultra Era with Oxi Booster, 50fl oz",4C Grated Parmesan Cheese 100% Natural 8oz Shaker,Africa's Best No-Lye Dual Conditioning Relaxer System Super,Alberto VO5 Salon Series Smooth Plus Sleek Shampoo,Alex Cross (dvdvideo),"All,bran Complete Wheat Flakes, 18 Oz.",Ambi Complexion Cleansing Bar,...,Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Wallmount Server Cabinet (450mm, 9 RU)","Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee","WeatherTech 40647 14-15 Outlander Cargo Liners Behind 2nd Row, Black",Weleda Everon Lip Balm,Wilton Black Dots Standard Baking Cups,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash,Yes To Grapefruit Rejuvenating Body Wash
02dakota,0.0,0.03219,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017201,0.0,0.0,0.0,0.006453,0.0,0.017848
02deuce,0.0,0.025752,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.013761,0.0,0.0,0.0,0.005163,0.0,0.014278
06stidriver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001833,0.003713,0.0,...,0.0,0.0,0.0,0.009094,0.0,0.0,0.0,0.00169,0.0,0.0
08dallas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
09mommy11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004359,0.006218,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08297,0.002484,0.0
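
Cell 137 multiplies the item-based scores by `dummy_train` so that products a user has already rated drop out of the ranking. `dummy_train` is built in an earlier cell that is not part of this section; the sketch below only assumes what the cell's comment describes, namely 1 for unrated (user, product) pairs and 0 for rated ones. `build_unrated_mask` is a hypothetical helper.

```python
import pandas as pd


def build_unrated_mask(train: pd.DataFrame) -> pd.DataFrame:
    """users x products matrix: 1.0 where the user has NOT rated the product, else 0.0.

    Assumes `train` has the reviews_username, name and reviews_rating columns
    used throughout this notebook.
    """
    # Number of ratings per (user, product); pairs with no rating get 0
    rated = train.pivot_table(index="reviews_username", columns="name",
                              values="reviews_rating", aggfunc="count", fill_value=0)
    return (rated == 0).astype(float)
```

Assuming the same user and product ordering as `df_pivot` (both are sorted by the pivot), `np.multiply(item_predicted_ratings, build_unrated_mask(train))` would play the role of cell 137.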


### Recommending top 20

In [138]:
# Take the user ID as input
#user_input = str(input("Enter your user name"))
user_input = '02deuce'
print(user_input)

02deuce


In [139]:
item_final_rating.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,"2x Ultra Era with Oxi Booster, 50fl oz",4C Grated Parmesan Cheese 100% Natural 8oz Shaker,Africa's Best No-Lye Dual Conditioning Relaxer System Super,Alberto VO5 Salon Series Smooth Plus Sleek Shampoo,Alex Cross (dvdvideo),"All,bran Complete Wheat Flakes, 18 Oz.",Ambi Complexion Cleansing Bar,...,Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Wallmount Server Cabinet (450mm, 9 RU)","Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee","WeatherTech 40647 14-15 Outlander Cargo Liners Behind 2nd Row, Black",Weleda Everon Lip Balm,Wilton Black Dots Standard Baking Cups,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash,Yes To Grapefruit Rejuvenating Body Wash
02dakota,0.0,0.03219,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017201,0.0,0.0,0.0,0.006453,0.0,0.017848
02deuce,0.0,0.025752,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.013761,0.0,0.0,0.0,0.005163,0.0,0.014278
06stidriver,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001833,0.003713,0.0,...,0.0,0.0,0.0,0.009094,0.0,0.0,0.0,0.00169,0.0,0.0
08dallas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
09mommy11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004359,0.006218,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08297,0.002484,0.0


In [140]:
# Recommending the Top 20 products to the user.
df_recommend_ii = pd.DataFrame(item_final_rating.loc[user_input].sort_values(ascending=False)[0:20])
df_recommend_ii

name,02deuce
Various Artists - Choo Choo Soul (cd),0.130433
Planes: Fire Rescue (2 Discs) (includes Digital Copy) (blu-Ray/dvd),0.040743
CeraVe SA Renewing Cream,0.033757
Jason Aldean - They Don't Know,0.032464
Clear Scalp & Hair Therapy Total Care Nourishing Shampoo,0.031909
Nearly Natural 5.5' Bamboo W/decorative Planter,0.030648
100:Complete First Season (blu-Ray),0.025752
There's Something About Mary (dvd),0.01947
"Caress Moisturizing Body Bar Natural Silk, 4.75oz",0.018802
K-Y Love Sensuality Pleasure Gel,0.016247


In [141]:
recommended_ii = df_recommend_ii.index.to_list()
recommended_ii

['Various Artists - Choo Choo Soul (cd)',
 'Planes: Fire Rescue (2 Discs) (includes Digital Copy) (blu-Ray/dvd)',
 'CeraVe SA Renewing Cream',
 "Jason Aldean - They Don't Know",
 'Clear Scalp & Hair Therapy Total Care Nourishing Shampoo',
 "Nearly Natural 5.5' Bamboo W/decorative Planter",
 '100:Complete First Season (blu-Ray)',
 "There's Something About Mary (dvd)",
 'Caress Moisturizing Body Bar Natural Silk, 4.75oz',
 'K-Y Love Sensuality Pleasure Gel',
 "Newman's Own Organics Licorice Twist, Black 5oz",
 'Yes To Grapefruit Rejuvenating Body Wash',
 'Equals (blu-Ray)',
 'Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee',
 'Chips Ahoy! Original Chocolate Chip - Cookies - Family Size 18.2oz',
 "L'oreal Paris Visible Lift Smooth Absolute, Natural Buff",
 "Mrs. Meyer's174 Lemon Verbena Laundry Scent Booster - 18oz",
 'Mike Dave Need Wedding Dates (dvd + Digital)',
 "Newman's Own Balsamic Vinaigrette, 16.0oz",
 'Holmes174 Personal In

## Evaluation for Item-Item based recommendation 

In [142]:
test.columns

Index(['reviews_username', 'name', 'reviews_rating'], dtype='object')

In [143]:
common = test[test.name.isin(train.name)]
common.shape

(8248, 3)

In [144]:
common.head(4)

Unnamed: 0,reviews_username,name,reviews_rating
18416,monkeygirl5,Tostitos Bite Size Tortilla Chips,5.0
11081,james mcdonald,Chester's Cheese Flavored Puffcorn Snacks,5.0
16718,maseawee,Godzilla 3d Includes Digital Copy Ultraviolet ...,4.0
12019,jhosborne,Planes: Fire Rescue (2 Discs) (includes Digita...,5.0


In [145]:
common_item_based_matrix = common.pivot_table(index='reviews_username', columns='name', values='reviews_rating').T

In [146]:
common_item_based_matrix.shape

(194, 7963)

In [147]:
item_correlation_df = pd.DataFrame(item_correlation)

In [148]:
item_correlation_df.head(1)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,238,239,240,241,242,243,244,245,246,247
0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [149]:
item_correlation_df['name'] = df_subtracted.index
item_correlation_df.set_index('name',inplace=True)
item_correlation_df.head()

name,0,1,2,3,4,5,6,7,8,9,...,238,239,240,241,242,243,244,245,246,247
0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100:Complete First Season (blu-Ray),0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"2x Ultra Era with Oxi Booster, 50fl oz",0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4C Grated Parmesan Cheese 100% Natural 8oz Shaker,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [150]:
list_name = common.name.tolist()
item_correlation_df.columns = df_subtracted.index.tolist()
item_correlation_df_1 =  item_correlation_df[item_correlation_df.index.isin(list_name)]
item_correlation_df_2 = item_correlation_df_1.T[item_correlation_df_1.T.index.isin(list_name)]
item_correlation_df_3 = item_correlation_df_2.T
item_correlation_df_3.head()

name,0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,100:Complete First Season (blu-Ray),2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,"2x Ultra Era with Oxi Booster, 50fl oz",4C Grated Parmesan Cheese 100% Natural 8oz Shaker,Africa's Best No-Lye Dual Conditioning Relaxer System Super,Alex Cross (dvdvideo),"All,bran Complete Wheat Flakes, 18 Oz.",Annie's Homegrown Gluten Free Double Chocolate Chip Granola Bars,Arrid Extra Dry Anti-Perspirant Deodorant Spray Regular,...,Vaseline Intensive Care Lip Therapy Cocoa Butter,"Vicks Vaporub, Regular, 3.53oz",Voortman Sugar Free Fudge Chocolate Chip Cookies,Wagan Smartac 80watt Inverter With Usb,"Way Basics 3-Shelf Eco Narrow Bookcase Storage Shelf, Espresso - Formaldehyde Free - Lifetime Guarantee","WeatherTech 40647 14-15 Outlander Cargo Liners Behind 2nd Row, Black",Weleda Everon Lip Balm,Windex Original Glass Cleaner Refill 67.6oz (2 Liter),Yes To Carrots Nourishing Body Wash,Yes To Grapefruit Rejuvenating Body Wash
0.6 Cu. Ft. Letter A4 Size Waterproof 30 Min. Fire File Chest,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100:Complete First Season (blu-Ray),0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2017-2018 Brownline174 Duraflex 14-Month Planner 8 1/2 X 11 Black,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"2x Ultra Era with Oxi Booster, 50fl oz",0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4C Grated Parmesan Cheese 100% Natural 8oz Shaker,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [151]:
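# Clip negative similarities to 0, then score each (product, user) pair as a
# similarity-weighted sum of that user's ratings of the common products (unrated = 0)
item_correlation_df_3[item_correlation_df_3<0]=0
common_item_predicted_ratings = np.dot(item_correlation_df_3, common_item_based_matrix.fillna(0))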
common_item_predicted_ratings

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.03729771, 0.01931419, ..., 0.        , 0.03219031,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.00754696, 0.00387196, ..., 0.        , 0.00645327,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.01070875, ..., 0.        , 0.01784792,
        0.01293414]])

In [152]:
common_item_predicted_ratings.shape

(194, 7963)

In [153]:
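# Binary indicator (products x users): 1 where the product was rated in the test set, 0 elsewhere
dummy_test = common.copy()
dummy_test['reviews_rating'] = dummy_test['reviews_rating'].apply(lambda x: 1 if x>=1 else 0)
dummy_test = dummy_test.pivot_table(index='reviews_username', columns='name', values='reviews_rating').T.fillna(0)
# Keep predictions only for (product, user) pairs that have a true test rating
common_item_predicted_ratings = np.multiply(common_item_predicted_ratings,dummy_test)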

In [154]:
common_ = common.pivot_table(index='reviews_username', columns='name', values='reviews_rating').T

In [155]:
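# Keep only positive predictions (the masked, actually-rated entries); all other cells become NaN
X = common_item_predicted_ratings.copy()
X = X[X > 0]

# Rescale each column to the 1-5 rating scale (MinMaxScaler works column-wise and ignores NaNs when fitting)
scaler = MinMaxScaler(feature_range=(1, 5))
print(scaler.fit(X))
y = (scaler.transform(X))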

print(y)

MinMaxScaler(feature_range=(1, 5))
[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]
  data_min = np.nanmin(X, axis=0)
  data_max = np.nanmax(X, axis=0)


In [156]:
# Count the scored entries (non-NaN values in y)
total_non_nan = np.count_nonzero(~np.isnan(y))

In [157]:
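# RMSE over the scored entries; np.nansum skips the NaN cells explicitly
# (the built-in sum() would iterate over the DataFrame's column labels instead)
rmse = np.sqrt(np.nansum((common_.to_numpy() - y) ** 2) / total_non_nan)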
print(rmse)

3.5883929648845965


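### The user-user recommender gives an RMSE of about 2.58 versus about 3.59 for the item-item recommender; since lower is better, the user-user recommender is preferred.
The two RMSEs are computed on different subsets of the test data (1,076 rows whose user also appears in training vs 8,248 rows whose product also appears in training), so the comparison is indicative rather than exact.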

## Sentiment analysis to select the top 5 of the 20 recommended products
Since the evaluation above favours the user-user recommender, its 20 recommended products are used for sentiment-based re-ranking.

In [158]:
# Copy of the master dataset
df_recommend_5 = df_master.copy()

# Keep only the reviews of the 20 products recommended by the user-user system
df_recommend_5 = df_recommend_5[df_recommend_5.name.isin(recommended_uu)]
print('Total unique products : ' + str(df_recommend_5.name.nunique()))

Total unique products : 20


In [159]:
df_recommend_5.head(2)

Unnamed: 0,reviews_username,name,reviews_rating,user_sentiment,reviews
688,mountainman,Windex Original Glass Cleaner Refill 67.6oz (2...,1,Positive,doesnt clean window leaf window streak althoug...
689,rick,Windex Original Glass Cleaner Refill 67.6oz (2...,1,Positive,use leaf streak bad use windsheild washer flui...


In [160]:
# Load the fitted TF-IDF vectorizer and transform the reviews
with open('output/word_vec.pkl', 'rb') as f:
    tfid_vec = pickle.load(f)
transformed_reviews = tfid_vec.transform(df_recommend_5['reviews'])

In [161]:
# Load the trained Random Forest model and predict the sentiment of each review
with open('output/randomforest_model.pkl', 'rb') as f:
    rfmodel = pickle.load(f)
sent_op = rfmodel.predict(transformed_reviews)
df_recommend_5['user_sentiment_predicted'] = sent_op
Counter(df_recommend_5['user_sentiment_predicted'])

Counter({'Positive': 7806, 'Negative': 818})

In [162]:
# Convert the predicted sentiment labels to 0/1 (Negative = 0, Positive = 1)
df_recommend_5['user_sentiment_predicted'] = df_recommend_5['user_sentiment_predicted'].map({'Negative': 0, 'Positive': 1})
df_recommend_5['user_sentiment_predicted'].unique()

array([1, 0], dtype=int64)

In [163]:
df_recommend_5.head(2)

| | reviews_username | name | reviews_rating | user_sentiment | reviews | user_sentiment_predicted |
|---|---|---|---|---|---|---|
| 688 | mountainman | Windex Original Glass Cleaner Refill 67.6oz (2... | 1 | Positive | doesnt clean window leaf window streak althoug... | 1 |
| 689 | rick | Windex Original Glass Cleaner Refill 67.6oz (2... | 1 | Positive | use leaf streak bad use windsheild washer flui... | 1 |


In [164]:
# Rank the 20 products by the share of their reviews predicted as positive
# ('user_sentiment_predicted' below holds each product's total review count)
review_df = pd.DataFrame(df_recommend_5.groupby(['name'])['user_sentiment_predicted'].count()).reset_index()
# Number of positive reviews per product (groupby sorts by name, so rows align with review_df)
review_df['positive'] = df_recommend_5.groupby(['name'])['user_sentiment_predicted'].sum().values
review_df['%positive'] = review_df['positive'] / review_df['user_sentiment_predicted']
review_df = review_df.sort_values(by=['%positive'], ascending=False)
review_df.head()

| | name | user_sentiment_predicted | positive | %positive |
|---|---|---|---|---|
| 13 | My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Di... | 668 | 647 | 0.968563 |
| 15 | Pleasant Hearth Diamond Fireplace Screen - Esp... | 31 | 30 | 0.967742 |
| 9 | Cuisinart174 Electric Juicer - Stainless Steel... | 103 | 99 | 0.961165 |
| 0 | 100:Complete First Season (blu-Ray) | 139 | 133 | 0.956835 |
| 16 | Red (special Edition) (dvdvideo) | 672 | 642 | 0.955357 |
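The ranking cell builds the review count and the positive count with two separate `groupby` calls and joins them through `.values`, which works only because both results are sorted by product name. A single `groupby(...).agg` with named aggregation computes both in one pass. A sketch on a toy frame; the output column names `total`, `positive` and `pct_positive` are illustrative choices, not the notebook's:

```python
import pandas as pd

# Toy stand-in for df_recommend_5: one row per review, 1 = predicted positive
reviews = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "C"],
    "user_sentiment_predicted": [1, 1, 0, 1, 1, 0],
})

ranking = (
    reviews.groupby("name")["user_sentiment_predicted"]
    .agg(total="count", positive="sum")  # named aggregation on a SeriesGroupBy
    .reset_index()
)
ranking["pct_positive"] = ranking["positive"] / ranking["total"]
ranking = ranking.sort_values("pct_positive", ascending=False)
print(ranking)
```

Here product B (2 of 2 positive) ranks above A (2 of 3) and C (0 of 1), and the count and sum can never be misaligned because they come from the same grouped object.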


In [165]:
# Apply Laplace's rule of succession, (positive + 1) / (total + 2), so that products with few reviews are not over-ranked
review_df['user_sentiment_predicted'] = review_df['user_sentiment_predicted'] + 2
review_df['positive'] = review_df['positive'] + 1
review_df['%positive'] = review_df['positive'] / review_df['user_sentiment_predicted']
# Keep the 5 products with the highest smoothed share of positive reviews
review_df = review_df.sort_values(by=['%positive'], ascending=False)[0:5]
review_df

| | name | user_sentiment_predicted | positive | %positive |
|---|---|---|---|---|
| 13 | My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Di... | 670 | 648 | 0.967164 |
| 16 | Red (special Edition) (dvdvideo) | 674 | 643 | 0.954006 |
| 9 | Cuisinart174 Electric Juicer - Stainless Steel... | 105 | 100 | 0.952381 |
| 0 | 100:Complete First Season (blu-Ray) | 141 | 134 | 0.950355 |
| 2 | Avery174 Ready Index Contemporary Table Of Con... | 315 | 299 | 0.949206 |
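The rule of succession replaces the raw share `positive / total` with `(positive + 1) / (total + 2)`, i.e. it adds one pseudo-positive and one pseudo-negative review to every product. The effect is small for products with many reviews and large for products with few. A short worked example using the counts from the two tables above (the helper names are illustrative):

```python
def raw_share(positive: int, total: int) -> float:
    """Plain fraction of reviews predicted positive."""
    return positive / total


def succession_share(positive: int, total: int) -> float:
    """Laplace's rule of succession: one extra positive and one extra negative review."""
    return (positive + 1) / (total + 2)


# (positive, total) counts taken from the ranking tables in this section
products = {
    "My Big Fat Greek Wedding 2": (647, 668),
    "Pleasant Hearth Diamond Fireplace Screen": (30, 31),
}
for name, (pos, tot) in products.items():
    print(f"{name}: raw={raw_share(pos, tot):.6f}, smoothed={succession_share(pos, tot):.6f}")
```

The 668-review product only moves from 0.968563 to 0.967164, while the 31-review product drops from 0.967742 to 0.939394, which is why Pleasant Hearth no longer appears among the top 5.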


In [166]:
# List the final top 5 products in ranked order (re-ranked from the user-user recommendations)
top5_recommended__ii = review_df.name.to_list()
top5_recommended__ii

['My Big Fat Greek Wedding 2 (blu-Ray + Dvd + Digital)',
 'Red (special Edition) (dvdvideo)',
 'Cuisinart174 Electric Juicer - Stainless Steel Cje-1000',
 '100:Complete First Season (blu-Ray)',
 'Avery174 Ready Index Contemporary Table Of Contents Divider, 1-8, Multi, Letter']
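For the Flask deployment, the steps of this section (filter the 20 recommended products, predict the sentiment of their reviews, score each product with the rule of succession, keep the top 5) could be wrapped in one function. A minimal sketch, assuming a `predict_sentiment` callable that returns 0/1 for each review text; in the notebook that role would be played by the pickled TF-IDF vectorizer followed by the random forest. The function `top5_by_sentiment`, the `toy_model` stand-in and the toy data are all hypothetical:

```python
from typing import Callable, Iterable, List

import pandas as pd


def top5_by_sentiment(
    reviews: pd.DataFrame,
    recommended: Iterable[str],
    predict_sentiment: Callable[[pd.Series], Iterable[int]],
    k: int = 5,
) -> List[str]:
    """Re-rank recommended products by their smoothed share of positive reviews."""
    subset = reviews[reviews["name"].isin(list(recommended))].copy()
    # One 0/1 prediction per review text (1 = positive)
    subset["positive"] = list(predict_sentiment(subset["reviews"]))
    stats = subset.groupby("name")["positive"].agg(total="count", positive="sum")
    # Rule of succession: (positive + 1) / (total + 2)
    stats["score"] = (stats["positive"] + 1) / (stats["total"] + 2)
    return stats.sort_values("score", ascending=False).head(k).index.tolist()


def toy_model(texts: pd.Series) -> pd.Series:
    # Hypothetical stand-in for the TF-IDF + random forest pipeline
    return (texts != "bad").astype(int)


reviews = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "C", "D"],
    "reviews": ["great", "great", "great", "bad", "great", "great", "bad"],
})
print(top5_by_sentiment(reviews, ["A", "B", "C"], toy_model, k=2))  # ['A', 'C']
```

Passing the model in as a callable keeps the ranking logic testable without loading the pickles, and the Flask view would only need to supply the user's 20 recommended products.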