# Business Understanding

## Problem Statement

You are working as a Machine Learning Engineer in an e-commerce company named 'Ebuss', and you are required to build a model that improves the recommendations given to users based on their past reviews and ratings.

To do this, you need to build a sentiment-based product recommendation system using the following steps:

1. Data sourcing and sentiment analysis

2. Building a recommendation system

3. Improving the recommendations using the sentiment analysis model

4. Deploying the end-to-end project with a user interface

## End Goals 

An end-to-end Jupyter Notebook containing the entire code of the recommendation system, including the following:

* Data cleaning steps
* Text preprocessing
* Feature extraction
* Three ML models used to build the sentiment analysis model
* Two recommendation systems and their evaluations


Deployment of one ML model and one recommendation system chosen from the previous steps, along with the entire code to deploy the end-to-end project using Flask and Heroku.

# Data Understanding

In [24]:
#General
import numpy as np
import pandas as pd
import sys
from collections import Counter
import matplotlib.pyplot as plt
import string
import re

#NLP
import nltk
from nltk.tokenize import word_tokenize

#Stop words (stored as a set for fast membership checks)
nltk.download('stopwords')
from nltk.corpus import stopwords
stop = set(stopwords.words('english'))

#Lemmatization
nltk.download('wordnet')
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
lemmatizer = nltk.stem.WordNetLemmatizer()

#Stemming
from nltk.stem.snowball import SnowballStemmer
stemmer = SnowballStemmer("english")

#Spell Checker
from spellchecker import SpellChecker
spell = SpellChecker()
spell.correction('awesom')

#Modelling Basics
from sklearn.model_selection import cross_val_score
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

#Dealing Imbalance & Model Save
from imblearn.over_sampling import SMOTE
import pickle

#Models
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

#Cosine Similarity
from sklearn.metrics.pairwise import pairwise_distances

#Model Accuracy
from sklearn.metrics import accuracy_score

[nltk_data] Downloading package stopwords to C:\Users\Octillion
[nltk_data]     0017\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to C:\Users\Octillion
[nltk_data]     0017\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [2]:
df = pd.read_csv('input/sample30.csv')

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   id                    30000 non-null  object
 1   brand                 30000 non-null  object
 2   categories            30000 non-null  object
 3   manufacturer          29859 non-null  object
 4   name                  30000 non-null  object
 5   reviews_date          29954 non-null  object
 6   reviews_didPurchase   15932 non-null  object
 7   reviews_doRecommend   27430 non-null  object
 8   reviews_rating        30000 non-null  int64 
 9   reviews_text          30000 non-null  object
 10  reviews_title         29810 non-null  object
 11  reviews_userCity      1929 non-null   object
 12  reviews_userProvince  170 non-null    object
 13  reviews_username      29937 non-null  object
 14  user_sentiment        29999 non-null  object
dtypes: int64(1), object(14)
memory usage

In [4]:
df['user_sentiment'].value_counts()

Positive    26632
Negative     3367
Name: user_sentiment, dtype: int64

In [5]:
#Remove the review rows where the username is null
df = df[df['reviews_username'].notna()]

In [6]:
#Remove the review rows where the user sentiment is null (copy to avoid chained-assignment warnings)
df = df[df['user_sentiment'].notna()].copy()

In [7]:
#Replace null review titles with a space
df['reviews_title'] = df['reviews_title'].fillna(' ')

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29936 entries, 0 to 29999
Data columns (total 15 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   id                    29936 non-null  object
 1   brand                 29936 non-null  object
 2   categories            29936 non-null  object
 3   manufacturer          29795 non-null  object
 4   name                  29936 non-null  object
 5   reviews_date          29896 non-null  object
 6   reviews_didPurchase   15931 non-null  object
 7   reviews_doRecommend   27395 non-null  object
 8   reviews_rating        29936 non-null  int64 
 9   reviews_text          29936 non-null  object
 10  reviews_title         29936 non-null  object
 11  reviews_userCity      1900 non-null   object
 12  reviews_userProvince  166 non-null    object
 13  reviews_username      29936 non-null  object
 14  user_sentiment        29936 non-null  object
dtypes: int64(1), object(14)
memory usage

In [9]:
df.head()

| | id | brand | categories | manufacturer | name | reviews_date | reviews_didPurchase | reviews_doRecommend | reviews_rating | reviews_text | reviews_title | reviews_userCity | reviews_userProvince | reviews_username | user_sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AV13O1A8GV-KLJ3akUyj | Universal Music | Movies, Music & Books,Music,R&b,Movies & TV,Mo... | Universal Music Group / Cash Money | Pink Friday: Roman Reloaded Re-Up (w/dvd) | 2012-11-30T06:21:45.000Z | NaN | NaN | 5 | i love this album. it's very good. more to the... | Just Awesome | Los Angeles | NaN | joshua | Positive |
| 1 | AV14LG0R-jtxr-f38QfS | Lundberg | Food,Packaged Foods,Snacks,Crackers,Snacks, Co... | Lundberg | Lundberg Organic Cinnamon Toast Rice Cakes | 2017-07-09T00:00:00.000Z | True | NaN | 5 | Good flavor. This review was collected as part... | Good | NaN | NaN | dorothy w | Positive |
| 2 | AV14LG0R-jtxr-f38QfS | Lundberg | Food,Packaged Foods,Snacks,Crackers,Snacks, Co... | Lundberg | Lundberg Organic Cinnamon Toast Rice Cakes | 2017-07-09T00:00:00.000Z | True | NaN | 5 | Good flavor. | Good | NaN | NaN | dorothy w | Positive |
| 3 | AV16khLE-jtxr-f38VFn | K-Y | Personal Care,Medicine Cabinet,Lubricant/Sperm... | K-Y | K-Y Love Sensuality Pleasure Gel | 2016-01-06T00:00:00.000Z | False | False | 1 | I read through the reviews on here before look... | Disappointed | NaN | NaN | rebecca | Negative |
| 4 | AV16khLE-jtxr-f38VFn | K-Y | Personal Care,Medicine Cabinet,Lubricant/Sperm... | K-Y | K-Y Love Sensuality Pleasure Gel | 2016-12-21T00:00:00.000Z | False | False | 1 | My husband bought this gel for us. The gel cau... | Irritation | NaN | NaN | walker557 | Negative |


**Apart from name, reviews_rating, reviews_title, reviews_text, reviews_username and user_sentiment, the other columns are not required.**

# Data Preparation

In [69]:
df_master = df[['reviews_username','name','reviews_rating','user_sentiment']].copy()
df_master['reviews'] = df['reviews_title'] + " " + df['reviews_text']
df_master.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 29936 entries, 0 to 29999
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   reviews_username  29936 non-null  object
 1   name              29936 non-null  object
 2   reviews_rating    29936 non-null  int64 
 3   user_sentiment    29936 non-null  object
 4   reviews           29936 non-null  object
dtypes: int64(1), object(4)
memory usage: 1.4+ MB


In [70]:
df_master.head()

| | reviews_username | name | reviews_rating | user_sentiment | reviews |
|---|---|---|---|---|---|
| 0 | joshua | Pink Friday: Roman Reloaded Re-Up (w/dvd) | 5 | Positive | Just Awesome i love this album. it's very good... |
| 1 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | Good Good flavor. This review was collected as... |
| 2 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | Good Good flavor. |
| 3 | rebecca | K-Y Love Sensuality Pleasure Gel | 1 | Negative | Disappointed I read through the reviews on her... |
| 4 | walker557 | K-Y Love Sensuality Pleasure Gel | 1 | Negative | Irritation My husband bought this gel for us. ... |


In [71]:
#Remove the Hyperlinks
df_master['reviews'] = df_master['reviews'].apply(lambda x:re.sub(r"http\S+", "", x))

In [72]:
#Remove the numbers
df_master['reviews'] = df_master['reviews'].apply(lambda x:re.sub(r"[0-9]", "", x))

In [73]:
#Remove Punctuations/Special Characters
df_master['reviews'] = df_master['reviews'].apply(lambda x:''.join([i for i in x if i not in string.punctuation]))

In [74]:
#Lower case the text
df_master['reviews'] = df_master['reviews'].str.lower()

In [75]:
df_master.head()

| | reviews_username | name | reviews_rating | user_sentiment | reviews |
|---|---|---|---|---|---|
| 0 | joshua | Pink Friday: Roman Reloaded Re-Up (w/dvd) | 5 | Positive | just awesome i love this album its very good m... |
| 1 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | good good flavor this review was collected as ... |
| 2 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | good good flavor |
| 3 | rebecca | K-Y Love Sensuality Pleasure Gel | 1 | Negative | disappointed i read through the reviews on her... |
| 4 | walker557 | K-Y Love Sensuality Pleasure Gel | 1 | Negative | irritation my husband bought this gel for us t... |


In [76]:
#Tokenize & Remove the stop words
df_master['reviews'] = df_master['reviews'].apply(word_tokenize)
df_master['reviews'] = df_master['reviews'].apply(lambda x: [i for i in x if i not in stop])
df_master.head()

| | reviews_username | name | reviews_rating | user_sentiment | reviews |
|---|---|---|---|---|---|
| 0 | joshua | Pink Friday: Roman Reloaded Re-Up (w/dvd) | 5 | Positive | [awesome, love, album, good, hip, hop, side, c... |
| 1 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | [good, good, flavor, review, collected, part, ... |
| 2 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | [good, good, flavor] |
| 3 | rebecca | K-Y Love Sensuality Pleasure Gel | 1 | Negative | [disappointed, read, reviews, looking, buying,... |
| 4 | walker557 | K-Y Love Sensuality Pleasure Gel | 1 | Negative | [irritation, husband, bought, gel, us, gel, ca... |
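The cleaning and stop-word cells above can be condensed into one function. This is a stdlib-only sketch under stated assumptions: `clean_review`, `STOP_WORDS` and `PUNCT_TABLE` are hypothetical names, a plain whitespace split stands in for NLTK's `word_tokenize`, and the stop-word set is a tiny illustrative subset of NLTK's English list. It also shows `str.translate` as a faster alternative to joining the characters of each review one at a time.

```python
import re
import string

# Tiny illustrative stop-word set (assumption: the notebook uses NLTK's full English list)
STOP_WORDS = {"i", "this", "it", "its", "is", "a", "the", "and", "very", "just"}

# Translation table that deletes every ASCII punctuation character in a single pass
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_review(text, stop_words=STOP_WORDS):
    """Apply the notebook's cleaning steps in the same order and return the kept tokens."""
    text = re.sub(r"http\S+", "", text)   # remove hyperlinks
    text = re.sub(r"[0-9]", "", text)     # remove digits
    text = text.translate(PUNCT_TABLE)    # remove punctuation
    text = text.lower()                   # lower-case
    tokens = text.split()                 # whitespace split stands in for word_tokenize
    return [t for t in tokens if t not in stop_words]  # O(1) lookups in a set


print(clean_review("Just Awesome! I love this album, it's 10/10 http://example.com"))
# ['awesome', 'love', 'album']
```

Because punctuation is stripped before stop-word filtering, "it's" becomes "its"; that only disappears because "its" is itself a stop word.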


### Lemmatization & Stemming

In [77]:
#Converting tokenized reviews to strings
df_master['reviews'] = df_master['reviews'].apply(lambda x: " ".join(x))
df_master.head()

| | reviews_username | name | reviews_rating | user_sentiment | reviews |
|---|---|---|---|---|---|
| 0 | joshua | Pink Friday: Roman Reloaded Re-Up (w/dvd) | 5 | Positive | awesome love album good hip hop side current p... |
| 1 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | good good flavor review collected part promotion |
| 2 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | good good flavor |
| 3 | rebecca | K-Y Love Sensuality Pleasure Gel | 1 | Negative | disappointed read reviews looking buying one c... |
| 4 | walker557 | K-Y Love Sensuality Pleasure Gel | 1 | Negative | irritation husband bought gel us gel caused ir... |


In [78]:
#Tokenize and Lemmatize
def lemmatize_text(text):
    return [lemmatizer.lemmatize(w) for w in w_tokenizer.tokenize(text)]

df_master['reviews'] = df_master.reviews.apply(lemmatize_text)
df_master.head()

| | reviews_username | name | reviews_rating | user_sentiment | reviews |
|---|---|---|---|---|---|
| 0 | joshua | Pink Friday: Roman Reloaded Re-Up (w/dvd) | 5 | Positive | [awesome, love, album, good, hip, hop, side, c... |
| 1 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | [good, good, flavor, review, collected, part, ... |
| 2 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | [good, good, flavor] |
| 3 | rebecca | K-Y Love Sensuality Pleasure Gel | 1 | Negative | [disappointed, read, review, looking, buying, ... |
| 4 | walker557 | K-Y Love Sensuality Pleasure Gel | 1 | Negative | [irritation, husband, bought, gel, u, gel, cau... |


In [79]:
#Converting tokenized reviews to strings
df_master['reviews'] = df_master['reviews'].apply(lambda x: " ".join(x))

In [80]:
def stemming_text(text):
    return [stemmer.stem(w) for w in w_tokenizer.tokenize(text)]

df_master['reviews'] = df_master.reviews.apply(stemming_text)
df_master.head()

| | reviews_username | name | reviews_rating | user_sentiment | reviews |
|---|---|---|---|---|---|
| 0 | joshua | Pink Friday: Roman Reloaded Re-Up (w/dvd) | 5 | Positive | [awesom, love, album, good, hip, hop, side, cu... |
| 1 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | [good, good, flavor, review, collect, part, pr... |
| 2 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | [good, good, flavor] |
| 3 | rebecca | K-Y Love Sensuality Pleasure Gel | 1 | Negative | [disappoint, read, review, look, buy, one, cou... |
| 4 | walker557 | K-Y Love Sensuality Pleasure Gel | 1 | Negative | [irrit, husband, bought, gel, u, gel, caus, ir... |


### Spell Checker

In [81]:
def correct_spellings(text):
    corrected_text = []
    misspelled_words = spell.unknown(text.split())
    for word in text.split():
        if word in misspelled_words:
            # Newer pyspellchecker versions can return None when no candidate is found; keep the word then
            corrected_text.append(spell.correction(word) or word)
        else:
            corrected_text.append(word)
    return " ".join(corrected_text)

correct_spellings("speling correctin")

'spelling correcting'

In [82]:
#Spell checking
df_master['reviews'] = df_master['reviews'].apply(lambda x: " ".join(x))
df_master['reviews'] = df_master.reviews.apply(correct_spellings)
df_master.head()

| | reviews_username | name | reviews_rating | user_sentiment | reviews |
|---|---|---|---|---|---|
| 0 | joshua | Pink Friday: Roman Reloaded Re-Up (w/dvd) | 5 | Positive | awesome love album good hip hop side current p... |
| 1 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | good good flavor review collect part promote |
| 2 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | good good flavor |
| 3 | rebecca | K-Y Love Sensuality Pleasure Gel | 1 | Negative | disappoint read review look buy one coupl rubr... |
| 4 | walker557 | K-Y Love Sensuality Pleasure Gel | 1 | Negative | ifrit husband bought gel u gel caus ifrit felt... |
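The output above shows the stem "irrit" being "corrected" to "ifrit". pyspellchecker describes itself as based on Peter Norvig's edit-distance approach: it looks for known words within edit distance 1, then 2, and prefers the most frequent. The sketch below reimplements that idea with a hypothetical toy frequency list (`WORD_FREQ`, `edits1` and `correction` are illustrative names, not the library's internals) to show why a stem that is not a dictionary word gets replaced by a nearby unrelated word. One option, if this matters, is to run spell correction before stemming.

```python
import string

# Hypothetical toy frequency list; pyspellchecker ships a large English word-frequency dictionary
WORD_FREQ = {"irritate": 50, "ifrit": 2, "spelling": 30, "correcting": 10}


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correction(word):
    """Most frequent known word within edit distance 1, then 2; otherwise the word itself."""
    if word in WORD_FREQ:
        return word
    known = {w for w in edits1(word) if w in WORD_FREQ}
    if not known:
        known = {w2 for w1 in edits1(word) for w2 in edits1(w1) if w2 in WORD_FREQ}
    return max(known, key=WORD_FREQ.get) if known else word


print(correction("speling"), correction("irrit"))  # spelling ifrit
```

"irritate" is three insertions away from "irrit", so it is never considered, while "ifrit" is a single substitution away.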


In [83]:
df_master.to_csv('output/clean_dataset.csv')

# Data Modeling

In [10]:
df_master = pd.read_csv('output/clean_dataset.csv')
df_master.drop(['Unnamed: 0'], inplace=True, axis=1)

In [11]:
df_master.head()

| | reviews_username | name | reviews_rating | user_sentiment | reviews |
|---|---|---|---|---|---|
| 0 | joshua | Pink Friday: Roman Reloaded Re-Up (w/dvd) | 5 | Positive | awesome love album good hip hop side current p... |
| 1 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | good good flavor review collect part promote |
| 2 | dorothy w | Lundberg Organic Cinnamon Toast Rice Cakes | 5 | Positive | good good flavor |
| 3 | rebecca | K-Y Love Sensuality Pleasure Gel | 1 | Negative | disappoint read review look buy one coupl rubr... |
| 4 | walker557 | K-Y Love Sensuality Pleasure Gel | 1 | Negative | ifrit husband bought gel u gel caus ifrit felt... |


In [14]:
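#Reviews that became empty during cleaning are read back from the CSV as NaN, so fill them
X = df_master['reviews'].fillna('').copy()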
y=df_master['user_sentiment'].copy()
X.head()

0    awesome love album good hip hop side current p...
1         good good flavor review collect part promote
2                                     good good flavor
3    disappoint read review look buy one coupl rubr...
4    ifrit husband bought gel u gel caus ifrit felt...
Name: reviews, dtype: object

In [None]:
#TF-IDF Vectorizer
word_vectorizer = TfidfVectorizer(
    sublinear_tf=True,
    strip_accents='unicode',
    analyzer='word',
    token_pattern=r'\w{1,}',
    stop_words='english',
    ngram_range=(1,1),
    max_features=10000)
word_vectorizer.fit(X)
X = word_vectorizer.transform(X)

#Save the fitted vectorizer as a pickle file for deployment
with open('output/word_vec.pkl', 'wb') as f:
    pickle.dump(word_vectorizer, f)
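To make the vectorizer settings concrete, here is a pure-Python sketch of the weighting they select. scikit-learn documents that with `smooth_idf=True` (the default) idf(t) = ln((1 + n) / (1 + df(t))) + 1, that `sublinear_tf=True` replaces a raw count c with 1 + ln(c), and that each row is L2-normalised by default. The sketch is a simplification under stated assumptions: `tfidf` is a hypothetical helper that splits on whitespace and ignores `token_pattern`, `stop_words`, `strip_accents` and `max_features`.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (sublinear TF, smoothed IDF, L2 norm)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # sublinear_tf: raw count c is replaced by 1 + ln(c)
        weights = {t: (1 + math.log(c)) * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalise so every non-empty document vector has unit length
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


vecs = tfidf(["good good flavor", "good flavor", "disappoint review"])
print(vecs[1])
```

The sublinear term frequency damps repetition: "good" appearing twice in the first document weighs more than "flavor", but not twice as much.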

In [17]:
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=77)

In [21]:
counter = Counter(y_train)
print('Before',counter)
# oversampling the train dataset using SMOTE
smt = SMOTE()
X_train_sm, y_train_sm = smt.fit_resample(X_train, y_train)

counter = Counter(y_train_sm)
print('After',counter)

Before Counter({'Positive': 18601, 'Negative': 2354})
After Counter({'Positive': 18601, 'Negative': 18601})
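SMOTE balances the classes by synthesising new minority samples rather than duplicating existing ones: each new point lies on the line segment between a minority sample and one of its nearest minority neighbours. imbalanced-learn's `SMOTE` uses `k_neighbors=5` by default and, by default, oversamples the minority class until it matches the majority, which is what the counts above show. The sketch below is a dense, pure-Python illustration of that interpolation (the `smote` helper and its `k=2` default are hypothetical, chosen for a three-point toy set); the library works directly on the sparse TF-IDF matrix.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the segment between a minority sample
    and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority_count = 8
new_points = smote(minority, n_new=majority_count - len(minority))
print(len(minority) + len(new_points))  # 8: the minority class now matches the majority
```

Because every synthetic point is an interpolation, it stays inside the region spanned by the real minority samples.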


### Random Forest

In [32]:
#Random Forest Model
classifier = RandomForestClassifier()
classifier.fit(X_train_sm, y_train_sm)

RandomForestClassifier()

In [33]:
#Random Forest Model Prediction
r_pred = classifier.predict(X_test)

In [34]:
#Random Forest Model Accuracy
rf_accuracy = accuracy_score(y_test, r_pred)
rf_accuracy

0.9030174813495156

### SVM

In [35]:
#SVM Model
model = LinearSVC()
model.fit(X_train_sm, y_train_sm)

LinearSVC()

In [36]:
#SVM Model Prediction
svm_pred = model.predict(X_test)

In [37]:
#SVM Model Accuracy
svm_accuracy = accuracy_score(y_test, svm_pred)
svm_accuracy

0.8581449727201871

### XG Boost

In [38]:
#XG Boost Model Building
XGB = XGBClassifier(learning_rate=0.05,max_depth=5)
XGB.fit(X_train_sm,y_train_sm)



XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.05, max_delta_step=0, max_depth=5,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=8, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [39]:
#XG Boost Model Prediction
XGB_pred = XGB.predict(X_test)

In [40]:
#XG Boost Model Accuracy
xgb_accuracy = accuracy_score(y_test, XGB_pred)
xgb_accuracy

0.8416657387818729

**Random Forest performs best in this evaluation, with a test accuracy of about 0.903 compared with 0.858 for LinearSVC and 0.842 for XGBoost.**
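Accuracy alone can flatter a model on this data: the training split is about 89% Positive (18601 of 20955 rows), so a model that always predicts "Positive" would score roughly 0.89 on a test set with the same class mix. Per-class precision, recall and F1 for the Negative class are more telling; on the real predictions, scikit-learn's `classification_report(y_test, r_pred)` or `f1_score` would provide them. The pure-Python sketch below (hypothetical helper `class_metrics`, toy labels) shows the effect.

```python
def class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels with a 90% Positive split and a model that always predicts "Positive"
y_true = ["Positive"] * 9 + ["Negative"]
y_pred = ["Positive"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
neg_precision, neg_recall, neg_f1 = class_metrics(y_true, y_pred, "Negative")
print(accuracy, neg_recall, neg_f1)  # 0.9 0.0 0.0
```

The constant model reaches 90% accuracy while never finding a single Negative review.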

# Recommendation System

In [7]:
df_recommend = df[['name', 'user_sentiment', 'reviews_rating']].copy()
df_recommend.head()

| | name | user_sentiment | reviews_rating |
|---|---|---|---|
| 0 | Pink Friday: Roman Reloaded Re-Up (w/dvd) | Positive | 5 |
| 1 | Lundberg Organic Cinnamon Toast Rice Cakes | Positive | 5 |
| 2 | Lundberg Organic Cinnamon Toast Rice Cakes | Positive | 5 |
| 3 | K-Y Love Sensuality Pleasure Gel | Negative | 1 |
| 4 | K-Y Love Sensuality Pleasure Gel | Negative | 1 |


In [None]:
#Removing repeated ratings of the same product by the same user (they are averaged)
df_master.groupby(by=["reviews_username", "name"])["reviews_rating"].mean()

In [None]:
df_master = df_master.groupby(by=["reviews_username", "name"])["reviews_rating"].mean().reset_index()
df_master.shape
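The averaging step matters because `DataFrame.pivot` raises a `ValueError` when an index/column pair occurs more than once, so repeated (user, product) ratings must be collapsed before the user-product matrix is built. A pure-Python sketch of the same reduction, with a hypothetical helper name and toy rows:

```python
from collections import defaultdict
from statistics import mean


def average_duplicate_ratings(rows):
    """Collapse repeated (user, product) ratings into their mean rating."""
    grouped = defaultdict(list)
    for user, product, rating in rows:
        grouped[(user, product)].append(rating)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical toy rows: user_a rated product_x twice
rows = [("user_a", "product_x", 5), ("user_a", "product_x", 4), ("user_b", "product_x", 2)]
print(average_duplicate_ratings(rows))
```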

In [None]:
train, test = train_test_split(df_master, test_size=0.30, random_state=43)
print(train.shape)
print(test.shape)

In [None]:
train['reviews_username'].nunique()

In [None]:
df_pivot = train.pivot(
    index='reviews_username',
    columns='name',
    values='reviews_rating'
).fillna(0)


df_pivot.head(3)

In [None]:
# Copy the train dataset into dummy_train
dummy_train = train.copy()
dummy_train.head()

In [None]:
# Mark every rated product as 0; after pivoting, unrated products are filled with 1 so that only they are kept for prediction.
dummy_train['reviews_rating'] = dummy_train['reviews_rating'].apply(lambda x: 0 if x>=1 else 1)

In [None]:
# Convert the dummy train dataset into matrix format.
dummy_train = dummy_train.pivot(
    index='reviews_username',
    columns='name',
    values='reviews_rating'
).fillna(1)
dummy_train.head()

### Adjusted Cosine similarity

In [None]:
# Create a user-product matrix.
df_pivot = train.pivot(
    index='reviews_username',
    columns='name',
    values='reviews_rating'
)
df_pivot.head()

In [None]:
#Normalise each user's ratings around a mean of 0
mean = np.nanmean(df_pivot, axis=1)
df_subtracted = (df_pivot.T-mean).T
df_subtracted.head()

In [None]:
# Create the user similarity matrix using the pairwise_distances function.
user_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric='cosine')
user_correlation[np.isnan(user_correlation)] = 0
print(user_correlation)
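For a single pair of users, the mean-centring plus cosine computation above (adjusted cosine similarity) reduces to the following pure-Python sketch. `adjusted_cosine` is a hypothetical helper that takes product-to-rating dicts; as with `df_subtracted.fillna(0)`, products a user has not rated count as 0 after centring, so only co-rated products contribute to the dot product.

```python
import math


def adjusted_cosine(ratings_a, ratings_b):
    """Adjusted cosine similarity between two users.

    ratings_a / ratings_b map product -> rating for the products each user rated.
    Each user's ratings are centred on that user's own mean rating.
    """
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    centred_a = {p: r - mean_a for p, r in ratings_a.items()}
    centred_b = {p: r - mean_b for p, r in ratings_b.items()}
    # Only products rated by both users contribute to the dot product
    dot = sum(centred_a[p] * centred_b[p] for p in centred_a.keys() & centred_b.keys())
    norm_a = math.sqrt(sum(v * v for v in centred_a.values()))
    norm_b = math.sqrt(sum(v * v for v in centred_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user who gave identical ratings has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


user_1 = {"p1": 5, "p2": 3, "p3": 1}
user_2 = {"p1": 4, "p2": 3, "p3": 2}
print(adjusted_cosine(user_1, user_2))
```

Centring is what lets a strict rater (mostly 2s and 3s) and a generous rater (mostly 4s and 5s) still come out as similar when they rank products the same way.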

In [None]:
user_correlation.shape

## Predicting for User-User based recommendation

In [None]:
user_correlation[user_correlation<0]=0
user_correlation

In [None]:
user_predicted_ratings = np.dot(user_correlation, df_pivot.fillna(0))
user_predicted_ratings

In [None]:
user_predicted_ratings.shape

In [None]:
user_final_rating = np.multiply(user_predicted_ratings,dummy_train)
user_final_rating.head()
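Row by row, `np.dot(user_correlation, df_pivot.fillna(0))` followed by the `dummy_train` mask scores each unrated product as the similarity-weighted sum of other users' ratings. The sketch below spells that out for one target user with hypothetical names and toy data; negative similarities are clipped to 0 as in the notebook, and `top_n` mirrors `sort_values(ascending=False)[0:20]`.

```python
def user_based_scores(target, ratings, similarity):
    """Predicted score for each product the target user has not rated.

    score(product) = sum over users u of max(sim(target, u), 0) * rating(u, product).
    ratings: user -> {product: rating}; similarity: user -> {user: similarity}.
    """
    scores = {}
    for other, other_ratings in ratings.items():
        sim = max(similarity[target].get(other, 0.0), 0.0)  # negative similarities clipped to 0
        for product, rating in other_ratings.items():
            scores[product] = scores.get(product, 0.0) + sim * rating
    # Drop products the target already rated (the role of dummy_train)
    return {p: s for p, s in scores.items() if p not in ratings[target]}


def top_n(scores, n):
    """Product names sorted by descending score, truncated to n."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical toy data
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 5, "B": 4, "C": 5},
    "u3": {"A": 1, "D": 5},
}
similarity = {"u1": {"u1": 1.0, "u2": 0.9, "u3": -0.5}}
scores = user_based_scores("u1", ratings, similarity)
print(top_n(scores, 20))  # ['C', 'D']
```

Product D still appears with a score of 0 because the only user who rated it is negatively correlated with u1, matching how the notebook's masked matrix keeps every unrated column.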

### Recommending top 20

In [None]:
# Take the user ID as input.
#user_input = str(input("Enter your user name"))
user_input = '01impala'
print(user_input)

In [None]:
user_final_rating.head()

In [None]:
d = user_final_rating.loc[user_input].sort_values(ascending=False)[0:20]
d

## Evaluation for User-User based recommendation

In [None]:
# Find the users common to the test and train datasets.
common = test[test.reviews_username.isin(train.reviews_username)]
common.shape

In [None]:
common.head()

In [None]:
# convert into the user-product matrix.
common_user_based_matrix = common.pivot_table(index='reviews_username', columns='name', values='reviews_rating')

In [None]:
common_user_based_matrix.head()

In [None]:
common_user_based_matrix.shape

In [None]:
# Convert the user_correlation matrix into dataframe.
user_correlation_df = pd.DataFrame(user_correlation)

In [None]:
user_correlation_df.head()

In [None]:
user_correlation_df.shape

In [None]:
df_subtracted.shape

In [None]:
user_correlation_df['reviews_username'] = df_subtracted.index

user_correlation_df.set_index('reviews_username',inplace=True)
user_correlation_df.head()

In [None]:
user_correlation_df.shape

In [None]:
common.head(3)

In [None]:
list_name = common.reviews_username.tolist()
user_correlation_df.columns = df_subtracted.index.tolist()
user_correlation_df_1 =  user_correlation_df[user_correlation_df.index.isin(list_name)]

In [None]:
user_correlation_df_2 = user_correlation_df_1.T[user_correlation_df_1.T.index.isin(list_name)]
user_correlation_df_3 = user_correlation_df_2.T
user_correlation_df_3.head()

In [None]:
user_correlation_df_3.shape

In [None]:
user_correlation_df_3[user_correlation_df_3<0]=0
common_user_predicted_ratings = np.dot(user_correlation_df_3, common_user_based_matrix.fillna(0))
common_user_predicted_ratings

In [None]:
print(common_user_predicted_ratings.shape)
print(common.shape)

In [None]:
dummy_test = common.copy()

dummy_test['reviews_rating'] = dummy_test['reviews_rating'].apply(lambda x: 1 if x>=1 else 0)

dummy_test = dummy_test.pivot_table(index='reviews_username', columns='name', values='reviews_rating').fillna(0)

In [None]:
common_user_predicted_ratings = np.multiply(common_user_predicted_ratings,dummy_test)

In [None]:
common_user_predicted_ratings.head()

### Calculating RMSE

In [None]:
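from sklearn.preprocessing import MinMaxScaler

# Keep only positive predicted ratings; every other cell becomes NaN
X = common_user_predicted_ratings.copy()
X = X[X > 0]

# Rescale each product's predictions to the 1-5 rating range (NaNs are ignored when fitting)
scaler = MinMaxScaler(feature_range=(1, 5))
print(scaler.fit(X))
y = scaler.transform(X)

print(y)

In [None]:
# Actual test ratings as a user-product matrix
common_ = common.pivot_table(index='reviews_username', columns='name', values='reviews_rating')

In [None]:
# Count the non-NaN predicted ratings
total_non_nan = np.count_nonzero(~np.isnan(y))

In [None]:
# Squared errors summed over non-NaN cells (pandas skips NaN by default)
rmse = (((common_ - y) ** 2).sum().sum() / total_non_nan) ** 0.5
print(rmse)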
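The RMSE cells combine two steps: min-max rescaling of the weighted-sum predictions onto the 1 to 5 rating scale, then a root mean squared error over the cells where both an actual and a predicted rating exist. The sketch below shows both on a single flattened list (the notebook's `MinMaxScaler` instead rescales each product column separately); `min_max_scale` and `rmse` are hypothetical helpers, and `None` stands in for NaN.

```python
import math


def min_max_scale(values, low=1.0, high=5.0):
    """Map values linearly onto [low, high], like MinMaxScaler(feature_range=(1, 5)) on one column."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [low] * len(values)  # a zero range maps every value to the lower bound
    return [low + (v - vmin) * (high - low) / (vmax - vmin) for v in values]


def rmse(actual, predicted):
    """Root mean squared error over positions where both values are present (None = missing)."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a is not None and p is not None]
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


raw_predictions = [0.5, 2.5, 4.5]         # hypothetical weighted-sum scores
scaled = min_max_scale(raw_predictions)   # [1.0, 3.0, 5.0]
actual = [1, 4, 5]
print(rmse(actual, scaled))
```

Rescaling is needed because the raw weighted sums are not on the rating scale at all, so comparing them directly with 1 to 5 star ratings would make the RMSE meaningless.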

## Predicting for Item-Item based recommendation

In [None]:
df_pivot = train.pivot(
    index='reviews_username',
    columns='name',
    values='reviews_rating'
).T
df_pivot.head()

In [None]:
mean = np.nanmean(df_pivot, axis=1)
df_subtracted = (df_pivot.T-mean).T
df_subtracted.head()

In [None]:
# Item Similarity Matrix
item_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric='cosine')
item_correlation[np.isnan(item_correlation)] = 0
item_correlation

In [None]:
item_correlation.shape

In [None]:
#Keep only positive correlations (negative values are set to 0)
item_correlation[item_correlation<0]=0
item_correlation

In [None]:
item_predicted_ratings = np.dot((df_pivot.fillna(0).T),item_correlation)
item_predicted_ratings

In [None]:
print(item_predicted_ratings.shape)
print(dummy_train.shape)

In [None]:
#Keep ratings only for the products not yet rated by the user
item_final_rating = np.multiply(item_predicted_ratings,dummy_train)
item_final_rating.head()
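In the item-based variant, `np.dot(df_pivot.fillna(0).T, item_correlation)` scores a product j for a user as the sum, over the products i that user rated, of rating(i) times sim(i, j). A pure-Python sketch for one user, with a hypothetical helper name and toy similarities (negative correlations clipped to 0, as in the notebook):

```python
def item_based_scores(user_ratings, item_similarity):
    """Predicted score for each product the user has not rated.

    score(j) = sum over rated items i of rating(i) * max(sim(i, j), 0).
    """
    scores = {}
    for item, rating in user_ratings.items():
        for other, sim in item_similarity[item].items():
            scores[other] = scores.get(other, 0.0) + rating * max(sim, 0.0)
    # Drop products the user already rated (the role of dummy_train)
    return {j: s for j, s in scores.items() if j not in user_ratings}


# Hypothetical toy data: the user rated products A and B
user_ratings = {"A": 5, "B": 2}
item_similarity = {
    "A": {"A": 1.0, "B": 0.2, "C": 0.8, "D": -0.3},
    "B": {"A": 0.2, "B": 1.0, "C": 0.1, "D": 0.5},
}
scores = item_based_scores(user_ratings, item_similarity)
print(sorted(scores, key=scores.get, reverse=True))  # ['C', 'D']
```

Compared with the user-based version, the similarity here is computed between products, so a user's own history drives the score rather than the histories of similar users.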

### Recommending top 20

In [None]:
# Take the user ID as input
#user_input = str(input("Enter your user name"))
user_input = '01impala'
print(user_input)

In [None]:
item_final_rating.head(20)

In [None]:
# Recommending the Top 20 products to the user.
d = pd.DataFrame(item_final_rating.loc[user_input].sort_values(ascending=False)[0:20])
d

In [None]:
c = d.index.to_list()

## Evaluation for Item-Item based recommendation