
##### Building Recommendation Model

We will build the following two recommendation systems:

1. User-based recommendation system

2. Item-based recommendation system

Once both are built, we will compare them and select the one best suited to this case. The selected system will then be used to recommend the 20 products that a user is most likely to purchase, based on the ratings.


In [2]:
# Import general purpose libraries 
import os
import re
import time
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter
from datetime import datetime
import warnings
warnings.filterwarnings("ignore") 

# Set Pandas options
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_colwidth', 300)
pd.set_option("display.precision", 2)

#Helper Functions
# from utils import (
#     clean_stopwords,
#     clean_punctuation, 
#     calc_missing_rowcount,
#     clean_lemma    
# )

In [3]:


# ML Modelling Libraries

from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import pairwise_distances, cosine_similarity #calculate distance similarity
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score



#### Import Data

In [None]:
import os
cwd = os.getcwd()
#df_reco = pd.read_csv(cwd + "/user_reviews.csv", parse_dates= ['reviews_date'])

**We need only the below columns for building a recommendation system**

["id", "name", "reviews_rating", "reviews_username"]

**Handle Null Values**
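A minimal sketch of the column selection and null handling. The toy rows below stand in for `user_reviews.csv` and are invented for illustration; dropping rows with a missing username is one reasonable choice, not necessarily the only one.

```python
import pandas as pd

# Toy stand-in for the reviews data; the rows are invented for illustration.
df_reco = pd.DataFrame({
    "id": ["p1", "p1", "p2", "p3"],
    "name": ["Soap", "Soap", "Shampoo", "Lotion"],
    "reviews_rating": [5, 4, 3, 2],
    "reviews_username": ["ann", None, "bob", "cy"],
    "reviews_text": ["great", "ok", "fine", "bad"],
})

# Keep only the columns the recommender needs.
cols = ["id", "name", "reviews_rating", "reviews_username"]
df_reco = df_reco[cols]

# A rating without a username cannot be attributed to a user, so drop it.
df_reco = df_reco.dropna(subset=["reviews_username"]).reset_index(drop=True)
print(df_reco)
```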

**Train-Test Split**

In [4]:
# Pivot the train ratings' dataset into matrix format in which columns are Products and the rows are usernames.
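The split and pivot could look like the sketch below. The toy ratings, the 25% test size and the `random_state` are assumptions made for illustration. Note that `pivot_table` averages duplicate user-product pairs by default.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy ratings, invented for illustration.
ratings = pd.DataFrame({
    "name": ["Soap", "Shampoo", "Soap", "Lotion", "Shampoo", "Lotion", "Soap", "Lotion"],
    "reviews_username": ["ann", "ann", "bob", "bob", "cy", "cy", "dee", "dee"],
    "reviews_rating": [5, 3, 4, 2, 5, 1, 3, 4],
})

# Hold out 25% of the ratings for evaluation (assumed split ratio).
train, test = train_test_split(ratings, test_size=0.25, random_state=42)

# Rows are usernames, columns are products; unrated cells become 0.
df_pivot = train.pivot_table(
    index="reviews_username", columns="name", values="reviews_rating"
).fillna(0)
print(df_pivot)
```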


Creating dummy train & dummy test datasets

The dummy train dataset will be used later to predict ratings for products that the user has not rated. To exclude products the user has already rated, they are marked as 0 in dummy train; products not rated by the user are marked as 1.

The dummy test dataset will be used for evaluation. There, predictions are made only for products the user has rated, so these are marked as 1 and all others as 0. This is the opposite of dummy_train.
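The two masks described above can be sketched as follows. The toy `train` and `test` frames are invented for illustration, and the threshold `x >= 1` assumes ratings start at 1.

```python
import pandas as pd

# Toy train and test ratings, invented for illustration.
train = pd.DataFrame({
    "name": ["Soap", "Shampoo", "Soap", "Lotion"],
    "reviews_username": ["ann", "ann", "bob", "bob"],
    "reviews_rating": [5, 3, 4, 2],
})
test = pd.DataFrame({
    "name": ["Lotion"], "reviews_username": ["ann"], "reviews_rating": [4],
})

# dummy_train: 0 where the user already rated the product, 1 otherwise.
dummy_train = train.copy()
dummy_train["reviews_rating"] = dummy_train["reviews_rating"].apply(
    lambda x: 0 if x >= 1 else 1
)
# User-product pairs absent from train were never rated, so fill them with 1.
dummy_train = dummy_train.pivot_table(
    index="reviews_username", columns="name", values="reviews_rating"
).fillna(1)

# dummy_test is the opposite: 1 where the user rated the product, 0 otherwise.
dummy_test = test.copy()
dummy_test["reviews_rating"] = dummy_test["reviews_rating"].apply(
    lambda x: 1 if x >= 1 else 0
)
dummy_test = dummy_test.pivot_table(
    index="reviews_username", columns="name", values="reviews_rating"
).fillna(0)
print(dummy_train)
```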


In [None]:
# Copy the train dataset into dummy_train

In [None]:
# Convert the dummy train dataset into matrix format



**Cosine Similarity**

Cosine similarity quantifies the similarity between two vectors as the cosine of the angle between them. In this case, the vectors hold the reviews_rating values.

**Adjusted Cosine**

Adjusted cosine similarity is a modified version of vector-based similarity that accounts for the fact that different users have different rating scales. Some users tend to rate items highly in general, while others tend to give lower ratings. To correct for this, we subtract each user's average rating from that user's rating of each product.
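In symbols, the user-based form can be written as below. This is a sketch: the notation is introduced here for illustration, with $r_{u,i}$ the rating user $u$ gave product $i$ and $\bar{r}_u$ that user's mean rating. The sums run over products, and we assume an unrated product contributes 0 after centering.

```latex
\mathrm{sim}(u, v) =
\frac{\sum_{i} (r_{u,i} - \bar{r}_u)\,(r_{v,i} - \bar{r}_v)}
     {\sqrt{\sum_{i} (r_{u,i} - \bar{r}_u)^2}\;\sqrt{\sum_{i} (r_{v,i} - \bar{r}_v)^2}}
```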


In [None]:
# Pivot the train ratings' dataset into matrix format in which columns are Products and the rows are usernames.



Normalising each user's product ratings around a mean of 0
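A minimal sketch of this mean-centering step, assuming a user x product matrix in which NaN (rather than 0) marks an unrated product so that it stays out of the mean. The toy values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy user x product matrix; NaN marks "not rated".
df_pivot = pd.DataFrame(
    {"Soap": [5.0, 4.0, np.nan],
     "Shampoo": [3.0, np.nan, 1.0],
     "Lotion": [np.nan, 2.0, 5.0]},
    index=["ann", "bob", "cy"],
)

# Mean rating per user; pandas skips NaN by default.
user_mean = df_pivot.mean(axis=1)

# Subtract each user's mean from that user's row.
df_subtracted = df_pivot.sub(user_mean, axis=0)
print(df_subtracted)
```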



**Cosine Similarity**

In [None]:
# Creating the User Similarity Matrix using the pairwise_distances function.
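One way to build the matrix is sketched below, assuming mean-centered ratings with NaN for unrated products (toy values, invented for illustration). `pairwise_distances` returns cosine distance, so the similarity is one minus that value.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import pairwise_distances

# Toy mean-centered user x product matrix; NaN marks "not rated".
df_subtracted = pd.DataFrame(
    {"Soap": [1.0, 1.0, np.nan],
     "Shampoo": [-1.0, np.nan, -2.0],
     "Lotion": [np.nan, -1.0, 2.0]},
    index=["ann", "bob", "cy"],
)

# Cosine similarity = 1 - cosine distance; unrated cells count as 0.
user_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric="cosine")

# Defensive guard: replace any NaN similarity with 0.
user_correlation[np.isnan(user_correlation)] = 0
print(user_correlation)
```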


Prediction - User User

We make predictions using only the users that are positively correlated with the current user, because we are interested in the users who are most similar to the current user. Correlation values less than 0 are therefore ignored.




The rating predicted for a user (for rated as well as unrated products) is the weighted sum of the product ratings present in the rating dataset, with the user correlations as weights.
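The prediction steps can be sketched with plain NumPy arrays. The correlation and rating values are invented for illustration, and the final masking step assumes a dummy train matrix that holds 1 for unrated products and 0 for rated ones.

```python
import numpy as np

# Toy user-user correlations and user x product ratings (0 = not rated).
user_correlation = np.array([[1.0, 0.5, -0.5],
                             [0.5, 1.0, 0.2],
                             [-0.5, 0.2, 1.0]])
ratings = np.array([[5, 3, 0],
                    [4, 0, 2],
                    [0, 1, 5]])
dummy_train = (ratings == 0).astype(int)  # 1 where the user has not rated

# Ignore negatively correlated users.
user_correlation[user_correlation < 0] = 0

# Weighted sum of ratings, with user correlations as weights.
user_predicted_ratings = np.dot(user_correlation, ratings)

# Keep predictions only for products the user has not rated.
user_final_rating = np.multiply(user_predicted_ratings, dummy_train)
print(user_final_rating)
```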




Finding the Top 20 products that a user is most likely to purchase based on the ratings (user-user based recommendation)


In [5]:
# Take a sample username as input.
user_input = '00sab00'
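A sketch of the lookup, assuming the final ratings are held in a DataFrame indexed by username with one column per product. The matrix values here are invented for illustration.

```python
import pandas as pd

# Hypothetical final-rating matrix (users x products).
user_final_rating = pd.DataFrame(
    {"Soap": [0.0, 2.5], "Shampoo": [3.1, 0.0], "Lotion": [1.2, 4.0]},
    index=["00sab00", "debb"],
)

user_input = "00sab00"

# Sort this user's predicted ratings and keep the 20 highest.
top20 = user_final_rating.loc[user_input].sort_values(ascending=False).head(20)
print(top20)
```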




**Evaluation - User User**

Evaluation follows the same steps as the prediction above. The only difference is that we evaluate on products the user has already rated, instead of predicting for products the user has not rated.


In [None]:
# Find out the common users of test and train dataset.
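A sketch of this filtering step, using toy `train` and `test` frames invented for illustration. Users that appear only in the test set have no similarity scores, so their rows are dropped before evaluation.

```python
import pandas as pd

# Toy train and test ratings, invented for illustration.
train = pd.DataFrame({
    "reviews_username": ["ann", "bob", "cy"],
    "name": ["Soap", "Lotion", "Soap"],
    "reviews_rating": [5, 2, 4],
})
test = pd.DataFrame({
    "reviews_username": ["ann", "dee", "cy"],
    "name": ["Lotion", "Soap", "Shampoo"],
    "reviews_rating": [3, 4, 1],
})

# Keep only test rows whose user also appears in train.
common = test[test["reviews_username"].isin(train["reviews_username"])]

# User x product matrix for the common users.
common_user_matrix = common.pivot_table(
    index="reviews_username", columns="name", values="reviews_rating"
)
print(common_user_matrix)
```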

In [None]:
# convert into the user-product matrix.

In [None]:
# Convert the user_correlation matrix into dataframe.

In [None]:
# Creating dummy test dataframe



Calculating the RMSE only for the products rated by the user. For RMSE, the predicted ratings are normalised to the (1, 5) range.
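A sketch of the RMSE computation, with toy arrays invented for illustration. NaN marks user-product pairs that are not rated in the test set, and `MinMaxScaler` rescales each product column independently while leaving NaN in place.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical predictions and true ratings (users x products).
predicted = np.array([[2.0, 1.0], [4.0, np.nan], [6.0, 3.0], [np.nan, 2.0]])
actual = np.array([[1.0, 1.0], [3.0, np.nan], [4.0, 5.0], [np.nan, 2.0]])

# Rescale predictions to the (1, 5) rating range, column by column.
scaler = MinMaxScaler(feature_range=(1, 5))
scaled = scaler.fit_transform(predicted)

# Count only the rated cells, then take the root of the mean squared error.
total_non_nan = np.count_nonzero(~np.isnan(scaled))
rmse = (np.nansum((actual - scaled) ** 2) / total_non_nan) ** 0.5
print(rmse)
```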


In [6]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np  # explicit alias; a wildcard import would shadow built-ins such as sum and any



RMSE (Root Mean Square Error) for User-User recommendation system


##### Using Item Similarity


We take the transpose of the rating matrix so that ratings can be normalised around the mean for each product ID. In the user-based similarity, we took the mean for each user instead of each product.





Normalising the ratings of each product for use with the adjusted cosine


In [7]:
# Item Similarity Matrix
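The item-side steps can be sketched as follows: transpose, center each product's ratings, then compute cosine similarity between products. The toy matrix is invented for illustration, and negative similarities are set to 0 on the assumption that they are ignored here as they were for users.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import pairwise_distances

# Toy user x product matrix (NaN = not rated), transposed to product x user.
df_pivot = pd.DataFrame(
    {"Soap": [5.0, 4.0, np.nan],
     "Shampoo": [3.0, np.nan, 1.0],
     "Lotion": [np.nan, 2.0, 5.0]},
    index=["ann", "bob", "cy"],
).T

# Center each product's ratings around that product's mean.
product_mean = df_pivot.mean(axis=1)
df_subtracted = df_pivot.sub(product_mean, axis=0)

# Cosine similarity between products = 1 - cosine distance.
item_correlation = 1 - pairwise_distances(df_subtracted.fillna(0), metric="cosine")
item_correlation[np.isnan(item_correlation)] = 0

# Ignore negatively correlated products.
item_correlation[item_correlation < 0] = 0
print(item_correlation)
```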


Prediction Item-Item




Filtering the rating only for the products not rated by the user for recommendation
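A sketch of the item-item prediction and the filtering step, with toy values invented for illustration. The ratings are user x product, the similarity matrix is product x product, and the mask assumes 1 marks an unrated product.

```python
import numpy as np

# Toy user x product ratings (0 = not rated) and item-item similarities.
ratings = np.array([[5, 3, 0],
                    [4, 0, 2],
                    [0, 1, 5]])
item_correlation = np.array([[1.0, 0.5, 0.5],
                             [0.5, 1.0, 0.0],
                             [0.5, 0.0, 1.0]])
dummy_train = (ratings == 0).astype(int)  # 1 where the user has not rated

# Each predicted rating is a similarity-weighted sum of the user's own ratings.
item_predicted_ratings = np.dot(ratings, item_correlation)

# Keep predictions only for products the user has not rated.
item_final_rating = np.multiply(item_predicted_ratings, dummy_train)
print(item_final_rating)
```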




Finding the Top 20 products that a user is most likely to purchase based on the ratings (item-item based recommendation)


**Evaluation**

Find the products common to the test and train datasets.


In [8]:
from sklearn.preprocessing import MinMaxScaler
import numpy as np  # explicit alias; a wildcard import would shadow built-ins such as sum and any


Best-suited Recommendation model

* To get the best recommendation model, we will use RMSE (Root Mean Square Error) scores for both user-user and item-item based recommendation systems and do a comparison of the values.

* The recommendation model with the least RMSE will be selected as the best recommendation model.
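The comparison itself is a one-line lookup. The RMSE values below are placeholders invented for illustration, not results from this notebook; substitute the scores from the two evaluations.

```python
# Placeholder RMSE values, invented for illustration only.
rmse_scores = {"user-user": 2.1, "item-item": 3.4}

# The model with the lowest RMSE is selected.
best_model = min(rmse_scores, key=rmse_scores.get)
print(best_model)
```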





Top 20 Product recommendations to the user by the best recommendation model 


Load sentiment model and updated data

In [10]:
import pickle
# pickled_model = pickle.load(open('sentiment_classification_logreg_model.pkl', 'rb'))
# pickled_tfidf = pickle.load(open('tfidf-vectorizer.pkl', 'rb'))
# pickled_data = pickle.load(open('cleaned_data.pkl', 'rb'))



Fine-Tuning the Recommendation System and Recommendation of Top 5 Products


In [11]:
# Create function to recommend top 5 products to any user
def product_recommendations_user(user_name):
    # Get top 20 recommended products from the best recommendation model

    # Get only the recommended products from the prepared dataframe "df_sent"

    # For these 20 products, get their user reviews and pass them through TF-IDF vectorizer to convert the data into suitable format for modeling

    # Use the best sentiment model to predict the sentiment for these user reviews

    # Select only name and predicted_sentiment

    # Create a new dataframe "pred_df" to store the count of positive user sentiments


    # Create a column to measure the total sentiment count

    # Create a column that measures the % of positive user sentiment for each product review

    # Return top 5 recommended products to the user

    pass
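The aggregation steps listed in the comments above can be sketched on a toy frame. The rows, the column names `pos_sent_count`, `total_sent_count` and `pos_sent_percentage`, and the convention that 1 means a positive sentiment are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical per-review predictions for the recommended products (1 = positive).
reviews = pd.DataFrame({
    "name": ["Soap", "Soap", "Soap", "Shampoo", "Shampoo", "Lotion"],
    "predicted_sentiment": [1, 1, 0, 1, 0, 1],
})

# Count positive reviews and total reviews per product.
pred_df = (
    reviews.groupby("name")["predicted_sentiment"]
    .agg(pos_sent_count="sum", total_sent_count="count")
    .reset_index()
)

# Share of positive reviews per product, as a percentage.
pred_df["pos_sent_percentage"] = (
    pred_df["pos_sent_count"] / pred_df["total_sent_count"] * 100
).round(2)

# Keep the five products with the highest positive share.
top5 = pred_df.sort_values("pos_sent_percentage", ascending=False).head(5)
print(top5)
```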


Top 5 Products Recommendation


In [12]:


# Take a sample username as input
user_input = 'debb'
# 'Venkat'
# 'debb'
# 'evrydayhustla420'
print(f"Printing the top 5 recommended products for the user: {user_input} along with each product's positive sentiment count, overall review count and positive sentiment %")
print("\n")
top5_reco_sent_reco_user = product_recommendations_user(user_input)
top5_reco_sent_reco_user



Printing the top 5 recommended products for the user: debb along with each product's positive sentiment count, overall review count and positive sentiment %


