# **Recommender System Using Amazon Reviews**

> <center><img src="https://cdn.vox-cdn.com/thumbor/yUpkdU-kEcTqiO0gntohs74rXYE=/1400x788/filters:format(jpeg)/cdn.vox-cdn.com/uploads/chorus_asset/file/19718016/Amazon_Reviews_Final.jpg" width="1000px"></center>

<h1 style='color:white;background-color:black' > Table of Contents </h1>

* [Introduction](#introduction)
* [Data Acquisition](#data_acquisition)
* [EDA](#eda)
* [Type of Recommender System](#recommender)
    * [Popular-Based](#popular)
    * [Content-Based](#content)
    * [Collaborative Filtering](#colla)
    * [Hybrid](#hybrid)

<a id="introduction"></a>
## 1. Introduction

<div align='left'><font size="3" color="#000000"> E-commerce websites such as Amazon and Flipkart use recommendation models to tailor suggestions to each user. Amazon currently uses item-to-item collaborative filtering, which scales to massive data sets and produces high-quality recommendations in real time. This type of filtering matches each of the user's purchased and rated items to similar items, then combines those similar items into a recommendation list for the user.
</font></div>

* **Goal:**
<div align='left'><font size="3" color="#000000"> In this project we build a recommendation model for Amazon electronics products.
</font></div>


* **Attribute Information:**
<div align='left'><font size="3" color="#000000"> 
<ul>
  <li>userId: unique identifier of each user (first column)</li>
  <li>productId: unique identifier of each product (second column)</li>
  <li>rating: rating given to the product by the user (third column)</li>
  <li>timestamp: time of the rating (fourth column)</li>
</ul>
</font></div>


<a id="data_acquisition"></a>
## 2. Data Acquisition

### Import Libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
%matplotlib inline

# Split
from sklearn.model_selection import train_test_split

from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

### Load Dataset

In [None]:
df = pd.read_csv("/kaggle/input/amazon-product-reviews/ratings_Electronics (1).csv",
                             names=['userId', 'productId','rating','timestamp'])

<a id="eda"></a>
## 3. EDA

In [None]:
df.head()

In [None]:
print("Total Reviews:",df.shape[0])
print("Total Columns:",df.shape[1])

In [None]:
# Taking subset of the dataset
df = df.iloc[:5000,0:]

In [None]:
print("Total Reviews:",df.shape[0])
print("Total Columns:",df.shape[1])

In [None]:
print("Number of distinct rating values:", df.rating.nunique())
print("Total number of users   :", df.userId.nunique())
print("Total number of products  :", df.productId.nunique())

In [None]:
df.info()

In [None]:
# Check missing value
df.isnull().sum()

In [None]:
# Count fully duplicated rows (0 means there are no duplicates)
df.duplicated().sum()

In [None]:
# rating describe summary 
df.describe()['rating']

In [None]:
print("Unique value of Rating:",df.rating.unique())

In [None]:
# Find the minimum and maximum ratings
print('Minimum rating is: %d' %(df.rating.min()))
print('Maximum rating is: %d' %(df.rating.max()))

### 3.1 Data Visualization

In [None]:
# Per-product rating statistics
ratings = pd.DataFrame(df.groupby('productId')['rating'].mean())  # 'rating' holds the mean rating per product
ratings['ratings_count'] = df.groupby('productId')['rating'].count()
ratings['ratings_average'] = df.groupby('productId')['rating'].mean()
ratings.head(10)

In [None]:
plt.figure(figsize=(10,4))
ratings['rating'].hist(bins=70)

In [None]:
sns.jointplot(x='rating',y='ratings_count',data=ratings,alpha=0.5)

In [None]:
# Top 30 products by number of ratings
popular_products = pd.DataFrame(df.groupby('productId')['rating'].count())
most_popular = popular_products.sort_values('rating', ascending=False)
most_popular.head(30).plot(kind = "bar",figsize=(12, 4))

<a id="recommender"></a>
## 4. Type of Recommender System

### Methods Used
<div align='left'><font size="3" color="#000000"> Four types of recommender systems:
</font></div>

<div align='left'><font size="3" color="#000000"> 
<ol>
  <li>Popular-Based</li>
  <li>Content-Based</li>
  <li>Collaborative Filtering</li>
  <li>Hybrid</li>
</ol>
</font></div>

> <center><img src="https://miro.medium.com/max/700/1*AaE5pUCOkMS6Dv6j96trsA.png" width="500px"></center>

<a id="popular"></a>
### 1. Popular-Based

<div align='left'><font size="3" color="#000000"> This is the baseline and the most intuitive kind of recommendation, and it can be found almost anywhere. Examples are the IMDB top-rated movies and the "Top 10 in your country today" list on Netflix. These recommendations are typically shown to new users, about whom the provider doesn't yet have enough information, so recommending what others like is a safe bet.
</font></div>

<div align='left'><font size="3" color="#000000">❗ Limitation: All users get the same recommendation set. It's not personalized.
</font></div>

<div align='left'><font size="3" color="#000000">Example illustration for movie recommendation below:
</font></div>

> <center><img src="https://miro.medium.com/max/646/1*7v-Ha1BOzh2r2y_96WceIg.png" width="500px"></center>


#### 1.1 Develop Recommendation System using Popular-Based method

<div align='left'><font size="3" color="#000000"> A weighted rating is used to score each product. Here is the formula of the weighted rating score.
</font></div>

<div align='center'><font size="4" color="#000000"> WR = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
</font></div>

<div align='left'><font size="3" color="#000000"> 
<ul>
  <li>R is the average rating for the item.</li>
  <li>v is the number of votes for the item.</li>
  <li>m is the minimum number of votes required for an item to be listed among the popular items (set here to the 95th percentile of vote counts)</li>
  <li>C is the average rating across the whole dataset.</li>
</ul>
</font></div>
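To make the formula concrete, here is a minimal worked sketch. The values of `m` and `C` and the two products are hypothetical numbers chosen for illustration, not values taken from the dataset.

```python
def weighted_rating_example(v, R, m, C):
    """WR = v/(v+m) * R + m/(v+m) * C."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Hypothetical values, for illustration only
C_example = 4.0   # assumed mean rating
m_example = 10    # assumed minimum number of votes

# A product with few votes is pulled strongly towards the mean C ...
few_votes = weighted_rating_example(v=10, R=5.0, m=m_example, C=C_example)

# ... while a product with many votes keeps a score close to its own average R.
many_votes = weighted_rating_example(v=90, R=4.8, m=m_example, C=C_example)

print(few_votes, many_votes)
```

With these numbers, the product rated 5.0 by only 10 users ends up below the product rated 4.8 by 90 users, which is the behaviour the weighting is meant to produce.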

In [None]:
vote_counts = ratings[ratings['ratings_count'].notnull()]['ratings_count'].astype('int')
# Keep the averages as floats: casting to int would truncate e.g. 4.8 to 4 and bias C downwards
vote_averages = ratings[ratings['ratings_average'].notnull()]['ratings_average']
C = vote_averages.mean()
print("Average rating of product across the whole dataset is",C)

In [None]:
m = vote_counts.quantile(0.95)
print("Minimum votes required to be listed in the chart is",m)

In [None]:
ratings.head()

In [None]:
qualified = ratings[(ratings['ratings_count'] >= m) & (ratings['ratings_count'].notnull()) & (ratings['ratings_average'].notnull())][['ratings_count', 'ratings_average']]

In [None]:
qualified['ratings_count'] = qualified['ratings_count'].astype('int')
# ratings_average stays a float so that R is not truncated in the weighted rating
# Sort first, then take the head, so the most-rated products are shown
qualified.sort_values(by='ratings_count', ascending=False).head()

In [None]:
qualified.shape

In [None]:
def weighted_rating(x):
    v = x['ratings_count']
    R = x['ratings_average']
    return (v/(v+m) * R) + (m/(m+v) * C)

In [None]:
qualified['wr'] = qualified.apply(weighted_rating, axis=1)

In [None]:
qualified = qualified.sort_values('wr', ascending=False).head(20)

#### 1.2 Top 10 Products for recommendation using Popular-based method

In [None]:
qualified.head(10)

In [None]:
# Add color
from matplotlib import cm
color = cm.inferno_r(np.linspace(.4, .8, 30))

rating_plot_count = qualified['ratings_count'].plot.bar(figsize=(12, 4),color=color)
rating_plot_count.set_title("Rating Count Bar-Plot")
rating_plot_count.set_xlabel("productId")
rating_plot_count.set_ylabel("Count")

In [None]:
rating_plot_avg = qualified['ratings_average'].plot.bar(figsize=(12, 4),color=color)
rating_plot_avg.set_title("Rating Average Bar-Plot")
rating_plot_avg.set_xlabel("productId")
rating_plot_avg.set_ylabel("rating")

In [None]:
wr_plot = qualified['wr'].plot.bar(figsize=(12, 4),color=color)
wr_plot.set_title("Weighted Rating Bar-Plot")
wr_plot.set_xlabel("productId")
wr_plot.set_ylabel("rating")

<a id="content"></a>
### 2. Content-Based

<div align='left'><font size="3" color="#000000"> Unlike the popular-based method, which ranks items by their ratings alone, content-based filtering focuses on the attributes of items (for example genre, category, or description) and recommends items that are similar to those a user already liked. We compute a similarity score between each pair of products, rank the scores from highest to lowest, and select as many of the top-ranked products as we want to recommend.
</font></div>

<div align='left'><font size="3" color="#000000">❗ Limitation: The recommendations are limited to what users have liked, viewed, or interacted with before. Users don't get a chance to explore areas they have never visited. Also, all users who like item X will receive the same recommendation set.
</font></div>

<div align='left'><font size="3" color="#000000">Example illustration for movie recommendation below:
</font></div>

> <center><img src="https://miro.medium.com/max/647/1*NvmFrVY5BDI7Fjw6SO9Kbg.png" width="500px"></center>

<div align='left'><font size="3" color="#000000"> In this particular case, the dataset contains no product attributes (such as product type), only user IDs, product IDs, ratings, and timestamps, so we can't implement this method here.
</font></div>
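For reference, the idea could be sketched as follows if product attributes were available. The product IDs and descriptions below are hypothetical, since the ratings file has no such field, and plain bag-of-words cosine similarity stands in for whatever text representation one would choose in practice.

```python
import math
from collections import Counter

# Hypothetical product descriptions; the ratings dataset has no such column
products = {
    "P1": "wireless bluetooth headphones with microphone",
    "P2": "bluetooth wireless earbuds with charging case",
    "P3": "usb c charging cable",
}

def bag_of_words(text):
    """Represent a description as word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = {pid: bag_of_words(text) for pid, text in products.items()}

def similar_products(product_id, n=2):
    """Rank the other products by similarity to product_id, highest first."""
    target = vectors[product_id]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != product_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

content_result = similar_products("P1")
print(content_result)
```

Here the two audio products share several words and rank as most similar, while the cable shares none with the headphones and scores zero.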

<a id="colla"></a>
### 3. Collaborative Filtering

<div align='left'><font size="3" color="#000000"> In general, Collaborative filtering (CF) is more commonly used than content-based systems because it usually gives better results and is relatively easy to understand (from an overall implementation perspective). CF is based on the idea that the best recommendations come from people who have similar tastes. In other words, it uses historical item ratings of like-minded people to predict how someone would rate an item.
</font></div>

<div align='left'><font size="3" color="#000000"> Collaborative filtering recommends a set of items based on what is called the user-item interaction matrix. Here is what the user-item interaction matrix looks like.
</font></div>



> <center><img src="https://miro.medium.com/max/614/1*HzXfBUMiFl6gezFT9bx-Tw.png" width="500px"></center>

<div align='left'><font size="3" color="#000000">Collaborative filtering can be divided into two types:
</font></div>

<div align='left'><font size="3" color="#000000"> 
<ul>
  <li>Memory-Based Collaborative Filtering</li>
  <li>Model-Based Collaborative filtering</li>
</ul>
</font></div>

#### 3.1 Memory-Based Collaborative Filtering

<div align='left'><font size="3" color="#000000">Memory-Based Collaborative Filtering approaches can be divided into two main sections: 
</font></div>

<div align='left'><font size="3" color="#000000"> 
<ul>
  <li>User-item filtering</li>
  <li>Item-item filtering</li>
</ul>
</font></div>

<div align='left'><font size="3" color="#000000">User-item filtering takes a particular user, finds users that are similar to that user based on similarity of ratings, and recommends items that those similar users liked. 
</font></div>

<div align='left'><font size="3" color="#000000">In contrast, item-item filtering will take an item, find users who liked that item, and find other items that those users or similar users also liked. It takes items and outputs other items as recommendations. 
</font></div>

<div align='left'><font size="3" color="#000000"> 
<ul>
  <li>Item-Item Collaborative Filtering: “Users who liked this item also liked …”</li>
  <li>User-Item Collaborative Filtering: “Users who are similar to you also liked …”</li>
</ul>
</font></div>
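As a minimal sketch of the item-item variant, the snippet below computes cosine similarity between items from a handful of toy `(userId, productId, rating)` triples. The triples are hypothetical and only mirror the shape of the dataset; missing ratings are treated as zero, which is a simplifying assumption.

```python
import math
from collections import defaultdict

# Toy (userId, productId, rating) triples; hypothetical values
toy_triples = [
    ("u1", "A", 5), ("u1", "B", 4),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "C", 1),
    ("u3", "B", 2), ("u3", "C", 5),
]

# Build one sparse rating vector per item: item -> {user: rating}
item_vectors = defaultdict(dict)
for user, item, rating in toy_triples:
    item_vectors[item][user] = rating

def item_cosine(vec_a, vec_b):
    """Cosine similarity between two sparse rating vectors (missing = 0)."""
    shared_users = set(vec_a) & set(vec_b)
    dot = sum(vec_a[u] * vec_b[u] for u in shared_users)
    norm_a = math.sqrt(sum(r * r for r in vec_a.values()))
    norm_b = math.sqrt(sum(r * r for r in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def items_similar_to(item, n=2):
    """'Users who liked this item also liked ...': rank the other items."""
    scores = [(other, item_cosine(item_vectors[item], vec))
              for other, vec in item_vectors.items() if other != item]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

item_result = items_similar_to("A")
print(item_result)
```

The user-item variant follows the same pattern with the roles swapped: build one vector per user (`user -> {item: rating}`) and compare users instead of items.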

#### 3.2 Model-Based Collaborative Filtering (User-Item Recommendation)

<div align='left'><font size="3" color="#000000">In this method we use a well-known matrix factorization technique called Singular Value Decomposition (SVD). It learns latent factors for users and items from the user-item interaction matrix above and uses them to predict how a user would rate an item, which personalizes the recommendations. The figure below shows how the set of recommendations for user#1 is derived. For each user, the set of recommendations changes based on the group of similar users, and the group of similar users varies based on how user#1 interacts with each item.
</font></div>

> <center><img src="https://miro.medium.com/max/700/1*YGDIaNdODnlvhA7BrDoXVA.png" width="700px"></center>
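To give an intuition for what a matrix-factorization recommender learns, here is a small pure-Python sketch of a biased factorization trained with stochastic gradient descent. It is written in the spirit of the SVD algorithm in Surprise but is not that library's implementation; the toy ratings, the function name `train_mf`, and the hyperparameters are all assumptions chosen for illustration.

```python
import math
import random

# Toy (userId, productId, rating) triples; hypothetical values
mf_ratings = [
    ("u1", "A", 5.0), ("u1", "B", 4.0),
    ("u2", "A", 4.0), ("u2", "B", 5.0), ("u2", "C", 1.0),
    ("u3", "B", 2.0), ("u3", "C", 5.0),
]

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.05, reg=0.02, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD; returns a predict function."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)   # global mean rating
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    bu = {u: 0.0 for u in users}                        # user biases
    bi = {i: 0.0 for i in items}                        # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return predict

def training_rmse(predict, ratings):
    """Root mean squared error of predict on the given ratings."""
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

untrained = train_mf(mf_ratings, n_epochs=0)   # global mean plus tiny random factors
trained = train_mf(mf_ratings)

print(training_rmse(untrained, mf_ratings), training_rmse(trained, mf_ratings))
```

The training error drops after fitting because the biases and latent factors absorb the per-user and per-item patterns. Note that this sketch only predicts for users and items seen during training.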

In [None]:
# Ratings are on a 1-5 scale; (1, 5) is also the Reader default
reader = Reader(rating_scale=(1, 5))

In [None]:
df.head()

In [None]:
data = Dataset.load_from_df(df[['userId', 'productId', 'rating']], reader)

In [None]:
# Use the famous SVD algorithm
svd = SVD()

# Run 5-fold cross-validation and then print results
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

<div align='left'><font size="3" color="#000000">From these results, the mean Root Mean Square Error (RMSE) is not good for our case. A likely cause is the lack of training data, since we use only the first 5,000 ratings. Now let's train the model on the full subset and make some predictions.
</font></div>
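For readers unfamiliar with the two measures reported by the cross-validation, the sketch below computes RMSE and MAE by hand. The actual and predicted ratings are made-up numbers, not results from this notebook.

```python
import math

def rmse_metric(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae_metric(actual, predicted):
    """Mean absolute error: average size of the error in rating points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up ratings for illustration only
actual_ratings = [5.0, 3.0, 4.0, 1.0]
predicted_ratings = [4.0, 3.0, 5.0, 3.0]

print(rmse_metric(actual_ratings, predicted_ratings))
print(mae_metric(actual_ratings, predicted_ratings))
```

On a 1-5 scale, an MAE of 1.0 means predictions are off by one star on average. RMSE is at least as large as MAE and grows faster when a few predictions are badly wrong.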

In [None]:
trainset = data.build_full_trainset()
svd.fit(trainset)

In [None]:
df.head()

In [None]:
df['userId'].value_counts()

In [None]:
# Check specific userId review
df[df['userId'] == 'A3LDPF5FMB782Z']

In [None]:
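# Predict the rating for one (user, product) pair;
# r_ui is the known true rating and is only stored alongside the estimate
svd.predict('A3LDPF5FMB782Z', '140053271X', r_ui=5.0)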

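<div align='left'><font size="3" color="#000000">From the prediction results, the estimated rating is quite close to the actual value. Note that the model was fitted on the full subset, which presumably includes this rating, so this is a sanity check rather than an unbiased evaluation.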
</font></div>
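A single prediction can be turned into a recommendation list by scoring every candidate product for a user and keeping the highest-scoring ones. The helper below is a hypothetical sketch: `top_n_for_user` and the stub predictor are not part of Surprise. With the fitted model above, one could pass `lambda u, i: svd.predict(u, i).est` as `predict_fn`.

```python
def top_n_for_user(predict_fn, user_id, candidate_items, already_rated, n=5):
    """Score unrated candidates for one user and return the n best (item, score) pairs."""
    scored = [(item, predict_fn(user_id, item))
              for item in candidate_items if item not in already_rated]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Stub predictor with made-up scores, standing in for a trained model
fake_scores = {"A": 4.2, "B": 3.1, "C": 4.8, "D": 2.0}

def stub_predict(user_id, item_id):
    return fake_scores[item_id]

recommendations = top_n_for_user(
    stub_predict, "u1", list(fake_scores), already_rated={"A"}, n=2
)
print(recommendations)
```

Items the user has already rated are excluded so that the list contains only new suggestions.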

<a id="hybrid"></a>
### 4. Hybrid

<div align='left'><font size="3" color="#000000">Each method has its own strengths. Combining those strengths can provide better recommendations, which is the idea behind the hybrid method. For example, we can combine content-based and item-based collaborative filtering recommendations to leverage both item attributes (such as genres) and user-item interactions.
</font></div>
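One simple way to combine two recommenders is a weighted blend of their scores. The sketch below assumes both score sets are already on the same 0-1 scale; the scores, the function name `blend_scores`, and the weight `alpha` are hypothetical.

```python
def blend_scores(content_scores, cf_scores, alpha=0.5):
    """Weighted blend: alpha * content-based score + (1 - alpha) * collaborative score.

    Items missing from one recommender get a score of 0.0 from it.
    """
    items = set(content_scores) | set(cf_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
              + (1 - alpha) * cf_scores.get(item, 0.0)
        for item in items
    }

# Made-up scores on a 0-1 scale, for illustration only
content_scores = {"A": 0.9, "B": 0.2}
cf_scores = {"B": 0.8, "C": 0.6}

blended = blend_scores(content_scores, cf_scores, alpha=0.5)
ranked = sorted(blended, key=blended.get, reverse=True)
print(ranked)
```

Raising `alpha` favours the content-based signal, which can help for products with few ratings, while lowering it favours the collaborative signal.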

## 5. References

<div align='left'><font size="3" color="#000000"> 
<ul>
  <li>https://www.kaggle.com/rounakbanik/movie-recommender-systems</li>
  <li>https://towardsdatascience.com/a-complete-guide-to-recommender-system-tutorial-with-sklearn-surprise-keras-recommender-5e52e8ceace1</li>
</ul>
</font></div>