# Skincare Recommendation System using Collaborative Filtering (Matrix Factorization)

## Introduction

**Have you ever wondered how websites like Bukalapak are so good at suggesting products you might want, even before you start searching for them?**

Well, the answer to this marvel lies in something called "Recommendation Systems." These systems are like your shopping companions in the digital world, and they're pretty good at understanding your preferences and showing you things you'd love.

Let's break it down a bit. Recommendation systems are like helpful friends who pay attention to what you look at, what you buy, and even what you put in your virtual shopping cart. They do this to get a sense of what you like. Then, using some clever math and algorithms, they suggest other products that you might find interesting. That's why you often see suggestions like "People Also Bought" or "You Might Also Like."

But how do these systems know what you'll like? That's what we'll explore here. **Collaborative filtering** is one of the basic models for recommendation systems. It is based on the assumption that people like items similar to other items they already like, or items liked by other people with similar taste.

Let's dive into the fascinating world of product recommendations in e-commerce!

<img src="assets/New Project.png" width="600" />

<div style="text-align:justify">In the illustration above, Kiki (the girl with the black cat) likes to buy apples, bananas, and watermelons, while Satsuki (the girl in the yellow shirt) likes to buy apples and bananas. Because they share a taste for apples and bananas, we can recommend watermelons to Satsuki.<br></div>
In the <b>collaborative filtering</b> method, there are two approaches that can be implemented:<br>
<b>1. Memory-based approach: </b>builds recommendations by finding the closest users or items using cosine similarity or Pearson correlation coefficients.<br>
<b>2. Model-based approach: </b>builds a model that predicts a user's rating for the items they have not rated yet.<br>
<br>
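As a rough sketch of the memory-based approach (it is not used in the rest of this notebook), the snippet below computes cosine similarity between users on a tiny rating matrix modelled on the Kiki and Satsuki illustration. The rating values and the third user are hypothetical, and 0 stands for "not rated", which plain cosine similarity treats as a real zero.

```python
import numpy as np

# Hypothetical ratings (0 = not rated); columns: apple, banana, watermelon
items = ["apple", "banana", "watermelon"]
users = ["Kiki", "Satsuki", "Other"]  # "Other" is a made-up third user
ratings = np.array([
    [5, 4, 5],  # Kiki
    [5, 4, 0],  # Satsuki has not rated watermelon
    [1, 0, 2],  # Other
], dtype=float)


def cosine_similarity_matrix(m):
    """Pairwise cosine similarity between the rows (users) of m."""
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return unit @ unit.T


sim = cosine_similarity_matrix(ratings)

# Nearest neighbour of Satsuki (row 1), skipping Satsuki herself
target = 1
order = np.argsort(sim[target])[::-1]
nearest = next(i for i in order if i != target)
print("Most similar to Satsuki:", users[nearest])

# Recommend items the neighbour rated that Satsuki has not rated yet
recs = [items[j] for j in range(len(items))
        if ratings[target, j] == 0 and ratings[nearest, j] > 0]
print("Recommend to Satsuki:", recs)
```

Kiki comes out as Satsuki's closest neighbour, so the item Kiki rated and Satsuki has not (watermelon) is recommended, matching the illustration.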
<div style="text-align:justify">In this notebook, I will create a simple recommender system that recommends skincare products customers have not bought (rated) before. I'll predict the ratings of unrated items using Singular Value Decomposition (SVD), a <b>Matrix Factorization</b> algorithm. The data comes from reviews scraped from the Femaledaily website and contains the product reviews given by customers. It has several attributes; for more details, let's check it out!</div>

### Library Preparation

In [None]:
import pandas as pd
import numpy as np
from scipy.sparse.linalg import svds
import matplotlib.pyplot as plt
import seaborn as sns
import recmetrics

### Data Preparation

#### Read the Data

In [None]:
data = pd.read_csv("data_input/Skincare.csv")
data

#### Check and drop missing values

In [None]:
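print(data.isna().sum())  # number of missing values per column
data = data.dropna(subset=['Reviewer', 'Product', 'Stars'])  # drop rows missing a reviewer, product, or rating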

#### Rename Columns

In [None]:
# rename columns 
data.rename(columns={'Reviewer':'reviewer','Product':'product','Stars':'rating'}, inplace=True)

#### Drop reviewers with empty names

In [None]:
data = data[data['reviewer'] != ' ']

In [None]:
data

## Data Exploration

Because the modelling step builds a user × product matrix, we first need to know how many unique products and users there are.

### Number of Unique Products

In [None]:
uniq_product = data['product'].nunique()
print("Number of unique products:", uniq_product)

There are 3,297 unique products; this number becomes the number of columns of the matrix in the modelling step.

### Number of Unique Users

In [None]:
uniq_reviewer = data['reviewer'].nunique()
print("Number of unique users:", uniq_reviewer)

There are 22,359 unique users; this number becomes the number of rows of the matrix in the modelling step.
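Those two counts already tell us how large and how empty the matrix will be. The sketch below is a hypothetical helper, `matrix_stats` (not part of the original notebook), that reports the matrix size, the share of cells that will actually hold a rating, and the memory a dense float64 matrix needs. For 22,359 users and 3,297 products that is 73,717,623 cells, or roughly 590 MB as float64.

```python
import pandas as pd


def matrix_stats(df, user_col="reviewer", item_col="product"):
    """Size, density and dense float64 memory footprint of the user x item matrix."""
    n_users = df[user_col].nunique()
    n_items = df[item_col].nunique()
    cells = n_users * n_items
    # distinct (user, item) pairs = cells that will hold a real rating after pivoting
    observed = len(df[[user_col, item_col]].drop_duplicates())
    return {
        "n_users": n_users,
        "n_items": n_items,
        "cells": cells,
        "density": observed / cells,
        "dense_mb": cells * 8 / 1e6,  # float64 = 8 bytes per cell
    }


# Hypothetical toy reviews, only to show the calculation
toy = pd.DataFrame({
    "reviewer": ["a", "a", "b", "c"],
    "product": ["p1", "p2", "p1", "p3"],
})
print(matrix_stats(toy))

# The numbers reported above: 22,359 users x 3,297 products
print(22_359 * 3_297, "cells,", 22_359 * 3_297 * 8 / 1e6, "MB as float64")
```

Calling `matrix_stats(data)` on the cleaned DataFrame would give the real density of the matrix built in the modelling step.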

### Distribution of Ratings Given by Users

In [None]:
data['rating'].value_counts().sort_index().plot(kind = 'bar', color = 'red')

The bar plot above shows that users most often give ratings of 5 or 4, which suggests they are generally satisfied with the products.

## Build Recommendation System

### Matrix Factorization

<div style="text-align:justify">If you look at the pivot matrix below, you will find that it has many zero values (missing ratings). This happens because not every user rates every product; such a matrix is called a <b>sparse matrix.</b> Sparsity is a limitation of collaborative filtering models because it biases the information available to the recommender. In particular, it leads to popularity bias: the system tends to recommend the products with the most interactions, without any personalization.<br></div>
<div style="text-align:justify"><b>Matrix Factorization</b> is one way to handle this issue. Matrix factorization breaks one matrix down into a product of multiple smaller matrices, and multiplying them back together gives predicted ratings for the empty cells of the sparse matrix. The basic idea of matrix factorization is that a user's attitudes or preferences can be described by a small number of hidden (latent) factors.<br></div>
An illustration is given below:

<img src="assets/matrix.JPG" width="600" />

<div style="text-align:justify">Intuitively, we can understand the hidden factors of items and users from the illustration above. Say that U is a low-dimensional matrix of user features and V is a low-dimensional matrix of product features. Each value in these matrices represents a different characteristic of a user or a product.

We can get the predicted ratings by calculating the dot product of U and the transpose of V, i.e. $UV^T$: the predicted rating of user $i$ for product $j$ is the dot product of row $i$ of U and row $j$ of V.</div>
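To make the dot product concrete, here is a minimal sketch with k = 2 hypothetical latent factors. The factor values and the "meanings" in the comments are invented for illustration; latent factors learned from real data are not necessarily interpretable.

```python
import numpy as np

# Hypothetical user factors (rows = users, columns = k = 2 hidden features)
U = np.array([
    [0.9, 0.1],   # user 0: mostly cares about feature 0 (say, "hydrating")
    [0.2, 0.8],   # user 1: mostly cares about feature 1 (say, "oil control")
])
# Hypothetical product factors (rows = products, same k = 2 features)
V = np.array([
    [4.0, 1.0],   # product 0: strong on feature 0
    [1.0, 5.0],   # product 1: strong on feature 1
    [3.0, 3.0],   # product 2: a bit of both
])

# Every predicted rating is the dot product of a user row and a product row
predicted = U @ V.T          # shape (n_users, n_products) = (2, 3)
print(predicted)

# The same value for one cell, computed explicitly: user 1, product 0
print(np.dot(U[1], V[0]))
```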

### Singular Value Decomposition (SVD)

<div style="text-align:justify"><b>Singular Value Decomposition</b> is one type of Matrix Factorization. SVD decomposes a matrix R into the product of three matrices; keeping only the largest singular values then gives the best lower-rank approximation of the original matrix R. Mathematically, SVD is given by the formula below:</div>
<br>
<div style="text-align:center">$ R = U \Sigma V^T $</div>
<br>
<div style="text-align:justify">where U and V are orthogonal matrices whose columns are orthonormal eigenvectors of $RR^T$ and $R^TR$ respectively, and $\Sigma$ is the diagonal matrix of singular values (essentially weights). The matrix can be factorized as:</div>

<img src="assets/matrix_.jpg" width="400" />

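By convention, the singular values on the diagonal of $\Sigma$ are sorted from largest to smallest, and the columns of U and V are arranged in the same order, so keeping only the first k of them keeps the strongest hidden factors.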
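These properties can be checked numerically with NumPy's dense `np.linalg.svd` on a small random matrix (the notebook itself uses `scipy.sparse.linalg.svds`, which computes only the top k factors). This sketch on toy data verifies the exact reconstruction $R = U \Sigma V^T$, the orthonormality of U and V, and that truncating to the k largest singular values leaves a Frobenius-norm error equal to the norm of the discarded singular values (the Eckart-Young theorem).

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.integers(0, 6, size=(6, 4)).astype(float)   # toy 6 x 4 "rating" matrix

# Thin SVD: U is 6x4, s holds the 4 singular values (descending), Vt is 4x4
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# 1) The factors reproduce R exactly (up to floating-point error)
assert np.allclose(U @ np.diag(s) @ Vt, R)

# 2) Columns of U and rows of Vt are orthonormal
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(4))

# 3) Keeping the k largest singular values gives the best rank-k approximation;
#    its Frobenius error equals the norm of the discarded singular values
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
err = np.linalg.norm(R - R_k)                 # Frobenius norm for a 2-D array
print(err, np.sqrt(np.sum(s[k:] ** 2)))
```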

### Implementing the Recommender System in Python

#### 1. Create the Pivot Matrix

Create a pivot matrix whose rows are reviewer names, whose columns are product names, and whose values are the ratings given by users. Products a user has not rated are filled with 0.

In [None]:
matrix_pivot = pd.pivot_table(data, 
                              index = 'reviewer',
                              columns = 'product',
                              values = 'rating').fillna(0)
matrix_pivot.head()
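A small sketch on hypothetical data shows two behaviours of `pd.pivot_table` worth knowing here: when a reviewer rated the same product more than once, the default `aggfunc='mean'` averages those ratings into one cell, and `fillna(0)` puts 0 in every unrated cell. Using 0 as a "not rated" marker only works if no real rating is 0, which we assume holds for star ratings starting at 1.

```python
import pandas as pd

# Hypothetical reviews; reviewer "a" reviewed product "p1" twice
toy = pd.DataFrame({
    "reviewer": ["a", "a", "a", "b"],
    "product":  ["p1", "p1", "p2", "p2"],
    "rating":   [5, 3, 4, 2],
})

# Same call pattern as the notebook: duplicates are averaged, gaps become 0
toy_pivot = pd.pivot_table(toy, index="reviewer", columns="product",
                           values="rating").fillna(0)
print(toy_pivot)
```

Reviewer "a" ends up with 4.0 for "p1" (the mean of 5 and 3), and reviewer "b" gets 0 for the product they never rated.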

#### 2. Normalize Rating Values

❓ **Why do we need to normalize the ratings?**

Because people often rate on very different scales. Say that Kiki and Satsuki both use product B: Kiki gives it a 5, while Satsuki, who has high standards, gives it only a 3. In other words, Kiki's 5 means the same as Satsuki's 3. We can make the model more accurate by normalizing each user's ratings, that is, by subtracting the user's mean rating from every value in that user's row of the pivot matrix (the code below computes this mean across all products, zero-filled cells included).

In [None]:
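matrix_pivot_ = matrix_pivot.values                              # users x products NumPy array
user_ratings_mean = np.mean(matrix_pivot_, axis = 1)             # each user's mean over all products, zero-filled cells included
user_rating = matrix_pivot_ - user_ratings_mean.reshape(-1,1)    # subtract each user's mean from their row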

In [None]:
pd.DataFrame(user_rating).head()
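The toy example below shows what the row-mean centering above does, and why it matters that the mean includes the zero-filled cells. Two hypothetical users rated the same two products, one always with 5 and one always with 3. The second part is an alternative (not what this notebook does) that averages only over rated cells and leaves unrated cells at 0.

```python
import numpy as np

# Toy zero-filled matrix: 2 users x 4 products, 0 = not rated
R = np.array([
    [5.0, 5.0, 0.0, 0.0],   # generous rater (like Kiki)
    [3.0, 3.0, 0.0, 0.0],   # strict rater (like Satsuki)
])

# Centering as in the notebook: the mean runs over every column, zeros included
mean_all = R.mean(axis=1)                       # [2.5, 1.5]
centered_all = R - mean_all.reshape(-1, 1)

# Alternative: mean over rated cells only (assumes every user rated something)
rated = R != 0
mean_rated = R.sum(axis=1) / rated.sum(axis=1)  # [5.0, 3.0]
centered_rated = np.where(rated, R - mean_rated.reshape(-1, 1), 0.0)

print(centered_all)
print(centered_rated)
```

With the notebook's centering, the unrated cells become negative, so "not rated" is treated somewhat like a dislike. With the rated-only variant, both users' ratings become 0 after centering, which matches the intuition that Kiki's 5 means the same as Satsuki's 3.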

#### 3. Singular Value Decomposition (SVD)

Decompose the centered rating matrix into U, sigma, and Vt with scipy's `svds`, keeping the k = 50 largest singular values.

In [None]:
# svds (imported in the first cell) returns the k largest singular values and their singular vectors
U, sigma, Vt = svds(user_rating, k = 50)

In [None]:
sigma = np.diag(sigma)  # turn the 1-D array of singular values into a diagonal matrix
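The same decomposition and reconstruction can be sketched end to end on a hypothetical 4 × 5 zero-filled matrix. It shows the shapes `svds` returns, the constraint that k must be smaller than the smaller matrix dimension with the default ARPACK solver, and how adding the user means back produces predicted ratings, mirroring steps 2 to 4.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical toy zero-filled rating matrix: 4 users x 5 products
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 3],
], dtype=float)

# Step 2: center each user's row on its mean
mean = R.mean(axis=1)
R_centered = R - mean.reshape(-1, 1)

# Step 3: with the default ARPACK solver, k must satisfy 1 <= k < min(R.shape)
k = 2
U, sigma, Vt = svds(R_centered, k=k)
print(U.shape, sigma.shape, Vt.shape)        # (4, 2) (2,) (2, 5)

# Step 4: rebuild the rank-k approximation and add the user means back
pred = U @ np.diag(sigma) @ Vt + mean.reshape(-1, 1)
print(np.round(pred, 2))
```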

#### 4. Create Predictive Rating

Multiplying the three factors back together and adding each user's mean rating back gives a predicted rating for every user and product.

In [None]:
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt) + user_ratings_mean.reshape(-1, 1)
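Written out for a single user $u$ and product $i$, the cell above computes the following, where $\bar{r}_u$ is user $u$'s row mean from step 2, $\sigma_f$ is the $f$-th of the $k = 50$ singular values kept by `svds`, and $V^T_{fi}$ is the entry of `Vt` in row $f$ and column $i$:

$$
\hat{r}_{ui} = \bar{r}_u + \sum_{f=1}^{k} U_{uf}\,\sigma_f\,V^T_{fi}
$$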

The matrix below holds the predicted rating of each user for each product.

In [None]:
preds_df = pd.DataFrame(all_user_predicted_ratings, columns = matrix_pivot.columns, index=matrix_pivot.index)
preds_df.head()

#### 5. Create Recommendation

In this final step, we create the recommendations: the function returns the 5 products with the highest predicted ratings that the user hasn't rated yet.

In [None]:
# step 1
preds_df.loc['zzulia'].sort_values(ascending = False)

In [None]:
# step 2
user_data = data[data['reviewer'] == 'zzulia']
user_data

In [None]:
# step 3
def recommend_product(predictions_df, user, data_, num_recommendations):
    # predicted ratings of this user for every product, highest first
    sorted_user_predictions = predictions_df.loc[user].sort_values(ascending=False)

    # the user's historical ratings
    user_full = data_[data_['reviewer'] == user]
    print('User {0} has already rated {1} products'.format(user, user_full.shape[0]))

    # one row per product (keep the last review of each product)
    all_products = data_.drop_duplicates(subset='product', keep='last')

    # exclude products the user has already rated, attach the predictions,
    # sort by predicted rating and keep the top N (dropping the Predictions column)
    recommendations = (
        all_products[~all_products['product'].isin(user_full['product'])]
        .merge(sorted_user_predictions.rename('Predictions').reset_index(),
               how='left', on='product')
        .sort_values('Predictions', ascending=False)
        .iloc[:num_recommendations, :-1]
    )
    return user_full, recommendations
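For comparison, here is a more compact sketch of the same idea: take the user's row of predicted ratings, drop the products they already rated, and keep the n largest. `top_n_unrated` is a hypothetical helper, not part of the original notebook, and it returns only product names with their predicted scores rather than full product rows.

```python
import pandas as pd


def top_n_unrated(preds_df, ratings_df, user, n=5):
    """Top-n products by predicted rating that `user` has not rated yet.

    Hypothetical helper: assumes preds_df is indexed by reviewer with one
    column per product, and ratings_df has 'reviewer' and 'product' columns.
    """
    rated = ratings_df.loc[ratings_df["reviewer"] == user, "product"].unique()
    user_preds = preds_df.loc[user].drop(labels=rated, errors="ignore")
    return user_preds.nlargest(n)


# Hypothetical toy data
toy_ratings = pd.DataFrame({"reviewer": ["u1", "u1", "u2"],
                            "product": ["p1", "p2", "p3"]})
toy_preds = pd.DataFrame([[4.9, 4.5, 3.0, 4.0], [1.0, 2.0, 5.0, 3.5]],
                         index=["u1", "u2"], columns=["p1", "p2", "p3", "p4"])
print(top_n_unrated(toy_preds, toy_ratings, "u1", n=2))
```

On the notebook's data this would be called as `top_n_unrated(preds_df, data, 'zzulia', n=5)`.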

Enter the id of the user you want to recommend products to.

In [None]:
user = input("Enter the user id to whom you want to recommend: ")  # input() already returns a str
already_rated, predictions = recommend_product(preds_df, user, data, 5)

The result below shows that "zzulia" has already rated 3 products: two facial masks (with a different rating for each) and Pembersih Two In One Bengkoang Whitening.

In [None]:
already_rated

<div style="text-align:justify">Below are the 5 products with the highest predicted ratings for user id "zzulia". The recommender suggests that "zzulia" buy Prominent Essence Facial Mask, Facial Mask Bedak Dingin, Oil Control Mask, White Aqua Serum Sheet Mask, and Essential Vitamin. The suggestions are dominated by "Mask" products, likely because 2 of the 3 products "zzulia" has already rated are masks.</div>

In [None]:
prod_pred = predictions['product']
prod_pred

#### 6. Evaluation Criteria

To evaluate the model, compare the predicted ratings with the actual ratings and compute the error.

Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) measure the accuracy of predicted values, such as ratings, against the true values y. They can also be used to evaluate the reconstruction of a ratings matrix.

In [None]:
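print("MSE :", recmetrics.mse(matrix_pivot, preds_df))   # true values first, predictions second
print("RMSE:", recmetrics.rmse(matrix_pivot, preds_df))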
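Because unrated cells in `matrix_pivot` are filled with 0, an error computed over the whole matrix also scores how well the model reproduces those artificial zeros. A minimal sketch of an alternative, assuming 0 always means "not rated", computes RMSE only over the cells that hold a real rating. `observed_rmse` is a hypothetical helper; this is still an in-sample (training) error, and a fairer estimate would require holding out some ratings before fitting the SVD.

```python
import numpy as np


def observed_rmse(actual, predicted):
    """RMSE over cells holding a real rating (actual != 0) in a zero-filled matrix."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = actual != 0                       # True where a real rating exists
    return float(np.sqrt(np.mean((actual[mask] - predicted[mask]) ** 2)))


# Toy check: only the two rated cells count, the zero-filled cells are ignored
actual = np.array([[5.0, 0.0], [0.0, 3.0]])
predicted = np.array([[4.0, 2.5], [1.0, 3.0]])
print(observed_rmse(actual, predicted))      # sqrt(((5-4)**2 + (3-3)**2) / 2)
```

Applied to the notebook, this would be `observed_rmse(matrix_pivot, preds_df)`.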