## Business Understanding

### Problem Statements
Based on the conditions described earlier, the following questions arise:
- How can a recommendation system based on product category similarity help users find other relevant products?
- How can the system provide recommendations based on category information and product popularity, as measured by rating?
- How can the system provide recommendations based on category information and product quality, as measured by the total number of user recommendations each product has received?

### Goals
The main objective of this predictive analytics project is to answer the questions above. The specific goals are:
- Provide relevant product recommendations with Content-Based Filtering, using product category similarity
- Use category information and product ratings to surface similar products that are more relevant to users
- Use category information and each product's total recommendation count to surface similar products that are more relevant to users

### Solution statements
The solution uses Content-Based Filtering with TF-IDF and cosine similarity.

This approach represents each product textually and applies TF-IDF (Term Frequency-Inverse Document Frequency) to measure how important each word in the product category is. Each product is represented as a vector built from the keywords in its category, and the similarity between products is computed with cosine similarity. In this way, the system can recommend products to users based on how similar their category is to that of a product the user has chosen before.
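The approach described above can be sketched on a handful of toy category labels. The labels below are illustrative values modeled on `default_category`, and the variable names (`toy_categories`, `toy_vectors`, `toy_sim`) are assumptions made for this sketch only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy category labels (illustrative, modeled on values in default_category)
toy_categories = ["Hair Mask", "Hair Treatment", "Lip Cream", "Hair Mask"]

# Each label becomes a TF-IDF vector over the words it contains
toy_vectors = TfidfVectorizer().fit_transform(toy_categories)

# Pairwise cosine similarity between all labels
toy_sim = cosine_similarity(toy_vectors)

print(toy_sim.round(3))
```

Identical labels score 1.0, labels that share a word ("Hair Mask" and "Hair Treatment") score somewhere between 0 and 1, and labels with no word in common score 0.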

## Data Understanding

The dataset contains 7,636 rows and 19 columns: brand_name, product_name, product_id, beauty_point_earned, price_range, price_by_combinations, url, active_date, default_category, categories, rating_types_str, average_rating, total_reviews, average_rating_by_types, total_recommended_count, total_repurchase_maybe_count, total_repurchase_no_count, total_repurchase_yes_count, and total_in_wishlist. The variable used as the recommendation parameter in this case is default_category. The data is not yet clean, as indicated by the presence of missing values.

Reference:
Hafizhan Ibrahim. "Sociolla: All Brands Products Catalog". Link: [https://www.kaggle.com/datasets/ibrahimhafizhan/sociolla-all-brands-products-catalog]. Accessed 27 October 2024.

## Data Loading

In [6]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
import seaborn as sns

In [7]:
products = pd.read_csv('/content/products_all_brands.csv')
products

Unnamed: 0,brand_name,product_name,product_id,beauty_point_earned,price_range,price_by_combinations,url,active_date,default_category,categories,rating_types_str,average_rating,total_reviews,average_rating_by_types,total_recommended_count,total_repurchase_maybe_count,total_repurchase_no_count,total_repurchase_yes_count,total_in_wishlist
0,796_3ce,MULTI EYE COLOR PALETTE,97802,130,Rp 555.000 - Rp 687.000,,https://www.sociolla.com/eyeshadow/69460-phan-...,2022-10-03T03:30:06.681Z,Eyeshadow,Makeup; Eyes; Eyeshadow,is_star_long_wear;is_star_packaging;is_star_pi...,4.920000,5,"""star_long_wear"": 5; ""star_packaging"": 4.8; ""s...",5,0,0,5,717
1,796_3ce,VELVET LIP TINT,97810,50,Rp 264.000,,https://www.sociolla.com/lip-cream/69468-son-k...,2022-10-03T03:02:40.340Z,Lip Cream,Makeup; Lips; Lip Cream,is_star_long_wear;is_star_packaging;is_star_pi...,4.576190,42,"""star_long_wear"": 4.309523809523809; ""star_pac...",42,10,2,30,682
2,796_3ce,LIP COLOR,97822,60,Rp 317.000,,https://www.sociolla.com/lip-matte/69480-son-t...,2023-05-30T09:49:15.158Z,Lipstick,Makeup; Lips; Lipstick,is_star_long_wear;is_star_packaging;is_star_pi...,0.000000,0,,0,0,0,0,173
3,796_3ce,MINI MULTI EYE COLOR PALETTE,97833,80,Rp 423.000,,https://www.sociolla.com/eyeshadow/69491-phan-...,2022-10-03T03:27:01.334Z,Eyeshadow,Makeup; Eyes; Eyeshadow,is_star_long_wear;is_star_packaging;is_star_pi...,4.883333,6,"""star_long_wear"": 4.916666666666667; ""star_pac...",12,1,0,11,257
4,796_3ce,FACE BLUSH,97801,60,Rp 300.000,,https://www.sociolla.com/blush/69459-phan-ma-h...,2022-10-03T03:22:26.610Z,Blush,Makeup; Face; Blush,is_star_long_wear;is_star_packaging;is_star_pi...,4.858824,13,"""star_long_wear"": 4.9411764705882355; ""star_pa...",17,2,0,15,387
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7631,273_yves-rocher,Ambre Noir Eau De Toilette,2158,130,Rp 659.000,,https://sociolla.com/fragrance/8088-ambre-noir...,,Eau De Toilette,Shop By Departments; Home; UOBMon,is_star_long_wear;is_star_packaging;is_star_va...,4.476190,7,"""star_long_wear"": 4.285714285714286; ""star_pac...",6,1,2,4,355
7632,273_yves-rocher,Soin Stimulating Conditioner,64174,30,Rp 199.000,,https://vn.sociolla.com/d-u-x/17239-soin-stimu...,,Conditioner,Hair Care; Shampoo; Conditioner,is_star_effectiveness;is_star_packaging;is_sta...,4.638889,9,"""star_effectiveness"": 4.444444444444445; ""star...",7,1,2,6,122
7633,273_yves-rocher,Repair Hair Mask,36357,50,Rp 289.000,,https://www.sociolla.com/hair-mask/14833-repai...,2022-10-27T07:33:19.380Z,Hair Mask,Hair Care; Hair Treatment; Hair Mask,is_star_effectiveness;is_star_packaging;is_sta...,4.750000,5,"""star_effectiveness"": 5; ""star_packaging"": 5; ...",5,1,0,4,58
7634,273_yves-rocher,Hand Cream Olive Petitgrain,36404,20,Rp 109.000,,https://www.sociolla.com/221-hand-and-foot-cre...,,Hand & Foot Cream,Shop By Departments; Home; 12.12 PRICE POINT 50%,is_star_effectiveness;is_star_packaging;is_sta...,4.968750,8,"""star_effectiveness"": 5; ""star_packaging"": 5; ...",8,2,0,6,117


- The data loaded successfully; it consists of 7,636 rows and 19 columns.
- The columns are brand_name, product_name, product_id, beauty_point_earned, price_range, price_by_combinations, url, active_date, default_category, categories, rating_types_str, average_rating, total_reviews, average_rating_by_types, total_recommended_count, total_repurchase_maybe_count, total_repurchase_no_count, total_repurchase_yes_count, and total_in_wishlist.

## Univariate Exploratory Data Analysis

The variables in the Sociolla: All Brands Products Catalog dataset are as follows:
- brand_name : the brand ID and brand name of each product, separated by an underscore
- product_name : product name
- product_id : product ID
- beauty_point_earned : beauty points earned from a purchase
- price_range : general price range of the product
- price_by_combinations : specific price range for the different product variants
- url : URL pointing to the product page on Sociolla.com
- active_date : the date on which each product became active or available on Sociolla.com
- default_category : general product category
- categories : specific product categories, for detailed product classification
- rating_types_str : the aspects of each product that consumers rate in their reviews
- average_rating : average product rating
- total_reviews : total number of consumers who left a review
- average_rating_by_types : average product rating for specific aspects
- total_recommended_count : total number of consumers who recommend the product
- total_repurchase_maybe_count : total number of consumers who might repurchase the product
- total_repurchase_no_count : total number of consumers who would not repurchase the product
- total_repurchase_yes_count : total number of consumers who would repurchase the product
- total_in_wishlist : total number of consumers who added the product to their wishlist

The variables default_category, average_rating, and total_recommended_count will be used in the recommendation model, while brand_name, product_name, and price_range are used to present the resulting output.


In [8]:
products.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7636 entries, 0 to 7635
Data columns (total 19 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   brand_name                    7636 non-null   object 
 1   product_name                  7636 non-null   object 
 2   product_id                    7636 non-null   int64  
 3   beauty_point_earned           7636 non-null   int64  
 4   price_range                   7636 non-null   object 
 5   price_by_combinations         4087 non-null   object 
 6   url                           7636 non-null   object 
 7   active_date                   5534 non-null   object 
 8   default_category              7636 non-null   object 
 9   categories                    7632 non-null   object 
 10  rating_types_str              7561 non-null   object 
 11  average_rating                7636 non-null   float64
 12  total_reviews                 7636 non-null   int64  
 13  ave

In [9]:
print('Jumlah product_id: ', len(products.product_id.unique()))
print('Jumlah brand_name: ', len(products.brand_name.unique()))
print('Jumlah data average_rating: ', len(products.average_rating.unique()))
print('Jumlah data default_category: ', len(products.default_category.unique()))

Jumlah product_id:  7636
Jumlah brand_name:  321
Jumlah data average_rating:  3187
Jumlah data default_category:  195


Using the unique function, we can see that the dataset contains 7,636 distinct product IDs, 321 distinct brand names, 3,187 distinct rating values, and 195 distinct product categories.
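As a small aside, the same counts can be obtained in one call with `DataFrame.nunique()`. The sketch below uses a toy frame (the `toy` and `unique_counts` names and the values are assumptions for illustration), not the real dataset.

```python
import pandas as pd

# Toy frame with the same column names as the dataset (values are illustrative)
toy = pd.DataFrame({
    "product_id": [1, 2, 3, 4],
    "brand_name": ["796_3ce", "796_3ce", "273_yves-rocher", "273_yves-rocher"],
    "default_category": ["Eyeshadow", "Lip Cream", "Hair Mask", "Hair Mask"],
})

# Number of distinct values per column
unique_counts = toy.nunique()
print(unique_counts)
```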

## Data Preparation

#### Checking for Duplicate Data

In [10]:
products.duplicated().sum()

0

The dataset contains no duplicate rows.
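`duplicated()` without arguments compares entire rows. Because the recommendation step later looks products up by name, it may also be worth checking for repeated values in product_name alone. Whether the real dataset contains repeated names is not shown here; the sketch below uses a toy frame with assumed values.

```python
import pandas as pd

# Toy frame: two different product IDs share the same name (illustrative values)
toy = pd.DataFrame({
    "product_id": [101, 102, 103],
    "product_name": ["Repair Hair Mask", "Color Mask", "Repair Hair Mask"],
})

# Rows that are identical in every column
full_row_duplicates = toy.duplicated().sum()

# Rows whose product_name already appeared earlier
name_duplicates = toy.duplicated(subset="product_name").sum()

print(full_row_duplicates, name_duplicates)
```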

#### Handling Missing Values

In [11]:
products.describe()

Unnamed: 0,product_id,beauty_point_earned,average_rating,total_reviews,total_recommended_count,total_repurchase_maybe_count,total_repurchase_no_count,total_repurchase_yes_count,total_in_wishlist
count,7636.0,7636.0,7636.0,7636.0,7636.0,7636.0,7636.0,7636.0,7636.0
mean,82903.974725,37.236773,3.517905,198.197617,186.560503,40.603457,15.479963,141.873494,633.016108
std,27257.17365,46.056962,2.000138,852.067365,803.185318,169.473278,79.716058,625.673542,2055.067002
min,68.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-317.0
25%,75267.0,10.0,3.910714,1.0,1.0,0.0,0.0,0.0,28.0
50%,91856.5,20.0,4.576923,9.0,9.0,2.0,0.0,6.0,132.0
75%,101825.5,50.0,4.738155,76.0,71.0,17.0,6.0,52.0,480.0
max,109155.0,610.0,5.0,21536.0,20804.0,5025.0,2824.0,17653.0,58568.0


In [12]:
products.isnull().sum()

Unnamed: 0,0
brand_name,0
product_name,0
product_id,0
beauty_point_earned,0
price_range,0
price_by_combinations,3549
url,0
active_date,2102
default_category,0
categories,4


Several columns contain many missing values, such as 'price_by_combinations', 'active_date', and 'average_rating_by_types'. These columns will be dropped because they have missing values and contribute little to the recommendation goal.


Other columns that have little influence will also be dropped: 'beauty_point_earned', 'categories', 'rating_types_str', 'total_repurchase_maybe_count', 'total_repurchase_no_count', and 'total_repurchase_yes_count'.

In [13]:
# Drop columns that have little influence on the recommendations or contain NaN values
products.drop(
    columns=[
        'beauty_point_earned', 'price_by_combinations', 'active_date',
        'categories', 'rating_types_str', 'average_rating_by_types',
        'total_repurchase_maybe_count', 'total_repurchase_no_count',
        'total_repurchase_yes_count',
    ],
    inplace=True,
)
products

Unnamed: 0,brand_name,product_name,product_id,price_range,url,default_category,average_rating,total_reviews,total_recommended_count,total_in_wishlist
0,796_3ce,MULTI EYE COLOR PALETTE,97802,Rp 555.000 - Rp 687.000,https://www.sociolla.com/eyeshadow/69460-phan-...,Eyeshadow,4.920000,5,5,717
1,796_3ce,VELVET LIP TINT,97810,Rp 264.000,https://www.sociolla.com/lip-cream/69468-son-k...,Lip Cream,4.576190,42,42,682
2,796_3ce,LIP COLOR,97822,Rp 317.000,https://www.sociolla.com/lip-matte/69480-son-t...,Lipstick,0.000000,0,0,173
3,796_3ce,MINI MULTI EYE COLOR PALETTE,97833,Rp 423.000,https://www.sociolla.com/eyeshadow/69491-phan-...,Eyeshadow,4.883333,6,12,257
4,796_3ce,FACE BLUSH,97801,Rp 300.000,https://www.sociolla.com/blush/69459-phan-ma-h...,Blush,4.858824,13,17,387
...,...,...,...,...,...,...,...,...,...,...
7631,273_yves-rocher,Ambre Noir Eau De Toilette,2158,Rp 659.000,https://sociolla.com/fragrance/8088-ambre-noir...,Eau De Toilette,4.476190,7,6,355
7632,273_yves-rocher,Soin Stimulating Conditioner,64174,Rp 199.000,https://vn.sociolla.com/d-u-x/17239-soin-stimu...,Conditioner,4.638889,9,7,122
7633,273_yves-rocher,Repair Hair Mask,36357,Rp 289.000,https://www.sociolla.com/hair-mask/14833-repai...,Hair Mask,4.750000,5,5,58
7634,273_yves-rocher,Hand Cream Olive Petitgrain,36404,Rp 109.000,https://www.sociolla.com/221-hand-and-foot-cre...,Hand & Foot Cream,4.968750,8,8,117


As shown above, the columns with little influence have been removed.

In [14]:
products.isnull().sum()

Unnamed: 0,0
brand_name,0
product_name,0
product_id,0
price_range,0
url,0
default_category,0
average_rating,0
total_reviews,0
total_recommended_count,0
total_in_wishlist,0


After checking again, no missing values remain.
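The missing-value handling in this section can be summarized on a toy frame: measure the share of missing values per column, then drop the affected columns. The frame and the names `missing_share`, `cols_to_drop`, and `cleaned` are assumptions for this sketch; the notebook itself selects the columns by hand.

```python
import pandas as pd

# Toy frame with missing values in two columns (illustrative values)
toy = pd.DataFrame({
    "product_name": ["A", "B", "C", "D"],
    "active_date": ["2022-10-03", None, None, "2022-10-27"],
    "categories": ["Makeup; Lips", "Makeup; Eyes", None, "Hair Care"],
})

# Fraction of missing values per column
missing_share = toy.isnull().mean()

# Columns that contain at least one missing value
cols_to_drop = missing_share[missing_share > 0].index.tolist()

# Drop them and confirm nothing is missing afterwards
cleaned = toy.drop(columns=cols_to_drop)
print(cleaned.isnull().sum())
```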

#### Splitting the brand_name Column Values

The 'brand_name' column combines two values, the brand ID and the brand name, so it is separated with the split() function.

In [15]:
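# Split on the first underscore only: the part before it is the brand ID, the rest is the brand name
products[['brand_name_id', 'brand_name']] = products['brand_name'].str.split('_', n=1, expand=True)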
products

Unnamed: 0,brand_name,product_name,product_id,price_range,url,default_category,average_rating,total_reviews,total_recommended_count,total_in_wishlist,brand_name_id
0,3ce,MULTI EYE COLOR PALETTE,97802,Rp 555.000 - Rp 687.000,https://www.sociolla.com/eyeshadow/69460-phan-...,Eyeshadow,4.920000,5,5,717,796
1,3ce,VELVET LIP TINT,97810,Rp 264.000,https://www.sociolla.com/lip-cream/69468-son-k...,Lip Cream,4.576190,42,42,682,796
2,3ce,LIP COLOR,97822,Rp 317.000,https://www.sociolla.com/lip-matte/69480-son-t...,Lipstick,0.000000,0,0,173,796
3,3ce,MINI MULTI EYE COLOR PALETTE,97833,Rp 423.000,https://www.sociolla.com/eyeshadow/69491-phan-...,Eyeshadow,4.883333,6,12,257,796
4,3ce,FACE BLUSH,97801,Rp 300.000,https://www.sociolla.com/blush/69459-phan-ma-h...,Blush,4.858824,13,17,387,796
...,...,...,...,...,...,...,...,...,...,...,...
7631,yves-rocher,Ambre Noir Eau De Toilette,2158,Rp 659.000,https://sociolla.com/fragrance/8088-ambre-noir...,Eau De Toilette,4.476190,7,6,355,273
7632,yves-rocher,Soin Stimulating Conditioner,64174,Rp 199.000,https://vn.sociolla.com/d-u-x/17239-soin-stimu...,Conditioner,4.638889,9,7,122,273
7633,yves-rocher,Repair Hair Mask,36357,Rp 289.000,https://www.sociolla.com/hair-mask/14833-repai...,Hair Mask,4.750000,5,5,58,273
7634,yves-rocher,Hand Cream Olive Petitgrain,36404,Rp 109.000,https://www.sociolla.com/221-hand-and-foot-cre...,Hand & Foot Cream,4.968750,8,8,117,273


As a result, the values have been separated and the 'brand_name_id' column has been created.
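A minimal sketch of the split on toy values. Passing `n=1` limits the split to the first underscore, so a brand name that itself contains an underscore stays intact; the value `12_some_brand` is a hypothetical example and does not come from the dataset.

```python
import pandas as pd

# Toy values in the "<id>_<name>" format; the last one is hypothetical
toy = pd.DataFrame({"brand_name": ["796_3ce", "273_yves-rocher", "12_some_brand"]})

# Split once, at the first underscore, into two columns (0 = id, 1 = name)
parts = toy["brand_name"].str.split("_", n=1, expand=True)
toy["brand_name_id"] = parts[0]
toy["brand_name"] = parts[1]

print(toy)
```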

#### TF-IDF Vectorizer

TF-IDF is used in the recommendation system to find a representation of the important features of each product category.
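For reference, a sketch of the weighting that scikit-learn's TfidfVectorizer applies with its default settings (raw term counts, smoothed IDF, L2 row normalization). Here $n$ is the number of documents (one category text per product), $\mathrm{tf}(t, d)$ is the count of term $t$ in category text $d$, and $\mathrm{df}(t)$ is the number of category texts that contain $t$:

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1, \qquad
w(t, d) = \frac{\mathrm{tf}(t, d)\,\mathrm{idf}(t)}{\sqrt{\sum_{t'} \bigl(\mathrm{tf}(t', d)\,\mathrm{idf}(t')\bigr)^2}}
$$

A category made of a single word therefore gets a weight of 1.0 for that word, while a multi-word category spreads its unit length over several terms.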

In [16]:
products

Unnamed: 0,brand_name,product_name,product_id,price_range,url,default_category,average_rating,total_reviews,total_recommended_count,total_in_wishlist,brand_name_id
0,3ce,MULTI EYE COLOR PALETTE,97802,Rp 555.000 - Rp 687.000,https://www.sociolla.com/eyeshadow/69460-phan-...,Eyeshadow,4.920000,5,5,717,796
1,3ce,VELVET LIP TINT,97810,Rp 264.000,https://www.sociolla.com/lip-cream/69468-son-k...,Lip Cream,4.576190,42,42,682,796
2,3ce,LIP COLOR,97822,Rp 317.000,https://www.sociolla.com/lip-matte/69480-son-t...,Lipstick,0.000000,0,0,173,796
3,3ce,MINI MULTI EYE COLOR PALETTE,97833,Rp 423.000,https://www.sociolla.com/eyeshadow/69491-phan-...,Eyeshadow,4.883333,6,12,257,796
4,3ce,FACE BLUSH,97801,Rp 300.000,https://www.sociolla.com/blush/69459-phan-ma-h...,Blush,4.858824,13,17,387,796
...,...,...,...,...,...,...,...,...,...,...,...
7631,yves-rocher,Ambre Noir Eau De Toilette,2158,Rp 659.000,https://sociolla.com/fragrance/8088-ambre-noir...,Eau De Toilette,4.476190,7,6,355,273
7632,yves-rocher,Soin Stimulating Conditioner,64174,Rp 199.000,https://vn.sociolla.com/d-u-x/17239-soin-stimu...,Conditioner,4.638889,9,7,122,273
7633,yves-rocher,Repair Hair Mask,36357,Rp 289.000,https://www.sociolla.com/hair-mask/14833-repai...,Hair Mask,4.750000,5,5,58,273
7634,yves-rocher,Hand Cream Olive Petitgrain,36404,Rp 109.000,https://www.sociolla.com/221-hand-and-foot-cre...,Hand & Foot Cream,4.968750,8,8,117,273


In [17]:
from sklearn.feature_extraction.text import TfidfVectorizer

tf = TfidfVectorizer()
tf.fit(products['default_category'])
tf.get_feature_names_out()

array(['2in1', 'accessories', 'acne', 'after', 'ampoule', 'and',
       'applicators', 'aromatherapy', 'arts', 'baby', 'bags', 'balm',
       'bar', 'bath', 'bb', 'beauty', 'blotting', 'blush', 'body',
       'booster', 'bottles', 'breast', 'bronzer', 'brow', 'brush',
       'brushes', 'bug', 'bundles', 'butter', 'cake', 'cap', 'card',
       'care', 'case', 'cc', 'cellulite', 'clay', 'cleaner', 'cleanser',
       'cleansing', 'clipper', 'clippers', 'clothing', 'color', 'combs',
       'concealer', 'conditioner', 'contour', 'cotton', 'crayon', 'cream',
       'curler', 'curling', 'cushion', 'de', 'deodorant', 'diaper', 'dry',
       'dryers', 'eau', 'essence', 'exclusive', 'exfoliants',
       'exfoliator', 'eye', 'eyebrows', 'eyelash', 'eyelashes',
       'eyeliner', 'eyeshadow', 'fabric', 'face', 'facial', 'false',
       'family', 'feminine', 'files', 'foam', 'foot', 'for', 'foundation',
       'fragrance', 'gel', 'gift', 'gloss', 'glue', 'hair', 'hand',
       'head', 'highlighter'

In [18]:
tfidf_matrix = tf.fit_transform(products['default_category'])
tfidf_matrix.shape

(7636, 210)

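The matrix has shape (7636, 210). The value 7636 is the number of products, and 210 is the number of distinct terms (the TF-IDF vocabulary) extracted from the product categories. Note that this counts words, not categories: there are 195 categories, but a category such as Hand & Foot Cream contributes several terms.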

In [19]:
tfidf_matrix.todense()

matrix([[0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        ...,
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.],
        [0., 0., 0., ..., 0., 0., 0.]])

Next, the TF-IDF matrix is displayed as a DataFrame with product names as rows and category terms as columns.

In [20]:
pd.DataFrame(
    tfidf_matrix.todense(),
    columns=tf.get_feature_names_out(),
    index=products['product_name'].values
).sample(20, axis=1).sample(20, axis=0)

Unnamed: 0,exfoliator,brow,shaving,matte,paper,treatment,tools,toe,lotion,face,eyeliner,clippers,foundation,hand,makeup,gift,after,patch,touch,bath
Gillette Foamy Shave Cream Menthol,0.0,0.0,0.808244,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8P Sponge Wedges Nude SBR,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
New Softbrow Pensil,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Duo Hypergloss Free Keychain + Mini SBWC,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Sleek eyebrow set,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Royal Lip Mousse,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Apply & Reapply Sunscreen as Easy as 1-2-3!,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
AlpHA Deodorant Brightening Underarm Care,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Glozee - Angelic Grey,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Body Wash PATCHOULI,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


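The sample TF-IDF matrix output shows that Restaurant Shell Glow Highlighter (Like Shell Collection) belongs to the highlighter category, as indicated by the value 1.0 in the highlighter column. Likewise, Aloe Hydramild Facial Wash is associated with the term face, with a value of 0.589674. Because sample() draws rows and columns at random, the products displayed differ between runs; in the output shown here, for example, Gillette Foamy Shave Cream Menthol has a weight of 0.808244 for the term shaving.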

## Model Development with Content-Based Filtering Based on Product Category

Content-based filtering recommends items that are similar to items the user liked in the past. In this stage, the important features of each product category are represented with the TF-IDF vectorizer, and the degree of similarity is computed with cosine similarity.

#### Cosine Similarity

Now the degree of similarity between products is computed with cosine similarity.

In [21]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(tfidf_matrix)
cosine_sim

array([[1.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.        , 0.        , ..., 0.        , 0.24144692,
        0.        ],
       [0.        , 0.        , 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 0.        ,
        1.        ],
       [0.        , 0.24144692, 0.        , ..., 0.        , 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 1.        , 0.        ,
        1.        ]])

Cosine similarity is computed on the tfidf_matrix obtained in the previous step. A single call to the cosine_similarity function from the sklearn library produces the similarity between every pair of products, returned as a similarity matrix in array form.
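For two product vectors $\mathbf{a}$ and $\mathbf{b}$, the quantity computed for each cell of the matrix is:

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
$$

Since TF-IDF weights are non-negative, the scores lie between 0 and 1: a value of 0 means the two categories share no term, and a value of 1 means the two vectors point in the same direction, which is why every product scores 1 against itself on the diagonal.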

In [22]:
cosine_sim_df = pd.DataFrame(cosine_sim, index=products['product_name'], columns=products['product_name'])
print('Shape:', cosine_sim_df.shape)

cosine_sim_df.sample(20, axis=1).sample(20, axis=0)

Shape: (7636, 7636)


product_name,"Your Skin Bae Serum Lactic Acid 10% + Niacinamide 2,5%",Extra Bright Vitamin Lotion C&E 180mL Twinpack,Hyaluronic Acid Moisturizing Mask,Age Miracle Day Cream,Bright Beauty Perfect Potion Essence,Power Bright Serum Body Wash 250ml,Lavie Lash by Marlene Hariman - No Drama Petite,Seamless Liquid Foundation,CICA Acne Fighter Starter Kit,pH-Balanced Facial Cleanser,Aha Body Booster Bright Serum,Power Pair 5 in 1 Set,Lador Perfumed Hair Oil Osmanthus,Flirt - Faux Mink,Powerskin Liquid Glow Moisturizer,Aloe Vera Botanical Gel,Deodorant Invisible All In One Anti Bakteri Roll On Pack Of Two,Royal Lip Mousse,Glow Better Instant Potion: Glowing sat set & ready to go!,7Days Plus Mask - Aloe
Sekkisei Clear Treatment Essence,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Set Of 5 Soothing Serum Facial Mask,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
Pond's Men Acne Solution Facial Foam,0.364334,0.278937,0.0,0.278937,0.0,0.585097,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.278937,0.282273,0.0,0.0,0.0,0.0
Texture Experience Shampoo + Conditioner Mint Sorbet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.603187,0.0,0.0,0.210464,0.40149,0.0,0.0,0.0,0.0,0.0,0.603187,0.0
Dancoly texture paste (medium hold) - for men,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.339256,0.0,0.0,0.0,0.0
2 Step brightening,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.217603,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
MC2 Advance Soothing Essence Toner,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Joylab Package - Wonderskin Series,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Bonavie Package Poison of Maison du Safran,0.0,0.0,0.0,0.0,0.0,0.312459,0.0,0.0,0.229024,0.0,0.370667,0.181881,0.0,0.0,0.0,0.0,0.0,0.0,0.229024,0.0
Texture Experience Creambath Strawberry Yoghurt Softening & Shine-Boosting Treatment,0.0,0.0,0.421551,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.474024,0.0,0.0,0.0,0.0,0.0,0.0,0.421551


With cosine similarity, the similarity between every pair of products has been identified. The shape (7636, 7636) is the size of the similarity matrix for the data. The output, however, displays only a sample of 20 rows and 20 columns.


Example: the value 1.0 for Skin Buddy Dot Burst Face Wash and Bio Renew Deep Cleanser indicates that these two products have identical category representations. The same holds for Oh! So Bright Serum and Skin'o'tic Serum, which also score 1.0. The sampled rows and columns change on every run; in the output shown here, Sekkisei Clear Treatment Essence and Bright Beauty Perfect Potion Essence score 1.0.

#### Getting Recommendations

The recommendation system returns products that are similar to the product entered by the user, based on category similarity. The similarity scores for every product have already been obtained in the previous computation.

In [23]:
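def products_recommendations(nama_produk, similarity_data=cosine_sim_df,
                             items=products[['product_name', 'brand_name', 'price_range']], k=10):
    # Partition the similarity column so that the k highest scores sit at the end, in sorted order
    index = similarity_data.loc[:,nama_produk].to_numpy().argpartition(
        range(-1, -k-1, -1))

    # Read the last k+1 positions backwards, so the most similar product names come first
    closest = similarity_data.columns[index[-1:-(k+2):-1]]
    # Remove the queried product itself from the candidates
    closest = closest.drop(nama_produk, errors='ignore')
    # Attach brand and price information to the recommended product names
    recommendations = pd.DataFrame(closest, columns=['product_name']).merge(items, on='product_name')

    return recommendations.head(k)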

The function products_recommendations takes nama_produk (the product name to search for), similarity_data (the similarity scores, taken from cosine_sim_df by default), items (the DataFrame columns to display), and k (the number of recommendations wanted, 10 by default).

It first builds index, which holds the positions of the most similar products. The argpartition function is applied to the similarity column of the queried product; with the parameter range(-1, -k-1, -1), the k highest scores are placed at the end of the array in sorted order.

The most similar product names are stored in closest, read from the highest score downward, and the queried product itself is then removed from the list. Finally, the recommendations variable is built by turning closest into a DataFrame with a product_name column and merging it with items, and the first k rows are returned.
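The argpartition step can be seen in isolation on a small array. The scores below are made up for illustration, and `order` and `top_k` are names assumed for this sketch.

```python
import numpy as np

# Made-up similarity scores for six products
scores = np.array([0.2, 1.0, 0.0, 0.7, 0.4, 0.9])
k = 3

# Passing several positions puts each of them into its sorted place:
# here the last k positions end up holding the k highest scores, ascending
order = scores.argpartition(range(-1, -k - 1, -1))

# Read the last k positions backwards: highest score first
top_k = order[-1:-(k + 1):-1]

print(top_k)          # indices of the k most similar items
print(scores[top_k])  # their scores, from highest to lowest
```

One detail to be aware of: the function reads k+1 positions, because one of them is expected to be the queried product itself. Only the last k positions are guaranteed to be in sorted order, so the extra position holds some score no larger than the top k, not necessarily the next-highest one. When many products tie at the same score, as happens with identical categories, the order among them is arbitrary in any case.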

In [24]:
products['product_name'].unique()

array(['MULTI EYE COLOR PALETTE', 'VELVET LIP TINT', 'LIP COLOR', ...,
       'Repair Hair Mask', 'Hand Cream Olive Petitgrain', 'Color Mask'],
      dtype=object)

In [25]:
products[products.product_name.eq('Repair Hair Mask')]

Unnamed: 0,brand_name,product_name,product_id,price_range,url,default_category,average_rating,total_reviews,total_recommended_count,total_in_wishlist,brand_name_id
7633,yves-rocher,Repair Hair Mask,36357,Rp 289.000,https://www.sociolla.com/hair-mask/14833-repai...,Hair Mask,4.75,5,5,58,273


The recommendation output is expected to contain products similar to Repair Hair Mask, that is, products whose category is similar to Hair Mask.

In [26]:
products_recommendations('Repair Hair Mask')

Unnamed: 0,product_name,brand_name,price_range
0,Color Mask,yves-rocher,Rp 289.000
1,Extra Fresh & Hydrate Treatment,moist-diane,Rp 136.000
2,Moist & Shine Treatment,moist-diane,Rp 136.000
3,Bonheur Grasse Rose Treatment,moist-diane,Rp 190.000
4,Extra Damage Repair Hair Mask,moist-diane,Rp 119.000
5,Extra Smooth & Straight Treatment,moist-diane,Rp 136.000
6,Extra Smooth & Straight Hair Mask,moist-diane,Rp 119.000
7,Texture Experience Creambath Vanilla Milk Inte...,makarizo-professional,Rp 288.600
8,Extra Volume & Scalp Treatment,moist-diane,Rp 136.000
9,Bonheur Blue Jasmine Treatment Damage Repair &...,moist-diane,Rp 190.000


The output shows 10 recommended products that are similar to Repair Hair Mask, together with their price range and brand.

Testing with other product names:

In [27]:
products_recommendations('LIP COLOR')

Unnamed: 0,product_name,brand_name,price_range
0,Lip Velvet Hydrating Balm,raine-beauty,Rp 154.000
1,Color Hypnose Creamy Lipmatte,make-over,Rp 89.000
2,PLUMP HIGH SHINE LIP GLOW,focallure,Rp 32.000
3,Sugar Rush Lipstick,emina,Rp 39.500
4,Better Lips-Talk Velvet,etude,Rp 188.000
5,PURE MATTE LIPSTICK,focallure,Rp 42.000
6,Lip Bullet,blp-beauty,Rp 129.000
7,The Grail Vivid Matte Lipstick,mineral-botanica,Rp 43.900
8,Everyday Velvet Rouge Lipstick,buttonscarves,Rp 295.000
9,Stick With Me Velvet Matte Lipstick,nama-beauty,Rp 119.000


In [28]:
products_recommendations('Centella Mask Pack')

Unnamed: 0,product_name,brand_name,price_range
0,Missha Double Green Tea Pack,sociolla,Rp 148.800
1,MEDIHEAL Teatree Nude Gel Mask,mediheal,Rp 49.900
2,Teatree Essential Mask,mediheal,Rp 29.900
3,MEDIHEAL Collagen Nude Gel Mask,mediheal,Rp 49.900
4,Lemonlime Vita Ade Mask,mediheal,Rp 21.500
5,Gentle Natural Booster Set of 5,mediheal,Rp 149.500
6,THE N.M.F Ampoule Mask,mediheal,Rp 29.900
7,BUY 2 GET 1 MEDIHEAL TEATREE ESSENTIAL MASK,mediheal,Rp 89.700
8,BUY 2 GET 1 MEDIHEAL THE N.M.F AMPOULE MASK,mediheal,Rp 89.700
9,Collagen Essential Mask,mediheal,Rp 29.900


## Model Development with Content-Based Filtering Based on Product Rating

Content-based filtering recommends items that are similar to items the user liked in the past. In this stage, the important features of each product category are represented with the TF-IDF vectorizer, and the degree of similarity is computed from the category features together with the rating, using cosine similarity.

### Data Preparation

##### TF-IDF Vectorizer

TF-IDF is used in the recommendation system to find a representation of the important features of each product category.

In [29]:
from sklearn.feature_extraction.text import TfidfVectorizer

tf = TfidfVectorizer()
tf.fit(products['default_category'])
tf.get_feature_names_out()

array(['2in1', 'accessories', 'acne', 'after', 'ampoule', 'and',
       'applicators', 'aromatherapy', 'arts', 'baby', 'bags', 'balm',
       'bar', 'bath', 'bb', 'beauty', 'blotting', 'blush', 'body',
       'booster', 'bottles', 'breast', 'bronzer', 'brow', 'brush',
       'brushes', 'bug', 'bundles', 'butter', 'cake', 'cap', 'card',
       'care', 'case', 'cc', 'cellulite', 'clay', 'cleaner', 'cleanser',
       'cleansing', 'clipper', 'clippers', 'clothing', 'color', 'combs',
       'concealer', 'conditioner', 'contour', 'cotton', 'crayon', 'cream',
       'curler', 'curling', 'cushion', 'de', 'deodorant', 'diaper', 'dry',
       'dryers', 'eau', 'essence', 'exclusive', 'exfoliants',
       'exfoliator', 'eye', 'eyebrows', 'eyelash', 'eyelashes',
       'eyeliner', 'eyeshadow', 'fabric', 'face', 'facial', 'false',
       'family', 'feminine', 'files', 'foam', 'foot', 'for', 'foundation',
       'fragrance', 'gel', 'gift', 'gloss', 'glue', 'hair', 'hand',
       'head', 'highlighter'

In [30]:
tfidf_matrix = tf.fit_transform(products['default_category'])
tfidf_matrix.shape

(7636, 210)

### Modeling

##### Cosine Similarity

Now the degree of similarity between products is computed from their category terms together with their ratings, using cosine similarity.

In [31]:
total_rating = products[['average_rating']].values
combined_features = np.hstack([total_rating, tfidf_matrix.toarray()])
cosine_sim_rating = cosine_similarity(combined_features)

In this step, the rating values are first stored in the variable total_rating and then stacked with the dense array form of the tfidf_matrix obtained earlier. Cosine similarity is then computed on the combined features.
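One property of this feature stacking is worth keeping in mind: average_rating ranges from 0 to 5, while each TF-IDF row has unit length, so the rating column can outweigh the category terms in the cosine score. The sketch below illustrates this on toy values and compares it with a hypothetical alternative that rescales the rating to the 0 to 1 range. The rescaling is an assumption made for illustration, not part of the notebook.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy products: 0 and 2 share a category, 0 and 1 share only a similar rating
toy_categories = ["Hair Mask", "Lip Cream", "Hair Mask"]
toy_ratings = np.array([[4.8], [4.7], [3.0]])

tfidf = TfidfVectorizer().fit_transform(toy_categories).toarray()

# Raw rating stacked next to unit-length TF-IDF rows, as in the cell above
raw = np.hstack([toy_ratings, tfidf])
sim_raw = cosine_similarity(raw)

# Hypothetical alternative: rescale the rating to [0, 1] (maximum rating is 5)
scaled = np.hstack([toy_ratings / 5.0, tfidf])
sim_scaled = cosine_similarity(scaled)

print("raw    :", sim_raw[0, 1].round(3), sim_raw[0, 2].round(3))
print("scaled :", sim_scaled[0, 1].round(3), sim_scaled[0, 2].round(3))
```

With the raw rating, two products from unrelated categories still score above 0.9 as long as both have a non-zero rating. After rescaling, the product from the same category scores clearly higher than the one that merely has a similar rating.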

In [32]:
cosine_sim_df_2 = pd.DataFrame(cosine_sim_rating, index=products['product_name'], columns=products['product_name'])
print('Shape:', cosine_sim_df_2.shape)

cosine_sim_df_2.sample(20, axis=1).sample(20, axis=0)

Shape: (7636, 7636)


product_name,Lip & Eye Make Up Remover,Safi White Natural Brightening Cream Mangosteen,Lemonlime Vita Ade Mask,Herb Tonic - Hair & Scalp Treatment,Cica Beat The Sun Powder,Meta-Glow Double shot! C Day A Night Deep Brightening korea Double Serum + Bloomatte True Beauty Inside Cushion,Vita-C Rapid Dark Circle Corrector,Nourishing Bath Routine - Fresh Lime & Coconut (Sukin Body Wash + Lotion),Mini Disney Mickey - Holiday Couple,Lavie Lash by Ryan Ogilvy - Sunset,lilybyred Mood Keyboard,Lip Scrub Strawberry,Rhodiola Peptide Acne Serum,Pure Lavender Shampoo,Magic Eyeliner Perfector,I AM CAPRICORN BODY LOTION,Daily Hydrating Vitamin E & Avocado Body Lotion,SASC X Lula Lahfah Glow or Never Ultra Fine Face Mist,BLUR WATER TINT,Keratin Pro Daily Shampoo 1000ml
Kimchi Coolagen Cream Moisturizer,0.956198,0.999635,0.957434,0.957206,0.95773,0.0,0.971768,0.052667,0.958912,0.96012,0.0,0.955211,0.971768,0.95518,0.956934,0.969243,0.968981,0.957068,0.957956,0.956395
The Complexion Kit,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
I AM CANCER BODY LOTION,0.95332,0.963015,0.954552,0.954325,0.954847,0.0,0.970586,0.21693,0.956026,0.95723,0.0,0.952336,0.970586,0.952306,0.954054,0.999916,0.999938,0.943226,0.955073,0.953517
Darling Lash out love + Curler,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Deep Clean Hydrating Foaming Cleanser,0.954735,0.964985,0.955969,0.955742,0.956265,0.0,0.973678,0.0,0.957445,0.958651,0.0,0.95375,0.973678,0.953719,0.95547,0.957017,0.956666,0.95881,0.95649,0.954932
Carecoal - SUNDAE Shower Cream,0.954518,0.9513,0.955752,0.955525,0.956048,0.0,0.958433,0.119126,0.957227,0.958433,0.0,0.953533,0.958433,0.953503,0.955253,0.981137,0.98099,0.944412,0.956273,0.954715
Moongazing All-In-one Face Palette,0.957611,0.961852,0.958849,0.958621,0.959145,0.0,0.969872,0.0,0.960329,0.961538,0.0,0.956622,0.969872,0.956592,0.958348,0.9599,0.959547,0.955338,0.959371,0.957808
Total Effects 7in1 Day Cream Gentle SPF 15,0.952246,0.999968,0.953477,0.95325,0.953772,0.0,0.968866,0.057488,0.954949,0.956152,0.0,0.951263,0.968866,0.951233,0.952979,0.966267,0.966015,0.954165,0.953997,0.952443
Matte Coat + 2-in-1 Base & Top Coat • Duo Set,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Cranberry Juice Face Mask,0.951481,0.948272,0.966984,0.952484,0.953005,0.0,0.955383,0.0,0.954181,0.955383,0.0,0.950499,0.955383,0.950468,0.952213,0.953755,0.953404,0.941406,0.95323,0.951677


Cosine similarity yields a similarity score for every pair of products. The shape (7636, 7636) is the size of the resulting similarity matrix, with one row and one column per product. The output above shows only a random sample of those rows and columns.
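
As a rough check on what a dense (7636, 7636) matrix costs in memory, the arithmetic can be sketched as follows. It assumes the scores are stored as 64-bit floats, which is NumPy's default float type; the variable names are illustrative.

```python
import numpy as np

n_products = 7636  # number of rows reported by the shape above

n_entries = n_products * n_products               # one score per product pair
bytes_per_entry = np.dtype(np.float64).itemsize   # 8 bytes per float64
total_bytes = n_entries * bytes_per_entry

print(f"{n_entries:,} entries, about {total_bytes / 1024**2:.0f} MiB")
```

Under that assumption the matrix holds 58,308,496 scores and takes about 445 MiB, which is worth keeping in mind if the catalogue grows.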


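Example: a score of 0.961538 between Papaya and Clear Line N' Lash means that the combined feature vectors of the two products, category terms plus rating, point in nearly the same direction. Because sample draws rows and columns at random, this pair does not appear in the excerpt above.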

##### Getting Recommendations

The recommender returns the products most similar to the product supplied by the user, using the category-and-rating similarity scores computed in the previous step, and lists them from the highest rating to the lowest.

In [33]:
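def products_recommendations_by_rating(
    nama_produk,
    similarity_data=cosine_sim_df_2,
    items=products[['product_name', 'brand_name', 'price_range', 'average_rating']],
    k=10,
):
    # Move the k highest similarity scores for nama_produk to the end of the index array
    index = similarity_data.loc[:, nama_produk].to_numpy().argpartition(
        range(-1, -k-1, -1))

    # Take k+1 candidates, most similar first, so that k remain after dropping the query.
    # Note: only the last k positions are guaranteed to be sorted by argpartition.
    closest = similarity_data.columns[index[-1:-(k+2):-1]]
    closest = closest.drop(nama_produk, errors='ignore')

    # Attach product details, then order the recommendations by rating, highest first
    recommendations = pd.DataFrame(closest, columns=['product_name']).merge(items, on='product_name')
    recommendations = recommendations.sort_values(by='average_rating', ascending=False)

    return recommendations.head(k)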

The function products_recommendations_by_rating takes nama_produk as the product to search for, similarity_data as the similarity scores (cosine_sim_df_2 by default), items as the columns to display, and k as the number of recommendations, 10 by default.

It first builds index, the positions of the products most similar to nama_produk. The argpartition call with range(-1, -k-1, -1) moves the k highest similarity scores to the end of the array in ascending order, so reading the tail backwards gives them from most to least similar without fully sorting the rest.

The names of these products are stored in closest, and the input product itself is removed from the list. Finally, recommendations is built by merging closest with items on product_name, sorted by rating from highest to lowest, and the first k rows are returned.
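
The `argpartition` step can be seen in isolation with a small sketch. The `scores` array below is made up and stands in for one column of the similarity matrix.

```python
import numpy as np

# Hypothetical similarity scores of one query product against six products.
scores = np.array([0.20, 0.95, 0.10, 1.00, 0.60, 0.80])
k = 3

# Passing several kth positions puts each of them in its final sorted place,
# so the last k slots hold the k largest scores in ascending order.
order = scores.argpartition(range(-1, -k - 1, -1))

# Walk the tail backwards to read the top k from most to least similar.
top_k = order[-1:-(k + 1):-1]
print(top_k, scores[top_k])
```

Here `top_k` holds the positions 3, 1 and 5, that is, the scores 1.00, 0.95 and 0.80. The function above slices one element further, up to `-(k+2)`, to leave room for removing the query product itself.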

In [34]:
products['product_name'].unique()

array(['MULTI EYE COLOR PALETTE', 'VELVET LIP TINT', 'LIP COLOR', ...,
       'Repair Hair Mask', 'Hand Cream Olive Petitgrain', 'Color Mask'],
      dtype=object)

In [35]:
products[products.product_name.eq('Repair Hair Mask')]

| | brand_name | product_name | product_id | price_range | url | default_category | average_rating | total_reviews | total_recommended_count | total_in_wishlist | brand_name_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7633 | yves-rocher | Repair Hair Mask | 36357 | Rp 289.000 | https://www.sociolla.com/hair-mask/14833-repai... | Hair Mask | 4.75 | 5 | 5 | 58 | 273 |


The recommendations are expected to be products similar to Repair Hair Mask, that is, products in the Hair Mask category or a closely related one.

In [36]:
products_recommendations_by_rating('Repair Hair Mask')

| | product_name | brand_name | price_range | average_rating |
|---|---|---|---|---|
| 8 | Bonheur Grasse Rose Treatment | moist-diane | Rp 190.000 | 4.784091 |
| 7 | Roughness Eraser Shower Scrub flora's Secret | lavojoy | Rp 99.000 | 4.783898 |
| 2 | Tresemme Keratin Deep Smoothening Hair Mask Pe... | tresemme | Rp 87.000 | 4.75 |
| 0 | MISSDAISY Perfume Hair Mask - Blackcurrant & V... | miss-daisy | Rp 366.000 | 4.75 |
| 1 | Balancing & Soothing Scalp Pack | rated-green | Rp 118.000 | 4.75 |
| 3 | Hair Energy Fibertherapy Hair & Scalp Creambat... | makarizo | Rp 22.154 - Rp 124.100 | 4.74537 |
| 4 | Lador Tea Tree Scalp Clinic Hair Pack | lador | Rp 210.000 | 4.733333 |
| 5 | Texture Experience Creambath Vanilla Milk Inte... | makarizo-professional | Rp 288.600 | 4.732143 |
| 6 | Marula Repair Hair Mask | dancoly | Rp 329.000 | 4.718254 |
| 9 | Argan Repair Hairmask | dancoly | Rp 289.000 | 4.708498 |


The output lists 10 recommended products similar to Repair Hair Mask, with their brand, price range, and rating, ordered from the highest rating to the lowest.

In [37]:
products_recommendations_by_rating('MULTI EYE COLOR PALETTE')

| | product_name | brand_name | price_range | average_rating |
|---|---|---|---|---|
| 6 | Cute Eyes Maker | etude | Rp 205.000 | 5.0 |
| 7 | Mirror Holic Liquid Eyes | etude | Rp 188.000 | 5.0 |
| 8 | Look My Eyes Velvet NEW | etude | Rp 109.000 | 5.0 |
| 9 | Play Color Eyes Mini Objet | etude | Rp 342.000 | 5.0 |
| 4 | LIQUID PRIMER EYE SHADOW | 3ce | Rp 264.000 | 4.975 |
| 2 | lilybyred Skinny Mes Brow Pencil | lilybyred | Rp 120.000 | 4.933333 |
| 0 | Look At My Eyes Café | etude | Rp 79.000 - Rp 109.000 | 4.92 |
| 1 | Pro Eye Palette Mini | clio | Rp 310.000 | 4.92 |
| 3 | MINI MULTI EYE COLOR PALETTE | 3ce | Rp 423.000 | 4.883333 |
| 5 | YOU Colorland Explorer Colorland Wander Nature... | you-beauty | Rp 258.000 | 4.85 |


## Model Development with Content-Based Filtering Based on Recommended Count

Content-based filtering recommends items similar to those the user liked in the past. In this stage, TfidfVectorizer is used to extract a feature representation of each product category, and cosine similarity is computed on those features combined with each product's total recommended count.

### Data Preparation

##### TF-IDF Vectorizer

TF-IDF is used in the recommender to build a feature representation of each product category.

In [38]:
products

| | brand_name | product_name | product_id | price_range | url | default_category | average_rating | total_reviews | total_recommended_count | total_in_wishlist | brand_name_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3ce | MULTI EYE COLOR PALETTE | 97802 | Rp 555.000 - Rp 687.000 | https://www.sociolla.com/eyeshadow/69460-phan-... | Eyeshadow | 4.920000 | 5 | 5 | 717 | 796 |
| 1 | 3ce | VELVET LIP TINT | 97810 | Rp 264.000 | https://www.sociolla.com/lip-cream/69468-son-k... | Lip Cream | 4.576190 | 42 | 42 | 682 | 796 |
| 2 | 3ce | LIP COLOR | 97822 | Rp 317.000 | https://www.sociolla.com/lip-matte/69480-son-t... | Lipstick | 0.000000 | 0 | 0 | 173 | 796 |
| 3 | 3ce | MINI MULTI EYE COLOR PALETTE | 97833 | Rp 423.000 | https://www.sociolla.com/eyeshadow/69491-phan-... | Eyeshadow | 4.883333 | 6 | 12 | 257 | 796 |
| 4 | 3ce | FACE BLUSH | 97801 | Rp 300.000 | https://www.sociolla.com/blush/69459-phan-ma-h... | Blush | 4.858824 | 13 | 17 | 387 | 796 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7631 | yves-rocher | Ambre Noir Eau De Toilette | 2158 | Rp 659.000 | https://sociolla.com/fragrance/8088-ambre-noir... | Eau De Toilette | 4.476190 | 7 | 6 | 355 | 273 |
| 7632 | yves-rocher | Soin Stimulating Conditioner | 64174 | Rp 199.000 | https://vn.sociolla.com/d-u-x/17239-soin-stimu... | Conditioner | 4.638889 | 9 | 7 | 122 | 273 |
| 7633 | yves-rocher | Repair Hair Mask | 36357 | Rp 289.000 | https://www.sociolla.com/hair-mask/14833-repai... | Hair Mask | 4.750000 | 5 | 5 | 58 | 273 |
| 7634 | yves-rocher | Hand Cream Olive Petitgrain | 36404 | Rp 109.000 | https://www.sociolla.com/221-hand-and-foot-cre... | Hand & Foot Cream | 4.968750 | 8 | 8 | 117 | 273 |


In [39]:
from sklearn.feature_extraction.text import TfidfVectorizer

tf = TfidfVectorizer()
tf.fit(products['default_category'])
tf.get_feature_names_out()

array(['2in1', 'accessories', 'acne', 'after', 'ampoule', 'and',
       'applicators', 'aromatherapy', 'arts', 'baby', 'bags', 'balm',
       'bar', 'bath', 'bb', 'beauty', 'blotting', 'blush', 'body',
       'booster', 'bottles', 'breast', 'bronzer', 'brow', 'brush',
       'brushes', 'bug', 'bundles', 'butter', 'cake', 'cap', 'card',
       'care', 'case', 'cc', 'cellulite', 'clay', 'cleaner', 'cleanser',
       'cleansing', 'clipper', 'clippers', 'clothing', 'color', 'combs',
       'concealer', 'conditioner', 'contour', 'cotton', 'crayon', 'cream',
       'curler', 'curling', 'cushion', 'de', 'deodorant', 'diaper', 'dry',
       'dryers', 'eau', 'essence', 'exclusive', 'exfoliants',
       'exfoliator', 'eye', 'eyebrows', 'eyelash', 'eyelashes',
       'eyeliner', 'eyeshadow', 'fabric', 'face', 'facial', 'false',
       'family', 'feminine', 'files', 'foam', 'foot', 'for', 'foundation',
       'fragrance', 'gel', 'gift', 'gloss', 'glue', 'hair', 'hand',
       'head', 'highlighter'

In [40]:
tfidf_matrix = tf.fit_transform(products['default_category'])
tfidf_matrix.shape

(7636, 210)

### Modeling

##### Cosine Similarity

Next, cosine similarity is used to measure how similar products are, based on their category terms combined with their total recommended count.

In [41]:
total_recommended_count = products[['total_recommended_count']].values
combined_features = np.hstack([total_recommended_count, tfidf_matrix.toarray()])
cosine_sim_recom = cosine_similarity(combined_features)

In this step, the total recommended count is stored in total_recommended_count and stacked column-wise with the dense form of the tfidf_matrix obtained earlier. Cosine similarity is then computed on the combined array, giving one score for every pair of products.
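
Since cosine similarity compares direction rather than magnitude, it can help to see how a raw count column behaves next to unit-length TF-IDF rows. The sketch below uses made-up counts and categories (all `toy_` names are hypothetical). The `MinMaxScaler` variant is an assumption offered for comparison and is not part of the original notebook.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# Hypothetical products: one category with very different counts, plus another category.
toy_categories = ["Lip Cream", "Lip Cream", "Hair Mask"]
toy_counts = np.array([[42.0], [5.0], [40.0]])

toy_tfidf = TfidfVectorizer().fit_transform(toy_categories).toarray()

# Raw counts, as in the cell above: the count column outweighs the category terms.
raw_sim = cosine_similarity(np.hstack([toy_counts, toy_tfidf]))

# Assumed alternative: rescale counts to [0, 1] so category terms weigh in more.
scaled_counts = MinMaxScaler().fit_transform(toy_counts)
scaled_sim = cosine_similarity(np.hstack([scaled_counts, toy_tfidf]))

print("raw:   ", np.round(raw_sim[0], 3))
print("scaled:", np.round(scaled_sim[0], 3))
```

In this toy case the raw version scores the hair mask with a similar count above the lip cream from the same category, while the scaled version reverses that order. Whether scaling suits the real data would need to be checked against the actual recommendations.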

In [42]:
cosine_sim_df_3 = pd.DataFrame(cosine_sim_recom, index=products['product_name'], columns=products['product_name'])
print('Shape:', cosine_sim_df_3.shape)

cosine_sim_df_3.sample(20, axis=1).sample(20, axis=0)

Shape: (7636, 7636)


product_name,YOUR SKIN BAE Toner Ceramide LC S-20 1% + Mugwort + Cica,Gokujyun Ultimate Moisturizing Lotion,Reve de The Revitalising Moisturising Milk 24 h,Beam x Blendercleanser Solid,Beauty Brush Collection - Sea Shell (Set of 7),Eye Makeup Holiday (Thunder Lash Lengthening + Eyebrow Gel),Origin Serum Set,Calming Mask Pack,Ready To Reset Body Wash Flora's Secret Bundle 2,Scalp Massager,Chic to Cheek Blush,I Love Sun-Day SPF 50 PA+++,SKIN1004 Madagascar Centella Poremizing LIight Gel Cream,EYEBROW MASCARA,All Clean Low pH Balancing Vegan Toner,Roll On For Her,Cica Care Gel Moisturizer,1574 Ultimate Base Set,Professional Faux Mink,2 In1 Face Matte Booster Face Spray
Renewing Body Lotion Collagen & Elastin,0.999536,0.999544,0.7282,0.0,0.998904,0.0,0.999382,0.999452,0.017074,0.995436,0.980131,0.969697,0.999572,0.993428,0.948248,0.999518,0.999454,0.999494,0.969697,0.980131
Don't Forget Sunscreen,0.999991,0.999996,0.707104,0.0,0.999358,0.0,0.999836,0.999907,0.0,0.995889,0.980576,0.970858,0.999971,0.993879,0.948679,0.999972,0.999725,0.999948,0.970138,0.980576
02 Air Mask Lemon Moisturizing & Brightening,0.894423,0.894427,0.632456,0.0,0.893857,0.0,0.894285,0.90031,0.0,0.890754,0.877058,0.867722,0.894406,0.888957,0.848528,0.894406,0.894185,0.894385,0.867722,0.877058
Green Tea Seed Eye and Face Ball,0.894423,0.894427,0.704431,0.0,0.893857,0.0,0.894285,0.894348,0.0,0.890754,0.877058,0.867722,0.894406,0.888957,0.848528,0.894406,0.894185,0.894385,0.867722,0.877058
Melt My Day! Cleansing Balm,0.999737,0.999742,0.706924,0.0,0.999105,0.0,0.999582,0.999653,0.0,0.995636,0.980328,0.969892,0.999718,0.993627,0.948438,0.999719,0.999472,0.999695,0.969892,0.980328
NIVEA MEN Crème,0.999856,0.999867,0.710062,0.0,0.999224,0.0,0.999702,0.999772,0.0,0.995755,0.980445,0.970008,0.999953,0.993746,0.948552,0.999838,0.999978,0.999814,0.970008,0.981131
AC Collection Ultimate Spot Cream,0.999992,0.999997,0.707105,0.0,0.99936,0.0,0.999837,0.999908,0.0,0.99589,0.980578,0.970139,0.999976,0.993881,0.94868,0.999974,0.999737,0.99995,0.970139,0.980578
Soothing & Calming Lotion Twin Set,0.0,8.5e-05,0.707107,0.0,0.0,0.0,0.0,0.0,0.56369,0.0,0.0,0.0,0.0018,0.0,0.0,0.0,0.006025,0.0,0.0,0.0
Vita Duo Cream Joan Day Joan Night,0.99961,0.999623,0.711923,0.0,0.998978,0.0,0.999455,0.999526,0.0,0.995509,0.980203,0.969768,0.999783,0.993501,0.948318,0.999591,0.99999,0.999567,0.969768,0.981346
[BUNDLE] Jacquelle Disney Princess Ariel edition Bundle - Exclusive Gift Box,0.995032,0.995037,0.703598,0.0,0.994403,0.0,0.994879,0.995395,0.0,0.990951,0.975714,0.965328,0.995013,0.988951,0.943975,0.995014,0.994768,0.99499,0.965328,0.975714


Cosine similarity yields a similarity score for every pair of products. The shape (7636, 7636) is the size of the resulting similarity matrix, with one row and one column per product. The output above shows only a random sample of those rows and columns.


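Example: a score of 1.000000 between Hand Butter Set (3pcs) and Botanical Essentials - Bundle PATCHOULI Body Lotion and Hand Sanitizer means that the combined feature vectors of the two products, category terms plus total recommended count, point in the same direction. Because sample draws rows and columns at random, this pair does not appear in the excerpt above.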

##### Getting Recommendations

The recommender returns the products most similar to the product supplied by the user, using the category-and-recommended-count similarity scores computed in the previous step, and lists them from the highest total recommended count to the lowest.

In [43]:
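def products_recommendations_by_count_recom(
    nama_produk,
    similarity_data=cosine_sim_df_3,
    items=products[['product_name', 'brand_name', 'price_range', 'total_recommended_count']],
    k=10,
):
    # Move the k highest similarity scores for nama_produk to the end of the index array
    index = similarity_data.loc[:, nama_produk].to_numpy().argpartition(
        range(-1, -k-1, -1))

    # Take k+1 candidates, most similar first, so that k remain after dropping the query.
    # Note: only the last k positions are guaranteed to be sorted by argpartition.
    closest = similarity_data.columns[index[-1:-(k+2):-1]]
    closest = closest.drop(nama_produk, errors='ignore')

    # Attach product details, then order the recommendations by recommended count, highest first
    recommendations = pd.DataFrame(closest, columns=['product_name']).merge(items, on='product_name')
    recommendations = recommendations.sort_values(by='total_recommended_count', ascending=False)

    return recommendations.head(k)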

The function products_recommendations_by_count_recom takes nama_produk as the product to search for, similarity_data as the similarity scores (cosine_sim_df_3 by default), items as the columns to display, and k as the number of recommendations, 10 by default.

It first builds index, the positions of the products most similar to nama_produk. The argpartition call with range(-1, -k-1, -1) moves the k highest similarity scores to the end of the array in ascending order, so reading the tail backwards gives them from most to least similar without fully sorting the rest.

The names of these products are stored in closest, and the input product itself is removed from the list. Finally, recommendations is built by merging closest with items on product_name, sorted by total recommended count from highest to lowest, and the first k rows are returned.
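
One detail of the merge step worth checking is how it behaves when `product_name` is not unique, since pandas returns one row per matching pair. The sketch below uses a made-up `items` frame; whether the real dataset contains duplicate names is not established here, and the deduplication step is an assumed safeguard rather than part of the original function.

```python
import pandas as pd

# Hypothetical catalogue in which one product name occurs twice.
items = pd.DataFrame({
    "product_name": ["Mask A", "Mask B", "Mask B"],
    "total_recommended_count": [7, 5, 9],
})
closest = pd.DataFrame({"product_name": ["Mask A", "Mask B"]})

# Plain merge: the duplicated name yields two rows, so the result has 3 rows, not 2.
merged = closest.merge(items, on="product_name")

# Assumed safeguard: keep only the highest count per name before returning.
deduplicated = (
    merged.sort_values("total_recommended_count", ascending=False)
          .drop_duplicates("product_name")
)
print(len(merged), len(deduplicated))
```

If duplicates do exist, the merged frame can contain more rows than candidates, and `head(k)` may then show the same name more than once.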

In [44]:
products['product_name'].unique()

array(['MULTI EYE COLOR PALETTE', 'VELVET LIP TINT', 'LIP COLOR', ...,
       'Repair Hair Mask', 'Hand Cream Olive Petitgrain', 'Color Mask'],
      dtype=object)

In [45]:
products[products.product_name.eq('Repair Hair Mask')]

| | brand_name | product_name | product_id | price_range | url | default_category | average_rating | total_reviews | total_recommended_count | total_in_wishlist | brand_name_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7633 | yves-rocher | Repair Hair Mask | 36357 | Rp 289.000 | https://www.sociolla.com/hair-mask/14833-repai... | Hair Mask | 4.75 | 5 | 5 | 58 | 273 |


The recommendations are expected to be products similar to Repair Hair Mask, that is, products in the Hair Mask category or a closely related one.

In [46]:
products_recommendations_by_count_recom('Repair Hair Mask')

| | product_name | brand_name | price_range | total_recommended_count |
|---|---|---|---|---|
| 7 | Texture Experience Creambath Mint Sorbet Purif... | makarizo-professional | Rp 288.600 | 13 |
| 8 | Wind Down Scalp Masque | runa-beauty | Rp 199.000 | 13 |
| 6 | Nourishing & Moisturizing Scalp Pack | rated-green | Rp 118.000 | 10 |
| 4 | Texture Experience Creambath Green Tea Butter ... | makarizo-professional | Rp 288.600 | 9 |
| 5 | Tresemme Keratin Deep Smoothening Hair Mask Pe... | tresemme | Rp 87.000 | 9 |
| 2 | Miracle You Treatment | moist-diane | Rp 149.000 | 7 |
| 3 | Herb Mask Deep Conditioning & Repairing Damage | ree-derma-wellness | Rp 118.000 | 7 |
| 1 | Extra Moist & Shine Hair Mask | moist-diane | Rp 119.000 | 6 |
| 0 | Honey Dew Repair Mask Dusset | makarizo-professional | Rp 409.600 | 5 |
| 9 | Hair Energy Fibertherapy Hair & Scalp Creambat... | makarizo | Rp 124.100 | 3 |


The output lists 10 recommended products similar to Repair Hair Mask, with their brand, price range, and total recommended count, ordered from the highest count to the lowest.

In [47]:
products_recommendations_by_count_recom('VELVET LIP TINT')

| | product_name | brand_name | price_range | total_recommended_count |
|---|---|---|---|---|
| 9 | Misty Matte Lip Cream | dazzle-me | Rp 33.900 | 76 |
| 8 | CREAMY LIP & CHEEK DUO | focallure | Rp 38.000 | 73 |
| 6 | Colorfit Fresh Matte Lip Ink | wardah | Rp 64.000 | 68 |
| 5 | Exclusive Matte Lip Cream X Ayang Cempaka | wardah | Rp 66.500 | 67 |
| 4 | YOU NEW FORMULA Rouge Power Matte Lip Cream | you-beauty | Rp 93.000 | 51 |
| 3 | Silky Velvet Lip Cream | pinkflash | Rp 70.000 | 45 |
| 1 | Matte Lip Colour | sada-by-cathy-sharon | Rp 180.000 | 44 |
| 0 | Velvet Matte Lip Cream | dazzle-me | Rp 29.900 | 42 |
| 2 | Staymax Matte Lip Ink | focallure | Rp 89.000 | 40 |
| 7 | Lasting Matte Lipcream | pinkflash | Rp 42.000 | 30 |
