# Project: Amazon Fashion Recommender Demo


## Table of contents

1. [Introduction](#in)
2. [Data splitting](#2)
3. [Exploratory Data Analysis (EDA)](#3)
4. [Feature engineering](#4)
5. [Preprocessing and transformations](#5) 
6. [Baseline model](#6)
7. [Linear models](#7)
8. [Different models](#8)
9. [Feature selection](#9)
10. [Hyperparameter optimization](#10)
11. [Interpretation and feature importances](#11) 
12. [Results on the test set](#12)
13. [Summary of the results](#13)


<!-- BEGIN QUESTION -->

## Imports

In [1]:
from hashlib import sha1

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import shap
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import (
    GridSearchCV,
    RandomizedSearchCV,
    cross_val_score,
    cross_validate,
    train_test_split,
)
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import (
    KBinsDiscretizer,
    OneHotEncoder,
    OrdinalEncoder,
    StandardScaler,
)
from sklearn.tree import DecisionTreeRegressor

plt.rcParams["font.size"] = 16

In [2]:
#pip install numpy pandas matplotlib scikit-learn shap



In [3]:
#!pip install ipywidgets




## Introduction <a name="in"></a>





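## 1) Clustering + Regression (rating prediction)

Load `amazon_co-ecommerce_sample.csv`, then:

**Feature preparation**  
- Numeric: `price`, `number_of_reviews`, `number_available_in_stock`  
- Categorical: `manufacturer`, `amazon_category_and_sub_category` → one-hot encoding  

**Clustering**  
- Group the products into k clusters with KMeans (e.g., k=10) and add a `cluster` column  

**Regression model training**  
- For each cluster, train a model (e.g., `RandomForestRegressor`) that predicts `average_review_rating` (1 to 5)  

**Recommendation**  
1. Identify the cluster `c` of the product the user viewed (or selected)  
2. Run `regressor[c].predict()` on the features of the other products in cluster `c` to get their predicted ratings  
3. Recommend the top N products by predicted rating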
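A minimal sketch of the clustering + regression plan above, assuming the numeric columns have already been parsed to numbers. The DataFrame is random stand-in data (not the real CSV), `k=4` is used only because the demo is tiny, and `recommend` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for the parsed catalogue (random values, not the real CSV).
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "uniq_id": [f"id{i}" for i in range(n)],
    "price": rng.uniform(1, 100, n),
    "number_of_reviews": rng.integers(1, 50, n),
    "number_available_in_stock": rng.integers(0, 10, n),
    "manufacturer": rng.choice(["Hornby", "Generic", "ccf"], n),
    "amazon_category_and_sub_category": rng.choice(
        ["Hobbies > Model Trains", "Hobbies > Lighting"], n
    ),
    "average_review_rating": rng.uniform(1, 5, n).round(1),
})
numeric = ["price", "number_of_reviews", "number_available_in_stock"]
categorical = ["manufacturer", "amazon_category_and_sub_category"]

# Scale numeric features, one-hot encode categorical ones, return a dense array.
preprocessor = make_column_transformer(
    (StandardScaler(), numeric),
    (OneHotEncoder(handle_unknown="ignore"), categorical),
    sparse_threshold=0,
)
X = preprocessor.fit_transform(df[numeric + categorical])

# Cluster the products; k=4 suits this tiny demo (the plan suggests e.g. k=10).
k = 4
df["cluster"] = KMeans(n_clusters=k, n_init=10, random_state=123).fit_predict(X)

# Train one rating regressor per cluster.
regressors = {}
for c in range(k):
    in_c = (df["cluster"] == c).to_numpy()
    regressors[c] = RandomForestRegressor(n_estimators=50, random_state=123).fit(
        X[in_c], df.loc[in_c, "average_review_rating"]
    )

def recommend(viewed_id, top_n=5):
    """Rank the other products in the viewed product's cluster by predicted rating."""
    c = int(df.loc[df["uniq_id"] == viewed_id, "cluster"].iloc[0])
    mask = ((df["cluster"] == c) & (df["uniq_id"] != viewed_id)).to_numpy()
    if not mask.any():  # the viewed product is alone in its cluster
        return pd.DataFrame(columns=["uniq_id", "predicted_rating"])
    recs = df.loc[mask, ["uniq_id"]].copy()
    recs["predicted_rating"] = regressors[c].predict(X[mask])
    return recs.sort_values("predicted_rating", ascending=False).head(top_n)

print(recommend("id0", top_n=3))
```

For an honest evaluation of the per-cluster regressors, they would be fit on the training split only rather than on the whole catalogue as in this demo.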

---

## 2) Popularity-Based Recommendation

From `amazon_co-ecommerce_sample.csv`:

- Use `number_of_reviews` (review count) and `average_review_rating` (rating) as the basis  
- **Popularity score**: e.g., `score = average_review_rating * log(1 + number_of_reviews)`  
- Sort by `score` in descending order and return the `uniq_id` and `product_name` of the top N products  
- (Optional) also use `number_of_answered_questions` or `number_available_in_stock` as weights
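A sketch of the popularity score, using `np.log1p` (which computes `log(1 + x)`). The three rows reuse values visible in the `df.head(10)` preview (one product name is shortened), with ratings already converted from strings such as `4.3 out of 5 stars` to floats. The optional weighting is left out, and `popularity_top_n` is a hypothetical name.

```python
import numpy as np
import pandas as pd

def popularity_top_n(df, n=10):
    """Return the top-n products by average_review_rating * log(1 + number_of_reviews)."""
    scored = df.assign(
        score=df["average_review_rating"] * np.log1p(df["number_of_reviews"])
    )
    # Rows with a missing rating or review count get a NaN score and sort last.
    top = scored.sort_values("score", ascending=False).head(n)
    return top[["uniq_id", "product_name", "score"]]

# Three rows taken from the df.head(10) preview, ratings already parsed to floats.
demo = pd.DataFrame({
    "uniq_id": [
        "87bbb472ef9d90dcef140a551665c929",
        "eac7efa5dbd3d667f26eb3d3ab504464",
        "cb34f0a84102c1ebc3ef6892d7444d36",
    ],
    "product_name": [
        "Hornby Santa's Express Train Set",
        "Hornby 2014 Catalogue",
        "20pcs Model Garden Light Double Heads Lamppost",
    ],
    "average_review_rating": [4.3, 4.9, 5.0],
    "number_of_reviews": [36, 15, 2],
})
print(popularity_top_n(demo, n=2))
```

The logarithm dampens the review count, so a 5.0 rating from 2 reviews ranks below a 4.3 rating from 36 reviews.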

---

## 3) Price & Rating Filter

Take a budget range (`min_price`, `max_price`) and a minimum rating (`min_rating`) from the user, then:

1. Keep products with `min_price ≤ price ≤ max_price`  
2. Of those, keep products with `average_review_rating ≥ min_rating`  
3. Sort the filtered results by `average_review_rating` in descending order  
4. Return `uniq_id`, `product_name`, `price`, and `average_review_rating` for the top N products
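The filter steps above map onto a single boolean mask. This sketch assumes `price` and `average_review_rating` are already numeric; the ids and product names are shortened, hypothetical values, while the prices and ratings mirror the preview rows.

```python
import pandas as pd

def filter_by_price_and_rating(df, min_price, max_price, min_rating, n=10):
    """Products within [min_price, max_price] rated at least min_rating, best-rated first."""
    # between() is inclusive on both ends by default; NaN prices or ratings fail the mask.
    mask = df["price"].between(min_price, max_price) & (
        df["average_review_rating"] >= min_rating
    )
    cols = ["uniq_id", "product_name", "price", "average_review_rating"]
    return (
        df.loc[mask, cols]
        .sort_values("average_review_rating", ascending=False)
        .head(n)
    )

# Hypothetical parsed rows; prices and ratings follow the df.head(10) preview.
demo = pd.DataFrame({
    "uniq_id": ["p0", "p1", "p2", "p3", "p4", "p5"],
    "product_name": ["Catalogue", "Holiday Express", "Classic train set",
                     "Hawksworth coach", "Gildenlow steam loco", "Garden lights"],
    "price": [3.42, 16.99, 9.99, 39.99, 32.19, 6.99],
    "average_review_rating": [4.9, 4.5, 3.9, 5.0, 4.7, 5.0],
})
print(filter_by_price_and_rating(demo, min_price=5, max_price=35, min_rating=4.5, n=3))
```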



<br><br>

In [4]:
...

Ellipsis

In [5]:
df = pd.read_csv('data/amazon_co-ecommerce_sample.csv')
df.head(10)

| Unnamed: 0 | index | uniq_id | product_name | manufacturer | price | number_available_in_stock | number_of_reviews | number_of_answered_questions | average_review_rating | amazon_category_and_sub_category | customers_who_bought_this_item_also_bought | description | product_information | product_description | items_customers_buy_after_viewing_this_item | customer_questions_and_answers | customer_reviews | sellers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | eac7efa5dbd3d667f26eb3d3ab504464 | Hornby 2014 Catalogue | Hornby | £3.42 | 5 new | 15 | 1.0 | 4.9 out of 5 stars | Hobbies > Model Trains & Railway Sets > Rail V... | http://www.amazon.co.uk/Hornby-R8150-Catalogue... | Product Description Hornby 2014 Catalogue Box ... | Technical Details Item Weight640 g Product Dim... | Product Description Hornby 2014 Catalogue Box ... | http://www.amazon.co.uk/Hornby-R8150-Catalogue... | Does this catalogue detail all the previous Ho... | Worth Buying For The Pictures Alone (As Ever) ... | {"seller"=>[{"Seller_name_1"=>"Amazon.co.uk", ... |
| 1 | 1 | b17540ef7e86e461d37f3ae58b7b72ac | FunkyBuys® Large Christmas Holiday Express Fes... | FunkyBuys | £16.99 |  | 2 | 1.0 | 4.5 out of 5 stars | Hobbies > Model Trains & Railway Sets > Rail V... | http://www.amazon.co.uk/Christmas-Holiday-Expr... | Size Name:Large FunkyBuys® Large Christmas Hol... | Technical Details Manufacturer recommended age... | Size Name:Large FunkyBuys® Large Christmas Hol... | http://www.amazon.co.uk/Christmas-Holiday-Expr... | can you turn off sounds // hi no you cant turn... | Four Stars // 4.0 // 18 Dec. 2015 // By\n \... | {"seller"=>{"Seller_name_1"=>"UHD WHOLESALE", ... |
| 2 | 2 | 348f344247b0c1a935b1223072ef9d8a | CLASSIC TOY TRAIN SET TRACK CARRIAGES LIGHT EN... | ccf | £9.99 | 2 new | 17 | 2.0 | 3.9 out of 5 stars | Hobbies > Model Trains & Railway Sets > Rail V... | http://www.amazon.co.uk/Classic-Train-Lights-B... | BIG CLASSIC TOY TRAIN SET TRACK CARRIAGE LIGHT... | Technical Details Manufacturer recommended age... | BIG CLASSIC TOY TRAIN SET TRACK CARRIAGE LIGHT... | http://www.amazon.co.uk/Train-With-Tracks-Batt... | What is the gauge of the track // Hi Paul.Trut... | \*\*Highly Recommended!\*\* // 5.0 // 26 May 2015 ... | {"seller"=>[{"Seller_name_1"=>"DEAL-BOX", "Sel... |
| 3 | 3 | e12b92dbb8eaee78b22965d2a9bbbd9f | HORNBY Coach R4410A BR Hawksworth Corridor 3rd | Hornby | £39.99 |  | 1 | 2.0 | 5.0 out of 5 stars | Hobbies > Model Trains & Railway Sets > Rail V... |  | Hornby 00 Gauge BR Hawksworth 3rd Class W 2107... | Technical Details Item Weight259 g Product Dim... | Hornby 00 Gauge BR Hawksworth 3rd Class W 2107... |  |  | I love it // 5.0 // 22 July 2013 // By\n \n... |  |
| 4 | 4 | e33a9adeed5f36840ccc227db4682a36 | Hornby 00 Gauge 0-4-0 Gildenlow Salt Co. Steam... | Hornby | £32.19 |  | 3 | 2.0 | 4.7 out of 5 stars | Hobbies > Model Trains & Railway Sets > Rail V... | http://www.amazon.co.uk/Hornby-R6367-RailRoad-... | Product Description Hornby RailRoad 0-4-0 Gild... | Technical Details Item Weight159 g Product Dim... | Product Description Hornby RailRoad 0-4-0 Gild... | http://www.amazon.co.uk/Hornby-R2672-RailRoad-... |  | Birthday present // 5.0 // 14 April 2014 // By... |  |
| 5 | 5 | cb34f0a84102c1ebc3ef6892d7444d36 | 20pcs Model Garden Light Double Heads Lamppost... | Generic | £6.99 |  | 2 | 1.0 | 5.0 out of 5 stars | Hobbies > Model Trains & Railway Sets > Lighti... | http://www.amazon.co.uk/Single-Head-Garden-Lig... | These delicate model garden lights are mainly ... | Technical Details Manufacturer recommended age... | These delicate model garden lights are mainly ... | http://www.amazon.co.uk/Single-Head-Garden-Lig... | is it possible to replace thr grain of wheat l... | Five Stars // 5.0 // 27 Dec. 2014 // By\n \... | {"seller"=>{"Seller_name_1"=>"STK e-Shop", "Se... |
| 6 | 6 | f74b562470571dfb689324adf236f82c | Hornby 00 Gauge 230mm BR Bogie Passenger Brake... | Hornby | £24.99 |  | 2 | 1.0 | 4.5 out of 5 stars | Hobbies > Model Trains & Railway Sets > Rail V... | http://www.amazon.co.uk/Hornby-R4388-RailRoad-... | Product Description Hornby BR bogie passenger ... | Technical Details Item Weight222 g Product Dim... | Product Description Hornby BR bogie passenger ... |  |  | High standard model, well worth the wait. Repl... | {"seller"=>{"Seller_name_1"=>"MyHobbyStore Ret... |
| 7 | 7 | 87bbb472ef9d90dcef140a551665c929 | Hornby Santa's Express Train Set | Hornby | £69.93 | 3 new | 36 | 7.0 | 4.3 out of 5 stars | Hobbies > Model Trains & Railway Sets > Rail V... | http://www.amazon.co.uk/Hornby-R8221-Gauge-Tra... | Product Description Inject a bit of Hornby mag... | Technical Details Item Weight1.2 Kg Product Di... | Product Description Inject a bit of Hornby mag... | http://www.amazon.co.uk/Hornby-R1151-Caledonia... | Can this train go backwards as well as forward... | Beautiful set // 5.0 // 3 Dec. 2015 // By\n ... | {"seller"=>[{"Seller_name_1"=>"Toy Arena", "Se... |
| 8 | 8 | 7e2aa2b4596a39ba852449718413d7cc | Hornby Gauge Western Express Digital Train Set... | Hornby | £235.58 | 4 new | 1 | 1.0 | 5.0 out of 5 stars | Hobbies > Model Trains & Railway Sets > Rail V... | http://www.amazon.co.uk/Hornby-Western-Master-... | Western Express Digital Train Set with eLink a... | Technical Details Item Weight2.3 Kg Product Di... | Western Express Digital Train Set with eLink a... | http://www.amazon.co.uk/Hornby-Western-Master-... | The description is incorrect. the hornby site... | Five Stars // 5.0 // 23 Dec. 2015 // By\n \... | {"seller"=>[{"Seller_name_1"=>"Amazon.co.uk", ... |
| 9 | 9 | 5afbaf65680c9f378af5b3a3ae22427e | Learning Curve Chuggington Interactive Chatsworth | Chuggington |  | 1 new | 8 | 1.0 | 4.8 out of 5 stars | Hobbies > Model Trains & Railway Sets > Rail V... | http://www.amazon.co.uk/Learning-Curve-Chuggin... | Product Description An amazingly Interactive C... | Technical Details Item Weight150 g Product Dim... | Product Description An amazingly Interactive C... | http://www.amazon.co.uk/Chuggington \| http://w... |  | Chuggers are go! // 4.0 // 11 Jan. 2011 // By\... |  |


<!-- END QUESTION -->

<br><br>


## 2. Data splitting <a name="2"></a>


**Your tasks:**

1. Split the data into train (70%) and test (30%) portions with `random_state=123`.
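One way to do the split. In the notebook this would be applied to the `df` loaded in `In [5]`; here a small stand-in `df` is used so the cell runs on its own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the loaded DataFrame (hypothetical values).
df = pd.DataFrame({"price": range(100), "average_review_rating": [4.5] * 100})

# 70% train / 30% test, reproducible via random_state=123.
train_df, test_df = train_test_split(df, test_size=0.3, random_state=123)
print(train_df.shape, test_df.shape)
```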


<!-- END QUESTION -->

<br><br>


## 3. EDA <a name="3"></a>


**Your tasks:**

1. Perform exploratory data analysis on the train set.
2. Include at least two summary statistics and two visualizations that you find useful, and accompany each one with a sentence explaining it.
3. Summarize your initial observations about the data. 
4. Pick one or more appropriate metrics for assessment.
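A sketch of two summary statistics and some candidate metrics, run on a handful of parsed rows from the preview rather than the full train set. Because the target is a rating between 1 and 5, MAE is easy to read in rating units; RMSE and R² are common complements. The constant prediction of 4.5 is arbitrary and only demonstrates the metric calls, and the histogram line is left commented out as one possible visualization.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# A few parsed rows from the df.head(10) preview, standing in for train_df.
train_df = pd.DataFrame({
    "price": [3.42, 16.99, 9.99, 39.99, np.nan],
    "number_of_reviews": [15, 2, 17, 1, 8],
    "average_review_rating": [4.9, 4.5, 3.9, 5.0, 4.8],
})

# Summary statistic 1: count, mean, std, and quartiles per numeric column.
print(train_df.describe())
# Summary statistic 2: fraction of missing values per column.
print(train_df.isna().mean())
# One possible visualization: train_df["average_review_rating"].plot.hist(bins=20)

# Candidate metrics, shown against an arbitrary constant prediction.
y_true = train_df["average_review_rating"]
y_pred = np.full(len(y_true), 4.5)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
r2 = r2_score(y_true, y_pred)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
```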

In [6]:
...

Ellipsis

In [7]:
...

Ellipsis

In [8]:
...

Ellipsis

In [9]:
...

Ellipsis

In [10]:
...

Ellipsis

In [11]:
...

Ellipsis

In [12]:
...

Ellipsis

In [13]:
...

Ellipsis

In [14]:
...

Ellipsis

In [15]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>


## 4. Feature engineering <a name="4"></a>

**Your tasks:**

1. Carry out feature engineering. In other words, extract new features relevant for the problem and work with your new feature set in the following exercises. You may have to go back and forth between feature engineering and preprocessing. 
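The raw columns in the preview are strings (`£3.42`, `5 new`, `4.9 out of 5 stars`), so a first feature-engineering step could parse them into numbers and derive a top-level category from `amazon_category_and_sub_category`. The helper name `parse_raw_columns`, the derived `main_category` column, and the comma handling in review counts are assumptions; values that do not match the expected pattern become `NaN`.

```python
import numpy as np
import pandas as pd

def parse_raw_columns(df):
    """Turn raw string columns (as seen in df.head(10)) into numeric/categorical features."""
    # Drop the saved-index columns; errors="ignore" skips them if absent.
    out = df.drop(columns=["Unnamed: 0", "index"], errors="ignore")
    # "£3.42" -> 3.42; anything that does not parse becomes NaN.
    out["price"] = pd.to_numeric(
        out["price"].str.replace("£", "", regex=False).str.replace(",", "", regex=False),
        errors="coerce",
    )
    # "5 new" -> 5
    out["number_available_in_stock"] = pd.to_numeric(
        out["number_available_in_stock"].str.extract(r"(\d+)", expand=False),
        errors="coerce",
    )
    # "4.9 out of 5 stars" -> 4.9
    out["average_review_rating"] = pd.to_numeric(
        out["average_review_rating"].str.extract(r"([\d.]+) out of", expand=False),
        errors="coerce",
    )
    # Review counts may contain thousands separators (assumption), so strip commas.
    out["number_of_reviews"] = pd.to_numeric(
        out["number_of_reviews"].astype(str).str.replace(",", "", regex=False),
        errors="coerce",
    )
    # "Hobbies > Model Trains & Railway Sets > ..." -> "Hobbies"
    out["main_category"] = out["amazon_category_and_sub_category"].str.split(" > ").str[0]
    return out

# Three raw rows modelled on the preview (rows 0, 1 and 9).
raw = pd.DataFrame({
    "Unnamed: 0": [0, 1, 9],
    "index": [0, 1, 9],
    "price": ["£3.42", "£16.99", np.nan],
    "number_available_in_stock": ["5 new", np.nan, "1 new"],
    "number_of_reviews": [15, 2, 8],
    "average_review_rating": ["4.9 out of 5 stars", "4.5 out of 5 stars", "4.8 out of 5 stars"],
    "amazon_category_and_sub_category": ["Hobbies > Model Trains & Railway Sets"] * 3,
})
parsed = parse_raw_columns(raw)
print(parsed)
```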

<!-- END QUESTION -->

<br><br>



## 5. Preprocessing and transformations <a name="5"></a>
<hr>
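A possible `ColumnTransformer`, assuming the parsed columns plus the hypothetical `main_category` feature: median imputation and scaling for numeric columns, constant imputation and one-hot encoding for categorical columns, and a bag-of-words on `product_name`. The vocabulary size and stop-word list are arbitrary choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric = ["price", "number_of_reviews", "number_available_in_stock"]
categorical = ["manufacturer", "main_category"]  # main_category: hypothetical engineered feature
text = "product_name"  # a plain string, so CountVectorizer receives a 1-D column

preprocessor = make_column_transformer(
    (make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), numeric),
    (
        make_pipeline(
            SimpleImputer(strategy="constant", fill_value="missing"),
            OneHotEncoder(handle_unknown="ignore"),
        ),
        categorical,
    ),
    # CountVectorizer cannot handle NaN, so missing names should be filled with "" first.
    (CountVectorizer(stop_words="english", max_features=1000), text),
)

# Hypothetical parsed rows loosely based on the preview.
demo = pd.DataFrame({
    "price": [3.42, 16.99, np.nan, 39.99],
    "number_of_reviews": [15, 2, 8, 1],
    "number_available_in_stock": [5, np.nan, 1, np.nan],
    "manufacturer": ["Hornby", "FunkyBuys", "Chuggington", np.nan],
    "main_category": ["Hobbies"] * 4,
    "product_name": [
        "Hornby 2014 Catalogue",
        "Large Christmas Holiday Express",
        "Learning Curve Chuggington Interactive Chatsworth",
        "HORNBY Coach R4410A BR Hawksworth Corridor 3rd",
    ],
})
X_demo = preprocessor.fit_transform(demo)
print(X_demo.shape)
```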


In [16]:
...

Ellipsis

In [17]:
...

Ellipsis

In [18]:
...

Ellipsis

In [19]:
...

Ellipsis

In [20]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>


## 6. Baseline model <a name="6"></a>


**Your tasks:**
1. Try `scikit-learn`'s baseline model and report results.
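A baseline sketch with `DummyRegressor`, which always predicts the mean rating of the training fold. The data is random stand-in data; scikit-learn reports MAE as a negative score (higher is better), so flip the sign when reporting.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import cross_validate

# Hypothetical stand-ins for X_train / y_train (the baseline ignores X anyway).
rng = np.random.default_rng(123)
X_train = pd.DataFrame({"price": rng.uniform(1, 100, 50)})
y_train = rng.uniform(1, 5, 50)

dummy = DummyRegressor(strategy="mean")  # predicts the training-fold mean rating
cv = cross_validate(
    dummy, X_train, y_train, cv=5,
    scoring="neg_mean_absolute_error", return_train_score=True,
)
print(pd.DataFrame(cv).mean())
```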

In [21]:
...

Ellipsis

In [22]:
...

Ellipsis

In [23]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>

## 7. Linear models <a name="7"></a>

**Your tasks:**

1. Try a linear model as a first real attempt. 
2. Carry out hyperparameter tuning to explore different values for the complexity hyperparameter. 
3. Report cross-validation scores along with standard deviation. 
4. Summarize your results.
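A sketch of tuning Ridge's complexity hyperparameter `alpha` with `GridSearchCV` on synthetic data, reporting the mean and standard deviation of the cross-validation scores. The alpha grid (1e-3 to 1e3) is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic linear data standing in for the preprocessed training set.
rng = np.random.default_rng(123)
X = rng.normal(size=(80, 4))
y = X @ np.array([0.5, -0.2, 0.0, 0.1]) + rng.normal(scale=0.3, size=80)

pipe = make_pipeline(StandardScaler(), Ridge())
param_grid = {"ridge__alpha": 10.0 ** np.arange(-3, 4)}  # 0.001 ... 1000
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2", return_train_score=True)
search.fit(X, y)

results = pd.DataFrame(search.cv_results_)[
    ["param_ridge__alpha", "mean_train_score", "mean_test_score", "std_test_score"]
]
print(results)
print("best alpha:", search.best_params_["ridge__alpha"])
```

Larger `alpha` means stronger regularization (a simpler model); watching train and validation scores converge as `alpha` grows shows the fundamental trade-off.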

In [24]:
...

Ellipsis

In [25]:
...

Ellipsis

In [26]:
...

Ellipsis

In [27]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>



## 8. Different models <a name="8"></a>


**Your tasks:**
1. Try at least 3 other models aside from a linear model. One of these models should be a tree-based ensemble model. 
2. Summarize your results in terms of overfitting/underfitting and fit and score times. Can you beat a linear model? 
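A sketch comparing a linear model with a decision tree, k-nearest neighbours, and two tree-based ensembles on synthetic data. `cross_validate` returns fit and score times alongside train and validation scores, so a large train/validation gap (overfitting) or uniformly low scores (underfitting) show up in one table. The model choices and synthetic target are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear target standing in for the preprocessed training set.
rng = np.random.default_rng(123)
X = rng.normal(size=(120, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=120)

models = {
    "Ridge": Ridge(),
    "Decision tree": DecisionTreeRegressor(random_state=123),
    "kNN": KNeighborsRegressor(),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=123),
    "Gradient boosting": GradientBoostingRegressor(random_state=123),
}
rows = {}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring="r2", return_train_score=True)
    rows[name] = {key: values.mean() for key, values in cv.items()}

summary = pd.DataFrame(rows).T[["fit_time", "score_time", "train_score", "test_score"]]
print(summary)
```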

In [28]:
...

Ellipsis

In [29]:
...

Ellipsis

In [30]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>



## 9. Feature selection <a name="9"></a>


**Your tasks:**

Make some attempts to select relevant features. You may try `RFECV` or forward selection for this. Do the results improve with feature selection? Summarize your results. If you see improvements in the results, keep feature selection in your pipeline. If not, you may abandon it in the next exercises. 
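A sketch of `RFECV` inside a pipeline, compared against the same model with all features. In the synthetic data only the first two columns carry signal. `SequentialFeatureSelector` (already imported) would be the forward-selection alternative.

```python
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(123)
X = rng.normal(size=(150, 8))
# Only the first two columns carry signal in this synthetic data.
y = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=150)

pipe_all = make_pipeline(StandardScaler(), Ridge())
pipe_fs = make_pipeline(StandardScaler(), RFECV(Ridge(), cv=5), Ridge())

# Default scoring for regressors is R^2.
print("all features:", cross_val_score(pipe_all, X, y, cv=5).mean())
print("with RFECV:  ", cross_val_score(pipe_fs, X, y, cv=5).mean())

pipe_fs.fit(X, y)
print("selected mask:", pipe_fs.named_steps["rfecv"].support_)
```

Because RFECV sits inside the pipeline, feature selection is redone within each cross-validation fold, which avoids leaking validation data into the selection.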

In [31]:
...

Ellipsis

In [32]:
...

Ellipsis

In [33]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>



## 10. Hyperparameter optimization <a name="10"></a>


**Your tasks:**

Make some attempts to optimize hyperparameters for the models you've tried and summarize your results. In at least one case you should be optimizing multiple hyperparameters for a single model. You may use `sklearn`'s methods for hyperparameter optimization or fancier Bayesian optimization methods. 
  - [GridSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)   
  - [RandomizedSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html)
  - [scikit-optimize](https://github.com/scikit-optimize/scikit-optimize) 
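A sketch of optimizing several random forest hyperparameters at once with `RandomizedSearchCV`. The search space, `n_iter=10`, and synthetic data are assumptions; `scipy.stats.randint` supplies integer distributions.

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(123)
X = rng.normal(size=(150, 5))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.2, size=150)

# Mix of distributions (sampled) and lists (sampled uniformly).
param_dist = {
    "n_estimators": randint(20, 100),
    "max_depth": [None, 3, 5, 10],
    "max_features": [0.5, 1.0, "sqrt"],
    "min_samples_leaf": randint(1, 10),
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=123),
    param_dist,
    n_iter=10,
    cv=3,
    scoring="neg_mean_absolute_error",
    random_state=123,
)
search.fit(X, y)
print(search.best_params_)
print("best CV MAE:", -search.best_score_)
```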

In [34]:
...

Ellipsis

In [35]:
...

Ellipsis

In [36]:
...

Ellipsis

In [37]:
...

Ellipsis

In [38]:
...

Ellipsis

In [39]:
...

Ellipsis

In [40]:
...

Ellipsis

In [41]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>



## 11. Interpretation and feature importances <a name="11"></a>


**Your tasks:**

1. Use the methods covered in class (e.g., `shap`), or any other method of your choice, to examine the most important features of one of the non-linear models. 
2. Summarize your observations. 
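The task suggests `shap` (e.g., `shap.TreeExplainer` for tree models). As a swapped-in alternative that needs only scikit-learn, this sketch uses permutation importance on held-out data: the drop in score when one column is shuffled. The column names are reused for readability, but the data is synthetic, so `price` dominates by construction.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(123)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["price", "number_of_reviews", "noise"])
# Synthetic target: price matters most, number_of_reviews a little, noise not at all.
y = 2 * X["price"] + 0.5 * X["number_of_reviews"] + rng.normal(scale=0.1, size=200)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=123)
rf = RandomForestRegressor(n_estimators=100, random_state=123).fit(X_train, y_train)

# Shuffle each column 10 times on held-out data and record the mean drop in R^2.
result = permutation_importance(rf, X_valid, y_valid, n_repeats=10, random_state=123)
importances = pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
print(importances)
```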

In [42]:
...

Ellipsis

In [43]:
...

Ellipsis

In [44]:
...

Ellipsis

In [45]:
...

Ellipsis

In [46]:
...

Ellipsis

In [47]:
...

Ellipsis

In [48]:
...

Ellipsis

In [49]:
...

Ellipsis

In [50]:
...

Ellipsis

In [51]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>



## 12. Results on the test set <a name="12"></a>

**Your tasks:**

1. Try your best performing model on the test data and report test scores. 
2. Do the test scores agree with the validation scores from before? To what extent do you trust your results? Do you think you've had issues with optimization bias? 
3. Take one or two test predictions and explain these individual predictions (e.g., with SHAP force plots).  
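A sketch of the final check on synthetic data: estimate MAE by cross-validation on the training split, refit on the whole training split, and score once on the test split. A test score much worse than the CV score would hint at optimization bias. Individual predictions are only printed here; the SHAP force plots mentioned in the task (`shap.force_plot`) would explain them.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(123)
X = rng.normal(size=(300, 4))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)

# Stand-in for the tuned "best" model from the previous sections.
best_model = RandomForestRegressor(n_estimators=100, random_state=123)
cv_mae = -cross_val_score(
    best_model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error"
).mean()

best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)
test_mae = mean_absolute_error(y_test, y_pred)
print(f"CV MAE: {cv_mae:.3f}  test MAE: {test_mae:.3f}  test R2: {r2_score(y_test, y_pred):.3f}")

# Two individual test predictions to inspect further.
for i in range(2):
    print(f"test row {i}: true={y_test[i]:.2f} predicted={y_pred[i]:.2f}")
```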

In [52]:
...

Ellipsis

In [53]:
...

Ellipsis

In [54]:
...

Ellipsis

In [55]:
...

Ellipsis

In [56]:
...

Ellipsis

In [57]:
...

Ellipsis

In [58]:
...

Ellipsis

In [59]:
...

Ellipsis

In [60]:
...

Ellipsis

In [61]:
...

Ellipsis

In [62]:
...

Ellipsis

In [63]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>


## 13. Summary of results <a name="13"></a>

Imagine that you want to present the summary of these results to your boss and co-workers. 

**Your tasks:**

1. Create a table summarizing important results. 
2. Write concluding remarks.
3. Discuss other ideas that you did not try but that could improve performance or interpretability.


In [64]:
...

Ellipsis

In [65]:
...

Ellipsis

<!-- END QUESTION -->

<br><br>

<br><br>



## 14. Your takeaway <a name="15"></a>


**Your tasks:**

What is your biggest takeaway from the supervised machine learning material?

<!-- END QUESTION -->

<br><br>