# Testing Notebook

#Some Theory about Recommender Systems

The main families of methods for RecSys are:

- Collaborative Filtering: This method makes automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on a set of items, A is more likely to have B's opinion for a given item than that of a randomly chosen person.

- Content-Based Filtering: This method uses only information about the description and attributes of the items users has previously consumed to model user's preferences. In other words, these algorithms try to recommend items that are similar to those that a user liked in the past (or is examining in the present). In particular, various candidate items are compared with items previously rated by the user and the best-matching items are recommended.

- Hybrid methods: Recent research has demonstrated that a hybrid approach, combining collaborative filtering and content-based filtering could be more effective than pure approaches in some cases. These methods can also be used to overcome some of the common problems in recommender systems such as cold start and the sparsity problem.

https://www.kaggle.com/code/gspmoreira/recommender-systems-in-python-101

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
color = sns.color_palette()

In [2]:
import scipy
import math
import random
import sklearn
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.sparse.linalg import svds
from sklearn.preprocessing import MinMaxScaler

## 1. ItemBased Collaborative Filter Recommendation

Example: https://www.kaggle.com/code/hendraherviawan/itembased-collaborative-filter-recommendation-r/report

### 2.1 Preprocessing

In [3]:
articles = pd.read_csv("data/articles.csv")
articles.head()

Unnamed: 0,article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
1,108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
2,108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
3,110065001,110065,OP T-shirt (Idro),306,Bra,Underwear,1010016,Solid,9,Black,...,Clean Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Microfibre T-shirt bra with underwired, moulde..."
4,110065002,110065,OP T-shirt (Idro),306,Bra,Underwear,1010016,Solid,10,White,...,Clean Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Microfibre T-shirt bra with underwired, moulde..."


In [4]:
import re

# regex pattern: find all column names with '_id', '_code', or '_no'
pattern = '.*(_id|_code|_no).*'

# dict comprehension: Sets all columns with '_id', '_code', or '_no' to str type
dtype_dict = {column: str for column in articles.columns if re.match(pattern, column)}

articles = articles.astype(dtype = dtype_dict)

In [5]:
customers = pd.read_csv("data/customers.csv")
customers.head()

Unnamed: 0,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,,,ACTIVE,NONE,25.0,2973abc54daa8a5f8ccfe9362140c63247c5eee03f1d93...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,,,ACTIVE,NONE,24.0,64f17e6a330a85798e4998f62d0930d14db8db1c054af6...
3,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...,,,ACTIVE,NONE,54.0,5d36574f52495e81f019b680c843c443bd343d5ca5b1c2...
4,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...,1.0,1.0,ACTIVE,Regularly,52.0,25fa5ddee9aac01b35208d01736e57942317d756b32ddd...


In [6]:
transactions = pd.read_csv("data/transactions_train.csv")
transactions.head()

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id
0,2018-09-20,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,663713001.0,0.050831,2.0
1,2018-09-20,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,541518023.0,0.030492,2.0
2,2018-09-20,00007d2de826758b65a93dd24ce629ed66842531df6699...,505221004.0,0.015237,2.0
3,2018-09-20,00007d2de826758b65a93dd24ce629ed66842531df6699...,685687003.0,0.016932,2.0
4,2018-09-20,00007d2de826758b65a93dd24ce629ed66842531df6699...,685687004.0,0.016932,2.0


In [7]:
#article_id is a float. First convert to int and then to string.
transactions['article_id'] = transactions['article_id'].astype("Int64").astype(str) 

### 2.2 Build single dataframe: Articles + Transactions + Customers

#### Create Transactions subset for testing

In [8]:
transactions_subset = transactions.sample(20000)
#transactions_subset = transactions

In [9]:
transactions_subset.head()

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id
15873908,2019-08-24,8c316b51a1d960a909f5111f229fb52e502fd28342dc3b...,767687002,0.029441,2.0
3694574,2018-12-14,00a50f190136f8145ca31977927cdc7d82f3d592a30ff1...,636323001,0.016932,2.0
3461525,2018-12-07,30d6d36375da58d9393edb6c0b2d825b500e95b9ffe716...,693243001,0.027102,1.0
15945033,2019-08-26,e34a808f89c302470b05591308c0bd184a43ab78319e35...,725663002,0.013542,2.0
6553032,2019-02-23,1351b305c2de05f47586f518d4b53b68ecdcb7248f48c1...,561277001,0.018712,2.0


#### Join Transactions and Articles dataframes

In [10]:
#transactions_articles_joined = transactions_subset.set_index('article_id').join(articles.set_index('article_id'))
transactions_articles_joined = transactions_subset.join(articles.set_index('article_id'), on='article_id')

In [11]:
transactions_articles_joined.head()

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
15873908,2019-08-24,8c316b51a1d960a909f5111f229fb52e502fd28342dc3b...,767687002,0.029441,2.0,767687,HONEY long bralette,306,Bra,Underwear,...,Ladies Sport Bras,S,Sport,26,Sport,5,Ladies H&M Sport,1005,Jersey Fancy,Bralette in fast-drying functional fabric with...
3694574,2018-12-14,00a50f190136f8145ca31977927cdc7d82f3d592a30ff1...,636323001,0.016932,2.0,636323,Skinny H.W Ankle Queens,272,Trousers,Garment Lower body,...,Denim Trousers,D,Divided,2,Divided,57,Ladies Denim,1016,Trousers Denim,"Ankle-length jeans in washed, stretch denim wi..."
3461525,2018-12-07,30d6d36375da58d9393edb6c0b2d825b500e95b9ffe716...,693243001,0.027102,1.0,693243,Matey,252,Sweater,Garment Upper body,...,Knitwear,A,Ladieswear,1,Ladieswear,15,Womens Everyday Collection,1003,Knitwear,"Wide, V-neck jumper in a soft, fine knit conta..."
15945033,2019-08-26,e34a808f89c302470b05591308c0bd184a43ab78319e35...,725663002,0.013542,2.0,725663,Ruffle Tanga,59,Swimwear bottom,Swimwear,...,Swimwear,B,Lingeries/Tights,1,Ladieswear,60,"Womens Swimwear, beachwear",1018,Swimwear,Fully lined bikini bottoms with a low waist an...
6553032,2019-02-23,1351b305c2de05f47586f518d4b53b68ecdcb7248f48c1...,561277001,0.018712,2.0,561277,PLUS Support 70 Den 1 p Tights,304,Underwear Tights,Socks & Tights,...,Tights basic,B,Lingeries/Tights,1,Ladieswear,62,"Womens Nightwear, Socks & Tigh",1021,Socks and Tights,"Semi-opaque tights that shape the tummy, thigh..."


#### Join Transactions-Articles with Customers dataframes

In [36]:
# Join also customer info

#trans_arts_cust_joined = transactions_articles_joined.set_index('customer_id').join(customers.set_index('customer_id'))
trans_arts_cust_joined = transactions_articles_joined.join(customers.set_index('customer_id'), on='customer_id')
trans_arts_cust_joined.head()

#Index of output df belongs to original index of transactions

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,...,section_name,garment_group_no,garment_group_name,detail_desc,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
15873908,2019-08-24,8c316b51a1d960a909f5111f229fb52e502fd28342dc3b...,767687002,0.029441,2.0,767687,HONEY long bralette,306,Bra,Underwear,...,Ladies H&M Sport,1005,Jersey Fancy,Bralette in fast-drying functional fabric with...,,,ACTIVE,NONE,38.0,af9319d0eb71ae54a0d822fc7aaa3b330552bd66cb55d7...
3694574,2018-12-14,00a50f190136f8145ca31977927cdc7d82f3d592a30ff1...,636323001,0.016932,2.0,636323,Skinny H.W Ankle Queens,272,Trousers,Garment Lower body,...,Ladies Denim,1016,Trousers Denim,"Ankle-length jeans in washed, stretch denim wi...",,,PRE-CREATE,NONE,26.0,9046f1953ca6f4fac22c775659b78d919d6c69aff2943a...
3461525,2018-12-07,30d6d36375da58d9393edb6c0b2d825b500e95b9ffe716...,693243001,0.027102,1.0,693243,Matey,252,Sweater,Garment Upper body,...,Womens Everyday Collection,1003,Knitwear,"Wide, V-neck jumper in a soft, fine knit conta...",,,ACTIVE,,27.0,cb7b28d6587cddadda62eba998b2c9dd815acdf2c3fd3e...
15945033,2019-08-26,e34a808f89c302470b05591308c0bd184a43ab78319e35...,725663002,0.013542,2.0,725663,Ruffle Tanga,59,Swimwear bottom,Swimwear,...,"Womens Swimwear, beachwear",1018,Swimwear,Fully lined bikini bottoms with a low waist an...,,,ACTIVE,NONE,26.0,ffd87a0df7a5d342e2c07e27ae558d5de4578b56e41226...
6553032,2019-02-23,1351b305c2de05f47586f518d4b53b68ecdcb7248f48c1...,561277001,0.018712,2.0,561277,PLUS Support 70 Den 1 p Tights,304,Underwear Tights,Socks & Tights,...,"Womens Nightwear, Socks & Tigh",1021,Socks and Tights,"Semi-opaque tights that shape the tummy, thigh...",,,ACTIVE,NONE,37.0,a688edcb90ff9dbe057cdb1b897390ad35bc83fa7984fd...


#### Check if join has been done correctly

In [13]:
#1. check that customer_id are repeated (some customers bought multiple items)
#Number of products purchased by each customer
grouped = trans_arts_cust_joined.groupby("customer_id")["customer_id"].count().reset_index(name='counts').sort_values(by='counts', ascending=False)
grouped

Unnamed: 0,customer_id,counts
884,0b8b3c4a5b1b39b1837fbd85c67becede7f547b2996409...,4
18822,f8b67072cffdcbb04932ebc3fb2d5f9e455dcb7d73d72a...,4
5822,4b83e32f6482fed86a0296a0f441689770f8fba760ec80...,4
10252,86088b6c51b12af9a00ef5ddfae37870f6604f8a8626a1...,3
11249,931232fbe6631e54765a023b5aeaa9c30a0fdd30a02bf7...,3
...,...,...
6553,55507c56953a4307e196b5670146844ff8548ae1524b51...,1
6552,554c09c507b39f817027814618b9879546f7b20b9e42f2...,1
6551,554b16516950f40b73e6b49f8ac2ffca177dd363f3a27b...,1
6550,554a1f1d47173de158c74168c722a4e8b6d19da9d82af6...,1


In [14]:
#2. check that article_id are repeated (different customers bought same item)
#Number of times the products were purchased by the customers
grouped = trans_arts_cust_joined.groupby("article_id")["article_id"].count().reset_index(name='counts').sort_values(by='counts', ascending=False)
grouped

Unnamed: 0,article_id,counts
7790,706016001,26
3001,610776002,21
1844,562245001,21
257,351484002,20
4858,656763001,20
...,...,...
4699,652924012,1
4700,652956001,1
4701,652956002,1
4702,652961004,1


In [15]:
#3. Check duplicated rows
trans_arts_cust_joined.duplicated().sum()

3

Duplicate rows correspond to multiple purchases of the same item by the same client. 

### 2.3 Drop columns

@To-Do: Are we working with the name or the number of the following categories?:
- product
- product_type
- product_group
- garment_group
- ...

Drop the unchosen.

In [29]:
trans_arts_cust_joined.head()

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,product_type_no,product_group_name,graphical_appearance_no,graphical_appearance_name,...,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,club_member_status,fashion_news_frequency,age,postal_code
15873908,2019-08-24,8c316b51a1d960a909f5111f229fb52e502fd28342dc3b...,767687002,0.029441,2.0,767687,306,Underwear,1010016,Solid,...,Sport,26,Sport,5,Ladies H&M Sport,1005,ACTIVE,NONE,38.0,af9319d0eb71ae54a0d822fc7aaa3b330552bd66cb55d7...
3694574,2018-12-14,00a50f190136f8145ca31977927cdc7d82f3d592a30ff1...,636323001,0.016932,2.0,636323,272,Garment Lower body,1010023,Denim,...,Divided,2,Divided,57,Ladies Denim,1016,PRE-CREATE,NONE,26.0,9046f1953ca6f4fac22c775659b78d919d6c69aff2943a...
3461525,2018-12-07,30d6d36375da58d9393edb6c0b2d825b500e95b9ffe716...,693243001,0.027102,1.0,693243,252,Garment Upper body,1010010,Melange,...,Ladieswear,1,Ladieswear,15,Womens Everyday Collection,1003,ACTIVE,ACTIVE,27.0,cb7b28d6587cddadda62eba998b2c9dd815acdf2c3fd3e...
15945033,2019-08-26,e34a808f89c302470b05591308c0bd184a43ab78319e35...,725663002,0.013542,2.0,725663,59,Swimwear,1010016,Solid,...,Lingeries/Tights,1,Ladieswear,60,"Womens Swimwear, beachwear",1018,ACTIVE,NONE,26.0,ffd87a0df7a5d342e2c07e27ae558d5de4578b56e41226...
6553032,2019-02-23,1351b305c2de05f47586f518d4b53b68ecdcb7248f48c1...,561277001,0.018712,2.0,561277,304,Socks & Tights,1010016,Solid,...,Lingeries/Tights,1,Ladieswear,62,"Womens Nightwear, Socks & Tigh",1021,ACTIVE,NONE,37.0,a688edcb90ff9dbe057cdb1b897390ad35bc83fa7984fd...


In [37]:
# Drop columns
trans_arts_cust_joined = trans_arts_cust_joined.drop("detail_desc", axis=1)   
trans_arts_cust_joined = trans_arts_cust_joined.drop("prod_name", axis=1)  
trans_arts_cust_joined = trans_arts_cust_joined.drop("product_type_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("garment_group_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("product_group_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("graphical_appearance_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("colour_group_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("perceived_colour_value_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("perceived_colour_master_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("department_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("index_name", axis=1) 
#trans_arts_cust_joined = trans_arts_cust_joined.drop("index_group_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("section_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("index_group_name", axis=1) 
trans_arts_cust_joined = trans_arts_cust_joined.drop("FN", axis=1)   

In [38]:
trans_arts_cust_joined.head()

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id,product_code,product_type_no,graphical_appearance_no,colour_group_code,perceived_colour_value_id,...,department_no,index_code,index_group_no,section_no,garment_group_no,Active,club_member_status,fashion_news_frequency,age,postal_code
15873908,2019-08-24,8c316b51a1d960a909f5111f229fb52e502fd28342dc3b...,767687002,0.029441,2.0,767687,306,1010016,51,1,...,8316,S,26,5,1005,,ACTIVE,NONE,38.0,af9319d0eb71ae54a0d822fc7aaa3b330552bd66cb55d7...
3694574,2018-12-14,00a50f190136f8145ca31977927cdc7d82f3d592a30ff1...,636323001,0.016932,2.0,636323,272,1010023,9,4,...,1772,D,2,57,1016,,PRE-CREATE,NONE,26.0,9046f1953ca6f4fac22c775659b78d919d6c69aff2943a...
3461525,2018-12-07,30d6d36375da58d9393edb6c0b2d825b500e95b9ffe716...,693243001,0.027102,1.0,693243,252,1010010,72,7,...,1626,A,1,15,1003,,ACTIVE,,27.0,cb7b28d6587cddadda62eba998b2c9dd815acdf2c3fd3e...
15945033,2019-08-26,e34a808f89c302470b05591308c0bd184a43ab78319e35...,725663002,0.013542,2.0,725663,59,1010016,53,7,...,4242,B,1,60,1018,,ACTIVE,NONE,26.0,ffd87a0df7a5d342e2c07e27ae558d5de4578b56e41226...
6553032,2019-02-23,1351b305c2de05f47586f518d4b53b68ecdcb7248f48c1...,561277001,0.018712,2.0,561277,304,1010016,9,4,...,3608,B,1,62,1021,,ACTIVE,NONE,37.0,a688edcb90ff9dbe057cdb1b897390ad35bc83fa7984fd...


### 2.4 Manage Null values

In [39]:
trans_arts_cust_joined.isnull().sum()

t_dat                             0
customer_id                       0
article_id                        0
price                             0
sales_channel_id                  0
product_code                      0
product_type_no                   0
graphical_appearance_no           0
colour_group_code                 0
perceived_colour_value_id         0
perceived_colour_master_id        0
department_no                     0
index_code                        0
index_group_no                    0
section_no                        0
garment_group_no                  0
Active                        11671
club_member_status               33
fashion_news_frequency           79
age                              85
postal_code                       0
dtype: int64

In [40]:
#Replace Age with the mean
mean_age = trans_arts_cust_joined['age'].median()
trans_arts_cust_joined['age'].fillna(mean_age,inplace=True)

In [41]:
#Remove Active Column
trans_arts_cust_joined = trans_arts_cust_joined.drop("Active", axis=1)   

In [42]:
#Rplace club_member_status and fashion_news_frequency with the most common value
trans_arts_cust_joined = trans_arts_cust_joined.fillna(trans_arts_cust_joined['club_member_status'].value_counts().index[0])
trans_arts_cust_joined = trans_arts_cust_joined.fillna(trans_arts_cust_joined['fashion_news_frequency'].value_counts().index[0])

In [43]:
trans_arts_cust_joined.isnull().sum()

t_dat                         0
customer_id                   0
article_id                    0
price                         0
sales_channel_id              0
product_code                  0
product_type_no               0
graphical_appearance_no       0
colour_group_code             0
perceived_colour_value_id     0
perceived_colour_master_id    0
department_no                 0
index_code                    0
index_group_no                0
section_no                    0
garment_group_no              0
club_member_status            0
fashion_news_frequency        0
age                           0
postal_code                   0
dtype: int64

### 2.4 Manage Categorical Columns

#### Explore levels of the categorical variables.

In [44]:
trans_arts_cust_joined['sales_channel_id'].value_counts()

2.0    13971
1.0     6029
Name: sales_channel_id, dtype: int64

In [45]:
trans_arts_cust_joined['product_code'].value_counts()

562245    105
610776     89
706016     78
573716     61
599580     59
         ... 
555487      1
765974      1
739845      1
614429      1
651291      1
Name: product_code, Length: 7101, dtype: int64

In [46]:
trans_arts_cust_joined['product_type_no'].value_counts()

272    2698
265    1861
252    1745
255    1434
253    1004
       ... 
49        1
156       1
465       1
503       1
84        1
Name: product_type_no, Length: 93, dtype: int64

In [47]:
trans_arts_cust_joined['graphical_appearance_no'].value_counts()

1010016    10611
1010001     2881
1010023     1189
1010017     1148
1010010     1144
1010021      533
1010014      351
1010004      292
1010026      288
1010008      241
1010005      216
1010007      165
1010006      147
1010009      127
1010002      112
1010020      107
1010022      104
1010018       78
1010012       50
1010011       42
1010015       36
1010024       29
1010013       28
1010027       25
1010025       17
-1            16
1010019       12
1010028       11
Name: graphical_appearance_no, dtype: int64

In [48]:
trans_arts_cust_joined['colour_group_code'].value_counts()

9     6860
10    2237
73    1707
72     730
12     717
71     615
42     568
7      530
43     528
51     525
11     515
8      487
13     434
19     431
93     367
22     302
6      279
52     277
31     237
17     202
33     182
5      138
53     102
21      94
92      92
32      91
14      87
91      84
3       81
23      80
83      64
82      42
63      42
81      38
50      38
41      27
1       26
61      25
20      19
40      18
15      16
30      14
62      13
-1       9
70       8
2        7
60       6
4        5
80       2
90       2
Name: colour_group_code, dtype: int64

In [49]:
trans_arts_cust_joined['perceived_colour_value_id'].value_counts()

4     10004
1      3182
3      2964
2      2207
7       831
5       777
6        26
-1        9
Name: perceived_colour_value_id, dtype: int64

In [50]:
trans_arts_cust_joined['perceived_colour_master_id'].value_counts()

5     6770
2     3063
9     2820
12    1306
18    1130
11    1037
4      936
20     511
8      475
19     471
3      435
13     310
15     224
1      137
-1     129
7      123
6       96
14      26
16       1
Name: perceived_colour_master_id, dtype: int64

In [51]:
trans_arts_cust_joined['index_code'].value_counts()

A    8018
D    4546
B    3479
F    1088
C    1076
S     715
I     467
H     317
G     201
J      93
Name: index_code, dtype: int64

In [52]:
trans_arts_cust_joined['index_group_no'].value_counts()

1     12573
2      4546
3      1088
4      1078
26      715
Name: index_group_no, dtype: int64

In [53]:
trans_arts_cust_joined['section_no'].value_counts()

15    3611
53    2440
60    1716
61    1235
11    1227
16     958
6      820
57     688
51     684
5      680
62     600
18     395
66     392
26     392
64     377
2      313
65     307
50     298
58     243
52     238
19     212
20     192
77     187
8      153
79     132
47     131
21     124
23     115
76     115
44     110
14     110
46     107
55      96
56      79
45      68
72      58
40      49
25      48
42      37
41      37
43      36
97      35
82      30
22      26
80      23
31      20
70      14
48      11
27      10
49       9
24       4
30       4
28       3
29       1
Name: section_no, dtype: int64

In [54]:
trans_arts_cust_joined['garment_group_no'].value_counts()

1005    3405
1009    1852
1002    1834
1018    1787
1017    1670
1010    1634
1003    1461
1013    1334
1019    1045
1016     849
1025     524
1020     467
1021     448
1007     444
1012     342
1001     266
1008     229
1011     153
1023     149
1006      56
1014      51
Name: garment_group_no, dtype: int64

In [55]:
trans_arts_cust_joined['club_member_status'].value_counts()

ACTIVE        19478
PRE-CREATE      512
LEFT CLUB        10
Name: club_member_status, dtype: int64

In [56]:
trans_arts_cust_joined['fashion_news_frequency'].value_counts()

NONE         11430
Regularly     8479
ACTIVE          79
Monthly         12
Name: fashion_news_frequency, dtype: int64

In [58]:
trans_arts_cust_joined['postal_code'].value_counts()

2c29ae653a9282cce4151bd87643c907644e09541abc28ae87dea0d1f6603b1c    363
89f8890eeb72c49dcde3d8f842d1e31f54cd7765a32587a331d0c4dbf3148f6a      6
7c1fa3b0ec1d37ce2c3f34f63bd792f3b4494f324b6be5d1e4ba6a75456b96a7      5
d9272c82dc4723b0ba6d06566aa210a277400c45bd0b91b5479063b7a33bd7ee      5
c612f57c92e7b28687cd99e7bb7b2a2760f92c111d01f5eb72cdb0403650bf07      5
                                                                   ... 
5a5b79c16e6065ca2d7c8733115cc56b6afce890e6353ddbc4636e53fbc486bf      1
11b09c2cf3836c7a80f2ad2d6be4ae854a790a521e52a6c5d63d13d75762a683      1
620be22e76fc97f102f28df4ceea5a88ac1bebb70266e4918b9a540ed26e93da      1
79100f1a5c30fee51281e5b3577bbeb07f0f19a71d508fffaa555c28751987e4      1
fe7c334d69509c77d8a34088fbe99dbd4afb0143f0cf50c19d17321f2fef3275      1
Name: postal_code, Length: 18115, dtype: int64

### 2.5 Machine Learning

## 1. Image processing

Future, if there is time.

Example: https://www.kaggle.com/code/gulgaishatemerbekova/clothes-recommendation-system-using-densenet121