# Customer Segmentation - Pycaret
This notebook briefly shows how to explore a dataset and apply data science techniques for customer segmentation.
The dataset provides three main features that can feed your strategy: recency, frequency, and monetary value (RFM).

Some reference links:

- [Wikipedia](https://en.wikipedia.org/wiki/RFM_(market_research))
- [RFM Segmentation](https://www.optimove.com/resources/learning-center/rfm-segmentation)
- [Recency, Frequency, Monetary Value (RFM) Definition](https://www.investopedia.com/terms/r/rfm-recency-frequency-monetary-value.asp)
- [What is RFM (recency, frequency, monetary) analysis?](https://www.techtarget.com/searchdatamanagement/definition/RFM-analysis)
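
As a rough illustration of where the three features come from, the sketch below derives them from a raw order log with pandas. The `orders` frame, its `order_date` and `amount` columns, and all values are made up for this example. How the Kaggle dataset used in this notebook was actually built is not stated, so only the output column names (`last_order`, `qtt_order`, `total_spent`) mirror it.

```python
import pandas as pd

# Hypothetical raw order log; column names and values are assumptions for illustration.
orders = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 3, 3, 3],
        "order_date": pd.to_datetime(
            ["2022-01-05", "2022-03-01", "2022-02-10", "2022-01-20", "2022-02-15", "2022-03-10"]
        ),
        "amount": [20.0, 35.0, 15.0, 50.0, 10.0, 40.0],
    }
)

# One row per customer with the three RFM inputs.
rfm = orders.groupby("customer_id").agg(
    last_order=("order_date", "max"),   # recency: date of the most recent order
    qtt_order=("order_date", "count"),  # frequency: number of orders
    total_spent=("amount", "sum"),      # monetary: total amount spent
)
print(rfm)
```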

In [None]:
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import datetime
import itertools
import plotly.express as px
import os

# Dataset initial steps...
dataset_10k = pd.read_csv(
    "/kaggle/input/customer-segmentation-rfm/customer_segmentation_10k.csv"
)
dataset_10k.last_order = pd.to_datetime(dataset_10k.last_order)
min_date = dataset_10k.last_order.min()
dataset_10k.last_order = dataset_10k.last_order.apply(lambda x: (x - min_date).days)
dataset_10k
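
The cell above replaces each `last_order` date with the number of days elapsed since the earliest date in the column. A minimal sketch of that transformation on made-up dates, using the vectorized `.dt.days` accessor instead of `apply` (the `toy` frame and the `days_since_min` column are hypothetical names):

```python
import pandas as pd

# Made-up dates for illustration only.
toy = pd.DataFrame({"last_order": ["2022-01-01", "2022-01-11", "2022-03-02"]})
toy["last_order"] = pd.to_datetime(toy["last_order"])

# Days elapsed since the earliest date in the column.
min_date = toy["last_order"].min()
toy["days_since_min"] = (toy["last_order"] - min_date).dt.days
print(toy)
```

With this encoding the oldest order gets 0 and more recent orders get larger values.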

For these experiments I am going to use [PyCaret clustering](https://pycaret.gitbook.io/docs/).

In [None]:
pip install --pre pycaret==2.3.6

# Frequency
Set up PyCaret for the frequency field (`qtt_order`), train a k-means model, and label each customer with a score.

In [None]:
from pycaret.clustering import *

exp_fre = setup(
    dataset_10k[["customer_id", "qtt_order"]].sort_values('qtt_order'),
    normalize=True,
    ignore_features=["customer_id"],
    session_id=123,
    silent=True,
    verbose=0
)

kmeans_frequency = create_model("kmeans")
result_frequency = assign_model(kmeans_frequency)
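# Rank the clusters by their minimum qtt_order: score 1 is the lowest cluster.
aux = {}
for i, cluster in enumerate(result_frequency.groupby("Cluster").qtt_order.describe().sort_values("min").index):
    aux[cluster] = i + 1
result_frequency["score"] = result_frequency.Cluster.map(aux)
# Summary statistics per score, ordered from the lowest to the highest cluster.
result_frequency.groupby("score").qtt_order.describe().sort_values("min")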
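
The cluster labels produced by k-means are arbitrary, which is why the cell above ranks the clusters by their minimum value before turning them into scores. The sketch below reproduces that ranking on a toy frame with pandas only. The data and the `Cluster 0`-style labels are made up to stand in for the output of `assign_model`.

```python
import pandas as pd

# Toy stand-in for the labelled data; values and cluster labels are made up.
toy = pd.DataFrame(
    {
        "qtt_order": [1, 2, 10, 12, 30, 35, 80, 90],
        "Cluster": [
            "Cluster 2", "Cluster 2", "Cluster 0", "Cluster 0",
            "Cluster 3", "Cluster 3", "Cluster 1", "Cluster 1",
        ],
    }
)

# Order the clusters by their smallest value, then number them from 1.
cluster_min = toy.groupby("Cluster")["qtt_order"].min().sort_values()
score_of = {cluster: rank for rank, cluster in enumerate(cluster_min.index, start=1)}
toy["score"] = toy["Cluster"].map(score_of)
print(toy)
```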

In [None]:
plot_model(kmeans_frequency, plot="elbow");
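
An elbow plot typically shows how the within-cluster sum of squared distances (inertia) falls as the number of clusters grows. The "elbow" is the point where adding clusters stops paying off. The sketch below computes that curve for made-up 1-D data with a basic k-means written in NumPy. It is an illustration of the idea, not PyCaret's implementation, and `kmeans_inertia_1d` is a hypothetical helper.

```python
import numpy as np


def kmeans_inertia_1d(values, k, n_iter=50):
    """Run a basic Lloyd's k-means on 1-D data and return the final inertia."""
    x = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centers over the quantiles of the data.
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array(
            [x[labels == j].mean() if np.any(labels == j) else centers[j] for j in range(k)]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    return float(((x - centers[labels]) ** 2).sum())


rng = np.random.default_rng(123)
# Made-up data with three well-separated groups.
data = np.concatenate([rng.normal(loc, 0.5, 100) for loc in (0.0, 10.0, 20.0)])

inertias = {k: kmeans_inertia_1d(data, k) for k in range(1, 7)}
for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

On data like this the inertia drops sharply up to three clusters and barely changes afterwards, which is the bend the plot makes visible.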

# Recency
Set up PyCaret for the recency field (`last_order`), train a k-means model, and label each customer with a score.

In [None]:


exp_rec = setup(
    dataset_10k[["customer_id", "last_order"]].sort_values('last_order'),
    normalize=True,
    ignore_features=["customer_id"],
    session_id=123,
    silent=True,
    verbose=0
)

kmeans_recency = create_model("kmeans")
result_recency = assign_model(kmeans_recency)
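# Rank the clusters by their minimum last_order: score 1 is the lowest cluster.
aux = {}
for i, cluster in enumerate(result_recency.groupby("Cluster").last_order.describe().sort_values("min").index):
    aux[cluster] = i + 1
result_recency["score"] = result_recency.Cluster.map(aux)
# Summary statistics per score, ordered from the lowest to the highest cluster.
result_recency.groupby("score").last_order.describe().sort_values("min")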

In [None]:
plot_model(kmeans_recency, plot="elbow");

# Monetary
Set up PyCaret for the monetary field (`total_spent`), train a k-means model, and label each customer with a score.

In [None]:
exp_mon = setup(
    dataset_10k[["customer_id", "total_spent"]].sort_values('total_spent'),
    normalize=True,
    ignore_features=["customer_id"],
    session_id=123,
    silent=True,
    verbose=0
)

kmeans_monetary = create_model("kmeans")
result_monetary = assign_model(kmeans_monetary)
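# Rank the clusters by their minimum total_spent: score 1 is the lowest cluster.
aux = {}
for i, cluster in enumerate(result_monetary.groupby("Cluster").total_spent.describe().sort_values("min").index):
    aux[cluster] = i + 1
result_monetary["score"] = result_monetary.Cluster.map(aux)
# Summary statistics per score, ordered from the lowest to the highest cluster.
result_monetary.groupby("score").total_spent.describe().sort_values("min")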

In [None]:
plot_model(kmeans_monetary, plot="elbow");

# RFM Table

In [None]:
dataset_10k["score_monetary"] = result_monetary.score
dataset_10k["score_frequency"] = result_frequency.score
dataset_10k["score_recency"] = result_recency.score
dataset_10k["score_total"] = result_monetary.score + result_frequency.score + result_recency.score
dataset_10k
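
The three result frames were built from sorted copies of the data, so their rows are in a different order from `dataset_10k`. The assignments above still line up because pandas aligns on index labels, not on row position. This assumes the result frames keep the original index labels, which the notebook relies on. A small sketch with made-up data (`customers` and `scored` are hypothetical names):

```python
import pandas as pd

customers = pd.DataFrame(
    {"customer_id": [10, 11, 12], "total_spent": [300.0, 20.0, 150.0]}
)

# Sorting changes the row order but keeps the original index labels (1, 2, 0).
scored = customers.sort_values("total_spent").copy()
scored["score"] = [1, 2, 3]  # made-up scores in ascending order of spend

# Assignment aligns on index labels, so each customer receives its own score.
customers["score_monetary"] = scored["score"]
print(customers)
```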

In [None]:
vc_df = (
    dataset_10k[["score_total", "score_monetary", "score_frequency", "score_recency"]]
    .value_counts()
    # Name the count column explicitly; its default name differs across pandas versions.
    .reset_index(name="Counts")
)
vc_df[:10]


#### Understanding the scores (clusters)
Each score ranges from 1 to 4: 1 marks the cluster with the smallest values of a field and 4 the cluster with the largest. Because `last_order` was converted to the number of days since the earliest order date in the dataset, a larger recency value means a more recent order.

| Score     | 1                                 | 2~3 | 4                                 |
|-----------|-----------------------------------|-----|-----------------------------------|
| Monetary  | Spent less money on total orders. | ~   | Spent more money on total orders. |
| Recency   | Ordered a long time ago.          | ~   | Ordered recently.                 |
| Frequency | Ordered a few times.              | ~   | Ordered many times.               |



<br/><br/><br/>





#### Applying the logic to a few examples


Here are just a few of the profiles that can be identified with the RFM scores. The full categorization depends on the market strategy you adopt. Recency follows the encoding used in this notebook (`last_order` as days since the earliest order date), so a low recency score means the last order was long ago.

| Monetary | Recency | Frequency | Score | Customer Profile                                                                      |
|----------|---------|-----------|-------|---------------------------------------------------------------------------------------|
| 4        | 2       | 2~4       | 8~10  | A customer who ordered many times in the past and has not ordered anything recently. |
| 1        | 4       | 1         | 6     | A new customer who ordered a few products recently.                                  |
| 4        | 1       | 4         | 9     | A customer who spends much...                                                         |
| 3        | 2       | 2         | 7     | Customer that...                                                                      |
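
A profile such as the first row of the table can be turned into a filter on the score columns. The sketch below does this on made-up scores. The `scores` frame, the `profile` column, and the label text are hypothetical, and a real rule set would follow your own market strategy.

```python
import pandas as pd

# Made-up score combinations for illustration only.
scores = pd.DataFrame(
    {
        "score_monetary": [4, 4, 1, 3],
        "score_recency": [2, 2, 1, 2],
        "score_frequency": [2, 4, 1, 2],
    }
)
scores["score_total"] = scores[
    ["score_monetary", "score_recency", "score_frequency"]
].sum(axis=1)

# First row of the table: monetary 4, recency 2, frequency between 2 and 4.
mask = (
    (scores["score_monetary"] == 4)
    & (scores["score_recency"] == 2)
    & scores["score_frequency"].between(2, 4)
)
scores["profile"] = "unlabelled"
scores.loc[mask, "profile"] = "ordered many times in the past, nothing recently"
print(scores)
```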