# Customer Segmentation with RFM in 6 Steps

### 1. Business Problem
### 2. Data Understanding
### 3. Data Preparation
### 4. Calculating RFM Metrics
### 5. Calculating RFM Scores
### 6. Naming & Analysing RFM Segments

### 1. Business Problem
An e-commerce company wants to segment its customers and tailor its marketing strategy to each segment. To this end, we will describe customer behavior with the Recency, Frequency, and Monetary (RFM) metrics and group customers who behave similarly.

### 2. Data Understanding

In [1]:
import datetime as dt
import pandas as pd
pd.set_option('display.max_columns', None)
df_=pd.read_excel(r'E:\3. HAFTA\online_retail_II.xlsx')
df = df_.copy()
df.head()


,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.1,13085.0,United Kingdom
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom


In [2]:
df.groupby("Description").agg({"Quantity": "sum"}).sort_values("Quantity", ascending=False).head()

Description,Quantity
WHITE HANGING HEART T-LIGHT HOLDER,57733
WORLD WAR 2 GLIDERS ASSTD DESIGNS,54698
BROCADE RING PURSE,47647
PACK OF 72 RETRO SPOT CAKE CASES,46106
ASSORTED COLOUR BIRD ORNAMENT,44925


In [4]:
# How much money was earned per invoice on average?
# (this requires a new variable, created by multiplying two existing ones)
# Rebuild df without returns (invoices whose number contains "C")
df = df[~df["Invoice"].str.contains("C", na=False)].copy()

# Revenue per line item
df["TotalPrice"] = df["Quantity"] * df["Price"]
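
The return filter can be tried on a small made-up frame before running it on the full dataset. The sketch below is only illustrative: the `toy` frame and its values are assumptions, not rows from the file, and it uses `astype(str)` with `startswith` as one possible variant of the `contains` filter used in the notebook.

```python
import pandas as pd

# Hypothetical invoices: numeric IDs are sales, IDs prefixed with "C" are returns.
toy = pd.DataFrame({
    "Invoice": [489434, 489435, "C489449", "C489450"],
    "Quantity": [12, 48, -12, -6],
    "Price": [6.95, 2.10, 2.95, 1.25],
})

# Cast to str first so the mask is defined for numeric IDs as well.
is_return = toy["Invoice"].astype(str).str.startswith("C")

# .copy() gives an independent frame, so adding a column raises no warning.
sales = toy.loc[~is_return].copy()
sales["TotalPrice"] = sales["Quantity"] * sales["Price"]
print(sales)
```

Only the two regular invoices remain, each with its line revenue in `TotalPrice`.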


### 3. Data Preparation

In [5]:
df.isnull().sum()

Invoice             0
StockCode           0
Description      2928
Quantity            0
InvoiceDate         0
Price               0
Customer ID    107560
Country             0
TotalPrice          0
dtype: int64

In [6]:
df.dropna(inplace=True)

df.describe([0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99]).T


,count,mean,std,min,1%,5%,10%,25%,50%,75%,90%,95%,99%,max
Quantity,407695.0,13.586686,96.842229,1.0,1.0,1.0,1.0,2.0,5.0,12.0,24.0,36.0,144.0,19152.0
Price,407695.0,3.294188,34.756655,0.0,0.29,0.42,0.65,1.25,1.95,3.75,6.75,8.5,14.95,10953.5
Customer ID,407695.0,15368.504107,1679.7957,12346.0,12435.0,12731.0,13044.0,13997.0,15321.0,16812.0,17706.0,17913.0,18196.0,18287.0
TotalPrice,407695.0,21.663261,77.147356,0.0,0.65,1.25,2.1,4.95,11.9,19.5,35.7,67.5,201.6,15818.4
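
`dropna()` without arguments removes every row that has any missing value. Since RFM only needs to know who the customer is, a narrower alternative is to drop only the rows without a `Customer ID`. The sketch below compares the two on a made-up frame (the `toy` data is an assumption); it is a possible variant, not what the notebook does.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one lacks a description, one lacks a customer.
toy = pd.DataFrame({
    "Description": ["PINK CHERRY LIGHTS", None, "WHITE CHERRY LIGHTS"],
    "Customer ID": [13085.0, 13085.0, np.nan],
})

print(toy.isnull().sum())                             # missing values per column

drop_any = toy.dropna()                               # drops both incomplete rows
drop_no_customer = toy.dropna(subset=["Customer ID"]) # drops only the anonymous row
print(len(drop_any), len(drop_no_customer))
```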


### 4. Calculating RFM Metrics

In [8]:
# Recency, Frequency, Monetary

# Recency: how recently did the customer last purchase?
# Computed as analysis date - last purchase date.

df["InvoiceDate"].max()

# The analysis date is set 2 days after the last purchase date;
# otherwise customers who purchased on the last day would get a recency of 0.
today_date = dt.datetime(2010, 12, 11)

# Build the RFM table: one row per customer
rfm = df.groupby('Customer ID').agg({'InvoiceDate': lambda date: (today_date - date.max()).days,
                                     # len() counts invoice lines (rows), not distinct invoices
                                     'Invoice': lambda num: len(num),
                                     'TotalPrice': lambda TotalPrice: TotalPrice.sum()})

# Column names
rfm.columns = ['Recency', 'Frequency', 'Monetary']

# Inspect customers that fail the check, e.g. those with a frequency but zero monetary value.
# Each comparison needs its own parentheses because & binds more tightly than >.
rfm[~((rfm["Monetary"] > 0) & (rfm["Frequency"] > 0))]
# Keep only customers for whom both values are greater than zero.
rfm = rfm[(rfm["Monetary"] > 0) & (rfm["Frequency"] > 0)].copy()


In [9]:
rfm

Customer ID,Recency,Frequency,Monetary
12346.0,165,33,372.86
12347.0,3,71,1323.32
12348.0,74,20,222.16
12349.0,43,102,2671.14
12351.0,11,21,300.93
...,...,...,...
18283.0,18,230,641.77
18284.0,67,28,461.68
18285.0,296,12,427.00
18286.0,112,67,1296.43
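
To see what each metric measures, the same aggregation can be run on a handful of made-up rows. In the sketch below, the `toy` frame, the `analysis_date` name, and the two frequency columns are assumptions for illustration. It also contrasts counting rows, which is what `len(num)` does in the cell above, with counting distinct invoices via `nunique`.

```python
import datetime as dt
import pandas as pd

# Hypothetical line items: customer 1 has two invoices (three rows), customer 2 has one.
toy = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 2],
    "Invoice": [100, 100, 101, 200, 200],
    "InvoiceDate": pd.to_datetime(
        ["2010-12-01", "2010-12-01", "2010-12-08", "2010-11-20", "2010-11-20"]),
    "TotalPrice": [10.0, 5.0, 20.0, 7.5, 2.5],
})
analysis_date = dt.datetime(2010, 12, 11)

toy_rfm = toy.groupby("Customer ID").agg(
    # Days between the analysis date and the customer's latest purchase
    Recency=("InvoiceDate", lambda d: (analysis_date - d.max()).days),
    # Number of invoice lines (rows)
    FrequencyRows=("Invoice", "count"),
    # Number of distinct invoices
    FrequencyInvoices=("Invoice", "nunique"),
    # Total spend
    Monetary=("TotalPrice", "sum"),
)
print(toy_rfm)
```

Customer 1 gets a recency of 3 days, 3 rows but only 2 distinct invoices, and a monetary value of 35.0. Which frequency definition fits better depends on whether a large basket should count as more activity than a small one.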


### 5. Calculating RFM Scores

In [10]:
rfm["RecencyScore"] = pd.qcut(rfm['Recency'], 5, labels=[5, 4, 3, 2, 1])
rfm["FrequencyScore"] = pd.qcut(rfm['Frequency'], 5, labels=[1, 2, 3, 4, 5])
rfm["MonetaryScore"] = pd.qcut(rfm['Monetary'], 5, labels=[1, 2, 3, 4, 5])


rfm["RFM_SCORE"] = (rfm['RecencyScore'].astype(str) +
                    rfm['FrequencyScore'].astype(str) +
                    rfm['MonetaryScore'].astype(str))

# Check: customers with the top score on all three metrics
rfm[rfm["RFM_SCORE"] == "555"].head()



Customer ID,Recency,Frequency,Monetary,RecencyScore,FrequencyScore,MonetaryScore,RFM_SCORE
12415.0,11,212,19543.84,5,5,5,555
12431.0,9,170,4370.52,5,5,5,555
12433.0,2,286,7205.39,5,5,5,555
12471.0,10,678,20139.74,5,5,5,555
12472.0,5,572,11308.48,5,5,5,555
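
`pd.qcut` splits a column into equal-sized quantile bins, and the label order decides which end gets the high score. The sketch below uses made-up values (the `toy` frame is an assumption) to show the reversed labels for recency. It also shows a common workaround for tied values, ranking with `method="first"` before cutting; that step is not part of the notebook, and is only needed when ties make the bin edges collide.

```python
import pandas as pd

toy = pd.DataFrame({
    "Recency":   [3, 11, 43, 74, 165, 296, 18, 67, 112, 9],
    "Frequency": [1, 1, 1, 1, 1, 1, 2, 2, 5, 9],
})

# A small recency is good, so the labels run from 5 down to 1.
toy["RecencyScore"] = pd.qcut(toy["Recency"], 5, labels=[5, 4, 3, 2, 1])

# Frequency is heavily tied here; ranking first gives qcut distinct values to cut on.
toy["FrequencyScore"] = pd.qcut(
    toy["Frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
)

# Concatenate the two scores into a two-character code such as "51".
toy["RF"] = toy["RecencyScore"].astype(str) + toy["FrequencyScore"].astype(str)
print(toy)
```

Note that ranking with `method="first"` breaks ties by row order, so customers with the same frequency can end up with different scores.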


### 6. Naming & Analysing RFM Segments

In [11]:
seg_map = {
    r'[1-2][1-2]': 'Hibernating',
    r'[1-2][3-4]': 'At_Risk',
    r'[1-2]5': 'Cant_Loose',
    r'3[1-2]': 'About_to_Sleep',
    r'33': 'Need_Attention',
    r'[3-4][4-5]': 'Loyal_Customers',
    r'41': 'Promising',
    r'51': 'New_Customers',
    r'[4-5][2-3]': 'Potential_Loyalists',
    r'5[4-5]': 'Champions'
}

rfm

Customer ID,Recency,Frequency,Monetary,RecencyScore,FrequencyScore,MonetaryScore,RFM_SCORE
12346.0,165,33,372.86,2,3,2,232
12347.0,3,71,1323.32,5,4,4,544
12348.0,74,20,222.16,2,2,1,221
12349.0,43,102,2671.14,3,4,5,345
12351.0,11,21,300.93,5,2,2,522
...,...,...,...,...,...,...,...
18283.0,18,230,641.77,4,5,3,453
18284.0,67,28,461.68,3,2,2,322
18285.0,296,12,427.00,1,1,2,112
18286.0,112,67,1296.43,2,4,4,244


In [12]:
rfm['Segment'] = rfm['RecencyScore'].astype(str) + rfm['FrequencyScore'].astype(str)
rfm['Segment']

Customer ID
12346.0    23
12347.0    54
12348.0    22
12349.0    34
12351.0    52
           ..
18283.0    45
18284.0    32
18285.0    11
18286.0    24
18287.0    44
Name: Segment, Length: 4312, dtype: object

In [13]:
rfm['Segment'] = rfm['Segment'].replace(seg_map, regex=True)
rfm['Segment']

Customer ID
12346.0                At_Risk
12347.0              Champions
12348.0            Hibernating
12349.0        Loyal_Customers
12351.0    Potential_Loyalists
                  ...         
18283.0        Loyal_Customers
18284.0         About_to_Sleep
18285.0            Hibernating
18286.0                At_Risk
18287.0        Loyal_Customers
Name: Segment, Length: 4312, dtype: object
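
The mapping works because each key in `seg_map` is a regular expression over the two-character code (recency score followed by frequency score). As a small self-contained check, the sketch below applies the same map to a few codes taken from the output above and verifies that every one of the 25 possible codes matches exactly one pattern. The helper names `all_codes` and `matches` are made up for this illustration.

```python
import re
import pandas as pd

seg_map = {
    r'[1-2][1-2]': 'Hibernating',
    r'[1-2][3-4]': 'At_Risk',
    r'[1-2]5': 'Cant_Loose',
    r'3[1-2]': 'About_to_Sleep',
    r'33': 'Need_Attention',
    r'[3-4][4-5]': 'Loyal_Customers',
    r'41': 'Promising',
    r'51': 'New_Customers',
    r'[4-5][2-3]': 'Potential_Loyalists',
    r'5[4-5]': 'Champions',
}

# A few recency+frequency codes and their segment names.
codes = pd.Series(["23", "54", "22", "34", "52", "11"])
segments = codes.replace(seg_map, regex=True)
print(segments.tolist())

# Every code from "11" to "55" should match exactly one pattern.
all_codes = [f"{r}{f}" for r in range(1, 6) for f in range(1, 6)]
matches = {
    code: [name for pattern, name in seg_map.items() if re.fullmatch(pattern, code)]
    for code in all_codes
}
print(all(len(names) == 1 for names in matches.values()))
```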

In [14]:
rfm[["Segment", "Recency", "Frequency", "Monetary"]].groupby("Segment").agg(["mean", "count"])

Segment,Recency mean,Recency count,Frequency mean,Frequency count,Monetary mean,Monetary count
About_to_Sleep,53.543605,344,16.107558,344,447.839826,344
At_Risk,165.055459,577,59.564991,577,1180.62517,577
Cant_Loose,128.868687,99,220.646465,99,3002.42698,99
Champions,7.039557,632,273.35443,632,6964.077188,632
Hibernating,206.061344,1027,14.516066,1027,461.186768,1027
Loyal_Customers,37.401035,773,168.276843,773,2662.046864,773
Need_Attention,53.680723,166,45.271084,166,935.626627,166
New_Customers,7.757576,66,7.818182,66,482.087121,66
Potential_Loyalists,18.439922,516,37.203488,516,1024.27688,516
Promising,24.991071,112,8.616071,112,456.508214,112
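
Because `agg(["mean", "count"])` repeats the same count under every metric, a more compact summary can use named aggregation and add each segment's share of total revenue. The sketch below runs on made-up customers; the `toy` frame, its values, and the output column names are assumptions, not results from the dataset.

```python
import pandas as pd

# Hypothetical customers that already carry a segment label.
toy = pd.DataFrame({
    "Segment": ["Champions", "Champions", "Hibernating", "At_Risk"],
    "Recency": [5, 9, 200, 150],
    "Frequency": [250, 300, 10, 60],
    "Monetary": [7000.0, 5000.0, 400.0, 1600.0],
})

summary = toy.groupby("Segment").agg(
    customers=("Monetary", "count"),
    mean_recency=("Recency", "mean"),
    mean_frequency=("Frequency", "mean"),
    total_monetary=("Monetary", "sum"),
)

# Share of overall revenue contributed by each segment
summary["revenue_share"] = summary["total_monetary"] / summary["total_monetary"].sum()
print(summary.sort_values("revenue_share", ascending=False))
```

Sorting by `revenue_share` makes it easier to see which segments deserve the most marketing attention.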
