<div style="text-align:center"><img src="https://hivemarketingcloud.com/media/zphnp5zi/rfm-analysis-blog-graphic-01.png?center=0.55126050420168071,0.58738261801222658&mode=crop&width=730&height=467&rnd=133039200171670000" /></div>


# **Customer Segmentation using RFM Analysis**


RFM analysis is a customer segmentation technique used to understand customer purchasing behavior. The acronym stands for Recency, Frequency, and Monetary. Each of the three factors is measured separately from a customer's purchase history, and the results are then combined to group customers into segments with similar purchasing behavior.

* Recency represents the number of days since the customer's last purchase. Customers who have made purchases more recently are generally considered more valuable. 
* Frequency represents the number of purchases a customer has made within a given time frame. Customers who purchase more frequently are considered more valuable. 
* Monetary represents the total amount of money a customer has spent within a given time frame. Customers who have spent more money are considered more valuable.
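To make the three definitions concrete, here is a small sketch that computes R, F and M for a single customer. The invoice numbers, dates, amounts and the reference date are invented for illustration; the reference date plays the role of "today" from which recency is counted.

```python
from datetime import datetime

# Hypothetical purchase history of one customer: (invoice number, invoice date, amount in GBP)
purchases = [
    ("1001", datetime(2010, 11, 2), 120.0),
    ("1002", datetime(2010, 11, 20), 80.5),
    ("1003", datetime(2010, 12, 5), 49.5),
]

# Hypothetical analysis date ("today") from which recency is measured
reference_date = datetime(2010, 12, 11)

# Recency: whole days between the reference date and the most recent purchase
recency = (reference_date - max(date for _, date, _ in purchases)).days

# Frequency: number of distinct invoices in the time frame
frequency = len({invoice for invoice, _, _ in purchases})

# Monetary: total amount spent in the time frame
monetary = sum(amount for _, _, amount in purchases)

print(recency, frequency, monetary)  # 6 3 250.0
```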

RFM analysis segments customers using their purchases within a chosen time frame, so that the purchasing behavior of each segment can be studied separately. This helps businesses manage their customer base more effectively and improve customer satisfaction by tailoring their actions to each segment.

**Data Set Information**

An e-commerce company wants to segment its customers and determine marketing strategies based on these segments.

The Online Retail II dataset contains the sales of a UK-based online retail store between 01/12/2009 and 09/12/2011.
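The notebook reads only the `Year 2009-2010` sheet of the Excel file, so covering the whole period means reading the remaining sheet(s) as well. A minimal sketch, assuming every sheet has the same columns: passing `sheet_name=None` makes `pd.read_excel` return a dictionary of DataFrames keyed by sheet name, which can then be stacked. The helper names `combine_sheets` and `load_all_sheets` are hypothetical, and it is worth checking whether the sheets overlap in time before stacking them.

```python
import pandas as pd

def combine_sheets(sheets):
    """Stack a {sheet_name: DataFrame} dict into one DataFrame with a fresh index."""
    return pd.concat(list(sheets.values()), ignore_index=True)

def load_all_sheets(path):
    # sheet_name=None reads every sheet and returns {sheet_name: DataFrame}
    return combine_sheets(pd.read_excel(path, sheet_name=None))

# Toy stand-in for two yearly sheets (invented rows)
toy = {
    "Year 2009-2010": pd.DataFrame({"Invoice": ["489434"], "Quantity": [12]}),
    "Year 2010-2011": pd.DataFrame({"Invoice": ["536365"], "Quantity": [6]}),
}
combined = combine_sheets(toy)
print(combined.shape)  # (2, 2)
```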

**Attribute Information**

* Invoice: Invoice number, unique to each transaction (invoice). If it starts with C, the transaction was canceled.
* StockCode: Product code, unique to each product.
* Description: Product name.
* Quantity: Number of units of the product sold on the invoice line.
* InvoiceDate: Invoice date and time.
* Price: Unit price of the product, in British pounds (GBP).
* Customer ID: Unique customer number.
* Country: Country where the customer resides.
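These attributes translate directly into a few column operations: a line's revenue is `Quantity * Price`, and canceled invoices can be flagged by their leading `C`. The three rows below are made up for illustration and only reuse the dataset's column names.

```python
import pandas as pd

# Hypothetical invoice lines using the dataset's column names
lines = pd.DataFrame({
    "Invoice": ["489434", "489434", "C489449"],
    "StockCode": ["85048", "79323P", "22087"],
    "Quantity": [12, 12, -12],
    "Price": [6.95, 6.75, 2.95],
    "Customer ID": [13085.0, 13085.0, 16321.0],
})

# Revenue of each invoice line in GBP
lines["TotalPrice"] = lines["Quantity"] * lines["Price"]

# Canceled transactions have an invoice number starting with "C"
lines["is_cancelled"] = lines["Invoice"].str.startswith("C")

print(lines[["Invoice", "TotalPrice", "is_cancelled"]])
```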


```python
import datetime as dt
import pandas as pd

# Show all columns and format floats with three decimals
pd.set_option('display.max_columns', None)
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Read the 2009-2010 sheet and work on a copy so the raw data stays untouched
df_ = pd.read_excel('/kaggle/input/uci-online-retail-ii-data-set/online_retail_II.xlsx', sheet_name='Year 2009-2010')
df = df_.copy()
df.head()
```


|   | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Customer ID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 489434 | 85048 | 15CM CHRISTMAS GLASS BALL 20 LIGHTS | 12 | 2009-12-01 07:45:00 | 6.95 | 13085.0 | United Kingdom |
| 1 | 489434 | 79323P | PINK CHERRY LIGHTS | 12 | 2009-12-01 07:45:00 | 6.75 | 13085.0 | United Kingdom |
| 2 | 489434 | 79323W | WHITE CHERRY LIGHTS | 12 | 2009-12-01 07:45:00 | 6.75 | 13085.0 | United Kingdom |
| 3 | 489434 | 22041 | RECORD FRAME 7" SINGLE SIZE | 48 | 2009-12-01 07:45:00 | 2.1 | 13085.0 | United Kingdom |
| 4 | 489434 | 21232 | STRAWBERRY CERAMIC TRINKET BOX | 24 | 2009-12-01 07:45:00 | 1.25 | 13085.0 | United Kingdom |


```python
# Data Understanding

def check_df(dataframe, head=5):
    print(" SHAPE ".center(70, '-'))
    print('Rows: {}'.format(dataframe.shape[0]))
    print('Columns: {}'.format(dataframe.shape[1]))
    print(" TYPES ".center(70, '-'))
    print(dataframe.dtypes)
    print(" MISSING VALUES ".center(70, '-'))
    print(dataframe.isnull().sum())
    print(" DUPLICATED VALUES ".center(70, '-'))
    print(dataframe.duplicated().sum())
    print(" DESCRIBE ".center(70, '-'))
    print(dataframe.describe().T)

check_df(df)
```


```
------------------------------- SHAPE --------------------------------
Rows: 525461
Columns: 8
------------------------------- TYPES --------------------------------
Invoice                object
StockCode              object
Description            object
Quantity                int64
InvoiceDate    datetime64[ns]
Price                 float64
Customer ID           float64
Country                object
dtype: object
--------------------------- MISSING VALUES ---------------------------
Invoice             0
StockCode           0
Description      2928
Quantity            0
InvoiceDate         0
Price               0
Customer ID    107927
Country             0
dtype: int64
------------------------- DUPLICATED VALUES --------------------------
6865
------------------------------ DESCRIBE ------------------------------
                 count      mean      std        min       25%       50%  \
Quantity    525461.000    10.338  107.424  -9600.000     1.000     3.000   
Price       525461.00
```
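Raw counts are easier to judge as shares of the row count. For example, the output above reports 107,927 missing `Customer ID` values out of 525,461 rows, roughly 20.5% of the data; such rows cannot be attributed to a customer and are removed during preparation. A small helper (the name `missing_share` is hypothetical) computes the share for every column:

```python
import pandas as pd

def missing_share(dataframe):
    """Percentage of missing values per column, largest first."""
    return (dataframe.isnull().mean() * 100).sort_values(ascending=False)

# Toy frame: 1 of 4 Customer IDs and 0 of 4 prices are missing
toy = pd.DataFrame({
    "Customer ID": [13085.0, None, 13085.0, 12346.0],
    "Price": [6.95, 6.75, 2.10, 1.25],
})
print(missing_share(toy))

# Same arithmetic on the counts reported by check_df
print(round(107927 / 525461 * 100, 1))  # 20.5
```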

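```python
# Data Preparation

# Line-level revenue in GBP: units sold multiplied by unit price
df["TotalPrice"] = df["Quantity"] * df["Price"]

# Keep only positive quantities (returns and cancellations carry negative quantities);
# .copy() avoids SettingWithCopyWarning on the assignments below
df = df[df["Quantity"] > 0].copy()

# Drop rows with missing values (mainly rows without a Customer ID)
df = df.dropna()

# Remove canceled invoices, whose numbers start with "C"
df["Invoice"] = df["Invoice"].astype(str)
df = df[~df["Invoice"].str.startswith("C")]

df.shape
```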

(407695, 9)
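The preparation steps can also be wrapped in a function and checked on a few hand-made rows. This is a sketch: the function name `prepare` and the toy rows are assumptions, not part of the original notebook. It builds one boolean mask instead of filtering step by step, which gives the same rows and sidesteps chained-assignment warnings.

```python
import pandas as pd

def prepare(dataframe):
    """Apply the cleaning rules used above and return a new DataFrame."""
    out = dataframe.copy()
    out["Invoice"] = out["Invoice"].astype(str)
    out["TotalPrice"] = out["Quantity"] * out["Price"]
    keep = (
        (out["Quantity"] > 0)                   # drop returns / negative quantities
        & out.notna().all(axis=1)               # drop rows with any missing value
        & ~out["Invoice"].str.startswith("C")   # drop canceled invoices
    )
    return out[keep]

# Invented rows: one valid, one cancellation, one without Customer ID, one valid
toy = pd.DataFrame({
    "Invoice": ["489434", "C489449", "489435", "489436"],
    "Quantity": [12, -12, 6, 3],
    "Price": [6.95, 2.95, 1.25, 2.10],
    "Customer ID": [13085.0, 16321.0, None, 12346.0],
})
clean = prepare(toy)
print(clean["Invoice"].tolist())  # ['489434', '489436']
```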

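```python
# Calculating RFM Metrics

# Last invoice date in the cleaned data; the analysis date below is chosen shortly after it
print(df["InvoiceDate"].max())
today_date = dt.datetime(2010, 12, 11)

rfm = df.groupby('Customer ID').agg({
    # Recency: days between the analysis date and the customer's last invoice
    'InvoiceDate': lambda InvoiceDate: (today_date - InvoiceDate.max()).days,
    # Frequency: number of distinct invoices
    'Invoice': 'nunique',
    # Monetary: total spend in GBP
    'TotalPrice': 'sum',
})
rfm.head()
```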

| Customer ID | InvoiceDate | Invoice | TotalPrice |
|---|---|---|---|
| 12346.0 | 165 | 11 | 372.86 |
| 12347.0 | 3 | 2 | 1323.32 |
| 12348.0 | 74 | 1 | 222.16 |
| 12349.0 | 43 | 3 | 2671.14 |
| 12351.0 | 11 | 1 | 300.93 |
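An alternative sketch uses pandas named aggregation, which produces the `recency`, `frequency` and `monetary` column names in one step and makes the later renaming unnecessary. The toy transactions and the analysis date are invented; the aggregation logic mirrors the cell above.

```python
import datetime as dt
import pandas as pd

# Hypothetical cleaned transactions for two customers
tx = pd.DataFrame({
    "Customer ID": [1.0, 1.0, 1.0, 2.0],
    "Invoice": ["1001", "1001", "1002", "2001"],
    "InvoiceDate": pd.to_datetime(["2010-11-01", "2010-11-01", "2010-12-01", "2010-06-01"]),
    "TotalPrice": [10.0, 5.0, 20.0, 100.0],
})
today_date = dt.datetime(2010, 12, 11)  # assumed analysis date

rfm_sketch = tx.groupby("Customer ID").agg(
    recency=("InvoiceDate", lambda d: (today_date - d.max()).days),  # days since last invoice
    frequency=("Invoice", "nunique"),                                 # distinct invoices
    monetary=("TotalPrice", "sum"),                                   # total spend
)
print(rfm_sketch)
```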


```python
# Rename the aggregated columns to their RFM meaning
rfm.columns = ['recency', 'frequency', 'monetary']

rfm.describe().T
```

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| recency | 4314.0 | 91.27 | 96.944 | 1.0 | 18.0 | 53.0 | 136.0 | 374.0 |
| frequency | 4314.0 | 4.454 | 8.169 | 1.0 | 1.0 | 2.0 | 5.0 | 205.0 |
| monetary | 4314.0 | 2047.289 | 8912.523 | 0.0 | 307.95 | 705.55 | 1722.802 | 349164.35 |
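The summary shows a strongly right-skewed monetary distribution: the mean (2,047) is almost three times the median (705.55), and the maximum reaches 349,164. This is one reason to score customers with quantile bins (`pd.qcut`) rather than equal-width bins (`pd.cut`). A toy comparison on invented spend values:

```python
import pandas as pd

# Invented, right-skewed spend values: many small customers and one very large one
spend = pd.Series([100, 120, 150, 200, 250, 300, 400, 500, 700, 50000])

# Equal-width bins: the outlier stretches the range, so almost everyone lands in bin 1
equal_width = pd.cut(spend, 5, labels=[1, 2, 3, 4, 5])

# Quantile bins: each score gets the same number of customers
quantile = pd.qcut(spend, 5, labels=[1, 2, 3, 4, 5])

print(equal_width.value_counts().sort_index().tolist())  # [9, 0, 0, 0, 1]
print(quantile.value_counts().sort_index().tolist())     # [2, 2, 2, 2, 2]
```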


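```python
# Keep only customers with a positive total spend (drops customers whose monetary value is 0);
# .copy() lets the score columns be added later without SettingWithCopyWarning
rfm = rfm[rfm["monetary"] > 0].copy()
rfm.shape
```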

(4312, 3)

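```python
# Calculating RFM Scores

# Recency: fewer days since the last purchase is better, so the labels run from 5 down to 1
rfm["recency_score"] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])

# Frequency: many customers share the same invoice count, so rank first to get unique quintile edges
rfm["frequency_score"] = pd.qcut(rfm['frequency'].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])

# Monetary: a higher total spend gets a higher score
rfm["monetary_score"] = pd.qcut(rfm['monetary'], 5, labels=[1, 2, 3, 4, 5])

# The segment key combines only the recency and frequency scores (e.g. "52");
# the monetary score is computed but not used in the segmentation below
rfm["RFM_SCORE"] = (rfm['recency_score'].astype(str) +
                    rfm['frequency_score'].astype(str))

rfm.head()
```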

| Customer ID | recency | frequency | monetary | recency_score | frequency_score | monetary_score | RFM_SCORE |
|---|---|---|---|---|---|---|---|
| 12346.0 | 165 | 11 | 372.86 | 2 | 5 | 2 | 25 |
| 12347.0 | 3 | 2 | 1323.32 | 5 | 2 | 4 | 52 |
| 12348.0 | 74 | 1 | 222.16 | 2 | 1 | 1 | 21 |
| 12349.0 | 43 | 3 | 2671.14 | 3 | 3 | 5 | 33 |
| 12351.0 | 11 | 1 | 300.93 | 5 | 1 | 2 | 51 |
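The frequency score is computed on `rank(method="first")` rather than on the raw counts because, as the summary above shows, at least a quarter of customers have exactly one invoice. Many identical values produce duplicate quantile edges, and `pd.qcut` then raises a `ValueError`. Ranking with `method="first"` gives every customer a distinct position (ties are broken by order of appearance), so the edges become unique. A toy illustration with invented counts:

```python
import pandas as pd

# Invented invoice counts with many ties at 1
frequency = pd.Series([1, 1, 1, 1, 1, 1, 1, 2, 3, 10])

# Quantiles on the raw counts: several edges equal 1, so qcut refuses to bin them
try:
    pd.qcut(frequency, 5, labels=[1, 2, 3, 4, 5])
    raw_failed = False
except ValueError:
    raw_failed = True

# Ranking first turns the ties into distinct positions 1..10
scores = pd.qcut(frequency.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])

print(raw_failed)       # True
print(scores.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

The trade-off is visible in the output: the first seven customers all have one invoice, yet they receive scores from 1 to 4 depending only on their row order.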


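```python
# Creating & Analysing RFM Segments

# Regex patterns over the two-character RF score: first digit = recency score, second = frequency score
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

# Replace each RF score with the name of the segment whose pattern it matches
rfm['segment'] = rfm['RFM_SCORE'].replace(seg_map, regex=True)

# Mean recency, frequency and monetary value (plus customer count) per segment
rfm[["segment", "recency", "frequency", "monetary"]].groupby("segment").agg(["mean", "count"])
```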

| segment | recency mean | recency count | frequency mean | frequency count | monetary mean | monetary count |
|---|---|---|---|---|---|---|
| about_to_sleep | 53.819 | 343 | 1.201 | 343 | 441.32 | 343 |
| at_Risk | 152.159 | 611 | 3.074 | 611 | 1188.878 | 611 |
| cant_loose | 124.117 | 77 | 9.117 | 77 | 4099.45 | 77 |
| champions | 7.119 | 663 | 12.554 | 663 | 6852.264 | 663 |
| hibernating | 213.886 | 1015 | 1.126 | 1015 | 403.978 | 1015 |
| loyal_customers | 36.287 | 742 | 6.83 | 742 | 2746.067 | 742 |
| need_attention | 53.266 | 207 | 2.449 | 207 | 1060.357 | 207 |
| new_customers | 8.58 | 50 | 1.0 | 50 | 386.199 | 50 |
| potential_loyalists | 18.793 | 517 | 2.017 | 517 | 729.511 | 517 |
| promising | 25.747 | 87 | 1.0 | 87 | 367.087 | 87 |
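Because `replace(..., regex=True)` leaves any unmatched score unchanged, it is worth checking that the patterns cover all 25 recency/frequency combinations exactly once. The sketch below repeats the `seg_map` from the cell above and tests every combination with `re.fullmatch`; the helper name `segment_of` is hypothetical.

```python
import re

# Same patterns as seg_map above (first digit = recency score, second = frequency score)
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions',
}

def segment_of(rf_score):
    """Return the single segment whose pattern matches the whole two-digit score."""
    matches = [name for pattern, name in seg_map.items() if re.fullmatch(pattern, rf_score)]
    if len(matches) != 1:
        raise ValueError(f"{rf_score!r} matched {matches}")
    return matches[0]

# Every combination of recency score 1-5 and frequency score 1-5
all_scores = [f"{r}{f}" for r in range(1, 6) for f in range(1, 6)]
segments = {score: segment_of(score) for score in all_scores}

print(len(segments))   # 25
print(segments["52"])  # potential_loyalists
```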


**The RFM segments make it possible to develop a separate marketing strategy for each customer group, based on its purchasing behavior.**

**For example, loyalty programs and other reward systems can keep the customers with the highest scores (such as `champions` and `loyal_customers`) loyal while further increasing their purchase frequency.**

**Customers with low frequency and monetary scores can be encouraged to increase their purchase frequency through discounts, offers, and other benefits.**

**In addition, targeted marketing campaigns can be created for each segment. For instance, discounts can be offered exclusively to customers who haven't made a purchase for a long time (such as `at_Risk`, `cant_loose`, and `hibernating`), or product recommendations can be used to increase customers' shopping activity.**
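In code, a campaign's audience is simply the set of customers whose segment belongs to a chosen group. The campaign names and their segment groupings below are illustrative assumptions built on the segment labels defined in `seg_map`, and the toy `rfm` frame with its customer IDs is invented.

```python
import pandas as pd

# Hypothetical campaign -> target segments mapping (labels as produced by seg_map)
campaigns = {
    "loyalty_program": ["champions", "loyal_customers"],
    "win_back_discount": ["at_Risk", "cant_loose", "hibernating"],
    "activation_offer": ["new_customers", "promising", "about_to_sleep"],
}

def target_lists(rfm, campaigns):
    """Return {campaign: list of customer IDs} selected by segment membership."""
    return {
        name: rfm[rfm["segment"].isin(segments)].index.tolist()
        for name, segments in campaigns.items()
    }

# Toy RFM table indexed by Customer ID (invented values)
rfm = pd.DataFrame(
    {"segment": ["champions", "hibernating", "at_Risk", "new_customers", "need_attention"]},
    index=pd.Index([101.0, 102.0, 103.0, 104.0, 105.0], name="Customer ID"),
)
targets = target_lists(rfm, campaigns)
print(targets["win_back_discount"])  # [102.0, 103.0]
```

Customers in segments that no campaign lists (here `need_attention`) are simply not targeted, so the mapping itself documents which groups a marketing plan leaves out.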

**All these strategies can be supported by customer-specific reports and data visualizations built on the RFM results, giving a more detailed view of customer behavior.**
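One simple report of this kind is each segment's share of customers and share of total revenue. The sketch below computes both from an RFM table with `segment` and `monetary` columns; the function name `segment_report` and the toy values are assumptions for illustration.

```python
import pandas as pd

def segment_report(rfm):
    """Customers, revenue and their percentage shares per segment, sorted by revenue."""
    report = rfm.groupby("segment").agg(
        customers=("monetary", "count"),
        revenue=("monetary", "sum"),
    )
    report["customer_share_%"] = 100 * report["customers"] / report["customers"].sum()
    report["revenue_share_%"] = 100 * report["revenue"] / report["revenue"].sum()
    return report.sort_values("revenue", ascending=False)

# Toy RFM table (invented values)
rfm = pd.DataFrame({
    "segment": ["champions", "champions", "hibernating", "hibernating", "hibernating"],
    "monetary": [600.0, 200.0, 100.0, 60.0, 40.0],
})
report = segment_report(rfm)
print(report)
```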