# Customer Segmentation with RFM

### What is Customer Segmentation?

Customer segmentation is the process of dividing customers into groups based on shared characteristics or behaviours. RFM is one of several segmentation techniques.

### What is RFM?

RFM analysis is a customer segmentation technique that quantitatively ranks and groups customers based on three metrics taken from their purchase history. It is simple but effective.

![](https://cdn.hackernoon.com/images/ARCWTrgYpoc531106B2eMedWoT42-8w1376c.jpeg)

### RFM Metrics

RFM stands for three dimensions: recency, frequency, and monetary value.

R (recency): How recently did the customer make a purchase? This information can be used to remind recent customers to revisit the business soon to continue meeting their purchase needs.

F (frequency): How often does the customer purchase? Knowing this can help time marketing efforts that remind the customer to visit the business again.

M (monetary): How much does the customer spend? Focusing on high spenders can produce a better return on investment in marketing and customer service, but it also runs the risk of alienating customers who have been consistent but have not spent as much with each transaction.
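To make the three metrics concrete, here is a minimal sketch for a single customer. The purchase history and the analysis date are made up for illustration; they are not taken from the data set used later.

```python
from datetime import date

# Hypothetical purchase history for one customer: (purchase date, amount spent).
purchases = [
    (date(2011, 11, 2), 40.0),
    (date(2011, 11, 20), 25.5),
    (date(2011, 12, 5), 60.0),
]
analysis_date = date(2011, 12, 11)  # assumed reference date

# Recency: days between the reference date and the most recent purchase.
recency = (analysis_date - max(d for d, _ in purchases)).days
# Frequency: number of purchases.
frequency = len(purchases)
# Monetary: total amount spent.
monetary = sum(amount for _, amount in purchases)

print(recency, frequency, monetary)
```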

### RFM Segments

![](https://cdn.enhencer.com/website-assets/images/blog/AdvantagesAndInefficaciesOfRFMSegmentation1.png)

**Champions:** Your best customers. They buy often, spend a lot, and made their last purchase recently.

**Loyal Customer:** Very good customers. They spend a lot.

**Potential Loyalist:** Recent customers who have already spent a lot.

**New Customer:** Recent customers who have made only a few purchases.

**Promising:** Fairly recent customers who have not yet bought often.

**Need Attention:** Customers with middling recency and frequency and above-average spending.

**At Risk:** Customers who bought frequently, but haven't made any purchases in a long time.

**Can't lose them:** Customers who have spent a lot, but have been inactive for a long time.

**Hibernating:** Low-frequency, low-spending customers who haven't bought in a long time.

**Lost:** Your worst customers. They bought only once, spent very little, and haven't bought in a long time.

### How to perform RFM Analysis using Python?

In [1]:
import datetime as dt
import pandas as pd

In [3]:
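# Reading this file with the default encoding can fail, so check its encoding first.
import chardet
file = '/content/Year 2010-2011.csv'
with open(file, 'rb') as rawdata:
    result = chardet.detect(rawdata.read(100000))  # guess from the first 100,000 bytes
result

# ISO-8859-1 (Latin-1) maps every byte to a character, so the file loads without decoding errors.
df = pd.read_csv(file, encoding='ISO-8859-1')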
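If the encoding is not known in advance, one option is to try a short list of candidate encodings in order. The sketch below is only an illustration under stated assumptions: `read_csv_with_fallback` is a hypothetical helper, not part of pandas, and the sample bytes stand in for the real file.

```python
import io
import pandas as pd

def read_csv_with_fallback(raw_bytes, encodings=("utf-8", "ISO-8859-1")):
    """Hypothetical helper: return (DataFrame, encoding) for the first encoding that decodes."""
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw_bytes), encoding=enc), enc
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the bytes, try the next one
    raise ValueError("none of the candidate encodings could decode the data")

# Made-up sample: 'É' is the single byte 0xC9 in ISO-8859-1, which is not valid UTF-8 here.
sample = "Description,Price\nCAFÉ MUG,2.55\n".encode("ISO-8859-1")
df_sample, used_encoding = read_csv_with_fallback(sample)
print(used_encoding)
```

Because ISO-8859-1 never raises a decoding error, it only makes sense as the last candidate in the list.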

In [4]:
df.head()

| | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Customer ID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |


### Data Preparation

In [5]:
# Remove rows with missing values. 'inplace=True' modifies df directly instead of returning a copy.
df.dropna(inplace=True)

In [6]:
# Invoice numbers containing 'C' are cancelled transactions. Remove them from the dataset.
df = df[~df['Invoice'].str.contains('C', na=False)]

In [7]:
# Keep only rows with a positive quantity.
df = df[(df['Quantity'] > 0)]

In [8]:
# Keep only rows with a positive unit price.
df = df[(df['Price'] > 0)]

In [9]:
# Create 'TotalPrice', the revenue of each invoice line (quantity x unit price).
df['TotalPrice'] = df['Quantity'] * df['Price']
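The cleaning steps can also be collected into one function, which makes them easy to try on a small sample. This is a sketch only: `clean_transactions` is a hypothetical helper name and the toy rows are made up. It assumes the same column names as the data set above.

```python
import pandas as pd

def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the cleaning steps and return a new DataFrame."""
    out = df.dropna().copy()
    # Invoice numbers containing 'C' mark cancelled transactions.
    out = out[~out["Invoice"].astype(str).str.contains("C", na=False)]
    # Keep only rows with a positive quantity and a positive unit price.
    out = out[(out["Quantity"] > 0) & (out["Price"] > 0)].copy()
    out["TotalPrice"] = out["Quantity"] * out["Price"]
    return out

# Made-up rows: one valid, one cancelled, one with zero price,
# one with a missing customer, and one more valid row.
toy = pd.DataFrame({
    "Invoice": ["536365", "C536379", "536370", "536371", "536372"],
    "Quantity": [6, -1, 3, 2, 4],
    "Price": [2.50, 27.50, 0.0, 1.25, 1.25],
    "Customer ID": [17850.0, 14527.0, 12583.0, None, 13047.0],
})
cleaned = clean_transactions(toy)
print(cleaned[["Invoice", "TotalPrice"]])
```

Returning a new DataFrame rather than modifying the input in place keeps the raw data available for comparison.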

### Calculating RFM Metrics

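In [10]:
# Convert 'InvoiceDate' from text to datetime first. Otherwise max() compares the values as strings.
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'], format='%m/%d/%Y %H:%M')

In [11]:
df["InvoiceDate"].max()

Timestamp('2011-12-09 12:50:00')

In [12]:
# Use a reference date two days after the last invoice so that every recency value is positive.
today_date = dt.datetime(2011, 12, 11)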
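Dates stored as text are compared character by character, so the maximum of an unparsed date column is not necessarily the latest date. The sketch below shows the difference on three sample strings. Deriving the reference date from the data (two days after the last invoice) is an assumed convention, not something the data set prescribes.

```python
import pandas as pd

raw = pd.Series(["9/9/2011 9:52", "12/9/2011 12:50", "10/31/2011 14:41"])

# Compared as text, '9/9/2011 9:52' is the maximum because '9' sorts after '1'.
text_max = raw.max()

# Parsed as datetimes, the latest value is 9 December 2011.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M")
latest = parsed.max()

# Assumed convention: place the reference date two days after the last invoice.
today_date = latest.normalize() + pd.Timedelta(days=2)

print(text_max, latest, today_date, sep="\n")
```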

In [13]:
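# recency: days since the customer's last invoice
# frequency: number of distinct invoices
# monetary: total amount spent
rfm = df.groupby('Customer ID').agg({'InvoiceDate': lambda InvoiceDate: (today_date - InvoiceDate.max()).days,
                                     'Invoice': lambda Invoice: Invoice.nunique(),
                                     'TotalPrice': lambda TotalPrice: TotalPrice.sum()})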

In [14]:
rfm.columns = ['recency', 'frequency', 'monetary']
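The same aggregation can be written with pandas named aggregation, which names the output columns directly instead of renaming them afterwards. This is an alternative sketch on made-up transactions, with an assumed analysis date. It computes recency from the last purchase date in a separate vectorised step.

```python
import pandas as pd

today_date = pd.Timestamp("2011-12-11")  # assumed analysis date

# Made-up transactions for two customers.
tx = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 2],
    "Invoice": ["A1", "A1", "A2", "B1", "B2"],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-12-08", "2011-10-01", "2011-11-10"]
    ),
    "TotalPrice": [10.0, 5.0, 20.0, 7.5, 2.5],
})

rfm_toy = tx.groupby("Customer ID").agg(
    last_purchase=("InvoiceDate", "max"),
    frequency=("Invoice", "nunique"),
    monetary=("TotalPrice", "sum"),
)
# Recency in whole days between the analysis date and the last purchase.
rfm_toy["recency"] = (today_date - rfm_toy["last_purchase"]).dt.days
rfm_toy = rfm_toy[["recency", "frequency", "monetary"]]
print(rfm_toy)
```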

### Calculating RFM Scores

In [15]:
# Recency score. A smaller recency means a more recent purchase, so the labels are reversed:
# the most recent 20% of customers get 5 and the least recent 20% get 1.
rfm["recency_score"] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])

In [16]:
# Frequency score. Here, 1 is the least frequent and 5 the most frequent.
# rank(method="first") breaks ties so that qcut gets unique bin edges.
rfm["frequency_score"] = pd.qcut(rfm['frequency'].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
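The ranking step matters because many customers share the same frequency. A small sketch with made-up frequencies shows what happens without it: `pd.qcut` raises a `ValueError` when several quantile edges coincide, while ranking first gives every row a distinct value.

```python
import pandas as pd

# Made-up frequencies with many ties at 1, as is typical for purchase counts.
frequency = pd.Series([1, 1, 1, 1, 1, 1, 2, 3, 4, 5])

try:
    pd.qcut(frequency, 5, labels=[1, 2, 3, 4, 5])
    qcut_failed = False
except ValueError:
    # Several quantile edges fall on the value 1, so the bin edges are not unique.
    qcut_failed = True

# Ranking with method="first" assigns distinct ranks 1..10 in order of appearance.
ranked = frequency.rank(method="first")
scores = pd.qcut(ranked, 5, labels=[1, 2, 3, 4, 5])

print(qcut_failed)
print(scores.astype(int).tolist())
```

One side effect to keep in mind: customers with the same frequency can end up with different scores, depending on their row order.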

In [17]:
# Monetary score. Here, 1 is the lowest total spend and 5 is the highest.
rfm["monetary_score"] = pd.qcut(rfm['monetary'], 5, labels=[1, 2, 3, 4, 5])

In [18]:
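# Combine the recency and frequency scores into a two-character string such as '54'.
# The monetary score is not part of this string.
rfm["RFM_SCORE"] = (rfm['recency_score'].astype(str) + rfm['frequency_score'].astype(str))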

### Creating & Analysing RFM Segments

In [19]:
# Segment names. Each regex matches a two-digit score: recency score first, frequency score second.
seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

In [20]:
rfm['segment'] = rfm['RFM_SCORE'].replace(seg_map, regex=True)
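To see how the patterns divide the score space, the mapping can be checked with the standard `re` module alone. The sketch below copies the patterns from `seg_map`. `segment_for` is a hypothetical helper that returns the first matching segment, and the final check counts how many patterns match each of the 25 possible scores.

```python
import re

seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions',
}

def segment_for(score: str) -> str:
    """Hypothetical helper: return the segment whose pattern matches the whole score."""
    for pattern, name in seg_map.items():
        if re.fullmatch(pattern, score):
            return name
    raise ValueError(f"no segment pattern matches score {score!r}")

# Every combination of recency score (1-5) and frequency score (1-5).
all_scores = [f"{r}{f}" for r in range(1, 6) for f in range(1, 6)]
matches_per_score = {
    s: sum(bool(re.fullmatch(p, s)) for p in seg_map) for s in all_scores
}

print(segment_for("55"), segment_for("41"), segment_for("12"))
print(all(count == 1 for count in matches_per_score.values()))
```

Each score matches exactly one pattern, so the order of the entries in `seg_map` does not affect the result.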

In [21]:
rfm = rfm[["recency", "frequency", "monetary", "segment"]]

In [22]:
rfm.head()

| Customer ID | recency | frequency | monetary | segment |
|---|---|---|---|---|
| 12346.0 | 326 | 1 | 77183.6 | hibernating |
| 12347.0 | 3 | 7 | 4310.0 | champions |
| 12348.0 | 76 | 4 | 1797.24 | at_Risk |
| 12349.0 | 19 | 1 | 1757.55 | promising |
| 12350.0 | 311 | 1 | 334.4 | hibernating |
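Once every customer has a segment label, a natural next step is to summarise the metrics per segment. A minimal sketch, using made-up values rather than the results above:

```python
import pandas as pd

# Made-up RFM table with segment labels, indexed by a hypothetical customer ID.
rfm_toy = pd.DataFrame(
    {
        "recency": [2, 5, 300, 320, 80],
        "frequency": [8, 10, 1, 1, 4],
        "monetary": [4000.0, 2000.0, 300.0, 100.0, 1800.0],
        "segment": ["champions", "champions", "hibernating", "hibernating", "at_Risk"],
    },
    index=pd.Index([101, 102, 103, 104, 105], name="Customer ID"),
)

# Mean and number of customers for each metric, per segment.
summary = rfm_toy.groupby("segment")[["recency", "frequency", "monetary"]].agg(["mean", "count"])
print(summary)

# Customer IDs of one segment, for example to target them in a campaign.
champion_ids = rfm_toy[rfm_toy["segment"] == "champions"].index.tolist()
print(champion_ids)
```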


Sources:

https://www.techtarget.com/searchdatamanagement/definition/RFM-analysis

https://www.actioniq.com/blog/what-is-rfm-analysis/

https://clevertap.com/blog/rfm-analysis/

https://www.investopedia.com/terms/r/rfm-recency-frequency-monetary-value.asp