
# RFM analysis

RFM analysis is a customer segmentation technique used in marketing and business intelligence. It scores each customer on three metrics:

**Recency (R):** How recently a customer made a purchase or engaged with the business. It measures the time since the last customer transaction.

**Frequency (F):** How often a customer makes a purchase or engages with the business. It counts the number of transactions within a specific period.

**Monetary Value (M):** The total amount of money a customer has spent or contributed to the business. It reflects the customer's overall value in terms of monetary contributions.
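As a small illustration of the three definitions, the sketch below computes R, F and M on a hypothetical toy table. The transactions and the `snapshot` variable are made up for the example; the column names mirror the dataset loaded later. Frequency is counted here as distinct `TXN_ID` values, which is an assumption of this sketch and not necessarily how the notebook counts it.

```python
import pandas as pd

# Hypothetical toy transactions; column names mirror the dataset used below.
tx = pd.DataFrame({
    "LYLTY_CARD_NBR": [1000, 1000, 1001, 1002, 1002, 1002],
    "DATE": pd.to_datetime([
        "2019-06-01", "2019-06-20", "2019-01-15",
        "2019-03-10", "2019-05-05", "2019-06-29",
    ]),
    "TXN_ID": [1, 2, 3, 4, 5, 6],
    "TOT_SALES": [6.0, 4.5, 3.0, 10.0, 7.5, 2.5],
})

# Reference date: the day after the last transaction in the toy data.
snapshot = tx["DATE"].max() + pd.Timedelta(days=1)

toy_rfm = tx.groupby("LYLTY_CARD_NBR").agg(
    Recency=("DATE", lambda d: (snapshot - d.max()).days),  # days since last purchase
    Frequency=("TXN_ID", "nunique"),                        # distinct transactions
    Monetary=("TOT_SALES", "sum"),                          # total spend
)
print(toy_rfm)
```

Customer 1002 bought most recently, most often and spent the most, so it would rank highest on all three metrics.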

# Import libraries

In [None]:
import pandas as pd
import datetime as dt
import warnings
warnings.filterwarnings('ignore')

# Read data

In [None]:
df = pd.read_csv(r"/content/compined_data.csv")

In [None]:
df.head()

,DATE,STORE_NBR,LYLTY_CARD_NBR,TXN_ID,PROD_NBR,PROD_NAME,PROD_QTY,TOT_SALES,size,brand,LIFESTAGE,PREMIUM_CUSTOMER
0,2018-10-17,1,1000,1,5,Natural Chip Compny Seasalt,2,6.0,175,NATURAL,YOUNG SINGLES/COUPLES,Premium
1,2019-05-14,1,1307,348,66,Ccs Nacho Cheese,3,6.3,175,CCS,MIDAGE SINGLES/COUPLES,Budget
2,2019-05-20,1,1343,383,61,Smithss Crinkle Cut Chips Chicken,2,2.9,170,SMITHS,MIDAGE SINGLES/COUPLES,Budget
3,2018-08-17,2,2373,974,69,Smithss Chip Thinly S/Cream&Onion,5,15.0,175,SMITHS,MIDAGE SINGLES/COUPLES,Budget
4,2018-08-18,2,2426,1038,108,Kettle Tortilla Chpshny&Jlpno Chili,3,13.8,150,KETTLE,MIDAGE SINGLES/COUPLES,Budget


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 245143 entries, 0 to 245142
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   DATE              245143 non-null  object 
 1   STORE_NBR         245143 non-null  int64  
 2   LYLTY_CARD_NBR    245143 non-null  int64  
 3   TXN_ID            245143 non-null  int64  
 4   PROD_NBR          245143 non-null  int64  
 5   PROD_NAME         245143 non-null  object 
 6   PROD_QTY          245143 non-null  int64  
 7   TOT_SALES         245143 non-null  float64
 8   size              245143 non-null  int64  
 9   brand             245143 non-null  object 
 10  LIFESTAGE         245143 non-null  object 
 11  PREMIUM_CUSTOMER  245143 non-null  object 
dtypes: float64(1), int64(6), object(5)
memory usage: 22.4+ MB


In [None]:
df.DATE = pd.to_datetime(df.DATE, format='%Y-%m-%d')

In [None]:
for column in df.columns:
    if df[column].dtype == 'object':
        df[column] = df[column].astype('category')

# RFM

In [None]:
df['DATE'].max()

Timestamp('2019-06-30 00:00:00')

In [None]:
# Snapshot date: one day after the last transaction (2019-06-30), so Recency is at least 1
now = dt.datetime(2019, 7, 1)

In [None]:
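# One row per customer: days since last purchase, number of transaction rows, total spend
rfm = df.groupby('LYLTY_CARD_NBR').agg({
    'DATE': lambda day: (now - day.max()).days,      # Recency
    'TXN_ID': lambda txn_id: len(txn_id),            # Frequency (counts rows, not distinct TXN_IDs)
    'TOT_SALES': lambda tot_sales: tot_sales.sum(),  # Monetary
})
col_list = ['Recency', 'Frequency', 'Monetary']
rfm.columns = col_list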

In [None]:
rfm.describe()

,Recency,Frequency,Monetary
count,71285.0,71285.0,71285.0
mean,107.946917,3.438914,25.176341
std,93.859934,2.433053,19.492851
min,1.0,1.0,1.7
25%,32.0,1.0,8.8
50%,79.0,3.0,20.6
75%,163.0,5.0,37.4
max,365.0,17.0,1300.0


In [None]:
# Fewer days since the last purchase is better, so the labels run from 5 down to 1
rfm["R"] = pd.qcut(rfm["Recency"], 5, labels=[5, 4, 3, 2, 1])

In [None]:
# rank(method="first") breaks ties so that qcut gets unique bin edges
rfm["F"] = pd.qcut(rfm["Frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])
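The `rank(method="first")` step in the Frequency score matters because purchase counts are heavily tied (the median customer here has only 3 transactions). A minimal sketch on a hypothetical series, not the notebook's data, shows what happens with and without it:

```python
import pandas as pd

# Hypothetical purchase counts with many ties, similar in shape to Frequency.
freq = pd.Series([1, 1, 1, 1, 1, 1, 2, 3, 4, 5])

# Quintiles on the raw values: several quantile edges coincide at 1,
# so qcut refuses to build the bins.
try:
    pd.qcut(freq, 5, labels=[1, 2, 3, 4, 5])
    raw_failed = False
except ValueError:
    raw_failed = True

# Ranking first gives every row a distinct value, so the edges are unique.
scores = pd.qcut(freq.rank(method="first"), 5, labels=[1, 2, 3, 4, 5])

print(raw_failed)
print(scores.tolist())
```

The trade-off is that tied customers are split by row order: in this toy series the six customers who each bought once are spread across scores 1 to 3, so neighbouring F scores are not always meaningfully different.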


In [None]:
rfm["M"] = pd.qcut(rfm["Monetary"], 5, labels=[1,2,3,4,5])

In [None]:
rfm["RFM_Score"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)

## Customer segmentation based on Recency and Frequency

The segments use only the R and F scores (1 = lowest, 5 = highest):

- **Hibernating:** R 1-2, F 1-2. Customers who bought rarely and a long time ago.

- **At Risk:** R 1-2, F 3-4. Moderately frequent buyers who have not purchased for a while.

- **Can't Lose:** R 1-2, F 5. The most frequent buyers, whose last purchase was a long time ago.

- **About to Sleep:** R 3, F 1-2. Infrequent buyers whose last purchase is only moderately recent.

- **Need Attention:** R 3, F 3. Middling recency and frequency; these customers could drift either way.

- **Loyal Customers:** R 3-4, F 4-5. Frequent buyers with fairly recent purchases.

- **Promising:** R 4, F 1. Bought fairly recently, but at the lowest frequency.

- **New Customers:** R 5, F 1. Bought very recently, at the lowest frequency.

- **Potential Loyalists:** R 4-5, F 2-3. Recent buyers with low-to-moderate frequency who could become loyal.

- **Champions:** R 5, F 4-5. The most recent and most frequent buyers.

In [None]:
seg_map = {
    r'[1-2][1-2]': 'Hibernating',
    r'[1-2][3-4]': 'At Risk',
    r'[1-2]5': "Can't Lose",
    r'3[1-2]': 'About to Sleep',
    r'33': 'Need Attention',
    r'[3-4][4-5]': 'Loyal Customers',
    r'41': 'Promising',
    r'51': 'New Customers',
    r'[4-5][2-3]': 'Potential Loyalists',
    r'5[4-5]': 'Champions'
}
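Because the map is built from regular expressions, it is worth checking that every R-F combination falls into exactly one segment. The sketch below copies the patterns from the map (with the label spelled as in the segment list) and tests all 25 two-digit codes. The `grid`, `uncovered` and `ambiguous` names are introduced only for this check.

```python
import re
from itertools import product

# Patterns copied from the segment map; first digit is R, second is F.
seg_map = {
    r'[1-2][1-2]': 'Hibernating',
    r'[1-2][3-4]': 'At Risk',
    r'[1-2]5': "Can't Lose",
    r'3[1-2]': 'About to Sleep',
    r'33': 'Need Attention',
    r'[3-4][4-5]': 'Loyal Customers',
    r'41': 'Promising',
    r'51': 'New Customers',
    r'[4-5][2-3]': 'Potential Loyalists',
    r'5[4-5]': 'Champions',
}

# For each of the 25 possible R-F codes, collect the segments whose pattern matches.
grid = {}
for r, f in product(range(1, 6), repeat=2):
    code = f"{r}{f}"
    grid[code] = [name for pattern, name in seg_map.items()
                  if re.fullmatch(pattern, code)]

uncovered = [code for code, hits in grid.items() if len(hits) == 0]
ambiguous = [code for code, hits in grid.items() if len(hits) > 1]

print("uncovered:", uncovered)
print("ambiguous:", ambiguous)
```

Both lists come back empty for this map, so the result of the regex replacement does not depend on the order of the dictionary entries.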

In [None]:
rfm['Segment'] = rfm['R'].astype(str) + rfm['F'].astype(str)
rfm['Segment'] = rfm['Segment'].replace(seg_map, regex=True)
rfm.head()

LYLTY_CARD_NBR,Recency,Frequency,Monetary,R,F,M,RFM_Score,Segment
1000,257,1,6.0,1,1,1,111,Hibernating
1002,288,1,2.7,1,1,1,111,Hibernating
1003,115,2,6.6,2,2,1,221,Hibernating
1004,241,1,1.9,1,1,1,111,Hibernating
1005,185,1,2.8,2,1,1,211,Hibernating


In [None]:
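# Select the numeric columns explicitly: recent pandas versions raise on
# .mean() over the categorical R/F/M and string RFM_Score columns
segment_means = (
    rfm.groupby('Segment')[['Recency', 'Frequency', 'Monetary']]
    .mean()
    .sort_values('Monetary')
)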
segment_means

Segment,Recency,Frequency,Monetary
Promising,41.292537,1.0,6.192836
New Customers,12.180797,1.0,6.399081
Hibernating,225.518701,1.247193,8.3408
About to Sleep,81.932155,1.417017,9.674387
Potential Loyalists,27.896345,2.374265,17.380583
Need Attention,81.009961,2.734983,20.294295
At Risk,170.999433,3.237561,24.400697
Loyal Customers,56.732501,5.912791,43.864317
Champions,11.960051,6.380474,47.134004
Can't Lose,140.098827,6.773032,49.96809
