# RFM Analysis in Python

https://thecleverprogrammer.com/2023/06/12/rfm-analysis-using-python/

## RFM Analysis: Overview
RFM analysis is a technique used in data science, especially in marketing, to understand and segment customers based on their buying behaviour.

Using RFM analysis, a business can assess each customer's:

*   recency (how recently they made their last purchase)
*   frequency (how often they make purchases)
*   monetary value (the amount spent on purchases)

Recency, Frequency, and Monetary value of a customer are three key metrics that provide information about customer engagement, loyalty, and value to a business.

To perform RFM analysis using Python, we need a dataset that includes customer IDs, purchase dates, and transaction amounts. With this information, we can calculate RFM values for each customer and analyze their patterns and behaviours. The dataset used here, `rfm_data.csv`, contains these fields.
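
As a rough illustration of the three metrics, the sketch below builds one RFM row per customer from a tiny made-up table. The `toy` data and the fixed `snapshot_date` are assumptions for the example only; the notebook itself keeps one row per transaction and works on `rfm_data.csv`.

```python
import pandas as pd

# Hypothetical toy transactions; the values are made up for illustration.
toy = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "PurchaseDate": pd.to_datetime([
        "2023-04-01", "2023-06-01", "2023-05-15",
        "2023-04-10", "2023-04-20", "2023-06-09",
    ]),
    "TransactionAmount": [100.0, 50.0, 300.0, 20.0, 30.0, 25.0],
})

# Assumed snapshot date against which recency is measured.
snapshot_date = pd.Timestamp("2023-06-10")

# One row per customer: last purchase date, number of purchases, total spend.
rfm = toy.groupby("CustomerID").agg(
    LastPurchase=("PurchaseDate", "max"),
    Frequency=("PurchaseDate", "count"),
    MonetaryValue=("TransactionAmount", "sum"),
)

# Recency is the number of days since the customer's last purchase.
rfm["Recency"] = (snapshot_date - rfm["LastPurchase"]).dt.days
print(rfm)
```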

## Libraries

In [148]:
import pandas as pd

import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go
pio.templates.default = "plotly_white"

from datetime import datetime

## Import data

In [149]:
df = pd.read_csv('/content/drive/MyDrive/10_Learning/03_Python/Jupiter_root/RFM/data/rfm_data.csv')
df.head()

|  | CustomerID | PurchaseDate | TransactionAmount | ProductInformation | OrderID | Location |
|---|---|---|---|---|---|---|
| 0 | 8814 | 2023-04-11 | 943.31 | Product C | 890075 | Tokyo |
| 1 | 2188 | 2023-04-11 | 463.7 | Product A | 176819 | London |
| 2 | 4608 | 2023-04-11 | 80.28 | Product A | 340062 | New York |
| 3 | 2559 | 2023-04-11 | 221.29 | Product A | 239145 | London |
| 4 | 9482 | 2023-04-11 | 739.56 | Product A | 194545 | Paris |


In [150]:
df.dtypes

CustomerID              int64
PurchaseDate           object
TransactionAmount     float64
ProductInformation     object
OrderID                 int64
Location               object
dtype: object

In [151]:
# PurchaseDate is read as a string (object); convert it to datetime for date arithmetic
df['PurchaseDate'] = pd.to_datetime(df['PurchaseDate'])


## Recency

In [152]:
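# Calculate Recency: whole days between today and each purchase date.
# Note: the values depend on the day the notebook is run.
df['Recency'] = (pd.Timestamp.now().normalize() - df['PurchaseDate']).dt.days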
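
Because recency is measured against the current date, rerunning the notebook later shifts every value. A possible alternative, sketched below under the assumption that a fixed snapshot date is acceptable, is to measure against the day after the latest purchase in the data. The `purchases` frame and `snapshot_date` name are hypothetical.

```python
import pandas as pd

# Hypothetical purchase dates, for illustration only.
purchases = pd.DataFrame({
    "PurchaseDate": pd.to_datetime(["2023-04-11", "2023-05-20", "2023-06-10"]),
})

# Assumption: take the snapshot as the day after the latest purchase,
# so the result does not depend on when the code is run.
snapshot_date = purchases["PurchaseDate"].max() + pd.Timedelta(days=1)

purchases["Recency"] = (snapshot_date - purchases["PurchaseDate"]).dt.days
print(purchases)
```

The ordering of customers by recency is the same either way; only the absolute day counts change.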

## Frequency

In [153]:
# Calculate Frequency
frequency_data = df.groupby('CustomerID')['OrderID'].count().reset_index()
frequency_data.rename(columns={'OrderID': 'Frequency'}, inplace=True)


In [154]:
df = df.merge(frequency_data, on='CustomerID', how='left')

## Monetary

In [155]:
# Calculate Monetary value: total amount spent per customer
monetary_data = df.groupby("CustomerID")["TransactionAmount"].sum().reset_index()
monetary_data.rename(columns={'TransactionAmount': 'MonetaryValue'}, inplace=True)


In [156]:
df = df.merge(monetary_data, on="CustomerID", how="left")
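
The group-then-merge pattern used for Frequency and MonetaryValue can also be written with `groupby(...).transform(...)`, which returns a result aligned to the original rows so no merge is needed. This is a minimal sketch on a made-up `orders` frame, not a change to the notebook's approach.

```python
import pandas as pd

# Hypothetical orders, for illustration only.
orders = pd.DataFrame({
    "CustomerID": [10, 10, 20],
    "OrderID": [1, 2, 3],
    "TransactionAmount": [5.0, 7.5, 40.0],
})

# transform broadcasts each group's aggregate back onto every row of the group.
orders["Frequency"] = orders.groupby("CustomerID")["OrderID"].transform("count")
orders["MonetaryValue"] = (
    orders.groupby("CustomerID")["TransactionAmount"].transform("sum")
)
print(orders)
```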

## RFM - Scores

We score each metric from 1 to 5. For recency, the labels run from 5 down to 1, so customers who purchased more recently (fewer days since their purchase) receive higher recency scores.

For frequency, the labels run from 1 to 5, so customers who purchase more often receive higher frequency scores.

For monetary value, the labels also run from 1 to 5, so customers who spend more receive higher monetary scores.



In [157]:
# pd.cut splits each metric's value range into 5 equal-width bins and labels them

df["Recency_score"] = pd.cut(x=df['Recency'], bins=5, labels=[5, 4, 3, 2, 1])
df["Frequency_score"] = pd.cut(x=df['Frequency'], bins=5, labels=[1, 2, 3, 4, 5])
df["Monetary_score"] = pd.cut(x=df['MonetaryValue'], bins=5, labels=[1, 2, 3, 4, 5])


In [158]:
df["Recency_score"] = df["Recency_score"].astype(int)
df["Frequency_score"] = df["Frequency_score"].astype(int)
df["Monetary_score"] = df["Monetary_score"].astype(int)
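
Since `pd.cut` with `bins=5` makes equal-width intervals over the observed range, one very large value can push most customers into the lowest score. The sketch below contrasts this with `pd.qcut`, which makes bins with roughly equal counts. The `spend` values are made up, and using `qcut` here is only an illustration, not what the notebook does.

```python
import pandas as pd

# Hypothetical spend values with one large outlier.
spend = pd.Series([10, 20, 30, 40, 1000], name="MonetaryValue")

# Equal-width bins over [10, 1000]: the four small values share the first bin.
equal_width = pd.cut(spend, bins=5, labels=[1, 2, 3, 4, 5]).astype(int)

# Quantile-based bins: each bin holds about the same number of values.
equal_count = pd.qcut(spend, q=5, labels=[1, 2, 3, 4, 5]).astype(int)

print(pd.DataFrame({"spend": spend, "cut": equal_width, "qcut": equal_count}))
```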

## RFM Value Segmentation

We sum the three scores into a single RFM score and divide it into three segments, namely “Low-Value”, “Mid-Value”, and “High-Value”.

In [159]:
df['RFM_Score'] = df['Recency_score'] + df['Frequency_score'] + df['Monetary_score']

In [160]:
df["Value_Segment"] = pd.qcut(df["RFM_Score"], q=3, labels=['Low-Value', 'Mid-Value', 'High-Value'])
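
`pd.qcut` with `q=3` places the cut points at the tertiles of the score, but because RFM scores are integers with many ties, the resulting segments need not be the same size. To see where the cut points fall, `retbins=True` returns the bin edges alongside the labels. The `scores` series below is a made-up example.

```python
import pandas as pd

# Hypothetical RFM scores, for illustration only.
scores = pd.Series([3, 4, 4, 5, 6, 7, 8, 9, 10], name="RFM_Score")

# retbins=True also returns the bin edges that qcut chose.
segments, edges = pd.qcut(
    scores,
    q=3,
    labels=["Low-Value", "Mid-Value", "High-Value"],
    retbins=True,
)

print(edges)                    # lower bound, two tertile cut points, upper bound
print(segments.value_counts())  # segment sizes
```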

## Visualisation

In [161]:
segment_counts = df["Value_Segment"].value_counts().reset_index()
segment_counts.columns = ["Value_Segment","Frequency"]
segment_counts

|  | Value_Segment | Frequency |
|---|---|---|
| 0 | Low-Value | 435 |
| 1 | Mid-Value | 386 |
| 2 | High-Value | 179 |


In [162]:
pastel_colors = px.colors.qualitative.Pastel

# Create the bar chart
fig_segment_dist = px.bar(segment_counts, x='Value_Segment', y='Frequency',
                          color='Value_Segment', color_discrete_sequence=pastel_colors,
                          title='RFM Value Segment Distribution')

# Update the layout
fig_segment_dist.update_layout(xaxis_title='RFM Value Segment',
                              yaxis_title='Frequency',
                              showlegend=False)

# Show the figure
fig_segment_dist.show()

## RFM_Customer_Segments

In [163]:
# Map the total RFM score (3 to 15) to a named customer segment
def Segments(value):
  if value >= 9:
    return 'Champions'
  elif value >= 6:
    return 'Potential Loyalists'
  elif value >= 5:
    return 'At Risk Customers'
  elif value >= 4:
    return 'Cannot Lose'
  elif value >= 3:
    return 'Lost'

# Apply the function to the RFM_Score column
df["RFM_Customer_Segments"] = df["RFM_Score"].apply(Segments)
# Equivalent row-wise form:
# df["RFM_Customer_Segments2"] = df.apply(lambda x: Segments(x["RFM_Score"]), axis=1)

df.head()

|  | CustomerID | PurchaseDate | TransactionAmount | ProductInformation | OrderID | Location | Recency | Frequency | MonetaryValue | Recency_score | Frequency_score | Monetary_score | RFM_Score | Value_Segment | RFM_Customer_Segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8814 | 2023-04-11 | 943.31 | Product C | 890075 | Tokyo | 306 | 1 | 943.31 | 1 | 1 | 2 | 4 | Low-Value | Cannot Lose |
| 1 | 2188 | 2023-04-11 | 463.7 | Product A | 176819 | London | 306 | 1 | 463.7 | 1 | 1 | 1 | 3 | Low-Value | Lost |
| 2 | 4608 | 2023-04-11 | 80.28 | Product A | 340062 | New York | 306 | 1 | 80.28 | 1 | 1 | 1 | 3 | Low-Value | Lost |
| 3 | 2559 | 2023-04-11 | 221.29 | Product A | 239145 | London | 306 | 1 | 221.29 | 1 | 1 | 1 | 3 | Low-Value | Lost |
| 4 | 9482 | 2023-04-11 | 739.56 | Product A | 194545 | Paris | 306 | 1 | 739.56 | 1 | 1 | 2 | 4 | Low-Value | Cannot Lose |
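
The same score-to-segment thresholds can be expressed as explicit bin edges with `pd.cut`, which avoids a Python-level function call per row. This is only a sketch of an equivalent mapping: the bin edges below are derived from the thresholds in `Segments` (3, 4, 5, 6 to 8, 9 and above), and `rfm_score` is a made-up series.

```python
import pandas as pd

# Hypothetical RFM scores covering every threshold.
rfm_score = pd.Series([3, 4, 5, 6, 8, 9, 15], name="RFM_Score")

# Right-closed intervals: (2,3], (3,4], (4,5], (5,8], (8,15]
segment = pd.cut(
    rfm_score,
    bins=[2, 3, 4, 5, 8, 15],
    labels=[
        "Lost",
        "Cannot Lose",
        "At Risk Customers",
        "Potential Loyalists",
        "Champions",
    ],
)
print(pd.DataFrame({"RFM_Score": rfm_score, "Segment": segment}))
```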


## RFM Analysis

In [164]:
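# Count rows per (Value_Segment, RFM_Customer_Segments) pair.
# observed=False keeps unobserved category combinations as zero-count rows.
segment_product_counts = df.groupby(['Value_Segment', 'RFM_Customer_Segments'], observed=False)['CustomerID'].count().reset_index(name='Count')
segment_product_counts = segment_product_counts.sort_values('Count', ascending=False)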
segment_product_counts

|  | Value_Segment | RFM_Customer_Segments | Count |
|---|---|---|---|
| 9 | Mid-Value | Potential Loyalists | 386 |
| 0 | Low-Value | At Risk Customers | 180 |
| 1 | Low-Value | Cannot Lose | 173 |
| 14 | High-Value | Potential Loyalists | 117 |
| 3 | Low-Value | Lost | 82 |
| 12 | High-Value | Champions | 62 |
| 2 | Low-Value | Champions | 0 |
| 4 | Low-Value | Potential Loyalists | 0 |
| 5 | Mid-Value | At Risk Customers | 0 |
| 6 | Mid-Value | Cannot Lose | 0 |
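
The zero-count rows appear because `Value_Segment` is a categorical column: when grouping on a categorical with `observed=False`, pandas reports every combination of group keys, including ones that never occur. The sketch below, on a made-up `sample` frame, shows how `observed=True` keeps only the combinations present in the data.

```python
import pandas as pd

# Hypothetical data; Value_Segment is categorical, as qcut produces.
sample = pd.DataFrame({
    "Value_Segment": pd.Categorical(
        ["Low-Value", "Low-Value", "High-Value"],
        categories=["Low-Value", "Mid-Value", "High-Value"],
    ),
    "RFM_Customer_Segments": ["Lost", "Lost", "Champions"],
    "CustomerID": [1, 2, 3],
})

keys = ["Value_Segment", "RFM_Customer_Segments"]

# Every category combination, with zeros for pairs that never occur.
all_pairs = sample.groupby(keys, observed=False)["CustomerID"].count()

# Only the pairs that actually appear in the data.
seen_pairs = sample.groupby(keys, observed=True)["CustomerID"].count()

print(all_pairs)
print(seen_pairs)
```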


In [165]:
# Treemap of the count by Value Segment and RFM Customer Segments
fig_treemap_segment_product = px.treemap(segment_product_counts,
                                         path=['Value_Segment', 'RFM_Customer_Segments'],
                                         values='Count',
                                         color='Value_Segment', color_discrete_sequence=px.colors.qualitative.Pastel,
                                         title='RFM_Customer_Segments by Value - Tree map')
fig_treemap_segment_product.show()

In [166]:
# Bar plot of counts per value segment, coloured by customer segment
fig = px.bar(segment_product_counts, x='Value_Segment', y='Count',
             color='RFM_Customer_Segments',
             color_discrete_sequence=px.colors.qualitative.Pastel,
             title='RFM_Customer_Segments by Value - Bar Plot')
fig.show()

In [167]:
def seg_dist(segment):
  segment_df = df[df['RFM_Customer_Segments'] == segment]

  fig = go.Figure()
  fig.add_trace(go.Box(y=segment_df['Recency_score'], name='Recency'))
  fig.add_trace(go.Box(y=segment_df['Frequency_score'], name='Frequency'))
  fig.add_trace(go.Box(y=segment_df['Monetary_score'], name='Monetary'))

  fig.update_layout(title='Distribution of RFM Values within Segment',
                    yaxis_title='RFM Value',
                    showlegend=True)

  return fig



In [168]:
seg_dist('Champions')

In [169]:
def seg_corr(segment):
  corr_df = df[df['RFM_Customer_Segments'] == segment]

  correlation_matrix = corr_df[['Recency_score', 'Frequency_score', 'Monetary_score']].corr()

  # Visualize the correlation matrix using a heatmap
  fig_heatmap = go.Figure(data=go.Heatmap(
                    z=correlation_matrix.values,
                    x=correlation_matrix.columns,
                    y=correlation_matrix.columns,
                    colorscale='RdBu',
                    colorbar=dict(title='Correlation')))

  fig_heatmap.update_layout(title='Correlation Matrix of RFM Values within '+ segment)

  return fig_heatmap

In [170]:
seg_corr('Champions')
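
One thing to watch for when reading the heatmap: if a score is constant within a segment (for example, every row has the same `Frequency_score`), its correlation with the other scores is undefined and `DataFrame.corr()` returns `NaN` for it. The sketch below uses made-up scores for a hypothetical segment to show the effect.

```python
import pandas as pd

# Hypothetical scores within one segment.
segment_scores = pd.DataFrame({
    "Recency_score": [5, 4, 5, 3],
    "Frequency_score": [1, 1, 1, 1],  # constant, so it has zero variance
    "Monetary_score": [4, 5, 5, 3],
})

corr = segment_scores.corr()
print(corr)

# Cells involving the constant column are NaN.
print(corr["Frequency_score"].isna())
```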

In [171]:
segment_counts = df['RFM_Customer_Segments'].value_counts()

In [172]:
fig = go.Figure(go.Bar(x = segment_counts.index,y = segment_counts.values))

# Update the layout
fig.update_layout(title='Comparison of RFM Segments',
                  xaxis_title='RFM Segments',
                  yaxis_title='Number of Customers',
                  showlegend=False)

fig.show()

In [175]:
df

|  | CustomerID | PurchaseDate | TransactionAmount | ProductInformation | OrderID | Location | Recency | Frequency | MonetaryValue | Recency_score | Frequency_score | Monetary_score | RFM_Score | Value_Segment | RFM_Customer_Segments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8814 | 2023-04-11 | 943.31 | Product C | 890075 | Tokyo | 306 | 1 | 943.31 | 1 | 1 | 2 | 4 | Low-Value | Cannot Lose |
| 1 | 2188 | 2023-04-11 | 463.70 | Product A | 176819 | London | 306 | 1 | 463.70 | 1 | 1 | 1 | 3 | Low-Value | Lost |
| 2 | 4608 | 2023-04-11 | 80.28 | Product A | 340062 | New York | 306 | 1 | 80.28 | 1 | 1 | 1 | 3 | Low-Value | Lost |
| 3 | 2559 | 2023-04-11 | 221.29 | Product A | 239145 | London | 306 | 1 | 221.29 | 1 | 1 | 1 | 3 | Low-Value | Lost |
| 4 | 9482 | 2023-04-11 | 739.56 | Product A | 194545 | Paris | 306 | 1 | 739.56 | 1 | 1 | 2 | 4 | Low-Value | Cannot Lose |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 2970 | 2023-06-10 | 759.62 | Product B | 275284 | London | 246 | 1 | 759.62 | 5 | 1 | 2 | 8 | High-Value | Potential Loyalists |
| 996 | 6669 | 2023-06-10 | 941.50 | Product C | 987025 | New York | 246 | 1 | 941.50 | 5 | 1 | 2 | 8 | High-Value | Potential Loyalists |
| 997 | 8836 | 2023-06-10 | 545.36 | Product C | 512842 | London | 246 | 1 | 545.36 | 5 | 1 | 2 | 8 | High-Value | Potential Loyalists |
| 998 | 1440 | 2023-06-10 | 729.94 | Product B | 559753 | Paris | 246 | 1 | 729.94 | 5 | 1 | 2 | 8 | High-Value | Potential Loyalists |


In [176]:
# Reshape to long format: one row per (segment, score type, score value)
Score = df[['RFM_Customer_Segments','Recency_score', 'Frequency_score', 'Monetary_score']].melt('RFM_Customer_Segments', value_name='Score')
Score.rename(columns={"variable":"Score_segment"}, inplace=True)

# Mean score per segment and score type
Score_mean = Score.groupby(['RFM_Customer_Segments',"Score_segment"])['Score'].mean().reset_index()


In [179]:
fig = px.bar(Score_mean, x="RFM_Customer_Segments", y="Score",
             color="Score_segment", color_discrete_sequence=px.colors.qualitative.Pastel,
             barmode = 'group')
fig.show()
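
The long format built with `melt` suits `px.bar`, which wants one row per bar. For reading the numbers directly, a wide table with one row per segment can be simpler. The sketch below computes the same per-segment means without reshaping; the `scores` frame and `wide_means` name are made up for the example.

```python
import pandas as pd

# Hypothetical per-row scores, for illustration only.
scores = pd.DataFrame({
    "RFM_Customer_Segments": ["Champions", "Champions", "Lost"],
    "Recency_score": [5, 4, 1],
    "Frequency_score": [2, 1, 1],
    "Monetary_score": [3, 4, 1],
})

score_cols = ["Recency_score", "Frequency_score", "Monetary_score"]

# One row per segment, one column per score type.
wide_means = scores.groupby("RFM_Customer_Segments")[score_cols].mean()
print(wide_means)
```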