---
title: "RFM - Customer Level Data"
author: "Aravind Hebbali"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to RFM}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
## Introduction
```{r, echo=FALSE, message=FALSE}
library(rfm)
library(knitr)
library(kableExtra)
library(magrittr)
library(dplyr)
library(ggplot2)
library(DT)
library(grDevices)
library(RColorBrewer)
options(knitr.table.format = "html")
options(tibble.width = Inf)
```
**RFM** (recency, frequency, monetary) analysis is a behavior-based technique used to segment customers by examining their transaction history:

- how recently a customer has purchased (recency)
- how often they purchase (frequency)
- how much the customer spends (monetary)

It is based on the marketing axiom that **80% of your business comes from 20% of your customers**. RFM helps to identify customers who are more likely to respond to promotions by segmenting them into various categories.
## Data
To calculate the RFM score for each customer we need customer-level data, which should include the following:

- a unique customer id
- number of transactions/orders
- total revenue from the customer
- number of days since the last visit

`rfm` includes a sample data set `rfm_data_customer` which includes the above details:

```{r rfm_data_orders}
rfm_data_customer
```
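If only order-level records are available, the columns listed above can be derived by aggregating per customer. The following is a minimal base R sketch on made-up data: the `orders` data frame and its values are hypothetical, and the output column names simply mirror the ones passed to `rfm_table_customer()` in this vignette.

```r
# Toy order-level data (made-up values, not taken from the rfm package)
orders <- data.frame(
  customer_id = c("A", "A", "B", "C", "C", "C"),
  order_date  = as.Date(c("2006-11-02", "2006-12-20", "2006-06-15",
                          "2006-01-10", "2006-09-05", "2006-12-30")),
  revenue     = c(120, 80, 45, 200, 150, 60)
)
analysis_date <- as.Date("2007-01-01")

# Days between each order and the analysis date
days_since <- as.numeric(analysis_date - orders$order_date)

# One row per customer: order count, total revenue, days since last order
ids <- sort(unique(orders$customer_id))
customer_level <- data.frame(
  customer_id      = ids,
  number_of_orders = as.vector(tapply(orders$revenue, orders$customer_id, length)[ids]),
  revenue          = as.vector(tapply(orders$revenue, orders$customer_id, sum)[ids]),
  recency_days     = as.vector(tapply(days_since, orders$customer_id, min)[ids])
)
customer_level
```

Taking the minimum of `days_since` per customer is equivalent to measuring from the most recent order.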
## RFM Score
So how is the RFM score computed for each customer? The steps below explain the process:

- A recency score is assigned to each customer based on the date of the most recent purchase. The score is generated by binning the recency values into a number of categories (default is 5). For example, if you use four categories, the customers with the most recent purchase dates receive a recency ranking of 4, and those with purchase dates in the distant past receive a recency ranking of 1.
- A frequency ranking is assigned in a similar way. Customers with high purchase frequency are assigned a higher score (4 or 5) and those with the lowest frequency are assigned a score of 1.
- A monetary score is assigned on the basis of the total revenue generated by the customer in the period under consideration for the analysis. Customers with the highest revenue/order amount are assigned a higher score while those with the lowest revenue are assigned a score of 1.
- A fourth score, the RFM score, is generated by concatenating the three individual scores into a single value.

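The scoring steps above can be sketched in a few lines of base R. This is only an illustration on made-up data: the `score_bins()` helper is hypothetical and uses equal-sized rank-based bins, which is an assumption and not necessarily how `rfm` computes its bins.

```r
# Toy customer-level data (made-up values, not from rfm_data_customer)
customers <- data.frame(
  customer_id  = c("A", "B", "C", "D", "E", "F", "G", "H"),
  recency_days = c(5, 40, 200, 15, 365, 90, 30, 120),
  orders       = c(12, 3, 1, 8, 2, 5, 9, 4),
  revenue      = c(900, 150, 40, 600, 75, 300, 720, 210)
)

# Hypothetical helper: rank the values and split the ranks into n_bins
# equal-sized groups, so the largest values get the highest score
score_bins <- function(x, n_bins = 4) {
  ranks <- rank(x, ties.method = "first")
  as.integer(ceiling(ranks * n_bins / length(x)))
}

n_bins <- 4
# Fewer days since the last visit is better, so negate recency before binning
customers$recency_score   <- score_bins(-customers$recency_days, n_bins)
customers$frequency_score <- score_bins(customers$orders, n_bins)
customers$monetary_score  <- score_bins(customers$revenue, n_bins)

# Concatenate the three scores into one value, e.g. scores 4, 4, 4 give 444
customers$rfm_score <- with(
  customers,
  recency_score * 100 + frequency_score * 10 + monetary_score
)
customers
```

With four bins and eight customers, each score value is assigned to exactly two customers.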
The customers with the highest RFM scores are most likely to respond to an offer. Now that we have understood how the RFM score is computed, it is time to put it into practice. Use `rfm_table_customer()` to generate the score for each customer from the sample data set `rfm_data_customer`.

`rfm_table_customer()` takes the following inputs:

- `data`: a data set with
    - unique customer id
    - number of transactions/orders
    - number of days since the last visit
    - and total revenue
- `customer_id`: name of the customer id column
- `n_transactions`: name of the column with the number of transactions/orders
- `recency_days`: name of the column with the number of days since the last visit
- `total_revenue`: name of the total revenue column
- `analysis_date`: date of analysis
- `recency_bins`: number of rankings for recency score (default is 5)
- `frequency_bins`: number of rankings for frequency score (default is 5)
- `monetary_bins`: number of rankings for monetary score (default is 5)

## RFM Table
```{r rfm_table_order, eval=FALSE}
analysis_date <- lubridate::as_date('2007-01-01', tz = 'UTC')
rfm_result <- rfm_table_customer(rfm_data_customer, customer_id, number_of_orders,
                                 recency_days, revenue, analysis_date)
rfm_result
```
```{r rfm_table_order2, eval=TRUE, echo=FALSE}
analysis_date <- lubridate::as_date('2007-01-01', tz = 'UTC')
rfm_result <- rfm_table_customer(rfm_data_customer, customer_id, number_of_orders,
                                 recency_days, revenue, analysis_date)
rfm_result %>%
  use_series(rfm) %>%
  slice(1:10) %>%
  kable() %>%
  kable_styling()
```
`rfm_table_customer()` returns the following columns, as seen in the table above:

- `customer_id`: unique customer id
- `date_most_recent`: date of most recent visit
- `recency_days`: days since the most recent visit
- `transaction_count`: number of transactions of the customer
- `amount`: total revenue generated by the customer
- `recency_score`: recency score of the customer
- `frequency_score`: frequency score of the customer
- `monetary_score`: monetary score of the customer
- `rfm_score`: RFM score of the customer
## Heat Map
The heat map shows the average monetary value for different categories of recency and frequency scores. Higher recency and frequency scores are associated with higher average monetary value, as indicated by the darker areas in the heat map.
```{r heatmap, fig.align='center', fig.width=8, fig.height=6}
rfm_heatmap(rfm_result)
```
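The quantity shown in the heat map is the mean monetary value within each combination of recency and frequency score. A minimal base R sketch on made-up data follows; the `scored` data frame is hypothetical and its column names mirror the `rfm_table_customer()` output described earlier.

```r
# Toy scored data (made-up values)
scored <- data.frame(
  recency_score   = c(1, 1, 2, 2, 2, 3, 3, 3),
  frequency_score = c(1, 1, 1, 2, 2, 3, 3, 3),
  amount          = c(50, 70, 120, 200, 240, 500, 450, 550)
)

# Mean monetary value for each (frequency, recency) score combination;
# rows are frequency scores, columns are recency scores, and combinations
# with no customers show up as NA
heat <- with(scored, tapply(amount, list(frequency_score, recency_score), mean))
heat
```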
## Bar Chart
Use `rfm_bar_chart()` to generate the distribution of monetary scores for the different combinations of frequency and recency scores.
```{r barchart, fig.align='center', fig.width=8, fig.height=6}
rfm_bar_chart(rfm_result)
```
## Histogram
Use `rfm_histograms()` to examine the relative distribution of

- monetary value (total revenue generated by each customer)
- recency days (days since the most recent visit for each customer)
- frequency (transaction count for each customer)
```{r rfmhist, fig.align='center', fig.width=8, fig.height=6}
rfm_histograms(rfm_result)
```
## Customers by Orders
Visualize the distribution of customers across orders.
```{r rfmorders, fig.align='center', fig.width=8, fig.height=6}
rfm_order_dist(rfm_result)
```
## Scatter Plots
The best customers are those who:

- bought most recently
- buy most often
- and spend the most

Now let us examine the relationship between the above.
#### Recency vs Monetary Value
Customers who visited more recently generated more revenue than those who visited in the distant past. Customers who visited in the recent past are more likely to return than those who visited a long time ago, as most of the latter would be lost customers. As such, higher revenue would be associated with the most recent visits.
```{r mr, fig.align='center', fig.width=7, fig.height=7}
rfm_rm_plot(rfm_result)
```
#### Frequency vs Monetary Value
As the frequency of visits increases, the revenue generated also increases. Customers who visit more frequently are your champion customers, loyal customers or potential loyalists, and they drive higher revenue.
```{r fm, fig.align='center', fig.width=7, fig.height=7}
rfm_fm_plot(rfm_result)
```
#### Recency vs Frequency
Customers with low frequency visited in the distant past while those with high frequency have visited in the recent past. Again, the customers who visited in the recent past are more likely to return than those who visited a long time ago. As such, higher frequency would be associated with the most recent visits.
```{r fr, fig.align='center', fig.width=7, fig.height=7}
rfm_rf_plot(rfm_result)
```
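The direction of these relationships can also be checked numerically with rank correlations. The sketch below uses made-up values in a hypothetical `toy` data frame, so the resulting coefficients are illustrative only and say nothing about `rfm_data_customer`.

```r
# Toy customer-level values (made up for illustration)
toy <- data.frame(
  recency_days      = c(5, 15, 30, 60, 120, 240, 365),
  transaction_count = c(12, 9, 8, 5, 4, 2, 1),
  amount            = c(900, 720, 610, 380, 300, 90, 40)
)

# Spearman rank correlation for each pair of measures; a negative value for
# recency_days vs amount means more recent visitors (fewer days) spend more
rank_cor <- cor(toy, method = "spearman")
round(rank_cor, 2)
```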
## Segments
Let us classify our customers based on the individual recency, frequency and monetary scores.
```{r segments, echo=FALSE}
# segment names, descriptions and score ranges shown in the table
segment <- c(
  "Champions", "Loyal Customers", "Potential Loyalist",
  "New Customers", "Promising", "Needs Attention",
  "About To Sleep", "At Risk", "Can't Lose Them", "Hibernating",
  "Lost"
)
description <- c(
  "Bought recently, buy often and spend the most",
  "Spend good money. Responsive to promotions",
  "Recent customers, spent good amount, bought more than once",
  "Bought more recently, but not often",
  "Recent shoppers, but haven't spent much",
  "Above average recency, frequency & monetary values",
  "Below average recency, frequency & monetary values",
  "Spent big money, purchased often but long time ago",
  "Made big purchases and often, but long time ago",
  "Low spenders, low frequency, purchased long time ago",
  "Lowest recency, frequency & monetary scores"
)
recency <- c("4 - 5", "2 - 5", "3 - 5", "4 - 5", "3 - 4", "2 - 3", "2 - 3", "<= 2", "<= 1", "1 - 2", "<= 2")
frequency <- c("4 - 5", "3 - 5", "1 - 3", "<= 1", "<= 1", "2 - 3", "<= 2", "2 - 5", "4 - 5", "1 - 2", "<= 2")
monetary <- c("4 - 5", "3 - 5", "1 - 3", "<= 1", "<= 1", "2 - 3", "<= 2", "2 - 5", "4 - 5", "1 - 2", "<= 2")
segments <- tibble(
  Segment = segment, Description = description,
  R = recency, `F` = frequency, M = monetary
)
segments %>%
  kable() %>%
  kable_styling(full_width = TRUE, font_size = 12)
```
## Segmented Customer Data
We can use the segmented data to identify

- best customers
- loyal customers
- at risk customers
- and lost customers

Once we have classified a customer into a particular segment, we can take appropriate action to increase their lifetime value.
```{r criteria, echo=FALSE}
# case_when() assigns the first rule that matches, so the order of the
# rules below determines the segment when several rules overlap
rfm_segments <- rfm_result %>%
  use_series(rfm) %>%
  mutate(
    segment = case_when(
      (recency_score %>% between(4, 5)) & (frequency_score %>% between(4, 5)) &
        (monetary_score %>% between(4, 5)) ~ "Champions",
      (recency_score %>% between(2, 5)) & (frequency_score %>% between(3, 5)) &
        (monetary_score %>% between(3, 5)) ~ "Loyal Customers",
      (recency_score %>% between(3, 5)) & (frequency_score %>% between(1, 3)) &
        (monetary_score %>% between(1, 3)) ~ "Potential Loyalist",
      (recency_score %>% between(4, 5)) & (frequency_score == 1) &
        (monetary_score == 1) ~ "New Customers",
      (recency_score %>% between(3, 4)) & (frequency_score == 1) &
        (monetary_score == 1) ~ "Promising",
      (recency_score %>% between(2, 3)) & (frequency_score %>% between(2, 3)) &
        (monetary_score %>% between(2, 3)) ~ "Needs Attention",
      (recency_score %>% between(2, 3)) & (frequency_score <= 2) &
        (monetary_score <= 2) ~ "About To Sleep",
      (recency_score <= 2) & (frequency_score %>% between(2, 5)) &
        (monetary_score %>% between(2, 5)) ~ "At Risk",
      (recency_score == 1) & (frequency_score %>% between(4, 5)) &
        (monetary_score %>% between(4, 5)) ~ "Can't Lose Them",
      (recency_score %>% between(1, 2)) & (frequency_score %>% between(1, 2)) &
        (monetary_score %>% between(1, 2)) ~ "Hibernating",
      (recency_score <= 2) & (frequency_score <= 2) &
        (monetary_score <= 2) ~ "Lost",
      TRUE ~ "Others"
    )
  ) %>%
  select(
    customer_id, segment, rfm_score, transaction_count, recency_days,
    amount
  )

# interactive, filterable table of the segmented customers
rfm_segments %>%
  datatable(
    filter = "top",
    options = list(pageLength = 5, autoWidth = TRUE),
    colnames = c(
      "Customer", "Segment", "RFM",
      "Orders", "Recency", "Total Spend"
    )
  )
```
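Segment rules of this kind can be expressed as ordered conditions where the first match wins. The sketch below applies three of the rules from the segment table to made-up scores using base R only; the `in_range()` helper and the toy data are hypothetical, and the remaining segments are omitted for brevity.

```r
# Toy scores (made-up values) for four customers
scores <- data.frame(
  customer_id     = c("A", "B", "C", "D"),
  recency_score   = c(5, 4, 1, 1),
  frequency_score = c(5, 1, 5, 1),
  monetary_score  = c(4, 1, 4, 2)
)

# Hypothetical helper: TRUE where x lies in the closed interval [lo, hi]
in_range <- function(x, lo, hi) x >= lo & x <= hi

# Rules are checked in order; the first matching rule decides the segment
# and customers matching none of them fall through to "Others"
scores$segment <- with(scores, ifelse(
  in_range(recency_score, 4, 5) & in_range(frequency_score, 4, 5) &
    in_range(monetary_score, 4, 5), "Champions",
  ifelse(
    in_range(recency_score, 4, 5) & frequency_score == 1 & monetary_score == 1,
    "New Customers",
    ifelse(
      recency_score == 1 & in_range(frequency_score, 4, 5) &
        in_range(monetary_score, 4, 5), "Can't Lose Them",
      "Others"
    )
  )
))
scores
```

Because the first match wins, a narrow rule only takes effect if it is checked before any broader rule that covers the same score ranges.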
## Segment Size
Now that we have defined and segmented our customers, let us examine the distribution of customers across the segments. Ideally, we should have very few or no customers in segments such as `At Risk` or `Needs Attention`.
```{r rfm_customers}
rfm_segments %>%
  count(segment) %>%
  arrange(desc(n)) %>%
  rename(Segment = segment, Count = n)
```
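Counts are easier to compare across data sets when expressed as shares. A small base R sketch, using a made-up vector of segment labels (the `segment` vector below is hypothetical and unrelated to `rfm_segments`):

```r
# Toy segment labels (made up for illustration)
segment <- c("Champions", "Loyal Customers", "At Risk", "Loyal Customers",
             "Hibernating", "Loyal Customers", "At Risk", "Champions")

# Count customers per segment, largest first, and convert to percentages
counts <- sort(table(segment), decreasing = TRUE)
share  <- round(100 * prop.table(counts), 1)

data.frame(
  Segment = names(counts),
  Count   = as.vector(counts),
  Share   = as.vector(share)
)
```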
## Segments
We can also examine the median recency, frequency and monetary value across segments to ensure that the
logic used for customer classification is sound and practical.
### Median Recency
```{r avg_recency, fig.align='center', fig.height=5, fig.width=6}
# median days since the last visit for each segment
data <-
  rfm_segments %>%
  group_by(segment) %>%
  summarize(median_recency = median(recency_days)) %>%
  arrange(median_recency)

n_fill <- nrow(data)

ggplot(data, aes(segment, median_recency)) +
  geom_bar(stat = "identity", fill = brewer.pal(n = n_fill, name = "Set1")) +
  xlab("Segment") + ylab("Median Recency") +
  ggtitle("Median Recency by Segment") +
  coord_flip() +
  theme(
    plot.title = element_text(hjust = 0.5)
  )
```
### Median Frequency
```{r avg_frequency, fig.align='center', fig.height=5, fig.width=6}
# median number of transactions for each segment
data <-
  rfm_segments %>%
  group_by(segment) %>%
  summarize(median_frequency = median(transaction_count)) %>%
  arrange(median_frequency)

n_fill <- nrow(data)

ggplot(data, aes(segment, median_frequency)) +
  geom_bar(stat = "identity", fill = brewer.pal(n = n_fill, name = "Set1")) +
  xlab("Segment") + ylab("Median Frequency") +
  ggtitle("Median Frequency by Segment") +
  coord_flip() +
  theme(
    plot.title = element_text(hjust = 0.5)
  )
```
### Median Monetary Value
```{r avg_monetary, fig.align='center', fig.height=5, fig.width=6}
# median total revenue for each segment
data <-
  rfm_segments %>%
  group_by(segment) %>%
  summarize(median_monetary = median(amount)) %>%
  arrange(median_monetary)

n_fill <- nrow(data)

ggplot(data, aes(segment, median_monetary)) +
  geom_bar(stat = "identity", fill = brewer.pal(n = n_fill, name = "Set1")) +
  xlab("Segment") + ylab("Median Monetary Value") +
  ggtitle("Median Monetary Value by Segment") +
  coord_flip() +
  theme(
    plot.title = element_text(hjust = 0.5)
  )
```
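The three medians can also be collected into a single table, which makes it easier to compare segments side by side. A minimal base R sketch on made-up data follows; the `toy_segments` data frame is hypothetical, with column names mirroring `rfm_segments`.

```r
# Toy segmented data (made-up values)
toy_segments <- data.frame(
  segment           = c("Champions", "Champions", "Champions",
                        "At Risk", "At Risk", "At Risk"),
  recency_days      = c(5, 10, 20, 200, 250, 300),
  transaction_count = c(12, 9, 10, 6, 4, 5),
  amount            = c(900, 700, 800, 400, 350, 500)
)

# Median of each measure per segment, one row per segment
medians <- aggregate(
  cbind(recency_days, transaction_count, amount) ~ segment,
  data = toy_segments, FUN = median
)
medians
```

In a sound classification, a segment such as `Champions` should show a low median recency together with high median frequency and monetary value.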
## References
- Jiawei Han (University of Illinois at Urbana-Champaign) and Micheline Kamber. Data Mining: Concepts and Techniques, Second Edition.
- https://joaocorreia.io/blog/rfm-analysis-increase-sales-by-segmenting-your-customers.html
- http://www.sciencedirect.com/science/article/pii/S1877050910003868