# Project 1, Part 6, Identifying Best Customers


# 1.6 Ideas on how the sales data can be used to help identify best customers

The data science team would like to know your best ideas on how the sales data can be used to help identify the company's best customers.



They are going to start with the most common and most basic model, known as RFM, which consists of three dimensions:

* R - Recency - How recently did the customer purchase?

* F - Frequency - How often do they purchase?

* M - Monetary Value - How much do they spend?



At first glance, it's pretty easy to think of a simple query for each dimension.

However, after some thought, it's not quite so easy. The problem is very open-ended, with a lot of grey areas and no single right or wrong answer (just like 99% of data science and AI!). Each dimension can be measured in several ways, some of them fairly complex.

The data science team also has to come up with a way to synthesize the 3 dimensions into a single customer value for each customer.



The data science team would like for you to present your ideas in the form of a title and 4 paragraphs as follows:

* Title - A title describing what you will be explaining

* Recency - A brief paragraph explaining your ideas on how the data can be used to determine recency.  

* Frequency - A brief paragraph explaining your ideas on how the data can be used to determine frequency.

* Monetary Value - A brief paragraph explaining your ideas on how the data can be used to determine monetary value.

* Synthesis - A brief paragraph explaining your ideas on how to synthesize the 3 dimensions of recency, frequency, and monetary value into a customer value for each customer and how to determine who the best customers are.



Put the title and all 4 paragraphs in a single markdown cell.

Note that you do not write code for this, only English-language descriptions of your ideas.



# Detailed Guide to RFM Analysis for Customer Value Estimation

## Recency
Recency measures how recently a customer has made a purchase. Computing the raw (unnormalized) recency score takes three steps: 1) define a reference date, such as the last date in the sales data or the current date; 2) for each customer, find the most recent purchase date in the sales data; and 3) subtract that date from the reference date to get the number of days since the last purchase. This is the raw recency score for each customer. For example, suppose we have a data set of gourmet food sales from January 1, 2020 to December 31, 2020, and we use December 31, 2020 as the reference date. A customer who bought gourmet food on January 15, February 10, and November 20 in 2020 made their most recent purchase on November 20, so their raw recency score is December 31 - November 20 = 41 days. Depending on the distribution of raw recency scores, we can then segment customers using a quintile approach and assign each customer a recency rating from one to five based on the quintile they fall into. A lower rating indicates a longer time since the last purchase, and a higher rating indicates a shorter time since the last purchase.
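The three steps and the quintile rating can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `raw_recency` and `quintile_ratings`, the rank-based way of splitting customers into five bands, and the five-customer sample are all hypothetical and not part of the sales data.

```python
from datetime import date

def raw_recency(purchase_dates, reference_date):
    """Days from the customer's most recent purchase to the reference date."""
    return (reference_date - max(purchase_dates)).days

def quintile_ratings(raw_scores, higher_is_better=True):
    """Rank customers and split the ranking into five equal bands rated 1-5.

    Ties are broken by input order here; a real analysis would need a
    deliberate tie-handling rule.
    """
    # Put the worst raw score first so that it receives rating 1.
    ordered = sorted(raw_scores, key=raw_scores.get, reverse=not higher_is_better)
    n = len(ordered)
    return {cust: 1 + (5 * rank) // n for rank, cust in enumerate(ordered)}

reference = date(2020, 12, 31)
example_dates = [date(2020, 1, 15), date(2020, 2, 10), date(2020, 11, 20)]
print(raw_recency(example_dates, reference))  # 41, matching the worked example

# Hypothetical raw recency scores (days since last purchase) for five customers.
days_since = {"A": 41, "B": 5, "C": 100, "D": 200, "E": 300}
# Fewer days is better, so the smallest raw score gets the highest rating.
print(quintile_ratings(days_since, higher_is_better=False))
```

Passing `higher_is_better=False` is what flips the scale so that a short time since the last purchase maps to a high rating.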

## Frequency
Frequency measures how often a customer has made a purchase. A raw frequency score based only on counting a customer's purchases in the chosen period loses information about when those purchases happened, so I believe an exponential weighted average (EWA) is a better approach. Computing the raw (unnormalized) frequency score requires: 1) defining a time interval, such as weeks or months, and counting the purchases made in each interval; and 2) treating those counts as a time series and computing the exponential weighted average of the purchase frequency at each time step. EWA gives more weight to recent time steps than to early ones, which differentiates customers with similar overall purchase counts based on their purchase history. Depending on the distribution of raw frequency scores, we can then segment customers using a quintile approach and assign each customer a frequency rating from one to five based on the quintile they fall into. A lower rating indicates a lower purchase frequency or purchases that are mostly old, whereas a higher rating indicates a higher purchase frequency or purchases that are mostly recent.
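A minimal sketch of the EWA idea follows, assuming monthly intervals and the common recursive form of the average, s_t = alpha * c_t + (1 - alpha) * s_(t-1), started at the first count. The smoothing factor `alpha = 0.5`, the helper names, and the two sample customers are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date

def monthly_counts(purchase_dates, year):
    """Number of purchases in each calendar month of the given year."""
    counts = Counter(d.month for d in purchase_dates if d.year == year)
    return [counts.get(month, 0) for month in range(1, 13)]

def ewa_frequency(counts, alpha=0.5):
    """Exponentially weighted average of per-period purchase counts.

    Later periods carry more weight; the final smoothed value is used
    as the raw frequency score.
    """
    smoothed = counts[0]
    for count in counts[1:]:
        smoothed = alpha * count + (1 - alpha) * smoothed
    return smoothed

# The customer from the recency example: purchases in January, February, November.
example_dates = [date(2020, 1, 15), date(2020, 2, 10), date(2020, 11, 20)]
print(monthly_counts(example_dates, 2020))

# Two hypothetical customers with the same total (6 purchases over 4 periods).
early_buyer = [3, 3, 0, 0]
late_buyer = [0, 0, 3, 3]
print(ewa_frequency(early_buyer), ewa_frequency(late_buyer))
```

A plain count would score both hypothetical customers identically, while the EWA scores the customer with recent purchases higher.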

## Monetary
Monetary value measures how much a customer has spent. There are a few ways to compute the raw monetary score: 1) the total amount spent by each customer in a given period, such as a year or a quarter; 2) average order value (AOV), which is the total amount spent divided by the number of orders; and 3) customer lifetime value (CLV), which is the projected future revenue from a customer, based on the average revenue the customer generates and the total average profit. I believe CLV is a complex but powerful way of representing the raw monetary score. Depending on the distribution of raw monetary scores, we can then segment customers using a quintile approach and assign each customer a monetary rating from one to five based on the quintile they fall into. A lower rating indicates that the expected revenue from a customer is low, whereas a higher rating indicates that the expected revenue from a customer is high.
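The first two options, total spend and AOV, can be sketched as below. The `(customer, order_id, amount)` row layout and the sample rows are assumptions, not the actual sales schema. CLV is left out because it depends on a projection model that is not specified here.

```python
from collections import defaultdict

def monetary_scores(order_lines):
    """Total spend and average order value (AOV) per customer.

    order_lines: iterable of (customer, order_id, amount) tuples, where
    one order may span several lines.
    """
    totals = defaultdict(float)
    order_ids = defaultdict(set)
    for customer, order_id, amount in order_lines:
        totals[customer] += amount
        order_ids[customer].add(order_id)
    return {
        customer: {
            "total": totals[customer],
            "aov": totals[customer] / len(order_ids[customer]),
        }
        for customer in totals
    }

# Hypothetical order lines: customer A has two orders, customer B has one.
sample_lines = [
    ("A", 1, 20.0),
    ("A", 1, 10.0),
    ("A", 2, 30.0),
    ("B", 3, 100.0),
]
scores = monetary_scores(sample_lines)
print(scores)
```

Counting distinct order IDs rather than rows keeps a multi-line order from deflating the AOV.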

## Synthesis
The above suggestions for computing recency, frequency, and monetary value ensure that all three are on the same scale (1-5) with the same semantic meaning (1 is low and 5 is high), which makes them easier to synthesize into a single customer value. I believe the frequency score should have the highest weight, as it is the clearest metric for identifying the company's best and most loyal customers. This should be followed by monetary value, since in the end revenue and profit are what make a company successful. The recency score should have the lowest weight, as it provides the least information about customer loyalty. The suggested weights for the RFM value are [Recency: 0.1, Frequency: 0.6, Monetary: 0.3]; because they sum to 1, the customer value stays in the range [1, 5]. A higher customer value indicates a better customer, and a lower value indicates the opposite. For example, if Customer A has an RFM value of 432 (R = 4, F = 3, M = 2), then its customer value is 4 * 0.1 + 3 * 0.6 + 2 * 0.3 = 2.8. If Customer B has an RFM value of 342, then its customer value is 3 * 0.1 + 4 * 0.6 + 2 * 0.3 = 3.3, so Customer B ranks above Customer A.
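The weighted sum can be checked with a short sketch. The weights are the ones suggested above; the function name `customer_value` and the dictionary of ratings are illustrative assumptions.

```python
# Suggested weights from the write-up; they sum to 1, so the result stays in [1, 5].
WEIGHTS = {"recency": 0.1, "frequency": 0.6, "monetary": 0.3}

def customer_value(recency, frequency, monetary, weights=WEIGHTS):
    """Weighted average of the three 1-5 ratings."""
    return (
        recency * weights["recency"]
        + frequency * weights["frequency"]
        + monetary * weights["monetary"]
    )

# Ratings as (R, F, M): Customer A is "432" and Customer B is "342".
ratings = {"A": (4, 3, 2), "B": (3, 4, 2)}
values = {cust: customer_value(*rfm) for cust, rfm in ratings.items()}

# Rank customers from best to worst by customer value.
ranking = sorted(values, key=values.get, reverse=True)
for cust in ranking:
    print(cust, f"{values[cust]:.1f}")
```

Sorting by the resulting value gives a simple ranking, from which the top band (for example, the highest-valued customers above a chosen cutoff) could be labeled as the best customers.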
