<font color = '#BB6688'>
    
# **Customer Segmentation with RFM**

![](http://www.blastanalytics.com/wp-content/uploads/2016/07/rfm-analysis-blog-header-nofade.jpg)

<font color = '#BB6688'>
    
   
Contents:

1. [RFM Analysis](#1)
    * [Problems to be solved](#2)
2. [Data Understanding](#3)
    * [Import Libraries](#4)
    * [Load Data](#5)
    * [Data Analysis](#6)
    * [Data Preprocessing](#7)
3. [RFM Segments](#8)
    

<font color = '#000'>
    <a id = "1"></a><br>
    
#  **RFM Analysis**

RFM analysis is a technique for segmenting customers by their purchasing behavior. It helps shape marketing and sales strategies around customers' buying habits. RFM is made up of the initials of three metrics: Recency, Frequency and Monetary. Each one captures a basic customer trait. An RFM analysis shows which customers are most valuable to your business: those who bought most recently, buy most often, and spend the most.


<font color = '#000'>
   <a id = "2"></a><br>
    
## Problems to be solved
An e-commerce company wants to segment its customers and define marketing strategies for each segment.
RFM analysis helps marketers find answers to the following questions:
* Who are my best customers?
* Which customers are on the verge of churning?
* Who has the potential to be converted into more profitable customers?
* Who are lost customers that you don’t need to pay much attention to?
* Which customers must you retain?
* Who are your loyal customers?
* Which group of customers is most likely to respond to your current campaign? 
* How do I attract new customers to the company?

<font color = '#000'>
    <a id = "3"></a><br>
    
# **Data Understanding**

<font color = '#000'>
    <a id = "4"></a><br>
    
## Import Libraries 

In [None]:
!pip install xlrd
!pip install openpyxl
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
import seaborn as sns
import missingno as msno
import warnings
warnings.filterwarnings("ignore")

<font color = '#000'>
    <a id = "5"></a><br>
    
## Load Data

In [None]:
# reading the dataset

df_=pd.read_excel("/kaggle/input/online-retail-ii-data-set-from-ml-repository/online_retail_II.xlsx",
                  sheet_name="Year 2009-2010")
df = df_.copy()

In [None]:
df.shape

In [None]:
df.head()

<font color = '#000'>
    <a id = "6"></a><br>
    
 ## Data Analysis

In [None]:
#Checking Variables

def check_df(dataframe):
    print("##################### Shape #####################")
    print(dataframe.shape)
    print("##################### Types #####################")
    print(dataframe.dtypes)
    print("##################### Head #####################")
    print(dataframe.head(3))
    print("##################### Tail #####################")
    print(dataframe.tail(3))
    print("##################### NA #####################")
    print(dataframe.isnull().sum())
    print("##################### Quantiles #####################")
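    # numeric_only=True skips text columns, which newer pandas versions would otherwise reject
    print(dataframe.quantile([0, 0.05, 0.50, 0.95, 0.99, 1], numeric_only=True).T)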

check_df(df)

In [None]:
sns.boxplot(df["Quantity"]);

In [None]:
sns.boxplot(df["Price"]);

In [None]:
# number of unique products
df["Description"].nunique()

In [None]:
# how many rows does each product appear in? (top 5)
df["Description"].value_counts().head()

In [None]:
# what is the most ordered product?
df.groupby("Description").agg({"Quantity": "sum"}).sort_values("Quantity", ascending=False)

In [None]:
# how many invoices were issued in total?
df["Invoice"].nunique()

In [None]:
# what are the most expensive products?
df.sort_values("Price", ascending=False)

<font color = '#000'>
    <a id = "7"></a><br>
    
## Data Preprocessing

 **The most common approach to outliers is to set a lower and an upper limit and cap (suppress) the values that fall outside them.
 Here the range is computed from the 1st and 99th percentiles (a looser variant of the classic IQR rule, which uses the 25th and 75th percentiles), and the limits are placed 1.5 times that range below and above them.**
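The following sketch makes the difference concrete. It compares the classic quartile-based fence (25th/75th percentiles) with the 1%/99% variant used in `outlier_thresholds`. The helper `thresholds` and the toy `quantity` values are hypothetical. Capping is done here with pandas' `Series.clip` instead of the `.loc` assignment used in the notebook.

```python
import pandas as pd


def thresholds(series, q_low, q_high, k=1.5):
    """Hypothetical helper: limits k * spread beyond the q_low / q_high quantiles."""
    lower_q = series.quantile(q_low)
    upper_q = series.quantile(q_high)
    spread = upper_q - lower_q
    return lower_q - k * spread, upper_q + k * spread


# Toy order quantities (made up): mostly small orders plus one bulk order of 5000
quantity = pd.Series([1, 2, 2, 3, 4, 5, 6, 8, 10, 12, 12, 24, 5000])

classic = thresholds(quantity, 0.25, 0.75)  # textbook IQR rule (quartiles)
wide = thresholds(quantity, 0.01, 0.99)     # 1%/99% variant used in outlier_thresholds

# Cap values outside the limits instead of dropping the rows
capped_classic = quantity.clip(lower=classic[0], upper=classic[1])
capped_wide = quantity.clip(lower=wide[0], upper=wide[1])

print("classic limits:", classic, "-> max after capping:", capped_classic.max())
print("1%/99% limits: ", wide, "-> max after capping:", capped_wide.max())
```

On this tiny sample the classic fence caps the 5000 at 25.5. The 1%/99% fence is so wide that the 5000 is left untouched. With the full dataset, the wider fence is meant to trim only the most extreme values.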

In [None]:
def outlier_thresholds(dataframe, variable):
    # 1st and 99th percentiles (not the 25th/75th quartiles of the classic IQR rule)
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit


def replace_with_thresholds(dataframe, variable):
    # Cap values outside the limits instead of dropping the rows
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit

In [None]:
replace_with_thresholds(df,"Quantity")
replace_with_thresholds(df,"Price")

In [None]:
# summary statistics after capping the outliers
df.describe([0.01,0.25,0.50,0.75,0.99]).T

In [None]:
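def crm_data_prep(dataframe):
    # Drop rows with missing values (mainly transactions without a Customer ID)
    dataframe = dataframe.dropna(axis=0)
    # Remove cancelled invoices, whose numbers contain a "C"
    dataframe = dataframe[~dataframe["Invoice"].str.contains("C", na=False)]
    # Keep only positive quantities; .copy() avoids writing to a view of the original
    dataframe = dataframe[dataframe["Quantity"] > 0].copy()
    dataframe["TotalPrice"] = dataframe["Quantity"] * dataframe["Price"]
    return dataframe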
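The next sketch shows what the cleaning step does on a hypothetical four-row sample. It mimics the mixed integer/string invoice numbers of the Online Retail II sheet. It uses `str.startswith("C")` on the stringified invoice. That is a slightly stricter check than the `str.contains("C", na=False)` in `crm_data_prep`, which would drop an invoice with a "C" anywhere in it. All values and the names `toy` and `clean` are made up.

```python
import pandas as pd

# Hypothetical toy transactions: invoice numbers are a mix of ints and strings,
# and a cancellation carries a "C" prefix
toy = pd.DataFrame({
    "Invoice": [489434, "C489449", 489435, 489436],
    "Quantity": [12, -12, 6, 3],
    "Price": [6.95, 2.10, 2.55, 1.25],
    "Customer ID": [13085.0, 16321.0, None, 13085.0],
})

clean = toy.dropna()  # RFM needs a customer key, so rows without one are dropped
is_cancel = clean["Invoice"].astype(str).str.startswith("C")
clean = clean[~is_cancel & (clean["Quantity"] > 0)].copy()
clean["TotalPrice"] = clean["Quantity"] * clean["Price"]
print(clean)
```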

In [None]:
df=crm_data_prep(df)

In [None]:
df.head()

In [None]:
check_df(df)

<font color = '#000'>
    <a id = "8"></a><br>

# RFM Segments
    

![](https://cdn-bjlne.nitrocdn.com/pTOvwVLqIiaWgukfsujeSbgmJtDkgBpj/assets/static/optimized/rev-85d2a06/wp-content/uploads/2019/06/RFm.jpg)

Each letter of RFM stands for one metric, and each metric captures a basic customer trait:
* Recency: the number of days since the customer's last purchase
* Frequency: how often the customer purchases, measured here as the number of distinct invoices
* Monetary: the total value of the customer's purchases
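Here is a minimal sketch of how the three metrics are computed, on a hypothetical four-row transaction table. The names `tx` and `toy_rfm` are illustrative and not from the dataset. It mirrors the groupby in `create_rfm`, written with pandas named aggregation and the same analysis date of 2010-12-11.

```python
import datetime as dt

import pandas as pd

# Hypothetical transactions for two customers (not from the dataset)
tx = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2],
    "Invoice": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2010-11-01", "2010-11-01", "2010-12-01", "2010-06-11"]),
    "TotalPrice": [10.0, 5.0, 20.0, 100.0],
})

analysis_date = dt.datetime(2010, 12, 11)

toy_rfm = tx.groupby("Customer ID").agg(
    recency=("InvoiceDate", lambda d: (analysis_date - d.max()).days),  # days since last purchase
    frequency=("Invoice", "nunique"),                                    # distinct invoices
    monetary=("TotalPrice", "sum"),                                      # total spend
)
print(toy_rfm)
```

Customer 1 has two invoices (A1 spans two rows), last bought 10 days before the analysis date, and spent 35 in total. Customer 2 bought once, 183 days earlier, for 100.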

In [None]:
def create_rfm(dataframe):
    # RFM Metrics
    

    today_date = dt.datetime(2010, 12, 11)  # analysis date, set shortly after the last transaction (df["InvoiceDate"].max())

    rfm = dataframe.groupby('Customer ID').agg({'InvoiceDate': lambda date: (today_date - date.max()).days,
                                                'Invoice': lambda num: num.nunique(),
                                                "TotalPrice": lambda price: price.sum()})

    rfm.columns = ['recency', 'frequency', "monetary"]

    rfm = rfm[(rfm['monetary'] > 0)]


    # R and F scores from 1 to 5; monetary stays a raw value and is not used for segmentation
    rfm["recency_score"] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])
    rfm["frequency_score"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])



    # segment naming
    rfm['rfm_segment'] = rfm['recency_score'].astype(str) + rfm['frequency_score'].astype(str)

    seg_map = {
        r'[1-2][1-2]': 'hibernating',
        r'[1-2][3-4]': 'at_risk',
        r'[1-2]5': 'cant_lose',
        r'3[1-2]': 'about_to_sleep',
        r'33': 'need_attention',
        r'[3-4][4-5]': 'loyal_customers',
        r'41': 'promising',
        r'51': 'new_customers',
        r'[4-5][2-3]': 'potential_loyalists',
        r'5[4-5]': 'champions'
    }

    rfm['rfm_segment'] = rfm['rfm_segment'].replace(seg_map, regex=True)
    rfm = rfm[["recency", "frequency", "monetary", "rfm_segment"]]
    return rfm
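The scoring step can be checked on a hypothetical set of ten customers (`scores` is an illustrative name). The sketch shows why the recency labels run from 5 down to 1 and why frequency is ranked before `pd.qcut`. With many tied low frequencies, cutting the raw values produces duplicate bin edges, and `pd.qcut` then raises a `ValueError`.

```python
import pandas as pd

# Hypothetical recency (days) and frequency (invoices) for ten customers
scores = pd.DataFrame({
    "recency": [2, 5, 9, 15, 30, 45, 90, 150, 200, 300],
    "frequency": [1, 1, 1, 1, 2, 2, 3, 5, 8, 20],
})

# Fewer days since the last purchase is better, so the labels are reversed
scores["recency_score"] = pd.qcut(scores["recency"], 5, labels=[5, 4, 3, 2, 1])

# Cutting raw frequencies fails: several quantile edges are all equal to 1
try:
    pd.qcut(scores["frequency"], 5, labels=[1, 2, 3, 4, 5])
except ValueError as err:
    print("raw frequencies:", err)

# rank(method="first") breaks ties by row order, so every bin edge is unique
scores["frequency_score"] = pd.qcut(
    scores["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]
)
print(scores)
```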

![](https://miro.medium.com/max/875/1*vcpdfpdfFpmIT8dqaqNGQQ.png)

In [None]:
rfm = create_rfm(df)
rfm.head()

**New Customer**
Customers who bought very recently but have little or no purchase history (for example, RFM score 511: R = 5, F = 1). Special coupons, discounts and welcome campaigns can be arranged for this segment.

**At Risk**
Customers who used to buy with average-to-high frequency but have not purchased for a long time (for example, RFM score 234: R = 2, F = 3). Contact them one-on-one and present current campaigns built around the product categories they have bought in the past.

**Loyal Customer**
Customers who buy frequently and have purchased recently (for example, RFM score 444: R = 4, F = 4). Send them messages and emails about current campaigns to keep the brand on their minds.
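The example scores above are three-digit RFM scores, but `seg_map` in `create_rfm` reads only the R and F digits. The sketch below shows how those scores resolve to segment names. It uses a hypothetical subset of `seg_map` (`seg_subset`) covering just the three segments described here, applied with the same `replace(..., regex=True)` call.

```python
import pandas as pd

# Subset of the patterns in seg_map (create_rfm), limited to the three segments above
seg_subset = {
    r"51": "new_customers",
    r"[1-2][3-4]": "at_risk",
    r"[3-4][4-5]": "loyal_customers",
}

# The example three-digit RFM scores quoted in the text (R, F, M digits)
rfm_scores = pd.Series(["511", "234", "444"])

# The segment map only looks at R and F, so keep the first two digits
rf = rfm_scores.str[:2]
segments = rf.replace(seg_subset, regex=True)
print(pd.DataFrame({"rfm_score": rfm_scores, "rf": rf, "segment": segments}))
```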

In [None]:
plt.figure(figsize=(15,7))
sns.barplot(x="rfm_segment", y="frequency", data=rfm);
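The bar plot shows only the average frequency per segment. A per-segment table with customer counts and mean recency, frequency and monetary values is often easier to act on. Below is a minimal sketch on a hypothetical frame `toy_rfm` shaped like the output of `create_rfm`; the numbers are made up.

```python
import pandas as pd

# Hypothetical output of create_rfm for six customers (made-up values)
toy_rfm = pd.DataFrame({
    "recency": [3, 8, 200, 250, 40, 5],
    "frequency": [12, 9, 1, 2, 4, 1],
    "monetary": [5400.0, 3100.0, 80.0, 150.0, 900.0, 60.0],
    "rfm_segment": ["champions", "champions", "hibernating",
                    "hibernating", "loyal_customers", "new_customers"],
})

# One row per segment: size plus mean R, F and M, most valuable segments first
summary = toy_rfm.groupby("rfm_segment").agg(
    customers=("recency", "count"),
    recency_mean=("recency", "mean"),
    frequency_mean=("frequency", "mean"),
    monetary_mean=("monetary", "mean"),
).sort_values("monetary_mean", ascending=False)
print(summary)
```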