Hi Kagglers,

Welcome to my Kernel on RFM analysis of eCommerce behavior data.

If you have any feedback or suggestions for this Kernel, please let me know. This notebook will always be a work in progress, so comments about further improvements are welcome. I appreciate every note!

If you like it, you can upvote and/or leave a comment :)

# **Data Preprocessing**

In [None]:
#Import Library
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import squarify

# **Load The Data**

In [None]:
data_oct = pd.read_csv('../input/ecommerce-behavior-data-from-multi-category-store/2019-Oct.csv',nrows=1)
data_oct

**Only load the required columns**

In [None]:
data_oct = pd.read_csv('../input/ecommerce-behavior-data-from-multi-category-store/2019-Oct.csv',usecols=['event_time','event_type','price','user_id','user_session'])
data_nov = pd.read_csv('../input/ecommerce-behavior-data-from-multi-category-store/2019-Nov.csv',usecols=['event_time','event_type','price','user_id','user_session'])
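If the full monthly files do not fit in memory even with `usecols`, one option is to read them in chunks and keep only the rows of interest from each chunk. The sketch below is an illustration on a tiny made-up CSV held in memory (the `sample_csv` content and the chunk size are assumptions, not part of the dataset):

```python
import io
import pandas as pd

# Tiny in-memory stand-in for a monthly CSV (values are made up).
sample_csv = io.StringIO(
    "event_time,event_type,product_id,price,user_id,user_session\n"
    "2019-10-01 00:00:00 UTC,view,1,10.0,100,a\n"
    "2019-10-01 00:05:00 UTC,purchase,1,10.0,100,a\n"
    "2019-10-01 00:07:00 UTC,cart,2,25.5,101,b\n"
    "2019-10-01 00:09:00 UTC,purchase,2,25.5,101,b\n"
)

cols = ['event_time', 'event_type', 'price', 'user_id', 'user_session']

# Read a few rows at a time and keep only the purchase rows of each chunk.
chunks = pd.read_csv(sample_csv, usecols=cols, chunksize=2)
purchases = pd.concat(
    (chunk[chunk['event_type'] == 'purchase'] for chunk in chunks),
    ignore_index=True,
)
print(purchases)
```

On the real files a much larger `chunksize` (for example, a few million rows) would be a more sensible choice.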

In [None]:
data_oct.head()

In [None]:
data_oct['event_type'].unique()

In [None]:
data_nov.head()

In [None]:
data_nov['event_type'].unique()

**The `event_type` column contains view, cart, and purchase events, but we only need purchases, so we use `.loc` to keep only the purchase rows**

In [None]:
data_oct=data_oct.loc[data_oct.event_type == 'purchase']
data_nov=data_nov.loc[data_nov.event_type == 'purchase']

**Then concatenate the October and November data**

In [None]:
frames = [data_oct, data_nov]
data=pd.concat(frames)

In [None]:
data.head()

In [None]:
%%time
data['event_time']=pd.to_datetime(data['event_time']).dt.tz_convert(None)
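The conversion above turns the timestamp strings into timezone-aware datetimes and then drops the timezone. A minimal sketch on two made-up strings in the same `... UTC` style (passing `utc=True` is an assumption added here to make the timezone handling explicit):

```python
import pandas as pd

raw = pd.Series(['2019-10-01 00:00:00 UTC', '2019-11-30 23:59:59 UTC'])

# utc=True makes the parsed result timezone-aware (UTC).
aware = pd.to_datetime(raw, utc=True)
# tz_convert(None) converts to UTC and then removes the timezone information.
naive = aware.dt.tz_convert(None)

print(aware.dtype)
print(naive.dtype)
```

Working with timezone-naive values keeps later arithmetic against a plain `datetime` reference date simple.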

In [None]:
data.dtypes

Sometimes the notebook crashes while running, so it is better to save the cleaned data in Feather format.
For more about Feather, see [this article](https://medium.com/@steven.p.dye/feather-files-faster-than-the-speed-of-light-d4666ce24387) or the [Apache Arrow documentation](https://arrow.apache.org/docs/python/feather.html).

In [None]:
%%time
# Save the DataFrame in Feather format in case the notebook crashes
# Feather preserves the column data types
import pyarrow.feather as feather
os.makedirs('tmp', exist_ok=True)  # Make a temp dir for storing the feather file
feather.write_feather(data, './tmp/data')

In [None]:
%%time
# Reload the data from the Feather file, which is fast to read
data = pd.read_feather('./tmp/data')
data

In [None]:
data.dtypes

We treat each session as one transaction, but a session appears in several rows, one for every product sold in it.
So we need to merge the rows of each session into a single row.
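To make the idea concrete, here is a small sketch on made-up data (the `toy` frame and its values are hypothetical): two rows belonging to session `s1` collapse into one order row.

```python
import pandas as pd

# Three purchase rows: session s1 contains two products, session s2 one.
toy = pd.DataFrame({
    'event_time': pd.to_datetime(
        ['2019-10-01 10:00', '2019-10-01 10:02', '2019-10-03 09:00']),
    'price': [10.0, 5.0, 20.0],
    'user_id': [1, 1, 2],
    'user_session': ['s1', 's1', 's2'],
})

# One row per session: last event time, buyer, number of items, total spent.
orders = toy.groupby('user_session').agg(
    Date_order=('event_time', 'max'),
    user_id=('user_id', 'first'),
    Quantity=('user_session', 'count'),
    money_spent=('price', 'sum'),
).reset_index(drop=True)
print(orders)
```

Taking the first `user_id` assumes that a session always belongs to a single user.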

In [None]:
# One row per session (transaction): last event time, buyer, items bought, total spent
data=data.groupby(by='user_session').agg(Date_order=('event_time','max'),
                                         user_id=('user_id','first'),
                                         Quantity=('user_session','count'),
                                         money_spent=('price','sum')).reset_index(drop=True)
data

# **RFM Analysis**

RFM is a method used for analyzing customer value. It is commonly used in database marketing and direct marketing and has received particular attention in retail and professional services industries.

RFM stands for the three dimensions:

* Recency – How recently did the customer purchase?
* Frequency – How often do they purchase?
* Monetary Value – How much do they spend?

Source: [Wikipedia](https://en.wikipedia.org/wiki/RFM_(market_research))

So we will create these three attributes: Recency, Frequency, and Monetary.

In [None]:
data['Date_order'].max()

The last date in the data is 2019-11-30, so we will use 2019-12-01 as the reference date.
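Recency is then the time elapsed between an order and the reference date, expressed in days. A minimal sketch on two made-up order dates shows the difference between fractional days and whole days:

```python
import pandas as pd

study_date = pd.Timestamp('2019-12-01')
order_dates = pd.Series(
    pd.to_datetime(['2019-11-30 12:00:00', '2019-10-01 00:00:00']))

elapsed = study_date - order_dates

# Dividing by a one-day Timedelta keeps fractional days ...
days_fractional = elapsed / pd.Timedelta(days=1)
# ... while .dt.days truncates to whole days.
days_whole = elapsed.dt.days

print(days_fractional.tolist())
print(days_whole.tolist())
```

Either form works for ranking customers; fractional days simply produce fewer ties.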


In [None]:
import datetime as dt
study_date = dt.datetime(2019,12,1)
# Time elapsed between each order and the reference date, in (fractional) days
data['last_purchase']=(study_date - data['Date_order']) / np.timedelta64(1, 'D')
data.head()


In [None]:
#Calculate Recency, Frequency, and Monetary of the data
RFM= data.groupby('user_id').agg(Recency=('last_purchase','min'),
                                 Frequency=('user_id','count'),
                                 Monetary=('money_spent','sum'))
RFM.head()


# **Frequency**

Frequency attribute answers the question: How often do they purchase?

In [None]:
RFM['Frequency'].describe()

# **Monetary**

Monetary attribute answers the question: How much do they spend over time?


In [None]:
RFM['Monetary'].describe()

# **RFM Segmentation**

RFM segmentation is a useful tool for identifying groups of customers who should be given extra attention. It enables marketers to target specific groups of customers with communications that are far more relevant to their behavior, which can improve response rates, loyalty, and customer lifetime value.

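The simplest way to create customer segments from the RFM model is to use quartiles. We assign a score from 1 to 4 to each of Recency, Frequency, and Monetary. In this notebook, 1 is the best value (most recent, most frequent, highest spend) and 4 is the worst. The individual scores are then combined in two ways: concatenated into a three-digit segment code, and summed into a final RFM score from 3 to 12, where a lower score means a more valuable customer.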

**RFM Quartiles**

In [None]:
RFM.quantile(q=[0.25,0.5,0.75])

In [None]:
quartiles=RFM.quantile(q=[0.25,0.5,0.75]).to_dict()
quartiles
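The dictionary returned by `to_dict()` is keyed first by column name and then by quantile, which is why the scoring functions further down look values up as `d[p][0.25]`. A small sketch on made-up numbers (the `toy` frame is hypothetical) shows the layout:

```python
import pandas as pd

toy = pd.DataFrame({
    'Recency': [1.0, 2.0, 3.0, 4.0, 5.0],
    'Monetary': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Outer keys are column names, inner keys are the requested quantiles.
toy_quartiles = toy.quantile(q=[0.25, 0.5, 0.75]).to_dict()

print(toy_quartiles['Recency'][0.25])
print(toy_quartiles['Monetary'][0.75])
```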

# **Creation of RFM Segments**

In [None]:
## for Recency: a smaller value (a more recent purchase) gets the better score 1

def R(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]: 
        return 3
    else:
        return 4
    
## for Frequency and Monetary: a larger value gets the better score 1

def FM(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]: 
        return 2
    else:
        return 1    
    

In [None]:
#create RFM segmentation column
RFM['R_Quartile'] = RFM['Recency'].apply(R, args=('Recency',quartiles,))
RFM['F_Quartile'] = RFM['Frequency'].apply(FM, args=('Frequency',quartiles,))
RFM['M_Quartile'] = RFM['Monetary'].apply(FM, args=('Monetary',quartiles,))
RFM['RFM_segmentation'] = RFM.R_Quartile.map(str) \
                    + RFM.F_Quartile.map(str) \
                    + RFM.M_Quartile.map(str)
RFM['RFM_score'] = RFM.R_Quartile.map(int) \
                    + RFM.F_Quartile.map(int) \
                    + RFM.M_Quartile.map(int)
RFM.head()
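Row-by-row `apply` can be slow with many customers. As a possible alternative, the same quartile rule can be sketched in vectorized form with `np.searchsorted`. The helper name `quartile_rank` and the toy values below are assumptions made for illustration, not part of the original notebook:

```python
import numpy as np
import pandas as pd

def quartile_rank(values, cuts):
    """Return 0..3: how many of the three cut points lie strictly below each value."""
    return np.searchsorted(np.asarray(cuts), np.asarray(values), side='left')

# Made-up values and quartile cut points (25%, 50%, 75%).
recency = pd.Series([1.0, 10.0, 30.0, 60.0])
recency_cuts = [5.0, 20.0, 40.0]
frequency = pd.Series([1, 1, 2, 5])
frequency_cuts = [1, 1, 2]

# Recency: smaller is better, so the lowest quartile gets score 1.
r_score = pd.Series(quartile_rank(recency, recency_cuts) + 1, index=recency.index)
# Frequency/Monetary: larger is better, so the lowest quartile gets score 4.
f_score = pd.Series(4 - quartile_rank(frequency, frequency_cuts), index=frequency.index)

print(r_score.tolist())
print(f_score.tolist())
```

With `side='left'`, a value equal to a cut point falls into the lower bucket, which mirrors the `<=` comparisons in the `R` and `FM` functions.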


In [None]:
# Define the RFM_label function: a lower RFM_score means a more valuable customer
def RFM_label(data):
    score = data['RFM_score']
    if score >= 10:
        return 'Lost'
    elif score >= 9:
        return 'Hibernating'
    elif score >= 8:
        return 'Can’t Lose Them'
    elif score >= 7:
        return 'About To Sleep'
    elif score >= 6:
        return 'Promising'
    elif score >= 5:
        return 'Potential Loyalist'
    elif score >= 4:
        return 'Loyal Customers'
    else:
        return 'Champions'
#Create RFM label for customer
RFM['RFM_label'] = RFM.apply(RFM_label, axis=1)
RFM.head()
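Since the score is an integer between 3 and 12, the same labelling can also be expressed as a lookup table. This is only a sketch of an equivalent formulation on made-up scores; the `LABELS` dictionary is introduced here for illustration:

```python
import pandas as pd

# Score-to-label lookup; every score of 10 or more counts as 'Lost'.
LABELS = {
    3: 'Champions',
    4: 'Loyal Customers',
    5: 'Potential Loyalist',
    6: 'Promising',
    7: 'About To Sleep',
    8: 'Can’t Lose Them',
    9: 'Hibernating',
    10: 'Lost',
}

scores = pd.Series([3, 6, 9, 12])
# Clip scores above 10 down to 10 so that they share the 'Lost' entry.
labels = scores.clip(upper=10).map(LABELS)
print(labels.tolist())
```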

How many customers do we have in each segment?


In [None]:
# Calculate average values for each RFM_label, and the size of each segment
RFM_desc = RFM.groupby('RFM_label').agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'Monetary': ['mean', 'count']
}).round(1)
# Print the aggregated dataset
print(RFM_desc)

In [None]:
RFM_desc.columns = RFM_desc.columns.droplevel()
RFM_desc.columns = ['RecencyMean','FrequencyMean','MonetaryMean', 'Count']
#Create our plot and resize it.
fig = plt.gcf()
ax = fig.add_subplot()
fig.set_size_inches(16, 9)
# Take the labels from the index so that each label matches its own segment size
# (groupby sorts the segments alphabetically, not in business order)
squarify.plot(sizes=RFM_desc['Count'],
              label=RFM_desc.index.tolist(), alpha=.6 )
plt.title("RFM Segments",fontsize=18,fontweight="bold")
plt.axis('off')
plt.show()
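Treemap tiles follow the row order of the summary table, so labels are safest when they are built from that same table. A small sketch on made-up counts (the `desc` frame is hypothetical) that also adds each segment's size and share to its label:

```python
import pandas as pd

# Made-up segment sizes, indexed by segment name as after a groupby.
desc = pd.DataFrame(
    {'Count': [120, 45, 300]},
    index=['About To Sleep', 'Champions', 'Lost'],
)

total = desc['Count'].sum()
# One label per row, in the same order as the sizes.
tile_labels = [
    f"{name}\n{count} ({count / total:.0%})"
    for name, count in desc['Count'].items()
]
print(tile_labels)
```

A list like `tile_labels` could then be passed as the `label` argument of `squarify.plot` together with `sizes=desc['Count']`.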

Now that we've identified our customer categories, we can decide how to approach or deal with each customer.

* Champions : Reward them. Can be early adopters of new products. Will promote your brand. Most likely to send referrals.
* Loyal Customers : Upsell higher value products. Ask for reviews.
* Potential Loyalist : Offer membership / loyalty program. Keep them engaged. Offer personalised recommendations.
* Promising : Offer coupons. Bring them back to the platform and keep them engaged. Offer personalised recommendations.
* About To Sleep : Win them back via renewals or newer products, don’t lose them to competition. Talk to them if necessary. Spend time on highest possible personalisation.
* Can’t Lose Them : Provide helpful resources on the site. Send personalised emails.
* Hibernating : Make subject lines of emails very personalised. Revive their interest by a specific discount on a specific product.
* Lost : Revive interest with an outreach campaign. Ignore otherwise.

Reference: [Exponea RFM segmentation documentation](https://docs.exponea.com/docs/rfm-segmentation)