### **RFM Analysis**

RFM (Recency, Frequency, Monetary) analysis is a method that segments customers based on their transaction history.
With RFM, customer data is summarized along three attributes: Recency, Frequency and Monetary.

- Recency is how long ago the customer's most recent transaction took place.
- Frequency is the number of transactions within a given period.
- Monetary is the total value of the transactions within a given period.
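The following minimal sketch makes the three definitions concrete by computing R, F and M for a single customer. The transactions are hypothetical. The snapshot date of 2015-04-01 is an assumption here, although it matches the date used later in this notebook. Recency is measured in days from the last transaction to that date.

```python
import pandas as pd

# Hypothetical transactions for one customer (not taken from the dataset).
tx = pd.DataFrame({
    "trans_date": pd.to_datetime(["2015-01-10", "2015-02-20", "2015-03-01"]),
    "tran_amount": [40, 25, 35],
})
snapshot = pd.Timestamp("2015-04-01")  # assumed analysis (snapshot) date

recency = (snapshot - tx["trans_date"].max()).days  # days since the last purchase
frequency = len(tx)                                  # number of transactions
monetary = tx["tran_amount"].sum()                   # total spend

print(recency, frequency, monetary)  # 31 3 100
```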

The goal is to find out which customer groups are the most profitable, estimate return rates and increase profit.

The following is an example of an RFM analysis.




#### Data Preparation

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt

In [None]:
# Read the transaction data from CSV
df = pd.read_csv('Retail_Data_Transactions.csv')
df.head()

  customer_id trans_date  tran_amount
0      CS5295  11-Feb-13           35
1      CS4768  15-Mar-15           39
2      CS2122  26-Feb-13           52
3      CS1217  16-Nov-11           99
4      CS1850  20-Nov-13           78


In [None]:
# Check for missing values
df.isnull().sum()

customer_id    0
trans_date     0
tran_amount    0
dtype: int64

In [None]:
# Check the data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 125000 entries, 0 to 124999
Data columns (total 3 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   customer_id  125000 non-null  object
 1   trans_date   125000 non-null  object
 2   tran_amount  125000 non-null  int64 
dtypes: int64(1), object(2)
memory usage: 2.9+ MB


In [None]:
# Convert trans_date to datetime; raw values look like "11-Feb-13" (day-month-two-digit year)
df['trans_date'] = pd.to_datetime(df['trans_date'], format='%d-%b-%y')

In [None]:
# Identify the earliest and the latest transaction dates
print(df['trans_date'].min(), df['trans_date'].max())

2011-05-16 00:00:00 2015-03-16 00:00:00
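In the next cell the notebook fixes the snapshot date by hand. A common alternative is to derive the snapshot date from the data, setting it one day after the latest transaction so that every Recency value is at least one day. This alternative is an assumption and not the notebook's method. The dates below are toy values.

```python
import pandas as pd

# Toy transaction dates (hypothetical); the notebook reads the real ones from the CSV.
dates = pd.to_datetime(pd.Series(["2011-05-16", "2014-07-01", "2015-03-16"]))

# Assumed alternative: snapshot = one day after the most recent transaction.
snapshot = dates.max() + pd.Timedelta(days=1)
print(snapshot)  # 2015-03-17 00:00:00

# Every transaction must fall strictly before the snapshot date,
# otherwise some Recency values would be zero or negative.
assert (dates < snapshot).all()
```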


In [None]:
# Use 2015-04-01 as the snapshot (data collection) date
NOW = dt.datetime(2015, 4, 1)

# Days between each transaction and the snapshot date; the per-customer minimum becomes Recency.
# (The original no-op astype('timedelta64[D]') line is dropped: its result was discarded,
# and recent pandas versions reject that unit.)
df['hist'] = (NOW - df['trans_date']) / np.timedelta64(1, 'D')
df.head()

  customer_id trans_date  tran_amount    hist
0      CS5295 2013-02-11           35   779.0
1      CS4768 2015-03-15           39    17.0
2      CS2122 2013-02-26           52   764.0
3      CS1217 2011-11-16           99  1232.0
4      CS1850 2013-11-20           78   497.0
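Dividing by `np.timedelta64(1, 'D')` yields float days. All timestamps here fall at midnight, so `Series.dt.days` gives the same values as integers. The sketch below checks this on the five dates shown in the output above.

```python
import pandas as pd

# Dates copied from the df.head() output above.
dates = pd.to_datetime(pd.Series(["2013-02-11", "2015-03-15", "2013-02-26",
                                  "2011-11-16", "2013-11-20"]))
NOW = pd.Timestamp("2015-04-01")

# .dt.days returns whole days as integers instead of floats.
hist = (NOW - dates).dt.days
print(hist.tolist())  # [779, 17, 764, 1232, 497]
```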


#### RFM Table

In [None]:
# Build the RFM table: one row per customer.
# Named aggregation avoids aggregating the groupby key itself and makes the rename step unnecessary.
rfmTable = df.groupby('customer_id').agg(
    recency=('hist', 'min'),                  # Recency: days since the most recent transaction
    frequency=('trans_date', 'count'),        # Frequency: number of transactions
    monetary_value=('tran_amount', 'sum'),    # Monetary: total transaction value
)

rfmTable.head()

             recency  frequency  monetary_value
customer_id
CS1112          77.0         15            1012
CS1113          51.0         20            1490
CS1114          48.0         19            1432
CS1115          27.0         22            1659
CS1116         219.0         13             857


The RFM table is built by aggregating the transactions of each customer into three values. Recency is the shortest gap between a transaction and the snapshot date. Frequency is the number of transactions. Monetary is the total transaction value.
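The aggregation can be checked by hand on a tiny frame. In this hypothetical example, customer A has three transactions and customer B has one. The sketch uses pandas named aggregation, and `hist` already holds the number of days since each transaction.

```python
import pandas as pd

# Hypothetical transactions with precomputed 'hist' (days before the snapshot date).
df = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B"],
    "hist": [10.0, 40.0, 90.0, 200.0],
    "tran_amount": [30, 50, 20, 70],
})

rfm = df.groupby("customer_id").agg(
    recency=("hist", "min"),                # most recent transaction
    frequency=("hist", "count"),            # number of transactions
    monetary_value=("tran_amount", "sum"),  # total spend
)
print(rfm)
```

Customer A should end up with recency 10, frequency 3 and monetary value 100. Customer B should end up with 200, 1 and 70.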

#### RFM Analysis

In [None]:
# Compute the quartiles of each RFM attribute
quartiles = rfmTable.quantile(q=[0.25,0.50,0.75])
print(quartiles, type(quartiles))

      recency  frequency  monetary_value
0.25     38.0       14.0           781.0
0.50     69.0       18.0          1227.0
0.75    127.0       22.0          1520.0 <class 'pandas.core.frame.DataFrame'>


RFM analysis groups R, F and M into three or more categories. This notebook uses the quartiles, which split each attribute into four segments of roughly equal size.
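The same quartile scoring can also be expressed with `pd.qcut`. This is a swapped-in technique, not what this notebook uses. `qcut` computes the quartile edges with the same linear interpolation as `DataFrame.quantile` and builds right-closed bins, which matches the `x <= quartile` tests. The values below are hypothetical and contain no ties. With heavily tied data the edges can coincide, and `qcut` then raises a `ValueError`. Explicit threshold functions do not run into this problem.

```python
import pandas as pd

# Hypothetical monetary values without ties, so all quartile edges are distinct.
monetary = pd.Series([100, 250, 400, 800, 1200, 1500, 2000, 3000])

# Quartile edges, computed with linear interpolation (the pandas default).
edges = monetary.quantile([0.25, 0.5, 0.75]).tolist()
print(edges)

# Right-closed bins; labels run 4 -> 1 so the highest spenders get score 1.
m_score = pd.qcut(monetary, q=4, labels=[4, 3, 2, 1])
print(m_score.tolist())  # [4, 4, 3, 3, 2, 2, 1, 1]
```

For Recency the labels would be `[1, 2, 3, 4]` instead, because lower values are better.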

#### RFM Segmentation Table

In [None]:
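# Scoring functions for the segmentation categories (1 = best, 4 = worst).
# x: value to score, p: column name, d: quartile table from rfmTable.quantile().

## For Recency: a lower value (more recent purchase) gets a better score
def RClass(x, p, d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]:
        return 3
    else:
        return 4

## For Frequency and Monetary value: a higher value gets a better score
def FMClass(x, p, d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]:
        return 2
    else:
        return 1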

There are two scoring functions because the attributes point in opposite directions. For Recency a lower value is better, while for Frequency and Monetary a higher value is better, so the categorization scheme is reversed between them. In both functions, 1 is the best score and 4 the worst.
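As a worked check of the reversed schemes, the sketch below scores customer CS1115 by hand, using the quartile values printed earlier. `quartile_score` is a hypothetical helper that condenses `RClass` and `FMClass` into one function. It should reproduce the class `121` that the segmentation table below shows for CS1115.

```python
# Quartile edges copied from the quantile output above.
Q = {
    "recency":        (38.0, 69.0, 127.0),
    "frequency":      (14.0, 18.0, 22.0),
    "monetary_value": (781.0, 1227.0, 1520.0),
}

def quartile_score(x, edges, higher_is_better):
    """Hypothetical helper mirroring RClass/FMClass: 1 is best, 4 is worst."""
    q25, q50, q75 = edges
    rank = 1 if x <= q25 else 2 if x <= q50 else 3 if x <= q75 else 4
    # Frequency and monetary value use the flipped scale (FMClass).
    return 5 - rank if higher_is_better else rank

# CS1115 from the RFM table: recency 27, frequency 22, monetary value 1659.
r = quartile_score(27.0, Q["recency"], higher_is_better=False)        # 27 <= 38 -> 1
f = quartile_score(22, Q["frequency"], higher_is_better=True)         # 22 <= 22 -> rank 3 -> 2
m = quartile_score(1659, Q["monetary_value"], higher_is_better=True)  # 1659 > 1520 -> rank 4 -> 1
print(f"{r}{f}{m}")  # 121
```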

In [None]:
# Build the RFM segmentation table (copy, so rfmTable itself is not modified)
rfmSeg = rfmTable.copy()
rfmSeg['R_Quartile'] = rfmSeg['recency'].apply(RClass, args=('recency', quartiles))
rfmSeg['F_Quartile'] = rfmSeg['frequency'].apply(FMClass, args=('frequency', quartiles))
rfmSeg['M_Quartile'] = rfmSeg['monetary_value'].apply(FMClass, args=('monetary_value', quartiles))

# Concatenate the three scores into a class label, e.g. "121"
rfmSeg['RFMClass'] = (rfmSeg['R_Quartile'].astype(str)
                      + rfmSeg['F_Quartile'].astype(str)
                      + rfmSeg['M_Quartile'].astype(str))

rfmSeg.head()

             recency  frequency  monetary_value  R_Quartile  F_Quartile  M_Quartile RFMClass
customer_id
CS1112          77.0         15            1012           3           3           3      333
CS1113          51.0         20            1490           2           2           2      222
CS1114          48.0         19            1432           2           2           2      222
CS1115          27.0         22            1659           1           2           1      121
CS1116         219.0         13             857           4           4           3      443


In [None]:
# Identify customer segments
print("Best Customers: ", len(rfmSeg[rfmSeg['RFMClass'] == '111']))         # best score on R, F and M
print('Loyal Customers: ', len(rfmSeg[rfmSeg['F_Quartile'] == 1]))         # top frequency quartile
print("Big Spenders: ", len(rfmSeg[rfmSeg['M_Quartile'] == 1]))            # top monetary quartile
print('Almost Lost: ', len(rfmSeg[rfmSeg['RFMClass'] == '311']))           # top F and M, recency in 3rd quartile
print('Lost Customers: ', len(rfmSeg[rfmSeg['RFMClass'] == '411']))        # top F and M, least recent
print('Lost Cheap Customers: ', len(rfmSeg[rfmSeg['RFMClass'] == '444']))  # worst score on R, F and M

Best Customers:  438
Loyal Customers:  1401
Big Spenders:  1721
Almost Lost:  325
Lost Customers:  163
Lost Cheap Customers:  550
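These segments overlap. A customer in class `111` has `F_Quartile == 1` and `M_Quartile == 1`, so the same customer is also counted among the Loyal Customers and the Big Spenders. If each customer needs exactly one label, one option is a precedence order in which the first matching rule wins. The order below and the four customers are assumptions for illustration and are not part of the original analysis.

```python
import pandas as pd

# Hypothetical RFM scores for four customers (not taken from the dataset).
seg = pd.DataFrame({
    "RFMClass":   ["111", "311", "141", "444"],
    "F_Quartile": [1, 1, 4, 4],
    "M_Quartile": [1, 1, 1, 4],
})

def segment(row):
    """Assumed precedence: exact RFM classes first, then single-attribute rules."""
    if row["RFMClass"] == "111":
        return "Best Customers"
    if row["RFMClass"] == "311":
        return "Almost Lost"
    if row["RFMClass"] == "411":
        return "Lost Customers"
    if row["RFMClass"] == "444":
        return "Lost Cheap Customers"
    if row["F_Quartile"] == 1:
        return "Loyal Customers"
    if row["M_Quartile"] == 1:
        return "Big Spenders"
    return "Other"

seg["segment"] = seg.apply(segment, axis=1)
print(seg["segment"].tolist())
```

With this order the label counts add up to the number of customers. Choosing a different order would shift customers between the overlapping segments.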
