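# RFM Analysis

RFM analysis evaluates customer data on three metrics: how recently a customer purchased (Recency), how often they purchase (Frequency), and how much they spend (Monetary). It is widely used in marketing to understand customers' purchasing behavior and to optimize marketing strategy accordingly.
- Reference: https://qiita.com/NobuYoshi/items/2e44cbcc6df830c6538e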

<a href="https://colab.research.google.com/github/fuyu-quant/data-science-wiki/blob/main/tabledata/marketing/rfm_analysis.ipynb" target="_blank" rel="noopener noreferrer"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import datetime as dt

In [9]:
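dtypes = {
    'InvoiceNo': 'object',
    'StockCode': 'object',
    'Description': 'object',
    # Caution: int8 only holds -128..127, so larger quantities cannot be
    # represented; a wider type such as 'int32' is a safer choice.
    'Quantity': 'int8',
    'InvoiceDate': 'datetime64[ns]',
    'UnitPrice': 'float64',
    'CustomerID': 'object',
    'Country': 'object'
}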

url = 'https://raw.githubusercontent.com/fuyu-quant/data-science-wiki/develop/datasets/OnlineRetail.csv'
df = pd.read_csv(url, dtype=dtypes, engine='python', encoding='shift_jis')
df.head()

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850 | United Kingdom |
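
The `dtypes` mapping above reads `Quantity` as `int8`, which can only hold values from -128 to 127. The following minimal sketch uses made-up quantities (not values from the dataset) to show how NumPy wraps out-of-range integers when casting to `int8`. Whether `read_csv` runs into this for the actual data depends on the quantities present and on the parser, which is not verified here.

```python
import numpy as np

# Range of values that int8 can represent
info = np.iinfo(np.int8)
print(info.min, info.max)

# Hypothetical quantities (not taken from the dataset), stored as int64
quantities = np.array([6, 127, 128, 300, -300], dtype=np.int64)

# astype performs an unchecked cast, so out-of-range values wrap around
as_int8 = quantities.astype(np.int8)
print(as_int8.tolist())

# Values that changed during the cast, i.e. did not fit into int8
overflowed = quantities[as_int8.astype(np.int64) != quantities]
print(overflowed.tolist())
```

A check of this kind, run on the raw column before narrowing its dtype, is one way to confirm that a compact integer type is safe to use.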


In [15]:
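# Keep rows with non-negative quantity and unit price, and drop rows without a CustomerID
data = df.query('Quantity >= 0 & UnitPrice >= 0').dropna(axis=0, subset=['CustomerID']).copy()
# Line total for each invoice row
data['TotalPrice'] = data['Quantity'] * data['UnitPrice']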


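### Running the RFM Analysis

Specify the reference date (the "current" time) used to compute Recency. Here it is set to the day after the latest invoice date in the data.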

In [16]:
data['InvoiceDate'].max()

Timestamp('2011-12-09 12:50:00')

In [17]:
NOW = dt.datetime(2011,12,10)
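
As a small worked example of the Recency arithmetic: subtracting a purchase timestamp from the reference date yields a timedelta, and `.days` keeps only the whole days. The sketch below uses the latest invoice timestamp shown above plus one hypothetical timestamp; the variable names are illustrative.

```python
import datetime as dt
import pandas as pd

reference_date = dt.datetime(2011, 12, 10)

# Latest invoice in the data (from the output above)
latest = pd.Timestamp("2011-12-09 12:50:00")
# Hypothetical earlier purchase, for illustration only
earlier = pd.Timestamp("2011-11-30 09:00:00")

# .days truncates to whole days: a purchase 11h10m before the reference date counts as 0
days_latest = (reference_date - latest).days
days_earlier = (reference_date - earlier).days
print(days_latest, days_earlier)  # 0 9
```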

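Compute R, F, and M per customer as follows: recency is the number of days from the customer's most recent invoice to `NOW`, frequency is the number of distinct invoices, and monetary is the sum of `TotalPrice`.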

In [18]:
rfm = data.groupby("CustomerID").agg({
    "InvoiceDate": lambda date: (NOW - date.max()).days,  # days since the last purchase
    "InvoiceNo": lambda num: num.nunique(),               # number of distinct invoices
    "TotalPrice": lambda price: price.sum()               # total amount spent
    }).reset_index()

# Rename the columns
rfm.rename(columns={'InvoiceDate': 'recency',
                    'InvoiceNo': 'frequency',
                    'TotalPrice': 'monetary'}, inplace=True)

rfm.head()

| | CustomerID | recency | frequency | monetary |
|---|---|---|---|---|
| 0 | 12346 | 325 | 1 | 26.0 |
| 1 | 12347 | 2 | 7 | 4060.4 |
| 2 | 12348 | 75 | 4 | 1546.68 |
| 3 | 12349 | 18 | 1 | 1757.55 |
| 4 | 12350 | 310 | 1 | 334.4 |
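
To make the aggregation easier to follow, here is a minimal sketch on a tiny made-up transaction table. It uses pandas named aggregation instead of the dict-plus-rename form above, which is a stylistic swap and not the notebook's own code; the toy rows and the `toy_` names are assumptions for illustration.

```python
import datetime as dt
import pandas as pd

# Toy transactions: customer A has two invoices (one with two line items), B has one
toy = pd.DataFrame({
    "CustomerID": ["A", "A", "A", "B"],
    "InvoiceNo": ["1001", "1001", "1002", "1003"],
    "InvoiceDate": pd.to_datetime([
        "2011-12-01 10:00", "2011-12-01 10:00",
        "2011-12-08 09:30", "2011-11-10 14:00",
    ]),
    "TotalPrice": [10.0, 5.0, 20.0, 7.5],
})
toy_now = dt.datetime(2011, 12, 10)

# Named aggregation produces the final column names directly
toy_rfm = (
    toy.groupby("CustomerID")
    .agg(
        recency=("InvoiceDate", lambda d: (toy_now - d.max()).days),
        frequency=("InvoiceNo", "nunique"),
        monetary=("TotalPrice", "sum"),
    )
    .reset_index()
)
print(toy_rfm)
```

Customer A's two line items on invoice 1001 count as a single purchase, because frequency counts distinct invoice numbers, not rows.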


Compute the quartiles (the 25th, 50th, and 75th percentiles) of the three metrics: recency, frequency, and monetary.

In [21]:
quantiles = rfm.drop('CustomerID', axis=1).quantile(q=[0.25, 0.5, 0.75])
quantiles_dict = quantiles.to_dict()
quantiles_dict

{'recency': {0.25: 17.0, 0.5: 50.0, 0.75: 142.0}, 'frequency': {0.25: 1.0, 0.5: 2.0, 0.75: 5.0}, 'monetary': {0.25: 300.67499999999995, 0.5: 656.6899999999999, 0.75: 1601.0}}
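
The nested dictionary above is keyed first by column name and then by quantile level. A minimal sketch on made-up numbers (the `toy` frame is an assumption, not the dataset) shows the same structure and how a single quartile is looked up:

```python
import pandas as pd

# Toy metrics, for illustration only
toy = pd.DataFrame({
    "recency": [1, 2, 3, 4, 5],
    "frequency": [1, 1, 2, 4, 8],
})

# quantile() returns one row per level; to_dict() gives {column: {level: value}}
toy_quantiles = toy.quantile(q=[0.25, 0.5, 0.75]).to_dict()
print(toy_quantiles["recency"])
print(toy_quantiles["frequency"][0.75])
```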


In [32]:
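def cal_R(x, q):
    # Recency: smaller is better, so the most recent quartile gets score 1.
    # q maps 0.25 / 0.50 / 0.75 to the quartile values of the metric.
    if x <= q[0.25]:
        return 1
    elif x <= q[0.50]:
        return 2
    elif x <= q[0.75]:
        return 3
    else:
        return 4

def cal_FM(x, q):
    # Frequency / Monetary: larger is better, so the top quartile gets score 1.
    if x <= q[0.25]:
        return 4
    elif x <= q[0.50]:
        return 3
    elif x <= q[0.75]:
        return 2
    else:
        return 1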
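
The same quartile binning can also be done without a Python-level function call per row. The sketch below is one possible vectorized equivalent using `np.searchsorted`, not the notebook's own method. It takes the five customers and the quartile values from the outputs above (the monetary 25% value is rounded); `quartile_bin` and the other names are hypothetical.

```python
import numpy as np
import pandas as pd

# First five customers and the quartile edges, copied from the outputs above
sample = pd.DataFrame({
    "recency": [325, 2, 75, 18, 310],
    "frequency": [1, 7, 4, 1, 1],
    "monetary": [26.0, 4060.4, 1546.68, 1757.55, 334.4],
})
cuts = {
    "recency": [17.0, 50.0, 142.0],
    "frequency": [1.0, 2.0, 5.0],
    "monetary": [300.675, 656.69, 1601.0],  # 25% value rounded
}

def quartile_bin(values, edges):
    # Number of edges strictly below each value (0..3); side="left" keeps a
    # value equal to an edge in the lower bin, matching the `<=` comparisons
    return np.searchsorted(edges, values.to_numpy(), side="left")

# Recency: lower bin is better (score 1); Frequency/Monetary: higher bin is better
r_score = quartile_bin(sample["recency"], cuts["recency"]) + 1
f_score = 4 - quartile_bin(sample["frequency"], cuts["frequency"])
m_score = 4 - quartile_bin(sample["monetary"], cuts["monetary"])
print(r_score.tolist(), f_score.tolist(), m_score.tolist())
```

For a few thousand customers the difference is negligible, so the choice between `apply` and a vectorized form is mostly a matter of readability.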

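Customers are then segmented by their RFM scores. With the scoring functions above, 1 is the best quartile and 4 the worst for every metric, so, for example, customers with two or more 1s can be treated as high-value customers.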

In [37]:
rfm['R_score'] = rfm['recency'].apply(cal_R, args=(quantiles_dict['recency'],))
rfm['F_score'] = rfm['frequency'].apply(cal_FM, args=(quantiles_dict['frequency'],))
rfm['M_score'] = rfm['monetary'].apply(cal_FM, args=(quantiles_dict['monetary'],))

rfm["RFM_score"] = rfm.R_score.astype(str) + rfm.F_score.astype(str) + rfm.M_score.astype(str)

rfm.reset_index(inplace=True)
rfm.head()


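| | index | CustomerID | recency | frequency | monetary | R_score | F_score | M_score | RFM_score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 12346 | 325 | 1 | 26.0 | 4 | 4 | 4 | 444 |
| 1 | 1 | 12347 | 2 | 7 | 4060.4 | 1 | 1 | 1 | 111 |
| 2 | 2 | 12348 | 75 | 4 | 1546.68 | 3 | 2 | 2 | 322 |
| 3 | 3 | 12349 | 18 | 1 | 1757.55 | 2 | 4 | 1 | 241 |
| 4 | 4 | 12350 | 310 | 1 | 334.4 | 4 | 4 | 3 | 443 |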
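
One way to turn these scores into segments is sketched below. Under the scoring functions above, 1 is the best quartile for each metric, so the sketch counts how many of the three scores equal 1. The threshold of two and the labels `high-value` and `other` are illustrative assumptions, not a rule defined by the notebook.

```python
import pandas as pd

# Scores for the first five customers, copied from the output above
scores = pd.DataFrame({
    "CustomerID": ["12346", "12347", "12348", "12349", "12350"],
    "R_score": [4, 1, 3, 2, 4],
    "F_score": [4, 1, 2, 4, 4],
    "M_score": [4, 1, 2, 1, 3],
})

# Count how many of the three scores fall in the best quartile (score 1)
best_count = (scores[["R_score", "F_score", "M_score"]] == 1).sum(axis=1)

# Hypothetical two-way split: two or more best-quartile scores -> "high-value"
scores["segment"] = best_count.ge(2).map({True: "high-value", False: "other"})
print(scores[["CustomerID", "segment"]])
```

In this sample only customer 12347 (score 111) qualifies; customer 12349 has a single 1 (monetary) and stays in `other`.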
