<div class="alert alert-block alert-info" style="margin-top: 20px">
<h1>Customer Lifetime Value</h1>
<p><strong>Customer lifetime value (CLTV).</strong> </p>
<p>It is the monetary value that a customer brings to a company over the course of their relationship with it. </p>
<p>In the RFM analysis, we divided our customers into segments, but that did not tell us how much value these customers would add from a broader perspective, i.e. projected over time. </p>
<hr />
<p><strong>CLTV</strong> = (Customer Value / Churn Rate) x Profit Margin</p>
<p><strong>Customer Value</strong> = Average Order Value x Purchase Frequency</p>
<p><strong>Average Order Value</strong> = Total Revenue / Total Number of Orders</p>
<p><strong>Purchase Frequency</strong> = Total Number of Orders / Total Number of Customers</p>
<p><strong>Churn Rate</strong> = 1 - Repeat Rate</p>
<p>&nbsp;</p>
<p><strong>Repeat Rate:</strong> The share of customers who come back and buy again. If 10 customers made a purchase and 2 of them came back, the repeat rate is 0.2. </p>
<hr />
<h1>Report</h1>
<p>Customer lifetime values have been calculated, but the calculation has <strong>no time dimension</strong>. </p>
<p>That&#39;s why I also made a customer lifetime value prediction using the BG-NBD and Gamma-Gamma models. </p>
</div>
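As a quick sanity check on the formulas above, the sketch below plugs in made-up aggregate figures. The revenue, order, and customer counts are illustrative assumptions, not values from this dataset. The repeat rate reuses the 2-in-10 ratio from the example, and the 5% profit margin is the rate this notebook applies later.

```python
# Illustrative aggregate figures (assumed, not taken from the dataset)
total_revenue = 10_000.0
total_orders = 200
total_customers = 50
returning_customers = 10   # same 2-in-10 ratio as the repeat rate example
profit_margin = 0.05       # assumed 5% margin

average_order_value = total_revenue / total_orders
purchase_frequency = total_orders / total_customers
repeat_rate = returning_customers / total_customers
churn_rate = 1 - repeat_rate
customer_value = average_order_value * purchase_frequency
cltv = (customer_value / churn_rate) * profit_margin

print(f"customer value: {customer_value:.2f}, churn rate: {churn_rate:.2f}, CLTV: {cltv:.2f}")
```

Because the churn rate sits in the denominator, a lower churn rate raises CLTV even when the order value and frequency stay the same.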

In [1]:
import pandas as pd
import datetime as dt
from sklearn.preprocessing import MinMaxScaler
from helpers import check_df, replace_with_thresholds  # local helper module

In [2]:
df_ = pd.read_excel("online_retail_II.xlsx", sheet_name='Year 2010-2011')

In [3]:
df = df_.copy()

In [4]:
check_df(df)

##################### Shape #####################
(541910, 8)
##################### Types #####################
Invoice                object
StockCode              object
Description            object
Quantity                int64
InvoiceDate    datetime64[ns]
Price                 float64
Customer ID           float64
Country                object
dtype: object
##################### Head #####################
  Invoice StockCode                         Description  Quantity  \
0  536365    85123A  WHITE HANGING HEART T-LIGHT HOLDER         6   
1  536365     71053                 WHITE METAL LANTERN         6   
2  536365    84406B      CREAM CUPID HEARTS COAT HANGER         8   

          InvoiceDate  Price  Customer ID         Country  
0 2010-12-01 08:26:00   2.55      17850.0  United Kingdom  
1 2010-12-01 08:26:00   3.39      17850.0  United Kingdom  
2 2010-12-01 08:26:00   2.75      17850.0  United Kingdom  
##################### NA #####################
Invoice             0

# Data Preparation for CLTV

In [5]:
# Drop rows with missing values
df.dropna(axis=0, inplace=True)
# Invoices containing "C" are cancellations/refunds; drop them
df = df[~df["Invoice"].str.contains("C", na=False)]
# Keep only rows with a positive quantity; copy to avoid modifying a slice
df = df[df["Quantity"] > 0].copy()
# Replace outliers with threshold values (based on the 0.01 and 0.99 quantiles)
replace_with_thresholds(df, "Quantity")
replace_with_thresholds(df, "Price")
# Create the TotalPrice feature
df["TotalPrice"] = df["Quantity"] * df["Price"]
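`replace_with_thresholds` comes from the local `helpers` module, whose source is not shown here. One common way to implement this kind of capping, and only an assumption about what the helper does, is to build IQR-style limits from the 0.01 and 0.99 quantiles and clip values to them. The function name `cap_outliers_sketch` and the 1.5 multiplier are hypothetical.

```python
import pandas as pd

def cap_outliers_sketch(frame, column, low_q=0.01, high_q=0.99, factor=1.5):
    """Return a copy of `frame` with `column` clipped to quantile-based limits.

    Hypothetical stand-in for helpers.replace_with_thresholds; the real helper
    may use different limits or modify the frame in place.
    """
    q_low = frame[column].quantile(low_q)
    q_high = frame[column].quantile(high_q)
    spread = q_high - q_low
    lower = q_low - factor * spread
    upper = q_high + factor * spread
    capped = frame.copy()
    capped[column] = capped[column].clip(lower=lower, upper=upper)
    return capped

# Toy data (made up): the integers 1..100 plus one extreme value
toy = pd.DataFrame({"Quantity": list(range(1, 101)) + [10_000]})
capped = cap_outliers_sketch(toy, "Quantity")
print(toy["Quantity"].max(), "->", capped["Quantity"].max())
```

Capping keeps the row instead of dropping it, so the customer's order still counts toward frequency while the extreme value no longer dominates `TotalPrice`.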

In [6]:
df["InvoiceDate"].max()

Timestamp('2011-12-09 12:50:00')

In [7]:
today_date = dt.datetime(2011, 12, 11)
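The analysis date is set shortly after the latest invoice so that every recency value is positive. The short sketch below uses the two timestamps shown above to illustrate that `.days` on a `timedelta` keeps whole days only, which is how the recency metric in the next cell is measured.

```python
import datetime as dt

today_date = dt.datetime(2011, 12, 11)             # analysis date used in this notebook
last_invoice = dt.datetime(2011, 12, 9, 12, 50)    # latest InvoiceDate shown above

delta = today_date - last_invoice
recency_days = delta.days   # whole days only; the leftover hours are dropped

print(delta, "->", recency_days, "day(s)")
```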

# RFM Metrics

In [8]:
rfm = df.groupby('Customer ID').agg({'InvoiceDate': lambda date: (today_date - date.max()).days,
                                            'Invoice': lambda num: num.nunique(),
                                            "TotalPrice": lambda price: price.sum()})

rfm.columns = ['recency', 'frequency', "monetary"]

rfm = rfm[(rfm['monetary'] > 0)]


# CALCULATING THE RFM SCORES
rfm["recency_score"] = pd.qcut(rfm['recency'], 5, labels=[5, 4, 3, 2, 1])
rfm["frequency_score"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5])

# Monetary is not used in the segment definitions, so it is not scored.

# NAMING THE SEGMENTS
rfm['rfm_segment'] = rfm['recency_score'].astype(str) + rfm['frequency_score'].astype(str)

seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

rfm['rfm_segment'] = rfm['rfm_segment'].replace(seg_map, regex=True)
rfm = rfm[["recency", "frequency", "monetary", "rfm_segment"]]
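The two-character key built above is the recency score followed by the frequency score, and each regular expression in `seg_map` claims a block of that 5 x 5 grid. The sketch below uses plain Python with `re.fullmatch` rather than the pandas `replace` call, and checks that every score pair lands in exactly one segment. The helper name `segment_for` is hypothetical.

```python
import re

seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

def segment_for(score_pair):
    """Return every segment name whose pattern matches the whole score pair."""
    return [name for pattern, name in seg_map.items()
            if re.fullmatch(pattern, score_pair)]

# All 25 combinations of recency score (first digit) and frequency score (second digit)
all_pairs = [f"{r}{f}" for r in range(1, 6) for f in range(1, 6)]
coverage = {pair: segment_for(pair) for pair in all_pairs}

print(coverage["55"], coverage["15"], coverage["33"])
```

Since the patterns do not overlap, the order of the entries in `seg_map` does not affect which segment a customer receives.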

In [9]:
rfm.head()

Customer ID,recency,frequency,monetary,rfm_segment
12346.0,326,1,310.44,hibernating
12347.0,3,7,4310.0,champions
12348.0,76,4,1770.78,at_risk
12349.0,19,1,1491.72,promising
12350.0,311,1,331.46,hibernating


# CLTV Calculation

In [10]:
# avg_order_value
rfm['avg_order_value'] = rfm['monetary'] / rfm['frequency']

In [11]:
# purchase_frequency
rfm["purchase_frequency"] = rfm['frequency'] / rfm.shape[0]  # this customer's order count divided by the total number of customers

In [12]:
# repeat rate & churn rate
repeat_rate = rfm[rfm.frequency > 1].shape[0] / rfm.shape[0]
churn_rate = 1 - repeat_rate

In [13]:
# profit: assumes a 5% margin on each customer's total spend
rfm['profit_margin'] = rfm['monetary'] * 0.05

In [14]:
# Customer Value
rfm['cv'] = (rfm['avg_order_value'] * rfm["purchase_frequency"])

In [15]:
# Customer Lifetime Value
rfm['cltv'] = (rfm['cv'] / churn_rate) * rfm['profit_margin']
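Substituting the columns defined in the cells above, the per-customer frequency cancels out: `cv` equals `monetary / N`, so `cltv` reduces to `0.05 * monetary**2 / (N * churn_rate)`, where `N` is the number of customers. In other words, this CLTV ranks customers by `monetary` alone. The plain-Python check below uses made-up customers; the function names are hypothetical.

```python
import math

# Hypothetical customers: name -> (frequency, monetary). Values are made up.
customers = {"a": (1, 300.0), "b": (7, 4300.0), "c": (4, 1800.0)}

n_customers = len(customers)
repeat_rate = sum(1 for freq, _ in customers.values() if freq > 1) / n_customers
churn_rate = 1 - repeat_rate

def cltv_stepwise(freq, monetary):
    """Follow the notebook's cells one by one."""
    avg_order_value = monetary / freq
    purchase_frequency = freq / n_customers
    profit_margin = monetary * 0.05
    customer_value = avg_order_value * purchase_frequency
    return (customer_value / churn_rate) * profit_margin

def cltv_closed_form(monetary):
    """Same quantity after cancelling the frequency terms."""
    return 0.05 * monetary ** 2 / (n_customers * churn_rate)

stepwise = {name: cltv_stepwise(f, m) for name, (f, m) in customers.items()}
closed = {name: cltv_closed_form(m) for name, (_, m) in customers.items()}

for name in customers:
    print(name, round(stepwise[name], 2), round(closed[name], 2))
```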

In [16]:
# Scale cltv to the range [1, 100] with MinMaxScaler
scaler = MinMaxScaler(feature_range=(1, 100))
scaler.fit(rfm[["cltv"]])
rfm["cltv_c"] = scaler.transform(rfm[["cltv"]])

rfm["cltv_c_segment"] = pd.qcut(rfm["cltv_c"], 3, labels=["C", "B", "A"])

rfm = rfm[["recency", "frequency", "monetary", "rfm_segment",
                       "cltv_c", "cltv_c_segment"]]

In [17]:
rfm.head()

Customer ID,recency,frequency,monetary,rfm_segment,cltv_c,cltv_c_segment
12346.0,326,1,310.44,hibernating,1.000135,C
12347.0,3,7,4310.0,champions,1.025959,A
12348.0,76,4,1770.78,at_risk,1.004382,A
12349.0,19,1,1491.72,promising,1.00311,A
12350.0,311,1,331.46,hibernating,1.000154,C
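`MinMaxScaler(feature_range=(1, 100))` rescales values linearly so that the smallest becomes 1 and the largest becomes 100. The plain-Python sketch below applies the same formula to made-up, heavily skewed numbers. It suggests why most scaled values can sit very close to 1, as in the output above, when a few customers have a much larger CLTV than the rest. The function name `min_max_scale` and the sample values are illustrative assumptions.

```python
def min_max_scale(values, low=1.0, high=100.0):
    """Linearly map `values` so that min -> low and max -> high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # All values identical: map everything to the lower bound
        return [low for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]

# Made-up, skewed raw values: three small ones and one very large one
raw = [1.0, 2.0, 3.0, 1000.0]
scaled = min_max_scale(raw)

for r, s in zip(raw, scaled):
    print(r, "->", round(s, 4))
```

Because `pd.qcut` splits on quantiles rather than on the scaled magnitude, a customer can still fall into segment A with a scaled value barely above 1.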


In [18]:
rfm.to_excel("CLTV.xlsx")  # keep the Customer ID index so each row can be traced back to a customer