# Customer Lifetime Value

Customer lifetime value (CLV) modeling leverages a class of statistical models designed to predict customers' purchasing behavior.

The customer lifetime value calculation uses the following four metrics derived from the dataset:
* Recency: The age of the customer at the time of their last purchase
* Monetary: The customer's average revenue per order
* Frequency: Number of purchases/transactions
* Age (T): The age of the customer's relationship with the company
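As a rough illustration of the four definitions above, the sketch below computes them for a single customer. The purchase history is made up for the example; only the cut-off date matches the one used later in the notebook.

```python
import datetime as dt

# Hypothetical purchase history for one customer: (invoice date, invoice revenue).
purchases = [
    (dt.datetime(2011, 1, 10), 120.0),
    (dt.datetime(2011, 3, 1), 80.0),
    (dt.datetime(2011, 6, 19), 100.0),
]

# End of the observation period (the notebook uses 2011-12-12).
today_date = dt.datetime(2011, 12, 12)

dates = [date for date, _ in purchases]
first_purchase, last_purchase = min(dates), max(dates)

recency = (last_purchase - first_purchase).days   # customer age at last purchase, in days
T = (today_date - first_purchase).days            # age of the relationship, in days
frequency = len(purchases)                        # number of purchases
monetary_value = sum(revenue for _, revenue in purchases) / frequency  # average revenue per order

print(recency, T, frequency, monetary_value)
```

Note that the Lifetimes helper `summary_data_from_transaction_data` counts only repeat purchases as frequency (total purchases minus one), so its frequency would be one lower than the definition used here.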

Through the Lifetimes library, the notebook fits a Beta-Geometric/Negative Binomial Distribution (BG/NBD) model to predict the expected number of purchases by each customer over the next 90 days, as well as the probability that a customer will return (is still "alive"), based on their frequency and recency. From there, a Gamma-Gamma model of average order value is added to calculate the total value of a customer, here projected over the next 12 months of their relationship with the company.
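To give a feel for how frequency and recency drive the "alive" probability, here is a minimal sketch of the closed-form BG/NBD expression for P(alive). The parameter values `r`, `alpha`, `a` and `b` are hypothetical placeholders, not the values fitted in this notebook, and the function is an illustration rather than the library's own code.

```python
def probability_alive(frequency, recency, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active.

    frequency: purchase count x, recency: age at last purchase t_x,
    T: age of the relationship; r, alpha, a, b: model parameters.
    """
    if frequency == 0:
        # With no purchases recorded, the model has no evidence of dropout.
        return 1.0
    # The longer the silence since the last purchase, the larger this ratio.
    ratio = (alpha + T) / (alpha + recency)
    odds_dead = (a / (b + frequency - 1)) * ratio ** (r + frequency)
    return 1.0 / (1.0 + odds_dead)


# Hypothetical parameters, chosen only for illustration.
params = dict(r=0.2, alpha=60.0, a=0.1, b=2.0)

# A customer who bought recently versus one who has been silent for a long time.
recent = probability_alive(frequency=8, recency=402, T=406, **params)
lapsed = probability_alive(frequency=2, recency=247, T=725, **params)
print(recent, lapsed)
```

A customer whose last purchase is close to the end of the observation period scores higher than one with a long gap since their last purchase, which matches the pattern in the `probability_alive` column computed later.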

The notebook below served as a proof of concept and was ultimately used in the web app dashboard.

In [1]:
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from lifetimes import BetaGeoFitter
from lifetimes import GammaGammaFitter

from lifetimes.plotting import *
from lifetimes.utils import *
import joblib

import plotly.graph_objects as go


In [2]:
df = pd.read_csv("Data/cleaned_sales_data")
df.InvoiceDate = pd.to_datetime(df.InvoiceDate)
df.drop(df.loc[df["Customer ID"] == 12346.0].index, inplace = True)
#df.dropna(inplace = True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1039537 entries, 0 to 1039561
Data columns (total 10 columns):
 #   Column       Non-Null Count    Dtype         
---  ------       --------------    -----         
 0   Invoice      1039537 non-null  int64         
 1   StockCode    1039537 non-null  int64         
 2   Description  1037881 non-null  object        
 3   Quantity     1039537 non-null  int64         
 4   InvoiceDate  1039537 non-null  datetime64[ns]
 5   Price        1039537 non-null  float64       
 6   Customer ID  802668 non-null   float64       
 7   Country      1039537 non-null  object        
 8   Revenue      1039537 non-null  float64       
 9   refund       0 non-null        float64       
dtypes: datetime64[ns](1), float64(4), int64(3), object(2)
memory usage: 87.2+ MB


In [13]:
today_date = dt.datetime(2011, 12, 12)

rfm = df.groupby("Customer ID").agg({"InvoiceDate": [lambda date: (date.max() - date.min()).days,
                                                     lambda date: (today_date - date.min()).days],
                                     "Invoice": lambda num: num.nunique(),
                                      "Revenue": lambda price: price.sum()}) #total price per customer

rfm.columns = rfm.columns.droplevel(0)
rfm.columns = ['recency', "T", 'frequency', "monetary_value"]

# Calculating average monetary values per order:
rfm["monetary_value"] = rfm["monetary_value"] / rfm["frequency"]

In [14]:
rfm

Customer ID,recency,T,frequency,monetary_value
12347.0,402,406,8,704.165000
12348.0,362,440,5,331.680000
12349.0,570,591,3,1226.230000
12350.0,0,312,1,294.400000
12351.0,0,377,1,300.930000
...,...,...,...,...
18283.0,654,660,22,124.122727
18284.0,0,433,1,411.680000
18285.0,0,662,1,377.000000
18286.0,247,725,2,623.215000


In [17]:
bgf = BetaGeoFitter(penalizer_coef=0.01)
bgf.fit(rfm['frequency'], rfm['recency'], rfm['T'])

rfm["probability_alive"] = bgf.conditional_probability_alive(rfm['frequency'], rfm['recency'], rfm['T'])
rfm

Customer ID,recency,T,frequency,monetary_value,probability_alive
12347.0,402,406,8,704.165000,0.971673
12348.0,362,440,5,331.680000,0.880941
12349.0,570,591,3,1226.230000,0.916467
12350.0,0,312,1,294.400000,0.015541
12351.0,0,377,1,300.930000,0.010767
...,...,...,...,...,...
18283.0,654,660,22,124.122727,0.988626
18284.0,0,433,1,411.680000,0.008196
18285.0,0,662,1,377.000000,0.003488
18286.0,247,725,2,623.215000,0.249516


In [26]:
# Predict the expected number of transactions for each customer over the next 90 days
rfm['expctd_num_of_purch'] = bgf.predict(90, rfm['frequency'], rfm['recency'], rfm['T']) 
rfm.sort_values("expctd_num_of_purch",ascending=False).head()

Unnamed: 0,Customer ID,recency,T,frequency,monetary_value,probability_alive,expctd_num_of_purch,CLV
1824,14911,737,740,373,741.701367,0.997664,43.101961,116259.898846
273,12748,734,737,322,162.084814,0.99777,37.37427,22222.129309
3950,17841,735,739,211,335.76763,0.9971,24.455944,29952.121691
2112,15311,738,740,207,562.686763,0.998317,23.993238,49073.402728
513,13089,734,739,203,575.06335,0.996228,23.513032,49139.225493


In [27]:
# Keep customers with an average order value of at least 1 (drops zero and negative values)
rfm = rfm[rfm.monetary_value >= 1]
# Keep repeat customers only (frequency greater than 1); customers with a single
# purchase are not useful here because the model looks at repeat customers
rfm = rfm[rfm['frequency'] > 1]
# Instantiate the Gamma-Gamma model and fit it on frequency and monetary value
ggf = GammaGammaFitter(penalizer_coef=0)
ggf.fit(rfm['frequency'], rfm['monetary_value'])

print(ggf)

<lifetimes.GammaGammaFitter: fitted with 4233 subjects, p: 1.83, q: 3.96, v: 613.20>
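The fitted `p`, `q` and `v` can be read as follows: the Gamma-Gamma model estimates each customer's expected average order value as a weighted blend of the population mean and the customer's own observed average, with more weight on the customer's own data as frequency grows. The sketch below applies that blend using the rounded parameters printed above, so the results are approximate; the function name is only illustrative.

```python
def expected_average_value(frequency, monetary_value, p, q, v):
    """Gamma-Gamma conditional expectation of a customer's average order value."""
    # Average order value across the whole population implied by the fit.
    population_mean = v * p / (q - 1)
    # Weight on the customer's own average; approaches 1 as frequency grows.
    weight = p * frequency / (p * frequency + q - 1)
    return (1 - weight) * population_mean + weight * monetary_value


# Parameters rounded from the printed fit above.
p, q, v = 1.83, 3.96, 613.20

# Two customers taken from the rfm table: 2 orders versus 22 orders.
few_orders = expected_average_value(2, 100.57, p, q, v)
many_orders = expected_average_value(22, 124.122727, p, q, v)
print(few_orders, many_orders)
```

The customer with only two orders is pulled strongly toward the population mean, while the estimate for the customer with 22 orders stays close to their observed average.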


In [28]:
rfm["CLV"] = ggf.customer_lifetime_value(
    bgf, #the model to use to predict the number of future transactions
    rfm['frequency'],
    rfm['recency'],
    rfm['T'],
    rfm['monetary_value'],
    time=12, # Over the next 12 months
    discount_rate=0.01)  # Monthly discount rate applied to future cash flows
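The `time` and `discount_rate` arguments describe a discounted cash flow: expected revenue in each future month is divided by `(1 + discount_rate)` raised to the month number, and the months are summed. The following is a minimal sketch of that idea only, assuming a hypothetical flat forecast of monthly transactions; in the real calculation the monthly counts come from the BG/NBD model and the order value from the Gamma-Gamma model.

```python
def discounted_clv(monthly_transactions, avg_value, discount_rate=0.01):
    """Sum expected monthly revenue, discounting month m by (1 + rate) ** m."""
    clv = 0.0
    for month, n_transactions in enumerate(monthly_transactions, start=1):
        clv += (n_transactions * avg_value) / (1 + discount_rate) ** month
    return clv


# Hypothetical forecast: 0.5 expected purchases in each of the next 12 months.
monthly_forecast = [0.5] * 12

undiscounted = discounted_clv(monthly_forecast, avg_value=100.0, discount_rate=0.0)
discounted = discounted_clv(monthly_forecast, avg_value=100.0, discount_rate=0.01)
print(undiscounted, discounted)
```

With a 1% monthly rate, revenue expected a year from now is worth roughly 11% less than the same revenue today, so the discounted total is somewhat lower than the plain sum.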

In [29]:
rfm.reset_index(inplace = True)


In [30]:
rfm["Customer ID"] = rfm["Customer ID"].astype(int)
rfm.to_csv("Data/rfm", index = False)
rfm

Unnamed: 0,index,Customer ID,recency,T,frequency,monetary_value,probability_alive,expctd_num_of_purch,CLV
0,0,12347,402,406,8,704.165000,0.971673,1.790380,4145.297084
1,1,12348,362,440,5,331.680000,0.880941,1.011752,1238.190811
2,2,12349,570,591,3,1226.230000,0.916467,0.539137,1797.787902
3,3,12352,356,394,9,192.171111,0.942280,1.980055,1556.670517
4,4,12353,204,410,2,203.380000,0.516001,0.321745,319.769947
...,...,...,...,...,...,...,...,...,...
4228,4228,18281,397,579,2,100.570000,0.717020,0.325524,261.741813
4229,4229,18282,118,128,2,89.025000,0.862802,1.435056,1049.453999
4230,4230,18283,654,660,22,124.122727,0.988626,2.941923,1511.545892
4231,4231,18286,247,725,2,623.215000,0.249516,0.091742,169.733555
