# Implementing and Training Predictive CLV Models in Python

### What is Customer Lifetime Value (CLV)?

CLV is the total profit generated over the entire relationship with a customer. It accounts for:
* Costs to attract, service, and maintain the customer
* Customer transactions (number and value)
* Customer network effects (e.g. word-of-mouth)

**Note:** This model focuses exclusively on the revenue part of the profit equation.

### Why do we care about CLV?

* Customer segmentation to identify the most profitable customers. 
* Identify traits and features of valuable customers.
* Determine how to allocate resources among customers.
* Evaluate how much a company should pay to acquire a customer relationship.

### Business Contexts

##### Contractual
* Customer 'death' can be observed
* Often modeled using survival-based approaches (e.g. membership)

##### Non-contractual
* Customer 'death' is unobserved
* Customer lifetime distribution often modeled via exponential models (e.g. online retailers)

##### Discrete purchases
* Occur at fixed periods or frequencies (e.g. magazine subscription)

##### Continuous purchases
* Can happen at any time

|                     | Contractual                                                                                                                           | Non-Contractual                                                       |
|:---------------------|:---------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------|
| Discrete Purchase   | - Magazine/newspaper subscriptions<br>- Fitness clubs<br>- Most insurance/lending products<br>- Streaming services<br>- Most cell phone plans | - Prescription refills<br>- Charity fund drives<br>- Event attendance |
| Continuous Purchase | - Costco membership<br>- Credit cards                                                                                                 | - Movie<br>- Hotel stays<br>- Grocery purchases<br>- Amazon.com       |

> CLV = **Total Number of Purchases of Each Customer** × Value of Each Future Transaction at the Customer Level
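
As a minimal sketch of this revenue-only definition, the snippet below multiplies a predicted purchase count by a predicted value per purchase. The customer IDs, column names, and numbers are hypothetical; in practice both inputs would come from fitted models.

```python
import pandas as pd

# Hypothetical per-customer predictions (made-up numbers for illustration).
customers = pd.DataFrame(
    {
        "expected_purchases": [4.0, 0.5, 12.0],  # predicted future purchase count
        "expected_value": [25.0, 80.0, 10.0],    # predicted revenue per purchase
    },
    index=pd.Index([101, 102, 103], name="customer_id"),
)

# Revenue-only CLV: purchase count times value per purchase, per customer.
customers["clv"] = customers["expected_purchases"] * customers["expected_value"]
print(customers)
```

A customer with few but large purchases (102) can be worth less than one with many small purchases (103), which is why both factors are modeled separately.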

### Pareto/NBD CLV Model

A hierarchical Bayesian model that describes each customer with two latent parameters:
* **Purchase count** in a given time window: Poisson distribution with latent rate $\lambda$
* **Lifetime**: Exponential distribution with latent rate $\mu$

Both parameters are defined at the individual level, and each has a Gamma prior across the customer population.

* **Combined models**:
    * Pareto: Exponential x Gamma
    * NBD: Poisson x Gamma

> The **prior distributions** represent our belief on how the latent parameters are distributed in the customer population.

Advantage of the Pareto/NBD model: it needs only a small amount of data to produce effective results.
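
The two combined models can be written out explicitly. As a sketch, assume the purchase rate follows $\lambda \sim \Gamma(r, \alpha)$ and the dropout rate follows $\mu \sim \Gamma(s, \beta)$, each parameterized by shape and rate. Integrating out $\lambda$ gives the NBD distribution for the number of purchases $x$ in a window of length $t$ (while the customer is alive):

$$ P(X(t) = x~|~r, \alpha) = \int_0^\infty \frac{(\lambda t)^x e^{-\lambda t}}{x!} \cdot \frac{\alpha^r \lambda^{r-1} e^{-\alpha \lambda}}{\Gamma(r)} \, d\lambda = \frac{\Gamma(r+x)}{\Gamma(r)\, x!} \left(\frac{\alpha}{\alpha+t}\right)^r \left(\frac{t}{\alpha+t}\right)^x $$

Integrating out $\mu$ gives the Pareto (type II) survival function for the lifetime $\tau$:

$$ P(\tau > t~|~s, \beta) = \int_0^\infty e^{-\mu t} \cdot \frac{\beta^s \mu^{s-1} e^{-\beta \mu}}{\Gamma(s)} \, d\mu = \left(\frac{\beta}{\beta+t}\right)^s $$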

### Data Structure: Recency-Frequency-Monetary Value

* **Recency** = Last purchase date - Initial purchase date = $t_i - t_0$, for purchases at times $t_0, t_1, \dots, t_i$
* (Repeat) **Frequency** = Number of purchases excluding the initial one = $i$
* **T** (time interval) = End of observation period - Initial purchase date = $t_{now} - t_0$

Pareto/NBD and related models need only an individual-level RFM data structure for training.
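
To make the three definitions concrete, here is a minimal pandas sketch that derives frequency, recency, and T (in days) from a toy transaction log. The data, the `observation_end` date, and the name `rfm_manual` are made up for illustration, and same-day transactions are assumed to count as one purchase.

```python
import pandas as pd

# Toy transaction log (made-up data for illustration only).
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 3, 3],
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-10", "2024-02-01",
             "2024-01-05",
             "2024-01-20", "2024-01-20"]
        ),
    }
)
observation_end = pd.Timestamp("2024-03-01")

# Assumption: several transactions on the same day count as one purchase.
occasions = transactions.drop_duplicates(["customer_id", "date"])

# First purchase, last purchase, and number of purchase occasions per customer.
per_customer = occasions.groupby("customer_id")["date"].agg(["min", "max", "count"])

rfm_manual = pd.DataFrame(
    {
        "frequency": per_customer["count"] - 1,                          # repeat purchases
        "recency": (per_customer["max"] - per_customer["min"]).dt.days,  # last - first
        "T": (observation_end - per_customer["min"]).dt.days,            # end - first
    }
)
print(rfm_manual)
```

Customers with a single purchase occasion (2 and 3) end up with frequency 0 and recency 0, and only T distinguishes them.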

### Generating an RFM object

* Recency: time of most recent purchase (Most recent - Initial)
* Frequency: number of repeat purchases (Total purchases except Initial)
* T: total elapsed time since customer's first purchase

In [32]:
# !pip install lifetimes
import pandas as pd
from datetime import datetime, timedelta  # timedelta(days=n) converts an integer n into a time interval
import lifetimes
from lifetimes.datasets import load_dataset

In [22]:
# Import sample data
cdnow_transactions = load_dataset(
    'CDNOW_sample.txt',
    header=None,
    sep=r'\s+',  # whitespace-delimited file; equivalent to the deprecated delim_whitespace=True
    names=['customer_id','customer_index','date','quantity','amount'],
    converters={'date':lambda x: pd.to_datetime(x, format="%Y%m%d")}
)

cdnow_transactions.head()

Unnamed: 0,customer_id,customer_index,date,quantity,amount
0,4,1,1997-01-01,2,29.33
1,4,1,1997-01-18,2,29.73
2,4,1,1997-08-02,1,14.96
3,4,1,1997-12-12,2,26.48
4,21,2,1997-01-01,3,63.34


In [19]:
cdnow_transactions.shape

(6919, 5)

In [21]:
# 1997-10-01 ~ 1998-06-30 is typically used as out-of-sample data
print(cdnow_transactions.date.min())
print(cdnow_transactions.date.max())

1997-01-01 00:00:00
1998-06-30 00:00:00


##### Data Aggregation
Transactions are aggregated into weekly buckets, so all purchases a customer makes within the same week count as one observation.

In [33]:
# lifetimes provides a transaction log -> RFM util function
rfm = lifetimes.utils.summary_data_from_transaction_data(
        cdnow_transactions,
        'customer_id',
        'date',
        observation_period_end=pd.to_datetime('1997-09-30'),
        freq='W'  # count all the transactions within a week as 1 observation
)

rfm.head()

customer_id,frequency,recency,T
4,2.0,30.0,39.0
18,0.0,0.0,39.0
21,1.0,2.0,39.0
50,0.0,0.0,39.0
60,0.0,0.0,35.0


In [79]:
# Examine the aggregation result
cust_id = 18
selected_cust = cdnow_transactions[cdnow_transactions['customer_id']==cust_id]
selected_cust

Unnamed: 0,customer_id,customer_index,date,quantity,amount
157,18,58,1997-01-04,1,14.96
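
Customer 18 has a single purchase, so frequency and recency are both 0, while T counts the weeks from that purchase to the end of the observation period. The sketch below reproduces T = 39 under the assumption that the weekly buckets behave like pandas weekly periods (`freq="W"`, weeks ending on Sunday); the exact internals of `lifetimes` may differ.

```python
import pandas as pd

# Map both dates to the weekly bucket that contains them.
first_purchase = pd.Period("1997-01-04", freq="W")
observation_end = pd.Period("1997-09-30", freq="W")

# Whole weeks between the starts of the two weekly buckets.
weeks_elapsed = (observation_end.start_time - first_purchase.start_time).days // 7
print(weeks_elapsed)
```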


### Theory

##### Purchasing Rate

The number of purchases made in a period of length $\Delta t$ follows a Poisson distribution:
$$ p(x~|~\lambda, \Delta t ) =  \frac{(\lambda \Delta t)^x}{x!} e^{-\lambda \Delta t} $$

The purchasing rate parameter $\lambda$ follows a Gamma distribution with shape $r$ and rate $\alpha$:

$$\lambda \sim \Gamma(r, \alpha)$$

A customer with a transaction rate $\lambda$ will make on average $\lambda \times \Delta t$ transactions in a period of time $\Delta t$. 
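
As a quick numerical check of the Poisson formula, the sketch below evaluates the probability mass function for a hypothetical customer with $\lambda = 0.5$ purchases per week over a window of $\Delta t = 4$ weeks. The helper name `poisson_pmf` and the parameter values are illustrative assumptions.

```python
import math

def poisson_pmf(x: int, lam: float, dt: float) -> float:
    """P(x purchases in a window of length dt) for a customer with rate lam."""
    mean = lam * dt
    return mean ** x * math.exp(-mean) / math.factorial(x)

lam = 0.5  # hypothetical rate: 0.5 purchases per week
dt = 4.0   # window length in weeks

# Truncate the support at 50 purchases; the remaining mass is negligible here.
probs = [poisson_pmf(x, lam, dt) for x in range(50)]
expected_purchases = sum(x * p for x, p in enumerate(probs))

print(f"P(no purchase) = {probs[0]:.4f}")
print(f"expected purchases = {expected_purchases:.4f}")
```

The expected count matches $\lambda \times \Delta t = 2$, and the probability of no purchase equals $e^{-2}$.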

##### Lifetime 

At the customer level, the lifetime $\tau$ is distributed according to an Exponential distribution:

$$ p(\tau~|~\mu) = \mu e^{-\mu \tau } $$ 

where $\tau > 0$. In other words, each customer has their own lifetime distribution. Note that the expected value of the lifetime $\tau$ is $E[\tau~|~\mu] = \frac{1}{\mu}$.

The value of $\mu$ varies across customers according to another Gamma distribution with shape $s$ and rate $\beta$:

$$\mu \sim \Gamma(s, \beta)$$
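
The full generative story (Gamma-distributed rates, exponential lifetimes, Poisson purchases while alive) can be simulated in a few lines. This is a sketch only: the parameter values for $r$, $\alpha$, $s$, $\beta$ and the window length are hypothetical and not fitted to the CDNOW data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical population parameters, chosen only for illustration.
r, alpha = 0.55, 10.0  # purchase-rate Gamma: shape r, rate alpha
s, beta = 0.60, 12.0   # dropout-rate Gamma: shape s, rate beta
n_customers = 10_000
T = 39.0               # observation window, in the same time unit as the rates

# NumPy parameterizes the Gamma by shape and scale, so scale = 1 / rate.
lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n_customers)
mu = rng.gamma(shape=s, scale=1.0 / beta, size=n_customers)

# An exponential lifetime with rate mu has mean 1 / mu.
lifetime = rng.exponential(scale=1.0 / mu)

# Customers only buy while alive, so the active window is capped at T.
active_time = np.minimum(lifetime, T)
repeat_purchases = rng.poisson(lam * active_time)

print("share still alive at T:", np.mean(lifetime > T))
print("mean repeat purchases:", repeat_purchases.mean())
```

Note the conversion from rate to scale: passing $\alpha$ or $\beta$ directly as `scale` would silently simulate a different population.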

##### Likelihood

Likelihood of an individual customer's purchasing rate $\lambda$ and dropout rate $\mu$, given the observed purchase frequency, recency, and time since the initial purchase:

$$ L(\lambda, \mu~|~x,t_x,T) = \frac{\lambda^x \mu}{\lambda+\mu}e^{-(\lambda+\mu)t_x}+\frac{\lambda^{x+1}}{\lambda+\mu}e^{-(\lambda+\mu)T} $$

where $x$ is the repeat purchase frequency, $t_x$ is the recency, and $T$ is the time from the customer's initial purchase to the end of the calibration/training period.
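
The individual-level likelihood translates directly into code. The function below is a minimal sketch of the formula as written above; the function name and the example values of $\lambda$ and $\mu$ are hypothetical, while $x = 2$, $t_x = 30$, $T = 39$ are the RFM values of customer 4.

```python
import math

def pareto_nbd_individual_likelihood(lam, mu, x, t_x, T):
    """L(lam, mu | x, t_x, T) for a single customer."""
    total = lam + mu
    first_term = lam ** x * mu / total * math.exp(-total * t_x)
    second_term = lam ** (x + 1) / total * math.exp(-total * T)
    return first_term + second_term

# Customer 4 from the RFM table: 2 repeat purchases, recency 30 weeks, T = 39 weeks.
# The rate values are illustrative guesses, not fitted estimates.
likelihood = pareto_nbd_individual_likelihood(lam=0.05, mu=0.01, x=2, t_x=30, T=39)
print(likelihood)
```

For long observation windows or high frequencies the raw likelihood becomes very small, so a practical implementation would typically work with its logarithm instead.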


### Training the Pareto/NBD Model

Split the data by time period:
- Training period: at least 3 times the typical inter-purchase time
- Validation period: at least half the length of the training period
- Forecast period: depends on business needs, but is typically comparable in length to the training period
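
A time-based split can be sketched with plain pandas as shown below. The toy transactions and the cutoff date are made up for illustration; the `lifetimes` package also provides a `calibration_and_holdout_data` utility that produces RFM summaries for both periods, which may be more convenient in practice.

```python
import pandas as pd

# Toy transaction log (made-up data); the dates and cutoff are illustrative only.
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 2, 3],
        "date": pd.to_datetime(
            ["1997-01-05", "1997-11-02", "1997-03-14", "1997-09-30", "1997-10-01"]
        ),
    }
)
training_end = pd.Timestamp("1997-09-30")

# Everything up to and including the cutoff is used for training.
training = transactions[transactions["date"] <= training_end]
validation = transactions[transactions["date"] > training_end]

# Only customers seen during training can be scored on the validation period.
validation = validation[validation["customer_id"].isin(training["customer_id"])]

print(len(training), len(validation))
```

Splitting by date rather than by customer keeps the evaluation honest: the model is fitted on past behavior and judged on purchases it has not seen.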