<a href="https://colab.research.google.com/github/seanreed1111/BDA_py_demos/blob/master/btyd_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

see also: 
- https://www.briancallander.com/posts/customer_lifetime_value/pareto-nbd.html
- https://www.briancallander.com/posts/customer_lifetime_value/recency_frequency.Rmd
- https://www.briancallander.com/posts/customer_lifetime_value/recency_frequency.html
- https://github.com/mplatzer/BTYDplus/blob/master/R/pareto-nbd-mcmc.R
- https://cran.r-project.org/web/packages/BTYD/BTYD.pdf
- https://github.com/mplatzer/BTYDplus



In [1]:
# installation required
!pip install pyro-ppl=='1.8.0'


Collecting pyro-ppl==1.8.0
  Downloading pyro_ppl-1.8.0-py3-none-any.whl (713 kB)

<a id = "7"></a><br>
# LIBRARIES

In [30]:
import os
import datetime as dt
import pandas as pd
pd.set_option('display.max_rows', 100)
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import logging
from scipy.stats import expon, poisson, uniform, lognorm

from numpy.random import Generator, PCG64
numpy_randomGen = Generator(PCG64(seed=1))


import torch
from torch.distributions import constraints
from torch import tensor

import pyro
import pyro.distributions as dist
from pyro.infer import SVI,Trace_ELBO
from pyro.infer.autoguide  import AutoMultivariateNormal, AutoNormal, init_to_mean
from pyro.optim import ClippedAdam

# Set matplotlib settings
%matplotlib inline
plt.style.use('default')
plt.rcParams['figure.figsize'] = [8, 4]
import warnings 
warnings.filterwarnings("ignore")

DEBUG:matplotlib.pyplot:Loaded backend module://ipykernel.pylab.backend_inline version unknown.


Let’s describe the model first by simulation.

Suppose we have a company that is 2 years old and has a total of 2000 customers, $C$, each of whom has made at least one purchase from us.

We’ll assume a linear rate of customer acquisition, so that the first purchase date is simply a uniform random variable over the 2 years of the company’s existence. These assumptions are just to keep the example concrete and are not important for understanding the model.

Each customer $c \in C$ is assumed to have a certain lifetime, $\tau_c$, starting on their join date.

During their lifetime, they purchase at a constant rate, $\lambda_c$, so that they make $k \sim \mathrm{Poisson}(t \lambda_c)$ purchases over a time interval of length $t$.

Once their lifetime is over, they stop purchasing. We only observe the customer for $T_c$ units of time, and this observation time can be either larger or smaller than the lifetime, $\tau_c$.

Since we don’t observe $\tau_c$ itself, we will assume it follows an exponential distribution with rate $\mu_c$ (mean lifetime $1/\mu_c$), i.e. $\tau_c \sim \mathrm{Exp}(\mu_c)$.
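
The generative story above can be sketched for a single customer as follows. This is a minimal illustration, not the simulation used in the rest of the notebook: the helper name `simulate_customer`, the parameter values, and the shortcut of a single Poisson draw over the time the customer is both alive and observed are assumptions made for the example.

```python
import numpy as np

def simulate_customer(lam, mu, T_obs, rng):
    """Draw one customer's latent lifetime and purchase count (hypothetical helper)."""
    # lifetime ~ Exp(rate=mu): NumPy parameterizes the exponential by scale = 1 / rate
    tau = rng.exponential(scale=1.0 / mu)
    # purchases can only happen while the customer is both alive and observed
    active_time = min(tau, T_obs)
    # purchase count ~ Poisson(rate * length of the active interval)
    k = rng.poisson(lam * active_time)
    return tau, active_time, k

rng = np.random.default_rng(0)  # illustrative seed
# assumed values: one purchase every 10 periods on average, mean lifetime of 100 periods
draws = [simulate_customer(lam=0.1, mu=0.01, T_obs=100.0, rng=rng) for _ in range(5)]
for tau, active_time, k in draws:
    print(f"tau={tau:.1f}  active={active_time:.1f}  purchases={k}")
```

Customers whose lifetime is shorter than the observation window stop purchasing early, which is exactly the behaviour the model has to infer from purchase counts and recency alone.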

In [3]:
logging.basicConfig(level=logging.DEBUG)

In [4]:
def create_start_dates(n, max_number_of_periods):
  '''
  returns an array of n start dates in interval [0, max_number_of_periods)
  
  inputs 
  int n: number of customers to generate
  int max_number_of_periods: max number of periods customer can be observed in simulation

  output: 
  start_date[n]: starting period of customer n, starting from 0
  '''
  return np.random.default_rng(1).integers(low=0, high=max_number_of_periods, size=n)

In [42]:
def simulate_purchases(*,T, mean_customer_lifetime, mean_period_between_purchases, var_customer_lifetime = None, var_period_between_purchases = None, max_number_of_periods=200  ):
  '''
  Simulate the purchase history of a single customer.

  input:
  T: customer enrollment date
  mean_customer_lifetime: mean of customer lifetime, in periods
  mean_period_between_purchases: mean period between purchases
  var_customer_lifetime: var of customer lifetime (currently unused)
  var_period_between_purchases: var of period between purchases (currently unused)
  max_number_of_periods: last period of the simulation

  output (in this order):
  T: customer enrollment date
  max_number_of_periods - T: observation time = the length of time they have been a customer
  k: number of purchases simulated after enrollment
  tau: actual (latent) lifetime for this customer (drawn from exponential distribution with scale=mean_customer_lifetime)
  t: date of the last purchase (equal to T if no purchase was simulated)
  '''
  from scipy.stats import expon

  assert mean_customer_lifetime > 0 and mean_period_between_purchases > 0, "mean lifetime and mean period between purchases must both be > 0"
  
  tau = expon.rvs(scale=mean_customer_lifetime) # actual lifetime for this customer
  t, k = T, 0 
  wait = expon.rvs(scale=mean_period_between_purchases) # waiting time between purchases
  while ((t + wait) < min(max_number_of_periods, T + tau)): 
    t = t + wait
    k = k + 1
    wait = expon.rvs(scale=mean_period_between_purchases)

  return T, max_number_of_periods - T, k, tau, t  # final value of t is the time of the last purchase

simulate_purchases_vec = np.vectorize(simulate_purchases)
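
`np.vectorize` is what lets `simulate_purchases` accept an array of enrollment dates: it calls the Python function once per element and, when the function returns a tuple, hands back a tuple of arrays. The toy function `observation_window` below is hypothetical and only illustrates that behaviour. Note that `np.vectorize` is a convenience loop, not a performance optimization.

```python
import numpy as np

def observation_window(*, T, horizon=200):
    """Toy stand-in for simulate_purchases: returns two values per customer."""
    return T, horizon - T

observation_window_vec = np.vectorize(observation_window)

# keyword arguments are broadcast element by element, just like positional ones
start, observed = observation_window_vec(T=np.array([94, 102, 151]))
print(start)     # enrollment dates, unchanged
print(observed)  # array([106, 98, 49])
```

Each returned array has one entry per customer, which is why the result can be stacked into a data frame with one row per output.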




In [36]:
simulate_purchases(T=94, mean_customer_lifetime=100, mean_period_between_purchases=8)

(94, 106, 0, 1.7772683997866974, 94)

In [49]:
def create_customer_df(*,n, mean_customer_lifetime, mean_period_between_purchases, max_number_of_periods, var_customer_lifetime = None, var_period_between_purchases = None):
  '''
  output: 
    dataframe[['enrollment_date', 'T_observed', 'purchases', 'tau', 'date_of_last_purchase']] where

  enrollment_date: enrollment date
  T_observed: observation time (max_number_of_periods - enrollment_date)
  purchases: number of purchases
  tau: actual lifetime for this customer drawn from exponential distribution with scale=mean_customer_lifetime
  date_of_last_purchase: date of the customer's last purchase
  '''
  T =  create_start_dates(n=n, max_number_of_periods=max_number_of_periods)
  # simulate every customer with the requested means and simulation horizon
  result = np.round(simulate_purchases_vec(T=T,
                                           mean_customer_lifetime=mean_customer_lifetime,
                                           mean_period_between_purchases=mean_period_between_purchases,
                                           max_number_of_periods=max_number_of_periods), 0)
  return pd.DataFrame(result, index=['enrollment_date', 'T_observed','purchases', 'tau', 'date_of_last_purchase']).T


In [50]:
max_number_of_periods=200
customers = create_customer_df(n=100, mean_customer_lifetime=100, mean_period_between_purchases=10, max_number_of_periods=max_number_of_periods)
customers

|   | enrollment_date | T_observed | purchases | tau | date_of_last_purchase |
|---|---|---|---|---|---|
| 0 | 94.0 | 106.0 | 2.0 | 29.0 | 115.0 |
| 1 | 102.0 | 98.0 | 0.0 | 6.0 | 102.0 |
| 2 | 151.0 | 49.0 | 0.0 | 4.0 | 151.0 |
| 3 | 190.0 | 10.0 | 0.0 | 57.0 | 190.0 |
| 4 | 6.0 | 194.0 | 11.0 | 142.0 | 134.0 |
| 5 | 28.0 | 172.0 | 4.0 | 75.0 | 94.0 |
| 6 | 164.0 | 36.0 | 4.0 | 22.0 | 185.0 |
| 7 | 189.0 | 11.0 | 1.0 | 35.0 | 193.0 |
| 8 | 49.0 | 151.0 | 1.0 | 11.0 | 50.0 |
| 9 | 62.0 | 138.0 | 6.0 | 68.0 | 90.0 |


In [56]:
customers.describe().T

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| enrollment_date | 100.0 | 101.02 | 56.832019 | 3.0 | 55.75 | 99.5 | 151.75 | 196.0 |
| T_observed | 100.0 | 98.98 | 56.832019 | 4.0 | 48.25 | 100.5 | 144.25 | 197.0 |
| purchases | 100.0 | 5.66 | 5.486844 | 0.0 | 1.0 | 4.0 | 9.0 | 22.0 |
| tau | 100.0 | 96.01 | 94.938204 | 0.0 | 25.0 | 63.5 | 148.25 | 405.0 |
| date_of_last_purchase | 100.0 | 149.65 | 49.734929 | 5.0 | 114.0 | 167.5 | 192.0 | 200.0 |


In [55]:
data = customers[customers['purchases'] >= 2.].copy()
data.describe().T

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| enrollment_date | 71.0 | 90.014085 | 53.505539 | 5.0 | 51.5 | 85.0 | 138.5 | 193.0 |
| T_observed | 71.0 | 109.985915 | 53.505539 | 7.0 | 61.5 | 115.0 | 148.5 | 195.0 |
| purchases | 71.0 | 7.788732 | 5.160054 | 2.0 | 3.5 | 6.0 | 11.0 | 22.0 |
| tau | 71.0 | 118.957746 | 92.38343 | 8.0 | 43.5 | 92.0 | 167.0 | 405.0 |
| date_of_last_purchase | 71.0 | 156.859155 | 45.527479 | 12.0 | 128.5 | 175.0 | 193.0 | 200.0 |


In [54]:
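# t is the time of the last purchase measured from enrollment, so that 0 <= t <= T
t = tensor((data['date_of_last_purchase'] - data['enrollment_date']).values)
T = tensor(data['T_observed'].values)
k = tensor(data['purchases'].values)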

## Test Model Definition

In [None]:
def model_test(t, T, k, prior_only=False):
  '''
  input:
  vector t (nx1) = time of the most recent purchase, measured from enrollment
  vector T (nx1) = total observation time
  vector k (nx1) = number of purchases observed (k must be >= 2)
  prior_only: if True, skip the likelihood and sample from the prior only

  tau_alpha, tau_beta are the Gamma hyperparameters for tau (customer lifetime)
  Lambda_alpha, Lambda_beta are the Gamma hyperparameters for Lambda (purchase rate)
  '''
  assert torch.all(k >=2.), "There are illegal values of k. k must be >= 2"

  def loglik(Lambda, mu, t, T, k):
    # elementwise log-likelihood per customer, summed to a scalar
    rate = Lambda + mu
    target = k * torch.log(Lambda) - torch.log(rate)
    target = target + torch.logaddexp(torch.log(Lambda) - rate * T,
                                      torch.log(mu) - rate * t)
    return target.sum()
  
  # Gamma concentration and rate must be positive, so each support starts just above 0
  tau_alpha = pyro.sample('tau_alpha', dist.Uniform(1e-3, 1))
  tau_beta = pyro.sample('tau_beta', dist.Uniform(1e-3, 2))
  Lambda_alpha = pyro.sample('Lambda_alpha', dist.Uniform(1e-3, 3))
  Lambda_beta = pyro.sample('Lambda_beta', dist.Uniform(1e-3, 4))

  # customer-level parameters are always sampled
  with pyro.plate("data", t.size(0)):
    tau  = pyro.sample('tau', dist.Gamma(tau_alpha, tau_beta))
    mu = 1./tau
    Lambda = pyro.sample('Lambda', dist.Gamma(Lambda_alpha, Lambda_beta))

  # the likelihood is only added when sampling from the posterior
  if not prior_only:
    pyro.factor('loglik', loglik(Lambda, mu, t, T, k))
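
The likelihood term can be checked outside Pyro. The sketch below is a NumPy version of the per-customer log-likelihood, `k*log(lam) - log(lam + mu) + log(lam*exp(-(lam + mu)*T) + mu*exp(-(lam + mu)*t))`, with the last term evaluated through `np.logaddexp` for numerical stability. The function name `btyd_loglik` and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def btyd_loglik(lam, mu, t, T, k):
    """Per-customer log-likelihood; all arguments broadcast elementwise."""
    rate = lam + mu
    return (k * np.log(lam) - np.log(rate)
            + np.logaddexp(np.log(lam) - rate * T,   # still alive at the end of the window
                           np.log(mu) - rate * t))   # died some time after the last purchase

# assumed rates; t, T, k describe one customer observed for 106 periods
lam, mu = 0.1, 0.01
t, T, k = 21.0, 106.0, 2.0
stable = btyd_loglik(lam, mu, t, T, k)

# direct evaluation of the same formula, fine for small exponents like these
naive = (k * np.log(lam) - np.log(lam + mu)
         + np.log(lam * np.exp(-(lam + mu) * T) + mu * np.exp(-(lam + mu) * t)))
print(stable, naive)
```

Because every operation is elementwise, passing arrays for `lam`, `mu`, `t`, `T` and `k` returns one value per customer, and summing them gives the total log-likelihood.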

## original model

In [None]:
def model_one(t, T, k, *, etau_alpha, etau_beta, Lambda_alpha, Lambda_beta, prior_only=False):
  '''
  input:
  vector t (nx1) = time of the most recent purchase, measured from enrollment
  vector T (nx1) = total observation time
  vector k (nx1) = number of purchases observed (k must be >= 2)

  etau_alpha, etau_beta, Lambda_alpha, Lambda_beta are scalars (keyword arguments)
  etau_alpha, etau_beta are the InverseGamma prior parameters for etau
  Lambda_alpha, Lambda_beta are the Gamma prior parameters for Lambda
  prior_only: if True, skip the likelihood and sample from the prior only
  '''
  assert torch.all(k >=2.), "There are illegal values of k. k must be >= 2"

  def loglik(Lambda, mu, t, T, k):
    # elementwise log-likelihood per customer, summed to a scalar
    rate = Lambda + mu
    target = k * torch.log(Lambda) - torch.log(rate)
    target = target + torch.logaddexp(torch.log(Lambda) - rate * T,
                                      torch.log(mu) - rate * t)
    return target.sum()
  
  # etau_alpha = pyro.sample('etau_alpha', dist.)
  # etau_beta = pyro.sample('etau_beta', dist)
  # Lambda_alpha = pyro.sample('Lambda_alpha', dist)
  # Lambda_beta = pyro.sample('Lambda_beta', dist)

  with pyro.plate("data", t.size(0)):
    etau  = pyro.sample('etau', dist.InverseGamma(etau_alpha, etau_beta))
    mu = 1./etau
    Lambda = pyro.sample('Lambda', dist.Gamma(Lambda_alpha, Lambda_beta))

  # the likelihood is only added when sampling from the posterior
  if not prior_only:
    pyro.factor('loglik', loglik(Lambda, mu, t, T, k))

## create data

In [None]:
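customers = create_customer_df(n=100, mean_customer_lifetime=100, mean_period_between_purchases=1, max_number_of_periods=200)
data = customers[customers['purchases'] >= 2.].copy() # multiple purchases only
data

In [None]:
# t is the time of the last purchase measured from enrollment, so that 0 <= t <= T
t = tensor((data['date_of_last_purchase'] - data['enrollment_date']).values)
T = tensor(data['T_observed'].values)
k = tensor(data['purchases'].values)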

## Perform MCMC

In [None]:
from pyro.infer import MCMC, NUTS
model = model_test
nuts_kernel = NUTS(model)
mcmc = MCMC(nuts_kernel, num_samples=1000, warmup_steps=250)

mcmc.run(t, T, k, prior_only=True)

In [None]:
hmc_samples = {k: v.detach().cpu().numpy() for k, v in mcmc.get_samples().items()}
hmc_samples.keys()
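
A dictionary of sample arrays like `hmc_samples` can be condensed into a small table before plotting. The sketch below is one possible way to do that. The helper `summarize_samples` is hypothetical, and the arrays passed to it are placeholder random draws standing in for `mcmc.get_samples()`, not output from the model.

```python
import numpy as np
import pandas as pd

def summarize_samples(samples):
    """Mean and 5%/95% quantiles for each scalar site in a dict of sample arrays."""
    rows = {}
    for name, draws in samples.items():
        draws = np.asarray(draws)
        if draws.ndim != 1:
            continue  # skip per-customer sites such as 'tau' and 'Lambda'
        rows[name] = {"mean": draws.mean(),
                      "q05": np.quantile(draws, 0.05),
                      "q95": np.quantile(draws, 0.95)}
    return pd.DataFrame.from_dict(rows, orient="index")

# placeholder draws with the same layout as the sampler output; not real results
rng = np.random.default_rng(0)
placeholder_samples = {"tau_alpha": rng.uniform(0, 1, size=1000),
                       "tau": rng.gamma(2.0, size=(1000, 3))}
summary = summarize_samples(placeholder_samples)
print(summary)
```

Per-customer sites are skipped here because they have one column per customer and are easier to inspect with the density plots.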

In [None]:
for key in hmc_samples.keys():
  sns.kdeplot(data = hmc_samples[key])

## Centered BTYD

In [None]:

# https://www.briancallander.com/posts/customer_lifetime_value/models/rf.stan
# data_hyperpriors <- list(
#   log_life_mean_mu = log(31),
#   log_life_mean_sigma = 0.7,
#   log_life_scale_sigma = 0.8,

#   log_lambda_mean_mu = log(1 / 14),
#   log_lambda_mean_sigma = 0.3,
#   log_lambda_scale_sigma = 0.5
# )
data {
  int<lower = 1> n;       // number of customers
  vector<lower = 0>[n] t; // time to most recent purchase
  vector<lower = 0>[n] T; // total observation time
  vector<lower = 0>[n] k; // number of purchases observed

  // user-specified parameters
  real<lower = 0> etau_mean_alpha;
  real<lower = 0> etau_mean_beta;
  real<lower = 0> etau_sd_alpha;
  real<lower = 0> etau_sd_beta;

  real<lower = 0> lambda_mean_alpha;
  real<lower = 0> lambda_mean_beta;
  real<lower = 0> lambda_sd_alpha;
  real<lower = 0> lambda_sd_beta;
}

parameters {
  vector<lower = 0>[n] lambda; // purchase rate
  vector<lower = 0>[n] etau;   // expected mean lifetime

  vector<lower = 0>[n] etau_mean; // mean expected life span
  vector<lower = 0>[n] etau_sd;
  vector<lower = 0>[n] lambda_mean; // mean purchase rate
  vector<lower = 0>[n] lambda_sd;

}

transformed parameters {
  vector<lower = 0>[n] etau_beta = etau_mean;
  vector<lower = 0>[n] etau_alpha = etau_sd;
  vector<lower = 0>[n] lambda_beta = lambda_mean ./ (lambda_sd .* lambda_sd);
  vector<lower = 0>[n] lambda_alpha = lambda_beta .* lambda_mean;

  vector<lower = 0>[n] mu = 1.0 ./ etau;
}

model {
  // hyperpriors
  etau_mean ~ gamma(etau_mean_alpha, etau_mean_beta);
  etau_sd ~ gamma(etau_sd_alpha, etau_sd_beta);

  lambda_mean ~ gamma(lambda_mean_alpha, lambda_mean_beta);
  lambda_sd ~ gamma(lambda_sd_alpha, lambda_sd_beta);

  // priors
  etau ~ inv_gamma(etau_alpha, etau_beta);
  lambda ~ gamma(lambda_alpha, lambda_beta);

  // likelihood
  target += k .* log(lambda) - log(lambda + mu);
  for (i in 1:n) {
    target += log_sum_exp(
      log(lambda[i]) - (lambda[i] + mu[i]) .* T[i],
      log(mu[i]) - (lambda[i] + mu[i]) .* t[i]
    );
  }
}
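
The `lambda_beta` and `lambda_alpha` lines in the transformed parameters block convert a mean and a standard deviation into the rate and shape of a Gamma distribution. A small Python sketch of that conversion follows. The helper name `gamma_shape_rate` and the input values are assumptions for illustration.

```python
def gamma_shape_rate(mean, sd):
    """Shape (alpha) and rate (beta) of a Gamma with the given mean and sd."""
    beta = mean / sd**2   # rate, as in lambda_beta = lambda_mean ./ (lambda_sd .* lambda_sd)
    alpha = beta * mean   # shape, as in lambda_alpha = lambda_beta .* lambda_mean
    return alpha, beta

alpha, beta = gamma_shape_rate(mean=0.5, sd=0.25)  # assumed example values
# a Gamma(alpha, beta) has mean alpha / beta and variance alpha / beta**2
recovered_mean = alpha / beta
recovered_sd = (alpha / beta**2) ** 0.5
print(alpha, beta, recovered_mean, recovered_sd)
```

Recovering the original mean and standard deviation from the shape and rate confirms that the two parameterizations describe the same distribution.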


## Non-centered BTYD

In [None]:
# non-centered BTYD
# https://www.briancallander.com/posts/customer_lifetime_value/recency_frequency.html
# https://www.briancallander.com/posts/customer_lifetime_value/models/rf_noncentred.stan

data {
  int<lower = 1> n;       // number of customers
  vector<lower = 0>[n] t; // time between first and last purchase
  vector<lower = 0>[n] T; // total observation time
  vector<lower = 0>[n] k; // number of purchases

  // hyperparameters for the expected lifetime
  real log_life_mean_mu;
  real<lower = 0> log_life_mean_sigma;
  // hyperparameter for scale of customer-level lifetime effects
  real<lower = 0> log_life_scale_sigma;

  // hyperparameters for the expected purchase rate
  real log_lambda_mean_mu;
  real<lower = 0> log_lambda_mean_sigma;
  // hyperparameter for scale of customer-level purchase-rate effects
  real<lower = 0> log_lambda_scale_sigma;

  // flag whether to only sample from the prior
  // to draw from the prior-predictive distribution: prior_only = 1
  // to draw from the posterior distribution: prior_only = 0
  int<lower = 0, upper = 1> prior_only;
}

transformed data {
  vector<lower = 0, upper = 0>[2] zero = rep_vector(0, 2);
  vector[2] J = [-1, 1]';
  vector[2] m = [log_life_mean_mu, log_lambda_mean_mu]';
  matrix<lower = 0>[2, 2] m_sigma = diag_matrix([log_life_mean_sigma, log_lambda_mean_sigma]');
  matrix<lower = 0>[2, 2] s_sigma = diag_matrix([log_life_scale_sigma, log_lambda_scale_sigma]');
}

parameters {
  row_vector[2] log_centres;
  vector<lower = 0>[2] scales;
  matrix[n, 2] customer; // customer-level effects
}

transformed parameters {
  matrix<lower = 0>[n, 2] theta = exp(
    diag_post_multiply(
      rep_matrix(log_centres, n) + diag_post_multiply(customer, scales),
      J
    )
  ); // (mu, lambda)
}

model {
  // priors
  log_centres ~ multi_normal_cholesky(m, m_sigma);
  scales ~ multi_normal_cholesky(zero, s_sigma);

  for (i in 1:n) {

    customer[i, ] ~ std_normal();

    // likelihood
    if (prior_only == 0) {
      target += log_sum_exp(
        log(theta[i, 2]) - (theta[i, 2] + theta[i, 1]) .* T[i],
        log(theta[i, 1]) - (theta[i, 2] + theta[i, 1]) .* t[i]
      );
    }
  }

  if (prior_only == 0) {
    target += k .* log(theta[, 2]) - log(theta[, 2] + theta[, 1]);
  }

}
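
The `theta` transformation in the non-centered model can be reproduced in NumPy to see what it does: standard-normal customer effects are scaled, shifted by the log-centres, sign-flipped for the lifetime column, and exponentiated. This is a sketch under assumptions. The helper name `noncentred_theta` is hypothetical, the centres reuse the `log(31)` and `log(1 / 14)` values quoted in the hyperprior comment earlier in the notebook, and the scales are drawn as half-normals with the quoted sigmas of 0.8 and 0.5.

```python
import numpy as np

def noncentred_theta(log_centres, scales, customer):
    """Map standard-normal customer effects to positive (mu, lambda) pairs."""
    # J flips the sign of the log-lifetime, so column 0 becomes mu = 1 / lifetime
    J = np.array([-1.0, 1.0])
    return np.exp((log_centres + customer * scales) * J)

n = 4  # illustrative number of customers
log_centres = np.array([np.log(31), np.log(1 / 14)])
rng = np.random.default_rng(0)
scales = np.abs(rng.normal(0.0, [0.8, 0.5]))  # half-normal draws for the two scales
customer = rng.standard_normal((n, 2))        # customer-level effects ~ N(0, 1)

theta = noncentred_theta(log_centres, scales, customer)
print(theta)  # column 0: mu (dropout rate), column 1: lambda (purchase rate)
```

With all customer effects set to zero, every row collapses to the centres, `(1 / 31, 1 / 14)`, which makes the role of `log_centres` as the population-level location easy to see.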
