Adapt Evaluation Plots from `lifetimes` #326

ColtAllen · 2023-07-13T19:24:47Z

zwelitunyiswa · 2023-07-13T19:45:32Z

I have also found tracking plots incredibly useful for explaining to stakeholders. CLVTools has some very nice implementations: https://www.clvtools.com/articles/CLVTools.html (scroll down)

…

On Thu, Jul 13, 2023 at 3:24 PM Colt Allen ***@***.***> wrote: lifetimes contains a variety of plotting functions for model evaluation: - plot_period_transactions - plot_calibration_purchases_vs_holdout_purchases - plot_frequency_recency_matrix - plot_probability_alive_matrix - plot_expected_repeat_purchases - plot_history_alive - plot_cumulative_transactions - plot_incremental_transactions - plot_transaction_rate_heterogeneity - plot_dropout_rate_heterogeneity This notebook <https://github.com/ColtAllen/marketing-case-study/blob/main/case-study.ipynb> provides some examples of their use. @larryshamalama <https://github.com/larryshamalama> already added the matrix plots, but the others would need to be modified for posterior confidence intervals. Some of them also require utility functions that I'll create PRs for soon. I don't consider plot_history_alive to be all that useful, but if there's interest it's something to consider. It's also worth reviewing the research papers <#91> for additional ideas of plots to add. — Reply to this email directly, view it on GitHub <#326>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AH3QQV7R2Z43XJLC3ZMCPZDXQBDQVANCNFSM6AAAAAA2JMSLJA> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

ColtAllen · 2023-07-16T21:45:16Z

I have also found tracking plots incredibly useful for explaining to
stakeholders. CLVTools has some very nice implementations:

I think plot_incremental_transactions is the equivalent in lifetimes. Here's an example:

ColtAllen · 2023-07-16T21:58:15Z

plot_period_transactions is essentially plotting a posterior predictive check for customer purchase frequency:

Unless I'm mistaken, this means we can't use a Potential to define the likelihood. GammaGammaModel is fine because this plotting method doesn't apply, but BetaGeoModel would need its own distribution block per #128.

zwelitunyiswa · 2023-07-16T22:05:06Z

I have also found tracking plots incredibly useful for explaining to
stakeholders. CLVTools has some very nice implementations:

I think plot_incremental_transactions is the equivalent in lifetimes. Here's an example:

@ColtAllen I did not realize that these were added/in there. That is very cool. Thanks for pointing them out.

ColtAllen · 2023-07-18T17:53:35Z

Some WIP ideas for adapting plot_period_transactions in this issue: #278

wd60622 · 2023-08-05T14:44:53Z

I have a PR to add ax argument to the current CLV plotting functions. Might be a good standard here in order to give additional flexibility to the user

ColtAllen · 2023-08-05T14:58:40Z

I also want to add a parameter to return the formatted data instead of the plot, in case anyone wants to build the plots in something other than matplotlib (this has come up in my day job).

In the case of the plot_cumulative_transactions and plot_period_transactions functions, returning the data also enables use of the Wasserstein Distance function from scipy for concept drift monitoring, which will be helpful if the model is registered to mlflow.

wd60622 · 2023-08-05T15:27:08Z

Love that idea. Separate the data transformation from plotting. Total sense to me

ColtAllen · 2023-08-05T16:00:26Z

The ArviZ library also has a lot of useful plotting options to compliment whatever we add in pymc-marketing, and examples would be a great addition to the tutorial notebooks. I've played around with a few of them in the dev notebook for ParetoNBDModel:

https://github.com/pymc-labs/pymc-marketing/blob/main/docs/source/notebooks/clv/dev/pareto_nbd.ipynb

jcfisher · 2023-08-31T19:01:12Z

Hi all, I was trying to create the histogram comparing observed and expected purchase counts for the BG/NBD model for myself when I came across this issue. This would be the histogram that a plot_period_transactions function would create for BetaGeoModel. I put together a function that computes the values based on the Fader, Hardie, and Lee (2007) note, and thought I would share it here in case it's useful.

Based on that note, they calculated that histogram by estimating $Pr(X(t) = x | r, \alpha, a, b)$ for $x = 1, 2, \ldots, 7+$ using each value of $t$ in the original data. Then they summed over the values of $t$ to get the expected number of repeat purchases (i.e., the expected number of people with $x = 1, 2, \ldots, 7+$) for the original cohort ($E(f_x)$ in their notation).

They calculated this in Excel. I essentially transcribed their Excel functions into Python, making the following function:

import numpy as np
from scipy.special import betaln, gammaln

def calculate_pmf(
    t_val: np.array, 
    max_x: int,
    # defaults are values from lifetimes file
    r: float = 0.242594150135163,
    alpha: float = 4.41359212062015,
    a: float = 0.792919156875463,
    b: float = 2.4258950494563
) -> np.array:

    # First calculate values that don't vary with x or t
    log_beta_ab = betaln(a, b)
    log_gamma_r = gammaln(r)

    # Create an output matrix
    out = np.empty((len(t_val), max_x + 1))

    # Next loop through all the values of t and calculate the pmf for each one
    for idx, t in enumerate(t_val):
        log_alpha_r = r * (np.log(alpha) - np.log(alpha + t))

        # Create a range of values for x.  This will allow us to use
        # vectorized functions to loop
        x = np.arange(max_x)

        # For each value of the output range, calculate the part that depends on x
        log_term1 = betaln(a, b + x) - log_beta_ab \
            + gammaln(r + x) - log_gamma_r - gammaln(x + 1) \
            + log_alpha_r \
            + (x * (np.log(t) - np.log(alpha + t)))

        log_term2 = np.where(
            x > 0,
            betaln(a + 1, b + x - 1) - log_beta_ab,
            0
        )

        # To calculate the sum, we're going to calculate each term and apply cumsum
        # Two notes:
        # 1. You need to exponentiate before summing
        # 2. For a given value of x, we need the (x - 1)th term of the sum
        #    We'll take care of that on the fly when creating the output

        individual_sum_terms = np.cumsum(np.exp(
                gammaln(r + x) - gammaln(x + 1) \
                + (x * (np.log(t) - np.log(alpha + t)))
                ))

        # Log and add the terms that you factored out back in
        log_term3 = log_alpha_r - log_gamma_r + np.log(individual_sum_terms)

        # Get the rest of term 3
        term3 = 1 - np.exp(log_term3)

        # Now create the output
        out1 = np.exp(log_term1) + np.where(
            x > 0, 
            np.exp(log_term2) * term3[x - 1],
            0
            ) 
        
        # Append the remainder as the final term and add it to the output matrix
        out[idx, :] = np.append(out1, 1 - np.sum(out1))

    return out

This version takes just one value for each of the estimated parameters. So, either you could take the MAP values and plug them in, or calculate it once for each draw from the posterior, which would give you a posterior distribution. To do the latter, this function would need to be modified to allow the r, alpha, a, and b, arguments to be vectors.

Hopefully this is helpful. If you want me to create pull request to add this in somewhere, let me know, and I'll try to figure out how to do that.

ricardoV94 · 2023-09-01T07:46:24Z

This version takes just one value for each of the estimated parameters. So, either you could take the MAP values and plug them in, or calculate it once for each draw from the posterior, which would give you a posterior distribution. To do the latter, this function would need to be modified to allow the r, alpha, a, and b, arguments to be vectors.

If you use xarray you shouldn't need much looping or worry about chains/draws. It will take of vectorizing all operations automatically. Something like that is done here:

pymc-marketing/pymc_marketing/clv/models/beta_geo.py

Line 223 in 4c9f4ca

def expected_num_purchases(

By the way how does your function differ from expected_num_purchases or expected_num_purchases_new_customer? And can't you use the logp function defined inside build_model (we could extract it out)?

Hopefully this is helpful. If you want me to create pull request to add this in somewhere, let me know, and I'll try to figure out how to do that.

That would be much appreciated! Let's just first agree on the details before you jump in

CC @ColtAllen and @larryshamalama

jcfisher · 2023-09-01T19:24:24Z

Thank you for taking a look! I'll see if I can rewrite this to use xarray. (I am new to pymc and xarray, so that is a helpful pointer.)

Here is my understanding of how this function differs from expected_num_purchases and expected_num_purchases_new_customer. In short, those functions return the expected number of purchases for a customer, and this function returns the probability that a given customer has 1, 2, 3, ... purchases.

For a given value of $t$, this function returns $Pr(X(t) = x | r, \alpha, a, b)$ for $x = 1, 2, \ldots,$ max_x. That is, it returns a vector of probabilities for each value of $t$ and $x$. This is equation (8) from Fader et al. (2005).
expected_num_purchases_new_customer returns the expected number of purchases, which is $E(X(t) | r, \alpha, a, b) = \sum_{x} x * Pr(X(t) = x | r, \alpha, a, b)$. That is, this function averages over the values of $x$ for each value of $t$ to get the expected purchase count for each time interval $t$. This is equation (9) from Fader el. al (2005).
expected num_purchases returns the expected number of additional purchases, conditional on the data (a given number of previous purchases) and parameters. So expected_num_purchases_new_customer covers the interval $(0, T]$, and expected_num_purchases covers the interval $(T, T + t]$. This is equation (10) from Fader et al. (2005)

The histogram from the paper averages equation (8) over all values of $t$ observed in the original data, to get expected counts of customers who have made 1, 2, 3, ... purchases. Letting $i$ index customers and $t_i$ represent a customer's age, the histogram calculates $E(f_x) = \sum_i Pr(X(t_i) = x | r, \alpha, a, b)$, which is a function of $x$. (In the note, they calculate $Pr((X(t) = x)$ for the unique values of $t$ and then multiply by the number of customers who have that value of $t$.)

Regarding using logp, good question. I think logp is equation (6) in the Fader et al. (2005) paper. I think the difference between equation (6) and equation (8) is that equation (6) is the joint probability of purchase count $x$, recency $t_x$, and customer age $T$ given the parameters, $Pr(X = x, t_x, T | r, \alpha, a, b) = \mathcal{L}(r, \alpha, a, b | X = x, t_x, T)$, while equation (8) is just the probability of a purchase count $x$ given the parameters as a function of the length of the time interval, $Pr(X(t) = x | r, \alpha, a, b)$. (That is, conditional on the parameters, equation (8) doesn't use recency or frequency.) I could definitely be wrong about that though.

ColtAllen · 2023-09-02T02:35:21Z

Hey @jcfisher,

There is already a PR open for the method you are proposing:

#216

It was abandoned by its original author quite some time ago though, so it may be prudent to open another one.

As to using this method to generate plot_period_transactions, I am curious how it would compare to a backend posterior predictive check, but the latter approach is my preference as it can be used across all transaction models.

jcfisher · 2023-09-05T17:42:57Z

@ColtAllen Thanks! Happy to modify that one or create a new one, whichever is easier. I revised the function above to use xarray, so now it should work with a vector of draws for each parameter.

I tried to find a sample_posterior_predictive function for the BetaGeoModel for comparison, but I'm having trouble getting the one from pymc to work. Is there are standard way of getting draws from the posterior predictive distribution? (Apologies if the answer to this question is obvious.) Without seeing it, I'd assume they would be similar, and agree that if they are, just using a posterior predictive check would be preferable.

wd60622 · 2023-09-07T15:40:20Z

I tried to find a sample_posterior_predictive function for the BetaGeoModel for comparison, but I'm having trouble getting the one from pymc to work. Is there are standard way of getting draws from the posterior predictive distribution? (Apologies if the answer to this question is obvious.) Without seeing it, I'd assume they would be similar, and agree that if they are, just using a posterior predictive check would be preferable.

for the BetaGeoModel, the parameters can be sampled like from within this method. You should just be able to pass self.fit_results or model.fit_results into pm.sample_posterior_predictive. If you need access to the pm.Model instance, it'd be in the model attribute of the BetaGeoModel instance

jcfisher · 2023-09-07T20:52:11Z

Got it, thanks. I was having trouble because I wasn't using a with model: block (same as this issue).

Just to double check that I'm understanding this correctly, to generate draws from the posterior predictive distribution, we'd need a random number generator function for the custom likelihood, right? I'm thinking something like this:

# Random number generator function for BG-NBD model
# Returns the original data sampled from the data generating process, given the parameters
# Original data are:
# 1. Frequency (number of repeat purchases, i.e., number of purchases - 1)
# 2. Recency (time of last purchase relative to now)
# 3. Age (time of first purchase relative to now)
# As inputs, we use:
# * T: "now" time
# * r, alpha, a, b: draw from the posterior parameter values

def beta_geo_rng(T, r, alpha, a, b):

    # Draw from the gamma and beta distributions to get a value of lambda and p
    base_rng = np.random.default_rng()
    lam_draw = base_rng.gamma(shape = r, scale = 1 / alpha)
    p_draw = base_rng.beta(a, b)

    # Time of first purchase
    t = first_purchase_time = np.random.exponential(scale = lam_draw)
    x = 0  # purchases are 0 indexed

    while t < T:
        # Drop out with probability p
        if base_rng.binomial(n = 1, p = p_draw) == 1:
            return (x, T - t, T - first_purchase_time)
        else:
            # If not dropping out, add a new purchase after exponential waiting time
            t += np.random.exponential(scale = lam_draw)
            x += 1
    
    # Return tuple of frequency, recency, age
    return (x, T - t, T - first_purchase_time)

If this doesn't look right, please let me know. I'll work on putting together a comparison of this with the $Pr(X(t) = x | \ldots)$ function above.

Also, hope I'm not hijacking the conversation on this issue. If I should take this discussion somewhere else (or just post a gist with these functions and let you all take it from there), let me know.

ricardoV94 · 2023-10-20T13:07:34Z

@jcfisher I think we already have something like that in here:

pymc-marketing/pymc_marketing/clv/distributions.py

Lines 151 to 186 in 32098cc

    
           def rng_fn(cls, rng, lam, p, T, size): 
        
               size = pm.distributions.shape_utils.to_tuple(size) 
        
               # To do: broadcast sizes 
        
               lam = np.asarray(lam) 
        
               p = np.asarray(p) 
        
               T = np.asarray(T) 
        
               if size == (): 
        
                   size = np.broadcast_shapes(lam.shape, p.shape, T.shape) 
        
               lam = np.broadcast_to(lam, size) 
        
               p = np.broadcast_to(p, size) 
        
               T = np.broadcast_to(T, size) 
        
               x_1 = rng.poisson(lam * T) 
        
               x_2 = rng.geometric(p) 
        
               x = np.minimum(x_1, x_2) 
        
               nzp = x == 0  # nzp = non-zero purchases 
        
               if x.shape == (): 
        
                   if nzp: 
        
                       return np.array([0, 0, float(x_1 > x_2)]) 
        
                   else: 
        
                       return np.array( 
        
                           [rng.beta(x, np.maximum(x_1 + 1 - x_2, 1)) * T, x, float(x_1 > x_2)] 
        
                       ) 
        
               x[nzp] = 1.0  # temporary to avoid errors in rng.beta below 
        
               t_x = rng.beta(x, np.maximum(x_1 + 1 - x_2, 1)) * T 
        
               x[nzp] = 0.0 
        
               t_x[nzp] = 0.0 
        
               return np.stack([t_x, x, (x_1 > x_2).astype(float)], axis=-1)

That was rewritten recently, it used to look more like your implementation:

pymc-marketing/pymc_marketing/clv/distributions.py

Lines 158 to 201 in 05f97cf

    
           def rng_fn(cls, rng, lam, p, T, size): 
        
               size = pm.distributions.shape_utils.to_tuple(size) 
        
               # To do: broadcast sizes 
        
               lam = np.asarray(lam) 
        
               p = np.asarray(p) 
        
               T = np.asarray(T) 
        
               if size == (): 
        
                   size = np.broadcast_shapes(lam.shape, p.shape, T.shape) 
        
               lam = np.broadcast_to(lam, size) 
        
               p = np.broadcast_to(p, size) 
        
               T = np.broadcast_to(T, size) 
        
               output = np.zeros(shape=size + (3,)) 
        
               def sim_data(lam, p, T): 
        
                   t = 0 
        
                   n = 0 
        
                   dropout = 0 
        
                   while not dropout: 
        
                       wait = rng.exponential(scale=1 / lam) 
        
                       # If we didn't go into the future 
        
                       if (t + wait) < T: 
        
                           n += 1 
        
                           t = t + wait 
        
                           dropout = rng.binomial(n=1, p=p) 
        
                       else: 
        
                           break 
        
                   return np.array( 
        
                       [ 
        
                           t, 
        
                           n, 
        
                           dropout, 
        
                       ], 
        
                   ) 
        
               for index in np.ndindex(*size): 
        
                   output[index] = sim_data(lam[index], p[index], T[index]) 
        
               return output

ColtAllen · 2024-03-20T14:09:32Z

@wd60622 this is the utility function which will need to be adapted from lifetimes to plot cumulative transactions over time:

https://github.com/CamDavidsonPilon/lifetimes/blob/master/lifetimes/utils.py#L506

billlyzhaoyh · 2024-07-11T15:03:43Z

Have you encountered memory issue when running plot_frequency_recency_matrix? I am following the https://www.pymc-marketing.io/en/stable/notebooks/clv/clv_quickstart.html and then ran it on my own dataset of 10000 examples and noticed peak memory usage going up to 50-70Gb which then the jupyter kernel eventually gets killed. Haven't investigated in depth but perhaps I can try recreating it with a simple dataset.

ColtAllen · 2024-07-11T21:10:51Z

Hey @billlyzhaoyh,

When this happens, use model.thin_fit_result. We've been meaning to add an example for this to the Quickstart per #448.

Also, this shouldn't ever happen with a MAP fit model unless your data has tens of millions of customers or more.

billlyzhaoyh · 2024-07-12T12:13:35Z

@ColtAllen Thanks for getting back to me about this and I will certainly try out the thin_fit_result method for all plotting purposes.

The original memory issue was encountered with tens of millions of customers I put together but then I realised that for the sub-sample of 10000 users, I didn't reset their customer_id to index+1, which solved the problem. I think it might be worth calling out from the documentation on resetting the customer id to index+1 and keeping the mapping from customer_id in your custom database to the ones used for modelling purposes. I naively used the same 7-digit numbers in string format for customer_id I have in my DB which threw a bunch of errors later on as well

ColtAllen added enhancement New feature or request CLV labels Jul 13, 2023

ColtAllen self-assigned this Jul 13, 2023

ColtAllen mentioned this issue Jul 13, 2023

ParetoNBDModel population distribution sampling and save/load test #327

Merged

ColtAllen added the priority: medium label Jul 18, 2023

ColtAllen mentioned this issue Jul 18, 2023

Implement validation PPC plots for CLV models #278

Open

wd60622 mentioned this issue Aug 5, 2023

Plots for CLV model datasets #343

Closed

wd60622 mentioned this issue May 19, 2024

WIP: Add expected cumulative transaction func #692

Closed

13 tasks

ColtAllen mentioned this issue Oct 1, 2024

Add CLV Cumulative Transaction Utlity #1076

Merged

12 tasks

ColtAllen mentioned this issue Oct 24, 2024

Add plot_expected_purchases function #1115

Merged

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adapt Evaluation Plots from `lifetimes` #326

Adapt Evaluation Plots from `lifetimes` #326

ColtAllen commented Jul 13, 2023

zwelitunyiswa commented Jul 13, 2023 via email

ColtAllen commented Jul 16, 2023

ColtAllen commented Jul 16, 2023

zwelitunyiswa commented Jul 16, 2023 •

edited

Loading

ColtAllen commented Jul 18, 2023

wd60622 commented Aug 5, 2023

ColtAllen commented Aug 5, 2023

wd60622 commented Aug 5, 2023

ColtAllen commented Aug 5, 2023

jcfisher commented Aug 31, 2023 •

edited

Loading

ricardoV94 commented Sep 1, 2023 •

edited

Loading

jcfisher commented Sep 1, 2023 •

edited

Loading

ColtAllen commented Sep 2, 2023

jcfisher commented Sep 5, 2023

wd60622 commented Sep 7, 2023

jcfisher commented Sep 7, 2023

ricardoV94 commented Oct 20, 2023 •

edited

Loading

ColtAllen commented Mar 20, 2024

billlyzhaoyh commented Jul 11, 2024

ColtAllen commented Jul 11, 2024

billlyzhaoyh commented Jul 12, 2024 •

edited

Loading

Adapt Evaluation Plots from lifetimes #326

Adapt Evaluation Plots from lifetimes #326

Comments

ColtAllen commented Jul 13, 2023

zwelitunyiswa commented Jul 13, 2023 via email

ColtAllen commented Jul 16, 2023

ColtAllen commented Jul 16, 2023

zwelitunyiswa commented Jul 16, 2023 • edited Loading

ColtAllen commented Jul 18, 2023

wd60622 commented Aug 5, 2023

ColtAllen commented Aug 5, 2023

wd60622 commented Aug 5, 2023

ColtAllen commented Aug 5, 2023

jcfisher commented Aug 31, 2023 • edited Loading

ricardoV94 commented Sep 1, 2023 • edited Loading

jcfisher commented Sep 1, 2023 • edited Loading

ColtAllen commented Sep 2, 2023

jcfisher commented Sep 5, 2023

wd60622 commented Sep 7, 2023

jcfisher commented Sep 7, 2023

ricardoV94 commented Oct 20, 2023 • edited Loading

ColtAllen commented Mar 20, 2024

billlyzhaoyh commented Jul 11, 2024

ColtAllen commented Jul 11, 2024

billlyzhaoyh commented Jul 12, 2024 • edited Loading

Adapt Evaluation Plots from `lifetimes` #326

Adapt Evaluation Plots from `lifetimes` #326

zwelitunyiswa commented Jul 16, 2023 •

edited

Loading

jcfisher commented Aug 31, 2023 •

edited

Loading

ricardoV94 commented Sep 1, 2023 •

edited

Loading

jcfisher commented Sep 1, 2023 •

edited

Loading

ricardoV94 commented Oct 20, 2023 •

edited

Loading

billlyzhaoyh commented Jul 12, 2024 •

edited

Loading