Adapt Evaluation Plots from lifetimes
#326
I have also found tracking plots incredibly useful for explaining to
stakeholders. CLVTools has some very nice implementations:
https://www.clvtools.com/articles/CLVTools.html (scroll down)
…On Thu, Jul 13, 2023 at 3:24 PM Colt Allen ***@***.***> wrote:
lifetimes contains a variety of plotting functions for model evaluation:
- plot_period_transactions
- plot_calibration_purchases_vs_holdout_purchases
- plot_frequency_recency_matrix
- plot_probability_alive_matrix
- plot_expected_repeat_purchases
- plot_history_alive
- plot_cumulative_transactions
- plot_incremental_transactions
- plot_transaction_rate_heterogeneity
- plot_dropout_rate_heterogeneity
This notebook
<https://github.com/ColtAllen/marketing-case-study/blob/main/case-study.ipynb>
provides some examples of their use.
@larryshamalama <https://github.com/larryshamalama> already added the
matrix plots, but the others would need to be modified for posterior
confidence intervals. Some of them also require utility functions that I'll
create PRs for soon. I don't consider plot_history_alive to be all that
useful, but if there's interest it's something to consider.
It's also worth reviewing the research papers
<#91> for additional
ideas of plots to add.
|
Unless I'm mistaken, this means we can't use a |
@ColtAllen I did not realize that these were added/in there. That is very cool. Thanks for pointing them out. |
Some WIP ideas for adapting |
I have a PR to add an ax argument to the current CLV plotting functions. It might be a good standard here, since it gives the user additional flexibility. |
I also want to add a parameter to return the formatted data instead of the plot, in case anyone wants to build the plots in something other than In the case of the |
Love that idea. Separate the data transformation from plotting. Total sense to me |
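A minimal sketch of what separating the data transformation from the plotting could look like. The function names, the `return_data` flag, and the pooled top bin are all hypothetical and are not the signatures used in lifetimes or pymc-marketing; matplotlib is assumed only for the plotting path.

```python
from collections import Counter


def period_transaction_counts(frequencies, max_frequency=7):
    """Count customers per repeat-purchase frequency, pooling the top bin.

    Hypothetical helper: returns a list of (label, count) rows.
    """
    counts = Counter(min(f, max_frequency) for f in frequencies)
    rows = []
    for k in range(max_frequency + 1):
        label = f"{k}+" if k == max_frequency else str(k)
        rows.append((label, counts.get(k, 0)))
    return rows


def plot_period_transactions_sketch(
    frequencies, max_frequency=7, return_data=False, ax=None
):
    """Plot the counts, or hand back the formatted data (hypothetical API)."""
    rows = period_transaction_counts(frequencies, max_frequency)
    if return_data:
        # Caller builds the chart in whatever library they prefer.
        return rows

    # Imported lazily so the data path has no plotting dependency.
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots()
    labels, values = zip(*rows)
    ax.bar(labels, values)
    ax.set_xlabel("Number of repeat purchases")
    ax.set_ylabel("Customers")
    return ax


rows = plot_period_transactions_sketch([0, 0, 1, 3, 9], max_frequency=3, return_data=True)
```

Keeping the transformation in its own function also makes it testable without rendering anything.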
The ArviZ library also has a lot of useful plotting options to complement whatever we add in https://github.com/pymc-labs/pymc-marketing/blob/main/docs/source/notebooks/clv/dev/pareto_nbd.ipynb |
Hi all, I was trying to create the histogram comparing observed and expected purchase counts for the BG/NBD model for myself when I came across this issue. This would be the histogram that a Based on that note, they calculated that histogram by estimating They calculated this in Excel. I essentially transcribed their Excel functions into Python, making the following function:
This version takes just one value for each of the estimated parameters. So you could either plug in the MAP values, or calculate it once for each draw from the posterior, which would give you a posterior distribution. To do the latter, this function would need to be modified to allow the Hopefully this is helpful. If you want me to create a pull request to add this in somewhere, let me know, and I'll try to figure out how to do that. |
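A minimal sketch of the point-estimate versus per-draw idea. This is not the commenter's original function: it assumes the quantity of interest is the BG/NBD probability P(X(t) = x | r, alpha, a, b) as given in Fader, Hardie and Lee (2005). The parameter values and the "draws" below are made up purely for illustration.

```python
import math


def log_beta(x, y):
    """log of the Beta function, via log-gamma for numerical stability."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def nbd_term(j, r, alpha, t):
    """Gamma(r+j) / (Gamma(r) j!) * (alpha/(alpha+t))^r * (t/(alpha+t))^j."""
    log_val = (
        math.lgamma(r + j) - math.lgamma(r) - math.lgamma(j + 1)
        + r * math.log(alpha / (alpha + t))
        + j * math.log(t / (alpha + t))
    )
    return math.exp(log_val)


def bgnbd_pmf(x, t, r, alpha, a, b):
    """P(X(t) = x) under BG/NBD for one set of parameter values (t > 0)."""
    # Customer is still active after x purchases and makes exactly x by time t.
    still_active = math.exp(log_beta(a, b + x) - log_beta(a, b)) * nbd_term(x, r, alpha, t)
    if x == 0:
        return still_active
    # Customer drops out right after the x-th purchase, which occurs before t.
    dropped = math.exp(log_beta(a + 1, b + x - 1) - log_beta(a, b))
    at_least_x = max(0.0, 1.0 - sum(nbd_term(j, r, alpha, t) for j in range(x)))
    return still_active + dropped * at_least_x


t = 39.0  # illustrative observation length

# Option 1: plug in a single point estimate (e.g. MAP). Values are made up.
point = {"r": 0.25, "alpha": 4.0, "a": 0.8, "b": 2.5}
p_point = [bgnbd_pmf(x, t, **point) for x in range(7)]

# Option 2: evaluate once per posterior draw. These "draws" are made up too.
draws = [
    {"r": 0.24, "alpha": 4.2, "a": 0.75, "b": 2.4},
    {"r": 0.26, "alpha": 3.9, "a": 0.85, "b": 2.6},
    {"r": 0.25, "alpha": 4.1, "a": 0.80, "b": 2.5},
]
p_draws = [[bgnbd_pmf(x, t, **d) for x in range(7)] for d in draws]
# Each column of p_draws is now a (tiny) posterior sample for that bin.
```

With real posterior samples, the per-bin spread across draws is what would back an interval on each histogram bar.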
If you use xarray you shouldn't need much looping or worry about chains/draws. It will take care of vectorizing all operations automatically. Something like that is done here:
By the way how does your function differ from
That would be much appreciated! Let's just first agree on the details before you jump in. CC @ColtAllen and @larryshamalama |
Thank you for taking a look! I'll see if I can rewrite this to use xarray. (I am new to pymc and xarray, so that is a helpful pointer.) Here is my understanding of how this function differs from
The histogram from the paper averages equation (8) over all values of Regarding using |
Hey @jcfisher, There is already a PR open for the method you are proposing: It was abandoned by its original author quite some time ago though, so it may be prudent to open another one. As to using this method to generate |
@ColtAllen Thanks! Happy to modify that one or create a new one, whichever is easier. I revised the function above to use xarray, so now it should work with a vector of draws for each parameter. I tried to find a |
for the |
Got it, thanks. I was having trouble because I wasn't using a Just to double check that I'm understanding this correctly, to generate draws from the posterior predictive distribution, we'd need a random number generator function for the custom likelihood, right? I'm thinking something like this:
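A minimal sketch of such a random number generator, assuming the custom likelihood in question is BG/NBD and using its standard data-generating story. It is plain Python rather than a PyMC `rng_fn`, and the function name and parameter values are hypothetical.

```python
import random


def simulate_bgnbd_customer(r, alpha, a, b, T, rng):
    """Simulate one customer's (frequency, recency) over an observation window T.

    Assumed BG/NBD process: purchase rate lam ~ Gamma(shape=r, rate=alpha),
    dropout probability p ~ Beta(a, b), exponential inter-purchase times,
    and a dropout coin flip immediately after every repeat purchase.
    """
    lam = rng.gammavariate(r, 1.0 / alpha)  # gammavariate takes (shape, scale)
    p = rng.betavariate(a, b)
    if lam <= 0.0:  # guard against numerical underflow of the rate
        return 0, 0.0

    t = 0.0          # current time since the customer's first purchase
    frequency = 0    # number of repeat purchases observed
    recency = 0.0    # time of the last repeat purchase (0 if none)
    while True:
        t += rng.expovariate(lam)
        if t > T:
            break    # next purchase falls outside the window
        frequency += 1
        recency = t
        if rng.random() < p:
            break    # customer drops out after this purchase
    return frequency, recency


rng = random.Random(0)
sims = [simulate_bgnbd_customer(0.5, 4.0, 0.8, 2.5, 39.0, rng) for _ in range(200)]
```

Drawing one simulated dataset per posterior draw of (r, alpha, a, b) would then give posterior predictive samples of frequency and recency.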
If this doesn't look right, please let me know. I'll work on putting together a comparison of this with the Also, hope I'm not hijacking the conversation on this issue. If I should take this discussion somewhere else (or just post a gist with these functions and let you all take it from there), let me know. |
@jcfisher I think we already have something like that in here: pymc-marketing/pymc_marketing/clv/distributions.py Lines 151 to 186 in 32098cc
That was rewritten recently, it used to look more like your implementation: pymc-marketing/pymc_marketing/clv/distributions.py Lines 158 to 201 in 05f97cf
|
@wd60622 this is the utility function which will need to be adapted from https://github.com/CamDavidsonPilon/lifetimes/blob/master/lifetimes/utils.py#L506 |
Have you encountered a memory issue when running plot_frequency_recency_matrix? I am following https://www.pymc-marketing.io/en/stable/notebooks/clv/clv_quickstart.html; when I ran it on my own dataset of 10000 examples, peak memory usage went up to 50-70 GB and the Jupyter kernel was eventually killed. I haven't investigated in depth, but perhaps I can try recreating it with a simple dataset. |
Hey @billlyzhaoyh, When this happens, use model.thin_fit_result. We've been meaning to add an example for this to the Quickstart per #448. Also, this shouldn't ever happen with a MAP fit model unless your data has tens of millions of customers or more. |
@ColtAllen Thanks for getting back to me about this, and I will certainly try out the The original memory issue was encountered with a dataset of tens of millions of customers that I put together. I then realised that for the sub-sample of 10000 users I hadn't reset their customer_id to index+1; doing so solved the problem. It might be worth calling out in the documentation that customer IDs should be reset to index+1, keeping a mapping from the customer_id in your own database to the ones used for modelling. I naively used the same 7-digit numbers in string format for customer_id that I have in my DB, which threw a bunch of errors later on as well. |
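A minimal sketch of the re-indexing described above, assuming the goal is simply to replace database IDs with 1..N while keeping a way back. The helper name is hypothetical, and whether the library strictly requires IDs of this form is not confirmed in this thread.

```python
def build_customer_id_mapping(raw_ids):
    """Map each distinct raw ID to 1..N in first-seen order.

    Returns (forward, reverse): raw ID -> modelling ID, and the inverse.
    """
    forward = {}
    for raw in raw_ids:
        if raw not in forward:
            forward[raw] = len(forward) + 1
    reverse = {model_id: raw for raw, model_id in forward.items()}
    return forward, reverse


# Made-up 7-digit string IDs, as they might come out of a database.
raw_ids = ["4821937", "1000042", "4821937", "7730015"]
forward, reverse = build_customer_id_mapping(raw_ids)
model_ids = [forward[raw] for raw in raw_ids]  # IDs to pass to the model
```

The reverse mapping is what lets per-customer predictions be joined back to the original records afterwards.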