# Propensity to Lapse: Options in an Infrequent Purchase Context

Main notebook documenting requirements, research, key findings, references and outcomes.

## Summary

The aim of ‘propensity to lapse’ is to understand who in our customer base is ‘active’, ‘lapsing’ or ‘lost’, so that we can communicate differently with customers in each state. The propensity to lapse output needs to be calculated at the individual level rather than at group level. The output should be a **single categorical value against each customer record.**

It has been agreed with stakeholders that **two** segmentations will be created, serving different use cases:
* Transactional - focussing exclusively on propensity to lapse in the transactional context
* Behavioural - focussing on platform-based engagement


## Requirements

* Single categorical value against each customer record
  * active
  * lapsing
  * lost
* Must be reproducible in an automated way if required

## Options

* Rule Based Segmentation:
  * Calculate each customer's purchase frequency. Compare it to the time since their last transaction.
* Cohort Based:
  * Group customers by acquisition date. Compare each individual to their cohort.
* BTYD (Buy Till You Die) Modelling with `lifetimes` package:
  * Generate a probability of being alive based on transactions (requires at least 2 transactions per customer)
* BTYD + Covariates:
  * As above but includes seasonality, external factors, tenure effect (transactions more regular when first acquired) and cohort shifts (groups acquired at different dates can behave differently)
* Survival Analysis (optional covariates) with `lifelines` or `scikit-survival` packages:
  * Estimates the probability of a customer lapsing, and when
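
As a rough illustration of the rule based option, the sketch below compares the time since a customer's last purchase with their own average gap between purchases. It is a minimal example using only the standard library, not an agreed implementation: the function name `segment_customer`, the ratio thresholds `LAPSING_RATIO` and `LOST_RATIO`, and the sample customers are all hypothetical. Customers with fewer than 2 transactions are returned as `None` because no purchase interval can be computed for them.

```python
from datetime import date
from statistics import mean

# Hypothetical thresholds (assumptions, not agreed values): how many
# "typical gaps" may pass since the last purchase before a customer
# is treated as lapsing or lost.
LAPSING_RATIO = 1.5
LOST_RATIO = 3.0


def segment_customer(purchase_dates, as_of,
                     lapsing_ratio=LAPSING_RATIO, lost_ratio=LOST_RATIO):
    """Return 'active', 'lapsing', 'lost', or None for a single customer."""
    dates = sorted(set(purchase_dates))
    if len(dates) < 2:
        # No gap between purchases can be measured with a single transaction.
        return None
    # Days between consecutive purchases.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    typical_gap = mean(gaps)
    days_since_last = (as_of - dates[-1]).days
    ratio = days_since_last / typical_gap
    if ratio >= lost_ratio:
        return "lost"
    if ratio >= lapsing_ratio:
        return "lapsing"
    return "active"


# Made-up transaction histories for illustration only.
transactions = {
    "cust_a": [date(2023, 1, 10), date(2023, 7, 10), date(2024, 1, 10)],
    "cust_b": [date(2022, 1, 1), date(2022, 4, 1), date(2022, 7, 1)],
    "cust_c": [date(2023, 3, 1), date(2023, 8, 1)],
    "cust_d": [date(2024, 5, 1)],
}
as_of = date(2024, 6, 1)

segments = {cid: segment_customer(dates, as_of)
            for cid, dates in transactions.items()}
print(segments)
```

Using a ratio rather than a fixed number of days lets the rule adapt to each customer's own purchase rhythm, which matters in an infrequent purchase context. A fixed time threshold could be substituted if stakeholders prefer a single definition for everyone.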


## Key Data

* Legacy Theta datasets: `mpb-platform-prod-f68e.theta_data_to_MPB` (use V2 data!)


## Thoughts

* Do we have access to categorical data about customers? I assume not
* Would we want to classify customers with fewer than 2 transactions at all?
  * Maybe we assume active? Depends on definition
* How to evaluate performance? 
  * Train on historical data and see if predictions are accurate, CV etc.
  * Need to be careful of leakage in time
* How much data do we have (in years)?
* What is our definition of lapsing and lost in terms of time?
  * 9 months / 15 months?
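
One way to evaluate on historical data while guarding against leakage in time is to split each customer's transactions at a cutoff date, build the segmentation from the calibration period only, and check it against what happened afterwards. The sketch below is a minimal standard-library example under that assumption: `split_by_cutoff`, `holdout_labels`, the cutoff date and the sample data are all hypothetical.

```python
from datetime import date


def split_by_cutoff(transactions, cutoff):
    """Split purchase dates into calibration (< cutoff) and holdout (>= cutoff).

    Customers with no purchase before the cutoff are dropped, because they
    would not have been known at scoring time.
    """
    calibration, holdout = {}, {}
    for customer_id, dates in transactions.items():
        before = sorted(d for d in dates if d < cutoff)
        after = sorted(d for d in dates if d >= cutoff)
        if before:
            calibration[customer_id] = before
            holdout[customer_id] = after
    return calibration, holdout


def holdout_labels(holdout):
    """True if the customer purchased again during the holdout period."""
    return {cid: bool(dates) for cid, dates in holdout.items()}


# Made-up transaction histories for illustration only.
transactions = {
    "cust_a": [date(2022, 3, 1), date(2023, 2, 1), date(2024, 1, 15)],
    "cust_b": [date(2022, 5, 1), date(2022, 11, 1)],
    "cust_c": [date(2023, 9, 1)],  # acquired after the cutoff, so excluded
}
cutoff = date(2023, 6, 1)

calibration, holdout = split_by_cutoff(transactions, cutoff)
labels = holdout_labels(holdout)
print(labels)
```

Any segmentation or model would then be fitted on `calibration` alone and compared with `labels`, so that no information from after the cutoff influences the prediction.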


## References

* [Propensity to Lapse Kick off Slides](https://docs.google.com/presentation/d/179-C5yNzVS4nNXA_CbQMXH8GeMVuTY9nrZhiqoA1vOs/edit?slide=id.g370e1de7d7d_0_3#slide=id.g370e1de7d7d_0_3)
* [Options for approaches PtL Doc](https://docs.google.com/document/d/1ZJEun7c_fys9jN0EfnKOlXg8tthy1ArEfVPX_AJT86I/edit?tab=t.0)
* [Theta on CLV and ML Video](https://4634547.hs-sites.com/share/hubspotvideo/194067088397?utm_campaign=General&utm_medium=email&_hsenc=p2ANqtz-_eYBqy8-3U4YXMu1LaJqdalfo9lsuAElhOUYz4F9cDF1tcQDs6WmwKDzIzrVC8JBkcO_DOfHEBiqni60GK2fje8VYl2Q&_hsmi=374791551&utm_content=374791551&utm_source=hs_email)
