
**Welcome to LTVision Module 1**

Copyright (c) Meta Platforms, Inc. and affiliates.  <br>
This source code is licensed under the BSD-style license, which can be found in the LICENSE file in the root directory of this source tree.

In [None]:
from src import LTVSyntheticData
from src import LTVexploratory
from src.graph import save_plot

# Data Preparation

## Generate simulated data or import your own 

To demonstrate an end-to-end implementation of LTVision Module 1, run the code below to generate simulated demo data for 20,000 users with purchases over a period of 180 days. <br>

In [None]:

synth_data_gen = LTVSyntheticData(n_users=20000, random_seed=42)
customer_table = synth_data_gen.get_customers_data()
event_table = synth_data_gen.get_events_data()


When you are ready to run LTVision Module 1 with your own data, follow the data requirements in the documentation and run the following code to import your data. Replace ‘example.csv’ in each line with the path and name of the corresponding file (one file for the customer table and one for the event table): <br>
      
*import pandas as pd* <br>
*customer_table = pd.read_csv('example.csv')* <br>
*event_table = pd.read_csv('example.csv')*
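
Before importing your own files, it can help to confirm that their headers contain the columns this notebook refers to. The following is a minimal sketch using only the standard library. The helper `missing_columns` is hypothetical, the column names are the ones passed to `LTVexploratory` in this notebook, and the in-memory sample stands in for a real file. Any further columns your data needs (such as a user identifier) are described in the LTVision documentation and are not checked here.

```python
import csv
import io

# Column names taken from the LTVexploratory call in this notebook.
CUSTOMER_COLS = {"registration_date"}
EVENT_COLS = {"event_date", "event_name", "value"}


def missing_columns(csv_file, required):
    """Return the required column names absent from the CSV header (hypothetical helper)."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return sorted(set(required) - present)


# Made-up sample standing in for open('example.csv', newline='')
sample_events = io.StringIO("user,event_date,value\n1,2024-01-01,9.99\n")
missing = missing_columns(sample_events, EVENT_COLS)
print(missing)
```

An empty list means every checked column is present.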


## Format Data

Before analysis begins, use **“LTVexploratory”** to map the data into a structured format. <br>

In [None]:

da = LTVexploratory(
    customer_table, 
    event_table,
    registration_time_col='registration_date',
    event_time_col='event_date',
    event_name_col='event_name',
    value_col='value'
    )

# Data Validation

## Customer & Event table overview

The *‘customer’* table is a user-level table that defines Day 0 for each user who has engaged with the business: the initial point of interaction, or anchor event, such as installing the app or making a first purchase. <br> 

In [None]:

customer_table.head()


The *‘event’* table is a transaction-level record of all revenue-generating events completed for the advertising unit. <br> 

In [None]:

event_table.head()


## Customer & Event table overlap

The **"plot_customers_intersection"** function shows what percentage of customers are purchasers, and whether any customers appear in the customer table but not in the events table and therefore need to be excluded from the subsequent analysis.<br> 

In [None]:
# Intersection between users in the two datasets
fig, data = da.plot_customers_intersection()
save_plot(fig, "images/customer_intersection.png")


From the demo data output we can see that: <br> 
1. Upper right: 95.6% of customers are not generating any revenue <br> 
2. Lower right: 4.4% of all customers are revenue-generating customers or purchasers <br> 
3. Lower left: 0.0% means everyone in the customer table is also in the events table, so there is no need to exclude any customers from the subsequent analysis.<br> 
4. Upper left: should always be 0.0% <br> 
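
The shares described in this list can be approximated from plain sets of customer IDs. This is only a sketch of the quantities involved, not the logic inside `plot_customers_intersection`. The function name and the sample IDs are made up for illustration.

```python
def customer_overlap_shares(customer_ids, event_ids, purchaser_ids):
    """Split customers into purchasers, non-purchasers and those missing from the events table."""
    customers = set(customer_ids)
    in_events = customers & set(event_ids)
    purchasers = in_events & set(purchaser_ids)
    total = len(customers)
    return {
        "purchasers": len(purchasers) / total,
        "non_purchasers": len(in_events - purchasers) / total,
        "missing_from_events": len(customers - in_events) / total,
    }


customers = range(1, 11)  # 10 made-up customer IDs
events = range(1, 10)     # customer 10 has no rows in the events table
purchasers = [1, 2]       # customers with at least one revenue event
shares = customer_overlap_shares(customers, events, purchasers)
print(shares)
```

In this made-up sample, one customer in ten is missing from the events table and would be excluded from the later steps.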


# Purchase Frequency

The **“plot_purchases_distribution”** function visualizes the purchase frequency among all purchasers. <br> 

This function has two input parameters: <br> 

*days_limit:* This parameter defines the minimum duration since a customer’s initial interaction for them to be included in this visualization. In this example, the time frame is set to 60 days, but you can change it to whatever makes the most sense for your business (e.g. 30 days, 120 days or 365 days).<br> 

*truncate_share:* This parameter defines the share of purchasers shown in this visualization; the remaining ‘outliers’ are excluded to make the histogram easier to read. In this example, truncate_share is set to 0.999, which means that the top 0.1% of purchasers are excluded from this visualization.<br> 
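
To make the effect of truncate_share concrete, here is a minimal sketch of truncating a list of purchase counts. It assumes that outliers are the customers with the most purchases, which may differ from how the library ranks them. The helper `truncate_top` and the sample counts are hypothetical.

```python
def truncate_top(values, truncate_share):
    """Keep the lowest `truncate_share` fraction of values and drop the largest ones."""
    ordered = sorted(values)
    keep = round(len(ordered) * truncate_share)
    return ordered[:keep]


# Made-up purchase counts for 1,000 purchasers, including one extreme outlier
purchase_counts = [1] * 600 + [2] * 300 + [3] * 99 + [250]
kept = truncate_top(purchase_counts, truncate_share=0.999)
print(len(kept), max(kept))
```

With 1,000 purchasers and a share of 0.999, only the single most extreme value is dropped, which keeps the histogram's x-axis short.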


In [None]:

fig, data = da.plot_purchases_distribution(days_limit=60, truncate_share=0.999)
save_plot(fig, "images/purchases_distribution.png")
fig


From the demo data output, we can see that: <br> 

- 39% of all purchasers have purchased only once <br> 
- 29% purchased twice<br> 
- 14% purchased three times<br> 

# Top spenders' contribution to total revenue

The **“plot_revenue_pareto”** function visualizes whether a significant portion of revenue was contributed by a small group of high spenders.<br> 

Like *“plot_purchases_distribution”*, the *“plot_revenue_pareto”* function takes the days_limit parameter and operates on the same customer cohorts. <br>

*days_limit:* Defines the minimum duration since a customer’s initial interaction for them to be included in this visualization. In this example, the limit is set to 60 days.<br> 


In [None]:

fig, data = da.plot_revenue_pareto(days_limit=60)
save_plot(fig, "images/revenue_pareto.png")
fig


From the demo data output we can see that: <br> 
- The top 5% highest-spending customers contributed 69% of total revenue<br>
- The top 10% contributed 75% of total revenue<br>
- The top 20% contributed more than 84% of total revenue<br>
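
The figures in this list are shares of total revenue held by the highest-spending customers. A minimal sketch of that calculation follows. It uses made-up revenue values and a hypothetical helper, and is not the code behind `plot_revenue_pareto`.

```python
def top_share_of_revenue(revenues, top_fraction):
    """Share of total revenue contributed by the top `top_fraction` of customers."""
    ordered = sorted(revenues, reverse=True)
    n_top = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:n_top]) / sum(ordered)


# Made-up revenue per purchaser
revenue_per_customer = [500, 200, 100, 50, 50, 30, 30, 20, 10, 10]
share_top_10 = top_share_of_revenue(revenue_per_customer, 0.10)
share_top_20 = top_share_of_revenue(revenue_per_customer, 0.20)
print(f"top 10%: {share_top_10:.0%}, top 20%: {share_top_20:.0%}")
```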

# Time to first purchase

The **“plot_customers_histogram_per_conversion_day”** function visualizes the duration between the initial interaction and the first purchase.<br> 
This function also uses the *days_limit* parameter. <br>

In [None]:

fig, data = da.plot_customers_histogram_per_conversion_day(days_limit=60)
save_plot(fig, "images/customers_histogram_per_conversion_day.png")
fig


From the demo data output we can see that: <br>

- 55% of first-time purchases happened within 7 days of the initial interaction<br>
- The remaining 45% of first purchases happen after the 7-day optimization window, which means the current digital customer acquisition campaign is not capturing them<br>
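
The share of first purchases that fall inside an optimization window can be computed directly from registration and first-purchase dates. The sketch below uses made-up dates and a hypothetical helper. Treating a purchase on day 7 as inside the window is an assumption.

```python
from datetime import date


def share_within_window(registrations, first_purchases, window_days=7):
    """Share of first purchases made within `window_days` of the initial interaction."""
    gaps = [(purchase - reg).days for reg, purchase in zip(registrations, first_purchases)]
    return sum(gap <= window_days for gap in gaps) / len(gaps)


# Made-up data: four purchasers who all registered on the same day
registrations = [date(2024, 1, 1)] * 4
first_purchases = [date(2024, 1, 1), date(2024, 1, 4), date(2024, 1, 11), date(2024, 1, 31)]
share = share_within_window(registrations, first_purchases)
print(f"{share:.0%} of first purchases happened within 7 days")
```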

# Correlation between short-term and long-term revenue

The **“plot_early_late_revenue_correlation”** function demonstrates the correlation between short-term and long-term purchase values across various timeframes. <br>
This function also uses the *days_limit* parameter.<br>

In [None]:

fig, data = da.plot_early_late_revenue_correlation(days_limit=70)
save_plot(fig, "images/early_late_revenue_correlation.jpeg")


From the demo data output we can see that:  <br>

- There is a high correlation in early time frames. For example, the correlation between day-7 revenue and day-10 revenue is a robust 95%.
- However, as time progresses, the correlation between day-7 revenue and future revenue weakens significantly. By day-22, this correlation has already dropped below 40%. This suggests that day-7 revenue is not a reliable indicator for revenue on day-22 and beyond.
- This diminishing correlation between early and later revenue is a crucial indicator of the potential value a pLTV strategy could bring to a business.<br>
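
To illustrate what a weakening correlation looks like, the sketch below computes a Pearson correlation between per-customer cumulative revenue at an early day and at two later days. The choice of Pearson correlation is an assumption, since the text does not state which measure the library uses, and all revenue values are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up cumulative revenue per customer at three points in time
day7 = [0, 10, 20, 30]
day10 = [0, 12, 22, 33]    # barely changed since day 7
day60 = [80, 15, 120, 40]  # late spenders reshuffle the ranking
corr_early = pearson(day7, day10)
corr_late = pearson(day7, day60)
print(round(corr_early, 2), round(corr_late, 2))
```

When late spenders overtake early ones, the day-7 figure stops tracking the later totals, which is the pattern described in the list above.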


# Purchaser flow over time

The **“plot_paying_customers_flow”** function provides further insight into purchasers’ buying behavior over time, showing how low, medium and high spenders move to the same or a different class at a later point in time. <br>

This function has two input parameters: <br>

*early_limit:* This parameter sets the point in time on the left axis, which shows the cumulative value of each customer by that early point, categorized into equally sized, ranked groups: no spend, low spend, medium spend and high spend. It is set to 7 days by default because most digital campaigns have a 7-day optimization window.

*days_limit:* This parameter sets the point in time on the right axis, which shows the cumulative value of the same customers at a later point, again categorized into equally sized, ranked groups: low spend, medium spend and high spend. Experiment with different values (e.g. 120 days, 180 days or 365 days) to explore your customers’ purchasing behavior across different time frames. 

**Please note** that this visualization includes ALL purchasers defined by the days_limit parameter. In this example, early_limit is set to 7 days and days_limit is set to 60 days, which means this visualization includes ALL purchasers up to day 60 from initial interaction. <br>

In [None]:

fig, data = da.plot_paying_customers_flow(days_limit=60, early_limit=7, spending_breaks={}, end_spending_breaks={})
save_plot(fig, "images/paying_customer_flow.png", dpi=400) # you can increase the dpi to get a higher resolution
fig

In [None]:
data


From the demo data output we can see that: <br>

- 27% of no spenders by day-7 became high spenders by day-60 (99/(156+111+99) = 27%)
- 29% of low spenders by day-7 became high spenders by day-60 (44/(68+38+44) = 29%)
- 23% of medium spenders by day-7 became high spenders by day-60 (35/(48+67+35) = 23%)
- In total, 66% of high spenders by day-60 were not high spenders at day-7 ((99+44+35)/(99+44+35+93) = 178/271 = 66%) <br>
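
The percentages above can be reproduced from the counts quoted in parentheses. In the sketch below, only the last number in each row (the customers who ended up as high spenders) is identified by the text. The order of the other two numbers is an assumption and does not affect the result.

```python
# Day-60 counts for each day-7 class, copied from the figures quoted above.
# The last entry of each row is the number who became high spenders by day 60.
flows = {
    "No spend": [156, 111, 99],
    "Low spend": [68, 38, 44],
    "Medium spend": [48, 67, 35],
}
high_to_high = 93  # high spenders at day 7 who were still high spenders at day 60

# Share of each day-7 class that became high spenders by day 60
to_high = {name: row[-1] / sum(row) for name, row in flows.items()}

# Share of day-60 high spenders who were not high spenders at day 7
new_high = sum(row[-1] for row in flows.values())
share_new_high = new_high / (new_high + high_to_high)

for name, share in to_high.items():
    print(f"{name}: {share:.0%}")
print(f"New high spenders: {share_new_high:.0%}")
```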


# pLTV Opportunity Size Estimation

**The goal of estimating the pLTV opportunity size is to enable businesses to make an informed decision on whether investing in pLTV models and strategies should take priority over other initiatives.** <br>

**Key definitions**: <br>

- *pLTV opportunity size:* The additional revenue lift generated by building a pLTV model and activating a pLTV strategy, compared to the business’ existing customer acquisition strategy, to help decision makers evaluate the potential ROI before investing in a pLTV model and strategy.
- *Baseline/business-as-usual Customer Acquisition strategy:* A broad targeted strategy (without any constraints on the target audience) optimizing towards a standard revenue event (e.g. Purchase, Subscription, in-app purchase etc.), with the goal to acquire new customers.
- *pLTV strategy:* A broad targeted acquisition strategy optimizing towards acquiring high-pLTV customers, with the goal to increase long-term revenue/profitability of a business. <br>


**Key Assumptions**: <br>

The pLTV opportunity size for a specific business depends on the business’ ability to first identify high-value new customers with a pLTV model and then acquire high-value customers with a pLTV strategy through signal-based and scalable digital marketing platforms. <br>

1. *Identify high-value customers:* The pLTV model is able to predict user-level, long-term LTV values for all new users with high accuracy.
2. *Acquire high-value customers:* With the same Customer Acquisition budget, a pLTV strategy implemented on a scalable digital marketing platform is able to acquire the same number of new customers, but with higher pLTV values, compared to those acquired through a business-as-usual strategy. 
3. *Signal-based & scalable digital marketing platform:* Signal-based platforms with a large and active user community, such as Meta, are scalable enough to find and convert net new high-value customers based on the signals provided to the optimization algorithms. For example, Meta’s 3.2 billion daily active users provide a vast pool of potential customers, making it possible to find and convert relevant customers based on provided optimization signals without compromising on conversion volume. 

**Please note** that the opportunity size estimation in LTVision is primarily based on how pLTV strategies function on Meta platforms. However, these assumptions may also hold true for strategies on other scalable platforms. <br>

In [None]:
# If spending_breaks is empty, default values are found automatically. You can specify your own groups as a Dict[str, float],
# e.g. {'No spend': 0, 'Low spend': 10, 'Medium spend': 100, 'High spend': 1000}
# Set is_mobile=True if you are a mobile/gaming company, or False if you are an eCommerce company
data = da.estimate_ltv_impact(
    days_limit=60, 
    early_limit=7, 
    spending_breaks={},
    is_mobile=False)
data

Based on the demo data: <br>

- If user identification happens before the 1st Purchase/revenue event, implementing a pLTV strategy could lead to 36,100 in additional revenue, an estimated **maximum revenue increase of 203%** compared to the BAU acquisition strategy.
- If user identification relies on the 1st Purchase/revenue event, implementing a pLTV strategy could lead to 2,686 in additional revenue, an estimated **maximum revenue increase of 15%** compared to the BAU acquisition strategy. <br>


Now that the estimated additional revenue is in place, businesses can calculate the **ROI** of the entire pLTV initiative by dividing the additional revenue by the estimated costs. <br>

This calculation requires an estimate of the associated costs, which include developing and maintaining the pLTV model as well as activating and optimizing the pLTV strategy across various platforms. <br>
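
As a worked example of the ROI calculation, the sketch below divides the two additional-revenue estimates from the demo data by a cost figure. The cost of 10,000 is purely hypothetical and should be replaced with your own estimate. The helper name is also made up.

```python
def roi(additional_revenue, estimated_cost):
    """ROI as described in the text: additional revenue divided by estimated costs."""
    if estimated_cost <= 0:
        raise ValueError("estimated_cost must be positive")
    return additional_revenue / estimated_cost


estimated_cost = 10_000  # hypothetical cost of building and activating the pLTV initiative

# Additional revenue estimates from the demo data output
scenarios = {
    "identification before 1st purchase": 36_100,
    "identification after 1st purchase": 2_686,
}
rois = {name: roi(revenue, estimated_cost) for name, revenue in scenarios.items()}
for name, value in rois.items():
    print(f"{name}: ROI = {value:.2f}")
```

Under this hypothetical cost, only the first scenario returns more than it costs, which shows how sensitive the decision is to when high-value users can be identified.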

Please refer to the **LTVision documentation** for detailed explanations on the assumptions and formulas used in the Opportunity Size Estimations. <br>
