# Bayesian Marketing Mix Models 



## Objective of this Notebook 

This notebook is a guide to building marketing mix (MM) models that estimate the contribution of each channel (touchpoint) to product sales. The toy example includes the relevant feature transformations, such as adstock (decay), seasonality, saturation and lags.

It is built on toy data that mimics real-life scenarios but does not give a full picture of the marketing spend a real product receives during a year.
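
To make the adstock (carry-over) and saturation transformations mentioned above more concrete, here is a minimal NumPy sketch. The function names `geometric_adstock` and `hill_saturation`, the decay rate and the Hill parameters are illustrative assumptions; LightweightMMM applies its own versions of these transformations internally and learns their parameters from the data.

```python
import numpy as np

def geometric_adstock(spend, decay):
    """Carry a fraction `decay` of the accumulated effect into the next period."""
    spend = np.asarray(spend, dtype=float)
    adstocked = np.zeros_like(spend)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry
        adstocked[t] = carry
    return adstocked

def hill_saturation(x, half_saturation, slope):
    """Diminishing returns: maps x >= 0 into [0, 1), reaching 0.5 at half_saturation."""
    x = np.asarray(x, dtype=float)
    return x**slope / (x**slope + half_saturation**slope)

# Made-up weekly spend for a single channel.
spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(spend, decay=0.5)      # values: 100, 50, 25, 62.5
saturated = hill_saturation(adstocked, half_saturation=100.0, slope=1.0)
print(adstocked)
print(saturated)
```

Note how the spend of week 1 still has an effect in weeks 2 and 3, even though nothing was spent in those weeks.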

These models are fitted as Bayesian models using the well-known [LightweightMMM](https://github.com/google/lightweight_mmm) library.

## About Bayesian Models and Probabilistic Programming. 

This excellent [tutorial](https://juanitorduz.github.io/intro_pymc3/) made by the developers of [PyMC](https://www.pymc.io/projects/docs/en/stable/learn/core_notebooks/pymc_overview.html) (one of the leading open-source Bayesian statistics libraries in Python) goes over the fundamentals of Bayesian Machine Learning and a few of its advantages over traditional (frequentist) approaches.


## About the data. 

We are using the same dataset as in the [mmx_linear_model_example](mmx_linear_model_example.ipynb); find more details there.

## 1. Imports and setup

# LightweightMMM

LightweightMMM is considerably more advanced than a plain linear marketing mix model. It uses Bayesian models and allows for seamless calculation of adstock, ROI and more. It also includes a powerful budget optimizer, which makes it a great starting point for advanced MMX models.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Import the relevant modules of the lightweight_mmm library
from lightweight_mmm import lightweight_mmm
from lightweight_mmm import optimize_media
from lightweight_mmm import plot
from lightweight_mmm import preprocessing
# Import jax.numpy and any other library we might need.
import jax.numpy as jnp


In [None]:
data_path = "../data/cough_and_cold_sales.csv"

# Define channels to use.
CHANNELS: list[str] = ['tv', 'social_media', 'congress', 'trade']
EXT_VARS: list[str] = ['flu_index', 'stringency_index'] 
# Define target variable.
TARGET = 'sales'
# Define weeks for testing period.
TEST_SIZE = 8
# Seed for reproducibility
SEED = 123456

# load data
df = pd.read_csv(data_path, sep= ";", parse_dates=["date"])
df.set_index('date', inplace=True)
# LightweightMMM expects plain arrays rather than a DataFrame.

media_data = df[CHANNELS].values
extra_features = df[EXT_VARS].values
# One-dimensional array with the target values.
target = df[TARGET].values
# We assign (arbitrarily in this example) a total cost per channel
costs = jnp.array(
    [
        350,  # TV cost in hundreds of thousands
        100,  # social media
        200,  # congress
        175,  # trade
    ]
)


In [None]:

# Split and scale data.
split_point = len(df) - TEST_SIZE
# Media data
media_data_train = media_data[: split_point, ...]
media_data_test = media_data[split_point: , ...]
# Extra features
extra_features_train = extra_features[: split_point , ...]
extra_features_test = extra_features[split_point: , ...]
# Target
target_train = target[: split_point]

Scaling is essential for many modelling problems and this one is no exception.

LightweightMMM provides `CustomScaler`, which behaves like the scikit-learn scalers.

The data in `cough_and_cold_sales.csv` is ALREADY scaled, so there is no need to rescale it. For demonstration, however, we are going to scale the cost data.

**__NOTE__**

---
In most cases you will need 3 or 4 scalers: one for the media data, one for the target and one for costs. Optionally, if you are adding extra features, those might need an extra scaler. It is very important that you save and "carry with you" those scalers throughout your MMM journey, as LightweightMMM will allow you to re-insert these scalers at different points to ensure that inputs and results are always on the correct scale. If some results don't make sense, it might be a scaling problem.

A few more details on `CustomScaler` usage:

This scaler can be used in two ways, for both the multiplication and the division operation:

- By specifying a value to use for the scaling operation.
- By specifying an operation that is applied at column level to calculate the value for the actual scaling operation.

For example, to scale the dataset by multiplying by 100 you can directly pass `multiply_by=100`. The value can also be an array of an appropriate shape by which to divide or multiply the data. If you want to multiply by the mean value of each column instead, you can pass `multiply_operation=jnp.mean` (or any other operation desired).

Operation parameters take precedence when both values and operations are passed; the values are ignored in that case.

Consult the full class documentation if you still need to know more.

In the general case, media, extra features and the target are each divided by their mean so that the result has a mean of 1. This allows the model to be agnostic to the scale of the inputs (e.g. a user can use either the number of sales or the value of sales). In this notebook that step is skipped because the data is already scaled, so only the costs are scaled. The costs are not used in the model directly; they only inform the prior distributions on the media variables (see the model documentation). They are divided by their mean and multiplied by 0.15 to reflect typical ranges in MMMs.

---
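
The effect of dividing by the column mean, as described in the note above, can be illustrated with plain NumPy. This is a sketch of the arithmetic only, not of `CustomScaler` itself, and the array values are made up.

```python
import numpy as np

# Hypothetical media data: rows are weeks, columns are channels.
media = np.array([[10.0, 200.0],
                  [20.0, 400.0],
                  [30.0, 600.0]])

# "Fit": remember the per-column mean so that it can be reused later.
column_means = media.mean(axis=0)

# "Transform": after the division every column has a mean of 1.
media_scaled = media / column_means

# "Inverse transform": multiply by the stored means to recover the original units.
media_restored = media_scaled * column_means

print(media_scaled.mean(axis=0))
```

Keeping `column_means` around plays the same role as keeping the fitted scaler: without it, results on the scaled data cannot be mapped back to the original units.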

In [None]:
# The commented-out lines show how the scalers would look for datasets that are not already scaled.

# media_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
# extra_features_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
# target_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
cost_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean, multiply_by=0.15)

# media_data_train = media_scaler.fit_transform(media_data_train)
# extra_features_train = extra_features_scaler.fit_transform(extra_features_train)
# target_train = target_scaler.fit_transform(target_train)
costs = cost_scaler.fit_transform(costs)

In [None]:
costs

# Data Quality (EDA)

In [None]:
correlations, variances, spend_fractions, variance_inflation_factors = preprocessing.check_data_quality(
    media_data=media_data,
    target_data=target,
    cost_data=costs,
    extra_features_data=extra_features
)

In [None]:
correlations[0].round(2)

In [None]:
# Ideally we aim for VIF values under 3, and certainly under 5.
variance_inflation_factors

For any analysis that aims to assess feature importance, it is essential to check for correlation among the covariates (multicollinearity), as it can obscure the true relationship between the independent variables and the target variable. Diagnostics such as the VIF (Variance Inflation Factor) can help you spot multicollinearity issues. If you find that the variables are heavily correlated, it is important to either merge them into one feature or find a way to "break" the correlation and isolate their effects.
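
To show what a VIF measures, the sketch below computes it from scratch with NumPy. It relies on the identity that the VIF of feature j, defined as 1 / (1 - R²) when regressing feature j on all other features, equals the j-th diagonal entry of the inverse correlation matrix. The helper name `variance_inflation_factors` and the synthetic data are assumptions for illustration; in this notebook the values come from `preprocessing.check_data_quality`.

```python
import numpy as np

def variance_inflation_factors(features):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(features, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)               # independent of a
c = a + 0.1 * rng.normal(size=n)     # almost a copy of a

vifs = variance_inflation_factors(np.column_stack([a, b, c]))
print(vifs.round(1))
```

The independent feature `b` gets a VIF close to 1, while `a` and `c`, which carry nearly the same information, both get very large values.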

# Training the model

The model is fitted with [Markov Chain Monte Carlo](https://num.pyro.ai/en/stable/mcmc.html) (MCMC) sampling, which requires a "warmup" phase to facilitate convergence. Warmup samples are discarded; only the samples drawn afterwards are kept.
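
The role of the warmup phase can be seen in a toy sampler. The sketch below uses random-walk Metropolis, a much simpler algorithm than the samplers provided by NumPyro, and targets a standard normal distribution; the function names, starting point and step size are all illustrative assumptions.

```python
import numpy as np

def metropolis(log_prob, start, n_steps, step_size, seed):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_steps)
    current = start
    current_lp = log_prob(current)
    for i in range(n_steps):
        proposal = current + step_size * rng.normal()
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples

def log_standard_normal(x):
    """Log density of N(0, 1), up to an additive constant."""
    return -0.5 * x**2

# Start deliberately far away from the region where the target has its mass.
chain = metropolis(log_standard_normal, start=50.0, n_steps=6000, step_size=1.0, seed=0)
warmup = 1000
kept = chain[warmup:]
print(chain[:warmup].mean(), kept.mean(), kept.std())
```

The early samples still reflect the arbitrary starting point, which is why they are thrown away; the kept samples should have a mean near 0 and a standard deviation near 1.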

In [None]:
mmm = lightweight_mmm.LightweightMMM(model_name="adstock")

number_warmup=1000
number_samples=1000

In [None]:

# For replicability in terms of random number generation in sampling
# reuse the same seed for different trainings.
mmm.fit(
    media=media_data_train,
    media_prior=costs,
    target=target_train,
    extra_features=extra_features_train,
    number_warmup=number_warmup,
    number_samples=number_samples,
    seed=SEED
)

In [None]:
mmm.print_summary()


Ideally we would never accept an R-hat (`r_hat`) value above 1.1, and we should aim for 0 divergences. If divergences are present, they invalidate our model, so we need to analyse what could cause them with some diagnostics.
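
For intuition about the 1.1 threshold, here is a sketch of the classic Gelman-Rubin R-hat, which compares the variance between chains with the variance within chains. The helper name `gelman_rubin` and the simulated chains are assumptions; the summary printed above may use a refined variant of this statistic (for example one based on split chains).

```python
import numpy as np

def gelman_rubin(chains):
    """Classic R-hat for an array of shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean()       # average within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)    # variance of the chain means
    pooled = (n - 1) / n * within + between / n
    return np.sqrt(pooled / within)

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))                           # four chains that agree
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])       # two chains sit elsewhere

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(rhat_mixed, rhat_stuck)
```

When all chains explore the same distribution, the value is very close to 1; when some chains are stuck in a different region, it rises well above 1.1.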

## Bayesian Diagnostics

In [None]:
plot.plot_media_channel_posteriors(media_mix_model=mmm)


### Posteriors vs Priors

In [None]:
plot.plot_prior_and_posterior(media_mix_model=mmm)


If the orange line is close to the blue one, it suggests our priors are informative and well selected. If not, we probably passed uninformative or wrong priors. We can modify them and re-run the experiment if needed.
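
When reading this kind of plot, it helps to remember that a posterior is a compromise between the prior and the data. The sketch below shows this with the textbook conjugate update for the mean of a normal distribution with known noise; the function name `normal_posterior` and all numbers are made up, and the model is far simpler than the one fitted in this notebook.

```python
def normal_posterior(prior_mean, prior_sd, data, noise_sd):
    """Posterior mean and sd of a normal mean, given a normal prior and known noise_sd."""
    n = len(data)
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / noise_sd**2
    post_precision = prior_precision + data_precision
    sample_mean = sum(data) / n
    post_mean = (prior_precision * prior_mean + data_precision * sample_mean) / post_precision
    return post_mean, post_precision**-0.5

data = [2.1, 1.9, 2.0, 2.2, 1.8]   # made-up observations centred near 2

# A tight prior around 0 keeps the posterior close to the prior.
tight_mean, tight_sd = normal_posterior(prior_mean=0.0, prior_sd=0.1, data=data, noise_sd=1.0)
# A wide prior around 0 lets the data dominate.
wide_mean, wide_sd = normal_posterior(prior_mean=0.0, prior_sd=10.0, data=data, noise_sd=1.0)
print(tight_mean, wide_mean)
```

A posterior that barely moves away from the prior can therefore mean either that the prior was well chosen or that it is so tight that the data has little say, so it is worth checking both possibilities.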

In [None]:

# Check how well the fitted model reproduces the training data.
plot.plot_model_fit(mmm)

In [None]:

# The test media data must be on the same scale as the training data (this dataset is already scaled).
new_predictions = mmm.predict(
    media=media_data_test,
    extra_features=extra_features_test,
    seed=SEED
)
new_predictions.shape

In [None]:

plot.plot_out_of_sample_model_fit(
    out_of_sample_predictions=new_predictions,
    out_of_sample_target=target[split_point:]
)
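
Besides inspecting the plot, the out-of-sample fit can be summarised with a single error metric. The sketch below computes the mean absolute percentage error (MAPE) on made-up numbers; it assumes that the first axis of the prediction array indexes posterior samples and the second one indexes time periods, and the helper name `mape` is hypothetical.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (0.05 means 5%)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)))

# Hypothetical posterior predictive draws: 3 samples x 4 weeks.
draws = np.array([[ 90.0, 110.0, 100.0, 120.0],
                  [110.0,  90.0, 100.0, 120.0],
                  [100.0, 100.0, 100.0, 120.0]])
actual = np.array([100.0, 100.0, 125.0, 120.0])

point_forecast = draws.mean(axis=0)     # average over the sample axis
error = mape(actual, point_forecast)
print(point_forecast, error)
```

In this toy case only the third week is off (100 predicted against 125 observed), which gives an average error of 5%.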


# Media insights

In [None]:
media_contribution, roi_hat = mmm.get_posterior_metrics(cost_scaler=cost_scaler)


In [None]:
plot.plot_media_baseline_contribution_area_plot(
    media_mix_model=mmm,
    fig_size=(30,10)
)

In [None]:
plot.plot_bars_media_metrics(
    metric=media_contribution,
    metric_name="Media Contribution Percentage",
    channel_names=CHANNELS
)


In [None]:
plot.plot_bars_media_metrics(metric=roi_hat, metric_name="ROI hat", channel_names=CHANNELS)


In [None]:
# KPI == incremental sales contribution
plot.plot_response_curves(
    media_mix_model=mmm, seed=SEED,
)

# Optimization

In [None]:
prices = jnp.ones(mmm.n_media_channels)
n_time_periods = TEST_SIZE
budget = jnp.sum(jnp.dot(prices, media_data.mean(axis=0))) * n_time_periods
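
The budget formula above multiplies the average spend per period by the number of periods to optimise. A small worked example with made-up numbers (and prices that are not all 1, to show their role) is sketched below.

```python
import numpy as np

# Hypothetical media data: 4 weeks x 2 channels, in media units.
media = np.array([[10.0, 4.0],
                  [20.0, 6.0],
                  [30.0, 4.0],
                  [20.0, 6.0]])
prices = np.array([1.0, 2.0])    # assumed cost per media unit for each channel
n_time_periods = 8

mean_weekly_units = media.mean(axis=0)                    # 20 and 5 units per week
mean_weekly_spend = np.dot(prices, mean_weekly_units)     # 1 * 20 + 2 * 5 = 30
budget = mean_weekly_spend * n_time_periods               # 30 * 8 = 240
print(budget)
```

With all prices equal to 1, as in this notebook, the budget is simply the sum of the average media values per channel times the number of periods.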
     

In [None]:
# Run optimization with the parameters of choice.
solution, kpi_without_optim, previous_media_allocation = optimize_media.find_optimal_budgets(
    n_time_periods=n_time_periods,
    media_mix_model=mmm,
    extra_features=extra_features_test[:n_time_periods],
    budget=budget,
    prices=prices,
    seed=SEED
)
     

In [None]:

# Obtain the optimal weekly allocation.
optimal_buget_allocation = prices * solution.x
optimal_buget_allocation

In [None]:
# similar renormalization to get previous budget allocation
previous_budget_allocation = prices * previous_media_allocation
previous_budget_allocation

In [None]:

# Both numbers should be almost equal
budget, jnp.sum(solution.x * prices)

In [None]:

# Plot the pre- and post-optimization budget allocation and compare the predicted target variable.
plot.plot_pre_post_budget_allocation_comparison(
    media_mix_model=mmm, 
    kpi_with_optim=solution['fun'], 
    kpi_without_optim=kpi_without_optim,
    optimal_buget_allocation=optimal_buget_allocation, 
    previous_budget_allocation=previous_budget_allocation, 
    figure_size=(10,10),
    channel_names=CHANNELS
)
