# Meridian MMM Tutorial in Snowflake Notebooks

[Meridian](https://github.com/google/meridian/tree/main) is an open-source marketing mix modeling (MMM) package released by Google and a major upgrade to an earlier MMM package called [LightweightMMM](https://github.com/google/lightweight_mmm). Both take a Bayesian approach, but there are some major [feature differences](https://developers.google.com/meridian/docs/migrate).

This notebook will show you how to get this package installed and running inside a Snowflake notebook. There are three major steps:

1. Run the `setup.sql` script. This sets up your environment, creates a user role for this project, and creates the other Snowflake objects that are needed.
2. Create a new Snowflake Notebook, or adjust the settings in this one to get it all working.
3. Run the Meridian tutorial below.

You can also refer to the original tutorial [here](https://developers.google.com/meridian/notebook/meridian-getting-started).

## Step 1: Run setup.sql

Run this script in a Snowflake SQL worksheet and make sure you switch to the new Role you just created. Come back to these instructions when complete.

## Step 2: Create a Snowflake Notebook using Container Runtimes

When you create a new Notebook, use the settings below. If you are adjusting an existing Notebook instead, open the Notebook Settings option in the top right-hand corner and apply the same settings there.

#### General Tab
* Select any warehouse
* Select Run on Container
* Select "Snowflake ML Runtime GPU vX.X"
* Select your compute pool
* Pick a low idle time (30 minutes to 1 hour)

#### External Access
* Turn on any access integrations specified in the setup.sql or...
* Turn on PYPI_ACCESS_INTEGRATION

Your Notebook will either be created or restarted at this time.

## Step 3: Follow the Tutorial below

### Install and Load Packages

In [None]:
# Install Required Packages

# Note: There are pandas bugs in Meridian v1.0.0, so we need to use v1.0.2 which has to be installed from github.
# Unfortunately v1.0.2 requires Python 3.11, but Snowflake Container Runtime Notebooks are currently limited to 3.10,
# so we simply have to pass the `--ignore-requires-python` parameter to the pip installer. The initial tutorial
# runs fine using Python 3.10.

!pip install --ignore-requires-python "google-meridian[and-cuda] @ git+https://github.com/google/meridian@v1.0.2"
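
Because the install bypasses the package's Python version requirement, it can help to confirm what actually landed in the environment. The following is a minimal sketch, assuming the distribution is registered under the name `google-meridian` (the name used in the pip command above). `installed_version` is a hypothetical helper written for this check, not part of Meridian.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python: {python_version}")
# Assumption: the package is registered as 'google-meridian'.
print(f"google-meridian: {installed_version('google-meridian')}")
```

If the second line prints `None`, the install did not complete and the imports in the next cell will fail.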

In [None]:
# Package Imports

import warnings
import tempfile
import arviz as az
import pandas as pd
import streamlit as st
import tensorflow as tf
import tensorflow_probability as tfp
from psutil import virtual_memory
from meridian import constants
from meridian.data import load
from meridian.model import model, spec, prior_distribution
from meridian.analysis import optimizer, visualizer, summarizer

warnings.filterwarnings('ignore')

In [None]:
# Check system RAM and the devices visible to TensorFlow
ram_gb = virtual_memory().total / 1e9
print(f'Your runtime has {ram_gb:.1f} gigabytes of total RAM\n')
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
print("Num CPUs Available: ", len(tf.config.list_physical_devices('CPU')))

### Map Data Columns to Meridian Inputs

We will use `load.CoordToColumns()` to map the relevant column names to the inputs required by the Meridian model. Next, we will use two dictionaries, `media_to_channel` and `media_spend_to_channel`, to map the impression and spend columns to their corresponding media channels.


We will then upload the [provided tutorial data](https://github.com/google/meridian/blob/main/meridian/data/simulated_data/csv/geo_all_channels.csv) to this Notebook using the "+" sign under the File browser, and use the Meridian-provided `load.DataFrameDataLoader()` to load the resulting pandas dataframe into the model.

In [None]:
# Define all the column mapping to Meridian data spec
coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
        'Channel4_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
        'Channel4_spend',
    ],
    organic_media=['Organic_channel0_impression'],
    non_media_treatments=['Promo']
)

media_to_channel = {
    'Channel0_impression': 'Channel0',  # Edited vs tutorial 
    'Channel1_impression': 'Channel1',  # Edited vs tutorial
    'Channel2_impression': 'Channel2',  # Edited vs tutorial
    'Channel3_impression': 'Channel3',  # Edited vs tutorial
    'Channel4_impression': 'Channel4',  # Edited vs tutorial
    'Organic_channel0_impression': 'Channel0'  # Edited vs tutorial
}

# organic_media_to_channel = {
#     'Organic_channel0_impression': 'Channel0'
# }

media_spend_to_channel = {
    'Channel0_spend': 'Channel0',  # Edited vs tutorial
    'Channel1_spend': 'Channel1',  # Edited vs tutorial
    'Channel2_spend': 'Channel2',  # Edited vs tutorial
    'Channel3_spend': 'Channel3',  # Edited vs tutorial
    'Channel4_spend': 'Channel4',  # Edited vs tutorial
}
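
The impression and spend columns above follow a regular naming pattern, so the two dictionaries could also be generated from a list of channel names. This is a small sketch under that assumption; the `generated_` names are hypothetical and chosen so they do not overwrite the dictionaries defined above. The organic column that was added by hand is left out here.

```python
# Channel names as they appear in the tutorial data.
channels = [f"Channel{i}" for i in range(5)]

# Map '<channel>_impression' and '<channel>_spend' columns to the channel name.
generated_media_to_channel = {f"{name}_impression": name for name in channels}
generated_media_spend_to_channel = {f"{name}_spend": name for name in channels}

print(generated_media_to_channel)
print(generated_media_spend_to_channel)
```

Generating the mappings keeps the two dictionaries in sync when channels are added or renamed.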

In [None]:
# Load data from CSV (provided in tutorial). Upload it to the Notebook file browser on the left-hand side using the '+' symbol.
df = pd.read_csv('geo_all_channels.csv', index_col=0)
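
A mismatch between the CSV header and the names used in the column mapping is an easy mistake to make. The sketch below shows one way to check for it before building the data loader. `missing_columns` is a hypothetical helper, and the two lists are shortened examples; in the notebook, `available` would be `list(df.columns)` and `required` would be the column names passed to `load.CoordToColumns()`.

```python
def missing_columns(available, required):
    """Return the required column names that are not in `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Shortened, illustrative lists (not the full tutorial schema).
available = ["geo", "time", "population", "conversions", "Channel0_impression"]
required = ["geo", "time", "population", "conversions", "revenue_per_conversion"]

print(missing_columns(available, required))
```

An empty list means every mapped column is present in the data.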

In [None]:
# Load from dataframe
data_loader = load.DataFrameDataLoader(
    df=df,
    coord_to_columns=coord_to_columns,
    kpi_type='non_revenue',
    media_to_channel=media_to_channel,
    media_spend_to_channel=media_spend_to_channel,
    # organic_media_to_channel=organic_media_to_channel,  # Doesn't seem to be implemented?
)

data = data_loader.load()

### Create Model

In [None]:
# Define your prior distribution - either as a single-value or by-channel

# Mu and Sigma for all channels, single-value
roi_mu = 0.2     # Mu for ROI prior for each media channel.
roi_sigma = 0.9  # Sigma for ROI prior for each media channel.

# Mu and Sigma for each channel. Warning: this may not converge!
# roi_mu    = [0.2, 0.3, 0.4, 0.3, 0.3]
# roi_sigma = [0.7, 0.9, 0.6, 0.7, 0.6]

prior = prior_distribution.PriorDistribution(
    roi_m=tfp.distributions.LogNormal(roi_mu, roi_sigma, name=constants.ROI_M)
)
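
The `roi_mu` and `roi_sigma` values are parameters of the underlying normal distribution, not the ROI itself, so it can be useful to translate them into ROI terms. The sketch below uses the standard log-normal formulas (median `exp(mu)`, mean `exp(mu + sigma^2 / 2)`) with the single-value settings from the cell above. It only uses the Python standard library and does not call Meridian.

```python
import math
from statistics import NormalDist

roi_mu = 0.2
roi_sigma = 0.9

# Standard log-normal summary statistics.
median = math.exp(roi_mu)
mean = math.exp(roi_mu + roi_sigma ** 2 / 2)

# Quantile q of a log-normal is exp(mu + sigma * z_q), where z_q is the
# matching standard normal quantile.
standard_normal = NormalDist()
q05 = math.exp(roi_mu + roi_sigma * standard_normal.inv_cdf(0.05))
q95 = math.exp(roi_mu + roi_sigma * standard_normal.inv_cdf(0.95))

print(f"median={median:.2f}, mean={mean:.2f}, 90% interval=({q05:.2f}, {q95:.2f})")
```

With these settings the prior is centered on a median ROI of about 1.22, with a long right tail.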

In [None]:
# Define your model specifications. Full set of options is commented out below.

# Simple Model Spec
model_spec = spec.ModelSpec(prior=prior)

# Full Model Spec
# More details here: https://developers.google.com/meridian/docs/user-guide/configure-model
# model_spec = spec.ModelSpec(
#     prior=prior,
#     media_effects_dist='log_normal',
#     hill_before_adstock=False,
#     max_lag=8,
#     unique_sigma_for_each_geo=False,
#     paid_media_prior_type='roi',
#     roi_calibration_period=None,
#     rf_roi_calibration_period=None,
#     knots=None,  # 1=No Seasonality adjustment
#     baseline_geo=None,
#     holdout_id=None,
#     control_population_scaling_id=None,
# )

In [None]:
mmm = model.Meridian(input_data=data, model_spec=model_spec)
mmm.sample_prior(500)
mmm.sample_posterior(n_chains=7, n_adapt=500, n_burnin=500, n_keep=1000)

### Review Model Diagnostics


In [None]:
model_diagnostics = visualizer.ModelDiagnostics(mmm)
rhat_chart = model_diagnostics.plot_rhat_boxplot()
rhat_chart['width'] = 800
rhat_chart
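
For readers new to R-hat, the sketch below computes the classic (non-split) Gelman-Rubin statistic for a single scalar parameter. It is an illustration of the idea only: Meridian and ArviZ may use a different variant (for example rank-normalized split R-hat), and `gelman_rubin_rhat` is a hypothetical helper, not a library function.

```python
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for one scalar parameter.

    `chains` is a list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # W: mean within-chain variance
    between = n * variance(chain_means)                   # B: between-chain variance
    pooled = (n - 1) / n * within + between / n           # pooled variance estimate
    return (pooled / within) ** 0.5


agreeing = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 1.0]]
disagreeing = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]

print(gelman_rubin_rhat(agreeing))
print(gelman_rubin_rhat(disagreeing))
```

Chains that explore the same region give a value near or below 1, while chains stuck in different regions give a value well above 1, which is the pattern to look for in the boxplot.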

In [None]:
parameters_to_plot = ["roi_m"]
for param in parameters_to_plot:
    az.plot_trace(
        mmm.inference_data,
        var_names=param,
        compact=False,
        backend_kwargs={"constrained_layout": True},
    )

In [None]:
model_diagnostics.plot_prior_and_posterior_distribution()

In [None]:
model_diagnostics.predictive_accuracy_table()
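
As a reference for reading the accuracy table, the sketch below computes three common fit metrics by hand: R-squared, MAPE, and weighted MAPE. It assumes the table reports metrics of this kind; the exact definitions Meridian uses may differ, and the numbers here are made up for illustration.

```python
def r_squared(actual, predicted):
    """Share of variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error (every period weighted equally)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def wmape(actual, predicted):
    """Weighted MAPE: total absolute error divided by total actual."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(abs(a) for a in actual)


# Illustrative values only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]

print(r_squared(actual, predicted), mape(actual, predicted), wmape(actual, predicted))
```

The weighted version is less sensitive to large percentage errors in periods with a small KPI, which is why the two MAPE values can differ noticeably.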

In [None]:
model_fit = visualizer.ModelFit(mmm)
fit_chart = model_fit.plot_model_fit()
fit_chart['width'] = 800
fit_chart

In [None]:
model_fit.plot_model_fit(
    n_top_largest_geos=2,
    show_geo_level=True,
    include_baseline=False,
    include_ci=False
)

### Review Model Summaries

In [None]:
mmm_summarizer = summarizer.Summarizer(mmm)

start_date = '2021-01-25'
end_date = '2024-01-15'

# Create a temporary directory to save the resulting output files.
tmpdir = tempfile.mkdtemp()

In [None]:
mmm_summarizer.output_model_results_summary('summary_output.html', tmpdir, start_date, end_date)

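# Export to file. Read the HTML first so the file handle is closed before the button is rendered.
with open(tmpdir + '/summary_output.html', 'r') as f:
    summary_html = f.read()
st.download_button('download', summary_html, 'summary_output.html', mime="text/html")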

In [None]:
media_summary = visualizer.MediaSummary(mmm)
media_summary.summary_table()

In [None]:
# Other Available Plots and Tables (just run an individual line and comment all others)
# media_summary.plot_contribution_waterfall_chart()
# media_summary.plot_contribution_pie_chart()
# media_summary.plot_spend_vs_contribution()
# media_summary.plot_roi_bar_chart()
# media_summary.plot_roi_bar_chart(include_ci=False)
# media_summary.plot_cpik()
# media_summary.plot_roi_vs_effectiveness()
# media_summary.plot_roi_vs_effectiveness(disable_size=True)
# media_summary.plot_roi_vs_mroi()
media_summary.plot_roi_vs_mroi(selected_channels=["Channel1", "Channel4"], equal_axes=True)

In [None]:
# Plot media effects (incremental outcome vs spend)
media_effects = visualizer.MediaEffects(mmm)

# Plot response curves with the default settings
media_effects.plot_response_curves()

# Plot one combined chart showing only the top channel
media_effects.plot_response_curves(plot_separately=False, num_channels_displayed=1)

# Plot all channels on one combined chart, without credible intervals
media_effects.plot_response_curves(plot_separately=False, include_ci=False)

In [None]:
media_effects.plot_adstock_decay()

In [None]:
media_effects.plot_hill_curves()
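
The adstock and Hill plots describe two transformations applied to media: carry-over of past exposure and diminishing returns. The sketch below gives simplified scalar versions, assuming geometric decay normalized over the lag window and the standard Hill form. The function and parameter names (`alpha`, `ec`, `slope`) are illustrative; Meridian's implementation works on tensors with per-channel parameters and may differ in detail.

```python
def geometric_adstock(media, alpha, max_lag):
    """Spread each period's media over later periods with weights alpha**lag."""
    weights = [alpha ** lag for lag in range(max_lag + 1)]
    total = sum(weights)
    adstocked = []
    for t in range(len(media)):
        carried = 0.0
        for lag, weight in enumerate(weights):
            if t - lag >= 0:
                carried += weight * media[t - lag]
        adstocked.append(carried / total)  # normalize so the weights sum to 1
    return adstocked


def hill(x, ec, slope):
    """Saturation curve that reaches 0.5 when x equals the half-saturation point ec."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / ec) ** (-slope))


# A single burst of media decays over the following two periods.
adstocked = geometric_adstock([1.0, 0.0, 0.0], alpha=0.5, max_lag=2)
print(adstocked)
print(hill(2.0, ec=2.0, slope=1.5))
```

A larger `alpha` means slower decay in the adstock plot, and a larger `ec` moves the point of diminishing returns further to the right in the Hill plot.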

In [None]:
# Model Reach & Frequency
# NOTE: WILL NOT RUN IN THIS DEMO AS WE DO NOT HAVE REACH AND FREQ IN OUR DATASET
reach_and_frequency = visualizer.ReachAndFrequency(mmm)
reach_and_frequency.plot_optimal_frequency()

### Run Simulations and Optimizations

In [None]:
budget_optimizer = optimizer.BudgetOptimizer(mmm)

# Optimize without constraints
optimization_results = budget_optimizer.optimize()

In [None]:
# Optimize with constraints - Fixed Budget
optimization_results = budget_optimizer.optimize(
    selected_times=('2023-01-16', '2024-01-15'),
    budget=70000000,
    pct_of_spend=[0.2, 0.2, 0.2, 0.1, 0.3],
    spend_constraint_lower=[0.3, 0.2, 0.3, 0.3, 0.3],
    spend_constraint_upper=[0.3, 0.2, 0.3, 0.3, 0.3],
)
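
The spend constraints in the fixed-budget call are easier to reason about as absolute ranges. The sketch below assumes, based on a reading of the optimizer documentation, that each constraint is a fraction of the channel's starting spend, so a channel may move between `(1 - lower)` and `(1 + upper)` times `budget * pct_of_spend`. Treat that interpretation as an assumption and confirm it against the Meridian docs for your version.

```python
budget = 70_000_000
pct_of_spend = [0.2, 0.2, 0.2, 0.1, 0.3]
spend_constraint_lower = [0.3, 0.2, 0.3, 0.3, 0.3]
spend_constraint_upper = [0.3, 0.2, 0.3, 0.3, 0.3]

# Assumed rule: bounds are relative to each channel's starting allocation.
bounds = []
for share, lower, upper in zip(pct_of_spend, spend_constraint_lower, spend_constraint_upper):
    starting_spend = budget * share
    bounds.append((starting_spend * (1 - lower), starting_spend * (1 + upper)))

for i, (low, high) in enumerate(bounds):
    print(f"Channel{i}: {low:,.0f} to {high:,.0f}")
```

Under this reading, Channel0 starts at 14,000,000 and can be moved anywhere between 9,800,000 and 18,200,000.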

In [None]:
# Optimize with constraints - Target Minimum ROI
optimization_results = budget_optimizer.optimize(
    selected_times=('2023-01-16', '2024-01-15'),
    fixed_budget=False,
    spend_constraint_lower=0.5,
    spend_constraint_upper=0.5,
    target_roi=1,
)

In [None]:
# Optimize with constraints - Target Marginal ROI
optimization_results = budget_optimizer.optimize(
    selected_times=('2023-01-16', '2024-01-15'),
    fixed_budget=False,
    spend_constraint_lower=0.5,
    spend_constraint_upper=0.5,
    target_mroi=1,
)
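
The two flexible-budget scenarios differ in which return they target: ROI is the average return on all spend, while marginal ROI is the return on the next unit of spend. The sketch below illustrates the difference on a hypothetical saturating response curve. The curve, its parameters, and the 1% increment are all assumptions made for this example and are not taken from Meridian.

```python
def response(spend, beta=1_000_000.0, half_saturation=500_000.0):
    """Hypothetical concave response curve: incremental outcome for a given spend."""
    return beta * spend / (spend + half_saturation)


def roi(spend):
    """Average return: total incremental outcome per unit of spend."""
    return response(spend) / spend


def marginal_roi(spend, increase=0.01):
    """Return on a small additional slice of spend (finite difference)."""
    delta = spend * increase
    return (response(spend + delta) - response(spend)) / delta


spend = 250_000.0
print(f"ROI={roi(spend):.2f}, marginal ROI={marginal_roi(spend):.2f}")
```

On a curve with diminishing returns the marginal ROI is always below the average ROI, so a marginal ROI target of 1 will generally stop adding budget earlier than an ROI target of 1.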

In [None]:
# Other available plots and tables. Only the last expression in a cell is displayed, so run one line at a time and comment out the others.
optimization_results.plot_spend_delta()
optimization_results.plot_incremental_impact_delta()
optimization_results.plot_budget_allocation()
optimization_results.plot_response_curves()


In [None]:
optimization_results.output_optimization_summary('optimization_output.html', tmpdir)

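# Read the HTML first so the file handle is closed before the button is rendered.
with open(tmpdir + '/optimization_output.html', 'r') as f:
    optimization_html = f.read()
st.download_button('download', optimization_html, 'optimization_output.html', mime="text/html")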

### Advanced Diagnostics

In [None]:
optimization_results.nonoptimized_data

In [None]:
optimization_results.optimized_data

### Save Model For Later

In [None]:
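# Note: tmpdir is a temporary directory. Copy the file to persistent storage if you need it after the session ends.
file_path = tmpdir + '/saved_mmm.pkl'
model.save_mmm(mmm, file_path)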

In [None]:
mmm = model.load_mmm(file_path)