# Robyn: Marketing Mix Modeling Application

This notebook demonstrates how to use Robyn, a Marketing Mix Modeling (MMM) application.
We'll go through the main steps: preparing the inputs (`robyn_inputs`), feature engineering (`robyn_engineering`), and training the model.



## 1. Import Required Libraries. Define Paths.

First, set up a virtual environment and make sure this notebook uses it as its kernel:

- ```cd {root_folder}```
- ```python3 -m venv yourvenv```
- ```source yourvenv/bin/activate```
- ```cd Robyn/python```
- ```pip install -r requirements.txt```


Then import the necessary libraries. Make sure to define your paths below.



In [1]:
import sys

# Add Robyn to the import path (replace with the location of your local Robyn/python/src)
sys.path.append("/Users/yijuilee/robynpy_release_reviews/Robyn/python/src")

In [2]:
import os
import pandas as pd
import pyreadr
from typing import Dict
from robyn.data.entities.mmmdata import MMMData
from robyn.data.entities.enums import AdstockType
from robyn.data.entities.holidays_data import HolidaysData
from robyn.data.entities.hyperparameters import Hyperparameters, ChannelHyperparameters
from robyn.modeling.entities.modelrun_trials_config import TrialsConfig
from robyn.modeling.model_executor import ModelExecutor
from robyn.modeling.entities.enums import NevergradAlgorithm, Models
from robyn.modeling.feature_engineering import FeatureEngineering



## 2.1 Load Mock R data

We need to set the base path for the data directory.
Create a `.env` file in the same directory as your notebook and define the path to the data directory in it,
for example: `ROBYN_BASE_PATH=.../Robyn/R/data`

Note that the cell below reads the CSV files from a relative `resources/` folder instead.

In [3]:
# Read the simulated data and holidays data
dt_simulated_weekly = pd.read_csv("resources/dt_simulated_weekly.csv")

dt_prophet_holidays = pd.read_csv("resources/dt_prophet_holidays.csv")
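
The cell above reads from a relative `resources/` folder. If you would rather drive the location from the `ROBYN_BASE_PATH` entry in `.env`, a minimal standard-library sketch could look like the following. `load_env_file` and `resolve_data_dir` are hypothetical helpers, not part of Robyn, and the parser only handles plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path


def load_env_file(env_path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    path = Path(env_path)
    if not path.exists():
        return values
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_data_dir(env_path: str = ".env", default: str = "resources") -> Path:
    """Prefer the process environment, then the .env file, then the default folder."""
    base = os.environ.get("ROBYN_BASE_PATH") or load_env_file(env_path).get("ROBYN_BASE_PATH")
    return Path(base) if base else Path(default)
```

With this in place, the load step could become `pd.read_csv(resolve_data_dir() / "dt_simulated_weekly.csv")`, assuming the CSV files live in that directory.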

## Setup MMM Data

We will now set up the MMM data specification which includes defining the dependent variable, independent variables, and the time window for analysis.

In [4]:
def setup_mmm_data(dt_simulated_weekly) -> MMMData:

    mmm_data_spec = MMMData.MMMDataSpec(
        dep_var="revenue",
        dep_var_type="revenue",
        date_var="DATE",
        context_vars=["competitor_sales_B", "events"],
        paid_media_spends=["tv_S", "ooh_S", "print_S", "facebook_S", "search_S"],
        paid_media_vars=["tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P"],
        organic_vars=["newsletter"],
        window_start="2016-01-01",
        window_end="2018-12-31",
    )

    return MMMData(data=dt_simulated_weekly, mmmdata_spec=mmm_data_spec)


mmm_data = setup_mmm_data(dt_simulated_weekly)
mmm_data.data.head()

| Unnamed: 0 | DATE | revenue | tv_S | ooh_S | print_S | facebook_I | search_clicks_P | search_S | competitor_sales_B | facebook_S | events | newsletter |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-11-23 | 2754372.0 | 22358.346667 | 0.0 | 12728.488889 | 24301280.0 | 0.0 | 0.0 | 8125009 | 7607.132915 | na | 19401.653846 |
| 1 | 2015-11-30 | 2584277.0 | 28613.453333 | 0.0 | 0.0 | 5527033.0 | 9837.238486 | 4133.333333 | 7901549 | 1141.95245 | na | 14791.0 |
| 2 | 2015-12-07 | 2547387.0 | 0.0 | 132278.4 | 453.866667 | 16651590.0 | 12044.119653 | 3786.666667 | 8300197 | 4256.375378 | na | 14544.0 |
| 3 | 2015-12-14 | 2875220.0 | 83450.306667 | 0.0 | 17680.0 | 10549770.0 | 12268.070319 | 4253.333333 | 8122883 | 2800.490677 | na | 2800.0 |
| 4 | 2015-12-21 | 2215953.0 | 0.0 | 277336.0 | 0.0 | 2934090.0 | 9467.248023 | 3613.333333 | 7105985 | 689.582605 | na | 15478.0 |
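
Before constructing `MMMData`, it can help to confirm that every column named in the spec actually exists in the data. The sketch below does that with plain Python; `missing_spec_columns` is a hypothetical helper (not a Robyn function), and the column list is copied from the output above.

```python
def missing_spec_columns(columns, spec):
    """Return spec column names that are absent from `columns`, in spec order."""
    available = set(columns)
    wanted = [spec["dep_var"], spec["date_var"]]
    for key in ("context_vars", "paid_media_spends", "paid_media_vars", "organic_vars"):
        wanted.extend(spec.get(key, []))
    # dict.fromkeys drops duplicates (e.g. tv_S appears twice) but keeps order
    return [name for name in dict.fromkeys(wanted) if name not in available]


columns = [
    "Unnamed: 0", "DATE", "revenue", "tv_S", "ooh_S", "print_S", "facebook_I",
    "search_clicks_P", "search_S", "competitor_sales_B", "facebook_S", "events", "newsletter",
]
spec = {
    "dep_var": "revenue",
    "date_var": "DATE",
    "context_vars": ["competitor_sales_B", "events"],
    "paid_media_spends": ["tv_S", "ooh_S", "print_S", "facebook_S", "search_S"],
    "paid_media_vars": ["tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P"],
    "organic_vars": ["newsletter"],
}

missing = missing_spec_columns(columns, spec)
print(missing)  # [] for the simulated dataset shown above
```

With the real DataFrame, `dt_simulated_weekly.columns` can be passed as `columns`.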


## Feature Preprocessing

We will perform feature engineering to prepare the data for modeling. This includes transformations like adstock and other preprocessing steps.

In [5]:
hyperparameters = Hyperparameters(
    {
        "facebook_S": ChannelHyperparameters(
            alphas=[0.5, 3],
            gammas=[0.3, 1],
            thetas=[0, 0.3],
        ),
        "print_S": ChannelHyperparameters(
            alphas=[0.5, 3],
            gammas=[0.3, 1],
            thetas=[0.1, 0.4],
        ),
        "tv_S": ChannelHyperparameters(
            alphas=[0.5, 3],
            gammas=[0.3, 1],
            thetas=[0.3, 0.8],
        ),
        "search_S": ChannelHyperparameters(
            alphas=[0.5, 3],
            gammas=[0.3, 1],
            thetas=[0, 0.3],
        ),
        "ooh_S": ChannelHyperparameters(
            alphas=[0.5, 3],
            gammas=[0.3, 1],
            thetas=[0.1, 0.4],
        ),
        "newsletter": ChannelHyperparameters(
            alphas=[0.5, 3],
            gammas=[0.3, 1],
            thetas=[0.1, 0.4],
        ),
    },
    adstock=AdstockType.GEOMETRIC,
    lambda_=0.0,
    train_size=[0.5, 0.8],
)

print("Hyperparameters setup complete.")

Hyperparameters setup complete.
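
For intuition about the ranges above: `thetas` control geometric adstock (the share of the previous period's effect that carries over), while `alphas` and `gammas` shape a Hill-type saturation curve. The following is a minimal pure-Python sketch of the commonly described forms, written for illustration only. It is not Robyn's internal implementation, and the function names are assumptions.

```python
def geometric_adstock(spend, theta):
    """Carry over a fraction `theta` of the previous adstocked value each period."""
    carried = 0.0
    adstocked = []
    for x in spend:
        carried = x + theta * carried
        adstocked.append(carried)
    return adstocked


def hill_saturation(values, alpha, gamma):
    """Hill curve; `gamma` places the inflexion point between min and max of `values`.

    Assumes `values` are non-negative and not all zero.
    """
    lo, hi = min(values), max(values)
    inflexion = lo * (1 - gamma) + hi * gamma
    return [v**alpha / (v**alpha + inflexion**alpha) for v in values]


print(geometric_adstock([100.0, 0.0, 0.0], theta=0.5))
# [100.0, 50.0, 25.0]
print(hill_saturation([0.0, 50.0, 100.0], alpha=2.0, gamma=0.5))
# [0.0, 0.5, 0.8]
```

A larger `theta` means slower decay (hence the higher range for `tv_S`); a value at the inflexion point always maps to 0.5.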


In [6]:
# Create HolidaysData object
holidays_data = HolidaysData(
    dt_holidays=dt_prophet_holidays,
    prophet_vars=["trend", "season", "holiday"],
    prophet_country="DE",
    prophet_signs=["default", "default", "default"],
)
# Setup FeaturizedMMMData
feature_engineering = FeatureEngineering(mmm_data, hyperparameters, holidays_data)

In [7]:
featurized_mmm_data = feature_engineering.perform_feature_engineering()

2024-11-13 03:52:10 - robyn.modeling.feature_engineering - INFO - Starting feature engineering process
2024-11-13 03:52:10 - robyn.modeling.feature_engineering - INFO - Starting Prophet decomposition
2024-11-13 03:52:10 - robyn.modeling.feature_engineering - INFO - Starting Prophet decomposition
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.holidays['ds'] = pd.to_datetime(self.holidays['ds'])
03:52:11 - cmdstanpy - INFO - Chain [1] start processing
2024-11-13 03:52:11 - cmdstanpy - INFO - Chain [1] start processing
03:52:11 - cmdstanpy - INFO - Chain [1] done processing
2024-11-13 03:52:11 - cmdstanpy - INFO - Chain [1] done processing
2024-11-13 03:52:11 - robyn.modeling.feature_engineering - INFO - Prophet decomposition complete
2024-11-13 03:52:11 - robyn.modelin

In [8]:
from robyn.visualization.feature_visualization import FeaturePlotter
import matplotlib.pyplot as plt

# Create a FeaturePlotter instance
feature_plotter = FeaturePlotter(mmm_data, hyperparameters)

# Plot spend-exposure relationship for each channel
for channel in mmm_data.mmmdata_spec.paid_media_spends:
    try:
        fig = feature_plotter.plot_spend_exposure(featurized_mmm_data, channel)
        plt.show()
    except ValueError as e:
        print(f"Skipping {channel}: {str(e)}")

2024-11-13 03:52:11 - robyn.visualization.feature_visualization - INFO - Initializing FeaturePlotter
2024-11-13 03:52:11 - robyn.visualization.feature_visualization - INFO - Generating spend-exposure plot for channel: tv_S
2024-11-13 03:52:11 - robyn.visualization.feature_visualization - ERROR - Channel tv_S not found in featurized data results
2024-11-13 03:52:11 - robyn.visualization.feature_visualization - ERROR - Failed to generate spend-exposure plot for channel tv_S: No spend-exposure data available for channel: tv_S
Traceback (most recent call last):
  File "/Users/yijuilee/robynpy_release_reviews/Robyn/python/src/robyn/visualization/feature_visualization.py", line 88, in plot_spend_exposure
    raise ValueError(f"No spend-exposure data available for channel: {channel}")
ValueError: No spend-exposure data available for channel: tv_S
2024-11-13 03:52:11 - robyn.visualization.feature_visualization - INFO - Generating spend-exposure plot for channel: ooh_S
2024-11-13 03:52:11 - rob

Skipping tv_S: No spend-exposure data available for channel: tv_S
Skipping ooh_S: No spend-exposure data available for channel: ooh_S
Skipping print_S: No spend-exposure data available for channel: print_S
Skipping facebook_S: No spend-exposure data available for channel: facebook_S
Skipping search_S: No spend-exposure data available for channel: search_S
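
The spec pairs each entry of `paid_media_spends` with the entry of `paid_media_vars` at the same position. A spend-exposure relationship is only meaningful where the two differ, so a quick check of which channels have a separate exposure metric can be sketched as below. The log above reports no data even for those channels, so this does not by itself explain the skipped plots.

```python
paid_media_spends = ["tv_S", "ooh_S", "print_S", "facebook_S", "search_S"]
paid_media_vars = ["tv_S", "ooh_S", "print_S", "facebook_I", "search_clicks_P"]

# Keep only channels whose exposure column differs from the spend column
exposure_channels = {
    spend: var
    for spend, var in zip(paid_media_spends, paid_media_vars)
    if spend != var
}
print(exposure_channels)
# {'facebook_S': 'facebook_I', 'search_S': 'search_clicks_P'}
```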


In [9]:
# Setup ModelExecutor
model_executor = ModelExecutor(
    mmmdata=mmm_data,
    holidays_data=holidays_data,
    hyperparameters=hyperparameters,
    calibration_input=None,  # Add calibration input if available
    featurized_mmm_data=featurized_mmm_data,
)

# Setup TrialsConfig
trials_config = TrialsConfig(iterations=2000, trials=5)  # 5 trials of 2000 iterations each; cores are set in model_run below

print(
    f">>> Starting {trials_config.trials} trials with {trials_config.iterations} iterations each using {NevergradAlgorithm.TWO_POINTS_DE.value} nevergrad algorithm on x cores..."
)

# Run the model

output_models = model_executor.model_run(
    trials_config=trials_config,
    ts_validation=False,  # changed from True to False to deactivate time-series validation
    add_penalty_factor=False,
    rssd_zero_penalty=True,
    cores=8,
    nevergrad_algo=NevergradAlgorithm.TWO_POINTS_DE,
    intercept=True,
    intercept_sign="non_negative",
    model_name=Models.RIDGE,
)
print("Model training complete.")

# TODO fix graph outputs

2024-11-13 03:52:11 - robyn.modeling.base_model_executor - INFO - Initializing BaseModelExecutor
2024-11-13 03:52:11 - robyn.modeling.model_executor - INFO - Starting model execution with model_name=Models.RIDGE
2024-11-13 03:52:11 - robyn.modeling.base_model_executor - INFO - Input validation successful
2024-11-13 03:52:11 - robyn.modeling.base_model_executor - INFO - Preparing hyperparameters
2024-11-13 03:52:11 - robyn.modeling.base_model_executor - INFO - Completed hyperparameter preparation with 19 parameters to optimize
2024-11-13 03:52:11 - robyn.modeling.model_executor - INFO - Initializing Ridge model builder
2024-11-13 03:52:11 - robyn.modeling.model_executor - INFO - Building models with configured parameters
2024-11-13 03:52:11 - robyn.modeling.ridge_model_builder - INFO - Collecting hyperparameters for optimization... {'prepared_hyperparameters': Hyperparameters(hyperparameters={'facebook_S': ChannelHyperparameters(thetas=[0, 0.3], shapes=None, scales=None, alphas=[0.5, 3]

>>> Starting 5 trials with 2000 iterations each using TwoPointsDE nevergrad algorithm on x cores...


Running trial 1 of total 5 trials: 100%|███████████████████████████████████
2024-11-13 03:54:06 - robyn.modeling.ridge_model_builder - INFO -  Finished in 1.92 mins
Running trial 2 of total 5 trials: 100%|███████████████████████████████████
2024-11-13 03:56:04 - robyn.modeling.ridge_model_builder - INFO -  Finished in 1.96 mins
Running trial 3 of total 5 trials: 100%|███████████████████████████████████
2024-11-13 03:58:03 - robyn.modeling.ridge_model_builder - INFO -  Finished in 1.95 mins
Running trial 4 of total 5 trials: 100%|███████████████████████████████████
2024-11-13 04:00:00 - robyn.modeling.ridge_model_builder - INFO -  Finished in 1.94 mins
Running trial 5 of total 5 trials: 100%|███████████████████████████████████
2024-11-13 04:01:57 - robyn.modeling.ridge_model_builder - INFO -  Finished in 1.94 mins
2024-11-13 04:01:58 - robyn.visualization.model_convergence_visualizer - INFO - Initialized ModelConvergenceVisualizer with n_cuts=20, nrmse_win=[0, 0.998]
2024-11-13 04:01:58

Model training complete.


In [10]:
# Assuming output_models.trials[0] is already an object from your model
trial = output_models.trials[0]


# Function to check if an object has a 'shape' attribute
def has_shape(obj):
    return hasattr(obj, "shape")


# Get all attribute names of the object and print their shapes if they have a 'shape' attribute
attribute_names = [attr for attr in dir(trial) if not callable(getattr(trial, attr)) and not attr.startswith("__")]
for attribute_name in attribute_names:
    attribute_value = getattr(trial, attribute_name)
    if has_shape(attribute_value):
        print(f"{attribute_name}: Shape = {attribute_value.shape}")
    else:
        print(f"{attribute_name}: No shape attribute, Type = {type(attribute_value).__name__}")

decomp_rssd: Shape = ()
decomp_spend_dist: Shape = (10000, 34)
elapsed: No shape attribute, Type = float
elapsed_accum: No shape attribute, Type = float
iter_ng: No shape attribute, Type = int
iter_par: No shape attribute, Type = int
lambda_: No shape attribute, Type = float
lambda_hp: No shape attribute, Type = float
lambda_max: Shape = ()
lambda_min_ratio: No shape attribute, Type = float
lift_calibration: No shape attribute, Type = NoneType
mape: No shape attribute, Type = float
nrmse: Shape = ()
pos: No shape attribute, Type = bool
result_hyp_param: Shape = (2000, 34)
rsq_test: No shape attribute, Type = float
rsq_train: No shape attribute, Type = float
rsq_val: No shape attribute, Type = float
sol_id: No shape attribute, Type = str
train_size: No shape attribute, Type = float
trial: No shape attribute, Type = int
x_decomp_agg: Shape = (24000, 29)
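
The row counts above are consistent with one row per iteration per variable. The arithmetic below is an inference from the printed shapes and the configuration in this notebook, not something confirmed from the Robyn source.

```python
iterations = 2000  # from TrialsConfig(iterations=2000, trials=5)

paid_media_spends = ["tv_S", "ooh_S", "print_S", "facebook_S", "search_S"]
organic_vars = ["newsletter"]
context_vars = ["competitor_sales_B", "events"]
prophet_vars = ["trend", "season", "holiday"]

# Assumption: one decomposition row per predictor, plus one for the intercept
n_predictors = (
    1 + len(prophet_vars) + len(context_vars) + len(paid_media_spends) + len(organic_vars)
)

print(iterations * len(paid_media_spends))  # 10000, the row count of decomp_spend_dist
print(iterations * n_predictors)            # 24000, the row count of x_decomp_agg
```

`result_hyp_param` has 2000 rows, which matches one row per iteration.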


In [11]:
# Assuming output_models.trials[0] is already an object from your model
trial = output_models.trials[0]


# Function to check if an object has a 'shape' attribute
def has_shape(obj):
    return hasattr(obj, "shape")


# Get all attribute names of the object and print their shapes if they have a 'shape' attribute
attribute_names = [attr for attr in dir(trial) if not callable(getattr(trial, attr)) and not attr.startswith("__")]
for attribute_name in attribute_names:
    attribute_value = getattr(trial, attribute_name)
    if has_shape(attribute_value):
        print(f"{attribute_name}: Shape = {attribute_value.shape}")
        # Check if the attribute is a multi-dimensional array with more than one column
        if len(attribute_value.shape) > 1 and attribute_value.shape[1] > 1:
            try:
                # Attempt to print column names if it's a structured array or DataFrame
                columns = (
                    attribute_value.columns if hasattr(attribute_value, "columns") else attribute_value.dtype.names
                )
                print(f"  Columns: {columns}")
            except AttributeError:
                print("  No column names available.")
    else:
        print(f"{attribute_name}: No shape attribute, Type = {type(attribute_value).__name__}")

decomp_rssd: Shape = ()
decomp_spend_dist: Shape = (10000, 34)
  Columns: Index(['rn', 'coef', 'xDecompAgg', 'xDecompPerc', 'xDecompMeanNon0',
       'xDecompMeanNon0Perc', 'xDecompAggRF', 'xDecompPercRF',
       'xDecompMeanNon0RF', 'xDecompMeanNon0PercRF', 'pos', 'mean_spend',
       'total_spend', 'spend_share', 'spend_share_refresh', 'effect_share',
       'effect_share_refresh', 'rsq_train', 'rsq_val', 'rsq_test',
       'nrmse_train', 'nrmse_val', 'nrmse_test', 'nrmse', 'decomp.rssd',
       'mape', 'lambda', 'lambda_hp', 'lambda_max', 'lambda_min_ratio',
       'solID', 'trial', 'iterNG', 'iterPar'],
      dtype='object')
elapsed: No shape attribute, Type = float
elapsed_accum: No shape attribute, Type = float
iter_ng: No shape attribute, Type = int
iter_par: No shape attribute, Type = int
lambda_: No shape attribute, Type = float
lambda_hp: No shape attribute, Type = float
lambda_max: Shape = ()
lambda_min_ratio: No shape attribute, Type = float
lift_calibration: No shape attri

In [12]:
best_model_id = output_models.select_id
print(f"Best model ID: {best_model_id}")

Best model ID: 2_1962_1
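
The ID appears to follow a `trial_iterNG_iterPar` pattern, matching the `trial`, `iterNG`, and `iterPar` columns listed in the earlier output. That reading is an assumption based on the naming convention, and `parse_sol_id` below is a hypothetical helper for splitting such an ID.

```python
from typing import NamedTuple


class SolutionId(NamedTuple):
    trial: int
    iter_ng: int
    iter_par: int


def parse_sol_id(sol_id: str) -> SolutionId:
    """Split an ID such as '2_1962_1' into its three integer parts."""
    trial, iter_ng, iter_par = (int(part) for part in sol_id.split("_"))
    return SolutionId(trial, iter_ng, iter_par)


print(parse_sol_id("2_1962_1"))
# SolutionId(trial=2, iter_ng=1962, iter_par=1)
```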


In [13]:
from IPython.display import Image, display
import base64
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd


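# 1. Display the MOO Distribution Plot
# The key can exist with a value of None; passing None to b64decode raises
# "TypeError: argument should be a bytes-like object or ASCII string, not 'NoneType'",
# so check the value as well as the key.
if "moo_distrb_plot" in output_models.convergence:
    moo_distrb_plot = output_models.convergence["moo_distrb_plot"]
    if moo_distrb_plot is not None:
        display(Image(data=base64.b64decode(moo_distrb_plot)))
    else:
        print("moo_distrb_plot is None; nothing to display.")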

In [None]:
# 2. Display the MOO Cloud Plot (skip when the stored value is None)
if "moo_cloud_plot" in output_models.convergence:
    moo_cloud_plot = output_models.convergence["moo_cloud_plot"]
    if moo_cloud_plot is not None:
        display(Image(data=base64.b64decode(moo_cloud_plot)))

In [None]:
# 3. Print convergence messages
if "conv_msg" in output_models.convergence:
    for msg in output_models.convergence["conv_msg"]:
        print(msg)

In [None]:
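# 4. Display time series validation and convergence plots
# (skip when the stored value is None, which may be the case when ts_validation=False)
if "ts_validation_plot" in output_models.convergence:
    ts_validation_plot = output_models.convergence["ts_validation_plot"]
    if ts_validation_plot is not None:
        display(Image(data=base64.b64decode(ts_validation_plot)))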
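
The plot cells in this section share one pattern: look up a key, base64-decode the value, and display it. A small helper can centralize the check for a missing key or a `None` value. This is a sketch that assumes `output_models.convergence` behaves like a dict; `decode_plot` is a hypothetical name, not a Robyn function.

```python
import base64
from typing import Mapping, Optional


def decode_plot(convergence: Mapping, key: str) -> Optional[bytes]:
    """Return decoded image bytes, or None when the key is absent or its value is None."""
    encoded = convergence.get(key)
    if encoded is None:
        return None
    return base64.b64decode(encoded)


# Usage inside the notebook (assumes IPython is available):
# image_bytes = decode_plot(output_models.convergence, "moo_cloud_plot")
# if image_bytes is not None:
#     display(Image(data=image_bytes))
```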

In [None]:
from utils.data_mapper import load_data_from_json, import_input_collect, import_output_models

# Load data from JSON exported from R
raw_input_collect = load_data_from_json(
    "/Users/yijuilee/project_robyn/original/Robyn_original_2/Robyn/robyn_api/data/Pareto_InputCollect.json"
)
raw_output_models = load_data_from_json(
    "/Users/yijuilee/project_robyn/original/Robyn_original_2/Robyn/robyn_api/data/Pareto_OutputModels.json"
)

# Convert R data to Python objects
r_input_collect = import_input_collect(raw_input_collect)
r_output_models = import_output_models(raw_output_models)

# Extract individual components
r_mmm_data = r_input_collect["mmm_data"]
r_featurized_mmm_data = r_input_collect["featurized_mmm_data"]
r_holidays_data = r_input_collect["holidays_data"]
r_hyperparameters = r_input_collect["hyperparameters"]