---
author: 
  - name: Matthew Reda
    email: redam94@gmail.com
license: "CC BY"
copyright: 
  holder: Matthew Reda
  year: 2024
citation: true
---

# The Illusion of Significance

> The Selection Challenge in Media Mix Modeling

## Introduction

Effective Media Mix Modeling (MMM) requires accurately controlling for numerous external factors (e.g., macroeconomic conditions, seasonality, trends, competitor actions) and determining appropriate functional forms for media effects (e.g., adstock, saturation) to isolate marketing impact and avoid omitted variable bias. Identifying these essential components frequently involves evaluating many potential factors and transformations against time-series data. A prevalent approach selects these components based on statistical significance tests (p-values, t-statistics from regression models, or correlation analysis) with the objective of identifying key sales drivers and constructing a reliable model.

Relying solely on statistical significance for this selection is problematic, however, especially with data collected over time: it can create an illusion of significance. Selecting model components based solely on these tests undermines the reliability and standard interpretation of p-values and confidence intervals for the components ultimately chosen.

This occurs because the selection process itself, when it involves conducting numerous tests on the same dataset, can inadvertently resemble problematic research practices such as **p-hacking** (analyzing data in multiple ways but only reporting results that appear significant) or **data snooping** (forming hypotheses after observing patterns in the data). Such practices violate the principles underlying standard statistical inference, particularly the assumption that hypotheses are defined before examining the data, which is essential for p-values to be interpreted correctly.

Analyzing data collected over time introduces additional complexities. For example, underlying trends (a property called non-stationarity) can create misleading correlations between unrelated variables, making them appear connected when they are not (this is known as spurious regression). Similarly, patterns that repeat over time (autocorrelation) or regular seasonal effects, if not properly addressed in the model, can invalidate the results of standard statistical tests used during the selection process, for instance, by making statistical errors appear smaller than they truly are.

Consequently, the p-values reported for selected model components are often smaller (and so appear more significant) than a valid analysis would support. This can lead to overconfidence in the findings and incorrect conclusions about what influences sales. Post-selection p-values do not reflect the true probability of observing such a strong relationship by chance, and this remains the case even if the time-series characteristics of the data were perfectly managed.

Focusing solely on statistical significance also overlooks another important consideration: variables needed for statistical control (to account for confounding between marketing and external factors) might not themselves appear statistically significant, or might have estimated effects close to zero. Removing such variables simply because they lack statistical significance can be detrimental. Their inclusion may still be essential for obtaining more accurate and less biased estimates of marketing effectiveness, because their primary role is statistical adjustment rather than having a large, directly measurable impact of their own.

::: {.callout-warning collapse="true"}
## Consequences of this Approach
Media mix optimization and strategic media planning based on models with mis-specified factors or biased coefficients can lead to inefficient budget allocation and suboptimal marketing performance. Moreover, if confounded macroeconomic factors shift, recommendations derived from such flawed models may prove particularly unreliable, as the model might misinterpret the impact of these external changes.
:::

::: {.callout-tip collapse="false"}
## Analogy
Imagine interviewing thousands of people to understand typical financial habits. If you only seek out and report on the experiences of those who recently won a major lottery, your conclusions about 'typical' financial success would be extremely misleading. You selected them precisely because they had an exceptionally rare, positive outcome (winning the lottery). Similarly, when numerous statistical tests are performed on potential model components (factors or transformations), finding one that meets a 'significance' threshold (like winning the lottery) is more likely simply because many tests were run. The resulting p-value loses its standard meaning because the selection process focused on finding an extreme or 'lucky' result from many possibilities, rather than reflecting the probability of such a finding under normal circumstances.
:::



## Core Statistical Challenges in MMM Selection
The selection of factors and transformations in MMM faces a confluence of statistical difficulties:

- **Multiple Testing Problem:** Simultaneously testing numerous potential external factors and functional forms for media variables drastically increases the probability of false positives (Type I errors). A per-test threshold such as p < 0.05 controls the error rate of each individual test, not the overall error rate across the full set of tests.

- **Post-Selection Inference (PSI) Invalidity:** Standard statistical inference (p-values, confidence intervals) lacks theoretical validity when applied after model components have been selected using the same data. These post-selection metrics are often biased (e.g., p-values too small, confidence intervals too narrow), failing to maintain their nominal error rates or coverage probabilities.

- **Time-Series Complications:** MMM data's time-series nature introduces further challenges:

    - *Autocorrelation:* Residuals are often correlated over time, violating standard regression assumptions and rendering OLS p-values inaccurate.

    - *Non-stationarity:* Trends in variables can lead to spurious regression, that is, finding statistically significant relationships where no causal relationship exists.

    - *Seasonality:* Unaccounted-for seasonal patterns can induce spurious correlations.

    All three properties can invalidate the significance tests used during selection, compounding the problem.

### Multiple Testing Problem
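
As a rough illustration of the scale of the problem, assume that all $m$ candidate components are truly irrelevant and that their tests are independent (a simplification; real MMM candidates are usually correlated). The probability of at least one false positive at per-test level $\alpha$ is then $1 - (1 - \alpha)^m$. The sketch below evaluates this quantity and the corresponding Bonferroni threshold $\alpha / m$. The function names are illustrative and not part of any library.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """P(at least one false positive) across independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


alpha = 0.05
for n_tests in (1, 10, 50, 200):
    fwer = familywise_error_rate(alpha, n_tests)
    threshold = bonferroni_threshold(alpha, n_tests)
    print(f"{n_tests:>4} tests: FWER = {fwer:.3f}, Bonferroni threshold = {threshold:.5f}")
```

Under these assumptions, 10 candidates already give roughly a 40% chance of at least one false positive, and 50 candidates push it above 90%. The Bonferroni correction is shown only as the simplest reference point; it tends to be conservative when candidates are correlated.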

### Post-Selection Inference (PSI) Invalidity
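
The following simulation is a minimal sketch of why a p-value reported after selection cannot be read at face value. It assumes 104 weekly observations of pure-noise "sales", 20 pure-noise candidate drivers, and a normal approximation to the t distribution for the slope test. All names and settings are hypothetical choices made for illustration. In each simulated dataset the candidate with the smallest p-value is "selected", and the code records how often that selected candidate looks significant at the 5% level.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope_p_value(x, y):
    """Two-sided p-value for the slope in a simple regression of y on x.

    Uses the correlation t-statistic with a normal approximation,
    which is close to the t distribution for around 100 observations.
    """
    n = len(x)
    r = pearson_r(x, y)
    t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def selected_rejection_rate(n_obs=104, n_candidates=20, n_sims=300, alpha=0.05, seed=0):
    """Share of simulations in which the best-looking noise candidate has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sales = [rng.gauss(0, 1) for _ in range(n_obs)]
        best_p = min(
            slope_p_value([rng.gauss(0, 1) for _ in range(n_obs)], sales)
            for _ in range(n_candidates)
        )
        if best_p < alpha:
            hits += 1
    return hits / n_sims


rate = selected_rejection_rate()
print(f"Selected candidate is 'significant' in {rate:.0%} of simulations (nominal 5%)")
```

Every candidate here is unrelated to sales by construction, so a valid 5% test should reject about 5% of the time. Because the reported p-value belongs to the winner of a search, the observed rate should land far above that nominal level.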

### Time-Series Complications
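
Spurious regression can be reproduced with a few lines of simulation. The sketch below assumes two independent Gaussian random walks of 104 observations each, regresses one on the other, and applies the usual slope test (with a normal approximation to the t distribution). It then repeats the test on first differences. Function names and settings are illustrative assumptions, not a prescribed workflow.

```python
import math
import random
from itertools import accumulate
from statistics import NormalDist


def slope_p_value(x, y):
    """Two-sided p-value for the slope of y on x (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t_stat = r * math.sqrt((n - 2) / (1.0 - r * r))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


def random_walk(rng, n_obs):
    """Cumulative sum of independent standard normal shocks."""
    return list(accumulate(rng.gauss(0, 1) for _ in range(n_obs)))


def first_difference(series):
    """Period-over-period changes of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def rejection_rates(n_obs=104, n_sims=500, alpha=0.05, seed=0):
    """Rejection rates of the slope test in levels and in first differences."""
    rng = random.Random(seed)
    levels_hits = diff_hits = 0
    for _ in range(n_sims):
        x, y = random_walk(rng, n_obs), random_walk(rng, n_obs)
        if slope_p_value(x, y) < alpha:
            levels_hits += 1
        if slope_p_value(first_difference(x), first_difference(y)) < alpha:
            diff_hits += 1
    return levels_hits / n_sims, diff_hits / n_sims


levels_rate, diff_rate = rejection_rates()
print(f"Levels: {levels_rate:.0%} rejected; first differences: {diff_rate:.0%} rejected")
```

The two series are unrelated by construction, yet the test in levels typically rejects in well over half of the simulations, while the test on differenced series stays close to the nominal 5%. Differencing is used here only because it is the simplest remedy for a pure random walk; it is not necessarily the right treatment for every MMM variable.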

## Common Problematic Selection Practices in MMM
Several widely used methodologies are particularly susceptible to the challenges outlined above:

- **Stepwise Regression:** Automated procedures (forward, backward, bidirectional) iteratively add/remove variables based on significance thresholds. These are highly prone to multiple testing errors and instability, especially with autocorrelated or non-stationary time-series data, potentially selecting variables based on spurious correlations or invalid tests.

- **Significance-Based Transformation Selection:** Testing numerous functional forms (e.g., hundreds of adstock decays or saturation curves) for each media variable and selecting the one yielding the "most significant" coefficient represents a severe form of multiple testing, risking the selection of forms that fit noise.

- **Correlation Screening / Univariate Tests:** Selecting external factors based on simple correlations or individual significance tests ignores multicollinearity and is highly vulnerable to spurious findings driven by trends or seasonality, while also failing to account for multiple comparisons.

- **Best Subset Selection:** While potentially better if using information criteria (AIC/BIC), evaluating numerous model subsets still incurs a multiple testing burden and remains vulnerable to underlying time-series issues if not properly addressed within the model structure.

## Consequences of Flawed Selection in MMM
Employing naive selection methods and subsequent inference leads to significant practical problems:

- **Misattribution of ROI & Effects:** Biased estimates for marketing coefficients (ROI) and potentially inaccurate representations of media dynamics (adstock, saturation).

- **Flawed Budget Allocation:** Suboptimal marketing investment decisions stemming from unreliable ROI figures.

- **Poor Understanding of Business Drivers:** Incorrect identification of baseline factors (trends, seasonality, macroeconomics) and media response patterns.

- **Model Instability & Non-Reproducibility:** Selected factors and transformations may vary considerably with data updates, reducing model credibility.

- **Overfitting:** Models capture noise specific to the historical data, resulting in poor predictive performance for forecasting or simulations.

- **Misinterpretation of Control Factor Coefficients:** Attributing causal effects to the coefficients of baseline or control factors (e.g., macroeconomic variables, competitor activity) included in the model. These factors are typically observational and likely confounded themselves; their coefficients primarily reflect statistical association and adjustment needed to isolate media effects, not necessarily isolated causal impacts. This misinterpretation is related to the "Table 2 fallacy," where coefficients from a multivariable model are improperly treated as independent causal effects.

## Recommended Approaches and Considerations
Addressing these challenges requires more robust methodologies:

- **Rigorous Time-Series Handling:** Explicitly model or remove seasonality (e.g., dummies, Fourier terms, decomposition); test for and address non-stationarity (e.g., differencing); incorporate theoretically sound lags for media (adstock) and potentially external variables.

- **Regularization Methods (LASSO, Ridge, Elastic Net):** Handle many predictors simultaneously through coefficient shrinkage. LASSO and Elastic Net additionally perform implicit variable selection by setting some coefficients exactly to zero, whereas Ridge only shrinks. These methods often yield more stable results than stepwise methods, but must be applied in conjunction with appropriate time-series structures.

- **Information Criteria (AIC, BIC):** Use for comparing candidate models, including non-nested ones, that are fit to the same data and correctly account for time-series properties, providing a more principled approach than p-value thresholds alone.

- **Time-Series Cross-Validation:** Employ methods like rolling-origin validation to assess out-of-sample predictive performance robustly.

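- **Bayesian Frameworks:** Represent uncertainty about parameters and model components with probability distributions, so that it carries through to the final estimates instead of being discarded by a hard selection step.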

    - *Priors on Functional Forms:* Incorporate prior knowledge or average over plausible media transformations (adstock/saturation) instead of hard selection.

    - *Sparsity-Inducing Priors (e.g., Regularized Horseshoe):* Provide principled variable selection for external factors by shrinking irrelevant coefficients while retaining influential ones, directly modeling inclusion uncertainty.

- **Causal Inference Techniques:** Explore advanced time-series methods if the primary goal is establishing causal links (use with caution).

- **Domain Knowledge & Theory:** Prioritize pre-selecting candidate factors and transformation ranges based on business logic, economic theory, and prior research. Validate final model components for plausibility and stability.
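
Three of the building blocks named in the list above can be sketched in plain Python: a media carryover transform, Fourier terms for seasonality, and rolling-origin splits for time-series cross-validation. The geometric form of adstock, the weekly period of 52, and all function names are assumptions made for illustration. A production model would typically rely on an established library instead of these helpers.

```python
import math


def geometric_adstock(spend, decay):
    """Carry a share `decay` of the previous adstocked value into each period."""
    carried, out = 0.0, []
    for x in spend:
        carried = x + decay * carried
        out.append(carried)
    return out


def fourier_terms(n_obs, period, order):
    """Rows of [sin, cos] pairs at harmonics k = 1..order of the seasonal period."""
    rows = []
    for t in range(n_obs):
        row = []
        for k in range(1, order + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


def rolling_origin_splits(n_obs, initial_train, horizon, step=1):
    """Yield (train, test) index ranges with an expanding training window.

    The test window always starts right after the training window ends,
    so no future observation leaks into model fitting.
    """
    origin = initial_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


print(geometric_adstock([100.0, 0.0, 0.0], decay=0.5))
print(len(fourier_terms(n_obs=104, period=52, order=2)[0]), "seasonal columns")
for train, test in rolling_origin_splits(n_obs=104, initial_train=78, horizon=13, step=13):
    print(f"train weeks 0-{train[-1]}, test weeks {test[0]}-{test[-1]}")
```

Fixing a small, theory-driven range for `decay` before looking at the data, and comparing candidate models on the held-out windows, keeps the choice of transformation from turning into another significance search.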

::: {.callout-important}
## Improper Use of Domain Knowledge
Exogenous factors in MMM are frequently confounded with media variables or other unobserved drivers. Consequently, the estimated coefficient for an exogenous variable may not represent its direct causal impact on the outcome but rather the statistical adjustment necessary to deconfound the estimated media effects. Rejecting such a variable because its coefficient sign contradicts simple causal expectations might inadvertently remove a necessary control variable, potentially leading to more biased estimates of media effectiveness.
:::
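
The cost of dropping a confounded control can be illustrated with simulated data. The sketch assumes a hypothetical `demand` factor that raises both media spend and sales, a true media effect of 1.0, and a deliberately large sample so that bias is not masked by noise. It does not reproduce a sign reversal; it only shows what happens to the media estimate when the control is removed. The adjusted estimate is obtained by regressing residualized sales on residualized media, which gives the same media coefficient as a regression that includes the control.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, x):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def media_effect_estimates(n_obs=2000, true_effect=1.0, seed=0):
    """Return (estimate without the control, estimate with the control)."""
    rng = random.Random(seed)
    demand = [rng.gauss(0, 1) for _ in range(n_obs)]
    # Media spend follows demand, which makes demand a confounder.
    media = [0.8 * d + rng.gauss(0, 1) for d in demand]
    sales = [true_effect * m + 2.0 * d + rng.gauss(0, 1) for m, d in zip(media, demand)]
    naive = slope(media, sales)
    adjusted = slope(residualize(media, demand), residualize(sales, demand))
    return naive, adjusted


naive, adjusted = media_effect_estimates()
print(f"Without control: {naive:.2f}; with control: {adjusted:.2f}; true effect: 1.00")
```

With these settings the estimate that omits the control absorbs part of the demand effect and lands well above the true value, while the adjusted estimate stays close to it.
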

Adopting these more rigorous approaches is fundamental to developing media mix models that are statistically sound, reliable, and strategically valuable.