# 08 — Executive Summary: Marketing Mix Modeling

This report summarizes the findings of a Bayesian Marketing Mix Model trained on DT Mart's 2015–2016 e-commerce data.

In [None]:
# Support code — data and model setup
import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.special import expit

warnings.filterwarnings("ignore")
%matplotlib inline

from pymc_marketing.mmm import MMM

from mmm_demo.config import OUTPUTS_DIR, ModelConfig
from mmm_demo.data import load_mmm_weekly_data
from mmm_demo.diagnostics import check_convergence

# Load data
df = load_mmm_weekly_data()
config = ModelConfig()
feature_cols = [config.date_column, *config.channel_columns, *config.control_columns]
x = df[feature_cols]
y = df[config.target_column]

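# Load the most recent saved model (assumes filenames sort chronologically,
# e.g. via a timestamp suffix, so the last sorted entry is the newest)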
model_dir = OUTPUTS_DIR / "models"
model_files = sorted(model_dir.glob("mmm_fit_*.nc"))
if not model_files:
    raise FileNotFoundError("No saved model found. Run notebook 02 first.")
model_path = model_files[-1]
mmm = MMM.load(str(model_path))
idata = mmm.idata

# Key summary numbers
date_min = df[config.date_column].min()
date_max = df[config.date_column].max()
total_weeks = len(df)
total_gmv = y.sum()
total_spend = df[config.channel_columns].sum().sum()
n_channels = len(config.channel_columns)

# Convergence check
diag = check_convergence(idata)

print(f"Model loaded: {model_path.name}")
print(
    f"Convergence: {'PASSED' if diag.passed else 'Did not fully converge — results are directional'}"
)

---
## 1. About This Analysis

- **What MMM does:** It separates total sales into the estimated contribution of each marketing channel, background growth, and other business drivers — giving a single consistent picture of what is driving revenue.
- **What data was used:** Weekly e-commerce sales (GMV) and marketing spend across four channels — TV, Sponsorship, Digital, and Online — covering July 2015 to June 2016.
- **What the model estimates:** For every rupee spent in each channel, how much incremental GMV was generated on average, and how long that effect lasted after the campaign ended.

> **Important caveat:** This is a proof-of-concept using 12 months of historical data. Estimates are directional and should not be used for large budget decisions without additional validation.

---
## 2. Key Metrics at a Glance

In [None]:
media_to_gmv_ratio = total_spend / total_gmv * 100

metrics = [
    ("Analysis period", f"{date_min.strftime('%b %Y')} – {date_max.strftime('%b %Y')}"),
    ("Weeks of data", f"{total_weeks}"),
    ("Channels analyzed", ", ".join(config.channel_columns)),
    ("Total media spend (annual)", f"INR {total_spend:,.0f}"),
    ("Total GMV (annual)", f"INR {total_gmv:,.0f}"),
    ("Media spend as % of GMV", f"{media_to_gmv_ratio:.1f}%"),
]

print()
print("  " + "=" * 58)
print("  MARKETING MIX MODEL — KEY METRICS SUMMARY")
print("  " + "=" * 58)
for label, value in metrics:
    print(f"  {label:<30}  {value}")
print("  " + "=" * 58)
print()

---
## 3. What Drives Sales?

The chart below shows how total weekly GMV breaks down across its estimated sources.

In [None]:
contributions = mmm.compute_mean_contributions_over_time(original_scale=True)
avg = contributions.mean()
total_contrib = avg.sum()
shares = (avg / total_contrib * 100).sort_values(ascending=True)

# Clean display names
label_map = {
    "intercept": "Baseline / Organic",
    "TV": "TV",
    "Sponsorship": "Sponsorship",
    "Digital": "Digital",
    "Online": "Online",
    "NPS": "Customer Satisfaction (NPS)",
    "total_Discount": "Promotions & Discounts",
    "sale_days": "Sale Events",
}
shares.index = [label_map.get(i, i) for i in shares.index]

channel_labels = {"TV", "Sponsorship", "Digital", "Online"}
bar_colors = [
    "#2196F3"
    if lbl in channel_labels
    else "#4CAF50"
    if lbl == "Baseline / Organic"
    else "#9E9E9E"
    for lbl in shares.index
]

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.barh(
    shares.index, shares.values, color=bar_colors, edgecolor="white", height=0.6
)

for bar, val in zip(bars, shares.values, strict=False):
    ax.text(
        val + 0.4,
        bar.get_y() + bar.get_height() / 2,
        f"{val:.1f}%",
        va="center",
        fontsize=10,
        color="#333333",
    )

ax.set_xlabel("Estimated share of total GMV (%)", fontsize=11)
ax.set_title(
    "Estimated Contribution to Total GMV (weekly average)", fontsize=13, pad=14
)
ax.set_xlim(0, shares.max() + 10)
ax.spines[["top", "right"]].set_visible(False)

from matplotlib.patches import Patch

legend_handles = [
    Patch(facecolor="#4CAF50", label="Baseline / Organic"),
    Patch(facecolor="#2196F3", label="Marketing Channels"),
    Patch(facecolor="#9E9E9E", label="Other Business Drivers"),
]
ax.legend(handles=legend_handles, loc="lower right", fontsize=9)

plt.tight_layout()
plt.show()

**Key takeaways:**

- The largest share of GMV comes from **baseline / organic sources** — this is the sales floor the business would achieve even without any marketing spend.
- **Sponsorship** is the highest-contributing paid channel, reflecting its dominant share of the media budget.
- Marketing channels collectively explain a meaningful portion of GMV, but the credible intervals around each estimate are wide given only 12 months of data. Treat the ranking as directional, not definitive.

---
## 4. Where Is the Budget Going vs. Where It's Working?

This chart compares how budget is currently allocated with each channel's estimated contribution to GMV.

In [None]:
channel_cols = config.channel_columns

spend_share = df[channel_cols].mean() / df[channel_cols].mean().sum() * 100

channel_contrib_avg = avg[[c for c in avg.index if c in channel_cols]]
contrib_share = channel_contrib_avg / channel_contrib_avg.sum() * 100

comparison = pd.DataFrame(
    {
        "Budget share (%)": spend_share,
        "GMV contribution share (%)": contrib_share,
    }
).reindex(channel_cols)

x_pos = np.arange(len(channel_cols))
width = 0.35

fig, ax = plt.subplots(figsize=(9, 5))
bars1 = ax.bar(
    x_pos - width / 2,
    comparison["Budget share (%)"],
    width,
    label="Budget share",
    color="#78909C",
    edgecolor="white",
)
bars2 = ax.bar(
    x_pos + width / 2,
    comparison["GMV contribution share (%)"],
    width,
    label="GMV contribution share",
    color="#1565C0",
    edgecolor="white",
)

for bar in list(bars1) + list(bars2):
    ax.text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_height() + 0.5,
        f"{bar.get_height():.1f}%",
        ha="center",
        va="bottom",
        fontsize=9,
    )

ax.set_xticks(x_pos)
ax.set_xticklabels(channel_cols, fontsize=11)
ax.set_ylabel("Share (%)", fontsize=11)
ax.set_title(
    "Budget Allocation vs. Estimated GMV Contribution by Channel", fontsize=12, pad=12
)
ax.legend(fontsize=10)
ax.spines[["top", "right"]].set_visible(False)
ax.set_ylim(0, max(comparison.max().max() + 12, 60))

plt.tight_layout()
plt.show()

**Key takeaways:**

- **Sponsorship** absorbs nearly half the budget and also delivers the largest estimated contribution — broadly efficient at current levels.
- **Online and Digital** channels each receive a moderate share of budget; their relative contribution compared to spend is worth monitoring as more data accumulates.
- **TV** has the smallest budget share; its contribution estimate is directional only — the channel may warrant a structured spending test before drawing firm conclusions.
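
One way to make the comparison above easier to track over time is an efficiency index: each channel's GMV contribution share divided by its budget share. The sketch below is illustrative only. The function name `efficiency_index` and the channel names and share values are hypothetical placeholders, not model output. A value above 1 suggests a channel contributes more than its budget share, and a value below 1 suggests the opposite. This is an average-return measure, so it says nothing about what the next rupee would earn.

```python
def efficiency_index(
    budget_share: dict[str, float], contribution_share: dict[str, float]
) -> dict[str, float]:
    """Return contribution share divided by budget share for each channel."""
    return {
        channel: contribution_share[channel] / budget_share[channel]
        for channel in budget_share
    }


# Hypothetical shares in percent, for illustration only (not fitted values).
example_budget = {"channel_a": 50.0, "channel_b": 30.0, "channel_c": 20.0}
example_contrib = {"channel_a": 40.0, "channel_b": 30.0, "channel_c": 30.0}

index = efficiency_index(example_budget, example_contrib)
for channel, value in index.items():
    print(f"{channel}: {value:.2f}")
```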

---
## 5. How Long Do Campaigns Keep Working?

After a campaign ends, its influence on sales does not stop immediately. The table below shows how long each channel's effect is estimated to persist.

In [None]:
summary_alpha = az.summary(idata, var_names=["adstock_alpha"])

carryover_rows = []
for ch in channel_cols:
    param = f"adstock_alpha[{ch}]"
    alpha = summary_alpha.loc[param, "mean"]
    carryover_rows.append(
        {
            "Channel": ch,
            "Effect remaining after 1 week": f"{alpha * 100:.0f}%",
            "Effect remaining after 2 weeks": f"{alpha**2 * 100:.0f}%",
            "Effect remaining after 4 weeks": f"{alpha**4 * 100:.0f}%",
        }
    )

carryover_df = pd.DataFrame(carryover_rows).set_index("Channel")

print()
print(carryover_df.to_string())
print()

Each percentage is the share of a campaign's initial effect that is still present that many weeks later. Longer carryover means the investment has extended impact: pausing spend does not immediately eliminate its effect on GMV.

> These are model estimates based on 12 months of data. Industry benchmarks typically show TV with longer carryover (6–12 weeks) and digital with shorter carryover (1–3 weeks). Treat these figures as a starting point for internal alignment, not as precise measurements.
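
The percentages in the table follow from a geometric decay: if a fraction `alpha` of the effect carries over each week, then `alpha**k` remains after `k` weeks. The sketch below assumes that geometric form (as the `alpha**2` and `alpha**4` terms in the cell above imply) and adds a half-life, which is often easier to communicate. The helper names and the example decay rate are hypothetical, not fitted values.

```python
import math


def remaining_effect(alpha: float, weeks: int) -> float:
    """Fraction of a one-week effect still present `weeks` weeks later."""
    return alpha**weeks


def half_life_weeks(alpha: float) -> float:
    """Weeks until the carryover effect falls to half its initial level.

    Valid for 0 < alpha < 1.
    """
    return math.log(0.5) / math.log(alpha)


# Hypothetical decay rate, for illustration only (not a fitted value).
alpha_example = 0.5

for k in (1, 2, 4):
    print(f"after {k} week(s): {remaining_effect(alpha_example, k):.3f}")
print(f"half-life: {half_life_weeks(alpha_example):.1f} weeks")
```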

---
## 6. Directional Budget Recommendation

The table below shows how the same total budget could be redistributed in proportion to each channel's estimated marginal return at current spend levels. This is a simple heuristic for improving overall GMV impact, not a full budget optimization.

**Directional only — requires validation before acting.**

In [None]:
summary_lam = az.summary(idata, var_names=["saturation_lam"])
summary_beta = az.summary(idata, var_names=["saturation_beta"])

current_spend_mean = df[channel_cols].mean()
channel_max_spend = df[channel_cols].max()

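# Marginal return per rupee at the current (mean) spend level.
# Assumes logistic saturation f(x) = 2 * expit(lam * x) - 1 on max-scaled spend,
# so f'(x) = 2 * lam * s * (1 - s) with s = expit(lam * x).
# Dividing by the channel's max spend converts the slope from "per scaled unit"
# to "per rupee", so channels are compared on the same footing.
marginal_returns = {}
for ch in channel_cols:
    lam = summary_lam.loc[f"saturation_lam[{ch}]", "mean"]
    beta = summary_beta.loc[f"saturation_beta[{ch}]", "mean"]
    x_scaled = float(current_spend_mean[ch] / channel_max_spend[ch])
    s = expit(lam * x_scaled)
    marginal_returns[ch] = beta * lam * 2 * s * (1 - s) / channel_max_spend[ch]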

marginal_series = pd.Series(marginal_returns)
weights = marginal_series / marginal_series.sum()
total_budget = current_spend_mean.sum()
suggested = weights * total_budget

budget_table = pd.DataFrame(
    {
        "Channel": channel_cols,
        "Current weekly spend (INR)": [
            f"{current_spend_mean[ch]:>15,.0f}" for ch in channel_cols
        ],
        "Current budget share": [
            f"{current_spend_mean[ch] / total_budget * 100:.1f}%" for ch in channel_cols
        ],
        "Suggested weekly spend (INR)": [
            f"{suggested[ch]:>15,.0f}" for ch in channel_cols
        ],
        "Suggested budget share": [
            f"{suggested[ch] / total_budget * 100:.1f}%" for ch in channel_cols
        ],
    }
).set_index("Channel")

print()
print(budget_table.to_string())
print()
print(f"  Total weekly budget (unchanged): INR {total_budget:,.0f}")
print()

> **How to read this table:** The suggested allocation shifts budget toward channels with higher estimated incremental returns at current spend levels. A smaller suggested share indicates that a channel may be experiencing diminishing returns; it does not mean the channel should be cut entirely.

> **Before acting on this:** Run a 3-month holdout test, collect a second year of data, and re-run the model. Use these figures to design experiments, not to immediately move budgets.
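
The marginal-return calculation behind the table can be sanity-checked in isolation. The sketch below assumes a logistic saturation curve of the form `(1 - exp(-lam * x)) / (1 + exp(-lam * x))` on scaled spend `x`, which is what the `expit` call in the cell above suggests. It compares the closed-form slope with a numerical derivative. The parameter values are hypothetical, not fitted values.

```python
import math


def logistic_saturation(x: float, lam: float) -> float:
    """Saturation curve; algebraically equal to 2 * sigmoid(lam * x) - 1."""
    return (1 - math.exp(-lam * x)) / (1 + math.exp(-lam * x))


def marginal_return(x: float, lam: float, beta: float) -> float:
    """Closed-form slope of beta * logistic_saturation(x, lam) with respect to x."""
    s = 1 / (1 + math.exp(-lam * x))
    return beta * 2 * lam * s * (1 - s)


# Hypothetical parameters and scaled spend level, for illustration only.
lam_example, beta_example, x_example = 3.0, 0.4, 0.25

# Central finite difference as an independent check on the closed form.
eps = 1e-6
numeric = beta_example * (
    logistic_saturation(x_example + eps, lam_example)
    - logistic_saturation(x_example - eps, lam_example)
) / (2 * eps)
analytic = marginal_return(x_example, lam_example, beta_example)

print(f"analytic slope: {analytic:.6f}")
print(f"numeric slope:  {numeric:.6f}")
```

The slope shrinks as scaled spend grows, which is the diminishing-returns behavior the table relies on.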

---
## 7. Confidence Assessment

| Question | Confidence |
|---|---|
| Which channels contribute to GMV? | Medium — directional ranking likely correct |
| Exact contribution percentages | Low — 12 months of data is insufficient for precise estimates |
| Budget reallocation recommendations | Low-Medium — use as a starting point for testing |
| Campaign carryover estimates | Medium — consistent with industry benchmarks |

To increase confidence: collect 2+ years of weekly data, validate on held-out months, and re-run the model.

---
## 8. Recommended Next Steps

1. Validate the model on held-out data (e.g., withhold the last 3 months and check how well it predicts those weeks).
2. Collect at least 2 years of historical weekly spend and sales data to improve the reliability of channel attribution.
3. Run controlled spending experiments (incrementality tests) to independently verify the channel attribution findings.
4. Refresh the model quarterly as new data arrives, so recommendations stay current with campaign strategy changes.
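
As a rough illustration of step 1, the sketch below shows a time-based holdout split and a simple error metric (mean absolute percentage error). It is a minimal outline under stated assumptions, not the project's validation code: the helper names `time_based_split` and `mape`, the 13-week holdout, and the example numbers are all hypothetical. In practice the model would be refit on the training weeks only and its predictions compared against the held-out weeks.

```python
def time_based_split(rows: list, holdout_weeks: int) -> tuple[list, list]:
    """Split chronologically ordered rows into (train, holdout) without shuffling."""
    if not 0 < holdout_weeks < len(rows):
        raise ValueError("holdout_weeks must be between 1 and len(rows) - 1")
    return rows[:-holdout_weeks], rows[-holdout_weeks:]


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical week indices: hold out the final 13 weeks (about 3 months).
week_index = list(range(1, 53))
train_weeks, holdout = time_based_split(week_index, holdout_weeks=13)
print(f"train weeks: {len(train_weeks)}, holdout weeks: {len(holdout)}")

# Hypothetical actual vs. predicted values, for illustration only.
example_actual = [100.0, 120.0, 80.0]
example_predicted = [90.0, 126.0, 88.0]
print(f"holdout MAPE: {mape(example_actual, example_predicted):.2f}%")
```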