# 00 — Problem context

## Objective
Define the business question, target KPI, channels, scope, and assumptions for a Marketing Mix Model (MMM) built from scratch.

---

## Business question

### Question
What is the incremental impact of each marketing channel on weekly purchases (conversions) over time?

### Intended decision (what this model will be used for)

#### Decision
Use this MMM to guide **cross-channel budget allocation** (how much to invest in each channel) and to run **scenario analysis** (what happens if spend changes).

#### Rationale
MMM works best for **strategic / portfolio-level** decisions because it uses aggregated time series data and estimates *average* effects over time.

#### Low-level allocation (within-channel)
This project aims to eventually support **low-level allocation** (e.g. within Paid Search or Paid Social), but that requires an additional layer beyond MMM:
- experimentation (geo-lift / incrementality tests) where possible
- within-channel models or rules (campaign/creative segmentation)
- operational constraints (pacing, min/max spend)

For this reason, the initial MMM is treated as a **top-down allocator** across channels, not a daily campaign optimizer.

---

## Target (KPI)

### Decision
Use **ALL_PURCHASES** as the primary KPI.

### Definition
Total number of purchases completed in a given time period.

- Unit: count (purchases)
- Raw granularity: daily
- Modeling granularity: weekly (daily aggregated to weekly)
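
The daily-to-weekly step can be sketched with pandas. This is a minimal illustration on synthetic data, assuming weeks run Monday to Sunday and that both the KPI and spend are summed; the week convention and the sample values are assumptions, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Synthetic daily data (illustrative values only): 28 days starting on a Monday.
days = pd.date_range("2024-01-01", periods=28, freq="D")
daily = pd.DataFrame(
    {
        "ALL_PURCHASES": np.arange(1, 29),
        "GOOGLE_PAID_SEARCH_SPEND": np.full(28, 100.0),
    },
    index=days,
)

# "W-SUN" builds weekly bins that end on Sunday, i.e. Monday-to-Sunday weeks.
# Counts and spend are additive, so each week is the sum of its days.
weekly = daily.resample("W-SUN").sum()
```

With real data, incomplete weeks at the start or end of the window should probably be dropped so that they are not read as low-spend, low-purchase weeks.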

### Rationale
- Purchase counts do not move with order value, which makes them a stable and easy-to-interpret target for a first MMM.
- Revenue-based KPIs can be explored later once the modeling pipeline is stable.

### What would change my mind
- If order value varies strongly due to promotions/pricing changes, a revenue KPI may be preferable.

---

## Channels (initial)

### Decision
Use **paid media spend** as marketing inputs (baseline MMM).

Channels available in the dataset:
- GOOGLE_PAID_SEARCH_SPEND
- GOOGLE_SHOPPING_SPEND
- GOOGLE_PMAX_SPEND
- GOOGLE_DISPLAY_SPEND
- GOOGLE_VIDEO_SPEND
- META_FACEBOOK_SPEND
- META_INSTAGRAM_SPEND
- META_OTHER_SPEND
- TIKTOK_SPEND

### Rationale
Spend is the standard MMM input and provides the cleanest baseline for incremental impact estimation.

### What we will exclude initially
- Clicks and impressions (kept for later exploration)
- Organic/direct click metrics as “channels” (may be used as controls later)

---

## Potential control variables

Controls represent non-marketing drivers of the KPI.

### Minimum set (always included)
- Trend (time index)
- Seasonality (week-of-year / month)
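
One possible encoding of these two controls is a linear time index plus a few Fourier terms for yearly seasonality. The sketch below is an assumption about how they could be built; the helper name `make_time_controls`, the number of harmonics, and the use of Fourier terms instead of week-of-year dummies are illustrative choices.

```python
import numpy as np
import pandas as pd

def make_time_controls(weeks: pd.DatetimeIndex, n_harmonics: int = 2) -> pd.DataFrame:
    """Build a linear trend and yearly Fourier seasonality terms (hypothetical helper)."""
    controls = pd.DataFrame(index=weeks)
    # Trend: 0, 1, 2, ... one step per week.
    controls["trend"] = np.arange(len(weeks), dtype=float)
    # Position within the year, roughly in [0, 1].
    year_pos = weeks.dayofyear.to_numpy(dtype=float) / 365.25
    for k in range(1, n_harmonics + 1):
        controls[f"season_sin_{k}"] = np.sin(2 * np.pi * k * year_pos)
        controls[f"season_cos_{k}"] = np.cos(2 * np.pi * k * year_pos)
    return controls

# Two years of weekly timestamps (synthetic).
weeks = pd.date_range("2024-01-07", periods=104, freq="W-SUN")
controls = make_time_controls(weeks)
```

Fourier terms keep the number of seasonal parameters small compared with one dummy per week, which matters when only a couple of years of weekly data are available.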

### Optional (if available)
- Promotions / discount periods
- Pricing changes
- Product launches
- Major external events / holidays

### Rationale
Controls reduce the risk of attributing non-marketing variation to marketing channels.

---

## Assumptions (initial)

### 1) Carryover effects (adstock)
**Assumption:** Marketing effects can persist over time.
- Spend this week may affect purchases in future weeks.

**Why it matters:** Without carryover, upper-funnel channels may be undervalued.
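
A common way to express carryover is geometric adstock, where each week keeps a fixed fraction of the previous week's accumulated effect. The sketch below assumes that form; the function name and the `decay` value are illustrative, and other adstock shapes (for example delayed peaks) are equally possible.

```python
import numpy as np

def geometric_adstock(spend, decay):
    """Carry a fraction `decay` (0 <= decay < 1) of past effect into each week."""
    spend = np.asarray(spend, dtype=float)
    out = np.zeros_like(spend)
    carry = 0.0
    for t, x in enumerate(spend):
        # This week's effect = this week's spend + decayed effect from earlier weeks.
        carry = x + decay * carry
        out[t] = carry
    return out

# A single burst of spend keeps contributing in later weeks.
adstocked = geometric_adstock([100.0, 0.0, 0.0, 0.0], decay=0.5)
# adstocked is [100.0, 50.0, 25.0, 12.5]
```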

### 2) Diminishing returns (saturation)
**Assumption:** Returns diminish as spend increases.
- Extra spend has decreasing marginal impact.

**Why it matters:** Without saturation, the model may recommend unrealistic “put everything in one channel” allocations.
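
Diminishing returns can be illustrated with a Hill-type curve, one of several saturation forms used in MMM. The function name, `half_saturation`, and `shape` below are assumptions made for this sketch, not parameters defined by the project.

```python
import numpy as np

def hill_saturation(x, half_saturation, shape=1.0):
    """Map spend to a response in [0, 1); equals 0.5 when x == half_saturation."""
    x = np.asarray(x, dtype=float)
    return x**shape / (x**shape + half_saturation**shape)

# Equal spend increments produce smaller and smaller response gains.
spend_levels = np.array([0.0, 100.0, 200.0, 300.0])
response = hill_saturation(spend_levels, half_saturation=100.0)
marginal_gains = np.diff(response)
```

With `shape=1.0` the curve is concave everywhere; values above 1 give an S-shape in which very low spend is also inefficient.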

### 3) Linear relationship after transformations
**Assumption:** After applying adstock and saturation, a linear model can approximate the response.

**Why it matters:** It keeps the baseline interpretable while capturing core MMM dynamics.
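
As a sketch, the baseline could be written as follows. The notation is introduced here for illustration only: $y_t$ is weekly purchases, $\tilde{x}_{c,t}$ is the spend of channel $c$ in week $t$ after adstock and saturation, $z_{k,t}$ are the controls, and $\varepsilon_t$ is the error term.

$$
y_t = \beta_0 + \sum_{c=1}^{C} \beta_c \, \tilde{x}_{c,t} + \sum_{k=1}^{K} \gamma_k \, z_{k,t} + \varepsilon_t
$$

Under this form, the contribution of channel $c$ in week $t$ is $\beta_c \, \tilde{x}_{c,t}$, which is what makes the baseline easy to decompose.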

### 4) Observational (correlation-based)
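**Assumption:** The model is fit on observational data, so its estimates are associations; they can be read as incremental effects only to the extent that no major confounder is missing.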

**Why it matters:** MMM supports decisions but does not prove causal impact without additional evidence.

### 5) Parameter stability in the modeling window
**Assumption:** Effects are roughly stable over the modeled period.

**Why it matters:** If effects change drastically (for example after ad-platform algorithm, product, or pricing changes), a single average effect can mislead.

---

## Risks & limitations

### Omitted variables bias
Unobserved factors (promotions, pricing, PR, stockouts) may bias attribution.

### Multicollinearity
Channels may move together, making it difficult to separate their effects reliably.
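
One way to check this before modeling is the variance inflation factor (VIF) of each spend column. The sketch below uses synthetic data in which two channels are deliberately made to move together; the helper name, the data, and any threshold used to read the result are assumptions.

```python
import numpy as np
import pandas as pd

def variance_inflation_factors(X: pd.DataFrame) -> pd.Series:
    """VIF per column: 1 / (1 - R^2) from regressing it on all other columns."""
    vifs = {}
    for col in X.columns:
        y = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        A = np.column_stack([np.ones(len(X)), others])  # add an intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vifs[col] = 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="vif")

# Synthetic spend: Facebook and Instagram share a common driver, TikTok does not.
rng = np.random.default_rng(0)
common = rng.normal(size=200)
spend = pd.DataFrame(
    {
        "META_FACEBOOK_SPEND": 1000 + 100 * common,
        "META_INSTAGRAM_SPEND": 1000 + 100 * (common + 0.1 * rng.normal(size=200)),
        "TIKTOK_SPEND": 1000 + 100 * rng.normal(size=200),
    }
)
vif = variance_inflation_factors(spend)
```

A very high VIF for a pair of channels suggests their individual coefficients will be unstable; grouping them into one input is one possible response.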

### Misspecification risk
Wrong adstock/saturation choices can distort channel contributions and scenario recommendations.

### Short-term shocks
Sudden events can create unexplained spikes or drops in the KPI and distort coefficient estimates.

### Sensitivity to time window and validation strategy
Model results can change depending on the selected time period and holdout approach.
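
A rolling-origin (expanding window) split is one way to see how much results depend on the window. The sketch below only generates index ranges; the function name and the values for `min_train` and `horizon` are illustrative assumptions.

```python
def rolling_origin_splits(n_weeks, min_train, horizon):
    """Return (train, test) index ranges where the test block always follows the train block."""
    splits = []
    train_end = min_train
    while train_end + horizon <= n_weeks:
        train_idx = range(0, train_end)
        test_idx = range(train_end, train_end + horizon)
        splits.append((train_idx, test_idx))
        train_end += horizon  # grow the training window by one test block
    return splits

# Two years of weekly data, first year as the minimum training window,
# then one quarter (13 weeks) held out at a time.
splits = rolling_origin_splits(n_weeks=104, min_train=52, horizon=13)
```

Refitting the model on each training range and comparing channel coefficients across splits gives a rough view of how stable the estimates are.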

---

## Definition of Done
- KPI, channels, and scope clearly defined
- Intended decisions (high-level and low-level roadmap) explicitly stated
- Assumptions and limitations written explicitly
- Base modeling table schema sketched

---

## Base modeling table (draft)

Expected schema (weekly aggregated):

- week
- target (ALL_PURCHASES)
- one spend column per channel (the `*_SPEND` columns listed under Channels)
- controls (trend, seasonality, promo flags if available)
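
The draft schema could later be enforced with a small check like the one below. It is a sketch under assumptions: the function name, the specific checks, and the example rows are hypothetical, and the column names are taken from the channel list above.

```python
import pandas as pd

TARGET = "ALL_PURCHASES"
SPEND_COLUMNS = [
    "GOOGLE_PAID_SEARCH_SPEND", "GOOGLE_SHOPPING_SPEND", "GOOGLE_PMAX_SPEND",
    "GOOGLE_DISPLAY_SPEND", "GOOGLE_VIDEO_SPEND", "META_FACEBOOK_SPEND",
    "META_INSTAGRAM_SPEND", "META_OTHER_SPEND", "TIKTOK_SPEND",
]

def check_modeling_table(df: pd.DataFrame) -> list:
    """Return a list of schema problems; an empty list means the table looks usable."""
    required = ["week", TARGET, *SPEND_COLUMNS]
    missing = [c for c in required if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if df["week"].duplicated().any():
        problems.append("duplicate weeks")
    if not df["week"].is_monotonic_increasing:
        problems.append("weeks not sorted")
    if (df[SPEND_COLUMNS] < 0).any().any():
        problems.append("negative spend")
    if df[[TARGET, *SPEND_COLUMNS]].isna().any().any():
        problems.append("missing values")
    return problems

# Tiny example table with made-up values.
table = pd.DataFrame({
    "week": pd.date_range("2024-01-07", periods=3, freq="W-SUN"),
    TARGET: [120, 135, 128],
})
for col in SPEND_COLUMNS:
    table[col] = 100.0
problems = check_modeling_table(table)
```

Control columns are left out of the required list here because the optional controls depend on what is available.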

---

## Notes
No data loading in this notebook.
This notebook focuses on problem framing only.
