
Add Tutorial 20: HAD for National Brand Campaign with Regional Spend Intensity #394

Merged
igerber merged 1 commit into main from feature/tutorial-20-had on Apr 26, 2026


Conversation


@igerber (Owner) commented Apr 26, 2026

Summary

  • Add docs/tutorials/20_had_brand_campaign.ipynb: practitioner walkthrough for HeterogeneousAdoptionDiD on a 60-DMA, 8-week panel where every market got the campaign at varying intensity and no untreated comparison group exists. Covers the headline WAS on a 2-period collapse (design='auto', resolving to continuous_at_zero), a multi-week event study with per-week pointwise CIs and pre-launch placebos, and a stakeholder communication template; the collapse-and-fit flow is sketched after this list. 23 cells (13 markdown / 10 code), mirrors T19's structure, no in-notebook drift cell.
  • Add tests/test_t20_had_brand_campaign_drift.py: 12 drift tests with module-scoped fixtures pinning panel composition (60 DMAs, 1 at $0), design auto-detection (continuous_at_zero), overall WAS / SE / CI to one-decimal display, dose mean, n_units, and per-week event-study coverage of TRUE_SLOPE=100 plus zero-coverage at every placebo horizon. Mirrors T19's test-file-only pattern (T19's notebook itself has zero in-notebook asserts).
  • Wire the tutorial into the existing diff_diff/had.py entry in docs/doc-deps.yaml; extend docs/practitioner_decision_tree.rst § "Varying Spending Levels" with a new subsection for the no-untreated-controls regime + tip cross-linking T20; add T20 entry to docs/tutorials/README.md and [Unreleased] section in CHANGELOG.md.
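For orientation, here is a minimal sketch of the collapse-and-fit flow behind the headline number. The toy panel, column names, launch week, and the commented-out estimator call are illustrative assumptions, not the notebook's code:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
LAUNCH_WEEK = 5  # illustrative: weeks 1-4 are pre-launch, weeks 5-8 post-launch

# Toy 60-DMA x 8-week panel with a per-DMA spend intensity ("dose", in $K).
dose_by_dma = {dma: rng.uniform(5, 50) for dma in range(60)}
panel = pd.DataFrame(
    [(dma, week) for dma in range(60) for week in range(1, 9)],
    columns=["dma", "week"],
)
panel["dose"] = panel["dma"].map(dose_by_dma)
panel["visits"] = (
    1_000
    + 100 * panel["dose"] * (panel["week"] >= LAUNCH_WEEK)  # effect switches on at launch
    + rng.normal(0, 25, size=len(panel))
)

# Collapse to the exactly-two-period (pre/post mean) panel that HAD's overall mode expects.
collapsed = (
    panel.assign(post=(panel["week"] >= LAUNCH_WEEK).astype(int))
    .groupby(["dma", "post"], as_index=False)
    .agg(outcome=("visits", "mean"), dose=("dose", "first"))
)

# The actual estimator call lives in the notebook; the signature below is assumed:
# result = HeterogeneousAdoptionDiD(design="auto").fit(collapsed, ...)
```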

Methodology references (required if estimator / math changes)

  • Method name(s): N/A - no methodology changes. Tutorial walks through the existing HeterogeneousAdoptionDiD estimator (shipped v3.3.0) on a synthetic panel.
  • Paper / source link(s): N/A
  • Any intentional deviations from the source (and why): None

Validation

  • Tests added/updated: tests/test_t20_had_brand_campaign_drift.py (12 tests, 0.06s, all passing).
  • Backtest / simulation / notebook evidence (if applicable): notebook executes cleanly via `pytest --nbmake docs/tutorials/20_had_brand_campaign.ipynb` (2.5s). Em-dash count on the notebook: 0 (per project convention). All locked numbers from `_scratch/had_tutorial/20_assemble_outputs.py` are stable; HAD's analytical SE path is bit-identical regardless of backend env (no Rust kernel involved on HAD).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions

Overall Assessment

Blocker — I found one P0 and two P1 methodology/documentation issues in the new HAD tutorial package. The blocker is that the notebook/test DGP relabels one DMA as zero-dose/never-treated after outcomes are generated, so the quoted WAS/CI/coverage numbers are produced on internally inconsistent synthetic data.

Executive Summary

  • P0 The tutorial/test create the “never-treated anchor” by mutating dose and first_treat after outcome generation, without recomputing that DMA’s post-treatment outcomes. The notebook’s reported HAD numbers are therefore locked to mislabeled data.
  • P1 The new docs present Tutorial 20 as a pure “no untreated comparison group” / “every unit got some treatment” example, but the example actually relies on a zero-dose Design 1' / continuous_at_zero anchor. In the true no-QUG (d_lower > 0) regime, the estimand and assumptions change.
  • P1 The notebook overstates the HAD pretest workflow as verifying parallel trends. Per the registry, the two-period workflow is QUG + linearity only; the pre-trends step is deferred there and only closed by the multi-period joint variants.
  • P2 One new drift guard silently skips missing placebo horizons, so the tutorial can lose promised pre-launch rows without the test failing.
  • I could not execute pytest/the notebook here because this environment lacks project Python deps (pytest, pandas), so this is a static review.

Methodology

  • P0 docs/tutorials/20_had_brand_campaign.ipynb, diff_diff/prep_dgp.py, tests/test_t20_had_brand_campaign_drift.py. The notebook/test zero out one DMA’s dose and set first_treat=0 after generate_continuous_did_data() has already generated outcome and true_att from the original positive dose. Impact: the supposed zero-dose / never-treated reference still carries treated outcomes, so the quoted WAS, CI, and “covers TRUE_SLOPE=100” claims are not tied to a valid DGP. Concrete fix: generate a genuine never-treated unit in the DGP itself, or recompute that DMA’s post-period outcome/true_att after the mutation and then rebaseline all quoted notebook numbers and drift tests (the problematic pattern is sketched after this list).

  • P1 docs/tutorials/20_had_brand_campaign.ipynb, docs/practitioner_decision_tree.rst, docs/tutorials/README.md, CHANGELOG.md, docs/methodology/REGISTRY.md, docs/api/had.rst. The new prose repeatedly says this is the regime with “no untreated comparison group” / “every unit gets some treatment,” but the actual example forces a zero-dose anchor and the fit resolves to continuous_at_zero, i.e. Design 1' with target WAS. The registry/API explicitly distinguish that from true Design 1 (d_lower > 0), where the shipped estimator targets WAS_d_lower and warns about extra Assumptions 5/6. Impact: practitioners in a genuine no-zero-dose panel are being pointed at this tutorial as if the same WAS interpretation applies. Concrete fix: either rewrite the tutorial package as a Design 1' / QUG-at-zero example, or change the DGP to a genuine d_lower > 0 case and explicitly document target_parameter="WAS_d_lower" plus the Assumption 5/6 caveat.

  • P1 docs/tutorials/20_had_brand_campaign.ipynb, docs/methodology/REGISTRY.md. The extensions section says the HAD “composite pretest combining QUG, Stute, and Yatchew-HR tests verifies the parallel-trends assumption holds at the boundary.” The shipped two-period workflow does not run the Assumption 7 pre-trends step; it only runs QUG + linearity and explicitly flags that gap. Impact: users can over-trust the workflow as a formal parallel-trends verification when the library documentation says not to. Concrete fix: reword this to distinguish the two-period QUG+linearity workflow from the multi-period aggregate="event_study" joint pre-trends/joint homogeneity workflow.
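To make the P0 concrete, a self-contained toy showing why post-hoc relabeling breaks the DGP (generate_continuous_did_data() is only mimicked here; column names and numbers are placeholders):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
TRUE_SLOPE = 100.0

# Stand-in for the generator: post-period outcomes are built from the ORIGINAL doses.
dma = np.arange(60)
dose = rng.uniform(1, 50, size=60)
post_outcome = 1_000 + TRUE_SLOPE * dose + rng.normal(0, 25, size=60)
df = pd.DataFrame({"dma": dma, "dose": dose, "post_outcome": post_outcome, "first_treat": 1})

# Flagged pattern: relabel one DMA as never-treated AFTER the outcome is baked in.
df.loc[df["dma"] == 0, ["dose", "first_treat"]] = 0
# The "never-treated" anchor still carries ~TRUE_SLOPE * original_dose of treatment
# effect in post_outcome, so any estimate anchored on it is fit to inconsistent data.

# Consistent alternatives: generate the zero-dose unit inside the DGP (so its outcome
# never receives an effect), or recompute the relabeled DMA's post-period outcome, e.g.
# df.loc[df["dma"] == 0, "post_outcome"] -= TRUE_SLOPE * dose[0]
```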

Code Quality

No additional material findings beyond the methodology issues above.

Performance

No material findings in the changed files.

Maintainability

No material findings beyond the documentation consistency problems above.

Tech Debt

No mitigating TODO.md tracking was added for the P0/P1 issues above, and those would not be mitigated by TODO tracking anyway.

Security

No material findings.

Documentation/Tests

  • P2 tests/test_t20_had_brand_campaign_drift.py. test_event_study_pre_placebos_cover_zero() silently skips a claimed placebo horizon when it is missing (if e in event_times:). Impact: the tutorial can lose one or more of the promised e=-4,-3,-2 placebo rows and the drift test still passes. Concrete fix: first assert that all expected placebo horizons are present, then check their values/CIs.
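A minimal sketch of the suggested ordering, assuming a fixture that maps event time to (att, ci_low, ci_high); the real drift test's fixtures and return types may differ:

```python
EXPECTED_PLACEBO_HORIZONS = (-4, -3, -2)

def test_event_study_pre_placebos_cover_zero(event_study_results):
    # event_study_results is assumed to map event time -> (att, ci_low, ci_high).
    missing = [e for e in EXPECTED_PLACEBO_HORIZONS if e not in event_study_results]
    assert not missing, f"promised placebo horizons missing from the event study: {missing}"

    # Only after presence is guaranteed do we check each horizon's CI.
    for e in EXPECTED_PLACEBO_HORIZONS:
        att, ci_low, ci_high = event_study_results[e]
        assert ci_low <= 0.0 <= ci_high, f"placebo CI at e={e} does not cover zero"
```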

Path to Approval

  1. Make the tutorial/test DGP internally consistent: stop post-hoc relabeling a treated DMA as untreated, or recompute that DMA’s outcomes after the relabeling.
  2. Choose one HAD regime for Tutorial 20 and make all new docs match it: either Design 1' (continuous_at_zero, WAS) or true Design 1 (WAS_d_lower with Assumption 5/6 caveats).
  3. Correct the pretest wording so it no longer claims that the QUG + Stute + Yatchew workflow verifies parallel trends on the two-period path.

igerber added a commit that referenced this pull request Apr 26, 2026
DGP correctness (P0):
Replace post-hoc mutation of `dose` and `first_treat` for one DMA with
a Uniform[$5K, $50K] regional spend DGP where every DMA participates.
The previous DGP zeroed the dose for one DMA AFTER the generator had
baked the original positive-dose treatment effect into the outcome,
producing a "never-treated reference" that still carried treated
outcomes. The new DGP is internally consistent: outcomes are generated
from the dose values that HAD then sees, no relabeling.

Design 1 framing (P1):
HAD's auto-detection now resolves to `continuous_near_d_lower`
(Design 1) instead of `continuous_at_zero` (Design 1'), matching the
"every market got some treatment, no untreated comparison group"
narrative throughout. Target parameter is `WAS_d_lower` (per-$1K above
the boundary spend, ~$5K), not `WAS`. Notebook Section 3 now explains
the WAS_d_lower interpretation: multiply by `(actual_dose - d_lower)`
for per-DMA lift estimates (a DMA at $30K saw ~(30 - 5) * 100 = 2,500
extra weekly visits, not 30 * 100 = 3,000). Section 3 acknowledges
the Assumption 5/6 advisory the library fires for Design 1
(non-testable local linearity at the boundary) and explains why it
holds in this DGP (linear by construction). Section 4 event-study fit
filters the duplicate Assumption 5/6 warning. Stakeholder template
(Section 5) frames the result as "per-$1K above the $5K floor" and
flags the Assumption 6 caveat.

Pretest description (P1):
Section 6 extensions cell now describes the composite pretest workflow
accurately: QUG + linearity (Stute, Yatchew-HR) on the two-period
path. The notebook no longer claims this verifies parallel-trends;
that is closed by the multi-period joint variants
(`stute_joint_pretest`, `joint_pretrends_test`,
`joint_homogeneity_test`).

Drift test placebo-presence (P2):
New `test_event_study_horizons_complete` asserts all 7 expected
event-times (e=-4..-2, 0..3) are present, so per-horizon coverage
tests can drop the `if e in event_times` guard that would silently
pass on a truncated horizon list. Pre-placebo test no longer skips
silently.

Lock changes:
- MAIN_SEED: 126 -> 87 (cleanest seed in the new DGP's seed search).
- Locked numbers: overall att=100.0, SE=0.7, CI [98.6, 101.4],
  d_lower=5.2, dose_mean=24.7, n_treated=59 (out of 60).

Docs synced:
- CHANGELOG entry rewritten for the new design path / target /
  13-test count.
- Decision-tree code block updated to use the new DGP knobs (low=$5K,
  seed=87) and describe the boundary-anchored interpretation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
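A compact sketch of an internally consistent DGP of the shape the commit describes, plus the boundary-anchored lift arithmetic. The seed and the 5/100 constants come from the commit text; the functional form and variable names are illustrative, not the repo's generator:

```python
import numpy as np

rng = np.random.default_rng(87)             # MAIN_SEED from the commit
N_DMAS, TRUE_SLOPE = 60, 100.0              # 100 extra weekly visits per $1K of spend
dose = rng.uniform(5.0, 50.0, size=N_DMAS)  # Uniform[$5K, $50K]: every DMA participates
d_lower = dose.min()                        # realized boundary spend (~$5K)

# Outcomes are generated from the SAME doses HAD later sees -- no relabeling step.
pre = rng.normal(1_000, 50, size=N_DMAS)
post = pre + TRUE_SLOPE * (dose - d_lower) + rng.normal(0, 10, size=N_DMAS)

# WAS_d_lower interpretation: lift is per-$1K ABOVE the boundary spend, so a DMA
# spending $30K is credited with roughly (30 - 5) * 100 = 2,500 extra weekly visits
# relative to the $5K floor, not 30 * 100 = 3,000.
lift_30k = (30.0 - d_lower) * TRUE_SLOPE
```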

@igerber (Owner, Author) commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 6bd3c4e8218e62a859ac11c5933340b523b65f67


Overall Assessment

Looks good — the prior P0/P1 methodology issues appear resolved. I found only P2/P3 documentation/test consistency issues in the changed files; no unmitigated P0 or P1 findings remain.

Executive Summary

  • The previous blocker is fixed: the tutorial/test DGP no longer manufactures a zero-dose anchor after outcome generation, and the main notebook now matches Design 1 / continuous_near_d_lower.
  • The notebook’s core methodology framing is now mostly aligned with the registry: it presents WAS_d_lower and surfaces the Assumption 5/6 caveat.
  • P2 The notebook extensions section still misstates what the QUG test does.
  • P2 Summary docs still contain stale WAS / continuous_at_zero descriptions for Tutorial 20, even though the tutorial was rewritten to WAS_d_lower.
  • P2 The new decision-tree code block is not runnable as written.
  • P3 The new drift file does not actually pin all of the newly quoted prose-level numbers.

Methodology

The previous review’s blockers look addressed: the notebook/test DGP now keeps post-treatment doses strictly positive, the main tutorial narrative documents Design 1 / WAS_d_lower, and the pretest discussion distinguishes the two-period and event-study workflows.

  • Severity: P2. docs/tutorials/20_had_brand_campaign.ipynb:L392, docs/methodology/REGISTRY.md:L2423-L2429, docs/methodology/REGISTRY.md:L2479-L2488. The extensions section glosses the QUG test as a “constant-per-period effect” check, but qug_test() is the support-boundary test H0: d̲ = 0. Impact: readers get the wrong picture of what the formal HAD pretest battery actually checks. Concrete fix: change that parenthetical to “quasi-untreated-group / support-infimum test” or remove the parenthetical entirely.

Code Quality

No material findings in changed executable library code.

Performance

No material findings.

Maintainability

  • Severity: P2. CHANGELOG.md:L11-L27, docs/tutorials/README.md:L98-L102. The new Unreleased changelog entry correctly says Tutorial 20 is continuous_near_d_lower / WAS_d_lower, but the older Tutorial 20 changelog bullet and the README index still describe it as continuous_at_zero / WAS. Impact: repo docs disagree about the tutorial’s estimand and design path, which recreates part of the earlier review confusion for anyone scanning summary docs. Concrete fix: delete or rewrite the stale CHANGELOG.md bullet at L27, and update the README bullet to say WAS_d_lower.

Tech Debt

No separate tech-debt finding. TODO.md does not mitigate the items above.

Security

No material findings.

Documentation/Tests

  • Severity: P2. docs/practitioner_decision_tree.rst:L267-L287, diff_diff/had.py:L2842-L2849. The new decision-tree snippet is not runnable as written: it gestures at zeroing pre-treatment doses, but then passes the unmodified 8-period panel straight into HAD.fit()’s default aggregate="overall" path, which requires exactly two time periods. Impact: copy/paste users will hit a front-door error instead of reproducing the documented workflow. Concrete fix: either show the actual preprocessing and 2-period collapse, or explicitly label the block as schematic and point readers to Tutorial 20 for executable code.
  • Severity: P3. docs/tutorials/20_had_brand_campaign.ipynb:L349-L365, tests/test_t20_had_brand_campaign_drift.py:L182-L186, tests/test_t20_had_brand_campaign_drift.py:L251-L265. The drift harness says it locks the quoted markdown, but two new prose claims are only weakly checked: the notebook says placebo effects are within ±0.06 and spend is “median ~$25K”, while the test only enforces abs(placebo) < 0.5 and pins dose_mean, not the median. Impact: those prose claims can drift while the drift file stays green. Concrete fix: either add exact assertions for placebo magnitudes and the sample median, or soften the prose to qualitative wording.

Static-review note: I could not run pytest or nbmake here because this environment does not have pytest, numpy, or pandas installed.

igerber added a commit that referenced this pull request Apr 26, 2026
- Delete the stale T20 CHANGELOG bullet that was orphaned under
  [3.3.1] when the new [Unreleased] entry was added in R1; only the
  current Design 1 / WAS_d_lower description survives.
- Update the README index bullet from "WAS" to "WAS_d_lower" and
  spell out the boundary-anchored interpretation, matching the
  tutorial body.
- Make the practitioner decision-tree code block runnable: switch to
  n_periods=2 / cohort_periods=[2] so the panel directly satisfies
  HAD's overall-mode 2-period contract; show the explicit pre-period
  zeroing instead of "..." hand-waving.
- Correct the QUG description in the notebook extensions cell: it is
  the support-infimum test (`H0: d_lower = 0`, adjudicates between
  `continuous_at_zero` and `continuous_near_d_lower`), not a
  constant-per-period-effect test.
- Drift test pinning: add a sample-median assertion to lock the
  README/template "median ~$25K" prose; tighten the pre-launch
  placebo magnitude check from `< 0.5` to `< 0.1` so the notebook's
  "within ±0.06" claim cannot drift unnoticed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
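Roughly, the now-runnable decision-tree block described above looks like this. The generator keywords (n_periods, cohort_periods, low, seed) are the knobs named in this PR; the import paths, column names, and fit call are assumptions, so treat it as a schematic rather than the .rst block verbatim:

```python
from diff_diff.had import HeterogeneousAdoptionDiD            # assumed import path
from diff_diff.prep_dgp import generate_continuous_did_data   # module cited in the review

# Two periods with treatment starting in period 2, so the panel already satisfies
# the overall-mode contract (exactly one pre and one post period).
panel = generate_continuous_did_data(n_periods=2, cohort_periods=[2], low=5, seed=87)

# Explicit pre-period zeroing (no "..." hand-waving): doses apply only post-launch.
panel.loc[panel["period"] < 2, "dose"] = 0.0

had = HeterogeneousAdoptionDiD(design="auto")
# result = had.fit(panel, ...)   # assumed signature; auto-detection should land on
#                                # continuous_near_d_lower with target WAS_d_lower
```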

@igerber (Owner, Author) commented Apr 26, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: d60f9c18af391ab43c028754a15527450af8f249


Overall Assessment

Looks good — no unmitigated P0 or P1 findings in the changed files.

Executive Summary

  • Prior re-review issues look addressed: the tutorial/docs now consistently frame Tutorial 20 as HAD Design 1 continuous_near_d_lower targeting WAS_d_lower, and the new decision-tree snippet is a real two-period HAD example.
  • Methodology cross-check found no undocumented mismatch versus the registry/docstrings for Design 1 identification, event-study pointwise CIs, or the multi-period HAD pretest workflow.
  • P3 The new drift test still does not fully pin the notebook’s quoted “within ±0.06” placebo claim; it allows values up to < 0.1.
  • P3 The drift-test docstring points maintainers to _scratch/had_tutorial/*, but those paths are not present in the repo.
  • Static review only: I could not run pytest here because pytest, numpy, and pandas are not installed in this environment.

Methodology

Affected method(s): HeterogeneousAdoptionDiD docs/tutorial only, specifically the no-untreated Design 1 path (continuous_near_d_lower / WAS_d_lower), event-study pointwise inference, and the HAD pretest workflow narrative.

No findings. The changed docs now line up with the registry on the Assumption 5/6 identification caveat and pointwise event-study CIs (docs/methodology/REGISTRY.md:L2528-L2535), and the tutorial’s extensions section now matches the shipped did_had_pretest_workflow(..., aggregate="event_study") contract (docs/tutorials/20_had_brand_campaign.ipynb:L392-L394, diff_diff/had_pretests.py:L4044-L4047). The new decision-tree example also now satisfies HAD’s two-period overall-path contract (docs/practitioner_decision_tree.rst:L257-L288).

Code Quality

No findings. The PR does not change estimator or inference implementation; executable changes are limited to the new drift test.

Performance

No findings. The test file uses module-scoped fixtures and is small enough that I do not see a meaningful regression risk.

Maintainability

  • tests/test_t20_had_brand_campaign_drift.py:L15-L16 — Severity: P3. Impact: the docstring says the locked DGP / quoted outputs come from _scratch/had_tutorial/40_build_notebook.py and _scratch/had_tutorial/20_assemble_outputs.py, but those paths are not present in the repo, so future maintainers cannot inspect the stated source of truth. Concrete fix: either commit those scripts under a tracked path or replace the references with a note that the constants are duplicated directly in the test/notebook.

Tech Debt

No separate finding. TODO.md does not mitigate the minor items above.

Security

No findings. I did not see secrets, credentials, or unsafe new execution surfaces in the changed files.

Documentation/Tests

  • docs/tutorials/20_had_brand_campaign.ipynb:L349-L349, tests/test_t20_had_brand_campaign_drift.py:L257-L272 — Severity: P3. Impact: the notebook says the placebo effects are “within ±0.06,” but the test only enforces abs(att) < 0.1, so the quoted prose can drift materially without the harness failing. Concrete fix: either tighten the assertion to the stated bound (or an exact rounded pin), or soften the prose to “within about ±0.1.”

igerber added the ready-for-ci (Triggers CI test workflows) label Apr 26, 2026
igerber force-pushed the feature/tutorial-20-had branch from d60f9c1 to 5e66491 on April 26, 2026 at 15:27
Add Tutorial 20: HAD for National Brand Campaign with Regional Spend Intensity

Practitioner walkthrough for HeterogeneousAdoptionDiD on the
no-untreated-controls case: every market got the campaign at varying
intensity and there is no clean comparison group. Fills the structural
gap T14 (ContinuousDiD) cannot address.

Notebook scope (23 cells, 13 markdown / 10 code, mirrors T19's
structure):
- Sections 1-3: framing the no-untreated-controls measurement problem,
  setup imports, synthetic 60-DMA / 8-week panel with Uniform[$5K, $50K]
  regional add-on spend (every DMA participates, no DMA at $0). DGP
  is internally consistent: outcomes are generated from the dose
  values HAD then sees, no post-hoc relabeling.
- Section 4: overall WAS_d_lower fit on a 2-period (pre/post mean)
  collapse - HAD's overall mode requires exactly 2 periods
  (had.py:952-959). Locked headline: per-$1K marginal effect of 100
  weekly visits per DMA above the boundary spend (95% CI [98.6,
  101.4]) with design auto-detection landing on
  `continuous_near_d_lower` (Design 1) and target `WAS_d_lower`.
  Surfaces the Assumption 5/6 advisory the library fires for Design 1
  and explains why it holds in this DGP (linear by construction).
- Section 5: multi-week event-study fit on the 8-week panel,
  per-week WAS_d_lower for e=0..3 (~100 each, CIs cover truth) and
  pre-launch placebos at e=-2..-4 sitting on zero.
- Section 6: stakeholder communication template (T18/T19 markdown
  blockquote pattern), per-DMA dollar-lift interpretation
  `(actual_dose - d_lower) * WAS_d_lower`, Assumption 6 caveat.
- Section 7: extensions (population-weighted/survey path, composite
  pretest workflow described accurately as QUG support-infimum test
  + linearity tests, mass-point design path), related-tutorials
  cross-links (T01, T02, T14, T17, T18, T19), summary checklist.

Drift detection: companion tests/test_t20_had_brand_campaign_drift.py
(13 tests, 0.06s, mirrors T19's test-file-only pattern - T19's
notebook itself has zero in-notebook asserts). Pins panel composition
including sample median, design auto-detection / target / d_lower,
overall WAS_d_lower / SE / CI endpoints to one-decimal display, dose
mean, n_units, full event-study horizon presence (e=-4..-2, 0..3),
per-week post-launch coverage of TRUE_SLOPE=100 and zero coverage at
every placebo horizon (|placebo_att| < 0.1). Tight `round(_, 1) == X.X`
pins throughout - HAD's analytical SE path is bit-identical regardless
of backend env (no Rust kernel involved). Locked DGP seed: MAIN_SEED=87.

Documentation integration:
- docs/tutorials/README.md: new T20 entry following T18/T19's
  5-bullet pattern.
- docs/doc-deps.yaml: T20 added to the existing diff_diff/had.py
  entry; cross-link to docs/practitioner_decision_tree.rst added.
- docs/practitioner_decision_tree.rst: `.. tip::` block at the end
  of `section-no-untreated` (Universal Rollout - landed on main via
  PR #389) cross-links to T20 for the full walkthrough.
- CHANGELOG.md: new ### Added bullet under [Unreleased].

Out of scope (queued in project_had_followups.md memory):
- _handle_had in practitioner.py:_HANDLERS map.
- HAD entries in llms-full.txt / choosing_estimator.rst.
- Pretest workflow tutorial, weighted/survey HAD tutorial,
  mass-point design demo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
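The “Drift detection” paragraph above relies on tight one-decimal pins; a minimal sketch of that pattern, with assumed fixture and attribute names and the locked constants quoted in this PR:

```python
MAIN_SEED = 87
LOCKED = {
    "att": 100.0,        # overall WAS_d_lower, per $1K above the boundary spend
    "se": 0.7,
    "ci": (98.6, 101.4),
    "d_lower": 5.2,
    "dose_mean": 24.7,
}

def test_overall_numbers_locked(overall_result, panel):
    # One-decimal display pins: any drift in a quoted notebook number fails loudly.
    assert round(overall_result.att, 1) == LOCKED["att"]
    assert round(overall_result.se, 1) == LOCKED["se"]
    assert (round(overall_result.ci_low, 1), round(overall_result.ci_high, 1)) == LOCKED["ci"]
    assert round(overall_result.d_lower, 1) == LOCKED["d_lower"]
    assert round(panel["dose"].mean(), 1) == LOCKED["dose_mean"]
```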
igerber force-pushed the feature/tutorial-20-had branch from 5e66491 to 846fd8f on April 26, 2026 at 20:37
igerber merged commit f1d7674 into main on Apr 26, 2026
22 checks passed
igerber deleted the feature/tutorial-20-had branch on April 26, 2026 at 21:56