Add Tutorial 20: HAD for National Brand Campaign with Regional Spend Intensity #394
Conversation
Overall Assessment: ⛔ Blocker — I found one P0 and two P1 methodology/documentation issues in the new HAD tutorial package. The blocker is that the notebook/test DGP relabels one DMA as zero-dose/never-treated after outcomes are generated, so the quoted WAS/CI/coverage numbers are produced on internally inconsistent synthetic data.

Executive Summary
Methodology
Code Quality: No additional material findings beyond the methodology issues above.
Performance: No material findings in the changed files.
Maintainability: No material findings beyond the documentation consistency problems above.
Tech Debt: No mitigating
Security: No material findings.
Documentation/Tests
Path to Approval
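To make the P0 concrete, here is a minimal sketch of the flagged pattern, with hypothetical variable names standing in for the tutorial's actual generator: outcomes are built from every unit's positive dose first, and only afterward is one DMA relabeled as never-treated, so its "untreated" outcomes still carry the dose response.

```python
# Minimal sketch of the flagged P0 pattern (hypothetical names, NOT the
# notebook's actual generator code).
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, launch_week = 60, 8, 5
TRUE_SLOPE = 100.0

dose = rng.uniform(5.0, 50.0, size=n_units)    # $K add-on spend per DMA
first_treat = np.full(n_units, launch_week)    # every DMA adopts at launch

# Outcomes bake the positive-dose effect into EVERY unit, including unit 0.
weeks = np.arange(1, n_periods + 1)
post = (weeks[None, :] >= first_treat[:, None]).astype(float)
y = 500 + rng.normal(size=(n_units, n_periods)) + TRUE_SLOPE * dose[:, None] * post

# The inconsistency: one DMA is relabeled zero-dose/never-treated AFTER y was
# generated, so its "untreated" outcomes still contain the treated response.
dose[0] = 0.0
first_treat[0] = 0
```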
- DGP correctness (P0): Replace post-hoc mutation of `dose` and `first_treat` for one DMA with a Uniform[$5K, $50K] regional spend DGP where every DMA participates. The previous DGP zeroed the dose for one DMA AFTER the generator had baked the original positive-dose treatment effect into the outcome, producing a "never-treated reference" that still carried treated outcomes. The new DGP is internally consistent: outcomes are generated from the dose values that HAD then sees, no relabeling (see the sketch after this message).
- Design 1 framing (P1): HAD's auto-detection now resolves to `continuous_near_d_lower` (Design 1) instead of `continuous_at_zero` (Design 1'), matching the "every market got some treatment, no untreated comparison group" narrative throughout. Target parameter is `WAS_d_lower` (per-$1K above the boundary spend, ~$5K), not `WAS`. Notebook Section 3 now explains the WAS_d_lower interpretation: multiply by `(actual_dose - d_lower)` for per-DMA lift estimates (a DMA at $30K saw ~(30 - 5) * 100 = 2,500 extra weekly visits, not 30 * 100 = 3,000). Section 3 acknowledges the Assumption 5/6 advisory the library fires for Design 1 (non-testable local linearity at the boundary) and explains why it holds in this DGP (linear by construction). Section 4 event-study fit filters the duplicate Assumption 5/6 warning. Stakeholder template (Section 5) frames the result as "per-$1K above the $5K floor" and flags the Assumption 6 caveat.
- Pretest description (P1): Section 6 extensions cell now describes the composite pretest workflow accurately: QUG + linearity (Stute, Yatchew-HR) on the two-period path. The notebook no longer claims this verifies parallel trends; that is closed by the multi-period joint variants (`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`).
- Drift test placebo-presence (P2): New `test_event_study_horizons_complete` asserts all 7 expected event-times (e=-4..-2, 0..3) are present, so per-horizon coverage tests can drop the `if e in event_times` guard that would silently pass on a truncated horizon list. The pre-placebo test no longer skips silently.

Lock changes:
- MAIN_SEED: 126 -> 87 (cleanest seed in the new DGP's seed search).
- Locked numbers: overall att=100.0, SE=0.7, CI [98.6, 101.4], d_lower=5.2, dose_mean=24.7, n_treated=59 (out of 60).

Docs synced:
- CHANGELOG entry rewritten for the new design path / target / 13-test count.
- Decision-tree code block updated to use the new DGP knobs (low=$5K, seed=87) and describe the boundary-anchored interpretation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
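And a sketch of what the internally consistent replacement looks like, under assumed names and a simplified noise model (the notebook's real cell may differ): the dose each DMA draws is the same dose the outcome equation uses, with the effect anchored at the support boundary so the target is WAS_d_lower.

```python
# Sketch of the replacement DGP (assumed names, simplified noise; see the
# notebook for the real generator). No post-hoc relabeling: outcomes come
# from the same dose vector HAD is later given.
import numpy as np

rng = np.random.default_rng(87)                # MAIN_SEED=87 per the commit
n_units, n_periods, launch_week = 60, 8, 5
TRUE_SLOPE = 100.0                             # visits per $1K above d_lower

dose = rng.uniform(5.0, 50.0, size=n_units)    # Uniform[$5K, $50K]; no $0 DMA
d_lower = dose.min()                           # support boundary, ~$5K

weeks = np.arange(1, n_periods + 1)
post = (weeks[None, :] >= launch_week).astype(float)
# Effect is linear in (dose - d_lower) by construction, which is why the
# non-testable Assumption 5/6 local-linearity advisory holds in this DGP.
effect = TRUE_SLOPE * (dose - d_lower)[:, None] * post
y = 500 + rng.normal(size=(n_units, n_periods)) + effect
```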
/ai-review
🔁 AI review rerun (requested by @igerber)

Head SHA:

Overall assessment: ✅ Looks good — the prior P0/P1 methodology issues appear resolved. I found only P2/P3 documentation/test consistency issues in the changed files; no unmitigated P0 or P1 findings remain.

Executive summary
Methodology: Previous re-review blockers look addressed. The notebook/test DGP now keeps post-treatment doses strictly positive, and the main tutorial narrative now documents Design 1 / WAS_d_lower.
Code Quality: No material findings in changed executable library code.
Performance: No material findings.
Maintainability
Tech Debt: No separate tech-debt finding.
Security: No material findings.
Documentation/Tests
Static-review note: I could not run
- Delete the stale T20 CHANGELOG bullet that was orphaned under [3.3.1] when the new [Unreleased] entry was added in R1; only the current Design 1 / WAS_d_lower description survives.
- Update the README index bullet from "WAS" to "WAS_d_lower" and spell out the boundary-anchored interpretation, matching the tutorial body.
- Make the practitioner decision-tree code block runnable: switch to n_periods=2 / cohort_periods=[2] so the panel directly satisfies HAD's overall-mode 2-period contract; show the explicit pre-period zeroing instead of "..." hand-waving.
- Correct the QUG description in the notebook extensions cell: it is the support-infimum test (`H0: d_lower = 0`, adjudicates between `continuous_at_zero` and `continuous_near_d_lower`), not a constant-per-period-effect test.
- Drift test pinning: add a sample-median assertion to lock the README/template "median ~$25K" prose; tighten the pre-launch placebo magnitude check from `< 0.5` to `< 0.1` so the notebook's "within ±0.06" claim cannot drift unnoticed (see the sketch after this list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
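A hedged sketch of the two tightened pins described above. The fixture-less layout and placeholder data are assumptions chosen only so the sketch runs; the drift file's real module-scoped fixtures and locked vectors differ.

```python
# Illustrative drift pins only — data and names are placeholders, not the
# drift file's real fixtures or the tutorial's locked artifacts.
import numpy as np

DOSES = np.linspace(5.0, 45.0, 60)             # placeholder; median exactly 25.0
PLACEBO_ATTS = {-4: 0.03, -3: -0.05, -2: 0.06}  # placeholder pre-launch ATTs

def test_dose_median_locked():
    # Locks the README/template "median ~$25K" prose to the sample itself.
    assert abs(float(np.median(DOSES)) - 25.0) < 1.0

def test_prelaunch_placebos_tight():
    # Tightened from < 0.5 to < 0.1 so the notebook's "within ±0.06" claim
    # cannot drift unnoticed.
    for e in (-4, -3, -2):
        assert abs(PLACEBO_ATTS[e]) < 0.1
```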
/ai-review
🔁 AI review rerun (requested by @igerber)

Head SHA:

Overall Assessment: ✅ Looks good — no unmitigated P0 or P1 findings remain.

Executive Summary
Methodology: Affected method(s): No findings. The changed docs now line up with the registry on the Assumption 5/6 identification caveat and pointwise event-study CIs.
Code Quality: No findings. The PR does not change estimator or inference implementation; executable changes are limited to the new drift test.
Performance: No findings. The test file uses module-scoped fixtures and is small enough that I do not see a meaningful regression risk.
Maintainability
Tech Debt: No separate finding.
Security: No findings. I did not see secrets, credentials, or unsafe new execution surfaces in the changed files.
Documentation/Tests
Force-pushed d60f9c1 to 5e66491
Add Tutorial 20: HAD for National Brand Campaign with Regional Spend Intensity

Practitioner walkthrough for HeterogeneousAdoptionDiD on the no-untreated-controls case: every market got the campaign at varying intensity and there is no clean comparison group. Fills the structural gap T14 (ContinuousDiD) cannot address.

Notebook scope (23 cells, 13 markdown / 10 code, mirrors T19's structure):
- Sections 1-3: framing the no-untreated-controls measurement problem, setup imports, synthetic 60-DMA / 8-week panel with Uniform[$5K, $50K] regional add-on spend (every DMA participates, no DMA at $0). DGP is internally consistent: outcomes are generated from the dose values HAD then sees, no post-hoc relabeling.
- Section 4: overall WAS_d_lower fit on a 2-period (pre/post mean) collapse; HAD's overall mode requires exactly 2 periods (had.py:952-959). Locked headline: per-$1K marginal effect of 100 weekly visits per DMA above the boundary spend (95% CI [98.6, 101.4]), with design auto-detection landing on `continuous_near_d_lower` (Design 1) and target `WAS_d_lower`. Surfaces the Assumption 5/6 advisory the library fires for Design 1 and explains why it holds in this DGP (linear by construction).
- Section 5: multi-week event-study fit on the 8-week panel, per-week WAS_d_lower for e=0..3 (~100 each, CIs cover truth) and pre-launch placebos at e=-2..-4 sitting on zero.
- Section 6: stakeholder communication template (T18/T19 markdown blockquote pattern), per-DMA dollar-lift interpretation `(actual_dose - d_lower) * WAS_d_lower` (worked example after this message), Assumption 6 caveat.
- Section 7: extensions (population-weighted/survey path, composite pretest workflow described accurately as QUG support-infimum test + linearity tests, mass-point design path), related-tutorials cross-links (T01, T02, T14, T17, T18, T19), summary checklist.

Drift detection: companion tests/test_t20_had_brand_campaign_drift.py (13 tests, 0.06s, mirrors T19's test-file-only pattern; T19's notebook itself has zero in-notebook asserts). Pins panel composition including sample median, design auto-detection / target / d_lower, overall WAS_d_lower / SE / CI endpoints to one-decimal display, dose mean, n_units, full event-study horizon presence (e=-4..-2, 0..3), per-week post-launch coverage of TRUE_SLOPE=100, and zero coverage at every placebo horizon (|placebo_att| < 0.1). Tight `round(_, 1) == X.X` pins throughout; HAD's analytical SE path is bit-identical regardless of backend env (no Rust kernel involved). Locked DGP seed: MAIN_SEED=87.

Documentation integration:
- docs/tutorials/README.md: new T20 entry following T18/T19's 5-bullet pattern.
- docs/doc-deps.yaml: T20 added to the existing diff_diff/had.py entry; cross-link to docs/practitioner_decision_tree.rst added.
- docs/practitioner_decision_tree.rst: `.. tip::` block at the end of `section-no-untreated` (Universal Rollout, landed on main via PR #389) cross-links to T20 for the full walkthrough.
- CHANGELOG.md: new ### Added bullet under [Unreleased].

Out of scope (queued in project_had_followups.md memory):
- _handle_had in practitioner.py:_HANDLERS map.
- HAD entries in llms-full.txt / choosing_estimator.rst.
- Pretest workflow tutorial, weighted/survey HAD tutorial, mass-point design demo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
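The Section 6 interpretation rule in numbers — a worked example using the locked estimates quoted above, not output from actually running the notebook:

```python
# Worked example of the per-DMA dollar-lift rule (locked numbers from the
# commit message above; illustrative, not notebook output).
was_d_lower = 100.0   # per-$1K marginal effect above the boundary spend
d_lower = 5.2         # locked boundary spend, in $K

actual_dose = 30.0    # a DMA that spent $30K on the campaign
lift = (actual_dose - d_lower) * was_d_lower
print(f"estimated extra weekly visits: {lift:,.0f}")  # ~2,480

# The naive actual_dose * was_d_lower = 3,000 overstates the lift: the effect
# is anchored at the ~$5K support boundary, so only spend above d_lower
# translates into WAS_d_lower-scaled visits.
```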
Force-pushed 5e66491 to 846fd8f
Summary
- docs/tutorials/20_had_brand_campaign.ipynb: practitioner walkthrough for HeterogeneousAdoptionDiD on a 60-DMA, 8-week panel where every market got the campaign at varying intensity and no untreated comparison group exists. Covers headline WAS on a 2-period collapse (design='auto' → continuous_at_zero), multi-week event study with per-week pointwise CIs and pre-launch placebos, and a stakeholder communication template. 23 cells (13 markdown / 10 code), mirrors T19's structure, no in-notebook drift cell.
- tests/test_t20_had_brand_campaign_drift.py: 12 drift tests with module-scoped fixtures pinning panel composition (60 DMAs, 1 at $0), design auto-detection (continuous_at_zero), overall WAS / SE / CI to one-decimal display, dose mean, n_units, and per-week event-study coverage of TRUE_SLOPE=100 plus zero-coverage at every placebo horizon. Mirrors T19's test-file-only pattern (T19's notebook itself has zero in-notebook asserts).
- New diff_diff/had.py entry in docs/doc-deps.yaml; extend docs/practitioner_decision_tree.rst § "Varying Spending Levels" with a new subsection for the no-untreated-controls regime + tip cross-linking T20; add T20 entry to docs/tutorials/README.md and [Unreleased] section in CHANGELOG.md.

Methodology references (required if estimator / math changes)
- HeterogeneousAdoptionDiD estimator (shipped v3.3.0) on a synthetic panel.

Validation
- tests/test_t20_had_brand_campaign_drift.py (12 tests, 0.06s, all passing).

Security / privacy
Generated with Claude Code