HAD Phase 2b: multi-period event-study extension (Appendix B.2)#350
Lifts `aggregate="event_study"` scaffolding left by Phase 2a.
`HeterogeneousAdoptionDiD.fit(..., aggregate="event_study")` now returns
a new `HeterogeneousAdoptionDiDEventStudyResults` dataclass with per-
event-time WAS estimates on multi-period panels. All three Phase 2a
design paths (continuous_at_zero, continuous_near_d_lower, mass_point)
are reused verbatim on per-horizon first differences anchored at
`Y_{g, F-1}` (uniform F-1 baseline, consistent with paper Garrett
application).
Staggered-timing panels are auto-filtered to the last-treatment cohort
with a `UserWarning` per paper Appendix B.2 prescription ("did_had may
be used only for the LAST treatment cohort in a staggered design").
Pre-period placebos included for `e <= -2`; the anchor `e = -1` is
skipped since `ΔY = 0` there by construction.
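A minimal sketch of the event-time indexing described above, assuming a balanced long panel with hypothetical column names `unit`, `time`, and `y`. The helper `per_horizon_first_differences` is illustrative only and is not the library's internal aggregator.

```python
import pandas as pd


def per_horizon_first_differences(df, first_treat_period):
    """Return {event_time: Y_{g,t} - Y_{g,F-1}} keyed by e = index(t) - index(F).

    Hypothetical helper: assumes a balanced panel with columns
    'unit', 'time', 'y' and a naturally sortable time column.
    """
    wide = df.pivot(index="unit", columns="time", values="y")
    periods = sorted(wide.columns)
    f_idx = periods.index(first_treat_period)
    if f_idx == 0:
        raise ValueError("need at least one pre-period to anchor at F-1")
    anchor = wide[periods[f_idx - 1]]  # Y_{g, F-1}
    diffs = {}
    for idx, period in enumerate(periods):
        e = idx - f_idx
        if e == -1:
            continue  # anchor period: the difference is zero by construction
        diffs[e] = wide[period] - anchor
    return diffs


panel = pd.DataFrame(
    {
        "unit": [1, 1, 1, 1, 2, 2, 2, 2],
        "time": [1, 2, 3, 4, 1, 2, 3, 4],
        "y": [1.0, 1.5, 3.0, 4.0, 0.5, 0.5, 0.75, 1.0],
    }
)
diffs = per_horizon_first_differences(panel, first_treat_period=3)
event_times = sorted(diffs)  # e = -1 is absent; e = -2 is a placebo horizon
```

With four periods and `F = 3`, the retained horizons are `e = -2` (placebo), `e = 0`, and `e = 1`.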
Per-horizon SEs use INDEPENDENT sandwiches: continuous paths use the
CCT-2014 robust SE from Phase 1c divided by `|den|`; mass-point path
uses the Phase 2a structural-residual 2SLS sandwich computed on each
horizon's first differences. Pointwise CIs match the paper's own
Pierce-Schott application (Figure 2); joint cross-horizon covariance
is deferred to a follow-up PR (tracked in TODO.md).
All Phase 2a policy guards (reciprocal regime partition, d_lower
contracts, NaN contract via safe_inference, sklearn get/set_params
atomicity) are preserved identically on the event-study path. The
multi-period panel validator adds dose-contiguity enforcement (pre-
periods < post-periods in natural ordering) and rejects non-monotonic
dose sequences with a pointer to ChaisemartinDHaultfoeuille.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Paper Appendix B.2: "in designs with variation in treatment timing,
there must be an untreated group, at least till the period where the
last cohort gets treated." Previously the staggered last-cohort filter
dropped BOTH earlier cohorts AND never-treated units (first_treat=0),
which was undocumented and changed the comparison set.
Fixed: `_validate_had_panel_event_study` now keeps `first_treat == F_last`
AND `first_treat == 0` (never-treated), dropping only earlier cohorts
(`first_treat` in `dropped_cohorts`). Never-treated units satisfy the
dose invariant at every period (D=0 throughout) and serve as the
untreated-group comparison per the paper's requirement. Matches
CallawaySantAnna's never-treated handling convention.
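As a rough illustration of the retained-sample contract, the sketch below keeps the last cohort plus never-treated units and warns about the dropped cohorts. The function name `filter_last_cohort`, the one-row-per-unit frame, and the integer cohort labels are assumptions made for this example; the real validator works on the full panel.

```python
import warnings

import pandas as pd


def filter_last_cohort(df, first_treat_col="first_treat"):
    """Keep first_treat == F_last and first_treat == 0; drop earlier cohorts."""
    labels = df[first_treat_col].unique().tolist()
    cohorts = sorted(c for c in labels if c != 0)  # 0 marks never-treated
    if len(cohorts) <= 1:
        return df, {"F_last": cohorts[-1] if cohorts else None, "dropped_cohorts": []}
    f_last, dropped = cohorts[-1], cohorts[:-1]
    keep = df[first_treat_col].isin([0, f_last])
    warnings.warn(
        f"Staggered panel: keeping cohort {f_last} plus never-treated units; "
        f"dropping earlier cohorts {dropped}.",
        UserWarning,
    )
    return df[keep].copy(), {"F_last": f_last, "dropped_cohorts": dropped}


units = pd.DataFrame({"unit": [1, 2, 3, 4], "first_treat": [0, 3, 5, 5]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept, info = filter_last_cohort(units)
```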
Updated:
- Warning message names the "plus never-treated" composition.
- REGISTRY.md Note (Phase 2b last-cohort filter) documents the new
contract explicitly.
- Function docstring clarifies `data_filtered` and `filter_info` shape.
- New test `test_staggered_filter_retains_never_treated_units` pins
the contract: kept first_treat values are exactly {0, F_last}.
- Existing tests updated to reflect new n_kept / n_dropped counts.
Also addresses AI review P3s:
- Clarify sort-key comment (no false "str fallback" claim).
- Drop misleading NaN-rendering comment on event-study summary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Overall Assessment ⛔ Blocker
Executive Summary
Methodology
Code Quality
No additional findings.
Performance
No additional findings.
Maintainability
No additional findings beyond the documentation/test issues below.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
Execution note: I did not run the test suite in this environment because the available Python on PATH does not have the project dependencies ( |
…tection
**P0 (blocker):** `_aggregate_multi_period_first_differences` reuses
`D_{g, F}` as the single regressor for every event-time horizon. Without
validation, panels where a unit's dose varies across post-treatment
periods silently misattribute later-horizon effects to the period-F dose.
Fix: `_validate_had_panel_event_study` now rejects panels where any unit
has time-varying dose across post-periods (within-unit spread beyond
float tolerance), with a `ValueError` redirecting to
ChaisemartinDHaultfoeuille for genuinely time-varying regimes.
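One way such a constant-dose guard could look, as a sketch only: compute each unit's dose spread over the post-periods and reject when it exceeds a tolerance. The column names (`unit`, `time`, `dose`), the function name, and the `atol` default are assumptions for this example, not the validator's actual values.

```python
import pandas as pd


def check_constant_post_dose(df, first_treat_period, atol=1e-12):
    """Raise if any unit's dose varies across periods t >= F."""
    post = df[df["time"] >= first_treat_period]
    spread = post.groupby("unit")["dose"].agg(lambda d: d.max() - d.min())
    bad = spread[spread > atol]
    if not bad.empty:
        raise ValueError(
            f"{len(bad)} unit(s) have time-varying post-period dose "
            f"(e.g. unit {bad.index[0]!r}); the period-F dose cannot serve "
            "as the single regressor for every horizon."
        )


times = [1, 2, 3, 4]
constant = pd.DataFrame(
    {"unit": [1] * 4 + [2] * 4, "time": times * 2,
     "dose": [0, 0, 2.0, 2.0, 0, 0, 1.0, 1.0]}
)
varying = constant.assign(dose=[0, 0, 2.0, 2.0, 0, 0, 1.0, 1.5])

check_constant_post_dose(constant, first_treat_period=3)  # passes silently
try:
    check_constant_post_dose(varying, first_treat_period=3)
    rejected = False
except ValueError:
    rejected = True
```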
**P1:** Staggered-timing auto-filter previously only ran inside
`if first_treat_col is not None`. Multi-cohort panels without cohort
metadata slipped through, treating later-cohort units as zero-dose
"controls" at the inferred F, violating Appendix B.2's last-cohort-only
contract.
Fix: When `first_treat_col is None`, the validator computes per-unit
first-positive-dose period from the dose path. If multiple distinct
cohorts are detected, it raises a `ValueError` directing users to
pass `first_treat_col` (which activates the last-cohort auto-filter)
or use ChaisemartinDHaultfoeuille for full staggered support.
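The cohort inference from the dose path can be sketched as follows, assuming hypothetical columns `unit`, `time`, `dose` and a numeric time column. Both function names are made up for this illustration.

```python
import pandas as pd


def first_positive_dose_period(df):
    """Per-unit first period with dose > 0; never-treated units are absent."""
    treated = df[df["dose"] > 0]
    return treated.groupby("unit")["time"].min()


def reject_unannotated_staggered(df):
    """Raise when the dose path implies more than one treatment cohort."""
    cohorts = sorted(first_positive_dose_period(df).unique().tolist())
    if len(cohorts) > 1:
        raise ValueError(
            f"Detected multiple cohorts {cohorts} without cohort metadata; "
            "pass first_treat_col or use an estimator with staggered support."
        )
    return cohorts


panel = pd.DataFrame(
    {
        "unit": [1] * 4 + [2] * 4 + [3] * 4,
        "time": [1, 2, 3, 4] * 3,
        "dose": [0, 0, 1.0, 1.0,   # unit 1: first treated at t=3
                 0, 0, 0, 1.0,     # unit 2: first treated at t=4
                 0, 0, 0, 0],      # unit 3: never treated
    }
)
inferred = first_positive_dose_period(panel).to_dict()
try:
    reject_unannotated_staggered(panel)
    raised = False
except ValueError:
    raised = True
```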
**P2 (docs):** Reconciled contradictory REGISTRY guidance between the
legacy edge-case note (line ~2251) and the new Phase 2b last-cohort
filter note. Both now describe the auto-filter + front-door rejection
of un-annotated staggered panels consistently.
**P2 (tests):** Added regression tests for both blockers:
- `test_time_varying_post_F_dose_rejected`
- `test_staggered_without_first_treat_col_rejected`
Also added a **Note (Phase 2b constant-dose requirement)** block to
REGISTRY documenting the new validator guard. TODO.md entry updated
to reflect front-door rejection of time-varying doses (not silent
reuse as before).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
…P1 ordered time
**P1 (first_treat_col vs dose mismatch):** The last-cohort filter trusted
`first_treat_col` without validating it against the observed dose path. A
swapped or mistyped cohort label could silently retain the wrong cohort as
F_last.
Fix: `_validate_had_panel_event_study` now cross-validates each unit's
declared first_treat against their actual first-positive-dose period:
- declared == 0: unit must have D=0 at every period
- declared == F_g > 0: unit's first period with D>0 must equal F_g
Any mismatch raises `ValueError` with an example unit, declared value, and
actual first-positive period.
**P1 (unordered time labels):** Event-study chronology was inferred via raw
`sorted()` on period labels. For object/string dtypes that falls back to
lexicographic sort, which silently misorders panels like
"pre1"/"pre2"/"post1"/"post2" or month-name labels.
Fix: Event-study path now requires a numeric, datetime, or
ordered-categorical time column. Object/string dtypes raise a front-door
`ValueError` directing users to convert. Ordered categoricals are sorted by
their declared category order (not the underlying string), via a dtype-aware
`_sort_key` reused by both the validator and the multi-period aggregator.
**P3 (docstring):** Class docstring no longer says the event-study extension
is "queued for Phase 2b"; now documents both aggregation modes with pointers
to the respective result classes.
**Tests added:**
- `test_first_treat_col_mismatch_with_dose_raises` pins the cross-validation
contract.
- `test_unordered_string_time_col_rejected` pins front-door rejection of
object dtypes.
- `test_ordered_categorical_time_col_accepted` confirms ordered categoricals
sort by category order and fit successfully.
Minor: added `observed=False` to the categorical-groupby in the balance check
to silence the pandas FutureWarning while preserving behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
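The declared-versus-observed cross-validation above could be sketched like this. Column names (`unit`, `time`, `dose`, `first_treat`) and the helper name `find_first_treat_mismatches` are assumptions for the example; the actual validator raises a `ValueError` instead of returning a list.

```python
import pandas as pd


def find_first_treat_mismatches(df):
    """Return (unit, declared, actual) tuples where the label contradicts the dose path."""
    declared = df.groupby("unit")["first_treat"].first()
    actual = df[df["dose"] > 0].groupby("unit")["time"].min()
    mismatches = []
    for unit, decl in declared.items():
        act = actual.get(unit)  # None when the unit never has D > 0
        # declared == 0 requires D = 0 everywhere; otherwise first D > 0 must equal it
        consistent = (act is None) if decl == 0 else (act == decl)
        if not consistent:
            mismatches.append(
                (int(unit), int(decl), None if act is None else int(act))
            )
    return mismatches


panel = pd.DataFrame(
    {
        "unit": [1] * 4 + [2] * 4 + [3] * 4,
        "time": [1, 2, 3, 4] * 3,
        "dose": [0, 0, 1.0, 1.0, 0, 0, 0, 1.0, 0, 0, 0, 0],
        "first_treat": [3] * 4 + [3] * 4 + [0] * 4,  # unit 2 is mislabeled
    }
)
mismatches = find_first_treat_mismatches(panel)
```

Unit 2 is declared as cohort 3, but its dose first turns positive at period 4, so it is the only mismatch reported.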
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment ⛔ Blocker
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
…docstrings
**P0 (cohort sort key):** `_validate_had_panel_event_study` sorted
first_treat_col values with raw Python `(x is None, x)` while `time_col` was
already required to be ordered (numeric/datetime/ordered categorical). On
ordered-categorical staggered panels where chronological order differs from
lexicographic order, `F_last = cohorts[-1]` silently picked the
lexicographically latest cohort, not the chronologically latest. That keeps
the wrong cohort and returns event-study estimates for the wrong estimand.
Fix: Promoted the dtype-aware `_sort_key` (ordered-categorical uses declared
category index; numeric/datetime use natural order) to the top of the
validator, just after the time-dtype check. Cohort sorting, pre/post period
sorting, contiguity check, and the staggered-without-first_treat detection
all now share this single `_sort_key`. Removed the duplicate `_sort_key`
definition that was sitting further down in the same function.
**P3 (stale docstrings):**
- `fit()` no longer opens with "two-period panel"; now describes both
aggregation modes with links to the respective result classes.
- `HeterogeneousAdoptionDiDEventStudyResults.n_units` docstring no longer
says "only last-cohort units"; now accurately reports last-cohort PLUS
never-treated retained.
**Test added:** `test_staggered_ordered_categorical_chooses_chronological_last`
uses categories `["q1", "q2", "q3", "q10"]` where lex max of the two cohorts
(`"q2", "q10"`) is `"q2"` but chronological last is `"q10"`; asserts the fix
picks `"q10"` as `F_last` and retains only the q10-cohort units.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
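To make the lexicographic-versus-chronological gap concrete, here is a small sketch of a dtype-aware sort key. `make_sort_key` is a hypothetical stand-in for the internal `_sort_key`, whose real signature is not shown in this thread.

```python
import pandas as pd


def make_sort_key(time_values):
    """Build a key function that orders period labels chronologically."""
    dtype = time_values.dtype
    if isinstance(dtype, pd.CategoricalDtype):
        if not dtype.ordered:
            raise ValueError("categorical time column must be ordered")
        position = {cat: i for i, cat in enumerate(dtype.categories)}
        return position.__getitem__  # declared category index, not the string
    if pd.api.types.is_numeric_dtype(dtype) or pd.api.types.is_datetime64_any_dtype(dtype):
        return lambda x: x  # natural order is already chronological
    raise ValueError("time column must be numeric, datetime, or ordered categorical")


categories = ["q1", "q2", "q3", "q10"]
time = pd.Series(pd.Categorical(categories, categories=categories, ordered=True))
key = make_sort_key(time)

cohorts = ["q2", "q10"]
lexicographic_last = sorted(cohorts)[-1]            # raw string comparison
chronological_last = sorted(cohorts, key=key)[-1]   # declared category order
```

Raw string sorting ranks `"q2"` after `"q10"`, while the category-index key picks `"q10"`, which is the case the new test pins.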
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Review basis: static diff review only; I did not run |
…balance check
**P1:** Round 3 added `observed=False` to the balance-check groupby to
silence a pandas FutureWarning, but that creates a false-unbalance bug:
on ordered-categorical `time_col` with extra category levels beyond the
observed periods, `observed=False` materializes zero-count unit-period
cells for the unused levels, and the balance check rejects the panel.
Fix: switched to `observed=True`. This tells categorical groupby to
count only OBSERVED unit-period cells, matching the `periods_list`
(observed uniques) that the rest of the validator is keyed to. No
change for numeric / datetime time columns.
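The false-unbalance behavior can be reproduced on a toy panel. The column names `unit` and `time` are assumptions, and this is only a demonstration of the pandas `observed` flag, not the validator's balance check itself.

```python
import pandas as pd

levels = ["pre0", "pre1", "pre2", "post1", "post2", "post3"]
observed_periods = ["pre1", "pre2", "post1", "post2"]

rows = [(unit, period) for unit in (1, 2) for period in observed_periods]
panel = pd.DataFrame(rows, columns=["unit", "time"])
panel["time"] = pd.Categorical(panel["time"], categories=levels, ordered=True)

# observed=False expands every unit against ALL declared category levels,
# so the unused levels ("pre0", "post3") appear as zero-count cells.
cells_all_levels = panel.groupby(["unit", "time"], observed=False).size()

# observed=True counts only the unit-period cells present in the data.
cells_observed = panel.groupby(["unit", "time"], observed=True).size()

n_empty_cells = int((cells_all_levels == 0).sum())
balanced_when_observed = bool((cells_observed == 1).all())
```

A balance check that requires every cell to have exactly one row would reject this panel under `observed=False` and accept it under `observed=True`.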
**Test added:** `test_ordered_categorical_with_unused_levels_accepted`
declares categories `["pre0", "pre1", "pre2", "post1", "post2", "post3"]`
but only observes `{"pre1", "pre2", "post1", "post2"}`; asserts the
fit succeeds with `F="post1"` and `event_times=[-2, 0, 1]`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment ✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Review basis: static diff review only; I could not run |
`HeterogeneousAdoptionDiDEventStudyResults.to_dict()` promised
JSON-serializable output but previously returned raw numpy scalars via
`list(ndarray)`, which `json.dumps` can't serialize. The `F` field and
`filter_info.F_last` could also hold numpy scalars or pandas Timestamps that
break serialization.
Fix:
- Per-horizon arrays use `.tolist()` (unwraps numpy scalars to native Python).
- New `_json_safe_scalar` helper coerces numpy scalars via `.item()` and
pandas Timestamp/Timedelta via `.isoformat()`; everything else passes through.
- New `_json_safe_filter_info` helper applies `_json_safe_scalar` to `F_last`
and each element of `dropped_cohorts`, and casts counts to native `int`.
- `to_dict()` now applies these helpers consistently.
**Test added:** `test_to_dict_json_serializable` asserts
`json.dumps(result.to_dict())` succeeds and the round-trip values parse back
as native Python types (int, float, list).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
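A sketch of the coercion rules described above, using a hypothetical `json_safe_scalar` function in place of the private helper. The payload keys are illustrative and do not reproduce the full `to_dict()` schema.

```python
import json

import numpy as np
import pandas as pd


def json_safe_scalar(value):
    """Coerce numpy / pandas scalars to JSON-friendly Python values."""
    if isinstance(value, (pd.Timestamp, pd.Timedelta)):
        return value.isoformat()
    if isinstance(value, np.generic):
        return value.item()  # e.g. np.int64 -> int
    return value  # everything else passes through unchanged


event_times = np.array([-2, 0, 1])

# list(ndarray) keeps np.int64 elements, which json.dumps rejects.
try:
    json.dumps({"event_times": list(event_times)})
    raw_list_ok = True
except TypeError:
    raw_list_ok = False

payload = {
    "F": json_safe_scalar(np.int64(2004)),
    "F_as_date": json_safe_scalar(pd.Timestamp("2004-01-01")),
    "event_times": event_times.tolist(),  # native Python ints
}
round_trip = json.loads(json.dumps(payload))
```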
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment ✅ Looks good
No unmitigated P0 or P1 findings in the changed diff.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Summary
- Lifts `aggregate="event_study"` scaffolding from Phase 2a.
- `HeterogeneousAdoptionDiD.fit(..., aggregate="event_study")` now returns a new `HeterogeneousAdoptionDiDEventStudyResults` dataclass with per-event-time WAS estimates on multi-period panels (paper Appendix B.2).
- All three Phase 2a design paths are reused on per-horizon first differences anchored at `Y_{g, F-1}` (uniform baseline, consistent with Garrett et al. application).
- Staggered-timing panels are auto-filtered to the last-treatment cohort plus never-treated units; a `UserWarning` surfaces kept/dropped counts and dropped cohort labels.
- Per-horizon SEs use independent sandwiches (continuous: CCT-2014 robust SE divided by `|den|`; mass-point: Phase 2a structural-residual 2SLS). Pointwise CIs match paper's Pierce-Schott application. Joint cross-horizon covariance deferred to follow-up.
Methodology references (required if estimator / math changes)
- `HeterogeneousAdoptionDiD` (event-study mode)
- Uniform `F-1` baseline for all horizons (paper review line 232 suggests asymmetric `Y_{g,t} - Y_{g,1}` for pre-periods; uniform baseline matches the paper's Garrett Section 5.1 application and simplifies event-time indexing). Documented in REGISTRY Note (Phase 2b baseline convention).
- `D_{g,F}` used as single dose regressor across all horizons (paper convention assumes "once treated, stay treated with same dose"). Time-varying post-period dose deferred. Documented in TODO.md.
Validation
- `tests/test_had.py` (+41 event-study tests, all passing)
- … `filter_info` populated, sample composition (last-cohort + never-treated retained, earlier cohorts dropped), explicit "retain never-treated" pin test
- … `atol=1e-12`
- `e = -1` not in `event_times`; event-time indexing symmetric around F
- `HeterogeneousAdoptionDiDEventStudyResults` API: `summary()`, `to_dict()`, `to_dataframe()`, `repr`
Security / privacy
Generated with Claude Code