HAD Phase 4.5 B: weighted mass-point 2SLS + event-study survey composition + sup-t bootstrap by igerber · Pull Request #363 · igerber/diff-diff

igerber · 2026-04-24T20:33:40Z

Summary

Closes the two Phase 4.5 A NotImplementedError gates:

design=\"mass_point\" + weights/survey at diff_diff/had.py:2807-2817
aggregate=\"event_study\" + weights/survey at diff_diff/had.py:2376-2382

Ships weighted 2SLS mass-point path, event-study survey composition for all three designs, and a multiplier-bootstrap sup-t simultaneous confidence band on the weighted event-study path.

Weighted 2SLS in _fit_mass_point_2sls follows Wooldridge 2010 Ch. 12 pweight convention (w² in HC1 meat, w·u in CR1 cluster score, weighted bread Z'WX). HC1 and CR1 bit-exact with estimatr::iv_robust(..., weights=, clusters=) at atol=1e-10 on 5 DGPs (new cross-language golden).
Per-unit IF on β̂-scale scales so compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1] at atol=1e-10 (PR HAD Phase 4.5: survey support on continuous-dose paths #359 convention, applied uniformly across continuous + mass-point).
Event-study wiring: per-horizon loop threads weights_unit_full + resolved_survey_unit_full, composes Binder-TSL variance per horizon via compute_survey_if_variance, populates survey_metadata + variance_formula + effective_dose_mean (previously hardcoded None).
Sup-t multiplier bootstrap reuses generate_survey_multiplier_weights_batch / generate_bootstrap_weights_batch from diff_diff.bootstrap_utils (stratum centering, FPC, lonely-PSU). Perturbations xi @ Psi with NO (1/n) prefactor (matches staggered_bootstrap.py:373). At H=1 reduces to Φ⁻¹(1-α/2) ≈ 1.96 (locked by TestSupTReducesToNormalAtH1).

Public API

HeterogeneousAdoptionDiD.__init__ gains n_bootstrap: int = 999, seed: Optional[int] = None.
HeterogeneousAdoptionDiD.fit() gains cband: bool = True (weighted event-study only).
_fit_mass_point_2sls signature: new weights= + return_influence= kwargs; always returns 3-tuple (beta, se, psi).
HeterogeneousAdoptionDiDEventStudyResults gains variance_formula, effective_dose_mean, cband_low, cband_high, cband_crit_value, cband_method, cband_n_bootstrap (all None on unweighted fits).
All new fields surfaced in to_dict, to_dataframe, summary, __repr__.

Stability invariants

Unweighted mass-point and event-study paths bit-exact vs pre-PR (all 260 existing HAD tests pass unchanged).
PR HAD Phase 4.5: survey support on continuous-dose paths #359 survey tests (TestHADSurvey) unchanged on static path.
Non-pweight SurveyDesigns (aweight, fweight, replicate) rejected with NotImplementedError on both new paths (reciprocal-guard discipline).
filter_info identical across unweighted / uniform-weighted / informatively-weighted fits (identification-theory, not sampling-domain).
HAD pretests (qug_test, stute_test, yatchew_hr_test, joint variants, did_had_pretest_workflow) still don't accept weights/survey — deferred to Phase 4.5 C / C0.

Test plan

pytest tests/test_had.py tests/test_bias_corrected_lprobust.py tests/test_nprobust_port.py tests/test_estimatr_iv_robust_parity.py -v — all 380 pass, 2 skipped
Rscript benchmarks/R/generate_estimatr_iv_robust_golden.R — regenerates 5-DGP golden
PR HAD Phase 4.5: survey support on continuous-dose paths #359 survey regression (TestHADSurvey unchanged)
TestSupTReducesToNormalAtH1 confirms q ≈ 1.96 at H=1, G=500, B=5000
TestEstimatrIVRobustHC1Parity / TestEstimatrIVRobustCR1Parity confirm cross-language bit-parity at atol=1e-10
TestIFScaleInvariant confirms PR HAD Phase 4.5: survey support on continuous-dose paths #359 convention holds uniformly for 2SLS IF

🤖 Generated with Claude Code

…t-study survey composition + sup-t bootstrap Closes the two Phase 4.5 A NotImplementedError gates on design='mass_point' + weights/survey and aggregate='event_study' + weights/survey. Weighted 2SLS in _fit_mass_point_2sls follows the Wooldridge 2010 Ch. 12 pweight convention (w² in HC1 meat, w·u in CR1 cluster score, weighted bread Z'WX). HC1 and CR1 match estimatr::iv_robust bit-exactly at atol=1e-10 (new cross-language golden). Per-unit IF on β̂-scale scales so compute_survey_if_variance(psi, trivial) ≈ V_HC1 at atol=1e-10 (PR #359 convention applied uniformly). Event-study path threads weights + survey through the per-horizon loop, composing Binder-TSL variance per horizon and populating survey_metadata + variance_formula + effective_dose_mean (previously hardcoded None). New _sup_t_multiplier_bootstrap helper reuses generate_survey_multiplier_weights_batch / generate_bootstrap_weights_batch from diff_diff.bootstrap_utils — no custom Rademacher draws, no (1/n) prefactor. At H=1 reduces to Φ⁻¹(1-α/2) ≈ 1.96 (reduction-locked). New __init__ kwargs: n_bootstrap=999, seed=None. New fit() kwarg: cband=True. HeterogeneousAdoptionDiDEventStudyResults gains survey_metadata + variance_formula + effective_dose_mean + cband_* fields, surfaced through to_dict / to_dataframe / summary / __repr__. Unweighted event-study output (att/se) bit-exactly preserved; cband disabled on the unweighted path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-24T20:41:51Z

Overall Assessment

⛔ Blocker

Executive Summary

The weighted 2SLS HC1/CR1 core in _fit_mass_point_2sls() looks consistent with the updated registry/parity story; I did not count the documented classical-vs-estimatr deviation as a defect.
The new simultaneous-band path has a P0 NaN-gating bug: degenerate horizons can still emit finite cband_low/high, which is misleading statistical output.
The new survey sup-t implementation does not match its documented stratified-bootstrap method: it assumes stratum-centered PSU multipliers, but the reused helper does not center within strata.
The weights= event-study path violates the row-order contract by filtering with data_filtered.index, so valid non-RangeIndex inputs can misalign weights or fail.
The new TODO entries for analytical cross-horizon covariance and unweighted sup-t are properly tracked and are not blockers.

Methodology

P0 Impact: _sup_t_multiplier_bootstrap() excludes invalid horizons from the bootstrap t-statistics by replacing nonpositive/nonfinite SEs with NaN, but it then materializes cband_low/high with the raw se_per_horizon. That means a horizon with se == 0 gets a finite simultaneous band equal to the point estimate even though safe_inference() correctly sets the pointwise inference fields to NaN. This is partial NaN-guarding on new inference output and can silently overstate precision. Concrete fix: apply the same finite-and-positive SE mask when building cband_low/high, and if no horizons remain valid, blank the cband surface consistently. References: diff_diff/had.py:L2047-L2078, diff_diff/utils.py:L177-L210.
P1 Impact: The PR and registry say the new survey sup-t path reuses stratum-centered PSU multiplier draws, but generate_survey_multiplier_weights_batch() currently just draws iid weights within each stratum and applies FPC/lonely-PSU handling; it does not center within strata. The analytical survey variance target is explicitly centered at the stratum level, so the new HAD survey cband does not match the documented Binder-TSL/bootstrap contract for stratified designs. Concrete fix: center the PSU multipliers within each stratum for every draw, or equivalently center PSU-aggregated Psi within strata before perturbation, then add a stratified H=1 regression test against the analytical survey variance/critical-value target. References: diff_diff/had.py:L1933-L2078, diff_diff/bootstrap_utils.py:L556-L639, diff_diff/survey.py:L1257-L1313, docs/methodology/REGISTRY.md:L2305-L2315.

Code Quality

P1 Impact: The new weights= event-study path filters weights with w_full[data_filtered.index.to_numpy()], which treats DataFrame index labels as positional offsets. The public weights contract is row-order aligned, not index-label aligned. On a non-RangeIndex input this can either crash or, worse, silently attach the wrong weights to rows after filtering. Concrete fix: carry a positional keep mask out of _validate_had_panel_event_study(), or reset to a positional index before validation and slice weights by position only. Add tests with both custom integer and string indices. Reference: diff_diff/had.py:L3560-L3581.

Performance

No findings.

Maintainability

No findings beyond the documentation drift noted below.

Tech Debt

P3 Impact: The new follow-up items for weighted event-study analytical covariance and unweighted sup-t are properly tracked, so I do not count those deferred gaps against the PR. Concrete fix: none required in this PR. Reference: TODO.md:L95-L96.

Security

No findings.

Documentation/Tests

P3 Impact: Public HAD docstrings still describe the old Phase 4.5 A behavior: fit() says mass_point/aggregate="event_study" + survey/weights are unsupported, and the event-study result docstring still says Phase 2b has only pointwise CIs and survey_metadata is always None. That now contradicts both the code and the registry, and cband is not documented on the public fit() surface. Concrete fix: update the HeterogeneousAdoptionDiD.fit() and HeterogeneousAdoptionDiDEventStudyResults docstrings to reflect weighted event-study/mass-point support and the new cband fields; add regression tests for the zero-SE cband case and non-RangeIndex weights= case. References: diff_diff/had.py:L540-L545, diff_diff/had.py:L590-L591, diff_diff/had.py:L2666-L2706.

Path to Approval

NaN-gate the new simultaneous-band outputs exactly like the existing pointwise inference surface, and add a regression test where one horizon has se <= 0.
Fix weights= filtering on the event-study path to use positional row alignment rather than DataFrame index labels, and add tests with custom integer/string indices.
Make the survey sup-t perturbation actually stratum-centered (or center the PSU-aggregated IF matrix before applying multipliers), and add a stratified H=1 validation against the analytical survey variance target.

R1 P0 (methodology): cband_low/high now NaN-gated for horizons with se <= 0 or non-finite, matching the safe_inference contract for pointwise output. Previously a horizon with se=0 produced a finite band equal to the point estimate, overstating precision. R1 P1 (methodology): sup-t multiplier bootstrap now applies stratum-centered + small-sample-corrected PSU aggregation before the matmul, so Var_xi(xi @ Psi_psu_scaled) reproduces the analytical Binder-TSL variance term-for-term (V = sum_h (1-f_h)(n_h/(n_h-1)) sum_j (psi_hj - psi_h_bar)^2). Previously the bootstrap omitted centering and the small-sample correction; under stratified designs the critical value diverged from the analytical target. Verified at H=1 G=400 n_strata=4: bootstrap q=1.985 matches Phi^-1(0.975)=1.960. R1 P1 (code quality): event-study weights= filtering now translates data_filtered.index LABELS to POSITIONAL offsets via data.index.get_indexer, so non-RangeIndex inputs work correctly. Previous code treated index labels as positions and silently broke on custom int / string indices. Raises a clear ValueError when index alignment fails (duplicate or malformed labels). R1 P3 (docs): HeterogeneousAdoptionDiDEventStudyResults docstring updated to describe Phase 4.5 B weighted/survey support + new cband_* fields + variance_formula / effective_dose_mean; fit() docstring updated to reflect the Phase 4.5 B dispatch matrix and document the new cband kwarg. Regression tests: - test_zero_se_horizon_nan_gates_cband locks the P0 NaN gate - test_weights_nonrange_index_aligned_positionally locks the P1 row-order contract across RangeIndex vs custom int indices - Existing TestSupTReducesToNormalAtH1 covers the stratum-centered path via the unit-PSU reduction case Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T20:56:48Z

/ai-review

github-actions · 2026-04-24T21:02:32Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 082368a5e8982e78f780425cf4885397dd14534d

Overall Assessment

⚠️ Needs changes

Executive Summary

The previously reported blocker items look addressed: simultaneous-band NaN gating now matches safe_inference(), non-RangeIndex weights= slicing is positional, and ordinary stratified survey sup-t now centers PSU IFs before perturbation.
The weighted 2SLS core is aligned with the updated methodology registry; the documented classical-vs-estimatr difference is a registered deviation, not a defect.
P1 The new sup-t survey bootstrap is still inconsistent with the supported lonely_psu="adjust" mode. Singleton strata are pooled in the bootstrap helper, but _sup_t_multiplier_bootstrap() skips the matching singleton-PSU IF transformation, so the simultaneous band no longer targets the same Binder-TSL variance as compute_survey_if_variance(). diff_diff/had.py:L2064-L2127, diff_diff/bootstrap_utils.py:L599-L639, diff_diff/survey.py:L1683-L1704, docs/methodology/REGISTRY.md:L2305-L2315
P1 The new survey-enabled mass-point path accepts cluster=, then overwrites the CR1 SE with Binder-TSL while still surfacing vcov_type="cr1" / cluster_name. That is a silent inference mismatch on both overall and event-study fits. diff_diff/had.py:L3278-L3309, diff_diff/had.py:L3917-L4044, diff_diff/had.py:L4151-L4153
The new TODO items for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked and are not blockers. TODO.md:L95-L96

Methodology

P1 Impact: lonely_psu="adjust" is a supported survey mode, but the new sup-t path only matches the analytical Binder-TSL target for ordinary strata. In _sup_t_multiplier_bootstrap(), singleton strata are skipped under the assumption that the bootstrap helper zeros them, but generate_survey_multiplier_weights_batch() explicitly gives pooled nonzero draws for "adjust". The analytical helper instead centers singleton PSU scores around the global mean. That leaves the simultaneous band wrong for supported singleton-strata designs, with no warning. Concrete fix: either implement the "adjust" branch in _sup_t_multiplier_bootstrap() so pooled singleton PSU columns are transformed to the same target used by compute_survey_if_variance(), or reject lonely_psu="adjust" when cband=True until that derivation is implemented. Add an H=1 regression on a singleton-strata design comparing bootstrap variance / critical value to the analytical target. diff_diff/had.py:L2105-L2115, diff_diff/bootstrap_utils.py:L599-L639, diff_diff/survey.py:L1683-L1704
P1 Impact: on the new mass-point survey= path, cluster= does not affect final inference even though the result still reports CR1 metadata. _fit_mass_point_2sls() is called with cluster_arr, but its SE is then overwritten by compute_survey_if_variance(). The returned result still advertises vcov_type="cr1" and cluster_name=..., so users are told they got cluster-robust inference when they did not. Concrete fix: reject cluster= with survey= on mass-point HAD until a combined methodology exists, or implement the combined variance explicitly and relabel outputs accordingly. Add regression coverage for both overall and aggregate="event_study" paths. diff_diff/had.py:L3278-L3309, diff_diff/had.py:L3917-L4044, diff_diff/had.py:L4151-L4153

Code Quality

No findings beyond the inference-path issues above.

Performance

No findings.

Maintainability

No additional findings.

Tech Debt

P3 Impact: the follow-up gaps for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked, so they should not block this PR. Concrete fix: none required in this PR. TODO.md:L95-L96

Security

No findings.

Documentation/Tests

No separate blocking documentation issues. The needed test additions are the ones tied to the two P1 fixes above.

Path to Approval

Fix or explicitly reject lonely_psu="adjust" on the weighted event-study cband path, and add an H=1 singleton-strata regression that checks the bootstrap target against compute_survey_if_variance().
Resolve the survey= + cluster= mass-point interaction by either rejecting it or implementing the combined inference properly, and ensure result metadata no longer claims CR1 when Binder-TSL is the actual variance path.

R2 P1 (methodology): reject SurveyDesign(lonely_psu='adjust') with singleton strata in _sup_t_multiplier_bootstrap. The bootstrap helper pools singletons into a pseudo-stratum with NONZERO multipliers, but compute_survey_if_variance centers singleton PSU scores around the global mean — without the matching pseudo-stratum centering transform in the bootstrap, the simultaneous band target diverges from the analytical Binder-TSL variance. Clear NotImplementedError points users to lonely_psu='remove' (matches the 'remove' analytical target) or cband=False (skips bootstrap). 'remove' / 'certainty' continue to work unchanged. Deferred transform tracked for follow-up. R2 P1 (code quality): reject cluster= + weights/survey= on design='mass_point' on both static and event-study paths. The weighted path composes Binder-TSL variance via compute_survey_if_variance which was silently overriding the CR1 sandwich while result metadata still advertised vcov_type='cr1' and cluster_name=<col>. Clean NotImplementedError with a pointer to the two supported combinations: cluster= alone (unweighted CR1) or survey/weights alone (Binder-TSL). Combined cluster-robust survey inference requires a derivation not yet in scope. Regression tests (+4): - test_mass_point_survey_plus_cluster_rejected_static - test_mass_point_survey_plus_cluster_rejected_event_study - test_lonely_psu_adjust_with_singletons_rejected_on_cband - test_stratified_h1_sup_t_matches_analytical (H=1 quantile lock under n_strata=4, psu_per_unit: q=1.985, matches Phi^-1(0.975)) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T21:06:29Z

/ai-review

github-actions · 2026-04-24T21:17:53Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 4ecc6f437fbdaea479fec838f7d700a2ce452ee5

Overall Assessment

⚠️ Needs changes

Executive Summary

The two P1 findings from the prior review look addressed: lonely_psu="adjust" with singleton strata now hard-errors on the cband path, and cluster= is rejected on weighted mass-point overall/event-study fits instead of silently mixing CR1 and Binder inference. diff_diff/had.py:L2064-L2103, diff_diff/had.py:L3301-L3320, diff_diff/had.py:L3917-L3934, tests/test_had.py:L4850-L4992
The weighted 2SLS HC1/CR1 core is aligned with the Methodology Registry and the new estimatr parity fixture; the weighted-classical difference versus estimatr is a documented deviation, so that part is not a defect. docs/methodology/REGISTRY.md:L2296-L2301, tests/test_estimatr_iv_robust_parity.py:L45-L206
P1 The new sup-t bootstrap is still mis-specified on the supported survey=SurveyDesign(weights=...) event-study path when no explicit strata, psu, or fpc is present.
P1 The new mass-point weighted paths do not handle vcov_type="classical" consistently: the survey path overwrites classical SE with an HC1-aligned Binder composition while still labeling the result as classical, and the event-study cband path mixes classical SE with an HC1-scaled influence matrix.
The new TODO entries for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked and are non-blocking. TODO.md:L95-L96

Methodology

Affected methods: Wooldridge-style weighted 2SLS on the HAD mass-point path, Binder-TSL survey composition, and the new sup-t multiplier bootstrap.

P1 Impact: _sup_t_multiplier_bootstrap() only takes the survey-aware branch when strata, psu, or fpc is present, so the supported survey=SurveyDesign(weights=...) event-study path falls back to raw iid multipliers even though its analytical SE comes from the centered trivial-design Binder formula. The helper it is supposed to reuse already supports the no-strata/no-PSU case, and the registry/docstring both describe the survey branch as keyed off resolved_survey is not None, not “has explicit strata/psu/fpc”. This silently miscalibrates simultaneous bands on a supported survey path. Concrete fix: treat any non-None resolved_survey as a survey bootstrap input, including the implicit-PSU case, or reject survey=SurveyDesign(weights=...) when cband=True until that target is implemented. Add an H=1 regression for the trivial survey design. References: diff_diff/had.py:L1986-L2026, diff_diff/had.py:L2058-L2176, diff_diff/bootstrap_utils.py:L509-L579, diff_diff/survey.py:L1388-L1404, diff_diff/survey.py:L1586-L1592, docs/methodology/REGISTRY.md:L2305-L2315
P1 Impact: the new mass-point weighted paths accept vcov_type="classical" but the only returned influence function is explicitly HC1-scaled. On the survey= path, that HC1-aligned psi overwrites the classical sandwich SE via compute_survey_if_variance() while the result still reports vcov_type="classical"; on the event-study weights= + cband=True path, the band divides HC1-scale perturbations by classical se_e. That is an undocumented variance mismatch on changed code, not a documented registry deviation. Concrete fix: either reject vcov_type="classical" anywhere the weighted mass-point path needs psi (survey composition or cband), or derive/return a classical-aligned IF and thread it through both overall and event-study inference, with metadata updated to match the actual variance family. Add regressions for static survey= and event-study weights=/survey= with vcov_type="classical". References: diff_diff/had.py:L2426-L2482, diff_diff/had.py:L3336-L3367, diff_diff/had.py:L4097-L4147, docs/methodology/REGISTRY.md:L2296-L2303

Code Quality

No separate findings beyond the methodology issues above.

Performance

No findings.

Maintainability

No separate findings beyond the methodology issues above.

Tech Debt

P3 Impact: the follow-ups for weighted analytical cross-horizon covariance and unweighted sup-t are explicitly tracked, so they should not block this PR. Concrete fix: none required in this PR. References: TODO.md:L95-L96

Security

No findings.

Documentation/Tests

P2 Impact: the added tests lock HC1/CR1 parity, the prior re-review fixes, and the stratified survey sup-t branch, but they do not cover the supported trivial survey=SurveyDesign(weights="w") cband path or vcov_type="classical" on weighted mass-point survey/cband paths. That is why both P1 interactions above remain unguarded. Concrete fix: add those targeted regressions. References: tests/test_had.py:L4449-L4454, tests/test_had.py:L4644-L4770, tests/test_had.py:L4950-L4992, tests/test_estimatr_iv_robust_parity.py:L45-L206

Path to Approval

Make _sup_t_multiplier_bootstrap() use the survey-aware bootstrap for every non-None resolved_survey, including the no-strata/no-PSU case, or explicitly reject that cband combination until implemented; add an H=1 regression for survey=SurveyDesign(weights="w").
Resolve the weighted mass-point vcov_type="classical" interaction by either rejecting it on survey/cband paths or implementing a classical-aligned IF end-to-end; add regressions for static survey= and event-study weighted cband cases, and ensure result metadata matches the actual variance family.

Review based on diff inspection; I did not run the test suite in this environment.

R3 P1 (methodology — trivial-survey sup-t): _sup_t_multiplier_bootstrap now treats ANY non-None resolved_survey as survey-aware bootstrap input, including the trivial SurveyDesign(weights=...) case (no explicit strata / PSU / FPC). Previously the gate required explicit strata / psu / fpc, so the trivial survey path fell through to raw unit-level Rademacher — Var(xi @ Psi) = Σ psi², NOT the centered (n/(n-1))·Σ(ψ−ψ̄)² target that compute_survey_if_variance uses for the analytical SE. The existing "no strata → single implicit stratum" branch in _sup_t_multiplier_bootstrap now handles the trivial case end-to-end. H=1 trivial-survey regression locks q ≈ 1.96. R3 P1 (methodology — classical weighted mass-point): reject vcov_type='classical' on mass-point whenever the IF matrix is used (static survey= path; event-study survey= OR weights= shortcut with cband=True). The IF is HC1-scaled (sqrt((n-1)/(n-k)) factor in _fit_mass_point_2sls targets compute_survey_if_variance(psi, trivial) ≈ V_HC1[1,1]); mixing it with a classical analytical SE either through the survey Binder-TSL override or the sup-t bootstrap normalization would silently report a V_HC1-targeted SE under a classical vcov_type label. Rejection message points users to hc1. weights= shortcut + cband=False + classical continues to work unchanged (no IF consumption; analytical classical sandwich returned as-is). Regression tests (+4): - test_trivial_survey_h1_sup_t_matches_analytical: H=1 trivial SurveyDesign reduces to Normal quantile, confirms stratum-demean + sqrt(n/(n-1)) correction fires on the trivial case. - test_mass_point_classical_survey_rejected_static - test_mass_point_classical_event_study_with_cband_rejected - test_mass_point_classical_event_study_cband_false_accepts (complement: confirms classical+weights+cband=False still works, no IF consumption) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T21:40:41Z

/ai-review

github-actions · 2026-04-24T21:47:11Z

🔁 AI review rerun (requested by @igerber)

Head SHA: cb8a711e10275e2059c4ad8ce9199859d7ea5b52

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior re-review blockers around the trivial-survey sup-t branch and the classical/HC1 IF-mixing issue look addressed.
P1 The new public mass-point guard now rejects cluster= on all weighted fits, including the valid weights= shortcut path. That overreaches the actual incompatibility, because _fit_mass_point_2sls and the new estimatr parity tests explicitly support weighted CR1.
P1 The new HAD sup-t path hard-errors on lonely_psu="adjust" with singleton strata, but the Methodology Registry still says the reused bootstrap helper provides lonely-PSU handling on this path. That is an undocumented methodology deviation.
P2 Weighted event-study results now report n_units / n_obs_per_horizon as the full design size under zero-weight subpopulation fits, not the contributing positive-weight subset.
P2 The new event-study result docstring says all weighted paths use Binder linearization, but the implementation and registry still distinguish survey= Binder from the weights= shortcut’s analytical HC1/2SLS path.
The new TODO entries for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked and non-blocking.

Methodology

Severity: P1. Impact: fit() now rejects cluster= whenever weights_unit_full is not None, which blocks the documented and parity-tested weights= + weighted-CR1 mass-point path on both overall and event-study fits, even though the actual survey/Binder incompatibility only exists on the survey= path (and on event-study weights= only when cband=True). This is an undocumented methodology/API mismatch against the registry’s weighted-2SLS contract and the new CR1 parity harness. Concrete fix: narrow the guard from weights_unit_full is not None to the actual incompatible cases: resolved_survey_unit_full is not None on the static path, and on event-study only reject weights= when cluster= and cband=True unless you add a cluster-aware bootstrap. References: diff_diff/had.py:L3314-L3324, diff_diff/had.py:L3951-L3960, docs/methodology/REGISTRY.md:L2296-L2301, tests/test_estimatr_iv_robust_parity.py:L106-L150, tests/test_had.py:L4850-L4878.
Severity: P1. Impact: _sup_t_multiplier_bootstrap() now raises on lonely_psu="adjust" with singleton strata, but the registry still documents the reused survey bootstrap helper as supplying lonely-PSU handling here, and the shared survey-bootstrap registry says "adjust" is supported. If this HAD-specific limitation is methodologically necessary, it needs to be documented as a deviation; otherwise this is an unsupported production edge case on a newly added survey path. Concrete fix: either implement pooled-singleton pseudo-stratum centering in _sup_t_multiplier_bootstrap() so the HAD sup-t path actually supports "adjust", or add a HAD-specific Note/Deviation in REGISTRY.md and keep the error/tests aligned with that narrower support matrix. References: diff_diff/had.py:L2070-L2107, docs/methodology/REGISTRY.md:L2307-L2311, docs/methodology/REGISTRY.md:L2998-L3008, tests/test_had.py:L4910-L4948.

Code Quality

Severity: P2. Impact: weighted event-study results set n_obs_per_horizon and n_units to G_full, even though the code explicitly drops zero-weight units from design resolution and uses the positive-weight subset for estimation. That violates the existing HAD weighted-count contract and overstates contributing sample size in summaries/dataframes. Concrete fix: keep survey_metadata on the full design, but report n_units / n_obs_per_horizon from the positive-weight contributing subset, and add an event-study analogue of the static zero-weight count regression. References: diff_diff/had.py:L556-L560, diff_diff/had.py:L581-L585, diff_diff/had.py:L4073-L4080, diff_diff/had.py:L4270-L4277, tests/test_had.py:L4092-L4110, docs/methodology/REGISTRY.md:L3139-L3145.

Performance

No findings.

Maintainability

No separate findings beyond the reporting/doc drift above.

Tech Debt

Severity: P3. Impact: the follow-ups for weighted analytical cross-horizon covariance and unweighted sup-t are explicitly tracked and should not block this PR. Concrete fix: none required in this PR. References: TODO.md:L95-L96.

Security

No findings.

Documentation/Tests

Severity: P2. Impact: the new HeterogeneousAdoptionDiDEventStudyResults.se docstring says all weighted event-study fits use Binder linearization, but the implementation and registry still use Binder only on survey= and analytical HC1/2SLS on the weights= shortcut. The new event-study tests also cover cband population but not the zero-weight reporting contract that the static path already enforces. Concrete fix: update the docstring to match the actual variance split, and add an event-study regression for zero-weight n_units / n_obs_per_horizon. References: diff_diff/had.py:L540-L547, docs/methodology/REGISTRY.md:L2303-L2305, tests/test_had.py:L4595-L4670.

Path to Approval

Narrow the mass-point cluster= guard to the actually incompatible cases so the valid weights= + weighted-CR1 shortcut remains available; if event-study cband=True still cannot handle clustered weights, reject only that specific combination.
Resolve the lonely_psu="adjust" survey-bootstrap mismatch by either implementing the missing pooled-singleton centering step or documenting the HAD-specific unsupported combo in REGISTRY.md with a proper Note/Deviation.
Fix the weighted event-study reporting/documentation drift: report positive-weight contributing counts, correct the se docstring, and add a regression for the zero-weight event-study result contract.

Review based on diff inspection only; I could not run the test suite here because the environment is missing project dependencies (numpy).

R4 P1 (cluster guard overreach): narrow the mass-point cluster+weighted rejection. Previously rejected ANY cluster= + weighted fit, which blocked the valid weights= shortcut + cluster= weighted-CR1 pweight sandwich path (parity-tested vs estimatr::iv_robust se_type='stata'). Narrowed to: - Static path: reject only cluster= + survey= (Binder-TSL override would silently overwrite CR1 while metadata says vcov_type='cr1'). - Event-study path: reject cluster= + survey= OR cluster= + weights= + cband=True (sup-t bootstrap mixes HC1-scale perturbations with CR1 analytical SE). cluster= + weights= + cband=False is allowed (no IF consumption). R4 P1 (lonely_psu="adjust" deviation): added REGISTRY.md Note explicitly documenting the HAD-specific unsupported combo (weighted event-study + cband=True + lonely_psu="adjust" with singleton strata). The shared survey bootstrap helper supports pooled singletons but compute_survey_if_variance centers them at the global mean; matching the two requires a pseudo-stratum centering transform not yet derived. Other survey-bootstrap consumers (CS, dCDH, SDID) retain full "adjust" support. R4 P2 (n_units reporting): weighted event-study now reports the POSITIVE-WEIGHT contributing sample size in n_units and n_obs_per_horizon, matching the static-path n_obs contract. Full- design size still surfaces through survey_metadata.n_psu / effective_n / etc. R4 P2 (docstring drift): HeterogeneousAdoptionDiDEventStudyResults.se docstring now documents the three regimes separately — unweighted analytical sandwich, weights= shortcut analytical (HC1 / classical / CR1), and survey= Binder-TSL. Previously said "Binder-TSL for all weighted fits" which was wrong for the weights= shortcut. Regression tests (+5): - test_mass_point_weights_plus_cluster_shortcut_allowed - test_mass_point_weights_plus_cluster_event_study_cband_false_allowed - test_mass_point_weights_plus_cluster_event_study_cband_true_rejected - test_event_study_zero_weight_units_excluded_from_n_units - Updated test_mass_point_survey_plus_cluster_rejected_static to use survey= instead of weights= (matches narrowed guard) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T21:54:32Z

/ai-review

github-actions · 2026-04-24T22:01:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 35794442f3a40045627c26bcd21c0d77acfdbfb1

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior blockers around the mass-point cluster= guard, the lonely_psu="adjust" documentation mismatch, and weighted event-study count/docstring drift look addressed in the changed code and registry.
P1 The new “classical-on-IF-consumption” rejection is incomplete: it only fires when vcov_type="classical" is passed explicitly. The default mass-point config (vcov_type=None, robust=False) still resolves to classical and bypasses the guard on the new weighted paths.
On the static survey= mass-point path, that leaves Binder-TSL inference built from an HC1-scaled IF while the result still advertises classical variance settings.
On the weighted event-study mass-point path with weights= and default cband=True, it can also feed an HC1-scaled IF into the sup-t bootstrap while normalizing by classical per-horizon SE.
No new P0 NaN/inference-propagation issues stood out in the changed code; safe_inference() and the new cband NaN gate are used consistently.
The new TODO entries for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked and non-blocking.

Methodology

Severity: P1. Impact: the weighted mass-point “classical is unsupported when the IF is consumed” fix is only keyed off vcov_type_arg == "classical", not the effective default mapping vcov_type=None, robust=False -> "classical". That leaves two silent mismatches on newly added paths: (1) design="mass_point" + survey= on the overall path still composes Binder-TSL SE from the HC1-scaled IF while labeling the fit as classical, and (2) design="mass_point" + aggregate="event_study" + weights= with default cband=True can still mix HC1-scale bootstrap perturbations with classical analytical SE. This contradicts the new Methodology Registry deviations, which say these IF-consuming classical combinations should be rejected. Concrete fix: resolve vcov_requested first from vcov_type/robust, then apply the existing rejection to vcov_requested == "classical" in both branches. References: diff_diff/had.py:L2568-L2581, diff_diff/had.py:L3352-L3369, diff_diff/had.py:L3394-L3401, diff_diff/had.py:L4058-L4080, diff_diff/had.py:L4192-L4243, docs/methodology/REGISTRY.md:L2319-L2321.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings beyond the methodology guard gap above.

Tech Debt

Severity: P3. Impact: the follow-ups for weighted analytical cross-horizon covariance and unweighted sup-t are explicitly tracked and should not block this PR. Concrete fix: none in this PR. References: TODO.md:L95-L96.

Security

No findings.

Documentation/Tests

Severity: P2. Impact: the new regression coverage only checks the explicit vcov_type="classical" rejection, so it misses the default vcov_type=None, robust=False path that currently slips through. Concrete fix: add regressions for HeterogeneousAdoptionDiD(design="mass_point") with survey= on the overall path, and with aggregate="event_study", weights=..., cband=True on the event-study path, both without setting vcov_type. References: tests/test_had.py:L5044-L5070, tests/test_had.py:L5072-L5105.

Path to Approval

In both mass-point branches, compute the effective vcov_requested before any classical-specific guard and reject IF-consuming classical combinations based on that resolved value, not only on an explicit vcov_type="classical" input.
Add regression tests for the default mass-point weighted cases (vcov_type=None, robust=False) on static survey= and weighted event-study cband=True so this branch cannot regress again.

R5 P1 (classical-guard bypass): the classical-on-IF-consumption rejection was only checking the RAW vcov_type kwarg, which missed the default mapping (vcov_type=None, robust=False → classical). Re-ordered both mass-point branches (static + event-study) to resolve vcov_requested first, then gate the rejection on the resolved value. Default-path users who pass HeterogeneousAdoptionDiD(design='mass_point').fit(..., survey=...) without an explicit vcov_type now hit the same clear NotImplementedError as the explicit vcov_type='classical' path. R5 P2 (regression coverage): added three tests covering the default-path cases that slipped through the previous guard: - test_mass_point_default_vcov_survey_rejected_static: default vcov_type=None + survey= on mass-point → NotImplementedError - test_mass_point_default_vcov_event_study_cband_rejected: default vcov_type=None + weights= + cband=True on event-study → rejects - test_mass_point_default_vcov_robust_true_survey_allowed: complement confirms robust=True (default mapping → hc1) survey= is allowed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T22:43:12Z

/ai-review

github-actions · 2026-04-24T22:50:20Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 8d7cf9401186922e9996c7209cb14ad684b7fc6c

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. One P2 performance item and one P3 test-gap note remain.

Executive Summary

The prior P1 from the last review is fixed: both mass-point branches now reject IF-consuming weighted classical requests after resolving the effective vcov family, including the default vcov_type=None, robust=False -> "classical" mapping (diff_diff/had.py:L3343-L3376, diff_diff/had.py:L4058-L4092, tests/test_had.py:L5292-L5379).
The weighted 2SLS, event-study survey composition, and sup-t band line up with the updated Methodology Registry; the remaining restrictions are documented deviations rather than silent mismatches (docs/methodology/REGISTRY.md:L2296-L2321).
No new P0/P1 inference issues stood out: pointwise inference uses safe_inference(), and simultaneous-band endpoints are NaN-gated on invalid SEs (diff_diff/had.py:L2225-L2233, diff_diff/had.py:L3422-L3431, diff_diff/had.py:L4230-L4238).
P2: cband=False still allocates/populates the full weighted event-study IF matrix and forces IF returns on the weights= shortcut even though the bootstrap is skipped (diff_diff/had.py:L4130-L4255).
P3: the new event-study coverage is strong for weights= and the standalone bootstrap helper, but there is still no estimator-level survey= + aggregate="event_study" regression in the changed tests (tests/test_had.py:L4622-L4770, tests/test_had.py:L4911-L5042).
I could not execute the suite here because this environment lacks pytest and numpy.

Methodology

Severity: P3 informational. Impact: The previous blocker is resolved. The code now rejects IF-consuming weighted mass-point requests based on the resolved vcov family, so the default classical path no longer slips through. Concrete fix: none. References: diff_diff/had.py:L3343-L3376, diff_diff/had.py:L4058-L4092, tests/test_had.py:L5292-L5379.
Severity: P3 informational. Impact: The remaining methodology deviations are documented and front-door enforced: lonely_psu="adjust" with singleton strata on the sup-t path, classical-on-IF-consuming weighted mass-point paths, and mass-point cluster= + survey= / cluster= + weights= + cband=True. Under the stated review rules, these are non-defects. Concrete fix: none. References: diff_diff/had.py:L2080-L2119, diff_diff/had.py:L3327-L3376, diff_diff/had.py:L3969-L4006, docs/methodology/REGISTRY.md:L2305-L2321.

Code Quality

No findings.

Performance

Severity: P2. Impact: fit(..., aggregate="event_study", cband=False) still allocates Psi for every weighted horizon and, on the weights= shortcut, forces per-horizon IF computation even though the simultaneous band is disabled. That adds avoidable O(GH) memory and extra IF work to the opt-out path. Concrete fix: split needs_horizon_if from needs_stacked_if_matrix; only allocate/fill Psi when cband=True, and on the weights= shortcut avoid force_return_influence=True when cband=False. References: diff_diff/had.py:L4130-L4137, diff_diff/had.py:L4154-L4181, diff_diff/had.py:L4205-L4226, diff_diff/had.py:L4240-L4255, tests/test_had.py:L4622-L4643, tests/test_had.py:L5107-L5136.

Maintainability

No findings beyond the performance issue above.

Tech Debt

Severity: P3 informational. Impact: The remaining weighted event-study joint-covariance work and unweighted sup-t extension are explicitly tracked in TODO.md, so they are non-blocking under the deferred-work policy. Concrete fix: none in this PR. References: TODO.md:L95-L96.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: The changed tests cover weights= end-to-end and _sup_t_multiplier_bootstrap directly, but they do not add an estimator-level fit(..., aggregate="event_study", survey=SurveyDesign(...)) regression for the new positive survey path. That leaves the survey event-study dispatch/composition path without a full integration lock. Concrete fix: add one continuous and one mass-point survey-backed event-study regression asserting variance_formula, survey_metadata.df_survey, and cband_* behavior (or the documented rejection path where applicable). References: tests/test_had.py:L4622-L4770, tests/test_had.py:L4911-L5042.

R6 P2 (event-study opt-out overhead): cband=False on the weighted event-study path no longer allocates the stacked (G, H) IF matrix or forces per-horizon IF return on the weights= shortcut. Split two flags internally: needs_per_horizon_if = survey= path OR (weights= AND cband=True) needs_stacked_if_matrix = weights= AND cband=True (alias for weighted_es AND cband) - Psi allocation gated on needs_stacked_if_matrix. - _fit_continuous force_return_influence gated on (needs_stacked_if_matrix AND resolved_survey_unit_full is None) — under survey= path, _fit_continuous returns the IF anyway via its resolved_survey_unit gate, so no extra cost. - _fit_mass_point_2sls return_influence gated on needs_per_horizon_if — survey= path needs the per-horizon IF for the Binder-TSL override regardless of cband. Net effect: cband=False + weights= shortcut + weighted_es skips the O(GH) Psi allocation and the per-horizon IF work entirely. cband=True paths and survey= paths unchanged. R6 P3 (event-study survey= integration coverage): added two end-to-end integration tests for the previously-unguarded positive-path estimator-level survey= + aggregate='event_study' dispatch: - test_survey_event_study_continuous_end_to_end: continuous_at_zero + SurveyDesign(strata='stratum') — asserts variance_formula= 'survey_binder_tsl', survey_metadata.df_survey=G-n_strata, cband_* populated, PSU dispatch through _aggregate_unit_resolved_survey. - test_survey_event_study_mass_point_end_to_end: mass_point + SurveyDesign(strata=...) — asserts variance_formula= 'survey_binder_tsl_2sls' and that the 2SLS IF flows through per-horizon Binder-TSL + sup-t bootstrap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T22:53:03Z

/ai-review

github-actions · 2026-04-24T23:01:22Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2c325f3f22e98e96552ac4b16dd42d8c1f554c49

Overall Assessment

⛔ Blocker

One unmitigated P0 remains: the new mass-point weighted event-study shortcut miscalibrates the sup-t band on weights= + aggregate="event_study" + cband=True, producing silently too-narrow simultaneous confidence bands.

Executive Summary

The main prior re-review items I could verify are resolved: the event-study mass-point classical guard now keys off the resolved vcov family, the Psi/IF allocation split fixes the earlier cband=False waste, and estimator-level survey= event-study regressions were added (diff_diff/had.py:L4058-L4092, diff_diff/had.py:L4130-L4144, tests/test_had.py:L5354-L5454).
P0: on the new design="mass_point" weighted event-study shortcut, _fit_mass_point_2sls() returns an HC1-targeted IF, but _sup_t_multiplier_bootstrap() calibrates the weights= shortcut with raw unit-level multipliers against sum(psi^2) rather than the documented trivial-Binder / HC1 target. That makes the sup-t distribution under-dispersed and the simultaneous band too narrow with no warning (diff_diff/had.py:L2184-L2192, diff_diff/had.py:L2482-L2498, diff_diff/survey.py:L1214-L1231, docs/methodology/REGISTRY.md:L2301-L2313).
The affected scope is narrow: mass_point + weights= + event_study + cband=True. The survey= path is protected by the centered/scaled survey-aware bootstrap, and cband=False is unaffected (diff_diff/had.py:L2080-L2182, diff_diff/had.py:L4217-L4272).
The lonely-PSU / classical / cluster restrictions remain documented deviations in the Methodology Registry, so I am not counting them as defects (docs/methodology/REGISTRY.md:L2317-L2321).
The new tests still miss a calibration lock on the affected shortcut branch: generic H=1 and trivial-survey reduction tests exist, but the mass-point shortcut only has a smoke test that a band was populated (tests/test_had.py:L4502-L4538, tests/test_had.py:L4995-L5042, tests/test_had.py:L4736-L4770).
I did not execute the suite in this environment; the review is based on source inspection.

Methodology

Affected method: HeterogeneousAdoptionDiD mass-point weighted event-study / sup-t path (_fit_mass_point_2sls, _fit_event_study, _sup_t_multiplier_bootstrap).

Severity: P0. Impact: _fit_mass_point_2sls() explicitly scales its returned IF so that compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1, and the registry documents that same contract. But on the weights= shortcut, _fit_event_study() passes that IF matrix into _sup_t_multiplier_bootstrap() with resolved_survey=None, which takes the raw unit-level multiplier branch and treats Var(xi @ Psi) as sum(psi^2). For this 2SLS IF, those are not the same target: by the PR’s own derivation, raw multiplier variance is only ((n-1)/n) * V_HC1. The result is a silently under-scaled sup-t distribution and too-narrow simultaneous bands on the shortcut path. This also breaks the front-door contract that weights= is internally equivalent to survey=SurveyDesign(weights=w) on this branch. Concrete fix: on mass_point + weights= + event_study + cband=True, either run the shortcut through the same trivial-survey centering / sqrt(n/(n-1)) correction as the survey-aware branch, or return a separate shortcut-bootstrap IF whose raw sum(psi^2) matches the HC1 analytical variance. References: diff_diff/had.py:L3750-L3757, diff_diff/had.py:L4217-L4272, diff_diff/had.py:L2184-L2192, diff_diff/had.py:L2482-L2498, diff_diff/survey.py:L1214-L1231, docs/methodology/REGISTRY.md:L2301-L2313.

Code Quality

No findings.

Performance

No findings. The prior Psi allocation concern appears resolved by the needs_per_horizon_if / needs_stacked_if_matrix split in diff_diff/had.py:L4130-L4144.

Maintainability

No findings.

Tech Debt

No separate findings. The new follow-ups in TODO.md:L95-L96 are reasonable, but they do not track or mitigate the blocker above.

Security

No findings.

Documentation/Tests

Severity: P3 informational. Impact: the changed tests cover the generic H=1 reduction and the trivial-survey centered branch, but the affected mass-point weights= shortcut only has a smoke test that cband_crit_value is finite. That gap is why this branch-specific calibration error can slip through while the surrounding bootstrap tests still pass. Concrete fix: add a regression for design="mass_point", aggregate="event_study", weights= shortcut, cband=True, H=1 asserting q ≈ 1.96 and/or equality with survey=SurveyDesign(weights=...) on identical data. References: tests/test_had.py:L4502-L4538, tests/test_had.py:L4995-L5042, tests/test_had.py:L4736-L4770.

Path to Approval

Make the mass-point weights= shortcut cband path use the same variance family as the analytical HC1 SE. The smallest fix is to apply the trivial-survey centering and small-sample rescale before xi @ Psi, or equivalently route that branch through a synthetic trivial ResolvedSurveyDesign.
Add a regression test on the affected branch: design="mass_point", aggregate="event_study", weights= shortcut, cband=True, single horizon / H=1, asserting the shortcut critical value reduces to the normal quantile and matches the survey=SurveyDesign(weights=...) result on the same data.

R7 P0 (sup-t under-scaling on weights= shortcut): the per-unit IF returned by _fit_continuous / _fit_mass_point_2sls is HC1-scaled per the PR #359 convention — compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1. Routing the weights= shortcut through the unit-level ``resolved_survey=None`` branch of _sup_t_multiplier_bootstrap normalized against raw sum(psi²) = ((n-1)/n) · V_HC1, producing silently too-narrow simultaneous bands. Fix: when the weighted event-study + cband=True path runs, always route the sup-t bootstrap through a ResolvedSurveyDesign. On the weights= shortcut (no user-supplied survey), construct a synthetic trivial resolved (pweight, no strata/psu/fpc, lonely_psu='remove') so the centered + sqrt(n/(n-1))-corrected survey-aware branch fires. The no-strata/no-PSU path inside that branch falls through to the "single implicit stratum — demean across all PSUs, scale by sqrt(n_psu/(n_psu-1))" block, which gives Var_xi(xi @ Psi_psu) ≈ V_HC1 as the IF scale convention requires. Net effect: weights= shortcut and survey=SurveyDesign(weights=...) now target the SAME variance family in the bootstrap (~atol=0.05 between the two quantiles at matching seeds, bounded by the bc_fit.se_robust vs Binder-TSL per-horizon SE convergence tolerance from PR #359). Previously the shortcut was under-scaled by sqrt(n/(n-1)) relative to the analytical HC1 target. Regression tests (+2): - test_weights_shortcut_mass_point_h1_cband_matches_normal: helper- level H=1 lock with mass-point HC1-scaled IF + synthetic trivial resolved. q → Phi^-1(0.975) ≈ 1.96 at atol=0.15 (MC noise at B=5000). The pre-fix under-scaling would have produced q ≈ 1.94 (systematic drift outside MC noise). - test_weights_shortcut_cband_matches_trivial_survey: weights= shortcut and survey=SurveyDesign(weights='w') event-study cband quantiles agree within atol=0.05 on the same DGP / seed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-24T23:08:01Z

/ai-review

github-actions · 2026-04-24T23:14:27Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2431cce6e010b0e809ee75cdc6299f7869c3bbc8

Overall Assessment

✅ Looks good

Executive Summary

The prior blocker on the mass-point weights= event-study cband path appears resolved: the shortcut is now routed through a synthetic trivial ResolvedSurveyDesign, and _sup_t_multiplier_bootstrap() applies the centered/small-sample-corrected survey branch instead of the old raw shortcut calibration (diff_diff/had.py:L2069-L2176, diff_diff/had.py:L4257-L4288).
The weighted 2SLS formulas, HC1/CR1 parity locks, and HC1-scale IF contract line up with the new registry entry and dedicated parity tests (diff_diff/had.py:L2241-L2516, docs/methodology/REGISTRY.md:L2296-L2321, tests/test_estimatr_iv_robust_parity.py:L1-L206).
Parameter propagation looks complete: n_bootstrap/seed are exposed in estimator params, the event-study path separates per-horizon IF needs from stacked-bootstrap IF allocation, and weighted event-study results now surface survey_metadata, variance_formula, effective_dose_mean, and cband_* consistently (diff_diff/had.py:L2731-L2741, diff_diff/had.py:L4130-L4144, diff_diff/had.py:L4326-L4388).
I did not execute the suite here; the local Python environment is missing numpy, so this review is source-based.

Methodology

No unmitigated P0/P1 findings. The changed estimator logic now matches the documented HC1/Binder targets materially better than in the previous review state.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The follow-up work on analytical cross-horizon covariance and unweighted sup-t is properly tracked in TODO.md:L95-L96.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the implementation now uses analytical HC1/2SLS pointwise SE on the weights= event-study shortcut and routes sup-t calibration through a synthetic trivial survey design, but parts of the prose still describe the weighted path as Binder-TSL/raw unit-level shortcut behavior. That can mislead future maintainers when reconciling the previous blocker fix. Concrete fix: update the _fit_event_study() docstring, the Phase 4.5 B changelog entry, and the registry sup-t bullet to describe the actual fixed shortcut branch: analytical HC1/2SLS pointwise SE on weights=, plus trivial-survey bootstrap routing for cband=True. References: diff_diff/had.py:L3728-L3733, diff_diff/had.py:L4217-L4240, diff_diff/had.py:L4257-L4288, docs/methodology/REGISTRY.md:L2305-L2311, CHANGELOG.md:L11.

R8 P3 (doc-consistency drift): prose in _fit_event_study docstring, REGISTRY sup-t bullet, and CHANGELOG entry still described the pre-R7-fix weighted-path behavior (Binder-TSL for all weighted fits; raw unit-level shortcut bootstrap). Updated all three to match the actual implementation: - _fit_event_study docstring: distinguishes survey= path (Binder-TSL) from weights= shortcut (analytical CCT-2014 / 2SLS pweight sandwich), documents trivial-survey bootstrap routing for cband. - REGISTRY.md: new "weights= shortcut ↔ bootstrap routing" bullet under the sup-t section explaining the synthetic trivial ResolvedSurveyDesign construction and why (centered + sqrt(n/(n-1))-corrected branch targets V_HC1, raw unit-level would give ((n-1)/n) · V_HC1 under-scaling). - CHANGELOG.md: per-horizon variance on weights= shortcut clarified as analytical (not Binder-TSL); sup-t routing through trivial resolved documented. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes the Phase 4.5 C0 promise (PR #367 commit 29f8b12). Linearity- family pretests now accept survey=/weights= keyword-only kwargs: - stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test, did_had_pretest_workflow. Stute family: PSU-level Mammen multiplier bootstrap via generate_survey_multiplier_weights_batch. Each replicate draws (B, n_psu) Mammen multipliers, broadcast to per-obs perturbation eta_obs[g] = eta_psu[psu(g)], weighted OLS refit, weighted CvM via new _cvm_statistic_weighted helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both vector-valued empirical-process unit-level dependence (Delgado 1993; Escanciano 2006) AND PSU clustering (Krieger-Pfeffermann 1997). NOT Rao-Wu rescaling -- multiplier bootstrap is a different mechanism. Yatchew: closed-form weighted OLS + pweight-sandwich variance components (no bootstrap): sigma2_lin = sum(w * eps^2) / sum(w) sigma2_diff = sum(w_avg * diff^2) / (2 * sum(w)) [Reviewer CRITICAL #2] sigma4_W = sum(w_avg * eps_g^2 * eps_{g-1}^2) / sum(w_avg) T_hr = sqrt(sum(w)) * (sigma2_lin - sigma2_diff) / sigma2_W where w_avg_g = (w_g + w_{g-1}) / 2 (Krieger-Pfeffermann 1997 Section 3). All three components reduce bit-exactly to existing unweighted formulas at w=ones(G); locked at atol=1e-14 by direct helper test. Workflow under survey/weights: skips the QUG step with UserWarning (per C0 deferral), sets qug=None on the report, dispatches the linearity family with survey-aware mechanism, appends "linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0" suffix to the verdict. all_pass drops the QUG-conclusiveness gate (one less precondition). HADPretestReport.qug retyped from QUGTestResults to Optional[QUGTestResults]; summary/to_dict/to_dataframe updated to None-tolerant rendering. Pweight shortcut routing: weights= passes through a synthetic trivial ResolvedSurveyDesign (new survey._make_trivial_resolved helper) so the same kernel handles both entry paths -- mirrors PR #363's R7 fix pattern on HAD sup-t. Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise NotImplementedError at every entry point (defense in depth, reciprocal- guard discipline). The per-replicate weight-ratio rescaling for the OLS-on-residuals refit step is not covered by the multiplier-bootstrap composition; deferred to a parallel follow-up. Per-row weights= / survey=col aggregated to per-unit via existing HAD helpers (_aggregate_unit_weights, _aggregate_unit_resolved_survey; constant-within-unit invariant enforced) through new _resolve_pretest_unit_weights helper. Strictly-positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Stability invariants preserved: - Unweighted code paths bit-exact pre-PR (the new survey/weights branch is a separate if arm; existing 138 pretest tests pass unchanged). - Yatchew weighted variance components reduce to unweighted at w=1 at atol=1e-14 (locked by TestYatchewHRTestSurvey). - HADPretestReport schema bit-exact on the unweighted path; qug=None triggers the new None-tolerant rendering only on the survey path. 20 new tests across TestHADPretestWorkflowSurveyGuards (revised from C0 rejection-only to C functional + 2 mutex/replicate-weight retained), TestStuteTestSurvey (7), TestYatchewHRTestSurvey (7), TestJointStuteSurvey (5). Full pretest suite: 158 tests pass. Patch-level addition (additive on stable surfaces). See docs/methodology/REGISTRY.md "QUG Null Test" -- Note (Phase 4.5 C) for the full methodology. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label Apr 24, 2026

igerber merged commit c2d6755 into main Apr 25, 2026
23 of 24 checks passed

igerber deleted the had-phase-4.5-survey-dispatch branch April 25, 2026 00:21

This was referenced Apr 25, 2026

Release 3.3.0: HAD estimator, profile_panel, dCDH by_path, SDID survey complete #368

Merged

HAD Phase 4.5 C: linearity-family pretests under survey #370

Open

Conversation

igerber commented Apr 24, 2026

Summary

Public API

Stability invariants

Test plan

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

igerber commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees