Skip to content

HAD Phase 4.5 B: weighted mass-point 2SLS + event-study survey composition + sup-t bootstrap#363

Merged
igerber merged 9 commits intomainfrom
had-phase-4.5-survey-dispatch
Apr 25, 2026
Merged

HAD Phase 4.5 B: weighted mass-point 2SLS + event-study survey composition + sup-t bootstrap#363
igerber merged 9 commits intomainfrom
had-phase-4.5-survey-dispatch

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 24, 2026

Summary

Closes the two Phase 4.5 A NotImplementedError gates:

  • design=\"mass_point\" + weights/survey at diff_diff/had.py:2807-2817
  • aggregate=\"event_study\" + weights/survey at diff_diff/had.py:2376-2382

Ships weighted 2SLS mass-point path, event-study survey composition for all three designs, and a multiplier-bootstrap sup-t simultaneous confidence band on the weighted event-study path.

  • Weighted 2SLS in _fit_mass_point_2sls follows Wooldridge 2010 Ch. 12 pweight convention ( in HC1 meat, w·u in CR1 cluster score, weighted bread Z'WX). HC1 and CR1 bit-exact with estimatr::iv_robust(..., weights=, clusters=) at atol=1e-10 on 5 DGPs (new cross-language golden).
  • Per-unit IF on β̂-scale scales so compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1] at atol=1e-10 (PR HAD Phase 4.5: survey support on continuous-dose paths #359 convention, applied uniformly across continuous + mass-point).
  • Event-study wiring: per-horizon loop threads weights_unit_full + resolved_survey_unit_full, composes Binder-TSL variance per horizon via compute_survey_if_variance, populates survey_metadata + variance_formula + effective_dose_mean (previously hardcoded None).
  • Sup-t multiplier bootstrap reuses generate_survey_multiplier_weights_batch / generate_bootstrap_weights_batch from diff_diff.bootstrap_utils (stratum centering, FPC, lonely-PSU). Perturbations xi @ Psi with NO (1/n) prefactor (matches staggered_bootstrap.py:373). At H=1 reduces to Φ⁻¹(1-α/2) ≈ 1.96 (locked by TestSupTReducesToNormalAtH1).

Public API

  • HeterogeneousAdoptionDiD.__init__ gains n_bootstrap: int = 999, seed: Optional[int] = None.
  • HeterogeneousAdoptionDiD.fit() gains cband: bool = True (weighted event-study only).
  • _fit_mass_point_2sls signature: new weights= + return_influence= kwargs; always returns 3-tuple (beta, se, psi).
  • HeterogeneousAdoptionDiDEventStudyResults gains variance_formula, effective_dose_mean, cband_low, cband_high, cband_crit_value, cband_method, cband_n_bootstrap (all None on unweighted fits).
  • All new fields surfaced in to_dict, to_dataframe, summary, __repr__.

Stability invariants

  • Unweighted mass-point and event-study paths bit-exact vs pre-PR (all 260 existing HAD tests pass unchanged).
  • PR HAD Phase 4.5: survey support on continuous-dose paths #359 survey tests (TestHADSurvey) unchanged on static path.
  • Non-pweight SurveyDesigns (aweight, fweight, replicate) rejected with NotImplementedError on both new paths (reciprocal-guard discipline).
  • filter_info identical across unweighted / uniform-weighted / informatively-weighted fits (identification-theory, not sampling-domain).
  • HAD pretests (qug_test, stute_test, yatchew_hr_test, joint variants, did_had_pretest_workflow) still don't accept weights/survey — deferred to Phase 4.5 C / C0.

Test plan

  • pytest tests/test_had.py tests/test_bias_corrected_lprobust.py tests/test_nprobust_port.py tests/test_estimatr_iv_robust_parity.py -v — all 380 pass, 2 skipped
  • Rscript benchmarks/R/generate_estimatr_iv_robust_golden.R — regenerates 5-DGP golden
  • PR HAD Phase 4.5: survey support on continuous-dose paths #359 survey regression (TestHADSurvey unchanged)
  • TestSupTReducesToNormalAtH1 confirms q ≈ 1.96 at H=1, G=500, B=5000
  • TestEstimatrIVRobustHC1Parity / TestEstimatrIVRobustCR1Parity confirm cross-language bit-parity at atol=1e-10
  • TestIFScaleInvariant confirms PR HAD Phase 4.5: survey support on continuous-dose paths #359 convention holds uniformly for 2SLS IF

🤖 Generated with Claude Code

…t-study survey composition + sup-t bootstrap

Closes the two Phase 4.5 A NotImplementedError gates on design='mass_point' +
weights/survey and aggregate='event_study' + weights/survey.

Weighted 2SLS in _fit_mass_point_2sls follows the Wooldridge 2010 Ch. 12
pweight convention (w² in HC1 meat, w·u in CR1 cluster score, weighted
bread Z'WX). HC1 and CR1 match estimatr::iv_robust bit-exactly at
atol=1e-10 (new cross-language golden). Per-unit IF on β̂-scale scales
so compute_survey_if_variance(psi, trivial) ≈ V_HC1 at atol=1e-10 (PR #359
convention applied uniformly).

Event-study path threads weights + survey through the per-horizon loop,
composing Binder-TSL variance per horizon and populating survey_metadata
+ variance_formula + effective_dose_mean (previously hardcoded None).

New _sup_t_multiplier_bootstrap helper reuses
generate_survey_multiplier_weights_batch / generate_bootstrap_weights_batch
from diff_diff.bootstrap_utils — no custom Rademacher draws, no (1/n)
prefactor. At H=1 reduces to Φ⁻¹(1-α/2) ≈ 1.96 (reduction-locked).

New __init__ kwargs: n_bootstrap=999, seed=None. New fit() kwarg:
cband=True. HeterogeneousAdoptionDiDEventStudyResults gains
survey_metadata + variance_formula + effective_dose_mean + cband_*
fields, surfaced through to_dict / to_dataframe / summary / __repr__.

Unweighted event-study output (att/se) bit-exactly preserved; cband
disabled on the unweighted path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

Blocker

Executive Summary

  • The weighted 2SLS HC1/CR1 core in _fit_mass_point_2sls() looks consistent with the updated registry/parity story; I did not count the documented classical-vs-estimatr deviation as a defect.
  • The new simultaneous-band path has a P0 NaN-gating bug: degenerate horizons can still emit finite cband_low/high, which is misleading statistical output.
  • The new survey sup-t implementation does not match its documented stratified-bootstrap method: it assumes stratum-centered PSU multipliers, but the reused helper does not center within strata.
  • The weights= event-study path violates the row-order contract by filtering with data_filtered.index, so valid non-RangeIndex inputs can misalign weights or fail.
  • The new TODO entries for analytical cross-horizon covariance and unweighted sup-t are properly tracked and are not blockers.

Methodology

  • P0 Impact: _sup_t_multiplier_bootstrap() excludes invalid horizons from the bootstrap t-statistics by replacing nonpositive/nonfinite SEs with NaN, but it then materializes cband_low/high with the raw se_per_horizon. That means a horizon with se == 0 gets a finite simultaneous band equal to the point estimate even though safe_inference() correctly sets the pointwise inference fields to NaN. This is partial NaN-guarding on new inference output and can silently overstate precision. Concrete fix: apply the same finite-and-positive SE mask when building cband_low/high, and if no horizons remain valid, blank the cband surface consistently. References: diff_diff/had.py:L2047-L2078, diff_diff/utils.py:L177-L210.
  • P1 Impact: The PR and registry say the new survey sup-t path reuses stratum-centered PSU multiplier draws, but generate_survey_multiplier_weights_batch() currently just draws iid weights within each stratum and applies FPC/lonely-PSU handling; it does not center within strata. The analytical survey variance target is explicitly centered at the stratum level, so the new HAD survey cband does not match the documented Binder-TSL/bootstrap contract for stratified designs. Concrete fix: center the PSU multipliers within each stratum for every draw, or equivalently center PSU-aggregated Psi within strata before perturbation, then add a stratified H=1 regression test against the analytical survey variance/critical-value target. References: diff_diff/had.py:L1933-L2078, diff_diff/bootstrap_utils.py:L556-L639, diff_diff/survey.py:L1257-L1313, docs/methodology/REGISTRY.md:L2305-L2315.

Code Quality

  • P1 Impact: The new weights= event-study path filters weights with w_full[data_filtered.index.to_numpy()], which treats DataFrame index labels as positional offsets. The public weights contract is row-order aligned, not index-label aligned. On a non-RangeIndex input this can either crash or, worse, silently attach the wrong weights to rows after filtering. Concrete fix: carry a positional keep mask out of _validate_had_panel_event_study(), or reset to a positional index before validation and slice weights by position only. Add tests with both custom integer and string indices. Reference: diff_diff/had.py:L3560-L3581.

Performance

  • No findings.

Maintainability

  • No findings beyond the documentation drift noted below.

Tech Debt

  • P3 Impact: The new follow-up items for weighted event-study analytical covariance and unweighted sup-t are properly tracked, so I do not count those deferred gaps against the PR. Concrete fix: none required in this PR. Reference: TODO.md:L95-L96.

Security

  • No findings.

Documentation/Tests

  • P3 Impact: Public HAD docstrings still describe the old Phase 4.5 A behavior: fit() says mass_point/aggregate="event_study" + survey/weights are unsupported, and the event-study result docstring still says Phase 2b has only pointwise CIs and survey_metadata is always None. That now contradicts both the code and the registry, and cband is not documented on the public fit() surface. Concrete fix: update the HeterogeneousAdoptionDiD.fit() and HeterogeneousAdoptionDiDEventStudyResults docstrings to reflect weighted event-study/mass-point support and the new cband fields; add regression tests for the zero-SE cband case and non-RangeIndex weights= case. References: diff_diff/had.py:L540-L545, diff_diff/had.py:L590-L591, diff_diff/had.py:L2666-L2706.

Path to Approval

  1. NaN-gate the new simultaneous-band outputs exactly like the existing pointwise inference surface, and add a regression test where one horizon has se <= 0.
  2. Fix weights= filtering on the event-study path to use positional row alignment rather than DataFrame index labels, and add tests with custom integer/string indices.
  3. Make the survey sup-t perturbation actually stratum-centered (or center the PSU-aggregated IF matrix before applying multipliers), and add a stratified H=1 validation against the analytical survey variance target.

R1 P0 (methodology): cband_low/high now NaN-gated for horizons with
se <= 0 or non-finite, matching the safe_inference contract for
pointwise output. Previously a horizon with se=0 produced a finite band
equal to the point estimate, overstating precision.

R1 P1 (methodology): sup-t multiplier bootstrap now applies
stratum-centered + small-sample-corrected PSU aggregation before the
matmul, so Var_xi(xi @ Psi_psu_scaled) reproduces the analytical
Binder-TSL variance term-for-term
(V = sum_h (1-f_h)(n_h/(n_h-1)) sum_j (psi_hj - psi_h_bar)^2).
Previously the bootstrap omitted centering and the small-sample
correction; under stratified designs the critical value diverged from
the analytical target. Verified at H=1 G=400 n_strata=4: bootstrap
q=1.985 matches Phi^-1(0.975)=1.960.

R1 P1 (code quality): event-study weights= filtering now translates
data_filtered.index LABELS to POSITIONAL offsets via
data.index.get_indexer, so non-RangeIndex inputs work correctly.
Previous code treated index labels as positions and silently broke on
custom int / string indices. Raises a clear ValueError when index
alignment fails (duplicate or malformed labels).

R1 P3 (docs): HeterogeneousAdoptionDiDEventStudyResults docstring
updated to describe Phase 4.5 B weighted/survey support + new cband_*
fields + variance_formula / effective_dose_mean; fit() docstring
updated to reflect the Phase 4.5 B dispatch matrix and document the
new cband kwarg.

Regression tests:
- test_zero_se_horizon_nan_gates_cband locks the P0 NaN gate
- test_weights_nonrange_index_aligned_positionally locks the P1
  row-order contract across RangeIndex vs custom int indices
- Existing TestSupTReducesToNormalAtH1 covers the stratum-centered
  path via the unit-PSU reduction case

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 082368a5e8982e78f780425cf4885397dd14534d


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The previously reported blocker items look addressed: simultaneous-band NaN gating now matches safe_inference(), non-RangeIndex weights= slicing is positional, and ordinary stratified survey sup-t now centers PSU IFs before perturbation.
  • The weighted 2SLS core is aligned with the updated methodology registry; the documented classical-vs-estimatr difference is a registered deviation, not a defect.
  • P1 The new sup-t survey bootstrap is still inconsistent with the supported lonely_psu="adjust" mode. Singleton strata are pooled in the bootstrap helper, but _sup_t_multiplier_bootstrap() skips the matching singleton-PSU IF transformation, so the simultaneous band no longer targets the same Binder-TSL variance as compute_survey_if_variance(). diff_diff/had.py:L2064-L2127, diff_diff/bootstrap_utils.py:L599-L639, diff_diff/survey.py:L1683-L1704, docs/methodology/REGISTRY.md:L2305-L2315
  • P1 The new survey-enabled mass-point path accepts cluster=, then overwrites the CR1 SE with Binder-TSL while still surfacing vcov_type="cr1" / cluster_name. That is a silent inference mismatch on both overall and event-study fits. diff_diff/had.py:L3278-L3309, diff_diff/had.py:L3917-L4044, diff_diff/had.py:L4151-L4153
  • The new TODO items for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked and are not blockers. TODO.md:L95-L96

Methodology

  • P1 Impact: lonely_psu="adjust" is a supported survey mode, but the new sup-t path only matches the analytical Binder-TSL target for ordinary strata. In _sup_t_multiplier_bootstrap(), singleton strata are skipped under the assumption that the bootstrap helper zeros them, but generate_survey_multiplier_weights_batch() explicitly gives pooled nonzero draws for "adjust". The analytical helper instead centers singleton PSU scores around the global mean. That leaves the simultaneous band wrong for supported singleton-strata designs, with no warning. Concrete fix: either implement the "adjust" branch in _sup_t_multiplier_bootstrap() so pooled singleton PSU columns are transformed to the same target used by compute_survey_if_variance(), or reject lonely_psu="adjust" when cband=True until that derivation is implemented. Add an H=1 regression on a singleton-strata design comparing bootstrap variance / critical value to the analytical target. diff_diff/had.py:L2105-L2115, diff_diff/bootstrap_utils.py:L599-L639, diff_diff/survey.py:L1683-L1704
  • P1 Impact: on the new mass-point survey= path, cluster= does not affect final inference even though the result still reports CR1 metadata. _fit_mass_point_2sls() is called with cluster_arr, but its SE is then overwritten by compute_survey_if_variance(). The returned result still advertises vcov_type="cr1" and cluster_name=..., so users are told they got cluster-robust inference when they did not. Concrete fix: reject cluster= with survey= on mass-point HAD until a combined methodology exists, or implement the combined variance explicitly and relabel outputs accordingly. Add regression coverage for both overall and aggregate="event_study" paths. diff_diff/had.py:L3278-L3309, diff_diff/had.py:L3917-L4044, diff_diff/had.py:L4151-L4153

Code Quality

  • No findings beyond the inference-path issues above.

Performance

  • No findings.

Maintainability

  • No additional findings.

Tech Debt

  • P3 Impact: the follow-up gaps for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked, so they should not block this PR. Concrete fix: none required in this PR. TODO.md:L95-L96

Security

  • No findings.

Documentation/Tests

  • No separate blocking documentation issues. The needed test additions are the ones tied to the two P1 fixes above.

Path to Approval

  1. Fix or explicitly reject lonely_psu="adjust" on the weighted event-study cband path, and add an H=1 singleton-strata regression that checks the bootstrap target against compute_survey_if_variance().
  2. Resolve the survey= + cluster= mass-point interaction by either rejecting it or implementing the combined inference properly, and ensure result metadata no longer claims CR1 when Binder-TSL is the actual variance path.

R2 P1 (methodology): reject SurveyDesign(lonely_psu='adjust') with
singleton strata in _sup_t_multiplier_bootstrap. The bootstrap helper
pools singletons into a pseudo-stratum with NONZERO multipliers, but
compute_survey_if_variance centers singleton PSU scores around the
global mean — without the matching pseudo-stratum centering transform
in the bootstrap, the simultaneous band target diverges from the
analytical Binder-TSL variance. Clear NotImplementedError points
users to lonely_psu='remove' (matches the 'remove' analytical target)
or cband=False (skips bootstrap). 'remove' / 'certainty' continue to
work unchanged. Deferred transform tracked for follow-up.

R2 P1 (code quality): reject cluster= + weights/survey= on
design='mass_point' on both static and event-study paths. The
weighted path composes Binder-TSL variance via
compute_survey_if_variance which was silently overriding the CR1
sandwich while result metadata still advertised vcov_type='cr1' and
cluster_name=<col>. Clean NotImplementedError with a pointer to the
two supported combinations: cluster= alone (unweighted CR1) or
survey/weights alone (Binder-TSL). Combined cluster-robust survey
inference requires a derivation not yet in scope.

Regression tests (+4):
- test_mass_point_survey_plus_cluster_rejected_static
- test_mass_point_survey_plus_cluster_rejected_event_study
- test_lonely_psu_adjust_with_singletons_rejected_on_cband
- test_stratified_h1_sup_t_matches_analytical (H=1 quantile lock
  under n_strata=4, psu_per_unit: q=1.985, matches Phi^-1(0.975))

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 4ecc6f437fbdaea479fec838f7d700a2ce452ee5


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The two P1 findings from the prior review look addressed: lonely_psu="adjust" with singleton strata now hard-errors on the cband path, and cluster= is rejected on weighted mass-point overall/event-study fits instead of silently mixing CR1 and Binder inference. diff_diff/had.py:L2064-L2103, diff_diff/had.py:L3301-L3320, diff_diff/had.py:L3917-L3934, tests/test_had.py:L4850-L4992
  • The weighted 2SLS HC1/CR1 core is aligned with the Methodology Registry and the new estimatr parity fixture; the weighted-classical difference versus estimatr is a documented deviation, so that part is not a defect. docs/methodology/REGISTRY.md:L2296-L2301, tests/test_estimatr_iv_robust_parity.py:L45-L206
  • P1 The new sup-t bootstrap is still mis-specified on the supported survey=SurveyDesign(weights=...) event-study path when no explicit strata, psu, or fpc is present.
  • P1 The new mass-point weighted paths do not handle vcov_type="classical" consistently: the survey path overwrites classical SE with an HC1-aligned Binder composition while still labeling the result as classical, and the event-study cband path mixes classical SE with an HC1-scaled influence matrix.
  • The new TODO entries for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked and are non-blocking. TODO.md:L95-L96

Methodology

Affected methods: Wooldridge-style weighted 2SLS on the HAD mass-point path, Binder-TSL survey composition, and the new sup-t multiplier bootstrap.

  • P1 Impact: _sup_t_multiplier_bootstrap() only takes the survey-aware branch when strata, psu, or fpc is present, so the supported survey=SurveyDesign(weights=...) event-study path falls back to raw iid multipliers even though its analytical SE comes from the centered trivial-design Binder formula. The helper it is supposed to reuse already supports the no-strata/no-PSU case, and the registry/docstring both describe the survey branch as keyed off resolved_survey is not None, not “has explicit strata/psu/fpc”. This silently miscalibrates simultaneous bands on a supported survey path. Concrete fix: treat any non-None resolved_survey as a survey bootstrap input, including the implicit-PSU case, or reject survey=SurveyDesign(weights=...) when cband=True until that target is implemented. Add an H=1 regression for the trivial survey design. References: diff_diff/had.py:L1986-L2026, diff_diff/had.py:L2058-L2176, diff_diff/bootstrap_utils.py:L509-L579, diff_diff/survey.py:L1388-L1404, diff_diff/survey.py:L1586-L1592, docs/methodology/REGISTRY.md:L2305-L2315
  • P1 Impact: the new mass-point weighted paths accept vcov_type="classical" but the only returned influence function is explicitly HC1-scaled. On the survey= path, that HC1-aligned psi overwrites the classical sandwich SE via compute_survey_if_variance() while the result still reports vcov_type="classical"; on the event-study weights= + cband=True path, the band divides HC1-scale perturbations by classical se_e. That is an undocumented variance mismatch on changed code, not a documented registry deviation. Concrete fix: either reject vcov_type="classical" anywhere the weighted mass-point path needs psi (survey composition or cband), or derive/return a classical-aligned IF and thread it through both overall and event-study inference, with metadata updated to match the actual variance family. Add regressions for static survey= and event-study weights=/survey= with vcov_type="classical". References: diff_diff/had.py:L2426-L2482, diff_diff/had.py:L3336-L3367, diff_diff/had.py:L4097-L4147, docs/methodology/REGISTRY.md:L2296-L2303

Code Quality

No separate findings beyond the methodology issues above.

Performance

No findings.

Maintainability

No separate findings beyond the methodology issues above.

Tech Debt

  • P3 Impact: the follow-ups for weighted analytical cross-horizon covariance and unweighted sup-t are explicitly tracked, so they should not block this PR. Concrete fix: none required in this PR. References: TODO.md:L95-L96

Security

No findings.

Documentation/Tests

  • P2 Impact: the added tests lock HC1/CR1 parity, the prior re-review fixes, and the stratified survey sup-t branch, but they do not cover the supported trivial survey=SurveyDesign(weights="w") cband path or vcov_type="classical" on weighted mass-point survey/cband paths. That is why both P1 interactions above remain unguarded. Concrete fix: add those targeted regressions. References: tests/test_had.py:L4449-L4454, tests/test_had.py:L4644-L4770, tests/test_had.py:L4950-L4992, tests/test_estimatr_iv_robust_parity.py:L45-L206

Path to Approval

  1. Make _sup_t_multiplier_bootstrap() use the survey-aware bootstrap for every non-None resolved_survey, including the no-strata/no-PSU case, or explicitly reject that cband combination until implemented; add an H=1 regression for survey=SurveyDesign(weights="w").
  2. Resolve the weighted mass-point vcov_type="classical" interaction by either rejecting it on survey/cband paths or implementing a classical-aligned IF end-to-end; add regressions for static survey= and event-study weighted cband cases, and ensure result metadata matches the actual variance family.

Review based on diff inspection; I did not run the test suite in this environment.

R3 P1 (methodology — trivial-survey sup-t): _sup_t_multiplier_bootstrap
now treats ANY non-None resolved_survey as survey-aware bootstrap input,
including the trivial SurveyDesign(weights=...) case (no explicit
strata / PSU / FPC). Previously the gate required explicit strata /
psu / fpc, so the trivial survey path fell through to raw unit-level
Rademacher — Var(xi @ Psi) = Σ psi², NOT the centered (n/(n-1))·Σ(ψ−ψ̄)²
target that compute_survey_if_variance uses for the analytical SE.
The existing "no strata → single implicit stratum" branch in
_sup_t_multiplier_bootstrap now handles the trivial case end-to-end.
H=1 trivial-survey regression locks q ≈ 1.96.

R3 P1 (methodology — classical weighted mass-point): reject
vcov_type='classical' on mass-point whenever the IF matrix is used
(static survey= path; event-study survey= OR weights= shortcut with
cband=True). The IF is HC1-scaled (sqrt((n-1)/(n-k)) factor in
_fit_mass_point_2sls targets compute_survey_if_variance(psi, trivial)
≈ V_HC1[1,1]); mixing it with a classical analytical SE either
through the survey Binder-TSL override or the sup-t bootstrap
normalization would silently report a V_HC1-targeted SE under a
classical vcov_type label. Rejection message points users to hc1.
weights= shortcut + cband=False + classical continues to work
unchanged (no IF consumption; analytical classical sandwich returned
as-is).

Regression tests (+4):
- test_trivial_survey_h1_sup_t_matches_analytical: H=1 trivial
  SurveyDesign reduces to Normal quantile, confirms stratum-demean +
  sqrt(n/(n-1)) correction fires on the trivial case.
- test_mass_point_classical_survey_rejected_static
- test_mass_point_classical_event_study_with_cband_rejected
- test_mass_point_classical_event_study_cband_false_accepts
  (complement: confirms classical+weights+cband=False still works,
  no IF consumption)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: cb8a711e10275e2059c4ad8ce9199859d7ea5b52


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The prior re-review blockers around the trivial-survey sup-t branch and the classical/HC1 IF-mixing issue look addressed.
  • P1 The new public mass-point guard now rejects cluster= on all weighted fits, including the valid weights= shortcut path. That overreaches the actual incompatibility, because _fit_mass_point_2sls and the new estimatr parity tests explicitly support weighted CR1.
  • P1 The new HAD sup-t path hard-errors on lonely_psu="adjust" with singleton strata, but the Methodology Registry still says the reused bootstrap helper provides lonely-PSU handling on this path. That is an undocumented methodology deviation.
  • P2 Weighted event-study results now report n_units / n_obs_per_horizon as the full design size under zero-weight subpopulation fits, not the contributing positive-weight subset.
  • P2 The new event-study result docstring says all weighted paths use Binder linearization, but the implementation and registry still distinguish survey= Binder from the weights= shortcut’s analytical HC1/2SLS path.
  • The new TODO entries for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked and non-blocking.

Methodology

  • Severity: P1. Impact: fit() now rejects cluster= whenever weights_unit_full is not None, which blocks the documented and parity-tested weights= + weighted-CR1 mass-point path on both overall and event-study fits, even though the actual survey/Binder incompatibility only exists on the survey= path (and on event-study weights= only when cband=True). This is an undocumented methodology/API mismatch against the registry’s weighted-2SLS contract and the new CR1 parity harness. Concrete fix: narrow the guard from weights_unit_full is not None to the actual incompatible cases: resolved_survey_unit_full is not None on the static path, and on event-study only reject weights= when cluster= and cband=True unless you add a cluster-aware bootstrap. References: diff_diff/had.py:L3314-L3324, diff_diff/had.py:L3951-L3960, docs/methodology/REGISTRY.md:L2296-L2301, tests/test_estimatr_iv_robust_parity.py:L106-L150, tests/test_had.py:L4850-L4878.
  • Severity: P1. Impact: _sup_t_multiplier_bootstrap() now raises on lonely_psu="adjust" with singleton strata, but the registry still documents the reused survey bootstrap helper as supplying lonely-PSU handling here, and the shared survey-bootstrap registry says "adjust" is supported. If this HAD-specific limitation is methodologically necessary, it needs to be documented as a deviation; otherwise this is an unsupported production edge case on a newly added survey path. Concrete fix: either implement pooled-singleton pseudo-stratum centering in _sup_t_multiplier_bootstrap() so the HAD sup-t path actually supports "adjust", or add a HAD-specific Note/Deviation in REGISTRY.md and keep the error/tests aligned with that narrower support matrix. References: diff_diff/had.py:L2070-L2107, docs/methodology/REGISTRY.md:L2307-L2311, docs/methodology/REGISTRY.md:L2998-L3008, tests/test_had.py:L4910-L4948.

Code Quality

  • Severity: P2. Impact: weighted event-study results set n_obs_per_horizon and n_units to G_full, even though the code explicitly drops zero-weight units from design resolution and uses the positive-weight subset for estimation. That violates the existing HAD weighted-count contract and overstates contributing sample size in summaries/dataframes. Concrete fix: keep survey_metadata on the full design, but report n_units / n_obs_per_horizon from the positive-weight contributing subset, and add an event-study analogue of the static zero-weight count regression. References: diff_diff/had.py:L556-L560, diff_diff/had.py:L581-L585, diff_diff/had.py:L4073-L4080, diff_diff/had.py:L4270-L4277, tests/test_had.py:L4092-L4110, docs/methodology/REGISTRY.md:L3139-L3145.

Performance

  • No findings.

Maintainability

  • No separate findings beyond the reporting/doc drift above.

Tech Debt

  • Severity: P3. Impact: the follow-ups for weighted analytical cross-horizon covariance and unweighted sup-t are explicitly tracked and should not block this PR. Concrete fix: none required in this PR. References: TODO.md:L95-L96.

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the new HeterogeneousAdoptionDiDEventStudyResults.se docstring says all weighted event-study fits use Binder linearization, but the implementation and registry still use Binder only on survey= and analytical HC1/2SLS on the weights= shortcut. The new event-study tests also cover cband population but not the zero-weight reporting contract that the static path already enforces. Concrete fix: update the docstring to match the actual variance split, and add an event-study regression for zero-weight n_units / n_obs_per_horizon. References: diff_diff/had.py:L540-L547, docs/methodology/REGISTRY.md:L2303-L2305, tests/test_had.py:L4595-L4670.

Path to Approval

  1. Narrow the mass-point cluster= guard to the actually incompatible cases so the valid weights= + weighted-CR1 shortcut remains available; if event-study cband=True still cannot handle clustered weights, reject only that specific combination.
  2. Resolve the lonely_psu="adjust" survey-bootstrap mismatch by either implementing the missing pooled-singleton centering step or documenting the HAD-specific unsupported combo in REGISTRY.md with a proper Note/Deviation.
  3. Fix the weighted event-study reporting/documentation drift: report positive-weight contributing counts, correct the se docstring, and add a regression for the zero-weight event-study result contract.

Review based on diff inspection only; I could not run the test suite here because the environment is missing project dependencies (numpy).

R4 P1 (cluster guard overreach): narrow the mass-point
cluster+weighted rejection. Previously rejected ANY cluster= +
weighted fit, which blocked the valid weights= shortcut + cluster=
weighted-CR1 pweight sandwich path (parity-tested vs
estimatr::iv_robust se_type='stata'). Narrowed to:
- Static path: reject only cluster= + survey= (Binder-TSL override
  would silently overwrite CR1 while metadata says vcov_type='cr1').
- Event-study path: reject cluster= + survey= OR cluster= + weights=
  + cband=True (sup-t bootstrap mixes HC1-scale perturbations with
  CR1 analytical SE). cluster= + weights= + cband=False is allowed
  (no IF consumption).

R4 P1 (lonely_psu="adjust" deviation): added REGISTRY.md Note
explicitly documenting the HAD-specific unsupported combo (weighted
event-study + cband=True + lonely_psu="adjust" with singleton
strata). The shared survey bootstrap helper supports pooled
singletons but compute_survey_if_variance centers them at the global
mean; matching the two requires a pseudo-stratum centering transform
not yet derived. Other survey-bootstrap consumers (CS, dCDH, SDID)
retain full "adjust" support.

R4 P2 (n_units reporting): weighted event-study now reports the
POSITIVE-WEIGHT contributing sample size in n_units and
n_obs_per_horizon, matching the static-path n_obs contract. Full-
design size still surfaces through survey_metadata.n_psu /
effective_n / etc.

R4 P2 (docstring drift): HeterogeneousAdoptionDiDEventStudyResults.se
docstring now documents the three regimes separately — unweighted
analytical sandwich, weights= shortcut analytical (HC1 / classical /
CR1), and survey= Binder-TSL. Previously said "Binder-TSL for all
weighted fits" which was wrong for the weights= shortcut.

Regression tests (+5):
- test_mass_point_weights_plus_cluster_shortcut_allowed
- test_mass_point_weights_plus_cluster_event_study_cband_false_allowed
- test_mass_point_weights_plus_cluster_event_study_cband_true_rejected
- test_event_study_zero_weight_units_excluded_from_n_units
- Updated test_mass_point_survey_plus_cluster_rejected_static to use
  survey= instead of weights= (matches narrowed guard)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 35794442f3a40045627c26bcd21c0d77acfdbfb1


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The prior blockers around the mass-point cluster= guard, the lonely_psu="adjust" documentation mismatch, and weighted event-study count/docstring drift look addressed in the changed code and registry.
  • P1 The new “classical-on-IF-consumption” rejection is incomplete: it only fires when vcov_type="classical" is passed explicitly. The default mass-point config (vcov_type=None, robust=False) still resolves to classical and bypasses the guard on the new weighted paths.
  • On the static survey= mass-point path, that leaves Binder-TSL inference built from an HC1-scaled IF while the result still advertises classical variance settings.
  • On the weighted event-study mass-point path with weights= and default cband=True, it can also feed an HC1-scaled IF into the sup-t bootstrap while normalizing by classical per-horizon SE.
  • No new P0 NaN/inference-propagation issues stood out in the changed code; safe_inference() and the new cband NaN gate are used consistently.
  • The new TODO entries for weighted analytical cross-horizon covariance and unweighted sup-t are properly tracked and non-blocking.

Methodology

  • Severity: P1. Impact: the weighted mass-point “classical is unsupported when the IF is consumed” fix is only keyed off vcov_type_arg == "classical", not the effective default mapping vcov_type=None, robust=False -> "classical". That leaves two silent mismatches on newly added paths: (1) design="mass_point" + survey= on the overall path still composes Binder-TSL SE from the HC1-scaled IF while labeling the fit as classical, and (2) design="mass_point" + aggregate="event_study" + weights= with default cband=True can still mix HC1-scale bootstrap perturbations with classical analytical SE. This contradicts the new Methodology Registry deviations, which say these IF-consuming classical combinations should be rejected. Concrete fix: resolve vcov_requested first from vcov_type/robust, then apply the existing rejection to vcov_requested == "classical" in both branches. References: diff_diff/had.py:L2568-L2581, diff_diff/had.py:L3352-L3369, diff_diff/had.py:L3394-L3401, diff_diff/had.py:L4058-L4080, diff_diff/had.py:L4192-L4243, docs/methodology/REGISTRY.md:L2319-L2321.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings beyond the methodology guard gap above.

Tech Debt

  • Severity: P3. Impact: the follow-ups for weighted analytical cross-horizon covariance and unweighted sup-t are explicitly tracked and should not block this PR. Concrete fix: none in this PR. References: TODO.md:L95-L96.

Security

  • No findings.

Documentation/Tests

  • Severity: P2. Impact: the new regression coverage only checks the explicit vcov_type="classical" rejection, so it misses the default vcov_type=None, robust=False path that currently slips through. Concrete fix: add regressions for HeterogeneousAdoptionDiD(design="mass_point") with survey= on the overall path, and with aggregate="event_study", weights=..., cband=True on the event-study path, both without setting vcov_type. References: tests/test_had.py:L5044-L5070, tests/test_had.py:L5072-L5105.

Path to Approval

  1. In both mass-point branches, compute the effective vcov_requested before any classical-specific guard and reject IF-consuming classical combinations based on that resolved value, not only on an explicit vcov_type="classical" input.
  2. Add regression tests for the default mass-point weighted cases (vcov_type=None, robust=False) on static survey= and weighted event-study cband=True so this branch cannot regress again.

R5 P1 (classical-guard bypass): the classical-on-IF-consumption
rejection was only checking the RAW vcov_type kwarg, which missed
the default mapping (vcov_type=None, robust=False → classical).
Re-ordered both mass-point branches (static + event-study) to
resolve vcov_requested first, then gate the rejection on the
resolved value. Default-path users who pass
HeterogeneousAdoptionDiD(design='mass_point').fit(..., survey=...)
without an explicit vcov_type now hit the same clear
NotImplementedError as the explicit vcov_type='classical' path.

R5 P2 (regression coverage): added three tests covering the
default-path cases that slipped through the previous guard:
- test_mass_point_default_vcov_survey_rejected_static: default
  vcov_type=None + survey= on mass-point → NotImplementedError
- test_mass_point_default_vcov_event_study_cband_rejected: default
  vcov_type=None + weights= + cband=True on event-study → rejects
- test_mass_point_default_vcov_robust_true_survey_allowed: complement
  confirms robust=True (default mapping → hc1) survey= is allowed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 8d7cf9401186922e9996c7209cb14ad684b7fc6c


Overall Assessment

Looks good

No unmitigated P0/P1 findings. One P2 performance item and one P3 test-gap note remain.

Executive Summary

  • The prior P1 from the last review is fixed: both mass-point branches now reject IF-consuming weighted classical requests after resolving the effective vcov family, including the default vcov_type=None, robust=False -> "classical" mapping (diff_diff/had.py:L3343-L3376, diff_diff/had.py:L4058-L4092, tests/test_had.py:L5292-L5379).
  • The weighted 2SLS, event-study survey composition, and sup-t band line up with the updated Methodology Registry; the remaining restrictions are documented deviations rather than silent mismatches (docs/methodology/REGISTRY.md:L2296-L2321).
  • No new P0/P1 inference issues stood out: pointwise inference uses safe_inference(), and simultaneous-band endpoints are NaN-gated on invalid SEs (diff_diff/had.py:L2225-L2233, diff_diff/had.py:L3422-L3431, diff_diff/had.py:L4230-L4238).
  • P2: cband=False still allocates/populates the full weighted event-study IF matrix and forces IF returns on the weights= shortcut even though the bootstrap is skipped (diff_diff/had.py:L4130-L4255).
  • P3: the new event-study coverage is strong for weights= and the standalone bootstrap helper, but there is still no estimator-level survey= + aggregate="event_study" regression in the changed tests (tests/test_had.py:L4622-L4770, tests/test_had.py:L4911-L5042).
  • I could not execute the suite here because this environment lacks pytest and numpy.

Methodology

  • Severity: P3 informational. Impact: The previous blocker is resolved. The code now rejects IF-consuming weighted mass-point requests based on the resolved vcov family, so the default classical path no longer slips through. Concrete fix: none. References: diff_diff/had.py:L3343-L3376, diff_diff/had.py:L4058-L4092, tests/test_had.py:L5292-L5379.
  • Severity: P3 informational. Impact: The remaining methodology deviations are documented and front-door enforced: lonely_psu="adjust" with singleton strata on the sup-t path, classical-on-IF-consuming weighted mass-point paths, and mass-point cluster= + survey= / cluster= + weights= + cband=True. Under the stated review rules, these are non-defects. Concrete fix: none. References: diff_diff/had.py:L2080-L2119, diff_diff/had.py:L3327-L3376, diff_diff/had.py:L3969-L4006, docs/methodology/REGISTRY.md:L2305-L2321.

Code Quality

  • No findings.

Performance

  • Severity: P2. Impact: fit(..., aggregate="event_study", cband=False) still allocates Psi for every weighted horizon and, on the weights= shortcut, forces per-horizon IF computation even though the simultaneous band is disabled. That adds avoidable O(GH) memory and extra IF work to the opt-out path. Concrete fix: split needs_horizon_if from needs_stacked_if_matrix; only allocate/fill Psi when cband=True, and on the weights= shortcut avoid force_return_influence=True when cband=False. References: diff_diff/had.py:L4130-L4137, diff_diff/had.py:L4154-L4181, diff_diff/had.py:L4205-L4226, diff_diff/had.py:L4240-L4255, tests/test_had.py:L4622-L4643, tests/test_had.py:L5107-L5136.

Maintainability

  • No findings beyond the performance issue above.

Tech Debt

  • Severity: P3 informational. Impact: The remaining weighted event-study joint-covariance work and unweighted sup-t extension are explicitly tracked in TODO.md, so they are non-blocking under the deferred-work policy. Concrete fix: none in this PR. References: TODO.md:L95-L96.

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: The changed tests cover weights= end-to-end and _sup_t_multiplier_bootstrap directly, but they do not add an estimator-level fit(..., aggregate="event_study", survey=SurveyDesign(...)) regression for the new positive survey path. That leaves the survey event-study dispatch/composition path without a full integration lock. Concrete fix: add one continuous and one mass-point survey-backed event-study regression asserting variance_formula, survey_metadata.df_survey, and cband_* behavior (or the documented rejection path where applicable). References: tests/test_had.py:L4622-L4770, tests/test_had.py:L4911-L5042.

R6 P2 (event-study opt-out overhead): cband=False on the weighted
event-study path no longer allocates the stacked (G, H) IF matrix or
forces per-horizon IF return on the weights= shortcut. Split two
flags internally:

  needs_per_horizon_if = survey= path OR (weights= AND cband=True)
  needs_stacked_if_matrix = weights= AND cband=True  (alias for
                           weighted_es AND cband)

- Psi allocation gated on needs_stacked_if_matrix.
- _fit_continuous force_return_influence gated on
  (needs_stacked_if_matrix AND resolved_survey_unit_full is None) —
  under survey= path, _fit_continuous returns the IF anyway via its
  resolved_survey_unit gate, so no extra cost.
- _fit_mass_point_2sls return_influence gated on
  needs_per_horizon_if — survey= path needs the per-horizon IF for
  the Binder-TSL override regardless of cband.

Net effect: cband=False + weights= shortcut + weighted_es skips the
O(GH) Psi allocation and the per-horizon IF work entirely. cband=True
paths and survey= paths unchanged.

R6 P3 (event-study survey= integration coverage): added two end-to-end
integration tests for the previously-unguarded positive-path
estimator-level survey= + aggregate='event_study' dispatch:

- test_survey_event_study_continuous_end_to_end: continuous_at_zero +
  SurveyDesign(strata='stratum') — asserts variance_formula=
  'survey_binder_tsl', survey_metadata.df_survey=G-n_strata, cband_*
  populated, PSU dispatch through _aggregate_unit_resolved_survey.
- test_survey_event_study_mass_point_end_to_end: mass_point +
  SurveyDesign(strata=...) — asserts variance_formula=
  'survey_binder_tsl_2sls' and that the 2SLS IF flows through
  per-horizon Binder-TSL + sup-t bootstrap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 2c325f3f22e98e96552ac4b16dd42d8c1f554c49


Overall Assessment

Blocker

One unmitigated P0 remains: the new mass-point weighted event-study shortcut miscalibrates the sup-t band on weights= + aggregate="event_study" + cband=True, producing silently too-narrow simultaneous confidence bands.

Executive Summary

  • The main prior re-review items I could verify are resolved: the event-study mass-point classical guard now keys off the resolved vcov family, the Psi/IF allocation split fixes the earlier cband=False waste, and estimator-level survey= event-study regressions were added (diff_diff/had.py:L4058-L4092, diff_diff/had.py:L4130-L4144, tests/test_had.py:L5354-L5454).
  • P0: on the new design="mass_point" weighted event-study shortcut, _fit_mass_point_2sls() returns an HC1-targeted IF, but _sup_t_multiplier_bootstrap() calibrates the weights= shortcut with raw unit-level multipliers against sum(psi^2) rather than the documented trivial-Binder / HC1 target. That makes the sup-t distribution under-dispersed and the simultaneous band too narrow with no warning (diff_diff/had.py:L2184-L2192, diff_diff/had.py:L2482-L2498, diff_diff/survey.py:L1214-L1231, docs/methodology/REGISTRY.md:L2301-L2313).
  • The affected scope is narrow: mass_point + weights= + event_study + cband=True. The survey= path is protected by the centered/scaled survey-aware bootstrap, and cband=False is unaffected (diff_diff/had.py:L2080-L2182, diff_diff/had.py:L4217-L4272).
  • The lonely-PSU / classical / cluster restrictions remain documented deviations in the Methodology Registry, so I am not counting them as defects (docs/methodology/REGISTRY.md:L2317-L2321).
  • The new tests still miss a calibration lock on the affected shortcut branch: generic H=1 and trivial-survey reduction tests exist, but the mass-point shortcut only has a smoke test that a band was populated (tests/test_had.py:L4502-L4538, tests/test_had.py:L4995-L5042, tests/test_had.py:L4736-L4770).
  • I did not execute the suite in this environment; the review is based on source inspection.

Methodology

Affected method: HeterogeneousAdoptionDiD mass-point weighted event-study / sup-t path (_fit_mass_point_2sls, _fit_event_study, _sup_t_multiplier_bootstrap).

  • Severity: P0. Impact: _fit_mass_point_2sls() explicitly scales its returned IF so that compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1, and the registry documents that same contract. But on the weights= shortcut, _fit_event_study() passes that IF matrix into _sup_t_multiplier_bootstrap() with resolved_survey=None, which takes the raw unit-level multiplier branch and treats Var(xi @ Psi) as sum(psi^2). For this 2SLS IF, those are not the same target: by the PR’s own derivation, raw multiplier variance is only ((n-1)/n) * V_HC1. The result is a silently under-scaled sup-t distribution and too-narrow simultaneous bands on the shortcut path. This also breaks the front-door contract that weights= is internally equivalent to survey=SurveyDesign(weights=w) on this branch. Concrete fix: on mass_point + weights= + event_study + cband=True, either run the shortcut through the same trivial-survey centering / sqrt(n/(n-1)) correction as the survey-aware branch, or return a separate shortcut-bootstrap IF whose raw sum(psi^2) matches the HC1 analytical variance. References: diff_diff/had.py:L3750-L3757, diff_diff/had.py:L4217-L4272, diff_diff/had.py:L2184-L2192, diff_diff/had.py:L2482-L2498, diff_diff/survey.py:L1214-L1231, docs/methodology/REGISTRY.md:L2301-L2313.

Code Quality

No findings.

Performance

No findings. The prior Psi allocation concern appears resolved by the needs_per_horizon_if / needs_stacked_if_matrix split in diff_diff/had.py:L4130-L4144.

Maintainability

No findings.

Tech Debt

No separate findings. The new follow-ups in TODO.md:L95-L96 are reasonable, but they do not track or mitigate the blocker above.

Security

No findings.

Documentation/Tests

  • Severity: P3 informational. Impact: the changed tests cover the generic H=1 reduction and the trivial-survey centered branch, but the affected mass-point weights= shortcut only has a smoke test that cband_crit_value is finite. That gap is why this branch-specific calibration error can slip through while the surrounding bootstrap tests still pass. Concrete fix: add a regression for design="mass_point", aggregate="event_study", weights= shortcut, cband=True, H=1 asserting q ≈ 1.96 and/or equality with survey=SurveyDesign(weights=...) on identical data. References: tests/test_had.py:L4502-L4538, tests/test_had.py:L4995-L5042, tests/test_had.py:L4736-L4770.

Path to Approval

  1. Make the mass-point weights= shortcut cband path use the same variance family as the analytical HC1 SE. The smallest fix is to apply the trivial-survey centering and small-sample rescale before xi @ Psi, or equivalently route that branch through a synthetic trivial ResolvedSurveyDesign.
  2. Add a regression test on the affected branch: design="mass_point", aggregate="event_study", weights= shortcut, cband=True, single horizon / H=1, asserting the shortcut critical value reduces to the normal quantile and matches the survey=SurveyDesign(weights=...) result on the same data.

R7 P0 (sup-t under-scaling on weights= shortcut): the per-unit IF
returned by _fit_continuous / _fit_mass_point_2sls is HC1-scaled per
the PR #359 convention — compute_survey_if_variance(psi,
trivial_resolved) ≈ V_HC1. Routing the weights= shortcut through the
unit-level ``resolved_survey=None`` branch of
_sup_t_multiplier_bootstrap normalized against raw sum(psi²) =
((n-1)/n) · V_HC1, producing silently too-narrow simultaneous bands.

Fix: when the weighted event-study + cband=True path runs, always
route the sup-t bootstrap through a ResolvedSurveyDesign. On the
weights= shortcut (no user-supplied survey), construct a synthetic
trivial resolved (pweight, no strata/psu/fpc, lonely_psu='remove')
so the centered + sqrt(n/(n-1))-corrected survey-aware branch fires.
The no-strata/no-PSU path inside that branch falls through to the
"single implicit stratum — demean across all PSUs, scale by
sqrt(n_psu/(n_psu-1))" block, which gives Var_xi(xi @ Psi_psu) ≈
V_HC1 as the IF scale convention requires.

Net effect: weights= shortcut and survey=SurveyDesign(weights=...)
now target the SAME variance family in the bootstrap (~atol=0.05
between the two quantiles at matching seeds, bounded by the
bc_fit.se_robust vs Binder-TSL per-horizon SE convergence
tolerance from PR #359). Previously the shortcut was under-scaled
by sqrt(n/(n-1)) relative to the analytical HC1 target.

Regression tests (+2):
- test_weights_shortcut_mass_point_h1_cband_matches_normal: helper-
  level H=1 lock with mass-point HC1-scaled IF + synthetic trivial
  resolved. q → Phi^-1(0.975) ≈ 1.96 at atol=0.15 (MC noise at
  B=5000). The pre-fix under-scaling would have produced q ≈ 1.94
  (systematic drift outside MC noise).
- test_weights_shortcut_cband_matches_trivial_survey: weights=
  shortcut and survey=SurveyDesign(weights='w') event-study cband
  quantiles agree within atol=0.05 on the same DGP / seed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 24, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 2431cce6e010b0e809ee75cdc6299f7869c3bbc8


Overall Assessment

✅ Looks good

Executive Summary

  • The prior blocker on the mass-point weights= event-study cband path appears resolved: the shortcut is now routed through a synthetic trivial ResolvedSurveyDesign, and _sup_t_multiplier_bootstrap() applies the centered/small-sample-corrected survey branch instead of the old raw shortcut calibration (diff_diff/had.py:L2069-L2176, diff_diff/had.py:L4257-L4288).
  • The weighted 2SLS formulas, HC1/CR1 parity locks, and HC1-scale IF contract line up with the new registry entry and dedicated parity tests (diff_diff/had.py:L2241-L2516, docs/methodology/REGISTRY.md:L2296-L2321, tests/test_estimatr_iv_robust_parity.py:L1-L206).
  • Parameter propagation looks complete: n_bootstrap/seed are exposed in estimator params, the event-study path separates per-horizon IF needs from stacked-bootstrap IF allocation, and weighted event-study results now surface survey_metadata, variance_formula, effective_dose_mean, and cband_* consistently (diff_diff/had.py:L2731-L2741, diff_diff/had.py:L4130-L4144, diff_diff/had.py:L4326-L4388).
  • I did not execute the suite here; the local Python environment is missing numpy, so this review is source-based.

Methodology

No unmitigated P0/P1 findings. The changed estimator logic now matches the documented HC1/Binder targets materially better than in the previous review state.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The follow-up work on analytical cross-horizon covariance and unweighted sup-t is properly tracked in TODO.md:L95-L96.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: the implementation now uses analytical HC1/2SLS pointwise SE on the weights= event-study shortcut and routes sup-t calibration through a synthetic trivial survey design, but parts of the prose still describe the weighted path as Binder-TSL/raw unit-level shortcut behavior. That can mislead future maintainers when reconciling the previous blocker fix. Concrete fix: update the _fit_event_study() docstring, the Phase 4.5 B changelog entry, and the registry sup-t bullet to describe the actual fixed shortcut branch: analytical HC1/2SLS pointwise SE on weights=, plus trivial-survey bootstrap routing for cband=True. References: diff_diff/had.py:L3728-L3733, diff_diff/had.py:L4217-L4240, diff_diff/had.py:L4257-L4288, docs/methodology/REGISTRY.md:L2305-L2311, CHANGELOG.md:L11.

R8 P3 (doc-consistency drift): prose in _fit_event_study docstring,
REGISTRY sup-t bullet, and CHANGELOG entry still described the
pre-R7-fix weighted-path behavior (Binder-TSL for all weighted fits;
raw unit-level shortcut bootstrap). Updated all three to match the
actual implementation:

- _fit_event_study docstring: distinguishes survey= path (Binder-TSL)
  from weights= shortcut (analytical CCT-2014 / 2SLS pweight
  sandwich), documents trivial-survey bootstrap routing for cband.
- REGISTRY.md: new "weights= shortcut ↔ bootstrap routing" bullet
  under the sup-t section explaining the synthetic trivial
  ResolvedSurveyDesign construction and why (centered +
  sqrt(n/(n-1))-corrected branch targets V_HC1, raw unit-level would
  give ((n-1)/n) · V_HC1 under-scaling).
- CHANGELOG.md: per-horizon variance on weights= shortcut clarified
  as analytical (not Binder-TSL); sup-t routing through trivial
  resolved documented.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 24, 2026
@igerber igerber merged commit c2d6755 into main Apr 25, 2026
23 of 24 checks passed
@igerber igerber deleted the had-phase-4.5-survey-dispatch branch April 25, 2026 00:21
igerber added a commit that referenced this pull request Apr 25, 2026
Closes the Phase 4.5 C0 promise (PR #367 commit 29f8b12). Linearity-
family pretests now accept survey=/weights= keyword-only kwargs:

- stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test,
  joint_homogeneity_test, did_had_pretest_workflow.

Stute family: PSU-level Mammen multiplier bootstrap via
generate_survey_multiplier_weights_batch. Each replicate draws
(B, n_psu) Mammen multipliers, broadcast to per-obs perturbation
eta_obs[g] = eta_psu[psu(g)], weighted OLS refit, weighted CvM via new
_cvm_statistic_weighted helper. Joint Stute SHARES the multiplier matrix
across horizons within each replicate, preserving both vector-valued
empirical-process unit-level dependence (Delgado 1993; Escanciano 2006)
AND PSU clustering (Krieger-Pfeffermann 1997). NOT Rao-Wu rescaling --
multiplier bootstrap is a different mechanism.

Yatchew: closed-form weighted OLS + pweight-sandwich variance components
(no bootstrap):
  sigma2_lin  = sum(w * eps^2) / sum(w)
  sigma2_diff = sum(w_avg * diff^2) / (2 * sum(w))   [Reviewer CRITICAL #2]
  sigma4_W    = sum(w_avg * eps_g^2 * eps_{g-1}^2) / sum(w_avg)
  T_hr        = sqrt(sum(w)) * (sigma2_lin - sigma2_diff) / sigma2_W
where w_avg_g = (w_g + w_{g-1}) / 2 (Krieger-Pfeffermann 1997 Section 3).
All three components reduce bit-exactly to existing unweighted formulas
at w=ones(G); locked at atol=1e-14 by direct helper test.

Workflow under survey/weights: skips the QUG step with UserWarning (per
C0 deferral), sets qug=None on the report, dispatches the linearity
family with survey-aware mechanism, appends "linearity-conditional
verdict; QUG-under-survey deferred per Phase 4.5 C0" suffix to the
verdict. all_pass drops the QUG-conclusiveness gate (one less
precondition). HADPretestReport.qug retyped from QUGTestResults to
Optional[QUGTestResults]; summary/to_dict/to_dataframe updated to
None-tolerant rendering.

Pweight shortcut routing: weights= passes through a synthetic trivial
ResolvedSurveyDesign (new survey._make_trivial_resolved helper) so the
same kernel handles both entry paths -- mirrors PR #363's R7 fix pattern
on HAD sup-t.

Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise
NotImplementedError at every entry point (defense in depth, reciprocal-
guard discipline). The per-replicate weight-ratio rescaling for the
OLS-on-residuals refit step is not covered by the multiplier-bootstrap
composition; deferred to a parallel follow-up.

Per-row weights= / survey=col aggregated to per-unit via existing HAD
helpers (_aggregate_unit_weights, _aggregate_unit_resolved_survey;
constant-within-unit invariant enforced) through new
_resolve_pretest_unit_weights helper. Strictly-positive weights required
on Yatchew (the adjacent-difference variance is undefined under
contiguous-zero blocks).

Stability invariants preserved:
- Unweighted code paths bit-exact pre-PR (the new survey/weights branch
  is a separate if arm; existing 138 pretest tests pass unchanged).
- Yatchew weighted variance components reduce to unweighted at w=1 at
  atol=1e-14 (locked by TestYatchewHRTestSurvey).
- HADPretestReport schema bit-exact on the unweighted path; qug=None
  triggers the new None-tolerant rendering only on the survey path.

20 new tests across TestHADPretestWorkflowSurveyGuards (revised from
C0 rejection-only to C functional + 2 mutex/replicate-weight retained),
TestStuteTestSurvey (7), TestYatchewHRTestSurvey (7), TestJointStuteSurvey
(5). Full pretest suite: 158 tests pass.

Patch-level addition (additive on stable surfaces). See
docs/methodology/REGISTRY.md "QUG Null Test" -- Note (Phase 4.5 C) for
the full methodology.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant