HAD Phase 4.5 C0: QUG-under-survey decision gate by igerber · Pull Request #367 · igerber/diff-diff

igerber · 2026-04-25T01:09:52Z

Summary

Decision gate: qug_test(..., survey=) and qug_test(..., weights=) now raise NotImplementedError permanently with an educational methodology message; same gate on did_had_pretest_workflow.
QUG's test statistic uses extreme order statistics D_(1), D_(2) — not smooth functionals of the empirical CDF, so standard survey machinery (Binder TSL linearization, Rao-Wu rescaled bootstrap) does not yield a calibrated test. Permanent deferral with documented rationale.
Workflow gate is temporary: Phase 4.5 C will close it for the linearity-family pretests, using Rao-Wu rescaled bootstrap for the Stute family (stute_test and the joint variants) and weighted OLS residuals + a weighted variance estimator for yatchew_hr_test. Joint Stute is the survey-compatible alternative.
11 new tests (5 on TestQUGTest + 6 on new TestHADPretestWorkflowSurveyGuards); 138 pretest tests pass.
Sister pretests untouched in C0 — Phase 4.5 C will add kwargs + implementation together to avoid API churn.
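
A minimal sketch of the gate described above, not the code in diff_diff/had_pretests.py. The name qug_test_sketch, the alpha parameter, the returned dict, and the message text are all hypothetical. The statistic is assumed to be the order-statistic ratio T = D_(1) / (D_(2) - D_(1)) with critical region T > 1/α - 1, and the p-value 1/(1 + T) follows from the Exp(1)/Exp(1) limit.

```python
from typing import Optional, Sequence

# Hypothetical message; the real wording lives in the library docstring/REGISTRY.
_SURVEY_MSG = (
    "QUG is not supported under survey/weighted sampling: the statistic uses "
    "extreme order statistics D_(1), D_(2), which are not smooth functionals "
    "of the empirical CDF. Consider joint Stute (planned, Phase 4.5 C)."
)


def qug_test_sketch(
    d: Sequence[float],
    alpha: float = 0.05,
    *,  # everything after this is keyword-only, so positional calls are unchanged
    survey: Optional[object] = None,
    weights: Optional[Sequence[float]] = None,
) -> dict:
    # Mutex guard runs first, so passing both yields ValueError.
    if survey is not None and weights is not None:
        raise ValueError("Pass at most one of survey= or weights=, not both.")
    # Reject guard: either kwarg alone is refused before any computation.
    if survey is not None or weights is not None:
        raise NotImplementedError(_SURVEY_MSG)

    doses = sorted(float(x) for x in d)
    if len(doses) < 2:
        raise ValueError("need at least two doses")
    d1, d2 = doses[0], doses[1]
    gap = d2 - d1
    # Assumed statistic: ratio of the smallest dose to the first spacing.
    t_stat = float("inf") if gap == 0 else d1 / gap
    p_value = 1.0 / (1.0 + t_stat)  # survival function of Exp(1)/Exp(1)
    return {"t_stat": t_stat, "p_value": p_value, "reject": t_stat > 1.0 / alpha - 1.0}
```

Calling qug_test_sketch(d) exercises only the unweighted path; any non-None survey= or weights= stops at the guards.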

Methodology rationale (mirrored across docstring ↔ error message ↔ REGISTRY)

Extreme order statistics are not smooth functionals of the empirical CDF. Standard survey machinery (Binder-TSL via compute_survey_if_variance, Rao-Wu via bootstrap_utils.generate_rao_wu_weights, Krieger-Pfeffermann (1997) EDF tests) all rely on Hadamard differentiability — the first two order statistics fail it.
The Exp(1)/Exp(1) limit law assumes iid sampling. Under cluster sampling D_(1) and D_(2) may both come from the same PSU, breaking independence; under stratification the smallest dose may come from a small over- or under-sampled stratum, biasing the test.
The EVT-under-unequal-probability-sampling literature is sparse. Quintos et al. (2001), Beirlant et al. cover tail-INDEX estimation. No off-the-shelf method for "test the support endpoint under complex sampling" exists. The de Chaisemartin et al. (2026) HAD paper does not discuss survey extensions of QUG.
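
The second reason can be illustrated with a toy Monte Carlo. This is a sketch under stated assumptions, not an analysis from the PR or the paper: the statistic is taken to be T = D_(1) / (D_(2) - D_(1)) with critical value 1/α - 1, and the clustered design (a PSU-level dose level plus small within-PSU noise) is invented purely for illustration. All function names are hypothetical. In both designs the support of the dose starts at 0, so the null holds.

```python
import heapq
import random


def qug_stat(doses):
    # Two smallest doses; assumed statistic T = D_(1) / (D_(2) - D_(1)).
    d1, d2 = heapq.nsmallest(2, doses)
    gap = d2 - d1
    return float("inf") if gap == 0 else d1 / gap


def rejection_rate(draw_sample, alpha=0.05, reps=4000, seed=0):
    # Share of simulated samples whose statistic exceeds 1/alpha - 1.
    rng = random.Random(seed)
    crit = 1.0 / alpha - 1.0
    hits = sum(qug_stat(draw_sample(rng)) > crit for _ in range(reps))
    return hits / reps


def iid_sample(rng, n=200):
    # iid Uniform(0, 1) doses: the setting the limit law assumes.
    return [rng.random() for _ in range(n)]


def clustered_sample(rng, n_psu=50, m=4, within_scale=0.01):
    # Toy cluster design: each PSU has its own dose level, and units
    # inside a PSU differ only by a small amount.
    doses = []
    for _ in range(n_psu):
        level = rng.random()
        doses.extend(level + within_scale * rng.random() for _ in range(m))
    return doses


iid_rate = rejection_rate(iid_sample)
clustered_rate = rejection_rate(clustered_sample)
print(f"iid rejection rate:       {iid_rate:.3f}")
print(f"clustered rejection rate: {clustered_rate:.3f}")
```

In the clustered design the two smallest doses tend to come from the same PSU, so the spacing D_(2) - D_(1) is small relative to D_(1) and the nominal-5% test rejects far more often than 5%.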

The survey-compatible alternative for HAD pretesting is joint Stute (a CvM cusum of regression residuals) — a smooth functional of the empirical CDF that admits Krieger-Pfeffermann (1997) + Rao-Wu rescaled bootstrap. Phase 4.5 C ships this.
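
For readers unfamiliar with the Rao-Wu rescaled bootstrap mentioned here, one replicate can be sketched as follows. This is a generic textbook form (resample n_h - 1 PSUs with replacement within each stratum and rescale by n_h / (n_h - 1)), not the library's bootstrap_utils.generate_rao_wu_weights, whose signature is not shown in this PR. The function name and argument layout are hypothetical.

```python
import random
from collections import Counter, defaultdict


def rao_wu_replicate_weights(strata, psu, base_weights, rng):
    """One Rao-Wu replicate: per-unit weights after resampling PSUs."""
    # Collect the distinct PSU ids in each stratum.
    psus_by_stratum = defaultdict(set)
    for h, g in zip(strata, psu):
        psus_by_stratum[h].add(g)

    multiplier = {}
    for h, psus in psus_by_stratum.items():
        ids = sorted(psus)
        n_h = len(ids)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 PSUs")
        # Draw m_h = n_h - 1 PSUs with replacement and count the hits.
        draws = Counter(rng.choice(ids) for _ in range(n_h - 1))
        scale = n_h / (n_h - 1)
        for g in ids:
            # PSUs that were not drawn get multiplier 0.
            multiplier[(h, g)] = scale * draws[g]

    return [w * multiplier[(h, g)] for h, g, w in zip(strata, psu, base_weights)]


# Two strata: stratum 0 has PSUs 0-2, stratum 1 has PSUs 3-4, two units per PSU.
strata = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
psu = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
weights = [1.0] * 10
replicate = rao_wu_replicate_weights(strata, psu, weights, random.Random(0))
print(replicate)
```

A smooth statistic such as a CvM-type functional can be recomputed under many such replicate weight vectors to approximate its design-based distribution. The same recipe gives no such guarantee for D_(1) and D_(2), which is the point of the deferral.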

Research direction sketch (out of scope)

The theoretical bridge is sketchable: combine endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) to define a "design-effective boundary intensity" λ_eff = Σ_h W_h · f_h(0+). Under a "no boundary clumping" assumption, the Exp(1)/Exp(1) pivotality is preserved and only the calibration needs a survey-aware bootstrap. Publishable methodology research, ~6-12 months for a methods PhD student. Not engineering work for this library. See docs/methodology/REGISTRY.md § "QUG Null Test" — Note (Phase 4.5 C0) for the full sketch.

Files

File	Change
`diff_diff/had_pretests.py`	`qug_test` + `did_had_pretest_workflow`: new keyword-only `survey=`/`weights=` kwargs, mutex + reject guards, docstring Survey/weighted data sections.
`docs/methodology/REGISTRY.md`	Note (Phase 4.5 C0) under QUG Null Test entry, with three-reason rationale + research-direction sketch.
`tests/test_had_pretests.py`	5 new tests on `TestQUGTest` + new `TestHADPretestWorkflowSurveyGuards` class (6 tests).
`CHANGELOG.md`	`[Unreleased]` Added entry.
`TODO.md`	Replaces decision-gate row with carry-forward research row.

Stability invariants preserved

Unweighted qug_test(d) and did_had_pretest_workflow(...) are bit-exact pre-PR (kwargs are keyword-only after *; no positional change).
All 10 existing TestQUGTest tests pass unchanged at atol=1e-12.
All 138 tests in tests/test_had_pretests.py pass.
Mutex guard text mirrors HeterogeneousAdoptionDiD.fit() at had.py:2890 — cross-surface consistency.
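
The keyword-only invariant can be checked mechanically. The sketch below uses stand-in functions (old_qug and new_qug are hypothetical, as is the alpha parameter) rather than the real qug_test, and uses inspect from the standard library to confirm that the new parameters cannot be bound positionally.

```python
import inspect


def old_qug(d, alpha=0.05):
    # Stand-in for the pre-PR signature; returns its inputs for comparison.
    return ("computed", tuple(d), alpha)


def new_qug(d, alpha=0.05, *, survey=None, weights=None):
    # Stand-in for the post-PR signature: guards first, then the old path.
    if survey is not None and weights is not None:
        raise ValueError("Pass at most one of survey= or weights=.")
    if survey is not None or weights is not None:
        raise NotImplementedError("survey/weights are not supported")
    return old_qug(d, alpha)


params = inspect.signature(new_qug).parameters
keyword_only = [
    name for name, p in params.items() if p.kind is inspect.Parameter.KEYWORD_ONLY
]


def outcome(call):
    # Return the call's result, or the exception class name if it raises.
    try:
        return call()
    except Exception as exc:
        return type(exc).__name__


results = {
    "positional": outcome(lambda: new_qug([0.1, 0.3], 0.1)),
    "extra_positional": outcome(lambda: new_qug([0.1, 0.3], 0.1, object())),
    "weights_only": outcome(lambda: new_qug([0.1, 0.3], weights=[1.0, 1.0])),
    "survey_only": outcome(lambda: new_qug([0.1, 0.3], survey=object())),
    "both": outcome(lambda: new_qug([0.1, 0.3], survey=object(), weights=[1.0, 1.0])),
}
print(keyword_only)
print(results)
```

Existing positional calls return exactly what the old function returns, a third positional argument is a TypeError rather than a silent survey=, and passing both kwargs hits the mutex before the reject guard.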

Test plan

pytest tests/test_had_pretests.py::TestQUGTest tests/test_had_pretests.py::TestHADPretestWorkflowSurveyGuards -v — 21/21 green
pytest tests/test_had_pretests.py -v — 138/138 green (full pretest regression)
black diff_diff/had_pretests.py tests/test_had_pretests.py — clean
ruff check diff_diff/had_pretests.py tests/test_had_pretests.py — clean
Manual REPL smoke: unweighted call works, weights= raises, survey= raises, mutex raises ValueError before NotImplementedError
CI

🤖 Generated with Claude Code

Add survey=/weights= kwargs to qug_test and did_had_pretest_workflow as keyword-only with default None. Both raise NotImplementedError when either kwarg is non-None, with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on survey=+weights= mirrors HeterogeneousAdoptionDiD.fit() at had.py:2890.

QUG-under-survey is permanently deferred. The test statistic uses extreme order statistics (D_(1), D_(2)) which are not smooth functionals of the empirical CDF -- standard survey machinery (Binder TSL, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the Exp(1)/Exp(1) limit law's independence assumption breaks; and the EVT-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests.

The workflow's gate is temporary -- Phase 4.5 C will close it for the linearity-family pretests (stute_test, yatchew_hr_test, joint variants) via Rao-Wu rescaled bootstrap. Sister pretests keep their closed signatures in this release; Phase 4.5 C will add kwargs and implementation together to avoid API churn.

- 11 new tests: 5 on TestQUGTest covering rejection / mutex / message-text checks / unweighted regression; 6 on new TestHADPretestWorkflowSurveyGuards covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression.
- docs/methodology/REGISTRY.md: Note (Phase 4.5 C0) under QUG section with three-reason rationale plus a research-direction sketch (the theoretical bridge would combine Hall 1982 / Aarssen-de Haan 1994 / Hall-Wang 1999 endpoint EVT, Boistard-Lopuhaa-Ruiz-Gazen 2017 / Bertail-Chautru-Clemencon 2017 survey-aware functional CLT, and Drees 2003 tail-empirical-process theory -- publishable methodology research, not engineering work).
- CHANGELOG.md: Phase 4.5 C0 entry under [Unreleased].
- TODO.md: replaces decision-gate row with carry-forward research row.

Unweighted qug_test(d) and did_had_pretest_workflow(...) calls are bit-exact pre-PR (kwargs are keyword-only after *; positional path unchanged). 138 pretest tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-25T01:17:21Z

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. The diff adds front-door rejection for unsupported survey/weights inputs without changing the unweighted QUG or workflow computations. The only issue I found is a P3 documentation inconsistency about the planned Phase 4.5 C methodology for yatchew_hr_test.

Executive Summary

The QUG gate is source-material consistent: the paper sets up HAD under an i.i.d. sampling assumption and defines the QUG test as the order-statistic ratio with critical region T > 1/α - 1; this PR keeps that unweighted math unchanged and documents the survey/weights rejection in the registry. diff_diff/had_pretests.py:L1033-L1152, docs/methodology/REGISTRY.md:L2364-L2377 (cdn.arenafi.org)
qug_test and did_had_pretest_workflow both reject survey/weights at the front door, and the mutex wording is aligned with HeterogeneousAdoptionDiD.fit(). diff_diff/had_pretests.py:L1113-L1152, diff_diff/had_pretests.py:L2829-L2855, diff_diff/had.py:L2890-L2896
I did not find new NaN/inference, variance/SE, or control-group bugs; the unweighted computational paths are untouched after the new guards. diff_diff/had_pretests.py:L1154-L1233, diff_diff/had_pretests.py:L2857-L3000
P3: the forward-looking docs are internally inconsistent about Phase 4.5 C survey support for yatchew_hr_test. TODO.md says Rao-Wu is for the Stute family and Yatchew will use weighted OLS/variance, but had_pretests.py, REGISTRY.md, and CHANGELOG.md describe the whole linearity family as Rao-Wu-based. diff_diff/had_pretests.py:L2807-L2814, docs/methodology/REGISTRY.md:L2376-L2377, CHANGELOG.md:L11-L11, TODO.md:L98-L98
I could not execute the added tests here because pytest is not installed in this environment; by inspection, the new coverage hits direct guards, mutex ordering, message content, both workflow paths, and an unweighted regression smoke test. tests/test_had_pretests.py:L185-L244, tests/test_had_pretests.py:L2869-L2969

Methodology

P3 — Impact: none to current behavior. The permanent survey/weights rejection on qug_test and did_had_pretest_workflow is a documented deviation, not an undocumented methodology change. The paper frames the QUG setup under i.i.d. sampling and uses the same order-statistic test/statistical decision rule preserved for unweighted calls here. Concrete fix: none required. diff_diff/had_pretests.py:L1033-L1152, diff_diff/had_pretests.py:L2706-L2855, docs/methodology/REGISTRY.md:L2364-L2377 (cdn.arenafi.org)
P3 — Impact: future methodology guidance for Phase 4.5 C is internally inconsistent for yatchew_hr_test, which can mislead the next implementer even though runtime behavior is correct today. Concrete fix: change the new wording to say “Rao-Wu rescaled bootstrap for the Stute family; weighted OLS residuals + weighted variance estimator for Yatchew,” matching TODO.md, or remove Yatchew from the Rao-Wu sentence until implementation lands. diff_diff/had_pretests.py:L2807-L2814, docs/methodology/REGISTRY.md:L2376-L2377, CHANGELOG.md:L11-L11, TODO.md:L98-L98

Code Quality

No findings. The new kwargs are keyword-only, the mutex handling mirrors the estimator surface, and the rejection path is isolated before downstream computation. diff_diff/had_pretests.py:L1113-L1152, diff_diff/had_pretests.py:L2829-L2855, diff_diff/had.py:L2890-L2896

Performance

No findings. The diff adds constant-time guard logic only.

Maintainability

No additional findings beyond the documentation drift above.

Tech Debt

P3 — Impact: none. The research-level QUG-under-survey limitation and the planned survey support for the remaining pretests are explicitly tracked in project debt, so they are mitigated under this review rubric. Concrete fix: none for this PR. TODO.md:L97-L98

Security

No findings.

Documentation/Tests

No blocker. The added tests cover the changed public contract well for both direct and workflow surfaces. I could not execute them here because pytest is unavailable. tests/test_had_pretests.py:L185-L244, tests/test_had_pretests.py:L2869-L2969

R1 P3: forward-looking docs were internally inconsistent about Phase 4.5 C survey support for yatchew_hr_test. TODO.md correctly distinguishes Rao-Wu (Stute family) from weighted-OLS+weighted-variance (Yatchew); the new C0 docs in had_pretests.py, REGISTRY.md, and CHANGELOG.md described the whole linearity family as Rao-Wu-based, which would mislead the next implementer (Yatchew 1997 is a closed-form variance-ratio test, not bootstrap-based).

Updated all three new docs to spell out the per-test mechanism explicitly: Rao-Wu rescaled bootstrap for stute_test + stute_joint_pretest + joint_pretrends_test + joint_homogeneity_test; weighted OLS residuals + weighted variance estimator for yatchew_hr_test. Now consistent with TODO.md.

Test message-text assertions still pass (assert "Rao-Wu" in msg holds because Stute family still uses Rao-Wu).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T01:19:57Z

/ai-review

github-actions · 2026-04-25T01:34:27Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 853ec02871a6cc061fea00aebc909840c5bd5d8f

Overall Assessment

✅ Looks good

The affected methods are qug_test and did_had_pretest_workflow. The paper’s unweighted QUG statistic and TWFE-screening workflow are unchanged; this PR only adds front-door rejection for unsupported survey/weights inputs, and that deviation is explicitly documented in the methodology registry rather than silently changing estimator behavior. diff_diff/had_pretests.py:L1033-L1152, diff_diff/had_pretests.py:L2717-L2865, docs/methodology/REGISTRY.md:L2364-L2377.

Executive Summary

Re-review result: the prior P3 documentation inconsistency around Phase 4.5 C / yatchew_hr_test appears resolved. The changed files now consistently say Rao-Wu for the Stute family and weighted OLS + weighted variance for Yatchew. CHANGELOG.md:L11-L11, TODO.md:L98-L98, diff_diff/had_pretests.py:L2807-L2814, docs/methodology/REGISTRY.md:L2376-L2377
qug_test adds keyword-only survey/weights guards without altering the Theorem 4 computation on unweighted inputs. diff_diff/had_pretests.py:L1033-L1155.
did_had_pretest_workflow adds the same front-door guard on both overall and event-study entry paths before panel validation or downstream pretests, so unsupported survey inputs cannot leak into unweighted code paths. diff_diff/had_pretests.py:L2829-L2867
The QUG-under-survey deferral is documented in REGISTRY.md and tracked in TODO.md, so under this rubric it is mitigated, not a blocker. docs/methodology/REGISTRY.md:L2372-L2377, TODO.md:L97-L98
Added tests cover direct guards, mutex ordering, message content, both workflow aggregate paths, and unweighted regression smoke paths. tests/test_had_pretests.py:L185-L244, tests/test_had_pretests.py:L2869-L2969
I did not execute the test suite here because the local environment is missing Python deps (numpy import fails and pytest is not installed).

Methodology

P3-informational (mitigated). Impact: none to current behavior. The new survey/weights rejection on qug_test and did_had_pretest_workflow is documented in docs/methodology/REGISTRY.md as a deliberate Phase 4.5 C0 deviation, so this is not an undocumented methodology mismatch. Concrete fix: none required. diff_diff/had_pretests.py:L1033-L1152, diff_diff/had_pretests.py:L2829-L2865, docs/methodology/REGISTRY.md:L2372-L2377.
No unmitigated P0/P1 findings.

Code Quality

No findings. The mutex check is explicit, the reject path is isolated ahead of downstream computation, and the changed public signatures are propagated to the workflow surface rather than leaving a partial wrapper mismatch. diff_diff/had_pretests.py:L1113-L1152, diff_diff/had_pretests.py:L2834-L2865

Performance

No findings. The diff adds constant-time guard logic only.

Maintainability

No findings. The changed contract is now aligned across code, registry, changelog, and TODO, which resolves the prior wording drift. CHANGELOG.md:L11-L11, TODO.md:L97-L98, diff_diff/had_pretests.py:L2807-L2814, docs/methodology/REGISTRY.md:L2372-L2377

Tech Debt

P3-informational (tracked) — Impact: none to current behavior. The research-only possibility of a future survey-aware support-endpoint test is explicitly tracked in TODO.md, which satisfies the project’s deferred-work policy. Concrete fix: none required for this PR. TODO.md:L97-L98

Security

No findings.

Documentation/Tests

No findings by inspection. The new tests are appropriately targeted at the changed contract, and the documentation now consistently explains the permanent QUG gate vs the temporary workflow gate. Concrete fix: none required. tests/test_had_pretests.py:L185-L244, tests/test_had_pretests.py:L2869-L2969, CHANGELOG.md:L11-L11, docs/methodology/REGISTRY.md:L2372-L2377

igerber added the ready-for-ci label (Triggers CI test workflows) Apr 25, 2026

igerber merged commit 29f8b12 into main Apr 25, 2026
19 checks passed

igerber deleted the had-phase-4.5-c0 branch April 25, 2026 10:44

This was referenced Apr 25, 2026

Release 3.3.0: HAD estimator, profile_panel, dCDH by_path, SDID survey complete #368

Merged

HAD Phase 4.5 C: linearity-family pretests under survey #370

Open

igerber merged 2 commits into main from had-phase-4.5-c0
