HAD Phase 4.5 C0: QUG-under-survey decision gate #367

Merged: igerber merged 2 commits into main from had-phase-4.5-c0 on Apr 25, 2026

@igerber (Owner) commented Apr 25, 2026

Summary

  • Decision gate: qug_test(..., survey=) and qug_test(..., weights=) now raise NotImplementedError permanently with an educational methodology message; same gate on did_had_pretest_workflow.
  • QUG's test statistic uses extreme order statistics D_(1), D_(2) — not smooth functionals of the empirical CDF, so standard survey machinery (Binder TSL linearization, Rao-Wu rescaled bootstrap) does not yield a calibrated test. Permanent deferral with documented rationale.
  • Workflow gate is temporary — Phase 4.5 C will close it for the linearity-family pretests (stute_test, yatchew_hr_test, joint variants) via Rao-Wu rescaled bootstrap. Joint Stute is the survey-compatible alternative.
  • 11 new tests (5 on TestQUGTest + 6 on new TestHADPretestWorkflowSurveyGuards); 138 pretest tests pass.
  • Sister pretests untouched in C0 — Phase 4.5 C will add kwargs + implementation together to avoid API churn.

Methodology rationale (mirrored across docstring ↔ error message ↔ REGISTRY)

  1. Extreme order statistics are not smooth functionals of the empirical CDF. The standard survey tools (Binder-TSL via compute_survey_if_variance, Rao-Wu via bootstrap_utils.generate_rao_wu_weights, Krieger-Pfeffermann (1997) EDF tests) all rely on Hadamard differentiability, which the first two order statistics lack.
  2. The Exp(1)/Exp(1) limit law assumes iid sampling. Under cluster sampling D_(1) and D_(2) may both come from the same PSU, breaking independence; under stratification the smallest dose may come from a small over- or under-sampled stratum, biasing the test.
  3. The EVT-under-unequal-probability-sampling literature is sparse. Quintos et al. (2001), Beirlant et al. cover tail-INDEX estimation. No off-the-shelf method for "test the support endpoint under complex sampling" exists. The de Chaisemartin et al. (2026) HAD paper does not discuss survey extensions of QUG.

The survey-compatible alternative for HAD pretesting is joint Stute (a CvM cusum of regression residuals) — a smooth functional of the empirical CDF that admits Krieger-Pfeffermann (1997) + Rao-Wu rescaled bootstrap. Phase 4.5 C ships this.
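
To make the order-statistic argument concrete, here is a minimal sketch of the unweighted statistic and its critical value under iid sampling. It assumes the statistic has the form T = D_(1) / (D_(2) - D_(1)) and that the test rejects when T > 1/α - 1, which follows from P(E1/E2 > t) = 1/(1 + t) for iid Exp(1) variables E1, E2. The names `qug_statistic` and `qug_critical_value` are hypothetical and are not the library's `qug_test`; the tie handling is an arbitrary illustrative choice.

```python
import numpy as np


def qug_statistic(doses):
    """Smallest dose divided by the gap between the two smallest doses."""
    d = np.sort(np.asarray(doses, dtype=float))
    if d.size < 2:
        raise ValueError("need at least two observations")
    gap = d[1] - d[0]
    if gap == 0.0:
        # Tie between the two smallest doses: illustrative choice only.
        return float("inf")
    return float(d[0] / gap)


def qug_critical_value(alpha):
    # For iid Exp(1) E1, E2: P(E1 / E2 > t) = 1 / (1 + t),
    # so the level-alpha cutoff c solves 1 / (1 + c) = alpha.
    return 1.0 / alpha - 1.0


# iid illustration: doses are Uniform(0, 1), so the support starts at zero
# and the null of "no gap at the lower boundary" holds.
rng = np.random.default_rng(0)
alpha = 0.05
reps = 20_000
stats = np.array(
    [qug_statistic(rng.uniform(0.0, 1.0, size=500)) for _ in range(reps)]
)
rejection_rate = float(np.mean(stats > qug_critical_value(alpha)))
print(f"simulated rejection rate at alpha={alpha}: {rejection_rate:.4f}")
```

Under iid Uniform(0, 1) doses the simulated rejection rate should land close to alpha. The sketch says nothing about clustered or stratified designs, where D_(1) and D_(2) may share a PSU; that is the case the gate rejects.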

Research direction sketch (out of scope)

The theoretical bridge is sketchable: combine endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) to define a "design-effective boundary intensity" λ_eff = Σ_h W_h · f_h(0+). Under a "no boundary clumping" assumption, the Exp(1)/Exp(1) pivotality is preserved and only the calibration needs a survey-aware bootstrap. Publishable methodology research, ~6-12 months for a methods PhD student. Not engineering work for this library. See docs/methodology/REGISTRY.md § "QUG Null Test" — Note (Phase 4.5 C0) for the full sketch.

Files

| File | Change |
| --- | --- |
| diff_diff/had_pretests.py | qug_test + did_had_pretest_workflow: new keyword-only survey=/weights= kwargs, mutex + reject guards, docstring Survey/weighted data sections. |
| docs/methodology/REGISTRY.md | Note (Phase 4.5 C0) under QUG Null Test entry, with three-reason rationale + research-direction sketch. |
| tests/test_had_pretests.py | 5 new tests on TestQUGTest + new TestHADPretestWorkflowSurveyGuards class (6 tests). |
| CHANGELOG.md | [Unreleased] Added entry. |
| TODO.md | Replaces decision-gate row with carry-forward research row. |

Stability invariants preserved

  • Unweighted qug_test(d) and did_had_pretest_workflow(...) calls are bit-exact relative to pre-PR behavior (kwargs are keyword-only after *; no positional change).
  • All 10 existing TestQUGTest tests pass unchanged at atol=1e-12.
  • All 138 tests in tests/test_had_pretests.py pass.
  • Mutex guard text mirrors HeterogeneousAdoptionDiD.fit() at had.py:2890 — cross-surface consistency.
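
The guard pattern described in this PR can be sketched as follows. This is a hedged illustration only: `_reject_survey_inputs` and `qug_test_sketch` are hypothetical stand-ins, the signature is assumed, and the message strings are placeholders rather than the text shipped in diff_diff/had_pretests.py. It shows the two properties claimed above: the new kwargs sit after `*`, so positional calls are unaffected, and the mutex check fires before the unsupported-input gate.

```python
def _reject_survey_inputs(func_name, survey, weights):
    """Hypothetical helper: mutex check first, then the unsupported-input gate."""
    # Passing both kwargs is a usage error, so it is reported before the gate.
    if survey is not None and weights is not None:
        raise ValueError(f"{func_name}: pass survey= or weights=, not both.")
    if survey is not None or weights is not None:
        # Placeholder text; the real message carries the methodology rationale.
        raise NotImplementedError(
            f"{func_name} does not support survey= or weights=; "
            "see the joint Stute pretest (Rao-Wu rescaled bootstrap) instead."
        )


def qug_test_sketch(d, alpha=0.05, *, survey=None, weights=None):
    """Stand-in for a pretest entry point; everything after * is keyword-only."""
    _reject_survey_inputs("qug_test_sketch", survey, weights)
    # The unweighted computation would run here, untouched by the guard.
    return "unweighted path"


calls = {
    "unweighted": {},
    "weights": {"weights": [1.0, 1.0]},
    "survey": {"survey": object()},
    "both": {"survey": object(), "weights": [1.0, 1.0]},
}
outcomes = {}
for label, kwargs in calls.items():
    try:
        outcomes[label] = qug_test_sketch([0.1, 0.4], **kwargs)
    except (ValueError, NotImplementedError) as exc:
        outcomes[label] = type(exc).__name__
print(outcomes)
```

Running the guard before any validation or computation keeps the rejection path isolated, so the unweighted branch needs no changes.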

Test plan

  • pytest tests/test_had_pretests.py::TestQUGTest tests/test_had_pretests.py::TestHADPretestWorkflowSurveyGuards -v — 21/21 green
  • pytest tests/test_had_pretests.py -v — 138/138 green (full pretest regression)
  • black diff_diff/had_pretests.py tests/test_had_pretests.py — clean
  • ruff check diff_diff/had_pretests.py tests/test_had_pretests.py — clean
  • Manual REPL smoke: unweighted call works, weights= raises, survey= raises, mutex raises ValueError before NotImplementedError
  • CI

🤖 Generated with Claude Code

Add survey=/weights= kwargs to qug_test and did_had_pretest_workflow
as keyword-only with default None. Both raise NotImplementedError when
either kwarg is non-None, with an educational message naming the
methodology rationale and pointing users to joint Stute (Phase 4.5 C,
planned) as the survey-compatible alternative. Mutex guard on
survey=+weights= mirrors HeterogeneousAdoptionDiD.fit() at had.py:2890.

QUG-under-survey is permanently deferred. The test statistic uses
extreme order statistics (D_(1), D_(2)) which are not smooth functionals
of the empirical CDF -- standard survey machinery (Binder TSL, Rao-Wu
rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not
yield a calibrated test; under cluster sampling the Exp(1)/Exp(1) limit
law's independence assumption breaks; and the EVT-under-unequal-
probability-sampling literature (Quintos et al. 2001, Beirlant et al.)
addresses tail-index estimation, not boundary tests.

The workflow's gate is temporary -- Phase 4.5 C will close it for the
linearity-family pretests (stute_test, yatchew_hr_test, joint variants)
via Rao-Wu rescaled bootstrap. Sister pretests keep their closed
signatures in this release; Phase 4.5 C will add kwargs and
implementation together to avoid API churn.

- 11 new tests: 5 on TestQUGTest covering rejection / mutex / message-
  text checks / unweighted regression; 6 on new
  TestHADPretestWorkflowSurveyGuards covering both kwarg paths, mutex,
  methodology pointer, both aggregate paths, and unweighted regression.
- docs/methodology/REGISTRY.md: Note (Phase 4.5 C0) under QUG section
  with three-reason rationale plus a research-direction sketch (the
  theoretical bridge would combine Hall 1982 / Aarssen-de Haan 1994 /
  Hall-Wang 1999 endpoint EVT, Boistard-Lopuhaa-Ruiz-Gazen 2017 /
  Bertail-Chautru-Clemencon 2017 survey-aware functional CLT, and
  Drees 2003 tail-empirical-process theory -- publishable methodology
  research, not engineering work).
- CHANGELOG.md: Phase 4.5 C0 entry under [Unreleased].
- TODO.md: replaces decision-gate row with carry-forward research row.

Unweighted qug_test(d) and did_had_pretest_workflow(...) calls are
bit-exact pre-PR (kwargs are keyword-only after *; positional path
unchanged). 138 pretest tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber added the ready-for-ci label (Triggers CI test workflows) on Apr 25, 2026
@github-actions

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. The diff adds front-door rejection for unsupported survey/weights inputs without changing the unweighted QUG or workflow computations. The only issue I found is a P3 documentation inconsistency about the planned Phase 4.5 C methodology for yatchew_hr_test.

Executive Summary

  • The QUG gate is source-material consistent: the paper sets up HAD under an i.i.d. sampling assumption and defines the QUG test as the order-statistic ratio with critical region T > 1/α - 1; this PR keeps that unweighted math unchanged and documents the survey/weights rejection in the registry. diff_diff/had_pretests.py:L1033-L1152, docs/methodology/REGISTRY.md:L2364-L2377 (cdn.arenafi.org)
  • qug_test and did_had_pretest_workflow both reject survey/weights at the front door, and the mutex wording is aligned with HeterogeneousAdoptionDiD.fit(). diff_diff/had_pretests.py:L1113-L1152, diff_diff/had_pretests.py:L2829-L2855, diff_diff/had.py:L2890-L2896
  • I did not find new NaN/inference, variance/SE, or control-group bugs; the unweighted computational paths are untouched after the new guards. diff_diff/had_pretests.py:L1154-L1233, diff_diff/had_pretests.py:L2857-L3000
  • P3: the forward-looking docs are internally inconsistent about Phase 4.5 C survey support for yatchew_hr_test. TODO.md says Rao-Wu is for the Stute family and Yatchew will use weighted OLS/variance, but had_pretests.py, REGISTRY.md, and CHANGELOG.md describe the whole linearity family as Rao-Wu-based. diff_diff/had_pretests.py:L2807-L2814, docs/methodology/REGISTRY.md:L2376-L2377, CHANGELOG.md:L11-L11, TODO.md:L98-L98
  • I could not execute the added tests here because pytest is not installed in this environment; by inspection, the new coverage hits direct guards, mutex ordering, message content, both workflow paths, and an unweighted regression smoke test. tests/test_had_pretests.py:L185-L244, tests/test_had_pretests.py:L2869-L2969

Methodology

  • P3 — Impact: none to current behavior. The permanent survey/weights rejection on qug_test and did_had_pretest_workflow is a documented deviation, not an undocumented methodology change. The paper frames the QUG setup under i.i.d. sampling and uses the same order-statistic test/statistical decision rule preserved for unweighted calls here. Concrete fix: none required. diff_diff/had_pretests.py:L1033-L1152, diff_diff/had_pretests.py:L2706-L2855, docs/methodology/REGISTRY.md:L2364-L2377 (cdn.arenafi.org)
  • P3 — Impact: future methodology guidance for Phase 4.5 C is internally inconsistent for yatchew_hr_test, which can mislead the next implementer even though runtime behavior is correct today. Concrete fix: change the new wording to say “Rao-Wu rescaled bootstrap for the Stute family; weighted OLS residuals + weighted variance estimator for Yatchew,” matching TODO.md, or remove Yatchew from the Rao-Wu sentence until implementation lands. diff_diff/had_pretests.py:L2807-L2814, docs/methodology/REGISTRY.md:L2376-L2377, CHANGELOG.md:L11-L11, TODO.md:L98-L98

Code Quality

  • No findings. The new kwargs are keyword-only, the mutex handling mirrors the estimator surface, and the rejection path is isolated before downstream computation. diff_diff/had_pretests.py:L1113-L1152, diff_diff/had_pretests.py:L2829-L2855, diff_diff/had.py:L2890-L2896

Performance

  • No findings. The diff adds constant-time guard logic only.

Maintainability

  • No additional findings beyond the documentation drift above.

Tech Debt

  • P3 — Impact: none. The research-level QUG-under-survey limitation and the planned survey support for the remaining pretests are explicitly tracked in project debt, so they are mitigated under this review rubric. Concrete fix: none for this PR. TODO.md:L97-L98

Security

  • No findings.

Documentation/Tests

  • No blocker. The added tests cover the changed public contract well for both direct and workflow surfaces. I could not execute them here because pytest is unavailable. tests/test_had_pretests.py:L185-L244, tests/test_had_pretests.py:L2869-L2969

R1 P3 — forward-looking docs were internally inconsistent about
Phase 4.5 C survey support for yatchew_hr_test. TODO.md correctly
distinguishes Rao-Wu (Stute family) from weighted-OLS+weighted-variance
(Yatchew); the new C0 docs in had_pretests.py, REGISTRY.md, and
CHANGELOG.md described the whole linearity family as Rao-Wu-based,
which would mislead the next implementer (Yatchew 1997 is a closed-form
variance-ratio test, not bootstrap-based).

Updated all three new docs to spell out the per-test mechanism
explicitly: Rao-Wu rescaled bootstrap for stute_test +
stute_joint_pretest + joint_pretrends_test + joint_homogeneity_test;
weighted OLS residuals + weighted variance estimator for
yatchew_hr_test. Now consistent with TODO.md.
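
For orientation, one replicate of Rao-Wu rescaled bootstrap weights can be
sketched as below. This is a minimal illustration of the textbook scheme
(resample n_h - 1 PSUs with replacement within each stratum, then rescale by
n_h / (n_h - 1) times the draw count); `rao_wu_replicate_weights` is a
hypothetical name and is not the library's
bootstrap_utils.generate_rao_wu_weights, whose signature is not shown here.

```python
import numpy as np


def rao_wu_replicate_weights(base_weights, strata, psu, rng):
    """One Rao-Wu rescaled bootstrap replicate with m_h = n_h - 1 draws."""
    base_weights = np.asarray(base_weights, dtype=float)
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    out = np.empty_like(base_weights)
    for h in np.unique(strata):
        in_h = strata == h
        psus_h = np.unique(psu[in_h])  # sorted PSU labels in stratum h
        n_h = psus_h.size
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        # Draw n_h - 1 PSUs with replacement and count hits per PSU.
        draws = rng.integers(0, n_h, size=n_h - 1)
        counts = np.bincount(draws, minlength=n_h)
        factor = counts * n_h / (n_h - 1)
        # Map every unit in the stratum to its PSU's rescaling factor.
        idx = np.searchsorted(psus_h, psu[in_h])
        out[in_h] = base_weights[in_h] * factor[idx]
    return out


# Toy design: stratum 0 has 2 PSUs, stratum 1 has 3 PSUs, 2 units per PSU.
strata = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
psu = np.array([0, 0, 1, 1, 0, 0, 1, 1, 2, 2])
base = np.array([1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])
replicate = rao_wu_replicate_weights(base, strata, psu, np.random.default_rng(1))
```

Each replicate's weights would then feed a weighted recomputation of the
Stute-family statistic. The Yatchew path described above does not use this
resampling step.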

Test message-text assertions still pass (assert "Rao-Wu" in msg holds
because Stute family still uses Rao-Wu).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber (Owner, Author) commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 853ec02871a6cc061fea00aebc909840c5bd5d8f


Overall Assessment

✅ Looks good

The affected methods are qug_test and did_had_pretest_workflow. The paper’s unweighted QUG statistic and TWFE-screening workflow are unchanged; this PR only adds front-door rejection for unsupported survey/weights inputs, and that deviation is explicitly documented in the methodology registry rather than silently changing estimator behavior. diff_diff/had_pretests.py:L1033-L1152, diff_diff/had_pretests.py:L2717-L2865, docs/methodology/REGISTRY.md:L2364-L2377.

Executive Summary

  • Re-review result: the prior P3 documentation inconsistency around Phase 4.5 C / yatchew_hr_test appears resolved. The changed files now consistently say Rao-Wu for the Stute family and weighted OLS + weighted variance for Yatchew. CHANGELOG.md:L11-L11, TODO.md:L98-L98, diff_diff/had_pretests.py:L2807-L2814, docs/methodology/REGISTRY.md:L2376-L2377
  • qug_test adds keyword-only survey/weights guards without altering the Theorem 4 computation on unweighted inputs. diff_diff/had_pretests.py:L1033-L1155.
  • did_had_pretest_workflow adds the same front-door guard on both overall and event-study entry paths before panel validation or downstream pretests, so unsupported survey inputs cannot leak into unweighted code paths. diff_diff/had_pretests.py:L2829-L2867
  • The QUG-under-survey deferral is documented in REGISTRY.md and tracked in TODO.md, so under this rubric it is mitigated, not a blocker. docs/methodology/REGISTRY.md:L2372-L2377, TODO.md:L97-L98
  • Added tests cover direct guards, mutex ordering, message content, both workflow aggregate paths, and unweighted regression smoke paths. tests/test_had_pretests.py:L185-L244, tests/test_had_pretests.py:L2869-L2969
  • I did not execute the test suite here because the local environment is missing Python deps (numpy import fails and pytest is not installed).

Methodology

  • P3-informational (mitigated) — Impact: none to current behavior. The new survey/weights rejection on qug_test and did_had_pretest_workflow is documented in docs/methodology/REGISTRY.md as a deliberate Phase 4.5 C0 deviation, so this is not an undocumented methodology mismatch. Concrete fix: none required. diff_diff/had_pretests.py:L1033-L1152, diff_diff/had_pretests.py:L2829-L2865, docs/methodology/REGISTRY.md:L2372-L2377.
  • No unmitigated P0/P1 findings.

Code Quality

  • No findings. The mutex check is explicit, the reject path is isolated ahead of downstream computation, and the changed public signatures are propagated to the workflow surface rather than leaving a partial wrapper mismatch. diff_diff/had_pretests.py:L1113-L1152, diff_diff/had_pretests.py:L2834-L2865

Performance

  • No findings. The diff adds constant-time guard logic only.

Maintainability

  • No findings. The changed contract is now aligned across code, registry, changelog, and TODO, which resolves the prior wording drift. CHANGELOG.md:L11-L11, TODO.md:L97-L98, diff_diff/had_pretests.py:L2807-L2814, docs/methodology/REGISTRY.md:L2372-L2377

Tech Debt

  • P3-informational (tracked) — Impact: none to current behavior. The research-only possibility of a future survey-aware support-endpoint test is explicitly tracked in TODO.md, which satisfies the project’s deferred-work policy. Concrete fix: none required for this PR. TODO.md:L97-L98

Security

  • No findings.

Documentation/Tests

  • No findings by inspection. The new tests are appropriately targeted at the changed contract, and the documentation now consistently explains the permanent QUG gate vs the temporary workflow gate. Concrete fix: none required. tests/test_had_pretests.py:L185-L244, tests/test_had_pretests.py:L2869-L2969, CHANGELOG.md:L11-L11, docs/methodology/REGISTRY.md:L2372-L2377

@igerber merged commit 29f8b12 into main on Apr 25, 2026 (19 checks passed)
@igerber deleted the had-phase-4.5-c0 branch on April 25, 2026 10:44