HAD Phase 4.5 C0: QUG-under-survey decision gate #367
Add survey=/weights= kwargs to qug_test and did_had_pretest_workflow as keyword-only with default None. Both raise NotImplementedError when either kwarg is non-None, with an educational message naming the methodology rationale and pointing users to joint Stute (Phase 4.5 C, planned) as the survey-compatible alternative. Mutex guard on survey=+weights= mirrors HeterogeneousAdoptionDiD.fit() at had.py:2890.

QUG-under-survey is permanently deferred. The test statistic uses extreme order statistics (D_(1), D_(2)) which are not smooth functionals of the empirical CDF -- standard survey machinery (Binder TSL, Rao-Wu rescaled bootstrap, Krieger-Pfeffermann (1997) EDF tests) does not yield a calibrated test; under cluster sampling the Exp(1)/Exp(1) limit law's independence assumption breaks; and the EVT-under-unequal-probability-sampling literature (Quintos et al. 2001, Beirlant et al.) addresses tail-index estimation, not boundary tests.

The workflow's gate is temporary -- Phase 4.5 C will close it for the linearity-family pretests (stute_test, yatchew_hr_test, joint variants) via Rao-Wu rescaled bootstrap. Sister pretests keep their closed signatures in this release; Phase 4.5 C will add kwargs and implementation together to avoid API churn.

- 11 new tests: 5 on TestQUGTest covering rejection / mutex / message-text checks / unweighted regression; 6 on new TestHADPretestWorkflowSurveyGuards covering both kwarg paths, mutex, methodology pointer, both aggregate paths, and unweighted regression.
- docs/methodology/REGISTRY.md: Note (Phase 4.5 C0) under QUG section with three-reason rationale plus a research-direction sketch (the theoretical bridge would combine Hall 1982 / Aarssen-de Haan 1994 / Hall-Wang 1999 endpoint EVT, Boistard-Lopuhaa-Ruiz-Gazen 2017 / Bertail-Chautru-Clemencon 2017 survey-aware functional CLT, and Drees 2003 tail-empirical-process theory -- publishable methodology research, not engineering work).
- CHANGELOG.md: Phase 4.5 C0 entry under [Unreleased].
- TODO.md: replaces decision-gate row with carry-forward research row.

Unweighted qug_test(d) and did_had_pretest_workflow(...) calls are bit-exact pre-PR (kwargs are keyword-only after *; positional path unchanged). 138 pretest tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
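A minimal sketch of the guard pattern this commit message describes, using a stand-in function rather than the real qug_test. It assumes only what the message states: both kwargs are keyword-only after *, the mutex check raises ValueError first, and any single non-None kwarg raises NotImplementedError. The function names, the placeholder return value, and the message wording are hypothetical.

```python
def qug_guard_sketch(d, *, survey=None, weights=None):
    """Hypothetical stand-in for the front-door guards; not the library code."""
    # Mutex first: passing both kwargs is a caller error, reported as ValueError.
    if survey is not None and weights is not None:
        raise ValueError("Pass survey= or weights=, not both.")
    # Either kwarg alone is rejected with a pointer to the alternative.
    if survey is not None or weights is not None:
        raise NotImplementedError(
            "QUG under survey/weighted designs is not supported: the statistic "
            "uses extreme order statistics. Consider joint Stute with the "
            "Rao-Wu rescaled bootstrap (planned) instead."
        )
    # The unweighted path is untouched; a placeholder return for the sketch.
    return sorted(d)[:2]


def raised(exc_type, **kwargs):
    """Return True if the sketch raises exc_type for the given kwargs."""
    try:
        qug_guard_sketch([0.4, 0.1, 0.9], **kwargs)
    except exc_type:
        return True
    return False
```

Because the kwargs sit after *, an existing positional call such as qug_guard_sketch(d) cannot bind them by accident, which is why the unweighted path can stay unchanged.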
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. The diff adds front-door rejection for unsupported
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
R1 P3: forward-looking docs were internally inconsistent about Phase 4.5 C survey support for yatchew_hr_test. TODO.md correctly distinguishes Rao-Wu (Stute family) from weighted-OLS+weighted-variance (Yatchew); the new C0 docs in had_pretests.py, REGISTRY.md, and CHANGELOG.md described the whole linearity family as Rao-Wu-based, which would mislead the next implementer (Yatchew 1997 is a closed-form variance-ratio test, not bootstrap-based).

Updated all three new docs to spell out the per-test mechanism explicitly:
- Rao-Wu rescaled bootstrap for stute_test + stute_joint_pretest + joint_pretrends_test + joint_homogeneity_test.
- Weighted OLS residuals + weighted variance estimator for yatchew_hr_test.

Now consistent with TODO.md. Test message-text assertions still pass (assert "Rao-Wu" in msg holds because the Stute family still uses Rao-Wu).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
The affected methods are
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Summary
- qug_test(..., survey=) and qug_test(..., weights=) now raise NotImplementedError permanently with an educational methodology message; same gate on did_had_pretest_workflow.
- The test statistic uses the extreme order statistics D_(1), D_(2), which are not smooth functionals of the empirical CDF, so standard survey machinery (Binder TSL linearization, Rao-Wu rescaled bootstrap) does not yield a calibrated test. Permanent deferral with documented rationale.
- The workflow's gate is temporary: Phase 4.5 C will close it for the linearity-family pretests (stute_test, yatchew_hr_test, joint variants), via Rao-Wu rescaled bootstrap for the Stute family and weighted OLS residuals plus a weighted variance estimator for yatchew_hr_test. Joint Stute is the survey-compatible alternative.
- 11 new tests (5 on TestQUGTest + 6 on new TestHADPretestWorkflowSurveyGuards); 138 pretest tests pass.
Methodology rationale (mirrored across docstring ↔ error message ↔ REGISTRY)
- Standard survey tools (Binder TSL via compute_survey_if_variance, Rao-Wu via bootstrap_utils.generate_rao_wu_weights, Krieger-Pfeffermann (1997) EDF tests) all rely on Hadamard differentiability; the first two order statistics fail it.
- Under cluster sampling, D_(1) and D_(2) may both come from the same PSU, breaking independence; under stratification the smallest dose may come from a small over- or under-sampled stratum, biasing the test.
The survey-compatible alternative for HAD pretesting is joint Stute (a CvM cusum of regression residuals), a smooth functional of the empirical CDF that admits Krieger-Pfeffermann (1997) + Rao-Wu rescaled bootstrap. Phase 4.5 C ships this.
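For concreteness, a minimal sketch of the kind of unweighted statistic this rationale is about. It assumes the ratio form T = D_(1) / (D_(2) - D_(1)) and uses the fact that for independent Exp(1) variables E1, E2 the tail is P(E1/E2 > t) = 1/(1+t). The function name is hypothetical and this is not the library's qug_test implementation. The point is that only the two smallest doses enter the statistic, so a linearization or resampling scheme built for smooth functionals has nothing to act on.

```python
def qug_statistic_sketch(doses):
    """Hypothetical sketch: ratio of the smallest dose to the first spacing.

    Assumes nonnegative doses and the form T = D_(1) / (D_(2) - D_(1)).
    """
    d = sorted(float(x) for x in doses)
    if len(d) < 2:
        raise ValueError("Need at least two doses.")
    d1, d2 = d[0], d[1]  # the two extreme order statistics
    gap = d2 - d1
    if gap <= 0:
        raise ValueError("The two smallest doses are tied; T is undefined.")
    t = d1 / gap
    # Tail of the ratio of two independent Exp(1) variables: P(E1/E2 > t).
    p_value = 1.0 / (1.0 + t)
    return t, p_value
```

Under these assumptions, rejecting at level alpha is equivalent to T > 1/alpha - 1, which is 19 at alpha = 0.05. The p-value formula leans on independence of the two exponential limits, which is the property the cluster-sampling bullet says can fail.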
Research direction sketch (out of scope)
The theoretical bridge is sketchable: combine endpoint-estimation EVT (Hall 1982, Aarssen-de Haan 1994, Hall-Wang 1999, Beirlant-de Wet-Goegebeur 2006), survey-aware functional CLTs (Boistard-Lopuhaä-Ruiz-Gazen 2017, Bertail-Chautru-Clémençon 2017), and tail-empirical-process theory (Drees 2003) to define a "design-effective boundary intensity" λ_eff = Σ_h W_h · f_h(0+). Under a "no boundary clumping" assumption, the Exp(1)/Exp(1) pivotality is preserved and only the calibration needs a survey-aware bootstrap. Publishable methodology research, ~6-12 months for a methods PhD student. Not engineering work for this library. See docs/methodology/REGISTRY.md § "QUG Null Test", Note (Phase 4.5 C0), for the full sketch.
Files
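- diff_diff/had_pretests.py: qug_test + did_had_pretest_workflow gain new keyword-only survey=/weights= kwargs, mutex + reject guards, and docstring Survey/weighted data sections.
- docs/methodology/REGISTRY.md: Note (Phase 4.5 C0) under the QUG section with the rationale plus a research-direction sketch.
- tests/test_had_pretests.py: 5 new tests on TestQUGTest + new TestHADPretestWorkflowSurveyGuards class (6 tests).
- CHANGELOG.md: [Unreleased] Added entry.
- TODO.md: replaces the decision-gate row with a carry-forward research row.
Stability invariants preserved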
- Unweighted qug_test(d) and did_had_pretest_workflow(...) are bit-exact pre-PR (kwargs are keyword-only after *; no positional change).
- Existing TestQUGTest tests pass unchanged at atol=1e-12.
- All tests in tests/test_had_pretests.py pass.
- The mutex guard on survey= + weights= mirrors HeterogeneousAdoptionDiD.fit() at had.py:2890 (cross-surface consistency).
Test plan
- pytest tests/test_had_pretests.py::TestQUGTest tests/test_had_pretests.py::TestHADPretestWorkflowSurveyGuards -v: 21/21 green
- pytest tests/test_had_pretests.py -v: 138/138 green (full pretest regression)
- black diff_diff/had_pretests.py tests/test_had_pretests.py: clean
- ruff check diff_diff/had_pretests.py tests/test_had_pretests.py: clean
- weights= raises, survey= raises, mutex raises ValueError before NotImplementedError
🤖 Generated with Claude Code