Consolidate HAD survey-design API to single survey_design= kwarg by igerber · Pull Request #376 · igerber/diff-diff

igerber · 2026-04-25T19:33:34Z

Summary

Consolidates all 8 HAD surfaces (HAD.fit + workflow + 6 pretests) to the canonical survey_design= kwarg matching ContinuousDiD/EfficientDiD/dCDH.
Soft deprecation cycle: survey= and weights= become DeprecationWarning aliases; removal queued for the next minor (TODO row added).
New public helper make_pweight_design(weights: np.ndarray) -> ResolvedSurveyDesign exported from diff_diff top level for the pweight-only convenience on array-in pretest helpers (formerly the private survey._make_trivial_resolved).
Bit-exact regression preserved — internal back-end paths unchanged; deprecation shim only rebinds entry kwarg names.

Key design choices

Surface split (data-in vs array-in): data-in surfaces (HAD.fit, workflow, joint data-in wrappers) accept survey_design=SurveyDesign(weights="col") and resolve against data at fit time. Array-in surfaces (stute_test, yatchew_hr_test, stute_joint_pretest, qug_test) take pre-resolved ResolvedSurveyDesign only; passing a SurveyDesign raises TypeError with migration guidance to make_pweight_design(arr) or pre-resolution.
Three-way mutex: at most one of {survey_design, survey, weights} may be non-None per call. Two distinct error messages per surface group (data-in vs array-in) point users to the right migration target.
Normalization-order invariant (load-bearing): the weights= deprecation shim binds survey_design = make_pweight_design(weights_unnormalized) and lets the unified path apply the mean=1 normalization step exactly once. Locked by scale-invariance test.
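A minimal sketch of the three-way mutex and the soft-deprecation rebinding described above. The helper name `resolve_survey_kwargs`, the `ValueError` type, and the message wording are assumptions for illustration; the PR text does not show the real shim.

```python
import warnings


def resolve_survey_kwargs(survey_design=None, survey=None, weights=None):
    """Hypothetical front-door helper: enforce the mutex, then rebind aliases."""
    supplied = [
        name
        for name, value in (
            ("survey_design", survey_design),
            ("survey", survey),
            ("weights", weights),
        )
        if value is not None
    ]
    # Three-way mutex: at most one of the three kwargs may be non-None.
    if len(supplied) > 1:
        raise ValueError(
            f"Pass at most one of survey_design=, survey=, weights=; got {supplied}."
        )
    if survey is not None:
        warnings.warn(
            "survey= is deprecated; use survey_design=.",
            DeprecationWarning,
            stacklevel=2,
        )
        return survey
    if weights is not None:
        warnings.warn(
            "weights= is deprecated; use survey_design=.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Stand-in for make_pweight_design(weights): the raw, unnormalized
        # weights are wrapped and normalization is left to the unified path.
        return ("pweight", weights)
    return survey_design
```

The point of the shape is that the deprecated names are resolved once at the entry point, so everything downstream sees a single design object.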

Test plan

535 tests pass (489 pre-PR baseline + 46 new in tests/test_had_dual_knob_deprecation.py)
Bit-exact numerical regression on the legacy weights= and survey= paths (all existing weighted tests still produce identical numbers + a DeprecationWarning)
8 surfaces × {survey_design= smoke, weights= warn, survey= warn, parity legacy ≡ new, three-way mutex} — 40 cases
Surface-spanning: make_pweight_design exported, alias of _make_trivial_resolved, array-in TypeError on SurveyDesign, scale-invariance for both stute_test and yatchew_hr_test
Cross-estimator regression: tests/test_survey.py + test_continuous_did.py + test_efficient_did.py + test_chaisemartin_dhaultfoeuille.py — 536 passed (no breakage from survey.py changes)
black + ruff clean on touched files
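The scale-invariance check named in the test plan can be sketched as follows. The statistic here is a toy stand-in chosen because it is sensitive to weight scale; it is not the Stute or Yatchew statistic, and the function names are hypothetical.

```python
import math


def normalize_to_mean_one(weights):
    """Rescale weights so their mean is exactly 1."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


def toy_statistic(values, weights):
    """Scale-sensitive toy statistic computed on mean-1 normalized weights."""
    w = normalize_to_mean_one(weights)  # normalization happens here, once
    return sum(wi * wi * v for wi, v in zip(w, values)) / len(values)


def toy_statistic_unnormalized(values, weights):
    """Same statistic without normalization, for contrast."""
    return sum(wi * wi * v for wi, v in zip(weights, values)) / len(values)


values = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, 1.5, 2.0, 4.0]
scaled = [1000.0 * w for w in weights]

# With normalization in the unified path, rescaling the input weights
# leaves the statistic unchanged (up to floating-point tolerance).
invariant = math.isclose(
    toy_statistic(values, weights), toy_statistic(values, scaled), rel_tol=1e-12
)
```

If the shim skipped normalization, or a caller normalized with a different convention before the unified path did, a test of this form is what would catch it.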

🤖 Generated with Claude Code

github-actions · 2026-04-25T19:39:30Z

Overall Assessment

⚠️ Needs changes

Executive Summary

I did not find a paper-level estimator/SE rewrite here; the HAD changes are intended as an API-front-door consolidation, and the core math/variance paths still look unchanged.
P1: the deprecated survey= alias is not actually equivalent to survey_design= on the array-in helpers. survey=SurveyDesign(...) bypasses the new type guard in stute_test, yatchew_hr_test, and stute_joint_pretest, so the alias path fails later with the wrong exception/message instead of the documented TypeError.
P2: the PR removes or de-indexes existing API docs for HeterogeneousAdoptionDiD and StaggeredTripleDifference while README still points to those pages/anchors, creating broken/orphaned public docs.
P2: the bundled practitioner guide regresses by removing dCDH HonestDiD support even though the implementation, registry, and tests still support compute_honest_did() on ChaisemartinDHaultfoeuille results.
P2: docs/methodology/REGISTRY.md becomes internally contradictory on HAD survey support, which is risky because the registry is supposed to be the load-bearing methodology contract.
P3: alias removal/back-end cleanup is properly tracked in TODO.md; that follow-up is not a blocker.

Methodology

Severity: P2. Impact: docs/methodology/REGISTRY.md:L2350-L2351 says both that HAD pretests “still do NOT accept survey= / weights=” and that “all 8 HAD surfaces” now accept survey_design= plus deprecated aliases. Because the registry is the source of truth for method behavior, leaving both statements in place makes the survey-support contract ambiguous for future reviewers and users. Concrete fix: remove/update the stale pre-Phase-4.5 note so the registry has one HAD survey-support contract.
No unmitigated paper/math/SE deviation found in the estimator logic itself.

Code Quality

Severity: P1. Impact: in diff_diff/had_pretests.py:L1556-L1593, L2029-L2058, and L2707-L2736, the SurveyDesign type guard runs before deprecated alias rebinding. So survey_design=SurveyDesign(...) gets the intended TypeError, but survey=SurveyDesign(...) does not; it is later treated as if it were a ResolvedSurveyDesign. That breaks the advertised “survey= is a deprecation alias of survey_design=” contract on three public surfaces. Concrete fix: move alias rebinding ahead of the type guard, or rerun the same guard immediately after survey_design = survey; add regression tests for survey=SurveyDesign(...) on all three helpers. tests/test_had_dual_knob_deprecation.py:L136-L165 and L251-L430 currently cover only the canonical-kwarg guard, not the deprecated-alias version.
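The ordering issue can be illustrated with a small sketch in which the alias is rebound first and the type guard runs afterwards, so both spellings reach the same check. `SurveyDesign` and `ResolvedSurveyDesign` below are empty stand-in classes and `array_in_helper` is a hypothetical name; none of this is the library's actual code.

```python
import warnings


class SurveyDesign:
    """Stand-in for the column-referencing design (needs data to resolve)."""


class ResolvedSurveyDesign:
    """Stand-in for the pre-resolved design that array-in helpers accept."""


def array_in_helper(survey_design=None, survey=None):
    # Step 1: rebind the deprecated alias onto the canonical name.
    if survey is not None:
        if survey_design is not None:
            raise ValueError("Pass only one of survey_design= and survey=.")
        warnings.warn(
            "survey= is deprecated; use survey_design=.",
            DeprecationWarning,
            stacklevel=2,
        )
        survey_design = survey
    # Step 2: the guard runs after rebinding, so it covers both spellings.
    if isinstance(survey_design, SurveyDesign):
        raise TypeError(
            "Array-in helpers need a pre-resolved design; "
            "use make_pweight_design(arr) or resolve the design first."
        )
    return survey_design
```

With the guard placed before step 1 instead, `survey=SurveyDesign()` would slip past the check, which is the behavior the finding describes.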

Performance

No findings.

Maintainability

No additional findings beyond the documentation regressions below.

Tech Debt

Severity: P3. Impact: the next-minor cleanup to remove deprecated HAD aliases and fold remaining legacy back-end paths is properly tracked in TODO.md:L102. Concrete fix: none for this PR.

Security

No findings.

Documentation/Tests

Severity: P2. Impact: the diff deletes or de-indexes major public API docs for existing estimators (docs/api/had.rst deleted; docs/api/index.rst removes HAD entries; docs/api/staggered.rst:L128-L145 removes StaggeredTripleDifference) while README still links to those pages (README.md:L113-L117). That leaves published docs with broken/orphaned links for two public estimators. Concrete fix: either keep the HAD / StaggeredTripleDifference API pages in the Sphinx toctree, or update README/guides to point to the new canonical destination before merge.
Severity: P2. Impact: the Step 6 edit in diff_diff/guides/llms-practitioner.txt removes dCDH from the list of result types supported by compute_honest_did(), but the implementation and tests still support it (docs/methodology/REGISTRY.md:L637; tests/test_honest_did.py:L1343-L1415). This turns the bundled guide into a false negative for a supported analysis path. Concrete fix: restore the dCDH-specific HonestDiD guidance, including its placebo-based caveats, instead of collapsing the section to MultiPeriodDiD/CallawaySantAnna only.
Severity: P3. Impact: the README survey-support text remains stale for HAD by describing survey= / weights= as the HAD entry point instead of survey_design= with deprecated aliases (README.md:L129-L133 in the changed block). Concrete fix: update that sentence to reflect the new canonical kwarg.

Path to Approval

Make the deprecated survey= alias behaviorally identical to survey_design= on stute_test, yatchew_hr_test, and stute_joint_pretest, and add regression tests for survey=SurveyDesign(...).
Reconcile docs/methodology/REGISTRY.md so it contains one HAD survey-support contract, not mutually inconsistent old and new notes.
Restore or replace the deleted/orphaned API docs for HeterogeneousAdoptionDiD and StaggeredTripleDifference, and ensure README links resolve.
Restore the dCDH HonestDiD guidance in the bundled practitioner guide.

Adds survey_design= as the canonical kwarg on all 8 HAD surfaces (HAD.fit, did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test) to match the rest of the library (ContinuousDiD/EfficientDiD/dCDH already use survey_design=). The existing survey= and weights= kwargs become deprecated aliases (DeprecationWarning, removal next minor); internal back-end paths are unchanged, so numerical results are bit-exact relative to pre-PR.

Promotes survey._make_trivial_resolved → public make_pweight_design helper for the pweight-only convenience on array-in pretest helpers (which take ResolvedSurveyDesign, not column-referencing SurveyDesign). The underscore name is kept as a permanent private alias for back-compat.

Three-way mutex (survey_design + survey + weights) extends the prior 2-way; two distinct error messages per surface group point users to the right migration target (SurveyDesign(weights='col') for data-in surfaces vs make_pweight_design(arr) for array-in helpers).

535 tests pass (489 pre-PR + 46 new in tests/test_had_dual_knob_deprecation.py covering 8 surfaces × {survey_design= smoke, weights= warn, survey= warn, parity, mutex} plus surface-spanning tests for type guards, normalization-order invariant, and public-helper export). Bit-exact regression locked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R1 P1: deprecated `survey=SurveyDesign(...)` alias didn't trigger the SurveyDesign type guard on stute_test, yatchew_hr_test, stute_joint_pretest because the guard ran BEFORE the alias rebinding. Move the guard AFTER the soft-deprecation block so it covers both `survey_design=SurveyDesign(...)` (canonical) and `survey=SurveyDesign(...)` (deprecated alias) identically. Adds 3 regression tests in TestArrayInTypeGuard covering the alias path on all 3 array-in surfaces.

R1 P2: REGISTRY.md had two contradictory notes on HAD survey support: the pre-Phase-4.5-C bullet said "pretests still do NOT accept survey/weights" while the Phase 4.5 C bullet listed all 8 surfaces as supporting them. Rewrote the older bullet to reflect the current Phase 4.5 B + C state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T19:45:59Z

/ai-review

github-actions · 2026-04-25T19:52:15Z

🔁 AI review rerun (requested by @igerber)

Head SHA: eef8af4d46d13f910f875c7b5f3a2086455885d1

Overall Assessment

⚠️ Needs changes

Executive Summary

The previous array-in alias-parity bug appears fixed: survey=SurveyDesign(...) now goes through the same guard path as survey_design=... in the three array-in linearity helpers.
I did not find a paper-level estimator, weighting, or variance rewrite here; this diff is an API-front-door consolidation, and the registry now documents it as such.
P1 [Newly identified]: the new survey_design= front door is not directly tested across the full HAD dispatch matrix it now fronts. The added coverage only exercises HeterogeneousAdoptionDiD.fit on the two-period continuous_at_zero/overall path and did_had_pretest_workflow on the overall path, leaving weighted event-study and mass-point canonical entry points unverified.
P3: did_had_pretest_workflow(..., weights=..., aggregate="event_study") still re-emits inner deprecation warnings from the joint wrappers.
P3: the shared array-in mutex/docs text is still slightly misleading for qug_test, which permanently rejects any non-None survey input.

Methodology

No unmitigated P0/P1 methodology findings. Affected methods are the HAD survey-design entry points only (HeterogeneousAdoptionDiD.fit, did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test), and I did not find an undocumented estimator/SE change relative to docs/methodology/REGISTRY.md:L2350-L2351 and docs/methodology/REGISTRY.md:L2435-L2440.
Severity: P3. Impact: the shared array-in migration text says array-in surfaces should prefer survey_design=make_pweight_design(arr) in diff_diff/survey.py:L754-L760, and the registry repeats that wording for qug_test in docs/methodology/REGISTRY.md:L2351-L2351, but qug_test still permanently rejects any non-None survey_design/survey/weights in diff_diff/had_pretests.py:L1294-L1323. That makes the QUG exception slightly self-contradictory in the load-bearing docs. Concrete fix: special-case qug_test in the mutex text and registry note so the canonical-name guidance does not imply a supported migration path.

Code Quality

Severity: P3. Impact: did_had_pretest_workflow comments say the internal event-study calls use canonical kwargs to avoid duplicate deprecation warnings, but on the deprecated weights= path it still forwards weights=joint_weights into both joint wrappers at diff_diff/had_pretests.py:L4239-L4255 and diff_diff/had_pretests.py:L4261-L4276; those wrappers re-warn on any non-None weights at diff_diff/had_pretests.py:L3360-L3365 and diff_diff/had_pretests.py:L3642-L3647. One deprecated workflow call can therefore emit three DeprecationWarnings. Concrete fix: suppress inner deprecation warnings on the workflow’s deprecated weights= event-study path, or route through a private non-warning helper.
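One way to realize the first suggested fix is to warn once at the outer entry point and silence `DeprecationWarning` around the inner calls with the standard `warnings.catch_warnings()` context manager. The functions `joint_wrapper` and `workflow` below are simplified stand-ins, not the library's signatures.

```python
import warnings


def joint_wrapper(weights=None):
    """Stand-in for a joint wrapper that warns on the deprecated kwarg."""
    if weights is not None:
        warnings.warn(
            "weights= is deprecated; use survey_design=.",
            DeprecationWarning,
            stacklevel=2,
        )
    return "ok"


def workflow(weights=None):
    """Stand-in for the workflow front door."""
    if weights is not None:
        # The single user-facing warning fires here.
        warnings.warn(
            "weights= is deprecated; use survey_design=.",
            DeprecationWarning,
            stacklevel=2,
        )
    # Inner calls would warn again; suppress only within this block.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        first = joint_wrapper(weights=weights)
        second = joint_wrapper(weights=weights)
    return first, second
```

The filter change is scoped to the `with` block, so warnings raised elsewhere in the caller's session are unaffected.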

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3. Impact: the next-minor cleanup to remove the deprecated HAD aliases is properly tracked in TODO.md:L102-L102. Concrete fix: none for this PR.

Security

No findings.

Documentation/Tests

Severity: P1 [Newly identified]. Impact: the common survey_design= front door added in diff_diff/had.py:L2893-L2924 now fronts all HAD design × aggregate combinations, but the new direct coverage only exercises the two-period continuous_at_zero path in tests/test_had_dual_knob_deprecation.py:L618-L650. Likewise, the workflow front door changed in diff_diff/had_pretests.py:L4109-L4134 now fronts both overall and event-study dispatch, but the new direct workflow coverage is only the overall path in tests/test_had_dual_knob_deprecation.py:L654-L720. Per the parameter-interaction checklist, the new kwarg is still unverified on the weighted event-study and mass-point branches it now fronts. Concrete fix: add direct survey_design= smoke/parity tests for fit(design="mass_point", ...), fit(aggregate="event_study", ..., cband=True), and did_had_pretest_workflow(aggregate="event_study", ...), plus legacy-alias parity on those same branches.

Path to Approval

Add direct survey_design= regression coverage for HeterogeneousAdoptionDiD.fit on at least the weighted mass_point path and the weighted aggregate="event_study" path, with parity checks against the legacy survey=/weights= entry points.
Add direct did_had_pretest_workflow(..., aggregate="event_study", survey_design=SurveyDesign(...)) coverage, plus parity checks for deprecated survey= and weights= on that same front door.
Clean up the two user-facing contract mismatches: suppress nested deprecation warnings on the workflow event-study weights= path, and special-case the qug_test mutex/registry wording so it does not point users to an unsupported make_pweight_design migration.

R2 P1: extended dispatch-matrix coverage on the new survey_design= front door. Added 3 test classes covering paths that PR #376 fronted but didn't directly test:
- TestHADFitMassPointSurveyDesign: design='mass_point' + survey_design= smoke + legacy-alias att-parity (vcov_type='hc1' required by the Phase 4.5 B mass-point + survey deviation).
- TestHADFitEventStudySurveyDesign: aggregate='event_study' + cband=True + survey_design= smoke + legacy survey= parity (full bit-equality on att, se under same seed + design).
- TestDidHadPretestWorkflowEventStudySurveyDesign: workflow event-study smoke via survey_design=, plus legacy survey= and weights= parity. The weights= parity test also locks the R2 P3 nested-warning suppression (asserts exactly ONE DeprecationWarning fires from the workflow front door, not three from cascading joint wrappers).

R2 P3 #1: workflow's event-study `weights=` path was emitting up to 3 DeprecationWarnings (one at workflow front door + one each from the joint wrappers' internal weights= path). Wrap the internal joint wrapper calls in `warnings.catch_warnings() + simplefilter("ignore", DeprecationWarning)` since the user-facing warning has already fired at the workflow front door. Joint wrappers can't accept ResolvedSurveyDesign (their `_resolve_pretest_unit_weights` requires a SurveyDesign with .resolve()), so converting weights= to survey_design= via make_pweight_design isn't an option here. Locked by the new test_legacy_alias_parity_weights assertion `n_dep_warnings == 1`.

R2 P3 #2: qug_test mutex error pointed users to `survey_design=make_pweight_design(arr)` as a migration target via the shared HAD_DUAL_KNOB_MUTEX_MSG_ARRAY_IN constant, but qug_test permanently rejects ALL survey_design/survey/weights inputs (Phase 4.5 C0 deferral). Replaced with a qug-specific mutex message that says "no migration path; see NotImplementedError below" instead of suggesting make_pweight_design.

545 tests pass (was 538 + 7 new dispatch-matrix tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T19:59:33Z

/ai-review

github-actions · 2026-04-25T20:05:40Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3a81e735f5f88bf76ba0bff094a606ce1cf8deb3

Overall Assessment

⚠️ Needs changes

Executive Summary

Findings from the previous reviews look addressed: array-in alias parity now goes through the same type guard, the workflow event-study path suppresses nested deprecation warnings, and the new regression file adds direct coverage for mass-point overall, weighted event-study fit, and workflow event-study (diff_diff/had_pretests.py:L1568-L1604, diff_diff/had_pretests.py:L2045-L2073, diff_diff/had_pretests.py:L2727-L2755, diff_diff/had_pretests.py:L4245-L4292, tests/test_had_dual_knob_deprecation.py:L727-L964).
I did not find an estimator, weighting, or variance rewrite here; the registry documents this PR as an API consolidation and the changed entry points mostly just rebind into the legacy back ends (docs/methodology/REGISTRY.md:L2350-L2351, diff_diff/had.py:L2898-L2924, diff_diff/had_pretests.py:L4107-L4144).
P1: the new make_pweight_design() / array-in weights= shim path skips 1-D validation, so malformed scalar weights can raise a raw low-level exception instead of the library’s normal ValueError contract (diff_diff/survey.py:L723-L725, diff_diff/had_pretests.py:L1314-L1320, diff_diff/had_pretests.py:L1577-L1583, diff_diff/had_pretests.py:L2051-L2057, diff_diff/had_pretests.py:L2733-L2739).
P3: the new registry/changelog wording still describes qug_test as if it shared the make_pweight_design(arr) migration path, but the implementation correctly says there is no migration path and permanently rejects all survey-aware inputs (docs/methodology/REGISTRY.md:L2351, CHANGELOG.md:L11, diff_diff/had_pretests.py:L1294-L1307).

Methodology

No unmitigated P0/P1 findings. The affected methods are the HAD survey-design entry points only, and the implementation matches the registry’s “API consolidation, back-end unchanged” note plus the existing Phase 4.5 C / C0 methodology notes (docs/methodology/REGISTRY.md:L2350-L2351, docs/methodology/REGISTRY.md:L2429-L2449).
Severity: P3. Impact: the methodology registry/changelog still overstate qug_test’s migration contract by grouping it with the array-in helpers that point users to make_pweight_design(arr), while the code explicitly says qug_test has no survey-aware migration path at all. This is a documentation-contract mismatch, not a numerical defect (docs/methodology/REGISTRY.md:L2351, CHANGELOG.md:L11, diff_diff/had_pretests.py:L1294-L1307). Concrete fix: special-case qug_test in the consolidation note/changelog so only stute_test, yatchew_hr_test, and stute_joint_pretest advertise make_pweight_design(arr).

Code Quality

Severity: P1. Impact: make_pweight_design() assumes weights.shape[0] exists, and the new array-in deprecation shims call it before any _validate_1d_numeric() gate. A scalar deprecated weights= input therefore now fails with a low-level exception instead of the package’s normal front-door ValueError; this regresses the prior validation behavior on these public helper paths (diff_diff/survey.py:L681-L725, diff_diff/had_pretests.py:L1314-L1320, diff_diff/had_pretests.py:L1577-L1583, diff_diff/had_pretests.py:L2051-L2057, diff_diff/had_pretests.py:L2733-L2739, versus the intended 1-D validation at diff_diff/had_pretests.py:L1675-L1694, diff_diff/had_pretests.py:L2113-L2128, diff_diff/had_pretests.py:L2946-L2965). Concrete fix: validate weights as 1-D inside make_pweight_design() (preferred, since it is now a public top-level helper) or validate before every shim call, and add regression tests for scalar / 0-D inputs.
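A hedged sketch of the preferred fix, validating dimensionality inside the helper before touching `shape[0]`. The function name `make_pweight_design_sketch`, the dict return value, and the message wording are illustrative assumptions; the real helper returns a ResolvedSurveyDesign whose fields are not shown in this thread.

```python
import numpy as np


def make_pweight_design_sketch(weights):
    """Hypothetical stand-in for make_pweight_design with a 1-D front-door check."""
    w = np.asarray(weights, dtype=float)
    # Scalars and 0-D arrays have ndim == 0; column vectors have ndim == 2.
    # Without this check, w.shape[0] raises IndexError on 0-D input.
    if w.ndim != 1:
        raise ValueError(
            f"weights must be 1-dimensional; got shape {w.shape}. "
            "A common cause is df[['w']].to_numpy() instead of df['w'].to_numpy()."
        )
    # Placeholder return value; the real helper builds a resolved design object.
    return {"weights": w, "n_obs": int(w.shape[0])}
```

Placing the check in the helper means every shim that routes through it inherits the same error contract without repeating the validation.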

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The next-minor alias-removal cleanup is properly tracked in TODO.md:L102.

Security

No findings.

Documentation/Tests

No blocking findings. The previous direct-coverage gap is addressed by the new regression cases for mass-point overall, weighted event-study fit, and workflow event-study in tests/test_had_dual_knob_deprecation.py:L727-L964.
I could not execute the tests in this environment because pytest is not installed.

Path to Approval

Add 1-D front-door validation to make_pweight_design() (or before each array-in weights= shim) so malformed scalar / 0-D inputs raise a clean ValueError instead of bubbling a low-level exception.
Add regression coverage for that error contract on both the new helper and at least one deprecated alias path, e.g. make_pweight_design(1.0), stute_test(..., weights=1.0), and/or qug_test(..., weights=1.0).

R3 P1: make_pweight_design() now validates 1-D input at the front door. Was: scalar / 0-D / column-vector inputs reached `int(w.shape[0])` and failed with cryptic low-level exceptions (IndexError on scalars, inconsistent results on column vectors). Now: clear ValueError ("weights must be 1-dimensional...") that points users to common mistakes (e.g. df[['w']].to_numpy() vs df['w'].to_numpy()). The validation also propagates to the deprecated `weights=` shim path on all 4 array-in helpers (stute_test, yatchew_hr_test, stute_joint_pretest, qug_test), since the shim routes through make_pweight_design.

5 new regression tests in TestPublicHelpers cover scalar, 0-D ndarray, and column-vector inputs to make_pweight_design directly, plus the deprecated `weights=scalar` path on the 3 linearity helpers and qug_test.

R3 P3: REGISTRY note + CHANGELOG entry now special-case qug_test as having no migration path. Was: both lumped qug_test in with the array-in helpers' "use survey_design=make_pweight_design(arr)" advice, but qug_test permanently rejects all survey-aware inputs (Phase 4.5 C0 deferral) regardless of which kwarg variant is used. Now: REGISTRY + CHANGELOG explicitly distinguish the 3 linearity helpers (which have a migration path) from qug_test (which doesn't).

Error message uses "1-dimensional" wording to also satisfy the existing TestPhase45CR1Regressions tests (test_stute_test_rejects_2d_weights / test_yatchew_hr_test_rejects_2d_weights) that were already gating on the "1-dimensional" regex; both my new tests and the pre-PR tests pass.

550 tests pass (was 545 + 5 new R3 P1 regression tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T20:11:06Z

/ai-review

github-actions · 2026-04-25T20:18:47Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 03c5c9d0c9b3989f9938ba90066f74a916455c13

Overall Assessment

⚠️ Needs changes

Executive Summary

Severity: P1 [Newly identified]. HeterogeneousAdoptionDiD.fit() now inserts * before survey / weights / cband, so legacy positional callers will fail with TypeError even though this PR is described as an additive, non-breaking deprecation shim (diff_diff/had.py:L2783-L2796, CHANGELOG.md:L11-L14).
Previous re-review issues look addressed: make_pweight_design() now rejects non-1D inputs, the array-in alias/type-guard parity is in place, and the registry/changelog now correctly special-case qug_test (diff_diff/survey.py:L681-L754, diff_diff/had_pretests.py:L1294-L1328,L1563-L1604, docs/methodology/REGISTRY.md:L2350-L2450).
I did not find a methodology, weighting, or variance/SE mismatch in the changed HAD estimator/pretest logic. The new code is an API/deprecation layer over the existing Phase 4.5 B/C/C0 back ends (diff_diff/had.py:L2893-L2924, diff_diff/had_pretests.py:L1568-L1604,L2045-L2073,L2727-L2755, docs/methodology/REGISTRY.md:L2350-L2450).
The new regression file is strong on keyword-based alias parity, but it does not cover legacy positional fit() calls, so the break above is currently unguarded (tests/test_had_dual_knob_deprecation.py:L660-L693,L864-L900).
I could not execute the tests here because pytest is not installed.

Methodology

No findings. Affected methods are the HAD survey-design entry points only (HeterogeneousAdoptionDiD.fit, did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test). The registry documents this PR as an API consolidation with unchanged back-end behavior, and the changed code matches that contract (docs/methodology/REGISTRY.md:L2350-L2450, diff_diff/had.py:L2893-L2924, diff_diff/had_pretests.py:L1309-L1328,L1568-L1604,L2045-L2073,L2727-L2755).

Code Quality

Severity: P1 [Newly identified]. Impact: HeterogeneousAdoptionDiD.fit() changes survey, weights, and cband from positional-or-keyword to keyword-only by inserting * ahead of the new survey_design parameter. Any existing call sites using the pre-PR positional order now error immediately, which contradicts the changelog’s “patch-level addition / no breaking changes” statement and the TODO entry that schedules alias removal for the next minor release, not this one (diff_diff/had.py:L2783-L2796, CHANGELOG.md:L11-L14, TODO.md:L102). Concrete fix: preserve the legacy positional order (survey, weights, cband) and make only survey_design the new keyword-only addition.
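The effect of the `*` placement can be seen in a reduced sketch. Both signatures below are shortened, hypothetical versions of fit() (the real method takes more leading parameters), and the `cband=True` default is an assumption.

```python
def fit_breaking(data, outcome, *, survey=None, weights=None, cband=True,
                 survey_design=None):
    """`*` placed before the legacy parameters: they become keyword-only."""
    return {"survey": survey, "weights": weights, "cband": cband,
            "survey_design": survey_design}


def fit_compatible(data, outcome, survey=None, weights=None, cband=True, *,
                   survey_design=None):
    """`*` placed after the legacy parameters: only survey_design is keyword-only."""
    return {"survey": survey, "weights": weights, "cband": cband,
            "survey_design": survey_design}


# A legacy caller passing survey, weights, cband by position.
legacy_args = ("df", "y", "sd", None, False)

compatible_result = fit_compatible(*legacy_args)

try:
    fit_breaking(*legacy_args)
    breaking_error = None
except TypeError as exc:
    # Python rejects the extra positional arguments at call time.
    breaking_error = exc
```

A regression test that calls the method with the old positional order, as the Documentation/Tests finding suggests, would fail against the first signature and pass against the second.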

Performance

No findings.

Maintainability

No findings.

Tech Debt

No blocking findings. The planned alias-removal cleanup is properly tracked in TODO.md:L102.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: the new regression coverage exercises keyword-form survey_design=, survey=, and weights= calls, but it never asserts that legacy positional HeterogeneousAdoptionDiD.fit(...) calls still work, so it does not protect the compatibility contract this PR is trying to preserve (tests/test_had_dual_knob_deprecation.py:L660-L693,L864-L900). Concrete fix: add at least one regression using the old positional order for survey and/or weights/cband.
I could not run the suite because pytest is unavailable in this environment.

Path to Approval

Reorder HeterogeneousAdoptionDiD.fit’s parameters so the pre-PR positional call shapes remain valid, with survey_design added as a new keyword-only argument rather than making survey / weights / cband keyword-only.
Add regression coverage for at least one legacy positional fit() call shape next to the existing keyword-based deprecation tests.

R4 P1: HeterogeneousAdoptionDiD.fit() inadvertently made `survey`, `weights`, and `cband` keyword-only when adding the new `survey_design=` kwarg, by inserting `*,` before all four. This broke pre-PR positional callers, contradicting the "additive, non-breaking" CHANGELOG claim. Reorder so `survey`, `weights`, `cband` keep their pre-PR positional-or-keyword status; only `survey_design=` is the new keyword-only addition (placed after the `*,` separator at the end).

R4 P3: added test_legacy_positional_call_back_compat in TestHADFitDeprecation that exercises the full pre-PR positional call shape, `fit(df, "y", "d", "time", "unit", None, "overall", sd, None, True)`, which locks the back-compat contract.

The 6 pretest helpers (qug_test included) and the workflow were unaffected by this issue: their pre-PR signatures already had `*,` before survey/weights/etc., so those kwargs were already keyword-only.

551 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T20:25:33Z

/ai-review

github-actions · 2026-04-25T20:30:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: af998df267482848edf7aeb80d90f492575168ee

Overall Assessment

✅ Looks good

Executive Summary

Previous P1 is resolved: HeterogeneousAdoptionDiD.fit() again preserves the pre-PR positional survey / weights / cband calling convention, and a regression now locks that shape. diff_diff/had.py:L2783-L2800, tests/test_had_dual_knob_deprecation.py:L695-L719
Methodology cross-check passed: the Methodology Registry documents this PR as an API consolidation with unchanged HAD estimator/pretest back ends, and the code matches that contract by rebinding survey_design= to the existing survey/weights paths before the established weighting/bootstrap logic runs. docs/methodology/REGISTRY.md:L2350-L2351, diff_diff/had.py:L2897-L2928, diff_diff/had_pretests.py:L1294-L1328, diff_diff/had_pretests.py:L1563-L1604, diff_diff/had_pretests.py:L2040-L2073, diff_diff/had_pretests.py:L2722-L2755, diff_diff/had_pretests.py:L4107-L4144
The new test coverage closes the earlier surface gaps, including positional fit(), mass-point weighted fit, weighted event-study fit, and workflow event-study parity. tests/test_had_dual_knob_deprecation.py:L695-L719, tests/test_had_dual_knob_deprecation.py:L863-L1032
No new P0/P1 issues found in estimator math, weighting, variance/SE, identification checks, or default behaviors.
One minor documentation drift remains in fit()’s docstring: it still describes deprecated aliases as keyword-only and omits survey_design= from the cband paragraph. diff_diff/had.py:L2864-L2883
I could not execute the test suite here because the environment lacks numpy, so the test assessment below is from static inspection only.

Methodology

No findings. Affected methods are HeterogeneousAdoptionDiD.fit, did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, and joint_homogeneity_test. The registry explicitly treats this as a canonical-kwarg/deprecation change, not a change to the underlying estimators or inference, and the implementation follows that design. docs/methodology/REGISTRY.md:L2350-L2450, diff_diff/had.py:L2897-L2928, diff_diff/had_pretests.py:L1294-L1328, diff_diff/had_pretests.py:L1563-L1604, diff_diff/had_pretests.py:L2040-L2073, diff_diff/had_pretests.py:L2722-L2755, diff_diff/had_pretests.py:L4107-L4144

Code Quality

No findings. The prior compatibility blocker is fixed by restoring positional compatibility in fit() and adding a regression for the legacy call shape. diff_diff/had.py:L2783-L2800, tests/test_had_dual_knob_deprecation.py:L695-L719

Performance

No findings. The PR stays in argument-validation / alias-routing territory and does not materially alter the hot estimator or bootstrap loops.

Maintainability

No findings. Centralizing mutex/deprecation strings and promoting make_pweight_design reduces duplicated API glue across the eight HAD surfaces. diff_diff/survey.py:L681-L791

Tech Debt

Severity: P3
Impact: The remaining alias-removal cleanup is properly tracked and therefore non-blocking for this release. TODO.md:L102-L102
Concrete fix: None in this PR; remove survey= / weights= and collapse the legacy routing in the next minor as already tracked.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: HeterogeneousAdoptionDiD.fit()’s updated docstring still says deprecated survey / weights are “keyword-only”, and its cband text only mentions the old survey= / weights= knobs even though the canonical public surface is now survey_design=. Runtime behavior is correct, but generated help text slightly misstates the API. diff_diff/had.py:L2864-L2883
Concrete fix: Update that docstring to say survey / weights remain positional-or-keyword deprecated aliases for one minor cycle, and explicitly mention that cband applies on the weighted event-study survey_design= path too.
No blocking test gap found in the diff. The new regression file covers the previous re-review gaps, including positional fit() compatibility and the mass-point / event-study / workflow survey-design paths. tests/test_had_dual_knob_deprecation.py:L695-L719, tests/test_had_dual_knob_deprecation.py:L863-L1032
I could not execute the suite here because the environment is missing numpy.

R5 P3: HAD.fit() docstring still described `survey` and `weights` as "keyword-only" (they're positional-or-keyword after the R4 P1 fix that restored back-compat) and the `cband` paragraph only mentioned the old `survey=` / `weights=` kwargs (omitted the canonical `survey_design=`). Runtime behavior was correct; just generated help text drift.
- Drop "keyword-only" qualifier from `survey` / `weights` parameter lines in the docstring; add note that they remain positional-or-keyword for one minor cycle for back-compat.
- Update `cband` paragraph to cover all three of `survey_design=` / `survey=` / `weights=` (was: only the deprecated two).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T20:35:24Z

/ai-review

github-actions · 2026-04-25T20:41:46Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 86ca61c35d040cf685b99e98648bce47381f56c9

Overall Assessment

Looks good

Executive Summary

Re-review focus passed: the prior fit() compatibility/doc drift concern appears resolved. HeterogeneousAdoptionDiD.fit() now preserves the legacy positional survey / weights / cband call shape, and the new regression file locks that behavior. diff_diff/had.py:L2783-L2800, tests/test_had_dual_knob_deprecation.py:L695-L719
Methodology cross-check passed. The registry documents this PR as an API consolidation with unchanged HAD estimator/pretest back ends, permanent QUG deferral under survey, and a data-in vs array-in surface split; the implementation matches that contract. docs/methodology/REGISTRY.md:L2350-L2351, docs/methodology/REGISTRY.md:L2429-L2448, diff_diff/had.py:L2900-L2931, diff_diff/had_pretests.py:L1294-L1328, diff_diff/had_pretests.py:L1563-L1604, diff_diff/had_pretests.py:L2040-L2073, diff_diff/had_pretests.py:L2722-L2755, diff_diff/had_pretests.py:L3358-L3377, diff_diff/had_pretests.py:L3640-L3659, diff_diff/had_pretests.py:L4107-L4144
I found no new P0/P1 issues in weighting, variance/SE, identification checks, or default behavior.
One minor documentation drift remains: several had_pretests.py Raises blocks still describe the old two-way survey/weights mutex and omit survey_design=. diff_diff/had_pretests.py:L1503-L1510, diff_diff/had_pretests.py:L1958-L1965, diff_diff/had_pretests.py:L4049-L4054
Static review only: I could not run the suite here because the environment is missing both numpy and pytest.

Methodology

No findings. The affected methods are HeterogeneousAdoptionDiD.fit, did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, and joint_homogeneity_test; I did not find an undocumented change to estimator math, weighting, or inference. docs/methodology/REGISTRY.md:L2350-L2351, docs/methodology/REGISTRY.md:L2429-L2448

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3
Impact: The next-minor cleanup to remove deprecated survey= / weights= aliases is explicitly tracked, so the deferred cleanup is non-blocking for this PR. TODO.md:L102-L102
Concrete fix: None in this PR; remove the aliases in the next minor as already tracked.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: The Raises sections for stute_test, yatchew_hr_test, and did_had_pretest_workflow still document the pre-PR two-way survey/weights mutex instead of the new three-way survey_design / survey / weights surface, so generated help text is slightly out of sync with runtime behavior. diff_diff/had_pretests.py:L1503-L1510, diff_diff/had_pretests.py:L1958-L1965, diff_diff/had_pretests.py:L4049-L4054
Concrete fix: Update those Raises blocks to describe the three-way mutex and the canonical survey_design= entry point.
No blocking test-gap finding from static inspection. The new regression file covers helper export/type-guard behavior, legacy positional fit() compatibility, and the expanded dispatch matrix including mass-point and event-study paths. tests/test_had_dual_knob_deprecation.py:L109-L260, tests/test_had_dual_knob_deprecation.py:L695-L1032
I could not execute the tests here because the environment lacks numpy and pytest.

R6 P3: Raises blocks on stute_test, yatchew_hr_test, and did_had_pretest_workflow still described the pre-PR 2-way `survey`/`weights` mutex. Updated to the 3-way `survey_design + survey + weights` form, with `survey=` / `weights=` flagged as deprecated aliases. Also added the TypeError raise to the array-in helpers' Raises blocks (stute_test, yatchew_hr_test) since they reject SurveyDesign instances on `survey_design=` (and equivalently on the deprecated `survey=` alias after R1's guard reorder). Runtime behavior was correct; just generated help text drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
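
The three-way mutex that these Raises blocks now describe can be sketched as follows. This is a minimal illustration, not the library's code: the helper name `check_three_way_mutex`, the choice of `ValueError`, and the message wording are all assumptions.

```python
from typing import Any, Optional


def check_three_way_mutex(
    survey_design: Optional[Any] = None,
    survey: Optional[Any] = None,
    weights: Optional[Any] = None,
) -> None:
    """Raise if more than one of the three survey knobs is non-None.

    Hypothetical helper: the name, the ValueError type, and the message
    text are assumptions made for illustration only.
    """
    supplied = [
        name
        for name, value in (
            ("survey_design", survey_design),
            ("survey", survey),
            ("weights", weights),
        )
        if value is not None
    ]
    if len(supplied) > 1:
        raise ValueError(
            "Pass at most one of survey_design=, survey=, weights=; got "
            + ", ".join(f"{name}=" for name in supplied)
            + ". survey= and weights= are deprecated aliases."
        )
```

Calling it with zero or one knob returns silently; any pair (or all three) raises and names the offending kwargs, which is the behavior a Raises block would need to document.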

igerber · 2026-04-25T20:44:11Z

/ai-review

github-actions · 2026-04-25T20:50:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 83329a2ef6639b6728dfc1befc3ecd17150409a5

Overall Assessment

✅ Looks good

Executive Summary

Re-review focus passed: the prior had_pretests.py Raises-block drift is fixed; the updated docstrings now describe the 3-way mutex and survey_design= surface. diff_diff/had_pretests.py:L1503-L1518, diff_diff/had_pretests.py:L1966-L1981, diff_diff/had_pretests.py:L4065-L4072
Methodology cross-check passed. The registry documents this as an API-only consolidation with unchanged HAD weighting/inference back ends, canonical survey_design=, documented QUG permanent rejection under survey, and documented data-in vs array-in split; the implementation matches that contract. docs/methodology/REGISTRY.md:L2350-L2351, docs/methodology/REGISTRY.md:L2429-L2450, diff_diff/had.py:L2895-L2931, diff_diff/had_pretests.py:L1294-L1328, diff_diff/had_pretests.py:L1571-L1612, diff_diff/had_pretests.py:L4125-L4408
I found no new P0/P1 issues in estimator math, weighting, variance/SE, identification checks, or default behavior.
One minor documentation drift remains: HeterogeneousAdoptionDiD.fit()’s survey_design parameter doc still reads as continuous-path-only even though mass-point support is documented and regression-tested. diff_diff/had.py:L2850-L2863, docs/methodology/REGISTRY.md:L2350-L2351, tests/test_had_dual_knob_deprecation.py:L855-L926
Static review only: I could not execute the test suite here because numpy, pandas, and pytest are not installed.

Methodology

No findings. The affected methods are HeterogeneousAdoptionDiD.fit, did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, and joint_homogeneity_test; the code follows the documented API-only consolidation and keeps the previously documented weighting / variance behavior intact. docs/methodology/REGISTRY.md:L2350-L2351, docs/methodology/REGISTRY.md:L2429-L2450

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3
Impact: The alias-removal and routing unification work is intentionally deferred, but it is explicitly tracked for the next minor release, so it is non-blocking for this PR. TODO.md:L102-L102
Concrete fix: None in this PR; complete the tracked cleanup in the next minor.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: HeterogeneousAdoptionDiD.fit() now presents survey_design= as the canonical public kwarg, but its parameter doc still says survey design-based inference is for the two continuous paths only. That understates actual mass-point support on both overall and event_study, which the registry and new regression tests now cover. diff_diff/had.py:L2850-L2863, docs/methodology/REGISTRY.md:L2350-L2351, tests/test_had_dual_knob_deprecation.py:L855-L926
Concrete fix: Update that parameter docstring to mention mass-point support and its variance behavior, or point readers directly to the registry note.
No blocking test-gap finding from static inspection. The new regression file covers helper export/validation, array-in type guards, positional fit() compatibility, mass-point and event-study front doors, and the nested-warning suppression case. tests/test_had_dual_knob_deprecation.py:L109-L284, tests/test_had_dual_knob_deprecation.py:L660-L792, tests/test_had_dual_knob_deprecation.py:L855-L1032

R7 P3: HAD.fit()'s `survey_design` parameter docstring was still scoped to "the two continuous-dose paths" even though Phase 4.5 B added mass-point support and event-study survey composition (both with test coverage in TestHADFitMassPointSurveyDesign and TestHADFitEventStudySurveyDesign). Widened the description to cover the full dispatch matrix: continuous × {overall, event_study} + mass_point × {overall, event_study}. Notes mass-point's vcov_type='hc1' requirement, event-study's cband=True simultaneous CI, and points readers to REGISTRY.md for the full matrix.

Proactive sweep (per user request): also updated HeterogeneousAdoptionDiDEventStudyResults.variance_formula's class docstring to clarify that the "weights= shortcut" / "survey= path" labels refer to internal variance-source families (still accurate internally). Added explicit "including via the deprecated weights= alias" / "via survey_design= or the deprecated survey= alias" so the field-level help text agrees with the consolidation.

Other surfaces audited (no drift found): did_had_pretest_workflow, joint_pretrends_test, joint_homogeneity_test, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest survey_design= docstrings; all already align with the canonical kwarg + 3-way mutex contract. Internal back-end comments using "weights= shortcut" / "survey= path" describe the (unchanged) routing mechanism; left as-is.

551 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T21:01:50Z

/ai-review

github-actions · 2026-04-25T21:07:54Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 41d7d45e12868db809e5e78d82fcb4a241a35b3e

Overall Assessment

⚠️ Needs changes

Executive Summary

Re-review focus mostly passed: the prior docstring drift on HeterogeneousAdoptionDiD.fit() is fixed, and the revised survey_design= docs now match the methodology registry on mass-point and event-study support. diff_diff/had.py:L2853-L2891, docs/methodology/REGISTRY.md:L2352-L2353
Methodology cross-check passed. I found no undocumented change to HAD estimator math, weighting, variance/SE, or identification assumptions; this PR remains an API-surface consolidation/deprecation cycle as documented in the registry. docs/methodology/REGISTRY.md:L2352-L2450
Severity P1 [Newly identified]: HeterogeneousAdoptionDiD.fit() does not enforce the documented data-in type split for survey_design=. Passing the new make_pweight_design(...)/ResolvedSurveyDesign into fit() falls through to low-level aggregate-dependent errors instead of a front-door TypeError. diff_diff/had.py:L2918-L2944, diff_diff/had.py:L3011-L3057, diff_diff/had.py:L3815-L3892
The new regression file covers alias parity and the array-in type guard well, but it does not lock the corresponding data-in misuse case on fit(). tests/test_had_dual_knob_deprecation.py:L181-L250, tests/test_had_dual_knob_deprecation.py:L661-L719, tests/test_had_dual_knob_deprecation.py:L893-L926
Static review only: I could not run the test suite here because numpy, pandas, and pytest are not installed.

Methodology

No findings. The affected methods are HeterogeneousAdoptionDiD.fit, did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, and joint_homogeneity_test. The registry explicitly documents the canonical survey_design= surface, deprecated aliases, QUG’s permanent survey rejection, and unchanged weighting/variance back ends, and the modified docstrings align with that contract. docs/methodology/REGISTRY.md:L2352-L2450, diff_diff/had.py:L2853-L2891

Code Quality

Severity: P1 [Newly identified]
Impact: The PR introduces a surface split where data-in APIs take SurveyDesign and array-in APIs take pre-resolved designs, and it adds explicit type guards for the array-in helpers. HeterogeneousAdoptionDiD.fit() does not add the matching data-in guard. After the alias rebinding, the overall path assumes survey.weights is a column name, while the event-study path unconditionally calls survey.resolve(data_filtered), so fit(..., survey_design=make_pweight_design(w)) will fall through to low-level failures instead of the documented front-door contract. This is a production-facing edge-case bug on the new public API surface. diff_diff/had_pretests.py:L1571-L1605, diff_diff/had_pretests.py:L2056-L2085, diff_diff/had_pretests.py:L2738-L2767, diff_diff/had.py:L2918-L2944, diff_diff/had.py:L3011-L3057, diff_diff/had.py:L3815-L3892, docs/methodology/REGISTRY.md:L2352-L2353
Concrete fix: After the deprecation rebinding in fit(), reject non-SurveyDesign inputs on the data-in surface with a clear TypeError that points users to survey_design=SurveyDesign(weights='col', ...) and reserves make_pweight_design(...) for the array-in helpers. Reusing the existing shared survey validator in diff_diff/survey.py:L1202-L1213 would align HAD with the rest of the codebase. Add regression tests for both aggregate="overall" and aggregate="event_study" covering survey_design=make_pweight_design(w) and the deprecated survey=make_pweight_design(w) alias.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3
Impact: The alias-removal and deeper routing cleanup are intentionally deferred, but that work is already tracked in TODO.md, so it is non-blocking for this PR. TODO.md:L102-L102
Concrete fix: None in this PR; complete the tracked next-minor cleanup.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: The new deprecation suite explicitly tests the array-in wrong-type guard, but it does not add the corresponding negative coverage for the data-in HeterogeneousAdoptionDiD.fit() surface, which is why the P1 above is still unpinned. tests/test_had_dual_knob_deprecation.py:L181-L250, tests/test_had_dual_knob_deprecation.py:L661-L719, tests/test_had_dual_knob_deprecation.py:L893-L926
Concrete fix: Add negative tests asserting a clear TypeError for fit(..., survey_design=make_pweight_design(...)) and fit(..., survey=make_pweight_design(...)) on both aggregates.

Path to Approval

Add a front-door type guard in HeterogeneousAdoptionDiD.fit() so the data-in survey_design=/survey= surface accepts only SurveyDesign-like inputs with .resolve, and rejects ResolvedSurveyDesign/make_pweight_design(...) with migration guidance.
Add regression tests for the four misuse cases above: aggregate="overall" and "event_study", each via both survey_design= and deprecated survey=.

R8 P1: HAD.fit() lacked the data-in symmetric type guard that was already present on the array-in pretest helpers. Result: passing `survey_design=make_pweight_design(arr)` (or the deprecated `survey=make_pweight_design(arr)` alias) to `fit()` would fall through to low-level errors: a `survey.resolve(data)` AttributeError on event-study, or `survey.weights` (a numpy array on Resolved) misinterpreted as a column name on overall.

Fix: after the alias rebinding in `fit()`, reject any non-`SurveyDesign`-like input (no `.resolve()` method) with a clear TypeError that points users to `survey_design=SurveyDesign(weights='col_name', ...)` for the data-in surface and reserves `make_pweight_design(arr)` for the array-in pretest helpers. This mirrors the array-in helpers' `isinstance(survey_design, SurveyDesign)` rejection, so the data-in / array-in surface split is now symmetric on type guards in both directions. The `did_had_pretest_workflow` and joint data-in wrappers already had this protection via `_resolve_pretest_unit_weights`'s `if not hasattr(survey, "resolve"): raise TypeError(...)` check; HAD.fit was the missing surface.

4 new regression tests in TestHADFitDeprecation cover the misuse cases the reviewer named: `survey_design=ResolvedSurveyDesign` and `survey=ResolvedSurveyDesign` on both `aggregate="overall"` and `aggregate="event_study"`. Updated the existing test_survey_bad_type_raises regex from "SurveyDesign-like" → "SurveyDesign" since the new guard fires before the prior hasattr-weights check (both messages contain "SurveyDesign", so the broader regex matches both).

555 tests pass (was 551 + 4 new R8 P1 regression tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
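
A rough sketch of the data-in guard described in this commit, using stand-in classes rather than the real `SurveyDesign` / `ResolvedSurveyDesign`. The names `SurveyDesignStub`, `ResolvedStub`, and `require_data_in_design`, and the exact message text, are assumptions; only the `hasattr(..., "resolve")` duck-typing check is taken from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass
class SurveyDesignStub:
    """Stand-in for an unresolved design: it names a weight column."""

    weights: str

    def resolve(self, data: Any) -> "ResolvedStub":
        # Look the column up in the data at fit time.
        return ResolvedStub(weights=list(data[self.weights]))


@dataclass
class ResolvedStub:
    """Stand-in for a pre-resolved design: it already holds weight values."""

    weights: Sequence[float]


def require_data_in_design(survey_design: Any) -> None:
    """Reject anything without .resolve() before it reaches low-level code."""
    if survey_design is None:
        return
    if not hasattr(survey_design, "resolve"):
        raise TypeError(
            "survey_design= on a data-in surface must be a SurveyDesign, e.g. "
            "survey_design=SurveyDesign(weights='col_name'); pre-resolved "
            "designs belong on the array-in pretest helpers."
        )
```

Running the guard right after the alias rebinding means a deprecated `survey=` value passes through the same check as `survey_design=`, so both spellings fail with the same front-door TypeError.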

igerber · 2026-04-25T21:12:04Z

/ai-review

github-actions · 2026-04-25T21:17:37Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 9c0d74266894b26eb9cac9693fe525e2741fd898

Overall Assessment

✅ Looks good

Executive Summary

Prior P1 is resolved: HeterogeneousAdoptionDiD.fit() now explicitly rejects pre-resolved survey objects on the data-in surface, and the new tests cover overall/event-study plus deprecated-alias variants. diff_diff/had.py:L2908-L2968, tests/test_had_dual_knob_deprecation.py:L695-L766
Methodology cross-check passed. The registry documents this PR as a HAD survey-API consolidation with unchanged estimator, weighting, variance, and identification back ends; I found no undocumented methodology drift. docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2437-L2448
The new public make_pweight_design() helper matches the documented contract and top-level export plan, including 1-D front-door validation and retention of the private alias. diff_diff/survey.py:L681-L792, diff_diff/__init__.py:L149-L155, diff_diff/__init__.py:L444-L449
Severity: P3. The data-in pretest wrappers still reject canonical survey_design= misuse through a helper error that says survey= must be a SurveyDesign, so the canonical kwarg name is not surfaced consistently outside HeterogeneousAdoptionDiD.fit(). diff_diff/had_pretests.py:L3264-L3269, diff_diff/had_pretests.py:L3374-L3393, diff_diff/had_pretests.py:L3656-L3675, diff_diff/had_pretests.py:L4125-L4160
Severity: P3. The new deprecation suite does not yet lock legacy-vs-canonical parity on joint_pretrends_test, joint_homogeneity_test, or did_had_pretest_workflow(..., aggregate="overall"). tests/test_had_dual_knob_deprecation.py:L540-L606, tests/test_had_dual_knob_deprecation.py:L609-L656, tests/test_had_dual_knob_deprecation.py:L795-L865
Static review only: I could not run pytest, and package imports were unavailable because numpy is not installed in this environment.

Methodology

No findings. The affected surfaces are the eight HAD survey-design entry points, and the registry’s API-consolidation note matches the implementation: canonical survey_design=, deprecated survey= / weights=, unchanged numerical back ends, array-in vs data-in split preserved, and QUG still permanently rejected under survey inputs. docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2431-L2448

Code Quality

Severity: P3
Impact: did_had_pretest_workflow, joint_pretrends_test, and joint_homogeneity_test enforce the new data-in contract indirectly through _resolve_pretest_unit_weights(). If a caller misuses canonical survey_design= with make_pweight_design(...) / ResolvedSurveyDesign, the TypeError still says survey= must be a SurveyDesign, which is inconsistent with the new public surface and less clear than the dedicated HeterogeneousAdoptionDiD.fit() guard. diff_diff/had_pretests.py:L3264-L3269, diff_diff/had.py:L2946-L2967
Concrete fix: add the same explicit front-door type guard used in HeterogeneousAdoptionDiD.fit(), or parameterize _resolve_pretest_unit_weights() so the error names survey_design= and points array-in callers to make_pweight_design(...).

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3
Impact: alias removal and full cleanup of the legacy weights= routing are intentionally deferred, but the PR correctly tracks that work in TODO.md, so it is non-blocking. TODO.md:L102-L102
Concrete fix: none in this PR; complete the tracked next-minor cleanup.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: the deprecation suite covers smoke/warning/mutex behavior for joint_pretrends_test, joint_homogeneity_test, and the overall workflow, but unlike the other surfaces it does not assert that deprecated survey= / weights= remain numerically identical to canonical survey_design= there. For an API-only consolidation, those are the missing regression locks most likely to catch future rebinding drift. tests/test_had_dual_knob_deprecation.py:L540-L606, tests/test_had_dual_knob_deprecation.py:L609-L656, tests/test_had_dual_knob_deprecation.py:L795-L865
Concrete fix: add parity tests for survey_design=SurveyDesign(weights="w") vs deprecated survey=SurveyDesign(weights="w"), and uniform-weight weights=np.ones(n) where applicable, on joint_pretrends_test, joint_homogeneity_test, and did_had_pretest_workflow(..., aggregate="overall").

R9 P3 #1 (helper error message canonical-kwarg consistency): `_resolve_pretest_unit_weights`'s TypeError on non-`SurveyDesign`-like input still said `survey=` must be a SurveyDesign, but on the data-in wrappers (workflow / joint_pretrends_test / joint_homogeneity_test) the canonical kwarg is now `survey_design=`. Updated the message to name `survey_design=` (with `survey=` flagged as the deprecated alias) and to point pre-resolved-design users to the array-in pretest helpers, mirroring HAD.fit's data-in guard.

R9 P3 #2 (legacy-vs-canonical parity coverage on data-in pretests): Added 3 parity tests (test_legacy_alias_parity_survey on joint_pretrends_test + joint_homogeneity_test, plus test_legacy_alias_parity_survey_overall on did_had_pretest_workflow overall path). Locks the rebinding contract on the data-in surfaces that previously only had smoke / warning / mutex coverage.

558 tests pass (was 555 + 3 new R9 P3 parity tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
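
The shape of such a parity lock can be illustrated on a toy function. Everything below is hypothetical: `weighted_mean` is not part of diff_diff, and here `survey_design` is simply a sequence of weights rather than a `SurveyDesign`. The point is only that the deprecated alias warns, rebinds, and then runs the identical code path.

```python
import warnings
from typing import Optional, Sequence


def weighted_mean(
    values: Sequence[float],
    survey_design: Optional[Sequence[float]] = None,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Toy estimator: canonical survey_design= plus a deprecated weights= alias."""
    if weights is not None:
        if survey_design is not None:
            raise ValueError("Pass only one of survey_design= or weights=.")
        warnings.warn(
            "weights= is deprecated; use survey_design= instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        survey_design = weights  # rebind only; no arithmetic happens in the shim
    w = [1.0] * len(values) if survey_design is None else list(survey_design)
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)


def check_alias_parity() -> bool:
    """Return True if the legacy call warns and matches the canonical result."""
    values, w = [1.0, 2.0, 4.0], [1.0, 1.0, 2.0]
    canonical = weighted_mean(values, survey_design=w)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        legacy = weighted_mean(values, weights=w)
    warned = any(issubclass(c.category, DeprecationWarning) for c in caught)
    return warned and legacy == canonical
```

Because the shim does nothing but rename the argument, the comparison can be exact equality rather than a tolerance check, which is what makes this kind of test sensitive to future rebinding drift.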

igerber · 2026-04-25T21:20:37Z

/ai-review

github-actions · 2026-04-25T21:27:14Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b2370229739cea9c69572c6abbcacf59ab2facba

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings.

Executive Summary

Prior P3 code-quality issue is resolved: the data-in pretest wrappers now reject pre-resolved inputs with a canonical survey_design= message and correct migration guidance. diff_diff/had_pretests.py:L3264-L3280
Methodology cross-check passed. The registry documents this as an API consolidation with unchanged back-end estimator, weighting, and variance behavior; QUG remains permanently deferred under survey, and the Stute/Yatchew survey mechanics still match the documented Phase 4.5 C note. docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2431-L2448
Severity P3: qug_test still emits generic deprecation warnings that tell users to migrate to survey_design= / make_pweight_design(...), even though QUG has no supported survey-aware migration path. diff_diff/had_pretests.py:L1309-L1320, diff_diff/survey.py:L779-L792, docs/methodology/REGISTRY.md:L2353-L2353, docs/methodology/REGISTRY.md:L2431-L2437
Severity P3: the prior parity-test gap is only partially closed. The new suite adds survey= parity for joint_pretrends_test, joint_homogeneity_test, and did_had_pretest_workflow(..., aggregate="overall"), but weights= on those same surfaces is still warning-only with no direct legacy-vs-canonical parity lock. tests/test_had_dual_knob_deprecation.py:L557-L640, tests/test_had_dual_knob_deprecation.py:L661-L725, tests/test_had_dual_knob_deprecation.py:L882-L968
The next-minor alias cleanup is properly tracked in TODO.md. TODO.md:L102-L102
Static review only: pytest is not available in this environment.

Methodology

No findings. The affected methods are the HAD survey-design entry surfaces, and the implementation matches the registry’s consolidation note plus the unchanged QUG/Stute/Yatchew method notes. docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2431-L2448

Code Quality

Severity: P3
Impact: qug_test’s deprecation warnings contradict both the new registry text and the function’s own mutex/NotImplementedError: they tell users to migrate to survey_design= even though qug_test permanently rejects all survey-aware kwargs. That makes the deprecation guidance internally inconsistent. diff_diff/had_pretests.py:L1309-L1320, diff_diff/survey.py:L779-L792, docs/methodology/REGISTRY.md:L2353-L2353, docs/methodology/REGISTRY.md:L2431-L2437
Concrete fix: give qug_test its own deprecation warning text for survey= / weights= that says the aliases are deprecated but survey-aware QUG remains unsupported, and point users to unweighted qug_test or did_had_pretest_workflow(..., survey_design=...) for the survey-aware linearity family.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The deprecated-alias removal / legacy-path folding is already tracked in TODO.md:L102-L102.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: The earlier test-gap finding is only partially resolved. The new tests lock survey= parity on the joint data-in wrappers and the overall workflow, but the corresponding weights= paths still only assert warning emission, not numerical parity against canonical survey_design=. That leaves those rebinding paths without the same regression protection now present on other surfaces. tests/test_had_dual_knob_deprecation.py:L557-L640, tests/test_had_dual_knob_deprecation.py:L661-L725, tests/test_had_dual_knob_deprecation.py:L882-L968
Concrete fix: add direct parity tests comparing weights=np.ones(n) to survey_design=SurveyDesign(weights="w") for joint_pretrends_test, joint_homogeneity_test, and did_had_pretest_workflow(..., aggregate="overall"), asserting identical cvm_stat_joint / p_value and workflow stute / yatchew statistics.

R10 P3 #1 (qug_test deprecation warning text): qug_test was using the shared array-in deprecation messages that point users to migrate to `survey_design=` / `make_pweight_design(arr)`, but qug_test permanently rejects ALL survey-aware kwargs (Phase 4.5 C0 deferral). Replaced with qug-specific warning text that says the aliases are deprecated AND that survey-aware QUG remains unsupported, pointing users to `did_had_pretest_workflow(..., survey_design=...)` for the survey-aware linearity family instead.

R10 P3 #2 (weights= parity tests on data-in wrappers): the previous round added survey= parity for joint_pretrends_test, joint_homogeneity_test, and did_had_pretest_workflow(aggregate='overall') but left the weights= rebinding paths warning-only with no numerical parity lock. Added 3 new tests: test_legacy_alias_parity_weights (joint_pretrends_test + joint_homogeneity_test) and test_legacy_alias_parity_weights_overall (workflow). Each asserts `weights=np.ones(n)` ≡ `survey_design=SurveyDesign(weights="w")` (uniform 1.0 column) on identical-numerical-output, locking the rebinding contract.

561 tests pass (was 558 + 3 new R10 P3 parity tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
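
One way a qug-specific front door could look, as a sketch only. The function name `qug_front_door`, the message wording, and the order of the warning relative to the rejection are assumptions; the thread only establishes that the aliases are deprecated and that every survey-aware kwarg on qug_test is rejected.

```python
import warnings
from typing import Any, Optional

# Illustrative wording; not the text shipped in the library.
QUG_ALIAS_MSG = (
    "{name}= is deprecated, and survey-aware QUG remains unsupported. "
    "Call qug_test unweighted, or use "
    "did_had_pretest_workflow(..., survey_design=...) for the survey-aware "
    "linearity family."
)


def qug_front_door(
    survey_design: Optional[Any] = None,
    survey: Optional[Any] = None,
    weights: Optional[Any] = None,
) -> None:
    """Warn on deprecated aliases, then reject any survey-aware input."""
    for name, value in (("survey", survey), ("weights", weights)):
        if value is not None:
            warnings.warn(
                QUG_ALIAS_MSG.format(name=name),
                DeprecationWarning,
                stacklevel=2,
            )
    if any(v is not None for v in (survey_design, survey, weights)):
        raise NotImplementedError(
            "qug_test does not support survey-aware inputs."
        )
```

The warning text deliberately avoids naming a pweight helper as a migration target, since on this surface the canonical kwarg is rejected as well.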

igerber · 2026-04-25T21:29:53Z

/ai-review

github-actions · 2026-04-25T21:35:58Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 236bdcebe6247775a3160d29918463981968f770

Overall Assessment

✅ Looks good

Executive Summary

Methodology cross-check passed for the 8 affected HAD entry surfaces: HeterogeneousAdoptionDiD.fit, did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, and joint_homogeneity_test. The registry now documents this as an API consolidation with unchanged Phase 4.5 B/C/C0 estimator, weighting, and variance behavior underneath the new front door. docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2431-L2452
Previous re-review P3 is resolved: qug_test now has qug-specific mutex/deprecation text and no longer advertises make_pweight_design(...) as a valid migration path for a permanently unsupported survey-QUG surface. diff_diff/had_pretests.py:L1294-L1377
Previous re-review P3 is resolved: direct weights= parity locks were added for joint_pretrends_test, joint_homogeneity_test, and did_had_pretest_workflow on the relevant overall/event-study paths. tests/test_had_dual_knob_deprecation.py:L608-L676, tests/test_had_dual_knob_deprecation.py:L729-L796, tests/test_had_dual_knob_deprecation.py:L1040-L1072, tests/test_had_dual_knob_deprecation.py:L1267-L1312
Severity P3 informational: the shared data-in weights= deprecation warning is still slightly misleading on HeterogeneousAdoptionDiD.fit, because the suggested migration changes the variance family today; the next-minor cleanup is already tracked in TODO.md. diff_diff/survey.py:L783-L787, TODO.md:L102-L102
Static review only: pytest is not installed in this environment, so I could not execute the suite.

Methodology

No findings. The PR rebadges the entry kwargs and adds front-door type/mutex handling, but the underlying methodology remains the documented split: QUG stays permanently rejected under survey; Stute/joint Stute keep the Phase 4.5 C PSU-level Mammen multiplier bootstrap; Yatchew keeps the documented weighted closed-form variance components. docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2431-L2452, diff_diff/had.py:L2908-L2968, diff_diff/had_pretests.py:L1595-L1636, diff_diff/had_pretests.py:L2080-L2112, diff_diff/had_pretests.py:L2762-L2795, diff_diff/had_pretests.py:L4162-L4199

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3
Impact: HeterogeneousAdoptionDiD.fit(weights=...) emits a deprecation warning that points users to survey_design=SurveyDesign(weights='col_name'), but that migration is not inference-equivalent today: the deprecated shortcut still uses variance_formula="pweight" / "pweight_2sls", while the canonical survey-design path uses survey_binder_tsl / survey_binder_tsl_2sls. This is already tracked for the next minor cleanup, so it is informational rather than blocking. diff_diff/survey.py:L783-L787, diff_diff/had.py:L2882-L2891, diff_diff/had.py:L3514-L3576, diff_diff/had.py:L4402-L4433, tests/test_had.py:L3314-L3335, TODO.md:L102-L102
Concrete fix: split the shared data-in weights= deprecation message so HeterogeneousAdoptionDiD.fit explicitly states that survey_design=SurveyDesign(weights='col') is the long-term API but that current releases preserve a distinct pweight-shortcut variance family until the TODO-tracked unification lands.

Security

No findings.

Documentation/Tests

No findings. The earlier re-review test-gap items are now closed by dedicated parity coverage on the joint wrappers and workflow, plus a warning-count lock on the workflow event-study weights= path. tests/test_had_dual_knob_deprecation.py:L608-L676, tests/test_had_dual_knob_deprecation.py:L729-L796, tests/test_had_dual_knob_deprecation.py:L1040-L1072, tests/test_had_dual_knob_deprecation.py:L1267-L1312

R11 P3 (informational): HAD.fit's `weights=` deprecation message used the shared HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN constant, which suggests migrating to `survey_design=SurveyDesign(weights='col_name')`. On the workflow + joint data-in wrappers that migration is inference-equivalent (both go through _resolve_pretest_unit_weights to per-unit weights, then through the shared Stute / Yatchew kernels). On HAD.fit it is NOT inference-equivalent today: the deprecated `weights=arr` shortcut keeps `variance_formula="pweight"` / `"pweight_2sls"` (CCT-2014 weighted-robust / 2SLS pweight-sandwich), while `survey_design=SurveyDesign(weights=col)` composes Binder-TSL (`"survey_binder_tsl"` / `"survey_binder_tsl_2sls"`). Following the migration changes the SE family; the long-term unification is tracked in TODO row 102 for the next minor.

Fix: add a HAD.fit-specific HAD_DEPRECATION_MSG_WEIGHTS_KWARG_HAD_FIT constant that says the long-term API is still survey_design= but explicitly notes the SE-family caveat applies in the current release. HAD.fit consumes this new constant; the workflow + joint wrappers keep the original HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN (no SE-family divergence on those surfaces).

561 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
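
The per-surface message split might be sketched like this. The constant names `MSG_WEIGHTS_DATA_IN` and `MSG_WEIGHTS_HAD_FIT`, the helper `warn_weights_deprecated`, and all message wording are placeholders standing in for the two constants named in the commit; the actual strings are not shown in this thread.

```python
import warnings

# Shared message for surfaces where the migration is inference-equivalent.
MSG_WEIGHTS_DATA_IN = (
    "weights= is deprecated; use "
    "survey_design=SurveyDesign(weights='col_name') instead."
)

# fit()-specific message: same target, plus the variance-family caveat.
MSG_WEIGHTS_HAD_FIT = (
    MSG_WEIGHTS_DATA_IN
    + " Note: on fit() this migration currently changes the variance family "
    "(pweight shortcut vs. Binder-TSL), so standard errors may differ."
)


def warn_weights_deprecated(surface: str) -> None:
    """Emit the deprecation warning appropriate to the calling surface."""
    message = MSG_WEIGHTS_HAD_FIT if surface == "fit" else MSG_WEIGHTS_DATA_IN
    warnings.warn(message, DeprecationWarning, stacklevel=2)
```

Building the fit() message on top of the shared one keeps the migration target identical across surfaces while letting only the surface with a real SE-family difference carry the caveat.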

igerber · 2026-04-25T21:41:52Z

/ai-review

igerber · 2026-04-25T22:04:29Z

/ai-review

github-actions · 2026-04-25T22:09:11Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6fb44e6aaf505ed9825902fb5964838f987e54d4

Overall Assessment

✅ Looks good

Static review only: I could not execute the test suite in this environment because runtime deps are missing (pytest and numpy were unavailable).

Executive Summary

No P0/P1 findings. The changed surfaces are API/front-door consolidations; the registry and code both preserve the underlying methodology: QUG remains permanently unsupported under survey, the Stute family still uses PSU-level Mammen multipliers, Yatchew still uses the documented weighted closed-form variance components, and HeterogeneousAdoptionDiD.fit keeps its existing pweight-vs-Binder split. docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2431-L2445, diff_diff/had.py:L2918-L2945, diff_diff/had_pretests.py:L1294-L1352, diff_diff/had_pretests.py:L1595-L1636, diff_diff/had_pretests.py:L2080-L2113, diff_diff/had_pretests.py:L2762-L2795
The prior re-review items appear addressed: qug_test now has qug-specific runtime messaging, alias-parity coverage was added across the joint wrappers/workflow, and the HeterogeneousAdoptionDiD.fit(weights=...) warning now explicitly discloses the current SE-family caveat while the cleanup is tracked in TODO.md. diff_diff/had_pretests.py:L1294-L1345, tests/test_had_dual_knob_deprecation.py:L608-L676, tests/test_had_dual_knob_deprecation.py:L729-L796, tests/test_had_dual_knob_deprecation.py:L1006-L1072, tests/test_had_dual_knob_deprecation.py:L1231-L1312, diff_diff/survey.py:L788-L809, TODO.md:L102-L102
One non-blocking documentation inconsistency remains: qug_test prose still partially implies a migration to survey_design= / make_pweight_design(...), but runtime and the registry say there is no migration path because all survey-aware QUG kwargs are permanently rejected. diff_diff/had_pretests.py:L1236-L1245, diff_diff/had_pretests.py:L1351-L1367, CHANGELOG.md:L11-L14, docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2431-L2437

Methodology

No findings. The PR’s registry note documents this as an API consolidation with unchanged backend behavior, and the touched estimator/pretest code still routes into the pre-existing methodology-specific kernels rather than changing estimator math, weighting, or variance formulas. docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2437-L2445, diff_diff/had.py:L2918-L2945, diff_diff/had_pretests.py:L1595-L1636, diff_diff/had_pretests.py:L2080-L2113, diff_diff/had_pretests.py:L2762-L2795

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The next-minor cleanup for removing the deprecated knobs and folding the legacy back-end routing is explicitly tracked in TODO.md, so the remaining duplication is properly deferred rather than silently accruing. TODO.md:L102-L102

Security

No findings.

Documentation/Tests

Severity P3 | Impact: qug_test documentation still describes weights= as a deprecated alias of survey_design=make_pweight_design(arr), and the changelog’s deprecation bullet broadly says the pretest kwargs are deprecated “in favor of” survey_design=. That is inconsistent with the actual contract: qug_test permanently rejects all non-None survey_design / survey / weights inputs, with no supported migration path. diff_diff/had_pretests.py:L1236-L1245, diff_diff/had_pretests.py:L1351-L1367, CHANGELOG.md:L11-L14, docs/methodology/REGISTRY.md:L2352-L2353, docs/methodology/REGISTRY.md:L2431-L2437
Concrete fix: update the qug_test parameter docs and the changelog deprecation note to carve out qug_test explicitly: the kwarg is surface-symmetric only, all non-None values still raise NotImplementedError, and make_pweight_design(...) is not a valid QUG migration target.
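The contract this finding describes (every non-None survey-aware kwarg is rejected, with no migration target) can be sketched as a small front-door guard. `qug_survey_guard` is a hypothetical stand-in, not the actual `qug_test` implementation, and the error wording is an assumption.

```python
from typing import Any, Optional


def qug_survey_guard(
    survey_design: Optional[Any] = None,
    survey: Optional[Any] = None,
    weights: Optional[Any] = None,
) -> None:
    """Reject every survey-aware input (hypothetical stand-in for the QUG front door)."""
    supplied = [
        name
        for name, value in (
            ("survey_design", survey_design),
            ("survey", survey),
            ("weights", weights),
        )
        if value is not None
    ]
    if supplied:
        # The kwargs exist for surface symmetry only; none of them is usable here.
        raise NotImplementedError(
            "QUG is not supported under survey/weights (got "
            + ", ".join(supplied)
            + "); there is no migration target for this test."
        )
```

Under this shape, documenting `make_pweight_design(...)` as a migration target would be misleading, because the guard raises before any design is inspected.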

R12 P3: qug_test parameter docs and CHANGELOG deprecation entry still described `weights=` as a deprecated alias of `survey_design=make_pweight_design(arr)`, but qug_test permanently rejects all non-`None` survey_design / survey / weights values (Phase 4.5 C0 deferral). The deprecation on qug_test is kwarg-name-consolidation only, NOT a migration path; `make_pweight_design(arr)` is not a valid QUG migration target.

Updated:

- qug_test parameter docs (had_pretests.py:1236-1257) explicitly carve out: surface-symmetric kwarg, all non-None still raises NotImplementedError, no migration path.
- CHANGELOG Deprecated entry adds an explicit qug_test carve-out paragraph pointing users to did_had_pretest_workflow(..., survey_design=...) for survey-aware HAD pretesting (which skips QUG under survey).

The runtime warning text was already qug-specific from R10; this round aligns the static docs with the runtime contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T22:25:10Z

/ai-review

github-actions · 2026-04-25T22:31:04Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f869576500d655d8e3269c199e5d778ddb4973e3

Overall Assessment

✅ Looks good

Static review only: I could not execute the test suite here because pytest is not installed, and importing diff_diff in this environment also fails because numpy is unavailable.

Executive Summary

No unmitigated P0/P1 findings. The affected methods are the HAD survey/pretest entry points, and the registry now explicitly documents this PR as an API consolidation with unchanged backend methodology rather than a math/variance rewrite: docs/methodology/REGISTRY.md:L2352-L2452, diff_diff/had.py:L2908-L2968, diff_diff/had_pretests.py:L1605-L1646, diff_diff/had_pretests.py:L4172-L4455.
The prior P3 documentation concern around qug_test appears resolved. The changelog, registry, docstrings, warnings, and runtime error now consistently state that QUG remains permanently unsupported under survey/weights and that the survey-aware alternative is the workflow, not make_pweight_design(...): CHANGELOG.md:L13-L14, docs/methodology/REGISTRY.md:L2431-L2438, diff_diff/had_pretests.py:L1236-L1255, diff_diff/had_pretests.py:L1319-L1387.
Parameter propagation looks complete across all 8 affected surfaces. Data-in surfaces take SurveyDesign, array-in linearity helpers take pre-resolved ResolvedSurveyDesign, and HeterogeneousAdoptionDiD.fit now explicitly rejects pre-resolved designs on the data-in surface: docs/methodology/REGISTRY.md:L2353-L2353, diff_diff/had.py:L2946-L2968, diff_diff/had_pretests.py:L1605-L1646, diff_diff/had_pretests.py:L2090-L2123, diff_diff/had_pretests.py:L2772-L2805, diff_diff/had_pretests.py:L3298-L3315.
Test coverage is materially stronger than in the prior review: the new suite covers helper export/input guards, alias parity, type guards, positional back-compat on HAD.fit, event-study/mass-point dispatch, and nested-warning suppression on the workflow event-study path: tests/test_had_dual_knob_deprecation.py:L1-L22, tests/test_had_dual_knob_deprecation.py:L109-L220, tests/test_had_dual_knob_deprecation.py:L540-L676, tests/test_had_dual_knob_deprecation.py:L799-L920, tests/test_had_dual_knob_deprecation.py:L1006-L1072, tests/test_had_dual_knob_deprecation.py:L1075-L1312.
The remaining cleanup is already properly tracked for the next minor release, so it is informational only: TODO.md:L102-L102.
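The data-in versus array-in split summarized above can be illustrated with two type guards. This is a sketch under stated assumptions: `SurveyDesignStub` and `ResolvedSurveyDesignStub` are hypothetical stand-ins for the library's design classes, both guard functions are invented for illustration, and raising `TypeError` on the data-in side is an assumption (the PR description only states it for the array-in helpers).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SurveyDesignStub:
    """Stand-in for an unresolved design that names a weight column."""
    weights: str


@dataclass(frozen=True)
class ResolvedSurveyDesignStub:
    """Stand-in for a design already resolved to per-unit weights."""
    weights: Sequence[float]


def check_data_in(survey_design: Optional[object]) -> None:
    """Data-in surfaces resolve the design against the data at fit time."""
    if survey_design is None or isinstance(survey_design, SurveyDesignStub):
        return
    raise TypeError("data-in surfaces expect an unresolved survey design")


def check_array_in(survey_design: Optional[object]) -> None:
    """Array-in helpers have no data frame, so they need a pre-resolved design."""
    if survey_design is None or isinstance(survey_design, ResolvedSurveyDesignStub):
        return
    raise TypeError("array-in helpers expect a pre-resolved survey design")
```

Rejecting the wrong design type at the entry point keeps the error close to the call site, instead of surfacing later as a missing column or a shape mismatch.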

Methodology

No findings. Affected methods: HeterogeneousAdoptionDiD.fit, qug_test, stute_test, yatchew_hr_test, stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test, and did_had_pretest_workflow. The registry’s new API-consolidation note matches the code, and I did not find an undocumented change to estimator math, weighting, identification, or variance/SE behavior: docs/methodology/REGISTRY.md:L2352-L2452, diff_diff/had.py:L2908-L2968, diff_diff/had_pretests.py:L1209-L1387, diff_diff/had_pretests.py:L1605-L1646, diff_diff/had_pretests.py:L2090-L2123, diff_diff/had_pretests.py:L2772-L2805, diff_diff/had_pretests.py:L4172-L4455.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity P3 | Impact: The deprecated survey= / weights= shims and duplicate legacy routing remain until the next minor release, but this is explicitly tracked and does not create a current correctness issue. TODO.md:L102-L102, diff_diff/had.py:L2918-L2945, diff_diff/had_pretests.py:L1605-L1646, diff_diff/had_pretests.py:L4172-L4209
Concrete fix: In the planned next-minor cleanup, remove the deprecated kwargs and collapse the remaining legacy survey= / weights= branches into the unified survey_design= path.
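To make the planned cleanup concrete, here is a toy version of a deprecation shim that only rebinds the legacy kwarg and leaves all computation to one unified path. Everything in it is hypothetical: `weighted_mean` stands in for a real surface, and a plain list of weights stands in for a resolved survey design.

```python
import warnings
from typing import List, Optional, Sequence


def normalize_mean_one(weights: Sequence[float]) -> List[float]:
    """Rescale weights so their mean is 1."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


def weighted_mean(
    values: Sequence[float],
    survey_design: Optional[Sequence[float]] = None,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Toy surface: `weights=` is a deprecated alias that only rebinds the kwarg."""
    if survey_design is not None and weights is not None:
        raise ValueError("pass at most one of survey_design= and weights=")
    if weights is not None:
        warnings.warn(
            "weights= is deprecated; use survey_design=",
            DeprecationWarning,
            stacklevel=2,
        )
        # Rebind only: no normalization here, so it happens exactly once below.
        survey_design = list(weights)
    if survey_design is None:
        survey_design = [1.0] * len(values)
    w = normalize_mean_one(survey_design)  # unified path
    return sum(wi * vi for wi, vi in zip(w, values)) / sum(w)
```

With a shim of this shape, removing the deprecated kwarg amounts to deleting the `weights` parameter and its single rebinding branch; the unified path is untouched.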

Security

No findings.

Documentation/Tests

No findings. The prior qug_test documentation inconsistency looks resolved, and the new regression file substantially improves front-door coverage for the consolidated API: CHANGELOG.md:L13-L14, diff_diff/had_pretests.py:L1236-L1255, diff_diff/had_pretests.py:L1319-L1387, tests/test_had_dual_knob_deprecation.py:L1-L22, tests/test_had_dual_knob_deprecation.py:L1075-L1312.
Residual risk: I could not run the tests in this environment because pytest and numpy are unavailable, so this is a static review only.

igerber and others added 2 commits April 25, 2026 15:45

igerber force-pushed the had-survey-design-consolidation branch from efdcb2d to eef8af4 Compare April 25, 2026 19:45

igerber added the ready-for-ci label (Triggers CI test workflows) Apr 25, 2026

igerber merged commit 631bfc5 into main Apr 26, 2026
26 of 27 checks passed

igerber deleted the had-survey-design-consolidation branch April 26, 2026 00:41

igerber mentioned this pull request Apr 26, 2026

Release 3.3.1: HAD survey-design consolidation, dCDH by_path placebos + sup-t bands, Phase 4.5 C #387

Merged
