Consolidate HAD survey-design API to single survey_design= kwarg#376
Consolidate HAD survey-design API to single survey_design= kwarg#376
Conversation
|
Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Adds survey_design= as the canonical kwarg on all 8 HAD surfaces (HAD.fit,
did_had_pretest_workflow, qug_test, stute_test, yatchew_hr_test,
stute_joint_pretest, joint_pretrends_test, joint_homogeneity_test) to match
the rest of the library (ContinuousDiD/EfficientDiD/dCDH already use
survey_design=). The existing survey= and weights= kwargs become deprecated
aliases (DeprecationWarning, removal next minor); internal back-end paths
unchanged so numerical results are bit-exact pre-PR.
Promotes survey._make_trivial_resolved → public make_pweight_design helper
for the pweight-only convenience on array-in pretest helpers (which take
ResolvedSurveyDesign, not column-referencing SurveyDesign). Underscore name
kept as permanent private alias for back-compat.
Three-way mutex (survey_design + survey + weights) extends the prior 2-way;
two distinct error messages per surface group point users to the right
migration target (SurveyDesign(weights='col') for data-in surfaces vs
make_pweight_design(arr) for array-in helpers).
535 tests pass (489 pre-PR + 46 new in tests/test_had_dual_knob_deprecation.py
covering 8 surfaces × {survey_design= smoke, weights= warn, survey= warn,
parity, mutex} plus surface-spanning tests for type guards, normalization-
order invariant, and public-helper export). Bit-exact regression locked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R1 P1: deprecated `survey=SurveyDesign(...)` alias didn't trigger the SurveyDesign type guard on stute_test, yatchew_hr_test, stute_joint_pretest because the guard ran BEFORE the alias rebinding. Move the guard AFTER the soft-deprecation block so it covers both `survey_design=SurveyDesign(...)` (canonical) and `survey=SurveyDesign(...)` (deprecated alias) identically. Adds 3 regression tests in TestArrayInTypeGuard covering the alias path on all 3 array-in surfaces. R1 P2: REGISTRY.md had two contradictory notes on HAD survey support — the pre-Phase-4.5-C bullet said "pretests still do NOT accept survey/weights" while the Phase 4.5 C bullet listed all 8 surfaces as supporting them. Rewrote the older bullet to reflect the current Phase 4.5 B + C state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
efdcb2d to
eef8af4
Compare
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
R2 P1: extended dispatch-matrix coverage on the new survey_design= front door. Added 3 test classes covering paths that PR #376 fronted but didn't directly test: - TestHADFitMassPointSurveyDesign: design='mass_point' + survey_design= smoke + legacy-alias att-parity (vcov_type='hc1' required by the Phase 4.5 B mass-point + survey deviation). - TestHADFitEventStudySurveyDesign: aggregate='event_study' + cband=True + survey_design= smoke + legacy survey= parity (full bit-equality on att, se under same seed + design). - TestDidHadPretestWorkflowEventStudySurveyDesign: workflow event-study smoke via survey_design=, plus legacy survey= and weights= parity. The weights= parity test also locks the R2 P3 nested-warning suppression (asserts exactly ONE DeprecationWarning fires from the workflow front door, not three from cascading joint wrappers). R2 P3 #1: workflow's event-study `weights=` path was emitting up to 3 DeprecationWarnings (one at workflow front door + one each from the joint wrappers' internal weights= path). Wrap the internal joint wrapper calls in `warnings.catch_warnings() + simplefilter("ignore", DeprecationWarning)` since the user-facing warning has already fired at the workflow front door. Joint wrappers can't accept ResolvedSurveyDesign (their `_resolve_pretest_unit_weights` requires a SurveyDesign with .resolve()), so converting weights= to survey_design= via make_pweight_design isn't an option here. Locked by the new test_legacy_alias_parity_weights assertion `n_dep_warnings == 1`. R2 P3 #2: qug_test mutex error pointed users to `survey_design=make_pweight_design(arr)` as a migration target via the shared HAD_DUAL_KNOB_MUTEX_MSG_ARRAY_IN constant, but qug_test permanently rejects ALL survey_design/survey/weights inputs (Phase 4.5 C0 deferral). Replaced with a qug-specific mutex message that says "no migration path; see NotImplementedError below" instead of suggesting make_pweight_design. 545 tests pass (was 538 + 7 new dispatch-matrix tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
R3 P1: make_pweight_design() now validates 1-D input at the front door.
Was: scalar / 0-D / column-vector inputs reached `int(w.shape[0])` and
failed with cryptic low-level exceptions (IndexError on scalars,
inconsistent results on column vectors). Now: clear ValueError
("weights must be 1-dimensional...") that points users to common
mistakes (e.g. df[['w']].to_numpy() vs df['w'].to_numpy()). The
validation also propagates to the deprecated `weights=` shim path on
all 4 array-in helpers (stute_test, yatchew_hr_test, stute_joint_pretest,
qug_test), since the shim routes through make_pweight_design.
5 new regression tests in TestPublicHelpers cover scalar, 0-D ndarray,
and column-vector inputs to make_pweight_design directly, plus the
deprecated `weights=scalar` path on the 3 linearity helpers and qug_test.
R3 P3: REGISTRY note + CHANGELOG entry now special-case qug_test as
having no migration path. Was: both lumped qug_test in with the array-in
helpers' "use survey_design=make_pweight_design(arr)" advice, but
qug_test permanently rejects all survey-aware inputs (Phase 4.5 C0
deferral) regardless of which kwarg variant is used. Now: REGISTRY +
CHANGELOG explicitly distinguish the 3 linearity helpers (which have a
migration path) from qug_test (which doesn't).
Error message uses "1-dimensional" wording to also satisfy the existing
TestPhase45CR1Regressions tests (test_stute_test_rejects_2d_weights /
test_yatchew_hr_test_rejects_2d_weights) that were already gating on the
"1-dimensional" regex; both my new tests and the pre-PR tests pass.
550 tests pass (was 545 + 5 new R3 P1 regression tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
R4 P1: HeterogeneousAdoptionDiD.fit() inadvertently made `survey`, `weights`, and `cband` keyword-only when adding the new `survey_design=` kwarg, by inserting `*,` before all four. This broke pre-PR positional callers, contradicting the "additive, non-breaking" CHANGELOG claim. Reorder so `survey`, `weights`, `cband` keep their pre-PR positional-or-keyword status; only `survey_design=` is the new keyword-only addition (placed after the `*,` separator at the end). R4 P3: added test_legacy_positional_call_back_compat in TestHADFitDeprecation that exercises the full pre-PR positional call shape: `fit(df, "y", "d", "time", "unit", None, "overall", sd, None, True)` — locks the back-compat contract. The 6 array-in pretest helpers + workflow + qug_test were unaffected by this issue: their pre-PR signatures already had `*,` before survey/ weights/etc, so those kwargs were already keyword-only. 551 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R5 P3: HAD.fit() docstring still described `survey` and `weights` as "keyword-only" (they're positional-or-keyword after the R4 P1 fix that restored back-compat) and the `cband` paragraph only mentioned the old `survey=` / `weights=` kwargs (omitted the canonical `survey_design=`). Runtime behavior was correct; just generated help text drift. - Drop "keyword-only" qualifier from `survey` / `weights` parameter lines in the docstring; add note that they remain positional-or-keyword for one minor cycle for back-compat. - Update `cband` paragraph to cover all three of `survey_design=` / `survey=` / `weights=` (was: only the deprecated two). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R6 P3: Raises blocks on stute_test, yatchew_hr_test, and did_had_pretest_workflow still described the pre-PR 2-way `survey`/`weights` mutex. Updated to the 3-way `survey_design + survey + weights` form, with `survey=` / `weights=` flagged as deprecated aliases. Also added the TypeError raise to the array-in helpers' Raises blocks (stute_test, yatchew_hr_test) since they reject SurveyDesign instances on `survey_design=` (and equivalently on the deprecated `survey=` alias after R1's guard reorder). Runtime behavior was correct; just generated help text drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R7 P3: HAD.fit()'s `survey_design` parameter docstring still scoped to
"the two continuous-dose paths" even though Phase 4.5 B added mass-point
support and Phase 4.5 B added event-study survey composition (both with
test coverage in TestHADFitMassPointSurveyDesign and
TestHADFitEventStudySurveyDesign). Widened the description to cover the
full dispatch matrix: continuous × {overall, event_study} + mass_point ×
{overall, event_study}. Notes mass-point's vcov_type='hc1' requirement,
event-study's cband=True simultaneous CI, and points readers to
REGISTRY.md for the full matrix.
Proactive sweep (per user request): also updated
HeterogeneousAdoptionDiDEventStudyResults.variance_formula's class
docstring to clarify that the "weights= shortcut" / "survey= path"
labels refer to internal variance-source families (still accurate
internally) — added explicit "including via the deprecated weights=
alias" / "via survey_design= or the deprecated survey= alias" so the
field-level help text agrees with the consolidation.
Other surfaces audited (no drift found): did_had_pretest_workflow,
joint_pretrends_test, joint_homogeneity_test, qug_test, stute_test,
yatchew_hr_test, stute_joint_pretest survey_design= docstrings; all
already align with the canonical kwarg + 3-way mutex contract. Internal
back-end comments using "weights= shortcut" / "survey= path" describe
the (unchanged) routing mechanism; left as-is.
551 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology No findings. The affected methods are Code Quality
Performance No findings. Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests
Path to Approval
|
R8 P1: HAD.fit() lacked the data-in symmetric type guard that was already present on the array-in pretest helpers. Result: passing `survey_design=make_pweight_design(arr)` (or the deprecated `survey=make_pweight_design(arr)` alias) to `fit()` would fall through to low-level errors -- `survey.resolve(data)` AttributeError on event-study, or `survey.weights` (a numpy array on Resolved) misinterpreted as a column name on overall. Fix: after the alias rebinding in `fit()`, reject any non-`SurveyDesign`- like input (no `.resolve()` method) with a clear TypeError that points users to `survey_design=SurveyDesign(weights='col_name', ...)` for the data-in surface and reserves `make_pweight_design(arr)` for the array-in pretest helpers. Mirrors the array-in helpers' `isinstance(survey_design, SurveyDesign)` rejection — the data-in / array-in surface split is now symmetric on type guards in both directions. The `did_had_pretest_workflow` and joint data-in wrappers already had this protection via `_resolve_pretest_unit_weights`'s `if not hasattr(survey, "resolve"): raise TypeError(...)` check; HAD.fit was the missing surface. 4 new regression tests in TestHADFitDeprecation cover the misuse cases the reviewer named: `survey_design=ResolvedSurveyDesign` and `survey=ResolvedSurveyDesign` on both `aggregate="overall"` and `aggregate="event_study"`. Updated existing test_survey_bad_type_raises regex from "SurveyDesign- like" → "SurveyDesign" since the new guard fires before the prior hasattr-weights check (both messages contain "SurveyDesign", so the broader regex matches both). 555 tests pass (was 551 + 4 new R8 P1 regression tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology No findings. The affected surfaces are the eight HAD survey-design entry points, and the registry’s API-consolidation note matches the implementation: canonical Code Quality
Performance No findings. Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests
|
R9 P3 #1 (helper error message canonical-kwarg consistency): `_resolve_pretest_unit_weights`'s TypeError on non-`SurveyDesign`-like input still said `survey=` must be a SurveyDesign — but on the data-in wrappers (workflow / joint_pretrends_test / joint_homogeneity_test) the canonical kwarg is now `survey_design=`. Updated the message to name `survey_design=` (with `survey=` flagged as the deprecated alias) and to point pre-resolved-design users to the array-in pretest helpers, mirroring HAD.fit's data-in guard. R9 P3 #2 (legacy-vs-canonical parity coverage on data-in pretests): Added 3 parity tests (test_legacy_alias_parity_survey on joint_pretrends_test + joint_homogeneity_test, plus test_legacy_alias_parity_survey_overall on did_had_pretest_workflow overall path). Locks the rebinding contract on the data-in surfaces that previously only had smoke / warning / mutex coverage. 558 tests pass (was 555 + 3 new R9 P3 parity tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good No unmitigated P0/P1 findings. Executive Summary
Methodology No findings. The affected methods are the HAD survey-design entry surfaces, and the implementation matches the registry’s consolidation note plus the unchanged QUG/Stute/Yatchew method notes. Code Quality
Performance No findings. Maintainability No findings. Tech Debt No findings. The deprecated-alias removal / legacy-path folding is already tracked in Security No findings. Documentation/Tests
|
R10 P3 #1 (qug_test deprecation warning text): qug_test was using the shared array-in deprecation messages that point users to migrate to `survey_design=` / `make_pweight_design(arr)`, but qug_test permanently rejects ALL survey-aware kwargs (Phase 4.5 C0 deferral). Replaced with qug-specific warning text that says the aliases are deprecated AND that survey-aware QUG remains unsupported, pointing users to `did_had_pretest_workflow(..., survey_design=...)` for the survey-aware linearity family instead. R10 P3 #2 (weights= parity tests on data-in wrappers): the previous round added survey= parity for joint_pretrends_test, joint_homogeneity_test, and did_had_pretest_workflow(aggregate='overall') but left the weights= rebinding paths warning-only with no numerical parity lock. Added 3 new tests: test_legacy_alias_parity_weights (joint_pretrends_test + joint_homogeneity_test) and test_legacy_alias_parity_weights_overall (workflow). Each asserts `weights=np.ones(n)` ≡ `survey_design=SurveyDesign(weights="w")` (uniform 1.0 column) on identical-numerical-output, locking the rebinding contract. 561 tests pass (was 558 + 3 new R10 P3 parity tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology No findings. The PR rebadges the entry kwargs and adds front-door type/mutex handling, but the underlying methodology remains the documented split: QUG stays permanently rejected under survey; Stute/joint Stute keep the Phase 4.5 C PSU-level Mammen multiplier bootstrap; Yatchew keeps the documented weighted closed-form variance components. Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests No findings. The earlier re-review test-gap items are now closed by dedicated parity coverage on the joint wrappers and workflow, plus a warning-count lock on the workflow event-study |
R11 P3 (informational): HAD.fit's `weights=` deprecation message used the shared HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN constant, which suggests migrating to `survey_design=SurveyDesign(weights='col_name')`. On the workflow + joint data-in wrappers that migration is inference-equivalent (both go through _resolve_pretest_unit_weights to per-unit weights, then through the shared Stute / Yatchew kernels). On HAD.fit it is NOT inference-equivalent today: the deprecated `weights=arr` shortcut keeps `variance_formula="pweight"` / `"pweight_2sls"` (CCT-2014 weighted-robust / 2SLS pweight-sandwich), while `survey_design=SurveyDesign(weights=col)` composes Binder-TSL (`"survey_binder_tsl"` / `"survey_binder_tsl_2sls"`). Following the migration changes the SE family — the long-term unification is tracked in TODO row 102 for the next minor. Fix: add a HAD.fit-specific HAD_DEPRECATION_MSG_WEIGHTS_KWARG_HAD_FIT constant that says the long-term API is still survey_design= but explicitly notes the SE-family caveat applies in the current release. HAD.fit consumes this new constant; the workflow + joint wrappers keep the original HAD_DEPRECATION_MSG_WEIGHTS_KWARG_DATA_IN (no SE-family divergence on those surfaces). 561 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
1 similar comment
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Static review only: I could not execute the test suite in this environment because runtime deps are missing ( Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R12 P3: qug_test parameter docs and CHANGELOG deprecation entry still described `weights=` as a deprecated alias of `survey_design=make_pweight_design(arr)` -- but qug_test permanently rejects all non-`None` survey_design / survey / weights values (Phase 4.5 C0 deferral). The deprecation on qug_test is kwarg-name-consolidation only, NOT a migration path; `make_pweight_design(arr)` is not a valid QUG migration target. Updated: - qug_test parameter docs (had_pretests.py:1236-1257) explicitly carve out: surface-symmetric kwarg, all non-None still raises NotImplementedError, no migration path. - CHANGELOG Deprecated entry adds an explicit qug_test carve-out paragraph pointing users to did_had_pretest_workflow(..., survey_design=...) for survey-aware HAD pretesting (which skips QUG under survey). The runtime warning text was already qug-specific from R10; this round aligns the static docs with the runtime contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Static review only: I could not execute the test suite here because Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
survey_design=kwarg matching ContinuousDiD/EfficientDiD/dCDH.survey=andweights=become DeprecationWarning aliases; removal queued for the next minor (TODO row added).make_pweight_design(weights: np.ndarray) -> ResolvedSurveyDesignexported fromdiff_difftop level for the pweight-only convenience on array-in pretest helpers (formerly the privatesurvey._make_trivial_resolved).Key design choices
survey_design=SurveyDesign(weights="col")and resolve againstdataat fit time. Array-in surfaces (stute_test,yatchew_hr_test,stute_joint_pretest,qug_test) take pre-resolvedResolvedSurveyDesignonly; passing aSurveyDesignraisesTypeErrorwith migration guidance tomake_pweight_design(arr)or pre-resolution.{survey_design, survey, weights}may be non-None per call. Two distinct error messages per surface group (data-in vs array-in) point users to the right migration target.weights=deprecation shim bindssurvey_design = make_pweight_design(weights_unnormalized)and lets the unified path apply the mean=1 normalization step exactly once. Locked by scale-invariance test.Test plan
tests/test_had_dual_knob_deprecation.py)weights=andsurvey=paths (all existing weighted tests still produce identical numbers + a DeprecationWarning)make_pweight_designexported, alias of_make_trivial_resolved, array-in TypeError on SurveyDesign, scale-invariance for both stute_test and yatchew_hr_testtests/test_survey.py + test_continuous_did.py + test_efficient_did.py + test_chaisemartin_dhaultfoeuille.py— 536 passed (no breakage fromsurvey.pychanges)🤖 Generated with Claude Code