Skip to content

Add yatchew_hr_test(null='mean_independence') mode#397

Merged
igerber merged 2 commits intomainfrom
had-yatchew-mean-independence
Apr 26, 2026
Merged

Add yatchew_hr_test(null='mean_independence') mode#397
igerber merged 2 commits intomainfrom
had-yatchew-mean-independence

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 26, 2026

Summary

  • Adds keyword-only null: Literal["linearity", "mean_independence"] = "linearity" to yatchew_hr_test. New mode mirrors R YatchewTest::yatchew_test(order=0) — fits intercept-only OLS so residuals are dy - mean(dy) instead of dy - (a + b·d). Downstream sigma2_diff / sigma2_W / sort-by-d machinery is shared between the two modes.
  • Wired into both unweighted and survey-weighted code paths (4-arm dispatch on (null × weighted)).
  • Closes the placebo Yatchew R-parity gap from PR HAD Phase 4: trends_lin (Eq 17/18) + R-package end-to-end parity #392tests/test_did_had_parity.py::TestYatchewParity now routes effect rows through null="linearity" and placebo rows through null="mean_independence". Both modes share the documented × G/(G-1) finite-sample convention shift and parity holds at atol=1e-10.
  • YatchewTestResults gained null_form: str = "linearity" field; summary() renders the correct null-hypothesis title; __repr__ and to_dict() updated.

Patch-level (additive keyword-only kwarg + additive dataclass field with default). Default null="linearity" is bit-exact backcompat across all existing tests.

Test plan

  • pytest tests/test_had_pretests.py::TestYatchewHRTest tests/test_had_pretests.py::TestYatchewHRTestSurvey tests/test_had_pretests.py::TestYatchewHRTestMeanIndependence tests/test_did_had_parity.py (58 tests)
  • pytest tests/test_had_pretests.py (full file regression — 237 tests)
  • pytest tests/test_had.py tests/test_had_dual_knob_deprecation.py (HAD estimator + deprecation regressions — 368 tests)
  • CI's pure-Python and Rust matrices

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good — no unmitigated P0 or P1 findings.

Executive Summary

  • Affected method: diff_diff.had_pretests.yatchew_hr_test. The PR adds an additive null switch and changes only residual construction; the variance-ratio core and existing guardrails remain intact (diff_diff/had_pretests.py:L2008-L2498).
  • The new null="mean_independence" mode is documented in the Methodology Registry as the R-placebo parity extension, so this is not an undocumented methodology deviation (docs/methodology/REGISTRY.md:L2502-L2504).
  • Default behavior stays on the paper’s linearity null, including the internal workflow caller, so existing post-treatment workflow semantics are preserved (diff_diff/had_pretests.py:L2046-L2056, diff_diff/had_pretests.py:L4895-L4903).
  • Result metadata is propagated consistently via null_form through the dataclass, summary, and dict export (diff_diff/had_pretests.py:L323-L430).
  • Test coverage is strong for the new branch: hand-computed unweighted/weighted baselines, tie/low-G/invalid-null edges, and end-to-end R parity routing for placebo vs. effect rows (tests/test_had_pretests.py:L3243-L3515, tests/test_did_had_parity.py:L360-L450).
  • Runtime validation was not executed in this sandbox because pytest and numpy are unavailable here.

Methodology

No findings. The affected estimator surface is yatchew_hr_test; the PR preserves the documented 2G/G-1 variance conventions and one-sided normal inference, and limits the new mode to an intercept-only residual definition that is explicitly registry-documented for R placebo parity (diff_diff/had_pretests.py:L2031-L2063, diff_diff/had_pretests.py:L2350-L2484, docs/methodology/REGISTRY.md:L2502-L2504).

Code Quality

No findings. null_form is threaded through every return path and every user-facing renderer/serializer, so the new public parameter does not leave stale or partially propagated result state (diff_diff/had_pretests.py:L380-L429, diff_diff/had_pretests.py:L2303-L2498).

Performance

No findings. The change adds only a small dispatch branch plus intercept-only helper(s); the asymptotic cost of the Yatchew path is unchanged (diff_diff/had_pretests.py:L1016-L1050, diff_diff/had_pretests.py:L2350-L2433).

Maintainability

No findings. The intercept-only mode is isolated in dedicated helpers and reuses the existing downstream variance code instead of duplicating formula logic (diff_diff/had_pretests.py:L1016-L1050, diff_diff/had_pretests.py:L2350-L2433).

Tech Debt

No findings. This PR closes a previously tracked follow-up rather than introducing new untracked deferred work.

Security

No findings. The diff is limited to numerical logic, documentation, and tests; it does not add new I/O, parsing, credential, or command-execution surface.

Documentation/Tests

No findings. The added tests cover the exact interaction points that mattered here: default back-compat, weighted/unweighted composition, duplicate-dose and low-G guards, and R parity for placebo/effect routing (tests/test_had_pretests.py:L3243-L3515, tests/test_did_had_parity.py:L360-L450).

@igerber igerber force-pushed the had-yatchew-mean-independence branch from 35833fb to d7c45db Compare April 26, 2026 20:50
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: d7c45db722cc3e440c780372f59110e879151890


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope is clean: the prior AI review had no P1+ findings, and I did not identify any new unmitigated P0/P1 issues in this PR.
  • Affected surface is diff_diff.had_pretests.yatchew_hr_test; the PR adds null={"linearity","mean_independence"} while preserving the default null="linearity" behavior and the workflow’s post-treatment call site (diff_diff/had_pretests.py:L2008-L2063, diff_diff/had_pretests.py:L4895-L4903).
  • Methodology-wise, null="mean_independence" is documented in the Methodology Registry as an R-parity extension, so this is not an undocumented deviation (docs/methodology/REGISTRY.md:L2502-L2502).
  • The implementation changes only residual construction; sigma2_diff, sigma2_W, low-G handling, tie handling, and weighted normalization remain shared across both nulls (diff_diff/had_pretests.py:L2350-L2498).
  • Parameter/result propagation is complete inside the changed surface: every YatchewTestResults return path now carries null_form, and summary() / to_dict() / __repr__ were updated (diff_diff/had_pretests.py:L323-L430, diff_diff/had_pretests.py:L2303-L2497).
  • Runtime verification was not possible here because this environment does not have numpy, scipy, pandas, or pytest installed.

Methodology

  • No findings. The new placebo-facing null is explicitly documented as an R-parity extension, and the workflow still uses the paper’s linearity null for the post-treatment Yatchew step (docs/methodology/REGISTRY.md:L2502-L2502, diff_diff/had_pretests.py:L4895-L4903).

Code Quality

  • No findings. Front-door validation and all YatchewTestResults(...) construction sites were updated consistently (diff_diff/had_pretests.py:L2176-L2180, diff_diff/had_pretests.py:L2303-L2497).

Performance

  • No findings. The change is a small dispatch plus two intercept-only helpers; asymptotic cost is unchanged (diff_diff/had_pretests.py:L1016-L1050, diff_diff/had_pretests.py:L2350-L2498).

Maintainability

  • No findings. The new mode is isolated in dedicated helpers and reuses the existing variance-ratio path instead of duplicating formula logic (diff_diff/had_pretests.py:L1016-L1050, diff_diff/had_pretests.py:L2350-L2498).

Tech Debt

  • No findings. The previously deferred mean_independence item is explicitly marked shipped rather than silently left open (TODO.md:L106-L106).

Security

  • No findings. The diff does not add new I/O, parsing, shelling, or secret-handling surface.

Documentation/Tests

  • Severity P3. Impact: the source-of-truth narrative is internally inconsistent. The new mode is documented in the parity note, but the main Yatchew algorithm block and some in-code docstrings still describe yatchew_hr_test as linearity-only, which can mislead future maintainers about what is paper-derived versus what is the documented R-parity extension (docs/methodology/REGISTRY.md:L2469-L2477, docs/methodology/REGISTRY.md:L2546-L2546, diff_diff/had_pretests.py:L14-L16, diff_diff/had_pretests.py:L323-L330). Concrete fix: update those narratives to describe yatchew_hr_test as a specification test with null={"linearity","mean_independence"}, and label mean_independence explicitly as the documented R-parity extension.
  • Severity P3. Impact: the new null_form propagation is only lightly regression-tested. Current tests cover arithmetic and the attribute itself, but not whether summary(), to_dict(), and __repr__ actually surface the new mode correctly, so a future regression there could slip through (tests/test_had_pretests.py:L840-L843, tests/test_had_pretests.py:L1189-L1194, tests/test_had_pretests.py:L3242-L3515). Concrete fix: add explicit assertions that to_dict()["null_form"] matches the selected mode, summary() switches headings for both nulls, and repr() includes null_form=.

igerber added a commit that referenced this pull request Apr 26, 2026
P3: REGISTRY.md / docstring narrative inconsistency. Updates main
Yatchew algorithm block (REGISTRY.md:2469-2477), Phase 3 row
(REGISTRY.md:2546), module-level docstring (had_pretests.py:14-16),
and YatchewTestResults class docstring (had_pretests.py:323-330) to
describe yatchew_hr_test as a specification test with both nulls,
labelling mean_independence explicitly as the documented R-parity
extension (vs paper-derived linearity).

P3: null_form propagation lightly tested. Adds three explicit
regression tests in the existing test classes the reviewer cited:
- TestJSONSerialization.test_yatchew_to_dict_carries_null_form: locks
  to_dict()["null_form"] for both modes.
- TestSummary.test_yatchew_summary_switches_title_on_null_form: locks
  summary() title switching with both positive and negative controls.
- TestSummary.test_yatchew_repr_includes_null_form: locks
  null_form= appears in repr() for both modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: ac66214d2338e0cb9d8077fbb7aeae737a381d53


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope is clean: I did not identify any new unmitigated P0/P1 issues in this diff.
  • Affected methodology surface is yatchew_hr_test; the PR adds null={"linearity","mean_independence"} via dedicated intercept-only helpers and a 4-arm (null × weighted) dispatch while preserving default null="linearity" behavior in diff_diff/had_pretests.py:L1029-L1057 and diff_diff/had_pretests.py:L2021-L2511.
  • The paper deviation is documented correctly: null="mean_independence" is explicitly labeled as an R-parity extension in docs/methodology/REGISTRY.md:L2469-L2478, while the workflow keeps the paper-derived linearity null for its post-treatment Yatchew step in diff_diff/had_pretests.py:L4908-L4916.
  • Parameter/result propagation looks complete: all YatchewTestResults return paths carry null_form, and summary(), __repr__, and to_dict() were updated in diff_diff/had_pretests.py:L328-L443.
  • Test coverage for the new mode is strong: the suite adds direct arithmetic checks, weighted/unweighted coverage, edge-case guards, result-surface checks, and parity routing for placebo vs effect rows in tests/test_had_pretests.py:L845-L853, tests/test_had_pretests.py:L1206-L1224, tests/test_had_pretests.py:L3288-L3545, and tests/test_did_had_parity.py:L360-L449.
  • I could not execute the tests here because this environment does not have pytest, numpy, scipy, or pandas installed.

Methodology

  • No findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. TODO.md appropriately marks the previously deferred mean_independence item as shipped in TODO.md:L106-L106.

Security

  • No findings.

Documentation/Tests

  • No findings.
  • Residual risk: runtime verification was not possible in this environment because the Python scientific stack and pytest are unavailable.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 26, 2026
igerber and others added 2 commits April 26, 2026 18:00
Mirrors R YatchewTest::yatchew_test(order=0). Closes the placebo
Yatchew R-parity gap from PR #392.

- New keyword-only `null: Literal["linearity", "mean_independence"]`
  on `yatchew_hr_test` (default `"linearity"` is bit-exact backcompat).
- `"mean_independence"` fits intercept-only OLS (residuals = dy - mean(dy));
  the downstream sigma2_diff / sigma2_W / sort-by-d machinery is shared.
- Wired through both unweighted and survey-weighted code paths
  (4-arm dispatch on (null × weighted)).
- `YatchewTestResults` gained `null_form: str = "linearity"` field;
  `summary()` renders the correct null-hypothesis title; `__repr__`
  and `to_dict()` updated.
- `tests/test_did_had_parity.py::TestYatchewParity` removed the
  placebo skip; routes effect rows through `null="linearity"` (R order=1)
  and placebo rows through `null="mean_independence"` (R order=0); both
  modes share the documented `× G/(G-1)` finite-sample convention shift
  and parity holds at `atol=1e-10`.
- New `TestYatchewHRTestMeanIndependence` class (15 tests) covering
  happy path, naive Python baseline at `atol=1e-12`, population-variance
  closed form, invalid value, default backcompat, mode-agnostic
  tie/constant-d rejection, NaN handling, weighted reduction at
  w=ones(G) at `atol=1e-14`, weighted non-uniform baseline,
  default-under-weights, survey×null orthogonality, the (linearity,
  weighted) baseline (4-arm coverage), zero/replicate-weight rejection,
  and G<3 mode-agnostic. One additive backcompat case in each of
  `TestYatchewHRTest` and `TestYatchewHRTestSurvey`.
- REGISTRY.md HAD § Yatchew note: TODO marker replaced with shipped
  description. CHANGELOG.md and TODO.md updated.

Patch-level (additive keyword-only kwarg + additive dataclass field
with default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P3: REGISTRY.md / docstring narrative inconsistency. Updates main
Yatchew algorithm block (REGISTRY.md:2469-2477), Phase 3 row
(REGISTRY.md:2546), module-level docstring (had_pretests.py:14-16),
and YatchewTestResults class docstring (had_pretests.py:323-330) to
describe yatchew_hr_test as a specification test with both nulls,
labelling mean_independence explicitly as the documented R-parity
extension (vs paper-derived linearity).

P3: null_form propagation lightly tested. Adds three explicit
regression tests in the existing test classes the reviewer cited:
- TestJSONSerialization.test_yatchew_to_dict_carries_null_form: locks
  to_dict()["null_form"] for both modes.
- TestSummary.test_yatchew_summary_switches_title_on_null_form: locks
  summary() title switching with both positive and negative controls.
- TestSummary.test_yatchew_repr_includes_null_form: locks
  null_form= appears in repr() for both modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the had-yatchew-mean-independence branch from ac66214 to ebdcede Compare April 26, 2026 22:01
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: ebdceded75bcbbffd7f7c47837da7f8d58d3955a


Overall Assessment

✅ Looks good

Executive Summary

Methodology

  • Severity: P3-informational. Impact: null="mean_independence" is a documented R-parity extension rather than an undocumented methodology drift. The registry and in-code docstrings agree that only the residual definition changes, and the implementation matches that contract by swapping dy ~ 1 + d vs dy ~ 1 while leaving sigma2_diff, sigma2_W, sort-by-d, and default backcompat unchanged. Concrete fix: none. References: docs/methodology/REGISTRY.md:L2469-L2478, diff_diff/had_pretests.py:L2021-L2511.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. The previously deferred mean_independence item is appropriately closed as shipped in TODO.md:L106-L106.

Security

  • No findings.

Documentation/Tests

  • No findings. The registry was updated alongside the code, and the new tests cover the new null mode’s arithmetic, weighted/unweighted dispatch, result serialization/summary surfaces, edge-case NaN behavior, and R parity routing. Local execution was not possible here because the Python scientific stack and pytest are unavailable.

@igerber igerber merged commit 5443c13 into main Apr 26, 2026
22 checks passed
@igerber igerber deleted the had-yatchew-mean-independence branch April 26, 2026 23:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant