dCDH by_path R-parity fixtures + TestDCDHDynRParityByPath#360
Conversation
|
Overall Assessment ⛔ Blocker Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
ca4df9d to
b40ca7b
Compare
|
Rebased onto current No TROP / /ai-review |
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
… version Three findings from CI review round 2 (1 P3 + 2 P2): - P2: `POINT_RTOL` tightened from 0.025 (MIXED_POINT_RTOL) to 1e-9. Observed rtol on all 15 (path, horizon) cells is ~1e-11 across both scenarios, consistent with R's 10-digit JSON rounding. The prior 2.5% tolerance would silently accept ~6 orders of magnitude of point-estimate regression while the docs claim exact R parity. - P2: `_compare_by_path()` now asserts matching finite/positive state on `py_se` vs `r_se` BEFORE the `pytest.approx` numeric check. Previously the `if py_se > 0 and r_se > 0:` guard silently skipped SE parity when one side degraded to 0/NaN/inf while the other stayed finite; the regression now fails explicitly with a diagnostic citing both values and suggesting a variance-identifiability check. - P3: `DIDmultiplegtDYN` version pin moved from `>= 2.3.3` to `== 2.3.3` so a future release with changed `by_path` slot semantics cannot silently regenerate the fixture. Comment documents the update protocol: bump the pin AND re-run the parity class when moving to a newer known-compatible version, or extend to an explicit allowlist once a second version is verified. Tests still pass (2/2 in TestDCDHDynRParityByPath); ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good No unmitigated P0/P1 findings. Re-review scope here is the Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Wave 1 of the by_path follow-up sequence. Locks in the per-path SE
convention against R DIDmultiplegtDYN 2.3.3 before downstream waves
(inference completion, design variants, survey) build on it.
Extends benchmarks/R/generate_dcdh_dynr_test_values.R with an
extract_dcdh_by_path helper and two scenarios:
- mixed_single_switch_by_path: 2 observed paths, by_path=2
- multi_path_reversible_by_path: 4 observed paths, by_path=3
The multi_path_reversible DGP is deterministic: path assignment is a
function of F_g so each (D_{g,1}, F_g, S_g) cohort contains switchers
from a single path. This keeps the cohort-recentered IF comparable to
R's per-path re-run; cross-path cohort sharing is the documented SE
divergence mechanism.
New TestDCDHDynRParityByPath in tests/test_chaisemartin_dhaultfoeuille_parity.py
matches paths by tuple label via set-equality (robust to R's
undocumented frequency-tie tiebreak) and hard-asserts per-path
switcher counts before SE comparison. Observed parity: point
estimates and switcher counts match R exactly; per-path SE within
12% rtol (Phase 2 envelope) — scenario 13 max 10.2%, scenario 14
max 4.2%.
REGISTRY.md Note (Phase 3 by_path...) updated with R-parity
confirmation and a **Deviation from R (cross-path cohort-sharing
SE)** bullet describing the mechanism under which full-panel
cohort-centered plug-in (ours) and per-path re-run (R's) diverge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… version Three findings from CI review round 2 (1 P3 + 2 P2): - P2: `POINT_RTOL` tightened from 0.025 (MIXED_POINT_RTOL) to 1e-9. Observed rtol on all 15 (path, horizon) cells is ~1e-11 across both scenarios, consistent with R's 10-digit JSON rounding. The prior 2.5% tolerance would silently accept ~6 orders of magnitude of point-estimate regression while the docs claim exact R parity. - P2: `_compare_by_path()` now asserts matching finite/positive state on `py_se` vs `r_se` BEFORE the `pytest.approx` numeric check. Previously the `if py_se > 0 and r_se > 0:` guard silently skipped SE parity when one side degraded to 0/NaN/inf while the other stayed finite; the regression now fails explicitly with a diagnostic citing both values and suggesting a variance-identifiability check. - P3: `DIDmultiplegtDYN` version pin moved from `>= 2.3.3` to `== 2.3.3` so a future release with changed `by_path` slot semantics cannot silently regenerate the fixture. Comment documents the update protocol: bump the pin AND re-run the parity class when moving to a newer known-compatible version, or extend to an explicit allowlist once a second version is verified. Tests still pass (2/2 in TestDCDHDynRParityByPath); ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…4 doc cleanup Two findings (1 P2 + 1 P3): - P2: `_compare_by_path` now asserts `py_path["frequency_rank"] == r_path_entry["frequency_rank"]` for every committed path. Both scenarios are constructed with unique path frequencies (scenario 13 via the mixed_single_switch pattern, scenario 14 via deterministic counts 40/25/10/5), so rank ordering is unambiguous and any regression in top-k tiebreak handling now fails explicitly instead of passing silently as long as the selected path set and per-path effects remain correct. - P3: scenario 14 generator docstring and recorded params still described the old stochastic `p_switch`-driven DGP (the pre-PR variant that blew out SE parity via cross-path cohort mixing). The `multi_path_reversible` pattern is now DETERMINISTIC: path assignment is a fixed function of F_g with counts 20/20/15/10/10/5 across the 6 F_g values. `p_switch = 0.35` dropped from both the scenario call and the `params` block in the fixture; comment block rewritten to describe the deterministic design and cite the REGISTRY note for the rationale behind the design choice. Fixture regenerated; scenario 14 params no longer carry the stale `p_switch` entry. Point and SE parity numbers unchanged (deterministic DGP produces the same treatment matrix as before). Tests pass (2/2); ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
23259ed to
daefda7
Compare
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good No unmitigated P0/P1 findings. Re-review scope here is the new Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
by_pathfollow-up sequence — locks in the per-path SE convention against RDIDmultiplegtDYN 2.3.3before downstream waves (inference completion, design variants, survey) build on it.benchmarks/data/dcdh_dynr_golden_values.json:mixed_single_switch_by_path(2 paths,by_path=2) andmulti_path_reversible_by_path(4 paths,by_path=3via a new deterministic multi-path DGP pattern in the R generator).TestDCDHDynRParityByPathclass intests/test_chaisemartin_dhaultfoeuille_parity.py— 2 parity tests matching paths by tuple label and cross-checking per-path switcher counts before SE comparison.Observed parity
SE_RTOL = 0.12passes both with margin)Documented deviation
REGISTRY.md
Note (Phase 3 by_path...)block now includes:**Deviation from R (cross-path cohort-sharing SE):**— our full-panel cohort-centered plug-in (joiners/leavers precedent) vs R's per-path re-run diverges materially when a(D_{g,1}, F_g, S_g)cohort spans multiple observed paths; the two coincide when every cohort is single-path. Practitioner guidance: interpret per-path SE as a within-full-panel marginal variance, not a per-path conditional variance.Scenario 14's DGP is constructed to keep cohorts single-path by making path assignment a deterministic function of
F_g, demonstrating the regime where Python and R agree up to envelope.R script additions
extract_dcdh_by_pathhelper readsres$by_levels+res$by_level_1..by_level_k, capturing per-path effects / SE / CI / switcher counts per horizon."multi_path_reversible"pattern ingen_reversiblewith cohort-clean path assignment.stopifnot(packageVersion("DIDmultiplegtDYN") >= "2.3.3")at script top —by_pathoutput slots are not version-stable per R package docs.Wave queue updated
~/.claude/projects/.../memory/project_dcdh_by_path_next_prs.mdmarks Wave 1 complete with the SE-gap observation. Next up: Wave 2 (inference completion —n_bootstrap > 0, per-path placebos, per-pathsup_t_bands).Test plan
pytest tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath -vpytest tests/test_chaisemartin_dhaultfoeuille_parity.py -v(17 passed locally)pytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_methodology_chaisemartin_dhaultfoeuille.py(189 + 10 passed locally)to_dataframe(level="by_path")round-trip on R-derived data returns 6 rows × 11 columns as expectedRscript benchmarks/R/generate_dcdh_dynr_test_values.R(requires R + DIDmultiplegtDYN 2.3.3 + jsonlite)🤖 Generated with Claude Code