ContinuousDiD methodology-review-tracker promotion: In Progress -> Complete #476
…omplete)

Flips the ContinuousDiD tracker row to **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473). Consolidation only: no source code changes, no new tests, no new docstrings.

- METHODOLOGY_REVIEW.md L59 row flipped In Progress -> Complete with Last Review 2026-05-20. L634-655 detail section rewritten with the five-block tracker template: 12 Verified Components rows backed by 15 methodology tests + 80 unit tests + R parity at relative tolerance on 6 benchmark configurations.
- docs/methodology/REGISTRY.md ## ContinuousDiD gains a formal Deviations block (4 entries with framing header) before the Implementation Checklist: boundary-knots Deviation from R + three Phase 2 silent-failures audit fixes documented as library extensions with no R correspondence. Existing Edge Cases bullet and Note entries remain in place; Deviations is the canonical AI-review surface per CLAUDE.md "Documenting Deviations" labels.
- CHANGELOG.md [Unreleased] ### Added gains the ContinuousDiD tracker-promotion bullet at the top with per-benchmark tolerance language calling out the relative-tolerance scope caveat (NOT bit-exact like HAD) due to the boundary-knots deviation precluding algorithmic bit-equality.
- TODO.md gains one consolidated row tracking the three CGBS 2024 feature deferrals (covariates kwarg, discrete-treatment saturated regression, lowest-dose-as-control Remark 3.1). These mirror R contdid v0.1.0's omissions and are explicitly marked deferred in the REGISTRY Implementation Checklist L755-757.

R parity scope: 1% overall ATT on all 6 benchmarks; 1% max ATT(d) curve and 2% max ACRT(d) curve on benchmarks 1-3 via _compare_with_r helper; 1% overall ACRT on benchmarks 4-5; benchmark 6 is event-study ATT-only. NOT bit-exact (atol=1e-8) like HAD: the boundary-knots divergence precludes algorithmic bit-equality on aggregated dose-response curves.
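The difference between a relative tolerance band and a bit-exact style check, as described in the R parity scope above, can be sketched as follows. This is only an illustration, not the repository's test code: the helpers `max_rel_diff` and `within_band` and all curve values are hypothetical.

```python
def max_rel_diff(py_vals, r_vals):
    """Largest |py - r| / |r| across paired points (r values must be nonzero)."""
    return max(abs(p - r) / abs(r) for p, r in zip(py_vals, r_vals))


def within_band(py_vals, r_vals, rtol):
    """True when every paired point agrees to within the relative band rtol."""
    return max_rel_diff(py_vals, r_vals) <= rtol


# Hypothetical ATT(d) curve values; not taken from any benchmark.
r_curve = [0.50, 0.80, 1.10]
py_curve = [0.503, 0.796, 1.104]

# A 1% relative band accepts small basis-driven differences ...
assert within_band(py_curve, r_curve, rtol=0.01)
# ... whereas a bit-exact style check at atol=1e-8 would reject them.
assert not all(abs(p - r) <= 1e-8 for p, r in zip(py_curve, r_curve))
```

A band of this kind states how close two implementations are without claiming they execute the same algorithm, which is the caveat the commit message attaches to the ContinuousDiD benchmarks.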
89 regression tests pass (80 unit + 9 methodology, R benchmarks deselected without R/contdid installed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
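One common way to keep R-dependent benchmarks out of a run when R is missing is a skip guard. The sketch below is an assumption about the general pattern, not the mechanism this repository uses: it only checks for `Rscript` on `PATH` (it does not verify that the `contdid` package is installed), and the class and test names are hypothetical.

```python
import shutil
import unittest

# Assumption: R availability is approximated by finding Rscript on PATH.
R_AVAILABLE = shutil.which("Rscript") is not None


class TestRParity(unittest.TestCase):
    @unittest.skipUnless(R_AVAILABLE, "R / contdid not installed")
    def test_overall_att_matches_r(self):
        # Placeholder body: a real benchmark would call R and compare outputs.
        self.assertTrue(R_AVAILABLE)


class TestPythonOnly(unittest.TestCase):
    def test_runs_everywhere(self):
        # Python-side tests carry no guard and always run.
        self.assertEqual(1 + 1, 2)
```

With a guard like this, a machine without R reports the benchmark as skipped rather than failed, so the pass count covers only the Python-side tests.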
…eness + event-study scope

Three informational P3s from local codex R1, all narrow text fixes:

1. **Methodology**: reword "R parity" claims to distinguish two surfaces: (a) scalar parity with raw R cont_did / pte_default output (overall ATT on all 6 benchmarks; overall ACRT on benchmarks 4-5; scalar overall_att on benchmark 6 event-study), and (b) harmonized boundary-knot-normalized curve parity (max ATT(d), max ACRT(d) on benchmarks 1-3 only; the _compare_with_r helper rebuilds the R-side basis under Boundary.knots = range(treated_doses) before comparison because raw contdid curves use range(dvals)). Applied to METHODOLOGY_REVIEW.md R-Reference row + Verified Components rows + Test Coverage block + long-form Deviations #1; REGISTRY Deviations #1; CHANGELOG bullet.
2. **Maintainability**: replace hard-coded REGISTRY line numbers (L755-757, L758) with stable section/item anchors: "REGISTRY ## ContinuousDiD -> Implementation Checklist -> Survey design support item" and "Implementation Checklist deferred items" instead of fragile :L755-757 refs that already drifted to L774-776 with this same PR's REGISTRY Deviations block insertion. Applied in METHODOLOGY_REVIEW.md (2 occurrences) + TODO.md (1).
3. **Documentation/Tests**: clarify that benchmark 6 validates the event-study code path through scalar overall_att only (binarized ATT, no per-horizon comparison); per-horizon event_study_effects estimates and inference are exercised by Python-side tests at tests/test_continuous_did.py:557-690 and :1500-1528 with no R cross-language comparison on the per-horizon surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
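Why the two boundary-knot conventions in item 1 can disagree is easier to see with numbers. The sketch below uses hypothetical doses and a hypothetical interior evaluation grid; `boundary_knots` and `outside` are illustrative helpers, not functions from the library or from `contdid`.

```python
def boundary_knots(values):
    """Return (min, max), the analogue of R's range()."""
    return (min(values), max(values))


def outside(points, knots):
    """Points lying strictly outside the interval spanned by the boundary knots."""
    lo, hi = knots
    return [x for x in points if x < lo or x > hi]


# Hypothetical data: observed treated doses and an interior evaluation grid.
treated_doses = [0.1, 0.4, 0.9, 1.6, 2.5, 3.0]
dvals = [0.5, 1.0, 1.5, 2.0, 2.5]

knots_from_doses = boundary_knots(treated_doses)  # (0.1, 3.0)
knots_from_dvals = boundary_knots(dvals)          # (0.5, 2.5)

# Knots taken from the doses cover both the data and the grid.
assert outside(treated_doses, knots_from_doses) == []
assert outside(dvals, knots_from_doses) == []
# Knots taken from the grid leave the extreme doses outside the knot span,
# so a spline basis evaluated there would have to extrapolate.
assert outside(treated_doses, knots_from_dvals) == [0.1, 0.4, 3.0]
```

Under these assumptions the two bases differ whenever the grid is narrower than the observed dose range, which is why curve comparisons need the knots harmonized before a tolerance check is meaningful.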
One remaining P3 from local codex R2: "benchmarks use R's dvals grid for exact knot alignment" partially reintroduced the knot-parity ambiguity the surrounding text fixed. dvals aligns the evaluation grid between Python and R outputs; boundary knots are re-harmonized separately to range(treated_doses) inside the _compare_with_r helper. Reworded to "exact evaluation-grid alignment between Python and R outputs (boundary knots are harmonized separately under surface (b))" for clarity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n_r_contdid

One remaining P3 from local codex R3: the docs attributed the R-side basis rebuild to _compare_with_r when it actually lives in _run_r_contdid at tests/test_methodology_continuous_did.py:333-367; _compare_with_r only orchestrates the Python-vs-R comparison at :395-459. This sends future reviewers to the wrong code path when auditing the documented parity surface. Reworded 7 citations across METHODOLOGY_REVIEW.md (R Reference row + Verified Components dose-response row + Test Coverage block + long-form Deviations #1 + the in-line dvals-grid-alignment note), REGISTRY Deviations #1, and the CHANGELOG bullet to attribute the rebuild to _run_r_contdid at L333-367 explicitly, keeping _compare_with_r credited as the orchestrator at :395-459.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Overall Assessment
✅ Looks good
This is a docs-only tracker-promotion PR for …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
…+ stale (L745) ref

Two informational P3s from CI codex R1, both narrow doc fixes:

1. **Documentation/Tests**: METHODOLOGY_REVIEW.md Test Coverage block attributed survey-design coverage to tests/test_continuous_did.py, but the ContinuousDiD survey tests actually live in tests/test_survey_phase3.py::TestContinuousDiDSurvey (L653-705 analytical SE + bootstrap; L1368-1407 event-study + rejection paths) and tests/test_survey_phase6.py (L1230-1244 replicate + n_bootstrap rejection; L1548-1610 positive-weight-gate cell skipping). test_continuous_did.py has zero survey-flagged tests (grep confirmed). Split the coverage summary into core estimator tests vs. survey-specific tests and cited the correct files.
2. **Maintainability**: REGISTRY Deviations entry #2 hardcoded "(L745)" referring to the § Edge Cases bspline_derivative note. The L745 ref would drift on the next nearby edit, weakening the "canonical AI-review surface" claim. Replaced with a stable textual cross-reference ("the § Edge Cases **Note:** bullet above (bspline_derivative_design_matrix entry)").

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
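The drift problem behind the Maintainability fix can be shown with a small sketch: a hard-coded line number goes stale as soon as text is inserted above it, while a lookup by heading text still resolves. The excerpt and the helper `find_heading_line` are hypothetical; the real REGISTRY contents are not reproduced here.

```python
def find_heading_line(lines, heading):
    """Return the 1-based line number of the first line equal to `heading`."""
    for number, line in enumerate(lines, start=1):
        if line.strip() == heading:
            return number
    raise ValueError(f"heading not found: {heading!r}")


# Hypothetical registry excerpt.
before = [
    "## ContinuousDiD",
    "Some notes.",
    "### Implementation Checklist",
    "- [ ] deferred item",
]
# Inserting a new block above the checklist shifts every later line number.
after = before[:2] + ["### Deviations", "1. boundary knots"] + before[2:]

assert find_heading_line(before, "### Implementation Checklist") == 3
assert find_heading_line(after, "### Implementation Checklist") == 5
# A hard-coded reference to line 3 now points at a different heading.
assert after[3 - 1] == "### Deviations"
```

A textual cross-reference plays the same role for human readers that the heading lookup plays here: it names the target instead of its position.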
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
This is a docs-only re-review for …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
…ueue + example refs

One informational P3 from CI codex R2: METHODOLOGY_REVIEW.md still described ContinuousDiD as "In Progress" in two surrounding surfaces even after the status-table flip, creating conflicting status signals. Fixed both sites:

1. L27 explanatory paragraph: removed the ContinuousDiD example from the In Progress band's "has methodology file but no paper review" illustration (it's now Complete).
2. L1289-1292 Priority Order queue: removed entry #9 (ContinuousDiD) and renumbered the remaining queue.

Retroactive fix per feedback_changelog_accuracy_fixes (CI review catching one factual error in the queue means scanning for the same mistake): PR #473 promoted HeterogeneousAdoptionDiD to Complete but left entry #6 (HAD) in the same In Progress queue. Removed HAD's entry too and renumbered, so the queue is now self-consistent with the status table for all Complete entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
This re-review affects …
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Release notes consolidate 8 PRs since 3.4.0 (2026-05-19):

Public-surface variance lifts:
- SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468)
- SpilloverDiD vcov_type=conley + survey_design via stratified-Conley on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477)
- SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472)
- WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475)

Methodology-review-tracker promotions (mostly docs/tests):
- PreTrendsPower R pretrends parity goldens (PR-C, igerber#471)
- HAD methodology-review-tracker promotion (igerber#473)
- ContinuousDiD methodology-review-tracker promotion (igerber#476)

All changes additive; bit-equal defaults preserved across the affected estimators. No new estimators (patch-level per semver convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flip the ChaisemartinDHaultfoeuille (DCDH) row from In Progress to Complete. Adds the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns detail section mirroring the ContinuousDiD (PR igerber#476) and HAD (PR igerber#473) precedents. Consolidates 7 DCDH deviations from the paper, from R DIDmultiplegtDYN, and library extensions into a labeled REGISTRY surface per the AI-review "Documenting Deviations" convention. CHANGELOG [Unreleased] gains a new Added entry. L27 In Progress example re-pointed to WooldridgeDiD; L1289 priority-order queue item #6 removed and items #7-#11 renumbered to #6-#10. No source code changes, no new tests, no new docstrings; documentation consolidation only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
- `## ContinuousDiD` gains a formal Deviations block consolidating the boundary-knots deviation from R `contdid` v0.1.0, the `bspline_derivative` derivative-failure `UserWarning` (Phase 2 axis-C #12), the `+inf`→`0` never-treated recoding warning, and the zero-`first_treat` + nonzero-`dose` force-zeroing warning (both axis-E silent-coercion fixes) into a single AI-review-recognized labeled surface.
- …`METHODOLOGY_REVIEW.md`, `docs/methodology/REGISTRY.md`, `CHANGELOG.md`, and `TODO.md`.
- …`dvals` knot-alignment clarification; (R3) attributed the R-side basis rebuild to `_run_r_contdid` (was incorrectly cited to `_compare_with_r`). R4 verdict: 0 findings.

Methodology references (required if estimator / math changes)

- …`contdid` v0.1.0 (CRAN)
- …`range(dose)` vs `range(dvals)` boundary knots: the library avoids extrapolation that `contdid` v0.1.0 exhibits at dose-grid extremes. R cross-language coverage therefore runs at relative tolerance bands across two surfaces (scalar parity with raw R at 1% on all 6 benchmarks; harmonized boundary-knot-normalized curve parity at 1% / 2% on benchmarks 1-3 via the `_run_r_contdid` + `_compare_with_r` harness), NOT bit-exact like HAD. (2-4) Library extensions (no R correspondence): `bspline_derivative_design_matrix` derivative-failure `UserWarning` (axis-C #12); `+inf`→`0` recoding warns + negative `first_treat` raises (axis-E); zero-`first_treat` + nonzero-`dose` force-zeroing warns (axis-E). The original Edge Cases bullet and **Note:** entries remain in place; the Deviations block is the canonical AI-review-recognized surface per CLAUDE.md "Documenting Deviations" labels.

Validation

- …`tests/test_methodology_continuous_did.py` (5 classes) and 80 unit tests in `tests/test_continuous_did.py`. 89 / 89 pass locally with no source code modifications (R benchmarks deselected without R / `contdid` installed). The bspline-derivative-warning Verified Components row cites `TestBSplineDerivativeDegenerateBasis` (3 tests).

Security / privacy
Generated with Claude Code