ContinuousDiD methodology-review-tracker promotion: In Progress -> Complete by igerber · Pull Request #476 · igerber/diff-diff

igerber · 2026-05-20T20:06:23Z

Summary

ContinuousDiD methodology-review-tracker promotion: tracker row flipped In Progress → Complete with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR HeterogeneousAdoptionDiD methodology-review-tracker promotion: In Progress -> Complete #473).
REGISTRY ## ContinuousDiD gains a formal Deviations block consolidating the boundary-knots deviation from R contdid v0.1.0, the bspline_derivative derivative-failure UserWarning (Phase 2 axis-C Address code review feedback for rank_control_units #12), the +inf → 0 never-treated recoding warning, and the zero-first_treat+nonzero-dose force-zeroing warning (both axis-E silent-coercion fixes) into a single AI-review-recognized labeled surface.
Consolidation only — no source code changes, no new tests, no new docstrings. Touches METHODOLOGY_REVIEW.md, docs/methodology/REGISTRY.md, CHANGELOG.md, and TODO.md.
Three local codex rounds of P3-only refinement: (R1) parity wording + stale REGISTRY line refs + event-study scope; (R2) dvals knot-alignment clarification; (R3) attributed the R-side basis rebuild to _run_r_contdid (was incorrectly cited to _compare_with_r). R4 verdict: 0 findings.

Methodology references (required if estimator / math changes)

Method name(s): ContinuousDiD (Callaway, Goodman-Bacon & Sant'Anna 2024)
Paper / source link(s): NBER WP 32117, Difference-in-Differences with a Continuous Treatment
R reference: contdid v0.1.0 (CRAN)
Any intentional deviations from the source (and why): Four documented deviations now consolidated in REGISTRY § ContinuousDiD → Deviations block: (1) Deviation from R: range(dose) vs range(dvals) boundary knots — library avoids extrapolation that contdid v0.1.0 exhibits at dose-grid extremes. R cross-language coverage therefore runs at relative tolerance bands across two surfaces (scalar parity with raw R at 1% on all 6 benchmarks; harmonized boundary-knot-normalized curve parity at 1% / 2% on benchmarks 1-3 via the _run_r_contdid + _compare_with_r harness), NOT bit-exact like HAD. (2-4) Library extensions (no R correspondence): bspline_derivative_design_matrix derivative-failure UserWarning (axis-C Address code review feedback for rank_control_units #12); +inf → 0 recoding warns + negative first_treat raises (axis-E); zero-first_treat + nonzero-dose force-zeroing warns (axis-E). The original Edge Cases bullet and **Note:** entries remain in place — the Deviations block is the canonical AI-review-recognized surface per CLAUDE.md "Documenting Deviations" labels.

Validation

Tests added/updated: No test changes. Validated against existing 15 methodology tests in tests/test_methodology_continuous_did.py (5 classes) and 80 unit tests in tests/test_continuous_did.py. 89 / 89 pass locally with no source code modifications (R benchmarks deselected without R / contdid installed). The bspline-derivative-warning Verified Components row cites TestBSplineDerivativeDegenerateBasis (3 tests).
Backtest / simulation / notebook evidence (if applicable): N/A — no tutorial / notebook touched.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…omplete) Flips the ContinuousDiD tracker row to **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473). Consolidation only — no source code changes, no new tests, no new docstrings. - METHODOLOGY_REVIEW.md L59 row flipped In Progress -> Complete with Last Review 2026-05-20. L634-655 detail section rewritten with the five-block tracker template: 12 Verified Components rows backed by 15 methodology tests + 80 unit tests + R parity at relative tolerance on 6 benchmark configurations. - docs/methodology/REGISTRY.md ## ContinuousDiD gains a formal Deviations block (4 entries with framing header) before the Implementation Checklist: boundary-knots Deviation from R + three Phase 2 silent-failures audit fixes documented as library extensions with no R correspondence. Existing Edge Cases bullet and Note entries remain in place — Deviations is the canonical AI-review surface per CLAUDE.md "Documenting Deviations" labels. - CHANGELOG.md [Unreleased] ### Added gains the ContinuousDiD tracker-promotion bullet at the top with per-benchmark tolerance language calling out the relative-tolerance scope caveat (NOT bit-exact like HAD) due to the boundary-knots deviation precluding algorithmic bit-equality. - TODO.md gains one consolidated row tracking the three CGBS 2024 feature deferrals (covariates kwarg, discrete-treatment saturated regression, lowest-dose-as-control Remark 3.1) — these mirror R contdid v0.1.0's omissions and are explicitly marked deferred in the REGISTRY Implementation Checklist L755-757. R parity scope: 1% overall ATT on all 6 benchmarks; 1% max ATT(d) curve and 2% max ACRT(d) curve on benchmarks 1-3 via _compare_with_r helper; 1% overall ACRT on benchmarks 4-5; benchmark 6 is event-study ATT-only. NOT bit-exact (atol=1e-8) like HAD — boundary-knots divergence precludes algorithmic bit-equality on aggregated dose-response curves. 89 regression tests pass (80 unit + 9 methodology, R benchmarks deselected without R/contdid installed). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…eness + event-study scope Three informational P3s from local codex R1, all narrow text fixes: 1. **Methodology** — reword "R parity" claims to distinguish two surfaces: (a) scalar parity with raw R cont_did / pte_default output (overall ATT on all 6 benchmarks; overall ACRT on benchmarks 4-5; scalar overall_att on benchmark 6 event-study), and (b) harmonized boundary-knot-normalized curve parity (max ATT(d), max ACRT(d) on benchmarks 1-3 only — _compare_with_r helper rebuilds R-side basis under Boundary.knots = range(treated_doses) before comparison because raw contdid curves use range(dvals)). Applied to METHODOLOGY_REVIEW.md R-Reference row + Verified Components rows + Test Coverage block + long-form Deviations #1; REGISTRY Deviations #1; CHANGELOG bullet. 2. **Maintainability** — replace hard-coded REGISTRY line numbers (L755-757, L758) with stable section/item anchors: "REGISTRY ## ContinuousDiD -> Implementation Checklist -> Survey design support item" and "Implementation Checklist deferred items" instead of fragile :L755-757 refs that already drifted to L774-776 with this same PR's REGISTRY Deviations block insertion. Applied in METHODOLOGY_REVIEW.md (2 occurrences) + TODO.md (1). 3. **Documentation/Tests** — clarify that benchmark 6 validates the event-study code path through scalar overall_att only (binarized ATT, no per-horizon comparison); per-horizon event_study_effects estimates and inference are exercised by Python-side tests at tests/test_continuous_did.py:557-690 and :1500-1528 with no R cross-language comparison on the per-horizon surface. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

One remaining P3 from local codex R2 — "benchmarks use R's dvals grid for exact knot alignment" partially reintroduced the knot-parity ambiguity the surrounding text fixed. dvals aligns the evaluation grid between Python and R outputs; boundary knots are re-harmonized separately to range(treated_doses) inside the _compare_with_r helper. Reworded to "exact evaluation-grid alignment between Python and R outputs (boundary knots are harmonized separately under surface (b))" for clarity. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…n_r_contdid One remaining P3 from local codex R3 — the docs attributed the R-side basis rebuild to _compare_with_r when it actually lives in _run_r_contdid at tests/test_methodology_continuous_did.py:333-367; _compare_with_r only orchestrates the Python-vs-R comparison at :395-459. This sends future reviewers to the wrong code path when auditing the documented parity surface. Reworded 7 citations across METHODOLOGY_REVIEW.md (R Reference row + Verified Components dose-response row + Test Coverage block + long-form Deviations #1 + the in-line dvals-grid-alignment note), REGISTRY Deviations #1, and the CHANGELOG bullet to attribute the rebuild to _run_r_contdid at L333-367 explicitly, keeping _compare_with_r credited as the orchestrator at :395-459. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-20T20:10:56Z

Overall Assessment

✅ Looks good

This is a docs-only tracker-promotion PR for ContinuousDiD. I found no unmitigated P0/P1 issues. The remaining issues are P3 documentation-maintenance items, not methodology or correctness blockers.

Executive Summary

Affected method: ContinuousDiD (Callaway, Goodman-Bacon & Sant'Anna 2024). No estimator, variance, weighting, identification, or default-behavior code changed in this PR.
The new ContinuousDiD deviations block is supported by the current implementation and tests: boundary-knot behavior, derivative-failure warning, +inf recoding warning, zero-dose coercion warning, and anticipation-aware control masking all already exist in code.
No P0/P1 methodology defects: the deviations are explicitly labeled in docs/methodology/REGISTRY.md, and the remaining paper-supported extensions are properly deferred in TODO.md.
P3: METHODOLOGY_REVIEW.md attributes survey-support coverage to tests/test_continuous_did.py, but the actual ContinuousDiD survey tests live in tests/test_survey_phase3.py and tests/test_survey_phase6.py.
P3: docs/methodology/REGISTRY.md embeds a brittle internal line-number reference (L745).

Methodology

No findings. Affected method is ContinuousDiD. The promoted/deviation text matches the existing behavior in docs/methodology/continuous-did.md:L357-L366, diff_diff/continuous_did_bspline.py:L21-L64, diff_diff/continuous_did_bspline.py:L143-L187, diff_diff/continuous_did.py:L251-L327, diff_diff/continuous_did.py:L918-L925, and the R-benchmark harness at tests/test_methodology_continuous_did.py:L333-L459. The remaining paper-supported gaps are explicitly deferred in docs/methodology/REGISTRY.md:L766-L778 and TODO.md:L85-L85, so I do not view the Complete promotion as a P1 methodology issue.

Code Quality

No findings. No runtime code changed.

Performance

No findings. No runtime code changed.

Maintainability

Severity: P3. Impact: docs/methodology/REGISTRY.md:L762-L762 hard-codes (L745) into prose. That reference will drift on the next nearby edit, which weakens the stated “canonical AI-review surface.” Concrete fix: replace the hard-coded line number with a stable textual cross-reference (for example, “the Edge Cases note above”) or a normal file/section reference.

Tech Debt

No findings. The new deferred-work row in TODO.md:L85-L85 appropriately tracks the unresolved CGBS extensions, so those gaps are mitigated and non-blocking.

Security

No findings. No executable code, credentials, or sensitive configuration changed.

Documentation/Tests

Severity: P3. Impact: METHODOLOGY_REVIEW.md:L660-L662 says the 80 tests in tests/test_continuous_did.py cover survey design, but the ContinuousDiD survey evidence actually lives in tests/test_survey_phase3.py:L653-L705, tests/test_survey_phase3.py:L1368-L1588, tests/test_survey_phase6.py:L1230-L1244, and tests/test_survey_phase6.py:L1548-L1610. The survey coverage exists, but the tracker currently points reviewers to the wrong evidence. Concrete fix: split the coverage summary into core estimator tests vs. survey-specific tests and cite the correct files.

…+ stale (L745) ref Two informational P3s from CI codex R1, both narrow doc fixes: 1. **Documentation/Tests** — METHODOLOGY_REVIEW.md Test Coverage block attributed survey-design coverage to tests/test_continuous_did.py, but the ContinuousDiD survey tests actually live in tests/test_survey_phase3.py::TestContinuousDiDSurvey (L653-705 analytical SE + bootstrap; L1368-1407 event-study + rejection paths) and tests/test_survey_phase6.py (L1230-1244 replicate + n_bootstrap rejection; L1548-1610 positive-weight-gate cell skipping). test_continuous_did.py has zero survey-flagged tests (grep confirmed). Split the coverage summary into core estimator tests vs. survey-specific tests and cited the correct files. 2. **Maintainability** — REGISTRY Deviations entry #2 hardcoded "(L745)" referring to the § Edge Cases bspline_derivative note. The L745 ref would drift on the next nearby edit, weakening the "canonical AI-review surface" claim. Replaced with a stable textual cross-reference ("the § Edge Cases **Note:** bullet above (bspline_derivative_design_matrix entry)"). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-20T20:49:22Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2685f32c544be004c554ea5a790b7e6061229337

Overall Assessment

✅ Looks good

This is a docs-only re-review for ContinuousDiD. I found no unmitigated P0/P1 issues. The new deviations are now explicitly labeled in docs/methodology/REGISTRY.md, and the remaining paper-supported non-implemented extensions are properly tracked in TODO.md.

Executive Summary

Affected method: ContinuousDiD (Callaway, Goodman-Bacon & Sant'Anna 2024). No estimator, weighting, variance/SE, identification, or default-behavior code changed in this PR.
The new REGISTRY Deviations block matches the current implementation and existing tests for boundary knots, derivative-failure warnings, +inf recoding, and zero-dose coercion: docs/methodology/REGISTRY.md:L747-L764, diff_diff/continuous_did.py:L230-L327, diff_diff/continuous_did.py:L918-L925, diff_diff/continuous_did_bspline.py:L143-L187.
The prior re-review P3s are resolved: survey coverage now points to the dedicated survey suites in METHODOLOGY_REVIEW.md:L658-L663, and the brittle internal self-line reference previously called out in REGISTRY is gone.
Severity: P3. Impact: METHODOLOGY_REVIEW.md still describes ContinuousDiD as an In Progress example and keeps it in the “Promotion priority for the In Progress entries” queue, which now conflicts with the new Complete status. Concrete fix: remove ContinuousDiD from those two surrounding references, or refresh the queue/examples against the current status table. METHODOLOGY_REVIEW.md:L27-L27, METHODOLOGY_REVIEW.md:L1277-L1292
The new TODO row correctly tracks the remaining deferred CGBS extensions, so those are non-blocking by policy: TODO.md:L85-L85.

Methodology

No findings. This PR does not modify estimator code. The surfaced deviations are documented with the required labels in docs/methodology/REGISTRY.md:L747-L764, and they match current behavior in diff_diff/continuous_did.py:L230-L327, diff_diff/continuous_did.py:L918-L925, diff_diff/continuous_did_bspline.py:L143-L187, plus the R-benchmark harness in tests/test_methodology_continuous_did.py:L263-L388 and tests/test_methodology_continuous_did.py:L395-L459. The three remaining paper-supported extensions are now explicitly deferred in docs/methodology/REGISTRY.md:L774-L777 and TODO.md:L85-L85, so I do not view the promotion to Complete as a methodology defect.

Code Quality

No findings. No runtime code changed.

Performance

No findings. No runtime code changed.

Maintainability

Severity: P3. Impact: The tracker now contains conflicting status signals for ContinuousDiD: the summary table marks it Complete, but surrounding prose still places it in the In Progress set. That weakens the tracker as a source of truth for future review triage. Concrete fix: delete the ContinuousDiD mention from the In Progress explanatory paragraph and from the promotion-priority list, or regenerate that queue from the current status table. METHODOLOGY_REVIEW.md:L27-L27, METHODOLOGY_REVIEW.md:L1277-L1292

Tech Debt

No findings. The new deferred-work row for covariates=, discrete-treatment saturation, and lowest-dose-as-control is appropriately tracked in TODO.md:L85-L85, so these are mitigated P3s rather than blockers.

Security

No findings. Docs-only diff; no executable code, credentials, or sensitive configuration changed.

Documentation/Tests

No findings. The previous review’s documentation issues are addressed: survey-support evidence now points to tests/test_survey_phase3.py and tests/test_survey_phase6.py in METHODOLOGY_REVIEW.md:L658-L663, and REGISTRY no longer contains the brittle internal (L745) self-reference.

…ueue + example refs One informational P3 from CI codex R2 — METHODOLOGY_REVIEW.md still described ContinuousDiD as "In Progress" in two surrounding surfaces even after the status-table flip, creating conflicting status signals. Fixed both sites: 1. L27 explanatory paragraph: removed the ContinuousDiD example from the In Progress band's "has methodology file but no paper review" illustration (it's now Complete). 2. L1289-1292 Priority Order queue: removed entry #9 (ContinuousDiD) and renumbered the remaining queue. Retroactive fix per feedback_changelog_accuracy_fixes (CI review catching one factual error in the queue means scanning for the same mistake): PR #473 promoted HeterogeneousAdoptionDiD to Complete but left entry #6 (HAD) in the same In Progress queue. Removed HAD's entry too and renumbered, so the queue is now self-consistent with the status table for all Complete entries. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-05-20T20:56:02Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 14ccc9290648ea080a29ba6a73950ee401a19b25

Overall Assessment

✅ Looks good

This re-review affects ContinuousDiD only. The diff is docs/tracker/TODO/changelog only; I found no unmitigated P0/P1 issues, and the prior review’s stale-status P3 is resolved.

Executive Summary

Affected method: ContinuousDiD (Callaway, Goodman-Bacon & Sant'Anna 2024). No estimator, weighting, variance/SE, identification, or default-behavior code changed in this PR; the diff only promotes tracker/docs surfaces. CHANGELOG.md:L11-L11, METHODOLOGY_REVIEW.md:L59-L59
The new REGISTRY deviations block uses accepted labels and matches current implementation for boundary knots, derivative-failure warnings, +inf recoding, and zero-dose coercion. docs/methodology/REGISTRY.md:L747-L764, diff_diff/continuous_did.py:L230-L327, diff_diff/continuous_did.py:L918-L925, diff_diff/continuous_did_bspline.py:L150-L187
The promotion narrative is supported by existing test surfaces, including the R benchmark harness and survey suites. tests/test_methodology_continuous_did.py:L263-L459, tests/test_survey_phase3.py:L653-L705, tests/test_survey_phase3.py:L1368-L1407, tests/test_survey_phase6.py:L1230-L1244, tests/test_survey_phase6.py:L1548-L1610
[Re-review] The prior P3 is addressed: ContinuousDiD is no longer described as an In Progress example or queued for promotion, while the summary row and detailed section now mark it Complete. METHODOLOGY_REVIEW.md:L27-L27, METHODOLOGY_REVIEW.md:L59-L59, METHODOLOGY_REVIEW.md:L640-L685, METHODOLOGY_REVIEW.md:L1287-L1294
Remaining paper-supported but unimplemented extensions are now explicitly deferred in both REGISTRY and TODO.md, so they are non-blocking by policy. docs/methodology/REGISTRY.md:L774-L780, TODO.md:L85-L85, diff_diff/continuous_did.py:L159-L169, diff_diff/continuous_did.py:L296-L309, diff_diff/continuous_did.py:L355-L360

Methodology

No findings. The only affected method is ContinuousDiD, and this PR does not change estimator code. The newly documented deviations are surfaced with accepted REGISTRY labels and match the current implementation/test harness. docs/methodology/REGISTRY.md:L747-L764, diff_diff/continuous_did.py:L230-L327, diff_diff/continuous_did.py:L382-L392, diff_diff/continuous_did.py:L892-L925, tests/test_methodology_continuous_did.py:L263-L459

Code Quality

No findings. No runtime code changed.

Performance

No findings. No runtime code changed.

Maintainability

No findings. The prior stale-status tracker inconsistency called out in the earlier review is resolved. METHODOLOGY_REVIEW.md:L27-L27, METHODOLOGY_REVIEW.md:L59-L59, METHODOLOGY_REVIEW.md:L1287-L1294

Tech Debt

No findings. The three CGBS follow-ups (covariates=, discrete-treatment saturated regression, lowest-dose-as-control) are explicitly tracked as deferred work rather than silently promoted away. docs/methodology/REGISTRY.md:L774-L780, TODO.md:L85-L85, docs/methodology/continuous-did.md:L265-L277

Security

No findings. Docs-only diff; no executable code, secrets, or privileged config changed.

Documentation/Tests

No findings. The new tracker section is consistent with the referenced ContinuousDiD methodology/unit/survey test surfaces. I did not run the test suite as part of this review. METHODOLOGY_REVIEW.md:L658-L663, tests/test_methodology_continuous_did.py:L391-L459, tests/test_continuous_did.py:L557-L692, tests/test_continuous_did.py:L1500-L1530

Release notes consolidate 8 PRs since 3.4.0 (2026-05-19): Public-surface variance lifts: - SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468) - SpilloverDiD vcov_type=conley + survey_design via stratified-Conley on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477) - SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472) - WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475) Methodology-review-tracker promotions (mostly docs/tests): - PreTrendsPower R pretrends parity goldens (PR-C, igerber#471) - HAD methodology-review-tracker promotion (igerber#473) - ContinuousDiD methodology-review-tracker promotion (igerber#476) All changes additive; bit-equal defaults preserved across the affected estimators. No new estimators (patch-level per semver convention). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Flip the ChaisemartinDHaultfoeuille (DCDH) row from In Progress to Complete. Adds the Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns detail section mirroring the ContinuousDiD (PR igerber#476) and HAD (PR igerber#473) precedents. Consolidates 7 DCDH deviations from the paper, from R DIDmultiplegtDYN, and library extensions into a labeled REGISTRY surface per the AI-review "Documenting Deviations" convention. CHANGELOG [Unreleased] gains a new Added entry. L27 In Progress example re-pointed to WooldridgeDiD; L1289 priority-order queue item igerber#6 removed and items igerber#7-igerber#11 renumbered to igerber#6-igerber#10. No source code changes, no new tests, no new docstrings — documentation consolidation only. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

igerber and others added 4 commits May 20, 2026 15:46

igerber added the ready-for-ci Triggers CI test workflows label May 20, 2026

igerber merged commit 6c77137 into main May 20, 2026
11 of 12 checks passed

igerber deleted the feature/continuous-did-methodology-review-tracker-promotion branch May 20, 2026 21:39

This was referenced May 21, 2026

Release 3.4.1: SpilloverDiD survey + Conley lifts, SunAbraham vcov_type, WLS-CR2 BM, methodology-tracker promotions #480

Merged

ChaisemartinDHaultfoeuille (DCDH) methodology-review-tracker promotion: In Progress -> Complete #481

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ContinuousDiD methodology-review-tracker promotion: In Progress -> Complete#476

ContinuousDiD methodology-review-tracker promotion: In Progress -> Complete#476
igerber merged 6 commits into
mainfrom
feature/continuous-did-methodology-review-tracker-promotion

igerber commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 20, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

github-actions Bot commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant