Skip to content

ContinuousDiD methodology-review-tracker promotion: In Progress -> Complete#476

Merged
igerber merged 6 commits into
mainfrom
feature/continuous-did-methodology-review-tracker-promotion
May 20, 2026
Merged

ContinuousDiD methodology-review-tracker promotion: In Progress -> Complete#476
igerber merged 6 commits into
mainfrom
feature/continuous-did-methodology-review-tracker-promotion

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 20, 2026

Summary

  • ContinuousDiD methodology-review-tracker promotion: tracker row flipped In ProgressComplete with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR HeterogeneousAdoptionDiD methodology-review-tracker promotion: In Progress -> Complete #473).
  • REGISTRY ## ContinuousDiD gains a formal Deviations block consolidating the boundary-knots deviation from R contdid v0.1.0, the bspline_derivative derivative-failure UserWarning (Phase 2 axis-C Address code review feedback for rank_control_units #12), the +inf0 never-treated recoding warning, and the zero-first_treat+nonzero-dose force-zeroing warning (both axis-E silent-coercion fixes) into a single AI-review-recognized labeled surface.
  • Consolidation only — no source code changes, no new tests, no new docstrings. Touches METHODOLOGY_REVIEW.md, docs/methodology/REGISTRY.md, CHANGELOG.md, and TODO.md.
  • Three local codex rounds of P3-only refinement: (R1) parity wording + stale REGISTRY line refs + event-study scope; (R2) dvals knot-alignment clarification; (R3) attributed the R-side basis rebuild to _run_r_contdid (was incorrectly cited to _compare_with_r). R4 verdict: 0 findings.

Methodology references (required if estimator / math changes)

  • Method name(s): ContinuousDiD (Callaway, Goodman-Bacon & Sant'Anna 2024)
  • Paper / source link(s): NBER WP 32117, Difference-in-Differences with a Continuous Treatment
  • R reference: contdid v0.1.0 (CRAN)
  • Any intentional deviations from the source (and why): Four documented deviations now consolidated in REGISTRY § ContinuousDiD → Deviations block: (1) Deviation from R: range(dose) vs range(dvals) boundary knots — library avoids extrapolation that contdid v0.1.0 exhibits at dose-grid extremes. R cross-language coverage therefore runs at relative tolerance bands across two surfaces (scalar parity with raw R at 1% on all 6 benchmarks; harmonized boundary-knot-normalized curve parity at 1% / 2% on benchmarks 1-3 via the _run_r_contdid + _compare_with_r harness), NOT bit-exact like HAD. (2-4) Library extensions (no R correspondence): bspline_derivative_design_matrix derivative-failure UserWarning (axis-C Address code review feedback for rank_control_units #12); +inf0 recoding warns + negative first_treat raises (axis-E); zero-first_treat + nonzero-dose force-zeroing warns (axis-E). The original Edge Cases bullet and **Note:** entries remain in place — the Deviations block is the canonical AI-review-recognized surface per CLAUDE.md "Documenting Deviations" labels.

Validation

  • Tests added/updated: No test changes. Validated against existing 15 methodology tests in tests/test_methodology_continuous_did.py (5 classes) and 80 unit tests in tests/test_continuous_did.py. 89 / 89 pass locally with no source code modifications (R benchmarks deselected without R / contdid installed). The bspline-derivative-warning Verified Components row cites TestBSplineDerivativeDegenerateBasis (3 tests).
  • Backtest / simulation / notebook evidence (if applicable): N/A — no tutorial / notebook touched.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 4 commits May 20, 2026 15:46
…omplete)

Flips the ContinuousDiD tracker row to **Complete** with full Verified
Components / Test Coverage / Corrections Made / Deviations / Outstanding
Concerns structure mirroring the HAD precedent (PR #473). Consolidation
only — no source code changes, no new tests, no new docstrings.

- METHODOLOGY_REVIEW.md L59 row flipped In Progress -> Complete with
  Last Review 2026-05-20. L634-655 detail section rewritten with the
  five-block tracker template: 12 Verified Components rows backed by
  15 methodology tests + 80 unit tests + R parity at relative tolerance
  on 6 benchmark configurations.
- docs/methodology/REGISTRY.md ## ContinuousDiD gains a formal
  Deviations block (4 entries with framing header) before the
  Implementation Checklist: boundary-knots Deviation from R + three
  Phase 2 silent-failures audit fixes documented as library extensions
  with no R correspondence. Existing Edge Cases bullet and Note entries
  remain in place — Deviations is the canonical AI-review surface per
  CLAUDE.md "Documenting Deviations" labels.
- CHANGELOG.md [Unreleased] ### Added gains the ContinuousDiD
  tracker-promotion bullet at the top with per-benchmark tolerance
  language calling out the relative-tolerance scope caveat (NOT
  bit-exact like HAD) due to the boundary-knots deviation precluding
  algorithmic bit-equality.
- TODO.md gains one consolidated row tracking the three CGBS 2024
  feature deferrals (covariates kwarg, discrete-treatment saturated
  regression, lowest-dose-as-control Remark 3.1) — these mirror R
  contdid v0.1.0's omissions and are explicitly marked deferred in the
  REGISTRY Implementation Checklist L755-757.

R parity scope: 1% overall ATT on all 6 benchmarks; 1% max ATT(d) curve
and 2% max ACRT(d) curve on benchmarks 1-3 via _compare_with_r helper;
1% overall ACRT on benchmarks 4-5; benchmark 6 is event-study ATT-only.
NOT bit-exact (atol=1e-8) like HAD — boundary-knots divergence precludes
algorithmic bit-equality on aggregated dose-response curves.

89 regression tests pass (80 unit + 9 methodology, R benchmarks deselected
without R/contdid installed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…eness + event-study scope

Three informational P3s from local codex R1, all narrow text fixes:

1. **Methodology** — reword "R parity" claims to distinguish two surfaces:
   (a) scalar parity with raw R cont_did / pte_default output (overall
   ATT on all 6 benchmarks; overall ACRT on benchmarks 4-5; scalar
   overall_att on benchmark 6 event-study), and (b) harmonized
   boundary-knot-normalized curve parity (max ATT(d), max ACRT(d) on
   benchmarks 1-3 only — _compare_with_r helper rebuilds R-side basis
   under Boundary.knots = range(treated_doses) before comparison
   because raw contdid curves use range(dvals)). Applied to
   METHODOLOGY_REVIEW.md R-Reference row + Verified Components rows +
   Test Coverage block + long-form Deviations #1; REGISTRY Deviations
   #1; CHANGELOG bullet.

2. **Maintainability** — replace hard-coded REGISTRY line numbers
   (L755-757, L758) with stable section/item anchors: "REGISTRY
   ## ContinuousDiD -> Implementation Checklist -> Survey design
   support item" and "Implementation Checklist deferred items" instead
   of fragile :L755-757 refs that already drifted to L774-776 with this
   same PR's REGISTRY Deviations block insertion. Applied in
   METHODOLOGY_REVIEW.md (2 occurrences) + TODO.md (1).

3. **Documentation/Tests** — clarify that benchmark 6 validates the
   event-study code path through scalar overall_att only (binarized
   ATT, no per-horizon comparison); per-horizon event_study_effects
   estimates and inference are exercised by Python-side tests at
   tests/test_continuous_did.py:557-690 and :1500-1528 with no R
   cross-language comparison on the per-horizon surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
One remaining P3 from local codex R2 — "benchmarks use R's dvals grid
for exact knot alignment" partially reintroduced the knot-parity
ambiguity the surrounding text fixed. dvals aligns the evaluation grid
between Python and R outputs; boundary knots are re-harmonized
separately to range(treated_doses) inside the _compare_with_r helper.
Reworded to "exact evaluation-grid alignment between Python and R
outputs (boundary knots are harmonized separately under surface (b))"
for clarity.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n_r_contdid

One remaining P3 from local codex R3 — the docs attributed the R-side
basis rebuild to _compare_with_r when it actually lives in
_run_r_contdid at tests/test_methodology_continuous_did.py:333-367;
_compare_with_r only orchestrates the Python-vs-R comparison at
:395-459. This sends future reviewers to the wrong code path when
auditing the documented parity surface.

Reworded 7 citations across METHODOLOGY_REVIEW.md (R Reference row +
Verified Components dose-response row + Test Coverage block +
long-form Deviations #1 + the in-line dvals-grid-alignment note),
REGISTRY Deviations #1, and the CHANGELOG bullet to attribute the
rebuild to _run_r_contdid at L333-367 explicitly, keeping
_compare_with_r credited as the orchestrator at :395-459.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good

This is a docs-only tracker-promotion PR for ContinuousDiD. I found no unmitigated P0/P1 issues. The remaining issues are P3 documentation-maintenance items, not methodology or correctness blockers.

Executive Summary

  • Affected method: ContinuousDiD (Callaway, Goodman-Bacon & Sant'Anna 2024). No estimator, variance, weighting, identification, or default-behavior code changed in this PR.
  • The new ContinuousDiD deviations block is supported by the current implementation and tests: boundary-knot behavior, derivative-failure warning, +inf recoding warning, zero-dose coercion warning, and anticipation-aware control masking all already exist in code.
  • No P0/P1 methodology defects: the deviations are explicitly labeled in docs/methodology/REGISTRY.md, and the remaining paper-supported extensions are properly deferred in TODO.md.
  • P3: METHODOLOGY_REVIEW.md attributes survey-support coverage to tests/test_continuous_did.py, but the actual ContinuousDiD survey tests live in tests/test_survey_phase3.py and tests/test_survey_phase6.py.
  • P3: docs/methodology/REGISTRY.md embeds a brittle internal line-number reference (L745).

Methodology

  • No findings. Affected method is ContinuousDiD. The promoted/deviation text matches the existing behavior in docs/methodology/continuous-did.md:L357-L366, diff_diff/continuous_did_bspline.py:L21-L64, diff_diff/continuous_did_bspline.py:L143-L187, diff_diff/continuous_did.py:L251-L327, diff_diff/continuous_did.py:L918-L925, and the R-benchmark harness at tests/test_methodology_continuous_did.py:L333-L459. The remaining paper-supported gaps are explicitly deferred in docs/methodology/REGISTRY.md:L766-L778 and TODO.md:L85-L85, so I do not view the Complete promotion as a P1 methodology issue.

Code Quality

  • No findings. No runtime code changed.

Performance

  • No findings. No runtime code changed.

Maintainability

  • Severity: P3. Impact: docs/methodology/REGISTRY.md:L762-L762 hard-codes (L745) into prose. That reference will drift on the next nearby edit, which weakens the stated “canonical AI-review surface.” Concrete fix: replace the hard-coded line number with a stable textual cross-reference (for example, “the Edge Cases note above”) or a normal file/section reference.

Tech Debt

  • No findings. The new deferred-work row in TODO.md:L85-L85 appropriately tracks the unresolved CGBS extensions, so those gaps are mitigated and non-blocking.

Security

  • No findings. No executable code, credentials, or sensitive configuration changed.

Documentation/Tests

  • Severity: P3. Impact: METHODOLOGY_REVIEW.md:L660-L662 says the 80 tests in tests/test_continuous_did.py cover survey design, but the ContinuousDiD survey evidence actually lives in tests/test_survey_phase3.py:L653-L705, tests/test_survey_phase3.py:L1368-L1588, tests/test_survey_phase6.py:L1230-L1244, and tests/test_survey_phase6.py:L1548-L1610. The survey coverage exists, but the tracker currently points reviewers to the wrong evidence. Concrete fix: split the coverage summary into core estimator tests vs. survey-specific tests and cite the correct files.

…+ stale (L745) ref

Two informational P3s from CI codex R1, both narrow doc fixes:

1. **Documentation/Tests** — METHODOLOGY_REVIEW.md Test Coverage block
   attributed survey-design coverage to tests/test_continuous_did.py,
   but the ContinuousDiD survey tests actually live in
   tests/test_survey_phase3.py::TestContinuousDiDSurvey (L653-705
   analytical SE + bootstrap; L1368-1407 event-study + rejection
   paths) and tests/test_survey_phase6.py (L1230-1244 replicate +
   n_bootstrap rejection; L1548-1610 positive-weight-gate cell
   skipping). test_continuous_did.py has zero survey-flagged tests
   (grep confirmed). Split the coverage summary into core estimator
   tests vs. survey-specific tests and cited the correct files.

2. **Maintainability** — REGISTRY Deviations entry #2 hardcoded
   "(L745)" referring to the § Edge Cases bspline_derivative note.
   The L745 ref would drift on the next nearby edit, weakening the
   "canonical AI-review surface" claim. Replaced with a stable
   textual cross-reference ("the § Edge Cases **Note:** bullet above
   (bspline_derivative_design_matrix entry)").

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 2685f32c544be004c554ea5a790b7e6061229337


Overall Assessment

✅ Looks good

This is a docs-only re-review for ContinuousDiD. I found no unmitigated P0/P1 issues. The new deviations are now explicitly labeled in docs/methodology/REGISTRY.md, and the remaining paper-supported non-implemented extensions are properly tracked in TODO.md.

Executive Summary

  • Affected method: ContinuousDiD (Callaway, Goodman-Bacon & Sant'Anna 2024). No estimator, weighting, variance/SE, identification, or default-behavior code changed in this PR.
  • The new REGISTRY Deviations block matches the current implementation and existing tests for boundary knots, derivative-failure warnings, +inf recoding, and zero-dose coercion: docs/methodology/REGISTRY.md:L747-L764, diff_diff/continuous_did.py:L230-L327, diff_diff/continuous_did.py:L918-L925, diff_diff/continuous_did_bspline.py:L143-L187.
  • The prior re-review P3s are resolved: survey coverage now points to the dedicated survey suites in METHODOLOGY_REVIEW.md:L658-L663, and the brittle internal self-line reference previously called out in REGISTRY is gone.
  • Severity: P3. Impact: METHODOLOGY_REVIEW.md still describes ContinuousDiD as an In Progress example and keeps it in the “Promotion priority for the In Progress entries” queue, which now conflicts with the new Complete status. Concrete fix: remove ContinuousDiD from those two surrounding references, or refresh the queue/examples against the current status table. METHODOLOGY_REVIEW.md:L27-L27, METHODOLOGY_REVIEW.md:L1277-L1292
  • The new TODO row correctly tracks the remaining deferred CGBS extensions, so those are non-blocking by policy: TODO.md:L85-L85.

Methodology

  • No findings. This PR does not modify estimator code. The surfaced deviations are documented with the required labels in docs/methodology/REGISTRY.md:L747-L764, and they match current behavior in diff_diff/continuous_did.py:L230-L327, diff_diff/continuous_did.py:L918-L925, diff_diff/continuous_did_bspline.py:L143-L187, plus the R-benchmark harness in tests/test_methodology_continuous_did.py:L263-L388 and tests/test_methodology_continuous_did.py:L395-L459. The three remaining paper-supported extensions are now explicitly deferred in docs/methodology/REGISTRY.md:L774-L777 and TODO.md:L85-L85, so I do not view the promotion to Complete as a methodology defect.

Code Quality

  • No findings. No runtime code changed.

Performance

  • No findings. No runtime code changed.

Maintainability

  • Severity: P3. Impact: The tracker now contains conflicting status signals for ContinuousDiD: the summary table marks it Complete, but surrounding prose still places it in the In Progress set. That weakens the tracker as a source of truth for future review triage. Concrete fix: delete the ContinuousDiD mention from the In Progress explanatory paragraph and from the promotion-priority list, or regenerate that queue from the current status table. METHODOLOGY_REVIEW.md:L27-L27, METHODOLOGY_REVIEW.md:L1277-L1292

Tech Debt

  • No findings. The new deferred-work row for covariates=, discrete-treatment saturation, and lowest-dose-as-control is appropriately tracked in TODO.md:L85-L85, so these are mitigated P3s rather than blockers.

Security

  • No findings. Docs-only diff; no executable code, credentials, or sensitive configuration changed.

Documentation/Tests

  • No findings. The previous review’s documentation issues are addressed: survey-support evidence now points to tests/test_survey_phase3.py and tests/test_survey_phase6.py in METHODOLOGY_REVIEW.md:L658-L663, and REGISTRY no longer contains the brittle internal (L745) self-reference.

…ueue + example refs

One informational P3 from CI codex R2 — METHODOLOGY_REVIEW.md still
described ContinuousDiD as "In Progress" in two surrounding surfaces
even after the status-table flip, creating conflicting status signals.

Fixed both sites:
1. L27 explanatory paragraph: removed the ContinuousDiD example from
   the In Progress band's "has methodology file but no paper review"
   illustration (it's now Complete).
2. L1289-1292 Priority Order queue: removed entry #9 (ContinuousDiD)
   and renumbered the remaining queue.

Retroactive fix per feedback_changelog_accuracy_fixes (CI review
catching one factual error in the queue means scanning for the same
mistake): PR #473 promoted HeterogeneousAdoptionDiD to Complete but
left entry #6 (HAD) in the same In Progress queue. Removed HAD's
entry too and renumbered, so the queue is now self-consistent with
the status table for all Complete entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 14ccc9290648ea080a29ba6a73950ee401a19b25


Overall Assessment

✅ Looks good

This re-review affects ContinuousDiD only. The diff is docs/tracker/TODO/changelog only; I found no unmitigated P0/P1 issues, and the prior review’s stale-status P3 is resolved.

Executive Summary

  • Affected method: ContinuousDiD (Callaway, Goodman-Bacon & Sant'Anna 2024). No estimator, weighting, variance/SE, identification, or default-behavior code changed in this PR; the diff only promotes tracker/docs surfaces. CHANGELOG.md:L11-L11, METHODOLOGY_REVIEW.md:L59-L59
  • The new REGISTRY deviations block uses accepted labels and matches current implementation for boundary knots, derivative-failure warnings, +inf recoding, and zero-dose coercion. docs/methodology/REGISTRY.md:L747-L764, diff_diff/continuous_did.py:L230-L327, diff_diff/continuous_did.py:L918-L925, diff_diff/continuous_did_bspline.py:L150-L187
  • The promotion narrative is supported by existing test surfaces, including the R benchmark harness and survey suites. tests/test_methodology_continuous_did.py:L263-L459, tests/test_survey_phase3.py:L653-L705, tests/test_survey_phase3.py:L1368-L1407, tests/test_survey_phase6.py:L1230-L1244, tests/test_survey_phase6.py:L1548-L1610
  • [Re-review] The prior P3 is addressed: ContinuousDiD is no longer described as an In Progress example or queued for promotion, while the summary row and detailed section now mark it Complete. METHODOLOGY_REVIEW.md:L27-L27, METHODOLOGY_REVIEW.md:L59-L59, METHODOLOGY_REVIEW.md:L640-L685, METHODOLOGY_REVIEW.md:L1287-L1294
  • Remaining paper-supported but unimplemented extensions are now explicitly deferred in both REGISTRY and TODO.md, so they are non-blocking by policy. docs/methodology/REGISTRY.md:L774-L780, TODO.md:L85-L85, diff_diff/continuous_did.py:L159-L169, diff_diff/continuous_did.py:L296-L309, diff_diff/continuous_did.py:L355-L360

Methodology

  • No findings. The only affected method is ContinuousDiD, and this PR does not change estimator code. The newly documented deviations are surfaced with accepted REGISTRY labels and match the current implementation/test harness. docs/methodology/REGISTRY.md:L747-L764, diff_diff/continuous_did.py:L230-L327, diff_diff/continuous_did.py:L382-L392, diff_diff/continuous_did.py:L892-L925, tests/test_methodology_continuous_did.py:L263-L459

Code Quality

  • No findings. No runtime code changed.

Performance

  • No findings. No runtime code changed.

Maintainability

  • No findings. The prior stale-status tracker inconsistency called out in the earlier review is resolved. METHODOLOGY_REVIEW.md:L27-L27, METHODOLOGY_REVIEW.md:L59-L59, METHODOLOGY_REVIEW.md:L1287-L1294

Tech Debt

  • No findings. The three CGBS follow-ups (covariates=, discrete-treatment saturated regression, lowest-dose-as-control) are explicitly tracked as deferred work rather than silently promoted away. docs/methodology/REGISTRY.md:L774-L780, TODO.md:L85-L85, docs/methodology/continuous-did.md:L265-L277

Security

  • No findings. Docs-only diff; no executable code, secrets, or privileged config changed.

Documentation/Tests

  • No findings. The new tracker section is consistent with the referenced ContinuousDiD methodology/unit/survey test surfaces. I did not run the test suite as part of this review. METHODOLOGY_REVIEW.md:L658-L663, tests/test_methodology_continuous_did.py:L391-L459, tests/test_continuous_did.py:L557-L692, tests/test_continuous_did.py:L1500-L1530

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 20, 2026
@igerber igerber merged commit 6c77137 into main May 20, 2026
11 of 12 checks passed
@igerber igerber deleted the feature/continuous-did-methodology-review-tracker-promotion branch May 20, 2026 21:39
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
Release notes consolidate 8 PRs since 3.4.0 (2026-05-19):

Public-surface variance lifts:
- SpilloverDiD survey_design on HC1/CR1 via Binder TSL (Wave E.1, igerber#468)
- SpilloverDiD vcov_type=conley + survey_design via stratified-Conley
  on PSU totals (Wave E.2, igerber#474) + lag_cutoff>0 follow-up (igerber#477)
- SunAbraham vcov_type ∈ {classical, hc1, hc2, hc2_bm} (Phase 1b 1/8, igerber#472)
- WLS-CR2 Bell-McCaffrey gates lifted via clubSandwich port (igerber#475)

Methodology-review-tracker promotions (mostly docs/tests):
- PreTrendsPower R pretrends parity goldens (PR-C, igerber#471)
- HAD methodology-review-tracker promotion (igerber#473)
- ContinuousDiD methodology-review-tracker promotion (igerber#476)

All changes additive; bit-equal defaults preserved across the affected
estimators. No new estimators (patch-level per semver convention).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
Flip the ChaisemartinDHaultfoeuille (DCDH) row from In Progress to
Complete. Adds the Verified Components / Test Coverage / Corrections
Made / Deviations / Outstanding Concerns detail section mirroring the
ContinuousDiD (PR igerber#476) and HAD (PR igerber#473) precedents. Consolidates 7
DCDH deviations from the paper, from R DIDmultiplegtDYN, and library
extensions into a labeled REGISTRY surface per the AI-review
"Documenting Deviations" convention. CHANGELOG [Unreleased] gains a
new Added entry. L27 In Progress example re-pointed to WooldridgeDiD;
L1289 priority-order queue item igerber#6 removed and items igerber#7-igerber#11
renumbered to igerber#6-igerber#10.

No source code changes, no new tests, no new docstrings —
documentation consolidation only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant