Wave 2: PanelProfile outcome/dose shape extensions + autonomous-guide worked examples #366
Overall assessment
Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
…ptive-only; fix WooldridgeDiD method kwarg

P1 (TreatmentDoseShape vs ContinuousDiD contract):
- Reviewer correctly flagged that the new `is_time_invariant` field (per-unit non-zero distinct-count) does NOT match the actual `ContinuousDiD.fit()` gate at `continuous_did.py:222-228`, which uses `df.groupby(unit)[dose].nunique() > 1` over the FULL dose column (including pre-treatment zeros). My nonzero-only check silently classified `0,0,d,d` paths as time-invariant while ContinuousDiD would reject them.
- Removed the `is_time_invariant` field from `TreatmentDoseShape` entirely. The pre-existing `PanelProfile.treatment_varies_within_unit` field already encodes the correct ContinuousDiD prerequisite (matches the estimator's nunique check at line 224) and is correctly documented in §2 of the autonomous guide. Adding a second, narrower, mismatched gate was confusing - the reviewer's "scope as descriptive-only" path is the cleaner fix.
- Reframed the `TreatmentDoseShape` docstring + autonomous guide §2 field reference: explicitly NOT a ContinuousDiD prerequisite. `n_distinct_doses`, `has_zero_dose`, and `dose_min/max/mean` provide descriptive distributional context; `has_never_treated` (unit-level) + `treatment_varies_within_unit == False` (full-path constancy) + `is_balanced` are the authoritative gates.
- Rewrote the §5.2 worked example reasoning chain to use the existing correct gates and added a counter-example showing `has_zero_dose=True` does NOT imply `has_never_treated=True` (the row-level vs unit-level distinction).
- Added `test_treatment_dose_does_not_gate_continuous_did` covering the two contradictory cases the reviewer named: (1) a `0,0,d,d` within-unit dose path, asserting `treatment_varies_within_unit=True` (the actual ContinuousDiD gate fires correctly); (2) row-level zeros without never-treated units, asserting `has_zero_dose=True` BUT `has_never_treated=False` (the two facts are distinct).
- Removed `test_treatment_dose_continuous_time_varying_within_unit` and `test_treatment_dose_distinguishes_doses_at_high_precision` - both tested the dropped `is_time_invariant` field.

P2 (WooldridgeDiD constructor kwarg):
- The autonomous guide §5.3 worked example used `WooldridgeDiD(family="poisson")`, but the actual constructor at `wooldridge.py:264` takes `method=`. Following the example would raise `TypeError: __init__() got an unexpected keyword argument 'family'`. Fixed in two places (the prose and the code snippet) and added a negative assertion in `test_guides.py` to prevent regression: `assert 'WooldridgeDiD(family="poisson")' not in text`.

CHANGELOG updated to reflect the revised TreatmentDoseShape scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
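The mismatch between the two checks can be sketched with a toy panel. This is illustrative data, not the library's code: the full-column `groupby(unit)[dose].nunique() > 1` test is the gate quoted above, while the nonzero-only variant is the dropped heuristic.

```python
import pandas as pd

# Hypothetical panel: unit B follows a 0,0,d,d dose path
# (zero pre-treatment, constant positive dose afterward).
df = pd.DataFrame({
    "unit": ["A"] * 4 + ["B"] * 4,
    "time": [1, 2, 3, 4] * 2,
    "dose": [0, 0, 0, 0,        # unit A: never treated
             0, 0, 2.5, 2.5],   # unit B: 0,0,d,d path
})

# The gate described above: distinct values over the FULL dose
# column, zeros included.
varies_full = (df.groupby("unit")["dose"].nunique() > 1).any()

# The dropped nonzero-only check: distinct values among nonzero doses.
varies_nonzero = (
    df[df["dose"] != 0].groupby("unit")["dose"].nunique() > 1
).any()

print(varies_full)     # True  -- B's 0 -> 2.5 switch counts as variation
print(varies_nonzero)  # False -- the nonzero-only check misses it
```

The nonzero-only variant calls unit B "time-invariant" even though its full dose path varies, which is exactly the silent misclassification the commit describes.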
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
…corrected scope; cover new exports in import-surface test

P3 #1 (ROADMAP wording drift):
ROADMAP.md still said the new fields "gate WooldridgeDiD QMLE / ContinuousDiD prerequisites pre-fit" and mentioned "time-invariance", which contradicted the round-1 corrections to TreatmentDoseShape's docstring + autonomous guide §2 + §5.2. Reworded to match: the new fields add descriptive distributional context only; `outcome_shape.is_count_like` informs (not gates) the WooldridgeDiD QMLE judgment, and the authoritative ContinuousDiD pre-fit gates remain `has_never_treated`, `treatment_varies_within_unit`, and `is_balanced`. "Time-invariance" wording removed (the field was dropped in round 1).

P3 #2 (import-surface test coverage):
`test_top_level_import_surface()` previously only verified `profile_panel`, `PanelProfile`, and `Alert`. Extended to also cover the two new public exports `OutcomeShape` and `TreatmentDoseShape`, asserting both their importability and their presence in `diff_diff.__all__`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall assessment
✅ Looks good — no unmitigated P0/P1 findings. The prior substantive re-review issues I checked are resolved.
Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
…ontinuousDiD prerequisite summaries

Reviewer correctly noted that the round-2 wording lists `has_never_treated` + `treatment_varies_within_unit == False` + `is_balanced` as the "authoritative" ContinuousDiD pre-fit gates but omits the duplicate-cell hard stop. Verified `continuous_did.py:_precompute_structures` (lines 818-823) builds `outcome_matrix` cell-by-cell with last-row-wins on duplicate `(unit, time)` keys - so absence of the `duplicate_unit_time_rows` alert is also a real prerequisite, not just a style preference.

Updated wording in six places to add "+ absence of the `duplicate_unit_time_rows` alert" alongside the other gates and explain the silent-overwrite behavior:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring
- `diff_diff/guides/llms-autonomous.txt` §2 field reference
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design feature)
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example reasoning chain (now lists four gates instead of three)
- `CHANGELOG.md` Unreleased entry
- `ROADMAP.md` AI-Agent Track building-block

Also softened "authoritative" -> "core field-based", since the non-field-based duplicate-row gate makes the original phrasing slightly misleading. Added a `test_guides.py` regression asserting the autonomous guide mentions `duplicate_unit_time_rows` so future wording changes can't silently drop the gate from the summary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
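The silent-overwrite behavior the commit verifies can be sketched without the library: filling a matrix cell-by-cell keyed on `(unit, time)` means the last duplicate row wins with no error, which is why a profile-side duplicate screen is a real prerequisite. Toy data, not the estimator's internals.

```python
# Illustrative sketch: cell-by-cell fill keyed on (unit, time)
# means the LAST duplicate row silently wins -- no error is raised.
rows = [
    ("A", 1, 10.0),
    ("A", 2, 11.0),
    ("A", 2, 99.0),  # duplicate (unit, time) cell
]

outcome = {}
for unit, time, y in rows:
    outcome[(unit, time)] = y  # last write wins, silently

print(outcome[("A", 2)])  # 99.0 -- the earlier 11.0 was overwritten

# A profile-side duplicate screen catches this before any fit:
seen, duplicates = set(), []
for unit, time, _ in rows:
    if (unit, time) in seen:
        duplicates.append((unit, time))
    seen.add((unit, time))

print(duplicates)  # [('A', 2)]
```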
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
I did not see these P1 items mitigated in
Security
No findings.
Documentation/Tests
Path to Approval
…n estimand wording + is_count_like non-negativity guard

P1 #1 (Wooldridge Poisson estimand wording):
The guide §4.11 and §5.3 worked example described `WooldridgeDiD(method="poisson")`'s `overall_att` as a "multiplicative effect" / "log-link effect" / "proportional change" to be reported. Verified against `wooldridge.py:1225` (`att = _avg(mu_1 - mu_0, cell_mask)`) and `_reporting_helpers.py:262-281` (registered estimand: "ASF-based average from Wooldridge ETWFE ... average-structural-function (ASF) contrast between treated and counterfactual untreated outcomes ... on the natural outcome scale"): the actual quantity is `E[exp(η_1)] - E[exp(η_0)]`, an outcome-scale DIFFERENCE, not a multiplicative ratio. An agent following the previous wording would misreport the headline scalar. Rewrote both surfaces to:
- Describe the estimand as an ASF-based outcome-scale difference, citing `wooldridge.py:1225` and Wooldridge (2023) + REGISTRY.md §WooldridgeDiD nonlinear / ASF path.
- Explicitly note the headline `overall_att` is a difference on the natural outcome scale, NOT a multiplicative ratio.
- Mention that a proportional / percent-change interpretation can be derived post hoc as `overall_att / E[Y_0]` but is not the estimator's reported scalar.
Added `test_autonomous_count_outcome_uses_asf_outcome_scale_estimand` in `tests/test_guides.py`: it extracts the §4.11 and §5.3 blocks, asserts the forbidden phrases ("multiplicative effect under qmle", "estimates the multiplicative effect", "multiplicative (log-link) effect", "report the multiplicative effect", "report the multiplicative") do NOT appear, and asserts §5.3 explicitly contains "ASF" and "outcome scale" so future edits cannot silently weaken the description.

P1 #2 (`is_count_like` non-negativity guard):
The `is_count_like` heuristic gated on integer-valued + has-zeros + right-skewed + > 2 distinct values, but did NOT exclude negative support. Verified against `wooldridge.py:1105-1109`: the Poisson method hard-rejects `y < 0` with `ValueError`. Without a `value_min >= 0` guard, a right-skewed integer outcome with zeros and some negatives would set `is_count_like=True` and steer an agent toward an estimator that then refuses to fit. Added `value_min >= 0.0` to the heuristic and explained the non-negativity gate in the docstring + autonomous guide §2 field reference (now reads "is_integer_valued AND pct_zeros > 0 AND skewness > 0.5 AND n_distinct_values > 2 AND value_min >= 0"). The guide also notes that the gate exists specifically to align the routing signal with WooldridgeDiD Poisson's hard non-negativity requirement. Added `test_outcome_shape_count_like_excludes_negative_support` in `tests/test_profile_panel.py` covering a Poisson-distributed outcome with a small share of negative integers spliced in: asserts `is_count_like=False` despite the other four conditions firing.

P2 (test coverage for both P1s):
Both regressions above guard the new contracts. The guide test guards the wording surface; the profile test guards the heuristic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
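The five-clause heuristic described above can be sketched as follows. This is a hedged re-implementation for illustration (`looks_count_like` and its thresholds mirror the commit-message description, not the library's exact code):

```python
import numpy as np

def looks_count_like(y, skew_threshold=0.5):
    """Illustrative sketch of the count-like routing heuristic:
    integer-valued AND has zeros AND right-skewed AND > 2 distinct
    values AND non-negative (the round-4 guard)."""
    y = np.asarray(y, dtype=float)
    is_integer_valued = bool(np.all(np.isclose(y, np.round(y))))
    pct_zeros = float(np.mean(y == 0.0))
    n_distinct = np.unique(y).size
    std = y.std()
    skewness = float(np.mean((y - y.mean()) ** 3) / std**3) if std > 0 else 0.0
    value_min = float(y.min())
    return bool(
        is_integer_valued
        and pct_zeros > 0
        and skewness > skew_threshold
        and n_distinct > 2
        and value_min >= 0.0   # non-negativity guard added in round 4
    )

# Deterministic right-skewed integer outcome with zeros:
counts = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 10], dtype=float)
print(looks_count_like(counts))        # True: all five conditions hold

with_negative = counts.copy()
with_negative[0] = -1.0                # splice in a negative integer
print(looks_count_like(with_negative)) # False: value_min < 0 suppresses it
```

The second call shows the failure mode the guard prevents: without the `value_min >= 0` clause, the negative-support array would still satisfy the other four conditions and route toward a Poisson path that rejects `y < 0`.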
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall Assessment
Executive Summary
Methodology
Re-review note: the previous Poisson estimand-wording and negative-count-routing findings appear resolved in the provided diff.
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
I did not find the
Security
No findings.
Documentation/Tests
Path to Approval
b444300 to ed57c7c
…e-dose gate to ContinuousDiD prerequisite summaries
P1 (newly identified — `dose_min > 0` ContinuousDiD gate omitted):
Reviewer correctly noted that the round-3 prerequisite summary
(`has_never_treated`, `treatment_varies_within_unit == False`,
`is_balanced`, no `duplicate_unit_time_rows` alert) omits the
estimator's strictly-positive-treated-dose restriction. Verified
`continuous_did.py:287-294` raises `ValueError` ("Dose must be
strictly positive for treated units (D > 0)") on negative treated
dose support — a panel can satisfy every documented gate above and
still hard-fail at fit time when `treatment_dose.dose_min < 0`.
Updated wording in six surfaces to add `treatment_dose.dose_min > 0`
as the fifth pre-fit gate:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells
out all five gates with numbered list referencing the
`continuous_did.py:287-294` line range).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (notes
`dose_min > 0` is itself a gate; the other treatment_dose
sub-fields remain descriptive).
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design
feature paragraph).
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example
reasoning chain (now five gates checked; added a new
counter-example covering the negative-dose path so an agent
reading the example sees the contradictory case explicitly).
- `CHANGELOG.md` Unreleased entry.
- `ROADMAP.md` AI-Agent Track building-block.
P2 (regression test for negative-dose path):
Added `test_treatment_dose_min_flags_negative_dose_continuous_panels`
in `tests/test_profile_panel.py` covering a balanced, never-treated,
time-invariant continuous panel where every dose level is negative.
The test asserts that all four other gates pass cleanly and that
`dose.dose_min < 0` is correctly observed — the fixture an agent
would see when reasoning about whether ContinuousDiD applies.
Added `dose_min > 0` content-stability assertion in
`tests/test_guides.py` so future wording changes can't silently drop
the gate from the autonomous-guide summary.
Rebased onto origin/main; resolved CHANGELOG conflicts (kept HAD
Phase 4.5 B + dCDH by_path Wave 2 + my Wave 2 entries side by side).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
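The strictly-positive-treated-dose screen described above can be sketched as a small check. `treated_dose_strictly_positive` is a hypothetical helper name, not the library's API; it mirrors the restriction the commit cites (zero doses mark untreated rows, every nonzero dose must be strictly positive).

```python
import numpy as np

def treated_dose_strictly_positive(dose):
    """Hypothetical profile-side screen: nonzero (treated) doses must
    be strictly positive; zero doses are untreated rows and pass."""
    dose = np.asarray(dose, dtype=float)
    treated = dose[dose != 0.0]
    return bool(treated.size == 0 or treated.min() > 0.0)

print(treated_dose_strictly_positive([0.0, 0.0, 1.5, 2.0]))   # True
print(treated_dose_strictly_positive([0.0, -0.5, 1.5, 2.0]))  # False
```

The second panel is exactly the regression-test fixture described above: it can pass every other documented gate and still hard-fail at fit time, which is why `dose_min > 0` joins the screening list.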
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
Tech Debt
No findings.
The P1 above is not mitigated by
Security
No findings.
Documentation/Tests
Path to Approval
…nuousDiD prerequisite list as profile-side screening + add first_treat caveat
P1 (the five profile-derived facts are not the "full" gate set):
Reviewer correctly noted that calling
`{has_never_treated, treatment_varies_within_unit==False,
is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}` the
"full ContinuousDiD pre-fit gate set" overreaches. `profile_panel`
only sees the four columns it accepts and CANNOT see the separate
`first_treat` column that `ContinuousDiD.fit()` consumes. Verified
against `continuous_did.py:230-360`: `fit()` additionally rejects
NaN/inf/negative `first_treat`, drops units with `first_treat > 0`
AND `dose == 0`, and force-zeroes `first_treat == 0` rows whose
`dose != 0` with a `UserWarning`. A panel that passes all five
profile-side checks can still surface warnings, drop rows, or raise
at fit time depending on the `first_treat` column the caller
supplies.
Reframed the wording in seven surfaces from "full gate set" to
"profile-side screening checks" with an explicit caveat that the
checks are necessary-but-not-sufficient and that `ContinuousDiD.fit()`
applies separate `first_treat` validation:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells
out the screening framing explicitly + lists the `first_treat`
validations that fit() applies).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring
(aligned with public contract: most fields descriptive,
`dose_min > 0` is one of the screening checks).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (rewrote
the multi-paragraph block to describe screening + first_treat
caveat).
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design
feature paragraph: screening checks + necessary-not-sufficient
language + pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example
reasoning chain (rewrote step 2 to call out screening +
first_treat caveat; clarified counter-example #4 that
`P(D=0) > 0` is required under BOTH `control_group="never_treated"`
and `"not_yet_treated"`, not just default).
- `CHANGELOG.md` Unreleased entry.
- `ROADMAP.md` AI-Agent Track.
P2 (test coverage for the missing `first_treat` caveat):
Added a content-stability assertion in `tests/test_guides.py`:
`assert "first_treat" in text` so the autonomous guide cannot
silently drop the explicit `first_treat` validation caveat.
P3 (helper / test-name inconsistency with public contract):
Renamed `test_treatment_dose_does_not_gate_continuous_did` to
`test_treatment_dose_descriptive_fields_supplement_existing_gates`
and rewrote its docstring to match the now-honest public contract
("most fields descriptive distributional context that supplements
the existing top-level screening checks"). The test body still
asserts the same two things — `treatment_varies_within_unit` fires
True on `0,0,d,d` paths and `has_never_treated` is independent of
`has_zero_dose` — both of which remain accurate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
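The five profile-side screening checks, with their necessary-but-not-sufficient framing, can be sketched as one hypothetical helper (illustrative names and data; the real fields live on the profile object, and fit-time `first_treat` validation remains a separate surface this sketch cannot cover):

```python
import pandas as pd

def screen_continuous_panel(df, unit="unit", time="time", dose="dose"):
    """Hypothetical sketch of the five profile-side screening checks.
    Passing all five is necessary but NOT sufficient: fit() still
    applies its own first_treat validation."""
    failures = []
    # (1) some never-treated units (all-zero dose path)
    if not df.groupby(unit)[dose].apply(lambda d: (d == 0).all()).any():
        failures.append("no never-treated units")
    # (2) per-unit constant dose over the full path (zeros included)
    if (df.groupby(unit)[dose].nunique() > 1).any():
        failures.append("dose varies within unit")
    # (3) balanced panel: every unit observed in every period
    counts = df.groupby(unit)[time].nunique()
    if counts.nunique() > 1 or counts.iloc[0] != df[time].nunique():
        failures.append("unbalanced panel")
    # (4) no duplicate (unit, time) cells (silent last-row-wins otherwise)
    if df.duplicated(subset=[unit, time]).any():
        failures.append("duplicate (unit, time) rows")
    # (5) strictly positive treated doses
    treated = df.loc[df[dose] != 0, dose]
    if treated.size and treated.min() <= 0:
        failures.append("non-positive treated dose")
    return failures

df = pd.DataFrame({
    "unit": ["A"] * 3 + ["B"] * 3,
    "time": [1, 2, 3] * 2,
    "dose": [0, 0, 0, 2.0, 2.0, 2.0],
})
print(screen_continuous_panel(df))  # [] -- all five screens pass
```

Even an empty failure list here only clears the profile-side screen; the separate `first_treat` column an agent passes to the fit call can still trigger warnings, row drops, or errors.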
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall Assessment
Looks good — the prior P1 is resolved, and I did not find any new unmitigated P0/P1 issues in the changed code/docs.
Executive Summary
Methodology
No findings. The changed docs now match the ContinuousDiD contract in
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
…se in CHANGELOG is_count_like description

Reviewer correctly noted that the CHANGELOG bullet describing `is_count_like` listed only four of the five conditions (integer-valued + has zeros + right-skewed + > 2 distinct values) but omitted the `value_min >= 0` non-negativity clause added in round 4. Readers of the release notes alone would have expected negative-valued count-like outcomes to route as `is_count_like=True` even though the implementation intentionally suppresses that to stay compatible with `WooldridgeDiD(method="poisson")`'s hard non-negative requirement (`wooldridge.py:1105-1109`). Updated the bullet to include the non-negativity clause and explicitly cite the wooldridge.py line range so the release-notes description matches `diff_diff/profile.py`, `diff_diff/guides/llms-autonomous.txt`, and the `test_outcome_shape_count_like_excludes_negative_support` regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
No findings. Cross-checking the changed profile/guide language against docs/methodology/REGISTRY.md:L717-L745 and docs/methodology/REGISTRY.md:L1333-L1349 found no undocumented deviation in estimator behavior.
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
Static review only: the added coverage in tests/test_profile_panel.py:L887-L1313 and tests/test_guides.py:L37-L182 looks appropriate, but I could not run it in this environment.
…D opening bullet with the five-check screen below it

§4.7 had two summaries of the ContinuousDiD pre-fit contract that disagreed: the opening bullet (Wave 1 era) said "Three eligibility prerequisites", while the paragraph immediately below correctly listed all five profile-side screening checks (Wave 2 added the duplicate-row hard stop and the negative-dose `dose_min > 0` screen). A reader who stopped at the opening bullet could miss those two load-bearing gates and route a duplicate-containing or negative-dose panel into a failed/silent-overwrite fit path.

Updated the opening bullet to enumerate all five profile-side checks: (a) zero-dose controls / `P(D=0) > 0`, (b) per-unit time-invariant dose, (c) balanced panel, (d) no `duplicate_unit_time_rows` alert, (e) strictly positive treated doses. Also clarified that the duplicate-row case (d) coerces silently rather than raising — distinct from (a)/(b)/(c)/(e), which all raise ValueError. The pointer to the paragraph below + §2 for the `first_treat` validation surface is kept intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
No findings.
Security
No findings.
Documentation/Tests
No additional test-coverage findings in the diff. Testing note: I could not execute tests/test_profile_panel.py or tests/test_guides.py in this environment because the required Python packages are not installed.
…ationale in functional-form terms; rename §5.2 + clarify dose constancy

P2 (linear-OLS SE rationale wording):
The §4.11 and §5.3 prose justified `WooldridgeDiD(method="poisson")` over linear-OLS DiD by claiming linear-OLS asymptotic SEs "assume normal-shaped errors" that a right-skewed count distribution violates. That misrepresents the library's documented inference contract: linear ETWFE uses cluster-robust sandwich SEs (`OLS path` in REGISTRY.md), which are valid for any error distribution with finite moments - not just Gaussian. Rewrote both surfaces in functional-form / efficiency terms:
- Linear-OLS DiD imposes an additive functional form on a non-negative count outcome. SEs are calibrated (cluster-robust), but the linear model can be inefficient and may produce counterfactual predictions outside the non-negative support.
- `WooldridgeDiD(method="poisson")` imposes a multiplicative (log-link) functional form that respects non-negativity and matches the typical generative process for count data; QMLE sandwich SEs are robust to distributional misspecification.
- The choice is about WHICH functional form best summarizes the treatment effect (additive vs multiplicative), not about whether SEs are calibrated.
§5.3 reasoning chain steps 2 and 3 rewritten to reflect this: both estimators are now described as having calibrated inference; the trade-off is parameterization (linear ATT vs. ASF outcome-scale ATT). The Poisson non-negativity gate explanation is preserved.

P3 (§5.2 example self-contradiction):
§5.2 was titled "Continuous-dose panel with zero baseline" with prose "block-style adoption in year 3", which suggested a `0,0,d,d` within-unit dose path - but the shown profile has `treatment_varies_within_unit=False` (per-unit constant dose), and the same section's later counter-example correctly says `0,0,d,d` is out of scope for ContinuousDiD. Self-contradictory framing. Renamed to "Continuous-dose panel with zero-dose controls" (per reviewer suggestion) and clarified the prose: each dose group holds its assigned dose value (including 0 for the never-treated controls) in every observed period; adoption timing is carried separately via the `first_treat` column passed to `ContinuousDiD.fit()`, not via within-unit dose variation. Updated the matching `test_autonomous_contains_worked_examples_section` assertion in `tests/test_guides.py` to track the new title.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
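The outcome-scale-difference vs post-hoc-proportional distinction from the round-4 estimand fix can be illustrated numerically. The linear predictors below are made up for the sketch, not the estimator's internals; the point is only how the two summaries relate under a log link.

```python
import numpy as np

# Illustrative linear predictors (eta) for counterfactual-untreated
# and treated states under a log link -- NOT the estimator's internals.
rng = np.random.default_rng(42)
eta_0 = rng.normal(loc=0.5, scale=0.3, size=1000)  # counterfactual
eta_1 = eta_0 + 0.2                                 # treated shift

mu_0 = np.exp(eta_0)   # ASF on the natural outcome scale
mu_1 = np.exp(eta_1)

# The headline scalar is an outcome-scale DIFFERENCE ...
att_difference = np.mean(mu_1 - mu_0)
print(att_difference)

# ... while a proportional reading is only a post-hoc derivation:
proportional = att_difference / np.mean(mu_0)
print(round(proportional, 3))  # 0.221 == exp(0.2) - 1 for a constant shift
```

Reporting `proportional` as if it were the estimator's headline scalar is exactly the misreporting the wording fix guards against; the derived ratio only equals `exp(0.2) - 1` here because the illustrative shift is constant.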
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall Assessment
✅ Looks good
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
…tcome rationale to functional-form framing

Round 9 rewrote the §4.11 detailed bullet and §5.3 reasoning chain to describe the Poisson-vs-linear choice in functional-form / efficiency terms, but two surfaces still carried the old SE-based framing:
- §2 `is_count_like` field reference: "questionable asymptotic SEs"
- §4.11 preamble paragraph: "asymptotic SEs may mislead"
Both phrases conflict with the library's documented inference contract (REGISTRY.md §WooldridgeDiD: linear ETWFE uses cluster-robust sandwich SEs, which are valid for any error distribution with finite moments - not just Gaussian). Reviewer correctly flagged this as the remaining methodology P2. Updated both surfaces to match the round-9 framing:
- Linear-OLS DiD imposes an additive functional form on a non-negative count outcome; cluster-robust SEs remain calibrated, but the linear model can be inefficient and may produce counterfactual predictions outside the non-negative support.
- `WooldridgeDiD(method="poisson")` is the multiplicative (log-link) ETWFE alternative that respects the non-negative support and matches the typical count-process generative model, with QMLE sandwich SEs robust to distributional misspecification.
- The choice is functional-form / efficiency / support, NOT SE calibration.
Verified no remaining "asymptotic SEs" / "questionable" / "mislead" phrases in the guide via grep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
…mous-guide worked examples (Wave 2)
Wave 2 of the AI-agent enablement track. Extends profile_panel() with two new optional sub-dataclasses:
- OutcomeShape (numeric outcomes only): n_distinct_values, pct_zeros, value_min/max, NaN-safe skewness + excess_kurtosis (gated on n_distinct >= 3 and std > 0), is_integer_valued, is_count_like (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values), is_bounded_unit ([0, 1] support).
- TreatmentDoseShape (treatment_type == "continuous" only): n_distinct_doses, has_zero_dose, dose_min/max/mean over non-zero doses, is_time_invariant (per-unit non-zero doses have at most one distinct value).
Both fields are None when their classification gate is not met. to_dict() serializes the nested dataclasses as JSON-compatible nested dicts.
llms-autonomous.txt gains a new §5 "Worked examples" with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections, §3 footnote cross-ref, §4.7 cross-ref, and a new §4.11 outcome-shape considerations section. Existing §5-§8 renumbered to §6-§9. Descriptive only - no recommender language inside the worked examples.
Tests: 16 new unit tests in tests/test_profile_panel.py covering each heuristic (count-like Poisson, binary-as-not-count-like, continuous normal, bounded unit, categorical returning None, skewness gating, JSON roundtrip, time-invariant dose, time-varying dose, no-zero-dose, binary-treatment returning None, categorical-treatment returning None, JSON roundtrip, frozen invariants on both new dataclasses). Two new content-stability tests in tests/test_guides.py guard the §5 worked examples and the new field references.
CHANGELOG and ROADMAP updated; ROADMAP marks Wave 2 shipped, promotes sanity_checks block to top of "Next blocks toward the vision," and documents why the originally-proposed post-hoc mismatch detection was rescoped (largely overlaps existing fit-time validators and caveats).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
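The OutcomeShape heuristics described above can be sketched in a few lines. This is an illustrative re-implementation, not the library's code: the function name `sketch_outcome_shape` is hypothetical, and only the thresholds named in the commit message (n_distinct >= 3 and std > 0 for the skewness gate; the four-condition is_count_like conjunction) are taken from the source.

```python
import numpy as np

def sketch_outcome_shape(y):
    """Hypothetical sketch of the described OutcomeShape heuristics."""
    y = np.asarray(y, dtype=float)
    y = y[~np.isnan(y)]                       # NaN-safe
    n_distinct = np.unique(y).size
    std = y.std()
    # Moment-based skewness, gated on n_distinct >= 3 and std > 0, else None.
    if n_distinct >= 3 and std > 0:
        skewness = float(((y - y.mean()) ** 3).mean() / std**3)
    else:
        skewness = None
    is_integer_valued = bool(np.all(np.isclose(y, np.round(y), rtol=0.0, atol=1e-12)))
    pct_zeros = float((y == 0).mean())
    # is_count_like: integer-valued AND has zeros AND right-skewed AND > 2 distinct values.
    is_count_like = bool(
        is_integer_valued
        and pct_zeros > 0
        and skewness is not None
        and skewness > 0.5
        and n_distinct > 2
    )
    return {"n_distinct_values": n_distinct, "pct_zeros": pct_zeros,
            "skewness": skewness, "is_integer_valued": is_integer_valued,
            "is_count_like": is_count_like}

counts = np.array([0.0, 0, 0, 1, 1, 1, 2, 2, 3, 10])    # right-skewed, zeros
print(sketch_outcome_shape(counts)["is_count_like"])    # → True
```

A binary column (two distinct values) falls below the n_distinct gate, so skewness stays None and is_count_like is False, matching the binary-as-not-count-like test named above.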
…are integer detection
P1 from local review (`treatment_dose.is_time_invariant`):
- Removed `np.round(..., 8)` tolerance from `_compute_treatment_dose`'s
per-unit non-zero distinct-count check. The documented contract is
"per-unit non-zero doses have at most one distinct value" (exact),
but the implementation was rounding to 8 decimals before comparing,
silently classifying tiny-but-real dose variation as time-invariant
and contradicting the docstring + CHANGELOG + autonomous guide §2.
Now uses exact `np.unique(unit_nonzero).size > 1`. Added a regression
test (`test_treatment_dose_distinguishes_doses_at_high_precision`) for
a unit with two non-equal doses separated by 1e-9 (below the previous
rounding window) — asserts `is_time_invariant=False`.
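The rounding-window bug above is easy to reproduce with numpy alone. A minimal sketch of the two checks (the array name is illustrative):

```python
import numpy as np

# Two real, non-equal doses for one unit, separated by 1e-9
# (inside the old 8-decimal rounding window).
unit_nonzero = np.array([0.5, 0.5 + 1e-9])

# Old check: round to 8 decimals first -- collapses the two doses,
# silently classifying real variation as time-invariant.
old_varies = np.unique(np.round(unit_nonzero, 8)).size > 1

# New check: exact distinct count on the observed values.
new_varies = np.unique(unit_nonzero).size > 1

print(old_varies, new_varies)  # → False True
```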
Related dead-code removal:
- Removed the `len(nonzero) == 0` defensive branch in
`_compute_treatment_dose`. `treatment_type == "continuous"` is reached
only when the treatment column has more than two distinct values OR
a 2-valued numeric outside `{0, 1}`; an all-zero numeric column is
classified as `binary_absorbing` and never reaches this branch, so
`nonzero` is guaranteed non-empty. Removing the branch eliminates
the NaN-vs-Optional[float] inconsistency the reviewer flagged on
`dose_min/max/mean`.
P2 from local review (`is_integer_valued` brittleness):
- Switched from `np.equal(np.mod(arr, 1.0), 0.0)` to
`np.isclose(arr, np.round(arr), rtol=0.0, atol=1e-12)`. The treatment
/ outcome column is user input (system boundary), and CSV-roundtripped
count columns commonly carry float64 representation noise (e.g.,
`1.0` stored as `1.0000000000000002`). Tolerance-aware integer
detection is the right discipline at the boundary; downstream the
`is_count_like` heuristic remains gated on this AND `pct_zeros > 0`
AND `skewness > 0.5` AND `n_distinct > 2`, so isolated noise can't
flip the classification.
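The boundary-noise case is concrete with numpy: CSV round-tripped floats like `1.0000000000000002` fail an exact modulo test but pass the tolerance-aware check. The array values below are illustrative.

```python
import numpy as np

# A count column after a CSV round-trip: 1.0 stored with float64 noise.
arr = np.array([0.0, 1.0000000000000002, 2.0, 3.0])

# Brittle exact check: the noisy 1.0 fails the modulo test.
exact = bool(np.all(np.equal(np.mod(arr, 1.0), 0.0)))

# Tolerance-aware check at the system boundary (atol=1e-12).
tolerant = bool(np.all(np.isclose(arr, np.round(arr), rtol=0.0, atol=1e-12)))

print(exact, tolerant)  # → False True
```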
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ptive-only; fix WooldridgeDiD method kwarg
P1 (TreatmentDoseShape vs ContinuousDiD contract):
- Reviewer correctly flagged that the new `is_time_invariant` field (per-unit non-zero distinct-count) does NOT match the actual `ContinuousDiD.fit()` gate at `continuous_did.py:222-228`, which uses `df.groupby(unit)[dose].nunique() > 1` over the FULL dose column (including pre-treatment zeros). My nonzero-only check silently classified `0,0,d,d` paths as time-invariant while ContinuousDiD would reject them.
- Removed `is_time_invariant` field from `TreatmentDoseShape` entirely. The pre-existing `PanelProfile.treatment_varies_within_unit` field already encodes the correct ContinuousDiD prerequisite (matches the estimator's nunique check at line 224) and is correctly documented in §2 of the autonomous guide. Adding a second, narrower, mismatched gate was confusing - the reviewer's "scope as descriptive-only" path is the cleaner fix.
- Reframed `TreatmentDoseShape` docstring + autonomous guide §2 field reference: explicitly NOT a ContinuousDiD prerequisite. `n_distinct_doses`, `has_zero_dose`, `dose_min/max/mean` provide descriptive distributional context; `has_never_treated` (unit-level) + `treatment_varies_within_unit == False` (full-path constancy) + `is_balanced` are the authoritative gates.
- Rewrote §5.2 worked example reasoning chain to use the existing correct gates and added a counter-example showing `has_zero_dose=True` does NOT imply `has_never_treated=True` (the row-level vs unit-level distinction).
- Added `test_treatment_dose_does_not_gate_continuous_did` covering the two contradictory cases the reviewer named: (1) `0,0,d,d` within-unit dose path, asserting `treatment_varies_within_unit=True` (the actual ContinuousDiD gate fires correctly); (2) row-level zeros without never-treated units, asserting `has_zero_dose=True` BUT `has_never_treated=False` (the two facts are distinct).
- Removed `test_treatment_dose_continuous_time_varying_within_unit` and `test_treatment_dose_distinguishes_doses_at_high_precision` - both tested the dropped `is_time_invariant` field.
P2 (WooldridgeDiD constructor kwarg):
- The autonomous guide §5.3 worked example used `WooldridgeDiD(family="poisson")` but the actual constructor at `wooldridge.py:264` takes `method=`. Following the example would raise `TypeError: __init__() got an unexpected keyword argument 'family'`. Fixed in two places (the prose and the code snippet) and added a negative assertion in `test_guides.py` to prevent regression: `assert 'WooldridgeDiD(family="poisson")' not in text`.
CHANGELOG updated to reflect the revised TreatmentDoseShape scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
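The two contradictory cases the reviewer named can be sketched with pandas alone. This is illustrative only (column names and the unit-level never-treated definition are assumptions for the sketch), but the gate expression mirrors the nunique check quoted from `continuous_did.py` above:

```python
import pandas as pd

# Case 1: a `0,0,d,d` within-unit dose path. The actual ContinuousDiD
# gate runs groupby-nunique over the FULL dose column, zeros included,
# so it fires on this panel.
df = pd.DataFrame({
    "unit": [1, 1, 1, 1, 2, 2, 2, 2],
    "time": [1, 2, 3, 4, 1, 2, 3, 4],
    "dose": [0.0, 0.0, 0.7, 0.7, 0.0, 0.0, 1.2, 1.2],
})
treatment_varies_within_unit = bool((df.groupby("unit")["dose"].nunique() > 1).any())

# Case 2: row-level zeros exist, but no unit is zero across its WHOLE
# path, so has_zero_dose does NOT imply has_never_treated.
has_zero_dose = bool((df["dose"] == 0).any())
has_never_treated = bool(df.groupby("unit")["dose"].apply(lambda s: (s == 0).all()).any())

print(treatment_varies_within_unit, has_zero_dose, has_never_treated)
# → True True False
```

Case 1 is why a nonzero-only check misclassified this panel: restricted to non-zero rows, each unit has a single dose value, yet the full-column gate correctly reports within-unit variation.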
…corrected scope; cover new exports in import-surface test
P3 #1 (ROADMAP wording drift):
ROADMAP.md still said the new fields "gate WooldridgeDiD QMLE / ContinuousDiD prerequisites pre-fit" and mentioned "time-invariance", which contradicted the round-1 corrections to TreatmentDoseShape's docstring + autonomous guide §2 + §5.2. Reworded to match: the new fields add descriptive distributional context only; `outcome_shape.is_count_like` informs (not gates) the WooldridgeDiD QMLE judgment, and the authoritative ContinuousDiD pre-fit gates remain `has_never_treated`, `treatment_varies_within_unit`, and `is_balanced`. "Time-invariance" wording removed (the field was dropped in round 1).
P3 #2 (import-surface test coverage):
`test_top_level_import_surface()` previously only verified `profile_panel`, `PanelProfile`, `Alert`. Extended to also cover the two new public exports `OutcomeShape` and `TreatmentDoseShape`, asserting both their importability and their presence in `diff_diff.__all__`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ontinuousDiD prerequisite summaries
Reviewer correctly noted that the round-2 wording lists `has_never_treated` + `treatment_varies_within_unit == False` + `is_balanced` as the "authoritative" ContinuousDiD pre-fit gates but omits the duplicate-cell hard stop. Verified `continuous_did.py:_precompute_structures` (line 818-823) builds `outcome_matrix` cell-by-cell with last-row-wins on duplicate `(unit, time)` keys - so absence of the `duplicate_unit_time_rows` alert is also a real prerequisite, not just a style preference.
Updated wording in five places to add "+ absence of the `duplicate_unit_time_rows` alert" alongside the other gates and explain the silent-overwrite behavior:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring
- `diff_diff/guides/llms-autonomous.txt` §2 field reference
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design feature)
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example reasoning chain (now lists four gates instead of three)
- `CHANGELOG.md` Unreleased entry
- `ROADMAP.md` AI-Agent Track building-block
Also softened "authoritative" -> "core field-based" since the non-field-based duplicate-row gate makes the original phrasing slightly misleading. Added a test_guides.py regression asserting the autonomous guide mentions `duplicate_unit_time_rows` so future wording changes can't silently drop the gate from the summary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n estimand wording + is_count_like non-negativity guard
P1 #1 (Wooldridge Poisson estimand wording):
The guide §4.11 and §5.3 worked example described `WooldridgeDiD(method="poisson")`'s `overall_att` as a "multiplicative effect" / "log-link effect" / "proportional change" to be reported. Verified against `wooldridge.py:1225` (`att = _avg(mu_1 - mu_0, cell_mask)`) and `_reporting_helpers.py:262-281` (registered estimand: "ASF-based average from Wooldridge ETWFE ... average-structural-function (ASF) contrast between treated and counterfactual untreated outcomes ... on the natural outcome scale"): the actual quantity is `E[exp(η_1)] - E[exp(η_0)]`, an outcome-scale DIFFERENCE, not a multiplicative ratio. An agent following the previous wording would misreport the headline scalar.
Rewrote both surfaces to:
- Describe the estimand as an ASF-based outcome-scale difference, citing `wooldridge.py:1225` and Wooldridge (2023) + REGISTRY.md §WooldridgeDiD nonlinear / ASF path.
- Explicitly note the headline `overall_att` is a difference on the natural outcome scale, NOT a multiplicative ratio.
- Mention that a proportional / percent-change interpretation can be derived post-hoc as `overall_att / E[Y_0]` but is not the estimator's reported scalar.
Added `test_autonomous_count_outcome_uses_asf_outcome_scale_estimand` in `tests/test_guides.py`: extracts §4.11 and §5.3 blocks, asserts forbidden phrases ("multiplicative effect under qmle", "estimates the multiplicative effect", "multiplicative (log-link) effect", "report the multiplicative effect", "report the multiplicative") do NOT appear, and asserts §5.3 explicitly contains "ASF" and "outcome scale" so future edits cannot silently weaken the description.
P1 #2 (`is_count_like` non-negativity guard):
The `is_count_like` heuristic gated on integer-valued + has-zeros + right-skewed + > 2 distinct values, but did NOT exclude negative support. Verified against `wooldridge.py:1105-1109`: Poisson method hard-rejects `y < 0` with `ValueError`. Without a `value_min >= 0` guard, a right-skewed integer outcome with zeros and some negatives would set `is_count_like=True` and steer an agent toward an estimator that then refuses to fit.
Added `value_min >= 0.0` to the heuristic and explained the non-negativity gate in the docstring + autonomous guide §2 field reference (now reads "is_integer_valued AND pct_zeros > 0 AND skewness > 0.5 AND n_distinct_values > 2 AND value_min >= 0"). The guide also notes that the gate exists specifically to align the routing signal with WooldridgeDiD Poisson's hard non-negativity requirement. Added `test_outcome_shape_count_like_excludes_negative_support` in `tests/test_profile_panel.py` covering a Poisson-distributed outcome with a small share of negative integers spliced in: asserts `is_count_like=False` despite the other four conditions firing.
P2 (test coverage for both P1s):
Both regressions above guard the new contracts. The guide test guards the wording surface; the profile test guards the heuristic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
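The estimand shape can be made concrete with a tiny numeric sketch. The linear-predictor values below are invented for illustration; the point is only that `E[exp(η_1)] - E[exp(η_0)]` is a difference on the outcome scale, with the proportional reading derivable post hoc:

```python
import numpy as np

# Illustrative linear predictors for treated cells under treatment
# (eta_1) and the counterfactual untreated state (eta_0).
eta_1 = np.log(np.array([4.0, 6.0, 2.0]))
eta_0 = np.log(np.array([2.0, 3.0, 1.0]))

mu_1, mu_0 = np.exp(eta_1), np.exp(eta_0)

# The headline scalar is an outcome-scale DIFFERENCE ...
att = (mu_1 - mu_0).mean()        # ≈ 2.0 on the natural outcome scale

# ... while a percent-change reading is a derived quantity,
# not the estimator's reported scalar:
pct_change = att / mu_0.mean()    # ≈ 1.0, i.e. +100%

print(att, pct_change)
```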
…e-dose gate to ContinuousDiD prerequisite summaries
P1 (newly identified — `dose_min > 0` ContinuousDiD gate omitted):
Reviewer correctly noted that the round-3 prerequisite summary
(`has_never_treated`, `treatment_varies_within_unit == False`,
`is_balanced`, no `duplicate_unit_time_rows` alert) omits the
estimator's strictly-positive-treated-dose restriction. Verified
`continuous_did.py:287-294` raises `ValueError` ("Dose must be
strictly positive for treated units (D > 0)") on negative treated
dose support — a panel can satisfy every documented gate above and
still hard-fail at fit time when `treatment_dose.dose_min < 0`.
Updated wording in five surfaces to add `treatment_dose.dose_min > 0`
as the fifth pre-fit gate:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells
out all five gates with numbered list referencing the
`continuous_did.py:287-294` line range).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (notes
`dose_min > 0` is itself a gate; the other treatment_dose
sub-fields remain descriptive).
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design
feature paragraph).
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example
reasoning chain (now five gates checked; added a new
counter-example covering the negative-dose path so an agent
reading the example sees the contradictory case explicitly).
- `CHANGELOG.md` Unreleased entry.
- `ROADMAP.md` AI-Agent Track building-block.
P2 (regression test for negative-dose path):
Added `test_treatment_dose_min_flags_negative_dose_continuous_panels`
in `tests/test_profile_panel.py` covering a balanced, never-treated,
time-invariant continuous panel where every dose level is negative.
The test asserts that all four other gates pass cleanly and that
`dose.dose_min < 0` is correctly observed — the fixture an agent
would see when reasoning about whether ContinuousDiD applies.
Added `dose_min > 0` content-stability assertion in
`tests/test_guides.py` so future wording changes can't silently drop
the gate from the autonomous-guide summary.
Rebased onto origin/main; resolved CHANGELOG conflicts (kept HAD
Phase 4.5 B + dCDH by_path Wave 2 + my Wave 2 entries side by side).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nuousDiD prerequisite list as profile-side screening + add first_treat caveat
P1 (the five profile-derived facts are not the "full" gate set):
Reviewer correctly noted that calling
`{has_never_treated, treatment_varies_within_unit==False,
is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}` the
"full ContinuousDiD pre-fit gate set" overreaches. `profile_panel`
only sees the four columns it accepts and CANNOT see the separate
`first_treat` column that `ContinuousDiD.fit()` consumes. Verified
against `continuous_did.py:230-360`: `fit()` additionally rejects
NaN/inf/negative `first_treat`, drops units with `first_treat > 0`
AND `dose == 0`, and force-zeroes `first_treat == 0` rows whose
`dose != 0` with a `UserWarning`. A panel that passes all five
profile-side checks can still surface warnings, drop rows, or raise
at fit time depending on the `first_treat` column the caller
supplies.
Reframed the wording in five surfaces from "full gate set" to
"profile-side screening checks" with an explicit caveat that the
checks are necessary-but-not-sufficient and that `ContinuousDiD.fit()`
applies separate `first_treat` validation:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells
out the screening framing explicitly + lists the `first_treat`
validations that fit() applies).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring
(aligned with public contract: most fields descriptive,
`dose_min > 0` is one of the screening checks).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (rewrote
the multi-paragraph block to describe screening + first_treat
caveat).
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design
feature paragraph: screening checks + necessary-not-sufficient
language + pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example
reasoning chain (rewrote step 2 to call out screening +
first_treat caveat; clarified counter-example #4 that
`P(D=0) > 0` is required under BOTH `control_group="never_treated"`
and `"not_yet_treated"`, not just default).
- `CHANGELOG.md` Unreleased entry.
- `ROADMAP.md` AI-Agent Track.
P2 (test coverage for the missing `first_treat` caveat):
Added a content-stability assertion in `tests/test_guides.py`:
`assert "first_treat" in text` so the autonomous guide cannot
silently drop the explicit `first_treat` validation caveat.
P3 (helper / test-name inconsistency with public contract):
Renamed `test_treatment_dose_does_not_gate_continuous_did` to
`test_treatment_dose_descriptive_fields_supplement_existing_gates`
and rewrote its docstring to match the now-honest public contract
("most fields descriptive distributional context that supplements
the existing top-level screening checks"). The test body still
asserts the same two things — `treatment_varies_within_unit` fires
True on `0,0,d,d` paths and `has_never_treated` is independent of
`has_zero_dose` — both of which remain accurate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…se in CHANGELOG is_count_like description
Reviewer correctly noted that the CHANGELOG bullet describing `is_count_like` listed only four of the five conditions (integer-valued + has zeros + right-skewed + > 2 distinct values) but omitted the `value_min >= 0` non-negativity clause added in round 4. Readers of the release notes alone would have expected negative-valued count-like outcomes to route as `is_count_like=True` even though the implementation intentionally suppresses that to stay compatible with `WooldridgeDiD(method="poisson")`'s hard non-negative requirement (`wooldridge.py:1105-1109`).
Updated the bullet to include the non-negativity clause and explicitly cite the wooldridge.py line range so the release-notes description matches `diff_diff/profile.py`, `diff_diff/guides/llms-autonomous.txt`, and the `test_outcome_shape_count_like_excludes_negative_support` regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…D opening bullet with the five-check screen below it
§4.7 had two summaries of the ContinuousDiD pre-fit contract that disagreed: the opening bullet (Wave 1 era) said "Three eligibility prerequisites", while the paragraph immediately below correctly listed all five profile-side screening checks (Wave 2 added the duplicate-row hard stop and the negative-dose `dose_min > 0` screen). A reader who stopped at the opening bullet could miss those two load-bearing gates and route a duplicate-containing or negative-dose panel into a failed/silent-overwrite fit path.
Updated the opening bullet to enumerate all five profile-side checks: (a) zero-dose controls / `P(D=0) > 0`, (b) per-unit time-invariant dose, (c) balanced panel, (d) no `duplicate_unit_time_rows` alert, (e) strictly positive treated doses. Also clarified that the duplicate-row case (d) coerces silently rather than raising - distinct from (a)/(b)/(c)/(e) which all raise ValueError. Pointer to the paragraph below + §2 for the `first_treat` validation surface kept intact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
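The five profile-side checks enumerated above can be consolidated into one sketch over a raw panel. This helper is hypothetical (the library exposes these facts as PanelProfile fields and alerts, not as a single function), and the column names are assumptions:

```python
import pandas as pd

def continuous_did_preflight(df, unit="unit", time="time", dose="dose"):
    """Hypothetical consolidation of the five profile-side screening checks."""
    per_unit_dose = df.groupby(unit)[dose]
    nonzero = df.loc[df[dose] != 0, dose]
    n_periods = df[time].nunique()
    return {
        # (a) zero-dose controls: at least one unit all-zero across its path
        "has_never_treated": bool(per_unit_dose.apply(lambda s: (s == 0).all()).any()),
        # (b) per-unit time-invariant dose
        "dose_constant_within_unit": not bool((per_unit_dose.nunique() > 1).any()),
        # (c) balanced panel: every unit observed in every period
        "is_balanced": bool((df.groupby(unit)[time].nunique() == n_periods).all()),
        # (d) no duplicate (unit, time) cells (silent last-row-wins otherwise)
        "no_duplicate_unit_time_rows": not bool(df.duplicated([unit, time]).any()),
        # (e) strictly positive treated doses
        "dose_min_positive": bool(nonzero.min() > 0) if len(nonzero) else False,
    }

panel = pd.DataFrame({
    "unit": [1, 1, 2, 2, 3, 3],
    "time": [1, 2, 1, 2, 1, 2],
    "dose": [0.0, 0.0, 0.8, 0.8, 1.5, 1.5],
})
print(continuous_did_preflight(panel))  # all five True for this panel
```

Flipping unit 2's dose to -0.8 fails only check (e), matching the negative-dose screen described above while the other four checks still pass.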
…ationale in functional-form terms; rename §5.2 + clarify dose constancy
P2 (linear-OLS SE rationale wording):
The §4.11 and §5.3 prose justified `WooldridgeDiD(method="poisson")` over linear-OLS DiD by claiming linear-OLS asymptotic SEs "assume normal-shaped errors" that a right-skewed count distribution violates. That misrepresents the library's documented inference contract: linear ETWFE uses cluster-robust sandwich SEs (`OLS path` in REGISTRY.md) which are valid for any error distribution with finite moments - not just Gaussian. Rewrote both surfaces in functional-form / efficiency terms:
- Linear-OLS DiD imposes an additive functional form on a non-negative count outcome. SEs are calibrated (cluster-robust) but the linear model can be inefficient and may produce counterfactual predictions outside the non-negative support.
- `WooldridgeDiD(method="poisson")` imposes a multiplicative (log-link) functional form that respects non-negativity and matches the typical generative process for count data; QMLE sandwich SEs are robust to distributional misspecification.
- The choice is about WHICH functional form best summarizes the treatment effect (additive vs multiplicative), not about whether SEs are calibrated.
§5.3 reasoning chain step 2 + step 3 rewritten to reflect this: both estimators are now described as having calibrated inference; the trade-off is parameterization (linear ATT vs. ASF outcome-scale ATT). The Poisson non-negativity gate explanation is preserved.
P3 (§5.2 example self-contradiction):
§5.2 was titled "Continuous-dose panel with zero baseline" with prose "block-style adoption in year 3", which suggested a `0,0,d,d` within-unit dose path - but the shown profile has `treatment_varies_within_unit=False` (per-unit constant dose) and the same section's later counter-example correctly says `0,0,d,d` is out of scope for ContinuousDiD. Self-contradictory framing.
Renamed to "Continuous-dose panel with zero-dose controls" (per reviewer suggestion) and clarified the prose: each dose group holds its assigned dose value (including 0 for the never-treated controls) in every observed period; adoption timing is carried separately via the `first_treat` column passed to `ContinuousDiD.fit()`, not via within-unit dose variation. Updated the matching `test_autonomous_contains_worked_examples_section` assertion in `tests/test_guides.py` to track the new title.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
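The support argument behind the functional-form framing fits in a few lines of numpy. This is a deliberately toy illustration (invented data, a plain line fit, and `exp` of the same intercept standing in for a log-link prediction), not a fit of either estimator:

```python
import numpy as np

# A non-negative count outcome with a steep right tail; x is a covariate.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 8.0])

# Additive (linear-OLS) functional form: the fitted line can leave the
# non-negative support of the outcome.
slope, intercept = np.polyfit(x, y, 1)
linear_pred_at_0 = intercept            # -1.5 here: a negative "count"

# Multiplicative (log-link) functional form: exp(.) predictions respect
# non-negativity by construction, whatever the coefficients are.
log_link_pred_at_0 = np.exp(intercept)  # strictly positive

print(linear_pred_at_0 < 0, log_link_pred_at_0 > 0)  # → True True
```

The point mirrors the rewritten guide text: both forms can carry calibrated cluster-robust inference; they differ in whether counterfactual predictions can escape the outcome's support.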
…tcome rationale to functional-form framing
Round 9 rewrote §4.11 detailed bullet and §5.3 reasoning chain to describe the Poisson-vs-linear choice in functional-form / efficiency terms, but two surfaces still carried the old SE-based framing:
- §2 `is_count_like` field reference: "questionable asymptotic SEs"
- §4.11 preamble paragraph: "asymptotic SEs may mislead"
Both phrases conflict with the library's documented inference contract (REGISTRY.md §WooldridgeDiD: linear ETWFE uses cluster-robust sandwich SEs which are valid for any error distribution with finite moments - not just Gaussian). Reviewer correctly flagged this as the remaining methodology P2.
Updated both surfaces to match the round-9 framing:
- Linear-OLS DiD imposes an additive functional form on a non-negative count outcome; cluster-robust SEs remain calibrated but the linear model can be inefficient and may produce counterfactual predictions outside the non-negative support.
- `WooldridgeDiD(method="poisson")` is the multiplicative (log-link) ETWFE alternative that respects the non-negative support and matches the typical count-process generative model, with QMLE sandwich SEs robust to distributional misspecification.
- The choice is functional-form / efficiency / support, NOT SE calibration.
Verified no remaining "asymptotic SEs" / "questionable" / "mislead" phrases in the guide via grep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…light checks as standard-workflow predictions, not estimator gates
Reviewer correctly noted that calling
{has_never_treated, treatment_varies_within_unit==False,
is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}
the "screening checks" / "necessary" gates of `ContinuousDiD`
overstates the contract. `ContinuousDiD.fit()` keys off the
separate `first_treat` column (which `profile_panel` does not see),
defines never-treated controls as `first_treat == 0` rows,
force-zeroes nonzero `dose` on those rows with a `UserWarning`,
and rejects negative dose only among treated units `first_treat > 0`
(see `continuous_did.py:276-327` and `:348-360`).
Two of the five checks (`has_never_treated`, `dose_min > 0`) are
first_treat-dependent: agents who relabel positive- or negative-dose
units as `first_treat == 0` trigger the force-zero coercion path
with a `UserWarning` and may still fit panels that fail those
preflights, with the methodology shifting. The other three
(`treatment_varies_within_unit`, `is_balanced`, duplicate-row
absence) are real fit-time gates that hold regardless of how
`first_treat` is constructed.
Reframed every wording site to call these "standard-workflow
preflight checks" — predictive when the agent derives `first_treat`
from the same dose column passed to `profile_panel`, but not the
estimator's literal contract:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote
the multi-paragraph block; explicit standard-workflow definition
+ per-check first_treat dependency map + force-zero coercion
caveat).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring
(already brief; stays consistent).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (long
rewrite covering the standard-workflow framing + override paths).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet +
trailing paragraph (both updated; opening bullet now spells out
which of the five checks are first_treat-dependent vs. hard
fit-time stops; trailing paragraph promotes the standard-
workflow framing).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain step
2 (rewrote the gate-checking paragraph; counter-example #4
expanded to enumerate (a) supply matching first_treat and accept
rejection, (b) deliberate relabel + coercion, (c) different
estimator; counter-example #5 distinguishes negative-dose
treated-unit rejection from never-treated coercion).
- `CHANGELOG.md` Wave 2 entry (matches the new framing).
- `ROADMAP.md` AI-Agent Track building block (matches).
Test coverage:
- Renamed assertion messages in
`test_treatment_dose_descriptive_fields_supplement_existing_gates`
and `test_treatment_dose_min_flags_negative_dose_continuous_panels`
to remove "authoritative gate" phrasing; reframed as "standard-
workflow preflight" assertions consistent with the corrected docs.
- Added `test_negative_dose_on_never_treated_coerces_not_rejects`
in `tests/test_continuous_did.py::TestEdgeCases` covering the
reviewer's specific request: never-treated rows with NEGATIVE
nonzero dose must coerce (with `UserWarning`) rather than raise
the treated-unit negative-dose error. Sister to the existing
`test_nonzero_dose_on_never_treated_warns` which covers the
positive-dose case.
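The coercion contract the new test locks in can be sketched without the library. The helper below is hypothetical and only mirrors the behavior described above (never-treated rows with nonzero dose, positive OR negative, are force-zeroed with a `UserWarning` rather than rejected); it is not `ContinuousDiD`'s code:

```python
import warnings
import pandas as pd

def coerce_inconsistent_never_treated(df):
    """Sketch: force-zero nonzero dose on never-treated (first_treat == 0)
    rows with a UserWarning, instead of raising."""
    mask = (df["first_treat"] == 0) & (df["dose"] != 0)
    if mask.any():
        warnings.warn("nonzero dose on never-treated rows coerced to 0", UserWarning)
        df = df.copy()
        df.loc[mask, "dose"] = 0.0
    return df

df = pd.DataFrame({"unit": [1, 2], "first_treat": [0, 3], "dose": [-0.4, 1.0]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = coerce_inconsistent_never_treated(df)

print(out["dose"].tolist(), len(caught))  # → [0.0, 1.0] 1
```

Note the negative dose on the never-treated row is coerced here, while (per the commit message) a negative dose on a treated unit would instead raise the strictly-positive-dose error.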
Rebased onto origin/main during this round (no conflicts beyond
prior CHANGELOG resolutions; main advanced 19 commits).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed 31e8875 to 25a1197
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA:
|
…s-fallback wording; correct duplicate-row "fit-time stop" claim
P1 (relabel-to-manufacture-controls misframing):
Round 11 introduced wording across the guide, profile docstring,
CHANGELOG, ROADMAP, and test docstrings that presented intentional
`first_treat == 0` relabeling of nonzero-dose units as an
"option" / "fallback" for fitting `ContinuousDiD` when the
profile-side preflights (`has_never_treated`, `dose_min > 0`)
fail. REGISTRY does not document this as a routing option, and the
estimator still requires actual `P(D=0) > 0` because Remark 3.1
lowest-dose-as-control is not yet implemented. The force-zero
coercion at `continuous_did.py:311-327` is implementation behavior
for INCONSISTENT inputs (e.g., user accidentally passes nonzero
dose on a never-treated row), not a methodology fallback.
Reworded every site to remove the relabeling-as-option framing and
replace it with the registry-documented fixes when (1) or (5)
fails: re-encode the treatment column to a non-negative scale that
contains a true never-treated group, or route to a different
estimator (`HeterogeneousAdoptionDiD` for graded-adoption panels;
linear DiD with the treatment as a continuous covariate). Every
remaining "manufacture controls" mention in the guide, profile,
and tests is now an explicit anti-recommendation ("do not relabel
... to manufacture controls"). Updated:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (item (1):
"not an opportunity to relabel ..."; item (5): coercion is
"implementation behavior for inconsistent inputs, not a
methodological fallback").
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (the
When-(1)-or-(5)-fails paragraph names re-encode + alternative
estimator only; explicit anti-relabel warning).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet +
trailing paragraph (consolidated; opening bullet drops the
relabel-as-fallback framing; trailing paragraph trimmed to a
pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 step 2 + counter-
example #4 + counter-example #5 (relabel-as-option language
removed; explicit "do not relabel" callouts; counter-example #4
options trimmed to (a) re-encode and (b) different estimator).
- `CHANGELOG.md` (relabel-as-option clause removed; replaced with
re-encode / different-estimator framing).
- `ROADMAP.md` (same).
- `tests/test_profile_panel.py` two test docstrings (relabel-as-
workflow language removed).
P2 (duplicate-row "hard fit-time stop" misclaim):
Round 11 wording said "duplicate-row failures are hard fit-time
stops" — incorrect. `_precompute_structures` at
`continuous_did.py:818-823` silently overwrites with last-row-wins,
no exception raised. Reworded as "hard preflight veto: the agent
must deduplicate before fit because `ContinuousDiD` otherwise uses
last-row-wins, no fit-time exception" in profile.py docstring,
guide §4.7 opening bullet, and §5.2 step 2 (now defers to §2 for
the breakdown). The previously-correct §2 description of the
silent-coerce path is preserved.
Length housekeeping:
The round-11 and round-12 expansions pushed `llms-autonomous.txt`
above `llms-full.txt`, breaking `test_full_is_largest`. Trimmed
~2.7KB by consolidating the §4.7 trailing paragraph + §5.2 step 2
trailing block to point at §2's full breakdown rather than
duplicating the per-check semantics. autonomous: 65364 chars,
full: 66058 chars.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
|
… first_treat from dose" framing; add PanelProfile backward-compat defaults; fix test_continuous_did docstring
P1 (canonical ContinuousDiD setup vs. derive-from-dose framing):
Round 12 introduced a "standard workflow" description across the guide, profile docstring, CHANGELOG, ROADMAP, and test docstrings that said agents derive `first_treat` from the same dose column passed to `profile_panel`. Reviewer correctly noted this conflicts with the actual ContinuousDiD contract (`continuous_did.py:222-228`, `prep_dgp.py:970-993`, `docs/methodology/continuous-did.md:65-73`): the canonical setup uses a **time-invariant per-unit dose** `D_i` and a **separate `first_treat` column** the caller supplies - the dose column has no within-unit time variation in this setup, so it cannot encode timing. An agent following the rejected framing would either build a `0,0,d,d` path (which `fit()` rejects) or keep a valid constant-dose panel (in which case the dose column carries no timing information).
Reworded every site to drop the derive-from-dose framing and replace with the canonical setup. The five facts on the dose column remain predictive of `fit()` outcomes BECAUSE the canonical convention ties `first_treat == 0` to `D_i == 0` and treated units carry their constant dose across all periods - so `has_never_treated` proxies `P(D=0) > 0` and `dose_min > 0` predicts the strictly-positive-treated-dose requirement, without any "derivation" of `first_treat` from the dose column. Updated:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote the multi-paragraph block to use the canonical-setup framing and added an explicit "agent must validate `first_treat` independently" note).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference.
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet.
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain step 2 + counter-examples #4 and #5 (now describe the canonical setup rather than a derive-from-dose workflow).
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
- `tests/test_profile_panel.py` `test_treatment_dose_min_flags_negative_dose_continuous_panels` docstring/comments.
P2 (PanelProfile direct-construction backward compat):
Wave 2 added `outcome_shape` and `treatment_dose` to PanelProfile without defaults, breaking direct `PanelProfile(...)` calls that predate Wave 2. Made both fields default to `None` (moved them to the end of the field list; both are `Optional[...]`). Added `test_panel_profile_direct_construction_without_wave2_fields` asserting that direct construction without the new fields succeeds and yields `None` defaults that serialize correctly through `to_dict()`.
P3 (test_continuous_did.py docstring overstating sanction):
The new `test_negative_dose_on_never_treated_coerces_not_rejects` docstring said the contract "lets agents legally relabel negative-dose units as `first_treat == 0` to coerce them away." Reworded as observed implementation behavior for inconsistent inputs, NOT a sanctioned routing option - the test locks in the coercion contract while the autonomous guide §5.2 explicitly tells agents not to use this path methodologically.
Length invariant maintained: autonomous (65748 chars) < full (66031 chars); `test_full_is_largest` still passes (compares character count, not byte count, so on-disk size with UTF-8 multi-byte characters differs from the assertion target).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
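The canonical setup described above is easy to show as a concrete panel. The fixture below is illustrative (column names and values are assumptions): dose is constant within each unit, timing lives only in the separate `first_treat` column, and the `first_treat == 0 <=> D_i == 0` convention is what makes the dose-column facts predictive:

```python
import pandas as pd

# Canonical setup: time-invariant per-unit dose D_i plus a SEPARATE
# first_treat column supplied by the caller (0 = never treated).
panel = pd.DataFrame({
    "unit":        [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time":        [1, 2, 3, 1, 2, 3, 1, 2, 3],
    "dose":        [0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 1.2, 1.2, 1.2],
    "first_treat": [0, 0, 0, 2, 2, 2, 3, 3, 3],
})

# The dose column has NO within-unit time variation in this setup,
# so it cannot encode adoption timing.
no_within_unit_variation = not (panel.groupby("unit")["dose"].nunique() > 1).any()

# The convention first_treat == 0 <=> D_i == 0 is why has_never_treated
# (a dose-column fact) proxies P(D = 0) > 0.
never_by_dose = set(panel.loc[panel["dose"] == 0, "unit"])
never_by_ft = set(panel.loc[panel["first_treat"] == 0, "unit"])

print(no_within_unit_variation, never_by_dose == never_by_ft)  # → True True
```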
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
|
Summary
- `PanelProfile` with `outcome_shape: Optional[OutcomeShape]` (numeric outcomes) and `treatment_dose: Optional[TreatmentDoseShape]` (continuous treatments). New fields populate distributional facts that gate `WooldridgeDiD` (QMLE) and `ContinuousDiD` prerequisites pre-fit.
- Three worked examples in `llms-autonomous.txt` (binary-staggered with never-treated, continuous-dose with zero baseline, count-shaped outcome). New §5 between the existing §4 reasoning and §5 post-fit validation; existing §5–§8 renumbered to §6–§9.
- New exports `OutcomeShape`, `TreatmentDoseShape` from `diff_diff`. Both new dataclasses are frozen and JSON-serializable via `PanelProfile.to_dict()`.
Methodology references (required if estimator / math changes)
- `profile_panel` is descriptive (not an estimator); the worked examples reference `CallawaySantAnna`, `WooldridgeDiD`, `ContinuousDiD`. No estimator math is changed in this PR.
- `docs/methodology/REGISTRY.md`.
- The `is_time_invariant` field uses exact distinct-count on observed non-zero doses (matching the documented contract and `ContinuousDiD.fit()`'s exact equality check). The `is_count_like` and `is_integer_valued` fields are explicitly labeled heuristics in the docstring + guide.
Validation
- New tests in `tests/test_profile_panel.py` covering each shape heuristic (count-like Poisson, binary-as-not-count-like, continuous normal, bounded unit, categorical returning None, skewness/kurtosis gating, JSON roundtrip, time-invariant dose, time-varying dose, no-zero-dose, binary/categorical treatment returning None, frozen invariants on both new dataclasses, sub-1e-8 dose-precision regression).
- 2 new content-stability tests in `tests/test_guides.py` guard the §5 worked-examples block and the new field references.
Security / privacy