From a214168eb144b9c63ba65c68fff74222c64dcfcb Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sun, 31 May 2026 16:47:37 -0400
Subject: [PATCH] WooldridgeDiD: outcome-fit hint for OLS on binary/count
 outcomes

Add a non-fatal UserWarning when WooldridgeDiD(method="ols") is used on a
binary ({0,1}) or non-negative integer-count outcome. Following Wooldridge
(2023), the warning notes that a matching nonlinear model (logit / Poisson)
is often the MORE APPROPRIATE specification for such outcomes: it imposes
parallel trends on the link/index scale rather than in levels (level-PT is
only valid for continuous/unbounded outcomes), and the paper's Section 5
simulations show the linear model both biased and less precise where the
nonlinear mean holds. It is a different identifying assumption than linear
OLS -- which fits depends on which parallel-trends restriction holds -- so
the hint is a recommended comparison, not an automatic switch or a free
efficiency upgrade. OLS remains a valid QMLE for any response (Table 1).

- _suggest_nonlinear_method: pure, non-fatal detector (binary -> logit,
  non-negative count with >2 distinct -> poisson; fractional/continuous/
  negative/non-numeric -> None). Bounded binomial-style integers are not
  separately distinguished from unbounded counts (documented heuristic).
- fit() gate 0g (OLS-only, stacklevel=2) emits the hint on the full
  outcome column before sample filtering; never alters the fit or raises
- TestOutcomeFitHint: detector units (incl. the bounded-support case) +
  gate behavior + suppression + paper-faithful framing guard
  (more-appropriate / biased / different assumption / recommended-comparison)
- Docs: REGISTRY Note (LPT-vs-IPT + Section 5 evidence + two-sided framing
  provenance), method docstring, llms-full.txt, wooldridge_etwfe.rst, CHANGELOG
- TODO: remove the Tier A + Methodology/Correctness rows (#216)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                   |   1 +
 TODO.md                        |   3 +-
 diff_diff/guides/llms-full.txt |   5 +-
 diff_diff/wooldridge.py        |  92 ++++++++++++++++++++++-
 docs/api/wooldridge_etwfe.rst  |  16 ++++
 docs/methodology/REGISTRY.md   |   2 +
 tests/test_wooldridge.py       | 130 +++++++++++++++++++++++++++++++++
 7 files changed, 244 insertions(+), 5 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index fe5c90c1..7bd213d8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 - **PowerAnalysis methodology-review-tracker promotion: In Progress → Complete, with a panel-variance correction (behavior change).** Closes the Bloom (1995) + Burlig, Preonas & Woerman (2020) source audits on the tracker (PR-A #506 added both paper reviews + under-review Notes; this PR validates the source against the code and reconciles the discrepancies). **Behavior change:** the analytical *panel* DiD variance was the Moulton design-effect factor `(1+(T−1)·rho)/T`, wrong two ways versus the source — wrong period-scaling (~4× too small at `rho=0`, `m=r=5` versus the iid DiD benchmark) and the **opposite `rho`-sign** (it *raised* the MDE as within-unit correlation grew). It is replaced by the within-unit equicorrelated special case of Burlig et al. Eq. 2, `Var(ATT) = sigma² · (1/n_T + 1/n_C) · (1/n_pre + 1/n_post) · (1 − rho)`, in which within-unit (serial) correlation *lowers* the MDE because the difference-in-differences cancels the shared within-unit component. So `PowerAnalysis.mde` / `power` / `sample_size` (and the `compute_*` wrappers) now return a **smaller** MDE / required N as `rho` rises for **all** designs; the 2×2 path matches Bloom's `2σ²` at the default `rho = 0` and is continuous with the panel form at `n_pre = n_post = 1`. New input validation, enforced for **all** designs *before* the 2×2-vs-panel router: `n_pre >= 1`, `n_post >= 1`, `rho ∈ [−1/(T−1), 1)` (`T = n_pre + n_post`), finite `sigma >= 0`, positive group counts, and `treat_frac ∈ (0, 1)` now raise `ValueError` (previously invalid two-period shapes and out-of-range `rho` fell through to `basic_did` silently). The `(1 − rho)` factor applies at `T = 2` too — the 2×2 path is Burlig's `m = r = 1` special case (footnote 11), so a nonzero `rho` is no longer silently ignored there, while `rho = 0` still recovers Bloom's `2σ²`. The MDE multiplier stays the **normal (z)** Bloom multiplier (a deliberate large-sample approximation to Burlig's t, documented as `**Deviation from R:**`) — unchanged. New `tests/test_methodology_power.py` (Bloom Table 1 multipliers; 2×2 + panel closed forms; a literal-equicorrelated Monte-Carlo validation of the panel variance; `sample_size`↔`mde` round-trip; input-guard + `rho`-at-`T=2` + `compute_*` wrapper validation; base-R `qnorm` parity at `benchmarks/data/r_power_golden.json`, generator `benchmarks/R/generate_power_golden.R`); the two `tests/test_power.py` ICC-direction tests were inverted to Burlig's sign. REGISTRY `## PowerAnalysis` equation block rewritten (z not t; corrected 2×2 / panel SE + sample-size; removed the cluster-`m` and inverted-`R²` terms that matched neither code nor source); `docs/references.rst` adds Frison & Pocock (1992) + McKenzie (2012) as the equicorrelated lineage; tutorial `06_power_analysis.ipynb` corrected. `METHODOLOGY_REVIEW.md` row promoted to **Complete** (`Last Review = 2026-05-31`); priority queue pruned; the PR-A under-review Notes removed across REGISTRY / `power.py` / `references.rst`.
+- **`WooldridgeDiD` outcome-fit hint:** `WooldridgeDiD(method="ols")` now emits a `UserWarning` when the outcome is binary (`{0, 1}`) or a non-negative integer count, noting that a matching nonlinear model (`method="logit"` / `method="poisson"`) is often the **more appropriate specification** for such outcomes. Following Wooldridge (2023): the nonlinear paths impose parallel trends on the link/index scale rather than in levels (level-PT is only valid for continuous/unbounded outcomes), and the paper's Section 5 simulations show the linear model both biased and less precise where the nonlinear mean holds. It is a **different identifying assumption** than linear OLS — which one fits depends on which parallel-trends restriction holds — so the warning frames it as a recommended comparison, not an automatic switch or free efficiency upgrade. OLS remains a valid QMLE for *any* response (Table 1). Always-on (suppress via `warnings.filterwarnings`); detection is high-signal (binary requires exactly `{0, 1}`; the count branch suggests Poisson — the natural unbounded-count model — for *any* non-negative integers with >2 distinct values, so bounded binomial / known-upper-bound integer outcomes are not separately distinguished from unbounded counts; fractional / continuous outcomes are not flagged).
 - **New tutorial: `docs/tutorials/24_staggered_vs_collapsed_power.ipynb` — "Staggered Rollout or a Simple 2×2? A Power-Analysis Decision Guide".** A practitioner walkthrough for geo experiments (framed on a 50-state staggered rollout) on when to reach for Callaway-Sant'Anna vs collapsing to a familiar pre/post 2×2. Shows, with live paired Monte Carlo on `generate_staggered_data`, that the collapsed 2×2 silently targets a *diluted* estimand (reports ~60–94% of the true effect-on-treated as the rollout staggers, with near-zero CI coverage of the truth under a slow rollout), and that CS's minimum-detectable-lift penalty is a *fast-rollout* phenomenon that shrinks to parity as the rollout becomes more staggered. Fully self-contained (runs live, no committed data files); ends with a CS-vs-2×2 decision guide.
 - **`SyntheticControl` in-space placebo permutation inference + reporting-stack integration (ADH 2010 §2.4).** New `SyntheticControlResults.in_space_placebo()` provides the significance test classic SCM lacks an analytical SE for: it reassigns treatment to each donor, refits a synthetic control for that pseudo-treated donor against the **other `J−1` donors** (the real treated unit is excluded from every placebo pool — its post-period is treatment-contaminated; matches `SCtools::generate.placebos`), and ranks the treated unit's post/pre **RMSPE ratio** among the `J+1` units. New fields `placebo_p_value` (`= rank/(n_placebos+1)`, an upper-tail rank test on the unsigned RMSPE ratio — direction-agnostic, so it detects an effect of *either* sign rather than a signed/one-directional hypothesis; ties counted via `≥`), `rmspe_ratio` (the treated statistic, set at fit), and `n_placebos`/`n_failed` (effective reference-set sizes; non-converged placebos are excluded from BOTH numerator and denominator, never penalized into the rank). `placebo_p_value` is a **separate field** from the (always-NaN) `p_value` — it is a permutation p-value with no SE/t-stat and does not flow through `safe_inference`; `is_significant` stays bound to `p_value`. Edge cases fail closed: scale-aware RMSPE-ratio floor (a perfect pre-fit gives a finite ratio, not `inf`), `J<2` → NaN+warn, `J==2` → degenerate+coarse warn, deterministic given `seed`. New `get_placebo_df()` returns the per-unit RMSPE-ratio summary table (incl. the treated row and any failed donors) used for the rank. The design keeps the placebo *compute* opt-in — the per-donor refit loop runs only on the explicit `in_space_placebo()` call. To support that opt-in call, every fit retains a `_SyntheticControlFitSnapshot` of the pivoted panel (memory O(units × periods × predictor-vars), like `SyntheticDiD`'s snapshot for `in_time_placebo`; excluded from pickling). A compact/lazy snapshot representation is tracked as a follow-up in `TODO.md`. **Reporting-stack integration:** `SyntheticControlResults` is now routed through `DiagnosticReport` (fit-based `scm_fit` parallel-trends analogue → verdict `design_enforced_pt` reading `pre_rmspe`; `_scm_native` surfaces `pre_rmspe` + donor-weight concentration + the placebo p-value when already computed — never triggering the refit loop implicitly), `practitioner_next_steps` (`_handle_synthetic_control` with the placebo as the headline significance step), and `BusinessReport` (fit-based assumption block, ADH 2010 attribution, robustness via `estimator_native_diagnostics`; HonestDiD passthrough rejected like SDiD/TROP). Also fixes a latent BR bug where the headline `is_significant` was a non-JSON-serializable numpy `bool_` when `p_value` is a numpy `NaN`. Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (new `**Note:**` labels for the donor-pool construction, failure handling, RMSPE-ratio floor, and the non-analytical-p-value split), `docs/methodology/REPORTING.md`, `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
 - **New estimator: `SyntheticControl` — classic Synthetic Control Method (Abadie, Diamond & Hainmueller 2010; Abadie & Gardeazabal 2003).** Standalone estimator (`diff_diff/synthetic_control.py`) + `SyntheticControlResults` (`diff_diff/synthetic_control_results.py`) + `synthetic_control()` convenience function, exported from `diff_diff`. Builds a single treated unit's counterfactual as a convex combination of never-treated donor units — **donor (unit) weights only**, no time weights or ridge, distinct from `SyntheticDiD`. The inner simplex-constrained weighted-LS solve `W*(V)` reuses `utils._sc_weight_fw` (folding `V^½` into the predictor matrix, `intercept=False`, `zeta=0`); the diagonal predictor-importance matrix `V` is selected data-driven by minimizing pre-period outcome MSPE (`v_method="nested"`, softmax-on-simplex multistart Nelder-Mead + Powell polish) or supplied by the user (`v_method="custom"`). Predictors are built from `predictors`/`predictor_window`/`predictors_op`, `special_predictors`, and per-period outcome lags (`pre_period_outcomes`), in the R `Synth::dataprep` row order; per-row standardization (SD over donors+treated, ddof=1) matches the R `Synth::synth` source. Reports the gap path (`α̂_1t = Y_1t − Σ_j w_j Y_jt`), `att` (mean post-period gap), `pre_rmspe`, donor weights, `v_weights`, and a predictor-balance table. **No analytical standard error** — `se`/`t_stat`/`p_value`/`conf_int` are NaN; significance comes from in-space placebo permutation inference via `in_space_placebo()` (see the dedicated entry below). Ten validation gates baked in: predictor-period leakage, absorbing post-period suffix + no-anticipation cross-check against the treatment column, post-period canonicalization, donor-pool filtering before period derivation, empty-window rejection, poor-pre-fit `UserWarning` (RMSPE > SD of treated pre-outcomes), duplicate-predictor-label rejection, inner-solve non-convergence warning, order-independent gap-path rebuild, and the `standardize="none"` deviation; plus fail-closed `custom_v` cross-field rules and degenerate single-donor / single-pre-period handling. **R-`Synth` parity** (`tests/test_methodology_synthetic_control.py`, fixtures generated by `benchmarks/R/generate_synth_basque_golden.R` into `tests/data/`): two-tier on the Basque Country study — Tier-1 feeds R's `solution.v` via `custom_v` and reproduces the published donor weights (region 10 Cataluña 0.851 + region 14 Madrid 0.149) to `atol=1e-3` deterministically; Tier-2 (`@pytest.mark.slow`) checks the data-driven nested fit lands in a tolerance band (the nested `V` legitimately differs because the outer objective uses all pre periods, not R's `time.optimize.ssr` window). Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (with `**Deviation from R:** standardize="none"` and `**Note:**` labels for the standardization formula, objective window, softmax `V` parametrization, and 1×SD poor-fit threshold), `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
diff --git a/TODO.md b/TODO.md
index 013513b4..17fed4c2 100644
--- a/TODO.md
+++ b/TODO.md
@@ -100,7 +100,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | WooldridgeDiD: unconditional inference for `aggregate(weights="cohort_share")` accounting for sampling uncertainty in the cohort shares ω̂_g / ω̂_{ge} (paper W2025 Section 7.5). Current impl fail-closes the t-stat / p-value / conf-int fields to NaN under cohort-share aggregation because the analytical SE is conditional-on-shares. Proper APE/GMM-style aggregate inference (Wooldridge 2023 Section 4 framework) re-enables full inference. | `wooldridge_results.py::aggregate` cohort_share inference branch | PR-B follow-up | Medium |
 | WooldridgeDiD: `cohort_trends=True` + `survey_design` composition. PR-B Stage E fail-closes the cross-product with `NotImplementedError` at `fit()` because the full-dummy `dg_i · t` design composed with the survey TSL variance hasn't been validated against R-parity goldens. Follow-up: validate the composition (or implement a survey-aware alternative) and re-enable the surface. | `wooldridge.py` fit guard, `wooldridge_results.py::aggregate` (if survey-aware cohort_trends variance plumbing is added) | PR-B follow-up | Low |
 | WooldridgeDiD: `cohort_trends=True` + `control_group="never_treated"` composition. PR-B Stage E (codex R9 P1 fix) fail-closes the cross-product with `NotImplementedError` at `fit()` because the OLS + never_treated branch emits ALL `(g, t)` cells as treatment-cell dummies (paper Section 4.4 placebo coverage); the appended `dg_i · t` trend columns are linearly spanned by the per-cohort sum of those cell dummies, so the Section 8 trend specification is unidentified. Follow-up: implement a separate design-matrix branch that drops the pre-treatment placebo dummies (or restricts the trend interaction to post-treatment cells) under the trend specification, then re-enable the combination. | `wooldridge.py` fit guard + `_build_interaction_matrix` redesign for the cohort_trends path | PR-B follow-up | Low |
-| WooldridgeDiD: optional *efficiency hint* (NOT a canonical-link violation per W2023 Prop 3.1) when method/outcome pairing is sub-optimal — e.g., `method="ols"` on binary data is consistent under QMLE, but `method="logit"` is typically more efficient. The original framing in this row as a "canonical link requirement" tied to Prop 3.1 was incorrect: Wooldridge (2023) Table 1 lists Gaussian/OLS for "any response" and logistic-Bernoulli for "binary OR fractional". A useful hint exists (efficiency), but should not be framed as a methodology violation. See PR #453 R1 review for the corrected reading. | `wooldridge.py` | #216 | Low |
 | WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
 | PreTrendsPower: CS/SA `anticipation=1` R-parity fixture. The PR-C R-parity goldens cover NIS power + γ_p MDV at `atol=1e-4` on four shifted-grid / regular / irregular / K=1 fixtures, but R `pretrends` has no anticipation parameter so the Python-side `_extract_pre_period_params` anticipation filter (`if t < _pre_cutoff` in `pretrends.py` lines 1138-1150 for CS; mirror in SA branch) is not R-parity-locked. Build a synthetic `CallawaySantAnnaResults` (or `SunAbrahamResults`) with `anticipation=1` and a t=-1 event-study entry that should be filtered before reaching `_compute_power_nis`, then assert the resulting γ_p matches R's `slope_for_power()` on the K=4 shifted-grid fixture. Existing PR-B MC-based tests (`TestPretrendsPropositions`) and full-VCV tests (`TestPretrendsCovarianceSource`) already cover the filter mechanically; this would close the loop against R. | `tests/test_methodology_pretrends.py::TestPretrendsParityR`, `benchmarks/R/generate_pretrends_golden.R` | PR-C follow-up | Low |
 | `StackedDiD` `vcov_type="conley"` — deferred for a **methodology** reason, NOT plumbing (unlike the now-shipped SunAbraham / WooldridgeDiD-OLS conley threading): the stacked design replicates each control unit across every sub-experiment it qualifies for (`_build_sub_experiment`), so one geographic unit occupies many stacked rows. Conley's pairwise distance matrix would see those same-unit copies at distance 0 (`K(0)=1`, perfectly correlated), conflating the stacking-replication device with real spatial correlation, and there is no `conleyreg` analogue for stacked DiD to anchor parity. A correct treatment needs a per-stack spatial identifier and is **paper-gated**. | `diff_diff/stacked_did.py` | follow-up | Low |
@@ -186,7 +185,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 
 #### Tier A — Quick wins (≤1 day, ≤3 CI rounds expected)
 
-- WooldridgeDiD: optional efficiency hint when method/outcome pairing is sub-optimal (NOT a canonical-link violation per W2023 Prop 3.1 — see Methodology/Correctness row for the corrected framing)
+_(No active items. The sole prior entry — the WooldridgeDiD method/outcome efficiency hint — has shipped; see CHANGELOG `## [Unreleased]` and REGISTRY §WooldridgeDiD "Nonlinear extensions".)_
 
 (SyntheticDiD `placebo_effects` → `variance_effects` rename moved to Tier B — the user-facing field rename + one-release deprecation alias is too large for ≤1 day / ≤3 CI rounds.)
 
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index b9a54e21..c0e2caf9 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -1084,7 +1084,10 @@ results.aggregate("event")     # ATT by event time (relative to treatment)
 ```python
 from diff_diff import WooldridgeDiD
 
-# OLS (linear outcomes)
+# OLS (linear outcomes). On a binary/count outcome this emits a UserWarning that
+# logit/poisson is often the more appropriate specification -- link-scale (not
+# level) parallel trends, and less biased/more precise in Wooldridge (2023) sims;
+# a different identifying assumption, so a recommended comparison not a free switch.
 etwfe = WooldridgeDiD(method='ols')
 results = etwfe.fit(data, outcome='y', unit='id', time='t', cohort='first_treat')
 print(results.aggregate("simple"))
diff --git a/diff_diff/wooldridge.py b/diff_diff/wooldridge.py
index fd977bc2..d6597f5d 100644
--- a/diff_diff/wooldridge.py
+++ b/diff_diff/wooldridge.py
@@ -131,6 +131,45 @@ def _warn_and_fill_nan_cohort(df: pd.DataFrame, cohort: str, stacklevel: int) ->
     return df
 
 
+def _suggest_nonlinear_method(outcome_values) -> Optional[str]:
+    """Outcome-fit hint for ``WooldridgeDiD(method="ols")``.
+
+    Return ``"logit"`` when the outcome support is binary (exactly ``{0, 1}``)
+    or ``"poisson"`` when it is a non-negative integer count (>2 distinct
+    values), else ``None``.
+
+    OLS/Gaussian QMLE is a valid method for any response (Wooldridge 2023
+    Table 1), but for binary / count outcomes a matching nonlinear model
+    (logit / Poisson) is often the *more appropriate* specification: it imposes
+    parallel trends on the link/index scale rather than in levels (level-PT is
+    implausible for such limited outcomes), and where that nonlinear mean holds
+    Wooldridge's (2023) simulations show the linear model is both biased and
+    less precise. It rests on a *different identifying assumption* — which one
+    fits depends on which parallel-trends restriction holds — so this is a
+    recommended comparison, not a free upgrade and not a canonical-link
+    requirement.
+
+    Pure and non-fatal: non-numeric / all-non-finite / empty inputs return
+    ``None`` rather than raising, so a fit that would otherwise succeed is
+    never broken by the hint.
+    """
+    try:
+        y = np.asarray(outcome_values, dtype=float)
+    except (TypeError, ValueError):
+        return None
+    y = y[np.isfinite(y)]
+    if y.size == 0:
+        return None
+    uniq = np.unique(y)
+    # Binary: support is exactly {0, 1} (both present) -> Bernoulli QMLE (logit).
+    if uniq.size == 2 and np.all((uniq == 0) | (uniq == 1)):
+        return "logit"
+    # Count: non-negative integers with >2 distinct values -> Poisson QMLE.
+    if uniq.size > 2 and np.all(y >= 0) and np.all(y == np.floor(y)):
+        return "poisson"
+    return None
+
+
 def _filter_sample(
     data: pd.DataFrame,
     unit: str,
@@ -275,8 +314,18 @@ class WooldridgeDiD:
     Parameters
     ----------
     method : {"ols", "logit", "poisson"}
-        Estimation method. "ols" for continuous outcomes; "logit" for binary
-        or fractional outcomes; "poisson" for count data.
+        Estimation method. ``"ols"`` is the linear baseline — valid for any
+        response (Wooldridge 2023) and the usual choice for continuous outcomes;
+        ``"logit"`` for binary or fractional outcomes; ``"poisson"`` for count
+        data. When ``method="ols"`` is used on a binary (``{0, 1}``) or
+        non-negative integer-count outcome, a ``UserWarning`` notes that a
+        matching nonlinear model (logit / Poisson) is often the *more
+        appropriate* specification — it imposes parallel trends on the link
+        scale rather than in levels, and Wooldridge's (2023) simulations show
+        the linear model both biased and less precise for such outcomes when
+        the nonlinear mean holds. It rests on a different identifying assumption
+        than linear OLS, so it is a recommended comparison, not an automatic
+        switch; suppress via ``warnings.filterwarnings``.
     control_group : {"not_yet_treated", "never_treated"}
         Which units serve as the comparison group.  "not_yet_treated" (jwdid
         default) uses all untreated observations at each time period;
@@ -710,6 +759,45 @@ def fit(
                     "analytical Conley SE, or vcov_type='hc1' with the bootstrap."
                 )
 
+        # 0g. Outcome-fit hint (non-fatal): method="ols" on a binary or
+        # non-negative-count outcome. For such outcomes a matching nonlinear
+        # model (logit / Poisson) is often the MORE APPROPRIATE specification —
+        # it imposes parallel trends on the link/index scale, not in levels
+        # (Wooldridge 2023: level-PT is implausible for limited outcomes), and
+        # where the nonlinear mean holds its sims show the linear model both
+        # biased and less precise. A DIFFERENT identifying assumption than
+        # linear OLS (which fits depends on which PT holds), so the hint is a
+        # recommended comparison, not a free upgrade (see REGISTRY.md
+        # §WooldridgeDiD "Nonlinear extensions"). Detection reads the FULL
+        # outcome column: outcome *type* is a property of the variable, so the
+        # named column is the right signal. This is also identical to the
+        # estimation sample here — _filter_sample expresses the control-group
+        # choice through the design matrix, NOT by dropping rows, so the column
+        # and the fitted sample share the same outcome support (invariant pinned
+        # by test_filter_sample_preserves_outcome_support); classifying on the
+        # variable stays correct should that ever change.
+        if self.method == "ols":
+            suggested = _suggest_nonlinear_method(df[outcome])
+            if suggested is not None:
+                kind = "binary" if suggested == "logit" else "count"
+                warnings.warn(
+                    f"WooldridgeDiD(method='ols'): outcome {outcome!r} looks "
+                    f"{kind}. For a {kind} outcome a matching nonlinear model "
+                    f"(method={suggested!r}) is often the more appropriate "
+                    f"specification — it imposes parallel trends on the "
+                    f"link/index scale, not in levels (Wooldridge 2023: "
+                    f"level-PT is implausible for limited outcomes), and where "
+                    f"that nonlinear mean holds its simulations show the linear "
+                    f"model both biased and less precise. It is a different "
+                    f"identifying assumption than linear OLS (which fits depends "
+                    f"on which parallel-trends restriction holds), so treat "
+                    f"method={suggested!r} as a recommended comparison, not an "
+                    f"automatic switch; suppress this hint via "
+                    f"warnings.filterwarnings.",
+                    UserWarning,
+                    stacklevel=2,
+                )
+
         # 1. Filter to analysis sample
         sample = _filter_sample(df, unit, time, cohort, self.control_group, self.anticipation)
 
diff --git a/docs/api/wooldridge_etwfe.rst b/docs/api/wooldridge_etwfe.rst
index 6142f185..cd066fa3 100644
--- a/docs/api/wooldridge_etwfe.rst
+++ b/docs/api/wooldridge_etwfe.rst
@@ -105,6 +105,22 @@ Basic OLS (follows Stata ``jwdid y, ivar(unit) tvar(time) gvar(cohort)``)::
     print(r.summary('group'))
     print(r.summary('simple'))
 
+.. note::
+
+   When ``method="ols"`` is applied to a binary (``{0, 1}``) or non-negative
+   integer-count outcome, ``fit()`` emits a ``UserWarning`` noting that a
+   matching nonlinear model (``method="logit"`` / ``method="poisson"``) is often
+   the *more appropriate* specification for such outcomes — it imposes parallel
+   trends on the link/index scale rather than in levels (Wooldridge 2023 notes
+   level-PT is only valid for continuous/unbounded outcomes), and in that
+   paper's simulations the linear model is both biased and less precise where
+   the nonlinear mean holds. It rests on a *different identifying assumption*
+   than linear OLS, so treat it as a recommended comparison, not an automatic
+   switch. OLS remains a valid QMLE for *any* response (Wooldridge 2023);
+   suppress the hint via ``warnings.filterwarnings``. The check is heuristic:
+   bounded discrete (binomial-style) outcomes with a known upper bound are not
+   separately detected from unbounded counts.
+
 View cohort×time cell estimates (post-treatment)::
 
     for (g, t), v in sorted(r.group_time_effects.items()):
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index a6256831..38c8022f 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -1579,6 +1579,8 @@ Average Structural Function (ASF) approach. For each treated cell (g, t):
 where `g(·)` is the link inverse (logistic or exp), `η_i` is the individual linear predictor
 (fixed effects + controls), and `δ_{g,t}` is the interaction coefficient from the nonlinear model.
 
+- **Note (outcome-fit hint):** When `method="ols"` is used on a **binary** (`{0,1}`) or a **non-negative integer count** outcome, `WooldridgeDiD.fit()` emits a `UserWarning` noting that a matching nonlinear model (`method="logit"` / `method="poisson"`) is often the **more appropriate specification** for such outcomes. The framing is the paper's, not an efficiency heuristic: the nonlinear paths impose parallel trends on the **link/index scale** (logit index / Poisson log-mean) rather than in **levels**, and Wooldridge (2023) states the **linear-PT assumption is only valid for continuous/unbounded outcomes** (Eq. 2.5 vs the Index-PT Eq. 2.6–2.7). In the paper's **Section 5 simulations** the linear model is both **biased** (POLS −0.15 to −0.29 for binary; OLS >30%, >50% in places, for counts) **and less precise** (nonlinear SEs ~30–70% smaller) where the nonlinear mean holds, and the paper notes pre-trends tests often **fail to detect** the linear model's misspecification. This is a **different identifying assumption** than linear OLS — which one fits depends on which parallel-trends restriction holds — so the warning frames the nonlinear model as a **recommended comparison, not an automatic switch or a free upgrade**. Per **Table 1**, Gaussian/OLS remains a valid QMLE for *any* response, so the hint never asserts OLS is mechanically wrong. **Framing provenance (two-sided):** the earlier "canonical link requirement tied to Prop 3.1" reading was **incorrect** (corrected in **PR #453 R1**); the warning is equally careful *not* to over-correct into framing the nonlinear path as a pure efficiency add-on, since the paper's case is primarily about **bias / appropriateness** (codex R1, reconciled against `docs/methodology/papers/wooldridge-2023-review.md` §5). Detection is deliberately high-signal — binary requires support exactly `{0,1}`; the count branch suggests Poisson for non-negative integers with `>2` distinct values (Poisson is the natural *unbounded* nonnegative/count model per Table 1; bounded-support / **binomial** outcomes with a known upper bound are not separately detected — that routing would need user-supplied metadata). Fractional / bounded-continuous outcomes are **not flagged by this heuristic** (fractional responses are themselves a logistic/Bernoulli case in Table 1; the heuristic simply does not attempt to detect them — a coverage limit, not a claim that OLS is preferred for such outcomes). **Detection sample:** the hint reads the *full* outcome column passed to `fit()` — outcome *type* is a property of the variable, so the named column is the right signal. In this estimator that column is also identical to the estimation sample: `WooldridgeDiD._filter_sample` expresses the control-group choice through the design matrix (`_build_interaction_matrix`), **not** by dropping rows, so the full column and the fitted sample always share the same outcome support (invariant pinned by `tests/test_wooldridge.py::TestOutcomeFitHint::test_filter_sample_preserves_outcome_support`). Classifying on the variable keeps the suggestion methodologically correct (a count is a count regardless of which rows enter estimation) should that selection ever change. Always-on (suppress via `warnings.filterwarnings`), implemented as the pure helper `_suggest_nonlinear_method` + the OLS-only fit-time gate `0g`; it never alters the fit or raises.
+
 *Standard errors:*
 - OLS: Cluster-robust sandwich estimator at the unit level (default)
 - Logit/Poisson: QMLE sandwich `(X'WX)^{-1} meat (X'WX)^{-1}` via `compute_robust_vcov(..., weights=w, weight_type="aweight")` where `w = p_i(1-p_i)` for logit or `w = μ_i` for Poisson
diff --git a/tests/test_wooldridge.py b/tests/test_wooldridge.py
index 554d4147..e5264a45 100644
--- a/tests/test_wooldridge.py
+++ b/tests/test_wooldridge.py
@@ -11,6 +11,7 @@
     _build_interaction_matrix,
     _filter_sample,
     _prepare_covariates,
+    _suggest_nonlinear_method,
 )
 from diff_diff.wooldridge_results import WooldridgeDiDResults
 
@@ -2221,3 +2222,132 @@ def _raise(X, cluster_ids, bread, contrasts):
             assert np.isnan(eff["p_value"])
             assert np.isnan(eff["conf_int"][0])
             assert np.isnan(eff["conf_int"][1])
+
+
+class TestOutcomeFitHint:
+    """method='ols' outcome-fit hint: binary -> logit, count -> poisson.
+
+    Per Wooldridge (2023), a matching nonlinear model is often the *more
+    appropriate* specification for binary/count outcomes -- link-scale (not
+    level) parallel trends, and less biased / more precise in the paper's
+    Section 5 simulations. A different identifying assumption, so a recommended
+    comparison, never a free efficiency upgrade or a canonical-link / validity
+    requirement. See REGISTRY.md WooldridgeDiD "Nonlinear extensions".
+    """
+
+    # ---- detector unit tests ----
+    @pytest.mark.parametrize(
+        "values, expected",
+        [
+            ([0, 1, 1, 0, 1], "logit"),  # binary {0, 1}
+            ([0.0, 1.0, 0.0], "logit"),  # binary as floats
+            (pd.Series([True, False, True]), "logit"),  # bool dtype
+            ([0, 1, 2, 3, 4], "poisson"),  # count
+            ([1, 2, 3], "poisson"),  # count without zero
+            ([0, 1, 2, 3], "poisson"),  # bounded-support integer routes to poisson too
+            # ^ documented heuristic limit: a known-upper-bound (binomial-style)
+            #   integer outcome is NOT separately distinguished from an unbounded
+            #   count -- both take the poisson branch (Wooldridge 2023 Table 1).
+            ([0.1, 0.5, 1.7, 2.3], None),  # continuous
+            ([0.1, 0.4, 0.9], None),  # fractional in [0, 1]
+            ([-1, 0, 1, 2], None),  # has a negative
+            ([np.nan, np.nan], None),  # all non-finite
+            ([1.0, 1.0, 1.0], None),  # constant / single value
+            (pd.Series(["a", "b", "c"], dtype=object), None),  # non-numeric
+        ],
+    )
+    def test_detector(self, values, expected):
+        assert _suggest_nonlinear_method(values) == expected
+
+    # ---- gate behavior ----
+    def _binary_panel(self):
+        df = _make_panel(seed=1)
+        rng = np.random.default_rng(1)
+        df["y"] = rng.integers(0, 2, len(df)).astype(float)
+        return df
+
+    def _count_panel(self):
+        df = _make_panel(seed=2)
+        rng = np.random.default_rng(2)
+        df["y"] = rng.integers(0, 6, len(df)).astype(float)
+        return df
+
+    @staticmethod
+    def _fit(df, **kwargs):
+        return WooldridgeDiD(**kwargs).fit(df, "y", "unit", "time", "cohort")
+
+    @staticmethod
+    def _hint_msgs(rec):
+        return [str(w.message) for w in rec if "matching nonlinear model" in str(w.message)]
+
+    def test_ols_binary_warns_logit(self):
+        df = self._binary_panel()
+        with pytest.warns(UserWarning, match=r"method='logit'.*more appropriate"):
+            res = self._fit(df, method="ols")
+        assert np.isfinite(res.overall_att)
+
+    def test_ols_count_warns_poisson(self):
+        df = self._count_panel()
+        with pytest.warns(UserWarning, match=r"method='poisson'.*more appropriate"):
+            self._fit(df, method="ols")
+
+    def test_ols_continuous_silent(self):
+        df = _make_panel(seed=3)  # default y is standard-normal continuous
+        with warnings.catch_warnings(record=True) as rec:
+            warnings.simplefilter("always")
+            self._fit(df, method="ols")
+        assert not self._hint_msgs(rec)
+
+    def test_logit_binary_no_hint(self):
+        # The hint is OLS-only; a logit fit on binary data must not emit it.
+        df = self._binary_panel()
+        with warnings.catch_warnings(record=True) as rec:
+            warnings.simplefilter("always")
+            self._fit(df, method="logit")
+        assert not self._hint_msgs(rec)
+
+    def test_hint_fires_with_cohort_trends(self):
+        # Param-interaction smoke: hint still fires on the cohort_trends path.
+        df = self._binary_panel()
+        with pytest.warns(UserWarning, match=r"more appropriate"):
+            self._fit(df, method="ols", cohort_trends=True)
+
+    def test_suppression_via_filterwarnings(self):
+        df = self._binary_panel()
+        with warnings.catch_warnings():
+            warnings.filterwarnings("ignore", category=UserWarning)
+            res = self._fit(df, method="ols")  # must not raise
+        assert np.isfinite(res.overall_att)
+
+    def test_framing_paper_faithful(self):
+        # Lock the paper-faithful framing (Wooldridge 2023: LPT vs IPT + the
+        # Section 5 simulations). The nonlinear model is the *more appropriate*
+        # specification for binary/count outcomes — it reduces bias (not only
+        # variance) and rests on a *different identifying assumption* (link-scale
+        # vs level parallel trends): a recommended comparison, never an
+        # unconditional efficiency upgrade and never a "violation" of OLS.
+        df = self._binary_panel()
+        with warnings.catch_warnings(record=True) as rec:
+            warnings.simplefilter("always")
+            self._fit(df, method="ols")
+        msgs = self._hint_msgs(rec)
+        assert msgs
+        text = msgs[0].lower()
+        assert "more appropriate" in text  # appropriateness, not only efficiency
+        assert "biased" in text  # the paper's bias finding, not just precision
+        assert "assumption" in text  # a different identifying assumption
+        assert "recommended comparison" in text  # a comparison, not a switch
+        # Never frame OLS as wrong / a link requirement or an automatic upgrade.
+        for forbidden in ("canonical", "violation", "required", "must use"):
+            assert forbidden not in text
+
+    def test_filter_sample_preserves_outcome_support(self):
+        # Invariant behind full-column detection: _filter_sample selects the
+        # control group via the design matrix, NOT by dropping rows, so the
+        # full outcome column and the estimation sample always share the same
+        # support -- the pre/post-filter detection distinction is therefore
+        # moot. A future refactor that drops rows would surface here.
+        df = self._count_panel()
+        for cg in ("not_yet_treated", "never_treated"):
+            sample = _filter_sample(df, "unit", "time", "cohort", cg, 0)
+            assert sorted(sample["y"].unique()) == sorted(df["y"].unique())