Merged
2 changes: 2 additions & 0 deletions README.md
@@ -82,6 +82,7 @@ Measuring campaign lift? Evaluating a product launch? diff-diff handles the caus
- **[Which method fits my problem?](docs/practitioner_decision_tree.rst)** - Start from your business scenario (campaign in some markets, staggered rollout, survey data) and find the right estimator
- **[Getting started for practitioners](docs/practitioner_getting_started.rst)** - End-to-end walkthrough: marketing campaign -> causal estimate -> stakeholder-ready result
- **[Brand awareness survey tutorial](docs/tutorials/17_brand_awareness_survey.ipynb)** - Full example with complex survey design, brand funnel analysis, and staggered rollouts
- **Have BRFSS/ACS/CPS individual records?** Use [`aggregate_survey()`](docs/api/prep.rst) to roll respondent-level microdata into a geographic-period panel with inverse-variance precision weights. The returned second-stage design uses analytic weights (`aweight`), so it works directly with `DifferenceInDifferences`, `TwoWayFixedEffects`, `MultiPeriodDiD`, `SunAbraham`, `ContinuousDiD`, and `EfficientDiD` (estimators marked **Full** in the [survey support matrix](docs/choosing_estimator.rst))

Already know DiD? The [academic quickstart](docs/quickstart.rst) and [estimator guide](docs/choosing_estimator.rst) cover the full technical details.

@@ -106,6 +107,7 @@ Already know DiD? The [academic quickstart](docs/quickstart.rst) and [estimator
- **Pre-trends power analysis**: Roth (2022) minimum detectable violation (MDV) and power curves for pre-trends tests
- **Power analysis**: MDE, sample size, and power calculations for study design; simulation-based power for any estimator
- **Data prep utilities**: Helper functions for common data preparation tasks
- **Survey microdata aggregation**: `aggregate_survey()` rolls individual-level survey data (BRFSS, ACS, CPS, NHANES) into geographic-period panels with design-based precision weights for second-stage DiD
- **Validated against R**: Benchmarked against `did`, `synthdid`, and `fixest` packages (see [benchmarks](docs/benchmarks.rst))

## Estimator Aliases
10 changes: 1 addition & 9 deletions ROADMAP.md
@@ -102,7 +102,7 @@ Parallel track targeting data science practitioners — marketing, product, oper
| **B3a.** `BusinessReport` class — plain-English summaries, markdown export; rich export via optional `[reporting]` extra | HIGH | Not started |
| **B3b.** `DiagnosticReport` — unified diagnostic runner with plain-English interpretation. Includes making `practitioner_next_steps()` context-aware (substitute actual column names from fitted results into code snippets instead of generic placeholders). | HIGH | Not started |
| **B3c.** Practitioner data generator wrappers (thin wrappers around existing generators with business-friendly names) | MEDIUM | Not started |
| **B3d.** `survey_aggregate()` helper (see [Survey Aggregation Helper](#future-survey-aggregation-helper)) | MEDIUM | Not started |
| **B3d.** `aggregate_survey()` helper (microdata-to-panel bridge for BRFSS/ACS/CPS) | MEDIUM | Shipped (v3.0.1) |

### Phase B4: Platform (Longer-term)

@@ -116,14 +116,6 @@ Parallel track targeting data science practitioners — marketing, product, oper

---

## Future: Survey Aggregation Helper

**`survey_aggregate()` helper function** for the microdata-to-panel workflow. Bridges individual-level survey data (BRFSS, ACS, CPS) collected as repeated cross-sections to geographic-level (state, city) panel DiD. Computes design-based cell means and precision weights that estimators can consume directly.

Also cross-referenced as **B3d** — directly enables the practitioner survey tutorial workflow beyond the original academic framing.

---

## Future Estimators

### de Chaisemartin-D'Haultfouille Estimator
6 changes: 5 additions & 1 deletion TODO.md
@@ -60,6 +60,7 @@ Deferred items from PR reviews that were not addressed before merge.
| ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
| Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
| EfficientDiD `control_group="last_cohort"` trims at `last_g - anticipation` but REGISTRY says `t >= last_g`. With `anticipation=0` (default) these are identical. With `anticipation>0`, code is arguably more conservative (excludes anticipation-contaminated periods). Either align REGISTRY with code or change code to `t < last_g` — needs design decision. | `efficient_did.py` | #230 | Low |
| `aggregate_survey()` returns `SurveyDesign(weight_type="aweight")`, but most modern staggered estimators (`CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `StackedDiD`, `TripleDifference`, `StaggeredTripleDifference`, `SyntheticDiD`, `TROP`, `WooldridgeDiD`) reject `aweight`. The microdata-to-staggered-DiD workflow currently requires a manual second-stage `SurveyDesign` reconstruction for these estimators. Investigate whether `aggregate_survey()` should offer an opt-in `output_weight_type="pweight"` mode (statistically dubious but practically useful) or whether the manual workaround should be documented prominently. | `diff_diff/prep.py` | #288 | Medium |
| TripleDifference power: `generate_ddd_data` is a fixed 2×2×2 cross-sectional DGP — no multi-period or unbalanced-group support. Add a `generate_ddd_panel_data` for panel DDD power analysis. | `prep_dgp.py`, `power.py` | #208 | Low |
| Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
| Survey-weighted Silverman bandwidth in EfficientDiD conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std for bandwidth selection; survey-weighted statistics would better reflect the population distribution but is a second-order refinement | `efficient_did_covariates.py` | — | Low |
@@ -88,9 +89,12 @@ Deferred items from PR reviews that were not addressed before merge.
|-------|----------|----|----------|
| R comparison tests spawn separate `Rscript` per test (slow CI) | `tests/test_methodology_twfe.py:294` | #139 | Low |
| CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
| ~376 `duplicate object description` Sphinx warnings — restructure `docs/api/*.rst` to avoid duplicate `:members:` + `autosummary` | `docs/api/*.rst` | — | Low |
| ~1583 `duplicate object description` Sphinx warnings — restructure `docs/api/*.rst` to avoid duplicate `:members:` + `autosummary` (count grew from ~376 as API surface expanded) | `docs/api/*.rst` | — | Low |
| Doc-snippet smoke tests only cover `.rst` files; `.txt` AI guides outside CI validation | `tests/test_doc_snippets.py` | #239 | Low |
| Add CI validation for `docs/doc-deps.yaml` integrity (stale paths, unmapped source files) | `docs/doc-deps.yaml` | #269 | Low |
| Sphinx autodoc fails to import 3 result members: `DiDResults.ci`, `MultiPeriodDiDResults.att`, `CallawaySantAnnaResults.aggregate` — investigate whether these are renamed/removed or just unresolvable from autosummary template | `docs/api/results.rst`, `docs/api/staggered.rst` | — | Medium |
| `EDiDBootstrapResults` cross-reference is ambiguous — class is exported from both `diff_diff` and `diff_diff.efficient_did_bootstrap`, producing 3 "more than one target found" warnings. Add `:noindex:` to one source or use full-path refs | `diff_diff/efficient_did_results.py`, `docs/api/efficient_did.rst` | — | Low |
| Tracked Sphinx autosummary stubs in `docs/api/_autosummary/*.rst` are stale — every sphinx build regenerates them with new attributes (e.g., `coef_var`, `survey_metadata`) that have been added to result classes. Either commit a refresh or move the directory to `.gitignore` and treat as build output. Also 6 untracked stubs exist for newer estimators (`WooldridgeDiD`, `SimulationMDEResults`, etc.) that have never been committed. | `docs/api/_autosummary/` | — | Low |

---

4 changes: 2 additions & 2 deletions docs/business-strategy.md
@@ -329,7 +329,7 @@ The project has an existing ROADMAP.md covering Phase 10 (survey academic credib

**Directly subsumed items:**
- **10g. "Practitioner guidance: when does survey design matter?"** -- this becomes part of the business tutorials and Getting Started guide. No longer a standalone item.
- **survey_aggregate() helper** -- the microdata-to-panel workflow helper is directly relevant for Persona A (survey data from BRFSS/ACS -> geographic panel). Should be prioritized alongside business tutorials.
- **`aggregate_survey()` helper** -- shipped in v3.0.1. The microdata-to-panel workflow helper is in place for Persona A (survey data from BRFSS/ACS -> geographic panel). Practitioner-facing tutorials should reference it.

**Reprioritized by business use cases:**
- **de Chaisemartin-D'Haultfouille (reversible treatments)** -- marketing interventions frequently switch on/off (seasonal campaigns, promotions). This estimator becomes higher priority for business DS than for academics. Should move up in the roadmap.
@@ -373,7 +373,7 @@ Tutorials in priority order (ship incrementally, not all at once):
12. `BusinessReport` class (3a) -- core uses only numpy/pandas/scipy; rich export via optional `[reporting]` extra
13. `DiagnosticReport` descriptive assessment (3b)
14. Business data generator wrappers (3c)
15. `survey_aggregate()` helper from existing roadmap -- directly enables the survey tutorial workflow
15. ~~`survey_aggregate()` helper from existing roadmap~~ -- shipped in v3.0.1 as `aggregate_survey()`; directly enables the survey tutorial workflow

### Phase 4: Platform (Longer-term)
*Goal: Integrate into business DS workflows*
16 changes: 16 additions & 0 deletions docs/choosing_estimator.rst
@@ -603,6 +603,22 @@ All estimators accept an optional ``survey_design`` parameter in ``fit()``.
Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance
estimation. The depth of support varies by estimator:

.. note::

If your data starts as **individual-level survey microdata** (e.g., BRFSS,
ACS, CPS, NHANES respondent records), use :func:`~diff_diff.aggregate_survey`
as a preprocessing step. It pools microdata into geographic-period cells with
inverse-variance precision weights and returns a pre-configured
:class:`~diff_diff.SurveyDesign` with ``weight_type="aweight"``. This
second-stage design is directly compatible with estimators marked **Full** in
the matrix below: :class:`~diff_diff.DifferenceInDifferences`,
:class:`~diff_diff.TwoWayFixedEffects`, :class:`~diff_diff.MultiPeriodDiD`,
:class:`~diff_diff.SunAbraham`, :class:`~diff_diff.ContinuousDiD`, and
:class:`~diff_diff.EfficientDiD`. Estimators marked **pweight only** (CS,
ImputationDiD, TwoStageDiD, StackedDiD, TripleDifference, etc.) explicitly
reject ``aweight`` and require a manually constructed second-stage
``SurveyDesign`` instead. See :doc:`api/prep` for the API reference.
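
The manual workaround for **pweight only** estimators can be sketched as
follows. This is a non-runnable sketch, not the documented API: the
precision-weight column name is a placeholder, and re-wrapping aggregated
precision weights as ``pweight`` is exactly the "statistically dubious but
practically useful" pattern flagged in the project TODO:

.. code-block:: python

   from diff_diff import CallawaySantAnna, SurveyDesign, aggregate_survey

   panel, stage2 = aggregate_survey(
       microdata, by=["state", "year"],
       outcomes="outcome", survey_design=design,
   )
   # stage2 carries weight_type="aweight", which CallawaySantAnna rejects.
   # Hypothetical workaround: wrap the aggregated precision weights in a
   # fresh pweight-typed design ("<precision weight column>" is a
   # placeholder -- inspect the panel returned by aggregate_survey):
   manual = SurveyDesign(weights="<precision weight column>",
                         weight_type="pweight")
   results = CallawaySantAnna().fit(panel, ..., survey_design=manual)
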

.. list-table::
:header-rows: 1
:widths: 25 12 18 18 18
8 changes: 8 additions & 0 deletions docs/doc-deps.yaml
@@ -550,6 +550,14 @@ sources:
docs:
- path: docs/api/prep.rst
type: api_reference
- path: docs/practitioner_getting_started.rst
type: user_guide
- path: docs/practitioner_decision_tree.rst
type: user_guide
- path: docs/choosing_estimator.rst
type: user_guide
- path: docs/survey-roadmap.md
type: roadmap

diff_diff/prep_dgp.py:
drift_risk: low
3 changes: 3 additions & 0 deletions docs/index.rst
@@ -48,6 +48,7 @@ Quick Links
- :doc:`practitioner_decision_tree` - Which method fits your business problem?
- :doc:`quickstart` - Installation and your first DiD analysis
- :doc:`choosing_estimator` - Which estimator should I use?
- :func:`~diff_diff.aggregate_survey` - Have BRFSS/ACS/CPS microdata? Bridge it to a geographic panel for DiD
- :doc:`tutorials/01_basic_did` - Hands-on basic tutorial
- :doc:`troubleshooting` - Common issues and solutions
- :doc:`r_comparison` - Coming from R?
@@ -99,6 +100,8 @@ Quick Links
tutorials/13_stacked_did
tutorials/14_continuous_did
tutorials/15_efficient_did
tutorials/16_survey_did
tutorials/16_wooldridge_etwfe

.. toctree::
:maxdepth: 1
9 changes: 9 additions & 0 deletions docs/practitioner_decision_tree.rst
@@ -233,6 +233,15 @@ Ignoring survey weights and clustering makes your confidence intervals too narro
you will be overconfident about the result. Passing a ``SurveyDesign`` to ``fit()``
corrects for this automatically.

**If your data is individual-level microdata** (e.g., BRFSS, ACS, CPS, or NHANES
respondent records), use :func:`~diff_diff.aggregate_survey` first to roll it up
to a geographic-period panel with inverse-variance precision weights. The
returned second-stage design uses ``weight_type="aweight"``, so it works with
estimators marked **Full** in the :ref:`survey-design-support` matrix (DiD,
TWFE, MultiPeriodDiD, SunAbraham, ContinuousDiD, EfficientDiD) but not with
``pweight``-only estimators like ``CallawaySantAnna`` or ``ImputationDiD``.
See :doc:`practitioner_getting_started` for an end-to-end example.

.. code-block:: python

from diff_diff import DifferenceInDifferences, SurveyDesign
36 changes: 36 additions & 0 deletions docs/practitioner_getting_started.rst
@@ -293,6 +293,42 @@ Ignoring these makes your confidence intervals too narrow.
diff-diff handles this via :class:`~diff_diff.SurveyDesign` - pass it to any estimator's
``fit()`` method.

If your data is **individual-level microdata** - one row per respondent, with
sampling weights and strata/PSU columns (BRFSS, ACS, CPS, NHANES) - use
:func:`~diff_diff.aggregate_survey` first to roll it up to a geographic-period
panel. The helper computes design-based cell means with inverse-variance
precision weights and returns a pre-configured ``SurveyDesign`` (with
``weight_type="aweight"``) for the second-stage fit. This second-stage design
works directly with estimators marked **Full** in the
:ref:`survey-design-support` matrix - notably
:class:`~diff_diff.DifferenceInDifferences`, :class:`~diff_diff.SunAbraham`,
:class:`~diff_diff.MultiPeriodDiD`, and :class:`~diff_diff.EfficientDiD`.
``pweight``-only estimators (``CallawaySantAnna``, ``ImputationDiD``, etc.)
require a manually constructed ``SurveyDesign`` instead.

.. code-block:: python

from diff_diff import aggregate_survey, SurveyDesign, SunAbraham

# 1. Describe the microdata's sampling design
design = SurveyDesign(weights="finalwt", strata="strat", psu="psu")

# 2. Roll up respondent records into a state-year panel
panel, stage2 = aggregate_survey(
microdata, by=["state", "year"],
outcomes="brand_awareness", survey_design=design,
)

# 3. Add the campaign launch year per state, then fit a modern staggered
# estimator with the pre-configured second-stage SurveyDesign:
# panel["first_treat"] = panel["state"].map(campaign_launch_year) # NaN = control
# results = SunAbraham().fit(
# panel, outcome="brand_awareness_mean",
# unit="state", time="year", first_treat="first_treat",
# survey_design=stage2,
# )
# results.print_summary()

For a complete walkthrough with brand funnel metrics and survey design corrections,
see `Tutorial 17: Brand Awareness Survey
<tutorials/17_brand_awareness_survey.ipynb>`_.
12 changes: 12 additions & 0 deletions docs/survey-roadmap.md
@@ -112,6 +112,18 @@ Files: `benchmarks/R/benchmark_realdata_*.R`, `tests/test_survey_real_data.py`,
- **10d.** Tutorial rewrite — flat-weight vs design-based comparison with known ground truth
- **10f.** WooldridgeDiD survey support — OLS, logit, Poisson paths with `pweight` + strata/PSU/FPC + TSL variance

### v3.0.1: Survey Aggregation Helper

`aggregate_survey()` (in `diff_diff.prep`) bridges individual-level survey
microdata (BRFSS, ACS, CPS, NHANES) to geographic-period panels for
second-stage DiD estimation. Computes design-based cell means and precision
weights using domain estimation (Lumley 2004 §3.4), with SRS fallback for
small cells. Returns a panel DataFrame plus a pre-configured `SurveyDesign`
for the second-stage fit. Supports both TSL and replicate-weight variance.

See `docs/api/prep.rst` for the API reference and `docs/methodology/REGISTRY.md`
for the methodology entry.
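
The aggregation step can be illustrated with a self-contained sketch in plain
pandas/numpy. This is not the library's implementation — it shows only the
SRS-fallback variance path (using a Kish effective sample size), not full
domain estimation, and all column names are illustrative:

```python
import numpy as np
import pandas as pd

def aggregate_cells(micro, by, y_col, w_col):
    """Illustrative microdata -> panel roll-up: weighted cell means plus
    inverse-variance precision weights via an SRS-style variance."""
    rows = []
    for keys, g in micro.groupby(by):
        w = g[w_col].to_numpy()
        y = g[y_col].to_numpy()
        mean = np.average(y, weights=w)                 # weighted cell mean
        var_y = np.average((y - mean) ** 2, weights=w)  # weighted variance of y
        n_eff = w.sum() ** 2 / (w ** 2).sum()           # Kish effective sample size
        rows.append((*keys, mean, var_y / n_eff))       # Var(mean) under SRS fallback
    panel = pd.DataFrame(rows, columns=[*by, "y_mean", "var_mean"])
    panel["precision_weight"] = 1.0 / panel["var_mean"]  # aweight-style weight
    return panel

# Toy respondent-level data: one row per respondent with a sampling weight.
rng = np.random.default_rng(0)
micro = pd.DataFrame({
    "state": np.repeat(["CA", "TX"], 50),
    "year": np.tile(np.repeat([2020, 2021], 25), 2),
    "y": rng.normal(10.0, 2.0, 100),
    "w": rng.uniform(0.5, 2.0, 100),
})
panel = aggregate_cells(micro, by=["state", "year"], y_col="y", w_col="w")
```

The resulting `panel` has one row per state-year cell; noisier cells (higher
`var_mean`) receive proportionally less weight in the second-stage fit.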

---

## Phase 10: Remaining Items
Expand Down