diff --git a/.claude/commands/dev-checklists.md b/.claude/commands/dev-checklists.md index a4dc87e6..0b9dac7a 100644 --- a/.claude/commands/dev-checklists.md +++ b/.claude/commands/dev-checklists.md @@ -152,8 +152,11 @@ Final checklist before approving a PR: 3. **Documentation Sync**: - [ ] Docstrings updated for all changed signatures - - [ ] README updated if user-facing behavior changes - - [ ] REGISTRY.md updated if methodology edge cases change + - [ ] `diff_diff/guides/llms.txt` updated if a new estimator/feature appears in the public API (this is the AI-agent contract; it cascades to RTD) + - [ ] `docs/api/*.rst` updated for new modules / signatures + - [ ] `docs/references.rst` updated if a new scholarly source is cited + - [ ] `README.md` updated ONLY if (a) new estimator catalog one-liner, (b) hero/badges/tagline change, or (c) top-level capability paragraph (Diagnostics & Sensitivity, Survey Support). Do NOT add usage examples, parameter tables, or per-estimator sections. + - [ ] `REGISTRY.md` updated if methodology edge cases change ## Quick Reference: Common Patterns to Check diff --git a/.claude/commands/docs-check.md b/.claude/commands/docs-check.md index 0e43d72b..7e5d925c 100644 --- a/.claude/commands/docs-check.md +++ b/.claude/commands/docs-check.md @@ -12,35 +12,54 @@ Verify that documentation is complete and includes appropriate scholarly referen The user may provide an optional argument: `$ARGUMENTS` - If empty or "all": Run all checks (including map validation) -- If "readme": Check README.md sections only -- If "refs" or "references": Check scholarly references only +- If "readme": Check the README catalog one-liner only +- If "refs" or "references": Check scholarly references in `docs/references.rst` only - If "api": Check API documentation (RST files) only - If "tutorials": Check tutorial coverage only -- If "map": Validate docs/doc-deps.yaml integrity only +- If "map": Validate `docs/doc-deps.yaml` integrity only + +## Documentation 
surface map (post 2026-04 docs refresh) + +The README is a **landing page**, not the documentation. Each estimator/feature has documentation across multiple authoritative surfaces: + +- **`diff_diff/guides/llms.txt`** - AI-agent contract (one-line catalog entry per estimator with paper citation + RTD link). Source of truth that mirrors into RTD via `html_extra_path` and into the wheel via `get_llm_guide()`. +- **`docs/api/*.rst`** - Sphinx API reference (autoclass). +- **`docs/references.rst`** - Bibliography (one entry per scholarly source, organized by sub-section). +- **`docs/tutorials/*.ipynb`** - Hands-on examples. +- **`README.md`** - **One-line catalog entry only** under `## Estimators` (or `## Diagnostics & Sensitivity` for diagnostic-class features). No usage examples, no parameter tables, no per-estimator section. ## Estimators and Required Documentation The following estimators/features MUST have documentation: -### Core Estimators (require README section + API docs + references) - -| Estimator | README Section | API RST | Reference Category | -|-----------|----------------|---------|-------------------| -| DifferenceInDifferences | "Basic Difference-in-Differences" | estimators.rst | "Difference-in-Differences" | -| TwoWayFixedEffects | "Two-Way Fixed Effects" | estimators.rst | "Two-Way Fixed Effects" | -| MultiPeriodDiD | "Multi-Period" | estimators.rst | "Multi-Period and Staggered" | -| SyntheticDiD | "Synthetic DiD" or "Synthetic Difference" | estimators.rst | "Synthetic Difference-in-Differences" | -| CallawaySantAnna | "Callaway" or "Staggered" | staggered.rst | "Multi-Period and Staggered" | -| SunAbraham | "Sun" and "Abraham" | staggered.rst | "Multi-Period and Staggered" | -| TripleDifference | "Triple Diff" or "DDD" | triple_diff.rst | "Triple Difference" | -| TROP | "TROP" or "Triply Robust" | trop.rst | "Triply Robust Panel" | -| HonestDiD | "Honest DiD" or "sensitivity" | honest_did.rst | "Honest DiD" | -| BaconDecomposition | "Bacon" or 
"decomposition" | estimators.rst | "Multi-Period and Staggered" | - -### Supporting Features (require README mention + API docs) - -| Feature | README Mention | API RST | -|---------|----------------|---------| +### Core Estimators (require llms.txt entry + README catalog line + API docs + references) + +| Estimator | llms.txt entry | README catalog | API RST | Reference Category | +|-----------|---------------|----------------|---------|-------------------| +| DifferenceInDifferences | "DifferenceInDifferences" | "DifferenceInDifferences" | estimators.rst | "Difference-in-Differences" | +| TwoWayFixedEffects | "TwoWayFixedEffects" | "TwoWayFixedEffects" | estimators.rst | "Two-Way Fixed Effects" | +| MultiPeriodDiD | "MultiPeriodDiD" | "MultiPeriodDiD" | estimators.rst | "Multi-Period and Staggered" | +| SyntheticDiD | "SyntheticDiD" | "SyntheticDiD" | estimators.rst | "Synthetic Difference-in-Differences" | +| CallawaySantAnna | "CallawaySantAnna" | "CallawaySantAnna" | staggered.rst | "Multi-Period and Staggered" | +| SunAbraham | "SunAbraham" | "SunAbraham" | staggered.rst | "Multi-Period and Staggered" | +| ImputationDiD | "ImputationDiD" | "ImputationDiD" | imputation.rst | "Multi-Period and Staggered" | +| TwoStageDiD | "TwoStageDiD" | "TwoStageDiD" | two_stage.rst | "Multi-Period and Staggered" | +| ChaisemartinDHaultfoeuille | "ChaisemartinDHaultfoeuille" | "ChaisemartinDHaultfoeuille" | chaisemartin_dhaultfoeuille.rst | "Multi-Period and Staggered" | +| EfficientDiD | "EfficientDiD" | "EfficientDiD" | efficient_did.rst | "Multi-Period and Staggered" | +| StackedDiD | "StackedDiD" | "StackedDiD" | stacked_did.rst | "Multi-Period and Staggered" | +| ContinuousDiD | "ContinuousDiD" | "ContinuousDiD" | continuous_did.rst | "Multi-Period and Staggered" | +| HeterogeneousAdoptionDiD | "HeterogeneousAdoptionDiD" | "HeterogeneousAdoptionDiD" | had.rst | "Heterogeneous Adoption (No-Untreated Designs)" | +| TripleDifference | "TripleDifference" | "TripleDifference" 
| triple_diff.rst | "Triple Difference" | +| StaggeredTripleDifference | "StaggeredTripleDifference" | "StaggeredTripleDifference" | staggered.rst | "Triple Difference" | +| WooldridgeDiD | "WooldridgeDiD" | "WooldridgeDiD" | wooldridge_etwfe.rst | "Multi-Period and Staggered" | +| TROP | "TROP" | "TROP" | trop.rst | "Triply Robust Panel" | +| HonestDiD | n/a (under `## Diagnostics and Sensitivity Analysis`) | n/a (under `## Diagnostics & Sensitivity`) | honest_did.rst | "Honest DiD" | +| BaconDecomposition | "BaconDecomposition" | "BaconDecomposition" | bacon.rst | "Multi-Period and Staggered" | + +### Supporting Features (require llms.txt mention + API docs; README mention only if landing-page-relevant) + +| Feature | llms.txt Mention | API RST | +|---------|------------------|---------| | Wild bootstrap | "wild" and "bootstrap" | utils.rst | | Cluster-robust SE | "cluster" | utils.rst | | Parallel trends | "parallel trends" | utils.rst | @@ -50,7 +69,7 @@ The following estimators/features MUST have documentation: ## Required Scholarly References -Each estimator category MUST have at least one scholarly reference in README.md: +Each estimator category MUST have at least one scholarly reference in `docs/references.rst`: ### Reference Requirements @@ -95,26 +114,49 @@ Goodman-Bacon Decomposition: Determine which checks to run based on `$ARGUMENTS`. -### 2. README Section Check +### 2. llms.txt + README Catalog Check + +For each estimator/diagnostic in the table above: -For each estimator in the table above: -1. Read README.md -2. Search for the required section/mention (case-insensitive) -3. Report missing sections +1. Read `diff_diff/guides/llms.txt` and verify the name appears under the right section: + - **Estimators** (e.g. CallawaySantAnna, SunAbraham, TROP, BaconDecomposition): under `## Estimators` + - **Diagnostics-class** (HonestDiD, and any future diagnostic-only entries): under `## Diagnostics and Sensitivity Analysis` +2. 
Read `README.md` and verify the name appears in the matching flat catalog: + - **Estimators**: in the `## Estimators` section + - **Diagnostics-class** (HonestDiD): in the `## Diagnostics & Sensitivity` section +3. Report missing entries ```bash -# Example: Check if "Callaway" appears in README -grep -i "callaway" README.md +# Extract the README ## Estimators section. Use a flag-based awk because the +# range form `awk '/^## Estimators/,/^## /'` self-terminates on the opening H2. +extract_section() { + awk -v target="$1" ' + $0 == "## " target { flag=1; next } + flag && /^## / { flag=0 } + flag { print } + ' README.md +} + +# Example: an estimator (lives in ## Estimators) +extract_section "Estimators" | grep -c 'CallawaySantAnna' + +# Example: a diagnostic (lives in ## Diagnostics & Sensitivity) +extract_section "Diagnostics & Sensitivity" | grep -c 'Honest DiD' + +# Always verify both surfaces +grep -c 'CallawaySantAnna' diff_diff/guides/llms.txt ``` +Do NOT search for per-estimator README sections - they were intentionally removed in the 2026-04 docs refresh. The README's `## Estimators` and `## Diagnostics & Sensitivity` headings are the only valid catalog surfaces. + ### 3. Scholarly References Check For each reference category: -1. Search README.md References section for required citations +1. Search `docs/references.rst` for required citations (NOT README.md - the bibliography moved out of README in the 2026-04 docs refresh) 2. Verify author names and year appear together 3. Report missing references -Check patterns (case-insensitive): +Check patterns (case-insensitive, run against `docs/references.rst`): - "Arkhangelsky.*2021" for Synthetic DiD - "Callaway.*Sant.Anna.*2021" for staggered - "Rambachan.*Roth.*2023" for Honest DiD @@ -122,6 +164,11 @@ Check patterns (case-insensitive): - "Goodman.Bacon.*2021" for Bacon decomposition - etc. +```bash +# Example +grep -i 'Arkhangelsky.*2021' docs/references.rst +``` + ### 4. 
API Documentation Check For each RST file in `docs/api/`: @@ -176,12 +223,12 @@ Generate a summary report: ``` === Documentation Completeness Check === -README Sections: - [PASS] DifferenceInDifferences - Found in "Basic Difference-in-Differences" - [PASS] CallawaySantAnna - Found in "Staggered Adoption" - [FAIL] NewEstimator - NOT FOUND +llms.txt + README Catalog: + [PASS] DifferenceInDifferences - Found in llms.txt and README Estimators catalog + [PASS] CallawaySantAnna - Found in both surfaces + [FAIL] NewEstimator - missing from llms.txt and README catalog -Scholarly References: +Scholarly References (docs/references.rst): [PASS] Synthetic DiD - Arkhangelsky et al. (2021) [PASS] Honest DiD - Rambachan & Roth (2023) [FAIL] Bacon Decomposition - Missing Goodman-Bacon (2021) diff --git a/.claude/commands/pre-merge-check.md b/.claude/commands/pre-merge-check.md index 57c07e75..4a593772 100644 --- a/.claude/commands/pre-merge-check.md +++ b/.claude/commands/pre-merge-check.md @@ -187,7 +187,9 @@ Based on your changes to: ``` ### Documentation Sync - [ ] Docstrings updated for changed function signatures -- [ ] README updated if user-facing behavior changes +- [ ] `diff_diff/guides/llms.txt` updated if the public API surface changed (AI-agent contract) +- [ ] `docs/api/*.rst` and `docs/references.rst` updated as appropriate +- [ ] `README.md` updated ONLY for landing-page-relevant changes (catalog one-liner, hero/badges/tagline, top-level capability paragraph). Per CONTRIBUTING.md, README is not the place for usage examples or per-estimator sections. 
``` #### If This Appears to Be a Bug Fix diff --git a/.claude/commands/review-plan.md b/.claude/commands/review-plan.md index ba904f14..68e36870 100644 --- a/.claude/commands/review-plan.md +++ b/.claude/commands/review-plan.md @@ -225,7 +225,7 @@ Check for **missing related changes**: - Tests for new/changed functionality - `__init__.py` export updates - `get_params()` / `set_params()` updates for new parameters -- Documentation updates (README, RST, tutorials, CONTRIBUTING.md, CLAUDE.md if design patterns change) +- Documentation updates (`diff_diff/guides/llms.txt` for new public-API surfaces, `docs/api/*.rst`, `docs/references.rst` for new citations, tutorials, CONTRIBUTING.md, CLAUDE.md if design patterns change). README updates only if the change affects the landing page (new estimator catalog one-liner, hero/badges/tagline, top-level capability paragraph) - per CONTRIBUTING.md, README is not the place for usage examples or per-estimator sections. - For bug fixes: did the plan grep for ALL occurrences of the pattern, or just the one reported? Check for **unnecessary additions**: diff --git a/.claude/commands/review-pr.md b/.claude/commands/review-pr.md index 57c3fbae..bff66a80 100644 --- a/.claude/commands/review-pr.md +++ b/.claude/commands/review-pr.md @@ -45,8 +45,10 @@ Analyze PRs across 6 dimensions: ### 4. Documentation Review - Docstrings for new/modified functions -- README updates if needed -- API documentation (RST files) +- `diff_diff/guides/llms.txt` updated if a new public-API surface landed (AI-agent contract) +- API documentation (RST files in `docs/api/`) +- `docs/references.rst` updated for new scholarly citations +- README updated ONLY for landing-page-relevant changes (catalog one-liner, hero/badges/tagline, top-level capability paragraph). Per CONTRIBUTING.md, README is not the place for usage examples or per-estimator sections. - Inline comments for complex logic ### 5. 
Performance @@ -133,7 +135,7 @@ If no PR number is provided, use AskUserQuestion to request it. ## Part 4: Documentation Assessment -[Check for docstrings, README updates, API docs as needed] +[Check docstrings, llms.txt for new public-API surfaces, API RST docs, references.rst for new citations, README only for landing-page-relevant changes] --- diff --git a/BRIEFING.md b/BRIEFING.md index 853c7a27..1ff30292 100644 --- a/BRIEFING.md +++ b/BRIEFING.md @@ -1,59 +1,80 @@ -# dcdh-by-path — Briefing - -## The ask - -Clément de Chaisemartin (dCDH author) suggested implementing the `by_path` -option from R's `did_multiplegt_dyn`. It disaggregates the dynamic event-study -by observed treatment trajectory so practitioners can compare paths like: - -- `(0,1,0,0)` — one pulse -- `(0,1,1,0)` — two periods on, then off -- `(0,1,1,1)` — three periods on, then off -- `(0,1,0,1)` vs `(0,1,1,0)` — sequencing - -Use case: "is a single pulse enough, or do you need sustained exposure?" - -## Where we stand today - -`diff_diff/chaisemartin_dhaultfoeuille.py` implements `ChaisemartinDHaultfoeuille`. - -- Supports reversible on/off treatments (the only estimator in the library - that does) -- **Currently drops multi-switch groups by default** (`drop_larger_lower=True`) — - exactly the groups `by_path` wants to keep and compare -- Stratifies by direction cohort (`DID_+`, `DID_-`, `S_g = sign(Δ)`) but not - by trajectory -- No `by_path`, `treatment_path`, or path-enumeration code exists anywhere -- Not on ROADMAP.md; not in TODO.md - -## Shape of the work - -1. Parameter: likely `by_path: bool = False` (implies `drop_larger_lower=False`) -2. Enumerate unique treatment histories `(D_{g,1}, …, D_{g,T})` per group; - optionally accept a user-specified subset of paths of interest -3. Per-path `DID_{g,l}` aggregation with influence-function SEs per path -4. Result container extension: `path_effects` dict keyed by trajectory tuple, - each holding ATT + SE + CI vectors -5. 
Decide interaction with `drop_larger_lower`: probably forbid both being - non-default simultaneously, or have `by_path` override -6. REGISTRY.md section on path-heterogeneity methodology + deviation notes -7. Methodology reference: `did_multiplegt_dyn` manual §on `by_path`; dCDH - dynamic paper for the `DID_{g,l}` building block (already cited in REGISTRY) - -## Open methodology questions (for plan mode) - -- Which paths are enumerable? All observed, or user-specified subset only? - R's default behavior on cardinality control is worth checking. - How does path stratification interact with the current cohort pooling - `(D_{g,1}, F_g, S_g)` used for variance recentering — does it still apply - per path? - Placebo and TWFE diagnostics: compute per-path or overall only? - Bootstrap interaction: per-path bootstrap blocks vs single bootstrap with - per-path aggregation - -## Before starting - -- Pull the R manual section on `by_path` for `did_multiplegt_dyn` — the option - spec there is load-bearing; don't infer from usage examples alone - Methodology changes: consult `docs/methodology/REGISTRY.md` first - New estimator surface → budget ~12-20 CI review rounds +# docs-refresh - Briefing + +## The goal + +Two-part documentation sweep, sequenced as one initiative across multiple PRs: + +1. **README.md aggressive trim** +2. **RTD staleness audit + targeted fixes** + +Tutorial work is OUT OF SCOPE - that's a separate worktree (`dcdh-tutorial`). + +## Why now + +Recent releases (3.0.x → 3.3.0) shipped a lot of new surface area without +proportional README/RTD updates: + +- HeterogeneousAdoptionDiD (entirely new estimator, multi-phase) +- profile_panel() + llms-autonomous.txt +- dCDH by_path + R parity +- SDiD survey support across all three variance methods +- BR/DR target_parameter (schema 2.0) +- TROP backend parity + +README is too long for skim consumption (SEO + first-impression problem). +RTD likely has stale pages, missing API references, and outdated examples. 
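A cheap way to substantiate the staleness claim before scoping the audit PRs is a shell probe over the docs tree. This is a hedged sketch, not part of the plan: the surface names come from the release list above, and the throwaway directory exists only so the snippet runs anywhere - inside the repo, point the loop at the real `docs/` tree instead.

```shell
# Flag post-3.0.x surfaces that appear nowhere in a docs tree.
# The mktemp directory is a stand-in for the repo's docs/ directory.
docs=$(mktemp -d)
printf 'CallawaySantAnna reference\n' > "$docs/staggered.rst"

for name in CallawaySantAnna HeterogeneousAdoptionDiD profile_panel; do
  # -r: recurse, -q: quiet; the || branch fires only when no page mentions the name
  grep -rq "$name" "$docs" || echo "MISSING: $name"
done
rm -rf "$docs"
```

Anything the loop prints is a candidate for the "missing API page" bucket of the audit categorization.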
+ +## Sequencing + +### PR 1 - README aggressive trim +Target a tight shape: +- One-line value prop +- Install (`pip install diff-diff`) +- Minimal working example (5-10 lines, one estimator) +- Estimator-list one-liner with link to RTD for full reference +- Citation + license + +Aggressive cuts. Anything that belongs on RTD goes to RTD (or stays there if +already there). Don't try to be the docs. + +Out of scope: rewriting RTD content that the README links to. + +### PR 2+ - RTD staleness audit + fixes + +Audit step (read-only): +- Walk `docs/` and identify pages missing post-3.0.x estimators / surfaces +- Cross-reference `docs/doc-deps.yaml` to surface known dependency drift +- Categorize: missing API page, stale example, broken link, outdated narrative + +Then fix in scoped PRs (one PR per coherent batch - e.g., "Add HAD API reference ++ choosing-estimator entry", "Refresh practitioner decision tree for 3.3.0"). + +## What to read first + +- `README.md` (current state, length) +- `docs/index.rst` (RTD entry point) +- `docs/doc-deps.yaml` (source-to-doc dependency map) +- `docs/api/` (API reference pages - what's missing) +- `docs/methodology/REGISTRY.md` (don't reformat; just cross-check it's + referenced from RTD where appropriate) +- `CLAUDE.md` "Documenting Deviations" section (label patterns, don't violate) + +## Memory rules to honor + +- Hyphens, not em dashes (writing style) +- No competitor mentions in formal docs (ROADMAP / user-facing) +- No version numbers as RTD section headings +- diff-diff perspective (not neutral comparisons) +- Tutorial-scope discipline does NOT apply here - this is reference docs + +## Out of scope + +- New tutorials (separate `dcdh-tutorial` worktree owns DCDH; HAD tutorial queued after) +- ROADMAP.md restructuring (separate concern) +- BR/DR positioning beyond "experimental preview" framing (per memory) + +## Cleanup note + +This BRIEFING.md was accidentally committed to main from a prior worktree +session. 
Long-term, drop it from main and add to .gitignore so worktree +briefings stay local. diff --git a/CLAUDE.md b/CLAUDE.md index ec5f9b83..fede87d2 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -114,6 +114,21 @@ category (`Methodology/Correctness`, `Performance`, or `Testing/Docs`): |-------|----------|----|----------| | Description of deferred item | `file.py` | #NNN | Medium/Low | +## README discipline + +`README.md` is a **landing page**, not the documentation. Target ~190 lines. The 3,119-line README that existed before the 2026-04 docs refresh grew because workflow conventions told contributors to add to README on every change. + +When adding new functionality, the source of truth is: + +- **`diff_diff/guides/llms.txt`** for the AI-agent contract (one-line catalog entry per estimator with paper citation + RTD link). This file is bundled in the wheel and published on RTD via `docs/conf.py` `html_extra_path`. +- **`docs/api/*.rst`** for full API reference. +- **`docs/references.rst`** for scholarly citations. +- **`docs/tutorials/*.ipynb`** for hands-on examples. +- **`CHANGELOG.md`** for release notes. +- **`README.md`** for ONE LINE in the `## Estimators` flat catalog (or `## Diagnostics & Sensitivity` for diagnostic-class features). Do NOT add usage examples, parameter tables, per-estimator sections, or full bibliographies. + +`/docs-impact` and `/docs-check` enforce these surfaces. See `CONTRIBUTING.md` "README is a landing page, not the docs" for the full convention. + ## Testing Conventions - **`ci_params` fixture** (session-scoped in `conftest.py`): Use `ci_params.bootstrap(n)` and diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index c6802369..4940f575 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -4,52 +4,68 @@ When implementing new functionality, **always include accompanying documentation updates**. +### README is a landing page, not the docs + +`README.md` is the GitHub/PyPI first-impression surface. Keep it lean (~190 lines). 
Most new content does NOT belong here. + +**Only edit `README.md` when**: +- A new estimator is added (one line in the `## Estimators` flat catalog) +- A new top-level capability lands (one paragraph in `## Diagnostics & Sensitivity` or `## Survey Support`) +- Hero image, badges, or top-of-fold value-prop changes +- Documentation links rot + +If you find yourself adding a usage example, a parameter table, or a multi-paragraph explanation to the README, you are in the wrong file - those belong on RTD or in `diff_diff/guides/llms.txt`. + ### For New Estimators or Major Features -1. **README.md** - Add: - - Feature mention in the features list - - Full usage section with code examples - - Parameter documentation table - - API reference section (constructor params, fit() params, results attributes/methods) - - Scholarly references if applicable +1. **`diff_diff/guides/llms.txt`** (AI-agent source of truth) - Add: + - One-line catalog entry in the `## Estimators` section with paper citation + RTD link + - One-line entry in `## Diagnostics and Sensitivity Analysis` if applicable + - This file is published on RTD via `docs/conf.py` `html_extra_path` and bundled in the wheel via `get_llm_guide()` - it is the canonical machine-readable contract -2. **docs/api/*.rst** - Add: +2. **`docs/api/*.rst`** (technical source of truth) - Add: - RST documentation with `autoclass` directives - Method summaries - References to academic papers -3. **docs/tutorials/*.ipynb** - Update relevant tutorial or create new one: +3. **`docs/tutorials/*.ipynb`** - Update relevant tutorial or create new one: - Working code examples - Explanation of when/why to use the feature - - Comparison with related functionality -4. **CLAUDE.md** - Update only if adding new critical rules or design patterns +4. 
**`docs/references.rst`** (bibliography source of truth) - Add: + - Full citation under the appropriate sub-section (matches the `### Subsection` headings already in that file) + - Use the RST format: `**Author (Year).** "Title." *Journal*, vol(num), pages. ` + +5. **`README.md`** - Add ONLY: + - One line in the `## Estimators` catalog with the paper citation and RTD link + +6. **`CHANGELOG.md`** - Add a release-note bullet under the next unreleased version. + +7. **`CLAUDE.md`** - Update only if adding new critical rules or design patterns. -5. **ROADMAP.md** - Update: - - Move implemented features from planned to current status - - Update version numbers +8. **`ROADMAP.md`** - Update only if shipping moves an item from planned to current. -6. **docs/doc-deps.yaml** - Add source-to-doc mappings for the new module +9. **`docs/doc-deps.yaml`** - Add source-to-doc mappings for the new module. ### For Bug Fixes or Minor Enhancements - Update relevant docstrings - Add/update tests -- Update CHANGELOG.md (if exists) +- Update `CHANGELOG.md` - **If methodology-related**: Update `docs/methodology/REGISTRY.md` edge cases section +- **README is almost never the right place** - skip it unless the bug was in a README claim ### Scholarly References For methods based on academic papers, always include: -- Full citation in README.md references section -- Reference in RST docs with paper details +- Full citation in **`docs/references.rst`** under the appropriate `### Subsection` heading (NOT in README) +- Reference in RST API docs with paper details - Citation in tutorial summary +- Optional: methodology reference in `docs/methodology/REGISTRY.md` for non-trivial design choices -Example format: +Example format (RST): ``` -Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in -event studies with heterogeneous treatment effects. *Journal of Econometrics*, -225(2), 175-199. +- **Sun, L., & Abraham, S. 
(2021).** "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." *Journal of Econometrics*, 225(2), 175-199. https://doi.org/10.1016/j.jeconom.2020.09.006 ``` ## Test Writing Guidelines diff --git a/README.md b/README.md index ebd80e8b..aa556e12 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,11 @@ # diff-diff +

+ diff-diff: Difference-in-Differences causal inference in Python - sklearn-like API with Callaway-Sant'Anna, Synthetic DiD, Honest DiD, and Event Studies +

+ [![PyPI version](https://img.shields.io/pypi/v/diff-diff.svg)](https://pypi.org/project/diff-diff/) [![Python versions](https://img.shields.io/pypi/pyversions/diff-diff.svg)](https://pypi.org/project/diff-diff/) [![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) @@ -7,7 +13,7 @@ [![Documentation](https://readthedocs.org/projects/diff-diff/badge/?version=stable)](https://diff-diff.readthedocs.io/en/stable/) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19646175.svg)](https://doi.org/10.5281/zenodo.19646175) -A Python library for Difference-in-Differences (DiD) causal inference analysis with an sklearn-like API and statsmodels-style outputs. +A Python library for Difference-in-Differences (DiD) causal inference - sklearn-like estimators with statsmodels-style outputs, built for econometricians, marketing analysts, and data scientists running campaign-lift, policy, and staggered-rollout analyses. ## Installation @@ -15,12 +21,12 @@ A Python library for Difference-in-Differences (DiD) causal inference analysis w pip install diff-diff ``` -Or install from source: +For development: ```bash git clone https://github.com/igerber/diff-diff.git cd diff-diff -pip install -e . +pip install -e ".[dev]" ``` ## Quick Start @@ -29,48 +35,30 @@ pip install -e . 
import pandas as pd from diff_diff import DifferenceInDifferences # or: DiD -# Create sample data data = pd.DataFrame({ 'outcome': [10, 11, 15, 18, 9, 10, 12, 13], 'treated': [1, 1, 1, 1, 0, 0, 0, 0], - 'post': [0, 0, 1, 1, 0, 0, 1, 1] + 'post': [0, 0, 1, 1, 0, 0, 1, 1], }) -# Fit the model did = DifferenceInDifferences() results = did.fit(data, outcome='outcome', treatment='treated', time='post') - -# View results -print(results) # DiDResults(ATT=3.0000, SE=1.7321, p=0.1583) -results.print_summary() -``` - -Output: +print(results) # DiDResults(ATT=3.0000, SE=1.7321, p=0.1583) +results.print_summary() # full statsmodels-style table ``` -====================================================================== - Difference-in-Differences Estimation Results -====================================================================== - -Observations: 8 -Treated units: 4 -Control units: 4 -R-squared: 0.9055 ----------------------------------------------------------------------- -Parameter Estimate Std. Err. t-stat P>|t| ----------------------------------------------------------------------- -ATT 3.0000 1.7321 1.732 0.1583 ----------------------------------------------------------------------- +## Documentation -95% Confidence Interval: [-1.8089, 7.8089] - -Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 
0.1 -====================================================================== -``` +- [Quickstart](https://diff-diff.readthedocs.io/en/stable/quickstart.html) - basic 2x2 DiD with column-name and formula interfaces, covariates, fixed effects, cluster-robust SEs +- [Choosing an Estimator](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html) - decision flowchart for picking the right estimator +- [Tutorials](https://diff-diff.readthedocs.io/en/stable/tutorials/01_basic_did.html) - hands-on Jupyter notebooks covering every estimator and design pattern +- [Troubleshooting](https://diff-diff.readthedocs.io/en/stable/troubleshooting.html) - common issues and solutions +- [R Comparison](https://diff-diff.readthedocs.io/en/stable/r_comparison.html) | [Python Comparison](https://diff-diff.readthedocs.io/en/stable/python_comparison.html) | [Benchmarks](https://diff-diff.readthedocs.io/en/stable/benchmarks.html) - validation results vs `did`, `synthdid`, `fixest` +- [API Reference](https://diff-diff.readthedocs.io/en/stable/api/index.html) - full API for all estimators, results classes, diagnostics, utilities ## For AI Agents -If you are an AI agent or LLM using this library, call `diff_diff.get_llm_guide()` for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis — not just calling `fit()`, but testing assumptions, running sensitivity analysis, and checking robustness. +If you are an AI agent or LLM using this library, call `diff_diff.get_llm_guide()` for a concise API reference with an 8-step practitioner workflow (based on Baker et al. 2025). The workflow ensures rigorous DiD analysis - testing assumptions, running sensitivity analysis, and checking robustness, not just calling `fit()`. 
```python from diff_diff import get_llm_guide @@ -78,3020 +66,101 @@ from diff_diff import get_llm_guide get_llm_guide() # concise API reference get_llm_guide("practitioner") # 8-step workflow (Baker et al. 2025) get_llm_guide("full") # comprehensive documentation +get_llm_guide("autonomous") # autonomous-agent variant ``` -The guides are bundled in the wheel, so they are accessible from a `pip install` with no network access required. - -After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps. +The guides are bundled in the wheel - accessible from a `pip install` with no network access. After estimation, call `practitioner_next_steps(results)` for context-aware guidance on remaining diagnostic steps. ## For Data Scientists -Measuring campaign lift? Evaluating a product launch? diff-diff handles the causal inference so you can focus on the business question. - -- **[Which method fits my problem?](docs/practitioner_decision_tree.rst)** - Start from your business scenario (campaign in some markets, staggered rollout, survey data) and find the right estimator -- **[Getting started for practitioners](docs/practitioner_getting_started.rst)** - End-to-end walkthrough: marketing campaign -> causal estimate -> stakeholder-ready result -- **[Brand awareness survey tutorial](docs/tutorials/17_brand_awareness_survey.ipynb)** - Full example with complex survey design, brand funnel analysis, and staggered rollouts -- **Have BRFSS/ACS/CPS individual records?** Use [`aggregate_survey()`](docs/api/prep.rst) to roll respondent-level microdata into a geographic-period panel with inverse-variance precision weights. 
The returned second-stage design uses analytic weights (`aweight`), so it works directly with `DifferenceInDifferences`, `TwoWayFixedEffects`, `MultiPeriodDiD`, `SunAbraham`, `ContinuousDiD`, and `EfficientDiD` (estimators marked **Full** in the [survey support matrix](docs/choosing_estimator.rst)) - -### Experimental preview: `BusinessReport` and `DiagnosticReport` - -diff-diff ships two preview classes, `BusinessReport` and `DiagnosticReport`, that produce plain-English output and a structured `to_dict()` schema from any fitted result. **Both are experimental in this release** — wording, verdict thresholds, and schema shape will change as the library learns from real practitioner usage. Do not anchor downstream tooling on the schema yet; the experimental flag is noted in the CHANGELOG. - -```python -from diff_diff import CallawaySantAnna, BusinessReport - -cs = CallawaySantAnna(base_period="universal").fit( - df, outcome="revenue", unit="store", time="month", - first_treat="first_treat", aggregate="event_study", -) -report = BusinessReport( - cs, - outcome_label="Revenue per store", - outcome_unit="$", - business_question="Did the loyalty program lift revenue?", - treatment_label="the loyalty program", - # Optional: pass the panel + column names so the auto-constructed - # DiagnosticReport can run data-dependent checks (2x2 pre-trends, - # Goodman-Bacon decomposition, EfficientDiD Hausman pretest). - # Without these the auto path still runs but skips those checks. - data=df, - outcome="revenue", - unit="store", - time="month", - first_treat="first_treat", -) -print(report.summary()) -``` - -`BusinessReport` auto-constructs a `DiagnosticReport` so the summary mentions pre-trends, sensitivity, and design-effect findings in one call. Methodology (phrasing rules, verdict thresholds, schema stability) is documented in [docs/methodology/REPORTING.md](docs/methodology/REPORTING.md). 
Feedback on wording, applicability, and missing diagnostics is welcome — this is the part of the library most likely to evolve in the next few releases. - -Already know DiD? The [academic quickstart](docs/quickstart.rst) and [estimator guide](docs/choosing_estimator.rst) cover the full technical details. - -## Features - -- **sklearn-like API**: Familiar `fit()` interface with `get_params()` and `set_params()` -- **Pythonic results**: Easy access to coefficients, standard errors, and confidence intervals -- **Multiple interfaces**: Column names or R-style formulas -- **Robust inference**: Heteroskedasticity-robust (HC1) and cluster-robust standard errors -- **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights -- **Panel data support**: Two-way fixed effects estimator for panel designs -- **Multi-period analysis**: Event-study style DiD with period-specific treatment effects -- **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), Efficient DiD (Chen, Sant'Anna & Xie 2025), and Wooldridge ETWFE (2021/2023) estimators for heterogeneous treatment timing -- **Reversible (non-absorbing) treatments**: de Chaisemartin-D'Haultfœuille `DID_M` estimator for treatments that switch on AND off over time (marketing campaigns, seasonal promotions, on/off policy cycles) — the only library option for non-absorbing treatments -- **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling -- **Synthetic DiD**: Combined DiD with synthetic control for improved robustness -- **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 
2025) -- **Event study plots**: Publication-ready visualization of treatment effects -- **Parallel trends testing**: Multiple methods including equivalence tests -- **Goodman-Bacon decomposition**: Diagnose TWFE bias by decomposing into 2x2 comparisons -- **Placebo tests**: Comprehensive diagnostics including fake timing, fake group, permutation, and leave-one-out tests -- **Honest DiD sensitivity analysis**: Rambachan-Roth (2023) bounds and breakdown analysis for parallel trends violations -- **Pre-trends power analysis**: Roth (2022) minimum detectable violation (MDV) and power curves for pre-trends tests -- **Power analysis**: MDE, sample size, and power calculations for study design; simulation-based power for any estimator -- **Data prep utilities**: Helper functions for common data preparation tasks -- **Survey microdata aggregation**: `aggregate_survey()` rolls individual-level survey data (BRFSS, ACS, CPS, NHANES) into geographic-period panels with design-based precision weights for second-stage DiD -- **Validated against R**: Benchmarked against `did`, `synthdid`, and `fixest` packages (see [benchmarks](docs/benchmarks.rst)) - -## Estimator Aliases - -All estimators have short aliases for convenience: - -| Alias | Full Name | Method | -|-------|-----------|--------| -| `DiD` | `DifferenceInDifferences` | Basic 2x2 DiD | -| `TWFE` | `TwoWayFixedEffects` | Two-way fixed effects | -| `EventStudy` | `MultiPeriodDiD` | Event study / multi-period | -| `CS` | `CallawaySantAnna` | Callaway & Sant'Anna (2021) | -| `SA` | `SunAbraham` | Sun & Abraham (2021) | -| `BJS` | `ImputationDiD` | Borusyak, Jaravel & Spiess (2024) | -| `Gardner` | `TwoStageDiD` | Gardner (2022) two-stage | -| `SDiD` | `SyntheticDiD` | Synthetic DiD | -| `DDD` | `TripleDifference` | Triple difference | -| `CDiD` | `ContinuousDiD` | Continuous treatment DiD | -| `Stacked` | `StackedDiD` | Stacked DiD | -| `Bacon` | `BaconDecomposition` | Goodman-Bacon decomposition | -| `EDiD` | `EfficientDiD` 
| Efficient DiD | -| `ETWFE` | `WooldridgeDiD` | Wooldridge ETWFE (2021/2023) | -| `DCDH` | `ChaisemartinDHaultfoeuille` | de Chaisemartin & D'Haultfœuille (2020) — reversible treatments | - -`TROP` already uses its short canonical name and needs no alias. - -## Tutorials - -We provide Jupyter notebook tutorials in `docs/tutorials/`: - -| Notebook | Description | -|----------|-------------| -| `01_basic_did.ipynb` | Basic 2x2 DiD, formula interface, covariates, fixed effects, cluster-robust SE, wild bootstrap | -| `02_staggered_did.ipynb` | Staggered adoption with Callaway-Sant'Anna and Sun-Abraham, group-time effects, aggregation methods, Bacon decomposition | -| `03_synthetic_did.ipynb` | Synthetic DiD, unit/time weights, inference methods, regularization | -| `04_parallel_trends.ipynb` | Testing parallel trends, equivalence tests, placebo tests, diagnostics | -| `05_honest_did.ipynb` | Honest DiD sensitivity analysis, bounds, breakdown values, visualization | -| `06_power_analysis.ipynb` | Power analysis, MDE, sample size calculations, simulation-based power | -| `07_pretrends_power.ipynb` | Pre-trends power analysis (Roth 2022), MDV, power curves | -| `08_triple_diff.ipynb` | Triple Difference (DDD) estimation with proper covariate handling | -| `09_real_world_examples.ipynb` | Real-world data examples (Card-Krueger, Castle Doctrine, Divorce Laws) | -| `10_trop.ipynb` | Triply Robust Panel (TROP) estimation with factor model adjustment | -| `11_imputation_did.ipynb` | Imputation DiD (Borusyak et al. 2024), pre-trend test, efficiency comparison | -| `12_two_stage_did.ipynb` | Two-Stage DiD (Gardner 2022), GMM sandwich variance, per-observation effects | -| `13_stacked_did.ipynb` | Stacked DiD (Wing et al. 2024), Q-weights, sub-experiment inspection, trimming, clean control definitions | -| `15_efficient_did.ipynb` | Efficient DiD (Chen et al. 
2025), optimal weighting, PT-All vs PT-Post, efficiency gains, bootstrap inference | -| `16_survey_did.ipynb` | Survey-aware DiD with complex sampling designs (strata, PSU, FPC, weights), replicate weights, subpopulation analysis, DEFF diagnostics | -| `17_brand_awareness_survey.ipynb` | Measuring campaign impact on brand awareness with survey data — naive vs. survey-corrected comparison, brand funnel analysis, staggered rollouts, stakeholder communication | - -## Data Preparation - -diff-diff provides utility functions to help prepare your data for DiD analysis. These functions handle common data transformation tasks like creating treatment indicators, reshaping panel data, and validating data formats. - -### Generate Sample Data - -Create synthetic data with a known treatment effect for testing and learning: - -```python -from diff_diff import generate_did_data, DifferenceInDifferences - -# Generate panel data with 100 units, 4 periods, and a treatment effect of 5 -data = generate_did_data( - n_units=100, - n_periods=4, - treatment_effect=5.0, - treatment_fraction=0.5, # 50% of units are treated - treatment_period=2, # Treatment starts at period 2 - seed=42 -) - -# Verify the estimator recovers the treatment effect -did = DifferenceInDifferences() -results = did.fit(data, outcome='outcome', treatment='treated', time='post') -print(f"Estimated ATT: {results.att:.2f} (true: 5.0)") -``` - -### Create Treatment Indicators - -Convert categorical variables or numeric thresholds to binary treatment indicators: - -```python -from diff_diff import make_treatment_indicator - -# From categorical variable -df = make_treatment_indicator( - data, - column='state', - treated_values=['CA', 'NY', 'TX'] # These states are treated -) - -# From numeric threshold (e.g., firms above median size) -df = make_treatment_indicator( - data, - column='firm_size', - threshold=data['firm_size'].median() -) - -# Treat units below threshold -df = make_treatment_indicator( - data, - 
column='income', - threshold=50000, - above_threshold=False # Units with income <= 50000 are treated -) -``` - -### Create Post-Treatment Indicators - -Convert time/date columns to binary post-treatment indicators: - -```python -from diff_diff import make_post_indicator - -# From specific post-treatment periods -df = make_post_indicator( - data, - time_column='year', - post_periods=[2020, 2021, 2022] -) - -# From treatment start date -df = make_post_indicator( - data, - time_column='year', - treatment_start=2020 # All years >= 2020 are post-treatment -) - -# Works with datetime columns -df = make_post_indicator( - data, - time_column='date', - treatment_start='2020-01-01' -) -``` - -### Reshape Wide to Long Format - -Convert wide-format data (one row per unit, multiple time columns) to long format: - -```python -from diff_diff import wide_to_long - -# Wide format: columns like sales_2019, sales_2020, sales_2021 -wide_df = pd.DataFrame({ - 'firm_id': [1, 2, 3], - 'industry': ['tech', 'retail', 'tech'], - 'sales_2019': [100, 150, 200], - 'sales_2020': [110, 160, 210], - 'sales_2021': [120, 170, 220] -}) - -# Convert to long format for DiD -long_df = wide_to_long( - wide_df, - value_columns=['sales_2019', 'sales_2020', 'sales_2021'], - id_column='firm_id', - time_name='year', - value_name='sales', - time_values=[2019, 2020, 2021] -) -# Result: 9 rows (3 firms × 3 years), columns: firm_id, year, sales, industry -``` - -### Balance Panel Data - -Ensure all units have observations for all time periods: - -```python -from diff_diff import balance_panel - -# Keep only units with complete data (drop incomplete units) -balanced = balance_panel( - data, - unit_column='firm_id', - time_column='year', - method='inner' -) - -# Include all unit-period combinations (creates NaN for missing) -balanced = balance_panel( - data, - unit_column='firm_id', - time_column='year', - method='outer' -) - -# Fill missing values -balanced = balance_panel( - data, - unit_column='firm_id', - 
time_column='year', - method='fill', - fill_value=0 # Or None for forward/backward fill -) -``` - -### Validate Data - -Check that your data meets DiD requirements before fitting: - -```python -from diff_diff import validate_did_data - -# Validate and get informative error messages -result = validate_did_data( - data, - outcome='sales', - treatment='treated', - time='post', - unit='firm_id', # Optional: for panel-specific validation - raise_on_error=False # Return dict instead of raising -) - -if result['valid']: - print("Data is ready for DiD analysis!") - print(f"Summary: {result['summary']}") -else: - print("Issues found:") - for error in result['errors']: - print(f" - {error}") - -for warning in result['warnings']: - print(f"Warning: {warning}") -``` - -### Summarize Data by Groups - -Get summary statistics for each treatment-time cell: - -```python -from diff_diff import summarize_did_data - -summary = summarize_did_data( - data, - outcome='sales', - treatment='treated', - time='post' -) -print(summary) -``` - -Output: -``` - n mean std min max -Control - Pre 250 100.5000 15.2340 65.0000 145.0000 -Control - Post 250 105.2000 16.1230 68.0000 152.0000 -Treated - Pre 250 101.2000 14.8900 67.0000 143.0000 -Treated - Post 250 115.8000 17.5600 72.0000 165.0000 -DiD Estimate - 9.9000 - - - -``` - -### Create Event Time for Staggered Designs - -For designs where treatment occurs at different times: - -```python -from diff_diff import create_event_time - -# Add event-time column relative to treatment timing -df = create_event_time( - data, - time_column='year', - treatment_time_column='treatment_year' -) -# Result: event_time = -2, -1, 0, 1, 2 relative to treatment -``` - -### Aggregate to Cohort Means - -Aggregate unit-level data for visualization: - -```python -from diff_diff import aggregate_to_cohorts - -cohort_data = aggregate_to_cohorts( - data, - unit_column='firm_id', - time_column='year', - treatment_column='treated', - outcome='sales' -) -# Result: mean 
outcome by treatment group and period -``` - -### Rank Control Units - -Select the best control units for DiD or Synthetic DiD analysis by ranking them based on pre-treatment outcome similarity: - -```python -from diff_diff import rank_control_units, generate_did_data - -# Generate sample data -data = generate_did_data(n_units=50, n_periods=6, seed=42) - -# Rank control units by their similarity to treated units -ranking = rank_control_units( - data, - unit_column='unit', - time_column='period', - outcome_column='outcome', - treatment_column='treated', - n_top=10 # Return top 10 controls -) - -print(ranking[['unit', 'quality_score', 'pre_trend_rmse']]) -``` - -Output: -``` - unit quality_score pre_trend_rmse -0 35 1.0000 0.4521 -1 42 0.9234 0.5123 -2 28 0.8876 0.5892 -... -``` - -With covariates for matching: - -```python -# Add covariate-based matching -ranking = rank_control_units( - data, - unit_column='unit', - time_column='period', - outcome_column='outcome', - treatment_column='treated', - covariates=['size', 'age'], # Match on these too - outcome_weight=0.7, # 70% weight on outcome trends - covariate_weight=0.3 # 30% weight on covariate similarity -) -``` - -Filter data for SyntheticDiD using top controls: - -```python -from diff_diff import SyntheticDiD - -# Get top control units -top_controls = ranking['unit'].tolist() - -# Filter data to treated + top controls -filtered_data = data[ - (data['treated'] == 1) | (data['unit'].isin(top_controls)) -] - -# Fit SyntheticDiD with selected controls -sdid = SyntheticDiD() -results = sdid.fit( - filtered_data, - outcome='outcome', - treatment='treated', - unit='unit', - time='period', - post_periods=[3, 4, 5] -) -``` - -## Usage - -### Basic DiD with Column Names - -```python -from diff_diff import DifferenceInDifferences - -did = DifferenceInDifferences(robust=True, alpha=0.05) -results = did.fit( - data, - outcome='sales', - treatment='treated', - time='post_policy' -) - -# Access results -print(f"ATT: 
{results.att:.4f}") -print(f"Standard Error: {results.se:.4f}") -print(f"P-value: {results.p_value:.4f}") -print(f"95% CI: {results.conf_int}") -print(f"Significant: {results.is_significant}") -``` - -### Using Formula Interface - -```python -# R-style formula syntax -results = did.fit(data, formula='outcome ~ treated * post') - -# Explicit interaction syntax -results = did.fit(data, formula='outcome ~ treated + post + treated:post') - -# With covariates -results = did.fit(data, formula='outcome ~ treated * post + age + income') -``` - -### Including Covariates - -```python -results = did.fit( - data, - outcome='outcome', - treatment='treated', - time='post', - covariates=['age', 'income', 'education'] -) -``` - -### Fixed Effects - -Use `fixed_effects` for low-dimensional categorical controls (creates dummy variables): - -```python -# State and industry fixed effects -results = did.fit( - data, - outcome='sales', - treatment='treated', - time='post', - fixed_effects=['state', 'industry'] -) - -# Access fixed effect coefficients -state_coefs = {k: v for k, v in results.coefficients.items() if k.startswith('state_')} -``` - -Use `absorb` for high-dimensional fixed effects (more efficient, uses within-transformation): - -```python -# Absorb firm-level fixed effects (efficient for many firms) -results = did.fit( - data, - outcome='sales', - treatment='treated', - time='post', - absorb=['firm_id'] -) -``` - -Combine covariates with fixed effects: - -```python -results = did.fit( - data, - outcome='sales', - treatment='treated', - time='post', - covariates=['size', 'age'], # Linear controls - fixed_effects=['industry'], # Low-dimensional FE (dummies) - absorb=['firm_id'] # High-dimensional FE (absorbed) -) -``` - -### Cluster-Robust Standard Errors - -```python -did = DifferenceInDifferences(cluster='state') -results = did.fit( - data, - outcome='outcome', - treatment='treated', - time='post' -) -``` - -### Wild Cluster Bootstrap - -When you have few clusters (<50), 
standard cluster-robust SEs are biased. Wild cluster bootstrap provides valid inference even with 5-10 clusters. - -```python -# Use wild bootstrap for inference -did = DifferenceInDifferences( - cluster='state', - inference='wild_bootstrap', - n_bootstrap=999, - bootstrap_weights='rademacher', # or 'webb' for <10 clusters, 'mammen' - seed=42 -) -results = did.fit(data, outcome='y', treatment='treated', time='post') - -# Results include bootstrap-based SE and p-value -print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})") -print(f"P-value: {results.p_value:.4f}") -print(f"95% CI: {results.conf_int}") -print(f"Inference method: {results.inference_method}") -print(f"Number of clusters: {results.n_clusters}") -``` - -**Weight types:** -- `'rademacher'` - Default, ±1 with p=0.5, good for most cases -- `'webb'` - 6-point distribution, recommended for <10 clusters -- `'mammen'` - Two-point distribution, alternative to Rademacher - -Works with `DifferenceInDifferences` and `TwoWayFixedEffects` estimators. - -### Two-Way Fixed Effects (Panel Data) - -```python -from diff_diff import TwoWayFixedEffects - -twfe = TwoWayFixedEffects() -results = twfe.fit( - panel_data, - outcome='outcome', - treatment='treated', - time='year', - unit='firm_id' -) -``` - -### Multi-Period DiD (Event Study) - -For settings with multiple pre- and post-treatment periods. 
Estimates treatment × period -interactions for ALL periods (pre and post), enabling parallel trends assessment: - -```python -from diff_diff import MultiPeriodDiD - -# Fit full event study with pre and post period effects -did = MultiPeriodDiD() -results = did.fit( - panel_data, - outcome='sales', - treatment='treated', - time='period', - post_periods=[3, 4, 5], # Periods 3-5 are post-treatment - reference_period=2, # Last pre-period (e=-1 convention) - unit='unit_id', # Optional: warns if staggered adoption detected -) - -# Pre-period effects test parallel trends (should be ≈ 0) -for period, effect in results.pre_period_effects.items(): - print(f"Pre {period}: {effect.effect:.3f} (SE: {effect.se:.3f})") - -# Post-period effects estimate dynamic treatment effects -for period, effect in results.post_period_effects.items(): - print(f"Post {period}: {effect.effect:.3f} (SE: {effect.se:.3f})") - -# View average treatment effect across post-periods -print(f"Average ATT: {results.avg_att:.3f}") -print(f"Average SE: {results.avg_se:.3f}") - -# Full summary with pre and post period effects -results.print_summary() -``` - -Output: -``` -================================================================================ - Multi-Period Difference-in-Differences Estimation Results -================================================================================ - -Observations: 600 -Pre-treatment periods: 3 -Post-treatment periods: 3 - --------------------------------------------------------------------------------- -Average Treatment Effect --------------------------------------------------------------------------------- -Average ATT 5.2000 0.8234 6.315 0.0000 --------------------------------------------------------------------------------- -95% Confidence Interval: [3.5862, 6.8138] - -Period-Specific Effects: --------------------------------------------------------------------------------- -Period Effect Std. Err. 
t-stat P>|t| --------------------------------------------------------------------------------- -3 4.5000 0.9512 4.731 0.0000*** -4 5.2000 0.8876 5.858 0.0000*** -5 5.9000 0.9123 6.468 0.0000*** --------------------------------------------------------------------------------- - -Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1 -================================================================================ -``` - -### Staggered Difference-in-Differences (Callaway-Sant'Anna) - -When treatment is adopted at different times by different units, traditional TWFE estimators can be biased. The Callaway-Sant'Anna estimator provides unbiased estimates with staggered adoption. - -```python -from diff_diff import CallawaySantAnna - -# Panel data with staggered treatment -# 'first_treat' = period when unit was first treated (0 if never treated) -cs = CallawaySantAnna() -results = cs.fit( - panel_data, - outcome='sales', - unit='firm_id', - time='year', - first_treat='first_treat', # 0 for never-treated, else first treatment year - aggregate='event_study' # Compute event study effects -) - -# View results -results.print_summary() +Measuring campaign lift? Evaluating a product launch? Rolling out a policy in waves? diff-diff handles the causal inference so you can focus on the business question. 
-# Access group-time effects ATT(g,t) -for (group, time), effect in results.group_time_effects.items(): - print(f"Cohort {group}, Period {time}: {effect['effect']:.3f}") +- [Which method fits my problem?](https://diff-diff.readthedocs.io/en/stable/practitioner_decision_tree.html) - start from your business scenario (campaign in some markets, staggered rollout, survey data) and find the right estimator +- [Getting started for practitioners](https://diff-diff.readthedocs.io/en/stable/practitioner_getting_started.html) - end-to-end walkthrough from marketing campaign to causal estimate to stakeholder-ready result +- [Brand awareness survey tutorial](https://diff-diff.readthedocs.io/en/stable/tutorials/17_brand_awareness_survey.html) - full example with complex survey design, brand funnel analysis, and staggered rollouts +- Have BRFSS/ACS/CPS individual records? Use [`aggregate_survey()`](https://diff-diff.readthedocs.io/en/stable/api/prep.html) to roll respondent-level microdata into a geographic-period panel with inverse-variance precision weights for second-stage DiD -# Event study effects (averaged by relative time) -for rel_time, effect in results.event_study_effects.items(): - print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})") +`BusinessReport` and `DiagnosticReport` are experimental preview classes that produce plain-English output and a structured `to_dict()` schema from any fitted result - wording and schema will evolve. See [docs/methodology/REPORTING.md](https://github.com/igerber/diff-diff/blob/main/docs/methodology/REPORTING.md) for usage and stability notes. 
-# Convert to DataFrame -df = results.to_dataframe(level='event_study') -``` - -Output: -``` -===================================================================================== - Callaway-Sant'Anna Staggered Difference-in-Differences Results -===================================================================================== - -Total observations: 600 -Treated units: 35 -Control units: 15 -Treatment cohorts: 3 -Time periods: 8 -Control group: never_treated - -------------------------------------------------------------------------------------- - Overall Average Treatment Effect on the Treated -------------------------------------------------------------------------------------- -Parameter Estimate Std. Err. t-stat P>|t| Sig. -------------------------------------------------------------------------------------- -ATT 2.5000 0.3521 7.101 0.0000 *** -------------------------------------------------------------------------------------- - -95% Confidence Interval: [1.8099, 3.1901] - -------------------------------------------------------------------------------------- - Event Study (Dynamic) Effects -------------------------------------------------------------------------------------- -Rel. Period Estimate Std. Err. t-stat P>|t| Sig. -------------------------------------------------------------------------------------- -0 2.1000 0.4521 4.645 0.0000 *** -1 2.5000 0.4123 6.064 0.0000 *** -2 2.8000 0.5234 5.349 0.0000 *** -------------------------------------------------------------------------------------- - -Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 
0.1 -===================================================================================== -``` - -**When to use Callaway-Sant'Anna vs TWFE:** - -| Scenario | Use TWFE | Use Callaway-Sant'Anna | -|----------|----------|------------------------| -| All units treated at same time | ✓ | ✓ | -| Staggered adoption, homogeneous effects | ✓ | ✓ | -| Staggered adoption, heterogeneous effects | ✗ | ✓ | -| Need event study with staggered timing | ✗ | ✓ | -| Fewer than ~20 treated units | ✓ | Depends on design | - -**Parameters:** - -```python -CallawaySantAnna( - control_group='never_treated', # or 'not_yet_treated' - anticipation=0, # Periods before treatment with effects - estimation_method='dr', # 'dr', 'ipw', or 'reg' - alpha=0.05, # Significance level - cluster=None, # Column for cluster SEs - n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs) - bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb' - seed=None # Random seed -) -``` - -**Multiplier bootstrap for inference:** - -With few clusters or when analytical standard errors may be unreliable, use the multiplier bootstrap for valid inference. This implements the approach from Callaway & Sant'Anna (2021). 
- -```python -# Bootstrap inference with 999 iterations -cs = CallawaySantAnna( - n_bootstrap=999, - bootstrap_weights='rademacher', # or 'mammen', 'webb' - seed=42 -) -results = cs.fit( - data, - outcome='sales', - unit='firm_id', - time='year', - first_treat='first_treat', - aggregate='event_study' -) - -# Access bootstrap results -print(f"Overall ATT: {results.overall_att:.3f}") -print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}") -print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}") -print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}") - -# Event study bootstrap inference -for rel_time, se in results.bootstrap_results.event_study_ses.items(): - ci = results.bootstrap_results.event_study_cis[rel_time] - print(f"e={rel_time}: SE={se:.3f}, 95% CI=[{ci[0]:.3f}, {ci[1]:.3f}]") -``` - -**Bootstrap weight types:** -- `'rademacher'` - Default, ±1 with p=0.5, good for most cases -- `'mammen'` - Two-point distribution matching first 3 moments -- `'webb'` - Six-point distribution, recommended for very few clusters (<10) - -**Covariate adjustment for conditional parallel trends:** - -When parallel trends only holds conditional on covariates, use the `covariates` parameter: - -```python -# Doubly robust estimation with covariates -cs = CallawaySantAnna(estimation_method='dr') # 'dr', 'ipw', or 'reg' -results = cs.fit( - data, - outcome='sales', - unit='firm_id', - time='year', - first_treat='first_treat', - covariates=['size', 'age', 'industry'], # Covariates for conditional PT - aggregate='event_study' -) -``` - -### Sun-Abraham Interaction-Weighted Estimator - -The Sun-Abraham (2021) estimator provides an alternative to Callaway-Sant'Anna using an interaction-weighted (IW) regression approach. Running both estimators serves as a useful robustness check—when they agree, results are more credible. - -```python -from diff_diff import SunAbraham +## Practitioner Workflow (Baker et al. 
2025) -# Basic usage -sa = SunAbraham() -results = sa.fit( - panel_data, - outcome='sales', - unit='firm_id', - time='year', - first_treat='first_treat' # 0 for never-treated, else first treatment year -) +For rigorous DiD analysis, follow these 8 steps. Skipping diagnostic steps produces unreliable results. -# View results -results.print_summary() +1. **Define target parameter** - ATT, group-time ATT(g,t), or event-study ATT_es(e). State whether weighted or unweighted. +2. **State identification assumptions** - which parallel trends variant (unconditional, conditional, PT-GT-Nev, PT-GT-NYT), no-anticipation, overlap. +3. **Test parallel trends** - simple 2x2: `check_parallel_trends()`, `equivalence_test_trends()`; staggered: inspect CS event-study pre-period coefficients (generic PT tests are invalid for staggered designs). Insignificant pre-trends do NOT prove PT holds. +4. **Choose estimator** - staggered adoption -> CS/SA/BJS (NOT plain TWFE); few treated units -> SDiD; factor confounding -> TROP; simple 2x2 -> DiD. Run `BaconDecomposition` to diagnose TWFE bias. +5. **Estimate** - `estimator.fit(data, ...)`. Always print the cluster count first and choose inference method based on the result (cluster-robust if >= 50 clusters, wild bootstrap if fewer). +6. **Sensitivity analysis** - `compute_honest_did(results)` for bounds under PT violations (MultiPeriodDiD, CS, or dCDH), `run_all_placebo_tests()` for 2x2 falsification, specification comparisons for staggered designs. +7. **Heterogeneity** - CS: `aggregate='group'`/`'event_study'`; SA: `results.event_study_effects` / `to_dataframe(level='cohort')`; subgroup re-estimation. +8. **Robustness** - compare 2-3 estimators (CS vs SA vs BJS), report with and without covariates (shows whether conditioning drives identification), present pre-trends and sensitivity bounds. 
-# Event study effects (by relative time to treatment) -for rel_time, effect in results.event_study_effects.items(): - print(f"e={rel_time}: {effect['effect']:.3f} (SE: {effect['se']:.3f})") +Full guide: `diff_diff.get_llm_guide("practitioner")`. -# Overall ATT -print(f"Overall ATT: {results.overall_att:.3f} (SE: {results.overall_se:.3f})") +## Estimators -# Cohort weights (how each cohort contributes to each event-time estimate) -for rel_time, weights in results.cohort_weights.items(): - print(f"e={rel_time}: {weights}") -``` - -**Parameters:** - -```python -SunAbraham( - control_group='never_treated', # or 'not_yet_treated' - anticipation=0, # Periods before treatment with effects - alpha=0.05, # Significance level - cluster=None, # Column for cluster SEs - n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs) - bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb' - seed=None # Random seed -) -``` - -**Bootstrap inference:** - -```python -# Bootstrap inference with 999 iterations -sa = SunAbraham( - n_bootstrap=999, - bootstrap_weights='rademacher', - seed=42 -) -results = sa.fit( - data, - outcome='sales', - unit='firm_id', - time='year', - first_treat='first_treat' -) +- [DifferenceInDifferences](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - basic 2x2 DiD with robust/cluster-robust SEs, wild bootstrap, formula interface, and fixed effects +- [TwoWayFixedEffects](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - panel data DiD with unit and time fixed effects via within-transformation or dummies +- [MultiPeriodDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - event study design with period-specific treatment effects for dynamic analysis +- [CallawaySantAnna](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Callaway & Sant'Anna (2021) group-time ATT estimator for staggered adoption +- 
[ChaisemartinDHaultfoeuille](https://diff-diff.readthedocs.io/en/stable/api/chaisemartin_dhaultfoeuille.html) - de Chaisemartin & D'Haultfœuille (2020/2022) for **reversible (non-absorbing) treatments** with multi-horizon event study, normalized effects, cost-benefit delta, sup-t bands, and dynamic placebos. The only library option for treatments that switch on AND off. Alias `DCDH`. +- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies +- [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects +- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance +- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units +- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility +- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves +- [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html) - de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲ = 0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲ > 0`, continuous-near-d̲ or mass-point), with a multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release - repeated cross-sections rejected by the validator. Alias `HAD`. 
+- [StackedDiD](https://diff-diff.readthedocs.io/en/stable/api/stacked_did.html) - Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments +- [EfficientDiD](https://diff-diff.readthedocs.io/en/stable/api/efficient_did.html) - Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs +- [TROP](https://diff-diff.readthedocs.io/en/stable/api/trop.html) - Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment +- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/staggered.html#staggeredtripledifference) - Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT +- [WooldridgeDiD](https://diff-diff.readthedocs.io/en/stable/api/wooldridge_etwfe.html) - Wooldridge (2023, 2025) ETWFE: saturated OLS, logit/Poisson QMLE (ASF-based ATT). Alias `ETWFE`. +- [BaconDecomposition](https://diff-diff.readthedocs.io/en/stable/api/bacon.html) - Goodman-Bacon (2021) decomposition for diagnosing TWFE bias in staggered settings -# Access bootstrap results -print(f"Overall ATT: {results.overall_att:.3f}") -print(f"Bootstrap SE: {results.bootstrap_results.overall_att_se:.3f}") -print(f"Bootstrap 95% CI: {results.bootstrap_results.overall_att_ci}") -print(f"Bootstrap p-value: {results.bootstrap_results.overall_att_p_value:.4f}") -``` - -**When to use Sun-Abraham vs Callaway-Sant'Anna:** - -| Aspect | Sun-Abraham | Callaway-Sant'Anna | -|--------|-------------|-------------------| -| Approach | Interaction-weighted regression | 2x2 DiD aggregation | -| Efficiency | More efficient under homogeneous effects | More robust to heterogeneity | -| Weighting | Weights by cohort share at each relative time | Weights by sample size | -| Use case | Robustness check, regression-based inference | Primary staggered DiD estimator | - -**Both estimators should give similar results when:** -- Treatment effects are relatively homogeneous across cohorts -- Parallel trends holds +## 
Diagnostics & Sensitivity

-**Running both as robustness check:**
+- [Parallel Trends Testing](https://diff-diff.readthedocs.io/en/stable/api/diagnostics.html) - simple and Wasserstein-robust parallel trends tests, equivalence testing (TOST)
+- [Placebo Tests](https://diff-diff.readthedocs.io/en/stable/api/diagnostics.html) - placebo timing, group, permutation, leave-one-out
+- [Honest DiD](https://diff-diff.readthedocs.io/en/stable/api/honest_did.html) - Rambachan & Roth (2023) sensitivity analysis: robust CI under PT violations, breakdown values
+- [Pre-Trends Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/pretrends.html) - Roth (2022) minimum detectable violation and power curves
+- [Power Analysis](https://diff-diff.readthedocs.io/en/stable/api/power.html) - analytical and simulation-based MDE, sample size, power curves for study design

-```python
-from diff_diff import CallawaySantAnna, SunAbraham
-
-# Callaway-Sant'Anna
-cs = CallawaySantAnna()
-cs_results = cs.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
+## Survey Support

-# Sun-Abraham
-sa = SunAbraham()
-sa_results = sa.fit(data, outcome='y', unit='unit', time='time', first_treat='first_treat')
+Most estimators accept an optional `survey_design` parameter (or `survey=` / `weights=` for `HeterogeneousAdoptionDiD`) for design-based variance estimation. Coverage and supported weight types vary by estimator - see the [Survey Design Support compatibility matrix](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html#survey-design-support) for per-estimator coverage.

-# Compare
-print(f"Callaway-Sant'Anna ATT: {cs_results.overall_att:.3f}")
-print(f"Sun-Abraham ATT: {sa_results.overall_att:.3f}")
-
-# If results differ substantially, investigate heterogeneity
-```
+
+- **Design elements available across the supported set**: strata, PSU, FPC, lonely PSU handling, nest. Weight types vary by estimator: some surfaces (e.g. 
CallawaySantAnna, StackedDiD, the HAD continuous path) accept `pweight` only; others accept `pweight` / `fweight` / `aweight`. +- **Variance methods**: Taylor Series Linearization (TSL via Binder 1983), replicate weights (BRR / Fay / JK1 / JKn / SDR), survey-aware bootstrap +- **Diagnostics**: DEFF per coefficient, effective n, subpopulation analysis, weight trimming, CV on estimates +- **Repeated cross-sections**: `CallawaySantAnna(panel=False)` for BRFSS, ACS, CPS -### Borusyak-Jaravel-Spiess Imputation Estimator +No other Python or R DiD package offers design-based variance estimation for modern heterogeneity-robust estimators. -The Borusyak et al. (2024) imputation estimator is the **efficient** estimator for staggered DiD under parallel trends, producing ~50% shorter confidence intervals than Callaway-Sant'Anna and 2-3.5x shorter than Sun-Abraham under homogeneous treatment effects. +## Requirements -```python -from diff_diff import ImputationDiD, imputation_did +- Python 3.9 - 3.14 +- numpy >= 1.20 +- pandas >= 1.3 +- scipy >= 1.7 -# Basic usage -est = ImputationDiD() -results = est.fit(data, outcome='outcome', unit='unit', - time='period', first_treat='first_treat') -results.print_summary() +## Development -# Event study -results = est.fit(data, outcome='outcome', unit='unit', - time='period', first_treat='first_treat', - aggregate='event_study') +```bash +# Install with dev dependencies +pip install -e ".[dev]" -# Pre-trend test (Equation 9) -pt = results.pretrend_test(n_leads=3) -print(f"F-stat: {pt['f_stat']:.3f}, p-value: {pt['p_value']:.4f}") +# Run tests +pytest -# Convenience function -results = imputation_did(data, 'outcome', 'unit', 'period', 'first_treat', - aggregate='all') -``` - -```python -ImputationDiD( - anticipation=0, # Number of anticipation periods - alpha=0.05, # Significance level - cluster=None, # Cluster variable (defaults to unit) - n_bootstrap=0, # Bootstrap iterations (0=analytical inference) - seed=None, # Random seed - 
horizon_max=None, # Max event-study horizon - aux_partition="cohort_horizon", # Variance partition: "cohort_horizon", "cohort", "horizon" -) -``` - -**When to use Imputation DiD vs Callaway-Sant'Anna:** - -| Aspect | Imputation DiD | Callaway-Sant'Anna | -|--------|---------------|-------------------| -| Efficiency | Most efficient under homogeneous effects | Less efficient but more robust to heterogeneity | -| Control group | Always uses all untreated obs | Choice of never-treated or not-yet-treated | -| Inference | Conservative variance (Theorem 3) | Multiplier bootstrap | -| Pre-trends | Built-in F-test (Equation 9) | Separate testing | - -### Two-Stage DiD (Gardner 2022) - -Two-Stage DiD addresses TWFE bias in staggered adoption designs by estimating unit and time fixed effects on untreated observations only, then regressing the residualized outcomes on treatment indicators. Point estimates match the Imputation DiD estimator (Borusyak et al. 2024); the key difference is that Two-Stage DiD uses a GMM sandwich variance estimator that accounts for first-stage estimation error, while Imputation DiD uses a conservative variance (Theorem 3). 
- -```python -from diff_diff import TwoStageDiD - -# Basic usage -est = TwoStageDiD() -results = est.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat') -results.print_summary() -``` - -**Event study:** - -```python -# Event study aggregation with visualization -results = est.fit(data, outcome='outcome', unit='unit', time='period', - first_treat='first_treat', aggregate='event_study') -plot_event_study(results) -``` - -**Parameters:** - -```python -TwoStageDiD( - anticipation=0, # Periods of anticipation effects - alpha=0.05, # Significance level for CIs - cluster=None, # Column for cluster-robust SEs (defaults to unit) - n_bootstrap=0, # Bootstrap iterations (0 = analytical GMM SEs) - seed=None, # Random seed - rank_deficient_action='warn', # 'warn', 'error', or 'silent' - horizon_max=None, # Max event-study horizon -) -``` - -**When to use Two-Stage DiD vs Imputation DiD:** - -| Aspect | Two-Stage DiD | Imputation DiD | -|--------|--------------|---------------| -| Point estimates | Identical | Identical | -| Variance | GMM sandwich (accounts for first-stage error) | Conservative (Theorem 3, may overcover) | -| Intuition | Residualize then regress | Impute counterfactuals then aggregate | -| Reference impl. | R `did2s` package | R `didimputation` package | - -Both estimators are the efficient estimator under homogeneous treatment effects, producing shorter confidence intervals than Callaway-Sant'Anna or Sun-Abraham. - -### Stacked DiD (Wing, Freedman & Hollingsworth 2024) - -Stacked DiD addresses TWFE bias in staggered adoption settings by constructing a "clean" comparison dataset for each treatment cohort and stacking them together. Each cohort's sub-experiment compares units treated at that cohort's timing against units that are not yet treated (or never treated) within a symmetric event-study window. 
This avoids the "bad comparisons" problem in TWFE while retaining a regression-based framework that practitioners familiar with event studies will find intuitive. - -```python -from diff_diff import StackedDiD, generate_staggered_data - -# Generate sample data -data = generate_staggered_data(n_units=200, n_periods=12, - cohort_periods=[4, 6, 8], seed=42) - -# Fit stacked DiD with event study -est = StackedDiD(kappa_pre=2, kappa_post=2) -results = est.fit(data, outcome='outcome', unit='unit', - time='period', first_treat='first_treat', - aggregate='event_study') -results.print_summary() - -# Access stacked data for custom analysis -stacked = results.stacked_data - -# Convenience function -from diff_diff import stacked_did -results = stacked_did(data, 'outcome', 'unit', 'period', 'first_treat', - kappa_pre=2, kappa_post=2, aggregate='event_study') -``` - -**Parameters:** - -```python -StackedDiD( - kappa_pre=1, # Pre-treatment event-study periods - kappa_post=1, # Post-treatment event-study periods - weighting='aggregate', # 'aggregate', 'population', or 'sample_share' - clean_control='not_yet_treated', # 'not_yet_treated', 'strict', or 'never_treated' - cluster='unit', # 'unit' or 'unit_subexp' - alpha=0.05, # Significance level - anticipation=0, # Anticipation periods - rank_deficient_action='warn', # 'warn', 'error', or 'silent' -) -``` - -> **Note:** Group aggregation (`aggregate='group'`) is not supported because the pooled -> stacked regression cannot produce cohort-specific effects. Use `CallawaySantAnna` or -> `ImputationDiD` for cohort-level estimates. 
- -**When to use Stacked DiD vs Callaway-Sant'Anna:** - -| Aspect | Stacked DiD | Callaway-Sant'Anna | -|--------|-------------|-------------------| -| Approach | Stack cohort sub-experiments, run pooled TWFE | 2x2 DiD aggregation | -| Symmetric windows | Enforced via kappa_pre / kappa_post | Not required | -| Control group | Not-yet-treated (default) or never-treated | Never-treated or not-yet-treated | -| Covariates | Passed to pooled regression | Doubly robust / IPW | -| Intuition | Familiar event-study regression | Nonparametric aggregation | - -**Convenience function:** - -```python -# One-liner estimation -results = stacked_did( - data, - outcome='outcome', - unit='unit', - time='period', - first_treat='first_treat', - kappa_pre=3, - kappa_post=3, - aggregate='event_study' -) -``` - -### Efficient DiD (Chen, Sant'Anna & Xie 2025) - -Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs along the **no-covariate path**, producing tighter confidence intervals than standard estimators when the stronger PT-All assumption holds. It optimally weights across all valid comparison groups and baselines via the inverse covariance matrix Omega*. A doubly-robust covariate path is also available: it is consistent if either the outcome regression or the sieve propensity ratio is correctly specified, but the linear OLS outcome regression does not generically attain the efficiency bound unless the conditional mean is linear in the covariates. 
- -```python -from diff_diff import EfficientDiD, generate_staggered_data - -# Generate sample data -data = generate_staggered_data(n_units=300, n_periods=10, - cohort_periods=[4, 6, 8], seed=42) - -# Fit with PT-All (overidentified, tighter SEs) -edid = EfficientDiD(pt_assumption="all") -results = edid.fit(data, outcome='outcome', unit='unit', - time='period', first_treat='first_treat', - aggregate='all') -results.print_summary() - -# PT-Post mode (matches CS for post-treatment effects) -edid_post = EfficientDiD(pt_assumption="post") -results_post = edid_post.fit(data, outcome='outcome', unit='unit', - time='period', first_treat='first_treat') -``` - -**Parameters:** - -```python -EfficientDiD( - pt_assumption='all', # 'all' (overidentified) or 'post' (matches CS post-treatment ATT) - alpha=0.05, # Significance level - n_bootstrap=0, # Bootstrap iterations (0 = analytical only) - bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb' - seed=None, # Random seed - anticipation=0, # Anticipation periods -) -``` - -> **Note:** EfficientDiD supports covariate adjustment via a doubly-robust path -> (sieve-based propensity score ratios and a linear OLS outcome regression). -> The DR property gives consistency if either the OR or the PS is correctly -> specified, but the OLS working model for the outcome regression does not -> generically attain the semiparametric efficiency bound. The unqualified -> efficiency-bound claim applies to the no-covariate path only. See the -> `covariates` parameter on `fit()` and `docs/methodology/REGISTRY.md`. 
- -**When to use Efficient DiD vs Callaway-Sant'Anna:** - -| Aspect | Efficient DiD | Callaway-Sant'Anna | -|--------|--------------|-------------------| -| Approach | Optimal EIF-based weighting | Separate 2x2 DiD aggregation | -| PT assumption | PT-All (stronger) or PT-Post | Conditional PT | -| Efficiency | Achieves semiparametric bound on the no-covariate path; DR covariate path is consistent but does not generically attain the bound under a linear OLS outcome regression | Not efficient | -| Covariates | Supported (doubly robust, sieve-based PS + linear OLS OR) | Supported (OR, IPW, DR) | -| When to choose | Maximum efficiency, PT-All credible | Covariates needed, weaker PT | - -### de Chaisemartin-D'Haultfœuille (dCDH) for Reversible Treatments - -`ChaisemartinDHaultfoeuille` (alias `DCDH`) is the only library estimator that handles **non-absorbing (reversible) treatments** — treatment can switch on AND off over time. This is the natural fit for marketing campaigns, seasonal promotions, on/off policy cycles. - -Ships `DID_M` (= `DID_1` at horizon `l = 1`), the full multi-horizon event study `DID_l` for `l = 1..L_max` via the `L_max` parameter, residualization-style covariate adjustment (`controls`), group-specific linear trends (`trends_linear`), state-set-specific trends (`trends_nonparam`), heterogeneity testing, non-binary treatment, HonestDiD sensitivity integration on placebos, and survey support via Taylor-series linearization. 
- -```python -from diff_diff import ChaisemartinDHaultfoeuille -from diff_diff.prep import generate_reversible_did_data - -# Generate a reversible-treatment panel -data = generate_reversible_did_data( - n_groups=80, n_periods=6, pattern="single_switch", seed=42, -) - -# Fit the estimator -est = ChaisemartinDHaultfoeuille() -results = est.fit( - data, - outcome="outcome", - group="group", - time="period", - treatment="treatment", -) -results.print_summary() - -# Decomposition -print(f"DID_M (overall): {results.overall_att:.3f}") -print(f"DID_+ (joiners): {results.joiners_att:.3f}") -print(f"DID_- (leavers): {results.leavers_att:.3f}") -print(f"Placebo (DID^pl): {results.placebo_effect:.3f}") -``` - -**Parameters:** - -```python -ChaisemartinDHaultfoeuille( - alpha=0.05, # Significance level - n_bootstrap=0, # 0 = analytical SE only; >0 = multiplier bootstrap - bootstrap_weights="rademacher", # 'rademacher', 'mammen', or 'webb' - seed=None, # Random seed for bootstrap - placebo=True, # Auto-compute single-lag placebo - twfe_diagnostic=True, # Auto-compute TWFE decomposition diagnostic - drop_larger_lower=True, # Drop multi-switch groups (matches R DIDmultiplegtDYN) - rank_deficient_action="warn", # Used by TWFE diagnostic OLS -) -``` - -**What you get back on the results object:** - -| Field | Description | -|-------|-------------| -| `overall_att`, `overall_se`, `overall_conf_int` | `DID_M` when `L_max=None`; cost-benefit `delta` when `L_max > 1` (delta-method SE from per-horizon SEs) | -| `joiners_att`, `leavers_att` | Decomposition into the joiners (`DID_+`) and leavers (`DID_-`) views | -| `placebo_effect` | Single-lag placebo (`DID_M^pl`) point estimate | -| `per_period_effects` | Per-period decomposition with explicit A11-violation flags | -| `twfe_weights`, `twfe_fraction_negative`, `twfe_sigma_fe`, `twfe_beta_fe` | Theorem 1 decomposition diagnostic | -| `n_groups_dropped_crossers`, `n_groups_dropped_singleton_baseline` | Filter counts (multi-switch groups 
dropped before estimation; singleton-baseline groups excluded from variance) | -| `n_groups_dropped_never_switching` | Backwards-compatibility metadata. Never-switching groups participate in the variance via stable-control roles; this field is no longer a filter count. | - -**Multi-horizon event study** (pass `L_max` to `fit()`): - -```python -results = est.fit(data, outcome="outcome", group="group", - time="period", treatment="treatment", L_max=5) - -# Per-horizon effects with analytical SE -for horizon in sorted(results.event_study_effects): - e = results.event_study_effects[horizon] - print(f" l={horizon}: DID_l={e['effect']:.3f} (SE={e['se']:.3f})") - -# Cost-benefit delta (becomes overall_att when L_max > 1) -print(f"Cost-benefit delta: {results.cost_benefit_delta['delta']:.3f}") - -# Normalized effects: DID^n_l = DID_l / l (for binary treatment) -for horizon in sorted(results.normalized_effects): - print(f" DID^n_{horizon} = {results.normalized_effects[horizon]['effect']:.3f}") - -# Event study DataFrame (includes placebos as negative horizons) -df = results.to_dataframe("event_study") - -# Plot (integrates with plot_event_study) -from diff_diff import plot_event_study -plot_event_study(results) -``` - -**Standalone TWFE decomposition diagnostic** (without fitting the full estimator): - -```python -from diff_diff import twowayfeweights - -diagnostic = twowayfeweights( - data, outcome="outcome", group="group", time="period", treatment="treatment", -) -print(f"Plain TWFE coefficient: {diagnostic.beta_fe:.3f}") -print(f"Fraction of negative weights: {diagnostic.fraction_negative:.3f}") -print(f"sigma_fe (sign-flipping threshold): {diagnostic.sigma_fe:.3f}") -``` - -> **Note:** Placebo SE is `NaN` for the single-period `DID_M^pl` (`L_max=None`) because the per-period aggregation path has no influence-function derivation; the point estimate is meaningful for visual pre-trends inspection. 
Multi-horizon dynamic placebos `DID^{pl}_l` (`L_max >= 1`) have valid analytical SE via the same cohort-recentered plug-in variance as the positive horizons, with bootstrap SE available when `n_bootstrap > 0`. See `docs/methodology/REGISTRY.md` for the full contract. - -> **Note:** By default (`drop_larger_lower=True`), the estimator drops groups whose treatment switches more than once before estimation. This matches R `DIDmultiplegtDYN`'s default and is required for the analytical variance formula to be consistent with the point estimate. Each drop emits an explicit warning. - -> **Note:** The estimator requires panels with a **balanced baseline** (every group observed at the first global period) and **no interior period gaps**. Late-entry groups (missing the baseline) raise `ValueError`; interior-gap groups are dropped with a warning; terminally-missing groups (early exit / right-censoring) are retained and contribute from their observed periods only. This is a documented deviation from R `DIDmultiplegtDYN`, which supports unbalanced panels - see [`docs/methodology/REGISTRY.md`](docs/methodology/REGISTRY.md) for the rationale, the defensive guards that make terminal missingness safe, and workarounds for unbalanced inputs. - -> **Note:** Survey design is supported via Taylor-series linearization on `pweight` with strata / PSU / FPC. Replicate-weight variance and PSU-level bootstrap for dCDH are a planned extension. The `aggregate` parameter still raises `NotImplementedError`. - -### Triple Difference (DDD) - -Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations). 
- -```python -from diff_diff import TripleDifference, triple_difference - -# Basic usage -ddd = TripleDifference(estimation_method='dr') # doubly robust (recommended) -results = ddd.fit( - data, - outcome='wages', - group='policy_state', # 1=state enacted policy, 0=control state - partition='female', # 1=women (affected by policy), 0=men - time='post' # 1=post-policy, 0=pre-policy -) - -# View results -results.print_summary() -print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})") - -# With covariates (properly incorporated, unlike naive DDD) -results = ddd.fit( - data, - outcome='wages', - group='policy_state', - partition='female', - time='post', - covariates=['age', 'education', 'experience'] -) -``` - -**Estimation methods:** - -| Method | Description | When to use | -|--------|-------------|-------------| -| `"dr"` | Doubly robust | Recommended. Consistent if either outcome or propensity model is correct | -| `"reg"` | Regression adjustment | Simple outcome regression with full interactions | -| `"ipw"` | Inverse probability weighting | When propensity score model is well-specified | - -```python -# Compare estimation methods -for method in ['reg', 'ipw', 'dr']: - est = TripleDifference(estimation_method=method) - res = est.fit(data, outcome='y', group='g', partition='p', time='t') - print(f"{method}: ATT={res.att:.3f} (SE={res.se:.3f})") -``` - -**Convenience function:** - -```python -# One-liner estimation -results = triple_difference( - data, - outcome='wages', - group='policy_state', - partition='female', - time='post', - covariates=['age', 'education'], - estimation_method='dr' -) -``` - -**Why use DDD instead of DiD?** - -DDD allows for violations of parallel trends that are: -- Group-specific (e.g., economic shocks in treatment states) -- Partition-specific (e.g., trends affecting women everywhere) - -As long as these biases are additive, DDD differences them out. 
The key assumption is that the *differential* trend between eligible and ineligible units would be the same across groups. - -### Event Study Visualization - -Create publication-ready event study plots: - -```python -from diff_diff import plot_event_study, MultiPeriodDiD, CallawaySantAnna, SunAbraham - -# From MultiPeriodDiD (full event study with pre and post period effects) -did = MultiPeriodDiD() -results = did.fit(data, outcome='y', treatment='treated', - time='period', post_periods=[3, 4, 5], reference_period=2) -plot_event_study(results, title="Treatment Effects Over Time") - -# From CallawaySantAnna (with event study aggregation) -cs = CallawaySantAnna() -results = cs.fit(data, outcome='y', unit='unit', time='period', - first_treat='first_treat', aggregate='event_study') -plot_event_study(results, title="Staggered DiD Event Study (CS)") - -# From SunAbraham -sa = SunAbraham() -results = sa.fit(data, outcome='y', unit='unit', time='period', - first_treat='first_treat') -plot_event_study(results, title="Staggered DiD Event Study (SA)") - -# From a DataFrame -df = pd.DataFrame({ - 'period': [-2, -1, 0, 1, 2], - 'effect': [0.1, 0.05, 0.0, 2.5, 2.8], - 'se': [0.3, 0.25, 0.0, 0.4, 0.45] -}) -plot_event_study(df, reference_period=0) - -# With customization -ax = plot_event_study( - results, - title="Dynamic Treatment Effects", - xlabel="Years Relative to Treatment", - ylabel="Effect on Sales ($1000s)", - color="#2563eb", - marker="o", - shade_pre=True, # Shade pre-treatment region - show_zero_line=True, # Horizontal line at y=0 - show_reference_line=True, # Vertical line at reference period - figsize=(10, 6), - show=False # Don't call plt.show(), return axes -) -``` - -### Synthetic Difference-in-Differences - -Synthetic DiD combines the strengths of Difference-in-Differences and Synthetic Control methods by re-weighting control units to better match treated units' pre-treatment outcomes. 
- -```python -from diff_diff import SyntheticDiD - -# Fit Synthetic DiD model -sdid = SyntheticDiD() -results = sdid.fit( - panel_data, - outcome='gdp_growth', - treatment='treated', - unit='state', - time='year', - post_periods=[2015, 2016, 2017, 2018] -) - -# View results -results.print_summary() -print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})") - -# Examine unit weights (which control units matter most) -weights_df = results.get_unit_weights_df() -print(weights_df.head(10)) - -# Examine time weights -time_weights_df = results.get_time_weights_df() -print(time_weights_df) -``` - -Output: -``` -=========================================================================== - Synthetic Difference-in-Differences Estimation Results -=========================================================================== - -Observations: 500 -Treated units: 1 -Control units: 49 -Pre-treatment periods: 6 -Post-treatment periods: 4 -Regularization (lambda): 0.0000 -Pre-treatment fit (RMSE): 0.1234 - ---------------------------------------------------------------------------- -Parameter Estimate Std. Err. t-stat P>|t| ---------------------------------------------------------------------------- -ATT 2.5000 0.4521 5.530 0.0000 ---------------------------------------------------------------------------- - -95% Confidence Interval: [1.6139, 3.3861] - ---------------------------------------------------------------------------- - Top Unit Weights (Synthetic Control) ---------------------------------------------------------------------------- - Unit state_12: 0.3521 - Unit state_5: 0.2156 - Unit state_23: 0.1834 - Unit state_8: 0.1245 - Unit state_31: 0.0892 - (8 units with weight > 0.001) - -Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1 -=========================================================================== -``` - -#### When to Use Synthetic DiD Over Vanilla DiD - -Use Synthetic DiD instead of standard DiD when: - -1. 
**Few treated units**: When you have only one or a small number of treated units (e.g., a single state passed a policy), standard DiD averages across all controls equally. Synthetic DiD finds the optimal weighted combination of controls. - - ```python - # Example: California passed a policy, want to estimate its effect - # Standard DiD would compare CA to the average of all other states - # Synthetic DiD finds states that together best match CA's pre-treatment trend - ``` - -2. **Parallel trends is questionable**: When treated and control groups have different pre-treatment levels or trends, Synthetic DiD can construct a better counterfactual by matching the pre-treatment trajectory. - - ```python - # Example: A tech hub city vs rural areas - # Rural areas may not be a good comparison on average - # Synthetic DiD can weight urban/suburban controls more heavily - ``` - -3. **Heterogeneous control units**: When control units are very different from each other, equal weighting (as in standard DiD) is suboptimal. - - ```python - # Example: Comparing a treated developing country to other countries - # Some control countries may be much more similar economically - # Synthetic DiD upweights the most comparable controls - ``` - -4. **You want transparency**: Synthetic DiD provides explicit unit weights showing which controls contribute most to the comparison. 
- - ```python - # See exactly which units are driving the counterfactual - print(results.get_unit_weights_df()) - ``` - -**Key differences from standard DiD:** - -| Aspect | Standard DiD | Synthetic DiD | -|--------|--------------|---------------| -| Control weighting | Equal (1/N) | Optimized to match pre-treatment | -| Time weighting | Equal across periods | Can emphasize informative periods | -| N treated required | Can be many | Works with 1 treated unit | -| Parallel trends | Assumed | Partially relaxed via matching | -| Interpretability | Simple average | Explicit weights | - -**Parameters:** - -```python -SyntheticDiD( - zeta_omega=None, # Unit weight regularization (None = auto-computed from data) - zeta_lambda=None, # Time weight regularization (None = auto-computed from data) - alpha=0.05, # Significance level - variance_method="placebo", # "placebo" (default, matches R) or "bootstrap" - n_bootstrap=200, # Replications for SE estimation - seed=None # Random seed for reproducibility -) -``` - -### Triply Robust Panel (TROP) - -TROP (Athey, Imbens, Qu & Viviano 2025) extends Synthetic DiD by adding interactive fixed effects (factor model) adjustment. It's particularly useful when there are unobserved time-varying confounders with a factor structure that could bias standard DiD or SDID estimates. - -TROP combines three robustness components: -1. **Nuclear norm regularized factor model**: Estimates interactive fixed effects L_it via soft-thresholding -2. **Exponential distance-based unit weights**: ω_j = exp(-λ_unit × distance(j,i)) -3. **Exponential time decay weights**: θ_s = exp(-λ_time × |s-t|) - -Tuning parameters are selected via leave-one-out cross-validation (LOOCV). 
- -```python -from diff_diff import TROP, trop - -# Fit TROP model with automatic tuning via LOOCV -trop_est = TROP( - lambda_time_grid=[0.0, 0.5, 1.0, 2.0], # Time decay grid - lambda_unit_grid=[0.0, 0.5, 1.0, 2.0], # Unit distance grid - lambda_nn_grid=[0.0, 0.1, 1.0], # Nuclear norm grid - n_bootstrap=200 -) -# Note: TROP infers treatment periods from the treatment indicator column. -# The 'treated' column must be an absorbing state (D=1 for all periods -# during and after treatment starts for each unit). -results = trop_est.fit( - panel_data, - outcome='gdp_growth', - treatment='treated', - unit='state', - time='year' -) - -# View results -results.print_summary() -print(f"ATT: {results.att:.3f} (SE: {results.se:.3f})") -print(f"Effective rank: {results.effective_rank:.2f}") - -# Selected tuning parameters -print(f"λ_time: {results.lambda_time:.2f}") -print(f"λ_unit: {results.lambda_unit:.2f}") -print(f"λ_nn: {results.lambda_nn:.2f}") - -# Examine unit effects -unit_effects = results.get_unit_effects_df() -print(unit_effects.head(10)) -``` - -Output: -``` -=========================================================================== - Triply Robust Panel (TROP) Estimation Results - Athey, Imbens, Qu & Viviano (2025) -=========================================================================== - -Observations: 500 -Treated units: 1 -Control units: 49 -Treated observations: 4 -Pre-treatment periods: 6 -Post-treatment periods: 4 - ---------------------------------------------------------------------------- - Tuning Parameters (selected via LOOCV) ---------------------------------------------------------------------------- -Lambda (time decay): 1.0000 -Lambda (unit distance): 0.5000 -Lambda (nuclear norm): 0.1000 -Effective rank: 2.35 -LOOCV score: 0.012345 -Variance method: bootstrap -Bootstrap replications: 200 - ---------------------------------------------------------------------------- -Parameter Estimate Std. Err. 
t-stat P>|t| ---------------------------------------------------------------------------- -ATT 2.5000 0.3892 6.424 0.0000 *** ---------------------------------------------------------------------------- - -95% Confidence Interval: [1.7372, 3.2628] - -Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1 -=========================================================================== -``` - -#### When to Use TROP Over Synthetic DiD - -Use TROP when you suspect **factor structure** in the data—unobserved confounders that affect outcomes differently across units and time: - -| Scenario | Use SDID | Use TROP | -|----------|----------|----------| -| Simple parallel trends | ✓ | ✓ | -| Unobserved factors (e.g., economic cycles) | May be biased | ✓ | -| Strong unit-time interactions | May be biased | ✓ | -| Low-dimensional confounding | ✓ | ✓ | - -**Example scenarios where TROP excels:** -- Regional economic shocks that affect states differently based on industry composition -- Global trends that impact countries differently based on their economic structure -- Common factors in financial data (market risk, interest rates, etc.) - -**How TROP works:** - -1. **Factor estimation**: Estimates interactive fixed effects L_it using nuclear norm regularization (encourages low-rank structure) -2. **Unit weights**: Exponential distance-based weighting ω_j = exp(-λ_unit × d(j,i)) where d(j,i) is the RMSE of outcome differences -3. **Time weights**: Exponential decay weighting θ_s = exp(-λ_time × |s-t|) based on proximity to treatment -4. 
**ATT computation**: τ = Y_it - α_i - β_t - L_it for treated observations - -```python -# Compare TROP vs SDID under factor confounding -from diff_diff import SyntheticDiD - -# Synthetic DiD (may be biased with factors) -sdid = SyntheticDiD() -sdid_results = sdid.fit(data, outcome='y', treatment='treated', - unit='unit', time='time', post_periods=[5,6,7]) - -# TROP (accounts for factors) -# Note: TROP infers treatment periods from the treatment indicator column -# (D=1 for treated observations, D=0 for control) -trop_est = TROP() # Uses default grids with LOOCV selection -trop_results = trop_est.fit(data, outcome='y', treatment='treated', - unit='unit', time='time') - -print(f"SDID estimate: {sdid_results.att:.3f}") -print(f"TROP estimate: {trop_results.att:.3f}") -print(f"Effective rank: {trop_results.effective_rank:.2f}") -``` - -**Tuning parameter grids:** - -```python -# Custom tuning grids (searched via LOOCV) -trop = TROP( - lambda_time_grid=[0.0, 0.1, 0.5, 1.0, 2.0, 5.0], # Time decay - lambda_unit_grid=[0.0, 0.1, 0.5, 1.0, 2.0, 5.0], # Unit distance - lambda_nn_grid=[0.0, 0.01, 0.1, 1.0, 10.0] # Nuclear norm -) - -# Fixed tuning parameters (skip LOOCV search) -trop = TROP( - lambda_time_grid=[1.0], # Single value = fixed - lambda_unit_grid=[1.0], # Single value = fixed - lambda_nn_grid=[0.1] # Single value = fixed -) -``` - -**Parameters:** - -```python -TROP( - method='local', # Estimation method: 'local' (default) or 'global' - lambda_time_grid=None, # Time decay grid (default: [0, 0.1, 0.5, 1, 2, 5]) - lambda_unit_grid=None, # Unit distance grid (default: [0, 0.1, 0.5, 1, 2, 5]) - lambda_nn_grid=None, # Nuclear norm grid (default: [0, 0.01, 0.1, 1, 10]) - max_iter=100, # Max iterations for factor estimation - tol=1e-6, # Convergence tolerance - alpha=0.05, # Significance level - n_bootstrap=200, # Bootstrap replications - seed=None # Random seed -) -``` - -**Estimation methods:** -- `'local'` (default): Per-observation model fitting following Algorithm 2 
of the paper. Computes observation-specific weights and fits a model for each treated observation, then averages the individual treatment effects. More flexible but computationally intensive. -- `'global'`: Global weighted least squares optimization. Fits a single model on control observations with global weights, then computes per-observation treatment effects as residuals. Faster but uses global rather than observation-specific weights. - -**Convenience function:** - -```python -# One-liner estimation with default tuning grids -# Note: TROP infers treatment periods from the treatment indicator -results = trop( - data, - outcome='y', - treatment='treated', - unit='unit', - time='time', - n_bootstrap=200 -) -``` - -## Working with Results - -### Export Results - -```python -# As dictionary -results.to_dict() -# {'att': 3.5, 'se': 1.26, 'p_value': 0.037, ...} - -# As DataFrame -df = results.to_dataframe() -``` - -### Check Significance - -```python -if results.is_significant: - print(f"Effect is significant at {did.alpha} level") - -# Get significance stars -print(f"ATT: {results.att}{results.significance_stars}") -# ATT: 3.5000* -``` - -### Access Full Regression Output - -```python -# All coefficients -results.coefficients -# {'const': 9.5, 'treated': 1.0, 'post': 2.5, 'treated:post': 3.5} - -# Variance-covariance matrix -results.vcov - -# Residuals and fitted values -results.residuals -results.fitted_values - -# R-squared -results.r_squared -``` - -## Checking Assumptions - -### Parallel Trends - -**Simple slope-based test:** - -```python -from diff_diff.utils import check_parallel_trends - -trends = check_parallel_trends( - data, - outcome='outcome', - time='period', - treatment_group='treated' -) - -print(f"Treated trend: {trends['treated_trend']:.4f}") -print(f"Control trend: {trends['control_trend']:.4f}") -print(f"Difference p-value: {trends['p_value']:.4f}") -``` - -**Robust distributional test (Wasserstein distance):** - -```python -from diff_diff.utils 
import check_parallel_trends_robust - -results = check_parallel_trends_robust( - data, - outcome='outcome', - time='period', - treatment_group='treated', - unit='firm_id', # Unit identifier for panel data - pre_periods=[2018, 2019], # Pre-treatment periods - n_permutations=1000 # Permutations for p-value -) - -print(f"Wasserstein distance: {results['wasserstein_distance']:.4f}") -print(f"Wasserstein p-value: {results['wasserstein_p_value']:.4f}") -print(f"KS test p-value: {results['ks_p_value']:.4f}") -print(f"Parallel trends plausible: {results['parallel_trends_plausible']}") -``` - -The Wasserstein (Earth Mover's) distance compares the full distribution of outcome changes, not just means. This is more robust to: -- Non-normal distributions -- Heterogeneous effects across units -- Outliers - -**Equivalence testing (TOST):** - -```python -from diff_diff.utils import equivalence_test_trends - -results = equivalence_test_trends( - data, - outcome='outcome', - time='period', - treatment_group='treated', - unit='firm_id', - equivalence_margin=0.5 # Define "practically equivalent" -) - -print(f"Mean difference: {results['mean_difference']:.4f}") -print(f"TOST p-value: {results['tost_p_value']:.4f}") -print(f"Trends equivalent: {results['equivalent']}") -``` - -### Honest DiD Sensitivity Analysis (Rambachan-Roth) - -Pre-trends tests have low power and can exacerbate bias. **Honest DiD** (Rambachan & Roth 2023) provides sensitivity analysis showing how robust your results are to violations of parallel trends. 
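The relative-magnitudes restriction can be illustrated with a toy calculation. This is a deliberately simplified sketch of the idea, not the library's algorithm (HonestDiD solves a proper optimization and accounts for sampling uncertainty); all coefficients below are hypothetical:

```python
import numpy as np

# Hypothetical event-study output: pre-period coefficients and one post-period
# estimate. Under the relative magnitudes restriction, post-treatment
# violations of parallel trends are bounded by M times the largest
# violation observed in the pre-treatment periods.
pre_coefs = np.array([0.02, -0.05, 0.03])  # pre-period effects (hypothetical)
tau_hat = 0.50                             # post-period point estimate (hypothetical)

# Worst pre-treatment violation: largest consecutive change in the pre-trend
# (the reference period's coefficient is 0 by construction, so append it)
delta_pre = np.max(np.abs(np.diff(np.r_[pre_coefs, 0.0])))

for M in [0.5, 1.0, 2.0]:
    lb, ub = tau_hat - M * delta_pre, tau_hat + M * delta_pre
    print(f"M={M}: identified set roughly [{lb:.3f}, {ub:.3f}]")
```

As M grows the identified set widens; the breakdown value reported by `sensitivity_analysis` is the smallest M at which the corresponding robust CI first includes zero.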
- -```python -from diff_diff import HonestDiD, MultiPeriodDiD - -# First, fit a full event study (pre + post period effects) -did = MultiPeriodDiD() -event_results = did.fit( - data, - outcome='outcome', - treatment='treated', - time='period', - post_periods=[5, 6, 7, 8, 9], - reference_period=4, # Last pre-period (e=-1 convention) -) - -# Compute honest bounds with relative magnitudes restriction -# M=1 means post-treatment violations can be up to 1x the worst pre-treatment violation -honest = HonestDiD(method='relative_magnitude', M=1.0) -honest_results = honest.fit(event_results) - -print(honest_results.summary()) -print(f"Original estimate: {honest_results.original_estimate:.4f}") -print(f"Robust 95% CI: [{honest_results.ci_lb:.4f}, {honest_results.ci_ub:.4f}]") -print(f"Effect robust to violations: {honest_results.is_significant}") -``` - -**Sensitivity analysis over M values:** - -```python -# How do results change as we allow larger violations? -sensitivity = honest.sensitivity_analysis( - event_results, - M_grid=[0, 0.5, 1.0, 1.5, 2.0] -) - -print(sensitivity.summary()) -print(f"Breakdown value: M = {sensitivity.breakdown_M}") -# Breakdown = smallest M where the robust CI includes zero -``` - -**Breakdown value:** - -The breakdown value tells you how robust your conclusion is: - -```python -breakdown = honest.breakdown_value(event_results) -if breakdown >= 1.0: - print("Result holds even if post-treatment violations are as bad as pre-treatment") -else: - print(f"Result requires violations smaller than {breakdown:.1f}x pre-treatment") -``` - -**Smoothness restriction (alternative approach):** - -```python -# Bounds second differences of trend violations -# M=0 means linear extrapolation of pre-trends -honest_smooth = HonestDiD(method='smoothness', M=0.5) -smooth_results = honest_smooth.fit(event_results) -``` - -**Visualization:** - -```python -from diff_diff import plot_sensitivity, plot_honest_event_study - -# Plot sensitivity analysis 
-plot_sensitivity(sensitivity, title="Sensitivity to Parallel Trends Violations") - -# Event study with honest confidence intervals -plot_honest_event_study(event_results, honest_results) -``` - -### Pre-Trends Power Analysis (Roth 2022) - -A passing pre-trends test doesn't mean parallel trends holds—it may just mean the test has low power. **Pre-Trends Power Analysis** (Roth 2022) answers: "What violations could my pre-trends test have detected?" - -```python -from diff_diff import PreTrendsPower, MultiPeriodDiD - -# First, fit a full event study -did = MultiPeriodDiD() -event_results = did.fit( - data, - outcome='outcome', - treatment='treated', - time='period', - post_periods=[5, 6, 7, 8, 9], - reference_period=4, -) - -# Analyze pre-trends test power -pt = PreTrendsPower(alpha=0.05, power=0.80) -power_results = pt.fit(event_results) - -print(power_results.summary()) -print(f"Minimum Detectable Violation (MDV): {power_results.mdv:.4f}") -print(f"Power to detect violations of size MDV: {power_results.power:.1%}") -``` - -**Key concepts:** - -- **Minimum Detectable Violation (MDV)**: Smallest violation magnitude that would be detected with your target power (e.g., 80%). Passing the pre-trends test does NOT rule out violations up to this size. -- **Power**: Probability of detecting a violation of given size if it exists. -- **Violation types**: Linear trend, constant violation, last-period only, or custom patterns. 
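The relationship between the MDV, power, and the standard error can be reproduced with a toy z-test power calculation. This is a simplified illustration, not the `PreTrendsPower` implementation; the standard error below is hypothetical:

```python
from scipy.stats import norm
from scipy.optimize import brentq

alpha, target_power = 0.05, 0.80
se = 0.10                    # hypothetical SE of a single pre-trend coefficient
z = norm.ppf(1 - alpha / 2)  # two-sided critical value

def power(delta):
    """P(reject H0: coefficient = 0) when the true violation is delta."""
    return norm.sf(z - delta / se) + norm.cdf(-z - delta / se)

# MDV: the smallest violation that would be detected with the target power
mdv = brentq(lambda d: power(d) - target_power, 0.0, 10 * se)
print(f"MDV = {mdv:.3f}, power at MDV = {power(mdv):.1%}")
```

With se = 0.10 the MDV comes out to roughly 0.28: a passing pre-trends test says nothing about violations smaller than that, which is why the MDV should always be compared against the magnitude of the estimated treatment effect.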
-
-**Power curve visualization:**
-
-```python
-from diff_diff import plot_pretrends_power
-
-# Generate power curve across violation magnitudes
-curve = pt.power_curve(event_results)
-
-# Plot the power curve
-plot_pretrends_power(curve, title="Pre-Trends Test Power Curve")
-
-# Or from the curve object directly
-curve.plot()
-```
-
-**Different violation patterns:**
-
-```python
-import numpy as np
-
-# Linear trend violations (default) - most common assumption
-pt_linear = PreTrendsPower(violation_type='linear')
-
-# Constant violation in all pre-periods
-pt_constant = PreTrendsPower(violation_type='constant')
-
-# Violation only in the last pre-period (sharp break)
-pt_last = PreTrendsPower(violation_type='last_period')
-
-# Custom violation pattern
-custom_weights = np.array([0.1, 0.3, 0.6]) # Increasing violations
-pt_custom = PreTrendsPower(violation_type='custom', violation_weights=custom_weights)
-```
-
-**Combining with HonestDiD:**
-
-Pre-trends power analysis and HonestDiD are complementary:
-1. **Pre-trends power** tells you what the test could have detected
-2. **HonestDiD** tells you how robust your results are to violations
-
-```python
-from diff_diff import HonestDiD, PreTrendsPower
-
-# If MDV is large relative to your estimated effect, be cautious
-pt = PreTrendsPower()
-power_results = pt.fit(event_results)
-sensitivity = pt.sensitivity_to_honest_did(event_results)
-print(sensitivity['interpretation'])
-
-# Use HonestDiD for robust inference
-honest = HonestDiD(method='relative_magnitude', M=1.0)
-honest_results = honest.fit(event_results)
-```
-
-### Placebo Tests
-
-Placebo tests help validate the parallel trends assumption by checking whether effects appear where they shouldn't (before treatment or in untreated groups).
-
-**Fake timing test:**
-
-```python
-from diff_diff import run_placebo_test
-
-# Test: Is there an effect before treatment actually occurred? 
-# Actual treatment is at period 3 (post_periods=[3, 4, 5]) -# We test if a "fake" treatment at period 1 shows an effect -results = run_placebo_test( - data, - outcome='outcome', - treatment='treated', - time='period', - test_type='fake_timing', - fake_treatment_period=1, # Pretend treatment was in period 1 - post_periods=[3, 4, 5] # Actual post-treatment periods -) - -print(results.summary()) -# If parallel trends hold, placebo_effect should be ~0 and not significant -print(f"Placebo effect: {results.placebo_effect:.3f} (p={results.p_value:.3f})") -print(f"Is significant (bad): {results.is_significant}") -``` - -**Fake group test:** - -```python -# Test: Is there an effect among never-treated units? -# Get some control unit IDs to use as "fake treated" -control_units = data[data['treated'] == 0]['firm_id'].unique()[:5] - -results = run_placebo_test( - data, - outcome='outcome', - treatment='treated', - time='period', - unit='firm_id', - test_type='fake_group', - fake_treatment_group=list(control_units), # List of control unit IDs - post_periods=[3, 4, 5] -) -``` - -**Permutation test:** - -```python -# Randomly reassign treatment and compute distribution of effects -# Note: requires binary post indicator (use 'post' column, not 'period') -results = run_placebo_test( - data, - outcome='outcome', - treatment='treated', - time='post', # Binary post-treatment indicator - unit='firm_id', - test_type='permutation', - n_permutations=1000, - seed=42 -) - -print(f"Original effect: {results.original_effect:.3f}") -print(f"Permutation p-value: {results.p_value:.4f}") -# Low p-value indicates the effect is unlikely to be due to chance -``` - -**Leave-one-out sensitivity:** - -```python -# Test sensitivity to individual treated units -# Note: requires binary post indicator (use 'post' column, not 'period') -results = run_placebo_test( - data, - outcome='outcome', - treatment='treated', - time='post', # Binary post-treatment indicator - unit='firm_id', - 
test_type='leave_one_out' -) - -# Check if any single unit drives the result -print(results.leave_one_out_effects) # Effect when each unit is dropped -``` - -**Run all placebo tests:** - -```python -from diff_diff import run_all_placebo_tests - -# Comprehensive diagnostic suite -# Note: This function runs fake_timing tests on pre-treatment periods. -# The permutation and leave_one_out tests require a binary post indicator, -# so they may return errors if the data uses multi-period time column. -all_results = run_all_placebo_tests( - data, - outcome='outcome', - treatment='treated', - time='period', - unit='firm_id', - pre_periods=[0, 1, 2], - post_periods=[3, 4, 5], - n_permutations=500, - seed=42 -) - -for test_name, result in all_results.items(): - if hasattr(result, 'p_value'): - print(f"{test_name}: p={result.p_value:.3f}, significant={result.is_significant}") - elif isinstance(result, dict) and 'error' in result: - print(f"{test_name}: Error - {result['error']}") -``` - -## API Reference - -### DifferenceInDifferences - -```python -DifferenceInDifferences( - robust=True, # Use HC1 robust standard errors - cluster=None, # Column for cluster-robust SEs - alpha=0.05 # Significance level for CIs -) -``` - -**Methods:** - -| Method | Description | -|--------|-------------| -| `fit(data, outcome, treatment, time, ...)` | Fit the DiD model | -| `summary()` | Get formatted summary string | -| `print_summary()` | Print summary to stdout | -| `get_params()` | Get estimator parameters (sklearn-compatible) | -| `set_params(**params)` | Set estimator parameters (sklearn-compatible) | - -**fit() Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `data` | DataFrame | Input data | -| `outcome` | str | Outcome variable column name | -| `treatment` | str | Treatment indicator column (0/1) | -| `time` | str | Post-treatment indicator column (0/1) | -| `formula` | str | R-style formula (alternative to column names) | -| `covariates` | list 
| Linear control variables | -| `fixed_effects` | list | Categorical FE columns (creates dummies) | -| `absorb` | list | High-dimensional FE (within-transformation) | - -### DiDResults - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `att` | Average Treatment effect on the Treated | -| `se` | Standard error of ATT | -| `t_stat` | T-statistic | -| `p_value` | P-value for H0: ATT = 0 | -| `conf_int` | Tuple of (lower, upper) confidence bounds | -| `n_obs` | Number of observations | -| `n_treated` | Number of treated units | -| `n_control` | Number of control units | -| `r_squared` | R-squared of regression | -| `coefficients` | Dictionary of all coefficients | -| `is_significant` | Boolean for significance at alpha | -| `significance_stars` | String of significance stars | - -**Methods:** - -| Method | Description | -|--------|-------------| -| `summary(alpha)` | Get formatted summary string | -| `print_summary(alpha)` | Print summary to stdout | -| `to_dict()` | Convert to dictionary | -| `to_dataframe()` | Convert to pandas DataFrame | - -### MultiPeriodDiD - -```python -MultiPeriodDiD( - robust=True, # Use HC1 robust standard errors - cluster=None, # Column for cluster-robust SEs - alpha=0.05 # Significance level for CIs -) -``` - -**fit() Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `data` | DataFrame | Input data | -| `outcome` | str | Outcome variable column name | -| `treatment` | str | Treatment indicator column (0/1) | -| `time` | str | Time period column (multiple values) | -| `post_periods` | list | List of post-treatment period values | -| `covariates` | list | Linear control variables | -| `fixed_effects` | list | Categorical FE columns (creates dummies) | -| `absorb` | list | High-dimensional FE (within-transformation) | -| `reference_period` | any | Omitted period (default: last pre-period, e=-1 convention) | -| `unit` | str | Unit identifier column (for staggered adoption 
warning) | - -### MultiPeriodDiDResults - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `period_effects` | Dict mapping periods to PeriodEffect objects (pre and post, excluding reference) | -| `avg_att` | Average ATT across post-treatment periods only | -| `avg_se` | Standard error of average ATT | -| `avg_t_stat` | T-statistic for average ATT | -| `avg_p_value` | P-value for average ATT | -| `avg_conf_int` | Confidence interval for average ATT | -| `n_obs` | Number of observations | -| `pre_periods` | List of pre-treatment periods | -| `post_periods` | List of post-treatment periods | -| `reference_period` | The omitted reference period (coefficient = 0 by construction) | -| `interaction_indices` | Dict mapping period → column index in VCV (for sub-VCV extraction) | -| `pre_period_effects` | Property: pre-period effects only (for parallel trends assessment) | -| `post_period_effects` | Property: post-period effects only | - -**Methods:** - -| Method | Description | -|--------|-------------| -| `get_effect(period)` | Get PeriodEffect for specific period | -| `summary(alpha)` | Get formatted summary string | -| `print_summary(alpha)` | Print summary to stdout | -| `to_dict()` | Convert to dictionary | -| `to_dataframe()` | Convert to pandas DataFrame | - -### PeriodEffect - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `period` | Time period identifier | -| `effect` | Treatment effect estimate | -| `se` | Standard error | -| `t_stat` | T-statistic | -| `p_value` | P-value | -| `conf_int` | Confidence interval | -| `is_significant` | Boolean for significance at 0.05 | -| `significance_stars` | String of significance stars | - -### SyntheticDiD - -```python -SyntheticDiD( - zeta_omega=None, # Unit weight regularization (None = auto from data) - zeta_lambda=None, # Time weight regularization (None = auto from data) - alpha=0.05, # Significance level for CIs - variance_method="placebo", # "placebo" (R 
default) or "bootstrap" - n_bootstrap=200, # Replications for SE estimation - seed=None # Random seed for reproducibility -) -``` - -**fit() Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `data` | DataFrame | Panel data | -| `outcome` | str | Outcome variable column name | -| `treatment` | str | Treatment indicator column (0/1) | -| `unit` | str | Unit identifier column | -| `time` | str | Time period column | -| `post_periods` | list | List of post-treatment period values | -| `covariates` | list | Covariates to residualize out | - -### SyntheticDiDResults - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `att` | Average Treatment effect on the Treated | -| `se` | Standard error (bootstrap or placebo-based) | -| `t_stat` | T-statistic | -| `p_value` | P-value | -| `conf_int` | Confidence interval | -| `n_obs` | Number of observations | -| `n_treated` | Number of treated units | -| `n_control` | Number of control units | -| `unit_weights` | Dict mapping control unit IDs to weights | -| `time_weights` | Dict mapping pre-treatment periods to weights | -| `pre_periods` | List of pre-treatment periods | -| `post_periods` | List of post-treatment periods | -| `pre_treatment_fit` | RMSE of synthetic vs treated in pre-period | -| `placebo_effects` | Array of placebo effect estimates | - -**Methods:** - -| Method | Description | -|--------|-------------| -| `summary(alpha)` | Get formatted summary string | -| `print_summary(alpha)` | Print summary to stdout | -| `to_dict()` | Convert to dictionary | -| `to_dataframe()` | Convert to pandas DataFrame | -| `get_unit_weights_df()` | Get unit weights as DataFrame | -| `get_time_weights_df()` | Get time weights as DataFrame | - -### TROP - -```python -TROP( - lambda_time_grid=None, # Time decay grid (default: [0, 0.1, 0.5, 1, 2, 5]) - lambda_unit_grid=None, # Unit distance grid (default: [0, 0.1, 0.5, 1, 2, 5]) - lambda_nn_grid=None, # Nuclear norm grid 
(default: [0, 0.01, 0.1, 1, 10]) - max_iter=100, # Max iterations for factor estimation - tol=1e-6, # Convergence tolerance - alpha=0.05, # Significance level for CIs - n_bootstrap=200, # Bootstrap replications (minimum 2; TROP requires bootstrap for SEs) - seed=None # Random seed -) -``` - -**fit() Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `data` | DataFrame | Panel data | -| `outcome` | str | Outcome variable column name | -| `treatment` | str | Treatment indicator column (0/1 absorbing state) | -| `unit` | str | Unit identifier column | -| `time` | str | Time period column | - -Note: TROP infers treatment periods from the treatment indicator column. The treatment column should be an absorbing state indicator where D=1 for all periods during and after treatment starts. - -### TROPResults - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `att` | Average Treatment effect on the Treated | -| `se` | Standard error (bootstrap) | -| `t_stat` | T-statistic | -| `p_value` | P-value | -| `conf_int` | Confidence interval | -| `n_obs` | Number of observations | -| `n_treated` | Number of treated units | -| `n_control` | Number of control units | -| `n_treated_obs` | Number of treated unit-time observations | -| `unit_effects` | Dict mapping unit IDs to fixed effects | -| `time_effects` | Dict mapping periods to fixed effects | -| `treatment_effects` | Dict mapping (unit, time) to individual effects | -| `lambda_time` | Selected time decay parameter | -| `lambda_unit` | Selected unit distance parameter | -| `lambda_nn` | Selected nuclear norm parameter | -| `factor_matrix` | Low-rank factor matrix L (n_periods x n_units) | -| `effective_rank` | Effective rank of factor matrix | -| `loocv_score` | LOOCV score for selected parameters | -| `n_pre_periods` | Number of pre-treatment periods | -| `n_post_periods` | Number of post-treatment periods | -| `bootstrap_distribution` | Bootstrap distribution 
(if bootstrap) | - -**Methods:** - -| Method | Description | -|--------|-------------| -| `summary(alpha)` | Get formatted summary string | -| `print_summary(alpha)` | Print summary to stdout | -| `to_dict()` | Convert to dictionary | -| `to_dataframe()` | Convert to pandas DataFrame | -| `get_unit_effects_df()` | Get unit fixed effects as DataFrame | -| `get_time_effects_df()` | Get time fixed effects as DataFrame | -| `get_treatment_effects_df()` | Get individual treatment effects as DataFrame | - -### SunAbraham - -```python -SunAbraham( - control_group='never_treated', # or 'not_yet_treated' - anticipation=0, # Periods of anticipation effects - alpha=0.05, # Significance level for CIs - cluster=None, # Column for cluster-robust SEs - n_bootstrap=0, # Bootstrap iterations (0 = analytical SEs) - bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb' - seed=None # Random seed -) -``` - -**fit() Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `data` | DataFrame | Panel data | -| `outcome` | str | Outcome variable column name | -| `unit` | str | Unit identifier column | -| `time` | str | Time period column | -| `first_treat` | str | Column with first treatment period (0 for never-treated) | -| `covariates` | list | Covariate column names | - -### SunAbrahamResults - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `event_study_effects` | Dict mapping relative time to effect info | -| `overall_att` | Overall average treatment effect | -| `overall_se` | Standard error of overall ATT | -| `overall_t_stat` | T-statistic for overall ATT | -| `overall_p_value` | P-value for overall ATT | -| `overall_conf_int` | Confidence interval for overall ATT | -| `cohort_weights` | Dict mapping relative time to cohort weights | -| `groups` | List of treatment cohorts | -| `time_periods` | List of all time periods | -| `n_obs` | Total number of observations | -| `n_treated_units` | Number of 
ever-treated units | -| `n_control_units` | Number of never-treated units | -| `is_significant` | Boolean for significance at alpha | -| `significance_stars` | String of significance stars | -| `bootstrap_results` | SABootstrapResults (if bootstrap enabled) | - -**Methods:** - -| Method | Description | -|--------|-------------| -| `summary(alpha)` | Get formatted summary string | -| `print_summary(alpha)` | Print summary to stdout | -| `to_dataframe(level)` | Convert to DataFrame ('event_study' or 'cohort') | - -### ImputationDiD - -```python -ImputationDiD( - anticipation=0, # Periods of anticipation effects - alpha=0.05, # Significance level for CIs - cluster=None, # Column for cluster-robust SEs - n_bootstrap=0, # Bootstrap iterations (0 = analytical) - bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb' - seed=None, # Random seed - rank_deficient_action='warn', # 'warn', 'error', or 'silent' - horizon_max=None, # Max event-study horizon - aux_partition='cohort_horizon', # Variance partition -) -``` - -**fit() Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `data` | DataFrame | Panel data | -| `outcome` | str | Outcome variable column name | -| `unit` | str | Unit identifier column | -| `time` | str | Time period column | -| `first_treat` | str | First treatment period column (0 for never-treated) | -| `covariates` | list | Covariate column names | -| `aggregate` | str | Aggregation: None, "event_study", "group", "all" | -| `balance_e` | int | Balance event study to this many pre-treatment periods | - -### ImputationDiDResults - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `overall_att` | Overall average treatment effect on the treated | -| `overall_se` | Standard error (conservative, Theorem 3) | -| `overall_t_stat` | T-statistic | -| `overall_p_value` | P-value for H0: ATT = 0 | -| `overall_conf_int` | Confidence interval | -| `event_study_effects` | Dict of relative 
time -> effect dict (if `aggregate='event_study'` or `'all'`) | -| `group_effects` | Dict of cohort -> effect dict (if `aggregate='group'` or `'all'`) | -| `treatment_effects` | DataFrame of unit-level imputed treatment effects | -| `n_treated_obs` | Number of treated observations | -| `n_untreated_obs` | Number of untreated observations | - -**Methods:** - -| Method | Description | -|--------|-------------| -| `summary(alpha)` | Get formatted summary string | -| `print_summary(alpha)` | Print summary to stdout | -| `to_dataframe(level)` | Convert to DataFrame ('observation', 'event_study', 'group') | -| `pretrend_test(n_leads)` | Run pre-trend F-test (Equation 9) | - -### TwoStageDiD - -```python -TwoStageDiD( - anticipation=0, # Periods of anticipation effects - alpha=0.05, # Significance level for CIs - cluster=None, # Column for cluster-robust SEs (defaults to unit) - n_bootstrap=0, # Bootstrap iterations (0 = analytical GMM SEs) - bootstrap_weights='rademacher', # 'rademacher', 'mammen', or 'webb' - seed=None, # Random seed - rank_deficient_action='warn', # 'warn', 'error', or 'silent' - horizon_max=None, # Max event-study horizon -) -``` - -**fit() Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `data` | DataFrame | Panel data | -| `outcome` | str | Outcome variable column name | -| `unit` | str | Unit identifier column | -| `time` | str | Time period column | -| `first_treat` | str | First treatment period column (0 for never-treated) | -| `covariates` | list | Covariate column names | -| `aggregate` | str | Aggregation: None, "event_study", "group", "all" | -| `balance_e` | int | Balance event study to this many pre-treatment periods | - -### TwoStageDiDResults - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `overall_att` | Overall average treatment effect on the treated | -| `overall_se` | Standard error (GMM sandwich variance) | -| `overall_t_stat` | T-statistic | -| 
`overall_p_value` | P-value for H0: ATT = 0 | -| `overall_conf_int` | Confidence interval | -| `event_study_effects` | Dict of relative time -> effect dict (if `aggregate='event_study'` or `'all'`) | -| `group_effects` | Dict of cohort -> effect dict (if `aggregate='group'` or `'all'`) | -| `treatment_effects` | DataFrame of unit-level treatment effects | -| `n_treated_obs` | Number of treated observations | -| `n_untreated_obs` | Number of untreated observations | - -**Methods:** - -| Method | Description | -|--------|-------------| -| `summary(alpha)` | Get formatted summary string | -| `print_summary(alpha)` | Print summary to stdout | -| `to_dataframe(level)` | Convert to DataFrame ('observation', 'event_study', 'group') | - -### StackedDiD - -```python -StackedDiD( - kappa_pre=1, # Pre-treatment event-study periods - kappa_post=1, # Post-treatment event-study periods - weighting='aggregate', # 'aggregate', 'population', or 'sample_share' - clean_control='not_yet_treated', # 'not_yet_treated', 'strict', or 'never_treated' - cluster='unit', # 'unit' or 'unit_subexp' - alpha=0.05, # Significance level - anticipation=0, # Anticipation periods - rank_deficient_action='warn', # 'warn', 'error', or 'silent' -) -``` - -**fit() Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `data` | DataFrame | Panel data | -| `outcome` | str | Outcome variable column name | -| `unit` | str | Unit identifier column | -| `time` | str | Time period column | -| `first_treat` | str | First treatment period column (0 for never-treated) | -| `population` | str, optional | Population column (required if weighting='population') | -| `aggregate` | str | Aggregation: None, `"simple"`, or `"event_study"` | - -### StackedDiDResults - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `overall_att` | Overall average treatment effect on the treated | -| `overall_se` | Standard error | -| `overall_t_stat` | T-statistic | -| 
`overall_p_value` | P-value for H0: ATT = 0 | -| `overall_conf_int` | Confidence interval | -| `event_study_effects` | Dict of relative time -> effect dict (if `aggregate='event_study'`) | -| `stacked_data` | The stacked dataset used for estimation | -| `n_treated_obs` | Number of treated observations | -| `n_untreated_obs` | Number of untreated (clean control) observations | -| `n_cohorts` | Number of treatment cohorts | -| `kappa_pre` | Pre-treatment window used | -| `kappa_post` | Post-treatment window used | - -**Methods:** - -| Method | Description | -|--------|-------------| -| `summary(alpha)` | Get formatted summary string | -| `print_summary(alpha)` | Print summary to stdout | -| `to_dataframe(level)` | Convert to DataFrame ('event_study') | - -### TripleDifference - -```python -TripleDifference( - estimation_method='dr', # 'dr' (doubly robust), 'reg', or 'ipw' - robust=True, # Use HC1 robust standard errors - cluster=None, # Column for cluster-robust SEs - alpha=0.05, # Significance level for CIs - pscore_trim=0.01 # Propensity score trimming threshold -) -``` - -**fit() Parameters:** - -| Parameter | Type | Description | -|-----------|------|-------------| -| `data` | DataFrame | Input data | -| `outcome` | str | Outcome variable column name | -| `group` | str | Group indicator column (0/1): 1=treated group | -| `partition` | str | Partition/eligibility indicator column (0/1): 1=eligible | -| `time` | str | Time indicator column (0/1): 1=post-treatment | -| `covariates` | list | Covariate column names for adjustment | - -### TripleDifferenceResults - -**Attributes:** - -| Attribute | Description | -|-----------|-------------| -| `att` | Average Treatment effect on the Treated | -| `se` | Standard error of ATT | -| `t_stat` | T-statistic | -| `p_value` | P-value for H0: ATT = 0 | -| `conf_int` | Tuple of (lower, upper) confidence bounds | -| `n_obs` | Total number of observations | -| `n_treated_eligible` | Obs in treated group & eligible partition | -| 
`n_treated_ineligible` | Obs in treated group & ineligible partition |
-| `n_control_eligible` | Obs in control group & eligible partition |
-| `n_control_ineligible` | Obs in control group & ineligible partition |
-| `estimation_method` | Method used ('dr', 'reg', or 'ipw') |
-| `group_means` | Dict of cell means for diagnostics |
-| `pscore_stats` | Propensity score statistics (IPW/DR only) |
-| `is_significant` | Boolean for significance at alpha |
-| `significance_stars` | String of significance stars |
-
-**Methods:**
-
-| Method | Description |
-|--------|-------------|
-| `summary(alpha)` | Get formatted summary string |
-| `print_summary(alpha)` | Print summary to stdout |
-| `to_dict()` | Convert to dictionary |
-| `to_dataframe()` | Convert to pandas DataFrame |
-
-### HonestDiD
-
-```python
-HonestDiD(
-    method='relative_magnitude',  # 'relative_magnitude' or 'smoothness'
-    M=None,                       # Restriction parameter (default: 1.0 for RM, 0.0 for SD)
-    alpha=0.05,                   # Significance level for CIs
-    l_vec=None                    # Linear combination vector for target parameter
-)
-```
-
-**fit() Parameters:**
-
-| Parameter | Type | Description |
-|-----------|------|-------------|
-| `results` | MultiPeriodDiDResults | Results from MultiPeriodDiD.fit() |
-| `M` | float | Restriction parameter (overrides constructor value) |
-
-**Methods:**
-
-| Method | Description |
-|--------|-------------|
-| `fit(results, M)` | Compute bounds for given event study results |
-| `sensitivity_analysis(results, M_grid)` | Compute bounds over grid of M values |
-| `breakdown_value(results, tol)` | Find smallest M where CI includes zero |
-
-### HonestDiDResults
-
-**Attributes:**
-
-| Attribute | Description |
-|-----------|-------------|
-| `original_estimate` | Point estimate under parallel trends |
-| `lb` | Lower bound of identified set |
-| `ub` | Upper bound of identified set |
-| `ci_lb` | Lower bound of robust confidence interval |
-| `ci_ub` | Upper bound of robust confidence interval |
-|
`ci_width` | Width of robust CI |
-| `M` | Restriction parameter used |
-| `method` | Restriction method ('relative_magnitude' or 'smoothness') |
-| `alpha` | Significance level |
-| `is_significant` | True if robust CI excludes zero |
-
-**Methods:**
-
-| Method | Description |
-|--------|-------------|
-| `summary()` | Get formatted summary string |
-| `to_dict()` | Convert to dictionary |
-| `to_dataframe()` | Convert to pandas DataFrame |
-
-### SensitivityResults
-
-**Attributes:**
-
-| Attribute | Description |
-|-----------|-------------|
-| `M_grid` | Array of M values analyzed |
-| `results` | List of HonestDiDResults for each M |
-| `breakdown_M` | Smallest M where CI includes zero (None if always significant) |
-
-**Methods:**
-
-| Method | Description |
-|--------|-------------|
-| `summary()` | Get formatted summary string |
-| `plot(ax)` | Plot sensitivity analysis |
-| `to_dataframe()` | Convert to pandas DataFrame |
-
-### PreTrendsPower
-
-```python
-PreTrendsPower(
-    alpha=0.05,               # Significance level for pre-trends test
-    power=0.80,               # Target power for MDV calculation
-    violation_type='linear',  # 'linear', 'constant', 'last_period', 'custom'
-    violation_weights=None    # Custom weights (required if violation_type='custom')
-)
-```
-
-**fit() Parameters:**
-
-| Parameter | Type | Description |
-|-----------|------|-------------|
-| `results` | MultiPeriodDiDResults | Results from event study |
-| `M` | float | Specific violation magnitude to evaluate |
-
-**Methods:**
-
-| Method | Description |
-|--------|-------------|
-| `fit(results, M)` | Compute power analysis for given event study |
-| `power_at(results, M)` | Compute power for specific violation magnitude |
-| `power_curve(results, M_grid, n_points)` | Compute power across range of M values |
-| `sensitivity_to_honest_did(results)` | Compare with HonestDiD analysis |
-
-### PreTrendsPowerResults
-
-**Attributes:**
-
-| Attribute | Description |
-|-----------|-------------|
-| `power` | Power to
detect the specified violation |
-| `mdv` | Minimum detectable violation at target power |
-| `violation_magnitude` | Violation magnitude (M) tested |
-| `violation_type` | Type of violation pattern |
-| `alpha` | Significance level |
-| `target_power` | Target power level |
-| `n_pre_periods` | Number of pre-treatment periods |
-| `test_statistic` | Expected test statistic under violation |
-| `critical_value` | Critical value for pre-trends test |
-| `noncentrality` | Non-centrality parameter |
-| `is_informative` | Heuristic check if test is informative |
-| `power_adequate` | Whether power meets target |
-
-**Methods:**
-
-| Method | Description |
-|--------|-------------|
-| `summary()` | Get formatted summary string |
-| `print_summary()` | Print summary to stdout |
-| `to_dict()` | Convert to dictionary |
-| `to_dataframe()` | Convert to pandas DataFrame |
-
-### PreTrendsPowerCurve
-
-**Attributes:**
-
-| Attribute | Description |
-|-----------|-------------|
-| `M_values` | Array of violation magnitudes |
-| `powers` | Array of power values |
-| `mdv` | Minimum detectable violation |
-| `alpha` | Significance level |
-| `target_power` | Target power level |
-| `violation_type` | Type of violation pattern |
-
-**Methods:**
-
-| Method | Description |
-|--------|-------------|
-| `plot(ax, show_mdv, show_target)` | Plot power curve |
-| `to_dataframe()` | Convert to DataFrame with M and power columns |
-
-### Data Preparation Functions
-
-#### generate_did_data
-
-```python
-generate_did_data(
-    n_units=100,             # Number of units
-    n_periods=4,             # Number of time periods
-    treatment_effect=5.0,    # True ATT
-    treatment_fraction=0.5,  # Fraction treated
-    treatment_period=2,      # First post-treatment period
-    unit_fe_sd=2.0,          # Unit fixed effect std dev
-    time_trend=0.5,          # Linear time trend
-    noise_sd=1.0,            # Idiosyncratic noise std dev
-    seed=None                # Random seed
-)
-```
-
-Returns DataFrame with columns: `unit`, `period`, `treated`, `post`, `outcome`, `true_effect`.
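The data-generating process that `generate_did_data` documents can be sketched by hand in a few lines of numpy/pandas. The helper below (`simulate_did` is a hypothetical name) mirrors the defaults listed above but is an illustrative reimplementation, not the library's code:

```python
import numpy as np
import pandas as pd

def simulate_did(n_units=100, n_periods=4, treatment_effect=5.0,
                 treatment_fraction=0.5, treatment_period=2,
                 unit_fe_sd=2.0, time_trend=0.5, noise_sd=1.0, seed=0):
    """Hand-rolled DGP mirroring the documented generate_did_data defaults."""
    rng = np.random.default_rng(seed)
    unit_fe = rng.normal(0.0, unit_fe_sd, n_units)          # unit fixed effects
    treated = (np.arange(n_units) < int(treatment_fraction * n_units)).astype(int)
    rows = []
    for u in range(n_units):
        for t in range(n_periods):
            post = int(t >= treatment_period)
            y = (unit_fe[u] + time_trend * t
                 + treatment_effect * treated[u] * post
                 + rng.normal(0.0, noise_sd))
            rows.append((u, t, treated[u], post, y))
    return pd.DataFrame(rows, columns=["unit", "period", "treated", "post", "outcome"])

df = simulate_did()
# Classic 2x2 DiD from the four cell means: (T,post - T,pre) - (C,post - C,pre)
cell = df.groupby(["treated", "post"])["outcome"].mean()
did = (cell.loc[(1, 1)] - cell.loc[(1, 0)]) - (cell.loc[(0, 1)] - cell.loc[(0, 0)])
```

Recovering a `did` value close to the true ATT of 5.0 is a quick sanity check that the simulated panel behaves like a textbook 2x2 design: unit fixed effects and the common time trend cancel in the double difference.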
-
-#### make_treatment_indicator
-
-```python
-make_treatment_indicator(
-    data,                  # Input DataFrame
-    column,                # Column to create treatment from
-    treated_values=None,   # Value(s) indicating treatment
-    threshold=None,        # Numeric threshold for treatment
-    above_threshold=True,  # If True, >= threshold is treated
-    new_column='treated'   # Output column name
-)
-```
-
-#### make_post_indicator
-
-```python
-make_post_indicator(
-    data,                  # Input DataFrame
-    time_column,           # Time/period column
-    post_periods=None,     # Specific post-treatment period(s)
-    treatment_start=None,  # First post-treatment period
-    new_column='post'      # Output column name
-)
-```
-
-#### wide_to_long
-
-```python
-wide_to_long(
-    data,                # Wide-format DataFrame
-    value_columns,       # List of time-varying columns
-    id_column,           # Unit identifier column
-    time_name='period',  # Name for time column
-    value_name='value',  # Name for value column
-    time_values=None     # Values for time periods
-)
-```
-
-#### balance_panel
-
-```python
-balance_panel(
-    data,             # Panel DataFrame
-    unit_column,      # Unit identifier column
-    time_column,      # Time period column
-    method='inner',   # 'inner', 'outer', or 'fill'
-    fill_value=None   # Value for filling (if method='fill')
-)
-```
-
-#### validate_did_data
-
-```python
-validate_did_data(
-    data,                # DataFrame to validate
-    outcome,             # Outcome column name
-    treatment,           # Treatment column name
-    time,                # Time/post column name
-    unit=None,           # Unit column (for panel validation)
-    raise_on_error=True  # Raise ValueError or return dict
-)
-```
-
-Returns dict with `valid`, `errors`, `warnings`, and `summary` keys.
-
-#### summarize_did_data
-
-```python
-summarize_did_data(
-    data,       # Input DataFrame
-    outcome,    # Outcome column name
-    treatment,  # Treatment column name
-    time,       # Time/post column name
-    unit=None   # Unit column (optional)
-)
-```
-
-Returns DataFrame with summary statistics by treatment-time cell.
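As a concrete picture of the treatment-time cell summary that `summarize_did_data` is described as returning, here is a minimal pandas sketch; the toy data and the exact column set are illustrative assumptions, not the function's actual output schema:

```python
import pandas as pd

# Toy 2x2 panel: two treated and two control units, pre and post.
df = pd.DataFrame({
    "treated": [0, 0, 0, 0, 1, 1, 1, 1],
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],
    "outcome": [1.0, 2.0, 1.2, 2.2, 1.1, 7.1, 0.9, 6.9],
})

# One row per treatment-time cell, with mean, std, and count of the outcome.
summary = (df.groupby(["treated", "post"])["outcome"]
             .agg(["mean", "std", "count"])
             .reset_index())
```

Eyeballing the four cell means (and their counts) before estimating anything is the main use case: imbalanced cells or implausible means usually signal a data-preparation problem upstream.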
-
-#### create_event_time
-
-```python
-create_event_time(
-    data,                    # Panel DataFrame
-    time_column,             # Calendar time column
-    treatment_time_column,   # Column with treatment timing
-    new_column='event_time'  # Output column name
-)
-```
-
-#### aggregate_to_cohorts
-
-```python
-aggregate_to_cohorts(
-    data,              # Unit-level panel data
-    unit_column,       # Unit identifier column
-    time_column,       # Time period column
-    treatment_column,  # Treatment indicator column
-    outcome,           # Outcome variable column
-    covariates=None    # Additional columns to aggregate
-)
-```
-
-#### rank_control_units
-
-```python
-rank_control_units(
-    data,                    # Panel data in long format
-    unit_column,             # Unit identifier column
-    time_column,             # Time period column
-    outcome_column,          # Outcome variable column
-    treatment_column=None,   # Treatment indicator column (0/1)
-    treated_units=None,      # Explicit list of treated unit IDs
-    pre_periods=None,        # Pre-treatment periods (default: first half)
-    covariates=None,         # Covariate columns for matching
-    outcome_weight=0.7,      # Weight for outcome trend similarity (0-1)
-    covariate_weight=0.3,    # Weight for covariate distance (0-1)
-    exclude_units=None,      # Units to exclude from control pool
-    require_units=None,      # Units that must appear in output
-    n_top=None,              # Return only top N controls
-    suggest_treatment_candidates=False,  # Identify treatment candidates
-    n_treatment_candidates=5,            # Number of treatment candidates
-    lambda_reg=0.0                       # Regularization for synthetic weights
-)
-```
-
-Returns DataFrame with columns: `unit`, `quality_score`, `outcome_trend_score`, `covariate_score`, `synthetic_weight`, `pre_trend_rmse`, `is_required`.
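The ranking that `rank_control_units` documents (in particular its `pre_trend_rmse` column) boils down to comparing each candidate control's pre-treatment path against the treated path. A simplified sketch of that idea on made-up data, covering only the demeaned-trend RMSE component and ignoring covariate distance and synthetic weights, which the real function also scores:

```python
import numpy as np
import pandas as pd

# Long panel: one treated unit ("T") and three candidate controls,
# observed over four pre-treatment periods.
df = pd.DataFrame({
    "unit":    ["T"] * 4 + ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "period":  [0, 1, 2, 3] * 4,
    "outcome": [1.0, 2.0, 3.0, 4.0,   # treated: slope-1 trend
                1.1, 2.0, 3.1, 4.0,   # A: tracks the treated closely
                0.0, 0.5, 1.0, 1.5,   # B: flatter trend
                4.0, 3.0, 2.0, 1.0],  # C: opposite trend
})

wide = df.pivot(index="period", columns="unit", values="outcome")
# Demean each unit's pre-period path so level differences don't matter;
# only trend mismatch contributes to the RMSE.
demeaned = wide - wide.mean()
rmse = {u: float(np.sqrt(((demeaned[u] - demeaned["T"]) ** 2).mean()))
        for u in ["A", "B", "C"]}
ranking = sorted(rmse, key=rmse.get)  # best-matching control first
```

Here unit A ranks first because its demeaned path nearly coincides with the treated unit's, while C's opposite trend puts it last; the library's quality score additionally blends in covariate distance via `outcome_weight` / `covariate_weight`.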
-
-## Requirements
-
-- Python 3.9 - 3.14
-- numpy >= 1.20
-- pandas >= 1.3
-- scipy >= 1.7
-
-## Development
-
-```bash
-# Install with dev dependencies
-pip install -e ".[dev]"
-
-# Run tests
-pytest
-
-# Format code
-black diff_diff tests
-ruff check diff_diff tests
+# Format code
+black diff_diff tests
+ruff check diff_diff tests
 ```
 
 ## References
 
-This library implements methods from the following scholarly works:
-
-### Difference-in-Differences
-
-- **Ashenfelter, O., & Card, D. (1985).** "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs." *The Review of Economics and Statistics*, 67(4), 648-660. [https://doi.org/10.2307/1924810](https://doi.org/10.2307/1924810)
-
-- **Card, D., & Krueger, A. B. (1994).** "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." *The American Economic Review*, 84(4), 772-793. [https://www.jstor.org/stable/2118030](https://www.jstor.org/stable/2118030)
-
-- **Angrist, J. D., & Pischke, J.-S. (2009).** *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton University Press. Chapter 5: Differences-in-Differences.
-
-### Two-Way Fixed Effects
-
-- **Wooldridge, J. M. (2010).** *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press.
-
-- **Imai, K., & Kim, I. S. (2021).** "On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data." *Political Analysis*, 29(3), 405-415. [https://doi.org/10.1017/pan.2020.33](https://doi.org/10.1017/pan.2020.33)
-
-### Robust Standard Errors
-
-- **White, H. (1980).** "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." *Econometrica*, 48(4), 817-838. [https://doi.org/10.2307/1912934](https://doi.org/10.2307/1912934)
-
-- **MacKinnon, J. G., & White, H. (1985).** "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties."
*Journal of Econometrics*, 29(3), 305-325. [https://doi.org/10.1016/0304-4076(85)90158-7](https://doi.org/10.1016/0304-4076(85)90158-7)
-
-- **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011).** "Robust Inference With Multiway Clustering." *Journal of Business & Economic Statistics*, 29(2), 238-249. [https://doi.org/10.1198/jbes.2010.07136](https://doi.org/10.1198/jbes.2010.07136)
-
-### Wild Cluster Bootstrap
-
-- **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008).** "Bootstrap-Based Improvements for Inference with Clustered Errors." *The Review of Economics and Statistics*, 90(3), 414-427. [https://doi.org/10.1162/rest.90.3.414](https://doi.org/10.1162/rest.90.3.414)
-
-- **Webb, M. D. (2014).** "Reworking Wild Bootstrap Based Inference for Clustered Errors." Queen's Economics Department Working Paper No. 1315. [https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf](https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf)
-
-- **MacKinnon, J. G., & Webb, M. D. (2018).** "The Wild Bootstrap for Few (Treated) Clusters." *The Econometrics Journal*, 21(2), 114-135. [https://doi.org/10.1111/ectj.12107](https://doi.org/10.1111/ectj.12107)
-
-### Placebo Tests and DiD Diagnostics
-
-- **Bertrand, M., Duflo, E., & Mullainathan, S. (2004).** "How Much Should We Trust Differences-in-Differences Estimates?" *The Quarterly Journal of Economics*, 119(1), 249-275. [https://doi.org/10.1162/003355304772839588](https://doi.org/10.1162/003355304772839588)
-
-### Synthetic Control Method
-
-- **Abadie, A., & Gardeazabal, J. (2003).** "The Economic Costs of Conflict: A Case Study of the Basque Country." *The American Economic Review*, 93(1), 113-132. [https://doi.org/10.1257/000282803321455188](https://doi.org/10.1257/000282803321455188)
-
-- **Abadie, A., Diamond, A., & Hainmueller, J. (2010).** "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program."
*Journal of the American Statistical Association*, 105(490), 493-505. [https://doi.org/10.1198/jasa.2009.ap08746](https://doi.org/10.1198/jasa.2009.ap08746)
-
-- **Abadie, A., Diamond, A., & Hainmueller, J. (2015).** "Comparative Politics and the Synthetic Control Method." *American Journal of Political Science*, 59(2), 495-510. [https://doi.org/10.1111/ajps.12116](https://doi.org/10.1111/ajps.12116)
-
-### Synthetic Difference-in-Differences
-
-- **Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021).** "Synthetic Difference-in-Differences." *American Economic Review*, 111(12), 4088-4118. [https://doi.org/10.1257/aer.20190159](https://doi.org/10.1257/aer.20190159)
-
-### Triply Robust Panel (TROP)
-
-- **Athey, S., Imbens, G. W., Qu, Z., & Viviano, D. (2025).** "Triply Robust Panel Estimators." *Working Paper*. [https://arxiv.org/abs/2508.21536](https://arxiv.org/abs/2508.21536)
-
-  This paper introduces the TROP estimator which combines three robustness components:
-  - **Factor model adjustment**: Low-rank factor structure via SVD removes unobserved confounders
-  - **Unit weights**: Synthetic control style weighting for optimal comparison
-  - **Time weights**: SDID style time weighting for informative pre-periods
-
-  TROP is particularly useful when there are unobserved time-varying confounders with a factor structure that affect different units differently over time.
-
-### Triple Difference (DDD)
-
-- **Ortiz-Villavicencio, M., & Sant'Anna, P. H. C. (2025).** "Better Understanding Triple Differences Estimators." *Working Paper*. [https://arxiv.org/abs/2505.09942](https://arxiv.org/abs/2505.09942)
-
-  This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates.
The `TripleDifference` class implements their regression adjustment, inverse probability weighting, and doubly robust estimators.
-
-- **Gruber, J. (1994).** "The Incidence of Mandated Maternity Benefits." *American Economic Review*, 84(3), 622-641. [https://www.jstor.org/stable/2118071](https://www.jstor.org/stable/2118071)
-
-  Classic paper introducing the Triple Difference design for policy evaluation.
-
-- **Olden, A., & Møen, J. (2022).** "The Triple Difference Estimator." *The Econometrics Journal*, 25(3), 531-553. [https://doi.org/10.1093/ectj/utac010](https://doi.org/10.1093/ectj/utac010)
-
-### Parallel Trends and Pre-Trend Testing
-
-- **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. [https://doi.org/10.1257/aeri.20210236](https://doi.org/10.1257/aeri.20210236)
-
-- **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. [https://doi.org/10.1177/1948550617697177](https://doi.org/10.1177/1948550617697177)
-
-### Honest DiD / Sensitivity Analysis
-
-The `HonestDiD` module implements sensitivity analysis methods for relaxing the parallel trends assumption:
-
-- **Rambachan, A., & Roth, J. (2023).** "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591.
[https://doi.org/10.1093/restud/rdad018](https://doi.org/10.1093/restud/rdad018)
-
-  This paper introduces the "Honest DiD" framework implemented in our `HonestDiD` class:
-  - **Relative Magnitudes (ΔRM)**: Bounds post-treatment violations by a multiple of observed pre-treatment violations
-  - **Smoothness (ΔSD)**: Bounds on second differences of trend violations, allowing for linear extrapolation of pre-trends
-  - **Breakdown Analysis**: Finding the smallest violation magnitude that would overturn conclusions
-  - **Robust Confidence Intervals**: Valid inference under partial identification
-
-- **Roth, J., & Sant'Anna, P. H. C. (2023).** "When Is Parallel Trends Sensitive to Functional Form?" *Econometrica*, 91(2), 737-747. [https://doi.org/10.3982/ECTA19402](https://doi.org/10.3982/ECTA19402)
-
-  Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate.
-
-### Multi-Period and Staggered Adoption
-
-- **Borusyak, K., Jaravel, X., & Spiess, J. (2024).** "Revisiting Event-Study Designs: Robust and Efficient Estimation." *Review of Economic Studies*, 91(6), 3253-3285. [https://doi.org/10.1093/restud/rdae007](https://doi.org/10.1093/restud/rdae007)
-
-  This paper introduces the imputation estimator implemented in our `ImputationDiD` class:
-  - **Efficient imputation**: OLS on untreated observations → impute counterfactuals → aggregate
-  - **Conservative variance**: Theorem 3 clustered variance estimator with auxiliary model
-  - **Pre-trend test**: Independent of treatment effect estimation (Proposition 9)
-  - **Efficiency gains**: ~50% shorter CIs than Callaway-Sant'Anna under homogeneous effects
-
-- **Callaway, B., & Sant'Anna, P. H. C. (2021).** "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230. [https://doi.org/10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016/j.jeconom.2020.12.001)
-
-- **Sant'Anna, P. H. C., & Zhao, J.
(2020).** "Doubly Robust Difference-in-Differences Estimators." *Journal of Econometrics*, 219(1), 101-122. [https://doi.org/10.1016/j.jeconom.2020.06.003](https://doi.org/10.1016/j.jeconom.2020.06.003)
-
-- **Sun, L., & Abraham, S. (2021).** "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." *Journal of Econometrics*, 225(2), 175-199. [https://doi.org/10.1016/j.jeconom.2020.09.006](https://doi.org/10.1016/j.jeconom.2020.09.006)
-
-- **Gardner, J. (2022).** "Two-stage differences in differences." *arXiv preprint arXiv:2207.05943*. [https://arxiv.org/abs/2207.05943](https://arxiv.org/abs/2207.05943)
-
-- **Butts, K., & Gardner, J. (2022).** "did2s: Two-Stage Difference-in-Differences." *The R Journal*, 14(1), 162-173. [https://doi.org/10.32614/RJ-2022-048](https://doi.org/10.32614/RJ-2022-048)
-
-- **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. [https://doi.org/10.1257/aer.20181169](https://doi.org/10.1257/aer.20181169)
-
-- **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. [https://doi.org/10.1016/j.jeconom.2021.03.014](https://doi.org/10.1016/j.jeconom.2021.03.014)
-
-- **Wing, C., Freedman, S. M., & Hollingsworth, A. (2024).** "Stacked Difference-in-Differences." *NBER Working Paper* 32054. [https://www.nber.org/papers/w32054](https://www.nber.org/papers/w32054)
-
-### Power Analysis
-
-- **Bloom, H. S. (1995).** "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." *Evaluation Review*, 19(5), 547-556. [https://doi.org/10.1177/0193841X9501900504](https://doi.org/10.1177/0193841X9501900504)
-
-- **Burlig, F., Preonas, L., & Woerman, M. (2020).** "Panel Data and Experimental Design." *Journal of Development Economics*, 144, 102458.
[https://doi.org/10.1016/j.jdeveco.2020.102458](https://doi.org/10.1016/j.jdeveco.2020.102458)
-
-  Essential reference for power analysis in panel DiD designs. Discusses how serial correlation (ICC) affects power and provides formulas for panel data settings.
-
-- **Djimeu, E. W., & Houndolo, D.-G. (2016).** "Power Calculation for Causal Inference in Social Science: Sample Size and Minimum Detectable Effect Determination." *Journal of Development Effectiveness*, 8(4), 508-527. [https://doi.org/10.1080/19439342.2016.1244555](https://doi.org/10.1080/19439342.2016.1244555)
-
-### General Causal Inference
-
-- **Imbens, G. W., & Rubin, D. B. (2015).** *Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction*. Cambridge University Press.
-
-- **Cunningham, S. (2021).** *Causal Inference: The Mixtape*. Yale University Press. [https://mixtape.scunning.com/](https://mixtape.scunning.com/)
+This library implements methods from a wide body of econometric and causal-inference research. See the full bibliography on [Read the Docs](https://diff-diff.readthedocs.io/en/stable/references.html) for citations spanning DiD foundations, modern staggered estimators, sensitivity analysis, and synthetic controls.
 
 ## Citing diff-diff
 
@@ -3108,11 +177,11 @@ If you use diff-diff in your research, please cite it:
 }
 ```
 
-The DOI above is the Zenodo concept DOI — it always resolves to the latest release. To cite a specific version, look up its versioned DOI on [the Zenodo project page](https://doi.org/10.5281/zenodo.19646175).
+The DOI above is the Zenodo concept DOI - it always resolves to the latest release. To cite a specific version, look up its versioned DOI on [the Zenodo project page](https://doi.org/10.5281/zenodo.19646175).
 
-See [`CITATION.cff`](CITATION.cff) for the full citation metadata.
+See [`CITATION.cff`](https://github.com/igerber/diff-diff/blob/main/CITATION.cff) for the full citation metadata.
-**Note on authorship**: academic citation (`CITATION.cff`, the BibTeX above) lists individual authors with ORCIDs per scholarly convention. Package metadata surfaces (`pyproject.toml`, Sphinx docs) list "diff-diff contributors" to acknowledge the collective — see [`CONTRIBUTORS.md`](CONTRIBUTORS.md) for the full list.
+**Note on authorship**: academic citation (`CITATION.cff`, the BibTeX above) lists individual authors with ORCIDs per scholarly convention. Package metadata surfaces (`pyproject.toml`, Sphinx docs) list "diff-diff contributors" to acknowledge the collective - see [`CONTRIBUTORS.md`](https://github.com/igerber/diff-diff/blob/main/CONTRIBUTORS.md) for the full list.
 
 ## License
 
diff --git a/TODO.md b/TODO.md
index bd97aedb..a180f301 100644
--- a/TODO.md
+++ b/TODO.md
@@ -105,7 +105,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | `HeterogeneousAdoptionDiD` Phase 3 R-parity: Phase 3 ships coverage-rate validation on synthetic DGPs (not tight point parity against `chaisemartin::stute_test` / `yatchew_test`). Tight numerical parity requires aligning bootstrap seed semantics and `B` across numpy/R and is deferred. | `tests/test_had_pretests.py` | Phase 3 | Low |
 | `HeterogeneousAdoptionDiD` Phase 3 nprobust bandwidth for Stute: some Stute variants on continuous regressors use nprobust-style optimal bandwidth selection. Phase 3 uses OLS residuals from a 2-parameter linear fit (no bandwidth selection). nprobust integration is a future enhancement; not in paper scope. | `diff_diff/had_pretests.py::stute_test` | Phase 3 | Low |
 | `HeterogeneousAdoptionDiD` Phase 4: Pierce-Schott (2016) replication harness; reproduce paper Figure 2 values and Table 1 coverage rates. | `benchmarks/`, `tests/` | Phase 2a | Low |
-| `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms.txt` updates (preserving UTF-8 fingerprint).
| `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/` | Phase 2a | Low |
+| `HeterogeneousAdoptionDiD` Phase 5: `practitioner_next_steps()` integration, tutorial notebook, and `llms-full.txt` HeterogeneousAdoptionDiD section (preserving UTF-8 fingerprint). README catalog + bundled `llms.txt` entry + `docs/api/had.rst` + `docs/references.rst` citation landed in PR #372 docs refresh. | `diff_diff/practitioner.py`, `tutorials/`, `diff_diff/guides/llms-full.txt` | Phase 2a | Low |
 | `HeterogeneousAdoptionDiD` time-varying dose on event study: Phase 2b REJECTS panels where `D_{g,t}` varies within a unit for `t >= F` (the aggregation uses `D_{g, F}` as the single regressor for all horizons, paper Appendix B.2 constant-dose convention). A follow-up PR could add a time-varying-dose estimator for these panels; current behavior is front-door rejection with a redirect to `ChaisemartinDHaultfoeuille`. | `diff_diff/had.py::_validate_had_panel_event_study` | Phase 2b | Low |
 | `HeterogeneousAdoptionDiD` repeated-cross-section support: paper Section 2 defines HAD on panel OR repeated cross-section, but Phase 2a is panel-only. RCS inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator with the generic "unit(s) do not appear in both periods" error. A follow-up PR will add an RCS identification path based on pre/post cell means (rather than unit-level first differences), with its own validator and a distinct `data_mode` / API surface. | `diff_diff/had.py::_validate_had_panel`, `diff_diff/had.py::_aggregate_first_difference` | Phase 2a | Medium |
 | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract.
Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
diff --git a/diff-diff.png b/diff-diff.png
new file mode 100644
index 00000000..5ea5a14a
Binary files /dev/null and b/diff-diff.png differ
diff --git a/diff_diff/guides/llms-practitioner.txt b/diff_diff/guides/llms-practitioner.txt
index a8a78008..acb0adaa 100644
--- a/diff_diff/guides/llms-practitioner.txt
+++ b/diff_diff/guides/llms-practitioner.txt
@@ -293,11 +293,25 @@ print(results.summary())
 
 This step is CRITICAL and most often skipped. Run at least one of:
 
-### HonestDiD (Rambachan & Roth 2023) — recommended
+### HonestDiD (Rambachan & Roth 2023) - recommended
 
 Bounds on the treatment effect under violations of parallel trends.
-Works with MultiPeriodDiD and CallawaySantAnna results only. For CS,
-requires `aggregate='event_study'` or `aggregate='all'` so that event
-study effects are available.
+Works with MultiPeriodDiD, CallawaySantAnna, and ChaisemartinDHaultfoeuille
+(dCDH) results.
+
+- For CS: requires `aggregate='event_study'` or `aggregate='all'` so that
+  event study effects are available.
+- For dCDH: requires `L_max >= 1` (multi-horizon mode). Bounds use placebo
+  estimates `DID^{pl}_l` as pre-period coefficients rather than standard
+  event-study pre-treatment coefficients, and use diagonal variance (no
+  full VCV available for dCDH); a `UserWarning` is emitted at runtime.
+  Default restriction is Relative Magnitudes (DeltaRM) with Mbar=1.0
+  targeting the equal-weight average over post-treatment horizons
+  (`l_vec=None`); R's HonestDiD defaults to the first post / on-impact
+  effect, so pass `compute_honest_did(results, l_vec=...)` for R parity.
+  When `trends_linear=True`, bounds apply to the second-differenced
+  estimand (parallel trends in first differences). Gaps in the horizon
+  grid from `trends_nonparam` support-trimming are handled by filtering
+  to the largest consecutive block with a warning.
 
 ```python
 from diff_diff import compute_honest_did
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index 996a440d..2d14d525 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -61,10 +61,11 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
+- [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html): de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲=0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲>0`, continuous-near-d̲ or mass-point), with multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release (repeated cross-sections rejected by the validator). Alias `HAD`.
 - [StackedDiD](https://diff-diff.readthedocs.io/en/stable/api/stacked_did.html): Wing, Freedman & Hollingsworth (2024) stacked DiD with Q-weights and sub-experiments
 - [EfficientDiD](https://diff-diff.readthedocs.io/en/stable/api/efficient_did.html): Chen, Sant'Anna & Xie (2025) efficient DiD with optimal weighting for tighter SEs
 - [TROP](https://diff-diff.readthedocs.io/en/stable/api/trop.html): Triply Robust Panel estimator (Athey et al. 2025) with nuclear norm factor adjustment
-- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/index.html): Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT
+- [StaggeredTripleDifference](https://diff-diff.readthedocs.io/en/stable/api/staggered.html#staggeredtripledifference): Ortiz-Villavicencio & Sant'Anna (2025) staggered DDD with group-time ATT
 - [WooldridgeDiD](https://diff-diff.readthedocs.io/en/stable/api/wooldridge_etwfe.html): Wooldridge (2023, 2025) ETWFE — saturated OLS, logit/Poisson QMLE (ASF-based ATT). Alias: ETWFE
 - [BaconDecomposition](https://diff-diff.readthedocs.io/en/stable/api/bacon.html): Goodman-Bacon (2021) decomposition for diagnosing TWFE bias in staggered settings
diff --git a/docs/api/had.rst b/docs/api/had.rst
new file mode 100644
index 00000000..f63722d7
--- /dev/null
+++ b/docs/api/had.rst
@@ -0,0 +1,99 @@
+Heterogeneous Adoption Difference-in-Differences
+================================================
+
+Estimator for designs where **no unit remains untreated** at the post period.
+Every unit `g` is exposed to treatment at the same single date but adoption
+intensity (dose) varies across units; there is no genuinely untreated control
+group to anchor a standard DiD contrast.
+
+This module implements the methodology from de Chaisemartin, Ciccia,
+D'Haultfœuille & Knau (2026), "Difference-in-Differences Estimators When No
+Unit Remains Untreated" (arXiv:2405.04465v6), which:
+
+1.
**Targets WAS or WAS_{d̲} depending on design path:** Design 1' (the + QUG / Quasi-Untreated-Group case with ``d̲ = 0``) identifies the + Weighted Average Slope (WAS, paper Equation 2); Design 1 (no QUG, + ``d̲ > 0``) identifies ``WAS_{d̲}`` under Assumption 6, or sign + identification only under Assumption 5 (neither additional assumption + is testable via pre-trends). The shipped result classes expose + ``target_parameter == "WAS"`` versus ``"WAS_d_lower"`` so callers can + key on the resolved estimand. +2. **Estimates the target via local-linear regression at the dose support + boundary**, with three concrete fit paths: ``continuous_at_zero`` for + Design 1', and ``continuous_near_d_lower`` or ``mass_point`` for + Design 1 (auto-detected from the dose distribution). +3. **Provides bias-corrected confidence intervals** ported from the + ``nprobust`` machinery for the continuous-dose paths, and a + structural-residual 2SLS sandwich for the mass-point path. +4. **Extends to multi-period event-study settings** (paper Appendix B.2), + restricting staggered-timing panels to the last-treatment cohort (which + retains never-treated units as comparisons) with pointwise per-horizon CIs. + +.. note:: + + **When to use HAD.** Use ``HeterogeneousAdoptionDiD`` when your panel has + no untreated unit at the post period (e.g. universal-rollout policies, + industry-wide tariff changes) but treatment intensity varies across + units. For panels with a never-treated control group and continuous + treatment, use :class:`~diff_diff.ContinuousDiD` instead. For binary + reversible treatments, use :class:`~diff_diff.ChaisemartinDHaultfoeuille`. + +.. note:: + + **Inference contract.** Per-horizon CIs are always pointwise. There are + three SE regimes selected by call site: + + - **Unweighted** - continuous paths use the CCT-2014 weighted-robust SE + from the in-house ``lprobust`` port; the mass-point path uses a + structural-residual 2SLS sandwich. No cross-horizon covariance. 
+ - **``weights=`` shortcut** - continuous paths reuse the CCT-2014 SE; + the mass-point path uses an analytical weighted 2SLS sandwich + (``classical`` / ``hc1`` only - ``hc2`` / ``hc2_bm`` raise + ``NotImplementedError`` pending a 2SLS-specific leverage derivation). + - **``survey=``** - both paths compose Binder (1983) Taylor-series + linearization with ``df_survey`` threaded into ``safe_inference``. + + A simultaneous confidence band (sup-t) is available only on the + **weighted event-study path** via ``cband=True``. Joint cross-horizon + analytical covariance is not computed in this release; tracked in + ``TODO.md``. + + **Mass-point ``vcov_type="classical"`` deviation.** The mass-point + ``survey=`` paths (static and event-study) and the ``weights=`` + + ``aggregate="event_study"`` + ``cband=True`` path reject + ``vcov_type="classical"`` with ``NotImplementedError``. The per-unit + 2SLS influence function returned by the mass-point fit is HC1-scaled + so that ``compute_survey_if_variance`` and the sup-t bootstrap target + ``V_HC1`` consistently; mixing it with a classical analytical SE + would silently report a ``V_HC1``-targeted variance under a + ``classical`` label. Use ``vcov_type="hc1"`` (or leave it unset with + the default ``robust=True`` mapping); a classical-aligned IF + derivation is queued for a follow-up PR. + +HeterogeneousAdoptionDiD +------------------------ + +.. autoclass:: diff_diff.HeterogeneousAdoptionDiD + :members: + :undoc-members: + :show-inheritance: + +HeterogeneousAdoptionDiDResults +------------------------------- + +Single-period results container for ``HeterogeneousAdoptionDiD`` estimation. + +.. autoclass:: diff_diff.HeterogeneousAdoptionDiDResults + :members: + :undoc-members: + :show-inheritance: + +HeterogeneousAdoptionDiDEventStudyResults +----------------------------------------- + +Multi-period event-study results container for the Appendix B.2 extension. + +.. 
autoclass:: diff_diff.HeterogeneousAdoptionDiDEventStudyResults + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/api/index.rst b/docs/api/index.rst index da128317..88324723 100644 --- a/docs/api/index.rst +++ b/docs/api/index.rst @@ -24,6 +24,7 @@ Core estimator classes for DiD analysis: diff_diff.TripleDifference diff_diff.TROP diff_diff.ContinuousDiD + diff_diff.HeterogeneousAdoptionDiD diff_diff.EfficientDiD diff_diff.TwoStageDiD diff_diff.WooldridgeDiD @@ -56,6 +57,8 @@ Result containers returned by estimators: diff_diff.TROPResults diff_diff.ContinuousDiDResults diff_diff.DoseResponseCurve + diff_diff.HeterogeneousAdoptionDiDResults + diff_diff.HeterogeneousAdoptionDiDEventStudyResults diff_diff.EfficientDiDResults diff_diff.EDiDBootstrapResults diff_diff.TwoStageDiDResults @@ -237,6 +240,7 @@ Estimators triple_diff trop continuous_did + had efficient_did two_stage wooldridge_etwfe diff --git a/docs/api/staggered.rst b/docs/api/staggered.rst index 71bed3ac..3c26c9b7 100644 --- a/docs/api/staggered.rst +++ b/docs/api/staggered.rst @@ -3,12 +3,13 @@ Staggered Adoption Estimators for staggered DiD designs where treatment is adopted at different times. -This module provides two main estimators for staggered adoption settings: +This module provides three estimators for staggered adoption settings: 1. **Callaway-Sant'Anna (2021)**: Aggregates group-time 2x2 DiD comparisons 2. **Sun-Abraham (2021)**: Interaction-weighted regression approach +3. **Ortiz-Villavicencio & Sant'Anna (2025)**: Staggered triple-difference (DDD) with group-time ATT -Running both provides a useful robustness check—when they agree, results are more credible. +Running CS and SA together provides a useful robustness check - when they agree, results are more credible. .. module:: diff_diff.staggered @@ -123,3 +124,24 @@ Bootstrap inference results for Sun-Abraham estimation. 
:members: :undoc-members: :show-inheritance: + +StaggeredTripleDifference +------------------------- + +Ortiz-Villavicencio & Sant'Anna (2025) staggered triple-difference (DDD) estimator +with group-time ATT identification under heterogeneous treatment timing. + +.. autoclass:: diff_diff.StaggeredTripleDifference + :members: + :undoc-members: + :show-inheritance: + +StaggeredTripleDiffResults +-------------------------- + +Results container for ``StaggeredTripleDifference`` estimation. + +.. autoclass:: diff_diff.StaggeredTripleDiffResults + :members: + :undoc-members: + :show-inheritance: diff --git a/docs/doc-deps.yaml b/docs/doc-deps.yaml index 5d6894e5..2a5e6f4f 100644 --- a/docs/doc-deps.yaml +++ b/docs/doc-deps.yaml @@ -94,12 +94,15 @@ sources: - path: docs/tutorials/09_real_world_examples.ipynb type: tutorial - path: README.md - section: "DifferenceInDifferences" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "DifferenceInDifferences" type: user_guide - path: diff_diff/guides/llms.txt + section: "Estimators" type: user_guide - path: docs/choosing_estimator.rst type: user_guide @@ -135,12 +138,15 @@ sources: type: tutorial note: "CallawaySantAnna survey examples" - path: README.md - section: "CallawaySantAnna" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "CallawaySantAnna" type: user_guide - path: diff_diff/guides/llms.txt + section: "Estimators" type: user_guide - path: docs/choosing_estimator.rst type: user_guide @@ -166,6 +172,20 @@ sources: type: methodology - path: docs/api/staggered.rst type: api_reference + - path: docs/tutorials/16_survey_did.ipynb + type: tutorial + note: "StaggeredTripleDifference appears in survey-DiD examples" + - path: README.md + section: "Estimators (one-line catalog entry)" + type: user_guide + - 
path: docs/references.rst + type: user_guide + - path: diff_diff/guides/llms-full.txt + section: "StaggeredTripleDifference" + type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide # ── SunAbraham ────────────────────────────────────────────────────── @@ -181,11 +201,16 @@ sources: type: tutorial note: "SunAbraham comparison section" - path: README.md - section: "SunAbraham" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "SunAbraham" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/choosing_estimator.rst type: user_guide - path: docs/benchmarks.rst @@ -204,11 +229,16 @@ sources: - path: docs/tutorials/11_imputation_did.ipynb type: tutorial - path: README.md - section: "ImputationDiD" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "ImputationDiD" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/choosing_estimator.rst type: user_guide @@ -225,11 +255,16 @@ sources: - path: docs/tutorials/12_two_stage_did.ipynb type: tutorial - path: README.md - section: "TwoStageDiD" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "TwoStageDiD" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/choosing_estimator.rst type: user_guide @@ -246,11 +281,16 @@ sources: - path: docs/tutorials/15_efficient_did.ipynb type: tutorial - path: README.md - section: "EfficientDiD" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: 
"EfficientDiD" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/choosing_estimator.rst type: user_guide - path: docs/benchmarks.rst @@ -268,12 +308,15 @@ sources: - path: docs/api/chaisemartin_dhaultfoeuille.rst type: api_reference - path: README.md - section: "ChaisemartinDHaultfoeuille" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "ChaisemartinDHaultfoeuille" type: user_guide - path: diff_diff/guides/llms.txt + section: "Estimators" type: user_guide - path: docs/choosing_estimator.rst type: user_guide @@ -302,17 +345,53 @@ sources: - path: docs/tutorials/14_continuous_did.ipynb type: tutorial - path: README.md - section: "ContinuousDiD" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "ContinuousDiD" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/choosing_estimator.rst type: user_guide - path: docs/practitioner_decision_tree.rst section: "Varying Spending Levels" type: user_guide + # ── HeterogeneousAdoptionDiD (HAD) ────────────────────────────────── + + diff_diff/had.py: + drift_risk: medium + docs: + - path: docs/methodology/REGISTRY.md + section: "HeterogeneousAdoptionDiD" + type: methodology + - path: docs/api/had.rst + type: api_reference + - path: README.md + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst + type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide + # Note: llms-full.txt does not yet have a HeterogeneousAdoptionDiD section + # (deferred to TODO.md Phase 5 follow-up); the dependency mapping will be + # added when that section lands. 
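The entry schema used throughout this file can be checked by a small integrity pass of the kind the `map` command performs. This is an illustrative sketch, not the shipped checker: the fragment below is hand-copied from the `diff_diff/had.py` entry, and `ALLOWED_TYPES` is an assumption inferred from the `type:` values appearing in this file (a real pass would load the full file with a YAML parser):

```python
# Minimal integrity pass over a parsed doc-deps fragment. Normally the
# mapping comes from docs/doc-deps.yaml via a YAML parser; it is inlined
# here as a dict so the sketch is self-contained.
ALLOWED_TYPES = {"methodology", "api_reference", "tutorial", "user_guide"}

sources = {
    "diff_diff/had.py": {
        "drift_risk": "medium",
        "docs": [
            {"path": "docs/methodology/REGISTRY.md",
             "section": "HeterogeneousAdoptionDiD", "type": "methodology"},
            {"path": "docs/api/had.rst", "type": "api_reference"},
            {"path": "docs/references.rst", "type": "user_guide"},
            {"path": "diff_diff/guides/llms.txt",
             "section": "Estimators", "type": "user_guide"},
        ],
    },
}

errors = []
for src, meta in sources.items():
    for doc in meta.get("docs", []):
        if "path" not in doc or "type" not in doc:
            errors.append(f"{src}: entry missing path/type: {doc}")
        elif doc["type"] not in ALLOWED_TYPES:
            errors.append(f"{src}: unknown type {doc['type']!r}")
        # llms.txt entries must name the section they map into.
        if doc.get("path", "").endswith("llms.txt") and "section" not in doc:
            errors.append(f"{src}: llms.txt entry needs a section")
```

A well-formed entry produces an empty `errors` list; each violation is reported with the offending source path.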
+ + diff_diff/had_pretests.py: + drift_risk: medium + docs: + - path: docs/methodology/REGISTRY.md + section: "HeterogeneousAdoptionDiD" + type: methodology + - path: docs/api/had.rst + type: api_reference + # ── SyntheticDiD ─────────────────────────────────────────────────── diff_diff/synthetic_did.py: @@ -326,11 +405,16 @@ sources: - path: docs/tutorials/03_synthetic_did.ipynb type: tutorial - path: README.md - section: "SyntheticDiD" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "SyntheticDiD" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/practitioner_decision_tree.rst section: "Few Test Markets" type: user_guide @@ -352,11 +436,16 @@ sources: - path: docs/tutorials/08_triple_diff.ipynb type: tutorial - path: README.md - section: "TripleDifference" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "TripleDifference" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/choosing_estimator.rst type: user_guide @@ -373,11 +462,16 @@ sources: - path: docs/tutorials/13_stacked_did.ipynb type: tutorial - path: README.md - section: "StackedDiD" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "StackedDiD" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/choosing_estimator.rst type: user_guide @@ -394,11 +488,16 @@ sources: - path: docs/tutorials/16_wooldridge_etwfe.ipynb type: tutorial - path: README.md - section: "WooldridgeDiD" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path:
diff_diff/guides/llms-full.txt section: "WooldridgeDiD" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/choosing_estimator.rst type: user_guide @@ -415,11 +514,16 @@ sources: - path: docs/tutorials/10_trop.ipynb type: tutorial - path: README.md - section: "TROP" + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "TROP" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide - path: docs/choosing_estimator.rst type: user_guide - path: docs/performance-plan.md @@ -438,11 +542,16 @@ sources: - path: docs/tutorials/05_honest_did.ipynb type: tutorial - path: README.md - section: "HonestDiD" + section: "Diagnostics & Sensitivity (one-line entry)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "HonestDiD" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Diagnostics and Sensitivity Analysis" + type: user_guide # ── BaconDecomposition ────────────────────────────────────────────── @@ -454,9 +563,20 @@ sources: type: methodology - path: docs/api/bacon.rst type: api_reference + - path: docs/tutorials/02_staggered_did.ipynb + type: tutorial + note: "Bacon decomposition diagnostics for staggered DiD" - path: README.md + section: "Estimators (one-line catalog entry)" + type: user_guide + - path: docs/references.rst + type: user_guide + - path: diff_diff/guides/llms-full.txt section: "BaconDecomposition" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Estimators" + type: user_guide # ── Diagnostics & analysis ────────────────────────────────────────── @@ -493,7 +613,9 @@ sources: - path: docs/api/business_report.rst type: api_reference - path: README.md - section: "BusinessReport" + section: "For Data Scientists (one-line mention)" + type: user_guide + - path:
docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "BusinessReport" @@ -508,7 +630,9 @@ sources: - path: docs/api/diagnostic_report.rst type: api_reference - path: README.md - section: "DiagnosticReport" + section: "For Data Scientists (one-line mention)" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "DiagnosticReport" @@ -543,11 +667,16 @@ sources: - path: docs/tutorials/16_survey_did.ipynb type: tutorial - path: README.md - section: "Survey" + section: "Survey Support" + type: user_guide + - path: docs/references.rst type: user_guide - path: diff_diff/guides/llms-full.txt section: "Survey" type: user_guide + - path: diff_diff/guides/llms.txt + section: "Survey Support" + type: user_guide - path: docs/choosing_estimator.rst section: "Survey Design Support" type: user_guide @@ -597,6 +726,7 @@ sources: section: "Results API" type: user_guide - path: diff_diff/guides/llms.txt + section: "Estimators" type: user_guide - path: docs/methodology/REGISTRY.md section: "SyntheticDiD" @@ -665,6 +795,7 @@ sources: drift_risk: low docs: - path: diff_diff/guides/llms.txt + section: "Estimators" type: user_guide note: "Public API surface" diff --git a/docs/index.rst b/docs/index.rst index 18766ba5..f1922a1f 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -70,6 +70,7 @@ Quick Links quickstart choosing_estimator troubleshooting + references .. 
toctree:: :maxdepth: 1 diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index d88e38e5..d215d021 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -2508,7 +2508,7 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in - **Note (mass-point SE):** Standard errors on the mass-point path use the structural-residual 2SLS sandwich `[Z'X]^{-1} · Ω · [Z'X]^{-T}` with `Ω` built from the structural residuals `u = ΔY - α̂ - β̂·D` (not the reduced-form residuals from an OLS-on-indicator shortcut). Supported: `classical`, `hc1`, and CR1 (cluster-robust) when `cluster=` is supplied. `hc2` and `hc2_bm` raise `NotImplementedError` pending a 2SLS-specific leverage derivation (the OLS leverage `x_i' (X'X)^{-1} x_i` is wrong for 2SLS; the correct finite-sample correction depends on `(Z'X)^{-1}` rather than `(X'X)^{-1}`) plus a dedicated R parity anchor. Queued for the follow-up PR. - **Note (Design 1 identification):** `continuous_near_d_lower` and `mass_point` fits emit a `UserWarning` surfacing that `WAS_{d̲}` identification requires Assumption 6 (or Assumption 5 for sign identification only) beyond parallel trends, and that neither is testable via pre-trends. `continuous_at_zero` (Design 1', Assumption 3 only) does not emit this warning. - **Note (CI endpoints):** Because the continuous-path `att` is `(mean(ΔY) - τ̂_bc) / den`, the beta-scale CI endpoints reverse relative to the Phase 1c boundary-limit CI: `CI_lower(β̂) = (mean(ΔY) - CI_upper(τ̂_bc)) / den` and `CI_upper(β̂) = (mean(ΔY) - CI_lower(τ̂_bc)) / den`. The `HeterogeneousAdoptionDiD.fit()` implementation computes `att ± z · se` directly via `safe_inference`, which handles the reversal naturally from the transformed point estimate. 
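- The endpoint reversal described in the CI note above is pure arithmetic and can be checked numerically. The numbers below are illustrative placeholders, not estimator output:

```python
# att = (mean(dY) - tau_bc) / den with den > 0, so the tau-scale CI maps to
# the beta scale with its endpoints swapped: subtracting a larger tau upper
# endpoint yields the smaller beta endpoint.
mean_dy, den = 1.0, 2.0
tau_ci_lower, tau_ci_upper = 0.2, 0.6    # illustrative bias-corrected CI for tau

beta_ci_lower = (mean_dy - tau_ci_upper) / den   # uses tau's UPPER endpoint
beta_ci_upper = (mean_dy - tau_ci_lower) / den   # uses tau's LOWER endpoint

assert beta_ci_lower < beta_ci_upper             # ordering is restored on the beta scale
```

  Computing `att ± z · se` directly from the transformed point estimate, as `safe_inference` does, produces the same interval without handling the swap explicitly.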
- - **Note (Phase 2a/2b scope):** Phase 2a ships the single-period `aggregate="overall"` path; Phase 2b lifts `aggregate="event_study"` (Appendix B.2 multi-period extension) which returns a `HeterogeneousAdoptionDiDEventStudyResults` with per-event-time WAS estimates and pointwise CIs. `survey=` and `weights=` kwargs raise `NotImplementedError` pointing to the follow-up survey-integration PR. + - **Note (Phase 2a/2b scope, superseded by Phase 4.5):** Phase 2a ships the single-period `aggregate="overall"` path; Phase 2b lifts `aggregate="event_study"` (Appendix B.2 multi-period extension) which returns a `HeterogeneousAdoptionDiDEventStudyResults` with per-event-time WAS estimates and pointwise CIs. The original Phase 2a/2b release raised `NotImplementedError` on `survey=` and `weights=`, but Phase 4.5 (A/B/C0) lifted both gates with the per-design vcov contract documented above (see L2340-L2379, including the mass-point `vcov_type="classical"` deviation and `cband=True` sup-t restriction); the `survey=` path composes Binder (1983) TSL via `compute_survey_if_variance` on both continuous and mass-point IFs. - **Note (panel-only):** The paper (Section 2) defines HAD on *panel or repeated cross-section* data, but both the overall and event-study paths ship a panel-only implementation: `HeterogeneousAdoptionDiD.fit()` requires a balanced panel with a unit identifier so that unit-level first differences `ΔY_{g,t} = Y_{g,t} - Y_{g,t_anchor}` can be formed. Repeated-cross-section inputs (disjoint unit IDs between periods) are rejected by the balanced-panel validator. RCS support is queued for a follow-up PR (tracked in `TODO.md`); it will need a separate identification path based on pre/post cell means rather than unit-level differences. - [x] Phase 2b: Multi-period event-study extension (Appendix B.2). 
`aggregate="event_study"` produces per-event-time WAS estimates using a uniform `F-1` baseline (`ΔY_{g,t} = Y_{g,t} - Y_{g,F-1}` for every horizon), reusing the three Phase 2a design paths on per-horizon first differences. Pre-period placebos included for `e <= -2` (the anchor `e = -1` is skipped since `ΔY = 0` trivially). Post-period estimates for `e >= 0`. The joint Stute test (Equation 18) across pre-periods is a SEPARATE diagnostic deferred to a **Phase 3 follow-up patch** (Phase 3 ships the single-horizon Stute test; the joint stacked-residual variant is tracked in `TODO.md`). - **Note (Phase 2b last-cohort filter):** When `first_treat_col` indicates more than one nonzero cohort, the panel is auto-filtered to the last-treatment cohort (`F_last = max(cohorts)`) **plus never-treated units** (`first_treat = 0`), with a `UserWarning` naming kept/dropped unit counts and dropped cohort labels. Paper Appendix B.2 is explicit that HAD "may be used only for the LAST treatment cohort in a staggered design"; the auto-filter implements this prescription, retaining never-treated units per the paper's "there must be an untreated group, at least till the period where the last cohort gets treated" requirement. Only earlier-cohort units (with `first_treat > 0` and `< F_last`) are dropped — never-treated units satisfy the dose invariant at every period (`D = 0` throughout) and preserve Design 1' identifiability (boundary at `0`) when last-cohort doses are uniformly positive. When `first_treat_col` is omitted on a >2-period panel, the validator infers each unit's first-positive-dose period from the dose path; if multiple distinct first-positive-dose cohorts are detected, the estimator raises a front-door `ValueError` directing users to pass `first_treat_col` (which activates the auto-filter) or use `ChaisemartinDHaultfoeuille` for full staggered support — there is no silent acceptance of staggered panels without cohort metadata. 
Common-adoption panels (single first-positive-dose cohort, or only never-treated + one cohort) pass through unchanged with `F` inferred from the dose invariant, and require dose contiguity (pre-periods < post-periods in natural ordering). Non-contiguous dose sequences (e.g., reverse treatment) raise with a pointer to `ChaisemartinDHaultfoeuille`. @@ -2528,7 +2528,8 @@ Shipped in `diff_diff/had_pretests.py` as `stute_joint_pretest()` (residuals-in - [ ] Phase 4: Pierce-Schott (2016) replication harness reproduces Figure 2 values. - [ ] Phase 4: Full DGP 1/2/3 coverage-rate reproduction from Table 1. - [ ] Phase 5: `practitioner_next_steps()` integration for HAD results. -- [ ] Phase 5: Tutorial notebook + `llms.txt` + `llms-full.txt` updates (preserving the UTF-8 fingerprint). +- [x] Phase 5 (partial): README catalog one-liner, bundled `llms.txt` `## Estimators` entry, `docs/api/had.rst` (autoclass for the three classes), and `docs/references.rst` citation landed in PR #372 docs refresh. +- [ ] Phase 5 (remaining): Tutorial notebook + `llms-full.txt` HeterogeneousAdoptionDiD section (preserving the UTF-8 fingerprint). - [ ] Documentation of non-testability of Assumptions 5 and 6. - [ ] Warnings for staggered treatment timing (redirect to `ChaisemartinDHaultfoeuille`). - [ ] `NotImplementedError` phase pointer when `covariates=` is passed (Theorem 6 future work). diff --git a/docs/references.rst b/docs/references.rst new file mode 100644 index 00000000..7a5cda6d --- /dev/null +++ b/docs/references.rst @@ -0,0 +1,211 @@ +References +========== + +This library implements methods from the following scholarly works. + +Difference-in-Differences +------------------------- + +- **Ashenfelter, O., & Card, D. (1985).** "Using the Longitudinal Structure of Earnings to Estimate the Effect of Training Programs." *The Review of Economics and Statistics*, 67(4), 648-660. https://doi.org/10.2307/1924810 + +- **Card, D., & Krueger, A. B. 
(1994).** "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." *The American Economic Review*, 84(4), 772-793. https://www.jstor.org/stable/2118030 + +- **Angrist, J. D., & Pischke, J.-S. (2009).** *Mostly Harmless Econometrics: An Empiricist's Companion*. Princeton University Press. Chapter 5: Differences-in-Differences. + +Two-Way Fixed Effects +--------------------- + +- **Wooldridge, J. M. (2010).** *Econometric Analysis of Cross Section and Panel Data* (2nd ed.). MIT Press. + +- **Imai, K., & Kim, I. S. (2021).** "On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data." *Political Analysis*, 29(3), 405-415. https://doi.org/10.1017/pan.2020.33 + +Wooldridge ETWFE +---------------- + +- **Wooldridge, J. M. (2025).** "Two-Way Fixed Effects, the Two-Way Mundlak Regression, and Difference-in-Differences Estimators." *Empirical Economics*, 69(5), 2545-2587. (Published version of NBER Working Paper 29154.) + + Primary source for the saturated OLS ETWFE design implemented in our ``WooldridgeDiD`` class. + +- **Wooldridge, J. M. (2023).** "Simple Approaches to Nonlinear Difference-in-Differences with Panel Data." *The Econometrics Journal*, 26(3), C31-C66. https://doi.org/10.1093/ectj/utad016 + + Secondary source for the logit/Poisson QMLE (ASF-based ATT) extensions in ``WooldridgeDiD``. + +Robust Standard Errors +---------------------- + +- **White, H. (1980).** "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity." *Econometrica*, 48(4), 817-838. https://doi.org/10.2307/1912934 + +- **MacKinnon, J. G., & White, H. (1985).** "Some Heteroskedasticity-Consistent Covariance Matrix Estimators with Improved Finite Sample Properties." *Journal of Econometrics*, 29(3), 305-325. https://doi.org/10.1016/0304-4076(85)90158-7 + +- **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2011).** "Robust Inference With Multiway Clustering." 
*Journal of Business & Economic Statistics*, 29(2), 238-249. https://doi.org/10.1198/jbes.2010.07136 + +Wild Cluster Bootstrap +---------------------- + +- **Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008).** "Bootstrap-Based Improvements for Inference with Clustered Errors." *The Review of Economics and Statistics*, 90(3), 414-427. https://doi.org/10.1162/rest.90.3.414 + +- **Webb, M. D. (2014).** "Reworking Wild Bootstrap Based Inference for Clustered Errors." Queen's Economics Department Working Paper No. 1315. https://www.econ.queensu.ca/sites/econ.queensu.ca/files/qed_wp_1315.pdf + +- **MacKinnon, J. G., & Webb, M. D. (2018).** "The Wild Bootstrap for Few (Treated) Clusters." *The Econometrics Journal*, 21(2), 114-135. https://doi.org/10.1111/ectj.12107 + +Nonparametric Bias-Corrected Inference +-------------------------------------- + +- **Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014).** "Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs." *Econometrica*, 82(6), 2295-2326. https://doi.org/10.3982/ECTA11757 + + Source of the bias-combined design matrix used by the in-house ``lprobust`` port that backs ``HeterogeneousAdoptionDiD`` Phase 1c (continuous-dose paths) for the bias-corrected weighted-robust SE. + +- **Calonico, S., Cattaneo, M. D., & Farrell, M. H. (2018).** "On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference." *Journal of the American Statistical Association*, 113(522), 767-779. https://doi.org/10.1080/01621459.2017.1285776 + +- **Calonico, S., Cattaneo, M. D., & Farrell, M. H. (2019).** "nprobust: Nonparametric Kernel-Based Estimation and Robust Bias-Corrected Inference." *Journal of Statistical Software*, 91(8), 1-33. https://doi.org/10.18637/jss.v091.i08 + + CCF (2018, 2019) is the underlying ``nprobust`` machinery (MSE-optimal bandwidth selection and robust bias-corrected CIs) that ``HeterogeneousAdoptionDiD`` ports in-house for the continuous-dose paths. 
+ +Survey-Design Inference (Taylor-Series Linearization) +----------------------------------------------------- + +- **Binder, D. A. (1983).** "On the Variances of Asymptotically Normal Estimators from Complex Surveys." *International Statistical Review*, 51(3), 279-292. https://doi.org/10.2307/1402588 + + Foundational TSL (Taylor-Series Linearization) variance derivation used across diff-diff's survey-aware estimators (``compute_survey_if_variance`` and the per-estimator influence-function compositions, including the dCDH and HeterogeneousAdoptionDiD ``survey=`` paths). + +Placebo Tests and DiD Diagnostics +--------------------------------- + +- **Bertrand, M., Duflo, E., & Mullainathan, S. (2004).** "How Much Should We Trust Differences-in-Differences Estimates?" *The Quarterly Journal of Economics*, 119(1), 249-275. https://doi.org/10.1162/003355304772839588 + +Synthetic Control Method +------------------------ + +- **Abadie, A., & Gardeazabal, J. (2003).** "The Economic Costs of Conflict: A Case Study of the Basque Country." *The American Economic Review*, 93(1), 113-132. https://doi.org/10.1257/000282803321455188 + +- **Abadie, A., Diamond, A., & Hainmueller, J. (2010).** "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." *Journal of the American Statistical Association*, 105(490), 493-505. https://doi.org/10.1198/jasa.2009.ap08746 + +- **Abadie, A., Diamond, A., & Hainmueller, J. (2015).** "Comparative Politics and the Synthetic Control Method." *American Journal of Political Science*, 59(2), 495-510. https://doi.org/10.1111/ajps.12116 + +Synthetic Difference-in-Differences +----------------------------------- + +- **Arkhangelsky, D., Athey, S., Hirshberg, D. A., Imbens, G. W., & Wager, S. (2021).** "Synthetic Difference-in-Differences." *American Economic Review*, 111(12), 4088-4118. 
https://doi.org/10.1257/aer.20190159 + +Triply Robust Panel (TROP) +-------------------------- + +- **Athey, S., Imbens, G. W., Qu, Z., & Viviano, D. (2025).** "Triply Robust Panel Estimators." *Working Paper*. https://arxiv.org/abs/2508.21536 + + This paper introduces the TROP estimator which combines three robustness components: + + - **Factor model adjustment**: Low-rank factor structure via SVD removes unobserved confounders + - **Unit weights**: Synthetic control style weighting for optimal comparison + - **Time weights**: SDID style time weighting for informative pre-periods + + TROP is particularly useful when there are unobserved time-varying confounders with a factor structure that affect different units differently over time. + +Triple Difference (DDD) +----------------------- + +- **Ortiz-Villavicencio, M., & Sant'Anna, P. H. C. (2025).** "Better Understanding Triple Differences Estimators." *Working Paper*. https://arxiv.org/abs/2505.09942 + + This paper shows that common DDD implementations (taking the difference between two DiDs, or applying three-way fixed effects regressions) are generally invalid when identification requires conditioning on covariates. The ``TripleDifference`` class implements their regression adjustment, inverse probability weighting, and doubly robust estimators. + +- **Gruber, J. (1994).** "The Incidence of Mandated Maternity Benefits." *American Economic Review*, 84(3), 622-641. https://www.jstor.org/stable/2118071 + + Classic paper introducing the Triple Difference design for policy evaluation. + +- **Olden, A., & Møen, J. (2022).** "The Triple Difference Estimator." *The Econometrics Journal*, 25(3), 531-553. https://doi.org/10.1093/ectj/utac010 + +Parallel Trends and Pre-Trend Testing +------------------------------------- + +- **Roth, J. (2022).** "Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends." *American Economic Review: Insights*, 4(3), 305-322. 
https://doi.org/10.1257/aeri.20210236 + +- **Lakens, D. (2017).** "Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses." *Social Psychological and Personality Science*, 8(4), 355-362. https://doi.org/10.1177/1948550617697177 + +Honest DiD / Sensitivity Analysis +--------------------------------- + +The ``HonestDiD`` module implements sensitivity analysis methods for relaxing the parallel trends assumption. + +- **Rambachan, A., & Roth, J. (2023).** "A More Credible Approach to Parallel Trends." *The Review of Economic Studies*, 90(5), 2555-2591. https://doi.org/10.1093/restud/rdad018 + + This paper introduces the "Honest DiD" framework implemented in our ``HonestDiD`` class: + + - **Relative Magnitudes (ΔRM)**: Bounds post-treatment violations by a multiple of observed pre-treatment violations + - **Smoothness (ΔSD)**: Bounds on second differences of trend violations, allowing for linear extrapolation of pre-trends + - **Breakdown Analysis**: Finding the smallest violation magnitude that would overturn conclusions + - **Robust Confidence Intervals**: Valid inference under partial identification + +- **Roth, J., & Sant'Anna, P. H. C. (2023).** "When Is Parallel Trends Sensitive to Functional Form?" *Econometrica*, 91(2), 737-747. https://doi.org/10.3982/ECTA19402 + + Discusses functional form sensitivity in parallel trends assumptions, relevant to understanding when smoothness restrictions are appropriate. + +Multi-Period and Staggered Adoption +----------------------------------- + +- **Borusyak, K., Jaravel, X., & Spiess, J. (2024).** "Revisiting Event-Study Designs: Robust and Efficient Estimation." *Review of Economic Studies*, 91(6), 3253-3285. 
https://doi.org/10.1093/restud/rdae007 + + This paper introduces the imputation estimator implemented in our ``ImputationDiD`` class: + + - **Efficient imputation**: OLS on untreated observations, impute counterfactuals, aggregate + - **Conservative variance**: Theorem 3 clustered variance estimator with auxiliary model + - **Pre-trend test**: Independent of treatment effect estimation (Proposition 9) + - **Efficiency gains**: ~50% shorter CIs than Callaway-Sant'Anna under homogeneous effects + +- **Callaway, B., & Sant'Anna, P. H. C. (2021).** "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230. https://doi.org/10.1016/j.jeconom.2020.12.001 + +- **Sant'Anna, P. H. C., & Zhao, J. (2020).** "Doubly Robust Difference-in-Differences Estimators." *Journal of Econometrics*, 219(1), 101-122. https://doi.org/10.1016/j.jeconom.2020.06.003 + +- **Sun, L., & Abraham, S. (2021).** "Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects." *Journal of Econometrics*, 225(2), 175-199. https://doi.org/10.1016/j.jeconom.2020.09.006 + +- **Gardner, J. (2022).** "Two-stage differences in differences." *arXiv preprint* arXiv:2207.05943. https://arxiv.org/abs/2207.05943 + +- **Butts, K., & Gardner, J. (2022).** "did2s: Two-Stage Difference-in-Differences." *The R Journal*, 14(1), 162-173. https://doi.org/10.32614/RJ-2022-048 + +- **de Chaisemartin, C., & D'Haultfœuille, X. (2020).** "Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects." *American Economic Review*, 110(9), 2964-2996. https://doi.org/10.1257/aer.20181169 + +- **de Chaisemartin, C., & D'Haultfœuille, X. (2022, revised 2024).** "Difference-in-Differences Estimators of Intertemporal Treatment Effects." *NBER Working Paper* 29873. https://www.nber.org/papers/w29873 + + Dynamic companion to the 2020 paper. 
Web Appendix Section 3.7.3 contains the cohort-recentered plug-in variance formula implemented in our ``ChaisemartinDHaultfoeuille`` class. + +- **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. https://doi.org/10.1016/j.jeconom.2021.03.014 + +- **Wing, C., Freedman, S. M., & Hollingsworth, A. (2024).** "Stacked Difference-in-Differences." *NBER Working Paper* 32054. https://www.nber.org/papers/w32054 + +- **Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025).** "Efficient Difference-in-Differences and Event Study Estimators." *Working Paper*. + + Primary source for the optimal-weighting / PT-All / PT-Post efficient DiD implemented in our ``EfficientDiD`` class. + +- **Baker, A., Callaway, B., Cunningham, S., Goodman-Bacon, A., & Sant'Anna, P. H. C. (2025).** "Difference-in-Differences Designs: A Practitioner's Guide." *arXiv preprint* arXiv:2503.13323. https://arxiv.org/abs/2503.13323 + + Source for the 8-step practitioner workflow surfaced via ``diff_diff.get_llm_guide("practitioner")`` and the README ``## Practitioner Workflow`` section. See ``docs/methodology/REGISTRY.md`` for the diff-diff renumbering and per-step deviations. + +Continuous Treatment DiD +------------------------ + +- **Callaway, B., Goodman-Bacon, A., & Sant'Anna, P. H. C. (2024).** "Difference-in-Differences with a Continuous Treatment." *NBER Working Paper* 32117. https://www.nber.org/papers/w32117 + + Primary source for ATT(d), ACRT, dose-response curves, and B-spline flexibility implemented in our ``ContinuousDiD`` class. + +Heterogeneous Adoption (No-Untreated Designs) +--------------------------------------------- + +- **de Chaisemartin, C., Ciccia, D., D'Haultfœuille, X., & Knau, F. (2026).** "Difference-in-Differences Estimators When No Unit Remains Untreated." *arXiv preprint* arXiv:2405.04465v6. 
https://arxiv.org/abs/2405.04465 + + Primary source for the Weighted Average Slope (WAS) estimator and its multi-period event-study extension implemented in our ``HeterogeneousAdoptionDiD`` class. Targets panels where no unit remains untreated in the post period and treatment dose ``D_{g,2}`` is nonnegative, using local-linear regression at the dose support boundary; both Design 1' (the QUG case with ``d̲ = 0``) and Design 1 (no QUG with ``d̲ > 0``) are supported. + +Power Analysis +-------------- + +- **Bloom, H. S. (1995).** "Minimum Detectable Effects: A Simple Way to Report the Statistical Power of Experimental Designs." *Evaluation Review*, 19(5), 547-556. https://doi.org/10.1177/0193841X9501900504 + +- **Burlig, F., Preonas, L., & Woerman, M. (2020).** "Panel Data and Experimental Design." *Journal of Development Economics*, 144, 102458. https://doi.org/10.1016/j.jdeveco.2020.102458 + + Essential reference for power analysis in panel DiD designs. Discusses how serial correlation affects power and provides formulas for panel data settings. + +- **Djimeu, E. W., & Houndolo, D.-G. (2016).** "Power Calculation for Causal Inference in Social Science: Sample Size and Minimum Detectable Effect Determination." *Journal of Development Effectiveness*, 8(4), 508-527. https://doi.org/10.1080/19439342.2016.1244555 + +General Causal Inference +------------------------ + +- **Imbens, G. W., & Rubin, D. B. (2015).** *Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction*. Cambridge University Press. + +- **Cunningham, S. (2021).** *Causal Inference: The Mixtape*. Yale University Press. https://mixtape.scunning.com/ diff --git a/llms.txt b/llms.txt new file mode 100644 index 00000000..f067fa75 --- /dev/null +++ b/llms.txt @@ -0,0 +1,16 @@ +# diff-diff + +> Python library for Difference-in-Differences (DiD) causal inference. sklearn-like estimators, statsmodels-style summary output, validated against R. 
+ +## Guides for AI agents and LLMs + +- Concise API reference: https://diff-diff.readthedocs.io/en/stable/llms.txt +- Full API reference: https://diff-diff.readthedocs.io/en/stable/llms-full.txt +- 8-step practitioner workflow (Baker et al. 2025): https://diff-diff.readthedocs.io/en/stable/llms-practitioner.txt + +After `pip install diff-diff`, also accessible in-process via `from diff_diff import get_llm_guide`: + +- `get_llm_guide()` - concise +- `get_llm_guide("full")` - full +- `get_llm_guide("practitioner")` - 8-step workflow +- `get_llm_guide("autonomous")` - autonomous-agent variant (in-wheel only; not yet on RTD)