# Synthetic Control

> **Reference:** *Causal Inference: The Mixtape*, Chapter 10: Synthetic Control (pp. 469–540)

This lecture introduces the synthetic control method for estimating causal effects in comparative case studies with panel data.

---

## Part I: Theory

This section covers the theoretical foundations of the synthetic control method as presented in Cunningham's *Causal Inference: The Mixtape*, Chapter 10.

## 1. The Comparative Case Study

Causal inference often involves evaluating the effect of a policy or intervention applied to a **single unit**—a country, state, firm, or product. When only one unit is treated, traditional methods face a fundamental challenge: there is no randomization and the "sample size" of treated units is exactly one.

The traditional approach is a **comparative case study**: compare the affected unit to one or more "comparison" units not exposed to the intervention. For example, Card (1990) studied the 1980 Mariel Boatlift—which brought 125,000 Cuban immigrants to Miami—by comparing Miami's labor market to a set of comparison cities.

The problem with traditional comparative case studies is **subjectivity in selecting comparison units**. Different researchers might choose different comparison groups and reach different conclusions. Is Atlanta a better comparison for Miami, or is Houston? The answer often depends on the researcher's judgment, which introduces an element of arbitrariness.

The **synthetic control method** (Abadie and Gardeazabal 2003; Abadie, Diamond, and Hainmueller 2010) provides a systematic, data-driven approach to this problem. Rather than selecting a single comparison unit, it constructs a *synthetic* comparison as a weighted average of multiple untreated units.

## 2. The Synthetic Control Estimator

### Setup

Suppose we observe $J + 1$ units over $T$ time periods. Unit $j = 1$ receives a treatment or intervention at time $T_0 + 1$. The remaining $J$ units ($j = 2, \ldots, J+1$) are untreated and form the **donor pool**.

Let $Y_{jt}^N$ denote the potential outcome of unit $j$ at time $t$ in the absence of treatment, and $Y_{jt}^I$ the potential outcome under treatment. The causal effect of the intervention on the treated unit at time $t > T_0$ is:

$$\alpha_{1t} = Y_{1t}^I - Y_{1t}^N$$

We observe $Y_{1t}^I$ directly (the treated unit's actual outcome after treatment). The challenge is estimating $Y_{1t}^N$—what the treated unit's outcome *would have been* without the intervention.

### The Synthetic Control

The synthetic control method estimates $Y_{1t}^N$ as a weighted average of the donor units' outcomes:

$$\hat{Y}_{1t}^N = \sum_{j=2}^{J+1} w_j^* Y_{jt}$$

where the weights $W^* = (w_2^*, \ldots, w_{J+1}^*)'$ satisfy:

- **Non-negativity**: $w_j \geq 0$ for all $j$
- **Sum to one**: $\sum_{j=2}^{J+1} w_j = 1$

These constraints ensure the synthetic control lies in the **convex hull** of the donor pool—it is an interpolation, never an extrapolation.

The estimated treatment effect at each post-treatment period is:

$$\hat{\alpha}_{1t} = Y_{1t} - \sum_{j=2}^{J+1} w_j^* Y_{jt}$$
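As a minimal sketch of the two formulas above, the following NumPy snippet builds the synthetic outcome and the per-period gap from a given weight vector. All numbers and variable names (`Y_donors`, `y_treated`, `w_star`) are made up for illustration; in practice the weights come from the optimization described in the next section.

```python
import numpy as np

# Hypothetical outcomes: rows are time periods, columns are three donor units.
Y_donors = np.array([
    [10.0, 20.0, 30.0],
    [11.0, 21.0, 31.0],
    [12.0, 22.0, 32.0],
    [13.0, 23.0, 33.0],
])
# Hypothetical treated-unit outcomes; the last two periods are post-treatment.
y_treated = np.array([18.0, 19.0, 25.0, 26.0])
T0 = 2  # number of pre-treatment periods

# Hypothetical weights: non-negative and summing to one.
w_star = np.array([0.5, 0.2, 0.3])

# Synthetic control: weighted average of donor outcomes in every period.
y_synth = Y_donors @ w_star

# Gap between observed and synthetic outcomes.
gap = y_treated - y_synth

# Post-treatment gaps are the estimated effects alpha_hat_{1t}.
alpha_hat = gap[T0:]
print(alpha_hat)
```

With these numbers the synthetic series matches the treated unit exactly before treatment, so the whole post-treatment gap is attributed to the intervention.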

## 3. Choosing Weights

The optimal weights are chosen to make the synthetic control match the treated unit as closely as possible in the pre-treatment period. Let:

- $X_1$ be a $(k \times 1)$ vector of pre-treatment characteristics for the treated unit
- $X_0$ be a $(k \times J)$ matrix of the same characteristics for the donor units

The characteristics in $X_1$ and $X_0$ typically include pre-treatment outcome values (or averages over pre-treatment subperiods) and other predictors of the outcome.

The optimal weights minimize:

$$W^* = \arg\min_W \|X_1 - X_0 W\|_V = \arg\min_W \sqrt{(X_1 - X_0 W)' V (X_1 - X_0 W)}$$

subject to $w_j \geq 0$ and $\sum w_j = 1$.
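For a fixed $V$, the constrained problem above is a small quadratic program. The sketch below solves it with SciPy's general-purpose SLSQP solver. This is one possible choice, not necessarily what any particular synthetic control package uses. The function name `fit_weights` and the simulated data are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def fit_weights(X1, X0, V=None):
    """Solve min_W (X1 - X0 W)' V (X1 - X0 W) with w_j >= 0 and sum(w) = 1."""
    k, J = X0.shape
    V = np.eye(k) if V is None else V  # default: equal importance

    def loss(w):
        diff = X1 - X0 @ w
        return diff @ V @ diff  # squared V-norm; same minimizer as the norm

    result = minimize(
        loss,
        x0=np.full(J, 1.0 / J),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,  # non-negativity
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum to one
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


# Hypothetical data: 8 characteristics, 4 donors, and a treated unit that is
# an exact convex combination of the first two donors.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(8, 4))
true_w = np.array([0.6, 0.4, 0.0, 0.0])
X1 = X0 @ true_w

w_hat = fit_weights(X1, X0)
print(np.round(w_hat, 3))
```

Because the treated unit here lies exactly inside the convex hull of the donors, the solver should recover weights close to `true_w`. With real data the fit is only approximate.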

### The Role of $V$

The matrix $V$ is a $(k \times k)$ positive semidefinite matrix that reflects the relative importance of the pre-treatment characteristics. Common ways to choose it are:

| Approach | Description |
|----------|-------------|
| **Researcher-specified** | Set $V$ based on domain knowledge about which characteristics matter most |
| **Data-driven** | Choose $V$ to minimize the mean squared prediction error (MSPE) of the outcome in the pre-treatment period |
| **Diagonal** | Restrict $V$ to be diagonal, where each entry reflects one characteristic's importance |

In practice, the data-driven approach is most common: a nested optimization selects $V$ in an outer loop to minimize pre-treatment MSPE, and given $V$, the inner loop solves for the optimal weights $W^*$.
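The nested structure can be sketched as follows. The sketch assumes a diagonal $V$ parametrized through a softmax (an illustrative choice that keeps the entries positive and summing to one), uses Nelder-Mead for the outer loop and SLSQP for the inner loop, and runs on simulated data. Every function name here is hypothetical and does not come from a library.

```python
import numpy as np
from scipy.optimize import minimize


def inner_weights(X1, X0, v):
    """Inner loop: convex donor weights for a fixed diagonal V = diag(v)."""
    J = X0.shape[1]
    result = minimize(
        lambda w: np.sum(v * (X1 - X0 @ w) ** 2),
        np.full(J, 1.0 / J),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return result.x


def softmax(theta):
    """Map unconstrained parameters to positive importances that sum to one."""
    e = np.exp(theta - theta.max())
    return e / e.sum()


def pre_mspe(theta, X1, X0, y1_pre, Y0_pre):
    """Outer objective: pre-treatment MSPE of the outcome implied by V."""
    w = inner_weights(X1, X0, softmax(theta))
    return np.mean((y1_pre - Y0_pre @ w) ** 2)


def fit_nested(X1, X0, y1_pre, Y0_pre):
    """Search over diagonal V (outer loop), solving for W inside each step."""
    outer = minimize(
        pre_mspe,
        np.zeros(X0.shape[0]),  # start from equal importances
        args=(X1, X0, y1_pre, Y0_pre),
        method="Nelder-Mead",
        options={"maxiter": 200},
    )
    v = softmax(outer.x)
    return v, inner_weights(X1, X0, v)


# Hypothetical data: 12 pre-treatment periods, 5 donors, 3 characteristics.
rng = np.random.default_rng(0)
Y0_pre = 50.0 + rng.normal(scale=5.0, size=(12, 5))
y1_pre = Y0_pre @ np.array([0.7, 0.3, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=12)
covariate0 = rng.normal(size=5)  # made-up extra predictor for the donors
covariate1 = 0.0                 # and for the treated unit

# Characteristics: two sub-period outcome averages plus the extra predictor.
X0 = np.vstack([Y0_pre[:6].mean(axis=0), Y0_pre[6:].mean(axis=0), covariate0])
X1 = np.array([y1_pre[:6].mean(), y1_pre[6:].mean(), covariate1])

v_star, w_star = fit_nested(X1, X0, y1_pre, Y0_pre)
mspe_nested = np.mean((y1_pre - Y0_pre @ w_star) ** 2)
mspe_equal = pre_mspe(np.zeros(3), X1, X0, y1_pre, Y0_pre)
print(f"MSPE with equal V: {mspe_equal:.3f}, with searched V: {mspe_nested:.3f}")
```

Since the outer search starts from equal importances, the searched $V$ can only match or improve on the equal-weight pre-treatment MSPE in this sketch.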

## 4. Advantages Over Regression

A natural alternative to synthetic control is to regress the treated unit's outcome on the donor units' outcomes using ordinary least squares. Why prefer synthetic control?

| Criterion | Synthetic Control | Regression |
|-----------|-------------------|------------|
| **Interpolation** | Weights are non-negative and sum to 1 $\rightarrow$ synthetic control lies within the convex hull of donors | Coefficients are unrestricted $\rightarrow$ can extrapolate beyond the data |
| **Transparency** | Weights reveal exactly which units contribute and how much | Coefficients mix unit contributions with functional form |
| **Overfitting** | Convexity constraint acts as implicit regularization | OLS with many predictors can overfit pre-treatment data while predicting post-treatment poorly |
| **Design focus** | Constructed entirely from pre-treatment data, separating design from analysis | Same specification used for estimation, inviting specification search |

The interpolation property is particularly important when donors are plentiful. With at least as many donor units as pre-treatment periods ($J \geq T_0$), OLS can generally fit the pre-treatment outcomes perfectly (overfitting) yet perform poorly out of sample. The convexity constraint of the synthetic control acts as a natural regularizer, forcing the method to find a *meaningful* combination of donors rather than an arbitrary one.
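The overfitting point can be illustrated with a small simulation. It assumes pure-noise data, so by construction no donor has any relationship to the "treated" series, and it fits an unrestricted regression without an intercept. The setup is hypothetical and only meant to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(1)
T0, J = 10, 20  # fewer pre-treatment periods than donor units

# Pure noise: the "treated" series is unrelated to every donor by construction.
Y0_pre, Y0_post = rng.normal(size=(T0, J)), rng.normal(size=(T0, J))
y1_pre, y1_post = rng.normal(size=T0), rng.normal(size=T0)

# Unrestricted least squares; lstsq returns the minimum-norm solution when J > T0.
beta, *_ = np.linalg.lstsq(Y0_pre, y1_pre, rcond=None)

pre_rmse = np.sqrt(np.mean((y1_pre - Y0_pre @ beta) ** 2))
post_rmse = np.sqrt(np.mean((y1_post - Y0_post @ beta) ** 2))

print(f"pre-treatment RMSE:  {pre_rmse:.2e}")
print(f"post-treatment RMSE: {post_rmse:.2f}")
print(f"smallest coefficient: {beta.min():.2f}, sum of coefficients: {beta.sum():.2f}")
```

The pre-treatment fit is numerically perfect even though there is nothing to fit, and the coefficients are free to be negative or to sum to something other than one.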

## 5. Inference

Standard inference methods—t-tests, confidence intervals from asymptotic theory—are not appropriate for synthetic control because the treated sample size is one. Instead, Abadie, Diamond, and Hainmueller (2010) propose **placebo tests** based on permutation inference.

### Placebo-in-Space Tests

The procedure is:

1. For each unit $j$ in the donor pool, pretend it was treated and fit a synthetic control using the remaining units as donors
2. Compute the **gap** (difference between actual and synthetic) for each placebo unit in the post-treatment period
3. Compare the treated unit's gap to the distribution of placebo gaps

If the treated unit's gap is unusually large relative to the placebo distribution, we have evidence of a genuine treatment effect.
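The three steps above can be sketched as a loop over units. This version makes several assumptions: weights are fitted on pre-treatment outcomes only, the treated unit is left out of every placebo donor pool, and the data are simulated with a known effect of 5. The helper names `fit_weights` and `placebo_gaps` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def fit_weights(y_pre, Y_pre):
    """Convex weights minimizing the pre-treatment squared error (outcomes only)."""
    J = Y_pre.shape[1]
    result = minimize(
        lambda w: np.sum((y_pre - Y_pre @ w) ** 2),
        np.full(J, 1.0 / J),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return result.x


def placebo_gaps(Y, T0, treated=0):
    """Return an (n_units, T) array of actual-minus-synthetic gaps."""
    n_periods, n_units = Y.shape
    controls = [j for j in range(n_units) if j != treated]
    gaps = np.empty((n_units, n_periods))
    for j in range(n_units):
        # Assumption: the truly treated unit never serves as a donor.
        donors = [c for c in controls if c != j]
        w = fit_weights(Y[:T0, j], Y[:T0][:, donors])
        gaps[j] = Y[:, j] - Y[:, donors] @ w
    return gaps


# Hypothetical panel: a shared trend with unit-specific loadings plus noise.
rng = np.random.default_rng(0)
T, T0, n_units = 30, 20, 9
trend = 10.0 + np.arange(T)
loadings = np.concatenate([[1.0], np.linspace(0.5, 1.5, n_units - 1)])
Y = np.outer(trend, loadings) + rng.normal(scale=0.1, size=(T, n_units))
Y[T0:, 0] += 5.0  # known effect on the treated unit (column 0)

gaps = placebo_gaps(Y, T0)
print(f"treated post-treatment gap: {gaps[0, T0:].mean():.2f}")
print("placebo post-treatment gaps:", np.round(gaps[1:, T0:].mean(axis=1), 2))
```

Placebo units at the edge of the donor distribution cannot be reproduced by a convex combination of the others, so their gaps can be large even without treatment.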

### RMSPE Ratios

To formalize the comparison, compute the ratio of post-treatment to pre-treatment root mean squared prediction error for each unit:

$$r_j = \frac{\text{RMSPE}_j^{\text{post}}}{\text{RMSPE}_j^{\text{pre}}}$$

where $\text{RMSPE}_j^{\text{post}}$ measures the post-treatment gap and $\text{RMSPE}_j^{\text{pre}}$ measures the pre-treatment fit quality.
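Written out, the two quantities are as follows. The notation $\hat{Y}_{jt}^N$ for the synthetic control fitted to unit $j$ is introduced here for convenience, and $T$ is the total number of periods:

$$\text{RMSPE}_j^{\text{pre}} = \sqrt{\frac{1}{T_0} \sum_{t=1}^{T_0} \left(Y_{jt} - \hat{Y}_{jt}^N\right)^2}, \qquad \text{RMSPE}_j^{\text{post}} = \sqrt{\frac{1}{T - T_0} \sum_{t=T_0+1}^{T} \left(Y_{jt} - \hat{Y}_{jt}^N\right)^2}$$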

The ratio adjusts for the fact that some placebo units may have poor pre-treatment fit (large $\text{RMSPE}^{\text{pre}}$), which would inflate their post-treatment gaps even without any real effect.

### Exact p-Value

Rank all $J + 1$ RMSPE ratios in descending order, so that rank 1 is the largest. The treated unit's rank then gives an **exact p-value**:

$$p = \frac{\text{Rank of treated unit's } r_1 \text{ among all } r_j}{J + 1}$$

For example, if the treated unit has the highest RMSPE ratio among 20 units (1 treated + 19 donors), the p-value is $1/20 = 0.05$.
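The ratio and the rank-based p-value take only a few lines to compute once the gaps are available. The sketch below assumes a gap array with one row per unit (treated unit in row 0) and uses made-up gaps that reproduce the 20-unit example. The function names are hypothetical.

```python
import numpy as np


def rmspe_ratios(gaps, T0):
    """Post/pre RMSPE ratio for each unit; gaps has shape (n_units, T)."""
    pre = np.sqrt(np.mean(gaps[:, :T0] ** 2, axis=1))
    post = np.sqrt(np.mean(gaps[:, T0:] ** 2, axis=1))
    return post / pre


def permutation_p_value(ratios, treated=0):
    """Rank of the treated unit's ratio (1 = largest) divided by the unit count."""
    rank = int(np.sum(ratios >= ratios[treated]))  # ties count against the treated unit
    return rank / len(ratios)


# Hypothetical gaps for 1 treated unit and 19 placebo units over 15 periods.
T0 = 10
gaps = np.ones((20, 15))                                       # pre-treatment gaps of 1
gaps[1:, T0:] = (1.0 + 0.1 * np.arange(1, 20))[:, None]        # modest placebo gaps
gaps[0, T0:] = 10.0                                            # large treated gap

ratios = rmspe_ratios(gaps, T0)
p_value = permutation_p_value(ratios)
print(f"treated ratio: {ratios[0]:.1f}, p-value: {p_value:.2f}")
```

Counting ties against the treated unit is a conservative convention chosen for this sketch.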

## 6. Practical Considerations

### Donor Pool Selection

The donor pool should contain units that are **plausible comparisons**—units that could reasonably approximate the treated unit's trajectory in the absence of treatment. Exclude units that:

- Experienced similar interventions or major idiosyncratic shocks during the study period
- Are structurally very different from the treated unit (different scale, different market)

Including inappropriate donors adds noise and can introduce interpolation bias, without improving the counterfactual estimate.

### Pre-Treatment Fit Quality

The credibility of the synthetic control depends on how well it tracks the treated unit in the pre-treatment period. A large **pre-treatment MSPE** is a warning sign: if the method cannot reproduce the treated unit's trajectory before treatment, there is little reason to trust its counterfactual prediction after treatment.

As a rule of thumb, the pre-treatment fit should be evaluated both visually (does the synthetic line closely track the treated line?) and quantitatively (is the MSPE small relative to the outcome's variance?).
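The quantitative half of this rule of thumb can be sketched as a ratio of the pre-treatment MSPE to the variance of the treated unit's pre-treatment outcome. The function name, the example numbers, and the choice of the population variance as the benchmark are assumptions for illustration. The text above does not prescribe a specific threshold.

```python
import numpy as np


def pre_fit_summary(y_treated_pre, y_synth_pre):
    """Compare pre-treatment MSPE with the variance of the treated outcome."""
    mspe = np.mean((y_treated_pre - y_synth_pre) ** 2)
    variance = np.var(y_treated_pre)  # population variance over the pre-period
    return {"mspe": mspe, "variance": variance, "mspe_to_variance": mspe / variance}


# Hypothetical pre-treatment series.
y_treated_pre = np.array([100.0, 110.0, 120.0, 130.0])
y_synth_pre = np.array([101.0, 109.0, 121.0, 129.0])

summary = pre_fit_summary(y_treated_pre, y_synth_pre)
print(summary)
```

A ratio close to zero means the synthetic control explains almost all of the pre-treatment variation; a ratio near one means it does little better than a constant.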

### When Synthetic Control Works Best

The method is most effective when:

- The pre-treatment period is long enough to reveal the outcome's dynamics
- The donor pool contains units whose weighted combination can reproduce the treated unit's trajectory
- The treated unit's outcome lies within the range of donor outcomes (convex hull condition)
- There are no large, idiosyncratic shocks to the treated unit in the pre-treatment period

When these conditions hold, matching on pre-treatment outcomes implicitly controls for heterogeneous responses to unobserved confounders—a key theoretical result from Abadie, Diamond, and Hainmueller (2010).
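The range condition in the list above can be screened with a quick check before fitting anything. The sketch below reports the share of pre-treatment periods in which the treated outcome falls between the donor minimum and maximum. Note that this is only a necessary condition for lying in the convex hull, not a sufficient one, and the function name and data are hypothetical.

```python
import numpy as np


def share_within_donor_range(y_treated_pre, Y_donors_pre):
    """Share of periods where the treated outcome lies within the donor range."""
    lower = Y_donors_pre.min(axis=1)  # per-period donor minimum
    upper = Y_donors_pre.max(axis=1)  # per-period donor maximum
    inside = (y_treated_pre >= lower) & (y_treated_pre <= upper)
    return inside.mean()


# Hypothetical data: 3 pre-treatment periods, 2 donors.
Y_donors_pre = np.array([
    [1.0, 5.0],
    [2.0, 6.0],
    [3.0, 7.0],
])
y_treated_pre = np.array([3.0, 4.0, 8.0])  # the last period exceeds every donor

share = share_within_donor_range(y_treated_pre, Y_donors_pre)
print(f"share of periods inside the donor range: {share:.2f}")
```

A share well below one signals that convex weights cannot reproduce the treated unit, so any fit would rely on extrapolation that the method rules out.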

---

## Part II: Application

In Part I we developed the theory of synthetic control: a weighted average of donor units constructs a transparent counterfactual for a single treated unit, with weights chosen to match pre-treatment outcomes. Inference relies on placebo tests rather than standard errors.

We now apply this method to a product-level panel from our online retail simulation. A single flagship product receives a content optimization campaign starting on a specific date. We observe daily revenue for all products before and after the campaign. The question is whether we can recover the true causal effect of the campaign.

**Can synthetic control isolate the campaign's effect from common time trends that affect all products?**

In [None]:
# Standard library
import inspect

# Third-party packages
from IPython.display import Code
from impact_engine_measure import evaluate_impact, load_results
from online_retail_simulator import simulate, load_job_results
import pandas as pd
import yaml

# Local imports
from support import (
    compute_ground_truth_att,
    create_synthetic_control_data,
    plot_gap,
    plot_method_comparison,
    plot_treated_vs_synthetic,
    plot_weights,
)

## 1. Business Context

An e-commerce company runs a **content optimization campaign** on a single flagship product. The campaign improves the product listing's images, descriptions, and search metadata, starting on a specific date. The company observes daily revenue for all products—both the flagship and dozens of untreated products—before and after the campaign launch.

The company wants to know: **how much additional daily revenue did the campaign generate?**

The challenge is that revenue fluctuates over time for all products due to seasonal patterns, market conditions, and promotional cycles. A simple before-after comparison for the flagship product would confound the campaign's effect with these common time trends. The synthetic control method addresses this by constructing a counterfactual from the untreated products' trajectories.

In [None]:
! cat config_simulation.yaml

In [None]:
# Run simulation and load results
job_info = simulate("config_simulation.yaml")
metrics = load_job_results(job_info)["metrics"]

print(f"Metrics records: {len(metrics)}")
print(f"Unique products: {metrics['product_identifier'].nunique()}")
print(f"Date range: {metrics['date'].min()} to {metrics['date'].max()}")

In [None]:
Code(inspect.getsource(create_synthetic_control_data), language="python")

In [None]:
# Create panel data with known treatment effect
TRUE_EFFECT = 50.0  # $50 daily revenue boost from content optimization
TREATMENT_DATE = "2024-11-15"

panel, treated_product = create_synthetic_control_data(
    metrics, treatment_date=TREATMENT_DATE, true_effect=TRUE_EFFECT, seed=42
)

# Save for the Impact Engine pipeline
panel.to_csv("panel_data.csv", index=False)

true_att = compute_ground_truth_att(panel, treated_product, TREATMENT_DATE)

print(f"Panel shape: {panel.shape}")
print(f"Products: {panel['product_identifier'].nunique()}")
print(f"Date range: {panel['date'].min().date()} to {panel['date'].max().date()}")
print(f"Treated product: {treated_product}")
print(f"Treatment date: {TREATMENT_DATE}")
print(f"True ATT: ${true_att:,.2f}")

## 2. What Does the Naive Comparison Tell Us?

As a baseline, we compute a simple **before-after difference** for the treated product: compare its average daily revenue in the post-treatment period to its average in the pre-treatment period. We use the `ExperimentAdapter` to run this as an OLS regression of revenue on a post-treatment indicator.

This approach ignores the donor pool entirely. Any change in the treated product's revenue—whether caused by the campaign or by common time trends—is attributed to the treatment.
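To see why the regression formulation and the difference in means are the same thing, consider the toy example below. It uses made-up revenue numbers and plain NumPy least squares. It does not call the `ExperimentAdapter`, whose internals are not shown in this lecture.

```python
import numpy as np

# Hypothetical daily revenue for one product: three days before, three days after.
revenue = np.array([100.0, 104.0, 98.0, 150.0, 158.0, 154.0])
post = np.array([0, 0, 0, 1, 1, 1])

# OLS of revenue on an intercept and the post-treatment indicator.
X = np.column_stack([np.ones_like(revenue), post])
intercept, slope = np.linalg.lstsq(X, revenue, rcond=None)[0]

# Simple before-after difference in means.
diff_in_means = revenue[post == 1].mean() - revenue[post == 0].mean()

print(f"OLS coefficient on post: {slope:.3f}")
print(f"Difference in means:     {diff_in_means:.3f}")
```

The intercept equals the pre-treatment mean and the coefficient on `post` equals the before-after difference, so the regression adds nothing beyond a convenient way to obtain standard errors.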

In [None]:
# Naive before-after comparison for the treated product
treated_data = panel[panel["product_identifier"] == treated_product].copy()
treated_data["post"] = (treated_data["date"] >= pd.Timestamp(TREATMENT_DATE)).astype(int)

# Save for the Impact Engine pipeline
treated_data.to_csv("treated_data.csv", index=False)

naive_job = evaluate_impact("config_experiment.yaml", storage_url="./output/experiment")
naive_result = load_results(naive_job)
naive_estimate = naive_result.impact_results["data"]["impact_estimates"]["params"]["post"]

print("Naive Before-After Comparison")
print("=" * 50)
print(f"Pre-treatment mean:  ${treated_data[treated_data['post'] == 0]['revenue'].mean():,.2f}")
print(f"Post-treatment mean: ${treated_data[treated_data['post'] == 1]['revenue'].mean():,.2f}")
print(f"Naive estimate:      ${naive_estimate:,.2f}")
print(f"\nTrue ATT:            ${true_att:,.2f}")
print(f"Bias:                ${naive_estimate - true_att:,.2f}")

### Why Does the Naive Estimate Fail?

The before-after comparison confounds the treatment effect with **common time trends**. All products in the data experience a shared upward trend in revenue over the study period. The naive estimator attributes this trend entirely to the campaign, biasing the estimate upward.

The synthetic control method solves this problem by constructing a counterfactual from the donor pool. Since the untreated products share the same time trends but were not affected by the campaign, their weighted combination provides a valid counterfactual that "subtracts out" the common trajectory.
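A noise-free toy panel makes the mechanism concrete. The numbers below are assumptions for illustration only (a shared trend of $2 per day, a true effect of $50, and two donors whose equal-weighted average reproduces the treated baseline) and are unrelated to the simulation used in this lecture.

```python
import numpy as np

T, T0 = 60, 40                            # 40 pre-treatment days, 20 post-treatment days
trend = 2.0 * np.arange(T)                # shared upward trend: +$2 per day
levels = np.array([300.0, 200.0, 400.0])  # baselines; column 0 is the treated product
Y = levels[None, :] + trend[:, None]      # revenue panel, shape (T, 3)

true_effect = 50.0
Y[T0:, 0] += true_effect                  # campaign lifts the treated product only

# Naive before-after comparison absorbs the trend.
naive = Y[T0:, 0].mean() - Y[:T0, 0].mean()

# Synthetic control with hypothetical equal weights: 0.5 * 200 + 0.5 * 400 = 300.
w = np.array([0.5, 0.5])
synthetic = Y[:, 1:] @ w
sc_estimate = (Y[T0:, 0] - synthetic[T0:]).mean()

print(f"Naive estimate:             ${naive:.2f}")
print(f"Synthetic control estimate: ${sc_estimate:.2f}")
print(f"True effect:                ${true_effect:.2f}")
```

The naive estimate overstates the effect by exactly the trend growth between the two windows, while the donor-based counterfactual carries the same trend and removes it.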

## 3. Synthetic Control with the Impact Engine

The `SyntheticControlAdapter` from the **Impact Engine** implements the synthetic control method from Part I. It uses `pysyncon` to find the optimal donor weights that minimize pre-treatment MSPE and estimates the average treatment effect on the treated (ATT) in the post-treatment period.

### Configuration-to-Theory Mapping

| YAML Config Field | Part I Concept |
|-------------------|----------------|
| `unit_column` | Unit identifier $j$ ($j = 1$ for the treated unit, $j = 2, \ldots, J+1$ for the donor pool) |
| `time_column` | Time index $t$ in the panel |
| `outcome_column` | Outcome $Y_{jt}$ for unit $j$ at time $t$ |
| `treated_unit` | Treated unit $j=1$ receiving the intervention |
| `treatment_time` | First treated period $T_0 + 1$, splitting the pre- and post-treatment periods |
| `optim_method` | Numerical optimizer used in the weight search: $\min_W \|X_1 - X_0 W\|_V$ |

In [None]:
# Generate the synthetic control config (treated_unit is determined at runtime)
sc_config = {
    "DATA": {
        "SOURCE": {"type": "file", "CONFIG": {"path": "panel_data.csv", "date_column": None}},
        "TRANSFORM": {"FUNCTION": "passthrough", "PARAMS": {}},
    },
    "MEASUREMENT": {
        "MODEL": "synthetic_control",
        "PARAMS": {
            "unit_column": "product_identifier",
            "time_column": "date",
            "outcome_column": "revenue",
            "treated_unit": treated_product,
            "treatment_time": TREATMENT_DATE,
            "optim_method": "Nelder-Mead",
            "optim_initial": "equal",
        },
    },
}

with open("config_synthetic_control.yaml", "w") as f:
    yaml.dump(sc_config, f, default_flow_style=False, sort_keys=False)

! cat config_synthetic_control.yaml

In [None]:
# Run the synthetic control pipeline
sc_job = evaluate_impact("config_synthetic_control.yaml", storage_url="./output/synthetic_control")
sc_result = load_results(sc_job)

sc_data = sc_result.impact_results["data"]
sc_att = sc_data["impact_estimates"]["att"]
sc_se = sc_data["impact_estimates"]["se"]
sc_mspe = sc_data["model_summary"]["mspe"]

print("Synthetic Control Results")
print("=" * 50)
print(f"Estimated ATT:  ${sc_att:,.2f}  (SE: ${sc_se:,.2f})")
print(f"True ATT:       ${true_att:,.2f}")
print(f"Bias:           ${sc_att - true_att:,.2f}")
print(f"\nPre-treatment MSPE: {sc_mspe:,.4f}")
print(f"Control units used: {sc_data['model_summary']['n_control_units']}")

## 4. Which Method Best Recovers the True Effect?

We now compare the naive before-after estimate against the synthetic control estimate. Because we generated the data with known potential outcomes, we can directly measure each method's bias.

In [None]:
plot_method_comparison(
    {"Naive\n(Before-After)": naive_estimate, "Synthetic\nControl": sc_att},
    true_att,
)

In [None]:
# Summary statistics table
summary = pd.DataFrame(
    {
        "Method": ["Naive (Before-After)", "Synthetic Control"],
        "Estimate ($)": [naive_estimate, sc_att],
        "Error ($)": [naive_estimate - true_att, sc_att - true_att],
        "% Error": [
            (naive_estimate - true_att) / true_att * 100,
            (sc_att - true_att) / true_att * 100,
        ],
    }
)
summary["Estimate ($)"] = summary["Estimate ($)"].map(lambda x: f"${x:,.2f}")
summary["Error ($)"] = summary["Error ($)"].map(lambda x: f"${x:,.2f}")
summary["% Error"] = summary["% Error"].map(lambda x: f"{x:+.1f}%")

print(f"True ATT: ${true_att:,.2f}")
print()
summary

## 5. Diagnostics

The credibility of a synthetic control analysis rests on three diagnostics, shown in the order of the plots that follow: (1) which donor units contribute to the synthetic control, (2) how well the synthetic control tracks the treated unit pre-treatment, and (3) the gap between treated and synthetic series over time.

In [None]:
plot_weights(sc_data)

In [None]:
synthetic_ts = plot_treated_vs_synthetic(panel, treated_product, sc_data, TREATMENT_DATE)

In [None]:
plot_gap(panel, treated_product, synthetic_ts, TREATMENT_DATE)

## Additional Resources

- **Abadie, A. & Gardeazabal, J. (2003)**. [The economic costs of conflict: A case study of the Basque Country](https://doi.org/10.1257/000282803321455188). *American Economic Review*, 93(1), 113-132.

- **Abadie, A., Diamond, A., & Hainmueller, J. (2010)**. [Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program](https://doi.org/10.1198/jasa.2009.ap08746). *Journal of the American Statistical Association*, 105(490), 493-505.

- **Abadie, A., Diamond, A., & Hainmueller, J. (2015)**. [Comparative politics and the synthetic control method](https://doi.org/10.1111/ajps.12116). *American Journal of Political Science*, 59(2), 495-510.

- **Abadie, A. (2021)**. [Using synthetic controls: Feasibility, data requirements, and methodological aspects](https://doi.org/10.1257/jel.20191450). *Journal of Economic Literature*, 59(2), 391-425.

- **Card, D. (1990)**. [The impact of the Mariel Boatlift on the Miami labor market](https://doi.org/10.2307/2523702). *Industrial and Labor Relations Review*, 43(2), 245-257.

- **Cunningham, S. (2021)**. [*Causal Inference: The Mixtape*](https://mixtape.scunning.com/). Yale University Press. Chapter 10: Synthetic Control.