# Synthetic Difference-in-Differences (SDID) - Example Notebook

This notebook demonstrates how to use the `sdid` package for causal inference.

## Overview

SDID combines:
- **Synthetic Control**: Reweights control units so that their pre-treatment trend matches that of the treated units
- **Difference-in-Differences**: Compares the pre-to-post change in treated units with the same change in control units

Reference: Arkhangelsky et al. (2021), American Economic Review
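
To make the DiD half of the method concrete, here is a minimal sketch of a plain 2x2 difference-in-differences on a toy panel. The helper name `did_2x2` and the toy numbers are illustrative assumptions and are not part of the `sdid` package.

```python
import pandas as pd

def did_2x2(df, outcome="outcome", treat="treated", post="post"):
    """Plain 2x2 DiD: (treated post - pre) minus (control post - pre)."""
    t = df[treat].astype(bool)
    p = df[post].astype(bool)
    y = df[outcome]
    treated_change = y[t & p].mean() - y[t & ~p].mean()
    control_change = y[~t & p].mean() - y[~t & ~p].mean()
    return treated_change - control_change

# Toy panel: the control unit rises by 2, the treated unit rises by 7,
# so the DiD estimate is the extra 5 gained by the treated unit.
toy = pd.DataFrame({
    "unit":    ["A", "A", "B", "B"],
    "treated": [False, False, True, True],
    "post":    [False, True, False, True],
    "outcome": [10.0, 12.0, 20.0, 27.0],
})
estimate = did_2x2(toy)
print(estimate)
```

Plain DiD gives every control unit the same weight. SDID replaces that uniform average with estimated unit and time weights.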


## 1. Installation & Import


In [None]:
# Install if needed
# !pip install sdid

import warnings
# Suppress library warnings to keep the demo output readable; remove when debugging
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sdid import SyntheticDiffInDiff

# Set random seed for reproducibility
np.random.seed(42)

print("Packages loaded successfully!")


## 2. Generate Synthetic Panel Data

We'll create a dataset simulating a policy intervention:
- **20 control states** + **2 treated states**
- **Years 2015-2022** (intervention starts 2020)
- **True treatment effect = 15**


In [None]:
# Configuration
N_CONTROL = 20      # Number of control units
N_TREATED = 2       # Number of treated units
YEARS = list(range(2015, 2023))  # 2015-2022
TREATMENT_YEAR = 2020
TRUE_EFFECT = 15.0  # The ground truth we want to recover

# Generate panel data
data = []
control_units = [f"State_{i:02d}" for i in range(N_CONTROL)]
treated_units = [f"Treated_{i:02d}" for i in range(N_TREATED)]

for unit in control_units + treated_units:
    is_treated = unit.startswith("Treated")
    base_level = 100 + np.random.randn() * 15  # Unit-specific baseline
    trend = 2 + np.random.randn() * 0.5        # Unit-specific trend
    
    for year in YEARS:
        is_post = year >= TREATMENT_YEAR
        effect = TRUE_EFFECT if (is_treated and is_post) else 0
        
        outcome = (
            base_level 
            + trend * (year - YEARS[0])
            + effect
            + np.random.randn() * 3  # Noise
        )
        
        data.append({
            'unit': unit,
            'year': year,
            'outcome': outcome,
            'treated': is_treated,
            'post': is_post
        })

df = pd.DataFrame(data)

print(f"Dataset: {df.shape[0]} observations")
print(f"   Control units: {N_CONTROL}")
print(f"   Treated units: {N_TREATED}")
print(f"   Time periods: {len(YEARS)} ({min(YEARS)}-{max(YEARS)})")
print(f"   Treatment starts: {TREATMENT_YEAR}")
print(f"   True effect: {TRUE_EFFECT}")
print()
df.head(10)
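
The estimator in Arkhangelsky et al. is defined on a balanced panel, meaning one outcome per unit and period. Whether the `sdid` package checks this itself is not shown here, so the sketch below is one way to verify it by hand. The helper `to_outcome_matrix` is a hypothetical name, and the toy frame stands in for `df`.

```python
import pandas as pd

def to_outcome_matrix(df, unit="unit", time="year", outcome="outcome"):
    """Pivot a long panel into a units-by-periods matrix and verify it is balanced."""
    # pivot raises ValueError if a (unit, period) pair appears more than once
    wide = df.pivot(index=unit, columns=time, values=outcome)
    if wide.isna().any().any():
        raise ValueError("Unbalanced panel: some unit-period cells are missing")
    return wide

# Toy long-format panel with the same column names as the notebook's data
toy = pd.DataFrame({
    "unit":    ["A", "A", "B", "B"],
    "year":    [2019, 2020, 2019, 2020],
    "outcome": [1.0, 2.0, 3.0, 4.0],
})
wide = to_outcome_matrix(toy)
print(wide.shape)
```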


## 3. Visualize Raw Data


In [None]:
fig, ax = plt.subplots(figsize=(12, 6))

# Plot individual control units (light gray)
for unit in df[~df['treated']]['unit'].unique():
    unit_data = df[df['unit'] == unit]
    ax.plot(unit_data['year'], unit_data['outcome'], 
            color='lightgray', alpha=0.5, linewidth=1)

# Plot control average (blue)
control_avg = df[~df['treated']].groupby('year')['outcome'].mean()
ax.plot(control_avg.index, control_avg.values, 
        color='#2563eb', linewidth=2.5, label='Control Average', marker='o')

# Plot treated average (red)
treated_avg = df[df['treated']].groupby('year')['outcome'].mean()
ax.plot(treated_avg.index, treated_avg.values, 
        color='#dc2626', linewidth=2.5, label='Treated Average', marker='s')

# Intervention line
ax.axvline(x=TREATMENT_YEAR, color='black', linestyle='--', 
           linewidth=2, alpha=0.7, label=f'Intervention ({TREATMENT_YEAR})')

ax.set_xlabel('Year', fontsize=12)
ax.set_ylabel('Outcome', fontsize=12)
ax.set_title('Panel Data: Control vs Treated Units', fontsize=14, fontweight='bold')
ax.legend(loc='upper left', fontsize=10)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()


## 4. Run SDID Analysis


In [None]:
# Initialize SDID estimator
sdid = SyntheticDiffInDiff(
    data=df,
    outcome_col='outcome',
    times_col='year',
    units_col='unit',
    treat_col='treated',
    post_col='post'
)

# Fit the model
effect = sdid.fit(verbose=False)

print("=" * 50)
print("SDID ESTIMATION RESULTS")
print("=" * 50)
print(f"Estimated Treatment Effect: {effect:.4f}")
print(f"True Treatment Effect:      {TRUE_EFFECT:.4f}")
print(f"Estimation Error:           {abs(effect - TRUE_EFFECT):.4f}")
print("=" * 50)


## 6. Examine Weights


In [None]:
# Get weights summary
weights = sdid.get_weights_summary()

print("Top 10 Unit Weights (Control Units):")
print(weights['unit_weights'].head(10))
print(f"\nTotal unit weights: {weights['unit_weights']['weight'].sum():.4f}")

print("\n" + "-" * 40)
print("\nTime Weights:")
print(weights['time_weights'])


In [None]:
# Visualize unit weights
fig, ax = plt.subplots(figsize=(10, 6))

unit_w = weights['unit_weights'].sort_values('weight', ascending=True)
colors = ['#2563eb' if w > 0.01 else '#93c5fd' for w in unit_w['weight']]

ax.barh(range(len(unit_w)), unit_w['weight'], color=colors)
ax.set_yticks(range(len(unit_w)))
ax.set_yticklabels(unit_w.index, fontsize=9)
ax.set_xlabel('Weight', fontsize=12)
ax.set_title('SDID Unit Weights (Control Units)', fontsize=14, fontweight='bold')
ax.axvline(x=0, color='gray', linewidth=0.5)

plt.tight_layout()
plt.show()
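
For intuition about where unit weights like these can come from, the sketch below fits non-negative weights that sum to one so that a weighted average of control units tracks the treated pre-treatment path. It is an illustration under stated assumptions, not the `sdid` package's implementation: it uses a basic Frank-Wolfe loop, handles the intercept by demeaning each series over time, and omits the ridge penalty that the paper adds. The function names are hypothetical.

```python
import numpy as np

def simplex_least_squares(A, b, n_iter=1000):
    """Minimise ||A @ w - b||^2 subject to w >= 0 and sum(w) == 1 (Frank-Wolfe)."""
    n = A.shape[1]
    w = np.full(n, 1.0 / n)              # start from uniform weights
    for _ in range(n_iter):
        fitted = A @ w
        resid = fitted - b
        grad = 2.0 * A.T @ resid
        i = int(np.argmin(grad))         # simplex vertex with the steepest descent
        step_dir = A[:, i] - fitted      # change in the fit when moving towards vertex i
        denom = step_dir @ step_dir
        if denom == 0.0:
            break
        # Exact line search for the quadratic objective, clipped to stay in the simplex
        gamma = float(np.clip(-(resid @ step_dir) / denom, 0.0, 1.0))
        w *= 1.0 - gamma
        w[i] += gamma
    return w

def unit_weights(Y_control_pre, y_treated_pre):
    """Y_control_pre: (N_co, T_pre) outcomes; y_treated_pre: (T_pre,) mean treated path."""
    # Demeaning over time acts as an intercept: weights match trends, not levels
    A = (Y_control_pre - Y_control_pre.mean(axis=1, keepdims=True)).T
    b = y_treated_pre - y_treated_pre.mean()
    return simplex_least_squares(A, b)

def fit_loss(w, Y_control_pre, y_treated_pre):
    A = (Y_control_pre - Y_control_pre.mean(axis=1, keepdims=True)).T
    b = y_treated_pre - y_treated_pre.mean()
    return float(np.sum((A @ w - b) ** 2))

# Toy example: the treated path is an average of two controls plus a level shift
rng = np.random.default_rng(0)
Y_co = rng.normal(size=(5, 8)).cumsum(axis=1)
y_tr = 0.5 * Y_co[0] + 0.5 * Y_co[1] + 10.0
w = unit_weights(Y_co, y_tr)
print(np.round(w, 3))
```

Because the intercept absorbs the level shift of 10, the weights only need to reproduce the shape of the treated path.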


## 7. Event Study Analysis


In [None]:
# Create event study plot with confidence intervals
# Note: This may take a minute due to bootstrap SE estimation

fig = sdid.plot_event_study(
    times=[2020, 2021, 2022],
    n_bootstrap=50,  # Use 200+ for production
    confidence_level=0.95,
    n_jobs=2,
    figsize=(10, 6)
)

plt.show()


## 8. Summary

### SDID Advantages

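1. **Robustness**: Relies less on parallel trends than plain DiD, and less on matching pre-treatment levels than synthetic control
2. **Flexibility**: Handles multiple treated units
3. **Two-Way Fixed Effects**: Unit fixed effects absorb level differences between units; time fixed effects absorb shocks common to all units
4. **Interpretable Weights**: The unit weights show which control units make up the comparison group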

### Mathematical Foundation

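SDID estimates the treatment effect $\tau$ with a weighted two-way fixed effects regression:

$$Y_{it} = \alpha_i + \gamma_t + \tau \cdot D_{it} + \varepsilon_{it}$$

where $\alpha_i$ and $\gamma_t$ are unit and time fixed effects, $D_{it} = 1$ for treated units in post-treatment periods (and 0 otherwise), and each observation is weighted by $w_{it} = \hat{\omega}_i \cdot \hat{\lambda}_t$, the product of the estimated unit weight and time weight.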
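
The regression above can be sketched in NumPy, taking the weights as given. The sketch assumes a block design (all treated units are treated in the same post periods) and follows the paper's convention that treated units and post periods receive uniform weights. Under those assumptions the weighted regression coefficient equals a weighted double difference, which the example checks numerically. The function names are hypothetical and this is not the `sdid` package's code.

```python
import numpy as np

def full_weights(treated, post, omega, lam):
    """Control units get omega, treated units 1/N_tr; pre periods get lam, post 1/T_post."""
    w_unit = np.where(treated, 1.0 / treated.sum(), 0.0)
    w_unit[~treated] = omega
    w_time = np.where(post, 1.0 / post.sum(), 0.0)
    w_time[~post] = lam
    return np.outer(w_unit, w_time)

def weighted_twfe_tau(Y, treated, post, omega, lam):
    """tau from weighted least squares with unit dummies, time dummies and D_it."""
    N, T = Y.shape
    rows = np.arange(N * T)
    unit_idx = np.repeat(np.arange(N), T)
    time_idx = np.tile(np.arange(T), N)
    X = np.zeros((N * T, N + (T - 1) + 1))
    X[rows, unit_idx] = 1.0                       # unit fixed effects
    later = time_idx > 0                          # first period is the reference
    X[rows[later], N + time_idx[later] - 1] = 1.0 # time fixed effects
    X[:, -1] = np.outer(treated, post).astype(float).ravel()  # D_it
    sw = np.sqrt(full_weights(treated, post, omega, lam).ravel())
    coef, *_ = np.linalg.lstsq(X * sw[:, None], Y.ravel() * sw, rcond=None)
    return coef[-1]

def weighted_double_diff(Y, treated, post, omega, lam):
    """(treated post - weighted pre) minus the same difference for weighted controls."""
    tr_post = Y[treated][:, post].mean()
    tr_pre = Y[treated][:, ~post].mean(axis=0) @ lam
    co_post = omega @ Y[~treated][:, post].mean(axis=1)
    co_pre = omega @ Y[~treated][:, ~post] @ lam
    return (tr_post - tr_pre) - (co_post - co_pre)

# Toy panel: 4 controls, 2 treated, 3 pre and 2 post periods, true effect 15
rng = np.random.default_rng(0)
treated = np.array([False] * 4 + [True] * 2)
post = np.array([False] * 3 + [True] * 2)
Y = (rng.normal(size=(6, 1)) + rng.normal(size=(1, 5))
     + 15.0 * np.outer(treated, post) + rng.normal(size=(6, 5)))
omega = rng.dirichlet(np.ones(4))   # arbitrary positive weights for the demo
lam = rng.dirichlet(np.ones(3))
tau_twfe = weighted_twfe_tau(Y, treated, post, omega, lam)
tau_dd = weighted_double_diff(Y, treated, post, omega, lam)
print(tau_twfe, tau_dd)
```

The two numbers agree up to floating-point error, which is why SDID can be read as DiD computed on a reweighted comparison group.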

### Reference

```bibtex
@article{arkhangelsky2021synthetic,
  title={Synthetic difference-in-differences},
  author={Arkhangelsky, Dmitry and Athey, Susan and Hirshberg, David A and Imbens, Guido W and Wager, Stefan},
  journal={American Economic Review},
  volume={111},
  number={12},
  pages={4088--4118},
  year={2021}
}
```

---

**Install**: `pip install sdid`

**GitHub**: [github.com/AluminumShark/Synthetic_Difference_in_Difference](https://github.com/AluminumShark/Synthetic_Difference_in_Difference)


In [None]:
# Estimate standard error using placebo bootstrap
# Note: Use n_bootstrap=400 for production, fewer for demo
se = sdid.estimate_se(n_bootstrap=100, seed=42, n_jobs=2)

print(sdid.summary())
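
The cell above relies on the package's own `estimate_se`. As a rough illustration of the placebo idea, the sketch below repeatedly assigns a fake treatment to control units only, re-estimates the effect, and takes the spread of those placebo estimates as the standard error. It assumes plain DiD as a stand-in estimator to stay self-contained; `placebo_se` and `did_estimate` are hypothetical names and do not reflect the package's internals.

```python
import numpy as np

def did_estimate(Y, treated, post):
    """Plain DiD on an (N x T) matrix with boolean unit and period masks."""
    treated_change = Y[treated][:, post].mean() - Y[treated][:, ~post].mean()
    control_change = Y[~treated][:, post].mean() - Y[~treated][:, ~post].mean()
    return treated_change - control_change

def placebo_se(Y_control, n_treated, post, n_reps=200, seed=0):
    """Standard deviation of effects estimated under randomly assigned fake treatment."""
    rng = np.random.default_rng(seed)
    n_units = Y_control.shape[0]
    estimates = np.empty(n_reps)
    for rep in range(n_reps):
        fake_treated = np.zeros(n_units, dtype=bool)
        fake_treated[rng.choice(n_units, size=n_treated, replace=False)] = True
        estimates[rep] = did_estimate(Y_control, fake_treated, post)
    return estimates.std()

# Toy control panel: 20 units, 8 periods, last 3 periods are "post"
rng = np.random.default_rng(1)
post = np.array([False] * 5 + [True] * 3)
unit_fe = rng.normal(100, 15, size=(20, 1))
time_fe = 2.0 * np.arange(8)
Y_clean = unit_fe + time_fe                          # pure two-way structure, no noise
Y_noisy = Y_clean + rng.normal(0, 3, size=(20, 8))   # same structure plus noise
se_clean = placebo_se(Y_clean, n_treated=2, post=post)
se_noisy = placebo_se(Y_noisy, n_treated=2, post=post)
print(se_clean, se_noisy)
```

With no noise every placebo estimate is zero, so the standard error collapses to zero. Adding noise makes it positive.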
