# Regression Discontinuity with Simulated Data (`rd_sim.ipynb`)

This notebook mirrors the RD simulation material from the slides, providing **both R and Python** implementations:

- Simulate an RD-style dataset in R
- Visualize the discontinuity (scatter and binned averages) in R and Python
- Compare standard OLS to local linear RD regression in R and Python
- Estimate RD effects using RD packages (`rdrobust` in R and Python)

The notebook assumes a Python kernel with access to R via `rpy2` (as in Google Colab).

In [None]:
# If needed, install required Python packages.
# You can comment this out after the first run if the packages are already installed.

%pip install rdrobust rddensity statsmodels pandas numpy matplotlib rpy2

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# RD packages (Python)
from rdrobust import rdplot, rdrobust

# Regression
import statsmodels.formula.api as smf

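# R interface
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()

# The %%R cell magic used below is only available once this extension is loaded
%load_ext rpy2.ipython

# Convenience alias
R = ro.r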

pd.set_option("display.max_columns", 10)
pd.set_option("display.width", 1000)

## 1. Simulate RD Data in R

We simulate a running variable \(X\) on \([0,2]\), a treatment indicator \(W = 1[X>1]\),
and an outcome:
\[
Y = 0.5 + 2X + 4W - 2.5(W \times X) + \varepsilon, \quad \varepsilon \sim N(0, 0.5^2).
\]

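At the cutoff \(X = 1\), treatment shifts the conditional mean of \(Y\) by \(4 - 2.5 \times 1 = 1.5\), so the true treatment effect at the cutoff is 1.5.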
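
As a quick check on the simulation equation, the sketch below evaluates the treated-minus-untreated difference in the conditional mean of \(Y\) at a few values of \(X\). The helper name `treatment_effect` is illustrative and not part of the notebook. It shows that the effect implied by the equation varies with \(X\), and that 1.5 is its value exactly at the cutoff.

```python
def treatment_effect(x):
    """Treated-minus-untreated difference in E[Y | X = x] implied by the simulation equation."""
    mean_treated = 0.5 + 2.0 * x + 4.0 - 2.5 * x  # W = 1
    mean_untreated = 0.5 + 2.0 * x                # W = 0
    return mean_treated - mean_untreated

# Evaluate at the cutoff and at two points to its right
effects = {x: treatment_effect(x) for x in (1.0, 1.5, 2.0)}
for x, tau in effects.items():
    print(f"x = {x:.1f}: effect = {tau:+.2f}")
```

An RD design identifies only the value at \(X = 1\); the other values are counterfactual contrasts that the simulated data cannot reveal directly, since every unit with \(X > 1\) is treated.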

### R code

In [None]:
%%R
# Install and load required R packages
if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse, rdrobust)

set.seed(12345)
n <- 1000

rd.dat <- tibble(
  X = runif(n, 0, 2),
  W = as.integer(X > 1),
  Y = 0.5 + 2 * X + 4 * W - 2.5 * W * X + rnorm(n, 0, 0.5)
)

head(rd.dat)

### R: Scatter Plot with Cutoff

In [None]:
%%R
plot1 <- rd.dat %>%
  ggplot(aes(x = X, y = Y)) +
  geom_point() +
  theme_bw() +
  geom_vline(aes(xintercept = 1), linetype = "dashed") +
  scale_x_continuous(
    breaks = c(0.5, 1.5),
    labels = c("Untreated", "Treated")
  ) +
  xlab("Running Variable") +
  ylab("Outcome")

plot1

### Bridge: Use the Same Data in Python

We now pull the R tibble `rd.dat` into Python as a pandas DataFrame using `rpy2`.

In [None]:
# Convert R tibble rd.dat -> pandas DataFrame rd_dat
rd_dat = pandas2ri.rpy2py(R["rd.dat"])
rd_dat.head()
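
If R or `rpy2` is unavailable, the same data-generating process can be sketched directly in NumPy. The name `rd_dat_np` is hypothetical, and because NumPy's random generator differs from R's, the draws will not match `rd.dat` even when the seed value is the same.

```python
import numpy as np
import pandas as pd

# NumPy's generator; reusing the value 12345 does NOT reproduce R's set.seed(12345) draws
rng = np.random.default_rng(12345)
n = 1000

X = rng.uniform(0.0, 2.0, size=n)   # running variable on [0, 2]
W = (X > 1.0).astype(int)           # treatment indicator 1[X > 1]
Y = 0.5 + 2.0 * X + 4.0 * W - 2.5 * W * X + rng.normal(0.0, 0.5, size=n)

rd_dat_np = pd.DataFrame({"X": X, "W": W, "Y": Y})
print(rd_dat_np.head())
```

Any of the Python cells below should run on `rd_dat_np` in place of `rd_dat`, with numerically different but qualitatively similar results.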

### Python: Scatter Plot with Cutoff

In [None]:
fig, ax = plt.subplots()

ax.scatter(rd_dat["X"], rd_dat["Y"], alpha=0.5)
ax.axvline(x=1.0, linestyle="--")

ax.set_xticks([0.5, 1.5])
ax.set_xticklabels(["Untreated", "Treated"])

ax.set_xlabel("Running Variable")
ax.set_ylabel("Outcome")
ax.set_title("RD Scatter Plot (Python)")

plt.tight_layout()
plt.show()

## 2. Binned Averages / Graphical Evidence

We next construct binned averages of \(Y\) within bins of \(X\) using `rdplot`.
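
To make the idea of binned averages concrete, here is a minimal NumPy sketch that computes them by hand. It assumes 10 equal-width bins on each side of the cutoff, which is an arbitrary choice for illustration and not the bin selection that `rdplot` performs, and it simulates its own data so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, size=1000)
w = (x > 1.0).astype(int)
y = 0.5 + 2.0 * x + 4.0 * w - 2.5 * w * x + rng.normal(0.0, 0.5, size=1000)

# 10 equal-width bins on each side of the cutoff at 1 (21 edges, 20 bins)
edges = np.concatenate([np.linspace(0.0, 1.0, 11), np.linspace(1.0, 2.0, 11)[1:]])
# right=True assigns x == 1 to the last untreated bin, matching W = 1[X > 1]
bin_id = np.digitize(x, edges[1:-1], right=True)

n_bins = len(edges) - 1
bin_mean_x = np.array([x[bin_id == b].mean() for b in range(n_bins)])
bin_mean_y = np.array([y[bin_id == b].mean() for b in range(n_bins)])

# Difference between the first treated bin and the last untreated bin
gap = bin_mean_y[10] - bin_mean_y[9]
print(f"Gap between the two bins adjacent to the cutoff: {gap:.2f}")
```

Because no bin straddles the cutoff, the gap between the two adjacent bin means gives a rough visual impression of the discontinuity, though it also absorbs the slope of the outcome within each bin.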

### R: Binned Averages with `rdplot`

In [None]:
%%R
rd.result <- rdplot(
  y = rd.dat$Y,
  x = rd.dat$X,
  c = 1,
  title = "RD Plot with Binned Average",
  x.label = "Running Variable",
  y.label = "Outcome",
  hide = TRUE  # suppress rdplot's own graph
)

bin.avg <- as_tibble(rd.result$vars_bins)

plot.bin <- bin.avg %>%
  ggplot(aes(x = rdplot_mean_x, y = rdplot_mean_y)) +
  geom_point() +
  theme_bw() +
  geom_vline(aes(xintercept = 1), linetype = "dashed") +
  scale_x_continuous(
    breaks = c(0.5, 1.5),
    labels = c("Untreated", "Treated")
  ) +
  xlab("Running Variable") +
  ylab("Outcome")

plot.bin

### Python: Binned Averages with `rdplot`

In [None]:
# Use Python rdplot to compute binned averages (hide built-in plot)
rd_result_py = rdplot(
    y=rd_dat["Y"].to_numpy(),
    x=rd_dat["X"].to_numpy(),
    c=1.0,
    title="RD Plot with Binned Average",
    x_label="Running Variable",
    y_label="Outcome",
    hide=True  # do not show rdplot's own figure
)

bin_avg_py = rd_result_py.vars_bins
bin_avg_py.head()

In [None]:
# Custom plot of binned averages in Python
fig, ax = plt.subplots()

ax.scatter(bin_avg_py["rdplot_mean_x"], bin_avg_py["rdplot_mean_y"])
ax.axvline(x=1.0, linestyle="--")

ax.set_xticks([0.5, 1.5])
ax.set_xticklabels(["Untreated", "Treated"])

ax.set_xlabel("Running Variable")
ax.set_ylabel("Outcome")
ax.set_title("RD Plot with Binned Average (Python)")

plt.tight_layout()
plt.show()

## 3. Linear Regression vs Local Linear RD

We compare:

1. A standard OLS regression of \(Y\) on \(X\) and \(W\) using the full sample.
2. A local linear regression restricted to observations with \(0.8 < X < 1.2\)
   (a window of 0.2 on each side of the cutoff), using \(X - 1\) as the running
   variable so that the intercept is the fitted untreated mean at the cutoff.
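
Both regressions in this section fit a single slope on the two sides of the cutoff. A commonly used variant, sketched below as an assumption rather than as the notebook's specification, interacts the centered running variable with \(W\) so that the slope may differ on each side. The sketch uses plain NumPy least squares on its own simulated data; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 2.0, size=1000)
w = (x > 1.0).astype(float)
y = 0.5 + 2.0 * x + 4.0 * w - 2.5 * w * x + rng.normal(0.0, 0.5, size=1000)

# Keep the same window as the local regression: 0.8 < X < 1.2
in_window = (x > 0.8) & (x < 1.2)
x_dev = x[in_window] - 1.0
w_loc = w[in_window]

# Columns: intercept, x_dev, W, W * x_dev (the last lets the slope change at the cutoff)
design = np.column_stack([np.ones_like(x_dev), x_dev, w_loc, w_loc * x_dev])
beta, *_ = np.linalg.lstsq(design, y[in_window], rcond=None)

jump_at_cutoff = beta[2]  # coefficient on W: difference in fitted means at x_dev = 0
print(f"Estimated jump at the cutoff: {jump_at_cutoff:.2f}")
```

With the interaction included, centering matters: the coefficient on \(W\) is the jump at \(X = 1\) only because `x_dev` equals zero there.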

### R: OLS and Local Linear RD

In [None]:
%%R
# OLS on full sample
ols <- lm(Y ~ X + W, data = rd.dat)

# Local linear regression around cutoff X = 1
rd.dat3 <- rd.dat %>%
  mutate(x_dev = X - 1) %>%
  filter(X > 0.8 & X < 1.2)

rd <- lm(Y ~ x_dev + W, data = rd.dat3)

summary(ols)


In [None]:
%%R
summary(rd)

### Python: OLS and Local Linear RD

In [None]:
# 3.1 OLS on full sample
ols_py = smf.ols("Y ~ X + W", data=rd_dat).fit()
print(ols_py.summary())

In [None]:
# 3.2 Local linear regression around the cutoff X = 1
rd_dat3_py = (
    rd_dat
    .assign(x_dev=lambda df: df["X"] - 1.0)
    .query("X > 0.8 and X < 1.2")
)

rd_local_py = smf.ols("Y ~ x_dev + W", data=rd_dat3_py).fit()
print(rd_local_py.summary())

In [None]:
true_effect = 1.5
coef_ols_py = ols_py.params["W"]
coef_local_py = rd_local_py.params["W"]

print(f"True effect at cutoff: {true_effect:.2f}")
print(f"Python OLS coefficient on W (full sample): {coef_ols_py:.2f}")
print(f"Python local linear RD coefficient on W (0.8 < X < 1.2): {coef_local_py:.2f}")

## 4. RD Estimation with RD Packages

We now estimate the RD effect using `rdrobust` in both R and Python.

### R: `rdrobust`

In [None]:
%%R
rd.y <- rd.dat$Y
rd.x <- rd.dat$X

rd.est <- rdrobust(y = rd.y, x = rd.x, c = 1)
summary(rd.est)

### Python: `rdrobust`

In [None]:
rd_y = rd_dat["Y"].to_numpy()
rd_x = rd_dat["X"].to_numpy()

rd_est_py = rdrobust(y=rd_y, x=rd_x, c=1.0)
print(rd_est_py)
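
For intuition about the point estimate, recall that by default `rdrobust` fits a local linear regression with triangular kernel weights inside a data-driven bandwidth. The sketch below reproduces only the weighting idea: the bandwidth `h = 0.3` is hand-picked (an assumption, not the bandwidth `rdrobust` would select), the helper `local_linear_jump` is hypothetical, and bias correction and robust standard errors are left out entirely.

```python
import numpy as np

def local_linear_jump(y, x, c, h):
    """Triangular-kernel local linear estimate of the jump in E[Y | X] at c."""
    x_dev = x - c
    weights = np.clip(1.0 - np.abs(x_dev) / h, 0.0, None)  # triangular kernel
    keep = weights > 0
    treated = (x_dev[keep] > 0).astype(float)
    design = np.column_stack(
        [np.ones(keep.sum()), x_dev[keep], treated, treated * x_dev[keep]]
    )
    # Weighted least squares: rescale rows by the square root of the weights
    root_w = np.sqrt(weights[keep])
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y[keep] * root_w, rcond=None)
    return beta[2]  # coefficient on the treatment indicator

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 2.0, size=1000)
w = (x > 1.0).astype(float)
y = 0.5 + 2.0 * x + 4.0 * w - 2.5 * w * x + rng.normal(0.0, 0.5, size=1000)

jump_hat = local_linear_jump(y, x, c=1.0, h=0.3)
print(f"Kernel-weighted local linear estimate (h = 0.3): {jump_hat:.2f}")
```

Observations closer to the cutoff receive more weight, and those farther than `h` away receive none, which is the main difference from the unweighted window regression in Section 3.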

## 5. Optional: Manipulation / Density Tests

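In applied work, you should check for manipulation (sorting) of the running variable with a density test at the cutoff: a jump in the density of \(X\) at the cutoff suggests that units could influence which side they fall on. In this simulation \(X\) is uniform, so there is no manipulation by construction.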

### R template (using `rddensity`)

```r
# install.packages("rddensity")
library(rddensity)
dens_res <- rddensity(X = rd.dat$X, c = 1)
summary(dens_res)
```

### Python template (using `rddensity`)

```python
from rddensity import rddensity

# Pass the running variable positionally (the first argument of rddensity)
dens_res_py = rddensity(rd_dat["X"].to_numpy(), c=1.0)
print(dens_res_py)
```
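
To see what a density test looks for, the following sketch compares the number of observations just left and just right of the cutoff. This is a crude count comparison for intuition only, not the local polynomial density estimator that `rddensity` implements; the window half-width of 0.2 and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 2.0, size=1000)  # uniform running variable: no manipulation

cutoff, half_width = 1.0, 0.2  # window size is an arbitrary illustrative choice
n_left = int(np.sum((x > cutoff - half_width) & (x <= cutoff)))
n_right = int(np.sum((x > cutoff) & (x < cutoff + half_width)))

# If each unit in the window is equally likely to fall on either side,
# n_right ~ Binomial(n_left + n_right, 0.5) and this statistic is roughly N(0, 1)
z_stat = (n_right - n_left) / np.sqrt(n_left + n_right)
print(f"left = {n_left}, right = {n_right}, z = {z_stat:.2f}")
```

A large absolute value of the statistic would hint at bunching on one side of the cutoff, which is the pattern a formal density test is designed to detect.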

---

You now have both **R and Python** implementations for:

- Simulating RD data
- Visualizing the discontinuity (scatter and binned averages)
- Comparing OLS and local linear RD regressions
- Estimating RD effects with `rdrobust`

You can modify the simulation, bandwidth/window choices, and model specification
in either language to explore robustness and sensitivity of RD estimates.
