# CausalPy + maketables Integration Demo

This notebook demonstrates how CausalPy experiments integrate with the `maketables` library for generating publication-ready coefficient tables.

CausalPy implements the zero-coupling plug-in format (`__maketables_coef_table__`, etc.), which maketables detects automatically, so no registration is needed.
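To make "zero-coupling" concrete, here is a minimal sketch of duck-typed detection. It is not maketables' actual detection code: the attribute names come from this notebook, while `supports_plugin_format`, `ToyResult`, and the assumption that all four attributes are required are hypothetical.

```python
# Attribute names taken from this notebook; which of them maketables
# actually requires is an assumption of this sketch.
PLUGIN_ATTRS = (
    "__maketables_coef_table__",
    "__maketables_stat__",
    "__maketables_depvar__",
    "__maketables_vcov_info__",
)


def supports_plugin_format(obj):
    """Return True if obj exposes every plug-in attribute (duck typing)."""
    return all(hasattr(obj, name) for name in PLUGIN_ATTRS)


class ToyResult:
    """Hypothetical result object; note that it never imports maketables."""

    __maketables_coef_table__ = None  # a real object would hold a DataFrame
    __maketables_depvar__ = "y"
    __maketables_vcov_info__ = {"vcov_type": "toy", "clustervar": None}

    def __maketables_stat__(self, key):
        return {"N": 40}.get(key)


print(supports_plugin_format(ToyResult()))  # True
print(supports_plugin_format(object()))     # False
```

The point of the design is that the result class and the table library only share attribute names, so neither has to import the other.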

In [None]:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "causalpy",
#     "scikit-learn",
#     "maketables @ git+https://github.com/py-econometrics/maketables.git@bbc5c80",
# ]
# ///

import warnings
warnings.filterwarnings('ignore')

import causalpy as cp
from sklearn.linear_model import LinearRegression
import pandas as pd

## 1. Difference-in-Differences with OLS

Let's start with a simple DiD example using OLS.

In [2]:
# Load the DiD dataset
df = cp.load_data("did")
df.head()

,group,t,unit,post_treatment,y
0,0,0.0,0,False,0.897122
1,0,1.0,0,True,1.961214
2,1,0.0,1,False,1.233525
3,1,1.0,1,True,2.752794
4,0,0.0,2,False,1.149207


In [3]:
# Fit DiD with OLS
did_ols = cp.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group*post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=LinearRegression(),
)

In [4]:
# Access the maketables coefficient table directly
did_ols.__maketables_coef_table__

Coefficient,b,se,t,p,ci_lower,ci_upper
Intercept,1.07561,,,,,
post_treatment[T.True],0.9861,,,,,
group,0.162468,,,,,
group:post_treatment[T.True],0.504334,,,,,


## 2. Difference-in-Differences with PyMC (Bayesian)

Now let's run the same analysis with a Bayesian model.

In [5]:
# Fit DiD with PyMC (Bayesian)
sample_kwargs = {"tune": 500, "draws": 500, "chains": 2, "cores": 2, "progressbar": True}

did_pymc = cp.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group*post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=cp.pymc_models.LinearRegression(sample_kwargs=sample_kwargs),
)

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta, y_hat_sigma]


Output()

Sampling 2 chains for 500 tune and 500 draw iterations (1_000 + 1_000 draws total) took 0 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
Sampling: [beta, y_hat, y_hat_sigma]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]


In [6]:
# Access the maketables coefficient table - now with full Bayesian statistics!
did_pymc.__maketables_coef_table__

Coefficient,b,se,t,p,ci_lower,ci_upper
Intercept,1.073705,0.025931,,0.0,1.020986,1.117792
post_treatment[T.True],0.989181,0.036796,,0.0,0.926604,1.059406
group,0.165089,0.036445,,0.0,0.096214,0.231315
group:post_treatment[T.True],0.500077,0.050361,,0.0,0.404518,0.594181


### Understanding the Bayesian Coefficient Table

For Bayesian models, the columns represent:
- **b**: Posterior mean (point estimate)
- **se**: Posterior standard deviation (uncertainty)
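- **t**: Left empty (NaN), because a t-statistic is not defined for Bayesian inference
- **p**: Two-tailed posterior probability, `min(P(β>0), P(β<0)) * 2`. This is not a frequentist p-value, although maketables uses it to assign significance stars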
- **ci_lower/ci_upper**: 94% Highest Density Interval (HDI) bounds
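The column definitions above can be sketched with the standard library alone. This illustrates the definitions and is not CausalPy's implementation; `summarize_posterior` is a hypothetical helper, and the simulated draws stand in for real MCMC samples.

```python
import math
import random
import statistics


def summarize_posterior(draws, hdi_prob=0.94):
    """Summarize 1-D posterior draws into b, se, p, ci_lower, ci_upper."""
    n = len(draws)
    b = statistics.fmean(draws)   # posterior mean
    se = statistics.stdev(draws)  # posterior standard deviation

    # Two-tailed posterior probability: 2 * min(P(beta > 0), P(beta < 0)).
    p_pos = sum(d > 0 for d in draws) / n
    p_neg = sum(d < 0 for d in draws) / n
    p = 2 * min(p_pos, p_neg)

    # HDI: the narrowest window of sorted draws spanning hdi_prob of them.
    ordered = sorted(draws)
    k = math.floor(hdi_prob * n)
    if k >= n:
        raise ValueError("not enough draws for the requested hdi_prob")
    start = min(range(n - k), key=lambda i: ordered[i + k] - ordered[i])
    return {
        "b": b,
        "se": se,
        "p": p,
        "ci_lower": ordered[start],
        "ci_upper": ordered[start + k],
    }


# Simulated draws standing in for a posterior (assumption, not real output).
rng = random.Random(0)
draws = [rng.gauss(0.5, 0.05) for _ in range(2000)]
summary = summarize_posterior(draws)
print(summary["p"])  # 0.0, because every draw is positive
```

This also shows why `p` is exactly `0.0` for several coefficients in the table: when every draw falls on the same side of zero, the smaller tail probability is zero.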

## 3. Other maketables Attributes

CausalPy experiments also provide other maketables-compatible attributes.

In [7]:
# Get statistics
print("Sample size (N):", did_pymc.__maketables_stat__("N"))
print("Model type:", did_pymc.__maketables_stat__("model_type"))
print("Experiment type:", did_pymc.__maketables_stat__("experiment_type"))

Sample size (N): 40
Model type: PyMC
Experiment type: Difference in Differences


In [8]:
# Get dependent variable name
print("Dependent variable:", did_pymc.__maketables_depvar__)

Dependent variable: y


In [9]:
# Get variance-covariance info
print("VCov info:", did_pymc.__maketables_vcov_info__)

VCov info: {'vcov_type': 'Bayesian', 'clustervar': None}
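The three lookups above can be gathered into a single dictionary. The sketch below is a hypothetical convenience helper, not part of CausalPy or maketables; it runs against a stub object that mirrors the values printed above, so it does not need a fitted model.

```python
from types import SimpleNamespace


def collect_metadata(experiment, stat_keys=("N", "model_type", "experiment_type")):
    """Gather the maketables metadata attributes into one dict."""
    meta = {key: experiment.__maketables_stat__(key) for key in stat_keys}
    meta["depvar"] = experiment.__maketables_depvar__
    meta["vcov_info"] = experiment.__maketables_vcov_info__
    return meta


# Stub mirroring the printed values for did_pymc (illustrative only).
_stats = {
    "N": 40,
    "model_type": "PyMC",
    "experiment_type": "Difference in Differences",
}
stub = SimpleNamespace(
    **{
        "__maketables_stat__": _stats.get,
        "__maketables_depvar__": "y",
        "__maketables_vcov_info__": {"vcov_type": "Bayesian", "clustervar": None},
    }
)

metadata = collect_metadata(stub)
print(metadata["N"], metadata["depvar"])  # 40 y
```

In the notebook itself, `collect_metadata(did_pymc)` would be the equivalent call.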


## 4. Using maketables to Create Tables

Now let's use `maketables` to create a publication-ready table comparing both models.

In [11]:
from maketables import ETable

# maketables automatically detects CausalPy experiments via the plug-in format!
table = ETable(
    [did_ols,
    did_pymc],
    headers=["OLS", "Bayesian"],
)
table

,y,y
,(1),(2)
coef
post_treatment=True,0.986 (-),0.989*** (0.037)
group,0.162 (-),0.165*** (0.036)
group × post_treatment=True,0.504 (-),0.500*** (0.050)
Intercept,1.076 (-),1.074*** (0.026)
stats
Observations,40,40
R2,-,-
Significance levels: * p < 0.1, ** p < 0.05, *** p < 0.01. Format of coefficient cell: Coefficient (Std. Error)

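Each cell in the table above follows the footnote's format, `Coefficient (Std. Error)`, with stars assigned from the `p` column. The sketch below reproduces that formatting for illustration. `format_cell` is a hypothetical function, not maketables' code, and the three-decimal rounding and the `-` placeholder are read off the rendered output rather than the library source.

```python
import math


def format_cell(b, se, p):
    """Render one cell as 'coef[stars] (se)' using the footnote thresholds."""
    if p is None or math.isnan(p):
        stars = ""  # no p-value available, so no stars
    elif p < 0.01:
        stars = "***"
    elif p < 0.05:
        stars = "**"
    elif p < 0.1:
        stars = "*"
    else:
        stars = ""
    se_text = "-" if se is None or math.isnan(se) else f"{se:.3f}"
    return f"{b:.3f}{stars} ({se_text})"


# Values copied from the coefficient tables earlier in this notebook.
print(format_cell(0.989181, 0.036796, 0.0))       # 0.989*** (0.037)
print(format_cell(0.9861, math.nan, math.nan))    # 0.986 (-)
```

The OLS column shows `(-)` and no stars because the scikit-learn model reports only point estimates, leaving `se` and `p` empty.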



## 5. Regression Discontinuity Example

In [12]:
# Load RD dataset
rd_df = cp.load_data("rd")

# Fit RD with OLS
rd_ols = cp.RegressionDiscontinuity(
    rd_df,
    formula="y ~ 1 + x + treated + x:treated",
    running_variable_name="x",
    model=LinearRegression(),
    treatment_threshold=0.5,
)

rd_ols.__maketables_coef_table__

Coefficient,b,se,t,p,ci_lower,ci_upper
Intercept,0.0,,,,,
treated[T.True],2.471126,,,,,
x,1.318479,,,,,
x:treated[T.True],-3.108333,,,,,


In [13]:
# Fit RD with PyMC
rd_pymc = cp.RegressionDiscontinuity(
    rd_df,
    formula="y ~ 1 + x + treated + x:treated",
    running_variable_name="x",
    model=cp.pymc_models.LinearRegression(sample_kwargs=sample_kwargs),
    treatment_threshold=0.5,
)

rd_pymc.__maketables_coef_table__

Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [beta, y_hat_sigma]


Output()

Sampling 2 chains for 500 tune and 500 draw iterations (1_000 + 1_000 draws total) took 1 seconds.
We recommend running at least 4 chains for robust computation of convergence diagnostics
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
Sampling: [beta, y_hat, y_hat_sigma]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]
Sampling: [y_hat]


Coefficient,b,se,t,p,ci_lower,ci_upper
Intercept,0.083177,0.047904,,0.076,-0.000265,0.175558
treated[T.True],2.47522,0.404007,,0.0,1.694751,3.172776
x,1.317992,0.091985,,0.0,1.146906,1.478921
x:treated[T.True],-3.110242,0.514766,,0.0,-4.090046,-2.19407


In [15]:
# Compare RD models
ETable(
    [rd_ols,
    rd_pymc],
    headers=["RD (OLS)", "RD (Bayesian)"],
)

,y,y
,(1),(2)
coef
treated=True,2.471 (-),2.475*** (0.404)
x,1.318 (-),1.318*** (0.092)
x × treated=True,-3.108 (-),-3.110*** (0.515)
Intercept,0.000 (-),0.083* (0.048)
stats
Observations,100,100
R2,0.84,unit_0_r2 0.836121 unit_0_r2_std 0.012656 dtype: float64
Significance levels: * p < 0.1, ** p < 0.05, *** p < 0.01. Format of coefficient cell: Coefficient (Std. Error)




## 6. Compare Multiple Experiments

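Several experiments can be placed side by side in one table. The comparison is only meaningful when the models share coefficient names, so the example below compares two equivalent DiD specifications rather than DiD against RD.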

In [None]:
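# A side-by-side table only works if the coefficients align, which they don't for DiD vs RD.
# Instead, compare two DiD specifications: `group*post_treatment` (used in did_ols)
# is shorthand for the explicit main effects plus interaction written out below.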

did_simple = cp.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group + post_treatment + group:post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=LinearRegression(),
)

ETable(
    [did_ols, did_simple],
    headers=["DiD (default)", "DiD (explicit)"],
)

## Summary

CausalPy's maketables integration provides:

1. **Zero-coupling**: No registration needed. Pass experiments to `ETable()` and maketables detects them automatically
2. **Bayesian-aware**: Proper handling of posterior statistics (HDI, tail probabilities)
3. **Model-agnostic**: Works with both PyMC (Bayesian) and scikit-learn (OLS) models
4. **Experiment-agnostic**: Works with all CausalPy experiment types (DiD, RD, ITS, SC, etc.)

The dunder attributes implemented on `BaseExperiment` are:
- `__maketables_coef_table__`: DataFrame with b, se, t, p, ci_lower, ci_upper
- `__maketables_stat__(key)`: Returns the statistic named by `key` (`N`, `model_type`, `experiment_type`, `r2`)
- `__maketables_depvar__`: Returns dependent variable name
- `__maketables_vcov_info__`: Returns variance-covariance metadata