# naturalexperiments

The `naturalexperiments` package is a toolbox for treatment effect estimation. It includes a variety of datasets, estimators, and evaluation metrics. The package is designed to be accessible to researchers and practitioners who are new to treatment effect estimation, while still providing a comprehensive set of tools for experienced researchers.

In [1]:
!pip install naturalexperiments

In [2]:
import naturalexperiments as ne

## Datasets

We introduce a novel treatment effect dataset from an early childhood literacy natural experiment. The treatment in the experiment is participation in Reach Out and Read Colorado (RORCO). The dataset has an observational version called RORCO Real with real literacy outcomes and a semi-synthetic version called RORCO for estimator evaluation purposes.

In addition to RORCO and RORCO Real, we provide easy access to standard treatment effect datasets including ACIC 2016, ACIC 2017, IHDP, Jobs, News, and Twins.

All of the datasets can be loaded using the `dataloaders` object.

In [3]:
dataset = 'RORCO'

X, y, z = ne.dataloaders[dataset]()

## Estimators

We propose a novel, theoretically motivated doubly robust estimator called Double-Double. In addition to Double-Double, we provide implementations of more than 20 established estimators from the literature.

All of the estimators can be easily loaded using the `methods` object.

In [4]:
method = 'Double-Double'
estimator = ne.methods[method]

Each method takes the following arguments: the covariates `X`, the outcomes `y`, the treatment assignment `z`, the propensity score estimates `p`, and a function `train` that the estimator uses to fit its prediction models.

We can use the `estimate_propensity` function to estimate the propensity scores.

In [5]:
p = ne.estimate_propensity(X, z)
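To make the idea of a propensity score concrete, here is a minimal sketch that fits P(z = 1 | x) with plain logistic regression on toy data. It is an illustration only, not the method used by `ne.estimate_propensity`; the names `fit_propensity`, `X_toy`, `z_toy`, and `p_toy` are hypothetical.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_propensity(X, z, lr=0.5, epochs=500):
    """Fit P(z = 1 | x) with logistic regression via batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x_i, z_i in zip(X, z):
            # Gradient of the log loss with respect to the logit.
            err = sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b) - z_i
            for j in range(d):
                grad_w[j] += err * x_i[j]
            grad_b += err
        w = [w_j - lr * g / n for w_j, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    # Return the fitted propensity score for every row.
    return [sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b) for x_i in X]

# Toy data (an assumption for illustration): treatment is more likely
# when the single covariate is large.
rng = random.Random(0)
X_toy = [[rng.uniform(-2, 2)] for _ in range(200)]
z_toy = [1 if rng.random() < sigmoid(2 * x[0]) else 0 for x in X_toy]
p_toy = fit_propensity(X_toy, z_toy)
```

Each entry of `p_toy` is an estimated probability of treatment, which is the role `p` plays in the estimator call.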

Then, with the propensity scores, we can estimate the treatment effects.

In [6]:
estimated_effect = estimator(X, y, z, p, ne.train)
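For readers new to doubly robust estimation, the sketch below implements the standard augmented inverse propensity weighted (AIPW) estimator of the average treatment effect. It is not the Double-Double estimator and does not use the package API; the function `aipw_estimate` and the outcome predictions `mu0` and `mu1` are hypothetical names introduced for illustration.

```python
def aipw_estimate(y, z, p, mu0, mu1):
    """AIPW estimate of the average treatment effect.

    y: outcomes, z: binary treatment indicators, p: propensity scores,
    mu0 / mu1: predicted outcomes under control / treatment for each unit.
    """
    total = 0.0
    for y_i, z_i, p_i, m0, m1 in zip(y, z, p, mu0, mu1):
        total += (
            m1 - m0                                # outcome-model estimate
            + z_i * (y_i - m1) / p_i               # correction on treated units
            - (1 - z_i) * (y_i - m0) / (1 - p_i)   # correction on control units
        )
    return total / len(y)

# Toy data (an assumption for illustration): half of the units are treated.
y_toy = [3.0, 1.0, 4.0, 2.0]
z_toy = [1, 0, 1, 0]
p_toy = [0.5, 0.5, 0.5, 0.5]

# Outcome predictions equal to the per-arm means.
good_models = aipw_estimate(y_toy, z_toy, p_toy, mu0=[1.5] * 4, mu1=[3.5] * 4)

# Deliberately poor outcome predictions (all zeros) with the same propensities.
poor_models = aipw_estimate(y_toy, z_toy, p_toy, mu0=[0.0] * 4, mu1=[0.0] * 4)
```

On this toy data both calls return 2.0: the propensity-weighted correction terms compensate for the poor outcome predictions, which is the sense in which the estimator is doubly robust.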

By default, `train` trains a three-layer neural network with 100 units in each layer and ReLU activations.
Some estimators, such as regression discontinuity, do not use the training function, while others, such as the CATENet estimators, use custom training functions defined in the estimator.

## Exploring the Datasets

We can explore the datasets with a tabular comparison and several figures.

In [10]:
for dataset in ne.dataloaders:
    # Only ACIC 2017 is inspected here; skip every other dataset.
    if dataset != 'ACIC 2017':
        continue
    print(dataset)
    X, y, z = ne.dataloaders[dataset]()
    # Print the range of the treatment variable.
    print(z.max(), z.min())
    p = ne.estimate_propensity(X, z)
    estimated_effect = estimator(X, y, z, p, ne.train)
    print(f'{dataset}: {estimated_effect}')

ACIC 2017
6.0 0.0


RuntimeError: all elements of target should be between 0 and 1
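The printed range (a maximum of 6.0) suggests that the treatment column loaded for this dataset is not binary, which would be consistent with the error message, although the output here does not confirm the cause. A minimal sketch of a check that could be run before estimating propensity scores follows; `check_binary_treatment` is a hypothetical helper, not part of the package, and it assumes `z` is a one-dimensional sequence of numbers.

```python
def check_binary_treatment(z):
    """Return the sorted distinct treatment values, or raise if any is not 0 or 1."""
    values = sorted(set(float(v) for v in z))
    if not set(values) <= {0.0, 1.0}:
        raise ValueError(
            f"treatment must be binary (0 or 1); found values {values}"
        )
    return values

# A binary treatment vector passes the check.
ok_values = check_binary_treatment([0, 1, 1, 0])

# A vector with the range printed above (0.0 to 6.0) is rejected.
try:
    check_binary_treatment([0.0, 6.0, 1.0])
    rejected = False
except ValueError:
    rejected = True
```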

In [7]:
# Produces a markdown table comparing the size, number of variables, treatment rate, etc.
ne.dataset_table(ne.dataloaders, print_md=True)

RuntimeError: all elements of target should be between 0 and 1

In [None]:
# Produces plots of the propensity score distribution, outcomes by propensity scores, and propensity calibration
ne.plot_all_data(ne.dataloaders)

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


## Benchmarking the Estimators

We can benchmark the estimators on the datasets using the `compute_variance` function.

In [None]:
methods = {name: ne.methods[name] for name in ['Double-Double', 'Regression Discontinuity', 'TARNet']}

variance, times = ne.compute_variance(methods, dataset, num_runs=1)

Due to the computational complexity of some estimators (e.g., the CATENets), the benchmark subsamples the data by default. We can adjust the subsample size with the `limit` argument. Even then, many estimators may take a long time to run.
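As a rough illustration of what subsampling to a fixed number of rows involves, the sketch below draws rows without replacement while keeping covariates, outcomes, and treatments aligned. How the package applies `limit` internally is not shown here; the `subsample` helper and its `seed` parameter are hypothetical.

```python
import random

def subsample(X, y, z, limit, seed=0):
    """Return at most `limit` rows, drawn without replacement, keeping X, y, z aligned."""
    n = len(y)
    if n <= limit:
        return X, y, z
    # Sample row indices once and apply them to all three containers.
    idx = sorted(random.Random(seed).sample(range(n), limit))
    return [X[i] for i in idx], [y[i] for i in idx], [z[i] for i in idx]

# Toy data (an assumption for illustration): row i has covariate i and outcome i.
X_toy = [[i] for i in range(10)]
y_toy = list(range(10))
z_toy = [i % 2 for i in range(10)]

X_sub, y_sub, z_sub = subsample(X_toy, y_toy, z_toy, limit=4)
```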

Once we benchmark the estimators, we can print the results in a table.

In [None]:
ne.benchmark_table(variance, times, print_md=True)

## Additional Features

The `naturalexperiments` package includes additional features for comprehensively evaluating treatment effect estimation.

There are functions for computing the empirical variance as a function of the sample size, the correlation in the outcomes, and the propensity score accuracy.
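To illustrate what empirical variance as a function of the sample size means, the sketch below repeatedly simulates a toy randomized experiment, applies a simple difference-in-means estimator, and measures the variance of the estimates across runs. It does not call the package, and how the package defines its variance metric is not shown here; all function names are hypothetical.

```python
import random
import statistics

def difference_in_means(y, z):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y_i for y_i, z_i in zip(y, z) if z_i == 1]
    control = [y_i for y_i, z_i in zip(y, z) if z_i == 0]
    return statistics.mean(treated) - statistics.mean(control)

def simulate(n, rng, effect=1.0):
    # Toy experiment: alternate treatment assignment, unit-variance Gaussian noise.
    z = [i % 2 for i in range(n)]
    y = [effect * z_i + rng.gauss(0.0, 1.0) for z_i in z]
    return y, z

def empirical_variance(n, num_runs=200, seed=0):
    """Sample variance of the estimate over repeated simulated datasets of size n."""
    rng = random.Random(seed)
    estimates = [difference_in_means(*simulate(n, rng)) for _ in range(num_runs)]
    return statistics.variance(estimates)

variances = {n: empirical_variance(n) for n in (20, 80, 320)}
for n, v in variances.items():
    print(f"n={n}: empirical variance {v:.4f}")
```

For this toy setup the variance of the difference in means is 4/n in theory, so the empirical values should shrink roughly fourfold each time the sample size quadruples.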

These functions and more appear in the `paper_experiments` folder (as the name suggests, the folder includes code to reproduce the results in the paper). Because some experiments are computationally intensive, the functions are designed to run in parallel by writing the results to a shared cache.
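The general pattern of sharing results through a cache can be sketched as follows: each worker looks up a key in a shared directory, computes the result only if it is missing, and writes it atomically. This is an illustration of the idea under assumed conventions (one JSON file per key), not the cache format used in `paper_experiments`; `run_with_cache` and the key name are hypothetical.

```python
import json
import os
import tempfile

def run_with_cache(cache_dir, key, compute):
    """Return the cached result for `key`, computing and storing it if absent."""
    path = os.path.join(cache_dir, f"{key}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    # Write to a temporary file, then rename, so readers never see a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(result, f)
    os.replace(tmp_path, path)
    return result

calls = []

def expensive():
    # Stand-in for a long-running experiment.
    calls.append(1)
    return {"variance": 0.25}

with tempfile.TemporaryDirectory() as cache_dir:
    first = run_with_cache(cache_dir, "toy_experiment", expensive)
    second = run_with_cache(cache_dir, "toy_experiment", expensive)
```

The second call reads the stored result instead of recomputing it. Two workers that start on the same key at the same moment could still both compute it; the atomic rename only guarantees that the cached file is never half-written.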