This repository provides data and code for reproducing "RNN-based counterfactual prediction, with an application to homestead policy and public schooling".

Please cite the paper if you use this code for academic research:

      title={RNN-based counterfactual prediction, with an application to homestead policy and public schooling}, 
      author={Jason Poulos and Shuxi Zeng},


  • Python 3 (tested on Python 3.6.8)
    • scikit-learn (tested on 0.23.2)
    • numpy (tested on 1.19.1)
    • pandas (tested on 1.0.3)
    • h5py (tested on 2.10.0)
    • matplotlib (tested on 2.0.2)
  • TensorFlow 2.1.0 (CUDA 10.1 and cuDNN 7.6 for Linux GPU)
  • Keras (tested on 2.3.1)
  • R (tested on 3.6.3)

Set up

  • Clone a copy of the repository to your working directory with the command
$ git clone
  • Open code/package-list.R in a script editor
    • Verify that all required packages are installed in your R library

Placebo test experiments

Make each file below executable, then run it in a shell from within the home directory:

  • code/ sine waves data
  • code/ GP data
  • code/ education spending data
  • code/ stock market data

The results reproduce Table 1 (staggered treatment) and Table SM-1 (simultaneous treatment).

code/educ-placebo-plot.R and code/sine-placebo-plot.R create the plots to reproduce Figure 2.

code/stock-placebo-plot.R creates the plot to reproduce Figure 3.

Application: counterfactual predictions

  1. First, prepare the public education spending data by running code/prepare-funds.R in R

  2. Second, run in shell with command line arguments <GPU_ID> <hidden_activation> <n_hidden> <patience> <dropout rate> <penalty> <learning_rate> <epochs> <batches> <data_name> <window_size> <T> <imputation_method>; e.g.,

python3 code/ 3 'tanh' 128 25 0.5 0.01 0.001 500 32 'educ' 10 203 'locf'
python3 code/ 3 'tanh' 128 25 0.5 0.01 0.001 500 32 'educ' 10 203 'locf'
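
As a reading aid for the thirteen positional arguments above, here is a minimal sketch of how such a command line could be mapped to named, typed settings. The argument names come from the usage line. The types, the `ARG_SPEC` table, and the `parse_positional` helper are assumptions for illustration and are not taken from the repository's scripts.

```python
# Argument names follow the usage line above; the types are assumptions
# inferred from the example values, not taken from the repository's scripts.
ARG_SPEC = [
    ("gpu_id", int),
    ("hidden_activation", str),
    ("n_hidden", int),
    ("patience", int),
    ("dropout_rate", float),
    ("penalty", float),
    ("learning_rate", float),
    ("epochs", int),
    ("batches", int),
    ("data_name", str),
    ("window_size", int),
    ("T", int),
    ("imputation_method", str),
]


def parse_positional(argv):
    """Map the 13 positional values (script name excluded) to typed settings."""
    if len(argv) != len(ARG_SPEC):
        raise ValueError(f"expected {len(ARG_SPEC)} arguments, got {len(argv)}")
    return {name: cast(value) for (name, cast), value in zip(ARG_SPEC, argv)}


# The example values from the command above (the shell strips the quotes).
example = "3 tanh 128 25 0.5 0.01 0.001 500 32 educ 10 203 locf".split()
settings = parse_positional(example)  # in a script: parse_positional(sys.argv[1:])
for name, value in settings.items():
    print(f"{name} = {value!r}")
```

Because the arguments are positional, their order matters: swapping two numeric values (for example, penalty and learning rate) would not raise an error in a parser like this one.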

The script code/ trains RNNs on differently imputed datasets and with different RNN configurations.
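
To illustrate what an imputation method such as 'locf' does, here is a minimal sketch assuming that 'locf' stands for last observation carried forward. The `locf` function and the sample values are hypothetical. The repository's own imputation code may differ.

```python
def locf(series):
    """Fill each missing value (None) with the most recent observed value.

    Leading gaps stay None because there is nothing to carry forward.
    """
    filled = []
    last_seen = None
    for value in series:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical spending series with gaps (values are made up).
spending = [None, 1.2, None, None, 1.5, None]
print(locf(spending))  # [None, 1.2, 1.2, 1.2, 1.5, 1.5]
```

With pandas, `Series.ffill()` performs the same forward fill on a series that uses NaN for missing values.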

  • To plot the training and validation error, run code/ <file location of training log> <title>; e.g.,
python3 code/ './results/encoder-decoder/educ/training_log_educ_locf_tanh_128_25_0.2_0.01_32.csv' 'Encoder-decoder loss'
python3 code/ './results/lstm/educ/training_log_educ_locf_tanh_128_25_0.2_0.01_32.csv' 'LSTM loss'
  • To estimate and plot causal estimates and randomization confidence intervals for RNNs trained on differently imputed datasets and different configurations, execute in shell code/ (Figure 4, first column of Table 2, Table SM-2, and Table SM-3)
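
The training-log plotting step above can be sketched as follows. This is a sketch under assumptions, not the repository's plotting script. It assumes the log is a CSV with `epoch`, `loss`, and `val_loss` columns, as written by the Keras `CSVLogger` callback. The helper names `read_log` and `plot_log` and the default output file name are hypothetical.

```python
import csv


def read_log(path):
    """Read a CSV training log into a dict mapping column name to floats."""
    columns = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            for name, value in row.items():
                columns.setdefault(name, []).append(float(value))
    return columns


def plot_log(path, title, out_file="loss.png"):
    """Plot training and validation loss against epoch and save the figure."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    log = read_log(path)
    # Column names are an assumption (Keras CSVLogger convention).
    plt.plot(log["epoch"], log["loss"], label="training")
    plt.plot(log["epoch"], log["val_loss"], label="validation")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.title(title)
    plt.legend()
    plt.savefig(out_file)
    plt.close()
```

A call such as `plot_log(log_path, 'LSTM loss')` would then write the figure to `loss.png`. A validation curve that flattens or rises while the training curve keeps falling is the usual sign that early stopping (the `<patience>` argument) is doing its job.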
  1. To compare RNN estimates with alternative estimators and imputation methods, execute in shell code/ (Table 2 and Table SM-3)

  2. For RNN placebo treatment effect estimates on pre-treatment data, execute in shell code/, which produces the results for the second column of Table 2.

  3. To plot the autocorrelation function for the placebo test datasets (Figure 1), run code/autocorrelation-plot.R

  4. To plot the extent of non-response in the education spending data (Figure SM-1), run code/non-response-plot.R