This repo contains the replication code for the paper "Program targeting with machine learning and mobile phone data: Evidence from an anti-poverty intervention in Afghanistan" by Emily Aiken, Guadalupe Bedoya, Joshua Blumenstock, and Aidan Coville (2022). The repo is structured as follows:
data/survey.csv
: Synthetic household survey data
data/phone_features.csv
: Synthetic featurized phone data, containing around a thousand features relating to mobile phone use
data/interim_analysis_datasets
: Most datasets saved during the analysis will be stored here
data/generate_synthetic_data.ipynb
: Generates the synthetic survey and phone features datasets; adjust parameters for more or fewer observations, or to inject correlations between variables
0requirements.ipynb
: Installs the required packages for the subsequent replication scripts 1-4
1survey.ipynb
: Analysis of raw survey data and generation of additional survey-based outcomes (asset index, below poverty line)
2machinelearning.ipynb
: Implementation and evaluation of machine learning models to predict survey outcomes from mobile phone features
3targeting.ipynb
: Targeting simulations to compare accuracy of targeting methods
4costs.ipynb
: Calculation of targeting costs for the proxy means test (PMT), community-based targeting (CBT), and the phone-based approach
results/tables
: All tables that are present in the paper are saved here in .csv format
results/figures
: All figures that are present in the paper are saved here in .png format
results/simulations
: Results from machine learning models (predictions, feature importances, the models themselves) are saved here
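
As a rough illustration of what "comparing the accuracy of targeting methods" can mean, the sketch below scores a single targeting rule against ground-truth poverty status. It is not taken from the targeting notebook: the function names `target_lowest` and `targeting_metrics`, the choice of metrics, and the toy data are all assumptions made for this example. It assumes each household has a score where lower means poorer (for instance a predicted asset index) and that a fixed number of households is targeted.

```python
def target_lowest(scores, n_target):
    """Return the indices of the n_target households with the lowest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(ranked[:n_target])


def targeting_metrics(is_poor, scores, n_target):
    """Score a targeting rule against ground-truth poverty status.

    is_poor  -- list of booleans from the survey (hypothetical outcome column)
    scores   -- list of floats, lower = predicted poorer (assumption)
    n_target -- number of households the program can reach
    """
    targeted = target_lowest(scores, n_target)
    n_total = len(is_poor)
    n_poor = sum(1 for p in is_poor if p)

    tp = sum(1 for i in targeted if is_poor[i])  # poor and targeted
    fp = len(targeted) - tp                      # non-poor but targeted
    fn = n_poor - tp                             # poor but missed
    tn = n_total - tp - fp - fn                  # non-poor and not targeted

    return {
        "true_positives": tp,
        "accuracy": (tp + tn) / n_total,
        "precision": tp / len(targeted) if targeted else 0.0,
        "recall": tp / n_poor if n_poor else 0.0,
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    is_poor = [True, True, False, False, True, False]
    predicted_wealth = [0.1, 0.4, 0.3, 0.9, 0.8, 0.7]
    print(targeting_metrics(is_poor, predicted_wealth, n_target=3))
```

Running the same function on scores from different methods (for example a PMT score and a phone-based prediction) with the same `n_target` gives directly comparable numbers, which is the general idea behind such a comparison.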
Requirements:
- python = 3.7.7
- numpy = 1.18.5
- pandas = 1.0.5
- matplotlib = 3.3.2
- seaborn = 0.10.1
- scikit-learn = 0.23.1
- scikit-misc = 0.1.3
- lightgbm = 2.3.0
- joblib = 0.15.1
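
For readers who prefer to install the pinned versions outside the requirements notebook, the list above can be turned into pip-style requirement lines. The sketch below is only a convenience under stated assumptions: the `PINNED` dictionary and the `pip_requirements` helper are hypothetical names, the repo is not known to ship a `requirements.txt`, and `python` itself is left out because it is not installed through pip.

```python
# Versions copied from the list above; the Python interpreter (3.7.7)
# is omitted because it is not a pip package.
PINNED = {
    "numpy": "1.18.5",
    "pandas": "1.0.5",
    "matplotlib": "3.3.2",
    "seaborn": "0.10.1",
    "scikit-learn": "0.23.1",
    "scikit-misc": "0.1.3",
    "lightgbm": "2.3.0",
    "joblib": "0.15.1",
}


def pip_requirements(pins):
    """Return pip-style 'name==version' lines, one per package."""
    return ["{}=={}".format(name, version) for name, version in pins.items()]


if __name__ == "__main__":
    # Redirect this output to a file to get a requirements file.
    print("\n".join(pip_requirements(PINNED)))
```

Saving the printed lines to a file and passing it to `pip install -r` inside a Python 3.7 environment should reproduce the versions listed here, assuming those releases are still available for the platform in use.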