Leverage state-of-the-art computer vision tools for impact evaluation in development economics (work in progress)

Impact Evaluation with Machine Learning and High-Resolution Satellite Images

Eliminating Poverty

The elimination of poverty worldwide is the No. 1 UN Sustainable Development Goal for 2030. To achieve it, we need more rigorous evaluations to learn which anti-poverty programs work and which do not. However, traditional data collection methods in developing countries (for example, household surveys) tend to be very expensive. A single evaluation can cost between 0.4 and 3 million USD (How Much Will an Impact Evaluation Cost?).

Promise of Satellite Imagery and Machine Learning

High-resolution satellite images and machine learning offer great promise for cheaper evaluations. Satellite images contain extremely rich information about households' economic status: the quality of their housing, ownership of assets such as cars and barns, agricultural productivity of their land, local infrastructure quality, and so on. With state-of-the-art machine learning models to process large volumes of image data, these features can serve as objective and reliable economic measures. Combined with econometric analysis that leverages experimental or quasi-experimental variation, they let economists evaluate programs at a fraction of the cost of traditional methods.

This Project

The setting for this project is rural Kenya, where poorer people tend to live in self-built houses with thatched roofs (made of dry vegetation such as straw and palm branches). These roofs are not very durable and need to be replaced frequently. Richer people tend to live in houses with more durable, higher-quality metal roofs. Taking roof quality as a proxy for economic well-being, this project uses a machine learning model, DeepLabV3+, to identify buildings with metal roofs in high-resolution satellite images of these Kenyan villages.

Figure 1. Visualization of segmentation masks for high-quality (metal) roof on high-resolution satellite images in Kenyan villages.


To validate these measures, I test whether a large, randomized unconditional cash transfer (GiveDirectly) improved roof quality in rural Kenya. The answer is yes! Preliminary results (see Figure 2) show that program effects estimated from satellite-derived outcomes are consistent with those from traditional methods (household surveys).

Figure 2. Distribution of the proportion of pixels that are covered by high-quality (metal) roof for each household. Consistent with the survey findings, those who receive a large cash transfer (Treatment) live in houses with higher roof quality, compared to those who do not (Control). This difference is statistically significant.
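The household-level outcome in Figure 2 is a pixel share. As a minimal sketch, assuming a predicted mask is available as a grid of 0/1 values cropped around one household (the function name `roof_share` and the toy mask are hypothetical and not taken from this repo), it could be computed like this:

```python
def roof_share(mask):
    """Return the fraction of pixels in a binary mask labelled as metal roof.

    `mask` is a list of rows; each pixel is 1 (metal roof) or 0 (background).
    """
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask is empty")
    roof = sum(1 for row in mask for px in row if px)
    return roof / total


# Toy 4x4 mask (made-up data): 4 roof pixels out of 16.
example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
share = roof_share(example_mask)
print(f"metal roof share: {share:.2f}")
```

How the image crop around each household is defined is not described here, so the sketch leaves that choice to the caller.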

These results are preliminary and incomplete. Please do not cite or distribute. For a more comprehensive description of the randomized controlled trial and the institutional contexts, see Michael Walker's Job Market Paper.

This project is a first step in a broader research agenda that seeks to leverage technological advances in data science to improve the effectiveness of aid projects in developing countries.

Project Documentation

This repo is forked from pytorch-deeplab-xception, a beautifully written PyTorch implementation of the original model (the authors' own implementation is in TensorFlow). It also makes use of the official SpaceNet utilities (this code is in the spacenetutilities/ folder). I preprocessed the spatial datasets, added dataloaders to connect them with the model, and trained the model on a supercomputing facility. I also conducted some preliminary econometric analyses and data visualizations.

Building footprint segmentation is not one of the benchmarks in the original DeepLab paper series, and the annotated Google Earth image dataset is small. I therefore pre-trained a DeepLabV3+ model on the SpaceNet dataset (Round 2, Khartoum, chosen because it is the closest to a Kenyan setting). I then fine-tuned the model on a set of Google Earth images from the Kenyan villages enrolled in the GiveDirectly experiment. All metal-roof buildings in these Google Earth images have been annotated, and the model produces predicted segmentation masks, as shown in Figure 1.

To run the code, first set the correct local paths in define_path.py. The scripts are set up to run on a supercomputing facility with Slurm support, hence the headers in the bash scripts in scripts/. Then follow these steps:

  • Download SpaceNet Data (Round 2), particularly for Khartoum (with awscli installed)
aws s3 cp s3://spacenet-dataset/AOI_5_Khartoum/AOI_5_Khartoum_Train.tar.gz .
tar -xf AOI_5_Khartoum_Train.tar.gz
  • Preprocess using the SpaceNetUtilities (with Slurm sbatch); this repo fixes a few bugs from the v3 branch of the official repo
cd scripts
sbatch preprocess.sh
  • Train the model with SpaceNet first, and then fine-tune on Google Earth (modifying command-line arguments in train_mobilenet.sh) (CUDA support required)
sbatch train_mobilenet.sh
  • TensorBoard is supported; launch it during training
./launch_tb.sh
  • Visualize a subset of images and masks (replicate Figure 1) (CUDA support required)
sbatch visualize_mobilenet.sh
  • Generate predictions for all images (CUDA support required)
sbatch infer.sh
  • Merge model predictions with existing dataset
cd ../regress
python merge.py
  • Subsequent econometric analysis and visualization of results are done in RStudio with regress.R (replicate Figure 2)
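The last two steps (merging predictions with the household data, then comparing Treatment and Control) can be illustrated with a short Python sketch. This is not the code in merge.py or regress.R: the field names `hh_id`, `treat`, and `roof_share`, the helper functions, and all numbers below are hypothetical and exist only for illustration. The comparison shown is a plain difference in means with a Welch t-statistic, which may differ from the specification used in the actual analysis.

```python
from math import sqrt
from statistics import mean, variance


def merge_predictions(households, predictions):
    """Left-join predicted roof shares onto household records by household id."""
    merged = []
    for hh in households:
        row = dict(hh)  # copy so the input records are not modified
        row["roof_share"] = predictions.get(hh["hh_id"])  # None if missing
        merged.append(row)
    return merged


def welch_t(treated, control):
    """Return (difference in means, Welch t-statistic) for two samples."""
    diff = mean(treated) - mean(control)
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return diff, diff / se


# Made-up example data, not results from the study.
households = [
    {"hh_id": 1, "treat": 1}, {"hh_id": 2, "treat": 1}, {"hh_id": 3, "treat": 1},
    {"hh_id": 4, "treat": 0}, {"hh_id": 5, "treat": 0}, {"hh_id": 6, "treat": 0},
]
predictions = {1: 0.30, 2: 0.40, 3: 0.50, 4: 0.10, 5: 0.20, 6: 0.30}

merged = merge_predictions(households, predictions)
treated = [r["roof_share"] for r in merged if r["treat"] == 1]
control = [r["roof_share"] for r in merged if r["treat"] == 0]
diff, t_stat = welch_t(treated, control)
print(f"difference in means: {diff:.3f}, Welch t: {t_stat:.2f}")
```

Households without a prediction keep `roof_share` as `None` in this sketch, so they would need to be dropped or handled explicitly before computing any statistics.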