Using machine learning to recover air quality data from remote sensing datasets

Information, Incentives and Air Quality: New Evidence from Machine Learning Predictions

Yue 'Luna' Huang (UC Berkeley), Minghao Qiu (MIT)

Slides | Manuscript | Github

In command-and-control regulation, information asymmetry between central regulators and local agents is often cited as a key cause of ineffective policy. We evaluate a policy in China that built air quality monitoring stations and required their data to be reported automatically to the central government, effectively preventing data manipulation by local officials. Exploiting the staggered rollout of this policy across 367 cities, we examine its impact on local air quality. However, because no credible data were reported before the monitoring stations were set up, pre-treatment air quality is unobserved. To overcome this challenge, we leverage recent developments in machine learning (specifically, extreme gradient boosting) and a rich set of NASA satellite images to reconstruct a comprehensive air pollution dataset for China, with almost 0.5 million observations spanning 2005 to 2016. Our structural break estimates show no significant program effects.
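The reconstruction step trains a boosted model on city-days where station readings exist and then predicts pollution for city-days without them. The sketch below illustrates that idea with plain gradient boosting on depth-1 regression trees (stumps) under squared-error loss, written in pure Python. It is not XGBoost, which adds second-order gradients, regularization, and deeper trees, and it is not the authors' pipeline. The feature names (`aod` for aerosol optical depth, `humidity`) and the toy numbers are hypothetical and synthetic, chosen only to show the mechanics.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """A depth-1 regression tree: one split, two constant leaves."""
    feature: int      # index of the feature used for the split
    threshold: float  # rows with x[feature] <= threshold go to the left leaf
    left: float       # prediction for the left leaf
    right: float      # prediction for the right leaf

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[Sequence[float]], r: List[float]) -> Stump:
    """Pick the single split that minimises squared error on the residuals r."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue  # a split must leave rows on both sides
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, Stump(j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to a constant leaf
        m = sum(r) / len(r)
        best = Stump(0, float("inf"), m, m)
    return best


class GradientBoostedStumps:
    """Gradient boosting for squared-error loss (hypothetical illustration)."""

    def __init__(self, n_rounds: int = 100, learning_rate: float = 0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "GradientBoostedStumps":
        self.base = sum(y) / len(y)  # start from the mean of the target
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_rounds):
            # For loss 0.5 * (y - f)^2 the negative gradient is the residual y - f.
            resid = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, resid)
            self.stumps.append(stump)
            # Move each prediction a shrunken step toward the fitted residual.
            pred = [pi + self.learning_rate * stump.predict(x) for pi, x in zip(pred, X)]
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


# Synthetic "station" rows: [aod, humidity] -> PM2.5 (made-up values).
X = [[0.1, 60], [0.2, 55], [0.3, 70], [0.4, 65], [0.5, 50],
     [0.6, 62], [0.7, 58], [0.8, 68], [0.9, 52], [1.0, 66]]
y = [30, 30, 30, 30, 30, 80, 80, 80, 80, 80]

model = GradientBoostedStumps(n_rounds=100, learning_rate=0.1).fit(X, y)

# "Reconstruct" PM2.5 for rows with satellite features but no station reading.
print(round(model.predict([0.3, 60]), 2))   # → 30.0
print(round(model.predict([0.75, 57]), 2))  # → 80.0
```

The small learning rate shrinks each stump's contribution, so the residuals here decay by a factor of 0.9 per round; a gradient-boosting library such as XGBoost applies the same additive idea with richer trees and explicit regularization.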
