## Download and Pre-process EPA's Air Quality Data

Last updated on 07/06/2023


This notebook demonstrates how to download and pre-process the EPA's real-time air quality monitoring data from the AirNow API: https://docs.airnowapi.org/

Instructions for building the query are available here: https://docs.airnowapi.org/Data/query

The notebook covers three steps:
- Download the EPA real-time air quality data through the API
- Organize the data into a pandas data frame
- Save the data frame to a CSV file for further analysis

## User-Defined Variables

In [2]:
# URL built for the Chicago region
url = 'https://www.airnowapi.org/aq/data/?startDate=2023-06-26T14&endDate=2023-06-26T15&parameters=PM25&BBOX=-89.615860,41.33,-84.606094,44.3&dataType=A&format=text/csv&verbose=0&monitorType=0&includerawconcentrations=0&API_KEY=3EDB1ADE-7637-4F22-A5B3-05218A41D98D'
# Output csv file
outfilename = 'drive/MyDrive/epa_chicago_20230626.csv'
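
The query URL above can also be assembled from its parts, which makes it easier to change the time window or region and keeps the API key out of the notebook. This is a minimal sketch, not part of the original notebook: the helper name `build_airnow_url` and the environment variable `AIRNOW_API_KEY` are hypothetical, the parameter names are copied from the URL above, and the bounding box order (min longitude, min latitude, max longitude, max latitude) is inferred from that URL.

```python
import os
from urllib.parse import urlencode

AIRNOW_DATA_URL = 'https://www.airnowapi.org/aq/data/'

def build_airnow_url(start, end, bbox, api_key, parameters='PM25'):
    """Assemble a query URL with the same fields as the URL used in this notebook."""
    query = {
        'startDate': start,
        'endDate': end,
        'parameters': parameters,
        'BBOX': ','.join(str(v) for v in bbox),
        'dataType': 'A',
        'format': 'text/csv',
        'verbose': 0,
        'monitorType': 0,
        'includerawconcentrations': 0,
        'API_KEY': api_key,
    }
    # Keep ',', '/' and ':' unescaped so the result matches the hand-built URL.
    return AIRNOW_DATA_URL + '?' + urlencode(query, safe=',/:')

# Hypothetical: read the key from an environment variable instead of hard-coding it.
api_key = os.environ.get('AIRNOW_API_KEY', 'YOUR-API-KEY')
example_url = build_airnow_url(
    '2023-06-26T14', '2023-06-26T15',
    (-89.615860, 41.33, -84.606094, 44.3),
    api_key,
)
print(example_url)
```

Assigning the result to `url` instead of `example_url` would let the rest of the notebook run unchanged.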

## Google Colab Environment

In [3]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


## Download the EPA Air Quality Data

The EPA provides a web service for real-time air quality data. The AirNow website lets us query data by bounding box or ZIP code and builds the download URL for us. Once you have the URL, the download takes one line with the "requests" package, which sends HTTP/1.1 requests.

- AirNow API: https://docs.airnowapi.org/
- Data Query: https://docs.airnowapi.org/Data/query

In [4]:
import requests

In [5]:
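from pathlib import Path
r = requests.get(url, allow_redirects=True, timeout=60)
r.raise_for_status()  # stop here if the server returned an HTTP error
Path('epa_aqi_temp.csv').write_bytes(r.content)  # writes and closes the file; returns the byte count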

3531

## Pre-Process Datasets

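We load the downloaded file into a pandas data frame and save it under the file name specified above. The code reads the file with header=None, so we supply the column names ourselves.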

In [6]:
import pandas as pd

In [7]:
df = pd.read_csv('epa_aqi_temp.csv', header=None,
                 names=['Latitude', 'Longitude', 'DateTime', 'Analyte', 'AQI', 'flag'])

In [8]:
# Re-save with proper headers; index=False keeps the row index out of the file
df.to_csv(outfilename, index=False)
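
By default, `to_csv` writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. The sketch below illustrates this with a round trip through an in-memory CSV. It is only an illustration: the three sample rows are copied from this notebook's output, the exact raw format returned by AirNow is an assumption, and `parse_dates` is an optional extra that the notebook itself does not use.

```python
import io
import pandas as pd

columns = ['Latitude', 'Longitude', 'DateTime', 'Analyte', 'AQI', 'flag']

# Header-less sample rows, assumed to mirror the downloaded file.
sample = (
    "43.07378,-89.43595,2023-06-26T14:00,PM2.5,80,2\n"
    "43.1008,-89.3572,2023-06-26T14:00,PM2.5,73,2\n"
    "42.267002,-89.08917,2023-06-26T14:00,PM2.5,58,2\n"
)

df_sample = pd.read_csv(io.StringIO(sample), header=None, names=columns,
                        parse_dates=['DateTime'])

# to_csv() without a path returns the CSV text, so no file is needed here.
with_index = pd.read_csv(io.StringIO(df_sample.to_csv()))
without_index = pd.read_csv(io.StringIO(df_sample.to_csv(index=False)))

print(list(with_index.columns))     # includes an extra 'Unnamed: 0' column
print(list(without_index.columns))  # only the six named columns
```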

In [9]:
df

|   | Latitude | Longitude | DateTime | Analyte | AQI | flag |
|---|---|---|---|---|---|---|
| 0 | 43.07378 | -89.43595 | 2023-06-26T14:00 | PM2.5 | 80 | 2 |
| 1 | 43.1008 | -89.3572 | 2023-06-26T14:00 | PM2.5 | 73 | 2 |
| 2 | 42.267002 | -89.08917 | 2023-06-26T14:00 | PM2.5 | 58 | 2 |
| 3 | 43.46611 | -88.621109 | 2023-06-26T14:00 | PM2.5 | 84 | 2 |
| 4 | 42.2211 | -88.2411 | 2023-06-26T14:00 | PM2.5 | 54 | 2 |
| 5 | 43.0203 | -88.215 | 2023-06-26T14:00 | PM2.5 | 64 | 2 |
| 6 | 41.7714 | -88.1522 | 2023-06-26T14:00 | PM2.5 | 56 | 2 |
| 7 | 41.526885 | -88.116474 | 2023-06-26T14:00 | PM2.5 | 53 | 2 |
| 8 | 42.93257 | -87.93434 | 2023-06-26T14:00 | PM2.5 | 69 | 2 |
| 9 | 43.0178 | -87.9333 | 2023-06-26T14:00 | PM2.5 | 60 | 2 |
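
As a first step toward further analysis, each AQI value can be labeled with its EPA category. The following is a minimal sketch, assuming the standard EPA AQI ranges (0-50 Good, 51-100 Moderate, 101-150 Unhealthy for Sensitive Groups, 151-200 Unhealthy, 201-300 Very Unhealthy, above 300 Hazardous). The function name `aqi_category` is hypothetical and is not part of the notebook or the AirNow API. The sample values are the ten AQI readings shown in the output.

```python
# Inclusive upper bound of each standard EPA AQI category.
AQI_CATEGORIES = [
    (50, 'Good'),
    (100, 'Moderate'),
    (150, 'Unhealthy for Sensitive Groups'),
    (200, 'Unhealthy'),
    (300, 'Very Unhealthy'),
]

def aqi_category(aqi):
    """Return the EPA category label for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError('AQI must be non-negative')
    for upper, label in AQI_CATEGORIES:
        if aqi <= upper:
            return label
    return 'Hazardous'

# AQI readings taken from the notebook output.
sample_aqi = [80, 73, 58, 84, 54, 64, 56, 53, 69, 60]
labels = [aqi_category(value) for value in sample_aqi]
print(sorted(set(labels)))
```

With the data frame from this notebook, the same function could be applied as `df['AQI'].map(aqi_category)`. All ten readings fall in the Moderate range, which would be consistent with the `flag` column holding a category number, although the notebook does not say what that column represents.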
