In this notebook we'll collect data from the EPA AQS API (https://aqs.epa.gov/aqsweb/documents/data_api.html) for all of the available monitoring sites in Cook County. According to the Wikipedia page on the air quality index (https://en.wikipedia.org/wiki/Air_quality_index#United_States), AQI values in the United States can be based on a number of different measurements: the $O_3$ (8-hr and 1-hr), $PM_{2.5}$ (24-hr), $PM_{10}$ (24-hr), $CO$ (8-hr), $SO_2$ (1-hr and 24-hr), and $NO_2$ (1-hr) concentrations. The Wikipedia article details which values to use in which scenarios when calculating the AQI.
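
For reference, the AQI for a single pollutant is a piecewise-linear interpolation between breakpoint concentrations: $I = \frac{I_{hi} - I_{lo}}{C_{hi} - C_{lo}}(C - C_{lo}) + I_{lo}$. The sketch below is a minimal illustration of that formula, not the EPA's reference implementation. The function name `aqi_from_concentration` is hypothetical, and the breakpoint table holds the 24-hr $PM_{2.5}$ values as we understand them from before the EPA's 2024 revision, so it should be checked against the current EPA table before any real use.

```python
from decimal import Decimal, ROUND_DOWN

# Each row is (C_lo, C_hi, I_lo, I_hi).
# ASSUMPTION: 24-hr PM2.5 breakpoints (ug/m^3) from before the 2024 EPA revision.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def aqi_from_concentration(conc, breakpoints=PM25_BREAKPOINTS):
    """Hypothetical helper: interpolate an AQI value from a concentration."""
    # Truncate to one decimal place so the value cannot fall between two rows.
    c = float(Decimal(str(conc)).quantize(Decimal("0.1"), rounding=ROUND_DOWN))
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    return None  # concentration is outside the table
```

Passing the breakpoints as an argument keeps the interpolation reusable for the other pollutants once their tables are filled in.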

In [110]:
import numpy as np
import pandas as pd
import requests
import json
import matplotlib.pyplot as plt
%matplotlib inline

In [47]:
# api access variables
# state and county codes obtained from the list sample queries on the
url_head = 'https://aqs.epa.gov/data/api'
email = 'garrett.l.ducharme@gmail.com'
with open('C:/Users/ducha/.api_keys/aqs.txt') as f:
    key = json.load(f)["key"]
state = '17'
county = '031'

In [48]:
# Obtaining monitoring sites for Cook County
end_point = 'list/sitesByCounty'
url = f'{url_head}/{end_point}?email={email}&key={key}&state={state}&county={county}'
site_codes = requests.get(url)
sites = pd.DataFrame(json.loads(site_codes.text)['Data'])
sites.head()

Unnamed: 0,code,value_represented
0,1,VILLAGE GARAGE
1,2,
2,3,
3,4,
4,5,
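
Every request in this notebook follows the same URL pattern, so it may be convenient to build the URL in one place. The following is a minimal sketch using only the standard library. `build_aqs_url` is a hypothetical helper name, and the email and key in the usage line are placeholders rather than real credentials.

```python
from urllib.parse import urlencode

URL_HEAD = "https://aqs.epa.gov/data/api"


def build_aqs_url(end_point, email, key, **query):
    """Return the full request URL for an AQS endpoint (hypothetical helper)."""
    # urlencode percent-escapes values such as the '@' in the email address.
    params = {"email": email, "key": key, **query}
    return f"{URL_HEAD}/{end_point}?{urlencode(params)}"


# Placeholder credentials, for illustration only.
example_url = build_aqs_url(
    "list/sitesByCounty", "user@example.com", "PLACEHOLDER_KEY",
    state="17", county="031",
)
print(example_url)
```

The resulting string can be passed to `requests.get` exactly as the hand-built f-strings are.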


In [60]:
sites[~sites.value_represented.isna()]

Unnamed: 0,code,value_represented
0,1,VILLAGE GARAGE
13,14,FARR HALL
21,22,WASHINGTON HS
24,26,CERMAK PUMP STATION
30,32,SOUTH WATER FILTRATION PLANT
40,42,SEARS TOWER
46,50,SE POLICE STATION
48,52,MAYFAIR PUMP STATION
53,57,SPRINGFIELD PUMP STATION
56,60,CARVER HS


In [49]:
# Obtaining the valid parameter list
end_point = 'list/classes'
url = f'{url_head}/{end_point}?email={email}&key={key}'
param_codes = requests.get(url)
params = pd.DataFrame(json.loads(param_codes.text)['Data'])
params

Unnamed: 0,code,value_represented
0,AIRNOW MAPS,The parameters represented on AirNow maps (881...
1,ALL,Select all Parameters Available
2,AQI POLLUTANTS,Pollutants that have an AQI Defined
3,CORE_HAPS,Urban Air Toxic Pollutants
4,CRITERIA,Criteria Pollutants
5,CSN DART,List of CSN speciation parameters to populate ...
6,FORECAST,Parameters routinely extracted by AirNow (STI)
7,HAPS,Hazardous Air Pollutants
8,IMPROVE CARBON,IMPROVE Carbon Parameters
9,IMPROVE_SPECIATION,PM2.5 Speciated Parameters Measured at IMPROVE...


All of the pollutants that we need to calculate the AQI are contained within the 'CRITERIA' class.

In [50]:
# Obtain all parameters in the criteria class
end_point = 'list/parametersByClass'
pc = 'CRITERIA'
url = f'{url_head}/{end_point}?email={email}&key={key}&pc={pc}'
criteria_codes = requests.get(url)
criteria = pd.DataFrame(json.loads(criteria_codes.text)['Data'])
criteria

Unnamed: 0,code,value_represented
0,14129,Lead (TSP) LC
1,42101,Carbon monoxide
2,42401,Sulfur dioxide
3,42602,Nitrogen dioxide (NO2)
4,44201,Ozone
5,81102,PM10 Total 0-10um STP
6,85129,Lead PM10 LC FRM/FEM
7,88101,PM2.5 - Local Conditions


All meteorological data is contained in the 'MET' class.

In [45]:
# Obtain all parameters in the meteorological class
end_point = 'list/parametersByClass'
pc = 'MET'
url = f'{url_head}/{end_point}?email={email}&key={key}&pc={pc}'
met_codes = requests.get(url)
met = pd.DataFrame(json.loads(met_codes.text)['Data'])
met

Unnamed: 0,code,value_represented
0,61101,Wind Speed - Scalar
1,61102,Wind Direction - Scalar
2,61103,Wind Speed - Resultant
3,61104,Wind Direction - Resultant
4,61105,Peak Wind Gust
5,61106,Std Dev Hz Wind Direction
6,61107,Std Dev Vt Wind Direction
7,61109,Vertical Wind Speed
8,61110,Std Dev Vt Wind Speed
9,61111,Std Dev Hz Wind Speed


We'll now start obtaining the criteria data and the meteorological data and storing it for future analysis.

In [91]:
# Getting the PM2.5 data for all sites in Cook County
param = '88101'
bdate = '20190101'
edate = '20191231'
end_point = 'sampleData/byCounty'
url = f'{url_head}/{end_point}?email={email}&key={key}&param={param}' \
      f'&bdate={bdate}&edate={edate}&state={state}&county={county}'
county_resp = requests.get(url)
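
If we later want more than one year of data, the date range will probably need to be split. As far as we recall from the AQS API documentation, a `sampleData` request expects `bdate` and `edate` to fall in the same calendar year; that is an assumption worth verifying. Under that assumption, a minimal sketch for generating one date pair per year could look like this (`yearly_date_ranges` is a hypothetical helper name):

```python
def yearly_date_ranges(first_year, last_year):
    """Return one (bdate, edate) pair of YYYYMMDD strings per calendar year."""
    # ASSUMPTION: each request must stay within a single calendar year.
    return [(f"{year}0101", f"{year}1231") for year in range(first_year, last_year + 1)]


# Example: the pairs that would be used for three consecutive years.
for bdate, edate in yearly_date_ranges(2017, 2019):
    print(bdate, edate)
```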

In [105]:
county_p25_df = pd.DataFrame(json.loads(county_resp.text)['Data'])
# Keeping only hourly measurements
county_p25_df = county_p25_df.loc[county_p25_df['sample_frequency'] == 'HOURLY']

In [106]:
# Looking at the columns to see what we can drop
print(county_p25_df.columns)
county_p25_df.head()

Index(['cbsa_code', 'county', 'county_code', 'date_gmt', 'date_local',
       'date_of_last_change', 'datum', 'detection_limit', 'latitude',
       'longitude', 'method', 'method_code', 'method_type', 'parameter',
       'parameter_code', 'poc', 'qualifier', 'sample_duration',
       'sample_frequency', 'sample_measurement', 'site_number', 'state',
       'state_code', 'time_gmt', 'time_local', 'uncertainty',
       'units_of_measure'],
      dtype='object')


Unnamed: 0,cbsa_code,county,county_code,date_gmt,date_local,date_of_last_change,datum,detection_limit,latitude,longitude,...,sample_duration,sample_frequency,sample_measurement,site_number,state,state_code,time_gmt,time_local,uncertainty,units_of_measure
995,16980,Cook,31,2019-03-01,2019-03-01,2019-05-03,NAD83,2.0,41.57862,-87.557406,...,1 HOUR,HOURLY,19.2,119,Illinois,17,06:00,00:00,,Micrograms/cubic meter (LC)
996,16980,Cook,31,2019-03-01,2019-03-01,2019-05-03,NAD83,2.0,41.57862,-87.557406,...,1 HOUR,HOURLY,16.3,119,Illinois,17,07:00,01:00,,Micrograms/cubic meter (LC)
997,16980,Cook,31,2019-03-01,2019-03-01,2019-05-03,NAD83,2.0,41.57862,-87.557406,...,1 HOUR,HOURLY,19.8,119,Illinois,17,08:00,02:00,,Micrograms/cubic meter (LC)
998,16980,Cook,31,2019-03-01,2019-03-01,2019-05-03,NAD83,2.0,41.57862,-87.557406,...,1 HOUR,HOURLY,16.7,119,Illinois,17,09:00,03:00,,Micrograms/cubic meter (LC)
999,16980,Cook,31,2019-03-01,2019-03-01,2019-05-03,NAD83,2.0,41.57862,-87.557406,...,1 HOUR,HOURLY,14.1,119,Illinois,17,10:00,04:00,,Micrograms/cubic meter (LC)


In [107]:
columns_to_drop = ['cbsa_code', 'county', 'county_code', 'date_gmt', 'datum', 'detection_limit',
                   'method', 'method_code', 'parameter_code', 'poc', 'state', 'state_code', 'time_gmt',
                   'uncertainty', 'method_type']
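# Reassign instead of dropping in place: the frame is a filtered selection,
# and in-place edits on it can trigger a SettingWithCopyWarning.
county_p25_df = county_p25_df.drop(columns=columns_to_drop)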

In [108]:
county_p25_df.head()

Unnamed: 0,date_local,date_of_last_change,latitude,longitude,parameter,qualifier,sample_duration,sample_frequency,sample_measurement,site_number,time_local,units_of_measure
995,2019-03-01,2019-05-03,41.57862,-87.557406,PM2.5 - Local Conditions,,1 HOUR,HOURLY,19.2,119,00:00,Micrograms/cubic meter (LC)
996,2019-03-01,2019-05-03,41.57862,-87.557406,PM2.5 - Local Conditions,,1 HOUR,HOURLY,16.3,119,01:00,Micrograms/cubic meter (LC)
997,2019-03-01,2019-05-03,41.57862,-87.557406,PM2.5 - Local Conditions,,1 HOUR,HOURLY,19.8,119,02:00,Micrograms/cubic meter (LC)
998,2019-03-01,2019-05-03,41.57862,-87.557406,PM2.5 - Local Conditions,,1 HOUR,HOURLY,16.7,119,03:00,Micrograms/cubic meter (LC)
999,2019-03-01,2019-05-03,41.57862,-87.557406,PM2.5 - Local Conditions,,1 HOUR,HOURLY,14.1,119,04:00,Micrograms/cubic meter (LC)
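
The $PM_{2.5}$ AQI is defined on a 24-hr concentration, so the hourly rows will eventually need to be averaged per site and day. Below is a minimal sketch in plain Python, operating on a list of dictionaries shaped like the entries of the API's `Data` list. The function name `daily_means` is hypothetical, the three sample records reuse measurements from the output above, and no completeness rule (such as a minimum number of hours per day) is applied.

```python
from collections import defaultdict
from statistics import mean


def daily_means(records):
    """Average sample_measurement per (site_number, date_local)."""
    grouped = defaultdict(list)
    for rec in records:
        value = rec["sample_measurement"]
        if value is not None:  # skip missing measurements
            grouped[(rec["site_number"], rec["date_local"])].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Three hourly readings taken from the output above (site 0119, 2019-03-01).
records = [
    {"site_number": "0119", "date_local": "2019-03-01", "sample_measurement": 19.2},
    {"site_number": "0119", "date_local": "2019-03-01", "sample_measurement": 16.3},
    {"site_number": "0119", "date_local": "2019-03-01", "sample_measurement": 19.8},
]
print(daily_means(records))
```

On the DataFrame itself, the same grouping can be written as `county_p25_df.groupby(['site_number', 'date_local'])['sample_measurement'].mean()`.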


In [114]:
county_p25_df.site_number.value_counts()

4007    8760
4201    8760
0119    7344
Name: site_number, dtype: int64
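
These counts are consistent with two sites reporting every hour of 2019 (365 × 24 = 8760). The 7344 rows for site 0119 would match a record that starts on 2019-03-01, the first date shown for that site above (306 days × 24 = 7344). The sketch below checks that arithmetic and expresses each count as a share of the full year; `expected_hours` is a hypothetical helper name.

```python
from datetime import date


def expected_hours(start, end):
    """Number of hourly samples between two dates, both inclusive."""
    return ((end - start).days + 1) * 24


# Row counts copied from the value_counts() output above.
counts = {"4007": 8760, "4201": 8760, "0119": 7344}

full_year = expected_hours(date(2019, 1, 1), date(2019, 12, 31))
for site, n in counts.items():
    print(site, f"{n / full_year:.1%}")
```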