## API Requests from the EPA Website

This notebook performs API requests to collect data for a comparative study of air quality between nine cities spread throughout the US:

1.   Seattle, Washington
2.   San Diego, California
3.   Phoenix, Arizona
4.   Minneapolis, Minnesota
5.   Denver, Colorado
6.   Austin, Texas
7.   Philadelphia, Pennsylvania
8.   Nashville, Tennessee
9.   Jacksonville, Florida

The data gathered in this notebook is available at: https://www.epa.gov/outdoor-air-quality-data. The results are collected in a DataFrame and exported as a CSV file, _'aqs_data_raw.csv'_, and an Excel file, _'aqs_data_raw.xlsx'_.

In [None]:
import requests
import json
import pandas as pd
import re
import time
from itertools import islice
import math

In [None]:
with open('../data/credentials.json') as file:
    credentials = json.load(file)
epa_key = credentials['epa_key']
epa_email = credentials['epa_email']
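
The cell above assumes that `credentials.json` holds both an `epa_key` and an `epa_email` entry. As a minimal sketch, a hypothetical helper (`load_credentials` is not part of this notebook) could check for those two keys up front, so a misconfigured file fails with a clear message before any requests are made:

```python
import json

# Key names this notebook reads from the credentials file
REQUIRED_KEYS = ('epa_key', 'epa_email')

def load_credentials(path):
    """Load a JSON credentials file and fail early if a required key is missing.

    Hypothetical helper for illustration; the notebook itself reads the file directly.
    """
    with open(path) as file:
        credentials = json.load(file)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f'Missing credential keys: {missing}')
    return credentials
```

Calling `load_credentials('../data/credentials.json')` would then return the same dictionary that the cell above builds.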

#### Declare the Static Variables

In [None]:
# State Level FIPS Codes
fips_wa = '53'
fips_ca = '06'
fips_az = '04'
fips_mn = '27'
fips_co = '08'
fips_tx = '48'
fips_pa = '42'
fips_tn = '47'
fips_fl = '12'

# County Level FIPS Codes
fips_king = '033'
fips_sandiego = '073'
fips_maricopa = '013'
fips_hennepin = '053'
fips_denver = '031'
fips_travis = '453'
fips_philadelphia = '101'
fips_davidson = '037'
fips_duval = '031'

# Dictionary with states as keys and counties as values
fips_dict = {fips_wa: fips_king,
             fips_ca: fips_sandiego,
             fips_az: fips_maricopa,
             fips_mn: fips_hennepin,
             fips_co: fips_denver,
             fips_tx: fips_travis,
             fips_pa: fips_philadelphia,
             fips_tn: fips_davidson,
             fips_fl: fips_duval}

# Request Dates
date_list = ['20170101', '20180101', '20190101', '20200101', '20210101']

# API Endpoint
endpoint = 'https://aqs.epa.gov/data/api/annualData/byCounty'
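
To make the request format concrete, the sketch below assembles the query string for a single request using only the standard library. The email and key are placeholders, and the two parameter codes (`44201` for ozone, `42101` for carbon monoxide) are chosen for illustration; the notebook itself lets `requests` do this encoding from the `params` dictionary.

```python
from urllib.parse import urlencode

# Placeholder credentials for illustration only (not real values)
example_params = {
    'email': 'user@example.com',
    'key': 'example-key',
    'param': '44201,42101',   # ozone and carbon monoxide
    'bdate': '20170101',
    'edate': '20170101',
    'state': '53',            # Washington
    'county': '033',          # King County
}

base_url = 'https://aqs.epa.gov/data/api/annualData/byCounty'
example_url = f'{base_url}?{urlencode(example_params)}'
print(example_url)
```

Note that the state and county codes are passed as zero-padded strings, which is why the FIPS codes above are declared as strings rather than integers.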

#### Obtain the AQS (Air Quality System) Parameter Codes
- The AQI (Air Quality Index) is a nationally uniform color-coded index for reporting and forecasting daily air quality. It is used to report on the most common ambient air pollutants that are regulated under the Clean Air Act: **ground-level _ozone_, particle pollution (_PM10 and PM2.5_), _carbon monoxide (CO)_, _nitrogen dioxide (NO2)_, and _sulfur dioxide (SO2)_.** The AQI tells the public how clean or polluted the air is and how to avoid health effects associated with poor air quality.
- The AQI focuses on health effects that may be experienced within a few hours or days after breathing polluted air and uses a normalized scale from 0 to 500; the higher the AQI value, the greater the level of pollution and the greater the health concern. An AQI value of 100 generally corresponds to the level of the short-term National Ambient Air Quality Standard for the pollutant. **AQI values at and below 100 are generally considered to be satisfactory. When AQI values are above 100, air quality is considered to be unhealthy, at first for members of populations at greatest risk of a health effect, then for the entire population as AQI values get higher (greater than 150).**
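
The thresholds described above can be sketched as a small lookup. The category names and cut-offs below are assumed from EPA's standard AQI table rather than taken from this notebook's data, and `aqi_category` is a hypothetical helper used only for illustration:

```python
# Upper bound of each AQI category, paired with its name
# (assumed from EPA's standard AQI table)
AQI_CATEGORIES = [
    (50, 'Good'),
    (100, 'Moderate'),
    (150, 'Unhealthy for Sensitive Groups'),
    (200, 'Unhealthy'),
    (300, 'Very Unhealthy'),
    (500, 'Hazardous'),
]

def aqi_category(aqi):
    """Return the category name for an AQI value on the 0 to 500 scale."""
    if not 0 <= aqi <= 500:
        raise ValueError(f'AQI value out of range: {aqi}')
    for upper, name in AQI_CATEGORIES:
        if aqi <= upper:
            return name

print(aqi_category(42))    # Good
print(aqi_category(135))   # Unhealthy for Sensitive Groups
```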

### Filter the Parameter Codes

In [None]:
param_df = pd.read_csv('../data/parameters.csv')
param_df = param_df.rename(columns={'Parameter Code': 'code',
                                    'Parameter': 'param',
                                    'Parameter Abbreviation': 'abbr',
                                    'Still Valid': 'valid',
                                    'Standard Units': 'std_units'})
param_df = param_df[['code', 'param', 'abbr', 'valid', 'std_units']].sort_values(by='code').reset_index(drop=True)
param_df['code'] = param_df.code.astype('string')

#### Obtain the Common Parameters Used in AQI Reports (except '88502', which is not used in NAAQS [National Ambient Air Quality Standards] decisions)

In [None]:
aqi_bp = pd.read_csv('../data/aqi_breakpoints.csv')
aqi_bp.columns = aqi_bp.columns.str.lower().str.replace(' ', '_')
aqi_params = list(aqi_bp.parameter_code.astype('string').unique())
aqi_params.remove('88502')

In [None]:
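# Keep the AQI parameters plus a few extra pollutants matched by abbreviation
param_df = param_df.loc[param_df.abbr.str.contains('SMKE|^CO2$|^NO$|^BZ$', regex=True, na=False, case=False) |
                        param_df.code.isin(aqi_params)]

# Reassign instead of sorting in place on a filtered frame
param_df = param_df.sort_values(by='code')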
print('Number of parameters:', len(param_df))
param_df

In [None]:
# Create a list of lists, each inner list containing a maximum of 5 unique parameters
param_codes = param_df.code.tolist()
num_lists = math.ceil(len(param_codes) / 5)
param_iter = iter(param_codes)
param_lists = [list(islice(param_iter, 5)) for _ in range(num_lists)]
param_lists
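
The batching step can also be written as a small generator, which avoids computing the number of batches in advance. This is an alternative sketch rather than the notebook's own approach; `chunk` is a hypothetical helper, and the seven codes are sample values chosen to show a full batch followed by a partial one.

```python
from itertools import islice

def chunk(items, size=5):
    """Yield successive lists of at most `size` items from `items`."""
    iterator = iter(items)
    # islice returns an empty list once the iterator is exhausted, ending the loop
    while batch := list(islice(iterator, size)):
        yield batch

example_codes = ['42101', '42401', '42602', '44201', '81102', '88101', '42601']
print(list(chunk(example_codes)))
# [['42101', '42401', '42602', '44201', '81102'], ['88101', '42601']]
```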

#### Obtain the Air Quality Data (Limit of 5 Parameter Codes per Request)

In [None]:
aqs_df = pd.DataFrame()

for date in date_list:
    for state in fips_dict:
        for p_list in param_lists:
            request_params = ','.join(p_list)
            params = {'email': epa_email,
                      'key': epa_key,
                      'param': request_params,
                      'bdate': date,
                      'edate': date,
                      'state': state,
                      'county': fips_dict[state]}

            # Perform the request, pausing 6 seconds to stay above the 5-second minimum between requests
            time.sleep(6)
            response = requests.get(endpoint, params=params).json()
            status = response['Header'][0]['status']
            if status == 'Success' and 'Data' in response:
                aqs_df = pd.concat([aqs_df, pd.DataFrame(response['Data'])], ignore_index=True)
            else:
                print('Issue with data retrieval; Reason:', status,
                      '\nDate:', date, '-- State:', state, '-- County:', fips_dict[state],
                      '-- Params:', request_params, '\n')

aqs_df = aqs_df.sort_values(by=['parameter_code', 'state_code', 'county_code']).reset_index(drop=True)    
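
The loop above indexes straight into `response['Header'][0]['status']`, which raises an error if a reply is missing its header. A more defensive check might look like the sketch below. It assumes only the response shape that the loop already relies on (a `Header` list whose first entry has a `status`, plus a `Data` list); `extract_rows` and the mock payloads are hypothetical and not part of the notebook.

```python
def extract_rows(payload):
    """Return the data rows from an AQS-style response, or [] if the request failed."""
    header = payload.get('Header') or [{}]
    status = header[0].get('status')
    if status != 'Success':
        return []
    return payload.get('Data', [])

# Mock payloads for illustration; real responses contain many more fields
ok_payload = {'Header': [{'status': 'Success'}],
              'Data': [{'parameter_code': '44201', 'state_code': '53'}]}
failed_payload = {'Header': [{'status': 'Failed'}]}

print(extract_rows(ok_payload))      # [{'parameter_code': '44201', 'state_code': '53'}]
print(extract_rows(failed_payload))  # []
```

Collecting the returned rows in a plain list and building the DataFrame once at the end would also avoid calling `pd.concat` on every iteration.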

In [None]:
aqs_df.shape

In [None]:
aqs_df.head()

#### Export the Data as CSV and Excel Files

In [None]:
aqs_df.to_csv(r'../data/aqs_data_raw.csv', index=False)
aqs_df.to_excel(r'../data/aqs_data_raw.xlsx', sheet_name='aqs_data_raw', index=False)