Data introduction. Source on air pollution and its effects on respiratory health:<br>
https://www.epa.gov/pmcourse/particle-pollution-and-respiratory-effects

- Limit the size of queries. Our database contains billions of values and you may request more than you intend. If you are unsure of the amount of data, start small and work your way up. We request that you limit queries to 1,000,000 rows of data each. **You can use the "observation count" field on the annualData service to determine how much data exists for a time-parameter-geography combination.** If you have any questions or need advice, please contact us.
- Limit the frequency of queries. Our system can process a limited load. If scripting requests, please wait for one request to complete before submitting another, and do not make more than 10 requests per minute. We also request that you pause for 5 seconds between requests and adjust the pause according to response time and size.
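
The frequency guideline above can be sketched as a small throttle that waits until a minimum gap has passed since the previous request finished. This is a minimal illustration, not part of the EPA API: the class name `RequestThrottle`, its `call` method, and the injectable `clock` and `sleep` arguments are all hypothetical. A 6-second gap is assumed because it satisfies both the 5-second pause and the limit of 10 requests per minute.

```python
import time


class RequestThrottle:
    """Keep a minimum gap between the end of one call and the start of the next.

    Hypothetical helper for illustration; not part of any EPA or requests API.
    """

    def __init__(self, min_interval=6.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds; 6 s implies at most 10 calls/minute
        self._clock = clock
        self._sleep = sleep
        self._last_finished = None  # time at which the previous call completed

    def call(self, func, *args, **kwargs):
        # Sleep only for the part of the interval that has not already elapsed
        if self._last_finished is not None:
            remaining = self.min_interval - (self._clock() - self._last_finished)
            if remaining > 0:
                self._sleep(remaining)
        try:
            return func(*args, **kwargs)
        finally:
            # Record completion time so the gap is measured after the response
            self._last_finished = self._clock()
```

A request could then be issued as `throttle.call(requests.get, endpoint, params=params)`, so each request starts only after the previous one has completed and the pause has elapsed.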

In [1]:
import requests
import json
import pandas as pd
import re
import time

In [2]:
with open('../data/credentials.json') as file:
    credentials = json.load(file)
epa_key = credentials['epa_key']
epa_email = credentials['epa_email']

#### Declare the Static Variables

In [3]:
# State Level FIPS Codes
fips_ca = '06'
fips_co = '08'
fips_ga = '13'
fips_tn = '47'

# County Level FIPS Codes
fips_sd = '073'
fips_la = '037'
fips_den = '031'
fips_ful = '121'
fips_dek = '089'
fips_dav = '037'

fips_dict = {fips_ca: [fips_sd, fips_la],
             fips_co: [fips_den],
             fips_ga: [fips_ful, fips_dek],
             fips_tn: [fips_dav]}

# Request Dates
date_list = ['20110101', '20160101', '20210101']

#### Obtain the AQS (Air Quality System) Parameter Codes
AQI = Air Quality Index<br>
- The AQI is a nationally uniform color-coded index for reporting and forecasting daily air quality. It is used to report on the most common ambient air pollutants that are regulated under the Clean Air Act: **ground-level _ozone_, particle pollution (_PM10 and PM2.5_), _carbon monoxide (CO)_, _nitrogen dioxide (NO2)_, and _sulfur dioxide (SO2)_.** The AQI tells the public how clean or polluted the air is and how to avoid health effects associated with poor air quality.
- The AQI focuses on health effects that may be experienced within a few hours or days after breathing polluted air and uses a normalized scale from 0 to 500; the higher the AQI value, the greater the level of pollution and the greater the health concern. An AQI value of 100 generally corresponds to the level of the short-term National Ambient Air Quality Standard for the pollutant. **AQI values at and below 100 are generally considered to be satisfactory. When AQI values are above 100, air quality is considered to be unhealthy, at first for members of populations at greatest risk of a health effect, then for the entire population as AQI values get higher (greater than 150).**
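
The thresholds described above can be expressed as a short lookup. This is a rough sketch that encodes only the three bands mentioned in the text (at or below 100, above 100, above 150); the function name `aqi_health_band` and the band labels are hypothetical paraphrases, and the official AQI defines finer-grained categories that are not reproduced here.

```python
def aqi_health_band(aqi):
    """Map an AQI value (0 to 500) to the coarse bands described in the text.

    Labels are illustrative paraphrases, not official AQI category names.
    """
    if not 0 <= aqi <= 500:
        raise ValueError('AQI values are expected on the normalized 0-500 scale')
    if aqi <= 100:
        return 'satisfactory'
    if aqi <= 150:
        return 'unhealthy for at-risk groups'
    return 'unhealthy for the entire population'


for value in (42, 100, 101, 151):
    print(value, '->', aqi_health_band(value))
```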

### Filter the Parameter Codes

In [4]:
param_codes = pd.read_csv('../data/parameters.csv')
param_codes = param_codes.rename(columns={'Parameter Code': 'code', 
                                          'Parameter': 'param',
                                          'Parameter Abbreviation': 'abbr',
                                          'Still Valid': 'valid',
                                          'Standard Units':'std_units'})
param_codes = param_codes[['code', 'param', 'abbr', 'valid', 'std_units']].sort_values(by='code').reset_index(drop=True)
param_codes['code'] = param_codes.code.astype('string')

#### Obtain the Common Parameters used in AQI Reports

In [5]:
aqi_bp = pd.read_csv('../data/aqi_breakpoints.csv')
aqi_bp.columns = aqi_bp.columns.str.lower().str.replace(' ', '_')
aqi_params = list(aqi_bp.parameter_code.astype('string').unique())

In [6]:
param_codes = param_codes.loc[param_codes['valid'] == 'YES']
param_codes = param_codes.loc[~param_codes.abbr.isnull() & 
                              (param_codes.abbr.str.contains('SMKE|^CO2$|^NO$|^BZ$', regex=True, na=False, case=False) |
                               param_codes.code.isin(aqi_params))]

param_codes.sort_values(by='code', inplace=True)
print('Number of parameters:', len(param_codes))
param_codes

Number of parameters: 11


| | code | param | abbr | valid | std_units |
|---|---|---|---|---|---|
| 13 | 11204 | Smoke | SMKE | YES | Micrograms/cubic meter (25 C) |
| 534 | 42101 | Carbon monoxide | CO | YES | Parts per million |
| 535 | 42102 | Carbon dioxide | CO2 | YES | Parts per million |
| 549 | 42401 | Sulfur dioxide | SO2 | YES | Parts per billion |
| 555 | 42601 | Nitric oxide (NO) | NO | YES | Parts per billion |
| 556 | 42602 | Nitrogen dioxide (NO2) | NO2 | YES | Parts per billion |
| 855 | 44201 | Ozone | O3 | YES | Parts per million |
| 866 | 45201 | Benzene | BZ | YES | Parts per billion Carbon |
| 1029 | 81102 | PM10 Total 0-10um STP | PM10 | YES | Micrograms/cubic meter (25 C) |
| 1306 | 88101 | PM2.5 - Local Conditions | LC25 | YES | Micrograms/cubic meter (LC) |


#### Obtain the Air Quality Data (Limit of 5 Parameter Codes per Request)
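
Because each request accepts at most 5 parameter codes, the parameter list has to be split into batches, and the number of batches drives how long the full download takes. The sketch below estimates the request count and the minimum total pause time for the inputs used in this notebook (3 dates, 6 counties, 11 parameters, 6 seconds between requests). The helper names `batch_codes` and `estimate_requests` are hypothetical, and the codes passed in are placeholders rather than real AQS codes.

```python
def batch_codes(codes, batch_size=5):
    """Split a list of parameter codes into consecutive batches of at most batch_size."""
    return [codes[i:i + batch_size] for i in range(0, len(codes), batch_size)]


def estimate_requests(n_dates, n_counties, codes, batch_size=5, pause_seconds=6):
    """Return (number of requests, minimum seconds spent pausing between them)."""
    n_batches = len(batch_codes(codes, batch_size))
    n_requests = n_dates * n_counties * n_batches
    return n_requests, n_requests * pause_seconds


# Placeholder codes standing in for the 11 filtered parameters
placeholder_codes = [f'code_{i}' for i in range(11)]

print([len(b) for b in batch_codes(placeholder_codes)])  # [5, 5, 1]
print(estimate_requests(n_dates=3, n_counties=6, codes=placeholder_codes))  # (54, 324)
```

With these inputs the full run issues 54 requests and spends at least 324 seconds (about five and a half minutes) sleeping, before counting response time.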

In [11]:
# Single test request: PM2.5 (88101) for state 36, county 061
endpoint = 'https://aqs.epa.gov/data/api/annualData/byCounty?'
params = {'email': epa_email,
          'key': epa_key,
          'param': '88101',
          'bdate': '20210101',
          'edate': '20210101',
          'state': '36',
          'county': '061'}
response = requests.get(endpoint, params=params).json()

In [12]:
response

{'Header': [{'status': 'Success',
   'request_time': '2022-12-16T16:10:56-05:00',
   'url': 'https://aqs.epa.gov/data/api/annualData/byCounty?email=rbzing%40gmail.com&key=mauvecat88&param=88101&bdate=20210101&edate=20210101&state=36&county=061',
   'rows': 12}],
 'Data': [{'state_code': '36',
   'county_code': '061',
   'site_number': '0079',
   'parameter_code': '88101',
   'poc': 2,
   'latitude': 40.7997,
   'longitude': -73.93432,
   'datum': 'WGS84',
   'parameter': 'PM2.5 - Local Conditions',
   'sample_duration_code': '7',
   'sample_duration': '24 HOUR',
   'pollutant_standard': 'PM25 24-hour 2006',
   'metric_used': 'Daily Mean',
   'method': 'R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC - Gravimetric',
   'year': 2021,
   'units_of_measure': 'Micrograms/cubic meter (LC)',
   'event_type': 'No Events',
   'observation_count': 57,
   'observation_percent': 93.0,
   'validity_indicator': 'Y',
   'valid_day_count': 57,
   'required_day_count': 61,
   'exceptional_data_co

In [7]:
aqs_frames = []  # one DataFrame per successful request
endpoint = 'https://aqs.epa.gov/data/api/annualData/byCounty?'
param_list = list(param_codes.code)
batch_size = 5  # at most 5 parameter codes per request

for date in date_list:
    for state, counties in fips_dict.items():
        for county in counties:
            # Request the parameter codes in batches of at most 5
            for start in range(0, len(param_list), batch_size):
                request_params = ','.join(param_list[start:start + batch_size])
                params = {'email': epa_email,
                          'key': epa_key,
                          'param': request_params,
                          'bdate': date,
                          'edate': date,
                          'state': state,
                          'county': county}

                # Perform the request, waiting at least 5 seconds between requests
                time.sleep(6)
                response = requests.get(endpoint, params=params).json()
                header = response['Header'][0]
                if header['status'] == 'Success' and 'Data' in response:
                    aqs_frames.append(pd.DataFrame(response['Data']))
                else:
                    print('Issue with data retrieval; Reason:', header['status'],
                          '\nDate:', date, '-- State:', state, '-- County:', county, '-- Params:', request_params, '\n')

# Combine all batches once, rather than growing a DataFrame inside the loop
aqs_df = pd.concat(aqs_frames, ignore_index=True)
aqs_df = aqs_df.sort_values(by=['parameter_code', 'state_code', 'county_code']).reset_index(drop=True)

Issue with data retrieval; Reason: No data matched your selection 
Date: 20160101 -- State: 08 -- County: 031 -- Params: 88502 

Issue with data retrieval; Reason: No data matched your selection 
Date: 20160101 -- State: 13 -- County: 089 -- Params: 88502 

Issue with data retrieval; Reason: No data matched your selection 
Date: 20210101 -- State: 08 -- County: 031 -- Params: 88502 

Issue with data retrieval; Reason: No data matched your selection 
Date: 20210101 -- State: 47 -- County: 037 -- Params: 88502 
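
The messages above come from inspecting the `Header` block of each response. A minimal sketch of that check as a reusable function follows, assuming the response shape shown in the earlier single-request output (a one-element `Header` list with a `status` field, and a `Data` list whose rows carry `observation_count`). The function name `summarize_response` and the keys of the returned dictionary are hypothetical, and whether a "no data" response includes an empty `Data` list or omits it is not confirmed here, so both cases are handled.

```python
def summarize_response(response):
    """Summarize a parsed annualData response (a dict) without raising on empty results."""
    header = response['Header'][0]
    rows = response.get('Data') or []  # tolerate a missing or empty 'Data' entry
    return {
        'ok': header['status'] == 'Success' and bool(rows),
        'status': header['status'],
        'n_rows': len(rows),
        # Total observations across rows, useful for gauging how much data exists
        'observation_count': sum(row.get('observation_count') or 0 for row in rows),
    }


# Hand-written examples modeled on the response shape shown earlier
success = {'Header': [{'status': 'Success', 'rows': 1}],
           'Data': [{'parameter_code': '88101', 'observation_count': 57}]}
no_data = {'Header': [{'status': 'No data matched your selection', 'rows': 0}]}

print(summarize_response(success))
print(summarize_response(no_data))
```

A "No data matched your selection" status is not an error in the request itself; it only indicates that the county has no records for those parameter codes in that year.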



#### Export the Data as an Excel File

In [8]:
aqs_df.to_excel(r'../data/aqs_data_raw.xlsx', sheet_name='aqs_data_raw', index=False)