Data Intro: background on air pollution and its effects on respiratory health:<br>
https://www.epa.gov/pmcourse/particle-pollution-and-respiratory-effects

EPA's AQS API usage guidelines:
- Limit the size of queries. Our database contains billions of values, and you may request more than you intend. If you are unsure of the amount of data, start small and work your way up. We request that you limit each query to 1,000,000 rows of data. **You can use the "observation count" field on the annualData service to determine how much data exists for a time-parameter-geography combination.** If you have any questions or need advice, please contact us.
- Limit the frequency of queries. Our system can process only a limited load. If scripting requests, wait for one request to complete before submitting another, and do not make more than 10 requests per minute. We also request a pause of 5 seconds between requests, adjusted as needed for response time and size.
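The frequency limits above (one request at a time, at most 10 per minute, about 5 seconds apart) can be enforced in a scripted loop with a small throttle. This is a minimal sketch, not part of the AQS API or the `requests` library; `RequestThrottle` and its parameters are hypothetical names whose defaults mirror the limits quoted above.

```python
import time
from collections import deque


class RequestThrottle:
    """Hypothetical helper for scripted AQS requests: enforces a minimum pause
    between requests and a cap on requests within a rolling time window."""

    def __init__(self, min_interval=5.0, max_requests=10, window=60.0):
        self.min_interval = min_interval  # seconds between consecutive requests
        self.max_requests = max_requests  # requests allowed per window
        self.window = window              # window length in seconds
        self._sent = deque()              # monotonic timestamps of recent requests

    def wait(self):
        """Block until another request may be sent, then record its time."""
        now = time.monotonic()
        # Enforce the minimum pause since the previous request
        if self._sent and now - self._sent[-1] < self.min_interval:
            time.sleep(self.min_interval - (now - self._sent[-1]))
            now = time.monotonic()
        # Forget requests that have left the rolling window
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        # If the window is full, sleep until the oldest request ages out
        if len(self._sent) >= self.max_requests:
            time.sleep(self.window - (now - self._sent[0]))
            now = time.monotonic()
            self._sent.popleft()
        self._sent.append(now)
```

In use, call `throttle.wait()` immediately before each `requests.get(...)`. Because `requests.get` blocks until the response arrives, a plain sequential loop already satisfies "wait for one request to complete before submitting another"; the throttle only adds the spacing.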

In [38]:
import requests
import json
import pandas as pd
import re
import time

In [2]:
# Load the AQS API credentials (registered email and key) from a local JSON file
with open('../data/credentials.json') as file:
    credentials = json.load(file)
epa_key = credentials['epa_key']
epa_email = credentials['epa_email']
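The cell above expects `credentials.json` to hold two fields, `epa_key` and `epa_email`. A minimal sketch of a loader that fails early with a readable message when a field is missing; `load_aqs_credentials` is a hypothetical name, and the demo writes placeholder values to a temporary file rather than touching `../data/credentials.json`.

```python
import json
import os
import tempfile


def load_aqs_credentials(path):
    """Return (email, key) from a JSON file shaped like
    {"epa_email": "...", "epa_key": "..."}, raising if either field is empty."""
    with open(path) as file:
        credentials = json.load(file)
    missing = [k for k in ('epa_email', 'epa_key') if not credentials.get(k)]
    if missing:
        raise KeyError(f'missing credential field(s): {", ".join(missing)}')
    return credentials['epa_email'], credentials['epa_key']


# Demo with a throwaway file holding placeholder (not real) credentials
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'credentials.json')
    with open(demo_path, 'w') as file:
        json.dump({'epa_email': 'you@example.com', 'epa_key': 'your-key'}, file)
    demo_email, demo_key = load_aqs_credentials(demo_path)
```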

#### Static Variables

In [3]:
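# US FIPS codes - state level
fips_ca = '06'  # California
fips_co = '08'  # Colorado
fips_ga = '13'  # Georgia
fips_tn = '47'  # Tennessee

# US FIPS codes - county level (unique only within a state)
fips_sd = '073'   # San Diego County, CA
fips_la = '037'   # Los Angeles County, CA
fips_den = '031'  # Denver County, CO
fips_ful = '121'  # Fulton County, GA
fips_dek = '089'  # DeKalb County, GA
fips_dav = '037'  # Davidson County, TN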
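The 3-digit county codes are only unique within a state: `fips_la` and `fips_dav` are both `'037'`, for Los Angeles County, CA and Davidson County, TN. A minimal sketch that keeps each county paired with its state, assuming the variable names above refer to San Diego, Los Angeles, Denver, Fulton, DeKalb, and Davidson counties (the standard FIPS assignments for these codes); `COUNTIES`, `full_fips`, and `county_params` are hypothetical names.

```python
# Hypothetical lookup: county label -> (state FIPS, county FIPS)
COUNTIES = {
    'San Diego, CA': ('06', '073'),
    'Los Angeles, CA': ('06', '037'),
    'Denver, CO': ('08', '031'),
    'Fulton, GA': ('13', '121'),
    'DeKalb, GA': ('13', '089'),
    'Davidson, TN': ('47', '037'),
}


def full_fips(name):
    """Return the 5-digit state+county FIPS code, which is nationally unique."""
    state, county = COUNTIES[name]
    return state + county


def county_params(name):
    """Return the separate 'state' and 'county' values a byCounty request takes."""
    state, county = COUNTIES[name]
    return {'state': state, 'county': county}
```

Looping over `COUNTIES` then yields the correct state for every county, instead of relying on pairing `fips_*` variables by hand.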

#### Obtain the AQS (Air Quality System) Parameter Codes
AQI = Air Quality Index<br>
- The AQI is a nationally uniform color-coded index for reporting and forecasting daily air quality. It is used to report on the most common ambient air pollutants that are regulated under the Clean Air Act: **ground-level _ozone_, particle pollution (_PM10 and PM2.5_), _carbon monoxide (CO)_, _nitrogen dioxide (NO2)_, and _sulfur dioxide (SO2)_.** The AQI tells the public how clean or polluted the air is and how to avoid health effects associated with poor air quality.
- The AQI focuses on health effects that may be experienced within a few hours or days after breathing polluted air and uses a normalized scale from 0 to 500; the higher the AQI value, the greater the level of pollution and the greater the health concern. An AQI value of 100 generally corresponds to the level of the short-term National Ambient Air Quality Standard for the pollutant. **AQI values at and below 100 are generally considered to be satisfactory. When AQI values are above 100, air quality is considered to be unhealthy, at first for members of populations at greatest risk of a health effect, then for the entire population as AQI values get higher (greater than 150).**
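EPA converts a pollutant concentration $C$ to an AQI value by linear interpolation within the breakpoint band that contains it: $I = \frac{I_{hi} - I_{lo}}{C_{hi} - C_{lo}}(C - C_{lo}) + I_{lo}$. A minimal sketch for 8-hour ozone follows; the breakpoint table is assumed from EPA's 8-hour ozone AQI table as I recall it and should be checked against EPA's current AQI Technical Assistance Document. Values above 0.200 ppm are computed from 1-hour ozone and are not handled here. `ozone_aqi` and `aqi_is_satisfactory` are hypothetical names.

```python
import math

# Assumed 8-hour ozone breakpoints (ppm) and AQI bands; verify against EPA's table
OZONE_8HR_BREAKPOINTS = [
    # (C_lo, C_hi, I_lo, I_hi)
    (0.000, 0.054, 0, 50),
    (0.055, 0.070, 51, 100),
    (0.071, 0.085, 101, 150),
    (0.086, 0.105, 151, 200),
    (0.106, 0.200, 201, 300),
]


def ozone_aqi(concentration_ppm):
    """Piecewise-linear AQI for an 8-hour ozone concentration in ppm."""
    # Truncate to 3 decimals; round first so float error (e.g. 0.054 * 1000) is absorbed
    c = math.floor(round(concentration_ppm * 1000, 6)) / 1000
    for c_lo, c_hi, i_lo, i_hi in OZONE_8HR_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            value = (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
            return int(math.floor(value + 0.5))  # round half up to an integer
    raise ValueError('concentration outside the 8-hour ozone table')


def aqi_is_satisfactory(aqi):
    """AQI values at or below 100 are generally considered satisfactory."""
    return aqi <= 100
```

For example, 0.060 ppm falls in the 51 to 100 band and maps to an AQI of about 67, while 0.071 ppm starts the next band at 101.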

#### Filter the Parameter Codes

In [34]:
# Load the AQS parameter code list and shorten the column names
param_codes = pd.read_csv('../data/parameter_codes.csv')
param_codes = param_codes.rename(columns={'Parameter Code': 'code',
                                          'Parameter': 'param',
                                          'Parameter Abbreviation': 'abbr',
                                          'Still Valid': 'valid',
                                          'Standard Units': 'std_units'})
param_codes = (param_codes[['code', 'param', 'abbr', 'valid', 'std_units']]
               .sort_values(by='code')
               .reset_index(drop=True))
param_codes['code'] = param_codes.code.astype('string')

In [36]:
# Keep currently valid codes for PM10/PM2.5, ozone, the CO/SO/NO family, and smoke
param_codes = param_codes.loc[param_codes['valid'] == 'YES']
param_codes = param_codes.loc[
    param_codes.param.str.contains('^PM10|^PM2.5 [STP|Raw|Total]|Ozone', regex=True, na=False, case=False)
    | param_codes.abbr.str.contains('^[CSN]{1}O2{0,1}$|Smoke', regex=True, na=False)
]
print(len(param_codes))
param_codes

13


| | code | param | abbr | valid | std_units |
|---|---|---|---|---|---|
| 534 | 42101 | Carbon monoxide | CO | YES | Parts per million |
| 535 | 42102 | Carbon dioxide | CO2 | YES | Parts per million |
| 549 | 42401 | Sulfur dioxide | SO2 | YES | Parts per billion |
| 555 | 42601 | Nitric oxide (NO) | NO | YES | Parts per billion |
| 556 | 42602 | Nitrogen dioxide (NO2) | NO2 | YES | Parts per billion |
| 855 | 44201 | Ozone | O3 | YES | Parts per million |
| 1029 | 81102 | PM10 Total 0-10um STP | PM10 | YES | Micrograms/cubic meter (25 C) |
| 1030 | 81103 | PM10-2.5 STP | PMC | YES | Micrograms/cubic meter (25 C) |
| 1031 | 81104 | PM2.5 STP | PM2.5 | YES | Micrograms/cubic meter (25 C) |
| 1173 | 85101 | PM10 - LC | LC10 | YES | Micrograms/cubic meter (LC) |
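In the filter above, `[STP|Raw|Total]` is a regex character class: it matches a single character from the set S, T, P, |, R, a, w, o, l (case-insensitively here), not one of the three words, and the unescaped `.` in `PM2.5` matches any character. A minimal sketch with Python's `re.search`, the same search `Series.str.contains` performs when `regex=True`, contrasting the pattern as written with a grouped alternation. The sample names are hypothetical, chosen only to show where the patterns differ.

```python
import re

# Pattern as written in the filter cell vs. a grouped alternation
loose = re.compile(r'^PM2.5 [STP|Raw|Total]', re.IGNORECASE)
strict = re.compile(r'^PM2\.5 (?:STP|Raw|Total)', re.IGNORECASE)

# Hypothetical parameter names, chosen only to show where the two patterns differ
names = ['PM2.5 STP', 'PM2.5 Raw Data', 'PM2.5 Organic Carbon', 'PM2x5 STP',
         'PM2.5 - Local Conditions']
loose_hits = [n for n in names if loose.search(n)]    # any one listed character
strict_hits = [n for n in names if strict.search(n)]  # one of the three words

# The abbreviation pattern ^[CSN]{1}O2{0,1}$ can be written more simply as ^[CSN]O2?$
gas = re.compile(r'^[CSN]O2?$')
gas_hits = [a for a in ['CO', 'CO2', 'SO2', 'NO', 'NO2', 'O3', 'NOx'] if gas.search(a)]
```

The pandas equivalent would be `param_codes.param.str.contains(r'^PM10|^PM2\.5 (?:STP|Raw|Total)|Ozone', regex=True, na=False, case=False)`; the non-capturing `(?:...)` group also avoids the warning pandas may emit for patterns with capture groups. Rerunning the filter with the stricter pattern could change the count of 13.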


#### Obtain the Air Quality Data (Limit of 5 Codes per Request)

In [44]:
endpoint = 'https://aqs.epa.gov/data/api/annualData/byCounty'

# The API accepts at most 5 parameter codes per request, so split the codes into chunks of 5
param_list_full = list(param_codes.code)
param_chunks = [param_list_full[i:i + 5] for i in range(0, len(param_list_full), 5)]

responses = []  # one parsed JSON response per chunk
# Only the first chunk is requested for now; iterate over all of param_chunks to fetch every code
for param_list_limit in param_chunks[:1]:
    params = {'email': epa_email,
              'key': epa_key,
              'param': ','.join(param_list_limit),
              'bdate': '20200101',
              'edate': '20200101',
              'state': fips_ca,
              'county': fips_sd}

    # Wait at least 5 seconds between requests (8 s leaves a margin)
    time.sleep(8)
    resp = requests.get(endpoint, params=params, timeout=60).json()
    responses.append(resp)


In [45]:
resp['Header']

[{'status': 'Success',
  'request_time': '2022-12-09T16:38:21-05:00',
  'url': 'https://aqs.epa.gov/data/api/annualData/byCounty?email=<EPA_EMAIL>&key=<EPA_KEY>&param=42101%2C42102%2C42401%2C42601%2C42602&bdate=20200101&edate=20200101&state=06&county=073',
  'rows': 35}]

In [49]:
pd.DataFrame(resp['Data'])

| | state_code | county_code | site_number | parameter_code | poc | latitude | longitude | datum | parameter | sample_duration_code | ... | fiftieth_percentile | tenth_percentile | local_site_name | site_address | state | county | city | cbsa_code | cbsa | date_of_last_change |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 73 | 1006 | 42601 | 1 | 32.842318 | -116.768293 | NAD83 | Nitric oxide (NO) | 1 | ... | 0.0 | 0.0 | Alpine | 2300 VICTORIA DR., ALPINE | California | San Diego | Alpine | 41740 | San Diego-Carlsbad, CA | 2021-04-18 |
| 1 | 6 | 73 | 1 | 42601 | 1 | 32.631242 | -117.059088 | NAD83 | Nitric oxide (NO) | 1 | ... | 1.0 | 0.0 | Chula Vista | 84 E. 'J' ST., CHULA VISTA | California | San Diego | Chula Vista | 41740 | San Diego-Carlsbad, CA | 2021-04-18 |
| 2 | 6 | 73 | 1022 | 42601 | 3 | 32.789565 | -116.944308 | NAD83 | Nitric oxide (NO) | 1 | ... | 0.5 | 0.0 | El Cajon - Lexington Elementary School | 533 First Street | California | San Diego | El Cajon | 41740 | San Diego-Carlsbad, CA | 2021-04-18 |
| 3 | 6 | 73 | 1014 | 42601 | 1 | 32.57816 | -116.92135 | NAD83 | Nitric oxide (NO) | 1 | ... | 1.0 | 0.0 | Donovan | 480 ALTA RD, OTAY MESA, CA | California | San Diego | Otay Mesa | 41740 | San Diego-Carlsbad, CA | 2021-04-18 |
| 4 | 6 | 73 | 1026 | 42601 | 1 | 32.710177 | -117.142665 | NAD83 | Nitric oxide (NO) | 1 | ... | 1.0 | 0.0 | San Diego - Sherman Elementary School | 450B 24th Street | California | San Diego | San Diego | 41740 | San Diego-Carlsbad, CA | 2021-04-18 |
| 5 | 6 | 73 | 1017 | 42601 | 1 | 32.985442 | -117.08218 | WGS84 | Nitric oxide (NO) | 1 | ... | 6.0 | 1.0 | San Diego -Rancho Carmel Drive | 11403 Rancho Carmel Drive | California | San Diego | San Diego | 41740 | San Diego-Carlsbad, CA | 2021-04-18 |
| 6 | 6 | 73 | 1016 | 42601 | 1 | 32.845709 | -117.123964 | NAD83 | Nitric oxide (NO) | 1 | ... | 0.0 | 0.0 | Kearny Villa Rd. | 6125A KEARNY VILLA RD., SAN DIEGO | California | San Diego | San Diego | 41740 | San Diego-Carlsbad, CA | 2021-04-18 |
| 7 | 6 | 73 | 1008 | 42601 | 1 | 33.217055 | -117.396177 | NAD83 | Nitric oxide (NO) | 1 | ... | 0.0 | 0.0 | Camp Pendleton | 21441-W B STREET | California | San Diego | Camp Pendleton South | 41740 | San Diego-Carlsbad, CA | 2021-04-18 |
| 8 | 6 | 73 | 1006 | 42602 | 1 | 32.842318 | -116.768293 | NAD83 | Nitrogen dioxide (NO2) | 1 | ... | 3.0 | 1.0 | Alpine | 2300 VICTORIA DR., ALPINE | California | San Diego | Alpine | 41740 | San Diego-Carlsbad, CA | 2022-04-20 |
| 9 | 6 | 73 | 1 | 42602 | 1 | 32.631242 | -117.059088 | NAD83 | Nitrogen dioxide (NO2) | 1 | ... | 5.0 | 1.0 | Chula Vista | 84 E. 'J' ST., CHULA VISTA | California | San Diego | Chula Vista | 41740 | San Diego-Carlsbad, CA | 2021-10-31 |
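Each request returns its own JSON object with a `Header` list and a `Data` list of row records, as in the two outputs above, so results from several 5-code requests have to be stitched together. A minimal sketch, assuming that any header status other than `'Success'` means the response carries no usable rows (I have not verified every status string the API can return); `combine_aqs_responses` is a hypothetical name and the demo responses are made up.

```python
def combine_aqs_responses(responses):
    """Merge the 'Data' rows of several AQS JSON responses into one list.

    Returns (rows, problems): rows is a flat list of record dicts, and problems
    lists (index, status) for responses whose header status was not 'Success'.
    """
    rows, problems = [], []
    for i, resp in enumerate(responses):
        header = (resp.get('Header') or [{}])[0]  # Header is a one-element list
        status = header.get('status', 'missing header')
        if status != 'Success':
            problems.append((i, status))
            continue
        rows.extend(resp.get('Data', []))
    return rows, problems


# Made-up responses shaped like the 'Header'/'Data' output above
demo = [
    {'Header': [{'status': 'Success', 'rows': 2}],
     'Data': [{'parameter_code': '42601'}, {'parameter_code': '42602'}]},
    {'Header': [{'status': 'Failed'}], 'Data': []},
    {'Header': [{'status': 'Success', 'rows': 1}],
     'Data': [{'parameter_code': '44201'}]},
]
rows, problems = combine_aqs_responses(demo)
```

Given a list of parsed responses from the request loop, `pd.DataFrame(rows)` then produces a single table across all parameter codes, and `problems` shows which chunks need a retry or a closer look.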


In [24]:
parameter_codes.to_csv('../data/parameter_codes.csv')

In [None]:
endpoint = 'https://aqs.epa.gov/data/api/annualData/byCounty'
# Example query string:
# ?email=test@aqs.api&key=test&param=88101,88502&bdate=20160101&edate=20160229&state=37&county=183
# Required parameters: email, key, param, bdate, edate, state, county

request_params = ','.join(param_list_limit)

params = {'email': epa_email,
          'key': epa_key,
          'param': request_params,
          'bdate': '20190101',
          'edate': '20200101',
          'state': fips_ca,
          'county': fips_sd}

# Wait at least 5 seconds between loop iterations
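Before spending a rate-limited request, the query can be checked and built offline. A minimal sketch using only the standard library, assuming the seven required parameters listed in the comment above and the 5-codes-per-request limit noted earlier; `build_aqs_url` is a hypothetical helper, and `urlencode` percent-encodes `,` as `%2C` and `@` as `%40`, the same encoding `requests` applies to `params`.

```python
from urllib.parse import urlencode

AQS_REQUIRED = ('email', 'key', 'param', 'bdate', 'edate', 'state', 'county')


def build_aqs_url(endpoint, params):
    """Return the full GET URL after checking that every required parameter
    is present and that no more than 5 parameter codes are requested."""
    missing = [name for name in AQS_REQUIRED if not params.get(name)]
    if missing:
        raise ValueError(f'missing required parameter(s): {missing}')
    if len(params['param'].split(',')) > 5:
        raise ValueError('at most 5 parameter codes per request')
    return endpoint + '?' + urlencode(params)


example_url = build_aqs_url(
    'https://aqs.epa.gov/data/api/annualData/byCounty',
    {'email': 'test@aqs.api', 'key': 'test', 'param': '88101,88502',
     'bdate': '20160101', 'edate': '20160229', 'state': '37', 'county': '183'},
)
```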

In [None]:
resp = requests.get(endpoint, params=params).json()

In [None]:
# The rows live under 'Data'; 'Header' holds request metadata and has a different length
resp = pd.DataFrame(resp['Data'])

In [None]:
resp