# US EPA Air Quality Index

This notebook is a supplementary document for the common analysis. I obtained the AQI data using the US Environmental Protection Agency (EPA) Air Quality System (AQS) API. This is a historical API and does not provide real-time air quality data.

## Preliminaries and Constants

In [65]:
# 
#    These are standard python modules
#
import json, time
#
#    The 'requests' module is a third-party package (not part of the standard library) for making web requests.
#
import requests

import pandas as pd

from tqdm import tqdm


In [44]:
#########
#
#    CONSTANTS
#

#
#    This is the root of all AQS API URLs
#
API_REQUEST_URL = 'https://aqs.epa.gov/data/api'

#
#    These are 'actions' we can ask the API to take or requests that we can make of the API
#
#    Sign-up request - generally only performed once - unless you lose your key
API_ACTION_SIGNUP = '/signup?email={email}'
#
#    List actions provide information on API parameter values that are required by some other actions/requests
API_ACTION_LIST_CLASSES = '/list/classes?email={email}&key={key}'
API_ACTION_LIST_PARAMS = '/list/parametersByClass?email={email}&key={key}&pc={pclass}'
API_ACTION_LIST_SITES = '/list/sitesByCounty?email={email}&key={key}&state={state}&county={county}'
#
#    Monitor actions are requests for monitoring stations that meet specific criteria
API_ACTION_MONITORS_COUNTY = '/monitors/byCounty?email={email}&key={key}&param={param}&bdate={begin_date}&edate={end_date}&state={state}&county={county}'
API_ACTION_MONITORS_BOX = '/monitors/byBox?email={email}&key={key}&param={param}&bdate={begin_date}&edate={end_date}&minlat={minlat}&maxlat={maxlat}&minlon={minlon}&maxlon={maxlon}'
#
#    Summary actions are requests for summary data. These are for daily summaries
API_ACTION_DAILY_SUMMARY_COUNTY = '/dailyData/byCounty?email={email}&key={key}&param={param}&bdate={begin_date}&edate={end_date}&state={state}&county={county}'
# API_ACTION_DAILY_SUMMARY_COUNTY = '/annualData/byCounty?email={email}&key={key}&param={param}&bdate={begin_date}&edate={end_date}&state={state}&county={county}'
API_ACTION_DAILY_SUMMARY_BOX = '/dailyData/byBox?email={email}&key={key}&param={param}&bdate={begin_date}&edate={end_date}&minlat={minlat}&maxlat={maxlat}&minlon={minlon}&maxlon={maxlon}'
#
#    It is always nice to be respectful of a free data resource.
#    We're going to observe a 100 requests per minute limit - which is fairly nice
API_LATENCY_ASSUMED = 0.002       # Assuming roughly 2ms latency on the API and network
API_THROTTLE_WAIT = (60.0/100.0)-API_LATENCY_ASSUMED   # seconds to wait before each request, i.e. about 100 requests per minute
#
#
#    This is a template that covers most of the parameters for the actions we might take, from the set of actions
#    above. In the examples below, most of the time parameters can either be supplied as individual values to a
#    function - or they can be set in a copy of the template and passed in with the template.
# 
AQS_REQUEST_TEMPLATE = {
    "email":      "",     
    "key":        "",      
    "state":      "",     # the two digit state FIPS # as a string
    "county":     "",     # the three digit county FIPS # as a string
    "begin_date": "",     # the start of a time window in YYYYMMDD format
    "end_date":   "",     # the end of a time window in YYYYMMDD format, begin_date and end_date must be in the same year
    "minlat":    0.0,
    "maxlat":    0.0,
    "minlon":    0.0,
    "maxlon":    0.0,
    "param":     "",     # a list of comma separated 5 digit codes, max 5 codes requested
    "pclass":    ""      # parameter class is only used by the List calls
}

#    Replace these with your own email address (used as the AQS user name) and API key
USERNAME = ''
APIKEY = ''
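The throttle wait is simple arithmetic: to stay at or below N requests per minute, consecutive requests should start at least 60/N seconds apart, and the assumed request latency can be subtracted from that gap. A minimal sketch of the calculation, assuming the same 2 ms latency estimate as above (`throttle_wait` is a hypothetical helper, not part of the notebook):

```python
def throttle_wait(requests_per_minute: float, assumed_latency: float = 0.002) -> float:
    """Seconds to sleep before each request so the rate stays at or below requests_per_minute."""
    # Minimum spacing between the starts of two consecutive requests, in seconds
    interval = 60.0 / requests_per_minute
    # Subtract the assumed request latency, but never return a negative wait
    return max(interval - assumed_latency, 0.0)

# 100 requests per minute -> roughly 0.6 s between requests
print(round(throttle_wait(100), 3))
```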

## Sign up for API Key

The code from this section should only be executed once to obtain the API key.

In [4]:
#
#    This implements the sign-up request. The parameters are standardized so that this function definition matches
#    all of the others. However, the easiest way to call this is to simply call this function with your preferred
#    email address.
#
def request_signup(email_address = None,
                   endpoint_url = API_REQUEST_URL, 
                   endpoint_action = API_ACTION_SIGNUP, 
                   request_template = AQS_REQUEST_TEMPLATE,
                   headers = None):
    
    # Make sure we have an email address - if you don't have access to this email address, things might go badly for you
    if email_address:
        request_template['email'] = email_address        
    if not request_template['email']: 
        raise Exception("Must supply an email address to call 'request_signup()'")
    
    # Compose the signup url - create a request URL by combining the endpoint_url with the parameters for the request
    request_url = endpoint_url+endpoint_action.format(**request_template)
        
    # make the request
    try:
        # Wait first, to make sure we don't exceed a rate limit in the situation where an exception occurs
        # during the request processing - throttling is always a good practice with a free data source
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)
        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        print(e)
        json_response = None
    return json_response
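`request_signup()` composes its URL with `str.format()`, which inserts the email address verbatim. In a query string, a `+` is commonly decoded as a space, so an address like `first+aqs@example.com` may not arrive intact. A sketch of composing the same sign-up URL with the standard library's `urllib.parse.urlencode`, which percent-encodes reserved characters (`build_signup_url` is a hypothetical helper and the address is made up):

```python
from urllib.parse import urlencode

API_REQUEST_URL = 'https://aqs.epa.gov/data/api'

def build_signup_url(email: str, root: str = API_REQUEST_URL) -> str:
    """Compose the sign-up URL with the query string percent-encoded."""
    # urlencode escapes reserved characters, e.g. '@' -> %40 and '+' -> %2B
    return f"{root}/signup?{urlencode({'email': email})}"

print(build_signup_url("first+aqs@example.com"))
# https://aqs.epa.gov/data/api/signup?email=first%2Baqs%40example.com
```

The same approach would work for the other actions by passing every query parameter in the dictionary given to `urlencode`.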




In [5]:
#
#    A SIGNUP request is only to be done once, to request a key. A key is sent to that email address and needs to be confirmed with a click through
#    This code should probably be commented out after you've made your key request to make sure you don't accidentally make a new sign-up request
#
print("Requesting SIGNUP ...")
response = request_signup("gjwong@uw.edu")
print(json.dumps(response,indent=4))
#

Requesting SIGNUP ...
{
    "Header": [
        {
            "status": "Success",
            "request_time": "2023-11-06T02:43:09-05:00",
            "url": "https://aqs.epa.gov/data/api/signup?email=gjwong@uw.edu"
        }
    ],
    "Data": [
        "You should receive a registration confirmation email with a link for confirming your email shortly."
    ]
}


## Make a List Request

In [7]:
#
#    This implements the list request. There are several versions of the list request that only require email and key.
#    This code sets the default action/requests to list the groups or parameter class descriptors. Having those descriptors 
#    allows one to request the individual (proprietary) 5 digit codes for individual air quality measures by using the
#    param request. Some code in later cells will illustrate those requests.
#
def request_list_info(email_address = None, key = None,
                      endpoint_url = API_REQUEST_URL, 
                      endpoint_action = API_ACTION_LIST_CLASSES, 
                      request_template = AQS_REQUEST_TEMPLATE,
                      headers = None):
    
    #  Make sure we have email and key - at least
    #  This prioritizes the info from the call parameters - not what's already in the template
    if email_address:
        request_template['email'] = email_address
    if key:
        request_template['key'] = key
    
    # For the basic request we need an email address and a key
    if not request_template['email']:
        raise Exception("Must supply an email address to call 'request_list_info()'")
    if not request_template['key']: 
        raise Exception("Must supply a key to call 'request_list_info()'")

    # compose the request
    request_url = endpoint_url+endpoint_action.format(**request_template)
        
    # make the request
    try:
        # Wait first, to make sure we don't exceed a rate limit in the situation where an exception occurs
        # during the request processing - throttling is always a good practice with a free data source
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)
        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        print(e)
        json_response = None
    return json_response



To obtain the AQI data, we first need to determine which parameter codes to request. The following code lists the available parameter classes; we need a class name to start getting down to the sensor IDs.

In [8]:
#
#   The default should get us a list of the various groups or classes of sensors. These classes are user defined names for clusters of
#   sensors that might be part of a package or default air quality sensing station. We need a class name to start getting down to
#   a sensor ID. Each sensor type has an ID number. We'll eventually need those ID numbers to be able to request values that come from
#   that specific sensor.
#
request_data = AQS_REQUEST_TEMPLATE.copy()
request_data['email'] = USERNAME
request_data['key'] = APIKEY

response = request_list_info(request_template=request_data)

if response["Header"][0]['status'] == "Success":
    print(json.dumps(response['Data'],indent=4))
else:
    print(json.dumps(response,indent=4))


[
    {
        "code": "AIRNOW MAPS",
        "value_represented": "The parameters represented on AirNow maps (88101, 88502, and 44201)"
    },
    {
        "code": "ALL",
        "value_represented": "Select all Parameters Available"
    },
    {
        "code": "AQI POLLUTANTS",
        "value_represented": "Pollutants that have an AQI Defined"
    },
    {
        "code": "CORE_HAPS",
        "value_represented": "Urban Air Toxic Pollutants"
    },
    {
        "code": "CRITERIA",
        "value_represented": "Criteria Pollutants"
    },
    {
        "code": "CSN DART",
        "value_represented": "List of CSN speciation parameters to populate the STI DART tool"
    },
    {
        "code": "FORECAST",
        "value_represented": "Parameters routinely extracted by AirNow (STI)"
    },
    {
        "code": "HAPS",
        "value_represented": "Hazardous Air Pollutants"
    },
    {
        "code": "IMPROVE CARBON",
        "value_represented": "IMPROVE Carbon Parameters"
    }

In [9]:
#
#   Once we have a list of the classes or groups of possible sensors, we can find the sensor IDs that make up that class (group)
#   The one that looks to be associated with the Air Quality Index is "AQI POLLUTANTS"
#   We'll use that to make another list request.
#
AQI_PARAM_CLASS = "AQI POLLUTANTS"


The following code snippet finds the pollutant codes in the AQI parameter class.

In [10]:
#
#   Structure a request to get the sensor IDs associated with the AQI
#
request_data = AQS_REQUEST_TEMPLATE.copy()
request_data['email'] = USERNAME
request_data['key'] = APIKEY
request_data['pclass'] = AQI_PARAM_CLASS  # here we specify that we want this 'pclass' or parameter class

response = request_list_info(request_template=request_data, endpoint_action=API_ACTION_LIST_PARAMS)

if response["Header"][0]['status'] == "Success":
    print(json.dumps(response['Data'],indent=4))
else:
    print(json.dumps(response,indent=4))


[
    {
        "code": "42101",
        "value_represented": "Carbon monoxide"
    },
    {
        "code": "42401",
        "value_represented": "Sulfur dioxide"
    },
    {
        "code": "42602",
        "value_represented": "Nitrogen dioxide (NO2)"
    },
    {
        "code": "44201",
        "value_represented": "Ozone"
    },
    {
        "code": "81102",
        "value_represented": "PM10 Total 0-10um STP"
    },
    {
        "code": "88101",
        "value_represented": "PM2.5 - Local Conditions"
    },
    {
        "code": "88502",
        "value_represented": "Acceptable PM2.5 AQI & Speciation Mass"
    }
]
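Each entry in the `Data` list pairs a `code` with a `value_represented` description. A small sketch that turns such a list into a lookup table, using a few entries copied from the output above as sample data (`codes_to_names` is a hypothetical helper):

```python
def codes_to_names(data):
    """Map each parameter code to its description from a list-request 'Data' payload."""
    return {item["code"]: item["value_represented"] for item in data}

# Sample entries copied from the AQI POLLUTANTS list response above
sample_data = [
    {"code": "42101", "value_represented": "Carbon monoxide"},
    {"code": "44201", "value_represented": "Ozone"},
    {"code": "88101", "value_represented": "PM2.5 - Local Conditions"},
]

aqi_names = codes_to_names(sample_data)
print(aqi_names["44201"])  # Ozone
```

With the live response, the same call would be `codes_to_names(response['Data'])`.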


The response above contains a set of sensor ID numbers (parameter codes), each with a description or name for the sensor.

The EPA AQS API has limits on some call parameters. Specifically, when we request data we can only specify a maximum of 5 parameter (sensor) codes per request. Since the AQI class contains seven codes, we cannot get all of the Air Quality Index parameters in one request and have to break it up.

Below, I broke the request into two logical groups: the AQI sensors that sample gases and the AQI sensors that sample particles in the air.
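Splitting the codes by hand works for seven codes. A more general sketch, assuming only the five-code limit described above, chunks any list of codes into comma-separated `param` strings (`chunk_params` is a hypothetical helper; unlike the gas/particle split in the next cell, it simply groups codes in list order):

```python
def chunk_params(codes, max_codes=5):
    """Split parameter codes into comma-separated strings of at most max_codes each."""
    return [",".join(codes[i:i + max_codes]) for i in range(0, len(codes), max_codes)]

# The seven AQI parameter codes returned by the list request above
aqi_codes = ["42101", "42401", "42602", "44201", "81102", "88101", "88502"]
print(chunk_params(aqi_codes))
# ['42101,42401,42602,44201,81102', '88101,88502']
```

Grouping by pollutant type, as done below, has the advantage that each request can use a bounding box suited to that group of monitors.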

In [11]:
#
#   Given the set of sensor codes, now we can create a parameter list or 'param' value as defined by the AQS API spec.
#   It turns out that we want all of these measures for AQI, but we need to have two different param constants to get
#   all seven of the code types, because each param value can request at most 5 sensors/values.
#
#   Gaseous AQI pollutants CO, SO2, NO2, and O3 (ozone)
AQI_PARAMS_GASEOUS = "42101,42401,42602,44201"
#
#   Particulate AQI pollutants PM10, PM2.5, and Acceptable PM2.5
AQI_PARAMS_PARTICULATES = "81102,88101,88502"
#   
#

The following dictionary holds the location information for Cedar City, Utah.


In [12]:
#
#   We'll use Cedar City, Utah.
#
CITY_LOCATIONS = {
    'cedar_city' :    {'city'   : 'Cedar City',
                       'county' : 'Iron',
                       'state'  : 'Utah',
                       'fips'   : '49021',
                       'latlon' : [37.6775, -113.0619] }, 
}


## Bounding Box Approach

Before using the bounding box approach, I tried to find all monitors with the county approach (requesting monitors for Iron County). However, the county approach did not return enough AQI data for the analysis, so I switched to requesting monitors and AQI data within a bounding box centered on Cedar City.

In [62]:
#
#   These are rough estimates for creating bounding boxes based on a city location
#   You can find these rough estimates on the USGS website:
#   https://www.usgs.gov/faqs/how-much-distance-does-a-degree-minute-and-second-cover-your-maps
#
LAT_25MILES = 25.0 * (1.0/69.0)    # This is about 25 miles of latitude in decimal degrees
LON_25MILES = 25.0 * (1.0/54.6)    # This is about 25 miles of longitude in decimal degrees
#
#   Compute a rough estimate of a bounding box around a given place.
#   The bounding box is scaled in 50 mile increments; that is, the bounding box will have sides that
#   are rough multiples of 50 miles, with the center of the box at the indicated place.
#   The scale parameter determines the scale (size) of the bounding box.
#
def bounding_latlon(place=None,scale=1.0):
    minlat = place['latlon'][0] - float(scale) * LAT_25MILES
    maxlat = place['latlon'][0] + float(scale) * LAT_25MILES
    minlon = place['latlon'][1] - float(scale) * LON_25MILES
    maxlon = place['latlon'][1] + float(scale) * LON_25MILES
    return [minlat,maxlat,minlon,maxlon]
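`LON_25MILES` relies on one degree of longitude spanning about 54.6 miles, which only holds near a particular latitude, because meridians converge toward the poles. A sketch of the usual spherical-Earth approximation, assuming roughly 69.17 miles per degree of longitude at the equator, suggests the figure is a reasonable fit for Cedar City (`miles_per_degree_lon` is a hypothetical helper):

```python
import math

# Assumed approximate length of one degree of longitude at the equator, in miles
MILES_PER_DEGREE_EQUATOR = 69.172

def miles_per_degree_lon(latitude_deg: float) -> float:
    """Approximate miles spanned by one degree of longitude at a latitude (spherical Earth)."""
    return MILES_PER_DEGREE_EQUATOR * math.cos(math.radians(latitude_deg))

# Cedar City, Utah sits near 37.68 degrees north
print(round(miles_per_degree_lon(37.6775), 1))
```

For Cedar City this gives about 54.7 miles, close to the 54.6 used above, so the box sizes are only slightly off from their nominal 50-mile multiples.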



In [66]:
#
#    This implements the monitors request. This requests monitoring stations. This can be done by state, county, or bounding box. 
#
#    Like the two other functions, this can be called with a mixture of a defined parameter dictionary, or with function
#    parameters. If function parameters are provided, those take precedence over any parameters from the request template.
#
def request_monitors(email_address = None, key = None, param=None,
                          begin_date = None, end_date = None, fips = None,
                          endpoint_url = API_REQUEST_URL, 
                          endpoint_action = API_ACTION_MONITORS_COUNTY, 
                          request_template = AQS_REQUEST_TEMPLATE,
                          headers = None):
    
    #  This prioritizes the info from the call parameters - not what's already in the template
    if email_address:
        request_template['email'] = email_address
    if key:
        request_template['key'] = key
    if param:
        request_template['param'] = param
    if begin_date:
        request_template['begin_date'] = begin_date
    if end_date:
        request_template['end_date'] = end_date
    if fips and len(fips)==5:
        request_template['state'] = fips[:2]
        request_template['county'] = fips[2:]            

    # Make sure there are values that allow us to make a call - these are always required
    if not request_template['email']:
        raise Exception("Must supply an email address to call 'request_monitors()'")
    if not request_template['key']: 
        raise Exception("Must supply a key to call 'request_monitors()'")
    if not request_template['param']: 
        raise Exception("Must supply param values to call 'request_monitors()'")
    if not request_template['begin_date']: 
        raise Exception("Must supply a begin_date to call 'request_monitors()'")
    if not request_template['end_date']: 
        raise Exception("Must supply an end_date to call 'request_monitors()'")
    # Note we're not validating FIPS fields because not all of the monitors actions require the FIPS numbers
    
    # compose the request
    request_url = endpoint_url+endpoint_action.format(**request_template)
    
    # make the request
    try:
        # Wait first, to make sure we don't exceed a rate limit in the situation where an exception occurs
        # during the request processing - throttling is always a good practice with a free data source
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)
        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        print(e)
        json_response = None
    return json_response


The following code tries out different bounding box sizes. For the particulate pollutants, the 150-mile box (scale=3.0) works best, since it returns the most data of the box sizes that stay within a reasonable distance of the city.

In [73]:
#    Get monitoring stations for PARTICULATES
#
#
#    Create a copy of the AQS_REQUEST_TEMPLATE
#
request_data = AQS_REQUEST_TEMPLATE.copy()
request_data['email'] = USERNAME
request_data['key'] = APIKEY
request_data['param'] = AQI_PARAMS_PARTICULATES     # request monitors for the particulate AQI pollutants
# 
#
#   Now, we need bounding box parameters

#   50 mile box
# bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=1.0)
#   100 mile box
# bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=2.0)
#   150 mile box
bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=3.0)
#   200 mile box
#bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=4.0)

# the bbox response comes back as a list - [minlat,maxlat,minlon,maxlon]

#   put our bounding box into the request_data
request_data['minlat'] = bbox[0]
request_data['maxlat'] = bbox[1]
request_data['minlon'] = bbox[2]
request_data['maxlon'] = bbox[3]

#
#   we need to change the action for the API from the default to the bounding box - same recent date for now
response = request_monitors(request_template=request_data, begin_date="20210701", end_date="20210731",
                            endpoint_action = API_ACTION_MONITORS_BOX)
#
#
#
if response["Header"][0]['status'] == "Success":
    print(json.dumps(response['Data'],indent=4))
else:
    print(json.dumps(response,indent=4))


[
    {
        "state_code": "49",
        "county_code": "017",
        "site_number": "0101",
        "parameter_code": "88502",
        "poc": 1,
        "parameter_name": "Acceptable PM2.5 AQI & Speciation Mass",
        "open_date": "1988-03-02",
        "close_date": null,
        "concurred_exclusions": null,
        "dominant_source": null,
        "measurement_scale": "REGIONAL SCALE",
        "measurement_scale_def": "50 TO HUNDREDS KM",
        "monitoring_objective": "GENERAL/BACKGROUND",
        "last_method_code": "707",
        "last_method_description": "IMPROVE Module A with Cyclone Inlet-Teflon Filter, 2.2 sq. cm. - GRAVIMETRIC",
        "last_method_begin_date": "1999-11-06",
        "naaqs_primary_monitor": null,
        "qa_primary_monitor": null,
        "monitor_type": "EPA",
        "networks": "IMPROVE",
        "monitoring_agency_code": "0745",
        "monitoring_agency": "National Park Service",
        "si_id": 93251,
        "latitude": 37.618383,
       

For the gaseous pollutant monitors, the 100-mile box (scale=2.0) works well.

In [81]:
#
#    Get monitoring stations for GASEOUS
#
request_data['param'] = AQI_PARAMS_GASEOUS

#   Now, we need bounding box parameters

#   50 mile box
# bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=1.0)
#   100 mile box
bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=2.0)
#   150 mile box
# bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=3.0)
#   200 mile box
# bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=4.0)

# the bbox response comes back as a list - [minlat,maxlat,minlon,maxlon]

#   put our bounding box into the request_data
request_data['minlat'] = bbox[0]
request_data['maxlat'] = bbox[1]
request_data['minlon'] = bbox[2]
request_data['maxlon'] = bbox[3]

#
#   we need to change the action for the API from the default to the bounding box - same recent date for now
response = request_monitors(request_template=request_data, begin_date="20210701", end_date="20210731",
                            endpoint_action = API_ACTION_MONITORS_BOX)
#
#
#
if response["Header"][0]['status'] == "Success":
    print(json.dumps(response['Data'],indent=4))
else:
    print(json.dumps(response,indent=4))

[
    {
        "state_code": "49",
        "county_code": "053",
        "site_number": "0130",
        "parameter_code": "44201",
        "poc": 1,
        "parameter_name": "Ozone",
        "open_date": "2004-01-12",
        "close_date": null,
        "concurred_exclusions": null,
        "dominant_source": null,
        "measurement_scale": "REGIONAL SCALE",
        "measurement_scale_def": "50 TO HUNDREDS KM",
        "monitoring_objective": "GENERAL/BACKGROUND; WELFARE RELATED IMPACTS",
        "last_method_code": "047",
        "last_method_description": "INSTRUMENTAL - ULTRA VIOLET",
        "last_method_begin_date": "2004-01-12",
        "naaqs_primary_monitor": "Y",
        "qa_primary_monitor": null,
        "monitor_type": "NON-EPA FEDERAL",
        "networks": "CASTNET",
        "monitoring_agency_code": "0745",
        "monitoring_agency": "National Park Service",
        "si_id": 91905,
        "latitude": 37.1983,
        "longitude": -113.1506,
        "datum": "WGS84
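One way to compare box sizes is to count how many distinct monitoring sites report each parameter. A sketch using the `state_code`, `county_code`, `site_number`, and `parameter_code` fields that appear in the outputs in this notebook; the records are abbreviated from those outputs, and `sites_per_parameter` is a hypothetical helper:

```python
from collections import defaultdict

def sites_per_parameter(monitors):
    """Count distinct monitoring sites for each parameter code in a monitors 'Data' payload."""
    sites = defaultdict(set)
    for m in monitors:
        # A site is identified by its state, county, and site number
        site_id = (m["state_code"], m["county_code"], m["site_number"])
        sites[m["parameter_code"]].add(site_id)
    return {code: len(ids) for code, ids in sites.items()}

# Abbreviated records from this notebook's outputs; the Bryce Canyon record is
# repeated to show that a site is only counted once per parameter
monitors = [
    {"state_code": "49", "county_code": "017", "site_number": "0101", "parameter_code": "88502"},
    {"state_code": "49", "county_code": "017", "site_number": "0101", "parameter_code": "88502"},
    {"state_code": "49", "county_code": "021", "site_number": "0004", "parameter_code": "81102"},
    {"state_code": "32", "county_code": "003", "site_number": "0024", "parameter_code": "88101"},
]
print(sites_per_parameter(monitors))
# {'88502': 1, '81102': 1, '88101': 1}
```

Running it on `response['Data']` for each `scale` value would give a quick side-by-side comparison of how coverage grows with the box.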

## Make Daily Summary Request

In [80]:
#
#    This implements the daily summary request. Daily summary provides a daily summary value for each sensor being requested
#    from the start date to the end date. 
#
#    Like the other request functions, this can be called with a mixture of a defined parameter dictionary, or with function
#    parameters. If function parameters are provided, those take precedence over any parameters from the request template.
#
def request_daily_summary(email_address = None, key = None, param=None,
                          begin_date = None, end_date = None, fips = None,
                          endpoint_url = API_REQUEST_URL, 
                          endpoint_action = API_ACTION_DAILY_SUMMARY_BOX, 
                          request_template = AQS_REQUEST_TEMPLATE,
                          headers = None):
    
    #  This prioritizes the info from the call parameters - not what's already in the template
    if email_address:
        request_template['email'] = email_address
    if key:
        request_template['key'] = key
    if param:
        request_template['param'] = param
    if begin_date:
        request_template['begin_date'] = begin_date
    if end_date:
        request_template['end_date'] = end_date
    if fips and len(fips)==5:
        request_template['state'] = fips[:2]
        request_template['county'] = fips[2:]            

    # Make sure there are values that allow us to make a call - these are always required
    if not request_template['email']:
        raise Exception("Must supply an email address to call 'request_daily_summary()'")
    if not request_template['key']: 
        raise Exception("Must supply a key to call 'request_daily_summary()'")
    if not request_template['param']: 
        raise Exception("Must supply param values to call 'request_daily_summary()'")
    if not request_template['begin_date']: 
        raise Exception("Must supply a begin_date to call 'request_daily_summary()'")
    if not request_template['end_date']: 
        raise Exception("Must supply an end_date to call 'request_daily_summary()'")
    # Note we're not validating FIPS fields because not all of the daily summary actions require the FIPS numbers
        
    # compose the request
    request_url = endpoint_url+endpoint_action.format(**request_template)
        
    # make the request
    try:
        # Wait first, to make sure we don't exceed a rate limit in the situation where an exception occurs
        # during the request processing - throttling is always a good practice with a free data source
        if API_THROTTLE_WAIT > 0.0:
            time.sleep(API_THROTTLE_WAIT)
        response = requests.get(request_url, headers=headers)
        json_response = response.json()
    except Exception as e:
        print(e)
        json_response = None
    return json_response



To run the daily summary requests more efficiently, I wrote the following functions, which fetch all gaseous or all particulate AQI data for one fire season. The annual fire season runs from May 1 through October 31.

In [83]:
request_data = AQS_REQUEST_TEMPLATE.copy()
request_data['email'] = USERNAME
request_data['key'] = APIKEY


# This function generates gaseous aqi for 1 year
def yearly_gaseous_aqi(year, request_data):
    request_data['param'] = AQI_PARAMS_GASEOUS
    #   100 mile box
    bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=2.0)
    request_data['minlat'] = bbox[0]
    request_data['maxlat'] = bbox[1]
    request_data['minlon'] = bbox[2]
    request_data['maxlon'] = bbox[3]
    
    # request daily summary data for 1 fire season
    gaseous_aqi = request_daily_summary(request_template=request_data, begin_date=f"{year}0501", end_date=f"{year}1031")
    print(f"Response for the gaseous pollutants in {year}...")
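    #   request_daily_summary() returns None if the request or JSON decoding failed
    if gaseous_aqi is None:
        return None
    if gaseous_aqi["Header"][0]['status'] == "Success":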
        return gaseous_aqi['Data']
    elif gaseous_aqi["Header"][0]['status'].startswith("No data "):
        print("Looks like the response generated no data. You might take a closer look at your request and the response data.")
    else:
        print(json.dumps(gaseous_aqi,indent=4))

# This function generates particulate aqi for 1 year
def yearly_particulate_aqi(year, request_data):
    request_data['param'] = AQI_PARAMS_PARTICULATES
    #   150 mile box
    bbox = bounding_latlon(CITY_LOCATIONS['cedar_city'],scale=3.0)
    request_data['minlat'] = bbox[0]
    request_data['maxlat'] = bbox[1]
    request_data['minlon'] = bbox[2]
    request_data['maxlon'] = bbox[3]
    
    # request daily summary data for 1 fire season
    particulate_aqi = request_daily_summary(request_template=request_data, begin_date=f"{year}0501", end_date=f"{year}1031")
    print(f"Response for the particulate pollutants in {year}...")
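    #   request_daily_summary() returns None if the request or JSON decoding failed
    if particulate_aqi is None:
        return None
    if particulate_aqi["Header"][0]['status'] == "Success":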
        return particulate_aqi['Data']
    elif particulate_aqi["Header"][0]['status'].startswith("No data "):
        print("Looks like the response generated no data. You might take a closer look at your request and the response data.")
    else:
        print(json.dumps(particulate_aqi,indent=4))
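Because `begin_date` and `end_date` must fall in the same year, each fire season is requested separately. A sketch of generating those per-year windows in the `YYYYMMDD` format the request template expects, using the May 1 to October 31 season described above (`fire_season_windows` is a hypothetical helper):

```python
from datetime import date

def fire_season_windows(first_year, last_year):
    """Yield (begin_date, end_date) strings in YYYYMMDD format for each fire season."""
    for year in range(first_year, last_year + 1):
        begin = date(year, 5, 1)
        end = date(year, 10, 31)
        # Both dates share a year, satisfying the same-year restriction on requests
        yield begin.strftime("%Y%m%d"), end.strftime("%Y%m%d")

windows = list(fire_season_windows(1970, 2023))
print(len(windows), windows[0], windows[-1])
# 54 ('19700501', '19701031') ('20230501', '20231031')
```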

To obtain the gaseous and particulate AQI data, the loops below iterate over every fire season from 1970 through 2023 and collect the results in separate lists. We start in 1970 because that is when the EPA was founded; no data were recorded from 1963 to 1970.

In [88]:
# Obtain all gas aqi
gas_aqi_list = []
# Start at 1970: no data were recorded from 1963 to 1970
for year in range(1970, 2024):
    gas_aqi_year = yearly_gaseous_aqi(year, request_data)
    if gas_aqi_year is None:
        continue
    
    gas_aqi_list += gas_aqi_year

df_gaseous_aqi = pd.DataFrame(gas_aqi_list)

df_gaseous_aqi

Response for the gaseous pollutants in 1970...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the gaseous pollutants in 1971...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the gaseous pollutants in 1972...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the gaseous pollutants in 1973...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the gaseous pollutants in 1974...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the gaseous pollutants in 1975...
Response for the gaseous pollutants in 1976...
Response for the gaseous pollutants in 1977...
Response for the gaseous pollutants in 1978...
Response for the gaseo

Unnamed: 0,state_code,county_code,site_number,parameter_code,poc,latitude,longitude,datum,parameter,sample_duration_code,...,method_code,method,local_site_name,site_address,state,county,city,cbsa_code,cbsa,date_of_last_change
0,49,021,0001,42401,1,37.677477,-113.059671,WGS84,Sulfur dioxide,1,...,013,INSTRUMENTAL - CONDUCTIMETRIC,,650 WEST CENTER STREET CEDAR CITY UTAH,Utah,Iron,Cedar City,16260,"Cedar City, UT",2013-06-11
1,49,021,0001,42401,1,37.677477,-113.059671,WGS84,Sulfur dioxide,1,...,013,INSTRUMENTAL - CONDUCTIMETRIC,,650 WEST CENTER STREET CEDAR CITY UTAH,Utah,Iron,Cedar City,16260,"Cedar City, UT",2013-06-11
2,49,021,0001,42401,1,37.677477,-113.059671,WGS84,Sulfur dioxide,1,...,013,INSTRUMENTAL - CONDUCTIMETRIC,,650 WEST CENTER STREET CEDAR CITY UTAH,Utah,Iron,Cedar City,16260,"Cedar City, UT",2013-06-11
3,49,021,0001,42401,1,37.677477,-113.059671,WGS84,Sulfur dioxide,1,...,013,INSTRUMENTAL - CONDUCTIMETRIC,,650 WEST CENTER STREET CEDAR CITY UTAH,Utah,Iron,Cedar City,16260,"Cedar City, UT",2013-06-11
4,49,021,0001,42401,1,37.677477,-113.059671,WGS84,Sulfur dioxide,1,...,013,INSTRUMENTAL - CONDUCTIMETRIC,,650 WEST CENTER STREET CEDAR CITY UTAH,Utah,Iron,Cedar City,16260,"Cedar City, UT",2013-06-11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41589,49,053,0007,42602,1,37.179125,-113.305096,WGS84,Nitrogen dioxide (NO2),1,...,099,INSTRUMENTAL - GAS PHASE CHEMILUMINESCENCE,,"147 N 870 W, Hurrricane, Utah",Utah,Washington,Hurricane,41100,"St. George, UT",2023-11-02
41590,49,053,0007,42602,1,37.179125,-113.305096,WGS84,Nitrogen dioxide (NO2),1,...,099,INSTRUMENTAL - GAS PHASE CHEMILUMINESCENCE,,"147 N 870 W, Hurrricane, Utah",Utah,Washington,Hurricane,41100,"St. George, UT",2023-11-02
41591,49,053,0007,42602,1,37.179125,-113.305096,WGS84,Nitrogen dioxide (NO2),1,...,099,INSTRUMENTAL - GAS PHASE CHEMILUMINESCENCE,,"147 N 870 W, Hurrricane, Utah",Utah,Washington,Hurricane,41100,"St. George, UT",2023-11-02
41592,49,053,0007,42602,1,37.179125,-113.305096,WGS84,Nitrogen dioxide (NO2),1,...,099,INSTRUMENTAL - GAS PHASE CHEMILUMINESCENCE,,"147 N 870 W, Hurrricane, Utah",Utah,Washington,Hurricane,41100,"St. George, UT",2023-11-02


In [90]:
# Obtain all particulates aqi
part_aqi_list = []
# Start at 1970: no data were recorded from 1963 to 1970
for year in range(1970, 2024):
    part_aqi_year = yearly_particulate_aqi(year, request_data)
    if part_aqi_year is None:
        continue
    
    part_aqi_list += part_aqi_year

df_particulates_aqi = pd.DataFrame(part_aqi_list)

df_particulates_aqi

Response for the particulate pollutants in 1970...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the particulate pollutants in 1971...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the particulate pollutants in 1972...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the particulate pollutants in 1973...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the particulate pollutants in 1974...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the particulate pollutants in 1975...
Looks like the response generated no data. You might take a closer look at your request and the response data.
Response for the particulate

Unnamed: 0,state_code,county_code,site_number,parameter_code,poc,latitude,longitude,datum,parameter,sample_duration_code,...,method_code,method,local_site_name,site_address,state,county,city,cbsa_code,cbsa,date_of_last_change
0,49,021,0004,81102,1,37.676644,-113.065226,WGS84,PM10 Total 0-10um STP,7,...,064,HI-VOL-SA/GMW-321-B - GRAVIMETRIC,,"33 N. 100 WEST, CEDAR CITY, UTAH",Utah,Iron,Cedar City,16260,"Cedar City, UT",2021-11-09
1,49,021,0004,81102,1,37.676644,-113.065226,WGS84,PM10 Total 0-10um STP,7,...,064,HI-VOL-SA/GMW-321-B - GRAVIMETRIC,,"33 N. 100 WEST, CEDAR CITY, UTAH",Utah,Iron,Cedar City,16260,"Cedar City, UT",2021-11-09
2,49,021,0004,81102,1,37.676644,-113.065226,WGS84,PM10 Total 0-10um STP,7,...,064,HI-VOL-SA/GMW-321-B - GRAVIMETRIC,,"33 N. 100 WEST, CEDAR CITY, UTAH",Utah,Iron,Cedar City,16260,"Cedar City, UT",2021-11-09
3,49,017,0101,88502,1,37.618383,-112.174368,WGS84,Acceptable PM2.5 AQI & Speciation Mass,7,...,707,IMPROVE Module A with Cyclone Inlet-Teflon Fil...,,Bryce Canyon NP,Utah,Garfield,Not in a city,,,2015-07-22
4,49,017,0101,88502,1,37.618383,-112.174368,WGS84,Acceptable PM2.5 AQI & Speciation Mass,7,...,707,IMPROVE Module A with Cyclone Inlet-Teflon Fil...,,Bryce Canyon NP,Utah,Garfield,Not in a city,,,2015-07-22
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35045,32,003,0024,88101,3,36.815897,-114.050347,NAD83,PM2.5 - Local Conditions,X,...,638,Teledyne T640X at 16.67 LPM w/Network Data Ali...,Virgin Valley High School,820 Valley View Dr. Mesquite,Nevada,Clark,Mesquite,29820,"Las Vegas-Henderson-Paradise, NV",2023-10-20
35046,32,003,0024,88101,3,36.815897,-114.050347,NAD83,PM2.5 - Local Conditions,X,...,638,Teledyne T640X at 16.67 LPM w/Network Data Ali...,Virgin Valley High School,820 Valley View Dr. Mesquite,Nevada,Clark,Mesquite,29820,"Las Vegas-Henderson-Paradise, NV",2023-10-20
35047,32,003,0024,88101,3,36.815897,-114.050347,NAD83,PM2.5 - Local Conditions,X,...,638,Teledyne T640X at 16.67 LPM w/Network Data Ali...,Virgin Valley High School,820 Valley View Dr. Mesquite,Nevada,Clark,Mesquite,29820,"Las Vegas-Henderson-Paradise, NV",2023-10-20
35048,32,003,0024,88101,3,36.815897,-114.050347,NAD83,PM2.5 - Local Conditions,X,...,638,Teledyne T640X at 16.67 LPM w/Network Data Ali...,Virgin Valley High School,820 Valley View Dr. Mesquite,Nevada,Clark,Mesquite,29820,"Las Vegas-Henderson-Paradise, NV",2023-10-20
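The loop in `In [90]` skips years whose request returned nothing and appends the records from every other year. The sketch below wraps the same pattern in a reusable function that also records which years came back empty, so they can be summarized once instead of printing a message for every year. `collect_yearly_records` and `fake_fetch` are hypothetical names used only for illustration. The sketch assumes that `yearly_particulate_aqi(year, request_data)` returns a list of record dicts, or `None` when there is no data. Unlike the original `is None` check, this version also treats an empty list as an empty year.

```python
import pandas as pd

def collect_yearly_records(fetch_year, first_year, last_year):
    """Combine per-year API results into one DataFrame.

    fetch_year is a hypothetical stand-in for a call such as
    yearly_particulate_aqi(year, request_data): it takes a year and returns
    a list of record dicts, or None when the API returned no data.
    """
    records = []
    empty_years = []
    # range() excludes its stop value, so add 1 to include last_year
    for year in range(first_year, last_year + 1):
        year_records = fetch_year(year)
        if not year_records:  # None or an empty list: nothing to add
            empty_years.append(year)
            continue
        records.extend(year_records)
    return pd.DataFrame(records), empty_years


# Usage with a fake fetcher that only "has data" from 1988 onward
def fake_fetch(year):
    if year < 1988:
        return None
    return [{"date_local": f"{year}-06-01", "aqi": 18}]

df, empty_years = collect_yearly_records(fake_fetch, 1986, 1989)
print(empty_years)  # [1986, 1987]
print(len(df))      # 2
```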


To make the JSON data easier to work with, I converted these two lists of JSON objects into DataFrames. For both the gaseous and the particulate AQI data, I computed the average AQI grouped by year, and grouped by both year and pollutant parameter.
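A minimal sketch of the two groupings on a few made-up rows (not EPA data). It assumes that `date_local` holds `YYYY-MM-DD` strings, which is how the API returns it in the outputs above:

```python
import pandas as pd

# Toy records shaped like the daily-summary rows (values are made up)
records = [
    {"date_local": "1977-03-01", "parameter": "Sulfur dioxide", "aqi": 40},
    {"date_local": "1977-07-15", "parameter": "Sulfur dioxide", "aqi": 30},
    {"date_local": "1977-07-15", "parameter": "Nitrogen dioxide (NO2)", "aqi": 6},
    {"date_local": "1978-01-10", "parameter": "Nitrogen dioxide (NO2)", "aqi": 8},
]
df = pd.DataFrame(records)

# Parse the date strings and keep only the calendar year
df["year"] = pd.to_datetime(df["date_local"]).dt.year

# One mean per year, and one mean per (year, parameter) pair
by_year = df.groupby("year")["aqi"].mean().reset_index()
by_year_param = df.groupby(["year", "parameter"])["aqi"].mean().reset_index()
print(by_year)
print(by_year_param)
```

Here 1977 averages to 76 / 3 ≈ 25.33 across all three rows. Split by parameter, it gives 35.0 for sulfur dioxide and 6.0 for nitrogen dioxide.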

In [97]:
df_gaseous_aqi_year = df_gaseous_aqi.copy()
df_gaseous_aqi_year['date_local'] = pd.to_datetime(df_gaseous_aqi_year['date_local'])

# Extract the year from the date column and create a new 'year' column
df_gaseous_aqi_year['year'] = df_gaseous_aqi_year['date_local'].dt.year

# Average the 'aqi' column by year, and by both year and parameter
average_gas_aqi_by_year = df_gaseous_aqi_year.groupby(['year'])['aqi'].mean().reset_index()
average_gas_aqi_by_year_param = df_gaseous_aqi_year.groupby(['year', 'parameter'])['aqi'].mean().reset_index()
average_gas_aqi_by_year_param.head()

Unnamed: 0,year,parameter,aqi
0,1975,Sulfur dioxide,95.0
1,1976,Sulfur dioxide,109.720339
2,1977,Nitrogen dioxide (NO2),6.480769
3,1977,Sulfur dioxide,36.72242
4,1978,Nitrogen dioxide (NO2),6.910448


In [96]:
df_particles_aqi_year = df_particulates_aqi.copy()
df_particles_aqi_year['date_local'] = pd.to_datetime(df_particles_aqi_year['date_local'])

# Extract the year from the date column and create a new 'year' column
df_particles_aqi_year['year'] = df_particles_aqi_year['date_local'].dt.year

# Average the 'aqi' column by year, and by both year and parameter
average_par_aqi_by_year = df_particles_aqi_year.groupby(['year'])['aqi'].mean().reset_index()
average_par_aqi_by_year_param = df_particles_aqi_year.groupby(['year', 'parameter'])['aqi'].mean().reset_index()
average_par_aqi_by_year_param.head()

Unnamed: 0,year,parameter,aqi
0,1988,Acceptable PM2.5 AQI & Speciation Mass,16.767442
1,1988,PM10 Total 0-10um STP,18.0
2,1989,Acceptable PM2.5 AQI & Speciation Mass,16.92
3,1989,PM10 Total 0-10um STP,18.533333
4,1990,Acceptable PM2.5 AQI & Speciation Mass,15.191489
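A yearly or per-parameter mean can rest on very different numbers of readings, and a mean alone does not show that. The sketch below uses pandas named aggregation to keep a count next to each mean. `summarize_by`, `mean_aqi` and `n_readings` are hypothetical names, and the toy rows are made up. `"count"` is used rather than `"size"` because it counts only non-missing AQI values, which are the same values `mean` uses.

```python
import pandas as pd

def summarize_by(df, keys):
    """Mean AQI and number of non-missing AQI readings per group."""
    return (
        df.groupby(keys)
        .agg(mean_aqi=("aqi", "mean"), n_readings=("aqi", "count"))
        .reset_index()
    )

# Made-up rows; the None becomes NaN and is skipped by both mean and count
toy = pd.DataFrame({
    "year": [1988, 1988, 1988, 1989],
    "parameter": ["PM10 Total 0-10um STP"] * 4,
    "aqi": [16.0, 20.0, None, 18.0],
})
out = summarize_by(toy, ["year", "parameter"])
print(out)
```

The same helper could be applied to both `df_gaseous_aqi_year` and `df_particles_aqi_year`, which would avoid duplicating the grouping code across the two cells.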


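Next, I combined the gaseous and particulate records and recomputed the average AQI by year, and by both year and parameter. I saved both results to .csv files, which are used in the common analysis notebook.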

In [98]:
df_aqi_year = pd.concat([df_gaseous_aqi_year, df_particles_aqi_year], ignore_index=True)
average_aqi_year = df_aqi_year.groupby(['year'])['aqi'].mean().reset_index()
average_aqi_year.head()

Unnamed: 0,year,aqi
0,1975,95.0
1,1976,109.720339
2,1977,25.926773
3,1978,34.164474
4,1979,4.282486
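Grouping the concatenated rows by year weights each pollutant by the number of readings it contributes. In the outputs above, 1977 combines NO2 (6.480769) and SO2 (36.72242) into 25.926773 rather than their simple midpoint of about 21.60, so SO2 readings outnumber NO2 readings that year. Whether a row-weighted or a parameter-weighted yearly average is appropriate depends on the analysis. The toy sketch below (made-up values) contrasts the two:

```python
import pandas as pd

# Made-up readings for one year: three SO2 readings and one NO2 reading
df = pd.DataFrame({
    "year": [1977, 1977, 1977, 1977],
    "parameter": ["Sulfur dioxide", "Sulfur dioxide", "Sulfur dioxide",
                  "Nitrogen dioxide (NO2)"],
    "aqi": [30.0, 40.0, 50.0, 10.0],
})

# Row-weighted: every reading counts once (what grouping the concatenated rows by year does)
row_weighted = df.groupby("year")["aqi"].mean()

# Parameter-weighted: average within each parameter first, then across parameters
param_means = df.groupby(["year", "parameter"])["aqi"].mean()
param_weighted = param_means.groupby(level="year").mean()

print(row_weighted.loc[1977])    # 32.5
print(param_weighted.loc[1977])  # 25.0
```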


In [99]:
average_aqi_year.to_csv('df_average_aqi.csv', index=False)

In [102]:
average_aqi_param_year = df_aqi_year.groupby(['year', 'parameter'])['aqi'].mean().reset_index()
average_aqi_param_year.head()

Unnamed: 0,year,parameter,aqi
0,1975,Sulfur dioxide,95.0
1,1976,Sulfur dioxide,109.720339
2,1977,Nitrogen dioxide (NO2),6.480769
3,1977,Sulfur dioxide,36.72242
4,1978,Nitrogen dioxide (NO2),6.910448


In [103]:
average_aqi_param_year.to_csv('df_average_param_aqi.csv', index=False)
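Because the common analysis notebook reads these files back, a quick round-trip check can confirm that nothing is lost in the CSV step. The sketch below uses an in-memory buffer in place of `df_average_param_aqi.csv` and a few rows copied from the output above. Writing with `index=False`, as these cells do, prevents `read_csv` from adding an extra `Unnamed: 0` column. That is the column that appears in the tables shown earlier, where the index was written out. `float_precision="round_trip"` asks the C parser to reproduce the written floats exactly.

```python
import io
import pandas as pd

# A few rows copied from the average_aqi_param_year output above
averages = pd.DataFrame({
    "year": [1977, 1977, 1978],
    "parameter": ["Nitrogen dioxide (NO2)", "Sulfur dioxide", "Nitrogen dioxide (NO2)"],
    "aqi": [6.480769, 36.72242, 6.910448],
})

# Round trip without the index, as in the notebook
buffer = io.StringIO()  # stands in for 'df_average_param_aqi.csv'
averages.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer, float_precision="round_trip")
pd.testing.assert_frame_equal(averages, reloaded)  # raises if anything changed

# With the index written, read_csv turns it into an 'Unnamed: 0' column
with_index = io.StringIO()
averages.to_csv(with_index)
with_index.seek(0)
first_column = pd.read_csv(with_index).columns[0]
print(first_column)  # Unnamed: 0
```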