I want to see whether the introduction of mandatory emissions checks on cars every two years since 1983 correlates with an increase in air quality.

According to the "State of the Air" report 2020 by the American Lung Association there is a general decreasing trend in emissions since the 1970's in the US.

According to https://www.cabq.gov/airquality/trends there is also a downward trend between 2000 - 2015 in:
1. Ozone (parts per million, ppm)
2. Carbon Monoxide (parts per million, ppm)
3. Nitrogen Dioxide (parts per billion, ppb)
concentration in Albuqurque.

Levels of Sulfur Dioxide and particulate matter (both 10 and 2.5 microns) however appear to be stable over the same time period. For reference a human hair is aproximately 70 microns wide.

In [20]:
import pandas as pd
from pathlib import Path
import numpy as np
import plotly.express as px

In [None]:
#Change in Ozone levels (Parts per million)
level_ozone_2000 = 0.075
level_ozone_2015 = 0.066
change_ozone = round(level_ozone_2015 - level_ozone_2000, 4)

#Change in Carbon Monoxide levels (Parts per million)
level_CO_2000 = 3.8
level_CO_2015 = 1.4
change_CO = round(level_CO_2015 - level_CO_2000, 4)

#Change in Nitrogen Dioxide levels (Parts per billion)
level_NO_2000 = 65
level_NO_2015 = 45
change_NO = level_NO_2015 - level_NO_2000
#convert to ppm
change_NO = change_NO/1000

In [None]:
#Percent change Ozone
percent_ozone = round(change_ozone/level_ozone_2000 * 100, 4)
percent_CO = round(change_CO/level_CO_2000 * 100, 4)
precent_NO = round(change_NO/level_NO_2000 * 100, 4)

In [None]:
data = {"change_ppm": [change_ozone, change_CO, change_NO], "percent_change": [percent_ozone, percent_CO, precent_NO]}
df_change = pd.DataFrame(data = data, index = ["Ozone", "Carbon_Monoxide", "Nitrogen_Dioxide"])
print("Change in three air pollutants from 2000 to 2015 in Albuquerque")
df_change

Using data from https://www.epa.gov/outdoor-air-quality-data/air-data-daily-air-quality-tracker I want to visualize the difference in emissions between the year before mandatory car emissions checks were introduced (1982) and 2020. The metric is the Air Quality Index (AQI) for Ozone.
Note: I chose not to use the combined Ozone and and PM2.5 (Particulate Matter 2.5 microns) score, as for 1982 only the Ozone value was used in the combined score.

In [2]:
#AQI values
AQI_VALUES = {"Good":range(0, 51), "Moderate":range(51, 101), "Sensitive":range(101, 151), "Unhealthy":range(151, 201),
             "Very Unhealthy":range(201, 301), "Hazardous":range(301, 501)}
AQI_VALUES

{'Good': range(0, 51),
 'Moderate': range(51, 101),
 'Sensitive': range(101, 151),
 'Unhealthy': range(151, 201),
 'Very Unhealthy': range(201, 301),
 'Hazardous': range(301, 501)}

In [3]:
DATA_1982 = Path("data/1982_aqi_data.csv")
DATA_2020 = Path("data/2020_aqi_data.csv")

In [None]:
df_1982 = pd.read_csv(DATA_1982, parse_dates=[0,8,9], skipinitialspace=True)

In [None]:
#df_1982.info()

In [4]:
df_2020 = pd.read_csv(DATA_2020, parse_dates=[0,8,9], skipinitialspace=True)

In [5]:
df_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 10 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   Date                        366 non-null    datetime64[ns]
 1   Ozone AQI Value             366 non-null    int64         
 2   Site Name                   275 non-null    object        
 3   Site ID                     366 non-null    object        
 4   Source                      366 non-null    object        
 5   20-year High (2000-2019)    366 non-null    int64         
 6   20-year Low (2000-2019)     366 non-null    int64         
 7   5-year Average (2015-2019)  366 non-null    object        
 8   Date of 20-year High        366 non-null    datetime64[ns]
 9   Date of 20-year Low         366 non-null    datetime64[ns]
dtypes: datetime64[ns](3), int64(3), object(4)
memory usage: 28.7+ KB


In [27]:
#create sub df for plotting number of different air quality days

#df_graph = df_2020.drop(df_2020.index[59]) #remove february 29th
#df_graph = df_graph[['Date', '5-year Average (2015-2019)']]
#df_graph.head()
#day_of_year = list(range(1, 366))
#df_graph['Day of Year'] = day_of_year
df_graph['5-year Average (2015-2019)'] = df_graph['5-year Average (2015-2019)'].astype(float)
df_graph.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 365 entries, 0 to 365
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   Date                        365 non-null    datetime64[ns]
 1   5-year Average (2015-2019)  365 non-null    float64       
 2   Day of Year                 365 non-null    int64         
 3   Air Quality Rating          0 non-null      category      
dtypes: category(1), datetime64[ns](1), float64(1), int64(1)
memory usage: 11.9 KB


In [31]:
#bins = pd.IntervalIndex.from_tuples([(0,50.9),(51,100.9), (101,151)])
#df_graph['Air Quality Rating'] = pd.cut(df_graph['5-year Average (2015-2019)'],
#                                        bins,
#                                        labels=["Good", "Moderate", "Unhealthy for Sensitive Population"])
df_graph['Air Quality Rating'] = pd.cut(df_graph['5-year Average (2015-2019)'],
                                        [0, 50, 100, 150],
                                        labels=["Good", "Moderate", "Unhealthy for Sensitive Population"])

In [34]:
air_days = df_graph['Air Quality Rating'].nunique()
air_days

2

In [35]:
df_graph.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 365 entries, 0 to 365
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   Date                        365 non-null    datetime64[ns]
 1   5-year Average (2015-2019)  365 non-null    float64       
 2   Day of Year                 365 non-null    int64         
 3   Air Quality Rating          365 non-null    category      
dtypes: category(1), datetime64[ns](1), float64(1), int64(1)
memory usage: 11.9 KB


In [45]:
fig = px.histogram(df_graph, x='Air Quality Rating', color='Air Quality Rating', template='plotly_dark',
                   title="Albuquerque Good Air Days for Ozone 5-year Average (2015-2019)",
                   color_discrete_map = {"Good": "Green", "Moderate": "Red"})
fig.update_layout(yaxis_title="Number of Days")
fig.show()

In [None]:
#df_1982.head()

In [None]:
df = pd.concat([df_1982, df_2020], ignore_index=True)

In [None]:
df.info()

In [None]:
df['Year'] = pd.DatetimeIndex(df['Date']).year
#df.head()

In [None]:
#df['month_day'] = df['Date'].dt.strftime('%m-%d')

In [None]:
#df.head()
#df.info()
#df.to_csv("data/combined_air_data.csv")
DATA_COMBINED = Path("data/combined_air_data.csv")
df = pd.read_csv(DATA_COMBINED)
df.head()

In [None]:
#Count number of AQI not 'Good' air days for 1982 and 2020
not_good_air_days = df[df["Ozone AQI Value"]>51].groupby("Year").count()
not_good_air_days

In [None]:
#df.shape
df_no_leap_year = df[df['Date'] != '2020-02-29']
#df_no_leap_year.shape

In [None]:
day_of_year = list(range(1,366)) + list(range(1,366))


In [None]:
df_no_leap_year['day_of_year'] = day_of_year

In [None]:
df_no_leap_year.head()

In [None]:
fig = px.line(df_no_leap_year, x='day_of_year', y='Ozone AQI Value', color='Year', 
              title="Ozone AQI Values")
fig.show()

In [None]:
#create rolling average instead of raw data
df_no_leap_year['simple_moving_average'] = df_no_leap_year.iloc[:,1].rolling(window=5).mean()

In [None]:
df_no_leap_year.head(10)

In [None]:
#fig2 = px.line(df_no_leap_year, x='day_of_year', y='simple_moving_average', color='Year', 
#              title="Ozone Smoothed AQI Values")
#fig2.show()

In [None]:
fig3 = px.line(df_no_leap_year.iloc[0:364], x='day_of_year', y='5-year Average (2015-2019)',
               labels={
                   "day_of_year": "Day in Year"
               },
              title="Albuquerque Ozone Air Quality Index Five Year Average")
fig3.add_shape(# add a threshold line between 'Good' air and above
    type="line", line_color="salmon", line_width=3, opacity=1, line_dash="dot",
    x0=0, x1=365, xref="x", y0=50, y1=50, yref="y")
fig3.add_annotation(#add text over threshhold line
    text="Good Air Quality Boundary", x=320, y=52, showarrow=False)
fig3.show()