#### Housekeeping

In [2]:
#pip install datapane

import pandas as pd 
import datapane as dp 
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
import datetime
import statsmodels
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

In [3]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}


# Exploring Air Quality and Temperature Relationship for Champaign County, IL

In this project, data visualizations are used to explore relationships between air quality, temperature and precipitation data for Champaign County, Illinois. <br>The structure of this project is: <br> 1) Exploration of air quality data for the county from the U.S. EPA. <br> 2) Exploring and visualizing climate data from MRCC, UIUC. <br> 3) Merging the datasets and visualizing them for analysis.

### Air Quality Index (AQI)

<p>&bull; Air quality monitors measure PM2.5 and PM10 concentrations in &micro;g/m<sup>3</sup><br /> &bull; Local, regional, and national governments decide how to disseminate monitor measurements to the public<br /> &bull; Preferred way to communicate is via a color-coded Air Quality Index (AQI) that is easy for the public to understand</p>
<p><strong>What is the Air Quality Index?</strong><br />&bull; Index for reporting air quality<br />&bull; Color is key for communication<br />&bull; Ranges from 0 to 500 (no units)<br />&bull; Provides indicator of the quality of the air and its health effects<br />&bull; 101 typically corresponds to the level that violates the national health standard</p>
<p><img src="https://www.epa.gov/sites/production/files/styles/large/public/2019-07/aqitableforcourse.png" alt="" width="478" height="453" /></p>
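
The color-coded bands in the table can be expressed as a small lookup. The sketch below assumes the standard EPA breakpoints (0-50, 51-100, 101-150, 151-200, 201-300, 301-500); the names `AQI_BANDS` and `aqi_category` are illustrative and are not used elsewhere in this notebook.

```python
# EPA AQI bands as (upper bound, category, color). Illustrative helper only.
AQI_BANDS = [
    (50, "Good", "green"),
    (100, "Moderate", "yellow"),
    (150, "Unhealthy for Sensitive Groups", "orange"),
    (200, "Unhealthy", "red"),
    (300, "Very Unhealthy", "purple"),
    (500, "Hazardous", "maroon"),
]


def aqi_category(aqi):
    """Return (category, color) for an AQI value between 0 and 500."""
    if aqi < 0 or aqi > 500:
        raise ValueError("AQI is defined on the range 0 to 500")
    for upper, name, color in AQI_BANDS:
        if aqi <= upper:
            return name, color


print(aqi_category(9))    # a typical winter value from this dataset
print(aqi_category(101))  # first value above the "Moderate" band
```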

### Air Quality Index for Champaign County (1980-2020)
The data comes from the EPA website. I use the daily Air Quality Index for Champaign County from 1980 to 2020 and visualize it below. The data was downloaded from the public website as comma-separated value (CSV) files and stored on GitHub for access.

In [17]:
# Dataset from https://www.epa.gov/outdoor-air-quality-data/download-daily-data for Champaign County.
data = pd.read_csv('https://github.com/mihakim2/IS445FinalProject/raw/main/ChampaignAQI40year.csv')

data = data.rename(columns={'Date':'Record','Overall AQI Value': 'AQI', 'Main Pollutant': 'MainPollutant'}) # Renaming columns
data = data[['Record', 'AQI', 'MainPollutant']]
#Setting Date ranges
data = data.set_index(['Record'])
data['Year'] = pd.DatetimeIndex(data.index).year
data['Month'] = pd.DatetimeIndex(data.index).month
data['Date'] = pd.DatetimeIndex(data.index).day
data['Day'] = pd.DatetimeIndex(data.index).dayofyear

data.head(10)

Record,AQI,MainPollutant,Year,Month,Date,Day
01/01/1980,9,Ozone,1980,1,1,1
01/02/1980,9,Ozone,1980,1,2,2
01/03/1980,10,Ozone,1980,1,3,3
01/04/1980,9,Ozone,1980,1,4,4
01/05/1980,18,Ozone,1980,1,5,5
01/06/1980,16,Ozone,1980,1,6,6
01/07/1980,13,Ozone,1980,1,7,7
01/08/1980,15,Ozone,1980,1,8,8
01/09/1980,10,Ozone,1980,1,9,9
01/10/1980,9,Ozone,1980,1,10,10


#### Exploring Air Quality Index over a single year.

This interactive visualization shows the AQI values for one calendar year. Use the slider to compare and analyse the AQI for the county from 1980 to 2020.

Additionally, a trend line is shown, based on LOWESS (Locally Weighted Scatterplot Smoothing), also known as LOESS (locally weighted smoothing). It is a popular regression tool that draws a smooth line through a time plot or scatter plot to help analyse the relationship between variables and reveal trends.
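
To make the idea of locally weighted smoothing concrete, here is a minimal pure-Python sketch. It is a simplification and not the routine Plotly calls: it fits one tricube-weighted straight line per point and skips the robustness iterations that full LOWESS implementations (such as the one in statsmodels) normally add. The function name `lowess_sketch`, the `frac` default, and the sample values are assumptions for illustration.

```python
def lowess_sketch(x, y, frac=0.5):
    """Simplified LOWESS: tricube-weighted local linear fits, no robustness passes."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours used for each local fit
    smoothed = []
    for x0 in x:
        # The distance to the k-th nearest point sets the local bandwidth.
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling to 0 at and beyond the bandwidth.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        # Weighted least-squares line through the neighbourhood.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# Made-up noisy values, only to show the call shape.
days = list(range(1, 11))
values = [12, 15, 11, 18, 22, 19, 25, 24, 30, 27]
print([round(v, 1) for v in lowess_sketch(days, values, frac=0.6)])
```

A smaller `frac` follows the data more closely, while a larger one gives a smoother curve.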

In [21]:
def f(Year):
    # Keep only the rows for the selected year
    df = data.loc[data['Year'] == Year]
    # Plot the filtered frame (not the full dataset) against day of the year
    fig = px.scatter(df, x='Day', y='AQI', trendline="lowess", trendline_color_override="red", color='AQI',
       labels={'AQI': f'AQI in {Year}', 'Day': 'Day of the Year'},
       title=f'AQI for Champaign County, IL for each day of {Year}')
    fig.show()
    
interact(f, Year=(1980, 2020))



#### Variation of AQI by day for a given year and month

This plot also shows an Ordinary Least Squares (OLS) regression trendline. Hovering over the trendline shows the equation of the line and its R-squared value.
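
For reference, the slope, intercept and R-squared of a simple OLS line can be computed by hand. The sketch below uses only the standard library; `ols_fit` is an illustrative name, and the sample values are the first ten AQI readings of January 1980 from the table above.

```python
def ols_fit(x, y):
    """Return (slope, intercept, r_squared) for a simple least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


days = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
aqi = [9, 9, 10, 9, 18, 16, 13, 15, 10, 9]
slope, intercept, r2 = ols_fit(days, aqi)
print(f"AQI = {slope:.3f} * day + {intercept:.3f}, R^2 = {r2:.3f}")
```

An R-squared near 0 means the straight line explains little of the day-to-day variation, which is common for noisy daily AQI values.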

In [24]:
def f(Year, Month):
    df1=data.loc[data['Year'] == Year]
    df=df1.loc[df1['Month'] == Month]
    fig = px.scatter(df, x='Date', y='AQI', trendline="ols", trendline_color_override="red", color='AQI', 
        # Label keys must match the plotted column names
        labels={'AQI': 'AQI', 'Date': 'Day of the Month'},
        title=f'Daily Air Quality Index for Champaign County, month {Month} of {Year}')
    fig.show()



interactive_plot = interactive(f, Year=(1980, 2020,1), Month=(1, 12, 1))
interactive_plot


Let us now move away from statistical visualization to a simple heat map that represents AQI using different shades.

In [37]:
def f(Month):
    df =data.loc[data['Month'] == Month]
    fig = go.Figure(data=go.Heatmap(
        y=df['Day'], x=df['Year'], z =df['AQI'],colorscale='OrRd')
        )

    fig.update_layout(
        title=f'Heat Map of AQI for Champaign County, 1980-2020 (month {Month})',
        xaxis_title="Year",
        yaxis_title="Day of Year",
        legend_title="Legend")
    fig.show()

interactive_plot = interactive(f, Month=(1, 12, 1))
interactive_plot


Box plots summarize each year's AQI distribution through its minimum, maximum, median and quartiles. They suggest an overall improvement in air quality, as the maximum values and the spread of the quartiles have decreased over the years.
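
The statistics a box plot draws can also be computed directly. This is a small sketch using the standard library; `five_number_summary` is an illustrative name, and the sample is the first ten AQI readings of January 1980 from the table above. Quartile conventions differ between tools, so the numbers may not match Plotly's boxes exactly.

```python
import statistics


def five_number_summary(values):
    """Return min, first quartile, median, third quartile and max of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


sample = [9, 9, 10, 9, 18, 16, 13, 15, 10, 9]
print(five_number_summary(sample))
```

Comparing such summaries year by year is what the box plots do visually.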

In [38]:
def f(Month):
    df =data.loc[data['Month'] == Month]
    fig = px.box(df, x = "Year", y = 'AQI',   
       labels={'AQI': f'AQI in month {Month}'},
       title= f'Air Quality Index by Year for Champaign County, month {Month}, 1980-2020')
    fig.show()

interactive_plot = interactive(f, Month=(1, 12, 1))
output = interactive_plot.children[-1]
interactive_plot



We can also pick a particular calendar day to check how its AQI has varied over the years. The sliders below adjust the month and date. For example, the AQI for the day after Independence Day, i.e. the 5th of July, is shown by default.

In [48]:

def f(Month=7, Date=5):
    # Keep only the rows for the selected calendar day across all years
    df = data.loc[(data['Month'] == Month) & (data['Date'] == Date)]
    
    fig = px.bar(df, x='Year', y='AQI',
              labels={'Year': 'Year', 'AQI': 'AQI'}, color='AQI', 
        title=f'AQI from 1980 to 2020 for day {Date} of month {Month}',
          )
    fig.show()

interactive_plot = interactive(f, Month=(1, 12,1), Date=(1, 31, 1))
interactive_plot


## Daily Temperature and Precipitation Data Analysis

For this part the data comes from the UIUC Midwestern Regional Climate Center's CLIMATE portal. It includes mean, maximum and minimum daily temperatures in Champaign from 1980 to 2020, along with daily precipitation and snowfall. The dataset also includes Heating Degree Day (HDD) and Cooling Degree Day (CDD) values. Temperatures are recorded in degrees Fahrenheit, and precipitation and snowfall in inches.

A heating degree day (HDD) is a measurement designed to quantify the demand for energy needed to heat a building. It is the number of degrees that a day's average temperature is below 65 degrees Fahrenheit, and it is derived from measurements of outside air temperature.
A cooling degree day (CDD) is a measurement designed to quantify the demand for energy needed to cool buildings. It is the number of degrees that a day's average temperature is above 65 degrees Fahrenheit. Both HDD and CDD are unitless.
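
The two definitions reduce to a pair of clipped differences from the 65-degree base. A minimal sketch follows; the function name `degree_days` is illustrative, and no rounding is applied here, whereas the whole-number HDD and CDD values in the dataset suggest the source rounds somewhere, so results can differ by a fraction.

```python
BASE_TEMP_F = 65.0  # base temperature used in the definitions above


def degree_days(mean_temp_f, base=BASE_TEMP_F):
    """Return (HDD, CDD) for one day's mean temperature in Fahrenheit."""
    hdd = max(0.0, base - mean_temp_f)  # degrees below the base
    cdd = max(0.0, mean_temp_f - base)  # degrees above the base
    return hdd, cdd


print(degree_days(29.5))  # (35.5, 0.0): a cold day needs heating only
print(degree_days(80.0))  # (0.0, 15.0): a warm day needs cooling only
```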

For more details, refer to https://mrcc.illinois.edu/CLIMATE/

In [83]:
# Dataset from https://mrcc.illinois.edu/CLIMATE/ for Champaign County.
temp_data = pd.read_csv('https://github.com/mihakim2/IS445FinalProject/raw/main/SD_b2dates.csv')
temp_data = temp_data.rename(columns={'099AX': 'MAX', '099IN': 'MIN', '99EAN': 'MEAN',  'HDD1': 'HDD', 'CDD1': 'CDD'}) # Renaming columns

#Setting Date ranges
temp_data['Date'] = pd.date_range('1980-1-1', periods=14953, freq='D')
temp_data = temp_data.set_index(['Date'])
temp_data['Year'] = pd.DatetimeIndex(temp_data.index).year
temp_data['Month'] = pd.DatetimeIndex(temp_data.index).month
temp_data['Date'] = pd.DatetimeIndex(temp_data.index).day
temp_data['Day'] = pd.DatetimeIndex(temp_data.index).dayofyear


temp_data.head()

Date,PRCP,SNOW,MAX,MIN,MEAN,HDD,CDD,Year,Month,Date,Day
1980-01-01,0.0,0.0,33,26,29.5,35,0,1980,1,1,1
1980-01-02,0.0,0.0,30,28,29.0,36,0,1980,1,2,2
1980-01-03,0.01,0.0,32,28,30.0,35,0,1980,1,3,3
1980-01-04,0.0,0.0,31,26,28.5,36,0,1980,1,4,4
1980-01-05,0.0,0.0,31,21,26.0,39,0,1980,1,5,5


The bar plot below allows exploration of the dataset for a given year and month.

In [84]:
def f(Type,Year, Month):
    df =temp_data.loc[temp_data['Year'] == Year]
    df2 = df.loc[df['Month'] == Month]
    fig = px.bar(df2,x='Date',y=Type,
        title=f'{Type} for {Month}th month of {Year} For Champaign County 1980-2020', color=Type
          )
    fig.show()

interactive_plot = interactive(f, Type=['MIN', 'MAX','MEAN','PRCP','SNOW','HDD','CDD'],
                               Year=(1980, 2020,1),
                               Month=(1, 12, 1))
output = interactive_plot.children[-1]
interactive_plot


To show how values change over the years, the visualization below plots the data for a chosen month across all years, along with an OLS regression line.

In [51]:
def f(Type='MEAN', Month=9):
    df=temp_data.loc[temp_data['Month'] == Month]
    fig = px.scatter(df,x='Year', y =Type,trendline="ols",
       color= Type,     
       labels={Type: f'{Type} in month {Month}'},
       title= f'Monthly {Type} for Champaign between 1980-2020')
    fig.show()

interactive_plot = interactive(f, Type=['MIN', 'MAX','MEAN','PRCP','SNOW','HDD','CDD'], Month=(1, 12, 1))
output = interactive_plot.children[-1]
interactive_plot


In [52]:
def f(Type, Month):
    df =temp_data.loc[temp_data['Month'] == Month]
    fig = go.Figure(data=go.Heatmap(
        y=df['Date'], x=df['Year'], z =df[Type],colorscale='OrRd')
        )

    fig.update_layout(
    title=f'HeatMap For {Type} Daily Temperature of Champaign County 1980-2020 in deg F',
    xaxis_title="Year",
    yaxis_title="Date",
    legend_title="Legend")
    fig.show()

interactive_plot = interactive(f, Type=['MIN', 'MAX','MEAN'], Month=(1, 12, 1))
output = interactive_plot.children[-1]
interactive_plot


### Combining AQI with Temperature

Lastly, the AQI data is compared with the climate data to look for possible relationships in how they vary over the years. The temperature data appears to follow the changing trend of AQI, which suggests a possible relationship.
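
One way to quantify such a relationship is to join the two tables on their calendar columns and compute a correlation. The sketch below uses tiny made-up frames that only mirror the column names of `data` and `temp_data`; the numbers are hypothetical and are not results from this project.

```python
import pandas as pd

# Hypothetical miniature frames with the same column names as the real ones.
aqi = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2019],
    "Month": [5, 5, 5, 5],
    "Date": [1, 2, 3, 4],
    "AQI": [40, 55, 47, 61],
})
climate = pd.DataFrame({
    "Year": [2019, 2019, 2019, 2019],
    "Month": [5, 5, 5, 5],
    "Date": [1, 2, 3, 5],
    "MEAN": [58.0, 66.5, 61.0, 70.0],
})

# An inner join on the calendar keys keeps only days present in both sources.
merged = aqi.merge(climate, on=["Year", "Month", "Date"], how="inner")

# Pearson correlation between AQI and mean temperature on the shared days.
r = merged["AQI"].corr(merged["MEAN"])
print(merged)
print(round(r, 3))
```

Joining on keys rather than relying on row order avoids misaligned pairs when one source has missing days.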

In [82]:
def f(Type='MEAN', Year=2019, Month=5):
    df2 =data.loc[data['Month'] == Month]
    df =df2.loc[df2['Year'] == Year]
    dft =temp_data.loc[temp_data['Year'] == Year]
    dft2 = dft.loc[dft['Month'] == Month]
    # Create figure with secondary y-axis
    fig = make_subplots(specs=[[{"secondary_y": True}]])

    # Add traces
    fig.add_trace(
        go.Bar(x=df['Date'], y=df['AQI'], name="AQI data"),
        secondary_y=False,
    )

    fig.add_trace(
        go.Scatter(x=dft2['Date'], y=dft2[Type], name=f"{Type}"),
        secondary_y=True,
    )
    # Add figure title
    fig.update_layout(
        title_text=f"AQI vs Climate Data for month {Month} of {Year}"
    )

    # Set x-axis title
    fig.update_xaxes(title_text="Day of the Month")

    # Set y-axes titles
    fig.update_yaxes(title_text="<b>Air Quality Index</b> AQI", secondary_y=False)
    fig.update_yaxes(title_text=f"<b>{Type}</b> from Climate Data", secondary_y=True)

    fig.show()

interactive_plot = interactive(f, Type=['MIN', 'MAX','MEAN','PRCP','SNOW','HDD','CDD'], Year=(1980, 2020,1), Month=(1, 12, 1))
interactive_plot





Data Sources: <br>
    1) EPA - Air Data: Air Quality Data Collected at Outdoor Monitors Across the US<br>
        https://www.epa.gov/outdoor-air-quality-data<br>
    2) Midwestern Regional Climate Center, University of Illinois Urbana-Champaign<br>
        https://mrcc.illinois.edu/CLIMATE/stnchooser_maptest.jsp?UcanSelect=5995