# Final project Part 3 (Ian Chapman)

You will be writing an interactive data visualization article aimed at the public. Your article should feature:

* A compelling title. Don't forget to specify that you are the author!
* At least one central interactive visualization featuring your primary dataset. This can be similar to what you submitted in the last phase, but does not need to be a dashboard. Remember, this is for the public so it should be large and friendly.
* At least two contextual visualizations - these can be other data visualizations you've done, or images from other places (remember to cite your sources!).
* At least 3 paragraphs of connective information to help a novice understand what is happening in your datasets.
* Citations of all the data sources used and information for the reader to be able to find those datasets themselves.

You should submit:

Code:
* The GitHub (or other) URL where the code is stored or a link specifying what to enter in nbviewer/mybinder.
* You can receive extra credit for including more than the required minimum. This can include making more than one visualization interactive, incorporating more than 1 visualization you've done yourself, or incorporating more than your main dataset into the three visualizations.
* Look to data visualization articles on fivethirtyeight.com, the New York Times website, or elsewhere for inspiration.

In [1]:
import pandas as pd
import numpy as np
import bqplot
import ipywidgets


ghg = pd.read_csv('WA_GHG_Reporting_Multi-Year_Dataset(county_mod).csv',
                  na_values = {'2016 total emissions (MTCO2e)': ''})

# I have modified the ghg dataset as follows:
# In the "County" column, "NA" is changed to "(Statewide)".
# This allows data not ascribed to a given county to be included in the heatmap below.

ghg = ghg.dropna(axis=0, subset=['2016 total emissions (MTCO2e)'])

ghg

# Citation: Method for removing nulls in selected column adapted from: 
# https://stackoverflow.com/questions/49291740/delete-rows-if-there-are-null-values-in-a-specific-column-in-pandas-dataframe
# (viewed 4/13/19)

Unnamed: 0,Source,Sector,Subsector,City,County,Local Air Authority,2012 total emissions (MTCO2e),2012 biogenic carbon dioxide (MTCO2e),2013 total emissions (MTCO2e),2013 biogenic carbon dioxide (MTCO2e),2014 total emissions (MTCO2e),2014 biogenic carbon dioxide (MTCO2e),2015 total emissions (MTCO2e),2015 biogenic carbon dioxide (MTCO2e),2016 total emissions (MTCO2e),2016 biogenic carbon dioxide (MTCO2e),2017 total emissions (MTCO2e),2017 biogenic carbon dioxide (MTCO2e)
0,Agrium Kennewick Fertilizer Operations (KFO) -...,Chemicals,Nitric Acid Production,Kennewick,Benton,Benton Clean Air Agency,146926.0,0.0,154497.0,0.0,132249.0,0.0,155888.0,0.0,151371.0,0.0,144290.0,0.0
1,Air Liquide - Anacortes,Chemicals,Hydrogen Production,Anacortes,Skagit,Northwest Clean Air Agency,63356.0,0.0,58995.0,0.0,64110.0,0.0,64413.0,0.0,60209.0,0.0,63461.0,0.0
2,Alcoa Intalco Works - Ferndale,Metals,Aluminum Production,Ferndale,Whatcom,Ecology: Industrial Section,1146835.0,0.0,1234637.0,0.0,1326684.0,0.0,1195786.0,0.0,1261364.0,0.0,1091665.0,0.0
3,Alcoa Wenatchee Works - Malaga,Metals,Aluminum Production,Malaga,Chelan,Ecology: Industrial Section,306333.0,0.0,318542.0,0.0,354692.0,0.0,331207.0,0.0,898.0,0.0,0.0,0.0
4,Alon Asphalt Company - Seattle,Petroleum and Natural Gas Systems,Other Petroleum and Natural Gas Systems,Seattle,King,Puget Sound Clean Air Agency,15138.0,0.0,14336.0,0.0,16004.0,0.0,13688.0,0.0,14096.0,0.0,14818.0,0.0
5,Ardagh Glass Inc. - Seattle,Minerals,Glass Production,Seattle,King,Puget Sound Clean Air Agency,76257.0,0.0,80745.0,0.0,78044.0,0.0,76674.0,0.0,77845.0,0.0,75338.0,0.0
6,Ascensus Specialties LLC - Elma,Chemicals,Other Chemicals,Elma,Grays Harbor,Olympic Region Clean Air Agency,16809.0,0.0,17966.0,0.0,21231.0,0.0,17600.0,0.0,20802.0,0.0,21310.0,0.0
7,Ash Grove Cement Company - Seattle,Minerals,Cement Production,Seattle,King,Puget Sound Clean Air Agency,305298.0,0.0,354808.0,0.0,522982.0,0.0,495030.0,0.0,383836.0,0.0,355513.0,0.0
8,Avista Corporation - WA State DOE Reporting - ...,Petroleum and Natural Gas Systems,Natural Gas Local Distribution Companies,Spokane,Spokane,Spokane Regional Clean Air Agency,20992.0,0.0,16127.0,0.0,16420.0,0.0,22858.0,0.0,21120.0,0.0,23757.0,0.0
9,Basic American Foods - Moses Lake,Food Production,Potato Products,Moses Lake,Grant,Ecology: Eastern Regional Office,28205.0,0.0,28312.0,0.0,28982.0,0.0,31063.0,0.0,28977.0,0.0,30576.0,0.0
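The null-filtering step used when loading the data can be checked on a tiny in-memory CSV. This is only a minimal sketch: the three rows in `csv_text` are made up for illustration and are not taken from the WA dataset.

```python
import io
import pandas as pd

# Illustrative CSV only; row "B" has an empty 2016 emissions field.
csv_text = (
    "Source,2016 total emissions (MTCO2e)\n"
    "A,10\n"
    "B,\n"
    "C,30\n"
)

demo = pd.read_csv(io.StringIO(csv_text))

# Empty fields are read as NaN; drop only the rows where this one column is null.
demo = demo.dropna(axis=0, subset=['2016 total emissions (MTCO2e)'])
print(demo)
```

Rows with nulls in other columns would be kept, because `subset` restricts the check to the 2016 emissions column.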


## Total emissions per sector (bar chart)

In [2]:
x = ghg['Sector']
y = ghg['2016 total emissions (MTCO2e)']

xnames = x.unique()
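# One total per sector: size the array by the number of sectors,
# not by the number of unique emission values.
ynames = np.zeros(len(xnames))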

for i,xn in enumerate(xnames):
    mask = (x == xn)
    ynames[i] = y[mask].sum()

x_sc = bqplot.OrdinalScale()
y_sc = bqplot.LinearScale()

x_ax = bqplot.Axis(scale = x_sc, 
                    label = 'Sector',
                    label_offset = '60px',
                    tick_rotate = 70,
#                     tick_style = {'font-size':'10px', 'tick_offset':'100px','text_anchor':'top'})
                    tick_style = {'font-size':'10px'},
                    offset = {'scale':x_sc, 'value':'60px'})
y_ax = bqplot.Axis(scale = y_sc, 
                    orientation = 'vertical', 
                    side = 'left',
                    label = '2016 GHG emissions (MT CO2e)',
                    label_offset = '50px')

sect_bar = bqplot.Bars(x = xnames,
                     y = ynames,
                     color_mode = 'element',
                     scales = {'x': x_sc, 'y': y_sc},
                    opacities = [0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5],
                    interactions = {'click': 'select'},
                    anchor_style = {'fill':'red'}, 
                    selected_style = {'fill':'red','opacity': 0.5},
                    unselected_style = {'opacity': 1.0})


fig_sect = bqplot.Figure(marks = [sect_bar],
                         axes = [x_ax, y_ax],
                        fig_margin = {'top':60, 'bottom':120, 'left':70, 'right':60},
                        title = "WA GHG emissions by sector")

fig_sect

# I have set "opacities" in bqplot.Bars to 0.5 for each of the 13 bars
# (apparently this must be done individually) to make the intruding tick labels visible.
# But this has not worked: the bars remain fully opaque.
# Setting "opacity" to 0.5 under "selected_style" does make the selected bar translucent.

Figure(axes=[Axis(label='Sector', label_offset='60px', offset={'scale': OrdinalScale(), 'value': '60px'}, scal…
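The per-sector totals computed by the loop in the cell above can also be obtained with a pandas `groupby`. The following is a minimal sketch on a small made-up frame (the rows in `demo` and the names `demo_xnames` and `demo_ynames` are illustrative assumptions, not part of the WA dataset). Passing `sort=False` keeps the sectors in order of first appearance, which matches the order returned by `Series.unique()`.

```python
import pandas as pd

# Illustrative rows only; not taken from the WA GHG dataset.
demo = pd.DataFrame({
    'Sector': ['Chemicals', 'Metals', 'Chemicals', 'Minerals'],
    '2016 total emissions (MTCO2e)': [100.0, 250.0, 50.0, 75.0],
})

# Sum emissions within each sector, preserving first-appearance order.
sector_totals = (demo.groupby('Sector', sort=False)
                     ['2016 total emissions (MTCO2e)']
                     .sum())

demo_xnames = sector_totals.index.to_numpy()   # bar labels
demo_ynames = sector_totals.to_numpy()         # bar heights
print(sector_totals)
```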

## Interaction: Breakdown of sectors into subsectors (bar chart)

In [3]:
x2 = ghg['Subsector'].values
y2 = ghg['2016 total emissions (MTCO2e)'].values

x2_sc = bqplot.OrdinalScale() 
y2_sc = bqplot.LinearScale()

x2_ax = bqplot.Axis(scale = x2_sc,
                    label = 'Subsector',
                    label_offset = '70px',
                    tick_values = x2,
                    tick_rotate = 45,
                    tick_style = {'font-size':'10px'})
y2_ax = bqplot.Axis(scale = y2_sc,
                    label = '2016 GHG emissions (MT CO2e)',
                    label_offset = '50px',
                    orientation = 'vertical',
                    side = 'left')

i = 0
mask = (x.values == xnames[i])
subsect = x2[mask]
emis2 = y2[mask]

emis2 = emis2[~pd.isnull(subsect)]
subsect = subsect[~pd.isnull(subsect)]

subsectu = np.unique(subsect)
# Sum emissions for each unique subsector (index by the unique labels, not the raw rows)
emis2u = [emis2[subsect == s].sum() for s in subsectu]

subsect_bar = bqplot.Bars(x = subsectu,
                          y = emis2u,
                          color_mode = 'element',
                          scales = {'x': x2_sc, 'y': y2_sc})

fig_subsect = bqplot.Figure(marks = [subsect_bar], 
                            axes = [x2_ax, y2_ax],
                            fig_margin = {'top':60, 'bottom':120, 'left':70, 'right':60},
                            title = "WA GHG emissions by subsector")

fig_subsect

Figure(axes=[Axis(label='Subsector', label_offset='70px', scale=OrdinalScale(), tick_rotate=45, tick_style={'f…

In [4]:
mySelectedLabel = ipywidgets.Label()

def get_data_value(change):
    if change['owner'].selected is not None:
        i = change['owner'].selected[0]
        mask = (x.values == xnames[i])
        subsect = x2[mask]
        emis2 = y2[mask]
        emis2 = emis2[~pd.isnull(subsect)]
        subsect = subsect[~pd.isnull(subsect)]
        subsectu = np.unique(subsect)
        emis2 = [emis2[subsect == subsectu[b]].sum() for b in range(len(subsectu)) ]
        emis2 = np.array(emis2)
        v = emis2.sum()  # no trailing comma, so v is a number rather than a 1-tuple
        mySelectedLabel.value = 'Sector GHG emissions = ' + str(v)
        subsect_bar.x = subsectu
        subsect_bar.y = emis2

sect_bar.observe(get_data_value, 'selected')

fig_sect.layout.max_width = '500px'
fig_sect.layout.max_height= '500px'
fig_subsect.layout.max_width='400px'
fig_subsect.layout.max_height='400px'

# ipywidgets.VBox([mySelectedLabel, ipywidgets.HBox([fig_sect, fig_subsect])])
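The subsector breakdown performed inside the callback can be factored into a standalone helper, which makes it easy to check outside the widget. A minimal sketch follows. The function name `subsector_totals` and the sample arrays are hypothetical and introduced for illustration only.

```python
import numpy as np
import pandas as pd

def subsector_totals(sectors, subsectors, emissions, sector):
    """Return (sorted unique subsectors, summed emissions) for one sector."""
    sectors = np.asarray(sectors, dtype=object)
    subsectors = np.asarray(subsectors, dtype=object)
    emissions = np.asarray(emissions, dtype=float)

    # Keep rows of the requested sector that have a subsector label.
    keep = (sectors == sector) & ~pd.isnull(subsectors)
    sub, emis = subsectors[keep], emissions[keep]

    # np.unique gives the sorted labels and, for each row, the index of its label.
    labels, inverse = np.unique(sub.astype(str), return_inverse=True)
    totals = np.bincount(inverse, weights=emis, minlength=len(labels))
    return labels, totals

# Illustrative data only; not taken from the WA GHG dataset.
labels, totals = subsector_totals(
    sectors=['Chemicals', 'Chemicals', 'Metals', 'Chemicals', 'Chemicals'],
    subsectors=['Nitric Acid Production', 'Hydrogen Production',
                'Aluminum Production', 'Hydrogen Production', None],
    emissions=[10, 20, 30, 5, 99],
    sector='Chemicals',
)
print(dict(zip(labels, totals)))
```

The row with a missing subsector is dropped, mirroring the `pd.isnull` filtering in the callback.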


## Comparison: national GHG sector breakdown

In [5]:
epa = pd.read_csv('EPA sectors 1990-2017.csv')

epa

Unnamed: 0,Economic Sector,1990,1991,1992,1993,1994,1995,1996,1997,1998,...,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017
0,Transportation,1527.076528,1480.932414,1540.536129,1577.522681,1632.154313,1667.32995,1723.499506,1750.008789,1792.366351,...,1872.031338,1795.852873,1803.437222,1775.819979,1756.345344,1765.440502,1799.892338,1809.342573,1849.737411,1866.183271
1,Electricity generation,1875.537005,1871.567818,1886.539355,1962.302414,1987.10259,2003.827119,2076.813504,2142.984903,2229.534239,...,2412.149675,2195.933089,2312.229639,2209.849661,2070.784485,2088.732365,2088.89422,1949.509412,1857.155032,1778.34546
2,Industry,1628.55571,1601.087569,1631.549769,1604.889269,1629.785256,1647.961134,1675.436843,1673.020466,1646.623631,...,1471.589121,1319.18008,1415.461672,1421.243817,1414.981983,1469.527568,1459.276118,1451.166432,1414.134995,1436.454673
3,Agriculture,534.859864,534.640267,538.743576,551.510566,543.980445,556.741782,559.083902,553.992088,575.497294,...,583.551404,586.084458,593.746436,579.216542,563.243243,572.570344,569.183613,585.239496,581.679613,582.180475
4,Commercial,426.92848,433.976409,429.400668,424.556077,427.191769,426.374785,433.795479,426.226943,400.240542,...,407.997858,410.781733,412.149087,406.995442,386.519311,409.580768,419.511322,432.172965,416.065981,415.976008
5,Residential,344.72177,354.286847,360.84921,372.203143,363.141974,367.4087,398.825974,380.392999,346.031683,...,363.763156,354.299033,355.005358,348.270012,305.636497,356.293037,376.608145,349.708507,326.863208,330.946433
6,U.S. territories,33.321186,39.123871,37.31565,39.085447,41.47944,40.423829,40.243846,41.835783,42.232269,...,49.51826,47.23788,46.562267,46.023576,48.458468,48.073592,46.629876,46.636094,46.631184,46.631872
7,Total,6371.000543,6315.615194,6424.934357,6532.069596,6624.835787,6710.067297,6907.699055,6968.46197,7032.52601,...,7160.600813,6709.369145,6938.591681,6787.419028,6545.969331,6710.218175,6759.995632,6623.775479,6492.267425,6456.718193


In [56]:
exclude = ['Total', 'U.S. territories']

xEf = epa[~epa['Economic Sector'].isin(exclude)]

xEf

# Citation: Pandas dataframe filter method adapted from:
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html
# (viewed 4/18/19)

Unnamed: 0,Economic Sector,1990,1991,1992,1993,1994,1995,1996,1997,1998,...,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017
0,Transportation,1527.076528,1480.932414,1540.536129,1577.522681,1632.154313,1667.32995,1723.499506,1750.008789,1792.366351,...,1872.031338,1795.852873,1803.437222,1775.819979,1756.345344,1765.440502,1799.892338,1809.342573,1849.737411,1866.183271
1,Electricity generation,1875.537005,1871.567818,1886.539355,1962.302414,1987.10259,2003.827119,2076.813504,2142.984903,2229.534239,...,2412.149675,2195.933089,2312.229639,2209.849661,2070.784485,2088.732365,2088.89422,1949.509412,1857.155032,1778.34546
2,Industry,1628.55571,1601.087569,1631.549769,1604.889269,1629.785256,1647.961134,1675.436843,1673.020466,1646.623631,...,1471.589121,1319.18008,1415.461672,1421.243817,1414.981983,1469.527568,1459.276118,1451.166432,1414.134995,1436.454673
3,Agriculture,534.859864,534.640267,538.743576,551.510566,543.980445,556.741782,559.083902,553.992088,575.497294,...,583.551404,586.084458,593.746436,579.216542,563.243243,572.570344,569.183613,585.239496,581.679613,582.180475
4,Commercial,426.92848,433.976409,429.400668,424.556077,427.191769,426.374785,433.795479,426.226943,400.240542,...,407.997858,410.781733,412.149087,406.995442,386.519311,409.580768,419.511322,432.172965,416.065981,415.976008
5,Residential,344.72177,354.286847,360.84921,372.203143,363.141974,367.4087,398.825974,380.392999,346.031683,...,363.763156,354.299033,355.005358,348.270012,305.636497,356.293037,376.608145,349.708507,326.863208,330.946433
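To compare the national breakdown with the Washington chart, it can help to express each sector as a share of the 2016 total. This is a minimal sketch that uses the 2016 values copied from the filtered table above. The table does not state units, so none are assumed here, and the names `epa_2016` and `shares` are illustrative.

```python
import pandas as pd

# 2016 column of the filtered EPA table (Total and U.S. territories excluded).
epa_2016 = pd.Series({
    'Transportation': 1849.737411,
    'Electricity generation': 1857.155032,
    'Industry': 1414.134995,
    'Agriculture': 581.679613,
    'Commercial': 416.065981,
    'Residential': 326.863208,
})

# Fraction of the six-sector total contributed by each sector, largest first.
shares = (epa_2016 / epa_2016.sum()).sort_values(ascending=False)
print(shares.round(3))
```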



In [None]:
xE_sc = bqplot.OrdinalScale()
yE_sc = bqplot.LinearScale()

# Sector names for the national comparison.
# 'Total' and 'U.S. territories' were already excluded when building xEf.
xE = xEf['Economic Sector']
xE.values


## Economic sector, county, emissions (heat map)

In [7]:
x3 = ghg['Sector']
y3 = ghg['County']
z3 = ghg['2016 total emissions (MTCO2e)']

x3names = x3.unique()
y3names = y3.unique()
z3names = np.zeros([len(x3names),len(y3names)])

for i,x3n in enumerate(x3names):
    for j, y3n in enumerate(y3names):
        mask3 = (x3 == x3n) & (y3 == y3n)
        z3names[i,j] = z3[mask3].sum()
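The nested loop above builds a sector-by-county grid of summed emissions. The same kind of grid can be produced with `DataFrame.pivot_table`. The following is a minimal sketch on made-up rows (the `demo` frame and its `emissions` column name are illustrative assumptions). Note that `pivot_table` sorts the row and column labels alphabetically, whereas `unique()` keeps first-appearance order.

```python
import pandas as pd

# Illustrative rows only; not taken from the WA GHG dataset.
demo = pd.DataFrame({
    'Sector': ['Chemicals', 'Chemicals', 'Metals', 'Metals'],
    'County': ['Benton', 'Skagit', 'Benton', 'Benton'],
    'emissions': [10.0, 20.0, 5.0, 7.0],
})

# Rows are sectors, columns are counties, cells are summed emissions.
# Combinations with no rows are filled with 0.
grid = demo.pivot_table(index='Sector', columns='County',
                        values='emissions', aggfunc='sum', fill_value=0)
print(grid)
```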

In [8]:
col_sc = bqplot.ColorScale(scheme="RdPu")
x3_sc = bqplot.OrdinalScale()
y3_sc = bqplot.OrdinalScale()

c_ax = bqplot.ColorAxis(scale = col_sc, 
                        orientation = 'vertical', 
                        side = 'right')

x3_ax = bqplot.Axis(scale = x3_sc,
                    label='County',
                    label_offset = '50px',
                    tick_rotate=90,
                    tick_style = {'font-size':'10px'},
                    offset = {'scale':x3_sc, 'value':'50'})
y3_ax = bqplot.Axis(scale = y3_sc, 
                    orientation = 'vertical', 
                    label = 'Sector',
                    label_offset = '100px')

heat_map = bqplot.GridHeatMap(color = np.log10(z3names),
                              row = x3names, 
                              column = y3names,
                              scales = {'color': col_sc,
                                        'row': y3_sc,
                                        'column': x3_sc},
                              interactions = {'click': 'select'},
                              anchor_style = {'fill':'blue'}, 
                              selected_style = {'opacity': 1.0},
                              unselected_style = {'opacity': 1.0})

fig_hm = bqplot.Figure(marks = [heat_map],
                       axes = [c_ax, y3_ax, x3_ax], 
                       fig_margin = dict(top=60, bottom=80, left=200, right=50),
                       title = "WA GHG emissions by sector and county")

fig_hm



Figure(axes=[ColorAxis(orientation='vertical', scale=ColorScale(scheme='RdPu'), side='right'), Axis(label='Sec…
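Sector/county combinations with no reported emissions have a total of zero, and `np.log10(0)` evaluates to `-inf` with a runtime warning. One possible way to avoid this, sketched here on a made-up array, is to replace zeros with NaN before taking the logarithm. How `GridHeatMap` renders NaN cells has not been verified here, so treat this as an assumption to test rather than a drop-in change.

```python
import numpy as np

# Illustrative grid only: two cells have no emissions.
z = np.array([[1000.0, 0.0],
              [0.0, 10.0]])

# Replace zeros with NaN first, so the log never sees 0 and produces no -inf.
log_z = np.log10(np.where(z > 0, z, np.nan))
print(log_z)
```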

## Interaction: Emissions per subsector per county (bar chart)

In [9]:
x4 = ghg['Subsector'].values
y4 = ghg['2016 total emissions (MTCO2e)'].values

x4_sc = bqplot.OrdinalScale() 
y4_sc = bqplot.LinearScale()

x4_ax = bqplot.Axis(scale=x4_sc,
                    label='Subsector',
                    label_offset = '30px',
                    tick_rotate=0,
                    tick_style={'font-size':'10px'})
#                     offset = {'scale':x4_sc, 'value':'50'}) # Here, moves labels too far.
y4_ax = bqplot.Axis(scale=y4_sc,
                    label='2016 GHG emissions (MT CO2e)',
                    label_offset = '50px',
                    orientation='vertical') 
#                     side='left',
#                     offset = {'scale':y4_sc, 'value':'50'})

In [10]:
i,j = 0,0
mask3 = (x3.values == x3names[i]) & (y3.values == y3names[j])
subsect4 = x4[mask3]
emis4 = y4[mask3]

emis4 = emis4[~pd.isnull(subsect4)]
subsect4 = subsect4[~pd.isnull(subsect4)]

subsect4u = np.unique(subsect4)
# Sum emissions for each unique subsector (index by the unique labels, not the raw rows)
emis4u = [emis4[subsect4 == s].sum() for s in subsect4u]

In [11]:
bar4 = bqplot.Bars(x = subsect4u,
                  y = emis4u,
                  color_mode = 'element',
                  scales = {'x': x4_sc, 'y': y4_sc})

fig_bar4 = bqplot.Figure(marks = [bar4],
                        axes = [x4_ax, y4_ax],
                        fig_margin = {'top':60, 'bottom':120, 'left':100, 'right':0},
                        title = 'WA County GHG emissions by subsector')

fig_bar4

Figure(axes=[Axis(label='Subsector', label_offset='30px', scale=OrdinalScale(), tick_style={'font-size': '10px…

In [12]:
mySelectedLabel2 = ipywidgets.Label()

def get_data_value(change):
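    selected = change['owner'].selected
    # Ignore events fired when the selection is cleared
    if selected is None or len(selected) == 0:
        return
    i,j = selected[0]
    mask3 = (x3.values == x3names[i]) & (y3.values == y3names[j])
    subsect4 = x4[mask3]
    emis4 = y4[mask3]
    emis4 = emis4[~pd.isnull(subsect4)]
    subsect4 = subsect4[~pd.isnull(subsect4)]
    subsect4u = np.unique(subsect4)
    emis4u = [emis4[subsect4 == subsect4u[b]].sum() for b in range(len(subsect4u)) ]
    emis4u = np.array(emis4u)
    v = emis4u.sum()  # no trailing comma, so v is a number rather than a 1-tuple
    mySelectedLabel2.value = 'Sector GHG emissions for county = ' + str(v)
    bar4.x = subsect4u
    bar4.y = emis4u  # plot the per-subsector totals, matching the length of bar4.x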
    
heat_map.observe(get_data_value, 'selected')

# fig_hm.layout.max_width = '500px'
# fig_hm.layout.max_height= '500px'
# fig_bar4.layout.max_width='400px'
# fig_bar4.layout.max_height='400px'

# ipywidgets.VBox([mySelectedLabel2, ipywidgets.VBox([fig_hm, fig_bar4])])


In [13]:
ipywidgets.VBox([mySelectedLabel, 
                 ipywidgets.HBox([fig_sect, fig_subsect]), 
                 mySelectedLabel2, 
                 ipywidgets.VBox([fig_hm, fig_bar4])])


VBox(children=(Label(value=''), HBox(children=(Figure(axes=[Axis(label='Sector', label_offset='60px', offset={…

## Explanation

This set of visualizations allows exploration of Washington state 2016 reported greenhouse gas (GHG) emissions data through two pairs of interlinked plots:
1. A bar chart plotting economic sector (x axis) against total annual GHG emissions (y axis). Clicking a bar displays a second bar chart that breaks the selected sector's annual emissions (y axis) down by subsector (x axis).
2. A heatmap plotting economic sector (rows) against county (columns), with annual GHG emissions shown on a logarithmic color scale. Clicking a cell displays a bar chart of annual emissions for the individual subsectors within the selected sector/county combination.

These visualizations have the following features targeted towards experts: abbreviations such as GHG (greenhouse gases) and MT (metric tons) are used; approximate absolute values can readily be estimated from the axes; several different data relationships are plotted; and the information is presented in a straightforward manner, without gimmicks.


## Contextual database

Emissions & Generation Resource Integrated Database (eGRID)
https://www.epa.gov/energy/emissions-generation-resource-integrated-database-egrid

This database provides US national greenhouse gas emissions data and so helps contextualize the Washington state data in my focus dataset. Its most recent year is 2016, so to enable comparison I have also chosen 2016 for visualizing the Washington state dataset, although the latter also includes 2017.

## Acknowledgement
These visualizations adapt design features and a substantial amount of code from class examples and from the solution to the Assignment 6 problem provided by Dr. Jill Naiman ("hwex.ipynb", personal communication, 3/15/2019).