In [55]:
import pandas as pd
import numpy as np
import ipywidgets
import bqplot

In [56]:
# list every sentinel per column: a dict literal keeps only the last value for a repeated key
lic = pd.read_csv("https://raw.githubusercontent.com/UIUC-iSchool-DataViz/is445_data/main/licenses_fall2022.csv",
                  na_values={"Original Issue Date": ["None"], "Effective Date": ["None"],
                             "County": ["None", "nan"], "Expiration Date": ["nan", "  /  /    "]})
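When a column has more than one missing-value sentinel, it needs a list in `na_values`. A Python dict literal with a repeated key silently keeps only the last value for that key. The standard-library sketch below uses the sentinel strings from the `read_csv` call to show the difference.

```python
# A dict literal with repeated keys: only the last value for each key survives.
na_dup = {"County": "None", "County": "nan",
          "Expiration Date": "nan", "Expiration Date": "  /  /    "}
print(na_dup)  # {'County': 'nan', 'Expiration Date': '  /  /    '}

# One list per column keeps every sentinel.
na_fixed = {"County": ["None", "nan"],
            "Expiration Date": ["nan", "  /  /    "]}
print(na_fixed)
```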

In [57]:
lic["Effective Date"] = pd.to_datetime(lic["Effective Date"],
                                            format = '%m/%d/%Y')
lic["Expiration Date"] = pd.to_datetime(lic["Expiration Date"],
                                            format = '%m/%d/%Y', errors='coerce')
lic['days_active'] = lic['Expiration Date'] - lic['Effective Date']

lic["Original Issue Date"] = pd.to_datetime(lic["Original Issue Date"],
                                             format = '%m/%d/%Y', errors='coerce')
# whole days from original issue to expiration (NaN where either date is missing)
lic['orig_to_exp'] = (lic['Expiration Date'] - lic['Original Issue Date']).dt.days

lic['orig_year'] = lic['Original Issue Date'].dt.year
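The sketch below runs the same date arithmetic on two made-up rows. The dates are hypothetical, not taken from the license file. `.dt.days` turns a Timedelta column into whole days and leaves NaN where either date failed to parse. pandas 2.x only supports second-level and finer timedelta resolutions, so `astype('timedelta64[D]')` can raise there, and `.dt.days` is the more portable choice. The year column also comes out as floats once a date is missing.

```python
import pandas as pd

# Hypothetical rows: the second issue date uses the "None" placeholder.
df = pd.DataFrame({
    "Original Issue Date": ["01/23/2020", "None"],
    "Expiration Date": ["01/23/2022", "03/01/2022"],
})

issue = pd.to_datetime(df["Original Issue Date"], format="%m/%d/%Y", errors="coerce")
expire = pd.to_datetime(df["Expiration Date"], format="%m/%d/%Y", errors="coerce")

delta = expire - issue    # Timedelta series; NaT where a date is missing
days = delta.dt.days      # whole days as floats; NaN where delta is NaT
years = issue.dt.year     # also float, because of the missing date

print(days.tolist())      # [731.0, nan]
```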


In [58]:
# mean days active per county (rows) and license type (columns); empty combinations become NaT
lic_table = pd.pivot_table(lic, values='days_active', index=['County'], columns=['License Type'],
                           aggfunc='mean', fill_value=pd.NaT)
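If the color values should be plain numbers, one option is to convert `days_active` to a day count before pivoting and let missing county/license pairs stay NaN. This is an alternative to pivoting the Timedelta column directly. The sketch uses hypothetical rows with the same column names.

```python
import pandas as pd

# Hypothetical licenses: two CHAMPAIGN cosmetology licenses and one COOK barber license.
toy = pd.DataFrame({
    "County": ["CHAMPAIGN", "CHAMPAIGN", "COOK"],
    "License Type": ["COSMO", "COSMO", "BARBER"],
    "days_active": pd.to_timedelta([365, 731, 100], unit="D"),
})

toy["days_active_n"] = toy["days_active"].dt.days   # Timedelta -> whole days
table = pd.pivot_table(toy, values="days_active_n", index="County",
                       columns="License Type", aggfunc="mean")
print(table)
```

Every combination that never occurs stays NaN, so `table.isna()` marks exactly the cells that would be empty in the heat map.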

In [59]:
# creating the grid heat map

# 1. data - columns is license type and rows are county
x = lic_table.columns  # license types, along the x axis
y = lic_table.index    # counties, along the y axis


# 2. scales 
col_sc = bqplot.ColorScale(scheme = 'RdPu')
x_sc = bqplot.OrdinalScale()
y_sc = bqplot.OrdinalScale()

# 3. ax
col_ax = bqplot.ColorAxis(scale=col_sc, orientation='vertical', side='right')
x_ax = bqplot.Axis(scale=x_sc, label='license type')
y_ax = bqplot.Axis(scale=y_sc, label='county', orientation = 'vertical')

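# 4. mark: grid heat map
# color = mean days active per cell, converted from Timedelta to float days;
# empty county/license combinations (NaT) are set to 0
heat_map = bqplot.GridHeatMap(row=lic_table.index, column=lic_table.columns,
                              color=(lic_table / pd.Timedelta(days=1)).fillna(0).values,
                              scales={'color': col_sc, 'row': y_sc, 'column': x_sc},
                              interactions={'click': 'select'},
                              selected_style={'fill': 'blue'})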


# 5. figure
fig = bqplot.Figure(marks=[heat_map], axes=[x_ax, y_ax, col_ax])
fig

[Figure output: grid heat map with x axis 'license type' and y axis 'county']
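This explains why empty cells can distort the color scale. Casting a Timedelta table straight to integers gives raw time ticks (nanoseconds in pandas 2.x), and NaT becomes the most negative int64 value, which stretches the color range. Dividing by `pd.Timedelta(days=1)` instead gives float days with NaN for the empty cells. The two-cell table below uses hypothetical values.

```python
import numpy as np
import pandas as pd

# Hypothetical table: one license type, one filled cell and one empty (NaT) cell.
table = pd.DataFrame({"COSMO": pd.to_timedelta([548, np.nan], unit="D")},
                     index=["CHAMPAIGN", "COOK"])

raw = table.values.astype("int64")             # integer ticks; NaT -> int64 minimum
days = (table / pd.Timedelta(days=1)).values   # float days; NaT -> NaN

print(raw)
print(days)
```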

In [60]:
# bar plot (I named the mark `hist`, but it is a bar chart)

i, j = 'CHAMPAIGN', 'COSMO'

the_data = lic.loc[(lic['County'] == i) & (lic['License Type'] == j)]
series_to_plot = the_data.groupby('orig_year')['orig_to_exp'].median()

this_x = series_to_plot.index
this_y = series_to_plot.values

#scale
x_sch = bqplot.LinearScale()
y_sch = bqplot.LinearScale()

# axis
ax_xsch = bqplot.Axis(label= 'year', scale = x_sch)
ax_ysch = bqplot.Axis(label='median days from issue to expiration', scale= y_sch, orientation='vertical')
# marks
hist = bqplot.Bars(x = this_x, y=this_y,
                   scales = {'x': x_sch, 'y': y_sch})
fig_hist = bqplot.Figure(marks = [hist], axes = [ax_xsch, ax_ysch]) 
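Combining both conditions into one boolean mask with `&` filters in a single step. Each comparison needs its own parentheses because `&` binds more tightly than `==`. Here is a sketch with made-up rows and the same column names.

```python
import pandas as pd

# Hypothetical rows standing in for the license data.
toy = pd.DataFrame({
    "County": ["CHAMPAIGN", "CHAMPAIGN", "CHAMPAIGN", "COOK"],
    "License Type": ["COSMO", "COSMO", "BARBER", "COSMO"],
    "orig_year": [2010, 2010, 2012, 2010],
    "orig_to_exp": [700, 800, 365, 1000],
})

mask = (toy["County"] == "CHAMPAIGN") & (toy["License Type"] == "COSMO")
series = toy.loc[mask].groupby("orig_year")["orig_to_exp"].median()
print(series)  # one bar per issue year: 2010 -> 750.0
```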

In [61]:
# linking grid heat map to bar plot

selectedLabel = ipywidgets.Label()

def get_data_value(change):
    selected = change['owner'].selected
    # do nothing when the selection is cleared or covers more than one cell
    if selected is None or len(selected) != 1:
        return
    i, j = (int(k) for k in selected[0])
    county = lic_table.index[i]
    license_type = lic_table.columns[j]
    v = lic_table.iloc[i, j]
    # redraw the bar plot for the selected county / license type
    the_data = lic.loc[(lic['County'] == county) & (lic['License Type'] == license_type)]
    series_to_plot = the_data.groupby('orig_year')['orig_to_exp'].median()
    hist.x = series_to_plot.index
    hist.y = series_to_plot.values
    selectedLabel.value = f'{county}, {license_type}: mean active period {v}'

heat_map.observe(get_data_value, 'selected')
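The `observe` call relies on traitlets. Any change to a trait such as `selected` calls the handler with a `change` mapping that holds `'new'`, `'old'`, and `'owner'`. The sketch below uses a hypothetical `FakeHeatMap` class (not part of bqplot) with an `Any` trait standing in for a real `GridHeatMap`. That way the guard against cleared or multi-cell selections can run without a front end.

```python
from traitlets import Any, HasTraits


class FakeHeatMap(HasTraits):
    """Hypothetical stand-in for a mark that exposes a `selected` trait."""
    selected = Any(None)


log = []


def on_select(change):
    selected = change["new"]
    # skip cleared selections and selections of several cells
    if selected is None or len(selected) != 1:
        log.append("ignored")
        return
    i, j = selected[0]
    log.append((i, j))


mark = FakeHeatMap()
mark.observe(on_select, names="selected")
mark.selected = [[2, 5]]           # one cell clicked
mark.selected = None               # selection cleared
mark.selected = [[0, 0], [1, 1]]   # several cells selected
print(log)  # [(2, 5), 'ignored', 'ignored']
```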


In [62]:
fig.layout.min_width = '500px'
fig_hist.layout.min_width = '500px'
figures = ipywidgets.HBox([fig, fig_hist])
myDashboard = ipywidgets.VBox([selectedLabel, figures])

myDashboard

[Dashboard output: selection label above the heat map and the bar plot, which sit side by side]

## Reflection 

This assignment was difficult largely because I did not follow the bqplot instructions. I found myself writing my own functions while ignoring the fact that bqplot objects already expose traitlets that can be observed directly. Moreover, I named the scales, axes, and marks of my bqplot plots inconsistently, which made my logic hard to follow. Going forward, I will be more systematic when using declarative plotting engines. I think this will reduce my confusion and the time it takes me to produce a final product.

### Transformations/Scalings

I transformed the data from its initial state with datetime arithmetic in pandas. First, I converted the original issue date, effective date, and expiration date columns into datetime objects. I then reported the days active and the time from original issue to expiration in whole days. This means that time-of-day components are not present: if a license became effective on 2022-01-23 23:59:59, the date would be treated as 2022-01-23 and would not be rounded up. I did not rescale the data.
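Keeping only whole days truncates rather than rounds. In the standard-library sketch below, with hypothetical timestamps, a gap just short of two days still counts as one day. pandas `Timedelta.days` follows the same convention for positive durations.

```python
from datetime import datetime

# Hypothetical timestamps, one second after midnight and one second before midnight.
effective = datetime(2022, 1, 23, 0, 0, 1)
expiration = datetime(2022, 1, 24, 23, 59, 59)

gap = expiration - effective
print(gap)                # 1 day, 23:59:58
print(gap.days)           # 1: the partial day is dropped, not rounded up
print(effective.date())   # 2022-01-23: the time of day is discarded
```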

### Discussion of NaNs
When data was missing, I left the empty cells in the grid heat map. I believe the absence of a phenomenon is just as meaningful as its presence: a user may want to know which kinds of licenses have never been issued in their county. For the same reason, I think the empty bar plot is also a useful visual, since it confronts the viewer with the data's absence. With more time, I would have made sure the NaT values were shown in a color clearly distinct from the rest of the data. At the moment, they appear to be treated as '0' on the color scale, which can make it difficult to find cells that contain actual data.
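Because the empty cells carry meaning here, they can also be listed directly. Below is a sketch on a hypothetical pivot table of mean days, where NaN marks a county/license pair with no licenses. `isna()` flags NaT the same way if the table still holds Timedeltas.

```python
import numpy as np
import pandas as pd

# Hypothetical pivot table: rows are counties, columns are license types.
table = pd.DataFrame({"BARBER": [np.nan, 100.0], "COSMO": [548.0, np.nan]},
                     index=["CHAMPAIGN", "COOK"])

missing = table.isna()
never_issued = [(county, lic_type)
                for county in table.index
                for lic_type in table.columns
                if missing.loc[county, lic_type]]
print(never_issued)  # [('CHAMPAIGN', 'BARBER'), ('COOK', 'COSMO')]
```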

### Aesthetics
I ran out of time to make any aesthetic choices. With more time, I would have rotated the tick labels on the grid heat map axes so they are legible, added a title to the bar plot, and shown the NaT values in a distinct color to make it clear where data exists in the grid heat map.