# Jupyter Notebook best practices for effective collaboration with stakeholders

Working with data in Python usually involves writing quite a few lines of code. Jupyter notebooks provide a convenient way to write and execute code and visualize the results. While notebooks are a great tool for data scientists and analysts to bring together code and data, they can be intimidating for stakeholders who are not proficient in writing code. Many large code cells are distracting when the focus is on data and visualization rather than on code. Applying some software engineering practices and Jupyter extensions can reduce this overhead.

This notebook serves as a showcase for using interactive elements, patterns and extensions to achieve a more productive collaboration between tech and business people. We'll cover:
  - **pandas styles**: expressive data frame output
  - **visualization**: add interactive visualization
  - **ipywidgets**: add interaction
  - **patterns**: encapsulate widgets and visualizations into reusable components
  - **Jupyter extensions**: create a collaborative, cohesive work environment

These are my current best practices. I am always eager to extend this approach with additional packages, patterns, extensions etc. to create even more powerful and productive experiences.

## Setup

First, let's create a dataframe containing random time series data.

In [None]:
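import numpy as np
import pandas as pd

# pandas.util.testing (and its makeTimeDataFrame helper) is no longer
# available in recent pandas releases, so build the frame directly:
# 12 month-start rows and 6 columns of standard-normal noise.
n_rows, n_cols = 12, 6
index = pd.date_range('2000-01-01', periods=n_rows, freq='MS')
data = pd.DataFrame(np.random.randn(n_rows, n_cols), index=index, columns=list('ABCDEF'))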
print('rows, cols:', data.shape)
data.head(6)

## Pandas

### Styles

Since pandas 0.17.1 it is possible to customize the way pandas dataframes are rendered by applying custom styles. Technically, styling is done with CSS: the `Styler` API (`DataFrame.style`) makes it easy to apply CSS attribute/value pairs conditionally.

**Note**: This feature is still under active development.

In [None]:
import functools

def highlight(s, func, color):
    '''
    Sets the background color for values in a pandas Series,
    if the value equals the return value of func.
    '''
    is_true = s == func(s)
    return [f'background-color: {color}' if v else '' for v in is_true]

hmin = functools.partial(highlight, func=min, color='red')
hmax = functools.partial(highlight, func=max, color='lime')

display(data.style.apply(hmin).apply(hmax))
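
A style function only has to return one CSS string per cell. Calling such a function directly on a small Series, outside of `Styler.apply`, makes that contract visible. The sketch below uses a hypothetical helper, `highlight_extreme`, and made-up sample values; it mirrors the idea of `highlight` above rather than reproducing it.

```python
import pandas as pd

def highlight_extreme(s, func, color):
    """Return one CSS declaration per element of s; non-matching cells get ''."""
    target = func(s)
    return [f'background-color: {color}' if v == target else '' for v in s]

# made-up sample values, for illustration only
sample = pd.Series([3.0, -1.5, 7.2], index=['a', 'b', 'c'])

css_max = highlight_extreme(sample, max, 'lime')
css_min = highlight_extreme(sample, min, 'red')

print(css_max)  # ['', '', 'background-color: lime']
print(css_min)  # ['', 'background-color: red', '']
```

An empty string means "no extra styling for this cell", which is why several style functions can be chained without overwriting each other's untouched cells.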

## Visualization

Choosing the right visualization is paramount for communicating what the data means and for uncovering patterns in it. Static visualizations are great, but adding interactivity allows for quicker exploration and enriches the experience.

### matplotlib

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

fsize = (18, 5)

# the pandas plotting API uses matplotlib by default
data.plot(figsize=fsize)

### seaborn

The seaborn package provides powerful functions for creating static statistical plots. It sits on top of matplotlib.

In [None]:
# %conda install seaborn

In [None]:
import seaborn as sns
_, ax = plt.subplots(figsize=fsize)

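# wide-form data: every column is drawn as its own line with its own marker,
# so hue/style must not be assigned explicitly
sns.lineplot(data=data, ax=ax, markers=True, dashes=False, n_boot=3)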

### bokeh

From the official website: 
> "_Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets._"

In [None]:
# %conda install bokeh

In [None]:
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show, output_notebook
from bokeh.models.tools import HoverTool
import random

# render in the current notebook instead of a separate web page
output_notebook()

# choose a random column
col = random.choice(data.columns)
source = ColumnDataSource(data[[col]].rename(columns={col: 'series'}))

# Bokeh 3 removed plot_width/plot_height in favor of width/height
p = figure(x_axis_type="datetime", width=980, height=200, title=f'Timeseries {col}')
p.add_tools(HoverTool(
    tooltips=[
        ('value', '@series{0.00}'),  # two decimal places
    ],
    mode='vline'
))

p.line('index', 'series', source=source)

show(p)

### bqplot

bqplot is a plotting framework of interactive widgets made for Jupyter notebooks. It was created by Bloomberg.

In [None]:
# %conda install bqplot -c conda-forge

In [None]:
import random
import re

import ipywidgets as widgets
from bqplot import ColorScale, DateScale, LinearScale, Lines, Axis, Figure
from bqplot.interacts import IndexSelector

def index_sel_dt_callback(change):
    """Callback to update graph when new data has been selected
    
    Parameters
    ----------
    change : dict
        Widget and changed values.
    """
    if selector.selected:
        date = re.split(r'T', str(selector.selected[0]))[0]
        label.value = date

# choose a random column
col = random.choice(data.columns)
source = data[col]

# set scales and axes
scales = {
    'x': DateScale(), 
    'y': LinearScale(), 
    'color': ColorScale(scheme='oranges')
}

axes = [
    Axis(scale=scales['x'], label='Date', num_ticks=int(len(source.index) / 2)),
    Axis(scale=scales['y'], label='Value', orientation='vertical')
]

line = Lines(x=source.index, y=source, scales=scales)

# create interactive selector and monitor brushing events
selector = IndexSelector(scale=scales['x'], show_names=False, continuous_update=False)
selector.observe(index_sel_dt_callback, ['selected'])

fig = Figure(marks=[line], 
             axes=axes, 
             interaction=selector, 
             title=f'Timeseries {col}',
             animation_duration=1000, 
             layout=widgets.Layout(min_width='980px', max_height='250px'))
label = widgets.Label()
widgets.VBox(children=[fig, label])

## Widgets

Jupyter widgets are interactive elements which can be added to a notebook cell. Widgets can be used to modify variables and report status, instead of changing code and re-running cells.

In [None]:
import ipywidgets as widgets
from IPython.display import display

# the @interact decorator can automatically infer which
# widget to display based on a parameter's type.

@widgets.interact(col=data.columns)
def select_series(col):
    """This function creates a simple plot for
    the selected column.
    
    Parameters
    ----------
    col : str
        Column name.
    """
    series = data[col]
    series.plot(figsize=fsize)

In [None]:
# It is still possible to create widgets explicitly
# to have more fine-grained control over a widget's properties.

@widgets.interact(col=widgets.Dropdown(options=data.columns, description='Feature:'))
def select_series(col):
    """This function creates a simple plot for
    the selected column.
    
    Parameters
    ----------
    col : str
        Column name.
    """
    series = data[col]
    series.plot(figsize=fsize)
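
`interact` hides the wiring between a widget and its function. Widgets are built on the traitlets library, and a callback registered with `observe` (as for the bqplot selector earlier) receives a change object describing the update. The sketch below uses a plain `traitlets.HasTraits` class as a stand-in for a widget, so it runs without a notebook front end; the `Selection` class and the `log` list are hypothetical names chosen for illustration.

```python
import traitlets

class Selection(traitlets.HasTraits):
    """Stand-in for a widget: one observable attribute, no front end needed."""
    value = traitlets.Unicode('A')

log = []

def on_change(change):
    # change behaves like a dict with the keys
    # 'name', 'old', 'new', 'owner' and 'type'
    log.append((change['name'], change['old'], change['new']))

sel = Selection()
sel.observe(on_change, names='value')

# assigning a new value triggers the callback, just as a user
# picking another entry in a dropdown would
sel.value = 'B'
print(log)  # [('value', 'A', 'B')]
```

Registering the callback for `names='value'` only keeps it from firing on unrelated attribute changes.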

## Patterns

The methods discussed so far work well but are local to a cell or notebook. At some point, recurring implementation patterns emerge. For instance, visualizing and selecting a certain type of data may be reimplemented in the same way again and again. Similarly, the same visualizations may be required in different notebooks. In addition, handling state becomes difficult and messy over time.

Thus, it makes sense to eventually encapsulate recurring patterns such as widget and visualization constellations into reusable components which can simply be imported into new notebooks. State can be stored in class members.
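
As a rough illustration of keeping state in class members, the sketch below defines a small, hypothetical `SeriesSelection` class. It is not the `mywidgets.TSExplorer` implementation used in the next cells; the class name, its attributes and the sample frame are assumptions made for this example. In a real component, widget callbacks would set `feature` and `dates`.

```python
import numpy as np
import pandas as pd

class SeriesSelection:
    """Hypothetical component that keeps selection state for a time series frame."""

    def __init__(self, data):
        self.data = data
        self.reset()

    def reset(self):
        """Clear the selection: first column, full date range."""
        self.feature = self.data.columns[0]
        self.dates = (self.data.index[0], self.data.index[-1])

    @property
    def selected_data(self):
        """Chosen column between the chosen dates (both ends inclusive)."""
        start, end = self.dates
        return self.data.loc[start:end, self.feature]

# small deterministic frame: row i holds [2*i, 2*i + 1]
index = pd.date_range('2000-01-01', periods=12, freq='MS')
frame = pd.DataFrame(np.arange(24.0).reshape(12, 2), index=index, columns=['A', 'B'])

sel = SeriesSelection(frame)
sel.feature = 'B'                   # a dropdown callback would do this
sel.dates = (index[2], index[4])    # a brush selector callback would do this
print(sel.selected_data.tolist())   # [5.0, 7.0, 9.0]

sel.reset()
print(len(sel.selected_data))       # 12
```

Because the selection lives on the object, any later cell can read `selected_data` without relying on notebook-level globals.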

In [None]:
import mywidgets

tse = mywidgets.TSExplorer(data, layout=widgets.Layout(width='980px', height='490px'))

In [None]:
tse

In [None]:
tse.selected_data

In [None]:
tse

In [None]:
tse.reset(selected_feature=True, selected_dates=True)
tse.selected_data

## Jupyter Notebook Extensions

  - Hide input all
  - Table of Contents (2)
  - Variable Inspector
  - Highlight selected word
  - Hide input
  - Scratchpad

## Key takeaways
  1. **Combine interactive widgets and visualizations into reusable components**
     * Helps to apply the DRY principle
     * Facilitates building and sharing common tools
     * Keeps state manageable
  2. **Use Jupyter extensions to adapt the environment to the current context**
     * Reduce distractions
     * Quicker navigation

## Resources

### Discussed in this notebook
  - [pandas styles docs](https://pandas.pydata.org/pandas-docs/stable/style.html)
  - [seaborn](https://seaborn.pydata.org)
  - [bokeh](https://bokeh.pydata.org)
  - [bqplot](https://github.com/bloomberg/bqplot)
  - [ipywidgets](https://ipywidgets.readthedocs.io)
  - [Jupyter notebook extensions](https://jupyter-contrib-nbextensions.readthedocs.io)
  
### Going further
  - [beakerx](http://beakerx.com)
  - [jupyterlab](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html)