# Introduction to Bokeh: interactive data visualization

References:
1. https://realpython.com/python-data-visualization-bokeh/
2. https://www.tutorialspoint.com/bokeh/bokeh_tutorial.pdf
3. https://docs.bokeh.org/en/latest/docs/first_steps.html
4. https://towardsdatascience.com/data-visualization-with-bokeh-in-python-part-one-getting-started-a11655a467d4

## 1. Intro to Bokeh

### 1.1 What is Bokeh? 
- Python package for interactive data visualization
- Matplotlib and Seaborn produce static plots; Bokeh is an equally powerful visualization tool that also makes plots dynamic and interactive
- Bokeh renders its graphics using HTML and JavaScript, which makes it well suited to building web-based dashboards and applications.
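As a rough illustration of the HTML/JavaScript point, the sketch below (assuming Bokeh 2.4 or later) uses `bokeh.embed.components` to turn a figure into a `<script>` block and a `<div>` placeholder that could be pasted into your own web page. That page would also need to load BokehJS itself, for example from the Bokeh CDN; the figure and its data are arbitrary.

```python
from bokeh.embed import components
from bokeh.plotting import figure

# A tiny figure to embed (the data values are arbitrary)
embed_fig = figure(title='Embedded figure', width=300, height=300)
embed_fig.line([1, 2, 3], [1, 4, 9])

# script: a <script> block with the JavaScript that builds the plot
# div: the placeholder element the plot is rendered into
script, div = components(embed_fig)
```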

### 1.2 Generate a figure 

In [25]:
# Bokeh Libraries
from bokeh.io import output_notebook, output_file
from bokeh.plotting import figure, show

# My x-y coordinate data
x = [1, 2, 1]
y = [1, 1, 2]

# Output the visualization directly in the notebook
output_notebook() 

# Create a figure with no toolbar and axis ranges of [0,3]
fig = figure(title='My Coordinates',
             height=300, width=300,
             x_range=(0, 3), y_range=(0, 3),
             toolbar_location=None)

# Draw the coordinates as circles
fig.circle(x=x, y=y,
           color='green', size=10, alpha=0.5)

# Show plot
show(fig)

#### Note: 
**Figure function**: Create a new Figure for plotting. \
https://docs.bokeh.org/en/latest/docs/reference/plotting/figure.html

**Two popular methods of output**  
1. Output the visualization directly in the notebook: 
   ***output_notebook()*** 
2. Write the visualization to a static HTML file:
   ***output_file('name_of_html.html', title='title_name')***
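For the second method, here is a minimal sketch, assuming Bokeh 2.4 or later: `bokeh.io.save` writes a figure to an HTML file without changing the global output mode, which is handy when `output_notebook()` is already active. The temporary directory and file name are placeholders; `resources=CDN` makes the page load BokehJS from the Bokeh CDN instead of inlining it.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

html_fig = figure(title='My Coordinates', width=300, height=300,
                  x_range=(0, 3), y_range=(0, 3))
html_fig.scatter(x=[1, 2, 1], y=[1, 1, 2], marker='circle',
                 color='green', size=10, alpha=0.5)

# Placeholder location: a fresh temporary directory
html_path = os.path.join(tempfile.mkdtemp(), 'name_of_html.html')

# Write a standalone HTML page whose <title> is 'title_name'
save(html_fig, filename=html_path, resources=CDN, title='title_name')
```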


## 2. Adding interaction

### Configuring the Toolbar

**The default toolbar**: Pan, Box Zoom, Wheel Zoom, Save, Reset, Help
The toolbar can be removed by passing ***toolbar_location=None*** when instantiating a figure() object, or relocated by passing any of 'above', 'below', 'left', or 'right'.


**Bokeh offers 18 specific tools across five categories:**
- ***Pan/Drag***: box_select, box_zoom, lasso_select, pan, xpan, ypan, resize_select
- ***Click/Tap***: poly_select, tap
- ***Scroll/Pinch***: wheel_zoom, xwheel_zoom, ywheel_zoom
- ***Actions***: undo, redo, reset, save
- ***Inspectors***: crosshair, hover
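The toolbar options above can be combined when creating a figure. In this small sketch (assuming Bokeh 2.4 or later), tools are requested either by name in a comma-separated string or as tool model instances, which lets you configure them; more can be attached later with `add_tools`.

```python
from bokeh.models import BoxZoomTool, HoverTool, PanTool, ResetTool
from bokeh.plotting import figure

# Tools requested by name; toolbar moved above the plot
fig_a = figure(width=300, height=300,
               tools='pan,box_zoom,reset', toolbar_location='above')

# Tools passed as model instances; toolbar hidden entirely
fig_b = figure(width=300, height=300,
               tools=[PanTool(), BoxZoomTool(), ResetTool()],
               toolbar_location=None)

# More tools can be added after the figure exists
fig_b.add_tools(HoverTool(tooltips=[('x', '$x'), ('y', '$y')]))
```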

In [26]:
from datetime import datetime
import pandas as pd
import numpy as np

In [27]:
# reference:https://www.kaggle.com/datasets/deepcontractor/unicorn-companies-dataset?resource=download
# read in data
data = pd.read_csv("Unicorn_Companies.csv",  parse_dates=['Date Joined'])

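# data cleaning
# Convert amounts such as '$7.44B', '$500M' or '$250K' to billions of dollars
def clean_Total_Raised(text):
    try:
        if text[-1] == 'K':
            res = float(text[1:-1])*10**-6
        elif text[-1] == 'M':
            res = float(text[1:-1])*10**-3
        elif text[-1] == 'B':
            res = float(text[1:-1])
        else:
            res = np.nan
    except (TypeError, ValueError, IndexError):
        # Missing values (NaN), empty strings or non-numeric text
        res = np.nan
    return res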

data['Valuation ($B)'] = data['Valuation ($B)'].apply(lambda x: float(x.replace('$', '')))
data['Founded Year'] = data['Founded Year'].apply(lambda x: None if x == "None" else int(x))

data['Total Raised'] = data['Total Raised'].apply(lambda x: clean_Total_Raised(x))
data['Investors Count'] = data['Investors Count'].apply(lambda x: None if x == "None" else int(x))
data['Deal Terms'] = data['Deal Terms'].apply(lambda x: None if x == "None" else int(x))
data["Country"] = data["Country"].astype('category')
data["City"] = data["City"].astype('category')
data["Industry"] = data["Industry"].astype('category')
data["Financial Stage"] = data["Financial Stage"].astype('category')

data.rename(columns={'Valuation ($B)': 'Valuation_($B)',
                     'Date Joined': 'Date_Joined',
                     'Select Inverstors': 'Select_Inverstors',
                     'Founded Year': 'Founded_Year',
                     'Total Raised': 'Total_Raised_($B)',
                     'Financial Stage': 'Financial_Stage',
                     'Investors Count': 'Investors_Count',
                     'Deal Terms': 'Deal_Terms',
                     'Portfolio Exits': 'Portfolio_Exits'
                     }, inplace=True)
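`clean_Total_Raised` is applied one value at a time. As an alternative, here is a vectorised sketch that assumes every valid value looks like `'$7.44B'`, `'$500M'` or `'$250K'`; the function `parse_amount_to_billions` and the `TO_BILLIONS` table are hypothetical helpers, not part of the original notebook or of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup: multiplier that converts each suffix to billions of dollars
TO_BILLIONS = {'K': 1e-6, 'M': 1e-3, 'B': 1.0}

def parse_amount_to_billions(series):
    """Vectorised variant of clean_Total_Raised (assumes values like '$7.44B')."""
    # Split '$7.44B' into the number '7.44' and the suffix 'B';
    # anything that does not match (e.g. 'None', NaN) becomes NaN in both parts
    parts = series.astype(str).str.extract(r'^\$([\d.]+)([KMB])$')
    amount = pd.to_numeric(parts[0], errors='coerce')
    return amount * parts[1].map(TO_BILLIONS)
```

If adopted, it would be used instead of the `apply(...)` line for the `'Total Raised'` column, before the columns are renamed, e.g. `data['Total Raised'] = parse_amount_to_billions(data['Total Raised'])`.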

In [28]:
data.head(2)

| | Company | Valuation_($B) | Date_Joined | Country | City | Industry | Select_Inverstors | Founded_Year | Total_Raised_($B) | Financial_Stage | Investors_Count | Deal_Terms | Portfolio_Exits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bytedance | 140.0 | 2017-04-07 | China | Beijing | Artificial intelligence | Sequoia Capital China, SIG Asia Investments, S... | 2012.0 | 7.44 | IPO | 28.0 | 8.0 | 5.0 |
| 1 | SpaceX | 100.3 | 2012-12-01 | United States | Hawthorne | Other | Founders Fund, Draper Fisher Jurvetson, Rothen... | 2002.0 | 6.874 | | 29.0 | 12.0 | |


### 2.1 Selecting Data Points

In [29]:
from bokeh.models import ColumnDataSource, NumeralTickFormatter

In [30]:
# Store the data in a ColumnDataSource
Valuation_totalRaised = ColumnDataSource(data)

# Specify the selection tools to be made available
select_tools = ['lasso_select','box_select', 'reset']

# Create the figure
fig = figure(height=400,
             width=600,
             x_axis_label='Total Raised ($B)',
             y_axis_label='Valuation ($B)',
             toolbar_location='below',
             tools=select_tools)

# Add a square for each company
fig.square(x='Total_Raised_($B)',
           y='Valuation_($B)',
           source=Valuation_totalRaised,
           color='royalblue',
           selection_color='deepskyblue',
           nonselection_color='lightgray',
           nonselection_alpha=0.3)

# Visualize
show(fig)
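Glyph methods such as `fig.square` refer to columns of the `ColumnDataSource` by name, and selections made with the lasso or box tool are stored on the source. The sketch below uses a hypothetical three-row DataFrame (assuming Bokeh 2.4 or later) to show both. In a standalone HTML page the browser-side selection is not sent back to Python; that synchronisation needs a Bokeh server.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical three-company frame using the renamed column names
demo_df = pd.DataFrame({'Company': ['A', 'B', 'C'],
                        'Total_Raised_($B)': [1.0, 2.5, 0.4],
                        'Valuation_($B)': [10.0, 40.0, 3.0]})

# Each DataFrame column becomes a named column of the source;
# the DataFrame index is added as an extra column called 'index'
demo_source = ColumnDataSource(demo_df)

# Selection tools write the chosen row numbers here; setting them
# from Python selects the first and third rows
demo_source.selected.indices = [0, 2]
picked = [demo_source.data['Company'][i] for i in demo_source.selected.indices]
```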

### 2.2 Adding Hover Actions

In [31]:
# Bokeh Library
from bokeh.models import HoverTool

# Format the tooltip
tooltips = [
            ('Company','@Company'),
            ('Country', '@Country'),
            ('Industry', '@Industry'),
            ('Financial Stage','@Financial_Stage'),
           ]

# Add the HoverTool to the figure
fig.add_tools(HoverTool(tooltips=tooltips))

# Visualize
show(fig)
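Tooltip fields refer to columns with `@column`. Column names that contain characters such as `(` and `$`, like `Valuation_($B)` here, have to be wrapped in braces, and an optional number format can follow in a second pair of braces. A sketch, assuming the column names created above:

```python
from bokeh.models import HoverTool

# @{...} is required because the column names contain '(' and '$';
# {0.0} and {0.00} are numeral-style formats (the same format strings
# NumeralTickFormatter uses) applied to the displayed number
hover_fmt = HoverTool(tooltips=[
    ('Company', '@Company'),
    ('Valuation ($B)', '@{Valuation_($B)}{0.0}'),
    ('Total raised ($B)', '@{Total_Raised_($B)}{0.00}'),
])

# To use it on the scatter plot: fig.add_tools(hover_fmt)
```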

### 2.3 Linking Axes and Selections

In [32]:
data['Joined_Year'] = pd.DatetimeIndex(data['Date_Joined']).year
by_year = data.groupby('Joined_Year')
# mean() averages only the non-missing values in each year;
# dividing sum() by size() would count missing values as zeros
year_view = pd.DataFrame({'total_number_of_companies': by_year.size(),
                          'ave_valuation_$B': by_year['Valuation_($B)'].mean(),
                          'ave_raised_$B': by_year['Total_Raised_($B)'].mean(),
                          'ave_investor_cnts': by_year['Investors_Count'].mean()
                          }).reset_index()
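A note on yearly averages: dividing a group's `sum()` by its `size()` counts rows with missing values in the denominator (effectively as zeros), whereas `mean()` skips them. A tiny made-up frame shows the difference:

```python
import numpy as np
import pandas as pd

# Made-up data: 2021 has one company with unknown funding
toy = pd.DataFrame({'Joined_Year': [2021, 2021, 2022],
                    'Total_Raised_($B)': [1.0, np.nan, 3.0]})
toy_groups = toy.groupby('Joined_Year')['Total_Raised_($B)']

sum_over_size = toy_groups.sum() / toy_groups.size()  # missing value counted as 0
mean_known = toy_groups.mean()                        # average of known values only
```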

In [33]:
# Bokeh Libraries
from bokeh.models import ColumnDataSource, CategoricalColorMapper, Div
from bokeh.layouts import gridplot, column

In [34]:
# Store the data in a ColumnDataSource
year_view_cds = ColumnDataSource(year_view)

#Create a dict with the stat name and its corresponding column in the data
stat_names = {'Total number of companies': 'total_number_of_companies',
              'Average valuation ($B)': 'ave_valuation_$B',
              'Average raised ($B)': 'ave_raised_$B',
              'Average investor counts': 'ave_investor_cnts'}

# The figure for each stat will be held in this dict
stat_figs = {}

# For each stat in the dict
for stat_label, stat_col in stat_names.items():

    # Create a figure
    fig = figure(y_axis_label=stat_label,
                 height=200, width=400,
                 x_range=(2015, 2022), tools=['xpan', 'reset', 'save'])

    # Configure vbar
    fig.vbar(x='Joined_Year', top=stat_col, source=year_view_cds, width=0.9)

    # Add the figure to stat_figs dict
    stat_figs[stat_label] = fig
    
    
# Create layout
grid = gridplot([[stat_figs['Total number of companies'], stat_figs['Average valuation ($B)']],
                 [stat_figs['Average raised ($B)'], stat_figs['Average investor counts']]])


# Link together the x-axes (the chained assignment gives all four figures the same range object)
stat_figs['Total number of companies'].x_range = \
    stat_figs['Average valuation ($B)'].x_range = \
    stat_figs['Average raised ($B)'].x_range = \
    stat_figs['Average investor counts'].x_range

sup_title = Div(text='Year analysis')
show(column(sup_title, grid))
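Instead of chaining assignments after the figures exist, another common pattern is to create one range object up front and pass it to every figure, so panning any of them moves all of them. A sketch with hypothetical toy data, assuming Bokeh 2.4 or later:

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, Range1d
from bokeh.plotting import figure

# Hypothetical yearly values, only for illustration
toy_source = ColumnDataSource(dict(year=[2016, 2018, 2020], value=[3, 5, 4]))

# One range object shared by every figure
shared_x = Range1d(start=2015, end=2022)

shared_figs = []
for _ in range(4):
    f = figure(width=400, height=200, x_range=shared_x,
               tools=['xpan', 'reset', 'save'])
    f.vbar(x='year', top='value', width=0.9, source=toy_source)
    shared_figs.append(f)

shared_grid = gridplot([shared_figs[:2], shared_figs[2:]])
```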

### 2.4 Highlighting Data Using the Legend

In [35]:
# Bokeh Libraries
from bokeh.plotting import figure, show
from bokeh.io import output_file
from bokeh.models import ColumnDataSource, CDSView, GroupFilter
from bokeh.layouts import row

In [36]:
# Store the data in a ColumnDataSource
us_china_cds = ColumnDataSource(data)

In [37]:
# Create a view for each country
china_filters = [GroupFilter(column_name='Country', group='China')]
china_view = CDSView(source=us_china_cds,
                     filters=china_filters)

us_filters = [GroupFilter(column_name='Country', group='United States')]
us_view = CDSView(source=us_china_cds,
                  filters=us_filters)

# Consolidate the common keyword arguments in dicts
common_figure_kwargs = {
    'width': 600,
    'x_axis_label': 'Joined Year',
    'toolbar_location': None,
}
common_circle_kwargs = {
    'x': 'Joined_Year',
    'y': 'Valuation_($B)',
    'source': us_china_cds,
    'size': 15,
    'alpha': 0.7,
}
common_china_kwargs = {
    'view': china_view,
    'color': '#002859',
    'legend_label': 'China'
}
common_us_kwargs = {
    'view': us_view,
    'color': '#FFC324',
    'legend_label': 'United States'
}

# Create the figure and draw the data for each country
hide_fig = figure(**common_figure_kwargs,
                  title='Click Legend to HIDE Data', 
                  y_axis_label='Valuation ($B)')
hide_fig.circle(**common_circle_kwargs, **common_china_kwargs)
hide_fig.circle(**common_circle_kwargs, **common_us_kwargs)

# Add interactivity to the legend
hide_fig.legend.click_policy = 'hide'

# Visualize
show(hide_fig)
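Besides `'hide'`, the legend `click_policy` also accepts `'mute'`, which dims a glyph instead of removing it; how faint it gets is controlled by `muted_*` properties such as `muted_alpha`. A sketch with hypothetical toy data, using two small sources instead of the filtered views above so that it runs on its own:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical toy data for two countries
china_demo = ColumnDataSource(dict(year=[2018, 2019, 2020], val=[5.0, 12.0, 8.0]))
us_demo = ColumnDataSource(dict(year=[2018, 2019, 2020], val=[9.0, 4.0, 15.0]))

mute_fig = figure(width=600, title='Click Legend to MUTE Data',
                  x_axis_label='Joined Year', y_axis_label='Valuation ($B)',
                  toolbar_location=None)

# muted_alpha sets how faint a glyph becomes when its legend entry is clicked
mute_fig.scatter('year', 'val', source=china_demo, size=15, color='#002859',
                 legend_label='China', muted_alpha=0.1)
mute_fig.scatter('year', 'val', source=us_demo, size=15, color='#FFC324',
                 legend_label='United States', muted_alpha=0.1)

mute_fig.legend.click_policy = 'mute'
```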

## 3. Add Widgets

https://docs.bokeh.org/en/2.4.0/docs/user_guide/interaction/widgets.html?highlight=widget
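Widgets do not always need a Python process: a widget can drive the plot through a JavaScript callback (`CustomJS`) that runs in the browser, so the result also works in a static HTML file. A minimal sketch, assuming Bokeh 2.4 or later; the data and the marker-size behaviour are only illustrative.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure

demo_src = ColumnDataSource(dict(x=[1, 2, 3], y=[1, 4, 9]))
demo_plot = figure(width=300, height=300)
demo_renderer = demo_plot.scatter('x', 'y', source=demo_src, size=10)

size_slider = Slider(start=5, end=30, value=10, step=1, title='Marker size')

# Runs in the browser whenever the slider value changes;
# cb_obj is the widget that triggered the callback
resize = CustomJS(args=dict(glyph=demo_renderer.glyph), code="""
    glyph.size = cb_obj.value;
""")
size_slider.js_on_change('value', resize)

# show(demo_layout) would produce a page that needs no Python server
demo_layout = column(size_slider, demo_plot)
```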

### Example 1: 
https://github.com/WillKoehrsen/Bokeh-Python-Visualization/tree/master/application

In [38]:
import pandas as pd
import numpy as np

from bokeh.io import show, output_notebook, push_notebook
from bokeh.plotting import figure

from bokeh.models import CategoricalColorMapper, HoverTool, ColumnDataSource, Panel
from bokeh.models.widgets import CheckboxGroup, Slider, RangeSlider, Tabs, TableColumn, DataTable

from bokeh.layouts import column, row, WidgetBox
from bokeh.palettes import Category20_16

from bokeh.application.handlers import FunctionHandler
from bokeh.application import Application

output_notebook()

In [39]:
# Load in flights and inspect
flights = pd.read_csv('complete_flights.csv', index_col=0)[['arr_delay', 'carrier', 'name']]

# Available carrier list
available_carriers = list(flights['name'].unique())
# Sort the list in-place (alphabetical order)
available_carriers.sort()

flights.head()

| | arr_delay | carrier | name |
|---|---|---|---|
| 0 | 11.0 | UA | United Air Lines Inc. |
| 1 | 20.0 | UA | United Air Lines Inc. |
| 2 | 33.0 | AA | American Airlines Inc. |
| 3 | -18.0 | B6 | JetBlue Airways |
| 4 | -25.0 | DL | Delta Air Lines Inc. |


In [40]:
def modify_doc(doc):
    
    def make_dataset(carrier_list, range_start = -60, range_end = 120, bin_width = 5):

        # Collect one DataFrame per carrier and concatenate them once at the end
        # (DataFrame.append was removed in pandas 2.0)
        frames = []
        range_extent = range_end - range_start

        # Iterate through all the carriers
        for i, carrier_name in enumerate(carrier_list):

            # Subset to the carrier
            subset = flights[flights['name'] == carrier_name]

            # Create a histogram with bins that are bin_width minutes wide
            arr_hist, edges = np.histogram(subset['arr_delay'], 
                                           bins = int(range_extent / bin_width), 
                                           range = [range_start, range_end])

            # Divide the counts by the total to get a proportion
            arr_df = pd.DataFrame({'proportion': arr_hist / np.sum(arr_hist), 'left': edges[:-1], 'right': edges[1:] })

            # Format the proportion 
            arr_df['f_proportion'] = ['%0.5f' % proportion for proportion in arr_df['proportion']]

            # Format the interval
            arr_df['f_interval'] = ['%d to %d minutes' % (left, right) for left, right in zip(arr_df['left'], arr_df['right'])]

            # Assign the carrier for labels
            arr_df['name'] = carrier_name

            # Color each carrier differently
            arr_df['color'] = Category20_16[i]

            # Add to the list of per-carrier frames
            frames.append(arr_df)

        # Overall dataframe (an empty one with the expected columns if no carrier is selected)
        columns = ['proportion', 'left', 'right', 'f_proportion', 'f_interval', 'name', 'color']
        by_carrier = pd.concat(frames) if frames else pd.DataFrame(columns=columns)
        by_carrier = by_carrier.sort_values(['name', 'left'])

        return ColumnDataSource(by_carrier)
    
    def style(p):
        # Title 
        p.title.align = 'center'
        p.title.text_font_size = '20pt'
        p.title.text_font = 'serif'

        # Axis titles
        p.xaxis.axis_label_text_font_size = '14pt'
        p.xaxis.axis_label_text_font_style = 'bold'
        p.yaxis.axis_label_text_font_size = '14pt'
        p.yaxis.axis_label_text_font_style = 'bold'

        # Tick labels
        p.xaxis.major_label_text_font_size = '12pt'
        p.yaxis.major_label_text_font_size = '12pt'

        return p
    
    def make_plot(src):
        # Blank plot with correct labels
        p = figure(width = 700, height = 700, 
                  title = 'Histogram of Arrival Delays by Carrier',
                  x_axis_label = 'Delay (min)', y_axis_label = 'Proportion')

        # Quad glyphs to create a histogram; legend_field takes the legend label from the 'name' column
        p.quad(source = src, bottom = 0, top = 'proportion', left = 'left', right = 'right',
               color = 'color', fill_alpha = 0.7, hover_fill_color = 'color', legend_field = 'name',
               hover_fill_alpha = 1.0, line_color = 'black')

        # Hover tool with vline mode
        hover = HoverTool(tooltips=[('Carrier', '@name'), 
                                    ('Delay', '@f_interval'),
                                    ('Proportion', '@f_proportion')],
                          mode='vline')

        p.add_tools(hover)

        # Styling
        p = style(p)

        return p
    
    def update(attr, old, new):
        carriers_to_plot = [carrier_selection.labels[i] for i in carrier_selection.active]
        
        new_src = make_dataset(carriers_to_plot,
                               range_start = range_select.value[0],
                               range_end = range_select.value[1],
                               bin_width = binwidth_select.value)

        src.data.update(new_src.data)

        
    carrier_selection = CheckboxGroup(labels=available_carriers, active = [0, 1])
    carrier_selection.on_change('active', update)
    
    binwidth_select = Slider(start = 1, end = 30, 
                             step = 1, value = 5,
                             title = 'Delay Width (min)')
    binwidth_select.on_change('value', update)
    
    range_select = RangeSlider(start = -60, end = 180, value = (-60, 120),
                               step = 5, title = 'Delay Range (min)')
    range_select.on_change('value', update)
    
    
    
    initial_carriers = [carrier_selection.labels[i] for i in carrier_selection.active]
    
    src = make_dataset(initial_carriers,
                      range_start = range_select.value[0],
                      range_end = range_select.value[1],
                      bin_width = binwidth_select.value)
    
    p = make_plot(src)
    
    # Put controls in a single element
    controls = WidgetBox(carrier_selection, binwidth_select, range_select)
    
    # Create a row layout
    layout = row(controls, p)
    
    # Make a tab with the layout 
    tab = Panel(child=layout, title = 'Delay Histogram')
    tabs = Tabs(tabs=[tab])
    
    doc.add_root(tabs)
    
# Set up an application
handler = FunctionHandler(modify_doc)
app = Application(handler)
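The core of `make_dataset` is `np.histogram` over a fixed range, followed by dividing by the total count. Isolated from the app, that step could look like the sketch below; `proportion_histogram` is a hypothetical helper, and the delays are the five values from the `flights.head()` output above. Values outside `[range_start, range_end]` are dropped by `np.histogram`, so the proportions are relative to the in-range flights only.

```python
import numpy as np
import pandas as pd

def proportion_histogram(delays, range_start=-60, range_end=120, bin_width=5):
    """Hypothetical helper: share of delays falling in each bin."""
    n_bins = int((range_end - range_start) / bin_width)
    counts, edges = np.histogram(delays, bins=n_bins,
                                 range=[range_start, range_end])
    total = counts.sum()
    # Avoid dividing by zero when no delay falls inside the range
    proportion = counts / total if total else np.zeros(len(counts))
    return pd.DataFrame({'proportion': proportion,
                         'left': edges[:-1], 'right': edges[1:]})

# The five arr_delay values shown by flights.head(), with 30-minute bins
demo_hist = proportion_histogram([11.0, 20.0, 33.0, -18.0, -25.0], bin_width=30)
```

Unlike the notebook's version, this sketch returns zeros rather than NaN when no delay falls inside the chosen range.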

In [41]:
# show(app, 'localhost:8889')
show(app)
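Outside a notebook, the same callback pattern is usually written as a script for the Bokeh server. A minimal sketch, assuming it is saved as a hypothetical `main.py` and started with `bokeh serve --show main.py`: Python callbacks registered with `on_change` receive the property name, the old value and the new value, just like `update(attr, old, new)` above.

```python
# main.py (hypothetical file name); run with: bokeh serve --show main.py
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure

xs = [0, 1, 2]
line_src = ColumnDataSource(dict(x=xs, y=xs))
line_plot = figure(width=300, height=300, title='y = slope * x')
line_plot.line('x', 'y', source=line_src)

slope = Slider(start=1, end=5, value=1, step=1, title='Slope')

def on_slope(attr, old, new):
    # Server-side callback: attr is 'value', old/new are the slider values
    line_src.data = dict(x=xs, y=[new * x for x in xs])

slope.on_change('value', on_slope)

# The server renders whatever is attached to the current document
curdoc().add_root(column(slope, line_plot))
```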

### Example 2: 

https://danielmuellerkomorowska.com/2021/08/02/interactive-data-dashboards-in-jupyter-notebook-with-ipywidgets-and-bokeh/

In [42]:
from sklearn import datasets
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show, output_notebook
import ipywidgets as widgets
from IPython.display import display, clear_output

In [43]:
"""Load Iris dataset and transform the pandas DataFrame"""
iris = datasets.load_iris()
data = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= iris['feature_names'] + ['target'])
data.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |


In [44]:
 """Define callback function for the UI"""
def var_dropdown(x):
    """This function is executed when a dropdown value is changed.
    It creates a new figure according to the new dropdown values."""
    p = create_figure(
    x_dropdown.children[0].value,
    y_dropdown.children[0].value,
    data)
    fig[0] = p
     
    for species, checkbox in species_checkboxes.items():
        check = checkbox.children[0].value
        fig[0].select_one({'name': species}).visible = check
     
    with output_figure:
        clear_output(True)
        show(fig[0])
     
    return x
 
def f_species_checkbox(x, q):
    """This function is executed when a checkbox is clicked.
    It directly changes the visibility of the current figure."""
    fig[0].select_one({'name': q}).visible = x
    with output_figure:
        clear_output(True)
        show(fig[0])
    return x
 
def create_figure(x_var, y_var, data):
    """This is a helper function that creates a new figure and 
    plots values from all three species. x_var and y_var control
    the features on each axis."""
    species_colors=['coral', 'deepskyblue', 'darkblue']
    p = figure(title="",
               x_axis_label=x_var,
               y_axis_label=y_var)
    for species_nr, species in enumerate(iris['target_names']):
        # Boolean mask selecting the rows of this species
        curr_dtps = data['target'] == species_nr
        p.circle(
            data[x_var][curr_dtps],
            data[y_var][curr_dtps],
            line_width=2,
            color=species_colors[species_nr],
            name=species  # used later to look the renderer up by name
            )
    return p
 
# The output widget is where we direct our figures
output_figure = widgets.Output()
 
# Create the default figure
fig = []  # Storing the figure in a singular list is a bit of a 
          # hack. We need it to properly mutate the current
          # figure in our callbacks.
p = create_figure(
    iris['feature_names'][0],
    iris['feature_names'][1],
    data)
fig.append(p)
with output_figure:
    show(fig[0])


# Checkboxes to select visible species.
species_checkboxes = {}
for species in iris['target_names']:
    curr_cb = widgets.interactive(f_species_checkbox,
                                  x=True,
                                  q=widgets.fixed(species))
    curr_cb.children[0].description = species
    species_checkboxes[species] = curr_cb
     
"""Create the widgets in the menu"""
# Dropdown menu for x-axis feature.
x_dropdown = widgets.interactive(var_dropdown,
                                 x=iris['feature_names']);
x_dropdown.children[0].description = 'x-axis'
x_dropdown.children[0].value = iris['feature_names'][0]
 
# Dropdown menu for y-axis feature.
y_dropdown = widgets.interactive(var_dropdown,
                                 x=iris['feature_names']);
y_dropdown.children[0].description = 'y-axis'
y_dropdown.children[0].value = iris['feature_names'][1]
 
 
 
# This creates the menu 
menu=widgets.VBox([x_dropdown,
                   y_dropdown,
                   *species_checkboxes.values()])


"""Create the full app with menu and output"""
# The Layout adds some styling to our app.
# You can add Layout to any widget.
app_layout = widgets.Layout(display='flex',
                            flex_flow='row nowrap',
                            align_items='center',
                            border='none',
                            width='100%',
                            margin='5px 5px 5px 5px')
 
# The final app is just a box
app=widgets.Box([menu, output_figure], layout=app_layout)
 
# Display the app
display(app)
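The checkbox callback above works because every species is drawn with `name=species`, so `select_one({'name': ...})` can find that glyph renderer and flip its `visible` property. The mechanism alone, without ipywidgets, is sketched below with made-up points; `set_species_visible` is a hypothetical helper.

```python
from bokeh.plotting import figure

demo_fig = figure(width=300, height=300)
# Made-up points; each renderer gets a name so it can be found later
for species, color, ys in [('setosa', 'coral', [1, 2]),
                           ('versicolor', 'deepskyblue', [2, 3])]:
    demo_fig.scatter([1, 2], ys, color=color, name=species)

def set_species_visible(fig, species, visible):
    # select_one finds the single model whose name matches
    fig.select_one({'name': species}).visible = visible

set_species_visible(demo_fig, 'setosa', False)
```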


In [45]:
# Note:

# ## 1. reset output method
# # Import reset_output (only needed once) 
# from bokeh.plotting import reset_output
# # Use reset_output() between subsequent show() calls, as needed
# reset_output()

# output_file('name_of_html.html', title='title_name')
# output_notebook() 