
![bokeh_logo.png](./bokeh_logo.png)

### **What is it?**
Bokeh is an open-source Python library for making interactive, web-based visualizations, including 
plots, live-streaming plots, and dashboards. Bokeh visualizations and dashboards can be easily embedded into webpages,
creating smooth, quick-loading interactions without having to know much JavaScript or import packages from outside the Bokeh ecosystem.


### **Where is it? Link to project webpage, GitHub**
* Here is Bokeh's documentation website: https://docs.bokeh.org/en/latest/
* Here is Bokeh's GitHub repository: https://github.com/bokeh/bokeh

### **Who developed it?**
* Bokeh is open source, so its code comes from a wide community of contributors. 
However, there are "dedicated core developers" who work on Bokeh. About 2-3 core developers 
dedicate most of their time to Bokeh at any given time. 

### **Why was it created?**
Bokeh was created to help people create interactive visualizations in web browsers, including in Jupyter notebooks. 
It's especially useful for connecting PyData tools (e.g. NumPy, SciPy, pandas, scikit-learn) to scalable and deployable web "data apps" with little mucking around in "web tech". 
This allows people to add interactive elements to websites while working only in Python. 

# Potential Use in Environmental Data Science
- creating interactive visualizations to communicate information
- allowing people to play with data and see the impact of changing parameters
- engaging non-scientists

A really cool paper on the importance of environmental data visualization and best practices:
> Sam Grainger, Feng Mao, Wouter Buytaert, Environmental data visualisation for non-scientific contexts: Literature review and design framework, Environmental Modelling & Software, Volume 85, 2016, Pages 299-318, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2016.09.004.

**Real World Example**: *Building Energy Efficiency*
> "With regards to my research, a report telling a building owner how much electricity they can save by changing their AC schedule is nice, but it’s more effective to give them an interactive graph where they can choose different schedules and see how their choice affects electricity consumption."

https://towardsdatascience.com/data-visualization-with-bokeh-in-python-part-one-getting-started-a11655a467d4

# Examples

### Let's create a simple, interactive plot with bokeh!

1. ALWAYS start by importing necessary functions:
    - once you start making more complex figures, check [Bokeh's user guide](https://docs.bokeh.org/en/latest/docs/user_guide.html) for the functions you'll need to import

In [21]:
from bokeh.plotting import figure, show

2. Make up some data to create a simple line chart:

In [22]:
x = [2, 4, 3, 7, 9]
y = [4, 6, 3, 8, 2]

3. Set up the figure using the function `figure()` and add a title and labels

In [23]:
p = figure(title = 'bokeh intro plot!',
           x_axis_label = 'x',
           y_axis_label = 'y')

4. Time to add the data using `line()`

In [24]:
p.line(x, y)

# Can add more arguments like line_width, legend_label, etc.
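
As a rough illustration of those extra arguments, the sketch below redraws the same made-up data with `line_width`, `line_color`, and `legend_label`. The figure name `styled`, the color, and the legend text are assumptions chosen only for this example.

```python
from bokeh.plotting import figure

# same made-up data as in step 2
x = [2, 4, 3, 7, 9]
y = [4, 6, 3, 8, 2]

# a separate figure so the plot from the steps above is left untouched
styled = figure(title="bokeh intro plot!",
                x_axis_label="x",
                y_axis_label="y")

# line_width and line_color style the line; legend_label adds a legend entry
renderer = styled.line(x, y,
                       line_width=3,
                       line_color="navy",
                       legend_label="made-up data")

# once a legend exists, it can be moved to another corner
styled.legend.location = "top_left"
```

Calling `show(styled)` would display this version in the same way as step 5.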

5. Check out your plot using `show()` 
    - by default, the output is an .html file that opens in your browser; call `output_notebook()` first to display the plot inline in a notebook

In [25]:
show(p)
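
If you want to control where that .html file is written, something like the following sketch may help. It uses `output_file()` and `save()` from `bokeh.plotting`; the file name `intro_plot.html` is a made-up example.

```python
from bokeh.plotting import figure, output_file, save

# rebuild the simple line chart from the steps above
p = figure(title="bokeh intro plot!",
           x_axis_label="x",
           y_axis_label="y")
p.line([2, 4, 3, 7, 9], [4, 6, 3, 8, 2])

# choose the output file (hypothetical name) and the browser tab title
output_file("intro_plot.html", title="bokeh intro plot")

# save() writes the file without opening a browser and returns its path
saved_path = save(p)
```

Using `show(p)` instead of `save(p)` would write the same file and also open it in the browser.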

# More Complex Examples

In [6]:
# import libraries 
import pandas as pd
import numpy as np

# Examples with Olympic Athletes

### Goal
We wanted to make a scatter plot that compares the number of gold, silver, and bronze medals a country won at each Olympics and shows the exact number of medals when you hover over a point on the plot.

In [7]:
# choose data
olympic_athletes = pd.read_csv('../data/Olympic_Athletes/athlete_events.csv')

# make subset of data with only the medaling teams
olympic_subset = olympic_athletes[
    ['Name', 'Team', 'Year', 'Event', 'Medal']
].dropna().sort_values(by = 'Year') # drop rows with na values

# select one country -- China
china_medals = olympic_subset.loc[olympic_subset['Team'] == 'China'].groupby([pd.Grouper(key = 'Year'),
                                                                              pd.Grouper(key = 'Medal')]).count()

china_pivot = china_medals.pivot_table(index = 'Year',
                                       columns = 'Medal',
                                       values = 'Name')

china_pivot.head()

| Year | Bronze | Gold | Silver |
|------|--------|------|--------|
| 1984 | 37.0 | 24.0 | 13.0 |
| 1988 | 30.0 | 4.0 | 16.0 |
| 1992 | 15.0 | 14.0 | 44.0 |
| 1994 | 2.0 | NaN | 1.0 |
| 1996 | 15.0 | 13.0 | 66.0 |


In [8]:
# import the parts of the bokeh library used in this cell
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import HoverTool, ColumnDataSource, SingleIntervalTicker
from bokeh.models import LinearAxis

source = ColumnDataSource(data=dict(year=china_pivot.index, 
                                    gold=china_pivot['Gold'], 
                                    silver=china_pivot['Silver'], 
                                    bronze=china_pivot['Bronze']))

# create figure
p = figure(title="Medals Won at the Olympics by China", 
           x_axis_label='Year', 
           y_axis_label='Number of Medals',
           x_axis_type = None) # remove default x-axis

# add one scatter renderer per medal type with
# size, color and alpha (scatter() draws circles by default)
p.scatter('year', 'gold', 
          size = 10, color = "gold", alpha = 0.5, source=source, legend_label='Gold')
p.scatter('year', 'silver', 
          size = 10, color = "silver", alpha = 0.5, source=source, legend_label='Silver')
p.scatter('year', 'bronze', 
          size = 10, color = "brown", alpha = 0.5, source=source, legend_label='Bronze')

# Specify what shows up when you hover over the point
hover = HoverTool()
hover.tooltips = [
    ("(year,gold)", "(@year, @gold)"),
    ("(year,silver)", "(@year, @silver)"),
    ("(year,bronze)", "(@year, @bronze)"),
]

# Add the hover tool to the plot
p.add_tools(hover)

# change axis ticks
ticker = SingleIntervalTicker(interval = 2, # make year interval on new axis 2
                              num_minor_ticks = 0) # set number of ticks between major ticks
xaxis = LinearAxis(ticker = ticker)
p.add_layout(xaxis, 'below') # identify where new axis should lay
xaxis.formatter.use_scientific = False # turns x axis into non-scientific format


#output results to notebook 
output_notebook()

# show the results
show(p) 

## Make subset for Germany, Denmark, and Greece

In [9]:
# make subset for germany 
germany_medals = olympic_subset.loc[olympic_subset['Team'] == 'Germany'].groupby([pd.Grouper(key = 'Year'),
                                                                              pd.Grouper(key = 'Medal')]).count().reset_index()

germany_hist = germany_medals[['Year', 'Medal', 'Name']].rename(columns={'Name' : 'Count'})


# make subset for denmark
denmark_medals = olympic_subset.loc[olympic_subset['Team'] == 'Denmark'].groupby([pd.Grouper(key = 'Year'),
                                                                              pd.Grouper(key = 'Medal')]).count().reset_index()

denmark_hist = denmark_medals[['Year', 'Medal', 'Name']].rename(columns={'Name' : 'Count'})

# make subset for greece
greece_medals = olympic_subset.loc[olympic_subset['Team'] == 'Greece'].groupby([pd.Grouper(key = 'Year'),
                                                                              pd.Grouper(key = 'Medal')]).count().reset_index()

greece_hist = greece_medals[['Year', 'Medal', 'Name']].rename(columns={'Name' : 'Count'})
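
The three subsets above repeat the same steps, so they could probably be folded into one small function. The sketch below is only an illustration: `medal_counts` is a hypothetical helper name, and the tiny `toy` table is made-up data with the same columns as `olympic_subset`.

```python
import pandas as pd

def medal_counts(medal_rows, team):
    """Count medals per year and medal type for one team (hypothetical helper)."""
    team_rows = medal_rows.loc[medal_rows["Team"] == team]
    return (
        team_rows.groupby(["Year", "Medal"])["Name"]
        .count()                             # number of medal rows per group
        .reset_index()                       # turn Year and Medal back into columns
        .rename(columns={"Name": "Count"})
    )

# made-up rows standing in for olympic_subset
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Team": ["Germany", "Germany", "Denmark", "Germany"],
    "Year": [1992, 1992, 1992, 1996],
    "Event": ["e1", "e2", "e3", "e4"],
    "Medal": ["Gold", "Gold", "Silver", "Bronze"],
})

# one table per country, built with the same function
toy_hists = {team: medal_counts(toy, team)
             for team in ["Germany", "Denmark", "Greece"]}
```

With the real data, `medal_counts(olympic_subset, 'Germany')` should give a table with the same columns as `germany_hist`.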


## Simple Histogram

In [26]:
# Function that will map values to colors by creating 'color' column
def assign_color(Medal):
    if Medal == "Gold":
        return "#FFD700"
    elif Medal == "Silver":
        return "#C0C0C0"
    else:
        return "#CD7F32"

# Apply the function to create the "Color" column,
# using each country's own 'Medal' column
germany_hist['Color'] = germany_hist['Medal'].apply(assign_color)
denmark_hist['Color'] = denmark_hist['Medal'].apply(assign_color)
greece_hist['Color'] = greece_hist['Medal'].apply(assign_color)
    
# this was needed because 'bronze' is not a named color that bokeh recognizes :)

In [28]:
# import libraries
from bokeh.plotting import figure, show, output_notebook

p = figure()
p.vbar('Year', top='Count', width=0.75, fill_alpha= 1, color = 'Color', source = germany_hist)

output_notebook()
show(p)


## Important Side Note:

When making these plots, we realized that the bars overlap: each year has up to three bars (one per medal type) drawn at the same x position and all starting from zero, so they cover one another instead of stacking. One solution we considered was to separate the medals so that each year would have up to three side-by-side bars to compare wins. This, however, led us down a rabbit hole that I have not recovered from, and I could not figure it out. Another alternative was to make them stack correctly using `vbar_stack`. This also proved to be outside of my pay grade of 0 dollars, and I did not figure that out either. 
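
For anyone who wants to pick this up, here is a minimal sketch of how `vbar_stack` might be applied. It is not the approach used in the plots in this notebook. It assumes the long table (one row per year and medal) is first pivoted to a wide table (one row per year, one column per medal), and it uses made-up counts in place of `germany_hist`.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# made-up counts in the same long format as germany_hist
long_counts = pd.DataFrame({
    "Year": [1992, 1992, 1992, 1996, 1996],
    "Medal": ["Gold", "Silver", "Bronze", "Gold", "Bronze"],
    "Count": [3, 2, 1, 4, 2],
})

medals = ["Gold", "Silver", "Bronze"]
colors = ["#FFD700", "#C0C0C0", "#CD7F32"]

# one row per year, one column per medal; missing combinations become 0
wide = (
    long_counts.pivot_table(index="Year", columns="Medal",
                            values="Count", aggfunc="sum", fill_value=0)
    .reindex(columns=medals, fill_value=0)
    .reset_index()
)
wide.columns.name = None

stacked = figure(title="Stacked medal counts (made-up data)",
                 x_axis_label="Year",
                 y_axis_label="Number of Medals")

# vbar_stack draws one bar segment per column in `medals`, stacked per year
renderers = stacked.vbar_stack(medals, x="Year", width=0.75,
                               color=colors,
                               source=ColumnDataSource(wide),
                               legend_label=medals)
```

Passing `stacked` to `show()` would display the result. The key design choice is the pivot: `vbar_stack` expects one column per stacked layer rather than one row per layer.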

Regardless, we had some fun. Below are some more fun plots using athlete and shark attack data. 

# Better Histogram

In [13]:
# germany 
p1 = figure(title="Medals Won at the Olympics by Germany", 
           x_axis_label='Year', 
           y_axis_label='Number of Medals',
           x_axis_type = None, # remove default x axis
           width = 2000) # make width of figure larger

p1.vbar('Year', # the column that we want to have on the x axis
       top='Count', # the column with the values we want to plot
       width=0.75, # width of the bars 
       fill_alpha= 1, 
       color='Color', # the column that we want to color in 
       source = germany_hist,
       legend_field='Medal')

# change axis ticks
ticker = SingleIntervalTicker(interval = 2, # make year interval on new axis 2
                              num_minor_ticks = 0) # set number of ticks between major ticks
xaxis = LinearAxis(ticker = ticker)
xaxis.formatter.use_scientific = False # turns x axis into non-scientific format
p1.add_layout(xaxis, 'below') # identify where new axis should lay

In [14]:
# denmark
p2 = figure(title="Medals Won at the Olympics by Denmark", 
           x_axis_label='Year', 
           y_axis_label='Number of Medals',
           x_axis_type = None, # remove default x axis
           width = 2000) # make width of figure larger

p2.vbar('Year', # the column that we want to have on the x axis
       top='Count', # the column with the values we want to plot
       width=0.75, # width of the bars 
       fill_alpha = 1, 
       color='Color', # the column that we want to color in 
       source = denmark_hist,
       legend_field='Medal')

# change axis ticks
ticker = SingleIntervalTicker(interval = 2, # make year interval on new axis 2
                              num_minor_ticks = 0) # set number of ticks between major ticks
xaxis = LinearAxis(ticker = ticker)
xaxis.formatter.use_scientific = False # turns x axis into non-scientific format
p2.add_layout(xaxis, 'below') # identify where new axis should lay

In [15]:
# greece
p3 = figure(title="Medals Won at the Olympics by Greece", 
           x_axis_label='Year', 
           y_axis_label='Number of Medals',
           x_axis_type = None, # remove default x axis
           width = 2000) # make width of figure larger

p3.vbar('Year', # the column that we want to have on the x axis
       top='Count', # the column with the values we want to plot
       width=0.75, # width of the bars 
       fill_alpha=1, 
       color = 'Color', # the column that we want to color in 
       source = greece_hist,
       legend_field='Medal')

# change axis ticks
ticker = SingleIntervalTicker(interval = 2, # make year interval on new axis 2
                              num_minor_ticks = 0) # set number of ticks between major ticks
xaxis = LinearAxis(ticker = ticker)
xaxis.formatter.use_scientific = False # turns x axis into non-scientific format
p3.add_layout(xaxis, 'below') # identify where new axis should lay

In [16]:
from bokeh.layouts import gridplot

# stack the three figures in a single column, one figure per row
show(gridplot([[p1], [p2], [p3]], width= 2000, height=500))
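
The three figure cells above differ only in the country name and data table, so a helper along these lines could cut the repetition. This is a sketch under assumptions: `medal_bar_figure` is a hypothetical name, and `toy_hist` is made-up data with the same columns as `germany_hist`. It also sets `axis_label` on the replacement axis, since the default x-axis is removed.

```python
import pandas as pd
from bokeh.models import LinearAxis, SingleIntervalTicker
from bokeh.plotting import figure

def medal_bar_figure(hist, team):
    """Build one medal bar chart for a team (hypothetical helper)."""
    fig = figure(title=f"Medals Won at the Olympics by {team}",
                 y_axis_label="Number of Medals",
                 x_axis_type=None,  # remove default x axis
                 width=2000)        # make width of figure larger

    fig.vbar("Year", top="Count", width=0.75, fill_alpha=1,
             color="Color", source=hist, legend_field="Medal")

    # replacement x axis with a tick every 2 years and no minor ticks
    ticker = SingleIntervalTicker(interval=2, num_minor_ticks=0)
    xaxis = LinearAxis(ticker=ticker, axis_label="Year")
    xaxis.formatter.use_scientific = False
    fig.add_layout(xaxis, "below")
    return fig

# made-up table standing in for germany_hist
toy_hist = pd.DataFrame({
    "Year": [1992, 1996],
    "Medal": ["Gold", "Bronze"],
    "Count": [2, 1],
    "Color": ["#FFD700", "#CD7F32"],
})

toy_fig = medal_bar_figure(toy_hist, "Germany")
```

With the real tables, `medal_bar_figure(denmark_hist, 'Denmark')` would stand in for the `p2` cell, and so on.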


# Pie Chart Example with Shark Attack Data

### Goal
Create a pie chart that shows the ten countries with the most shark attacks and, when you hover over a wedge, tells you exactly how many attacks were recorded.

In [23]:
# import libraries 
import pandas as pd
import numpy as np

In [18]:
# read in data
shark_attacks = pd.read_csv('../data/attacks.csv', encoding='latin-1').dropna()

# select subset of data we are interested in
shark_sub = shark_attacks[['Country', 'Type', 'Activity']]

In [19]:
# prepare data for pie chart 
shark_pie = (
    shark_sub.groupby('Country')
    .count()
    .sort_values(by = 'Activity', ascending = False)
    .head(10) # top 10 countries
)

In [20]:
# set up bokeh plotting tools
from math import pi
from bokeh.palettes import Category20c
from bokeh.plotting import figure, show, output_notebook
from bokeh.transform import cumsum
from bokeh.models import HoverTool

# select source of data (copy so the new columns do not modify shark_pie)
df = shark_pie.copy()

# Calculate each country's share of the circle, in radians
df['angle'] = df['Activity'] / df['Activity'].sum() * 2 * pi

# Add colors from the Category20c palette as a new column (one color per country)
df['color'] = Category20c[10]

# Create figure (the hover tooltips are set by the HoverTool added below)
p = figure(height=350, 
           title = "Activity Distribution by Country", 
           toolbar_location = None,
           x_range=(-0.5, 1.0))

# Create wedges
p.wedge(x=0, y=1, radius=0.4,
        start_angle=cumsum('angle', include_zero = True), # where each slice starts
        end_angle=cumsum('angle'), # where each slice ends
        line_color = 'white', 
        fill_color = 'color', 
        legend_field = 'Country', 
        source = df) # source of the data

# hover tool to see the country and its number of attacks
hover = HoverTool(tooltips=[('Country', '@Country - @Activity')]) # make sure @ matches the column name in dataframe
p.add_tools(hover)

# remove axes
p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None
p.title = "Top 10 Countries for Shark Attacks"

show(p)
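
The angle arithmetic behind the wedges can be checked on its own. The sketch below uses three made-up counts in place of the `Activity` column and reproduces with plain pandas what `cumsum('angle', include_zero=True)` and `cumsum('angle')` are used for above: each wedge starts where the previous one ends, and the last wedge ends at a full circle.

```python
from math import pi, isclose
import pandas as pd

# made-up attack counts standing in for the 'Activity' column
attacks = pd.Series([50, 30, 20], index=["A", "B", "C"], name="Activity")

# each country's share of the full circle, in radians
angles = attacks / attacks.sum() * 2 * pi

# a wedge ends at the running total of the angles so far...
end_angles = angles.cumsum()

# ...and starts where the previous wedge ended (the first starts at 0)
start_angles = end_angles - angles

# the last wedge should close the circle
closes_circle = isclose(end_angles.iloc[-1], 2 * pi)
```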

## Check out our plots on the web page we created with GitHub Pages! 
https://lunacatalan.github.io/bokeh-how-to/ 

#### **Advantages & Disadvantages?**
### Bokeh vs. Plotly 
#### Advantages of Bokeh:
 * keeps things simple: working only in Python makes for a streamlined environment
 * privacy control 
 * more color palette customization 
 * dynamic, faster dashboards
 * offers an option to embed visualizations in web pages

#### Disadvantages of Bokeh: 
 * only in Python, no compatibility with other languages
 * no 3D plotting 
 * degree of interactivity is limited 
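
To make the embedding advantage a bit more concrete, here is a minimal sketch using `components()` and `file_html()` from `bokeh.embed`. The plot and its title are made up for the example, and this is not necessarily how the GitHub Pages site linked above was built.

```python
from bokeh.embed import components, file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# a small made-up plot to embed
plot = figure(title="embedded plot")
plot.line([1, 2, 3], [4, 6, 5])

# option 1: a <script> and a <div> to paste into an existing HTML page
script, div = components(plot)

# option 2: a complete standalone HTML page as one string,
# loading BokehJS from the CDN
page = file_html(plot, resources=CDN, title="embedded plot")
```

With option 1, the host page also needs to load the BokehJS scripts itself; option 2 includes them in the generated page.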