## Brief Description
#### What is it?
Bokeh is an open-source Python library for making interactive web visualizations, including 
plots, streaming plots, and dashboards. Bokeh visualizations and dashboards can be easily embedded into webpages,
creating smooth and quick-loading interactions, which we see as an advantage over other interactive visualization libraries 
such as Plotly. 

Bokeh creates shareable, interactive data applications for modern browsers, connecting versatile graphics to PyData tools and to streaming or large datasets, all without having to know JS.  

#### Where is it?
* Here is Bokeh's documentation website: https://docs.bokeh.org/en/latest/
* Here is Bokeh's GitHub repository: https://github.com/bokeh/bokeh

#### Who developed it?
* Bokeh is open-source, so its code comes from many contributors across many communities. 
However, there are "dedicated core developers" who work on Bokeh. About 2-3 core developers 
dedicate most of their time to Bokeh at any given time. 

#### Why was it created?
 Bokeh was created to help people create interactive visualizations in web browsers, including in Jupyter notebooks. 
It's especially useful to connect PyData tools (e.g. NumPy, SciPy, Pandas, sklearn, etc) to scalable and deployable web "data apps" with little mucking around in "web tech". 
This allows people to add interactive elements to websites while working only in Python. 
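
As a rough illustration of working only in Python, the sketch below builds a small plot and asks Bokeh for the HTML pieces needed to embed it in a web page. The data values are hypothetical and chosen only for illustration; `components` comes from `bokeh.embed`. The host page would also need to load the BokehJS scripts for the plot to render.

```python
from bokeh.embed import components
from bokeh.plotting import figure

# Toy data (hypothetical values, for illustration only)
years = [2000, 2004, 2008, 2012]
values = [3, 5, 8, 6]

# Build a simple line plot entirely in Python
p = figure(title="Toy example", x_axis_label="Year", y_axis_label="Value")
p.line(years, values, line_width=2)

# components() returns a <script> block and a <div> placeholder
# that can be pasted into an HTML template
script, div = components(p)
```

The `script` and `div` strings can then be inserted into any HTML template, for example one rendered by a web framework.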

#### Advantages & Disadvantages?
The lists below compare Bokeh with Plotly.

#### Advantages of Bokeh:
* keeps it simple: no compatibility with other languages means a streamlined environment
* privacy control
* more color palette customization
* dynamic and faster dashboards
* visualizations can be embedded in web pages

#### Disadvantages of Bokeh:
* only in Python, no compatibility with other languages
* no 3D plotting
* degree of interactivity is limited


# Potential Use in Environmental Data Science
- creating interactive visualizations to communicate information
    - This is very important for environmental data science!!!
- allowing people to play with data, seeing impact of manipulating parameters
- engage non-scientists

A really cool paper about the importance of environmental data visualization and best practices:

- Sam Grainger, Feng Mao, Wouter Buytaert, Environmental data visualisation for non-scientific contexts: Literature review and design framework, Environmental Modelling & Software, Volume 85, 2016, Pages 299-318, ISSN 1364-8152, https://doi.org/10.1016/j.envsoft.2016.09.004. 

**Real World Example**: *Building Energy Efficiency*
> "With regards to my research, a report telling a building owner how much electricity they can save by changing their AC schedule is nice, but it’s more effective to give them an interactive graph where they can choose different schedules and see how their choice affects electricity consumption."

https://towardsdatascience.com/data-visualization-with-bokeh-in-python-part-one-getting-started-a11655a467d4

# Quick tutorial/example

In [2]:
# import libraries 
import pandas as pd
import numpy as np


# Examples with Olympic Athletes

### Tools used from bokeh 
from bokeh.plotting import figure, show

from bokeh.models import SingleIntervalTicker, LinearAxis, HoverTool, ColumnDataSource, Axis

### Goal
We wanted to make a scatter plot that compared the number of gold, silver, and bronze medals won by country.

### Challenges 
- what we made / conclusion

In [13]:
# choose data
olympic_athletes = pd.read_csv('../data/Olympic_Athletes/athlete_events.csv')

# make subset of data with only the medaling teams
olympic_subset = olympic_athletes[
    ['Name', 'Team', 'Year', 'Event', 'Medal']
].dropna().sort_values(by = 'Year') # drop rows with na values

# # group the data 
# by_team = olympic_subset.groupby([pd.Grouper(key = 'Year'),
#                                  pd.Grouper(key = 'Medal'),
#                                  pd.Grouper(key = 'Team')]).count()

# by_team.head(10)

# select one country
china_medals = olympic_subset.loc[olympic_subset['Team'] == 'China'].groupby([pd.Grouper(key = 'Year'),
                                                                              pd.Grouper(key = 'Medal')]).count()

china_pivot = china_medals.pivot_table(index = 'Year',
                                       columns = 'Medal',
                                       values = 'Name')



| | Name | Team | Year | Event | Medal |
|---|---|---|---|---|---|
| 23916 | Conrad Helmut Fritz Bcker | Germany | 1896 | Gymnastics Men's Horizontal Bar, Teams | Gold |
| 244276 | Georgios Tsitas | Greece | 1896 | Wrestling Men's Unlimited Class, Greco-Roman | Silver |
| 214352 | Carl Schuhmann | Germany | 1896 | Gymnastics Men's Horizontal Bar, Teams | Gold |
| 214351 | Carl Schuhmann | Germany | 1896 | Wrestling Men's Unlimited Class, Greco-Roman | Gold |
| 214348 | Carl Schuhmann | Germany | 1896 | Gymnastics Men's Parallel Bars, Teams | Gold |
| 194678 | Leonidas "Leon" Pyrgos | Greece | 1896 | Fencing Men's Foil, Masters, Individual | Gold |
| 204732 | Richard Rstel | Germany | 1896 | Gymnastics Men's Parallel Bars, Teams | Gold |
| 204734 | Richard Rstel | Germany | 1896 | Gymnastics Men's Horizontal Bar, Teams | Gold |
| 214346 | Carl Schuhmann | Germany | 1896 | Gymnastics Men's Horse Vault | Gold |
| 42249 | Ellery Harding Clark | United States | 1896 | Athletics Men's High Jump | Gold |
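
The group-then-pivot step above can be tried on a tiny table first. The sketch below uses a made-up DataFrame (the names, team, and counts are hypothetical) and a slightly different route, `groupby(...).size().unstack(fill_value=0)`, which fills year/medal combinations with no medals with 0 instead of leaving them as NaN. It is an alternative to the pivot used in the notebook, not the same code.

```python
import pandas as pd

# Toy medal records (hypothetical, for illustration only)
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Team": ["X", "X", "X", "X", "X"],
    "Year": [2000, 2000, 2000, 2004, 2004],
    "Medal": ["Gold", "Gold", "Silver", "Bronze", "Gold"],
})

# Count rows per (Year, Medal), then spread Medal into columns;
# missing combinations become 0 rather than NaN
counts = toy.groupby(["Year", "Medal"]).size().unstack(fill_value=0)
print(counts)
```

The result has one row per year and one column per medal type, which is the shape the `ColumnDataSource` in the next cell expects.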


In [10]:
# to use the bokeh library, start with these imports
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool, LinearAxis, SingleIntervalTicker

source = ColumnDataSource(data=dict(year=china_pivot.index, 
                                    gold=china_pivot['Gold'], 
                                    silver=china_pivot['Silver'], 
                                    bronze=china_pivot['Bronze']))

# create the figure
p = figure(title="Medals Won at the Olympics by China", 
           x_axis_label='Year', 
           y_axis_label='Number of Medals',
           x_axis_type = None) # remove default x-axis (a custom one is added below)

# add one renderer per medal type, with size, color and alpha
# (scatter() draws circle markers by default; recent Bokeh releases deprecate circle() with size)
p.scatter('year', 'gold', 
          size = 10, color = "gold", alpha = 0.5, source=source, legend_label='Gold')
p.scatter('year', 'silver', 
          size = 10, color = "silver", alpha = 0.5, source=source, legend_label='Silver')
p.scatter('year', 'bronze', 
          size = 10, color = "brown", alpha = 0.5, source=source, legend_label='Bronze')

hover = HoverTool()
hover.tooltips = [
    ("(year,gold)", "(@year, @gold)"),
    ("(year,silver)", "(@year, @silver)"),
    ("(year,bronze)", "(@year, @bronze)"),

]

# Add the hover tool to the plot
p.add_tools(hover)

# change axis ticks
ticker = SingleIntervalTicker(interval = 2, # make year interval on new axis 2
                              num_minor_ticks = 0) # set number of ticks between major ticks
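# x_axis_label is ignored when x_axis_type = None, so label the new axis directly
xaxis = LinearAxis(ticker = ticker, axis_label = 'Year')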
p.add_layout(xaxis, 'below') # identify where new axis should lay
xaxis.formatter.use_scientific = False # turns x axis into non-scientific format


#output results to notebook 
output_notebook()

# show the results
show(p) 
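
In the cell above, one hover tool lists all three medal rows no matter which point is under the cursor. One possible refinement, sketched below with made-up numbers, is to attach a separate `HoverTool` to each renderer through its `renderers` property, so that hovering a gold point only reports the gold count. The data and variable names here are hypothetical.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Toy data (hypothetical values, for illustration only)
source = ColumnDataSource(data=dict(year=[2000, 2004, 2008],
                                    gold=[3, 5, 7],
                                    silver=[2, 4, 1]))

p = figure(title="Per-renderer hover sketch", tools="")

# Keep a handle on each renderer so a hover tool can target it
gold_r = p.scatter("year", "gold", size=10, color="gold", source=source)
silver_r = p.scatter("year", "silver", size=10, color="silver", source=source)

# One HoverTool per renderer, each with its own tooltip
p.add_tools(HoverTool(renderers=[gold_r],
                      tooltips=[("Year", "@year"), ("Gold", "@gold")]))
p.add_tools(HoverTool(renderers=[silver_r],
                      tooltips=[("Year", "@year"), ("Silver", "@silver")]))
```

Calling `show(p)` afterwards displays the plot as before.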

## Make subset for Germany, Denmark, and Greece

In [100]:
# make subset for germany 
germany_medals = olympic_subset.loc[olympic_subset['Team'] == 'Germany'].groupby([pd.Grouper(key = 'Year'),
                                                                              pd.Grouper(key = 'Medal')]).count().reset_index()

germany_hist = germany_medals[['Year', 'Medal', 'Name']].rename(columns={'Name' : 'Count'})


# make subset for denmark
denmark_medals = olympic_subset.loc[olympic_subset['Team'] == 'Denmark'].groupby([pd.Grouper(key = 'Year'),
                                                                              pd.Grouper(key = 'Medal')]).count().reset_index()

denmark_hist = denmark_medals[['Year', 'Medal', 'Name']].rename(columns={'Name' : 'Count'})

# make subset for greece
greece_medals = olympic_subset.loc[olympic_subset['Team'] == 'Greece'].groupby([pd.Grouper(key = 'Year'),
                                                                              pd.Grouper(key = 'Medal')]).count().reset_index()

greece_hist = greece_medals[['Year', 'Medal', 'Name']].rename(columns={'Name' : 'Count'})


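| | Year | Medal | Count |
|---|---|---|---|
| 0 | 1896 | Bronze | 2 |
| 1 | 1896 | Gold | 24 |
| 2 | 1896 | Silver | 5 |
| 3 | 1900 | Gold | 1 |
| 4 | 1900 | Silver | 1 |
| 5 | 1904 | Bronze | 6 |
| 6 | 1904 | Gold | 4 |
| 7 | 1904 | Silver | 5 |
| 8 | 1906 | Bronze | 6 |
| 9 | 1906 | Gold | 14 |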


## Simple Histogram

In [112]:
# Function to map values to colors
def assign_color(Medal):
    if Medal == "Gold":
        return "#FFD700"
    elif Medal == "Silver":
        return "#C0C0C0"
    else:
        return "#CD7F32"

# Apply the function to create the "Color" column,
# using each country's own "Medal" column
germany_hist['Color'] = germany_hist['Medal'].apply(assign_color)
denmark_hist['Color'] = denmark_hist['Medal'].apply(assign_color)
greece_hist['Color'] = greece_hist['Medal'].apply(assign_color)
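
The same medal-to-color mapping can also be written with a dictionary and `Series.map`, as in the sketch below. The toy DataFrame is hypothetical. One difference from the function above: a value missing from the dictionary becomes NaN instead of falling through to the bronze color.

```python
import pandas as pd

# Same hex colors as assign_color above
MEDAL_COLORS = {"Gold": "#FFD700", "Silver": "#C0C0C0", "Bronze": "#CD7F32"}

# Toy table (hypothetical values, for illustration only)
toy = pd.DataFrame({"Medal": ["Gold", "Bronze", "Silver"],
                    "Count": [3, 1, 2]})

# Look up each medal's color in the dictionary
toy["Color"] = toy["Medal"].map(MEDAL_COLORS)
```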
    


In [151]:
# import libraries
import numpy as np
from bokeh.plotting import figure, show, output_notebook
output_notebook()

p = figure()
p.vbar('Year', top='Count', width=0.75, fill_alpha=1, color = 'Color', source = germany_hist)

show(p)


## Better Histogram

In [114]:
# germany 
p1 = figure(title="Medals Won at the Olympics by Germany", 
           x_axis_label='Year', 
           y_axis_label='Number of Medals',
           x_axis_type = None, # remove default x axis
           width = 2000) # make width of figure larger

p1.vbar('Year', # the column that we want to have on the x axis
       top='Count', # the column with the values we want to plot
       width=0.75, # width of the bars 
       fill_alpha= 0.5, 
       color='Color', # the column that we want to color in 
       source = germany_hist,
       legend_field='Medal')

# change axis ticks
ticker = SingleIntervalTicker(interval = 2, # make year interval on new axis 2
                              num_minor_ticks = 0) # set number of ticks between major ticks
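# x_axis_label is ignored when x_axis_type = None, so label the new axis directly
xaxis = LinearAxis(ticker = ticker, axis_label = 'Year')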
xaxis.formatter.use_scientific = False # turns x axis into non-scientific format
p1.add_layout(xaxis, 'below') # identify where new axis should lay


show(p1)

In [115]:
# denmark
p2 = figure(title="Medals Won at the Olympics by Denmark", 
           x_axis_label='Year', 
           y_axis_label='Number of Medals',
           x_axis_type = None, # remove default x axis
           width = 2000) # make width of figure larger

p2.vbar('Year', # the column that we want to have on the x axis
       top='Count', # the column with the values we want to plot
       width=0.75, # width of the bars 
       fill_alpha = 0.5, 
       color='Color', # the column that we want to color in 
       source = denmark_hist,
       legend_field='Medal')

# change axis ticks
ticker = SingleIntervalTicker(interval = 2, # make year interval on new axis 2
                              num_minor_ticks = 0) # set number of ticks between major ticks
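# x_axis_label is ignored when x_axis_type = None, so label the new axis directly
xaxis = LinearAxis(ticker = ticker, axis_label = 'Year')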
xaxis.formatter.use_scientific = False # turns x axis into non-scientific format
p2.add_layout(xaxis, 'below') # identify where new axis should lay


show(p2)

In [93]:
# greece
p3 = figure(title="Medals Won at the Olympics by Greece", 
           x_axis_label='Year', 
           y_axis_label='Number of Medals',
           x_axis_type = None, # remove default x axis
           width = 2000) # make width of figure larger

p3.vbar('Year', # the column that we want to have on the x axis
       top='Count', # the column with the values we want to plot
       width=0.75, # width of the bars 
       fill_alpha=0.5, 
       color = 'Color', # the column that we want to color in 
       source = greece_hist,
       legend_field='Medal')

# change axis ticks
ticker = SingleIntervalTicker(interval = 2, # make year interval on new axis 2
                              num_minor_ticks = 0) # set number of ticks between major ticks
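# x_axis_label is ignored when x_axis_type = None, so label the new axis directly
xaxis = LinearAxis(ticker = ticker, axis_label = 'Year')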
xaxis.formatter.use_scientific = False # turns x axis into non-scientific format
p3.add_layout(xaxis, 'below') # identify where new axis should lay


show(p3)

In [94]:
from bokeh.layouts import gridplot

# stack the three figures in a single column
show(gridplot([[p1], [p2], [p3]], width = 2000, height = 500))
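
The three country cells above differ only in the title and the data source, so they could be folded into one helper. The sketch below is one way to do that; `medal_bar_figure` is a hypothetical name, and the toy DataFrame stands in for `germany_hist`, `denmark_hist`, or `greece_hist` (it assumes the same `Year`, `Medal`, `Count`, and `Color` columns).

```python
import pandas as pd
from bokeh.models import ColumnDataSource, LinearAxis, SingleIntervalTicker
from bokeh.plotting import figure


def medal_bar_figure(medal_df, country, width=2000):
    """Build a medal bar chart for one country (hypothetical helper)."""
    p = figure(title=f"Medals Won at the Olympics by {country}",
               y_axis_label="Number of Medals",
               x_axis_type=None,  # remove default x axis
               width=width)
    p.vbar(x="Year", top="Count", width=0.75, fill_alpha=0.5,
           color="Color", legend_field="Medal",
           source=ColumnDataSource(medal_df))
    # custom x axis with a tick every 2 years, labelled directly
    xaxis = LinearAxis(ticker=SingleIntervalTicker(interval=2,
                                                   num_minor_ticks=0),
                       axis_label="Year")
    xaxis.formatter.use_scientific = False
    p.add_layout(xaxis, "below")
    return p


# Toy input (hypothetical values, for illustration only)
toy_hist = pd.DataFrame({"Year": [2000, 2000, 2004],
                         "Medal": ["Gold", "Silver", "Gold"],
                         "Count": [3, 1, 2],
                         "Color": ["#FFD700", "#C0C0C0", "#FFD700"]})
toy_fig = medal_bar_figure(toy_hist, "Testland")
```

With the real data, something like `medal_bar_figure(germany_hist, "Germany")` would replace each hand-written cell.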


# Example with Shark Attack Data

- tools used from bokeh 
- goal
- challenges 
- what we made / conclusion

In [6]:
# import libraries 
import pandas as pd
import numpy as np


In [7]:
# choose data
shark_attacks = pd.read_csv('../data/attacks.csv', encoding='latin-1').dropna()


# select subset of data we are interested in
shark_sub = shark_attacks[['Country', 'Type', 'Activity']]

shark_sub.head()



| | Country | Type | Activity |
|---|---|---|---|
| 0 | AUSTRALIA | Unprovoked | Body boarding |
| 5 | BAHAMAS | Unprovoked | Snorkeling |
| 12 | USA | Invalid | Surfing |
| 22 | AUSTRALIA | Unprovoked | Surfing |
| 23 | USA | Unprovoked | Surfing |


In [8]:

# prepare data for pie chart 
shark_pie = shark_sub.groupby([pd.Grouper(key = 'Country')]).count().sort_values(by = 'Activity', 
                                                                                 ascending= False).head(10) # top 10 countries 
shark_pie.head()



| Country | Type | Activity |
|---|---|---|
| USA | 693 | 693 |
| AUSTRALIA | 303 | 303 |
| SOUTH AFRICA | 217 | 217 |
| BAHAMAS | 26 | 26 |
| NEW ZEALAND | 20 | 20 |


In [9]:

# set up bokeh plotting tools
from math import pi
from bokeh.palettes import Category20c
from bokeh.plotting import figure, show
from bokeh.transform import cumsum

# select source of data (copy so the new columns do not modify shark_pie)
df = shark_pie.copy()

# Calculate each country's share of the full circle, in radians
df['angle'] = df['Activity'] / df['Activity'].sum() * 2 * pi

# Add one color per country from the Category20c palette
# (assumes exactly 10 rows, i.e. the top 10 countries)
df['color'] = Category20c[10]


p = figure(height=350, 
           title = "Top 10 Countries for Shark Attacks", 
           toolbar_location = None,
           tooltips = "@Country: @Activity", # names after @ must match the column names exactly (case-sensitive)
           x_range=(-0.5, 1.0))

p.wedge(x=0, y=1, radius=0.4,
        start_angle = cumsum('angle', include_zero = True), # where each slice starts
        end_angle = cumsum('angle'), # where each slice ends
        line_color = 'white', 
        fill_color = 'color', 
        legend_field = 'Country', 
        source = df) # source of the data

# hide the axes and grid lines, which carry no meaning for a pie chart
p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None

show(p)
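
To make the angle arithmetic behind the pie chart concrete, the sketch below redoes it in plain Python for the five countries shown in the output table above (the notebook uses ten). Each count becomes a fraction of 2π, and a running total gives the start and end of each wedge, which is what `cumsum('angle', include_zero=True)` and `cumsum('angle')` supply to `p.wedge`.

```python
from math import pi

# Counts for the first five countries in the table above
counts = {"USA": 693, "AUSTRALIA": 303, "SOUTH AFRICA": 217,
          "BAHAMAS": 26, "NEW ZEALAND": 20}

total = sum(counts.values())

# Each country's slice as a fraction of the full circle (2*pi radians)
angles = {country: n / total * 2 * pi for country, n in counts.items()}

# Running total: each wedge starts where the previous one ended
wedges = {}
start = 0.0
for country, angle in angles.items():
    wedges[country] = (start, start + angle)
    start += angle
```

The first wedge starts at 0 and the last one ends at 2π, so the slices close the circle.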