## Data Visualization with Bokeh

Python has several visualization libraries, such as Matplotlib and Seaborn. Bokeh differs in that it renders graphics using HTML and JavaScript, so you can share your work with business stakeholders as an HTML page. You will learn how to transform your data, visualize it, and add interactivity to your visualizations.

![](../images/Bokeh_Intro.png)

Source: Continuum Analytics

Bokeh has multiple language bindings (Python, R, Lua, and Julia). These bindings produce a JSON file, which serves as input for BokehJS (a JavaScript library), which in turn presents the data in modern web browsers.
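
To make the JSON hand-off concrete, here is a minimal sketch that uses `json_item` from `bokeh.embed` to produce the JSON description of a small figure. The figure and its data are made up for illustration, and the exact keys in the result may differ between Bokeh versions.

```python
import json

from bokeh.embed import json_item
from bokeh.plotting import figure

# Build a tiny figure from plain Python lists (illustrative data)
fig = figure(title="JSON demo")
fig.line([1, 2, 3], [4, 6, 5])

# json_item() returns a dict describing the plot; BokehJS consumes this JSON
item = json_item(fig)
payload = json.dumps(item)

print(sorted(item.keys()))
print(f"{len(payload)} characters of JSON")
```

The `payload` string is what a web page would pass to BokehJS to draw the plot in the browser.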

### Steps to build a visualization with Bokeh

#### Prepare the data: 
To prepare the data, you will typically use the pandas and NumPy libraries to transform it into the form that best suits your visualization.

#### Determine where to render the Visualization
In this step you choose how to view your visualization. Bokeh offers two options: generating a static HTML file and rendering inline in a Jupyter Notebook.

#### Set a figure:
Here you assemble your figure to prepare it for visualization. In this step you customize things like the title and tick marks, and enable interactions.

#### Connect and draw the data: 
There are multiple ways to render the data. You have the flexibility to draw your data from scratch using the many available marker and shape options, all of which are easy to customize. This gives you great freedom in how you represent your data. Additionally, Bokeh has some built-in functionality for building things like
[stacked bar charts](https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html#stacked),
as well as more advanced visualizations like [network graphs](https://docs.bokeh.org/en/latest/docs/user_guide/graph.html).

#### Organize the layout: 
Bokeh lets you combine more than one figure using standard grid-like layout options. You can also organize your visualizations into tabbed layouts with a few lines of code, and plots can be linked together so that a selection in one is reflected in the others.


#### Preview and save your creation: 
In this step you view your visualization in either a browser or a notebook. You can explore it, examine your customizations, and play with the interactions that Bokeh offers.

In [1]:
""" Template for a visualization in Bokeh 
Outline to help you to turn data into a visualization
""" 

# Data Handling
import pandas as pd
import numpy as np

# Bokeh libraries
from bokeh.io import output_file, output_notebook
from bokeh.plotting import figure,show
from bokeh.models import ColumnDataSource
from bokeh.layouts import row, column, gridplot
from bokeh.models.widgets import Tabs, Panel

# Prepare Data
# any transformations of the data needed to be usable for the chart (pandas,numpy)

# Determine where the visualizations will be rendered
#output_file('filename.html') # Render to static HTML, or
output_notebook() # Render inline in a Jupyter Notebook

# Set figure(s)
fig = figure() # Instantiate a figure object

# Connect to and draw the data

# Organize the layout

# Preview and save
show(fig) # See what you made, and save it if you like it
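
As a rough illustration of how the template can be filled in, the sketch below walks through the same steps with a small made-up DataFrame. It renders to a static HTML file in a temporary folder (the file name is hypothetical) and uses `save()` instead of `show()` so that nothing opens in a browser.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from bokeh.io import output_file, save
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Prepare data: a small made-up DataFrame with a derived column
df = pd.DataFrame({"day": np.arange(1, 6), "hours": [2, 3, 1, 2, 4]})
df["total"] = df["hours"].cumsum()
source = ColumnDataSource(df)

# Determine where to render: a static HTML file in a temporary folder
html_path = os.path.join(tempfile.mkdtemp(), "template_demo.html")
output_file(html_path, title="Template demo")

# Set the figure
fig = figure(title="Hours per day", x_axis_label="Day", y_axis_label="Hours")

# Connect to and draw the data
fig.vbar(x="day", top="hours", width=0.6, source=source, legend_label="Daily")
fig.line(x="day", y="total", source=source, color="gray", legend_label="Cumulative")
fig.legend.location = "top_left"

# Preview and save: save() writes the HTML file without opening a browser
saved_path = save(fig)
print(saved_path)
```

Replacing `save(fig)` with `show(fig)` would write the same file and also open it for preview.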




### Drawing Data with Glyphs
A glyph is a vectorized graphical shape or marker that is used to represent your data, like a circle or a square. More examples can be found in the [Bokeh Gallery](https://docs.bokeh.org/en/latest/docs/gallery/markers.html). After the figure is created, you get access to a set of [configurable glyph methods](https://docs.bokeh.org/en/latest/docs/reference/plotting.html).

In [2]:
#Bokeh Libraries
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

# My x-y data
x = [1,2,1]
y = [1,1,2]

# Output the visualization directly in the notebook
output_notebook()

# Create a figure with no toolbar and axis ranges [0,3]
fig = figure(title = 'My Coordinates',
            plot_height=300, plot_width=300,
            x_range =(0,3), y_range=(0,3)
            ,toolbar_location = None)

#Draw the coordinates as circles
fig.circle(x=x, y=y
          ,color='green', size=10, alpha=0.5)

# Show plot
show(fig)

Once your figure is instantiated, you can see how it can be used to draw x-y coordinate data using customized circle glyphs.

Here are a few categories of glyphs

* Marker includes shapes like circles, diamonds, squares, and triangles, and is effective for creating visualizations like scatter and bubble charts.

* Line covers things like single, step, and multi-line shapes that can be used to build line charts.

* Bar/Rectangle shapes can be used to create traditional or stacked bar (hbar) and column (vbar) charts, as well as waterfall or Gantt charts.

More information about the glyphs above, as well as others, can be found in [Bokeh's Reference Guide](https://docs.bokeh.org/en/latest/docs/user_guide/plotting.html).
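
The three glyph categories can be sketched on a single figure as follows. This is a minimal illustration with made-up coordinates; it only builds the figure, so call `show(fig)` afterwards if you want to see it.

```python
from bokeh.plotting import figure

fig = figure(title="Three glyph families")

# Marker: scatter() takes the marker shape as an argument
fig.scatter([1, 2, 3], [3, 1, 2], marker="diamond", size=12, color="green")

# Line: step() draws a staircase line through the points
fig.step([1, 2, 3], [1, 2, 3], mode="after", color="gray")

# Bar: hbar() draws horizontal bars from `left` to `right`
fig.hbar(y=[1, 2, 3], left=0, right=[0.5, 1.0, 1.5],
         height=0.3, color="blue", alpha=0.4)

# Each glyph call adds one renderer to the figure
print(len(fig.renderers))
```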

These glyphs can be combined as needed to fit your visualization needs. Let's say I want to create a visualization that shows how many hours I studied per day, with an overlaid trend line of the cumulative hours:

In [8]:
import numpy as np

#Bokeh libraries
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

#My study-hours data
day_num = np.linspace(1,10,10)
daily_hours_study = [2,3,1,2,0,7,1,1,2,1]
cumulative_words = np.cumsum(daily_hours_study)

#Output visualization directly in the notebook
output_notebook()

#Create a figure with labeled axes; the y-range must fit the cumulative total (20)
fig = figure(title='My Study Progress',
            plot_height=300, plot_width=600,
            x_axis_label='Day Number', y_axis_label ='Study Hours',
            y_range=(0,25),
            toolbar_location=None)

#The daily hours will be represented as vertical bars (columns)
fig.vbar(x=day_num, bottom=0, top=daily_hours_study,
        color='blue', width=0.75,
        legend_label='Daily')

#The cumulative sum will be a trend line
fig.line(x=day_num, y=cumulative_words,
        color='gray', line_width=1,
        legend_label='Cumulative')

#Put legend in the upper left corner
fig.legend.location = 'top_left' 

#Let's check it out
show(fig)



To combine the columns and the line in one figure, simply draw both on the same figure() object.
Additionally, you can see above how a legend is created by setting the legend_label property for each glyph. The legend was moved to the upper left corner of the plot by assigning 'top_left' to fig.legend.location.

### Your turn
### Exercise #1
Create a line plot with the data below

In [10]:
#Data
datax =[5, 2, 3, 4, 5]
datay=[5, 7, 2, 4, 5] 

In [18]:
# Your code here

### Note
Anytime you are exploring a new visualization library, it is a good idea to start with some data from a domain you are familiar with. The beauty of Bokeh is that nearly any idea you have should be possible. It is just a matter of how you want to leverage the available tools to do so.

### Using the ColumnDataSource Object

For data in Python, you are most likely going to come across Python dictionaries and Pandas DataFrames, especially if you’re reading in data from a file or external data source.

Bokeh is well equipped to work with these more complex data structures and even has built-in functionality to handle them, namely the ColumnDataSource.

For one, whether you reference a list, array, dictionary, or DataFrame directly, Bokeh is going to turn it into a ColumnDataSource behind the scenes anyway. More importantly, the ColumnDataSource makes it much easier to implement Bokeh’s interactive affordances.

The ColumnDataSource is foundational in passing the data to the glyphs you are using to visualize. Its primary functionality is to map names to the columns of your data. This makes it easier for you to reference elements of your data when building your visualization. It also makes it easier for Bokeh to do the same when building your visualization.

The ColumnDataSource can interpret three types of data objects:

* Python dict: The keys are names associated with the respective value sequences (lists, arrays, and so forth).

* Pandas DataFrame: The columns of the DataFrame become the reference names for the ColumnDataSource.

* Pandas groupby: The columns of the ColumnDataSource reference the columns as seen by calling groupby.describe().
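
The three input types can be compared side by side. The sketch below uses a small made-up dataset (the `team` and `wins` names are illustrative, not taken from the NBA files) and prints the column names that each ColumnDataSource exposes.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Made-up data for illustration
data = {"team": ["A", "A", "B", "B"], "wins": [1, 2, 0, 3]}

# 1. From a dict: the keys become the column names
dict_cds = ColumnDataSource(data)

# 2. From a DataFrame: the columns become the column names, plus the index
df = pd.DataFrame(data)
df_cds = ColumnDataSource(df)

# 3. From a groupby: the columns come from groupby.describe(), e.g. "wins_mean"
group_cds = ColumnDataSource(df.groupby("team"))

print(dict_cds.column_names)
print(df_cds.column_names)
print(group_cds.column_names)
```

Whichever form you start from, glyph methods then refer to the data by these column names through the `source` argument.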



In [11]:
import pandas as pd

# Read the csv files

player_stats = pd.read_csv('../data/2017-18_playerBoxScore.csv', parse_dates=['gmDate'])
team_stats = pd.read_csv('../data/2017-18_teamBoxScore.csv', parse_dates=['gmDate'])
standings = pd.read_csv('../data/2017-18_standings.csv', parse_dates=['stDate'])

# parse_dates interprets the date columns as datetime objects

We will visualize the 2017-18 NBA standings, focusing on the Boston Celtics and the Toronto Raptors.

In [12]:
standings.head()

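| | stDate | teamAbbr | rank | rankOrd | gameWon | gameLost | stk | stkType | stkTot | gameBack | ... | rel%Indx | mov | srs | pw% | pyth%13.91 | wpyth13.91 | lpyth13.91 | pyth%16.5 | wpyth16.5 | lpyth16.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-10-17 | ATL | 2 | 2nd | 0 | 0 | - | - | 0 | 0.5 | ... | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 82.0 | 0.0 | 0.0 | 82.0 |
| 1 | 2017-10-17 | BKN | 2 | 2nd | 0 | 0 | - | - | 0 | 0.5 | ... | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 82.0 | 0.0 | 0.0 | 82.0 |
| 2 | 2017-10-17 | BOS | 15 | 15th | 0 | 1 | L1 | loss | 1 | 1.0 | ... | 0.0 | -3.0 | -3.0 | 0.4012 | 0.3977 | 32.6114 | 49.3886 | 0.3793 | 31.1026 | 50.8974 |
| 3 | 2017-10-17 | CHA | 2 | 2nd | 0 | 0 | - | - | 0 | 0.5 | ... | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 82.0 | 0.0 | 0.0 | 82.0 |
| 4 | 2017-10-17 | CHI | 2 | 2nd | 0 | 0 | - | - | 0 | 0.5 | ... | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 82.0 | 0.0 | 0.0 | 82.0 |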


In [13]:
df_agg = standings.groupby(['teamAbbr']).agg({'gameWon':sum}).sort_values(by = 'gameWon', ascending=False)
df_agg.head()

| teamAbbr | gameWon |
|---|---|
| GS | 5289 |
| HOU | 5263 |
| BOS | 5080 |
| TOR | 4759 |
| SA | 4275 |


In [14]:
# Getting the 3rd and 4th place
west_3rd_4rd = (standings[(standings['teamAbbr'] == 'BOS') | (standings['teamAbbr'] == 'TOR')]
    .loc[:, ['stDate', 'teamAbbr', 'gameWon']]
    .sort_values(['teamAbbr','stDate']))

west_3rd_4rd.head()

| | stDate | teamAbbr | gameWon |
|---|---|---|---|
| 2 | 2017-10-17 | BOS | 0 |
| 32 | 2017-10-18 | BOS | 0 |
| 62 | 2017-10-19 | BOS | 0 |
| 92 | 2017-10-20 | BOS | 1 |
| 122 | 2017-10-21 | BOS | 1 |


In [24]:
# From here, you can load this DataFrame into two ColumnDataSource objects and visualize the race:

#Import libraries
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource

#Output to notebook
output_notebook()

#Isolate the data for the Boston Celtics and Toronto Raptors
boston_data = west_3rd_4rd[west_3rd_4rd['teamAbbr'] == 'BOS']
toronto_data = west_3rd_4rd[west_3rd_4rd['teamAbbr'] == 'TOR']

# Create a ColumnDataSource object for each team

boston_cds = ColumnDataSource(boston_data)
toronto_cds = ColumnDataSource(toronto_data)

#create and configure the figure
fig = figure(x_axis_type = 'datetime',
            plot_height = 300,
            title = 'Celtics vs. Raptors Wins Race, 2017-18',
            x_axis_label = 'Date', y_axis_label = 'Wins',
            toolbar_location=None)

# Render the race as step lines
fig.step('stDate', 'gameWon',
        color='#CE1141', legend_label='Boston Celtics', 
        source=boston_cds)

fig.step('stDate', 'gameWon', 
         color='#006BB6', legend_label='Toronto Raptors', 
         source=toronto_cds)

#Move legend to the upper left corner
fig.legend.location = 'top_left'

# Show the plot

show(fig)

Note: In Bokeh, you can specify colors either by name, hex value, or RGB color code.
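
The three color notations can be sketched as follows. The coordinates are made up, and `RGB` from `bokeh.colors` is used only to show that the RGB tuple and the Raptors hex value from the plot above describe the same color.

```python
from bokeh.colors import RGB
from bokeh.plotting import figure

fig = figure(title="Three ways to specify a color")

# 1. Named CSS color
fig.scatter([1], [1], size=20, color="green")

# 2. Hex string (the Celtics red used above)
fig.scatter([2], [1], size=20, color="#CE1141")

# 3. RGB tuple (the same blue as '#006BB6')
fig.scatter([3], [1], size=20, color=(0, 107, 182))

# bokeh.colors.RGB converts an RGB triple to its hex notation
raptors_blue = RGB(0, 107, 182).to_hex()
print(raptors_blue)
```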

ColumnDataSource objects can do more than just serve as an easy way to reference DataFrame columns. The ColumnDataSource object has three built-in filters that can be used to create views on your data using a CDSView object:

* GroupFilter selects rows from a ColumnDataSource based on a categorical reference value
* IndexFilter filters the ColumnDataSource via a list of integer indices
* BooleanFilter allows you to use a list of boolean values, with True rows being selected
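
The notebook only demonstrates GroupFilter, so here is a minimal sketch of how the other two filters could be built from a DataFrame. The data is made up, and the filters are only constructed here; they would be passed to a CDSView in the same way as the GroupFilter in the next cell.

```python
import pandas as pd
from bokeh.models import BooleanFilter, IndexFilter

# Made-up data for illustration
df = pd.DataFrame({"team": ["BOS", "TOR", "BOS", "TOR"], "wins": [0, 1, 2, 2]})

# BooleanFilter: one True/False flag per row (here: rows with at least 2 wins)
bool_filter = BooleanFilter(booleans=(df["wins"] >= 2).tolist())

# IndexFilter: the integer positions of the rows to keep (here: the first two)
index_filter = IndexFilter(indices=[0, 1])

print(bool_filter.booleans)
print(index_filter.indices)
```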

In [26]:
# Bokeh libraries
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, CDSView, GroupFilter

# Output to notebook
output_notebook()

# Create a ColumnDataSource
west_cds = ColumnDataSource(west_3rd_4rd)


# Create views for each team
boston_view = CDSView(source=west_cds,
                       filters=[GroupFilter(column_name='teamAbbr', group='BOS')])
toronto_view = CDSView(source=west_cds,
                        filters=[GroupFilter(column_name='teamAbbr', group='TOR')])

# Create and configure the figure
west_fig = figure(x_axis_type='datetime',
                  plot_height=300, plot_width=600,
                  title='Celtics vs. Raptors Wins Race, 2017-18',
                  x_axis_label='Date', y_axis_label='Wins',
                  toolbar_location=None)

# Render the race as step lines
west_fig.step('stDate', 'gameWon',
              source=west_cds, view=boston_view,
              color='#CE1141', legend_label='Celtics')
west_fig.step('stDate', 'gameWon',
              source=west_cds, view=toronto_view,
              color='#006BB6', legend_label='Raptors')

# Move the legend to the upper left corner
west_fig.legend.location = 'top_left'

# Show the plot
show(west_fig)

### Your turn
### Exercise #2
Create a line plot with the data below

In [27]:
import bokeh.sampledata
bokeh.sampledata.download()

Using data directory: /Users/lilianatorres/.bokeh/data
Skipping 'CGM.csv' (checksum match)
Skipping 'US_Counties.zip' (checksum match)
Skipping 'us_cities.json' (checksum match)
Skipping 'unemployment09.csv' (checksum match)
Skipping 'AAPL.csv' (checksum match)
Skipping 'FB.csv' (checksum match)
Skipping 'GOOG.csv' (checksum match)
Skipping 'IBM.csv' (checksum match)
Skipping 'MSFT.csv' (checksum match)
Skipping 'WPP2012_SA_DB03_POPULATION_QUINQUENNIAL.zip' (checksum match)
Skipping 'gapminder_fertility.csv' (checksum match)
Skipping 'gapminder_population.csv' (checksum match)
Skipping 'gapminder_life_expectancy.csv' (checksum match)
Skipping 'gapminder_regions.csv' (checksum match)
Skipping 'world_cities.zip' (checksum match)
Skipping 'airports.json' (checksum match)
Skipping 'movies.db.zip' (checksum match)
Skipping 'airports.csv' (checksum match)
Skipping 'routes.csv' (checksum match)
Skipping 'haarcascade_frontalface_default.xml' (checksum match)


In [28]:
from bokeh.sampledata.glucose import data
data.head()

| datetime | isig | glucose |
|---|---|---|
| 2010-03-24 09:51:00 | 22.59 | 258 |
| 2010-03-24 09:56:00 | 22.52 | 260 |
| 2010-03-24 10:01:00 | 22.23 | 258 |
| 2010-03-24 10:06:00 | 21.56 | 254 |
| 2010-03-24 10:11:00 | 20.79 | 246 |


In [30]:
#Your code here

# First, reduce the data to just one week (Python)
# Second, build the plot by creating a ColumnDataSource, as in the first example


### Organizing Multiple Visualizations With Layouts

You can place more than one visualization on the same page, or spread them across different tabs.

In [31]:
west_3rd_4rd = (standings[(standings['teamAbbr'] == 'BOS') | (standings['teamAbbr'] == 'TOR')]
    .loc[:, ['stDate', 'teamAbbr', 'gameWon']]
    .sort_values(['teamAbbr','stDate']))
west_3rd_4rd.head()

| | stDate | teamAbbr | gameWon |
|---|---|---|---|
| 2 | 2017-10-17 | BOS | 0 |
| 32 | 2017-10-18 | BOS | 0 |
| 62 | 2017-10-19 | BOS | 0 |
| 92 | 2017-10-20 | BOS | 1 |
| 122 | 2017-10-21 | BOS | 1 |


In [33]:
# Bokeh libraries
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, CDSView, GroupFilter

# Output to notebook
output_notebook()

# Create a ColumnDataSource
west_cds = ColumnDataSource(west_3rd_4rd)


# Create views for each team
boston_view = CDSView(source=west_cds,
                       filters=[GroupFilter(column_name='teamAbbr', group='BOS')])

# Create and configure the figure
west_fig1 = figure(x_axis_type='datetime',
                  plot_height=300, plot_width=600,
                  title='Boston Celtics Wins, 2017-18',
                  x_axis_label='Date', y_axis_label='Wins',
                  toolbar_location=None)

# Render the race as step lines
west_fig1.step('stDate', 'gameWon',
              source=west_cds, view=boston_view,
              color='#CE1141', legend_label='Boston Celtics')

# Move the legend to the upper left corner
west_fig1.legend.location = 'top_left'

# Show the plot
show(west_fig1)

In [35]:
# Bokeh libraries
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, CDSView, GroupFilter

# Output to notebook
output_notebook()

# Create a ColumnDataSource
west_cds = ColumnDataSource(west_3rd_4rd)


# Create views for each team
toronto_view = CDSView(source=west_cds,
                        filters=[GroupFilter(column_name='teamAbbr', group='TOR')])

# Create and configure the figure
west_fig2 = figure(x_axis_type='datetime',
                  plot_height=300, plot_width=600,
                  title='Toronto Raptors Wins, 2017-18',
                  x_axis_label='Date', y_axis_label='Wins',
                  toolbar_location=None)

# Render the race as step lines
west_fig2.step('stDate', 'gameWon',
              source=west_cds, view=toronto_view,
              color='#006BB6', legend_label='Toronto Raptors')

# Move the legend to the upper left corner
west_fig2.legend.location = 'top_left'

# Show the plot
show(west_fig2)

### Adding multiple visualizations to the same page with gridplot

In [36]:
# Bokeh libraries
from bokeh.io import output_notebook
from bokeh.layouts import gridplot

# Output to notebook
output_notebook()
# Reduce the width of both figures
west_fig1.plot_width = west_fig2.plot_width = 300

# Edit the titles
west_fig1.title.text = '2nd Place'
west_fig2.title.text = '3rd Place'

# Configure the gridplot
east_west_gridplot = gridplot([[west_fig1, west_fig2]], 
                              toolbar_location='right')

# Plot the two visualizations in a horizontal configuration
show(east_west_gridplot)

You can also have a tabbed layout.

A tabbed layout consists of two Bokeh widgets: Tabs() and Panel() from the bokeh.models.widgets sub-module. Like using gridplot(), making a tabbed layout is pretty straightforward:

In [37]:
# Bokeh Library
from bokeh.io import output_notebook
from bokeh.models.widgets import Tabs, Panel

# Output to notebook
output_notebook()

# Increase the plot widths
west_fig1.plot_width = west_fig2.plot_width = 600

# Create two panels, one for each team
west1_panel = Panel(child=west_fig1, title='2nd Place')
west2_panel = Panel(child=west_fig2, title='3rd Place')

# Assign the panels to Tabs
tabs = Tabs(tabs=[west1_panel, west2_panel])

# Show the tabbed layout
show(tabs)

The first step is to create a Panel() for each tab. Tabs() functions as the mechanism that organizes the individual tabs created with Panel().

Each Panel() takes as input a child, which can either be a single figure() or a layout. (Remember that a layout is a general name for a column, row, or gridplot.) Once your panels are assembled, they can be passed as input to Tabs() in a list.
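
Since row() and column() are mentioned but not shown, here is a minimal sketch of nesting them. The `small_fig` helper and its data are hypothetical and exist only to produce three figures to arrange.

```python
from bokeh.layouts import column, row
from bokeh.plotting import figure


def small_fig(title):
    """Hypothetical helper: return a figure with one simple line."""
    fig = figure(title=title)
    fig.line([1, 2, 3], [1, 2, 3])
    return fig


top_left, top_right, bottom = small_fig("A"), small_fig("B"), small_fig("C")

# Two figures side by side on top, one figure underneath
layout = column(row(top_left, top_right), bottom)

print(len(layout.children))
```

The resulting `layout` can be passed to `show()` directly or used as the `child` of a Panel().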

Now that you understand how to access, draw, and organize your data, it’s time to move on to the real magic of Bokeh: interaction! As always, check out Bokeh’s User Guide for more information on [layouts](https://docs.bokeh.org/en/latest/docs/user_guide/layout.html).

### Interaction

The feature that sets Bokeh apart is its ability to easily implement interactivity in your visualization. Bokeh even goes as far as describing itself as an interactive visualization library:

> Bokeh is an interactive visualization library that targets modern web browsers for presentation.

Things you can do with interaction

* Configuring the toolbar
* Selecting data points
* Adding hover actions
* Linking axes and selections
* Highlighting data using the legend

Implementing these interactive elements opens up possibilities for exploring your data that static visualizations just can’t offer by themselves.

We will cover a few of those.
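
Two items from the list that the cells below do not reach are linked axes and legend highlighting. The following is a minimal sketch with made-up data: sharing range objects links the axes of two figures, and `click_policy` makes legend entries clickable.

```python
from bokeh.layouts import row
from bokeh.plotting import figure

x = [1, 2, 3, 4]

left = figure(title="Left")
left.line(x, [1, 2, 3, 4], legend_label="up")
left.line(x, [4, 3, 2, 1], color="gray", legend_label="down")

# Clicking a legend entry hides the matching glyph
left.legend.click_policy = "hide"

# Sharing the range objects links panning and zooming between the figures
right = figure(title="Right", x_range=left.x_range, y_range=left.y_range)
right.scatter(x, [2, 1, 4, 3], size=10)

linked = row(left, right)
```

Calling `show(linked)` would render both figures; panning one should move the other.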

### Configure toolbar

In [38]:
# Find players who took at least 1 three-point shot during the season
three_takers = player_stats.loc[player_stats['play3PA'] > 0].copy()

# Build each player's full name from the first- and last-name columns
three_takers['name'] = three_takers[['playFNm', 'playLNm']].apply(lambda x: ' '.join(x), axis = 1)

# Aggregate the total three-point attempts and makes for each player
# (select the numeric columns before summing so non-numeric columns are skipped)
three_takers = (three_takers.groupby('name')[['play3PA', 'play3PM']]
                            .sum()
                            .sort_values('play3PA', ascending=False))


# Filter out anyone who didn't take at least 100 three-point shots
three_takers = three_takers[three_takers['play3PA'] >= 100].reset_index()

# Add a column with a calculated three-point percentage (made/attempted)
three_takers['pct3PM'] = three_takers['play3PM'] / three_takers['play3PA']

three_takers.head()

| | name | play3PA | play3PM | pct3PM |
|---|---|---|---|---|
| 0 | James Harden | 722 | 265 | 0.367036 |
| 1 | Damian Lillard | 628 | 227 | 0.361465 |
| 2 | Paul George | 609 | 244 | 0.400657 |
| 3 | Eric Gordon | 608 | 218 | 0.358553 |
| 4 | Kemba Walker | 601 | 231 | 0.384359 |


In [39]:
# Bokeh Libraries
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, NumeralTickFormatter

#Output here in the notebook
output_notebook()

#Store data in ColumnDataSource
three_takers_cds = ColumnDataSource(three_takers)


#Specify the selection tools to be made available
select_tools = ['box_select', 'lasso_select', 'poly_select', 'tap', 'reset']

#Create a figure
fig = figure(plot_height = 400,
            plot_width=600,
            x_axis_label= 'Three-Point Shots Attempted',
            y_axis_label='Percentage Made',
            title='3PT Shots Attempted vs. Made (min. 100 3PA), 2017-18',
            toolbar_location = 'below',
            tools = select_tools)

#Format y-axis tick labels as percentages

fig.yaxis[0].formatter = NumeralTickFormatter(format='00.0%')

# Add square representing each player
fig.square(x ='play3PA',
          y='pct3PM',
          source=three_takers_cds,
          color='royalblue',
          selection_color='deepskyblue',
          nonselection_color='lightgray',
          nonselection_alpha=0.3)
# Visualize
show(fig)

### Adding Hover Actions
So the ability to select specific player data points of interest in the scatter plot is implemented, but what if you want to quickly see which individual player a glyph represents? One option is to use Bokeh’s HoverTool() to show a tooltip when the cursor crosses paths with a glyph. All you need to do is append the following to the code snippet above:

In [40]:
# Bokeh Library
from bokeh.models import HoverTool

# Format the tooltip
tooltips = [
            ('Player','@name'),
            ('Three-Pointers Made', '@play3PM'),
            ('Three-Pointers Attempted', '@play3PA'),
            ('Three-Point Percentage','@pct3PM{00.0%}'),
           ]

# Add the HoverTool to the figure
fig.add_tools(HoverTool(tooltips=tooltips))

# Visualize
show(fig)

### Your turn
### Exercise #3
Create a circle plot that shows points versus rebounds for LeBron James.
Bonus points if you use a ColumnDataSource with a CDSView and a GroupFilter.

Hint: use the player_stats DataFrame; points are in playPTS and rebounds are in playTRB.



In [None]:
# Your code here