<a name="cid1"></a>
Prev: [Data Visualization with Pandas, Matplotlib, and Plotly](../09_visualization/visualization.ipynb) | [Table of Contents](../toc.ipynb) | Next: [Data Visualization with Plotly](numpy.ipynb)

<a name="cid2"></a>
# Data Visualization with Bokeh
## I. Introduction
We covered basic plotting with the Matplotlib package in [lesson 10](numpy.ipynb). We used Matplotlib because it's the most commonly used plotting package for Python, it's similar to MATLAB, it appears in many code examples on the Internet, and several other visualization packages (like Seaborn) are based on it.

Nevertheless, we use a different plotting package for the scouting system. We want to provide the drive team, strategists, and lead scout with *interactive* visualizations that allow them to drill down into the data. Users should be able to select whatever teams, matches, or game aspects that interest them. The Bokeh plotting package is superior to Matplotlib in this regard.

The [Bokeh Gallery](https://docs.bokeh.org/en/latest/docs/gallery.html) has several examples of interactive plots. The [Interactive Movie Explorer](https://demo.bokeh.org/movies) has data filtering features that could be useful in an FRC scouting system. The [Gapminder Visualization on Life Expectancy and Family Size](https://demo.bokeh.org/gapminder) shows how plot animation can reveal trends that would otherwise be difficult to understand.

## II. Notebook Setup
### A. If Running this Notebook on Google Colab
Run the next cell to download data files into the local folder on Google Colab.

In [None]:
!wget -nv https://raw.githubusercontent.com/irs1318dev/python2023/main/output/10_visualization_bokeh/get_files.sh
!bash get_files.sh

<a name="cid3"></a>
### B. If Running Notebook Locally in Jupyter
If you are using Google Colab, you can skip this section because the Bokeh package is already installed there.

If you are using Jupyter Lab (as you should be), you will need to install both the `bokeh` and `jupyter_bokeh` packages to run this notebook. Run the following commands in your terminal:
```bash
conda install -c conda-forge bokeh
conda install -c conda-forge jupyter_bokeh
```
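
To confirm that both packages are visible to the Python environment that runs this notebook, a quick check like the one below can help. It is a minimal sketch that uses only the standard library; the helper name `is_installed` is made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Report whether each package required by this notebook can be found.
for name in ("bokeh", "jupyter_bokeh"):
    print(name, "installed" if is_installed(name) else "missing")
```

If either package is reported as missing, check that the notebook kernel uses the same conda environment in which you ran the install commands.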

<a name="cid4"></a>
### C. Imports
Bokeh requires that we import several different modules.

In [None]:
import bokeh.io as io
import bokeh.models as models
import bokeh.plotting as plotting
import bokeh.resources as resources
import pandas as pd

<a name="cid5"></a>
### D. Configuration
Configure Bokeh to display plots in a Jupyter notebook with the `bokeh.io.output_notebook()` function.

In [None]:
# Configures notebook to display Bokeh Charts
io.output_notebook()

<a name="cid6"></a>
Bokeh normally tries to load additional resources over the Internet. We'll go over the reason for that later. If you are trying to run this notebook without Internet access, replace the code in the previous cell with this code:
```python
res = resources.Resources("inline")
io.output_notebook(res)
```

<a name="cid7"></a>
## II. First Plot
For our first Bokeh plot, we'll load the housing data that we used for the Matplotlib session.

In [None]:
housing = pd.read_csv("housing.csv")
housing.head()

<a name="cid8"></a>
The data comes from the 1990 census for California. Each row represents housing data for a different census tract. There are over 20,000 rows in the dataframe.

In [None]:
housing.shape

<a name="cid9"></a>
Let's look at the column names.

In [None]:
housing.columns

<a name="cid10"></a>
Now let's display a plot of median house value by median income for each census tract.

In [None]:
# First Bokeh Plot

# Step 1: Create a Figure object.
housing_plot = plotting.figure(
    title="Median House Value",
    x_axis_label="Median Income",
    y_axis_label="Median Home Value",
    width=800, height=400)

# Step 2: Plot the data (first 1000 rows)
housing_plot.scatter("median_income",
                     "median_house_value",
                     source=housing.iloc[0:1000, :])

# Step 3: Tweak the plot to make it look nice.
housing_plot.yaxis[0].formatter = (
    models.NumeralTickFormatter(format="$0a"))

# Step 4: Display the plot
plotting.show(housing_plot)

<a name="cid11"></a>
### E. Interactive Features
We didn't do anything special in our code to make our plot interactive, but it has interactive features anyway. By selecting different tools from the toolbar in the upper right corner of the plot, we can pan, zoom in, export the plot to an image file, and return to the original zoom settings.
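
The toolbar can also be chosen explicitly. The sketch below is one way to do that, assuming the `tools` and `tooltips` arguments of `figure()`. The three data points are arbitrary stand-ins, not values from the housing dataset.

```python
import bokeh.plotting as plotting

# Request specific toolbar tools by name. Passing tooltips also adds a
# hover tool that shows the listed fields for the point under the cursor.
tool_plot = plotting.figure(
    title="Custom Toolbar",
    width=400, height=300,
    tools="pan,wheel_zoom,box_select,reset",
    tooltips=[("income", "@x"), ("value", "@y")])

# When plain lists are passed, Bokeh names the data columns "x" and "y",
# which is why the tooltips above refer to @x and @y.
tool_plot.scatter([2.5, 3.1, 4.8], [150_000, 210_000, 340_000])

# List the tool types attached to the plot.
tool_names = [type(tool).__name__ for tool in tool_plot.toolbar.tools]
print(tool_names)

# In a notebook, display the plot with: plotting.show(tool_plot)
```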

Let's break down the code for our first plot.
### F. Step 1: Creating a Figure Object
The first step in creating a Bokeh chart is to create a `bokeh.plotting.figure.Figure` object. This object represents a Bokeh plot. In this step we set the plot's size, title, and axis labels. There are other parameters that can be passed to the `figure()` function. See [the Bokeh reference for the Figure object](https://docs.bokeh.org/en/latest/docs/reference/plotting/figure.html#id2) for a complete list.

### G. Step 2: Plotting the Data
We plotted the data by calling our `Figure` object's `.scatter()` method. Try replacing `scatter()` with `dot()` or `square()`. These methods are called glyph methods and there are a lot of them. See [the Bokeh reference for the Figure object](https://docs.bokeh.org/en/latest/docs/reference/plotting/figure.html#id2) for a complete list.

It was easy to get the data into the plot. Passing our dataframe to the `.scatter()` method in the `source` parameter allowed us to control which columns were plotted by passing in column names.
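
When a dataframe is passed as `source`, Bokeh wraps it in a `ColumnDataSource` object. The sketch below builds that object explicitly to show what the column names refer to. The three rows are made-up values that only mimic the housing columns.

```python
import bokeh.models as models
import bokeh.plotting as plotting
import pandas as pd

# Made-up rows that reuse two column names from the housing data.
sample = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6],
    "median_house_value": [452600, 358500, 341300],
})

# A ColumnDataSource maps column names to arrays of values.
source = models.ColumnDataSource(sample)
print(source.column_names)

# Glyph methods look up the named columns in the data source.
cds_plot = plotting.figure(width=400, height=300)
renderer = cds_plot.scatter(
    "median_income", "median_house_value", source=source)
```

Creating the `ColumnDataSource` yourself is mostly useful when several glyphs or plots need to share the same data.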

### H. Step 3: Tweaking the Plot
Comment out the line for step 3 to see what it does. It formats the y axis to show home price in thousands of dollars. This is much easier to read than the scientific notation that was used by default. This is just one of the many ways that plot appearance can be customized.

### I. Step 4: Show the Plot
Unlike in Matplotlib, we have to call a function to display our plot. We used `bokeh.plotting.show()`, which is the same function as `bokeh.io.show()`.

<a name="cid12"></a>
## III. How Bokeh Works
Instead of displaying the plot in the notebook, use the `bokeh.io.save()` function to save the plot to a file.

In [None]:
io.save(housing_plot, "housing.html", resources.CDN, title="Bokeh Housing Plot")

<a name="cid13"></a>
With most plotting packages we would export plots to graphics files like *.png* or *.jpg*, but with Bokeh we are exporting the plot to an *HTML* file. The preceding cell created an HTML file called *housing.html*.
* If you are working on Google Colab, click on the file viewer button (looks like a file folder) in the toolbar on the left. Then double click on *housing.html* to open it.
* If you are working in Jupyter, find the file in Jupyter Lab's file browser, right click on it, and select *Open With->Editor*. A new Jupyter window will open up with the file's HTML source code.

The beginning of *housing.html* looks like a regular HTML file, with `<html>`, `<head>`, and `<title>` elements. But nothing in this page looks like an image or plot. To understand how Bokeh works, we need to understand three interesting elements in the webpage.
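
If you would rather inspect the page from Python than open the file, the sketch below generates the same kind of HTML as a string with `bokeh.embed.file_html()` and checks for two of the elements. The tiny stand-in plot and its three points are arbitrary, and the exact markup can vary between Bokeh versions.

```python
import bokeh.embed as embed
import bokeh.plotting as plotting
import bokeh.resources as resources

# Stand-in plot; the three points are arbitrary.
demo_plot = plotting.figure(title="Demo", width=300, height=200)
demo_plot.scatter([1, 2, 3], [4, 5, 6])

# file_html() returns the whole page as a string instead of writing a file.
html = embed.file_html(
    demo_plot, resources=resources.CDN, title="Demo Plot")

# Look for the BokehJS <script> source and the JSON <script> element.
has_bokehjs = "cdn.bokeh.org" in html
has_json = 'type="application/json"' in html
print(has_bokehjs, has_json, len(html))
```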

### A. BokehJS JavaScript Module
The page loads a JavaScript module from an external server, Bokeh's content delivery network (CDN). This is the BokehJS JavaScript module that contains the functions and objects needed to make the page interactive.

```html
<script type="text/javascript"
        src="https://cdn.bokeh.org/bokeh/release/bokeh-2.4.2.min.js">
</script>
```

### B. JSON Script Element
The page contains a script element with a large section of JSON text. The first few lines are displayed below. Line breaks have been added to reduce horizontal scrolling.
```html
<script type="application/json" id="2358">
{"261f6d85-a94c-4648-879c-42ea33a0191c":{"defs":[],
"roots":{"references":[{"attributes":{"below":[{"id":"2093"}],"center":[{"id":"2096"},{"id":"2100"}],"height":400,"left":[{"id":"2097"}],"renderers":[{"id":"2120"}],"title":{"id":"2083"},"toolbar":{"id":"2108"},"width":800,"x_range":{"id":"2085"},"x_scale":{"id":"2089"},"y_range":{"id":"2087"},"y_scale":{"id":"2091"}},"id":"2082","subtype":"Figure","type":"Plot"},{"attributes":{},"id":"2106","type":"HelpTool"},{"attributes":{},"id":"2104","type":"SaveTool"},
{"attributes":{},"id":"2101","type":"PanTool"},
{"attributes":{},"id":"2105","type":"ResetTool"},
{"attributes":{"bottom_units":"screen","coordinates":null,"fill_alpha":0.5,"fill_color":"lightgrey",
"group":null,"left_units":"screen","level":"overlay","line_alpha":1.0,"line_color":
"black","line_dash":[4,4],"line_width":2,"right_units":"screen","syncable":false,"top_units":"screen"},
"id":"2107","type":"BoxAnnotation"},{"attributes":{"coordinates":null,"data_source":{"id":"2115"},"glyph":{"id":"2117"},"group":null,"hover_glyph":null,"muted_glyph":{"id":"2119"},"nonselection_glyph":{"id":"2118"},"view":{"id":"2121"}},"id":"2120","type":"GlyphRenderer"},{"attributes":{"source":{"id":"2115"}},"id":"2121","type":"CDSView"},{"attributes":{"data":{"households":{"__ndarray__":"AAAAAACAX0AAAAAAAMiRQAAAAAAAIGZAAAAAAABga0AAAAAAADBwQAAAAAAAIG ...
```
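
Because this is ordinary JSON, Python's `json` module can read it. The sketch below parses a trimmed, hand-edited fragment modeled on the excerpt above (most attributes and references are removed) and lists the type of each object. The layout shown here is the one in the excerpt and may differ in other Bokeh versions.

```python
import json

# Trimmed fragment modeled on the excerpt above. The real document
# contains many more references and attributes.
doc_json = """
{"261f6d85-a94c-4648-879c-42ea33a0191c": {"defs": [],
  "roots": {"references": [
    {"attributes": {"height": 400, "width": 800},
     "id": "2082", "subtype": "Figure", "type": "Plot"},
    {"attributes": {}, "id": "2106", "type": "HelpTool"},
    {"attributes": {}, "id": "2104", "type": "SaveTool"},
    {"attributes": {}, "id": "2101", "type": "PanTool"},
    {"attributes": {}, "id": "2105", "type": "ResetTool"}
  ]}}}
"""

doc = json.loads(doc_json)

# The top-level dictionary has a single key: the document's ID.
(doc_id, body), = doc.items()

# Each reference describes one object in the plot, such as a tool.
types_by_id = {
    ref["id"]: ref["type"] for ref in body["roots"]["references"]}
for ref_id, ref_type in types_by_id.items():
    print(ref_id, ref_type)
```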

### C. Final `<script>` Tag
```html
        <script type="text/javascript">
          (function() {
            const fn = function() {
              Bokeh.safely(function() {
                (function(root) {
         ...
```

<a name="cid14"></a>
### D. Explanation
1. The BokehJS JavaScript module (see III.A) contains functions that generate a plot from the large JSON string (see III.B).
2. The JSON string contains all of the information needed to generate the plot, including the source data, axis labels, titles, and formatting. Much of the JSON is human readable. The long, random-looking alphanumeric strings following `"__ndarray__"` are base64 encodings of binary arrays of floating point values.
3. The final `<script>` element (see III.C) calls functions from the BokehJS module that convert the JSON data to a graphical plot.
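
The encoded arrays can be decoded with the standard library. The sketch below decodes the first 32 characters of the `households` string shown in III.B, assuming the values are little-endian 64-bit floats. In the real file, the data type and byte order are recorded next to each encoded array, so check them there rather than relying on this assumption.

```python
import base64
import struct

# First 32 characters of the "households" string from III.B, written in
# 8-character groups. 32 base64 characters hold 24 bytes of data.
encoded = "AAAAAACA" "X0AAAAAA" "AMiRQAAA" "AAAAIGZA"
raw = base64.b64decode(encoded)

# Assumption: little-endian 64-bit floats ("<3d" means three doubles).
households = struct.unpack("<3d", raw)
print(households)  # (126.0, 1138.0, 177.0)
```

The decoded numbers are the household counts for the first three rows of the data that was plotted.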

By default, Bokeh displays the plot using an [HTML5 `<canvas>` element](https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API). Canvas elements allow web designers to draw lines and shapes in a webpage using JavaScript. Since the plot was drawn with JavaScript, it can be *modified* with JavaScript. This is where the interactivity comes from. When you use the plot's tools to pan or zoom, JavaScript running within the webpage updates the plot.

Bokeh can also be configured to draw plots with [Scalable Vector Graphics (SVG)](https://developer.mozilla.org/en-US/docs/Web/SVG). SVG uses an XML-based markup language to draw graphic elements in a webpage. One big difference between SVG and canvas elements is that SVG graphics are vector-based and canvas graphics are raster-based. SVG will perform better in some situations and canvas elements will perform better in others.

Bokeh provides a third option for drawing plots called WebGL. WebGL enables browsers to use hardware acceleration to render complex graphic objects. For example, if we were to plot all of the more than 20,000 rows in our housing dataset, we could use WebGL to get better performance.

Bokeh refers to svg, canvas, and webgl as *backends*. Unless we run into problems, we'll stick with the default canvas backend for the scouting system. Search Bokeh's documentation for the term *backend* to learn how to change Bokeh's default behavior.
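
As a starting point, the sketch below selects a backend with the `output_backend` argument of `figure()`. The data points are arbitrary, and whether WebGL actually improves performance depends on the browser and the size of the dataset.

```python
import bokeh.plotting as plotting

# output_backend accepts "canvas" (the default), "svg", or "webgl".
webgl_plot = plotting.figure(
    title="WebGL Demo",
    width=400, height=300,
    output_backend="webgl")
webgl_plot.scatter([1, 2, 3], [3, 1, 2])

print(webgl_plot.output_backend)

# In a notebook, display the plot with: plotting.show(webgl_plot)
```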

<a name="cid15"></a>
## IV. Reading Assignment

### A. Bokeh Documentation: First Steps
Now is a good time to read [Bokeh's First steps guide](https://docs.bokeh.org/en/latest/docs/first_steps.html). At a minimum, read [First steps 1: Creating a Line Chart](https://docs.bokeh.org/en/latest/docs/first_steps/first_steps_1.html) and [First Steps 2: Adding and Customizing Renderers](https://docs.bokeh.org/en/latest/docs/first_steps/first_steps_2.html). I encourage you to scan the other sections.

### B. Bokeh Tutorial
Alternatively, Bokeh provides an [interactive, online tutorial](https://mybinder.org/v2/gh/bokeh/bokeh-notebooks/master?filepath=tutorial%2F00%20-%20Introduction%20and%20Setup.ipynb). The tutorial uses an older version of Bokeh (version 1.4), but it could still be helpful.

<a name="cid16"></a>
## V. Bokeh Scouting Charts From Prior Years

### A. 2019: Hosting Static Files on GitHub
In 2019 we used Bokeh to embed charts in [static HTML pages that we hosted on GitHub](https://github.com/irs1318dev/scouting2019). Click on the links in the README page to view charts from our competitions. [This one is the points chart from the PNW championships](https://irs1318dev.github.io/scouting2019/pncmp/pointschart.html).

Hosting our plots on GitHub was a big improvement. Anyone with a smartphone could view our scouting data, and we could update the charts with a `git push...` command. But it still had drawbacks. Consider the [one-team charts that we generated for each team](https://irs1318dev.github.io/scouting2019/pncmp/oneteam_index.html). We had to generate and upload a different HTML file for each chart. This was time-consuming and wasteful.

[The code that generated the 2019 charts is here](https://github.com/irs1318dev/irsScouting2017/tree/master/server/season/s2019/view).

<a name="cid17"></a>
### B. 2020: Serving Interactive Charts with a Bokeh Server
In 2020 we created an interactive visualization application and put it online. [Click here to try out the application.](https://irs1318-viewer.herokuapp.com/app) There are multiple tabs of data. Several of the charts have dropdown select boxes that allow users to change what data is displayed.

Building interactive Bokeh applications is beyond the scope of this notebook. Learn more about building [Bokeh applications in the Bokeh User Guide](https://docs.bokeh.org/en/latest/docs/user_guide/server.html#building-bokeh-applications). The viewer_app uses the directory format for Bokeh applications.

### C. Robot Path Viewer
Go to http://pviewer.herokuapp.com/pviewer to see an interactive Bokeh application for viewing FRC robot path data. The [source code for the path viewer is available on GitHub](https://github.com/irwinsnet/frc_path_viewer).

<a name="cid18"></a>
## VI. Project: Play Around with Old Scouting Data

### A. Load Some Scouting Data
Let's load some scouting data from an FRC competition. The data is stored in a pickle file.

In [None]:
import pickle

# The vif.pickle file contains scouting data from the
# 2020 Glacier Peak competition.
with open("vif.pickle", "rb") as sfile:
    sdata = pickle.load(sfile)

<a name="cid19"></a>
The `sdata` variable holds a dictionary with several keys.

In [None]:
sdata.keys()
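
For readers new to pickle files, the sketch below shows the round trip on a small stand-in dictionary. The keys and values are made up for illustration and are not the contents of *vif.pickle*. Pickle files can run arbitrary code when loaded, so only open ones from a source you trust.

```python
import pickle

# Stand-in dictionary; the real file holds dataframes and strings.
toy_data = {
    "event": "demo_event",
    "season": "2020",
    "teams": [1318, 9998, 9999],
}

# dumps() converts the object to bytes, which is what a pickle file holds.
payload = pickle.dumps(toy_data)

# loads() rebuilds an equal copy of the original object from those bytes.
restored = pickle.loads(payload)
print(sorted(restored.keys()))
```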

<a name="cid20"></a>
For convenience, let's copy each value in the dictionary into its own variable, named after its key.

In [None]:
# Assign each dictionary value to a module-level variable
# named after its key (e.g., sdata["teams"] becomes teams).
globals().update(sdata)

<a name="cid21"></a>
### B. Exploratory Data Analysis (EDA)
The `measures` and `enum_measures` dataframes contain the actual scouting data and are almost exactly the same. The difference is in how each table treats enumerated measure types. In `measures`, the enumerated value is stored in the *capability* column. In `enum_measures`, the enumerated value is appended to the *task*. The difference is readily apparent in the first three rows of the table.

In [None]:
measures.head()

In [None]:
enum_measures.head()

<a name="cid22"></a>
The `schedule` dataframe is in narrow format, with one team on each row and six rows per match.

In [None]:
schedule.head()
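
To illustrate what narrow format means, the sketch below builds a small narrow table and pivots it into a wide table with one row per match. The column names (`match`, `alliance`, `station`, `team`) and the single-letter team labels are hypothetical and may not match the real `schedule` dataframe.

```python
import pandas as pd

# Hypothetical narrow schedule: one row per team, six rows per match.
narrow = pd.DataFrame({
    "match": ["q1"] * 6 + ["q2"] * 6,
    "alliance": (["red"] * 3 + ["blue"] * 3) * 2,
    "station": [1, 2, 3] * 4,
    "team": ["A", "B", "C", "D", "E", "F",
             "D", "A", "F", "C", "B", "E"],
})

# Pivot to wide format: one row per match, one column per
# (alliance, station) pair.
wide = narrow.pivot(
    index="match", columns=["alliance", "station"], values="team")
print(wide)
```

Narrow format is usually easier to filter and group, while wide format is easier for a person to read.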

<a name="cid23"></a>
The `teams` dataframe is self-explanatory.

In [None]:
teams.head()

<a name="cid24"></a>
The `event` and `season` variables contain simple, descriptive strings.

In [None]:
print("event:\t", event)
print("season:\t", season)

<a name="cid25"></a>
The `status` dataframe was used by the old scouting system to store its current state. It's not needed for this exercise.

In [None]:
status

<a name="cid26"></a>
### C. Project: Make Some Charts with Bokeh
Do some more EDA and make some cool charts with Bokeh using the scouting data that we just loaded.
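
If you are not sure where to start, the sketch below shows one possible pattern: summarize with Pandas, then draw a bar chart with Bokeh. The `team` and `points` columns and all of the values are invented for this example, so substitute columns that you find during your own EDA.

```python
import bokeh.plotting as plotting
import pandas as pd

# Invented data: one row per team per match.
toy = pd.DataFrame({
    "team": ["1318", "1318", "9998", "9998", "9999"],
    "points": [30, 42, 18, 25, 51],
})

# Total the points for each team.
totals = toy.groupby("team", as_index=False)["points"].sum()

# A categorical axis needs the list of categories in x_range.
bar_plot = plotting.figure(
    x_range=list(totals["team"]),
    title="Total Points by Team (Invented Data)",
    x_axis_label="Team",
    y_axis_label="Points",
    width=500, height=300)
bar_plot.vbar(x="team", top="points", width=0.8, source=totals)

# In a notebook, display the plot with: plotting.show(bar_plot)
```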

<a name="cid27"></a>