# FRC Analytics with Python - Session 11A
# Plotting with Bokeh
#### Last Updated: 14 Jan 2022
We covered basic plotting in session 11. We used the Matplotlib package because it's the most commonly used plotting package for the Python language. It's used in many code examples on the Internet and it is also used as the core of other visualization packages like Seaborn.

Nevertheless, we will use a different plotting package for the scouting system. We want to provide the drive team, strategists, and lead scout with interactive visualizations that allow them to drill down into the data. Users of our visualizations should be able to select whichever teams, matches, or game aspects interest them. The Bokeh plotting package is superior to Matplotlib in this regard.

The [Bokeh Gallery](https://docs.bokeh.org/en/latest/docs/gallery.html) has several examples of interactive plots. The [Interactive Movie Explorer](https://demo.bokeh.org/movies) has data filtering features that could be useful in an FRC scouting system. The [Gapminder Visualization on Life Expectancy and Family Size](https://demo.bokeh.org/gapminder) shows how plot animation can reveal trends that would otherwise be difficult to understand.

## I. Notebook Setup
### Install Packages
If you are using Jupyter Lab (as you should be), you will need to install both the bokeh and jupyter_bokeh packages to run this notebook. Run the following commands in your terminal:
```bash
conda install bokeh=2.4.2
conda install -c conda-forge jupyter_bokeh=3.0.4
```
### Imports
Bokeh requires that we import several different modules.

In [2]:
import bokeh.io as io
import bokeh.models as models
import bokeh.plotting as plotting
import bokeh.resources as resources
import pandas as pd


### Configuration
Configure Bokeh to display plots in a Jupyter notebook with the `bokeh.io.output_notebook()` function.

In [None]:
# Configures notebook to display Bokeh Charts
io.output_notebook()

Bokeh normally tries to load additional resources over the Internet. We'll go over the reason for that later. If you are trying to run this notebook without Internet access, replace the code in the previous cell with this code:
```python
res = resources.Resources("inline")
io.output_notebook(res)
```


## II. First Plot
For our first Bokeh plot, we'll load the housing data that we used for the Matplotlib session.

In [3]:
housing = pd.read_csv("housing.csv")
housing.head()

Unnamed: 0,longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
0,-122.23,37.88,41.0,880.0,129.0,322.0,126.0,8.3252,452600.0,NEAR BAY
1,-122.22,37.86,21.0,7099.0,1106.0,2401.0,1138.0,8.3014,358500.0,NEAR BAY
2,-122.24,37.85,52.0,1467.0,190.0,496.0,177.0,7.2574,352100.0,NEAR BAY
3,-122.25,37.85,52.0,1274.0,235.0,558.0,219.0,5.6431,341300.0,NEAR BAY
4,-122.25,37.85,52.0,1627.0,280.0,565.0,259.0,3.8462,342200.0,NEAR BAY


The data comes from the 1990 census for California. Each row represents housing data for a different census tract. There are over 20,000 rows in the dataframe.

In [4]:
housing.shape

(20640, 10)

Let's look at the column names.

In [5]:
housing.columns

Index(['longitude', 'latitude', 'housing_median_age', 'total_rooms',
       'total_bedrooms', 'population', 'households', 'median_income',
       'median_house_value', 'ocean_proximity'],
      dtype='object')

Now let's display a plot of median house value by median income for each census tract.

In [17]:
# First Bokeh Plot

# Step 1: Create a Figure object.
housing_plot = plotting.figure(
    title="Median House Value",
    x_axis_label="Median Income",
    y_axis_label="Median Home Value",
    width=800, height=400)

# Step 2: Plot the data (first 1000 rows)
housing_plot.scatter("median_income",
                     "median_house_value",
                     source=housing.iloc[0:1000, :])

# Step 3: Tweak the plot to make it look nice.
housing_plot.yaxis[0].formatter = (
    models.NumeralTickFormatter(format="$0a"))

# Step 4: Display the plot
plotting.show(housing_plot)

### Interactive Features
We didn't do anything special in our code to make our plot interactive, but it has interactive features anyway. By selecting different tools from the toolbar in the upper right corner of the plot, we can pan, zoom in, export the plot to an image file, and return to the original zoom settings.
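
The toolbar can also be customized. The following is a minimal sketch, assuming we want a specific set of tools plus a hover tool; the variable names `tool_plot` and `hover` are illustrative, and the three data points are copied from the first rows of the housing data.

```python
import bokeh.models as models
import bokeh.plotting as plotting

# Ask for a specific set of toolbar tools instead of the defaults.
tool_plot = plotting.figure(
    title="Median House Value",
    width=800, height=400,
    tools="pan,wheel_zoom,box_select,reset,save")

# A hover tool shows column values when the cursor is over a point.
# The "@" prefix refers to a column in the plot's data source.
hover = models.HoverTool(tooltips=[
    ("Income", "@median_income"),
    ("Value", "@median_house_value{$0a}")])
tool_plot.add_tools(hover)

# Values from the first three rows of the housing data shown above.
source = models.ColumnDataSource(data={
    "median_income": [8.3252, 8.3014, 7.2574],
    "median_house_value": [452600.0, 358500.0, 352100.0]})
tool_plot.scatter("median_income", "median_house_value", source=source)

# plotting.show(tool_plot)  # Uncomment in a notebook to display the plot.
tool_names = [type(tool).__name__ for tool in tool_plot.tools]
print(tool_names)
```

Printing the tool names is just a quick way to confirm which tools ended up on the toolbar without rendering the plot.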

Let's break down the code for our first plot.
### Step 1: Creating a Figure Object
The first step in creating a Bokeh chart is to create a `bokeh.plotting.figure.Figure` object. This object represents a Bokeh plot. In this step we were able to set the plot's size, title, and axis labels. There are other parameters that can be passed to the `figure()` function. See [the Bokeh reference for the Figure object](https://docs.bokeh.org/en/latest/docs/reference/plotting/figure.html#id2) for a complete list.

### Step 2: Plotting the Data
We plotted the data by calling our `Figure` object's `.scatter()` method. Try replacing `scatter()` with `dot()` or `square()`. These methods are called glyph methods and there are a lot of them. See [the Bokeh reference for the Figure object](https://docs.bokeh.org/en/latest/docs/reference/plotting/figure.html#id2) for a complete list.

It was easy to get the data into the plot. Passing our dataframe to the `.scatter()` method in the `source` parameter allowed us to control which columns were plotted by passing in column names.
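
Behind the scenes, Bokeh stores plot data in a `ColumnDataSource` object. The sketch below builds one explicitly from a small stand-in dataframe (three rows copied from `housing.head()`), so we can see that columns are looked up by name. The names `sample`, `source`, `fig`, and `renderer` are illustrative.

```python
import bokeh.models as models
import bokeh.plotting as plotting
import pandas as pd

# A small stand-in for the housing dataframe (values from housing.head()).
sample = pd.DataFrame({
    "median_income": [8.3252, 8.3014, 7.2574],
    "median_house_value": [452600.0, 358500.0, 352100.0]})

# Wrapping the dataframe ourselves gives us a handle on the data source.
source = models.ColumnDataSource(sample)

fig = plotting.figure(width=400, height=300)
# The strings name columns in the source; marker picks the glyph shape.
renderer = fig.scatter("median_income", "median_house_value",
                       source=source, marker="square", size=8)

# Every dataframe column (plus the index) is available by name.
print(sorted(source.column_names))
```

Keeping a reference to the data source becomes useful later, because updating the source's data is how interactive applications change what a plot shows.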

### Step 3: Tweaking the Plot
Comment out the line for step 3 to see what it does. It formats the y axis to show home price in thousands of dollars. This is much easier to read than the scientific notation that was used by default. This is just one of the many ways that plot appearance can be customized.

### Step 4: Show the Plot
Unlike in Matplotlib, we have to call a function (`bokeh.plotting.show()`, which is also available as `bokeh.io.show()`) to display our plot.

## III. How Bokeh Works
Instead of displaying the plot in the notebook, use the `bokeh.io.save()` function to save the plot to a file.

In [18]:
io.save(housing_plot, "housing.html", resources.CDN)

'C:\\Users\\stacy\\OneDrive\\Projects\\Python_Training\\pyclass_frc\\pyclass_frc\\sessions\\s11_visualization\\housing.html'

That's interesting. With most plotting packages we would export plots to graphics files like *.png* or *.jpg*, but with Bokeh we are exporting the plot to an HTML file.

The preceding code cell created an HTML file called *housing.html*. Assuming that you are working in Jupyter Lab, find the file in Jupyter Lab's file browser, right click on it, and select *Open With->Editor*. A new Jupyter window will open up with the file's HTML source code.

The beginning of *housing.html* looks like a regular HTML file, with `<html>`, `<head>`, and `<title>` elements. But nothing in this page looks like an image or plot.

To understand how Bokeh works, we need to understand three interesting elements in the webpage.

#### A. BokehJS JavaScript Module
The page loads a JavaScript module from an external server, Bokeh's content delivery network (CDN). This is the BokehJS JavaScript module that contains the functions and objects needed to make the page interactive.

```html
<script type="text/javascript"
        src="https://cdn.bokeh.org/bokeh/release/bokeh-2.4.2.min.js">
</script>
```

#### B. JSON Script Element
The page contains a script element with a large section of JSON text. The first few lines are displayed below. Line breaks have been added to reduce horizontal scrolling.
```html
<script type="application/json" id="2358">
{"261f6d85-a94c-4648-879c-42ea33a0191c":{"defs":[],
"roots":{"references":[{"attributes":{"below":[{"id":"2093"}],"center":[{"id":"2096"},{"id":"2100"}],"height":400,"left":[{"id":"2097"}],"renderers":[{"id":"2120"}],"title":{"id":"2083"},"toolbar":{"id":"2108"},"width":800,"x_range":{"id":"2085"},"x_scale":{"id":"2089"},"y_range":{"id":"2087"},"y_scale":{"id":"2091"}},"id":"2082","subtype":"Figure","type":"Plot"},{"attributes":{},"id":"2106","type":"HelpTool"},{"attributes":{},"id":"2104","type":"SaveTool"},
{"attributes":{},"id":"2101","type":"PanTool"},
{"attributes":{},"id":"2105","type":"ResetTool"},
{"attributes":{"bottom_units":"screen","coordinates":null,"fill_alpha":0.5,"fill_color":"lightgrey",
"group":null,"left_units":"screen","level":"overlay","line_alpha":1.0,"line_color":
"black","line_dash":[4,4],"line_width":2,"right_units":"screen","syncable":false,"top_units":"screen"},
"id":"2107","type":"BoxAnnotation"},{"attributes":{"coordinates":null,"data_source":{"id":"2115"},"glyph":{"id":"2117"},"group":null,"hover_glyph":null,"muted_glyph":{"id":"2119"},"nonselection_glyph":{"id":"2118"},"view":{"id":"2121"}},"id":"2120","type":"GlyphRenderer"},{"attributes":{"source":{"id":"2115"}},"id":"2121","type":"CDSView"},{"attributes":{"data":{"households":{"__ndarray__":"AAAAAACAX0AAAAAAAMiRQAAAAAAAIGZAAAAAAABga0AAAAAAADBwQAAAAAAAIG ...
```

#### C. Final `<script>` Tag
```html
        <script type="text/javascript">
          (function() {
            const fn = function() {
              Bokeh.safely(function() {
                (function(root) {
         ...
```


#### D. Explanation
1. The BokehJS JavaScript module (see III.A) contains functions that generate a plot from the large JSON string (see III.B).
2. The JSON string contains all of the information needed to generate the plot, including the source data, axis labels, titles, and formatting. Much of the JSON is human readable. The long, random-looking alphanumeric strings following `"__ndarray__"` are base64 text encodings of binary arrays of floating point values.
3. The final `<script>` element (see III.C) calls functions from the BokehJS module that convert the JSON data to a graphical plot.
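
To see that the encoded strings really are just numbers, we can decode the start of one ourselves. This sketch uses only the Python standard library and decodes the first 32 characters of the `"households"` array from the JSON excerpt in III.B. It assumes the values are little-endian 64-bit floats, which is consistent with the household counts displayed by `housing.head()`.

```python
import base64
import struct

# First 32 characters of the "households" array from the JSON shown above.
# 32 base64 characters decode to 24 bytes, i.e. three 8-byte floats.
encoded = "AAAAAACAX0AAAAAAAMiRQAAAAAAAIGZA"

raw = base64.b64decode(encoded)

# Assumption: little-endian ("<") 64-bit floats ("d").
households = struct.unpack("<3d", raw)
print(households)  # (126.0, 1138.0, 177.0)
```

The three decoded values match the *households* column in the first three rows of the housing dataframe.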

By default, Bokeh displays the plot using an [HTML5 `<canvas>` element](https://developer.mozilla.org/en-US/docs/Web/API/Canvas_API). Canvas elements allow web designers to draw lines and shapes in a webpage using JavaScript. Since the plot was drawn with JavaScript, it can be *modified* with JavaScript. This is where the interactivity comes from. When you use the plot's tools to pan or zoom, JavaScript running within the webpage updates the plot.

Bokeh can also be configured to draw plots with [Scalable Vector Graphics (SVG)](https://developer.mozilla.org/en-US/docs/Web/SVG). SVG uses an XML-based markup language to draw graphic elements in a webpage. One big difference between SVG and canvas elements is that SVG graphics are vector-based and canvas graphics are raster-based. SVG will perform better in some situations and canvas elements will perform better in others.

Bokeh provides a third option for drawing plots called WebGL. WebGL enables browsers to use hardware acceleration to render complex graphic objects. For example, if we were to plot all 20,640 rows in our housing dataset, we could use WebGL to get better performance.

Bokeh refers to svg, canvas, and webgl as *backends*. Unless we run into problems, we'll stick with the default canvas backend for the scouting system. Search Bokeh's documentation for the term *backend* to learn how to change Bokeh's default behavior.
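
As a quick illustration, the backend is controlled by a figure's `output_backend` property. This is a minimal sketch; the variable names `webgl_plot` and `svg_plot` are illustrative, and no data is plotted.

```python
import bokeh.plotting as plotting

# output_backend accepts "canvas" (the default), "svg", or "webgl".
webgl_plot = plotting.figure(
    title="Median House Value",
    width=800, height=400,
    output_backend="webgl")

# The backend can also be changed after the figure is created.
svg_plot = plotting.figure(width=800, height=400)
svg_plot.output_backend = "svg"

print(webgl_plot.output_backend, svg_plot.output_backend)
```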

## IV. Reading Assignment

### A. Bokeh Documentation: First Steps
Now is a good time to read [Bokeh's First steps guide](https://docs.bokeh.org/en/latest/docs/first_steps.html). At a minimum, read [First steps 1: Creating a Line Chart](https://docs.bokeh.org/en/latest/docs/first_steps/first_steps_1.html) and [First Steps 2: Adding and Customizing Renderers](https://docs.bokeh.org/en/latest/docs/first_steps/first_steps_2.html). I encourage you to scan the other sections.

### B. Bokeh Tutorial
Alternatively, Bokeh provides an [interactive, online tutorial](https://mybinder.org/v2/gh/bokeh/bokeh-notebooks/master?filepath=tutorial%2F00%20-%20Introduction%20and%20Setup.ipynb). The tutorial uses an older version of Bokeh (version 1.4), but it could still be helpful.

## V. Bokeh Scouting Charts From Prior Years

### A. 2019: Hosting Static Files on Github
In 2019 we used Bokeh to embed charts in [static HTML pages that we hosted on Github](https://github.com/irs1318dev/scouting2019). Click on the links in the README page to view charts from our competitions. [This one is the points chart from the PNW championships](https://irs1318dev.github.io/scouting2019/pncmp/pointschart.html).

Hosting our plots on Github was a big improvement. Anyone with a smart phone could view our scouting data, and we could update the charts with a `git push...` command. But it still had drawbacks. Consider the [one-team charts that we generated for each team](https://irs1318dev.github.io/scouting2019/pncmp/oneteam_index.html). We had to generate and upload a different HTML file for each chart. This was time-consuming and wasteful.

[The code that generated the 2019 charts is here](https://github.com/irs1318dev/irsScouting2017/tree/master/server/season/s2019/view).

### B. 2020: Serving Interactive Charts with a Bokeh Server
In 2020 we created a standalone chart viewer that used a [Bokeh server](https://docs.bokeh.org/en/latest/docs/user_guide/server.html). You can run this application from a terminal. Run the following command from the folder that contains this notebook:
```bash
bokeh serve --show viewer_app --args vif.pickle
```

Your browser should open a window that contains interactive Bokeh charts. The code that created this application is in the *viewer_app* subfolder. The first Python module to run is the *main.py* module.

Learn more about building [Bokeh applications in the Bokeh User Guide](https://docs.bokeh.org/en/latest/docs/user_guide/server.html#building-bokeh-applications). The viewer_app uses the directory format for Bokeh applications.
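
To give a feel for what a directory-format application's *main.py* might contain, here is a hypothetical, minimal sketch. It is not the viewer_app code: the team labels, scores, and all variable names are made up for illustration. The key ideas are that a Python callback updates the data source and that the layout is added to the current document.

```python
# main.py: a hypothetical, minimal Bokeh application (not the viewer_app code)
import bokeh.io as io
import bokeh.layouts as layouts
import bokeh.models as models
import bokeh.plotting as plotting

# Made-up example values, keyed by a made-up team label.
SCORES = {"Team A": [1, 3, 2, 4], "Team B": [2, 2, 5, 1]}
MATCHES = [1, 2, 3, 4]

source = models.ColumnDataSource(
    data={"match": MATCHES, "score": SCORES["Team A"]})

chart = plotting.figure(title="Score by Match", width=600, height=300)
chart.line("match", "score", source=source)

team_select = models.Select(
    title="Team", value="Team A", options=list(SCORES))


def update_chart(attr, old, new):
    """Runs on the server (in Python) whenever the selection changes."""
    source.data = {"match": MATCHES, "score": SCORES[new]}


team_select.on_change("value", update_chart)

# Bokeh serves whatever is added to the current document.
io.curdoc().add_root(layouts.column(team_select, chart))
```

If this file were saved as *main.py* in a folder (say, a hypothetical *demo_app* folder), running `bokeh serve --show demo_app` should open the chart in a browser, with the dropdown switching between the two made-up teams.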

### C. Robot Path Viewer
Go to http://pviewer.herokuapp.com/pviewer to see an interactive Bokeh application for viewing FRC robot path data. The [source code for the path viewer is available on Github](https://github.com/irwinsnet/frc_path_viewer).

## VI. Project: Play Around with Old Scouting Data

### A. Load Some Scouting Data
Let's load some scouting data from an FRC competition. The data is stored in a pickle file.

In [17]:
import pickle

# The vif.pickle file contains scouting data from the
# 2020 Glacier Peak competition.
with open("vif.pickle", "rb") as sfile:
    sdata = pickle.load(sfile)

The `sdata` variable holds a dictionary with several keys.

In [13]:
sdata.keys()

dict_keys(['measures', 'enum_measures', 'schedule', 'teams', 'event', 'season', 'status'])

For convenience, let's extract every key value into its own variable.

In [14]:
# Bind each dictionary entry to a notebook-level variable of the same name.
for key, value in sdata.items():
    globals()[key] = value

### B. Exploratory Data Analysis (EDA)
The `measures` and `enum_measures` dataframes contain the actual scouting data and are almost exactly the same. The difference is in how each table treats enumerated measure types. In `measures`, the enumerated value is stored in the *capability* column. In `enum_measures`, the enumerated value is appended to the *task*. The difference is readily apparent in the first three rows of the table.

In [15]:
measures.head()

Unnamed: 0,date,event,season,level,match,alliance,team,station,actor,task,measuretype,phase,attempt,reason,capability,successes,attempts,cycle_times,last_match,num_matches
0,2020-02-29T15:03:00,wasno,2020,qual,024-q,red,1294,1,robot,startingPosition,enum,auto,summary,na,Goal,0,0,0,9,12
1,2020-02-29T16:29:00,wasno,2020,qual,034-q,blue,1294,1,robot,startingPosition,enum,auto,summary,na,Goal,0,0,0,7,12
2,2020-03-01T09:48:00,wasno,2020,qual,062-q,red,1294,1,robot,startingPosition,enum,auto,summary,na,Cen,0,0,0,3,12
3,2020-02-29T16:29:00,wasno,2020,qual,034-q,blue,1294,1,robot,launchInner,count,auto,summary,na,,0,0,0,7,12
4,2020-02-29T16:29:00,wasno,2020,qual,034-q,blue,1294,1,robot,movedAuto,boolean,auto,summary,na,,1,1,0,7,12


In [16]:
enum_measures.head()

Unnamed: 0,date,event,season,level,match,alliance,team,station,actor,task,measuretype,phase,attempt,reason,capability,successes,attempts,cycle_times,last_match,num_matches
0,2020-02-29T15:03:00,wasno,2020,qual,024-q,red,1294,1,robot,startingPosition_Goal,enum,auto,summary,na,Goal,1,0,0,9,12
1,2020-02-29T16:29:00,wasno,2020,qual,034-q,blue,1294,1,robot,startingPosition_Goal,enum,auto,summary,na,Goal,1,0,0,7,12
2,2020-03-01T09:48:00,wasno,2020,qual,062-q,red,1294,1,robot,startingPosition_Cen,enum,auto,summary,na,Cen,1,0,0,3,12
3,2020-02-29T16:29:00,wasno,2020,qual,034-q,blue,1294,1,robot,launchInner,count,auto,summary,na,,0,0,0,7,12
4,2020-02-29T16:29:00,wasno,2020,qual,034-q,blue,1294,1,robot,movedAuto,boolean,auto,summary,na,,1,1,0,7,12
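
The relationship between the two tables can be reproduced with a few lines of pandas. This is only a sketch of how the difference could be expressed, not necessarily how the scouting system built `enum_measures`. It uses three rows and four columns copied from the `measures.head()` output, and the names `measures_sample` and `converted` are illustrative.

```python
import pandas as pd

# Three rows and four columns copied from measures.head() above.
measures_sample = pd.DataFrame({
    "task": ["startingPosition", "startingPosition", "launchInner"],
    "measuretype": ["enum", "enum", "count"],
    "capability": ["Goal", "Cen", None],
    "successes": [0, 0, 0]})

is_enum = measures_sample["measuretype"] == "enum"
converted = measures_sample.copy()

# Append the enumerated value to the task name, e.g. startingPosition_Goal.
converted.loc[is_enum, "task"] = (
    converted.loc[is_enum, "task"] + "_" + converted.loc[is_enum, "capability"])

# In the enum_measures rows shown above, enum rows also have successes set to 1.
converted.loc[is_enum, "successes"] = 1

print(converted[["task", "successes"]])
```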


The `schedule` dataframe is in narrow format, with one team on each row and six rows per match.

In [7]:
schedule.head()

Unnamed: 0,last_match,id,date,level,match,alliance,team,station,event_id
0,1,206211,2020-03-01T11:12:00,qual,074-q,blue,4681,3,25395
1,1,206208,2020-03-01T11:12:00,qual,074-q,red,8032,3,25395
2,1,206206,2020-03-01T11:12:00,qual,074-q,red,2910,1,25395
3,1,206207,2020-03-01T11:12:00,qual,074-q,red,4513,2,25395
4,1,206209,2020-03-01T11:12:00,qual,074-q,blue,4309,1,25395
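
For some charts it may be more convenient to have one row per match. The following sketch reshapes the five rows shown above from narrow to wide format with pandas. Because `schedule.head()` only shows five of the six rows for match 074-q, the result has five team columns; the names `narrow` and `wide` are illustrative.

```python
import pandas as pd

# The five rows shown by schedule.head() above (match 074-q).
narrow = pd.DataFrame({
    "match": ["074-q"] * 5,
    "alliance": ["blue", "red", "red", "red", "blue"],
    "team": [4681, 8032, 2910, 4513, 4309],
    "station": [3, 3, 1, 2, 1]})

# Move alliance and station from the row index into the columns.
wide = (narrow
        .set_index(["match", "alliance", "station"])["team"]
        .unstack(["alliance", "station"])
        .sort_index(axis="columns"))
print(wide)
```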


The `teams` dataframe is self-explanatory.

In [8]:
teams.head()

Unnamed: 0,id,name,long_name,city,state,region,year_founded,matches_played
0,1426,2903,NeoBots,Arlington,Washington,,2009,12
1,54311,7118,ScotBots,Shoreline,Washington,,2018,12
2,5098,4513,Circuit Breakers,Medical Lake,Washington,,2013,12
3,1421,2910,Jack in the Bot,Mill Creek,Washington,,2009,12
4,6,5588,Reign Robotics,Seattle,Washington,,2015,12


The `event` and `season` variables contain simple, descriptive strings.

In [18]:
print("event:\t", event)
print("season:\t", season)

event:	 wasno
season:	 2020


The `status` dataframe was used by the old scouting system to track its current state. It's not needed for this exercise.

In [19]:
status

Unnamed: 0,event_id,match,date
0,25395,074-q,2020-03-01T11:12:00


### C. Project: Make Some Charts with Bokeh
Do some more EDA and make some cool charts with Bokeh using the scouting data that we just loaded.
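
As one possible starting point, the sketch below defines a hypothetical helper, `successes_by_team()`, that totals the *successes* column per team for a single task and draws a bar chart. It assumes that summing *successes* across rows is meaningful for the chosen task, which you should verify during EDA. The `demo` dataframe contains made-up values purely so the example runs on its own.

```python
import bokeh.models as models
import bokeh.plotting as plotting
import pandas as pd


def successes_by_team(measures_df, task):
    """Return a bar chart of total successes per team for one task."""
    totals = (measures_df[measures_df["task"] == task]
              .groupby("team")["successes"].sum()
              .reset_index())
    # Categorical axes need string labels, so convert team numbers.
    totals["team"] = totals["team"].astype(str)
    chart = plotting.figure(
        title=f"Total successes: {task}",
        x_range=totals["team"].tolist(),
        x_axis_label="Team", y_axis_label="Successes",
        width=800, height=400)
    chart.vbar(x="team", top="successes", width=0.8,
               source=models.ColumnDataSource(totals))
    return chart


# Made-up rows for illustration only; pass the real `measures` dataframe instead.
demo = pd.DataFrame({
    "team": [1294, 1294, 2910, 2910],
    "task": ["launchInner", "launchInner", "launchInner", "movedAuto"],
    "successes": [2, 1, 4, 1]})
demo_chart = successes_by_team(demo, "launchInner")
# plotting.show(demo_chart)  # Uncomment in a notebook to display the chart.
```

With the scouting data loaded, calling `successes_by_team(measures, "launchInner")` and passing the result to `plotting.show()` should produce a chart for the real teams.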