# Bokeh Lecture - Interactive Data Visualization
## Based on Lecture12_Bokeh.pdf

**April 10, 2024**

## 1. Introduction to Bokeh

Bokeh creates interactive visualizations for display in a web browser. Unlike matplotlib and seaborn, which produce static graphics, Bokeh offers:
- Web-based visualizations (viewed locally or embedded in webpages)
- Highly interactive plots
- Ideal for exploratory data analysis and web distribution

In [130]:
# Set up the session by importing the modules needed
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os
from bokeh.plotting import figure, output_file, show, output_notebook
import bokeh.io

# Set working directory (adjust path as needed)
#os.chdir("/workspaces/College-folder/semester-1/Programming for Data Analytics/dataset/")

In [131]:
# Load Phoenix Park weather data
pp_weather = pd.read_csv("/workspaces/College-folder/semester-1/Programming for Data Analytics/dataset/phoenix_park_weather.csv")
pp_weather.date = pd.to_datetime(pp_weather.date, format='%d-%b-%y')
pp_weather.set_index(['date'], inplace=True)
pp_weather.sort_index(inplace=True)

print("Phoenix Park Weather Data:")
print(pp_weather.head())

Phoenix Park Weather Data:
            ind  rain  ind.1  maxt  ind.2  mint  gmin soil
date                                                      
2009-01-01    0   0.0      0   4.5      1  -0.7  -4.7     
2009-01-02    0   0.0      0   5.8      0   2.8   0.3     
2009-01-03    0   0.0      0   4.7      0   3.1   1.5     
2009-01-04    0   1.0      0   3.2      0   0.9  -2.1     
2009-01-05    0   0.0      0   5.4      1  -4.1  -6.8     
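The `format='%d-%b-%y'` argument used when loading the data tells pandas how the raw date strings are laid out. A minimal sketch with the standard library, assuming the CSV stores dates such as `01-Jan-09` (the raw file is not shown in the lecture, so `raw_date` is a hypothetical value):

```python
from datetime import datetime

# Hypothetical raw value; the CSV itself is not shown in the lecture
raw_date = "01-Jan-09"

# %d = day of month, %b = abbreviated month name, %y = two-digit year
parsed = datetime.strptime(raw_date, "%d-%b-%y")
print(parsed.date())  # 2009-01-01
```

The same format codes are understood by `pd.to_datetime`, which is why the index in the printed output starts at 2009-01-01.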


## 2. Basic Rules for Creating Plots with Bokeh

The basic steps for creating a plot with the `bokeh.plotting` module are:
1. Prepare some data
2. Tell Bokeh where to generate output
3. Call figure()
4. Add renderers (glyph methods)
5. Ask Bokeh to show() or save() the results
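The five steps above can be sketched end to end as follows. This is only an illustration: the temperatures are the first five `maxt` values from the printed table, the names `demo_plot` and `out_path` are made up for the example, and the output is written to a temporary directory with `save()` rather than opened with `show()`.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# Step 1: prepare some data (first five maxt values from the table above)
days = [0, 1, 2, 3, 4]
temps = [4.5, 5.8, 4.7, 3.2, 5.4]

# Step 2: tell Bokeh where to generate output (a temporary HTML file here)
out_path = os.path.join(tempfile.mkdtemp(), "five_step_demo.html")
output_file(out_path, title="Five-step demo")

# Step 3: call figure()
demo_plot = figure(title="Five-step demo",
                   x_axis_label="Day",
                   y_axis_label="Max temp (°C)")

# Step 4: add a renderer (a line glyph)
demo_plot.line(x=days, y=temps, legend_label="Max temp", line_width=2)

# Step 5: ask Bokeh to save the result; save() returns the file path
saved_path = save(demo_plot)
print(saved_path)
```

Replacing `save(demo_plot)` with `show(demo_plot)` would open the same file in a browser.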

## 3. Step 2: Output to HTML or Notebook

In [132]:
# Output to notebook
output_notebook()

# Output to HTML file
#output_file("max_temp.html")

## 4. Step 3: Call figure()

In [133]:
# Create a new plot with title and axis labels
max_temp_plot = figure(title="February and March 2011 maximum temperature in Phoenix Park", 
                       x_axis_label='Index', 
                       y_axis_label='Max temp (°C)')

## 5. Step 4: Add Renderers

In [134]:
# Use line() for our data with legend and line width
max_temp_plot.line(x=list(range(59)), 
                   y=pp_weather['2011-02':'2011-03'].maxt, 
                   legend_label="Max temp", 
                   line_width=1.5)
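The hard-coded `range(59)` in the cell above matches the number of days in February and March 2011. A small check with the standard library (the variable names are illustrative):

```python
import calendar

# monthrange() returns (weekday of first day, number of days in the month)
days_feb = calendar.monthrange(2011, 2)[1]
days_mar = calendar.monthrange(2011, 3)[1]

n_days = days_feb + days_mar
print(n_days)  # 59

# The x-values used for plotting are then simply 0 .. n_days - 1
x_values = list(range(n_days))
```

In the notebook, `range(len(pp_weather['2011-02':'2011-03']))` would avoid the hard-coded number, assuming the data has exactly one row per day.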

## 6. Step 5: Show or Save Results

In [135]:
# Display the results
show(max_temp_plot)

# Note: The legend slightly covers the final value, so we'll adjust y_range in the next example


In [136]:
# Improved version with y_range adjustment
#output_file("max_temp2.html")
output_notebook()

# Create plot with adjusted y_range
max_temp_plot = figure(y_range=[5, 18],
                       title="February and March 2011 maximum temperature in Phoenix Park",
                       x_axis_label='Index', 
                       y_axis_label='Max temp (°C)')

# Add line renderer
max_temp_plot.line(x=list(range(59)), 
                   y=pp_weather['2011-02':'2011-03'].maxt, 
                   legend_label="Max temp", 
                   line_width=1.5)

# Display results
show(max_temp_plot)


## 7. Multiple Lines and Glyphs

In [137]:
#output_file("max_min_temp.html")

# Create plot with specified tools
temp_plot = figure(tools="pan,box_zoom,reset,save",
                   title="February and March 2011 max and min temperature in Phoenix Park",
                   x_axis_label='Day number', 
                   y_axis_label='Temperature (°C)')

# Add two line glyph methods
temp_plot.line(x=list(range(59)), 
               y=pp_weather['2011-02':'2011-03'].maxt,
               legend_label="Max temp", 
               line_width=2.5)
temp_plot.line(x=list(range(59)), 
               y=pp_weather['2011-02':'2011-03'].mint,
               legend_label="Min temp", 
               line_width=2.5, 
               line_color='red')

# Add circles to show data points
temp_plot.circle(x=list(range(59)), 
                 y=pp_weather['2011-02':'2011-03'].maxt,
                 legend_label="Max temp", 
                 fill_color='white', 
                 size=6)
temp_plot.circle(x=list(range(59)), 
                 y=pp_weather['2011-02':'2011-03'].mint,
                 legend_label="Min temp", 
                 fill_color='red', 
                 size=6)

show(temp_plot)




## 8. ColumnDataSource

A `ColumnDataSource` wraps a pandas DataFrame (or a dictionary of columns) so that glyph methods can refer to columns by name.

In [138]:
from bokeh.models import ColumnDataSource

#output_file("max_min_temp_scatter.html")
source = ColumnDataSource(pp_weather)

In [139]:
# Plot max temp versus min temp
p = figure()
p.circle(x='mint', y='maxt', source=source, size=10, color='blue')

# Add titles and labels
p.title.text = 'Maximum temperature v minimum temperature at Phoenix Park'
p.xaxis.axis_label = 'Minimum temperature (°C)'
p.yaxis.axis_label = 'Maximum temperature (°C)'

show(p)




In [140]:
# Plot with dates on x-axis
#output_file("pandas_example_dates.html")
source = ColumnDataSource(pp_weather['2011-02':'2011-03'])

# Specify x_axis_type as 'datetime' for proper date formatting
p = figure(x_axis_type='datetime')
p.line(x='date', y='maxt', source=source, color='blue')

p.title.text = 'Maximum temperature at Phoenix Park in February and March 2011'
p.xaxis.axis_label = 'Dates'
p.yaxis.axis_label = 'Maximum temperature (°C)'

show(p)
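The reference `x='date'` in the cell above works even though `date` is the DataFrame index rather than a regular column: when a DataFrame is passed to `ColumnDataSource`, the index is added as a column under its own name. A minimal sketch with a three-row stand-in for the weather data (`demo_df` and `demo_source` are illustrative names):

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Three rows copied from the printed head of the weather data
demo_df = pd.DataFrame(
    {"maxt": [4.5, 5.8, 4.7], "mint": [-0.7, 2.8, 3.1]},
    index=pd.DatetimeIndex(["2009-01-01", "2009-01-02", "2009-01-03"], name="date"),
)

demo_source = ColumnDataSource(demo_df)

# The named index shows up alongside the ordinary columns
print(sorted(demo_source.data.keys()))  # ['date', 'maxt', 'mint']
```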


## 9. HoverTool

`HoverTool` displays a tooltip with the underlying data values when the cursor moves over a glyph, which makes data exploration easier.

In [141]:
from bokeh.models.tools import HoverTool

#output_file("pandas_example_dates.html")
source = ColumnDataSource(pp_weather['2011-02':'2011-03'])

p = figure(x_axis_type='datetime')

# Add lines and circles
p.line(x='date', y='maxt', source=source, color='red')
p.line(x='date', y='mint', source=source, color='blue')
p.circle(x='date', y='maxt', source=source, color='red', fill_color='white', size=8)
p.circle(x='date', y='mint', source=source, color='blue', fill_color='white', size=8)

p.title.text = 'Maximum and minimum temperature at Phoenix Park in February and March 2011'
p.xaxis.axis_label = 'Dates'
p.yaxis.axis_label = 'Temperature (°C)'

# Create and configure HoverTool
hover = HoverTool()
hover.tooltips=[
    ('Date', '@date{%Y-%m-%d}'),
    ('Minimum temperature', '@mint'),
    ('Maximum temperature', '@maxt'),
    ('Rainfall', '@rain')
]
hover.formatters = {'@date': 'datetime'}

p.add_tools(hover)
show(p)




## 10. Sizing Points Based on Variables

Points in a scatterplot can be sized according to a variable in the dataset by passing a column name to `size`.

In [142]:
#output_file("sized_scatter.html")

# Create rain_size column: offset by 5 so that days with no rain are still visible
pp_weather['rain_size'] = pp_weather['rain'] + 5
source = ColumnDataSource(pp_weather)

p = figure()
p.circle(x='mint', y='maxt', source=source, color='red', size='rain_size')

p.title.text = 'Maximum temperature v minimum temperature at Phoenix Park'
p.xaxis.axis_label = 'Minimum temperature (°C)'
p.yaxis.axis_label = 'Maximum temperature (°C)'

# Add HoverTool
hover = HoverTool()
hover.tooltips=[
    ('Date', '@date{%Y-%m-%d}'),
    ('Minimum temperature', '@mint'),
    ('Maximum temperature', '@maxt'),
    ('Rainfall', '@rain')
]
hover.formatters = {'@date': 'datetime'}
p.add_tools(hover)

show(p)
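Adding a constant, as in `rain + 5` above, keeps small values visible but lets large values grow without limit. One possible alternative, sketched here with made-up values and a hypothetical helper `scale_sizes`, is to map the variable linearly onto a fixed range of point sizes:

```python
def scale_sizes(values, min_size=5.0, max_size=25.0):
    """Map values linearly onto the interval [min_size, max_size]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: fall back to the smallest size
        return [min_size for _ in values]
    span = hi - lo
    return [min_size + (v - lo) / span * (max_size - min_size) for v in values]


# Made-up rainfall values, not taken from the Phoenix Park data
rain_values = [0.0, 1.0, 4.0, 10.0, 20.0]
sizes = scale_sizes(rain_values)
print(sizes)
```

The resulting list could be stored in a new DataFrame column and passed to `size` in the same way as `rain_size`.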




## 11. Categorical Data in Bokeh

Bokeh handles categorical data by taking the list of categories as the axis range. The bar charts below use team names as the categories.

In [143]:
# Load Premier League data
pl = pd.read_csv("pl_2seasons.csv")
pl.Date = pd.to_datetime(pl.Date, format='%d/%m/%Y')

# Filter for top 6 teams
pl_top6 = pl.loc[pl.HomeTeam.isin(['Arsenal', 'Chelsea', 'Liverpool', 'Man United', 'Man City', 'Tottenham'])]

print("Top 6 Teams Data:")
print(pl_top6.head())

Top 6 Teams Data:
      Season       Date    HomeTeam        AwayTeam  FTHG  FTAG FTR  HTHG  \
0   20172018 2017-08-11     Arsenal       Leicester     4     3   H     2   
2   20172018 2017-08-12     Chelsea         Burnley     2     3   A     0   
8   20172018 2017-08-13  Man United        West Ham     4     0   H     1   
13  20172018 2017-08-19   Liverpool  Crystal Palace     1     0   H     0   
18  20172018 2017-08-20   Tottenham         Chelsea     1     2   A     0   

    HTAG HTR  ... HST  AST  HF  AF  HC  AC  HY  AY  HR  AR  
0      2   D  ...  10    3   9  12   9   4   0   1   0   0  
2      3   A  ...   6    5  16  11   8   5   3   3   2   0  
8      0   H  ...   6    1  19   7  11   1   2   2   0   0  
13     0   D  ...  13    1  12  13   4   2   1   3   0   0  
18     1   A  ...   6    2  14  21  14   3   3   3   0   0  

[5 rows x 23 columns]


In [144]:
# Aggregate data by team
top6sums = pl_top6.groupby('HomeTeam')[['FTHG', 'FTAG', 'HF', 'AF']].sum()
print("Aggregated Data:")
print(top6sums)

Aggregated Data:
            FTHG  FTAG   HF   AF
HomeTeam                        
Arsenal       96    36  404  451
Chelsea       69    28  337  424
Liverpool    100    20  315  336
Man City     118    26  331  335
Man United    71    34  417  428
Tottenham     74    32  351  379


In [145]:
from bokeh.palettes import Spectral6
from bokeh.transform import factor_cmap

#output_file("bar_chart2.html")

source = ColumnDataSource(top6sums)
teams = source.data['HomeTeam'].tolist()
p = figure(x_range=teams)

# Create color map for bars
color_map = factor_cmap(field_name='HomeTeam', palette=Spectral6, factors=teams)
p.vbar(x='HomeTeam', top='FTHG', source=source, width=0.7, color=color_map)

p.title.text = 'Number of goals scored at home over two seasons by the top 6'
p.xaxis.axis_label = 'Team'
p.yaxis.axis_label = 'Number of goals'

show(p)
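The bars in the chart above appear in the order of the list passed as `x_range`. A minimal sketch of ordering the categories by value instead, using the home-goal totals from the aggregated table; the names `home_goals`, `ordered_teams`, and `sorted_plot` are illustrative and not part of the lecture code:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Home goals (FTHG) per team, copied from the aggregated table above
home_goals = {
    "Arsenal": 96, "Chelsea": 69, "Liverpool": 100,
    "Man City": 118, "Man United": 71, "Tottenham": 74,
}

# Sort the category names by goals scored, highest first
ordered_teams = sorted(home_goals, key=home_goals.get, reverse=True)

sorted_source = ColumnDataSource(
    data=dict(team=list(home_goals), goals=list(home_goals.values()))
)

# The order of x_range, not the order of the data, controls the bar order
sorted_plot = figure(x_range=ordered_teams, title="Home goals, sorted")
sorted_plot.vbar(x="team", top="goals", source=sorted_source, width=0.7)

print(ordered_teams)
```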


## 12. Stacked Bar Charts

In [146]:
#output_file("stacked_bar_chart.html")

source = ColumnDataSource(top6sums)
teams = source.data['HomeTeam'].tolist()
p = figure(x_range=teams)

# Create stacked bar chart
p.vbar_stack(stackers=['FTHG', 'FTAG'], 
             x='HomeTeam',
             legend_label=['Goals scored', 'Goals conceded'],
             source=source, 
             width=0.7, 
             color=['blue', 'red'])

p.title.text = 'Number of goals scored and conceded at home by the top 6'
p.xaxis.axis_label = 'Team'
p.yaxis.axis_label = 'Number of goals'

show(p)


## 13. Gridplot

`gridplot` arranges multiple plots in a grid layout, given as a list of rows.

In [147]:
from bokeh.plotting import gridplot
from bokeh.palettes import Set3_6

#output_file("grid_bar_chart.html")

source = ColumnDataSource(top6sums)
teams = source.data['HomeTeam'].tolist()

# Create first plot (goals scored)
left = figure(x_range=teams, title="Goals Scored at Home")
color_map = factor_cmap(field_name='HomeTeam', palette=Set3_6, factors=teams)
left.vbar(x='HomeTeam', top='FTHG', source=source, width=0.7, color=color_map)
left.xaxis.axis_label = 'Team'
left.yaxis.axis_label = 'Goals Scored'

# Create second plot (goals conceded)
right = figure(x_range=teams, title="Goals Conceded at Home")
right.vbar(x='HomeTeam', top='FTAG', source=source, width=0.7, color=color_map)
right.xaxis.axis_label = 'Team'
right.yaxis.axis_label = 'Goals Conceded'

# Combine plots in grid
p = gridplot([[left, right]])
show(p)


## 14. Example from Bokeh Quickstart

This example demonstrates linked brushing: because both plots share one `ColumnDataSource`, points selected in one plot are highlighted in the other.

In [148]:
# Prepare some data
N = 300
x = np.linspace(0, 4*np.pi, N)
y0 = np.sin(x)
y1 = np.cos(x)

#output_file("linked_brushing.html")

# Create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

TOOLS = "pan,wheel_zoom,box_zoom,reset,save,box_select,lasso_select"

# Create first plot
left = figure(tools=TOOLS, width=350, height=350, title=None)
left.circle('x', 'y0', source=source, size=5)

# Create second plot
right = figure(tools=TOOLS, width=350, height=350, title=None)
right.circle('x', 'y1', source=source, size=5)

# Put subplots in gridplot
p = gridplot([[left, right]])

show(p)
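The linking in the example above comes entirely from the shared data source: a selection is stored on the source, so every glyph drawn from it sees the same selected indices. A small sketch of that idea without opening a browser (the names `shared`, `left_r`, and `right_r` are illustrative, and `scatter()` is used here in place of `circle()`):

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# One source shared by both plots
shared = ColumnDataSource(data=dict(x=[0, 1, 2, 3],
                                    y0=[0.0, 1.0, 0.0, -1.0],
                                    y1=[1.0, 0.0, -1.0, 0.0]))

left_demo = figure(tools="box_select", width=350, height=350)
left_r = left_demo.scatter("x", "y0", source=shared, size=5)

right_demo = figure(tools="box_select", width=350, height=350)
right_r = right_demo.scatter("x", "y1", source=shared, size=5)

# Both renderers point at the very same source object
print(left_r.data_source is right_r.data_source)  # True

# A selection made with a tool is recorded here; setting it by hand
# has the same effect on both plots
shared.selected.indices = [1, 2]
```

If each plot were given its own `ColumnDataSource`, the selections would be independent.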




## 15. Additional Resources

- [Real Python Bokeh Tutorial](https://realpython.com/python-data-visualization-bokeh/)
- [GeeksforGeeks Bokeh Tutorial](https://www.geeksforgeeks.org/python-bokeh-tutorial-interactive-data-visualization-with-bokeh/)
- [Official Bokeh Documentation](https://docs.bokeh.org/)

## 16. Exercise

Create your own interactive Bokeh plot using the techniques learned in this lecture. Experiment with:
- Different glyph types (circle, line, bar, etc.)
- HoverTool with custom tooltips
- Color mapping and sizing based on data
- Multiple plots in grid layouts
- Interactive tools and linked brushing

In [149]:
# https://docs.bokeh.org/en/latest/docs/reference/models/glyphs/vbar.html
#output_file("simple_bars.html")

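# Assumption: team_goals is the aggregated top-6 table from section 11,
# with HomeTeam turned back into an ordinary column
team_goals = top6sums.reset_index()
teams = team_goals['HomeTeam'].tolist()
goals = team_goals['FTHG'].tolist()

p = figure(x_range=teams, title="Team Goals", background_fill_color="grey")
p.vbar(x=teams, top=goals, width=0.5, fill_color="red")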

show(p)


In [150]:
# Styled bar chart with a color gradient, value labels and hover tooltips
#output_file("simple_fancy.html")

# Assumption: team_stats is the aggregated top-6 table from section 11,
# with HomeTeam turned back into an ordinary column
team_stats = top6sums.reset_index()
teams = team_stats['HomeTeam'].tolist()
goals = team_stats['FTHG'].tolist()

p = figure(x_range=teams, title="🏆 Team Goals Performance", height=500, width=800)

# Gradient colors from green (fewest goals) to red (most goals),
# scaled to the observed range so that the bars actually differ
g_min, g_max = min(goals), max(goals)
colors = []
for goal in goals:
    frac = (goal - g_min) / (g_max - g_min) if g_max > g_min else 0.0
    red = int(255 * frac)
    green = int(255 * (1 - frac))
    colors.append(f"#{red:02x}{green:02x}00")

bars = p.vbar(x=teams, top=goals, width=0.7, color=colors,
              alpha=0.8, line_color='black', line_width=2)

# Add value labels above the bars, positioned by category name
label_offset = g_max * 0.05
p.text(x=teams, y=[g + label_offset for g in goals],
       text=[str(int(g)) for g in goals],
       text_align='center', text_baseline='bottom',
       text_font_size='12pt', text_color='black')

# Add a hover tool that applies to the bars only
hover = HoverTool(renderers=[bars])
hover.tooltips = [
    ("Team", "@x"),
    ("Goals", "@top")
]
p.add_tools(hover)

# Style
p.xaxis.major_label_orientation = 45
p.background_fill_color = "#f0f8ff"
p.border_fill_color = "white"
p.outline_line_color = "navy"

show(p)


In [151]:
# Aggregate home-match statistics per team

team_performance = pl_top6.groupby('HomeTeam').agg({
    'FTHG': 'sum',      # Goals scored at home
    'FTAG': 'sum',      # Goals conceded at home
    'HST': 'sum',       # Shots on target
    'HC': 'sum',        # Corners
    'HF': 'sum'         # Fouls
}).reset_index()

team_performance['Goal_Difference'] = team_performance['FTHG'] - team_performance['FTAG']

team_performance = team_performance.sort_values('FTHG', ascending=False)

top_6_teams = team_performance.head(6)
bottom_6_teams = team_performance.tail(6)

print("Top 6 Teams:")
print(top_6_teams[['HomeTeam', 'FTHG', 'FTAG', 'Goal_Difference']])
print("\nBottom 6 Teams:")
print(bottom_6_teams[['HomeTeam', 'FTHG', 'FTAG', 'Goal_Difference']])

Top 6 Teams:
     HomeTeam  FTHG  FTAG  Goal_Difference
3    Man City   118    26               92
2   Liverpool   100    20               80
0     Arsenal    96    36               60
5   Tottenham    74    32               42
4  Man United    71    34               37
1     Chelsea    69    28               41

Bottom 6 Teams:
     HomeTeam  FTHG  FTAG  Goal_Difference
3    Man City   118    26               92
2   Liverpool   100    20               80
0     Arsenal    96    36               60
5   Tottenham    74    32               42
4  Man United    71    34               37
1     Chelsea    69    28               41
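The two tables above are identical because the grouped frame contains only six rows, so `head(6)` and `tail(6)` both return the whole frame. A minimal sketch that reproduces the effect with the goal totals shown above and adds a guard; `six_teams` and `top_and_bottom` are hypothetical names used only for this illustration:

```python
import pandas as pd

# Six rows, already sorted by FTHG in descending order
six_teams = pd.DataFrame({
    "HomeTeam": ["Man City", "Liverpool", "Arsenal",
                 "Tottenham", "Man United", "Chelsea"],
    "FTHG": [118, 100, 96, 74, 71, 69],
})

# With only six rows, the "top 6" and "bottom 6" are the same rows
print(six_teams.head(6).equals(six_teams.tail(6)))  # True


def top_and_bottom(df, n=6):
    """Return the first and last n rows, refusing to let them overlap."""
    if len(df) < 2 * n:
        raise ValueError(f"need at least {2 * n} rows, got {len(df)}")
    return df.head(n), df.tail(n)
```

Calling `top_and_bottom(six_teams)` raises a `ValueError`, which flags the problem before any misleading table is printed.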


In [152]:
# First, let's check what teams we actually have
print("All teams in pl_top6 dataset:")
print(pl_top6['HomeTeam'].value_counts())
print(f"\nTotal unique teams: {pl_top6['HomeTeam'].nunique()}")

All teams in pl_top6 dataset:
HomeTeam
Arsenal       38
Chelsea       38
Man United    38
Liverpool     38
Tottenham     38
Man City      38
Name: count, dtype: int64

Total unique teams: 6


In [153]:
# pl_top6 only has 6 teams, so use the full dataset instead
# Load the full dataset (not just the top teams)
pl_full = pd.read_csv("pl_2seasons.csv")
pl_full.Date = pd.to_datetime(pl_full.Date, format='%d/%m/%Y')

print(f"Total teams in full dataset: {pl_full['HomeTeam'].nunique()}")
print("All teams:")
print(pl_full['HomeTeam'].value_counts())

Total teams in full dataset: 23
All teams:
HomeTeam
Arsenal           38
Brighton          38
Chelsea           38
Crystal Palace    38
Everton           38
Southampton       38
Watford           38
Man United        38
Newcastle         38
Bournemouth       38
Burnley           38
Liverpool         38
Leicester         38
Tottenham         38
Huddersfield      38
Man City          38
West Ham          38
Stoke             19
West Brom         19
Swansea           19
Fulham            19
Wolves            19
Cardiff           19
Name: count, dtype: int64


In [154]:
# Now use the FULL dataset for team performance
team_performance = pl_full.groupby('HomeTeam').agg({
    'FTHG': 'sum',      # Goals scored at home
    'FTAG': 'sum',      # Goals conceded at home
    'HST': 'sum',       # Shots on target
    'HC': 'sum',        # Corners
    'HF': 'sum'         # Fouls
}).reset_index()

team_performance['Goal_Difference'] = team_performance['FTHG'] - team_performance['FTAG']
team_performance = team_performance.sort_values('FTHG', ascending=False)

print(f"Total teams analyzed: {len(team_performance)}")

Total teams analyzed: 23


In [155]:
# Now get proper top 6 and bottom 6
top_6_teams = team_performance.head(6)
bottom_6_teams = team_performance.tail(6)

print("Top 6 Teams:")
print(top_6_teams[['HomeTeam', 'FTHG', 'FTAG', 'Goal_Difference']])
print("\nBottom 6 Teams:")
print(bottom_6_teams[['HomeTeam', 'FTHG', 'FTAG', 'Goal_Difference']])

Top 6 Teams:
      HomeTeam  FTHG  FTAG  Goal_Difference
12    Man City   118    26               92
11   Liverpool   100    20               80
0      Arsenal    96    36               60
18   Tottenham    74    32               42
13  Man United    71    34               37
5      Chelsea    69    28               41

Bottom 6 Teams:
        HomeTeam  FTHG  FTAG  Goal_Difference
9   Huddersfield    26    56              -30
8         Fulham    22    36              -14
4        Cardiff    21    38              -17
20     West Brom    21    29               -8
16         Stoke    20    30              -10
17       Swansea    17    24               -7


In [156]:
# Create the chart with proper teams
#output_file("top_bottom_proper.html")

all_teams = top_6_teams['HomeTeam'].tolist() + bottom_6_teams['HomeTeam'].tolist()
all_goals = top_6_teams['FTHG'].tolist() + bottom_6_teams['FTHG'].tolist()

p = figure(x_range=all_teams, 
           title="Top 6 vs Bottom 6 Teams - Goals Scored",
           height=400, width=700)

p.vbar(x=all_teams, top=all_goals, width=0.6, 
       color=['blue']*6 + ['red']*6)

p.xaxis.major_label_orientation = 45
show(p)


