# Bokeh Lecture - Interactive Data Visualization
## Based on Lecture12_Bokeh.pdf

**April 10, 2024**

## 1. Introduction to Bokeh

Bokeh creates interactive visualizations that are displayed in a web browser. Unlike Matplotlib and seaborn, which produce static graphics, Bokeh offers:
- Web-based output, viewed locally or embedded in a webpage
- Highly interactive plots (pan, zoom, hover, select)
- A good fit for exploratory data analysis and for distributing results on the web

In [28]:
# Set up the session by importing the modules needed
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os
from bokeh.plotting import figure, output_file, show, output_notebook
import bokeh.io

# Set working directory (adjust path as needed)
#os.chdir("/workspaces/College-folder/semester-1/Programming for Data Analytics/dataset/")

In [33]:
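# Load Phoenix Park weather data
pp_weather = pd.read_csv("/workspaces/College-folder/semester-1/Programming for Data Analytics/dataset/phoenix_park_weather.csv")

# The original run of this cell failed with:
#   AttributeError: 'DataFrame' object has no attribute 'date'
# so the file has no column named exactly 'date'.
# Assumption: the header differs only in case or surrounding whitespace,
# so normalize the column names; inspect pp_weather.columns if this still fails.
pp_weather.columns = pp_weather.columns.str.strip().str.lower()

# Parse the dates, then index and sort the rows by date
pp_weather['date'] = pd.to_datetime(pp_weather['date'], format='%d-%b-%y')
pp_weather.set_index('date', inplace=True)
pp_weather.sort_index(inplace=True)

print("Phoenix Park Weather Data:")
print(pp_weather.head())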
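
The date handling in this cell can be tried without the CSV file. The following is a minimal sketch, assuming a file with the columns date, maxt, mint and rain used later in this lecture; the three sample rows are invented for illustration.

```python
import io

import pandas as pd

# Invented sample rows; the column names follow those used in this lecture
csv_text = """date,maxt,mint,rain
02-Feb-11,9.1,3.2,0.4
01-Feb-11,8.5,2.0,1.2
01-Mar-11,10.3,4.1,0.0
"""

weather = pd.read_csv(io.StringIO(csv_text))

# %d-%b-%y matches day, abbreviated month name, two-digit year (e.g. 01-Feb-11)
weather["date"] = pd.to_datetime(weather["date"], format="%d-%b-%y")
weather = weather.set_index("date").sort_index()

# With a sorted DatetimeIndex, a partial date string selects a whole month
february = weather.loc["2011-02"]
feb_to_mar = weather.loc["2011-02":"2011-03"]
print(february)
```

Bracket access (`weather["date"]`) is used instead of attribute access because it works for any column name and fails with a clearer KeyError when the column is missing.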

## 2. Basic Rules for Creating Plots with Bokeh

The basic steps for creating a plot with the bokeh.plotting module are:
1. Prepare some data
2. Tell Bokeh where to generate output
3. Call figure()
4. Add renderers (glyph methods)
5. Ask Bokeh to show() or save() the results
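
The five steps above can be sketched end to end in a few lines. This is only an illustration: the data values, the title and the temporary output path are made up, and save() is used instead of show() so that no browser window is opened.

```python
import os
import tempfile

from bokeh.plotting import figure, output_file, save

# 1. Prepare some data (made-up values for illustration)
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# 2. Tell Bokeh where to generate output (a temporary HTML file here)
out_path = os.path.join(tempfile.mkdtemp(), "five_steps_demo.html")
output_file(out_path, title="Five steps demo")

# 3. Call figure()
p = figure(title="Five steps demo", x_axis_label="x", y_axis_label="y")

# 4. Add a renderer (a line glyph)
p.line(x, y, legend_label="Demo series", line_width=2)

# 5. Save the result; show(p) would also open it in a browser
saved_path = save(p)
```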

## 3. Step 2: Output to HTML or Notebook

In [None]:
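# Display plots inline in the notebook
output_notebook()

# Also write the plot to a standalone HTML file.
# With both targets active, show() displays inline and writes the file.
output_file("max_temp.html")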

## 4. Step 3: Call figure()

In [None]:
# Create a new plot with title and axis labels
max_temp_plot = figure(title="February and March 2011 maximum temperature in Phoenix Park", 
                       x_axis_label='Index', 
                       y_axis_label='Max temp (°C)')

## 5. Step 4: Add Renderers

In [None]:
# Add a line glyph with a legend entry and a line width.
# x is the day index: 28 days in February + 31 in March = 59 (assumes no missing days)
max_temp_plot.line(x=list(range(59)), 
                   y=pp_weather['2011-02':'2011-03'].maxt, 
                   legend_label="Max temp", 
                   line_width=1.5)

## 6. Step 5: Show or Save Results

In [None]:
# Display the results
show(max_temp_plot)

# Note: The legend slightly covers the final value, so we'll adjust y_range in the next example

In [None]:
# Improved version with y_range adjustment
output_file("max_temp2.html")
output_notebook()

# Create plot with adjusted y_range
max_temp_plot = figure(y_range=[5, 18],
                       title="February and March 2011 maximum temperature in Phoenix Park",
                       x_axis_label='Index', 
                       y_axis_label='Max temp (°C)')

# Add line renderer
max_temp_plot.line(x=list(range(59)), 
                   y=pp_weather['2011-02':'2011-03'].maxt, 
                   legend_label="Max temp", 
                   line_width=1.5)

# Display results
show(max_temp_plot)
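
Hard-coding y_range=[5, 18] works for this particular period. Two more general options are sketched below: deriving the range from the data with some headroom, and moving the legend to another corner. The temperatures and the headroom value are assumptions made for illustration.

```python
from bokeh.plotting import figure

# Made-up temperatures standing in for the maxt column
temps = [7.2, 9.8, 12.5, 10.1, 15.4, 13.0]
days = list(range(len(temps)))

# Option 1: derive y_range from the data, leaving headroom for the legend
headroom = 3  # assumed padding in °C, tune by eye
p = figure(y_range=(min(temps) - 1, max(temps) + headroom),
           x_axis_label="Index", y_axis_label="Max temp (°C)")
p.line(x=days, y=temps, legend_label="Max temp", line_width=1.5)

# Option 2: move the legend to a corner the line does not reach
p.legend.location = "top_left"
```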

## 7. Multiple Lines and Glyphs

In [None]:
output_file("max_min_temp.html")

# Create plot with specified tools
temp_plot = figure(tools="pan,box_zoom,reset,save",
                   title="February and March 2011 max and min temperature in Phoenix Park",
                   x_axis_label='Day number', 
                   y_axis_label='Temperature (°C)')

# Add two line glyph methods
temp_plot.line(x=list(range(59)), 
               y=pp_weather['2011-02':'2011-03'].maxt,
               legend_label="Max temp", 
               line_width=2.5)
temp_plot.line(x=list(range(59)), 
               y=pp_weather['2011-02':'2011-03'].mint,
               legend_label="Min temp", 
               line_width=2.5, 
               line_color='red')

# Add circular markers to show the individual data points.
# scatter() draws circles by default; circle() with size= is deprecated since Bokeh 3.4
temp_plot.scatter(x=list(range(59)),
                  y=pp_weather['2011-02':'2011-03'].maxt,
                  legend_label="Max temp",
                  fill_color='white',
                  size=6)
temp_plot.scatter(x=list(range(59)),
                  y=pp_weather['2011-02':'2011-03'].mint,
                  legend_label="Min temp",
                  fill_color='red',
                  size=6)

show(temp_plot)
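
When several series get the same pair of glyphs, a loop keeps the calls consistent. The sketch below assumes made-up values in place of the maxt and mint columns. It also shows that glyphs given the same legend_label share a single legend entry.

```python
from bokeh.plotting import figure

# Made-up values standing in for the maxt and mint columns
series = {
    "Max temp": ([9.1, 10.4, 8.7, 11.2], "blue"),
    "Min temp": ([2.3, 3.1, 1.8, 4.0], "red"),
}

p = figure(tools="pan,box_zoom,reset,save",
           x_axis_label="Day number", y_axis_label="Temperature (°C)")

for label, (values, colour) in series.items():
    days = list(range(len(values)))
    # Same legend_label on both glyphs, so they share one legend entry
    p.line(x=days, y=values, legend_label=label, line_width=2.5, line_color=colour)
    p.scatter(x=days, y=values, legend_label=label, size=6,
              line_color=colour, fill_color="white")
```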

## 8. ColumnDataSource

A ColumnDataSource wraps a pandas DataFrame (or a dictionary of columns) so that glyph methods can refer to columns by name instead of being passed the data directly.

In [None]:
from bokeh.models import ColumnDataSource

output_file("max_min_temp_scatter.html")
source = ColumnDataSource(pp_weather)
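
To see which names a glyph method can refer to, inspect the source's column_names. The sketch below uses two invented rows with the same column names as the weather data.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Invented rows with the same column names as the weather data
df = pd.DataFrame(
    {"maxt": [9.1, 10.4], "mint": [2.3, 3.1], "rain": [0.4, 0.0]},
    index=pd.DatetimeIndex(["2011-02-01", "2011-02-02"], name="date"),
)

source = ColumnDataSource(df)

# The named index becomes a column too, which is why x='date' works in glyph calls
print(sorted(source.column_names))
```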

In [None]:
# Plot max temp versus min temp
p = figure()
# scatter() draws circle markers by default
p.scatter(x='mint', y='maxt', source=source, size=10, color='blue')

# Add titles and labels
p.title.text = 'Maximum temperature v minimum temperature at Phoenix Park'
p.xaxis.axis_label = 'Minimum temperature (°C)'
p.yaxis.axis_label = 'Maximum temperature (°C)'

show(p)

In [None]:
# Plot with dates on x-axis
output_file("pandas_example_dates.html")
source = ColumnDataSource(pp_weather['2011-02':'2011-03'])

# Specify x_axis_type as 'datetime' for proper date formatting
p = figure(x_axis_type='datetime')
p.line(x='date', y='maxt', source=source, color='blue')

p.title.text = 'Maximum temperature at Phoenix Park in February and March 2011'
p.xaxis.axis_label = 'Dates'
p.yaxis.axis_label = 'Maximum temperature (°C)'

show(p)

## 9. HoverTool

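HoverTool displays a tooltip with selected column values when the cursor passes over a glyph. Each tooltip entry pairs a label with a field reference such as @maxt, optionally followed by a format in braces.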

In [None]:
from bokeh.models.tools import HoverTool

# Use a separate file name so the previous example's output is not overwritten
output_file("max_min_temp_hover.html")
source = ColumnDataSource(pp_weather['2011-02':'2011-03'])

p = figure(x_axis_type='datetime')

# Add lines and circular markers (scatter() draws circles by default)
p.line(x='date', y='maxt', source=source, color='red')
p.line(x='date', y='mint', source=source, color='blue')
p.scatter(x='date', y='maxt', source=source, color='red', fill_color='white', size=8)
p.scatter(x='date', y='mint', source=source, color='blue', fill_color='white', size=8)

p.title.text = 'Maximum and minimum temperature at Phoenix Park in February and March 2011'
p.xaxis.axis_label = 'Dates'
p.yaxis.axis_label = 'Temperature (°C)'

# Create and configure HoverTool
hover = HoverTool()
hover.tooltips=[
    ('Date', '@date{%Y-%m-%d}'),
    ('Minimum temperature', '@mint'),
    ('Maximum temperature', '@maxt'),
    ('Rainfall', '@rain')
]
hover.formatters = {'@date': 'datetime'}

p.add_tools(hover)
show(p)
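
The same tool can be configured in the constructor. The sketch below uses invented rows and also shows three options that the lecture example does not use: a numeric format, restricting the tooltip to one renderer, and vline mode.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Invented rows with the lecture's column names
df = pd.DataFrame(
    {"maxt": [9.12, 10.47], "mint": [2.3, 3.1]},
    index=pd.DatetimeIndex(["2011-02-01", "2011-02-02"], name="date"),
)
source = ColumnDataSource(df)

p = figure(x_axis_type="datetime")
line = p.line(x="date", y="maxt", source=source)

hover = HoverTool(
    tooltips=[
        ("Date", "@date{%Y-%m-%d}"),            # strftime-style format, needs the formatter below
        ("Maximum temperature", "@maxt{0.0}"),  # numeric format: one decimal place
    ],
    formatters={"@date": "datetime"},
    renderers=[line],   # restrict the tooltip to this glyph
    mode="vline",       # trigger on x position rather than an exact cursor hit
)
p.add_tools(hover)
```

Without the formatters entry, the date field would be shown as a raw number (milliseconds since the epoch) rather than a formatted date.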

## 10. Sizing Points Based on Variables

The size of each point in a scatter plot can be driven by a column in the data source: pass the column name as the size argument.

In [None]:
output_file("sized_scatter.html")

# Offset rainfall by 5 so that dry days (rain = 0) are still drawn as visible points
pp_weather['rain_size'] = pp_weather['rain'] + 5
source = ColumnDataSource(pp_weather)

p = figure()
# Passing a column name as size draws each point at that row's rain_size
p.scatter(x='mint', y='maxt', source=source, color='red', size='rain_size')

p.title.text = 'Maximum temperature v minimum temperature at Phoenix Park'
p.xaxis.axis_label = 'Minimum temperature (°C)'
p.yaxis.axis_label = 'Maximum temperature (°C)'

# Add HoverTool
hover = HoverTool()
hover.tooltips=[
    ('Date', '@date{%Y-%m-%d}'),
    ('Minimum temperature', '@mint'),
    ('Maximum temperature', '@maxt'),
    ('Rainfall', '@rain')
]
hover.formatters = {'@date': 'datetime'}
p.add_tools(hover)

show(p)
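
Adding a constant works while rainfall values stay small, but a very wet day then produces a very large marker. One alternative, which is not part of the lecture, is to rescale the variable into a fixed size range. The sketch below does this with numpy.interp on invented data; the size bounds are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Invented rows with the lecture's column names
df = pd.DataFrame({"mint": [1.0, 3.5, 5.2, 2.1],
                   "maxt": [8.0, 11.5, 14.0, 9.3],
                   "rain": [0.0, 2.5, 30.0, 7.5]})

# Min-max scale rainfall into a fixed marker-size range (screen pixels)
MIN_SIZE, MAX_SIZE = 5, 25   # assumed bounds, chosen for illustration
rain = df["rain"]
df["rain_size"] = np.interp(rain, (rain.min(), rain.max()), (MIN_SIZE, MAX_SIZE))

source = ColumnDataSource(df)
p = figure()
p.scatter(x="mint", y="maxt", source=source, color="red", size="rain_size")
```

This scaling needs at least two distinct rainfall values; if every value were equal, the input range would collapse to a single point.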

## 11. Categorical Data in Bokeh

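Bokeh handles categorical data by taking a list of category names (factors) as an axis range. The example below draws a bar chart of Premier League goals for six teams.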

In [None]:
# Load Premier League data
pl = pd.read_csv("pl_2seasons.csv")
pl['Date'] = pd.to_datetime(pl['Date'], format='%d/%m/%Y')

# Filter for top 6 teams
pl_top6 = pl.loc[pl.HomeTeam.isin(['Arsenal', 'Chelsea', 'Liverpool', 'Man United', 'Man City', 'Tottenham'])]

print("Top 6 Teams Data:")
print(pl_top6.head())

In [None]:
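# Sum the home-match statistics for each team.
# Assumed football-data.co.uk naming: FTHG/FTAG = full-time home/away goals, HF/AF = home/away fouls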
top6sums = pl_top6.groupby('HomeTeam')[['FTHG', 'FTAG', 'HF', 'AF']].sum()
print("Aggregated Data:")
print(top6sums)

In [None]:
from bokeh.palettes import Spectral6
from bokeh.transform import factor_cmap

output_file("bar_chart2.html")

source = ColumnDataSource(top6sums)
teams = source.data['HomeTeam'].tolist()
p = figure(x_range=teams)

# Create color map for bars
color_map = factor_cmap(field_name='HomeTeam', palette=Spectral6, factors=teams)
p.vbar(x='HomeTeam', top='FTHG', source=source, width=0.7, color=color_map)

p.title.text = 'Number of goals scored at home over two seasons by the top 6'
p.xaxis.axis_label = 'Team'
p.yaxis.axis_label = 'Number of goals'

show(p)
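
The order of the bars follows the order of the factors passed as x_range. The sketch below sorts the teams by goals before building the plot. The goal totals are invented and only three teams are used.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Invented goal totals, indexed by team like top6sums
sums = pd.DataFrame({"FTHG": [40, 55, 48]},
                    index=pd.Index(["Arsenal", "Man City", "Liverpool"], name="HomeTeam"))

# Sort so the factors, and therefore the bars, run from most to fewest goals
sums = sums.sort_values("FTHG", ascending=False)
teams = sums.index.tolist()

source = ColumnDataSource(sums)
p = figure(x_range=teams)   # a list of strings makes the x-axis categorical

# factor_cmap pairs each factor with the palette colour at the same position
color_map = factor_cmap(field_name="HomeTeam", palette=Spectral6, factors=teams)
p.vbar(x="HomeTeam", top="FTHG", source=source, width=0.7, color=color_map)
p.y_range.start = 0         # bars start from zero with no gap below them
```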

## 12. Stacked Bar Charts

In [None]:
output_file("stacked_bar_chart.html")

source = ColumnDataSource(top6sums)
teams = source.data['HomeTeam'].tolist()
p = figure(x_range=teams)

# Create stacked bar chart
p.vbar_stack(stackers=['FTHG', 'FTAG'], 
             x='HomeTeam',
             legend_label=['Goals scored', 'Goals conceded'],
             source=source, 
             width=0.7, 
             color=['blue', 'red'])

p.title.text = 'Number of goals scored and conceded at home by the top 6'
p.xaxis.axis_label = 'Team'
p.yaxis.axis_label = 'Number of goals'

show(p)
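
In a stacked bar chart, each layer starts where the previous one ends. The sketch below, on invented totals, computes those positions by hand with a cumulative sum to show what vbar_stack draws, then makes the equivalent call.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Invented totals, indexed by team like top6sums
sums = pd.DataFrame({"FTHG": [40, 55], "FTAG": [20, 15]},
                    index=pd.Index(["Arsenal", "Man City"], name="HomeTeam"))
stackers = ["FTHG", "FTAG"]

# Each layer sits on the running total of the layers before it
tops = sums[stackers].cumsum(axis=1)
bottoms = tops - sums[stackers]
print(pd.concat({"bottom": bottoms, "top": tops}, axis=1))

p = figure(x_range=sums.index.tolist())
renderers = p.vbar_stack(stackers=stackers, x="HomeTeam", source=ColumnDataSource(sums),
                         width=0.7, color=["blue", "red"],
                         legend_label=["Goals scored", "Goals conceded"])
```

vbar_stack returns one renderer per stacker, which is why color and legend_label take lists of the same length as stackers.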

## 13. Gridplot

gridplot() arranges several figures in a grid; each inner list passed to it is one row of plots.

In [None]:
from bokeh.layouts import gridplot  # gridplot is defined in bokeh.layouts
from bokeh.palettes import Set3_6

output_file("grid_bar_chart.html")

source = ColumnDataSource(top6sums)
teams = source.data['HomeTeam'].tolist()

# Create first plot (goals scored)
left = figure(x_range=teams, title="Goals Scored at Home")
color_map = factor_cmap(field_name='HomeTeam', palette=Set3_6, factors=teams)
left.vbar(x='HomeTeam', top='FTHG', source=source, width=0.7, color=color_map)
left.xaxis.axis_label = 'Team'
left.yaxis.axis_label = 'Goals Scored'

# Create second plot (goals conceded)
right = figure(x_range=teams, title="Goals Conceded at Home")
right.vbar(x='HomeTeam', top='FTAG', source=source, width=0.7, color=color_map)
right.xaxis.axis_label = 'Team'
right.yaxis.axis_label = 'Goals Conceded'

# Combine plots in grid
p = gridplot([[left, right]])
show(p)
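
Side-by-side bar charts are easier to compare when they share axes. The sketch below uses invented values. It reuses the first figure's ranges in the second and passes gridplot a flat list with ncols instead of nested rows.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

teams = ["Arsenal", "Chelsea", "Liverpool"]   # subset of the lecture's teams
scored = [40, 35, 48]                         # invented values
conceded = [20, 22, 15]                       # invented values

left = figure(x_range=teams, width=350, height=300, title="Goals Scored at Home")
left.vbar(x=teams, top=scored, width=0.7)

# Reusing left's ranges keeps both panels on the same categories and scale
right = figure(x_range=left.x_range, y_range=left.y_range,
               width=350, height=300, title="Goals Conceded at Home")
right.vbar(x=teams, top=conceded, width=0.7)

# A flat list plus ncols is an alternative to nested lists of rows
grid = gridplot([left, right], ncols=2)
```

Because the range objects are shared, panning or zooming one panel moves the other as well.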

## 14. Example from Bokeh Quickstart

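In linked brushing, a selection made in one plot is highlighted in the others. Here both plots share one ColumnDataSource, so points chosen with box_select or lasso_select in either plot are selected in both.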

In [None]:
# Prepare some data
N = 300
x = np.linspace(0, 4*np.pi, N)
y0 = np.sin(x)
y1 = np.cos(x)

output_file("linked_brushing.html")

# Create a column data source for the plots to share
source = ColumnDataSource(data=dict(x=x, y0=y0, y1=y1))

TOOLS = "pan,wheel_zoom,box_zoom,reset,save,box_select,lasso_select"

# Create first plot (scatter() draws circle markers by default)
left = figure(tools=TOOLS, width=350, height=350, title=None)
left.scatter('x', 'y0', source=source)

# Create second plot
right = figure(tools=TOOLS, width=350, height=350, title=None)
right.scatter('x', 'y1', source=source)

# Put subplots in gridplot
p = gridplot([[left, right]])

show(p)
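
The link comes from the shared source rather than from gridplot. The sketch below checks this on a smaller data set. It sets the selection in code instead of with the mouse, which is an assumption made so the example runs without a browser.

```python
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

x = np.linspace(0, 4 * np.pi, 50)
source = ColumnDataSource(data=dict(x=x, y0=np.sin(x), y1=np.cos(x)))

left = figure(tools="box_select,reset", width=350, height=350)
left_r = left.scatter("x", "y0", source=source)

right = figure(tools="box_select,reset", width=350, height=350)
right_r = right.scatter("x", "y1", source=source)

# Both renderers read from the same source object...
shared = left_r.data_source is right_r.data_source

# ...so a selection, set here in code instead of with the mouse, applies to both plots
source.selected.indices = [0, 1, 2]
```

If each plot were given its own ColumnDataSource, the selections would be independent even with identical data.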

## 15. Additional Resources

- [Real Python Bokeh Tutorial](https://realpython.com/python-data-visualization-bokeh/)
- [GeeksforGeeks Bokeh Tutorial](https://www.geeksforgeeks.org/python-bokeh-tutorial-interactive-data-visualization-with-bokeh/)
- [Official Bokeh Documentation](https://docs.bokeh.org/)

## 16. Exercise

Create your own interactive Bokeh plot using the techniques learned in this lecture. Experiment with:
- Different glyph types (circle, line, bar, etc.)
- HoverTool with custom tooltips
- Color mapping and sizing based on data
- Multiple plots in grid layouts
- Interactive tools and linked brushing