
# Bokeh for Time Series Analysis
<hr style="border: 2px solid black;">


<img src="./images/bokeh.png" alt="bokeh Logo" width="1000"/>
<hr style="border: 2px solid black;">

<img src="./images/bokeh_at_ag_glance.png" alt="Bokeh at a glance" width="1000"/>
<hr style="border: 2px solid black;">
**Introduction to Bokeh**

Bokeh is an interactive visualization library for Python that targets modern web browsers for presentation.
Unlike Matplotlib, which is primarily designed for static plots, Bokeh excels at creating
interactive plots and dashboards. It can handle large datasets and streaming data,
making it suitable for real-time applications.

**Key Features of Bokeh:**

* **Interactivity:** Built-in support for zooming, panning, hovering, and other interactive tools.
* **Web-Focused:** Generates HTML and JavaScript, making it easy to embed plots in web pages.
* **High Performance:** Can handle large datasets efficiently.
* **Versatility:** Supports a wide range of plot types (lines, bars, scatter plots, etc.).

<hr style="border: 2px solid black;">


**Documentation:**

For comprehensive documentation, please refer to the official Bokeh website: [https://docs.bokeh.org/en/latest/](https://docs.bokeh.org/en/latest/)


<hr style="border: 2px solid black;">


**Lab Exercise:**

Your task is to recreate the time series analysis lab we previously conducted using Pandas,
Matplotlib, and Seaborn, but this time, utilize the Bokeh library for visualization.
This will involve:

1.  Loading and preprocessing the "AirPassengersDates.csv" dataset.
2.  Creating interactive Bokeh plots for:
    * Time series line plots
    * Bar plots of aggregated data
    * Visualizing mean and standard deviation
    * Outlier detection
    * Resampling (upsampling and downsampling)
    * Lag analysis
    * Autocorrelation

Pay close attention to Bokeh's features for interactivity (tools, hover effects) and
its handling of data sources. Aim to replicate the insights and visualizations
from the previous lab while leveraging Bokeh's strengths.

Good luck!
<hr style="border: 2px solid black;">

In [1]:
import sys
!{sys.executable} -m pip install bokeh

import pandas as pd
import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool, DatetimeTickFormatter, RangeTool, Band
from bokeh.layouts import column, row
from bokeh.io import output_file


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49m/opt/homebrew/Cellar/jupyterlab/4.4.2_1/libexec/bin/python -m pip install --upgrade pip[0m


1. Loading and preprocessing the "AirPassengersDates.csv" dataset.

In [2]:
output_notebook()

df = pd.read_csv('datasets/AirPassengersDates.csv')

df['Date'] = pd.to_datetime(df['Date'])

print("Missing values in dataset:")
print(df.isnull().sum())

print("\nFirst few rows of the dataset:")
print(df.head())

Missing values in dataset:
Date           0
#Passengers    0
dtype: int64

First few rows of the dataset:
        Date  #Passengers
0 1949-01-12          112
1 1949-02-24          118
2 1949-03-22          132
3 1949-04-05          129
4 1949-05-24          121
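
The day of the month varies from row to row in the output above, so it can be worth checking that the series is in time order and has exactly one observation per calendar month before plotting or computing lags. A minimal sketch of such a check follows. It uses a small hypothetical frame instead of the real CSV, and the names `sample`, `has_duplicates`, and `missing_months` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in rows (deliberately out of order); the lab itself
# reads 'datasets/AirPassengersDates.csv'.
sample = pd.DataFrame({
    'Date': ['1949-03-22', '1949-01-12', '1949-02-24', '1949-04-05'],
    '#Passengers': [132, 112, 118, 129],
})
sample['Date'] = pd.to_datetime(sample['Date'])

# Sort chronologically so line plots and shift()-based lags follow time order.
sample = sample.sort_values('Date').reset_index(drop=True)

# Collapse each date to its calendar month and look for repeated months.
months = sample['Date'].dt.to_period('M')
has_duplicates = bool(months.duplicated().any())

# Compare against the full run of months between the first and last row.
expected = pd.period_range(months.min(), months.max(), freq='M')
missing_months = expected.difference(pd.PeriodIndex(months))

print('Duplicate months:', has_duplicates)
print('Missing months:', list(missing_months))
```

If either check reports a problem on the real data, resampling or reindexing to a regular monthly frequency before the later steps would be one way to handle it.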


2.  Creating interactive Bokeh plots for:
    * Time series line plots

In [3]:
source = ColumnDataSource(df)

p = figure(title='Air Passengers Over Time',
           x_axis_type='datetime',
           x_axis_label='Date',
           y_axis_label='Passenger Count',
           width=800,
           height=400,
           tools=['box_select', 'lasso_select', 'wheel_zoom', 'pan', 'reset', 'save'])

p.line(x='Date', y='#Passengers', source=source, line_width=2)

hover = HoverTool(tooltips=[
    ('Date', '@Date{%F}'),
    ('Passengers', '@{#Passengers}')
], formatters={'@Date': 'datetime'})
p.add_tools(hover)

show(p)
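
`RangeTool` is imported in the first cell but not used in the plot above. The sketch below shows one way it could be wired up: a small overview plot whose draggable overlay controls the x-range of the main plot. The data is a hypothetical monthly series, and the names `demo`, `main`, and `overview` are placeholders rather than part of the lab solution.

```python
import pandas as pd
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, RangeTool
from bokeh.plotting import figure

# Hypothetical monthly series standing in for the AirPassengers data.
dates = pd.date_range('1949-01-01', periods=36, freq='MS')
demo = pd.DataFrame({'Date': dates, 'Passengers': range(100, 136)})
demo_source = ColumnDataSource(demo)

# Main plot, initially zoomed to the first year. An explicit x_range gives
# the RangeTool a concrete range object to drive.
main = figure(x_axis_type='datetime', width=800, height=300,
              x_range=(dates[0].to_pydatetime(), dates[11].to_pydatetime()),
              tools='xpan', toolbar_location=None)
main.line(x='Date', y='Passengers', source=demo_source, line_width=2)

# Overview plot showing the whole series underneath the main plot.
overview = figure(x_axis_type='datetime', width=800, height=130,
                  y_range=main.y_range, y_axis_type=None,
                  tools='', toolbar_location=None)
overview.line(x='Date', y='Passengers', source=demo_source)

# The RangeTool overlay on the overview edits main.x_range when dragged.
range_tool = RangeTool(x_range=main.x_range)
range_tool.overlay.fill_color = 'navy'
range_tool.overlay.fill_alpha = 0.2
overview.add_tools(range_tool)

layout = column(main, overview)
# In a notebook, call show(layout) after output_notebook().
```

Both plots share one `ColumnDataSource`, so the data is sent to the browser only once.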

* Bar plots of aggregated data

In [4]:
df['Month'] = df['Date'].dt.month_name()
df['Year'] = df['Date'].dt.year

month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Average across all years, then reindex into calendar order. This keeps
# 'Month' as plain strings, matching the categorical x_range used below.
monthly_data = (df.groupby('Month')['#Passengers'].mean()
                  .reindex(month_order)
                  .rename_axis('Month')
                  .reset_index())

source = ColumnDataSource(monthly_data)

p = figure(title='Average Passengers by Month (All Years)',
           x_range=month_order,
           x_axis_label='Month',
           y_axis_label='Average Passenger Count',
           width=800,
           height=400)

p.vbar(x='Month', top='#Passengers', source=source, width=0.9)

hover = HoverTool(tooltips=[
    ('Month', '@Month'),
    ('Average Passengers', '@{#Passengers}{0.0}')
])
p.add_tools(hover)

show(p)

* Visualizing mean and standard deviation

In [5]:
window_size = 12  # 12 months for annual trend
df['Rolling_Mean'] = df['#Passengers'].rolling(window=window_size).mean()
df['Rolling_Std'] = df['#Passengers'].rolling(window=window_size).std()
df['Upper_Band'] = df['Rolling_Mean'] + df['Rolling_Std']
df['Lower_Band'] = df['Rolling_Mean'] - df['Rolling_Std']

df_clean = df.dropna()

source = ColumnDataSource(df_clean)

p = figure(title='Air Passengers with Rolling Mean and Standard Deviation',
           x_axis_type='datetime',
           x_axis_label='Date',
           y_axis_label='Passenger Count',
           width=800,
           height=400)

p.line(x='Date', y='#Passengers', source=source, line_width=2, legend_label='Passengers')

p.line(x='Date', y='Rolling_Mean', source=source, line_width=2, color='red', legend_label='Rolling Mean')

band = Band(base='Date', lower='Lower_Band', upper='Upper_Band', source=source,
            level='underlay', fill_alpha=0.2, fill_color='red', line_width=1, line_color='red')
p.add_layout(band)

hover = HoverTool(tooltips=[
    ('Date', '@Date{%F}'),
    ('Passengers', '@{#Passengers}'),
    ('Mean', '@Rolling_Mean{0.0}'),
    ('Std Dev', '@Rolling_Std{0.0}')
], formatters={'@Date': 'datetime'})
p.add_tools(hover)

p.legend.location = "top_left"
p.legend.click_policy = "hide"

show(p)
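
The `dropna()` call above is needed because a rolling window yields no value until it has seen `window_size` observations. The short sketch below illustrates this on a made-up six-point series (not the lab data), along with the `min_periods` argument that can be used to keep the early rows.

```python
import pandas as pd

# Made-up values chosen so the results are easy to verify by hand.
values = pd.Series([10, 12, 14, 16, 18, 20])
window = 3

rolling_mean = values.rolling(window=window).mean()
rolling_std = values.rolling(window=window).std()  # sample std (ddof=1)

# The first window - 1 entries are NaN because the window is not yet full.
n_missing = int(rolling_mean.isna().sum())

# min_periods=1 emits a value as soon as one observation is available,
# at the cost of averaging over fewer points at the start.
partial_mean = values.rolling(window=window, min_periods=1).mean()

print(rolling_mean.tolist())
print(rolling_std.tolist())
print(partial_mean.tolist())
```

With a 12-month window, the same logic means the first 11 rows have no rolling mean or band.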

* Outlier detection

In [6]:
df['Is_Outlier'] = (df['#Passengers'] > df['Upper_Band']) | (df['#Passengers'] < df['Lower_Band'])

df_normal = df[~df['Is_Outlier']].dropna()
df_outliers = df[df['Is_Outlier']].dropna()

source_normal = ColumnDataSource(df_normal)
source_outliers = ColumnDataSource(df_outliers)

p = figure(title='Air Passengers with Outliers Highlighted',
           x_axis_type='datetime',
           x_axis_label='Date',
           y_axis_label='Passenger Count',
           width=800,
           height=400)

p.line(x='Date', y='#Passengers', source=ColumnDataSource(df), line_width=2)

# scatter() is used instead of circle(size=...), which Bokeh 3.4+ deprecates.
p.scatter(x='Date', y='#Passengers', source=source_normal, size=6, color='blue', legend_label='Normal')
p.scatter(x='Date', y='#Passengers', source=source_outliers, size=8, color='red', legend_label='Outliers')

hover = HoverTool(tooltips=[
    ('Date', '@Date{%F}'),
    ('Passengers', '@{#Passengers}'),
    ('Mean', '@Rolling_Mean{0.0}'),
    ('Std Dev', '@Rolling_Std{0.0}')
], formatters={'@Date': 'datetime'})
p.add_tools(hover)

p.legend.location = "top_left"
p.legend.click_policy = "hide"

show(p)
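
The rule above flags any point more than one rolling standard deviation from the rolling mean, which is a fairly loose threshold. The sketch below shows how the same idea could be parameterised with a multiplier `k`. The helper `flag_outliers` and the synthetic series are hypothetical and only meant to show how the threshold changes what gets flagged.

```python
import numpy as np
import pandas as pd

def flag_outliers(series, window=12, k=2.0):
    """Flag points more than k rolling standard deviations from the rolling mean."""
    mean = series.rolling(window=window).mean()
    std = series.rolling(window=window).std()
    # Comparisons against NaN are False, so warm-up rows are never flagged.
    return (series - mean).abs() > k * std

# Hypothetical series: a smooth seasonal wave with one injected spike.
t = np.arange(48)
demo = pd.Series(100 + 10 * np.sin(2 * np.pi * t / 12))
demo.iloc[30] += 60

loose = flag_outliers(demo, k=1.0)
strict = flag_outliers(demo, k=2.0)

print('Flagged at k=1:', int(loose.sum()))
print('Flagged at k=2:', int(strict.sum()))
```

A larger `k` can only flag the same points or fewer, so it is a simple knob for trading sensitivity against false alarms.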



* Resampling (upsampling and downsampling)

In [7]:
df = df.sort_values('Date')

df_indexed = df.set_index('Date')

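# Downsample to quarter-end and year-end means. pandas 2.2+ uses the 'QE' and
# 'YE' aliases; on older pandas versions use 'Q' and 'Y' instead.
quarterly = df_indexed.resample('QE')['#Passengers'].mean().reset_index()
quarterly['Period'] = 'Quarterly'

yearly = df_indexed.resample('YE')['#Passengers'].mean().reset_index()
yearly['Period'] = 'Yearly'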

original = df.copy()
original['Period'] = 'Monthly'

source_monthly = ColumnDataSource(original)
source_quarterly = ColumnDataSource(quarterly)
source_yearly = ColumnDataSource(yearly)

p = figure(title='Air Passengers: Original vs Resampled Data',
           x_axis_type='datetime',
           x_axis_label='Date',
           y_axis_label='Passenger Count',
           width=800,
           height=400)

p.line(x='Date', y='#Passengers', source=source_monthly, line_width=2,
       color='blue', legend_label='Monthly')
p.line(x='Date', y='#Passengers', source=source_quarterly, line_width=2,
       color='red', legend_label='Quarterly')
p.line(x='Date', y='#Passengers', source=source_yearly, line_width=2,
       color='green', legend_label='Yearly')

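# scatter() with an explicit marker replaces circle()/square(), which Bokeh 3.4+ deprecates.
p.scatter(x='Date', y='#Passengers', source=source_quarterly, size=8, color='red')
p.scatter(x='Date', y='#Passengers', source=source_yearly, size=10, color='green', marker='square')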

hover = HoverTool(tooltips=[
    ('Date', '@Date{%F}'),
    ('Passengers', '@{#Passengers}{0.0}'),
    ('Period', '@Period')
], formatters={'@Date': 'datetime'})
p.add_tools(hover)

p.legend.location = "top_left"
p.legend.click_policy = "hide"

show(p)
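
The cell above only downsamples (monthly to quarterly and yearly means). For the upsampling half of this step, a minimal sketch is given below. It uses three values on hypothetical month-start dates rather than the real dataset, and compares forward-fill with linear interpolation when going from monthly to daily frequency.

```python
import pandas as pd

# Three monthly values on assumed month-start dates (illustrative only).
monthly = pd.Series(
    [112.0, 118.0, 132.0],
    index=pd.to_datetime(['1949-01-01', '1949-02-01', '1949-03-01']),
    name='#Passengers',
)

# Upsampling to daily frequency creates new rows that start out as NaN.
daily = monthly.resample('D').asfreq()

# Two common ways to fill them:
daily_ffill = monthly.resample('D').ffill()                          # step function
daily_linear = monthly.resample('D').interpolate(method='linear')   # straight lines

print('Daily rows:', len(daily), '| NaN before filling:', int(daily.isna().sum()))
print(daily_linear.loc['1949-02-13':'1949-02-17'])
```

Either filled series can be passed to a `ColumnDataSource` and drawn with `p.line` in the same way as the downsampled ones. Keep in mind that the filled points are estimates, not observations.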



* Lag analysis

In [8]:
df['Lag_1'] = df['#Passengers'].shift(1)
df['Lag_6'] = df['#Passengers'].shift(6)
df['Lag_12'] = df['#Passengers'].shift(12)

df_lagged = df.dropna()

# One shared data source for the three lag scatter plots.
lag_source = ColumnDataSource(df_lagged)

# scatter() is used instead of circle(size=...), which Bokeh 3.4+ deprecates.
p1 = figure(title='Lag 1 Analysis',
            x_axis_label='Passengers (t-1)',
            y_axis_label='Passengers (t)',
            width=400,
            height=400)
p1.scatter(x='Lag_1', y='#Passengers', source=lag_source, size=8)

p2 = figure(title='Lag 6 Analysis',
            x_axis_label='Passengers (t-6)',
            y_axis_label='Passengers (t)',
            width=400,
            height=400)
p2.scatter(x='Lag_6', y='#Passengers', source=lag_source, size=8, color='green')

p3 = figure(title='Lag 12 Analysis',
            x_axis_label='Passengers (t-12)',
            y_axis_label='Passengers (t)',
            width=400,
            height=400)
p3.scatter(x='Lag_12', y='#Passengers', source=lag_source, size=8, color='red')

lag_plots = row(p1, p2, p3)
show(lag_plots)
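
The lag scatter plots show the relationship visually. A correlation coefficient per lag puts a number on how tightly each cloud follows a line. The sketch below does this on a synthetic trend-plus-seasonality series, which is an assumption made for illustration and not the lab data.

```python
import numpy as np
import pandas as pd

# Synthetic series: linear trend plus a 12-step seasonal cycle.
t = np.arange(60)
demo = pd.Series(100 + 2 * t + 20 * np.sin(2 * np.pi * t / 12))

# Build the lagged columns with shift(), as in the cell above.
lagged = pd.DataFrame({'y': demo})
for lag in (1, 6, 12):
    lagged[f'Lag_{lag}'] = demo.shift(lag)
lagged = lagged.dropna()  # keep only rows where every lag is defined

# Pearson correlation between the series and each lagged copy.
correlations = {lag: lagged['y'].corr(lagged[f'Lag_{lag}']) for lag in (1, 6, 12)}
for lag, value in correlations.items():
    print(f'lag {lag:>2}: r = {value:.3f}')
```

In this synthetic case the lag-12 correlation is essentially 1 because the series repeats every 12 steps on top of a linear trend. The lag-6 value is lower because the seasonal component is in opposite phase.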



* Autocorrelation

In [9]:
n_lags = 36
autocorr_values = [df['#Passengers'].autocorr(lag=i) for i in range(1, n_lags+1)]
lags = list(range(1, n_lags+1))

autocorr_df = pd.DataFrame({'Lag': lags, 'Autocorrelation': autocorr_values})

source = ColumnDataSource(autocorr_df)

p = figure(title='Autocorrelation Plot',
           x_axis_label='Lag',
           y_axis_label='Autocorrelation',
           width=800,
           height=400)

p.line(x='Lag', y='Autocorrelation', source=source, line_width=2)
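# scatter() is used instead of circle(size=...), which Bokeh 3.4+ deprecates.
p.scatter(x='Lag', y='Autocorrelation', source=source, size=8, color='blue')

# Zero line plus fixed +/-0.2 reference lines (a rough visual threshold,
# not a computed confidence bound).
p.line(x=[0, n_lags+1], y=[0, 0], line_width=1, color='black', line_dash='dashed')
p.line(x=[0, n_lags+1], y=[0.2, 0.2], line_width=1, color='red', line_dash='dotted')
p.line(x=[0, n_lags+1], y=[-0.2, -0.2], line_width=1, color='red', line_dash='dotted')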

hover = HoverTool(tooltips=[
    ('Lag', '@Lag'),
    ('Autocorrelation', '@Autocorrelation{0.000}')
])
p.add_tools(hover)

show(p)
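
The dotted reference lines at ±0.2 in the cell above are fixed values. A common alternative is the approximate 95% band for white noise, ±1.96/√N, where N is the number of observations. The sketch below computes it. The helper name `white_noise_bound` is hypothetical, and N = 144 is an assumption based on the usual length of the AirPassengers series, so `len(df)` should be substituted for the real data.

```python
import math

def white_noise_bound(n_obs, z=1.96):
    """Approximate confidence bound for the autocorrelation of white noise."""
    return z / math.sqrt(n_obs)

# Assumed series length; use len(df) with the real dataset.
n_obs = 144
bound = white_noise_bound(n_obs)

print(f'Approximate 95% bound for N={n_obs}: +/-{bound:.4f}')
```

Autocorrelation values outside this band are unlikely to arise from pure noise, so the value could replace `0.2` and `-0.2` in the two dotted `p.line` calls.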

