
# Bokeh for Time Series Analysis
<hr style="border: 2px solid black;">


<img src="./images/bokeh.png" alt="bokeh Logo" width="1000"/>
<hr style="border: 2px solid black;">

<img src="./images/bokeh_at_ag_glance.png" alt="Bokeh at a glance" width="1000"/>
<hr style="border: 2px solid black;">
**Introduction to Bokeh**
Bokeh is an interactive visualization library for Python that targets modern web browsers for presentation.
Unlike Matplotlib, which is primarily designed for static plots, Bokeh excels at creating
interactive plots and dashboards. It can handle large datasets and streaming data
(new rows can be appended to a `ColumnDataSource` with its `stream()` method),
making it suitable for real-time applications.

**Key Features of Bokeh:**

* **Interactivity:** Built-in support for zooming, panning, hovering, and other interactive tools.
* **Web-Focused:** Generates HTML and JavaScript, making it easy to embed plots in web pages.
* **High Performance:** Can render large datasets efficiently, optionally with a WebGL backend (`figure(output_backend="webgl")`).
* **Versatility:** Supports a wide range of plot types (lines, bars, scatter plots, etc.).
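
The sketch below illustrates the interactivity and web-focus points above. It assumes Bokeh 3.x and builds a line plot from a `ColumnDataSource`, with a few built-in tools and a `HoverTool`. The series is synthetic, and the column names `date` and `value` are illustrative only.

```python
# Minimal sketch (assumes Bokeh 3.x): an interactive line plot with hover.
# The data is synthetic; "date" and "value" are illustrative column names.
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

dates = pd.date_range("2000-01-01", periods=24, freq="MS")  # month starts
values = 100 + 5 * np.arange(24)

source = ColumnDataSource(data=dict(date=dates.values, value=values))

p = figure(
    title="Synthetic series",
    x_axis_type="datetime",
    tools="pan,wheel_zoom,box_zoom,reset,save",  # built-in interaction tools
    width=600,
    height=300,
)
p.line(x="date", y="value", source=source, line_width=2)

# Hover shows the date (formatted as YYYY-MM-DD) and the value under the cursor
p.add_tools(HoverTool(
    tooltips=[("Date", "@date{%F}"), ("Value", "@value")],
    formatters={"@date": "datetime"},
))

# show(p) would render the plot in a notebook or browser; omitted here.
```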

<hr style="border: 2px solid black;">


**Documentation:**

For comprehensive documentation, please refer to the official Bokeh website: [https://docs.bokeh.org/en/latest/](https://docs.bokeh.org/en/latest/)


<hr style="border: 2px solid black;">


**Lab Exercise:**

Your task is to recreate the time series analysis lab we previously conducted with Pandas,
Matplotlib, and Seaborn, this time using the Bokeh library for visualization.
This will involve:

1.  Loading and preprocessing the "AirPassengersDates.csv" dataset.
2.  Creating interactive Bokeh plots for:
    * Time series line plots
    * Bar plots of aggregated data
    * Visualizing mean and standard deviation
    * Outlier detection
    * Resampling (upsampling and downsampling)
    * Lag analysis
    * Autocorrelation

Pay close attention to Bokeh's features for interactivity (tools, hover effects) and
its handling of data sources. Aim to replicate the insights and visualizations
from the previous lab while leveraging Bokeh's strengths.
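
On the data-source point, here is a sketch of how `ColumnDataSource` converts a DataFrame with a named `DatetimeIndex` into columns. It assumes Bokeh 3.x and uses a small synthetic frame. This behaviour is why the cells below can use `x='Date'` even though `Date` is the DataFrame index rather than a regular column.

```python
# Sketch (assumes Bokeh 3.x): ColumnDataSource exposes a named index as a column.
# The frame is synthetic; only the column/index names mirror the lab's CSV.
import pandas as pd
from bokeh.models import ColumnDataSource

frame = pd.DataFrame(
    {"#Passengers": [100, 110, 120]},
    index=pd.DatetimeIndex(["1949-01-01", "1949-02-01", "1949-03-01"], name="Date"),
)

source = ColumnDataSource(frame)

# The named index becomes a "Date" column; an unnamed index would appear as "index"
print(sorted(source.data.keys()))  # → ['#Passengers', 'Date']
```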

Good luck!
<hr style="border: 2px solid black;">

In [10]:
# Import required libraries
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.layouts import column

# Enable notebook output for Bokeh
output_notebook()

# Load the dataset
df = pd.read_csv('datasets/AirPassengersDates.csv')

# Convert the date column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Set the date as index
df.set_index('Date', inplace=True)

# Display the first few rows of the dataset
print("Dataset Overview:")
print(df.head())
print("\nDataset Info:")
df.info()  # info() prints its summary directly and returns None

Dataset Overview:
            #Passengers
Date                   
1949-01-12          112
1949-02-24          118
1949-03-22          132
1949-04-05          129
1949-05-24          121

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 144 entries, 1949-01-12 to 1960-12-04
Data columns (total 1 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   #Passengers  144 non-null    int64
dtypes: int64(1)
memory usage: 2.2 KB

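The overview above shows one row per month, but each falls on an arbitrary day (1949-01-12, 1949-02-24, and so on). The sketch below does two things in a small in-memory CSV that stands in for the real file (the passenger counts are made up):

* It parses the dates and sets the index in a single `read_csv` call.
* It snaps each timestamp to its month start, which can make later resampling and month labels easier to read.

Normalizing the dates is a judgment call; the lab itself keeps the original dates.

```python
# Sketch: parse dates + set the index in one call, then normalize to month start.
# csv_text is an in-memory stand-in for datasets/AirPassengersDates.csv (values made up).
import io
import pandas as pd

csv_text = """Date,#Passengers
1949-01-12,100
1949-02-24,110
1949-03-22,120
"""

frame = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# Map each timestamp to the first day of its month
frame.index = frame.index.to_period("M").to_timestamp()
frame.index.name = "Date"

print(frame.index.day.tolist())  # → [1, 1, 1]
```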

In [18]:
# Create a ColumnDataSource for the data
source = ColumnDataSource(df)

# Create the figure
p = figure(
    title='Air Passengers Over Time',
    x_axis_type='datetime',
    width=800,
    height=400,
    tools='pan,box_zoom,wheel_zoom,box_select,lasso_select,reset,save'
)

# Add hover tool
hover = HoverTool(
    tooltips=[
        ('Date', '@Date{%F}'),
        ('Passengers', '@{#Passengers}'),
    ],
    formatters={
        '@Date': 'datetime',
    }
)
p.add_tools(hover)

# Add the line plot
p.line(
    x='Date',
    y='#Passengers',
    source=source,
    line_width=2,
    line_color='navy',
    legend_label='Passengers'
)

# Customize the plot
p.title.text_font_size = '16pt'
p.axis.axis_label_text_font_size = '12pt'
p.axis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Number of Passengers'
p.legend.location = 'top_left'

# Show the plot
show(p)
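
The sketch below adds two further interactive touches to a time series plot. It assumes Bokeh 3.x and synthetic data:

* A `DatetimeTickFormatter` controls the axis labels.
* A `DateRangeSlider` (the datetime counterpart of `RangeSlider`) is linked to the plot's x-range with `js_link`, so dragging the slider changes the visible window in the browser without a Python callback.

Check the formatter's field names against your installed Bokeh version.

```python
# Sketch (assumes Bokeh 3.x): tick formatting plus a slider linked to the x-range.
# The data is synthetic.
import numpy as np
import pandas as pd
from bokeh.layouts import column
from bokeh.models import DateRangeSlider, DatetimeTickFormatter
from bokeh.plotting import figure

dates = pd.date_range("2000-01-01", periods=36, freq="MS")
p = figure(x_axis_type="datetime", width=600, height=300)
p.line(dates.values, 100 + np.arange(36), line_width=2)

# Years at coarse zoom levels, "Mon YYYY" when individual months are visible
p.xaxis.formatter = DatetimeTickFormatter(years="%Y", months="%b %Y")

start, end = dates[0].date(), dates[-1].date()
slider = DateRangeSlider(title="Date range", start=start, end=end, value=(start, end))

# Copy the slider's (start, end) tuple into the plot's x_range on the JS side
slider.js_link("value", p.x_range, "start", attr_selector=0)
slider.js_link("value", p.x_range, "end", attr_selector=1)

layout = column(slider, p)
# show(layout) would render the slider above the plot.
```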

In [22]:
# Calculate monthly and yearly averages
# ('ME'/'YE' are the month-end/year-end aliases in pandas >= 2.2; older versions use 'M'/'Y')
monthly_avg = df.resample('ME').mean()
yearly_avg = df.resample('YE').mean()

# Create ColumnDataSources
monthly_source = ColumnDataSource(monthly_avg)
yearly_source = ColumnDataSource(yearly_avg)

# Create monthly plot
p_monthly = figure(
    title='Monthly Average Passengers',
    x_axis_type='datetime',
    width=800,
    height=400,
    tools='pan,box_zoom,wheel_zoom,reset,save'
)

# Add monthly bars
p_monthly.vbar(
    x='Date',
    top='#Passengers',
    source=monthly_source,
    width=pd.Timedelta(days=20),
    fill_color='steelblue',
    legend_label='Monthly Average'
)

# Create yearly plot
p_yearly = figure(
    title='Yearly Average Passengers',
    x_axis_type='datetime',
    width=800,
    height=400,
    tools='pan,box_zoom,wheel_zoom,reset,save'
)

# Add yearly bars
p_yearly.vbar(
    x='Date',
    top='#Passengers',
    source=yearly_source,
    width=pd.Timedelta(days=200),
    fill_color='firebrick',
    legend_label='Yearly Average'
)

# Customize both plots
for p in [p_monthly, p_yearly]:
    p.title.text_font_size = '16pt'
    p.axis.axis_label_text_font_size = '12pt'
    p.axis.axis_label_text_font_style = 'bold'
    p.xaxis.axis_label = 'Date'
    p.yaxis.axis_label = 'Average Number of Passengers'
    p.legend.location = 'top_left'
    
    # Add hover tool
    hover = HoverTool(
        tooltips=[
            ('Date', '@Date{%F}'),
            ('Average Passengers', '@{#Passengers}{0.0}'),
        ],
        formatters={
            '@Date': 'datetime',
        }
    )
    p.add_tools(hover)

# Show both plots
show(column(p_monthly, p_yearly))


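Because the dataset already has one row per month, the monthly average above essentially re-plots the raw counts at month-end dates. The yearly aggregate is where averaging actually happens. The sketch below uses synthetic data and computes yearly means with `groupby` on the calendar year. This approach avoids the frequency-alias change between pandas versions ('Y' versus 'YE').

```python
# Sketch: yearly means via groupby on the calendar year (no resample alias needed).
# The series is synthetic: 36 month-start observations with values 0..35.
import numpy as np
import pandas as pd

idx = pd.date_range("1949-01-01", periods=36, freq="MS")
series = pd.Series(np.arange(36, dtype=float), index=idx, name="#Passengers")

# The result is indexed by integer year
yearly = series.groupby(series.index.year).mean()
print(yearly.tolist())  # → [5.5, 17.5, 29.5]
```

Since the result is indexed by integer year, it can be drawn on a plain numeric axis, for example with `p.vbar(x=yearly.index.to_numpy(), top=yearly.to_numpy(), width=0.8)`.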

In [19]:
# Calculate rolling statistics
df['Rolling_Mean'] = df['#Passengers'].rolling(window=12).mean()
df['Rolling_Std'] = df['#Passengers'].rolling(window=12).std()

# Create ColumnDataSource
source = ColumnDataSource(df)

# Create the figure
p = figure(
    title='Passengers with Rolling Mean and Standard Deviation',
    x_axis_type='datetime',
    width=800,
    height=400,
    tools='pan,box_zoom,wheel_zoom,reset,save'
)

# Add the main line
p.line(
    x='Date',
    y='#Passengers',
    source=source,
    line_width=2,
    line_color='navy',
    legend_label='Passengers'
)

# Add rolling mean
p.line(
    x='Date',
    y='Rolling_Mean',
    source=source,
    line_width=2,
    line_color='red',
    legend_label='12-Month Rolling Mean'
)

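# Add a ±1 standard deviation band around the rolling mean
source.add((df['Rolling_Mean'] - df['Rolling_Std']).values, 'Lower_Band')
source.add((df['Rolling_Mean'] + df['Rolling_Std']).values, 'Upper_Band')
p.varea(
    x='Date',
    y1='Lower_Band',
    y2='Upper_Band',
    source=source,
    fill_alpha=0.2,
    fill_color='red',
    legend_label='±1 Standard Deviation'
)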

# Customize the plot
p.title.text_font_size = '16pt'
p.axis.axis_label_text_font_size = '12pt'
p.axis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Number of Passengers'
p.legend.location = 'top_left'

# Add hover tool
hover = HoverTool(
    tooltips=[
        ('Date', '@Date{%F}'),
        ('Passengers', '@{#Passengers}'),
        ('Rolling Mean', '@{Rolling_Mean}{0.0}'),
        ('Rolling Std', '@{Rolling_Std}{0.0}'),
    ],
    formatters={
        '@Date': 'datetime',
    }
)
p.add_tools(hover)

# Show the plot
show(p)
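
A 12-month window needs 12 observations, so the rolling mean and standard deviation are NaN for the first 11 rows. As a result, the rolling curves start only at the twelfth observation. The sketch below uses a synthetic linear series to build the band edges and shows how `min_periods` changes that start-up behaviour.

```python
# Sketch: rolling band columns and the NaN start-up period (synthetic data).
import numpy as np
import pandas as pd

values = pd.Series(np.arange(24, dtype=float))  # steadily rising synthetic series

rolling_mean = values.rolling(window=12).mean()
rolling_std = values.rolling(window=12).std()   # sample std (ddof=1), the pandas default

# Edges of a ±1 standard deviation envelope
lower = rolling_mean - rolling_std
upper = rolling_mean + rolling_std

# min_periods=1 computes a value from whatever data is available at the start
partial_mean = values.rolling(window=12, min_periods=1).mean()

print(int(rolling_mean.isna().sum()), int(partial_mean.isna().sum()))  # → 11 0
```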

In [21]:
# Calculate z-scores for outlier detection
df['Z_Score'] = (df['#Passengers'] - df['#Passengers'].mean()) / df['#Passengers'].std()
df['Is_Outlier'] = abs(df['Z_Score']) > 2  # Mark points with |z-score| > 2 as outliers

# Create ColumnDataSource
source = ColumnDataSource(df)

# Create the figure
p = figure(
    title='Passenger Data with Outlier Detection',
    x_axis_type='datetime',
    width=800,
    height=400,
    tools='pan,box_zoom,wheel_zoom,reset,save'
)

# Add the main line
p.line(
    x='Date',
    y='#Passengers',
    source=source,
    line_width=2,
    line_color='navy',
    legend_label='Passengers'
)

# Add outlier points (scatter() replaces circle(size=...), deprecated since Bokeh 3.4)
outlier_source = ColumnDataSource(df[df['Is_Outlier']])
p.scatter(
    x='Date',
    y='#Passengers',
    source=outlier_source,
    marker='circle',
    size=8,
    fill_color='red',
    line_color='black',
    legend_label='Outliers (|z-score| > 2)'
)

# Customize the plot
p.title.text_font_size = '16pt'
p.axis.axis_label_text_font_size = '12pt'
p.axis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Number of Passengers'
p.legend.location = 'top_left'

# Add hover tool
hover = HoverTool(
    tooltips=[
        ('Date', '@Date{%F}'),
        ('Passengers', '@{#Passengers}'),
        ('Z-Score', '@{Z_Score}{0.00}'),
    ],
    formatters={
        '@Date': 'datetime',
    }
)
p.add_tools(hover)

# Show the plot
show(p)
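
A global z-score compares every point with the overall mean. On a strongly trending series, it therefore tends to flag the most recent, largest values rather than genuine anomalies. An alternative technique, not used in the lab above, is to compute z-scores on the residuals left after removing a rolling trend.

The sketch below applies both approaches to synthetic data: a linear trend with one injected spike. The 13-point centred window and the threshold of 3 are illustrative choices.

```python
# Sketch: global z-score vs. z-score of residuals from a rolling trend (synthetic data).
import numpy as np
import pandas as pd

t = np.arange(60)
series = pd.Series(100 + 2.0 * t)  # upward linear trend
series.iloc[30] += 50              # one injected spike

# Global z-score, as in the cell above
global_z = (series - series.mean()) / series.std()

# Residual z-score: subtract a centred 13-point rolling mean first
trend = series.rolling(window=13, center=True).mean()
resid = series - trend
resid_z = (resid - resid.mean()) / resid.std()

# The global score flags nothing at |z| > 2; the residual score isolates the spike
print(int(global_z.abs().gt(2).sum()), int(resid_z.abs().gt(3).sum()))  # → 0 1
```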



In [15]:
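# Create different resampled versions of the passenger counts
passengers = df[['#Passengers']]
daily_data = passengers.resample('D').interpolate()     # Upsampling to daily; new days are filled by linear interpolation
weekly_data = passengers.resample('W').mean().dropna()  # Weekly bins are finer than the monthly data, so drop the empty weeks
monthly_data = passengers.resample('ME').mean()         # Monthly bins ('ME' = month end, pandas >= 2.2); one value per month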

# Create ColumnDataSources
daily_source = ColumnDataSource(daily_data)
weekly_source = ColumnDataSource(weekly_data)
monthly_source = ColumnDataSource(monthly_data)

# Create the figure
p = figure(
    title='Passenger Data with Different Resampling Frequencies',
    x_axis_type='datetime',
    width=800,
    height=400,
    tools='pan,box_zoom,wheel_zoom,reset,save'
)

# Add the different resampled lines
p.line(
    x='Date',
    y='#Passengers',
    source=daily_source,
    line_width=1,
    line_color='gray',
    line_alpha=0.3,
    legend_label='Daily (Upsampled)'
)

p.line(
    x='Date',
    y='#Passengers',
    source=weekly_source,
    line_width=2,
    line_color='blue',
    legend_label='Weekly Average'
)

p.line(
    x='Date',
    y='#Passengers',
    source=monthly_source,
    line_width=3,
    line_color='red',
    legend_label='Monthly Average'
)

# Customize the plot
p.title.text_font_size = '16pt'
p.axis.axis_label_text_font_size = '12pt'
p.axis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Number of Passengers'
p.legend.location = 'top_left'

# Add hover tool
hover = HoverTool(
    tooltips=[
        ('Date', '@Date{%F}'),
        ('Passengers', '@{#Passengers}{0.0}'),
    ],
    formatters={
        '@Date': 'datetime',
    }
)
p.add_tools(hover)

# Show the plot
show(p)


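Two details matter when resampling monthly data:

* Upsampling with `asfreq()` inserts NaN rows. A Bokeh line has no segment to draw between a value and a NaN, so a series that is mostly NaN renders almost nothing; `interpolate()` fills the new rows instead.
* Weekly bins are finer than monthly observations, so a weekly mean is itself a form of upsampling and leaves most bins empty.

The sketch below demonstrates both points on a three-point synthetic series.

```python
# Sketch: asfreq vs. interpolate when upsampling, and empty weekly bins (synthetic data).
import pandas as pd

idx = pd.date_range("2000-01-01", periods=3, freq="MS")  # Jan 1, Feb 1, Mar 1
monthly = pd.Series([10.0, 20.0, 30.0], index=idx)

daily_gaps = monthly.resample("D").asfreq()         # new daily rows are NaN
daily_filled = monthly.resample("D").interpolate()  # new daily rows linearly filled
weekly = monthly.resample("W").mean()               # most weekly bins have no data

print(len(daily_gaps), int(daily_gaps.isna().sum()))  # → 61 58
print(int(daily_filled.isna().sum()))                 # → 0
print(int(weekly.notna().sum()), len(weekly))
```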

In [16]:
# Create lagged versions of the data
df['Lag_1'] = df['#Passengers'].shift(1)
df['Lag_12'] = df['#Passengers'].shift(12)  # 12-month lag

# Create ColumnDataSource
source = ColumnDataSource(df)

# Create the figure
p = figure(
    title='Passenger Data with Lag Analysis',
    x_axis_type='datetime',
    width=800,
    height=400,
    tools='pan,box_zoom,wheel_zoom,reset,save'
)

# Add the main line
p.line(
    x='Date',
    y='#Passengers',
    source=source,
    line_width=2,
    line_color='navy',
    legend_label='Current'
)

# Add lagged lines
p.line(
    x='Date',
    y='Lag_1',
    source=source,
    line_width=2,
    line_color='red',
    line_dash='dashed',
    legend_label='1-Month Lag'
)

p.line(
    x='Date',
    y='Lag_12',
    source=source,
    line_width=2,
    line_color='green',
    line_dash='dotted',
    legend_label='12-Month Lag'
)

# Customize the plot
p.title.text_font_size = '16pt'
p.axis.axis_label_text_font_size = '12pt'
p.axis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label = 'Date'
p.yaxis.axis_label = 'Number of Passengers'
p.legend.location = 'top_left'

# Add hover tool
hover = HoverTool(
    tooltips=[
        ('Date', '@Date{%F}'),
        ('Current', '@{#Passengers}'),
        ('1-Month Lag', '@{Lag_1}{0.0}'),
        ('12-Month Lag', '@{Lag_12}{0.0}'),
    ],
    formatters={
        '@Date': 'datetime',
    }
)
p.add_tools(hover)

# Show the plot
show(p)
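
The lag plot overlays shifted copies of the series. A single summary number per lag is the Pearson correlation between the series and its lagged copy, which pandas exposes as `Series.autocorr(lag)`. This differs slightly from the ACF estimator in the next cell, which uses the full-series mean and variance at every lag.

The sketch below uses a synthetic sine wave with a 12-step period. At lag 12 the wave lines up with itself, and at lag 6 it is exactly out of phase.

```python
# Sketch: lag correlation with Series.autocorr on a synthetic 12-step cycle.
import numpy as np
import pandas as pd

t = np.arange(48)
seasonal = pd.Series(np.sin(2 * np.pi * t / 12))

r12 = seasonal.autocorr(lag=12)  # one full period: in phase
r6 = seasonal.autocorr(lag=6)    # half a period: out of phase

# autocorr(lag) is the Pearson correlation with the shifted series
same = seasonal.corr(seasonal.shift(12))

print(round(r12, 3), round(r6, 3))  # → 1.0 -1.0
```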

In [17]:
# Calculate autocorrelation
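def autocorr(x, max_lag=36):
    """Sample autocorrelation for lags 0..max_lag (standard estimator, divides by n)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mean = x.mean()
    var = x.var()  # population variance (ddof=0), so n * var is the total sum of squares
    r = np.zeros(max_lag + 1)
    r[0] = 1.0  # lag 0 is always 1
    
    for lag in range(1, max_lag + 1):
        # Sum the lagged products and divide by n * var (not by n - lag)
        r[lag] = np.sum((x[lag:] - mean) * (x[:-lag] - mean)) / (n * var)
    
    return r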

# Calculate autocorrelation
acf = autocorr(df['#Passengers'].values)
lags = np.arange(len(acf))

# Create ColumnDataSource
source = ColumnDataSource(data=dict(lags=lags, acf=acf))

# Create the figure
p = figure(
    title='Autocorrelation Function (ACF) of Passenger Data',
    width=800,
    height=400,
    tools='pan,box_zoom,wheel_zoom,reset,save'
)

# Add the bars
p.vbar(
    x='lags',
    top='acf',
    source=source,
    width=0.8,
    fill_color='steelblue',
    line_color='navy'
)

# Add a horizontal zero baseline
p.line(
    x=[0, len(acf)],
    y=[0, 0],
    line_color='black',
    line_dash='dashed'
)

# Add confidence interval lines (approximate 95% confidence interval)
ci = 1.96 / np.sqrt(len(df))
p.line(
    x=[0, len(acf)],
    y=[ci, ci],
    line_color='red',
    line_dash='dashed',
    legend_label='95% Confidence Interval'
)
p.line(
    x=[0, len(acf)],
    y=[-ci, -ci],
    line_color='red',
    line_dash='dashed'
)

# Customize the plot
p.title.text_font_size = '16pt'
p.axis.axis_label_text_font_size = '12pt'
p.axis.axis_label_text_font_style = 'bold'
p.xaxis.axis_label = 'Lag'
p.yaxis.axis_label = 'Autocorrelation'
p.legend.location = 'top_right'

# Add hover tool
hover = HoverTool(
    tooltips=[
        ('Lag', '@lags'),
        ('Autocorrelation', '@acf{0.000}'),
    ]
)
p.add_tools(hover)

# Show the plot
show(p)
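
For reference, the standard sample ACF is $r_k = \frac{\sum_{t=1}^{n-k}(x_t-\bar{x})(x_{t+k}-\bar{x})}{\sum_{t=1}^{n}(x_t-\bar{x})^2}$. To our understanding, this is also the default (`adjusted=False`) in `statsmodels.tsa.stattools.acf`.

The sketch below computes the estimator twice, once with an explicit loop and once with a single `np.correlate` call, and checks that the two agree on random data. The function names `acf_loop` and `acf_vectorized` are illustrative.

```python
# Sketch: the standard ACF estimator, looped vs. vectorized with np.correlate.
import numpy as np

def acf_loop(x, max_lag):
    """Lag-k sum of centred products divided by the total sum of squares."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)  # equals n * population variance
    return np.array([np.sum(xc[k:] * xc[:n - k]) / denom for k in range(max_lag + 1)])

def acf_vectorized(x, max_lag):
    """Same estimator from one full cross-correlation of the centred series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    full = np.correlate(xc, xc, mode="full")  # index n - 1 is lag 0
    return full[n - 1 : n + max_lag] / full[n - 1]

rng = np.random.default_rng(0)
sample = rng.normal(size=100)
print(np.allclose(acf_loop(sample, 20), acf_vectorized(sample, 20)))  # → True
```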