
# Bokeh for Time Series Analysis
<hr style="border: 2px solid black;">


<img src="./images/bokeh.png" alt="bokeh Logo" width="1000"/>
<hr style="border: 2px solid black;">

<img src="./images/bokeh_at_ag_glance.png" alt="Bokeh at a glance" width="1000"/>
<hr style="border: 2px solid black;">
**Introduction to Bokeh**
Bokeh is an interactive visualization library for Python that renders its plots in modern web browsers.
Unlike Matplotlib, which is primarily designed for static plots, Bokeh excels at creating
interactive plots and dashboards. It can also handle large datasets and streaming data
(for example, by appending rows to a `ColumnDataSource` with its `stream()` method), which makes it suitable for real-time applications.

**Key Features of Bokeh:**

* **Interactivity:** Built-in support for zooming, panning, hovering, and other interactive tools.
* **Web-Focused:** Generates HTML and JavaScript, making it easy to embed plots in web pages.
* **High Performance:** Handles large datasets efficiently and can render with WebGL (`figure(output_backend="webgl")`) when a plot has many points.
* **Versatility:** Supports a wide range of plot types (lines, bars, scatter plots, etc.).

<hr style="border: 2px solid black;">


**Documentation:**

For comprehensive documentation, please refer to the official Bokeh website: [https://docs.bokeh.org/en/latest/](https://docs.bokeh.org/en/latest/)


<hr style="border: 2px solid black;">


**Lab Exercise:**

Your task is to recreate the time series analysis lab we previously conducted using Pandas,
Matplotlib, and Seaborn, but this time using the Bokeh library for visualization.
This will involve:

1.  Loading and preprocessing the "AirPassengersDates.csv" dataset.
2.  Creating interactive Bokeh plots for:
    * Time series line plots
    * Bar plots of aggregated data
    * Visualizing mean and standard deviation
    * Outlier detection
    * Resampling (upsampling and downsampling)
    * Lag analysis
    * Autocorrelation

Pay close attention to Bokeh's features for interactivity (tools, hover effects) and
its handling of data sources. Aim to replicate the insights and visualizations
from the previous lab while leveraging Bokeh's strengths.

Good luck!
<hr style="border: 2px solid black;">

# Setup & Data Loading

In [1]:
%pip install bokeh

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd
import numpy as np
from pathlib import Path

from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.layouts import column

output_notebook()

# Load data
df = pd.read_csv("./datasets/AirPassengersDates.csv", parse_dates=["Date"])
df.set_index("Date", inplace=True)
df.head()



| Date | #Passengers |
|---|---|
| 1949-01-12 | 112 |
| 1949-02-24 | 118 |
| 1949-03-22 | 132 |
| 1949-04-05 | 129 |
| 1949-05-24 | 121 |


# Date/Time Feature Extraction

In [3]:
df["Year"] = df.index.year
df["Month"] = df.index.month
df["Day"] = df.index.day
df.head()



| Date | #Passengers | Year | Month | Day |
|---|---|---|---|---|
| 1949-01-12 | 112 | 1949 | 1 | 12 |
| 1949-02-24 | 118 | 1949 | 2 | 24 |
| 1949-03-22 | 132 | 1949 | 3 | 22 |
| 1949-04-05 | 129 | 1949 | 4 | 5 |
| 1949-05-24 | 121 | 1949 | 5 | 24 |


# Time series line plot with hover and zoom

In [4]:
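# Wrap the series in a ColumnDataSource so the HoverTool can look up fields by name
source = ColumnDataSource(data={
    'x': df.index,
    'y': df['#Passengers']
})

p = figure(title="Passengers with HoverTool", x_axis_type='datetime', width=800, height=400,
           tools="pan,wheel_zoom,box_zoom,reset,save")
p.line('x', 'y', source=source, line_width=2)

# Show dates as YYYY-MM-DD; mode='vline' triggers the tooltip when the cursor is vertically aligned with a point
hover = HoverTool(tooltips=[("Date", "@x{%F}"), ("Passengers", "@y")],
                  formatters={'@x': 'datetime'}, mode='vline')
p.add_tools(hover)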
show(p)
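Since the lab asks you to pay attention to Bokeh's data sources, here is a minimal sketch of an alternative: passing the DataFrame itself to `ColumnDataSource`. Bokeh copies each column and exposes a named index as an extra column (here `"Date"`). Because `#Passengers` contains a `#`, tooltips must reference it with braces, as in `@{#Passengers}`. The `demo` DataFrame below is a hypothetical stand-in for `df`, not the real dataset.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical stand-in for df: 24 monthly values on a DatetimeIndex named "Date"
dates = pd.date_range("1949-01-01", periods=24, freq="MS", name="Date")
demo = pd.DataFrame({"#Passengers": np.arange(100, 124)}, index=dates)

# Passing the DataFrame directly: the named index becomes a "Date" column
src = ColumnDataSource(demo)

p = figure(title="Passengers (DataFrame source)", x_axis_type="datetime",
           width=800, height=400, tools="pan,wheel_zoom,box_zoom,reset,save")
p.line(x="Date", y="#Passengers", source=src, line_width=2)

# Field names with special characters such as '#' need braces in tooltips
p.add_tools(HoverTool(
    tooltips=[("Date", "@Date{%F}"), ("Passengers", "@{#Passengers}")],
    formatters={"@Date": "datetime"},
    mode="vline",
))
# show(p)  # display inline after output_notebook()
```

This keeps every DataFrame column available to tooltips without building the `data` dict by hand.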


# Bar chart with aggregated data

In [5]:
monthly_totals = df.groupby("Month")["#Passengers"].sum().reset_index()
source = ColumnDataSource(monthly_totals)

p = figure(title="Total Passengers per Month", 
           x_axis_label="Month", 
           y_axis_label="Total Passengers",
           width=800, height=400)

p.vbar(x="Month", top="#Passengers", width=0.7, source=source)
show(p)
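Month numbers on a numeric axis give ticks like 2, 4, 6. A minimal sketch of a categorical month axis plus a hover tool, assuming English month abbreviations from the standard `calendar` module. The `monthly_totals` values below are a hypothetical stand-in with the same shape as the cell above, and `MonthName` is a column introduced here for illustration.

```python
import calendar
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Hypothetical totals in the same shape as monthly_totals above (Month, #Passengers)
monthly_totals = pd.DataFrame({"Month": range(1, 13),
                               "#Passengers": range(1000, 1120, 10)})

# Map month numbers 1..12 to "Jan".."Dec" (locale-dependent; English in the C locale)
month_names = list(calendar.month_abbr)[1:]
monthly_totals["MonthName"] = [calendar.month_abbr[m] for m in monthly_totals["Month"]]
source = ColumnDataSource(monthly_totals)

# A list of strings as x_range creates a categorical (FactorRange) axis in this order
p = figure(x_range=month_names, title="Total Passengers per Month",
           x_axis_label="Month", y_axis_label="Total Passengers",
           width=800, height=400)
p.vbar(x="MonthName", top="#Passengers", width=0.7, source=source)
p.add_tools(HoverTool(tooltips=[("Month", "@MonthName"),
                                ("Total", "@{#Passengers}")]))
# show(p)
```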


# Mean and standard deviation

In [6]:
mean_val = df["#Passengers"].mean()
std_val = df["#Passengers"].std()

p = figure(title="Passengers Over Time with Mean ± Std", 
           x_axis_type="datetime", width=800, height=400)

p.line(df.index, df["#Passengers"], legend_label="Passengers", line_width=2)
p.line(df.index, [mean_val]*len(df), line_dash="dashed", color="green", legend_label="Mean")
p.line(df.index, [mean_val + std_val]*len(df), line_dash="dotted", color="orange", legend_label="+1 STD")
p.line(df.index, [mean_val - std_val]*len(df), line_dash="dotted", color="orange", legend_label="-1 STD")

p.legend.location = "top_left"
show(p)
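Drawing the mean and ±1 std as three full-length lines works, but Bokeh also provides annotations for this. A minimal sketch using `Span` for the mean and a shaded `BoxAnnotation` for the band; the series `y` is a hypothetical stand-in for `df["#Passengers"]`. Note that annotations do not appear in the legend.

```python
import numpy as np
import pandas as pd
from bokeh.models import BoxAnnotation, Span
from bokeh.plotting import figure

# Hypothetical stand-in series; replace with df["#Passengers"]
dates = pd.date_range("1949-01-01", periods=48, freq="MS")
y = pd.Series(100 + 2 * np.arange(48), index=dates)

mean_val, std_val = y.mean(), y.std()

p = figure(title="Passengers with Mean and ±1 Std Band",
           x_axis_type="datetime", width=800, height=400)
p.line(y.index, y.values, line_width=2, legend_label="Passengers")

# Horizontal line at the mean that spans the whole plot width
mean_line = Span(location=mean_val, dimension="width",
                 line_color="green", line_dash="dashed", line_width=2)
# Shaded region between mean - std and mean + std
band = BoxAnnotation(bottom=mean_val - std_val, top=mean_val + std_val,
                     fill_color="orange", fill_alpha=0.15)

p.add_layout(mean_line)
p.add_layout(band)
p.legend.location = "top_left"
# show(p)
```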



# Outlier detection

In [7]:
mean_val = df["#Passengers"].mean()
std_val = df["#Passengers"].std()

outliers = df[np.abs(df["#Passengers"] - mean_val) > 2 * std_val]

p = figure(title="Outlier Detection (2σ)", x_axis_type="datetime", width=800, height=400)
p.line(df.index, df["#Passengers"], line_width=2, legend_label="Passengers")
p.scatter(outliers.index, outliers["#Passengers"], size=8, color="red", legend_label="Outliers")

p.legend.location = "top_left"
show(p)
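A global 2σ rule measures every point against the overall mean, so on a series with a strong upward trend it mostly flags the late, high values rather than genuine anomalies. One alternative, swapped in here as an illustration and not part of the original lab, is a rolling z-score: each point is compared with the mean and std of a centered window around it. `rolling_outliers`, `window=13`, and `k=2.0` are hypothetical choices, and the data is a synthetic trend with one injected spike.

```python
import numpy as np
import pandas as pd

def rolling_outliers(s: pd.Series, window: int = 13, k: float = 2.0) -> pd.Series:
    """Flag points more than k rolling standard deviations from a centered rolling mean."""
    roll = s.rolling(window, center=True)
    z = (s - roll.mean()) / roll.std()
    # Edge points without a full window get NaN, and NaN > k evaluates to False
    return z.abs() > k

# Hypothetical trended series (not the AirPassengers data) with one injected spike
idx = pd.date_range("1949-01-01", periods=144, freq="MS")
s = pd.Series(100.0 + 2 * np.arange(144), index=idx)
s.iloc[60] += 200

flags = rolling_outliers(s)
outliers = s[flags]
```

The flagged points can then be drawn with `p.scatter(outliers.index, outliers.values, ...)` exactly as in the cell above.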




# Resampling (upsampling and downsampling)

In [8]:
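# Upsample: one row per calendar day; days without an observation become NaN
daily = df["#Passengers"].resample("D").asfreq()
# Fill the gaps by straight-line interpolation between neighbouring observations
daily_interpolated = daily.interpolate(method="linear")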

p = figure(title="Upsampling to Daily Frequency (Interpolated)", x_axis_type="datetime", width=800, height=400)

p.line(df.index, df["#Passengers"], legend_label="Original", color="blue", line_width=2, alpha=0.7)
p.line(daily_interpolated.index, daily_interpolated.values, legend_label="Upsampled + Interpolated", color="orange", line_dash="dashed")

p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.yaxis.axis_label = "Passengers"
show(p)
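The observation dates in this dataset fall on arbitrary days of the month (1949-01-12, 1949-02-24, ...). A minimal sketch of putting them on a regular monthly grid with `resample("MS")`, which labels each bin with the first day of its month. The three values below are the first rows shown by `df.head()`; on the full data you would apply the same call to `df["#Passengers"]`.

```python
import pandas as pd

# First three rows of df.head(): one observation per month, on irregular days
raw = pd.Series([112, 118, 132],
                index=pd.to_datetime(["1949-01-12", "1949-02-24", "1949-03-22"]))

# Bin by calendar month and label each bin with its first day ("MS" = month start)
monthly = raw.resample("MS").mean()
```

With one observation per month, the mean simply moves each value to its month start, which makes later frequency-based operations such as `shift(1, freq="MS")` line up with the index.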


In [9]:
# "YE" = year end (pandas >= 2.2; older versions use "A"); each bin is labelled December 31
yearly = df["#Passengers"].resample("YE").mean()

p = figure(title="Downsampling to Yearly Frequency (Line + Points)", 
           x_axis_type="datetime", width=800, height=400)

p.line(yearly.index, yearly.values, line_width=2, color="green", legend_label="Yearly Avg")

p.scatter(yearly.index, yearly.values, size=8, color="green")

p.line(df.index, df["#Passengers"], color="blue", alpha=0.4, legend_label="Original")

p.legend.location = "top_left"
p.xaxis.axis_label = "Year"
p.yaxis.axis_label = "Average Passengers"

show(p)
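Because `"YE"` labels each yearly mean on December 31, the yearly points sit at the right edge of the year they summarize. If you prefer them at the start of each year, `"YS"` (year start) is an alternative alias. A minimal sketch on a hypothetical two-year series with values 1..24:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: two years of monthly values 1..24
s = pd.Series(np.arange(1, 25, dtype=float),
              index=pd.date_range("1949-01-01", periods=24, freq="MS"))

# "YS" labels each yearly bin with January 1 instead of December 31
yearly_start = s.resample("YS").mean()
```

The bin contents and means are the same as with `"YE"`; only the timestamp each bin is plotted at changes.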



# Lag analysis

In [12]:
passenger_df = df.copy()

# Shift the values down by one row; the index stays the same
passenger_df["#Passengers_Shift"] = passenger_df["#Passengers"].shift(1)

# Time shift: move the index forward to the next month start; the values stay the same.
# The dates in this dataset fall mid-month (e.g. 1949-01-12), so the shifted index
# (1949-02-01, ...) no longer matches passenger_df's index. Assigning it back as a
# column would align on the index and leave it mostly NaN, so keep it as its own Series.
passenger_tshift = passenger_df["#Passengers"].shift(1, freq="MS")

# ColumnDataSource turns the named "Date" index into a "Date" column automatically
source = ColumnDataSource(passenger_df)

# Plot
p = figure(title="Shift vs tShift", x_axis_type="datetime", width=900, height=400)

p.line(x='Date', y='#Passengers', source=source, color='blue', legend_label='Original', line_width=2)
p.line(x='Date', y='#Passengers_Shift', source=source, color='green', legend_label='Shift (1 row)', line_dash="dashed")
p.line(passenger_tshift.index, passenger_tshift.values, color='red', legend_label='tShift (1 MS)', line_dash="dotdash")

p.legend.location = "top_left"
p.xaxis.axis_label = "Date"
p.yaxis.axis_label = "Number of Passengers"

show(p)
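Another common way to inspect lags is a lag plot: each value y(t) against the value one step earlier, y(t - 1). Points close to a straight line indicate strong dependence on the previous value. A minimal sketch, assuming a hypothetical trend-plus-seasonality series in place of `df["#Passengers"]`; `pairs` and its column names are illustrative.

```python
import numpy as np
import pandas as pd
from bokeh.plotting import figure

# Hypothetical stand-in series; replace with df["#Passengers"]
idx = pd.date_range("1949-01-01", periods=36, freq="MS")
t = np.arange(36)
s = pd.Series(100 + 2 * t + 10 * np.sin(2 * np.pi * t / 12), index=idx)

# Pair each value with the value `lag` rows earlier; the first `lag` rows have no partner
lag = 1
pairs = pd.DataFrame({"y_t": s, "y_t_minus_lag": s.shift(lag)}).dropna()

p = figure(title=f"Lag Plot (lag={lag})", x_axis_label="y(t - lag)",
           y_axis_label="y(t)", width=500, height=500)
p.scatter(pairs["y_t_minus_lag"], pairs["y_t"], size=6, alpha=0.7)
# show(p)
```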

# Autocorrelation plot

In [16]:
lags = range(1, 13)
autocorr = [df['#Passengers'].autocorr(lag=lag) for lag in lags]

p = figure(title="Autocorrelation", x_axis_label="Lag", y_axis_label="Coefficient", width=800, height=400)
p.vbar(x=list(lags), top=autocorr, width=0.5)
show(p)
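`Series.autocorr(lag)` is the Pearson correlation of the overlapping pairs only, so it re-estimates the mean and variance for each lag. The classic sample autocorrelation uses the mean and variance of the whole series instead:

$$r_k = \frac{\sum_{t=k+1}^{n} (x_t - \bar{x})(x_{t-k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}$$

A minimal sketch comparing the two on a tiny series, plus the rule-of-thumb ±1.96/√n band for white noise, which could be drawn on the bar chart with two `Span` annotations. `sample_acf` is a hypothetical helper and assumes `lag >= 1`.

```python
import numpy as np
import pandas as pd

def sample_acf(x, lag: int) -> float:
    """Classic sample autocorrelation at lag >= 1, using the full-series mean and variance."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[lag:] * d[:-lag]) / np.sum(d * d))

x = pd.Series([1, 2, 3, 4, 5])
classic = sample_acf(x, 1)    # 0.4: lagged cross-products over the total sum of squares
pearson = x.autocorr(lag=1)   # 1.0: the overlapping pairs (2,1), (3,2), ... are perfectly linear

# Approximate 95% band for white noise
bound = 1.96 / np.sqrt(len(x))
```

On a long series the two estimates are usually close; they diverge most for short series and large lags.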
