# Section 1: Introduction and Fundamentals

#### PyData London 2025 - Bayesian Time Series Analysis with PyMC

---

The world around us is inherently **dynamic**, with almost everything we care about -- from the environment and economy, to sports and health -- constantly evolving. Time series analysis is the discipline dedicated to making sense of this change by considering data collected sequentially over time. It aims to uncover underlying patterns such as **trends, seasonal variations, and cyclical movements**, to gain insight into the mechanisms generating the data, and, crucially, to forecast future outcomes. The products of time series analysis underpin informed **decision-making**, allow for robust **risk assessment**, and enable strategic **planning** based on projected future behavior.

Building upon this, Bayesian time series analysis offers a particularly powerful and adaptable framework. This approach has seen a significant surge in adoption, due to several compelling advantages. It provides a formal structure for integrating **prior information** that might be available before observing the data, a feature that is invaluable when data is sparse. Bayesian methods are also intrinsically suited for sequential learning and **adaptive decision-making**, allowing models to be refined as new information becomes available. Furthermore, Bayesian methods allow for robust inference from **small sample data**, avoiding reliance on asymptotic approximations common in some classical techniques. 

A key strength of the Bayesian approach is its natural and comprehensive handling of prediction: it systematically accounts for **parameter**, **model**, and **aleatoric** uncertainty by integrating over the posterior distribution of the parameters, which yields a full predictive distribution rather than a single point forecast. The advent of sophisticated computational tools, most notably Markov chain Monte Carlo (MCMC) methods, has been pivotal, making even complex time series models amenable to practical Bayesian analysis.
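
For reference, the predictive distribution mentioned above can be written out explicitly. The notation here is introduced only for illustration: $\theta$ denotes the model parameters, $y_{1:T}$ the observed series, and $\tilde{y}$ a future value.

$$p(\tilde{y} \mid y_{1:T}) = \int p(\tilde{y} \mid \theta, y_{1:T}) \, p(\theta \mid y_{1:T}) \, d\theta$$

With $S$ posterior draws $\theta^{(1)}, \dots, \theta^{(S)}$ from an MCMC sampler, this integral is typically approximated by simulation, drawing one $\tilde{y}^{(s)}$ from $p(\tilde{y} \mid \theta^{(s)}, y_{1:T})$ for each draw and treating the collection of simulated values as a sample from the predictive distribution.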

In [None]:
import numpy as np
import polars as pl
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
from statsmodels.tsa.stattools import acf

warnings.filterwarnings('ignore')

RNG = np.random.default_rng(RANDOM_SEED:=42)

print("📊 Libraries loaded successfully!")

## Key Characteristics of Time Series Data

Before diving into Bayesian methods, let's establish a solid foundation in time series fundamentals. Understanding these characteristics is crucial for building appropriate models and interpreting results effectively.

### The Anatomy of Time Series: Core Components

Time series data can be understood as a combination of several fundamental components. Think of these as the **"building blocks"** that, when combined, create the patterns we observe in real-world data. Understanding each component is crucial for effective modeling and interpretation.

#### 1. **Trend ($T_t$)**: The Long-Term Direction

The **trend** represents the underlying long-term movement in the data. Trends can take various forms depending on the underlying process generating the data.

A **linear trend** shows steady increase or decrease over time, such as population growth or the gradual decline in manufacturing employment in developed countries. **Non-linear trends** exhibit curved patterns, including exponential growth seen in technology adoption or S-shaped curves characteristic of market saturation processes. Some time series exhibit **changing trends** where the direction shifts over time, commonly observed in economic cycles where growth periods alternate with contractions.

#### 2. **Seasonality ($S_t$)**: Predictable Recurring Patterns

**Seasonal patterns** repeat over fixed, known periods, and their key characteristic is **predictability**—if you know the season, you can anticipate the pattern with reasonable confidence. This predictability makes seasonality one of the most valuable components for forecasting.

**Annual seasonality** appears in phenomena like holiday sales spikes, agricultural production cycles, and weather-dependent energy consumption. **Weekly seasonality** is common in business data, where website traffic peaks on weekdays or retail sales follow consistent weekly patterns. **Daily seasonality** manifests in rush hour traffic patterns, electricity usage that peaks during evening hours, or social media activity that varies throughout the day. Many real-world time series exhibit **multiple seasonality**, such as retail data that shows both weekly patterns (higher weekend sales) and annual patterns (holiday shopping seasons).

#### 3. **Cyclical Patterns ($C_t$)**: Irregular Long-Term Fluctuations

Unlike seasonality, **cyclical patterns** have variable periods and are often driven by external factors that don't follow a fixed schedule. These cycles represent longer-term fluctuations that can span several years or even decades.

**Business cycles** encompass economic expansions and recessions that vary in duration and intensity. **Market cycles** include bull and bear markets in financial data, where periods of growth and decline don't follow predictable timing. **Natural cycles** such as El Niño and La Niña climate patterns affect weather, agriculture, and economic activity over multi-year periods with irregular timing.

#### 4. **Irregular/Noise ($\epsilon_t$)**: The Unpredictable Component

The **irregular component** represents random variation that cannot be explained by trend, seasonality, or cycles. This component is inherently unpredictable but understanding its sources helps in model specification and interpretation.

**Measurement errors** include sensor noise, rounding errors, and data collection inconsistencies that add random variation to observations. **Random events** such as unexpected news, natural disasters, or policy changes create one-time shocks that don't follow systematic patterns. **Model limitations** also contribute to the irregular component when our models cannot capture patterns that are too complex or when important variables are omitted from the analysis.
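
As a rough illustration of how these building blocks combine, the sketch below simulates a series as the sum of a trend, a seasonal pattern, and noise. The additive form, the parameter values, and the variable names are all assumptions chosen for this example, and the cyclical component is left out to keep it short.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch

n_months = 120
t = np.arange(n_months)

# T_t: a linear trend (illustrative intercept and slope)
trend = 100.0 + 0.5 * t

# S_t: a fixed pattern that repeats every 12 steps
seasonal = 10.0 * np.sin(2 * np.pi * t / 12)

# epsilon_t: independent Gaussian noise
noise = rng.normal(0.0, 2.0, size=n_months)

# Additive combination: y_t = T_t + S_t + epsilon_t
series = trend + seasonal + noise
```

A multiplicative combination, in which the seasonal swing grows with the level of the series, is another common choice. Taking logarithms converts it into the additive form shown here.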

### Temporal Dependence

The defining characteristic that makes time series data special is **temporal dependence**—observations that are close in time are typically more similar than observations that are far apart. This fundamental property violates the assumptions of independence and identical distribution underlying standard statistical methods and necessitates specialized techniques.

**Autocorrelation** provides a mathematical framework for quantifying this temporal dependence. The autocorrelation function measures the linear relationship between observations separated by different time intervals:

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\text{Cov}(X_t, X_{t+k})}{\text{Var}(X_t)}$$

where $k$ represents the **lag**, or time separation between observations, $\gamma_k$ is the autocovariance at lag $k$, and $\gamma_0$ is the variance of the series.

For a stationary time series, the autocorrelation function summarizes the linear dependence structure and is fundamental to time series analysis, particularly in ARMA modeling and spectral analysis.

Understanding autocorrelation patterns reveals crucial insights about the underlying data generating process. **Strong autocorrelation** at short lags indicates that recent observations are highly predictive of future values, making forecasting possible and effective. Conversely, **weak autocorrelation** suggests that the series behaves more like random noise, making prediction challenging. Different autocorrelation patterns reveal different underlying processes: slowly decaying autocorrelations suggest trending behavior, while oscillating patterns indicate seasonal or cyclical components.
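
To make the definition concrete, here is a minimal sketch that computes the sample autocorrelation directly with NumPy. The function name `sample_acf` is hypothetical, and the estimator divides by the full series length at every lag, which is one common convention rather than the only one.

```python
import numpy as np

def sample_acf(x, n_lags):
    """Sample autocorrelation rho_k for k = 0..n_lags (divides by n at every lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    centered = x - x.mean()
    gamma_0 = np.dot(centered, centered) / n  # sample variance
    acf_values = np.empty(n_lags + 1)
    for k in range(n_lags + 1):
        # Sample autocovariance at lag k: pairs (x_t, x_{t+k})
        gamma_k = np.dot(centered[: n - k], centered[k:]) / n
        acf_values[k] = gamma_k / gamma_0
    return acf_values

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
white_noise = rng.normal(size=1000)
random_walk = np.cumsum(white_noise)

acf_noise = sample_acf(white_noise, n_lags=5)  # near zero beyond lag 0
acf_walk = sample_acf(random_walk, n_lags=5)   # decays very slowly
```

Comparing the two results illustrates the contrast described above: the white noise series shows almost no correlation beyond lag 0, while the accumulated series stays strongly correlated across nearby lags.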

In [None]:
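phi = 0.7        # AR(1) coefficient
sigma = 1.0      # standard deviation of the innovations
n_samples = 500  # length of the simulated series
n_lags = 20      # number of lags shown in the ACF plot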

ar1_data = np.zeros(n_samples)
ar1_data[0] = RNG.normal(0, sigma) 

for t in range(1, n_samples):
    ar1_data[t] = phi * ar1_data[t-1] + RNG.normal(0, sigma)

autocorr_ar1 = acf(ar1_data, nlags=n_lags, fft=True)

# Approximate 95% bounds for the sample ACF of white noise: ±1.96 / sqrt(n)
confidence_bound = 1.96 / np.sqrt(len(ar1_data))

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=("AR(1) Time Series", "Autocorrelation Function")
)

time_indices = np.arange(n_samples)
fig.add_trace(
    go.Scatter(
        x=time_indices,
        y=ar1_data,
        mode='lines',
        line=dict(color='steelblue', width=1.5),
        name='AR(1) Series'
    ),
    row=1, col=1
)

lags = np.arange(n_lags + 1)
for i, (lag, corr) in enumerate(zip(lags, autocorr_ar1)):
    fig.add_trace(
        go.Scatter(
            x=[lag, lag],
            y=[0, corr],
            mode='lines',
            line=dict(color='darkred', width=2),
            showlegend=False
        ),
        row=1, col=2
    )

fig.add_trace(
    go.Scatter(
        x=lags,
        y=autocorr_ar1,
        mode='markers',
        marker=dict(color='darkred', size=8),
        name='Empirical'
    ),
    row=1, col=2
)

theoretical_acf = phi ** lags
fig.add_trace(
    go.Scatter(
        x=lags,
        y=theoretical_acf,
        mode='lines',
        line=dict(color='orange', width=2, dash='dash'),
        name='Theoretical'
    ),
    row=1, col=2
)

fig.add_hline(y=confidence_bound, line=dict(color='gray', width=1, dash='dot'), row=1, col=2)
fig.add_hline(y=-confidence_bound, line=dict(color='gray', width=1, dash='dot'), row=1, col=2)

fig.update_xaxes(title_text='Time', row=1, col=1)
fig.update_yaxes(title_text='Value', row=1, col=1)
fig.update_xaxes(title_text='Lag', row=1, col=2)
fig.update_yaxes(title_text='Autocorrelation', row=1, col=2)

fig.update_layout(
    margin=dict(l=50, r=50, t=50, b=50),
    height=400,
    width=900,
    legend=dict(orientation='h', y=-0.2)
)
fig.show()




### Stationarity

A time series is **stationary** if its statistical properties remain constant over time. This concept is fundamental to time series analysis because many statistical methods and theoretical results depend on this assumption.

**Strict stationarity** requires that the joint distribution of any collection of observations is invariant to time shifts. However, in practice, we typically work with **weak stationarity** (also called covariance stationarity), which requires three conditions. First, the **mean must be constant**: $E[y_t] = \mu$ for all time points $t$, meaning the series doesn't exhibit trending behavior. Second, the **variance must be constant**: $\text{Var}(y_t) = \sigma^2$ for all $t$, indicating that the variability around the mean doesn't change over time. Third, the **covariance must be time-invariant**: $\text{Cov}(y_t, y_{t+h})$ depends only on the lag $h$, not on the specific time point $t$.

Stationarity matters for several crucial reasons in time series analysis. Many statistical methods, including classical forecasting techniques and some Bayesian models, assume stationarity for their theoretical validity. **Non-stationary series** can lead to spurious relationships where variables appear correlated simply because they both trend over time, even when no true causal relationship exists. Fortunately, many non-stationary series can be made stationary through appropriate **transformations**. Differencing removes trends by computing $y_t - y_{t-1}$, detrending removes systematic time-dependent patterns, and variance-stabilizing transformations like logarithms can address changing variability.
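
The effect of differencing can be checked with a small simulation. The sketch below is an illustration under assumed settings (the number of paths, the number of steps, and unit-variance Gaussian shocks are arbitrary choices): it simulates many random walks, measures how the variance across paths changes over time, and repeats the measurement after differencing.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch

n_paths, n_steps = 5000, 200
shocks = rng.normal(0.0, 1.0, size=(n_paths, n_steps))

# Each row is one random walk: y_t = y_{t-1} + shock_t
walks = np.cumsum(shocks, axis=1)

# Variance across paths at each time point; for a random walk it grows with t
var_over_time = walks.var(axis=0)

# First differences y_t - y_{t-1} recover the shocks (from the second step on)
diffed = np.diff(walks, axis=1)
var_diffed = diffed.var(axis=0)

print(f"Variance at t=50:  {var_over_time[49]:.1f}")
print(f"Variance at t=200: {var_over_time[-1]:.1f}")
print(f"Mean variance after differencing: {var_diffed.mean():.2f}")
```

The variance of the undifferenced walks grows roughly in proportion to time, which violates the constant-variance condition, whereas the differenced series has roughly constant variance at every time point.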

In [None]:
n_samples = 300
time_indices = np.arange(n_samples)

phi_stationary = 0.2
sigma = 1.0

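# Draw the first value from the AR(1) stationary distribution, whose standard
# deviation is sigma / sqrt(1 - phi^2); later values are filled in by the loop
stationary_series = np.zeros(n_samples)
stationary_series[0] = RNG.normal(0, sigma / np.sqrt(1 - phi_stationary**2))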
for t in range(1, n_samples):
    stationary_series[t] = phi_stationary * stationary_series[t-1] + RNG.normal(0, sigma)

phi_nonstationary = 1.0
nonstationary_series = np.zeros(n_samples)
nonstationary_series[0] = RNG.normal(0, sigma)

for t in range(1, n_samples):
    nonstationary_series[t] = phi_nonstationary * nonstationary_series[t-1] + RNG.normal(0, sigma)

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=(
        'Stationary Series (AR(1), φ=0.2)',
        'Non-Stationary Series (Random Walk, φ=1.0)'
    )
)

fig.add_trace(
    go.Scatter(x=time_indices, y=stationary_series, mode='lines',
               line=dict(color='steelblue', width=1.5), showlegend=False),
    row=1, col=1
)

fig.add_trace(
    go.Scatter(x=time_indices, y=nonstationary_series, mode='lines',
               line=dict(color='red', width=1.5), showlegend=False),
    row=1, col=2
)

fig.update_xaxes(title_text='Time', row=1, col=1)
fig.update_yaxes(title_text='Value', row=1, col=1)
fig.update_xaxes(title_text='Time', row=1, col=2)
fig.update_yaxes(title_text='Value', row=1, col=2)

fig.show()

## Exploring Time Series Components in the Births Dataset

Now let's apply our understanding of time series components to a dataset of daily birth counts from the United States. This classic dataset, featured in Gelman et al.'s *Bayesian Data Analysis*, provides an excellent case study for time series modeling because it exhibits multiple overlapping patterns that affect birth rates.

<img src="images/bda_cover.png" alt="BDA" width="50%">

We'll visualize the data and identify the different components that make up this time series.

In [None]:
# Load and explore the births dataset - a classic time series example
# Handle null values in the data
births_data = pl.read_csv('../data/births.csv', null_values=['null', 'NA', '', 'NULL'])

# Filter out rows with null days if any exist
births_data = births_data.filter(pl.col('day').is_not_null())

# Aggregate to monthly data
monthly_births = (births_data
    .group_by(['year', 'month'])
    .agg(pl.col('births').sum())
    .sort(['year', 'month'])
)

# Create valid dates using the first day of each month
monthly_births = monthly_births.with_columns([
    pl.date(pl.col('year'), pl.col('month'), 1).alias('date')
])

# Focus on the years from 1970 up to (at most) 1990 for clarity
births_subset = (monthly_births
    .filter((pl.col('year') >= 1970) & (pl.col('year') <= 1990))
    .with_row_index('index')
)

print(f"📈 Births Dataset Overview:")
print(f"   • Total months: {births_subset.height}")
print(f"   • Date range: {births_subset['year'].min()} to {births_subset['year'].max()}")
print(f"   • Monthly births range: {births_subset['births'].min():,} to {births_subset['births'].max():,}")
print(f"   • Average monthly births: {births_subset['births'].mean():.0f}")
print(f"   • Standard deviation: {births_subset['births'].std():.0f}")

# Display the first few observations
print(f"\n📋 First few observations:")
print(births_subset.select(['year', 'month', 'births']).head(10))

In [None]:
dates = births_subset['date'].to_list()
births_values = births_subset['births'].to_numpy()

monthly_avg = (births_subset
    .group_by('month')
    .agg(pl.col('births').mean())
    .sort('month')
    .select('births')
    .to_numpy().flatten()
)
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

yearly_data = (births_subset
    .group_by('year')
    .agg(pl.col('births').mean())
    .sort('year')
)
yearly_years = yearly_data.select('year').to_numpy().flatten()
yearly_avg = yearly_data.select('births').to_numpy().flatten()

def rolling_window(data, window):
    """Calculate rolling statistics using numpy"""
    shape = data.shape[:-1] + (data.shape[-1] - window + 1, window)
    strides = data.strides + (data.strides[-1],)
    rolled = np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides)
    return rolled

window = 12
pad_size = window // 2  # the centered window below spans 2 * pad_size + 1 = 13 months
rolling_mean = np.full(len(births_values), np.nan)
rolling_std = np.full(len(births_values), np.nan)

for i in range(pad_size, len(births_values) - pad_size):
    start_idx = i - pad_size
    end_idx = i + pad_size + 1
    window_data = births_values[start_idx:end_idx]
    rolling_mean[i] = np.mean(window_data)
    rolling_std[i] = np.std(window_data)

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        '📈 Monthly Births (1970-1988)',
        '🗓️ Average Births by Month',
        '📊 Average Births by Year',
        '📈 Trend and Variability'
    ),
    specs=[[{}, {}], [{}, {}]]
).add_trace(
    go.Scatter(
        x=dates,
        y=births_values,
        mode='lines',
        line=dict(color='steelblue', width=1.5),
        name='Monthly Births',
        showlegend=False
    ),
    row=1, col=1
).add_trace(
    go.Scatter(
        x=month_names,
        y=monthly_avg,
        marker_color='lightcoral',
        opacity=0.7,
        name='Average Births',
        showlegend=False
    ),
    row=1, col=2
).add_trace(
    go.Scatter(
        x=yearly_years,
        y=yearly_avg,
        mode='lines+markers',
        line=dict(color='darkgreen', width=2),
        marker=dict(size=6),
        name='Yearly Average',
        showlegend=False
    ),
    row=2, col=1
).add_trace(
    go.Scatter(
        x=dates,
        y=births_values,
        mode='lines',
        line=dict(color='lightblue', width=1),
        opacity=0.3,
        name='Original',
        legendgroup='trend'
    ),
    row=2, col=2
).add_trace(
    go.Scatter(
        x=dates,
        y=rolling_mean,
        mode='lines',
        line=dict(color='red', width=2),
        name='12-Month Trend',
        legendgroup='trend'
    ),
    row=2, col=2
).add_trace(
    go.Scatter(
        x=dates,
        y=rolling_mean + rolling_std,
        mode='lines',
        line=dict(width=0),
        showlegend=False,
        hoverinfo='skip'
    ),
    row=2, col=2
).add_trace(
    go.Scatter(
        x=dates,
        y=rolling_mean - rolling_std,
        mode='lines',
        line=dict(width=0),
        fill='tonexty',
        fillcolor='rgba(255,0,0,0.2)',
        name='±1 Std Dev',
        legendgroup='trend'
    ),
    row=2, col=2
).update_layout(
    height=700,
    title_text='Time Series Components Analysis',
    showlegend=False,
)

fig.update_yaxes(title_text='Number of Births', row=1, col=1)
fig.update_yaxes(title_text='Average Births', row=1, col=2)
fig.update_yaxes(title_text='Average Births', row=2, col=1)
fig.update_yaxes(title_text='Number of Births', row=2, col=2)
fig.update_xaxes(title_text='Year', row=2, col=1)

fig.show()

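We can perform a classical additive seasonal decomposition using pure NumPy. The trend is estimated with a simple centered moving average (nominally 12 months), the seasonal component is the average detrended value for each calendar month, and the residual is whatever remains after removing both.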

In [None]:

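def simple_moving_average(data, window=12):
    """Centered moving average for trend estimation.

    Each estimate averages window // 2 points on either side of the current
    point plus the point itself (13 values for window=12); the edges are NaN.
    """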
    trend = np.full_like(data, np.nan, dtype=float)
    half_window = window // 2
    
    for i in range(half_window, len(data) - half_window):
        trend[i] = np.mean(data[i - half_window:i + half_window + 1])
    
    return trend

trend_component = simple_moving_average(births_values, window=12)

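# Fill the undefined edges of the trend with its overall mean (a crude choice)
trend_filled = np.where(np.isnan(trend_component), np.nanmean(trend_component), trend_component)
detrended = births_values - trend_filled
# Force a float array: if births_values has an integer dtype, zeros_like would
# otherwise truncate the fractional seasonal effects assigned below
seasonal_component = np.zeros_like(births_values, dtype=float)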

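# Average the detrended values for each calendar month
# (assumes the series starts in January and has no missing months)
for month in range(12):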
    month_indices = np.arange(month, len(births_values), 12)
    month_values = detrended[month_indices]
    month_values_clean = month_values[~np.isnan(month_values)]
    if len(month_values_clean) > 0:
        seasonal_effect = np.mean(month_values_clean)
        seasonal_component[month_indices] = seasonal_effect

residual_component = births_values - trend_filled - seasonal_component

class SimpleDecomposition:
    def __init__(self, observed, trend, seasonal, resid):
        self.observed = observed
        self.trend = trend
        self.seasonal = seasonal
        self.resid = resid

decomp_add = SimpleDecomposition(births_values, trend_component, seasonal_component, residual_component)

fig = make_subplots(
    rows=4, cols=1,
    subplot_titles=(
        '📊 Original Data',
        '📈 Trend Component',
        '🗓️ Seasonal Component',
        '🎲 Residual Component'
    ),
    vertical_spacing=0.08
)

components_add = ['observed', 'trend', 'seasonal', 'resid']
colors = ['steelblue', 'darkred', 'darkgreen', 'purple']
y_labels = ['Births', 'Trend Level', 'Seasonal Effect', 'Residual']

for i, (comp, color, ylabel) in enumerate(zip(components_add, colors, y_labels)):
    data = getattr(decomp_add, comp)
    fig.add_trace(
        go.Scatter(
            x=dates,
            y=data,
            mode='lines',
            line=dict(color=color, width=1.5),
            name=comp.title(),
            showlegend=False
        ),
        row=i+1, col=1
    )
    
    fig.update_yaxes(title_text=ylabel, row=i+1, col=1)

fig.update_layout(
    height=800,
    title_text='Seasonal Decomposition of Monthly Births Data',
    showlegend=False
)

fig.show()

## Autocorrelation Analysis

Let's examine the temporal dependence in our births data by computing and visualizing the autocorrelation function (ACF). This will help us understand how past values influence future values.

In [None]:
max_lags = 36  
autocorr = acf(births_values, nlags=max_lags, fft=True)

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=(
        '📊 Autocorrelation Function (ACF)',
        '🗓️ Seasonal Autocorrelations'
    ),
    horizontal_spacing=0.1
)

lags = np.arange(max_lags + 1)

for i, (lag, corr) in enumerate(zip(lags, autocorr)):
    fig.add_trace(
        go.Scatter(
            x=[lag, lag],
            y=[0, corr],
            mode='lines',
            line=dict(color='steelblue', width=2),
            showlegend=False
        ),
        row=1, col=1
    )
    
fig.add_trace(
    go.Scatter(
        x=lags,
        y=autocorr,
        mode='markers',
        marker=dict(color='steelblue', size=6),
        name='ACF',
        showlegend=False
    ),
    row=1, col=1
)

fig.add_hline(y=0, line_color='black', line_width=1, opacity=0.3, row=1, col=1)
# Reference lines at ±0.2: a fixed visual threshold, not a computed confidence bound
fig.add_hline(y=0.2, line_color='red', line_dash='dash', opacity=0.5, row=1, col=1)
fig.add_hline(y=-0.2, line_color='red', line_dash='dash', opacity=0.5, row=1, col=1)

seasonal_lags = [12, 24, 36]  # 1, 2, 3 years
seasonal_autocorr = [autocorr[lag] for lag in seasonal_lags]

fig.add_trace(
    go.Bar(
        x=['1 Year', '2 Years', '3 Years'],
        y=seasonal_autocorr,
        marker_color='lightcoral',
        opacity=0.7,
        name='Seasonal ACF',
        showlegend=False
    ),
    row=1, col=2
)

fig.add_hline(y=0, line_color='black', line_width=1, opacity=0.3, row=1, col=2)

fig.update_layout(
    height=400,
    title_text='Autocorrelation Analysis',
    showlegend=False
)

fig.update_xaxes(title_text='Lag (months)', row=1, col=1)
fig.update_yaxes(title_text='Autocorrelation', row=1, col=1)
fig.update_xaxes(title_text='Lag', row=1, col=2)
fig.update_yaxes(title_text='Autocorrelation', row=1, col=2)

fig.show()

## Data Preprocessing Techniques

Before building Bayesian models, proper data preprocessing is essential for ensuring reliable and interpretable results. This section demonstrates key preprocessing techniques that prepare time series data for effective modeling.

### Why Preprocessing Matters

Time series preprocessing serves several critical purposes that directly impact the success of Bayesian modeling. **Numerical stability** is perhaps the most important consideration—standardization helps MCMC samplers converge more reliably by ensuring that all variables operate on similar scales, preventing numerical overflow or underflow issues that can cause sampling algorithms to fail.

**Prior specification** becomes much more intuitive with normalized data. When variables are standardized to have zero mean and unit variance, it's easier to specify reasonable prior distributions since we know the approximate scale of the data. **Model interpretation** is also enhanced because standardized coefficients can be directly compared in terms of their relative importance, and the magnitude of effects becomes more meaningful.

Finally, **computational efficiency** improves significantly with well-scaled data. MCMC algorithms explore the parameter space more effectively when the posterior distribution is well-conditioned, leading to faster sampling and better mixing of the chains.

### Common Preprocessing Techniques

Several preprocessing techniques are commonly used in time series analysis, each serving specific purposes. **Standardization** using the formula $(x - \mu) / \sigma$ centers data at zero with unit variance, making it the most popular choice for Bayesian modeling. **Min-Max normalization** scales data to the [0,1] range using $(x - \min) / (\max - \min)$, which is useful when you need bounded variables.

**Log transformation** $\log(x)$ serves multiple purposes: it stabilizes variance in series with changing variability, handles exponential growth patterns, and can make multiplicative relationships additive. **Differencing** computes $x_t - x_{t-1}$ to remove trends and induce stationarity, which is essential for many time series models. Finally, **seasonal decomposition** separates the series into trend, seasonal, and residual components, allowing you to model each component separately or remove seasonal effects before modeling.
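
One practical point about standardization is that results on the standardized scale usually need to be mapped back to the original units. The sketch below shows one way to do that. The helper names `standardize` and `unstandardize` are hypothetical, and the example values are made up for illustration rather than taken from the births dataset.

```python
import numpy as np

def standardize(x):
    """Return (z, mean, std) so that z has zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std

def unstandardize(z, mean, std):
    """Map standardized values back to the original scale."""
    return np.asarray(z, dtype=float) * std + mean

# Made-up monthly counts, used only to exercise the helpers
counts = np.array([310_000.0, 295_000.0, 330_000.0, 342_000.0, 318_000.0])

z, mean, std = standardize(counts)
restored = unstandardize(z, mean, std)

print(f"Standardized mean: {z.mean():.3f}, std: {z.std():.3f}")
print(f"Round trip matches original: {np.allclose(restored, counts)}")
```

Keeping the mean and standard deviation alongside the transformed series means that forecasts produced on the standardized scale can later be reported in the original units.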

In [None]:
original_data = births_subset['births'].to_numpy()

# 1. Standardization (most common for Bayesian modeling)
standardized = (original_data - original_data.mean()) / original_data.std()

# 2. Min-Max normalization  
min_max_norm = (original_data - original_data.min()) / (original_data.max() - original_data.min())

# 3. Log transformation
log_transform = np.log(original_data)

print("📊 Preprocessing Results:")
print(f"   • Original data: mean={original_data.mean():.0f}, std={original_data.std():.0f}")
print(f"   • Standardized: mean={standardized.mean():.3f}, std={standardized.std():.3f}")
print(f"   • Min-Max norm: min={min_max_norm.min():.3f}, max={min_max_norm.max():.3f}")
print(f"   • Log transform: mean={log_transform.mean():.3f}, std={log_transform.std():.3f}")

births_standardized = standardized
print(f"\n✅ **Selected preprocessing**: Standardized data (mean=0, std=1)")
print(f"   This choice provides:")
print(f"   • Numerical stability for MCMC sampling")
print(f"   • Easy interpretation of parameters")
print(f"   • Natural scale for prior specification")

---


In [None]:
%load_ext watermark
%watermark -n -u -v -iv -w