# Chapter 01: Getting Started

This notebook introduces the vangja time series forecasting package using the classic Air Passengers dataset (similar to how Facebook Prophet tutorials begin).

## In This Notebook

We cover the fundamental concepts of vangja:
1. **Loading data** using vangja's built-in dataset functions
2. **Building models** by composing trend and seasonality components
3. **Additive vs multiplicative** models and when to use each
4. **Evaluating** forecasts with standard metrics

## Setup and Imports

In [None]:
import warnings

warnings.filterwarnings("ignore")

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from vangja import FourierSeasonality, LinearTrend
from vangja.datasets import load_air_passengers
from vangja.utils import metrics

# Set random seed for reproducibility
np.random.seed(42)

print("Imports successful!")

---

## 1. Load Air Passengers Dataset

The Air Passengers dataset is a classic time series dataset containing monthly totals of international airline passengers from 1949 to 1960.

Vangja provides convenience functions in `vangja.datasets` to load common datasets in the expected format (columns: `ds` for datetime, `y` for target values).
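
If you bring your own data instead of a built-in dataset, it needs the same two columns. The following is a minimal sketch using only pandas; the raw column names `month` and `passengers` are hypothetical, and the three values are purely illustrative.

```python
import pandas as pd

# Hypothetical raw data with arbitrary column names.
raw = pd.DataFrame(
    {
        "month": ["1949-01", "1949-02", "1949-03"],
        "passengers": [112, 118, 132],
    }
)

# Rename to the expected column names: `ds` (datetime) and `y` (target).
df = raw.rename(columns={"month": "ds", "passengers": "y"})

# Parse the date strings and store the target as floats.
df["ds"] = pd.to_datetime(df["ds"], format="%Y-%m")
df["y"] = df["y"].astype(float)

print(df.dtypes)
```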

In [None]:
# Load Air Passengers dataset using vangja.datasets
air_passengers = load_air_passengers()

print(f"Dataset shape: {air_passengers.shape}")
print(f"Date range: {air_passengers['ds'].min()} to {air_passengers['ds'].max()}")
air_passengers.head()

In [None]:
# Visualize the data
plt.figure(figsize=(14, 5))
plt.plot(air_passengers["ds"], air_passengers["y"])
plt.title("Air Passengers Dataset")
plt.xlabel("Date")
plt.ylabel("Number of Passengers (thousands)")
plt.grid(True)
plt.show()

---

## 2. Train/Test Split

We hold out the last 12 months of data as a test set. This lets us evaluate how well the model extrapolates beyond the training period.

In [None]:
# Split data: use last 12 months for testing
train = air_passengers[:-12].copy()
test = air_passengers[-12:].copy()

print(
    f"Training set: {train['ds'].min()} to {train['ds'].max()} ({len(train)} samples)"
)
print(f"Test set: {test['ds'].min()} to {test['ds'].max()} ({len(test)} samples)")
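
The positional slice above relies on the rows being sorted and evenly spaced. As an alternative, here is a sketch of a date-based split in plain pandas. The frame `df` is a stand-in with placeholder values, not the real dataset, and the names `cutoff`, `train_part`, and `test_part` are illustrative.

```python
import pandas as pd

# Stand-in frame: 144 monthly timestamps with placeholder values.
df = pd.DataFrame(
    {
        "ds": pd.date_range("1949-01-01", periods=144, freq="MS"),
        "y": [float(i) for i in range(144)],
    }
)

# Everything after the cutoff (the final 12 months) becomes the test set.
cutoff = df["ds"].max() - pd.DateOffset(months=12)
train_part = df[df["ds"] <= cutoff].copy()
test_part = df[df["ds"] > cutoff].copy()

print(len(train_part), len(test_part))
```

Splitting by date keeps the test window well defined even if rows are missing or unsorted.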

---

## 3. Model Air Passengers like Facebook Prophet

Facebook Prophet models time series as a sum of interpretable components:
- **Trend** component (piecewise linear or logistic growth)
- **Seasonality** component (Fourier series)
- **Holiday effects** (optional)

For the Air Passengers dataset, we observe:
- A clear upward trend
- Strong yearly seasonality
- Multiplicative seasonality (the seasonal amplitude increases with the level)

Vangja uses operator overloading to compose models from these building blocks:

| Operator | Meaning | Formula |
|----------|---------|--------|
| `+` | Additive | $y = \text{left} + \text{right}$ |
| `**` | Multiplicative (Prophet-style) | $y = \text{left} \cdot (1 + \text{right})$ |
| `*` | Simple multiplicative | $y = \text{left} \cdot \text{right}$ |
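
To make the three formulas concrete, here is a small numeric sketch in plain NumPy. It does not call vangja; it only evaluates the formulas from the table on toy arrays, and the variable names are illustrative.

```python
import numpy as np

trend = np.array([100.0, 200.0, 400.0])  # g(t): the level doubles each step
seasonal = np.array([0.1, 0.1, 0.1])     # s(t): the same seasonal term each step

additive = trend + seasonal              # formula for `+`
prophet_mult = trend * (1 + seasonal)    # formula for `**`
simple_mult = trend * seasonal           # formula for `*`

print("additive:     ", additive)
print("prophet-style:", prophet_mult)
print("simple:       ", simple_mult)
```

With the additive formula the seasonal term shifts every point by the same absolute amount, while with the Prophet-style formula the same term shifts each point by 10% of its level.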

### 3.1 Additive Model

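An additive model assumes the final value is the sum of its components: $y(t) = g(t) + s(t) + \epsilon$. Here, `LinearTrend()` captures the upward growth and the two `FourierSeasonality()` terms capture repeating patterns with yearly (`period=365.25`) and weekly (`period=7`) periods, measured in days. Note that monthly observations cannot reliably identify a 7-day cycle, so the weekly term here mainly illustrates how several seasonalities are composed.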
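
For intuition, a Fourier seasonality of order $N$ with period $P$ is commonly written as $s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)$. The sketch below builds the corresponding sine and cosine features in NumPy. The helper `fourier_features` is hypothetical and is not part of vangja's API; vangja's internal feature layout may differ.

```python
import numpy as np


def fourier_features(t_days, period, series_order):
    """Return a (len(t_days), 2 * series_order) matrix of sin/cos terms.

    Hypothetical helper for illustration only.
    """
    t = np.asarray(t_days, dtype=float)
    n = np.arange(1, series_order + 1)
    # One column per harmonic n: angle = 2 * pi * n * t / period
    angles = 2.0 * np.pi * np.outer(t, n) / period
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)


t = np.arange(730)  # two years of daily time steps
X = fourier_features(t, period=365.25, series_order=10)
print(X.shape)
```

The seasonal component is then a weighted sum of these columns, with the weights $a_n$ and $b_n$ learned during fitting. A higher `series_order` allows sharper seasonal shapes at the cost of more parameters.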

In [None]:
# Define an additive model: Trend + Yearly Seasonality + Weekly Seasonality
model_additive = (
    LinearTrend()
    + FourierSeasonality(period=365.25, series_order=10)
    + FourierSeasonality(period=7, series_order=3)
)

print(f"Model: {model_additive}")

In [None]:
# Fit the additive model
model_additive.fit(train)
print("Additive model fitted!")

In [None]:
# Predict
future_additive = model_additive.predict(horizon=365, freq="D")
print(f"Predictions shape: {future_additive.shape}")
future_additive.head()

In [None]:
# Plot results
plt.figure(figsize=(14, 5))
plt.plot(train["ds"], train["y"], "b.", label="Training data", markersize=3)
plt.plot(test["ds"], test["y"], "g.", label="Test data", markersize=3)
plt.plot(
    future_additive["ds"],
    future_additive["yhat_0"],
    "r-",
    label="Prediction",
    linewidth=1,
)
plt.title("Additive Model: Air Passengers")
plt.xlabel("Date")
plt.ylabel("Number of Passengers (thousands)")
plt.legend()
plt.grid(True)
plt.show()

In [None]:
model_additive.plot(future_additive, y_true=test)
plt.tight_layout()
plt.show()

### 3.2 Multiplicative Model

The Air Passengers data shows **multiplicative seasonality**: the amplitude of the seasonal fluctuations grows with the trend level. A multiplicative model captures this via $y(t) = g(t) \cdot (1 + s(t)) + \epsilon$.

In vangja, the `**` operator creates this multiplicative relationship.
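
One rough way to check whether seasonality is multiplicative is to compare the yearly swing (maximum minus minimum) with the yearly mean. The sketch below does this on a synthetic series built from the formula above; the data and the column names `swing` and `relative_swing` are illustrative assumptions, not output from vangja.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend times (1 + seasonal term).
ds = pd.date_range("2000-01-01", periods=72, freq="MS")
t = np.arange(72)
trend = 100.0 + 5.0 * t
seasonal = 0.2 * np.sin(2 * np.pi * t / 12)
df = pd.DataFrame({"ds": ds, "y": trend * (1 + seasonal)})

# Per-year level and swing.
yearly = df.groupby(df["ds"].dt.year)["y"].agg(["mean", "min", "max"])
yearly["swing"] = yearly["max"] - yearly["min"]
yearly["relative_swing"] = yearly["swing"] / yearly["mean"]

print(yearly[["mean", "swing", "relative_swing"]])
```

If the absolute swing grows with the mean while the relative swing stays roughly flat, a multiplicative composition is a reasonable first choice. The same grouping could be applied to `air_passengers`.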

In [None]:
# Define a multiplicative model
model_mult = LinearTrend() ** (
    FourierSeasonality(period=365.25, series_order=10)
    + FourierSeasonality(period=7, series_order=3)
)

print(f"Model: {model_mult}")

In [None]:
# Fit the multiplicative model
model_mult.fit(train)
print("Multiplicative model fitted!")

We plot the results to see whether the multiplicative seasonality better captures how the seasonal swings grow with the trend.

In [None]:
# Predict
future_mult = model_mult.predict(horizon=365, freq="D")

# Plot results
plt.figure(figsize=(14, 5))
plt.plot(train["ds"], train["y"], "b.", label="Training data", markersize=3)
plt.plot(test["ds"], test["y"], "g.", label="Test data", markersize=3)
plt.plot(
    future_mult["ds"], future_mult["yhat_0"], "r-", label="Prediction", linewidth=1
)
plt.title("Multiplicative Model: Air Passengers")
plt.xlabel("Date")
plt.ylabel("Number of Passengers (thousands)")
plt.legend()
plt.grid(True)
plt.show()

In [None]:
model_mult.plot(future_mult, y_true=test)
plt.tight_layout()
plt.show()

### 3.3 Metrics Comparison

We compare standard forecasting metrics between the additive and multiplicative models. Lower values are better for all metrics: MSE (mean squared error), RMSE (root mean squared error), MAE (mean absolute error), and MAPE (mean absolute percentage error).
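
For reference, here is a minimal NumPy sketch of these four metrics, including one way to align a daily forecast with monthly observations by merging on `ds`. The helper `forecast_errors` and the toy frames are hypothetical; `vangja.utils.metrics` may align, scale, or name its outputs differently (for example, MAPE is returned here as a fraction, not a percentage).

```python
import numpy as np
import pandas as pd


def forecast_errors(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and MAPE (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err**2)
    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(err)),
        # MAPE is undefined when y_true contains zeros.
        "mape": np.mean(np.abs(err / y_true)),
    }


# Toy monthly observations and a toy daily forecast.
actual = pd.DataFrame(
    {"ds": pd.to_datetime(["2020-01-01", "2020-02-01"]), "y": [100.0, 200.0]}
)
forecast = pd.DataFrame(
    {
        "ds": pd.date_range("2020-01-01", "2020-02-01", freq="D"),
        "yhat_0": np.linspace(110.0, 180.0, 32),
    }
)

# Keep only the forecast rows whose dates have an observation.
aligned = actual.merge(forecast, on="ds", how="inner")
scores = forecast_errors(aligned["y"], aligned["yhat_0"])
print(scores)
```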

In [None]:
metrics_additive = metrics(test, future_additive, "complete")
print("Additive Model Metrics:")
display(metrics_additive)

In [None]:
metrics_mult = metrics(test, future_mult, "complete")
print("Multiplicative Model Metrics:")
display(metrics_mult)

---

## Summary

In this chapter, we introduced the core modeling pattern of vangja using the Air Passengers dataset:
1. **Additive model** (`+`): Combines trend and seasonality as $y = g(t) + s(t) + \epsilon$. Works well when the seasonal amplitude is roughly constant over time.
2. **Multiplicative model** (`**`): Combines trend and seasonality as $y = g(t) \cdot (1 + s(t)) + \epsilon$. Better suited when the seasonal amplitude grows proportionally with the level, as we see in the Air Passengers data.

The multiplicative model generally produces better forecasts for this dataset because the seasonal swings grow with the number of passengers over the years, which an additive seasonal term of fixed amplitude cannot reproduce.

### What's Next

In **Chapter 02**, we explore the different Bayesian inference algorithms available in vangja — MAP, Variational Inference, and MCMC — and compare their speed, accuracy, and uncertainty quantification capabilities.