# Forecasting Mechanisms: Multi-step Ahead and Probabilistic Forecasting

In this lab, you will explore different strategies for multi-step ahead forecasting
and implement probabilistic forecasting models. You will learn about:
- Direct vs. autoregressive forecasting approaches
- Curriculum learning and scheduled sampling for autoregressive models
- Probabilistic losses (Gaussian NLL, quantile loss)
- Uncertainty visualization

In [None]:
import math
import numpy
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

## Part 1: Direct vs. Autoregressive Forecasting

In this section, you will compare two approaches to multi-step ahead forecasting:
1. **Direct forecasting**: Predict all future steps in one forward pass
2. **Autoregressive forecasting**: Iteratively predict one step at a time
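
To make the second approach concrete, here is a minimal, framework-free sketch of an autoregressive rollout. The names `autoregressive_rollout` and `mean_step` are hypothetical and do not appear elsewhere in this lab; `mean_step` is only a stand-in for a learned one-step model.

```python
from typing import Callable, List


def autoregressive_rollout(past: List[float],
                           step_fn: Callable[[List[float]], float],
                           horizon: int) -> List[float]:
    """Apply a one-step predictor `horizon` times, feeding back its own outputs."""
    window = list(past)
    preds = []
    for _ in range(horizon):
        next_value = step_fn(window)
        preds.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction
        window = window[1:] + [next_value]
    return preds


def mean_step(window: List[float]) -> float:
    """Hypothetical one-step predictor: the mean of the current window."""
    return sum(window) / len(window)


preds = autoregressive_rollout([1.0, 2.0, 3.0], mean_step, horizon=3)
print(preds)
```

A direct forecaster would instead map the three past values to all three future values in a single call, so its errors are not fed back into later inputs.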

Below you will find a data loader for the univariate ETTh1 dataset used in previous labs,
a direct forecasting model, and a training loop.

In [None]:
class ForecastingDataset(torch.utils.data.Dataset):
    """Windowed univariate forecasting dataset."""

    def __init__(self, 
                 csv_path: str, 
                 window: int, 
                 horizon: int, 
                 target_col: int = -1,
                 start: int = 0,
                 end: int = None):
        super().__init__()
        raw = numpy.loadtxt(csv_path, delimiter=",", skiprows=1, usecols=target_col)
        series = raw.astype(numpy.float32)
        if end is None:
            series = series[start:]
        else:
            series = series[start:end]
        self.window = window
        self.horizon = horizon
        self.series = series
        self.max_start = len(series) - window - horizon + 1
        if self.max_start < 1:
            raise ValueError("Window + horizon larger than available series length")

    def __len__(self):
        return self.max_start

    def __getitem__(self, idx: int):
        start = idx
        past = self.series[start : start + self.window]
        future = self.series[start + self.window : start + self.window + self.horizon]
        past = torch.from_numpy(past)  # shape: (window,)
        future = torch.from_numpy(future)  # shape: (horizon,)
        return past, future


def build_dataloader(csv_path: str, 
                     window: int, 
                     horizon: int, 
                     batch_size: int = 32, 
                     shuffle: bool = True):
    """Create training and validation DataLoaders emitting `(past, future)` batches."""
    dataset = ForecastingDataset(csv_path=csv_path, 
                                 window=window, 
                                 horizon=horizon)
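    # Chronological split: roughly the first 80% of the series is used for training
    # and the remainder for validation (`n` counts windows, so the split is approximate).
    n = len(dataset)
    n_train = int(0.8 * n)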
    train_dataset = ForecastingDataset(csv_path=csv_path, 
                                      window=window, 
                                      horizon=horizon,
                                      end=n_train)
    valid_dataset = ForecastingDataset(csv_path=csv_path, 
                                       window=window, 
                                       horizon=horizon,
                                       start=n_train)
    train_dl = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=batch_size, 
                                           shuffle=shuffle, 
                                           drop_last=False)
    valid_dl = torch.utils.data.DataLoader(valid_dataset,
                                           batch_size=batch_size, 
                                           shuffle=False, 
                                           drop_last=False)
    return train_dl, valid_dl


# Direct forecasting model: predicts all steps at once
class DirectForecaster(nn.Module):
    """Direct forecasting: predict all horizon steps in one forward pass."""
    
    def __init__(self, window: int, horizon: int, hidden_dim: int = 64):
        super().__init__()
        self.window = window
        self.horizon = horizon
        self.net = nn.Sequential(
            nn.Linear(window, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, horizon)
        )
    
    def forward(self, past):
        # past: (batch, window)
        # output: (batch, horizon)
        return self.net(past)


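# Training and evaluation functions
# Note: these reference `AutoregressiveForecaster`, defined in Question 1 below;
# that cell must be run before any of these functions is called.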
def train_epoch(model, dataloader, optimizer, criterion, use_teacher_forcing=False):
    model.train()
    total_loss = 0.0
    for past, future in dataloader:
        optimizer.zero_grad()
        if isinstance(model, AutoregressiveForecaster):
            pred = model(past, use_ground_truth=use_teacher_forcing, ground_truth=future)
        else:
            pred = model(past)
        loss = criterion(pred, future)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * past.size(0)
    return total_loss / len(dataloader.dataset)


@torch.no_grad()
def eval_epoch(model, dataloader, criterion):
    model.eval()
    total_loss = 0.0
    for past, future in dataloader:
        if isinstance(model, AutoregressiveForecaster):
            pred = model(past, use_ground_truth=False)
        else:
            pred = model(past)
        loss = criterion(pred, future)
        total_loss += loss.item() * past.size(0)
    return total_loss / len(dataloader.dataset)


def train_and_valid_loop(model, train_dl, valid_dl, optimizer, criterion, n_epochs, 
                         use_teacher_forcing=False):
    logs = {"train_loss": [], "valid_loss": []}
    print(model.__class__.__name__)
    for epoch in range(n_epochs):
        train_loss = train_epoch(model, train_dl, optimizer, criterion, use_teacher_forcing)
        logs["train_loss"].append(train_loss)
        valid_loss = eval_epoch(model, valid_dl, criterion)
        logs["valid_loss"].append(valid_loss)
        print(f"Epoch {epoch:02d} | train={train_loss:.4f} | valid={valid_loss:.4f}")
    return logs



**Question 1.** Implement an autoregressive forecasting model using the code template below.
The model should support:
- teacher forcing (controlled by `use_ground_truth`)
- injection of ground-truth values into the sliding window fed as input to the model
- curriculum learning (which could be implemented by providing only the first
  few future steps in the `ground_truth` tensor)
- scheduled sampling, controlled by a probability `p`

In [None]:
# Autoregressive forecasting model: predicts one step at a time
class AutoregressiveForecaster(nn.Module):
    """Autoregressive forecasting: iteratively predict one step at a time."""
    
    def __init__(self, window: int, horizon: int, hidden_dim: int = 64):
        super().__init__()
        self.window = window
        self.horizon = horizon
        # Model that predicts one step ahead
        self.step_model = nn.Sequential(
            nn.Linear(window, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1)
        )
    
    def forward(self, past, use_ground_truth=False, ground_truth=None, p=None):
        predictions = []
        current_window = past.clone()  # (batch, window)
        
        for h in range(self.horizon):
            # TODO: your code goes here
            pass
            
            # Shift window: remove first element, add prediction
            # current_window = torch.cat([current_window[:, 1:], next_value], dim=1)
        
        # Each entry of `predictions` is expected to have shape (batch,)
        return torch.stack(predictions, dim=1)  # (batch, horizon)

## Part 2: Curriculum Learning for Autoregressive Models

Curriculum learning gradually increases the prediction horizon during training,
helping the model learn to handle its own predictions progressively.

**Question 2.** Implement a curriculum learning training loop for the autoregressive model:
- Start by training to predict 1 step ahead
- Gradually increase to 2, 4, 8, ... steps ahead
- Finally train on the full horizon
- Compare with the baseline autoregressive model
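
One possible way to generate the sequence of training horizons described above is sketched below. The helper name `curriculum_horizons` is an assumption made for illustration; the lab does not prescribe how the stages are produced.

```python
from typing import List


def curriculum_horizons(full_horizon: int) -> List[int]:
    """Return the stages 1, 2, 4, ... followed by `full_horizon` itself."""
    if full_horizon < 1:
        raise ValueError("full_horizon must be >= 1")
    stages = []
    h = 1
    while h < full_horizon:
        stages.append(h)
        h *= 2  # double the horizon at each stage
    stages.append(full_horizon)  # always finish on the full horizon
    return stages


print(curriculum_horizons(24))  # [1, 2, 4, 8, 16, 24]
```

At a stage with horizon `h`, one option is to restrict the loss to the first `h` predicted steps and pass `future[:, :h]` as `ground_truth`, training for a few epochs before moving to the next stage.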

## Part 3: Scheduled Sampling

Scheduled sampling randomly replaces ground truth with model predictions during
training, with the probability of using predictions increasing over time.

**Question 3.** Implement scheduled sampling for the autoregressive model, where `p` is
the probability of feeding the ground truth (rather than the prediction) at each step:
- Start with `p = 1.0` (always use ground truth)
- Gradually decrease `p` to `0.0` (always use predictions)
- Compare with the baseline autoregressive model
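
The two ingredients of scheduled sampling can be sketched as follows, assuming a linear decay of `p` over epochs and an independent coin flip per sample. Both choices, and the helper names `teacher_forcing_prob` and `mix_ground_truth`, are assumptions for illustration; other decay schedules are equally valid.

```python
import torch


def teacher_forcing_prob(epoch: int, n_epochs: int) -> float:
    """Linearly decay p from 1.0 at the first epoch to 0.0 at the last one."""
    if n_epochs <= 1:
        return 0.0
    return 1.0 - epoch / (n_epochs - 1)


def mix_ground_truth(pred: torch.Tensor, truth: torch.Tensor, p: float) -> torch.Tensor:
    """For each sample, keep the ground truth with probability p, else the prediction.

    pred, truth: (batch, 1) tensors holding the value for the current step.
    """
    use_truth = torch.rand(pred.shape[0], 1, device=pred.device) < p
    return torch.where(use_truth, truth, pred)


pred = torch.zeros(4, 1)
truth = torch.ones(4, 1)
print(mix_ground_truth(pred, truth, p=teacher_forcing_prob(epoch=2, n_epochs=5)))
```

The mixed tensor is what would be appended to the sliding window before the next step; the loss is still computed on the predictions themselves.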

## Part 4: Probabilistic Forecasting with Gaussian NLL

Instead of predicting point estimates, probabilistic models predict distributions.
We'll start with a Gaussian distribution parameterized by mean and variance.

**Question 4.** Implement a probabilistic forecasting model that outputs both mean
and (log-)variance, and train it using negative log-likelihood (NLL) loss.
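
For reference, the Gaussian NLL with a log-variance parameterization can be written as below. This is a minimal sketch, not a required implementation: the function name `gaussian_nll` is an assumption, and it averages over all batch and horizon entries.

```python
import math

import torch


def gaussian_nll(mean: torch.Tensor, log_var: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Mean negative log-likelihood of `target` under N(mean, exp(log_var)).

    Per element: 0.5 * (log(2*pi) + log_var + (target - mean)^2 / exp(log_var)).
    """
    squared_error = (target - mean) ** 2
    nll = 0.5 * (math.log(2 * math.pi) + log_var + squared_error / torch.exp(log_var))
    return nll.mean()


mean = torch.zeros(2, 3)
log_var = torch.zeros(2, 3)  # variance of 1 everywhere
print(gaussian_nll(mean, log_var, torch.ones(2, 3)))
```

Predicting the log-variance keeps the variance positive without any constraint on the network output. PyTorch also provides `nn.GaussianNLLLoss`, which expects the variance itself rather than its logarithm.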

**Question 5.** Visualize the probabilistic forecasts with uncertainty intervals.
Show the mean prediction along with prediction intervals (e.g., 50% and 90%).
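
Under the Gaussian assumption, a central interval follows directly from the predicted mean and standard deviation. The sketch below uses only the standard library; the helper name `gaussian_interval` is hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def gaussian_interval(mean: float, std: float, level: float) -> Tuple[float, float]:
    """Central interval of N(mean, std^2) containing a fraction `level` of the mass."""
    # Standard normal quantile leaving (1 - level) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std, mean + z * std


for level in (0.5, 0.9):
    low, high = gaussian_interval(mean=0.0, std=1.0, level=level)
    print(f"{int(level * 100)}% interval: [{low:.3f}, {high:.3f}]")
```

If the model outputs a log-variance, the standard deviation is `exp(0.5 * log_var)`. The resulting bands can be shaded around the mean curve, for instance with `plt.fill_between`.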

## Part 5: Quantile Regression

Quantile regression predicts multiple quantiles simultaneously, providing another
approach to uncertainty quantification without distributional assumptions.

**Question 6.** Implement quantile regression for forecasting, using the implementation of the quantile loss given below:
- Predict multiple quantiles (e.g., 10th, 50th, 90th percentiles)
- Visualize the quantile predictions
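
To see why the pinball loss targets a given quantile, it helps to evaluate it on single numbers. The scalar helper `pinball` below is a hypothetical illustration, separate from the batched loss used in this lab.

```python
def pinball(target: float, prediction: float, q: float) -> float:
    """Pinball loss for one prediction of the q-th quantile."""
    error = target - prediction
    return max(q * error, (q - 1) * error)


# For q = 0.9, predicting too low costs far more than predicting too high
under = pinball(target=10.0, prediction=8.0, q=0.9)   # approximately 1.8
over = pinball(target=10.0, prediction=12.0, q=0.9)   # approximately 0.2
print(under, over)
```

The asymmetry pushes the 0.9-quantile prediction upward until roughly 90% of targets fall below it; for `q = 0.5` the loss is symmetric and equals half the absolute error.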

In [None]:
def quantile_loss(predictions, target, quantiles):
    """
    Quantile loss (pinball loss).
    
    Args:
        predictions: (batch, horizon, n_quantiles) - predicted quantiles
        target: (batch, horizon) - ground truth values
        quantiles: list of quantile values (e.g., [0.1, 0.5, 0.9])
    Returns:
        loss: scalar
    """
    target = target.unsqueeze(-1)  # (batch, horizon, 1)
    errors = target - predictions  # (batch, horizon, n_quantiles)
    
    quantiles_tensor = torch.tensor(quantiles, device=predictions.device).view(1, 1, -1)
    
    loss = torch.max(
        quantiles_tensor * errors,
        (quantiles_tensor - 1) * errors
    )
    return loss.mean()