# Part 1: Forecasting using weather station data

In this notebook, we build a forecasting model for the selected weather station data.

**Goal**: Predict Temperature for the **next 12 hours** using **24 hours** of history.

We will explore three model architectures:

1.  **Linear Regression (The Baseline)**:
    *   *Theory*: "Tomorrow is just a weighted sum of the past."
    *   *Pros*: Fast, Interpretable.
    *   *Cons*: Limited power; cannot learn complex interactions (e.g., "If wind is high AND sun is out, temp rises fast").

2.  **MLP (Multi-Layer Perceptron)**:
    *   *Theory*: "Non-linear pattern matching." Adds hidden layers and activation functions (ReLU) to bend the lines.
    *   *Pros*: Can model complex curves.
    *   *Cons*: Treats the history as a flat vector. It doesn't inherently understand that Hour 1 came before Hour 2.

3.  **LSTM (Long Short-Term Memory)**:
    *   *Theory*: "Sequence Modeling." Processes data step-by-step, maintaining an internal "state" or memory.
    *   *Pros*: Designed for sequential data; processes observations in time order and can capture long-term dependencies.
    *   *Cons*: Slower to train.


## 1. Setup & Data Preparation


In [None]:
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    # google.colab is only importable inside a Colab runtime
    IN_COLAB = False

if IN_COLAB:
    print("Running on Google Colab. Installing dependencies...")
    !pip install lightning
else:
    print("Running locally. Skipping dependency installation.")

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

### Load Data
We load the processed hourly weather data for SFO.

In [None]:
data_path = '../data/processed/sfo_hourly_2022_2026.csv'
data = pd.read_csv(data_path, index_col='time', parse_dates=True)

print(f"Loaded {len(data)} rows.")
data.head()


Before feeding data into models, we must decide what the model should "see".

*   **Temperature (temp)**: Driven by solar radiation and air mass movements.
*   **Humidity (rhum)**: Critical for cloud formation and thermal comfort.
*   **Pressure (pres)**: Large-scale high/low pressure systems drive wind and weather patterns.
*   **Wind Speed (wspd)**: Advection (moving heat around).


In [None]:
INPUT_FEATURES = [
    'pres', # Pressure
    'rhum', # Humidity
    'wspd', # Wind Speed
    'temp', # Temperature
]

TARGET_FEATURE = 'temp' # We want to forecast Temperature

# Configuration for Time Sequence
HISTORY_HOURS = 24
FORECAST_HOURS = 12

input_dim = len(INPUT_FEATURES)
print(f"Inputs ({input_dim} features): {INPUT_FEATURES}")
print(f"Target: {TARGET_FEATURE}")


### Time-Based Splitting & Scaling

We cannot split time series data randomly: that would leak information from the future into training. Instead we split chronologically, and we fit the Scaler ONLY on the training set to simulate a real-world scenario (we don't know the future range of values).
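
To make the train-only scaling rule concrete, here is a small NumPy sketch of the arithmetic that standardization performs. The temperature values are made up for illustration, and the variable names are not part of this notebook's pipeline.

```python
import numpy as np

# Made-up temperatures: the "future" period is warmer than the training period
train_temps = np.array([10.0, 12.0, 14.0, 16.0])
future_temps = np.array([18.0, 20.0])

# Statistics come from the training period only
train_mean = train_temps.mean()
train_std = train_temps.std()  # population std (ddof=0), as StandardScaler uses

train_scaled = (train_temps - train_mean) / train_std
future_scaled = (future_temps - train_mean) / train_std

print(train_scaled.mean())  # approximately 0 by construction
print(future_scaled)        # positive: warmer than the training average
```

Had the statistics been computed on all six values, the training inputs would already encode that a warmer period is coming, which is exactly the leak we want to avoid.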

In [None]:
split_date = '2025-01-01'

# 1. Split Train+Val (Past) vs Test (Future)
train_val_df = data[data.index < split_date]
test_df = data[data.index >= split_date]

# 2. Further Split Train vs Validation (80/20 chronological)
n_train_val = len(train_val_df)
train_end = int(n_train_val * 0.8)

train_df = train_val_df.iloc[:train_end]
val_df = train_val_df.iloc[train_end:]

print(f"Train: {len(train_df)} rows")
print(f"Val:   {len(val_df)} rows")
print(f"Test:  {len(test_df)} rows")

In [None]:
# 3. Feature Scaling (Independent for X and y)
X_scaler = StandardScaler()
y_scaler = StandardScaler()

# Fit ONLY on Training Data
X_train = X_scaler.fit_transform(train_df[INPUT_FEATURES]).astype(np.float32)
y_train = y_scaler.fit_transform(train_df[[TARGET_FEATURE]]).astype(np.float32)

# Transform Validation and Test using Training statistics
X_val = X_scaler.transform(val_df[INPUT_FEATURES]).astype(np.float32)
y_val = y_scaler.transform(val_df[[TARGET_FEATURE]]).astype(np.float32)

X_test = X_scaler.transform(test_df[INPUT_FEATURES]).astype(np.float32)
y_test = y_scaler.transform(test_df[[TARGET_FEATURE]]).astype(np.float32)

We need to chop our continuous timeline into samples:
*   **Input**: Window of length `HISTORY_HOURS`.
*   **Target**: The *following* window of length `FORECAST_HOURS`.
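
As a toy illustration of this windowing, the sketch below slides over a made-up series of 10 values with a 3-step input and a 2-step target. The helper `make_windows` and the sizes are illustrative assumptions, chosen small enough to check by hand.

```python
import numpy as np

def make_windows(series, input_len, output_len):
    """Cut (input, target) pairs from a 1-D series (hypothetical helper)."""
    # The last valid start index is len(series) - input_len - output_len,
    # so the number of full windows is that value plus one.
    n_windows = len(series) - input_len - output_len + 1
    inputs = np.stack([series[i : i + input_len] for i in range(n_windows)])
    targets = np.stack([
        series[i + input_len : i + input_len + output_len]
        for i in range(n_windows)
    ])
    return inputs, targets

toy_series = np.arange(10)  # stand-in for hourly temperatures
toy_inputs, toy_targets = make_windows(toy_series, input_len=3, output_len=2)

print(toy_inputs.shape, toy_targets.shape)  # (6, 3) (6, 2)
print(toy_inputs[0], toy_targets[0])        # [0 1 2] [3 4]
print(toy_inputs[-1], toy_targets[-1])      # [5 6 7] [8 9]
```

Consecutive windows overlap heavily: each one starts a single step after the previous one.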

In [None]:
class WeatherDataset(Dataset):
    def __init__(self, X, y, input_len, output_len):
        self.X = torch.tensor(X)
        self.y = torch.tensor(y)
        self.input_len = input_len
        self.output_len = output_len

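    def __len__(self):
        # Number of start indices that still leave room for a full input + target window.
        # The last valid start is len(X) - input_len - output_len, hence the +1.
        return max(0, len(self.X) - self.input_len - self.output_len + 1)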
    
    def __getitem__(self, idx):
        # Input Sequence
        x_seq = self.X[idx : idx + self.input_len]
        
        # Target Sequence (immediately following input)
        y_seq = self.y[idx + self.input_len : idx + self.input_len + self.output_len]
        
        # Remove channel dim for target: (output_len, 1) -> (output_len,)
        return x_seq, y_seq.squeeze(-1)

INPUT_LEN = HISTORY_HOURS
OUTPUT_LEN = FORECAST_HOURS
BATCH_SIZE = 64

train_ds = WeatherDataset(X_train, y_train, INPUT_LEN, OUTPUT_LEN)
val_ds = WeatherDataset(X_val, y_val, INPUT_LEN, OUTPUT_LEN)
test_ds = WeatherDataset(X_test, y_test, INPUT_LEN, OUTPUT_LEN)

# Shuffling windows *within* the training set is fine; the train/val/test split itself stays chronological
train_loader = DataLoader(train_ds, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=BATCH_SIZE, shuffle=False)
test_loader = DataLoader(test_ds, batch_size=BATCH_SIZE, shuffle=False)


## 2. The Training Loop (PyTorch Lightning)

Instead of writing manual training loops, we wrap each model in a `LightningModule`, which takes care of the optimization loop, validation, and device placement.
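
For comparison, here is a minimal sketch of the kind of hand-written loop that Lightning replaces. The random tensors and the tiny model are placeholders chosen only to keep the snippet self-contained; the shapes assume 4 features, 24 input hours, and 12 output hours, as configured above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder data: 128 samples of (24 hours x 4 features) -> 12 target hours
x_demo = torch.randn(128, 24, 4)
y_demo = torch.randn(128, 12)

demo_model = nn.Sequential(nn.Flatten(), nn.Linear(24 * 4, 12))
demo_criterion = nn.MSELoss()
demo_optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.001)

demo_losses = []
for epoch in range(5):
    demo_model.train()
    demo_optimizer.zero_grad()   # clear gradients from the previous step
    loss = demo_criterion(demo_model(x_demo), y_demo)
    loss.backward()              # backpropagate
    demo_optimizer.step()        # update the weights
    demo_losses.append(loss.item())

print(demo_losses)
```

A realistic loop would also need mini-batching, a validation pass, early stopping, checkpointing, and device handling, which is the bookkeeping the `Trainer` provides.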

In [None]:
class WeatherPredictor(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.criterion = nn.MSELoss()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.criterion(self(x), y)
        self.log("train_loss", loss, prog_bar=True)
        return loss
    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = self.criterion(self(x), y)
        self.log("val_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return optim.Adam(self.parameters(), lr=0.001)

# Convenience function to train & save best model
def train_model(model, name, train_loader, val_loader):
    print(f"\n=== Training {name} ===")
    predictor = WeatherPredictor(model)
    
    # Stop if validation loss doesn't improve for 5 epochs
    early_stop = EarlyStopping(monitor="val_loss", patience=5)
    checkpoint = ModelCheckpoint(monitor="val_loss", dirpath="checkpoints", filename=f"{name}-best")
    
    trainer = pl.Trainer(
        max_epochs=50,
        callbacks=[early_stop, checkpoint], 
        accelerator='auto',
        devices=1, 
        logger=False,
        enable_progress_bar=True
    )
    trainer.fit(predictor, train_loader, val_loader)
    
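    # Load best weights (the wrapped model is not stored in the checkpoint's
    # hyperparameters, so it has to be passed in again)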
    best = WeatherPredictor.load_from_checkpoint(checkpoint.best_model_path, model=model)
    return best.model

trained_models = {}

## 3. Model 1: Linear Regression

![lr](imgs/linear_regression.png)

The simplest approach. We flatten the history into a single vector and learn a weight for every input point.
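
To get a feel for the size of this model, the sketch below counts its parameters for the configuration used in this notebook (4 features, 24 hours of history, 12 forecast hours). The variable names are illustrative only.

```python
import torch
import torch.nn as nn

n_features, n_history, n_forecast = 4, 24, 12  # values configured earlier in this notebook

linear_layer = nn.Linear(n_features * n_history, n_forecast)
n_params = sum(p.numel() for p in linear_layer.parameters())
print(n_params)  # 96 * 12 weights + 12 biases = 1164

# Flattening a batch of windows: (batch, 24, 4) -> (batch, 96)
window_batch = torch.zeros(8, n_history, n_features)
flat_batch = window_batch.reshape(window_batch.size(0), -1)
print(flat_batch.shape, linear_layer(flat_batch).shape)
```

Each of the 12 forecast hours gets its own set of 96 weights, one per (hour, feature) pair in the history window.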

In [None]:
class LinearModel(nn.Module):
    def __init__(self, input_dim, input_len, output_len):
        super().__init__()
        self.flatten_dim = input_dim * input_len
        self.fc = nn.Linear(self.flatten_dim, output_len)
    
    def forward(self, x):
        x_flat = x.view(x.size(0), -1)
        return self.fc(x_flat)

model = LinearModel(input_dim, INPUT_LEN, OUTPUT_LEN)
trained_models['Linear'] = train_model(model, 'Linear', train_loader, val_loader)


## 4. Model 2: Multi-Layer Perceptron (MLP)

![mlp](imgs/mlp.png)

An MLP is a foundational type of neural network that learns relationships between inputs and outputs by passing data through multiple stages of processing.

* **Layers:** Think of these as stages in an assembly line. The input layer receives raw data, hidden layers process the data to extract features and patterns, and the output layer delivers the final prediction.
* **Nonlinear Activations:** These are simple functions (like ReLU or Sigmoid) applied after each hidden layer. They are the "spark" that allows the network to learn complex, curvy patterns. Without them, any stack of layers collapses into a single linear map, so the network could only model straight-line relationships (linear regression).

We add hidden layers with `ReLU` activations.
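
The role of the activation can be checked numerically. The sketch below, using randomly initialized layers and made-up layer sizes, shows that two linear layers without an activation are equivalent to one merged linear layer, and that inserting a ReLU breaks this equivalence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer_a = nn.Linear(96, 64)
layer_b = nn.Linear(64, 12)
x_sample = torch.randn(5, 96)

with torch.no_grad():
    # Two linear layers with no activation in between...
    stacked_out = layer_b(layer_a(x_sample))

    # ...equal one linear map: W_b (W_a x + b_a) + b_b = (W_b W_a) x + (W_b b_a + b_b)
    merged_weight = layer_b.weight @ layer_a.weight             # shape (12, 96)
    merged_bias = layer_b.weight @ layer_a.bias + layer_b.bias  # shape (12,)
    merged_out = x_sample @ merged_weight.T + merged_bias

    # With a ReLU in between, the merged linear map no longer matches.
    relu_out = layer_b(torch.relu(layer_a(x_sample)))

print(torch.allclose(stacked_out, merged_out, atol=1e-4))
print(torch.allclose(relu_out, merged_out, atol=1e-4))
```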

In [None]:
class MLPModel(nn.Module):
    def __init__(self, input_dim, input_len, output_len, hidden_dim=64):
        super().__init__()
        self.flatten_dim = input_dim * input_len
        self.net = nn.Sequential(
            nn.Linear(self.flatten_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2), # Regularization
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, output_len)
        )
    
    def forward(self, x):
        x_flat = x.view(x.size(0), -1)
        return self.net(x_flat)

model = MLPModel(input_dim, INPUT_LEN, OUTPUT_LEN)
trained_models['MLP'] = train_model(model, 'MLP', train_loader, val_loader)


## 5. Model 3: LSTM (Long Short-Term Memory)

![lstm](imgs/lstm.png)

LSTMs are a specialized type of Recurrent Neural Network (RNN) designed to learn long-term dependencies in sequence data (like time series, speech, or text). Unlike standard feedforward networks (like MLPs), LSTMs have "loops" that allow information to persist.

* **The Cell State (Memory):** It carries information across time steps with only small, gated modifications, letting the network remember things from long ago.
* **Gates (The Filters):** LSTMs use specific mechanisms called "gates" (Forget, Input, and Output) to regulate this flow of information. They act like traffic controllers, deciding what old information to throw away and what new information to store in the long-term memory.

The LSTM doesn't flatten time. It walks through the sequence step-by-step, updating its memory.
This makes it a natural fit for time series of physical systems, although it is not guaranteed to beat the simpler models on every dataset.
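
Before defining the model, it can help to look at the tensors `nn.LSTM` returns. The sketch below runs a randomly initialized two-layer LSTM on a random placeholder batch; the sizes (4 features, 24 hours, 50 hidden units) mirror this notebook's configuration, and the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

demo_lstm = nn.LSTM(input_size=4, hidden_size=50, num_layers=2, batch_first=True)
x_seq_demo = torch.randn(8, 24, 4)  # (batch, hours, features), random placeholder

with torch.no_grad():
    # When no initial state is passed, PyTorch starts from zeros.
    out_demo, (h_n, c_n) = demo_lstm(x_seq_demo)

print(out_demo.shape)  # torch.Size([8, 24, 50]): top-layer hidden state at every hour
print(h_n.shape)       # torch.Size([2, 8, 50]): final hidden state of each layer
print(c_n.shape)       # torch.Size([2, 8, 50]): final cell state of each layer

# The last time step of the output is the final hidden state of the top layer.
print(torch.allclose(out_demo[:, -1, :], h_n[-1]))
```

Taking `out[:, -1, :]` therefore gives one summary vector per sample, which a linear layer can map to the forecast hours.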

In [None]:
class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim=50, output_len=12, num_layers=2):
        super().__init__()
        self.num_layers = num_layers
        self.hidden_dim = hidden_dim

        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=num_layers, batch_first=True, dropout=0.2)
        self.fc = nn.Linear(hidden_dim, output_len)
        self.dropout = nn.Dropout(0.2)
    
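    def forward(self, x):
        # Explicit zero initial states (what nn.LSTM also uses when none are given)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device, dtype=x.dtype)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_dim, device=x.device, dtype=x.dtype)
        out, _ = self.lstm(x, (h0, c0))  # out: (batch, seq_len, hidden_dim)

        # Summarize the sequence with the top layer's output at the final time step
        last_step_out = out[:, -1, :]
        last_step_out = self.dropout(last_step_out)
        return self.fc(last_step_out)  # (batch, output_len)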

# Train LSTM
model = LSTMModel(input_dim, output_len=OUTPUT_LEN)
trained_models['LSTM'] = train_model(model, 'LSTM', train_loader, val_loader)


## 6. Model comparison

In [None]:
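def plot_forecast(idx=0):
    # Get the first batch from the Test Set. The loader is not shuffled, so
    # position idx in this batch is also sample idx in the dataset.
    # Note: idx must be smaller than BATCH_SIZE.
    x_batch, y_batch = next(iter(test_loader))
    
    # Get predictions (scaled)
    preds_scaled = {}
    for name, model in trained_models.items():
        model.eval()
        try:
            device = next(model.parameters()).device
        except StopIteration:
            # Model has no parameters; fall back to CPU
            device = torch.device('cpu')
            
        x_batch_device = x_batch.to(device)
        with torch.no_grad():
            preds_scaled[name] = model(x_batch_device).cpu().numpy()[idx]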
    
    # Prepare Plot
    fig, ax = plt.subplots(figsize=(14, 6))

    dataset = test_loader.dataset
    history_scaled = dataset.y[idx : idx + INPUT_LEN].numpy()
    
    # Inverse Transform
    history = y_scaler.inverse_transform(history_scaled.reshape(-1, 1)).flatten()
    
    history_len = len(history)
    history_time = range(-history_len, 0)
    ax.plot(history_time, history, 'k:', linewidth=2, label='History (Past)')

    # 1. Plot Truth (Y)
    truth_scaled = y_batch[idx].numpy()
    truth = y_scaler.inverse_transform(truth_scaled.reshape(-1, 1)).flatten()
    ax.plot(range(0, OUTPUT_LEN), truth, 'k-', linewidth=3, label='Truth (Future)')

    # 2. Plot Predictions
    colors = {'Linear': 'blue', 'MLP': 'orange', 'LSTM': 'red'}
    for name, p_scaled in preds_scaled.items():
        # Inverse Transform Prediction
        pred = y_scaler.inverse_transform(p_scaled.reshape(-1, 1)).flatten()
        
        # Calculate RMSE on Original Scale (°C)
        rmse = np.sqrt(np.mean((pred - truth)**2))
        
        ax.plot(range(0, OUTPUT_LEN), pred, color=colors.get(name, 'green'), linestyle='--', 
                label=f"{name} (RMSE: {rmse:.2f} °C)")

    ax.set_title(f"Forecast with History (Test Sample {idx})")
    ax.set_xlabel("Hours relative to present")
    ax.set_ylabel("Temperature (°C)")
    ax.legend()
    ax.grid(True, alpha=0.3)
    ax.axvline(x=0, color='gray', linestyle='-', alpha=0.5)
    plt.show()


plot_forecast(idx=20)
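
A single sample can be misleading, so a natural next step is to aggregate the error over many windows and look at it per forecast hour. The following is a minimal sketch under stated assumptions: `rmse_per_horizon` is a hypothetical helper, and the arrays are made-up numbers rather than model outputs.

```python
import numpy as np

def rmse_per_horizon(preds_scaled, truth_scaled, mean, scale):
    """RMSE for each forecast hour, in original units (hypothetical helper).

    preds_scaled, truth_scaled: arrays of shape (n_samples, n_forecast_hours)
    mean, scale: statistics used to standardize the target, as in (x - mean) / scale
    """
    preds = preds_scaled * scale + mean  # undo standardization
    truth = truth_scaled * scale + mean
    return np.sqrt(np.mean((preds - truth) ** 2, axis=0))

# Made-up numbers: predictions off by a constant 0.5 scaled units
demo_truth = np.zeros((100, 12))
demo_preds = demo_truth + 0.5
demo_rmse = rmse_per_horizon(demo_preds, demo_truth, mean=15.0, scale=4.0)
print(demo_rmse)  # every horizon: 0.5 * 4.0 = 2.0
```

With this notebook's objects, `mean` and `scale` would presumably be `y_scaler.mean_[0]` and `y_scaler.scale_[0]`, and the scaled predictions would be collected by looping over `test_loader` for each entry in `trained_models`.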