[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/giulatona/iecon2025_tutorial/blob/main/notebooks/03_forecasting_models.ipynb)

# Deep Learning for Energy Forecasting - Model Training

This notebook demonstrates training various forecasting models on the household power consumption dataset, from simple baselines to advanced deep learning architectures.

## Learning Objectives
- Implement baseline forecasting models (naive seasonal)
- Build and train feedforward neural networks for time series
- Develop LSTM Encoder-Decoder architectures
- Compare model performance and understand trade-offs
- Apply proper evaluation metrics for time series forecasting
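The simplest baseline in the objectives is the seasonal naive forecast. It predicts each future hour with the value observed one season earlier (here, 24 hours). Below is a minimal NumPy sketch of it, together with MAE and RMSE, run on a synthetic daily-periodic series. The function names and the synthetic data are illustrative and do not come from this notebook's code.

```python
import numpy as np


def seasonal_naive_forecast(history, season_length=24, horizon=12):
    """Repeat the last observed season.

    The forecast for step h (1-based) is history[-season_length + (h - 1) % season_length].
    """
    history = np.asarray(history, dtype=float)
    if len(history) < season_length:
        raise ValueError("history must contain at least one full season")
    last_season = history[-season_length:]
    # Tile the last season so horizons longer than one season are also covered
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic hourly series with a perfect daily cycle (not the real dataset)
hours = np.arange(24 * 3)
series = 1.0 + 0.5 * np.sin(2 * np.pi * hours / 24)

# Hold out the last 12 hours and forecast them from the preceding history
history, future = series[:-12], series[-12:]
forecast = seasonal_naive_forecast(history, season_length=24, horizon=12)
print(f"MAE:  {mae(future, forecast):.4f}")
print(f"RMSE: {rmse(future, forecast):.4f}")
```

On a perfectly periodic series the seasonal naive error is essentially zero. On real consumption data it gives the reference error that any learned model should beat.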

## 1. Setup and Data Loading

In [1]:
# Import essential libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Deep learning frameworks
import tensorflow as tf
import keras

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print(f"GPU available: {tf.config.list_physical_devices('GPU')}")

GPU available: []


### Data Download and Initial Loading

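We'll use the same household power consumption dataset as in the previous notebooks: the "Individual household electric power consumption" dataset from the UCI Machine Learning Repository, which contains one-minute measurements from a single household.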

In [2]:
# Download data if not available locally
import os
import urllib.request
import zipfile

# Check if running in Google Colab
in_colab = 'google.colab' in str(get_ipython())

if in_colab:
    print("Running in Google Colab - downloading dataset...")
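    # -nc skips the download if the zip already exists; -o overwrites files on re-extraction without prompting
    !wget -q -nc https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip
    !unzip -q -o household_power_consumption.zip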
    data_file = 'household_power_consumption.txt'
else:
    print("Running locally...")
    data_file = 'data/household_power_consumption.txt'
    
    # Create data directory if it doesn't exist
    os.makedirs('data', exist_ok=True)
    
    # Download dataset if it doesn't exist locally
    if not os.path.exists(data_file):
        print("Dataset not found locally. Downloading...")
        url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip'
        zip_file = 'data/household_power_consumption.zip'
        
        # Download the zip file
        urllib.request.urlretrieve(url, zip_file)
        print("Download completed. Extracting...")
        
        # Extract the zip file
        with zipfile.ZipFile(zip_file, 'r') as zip_ref:
            zip_ref.extractall('data/')
        
        print("Extraction completed.")
    else:
        print("Dataset found locally.")

print(f"Using data file: {data_file}")

Running locally...
Dataset found locally.
Using data file: data/household_power_consumption.txt
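The preprocessing pipeline used below parses this file internally. For reference, here is a minimal pandas sketch of that parsing step. It assumes the layout of the UCI release: semicolon-separated fields, `?` for missing readings, and separate day-first `Date` and `Time` columns. The sample rows are illustrative, and the tutorial's own preprocessing function may handle these details differently.

```python
import io

import pandas as pd

# Illustrative excerpt in the raw file's layout; the second row mimics a missing reading
raw_text = """Date;Time;Global_active_power;Global_reactive_power;Voltage;Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3
16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000
16/12/2006;17:25:00;?;?;?;?;?;?;
16/12/2006;17:26:00;3.666;0.528;235.680;15.800;0.000;1.000;17.000
"""

# ';' separates fields and '?' marks missing measurements (an empty field is also read as NaN)
df = pd.read_csv(io.StringIO(raw_text), sep=";", na_values="?")

# Combine the day-first Date column and the Time column into a DatetimeIndex
df.index = pd.to_datetime(df["Date"] + " " + df["Time"], format="%d/%m/%Y %H:%M:%S")
df = df.drop(columns=["Date", "Time"])

# Downsample one-minute readings to hourly means (mean() skips NaNs)
hourly = df.resample("h").mean()
print(df.isna().sum().sum(), "missing values")
print(hourly)
```

Averaging while skipping NaNs keeps an hour usable even when a few minutes are missing. Hours with no valid readings at all still come out as NaN and need a separate imputation or removal step.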


## 2. Import Preprocessing Functions

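We reuse the preprocessing functions developed in notebook 02, which are packaged in the `iecon2025_tutorial` module. The next cell handles both environments: in Colab it installs the package from GitHub, while locally it imports the package directly (it must already be installed or otherwise importable from `src/`).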

In [3]:
# Import preprocessing functions from shared utilities module

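# In Colab, install the package from GitHub; locally, the package must already be importable
# (e.g. installed with `pip install -e .` from the repository root)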
if in_colab:
    # For Colab, install the package directly from GitHub
    print("Installing iecon2025_tutorial package from GitHub...")
    try:
        %pip install git+https://github.com/giulatona/iecon2025_tutorial.git
        from iecon2025_tutorial.preprocessing import load_and_preprocess_data
        print("✅ Successfully installed and imported preprocessing functions from GitHub")
    except Exception as e:
        print(f"❌ Installation error: {e}")
        print("Please check the GitHub repository URL and try again")
else:
    try:
        from iecon2025_tutorial.preprocessing import load_and_preprocess_data
        print("✅ Successfully imported preprocessing functions from local utilities module")
    except ImportError as e:
        print(f"❌ Import error: {e}")
        print("Please ensure the src/iecon2025_tutorial directory contains preprocessing.py")

✅ Successfully imported preprocessing functions from local utilities module


## 3. Data Loading and Basic Preprocessing

In [4]:
# Training configuration
TRAINING_CONFIG = {
    'window_size': 24,          # 24 hours of historical data
    'forecast_horizon': 12,     # Predict next 12 hours
    'batch_size': 32,
    'epochs': 50,
    'validation_split': 0.2,
    'early_stopping_patience': 10,
    'reduce_lr_patience': 5
}

print("Training configuration:")
for key, value in TRAINING_CONFIG.items():
    print(f"  {key}: {value}")

Training configuration:
  window_size: 24
  forecast_horizon: 12
  batch_size: 32
  epochs: 50
  validation_split: 0.2
  early_stopping_patience: 10
  reduce_lr_patience: 5
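With `window_size=24` and `forecast_horizon=12`, each training sample pairs 24 consecutive hourly feature vectors with the next 12 values of the target. The NumPy sketch below illustrates this sliding-window construction. It assumes the target is column 0, mirroring the `target_columns=[0]` argument passed to the pipeline. `make_windows` is a hypothetical helper, not the implementation inside `load_and_preprocess_data`.

```python
import numpy as np


def make_windows(series, window_size, forecast_horizon, target_col=0):
    """Split a (time, features) array into sliding (input, target) pairs.

    inputs[i]  = series[i : i + window_size]                      (all features)
    targets[i] = series[i + window_size : i + window_size + forecast_horizon, target_col]
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]  # treat a 1-D series as a single feature
    n_samples = len(series) - window_size - forecast_horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window and horizon")
    inputs = np.stack([series[i:i + window_size] for i in range(n_samples)])
    targets = np.stack([
        series[i + window_size:i + window_size + forecast_horizon, target_col]
        for i in range(n_samples)
    ])
    return inputs, targets


# 100 synthetic hourly steps with 3 features
data = np.random.default_rng(0).normal(size=(100, 3))
X, y = make_windows(data, window_size=24, forecast_horizon=12)
print(X.shape, y.shape)
```

A series of length T yields T - window_size - forecast_horizon + 1 samples, so 100 steps give 65 pairs here. In this sketch, batching the pairs with `batch_size=32` would produce input batches of shape `(32, 24, num_features)` and target batches of shape `(32, 12)`.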


In [None]:
# Load and preprocess the dataset using the comprehensive pipeline
try:
    # Use the comprehensive preprocessing pipeline
    results = load_and_preprocess_data(
        data_path=str(data_file),
        downsample_freq='1H',  # Hourly data for faster training
        window_size=TRAINING_CONFIG['window_size'],
        forecast_horizon=TRAINING_CONFIG['forecast_horizon'],
        batch_size=TRAINING_CONFIG['batch_size'],
        target_columns=[0],  # Predict Global_active_power only
        add_time_features=True,
        add_holiday_features=False,  # Skip holidays for speed
        verbose=True  # Enable function's verbose output
    )
    
    # Extract the datasets and metadata
    train_dataset = results['datasets']['train']
    val_dataset = results['datasets']['val']
    test_dataset = results['datasets']['test']
    
    scaler_params = results['scaler_params']
    feature_names = results['feature_names']
    preprocessing_info = results['preprocessing_info']
    
    # Get data shapes for model building from the first training batch
    inputs, targets = next(iter(train_dataset.take(1)))
    input_shape = inputs.shape
    target_shape = targets.shape
    num_features = input_shape[-1]
    
    # Show final summary
    print(f"Ready for training - Input: {input_shape}, Target: {target_shape}")
    print(f"Features: {preprocessing_info['feature_count']}, Samples: {preprocessing_info['split_info']['train_samples']:,}")
    
except Exception as e:
    print(f"❌ Error in preprocessing: {e}")
    print("Please check that preprocessing functions are available")
    results = None

Starting preprocessing pipeline...
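The pipeline returns `scaler_params`, whose exact contents are not shown here. A common approach, and the assumption behind this sketch, is a chronological train/validation/test split followed by z-score scaling. The mean and standard deviation are computed on the training split only, so no information from later periods leaks into training. The 70/15/15 fractions, the helper names, and the dictionary layout are hypothetical.

```python
import numpy as np


def chronological_split(data, train_frac=0.7, val_frac=0.15):
    """Split along time without shuffling: earliest data for training, latest for testing."""
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]


def fit_standard_scaler(train):
    """Per-feature mean/std computed on the training split only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return {"mean": mean, "std": std}


def transform(data, params):
    return (data - params["mean"]) / params["std"]


def inverse_transform(data, params, columns=slice(None)):
    """Undo the scaling; `columns` selects the statistics for a subset of features."""
    return data * params["std"][columns] + params["mean"][columns]


# Synthetic (time, features) array standing in for the hourly data
data = np.random.default_rng(1).normal(loc=1.0, scale=0.5, size=(1000, 3))
train, val, test = chronological_split(data)
params = fit_standard_scaler(train)
train_s, val_s, test_s = (transform(x, params) for x in (train, val, test))

# Convert scaled target values (column 0) back to original units
test_target = inverse_transform(test_s[:, 0], params, columns=0)
```

Applying `inverse_transform` to model predictions converts them back to kilowatts (the unit of `Global_active_power`) before computing errors in physical units.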
