# GRU Forecasting & Visualization for Energy Markets

This notebook trains the **GRU model** from the VIP project on both:

1. **Sign classification task** (up/down direction)
2. **Price regression task** (next-period price/return)

and produces publication-ready visualizations suitable for the
"Survey of Machine Learning Methods for Energy Markets" report.
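The two targets are built inside `make_dataset_for_task()`, whose code is not shown here. The sketch below shows one common way to derive both targets from a time-ordered price series. The helper `make_targets` is hypothetical. It assumes the sign label is 1 only when the next price is strictly higher, so a flat move counts as 0. The example prices are the first five rows printed by the inspection cell further down.

```python
import numpy as np

def make_targets(prices):
    """Hypothetical target builder (not the project's make_dataset_for_task code).

    Assumes `prices` is a 1-D sequence ordered in time. The last observation
    has no successor, so both targets are one element shorter than the input.
    """
    prices = np.asarray(prices, dtype=float)
    next_price = prices[1:]                         # regression target: next-period price
    sign = (next_price > prices[:-1]).astype(int)   # classification target: 1 = up, 0 = flat/down
    return next_price, sign

next_price, sign = make_targets([65.0, 61.25, 59.12, 59.215, 59.31])
print(sign.tolist())  # [0, 0, 1, 1]
```

For the return variant of the regression task, `(prices[1:] - prices[:-1]) / prices[:-1]` would replace `next_price`.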

## Data Sources

**This notebook uses real energy market data from the `Data/` folder.**

The data pipeline (`data_pipeline.py`) reads from:
- **Primary dataset**: `Data/Data_cleaned_Dataset.csv` - This is the main cleaned dataset containing electricity prices, volumes, natural gas prices, load data, temperature, and other engineered features.

**Additional source files** (available in `Data/` but pre-processed into the main dataset):
- `Net_generation_by places.csv`
- `Net_generation_United_States_all_sectors_monthly.csv`
- `Retail_sales_of_electricity_United_States_monthly.csv`

The `load_dataset()` function in `data_pipeline.py` reads `Data_cleaned_Dataset.csv` and applies preprocessing (date parsing, interpolation, zero-price handling). The `make_dataset_for_task()` function then builds features and targets from this cleaned dataset.
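A minimal sketch of those three preprocessing steps on a toy frame follows. It is illustrative only and is not the body of `load_dataset()`. Two parts are assumptions: treating a zero price as a missing quote, and restricting interpolation to numeric columns. The `interpolate(method="linear", limit_direction="both")` call itself matches the one that appears in the pipeline's warning output below.

```python
import numpy as np
import pandas as pd

PRICE_COL = "Electricity: Wtd Avg Price $/MWh"

def preprocess(df):
    """Illustrative cleaning pass; NOT the actual load_dataset() implementation."""
    df = df.copy()
    # Parse dates and keep rows in chronological order.
    df["Trade Date"] = pd.to_datetime(df["Trade Date"])
    df = df.sort_values("Trade Date").reset_index(drop=True)
    # Assumption: a zero price is a missing quote, so mark it as NaN before filling.
    df[PRICE_COL] = df[PRICE_COL].replace(0, np.nan)
    # Linear interpolation on numeric columns, filling gaps at both ends of the series.
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].interpolate(method="linear", limit_direction="both")
    return df

raw = pd.DataFrame({
    "Trade Date": ["2001-01-04", "2001-01-02", "2001-01-03"],
    PRICE_COL: [59.12, 65.0, 0.0],
})
clean = preprocess(raw)
print(clean[PRICE_COL].tolist())
```

After sorting, the zero on 2001-01-03 sits between 65.0 and 59.12, so it is filled with their midpoint (about 62.06).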

**All GRU training in this notebook uses the same unified data pipeline as the rest of the project**, ensuring consistency and reproducibility. No dummy or synthetic data is used.


In [1]:
import os
import sys
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.metrics import (
    confusion_matrix,
    ConfusionMatrixDisplay,
    roc_curve,
    auc,
    mean_squared_error,
    mean_absolute_error,
    r2_score,
)

# Ensure repo root is on the path (assumes notebook is in VIP/notebooks/)
if ".." not in sys.path:
    sys.path.append("..")

import config
from data_pipeline import make_dataset_for_task
from models import model_gru
from metrics import evaluate_model_outputs

plt.rcParams["figure.figsize"] = (10, 5)
plt.rcParams["axes.grid"] = True

print("Using sequence length:", config.SEQUENCE_LENGTH)


Using sequence length: 14


In [2]:
# === Data Sanity Check: Verify CSV Files Exist ===

# Verify all CSV files in Data/ folder exist
print("=" * 80)
print("Data Source Verification")
print("=" * 80)

csv_files = [
    "Data_cleaned_Dataset.csv",
    "Net_generation_by places.csv",
    "Net_generation_United_States_all_sectors_monthly.csv",
    "Retail_sales_of_electricity_United_States_monthly.csv",
]

all_exist = True
for fname in csv_files:
    path = os.path.join("..", "Data", fname)
    exists = os.path.exists(path)
    all_exist = all_exist and exists
    status = "[OK]" if exists else "[MISSING]"
    print(f"{status} {fname}: {'EXISTS' if exists else 'MISSING'}")

if not all_exist:
    print("\n[WARNING] Some CSV files are missing. The pipeline may fail.")
else:
    print("\n[OK] All CSV files found in Data/ folder")

# Load and inspect the main dataset used by the pipeline
print("\n" + "=" * 80)
print("Main Dataset Inspection (Data_cleaned_Dataset.csv)")
print("=" * 80)

try:
    from data_pipeline import load_dataset
    
    # Load the full cleaned dataset (all rows) for inspection
    df_sample = load_dataset()
    
    print(f"\nDataset shape: {df_sample.shape}")
    print(f"Date range: {df_sample['Trade Date'].min()} to {df_sample['Trade Date'].max()}")
    print("\nFirst few rows:")
    print(df_sample[['Trade Date', 'Electricity: Wtd Avg Price $/MWh',
                     'Electricity: Daily Volume MWh']].head())
    
    print("\nKey columns present:")
    key_cols = [
        'Trade Date',
        'Electricity: Wtd Avg Price $/MWh',
        'Electricity: Daily Volume MWh',
        'Natural Gas: Henry Hub Natural Gas Spot Price (Dollars per Million Btu)',
        'pjm_load sum in MW (daily)',
        'temperature mean in C (daily): US'
    ]
    for col in key_cols:
        present = "[OK]" if col in df_sample.columns else "[MISSING]"
        print(f"  {present} {col}")
    
    print("\n[OK] Main dataset loaded successfully - using REAL data from CSV files")
    
except Exception as e:
    print(f"\n[ERROR] Error loading dataset: {e}")
    print("This may indicate a path issue. Check that Data/Data_cleaned_Dataset.csv exists.")

print("\n" + "=" * 80)


Data Source Verification
[OK] Data_cleaned_Dataset.csv: EXISTS
[OK] Net_generation_by places.csv: EXISTS
[OK] Net_generation_United_States_all_sectors_monthly.csv: EXISTS
[OK] Retail_sales_of_electricity_United_States_monthly.csv: EXISTS

[OK] All CSV files found in Data/ folder

Main Dataset Inspection (Data_cleaned_Dataset.csv)

Dataset shape: (8034, 305)
Date range: 2001-01-02 00:00:00 to 2022-12-31 00:00:00

First few rows:
  Trade Date  Electricity: Wtd Avg Price $/MWh  Electricity: Daily Volume MWh
0 2001-01-02                            65.000                         1600.0
1 2001-01-03                            61.250                         3200.0
2 2001-01-04                            59.120                         4800.0
3 2001-01-05                            59.215                         3800.0
4 2001-01-06                            59.310                         2800.0

Key columns present:
  [OK] Trade Date
  [OK] Electricity: Wtd Avg Price $/MWh
  [OK] Electricity: 

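The pipeline splits the data chronologically ("Splitting data (time-based)..." in the training log below). The sketch below shows what such a split can look like. `time_split` is a hypothetical helper, and the actual values of `config.TEST_SIZE` and `config.VAL_SIZE` are not shown in this notebook. The logged split sizes do sum to 8,032 rows, however, and 15% of 8,032 truncates to 1,204. The example therefore uses 0.15 for both fractions, which is an assumption.

```python
import numpy as np

def time_split(n_rows, test_size, val_size):
    """Hypothetical chronological split into contiguous train/val/test index ranges.

    There is no shuffling: the test block is always the most recent data,
    which keeps future information out of training.
    """
    n_test = int(n_rows * test_size)
    n_val = int(n_rows * val_size)
    n_train = n_rows - n_val - n_test
    idx = np.arange(n_rows)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = time_split(8032, test_size=0.15, val_size=0.15)
print(len(train_idx), len(val_idx), len(test_idx))  # 5624 1204 1204
```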


In [None]:
# === 1. Train GRU for SIGN classification (direction) ===

import sys

# Set global task type for classification
config.TASK_TYPE = "classification"

print("=" * 80)
print("Training GRU for SIGN Classification")
print("=" * 80)
sys.stdout.flush()

try:
    # Time the data loading
    start_time = time.time()
    print("\n[STEP 1] Loading and preparing data...")
    sys.stdout.flush()
    
    datasets_sign = make_dataset_for_task(
        task_type="sign",
        seq_len=config.SEQUENCE_LENGTH,
        test_size=config.TEST_SIZE,
        val_size=config.VAL_SIZE,
        scaler_type=config.SCALER_TYPE,
    )
    data_load_time = time.time() - start_time
    print(f"[STEP 1] Data loading completed in {data_load_time:.2f} seconds")
    sys.stdout.flush()

    print("\nSign task shapes:")
    for k in ["X_train", "X_val", "X_test"]:
        print(f"  {k}:", datasets_sign[k].shape)
    sys.stdout.flush()

    # Merge the model settings from config.GRU_CONFIG with the global training
    # settings; keys listed after the ** unpacking override GRU_CONFIG.
    # For a quicker experiment, lower "max_epochs" here.
    sign_train_config = {
        **config.GRU_CONFIG,
        "max_epochs": config.MAX_EPOCHS,
        "batch_size": config.BATCH_SIZE,
        "patience": config.EARLY_STOP_PATIENCE,
    }

    # Time the training
    print("\n" + "=" * 80)
    print("[STEP 2] Starting GRU Training...")
    print("=" * 80)
    print(f"Configuration: max_epochs={sign_train_config['max_epochs']}, "
          f"batch_size={sign_train_config['batch_size']}, "
          f"patience={sign_train_config['patience']}")
    sys.stdout.flush()
    
    training_start = time.time()

    results_sign = model_gru.train_and_predict(datasets_sign, config=sign_train_config)

    training_time = time.time() - training_start
    print("\n" + "=" * 80)
    print("[STEP 2] Training completed!")
    print(f"Total training time: {training_time:.2f} seconds ({training_time/60:.2f} minutes)")
    print("=" * 80)
    sys.stdout.flush()

    print("\n[STEP 3] Generating predictions and computing metrics...")
    sys.stdout.flush()
    
    # Flatten to 1-D so labels and probabilities line up element-wise
    y_true_sign = np.asarray(datasets_sign["y_test"]).ravel()
    y_pred_prob_sign = results_sign["y_pred_test"].ravel()
    y_pred_label_sign = (y_pred_prob_sign > 0.5).astype(int)

    print("\nPredicted label counts on the test set:")
    unique, counts = np.unique(y_pred_label_sign, return_counts=True)
    print(dict(zip(unique, counts)))
    sys.stdout.flush()
    
    print("\n[SUCCESS] All steps completed successfully!")
    
except Exception as e:
    print(f"\n[ERROR] Training failed with error: {type(e).__name__}")
    print(f"Error message: {str(e)}")
    import traceback
    print("\nFull traceback:")
    traceback.print_exc()
    sys.stdout.flush()
    raise


Training GRU for SIGN Classification

[STEP 1] Loading and preparing data...




Loading dataset...
Building features...
Building targets for task: sign...
Splitting data (time-based)...
Train size: 5624, Val size: 1204, Test size: 1204
Scaling features using standard scaler...
Creating sequences with length 14...
Sequence shapes - Train: (5610, 14, 20), Val: (1190, 14, 20), Test: (1190, 14, 20)
Dataset preparation complete!
[STEP 1] Data loading completed in 0.20 seconds

Sign task shapes:
  X_train: (5610, 14, 20)
  X_val: (1190, 14, 20)
  X_test: (1190, 14, 20)

[STEP 2] Starting GRU Training...
Configuration: max_epochs=100, batch_size=32, patience=10

GRU Model Architecture:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------

Training GRU for TASK_TYPE = classification
  max_epochs: 100
  batch_size: 32
  patience:   10
Epoch 1/100


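The pipeline log above reports 5,624 training rows but 5,610 training sequences of length 14. Likewise, 1,204 validation and test rows become 1,190 sequences each. In every case there are exactly `seq_len` fewer windows than rows. The sketch below reproduces that count and is built on three assumptions:

- Each window is paired with the row immediately after it.
- `target` is the per-row series to predict and is not pre-shifted.
- The standard scaler is fit on the training block only.

`make_sequences` is hypothetical, and the arrays are random placeholders with the logged shapes, not project data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def make_sequences(features, target, seq_len):
    """Hypothetical sliding-window builder.

    Window i covers rows i .. i+seq_len-1 and is paired with target[i + seq_len],
    so n rows yield n - seq_len windows.
    """
    n = len(features)
    X = np.stack([features[i:i + seq_len] for i in range(n - seq_len)])
    y = np.asarray(target)[seq_len:]
    return X, y

# Placeholder arrays shaped like the logged training split (5624 rows, 20 features).
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(5624, 20))
train_target = rng.normal(size=5624)

# Fit the scaler on the training block only; val/test would reuse scaler.transform.
scaler = StandardScaler().fit(train_feats)
X_train, y_train = make_sequences(scaler.transform(train_feats), train_target, seq_len=14)
print(X_train.shape, y_train.shape)  # (5610, 14, 20) (5610,)
```

Fitting the scaler on validation or test rows would leak their statistics into training, which is why only the training block is used here.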


In [None]:
# === 2. Train GRU for PRICE regression ===

import sys

# Set global task type for regression
config.TASK_TYPE = "regression"

print("=" * 80)
print("Training GRU for PRICE Regression")
print("=" * 80)
sys.stdout.flush()

try:
    # Time the data loading
    start_time = time.time()
    print("\n[STEP 1] Loading and preparing data...")
    sys.stdout.flush()
    
    datasets_price = make_dataset_for_task(
        task_type="price",
        seq_len=config.SEQUENCE_LENGTH,
        test_size=config.TEST_SIZE,
        val_size=config.VAL_SIZE,
        scaler_type=config.SCALER_TYPE,
    )
    data_load_time = time.time() - start_time
    print(f"[STEP 1] Data loading completed in {data_load_time:.2f} seconds")
    sys.stdout.flush()

    print("\nPrice task shapes:")
    for k in ["X_train", "X_val", "X_test"]:
        print(f"  {k}:", datasets_price[k].shape)
    sys.stdout.flush()

    price_train_config = {
        **config.GRU_CONFIG,
        "max_epochs": config.MAX_EPOCHS,
        "batch_size": config.BATCH_SIZE,
        "patience": config.EARLY_STOP_PATIENCE,
    }

    # Time the training
    print("\n" + "=" * 80)
    print("[STEP 2] Starting GRU Training...")
    print("=" * 80)
    print(f"Configuration: max_epochs={price_train_config['max_epochs']}, "
          f"batch_size={price_train_config['batch_size']}, "
          f"patience={price_train_config['patience']}")
    sys.stdout.flush()
    
    training_start = time.time()

    results_price = model_gru.train_and_predict(datasets_price, config=price_train_config)

    training_time = time.time() - training_start
    print("\n" + "=" * 80)
    print("[STEP 2] Training completed!")
    print(f"Total training time: {training_time:.2f} seconds ({training_time/60:.2f} minutes)")
    print("=" * 80)
    sys.stdout.flush()

    print("\n[STEP 3] Generating predictions and computing metrics...")
    sys.stdout.flush()
    
    # Flatten to 1-D: a (n, 1) target minus a (n,) prediction would broadcast to (n, n)
    y_true_price = np.asarray(datasets_price["y_test"]).ravel()
    y_pred_price = results_price["y_pred_test"].ravel()

    print("\nPrice regression basic metrics:")
    mse = mean_squared_error(y_true_price, y_pred_price)
    rmse = np.sqrt(mse)
    mae = mean_absolute_error(y_true_price, y_pred_price)
    r2 = r2_score(y_true_price, y_pred_price)
    print(f"MSE: {mse:.6f}, RMSE: {rmse:.6f}, MAE: {mae:.6f}, R^2: {r2:.4f}")
    sys.stdout.flush()
    
    print("\n[SUCCESS] All steps completed successfully!")
    
except Exception as e:
    print(f"\n[ERROR] Training failed with error: {type(e).__name__}")
    print(f"Error message: {str(e)}")
    import traceback
    print("\nFull traceback:")
    traceback.print_exc()
    sys.stdout.flush()
    raise
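Both training cells pass `patience=config.EARLY_STOP_PATIENCE`, which was 10 in the logged run. The training loop inside `model_gru.train_and_predict` is not shown. The sketch below assumes it follows the usual patience rule, as in Keras' `EarlyStopping` callback: training stops once `patience` consecutive epochs fail to improve the best validation loss. `early_stopping_epoch` is a hypothetical helper that replays this rule on a list of validation losses.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Hypothetical patience rule: return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs fail to lower the best
    validation loss by more than `min_delta`.
    """
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0   # improvement: reset the counter
        else:
            wait += 1                                 # no improvement this epoch
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch                # ran out of epochs without stopping

losses = [0.70, 0.65, 0.66, 0.64, 0.67, 0.68, 0.69]
print(early_stopping_epoch(losses, patience=3))  # (7, 4)
```

If the real callback sets Keras' `restore_best_weights=True`, the returned model would carry the weights from `best_epoch` rather than from the final epoch.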


In [None]:
# === 3. Visualizations for PRICE regression ===

time_index = np.arange(len(y_true_price))

# (a) Time series: actual vs predicted
plt.figure()
plt.plot(time_index, y_true_price, label="Actual", alpha=0.8)
plt.plot(time_index, y_pred_price, label="Predicted (GRU)", alpha=0.8)
plt.xlabel("Test Time Index")
plt.ylabel("Price / Return (units)")
plt.title("GRU Price Regression: Actual vs Predicted (Test Set)")
plt.legend()
plt.tight_layout()
plt.show()

# (b) Scatter plot with y=x line
plt.figure()
plt.scatter(y_true_price, y_pred_price, alpha=0.5)
min_v = min(y_true_price.min(), y_pred_price.min())
max_v = max(y_true_price.max(), y_pred_price.max())
plt.plot([min_v, max_v], [min_v, max_v], linestyle="--")
plt.xlabel("True Values")
plt.ylabel("Predicted Values")
plt.title("GRU Price Regression: True vs Predicted (Test Set)")
plt.tight_layout()
plt.show()

# (c) Residual histogram
residuals = y_true_price - y_pred_price
plt.figure()
plt.hist(residuals, bins=40, alpha=0.8)
plt.xlabel("Residual (True - Predicted)")
plt.ylabel("Frequency")
plt.title("GRU Price Regression: Residual Distribution (Test Set)")
plt.tight_layout()
plt.show()

# (d) Rolling window RMSE to highlight regimes
window = max(20, len(residuals) // 20)
rolling_rmse = [
    np.sqrt(np.mean(residuals[i:i+window] ** 2))
    for i in range(0, len(residuals) - window + 1)
]

plt.figure()
plt.plot(np.arange(len(rolling_rmse)), rolling_rmse)
plt.xlabel("Window start index (test set)")  # window i covers test points i .. i+window-1
plt.ylabel(f"Rolling RMSE (window={window})")
plt.title("GRU Price Regression: Rolling RMSE (Test Set)")
plt.tight_layout()
plt.show()
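The list comprehension in the rolling-RMSE step recomputes each window from scratch. `pandas.Series.rolling` produces the same values in a single pass. The helper name `rolling_rmse_fast` is hypothetical; it keeps only full windows so that its output lines up one-to-one with the comprehension.

```python
import numpy as np
import pandas as pd

def rolling_rmse_fast(residuals, window):
    """Rolling RMSE over full windows only; element i covers residuals[i:i+window]."""
    sq = pd.Series(np.asarray(residuals, dtype=float) ** 2)
    # rolling().mean() labels each window by its last row, so the first window-1 entries are NaN.
    return np.sqrt(sq.rolling(window).mean().to_numpy()[window - 1:])

res = np.array([1.0, -1.0, 3.0, -3.0])
print(rolling_rmse_fast(res, window=2).tolist())
```

Here the three windows have mean squared residuals 1, 5 and 9, giving RMSE values of 1, sqrt(5) and 3.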


In [None]:
# === 4. Visualizations for SIGN classification ===

# (a) Confusion matrix
cm = confusion_matrix(y_true_sign, y_pred_label_sign)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot(values_format="d")
plt.title("GRU Sign Classification: Confusion Matrix (Test Set)")
plt.tight_layout()
plt.show()

# (b) ROC curve & AUC
fpr, tpr, thresholds = roc_curve(y_true_sign, y_pred_prob_sign)
roc_auc = auc(fpr, tpr)

plt.figure()
plt.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.3f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="Random baseline")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("GRU Sign Classification: ROC Curve (Test Set)")
plt.legend()
plt.tight_layout()
plt.show()
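The confusion matrix and ROC curve do not report a single accuracy figure. Raw accuracy is also hard to judge when up and down days are imbalanced. The sketch below computes accuracy and balanced accuracy, and compares them with the share of the majority class, which is what always predicting that class would score. It uses the same 0.5 threshold as the training cell. `sign_summary` is a hypothetical helper and would be called as `sign_summary(y_true_sign, y_pred_prob_sign)`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def sign_summary(y_true, y_prob, threshold=0.5):
    """Hypothetical summary: accuracy metrics plus a majority-class reference."""
    y_true = np.asarray(y_true).ravel().astype(int)
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    majority = np.bincount(y_true, minlength=2).argmax()   # most frequent true label
    return {
        "accuracy": float(accuracy_score(y_true, y_pred)),
        "balanced_accuracy": float(balanced_accuracy_score(y_true, y_pred)),  # mean per-class recall
        "majority_baseline": float(np.mean(y_true == majority)),
    }

print(sign_summary([1, 1, 1, 0], [0.9, 0.8, 0.3, 0.2]))
```

In this toy case, accuracy (0.75) only equals the majority baseline. Balanced accuracy (about 0.83) shows that the minority class is still recovered.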


## Notes for the Survey Report

- The **GRU architecture** here follows the standardized configuration in `config.GRU_CONFIG`
  (two GRU layers + dense head) and the unified training utilities in `training_utils.py`.
- Point forecast quality is summarized by MAE, MSE, RMSE, and $R^2$ for the price task.
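- Classification quality is summarized by the confusion matrix and ROC/AUC; accuracy follows
  from the confusion matrix as the diagonal count divided by the total.
- Rolling RMSE can highlight **regime shifts** (stretches of persistently higher error), which can
  be cross-referenced with crisis periods discussed in the survey (e.g., COVID-19, the 2022 energy
  shock). The x-axis is a window index, so matching it to calendar dates requires the test-set
  `Trade Date` values.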
- These plots can be copied directly into the sections on RNN-based models and
  evaluation protocols in the report.
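To place rolling-RMSE windows on a calendar, each window can be labeled with the date of its last test point. The sketch below assumes you have a `test_dates` sequence aligned one-to-one with `y_test`, including any offset introduced by sequence creation. Whether `make_dataset_for_task()` returns such dates is not shown here. `window_end_dates` is a hypothetical helper.

```python
import pandas as pd

def window_end_dates(test_dates, window):
    """Hypothetical helper: label rolling window i by test_dates[i + window - 1].

    The result has len(test_dates) - window + 1 entries, one per full window.
    """
    return pd.DatetimeIndex(test_dates)[window - 1:]

dates = pd.date_range("2022-01-01", periods=5, freq="D")
print(window_end_dates(dates, window=3))
```

With such dates, `plt.plot(window_end_dates(test_dates, window), rolling_rmse)` would put the rolling RMSE on a date axis, so spikes can be compared directly against COVID-19 or the 2022 energy shock.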
