# Systematic Hyperparameter Tuning

In this notebook, we use Optuna to find the optimal hyperparameters for our baseline models before running the final training and evaluation. As per the project plan, we will tune the deep learning models on a representative subsample of the data to save time, while tree-based models can be tuned on a larger set if needed.

The process is as follows:
1.  **Asset Subsampling**: We will analyze the liquidity of all assets and create a small, representative subset for tuning.
2.  **Optuna Objective Functions**: We will define an objective function for each model architecture (LSTM, Chronos) that Optuna will seek to minimize (the validation sMAPE or loss).
3.  **Run Studies**: We will execute the tuning studies for a predefined number of trials.
4.  **Save Best Parameters**: The best hyperparameters found will be saved for use in the next phase.

In [1]:
import os
import sys
import json
import pandas as pd
import numpy as np
import datetime, os, time, warnings
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
import optuna
from darts import TimeSeries
from darts.models import BlockRNNModel
from chronos import Chronos2Pipeline

# Add project root to path to allow imports from src
if os.path.basename(os.getcwd()) == 'notebooks':
    project_root = os.path.abspath('..')
else:
    project_root = os.getcwd()

if project_root not in sys.path:
    sys.path.append(project_root)

from src.datamodule import ElectricityDataModule, masked_smoothed_smape

# Set precision for Tensor Cores to improve performance
torch.backends.fp32_precision = "medium"
torch.backends.cuda.matmul.fp32_precision = "medium"
torch.backends.cudnn.fp32_precision = "medium"
torch.backends.cudnn.conv.fp32_precision = "tf32"
torch.backends.cudnn.rnn.fp32_precision = "tf32"

The StatsForecast module could not be imported. To enable support for the AutoARIMA, AutoETS and Croston models, please consider installing it.
The `XGBoost` module could not be imported. To enable XGBoost support in Darts, follow the detailed instructions in the installation guide: https://github.com/unit8co/darts/blob/master/INSTALL.md
The `XGBoost` module could not be imported. To enable XGBoost support in Darts, follow the detailed instructions in the installation guide: https://github.com/unit8co/darts/blob/master/INSTALL.md


## 1. Setup
 
Define paths and constants for the tuning process.

In [2]:
# Define project paths
BASE_DIR = ".."
DATA_DIR = os.path.join(BASE_DIR, "data")
TRAIN_DIR = os.path.join(DATA_DIR, "train")
TRAIN_DIR_FILTERED = os.path.join(DATA_DIR, "train_trading_only")
VAL_DIR = os.path.join(DATA_DIR, "val")
VAL_DIR_FILTERED = os.path.join(DATA_DIR, "val_trading_only")
SCALERS_DIR = os.path.join(DATA_DIR, "scalers")
MODELS_DIR = os.path.join(BASE_DIR, "models")
RESULTS_DIR = os.path.join(BASE_DIR, "results")

# Create directories if needed
os.makedirs(RESULTS_DIR, exist_ok=True)
os.makedirs(TRAIN_DIR_FILTERED, exist_ok=True)
os.makedirs(VAL_DIR_FILTERED, exist_ok=True)

# Configuration
INPUT_CHUNK_LENGTH = 48
OUTPUT_CHUNK_LENGTH = 10
TARGET_COLS = ["high", "low", "close", "volume"]
N_TRIALS_Chronos = 20  # Number of trials for hyperparameter search
N_TRIALS_LSTM = 10
SEED = 827

# Global variables for caching
PREBUILT_DATA = None
TUNING_ASSETS = None

## 2. Asset Subsampling for Efficient Tuning

In [3]:
def extract_trading_segments(df, min_segment_length=96):
    """
    Extract continuous trading segments from dataframe.
    
    Parameters:
    -----------
    df : pd.DataFrame
        Must have 'is_trading' column and 'ExecutionTime' column
    min_segment_length : int
        Minimum number of consecutive trading rows to keep (default: 96 = 24 hours for 15min data)
    
    Returns:
    --------
    list of pd.DataFrame : Each dataframe is a continuous trading segment
    """
    if 'is_trading' not in df.columns:
        print("Warning: No 'is_trading' column, using volume > 0")
        df['is_trading'] = (df['volume'] > 0).astype(int)
    
    # Find where trading starts and stops
    df = df.sort_values('ExecutionTime').reset_index(drop=True)
    trading = df['is_trading'].values
    
    # Find segment boundaries
    changes = np.diff(trading, prepend=0)
    starts = np.where(changes == 1)[0]  # Trading starts
    stops = np.where(changes == -1)[0]  # Trading stops
    
    # Handle edge cases
    if len(starts) == 0:
        return []
    
    # If data starts with trading
    if trading[0] == 1:
        starts = np.insert(starts, 0, 0)
    
    # If data ends with trading
    if trading[-1] == 1:
        stops = np.append(stops, len(trading))
    
    # Ensure we have matching starts and stops
    min_len = min(len(starts), len(stops))
    starts = starts[:min_len]
    stops = stops[:min_len]
    
    # Extract segments
    segments = []
    for start, stop in zip(starts, stops):
        segment_length = stop - start
        if segment_length >= min_segment_length:
            segment = df.iloc[start:stop].copy()
            segments.append(segment)
    
    return segments


def analyze_asset_segments(train_dir, min_segment_length=96):
    """
    Analyze all assets and extract their trading segments.
    
    Returns:
    --------
    dict: {
        'asset_name': {
            'segments': [list of dataframes],
            'total_rows': int,
            'trading_rows': int,
            'n_segments': int,
            'avg_segment_length': float
        }
    }
    """
    asset_files = [f for f in os.listdir(train_dir) if f.endswith('.parquet')]
    asset_segments = {}
    
    print("="*80)
    print("TRADING SEGMENT EXTRACTION")
    print("="*80)
    print(f"Minimum segment length: {min_segment_length} rows")
    print(f"(For 15min data: {min_segment_length * 15 / 60:.1f} hours)")
    print()
    
    for asset_file in asset_files:
        asset_name = asset_file.replace('.parquet', '')
        df = pd.read_parquet(os.path.join(train_dir, asset_file))
        
        # Extract segments
        segments = extract_trading_segments(df, min_segment_length)
        
        if len(segments) > 0:
            total_rows = len(df)
            trading_rows = sum(len(seg) for seg in segments)
            avg_length = trading_rows / len(segments)
            
            asset_segments[asset_name] = {
                'segments': segments,
                'total_rows': total_rows,
                'trading_rows': trading_rows,
                'n_segments': len(segments),
                'avg_segment_length': avg_length,
                'trading_ratio_original': (df['is_trading'] == 1).mean(),
                'trading_ratio_extracted': 1.0  # 100% after extraction
            }
    
    print("="*80)
    return asset_segments


def select_best_segments(asset_segments, target_total_rows=50000, target_n_assets=5):
    """
    Select best segments to reach target total rows with maximum diversity.
    
    Strategy:
    - Prioritize longer segments (more continuous data)
    - Ensure diversity across different assets
    - Reach target total rows
    """
    print(f"\n{'='*80}")
    print(f"SELECTING SEGMENTS")
    print(f"{'='*80}")
    print(f"Target total rows: {target_total_rows:,}")
    print(f"Target n_assets: {target_n_assets}")
    print()
    
    # Sort assets by total usable trading rows
    sorted_assets = sorted(
        asset_segments.items(),
        key=lambda x: x[1]['trading_rows'],
        reverse=True
    )
    
    selected = {}
    selected_asset_names = []
    total_rows_selected = 0
    
    # First pass: Select top assets until we hit target rows
    for asset_name, info in sorted_assets:
        if len(selected) >= target_n_assets:
            break
        
        asset_trading_rows = info['trading_rows']
        
        # Add this asset
        selected[asset_name] = info
        selected_asset_names.append(asset_name)
        total_rows_selected += asset_trading_rows
        
        print(f"✓ Selected {asset_name:<20} "
              f"Segments: {info['n_segments']:>3} | "
              f"Rows: {asset_trading_rows:>8,} | "
              f"Cumulative: {total_rows_selected:>10,}")
        
        if total_rows_selected >= target_total_rows:
            print(f"\n✓ Reached target of {target_total_rows:,} rows!")
            break
    
    if total_rows_selected < target_total_rows:
        print(f"\n⚠️  Only found {total_rows_selected:,} rows (target: {target_total_rows:,})")
    
    print(f"\n{'='*80}")
    print(f"SELECTION SUMMARY")
    print(f"{'='*80}")
    print(f"Assets selected: {len(selected)}")
    print(f"Total segments: {sum(info['n_segments'] for info in selected.values())}")
    print(f"Total rows: {total_rows_selected:,}")
    print(f"{'='*80}")
    
    return selected, selected_asset_names


def build_timeseries_from_segments(selected_segments, val_dir, target_cols, 
                                   input_chunk_length, output_chunk_length,selected_asset_names):
    """
    Build Darts TimeSeries from selected trading segments.
    """
    train_ts_list = []
    val_ts_list = []
    past_covariates_train_list = []
    past_covariates_val_list = []
    val_dfs_list = []
    total_windows = 0
    
    print(f"\n{'='*80}")
    print("BUILDING TIMESERIES FROM SEGMENTS")
    print(f"{'='*80}")
    
    for asset_name, info in selected_segments.items():
        segments = info['segments']
        
        print(f"\nProcessing {asset_name}: {len(segments)} segments")
        
        # Concatenate all segments for this asset
        # Note: This creates artificial discontinuities, but all data is trading data
        train_df_combined = pd.concat(segments, ignore_index=True)
        
        # Load validation data
        val_path = os.path.join(val_dir, f"{asset_name}.parquet")
        if os.path.exists(val_path):
            val_df = pd.read_parquet(val_path)
            
            # Extract only trading segments from validation too
            val_segments = extract_trading_segments(val_df, min_segment_length=96)
            if len(val_segments) > 0:
                val_df_combined = pd.concat(val_segments, ignore_index=True)
            else:
                print(f"    No trading segments in validation, using full data")
                val_df_combined = val_df
        else:
            print(f"    No validation file, using train data")
            val_df_combined = train_df_combined
        
        # Clean timestamps
        train_df_combined['ExecutionTime'] = pd.to_datetime(
            train_df_combined['ExecutionTime']
        ).dt.tz_localize(None)
        val_df_combined['ExecutionTime'] = pd.to_datetime(
            val_df_combined['ExecutionTime']
        ).dt.tz_localize(None)
        
        # Clean numeric data & Convert to float32
        numeric_cols = train_df_combined.select_dtypes(include=[np.number]).columns
        for df in [train_df_combined, val_df_combined]:
            df[numeric_cols] = df[numeric_cols].replace([np.inf, -np.inf], np.nan)
            df[numeric_cols] = df[numeric_cols].ffill().bfill().fillna(0)
            df[numeric_cols] = df[numeric_cols].astype(np.float32)

        val_dfs_list.append(val_df_combined)
        
        # Get covariate columns
        covariate_cols = [
            col for col in train_df_combined.columns 
            if col not in target_cols + ['ExecutionTime', 'is_trading']
        ]
        
        # Create TimeSeries
        # Note: We need to handle the discontinuous time index
        # Option 1: Reset time index to be continuous (losing real timestamps)
        # Option 2: Keep real timestamps but accept gaps
        
        # Using Option 1 for stability
        train_df_continuous = train_df_combined.copy()
        val_df_continuous = val_df_combined.copy()
        
        # Create continuous 15-min frequency index
        train_df_continuous['ExecutionTime'] = pd.date_range(
            start='2020-01-01', 
            periods=len(train_df_continuous), 
            freq='15min'
        )
        val_df_continuous['ExecutionTime'] = pd.date_range(
            start='2020-01-01', 
            periods=len(val_df_continuous), 
            freq='15min'
        )
        
        try:
            # Create TimeSeries
            train_ts = TimeSeries.from_dataframe(
                train_df_continuous, 
                time_col='ExecutionTime', 
                value_cols=target_cols, 
                freq='15min'
            )
            val_ts = TimeSeries.from_dataframe(
                val_df_continuous, 
                time_col='ExecutionTime', 
                value_cols=target_cols, 
                freq='15min'
            )
            
            train_ts_list.append(train_ts)
            val_ts_list.append(val_ts)
            
            # Add covariates if available
            if len(covariate_cols) > 0:
                past_cov_train = TimeSeries.from_dataframe(
                    train_df_continuous, 
                    time_col='ExecutionTime', 
                    value_cols=covariate_cols, 
                    freq='15min'
                )
                past_cov_val = TimeSeries.from_dataframe(
                    val_df_continuous, 
                    time_col='ExecutionTime', 
                    value_cols=covariate_cols, 
                    freq='15min'
                )
                past_covariates_train_list.append(past_cov_train)
                past_covariates_val_list.append(past_cov_val)
            else:
                past_covariates_train_list.append(None)
                past_covariates_val_list.append(None)
            
            # Calculate windows
            n_windows = max(0, len(train_df_combined) - input_chunk_length - output_chunk_length + 1)
            total_windows += n_windows
            
            print(f"  ✓ Created TimeSeries: {len(train_df_combined):,} rows → {n_windows:,} windows")
            
        except Exception as e:
            print(f"  ✗ Failed to create TimeSeries: {e}")
            continue
    
    print(f"\n{'='*80}")
    print(f"TIMESERIES BUILD COMPLETE")
    print(f"{'='*80}")
    print(f"Total assets: {len(train_ts_list)}")
    print(f"Total windows: {total_windows:,}")
    print(f"{'='*80}\n")
    
    return {
        'train_ts_list': train_ts_list,
        'val_ts_list': val_ts_list,
        'past_covariates_train_list': past_covariates_train_list,
        'past_covariates_val_list': past_covariates_val_list,
        'val_dfs_list': val_dfs_list,
        'total_windows': total_windows,
        'selected_assets': selected_asset_names,
        'avg_variance': 0.0
    }

def build_timeseries_data(
    input_chunk_length, output_chunk_length, 
    target_cols, train_dir, val_dir,
    target_total_rows=50000,
    target_n_assets=5,
    min_segment_length=96
):
    """
    Build training data using ONLY trading periods (aggressive extraction).
    
    This approach:
    1. Extracts continuous trading segments from each asset
    2. Filters out all non-trading periods
    3. Ensures 100% of training data is actual trading activity
    
    Parameters:
    -----------
    input_chunk_length : int
        Input sequence length
    output_chunk_length : int
        Output sequence length
    target_cols : list
        Target column names
    train_dir : str
        Training data directory
    val_dir : str
        Validation data directory
    target_total_rows : int
        Target number of total rows (default: 50,000)
    target_n_assets : int
        Target number of assets to use (default: 5)
    min_segment_length : int
        Minimum trading segment length in rows (default: 96 = 24 hours for 15min)
    """
    global PREBUILT_DATA, TUNING_ASSETS
    
    if PREBUILT_DATA is not None:
        print("Reusing pre-built data...")
        return PREBUILT_DATA
    
    print("\n" + "="*80)
    print("TRADING-ONLY DATA EXTRACTION")
    print("="*80)
    
    # Step 1: Analyze all assets and extract segments
    asset_segments = analyze_asset_segments(train_dir, min_segment_length)

    if len(asset_segments) == 0:
        raise ValueError("No assets with sufficient trading segments found!")
    
    # Step 2: Select best segments
    selected_segments, selected_asset_names = select_best_segments(
        asset_segments, 
        target_total_rows=target_total_rows,
        target_n_assets=target_n_assets
    )

    # Store selected assets globally
    TUNING_ASSETS = selected_asset_names
    
    # Step 3: Build TimeSeries from segments
    PREBUILT_DATA = build_timeseries_from_segments(
        selected_segments,
        val_dir,
        target_cols,
        input_chunk_length,
        output_chunk_length,
        selected_asset_names
    )
    
    # Verification
    print("="*80)
    print("FINAL VERIFICATION")
    print("="*80)
    print(f"✓ Assets: {len(PREBUILT_DATA['train_ts_list'])}")
    print(f"✓ Total windows: {PREBUILT_DATA['total_windows']:,}")
    print(f"✓ Training data is 100% trading periods")
    print("="*80 + "\n")

    return PREBUILT_DATA

# To create filtered parquet files
def create_trading_only_files(train_dir, val_dir, output_train_dir, output_val_dir, assets_list):
    """Create new parquet files with only trading segments."""
    os.makedirs(output_train_dir, exist_ok=True)
    os.makedirs(output_val_dir, exist_ok=True)
    
    for asset_name in assets_list:
        # Process training file
        train_file = os.path.join(train_dir, f"{asset_name}.parquet")
        if os.path.exists(train_file):
            df = pd.read_parquet(train_file)
            segments = extract_trading_segments(df, min_segment_length=96)
            
            if segments:
                df_trading = pd.concat(segments, ignore_index=True)
                output_path = os.path.join(output_train_dir, f"{asset_name}.parquet")
                df_trading.to_parquet(output_path)
                print(f"✓ Created training file for {asset_name}")

        # Process validation file
        val_file = os.path.join(val_dir, f"{asset_name}.parquet")
        if os.path.exists(val_file):
            df = pd.read_parquet(val_file)
            segments = extract_trading_segments(df, min_segment_length=96)
            
            if segments:
                df_trading = pd.concat(segments, ignore_index=True)
                output_path = os.path.join(output_val_dir, f"{asset_name}.parquet")
                df_trading.to_parquet(output_path)
                print(f"✓ Created validation file for {asset_name}")

## 3. Chronos Hyperparameter Tuning

In [4]:
# --- Helpers used by objective_chronos2 ---
def _to_tensor(x):
    """Convert x (tensor / numpy / list/tuple of tensors or scalars) -> torch.Tensor (cpu)."""
    if isinstance(x, torch.Tensor):
        return x.cpu()
    if isinstance(x, np.ndarray):
        return torch.from_numpy(x)
    if isinstance(x, (list, tuple)):
        if len(x) == 0:
            raise ValueError("Empty list cannot be converted to tensor.")
        if all(isinstance(el, torch.Tensor) for el in x):
            return torch.stack([el.cpu() for el in x])
        if all(isinstance(el, np.ndarray) for el in x):
            return torch.from_numpy(np.stack(x))
        # fallback: attempt elementwise conversion then stack
        converted = []
        for el in x:
            if isinstance(el, torch.Tensor):
                converted.append(el.cpu())
            elif isinstance(el, np.ndarray):
                converted.append(torch.from_numpy(el))
            else:
                converted.append(torch.tensor(el))
        return torch.stack(converted)
    # final fallback
    return torch.tensor(x)

def _extract_context_target_mask(batch):
    """Return (context, target, future_mask_or_None) from common batch formats."""
    if isinstance(batch, dict):
        ctx_keys = ("past_target", "past_targets", "past", "history", "x")
        tgt_keys = ("future_target", "future_targets", "future", "y", "target")
        mask_keys = ("future_mask", "mask", "future_masks")
        context = next((batch[k] for k in ctx_keys if k in batch), None)
        target = next((batch[k] for k in tgt_keys if k in batch), None)
        future_mask = next((batch[k] for k in mask_keys if k in batch), None)
        return context, target, future_mask

    if isinstance(batch, (tuple, list)):
        if len(batch) == 2:
            return batch[0], batch[1], None
        if len(batch) >= 3:
            context = batch[0]
            target = batch[2] if len(batch) > 2 else None
            future_mask = batch[3] if len(batch) > 3 else None
            return context, target, future_mask

    raise TypeError(f"Unsupported batch format: {type(batch)}")

def _to_torch(x):
    """Convert pipeline outputs (list/ndarray/tensor) -> torch.Tensor on CPU or None if x is None."""
    if x is None:
        return None
    if isinstance(x, torch.Tensor):
        return x.cpu()
    if isinstance(x, np.ndarray):
        return torch.from_numpy(x)
    if isinstance(x, list):
        if len(x) == 0:
            return None
        if all(isinstance(el, np.ndarray) for el in x):
            return torch.from_numpy(np.stack(x))
        if all(isinstance(el, torch.Tensor) for el in x):
            return torch.stack([el.cpu() for el in x])
        # fallback to tensor construct
        return torch.tensor(x)
    raise TypeError(f"Unsupported type for forecast output conversion: {type(x)}")

def align_forecast_to_target(q_out: torch.Tensor | None,
                             mean_out: torch.Tensor | None,
                             n_features: int,
                             pred_len: int,
                             quantile_levels: list) -> torch.Tensor | None:
    """
    Align forecast to [B, pred_len, n_features]. Returns CPU tensor or None.
    Handles common layouts including [B, n_var, pred_len, n_q] and [B, pred_len, n_q, n_var].
    """
    median_idx = len(quantile_levels) // 2

    if q_out is not None:
        if q_out.ndim == 4:
            B, a, b, c = q_out.shape
            # case: [B, n_var, pred_len, n_q]
            if a == n_features and b == pred_len and c == len(quantile_levels):
                med = q_out[..., median_idx]           # [B, n_var, pred_len]
                return med.permute(0, 2, 1).contiguous()  # -> [B, pred_len, n_var]
            # case: [B, pred_len, n_q, n_var]
            if a == pred_len and b == len(quantile_levels) and c == n_features:
                return q_out[:, :, median_idx, :].contiguous()  # [B, pred_len, n_var]
            # try reasonable permutations
            for perm in [(0,1,2,3),(0,2,1,3),(0,3,1,2),(0,2,3,1)]:
                try:
                    cand = q_out.permute(*perm)
                    if cand.ndim == 4 and cand.shape[1] == pred_len and cand.shape[3] == n_features and cand.shape[2] == len(quantile_levels):
                        return cand[:, :, median_idx, :].contiguous()
                except Exception:
                    pass

        elif q_out.ndim == 3:
            s = q_out.shape
            # [B, pred_len, n_var]
            if s[1] == pred_len and s[2] == n_features:
                return q_out.contiguous()
            # [B, n_var, pred_len]
            if s[1] == n_features and s[2] == pred_len:
                return q_out.permute(0, 2, 1).contiguous()
            # [B, n_var, n_q] -> pick median quantile and expand (best-effort)
            if s[1] == n_features and s[2] == len(quantile_levels):
                med = q_out[:, :, median_idx]  # [B, n_var]
                return med.unsqueeze(1).expand(-1, pred_len, -1).contiguous()

    # fallback to mean_out
    if mean_out is not None and isinstance(mean_out, torch.Tensor):
        if mean_out.ndim == 3:
            if mean_out.shape[1] == n_features and mean_out.shape[2] == pred_len:
                return mean_out.permute(0, 2, 1).contiguous()
            if mean_out.shape[1] == pred_len and mean_out.shape[2] == n_features:
                return mean_out.contiguous()

    return None

In [5]:
if PREBUILT_DATA is None:
    data = build_timeseries_data(
        input_chunk_length=INPUT_CHUNK_LENGTH,
        output_chunk_length=OUTPUT_CHUNK_LENGTH,
        target_cols=TARGET_COLS,
        train_dir=TRAIN_DIR,
        val_dir=VAL_DIR,
        target_total_rows=200000,
        target_n_assets=5,
        min_segment_length=96
    )
else:
    data = PREBUILT_DATA
    print(f"Reusing pre-built data: {data['total_windows']} windows, {len(data['train_ts_list'])} assets")

# Create trading-only files for Chronos (which uses the datamodule)
if TUNING_ASSETS is not None:
    create_trading_only_files(
        TRAIN_DIR, VAL_DIR, 
        TRAIN_DIR_FILTERED, VAL_DIR_FILTERED,
        TUNING_ASSETS
    )
    print(f"\n✓ Created filtered trading-only files for {len(TUNING_ASSETS)} assets")

def objective_chronos2(trial: optuna.trial.Trial) -> float:
    """
    Objective function for Chronos-2 hyperparameter tuning.
    """
    # Hyperparameters to tune
    quantile_levels_choice = trial.suggest_categorical("quantile_levels", ["small", "medium", "large"])
    quantile_levels_map = {
        "small": [0.1, 0.5, 0.9],
        "medium": [0.05, 0.25, 0.5, 0.75, 0.95],
        "large": [0.1, 0.25, 0.5, 0.75, 0.9]
    }
    quantile_levels = quantile_levels_map[quantile_levels_choice]

    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64])

    lr = trial.suggest_categorical("lr", [1e-5, 3e-5, 1e-4, 3e-4])

    # Additional hyperparameters specific to your use case
    max_context_length = trial.suggest_categorical("max_context_length", [96, 192, 384])
    
    print(f"\n{'='*60}\nTrial {trial.number}\n{'='*60}")
    start_time = time.time()

    # choose device for loss computation
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    
    try:
        
        # Load pretrained pipeline (keeps model on cuda if possible per from_pretrained device_map)
        pipeline = Chronos2Pipeline.from_pretrained(
            "amazon/chronos-2",
            device_map="cuda" if torch.cuda.is_available() else None,
            dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32
        )

        # Create datamodule with hyperparameters
        datamodule = ElectricityDataModule(
            train_parquet=TRAIN_DIR_FILTERED,
            val_parquet=VAL_DIR_FILTERED,
            scalers_dir=SCALERS_DIR,
            batch_size = batch_size,
            dataset_kwargs={
                'input_chunk_length': min(INPUT_CHUNK_LENGTH, max_context_length),
                'output_chunk_length': OUTPUT_CHUNK_LENGTH,
                'target_cols': TARGET_COLS,
                'assets_list': TUNING_ASSETS
            }
        )

        # Setup datamodule
        datamodule.setup()
        
        # Get validation loader
        val_loader = datamodule.val_dataloader()
        
        # Evaluate on validation set
        total_loss = 0.0
        total_batches = 0
        max_batches = 20  # Limit for speed
        
        with torch.no_grad():
            for batch_idx, batch in enumerate(val_loader):
                if batch_idx >= max_batches:
                    break
                    
                # extract raw
                try:
                    raw_context, raw_target, raw_mask = _extract_context_target_mask(batch)
                except Exception as e:
                    print(f"Error extracting context/target from batch {batch_idx}: {e}")
                    continue

                # convert to tensors (CPU)
                try:
                    context = _to_tensor(raw_context)          # [B, context_len, n_features]
                    target = _to_tensor(raw_target) if raw_target is not None else None
                    future_mask = _to_tensor(raw_mask) if raw_mask is not None else None
                except Exception as e:
                    print(f"Error converting batch elements to tensors (batch {batch_idx}): {e}")
                    continue

                # ensure dims
                if context.ndim == 2:
                    context = context.unsqueeze(-1)
                if target is not None and target.ndim == 2:
                    target = target.unsqueeze(-1)


                # shapes and params
                try:
                    B = context.shape[0]
                    context_length = context.shape[1]
                    n_features = context.shape[2]
                except Exception as e:
                    print(f"Error reading shapes from context (batch {batch_idx}): {e}")
                    continue

                # prepare input for pipeline: [B, n_features, history]
                multivariate_context = context.transpose(1, 2)   # CPU tensor

                # call pipeline (requires numpy input on CPU)
                try:
                    q_out_raw, mean_out_raw = pipeline.predict_quantiles(
                        multivariate_context.cpu().numpy(),
                        prediction_length=OUTPUT_CHUNK_LENGTH,
                        quantile_levels=quantile_levels
                    )
                except Exception as e:
                    print(f"pipeline.predict_quantiles error (batch {batch_idx}): {e}")
                    continue

                # convert pipeline outputs to CPU torch tensors
                try:
                    q_out = _to_torch(q_out_raw)    # CPU
                except Exception as e:
                    print(f"Could not convert quantiles output to tensor: {e}")
                    continue

                mean_out = None
                if mean_out_raw is not None:
                    try:
                        mean_out = _to_torch(mean_out_raw)
                    except Exception as e:
                        print(f"Could not convert mean output to tensor: {e}")
                        mean_out = None

                # align forecast to [B, pred_len, n_features]
                forecast_median = align_forecast_to_target(q_out, mean_out, n_features, OUTPUT_CHUNK_LENGTH, quantile_levels)
                if forecast_median is None:
                    print(f"Shape mismatch after alignment (batch {batch_idx}): could not align forecast {getattr(q_out,'shape',None)} vs target {getattr(target,'shape',None)}")
                    continue

                # final sanity: shapes must match
                if target is None:
                    print(f"Missing target for batch {batch_idx}; skipping")
                    continue
                if forecast_median.shape != target.shape:
                    print(f"Shape mismatch after alignment (batch {batch_idx}): forecast {forecast_median.shape} vs target {target.shape}")
                    continue

                # Move tensors to computation device (GPU preferred) for loss computation
                forecast_median = forecast_median.to(device)
                target = target.to(device)
                future_mask = (future_mask.to(device) if future_mask is not None else torch.ones_like(target).to(device))

                # compute loss
                try:
                    loss = masked_smoothed_smape(forecast_median, target, future_mask)
                    batch_loss = float(loss.item())
                except Exception as e:
                    print(f"Error computing loss on batch {batch_idx}: {e}")
                    continue

                total_loss += batch_loss
                total_batches += 1   

        avg_loss = total_loss / total_batches if total_batches > 0 else float('inf')
        fit_time = time.time() - start_time
        print(f"Trial {trial.number}: val_loss={avg_loss:.6f}, time={fit_time:.1f}s")
                    
        trial.set_user_attr("actual_quantile_levels", quantile_levels)
        return avg_loss                


    except RuntimeError as e:
        if "out of memory" in str(e).lower():
            print(f"Trial {trial.number}: CUDA OOM")
            torch.cuda.empty_cache()
            return float('inf')
        else:
            print(f"Trial {trial.number}: Runtime error: {e}")
            return float('inf')


    except Exception as e:
        print(f"Trial {trial.number}: Failed with {type(e).__name__}: {e}")
        return float('inf')

    finally:
        if 'pipeline' in locals():
            try:
                del pipeline
            except Exception:
                pass
        torch.cuda.empty_cache()
        

print("\n" + "=" * 60)
print("Starting Chronos-2 Multivariate Optimization")
print("=" * 60)

# Create Optuna study
sampler = optuna.samplers.TPESampler(n_startup_trials=3, seed=SEED)
pruner = optuna.pruners.MedianPruner(n_startup_trials=2, n_warmup_steps=2)

study_chronos2 = optuna.create_study(
    direction="minimize",
    study_name="Chronos2_multivariate_tuning",
    sampler=sampler,
    pruner=pruner
)

total_start = time.time()

try:
    study_chronos2.optimize(
        objective_chronos2,
        n_trials=N_TRIALS_Chronos,
        show_progress_bar=True,
        gc_after_trial=True
    )
except KeyboardInterrupt:
    print("\nOptimization interrupted by user")

total_time = time.time() - total_start

print("\n" + "=" * 60)
print("Chronos-2 Multivariate Optimization Complete!")
print("=" * 60)

# Process results
completed_trials = [t for t in study_chronos2.trials if t.state == optuna.trial.TrialState.COMPLETE]
pruned_trials = [t for t in study_chronos2.trials if t.state == optuna.trial.TrialState.PRUNED]
failed_trials = [t for t in study_chronos2.trials if t.state == optuna.trial.TrialState.FAIL]

if len(completed_trials) > 0:
    best_params_chronos2 = study_chronos2.best_params
    best_value = study_chronos2.best_value
    
    # Get the actual quantile levels used in the best trial
    best_trial = study_chronos2.best_trial
    actual_quantile_levels = best_trial.user_attrs.get("actual_quantile_levels", None)
    
    print(f"\nBest trial: {best_trial.number}")
    print(f"Best validation loss: {best_value:.6f}")
    print("\n--- Best Chronos-2 Hyperparameters ---")
    for key, value in best_params_chronos2.items():
        print(f"  {key}: {value}")
    if actual_quantile_levels:
        print(f"  Actual quantile levels used: {actual_quantile_levels}")
    
    trial_durations = [t.duration.total_seconds() for t in completed_trials if t.duration]
    if trial_durations:
        avg_duration = np.mean(trial_durations)
        total_training_time = sum(trial_durations)
        print(f"\nTiming Statistics:")
        print(f"  Total optimization time: {total_time/60:.1f} minutes")
        print(f"  Total training time: {total_training_time/60:.1f} minutes")
        print(f"  Average trial duration: {avg_duration:.1f}s")
    
    print(f"\nTrial Statistics:")
    print(f"  Total trials: {len(study_chronos2.trials)}")
    print(f"  Completed: {len(completed_trials)}")
    print(f"  Pruned: {len(pruned_trials)}")
    print(f"  Failed: {len(failed_trials)}")
    
    os.makedirs(RESULTS_DIR, exist_ok=True)
    
    # Save results with actual quantile levels
    save_params = best_params_chronos2.copy()
    if actual_quantile_levels:
        save_params['actual_quantile_levels'] = actual_quantile_levels
    
    with open(os.path.join(RESULTS_DIR, "best_params_chronos2.json"), "w") as f:
        json.dump(save_params, f, indent=2)
    
    stats = {
        'best_value': best_value,
        'best_params': save_params,
        'best_trial': best_trial.number,
        'n_trials': len(study_chronos2.trials),
        'n_completed': len(completed_trials),
        'n_pruned': len(pruned_trials),
        'n_failed': len(failed_trials),
        'total_time_seconds': total_time
    }
    
    with open(os.path.join(RESULTS_DIR, "chronos2_study_stats.json"), "w") as f:
        json.dump(stats, f, indent=2)
    
    print(f"\nResults saved to {RESULTS_DIR}")
    
    if len(completed_trials) > 4:
        print("\nTop 5 Trials:")
        sorted_trials = sorted(completed_trials, key=lambda t: t.value)[:5]
        for i, trial in enumerate(sorted_trials, 1):
            duration = trial.duration.total_seconds() if trial.duration else 0
            actual_ql = trial.user_attrs.get("actual_quantile_levels", "N/A")
            print(f"  {i}. Trial {trial.number}: loss={trial.value:.6f}, time={duration:.1f}s, quantiles={actual_ql}")
else:
    print("\nNo trials completed successfully!")
    print(f"Failed trials: {len(failed_trials)}")
    print(f"Pruned trials: {len(pruned_trials)}")
    best_params_chronos2 = {}


TRADING-ONLY DATA EXTRACTION
TRADING SEGMENT EXTRACTION
Minimum segment length: 96 rows
(For 15min data: 24.0 hours)


SELECTING SEGMENTS
Target total rows: 200,000
Target n_assets: 5

✓ Selected Thu23Q3              Segments: 105 | Rows:   81,980 | Cumulative:     81,980
✓ Selected Thu23Q2              Segments: 104 | Rows:   81,908 | Cumulative:    163,888
✓ Selected Thu23Q1              Segments: 104 | Rows:   81,849 | Cumulative:    245,737

✓ Reached target of 200,000 rows!

SELECTION SUMMARY
Assets selected: 3
Total segments: 313
Total rows: 245,737

BUILDING TIMESERIES FROM SEGMENTS

Processing Thu23Q3: 105 segments
  ✓ Created TimeSeries: 81,980 rows → 81,923 windows

Processing Thu23Q2: 104 segments
  ✓ Created TimeSeries: 81,908 rows → 81,851 windows

Processing Thu23Q1: 104 segments
  ✓ Created TimeSeries: 81,849 rows → 81,792 windows

TIMESERIES BUILD COMPLETE
Total assets: 3
Total windows: 245,566

FINAL VERIFICATION
✓ Assets: 3
✓ Total windows: 245,566
✓ Training data is

[I 2025-10-31 01:55:46,126] A new study created in memory with name: Chronos2_multivariate_tuning


✓ Created training file for Thu23Q1
✓ Created validation file for Thu23Q1

✓ Created filtered trading-only files for 3 assets

Starting Chronos-2 Multivariate Optimization


  0%|          | 0/20 [00:00<?, ?it/s]


Trial 0
Trial 0: val_loss=0.507279, time=535.6s
[I 2025-10-31 02:04:43,332] Trial 0 finished with value: 0.5072793900966645 and parameters: {'quantile_levels': 'large', 'batch_size': 32, 'lr': 1e-05, 'max_context_length': 96}. Best is trial 0 with value: 0.5072793900966645.

Trial 1
Trial 1: val_loss=0.489217, time=536.2s
[I 2025-10-31 02:13:41,514] Trial 1 finished with value: 0.4892168939113617 and parameters: {'quantile_levels': 'medium', 'batch_size': 64, 'lr': 0.0001, 'max_context_length': 96}. Best is trial 1 with value: 0.4892168939113617.

Trial 2
Trial 2: val_loss=0.518395, time=534.2s
[I 2025-10-31 02:22:37,773] Trial 2 finished with value: 0.5183945670723915 and parameters: {'quantile_levels': 'medium', 'batch_size': 16, 'lr': 3e-05, 'max_context_length': 384}. Best is trial 1 with value: 0.4892168939113617.

Trial 3
Trial 3: val_loss=0.489217, time=545.3s
[I 2025-10-31 02:31:45,139] Trial 3 finished with value: 0.4892168939113617 and parameters: {'quantile_levels': 'medium

There actually many trials with the same loss. All of those trials have the batch size of 64 and all but one use medium quantile levels.The learing rates and max context length varies. To speed the training and hopefully the convergence, I'll pick the largest learning rates and mininun max context length among those successfull trials. This will require a manual intervention.

In [8]:
best_params_chronos2 = {
  "quantile_levels": "medium",
  "batch_size": 64,
  "lr": 0.0003,
  "max_context_length": 96,
  "actual_quantile_levels": [
    0.05,
    0.25,
    0.5,
    0.75,
    0.95
  ]
}

with open(os.path.join(RESULTS_DIR, "best_params_chronos2.json"), "w") as f:
    json.dump(save_params, f, indent=2)

## 4. LSTM Hyperparameter Tuning

In [6]:
torch.set_float32_matmul_precision('medium')

if PREBUILT_DATA is None:
    data = build_timeseries_data(
        input_chunk_length=INPUT_CHUNK_LENGTH,
        output_chunk_length=OUTPUT_CHUNK_LENGTH,
        target_cols=TARGET_COLS,
        train_dir=TRAIN_DIR,
        val_dir=VAL_DIR,
        target_total_rows=200000,
        target_n_assets=5,
        min_segment_length=96
    )
else:
    data = PREBUILT_DATA
    print(f"Reusing pre-built data: {data['total_windows']} windows, {len(data['train_ts_list'])} assets")

train_ts_list = data['train_ts_list']
val_ts_list = data['val_ts_list']
past_covariates_train_list = data['past_covariates_train_list']
past_covariates_val_list = data['past_covariates_val_list']

class PyTorchLightningPruningCallback(pl.Callback):
    """Prune trials that perform poorly early."""
    
    def __init__(self, trial, monitor="val_loss"):
        super().__init__()
        self.trial = trial
        self.monitor = monitor
    
    def on_validation_end(self, trainer, pl_module):
        epoch = trainer.current_epoch
        current_score = trainer.callback_metrics.get(self.monitor)
        if current_score is None:
            return
        
        self.trial.report(float(current_score), epoch)
        
        if self.trial.should_prune():
            message = f"Trial pruned at epoch {epoch}"
            raise optuna.TrialPruned(message)

def objective_lstm_optimized(trial: optuna.trial.Trial) -> float:
    """
    Optimized LSTM objective using BlockRNNModel for multi-step forecasting.
    """
    
    hidden_dim = trial.suggest_categorical("hidden_dim", [32, 64, 128, 256])
    n_rnn_layers = trial.suggest_int("n_rnn_layers", 2, 4) 
    dropout = trial.suggest_categorical("dropout", [0.0, 0.1, 0.2, 0.3, 0.4])
    lr = trial.suggest_categorical("lr", [1e-4, 3e-4, 1e-3, 3e-3, 1e-2])
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])
    
    print(f"\n{'='*60}")
    print(f"Trial {trial.number}")
    print(f"{'='*60}")
    print(f"Params: hidden={hidden_dim}, layers={n_rnn_layers}, ")
    print(f"        dropout={dropout:.3f}, lr={lr:.5f}, batch={batch_size}")
    
    start_time = time.time()
    
    try:
        timestamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        trial_id = f"trial_{trial.number:03d}"
        ckpt_dir = f"./checkpoints/LSTM_optuna/{trial_id}"
        os.makedirs(ckpt_dir, exist_ok=True)
        
        early_stopping = EarlyStopping(
            monitor='val_loss',
            patience=3,
            mode='min',
            min_delta=1e-4,
            verbose=False
        )
        
        pruning_callback = PyTorchLightningPruningCallback(trial, monitor="val_loss")
        
        max_epochs = 10

        trainer_kwargs = {
            'accelerator': 'gpu',
            'devices': 1,
            'precision': '16-mixed',
            'max_epochs': max_epochs,
            'gradient_clip_val': 1.0,
            'gradient_clip_algorithm': 'norm',
            'enable_progress_bar': False,
            'enable_model_summary': False,
            'logger': False,
            'callbacks': [early_stopping, pruning_callback],
            'benchmark': True,
            'deterministic': False,
            'val_check_interval': 1.0,
            'num_sanity_val_steps': 0
        }
        
        model = BlockRNNModel(
            model="LSTM",
            input_chunk_length=INPUT_CHUNK_LENGTH,
            output_chunk_length=OUTPUT_CHUNK_LENGTH,
            hidden_dim=hidden_dim,
            n_rnn_layers=n_rnn_layers,
            dropout=dropout,
            batch_size=batch_size,
            n_epochs=max_epochs,
            loss_fn=torch.nn.L1Loss(),
            optimizer_kwargs={'lr': lr},
            model_name=f"LSTM_trial_{trial.number}",
            save_checkpoints=True,
            work_dir=ckpt_dir,
            force_reset=True,
            random_state=SEED,
            pl_trainer_kwargs=trainer_kwargs
        )
        
        model.fit(
            series=train_ts_list,
            past_covariates=past_covariates_train_list,
            val_series=val_ts_list,
            val_past_covariates=past_covariates_val_list,
            verbose=False
        )
        
        fit_time = time.time() - start_time
        
        if hasattr(model, 'trainer') and model.trainer is not None:
            val_loss = model.trainer.callback_metrics.get("val_loss", None)
            
            if val_loss is None:
                print(f"Trial {trial.number}: No val_loss found")
                return float('inf')
            
            val_loss_value = float(val_loss.item() if torch.is_tensor(val_loss) else val_loss)
            
            if not np.isfinite(val_loss_value):
                print(f"Trial {trial.number}: Invalid val_loss={val_loss_value}")
                return float('inf')
            
            actual_epochs = model.trainer.current_epoch + 1
            
            print(f"Trial {trial.number}: val_loss={val_loss_value:.6f}, ")
            print(f"              epochs={actual_epochs}/{max_epochs}, time={fit_time:.1f}s")
            
            return val_loss_value
        else:
            print(f"Trial {trial.number}: Trainer not accessible")
            return float('inf')
            
    except optuna.TrialPruned:
        print(f"Trial {trial.number}: Pruned")
        raise
        
    except RuntimeError as e:
        if "out of memory" in str(e).lower():
            print(f"Trial {trial.number}: CUDA OOM")
            torch.cuda.empty_cache()
            return float('inf')
        else:
            print(f"Trial {trial.number}: Runtime error: {e}")
            return float('inf')
    
    except Exception as e:
        print(f"Trial {trial.number}: Failed with {type(e).__name__}: {e}")
        return float('inf')
    
    finally:
        if 'model' in locals():
            try:
                del model
            except:
                pass
        torch.cuda.empty_cache()


print("\n" + "=" * 60)
print("Starting LSTM Optimization")
print("=" * 60)

sampler = optuna.samplers.TPESampler(n_startup_trials=5, seed=SEED)
pruner = optuna.pruners.MedianPruner(n_startup_trials=3, n_warmup_steps=3, interval_steps=1)

study_lstm = optuna.create_study(
    direction="minimize",
    study_name="LSTM_tuning",
    sampler=sampler,
    pruner=pruner
)

total_start = time.time()

try:
    study_lstm.optimize(
        objective_lstm_optimized,
        n_trials=N_TRIALS_LSTM,
        show_progress_bar=True,
        gc_after_trial=True
    )
except KeyboardInterrupt:
    print("\nOptimization interrupted by user")

total_time = time.time() - total_start

print("\n" + "=" * 60)
print("Optimization Complete!")
print("=" * 60)

completed_trials = [t for t in study_lstm.trials if t.state == optuna.trial.TrialState.COMPLETE]
pruned_trials = [t for t in study_lstm.trials if t.state == optuna.trial.TrialState.PRUNED]
failed_trials = [t for t in study_lstm.trials if t.state == optuna.trial.TrialState.FAIL]

if len(completed_trials) > 0:
    best_params_lstm = study_lstm.best_params
    best_value = study_lstm.best_value
    
    print(f"\nBest trial: {study_lstm.best_trial.number}")
    print(f"Best validation loss: {best_value:.6f}")
    print("\n--- Best LSTM Hyperparameters ---")
    for key, value in best_params_lstm.items():
        print(f"  {key}: {value}")
    
    trial_durations = [t.duration.total_seconds() for t in completed_trials if t.duration]
    if trial_durations:
        avg_duration = np.mean(trial_durations)
        total_training_time = sum(trial_durations)
        print(f"\nTiming Statistics:")
        print(f"  Total optimization time: {total_time/60:.1f} minutes")
        print(f"  Total training time: {total_training_time/60:.1f} minutes")
        print(f"  Average trial duration: {avg_duration:.1f}s")
    
    print(f"\nTrial Statistics:")
    print(f"  Total trials: {len(study_lstm.trials)}")
    print(f"  Completed: {len(completed_trials)}")
    print(f"  Pruned: {len(pruned_trials)}")
    print(f"  Failed: {len(failed_trials)}")
    
    os.makedirs(RESULTS_DIR, exist_ok=True)
    
    with open(os.path.join(RESULTS_DIR, "best_params_lstm.json"), "w") as f:
        json.dump(best_params_lstm, f, indent=2)
    
    stats = {
        'best_value': best_value,
        'best_params': best_params_lstm,
        'best_trial': study_lstm.best_trial.number,
        'n_trials': len(study_lstm.trials),
        'n_completed': len(completed_trials),
        'n_pruned': len(pruned_trials),
        'n_failed': len(failed_trials),
        'total_time_seconds': total_time
    }
    
    with open(os.path.join(RESULTS_DIR, "lstm_study_stats.json"), "w") as f:
        json.dump(stats, f, indent=2)
    
    print(f"\nResults saved to {RESULTS_DIR}")
    
    if len(completed_trials) > 4:
        print("\nTop 5 Trials:")
        sorted_trials = sorted(completed_trials, key=lambda t: t.value)[:5]
        for i, trial in enumerate(sorted_trials, 1):
            duration = trial.duration.total_seconds() if trial.duration else 0
            print(f"  {i}. Trial {trial.number}: loss={trial.value:.6f}, time={duration:.1f}s")

else:
    print("\nNo trials completed successfully!")
    print(f"Failed trials: {len(failed_trials)}")
    print(f"Pruned trials: {len(pruned_trials)}")
    best_params_lstm = {}

  _C._set_float32_matmul_precision(precision)
[I 2025-10-31 04:59:04,748] A new study created in memory with name: LSTM_tuning


Reusing pre-built data: 245566 windows, 3 assets

Starting LSTM Optimization


  0%|          | 0/10 [00:00<?, ?it/s]


Trial 0
Params: hidden=128, layers=4, 
        dropout=0.100, lr=0.01000, batch=32


Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Trial 0: val_loss=22.747440, 
              epochs=6/10, time=1550.0s
[I 2025-10-31 05:24:54,758] Trial 0 finished with value: 22.747440338134766 and parameters: {'hidden_dim': 128, 'n_rnn_layers': 4, 'dropout': 0.1, 'lr': 0.01, 'batch_size': 32}. Best is trial 0 with value: 22.747440338134766.


Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True



Trial 1
Params: hidden=32, layers=2, 
        dropout=0.400, lr=0.00030, batch=32


TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
`Trainer.fit` stopped: `max_epochs=10` reached.


Trial 1: val_loss=11.313666, 
              epochs=11/10, time=2493.1s
[I 2025-10-31 06:06:28,393] Trial 1 finished with value: 11.313666343688965 and parameters: {'hidden_dim': 32, 'n_rnn_layers': 2, 'dropout': 0.4, 'lr': 0.0003, 'batch_size': 32}. Best is trial 1 with value: 11.313666343688965.


Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



Trial 2
Params: hidden=128, layers=3, 
        dropout=0.100, lr=0.00300, batch=128
Trial 2: val_loss=9.943198, 
              epochs=6/10, time=445.0s
[I 2025-10-31 06:13:53,957] Trial 2 finished with value: 9.943198204040527 and parameters: {'hidden_dim': 128, 'n_rnn_layers': 3, 'dropout': 0.1, 'lr': 0.003, 'batch_size': 128}. Best is trial 2 with value: 9.943198204040527.


Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



Trial 3
Params: hidden=128, layers=4, 
        dropout=0.000, lr=0.00300, batch=128
Trial 3: val_loss=11.340981, 
              epochs=10/10, time=860.8s
[I 2025-10-31 06:28:15,317] Trial 3 finished with value: 11.340981483459473 and parameters: {'hidden_dim': 128, 'n_rnn_layers': 4, 'dropout': 0.0, 'lr': 0.003, 'batch_size': 128}. Best is trial 2 with value: 9.943198204040527.


Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]



Trial 4
Params: hidden=256, layers=3, 
        dropout=0.000, lr=0.00300, batch=64


`Trainer.fit` stopped: `max_epochs=10` reached.
Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Trial 4: val_loss=10.689938, 
              epochs=11/10, time=1935.4s
[I 2025-10-31 07:00:31,287] Trial 4 finished with value: 10.689937591552734 and parameters: {'hidden_dim': 256, 'n_rnn_layers': 3, 'dropout': 0.0, 'lr': 0.003, 'batch_size': 64}. Best is trial 2 with value: 9.943198204040527.

Trial 5
Params: hidden=64, layers=2, 
        dropout=0.100, lr=0.00100, batch=128


Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Trial 5: val_loss=8.530094, 
              epochs=9/10, time=153.9s
[I 2025-10-31 07:03:05,367] Trial 5 finished with value: 8.530094146728516 and parameters: {'hidden_dim': 64, 'n_rnn_layers': 2, 'dropout': 0.1, 'lr': 0.001, 'batch_size': 128}. Best is trial 5 with value: 8.530094146728516.

Trial 6
Params: hidden=64, layers=2, 
        dropout=0.200, lr=0.00100, batch=128


Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Trial 6: val_loss=9.693222, 
              epochs=9/10, time=81.4s
[I 2025-10-31 07:04:26,869] Trial 6 finished with value: 9.693222045898438 and parameters: {'hidden_dim': 64, 'n_rnn_layers': 2, 'dropout': 0.2, 'lr': 0.001, 'batch_size': 128}. Best is trial 5 with value: 8.530094146728516.

Trial 7
Params: hidden=64, layers=2, 
        dropout=0.300, lr=0.00010, batch=64


Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Trial 7: Pruned
[I 2025-10-31 07:05:39,035] Trial 7 pruned. Trial pruned at epoch 3

Trial 8
Params: hidden=64, layers=2, 
        dropout=0.100, lr=0.00100, batch=128


Detected user-defined float16-like precision. For mixed precision training, recommended options are 'bf16-mixed' and '16-mixed'.
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Trial 8: val_loss=8.530094, 
              epochs=9/10, time=83.4s
[I 2025-10-31 07:07:02,533] Trial 8 finished with value: 8.530094146728516 and parameters: {'hidden_dim': 64, 'n_rnn_layers': 2, 'dropout': 0.1, 'lr': 0.001, 'batch_size': 128}. Best is trial 5 with value: 8.530094146728516.

Trial 9
Params: hidden=32, layers=3, 
        dropout=0.400, lr=0.00100, batch=128
Trial 9: Pruned
[I 2025-10-31 07:07:50,370] Trial 9 pruned. Trial pruned at epoch 3

Optimization Complete!

Best trial: 5
Best validation loss: 8.530094

--- Best LSTM Hyperparameters ---
  hidden_dim: 64
  n_rnn_layers: 2
  dropout: 0.1
  lr: 0.001
  batch_size: 128

Timing Statistics:
  Total optimization time: 128.8 minutes
  Total training time: 126.7 minutes
  Average trial duration: 950.4s

Trial Statistics:
  Total trials: 10
  Completed: 8
  Pruned: 2
  Failed: 0

Results saved to ..\results

Top 5 Trials:
  1. Trial 5: loss=8.530094, time=153.9s
  2. Trial 8: loss=8.530094, time=83.4s
  3. Trial 6: loss=9.6

## 5. Save Best Hyperparameters

In [9]:
best_params = {
    "chronos": best_params_chronos2,
    "LSTM": best_params_lstm,
}
output_path = os.path.join(RESULTS_DIR, "best_hyperparameters.json")
with open(output_path, 'w') as f:
    json.dump(best_params, f, indent=4)

print(f"Best hyperparameters saved to: {output_path}")
print('--- Contents ---')
print(json.dumps(best_params, indent=4))

Best hyperparameters saved to: ..\results\best_hyperparameters_tst.json
--- Contents ---
{
    "chronos": {
        "quantile_levels": "medium",
        "batch_size": 64,
        "lr": 0.0003,
        "max_context_length": 96,
        "actual_quantile_levels": [
            0.05,
            0.25,
            0.5,
            0.75,
            0.95
        ]
    },
    "LSTM": {
        "hidden_dim": 64,
        "n_rnn_layers": 2,
        "dropout": 0.1,
        "lr": 0.001,
        "batch_size": 128
    }
}
