## ⚡ ULTRA-FAST Quick Test Configuration

**🚀 For immediate testing and debugging - runs in under 2 minutes!**

This ultra-fast configuration is designed for:
- **Immediate feedback** during development
- **Quick sanity checks** that everything works
- **Fast iterations** while debugging
- **Proof of concept** validation

**Expected time: 30 seconds - 2 minutes total**

## ⚠️ **Important Note**

This notebook uses the **standard TSLib data loading approach** with `Dataset_Custom` for maximum compatibility and performance. The prepared financial data (`prepared_financial_data.csv`) contains all targets and covariates in a single file that's ready for TimesNet training.

**Data Structure:**
- ✅ **Targets**: 4 columns (log_Open, log_High, log_Low, log_Close)
- ✅ **Covariates**: 114 columns (87 dynamic + 26 static + 1 time_delta)
- ✅ **Total**: 118 features aligned on business days
- ✅ **Ready to use**: No additional data preparation needed
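
The column layout described above can be checked before training. The following is a minimal sketch, not part of the original notebook: it assumes the timestamp column is named `date` (the usual TSLib convention), and the covariate names in the in-memory sample are hypothetical placeholders.

```python
import csv
import io

TARGETS = ["log_Open", "log_High", "log_Low", "log_Close"]

def summarize_columns(header, date_col="date"):
    """Split a CSV header into target and covariate counts (date column excluded)."""
    missing = [t for t in TARGETS if t not in header]
    if missing:
        raise ValueError(f"missing target columns: {missing}")
    covariates = [c for c in header if c not in TARGETS and c != date_col]
    return {
        "targets": len(TARGETS),
        "covariates": len(covariates),
        "features": len(TARGETS) + len(covariates),
    }

# Tiny in-memory stand-in for prepared_financial_data.csv
# (cov_a and cov_b are hypothetical covariate names).
sample = io.StringIO(
    "date,log_Open,log_High,log_Low,log_Close,cov_a,cov_b\n"
    "2024-01-02,1.0,1.1,0.9,1.05,0.3,0.7\n"
)
header = next(csv.reader(sample))
summary = summarize_columns(header)
print(summary)  # {'targets': 4, 'covariates': 2, 'features': 6}
```

Run against the real file's header, the same function should report 4 targets, 114 covariates, and 118 features if the file matches the structure listed above.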

In [1]:
# ⚡ ULTRA-FAST CONFIGURATION - FOR IMMEDIATE TESTING
# This will run in under 2 minutes!
from datetime import datetime
class UltraFastConfig:
    """Ultra-fast configuration for immediate testing and debugging"""
    
    # === DATA CONFIGURATION ===
    data = 'custom'
    root_path = './data/'
    data_path = 'prepared_financial_data.csv'
    features = 'M'
    target = 'log_Close'
    freq = 'b'
    
    # === SEQUENCE PARAMETERS - MINIMAL ===
    seq_len = 20                       # ULTRA-FAST: Very short sequences
    label_len = 5                      # ULTRA-FAST: Minimal label length
    pred_len = 3                       # ULTRA-FAST: Very short predictions
    
    # === SPLITS - TINY FOR SPEED ===
    val_len = 5                        # ULTRA-FAST: Minimal validation
    test_len = 5                       # ULTRA-FAST: Minimal test
    prod_len = 3                       # ULTRA-FAST: Minimal production
    
    # === MODEL ARCHITECTURE - MINIMAL ===
    enc_in = 118                       # Keep same (data structure requirement)
    dec_in = 118                       
    c_out = 118                        
    d_model = 16                       # ULTRA-FAST: Tiny model dimension
    d_ff = 32                          # ULTRA-FAST: Tiny feed-forward
    
    # === ATTENTION - MINIMAL ===
    n_heads = 2                        # ULTRA-FAST: Minimal heads (must divide d_model)
    e_layers = 1                       # ULTRA-FAST: Single encoder layer
    d_layers = 1                       # ULTRA-FAST: Single decoder layer
    
    # === TIMESNET - MINIMAL ===
    top_k = 2                          # ULTRA-FAST: Minimal frequencies
    num_kernels = 2                    # ULTRA-FAST: Minimal kernels
    
    # === REGULARIZATION ===
    dropout = 0.0                      # ULTRA-FAST: No dropout for speed
    
    # === ADDITIONAL SETTINGS ===
    embed = 'timeF'
    activation = 'gelu'
    factor = 1
    distil = False                     # ULTRA-FAST: No distillation
    moving_avg = 5                     # ULTRA-FAST: Minimal moving average
    output_attention = False
    
    # === TRAINING - ULTRA MINIMAL ===
    train_epochs = 3                   # ULTRA-FAST: Just 3 epochs!
    batch_size = 64                    # ULTRA-FAST: Larger batches for speed
    learning_rate = 0.01               # ULTRA-FAST: Higher LR for faster convergence
    patience = 2                       # ULTRA-FAST: Very low patience
    lradj = 'type1'
    
    # === OPTIMIZATION ===
    loss = 'MSE'
    use_amp = True                     # ULTRA-FAST: Use AMP for speed
    
    # === SYSTEM ===
    num_workers = 2                    # ULTRA-FAST: Minimal workers
    seed = 2024
    task_name = 'short_term_forecast'
    des = 'ultra_fast_test'
    checkpoints = f'./checkpoints/TimesNet_ultra_fast_{datetime.now().strftime("%Y%m%d_%H%M")}'

# Create ultra-fast config
ultra_args = UltraFastConfig()

print("⚡ ULTRA-FAST Configuration Loaded:")
print(f"   📏 Sequence: {ultra_args.seq_len} → {ultra_args.pred_len}")
print(f"   🧠 Model: d_model={ultra_args.d_model}, layers={ultra_args.e_layers}")
print(f"   ⚡ Epochs: {ultra_args.train_epochs} (should complete in ~2 minutes)")
print(f"   📊 Batch Size: {ultra_args.batch_size}")
print(f"   🚀 Expected Total Time: 30 seconds - 2 minutes")
print()
print("💡 This config prioritizes SPEED over accuracy for quick testing!")

⚡ ULTRA-FAST Configuration Loaded:
   📏 Sequence: 20 → 3
   🧠 Model: d_model=16, layers=1
   ⚡ Epochs: 3 (should complete in ~2 minutes)
   📊 Batch Size: 64
   🚀 Expected Total Time: 30 seconds - 2 minutes

💡 This config prioritizes SPEED over accuracy for quick testing!
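
A few of the constraints noted in the config comments (heads must divide `d_model`, the label window overlaps the input window, output size matches input size) can be checked in code before a run. This is a hedged sketch: `validate_config` and `DemoConfig` are hypothetical names introduced for illustration and are not part of TSLib or this notebook.

```python
def validate_config(cfg):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if cfg.d_model % cfg.n_heads != 0:
        problems.append(f"n_heads={cfg.n_heads} does not divide d_model={cfg.d_model}")
    if cfg.label_len > cfg.seq_len:
        problems.append(f"label_len={cfg.label_len} exceeds seq_len={cfg.seq_len}")
    if not (cfg.enc_in == cfg.dec_in == cfg.c_out):
        problems.append("enc_in, dec_in and c_out should match")
    if min(cfg.seq_len, cfg.pred_len, cfg.train_epochs, cfg.batch_size) < 1:
        problems.append("seq_len, pred_len, train_epochs and batch_size must be >= 1")
    return problems

class DemoConfig:
    """Hypothetical stand-in mirroring a few UltraFastConfig fields."""
    seq_len, label_len, pred_len = 20, 5, 3
    enc_in = dec_in = c_out = 118
    d_model, n_heads = 16, 2
    train_epochs, batch_size = 3, 64

print(validate_config(DemoConfig))  # []
```

Calling `validate_config(ultra_args)` right after creating the config would surface a bad combination before any data is loaded.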


In [2]:
# 🎛️ CONFIGURATION SWITCHER - Choose your speed!

def switch_to_ultra_fast():
    """Switch to ultra-fast configuration for immediate testing"""
    global args
    args = ultra_args
    print("⚡ Switched to ULTRA-FAST configuration!")
    print("🕐 Expected completion time: 30 seconds - 2 minutes")
    return args

def switch_to_light():
    """Switch to standard light configuration"""
    global args
    args = LightConfig()
    print("💡 Switched to LIGHT configuration")
    print("🕐 Expected completion time: 5-10 minutes")
    return args

# 🚀 FOR IMMEDIATE TESTING - USE ULTRA-FAST!
print("🎯 Configuration Options:")
print("1. ⚡ ULTRA-FAST: Run switch_to_ultra_fast() - completes in ~2 minutes")
print("2. 💡 LIGHT: Run switch_to_light() - completes in ~5-10 minutes")
print()
print("💡 Recommendation: Start with ULTRA-FAST to verify everything works!")
print()

# Automatically switch to ultra-fast for immediate testing
args = switch_to_ultra_fast()

🎯 Configuration Options:
1. ⚡ ULTRA-FAST: Run switch_to_ultra_fast() - completes in ~2 minutes
2. 💡 LIGHT: Run switch_to_light() - completes in ~5-10 minutes

💡 Recommendation: Start with ULTRA-FAST to verify everything works!

⚡ Switched to ULTRA-FAST configuration!
🕐 Expected completion time: 30 seconds - 2 minutes


In [3]:
# 🚨 DEBUGGING CONFIGURATION - For Troubleshooting Only
# This is an EXTREMELY minimal config to test if basic functionality works

class DebuggingConfig:
    """Ultra-minimal configuration for troubleshooting training bottlenecks"""
    
    # === DATA CONFIGURATION - TINY ===
    seq_len = 12                       # MINIMAL: Only 12 input steps
    label_len = 2                      # MINIMAL: Only 2 label steps
    pred_len = 1                       # MINIMAL: Predict just 1 step
    
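    # Data settings (kept consistent with UltraFastConfig and LightConfig)
    data = 'custom'
    root_path = './data/'
    data_path = 'prepared_financial_data.csv'
    features = 'M'                     # Multivariate
    target = 'log_Close'               # Primary target (column name in the prepared data)
    freq = 'b'                         # Business-day frequency

    # Split lengths (create_data_loader reads args.val_len / args.test_len);
    # same values as UltraFastConfig
    val_len = 5
    test_len = 5
    prod_len = 3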
    
    # === MODEL CONFIGURATION - TINY ===
    enc_in = 118                       # Keep same (data requirement)
    dec_in = 118                       
    c_out = 118                        
    d_model = 8                        # MINIMAL: Tiny dimension (smallest possible)
    d_ff = 16                          # MINIMAL: Tiny feed-forward
    
    # === ATTENTION - MINIMAL ===
    n_heads = 1                        # MINIMAL: Single head
    e_layers = 1                       # MINIMAL: Single encoder layer
    d_layers = 1                       # MINIMAL: Single decoder layer
    
    # === TIMESNET - MINIMAL ===
    top_k = 1                          # MINIMAL: Single frequency
    num_kernels = 1                    # MINIMAL: Single kernel
    
    # === REGULARIZATION ===
    dropout = 0.0                      # No dropout
    
    # === ADDITIONAL SETTINGS ===
    embed = 'timeF'
    activation = 'gelu'
    factor = 1
    distil = False
    moving_avg = 3                     # MINIMAL: Smallest moving average
    output_attention = False
    
    # === TRAINING - ULTRA MINIMAL ===
    train_epochs = 1                   # MINIMAL: Just 1 epoch!
    batch_size = 16                    # MINIMAL: Small batch for debugging
    learning_rate = 0.01               
    patience = 1                       # MINIMAL: No patience
    lradj = 'type1'
    
    # === OPTIMIZATION ===
    loss = 'MSE'
    use_amp = False                    # Disable AMP for debugging
    
    # === SYSTEM ===
    num_workers = 0                    # MINIMAL: No multiprocessing
    seed = 2024
    task_name = 'short_term_forecast'
    des = 'debug_test'
    checkpoints = f'./checkpoints/TimesNet_debug_{datetime.now().strftime("%Y%m%d_%H%M")}'

# Create debugging config
debug_args = DebuggingConfig()

def switch_to_debugging():
    """Switch to debugging configuration for troubleshooting"""
    global args
    args = debug_args
    print("🚨 Switched to DEBUGGING configuration!")
    print("⚠️ This is EXTREMELY minimal - for troubleshooting only!")
    print("🕐 Expected completion time: 10-30 seconds")
    print(f"   📏 Sequence: {args.seq_len} → {args.pred_len}")
    print(f"   🧠 Model: d_model={args.d_model}, single layer, single head")
    print(f"   ⚡ Epochs: {args.train_epochs} epoch")
    print(f"   📊 Batch Size: {args.batch_size}")
    return args

print("🚨 NEW DEBUGGING MODE AVAILABLE:")
print("   Run switch_to_debugging() if ultra-fast is still too slow")
print("   This uses the absolute minimum possible configuration")

🚨 NEW DEBUGGING MODE AVAILABLE:
   Run switch_to_debugging() if ultra-fast is still too slow
   This uses the absolute minimum possible configuration


In [4]:
# The parentheses are required: the bare name only echoes the function object
args = switch_to_debugging()

## 🚨 **URGENT: Read This If Training is Slow!**

### 🎯 **Quick Start for Immediate Results:**

**If you're experiencing slow training, follow these steps RIGHT NOW:**

**Step 1: 🚨 Try Emergency Mode**
```python
emergency_debug_mode()  # Should complete in <30 seconds
```

**Step 2: ⚡ If that works, try Ultra-Fast**
```python
switch_to_ultra_fast()  # Should complete in <2 minutes
```

**Step 3: 📊 If still slow, run diagnostics**
```python
# Run the quick speed test cell below
# Then run the full diagnostics cell
```

### 🔍 **What Each Mode Does:**

| Mode | Time | Purpose |
|------|------|---------|
| 🚨 **Emergency** | <30 sec | Test if ANYTHING works |
| ⚡ **Ultra-Fast** | <2 min | Quick experimentation |
| 💡 **Light** | <10 min | Normal light training |

### ⚠️ **Common Issues & Solutions:**

**Problem: First batch never completes**
- ✅ Run `emergency_debug_mode()`
- ✅ Check GPU availability
- ✅ Set `num_workers = 0`

**Problem: Training very slow (>5 min)**
- ✅ Use `switch_to_debugging()`
- ✅ Reduce batch size to 8 or 16
- ✅ Disable AMP: `use_amp = False`

**Problem: Out of memory**
- ✅ Reduce `batch_size` to 8
- ✅ Reduce `d_model` to 8
- ✅ Use `switch_to_debugging()`

### 🎯 **Success Criteria:**

- ✅ **Emergency mode**: Completes in 10-30 seconds
- ✅ **Ultra-fast mode**: Completes in 30 seconds - 2 minutes  
- ✅ **Light mode**: Completes in 2-10 minutes

If emergency mode fails, there's a fundamental system issue (GPU, CUDA, data corruption, etc.)
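
To compare a run against the success criteria above, the time budgets can be encoded and checked automatically. This is an illustrative sketch only: `run_with_budget` and the `BUDGETS` mapping are hypothetical helpers, with the limits copied from the mode table.

```python
import time

# Upper time budgets in seconds, taken from the mode table above.
BUDGETS = {"emergency": 30, "ultra_fast": 120, "light": 600}

def run_with_budget(mode, fn):
    """Run fn(), then report its result, elapsed seconds, and whether it met the budget."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= BUDGETS[mode]

# Stand-in workload; in the notebook, fn would wrap the training loop.
result, elapsed, within = run_with_budget("ultra_fast", lambda: sum(range(1000)))
print(f"result={result}, elapsed={elapsed:.4f}s, within budget: {within}")
```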

# TimesNet Light Configuration - Financial Data Training

This notebook contains a **lightweight TimesNet configuration** optimized for:
- Fast experimentation and testing
- Quick iterations during development
- Resource-constrained environments
- Proof of concept validations

- **Dataset**: Financial time series with 4 targets + 114 covariates (118 total features)
- **Training Time**: ~5-10 minutes per epoch

In [5]:
# Import required libraries
import os
import sys
import time
import torch
import numpy as np
import pandas as pd
from datetime import datetime

# === ROBUST PATH SETUP FOR GPU DEPLOYMENT ===
# This handles both local development and GPU server deployment

# Method 1: Auto-detect project root
def setup_project_path():
    """Automatically detect and add the TimesNet project root to Python path"""
    
    # Try to find project root automatically
    current_dir = os.getcwd()
    possible_roots = [
        current_dir,  # Current working directory
        os.path.dirname(os.path.abspath('.')),  # Parent directory
        os.path.dirname(os.path.abspath(__file__ if '__file__' in globals() else '.')),  # Script directory
    ]
    
    # Check if this is a custom path (like Google Colab)
    if '/content/drive/MyDrive' in current_dir or any('timesnet' in p.lower() for p in [current_dir]):
        # Custom deployment path detected
        print(f"🔍 Custom deployment detected: {current_dir}")
        if current_dir not in sys.path:
            sys.path.insert(0, current_dir)
            print(f"✅ Added to path: {current_dir}")
    
    # Try each possible root
    for root in possible_roots:
        if root and os.path.exists(os.path.join(root, 'models')) and os.path.exists(os.path.join(root, 'utils')):
            if root not in sys.path:
                sys.path.insert(0, root)
                print(f"✅ Project root found and added: {root}")
                return root
    
    # If not found, add current directory as fallback
    if current_dir not in sys.path:
        sys.path.insert(0, current_dir)
        print(f"⚠️  Using current directory as fallback: {current_dir}")
    
    return current_dir

# Setup the project path
project_root = setup_project_path()
print(f"📁 Project root: {project_root}")
print(f"🐍 Python path includes: {sys.path[:3]}")

# Try to import with error handling
try:
    from models.TimesNet import Model as TimesNet
    from utils.tools import EarlyStopping, adjust_learning_rate
    from utils.metrics import metric
    from utils.logger import logger
    from data_provider.data_loader import Dataset_Custom
    from torch.utils.data import DataLoader
    
    print("✅ All imports successful!")
    
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("\n🔧 TROUBLESHOOTING:")
    print("1. Verify you extracted the GPU package correctly")
    print("2. Check that these directories exist:")
    for dir_name in ['models', 'utils', 'data_provider', 'layers', 'exp']:
        dir_path = os.path.join(project_root, dir_name)
        exists = os.path.exists(dir_path)
        print(f"   {'✅' if exists else '❌'} {dir_path}")
    
    print("\n💡 Quick fix - run this in the next cell:")
    print("import sys")
    print(f"sys.path.insert(0, '{project_root}')")
    print("# Then re-run the imports")
    
    raise

print("✅ All imports successful")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"💻 Device: {'CUDA' if torch.cuda.is_available() else 'CPU'}")

# Enhanced GPU information
if torch.cuda.is_available():
    print(f"🚀 GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")
    print(f"⚡ CUDA Version: {torch.version.cuda}")
    print("🎯 GPU acceleration will be used automatically!")
else:
    print("⚠️  No GPU detected - will use CPU (training will be slower)")

✅ Project root found and added: d:\workspace\Time-Series-Library
📁 Project root: d:\workspace\Time-Series-Library
🐍 Python path includes: ['d:\\workspace\\Time-Series-Library', 'C:\\Users\\mishr\\AppData\\Local\\Programs\\Python\\Python311\\python311.zip', 'C:\\Users\\mishr\\AppData\\Local\\Programs\\Python\\Python311\\DLLs']
✅ All imports successful!
✅ All imports successful
🔥 PyTorch version: 2.7.1+cpu
💻 Device: CPU
⚠️  No GPU detected - will use CPU (training will be slower)


## 🔧 Light Configuration Parameters

**Purpose**: Fast training for quick experimentation and validation

In [6]:
# ================================
# LIGHT CONFIGURATION - TIMESNET
# ================================

class LightConfig:
    # === DATA CONFIGURATION ===
    data = 'custom'                    # Dataset type (custom for prepared financial data)
    root_path = './data/'              # Root directory for data files
    data_path = 'prepared_financial_data.csv'  # Main data file
    features = 'M'                     # Forecasting mode: 'M'=Multivariate, 'S'=Univariate, 'MS'=Multivariate-to-Univariate
    target = 'log_Close'               # Primary target column (used by the 'S' and 'MS' modes)
    freq = 'b'                         # Time frequency: 'b'=business day, 'h'=hourly, 'd'=daily
    
    # === SEQUENCE PARAMETERS ===
    seq_len = 50                       # Input sequence length (lookback window) - LIGHT: shorter for speed
    label_len = 10                     # Start token length for decoder input (overlap with seq_len)
    pred_len = 5                       # Prediction horizon (how many steps to forecast) - LIGHT: shorter predictions
    
    # === TRAIN/VAL/TEST SPLITS ===
    val_len = 10                       # Validation set length in time steps
    test_len = 10                      # Test set length in time steps
    prod_len = 5                       # Production forecast length (future predictions beyond data)
    
    # === TIMESNET MODEL ARCHITECTURE ===
    # Core dimensions
    enc_in = 118                       # Encoder input size (total features: 4 targets + 114 covariates)
    dec_in = 118                       # Decoder input size (usually same as enc_in)
    c_out = 118                        # Output size (must match enc_in to avoid dimension mismatch)
    d_model = 32                       # Model dimension (embedding size) - LIGHT: smaller for speed
    d_ff = 64                          # Feed-forward network dimension - LIGHT: smaller FFN
    
    # Attention mechanism
    n_heads = 4                        # Number of attention heads - LIGHT: fewer heads
    e_layers = 2                       # Number of encoder layers - LIGHT: fewer layers
    d_layers = 1                       # Number of decoder layers (usually 1 for forecasting)
    
    # TimesNet specific parameters
    top_k = 3                          # Top-k frequencies for TimesNet decomposition - LIGHT: fewer frequencies
    num_kernels = 3                    # Number of convolution kernels in Inception blocks - LIGHT: fewer kernels
    
    # Regularization
    dropout = 0.1                      # Dropout rate for regularization
    
    # Additional model settings
    embed = 'timeF'                    # Time feature embedding: 'timeF'=time-feature encoding, 'fixed'=fixed (non-trainable) embedding, 'learned'=learnable embedding
    activation = 'gelu'                # Activation function: 'gelu', 'relu', 'swish'
    factor = 1                         # Attention factor (usually 1)
    distil = True                      # Encoder distilling (Informer-style downsampling between layers), not knowledge distillation
    moving_avg = 25                    # Moving average window for trend decomposition
    output_attention = False           # Whether to output attention weights
    
    # === TRAINING CONFIGURATION ===
    train_epochs = 10                  # Number of training epochs - LIGHT: fewer epochs
    batch_size = 32                    # Batch size - LIGHT: moderate batch size
    learning_rate = 0.001              # Learning rate - LIGHT: slightly higher for faster convergence
    patience = 5                       # Early stopping patience - LIGHT: less patience
    lradj = 'type1'                    # Learning rate adjustment strategy
    
    # Loss and optimization
    loss = 'MSE'                       # Loss function: 'MSE', 'MAE', 'Huber'
    use_amp = False                    # Automatic mixed precision (can speed up training)
    
    # System settings
    num_workers = 4                    # DataLoader workers - LIGHT: fewer workers
    seed = 2024                        # Random seed for reproducibility
    
    # Task specific
    task_name = 'short_term_forecast'  # Task type: 'short_term_forecast' for financial prediction
    
    # Experiment tracking
    des = 'light_config'               # Experiment description
    checkpoints = f'./checkpoints/TimesNet_light_{datetime.now().strftime("%Y%m%d_%H%M")}'
    
# Create config instance
args = LightConfig()

print("🔧 Light Configuration Loaded:")
print(f"   📏 Sequence Length: {args.seq_len}")
print(f"   🎯 Prediction Length: {args.pred_len}")
print(f"   🧠 Model Dimension: {args.d_model}")
print(f"   ⚡ Epochs: {args.train_epochs}")
print(f"   📊 Batch Size: {args.batch_size}")

🔧 Light Configuration Loaded:
   📏 Sequence Length: 50
   🎯 Prediction Length: 5
   🧠 Model Dimension: 32
   ⚡ Epochs: 10
   📊 Batch Size: 32


## 🎛️ Tweakable Parameters

Modify these parameters to experiment with different configurations:

In [7]:
# ================================
# TWEAKABLE PARAMETERS - EXPERIMENT
# ================================

# Modify these for quick experiments:

# --- Sequence parameters (affect model complexity and data usage) ---
args.seq_len = 50          # Try: 30, 50, 100 (longer = more context, slower training)
args.pred_len = 5          # Try: 3, 5, 10 (longer = harder prediction task)

# --- Model size (affect memory usage and training time) ---
args.d_model = 32          # Try: 16, 32, 64 (larger = more capacity, slower)
args.d_ff = 64             # Try: 32, 64, 128 (usually 2x d_model)
args.n_heads = 4           # Try: 2, 4, 8 (must divide d_model evenly)
args.e_layers = 2          # Try: 1, 2, 3 (more layers = deeper model)

# --- TimesNet specific ---
args.top_k = 3             # Try: 2, 3, 5 (more frequencies = more complex patterns)
args.num_kernels = 3       # Try: 2, 3, 6 (more kernels = more feature extraction)

# --- Training parameters ---
args.train_epochs = 10     # Try: 5, 10, 20
args.batch_size = 32       # Try: 16, 32, 64 (larger = faster but more memory)
args.learning_rate = 0.001 # Try: 0.0001, 0.001, 0.01

# --- Advanced tweaks ---
args.dropout = 0.1         # Try: 0.0, 0.1, 0.2 (higher = more regularization)
args.moving_avg = 25       # Try: 15, 25, 50 (window for trend decomposition)

print(f"✏️ Updated Configuration:")
print(f"   Model Size: d_model={args.d_model}, d_ff={args.d_ff}, heads={args.n_heads}, layers={args.e_layers}")
print(f"   TimesNet: top_k={args.top_k}, kernels={args.num_kernels}")
print(f"   Training: epochs={args.train_epochs}, batch={args.batch_size}, lr={args.learning_rate}")

✏️ Updated Configuration:
   Model Size: d_model=32, d_ff=64, heads=4, layers=2
   TimesNet: top_k=3, kernels=3
   Training: epochs=10, batch=32, lr=0.001
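
Because the tweaks above are plain attribute assignments, a typo such as `args.d_modle = 64` silently creates a new attribute instead of changing the model. The sketch below shows one way to guard against that. `apply_overrides` and `DemoArgs` are hypothetical names used for illustration, not part of the notebook.

```python
def apply_overrides(cfg, **overrides):
    """Set existing attributes on cfg; reject unknown names and invalid head counts."""
    unknown = [name for name in overrides if not hasattr(cfg, name)]
    if unknown:
        raise AttributeError(f"unknown config fields: {unknown}")
    # Validate the combined values before changing anything.
    d_model = overrides.get("d_model", cfg.d_model)
    n_heads = overrides.get("n_heads", cfg.n_heads)
    if d_model % n_heads != 0:
        raise ValueError(f"n_heads={n_heads} must divide d_model={d_model} evenly")
    for name, value in overrides.items():
        setattr(cfg, name, value)
    return cfg

class DemoArgs:
    """Hypothetical stand-in for a few LightConfig fields."""
    seq_len = 50
    d_model = 32
    n_heads = 4

cfg = apply_overrides(DemoArgs(), d_model=64, n_heads=8)
print(cfg.d_model, cfg.n_heads)  # 64 8
```

With this helper, `apply_overrides(args, d_modle=64)` raises `AttributeError` rather than leaving the model size unchanged.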


## 🚀 Training Setup and Execution

In [8]:
# Setup device and create checkpoint directory
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Enhanced device information
print(f"🎯 Selected Device: {device}")
if torch.cuda.is_available():
    print(f"🚀 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f}GB")
    print("⚡ Parallel processing: ENABLED (automatic)")
    print("💡 Tips: Model will automatically use GPU cores for faster training")
else:
    print("⚠️  Running on CPU - training will be slower")
    print("💡 Tips: Install CUDA-compatible PyTorch for GPU acceleration")
os.makedirs(args.checkpoints, exist_ok=True)

print(f"🔥 Using device: {device}")
print(f"📁 Checkpoints: {args.checkpoints}")

# Data loader setup
def create_data_loader(flag):
    args.validation_length = args.val_len
    args.test_length = args.test_len
    
    dataset = Dataset_Custom(
        args=args,
        root_path=args.root_path,
        data_path=args.data_path,
        flag=flag,
        size=[args.seq_len, args.label_len, args.pred_len],
        features=args.features,
        target=args.target,
        scale=True,
        timeenc=1 if args.embed == 'timeF' else 0,
        freq=args.freq
    )
    
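    # Shuffle and drop the last partial batch only for training; the validation
    # and test windows can be shorter than one batch, so keep their partial batches.
    is_train = (flag == 'train')
    data_loader = DataLoader(
        dataset,
        batch_size=args.batch_size,
        shuffle=is_train,
        num_workers=args.num_workers,
        drop_last=is_train
    )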
    return data_loader

# Create data loaders
train_loader = create_data_loader('train')
val_loader = create_data_loader('val')
test_loader = create_data_loader('test')

print(f"📊 Data loaders created:")
print(f"   Train: {len(train_loader)} batches")
print(f"   Val: {len(val_loader)} batches")
print(f"   Test: {len(test_loader)} batches")

🎯 Selected Device: cpu
⚠️  Running on CPU - training will be slower
💡 Tips: Install CUDA-compatible PyTorch for GPU acceleration
🔥 Using device: cpu
📁 Checkpoints: ./checkpoints/TimesNet_light_20250616_1010
2025-06-16 10:10:10,221 [INFO] TSLib: Initializing Dataset_Custom with targets: log_Close
2025-06-16 10:10:10,456 [INFO] TSLib: Border calculation: n=7109, s=50, p=5, v=10, t=10
2025-06-16 10:10:10,457 [INFO] TSLib: border1s = [0, 7039, 7049]
2025-06-16 10:10:10,459 [INFO] TSLib: border2s = [7089, 7099, 7109]
2025-06-16 10:10:10,494 [INFO] TSLib: Loaded data shape: (7109, 119)
2025-06-16 10:10:10,495 [INFO] TSLib: Data_x shape: (7089, 118), Data_y shape: (7089, 118)
2025-06-16 10:10:10,495 [INFO] TSLib: Initializing Dataset_Custom with targets: log_Close
2025-06-16 
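
The border values in the log can be reproduced with simple arithmetic. The sketch below is inferred from the logged numbers (`n=7109, s=50, v=10, t=10`) and is not the loader's actual source. The window count per split assumes the usual sliding-window length, `rows - seq_len - pred_len + 1`, and the batch count assumes `batch_size=32` with partial batches dropped.

```python
def split_borders(n, seq_len, val_len, test_len):
    """Start and end row indices for the train, val and test splits."""
    border2s = [n - val_len - test_len, n - test_len, n]
    border1s = [0, border2s[0] - seq_len, border2s[1] - seq_len]
    return border1s, border2s

def window_count(start, end, seq_len, pred_len):
    """Number of sliding windows that fit into rows [start, end)."""
    return max(0, (end - start) - seq_len - pred_len + 1)

b1, b2 = split_borders(n=7109, seq_len=50, val_len=10, test_len=10)
print(b1, b2)  # [0, 7039, 7049] [7089, 7099, 7109]

counts = {}
for name, start, end in zip(("train", "val", "test"), b1, b2):
    samples = window_count(start, end, seq_len=50, pred_len=5)
    counts[name] = (samples, samples // 32)  # (windows, full batches of 32)
print(counts)  # {'train': (7035, 219), 'val': (6, 0), 'test': (6, 0)}
```

Under these assumptions the validation and test splits hold 6 windows each, fewer than one batch of 32, so a loader that drops partial batches would yield no batches for them.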

In [None]:
# 📊 DATA LOADING SPEED TEST
# Test if the bottleneck is in data loading itself

def test_data_loading_speed():
    """Test data loading performance independently"""
    print("📊 Testing Data Loading Speed...")
    print("-" * 30)
    
    # Since we're using the standard TSLib approach, test Dataset_Custom directly
    print("1️⃣ Testing Dataset Creation...")
    ds_start = time.time()
    
    # Create dataset using the same approach as the working data loader
    test_data_set = Dataset_Custom(
        args=args,
        root_path=args.root_path,
        data_path=args.data_path,
        flag='train',
        size=[args.seq_len, args.label_len, args.pred_len],
        features=args.features,
        target=args.target,
        scale=True,
        timeenc=1 if args.embed == 'timeF' else 0,
        freq=args.freq
    )
    
    ds_time = time.time() - ds_start
    print(f"   ⏱️ Dataset creation: {ds_time:.3f}s")
    print(f"   📈 Train samples: {len(test_data_set)}")
    
    # Test dataloader creation
    print("2️⃣ Testing DataLoader Creation...")
    dl_start = time.time()
    test_train_loader = DataLoader(
        test_data_set, 
        batch_size=args.batch_size, 
        shuffle=True,
        num_workers=args.num_workers,
        drop_last=True
    )
    dl_time = time.time() - dl_start
    print(f"   ⏱️ DataLoader creation: {dl_time:.3f}s")
    print(f"   📊 Number of batches: {len(test_train_loader)}")
    
    # Test batch loading speed
    print("3️⃣ Testing Batch Loading Speed...")
    batch_times = []
    
    # Test first 5 batches
    test_iter = iter(test_train_loader)
    for i in range(min(5, len(test_train_loader))):
        batch_start = time.time()
        batch_x, batch_y, batch_x_mark, batch_y_mark = next(test_iter)
        batch_time = time.time() - batch_start
        batch_times.append(batch_time)
        print(f"   Batch {i+1}: {batch_time:.3f}s - Shape: {batch_x.shape}")
    
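    # Guard against an empty loader (for example, fewer samples than batch_size)
    if not batch_times:
        print("   ⚠️ No batches were produced; check batch_size against the dataset size")
        return float('inf')
    avg_batch_load = sum(batch_times) / len(batch_times)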
    print(f"\n📊 Data Loading Results:")
    print(f"   ⏱️ Average batch load time: {avg_batch_load:.3f}s")
    print(f"   🚀 Est. total data load time: {avg_batch_load * len(test_train_loader):.1f}s")
    
    if avg_batch_load > 0.1:
        print("   ⚠️ Data loading is slow! Recommendations:")
        print("   💡 Reduce num_workers to 0")
        print("   💡 Check if data file is corrupted")
        print("   💡 Consider using smaller batch size")
    else:
        print("   ✅ Data loading speed looks good!")
    
    return avg_batch_load

# Run data loading test
print("🧪 Testing data loading performance...")
data_load_speed = test_data_loading_speed()

🧪 Testing data loading performance...
📊 Testing Data Loading Speed...
------------------------------
1️⃣ Testing Dataset Creation...


TypeError: Dataset_Custom.__init__() missing 1 required positional argument: 'args'
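
The batch-timing pattern used in the speed test can be isolated into a small helper that works on any iterable and tolerates an empty one. This is a minimal sketch with hypothetical helper names (`time_first_items`, `average`). It uses `time.perf_counter`, which is better suited to measuring short intervals than `time.time`.

```python
import time

def time_first_items(iterable, limit=5):
    """Return how long each of the first `limit` items took to arrive, in seconds."""
    durations = []
    iterator = iter(iterable)
    for _ in range(limit):
        start = time.perf_counter()
        try:
            next(iterator)
        except StopIteration:
            break  # fewer than `limit` items available
        durations.append(time.perf_counter() - start)
    return durations

def average(durations):
    """Mean duration, or None when nothing was measured."""
    return sum(durations) / len(durations) if durations else None

# Stand-in iterable; in the notebook this would be a DataLoader.
durations = time_first_items(range(3))
print(len(durations), average(durations) is not None)  # 3 True
```

With a DataLoader that uses worker processes, the first measured item typically includes worker start-up time, so it is worth looking at the individual durations as well as the average.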

In [None]:
# Initialize TimesNet model
model = TimesNet(args).to(device)

# Setup training components
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
early_stopping = EarlyStopping(patience=args.patience, verbose=True)

# Model info
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"🧠 TimesNet Light Model Initialized:")
print(f"   📊 Total Parameters: {total_params:,}")
print(f"   🎯 Trainable Parameters: {trainable_params:,}")
print(f"   💾 Model Size: ~{total_params * 4 / 1024 / 1024:.1f} MB")

In [None]:
# Training function with progress tracking
def train_epoch():
    model.train()
    total_loss = 0.0
    num_batches = len(train_loader)
    
    epoch_start_time = time.time()
    print(f"🏃 Training on {num_batches} batches...")
    
    for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in enumerate(train_loader):
        # Move to device
        batch_x = batch_x.float().to(device)
        batch_y = batch_y.float().to(device)
        batch_x_mark = batch_x_mark.float().to(device)
        batch_y_mark = batch_y_mark.float().to(device)
        
        # Prepare decoder input
        dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(device)
        dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(device)
        
        # Forward pass
        optimizer.zero_grad()
        outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
        
        # Calculate loss (only on target columns - first 4 features)
        target_outputs = outputs[:, -args.pred_len:, :4]
        target_y = batch_y[:, -args.pred_len:, :4]
        loss = criterion(target_outputs, target_y)
        
        # Backward pass
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        
        # Progress reporting - EVERY BATCH for Light Config
        progress_pct = (i + 1) / num_batches * 100
        avg_loss = total_loss / (i + 1)
        elapsed = time.time() - epoch_start_time
        remaining = elapsed / (i + 1) * (num_batches - i - 1)
        
        # Show progress for every batch (Light config)
        print(f"   📊 Batch {i+1:3d}/{num_batches} ({progress_pct:5.1f}%) - "
              f"Loss: {loss.item():.6f} (Avg: {avg_loss:.6f}) - "
              f"⏱️ Elapsed: {elapsed:.1f}s, ETA: {remaining:.1f}s")
    
    epoch_time = time.time() - epoch_start_time
    avg_loss = total_loss / num_batches
    print(f"✅ Epoch completed in {epoch_time:.1f}s. Average loss: {avg_loss:.6f}")
    return avg_loss

# Validation function
def validate_epoch():
    model.eval()
    total_loss = 0.0
    num_batches = 0
    
    with torch.no_grad():
        for batch_x, batch_y, batch_x_mark, batch_y_mark in val_loader:
            batch_x = batch_x.float().to(device)
            batch_y = batch_y.float().to(device)
            batch_x_mark = batch_x_mark.float().to(device)
            batch_y_mark = batch_y_mark.float().to(device)
            
            dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(device)
            dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(device)
            
            outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
            
            target_outputs = outputs[:, -args.pred_len:, :4]
            target_y = batch_y[:, -args.pred_len:, :4]
            loss = criterion(target_outputs, target_y)
            
            total_loss += loss.item()
            num_batches += 1
    
    avg_loss = total_loss / num_batches if num_batches > 0 else float('inf')
    return avg_loss

print("🔧 Training functions defined")
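
The decoder input built in both functions keeps the first `label_len` steps of `batch_y` and replaces the `pred_len` steps to be forecast with zeros. The sketch below shows the same construction on plain Python lists for a single sample. It assumes `batch_y` holds exactly `label_len + pred_len` time steps, and `build_decoder_input` is a hypothetical name used for illustration.

```python
def build_decoder_input(sample_y, label_len, pred_len):
    """sample_y: list of label_len + pred_len time steps, each a list of feature values."""
    if len(sample_y) != label_len + pred_len:
        raise ValueError("sample_y must contain label_len + pred_len time steps")
    n_features = len(sample_y[0])
    known = [list(step) for step in sample_y[:label_len]]        # observed history
    placeholder = [[0.0] * n_features for _ in range(pred_len)]  # steps to forecast
    return known + placeholder

sample = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 2 label steps + 1 prediction step
print(build_decoder_input(sample, label_len=2, pred_len=1))
# [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
```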

In [None]:
# Main training loop
print(f"🚀 Starting TimesNet Light Training ({args.train_epochs} epochs)")
print(f"⏰ Estimated time: ~{args.train_epochs * 5} minutes")
print("="*60)

best_val_loss = float('inf')
train_losses = []
val_losses = []

training_start_time = time.time()

for epoch in range(args.train_epochs):
    print(f"\n🔄 Epoch {epoch+1}/{args.train_epochs}")
    
    # Train
    train_loss = train_epoch()
    train_losses.append(train_loss)
    
    # Validate
    print("🔍 Running validation...")
    val_loss = validate_epoch()
    val_losses.append(val_loss)
    
    # Log progress
    print(f"📈 Epoch {epoch+1} Results: Train Loss: {train_loss:.6f}, Val Loss: {val_loss:.6f}")
    
    # Adjust learning rate
    adjust_learning_rate(optimizer, epoch + 1, args)
    
    # Early stopping
    early_stopping(val_loss, model, args.checkpoints)
    if early_stopping.early_stop:
        print("⏹️ Early stopping triggered")
        break
    
    # Save best model
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), f"{args.checkpoints}/best_model.pth")
        print(f"💾 New best model saved (Val Loss: {val_loss:.6f})")

total_training_time = time.time() - training_start_time
print(f"\n🎉 Training completed in {total_training_time/60:.1f} minutes!")
print(f"🏆 Best validation loss: {best_val_loss:.6f}")

In [None]:
# 🔍 COMPREHENSIVE PERFORMANCE DIAGNOSTICS
# This cell will help identify exactly where the bottleneck is occurring

import time
import torch.profiler

def diagnose_training_bottleneck():
    """Run comprehensive diagnostics to identify training bottlenecks"""
    print("🔍 Running Training Performance Diagnostics...")
    print("="*60)
    
    # 1. Test data loading speed
    print("1️⃣ Testing Data Loading Speed...")
    data_start = time.time()
    train_iter = iter(train_loader)
    batch_x, batch_y, batch_x_mark, batch_y_mark = next(train_iter)
    data_load_time = time.time() - data_start
    print(f"   ⏱️ First batch load time: {data_load_time:.3f}s")
    
    # 2. Test data transfer to device
    print("\n2️⃣ Testing Data Transfer to Device...")
    transfer_start = time.time()
    batch_x = batch_x.float().to(device)
    batch_y = batch_y.float().to(device)
    batch_x_mark = batch_x_mark.float().to(device)
    batch_y_mark = batch_y_mark.float().to(device)
    transfer_time = time.time() - transfer_start
    print(f"   ⏱️ Device transfer time: {transfer_time:.3f}s")
    
    # 3. Test decoder input preparation
    print("\n3️⃣ Testing Decoder Input Preparation...")
    prep_start = time.time()
    dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(device)
    dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(device)
    prep_time = time.time() - prep_start
    print(f"   ⏱️ Decoder prep time: {prep_time:.3f}s")
    
    # 4. Test model forward pass
    print("\n4️⃣ Testing Model Forward Pass...")
    model.eval()  # Set to eval for consistent timing
    forward_start = time.time()
    with torch.no_grad():
        outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
    forward_time = time.time() - forward_start
    print(f"   ⏱️ Forward pass time: {forward_time:.3f}s")
    
    # 5. Test loss calculation
    print("\n5️⃣ Testing Loss Calculation...")
    loss_start = time.time()
    target_outputs = outputs[:, -args.pred_len:, :4]
    target_y = batch_y[:, -args.pred_len:, :4]
    loss = criterion(target_outputs, target_y)
    loss_time = time.time() - loss_start
    print(f"   ⏱️ Loss calculation time: {loss_time:.3f}s")
    
    # 6. Test backward pass
    print("\n6️⃣ Testing Backward Pass...")
    model.train()  # Set back to training mode
    # Create fresh batch for backward pass
    optimizer.zero_grad()
    outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
    target_outputs = outputs[:, -args.pred_len:, :4]
    target_y = batch_y[:, -args.pred_len:, :4]
    loss = criterion(target_outputs, target_y)
    
    backward_start = time.time()
    loss.backward()
    backward_time = time.time() - backward_start
    print(f"   ⏱️ Backward pass time: {backward_time:.3f}s")
    
    # 7. Test optimizer step
    print("\n7️⃣ Testing Optimizer Step...")
    step_start = time.time()
    optimizer.step()
    step_time = time.time() - step_start
    print(f"   ⏱️ Optimizer step time: {step_time:.3f}s")
    
    # 8. Calculate total time per batch
    total_time = data_load_time + transfer_time + prep_time + forward_time + loss_time + backward_time + step_time
    print(f"\n📊 SUMMARY:")
    print(f"   🔢 Total time per batch: {total_time:.3f}s")
    print(f"   🏃 Expected time for {len(train_loader)} batches: {total_time * len(train_loader):.1f}s ({total_time * len(train_loader)/60:.1f} min)")
    
    # Identify bottlenecks
    times = {
        'Data Loading': data_load_time,
        'Device Transfer': transfer_time,
        'Decoder Prep': prep_time,
        'Forward Pass': forward_time,
        'Loss Calculation': loss_time,
        'Backward Pass': backward_time,
        'Optimizer Step': step_time
    }
    
    print(f"\n🎯 BOTTLENECK ANALYSIS:")
    sorted_times = sorted(times.items(), key=lambda x: x[1], reverse=True)
    for i, (operation, op_time) in enumerate(sorted_times):
        percentage = (op_time / total_time) * 100
        symbol = "🔴" if i == 0 else "🟡" if i == 1 else "🟢"
        print(f"   {symbol} {operation}: {op_time:.3f}s ({percentage:.1f}%)")
    
    # Memory usage
    if torch.cuda.is_available():
        print(f"\n💾 GPU Memory Usage:")
        print(f"   📊 Allocated: {torch.cuda.memory_allocated()/1024**2:.1f} MB")
        print(f"   📈 Reserved: {torch.cuda.memory_reserved()/1024**2:.1f} MB")
    
    return total_time

# Run diagnostics
single_batch_time = diagnose_training_bottleneck()

print(f"\n💡 RECOMMENDATIONS:")
if single_batch_time > 2.0:
    print("   ⚠️ Training is slower than expected. Main issues likely:")
    print("   1. Model is too complex for ultra-fast config")
    print("   2. Data loading bottleneck")
    print("   3. GPU memory issues")
elif single_batch_time > 0.5:
    print("   🟡 Training speed is acceptable but could be faster")
    print("   💡 Consider reducing model complexity further")
else:
    print("   ✅ Training speed looks good!")
    print("   🚀 Full training should complete quickly")
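
One caveat on the timings above: CUDA kernels are launched asynchronously, so wrapping a forward or backward pass in `time.time()` may capture only the launch cost unless the device is synchronized first. The following is a minimal sketch of a timing helper under that assumption. `timed` is a hypothetical name (it is not part of TSLib or this notebook), and it takes an optional `sync` callable so the example itself runs without a GPU.

```python
import time

def timed(fn, sync=None):
    """Run fn() and return (result, seconds).

    `sync` is an optional zero-argument callable invoked before the clock
    starts and again before it stops. `timed` is a hypothetical helper for
    illustration; on a CUDA device one would pass torch.cuda.synchronize.
    """
    if sync is not None:
        sync()  # drain previously queued work so it is not billed to fn
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()  # wait for work queued by fn to finish
    return result, time.perf_counter() - start


# CPU-only usage: no synchronization needed
total, seconds = timed(lambda: sum(range(100_000)))
print(f"sum={total}, took {seconds:.4f}s")
```

On a GPU, the forward-pass step could then be measured as `outputs, forward_time = timed(lambda: model(batch_x, batch_x_mark, dec_inp, batch_y_mark), sync=torch.cuda.synchronize)`.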

In [None]:
# ⚡ MICRO-BENCHMARK - Quick Speed Test
# Run this first to get immediate feedback on performance

def quick_speed_test():
    """Quick 10-batch speed test to identify obvious bottlenecks"""
    print("⚡ Running Quick Speed Test (10 batches)...")
    print("-" * 40)
    
    model.train()
    test_batches = min(10, len(train_loader))
    
    # Warm-up
    print("🔥 Warming up...")
    train_iter = iter(train_loader)
    batch_x, batch_y, batch_x_mark, batch_y_mark = next(train_iter)
    batch_x = batch_x.float().to(device)
    batch_y = batch_y.float().to(device)
    batch_x_mark = batch_x_mark.float().to(device)
    batch_y_mark = batch_y_mark.float().to(device)
    
    # Quick forward pass for warm-up
    dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(device)
    dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(device)
    outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
    
    print("✅ Warm-up complete. Starting speed test...")
    
    # Test loop
    total_start = time.time()
    batch_times = []
    
    train_iter = iter(train_loader)  # Fresh iterator
    
    for i in range(test_batches):
        batch_start = time.time()
        
        # Get batch
        batch_x, batch_y, batch_x_mark, batch_y_mark = next(train_iter)
        
        # Move to device
        batch_x = batch_x.float().to(device)
        batch_y = batch_y.float().to(device)
        batch_x_mark = batch_x_mark.float().to(device)
        batch_y_mark = batch_y_mark.float().to(device)
        
        # Prepare decoder input
        dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(device)
        dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(device)
        
        # Forward pass
        optimizer.zero_grad()
        outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
        
        # Calculate loss
        target_outputs = outputs[:, -args.pred_len:, :4]
        target_y = batch_y[:, -args.pred_len:, :4]
        loss = criterion(target_outputs, target_y)
        
        # Backward pass
        loss.backward()
        optimizer.step()
        
        batch_time = time.time() - batch_start
        batch_times.append(batch_time)
        
        print(f"   Batch {i+1:2d}: {batch_time:.3f}s - Loss: {loss.item():.6f}")
    
    total_time = time.time() - total_start
    avg_batch_time = sum(batch_times) / len(batch_times)
    
    print(f"\n📊 SPEED TEST RESULTS:")
    print(f"   ⏱️ Total time: {total_time:.2f}s")
    print(f"   📈 Average per batch: {avg_batch_time:.3f}s")
    print(f"   🚀 Estimated full training time: {avg_batch_time * len(train_loader) * args.train_epochs / 60:.1f} minutes")
    
    # Classification
    if avg_batch_time < 0.1:
        print("   ✅ EXCELLENT: Training should be very fast!")
    elif avg_batch_time < 0.5:
        print("   🟢 GOOD: Training speed looks acceptable")
    elif avg_batch_time < 2.0:
        print("   🟡 MODERATE: Training will be slower than expected")
    else:
        print("   🔴 SLOW: Major bottleneck detected!")
        print("   💡 Consider reducing model size or batch size")
    
    return avg_batch_time

# Run quick speed test
print("🏃‍♀️ Starting quick speed test...")
avg_time = quick_speed_test()

# Conditional message (threshold matches the <0.5s ultra-fast target printed below)
if avg_time > 0.5:
    print(f"\n⚠️ SPEED ISSUE DETECTED!")
    print(f"   📊 Current speed: {avg_time:.3f}s per batch")
    print(f"   🎯 Target speed: <0.5s per batch for ultra-fast config")
    print(f"   💡 Run the full diagnostics above to identify the bottleneck")
else:
    print(f"\n✅ Speed looks good! Proceeding with training...")

## 🔧 **TROUBLESHOOTING GUIDE - If Training is Too Slow**

If you're experiencing slow training performance, follow these steps in order:

### 📋 **Step-by-Step Debugging Process:**

**1. 🚨 First: Try Debugging Mode**
```python
# Switch to the most minimal configuration possible
switch_to_debugging()
```

**2. ⚡ Quick Speed Test**
- Run the "Quick Speed Test" cell above
- This will tell you immediately if there's a major bottleneck
- Target: <0.5s per batch for ultra-fast config

**3. 📊 Data Loading Test**
- Run the "Data Loading Speed Test" cell
- This isolates data loading performance
- Target: <0.1s per batch for data loading

**4. 🔍 Full Diagnostics**
- Run the "Comprehensive Performance Diagnostics" cell
- This will show you exactly where the bottleneck is
- Look for the 🔴 red items in the bottleneck analysis

**5. 🛠️ Common Solutions:**

| Problem | Solution |
|---------|----------|
| 🔴 Data Loading Slow | Set `num_workers = 0` |
| 🔴 Forward Pass Slow | Use smaller `d_model` or fewer layers |
| 🔴 Memory Issues | Reduce `batch_size` |
| 🔴 GPU Not Used | Check CUDA availability |
| 🔴 Model Too Complex | Switch to debugging config |

**6. 🎯 Performance Targets:**

| Configuration | Target Time/Batch | Total Training Time |
|---------------|-------------------|-------------------|
| 🚨 **Debugging** | **<0.1s** | **<30 seconds** |
| ⚡ **Ultra-Fast** | **<0.5s** | **<2 minutes** |
| 💡 **Light** | <1.0s | <10 minutes |
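
The targets in the table can be turned into a quick programmatic check. The sketch below is an illustration only: `TARGETS` and `meets_target` are hypothetical names, and the thresholds are copied from the table above.

```python
# Per-batch and total-time targets copied from the table above.
# TARGETS and meets_target are hypothetical names used for illustration.
TARGETS = {
    "debugging": {"batch_s": 0.1, "total_s": 30},
    "ultra_fast": {"batch_s": 0.5, "total_s": 120},
    "light": {"batch_s": 1.0, "total_s": 600},
}

def meets_target(config, avg_batch_s, num_batches, epochs):
    """Return (batch_ok, total_ok, projected_total_s) for a configuration."""
    target = TARGETS[config]
    # Projection covers training batches only; validation time is not included
    projected_total_s = avg_batch_s * num_batches * epochs
    batch_ok = avg_batch_s < target["batch_s"]
    total_ok = projected_total_s < target["total_s"]
    return batch_ok, total_ok, projected_total_s

batch_ok, total_ok, projected = meets_target("ultra_fast", 0.3, 50, 3)
print(batch_ok, total_ok, f"{projected:.0f}s")
```

In the notebook, `avg_batch_s` would come from the quick speed test, `num_batches` from `len(train_loader)`, and `epochs` from `args.train_epochs`.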

### 💡 **Quick Fixes to Try:**

1. **Switch to debugging mode** (most important)
2. **Disable multiprocessing**: Set `num_workers = 0`
3. **Disable AMP**: Set `use_amp = False`
4. **Reduce batch size**: Try `batch_size = 8`
5. **Check GPU usage**: Ensure CUDA is available
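
As a rough sketch, fixes 2 to 4 amount to overriding three attributes on the configuration object. The helper name `apply_quick_fixes` is hypothetical, and a `SimpleNamespace` with made-up values stands in for the notebook's `args`; only `num_workers`, `use_amp`, and `batch_size` come from the list above.

```python
from types import SimpleNamespace

def apply_quick_fixes(args, batch_size=8):
    """Apply the conservative overrides from the list above, in place.

    Hypothetical helper for illustration; returns the previous values so
    they can be restored later.
    """
    previous = {
        "num_workers": getattr(args, "num_workers", None),
        "use_amp": getattr(args, "use_amp", None),
        "batch_size": getattr(args, "batch_size", None),
    }
    args.num_workers = 0          # load data in the main process
    args.use_amp = False          # plain float32 training
    args.batch_size = batch_size  # smaller batches need less memory
    return previous

# Stand-in for the notebook's args object (values here are made up)
demo_args = SimpleNamespace(num_workers=4, use_amp=True, batch_size=32)
old = apply_quick_fixes(demo_args)
print(demo_args, old)
```

The data loaders have to be recreated after such a change, because `batch_size` and `num_workers` are read when a `DataLoader` is constructed.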

In [None]:
# 🚨 EMERGENCY MODE - Run this if everything else is too slow!

def emergency_debug_mode():
    """
    Ultra-minimal test that should complete in under 30 seconds
    This will tell us if the basic training loop works at all
    """
    print("🚨 EMERGENCY DEBUG MODE ACTIVATED!")
    print("🎯 Goal: Complete 1 epoch with minimal model in <30 seconds")
    print("="*50)
    
    # Switch to debugging config
    switch_to_debugging()
    
    # Quick data setup
    print("📊 Setting up minimal data...")
    
    # Use the same data loading approach as the working parts
    emergency_data_set = Dataset_Custom(
        args=args,
        root_path=args.root_path,
        data_path=args.data_path,
        flag='train',
        size=[args.seq_len, args.label_len, args.pred_len],
        features=args.features,
        target=args.target,
        scale=True,
        timeenc=1 if args.embed == 'timeF' else 0,
        freq=args.freq
    )
    
    # Create minimal dataloader
    emergency_train_loader = DataLoader(
        emergency_data_set, 
        batch_size=args.batch_size, 
        shuffle=False,  # No shuffle for speed
        num_workers=0,  # No multiprocessing
        drop_last=True
    )
    
    print(f"📈 Emergency setup: {len(emergency_train_loader)} batches")
    
    # Create minimal model
    print("🧠 Creating minimal model...")
    from models import TimesNet
    emergency_model = TimesNet.Model(args).float().to(device)
    
    # Count parameters
    emergency_params = sum(p.numel() for p in emergency_model.parameters())
    print(f"🔢 Emergency model: {emergency_params:,} parameters")
    
    # Setup training
    emergency_optimizer = torch.optim.Adam(emergency_model.parameters(), lr=args.learning_rate)
    emergency_criterion = nn.MSELoss()
    
    # Run emergency training
    print("🚀 Starting emergency training...")
    emergency_start = time.time()
    
    emergency_model.train()
    total_loss = 0.0
    batch_count = 0
    
    for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in enumerate(emergency_train_loader):
        batch_start = time.time()
        
        # Move to device
        batch_x = batch_x.float().to(device)
        batch_y = batch_y.float().to(device)
        batch_x_mark = batch_x_mark.float().to(device)
        batch_y_mark = batch_y_mark.float().to(device)
        
        # Prepare decoder input
        dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(device)
        dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(device)
        
        # Forward pass
        emergency_optimizer.zero_grad()
        outputs = emergency_model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
        
        # Calculate loss
        target_outputs = outputs[:, -args.pred_len:, :4]
        target_y = batch_y[:, -args.pred_len:, :4]
        loss = emergency_criterion(target_outputs, target_y)
        
        # Backward pass
        loss.backward()
        emergency_optimizer.step()
        
        total_loss += loss.item()
        batch_count += 1
        
        batch_time = time.time() - batch_start
        print(f"   ⚡ Batch {i+1}: {batch_time:.3f}s - Loss: {loss.item():.6f}")
        
        # Stop after 5 batches or if too slow
        if i >= 4 or batch_time > 2.0:
            if batch_time > 2.0:
                print(f"   🔴 STOPPING: Batch too slow ({batch_time:.1f}s)")
            break
    
    emergency_time = time.time() - emergency_start
    avg_loss = total_loss / batch_count if batch_count > 0 else 0
    
    print(f"\n📊 EMERGENCY TEST RESULTS:")
    print(f"   ⏱️ Total time: {emergency_time:.1f}s")
    print(f"   📈 Batches completed: {batch_count}")
    print(f"   💯 Average loss: {avg_loss:.6f}")
    
    if emergency_time < 30:
        print("   ✅ SUCCESS: Basic training works!")
        print("   💡 The issue may be with your larger configuration")
    else:
        print("   🔴 MAJOR ISSUE: Even minimal training is too slow")
        print("   💡 Check GPU availability, data corruption, or system resources")
    
    return emergency_time < 30

print("🚨 EMERGENCY MODE AVAILABLE")
print("💡 Run emergency_debug_mode() if all other configs are too slow")

## 📊 Results and Analysis

In [None]:
# Load best model and test
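# A plain state_dict contains only tensors, so the safer weights_only=True is sufficient;
# map_location keeps the load working if the checkpoint was saved on a different device
model.load_state_dict(torch.load(f"{args.checkpoints}/best_model.pth", map_location=device, weights_only=True))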
model.eval()

# Test evaluation
preds = []
trues = []

print("🧪 Testing model...")
with torch.no_grad():
    for batch_x, batch_y, batch_x_mark, batch_y_mark in test_loader:
        batch_x = batch_x.float().to(device)
        batch_y = batch_y.float().to(device)
        batch_x_mark = batch_x_mark.float().to(device)
        batch_y_mark = batch_y_mark.float().to(device)
        
        dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(device)
        dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(device)
        
        outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
        
        pred = outputs[:, -args.pred_len:, :4].detach().cpu().numpy()
        true = batch_y[:, -args.pred_len:, :4].detach().cpu().numpy()
        
        preds.append(pred)
        trues.append(true)

# Calculate metrics
if preds:
    preds = np.concatenate(preds, axis=0)
    trues = np.concatenate(trues, axis=0)
    
    mae, mse, rmse, mape, mspe = metric(preds, trues)
    
    print("\n📊 TimesNet Light - Test Results:")
    print(f"   🎯 MSE:  {mse:.6f}")
    print(f"   📏 MAE:  {mae:.6f}")
    print(f"   📐 RMSE: {rmse:.6f}")
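    # Assumes TSLib's utils.metrics.metric, whose MAPE and MSPE are ratios rather than percentages
    print(f"   📈 MAPE: {mape * 100:.4f}%")
    print(f"   📉 MSPE: {mspe:.6f}")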
    
    # Summary
    print(f"\n📋 Configuration Summary:")
    print(f"   ⚡ Model: Light ({total_params:,} params)")
    print(f"   📏 Sequence: {args.seq_len} → {args.pred_len}")
    print(f"   🧠 Architecture: d_model={args.d_model}, layers={args.e_layers}, heads={args.n_heads}")
    print(f"   ⏱️ Training time: {total_training_time/60:.1f} minutes")
    print(f"   🏆 Final performance: RMSE={rmse:.6f}")
else:
    print("⚠️ No test data available")

In [None]:
# 🚀 Enhanced Progress Monitoring for Light Config
print("⚡ Light Configuration - Enhanced Batch Monitoring Enabled")
print("📊 Every batch will show detailed progress for faster feedback")
print("💡 This helps track training progress in real-time")
print()

# Define a verbose variant of train_epoch with detailed batch printing
# (train_epoch itself is left unchanged; call train_epoch_verbose() explicitly)
def train_epoch_verbose():
    """Enhanced training function with per-batch progress"""
    model.train()
    total_loss = 0.0
    num_batches = len(train_loader)
    
    epoch_start_time = time.time()
    print(f"🏃 Training on {num_batches} batches with DETAILED progress...")
    print(f"📊 Batch format: [Batch X/Y (Z%)] - Loss: current (average) - Time: elapsed/remaining")
    print("-" * 80)
    
    for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in enumerate(train_loader):
        batch_start = time.time()
        
        # Move to device
        batch_x = batch_x.float().to(device)
        batch_y = batch_y.float().to(device)
        batch_x_mark = batch_x_mark.float().to(device)
        batch_y_mark = batch_y_mark.float().to(device)
        
        # Forward pass
        optimizer.zero_grad()
        
        dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(device)
        dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(device)
        
        outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
        target_outputs = outputs[:, -args.pred_len:, :4]
        target_y = batch_y[:, -args.pred_len:, :4]
        loss = criterion(target_outputs, target_y)
        
        # Backward pass
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        batch_time = time.time() - batch_start
        
        # DETAILED progress for every batch
        progress_pct = (i + 1) / num_batches * 100
        avg_loss = total_loss / (i + 1)
        elapsed = time.time() - epoch_start_time
        remaining = elapsed / (i + 1) * (num_batches - i - 1)
        
        # Progress bar visualization
        bar_length = 20
        filled_length = int(bar_length * (i + 1) // num_batches)
        bar = '█' * filled_length + '-' * (bar_length - filled_length)
        
        print(f"   [{bar}] Batch {i+1:3d}/{num_batches} ({progress_pct:5.1f}%) - "
              f"Loss: {loss.item():.6f} (Avg: {avg_loss:.6f}) - "
              f"Time: {elapsed:.1f}s/{remaining:.1f}s - "
              f"Batch: {batch_time:.2f}s")
    
    epoch_time = time.time() - epoch_start_time
    avg_loss = total_loss / num_batches
    print("-" * 80)
    print(f"✅ Epoch completed in {epoch_time:.1f}s. Average loss: {avg_loss:.6f}")
    print(f"⚡ Average time per batch: {epoch_time/num_batches:.2f}s")
    return avg_loss

print("🎯 Enhanced training function ready!")
print("💡 Now each batch will show:")
print("   - Progress bar visualization")
print("   - Current and average loss")
print("   - Elapsed and remaining time")
print("   - Individual batch processing time")

## ⚡ Enhanced Training with Detailed Batch Progress

Now you can use the enhanced training function that shows progress for **every single batch**:

### 🎯 **Features Added:**
- **📊 Progress Bar**: Visual progress indicator for each epoch
- **⏱️ Time Tracking**: Shows elapsed time and estimated remaining time
- **📈 Loss Monitoring**: Current batch loss and running average
- **🚀 Batch Timing**: Individual batch processing time

### 💡 **Why This Helps:**
- **No More Waiting**: See progress after every batch instead of waiting for the epoch to finish
- **Performance Insights**: Spot batches that are unusually slow
- **Loss Tracking**: Watch the running average loss to see whether the model is learning
- **Time Estimation**: Get a running estimate of when the current epoch will finish
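
The remaining-time figure is a linear extrapolation: the average time per completed batch multiplied by the number of batches left. Below is a standalone sketch of that arithmetic and of the progress bar, using a hypothetical `progress_line` helper that mirrors the logic in `train_epoch_verbose`.

```python
def progress_line(i, num_batches, elapsed, bar_length=20):
    """Build a progress string for zero-based batch index i.

    Hypothetical helper mirroring the arithmetic in train_epoch_verbose.
    """
    done = i + 1
    remaining = elapsed / done * (num_batches - done)  # linear extrapolation
    filled = bar_length * done // num_batches
    bar = "█" * filled + "-" * (bar_length - filled)
    pct = done / num_batches * 100
    return f"[{bar}] {done}/{num_batches} ({pct:.1f}%) ETA {remaining:.1f}s", remaining

# 5 of 20 batches done after 10 seconds: 2s per batch, 15 batches left
line, eta = progress_line(i=4, num_batches=20, elapsed=10.0)
print(line)
```

Because the estimate assumes every remaining batch takes the average time so far, it tends to be pessimistic early in an epoch when the first batch includes start-up overhead.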

In [None]:
# 🚀 RUN ENHANCED TRAINING LOOP WITH DETAILED BATCH PROGRESS
print("🎯 Starting Enhanced TimesNet Light Training")
print("⚡ Every batch will show detailed progress!")
print("=" * 80)

# Initialize tracking
train_losses = []
val_losses = []
start_time = time.time()

# Training loop with enhanced progress
for epoch in range(args.train_epochs):
    epoch_start = time.time()
    
    print(f"\n🔥 EPOCH {epoch+1}/{args.train_epochs}")
    print(f"📅 Started at: {datetime.now().strftime('%H:%M:%S')}")
    
    # Enhanced training with detailed batch progress
    train_loss = train_epoch_verbose()
    
    # Validation (standard)
    print(f"\n🔍 Validating...")
    val_loss = validate_epoch()
    
    # Learning rate adjustment
    adjust_learning_rate(optimizer, epoch + 1, args)
    
    # Record losses
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    
    epoch_time = time.time() - epoch_start
    total_elapsed = time.time() - start_time
    
    print(f"\n📊 EPOCH {epoch+1} SUMMARY:")
    print(f"   📈 Train Loss: {train_loss:.6f}")
    print(f"   📉 Val Loss: {val_loss:.6f}")
    print(f"   ⏱️ Epoch Time: {epoch_time:.1f}s ({epoch_time/60:.1f} min)")
    print(f"   🕒 Total Time: {total_elapsed:.1f}s ({total_elapsed/60:.1f} min)")
    
    # Early stopping check
    early_stopping(val_loss, model, args.checkpoints)
    if early_stopping.early_stop:
        print(f"\n⏹️ Early stopping triggered at epoch {epoch+1}")
        break
    
    # Estimate remaining time
    if epoch < args.train_epochs - 1:
        avg_epoch_time = total_elapsed / (epoch + 1)
        remaining_epochs = args.train_epochs - (epoch + 1)
        estimated_remaining = avg_epoch_time * remaining_epochs
        print(f"   ⏳ Estimated remaining time: {estimated_remaining/60:.1f} minutes")

total_time = time.time() - start_time

print("\n" + "=" * 80)
print("🎉 ENHANCED TRAINING COMPLETED!")
print(f"⏰ Total training time: {total_time:.1f}s ({total_time/60:.1f} minutes)")
print(f"📊 Final train loss: {train_losses[-1]:.6f}")
print(f"📉 Final val loss: {val_losses[-1]:.6f}")
print(f"🏆 Best val loss: {min(val_losses):.6f} (epoch {val_losses.index(min(val_losses))+1})")
print("💡 Check the detailed batch progress above for training insights!")