# Enhanced Transformer Training for Cryptocurrency Trading

**Phase 1 Implementation - GPU Accelerated Training**

This notebook trains the enhanced transformer model (imported from `transformer_enhanced_v2`), which features:
- Extended sequence length (250 steps)
- Temporal attention bias
- Multi-scale processing
- Advanced feature engineering
- GPU acceleration support

## 🚀 Setup and Installation

In [1]:
# Cell 1: Install All Requirements for Enhanced Transformer Training
import subprocess
import sys
import os
from IPython.display import clear_output

print("🚀 Installing Enhanced Transformer Requirements...")
print("=" * 60)

# Update pip first
!pip install --upgrade pip

# Install stable PyTorch version (2.1.0 is stable and doesn't have the API issues)
print("🔥 Installing PyTorch with GPU support...")
!pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121

# Install core ML libraries
print("📊 Installing core ML libraries...")
!pip install numpy pandas scikit-learn matplotlib seaborn plotly

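# Install technical analysis libraries
# (talib-binary is omitted: pip finds no matching distribution for it, and one
# unresolvable requirement makes the whole install command fail)
print("📈 Installing technical analysis libraries...")
!pip install ta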

# Install utilities
print("🔧 Installing utilities...")
!pip install tqdm psutil requests ipywidgets

# Install Jupyter
print("📓 Installing Jupyter...")
!pip install jupyter

# Install stable-baselines3 for RL
!pip install stable-baselines3

clear_output(wait=True)
print("✅ All requirements installed successfully!")

# Check GPU availability
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
print(f"🚀 Default device: {device}")

if torch.cuda.is_available():
    print(f"💻 GPU: {torch.cuda.get_device_name(0)}")
    print(f"🧠 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
elif torch.backends.mps.is_available():
    print("🍎 Apple Silicon GPU available")
else:
    print("⚠️ No GPU detected, using CPU")

print("\n🎯 Ready for enhanced transformer training!")

✅ All requirements installed successfully!
🚀 Default device: cuda
💻 GPU: NVIDIA GeForce RTX 4070
🧠 GPU Memory: 12.5 GB

🎯 Ready for enhanced transformer training!


In [2]:
# Check GPU availability and handle PyTorch import issues
import torch
import warnings
import sys

# Workaround for PyTorch import issues
try:
    from torch._utils_internal import justknobs_check
except ImportError:
    # Create a dummy function if the import fails
    def justknobs_check(name, default=False):
        return default

# Compatibility shim for PyTorch builds whose pytree module lacks
# register_pytree_node
try:
    # Check if register_pytree_node exists
    from torch.utils._pytree import register_pytree_node
except ImportError:
    # Install a no-op stand-in so code that expects the function can still import it
    import torch.utils._pytree as pytree
    if not hasattr(pytree, 'register_pytree_node'):
        def register_pytree_node(*args, **kwargs):
            # Dummy implementation for compatibility
            pass
        pytree.register_pytree_node = register_pytree_node

# Disable TorchDynamo (the backend of torch.compile) to avoid compilation-related errors
import torch._dynamo
torch._dynamo.config.disable = True

warnings.filterwarnings('ignore')

device = torch.device('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
print(f"🚀 Using device: {device}")
print(f"🔧 PyTorch version: {torch.__version__}")

# Check if CUDA is available
if torch.cuda.is_available():
    print(f"💻 GPU: {torch.cuda.get_device_name(0)}")
    print(f"🧠 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️ No GPU detected, using CPU")

🚀 Using device: cuda
🔧 PyTorch version: 2.8.0+cu128
💻 GPU: NVIDIA GeForce RTX 4070
🧠 GPU Memory: 12.5 GB
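
The version printed above (2.8.0+cu128) differs from the 2.1.0 pin in the install cell, which suggests the kernel imported a different build than the one pip was asked to install. A minimal sketch for spotting such a mismatch with the standard library is shown below; `installed_version` and `versions_match` are hypothetical helper names, not part of this notebook.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the version recorded on disk for `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def versions_match(on_disk, imported: str) -> bool:
    """Compare two version strings, ignoring a local build tag such as '+cu121'."""
    if on_disk is None:
        return False
    return on_disk.split("+")[0] == imported.split("+")[0]
```

In the notebook this could be called as `versions_match(installed_version("torch"), torch.__version__)`. A `False` result usually means the kernel should be restarted after installing.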


In [3]:
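# Install required packages (PyTorch is installed separately below)
# Note: pip finds no matching distribution for talib-binary, so this command fails
!pip install pandas numpy scikit-learn matplotlib ta talib-binary

# Install PyTorch from the CUDA 12.1 wheel index (no version pin; pip picks the build)
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121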

print("✅ Packages installed successfully")

# Verify PyTorch installation
import torch
print(f"PyTorch version: {torch.__version__}")

ERROR: Could not find a version that satisfies the requirement talib-binary (from versions: none)
ERROR: No matching distribution found for talib-binary
Looking in indexes: https://download.pytorch.org/whl/cu121
Collecting torch
  Using cached https://download.pytorch.org/whl/cu121/torch-2.1.0%2Bcu121-cp310-cp310-linux_x86_64.whl (2200.6 MB)
Collecting triton==2.1.0 (from torch)
  Using cached https://download.pytorch.org/whl/triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89.2 MB)
Installing collected packages: triton, torch
  Attempting uninstall: triton
    Found existing installation: triton 3.4.0
    Uninstalling triton-3.4.0:
      Successfully uninstalled triton-3.4.0
  Attempting uninstall: torch
    Found existing installation: torch 2.8.0
    Uninstalling torch-2.8.0:

In [4]:
!pip -q install gdown

import gdown

# Your shared link
url = "https://drive.google.com/file/d/1cOc9wKmdHxL4sKBZVWhSoAKdMVLE32lj/view?usp=sharing"

# Where to save locally
out_path = "crypto_5min_2years.csv"  # add .zip/.csv/etc if you know the type

gdown.download(url, out_path, fuzzy=True)  # fuzzy=True lets you pass 'view' URLs
print("Saved to:", out_path)

Downloading...
From: https://drive.google.com/uc?id=1cOc9wKmdHxL4sKBZVWhSoAKdMVLE32lj
To: /mnt/crypto_5min_2years.csv
100%|██████████| 42.7M/42.7M [00:05<00:00, 8.14MB/s]

Saved to: crypto_5min_2years.csv





In [5]:
import pandas as pd
import numpy as np
from datetime import datetime
import os

# Load cryptocurrency data
def load_crypto_data(csv_path='crypto_5min_2years.csv'):
    """Load and preprocess cryptocurrency data"""
    print(f"📊 Loading data from {csv_path}...")
    
    if not os.path.exists(csv_path):
        print(f"❌ File {csv_path} not found!")
        return None
    
    df = pd.read_csv(csv_path)
    print(f"✅ Raw data shape: {df.shape}")
    
    # Handle datetime index
    if 'date' in df.columns:
        df['date'] = pd.to_datetime(df['date'])
        df.set_index('date', inplace=True)
    elif 'timestamp' in df.columns:
        df['date'] = pd.to_datetime(df['timestamp'])
        df.set_index('date', inplace=True)
    else:
        # Create datetime index for sample data
        # ('5min' replaces the deprecated '5T' frequency alias)
        if len(df) > 0:
            dates = pd.date_range(start='2024-01-01', periods=len(df), freq='5min')
            df.index = dates
    
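    print(f"📅 Date range: {df.index.min()} to {df.index.max()}")
    # Note: this dataset names its ticker column 'tic', so the 'symbol' check
    # below reports 'Unknown'
    print(f"💰 Symbols: {df['symbol'].unique() if 'symbol' in df.columns else 'Unknown'}")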
    
    return df

# Load data
df = load_crypto_data()

if df is not None:
    display(df.head())
    print(f"\n📋 Data info:")
    display(df.info())

📊 Loading data from crypto_5min_2years.csv...
✅ Raw data shape: (630721, 7)
📅 Date range: 2023-09-10 12:15:00 to 2025-09-09 12:15:00
💰 Symbols: Unknown


| date | tic | open | high | low | close | volume |
|---|---|---|---|---|---|---|
| 2023-09-10 12:15:00 | BNBUSDT | 213.1 | 213.1 | 212.9 | 213.0 | 385.157 |
| 2023-09-10 12:15:00 | BTCUSDT | 25815.18 | 25815.19 | 25801.86 | 25803.16 | 25.21043 |
| 2023-09-10 12:15:00 | ETHUSDT | 1625.9 | 1625.91 | 1624.86 | 1624.86 | 239.2249 |
| 2023-09-10 12:20:00 | BNBUSDT | 213.0 | 213.1 | 212.9 | 213.1 | 375.347 |
| 2023-09-10 12:20:00 | BTCUSDT | 25803.17 | 25816.67 | 25803.16 | 25812.27 | 18.98413 |



📋 Data info:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 630721 entries, 2023-09-10 12:15:00 to 2025-09-09 12:15:00
Data columns (total 6 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   tic     630721 non-null  object 
 1   open    630721 non-null  float64
 2   high    630721 non-null  float64
 3   low     630721 non-null  float64
 4   close   630721 non-null  float64
 5   volume  630721 non-null  float64
dtypes: float64(5), object(1)
memory usage: 33.7+ MB


None

## 🔧 Enhanced Feature Engineering

In [6]:
# Import enhanced features module
from enhanced_features import calculate_enhanced_features, select_important_features

# Calculate enhanced features
def process_features(df):
    """Process and enhance features for training"""
    print("🔧 Calculating enhanced features...")
    
    # Calculate enhanced features
    enhanced_df = calculate_enhanced_features(df)
    print(f"✅ Enhanced features shape: {enhanced_df.shape}")
    
    # Select important features
    selected_features = select_important_features(enhanced_df, n_features=40)
    print(f"🎯 Selected features shape: {selected_features.shape}")
    
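    # Handle missing values: forward-fill, then back-fill the leading rows, then zero-fill
    # (fillna(method=...) is deprecated in recent pandas; ffill()/bfill() are equivalent)
    selected_features = selected_features.ffill().bfill().fillna(0)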
    
    print(f"✅ Final processed features shape: {selected_features.shape}")
    print(f"📋 Feature columns: {list(selected_features.columns[:10])}...")
    
    return selected_features

# Process features
if df is not None:
    features_df = process_features(df)
    display(features_df.head())

⚠️ TA-Lib not available, using manual calculations
🔧 Calculating enhanced features...
🔧 Calculating enhanced trading features...
  📊 Calculating core technical indicators...
  💰 Calculating order flow indicators...
  🏛️ Calculating market microstructure features...
  📈 Calculating volatility features...
  🚀 Calculating momentum and trend features...
  📊 Calculating support and resistance levels...
  💹 Calculating price-based features...
  📦 Calculating volume-based features...
  ⏰ Calculating time-based features...
  ⏪ Calculating lagged features...
✅ Enhanced features calculated: 66 features
✅ Enhanced features shape: (630721, 66)
🎯 Selecting important features...
✅ Selected 29 important features
🎯 Selected features shape: (630721, 29)
✅ Final processed features shape: (630721, 29)
📋 Feature columns: ['open', 'volume', 'rsi', 'rsi_7', 'macd', 'macd_signal', 'bb_upper', 'stoch_k', 'stoch_d', 'vwap']...


| date | open | volume | rsi | rsi_7 | macd | macd_signal | bb_upper | stoch_k | stoch_d | vwap | ... | volume_change | hour | day_of_week | is_session_open | close_lag_1 | volume_lag_1 | volume_lag_2 | volume_lag_3 | volume_lag_5 | volume_lag_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-09-10 12:15:00 | 213.1 | 385.157 | 55.555415 | 50.000098 | 0.0 | 0.0 | 34033.495843 | 99.954147 | 35.158169 | 213.0 | ... | -0.934545 | 12 | 6 | 1 | 213.0 | 385.157 | 385.157 | 385.157 | 385.157 | 385.157 |
| 2023-09-10 12:15:00 | 25815.18 | 25.21043 | 55.555415 | 50.000098 | 574.138205 | 318.96567 | 34033.495843 | 99.954147 | 35.158169 | 1785.320461 | ... | -0.934545 | 12 | 6 | 1 | 213.0 | 385.157 | 385.157 | 385.157 | 385.157 | 385.157 |
| 2023-09-10 12:15:00 | 1625.9 | 239.2249 | 55.555415 | 50.000098 | -18.201604 | 180.782361 | 34033.495843 | 99.954147 | 35.158169 | 1726.356697 | ... | 8.489124 | 12 | 6 | 1 | 25803.16 | 25.21043 | 385.157 | 385.157 | 385.157 | 385.157 |
| 2023-09-10 12:20:00 | 213.0 | 375.347 | 55.555415 | 50.000098 | -338.134343 | 4.997569 | 34033.495843 | 99.954147 | 35.158169 | 1172.156689 | ... | 0.569013 | 12 | 6 | 1 | 1624.86 | 239.2249 | 25.21043 | 385.157 | 385.157 | 385.157 |
| 2023-09-10 12:20:00 | 25803.17 | 18.98413 | 55.555415 | 50.000098 | 513.78746 | 156.351059 | 34033.495843 | 99.954147 | 35.158169 | 1620.217612 | ... | -0.949422 | 12 | 6 | 1 | 213.1 | 375.347 | 239.2249 | 25.21043 | 385.157 | 385.157 |
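
In the preview above, the second row (open 25815.18, a BTCUSDT bar) has `close_lag_1 = 213.0`, which is the BNBUSDT close from the same timestamp, and the third row carries the BTCUSDT close as its lag. The lags therefore appear to be taken across interleaved tickers rather than within each ticker. A minimal sketch of a per-ticker lag, assuming the raw frame still has its `tic` column, is shown below; `add_lags_per_ticker` is a hypothetical helper and not part of `enhanced_features`.

```python
import pandas as pd


def add_lags_per_ticker(df: pd.DataFrame, column: str, lags, ticker_col: str = "tic") -> pd.DataFrame:
    """Add `<column>_lag_<k>` columns where each lag looks back within the same ticker."""
    out = df.copy()
    grouped = out.groupby(ticker_col, sort=False)[column]
    for lag in lags:
        # groupby.shift keeps the original row order, so a positional
        # assignment is safe even when the datetime index has duplicates
        out[f"{column}_lag_{lag}"] = grouped.shift(lag).to_numpy()
    return out
```

The first `lag` rows of each ticker come out as NaN instead of borrowing another ticker's price, so they still need the same missing-value handling as the other features.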


## 🧠 Enhanced Transformer Model

In [7]:
# Import enhanced transformer
from transformer_enhanced_v2 import EnhancedCryptoTransformer, create_enhanced_transformer_config
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
import torch.optim as optim

# Create model configuration
config = create_enhanced_transformer_config()
print("📋 Model configuration:")
for key, value in config.items():
    print(f"   {key}: {value}")

# Create model
if 'features_df' in locals():
    input_dim = features_df.shape[1]
    model = EnhancedCryptoTransformer(
        input_dim=input_dim,
        **config['model_params']
    ).to(device)
    
    print(f"\n🧠 Model created successfully!")
    print(f"📊 Input dimension: {input_dim}")
    print(f"🔧 Model parameters: {sum(p.numel() for p in model.parameters()):,}")
    print(f"💾 Model size: {sum(p.numel() for p in model.parameters()) * 4 / 1024 / 1024:.1f} MB")
else:
    print("⚠️ Features not available, creating test model")
    model = EnhancedCryptoTransformer(
        input_dim=25,
        **config['model_params']
    ).to(device)
    print(f"🧠 Test model created with 25 input dimensions")

📋 Model configuration:
   model_params: {'d_model': 512, 'n_heads': 16, 'n_layers': 8, 'd_ff': 2048, 'dropout': 0.15, 'max_seq_len': 250, 'use_multi_scale': True, 'scales': [5, 15, 30, 60]}
   training_params: {'learning_rate': 5e-05, 'batch_size': 32, 'n_epochs': 150, 'warmup_steps': 2000, 'weight_decay': 1e-05, 'gradient_clipping': 1.0}
   environment_params: {'initial_amount': 100000, 'transaction_cost_pct': 0.001, 'sequence_length': 250, 'use_multi_scale': True}

🧠 Model created successfully!
📊 Input dimension: 29
🔧 Model parameters: 27,374,350
💾 Model size: 104.4 MB
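
The 104.4 MB figure counts the float32 weights only. As a rough sketch, assuming float32 values throughout and the AdamW optimizer configured later in this notebook (which keeps two moment buffers per parameter), the persistent training state is about four times that size. `training_state_mb` is a hypothetical helper; activation memory, which depends on batch size and sequence length, is not included.

```python
def training_state_mb(n_params: int, bytes_per_value: int = 4) -> dict:
    """Estimate persistent training memory in MB (weights, gradients, AdamW moments)."""
    weights = n_params * bytes_per_value / 1024 / 1024
    return {
        "weights": weights,
        "gradients": weights,          # one gradient value per parameter
        "adamw_moments": 2 * weights,  # first and second moment estimates
        "total": 4 * weights,
    }


estimate = training_state_mb(27_374_350)
print({name: round(mb, 1) for name, mb in estimate.items()})
```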


In [8]:
# Test model forward pass
def test_model(model, input_dim=25):
    """Test model forward pass"""
    print("🧪 Testing model forward pass...")
    
    # Create test input
    batch_size = 4
    seq_len = config['model_params']['max_seq_len']
    test_input = torch.randn(batch_size, seq_len, input_dim).to(device)
    
    # Create multi-scale inputs keyed by scale
    # (the config also lists a 60 scale, which is not exercised here)
    scale_inputs = {
        5: torch.randn(batch_size, seq_len, input_dim).to(device),
        15: torch.randn(batch_size, seq_len//3, input_dim).to(device),
        30: torch.randn(batch_size, seq_len//6, input_dim).to(device),
    }
    
    model.eval()
    with torch.no_grad():
        outputs = model(test_input, scale_inputs)
    
    print("✅ Model forward pass successful!")
    print("📊 Output shapes:")
    for key, value in outputs.items():
        if isinstance(value, torch.Tensor):
            print(f"   {key}: {value.shape}")
    
    return outputs

# Test model
test_outputs = test_model(model, input_dim if 'features_df' in locals() else 25)

🧪 Testing model forward pass...
✅ Model forward pass successful!
📊 Output shapes:
   action: torch.Size([4, 1])
   market_regime: torch.Size([4, 4])
   confidence: torch.Size([4, 1])
   volatility: torch.Size([4, 1])
   risk_assessment: torch.Size([4, 3])
   hidden_state: torch.Size([4, 512])
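
The forward-pass test feeds random tensors of length `seq_len`, `seq_len//3` and `seq_len//6` for the 5, 15 and 30 scales. How real coarser-scale inputs are built is not shown in this notebook. One possible approach, sketched here under that assumption, is to average non-overlapping blocks of the 5-minute window; `downsample_mean` is a hypothetical helper.

```python
import numpy as np


def downsample_mean(seq: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping blocks of `factor` steps along the time axis.

    `seq` has shape (seq_len, n_features). Leftover steps are dropped from the
    oldest end so the most recent bar is always included.
    """
    n_blocks = seq.shape[0] // factor
    trimmed = seq[seq.shape[0] - n_blocks * factor:]
    return trimmed.reshape(n_blocks, factor, seq.shape[1]).mean(axis=1)


window = np.random.rand(250, 29)
print(downsample_mean(window, 3).shape, downsample_mean(window, 6).shape)
```

For a 250-step window this yields 83 and 41 steps, matching the shapes used in the test. A plain mean is a simplification: features such as volume or high/low prices would arguably call for a sum or max/min instead.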


## 🏋️‍♂️ Training Setup

In [9]:
# Create dataset and dataloader
class CryptoDataset(torch.utils.data.Dataset):
    """Dataset for cryptocurrency trading"""
    def __init__(self, features_df, sequence_length=250, prediction_horizon=5):
        self.features = features_df.values
        self.sequence_length = sequence_length
        self.prediction_horizon = prediction_horizon
        # Use 'close' for targets when present; otherwise fall back to the first feature column
        self.close_prices = features_df['close'].values if 'close' in features_df.columns else self.features[:, 0]
        
        self.sequences, self.targets = self._prepare_sequences()
    
    def _prepare_sequences(self):
        """Build (input window, future return) pairs.

        Note: every window is materialized up front, so memory grows with
        n_windows * sequence_length * n_features.
        """
        sequences = []
        targets = []
        
        for i in range(len(self.features) - self.sequence_length - self.prediction_horizon):
            # Input window: rows i .. i + sequence_length - 1
            seq = self.features[i:i + self.sequence_length]
            sequences.append(seq)
            
            # Target: simple return from the last input step to
            # prediction_horizon steps later
            current_price = self.close_prices[i + self.sequence_length - 1]
            future_price = self.close_prices[i + self.sequence_length + self.prediction_horizon - 1]
            target_return = (future_price - current_price) / current_price
            targets.append(target_return)
        
        return np.array(sequences), np.array(targets)
    
    def __len__(self):
        return len(self.sequences)
    
    def __getitem__(self, idx):
        sequence = torch.FloatTensor(self.sequences[idx])
        target = torch.FloatTensor([self.targets[idx]])
        return sequence, target

# Create datasets
if 'features_df' in locals():
    print("🔧 Creating datasets...")
    
    # Create dataset
    full_dataset = CryptoDataset(features_df, sequence_length=config['model_params']['max_seq_len'])
    
    # Split data
    train_size = int(0.8 * len(full_dataset))
    val_size = len(full_dataset) - train_size
    
    train_dataset, val_dataset = torch.utils.data.random_split(
        full_dataset, [train_size, val_size]
    )
    
    print(f"📊 Training samples: {len(train_dataset)}")
    print(f"📊 Validation samples: {len(val_dataset)}")
    
    # Create dataloaders
    batch_size = config['training_params']['batch_size']
    train_loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=2
    )
    
    val_loader = DataLoader(
        val_dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=2
    )
    
    print(f"🔧 Batch size: {batch_size}")
    print(f"🔧 Training batches: {len(train_loader)}")
    print(f"🔧 Validation batches: {len(val_loader)}")
else:
    print("⚠️ Features not available, skipping dataset creation")

🔧 Creating datasets...
📊 Training samples: 504372
📊 Validation samples: 126094
🔧 Batch size: 32
🔧 Training batches: 15762
🔧 Validation batches: 3941
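
Because consecutive windows overlap by all but one step, `random_split` places near-identical windows on both sides of the split, which can make validation loss look better than it would on unseen data. One alternative, sketched here as an assumption rather than the notebook's method, is a chronological split with a gap between the two ranges; `chronological_split` is a hypothetical helper.

```python
def chronological_split(n_windows: int, train_frac: float = 0.8, gap: int = 0):
    """Return (train_indices, val_indices) as ranges in time order.

    `gap` windows are skipped after the training range so that no validation
    window shares rows with a training window.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be between 0 and 1")
    train_end = int(train_frac * n_windows)
    val_start = min(train_end + gap, n_windows)
    return range(train_end), range(val_start, n_windows)


# 630,466 windows in total; gap = sequence_length + prediction_horizon
train_idx, val_idx = chronological_split(630_466, train_frac=0.8, gap=255)
print(len(train_idx), len(val_idx))
```

The resulting ranges can be passed to `torch.utils.data.Subset(full_dataset, train_idx)` in place of `random_split`. Shuffling inside the training loader remains fine, since it only reorders windows within the training period.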


## 🚀 Training Loop

In [10]:
# Training setup
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import time
import warnings

# Suppress PyTorch warnings
warnings.filterwarnings('ignore', category=UserWarning)

# Initialize training components
try:
    if 'train_loader' in locals():
        # Optimizer and scheduler
        optimizer = optim.AdamW(
            model.parameters(),
            lr=config['training_params']['learning_rate'],
            weight_decay=config['training_params']['weight_decay']
        )
        
        scheduler = optim.lr_scheduler.CosineAnnealingLR(
            optimizer,
            T_max=config['training_params']['n_epochs'],
            eta_min=config['training_params']['learning_rate'] * 0.1
        )
        
        # Loss functions
        action_loss_fn = nn.MSELoss()
        confidence_loss_fn = nn.MSELoss()
        
        # Training history
        training_history = {
            'train_loss': [],
            'val_loss': [],
            'learning_rate': [],
            'epoch_time': [],
            'gpu_memory': []
        }
        
        print("🚀 Training setup completed!")
    else:
        print("⚠️ Training setup skipped - no datasets available")
except Exception as e:
    print(f"⚠️ Error setting up training: {str(e)}")
    print("This might be due to PyTorch compatibility issues. Please restart the kernel and try again.")

🚀 Training setup completed!
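
For reference, `CosineAnnealingLR` follows the closed form $\eta_t = \eta_{min} + \tfrac{1}{2}(\eta_{max} - \eta_{min})\,(1 + \cos(\pi t / T_{max}))$ when stepped once per epoch. The sketch below evaluates it with the values from the printed configuration (base rate 5e-05, `eta_min` at 10% of that, `T_max` of 150 epochs); `cosine_annealing_lr` is a hypothetical helper. Note that the `warmup_steps` entry in the config is not referenced by the setup cell, so no warmup is applied.

```python
import math


def cosine_annealing_lr(epoch: int, base_lr: float = 5e-5,
                        eta_min: float = 5e-6, t_max: int = 150) -> float:
    """Learning rate after `epoch` scheduler steps under cosine annealing."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


for epoch in (0, 75, 150):
    print(epoch, f"{cosine_annealing_lr(epoch):.2e}")
```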


In [11]:
# Training function
def train_epoch(model, train_loader, optimizer, device):
    """Train for one epoch"""
    model.train()
    total_loss = 0
    action_loss = 0
    confidence_loss = 0
    num_batches = 0
    
    for batch_idx, (sequences, targets) in enumerate(train_loader):
        sequences = sequences.to(device)
        targets = targets.to(device)
        
        optimizer.zero_grad()
        
        # Forward pass
        outputs = model(sequences)
        
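        # Calculate losses (both loss functions are module-level globals from the setup cell)
        action_loss_batch = action_loss_fn(outputs['action'], targets)
        # The confidence head is regressed toward a constant target of 0.8
        confidence_loss_batch = confidence_loss_fn(outputs['confidence'], torch.ones_like(outputs['confidence']) * 0.8)
        
        # Total loss: action loss plus the confidence loss weighted by 0.2
        total_loss_batch = action_loss_batch + 0.2 * confidence_loss_batch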
        
        # Backward pass
        total_loss_batch.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        
        # Accumulate losses
        total_loss += total_loss_batch.item()
        action_loss += action_loss_batch.item()
        confidence_loss += confidence_loss_batch.item()
        num_batches += 1
        
        if batch_idx % 10 == 0:
            print(f"  Batch {batch_idx}/{len(train_loader)}: Loss = {total_loss_batch.item():.4f}")
    
    return {
        'total_loss': total_loss / num_batches,
        'action_loss': action_loss / num_batches,
        'confidence_loss': confidence_loss / num_batches
    }

def validate_epoch(model, val_loader, device):
    """Validate for one epoch"""
    model.eval()
    total_loss = 0
    num_batches = 0
    
    with torch.no_grad():
        for sequences, targets in val_loader:
            sequences = sequences.to(device)
            targets = targets.to(device)
            
            outputs = model(sequences)
            
            loss = action_loss_fn(outputs['action'], targets)
            total_loss += loss.item()
            num_batches += 1
    
    return total_loss / num_batches

print("🔧 Training functions defined!")

🔧 Training functions defined!
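
`train_epoch` calls `torch.nn.utils.clip_grad_norm_` with a maximum norm of 1.0. The idea of clipping by global norm can be sketched in NumPy as below. `clip_by_global_norm` is a hypothetical helper for illustration, not PyTorch's implementation: it returns scaled copies, whereas the PyTorch call modifies the gradients in place.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm: float = 1.0, eps: float = 1e-6):
    """Scale all gradients by one factor so their combined L2 norm is at most `max_norm`."""
    total_norm = float(np.sqrt(sum(float(np.sum(g ** 2)) for g in grads)))
    # The factor is capped at 1.0, so small gradients pass through unchanged
    coef = min(1.0, max_norm / (total_norm + eps))
    return [g * coef for g in grads], total_norm


grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before, clipped)
```

Because every tensor is scaled by the same factor, the direction of the overall update is preserved and only its length changes.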


In [None]:
# Start training
def start_training(model, train_loader, val_loader, optimizer, scheduler, config, device):
    """Start the training process"""
    print("🚀 Starting enhanced transformer training...")
    print(f"📊 Training samples: {len(train_loader.dataset)}")
    print(f"📊 Validation samples: {len(val_loader.dataset)}")
    print(f"🧠 Model parameters: {sum(p.numel() for p in model.parameters()):,}")
    print(f"🔧 Epochs: {config['training_params']['n_epochs']}")

    best_val_loss = float('inf')
    training_history = {
        'train_loss': [],
        'val_loss': [],
        'learning_rate': [],
        'epoch_time': [],
        'gpu_memory': []
    }

    # Fix PyTorch serialization issues
    import pickle
    import io
    
    class CPU_Unpickler(pickle.Unpickler):
        def find_class(self, module, name):
            if module == 'torch.storage' and name == '_load_from_bytes':
                return lambda b: torch.load(io.BytesIO(b))
            return super().find_class(module, name)
    
    def safe_save(obj, filename):
        """Safely save torch objects with pickle"""
        try:
            # First try normal torch.save with legacy format
            torch.save(obj, filename, pickle_protocol=4, _use_new_zipfile_serialization=False)
            return True
        except Exception as e1:
            try:
                # Try with older protocol
                torch.save(obj, filename, pickle_protocol=2, _use_new_zipfile_serialization=False)
                return True
            except Exception as e2:
                print(f"   ⚠️ Failed to save {filename}: {e2}")
                return False

    for epoch in range(config['training_params']['n_epochs']):
        start_time = time.time()

        # Training
        train_losses = train_epoch(model, train_loader, optimizer, device)

        # Validation
        val_loss = validate_epoch(model, val_loader, device)

        # Learning rate scheduling
        scheduler.step()

        # Record metrics
        epoch_time = time.time() - start_time
        training_history['train_loss'].append(train_losses['total_loss'])
        training_history['val_loss'].append(val_loss)
        training_history['learning_rate'].append(optimizer.param_groups[0]['lr'])
        training_history['epoch_time'].append(epoch_time)

        # GPU memory usage
        if torch.cuda.is_available():
            gpu_memory = torch.cuda.memory_allocated() / 1024**3
            training_history['gpu_memory'].append(gpu_memory)

        # Print progress
        if (epoch + 1) % 10 == 0:
            print(f"\n📊 Epoch {epoch+1}/{config['training_params']['n_epochs']}")
            print(f"   Train Loss: {train_losses['total_loss']:.4f} (Action: {train_losses['action_loss']:.4f})")
            print(f"   Val Loss: {val_loss:.4f}")
            print(f"   LR: {optimizer.param_groups[0]['lr']:.6f}")
            print(f"   Time: {epoch_time:.1f}s")
            if torch.cuda.is_available():
                print(f"   GPU Memory: {gpu_memory:.1f} GB")

        # Save best model (with robust error handling)
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            checkpoint = {
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'config': config,
                'training_history': training_history,
                'epoch': epoch
            }
            
            # Try to save using multiple methods
            print("   💾 Attempting to save best model...")
            saved = False
            
            # Method 1: Safe save with pickle
            if not saved:
                saved = safe_save(checkpoint, 'enhanced_transformer_best.pth')
            
            # Method 2: Save only state dict if full checkpoint fails
            if not saved:
                try:
                    torch.save(model.state_dict(), 'enhanced_transformer_best_state.pth', 
                             _use_new_zipfile_serialization=False)
                    print("   💾 Saved model state dict only")
                    saved = True
                except Exception as e:
                    print(f"   ⚠️ Failed to save state dict: {e}")
            
            # Method 3: CPU transfer then save
            if not saved:
                try:
                    model_cpu = {k: v.cpu() for k, v in model.state_dict().items()}
                    torch.save(model_cpu, 'enhanced_transformer_best_cpu.pth', 
                             _use_new_zipfile_serialization=False)
                    print("   💾 Saved model in CPU mode")
                    saved = True
                except Exception as e:
                    print(f"   ⚠️ Failed to save CPU model: {e}")

    # Save final model (with same robust handling)
    print("   💾 Attempting to save final model...")
    saved = False
    
    final_checkpoint = {
        'model_state_dict': model.state_dict(),
        'config': config,
        'training_history': training_history
    }
    
    # Try all save methods
    if not saved:
        saved = safe_save(final_checkpoint, 'enhanced_transformer_final.pth')
    
    if not saved:
        try:
            torch.save(model.state_dict(), 'enhanced_transformer_final_state.pth', 
                     _use_new_zipfile_serialization=False)
            print("   💾 Saved final model state dict")
            saved = True
        except Exception as e:
            print(f"   ⚠️ Failed to save final state dict: {e}")

    print("\n✅ Training completed!")
    print(f"🏆 Best validation loss: {best_val_loss:.4f}")

    return training_history

# Start training once the setup cells above have run
if 'train_loader' in locals() and 'val_loader' in locals() and 'optimizer' in locals():
    training_history = start_training(model, train_loader, val_loader, optimizer, scheduler, config, device)
else:
    print("⚠️ Training setup not available. Please run all cells above first.")

# Emergency model save (if training completes but save fails)
def emergency_save_model(model, config=None, training_history=None):
    """Emergency save function if normal saving fails"""
    print("🚨 Emergency save initiated...")
    
    try:
        # Method 1: Save state dict only
        torch.save(model.state_dict(), 'emergency_model_state.pth', 
                 _use_new_zipfile_serialization=False)
        print("✅ Saved model state dict to emergency_model_state.pth")
    except Exception as e:
        print(f"❌ Failed to save state dict: {e}")
    
    try:
        # Method 2: CPU conversion
        cpu_model = {k: v.cpu() for k, v in model.state_dict().items()}
        torch.save(cpu_model, 'emergency_model_cpu.pth', 
                 _use_new_zipfile_serialization=False)
        print("✅ Saved CPU model to emergency_model_cpu.pth")
    except Exception as e:
        print(f"❌ Failed to save CPU model: {e}")
    
    try:
        # Method 3: Save as numpy arrays
        numpy_weights = {}
        for name, param in model.state_dict().items():
            numpy_weights[name] = param.cpu().numpy()
        
        import pickle
        with open('emergency_model_weights.pkl', 'wb') as f:
            pickle.dump(numpy_weights, f)
        print("✅ Saved weights as numpy arrays to emergency_model_weights.pkl")
    except Exception as e:
        print(f"❌ Failed to save numpy weights: {e}")

# Run the emergency save whenever a model exists (it does not check whether earlier saves failed)
if 'model' in locals():
    emergency_save_model(model, config, training_history if 'training_history' in locals() else None)
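
The fallbacks above guard against `torch.save` raising an exception. A separate risk is an interrupted save leaving a truncated file in place of the previous best checkpoint. A minimal standard-library sketch that addresses this by writing to a temporary file and renaming it is shown below; `atomic_write` is a hypothetical helper, not something the notebook defines.

```python
import os
import tempfile


def atomic_write(path: str, write_fn) -> None:
    """Call `write_fn(tmp_path)`, then move the result onto `path` in one step.

    If `write_fn` raises, the temporary file is removed and any existing file
    at `path` is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)
    try:
        write_fn(tmp_path)
        os.replace(tmp_path, path)  # same-directory rename replaces the target
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In the training loop this could wrap the existing call, for example `atomic_write('enhanced_transformer_best.pth', lambda p: torch.save(checkpoint, p))`.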

In [None]:
# Plot training results
def plot_training_results(history):
    """Plot training results"""
    if not history or not history['train_loss']:
        print("⚠️ No training history available")
        return
    
    fig, axes = plt.subplots(2, 2, figsize=(15, 10))
    
    # Loss plot
    axes[0, 0].plot(history['train_loss'], label='Train Loss', color='blue')
    axes[0, 0].plot(history['val_loss'], label='Val Loss', color='red')
    axes[0, 0].set_title('Training and Validation Loss')
    axes[0, 0].set_xlabel('Epoch')
    axes[0, 0].set_ylabel('Loss')
    axes[0, 0].legend()
    axes[0, 0].grid(True)
    
    # Learning rate plot
    axes[0, 1].plot(history['learning_rate'], color='green')
    axes[0, 1].set_title('Learning Rate Schedule')
    axes[0, 1].set_xlabel('Epoch')
    axes[0, 1].set_ylabel('Learning Rate')
    axes[0, 1].grid(True)
    
    # Epoch time plot
    axes[1, 0].plot(history['epoch_time'], color='orange')
    axes[1, 0].set_title('Training Time per Epoch')
    axes[1, 0].set_xlabel('Epoch')
    axes[1, 0].set_ylabel('Time (seconds)')
    axes[1, 0].grid(True)
    
    # GPU memory plot
    if history['gpu_memory']:
        axes[1, 1].plot(history['gpu_memory'], color='purple')
        axes[1, 1].set_title('GPU Memory Usage')
        axes[1, 1].set_xlabel('Epoch')
        axes[1, 1].set_ylabel('Memory (GB)')
        axes[1, 1].grid(True)
    
    plt.tight_layout()
    plt.savefig('training_results.png', dpi=300)
    plt.show()
    
    print("📊 Training results plotted and saved!")

# Plot results if training history exists
if 'training_history' in locals() and training_history['train_loss']:
    plot_training_results(training_history)
else:
    print("⚠️ No training history to plot")

## 🎯 Model Evaluation

In [None]:
# Model evaluation
def evaluate_model(model, val_loader, device):
    """Evaluate model performance"""
    model.eval()
    predictions = []
    actuals = []
    confidences = []
    
    with torch.no_grad():
        for sequences, targets in val_loader:
            sequences = sequences.to(device)
            targets = targets.to(device)
            
            outputs = model(sequences)
            
            predictions.extend(outputs['action'].cpu().numpy())
            actuals.extend(targets.cpu().numpy())
            confidences.extend(outputs['confidence'].cpu().numpy())
    
    predictions = np.array(predictions)
    actuals = np.array(actuals)
    confidences = np.array(confidences)
    
    # Calculate metrics
    mse = np.mean((predictions - actuals) ** 2)
    mae = np.mean(np.abs(predictions - actuals))
    r2 = 1 - np.sum((actuals - predictions) ** 2) / np.sum((actuals - np.mean(actuals)) ** 2)
    
    # Direction accuracy
    pred_direction = np.sign(predictions)
    actual_direction = np.sign(actuals)
    direction_accuracy = np.mean(pred_direction == actual_direction)
    
    print("📊 Model Evaluation Results:")
    print(f"   MSE: {mse:.6f}")
    print(f"   MAE: {mae:.6f}")
    print(f"   R²: {r2:.6f}")
    print(f"   Direction Accuracy: {direction_accuracy:.2%}")
    print(f"   Average Confidence: {np.mean(confidences):.4f}")
    
    return {
        'mse': mse,
        'mae': mae,
        'r2': r2,
        'direction_accuracy': direction_accuracy,
        'confidence': np.mean(confidences)
    }

# Evaluate model if both the model and the validation loader exist
if 'model' in locals() and 'val_loader' in locals():
    evaluation_results = evaluate_model(model, val_loader, device)
else:
    print("⚠️ Model or validation loader not available for evaluation")
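
To sanity-check the metric definitions used in `evaluate_model`, the following minimal sketch recomputes them on a tiny hand-made example. The helper name `regression_metrics` and the sample values are illustrative assumptions, not part of the notebook's training code.

```python
import numpy as np

def regression_metrics(predictions, actuals):
    """Recompute MSE, MAE, R² and direction accuracy on 1-D arrays (illustrative helper)."""
    predictions = np.asarray(predictions, dtype=float).reshape(-1)
    actuals = np.asarray(actuals, dtype=float).reshape(-1)

    errors = predictions - actuals
    mse = float(np.mean(errors ** 2))
    mae = float(np.mean(np.abs(errors)))

    ss_res = float(np.sum(errors ** 2))
    ss_tot = float(np.sum((actuals - actuals.mean()) ** 2))
    # R² is undefined for constant targets, so report NaN in that case
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Fraction of samples where prediction and target share the same sign
    direction_accuracy = float(np.mean(np.sign(predictions) == np.sign(actuals)))
    return {"mse": mse, "mae": mae, "r2": r2, "direction_accuracy": direction_accuracy}

# Made-up values: three of the four predictions have the correct sign
toy_metrics = regression_metrics([0.5, -0.2, 0.1, -0.4], [0.4, -0.1, -0.3, -0.2])
print(toy_metrics)
```

Note that `np.sign` returns 0 for an exact zero, so a zero target only counts as a direction match when the prediction is also exactly zero.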

## 💾 Model Loading and Inference

In [None]:
# Load trained model
def load_trained_model(model_path='enhanced_transformer_best.pth'):
    """Load a trained model. Returns (model, checkpoint), or (None, None) if the file is missing."""
    if not os.path.exists(model_path):
        print(f"❌ Model file {model_path} not found")
        return None, None
    
    checkpoint = torch.load(model_path, map_location=device)
    
    # Recreate model architecture.
    # Check globals(): inside a function, locals() never contains notebook variables.
    if 'features_df' in globals():
        input_dim = features_df.shape[1]
    else:
        input_dim = 25  # Default
    
    loaded_model = EnhancedCryptoTransformer(
        input_dim=input_dim,
        **checkpoint['config']['model_params']
    ).to(device)
    
    loaded_model.load_state_dict(checkpoint['model_state_dict'])
    loaded_model.eval()
    
    print(f"✅ Model loaded from {model_path}")
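    # 'epoch' is stored zero-based; avoid adding 1 to a missing value
    epoch = checkpoint.get('epoch')
    epochs_trained = epoch + 1 if epoch is not None else 'unknown'
    print(f"📊 Model was trained for {epochs_trained} epochs")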
    
    return loaded_model, checkpoint

# Function for inference
def predict_trading_signal(model, sequence_data, device):
    """Generate trading signal from sequence data"""
    model.eval()
    
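    with torch.no_grad():
        # Accept NumPy arrays as well as tensors; the model expects float32 input
        sequence_data = torch.as_tensor(sequence_data, dtype=torch.float32)
        
        # Ensure correct shape
        if sequence_data.dim() == 2:
            sequence_data = sequence_data.unsqueeze(0)  # Add batch dimension
        
        sequence_data = sequence_data.to(device)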
        
        outputs = model(sequence_data)
        
        action = outputs['action'].cpu().numpy()[0][0]
        confidence = outputs['confidence'].cpu().numpy()[0][0]
        market_regime = outputs['market_regime'].cpu().numpy()[0]
        volatility = outputs['volatility'].cpu().numpy()[0][0]
        risk_assessment = outputs['risk_assessment'].cpu().numpy()[0]
        
    # Interpret results
    signal_strength = abs(action) * confidence
    
    if action > 0.1:
        signal = "BUY"
    elif action < -0.1:
        signal = "SELL"
    else:
        signal = "HOLD"
    
    regime_labels = ['Bull', 'Bear', 'Ranging', 'Volatile']
    regime = regime_labels[np.argmax(market_regime)]
    
    risk_labels = ['Low', 'Medium', 'High']
    risk_level = risk_labels[np.argmax(risk_assessment)]
    
    return {
        'signal': signal,
        'action': action,
        'confidence': confidence,
        'signal_strength': signal_strength,
        'market_regime': regime,
        'volatility': volatility,
        'risk_level': risk_level
    }

# Test loading model
if os.path.exists('enhanced_transformer_best.pth'):
    loaded_model, checkpoint = load_trained_model()
    if loaded_model:
        print("✅ Model loading test successful!")
else:
    print("⚠️ No trained model found for loading test")
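
The interpretation step in `predict_trading_signal` can be exercised on its own, without a model. The sketch below mirrors its ±0.1 dead band and the `abs(action) * confidence` strength score. The helper name `interpret_action`, the `threshold` parameter, and the sample inputs are assumptions made for illustration.

```python
def interpret_action(action, confidence, threshold=0.1):
    """Map a raw action score and a confidence to a label and a strength (illustrative helper)."""
    if action > threshold:
        signal = "BUY"
    elif action < -threshold:
        signal = "SELL"
    else:
        signal = "HOLD"
    # Strength combines how far the action is from zero with the model's confidence
    signal_strength = abs(action) * confidence
    return signal, signal_strength

# Made-up (action, confidence) pairs covering all three labels
for action, confidence in [(0.8, 0.9), (-0.5, 0.6), (0.05, 0.95)]:
    signal, strength = interpret_action(action, confidence)
    print(f"action={action:+.2f} confidence={confidence:.2f} -> {signal} (strength {strength:.2f})")
```

If confidence stays within [0, 1], which this sketch assumes rather than verifies, a HOLD signal can never score above 0.1 and will always fall into the weakest recommendation tier.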

## 🎮 Interactive Trading Signal Demo

In [None]:
# Interactive trading signal generator
def generate_trading_signals_demo(num_signals=5):
    """Generate demo trading signals"""
    # Check globals(): inside a function, locals() never contains notebook variables
    if globals().get('loaded_model') is None:
        print("⚠️ No loaded model available for demo")
        return
    
    if 'features_df' not in globals():
        print("⚠️ No features available for demo")
        return
    
    print(f"🎮 Generating {num_signals} trading signals...")
    print("=" * 80)
    
    # Generate random sequences from the dataset
    for i in range(num_signals):
        # Get random sequence
        start_idx = np.random.randint(0, len(features_df) - 250)
        sequence_data = features_df.iloc[start_idx:start_idx + 250].values
        
        # Get current price
        current_price = sequence_data[-1, features_df.columns.get_loc('close')] if 'close' in features_df.columns else sequence_data[-1, 0]
        
        # Generate prediction
        prediction = predict_trading_signal(loaded_model, sequence_data, device)
        
        # Display results
        print(f"\n📊 Signal {i+1}:")
        print(f"   Current Price: ${current_price:,.2f}")
        print(f"   Signal: {prediction['signal']}")
        print(f"   Action: {prediction['action']:.3f}")
        print(f"   Confidence: {prediction['confidence']:.3f}")
        print(f"   Signal Strength: {prediction['signal_strength']:.3f}")
        print(f"   Market Regime: {prediction['market_regime']}")
        print(f"   Volatility: {prediction['volatility']:.3f}")
        print(f"   Risk Level: {prediction['risk_level']}")
        
        # Trading recommendation
        if prediction['signal_strength'] > 0.7:
            print(f"   🎯 Recommendation: STRONG {prediction['signal']}")
        elif prediction['signal_strength'] > 0.4:
            print(f"   🎯 Recommendation: MODERATE {prediction['signal']}")
        else:
            print(f"   🎯 Recommendation: WEAK {prediction['signal']} - Consider holding")
        
        print("-" * 40)
    
    print("\n✅ Demo completed!")

# Run demo if model is available
if 'loaded_model' in locals() and loaded_model is not None:
    generate_trading_signals_demo(3)
else:
    print("⚠️ Demo not available - no trained model loaded")
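
The demo draws each 250-step window at a random start index. A slightly more defensive version of that sampling step might look like the sketch below, which guards against datasets shorter than one window and uses a seeded generator for repeatable demos. The helper name `sample_window_start` and the row count of 1,000 are hypothetical.

```python
import numpy as np

def sample_window_start(num_rows, seq_len=250, rng=None):
    """Pick a random start index so that [start, start + seq_len) fits in the data."""
    if num_rows < seq_len:
        raise ValueError(f"Need at least {seq_len} rows, got {num_rows}")
    rng = np.random.default_rng() if rng is None else rng
    # The upper bound is exclusive, so +1 makes the final full window selectable
    return int(rng.integers(0, num_rows - seq_len + 1))

rng = np.random.default_rng(seed=42)  # fixed seed so the demo picks the same windows each run
starts = [sample_window_start(1_000, seq_len=250, rng=rng) for _ in range(3)]
print(starts)
```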

## 📋 System Information and Performance

In [None]:
# Display system information
import platform
import psutil

def display_system_info():
    """Display system information"""
    print("🖥️ System Information")
    print("=" * 40)
    print(f"Platform: {platform.system()} {platform.release()}")
    print(f"Python: {platform.python_version()}")
    print(f"PyTorch: {torch.__version__}")
    print(f"Device: {device}")
    
    # CPU info
    print(f"\n💻 CPU Info:")
    print(f"   Cores: {psutil.cpu_count(logical=True)}")
    print(f"   Usage: {psutil.cpu_percent()}%")
    print(f"   Memory: {psutil.virtual_memory().total / 1024**3:.1f} GB")
    print(f"   Memory Available: {psutil.virtual_memory().available / 1024**3:.1f} GB")
    
    # GPU info
    if torch.cuda.is_available():
        print(f"\n🎮 GPU Info:")
        print(f"   Name: {torch.cuda.get_device_name(0)}")
        print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
        print(f"   Allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GB")
        print(f"   Cached: {torch.cuda.memory_reserved() / 1024**3:.1f} GB")
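        try:
            # torch.cuda.utilization() depends on the optional pynvml package
            print(f"   Utilization: {torch.cuda.utilization()}%")
        except Exception:
            print("   Utilization: unavailable (is pynvml installed?)")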
    
    # Model info (check globals(): locals() inside a function never sees notebook variables)
    if 'model' in globals():
        print(f"\n🧠 Model Info:")
        print(f"   Parameters: {sum(p.numel() for p in model.parameters()):,}")
        print(f"   Trainable: {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
        # Size estimate assumes float32 weights (4 bytes per parameter)
        print(f"   Size: {sum(p.numel() for p in model.parameters()) * 4 / 1024 / 1024:.1f} MB")
        
        # Parameter count by type
        param_counts = {}
        for name, param in model.named_parameters():
            param_type = name.split('.')[0]
            param_counts[param_type] = param_counts.get(param_type, 0) + param.numel()
        
        print(f"   Parameter breakdown:")
        for param_type, count in param_counts.items():
            print(f"     {param_type}: {count:,}")
    
    print("\n✅ System information displayed!")

# Display system information
display_system_info()
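
The "Size" figure above is plain arithmetic: parameter count times bytes per parameter. The sketch below isolates that calculation so it can be tried with other precisions. The helper name `estimate_model_size_mb` and the parameter count of 10 million are hypothetical, and the estimate covers weights only, not gradients, optimizer state, or activations.

```python
def estimate_model_size_mb(num_params, bytes_per_param=4):
    """Approximate in-memory size of the weights in MiB (float32 by default)."""
    return num_params * bytes_per_param / 1024 / 1024

# Hypothetical parameter count, chosen only to illustrate the arithmetic
example_params = 10_000_000
print(f"float32: {estimate_model_size_mb(example_params):.1f} MB")
print(f"float16: {estimate_model_size_mb(example_params, bytes_per_param=2):.1f} MB")
```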

## 🚀 Quick Start Guide

### **To run this notebook on GPU cloud services:**

1. **Lambda Labs** (Recommended)
   - Choose RTX A6000 instance
   - Upload this notebook and required files
   - Run cells sequentially

2. **Vast.ai** (Cheapest)
   - Rent RTX 4090 instance
   - Use PyTorch Docker image
   - Upload and run notebook

3. **Google Colab Pro** (Easiest)
   - Upload to Google Drive
   - Open in Colab with GPU runtime
   - Mount Drive and run

### **Expected Costs:**
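- **Lambda Labs**: ~$0.60/hour, so ~$3.60-4.80 for one 6-8 hour RTX A6000 run; budget ~$6-12 to allow for reruns
- **Vast.ai**: ~$0.30/hour, so ~$2.40-3.60 for one 8-12 hour RTX 4090 run; budget ~$3-6 to allow for reruns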
- **Colab Pro**: ~$10/month subscription (GPU time is metered in compute units, not unlimited)

### **Training Time:**
- **RTX A6000**: ~6-8 hours
- **RTX 4090**: ~8-12 hours
- **A100**: ~4-6 hours
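
Per-run cost is the hourly rate multiplied by the training hours. The sketch below combines the rates and durations listed in this guide. The helper name `estimate_run_cost` is illustrative, and the result covers a single uninterrupted run only, so restarts, data upload, and idle time would add to it.

```python
def estimate_run_cost(hourly_rate_usd, hours_low, hours_high):
    """Return the (low, high) cost in USD for one uninterrupted training run."""
    return hourly_rate_usd * hours_low, hourly_rate_usd * hours_high

# Rates and durations are this guide's rough estimates, not quoted prices
scenarios = {
    "Lambda Labs (RTX A6000)": (0.60, 6, 8),
    "Vast.ai (RTX 4090)": (0.30, 8, 12),
}
for name, (rate, low_h, high_h) in scenarios.items():
    low, high = estimate_run_cost(rate, low_h, high_h)
    print(f"{name}: ${low:.2f} to ${high:.2f} per run")
```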

### **Files Needed:**
- `enhanced_transformer_training.ipynb` (this notebook)
- `transformer_enhanced_v2.py` (enhanced model)
- `enhanced_features.py` (feature engineering)
- `crypto_5min_2years.csv` (training data)
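
Before starting a paid GPU session, it can help to confirm that the files listed above were actually uploaded. This is a minimal sketch that assumes all four files sit in the notebook's working directory; the helper name `find_missing_files` is hypothetical.

```python
from pathlib import Path

REQUIRED_FILES = [
    "enhanced_transformer_training.ipynb",
    "transformer_enhanced_v2.py",
    "enhanced_features.py",
    "crypto_5min_2years.csv",
]

def find_missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]

missing = find_missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files are present")
```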

### **To Start Training:**
1. Run all cells above sequentially
2. Uncomment the last line in the "Start Training" cell
3. Execute the training cell
4. Monitor progress and results

### **After Training:**
- Model saved as `enhanced_transformer_best.pth`
- Training plots saved as `training_results.png`
- Use model for inference and trading signals