# 🚀 Dynamic Cloud-Agnostic Model Training

Welcome to the **Dynamic Predictive Maintenance ML System**! This notebook demonstrates a cloud-agnostic training system that detects the platform it runs on, picks hardware-appropriate settings, and keeps checkpoints portable between environments.

## ✨ Key Features

🌐 **Universal Cloud Support**: Automatically detects and optimizes for:
- Google Cloud Platform (GCP) 
- Microsoft Azure
- Amazon Web Services (AWS)
- Google Colab
- Kaggle Notebooks
- Local development environments

🔄 **Seamless Resume**: Training can be stopped on one platform and resumed on another

💾 **Auto-Sync Checkpoints**: Automatic synchronization across cloud storage

🎯 **Dynamic Optimization**: Hardware-aware batch sizes and precision settings

🔧 **Smart Configuration**: Platform-specific optimizations applied automatically

## 🎮 Quick Start Commands

Run these commands to start training immediately:

```bash
# Train TFT model on AI4I dataset with auto-resume
python ../launch_training.py --model tft --dataset ai4i --resume

# Train Hybrid CNN-BiLSTM with custom settings
python ../launch_training.py --model hybrid --dataset ai4i --batch-size 256 --max-epochs 50

# Use custom configuration file
python ../launch_training.py --config my_config.yaml --resume
```
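
The flags used above suggest a conventional command-line interface. As a rough illustration only, the sketch below shows how such flags could be declared with `argparse`. The choices, defaults, and help texts are assumptions made for this example and are not taken from `launch_training.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the quick start commands."""
    parser = argparse.ArgumentParser(description="Launch training (illustrative sketch)")
    parser.add_argument("--model", choices=["tft", "hybrid"], help="Model architecture")
    parser.add_argument("--dataset", default="ai4i", help="Dataset name (default is an assumption)")
    parser.add_argument("--config", help="Path to a YAML configuration file")
    parser.add_argument("--batch-size", type=int, help="Override the auto-selected batch size")
    parser.add_argument("--max-epochs", type=int, help="Override the configured epoch limit")
    parser.add_argument("--resume", action="store_true", help="Resume from the latest checkpoint")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["--model", "hybrid", "--dataset", "ai4i", "--batch-size", "256", "--max-epochs", "50"]
)
print(args.model, args.batch_size, args.max_epochs, args.resume)
```

Note that `argparse` exposes `--batch-size` as `args.batch_size`, and a `store_true` flag such as `--resume` defaults to `False` when omitted.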

In [None]:
# Let's start by detecting our current platform and checking system capabilities
import sys
sys.path.append('../')

from src.utils.cloud_platform import get_platform_info, get_optimal_config
from src.utils.checkpoint_manager import DynamicCheckpointManager
from src.utils.validators import validate_config_file

# Detect current platform
platform_info = get_platform_info()
optimal_config = get_optimal_config()

print("🌐 PLATFORM DETECTION")
print("=" * 50)
print(f"Platform: {platform_info.platform.value.upper()}")
print(f"Instance Type: {platform_info.instance_type or 'Unknown'}")

# Report the first GPU only when CUDA is usable and at least one device is listed
gpu_info = platform_info.gpu_info or {}
gpu_devices = gpu_info.get('devices', []) if gpu_info.get('torch_cuda_available') else []
if gpu_devices:
    props = gpu_devices[0]['properties']
    print(f"GPU: {gpu_devices[0]['name']}")
    print(f"VRAM: {props['total_memory'] / 1024**3:.1f} GB")
    print(f"Compute Capability: {props['major']}.{props['minor']}")
else:
    print("GPU: Not available")

print("\n💡 OPTIMAL CONFIGURATION")
print("=" * 50)
print(f"Recommended Batch Size: {optimal_config['batch_size']}")
print(f"Recommended Workers: {optimal_config['num_workers']}")
print(f"Recommended Precision: {optimal_config['precision']}")
print(f"Storage Path: {optimal_config['storage_path']}")
print(f"Checkpoint Sync: {optimal_config['checkpoint_sync']}")
print(f"W&B Mode: {optimal_config['wandb_mode']}")
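
For hosted notebooks, platform detection can often be done by checking environment variables. The sketch below is a minimal illustration of that idea and is not the implementation behind `get_platform_info`. The variable names (`KAGGLE_KERNEL_RUN_TYPE`, `COLAB_RELEASE_TAG`, `COLAB_GPU`) are ones commonly observed on those services and should be treated as assumptions. Detecting GCP, Azure, or AWS virtual machines usually requires querying the provider's metadata service, which is left out here.

```python
import os
from typing import Mapping, Optional


def detect_notebook_platform(env: Optional[Mapping[str, str]] = None) -> str:
    """Guess the hosting platform from environment variables (heuristic sketch)."""
    env = os.environ if env is None else env
    if "KAGGLE_KERNEL_RUN_TYPE" in env:
        return "kaggle"
    if "COLAB_RELEASE_TAG" in env or "COLAB_GPU" in env:
        return "colab"
    # Cloud VMs would need a metadata-service lookup; fall back to local
    return "local"


print(detect_notebook_platform())
```

Accepting the environment as a parameter keeps the function easy to exercise with a plain dictionary, without changing the real process environment.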

In [None]:
# Let's demonstrate the dynamic checkpoint manager
print("🔄 CHECKPOINT MANAGEMENT DEMO")
print("=" * 50)

# Initialize checkpoint manager for this project
checkpoint_manager = DynamicCheckpointManager(
    project_name='predictive_maintenance_demo',
    storage_config={
        'type': 'auto',  # Auto-detect based on platform
        'auto_sync': True
    },
    auto_sync=True
)

# List any existing checkpoints
existing_checkpoints = checkpoint_manager.list_checkpoints()
print(f"Existing checkpoints: {len(existing_checkpoints)}")

for checkpoint in existing_checkpoints[:3]:  # Show first 3
    print(f"  📁 {checkpoint.checkpoint_id}")
    print(f"     Epoch: {checkpoint.epoch} | Platform: {checkpoint.platform}")
    print(f"     Metrics: {checkpoint.metrics}")

if not existing_checkpoints:
    print("  No existing checkpoints found - this is a fresh start!")

print(f"\n🎯 READY FOR DYNAMIC TRAINING!")
print("=" * 50)
print("Your system is configured for:")
print("✅ Multi-platform training")
print("✅ Automatic checkpoint synchronization")
print("✅ Hardware-optimized settings")
print("✅ Seamless resume capabilities")

# Cleanup
checkpoint_manager.stop_background_sync()
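
The listing above reads `checkpoint_id`, `epoch`, `platform`, and `metrics` from each checkpoint. As a hedged sketch of how such metadata could travel between platforms, the example below stores records in a JSON manifest and picks the most advanced one to resume from. `CheckpointRecord` and `latest_checkpoint` are hypothetical names, and the record values are made up for illustration; none of this reflects the internals of `DynamicCheckpointManager`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class CheckpointRecord:
    """Hypothetical metadata record with the fields printed in the cell above."""
    checkpoint_id: str
    epoch: int
    platform: str
    metrics: Dict[str, float] = field(default_factory=dict)


def latest_checkpoint(records: List[CheckpointRecord]) -> Optional[CheckpointRecord]:
    """Return the record with the highest epoch, or None if there are no records."""
    return max(records, key=lambda record: record.epoch, default=None)


# Illustrative values only
records = [
    CheckpointRecord("demo_epoch-5", 5, "local", {"val_loss": 0.42}),
    CheckpointRecord("demo_epoch-8", 8, "gcp", {"val_loss": 0.31}),
]

# A JSON manifest can be written on one platform and read back on another
manifest = json.dumps([asdict(record) for record in records])
restored = [CheckpointRecord(**item) for item in json.loads(manifest)]

resume_from = latest_checkpoint(restored)
print(resume_from.checkpoint_id, resume_from.platform)
```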

# Model Experiments: TFT vs Hybrid CNN-BiLSTM

This notebook provides comprehensive model comparison and hyperparameter tuning for:
- **Temporal Fusion Transformer (TFT)**: Interpretable attention-based forecasting
- **Hybrid CNN-BiLSTM**: Multi-scale pattern detection with temporal modeling

## Objectives
1. Compare model architectures on predictive maintenance tasks
2. Optimize hyperparameters for an A100 GPU
3. Analyze performance and interpretability
4. Recommend a production deployment setup

In [None]:
# Import necessary libraries
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# PyTorch and Lightning
import torch
import torch.nn as nn
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping
from pytorch_lightning.loggers import WandbLogger

# Hyperparameter optimization
import optuna
from optuna.integration import PyTorchLightningPruningCallback

# Metrics and utilities
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Add project root to path
sys.path.append('../')

print("Libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")

## 1. Setup and Configuration

In [None]:
# Import project modules
from src.data.data_loader import PredictiveMaintenanceDataModule
from src.models.tft_model import TemporalFusionTransformer
from src.models.hybrid_model import HybridCNNBiLSTM
from src.training.train import A100OptimizedTrainer
from src.training.evaluate import ModelEvaluator
from src.utils.helpers import ConfigManager, ExperimentTracker

# Load configurations
tft_config = ConfigManager.load_config('../config/tft_config.yaml')
hybrid_config = ConfigManager.load_config('../config/hybrid_config.yaml')
data_config = ConfigManager.load_config('../config/data_config.yaml')
training_config = ConfigManager.load_config('../config/training_config.yaml')

print("✓ Project modules and configurations loaded successfully!")

In [None]:
# Experiment configuration
EXPERIMENT_CONFIG = {
    'dataset': 'ai4i',  # Change to test different datasets
    'max_epochs': 50,
    'patience': 10,
    'n_trials': 20,  # Optuna hyperparameter optimization trials
    'random_seed': 42,
    'test_size': 0.2,
    'val_size': 0.2,
    'batch_size': 256,  # Optimized for A100
    'num_workers': 8
}

# Set random seeds for reproducibility
pl.seed_everything(EXPERIMENT_CONFIG['random_seed'])

print(f"Experiment configuration:")
for key, value in EXPERIMENT_CONFIG.items():
    print(f"  {key}: {value}")

## 2. Data Preparation

In [None]:
# Initialize data module
data_module = PredictiveMaintenanceDataModule(
    dataset_name=EXPERIMENT_CONFIG['dataset'],
    config=data_config,
    batch_size=EXPERIMENT_CONFIG['batch_size'],
    num_workers=EXPERIMENT_CONFIG['num_workers']
)

# Setup data
data_module.setup()

# Get data info
train_dataloader = data_module.train_dataloader()
val_dataloader = data_module.val_dataloader()
test_dataloader = data_module.test_dataloader()

# Get a sample batch to understand data dimensions
sample_batch = next(iter(train_dataloader))
features, targets = sample_batch

print(f"Data prepared successfully!")
print(f"Feature shape: {features.shape}")
print(f"Target shape: {targets.shape}")
print(f"Training batches: {len(train_dataloader)}")
print(f"Validation batches: {len(val_dataloader)}")
print(f"Test batches: {len(test_dataloader)}")

# Store data dimensions
input_dim = features.shape[-1]
sequence_length = features.shape[1] if len(features.shape) > 2 else 1
num_classes = 2  # Binary classification

print(f"\nData dimensions:")
print(f"  Input dimension: {input_dim}")
print(f"  Sequence length: {sequence_length}")
print(f"  Number of classes: {num_classes}")

## 3. Model Architecture Comparison

In [None]:
import copy

def create_tft_model(trial=None):
    """Create TFT model with optional hyperparameter optimization."""
    # Deep copy so per-trial overrides never leak into the shared base config
    config = copy.deepcopy(tft_config)
    
    if trial is not None:
        # Hyperparameter optimization
        config['model']['hidden_dim'] = trial.suggest_categorical('tft_hidden_dim', [64, 128, 256, 512])
        # 12 heads is left out: it divides none of the hidden_dim choices above, which
        # standard multi-head attention requires (assuming the model uses it)
        config['model']['num_heads'] = trial.suggest_categorical('tft_num_heads', [4, 8, 16])
        config['model']['num_layers'] = trial.suggest_int('tft_num_layers', 2, 6)
        config['model']['dropout'] = trial.suggest_float('tft_dropout', 0.1, 0.5)
        config['training']['learning_rate'] = trial.suggest_float('tft_lr', 1e-5, 1e-2, log=True)
    
    # Update input dimension
    config['model']['input_dim'] = input_dim
    config['model']['num_classes'] = num_classes
    
    return TemporalFusionTransformer(config)

def create_hybrid_model(trial=None):
    """Create Hybrid CNN-BiLSTM model with optional hyperparameter optimization."""
    # Deep copy so per-trial overrides never leak into the shared base config
    config = copy.deepcopy(hybrid_config)
    
    if trial is not None:
        # Hyperparameter optimization
        config['model']['cnn_channels'] = trial.suggest_categorical('hybrid_cnn_channels', [32, 64, 128, 256])
        config['model']['lstm_hidden_dim'] = trial.suggest_categorical('hybrid_lstm_dim', [64, 128, 256, 512])
        config['model']['lstm_num_layers'] = trial.suggest_int('hybrid_lstm_layers', 1, 4)
        config['model']['dropout'] = trial.suggest_float('hybrid_dropout', 0.1, 0.5)
        config['training']['learning_rate'] = trial.suggest_float('hybrid_lr', 1e-5, 1e-2, log=True)
    
    # Update input dimension
    config['model']['input_dim'] = input_dim
    config['model']['num_classes'] = num_classes
    
    return HybridCNNBiLSTM(config)

# Create baseline models
tft_model = create_tft_model()
hybrid_model = create_hybrid_model()

print("Models created successfully!")
print(f"TFT parameters: {sum(p.numel() for p in tft_model.parameters()):,}")
print(f"Hybrid parameters: {sum(p.numel() for p in hybrid_model.parameters()):,}")
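
Because the model factories write into nested keys such as `config['model'][...]`, it matters how the base configuration is copied. The short sketch below uses a made-up two-level dictionary to show the difference: `dict.copy()` duplicates only the top level, so nested edits reach the original, while `copy.deepcopy` isolates them.

```python
import copy

# Stand-in for a loaded YAML config (values are illustrative)
base_config = {"model": {"hidden_dim": 128, "dropout": 0.1}}

# Shallow copy: the inner "model" dict is shared with the original
shallow = base_config.copy()
shallow["model"]["hidden_dim"] = 512
after_shallow = base_config["model"]["hidden_dim"]

# Reset, then repeat with a deep copy: the original stays untouched
base_config["model"]["hidden_dim"] = 128
deep = copy.deepcopy(base_config)
deep["model"]["hidden_dim"] = 512
after_deep = base_config["model"]["hidden_dim"]

print(after_shallow, after_deep)
```

In a hyperparameter search this matters because values suggested for one trial would otherwise persist in the shared configuration and affect every later model built from it.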

## 4. Model Training Function

In [None]:
def train_model(model, model_name, max_epochs=None, trial=None):
    """Train a model with proper callbacks and logging."""
    max_epochs = max_epochs or EXPERIMENT_CONFIG['max_epochs']
    
    # Callbacks
    callbacks = [
        ModelCheckpoint(
            dirpath=f'../models/checkpoints/{model_name}',
            filename='{epoch:02d}-{val_loss:.3f}',
            monitor='val_loss',
            mode='min',
            save_top_k=3
        ),
        EarlyStopping(
            monitor='val_loss',
            patience=EXPERIMENT_CONFIG['patience'],
            mode='min',
            verbose=True
        )
    ]
    
    # Add Optuna pruning callback if using hyperparameter optimization
    if trial is not None:
        callbacks.append(PyTorchLightningPruningCallback(trial, monitor='val_loss'))
    
    # Logger
    logger = None
    try:
        logger = WandbLogger(
            project='predictive-maintenance',
            name=f'{model_name}_{EXPERIMENT_CONFIG["dataset"]}',
            log_model=True
        )
    except Exception:
        print("W&B not available, using default logger")
    
    # Trainer
    trainer = pl.Trainer(
        max_epochs=max_epochs,
        callbacks=callbacks,
        logger=logger,
        accelerator='gpu' if torch.cuda.is_available() else 'cpu',
        devices=1,
        precision='16-mixed' if torch.cuda.is_available() else '32-true',
        gradient_clip_val=1.0,
        deterministic=True,
        enable_checkpointing=True,
        enable_progress_bar=True
    )
    
    # Train
    trainer.fit(model, data_module)
    
    # Test
    test_results = trainer.test(model, data_module)
    
    # Return the trainer plus the first test-metrics dict (empty if testing produced nothing)
    return trainer, (test_results[0] if test_results else {})

print("Training function defined successfully!")
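
The `EarlyStopping` callback above halts training once `val_loss` has failed to improve for `patience` consecutive validation checks. The plain-Python sketch below illustrates that rule for `mode='min'` under a simplifying assumption of no minimum-delta threshold. `early_stop_epoch` is a hypothetical helper written for this illustration, not part of Lightning, and the loss values are made up.

```python
from typing import Iterable, Optional


def early_stop_epoch(val_losses: Iterable[float], patience: int) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # Strict improvement resets the counter
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64, 0.66, 0.65, 0.68]
stop_at = early_stop_epoch(losses, patience=3)
print(stop_at)
```

With these values the improvement at epoch 5 resets the counter, so a patience of 3 runs until epoch 8, whereas a patience of 2 would already stop at epoch 4.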

## 5. Baseline Model Training

In [None]:
# Train TFT model
print("🚀 Training Temporal Fusion Transformer...")
tft_trainer, tft_results = train_model(tft_model, 'tft_baseline')

print(f"\n✓ TFT Training Complete!")
print(f"Test Results: {tft_results}")

In [None]:
# Train Hybrid model
print("🚀 Training Hybrid CNN-BiLSTM...")
hybrid_trainer, hybrid_results = train_model(hybrid_model, 'hybrid_baseline')

print(f"\n✓ Hybrid Training Complete!")
print(f"Test Results: {hybrid_results}")

## 6. Hyperparameter Optimization

In [None]:
def objective_tft(trial):
    """Objective function for TFT hyperparameter optimization."""
    model = create_tft_model(trial)
    trainer, test_results = train_model(
        model, 
        f'tft_trial_{trial.number}', 
        max_epochs=20,  # Reduced epochs for optimization
        trial=trial
    )
    
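    # Return the best validation loss recorded by ModelCheckpoint during fit().
    # trainer.callback_metrics is not reliable here because train_model() runs
    # trainer.test() afterwards, which can replace the logged validation metrics.
    best_score = trainer.checkpoint_callback.best_model_score
    return float(best_score) if best_score is not None else float('inf')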

def objective_hybrid(trial):
    """Objective function for Hybrid model hyperparameter optimization."""
    model = create_hybrid_model(trial)
    trainer, test_results = train_model(
        model, 
        f'hybrid_trial_{trial.number}', 
        max_epochs=20,  # Reduced epochs for optimization
        trial=trial
    )
    
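    # Return the best validation loss recorded by ModelCheckpoint during fit().
    # trainer.callback_metrics is not reliable here because train_model() runs
    # trainer.test() afterwards, which can replace the logged validation metrics.
    best_score = trainer.checkpoint_callback.best_model_score
    return float(best_score) if best_score is not None else float('inf')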

print("Hyperparameter optimization functions defined!")

In [None]:
# Optimize TFT hyperparameters
print("🔧 Optimizing TFT hyperparameters...")

tft_study = optuna.create_study(
    direction='minimize',
    study_name='tft_optimization',
    pruner=optuna.pruners.MedianPruner()
)

tft_study.optimize(objective_tft, n_trials=EXPERIMENT_CONFIG['n_trials'])

print(f"\n✓ TFT Optimization Complete!")
print(f"Best TFT Parameters: {tft_study.best_params}")
print(f"Best TFT Value: {tft_study.best_value:.4f}")

In [None]:
# Optimize Hybrid hyperparameters
print("🔧 Optimizing Hybrid CNN-BiLSTM hyperparameters...")

hybrid_study = optuna.create_study(
    direction='minimize',
    study_name='hybrid_optimization',
    pruner=optuna.pruners.MedianPruner()
)

hybrid_study.optimize(objective_hybrid, n_trials=EXPERIMENT_CONFIG['n_trials'])

print(f"\n✓ Hybrid Optimization Complete!")
print(f"Best Hybrid Parameters: {hybrid_study.best_params}")
print(f"Best Hybrid Value: {hybrid_study.best_value:.4f}")
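
Both studies use `optuna.pruners.MedianPruner`, which stops a trial early when its best intermediate value is worse than the median of earlier trials at the same step. The sketch below is a simplified plain-Python rendering of that rule for a minimization study. `should_prune` is a hypothetical helper, the loss histories are made up, and the startup threshold of 5 trials is believed to match Optuna's default but should be checked against its documentation. Warm-up steps and other pruner options are ignored.

```python
from statistics import median
from typing import List


def should_prune(current_losses: List[float],
                 completed_histories: List[List[float]],
                 n_startup_trials: int = 5) -> bool:
    """Simplified median rule: compare the running trial with finished ones at the same step."""
    if len(completed_histories) < n_startup_trials:
        return False  # not enough finished trials to form a reliable median
    step = len(current_losses) - 1
    peers = [history[step] for history in completed_histories if len(history) > step]
    if not peers:
        return False
    # Minimization: prune when even the best value so far is above the median
    return min(current_losses) > median(peers)


# Made-up validation-loss histories of five finished trials (one value per epoch)
finished = [[0.9, 0.6], [0.8, 0.5], [0.7, 0.4], [1.0, 0.7], [0.85, 0.55]]

print(should_prune([0.95, 0.80], finished))
print(should_prune([0.95, 0.50], finished))
```

At the second epoch the median of the finished trials is 0.55, so the first running trial (best value 0.80) would be pruned and the second (best value 0.50) would continue.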

## 7. Final Model Training with Optimized Parameters

In [None]:
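# Replay the best hyperparameters through the same model factories
class OptimalTrial:
    """Minimal stand-in for an Optuna trial that returns fixed parameter values.

    Any parameter missing from `params` falls back to a simple default
    (first choice, or the midpoint of the range).
    """

    def __init__(self, params):
        self.params = params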
    
    def suggest_categorical(self, name, choices):
        return self.params.get(name, choices[0])
    
    def suggest_int(self, name, low, high):
        return self.params.get(name, (low + high) // 2)
    
    def suggest_float(self, name, low, high, log=False):
        return self.params.get(name, (low + high) / 2)

# Create optimized models
optimal_tft_trial = OptimalTrial(tft_study.best_params)
optimal_hybrid_trial = OptimalTrial(hybrid_study.best_params)

optimal_tft_model = create_tft_model(optimal_tft_trial)
optimal_hybrid_model = create_hybrid_model(optimal_hybrid_trial)

print("Optimized models created successfully!")

In [None]:
# Train optimized TFT
print("🚀 Training Optimized TFT...")
optimal_tft_trainer, optimal_tft_results = train_model(optimal_tft_model, 'tft_optimized')

print(f"\n✓ Optimized TFT Training Complete!")
print(f"Test Results: {optimal_tft_results}")

In [None]:
# Train optimized Hybrid
print("🚀 Training Optimized Hybrid CNN-BiLSTM...")
optimal_hybrid_trainer, optimal_hybrid_results = train_model(optimal_hybrid_model, 'hybrid_optimized')

print(f"\n✓ Optimized Hybrid Training Complete!")
print(f"Test Results: {optimal_hybrid_results}")

## 8. Model Comparison and Analysis

In [None]:
# Compile results
results_comparison = {
    'TFT Baseline': tft_results,
    'Hybrid Baseline': hybrid_results,
    'TFT Optimized': optimal_tft_results,
    'Hybrid Optimized': optimal_hybrid_results
}

# Create comparison DataFrame
metrics_df = pd.DataFrame(results_comparison).T

print("📊 Model Performance Comparison:")
print("=" * 80)
print(metrics_df)

# Visualize comparison
if len(metrics_df) > 0 and len(metrics_df.columns) > 0:
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Test accuracy comparison
    if 'test_accuracy' in metrics_df.columns:
        metrics_df['test_accuracy'].plot(kind='bar', ax=axes[0], color='skyblue', alpha=0.7)
        axes[0].set_title('Test Accuracy Comparison')
        axes[0].set_ylabel('Accuracy')
        axes[0].tick_params(axis='x', rotation=45)
        axes[0].grid(True, alpha=0.3)
    
    # Test loss comparison
    if 'test_loss' in metrics_df.columns:
        metrics_df['test_loss'].plot(kind='bar', ax=axes[1], color='lightcoral', alpha=0.7)
        axes[1].set_title('Test Loss Comparison')
        axes[1].set_ylabel('Loss')
        axes[1].tick_params(axis='x', rotation=45)
        axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

In [None]:
# Hyperparameter optimization visualization
def plot_optimization_history(study, title):
    """Plot optimization history."""
    fig, axes = plt.subplots(1, 2, figsize=(15, 6))
    
    # Optimization history
    trials_df = study.trials_dataframe()
    if len(trials_df) > 0:
        axes[0].plot(trials_df['number'], trials_df['value'], 'b-o', alpha=0.7)
        axes[0].axhline(y=study.best_value, color='r', linestyle='--', 
                       label=f'Best: {study.best_value:.4f}')
        axes[0].set_title(f'{title} - Optimization History')
        axes[0].set_xlabel('Trial Number')
        axes[0].set_ylabel('Validation Loss')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        
        # Parameter importance
        try:
            importance = optuna.importance.get_param_importances(study)
            if importance:
                params = list(importance.keys())
                values = list(importance.values())
                
                axes[1].barh(params, values, alpha=0.7, color='green')
                axes[1].set_title(f'{title} - Parameter Importance')
                axes[1].set_xlabel('Importance')
                axes[1].grid(True, alpha=0.3, axis='x')
        except Exception as e:
            axes[1].text(0.5, 0.5, f'Parameter importance\nnot available:\n{str(e)}', 
                        ha='center', va='center', transform=axes[1].transAxes)
    
    plt.tight_layout()
    plt.show()

# Plot optimization histories
plot_optimization_history(tft_study, 'TFT')
plot_optimization_history(hybrid_study, 'Hybrid CNN-BiLSTM')

## 9. Model Interpretability Analysis

In [None]:
def analyze_model_predictions(model, trainer, model_name):
    """Analyze model predictions and interpretability."""
    print(f"\n🔍 Analyzing {model_name} predictions...")
    
    # Get predictions
    predictions = trainer.predict(model, data_module.test_dataloader())
    
    if predictions and len(predictions) > 0:
        # Concatenate all predictions
        all_preds = torch.cat([pred['predictions'] for pred in predictions])
        all_targets = torch.cat([pred['targets'] for pred in predictions])
        
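        # Convert logits to probabilities and class labels; cast to float32 first,
        # since predictions produced under mixed precision may be half precision
        logits = all_preds.float()
        pred_probs = torch.softmax(logits, dim=1).cpu().numpy()
        pred_classes = logits.argmax(dim=1).cpu().numpy()
        true_classes = all_targets.cpu().numpy()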
        
        # Classification report
        print(f"\nClassification Report for {model_name}:")
        print(classification_report(true_classes, pred_classes))
        
        # Confusion matrix
        cm = confusion_matrix(true_classes, pred_classes)
        
        plt.figure(figsize=(8, 6))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                   xticklabels=['Normal', 'Failure'],
                   yticklabels=['Normal', 'Failure'])
        plt.title(f'{model_name} - Confusion Matrix')
        plt.ylabel('True Label')
        plt.xlabel('Predicted Label')
        plt.show()
        
        # Prediction confidence distribution
        plt.figure(figsize=(12, 5))
        
        plt.subplot(1, 2, 1)
        plt.hist(pred_probs[:, 1], bins=50, alpha=0.7, color='skyblue')
        plt.title(f'{model_name} - Failure Probability Distribution')
        plt.xlabel('Failure Probability')
        plt.ylabel('Frequency')
        plt.grid(True, alpha=0.3)
        
        plt.subplot(1, 2, 2)
        # Separate by true class
        normal_probs = pred_probs[true_classes == 0, 1]
        failure_probs = pred_probs[true_classes == 1, 1]
        
        plt.hist(normal_probs, bins=30, alpha=0.7, label='True Normal', color='blue')
        plt.hist(failure_probs, bins=30, alpha=0.7, label='True Failure', color='red')
        plt.title(f'{model_name} - Prediction Confidence by True Class')
        plt.xlabel('Failure Probability')
        plt.ylabel('Frequency')
        plt.legend()
        plt.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        return {
            'predictions': pred_classes,
            'probabilities': pred_probs,
            'targets': true_classes,
            'confusion_matrix': cm
        }
    
    return None

# Analyze both optimized models
tft_analysis = analyze_model_predictions(optimal_tft_model, optimal_tft_trainer, 'Optimized TFT')
hybrid_analysis = analyze_model_predictions(optimal_hybrid_model, optimal_hybrid_trainer, 'Optimized Hybrid')
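
Failure events are usually rare in predictive maintenance data, so accuracy alone can look strong even when many failures are missed. The sketch below derives precision, recall, and F1 for the failure class from a 2x2 confusion matrix in the row-is-true, column-is-predicted layout that `sklearn.metrics.confusion_matrix` produces. `binary_metrics` is a hypothetical helper and the counts are made up for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(cm: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Compute failure-class metrics from [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts: 960 normal samples and 40 failures
example_cm = [[950, 10], [20, 20]]
metrics = binary_metrics(example_cm)
print({name: round(value, 3) for name, value in metrics.items()})
```

In this example accuracy is 0.97 while recall on failures is only 0.5, which is why the classification report and confusion matrix above deserve more attention than the accuracy bar chart.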

## 10. Model Deployment Recommendations

In [None]:
# Save optimized models
def save_model_for_deployment(model, model_name, config):
    """Save model in multiple formats for deployment."""
    save_dir = Path(f'../models/deployment/{model_name}')
    save_dir.mkdir(parents=True, exist_ok=True)
    
    # Save PyTorch model
    torch.save(model.state_dict(), save_dir / 'model_weights.pth')
    
    # Save model architecture info
    model_info = {
        'model_type': model.__class__.__name__,
        'config': config,
        'input_dim': input_dim,
        'num_classes': num_classes,
        'parameter_count': sum(p.numel() for p in model.parameters())
    }
    
    import json
    with open(save_dir / 'model_info.json', 'w') as f:
        json.dump(model_info, f, indent=2, default=str)
    
    print(f"✓ {model_name} saved to {save_dir}")
    
    # Try to export to ONNX for deployment
    try:
        model.eval()
        dummy_input = torch.randn(1, sequence_length, input_dim)
        if torch.cuda.is_available():
            model = model.cuda()
            dummy_input = dummy_input.cuda()
        
        torch.onnx.export(
            model,
            dummy_input,
            # Pass a plain string path; older PyTorch releases may not accept Path objects
            str(save_dir / 'model.onnx'),
            export_params=True,
            opset_version=11,
            do_constant_folding=True,
            input_names=['input'],
            output_names=['output'],
            dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
        )
        print(f"✓ {model_name} ONNX model saved")
    except Exception as e:
        print(f"⚠️ ONNX export failed for {model_name}: {e}")

# Save both optimized models
save_model_for_deployment(optimal_tft_model, 'tft_optimized', tft_study.best_params)
save_model_for_deployment(optimal_hybrid_model, 'hybrid_optimized', hybrid_study.best_params)
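
On the serving side, the saved `model_info.json` can be checked before any weights are loaded. The sketch below is one possible way to do that, under the assumption that the file contains exactly the keys written by `save_model_for_deployment` above. `load_model_info` is a hypothetical helper, and the demo writes placeholder values to a temporary directory rather than reading a real export.

```python
import json
import tempfile
from pathlib import Path

# Keys written by save_model_for_deployment in the cell above
REQUIRED_KEYS = {"model_type", "config", "input_dim", "num_classes", "parameter_count"}


def load_model_info(path):
    """Read model_info.json and fail early if an expected key is missing."""
    info = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - set(info)
    if missing:
        raise ValueError(f"model_info.json is missing keys: {sorted(missing)}")
    return info


# Round-trip demo with placeholder values
with tempfile.TemporaryDirectory() as tmp:
    info_path = Path(tmp) / "model_info.json"
    info_path.write_text(json.dumps({
        "model_type": "HybridCNNBiLSTM",
        "config": {"dropout": 0.2},
        "input_dim": 5,
        "num_classes": 2,
        "parameter_count": 12345,
    }))
    loaded = load_model_info(info_path)

print(loaded["model_type"], loaded["input_dim"])
```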

## 11. Experiment Summary and Recommendations

In [None]:
def generate_experiment_summary():
    """Generate comprehensive experiment summary."""
    print("\n" + "="*100)
    print("MODEL EXPERIMENT SUMMARY - PREDICTIVE MAINTENANCE")
    print("="*100)
    
    print(f"\n🔬 Experiment Configuration:")
    print(f"   • Dataset: {EXPERIMENT_CONFIG['dataset'].upper()}")
    print(f"   • Input Dimension: {input_dim}")
    print(f"   • Sequence Length: {sequence_length}")
    print(f"   • Batch Size: {EXPERIMENT_CONFIG['batch_size']} (A100 optimized)")
    print(f"   • Hyperparameter Trials: {EXPERIMENT_CONFIG['n_trials']}")
    
    print(f"\n🏆 Best Model Performance:")
    
    # Find best performing model
    best_accuracy = 0
    best_model = "None"
    
    for model_name, results in results_comparison.items():
        if results and 'test_accuracy' in results:
            if results['test_accuracy'] > best_accuracy:
                best_accuracy = results['test_accuracy']
                best_model = model_name
    
    print(f"   • Winner: {best_model}")
    print(f"   • Best Accuracy: {best_accuracy:.4f}")
    
    print(f"\n📊 Model Comparison:")
    for model_name, results in results_comparison.items():
        if results:
            acc = results.get('test_accuracy', 'N/A')
            loss = results.get('test_loss', 'N/A')
            print(f"   • {model_name}:")
            print(f"     - Accuracy: {acc}")
            print(f"     - Loss: {loss}")
    
    print(f"\n🔧 Optimal Hyperparameters:")
    print(f"   • TFT Best Params: {tft_study.best_params}")
    print(f"   • Hybrid Best Params: {hybrid_study.best_params}")
    
    print(f"\n🚀 Deployment Recommendations:")
    
    recommendations = [
        "1. Model Selection:",
        f"   - Primary: {best_model} (highest accuracy)",
        "   - Fallback: Both models for ensemble prediction",
        "",
        "2. A100 GPU Optimizations:",
        "   - Use mixed precision (16-bit) training",
        f"   - Optimal batch size: {EXPERIMENT_CONFIG['batch_size']}",
        "   - Enable gradient checkpointing for memory efficiency",
        "   - Use torch.compile() for inference acceleration",
        "",
        "3. Production Deployment:",
        "   - Export to ONNX for cross-platform inference",
        "   - Use TensorRT for maximum A100 performance",
        "   - Implement model serving with FastAPI",
        "   - Set up monitoring with W&B or MLflow",
        "",
        "4. Model Maintenance:",
        "   - Retrain monthly with new failure data",
        "   - Monitor for data drift and model degradation",
        "   - A/B test model updates before deployment",
        "   - Maintain model versioning and rollback capability"
    ]
    
    for rec in recommendations:
        print(f"   {rec}")
    
    print(f"\n🎯 Next Steps:")
    next_steps = [
        "• Deploy best model to production API",
        "• Set up automated retraining pipeline",
        "• Create monitoring dashboard",
        "• Implement uncertainty quantification",
        "• Test ensemble methods for improved robustness",
        "• Integrate with maintenance scheduling system"
    ]
    
    for step in next_steps:
        print(f"   {step}")
    
    print("\n" + "="*100)

# Generate summary
generate_experiment_summary()

In [None]:
# Save experiment results
experiment_results = {
    'config': EXPERIMENT_CONFIG,
    'data_info': {
        'input_dim': input_dim,
        'sequence_length': sequence_length,
        'num_classes': num_classes,
        'train_batches': len(train_dataloader),
        'val_batches': len(val_dataloader),
        'test_batches': len(test_dataloader)
    },
    'model_results': results_comparison,
    'optimization': {
        'tft_best_params': tft_study.best_params,
        'tft_best_value': tft_study.best_value,
        'hybrid_best_params': hybrid_study.best_params,
        'hybrid_best_value': hybrid_study.best_value
    }
}

# Save results
results_path = Path('../models/experiment_results.json')
results_path.parent.mkdir(parents=True, exist_ok=True)

import json
with open(results_path, 'w') as f:
    json.dump(experiment_results, f, indent=2, default=str)

print(f"\n💾 Experiment results saved to {results_path}")
print("\n🎉 Model experiments completed successfully!")
print("Ready for production deployment on A100 GPU.")

In [None]:
# Launch Dynamic Training
print("🚀 Launching Dynamic Training")
print("="*50)

from launch_training import DynamicTrainingLauncher
import yaml

# Create minimal training config for demo
demo_config = {
    'model': {
        'name': 'tft',
        'input_dim': 5,
        'hidden_dim': 32,
        'num_heads': 4,
        'num_layers': 2,
        'dropout': 0.1,
        'num_classes': 2
    },
    'data': {
        'dataset': 'synthetic',
        'batch_size': 16,
        'sequence_length': 50,
        'train_split': 0.8
    },
    'training': {
        'max_epochs': 2,  # Short demo training
        'learning_rate': 0.001,
        'patience': 1,
        'save_top_k': 1
    },
    'paths': {
        'data_dir': '../data',
        'model_dir': '../models',
        'log_dir': '../logs'
    }
}

# Save demo config
with open('../config/demo_config.yaml', 'w') as f:
    yaml.dump(demo_config, f)

print("✅ Demo configuration created")

# Initialize dynamic launcher
launcher = DynamicTrainingLauncher(
    config_path='../config/demo_config.yaml',
    project_name='notebook_demo',
    storage_config={
        'type': 'local',
        'base_path': '../checkpoints'
    }
)

print("\n📋 Training Configuration:")
print(f"Platform: {launcher.platform_detector.platform_info.platform.value}")
print(f"GPU Available: {launcher.platform_detector.platform_info.is_gpu_available}")
print(f"Optimal Batch Size: {launcher.platform_detector.get_optimal_config()['batch_size']}")
print(f"Precision: {launcher.platform_detector.get_optimal_config()['precision']}")

print("\n🎯 Ready for dynamic training across any platform!")

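## 🌟 Cross-Platform Resume Capability

The cell below walks through a resume scenario across three platforms. Its output is simulated with print statements for illustration; no training is run and no checkpoints are moved.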

In [None]:
# Demo: Resume Training Scenario (simulated output only)
print("📦 Cross-Platform Resume Demo")
print("="*40)

# Simulate training on Platform 1 (e.g., Local)
print("1️⃣ Training started on LOCAL platform...")
print("   ✅ Checkpoint saved: epoch=5, step=1000")
print("   📤 Checkpoint synced to cloud storage")

# Simulate resuming on Platform 2 (e.g., GCP)
print("\n2️⃣ Moving to GCP platform...")
print("   🔍 Detecting new platform: GCP")
print("   📥 Downloading latest checkpoint")
print("   🔄 Resuming from epoch=5, step=1000")
print("   ⚡ Auto-optimizing for A100 GPU")

# Simulate resuming on Platform 3 (e.g., Azure)
print("\n3️⃣ Moving to AZURE platform...")
print("   🔍 Detecting new platform: Azure")
print("   📥 Syncing checkpoint state")
print("   🔄 Resuming from epoch=8, step=1500")
print("   🚀 Auto-optimizing for V100 GPU")

print("\n✨ Training can seamlessly continue on ANY platform!")
print("🔗 Same model, same progress, different infrastructure")

# Show available checkpoints
print("\n📋 Available Checkpoints:")
print("   • notebook_demo_tft_epoch-5_step-1000")
print("   • notebook_demo_tft_epoch-8_step-1500")
print("   • notebook_demo_tft_latest")

print("\n🏆 ACHIEVEMENT UNLOCKED: Universal ML Training!")
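
The checkpoint names printed above follow a `<prefix>_epoch-<n>_step-<m>` pattern. As a small illustration, the sketch below parses names of that shape and selects the most recent one by step count. The regular expression and the helper `parse_checkpoint_name` are assumptions based only on the three example names; the real naming scheme may differ.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the example names printed above (an assumption)
CHECKPOINT_PATTERN = re.compile(r"^(?P<prefix>.+)_epoch-(?P<epoch>\d+)_step-(?P<step>\d+)$")


def parse_checkpoint_name(name: str) -> Optional[Tuple[str, int, int]]:
    """Return (prefix, epoch, step), or None for names without epoch/step fields."""
    match = CHECKPOINT_PATTERN.match(name)
    if match is None:
        return None
    return match["prefix"], int(match["epoch"]), int(match["step"])


names = [
    "notebook_demo_tft_epoch-5_step-1000",
    "notebook_demo_tft_epoch-8_step-1500",
    "notebook_demo_tft_latest",
]

# Aliases such as "..._latest" carry no step number and are skipped
parsed = [item for item in map(parse_checkpoint_name, names) if item is not None]
newest = max(parsed, key=lambda item: item[2])
print(newest)
```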