# FT-Transformer Training for Bike Sharing Regression

This notebook trains an FT-Transformer model for bike sharing regression using the modular training functions in `ft_transformer_training_functions`.

## Overview

The FT-Transformer (Feature Tokenizer + Transformer) is a Transformer architecture for tabular data that:
- Converts each input feature into an embedding (a "token") with a feature tokenizer
- Uses multi-head self-attention over those tokens to capture feature interactions
- Applies layer normalization and residual connections, as in a standard Transformer block
- Performs competitively with gradient-boosted trees on many tabular regression benchmarks
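To make the list above concrete, here is a minimal sketch of feature tokenization followed by one attention block. It is not the `rtdl` implementation used in this notebook: the class name `NumericalFeatureTokenizer`, the token size of 32, and the use of PyTorch's built-in `nn.TransformerEncoderLayer` are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class NumericalFeatureTokenizer(nn.Module):
    """Hypothetical tokenizer: maps each scalar feature x_j to x_j * W_j + b_j."""

    def __init__(self, n_features: int, d_token: int) -> None:
        super().__init__()
        # One weight vector and one bias vector per feature.
        self.weight = nn.Parameter(torch.randn(n_features, d_token) * 0.02)
        self.bias = nn.Parameter(torch.zeros(n_features, d_token))
        # Learnable [CLS] token whose final state feeds the regression head.
        self.cls = nn.Parameter(torch.zeros(1, 1, d_token))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features) -> tokens: (batch, n_features, d_token)
        tokens = x.unsqueeze(-1) * self.weight + self.bias
        cls = self.cls.expand(x.shape[0], -1, -1)
        # Prepending [CLS] is a choice made here; its position is a convention.
        return torch.cat([cls, tokens], dim=1)


torch.manual_seed(0)
tokenizer = NumericalFeatureTokenizer(n_features=13, d_token=32)
# A single pre-norm Transformer block: attention, residuals, layer norm.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=32, nhead=4, dim_feedforward=64, batch_first=True, norm_first=True
)
head = nn.Linear(32, 1)

x = torch.randn(8, 13)                        # 8 rows, 13 scaled features
tokens = tokenizer(x)                         # (8, 14, 32): [CLS] + 13 feature tokens
encoded = encoder_layer(tokens)               # same shape after attention
prediction = head(encoded[:, 0]).squeeze(-1)  # one value per row
```

Because every feature becomes its own token, attention weights are computed between features rather than between time steps or words.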

## Dataset
- **Source**: UCI ML Repository - Bike Sharing Dataset
- **Task**: Regression (predicting bike rental count)
- **Features**: 13 features after preprocessing (weather, time, seasonal factors)
- **Target**: Total bike rental count

## 1. Import Required Libraries and Functions

In [1]:
# Import torch explicitly instead of relying on the star import to expose it
import torch

# Import all training functions
from ft_transformer_training_functions import *

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

print("🚴 FT-Transformer Training for Bike Sharing Regression")
print("Dataset: Bike Sharing Dataset")

✅ rtdl library imported successfully
rtdl version: 0.0.13
✅ Enhanced evaluation imported successfully
Using device: cuda
🚴 FT-Transformer Training for Bike Sharing Regression
Dataset: Bike Sharing Dataset


## 2. Load Preprocessed Data

Load the preprocessed bike sharing data from Section 1.

In [2]:
# Load preprocessed data
(X_train_scaled, X_val_scaled, X_test_scaled, 
 y_train, y_val, y_test, feature_names, data_summary) = load_preprocessed_data('./bike_sharing_preprocessed_data.pkl')

print(f"\n📊 Data Summary:")
print(f"   Training samples: {len(X_train_scaled):,}")
print(f"   Validation samples: {len(X_val_scaled):,}")
print(f"   Test samples: {len(X_test_scaled):,}")
print(f"   Features: {len(feature_names)}")
print(f"   Target range: [{y_train.min():.0f}, {y_train.max():.0f}]")

📊 Loading preprocessed bike sharing data...
✅ Preprocessed data loaded successfully!
Training set: (11122, 13)
Validation set: (2781, 13)
Test set: (3476, 13)
Features: 13
Task: Regression (Bike Count Prediction)
Target range: [1, 976]

Checking for invalid values...
NaN in X_train: False
Inf in X_train: False
NaN in y_train: False
X_train min: -3.2513, max: 5.8007
y_train min: 1, max: 976

📊 Data Summary:
   Training samples: 11,122
   Validation samples: 2,781
   Test samples: 3,476
   Features: 13
   Target range: [1, 976]


## 3. Prepare Data for Training

Convert data to PyTorch tensors and create data loaders.

In [None]:
# Prepare data for training
batch_size = 256

(train_loader, val_loader, test_loader, feature_info,
 X_train_tensor, X_val_tensor, X_test_tensor,
 y_train_tensor, y_val_tensor, y_test_tensor) = prepare_data_for_training(
    X_train_scaled, X_val_scaled, X_test_scaled, 
    y_train, y_val, y_test, feature_names, device, batch_size)

print(f"\n✅ Data preparation completed!")
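The body of `prepare_data_for_training` is not shown in this notebook. As a rough sketch of the tensor conversion and loader creation it describes, assuming NumPy-compatible inputs, the hypothetical helper `make_loader` below wraps arrays in a `TensorDataset`. The random arrays are stand-ins, not the bike sharing data.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(X, y, batch_size: int, shuffle: bool) -> DataLoader:
    """Hypothetical helper: convert arrays to float32 tensors and batch them."""
    X_t = torch.as_tensor(np.asarray(X), dtype=torch.float32)
    y_t = torch.as_tensor(np.asarray(y), dtype=torch.float32)
    return DataLoader(TensorDataset(X_t, y_t), batch_size=batch_size, shuffle=shuffle)


# Stand-in data with the same number of features as the preprocessed set (13).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 13))
y_demo = rng.integers(1, 977, size=1000)

# Shuffle only the training loader; validation and test order should stay fixed.
demo_loader = make_loader(X_demo, y_demo, batch_size=256, shuffle=True)
xb, yb = next(iter(demo_loader))
```

With 1,000 rows and a batch size of 256 the loader yields four batches, the last one smaller than the rest.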

## 4. Create FT-Transformer Model

Create the FT-Transformer model for regression.

In [None]:
# Create FT-Transformer model
model, total_params = create_ft_transformer_model(feature_info, device)

print(f"\n🤖 Model created with {total_params:,} parameters")

## 5. Setup Training Components

Setup loss function, optimizer, and scheduler.

In [None]:
# Setup training components
learning_rate = 1e-4
weight_decay = 1e-5

criterion, optimizer, scheduler, training_config = setup_training(
    model, learning_rate, weight_decay)

print(f"\n🔧 Training setup completed!")
print(f"   Learning rate: {learning_rate}")
print(f"   Weight decay: {weight_decay}")
print(f"   Max epochs: {training_config['n_epochs']}")
print(f"   Early stopping patience: {training_config['patience']}")
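The notebook does not show which loss, optimizer, or scheduler `setup_training` builds. One plausible arrangement is sketched below. Everything in it is an assumption: the function name `setup_training_sketch`, the choice of `MSELoss`, `AdamW`, and `ReduceLROnPlateau`, and the values 100 and 15. Only the `n_epochs` and `patience` keys come from the cell above.

```python
import torch
import torch.nn as nn


def setup_training_sketch(model: nn.Module, learning_rate: float = 1e-4,
                          weight_decay: float = 1e-5):
    """Hypothetical stand-in for setup_training; the real choices may differ."""
    criterion = nn.MSELoss()  # squared error is a common regression loss
    optimizer = torch.optim.AdamW(
        model.parameters(), lr=learning_rate, weight_decay=weight_decay
    )
    # Halve the learning rate when the validation loss stops improving.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.5, patience=5
    )
    config = {"n_epochs": 100, "patience": 15}  # assumed values
    return criterion, optimizer, scheduler, config


demo_model = nn.Linear(13, 1)  # placeholder model, not an FT-Transformer
demo_criterion, demo_optimizer, demo_scheduler, demo_config = setup_training_sketch(demo_model)
```

With a plateau scheduler, the training loop would call `demo_scheduler.step(val_loss)` once per epoch after validation.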

## 6. Train the Model

Train the FT-Transformer model with early stopping.

In [None]:
# Train the model
model, history, best_epoch, training_time = train_ft_transformer(
    model, train_loader, val_loader, criterion, optimizer, scheduler, 
    training_config, device)

print(f"\n🏁 Training completed in {training_time:.2f} seconds")
print(f"   Best epoch: {best_epoch + 1}")
print(f"   Best validation R²: {history['val_r2'][best_epoch]:.4f}")
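Early stopping itself is simple bookkeeping. The sketch below shows the idea with a hypothetical `EarlyStopping` class and made-up validation losses. It is not taken from `train_ft_transformer`, which may track a different metric or restore weights differently.

```python
class EarlyStopping:
    """Hypothetical helper: stop after `patience` epochs without improvement."""

    def __init__(self, patience: int, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # zero-based epoch index
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Illustrative losses only; a real loop would compute these on val_loader.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.59]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

Here training stops at epoch index 5 and the best epoch index is 2. The index is zero-based, which is why the cell above prints `best_epoch + 1`. A real loop would also save the model weights whenever the best loss improves.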

## 7. Evaluate the Model

Evaluate the trained model on the test set.

In [None]:
# Evaluate the model
predictions, metrics = evaluate_model(model, X_test_tensor, y_test_tensor, device)

print(f"\n📊 Test Set Performance:")
print(f"   R² Score: {metrics['r2_score']:.4f}")
print(f"   RMSE: {metrics['rmse']:.4f}")
print(f"   MAE: {metrics['mae']:.4f}")
print(f"   MAPE: {metrics['mape']:.2f}%")
print(f"   Explained Variance: {metrics['explained_variance']:.4f}")
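For reference, the five reported metrics can be computed from their standard definitions as sketched below. The function `regression_metrics` is hypothetical and uses only the standard library. `evaluate_model` may compute them differently, for example through scikit-learn.

```python
import math


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: standard regression metrics from plain lists."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mean_res = sum(residuals) / n
    var_res = sum((r - mean_res) ** 2 for r in residuals) / n
    return {
        "r2_score": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        # MAPE is undefined when a true value is zero.
        "mape": 100 * sum(abs(r) / abs(t) for r, t in zip(residuals, y_true)) / n,
        # Like R², but ignores a constant bias in the predictions.
        "explained_variance": 1 - var_res / (ss_tot / n),
    }


demo_metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
```

Because the target ranges from 1 to 976, MAPE can be dominated by hours with very few rentals, where a small absolute error is a large percentage error. Reading it alongside MAE and RMSE gives a fairer picture.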

## 8. Create Visualizations

Create training and evaluation plots.

In [None]:
# Create training plots
create_training_plots(history, best_epoch, './Section2_Model_Training')

print("\n📈 Training plots created!")

In [None]:
# Create evaluation plots
create_evaluation_plots(y_test, predictions, './Section2_Model_Training')

print("\n📊 Evaluation plots created!")

## 9. Save Results

Save all results, model, and generated files.

In [None]:
# Save results
save_results(model, history, metrics, predictions, y_test, feature_names, 
            training_time, total_params, './Section2_Model_Training')

print("\n💾 All results saved successfully!")

## 10. Alternative: Run Complete Pipeline

The steps above can also be run with a single function call.

In [None]:
# Run complete pipeline (alternative approach)
# Uncomment the following lines to run the complete pipeline in one go:

# model, history, metrics, predictions, feature_names = run_complete_ft_transformer_training(
#     data_path='./bike_sharing_preprocessed_data.pkl',
#     device=device,
#     batch_size=256,
#     learning_rate=1e-4,
#     weight_decay=1e-5,
#     save_dir='./Section2_Model_Training'
# )

print("\n🚀 Complete pipeline function available for one-step execution!")

## 11. Model Analysis and Insights

Analyze the trained model performance and provide insights.

In [None]:
# Model performance analysis
print("\n" + "="*60)
print("FT-TRANSFORMER PERFORMANCE ANALYSIS")
print("="*60)

print(f"\n🎯 Model Performance:")
# Use test_r2 so the name does not shadow any r2_score function exposed by the star import
test_r2 = metrics['r2_score']
if test_r2 > 0.9:
    performance_level = "Excellent"
elif test_r2 > 0.8:
    performance_level = "Good"
elif test_r2 > 0.7:
    performance_level = "Moderate"
else:
    performance_level = "Needs Improvement"

print(f"   Performance Level: {performance_level} (R² = {test_r2:.4f})")
print(f"   RMSE: {metrics['rmse']:.2f} bikes")
print(f"   MAE: {metrics['mae']:.2f} bikes")
print(f"   MAPE: {metrics['mape']:.2f}%")

print(f"\n📊 Model Characteristics:")
print(f"   Total Parameters: {total_params:,}")
print(f"   Training Time: {training_time:.2f} seconds")
print(f"   Best Epoch: {best_epoch + 1}")

print(f"\n💡 Business Insights:")
avg_actual = y_test.mean()
avg_error = metrics['mae']
error_percentage = (avg_error / avg_actual) * 100

print(f"   Average bike count: {avg_actual:.0f}")
print(f"   Average prediction error: {avg_error:.0f} bikes ({error_percentage:.1f}%)")

if error_percentage < 10:
    print(f"   ✅ Excellent accuracy for operational planning")
elif error_percentage < 20:
    print(f"   ✅ Good accuracy for demand forecasting")
else:
    print(f"   ⚠️ Consider model improvements for better accuracy")

print(f"\n📁 Generated Files:")
print(f"   - Training History: ./Section2_Model_Training/ft_transformer_training_history.csv")
print(f"   - Evaluation Metrics: ./Section2_Model_Training/ft_transformer_evaluation_metrics.csv")
print(f"   - Predictions: ./Section2_Model_Training/ft_transformer_predictions.csv")
print(f"   - Model Checkpoint: ./Section2_Model_Training/ft_transformer_model.pth")
print(f"   - Training Plots: ./Section2_Model_Training/FT_Transformer_training_history.png")
print(f"   - Evaluation Plots: ./Section2_Model_Training/FT_Transformer_evaluation_results.png")

print(f"\n🚀 FT-Transformer training completed successfully!")
print(f"   Model ready for deployment and comparison with other models!")

## Summary

This notebook demonstrated how to:

1. **Load preprocessed data** from the bike sharing dataset
2. **Prepare data** for FT-Transformer training with PyTorch tensors
3. **Create an FT-Transformer model** specifically for regression
4. **Train the model** with early stopping and learning rate scheduling
5. **Evaluate performance** using comprehensive regression metrics
6. **Generate visualizations** for training progress and model performance
7. **Save all results** for future analysis and comparison

The FT-Transformer uses attention over per-feature tokens to capture feature interactions, which makes it a reasonable candidate for bike sharing demand prediction. The test-set metrics from Section 7 show how well that holds on this dataset.

### Key Features:
- **Modular design**: Each step is implemented as a separate function for flexibility
- **Comprehensive evaluation**: Multiple regression metrics and visualizations
- **Reproducible results**: Fixed random seeds and saved model checkpoints
- **GPU support**: Automatic device detection and memory management
- **Early stopping**: Prevents overfitting and saves training time

### Next Steps:
- Compare with other models (XGBoost, TabPFN, etc.)
- Perform hyperparameter tuning
- Analyze feature importance and model interpretability
- Deploy the model for real-time predictions