# ML Model Development Roadmap for Trading Bot

This notebook walks through a machine learning pipeline for financial time series prediction. It implements the LightGBM and XGBoost baselines end to end and outlines the remaining stages of the best-practices roadmap:

1. **Baseline Tree/Tabular Models** (LightGBM, XGBoost, CatBoost)
2. **Classical ML Ensembles** (Stacked ensembles, bagging)
3. **Proper Time Series Validation** (Walk-forward, purged CV)
4. **Hyperparameter Optimization** (Optuna)
5. **Model Evaluation** (Trading & ML metrics)
6. **Production Integration** (Model registry, backtesting)

**Why this order?**
- Tree models train quickly, expose feature importances, and often match or beat complex deep models on engineered financial features
- Ensembles average out the errors of individual models, which tends to reduce overfitting and improve out-of-sample robustness
- Proper validation prevents the #1 killer: overfitting to the backtest

In [None]:
# Import Required Libraries
import warnings
warnings.filterwarnings('ignore')

# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import json
import joblib
from pathlib import Path

# Machine Learning
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    roc_auc_score, precision_score, recall_score, f1_score,
    mean_squared_error, mean_absolute_error, r2_score,
    classification_report, confusion_matrix
)
from sklearn.calibration import CalibratedClassifierCV
from sklearn.isotonic import IsotonicRegression

# Tree-based models
import lightgbm as lgb
import xgboost as xgb
import catboost as cb

# Hyperparameter optimization
import optuna
from optuna.samplers import TPESampler
from optuna.pruners import MedianPruner

# Feature importance and explainability
import shap

# Set random seeds for reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)

# Configure plotting
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (12, 8)

print("✅ All libraries imported successfully")
try:
    print(f"LightGBM version: {lgb.__version__}")
    print(f"XGBoost version: {xgb.__version__}")
    print(f"CatBoost version: {cb.__version__}")
except Exception:  # a bare except would also swallow KeyboardInterrupt
    print("⚠️  Some libraries may need installation: pip install lightgbm xgboost catboost optuna shap")


# Data Loading and Preprocessing

Let's load our financial data and prepare it for machine learning. We'll use the existing feature engineering pipeline and create proper time-based labels.

**Key Points:**
- Use engineered features from our pipeline (technical indicators, price features, volume features)
- Create classification labels: P(future return > threshold)
- Implement proper time-based splits to prevent data leakage
- Handle missing values and outliers appropriately
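
The last two points can be sketched as follows. This is a minimal illustration rather than the project's pipeline: `fit_clip_bounds` and `apply_clip` are hypothetical helpers, and the 1%/99% quantiles are an assumed choice. The bounds are computed on the training window only, so nothing from later periods leaks into preprocessing.

```python
import numpy as np
import pandas as pd


def fit_clip_bounds(train: pd.DataFrame, lower_q: float = 0.01, upper_q: float = 0.99):
    """Compute per-feature clipping bounds from the training window only."""
    return train.quantile(lower_q), train.quantile(upper_q)


def apply_clip(df: pd.DataFrame, lower: pd.Series, upper: pd.Series) -> pd.DataFrame:
    """Clip each column to the bounds learned on the training window."""
    return df.clip(lower=lower, upper=upper, axis=1)


# Toy data: 200 rows, with one extreme value in the later (unseen) part
rng = np.random.default_rng(0)
demo = pd.DataFrame({"returns": rng.normal(0, 0.01, 200)})
demo.loc[150, "returns"] = 0.5  # outlier after the training window

train, later = demo.iloc[:100], demo.iloc[100:]
lower, upper = fit_clip_bounds(train)
later_clipped = apply_clip(later, lower, upper)

# The outlier is pulled back to the training-window upper bound
print(later_clipped["returns"].max() <= upper["returns"])
```

The same pattern (fit on train, apply everywhere) is what the `StandardScaler` step further down follows.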

In [None]:
# Import our existing feature engineering pipeline
import sys
import os
sys.path.append(os.path.abspath('.'))

try:
    from arbi.ai.feature_engineering_v2 import compute_features_deterministic, load_feature_schema
    from arbi.ai.training_v2 import generate_synthetic_ohlcv_data
    
    # Load feature schema to understand our features
    schema = load_feature_schema()
    print(f"Feature Schema v{schema['schema_version']}")
    print(f"Total features: {len(schema['features'])}")
    
    # Display feature categories
    feature_categories = {}
    for feature in schema['features']:
        category = feature.get('category', 'unknown')
        if category not in feature_categories:
            feature_categories[category] = []
        feature_categories[category].append(feature['name'])
    
    print("\n📊 Feature Categories:")
    for category, features in feature_categories.items():
        print(f"  {category.upper()}: {len(features)} features")
        print(f"    {', '.join(features[:5])}{'...' if len(features) > 5 else ''}")
        
except ImportError as e:
    print(f"⚠️  Could not import feature engineering: {e}")
    print("Make sure you're running this from the trading-bot directory")
    print("We'll create synthetic features instead...")
    
    # Create basic synthetic features as fallback
    def create_basic_features(df):
        """Create basic technical indicators"""
        features = pd.DataFrame(index=df.index)
        
        # Price features
        features['returns'] = df['close'].pct_change()
        features['log_returns'] = np.log(df['close'] / df['close'].shift(1))
        features['price_ma5'] = df['close'].rolling(5).mean()
        features['price_ma20'] = df['close'].rolling(20).mean()
        
        # Volume features
        features['volume_ma5'] = df['volume'].rolling(5).mean()
        features['volume_ratio'] = df['volume'] / features['volume_ma5']
        
        # Volatility
        features['volatility'] = features['returns'].rolling(20).std()
        
        # RSI
        delta = df['close'].diff()
        gain = (delta.where(delta > 0, 0)).rolling(window=14).mean()
        loss = (-delta.where(delta < 0, 0)).rolling(window=14).mean()
        rs = gain / loss
        features['rsi'] = 100 - (100 / (1 + rs))
        
        return features.dropna()
    
    compute_features_deterministic = lambda df, symbol=None: type('Result', (), {
        'features': create_basic_features(df)
    })()
    
    # Generate synthetic OHLCV data
    def generate_synthetic_ohlcv_data(n_periods=1000, symbol="BTC/USDT"):
        dates = pd.date_range(start='2023-01-01', periods=n_periods, freq='1H')
        
        # Random walk with drift
        np.random.seed(42)
        returns = np.random.normal(0.0001, 0.01, n_periods)
        prices = 50000 * np.exp(np.cumsum(returns))
        
        data = []
        for i, (date, price) in enumerate(zip(dates, prices)):
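            open_price = prices[i-1] if i > 0 else price
            # Keep each bar consistent: high/low must bracket both open and close
            high = max(open_price, price) * (1 + abs(np.random.normal(0, 0.01)))
            low = min(open_price, price) * (1 - abs(np.random.normal(0, 0.01)))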
            volume = np.random.uniform(100, 1000)
            
            data.append({
                'timestamp': date,
                'open': open_price,
                'high': high,
                'low': low,
                'close': price,
                'volume': volume
            })
        
        return pd.DataFrame(data)
    
    print("✅ Created fallback feature engineering")

In [None]:
# Generate synthetic data for demonstration (replace with real data in production)
def create_training_dataset(n_periods=2000, symbol="BTC/USDT"):
    """Create training dataset with features and labels"""
    
    # Generate OHLCV data
    df = generate_synthetic_ohlcv_data(n_periods, symbol)
    
    # Compute features using our deterministic pipeline
    feature_result = compute_features_deterministic(df, symbol)
    feature_df = feature_result.features
    
    # Create classification labels
    # Label: is the close 5 periods ahead more than 0.3% above the current close?
    future_periods = 5
    threshold = 0.003  # 0.3% for crypto intraday
    
    # Calculate future returns
    future_returns = df['close'].shift(-future_periods) / df['close'] - 1
    
    # Binary classification: 1 if return > threshold, 0 otherwise
    labels_binary = (future_returns > threshold).astype(int)
    
    # Regression target: actual future return
    labels_regression = future_returns
    
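    # Keep only rows that have both a feature vector and a known future return.
    # Feature pipelines that drop warm-up rows return fewer rows than df, so align
    # on the shared index (assumes features keep df's row index) rather than a mask.
    valid_idx = feature_df.index.intersection(future_returns.dropna().index)
    
    feature_df = feature_df.loc[valid_idx].reset_index(drop=True)
    labels_binary = labels_binary.loc[valid_idx].reset_index(drop=True)
    labels_regression = labels_regression.loc[valid_idx].reset_index(drop=True)
    timestamps = df['timestamp'].loc[valid_idx].reset_index(drop=True)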
    
    return feature_df, labels_binary, labels_regression, timestamps

# Create dataset
print("🔄 Creating training dataset...")
X, y_binary, y_regression, timestamps = create_training_dataset(2000, "BTC/USDT")

print(f"✅ Dataset created:")
print(f"  Samples: {len(X)}")
print(f"  Features: {X.shape[1]}")
print(f"  Time range: {timestamps.iloc[0]} to {timestamps.iloc[-1]}")
print(f"  Binary class distribution: {y_binary.value_counts().to_dict()}")
print(f"  Regression target stats: mean={y_regression.mean():.4f}, std={y_regression.std():.4f}")

# Display first few features
print(f"\n📊 Feature sample:")
print(X.head())

In [None]:
# Implement proper time-based data splits
def create_time_based_splits(X, y, timestamps, train_size=0.6, val_size=0.2):
    """Create time-based train/val/test splits"""
    n_samples = len(X)
    
    # Calculate split indices
    train_end = int(n_samples * train_size)
    val_end = int(n_samples * (train_size + val_size))
    
    # Create splits
    X_train = X.iloc[:train_end].copy()
    y_train = y.iloc[:train_end].copy()
    
    X_val = X.iloc[train_end:val_end].copy()
    y_val = y.iloc[train_end:val_end].copy()
    
    X_test = X.iloc[val_end:].copy()
    y_test = y.iloc[val_end:].copy()
    
    print(f"📊 Time-based splits:")
    print(f"  Train: {len(X_train)} samples ({timestamps.iloc[0]} to {timestamps.iloc[train_end-1]})")
    print(f"  Val:   {len(X_val)} samples ({timestamps.iloc[train_end]} to {timestamps.iloc[val_end-1]})")
    print(f"  Test:  {len(X_test)} samples ({timestamps.iloc[val_end]} to {timestamps.iloc[-1]})")
    
    return X_train, X_val, X_test, y_train, y_val, y_test

# Create splits for both binary and regression
X_train, X_val, X_test, y_train_binary, y_val_binary, y_test_binary = create_time_based_splits(
    X, y_binary, timestamps
)

_, _, _, y_train_reg, y_val_reg, y_test_reg = create_time_based_splits(
    X, y_regression, timestamps
)

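# Feature scaling: tree models do not need it, but linear models (e.g. a logistic
# meta-learner) do. The scaler is fit on the training split only to avoid leakage.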
scaler = StandardScaler()
X_train_scaled = pd.DataFrame(
    scaler.fit_transform(X_train), 
    columns=X_train.columns, 
    index=X_train.index
)
X_val_scaled = pd.DataFrame(
    scaler.transform(X_val), 
    columns=X_val.columns, 
    index=X_val.index
)
X_test_scaled = pd.DataFrame(
    scaler.transform(X_test), 
    columns=X_test.columns, 
    index=X_test.index
)

print("✅ Data preprocessing complete")

# LightGBM Baseline Model

LightGBM is our baseline tree model - fast, interpretable, and excellent on tabular financial features. We'll implement both classification and regression versions.

**LightGBM Advantages:**
- Fast training and inference
- Built-in categorical feature handling
- Strong performance on tabular data
- Built-in split- and gain-based feature importance
- Memory efficient

**Starter Hyperparameters:**
```python
num_leaves=31, learning_rate=0.05, n_estimators=1000, 
min_data_in_leaf=20, feature_fraction=0.8, 
bagging_fraction=0.8, random_state=42
```

In [None]:
# LightGBM Baseline Configuration
LGBM_PARAMS_BINARY = {
    'objective': 'binary',
    'metric': 'auc',
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'min_data_in_leaf': 20,
    'verbose': -1,
    'random_state': RANDOM_SEED
}

LGBM_PARAMS_REG = {
    'objective': 'regression',
    'metric': 'rmse',
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.8,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'min_data_in_leaf': 20,
    'verbose': -1,
    'random_state': RANDOM_SEED
}

def train_lightgbm_classifier(X_train, y_train, X_val, y_val, params):
    """Train LightGBM binary classifier"""
    
    # Create datasets
    train_data = lgb.Dataset(X_train, label=y_train)
    val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)
    
    # Train model
    model = lgb.train(
        params,
        train_data,
        num_boost_round=1000,
        valid_sets=[train_data, val_data],
        valid_names=['train', 'val'],
        callbacks=[lgb.early_stopping(50), lgb.log_evaluation(100)]
    )
    
    return model

def train_lightgbm_regressor(X_train, y_train, X_val, y_val, params):
    """Train LightGBM regressor"""
    
    # Create datasets
    train_data = lgb.Dataset(X_train, label=y_train)
    val_data = lgb.Dataset(X_val, label=y_val, reference=train_data)
    
    # Train model
    model = lgb.train(
        params,
        train_data,
        num_boost_round=1000,
        valid_sets=[train_data, val_data],
        valid_names=['train', 'val'],
        callbacks=[lgb.early_stopping(50), lgb.log_evaluation(100)]
    )
    
    return model

print("✅ LightGBM functions defined")

In [None]:
# Train LightGBM models
print("🚀 Training LightGBM Binary Classifier...")
try:
    lgbm_binary = train_lightgbm_classifier(X_train, y_train_binary, X_val, y_val_binary, LGBM_PARAMS_BINARY)
    print("✅ LightGBM Binary model trained successfully")
except Exception as e:
    print(f"❌ Error training LightGBM Binary: {e}")
    lgbm_binary = None

print("\n🚀 Training LightGBM Regressor...")
try:
    lgbm_regressor = train_lightgbm_regressor(X_train, y_train_reg, X_val, y_val_reg, LGBM_PARAMS_REG)
    print("✅ LightGBM Regression model trained successfully")
except Exception as e:
    print(f"❌ Error training LightGBM Regression: {e}")
    lgbm_regressor = None

In [None]:
# Evaluate LightGBM models
def evaluate_binary_model(model, X_test, y_test, model_name="Model"):
    """Evaluate binary classification model"""
    
    if model is None:
        print(f"❌ {model_name} is None, skipping evaluation")
        return None
    
    try:
        # Get predictions
        y_pred_proba = model.predict(X_test)
        y_pred = (y_pred_proba > 0.5).astype(int)
        
        # Calculate metrics
        auc = roc_auc_score(y_test, y_pred_proba)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)
        
        print(f"\n📊 {model_name} - Binary Classification Results:")
        print(f"  AUC:       {auc:.4f}")
        print(f"  Precision: {precision:.4f}")
        print(f"  Recall:    {recall:.4f}")
        print(f"  F1 Score:  {f1:.4f}")
        
        return {
            'auc': auc, 'precision': precision, 'recall': recall, 'f1': f1,
            'predictions': y_pred_proba
        }
    except Exception as e:
        print(f"❌ Error evaluating {model_name}: {e}")
        return None

def evaluate_regression_model(model, X_test, y_test, model_name="Model"):
    """Evaluate regression model"""
    
    if model is None:
        print(f"❌ {model_name} is None, skipping evaluation")
        return None
    
    try:
        # Get predictions
        y_pred = model.predict(X_test)
        
        # Calculate metrics
        mse = mean_squared_error(y_test, y_pred)
        rmse = np.sqrt(mse)
        mae = mean_absolute_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)
        
        print(f"\n📊 {model_name} - Regression Results:")
        print(f"  RMSE: {rmse:.6f}")
        print(f"  MAE:  {mae:.6f}")
        print(f"  R²:   {r2:.4f}")
        
        return {
            'rmse': rmse, 'mae': mae, 'r2': r2,
            'predictions': y_pred
        }
    except Exception as e:
        print(f"❌ Error evaluating {model_name}: {e}")
        return None

# Evaluate LightGBM models
lgbm_binary_results = evaluate_binary_model(lgbm_binary, X_test, y_test_binary, "LightGBM Binary")
lgbm_reg_results = evaluate_regression_model(lgbm_regressor, X_test, y_test_reg, "LightGBM Regression")

# XGBoost Alternative Model

XGBoost provides an excellent alternative to LightGBM with different regularization and boosting approaches. This diversity is valuable for ensemble methods.

**XGBoost Advantages:**
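- L1/L2 regularization on leaf weights (`alpha`, `lambda`) plus `gamma` pruning help control overfitting
- A different set of tuning knobs (`max_depth`, `min_child_weight`, `gamma`) than LightGBM's leaf-based controls
- Proven track record in competitions
- Depth-wise tree growth by default, in contrast to LightGBM's leaf-wise growth, so the two models tend to make partly different errors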

**Starter Hyperparameters:**
```python
max_depth=5, eta=0.05, num_boost_round=1000  # with early stopping
```

In [None]:
# XGBoost Configuration
XGB_PARAMS_BINARY = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'max_depth': 5,
    'eta': 0.05,  # learning_rate
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'min_child_weight': 20,
    'random_state': RANDOM_SEED,
    'verbosity': 0
}

XGB_PARAMS_REG = {
    'objective': 'reg:squarederror',
    'eval_metric': 'rmse',
    'max_depth': 5,
    'eta': 0.05,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'min_child_weight': 20,
    'random_state': RANDOM_SEED,
    'verbosity': 0
}

def train_xgboost_classifier(X_train, y_train, X_val, y_val, params):
    """Train XGBoost binary classifier"""
    
    # Create DMatrix
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    
    # Train model
    model = xgb.train(
        params,
        dtrain,
        num_boost_round=1000,
        evals=[(dtrain, 'train'), (dval, 'val')],
        early_stopping_rounds=50,
        verbose_eval=100
    )
    
    return model

def train_xgboost_regressor(X_train, y_train, X_val, y_val, params):
    """Train XGBoost regressor"""
    
    # Create DMatrix
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    
    # Train model
    model = xgb.train(
        params,
        dtrain,
        num_boost_round=1000,
        evals=[(dtrain, 'train'), (dval, 'val')],
        early_stopping_rounds=50,
        verbose_eval=100
    )
    
    return model

print("✅ XGBoost functions defined")

In [None]:
# Train XGBoost models
print("🚀 Training XGBoost Binary Classifier...")
try:
    xgb_binary = train_xgboost_classifier(X_train, y_train_binary, X_val, y_val_binary, XGB_PARAMS_BINARY)
    print("✅ XGBoost Binary model trained successfully")
except Exception as e:
    print(f"❌ Error training XGBoost Binary: {e}")
    xgb_binary = None

print("\n🚀 Training XGBoost Regressor...")
try:
    xgb_regressor = train_xgboost_regressor(X_train, y_train_reg, X_val, y_val_reg, XGB_PARAMS_REG)
    print("✅ XGBoost Regression model trained successfully")
except Exception as e:
    print(f"❌ Error training XGBoost Regression: {e}")
    xgb_regressor = None

In [None]:
# Evaluate XGBoost models
def evaluate_xgb_binary(model, X_test, y_test, model_name="XGBoost Binary"):
    if model is None:
        return None
    
    try:
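        dtest = xgb.DMatrix(X_test)
        # Use trees up to the best early-stopped iteration explicitly; depending on the
        # xgboost version, Booster.predict may otherwise use every boosted round
        y_pred_proba = model.predict(dtest, iteration_range=(0, model.best_iteration + 1))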
        y_pred = (y_pred_proba > 0.5).astype(int)
        
        auc = roc_auc_score(y_test, y_pred_proba)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)
        
        print(f"\n📊 {model_name} Results:")
        print(f"  AUC:       {auc:.4f}")
        print(f"  Precision: {precision:.4f}")
        print(f"  Recall:    {recall:.4f}")
        print(f"  F1 Score:  {f1:.4f}")
        
        return {'auc': auc, 'precision': precision, 'recall': recall, 'f1': f1, 'predictions': y_pred_proba}
    except Exception as e:
        print(f"❌ Error evaluating {model_name}: {e}")
        return None

def evaluate_xgb_regression(model, X_test, y_test, model_name="XGBoost Regression"):
    if model is None:
        return None
    
    try:
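        dtest = xgb.DMatrix(X_test)
        # Use trees up to the best early-stopped iteration explicitly; depending on the
        # xgboost version, Booster.predict may otherwise use every boosted round
        y_pred = model.predict(dtest, iteration_range=(0, model.best_iteration + 1))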
        
        mse = mean_squared_error(y_test, y_pred)
        rmse = np.sqrt(mse)
        mae = mean_absolute_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)
        
        print(f"\n📊 {model_name} Results:")
        print(f"  RMSE: {rmse:.6f}")
        print(f"  MAE:  {mae:.6f}")
        print(f"  R²:   {r2:.4f}")
        
        return {'rmse': rmse, 'mae': mae, 'r2': r2, 'predictions': y_pred}
    except Exception as e:
        print(f"❌ Error evaluating {model_name}: {e}")
        return None

xgb_binary_results = evaluate_xgb_binary(xgb_binary, X_test, y_test_binary)
xgb_reg_results = evaluate_xgb_regression(xgb_regressor, X_test, y_test_reg)

# Model Comparison and Results Summary

Let's compare all our baseline models and summarize the results:

In [None]:
# Create results summary
def create_results_summary():
    """Create a comprehensive results summary"""
    
    print("\n" + "="*60)
    print("🏆 MODEL PERFORMANCE SUMMARY")
    print("="*60)
    
    # Binary Classification Results
    print("\n📊 BINARY CLASSIFICATION RESULTS:")
    print("-" * 50)
    
    results = {
        'LightGBM': lgbm_binary_results,
        'XGBoost': xgb_binary_results,
    }
    
    for model_name, result in results.items():
        if result:
            print(f"{model_name:<12} | AUC: {result['auc']:.4f} | Precision: {result['precision']:.4f} | F1: {result['f1']:.4f}")
        else:
            print(f"{model_name:<12} | ❌ Not trained")
    
    # Regression Results
    print("\n📊 REGRESSION RESULTS:")
    print("-" * 40)
    
    reg_results = {
        'LightGBM': lgbm_reg_results,
        'XGBoost': xgb_reg_results,
    }
    
    for model_name, result in reg_results.items():
        if result:
            print(f"{model_name:<12} | RMSE: {result['rmse']:.6f} | R²: {result['r2']:.4f}")
        else:
            print(f"{model_name:<12} | ❌ Not trained")
    
    # Best Model Selection
    print("\n🥇 BEST PERFORMERS:")
    print("-" * 30)
    
    # Find best binary classifier
    best_binary_auc = 0
    best_binary_model = None
    
    for model_name, result in results.items():
        if result and result['auc'] > best_binary_auc:
            best_binary_auc = result['auc']
            best_binary_model = model_name
    
    if best_binary_model:
        print(f"Binary Classification: {best_binary_model} (AUC: {best_binary_auc:.4f})")
    
    # Find best regressor
    best_reg_r2 = float('-inf')  # R² is unbounded below, so avoid a magic floor
    best_reg_model = None
    
    for model_name, result in reg_results.items():
        if result and result['r2'] > best_reg_r2:
            best_reg_r2 = result['r2']
            best_reg_model = model_name
    
    if best_reg_model:
        print(f"Regression: {best_reg_model} (R²: {best_reg_r2:.4f})")
    
    print("\n" + "="*60)

create_results_summary()

# Next Steps: Advanced Model Development

Now that we have our baseline models, here's your roadmap for advanced development:

## 🚀 IMMEDIATE NEXT STEPS (Priority Order):

### 1. **CatBoost & Random Forest** (Complete the baseline)
```python
# Add these models to your ensemble
cb_model = cb.CatBoostClassifier(iterations=1000, learning_rate=0.05, depth=5)
rf_model = RandomForestClassifier(n_estimators=200, max_depth=10)
```
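
One practical wrinkle when adding these models: `evaluate_binary_model` above calls `model.predict`, which returns probabilities for a LightGBM `Booster` but hard class labels for scikit-learn style classifiers. A thin adapter keeps the evaluation code unchanged. `ProbaAdapter` is a hypothetical helper, shown here on toy data with a Random Forest; since `CatBoostClassifier` also exposes `predict_proba`, the same wrapper should apply there.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class ProbaAdapter:
    """Expose predict() as P(class 1) so Booster-style evaluation code can be reused."""

    def __init__(self, estimator):
        self.estimator = estimator

    def predict(self, X):
        # Column 1 of predict_proba is the positive-class probability
        return self.estimator.predict_proba(X)[:, 1]


# Toy data standing in for X_train / y_train_binary
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 4))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42)
rf.fit(X_demo[:200], y_demo[:200])

rf_adapted = ProbaAdapter(rf)
proba = rf_adapted.predict(X_demo[200:])
print(proba.min(), proba.max())
```

With the adapter in place, a call such as `evaluate_binary_model(ProbaAdapter(rf_model), X_test, y_test_binary, "Random Forest")` would compute AUC on probabilities rather than on labels.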

### 2. **Stacked Ensemble** (Critical for performance)
```python
# Combine LightGBM + XGBoost + CatBoost + RF with meta-learner
meta_learner = LogisticRegression()  # or small LightGBM
```
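
A minimal sketch of how the stacking could work without leaking future data, under stated assumptions: scikit-learn models stand in for the LightGBM/XGBoost/CatBoost/RF base learners, the data is synthetic, and the meta-learner is trained on out-of-fold predictions produced by walk-forward splits.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(600, 5))
y_demo = (X_demo[:, 0] - X_demo[:, 1] + rng.normal(size=600) > 0).astype(int)

# Stand-ins for the real base models (assumption for this sketch)
base_models = {
    "gbm": GradientBoostingClassifier(n_estimators=50, random_state=42),
    "rf": RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42),
}

# Out-of-fold predictions: each row is scored by models trained only on earlier rows
oof = np.full((len(X_demo), len(base_models)), np.nan)
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X_demo):
    for j, model in enumerate(base_models.values()):
        model.fit(X_demo[train_idx], y_demo[train_idx])
        oof[test_idx, j] = model.predict_proba(X_demo[test_idx])[:, 1]

# The first block of rows never appears in a test fold, so it has no OOF score
has_oof = ~np.isnan(oof).any(axis=1)
meta_learner = LogisticRegression()
meta_learner.fit(oof[has_oof], y_demo[has_oof])

print("Meta-learner weights:", dict(zip(base_models, meta_learner.coef_[0].round(3))))
```

Training the meta-learner on in-sample base predictions instead would let it reward whichever base model overfits most, which is why the out-of-fold step matters.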

### 3. **Hyperparameter Optimization** (Use Optuna)
```python
# Systematic HPO for each model
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
```

### 4. **Time Series Cross-Validation** (Critical!)
```python
# Walk-forward validation to prevent overfitting
tscv = TimeSeriesSplit(n_splits=5)
# Add purging gaps for realistic trading delays
```
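
One simple way to add the gap is the `gap` argument of `TimeSeriesSplit`. The sketch below assumes the 5-period label horizon used earlier in this notebook; it is a simplified stand-in for full purged cross-validation, not a complete implementation of it.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

future_periods = 5  # label horizon used when building the targets
n_samples = 200
X_demo = np.arange(n_samples).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4, gap=future_periods)
gaps = []
for fold, (train_idx, test_idx) in enumerate(tscv.split(X_demo)):
    # The last training row's label looks future_periods bars ahead, so the
    # gap keeps that look-ahead window out of the test fold.
    gaps.append(test_idx[0] - train_idx[-1])
    print(f"fold {fold}: train ends at {train_idx[-1]}, test starts at {test_idx[0]}")
```

The single train/val/test split used earlier in the notebook has no such gap, so the last few training labels there overlap the start of the validation window.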

### 5. **Feature Importance & SHAP Analysis**
```python
# Understand what drives predictions
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
```
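
As a model-agnostic cross-check on SHAP rankings, permutation importance on held-out data can be useful. Note that this is a different technique from SHAP: it measures how much a score drops when one feature is shuffled. The sketch uses a Random Forest and synthetic data in which only the first feature carries signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 4))
# Only feature 0 carries signal in this toy setup
y_demo = (X_demo[:, 0] + 0.3 * rng.normal(size=500) > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)
model.fit(X_demo[:350], y_demo[:350])

# Score drop (here: AUC) when each feature is shuffled on the held-out rows
result = permutation_importance(
    model, X_demo[350:], y_demo[350:],
    scoring="roc_auc", n_repeats=10, random_state=42,
)
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking:
    print(f"feature_{idx}: {result.importances_mean[idx]:.4f}")
```

Features that rank high in-sample but contribute nothing on held-out data are candidates for removal.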

## 📈 ADVANCED MODELS (After baseline is solid):

### 6. **Sequence Models** (For temporal patterns)
```python
import tensorflow as tf

# LSTM/GRU for 1m/5m high-frequency data; expects input of shape (timesteps, n_features)
lstm_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
```
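
Sequence models need 3-D input of shape `(samples, timesteps, features)`, whereas the feature table in this notebook is 2-D. A minimal windowing sketch, where `make_sequences` is a hypothetical helper and the window length is an arbitrary choice:

```python
import numpy as np


def make_sequences(features: np.ndarray, labels: np.ndarray, window: int):
    """Stack the last `window` rows of features for each label.

    Sample k uses rows [k, k + window - 1] and the label of its final row,
    so each sequence only sees data up to the time its label is attached to.
    """
    n_rows = features.shape[0]
    n_samples = n_rows - window + 1
    X_seq = np.stack([features[i:i + window] for i in range(n_samples)])
    y_seq = labels[window - 1:]
    return X_seq, y_seq


features = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
labels = np.arange(10)
X_seq, y_seq = make_sequences(features, labels, window=4)
print(X_seq.shape, y_seq.shape)  # (7, 4, 2) (7,)
```

Any scaling should still be fit on the training rows before windowing, for the same leakage reasons as with the tabular models.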

### 7. **Regime-Aware Models**
```python
# Train separate models for different market regimes
# (train_regime_detector and the lgb_model_* names are placeholders to implement)
regime_classifier = train_regime_detector()  # vol/trend regimes
models_by_regime = {
    'low_vol': lgb_model_low_vol,
    'high_vol': lgb_model_high_vol,
    'trending': lgb_model_trend
}
```
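
As one possible starting point for the regime detector, a rolling-volatility rule can label each bar before any model is trained. `label_vol_regime` is a hypothetical helper; the 20-bar window and the median threshold are assumptions, and the threshold is taken from the training window only.

```python
from typing import Optional

import numpy as np
import pandas as pd


def label_vol_regime(returns: pd.Series, window: int = 20,
                     train_end: Optional[int] = None) -> pd.Series:
    """Label each bar 'low_vol' or 'high_vol' from rolling volatility.

    The threshold is the median rolling volatility of the first `train_end`
    rows, so later data does not influence the regime boundary.
    """
    vol = returns.rolling(window).std()
    threshold = vol.iloc[:train_end].median()
    regime = pd.Series(np.where(vol > threshold, "high_vol", "low_vol"), index=returns.index)
    return regime.where(vol.notna())  # undefined until the window fills


# Toy returns: calm, turbulent, then calm again
rng = np.random.default_rng(1)
returns = pd.Series(np.concatenate([
    rng.normal(0, 0.005, 200),
    rng.normal(0, 0.03, 200),
    rng.normal(0, 0.005, 200),
]))
regime = label_vol_regime(returns, window=20, train_end=400)
print(regime.value_counts())
```

Each regime's rows can then be used to fit its own model, provided every regime retains enough samples to train on.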

### 8. **Uncertainty-Aware Models**
```python
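# Quantile regression for position sizing: alpha=0.5 fits the median;
# fit further models at e.g. alpha=0.1 and 0.9 to get a prediction interval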
quantile_model = lgb.LGBMRegressor(
    objective='quantile', alpha=0.5
)
```
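
Quantile objectives minimize the pinball loss, which penalizes under- and over-prediction asymmetrically depending on `alpha`. The sketch below is a plain NumPy illustration on synthetic returns, not LightGBM's internal implementation; it shows that the empirical quantile at level `alpha` scores better under that loss than the median does.

```python
import numpy as np


def pinball_loss(y_true: np.ndarray, y_pred, alpha: float) -> float:
    """Average pinball (quantile) loss for quantile level alpha."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(alpha * diff, (alpha - 1) * diff)))


rng = np.random.default_rng(0)
future_returns = rng.normal(0.0, 0.01, 5000)

for alpha in (0.1, 0.5, 0.9):
    q = np.quantile(future_returns, alpha)
    loss = pinball_loss(future_returns, q, alpha)
    print(f"alpha={alpha}: quantile={q:+.4f}, loss={loss:.5f}")

# For alpha=0.9, predicting the 0.9 quantile beats predicting the median
median = np.quantile(future_returns, 0.5)
q90 = np.quantile(future_returns, 0.9)
print(pinball_loss(future_returns, q90, 0.9) < pinball_loss(future_returns, median, 0.9))
```

The spread between a low and a high predicted quantile gives a rough uncertainty estimate that position sizing can scale against.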

## 🎯 PRODUCTION DEPLOYMENT:

### 9. **Model Registry Integration**
```python
# Save to your existing model registry
from arbi.ai.registry import ModelRegistry
registry = ModelRegistry()
model_id = registry.register_model(model, metadata)
```

### 10. **Backtesting Integration**
```python
# Test with realistic slippage & fees
backtest_results = backtester.run(
    signals=model_predictions,
    slippage=0.001, fees=0.0005
)
```
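
The `backtester` object above belongs to the project and is not shown here. To illustrate how costs eat into a signal, here is a minimal vectorized sketch of a long/flat strategy. `backtest_long_flat` is a hypothetical helper; it assumes `realized_returns[t]` is the one-period return earned after the signal at `t`, and it charges slippage plus fees on every position change.

```python
import numpy as np


def backtest_long_flat(proba, realized_returns, threshold=0.5,
                       slippage=0.001, fees=0.0005):
    """Net per-period returns for a long/flat strategy driven by model probabilities.

    Position at t is 1 when proba[t] > threshold, else 0. Costs are charged on
    every change in position (entry or exit).
    """
    position = (np.asarray(proba) > threshold).astype(float)
    trades = np.abs(np.diff(position, prepend=0.0))  # 1 on each entry/exit
    gross = position * np.asarray(realized_returns)
    costs = trades * (slippage + fees)
    return gross - costs


proba = [0.6, 0.7, 0.4, 0.8]
realized = [0.01, -0.005, 0.02, 0.003]
net = backtest_long_flat(proba, realized)

print(f"Gross: {0.01 - 0.005 + 0.003:.4f}")  # 0.0080 before costs
print(f"Net:   {net.sum():.4f}")             # 0.0035 after three position changes
```

In this toy run, three position changes cost 0.0045 in total, more than half of the gross return.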

### 11. **Live Inference Pipeline**
```python
# Deploy via your existing inference engine
from arbi.ai.inference_v2 import ProductionInferenceEngine
engine = ProductionInferenceEngine()
signals = engine.generate_ml_signals("BTC/USDT", "binance")
```

## ⚠️ CRITICAL SUCCESS FACTORS:

1. **Always use time-based splits** - No future leakage!
2. **Test with realistic costs** - Slippage & fees kill edge
3. **Multiple validation windows** - Ensure robustness
4. **Feature schema consistency** - Training = Inference
5. **Monitor feature drift** - Retrain when needed
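
For the drift-monitoring point, one commonly used measure is the Population Stability Index (PSI), which compares a feature's binned distribution in live data against the training data. A minimal sketch on synthetic data; `population_stability_index` is a hypothetical helper, and alert thresholds would need tuning per feature.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between a reference (training) sample and a current (live) sample."""
    # Inner bin edges come from reference quantiles, so each bin holds roughly
    # equal reference mass; the outer bins are open-ended.
    inner_edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]
    ref_counts = np.bincount(np.searchsorted(inner_edges, reference), minlength=n_bins)
    cur_counts = np.bincount(np.searchsorted(inner_edges, current), minlength=n_bins)
    ref_frac = np.clip(ref_counts / len(reference), eps, None)
    cur_frac = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 5000)
live_same = rng.normal(0.0, 1.0, 5000)      # same distribution
live_shifted = rng.normal(1.0, 1.0, 5000)   # mean shifted by one standard deviation

psi_same = population_stability_index(train_feature, live_same)
psi_shifted = population_stability_index(train_feature, live_shifted)
print(f"PSI (no drift): {psi_same:.4f}")
print(f"PSI (shifted):  {psi_shifted:.4f}")
```

Running this per feature on a schedule gives a simple trigger for the "retrain when needed" decision.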

**Start with steps 1-5 above, then gradually add complexity. The baseline tree models often outperform fancy deep learning in finance!**

# Production Deployment Checklist

## ✅ COMPLETED:
- [x] Feature Engineering Pipeline (deterministic, schema-locked)
- [x] Model Registry (SQLite backend, versioning)
- [x] Training Pipeline (LightGBM, reproducible)
- [x] Inference Engine (real-time signals)
- [x] Baseline Models (LightGBM, XGBoost setup)

## 🔄 IN PROGRESS:
- [ ] Complete baseline ensemble (CatBoost, Random Forest)
- [ ] Stacked ensemble implementation
- [ ] Hyperparameter optimization (Optuna)
- [ ] Time series cross-validation

## 📋 TODO (Priority Order):
1. **Model Calibration** - Isotonic regression for probability calibration
2. **Feature Importance Analysis** - SHAP values for interpretability
3. **Backtesting Integration** - Connect models to backtester
4. **Performance Monitoring** - Feature drift detection
5. **Shadow Deployment** - Paper trading validation
6. **Live Deployment** - Canary rollout strategy
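
For the first TODO item, a minimal sketch of isotonic calibration on synthetic, deliberately overconfident scores. The calibrator is fit on a validation slice and applied to a later slice; the data and the way the raw scores are distorted are assumptions made for illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(42)
# Toy setup: true probabilities, and a model that pushes them toward the extremes
true_p = rng.uniform(0.2, 0.8, 4000)
y = (rng.uniform(size=4000) < true_p).astype(int)
raw_score = np.clip(0.5 + 2.0 * (true_p - 0.5), 0.0, 1.0)

# Fit the calibrator on a validation slice, apply it to a later slice
val, test = slice(0, 2000), slice(2000, 4000)
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_score[val], y[val])
calibrated = calibrator.predict(raw_score[test])


def brier(p, outcome):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return float(np.mean((p - outcome) ** 2))


print(f"Brier raw:        {brier(raw_score[test], y[test]):.4f}")
print(f"Brier calibrated: {brier(calibrated, y[test]):.4f}")
```

Calibration does not change the ranking of predictions (AUC stays essentially the same), but it matters once probabilities feed into thresholds or position sizes.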

## 🚀 QUICK START COMMANDS:

```bash
# 1. Complete the baseline models by running all cells above
# 2. Train production models:
python -c "from arbi.ai.training_v2 import train_lightgbm_model; train_lightgbm_model()"

# 3. Test inference:
python test_inference.py

# 4. Run full integration:
python test_full_integration.py

# 5. Deploy production service:
python inference_service.py
```

**You now have a complete ML development roadmap! Start by completing the remaining baseline models, then move systematically through the advanced techniques.**