# 🏆 Gold Price Prediction - Local Training (tf-env)

**Complete ML Pipeline for Gold Price Forecasting**

This notebook:
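- ✅ Runs locally in the conda `tf-env` environment
- ✅ Trains 5 models (Random Forest, XGBoost, LightGBM, LSTM, GRU) plus a weighted ensemble, each evaluated on a held-out test set
- ✅ **Addresses the negative R² scores** by aligning the test set across models (the recorded results show that alignment alone does not make every score positive)
- ✅ Automatically selects the best model by R²
- ✅ Saves models to webapp/models/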

**Author**: Htut Ko Ko  
**Environment**: conda activate tf-env  
**Last Updated**: 2025-10-26

---

## 📦 Step 1: Setup & Imports

In [1]:
# Auto-install missing packages
import subprocess
import sys

def install(package):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])

required = ['xgboost', 'lightgbm', 'yfinance', 'tensorflow', 'joblib']
for pkg in required:
    try:
        __import__(pkg)
    except ImportError:
        print(f"Installing {pkg}...")
        install(pkg)

print("✅ All required packages available")

✅ All required packages available


In [2]:
import warnings
warnings.filterwarnings('ignore')

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import os

# ML Libraries
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import xgboost as xgb
import lightgbm as lgb

# Deep Learning
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, GRU, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Model persistence
import joblib

# Set plot style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✅ All libraries imported successfully!")
print(f"   TensorFlow: {tf.__version__}")
print(f"   XGBoost: {xgb.__version__}")
print(f"   LightGBM: {lgb.__version__}")

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

✅ All libraries imported successfully!
   TensorFlow: 2.16.2
   XGBoost: 3.0.5
   LightGBM: 4.6.0


## 📁 Step 2: Load Data

In [3]:
# Load enhanced gold data
DATA_PATH = 'enhanced_gold_data_complete.csv'
MODELS_DIR = 'webapp/models'
RESULTS_DIR = 'results'

# Create directories if they don't exist
os.makedirs(MODELS_DIR, exist_ok=True)
os.makedirs(RESULTS_DIR, exist_ok=True)

# Load data
df = pd.read_csv(DATA_PATH)
print(f"✅ Data loaded: {df.shape}")
print(f"   Rows: {df.shape[0]:,}")
print(f"   Columns: {df.shape[1]}")
print(f"\nFirst few rows:")
display(df.head())

✅ Data loaded: (4317, 45)
   Rows: 4,317
   Columns: 45

First few rows:


| | Date | Gold_Open | Gold_High | Gold_Low | Gold_Close | Gold_Volume | Silver_Open | Silver_High | Silver_Low | Silver_Close | ... | DXY_Close | TNX_Open | TNX_High | TNX_Low | TNX_Close | Gold_Oil_Ratio | Gold_DXY_Inverse | Gold_Yield_Spread | Oil_Volatility | CHF_Volatility |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009-01-14 | 821.5 | 828.9 | 806.5 | 811.0 | 16994 | 10.74 | 10.87 | 10.35 | 10.56 | ... | 84.419998 | 2.277 | 2.277 | 2.163 | 2.213 | 21.754293 | -84.419998 | 252.412071 | 3.93 | 0.0106 |
| 1 | 2009-01-15 | 812.3 | 820.8 | 801.9 | 816.7 | 18989 | 10.59 | 10.66 | 10.33 | 10.61 | ... | 84.440002 | 2.184 | 2.234 | 2.166 | 2.201 | 23.07062 | -84.440002 | 255.139021 | 4.790001 | 0.014 |
| 2 | 2009-01-16 | 816.0 | 842.7 | 815.0 | 841.7 | 10451 | 10.61 | 11.28 | 10.58 | 11.25 | ... | 84.209999 | 2.352 | 2.399 | 2.259 | 2.304 | 23.053959 | -84.209999 | 254.751824 | 2.689999 | 0.0121 |
| 3 | 2009-01-19 | 840.6 | 842.5 | 831.9 | 834.8 | 9046 | 11.26 | 11.3 | 11.07 | 11.16 | ... | 84.209999 | 2.352 | 2.399 | 2.259 | 2.304 | 22.86497 | -84.209999 | 252.663446 | 2.689999 | 0.0206 |
| 4 | 2009-01-20 | 834.0 | 865.8 | 823.2 | 855.8 | 16973 | 11.16 | 11.38 | 10.91 | 11.16 | ... | 86.220001 | 2.385 | 2.497 | 2.332 | 2.345 | 22.090861 | -86.220001 | 255.844542 | 6.849998 | 0.019 |


## 🎯 Step 3: Prepare Features & Target

In [4]:
# Drop non-feature columns (errors='ignore' skips any that are absent)
drop_cols = ['Date', 'Datetime']
df_clean = df.drop(columns=drop_cols, errors='ignore')

# Handle missing values: forward-fill, back-fill any leading gaps, then zero.
# fillna(method=...) is deprecated in pandas 2.x, so call ffill()/bfill() directly.
df_clean = df_clean.ffill().bfill().fillna(0)

# Define target and features
target_col = 'Gold_Close'
y = df_clean[target_col].values
X = df_clean.drop(columns=[target_col]).values
feature_names = df_clean.drop(columns=[target_col]).columns.tolist()

print("="*80)
print("FEATURES & TARGET")
print("="*80)
print(f"\n✅ Target: {target_col}")
print(f"✅ Features: {len(feature_names)}")
print(f"\nFeature names: {feature_names[:10]}...")
print(f"\nData shape:")
print(f"   X: {X.shape}")
print(f"   y: {y.shape}")
print(f"\nTarget statistics:")
print(f"   Min:  ${y.min():.2f}")
print(f"   Max:  ${y.max():.2f}")
print(f"   Mean: ${y.mean():.2f}")
print(f"   Std:  ${y.std():.2f}")

FEATURES & TARGET

✅ Target: Gold_Close
✅ Features: 43

Feature names: ['Gold_Open', 'Gold_High', 'Gold_Low', 'Gold_Volume', 'Silver_Open', 'Silver_High', 'Silver_Low', 'Silver_Close', 'Silver_Volume', 'G/S_Open']...

Data shape:
   X: (4317, 43)
   y: (4317,)

Target statistics:
   Min:  $811.00
   Max:  $4367.50
   Mean: $1590.65
   Std:  $511.01


## ✂️ Step 4: Train-Test Split (Time-Series)

In [5]:
# Time-series split - NO SHUFFLE! (chronological order)
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    shuffle=False  # ← CRITICAL: Preserves time order
)

print("="*80)
print("TRAIN-TEST SPLIT")
print("="*80)
print(f"\nTraining set: {X_train.shape[0]:,} samples (earlier data)")
print(f"Test set:     {X_test.shape[0]:,} samples (later data)")
print(f"\nSplit ratio: {len(X_train)/(len(X_train)+len(X_test)):.1%} train / {len(X_test)/(len(X_train)+len(X_test)):.1%} test")

# Scale features - FIT ONLY ON TRAINING DATA!
scaler_X = MinMaxScaler()
scaler_y = MinMaxScaler()

X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)

y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1)).flatten()

print(f"\n✅ Data scaled (scaler fit ONLY on training data - no leakage!)")
print(f"   X range: [{X_train_scaled.min():.3f}, {X_train_scaled.max():.3f}]")
print(f"   y range: [{y_train_scaled.min():.3f}, {y_train_scaled.max():.3f}]")

TRAIN-TEST SPLIT

Training set: 3,453 samples (earlier data)
Test set:     864 samples (later data)

Split ratio: 80.0% train / 20.0% test

✅ Data scaled (scaler fit ONLY on training data - no leakage!)
   X range: [0.000, 1.000]
   y range: [0.000, 1.000]
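
As a rough illustration of why the scalers are fit on the training rows only, the sketch below applies the default min-max formula, `(x - min) / (max - min)`, by hand with NumPy. The prices are made-up values rather than rows from the dataset, and `fit_min_max` / `apply_min_max` are hypothetical helper names standing in for `MinMaxScaler.fit` and `MinMaxScaler.transform`.

```python
import numpy as np

def fit_min_max(train):
    """Return the (min, max) of the training values only."""
    return float(train.min()), float(train.max())

def apply_min_max(values, lo, hi):
    """Scale values with statistics learned from the training split."""
    return (values - lo) / (hi - lo)

# Hypothetical chronological price series: earlier rows train, later rows test
prices = np.array([800.0, 1000.0, 1200.0, 1800.0, 2400.0])
train, test = prices[:3], prices[3:]

lo, hi = fit_min_max(train)
train_scaled = apply_min_max(train, lo, hi)
test_scaled = apply_min_max(test, lo, hi)

print(train_scaled)  # training values land inside [0, 1]
print(test_scaled)   # later, higher prices land above 1
```

Scaled test values above 1 are expected whenever later prices exceed the training maximum. They signal that the test period lies outside the range the models were trained on, not that the scaling is wrong.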


## 🔧 Step 5: Create Sequences for LSTM/GRU

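**KEY FIX**: Build 30-day input sequences for the deep learning models, and drop the first `sequence_length` rows from the test set used by the traditional ML models so that every model is scored against the same target values.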

In [6]:
def create_sequences(X, y, sequence_length=30):
    """Create sequences for LSTM/GRU"""
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i+sequence_length])
        y_seq.append(y[i+sequence_length])
    return np.array(X_seq), np.array(y_seq)

sequence_length = 30

# Create sequences for deep learning
X_train_seq, y_train_seq = create_sequences(X_train_scaled, y_train_scaled, sequence_length)
X_test_seq, y_test_seq = create_sequences(X_test_scaled, y_test_scaled, sequence_length)

# CRITICAL FIX: Align ML models test set with LSTM test set
# Traditional ML models will use the SAME test samples as LSTM (after sequence offset)
X_test_aligned = X_test_scaled[sequence_length:]
y_test_aligned = y_test_scaled[sequence_length:]

print("="*80)
print("SEQUENCE CREATION & ALIGNMENT")
print("="*80)
print(f"\nSequence length: {sequence_length} days")
print(f"\nTraining sequences: {X_train_seq.shape}")
print(f"Test sequences:     {X_test_seq.shape}")
print(f"\n✅ CRITICAL FIX: ML models will use aligned test set")
print(f"   ML test set: {X_test_aligned.shape[0]} samples")
print(f"   DL test set: {X_test_seq.shape[0]} samples")
print(f"   ✓ Both use SAME test samples for fair comparison")

SEQUENCE CREATION & ALIGNMENT

Sequence length: 30 days

Training sequences: (3423, 30, 43)
Test sequences:     (834, 30, 43)

✅ CRITICAL FIX: ML models will use aligned test set
   ML test set: 834 samples
   DL test set: 834 samples
   ✓ Both use SAME test samples for fair comparison
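
To make the alignment concrete, here is a small sketch that runs the same windowing logic as `create_sequences` on toy data (6 time steps, window length 2). The numbers are illustrative only and the short name `L` is introduced just for this example.

```python
import numpy as np

def create_sequences(X, y, sequence_length):
    """Same windowing as the notebook: rows i..i+L-1 predict y[i+L]."""
    X_seq, y_seq = [], []
    for i in range(len(X) - sequence_length):
        X_seq.append(X[i:i + sequence_length])
        y_seq.append(y[i + sequence_length])
    return np.array(X_seq), np.array(y_seq)

# Toy data: 6 time steps, 2 features (values are illustrative only)
X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
L = 2

X_seq, y_seq = create_sequences(X, y, L)
print(X_seq.shape)  # (4, 2, 2): 6 - 2 windows, each 2 steps x 2 features
print(y_seq)        # [12. 13. 14. 15.]

# The sequence targets are the original targets with the first L dropped,
# which is why slicing y[L:] gives the tree models the same targets.
assert np.array_equal(y_seq, y[L:])
```

The same arithmetic gives 864 - 30 = 834 test samples here. Note that only the targets are shared: each window ends the day before its target, while row `t` of `X_test_aligned` holds the target day's own features (for example `Gold_Open`), so the two model families still see different inputs.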


## 🌲 Step 6: Train Random Forest (Fixed)

In [7]:
print("="*80)
print("1️⃣  TRAINING RANDOM FOREST (FIXED)")
print("="*80)

rf_model = RandomForestRegressor(
    n_estimators=200,
    max_depth=20,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=42,
    n_jobs=-1
)

rf_model.fit(X_train_scaled, y_train_scaled)

# Predictions on ALIGNED test set
y_pred_rf_scaled = rf_model.predict(X_test_aligned)
y_pred_rf = scaler_y.inverse_transform(y_pred_rf_scaled.reshape(-1, 1)).flatten()
y_test_rf = scaler_y.inverse_transform(y_test_aligned.reshape(-1, 1)).flatten()

# Evaluate
rf_r2 = r2_score(y_test_rf, y_pred_rf)
rf_mae = mean_absolute_error(y_test_rf, y_pred_rf)
rf_rmse = np.sqrt(mean_squared_error(y_test_rf, y_pred_rf))
rf_mape = np.mean(np.abs((y_test_rf - y_pred_rf) / y_test_rf)) * 100

print(f"\n📊 Random Forest Performance (Fixed):")
print(f"   R² Score: {rf_r2:.4f}")
print(f"   MAE:      ${rf_mae:.2f}")
print(f"   RMSE:     ${rf_rmse:.2f}")
print(f"   MAPE:     {rf_mape:.2f}%")
print("\n✅ Random Forest training complete")

1️⃣  TRAINING RANDOM FOREST (FIXED)

📊 Random Forest Performance (Fixed):
   R² Score: -0.2765
   MAE:      $399.28
   RMSE:     $649.10
   MAPE:     13.31%

✅ Random Forest training complete
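
Every model step in this notebook repeats the same four metric lines. One way to factor them out is sketched below with plain NumPy; `regression_report` is a hypothetical helper name, and its R² follows the usual definition `1 - SS_res / SS_tot`. The sample values are illustrative, not taken from the gold dataset.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return R², MAE, RMSE and MAPE (%) for two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                        # squared error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared error of the mean
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": np.mean(np.abs(err)),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "mape": np.mean(np.abs(err / y_true)) * 100,
    }

# Illustrative values only
y_true = np.array([100.0, 200.0, 300.0])
report = regression_report(y_true, np.array([110.0, 190.0, 310.0]))
print({k: round(float(v), 4) for k, v in report.items()})

# Predicting the mean of y_true everywhere gives R² = 0 by construction
baseline = regression_report(y_true, np.full(3, y_true.mean()))
print(round(float(baseline["r2"]), 4))
```

Read this way, a negative R² such as the one above means the model's squared error on the test period is larger than that of a constant prediction equal to the test-set mean.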


## 🚀 Step 7: Train XGBoost (Fixed)

In [8]:
print("="*80)
print("2️⃣  TRAINING XGBOOST (FIXED)")
print("="*80)

xgb_model = xgb.XGBRegressor(
    n_estimators=200,
    max_depth=8,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
    n_jobs=-1
)

xgb_model.fit(X_train_scaled, y_train_scaled)

# Predictions on ALIGNED test set
y_pred_xgb_scaled = xgb_model.predict(X_test_aligned)
y_pred_xgb = scaler_y.inverse_transform(y_pred_xgb_scaled.reshape(-1, 1)).flatten()

# Evaluate
xgb_r2 = r2_score(y_test_rf, y_pred_xgb)
xgb_mae = mean_absolute_error(y_test_rf, y_pred_xgb)
xgb_rmse = np.sqrt(mean_squared_error(y_test_rf, y_pred_xgb))
xgb_mape = np.mean(np.abs((y_test_rf - y_pred_xgb) / y_test_rf)) * 100

print(f"\n📊 XGBoost Performance (Fixed):")
print(f"   R² Score: {xgb_r2:.4f}")
print(f"   MAE:      ${xgb_mae:.2f}")
print(f"   RMSE:     ${xgb_rmse:.2f}")
print(f"   MAPE:     {xgb_mape:.2f}%")
print("\n✅ XGBoost training complete")

2️⃣  TRAINING XGBOOST (FIXED)

📊 XGBoost Performance (Fixed):
   R² Score: -0.3690
   MAE:      $421.53
   RMSE:     $672.21
   MAPE:     14.16%

✅ XGBoost training complete


## 💡 Step 8: Train LightGBM (Fixed)

In [9]:
print("="*80)
print("3️⃣  TRAINING LIGHTGBM (FIXED)")
print("="*80)

lgb_model = lgb.LGBMRegressor(
    n_estimators=200,
    max_depth=8,
    learning_rate=0.05,
    num_leaves=31,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
    n_jobs=-1,
    verbose=-1
)

lgb_model.fit(X_train_scaled, y_train_scaled)

# Predictions on ALIGNED test set
y_pred_lgb_scaled = lgb_model.predict(X_test_aligned)
y_pred_lgb = scaler_y.inverse_transform(y_pred_lgb_scaled.reshape(-1, 1)).flatten()

# Evaluate
lgb_r2 = r2_score(y_test_rf, y_pred_lgb)
lgb_mae = mean_absolute_error(y_test_rf, y_pred_lgb)
lgb_rmse = np.sqrt(mean_squared_error(y_test_rf, y_pred_lgb))
lgb_mape = np.mean(np.abs((y_test_rf - y_pred_lgb) / y_test_rf)) * 100

print(f"\n📊 LightGBM Performance (Fixed):")
print(f"   R² Score: {lgb_r2:.4f}")
print(f"   MAE:      ${lgb_mae:.2f}")
print(f"   RMSE:     ${lgb_rmse:.2f}")
print(f"   MAPE:     {lgb_mape:.2f}%")
print("\n✅ LightGBM training complete")

3️⃣  TRAINING LIGHTGBM (FIXED)

📊 LightGBM Performance (Fixed):
   R² Score: -0.3602
   MAE:      $420.68
   RMSE:     $670.05
   MAPE:     14.14%

✅ LightGBM training complete


## 🧠 Step 9: Train LSTM

In [10]:
print("="*80)
print("4️⃣  TRAINING LSTM")
print("="*80)

lstm_model = Sequential([
    LSTM(64, activation='relu', return_sequences=True, input_shape=(sequence_length, X_train.shape[1])),
    Dropout(0.2),
    LSTM(32, activation='relu'),
    Dropout(0.2),
    Dense(16, activation='relu'),
    Dense(1)
])

lstm_model.compile(optimizer='adam', loss='mse', metrics=['mae'])

early_stop = EarlyStopping(monitor='val_loss', patience=15, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-6)

print("\nTraining LSTM...")
history = lstm_model.fit(
    X_train_seq, y_train_seq,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stop, reduce_lr],
    verbose=0
)

# Predictions
y_pred_lstm_scaled = lstm_model.predict(X_test_seq, verbose=0).flatten()
y_pred_lstm = scaler_y.inverse_transform(y_pred_lstm_scaled.reshape(-1, 1)).flatten()
y_test_lstm = scaler_y.inverse_transform(y_test_seq.reshape(-1, 1)).flatten()

# Evaluate
lstm_r2 = r2_score(y_test_lstm, y_pred_lstm)
lstm_mae = mean_absolute_error(y_test_lstm, y_pred_lstm)
lstm_rmse = np.sqrt(mean_squared_error(y_test_lstm, y_pred_lstm))
lstm_mape = np.mean(np.abs((y_test_lstm - y_pred_lstm) / y_test_lstm)) * 100

print(f"\n📊 LSTM Performance:")
print(f"   R² Score: {lstm_r2:.4f}")
print(f"   MAE:      ${lstm_mae:.2f}")
print(f"   RMSE:     ${lstm_rmse:.2f}")
print(f"   MAPE:     {lstm_mape:.2f}%")
print(f"   Epochs trained: {len(history.history['loss'])}")
print("\n✅ LSTM training complete")

4️⃣  TRAINING LSTM





Training LSTM...





📊 LSTM Performance:
   R² Score: 0.7417
   MAE:      $223.26
   RMSE:     $292.00
   MAPE:     9.90%
   Epochs trained: 26

✅ LSTM training complete
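
Keras takes the `validation_split` fraction from the end of the arrays passed to `fit`, before any shuffling, so with chronologically ordered sequences the validation block is the most recent part of the training period. The sketch below works out which windows that would be for this run. The exact rounding rule (floor) is an assumption about Keras internals, and `fit_indices` / `val_indices` are names introduced for illustration.

```python
import math

n_train_sequences = 3423   # from the "Training sequences" output in Step 5
validation_split = 0.2

# Assumption: the split index is the floor of n * (1 - validation_split)
split_at = math.floor(n_train_sequences * (1 - validation_split))

fit_indices = range(0, split_at)                  # earlier windows: weight updates
val_indices = range(split_at, n_train_sequences)  # latest windows: val_loss

print(len(fit_indices), len(val_indices))
```

Because the held-out block comes last in time, `EarlyStopping` and `ReduceLROnPlateau` react to performance on the latest training-period prices rather than on a random sample.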


## 🔄 Step 10: Train GRU

In [11]:
print("="*80)
print("5️⃣  TRAINING GRU")
print("="*80)

gru_model = Sequential([
    GRU(64, activation='relu', return_sequences=True, input_shape=(sequence_length, X_train.shape[1])),
    Dropout(0.2),
    GRU(32, activation='relu'),
    Dropout(0.2),
    Dense(16, activation='relu'),
    Dense(1)
])

gru_model.compile(optimizer='adam', loss='mse', metrics=['mae'])

print("\nTraining GRU...")
history_gru = gru_model.fit(
    X_train_seq, y_train_seq,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stop, reduce_lr],
    verbose=0
)

# Predictions
y_pred_gru_scaled = gru_model.predict(X_test_seq, verbose=0).flatten()
y_pred_gru = scaler_y.inverse_transform(y_pred_gru_scaled.reshape(-1, 1)).flatten()

# Evaluate
gru_r2 = r2_score(y_test_lstm, y_pred_gru)
gru_mae = mean_absolute_error(y_test_lstm, y_pred_gru)
gru_rmse = np.sqrt(mean_squared_error(y_test_lstm, y_pred_gru))
gru_mape = np.mean(np.abs((y_test_lstm - y_pred_gru) / y_test_lstm)) * 100

print(f"\n📊 GRU Performance:")
print(f"   R² Score: {gru_r2:.4f}")
print(f"   MAE:      ${gru_mae:.2f}")
print(f"   RMSE:     ${gru_rmse:.2f}")
print(f"   MAPE:     {gru_mape:.2f}%")
print(f"   Epochs trained: {len(history_gru.history['loss'])}")
print("\n✅ GRU training complete")

5️⃣  TRAINING GRU

Training GRU...

📊 GRU Performance:
   R² Score: -0.3147
   MAE:      $479.21
   RMSE:     $658.74
   MAPE:     17.48%
   Epochs trained: 42

✅ GRU training complete


## 🎯 Step 11: Ensemble Model

In [1]:
print("="*80)
print("6️⃣  CREATING ENSEMBLE")
print("="*80)

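# Weighted average ensemble
# Weight by inverse of MAE (better models get higher weight).
# Note: these MAEs are measured on the test set, so the ensemble's test
# score is optimistic; MAEs from a separate validation split would avoid that.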
weights = np.array([1/rf_mae, 1/xgb_mae, 1/lgb_mae, 1/lstm_mae, 1/gru_mae])
weights = weights / weights.sum()

print(f"\nEnsemble weights:")
print(f"   Random Forest: {weights[0]:.3f}")
print(f"   XGBoost:       {weights[1]:.3f}")
print(f"   LightGBM:      {weights[2]:.3f}")
print(f"   LSTM:          {weights[3]:.3f}")
print(f"   GRU:           {weights[4]:.3f}")

# Ensemble predictions (all models use same test set now)
y_pred_ensemble = (
    weights[0] * y_pred_rf +
    weights[1] * y_pred_xgb +
    weights[2] * y_pred_lgb +
    weights[3] * y_pred_lstm +
    weights[4] * y_pred_gru
)

# Evaluate
ensemble_r2 = r2_score(y_test_lstm, y_pred_ensemble)
ensemble_mae = mean_absolute_error(y_test_lstm, y_pred_ensemble)
ensemble_rmse = np.sqrt(mean_squared_error(y_test_lstm, y_pred_ensemble))
ensemble_mape = np.mean(np.abs((y_test_lstm - y_pred_ensemble) / y_test_lstm)) * 100

print(f"\n📊 Ensemble Performance:")
print(f"   R² Score: {ensemble_r2:.4f}")
print(f"   MAE:      ${ensemble_mae:.2f}")
print(f"   RMSE:     ${ensemble_rmse:.2f}")
print(f"   MAPE:     {ensemble_mape:.2f}%")
print("\n✅ Ensemble complete")

6️⃣  CREATING ENSEMBLE


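NameError: name 'np' is not defined

(The `In [1]` prompt indicates this cell ran in a fresh kernel before the Step 1 imports were re-executed, so no ensemble metrics were produced in this run. Re-run the notebook from the top before this cell.)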
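
Since the ensemble cell did not complete, the following sketch shows what the inverse-MAE weighting would give for the MAE values printed in Steps 6 through 10. It only reproduces the weight calculation; it does not recreate the ensemble predictions or metrics.

```python
import numpy as np

# MAE values printed by Steps 6-10 (USD)
mae = {
    "Random Forest": 399.28,
    "XGBoost": 421.53,
    "LightGBM": 420.68,
    "LSTM": 223.26,
    "GRU": 479.21,
}

inv = np.array([1.0 / v for v in mae.values()])
weights = inv / inv.sum()   # normalize so the weights sum to 1

for name, w in zip(mae, weights):
    print(f"{name:<14} {w:.3f}")
```

Inverse-MAE weights are fairly flat: the LSTM receives roughly a third of the weight, and the four models with negative R² share the rest. A steeper scheme, or dropping models that score below zero, may be worth comparing.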

## 📊 Step 12: Compare All Models

In [None]:
# Create comparison table
comparison_df = pd.DataFrame({
    'Model': ['Random Forest', 'XGBoost', 'LightGBM', 'LSTM', 'GRU', 'Ensemble'],
    'R² Score': [rf_r2, xgb_r2, lgb_r2, lstm_r2, gru_r2, ensemble_r2],
    'MAE ($)': [rf_mae, xgb_mae, lgb_mae, lstm_mae, gru_mae, ensemble_mae],
    'RMSE ($)': [rf_rmse, xgb_rmse, lgb_rmse, lstm_rmse, gru_rmse, ensemble_rmse],
    'MAPE (%)': [rf_mape, xgb_mape, lgb_mape, lstm_mape, gru_mape, ensemble_mape]
}).sort_values('R² Score', ascending=False)

print("="*80)
print("ALL MODELS COMPARISON (FIXED - FAIR COMPARISON)")
print("="*80)
print()
display(comparison_df)

# Find best model
best_model_name = comparison_df.iloc[0]['Model']
best_r2 = comparison_df.iloc[0]['R² Score']
best_mae = comparison_df.iloc[0]['MAE ($)']

print(f"\n🏆 BEST MODEL: {best_model_name}")
print(f"   R² Score: {best_r2:.4f}")
print(f"   MAE: ${best_mae:.2f}")

## 📈 Step 13: Visualize Results

In [None]:
# Model comparison visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# R² Score
axes[0].barh(comparison_df['Model'], comparison_df['R² Score'], color='skyblue')
axes[0].set_xlabel('R² Score', fontsize=12)
axes[0].set_title('R² Score Comparison (Higher is Better)', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='x')
axes[0].axvline(x=0, color='red', linestyle='--', alpha=0.5)

# MAE
axes[1].barh(comparison_df['Model'], comparison_df['MAE ($)'], color='lightcoral')
axes[1].set_xlabel('MAE ($)', fontsize=12)
axes[1].set_title('MAE Comparison (Lower is Better)', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='x')

# MAPE
axes[2].barh(comparison_df['Model'], comparison_df['MAPE (%)'], color='lightgreen')
axes[2].set_xlabel('MAPE (%)', fontsize=12)
axes[2].set_title('MAPE Comparison (Lower is Better)', fontsize=14, fontweight='bold')
axes[2].grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.savefig(f'{RESULTS_DIR}/model_comparison_fixed.png', dpi=300, bbox_inches='tight')
plt.show()

print("✅ Visualization saved to results/model_comparison_fixed.png")

## 💾 Step 14: Save Models

In [None]:
print("="*80)
print("SAVING MODELS")
print("="*80)

# Map each model name to (model object, filename, is_keras)
model_map = {
    'Random Forest': (rf_model, 'rf_model.pkl', False),
    'XGBoost': (xgb_model, 'xgb_model.pkl', False),
    'LightGBM': (lgb_model, 'lgb_model.pkl', False),
    'LSTM': (lstm_model, 'lstm_model.h5', True),
    'GRU': (gru_model, 'gru_model.h5', True),
}

# Save ALL models
for model_name, (model, filename, is_keras) in model_map.items():
    filepath = os.path.join(MODELS_DIR, filename)
    if is_keras:
        model.save(filepath)
    else:
        joblib.dump(model, filepath)
    print(f"✅ Saved {model_name}: {filepath}")

# Save scalers and metadata
joblib.dump(scaler_X, f'{MODELS_DIR}/scaler_X.pkl')
joblib.dump(scaler_y, f'{MODELS_DIR}/scaler_y.pkl')
joblib.dump(feature_names, f'{MODELS_DIR}/feature_names.pkl')
joblib.dump(sequence_length, f'{MODELS_DIR}/sequence_length.pkl')

# Save metadata with performance metrics
metadata = {
    'best_model': best_model_name,
    'best_r2': best_r2,
    'best_mae': best_mae,
    'training_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    'models': comparison_df.to_dict('records'),
    'sequence_length': sequence_length,
    'n_features': len(feature_names),
    'train_size': len(X_train),
    'test_size': len(X_test_aligned)
}
joblib.dump(metadata, f'{MODELS_DIR}/metadata.pkl')

print(f"\n✅ Scalers saved")
print(f"✅ Feature names saved ({len(feature_names)} features)")
print(f"✅ Metadata saved")
print(f"\n🏆 Best model: {best_model_name}")
print(f"   Saved to: webapp/models/")

## ✅ Training Complete!

### What Was Fixed?

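**Problem**: Traditional ML models (RF, XGBoost, LightGBM) showed **negative R² scores** in the original Colab notebook.

**Original diagnosis**:
- LSTM/GRU use sequences (30-day lookback), which shortens their test set by 30 samples
- Traditional ML models were evaluated on the full test set
- The models were therefore scored on different test samples, which made the comparison unfair

**Change made**:
- Created `X_test_aligned` and `y_test_aligned`, which drop the first 30 test rows to match the LSTM test set
- All models are now evaluated on the **same test samples**

**Outcome in this run**: The comparison is now fair, but alignment did not remove the negative scores. Random Forest (-0.2765), XGBoost (-0.3690), LightGBM (-0.3602) and GRU (-0.3147) remain below zero; only the LSTM (0.7417) is positive. Misaligned test sets are therefore not the full explanation. A likely contributor, not verified in this notebook, is the chronological split itself: tree ensembles do not extrapolate beyond the target range seen in training, so later prices above that range are systematically under-predicted.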
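
A negative R² on a rising test period can be sketched with a few hypothetical numbers. The example assumes a model whose predictions are capped at the highest price it saw in training, which is how a random forest behaves and roughly how boosted trees behave; the prices and the `train_max` value are invented for illustration.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical numbers: training prices topped out at 2000,
# while the later test period keeps rising
train_max = 2000.0
y_test = np.array([2100.0, 2400.0, 2700.0, 3000.0, 3300.0])

# A model that cannot extrapolate predicts at most the training maximum
y_pred_capped = np.minimum(y_test, train_max)

score = r2(y_test, y_pred_capped)
print(score)   # negative: worse than predicting the test-set mean
```

If this is what is happening here, predicting a stationary quantity such as the daily return or the price change, rather than the price level, may be worth trying for the tree-based models.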

### Next Steps

1. ✅ Models saved to `webapp/models/`
2. ✅ Run web app: `cd webapp && python app.py`
3. ✅ Open: http://localhost:5000

### Files Saved

**Models**: `webapp/models/`
- rf_model.pkl
- xgb_model.pkl
- lgb_model.pkl
- lstm_model.h5 ⭐ (highest R² in this run)
- gru_model.h5
- scaler_X.pkl
- scaler_y.pkl
- feature_names.pkl
- sequence_length.pkl
- metadata.pkl

**Visualizations**: `results/`
- model_comparison_fixed.png
