# NZX 50 Forecasting Pipeline with Enhanced XAI & Performance
## AI-Powered Stock Market Forecasting and Risk Analysis
**Ray Marange - 09 October 2025**

An enhanced version of the pipeline with more robust explainable AI (XAI) analysis and performance optimizations.

### Dependencies
This notebook requires: torch, pandas, numpy, xgboost, scikit-learn (imported as `sklearn`), matplotlib, seaborn, shap, sympy

**Note:** Install all dependencies in your Python environment before running the notebook.

**Usage:** Run all cells sequentially, from top to bottom. Later cells depend on variables created by earlier ones.

## 1. Import Dependencies and Setup

In [None]:
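# Import the pipeline classes from the local `main` module
import sys
sys.path.append('.')
from main import *

# Import these explicitly rather than relying on the star import to expose them
import numpy as np
import pandas as pd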

# Configure notebook display
%matplotlib inline
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)

print("✓ All dependencies imported successfully!")

## 2. Generate Synthetic NZX 50 Market Data

In [None]:
# Generate synthetic NZX 50 data
data_gen = NZX50DataGenerator(start_date='2020-01-01', end_date='2025-10-01')
df_raw = data_gen.generate_data()

print(f"Generated {len(df_raw)} days of market data")
print(f"Date range: {df_raw.index[0]} to {df_raw.index[-1]}")
print("\nFirst few rows:")
df_raw.head(10)
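
The internals of `NZX50DataGenerator` are not shown in this notebook. As a rough illustration of how a synthetic daily index series can be produced, the sketch below simulates closing prices with geometric Brownian motion. The function name `simulate_gbm_prices` and every parameter value (starting level, drift, volatility, seed) are assumptions chosen for illustration, not the generator's actual settings.

```python
import numpy as np
import pandas as pd


def simulate_gbm_prices(start_date, end_date, start_price=11000.0,
                        annual_drift=0.05, annual_vol=0.15, seed=42):
    """Simulate business-day closing prices with geometric Brownian motion."""
    dates = pd.bdate_range(start=start_date, end=end_date)
    rng = np.random.default_rng(seed)
    dt = 1.0 / 252.0  # one trading day as a fraction of a year
    # Daily log returns: (mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * Z
    shocks = rng.standard_normal(len(dates) - 1)
    log_returns = ((annual_drift - 0.5 * annual_vol ** 2) * dt
                   + annual_vol * np.sqrt(dt) * shocks)
    # Cumulate the log returns; the first day sits at the starting price.
    log_path = np.concatenate([[0.0], np.cumsum(log_returns)])
    close = start_price * np.exp(log_path)
    return pd.DataFrame({"Close": close}, index=dates)


demo_prices = simulate_gbm_prices("2020-01-01", "2025-10-01")
print(demo_prices.head())
```

Fixing the seed makes the series reproducible, which helps when comparing model runs on the same synthetic data.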

## 3. Feature Engineering

In [None]:
# Add technical indicators
feature_eng = FeatureEngineering()
df_features = feature_eng.add_technical_indicators(df_raw)

print(f"Created {len(df_features.columns)} features")
print(f"\nFeatures: {', '.join(df_features.columns.tolist())}")
print("\nFeature dataset shape:", df_features.shape)
df_features.head()
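
The notebook does not list which indicators `add_technical_indicators` computes. The sketch below shows how a few common ones (simple moving average, daily return, rolling volatility, and RSI) could be derived with pandas. The function name `add_basic_indicators`, the `Close` column name, and the 14-day window are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def add_basic_indicators(df, price_col="Close", window=14):
    """Return a copy of df with a few common technical indicators added."""
    out = df.copy()
    price = out[price_col]
    out["sma"] = price.rolling(window).mean()    # simple moving average
    out["return_1d"] = price.pct_change()        # daily simple return
    out["volatility"] = out["return_1d"].rolling(window).std()
    # RSI variant based on simple rolling averages of gains and losses.
    delta = price.diff()
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(window).mean()
    out["rsi"] = 100 - 100 / (1 + avg_gain / avg_loss)
    # Rolling windows leave NaNs at the start; drop those warm-up rows.
    return out.dropna()


demo = pd.DataFrame({"Close": np.linspace(100, 130, 40) + np.sin(np.arange(40))})
demo_features = add_basic_indicators(demo)
print(demo_features.tail())
```

Dropping the warm-up rows shortens the dataset by roughly one window length, which is worth keeping in mind when reading the shape printed above.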

## 4. Data Preparation

In [None]:
# Prepare data for training
forecaster = NZX50Forecaster(sequence_length=30)
(X_train_seq, y_train_seq, X_test_seq, y_test_seq,
 X_train_flat, y_train_flat, X_test_flat, y_test_flat) = forecaster.prepare_data(df_features)

print(f"Training sequences shape: {X_train_seq.shape}")
print(f"Testing sequences shape: {X_test_seq.shape}")
print(f"Training flat shape: {X_train_flat.shape}")
print(f"Testing flat shape: {X_test_flat.shape}")
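
To make the two sets of shapes easier to interpret, the sketch below shows one common way to build sliding-window inputs: a 3-D array for a sequence model and a flattened 2-D version for a tabular model. How `prepare_data` actually derives its flat arrays (and how it scales and splits the data) is not shown in this notebook, so the helper `make_sequences` and its flattening strategy are assumptions.

```python
import numpy as np


def make_sequences(features, target, sequence_length=30):
    """Build (samples, sequence_length, n_features) windows and next-step targets."""
    windows, targets = [], []
    for end in range(sequence_length, len(features)):
        windows.append(features[end - sequence_length:end])  # the preceding rows
        targets.append(target[end])                          # value to predict
    seq = np.asarray(windows)
    y = np.asarray(targets)
    # Flatten each window into a single row for tabular models.
    flat = seq.reshape(len(seq), -1)
    return seq, flat, y


demo_inputs = np.random.default_rng(0).normal(size=(200, 5))
demo_target = demo_inputs[:, 0]
demo_seq, demo_flat, demo_y = make_sequences(demo_inputs, demo_target)
print(demo_seq.shape, demo_flat.shape, demo_y.shape)  # (170, 30, 5) (170, 150) (170,)
```

Each window ends one step before its target, so no future information leaks into the inputs.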

## 5. Train LSTM Model

In [None]:
# Train LSTM neural network
lstm_history = forecaster.train_lstm(X_train_seq, y_train_seq, epochs=50, batch_size=32)

## 6. Train XGBoost Model

In [None]:
# Train XGBoost model
xgb_history = forecaster.train_xgboost(X_train_flat, y_train_flat)

## 7. Generate Predictions

In [None]:
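# Combine the LSTM and XGBoost forecasts with equal weights
ensemble_pred = forecaster.ensemble_predict(X_test_seq, X_test_flat, weights=[0.5, 0.5])
# Map the scaled test targets back to their original units for comparison
y_test_actual = forecaster.target_scaler.inverse_transform(y_test_seq)

# Individual model predictions, used for the per-model metrics in Section 8
lstm_pred = forecaster.predict_lstm(X_test_seq)
xgb_pred = forecaster.predict_xgboost(X_test_flat)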

print("✓ Predictions generated successfully")
print(f"Ensemble predictions shape: {ensemble_pred.shape}")
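
The `weights=[0.5, 0.5]` argument suggests a weighted average of the two models' outputs. A minimal sketch of that idea follows; `weighted_ensemble` is a hypothetical helper, and the actual `ensemble_predict` method may combine predictions differently.

```python
import numpy as np


def weighted_ensemble(predictions, weights):
    """Combine per-model predictions with weights normalised to sum to 1."""
    preds = np.stack([np.asarray(p, dtype=float).ravel() for p in predictions])
    w = np.asarray(weights, dtype=float)
    if w.shape[0] != preds.shape[0]:
        raise ValueError("need exactly one weight per model")
    w = w / w.sum()
    return w @ preds  # shape: (n_samples,)


demo_lstm = np.array([100.0, 102.0, 104.0])
demo_xgb = np.array([98.0, 103.0, 106.0])
demo_ensemble = weighted_ensemble([demo_lstm, demo_xgb], weights=[0.5, 0.5])
print(demo_ensemble)  # [ 99.  102.5 105. ]
```

With equal weights this reduces to a plain average; shifting weight toward the model with the lower validation error is a common refinement.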

## 8. Risk Analysis and Performance Metrics

In [None]:
# Perform risk analysis
risk_analyzer = RiskAnalyzer()

# Calculate performance metrics
ensemble_metrics = risk_analyzer.calculate_metrics(y_test_actual, ensemble_pred)
lstm_metrics = risk_analyzer.calculate_metrics(y_test_actual, lstm_pred)
xgb_metrics = risk_analyzer.calculate_metrics(y_test_actual, xgb_pred)

# Display results
print("="*60)
print("PERFORMANCE METRICS")
print("="*60)

metrics_df = pd.DataFrame({
    'Ensemble': ensemble_metrics,
    'LSTM': lstm_metrics,
    'XGBoost': xgb_metrics
})

metrics_df
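
The exact metrics returned by `calculate_metrics` are not listed here. As a reference for reading the table, the sketch below computes four standard regression metrics with NumPy. The helper name `regression_metrics` and the choice of metrics are assumptions.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent), and R^2 for two equal-length series."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE (%)": 100 * np.mean(np.abs(err / y_true)),
        "R2": 1 - ss_res / ss_tot,
    }


demo_metrics = regression_metrics([100, 110, 120, 130], [102, 108, 123, 129])
print(demo_metrics)
```

MAPE divides by the actual values, so it is only meaningful when the target is never zero, which holds for index levels.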

In [None]:
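# Calculate risk metrics from the simple daily returns of the actual test series
# (these describe the market itself, not the model's forecast errors)
prices = y_test_actual.flatten()
returns = np.diff(prices) / prices[:-1]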
var_95 = risk_analyzer.calculate_var(returns, 0.95)
cvar_95 = risk_analyzer.calculate_cvar(returns, 0.95)
sharpe = risk_analyzer.calculate_sharpe_ratio(returns)

print("="*60)
print("RISK METRICS")
print("="*60)
print(f"Value at Risk (95%): {var_95:.6f}")
print(f"Conditional VaR (95%): {cvar_95:.6f}")
print(f"Sharpe Ratio: {sharpe:.6f}")
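
For readers unfamiliar with these measures, the sketch below shows one common historical (non-parametric) way to compute them. The sign convention (losses reported as positive numbers), the zero risk-free rate, and the 252-day annualisation are assumptions; `RiskAnalyzer` may use different conventions.

```python
import numpy as np


def historical_var(returns, confidence=0.95):
    """Loss threshold exceeded on roughly (1 - confidence) of days, as a positive number."""
    return -np.percentile(returns, 100 * (1 - confidence))


def historical_cvar(returns, confidence=0.95):
    """Average loss on the days at or beyond the VaR threshold."""
    returns = np.asarray(returns, dtype=float)
    cutoff = np.percentile(returns, 100 * (1 - confidence))
    return -returns[returns <= cutoff].mean()


def annualised_sharpe(returns, risk_free_rate=0.0, periods_per_year=252):
    """Mean excess return over its standard deviation, scaled to a year."""
    excess = np.asarray(returns, dtype=float) - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)


demo_returns = np.random.default_rng(1).normal(0.0005, 0.01, size=1000)
demo_var = historical_var(demo_returns)
demo_cvar = historical_cvar(demo_returns)
demo_sharpe = annualised_sharpe(demo_returns)
print(f"VaR: {demo_var:.4f}  CVaR: {demo_cvar:.4f}  Sharpe: {demo_sharpe:.2f}")
```

Under this convention CVaR is always at least as large as VaR, because it averages only the returns at or below the VaR cutoff.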

## 9. Explainable AI (XAI) Analysis

In [None]:
# Perform SHAP analysis
xai_analyzer = XAIAnalyzer(forecaster.xgb_model, forecaster.feature_names)
shap_values = xai_analyzer.analyze(X_test_flat, sample_size=100)
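
SHAP attributes each prediction to the input features using Shapley values. How `XAIAnalyzer.analyze` calls the `shap` library is not shown here, so rather than guess at that, the sketch below computes exact Shapley values by brute force for a toy linear model. The helper `exact_shapley`, the toy model, and the single-row baseline are all assumptions for illustration; the `shap` library uses far more efficient algorithms for tree models.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one row by enumerating every feature subset.

    Features outside a subset are set to their baseline value. The cost grows
    as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(subset):
        row = np.array(baseline, dtype=float)
        for j in subset:
            row[j] = x[j]
        return predict(row)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


toy_weights = np.array([2.0, -1.0, 0.5])


def toy_predict(row):
    return float(row @ toy_weights) + 3.0


demo_x = np.array([1.0, 4.0, 2.0])
demo_baseline = np.array([0.5, 1.0, 2.0])
demo_phi = exact_shapley(toy_predict, demo_x, demo_baseline)
print(demo_phi)
```

For a linear model each value equals the feature's weight times its distance from the baseline, and the values sum to the difference between the prediction and the baseline prediction. That additivity is what makes SHAP summary plots interpretable.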

## 10. Visualizations

In [None]:
# Create visualizer
visualizer = Visualizer()

# Plot predictions vs actual
visualizer.plot_predictions(y_test_actual, ensemble_pred, 
                           "NZX 50 Ensemble Model: Predictions vs Actual")

In [None]:
# Plot feature importance
visualizer.plot_feature_importance(xgb_history['feature_importance'], 
                                  forecaster.feature_names)

In [None]:
# Plot SHAP summary
visualizer.plot_shap_summary(shap_values, forecaster.feature_names)

In [None]:
# Plot training loss
visualizer.plot_training_loss(lstm_history['train_losses'])

In [None]:
# Plot risk metrics
visualizer.plot_risk_metrics(returns)

## 11. Summary and Conclusions

In [None]:
print("="*70)
print(" " * 20 + "PIPELINE COMPLETED")
print("="*70)
print("\n✓ NZX 50 Forecasting Pipeline executed successfully!")
print("\n✓ Key Achievements:")
print("  - Multi-model forecasting (LSTM + XGBoost ensemble)")
print("  - Comprehensive risk analysis (VaR, CVaR, Sharpe Ratio)")
print("  - Explainable AI with SHAP values")
print("  - Advanced feature engineering with technical indicators")
print("  - Performance optimization and visualization")
print("\n" + "="*70)