# NBA Game Prediction - Model Evaluation and Interpretation

This notebook focuses on evaluating and interpreting our best model. We'll:
1. Load the best model
2. Perform detailed evaluation
3. Use SHAP values for model interpretation
4. Analyze feature importance
5. Generate insights and recommendations

## 1. Import Libraries
Import all necessary libraries for model evaluation and interpretation.

In [None]:
import pandas as pd
import numpy as np
import joblib
import shap
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, confusion_matrix, classification_report,
    roc_curve, precision_recall_curve, average_precision_score
)
import matplotlib.pyplot as plt
import seaborn as sns

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

# Set plot style ('seaborn' was renamed 'seaborn-v0_8' in Matplotlib 3.6 and the old name was later removed)
plt.style.use('seaborn-v0_8')
sns.set_palette('husl')

## 2. Load Data and Best Model
Load the test data and the best performing model.

In [None]:
# Load test data
X_test = pd.read_csv('../data/processed/X_test_selected.csv')
y_test = pd.read_csv('../data/processed/y_test.csv')

# Load best model
best_model = joblib.load('../models/best_model.joblib')

print('Test set shape:', X_test.shape)

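The metrics below silently assume that `X_test` and `y_test` line up row for row, that the label file has a `TARGET` column holding 0/1 values, and that the features match what the model was trained on. A minimal sketch of a sanity check, assuming those conventions; `validate_test_data` is a hypothetical helper, not part of the original pipeline:

```python
import pandas as pd

def validate_test_data(X, y, target="TARGET", expected_features=None):
    """Return the label column as a Series after basic consistency checks (hypothetical helper)."""
    if target not in y.columns:
        raise KeyError(f"label file has no {target!r} column")
    if len(X) != len(y):
        raise ValueError(f"{len(X)} feature rows but {len(y)} label rows")
    # Column names and order must match the training data
    if expected_features is not None and list(X.columns) != list(expected_features):
        raise ValueError("feature columns differ from those the model was trained on")
    non_numeric = X.select_dtypes(exclude="number").columns.tolist()
    if non_numeric:
        raise TypeError(f"non-numeric feature columns: {non_numeric}")
    labels = y[target]
    # Binary task: labels should only be 0 or 1
    unexpected = set(labels.unique()) - {0, 1}
    if unexpected:
        raise ValueError(f"labels outside {{0, 1}}: {sorted(unexpected)}")
    return labels
```

If the model is a scikit-learn estimator fitted on a DataFrame, it exposes the training column names as `feature_names_in_`, so a call could look like `validate_test_data(X_test, y_test, expected_features=getattr(best_model, 'feature_names_in_', None))`.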
## 3. Model Evaluation
Perform detailed evaluation of the model's performance.

In [None]:
# Make predictions
y_pred = best_model.predict(X_test)
y_pred_proba = best_model.predict_proba(X_test)[:, 1]

# Calculate metrics
metrics = {
    'Accuracy': accuracy_score(y_test['TARGET'], y_pred),
    'Precision': precision_score(y_test['TARGET'], y_pred),
    'Recall': recall_score(y_test['TARGET'], y_pred),
    'F1 Score': f1_score(y_test['TARGET'], y_pred),
    'ROC AUC': roc_auc_score(y_test['TARGET'], y_pred_proba)
}

print('Model Performance Metrics:')
for metric, value in metrics.items():
    print(f'{metric}: {value:.3f}')

# Print classification report
print('\nClassification Report:')
print(classification_report(y_test['TARGET'], y_pred))

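All of the threshold-based metrics above are ratios of the four confusion-matrix counts (scikit-learn's `confusion_matrix(y_true, y_pred).ravel()` returns them as `tn, fp, fn, tp` for binary labels). A small sketch that recomputes them by hand, plus the accuracy of a trivial "always predict the most common outcome" baseline, which the model's accuracy should clearly exceed. Both helper names are illustrative, and the positive class is assumed to be `TARGET = 1`:

```python
import numpy as np

def metrics_from_counts(y_true, y_pred):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (positive class = 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # predicted 1, actually 1
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # predicted 1, actually 0
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # predicted 0, actually 1
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # predicted 0, actually 0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most common label."""
    positive_rate = float(np.mean(np.asarray(y_true) == 1))
    return max(positive_rate, 1.0 - positive_rate)
```

For example, with `y_true = [1, 1, 1, 0, 0, 0, 1, 0]` and `y_pred = [1, 0, 1, 0, 1, 0, 1, 0]` there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so accuracy, precision, recall and F1 are all 0.75, while the majority baseline scores only 0.5. On the real data, comparing `metrics['Accuracy']` with `majority_baseline_accuracy(y_test['TARGET'])` shows how much the model adds over that trivial rule.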
## 4. Visualization of Model Performance
Create visualizations to better understand model performance.

In [None]:
# Plot confusion matrix
plt.figure(figsize=(8, 6))
cm = confusion_matrix(y_test['TARGET'], y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

# Plot ROC curve
fpr, tpr, _ = roc_curve(y_test['TARGET'], y_pred_proba)
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {metrics["ROC AUC"]:.3f})')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()

# Plot precision-recall curve
precision, recall, _ = precision_recall_curve(y_test['TARGET'], y_pred_proba)
ap = average_precision_score(y_test['TARGET'], y_pred_proba)
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, label=f'PR curve (AP = {ap:.3f})')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend()
plt.show()
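The AUC in the ROC legend has a direct interpretation: it is the probability that a randomly chosen positive example (`TARGET = 1`) receives a higher predicted probability than a randomly chosen negative one, with ties counted as half. A brute-force sketch of that pairwise definition (the function name is illustrative; it builds an `n_pos × n_neg` matrix, so it is only meant for checking small samples):

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # diff[i, j] compares positive i with negative j
    diff = pos[:, None] - neg[None, :]
    correct = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return correct / (len(pos) * len(neg))
```

With labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive/negative pairs are ordered correctly, giving 0.75, the same value `roc_auc_score` reports. This is why AUC measures ranking quality only: rescaling every probability by a monotonic transformation leaves it unchanged.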

## 5. Model Interpretation using SHAP
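Use SHAP values to understand feature importance and individual predictions. `shap.TreeExplainer` only supports tree-based models (e.g. random forests or gradient-boosted trees), so this section assumes the best model is one of those.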

In [None]:
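# Calculate SHAP values (TreeExplainer supports tree-based models only)
explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)

# For binary classifiers some models return one array per class (a list, or a
# 3-D array in newer SHAP releases); keep the values for the positive class
if isinstance(shap_values, list):
    shap_values = shap_values[1]
elif np.ndim(shap_values) == 3:
    shap_values = shap_values[:, :, 1]

# Plot mean |SHAP| per feature; show=False keeps the figure open so the
# title below is applied to it rather than to a new, empty figure
shap.summary_plot(shap_values, X_test, plot_type='bar', show=False)
plt.title('Feature Importance (SHAP Values)')
plt.tight_layout()
plt.show()

# Plot the distribution of SHAP values per feature (beeswarm)
shap.summary_plot(shap_values, X_test, show=False)
plt.title('SHAP Value Distribution')
plt.tight_layout()
plt.show()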

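To make "SHAP value" concrete: a feature's Shapley value is its average marginal contribution to the model output over all orders in which features can be switched on. The sketch below enumerates every coalition exactly for a tiny model, using a single baseline row to stand in for "feature switched off". This is a simplified illustration under that single-baseline assumption, not what `shap.TreeExplainer` does internally: it uses a fast tree-specific algorithm and, by default, conditions on the data distribution recorded in the trees, so its numbers will differ. `exact_shapley` is a hypothetical name, and enumeration is exponential in the number of features:

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, replacing 'absent' features with baseline values."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(subset):
        # Model output when only the features in `subset` take their values from x
        z = baseline.copy()
        idx = list(subset)
        if idx:
            z[idx] = x[idx]
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi
```

Two properties carry over to the real SHAP plots. The values are additive: they sum to `f(x) - f(baseline)`. And for a linear model `f(z) = w·z + b` each value reduces to `w_i * (x_i - baseline_i)`, so for `f(z) = 2*z[0] - 3*z[1] + 0.5*z[2] + 1`, `x = [1, 2, 3]` and `baseline = [0, 1, 1]` the result is `[2, -3, 1]`.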
## 6. Feature Importance Analysis
Analyze the importance of each feature in making predictions.

In [None]:
# Calculate mean absolute SHAP values
feature_importance = pd.DataFrame({
    'feature': X_test.columns,
    'importance': np.abs(shap_values).mean(axis=0)
}).sort_values('importance', ascending=False)

# Plot feature importance
plt.figure(figsize=(10, 6))
sns.barplot(x='importance', y='feature', data=feature_importance)
plt.title('Feature Importance (Mean |SHAP Value|)')
plt.tight_layout()
plt.show()

# Save feature importance
feature_importance.to_csv('../data/processed/feature_importance.csv', index=False)

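Mean |SHAP| ranks features by how much they move predictions in either direction; the signed mean can be near zero even for an important feature, because positive and negative contributions cancel. A sketch that reports both and adds each feature's share of total importance, which helps decide how many top features account for most of the model's behaviour. The function and column names are illustrative, and it assumes a 2-D `(n_samples, n_features)` SHAP array such as the positive-class values used above:

```python
import numpy as np
import pandas as pd

def shap_importance_table(shap_values, feature_names):
    """Mean |SHAP|, signed mean SHAP, and (cumulative) share of total importance per feature."""
    sv = np.asarray(shap_values, dtype=float)
    if sv.ndim != 2:
        raise ValueError("expected a 2-D array of shape (n_samples, n_features)")
    table = pd.DataFrame({
        "feature": list(feature_names),
        "mean_abs_shap": np.abs(sv).mean(axis=0),
        "mean_shap": sv.mean(axis=0),
    }).sort_values("mean_abs_shap", ascending=False, ignore_index=True)
    table["share"] = table["mean_abs_shap"] / table["mean_abs_shap"].sum()
    table["cumulative_share"] = table["share"].cumsum()
    return table
```

For SHAP values `[[0.4, -0.1, 0.0], [-0.4, 0.1, 0.4]]` on features `a`, `b`, `c`, feature `a` ranks first (mean |SHAP| 0.4) even though its signed mean is 0, followed by `c` (0.2) and `b` (0.1).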
## 7. Generate Insights and Recommendations
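Measure how strongly each feature correlates with the model's predicted probability of the positive class (`TARGET` = 1). Together with the SHAP importances above, these correlations are the starting point for insights and recommendations.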

In [None]:
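# Pearson correlation of each feature with the predicted probability of TARGET = 1
# (build the Series on X_test's index so the rows align)
correlations = X_test.corrwith(pd.Series(y_pred_proba, index=X_test.index)).sort_values(ascending=False)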

print('Feature Correlations with Predictions:')
print(correlations)

# Save correlations
pd.DataFrame({'feature': correlations.index, 'correlation': correlations.values}).to_csv(
    '../data/processed/feature_correlations.csv', index=False
)

print('\nFeature correlations have been saved to ../data/processed/feature_correlations.csv')
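`DataFrame.corrwith` uses Pearson correlation by default, which only captures linear relationships. Tree models often respond to features monotonically but non-linearly, so a rank-based Spearman correlation (`method='spearman'`) can be a useful companion. A toy comparison on synthetic data (the column names are made up for illustration):

```python
import numpy as np
import pandas as pd

x = np.arange(1, 11, dtype=float)
features = pd.DataFrame({
    "linear": x,               # linear in the prediction
    "cubic": x ** 3,           # monotonic but non-linear
    "decreasing": -np.log(x),  # monotonic decreasing, non-linear
})
predicted = pd.Series(x / 10, index=features.index)

pearson = features.corrwith(predicted)                      # default method='pearson'
spearman = features.corrwith(predicted, method="spearman")  # correlation of ranks
print(pd.DataFrame({"pearson": pearson, "spearman": spearman}))
```

Spearman gives exactly +1 for `cubic` and -1 for `decreasing` because their ranks match the prediction ranks perfectly, while Pearson rates both below 1 in magnitude. Neither measure is causal: a strong correlation with the model's output says the model relies on a feature (or something correlated with it), not that the feature drives game outcomes.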