# Feature Importance Analysis with SHAP

## Why Feature Importance?
Understanding **which features contribute most to fraud detection** helps us:
- **Improve model interpretability** → See how XGBoost makes decisions.
- **Detect potential biases** → Check that the model isn't relying on irrelevant or spurious features.
- **Optimize feature selection** → Identify the most impactful variables.

## What We Will Do:
1. **Calculate SHAP (SHapley Additive Explanations) values** for the best XGBoost model.  
2. **Visualize feature importance** using SHAP summary plots.  
3. **Analyze individual feature effects** to understand how each variable impacts fraud detection.  
4. **Check for potential overfitting** by inspecting feature dependencies.

By the end of this notebook, we’ll determine **which features truly drive fraud predictions** and whether our model relies too heavily on certain variables.
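To make the idea behind SHAP values concrete before running the library, here is a minimal brute-force sketch of exact Shapley values for a single row. The scoring function `toy_model`, its feature names, and the all-zero baseline are hypothetical and are not part of this project. The `shap` library uses much faster algorithms and a background dataset rather than a single baseline row, so treat this only as an illustration of the additive property.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row, using a single baseline row for 'missing' features."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` keep their real value; the rest fall back to the baseline.
        row = [x[i] if i in subset else baseline[i] for i in features]
        return predict(row)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


# Hypothetical scoring function with an interaction between amount and is_foreign.
def toy_model(row):
    amount, hour, is_foreign = row
    return 2.0 * amount + 1.0 * hour + 3.0 * amount * is_foreign


x = [1.0, 2.0, 1.0]          # the row being explained
baseline = [0.0, 0.0, 0.0]   # assumed reference row
phi = shapley_values(toy_model, x, baseline)

# Additivity: the contributions sum to the gap between this prediction and the baseline prediction.
gap = toy_model(x) - toy_model(baseline)
print(phi, sum(phi), gap)
```

The interaction term is split evenly between `amount` and `is_foreign`, which is why a feature's SHAP value can differ from what a single coefficient would suggest.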

**Imports**:

In [3]:
# Data Handling & Visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
import shap
import joblib
from sklearn.model_selection import train_test_split

# Suppress warnings for cleaner output
import warnings
warnings.filterwarnings("ignore")

print("All libraries imported successfully!")

All libraries imported successfully!


**Load the Best Performing Model**:

In [2]:
# Define model path
model_path = "../models/optimized_base_xgb.pkl"

# Load the best XGBoost model
best_model = joblib.load(model_path)

print("Best XGBoost model loaded successfully!")

Best XGBoost model loaded successfully!


**Load the Dataset**:

In [4]:
# Load the full dataset
X = pd.read_csv("../datasets/X_scaled.csv")
y = pd.read_csv("../datasets/y.csv")

# Split into train & test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print("Data loaded and split successfully!")

Data loaded and split successfully!


**SHAP Analysis**:

In [5]:
# Initialize SHAP Explainer; X_test is passed as the background data
explainer = shap.Explainer(best_model, X_test)

# Compute SHAP values for every row in the test set
shap_values = explainer(X_test)

print("SHAP values computed successfully!")



SHAP values computed successfully!
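A common way to turn per-row SHAP values into a global ranking is to average their absolute values per feature. The sketch below shows that arithmetic on a small hand-written matrix. The numbers and feature names are hypothetical and are not results from this model.

```python
# Hypothetical SHAP matrix: one row per transaction, one column per feature.
feature_names = ["amount", "hour", "is_foreign"]
shap_matrix = [
    [0.40, -0.10, 0.05],
    [-0.30, 0.20, 0.00],
    [0.50, -0.05, -0.10],
    [-0.20, 0.15, 0.05],
]

n_rows = len(shap_matrix)

# Mean absolute SHAP value per feature: sign is dropped so that positive and
# negative contributions do not cancel each other out.
mean_abs = {
    name: sum(abs(row[j]) for row in shap_matrix) / n_rows
    for j, name in enumerate(feature_names)
}

# Sort from most to least influential.
ranking = sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

On the real `shap_values` object, the equivalent computation would likely be `np.abs(shap_values.values).mean(axis=0)`, assuming `shap_values.values` is a 2-D array of shape (rows, features), as is typical for a binary XGBoost classifier.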
