## SHAP Explainability for XGBoost Model

To make our attrition model more interpretable and actionable for HR professionals, we use SHAP (SHapley Additive exPlanations). SHAP quantifies and visualizes how each feature affects individual predictions, allowing HR teams to understand *why* the model predicts that an employee might leave. By examining these explanations, HR can design more targeted interventions, such as addressing frequent overtime, improving job satisfaction, or adjusting policies that disproportionately affect certain employee groups.

We'll generate:
- A summary plot showing the most influential features across the dataset.
- An individual waterfall plot showing how each feature pushes a single employee's predicted attrition risk up or down.

These visualizations will be saved in the `visuals/` folder for easy access and potential integration into a Tableau dashboard.
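To make the idea of a per-feature contribution concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function. Everything in it (`toy_risk_score`, the feature names, the weights, and the all-zero baseline) is hypothetical and unrelated to the trained model. It only illustrates the additivity property that SHAP relies on: the baseline output plus the sum of the contributions equals the prediction. `shap.TreeExplainer` uses a much faster tree-specific algorithm rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

FEATURES = ["overtime", "low_income", "long_commute"]
# Assumed reference employee: every risk flag switched off.
BASELINE = {"overtime": 0, "low_income": 0, "long_commute": 0}


def toy_risk_score(overtime, low_income, long_commute):
    """Hypothetical toy model; NOT the trained XGBoost model."""
    score = 2.0 * overtime + 1.0 * low_income + 0.5 * long_commute
    # Interaction term: overtime hurts more when income is also low.
    score += 1.0 * overtime * low_income
    return score


def coalition_value(model, instance, baseline, subset):
    """Model output when only the features in `subset` take the instance's values."""
    args = {f: (instance[f] if f in subset else baseline[f]) for f in FEATURES}
    return model(**args)


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(FEATURES)
    phi = {}
    for feature in FEATURES:
        others = [g for g in FEATURES if g != feature]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = coalition_value(model, instance, baseline, set(subset) | {feature})
                without_f = coalition_value(model, instance, baseline, set(subset))
                total += weight * (with_f - without_f)
        phi[feature] = total
    return phi


employee = {"overtime": 1, "low_income": 1, "long_commute": 0}
phi = shapley_values(toy_risk_score, employee, BASELINE)
base_value = toy_risk_score(**BASELINE)
prediction = toy_risk_score(**employee)

for name, contribution in phi.items():
    print(f"{name:>12}: {contribution:+.2f}")
print("baseline + contributions:", base_value + sum(phi.values()))
print("model prediction:        ", prediction)
```

The interaction term is split evenly between the two features involved, and a feature whose value matches the baseline receives a contribution of zero.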

In [9]:
import shap
import pandas as pd
import matplotlib.pyplot as plt
import os
from xgboost import XGBClassifier

# Load processed data
df = pd.read_csv("../data/processed_attrition.csv")
X = df.drop(columns=["Attrition"])
y = df["Attrition"]

# Re-train model (or use your tuned model if already saved)
xgb_model = XGBClassifier(learning_rate=0.1, max_depth=3, n_estimators=200, subsample=0.8, eval_metric='logloss')
xgb_model.fit(X, y)

# Create SHAP explainer
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X)

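# Make sure the output folder exists before saving any figures
os.makedirs("../visuals", exist_ok=True)

# Save SHAP summary plot (bar chart of mean |SHAP value| per feature)
plt.figure()
shap.summary_plot(shap_values, X, plot_type="bar", show=False)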
plt.xlabel("Mean Absolute SHAP Value (Impact on Attrition Prediction)", fontsize=12)
plt.ylabel("Feature", fontsize=12)
plt.title("Top Features Contributing to Employee Attrition", fontsize=14, pad=20)
plt.tight_layout()
plt.savefig("../visuals/shap_summary_plot.png")
plt.close()

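# Save an individual explanation for a single employee (e.g., index 0).
# Note: this draws a waterfall plot, not a force plot, even though the output
# file name says "force_plot". shap.plots._waterfall is a private module
# (leading underscore), so this call may change between SHAP releases.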
plt.figure()
shap.plots._waterfall.waterfall_legacy(explainer.expected_value, shap_values[0], X.iloc[0], show=False)
plt.suptitle("Employee Attrition Risk Breakdown (Example: Index 0)", fontsize=14, y=1.05)
plt.tight_layout()
plt.savefig("../visuals/shap_force_plot_example.png", bbox_inches='tight')
plt.close()

In [11]:
# Define the indices of the additional employees you'd like to analyze
example_indices = [5, 12]

# Loop through and generate one waterfall plot per employee
for i in example_indices:
    plt.figure()
    shap.plots._waterfall.waterfall_legacy(
        explainer.expected_value, shap_values[i], X.iloc[i], show=False
    )
    plt.suptitle(f"Employee Attrition Risk Breakdown (Example: Index {i})", fontsize=14, y=1.05)
    
    # Save plot to visuals folder
    filename = f"../visuals/shap_force_plot_example_{i}.png"
    plt.tight_layout()
    plt.savefig(filename, bbox_inches='tight')
    plt.close()

## SHAP Explainability: What Drives Attrition?

SHAP was used to interpret both overall and individual-level predictions from our XGBoost attrition model.

### Global Insights: Top Predictive Features
The summary plot shows that:
- **OverTime**, **StockOptionLevel**, and **MonthlyIncome** are the most important features driving attrition predictions.
- Features tied to **satisfaction**, **career progression**, and **commute** also have strong effects.
- These insights align with common drivers of employee disengagement and offer clear focus areas for HR intervention.
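The bar summary plot orders features by their mean absolute SHAP value across all rows. As a rough sketch of that ranking, the helper below accepts plain nested lists or a 2-D array of SHAP values. The function name `rank_by_mean_abs_shap` and the small demo matrix are illustrative assumptions, not output from the model.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least influential."""
    rows = [list(row) for row in shap_rows]
    if not rows:
        raise ValueError("shap_rows is empty")
    if any(len(row) != len(feature_names) for row in rows):
        raise ValueError("each row needs exactly one SHAP value per feature")
    # zip(*rows) transposes the matrix so each `column` holds one feature's values.
    means = [
        sum(abs(float(v)) for v in column) / len(rows)
        for column in zip(*rows)
    ]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)


# Made-up SHAP values for three employees and three features (illustration only).
demo_shap = [
    [0.8, -0.1, 0.3],
    [-0.6, 0.2, -0.5],
    [0.7, 0.0, 0.1],
]
demo_features = ["OverTime", "Age", "MonthlyIncome"]

ranking = rank_by_mean_abs_shap(demo_shap, demo_features)
for name, score in ranking:
    print(f"{name:>15}: {score:.3f}")
```

With the notebook's objects, `rank_by_mean_abs_shap(shap_values, list(X.columns))[:10]` should list the features at the top of the bar chart, assuming `shap_values` is the single 2-D array that `explainer.shap_values(X)` normally returns for a binary XGBoost classifier.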

### Individual Prediction Example
We analyzed one specific employee (Index 0) in detail using a SHAP waterfall plot (saved as `shap_force_plot_example.png`):
- The model predicted high attrition risk due to **heavy overtime**, **poor work-life balance**, and **limited recent training**.
- Some factors, like **job satisfaction** and **relationship satisfaction**, helped reduce that risk, but not enough.
- This kind of breakdown suits real-world use: it helps HR not only detect risk but also **understand the reasons** behind it and develop personalized retention strategies.
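For a binary XGBoost classifier explained with `shap.TreeExplainer` at its default settings, SHAP values are expressed in log-odds units, so the numbers on the plot are not probabilities. The sketch below shows one way to turn a single row of SHAP values into a predicted probability plus a short list of the factors raising and lowering risk. The helper name `explain_employee` and all demo numbers are hypothetical and are not taken from the model.

```python
import math


def explain_employee(base_value, shap_row, feature_names, top_n=3):
    """Summarize one row of SHAP values given in log-odds units."""
    log_odds = base_value + sum(shap_row)
    probability = 1.0 / (1.0 + math.exp(-log_odds))  # logistic (sigmoid) transform
    # Sort contributions from most risk-increasing to most risk-decreasing.
    contributions = sorted(
        zip(feature_names, shap_row), key=lambda pair: pair[1], reverse=True
    )
    raises_risk = [(n, v) for n, v in contributions if v > 0][:top_n]
    lowers_risk = [(n, v) for n, v in reversed(contributions) if v < 0][:top_n]
    return probability, raises_risk, lowers_risk


# Made-up numbers for illustration only.
demo_features = ["OverTime", "WorkLifeBalance", "JobSatisfaction", "Age"]
demo_shap_row = [1.4, 0.7, -0.5, 0.0]
demo_base_value = -1.0  # assumed average model output in log-odds

probability, raises_risk, lowers_risk = explain_employee(
    demo_base_value, demo_shap_row, demo_features
)
print(f"Predicted attrition probability: {probability:.2f}")
print("Raises risk:", [name for name, _ in raises_risk])
print("Lowers risk:", [name for name, _ in lowers_risk])
```

In the notebook, a call along the lines of `explain_employee(explainer.expected_value, shap_values[0], list(X.columns))` would give the equivalent summary for employee index 0, assuming `explainer.expected_value` is a scalar.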

SHAP visualizations make the model's predictions more transparent and give non-technical stakeholders a concrete basis for acting on them.