### **📌 Controlling Risks in a Credit Score Prediction Model**
To ensure that the model is **responsible, fair, and reliable**, I would implement the following **risk mitigation strategies**:

---

## **1️⃣ Bias & Fairness Risks**
💡 **Problem:** The model may discriminate against certain groups (e.g., gender, income level).  
✅ **Mitigation Strategies:**
- **Fairness Metrics**: Evaluate **Disparate Impact, Statistical Parity, and Equalized Odds**.
- **Bias Mitigation Techniques**:
  - **Reweighing** (adjust sample weights).
  - **Adversarial Debiasing** (reduce the ability of the model to infer protected attributes).
  - **Disparate Impact Remover** (modifies data to remove bias before training).
- **Monitor Fairness Over Time**: Track if bias reappears after deployment.

---

## **2️⃣ Explainability & Transparency Risks**
💡 **Problem:** Customers and regulators need to understand **why** a decision was made.  
✅ **Mitigation Strategies:**
- **SHAP (SHapley Additive Explanations)**:
  - Explain which features influenced a prediction.
  - Compare feature importance across demographic groups.
- **LIME (Local Interpretable Model-Agnostic Explanations)**:
  - Generate explanations for individual predictions.
- **Counterfactual Explanations**:
  - Show how small changes (e.g., higher credit utilization) could lead to a better score.

---

## **3️⃣ Data Privacy & Security Risks**
💡 **Problem:** Credit score models use sensitive personal and financial data.  
✅ **Mitigation Strategies:**
- **Remove Personally Identifiable Information (PII)** before training (e.g., Name, SSN).
- **Encrypt Data** in storage and transit.
- **Comply with GDPR/CCPA** regulations:
  - Allow users to **request model explanations**.
  - Provide an option for users to **contest decisions**.

---

## **4️⃣ Model Robustness & Reliability Risks**
💡 **Problem:** The model might make **wrong predictions** due to poor generalization or changing customer behavior.  
✅ **Mitigation Strategies:**
- **Monitor Model Drift**: If customer spending patterns change, retrain the model.
- **Adversarial Testing**: Check how the model reacts to **unusual edge cases** (e.g., a millionaire with high credit utilization).
- **Stress Testing**:
  - Simulate **economic downturns** to see if predictions remain stable.
  - Introduce **synthetic fraud cases** to test fraud detection capabilities.

---

## **5️⃣ Regulatory & Compliance Risks**
💡 **Problem:** Credit scoring is regulated by laws like **ECOA (Equal Credit Opportunity Act)** and **GDPR (General Data Protection Regulation)**.  
✅ **Mitigation Strategies:**
- **Fair Lending Compliance**: Ensure that **race, gender, marital status, and age** do not unfairly impact scores.
- **Automated Compliance Audits**: Generate fairness and risk reports for regulators.
- **Right to Explanation**:
  - Allow customers to receive a **detailed explanation** of their score.
  - Provide actionable insights on how to improve their score.

---

## **🚀 Next Steps**
Would you like me to:
1. **Implement model monitoring** to detect fairness drift over time?
2. **Generate counterfactual explanations** for individual customers?
3. **Improve bias mitigation techniques** with **more advanced debiasing strategies**?

Let me know how you'd like to proceed! 🔥


In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import shap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from imblearn.over_sampling import SMOTE  # Fixing class imbalance
from fairlearn.metrics import selection_rate
from aif360.sklearn.metrics import disparate_impact_ratio, statistical_parity_difference

# Load dataset
df = pd.read_csv("BankChurners.csv")

# Encode categorical features
label_encoders = {}
for col in ['Attrition_Flag', 'Gender', 'Education_Level', 'Income_Category', 'Card_Category']:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    label_encoders[col] = le
    
# Define "Credit Score" using FICO Score Approximation
fico_bins = [0, 300, 580, 670, 740, 800, 850]
fico_labels = ['Very Poor', 'Fair', 'Good', 'Very Good', 'Exceptional']
df['Estimated_FICO_Score'] = np.clip((df['Credit_Limit'] * (1 - df['Avg_Utilization_Ratio']) / 20).astype(int), 300, 850)
df['Credit_Score_Category'] = pd.cut(df['Estimated_FICO_Score'], bins=fico_bins, labels=fico_labels)
le_class = LabelEncoder()
df['Credit_Score_Category'] = le_class.fit_transform(df['Credit_Score_Category'])
print(df['Credit_Score_Category'].value_counts())

# Define Features and Target for Classification
X = df.drop(columns=['Credit_Score_Category', 'Estimated_FICO_Score'])
y_class = df['Credit_Score_Category']

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y_class, test_size=0.2, random_state=42, stratify=y_class)

# Standardizing the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Random Forest Classifier
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train_scaled, y_train)

# Convert X_test_scaled (NumPy array) back to DataFrame with original column names
X_test_scaled_df = pd.DataFrame(X_test_scaled, columns=X.columns, index=X_test.index)

# Predict using the RandomForestClassifier
y_pred = rf_clf.predict(X_test_scaled_df)

# Print Model Accuracy
print("Model Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Bias Detection Metrics
protected_attribute = df.loc[X_test.index, 'Gender']  # Fetch original values before encoding
protected_attribute = protected_attribute.map({0: "Male", 1: "Female"})  # Ensure correct mapping

# Ensure indexing is correct
protected_attribute = protected_attribute.loc[X_test.index]

# Compute Selection Rates for Male & Female
male_selection_rate = selection_rate(y_test[protected_attribute == "Male"], y_pred[protected_attribute == "Male"])
female_selection_rate = selection_rate(y_test[protected_attribute == "Female"], y_pred[protected_attribute == "Female"])

# Handle Zero Selection Rates
if male_selection_rate == 0 or female_selection_rate == 0:
    gender_disparate_impact = np.nan  # Avoid division by zero
    print("⚠️ Warning: One of the gender groups has zero selection rate.")
else:
    gender_disparate_impact = female_selection_rate / male_selection_rate

# Display Bias Metrics
print("\n🔍 Fairness Metrics:")
print("Male Selection Rate:", male_selection_rate)
print("Female Selection Rate:", female_selection_rate)
print("Gender Disparate Impact Ratio:", gender_disparate_impact)

# SHAP Explainability
explainer = shap.TreeExplainer(rf_clf)
shap_values = explainer.shap_values(X_test)

shap_importance = np.abs(shap_values).mean(axis=0).flatten()

if len(shap_importance) == len(X.columns):
    shap_importance_df = pd.DataFrame({
        "Feature": X.columns,
        "SHAP Importance": shap_importance
    }).sort_values(by="SHAP Importance", ascending=False)

    print("\n🔑 Feature Importance (SHAP Values):")
    print(shap_importance_df)
else:
    print("Error: Length of SHAP importance values does not match number of features.")

# Implement Bias mitigation using Fairlearn
from fairlearn.postprocessing import ThresholdOptimizer
from fairlearn.metrics import demographic_parity_difference

thresh_opt = ThresholdOptimizer(estimator=rf_clf, constraints="demographic_parity")
thresh_opt.fit(X_train_scaled, y_train, sensitive_features=protected_attribute)
y_pred_fair = thresh_opt.predict(X_test_scaled)

# Prediction Function for New Customer
def predict_credit_score(new_customer):
    new_customer_df = pd.DataFrame([new_customer])
    new_customer_df = new_customer_df.reindex(columns=X.columns, fill_value=0)
    for col in ['Gender', 'Education_Level', 'Income_Category', 'Card_Category']:
        new_customer_df[col] = label_encoders[col].transform(new_customer_df[col])
    new_customer_scaled = scaler.transform(new_customer_df)
    prediction = rf_clf.predict(new_customer_scaled)
    return le_class.inverse_transform([prediction[0]])[0]

# New Customer Data
new_customer_data = pd.read_csv("test_customer_data.csv")

# Predict Credit Score
predicted_scores = new_customer_data.apply(predict_credit_score, axis=1)
print("Predicted Credit Score Category:", predicted_scores)


  vect_normalized_discounted_cumulative_gain = vmap(
  monte_carlo_vect_ndcg = vmap(vect_normalized_discounted_cumulative_gain, in_dims=(0,))


ValueError: Bin labels must be one fewer than the number of bin edges