# 10 - Bias Demonstration Notebook  

Evaluate and demonstrate bias in the model predictions using metrics, statistical tests, and explainability techniques. This includes GoodFit rate analysis across demographics, confusion matrices for subgroups, calibration checks, and SHAP analysis for feature contributions.

In [66]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
import shap
import scipy.stats as stats
import pickle
import json

In [67]:
# Display all rows and columns
pd.set_option('display.max_colwidth', None)  # Show full content in each cell
pd.set_option('display.max_rows', None)      # Show all rows
pd.set_option('display.max_columns', None)   # Show all columns

In [68]:
MODEL_PATH: str = "../models/xgb_model.pkl"
FEATURE_LIST_PATH: str = "../models/features.json"

In [69]:
df = pd.read_parquet("../app/data/static_data.parquet")

In [70]:
# Load the model
with open(MODEL_PATH, "rb") as f:
    model = pickle.load(f)

In [71]:
# Function to load the feature list
def load_feature_list(feature_list_path: str) -> list:
    """
    Load the feature list from a JSON file.

    Parameters:
        feature_list_path (str): Path to the feature list file.

    Returns:
        list: List of feature names.
    """
    with open(feature_list_path, "r") as f:
        return json.load(f)

In [72]:
# Load the feature list
feature_list: list = load_feature_list(FEATURE_LIST_PATH)

In [73]:
# Filter the dataset to include only relevant features
data_filtered: pd.DataFrame = df[feature_list]

In [74]:
# Ensure numeric data for all columns in data_filtered
data_filtered = data_filtered.apply(pd.to_numeric, errors="coerce")
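
Because `errors="coerce"` silently turns unparseable values into `NaN`, it can help to count how many missing values the conversion introduced before predicting. The following is a minimal sketch on made-up data; the `raw` frame and its values are illustrative assumptions, not part of the notebook's dataset.

```python
import pandas as pd

# Toy frame (hypothetical values) with one unparseable entry in "Age"
raw = pd.DataFrame({
    "Age": ["50", "42", "unknown"],
    "YearsExperience": [11, 10, 12],
})

# Same conversion as in the cell above
numeric = raw.apply(pd.to_numeric, errors="coerce")

# NaNs present after coercion minus NaNs present before it
introduced = numeric.isna().sum() - raw.isna().sum()
print(introduced[introduced > 0])
```

A non-empty result points at columns whose values were dropped by the coercion and may deserve a closer look.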

In [75]:
# Make predictions
predictions: np.ndarray = model.predict_proba(data_filtered)[:, 1]  # Probability of being a good fit

In [76]:
# Add predictions and "Good Fit" label to the dataset
df["Prediction_Probability"] = predictions
df["GoodFit"] = df["Prediction_Probability"] >= 0.5

In [77]:
df.head()

Unnamed: 0,Candidate_ID,Position_IT Support,Position_Production Technician I,Position_Area Sales Manager,Position_Production Manager,Position_Production Technician II,Position_Sales Manager,Position_Enterprise Architect,Position_Network Engineer,Position_Sr. Network Engineer,Position_Database Administrator,Position_Data Analyst,Position_Software Engineer,Position_Sr. DBA,Position_Sr. Accountant,Position_Administrative Assistant,Position_Accountant I,Position_Shared Services Manager,Position_IT Director,Position_CIO,Position_Principal Data Architect,Position_IT Manager - DB,Position_IT Manager - Support,Position_IT Manager - Infra,Position_BI Developer,Position_Senior BI Developer,Position_Data Architect,Position_BI Director,Position_Director of Sales,Position_Director of Operations,Position_Software Engineering Manager,Position_President & CEO,State,Sex,CitizenDesc_US Citizen,CitizenDesc_Eligible NonCitizen,CitizenDesc_Non-Citizen,HispanicLatino,RaceDesc_White,RaceDesc_Black or African American,RaceDesc_Asian,RaceDesc_American Indian or Alaska Native,RaceDesc_Hispanic,RaceDesc_Two or more races,Department_IT/IS,Department_Production,Department_Sales,Department_Software Engineering,Department_Admin Offices,Department_Executive Office,Age,YearsExperience,AgeGroup,ExperienceCategory,Education,Advanced Backup Strategies,Advanced Budget Forecasting,Advanced CRM Tools,Advanced Data Modeling,Advanced Data Visualization,Advanced Financial Reporting,Advanced Firewall Configurations,Advanced ITSM Tools,Advanced Machinery Maintenance,Advanced Machinery Troubleshooting,Advanced Network Configuration,Advanced Predictive Modeling,Advanced Revenue Analysis,Advanced SQL Optimization,Advanced Troubleshooting Techniques,Advanced Visualization,Agile Development Leadership,Audit Assistance,Audit Management,Backup Strategies,Backup and Recovery,Basic Accounting,Basic Machinery Maintenance,Big Data Architecture,Big Data Solutions,Budget Oversight,Budget Planning,Budget Strategy,Business 
Intelligence Strategy,Business Intelligence Tools,Business-IT Alignment,CI/CD Pipeline Management,Cloud Data Management,Cloud Data Solutions,Cloud Database Solutions,Cloud Integration,Cloud Networking,Cloud Strategy,Cloud-Native Data Architectures,Code Review Practices,Competitor Analysis,Cost Reduction Techniques,Customer Communication,Customer Relationship Management,Customer Retention,Customer Support,Customer Support Strategies,Cybersecurity Oversight,Dashboard Creation,Data Governance,Data Lake Architecture,Data Modeling,Data Pipeline Optimization,Data Pipeline Scalability,Data Security,Data Visualization,Database Design,Database Management,Database Tuning,Disaster Recovery Planning,Distributed Database Management,Document Management,ETL Automation,ETL Development,ETL Optimization,Efficiency Optimization,Enterprise Data Strategy,Financial Management,Financial Reporting,Firewall Expertise,Firewall Management,Forensic Accounting Techniques,Governance and Standards,Hardware Maintenance,Hardware Management,Hybrid Cloud Infrastructure Management,IT Governance,IT Security Oversight,IT Support Management,Incident Response Planning,Infrastructure Design,Java,Leadership,Leadership Skills,Lean Manufacturing,Machine Learning,Machine Learning Integration,Market Analysis,Microservices Architecture Design,Negotiation,Network Configuration,Network Management,Network Performance Optimization,Network Security Design,Office Coordination,Operations Performance Metrics,Operations Strategy,Performance Tuning,Predictive Analytics Integration,Preventive Maintenance Planning,Problem Identification,Problem-Solving,Process Improvement,Process Optimization,Production Line Efficiency Analysis,Public Relations,Python,Quality Assurance,QuickBooks,Real-Time Data Processing,Revenue Optimization,Risk Assessment,SD-WAN Deployment,SQL,SQL Optimization,Safety Protocols,Sales Funnel Optimization,Sales Strategy,Scheduling,Service Delivery Optimization,Software Design,Solution 
Architecture,Statistical Analysis,Strategic IT Investment Planning,Strategic Planning,Strategic Vision,Supply Chain Optimization,System Architecture,System Architecture Design,System Architecture Oversight,System Troubleshooting,System Upgrades,Tax Planning,Tax Preparation,Team Coordination,Team Leadership,Team Management,Teamwork,Technology Roadmap Development,Troubleshooting,Troubleshooting Oversight,VPN Setup,Vendor Management,AWS Certified Advanced Networking,AWS Certified Big Data Specialty,AWS Certified Database Specialty,AWS Certified Developer - Associate,AWS Certified Solutions Architect,Administrative Excellence Certification,Advanced Machinery Maintenance Certification,Basic Safety Certification,Certified Information Systems Security Professional (CISSP),Certified Kubernetes Administrator,Certified Leadership Professional,Certified Public Accountant (CPA),Chartered Financial Analyst (CFA),Cisco CCNA,Cisco CCNP,CompTIA A+,CompTIA Server+,Firewall Specialist Certification,Google Cloud Professional Data Engineer,Google Cloud Professional Developer,Google Data Analytics Professional Certificate,ITIL Expert,ITIL Foundation,Lean Manufacturing Certification,Microsoft Certified: Azure Administrator Associate,Microsoft Certified: Azure Database Administrator Associate,Microsoft Certified: Azure Fundamentals,Microsoft Power BI Data Analyst,Negotiation Specialist Certification,OSHA Certification,Oracle Certified Associate,Project Management Professional (PMP),QuickBooks Certified,Revenue Optimization Specialist Certification,Salesforce Certified,Salesforce Certified Administrator,Six Sigma Black Belt,Six Sigma Green Belt,TOGAF Certified,Tableau Desktop Certified Professional,Tableau Desktop Specialist,Employee_Name,Birthplace,Role,Technical_Skills,Certifications_Score,Prediction_Probability,GoodFit
0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,50,11,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Maurice, Shana",China,Production Technician I,3.0,0.0,0.997927,True
1,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,42,10,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,"Cobb, Rowan",USA,Production Technician I,3.0,5.0,0.999997,True
2,3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,1,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,60,11,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,"Kramer, Kason",UK,Production Technician I,4.0,5.0,0.014106,False
3,4,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,56,12,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"Johns, Marquis",China,Production Technician I,4.0,2.5,1.0,True
4,5,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,56,12,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,"Mcdowell, Maximus",Canada,Production Technician I,4.0,5.0,0.958944,True


## Bias Metrics to Calculate


In [78]:
# Define demographic groups for analysis
demographic_columns = ['Sex', 'RaceDesc_White', 'RaceDesc_Black or African American', 'RaceDesc_Asian', 'Age']

**Calculate Demographic Parity**

In [79]:
# GoodFit rates by demographic groups
for col in demographic_columns:
    print(f"GoodFit Rates for {col}:")
    print(df.groupby(col)['GoodFit'].mean())
    print("\n")


GoodFit Rates for Sex:
Sex
0    0.812030
1    0.795229
Name: GoodFit, dtype: float64


GoodFit Rates for RaceDesc_White:
RaceDesc_White
0    0.917763
1    0.765046
Name: GoodFit, dtype: float64


GoodFit Rates for RaceDesc_Black or African American:
RaceDesc_Black or African American
0    0.783231
1    0.915789
Name: GoodFit, dtype: float64


GoodFit Rates for RaceDesc_Asian:
RaceDesc_Asian
0    0.792220
1    0.921053
Name: GoodFit, dtype: float64


GoodFit Rates for Age:
Age
32    1.000000
33    0.909091
34    1.000000
35    0.510638
36    0.960784
37    0.950000
38    0.800000
39    0.756757
40    0.660377
41    0.823529
42    0.948276
43    0.804878
44    0.733333
45    0.915493
46    0.653061
47    0.790323
48    0.857143
49    0.928571
50    0.810127
51    0.625000
52    0.636364
54    0.679245
55    0.960000
56    0.971429
57    0.838710
58    0.909091
59    0.500000
60    0.461538
63    0.625000
65    1.000000
66    1.000000
69    1.000000
72    1.000000
74    1.000000
Name: GoodFit, dtype: float64
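
The per-group rates above can be condensed into two common summary numbers: the demographic parity difference (highest minus lowest selection rate) and the disparate impact ratio (lowest divided by highest). The sketch below assumes a boolean outcome column; the helper name `parity_summary` and the toy data are hypothetical and only illustrate the calculation.

```python
import pandas as pd

def parity_summary(frame: pd.DataFrame, group_col: str, outcome_col: str = "GoodFit") -> dict:
    """Summarise selection-rate gaps between the groups in `group_col`."""
    rates = frame.groupby(group_col)[outcome_col].mean()
    return {
        "rates": rates.to_dict(),
        # Demographic parity difference: highest minus lowest selection rate
        "parity_difference": float(rates.max() - rates.min()),
        # Disparate impact ratio: lowest divided by highest selection rate
        "impact_ratio": float(rates.min() / rates.max()),
    }

# Toy data (made up for illustration, not taken from the notebook's dataset)
toy = pd.DataFrame({
    "group": [0, 0, 0, 0, 1, 1, 1, 1],
    "GoodFit": [True, True, True, False, True, True, False, False],
})
summary = parity_summary(toy, "group")
print(summary)
```

A ratio below 0.8 is a widely used screening threshold (the "four-fifths rule"), though it is a heuristic rather than a statistical test.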

In [80]:
def _chi2_test(df, column, target):
    """
    Perform a chi-squared test for independence between a categorical column and target variable.
    
    Args:
        df (pd.DataFrame): The dataframe containing the data.
        column (str): The categorical column to test.
        target (str): The target variable (binary).
    
    Returns:
        dict: Chi-squared test results with Chi2 value, P-value, and Degrees of Freedom.
    """
    contingency_table = pd.crosstab(df[column], df[target])
    chi2, p, dof, expected = stats.chi2_contingency(contingency_table)
    return {"Chi2": chi2, "P-Value": p, "Degrees of Freedom": dof}


# Perform chi-squared tests for specified columns
columns_to_test = ["Sex", "RaceDesc_White", "RaceDesc_Black or African American", "RaceDesc_Asian"]
results = {col: _chi2_test(df, col, "GoodFit") for col in columns_to_test}

results


{'Sex': {'Chi2': np.float64(0.4132015620969667),
  'P-Value': np.float64(0.5203489755644433),
  'Degrees of Freedom': 1},
 'RaceDesc_White': {'Chi2': np.float64(32.41921524135085),
  'P-Value': np.float64(1.242517679314792e-08),
  'Degrees of Freedom': 1},
 'RaceDesc_Black or African American': {'Chi2': np.float64(16.96077187727458),
  'P-Value': np.float64(3.8160183270674855e-05),
  'Degrees of Freedom': 1},
 'RaceDesc_Asian': {'Chi2': np.float64(10.064078305767515),
  'P-Value': np.float64(0.0015118818166870406),
  'Degrees of Freedom': 1}}
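
A p-value alone does not say how large an association is. One possible complement, sketched here under the assumption that an effect size is wanted, is Cramér's V together with the smallest expected cell count. The function name `chi2_with_effect_size` and the toy data are hypothetical; note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so V is computed from the corrected statistic.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

def chi2_with_effect_size(frame: pd.DataFrame, column: str, target: str) -> dict:
    """Chi-squared test plus Cramér's V and the smallest expected cell count."""
    table = pd.crosstab(frame[column], frame[target])
    chi2, p, dof, expected = stats.chi2_contingency(table)
    n = table.to_numpy().sum()          # total number of observations
    k = min(table.shape) - 1            # smaller table dimension minus one
    cramers_v = float(np.sqrt(chi2 / (n * k))) if k > 0 else float("nan")
    return {
        "Chi2": float(chi2),
        "P-Value": float(p),
        "Degrees of Freedom": int(dof),
        "Cramers V": cramers_v,
        "Min Expected Count": float(expected.min()),
    }

# Toy data (made up): 40 candidates per group with different selection rates
toy = pd.DataFrame({
    "group": [0] * 40 + [1] * 40,
    "GoodFit": [True] * 30 + [False] * 10 + [True] * 20 + [False] * 20,
})
result = chi2_with_effect_size(toy, "group", "GoodFit")
print(result)
```

Reporting the minimum expected count makes it easier to spot tables where the chi-squared approximation is questionable.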

**Analyze Equality of Opportunity**

In [81]:
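# GoodFit rates among "qualified" candidates.
# YearsExperience > 5 is only a proxy for qualification; a strict equality-of-opportunity
# check would condition on ground-truth outcomes rather than on a feature threshold.
qualified = df[df['YearsExperience'] > 5]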
for col in demographic_columns:
    print(f"GoodFit Rates for Qualified Candidates ({col}):")
    print(qualified.groupby(col)['GoodFit'].mean())
    print("\n")

GoodFit Rates for Qualified Candidates (Sex):
Sex
0    0.812030
1    0.795229
Name: GoodFit, dtype: float64


GoodFit Rates for Qualified Candidates (RaceDesc_White):
RaceDesc_White
0    0.917763
1    0.765046
Name: GoodFit, dtype: float64


GoodFit Rates for Qualified Candidates (RaceDesc_Black or African American):
RaceDesc_Black or African American
0    0.783231
1    0.915789
Name: GoodFit, dtype: float64


GoodFit Rates for Qualified Candidates (RaceDesc_Asian):
RaceDesc_Asian
0    0.792220
1    0.921053
Name: GoodFit, dtype: float64


GoodFit Rates for Qualified Candidates (Age):
Age
32    1.000000
33    0.909091
34    1.000000
35    0.510638
36    0.960784
37    0.950000
38    0.800000
39    0.756757
40    0.660377
41    0.823529
42    0.948276
43    0.804878
44    0.733333
45    0.915493
46    0.653061
47    0.790323
48    0.857143
49    0.928571
50    0.810127
51    0.625000
52    0.636364
54    0.679245
55    0.960000
56    0.971429
57    0.838710
58    0.909091
59    0.500000
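
Equality of opportunity is usually defined as equal true positive rates across groups, which requires ground-truth labels. The sketch below assumes such a label exists; the column name `Actual_GoodFit`, the helper `true_positive_rates`, and the toy data are hypothetical and do not appear in the notebook's dataset.

```python
import pandas as pd

def true_positive_rates(frame: pd.DataFrame, group_col: str,
                        label_col: str, pred_col: str) -> pd.Series:
    """Share of actual positives that the model also predicts positive, per group."""
    positives = frame[frame[label_col] == 1]       # keep only truly positive rows
    return positives.groupby(group_col)[pred_col].mean()

# Toy data (made up for illustration)
toy = pd.DataFrame({
    "group":          [0, 0, 0, 0, 1, 1, 1, 1],
    "Actual_GoodFit": [1, 1, 1, 0, 1, 1, 0, 0],   # hypothetical ground truth
    "GoodFit":        [1, 1, 0, 1, 1, 0, 0, 1],   # model prediction
})
tpr = true_positive_rates(toy, "group", "Actual_GoodFit", "GoodFit")
print(tpr)
print("TPR gap:", tpr.max() - tpr.min())
```

A TPR gap near zero is what equality of opportunity asks for; the experience threshold used in the cell above approximates this only if experience is a good stand-in for the true outcome.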

In [82]:
# Calculate GoodFit rates for each demographic column
# (Age is reported per individual age here; binned rates are computed separately)
print("GoodFit Rates by Demographic Group")
for col in demographic_columns:
    print(f"GoodFit Rates for {col}:")
    print(df.groupby(col)['GoodFit'].mean())
    print("\n")

GoodFit Rates by Demographic Group
GoodFit Rates for Sex:
Sex
0    0.812030
1    0.795229
Name: GoodFit, dtype: float64


GoodFit Rates for RaceDesc_White:
RaceDesc_White
0    0.917763
1    0.765046
Name: GoodFit, dtype: float64


GoodFit Rates for RaceDesc_Black or African American:
RaceDesc_Black or African American
0    0.783231
1    0.915789
Name: GoodFit, dtype: float64


GoodFit Rates for RaceDesc_Asian:
RaceDesc_Asian
0    0.792220
1    0.921053
Name: GoodFit, dtype: float64


GoodFit Rates for Age:
Age
32    1.000000
33    0.909091
34    1.000000
35    0.510638
36    0.960784
37    0.950000
38    0.800000
39    0.756757
40    0.660377
41    0.823529
42    0.948276
43    0.804878
44    0.733333
45    0.915493
46    0.653061
47    0.790323
48    0.857143
49    0.928571
50    0.810127
51    0.625000
52    0.636364
54    0.679245
55    0.960000
56    0.971429
57    0.838710
58    0.909091
59    0.500000
60    0.461538
63    0.625000
65    1.000000
66    1.000000
69    1.000000
72  

In [83]:
# For Age, create bins for a categorical analysis
age_bins = [20, 30, 40, 50, 60, 70, 80]  # adjust the bins as appropriate
age_labels = ["20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]
df["Age_Bin"] = pd.cut(df["Age"], bins=age_bins, labels=age_labels, right=False)

In [84]:
print("GoodFit Rates for Age Bins:")
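# observed=False keeps empty bins (e.g. 20-29) in the output and makes the groupby default explicit
print(df.groupby("Age_Bin", observed=False)["GoodFit"].mean())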
print("\n")

GoodFit Rates for Age Bins:
Age_Bin
20-29         NaN
30-39    0.816254
40-49    0.813592
50-59    0.774194
60-69    0.795918
70-79    1.000000
Name: GoodFit, dtype: float64
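
With `right=False`, each bin includes its left edge and excludes its right edge, which is what makes the labels "30-39", "40-49", and so on accurate. A small check of the boundary behaviour, using made-up ages, could look like this:

```python
import pandas as pd

# Same edges and labels as in the binning cell above
age_bins = [20, 30, 40, 50, 60, 70, 80]
age_labels = ["20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]

# Boundary ages chosen for illustration; 80 falls outside the last bin [70, 80)
ages = pd.Series([29, 30, 39, 40, 79, 80])
binned = pd.cut(ages, bins=age_bins, labels=age_labels, right=False)
print(binned.tolist())
```

Ages outside the bin edges become `NaN` and are silently excluded from grouped rates, so the edges should cover the full age range in the data.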




In [86]:
# Perform chi-squared tests for each of these columns against GoodFit
chi2_results = {col: _chi2_test(df, col, "GoodFit") for col in columns_to_test}

print("Chi-Squared Test Results")
for col, result in chi2_results.items():
    print(f"{col}: {result}")

Chi-Squared Test Results
Sex: {'Chi2': np.float64(0.4132015620969667), 'P-Value': np.float64(0.5203489755644433), 'Degrees of Freedom': 1}
RaceDesc_White: {'Chi2': np.float64(32.41921524135085), 'P-Value': np.float64(1.242517679314792e-08), 'Degrees of Freedom': 1}
RaceDesc_Black or African American: {'Chi2': np.float64(16.96077187727458), 'P-Value': np.float64(3.8160183270674855e-05), 'Degrees of Freedom': 1}
RaceDesc_Asian: {'Chi2': np.float64(10.064078305767515), 'P-Value': np.float64(0.0015118818166870406), 'Degrees of Freedom': 1}


- **Sex**:  
  In our dataset, sex is encoded as 0 for Female and 1 for Male. The GoodFit rate is approximately 81.2% for Females and 79.5% for Males. The chi-squared test (p ≈ 0.520) indicates that this difference is not statistically significant, suggesting little evidence of bias based on sex.

- **RaceDesc_White**:  
  Candidates with RaceDesc_White = 0 (non-White) have a GoodFit rate of about 91.8%, while those with RaceDesc_White = 1 (White) have a rate of roughly 76.5%. This difference is highly significant (p ≈ 1.24e-08), indicating potential bias.

- **RaceDesc_Black or African American**:  
  The GoodFit rate is around 78.3% for candidates with a value of 0 (non-Black) and 91.6% for those with a value of 1 (Black). The chi-squared test (p ≈ 3.82e-05) suggests that this difference is statistically significant, indicating potential bias.

- **RaceDesc_Asian**:  
  Candidates with RaceDesc_Asian = 0 (non-Asian) have a GoodFit rate of about 79.2%, compared to approximately 92.1% for candidates with RaceDesc_Asian = 1 (Asian). The difference is statistically significant (p ≈ 0.00151), suggesting potential bias.

- **Age**:  
  GoodFit rates vary substantially by age. At the level of individual ages, rates range from about 46.2% (age 60) to 100% for several ages (e.g., 32, 34, 65, 66, 69, 72, 74); ages 35 and 59 also sit near 50%. When grouped into bins, the average GoodFit rates are roughly 81.6% for ages 30–39, 81.4% for ages 40–49, 77.4% for ages 50–59, 79.6% for ages 60–69, and 100% for ages 70–79. Although the differences across age bins are less pronounced than those seen in some racial subgroups, this variability suggests that age may also influence model predictions and warrants further investigation.

*Note: These metrics were recalculated after removing the HispanicLatino attribute from the dataset.*
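
The age-bin rates discussed above were not tested for significance. One way to follow up, sketched here as an assumption rather than as the notebook's method, is to run the same chi-squared test on the binned column after dropping empty bins and to count cells with an expected frequency below 5, a common rule of thumb for when the approximation becomes unreliable. The helper `chi2_for_binned` and the toy data are hypothetical.

```python
import pandas as pd
import scipy.stats as stats

def chi2_for_binned(frame: pd.DataFrame, bin_col: str, target: str) -> dict:
    """Chi-squared test for a categorical (binned) column, ignoring empty bins."""
    # Empty bins would produce zero expected counts, so remove them first
    bins = frame[bin_col].cat.remove_unused_categories()
    table = pd.crosstab(bins, frame[target])
    chi2, p, dof, expected = stats.chi2_contingency(table)
    return {
        "Chi2": float(chi2),
        "P-Value": float(p),
        "Degrees of Freedom": int(dof),
        "Cells Below 5": int((expected < 5).sum()),
    }

# Toy data (made up): three populated bins and one empty bin ("20-29")
toy = pd.DataFrame({
    "Age_Bin": pd.Categorical(
        ["30-39"] * 10 + ["40-49"] * 10 + ["50-59"] * 10,
        categories=["20-29", "30-39", "40-49", "50-59"],
    ),
    "GoodFit": [True] * 8 + [False] * 2 + [True] * 5 + [False] * 5 + [True] * 6 + [False] * 4,
})
age_result = chi2_for_binned(toy, "Age_Bin", "GoodFit")
print(age_result)
```

If many cells fall below the threshold, merging sparse bins (for example 60-69 and 70-79) before testing is one option.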