# CS108/212 STAT108/212 W25 Course Project

### Team Details

- Teammate 1: Kyle Russell
- Teammate 2: Arhum Shahid 
- Teammate 3: Seona Magdum 

---


# Milestone: Mitigating Bias
For this project milestone, each teammate will implement a bias mitigation strategy and compare model performance before and after mitigation.

# Installs

In [3]:
# This is the same dataset we worked on in Discussion 3
!pip install ucimlrepo
!pip install imbalanced-learn



# Imports

In [5]:
# Dataset imports
from ucimlrepo import fetch_ucirepo
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import collections

# For Preprocessing
from sklearn.preprocessing import StandardScaler

# sklearn imports
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Loading dataset
_(same as previous milestone, copy-paste)_

In [7]:
## Extract data in the same way we did in Week 3 (Do this only once when coding as it sometimes takes a while)
statlog_german_credit_data = fetch_ucirepo(id=144)

In [8]:
## Use metadata and variables to find information about different attributes
# print(statlog_german_credit_data.metadata)
# print(statlog_german_credit_data.variables)

# Feature and target labels
X = statlog_german_credit_data.data.features
y = statlog_german_credit_data.data.targets

# Extract sensitive features from attributes
gender = X["Attribute9"].apply(lambda x: "Male" if x in ["A91", "A93", "A94"] else "Female")
age_binary = X["Attribute13"].apply(lambda x: "Old" if x >= 25 else "Young")

# Create a mapping to map what sensitive feature class it belongs to
group_labels = (gender == "Male").astype(int) + (age_binary == "Old").astype(int) * 2

# Print some stats
print(f"No. of samples: {X.shape[0]}")
print(f"No. of features: {X.shape[1]}")
print(f"Group Counts: {dict(collections.Counter(group_labels))}")

No. of samples: 1000
No. of features: 20
Group Counts: {3: 625, 0: 84, 2: 226, 1: 65}


**Notes for Group label Meanings**
- 0: maps to (Female, Young)
- 1: maps to (Male, Young)
- 2: maps to (Female, Old)
- 3: maps to (Male, Old)
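
The integer labels above follow from treating gender as the low bit and age as the high bit of a two-bit code. A minimal sketch of that encoding and its inverse is shown below; the helper names `encode_group` and `decode_group` are illustrative and are not part of the notebook.

```python
def encode_group(gender: str, age: str) -> int:
    """Map (gender, age) to 0..3: gender is the low bit, age is the high bit."""
    return int(gender == "Male") + 2 * int(age == "Old")


def decode_group(label: int) -> tuple[str, str]:
    """Invert encode_group for labels 0..3."""
    gender = "Male" if label % 2 == 1 else "Female"
    age = "Old" if label // 2 == 1 else "Young"
    return gender, age


# Print the full mapping so it can be compared with the notes above.
for label in range(4):
    print(label, decode_group(label))
```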

# Preparing dataset
_(same as previous milestone, copy-paste)_

In [11]:
# Some subset of following dataset preparation steps may be necessary depending on your dataset,
# 1. Drop unnecessary features
# 2. Handle missing data
# 3. Encode categorical features
# 4. Normalize numerical features
# 5. Encode target (if your task is classification)

# No unnecessary features or missing data in German Credit Dataset
# Encode special protected categories
X_dropped = X.drop(columns = ["Attribute9", "Attribute13"]).copy()

X_protect = X[["Attribute9", "Attribute13"]].copy()
X_protect["Attribute9"] = X_protect["Attribute9"].apply(lambda x: 1 if x in ["A91", "A93", "A94"] else 0).astype(bool)
X_protect["Attribute13"] = X_protect["Attribute13"].apply(lambda x: 1 if x >= 25 else 0).astype(bool)

# Encode remaining categorical features
X_dropped = pd.get_dummies(X_dropped, drop_first=True) # Drop the first level of each one-hot encoding to avoid multicollinearity
X_encoded = pd.concat([X_dropped, X_protect], axis=1)

# Normalize numerical features
num_cols = X_encoded.select_dtypes(include=['number']).columns
scaler = StandardScaler() # Converts to z scores (how many sd a value is from the mean)
X_encoded[num_cols] = scaler.fit_transform(X_encoded[num_cols])

# Encode the target variable
X_cleaned = X_encoded.copy()
y_cleaned = y["class"].apply(lambda x: 1 if x == 1 else 0).copy() # Note that class defines 1 as good and 2 as bad

# Note: X and y have been modified before the following lines of code!
print(f"No. of samples AFTER cleaning: {X_cleaned.shape[0]}")
assert X_cleaned.shape[0] == y_cleaned.shape[0] == group_labels.shape[0] ## Ensure that the target and group_labels have been updated if some samples were removed during cleaning.
print(f"No. of features AFTER encoding: {X_cleaned.shape[1]}")

No. of samples AFTER cleaning: 1000
No. of features AFTER encoding: 46
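
The z-score normalization applied by `StandardScaler` in the cell above can be written out by hand. The sketch below uses hypothetical values and illustrative helper names (`fit_standardizer`, `standardize`). It also learns the mean and standard deviation from a training subset only and reuses them on held-out values, which is one common way to keep test data out of the fitted statistics; the notebook cell itself fits the scaler on the full dataset before splitting.

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Return (mean, population standard deviation) learned from the values."""
    return fmean(values), pstdev(values)


def standardize(values, mean, std):
    """Convert values to z-scores using previously learned statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical loan durations in months (not taken from the dataset).
train_durations = [6, 12, 24, 36, 48]
test_durations = [18, 60]

mean, std = fit_standardizer(train_durations)
train_z = standardize(train_durations, mean, std)
test_z = standardize(test_durations, mean, std)  # reuse the training statistics

print([round(z, 3) for z in train_z])
print([round(z, 3) for z in test_z])
```

After the transform, the training values have mean 0 and standard deviation 1, while the held-out values are expressed on the same scale.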


# Getting training and testing sets

Note: Train-test split is made **ONCE** to obtain the _training set_ and the _testing set_ and every teammate will use the training set to train their baseline model and test the trained model using the testing set. **NEVER** modify the testing set once it has been created.
Therefore, the following code cell does not need to be edited.

_(same as previous milestone, copy-paste)_

In [13]:
X_train, X_test, \
y_train, y_test, \
group_labels_train, group_labels_test = train_test_split(X_cleaned, y_cleaned, group_labels, test_size=0.2, random_state=42)

print(f"No. of training samples: {X_train.shape[0]}")
print(f"No. of testing samples: {X_test.shape[0]}")

# X_cleaned and y_cleaned are copies of the original data, so the originals are no longer needed.
# Delete X, y and group_labels to make sure they are not used later on.
del X
del y
del group_labels

No. of training samples: 800
No. of testing samples: 200


# Setting up evaluation metrics
Note: The same evaluation function will be used by all teammates.

_(same as previous milestone, copy-paste)_

In [15]:
def evaluate_model(y_test, y_pred, g_labels):
    
    """
    Evaluate the performance of your trained model on the testing set.
  
    Parameters
    ----------
    y_test : array-like
        The true labels of the testing set.
    y_pred : array-like
        The predicted labels of the testing set.
    g_labels : array-like
        The group labels of the testing set.
  
    Returns
    -------
    results : dict
        A dictionary containing the evaluation results.
  
        Example:
          For classification task, the task-specific performance metrics like {'accuracy': <value>, 'f1_score': <value>, ...}
          and fairness metrics like {'demographic_parity': <value>, 'equalized_odds': <value>, ...}.
  
    """
    results = {}
  
    # Note: These metrics will be calculated for - 1. the full testing set, 2. individual groups.
    # Task-specific performance metrics
    TP = np.sum((y_test == 1) & (y_pred == 1))
    TN = np.sum((y_test == 0) & (y_pred == 0))
    FP = np.sum((y_test == 0) & (y_pred == 1))
    FN = np.sum((y_test == 1) & (y_pred == 0)) 

    results["accuracy"] = (TP + TN) / (TP + TN + FP + FN)
    results["precision"] = TP / (TP + FP)
    results["recall"] = TP / (TP + FN)
    
    # Fairness metric
    unique_groups = np.unique(g_labels) # 0, 1, 2, 3
    f_m = {}

    for group in unique_groups:
        y_test_g = y_test[g_labels == group]
        y_pred_g = y_pred[g_labels == group]
        
        TP = np.sum((y_test_g == 1) & (y_pred_g == 1))   
        TN = np.sum((y_test_g == 0) & (y_pred_g == 0))
        FP = np.sum((y_test_g == 0) & (y_pred_g == 1))
        FN = np.sum((y_test_g == 1) & (y_pred_g == 0))

        f_m[group] = {
            "d_p": (TP + FP) / (TP + TN + FP + FN), # Demographic parity
            "a_p": (TP + TN) / (TP + TN + FP + FN), # Accuracy parity (accuracy for each group)
            "p_p": TP / (TP + FP), # Predictive parity
            "e_e": TP / (TP + FN), # Equal Opportunity
        }
        
    """    
    For a single result, we will use the ratio between genders and age for each metric (closer to 1 -> more fair)
    Note: each group corresponds to the categories
    
    0: maps to (Female, Young)
    1: maps to (Male, Young)
    2: maps to (Female, Old)
    3: maps to (Male, Old)

    Gender will take the ratio between Female : Male
    Age will take the ratio between Young : Old
    """

    # ratio < 1: the metric is lower for the Female groups than for the Male groups (and vice versa)
    results["demographic_parity_gender_ratio"] = (f_m[0]["d_p"] + f_m[2]["d_p"]) / (f_m[1]["d_p"] + f_m[3]["d_p"])
    results["accuracy_parity_gender_ratio"] = (f_m[0]["a_p"] + f_m[2]["a_p"]) / (f_m[1]["a_p"] + f_m[3]["a_p"])
    results["predictive_parity_gender_ratio"] = (f_m[0]["p_p"] + f_m[2]["p_p"]) / (f_m[1]["p_p"] + f_m[3]["p_p"])
    results["equal_opportunity_gender_ratio"] = (f_m[0]["e_e"] + f_m[2]["e_e"]) / (f_m[1]["e_e"] + f_m[3]["e_e"])

    # ratio < 1: the metric is lower for the Young groups than for the Old groups (and vice versa)
    results["demographic_parity_age_ratio"] = (f_m[0]["d_p"] + f_m[1]["d_p"]) / (f_m[2]["d_p"] + f_m[3]["d_p"])
    results["accuracy_parity_age_ratio"] = (f_m[0]["a_p"] + f_m[1]["a_p"]) / (f_m[2]["a_p"] + f_m[3]["a_p"])
    results["predictive_parity_age_ratio"] = (f_m[0]["p_p"] + f_m[1]["p_p"]) / (f_m[2]["p_p"] + f_m[3]["p_p"])
    results["equal_opportunity_age_ratio"] = (f_m[0]["e_e"] + f_m[1]["e_e"]) / (f_m[2]["e_e"] + f_m[3]["e_e"])

    return results
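
The gender and age ratios returned by `evaluate_model` divide sums of per-group rates. The sketch below works one such ratio through with hypothetical selection rates. The `safe_div` helper is an illustrative addition, not part of the notebook: it returns NaN explicitly when a denominator is zero, for example when a group receives no positive predictions.

```python
import math


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def group_ratio(rates: dict, numerator_groups, denominator_groups) -> float:
    """Ratio of summed per-group rates, mirroring the formulas in evaluate_model."""
    top = sum(rates[g] for g in numerator_groups)
    bottom = sum(rates[g] for g in denominator_groups)
    return safe_div(top, bottom)


# Hypothetical selection rates P(y_pred = 1) per group (labels 0..3 as defined earlier).
selection_rate = {0: 0.50, 1: 0.60, 2: 0.70, 3: 0.80}

gender_ratio = group_ratio(selection_rate, [0, 2], [1, 3])  # Female : Male
age_ratio = group_ratio(selection_rate, [0, 1], [2, 3])     # Young : Old

print(f"gender ratio: {gender_ratio:.3f}")
print(f"age ratio: {age_ratio:.3f}")
```

Because each side sums exactly two group rates, every ratio equals the ratio of unweighted mean rates, so a small group counts as much as a large one.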

# Training baseline models (INDIVIDUAL CONTRIBUTION)
_(minor modifications from previous milestone)_

In [17]:
## A place to save all teammates' baseline results
all_baseline_results = [] ## DO NOT EDIT

## Teammate 1: Kyle Russell

In [19]:
# Select a model and train it on the training set
model = DecisionTreeClassifier(max_depth=5, min_samples_split=15, min_samples_leaf=2, random_state=135) # Adjustable hyperparameters
model.fit(X_train, y_train)

# Make predictions on the testing set and store them in y_pred
y_pred = model.predict(X_test)

# Evaluate testing set predictions using evaluate_model()
results = evaluate_model(y_test, y_pred, group_labels_test)

for metric, value in results.items():
    print(f"{metric}: {value:.5f}")
    
# Save your results to all_baseline_results
results['teammate'] = 'Kyle Russell'
results['experiment_type'] = 'baseline'
results['predictor_model'] = DecisionTreeClassifier(max_depth=5, min_samples_split=15, min_samples_leaf=2, random_state=135)
results['mitigation_strategy'] = 'NONE' ## DO NOT EDIT: This is pre-mitigation baseline
all_baseline_results.append(results)

accuracy: 0.75500
precision: 0.80667
recall: 0.85816
demographic_parity_gender_ratio: 0.95376
accuracy_parity_gender_ratio: 0.93264
predictive_parity_gender_ratio: 0.82464
equal_opportunity_gender_ratio: 0.93903
demographic_parity_age_ratio: 0.63285
accuracy_parity_age_ratio: 0.92268
predictive_parity_age_ratio: 1.04006
equal_opportunity_age_ratio: 0.72044


## Teammate 2: Arhum Shahid

In [21]:
# Select a model and train it on the training set
model = KNeighborsClassifier(n_neighbors=13) # Adjustable hyperparameter
model.fit(X_train, y_train)

# Make predictions on the testing set and store them in y_pred
y_pred = model.predict(X_test)

# Evaluate testing set predictions using evaluate_model()
results = evaluate_model(y_test, y_pred, group_labels_test)
for metric, value in results.items():
    print(f"{metric}: {value:.5f}")

# Save your results to all_baseline_results
results['teammate'] = 'Arhum Shahid'
results['experiment_type'] = 'baseline'
results['predictor_model'] = KNeighborsClassifier(n_neighbors=13)
results['mitigation_strategy'] = 'NONE' ## DO NOT EDIT: This is pre-mitigation baseline
all_baseline_results.append(results)

accuracy: 0.68000
precision: 0.72000
recall: 0.89362
demographic_parity_gender_ratio: 0.98571
accuracy_parity_gender_ratio: 0.95455
predictive_parity_gender_ratio: 0.85990
equal_opportunity_gender_ratio: 1.03240
demographic_parity_age_ratio: 0.82096
accuracy_parity_age_ratio: 1.00000
predictive_parity_age_ratio: 1.00566
equal_opportunity_age_ratio: 0.91338


## Teammate 3: Seona Magdum

In [23]:
# Select a model and train it on the training set
model = LogisticRegression(solver='liblinear',random_state=130)
model.fit(X_train, y_train)

# Make predictions on the testing set and store them in y_pred
y_pred = model.predict(X_test)

# Evaluate testing set predictions using evaluate_model()
results = evaluate_model(y_test, y_pred, group_labels_test)
for metric, value in results.items():
    print(f"{metric}: {value:.5f}")

# Save your results to all_baseline_results
results['teammate'] = 'Seona Magdum'
results['experiment_type'] = 'baseline'
results['predictor_model'] = LogisticRegression(solver='liblinear',random_state=130)
results['mitigation_strategy'] = 'NONE' ## DO NOT EDIT: This is pre-mitigation baseline
all_baseline_results.append(results)

accuracy: 0.79000
precision: 0.82781
recall: 0.88652
demographic_parity_gender_ratio: 0.99408
accuracy_parity_gender_ratio: 1.08377
predictive_parity_gender_ratio: 0.93232
equal_opportunity_gender_ratio: 1.11170
demographic_parity_age_ratio: 0.65196
accuracy_parity_age_ratio: 1.00000
predictive_parity_age_ratio: 1.13472
equal_opportunity_age_ratio: 0.83460


# Mitigating Bias (INDIVIDUAL CONTRIBUTION)

_(new in this milestone)_


In [25]:
## A place to save all teammates' post-mitigation results
all_mitigated_results = [] ## DO NOT EDIT

## Teammate 1: Kyle Russell

In [27]:
## Mitigation method (reweighing)
# Count the training samples per group
group_counts = collections.Counter(group_labels_train)

# Compute inverse weights (Smaller counts should have larger weights)
group_inverse_weights = {}
for group, count in group_counts.items():
    group_inverse_weights[group] = 1.0 / count

# Rescale for stability so that the per-group weights sum to the number of training samples
scaling_factor = len(group_labels_train) / sum(group_inverse_weights.values())
group_scaled_weights = {group: weight * scaling_factor for group, weight in group_inverse_weights.items()}

# Apply these weights to each row in our dataset
sample_weights = []
for group in group_labels_train:
    sample_weights.append(group_scaled_weights[group])

# Select a model and train it on the training set
model_mitigated = DecisionTreeClassifier(max_depth=5, min_samples_split=15, min_samples_leaf=2, random_state=135) 
model_mitigated.fit(X_train, y_train, sample_weight=sample_weights) # We are using the reweighted values for each group

# Make predictions on the testing set and store them in y_pred_mitigated
y_pred_mitigated = model_mitigated.predict(X_test)

# Evaluate testing set predictions using evaluate_model()
results_mitigated = evaluate_model(y_test, y_pred_mitigated, group_labels_test)

# Save your results to all_mitigated_results
results_mitigated['teammate'] = 'Kyle Russell'
results_mitigated['experiment_type'] = 'post-mitigation'
results_mitigated['predictor_model'] = DecisionTreeClassifier(max_depth=5, min_samples_split=15, min_samples_leaf=2, random_state=135)
results_mitigated['mitigation_strategy'] = 'preprocessing: Reweighting'
all_mitigated_results.append(results_mitigated)

for metric, value in results_mitigated.items():
    if isinstance(value, float):
        print(f"{metric}: {value:.5f}")

accuracy: 0.75500
precision: 0.76744
recall: 0.93617
demographic_parity_gender_ratio: 1.11765
accuracy_parity_gender_ratio: 0.88500
predictive_parity_gender_ratio: 0.80245
equal_opportunity_gender_ratio: 1.09515
demographic_parity_age_ratio: 0.91150
accuracy_parity_age_ratio: 0.97382
predictive_parity_age_ratio: 0.98230
equal_opportunity_age_ratio: 0.98257
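
The inverse-frequency weighting used in the cell above can be checked by hand. The sketch below repeats the same steps on hypothetical group counts (the real counts come from `group_labels_train`) and shows that every group ends up carrying the same total weight. Note that these weights depend only on group membership, which is a simpler scheme than reweighing variants that weight each (group, label) combination separately.

```python
from collections import Counter

# Hypothetical training group labels; the real ones come from group_labels_train.
group_labels = [3] * 500 + [2] * 180 + [0] * 70 + [1] * 50

counts = Counter(group_labels)
inverse = {g: 1.0 / n for g, n in counts.items()}

# Rescale so the four per-group weights sum to the number of samples.
scale = len(group_labels) / sum(inverse.values())
group_weight = {g: w * scale for g, w in inverse.items()}

# Total weight carried by each group = group size * per-sample weight.
group_total = {g: counts[g] * group_weight[g] for g in counts}

for g in sorted(counts):
    print(f"group {g}: n={counts[g]:3d}  weight={group_weight[g]:8.3f}  total={group_total[g]:.1f}")
```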


### Teammate 1's Conclusions

&emsp;The reweighting strategy brought the age-related parity ratios closer to 1 while keeping overall accuracy at 75.5%. Age-related ratios improved substantially, whereas the gender-related ratios moved somewhat further from 1; age was the more imbalanced attribute in the baseline. Looking at each of the categories:

#### Gender-Based Fairness Adjustments
- **Demographic Parity** moved from **0.95376 → 1.11765**, which we still consider an acceptable range (though further from 1, now favoring females).  
- **Equal Opportunity** shifted from **0.93903 → 1.09515**, which is also in an acceptable range but slightly further from a ratio of 1.  

#### Age-Based Fairness Adjustments  
- **Demographic Parity** improved from **0.63285 → 0.91150**, a **44% relative increase** that brings it much closer to 1 (a significant improvement).  
- **Equal Opportunity** moved from **0.72044 → 0.98257**, a **36% relative increase** that brings it close to 1.  

#### Performance Trade-Offs  
- **Recall improved** from **0.85816 → 0.93617** (**+9.1%**), meaning better detection of positive cases.  
- **Precision decreased** from **0.80667 → 0.76744** (**-4.9%**), meaning slightly more false positives.  
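
The percentages quoted for the ratios can be read in two ways: as the relative change of the ratio itself, or as the share of the distance to 1 that was closed. The sketch below computes both for the reported age ratios; the function names are illustrative, and "gap closed" is an additional summary that is not used elsewhere in the notebook.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change of the metric itself."""
    return (after - before) / before


def gap_reduction(before: float, after: float, target: float = 1.0) -> float:
    """Fraction of the distance to `target` that was closed (negative if it widened)."""
    gap_before = abs(before - target)
    gap_after = abs(after - target)
    return (gap_before - gap_after) / gap_before


# Reported age ratios for the decision tree: (baseline, post-mitigation).
reported = {
    "demographic_parity_age_ratio": (0.63285, 0.91150),
    "equal_opportunity_age_ratio": (0.72044, 0.98257),
}

for name, (before, after) in reported.items():
    print(f"{name}: change={relative_change(before, after):+.1%}, "
          f"gap closed={gap_reduction(before, after):.1%}")
```

A negative `gap_reduction` flags a ratio that moved away from 1, which is what happens for the gender demographic parity ratio (0.95376 to 1.11765).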

## Teammate 2: Arhum Shahid

In [30]:
from imblearn.under_sampling import EditedNearestNeighbours

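# Function to perform the strategy we refer to as massaging.
# Note: EditedNearestNeighbours removes training samples whose labels disagree
# with their nearest neighbours; it does not relabel any samples.
def apply_massaging(X, y, group_labels):
    """Resample the training data with Edited Nearest Neighbours (ENN).

    group_labels is accepted for a uniform interface but is not used by ENN.
    """
    enn = EditedNearestNeighbours()
    X_resampled, y_resampled = enn.fit_resample(X, y)
    return X_resampled, y_resampled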

# Apply massaging to training data
X_train_massaged, y_train_massaged = apply_massaging(X_train, y_train, group_labels_train)

# Train a new KNN model on massaged data
knn_massaged = KNeighborsClassifier(n_neighbors=13)
knn_massaged.fit(X_train_massaged, y_train_massaged)

# Make predictions
y_pred_mitigated = knn_massaged.predict(X_test)

# Evaluate testing set predictions using evaluate_model()
results_mitigated = evaluate_model(y_test, y_pred_mitigated, group_labels_test)

# Save results
results_mitigated['teammate'] = 'Arhum Shahid'
results_mitigated['experiment_type'] = 'post-mitigation'
results_mitigated['predictor_model'] = KNeighborsClassifier(n_neighbors=13)
results_mitigated['mitigation_strategy'] = 'preprocessing: Massaging'
all_mitigated_results.append(results_mitigated)


for metric, value in results_mitigated.items():
    if isinstance(value, float):
        print(f"{metric}: {value:.5f}")

accuracy: 0.66500
precision: 0.80328
recall: 0.69504
demographic_parity_gender_ratio: 0.82979
accuracy_parity_gender_ratio: 0.98039
predictive_parity_gender_ratio: 0.85572
equal_opportunity_gender_ratio: 0.84402
demographic_parity_age_ratio: 0.64331
accuracy_parity_age_ratio: 0.80357
predictive_parity_age_ratio: 0.90306
equal_opportunity_age_ratio: 0.63645


### Teammate 2's Conclusions
On this test set, the massaging strategy did not improve fairness for the KNN model, and it came with a performance cost. Accuracy fell from **0.680** (baseline) to **0.665**, a **2.2%** relative decrease. Precision rose from **0.7200** to **0.8033**, but recall dropped from **0.8936** to **0.6950**, indicating more false negatives. Gender-based ratios mostly moved further from 1: demographic parity went from **0.9857** to **0.8298** and equal opportunity from **1.0324** to **0.8440**, while accuracy parity improved slightly (**0.9545** to **0.9804**). Age-based ratios also moved further from 1, particularly equal opportunity, which fell from **0.9134** to **0.6365**. These results suggest that this strategy did not reduce bias for KNN here, possibly because removing training samples changes the local neighborhoods that the model relies on.
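
The `apply_massaging` helper above delegates to `EditedNearestNeighbours`, which edits the training set by removing samples rather than relabelling them. The sketch below illustrates a simplified version of that editing rule on a hypothetical 1-D feature. The function `edit_nearest_neighbours` and the data are illustrative only; the library's implementation and defaults differ in details, for example in which classes are eligible for removal.

```python
def edit_nearest_neighbours(xs, ys, k=3):
    """Keep a sample only if all of its k nearest neighbours share its label.

    Simplified 1-D illustration of the ENN editing rule; not imblearn's code.
    """
    kept_x, kept_y = [], []
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Distances to every other sample, nearest first.
        others = sorted((abs(x - xs[j]), ys[j]) for j in range(len(xs)) if j != i)
        neighbour_labels = [label for _, label in others[:k]]
        if all(label == y for label in neighbour_labels):
            kept_x.append(x)
            kept_y.append(y)
    return kept_x, kept_y


# Hypothetical feature values: class 0 on the left, class 1 on the right,
# with the two classes meeting between x=4.5 and x=5.2.
xs = [0.0, 1.0, 2.0, 3.0, 4.5, 5.2, 6.6, 7.5, 8.5, 9.5]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

kept_x, kept_y = edit_nearest_neighbours(xs, ys)
print(list(zip(kept_x, kept_y)))
```

Only the two samples next to the class boundary are dropped in this toy case, which shows how the rule thins out the region where labels are mixed.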



## Teammate 3: Seona Magdum

In [33]:
# Implement your bias mitigation strategy: stronger L2 regularization (C=0.1 instead of the default C=1.0)
model_mitigated = LogisticRegression(penalty='l2',C=0.1,solver='liblinear',random_state=130)
model_mitigated.fit(X_train, y_train)

# Make predictions on the testing set and store them in y_pred_mitigated
y_pred_mitigated = model_mitigated.predict(X_test)

# Evaluate testing set predictions using evaluate_model()
results_mitigated = evaluate_model(y_test, y_pred_mitigated, group_labels_test)

# Save your results to all_mitigated_results
results_mitigated['teammate'] = 'Seona Magdum'
results_mitigated['experiment_type'] = 'post-mitigation'
results_mitigated['predictor_model'] = LogisticRegression(penalty='l2',C=0.1,solver='liblinear',random_state=130)
results_mitigated['mitigation_strategy'] = 'inprocessing: Regularization'
all_mitigated_results.append(results_mitigated)


print("\nMitigated Model Evaluation:")
for metric, value in results_mitigated.items():
    if isinstance(value, float):
        print(f"{metric}: {value:.5f}")


Mitigated Model Evaluation:
accuracy: 0.77000
precision: 0.78443
recall: 0.92908
demographic_parity_gender_ratio: 1.00508
accuracy_parity_gender_ratio: 0.91304
predictive_parity_gender_ratio: 0.82847
equal_opportunity_gender_ratio: 1.01568
demographic_parity_age_ratio: 0.77130
accuracy_parity_age_ratio: 1.04124
predictive_parity_age_ratio: 1.06905
equal_opportunity_age_ratio: 0.90737


### Teammate 3's Conclusions
We can see that this regularization strategy improved most of the fairness metrics. Accuracy decreased by about 2.5% in relative terms, from 79% to 77%, and precision decreased by about 5%, from 83% to 78%. However, recall increased by about 4 percentage points, from 89% to 93%, meaning fewer false negatives. <br>
For gender, equal opportunity and demographic parity became slightly fairer. Equal opportunity dropped from 1.11 to 1.02, suggesting a more balanced ratio. Demographic parity moved from 0.994 to 1.005, marginally closer to 1. Accuracy parity (1.08 to 0.91) and predictive parity (0.93 to 0.83) both decreased and ended further from 1, meaning prediction and accuracy consistency across gender groups slightly worsened. <br>
For age, demographic parity, predictive parity and equal opportunity all became fairer, while accuracy parity moved slightly away from 1, from 1.00 to 1.04. Demographic parity improved the most, from 0.65 to 0.77, and equal opportunity improved by around 0.07. <br>
Overall, after mitigation most fairness metrics improved across gender and age. Accuracy and precision dropped slightly while recall increased, suggesting a more inclusive model.
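
In scikit-learn's `LogisticRegression`, `C` is the inverse of the regularization strength, so lowering it from the default 1.0 to 0.1 penalizes large coefficients more heavily. The sketch below illustrates that effect with a hand-written 1-D model trained by gradient descent on hypothetical data. It is not the liblinear solver, it omits the intercept, and the function `fit_weight` is illustrative only.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_weight(xs, ys, C, lr=0.01, steps=5000):
    """Minimise 0.5 * w**2 + C * sum(log-loss) for a 1-D model without intercept."""
    w = 0.0
    for _ in range(steps):
        # Gradient of the summed log-loss with respect to w.
        data_grad = sum((sigmoid(w * x) - y) * x for x, y in zip(xs, ys))
        # The penalty term 0.5 * w**2 contributes w to the gradient.
        w -= lr * (w + C * data_grad)
    return w


# Hypothetical, non-separable toy data.
xs = [-2.0, -1.0, 1.0, 2.0, -0.5, 0.5]
ys = [0, 0, 1, 1, 1, 0]

weights = {C: fit_weight(xs, ys, C) for C in (0.1, 1.0, 10.0)}
for C, w in weights.items():
    print(f"C={C:>4}: w={w:.3f}")
```

Smaller values of `C` shrink the fitted weight toward zero. The shrinkage applies to every coefficient and does not target the protected attributes specifically, so any fairness effect is indirect.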

# Conclusions
_(new in this milestone)_


In [36]:
# Define column order with metadata fields first
column_order = [
    "teammate", "experiment_type", "predictor_model", "mitigation_strategy", 
    "accuracy", "precision", "recall", 
    "demographic_parity_gender_ratio", "accuracy_parity_gender_ratio", "predictive_parity_gender_ratio", "equal_opportunity_gender_ratio", 
    "demographic_parity_age_ratio", "accuracy_parity_age_ratio", "predictive_parity_age_ratio", "equal_opportunity_age_ratio"
]

# Collect all the results in one table.
overall_results = pd.concat([pd.DataFrame(all_baseline_results), pd.DataFrame(all_mitigated_results)])
overall_results = overall_results[column_order]
overall_results ## Note: The table displayed below in this starter notebook is for your reference, your team's table will be slightly different (e.g. different metrics, no.of sensitive attribute-based groups, actual values, etc.) upon successful completion of this notebook.

| | teammate | experiment_type | predictor_model | mitigation_strategy | accuracy | precision | recall | demographic_parity_gender_ratio | accuracy_parity_gender_ratio | predictive_parity_gender_ratio | equal_opportunity_gender_ratio | demographic_parity_age_ratio | accuracy_parity_age_ratio | predictive_parity_age_ratio | equal_opportunity_age_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Kyle Russell | baseline | DecisionTreeClassifier(max_depth=5, min_sample... | NONE | 0.755 | 0.806667 | 0.858156 | 0.953757 | 0.932642 | 0.824639 | 0.939033 | 0.63285 | 0.92268 | 1.040056 | 0.72044 |
| 1 | Arhum Shahid | baseline | KNeighborsClassifier(n_neighbors=13) | NONE | 0.68 | 0.72 | 0.893617 | 0.985714 | 0.954545 | 0.859897 | 1.032404 | 0.820961 | 1.0 | 1.005657 | 0.913381 |
| 2 | Seona Magdum | baseline | LogisticRegression(random_state=130, solver='l... | NONE | 0.79 | 0.827815 | 0.886525 | 0.994083 | 1.08377 | 0.932322 | 1.111704 | 0.651961 | 1.0 | 1.134721 | 0.834603 |
| 0 | Kyle Russell | post-mitigation | DecisionTreeClassifier(max_depth=5, min_sample... | preprocessing: Reweighting | 0.755 | 0.767442 | 0.93617 | 1.117647 | 0.885 | 0.802455 | 1.095149 | 0.911504 | 0.973822 | 0.982298 | 0.982568 |
| 1 | Arhum Shahid | post-mitigation | KNeighborsClassifier(n_neighbors=13) | preprocessing: Massaging | 0.665 | 0.803279 | 0.695035 | 0.829787 | 0.980392 | 0.855721 | 0.844021 | 0.643312 | 0.803571 | 0.903061 | 0.636451 |
| 2 | Seona Magdum | post-mitigation | LogisticRegression(C=0.1, random_state=130, so... | inprocessing: Regularization | 0.77 | 0.784431 | 0.929078 | 1.005076 | 0.913043 | 0.828466 | 1.015682 | 0.7713 | 1.041237 | 1.069046 | 0.907371 |


&emsp;In terms of the performance metrics, the baseline models generally had the edge in accuracy: the post-mitigation models either lost a little accuracy or, at best, matched the baseline, which reflects the common trade-off between fairness and performance. Precision and recall moved in opposite directions for every model. In terms of fairness, the effect depended on the model and the mitigation method. Most models improved on the demographic parity age ratio and the equal opportunity age ratio (KNN being the exception), but struggled to improve the gender ratios, possibly because gender was already relatively balanced at baseline. A more local model such as KNN struggled to maintain balance after massaging was applied. Balancing fairness and performance, the decision tree with reweighting and logistic regression with regularization are the best choices. However, if performance is the priority, the baseline logistic regression performs well, at the cost of some fairness gaps.
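
One way to summarize the table is to average each model's distance from 1 over the eight fairness ratios. The sketch below does this with the values reported in the table. The mean absolute gap is an assumed summary chosen for illustration, not a metric used elsewhere in this notebook, and it weights all eight ratios equally.

```python
# Fairness ratios copied from the results table, in column order:
# gender (d_p, a_p, p_p, e_e) followed by age (d_p, a_p, p_p, e_e).
BASELINE = {
    "Decision tree": [0.953757, 0.932642, 0.824639, 0.939033, 0.63285, 0.92268, 1.040056, 0.72044],
    "KNN": [0.985714, 0.954545, 0.859897, 1.032404, 0.820961, 1.0, 1.005657, 0.913381],
    "Logistic regression": [0.994083, 1.08377, 0.932322, 1.111704, 0.651961, 1.0, 1.134721, 0.834603],
}
MITIGATED = {
    "Decision tree": [1.117647, 0.885, 0.802455, 1.095149, 0.911504, 0.973822, 0.982298, 0.982568],
    "KNN": [0.829787, 0.980392, 0.855721, 0.844021, 0.643312, 0.803571, 0.903061, 0.636451],
    "Logistic regression": [1.005076, 0.913043, 0.828466, 1.015682, 0.7713, 1.041237, 1.069046, 0.907371],
}


def mean_gap(ratios):
    """Mean absolute distance of the ratios from the ideal value of 1."""
    return sum(abs(r - 1.0) for r in ratios) / len(ratios)


summary = {
    model: (mean_gap(BASELINE[model]), mean_gap(MITIGATED[model]))
    for model in BASELINE
}

for model, (before, after) in summary.items():
    trend = "closer to 1" if after < before else "further from 1"
    print(f"{model}: {before:.3f} -> {after:.3f} ({trend})")
```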

# References
- Kyle Russell:
   - https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier.fit 
   - https://arxiv.org/html/2312.12560v1
   - https://docs.python.org/3/library/collections.html
- Arhum Shahid:
   - https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.EditedNearestNeighbours.html
- Seona Magdum:
  - https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_sample_weight.html
  - https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

# Disclosures

- Use of ChatGPT to look up how to use different functions of pandas, numpy, seaborn, and other imports