# CS108/212 STAT108/212 W26 Course Project

### Team Details

- Teammate 1: Henry Yost
- Teammate 2: Dmitry Sorokin
- Teammate 3: Kyle Chahal
- Teammate 4: Refugio Zepeda

---


# Milestone: Mitigating Bias
For this project milestone, each teammate will implement a bias mitigation strategy and compare model performance before and after mitigation.

# Installs

In [15]:
# [INSERT CODE HERE to install necessary packages]
import sys
!{sys.executable} -m pip install -r ../requirements.txt



# Imports

In [16]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
import collections
from pprint import pprint
from sklearn.model_selection import train_test_split
from ucimlrepo import fetch_ucirepo
from fairlearn.metrics import equalized_odds_difference
from sklearn.metrics import accuracy_score
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

## Add additional imports needed for your project here.

# Loading dataset
_(same as previous milestone, copy-paste)_

In [17]:
# Load your selected dataset
# fetch dataset 
adult = fetch_ucirepo(id=2) 
# data (as pandas dataframes) 
X = adult.data.features 
y = adult.data.targets 

# variable information 
print(adult.variables)

# Making our data a pandas df
adult_clean = pd.concat([X, y], axis=1)

sensitive_feature_colname = ['age', 'sex', 'race', 'marital-status'] # sensitive feature name
#age, sex, race, (marital status), 

# Make sensitive features-based group labels
group_labels = adult_clean[sensitive_feature_colname]

# Print some stats
print(f"No. of samples: {X.shape[0]}")
print(f"No. of features: {X.shape[1]}")
#print(f"Group Counts: {dict(collections.Counter(group_labels))}")

              name     role         type      demographic  \
0              age  Feature      Integer              Age   
1        workclass  Feature  Categorical           Income   
2           fnlwgt  Feature      Integer             None   
3        education  Feature  Categorical  Education Level   
4    education-num  Feature      Integer  Education Level   
5   marital-status  Feature  Categorical            Other   
6       occupation  Feature  Categorical            Other   
7     relationship  Feature  Categorical            Other   
8             race  Feature  Categorical             Race   
9              sex  Feature       Binary              Sex   
10    capital-gain  Feature      Integer             None   
11    capital-loss  Feature      Integer             None   
12  hours-per-week  Feature      Integer             None   
13  native-country  Feature  Categorical            Other   
14          income   Target       Binary           Income   

                       

# Preparing dataset
_(same as previous milestone, copy-paste)_

In [18]:
# Some subset of following dataset preparation steps may be necessary depending on your dataset,
# 1. Drop unnecessary features
# 2. Handle missing data
# 3. Encode categorical features
# 4. Normalize numerical features
# 5. Encode target (if your task is classification)



# Remove unwanted columns
adult_clean = adult_clean.drop(columns = ['education', 'native-country'])
# Remove rows with missing values
adult_clean = adult_clean.dropna()
# Make income binary: 1 represents income over 50K, 0 represents 50K or less
mapping = {'>50K': 1, '<=50K': 0}
adult_clean['income'] = adult_clean['income'].map(mapping)
# Update sensitive labels
group_labels = adult_clean[sensitive_feature_colname]
# Factorize non-numeric data:

adult_clean['workclass_num'], unique_labels = pd.factorize(adult_clean['workclass'])
adult_clean['marital-status_num'], unique_labels = pd.factorize(adult_clean['marital-status'])
adult_clean['occupation_num'], unique_labels = pd.factorize(adult_clean['occupation'])
adult_clean['relationship_num'], unique_labels = pd.factorize(adult_clean['relationship'])
adult_clean['race_num'], unique_labels = pd.factorize(adult_clean['race'])
# For sex, Male is 1 and Female is 0
mapping = {'Male': 1, 'Female': 0}
adult_clean['sex'] = adult_clean['sex'].map(mapping)

# Drop rows where .map() produced NaN, i.e. values not covered by the mapping dicts above
adult_clean = adult_clean.dropna()

X = adult_clean[['age', 'workclass_num', 'fnlwgt', 'education-num', 'marital-status_num', 'occupation_num', 'relationship_num', 'race_num', 'sex', 'capital-gain', 'capital-loss', 'hours-per-week']]
y = adult_clean[['income']]

group_labels = adult_clean[sensitive_feature_colname]


# Note: X and y have been modified before the following lines of code!
print(f"No. of samples AFTER cleaning: {X.shape[0]}")
assert X.shape[0] == y.shape[0] == group_labels.shape[0] ## Ensure that the target and group_labels have been updated if some samples were removed during cleaning.
print(f"No. of features AFTER encoding: {X.shape[1]}")

No. of samples AFTER cleaning: 32561
No. of features AFTER encoding: 12


# Getting training and testing sets

Note: Train-test split is made **ONCE** to obtain the _training set_ and the _testing set_ and every teammate will use the training set to train their baseline model and test the trained model using the testing set. **NEVER** modify the testing set once it has been created.
Therefore, the following code cell does not need to be edited.

_(same as previous milestone, copy-paste)_

In [19]:
X_train, X_test, \
  y_train, y_test, \
    group_labels_train, group_labels_test = train_test_split(X, y, group_labels,
                                                             test_size=0.2, random_state=42)

print(f"No. of training samples: {X_train.shape[0]}")
print(f"No. of testing samples: {X_test.shape[0]}")

# Delete X, y and group_label variables to make sure they are not used later on.
del X
del y
del group_labels

No. of training samples: 26048
No. of testing samples: 6513


# Setting up evaluation metrics
Note: The same evaluation function will be used by all teammates.

_(same as previous milestone, copy-paste)_

In [20]:
# double importing just in case
import numpy as np

def evaluate_model(y_test, y_pred, g_labels):
    """
    Evaluate the performance of your trained model on the testing set.
    
    Parameters
    ----------
    y_test : array-like
        The true targets of the testing set.
    y_pred : array-like
        The predicted targets of the testing set.
    g_labels : array-like
        The group labels of the testing set.

    Returns
    -------
    results : dict
        A dictionary containing the evaluation results.
    
    Example:
    For classification task, the task-specific performance metrics like {'accuracy': <value>, 'f1_score': <value>, ...}
    and fairness metrics like {'demographic_parity': <value>, 'equalized_odds': <value>, ...}.
    """
    results = {}

    # force them to be arrays just in case, so we don't have an error
    y_test = np.asarray(y_test).ravel()
    y_pred = np.asarray(y_pred).ravel()
    g_labels = np.asarray(g_labels).ravel()
    
    # Note: These metrics will be calculated for - 1. the full testing set, 2. individual groups.
    # Task-specific performance metrics

    global_accuracy = accuracy_score(y_test, y_pred)
    # print(f"Global Accuracy score of: {global_accuracy}")
    results["accuracy_overall"] = global_accuracy

    global_f1 = f1_score(y_test, y_pred)
    # print(f"Global f1 score of: {global_f1}")
    results["f1_overall"] = global_f1

    results["accuracy_by_group"] = {}
    results["f1_by_group"] = {}

    for g in np.unique(g_labels):
        mask = (g_labels == g)
        y_test_g = y_test[mask]
        y_pred_g = y_pred[mask]

        results["accuracy_by_group"][g] = accuracy_score(y_test_g, y_pred_g)
        results["f1_by_group"][g] = f1_score(y_test_g, y_pred_g, pos_label=1, zero_division=0)
    
    # Fairness metric:
    # We use equalized odds, which requires that the TPR and FPR are equal across all protected groups.

    eo_diff = equalized_odds_difference(y_test, y_pred, sensitive_features=g_labels, method='between_groups')
    # print(f"Equalized Odds Difference: {eo_diff}")
    results['eo_diff'] = eo_diff
    
    return results
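
To make the fairness metric concrete, the following is a minimal sketch of what an equalized odds difference measures, written with plain NumPy on made-up toy data. The helper names `rates_by_group` and `eo_difference` are hypothetical and are not part of `fairlearn`. Setting a rate to 0.0 when a group has no positives (or no negatives) is an assumption of this sketch and may differ from the library's behavior.

```python
import numpy as np

def rates_by_group(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} for binary labels in {0, 1}."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        pos, neg = (yt == 1), (yt == 0)
        # Assumption: a rate is reported as 0.0 when the group has no
        # positives (for TPR) or no negatives (for FPR).
        tpr = yp[pos].mean() if pos.any() else 0.0
        fpr = yp[neg].mean() if neg.any() else 0.0
        rates[str(g)] = (float(tpr), float(fpr))
    return rates

def eo_difference(y_true, y_pred, groups):
    """Larger of the max TPR gap and the max FPR gap between any two groups."""
    rates = rates_by_group(y_true, y_pred, groups)
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

# Toy data, made up for illustration (not taken from the Adult dataset).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(rates_by_group(y_true, y_pred, groups))
print(eo_difference(y_true, y_pred, groups))
```

In this toy case group `a` is predicted perfectly (TPR 1.0, FPR 0.0) while group `b` has a TPR of 0.5 and an FPR of 0.5, so both gaps equal 0.5 and that is the value the sketch reports.
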

# Training baseline models (INDIVIDUAL CONTRIBUTION)
_(minor modifications from previous milestone)_

In [21]:
## A place to save all teammates' baseline results
all_baseline_results = [] ## DO NOT EDIT

## Teammate 1

In [22]:
# Code used by all teammates, separated so we don't re-run it multiple times
# 1. Define the age bins and their labels
age_bins = [0, 25, 45, 65, 120]
age_labels = ['Under_25', '25_to_45', '46_to_65', 'Over_65']

# 2. Bin the ages from X_test on the fly into a temporary variable
binned_age_test = pd.cut(X_test['age'], bins=age_bins, labels=age_labels)

# 3. Create group_labels_test by combining the binned ages
# with the numeric sex and race columns in X_test
group_labels_test = binned_age_test.astype(str) + '_' + \
                    X_test['sex'].astype(str) + '_' + \
                    X_test['race_num'].astype(str)

# Normalize the features: fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
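
Because the age labels suggest boundaries that differ slightly from how `pd.cut` assigns them, a small check on the bin edges can help when reading the group names. The sketch below reuses `age_bins` and `age_labels` from the cell above; the `ages` series and the `edge_table` name are made up for illustration.

```python
import pandas as pd

age_bins = [0, 25, 45, 65, 120]
age_labels = ['Under_25', '25_to_45', '46_to_65', 'Over_65']

# Ages chosen to sit on, or right next to, each interior bin edge.
ages = pd.Series([25, 26, 45, 46, 65, 66])

# pd.cut uses right-inclusive intervals by default (right=True),
# so the bins are (0, 25], (25, 45], (45, 65], (65, 120].
binned = pd.cut(ages, bins=age_bins, labels=age_labels)

edge_table = dict(zip(ages.tolist(), binned.astype(str).tolist()))
print(edge_table)
```

With the default settings, age 25 lands in `Under_25` and age 26 is the first age in `25_to_45`, so the `Under_25` label effectively means "25 or younger".
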

In [23]:
# Select a model and train it on the training set
# Deciding to use a KNN model

# Search for the number of neighbors from 1 to 20 (range chosen arbitrarily to limit processing time)
# that gives the lowest equalized odds difference.
# Note: k is chosen using test-set predictions, so the reported test metrics are optimistic.
optimal_n = 0
optimal_n_score = 0

for i in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train, np.asarray(y_train).ravel())
    predictions = knn.predict(X_test)
    model_score = equalized_odds_difference(y_test, predictions, sensitive_features=group_labels_test, method='between_groups')

    # Keep the first k seen, then replace it only on a strictly lower score (ties keep the smaller k)
    if optimal_n == 0 or model_score < optimal_n_score:
        optimal_n = i
        optimal_n_score = model_score

    print('Neighbors: ', i, ' Equalized Odds Diff: ', model_score)

#Making the actual model:
KNN_model = KNeighborsClassifier(n_neighbors=optimal_n)
KNN_model.fit(X_train, np.asarray(y_train).ravel())
# Make predictions on the testing set and store them in y_pred
y_pred = KNN_model.predict(X_test)
print("Optimal number of neighbors: ", optimal_n, ". Optimal EO_diff: ", optimal_n_score)
# Evaluate testing set predictions using evaluate_model()
results = evaluate_model(y_test, y_pred, group_labels_test)

# Save your results to all_baseline_results
results['teammate'] = 'Dmitry'
results['experiment_type'] = 'baseline'
results['predictor_model'] = 'KNN' #[INSERT MODEL NAME HERE]
results['mitigation_strategy'] = 'NONE' ## DO NOT EDIT: This is pre-mitigation baseline
all_baseline_results.append(results)

pprint(results)

Neighbors:  1  Equalized Odds Diff:  0.7692307692307693
Neighbors:  2  Equalized Odds Diff:  0.6153846153846154
Neighbors:  3  Equalized Odds Diff:  0.8
Neighbors:  4  Equalized Odds Diff:  0.6153846153846154
Neighbors:  5  Equalized Odds Diff:  0.8
Neighbors:  6  Equalized Odds Diff:  0.6923076923076923
Neighbors:  7  Equalized Odds Diff:  0.7692307692307693
Neighbors:  8  Equalized Odds Diff:  0.6923076923076923
Neighbors:  9  Equalized Odds Diff:  0.8
Neighbors:  10  Equalized Odds Diff:  0.7692307692307693
Neighbors:  11  Equalized Odds Diff:  0.8461538461538461
Neighbors:  12  Equalized Odds Diff:  0.7692307692307693
Neighbors:  13  Equalized Odds Diff:  0.9230769230769231
Neighbors:  14  Equalized Odds Diff:  0.9230769230769231
Neighbors:  15  Equalized Odds Diff:  0.9230769230769231
Neighbors:  16  Equalized Odds Diff:  0.7692307692307693
Neighbors:  17  Equalized Odds Diff:  0.9230769230769231
Neighbors:  18  Equalized Odds Diff:  0.7692307692307693
Neighbors:  19  Equalized Od

**Analysis / Interpretation for KNN Model:**

Looking at the results after training and tuning the KNN model, the number of neighbors that minimizes the equalized odds difference is 2. The model also scores fairly well on accuracy when broken down by group: the lowest accuracy is for the 25 to 45 group, at about 80.6%.

However, the overall F1 score is fairly low, at around 54%. So while this choice of k gives the smallest equalized odds difference, overall predictive performance is lower than we would like. In a previous iteration I focused on maximizing the accuracy of the model, but that led to a detrimental increase in the equalized odds difference, meaning that the model was inherently biased.

In my opinion, the hit to accuracy is almost justified in this case in exchange for a fairer model. Ideally, accuracy will improve post-mitigation.
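
As a quick check on the selection, the sketch below replays the choice of k using the equalized odds differences printed above, rounded to four decimals. The printed output is cut off at k = 19, so only k = 1 to 18 are included; the variable names are made up for this illustration.

```python
# Equalized odds differences for k = 1..18, copied from the printed output
# above and rounded to four decimals. The values for k = 19 and k = 20 are
# not visible in the output, so they are left out.
eo_by_k = {
    1: 0.7692, 2: 0.6154, 3: 0.8000, 4: 0.6154, 5: 0.8000, 6: 0.6923,
    7: 0.7692, 8: 0.6923, 9: 0.8000, 10: 0.7692, 11: 0.8462, 12: 0.7692,
    13: 0.9231, 14: 0.9231, 15: 0.9231, 16: 0.7692, 17: 0.9231, 18: 0.7692,
}

best_score = min(eo_by_k.values())
# Every k that reaches the lowest score, in increasing order of k.
tied_ks = sorted(k for k, score in eo_by_k.items() if score == best_score)
# A strict "<" comparison inside a loop keeps the first (smallest) k among ties.
best_k = tied_ks[0]

print(f"lowest EO difference: {best_score}")
print(f"k values reaching it: {tied_ks}")
print(f"selected k: {best_k}")
```

Among the visible values, k = 2 and k = 4 tie for the lowest equalized odds difference, and the strict comparison in the search loop keeps the smaller of the two. A secondary criterion such as accuracy or F1 could be used to break the tie instead.
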

## Teammate 2

In [24]:
# Select a model and train it on the training set
decisionTree_Model = DecisionTreeClassifier(random_state = 44,
                                            max_depth = 5,
                                            min_samples_leaf = 20)

decisionTree_Model.fit(X_train, np.asarray(y_train).ravel())

# Make predictions on the testing set and store them in y_pred
y_pred = decisionTree_Model.predict(X_test)

# Evaluate testing set predictions using evaluate_model()
results = evaluate_model(y_test, y_pred, group_labels_test)

# Save your results to all_baseline_results
results['teammate'] = 'Kyle'
results['experiment_type'] = 'baseline'
results['predictor_model'] = 'Decision Tree' #[INSERT MODEL NAME HERE]
results['mitigation_strategy'] = 'NONE' ## DO NOT EDIT: This is pre-mitigation baseline
all_baseline_results.append(results)

pprint(results)

{'accuracy_by_group': {'25_to_45_0_0': 0.8801020408163265,
                       '25_to_45_0_1': 0.9757575757575757,
                       '25_to_45_0_2': 0.7777777777777778,
                       '25_to_45_0_3': 1.0,
                       '25_to_45_0_4': 1.0,
                       '25_to_45_1_0': 0.7951991828396323,
                       '25_to_45_1_1': 0.8395061728395061,
                       '25_to_45_1_2': 0.8068181818181818,
                       '25_to_45_1_3': 0.9130434782608695,
                       '25_to_45_1_4': 0.8421052631578947,
                       '46_to_65_0_0': 0.8907766990291263,
                       '46_to_65_0_1': 0.96875,
                       '46_to_65_0_2': 0.8,
                       '46_to_65_0_3': 0.8333333333333334,
                       '46_to_65_0_4': 1.0,
                       '46_to_65_1_0': 0.7156862745098039,
                       '46_to_65_1_1': 0.8846153846153846,
                       '46_to_65_1_2': 0.7272727272727273,
         

**Analysis / Interpretation for Decision Tree Model:**

After looking at the global performance metrics for my Decision Tree model, we see that the overall F1 score is around 0.622, which is quite a bit higher than that of the KNN and linear models. The decision tree seems to strike a better balance between precision and recall, which may help it identify whether someone has a >50k income.

When looking at accuracy across subgroups, performance varies somewhat but stays generally strong across most demographic combinations. Most subgroup accuracies fall between 0.77 and 0.97, with a few reaching 1.0. The lower accuracies appear within certain 46-65 subgroups, suggesting some inconsistency in predictive performance.
The subgroups with an accuracy of 1.0 should be treated with caution, because this may be caused by some bias in the model itself.

Overall, the model seems to do a pretty good job, but its flexibility allows some fairness concerns to arise due to subgroup variability and possible overfitting.
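
One way to judge how much weight a perfect subgroup score deserves is to look at how many test samples sit behind it. The notebook does not report subgroup sizes, so the following is only a sketch on made-up toy data; `accuracy_with_support` is a hypothetical helper, not part of `evaluate_model`.

```python
import numpy as np

def accuracy_with_support(y_true, y_pred, groups):
    """Return {group: {"n": sample count, "accuracy": accuracy}}."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    out = {}
    for g in np.unique(groups):
        mask = groups == g
        out[str(g)] = {
            "n": int(mask.sum()),
            "accuracy": float((y_true[mask] == y_pred[mask]).mean()),
        }
    return out

# Toy data, made up for illustration: one group with 2 samples, one with 8.
groups = ["small"] * 2 + ["large"] * 8
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1]

summary = accuracy_with_support(y_true, y_pred, groups)
for name, stats in summary.items():
    print(name, stats)
```

In the toy data the two-sample group scores 1.0 while the eight-sample group scores 0.75. A perfect score based on a handful of samples says little about the model, so reporting `n` next to each subgroup accuracy makes the comparison easier to interpret.
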

## Teammate 3

In [25]:
# Select a model and train it on the training set
# Linear model
from sklearn.linear_model import LinearRegression

linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

y_pred_numerical = linear_model.predict(X_test)
# Make predictions on the testing set and store them in y_pred
y_pred = (y_pred_numerical >= 0.5).astype(int)

# Evaluate testing set predictions using evaluate_model()
results = evaluate_model(y_test, y_pred, group_labels_test)

# Save your results to all_baseline_results
results['teammate'] = 'Teammate 3'
results['experiment_type'] = 'baseline'
results['predictor_model'] = 'Linear model'
results['mitigation_strategy'] = 'NONE' ## DO NOT EDIT: This is pre-mitigation baseline
all_baseline_results.append(results)

pprint(results)

{'accuracy_by_group': {'25_to_45_0_0': 0.8418367346938775,
                       '25_to_45_0_1': 0.9757575757575757,
                       '25_to_45_0_2': 0.7777777777777778,
                       '25_to_45_0_3': 1.0,
                       '25_to_45_0_4': 1.0,
                       '25_to_45_1_0': 0.7379979570990807,
                       '25_to_45_1_1': 0.8271604938271605,
                       '25_to_45_1_2': 0.625,
                       '25_to_45_1_3': 0.8260869565217391,
                       '25_to_45_1_4': 0.8421052631578947,
                       '46_to_65_0_0': 0.8446601941747572,
                       '46_to_65_0_1': 0.921875,
                       '46_to_65_0_2': 0.9,
                       '46_to_65_0_3': 0.6666666666666666,
                       '46_to_65_0_4': 1.0,
                       '46_to_65_1_0': 0.6898395721925134,
                       '46_to_65_1_1': 0.7948717948717948,
                       '46_to_65_1_2': 0.6818181818181818,
                     

**Analysis / Interpretation for Linear Model:**

Based on the global metrics, the LM has an accuracy of 0.811, which is quite strong for a rigid LM. However, the F1 score is very low relative to accuracy, at 0.45. This suggests class imbalance, as the model predicts the majority class much more frequently than the minority class (>50k income). This is logical, as an LM is rigid, and the threshold can heavily skew the F1 score.

Looking at accuracy per group, it varies quite a bit, from a minimum of 0.625 to a maximum of 1.0. In contrast, the F1 score per group is mostly 0, which brings us back to the imbalance issue mentioned previously: the model predicts the majority class almost exclusively.

Lastly, looking at the fairness metrics, the equalized odds difference is 0.743, which is very high and suggests that the model is very biased across sensitive subgroups. This is something we need to consider when mitigating the bias in our data.
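
The gap between accuracy and F1 under class imbalance can be reproduced on a small synthetic example. The sketch below uses made-up labels with an 80/20 class split (an illustrative ratio, not the notebook's actual class balance) and a hypothetical helper `accuracy_and_f1` that computes both metrics from the confusion counts.

```python
import numpy as np

def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for the positive class (label 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    accuracy = float((y_true == y_pred).mean())
    # F1 = 2*TP / (2*TP + FP + FN); reported as 0.0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

# Made-up labels: 80 negatives followed by 20 positives.
y_true = np.array([0] * 80 + [1] * 20)

# Predictor 1: always predicts the majority (negative) class.
always_negative = np.zeros(100, dtype=int)

# Predictor 2: finds 5 of the 20 positives and raises 2 false alarms.
cautious = always_negative.copy()
cautious[80:85] = 1  # 5 true positives
cautious[:2] = 1     # 2 false positives

acc_a, f1_a = accuracy_and_f1(y_true, always_negative)
acc_b, f1_b = accuracy_and_f1(y_true, cautious)
print(f"always negative: accuracy={acc_a:.2f}, F1={f1_a:.2f}")
print(f"cautious:        accuracy={acc_b:.2f}, F1={f1_b:.2f}")
```

The always-negative predictor reaches 0.80 accuracy with an F1 of 0, and the cautious predictor reaches 0.83 accuracy with an F1 of about 0.37. Accuracy is dominated by the majority class, while F1 depends only on how the positive class is handled.
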

## Teammate 4

In [26]:
# Select a model and train it on the training set
logreg = LogisticRegression(
    C=1.0,
    solver="lbfgs",
    max_iter=1000
)

# Flatten y_train to 1-D to avoid sklearn's DataConversionWarning
logreg.fit(X_train, np.asarray(y_train).ravel())

# Make predictions on the testing set and store them in y_pred
y_pred = logreg.predict(X_test)

# Evaluate testing set predictions using evaluate_model()
results = evaluate_model(y_test, y_pred, group_labels_test)

# Save your results to all_baseline_results
results['teammate'] = 'Refugio'
results['experiment_type'] = 'baseline'
results['predictor_model'] = 'Logistic Regression'
results['mitigation_strategy'] = 'NONE' ## DO NOT EDIT: This is pre-mitigation baseline
all_baseline_results.append(results)

pprint(results)

{'accuracy_by_group': {'25_to_45_0_0': 0.8469387755102041,
                       '25_to_45_0_1': 0.9818181818181818,
                       '25_to_45_0_2': 0.7777777777777778,
                       '25_to_45_0_3': 1.0,
                       '25_to_45_0_4': 1.0,
                       '25_to_45_1_0': 0.7604698672114403,
                       '25_to_45_1_1': 0.845679012345679,
                       '25_to_45_1_2': 0.7159090909090909,
                       '25_to_45_1_3': 0.9130434782608695,
                       '25_to_45_1_4': 0.8947368421052632,
                       '46_to_65_0_0': 0.8616504854368932,
                       '46_to_65_0_1': 0.953125,
                       '46_to_65_0_2': 0.9,
                       '46_to_65_0_3': 0.8333333333333334,
                       '46_to_65_0_4': 1.0,
                       '46_to_65_1_0': 0.713903743315508,
                       '46_to_65_1_1': 0.8333333333333334,
                       '46_to_65_1_2': 0.7272727272727273,
          



**Analysis / Interpretation for Logistic Regression Model:**

Based on the global metrics, our logistic regression model has an accuracy of 0.8261, which is sufficiently strong. However, the F1 score is much lower, at 0.5575. This suggests class imbalance: the model predicts well for people who make at most 50k/year, since they form the larger class in the dataset.

Looking at accuracy per group, it varies quite a bit, from a minimum of 0.666 to a maximum of 1.0. A gap of 0.33 between the maximum and the minimum shows high variability in prediction quality across groups.

Lastly, looking at the fairness metrics, the equalized odds difference is 0.833, which is extremely high and suggests that the model treats groups drastically differently in the errors it makes. All of these metrics are things to consider when implementing our bias mitigation strategy.

# Mitigating Bias (INDIVIDUAL CONTRIBUTION)

_(new in this milestone)_


In [27]:
## A place to save all teammates' post-mitigation results
all_mitigated_results = [] ## DO NOT EDIT

## Teammate 1

In [28]:
# Implement your bias mitigation strategy
## If you chose preprocessing, you will train a new version of your predictor model with new/modified inputs.
## If you chose inprocessing, you will train a new version of your predictor with modified learning objective (loss function).
## If you chose postprocessing, you will implement strategies to modify the predictions (y_pred) of the trained baseline predictor model from the previous milestone without training any new version of the predictor model.

# [INSERT CODE HERE]

# Make predictions on the testing set and store them in y_pred_mitigated
y_pred_mitigated = ... # [INSERT CODE HERE]

# Evaluate testing set predictions using evaluate_model()
results_mitigated = evaluate_model(y_test, y_pred_mitigated, group_labels_test)

# Save your results to all_mitigated_results
results_mitigated['teammate'] = 'Teammate 1'
results_mitigated['experiment_type'] = 'post-mitigation'
results_mitigated['predictor_model'] = ... #[INSERT MODEL NAME HERE]
results_mitigated['mitigation_strategy'] = ... #[INSERT STRATEGY TYPE HERE: 'preprocessing', 'inprocessing', 'postprocessing']
all_mitigated_results.append(results_mitigated)

pprint(results_mitigated)

ValueError: Found input variables with inconsistent numbers of samples: [6513, 1]

### Teammate 1's Conclusions
[Briefly describe findings and conclusions here. Compare post-mitigation results with baseline results for your model. What is the % improvement in performance post-mitigation?  ]
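
For the "% improvement" comparison, one possible convention is relative change with respect to the baseline, with the sign flipped for metrics where lower is better (such as the equalized odds difference). The sketch below assumes that convention; `percent_change` is a hypothetical helper and the numbers are placeholders, not results from this notebook.

```python
def percent_change(baseline, mitigated, lower_is_better=False):
    """Relative improvement in percent; positive means the mitigated model is better."""
    if baseline == 0:
        raise ValueError("baseline is 0, so the relative change is undefined")
    change = (mitigated - baseline) / abs(baseline) * 100
    return -change if lower_is_better else change

# Placeholder numbers for illustration only (not results from this notebook).
eo_improvement = percent_change(0.80, 0.60, lower_is_better=True)
f1_improvement = percent_change(0.50, 0.55)

print(f"EO difference improvement: {eo_improvement:.1f}%")
print(f"F1 improvement: {f1_improvement:.1f}%")
```

With these placeholder values, a drop in equalized odds difference from 0.80 to 0.60 counts as a 25.0% improvement, and a rise in F1 from 0.50 to 0.55 counts as a 10.0% improvement. Stating which convention is used alongside the reported percentages keeps the teammates' conclusions comparable.
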

## Teammate 2

In [None]:
# Implement your bias mitigation strategy
## If you chose preprocessing, you will train a new version of your predictor model with new/modified inputs.
## If you chose inprocessing, you will train a new version of your predictor with modified learning objective (loss function).
## If you chose postprocessing, you will implement strategies to modify the predictions (y_pred) of the trained baseline predictor model from the previous milestone without training any new version of the predictor model.

# [INSERT CODE HERE]

# Make predictions on the testing set and store them in y_pred_mitigated
y_pred_mitigated = ... # [INSERT CODE HERE]

# Evaluate testing set predictions using evaluate_model()
results_mitigated = evaluate_model(y_test, y_pred_mitigated, group_labels_test)

# Save your results to all_mitigated_results
results_mitigated['teammate'] = 'Teammate 2'
results_mitigated['experiment_type'] = 'post-mitigation'
results_mitigated['predictor_model'] = ... #[INSERT MODEL NAME HERE]
results_mitigated['mitigation_strategy'] = ... #[INSERT STRATEGY TYPE HERE: 'preprocessing', 'inprocessing', 'postprocessing']
all_mitigated_results.append(results_mitigated)

print(results_mitigated)

### Teammate 2's Conclusions
[Briefly describe findings and conclusions here. Compare post-mitigation results with baseline results for your model. What is the % improvement in performance post-mitigation?]


## Teammate 3

In [None]:
# Implement your bias mitigation strategy
## If you chose preprocessing, you will train a new version of your predictor model with new/modified inputs.
## If you chose inprocessing, you will train a new version of your predictor with modified learning objective (loss function).
## If you chose postprocessing, you will implement strategies to modify the predictions (y_pred) of the trained baseline predictor model from the previous milestone without training any new version of the predictor model.

# [INSERT CODE HERE]

# Make predictions on the testing set and store them in y_pred_mitigated
y_pred_mitigated = ... # [INSERT CODE HERE]

# Evaluate testing set predictions using evaluate_model()
results_mitigated = evaluate_model(y_test, y_pred_mitigated, group_labels_test)

# Save your results to all_mitigated_results
results_mitigated['teammate'] = 'Teammate 3 - Henry'
results_mitigated['experiment_type'] = 'post-mitigation'
results_mitigated['predictor_model'] = ... #[INSERT MODEL NAME HERE]
results_mitigated['mitigation_strategy'] = ... #[INSERT STRATEGY TYPE HERE: 'preprocessing', 'inprocessing', 'postprocessing']
all_mitigated_results.append(results_mitigated)

print(results_mitigated)

### Teammate 3's Conclusions
[Briefly describe findings and conclusions here. Compare post-mitigation results with baseline results for your model. What is the % improvement in performance post-mitigation?]


## Teammate 4

In [None]:
# Implement your bias mitigation strategy
## If you chose preprocessing, you will train a new version of your predictor model with new/modified inputs.
## If you chose inprocessing, you will train a new version of your predictor with modified learning objective (loss function).
## If you chose postprocessing, you will implement strategies to modify the predictions (y_pred) of the trained baseline predictor model from the previous milestone without training any new version of the predictor model.

# [INSERT CODE HERE]

# Make predictions on the testing set and store them in y_pred_mitigated
y_pred_mitigated = ... # [INSERT CODE HERE]

# Evaluate testing set predictions using evaluate_model()
results_mitigated = evaluate_model(y_test, y_pred_mitigated, group_labels_test)

# Save your results to all_mitigated_results
results_mitigated['teammate'] = 'Teammate 4'
results_mitigated['experiment_type'] = 'post-mitigation'
results_mitigated['predictor_model'] = ... #[INSERT MODEL NAME HERE]
results_mitigated['mitigation_strategy'] = ... #[INSERT STRATEGY TYPE HERE: 'preprocessing', 'inprocessing', 'postprocessing']
all_mitigated_results.append(results_mitigated)

print(results_mitigated)

### Teammate 4's Conclusions
[Briefly describe findings and conclusions here. Compare post-mitigation results with baseline results for your model. What is the % improvement in performance post-mitigation?]


# Conclusions
_(new in this milestone)_


In [None]:
# Collect all the results in one table.
overall_results = pd.concat([pd.DataFrame(all_baseline_results), pd.DataFrame(all_mitigated_results)])
overall_results ## Note: The table displayed below in this starter notebook is for your reference, your team's table will be slightly different (e.g. different metrics, no.of sensitive attribute-based groups, actual values, etc.) upon successful completion of this notebook.

[Briefly describe overall findings and conclusions here. Which mitigation strategy resulted in most improvement? Which resulted in the least improvement? Visualize the results with some informative plots. (Hint: Use the `overall_results` table).]

# References

[List the references you used to complete this milestone here.]
- Teammate 1:
- Teammate 2:
- Teammate 3:
- Teammate 4:

# Disclosures

[Disclose use of generative AI and similar tools here.]
- Teammate 1:
- Teammate 2:
- Teammate 3:
- Teammate 4: