# HW2 - Bias in Data and Prediction - DSCI 531 - Spring 2024

### Please complete the code or analysis under "TODO". 100pts in total. You should run every cell and keep all the outputs before submitting. Failing to include your outputs will result in zero points.
### Please keep academic integrity in mind. Plagiarism will be taken seriously.

In [1]:
import numpy as np
import pandas as pd

## 1. Implement Utility Functions

### 1.1 Fairness Metrics

In [2]:
# You are NOT allowed to use off-the-shelf fairness packages like AIF360

def stat_parity(preds, sens):
    '''
    :preds: numpy array of the model predictions. Consisting of 0s and 1s
    :sens: numpy array of the sensitive features. Consisting of 0s and 1s
    :return: the statistical parity. no need to take the absolute value
    '''

    # TODO. 7.5pts
    # create a dataframe consisting of model predictions from preds array and protected values from sens array
    df = pd.DataFrame({'pred':preds,'prot':sens})

    #calculate the ratio of positive predictions for the unprivileged group ('pred' is 1 and 'prot' is 0)
    unpriv = len(df[(df['pred']==1)&(df['prot']==0)])/len(df[df['prot']==0])

    #calculate the ratio of positive predictions for the privileged group ('pred' is 1 and 'prot' is 1)
    priv = len(df[(df['pred']==1)&(df['prot']==1)])/len(df[df['prot']==1])

    #return the difference between the ratios calculated for the privileged and unprivileged groups
    return priv - unpriv



def eq_oppo(preds, sens, labels):
    '''
    :preds: numpy array of the model predictions. Consisting of 0s and 1s
    :sens: numpy array of the sensitive features. Consisting of 0s and 1s
    :labels: numpy array of the ground truth labels of the outcome. Consisting of 0s and 1s
    :return: the equal opportunity difference (true positive rate of group 1 minus that of group 0). no need to take the absolute value
    '''

    # TODO. 7.5pts
    # create a dataframe consisting of model predictions from preds array, protected values from sens array
    # and labels array containing ground truth
    df = pd.DataFrame({'pred':preds,'prot':sens,'labels':labels})
    # calculate the true positive rates (a and b) for the two sensitive groups;
    # a group with no positive labels would divide by zero, so its rate falls back to 0
    try:
        a = len(df[(df['pred']==1) & (df['labels']==1) & (df['prot']==1)]) / len(df[(df['labels']==1) & (df['prot']==1)])
    except ZeroDivisionError:
        a = 0

    try:
        b = len(df[(df['pred']==1) & (df['labels']==1) & (df['prot']==0)]) / len(df[(df['labels']==1) & (df['prot']==0)])
    except ZeroDivisionError:
        b = 0

    return a-b

In [3]:
# Test your implemented fairness metrics using the code below
# Don't change the code in this cell

# test case 1
preds = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 1])
sens = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])
labels = np.array([0, 1, 0, 1, 0, 1, 1, 1, 0, 1])
print(eq_oppo(preds, sens, labels), stat_parity(preds, sens))

# test case 2
preds = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 1])
sens = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
labels = np.array([0, 1, 0, 1, 0, 1, 1, 0, 0, 0])
print(eq_oppo(preds, sens, labels), stat_parity(preds, sens))


# test case 3
preds = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
sens = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
labels = np.array([0, 1, 0, 1, 0, 1, 1, 0, 0, 0])
print(eq_oppo(preds, sens, labels), stat_parity(preds, sens))

0.4 -0.125
-0.75 0.5
0.0 1.0
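
The two metrics tested above compare rates between the group with `sens == 1` and the group with `sens == 0`: statistical parity compares the share of positive predictions, and equal opportunity compares true positive rates (positive predictions among rows whose label is 1). As a minimal sketch, the same quantities can be computed with NumPy boolean masks instead of a DataFrame. The names `stat_parity_np`, `eq_oppo_np`, and `_rate` are hypothetical, and returning 0 for an empty group is an assumption that mirrors the fallback in `eq_oppo` (the `stat_parity` above has no such fallback).

```python
import numpy as np

def _rate(preds, mask):
    # Share of positive predictions inside the masked group.
    # Assumption: an empty group contributes a rate of 0.0.
    return float(preds[mask].mean()) if mask.any() else 0.0

def stat_parity_np(preds, sens):
    preds, sens = np.asarray(preds), np.asarray(sens)
    # positive rate of group 1 minus positive rate of group 0
    return _rate(preds, sens == 1) - _rate(preds, sens == 0)

def eq_oppo_np(preds, sens, labels):
    preds, sens, labels = map(np.asarray, (preds, sens, labels))
    pos = labels == 1  # restrict to rows whose ground truth is positive
    return _rate(preds, pos & (sens == 1)) - _rate(preds, pos & (sens == 0))

# test case 1 from the cell above
preds = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 1])
sens = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])
labels = np.array([0, 1, 0, 1, 0, 1, 1, 1, 0, 1])
print(eq_oppo_np(preds, sens, labels), stat_parity_np(preds, sens))
```

On test case 1 this should agree with the DataFrame-based functions (0.4 and -0.125).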


### 1.2 Preprocessing DataFrame

In [4]:
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

def process_dfs(df_train_x, df_test_x, categ_cols):
    '''
    Pre-process the features of the training set and the test set, not including the outcome column.
    Convert categorical features (nominal & ordinal features) to one-hot encodings.
    Normalize the numerical features into [0, 1].
    We process training set and the test set together in order to make sure that
    the encodings are consistent between them.
    For example, if one class is encoded as 001 and another class is encoded as 010 in the training set,
    you should follow this mapping for the test set too.

    :df_train_x: the dataframe of the training features
    :df_test_x: the dataframe of the test features
    :categ_cols: the column names of the categorical features. the remaining features are treated as numerical ones.
    :return: the processed training data and test data, both should be numpy arrays, instead of DataFrames
    '''

    # TODO. 15pts
    combined = pd.concat([df_train_x, df_test_x], keys=['train', 'test'])

    # Separate categorical and numerical columns
    num_cols = [col for col in combined.columns if col not in categ_cols]

    # One-hot encode categorical columns
    encoder = OneHotEncoder(sparse_output=False, drop='first')
    combined_categ = encoder.fit_transform(combined[categ_cols])

    # Normalize numerical columns
    scaler = MinMaxScaler()
    combined_num = scaler.fit_transform(combined[num_cols])

    # Concatenate processed categorical and numerical features
    combined_processed = np.concatenate([combined_categ, combined_num], axis=1)

    # Split back into training and test set
    train_x = combined_processed[:len(df_train_x)]
    test_x = combined_processed[len(df_train_x):]

    return train_x, test_x

In [5]:
# Test your implemented data preprocessing function
# DO NOT change the code in this cell

df_train_x = pd.DataFrame([
    [ 'big', 10, 'blue',],
    [ 'big', 12, 'red',],
    ['medium', 5, 'blue'],
    ['small', 7, 'yellow']
], columns=['size', 'height', 'color'])

df_test_x = pd.DataFrame([
    [ 'big', 16, 'red',],
    ['small', 9, 'blue']
], columns=['size', 'height', 'color'])

train_data_x, test_data_x = process_dfs(df_train_x, df_test_x, categ_cols=['size', 'color'])
print(train_data_x)
print()
print(test_data_x)

[[0.         0.         0.         0.         0.45454545]
 [0.         0.         1.         0.         0.63636364]
 [1.         0.         0.         0.         0.        ]
 [0.         1.         0.         1.         0.18181818]]

[[0.         0.         1.         0.         1.        ]
 [0.         1.         0.         0.         0.36363636]]
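
The docstring of `process_dfs` asks for encodings that are consistent between the training and test sets. As a rough illustration of why processing them together achieves this, the sketch below one-hot encodes the concatenated frame with `pd.get_dummies` and then splits it back. This is a pandas-only stand-in, not the `OneHotEncoder` approach used above; the helper name `one_hot_consistent` is hypothetical, and unlike `process_dfs` it keeps every category column (no `drop='first'`) and does not scale `height`.

```python
import pandas as pd

def one_hot_consistent(df_train, df_test, categ_cols):
    # Encode both sets in one pass so each category maps to the same column.
    combined = pd.concat([df_train, df_test], keys=["train", "test"])
    encoded = pd.get_dummies(combined, columns=categ_cols, dtype=float)
    # The outer index level ("train"/"test") recovers the original split.
    return encoded.loc["train"], encoded.loc["test"]

toy_train = pd.DataFrame(
    [["big", 10, "blue"], ["big", 12, "red"], ["medium", 5, "blue"], ["small", 7, "yellow"]],
    columns=["size", "height", "color"],
)
toy_test = pd.DataFrame(
    [["big", 16, "red"], ["small", 9, "blue"]],
    columns=["size", "height", "color"],
)

enc_train, enc_test = one_hot_consistent(toy_train, toy_test, ["size", "color"])
print(list(enc_train.columns))
print(enc_test)
```

The test frame contains no `yellow` row, yet it still gets a `color_yellow` column (all zeros), so both arrays have the same width and column meaning.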


## 2. Load Data

In [6]:
df_train_adult = pd.read_csv('adult-train.csv', sep=', ', engine='python')
df_test_adult = pd.read_csv('adult-test.csv', sep=', ', engine='python')
df_train_adult['sex'] = df_train_adult['sex'].map({'Male': 0, 'Female': 1})
df_test_adult['sex'] = df_test_adult['sex'].map({'Male': 0, 'Female': 1})
df_train_adult['income'] = df_train_adult['income'].map({'<=50K': 0, '>50K': 1})
df_test_adult['income'] = df_test_adult['income'].map({'<=50K': 0, '>50K': 1})


df_train_german = pd.read_csv('german-train.csv')
df_test_german = pd.read_csv('german-test.csv')
df_train_german['age'] = df_train_german['age'].apply(lambda x: 1 if x >= 33 else 0)
df_test_german['age'] = df_test_german['age'].apply(lambda x: 1 if x>=33 else 0)
df_train_german['credit_status'] = df_train_german['credit_status'].map({2:0, 1:1})
df_test_german['credit_status'] = df_test_german['credit_status'].map({2:0, 1:1})

In [7]:
df_train_adult.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,income
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,0,2174,0,40,United-States,0
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,0,0,0,13,United-States,0
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,0,0,0,40,United-States,0
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,0,0,0,40,United-States,0
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,1,0,0,40,Cuba,0


In [8]:
df_train_german.head()

Unnamed: 0,checking_account,duration,credit_history,purpose,credit_amount,savings_account,present_employment_since,installment_rate,personal_status_sex,other_debtors,...,property,age,other_installment_plans,housing,num_credits,job,num_people_liable,telephone,foreign_worker,credit_status
0,A14,21,A32,A41,5248,A65,A73,1,A93,A101,...,A123,0,A143,A152,1,A173,1,A191,A201,1
1,A11,24,A32,A43,1987,A61,A73,2,A93,A101,...,A121,0,A143,A151,1,A172,2,A191,A201,0
2,A14,36,A32,A49,5742,A62,A74,2,A93,A101,...,A123,0,A143,A152,2,A173,1,A192,A201,1
3,A14,36,A32,A49,7409,A65,A75,3,A93,A101,...,A122,1,A143,A152,2,A173,1,A191,A201,1
4,A14,6,A34,A42,1221,A65,A73,1,A94,A101,...,A122,0,A143,A152,2,A173,1,A191,A201,1


## 3. Explore fairness in data

### 3.1 Statistical analysis on protected feature and outcome

In [9]:
# Adult
# calculate the mean income of two protected groups. only use the training data df_train_adult.
# TODO. 3pts. The starter code below just indicates what you need to output in your code.

mean_income1_adult = df_train_adult[df_train_adult['sex'] == 0]['income'].mean()
mean_income2_adult = df_train_adult[df_train_adult['sex'] == 1]['income'].mean()

print(mean_income1_adult, mean_income2_adult)

# German
# calculate the mean credit status of two protected groups. only use the training data df_train_german.
# TODO. 3pts. The starter code below just indicates what you need to output in your code.
mean_credit1_german = df_train_german[df_train_german['age'] == 0]['credit_status'].mean()
mean_credit2_german = df_train_german[df_train_german['age'] == 1]['credit_status'].mean()

print(mean_credit1_german, mean_credit2_german)

0.3138370951913641 0.11367818442036394
0.6636363636363637 0.7594594594594595


In [10]:
# t-test between outcome of two protected groups. only use the training data df_train_adult/german.
from scipy.stats import ttest_ind

# Adult
# TODO. 1.5pts. The starter code below just indicates what you need to output in your code.
group1_adult = df_train_adult[df_train_adult['sex'] == 0]['income']
group2_adult = df_train_adult[df_train_adult['sex'] == 1]['income']
_, p_value_adult = ttest_ind(group1_adult, group2_adult)

# german
# TODO. 1.5pts. The starter code below just indicates what you need to output in your code.
age_group1 = 0  # age < 33
age_group2 = 1  # age >= 33

group1_german = df_train_german[df_train_german['age'] == age_group1]['credit_status']
group2_german = df_train_german[df_train_german['age'] == age_group2]['credit_status']
_, p_value_german = ttest_ind(group1_german, group2_german)

print(p_value_adult, p_value_german)

0.0 0.0050427130735674645


### From the p_values, are the results significant for Adult and German? How do you explain them?
### <span style="color:red">Please type your response here.</span> 3 pts
In the Adult dataset, the p-value prints as 0.0, which indicates a value too small for floating point to represent rather than a true zero. It is far below the usual 0.05 significance level, so we reject the null hypothesis that the two sex groups have the same mean income: 31.4% of the sex = 0 (Male) group earns >50K versus 11.4% of the sex = 1 (Female) group. This gap might reflect real-world variations due to job differences, wage gaps, or other socio-economic factors. The t-test alone does not say which, so explaining the gap requires exploring additional factors like occupation, education, or work experience.
<br>
In the German dataset, the p-value is about 0.0050, which is also below 0.05, so we again reject the null hypothesis: credit status differs significantly between the age groups (66.4% good credit for age < 33 versus 75.9% for age >= 33). The evidence is weaker than for Adult, in part because the training set has only 700 rows. The difference could be influenced by factors such as financial stability, credit history length, or risk profiles associated with specific age ranges. Further analysis, potentially using multivariate methods, is necessary to identify the main drivers.
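
The decision rule behind this kind of answer can be sketched as follows: run the two-sample t-test and compare the p-value against a significance level. The level `alpha = 0.05` is an assumption (the assignment does not fix one), `compare_groups` is a hypothetical helper, and the two groups are made-up binary outcomes used only for illustration, not values from either dataset.

```python
import numpy as np
from scipy.stats import ttest_ind

def compare_groups(group_a, group_b, alpha=0.05):
    # Null hypothesis: both groups have the same mean outcome.
    _, p_value = ttest_ind(group_a, group_b)
    # Reject the null hypothesis when the p-value falls below alpha.
    return p_value, bool(p_value < alpha)

# Illustrative data only: 30% positive outcomes versus 10% positive outcomes.
group_a = np.array([1] * 30 + [0] * 70)
group_b = np.array([1] * 10 + [0] * 90)

p_value, significant = compare_groups(group_a, group_b)
print(p_value, significant)
```

With very large samples and a large gap, the p-value can underflow and print as `0.0`; that still means "reject", not that the probability is literally zero.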

### 3.2 Explore Fairness in Prediction

In [11]:
# Prepare data
# Don't change code in this cell

'''
:train_x: the features in the training set (including the sensitive features), shape: N_train x d
:train_y: the outcome in the training set, shape: N_train
:test_x: the features in the test set (including the sensitive features), shape: N_test x d
:test_y: the outcome in the test set, shape: N_test
:test_sens: the sensitive/protected feature in the test set, shape: N_test
All of them are processed numpy arrays that are ready for algorithms.
'''


# adult
# the outcome (income) is the last column
df_train_x_adult = df_train_adult.iloc[:, :-1]
df_train_y_adult = df_train_adult.iloc[:, -1]
df_test_x_adult = df_test_adult.iloc[:, :-1]
df_test_y_adult = df_test_adult.iloc[:, -1]
df_test_sens_adult = df_test_adult['sex']

train_x_adult, test_x_adult = process_dfs(df_train_x_adult, df_test_x_adult,
                                                   ['workclass', 'education','marital-status',
                                                    'occupation','relationship','race',
                                                    'native-country'])
train_y_adult = df_train_y_adult.values
test_y_adult = df_test_y_adult.values
test_sens_adult = df_test_sens_adult.values

# german
# the outcome (credit status) is the last column
df_train_x_german = df_train_german.iloc[:, :-1]
df_train_y_german = df_train_german.iloc[:, -1]
df_test_x_german = df_test_german.iloc[:, :-1]
df_test_y_german = df_test_german.iloc[:, -1]
df_test_sens_german = df_test_german['age']

train_x_german, test_x_german = process_dfs(df_train_x_german, df_test_x_german,
                                                     ['checking_account', 'credit_history',
                                                      'purpose', 'savings_account', 'present_employment_since',
                                                      'personal_status_sex', 'other_debtors',
                                                     'property', 'other_installment_plans',
                                                     'housing', 'job', 'telephone', 'foreign_worker'])
train_y_german = df_train_y_german.values
test_y_german = df_test_y_german.values
test_sens_german = df_test_sens_german.values

print(train_x_adult.shape, test_x_adult.shape, train_y_adult.shape, test_y_adult.shape)
print(train_x_german.shape, test_x_german.shape, train_y_german.shape, test_y_german.shape)

(30162, 96) (15060, 96) (30162,) (15060,)
(700, 48) (300, 48) (700,) (300,)


In [12]:
# train a classifier to predict the outcome y from features x
# training: train_x --> train_y; test: test_x --> preds
# logistic regression model is recommended
# using sklearn is allowed
from sklearn.linear_model import LogisticRegression
import warnings
warnings.filterwarnings("ignore")

# Adult 5 pts

# initialize the model
# TODO.
model_adult = LogisticRegression(max_iter=1000, random_state=42)

# train/fit the model with train_x_adult and train_y_adult
# TODO.
model_adult.fit(train_x_adult, train_y_adult)

# predict the outcome from test_x_adult
# TODO. The starter code below just indicates what you need to output in your code.
preds = model_adult.predict(test_x_adult)


# report acc and two fairness metrics.
from sklearn.metrics import accuracy_score
acc = accuracy_score(test_y_adult, preds)
stat_p = stat_parity(preds, test_sens_adult)
eq_op = eq_oppo(preds, test_sens_adult, test_y_adult)
print(acc, stat_p, eq_op)





# German 5 pts

# initialize the model
# TODO.
model_german = LogisticRegression(max_iter=1000, random_state=42)

# train/fit the model with train_x_german and train_y_german
# TODO.
model_german.fit(train_x_german, train_y_german)

# predict the outcome from test_x_german
# TODO. The starter code below just indicates what you need to output in your code.
preds = model_german.predict(test_x_german)


# report acc and two fairness metrics
from sklearn.metrics import accuracy_score
acc = accuracy_score(test_y_german, preds)
stat_p = stat_parity(preds, test_sens_german)
eq_op = eq_oppo(preds, test_sens_german, test_y_german)
print(acc, stat_p, eq_op)

0.8457503320053121 -0.18311237991029125 -0.1006888294697229
0.7533333333333333 0.12378284647192217 0.07932692307692313


## 4. Explore possible ways to mitigate bias

### 4.1 Remove protected attribute

In [13]:
# Adult
# remove the sex column from df_train_x_adult and df_test_x_adult.
# You shouldn't do it in-place. In other words, do not modify df_train_x_adult or df_test_x_adult
# TODO. 4pts. The starter code below just indicates what you need to output in your code.
df_train_x_no_sens_adult = df_train_x_adult.drop('sex', axis=1, inplace=False)
df_test_x_no_sens_adult = df_test_x_adult.drop('sex', axis=1, inplace=False)



train_x_adult, test_x_adult = process_dfs(df_train_x_no_sens_adult, df_test_x_no_sens_adult,
                                                   ['workclass', 'education','marital-status',
                                                    'occupation','relationship','race',
                                                    'native-country'])


# German
# remove age column from df_train_x_german and df_test_x_german
# You shouldn't do it in-place. In other words, do not modify df_train_x_german or df_test_x_german
# TODO. 4pts. The starter code below just indicates what you need to output in your code.
df_train_x_no_sens_german = df_train_x_german.drop('age', axis=1, inplace=False)
df_test_x_no_sens_german = df_test_x_german.drop('age', axis=1, inplace=False)


train_x_german, test_x_german = process_dfs(df_train_x_no_sens_german, df_test_x_no_sens_german,
                                                     ['checking_account', 'credit_history',
                                                      'purpose', 'savings_account', 'present_employment_since',
                                                      'personal_status_sex', 'other_debtors',
                                                     'property', 'other_installment_plans',
                                                     'housing', 'job', 'telephone', 'foreign_worker'])


print(train_x_adult.shape, test_x_adult.shape)
print(train_x_german.shape, test_x_german.shape)

(30162, 95) (15060, 95)
(700, 47) (300, 47)


In [14]:
# train a classifier to predict the outcome y from features x (with protected feature removed)
# training: train_x --> train_y; test: test_x --> preds
# logistic regression model is recommended
# using sklearn is allowed
# Just use the code in 3.2 again


# Adult 4 pts

# initialize the model
# TODO.
model_adult = LogisticRegression(max_iter=1000, random_state=42)

# train/fit the model with train_x_adult and train_y_adult
# TODO.
model_adult.fit(train_x_adult, train_y_adult)

# predict the outcome from test_x_adult
# TODO. The starter code below just indicates what you need to output in your code.
preds = model_adult.predict(test_x_adult)

# report acc and two fairness metrics
from sklearn.metrics import accuracy_score
acc = accuracy_score(test_y_adult, preds)
stat_p = stat_parity(preds, test_sens_adult)
eq_op = eq_oppo(preds, test_sens_adult, test_y_adult)
print(acc, stat_p, eq_op)



# German 4 pts

# initialize the model
# TODO.
model_german = LogisticRegression(max_iter=1000, random_state=42)

# train/fit the model with train_x_german and train_y_german
# TODO.
model_german.fit(train_x_german, train_y_german)

# predict the outcome from test_x_german
# TODO. The starter code below just indicates what you need to output in your code.
preds = model_german.predict(test_x_german)

# report acc and two fairness metrics
from sklearn.metrics import accuracy_score
acc = accuracy_score(test_y_german, preds)
stat_p = stat_parity(preds, test_sens_german)
eq_op = eq_oppo(preds, test_sens_german, test_y_german)
print(acc, stat_p, eq_op)

0.8456839309428951 -0.17440963651541957 -0.07087249257561912
0.7633333333333333 0.10337468320661602 0.07932692307692313


### According to the results, how are the accuracy, stat parity and eq oppo different from the original model? Does explicitly removing the sensitive feature help in mitigating bias? Why or why not?
### <span style="color:red">Please type your response here.</span> 4 points
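Compared with the original model, Adult accuracy is essentially unchanged (0.8458 to 0.8457), while German accuracy rises slightly (0.7533 to 0.7633). The statistical parity gap shrinks in magnitude on both datasets (Adult: -0.183 to -0.174; German: 0.124 to 0.103), indicating slightly less bias. The equal opportunity gap shrinks for Adult (-0.101 to -0.071) but does not change for German (0.079).
<br>
These findings suggest that removing the sensitive feature helps a bit with bias, but it is not a large improvement. The remaining features are still correlated with the protected attribute (for example, `relationship` values such as Husband and Wife in Adult largely reveal sex), so the model can recover much of the same signal from these proxies. Bias comes not only from the protected column itself but also from how it is connected to other features.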
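
One way to look for features that stand in for a removed protected attribute is to correlate every remaining (one-hot encoded) column with it. The sketch below does this on a tiny made-up frame; `proxy_correlations` is a hypothetical helper, the rows are illustrative rather than taken from `adult-train.csv`, and absolute Pearson correlation is only one of several possible association measures.

```python
import pandas as pd

def proxy_correlations(df, sens_col):
    # One-hot encode everything except the protected attribute.
    features = pd.get_dummies(df.drop(columns=[sens_col]), dtype=float)
    # Correlate each encoded column with the protected attribute.
    corr = features.corrwith(df[sens_col].astype(float))
    # Strongest candidates for proxy features come first.
    return corr.abs().sort_values(ascending=False)

# Illustrative rows only, using column names that appear in the Adult data.
toy = pd.DataFrame({
    "sex": [0, 0, 0, 1, 1, 1],
    "relationship": ["Husband", "Husband", "Not-in-family",
                     "Wife", "Wife", "Not-in-family"],
    "hours-per-week": [40, 45, 38, 40, 45, 38],
})

proxies = proxy_correlations(toy, "sex")
print(proxies)
```

In this toy frame the `relationship_Husband` and `relationship_Wife` columns correlate strongly with `sex`, while `hours-per-week` does not; columns like these can let a model reconstruct the attribute even after it is dropped.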

### 4.2 Augmenting the training set

#### See the example in Figure 1 of https://dl.acm.org/doi/pdf/10.1145/3375627.3375865

In [15]:
# Adult
# create a synthetic training set by duplicating df_train_x_adult and df_train_y_adult
# after duplicating flip sex in the synthetic set
# You shouldn't do it in-place. In other words, do not modify df_train_x_adult or df_train_y_adult
# TODO. 8pts. The starter code below just indicates what you need to output in your code.
df_train_x_syn_adult = df_train_x_adult.copy()
df_train_x_syn_adult['sex'] = df_train_x_syn_adult['sex'].apply(lambda x: 1 if x == 0 else 0)
df_train_y_syn_adult = df_train_y_adult.copy()

# augment the original training set by the synthetic set. In other words, concatenate them
df_train_x_aug_adult = pd.concat((df_train_x_adult, df_train_x_syn_adult))
df_train_y_aug_adult = pd.concat((df_train_y_adult, df_train_y_syn_adult))

print(df_train_x_aug_adult.shape, df_train_y_aug_adult.shape)


train_x_adult, test_x_adult = process_dfs(df_train_x_aug_adult, df_test_x_adult,
                                                   ['workclass', 'education','marital-status',
                                                    'occupation','relationship','race',
                                                    'native-country'])
train_y_adult = df_train_y_aug_adult.values
print(train_x_adult.shape, test_x_adult.shape, train_y_adult.shape)



# German
# create a synthetic training set by duplicating df_train_x_german and df_train_y_german
# after duplicating flip age in the synthetic set.
# You shouldn't do it in-place. In other words, do not modify df_train_x_german or df_train_y_german
# TODO. 8pts. The starter code below just indicates what you need to output in your code.
df_train_x_syn_german = df_train_x_german.copy()
df_train_x_syn_german['age'] = df_train_x_syn_german['age'].apply(lambda x: 0 if x == 1 else 1)
df_train_y_syn_german = df_train_y_german.copy()


# augment the original training set by the synthetic set. In other words, concatenate them
df_train_x_aug_german = pd.concat((df_train_x_german, df_train_x_syn_german))
df_train_y_aug_german = pd.concat((df_train_y_german, df_train_y_syn_german))

train_y_german = df_train_y_aug_german.values

print(df_train_x_aug_german.shape, df_train_y_aug_german.shape, train_y_german.shape)


train_x_german, test_x_german = process_dfs(df_train_x_aug_german, df_test_x_german,
                                                     ['checking_account', 'credit_history',
                                                      'purpose', 'savings_account', 'present_employment_since',
                                                      'personal_status_sex', 'other_debtors',
                                                     'property', 'other_installment_plans',
                                                     'housing', 'job', 'telephone', 'foreign_worker'])
print(train_x_german.shape, test_x_german.shape)

(60324, 14) (60324,)
(60324, 96) (15060, 96) (60324,)
(1400, 20) (1400,) (1400,)
(1400, 48) (300, 48)


In [16]:
# train a classifier to predict the outcome y from features x on the augmented training data
# training: train_x --> train_y; test: test_x --> preds
# logistic regression model is recommended
# using sklearn is allowed
# Just use the code in 3.2 again


# Adult 4 pts

# initialize the model
# TODO.
model_adult = LogisticRegression(max_iter=1000, random_state=42)

# train/fit the model with train_x_adult and train_y_adult
# TODO.
model_adult.fit(train_x_adult, train_y_adult)

# predict the outcome from test_x_adult
# TODO. The starter code below just indicates what you need to output in your code.
preds = model_adult.predict(test_x_adult)

# report acc and two fairness metrics
from sklearn.metrics import accuracy_score
acc = accuracy_score(test_y_adult, preds)
stat_p = stat_parity(preds, test_sens_adult)
eq_op = eq_oppo(preds, test_sens_adult, test_y_adult)
print(acc, stat_p, eq_op)



# German 4 pts

# initialize the model
# TODO.
model_german = LogisticRegression(max_iter=1000, random_state=42)

# train/fit the model with train_x_german and train_y_german
# TODO.
model_german.fit(train_x_german, train_y_german)

# predict the outcome from test_x_german
# TODO. The starter code below just indicates what you need to output in your code.
preds = model_german.predict(test_x_german)

# report acc and two fairness metrics
from sklearn.metrics import accuracy_score
acc = accuracy_score(test_y_german, preds)
stat_p = stat_parity(preds, test_sens_german)
eq_op = eq_oppo(preds, test_sens_german, test_y_german)
print(acc, stat_p, eq_op)

0.8467463479415671 -0.17517229075356358 -0.07237250599919687
0.76 0.08323329331732687 0.06971153846153844


### According to the results, how are the accuracy, stat parity and eq oppo different from the original model? Does augmenting the dataset with synthetic data help in mitigating bias? Why or why not?
### <span style="color:red">Please type your response here.</span> 4 points
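Compared with the original model in 3.2 (trained with the protected attribute and no augmentation), augmenting the training set with synthetic data, where the sensitive attribute is flipped, slightly improved accuracy and both fairness metrics on both datasets. Adult: accuracy 0.8458 to 0.8467, statistical parity -0.183 to -0.175, equal opportunity -0.101 to -0.072. German: accuracy 0.7533 to 0.7600, statistical parity 0.124 to 0.083, equal opportunity 0.079 to 0.070. Because every training example now appears once with each value of the protected attribute and the same label, the attribute by itself carries no information about the outcome in the augmented data, so the model has little reason to rely on it directly. The gaps remain well above zero, however, because correlated features can still act as proxies. The method therefore reduces bias only slightly, but it does so without hurting accuracy.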
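
The augmentation step can be sketched on a toy frame to show its effect on the training data: after every row is duplicated with the binary protected attribute flipped and the label kept, both groups have the same mean outcome. The helper `augment_with_flipped` is hypothetical, the rows are made up, and the flip `1 - x` assumes the attribute is coded as 0/1 as in this notebook.

```python
import pandas as pd

def augment_with_flipped(df_x, y, sens_col):
    # Work on a copy so the original features are not modified in place.
    flipped = df_x.copy()
    flipped[sens_col] = 1 - flipped[sens_col]  # assumes 0/1 coding
    # Stack originals and flipped copies; labels are duplicated unchanged.
    x_aug = pd.concat([df_x, flipped], ignore_index=True)
    y_aug = pd.concat([y, y], ignore_index=True)
    return x_aug, y_aug

# Illustrative rows only: group 0 always has outcome 1, group 1 always 0.
toy_x = pd.DataFrame({"sex": [0, 0, 1, 1], "hours-per-week": [40, 50, 40, 30]})
toy_y = pd.Series([1, 1, 0, 0])

x_aug, y_aug = augment_with_flipped(toy_x, toy_y, "sex")
group_means = y_aug.groupby(x_aug["sex"]).mean()
print(group_means)
```

Before augmentation the toy outcome is fully determined by `sex`; afterwards both groups have the same mean outcome, so a model can no longer separate the classes using that column alone.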

**Summary:**
1. **Including the protected attributes** <br>
  **Adult:** <br>
  Accuracy: 0.8457503320053121 <br>
  Stat_P: -0.18311237991029125 <br>
  Eq_op: -0.1006888294697229 <br>
  **German:** <br>
  Accuracy: 0.7533333333333333 <br>
  Stat_P: 0.12378284647192217 <br>
  Eq_op: 0.07932692307692313
<br><br>
2. **Removing the protected attributes** <br>
  **Adult:** <br>
  Accuracy: 0.8456839309428951 <br>
  Stat_P: -0.17440963651541957 <br>
  Eq_op: -0.07087249257561912 <br>
  **German:** <br>
  Accuracy: 0.7633333333333333 <br>
  Stat_P: 0.10337468320661602 <br>
  Eq_op: 0.07932692307692313
<br><br>
3. **Augmenting the training set** <br>
  **Adult:** <br>
  Accuracy: 0.8467463479415671 <br>
  Stat_P: -0.17517229075356358 <br>
  Eq_op: -0.07237250599919687 <br>
  **German:** <br>
  Accuracy: 0.76 <br>
  Stat_P: 0.08323329331732687 <br>
  Eq_op: 0.06971153846153844
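
To make the comparison across the three settings easier to read, the short script below takes the numbers from the summary and reports, for each method, the change in accuracy and in the magnitude of each fairness gap relative to the model that includes the protected attribute. The dictionary layout and the helper `gap_changes` are illustrative choices; only the numeric values come from the outputs above.

```python
# (accuracy, statistical parity, equal opportunity) copied from the summary
results = {
    "Adult": {
        "with protected attribute": (0.8457503320053121, -0.18311237991029125, -0.1006888294697229),
        "attribute removed": (0.8456839309428951, -0.17440963651541957, -0.07087249257561912),
        "augmented training set": (0.8467463479415671, -0.17517229075356358, -0.07237250599919687),
    },
    "German": {
        "with protected attribute": (0.7533333333333333, 0.12378284647192217, 0.07932692307692313),
        "attribute removed": (0.7633333333333333, 0.10337468320661602, 0.07932692307692313),
        "augmented training set": (0.76, 0.08323329331732687, 0.06971153846153844),
    },
}

def gap_changes(dataset):
    # Differences relative to the baseline; negative gap changes mean less disparity.
    base_acc, base_sp, base_eo = results[dataset]["with protected attribute"]
    changes = {}
    for method, (acc, sp, eo) in results[dataset].items():
        changes[method] = (acc - base_acc, abs(sp) - abs(base_sp), abs(eo) - abs(base_eo))
    return changes

for dataset in results:
    for method, (d_acc, d_sp, d_eo) in gap_changes(dataset).items():
        print(f"{dataset:7s} {method:26s} acc {d_acc:+.4f}  |SP| {d_sp:+.4f}  |EO| {d_eo:+.4f}")
```

Comparing magnitudes rather than signed values matters here because the Adult gaps are negative and the German gaps are positive.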

