# Context

### FILL IN CONTEXT FOR THE DATASET HERE
Where does your dataset come from? What is it for, how was it
collected, etc.?

FILL HERE

# TASK
Code: Report standard accuracy and fairness metrics of labels and predictions,
such as those in Modules 1 and 2.

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load dataset
data = pd.read_csv("diabetic_data.csv") 

# Convert the readmission column to binary: 1 if the patient was readmitted at all ('<30' or '>30' days), 0 otherwise.
data['readmitted'] = data['readmitted'].apply(lambda x: 1 if x in ['<30', '>30'] else 0)

# Drop columns that are not used as features:
# encounter_id, patient_nbr -> unique identifiers of the encounter and the patient; not meaningful as predictors
# weight -> 97% of weight entries are missing
# readmitted -> this is the target; the binarized version built above becomes the label y
# payer_code, medical_specialty -> too many missing entries to be informative (>50% of entries)

X = data.drop(columns=['encounter_id', 'patient_nbr', 'readmitted', 'payer_code', 'weight', 'medical_specialty'])
y = data['readmitted']

# Identify numerical and categorical features
num_features = X.select_dtypes(include=['int64', 'float64']).columns
cat_features = X.select_dtypes(include=['object']).columns

# Impute missing numerical values with the column mean, then standardize
num_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy="mean")),
    ('scaler', StandardScaler())
])

# Impute missing categorical values with the most frequent category, then one-hot encode
cat_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy="most_frequent")),
    ('encoder', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(transformers=[
    ('num', num_transformer, num_features),
    ('cat', cat_transformer, cat_features)
])

# Build a two-step pipeline: preprocessing followed by a class-weighted logistic regression
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(class_weight='balanced'))
])

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit model
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)

# OVERALL ACCURACY -> Metric 1
print(f"Accuracy: {accuracy:.4f}")

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

Accuracy: 0.6258


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
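
The warning above reports that the solver stopped at its iteration limit before converging. A minimal sketch of raising that limit follows. It runs on synthetic data from `make_classification` rather than the diabetes dataset, and `max_iter=1000` is an illustrative value, not a tuned one.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; not the diabetes dataset)
X_demo, y_demo = make_classification(n_samples=500, n_features=20, random_state=0)

# max_iter=1000 is an illustrative value; the scikit-learn default is 100
clf = LogisticRegression(class_weight='balanced', max_iter=1000)

with warnings.catch_warnings():
    # Turn a convergence warning into an error so it cannot go unnoticed
    warnings.simplefilter("error", category=ConvergenceWarning)
    clf.fit(X_demo, y_demo)

print(f"Iterations used: {clf.n_iter_[0]}")
```

In the notebook's pipeline the same argument would go on the `'classifier'` step, i.e. `LogisticRegression(class_weight='balanced', max_iter=1000)`. Whether that is enough for the one-hot encoded feature matrix would need to be checked by rerunning the cell.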


In [30]:
from sklearn.metrics import mean_squared_error, mean_absolute_error

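# Precision, recall and F1 from the confusion-matrix counts computed earlier
precision = tp/(tp+fp)
recall = tp/(tp+fn)
f1_score = 2 * precision * recall / (precision + recall)
# With 0/1 labels and predictions, MSE and MAE both reduce to the misclassification rate
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)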

print(f"Precision Score: {precision:.4f}")
print(f"Recall Score: {recall:.4f}")
print(f"F1 Score: {f1_score:.4f}")
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"Mean Absolute Error (MAE): {mae:.4f}")


Precision Score: 0.6256
Recall Score: 0.4861
F1 Score: 0.5471
Mean Squared Error (MSE): 0.3718
Mean Absolute Error (MAE): 0.3718
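
MSE and MAE are identical in the output above because every error on 0/1 labels is either 0 or 1, so squaring changes nothing. A small sketch on toy values (hypothetical, not taken from the notebook) shows that both equal one minus the accuracy:

```python
# Toy labels and predictions (hypothetical values, not taken from the notebook)
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 0, 0]

n = len(y_true_demo)
errors = [abs(t - p) for t, p in zip(y_true_demo, y_pred_demo)]

mae_demo = sum(errors) / n                    # mean |y - y_hat|
mse_demo = sum(e ** 2 for e in errors) / n    # mean (y - y_hat)^2
accuracy_demo = sum(t == p for t, p in zip(y_true_demo, y_pred_demo)) / n

# For 0/1 values each error is 0 or 1, so e == e**2 and both equal 1 - accuracy
print(mae_demo, mse_demo, 1 - accuracy_demo)
```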


In [None]:
# Checking group sizes across race, gender, and age
unique_gender_groups = X_train['gender'].unique()

Age Group [70-80): 20879 entries in X_train
Age Group [50-60): 13741 entries in X_train
Age Group [80-90): 13737 entries in X_train
Age Group [40-50): 7810 entries in X_train
Age Group [60-70): 17959 entries in X_train
Age Group [30-40): 3030 entries in X_train
Age Group [10-20): 551 entries in X_train
Age Group [90-100): 2239 entries in X_train
Age Group [20-30): 1335 entries in X_train
Age Group [0-10): 131 entries in X_train
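
Counts in the format printed above can be produced with `value_counts`. This is a minimal sketch on a toy frame. `X_train_demo` and its rows are hypothetical, and only the `age` column name matches the notebook.

```python
import pandas as pd

# Toy stand-in for X_train (hypothetical rows; only the column name matches the notebook)
X_train_demo = pd.DataFrame(
    {"age": ["[70-80)", "[50-60)", "[70-80)", "[0-10)", "[50-60)", "[70-80)"]}
)

# value_counts returns one row per group, sorted by descending count
age_group_sizes = X_train_demo["age"].value_counts()
for group, count in age_group_sizes.items():
    print(f"Age Group {group}: {count} entries in X_train")
```

Very small groups, such as `[0-10)` in the output above, give noisy per-group rates, which is worth keeping in mind when reading the fairness metrics for age.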


In [6]:
# Treating age as the sensitive attribute
unique_age_groups = data['age'].unique()

from collections import defaultdict
age_counts = defaultdict(int)  
positive_label_counts = defaultdict(int) 

for age, lbl in zip(X_test['age'], y_test):  
    age_counts[age] += 1 
    if lbl == 1:
        positive_label_counts[age] += 1  

demographic_parity = []
for i in unique_age_groups:
    demographic_parity.append(positive_label_counts[i]/age_counts[i])

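# NOTE: these are positive-label rates P(y = 1 | age group) computed from y_test (base rates).
# Demographic parity of the predictions would use y_pred instead; accuracy parity compares P(y_pred = y | group).
print(demographic_parity)

# p% rule with an 80% threshold. Only rate[i] / rate[j] for i < j is tested,
# so a pair is flagged only when the earlier group has the lower rate.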
count_rule_violation = 0

for i in range(len(demographic_parity)):
    for j in range(i + 1, len(demographic_parity)):
        if demographic_parity[i] / demographic_parity[j] < 0.8:
            count_rule_violation += 1

print(f"Number of p% rule violations: {count_rule_violation}")

# PPV and NPV

negative_outcomes = defaultdict(int)
positive_outcomes = defaultdict(int)
negative_predn = defaultdict(int)
positive_predn = defaultdict(int)

for idx, label in enumerate(y_test):
    prediction = y_pred[idx]
    age_group = X_test.iloc[idx]['age']
    if prediction == label and label == 0:
        negative_outcomes[age_group] += 1
    elif prediction == label and label == 1:
        positive_outcomes[age_group] += 1
    if prediction == 0:
        negative_predn[age_group] += 1
    elif prediction == 1:
        positive_predn[age_group] += 1

print()

# PPV per group: true positives / predicted positives
for group in positive_outcomes:
    if positive_predn[group] > 0:
        ratio = positive_outcomes[group] / positive_predn[group]
        print(f"Positive parity value of age Group {group}: {ratio}")

print()

# NPV per group: true negatives / predicted negatives
for group in negative_outcomes:
    if negative_predn[group] > 0:
        ratio = negative_outcomes[group] / negative_predn[group]
        print(f"Negative parity value of age Group {group}: {ratio}")


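# Equal Opportunity
# NOTE: the denominator below counts predicted positives (prediction == 1), so this ratio equals the PPV;
# the true positive rate that equal opportunity compares divides by actual positives (label == 1) instead.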

positive_outcomes = defaultdict(int)
positive_outcomes_for_age_group = defaultdict(int)

for idx, label in enumerate(y_test):
    prediction = y_pred[idx]
    age_group = X_test.iloc[idx]['age']
    if prediction == 1 and label == 1:
        positive_outcomes[age_group] += 1
    if prediction == 1:
        positive_outcomes_for_age_group[age_group] += 1

print()
for group in positive_outcomes:
    ratio = positive_outcomes[group] / positive_outcomes_for_age_group[group]
    print(f"Equal opportunity value of age Group {group}: {ratio}")



[0.16666666666666666, 0.40714285714285714, 0.48757763975155277, 0.4389261744966443, 0.4464, 0.4415362731152205, 0.46595932802829354, 0.47831952206590866, 0.47398843930635837, 0.427797833935018]
Number of p% rule violations: 9

Positive parity value of age Group [30-40): 0.6893939393939394
Positive parity value of age Group [60-70): 0.6123348017621145
Positive parity value of age Group [70-80): 0.5733634311512416
Positive parity value of age Group [50-60): 0.6160520607375272
Positive parity value of age Group [80-90): 0.5503500269251481
Positive parity value of age Group [40-50): 0.6622222222222223
Positive parity value of age Group [20-30): 0.8455284552845529
Positive parity value of age Group [90-100): 0.5608108108108109
Positive parity value of age Group [10-20): 0.6666666666666666

Negative parity value of age Group [70-80): 0.6214934808376136
Negative parity value of age Group [50-60): 0.6716697936210131
Negative parity value of age Group [80-90): 0.6144728633811604
Negative parity
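
The per-group quantities used in this cell (base rate, PPV, NPV), together with the selection rate and the true positive rate, can all be derived from four confusion-matrix counts per group. The sketch below is one way to organize that. The helper `group_rates` and the toy data are hypothetical, not part of the notebook.

```python
from collections import defaultdict

def group_rates(groups, labels, preds):
    """Per-group base rate, selection rate, PPV, NPV and TPR for binary labels/predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for g, y, y_hat in zip(groups, labels, preds):
        # "t"/"f": prediction correct or not; "p"/"n": predicted class
        key = ("t" if y == y_hat else "f") + ("p" if y_hat == 1 else "n")
        counts[g][key] += 1

    def ratio(num, den):
        # None marks an undefined rate (empty denominator) instead of raising
        return num / den if den else None

    out = {}
    for g, c in counts.items():
        n = c["tp"] + c["fp"] + c["tn"] + c["fn"]
        out[g] = {
            "base_rate": ratio(c["tp"] + c["fn"], n),       # P(y = 1 | group)
            "selection_rate": ratio(c["tp"] + c["fp"], n),  # P(y_hat = 1 | group)
            "ppv": ratio(c["tp"], c["tp"] + c["fp"]),       # P(y = 1 | y_hat = 1, group)
            "npv": ratio(c["tn"], c["tn"] + c["fn"]),       # P(y = 0 | y_hat = 0, group)
            "tpr": ratio(c["tp"], c["tp"] + c["fn"]),       # P(y_hat = 1 | y = 1, group)
        }
    return out

# Toy data (hypothetical)
groups_demo = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels_demo = [1, 1, 0, 0, 1, 1, 1, 0]
preds_demo = [1, 0, 1, 0, 1, 1, 0, 0]
rates_demo = group_rates(groups_demo, labels_demo, preds_demo)
for g, r in rates_demo.items():
    print(g, r)
```

With the notebook's variables, a call along the lines of `group_rates(X_test['age'], y_test, y_pred)` would be the analogue. Demographic parity compares `selection_rate`, predictive parity compares `ppv`, and equal opportunity compares `tpr`.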

In [16]:
# Treating gender as the sensitive attribute
unique_gender_groups = data['gender'].unique()

# Only 3 records have gender as Unknown/Invalid, so ignore that
unique_gender_groups = np.delete(unique_gender_groups, 2)

from collections import defaultdict
gender_counts = defaultdict(int)  
positive_label_counts = defaultdict(int) 

for gender, lbl in zip(X_test['gender'], y_test):  
    gender_counts[gender] += 1 
    if lbl == 1:
        positive_label_counts[gender] += 1  

demographic_parity = []
for i in unique_gender_groups:
    demographic_parity.append(positive_label_counts[i]/gender_counts[i])

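# NOTE: these are positive-label rates P(y = 1 | gender group) computed from y_test (base rates).
# Demographic parity of the predictions would use y_pred instead; accuracy parity compares P(y_pred = y | group).
print(demographic_parity)

# p% rule with an 80% threshold. Only rate[i] / rate[j] for i < j is tested,
# so a pair is flagged only when the earlier group has the lower rate.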
count_rule_violation = 0

for i in range(len(demographic_parity)):
    for j in range(i + 1, len(demographic_parity)):
        if demographic_parity[i] / demographic_parity[j] < 0.8:
            count_rule_violation += 1

print(f"Number of p% rule violations: {count_rule_violation}")

# PPV and NPV

negative_outcomes = defaultdict(int)
positive_outcomes = defaultdict(int)
negative_predn = defaultdict(int)
positive_predn = defaultdict(int)

for idx, label in enumerate(y_test):
    prediction = y_pred[idx]
    gender_group = X_test.iloc[idx]['gender']
    if prediction == label and label == 0:
        negative_outcomes[gender_group] += 1
    elif prediction == label and label == 1:
        positive_outcomes[gender_group] += 1
    if prediction == 0:
        negative_predn[gender_group] += 1
    elif prediction == 1:
        positive_predn[gender_group] += 1

print()

# PPV per group: true positives / predicted positives
for group in positive_outcomes:
    if positive_predn[group] > 0:
        ratio = positive_outcomes[group] / positive_predn[group]
        print(f"Positive parity value of gender Group {group}: {ratio}")

print()

# NPV per group: true negatives / predicted negatives
for group in negative_outcomes:
    if negative_predn[group] > 0:
        ratio = negative_outcomes[group] / negative_predn[group]
        print(f"Negative parity value of gender Group {group}: {ratio}")


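# Equal Opportunity
# NOTE: the denominator below counts predicted positives (prediction == 1), so this ratio equals the PPV;
# the true positive rate that equal opportunity compares divides by actual positives (label == 1) instead.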

positive_outcomes = defaultdict(int)
positive_outcomes_for_gender_group = defaultdict(int)

for idx, label in enumerate(y_test):
    prediction = y_pred[idx]
    gender_group = X_test.iloc[idx]['gender']
    if prediction == 1 and label == 1:
        positive_outcomes[gender_group] += 1
    if prediction == 1:
        positive_outcomes_for_gender_group[gender_group] += 1

print()
for group in positive_outcomes:
    ratio = positive_outcomes[group] / positive_outcomes_for_gender_group[group]
    print(f"Equal opportunity value of gender Group {group}: {ratio}")



[0.4675419401896426, 0.4554075652637187]
Number of p% rule violations: 0

Positive parity value of gender Group Male: 0.5969959266802444
Positive parity value of gender Group Female: 0.5976353928299009

Negative parity value of gender Group Female: 0.6516422082459818
Negative parity value of gender Group Male: 0.6465090709180868
Negative parity value of gender Group Unknown/Invalid: 1.0

Equal opportunity value of gender Group Male: 0.5969959266802444
Equal opportunity value of gender Group Female: 0.5976353928299009
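
The comment in the cell above asks how demographic parity and accuracy parity differ. Accuracy parity compares the share of correct predictions per group, P(ŷ = y | group), whereas demographic parity compares the share of positive predictions. A minimal sketch of the former follows. The values are hypothetical, and only the group names mirror the notebook's `gender` column.

```python
from collections import defaultdict

# Toy data (hypothetical values; group names mirror the notebook's gender column)
gender_demo = ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"]
labels_demo = [1, 0, 1, 0, 1, 1, 0, 0]
preds_demo = [1, 0, 0, 0, 1, 0, 0, 1]

correct = defaultdict(int)
total = defaultdict(int)
for g, y, y_hat in zip(gender_demo, labels_demo, preds_demo):
    total[g] += 1
    correct[g] += int(y == y_hat)

# Accuracy parity compares P(y_hat = y | group) across groups
group_accuracy = {g: correct[g] / total[g] for g in total}
accuracy_gap = max(group_accuracy.values()) - min(group_accuracy.values())
print(group_accuracy, accuracy_gap)
```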


In [17]:
# Treating race as the sensitive attribute
unique_race_groups = data['race'].unique()

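# Drop the group at index 2 of unique(); for race this is presumably the missing-value marker '?'
# rather than Unknown/Invalid, and the hard-coded index depends on row order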
unique_race_groups = np.delete(unique_race_groups, 2)

from collections import defaultdict
race_counts = defaultdict(int)  
positive_label_counts = defaultdict(int) 

for race, lbl in zip(X_test['race'], y_test):  
    race_counts[race] += 1 
    if lbl == 1:
        positive_label_counts[race] += 1  

demographic_parity = []
for i in unique_race_groups:
    demographic_parity.append(positive_label_counts[i]/race_counts[i])

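# NOTE: these are positive-label rates P(y = 1 | race group) computed from y_test (base rates).
# Demographic parity of the predictions would use y_pred instead; accuracy parity compares P(y_pred = y | group).
print(demographic_parity)

# p% rule with an 80% threshold. Only rate[i] / rate[j] for i < j is tested,
# so a pair is flagged only when the earlier group has the lower rate.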
count_rule_violation = 0

for i in range(len(demographic_parity)):
    for j in range(i + 1, len(demographic_parity)):
        if demographic_parity[i] / demographic_parity[j] < 0.8:
            count_rule_violation += 1

print(f"Number of p% rule violations: {count_rule_violation}")

# PPV and NPV

negative_outcomes = defaultdict(int)
positive_outcomes = defaultdict(int)
negative_predn = defaultdict(int)
positive_predn = defaultdict(int)

for idx, label in enumerate(y_test):
    prediction = y_pred[idx]
    race_group = X_test.iloc[idx]['race']
    if prediction == label and label == 0:
        negative_outcomes[race_group] += 1
    elif prediction == label and label == 1:
        positive_outcomes[race_group] += 1
    if prediction == 0:
        negative_predn[race_group] += 1
    elif prediction == 1:
        positive_predn[race_group] += 1

print()

# PPV per group: true positives / predicted positives
for group in positive_outcomes:
    if positive_predn[group] > 0:
        ratio = positive_outcomes[group] / positive_predn[group]
        print(f"Positive parity value of race Group {group}: {ratio}")

print()

# NPV per group: true negatives / predicted negatives
for group in negative_outcomes:
    if negative_predn[group] > 0:
        ratio = negative_outcomes[group] / negative_predn[group]
        print(f"Negative parity value of race Group {group}: {ratio}")


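# Equal Opportunity
# NOTE: the denominator below counts predicted positives (prediction == 1), so this ratio equals the PPV;
# the true positive rate that equal opportunity compares divides by actual positives (label == 1) instead.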

positive_outcomes = defaultdict(int)
positive_outcomes_for_race_group = defaultdict(int)

for idx, label in enumerate(y_test):
    prediction = y_pred[idx]
    race_group = X_test.iloc[idx]['race']
    if prediction == 1 and label == 1:
        positive_outcomes[race_group] += 1
    if prediction == 1:
        positive_outcomes_for_race_group[race_group] += 1

print()
for group in positive_outcomes:
    ratio = positive_outcomes[group] / positive_outcomes_for_race_group[group]
    print(f"Equal opportunity value of race Group {group}: {ratio}")



[0.46933280798529026, 0.468983268983269, 0.3701067615658363, 0.33064516129032256, 0.3979328165374677]
Number of p% rule violations: 0

Positive parity value of race Group Caucasian: 0.5927627000695894
Positive parity value of race Group AfricanAmerican: 0.6202759448110378
Positive parity value of race Group Hispanic: 0.564935064935065
Positive parity value of race Group Other: 0.703125
Positive parity value of race Group ?: 0.5507246376811594
Positive parity value of race Group Asian: 0.48484848484848486

Negative parity value of race Group Caucasian: 0.6409300012433171
Negative parity value of race Group ?: 0.7473684210526316
Negative parity value of race Group AfricanAmerican: 0.6447249774571686
Negative parity value of race Group Hispanic: 0.7124463519313304
Negative parity value of race Group Asian: 0.7252747252747253
Negative parity value of race Group Other: 0.728110599078341

Equal opportunity value of race Group Caucasian: 0.5927627000695894
Equal opportunity value of race Grou
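
The p% rule loop in this cell divides the rate of the earlier group by the rate of the later one, so a pair is only flagged when the lower rate happens to come first. A symmetric variant compares the smaller rate to the larger one for every unordered pair. The sketch below applies it to the five race rates printed above, rounded to four decimals. The helper name `p_rule_violations` is hypothetical.

```python
from itertools import combinations

# Per-group positive rates copied from the race output above (rounded to 4 decimals)
race_rates = [0.4693, 0.4690, 0.3701, 0.3306, 0.3979]

def p_rule_violations(rates, threshold=0.8):
    """Count unordered pairs whose smaller-to-larger rate ratio falls below threshold."""
    violations = 0
    for a, b in combinations(rates, 2):
        if max(a, b) > 0 and min(a, b) / max(a, b) < threshold:
            violations += 1
    return violations

print(p_rule_violations(race_rates))
```

On these rounded rates the symmetric check flags four pairs: each of the two highest rates against 0.3701 and against 0.3306.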

# TASK

Discuss which fairness metrics are specifically relevant to your task,
e.g. accuracy parity might be less appropriate for recidivism prediction than
demographic parity (etc.)

FILL HERE

# TASK

How much can “unfairness” in your predictions be explained by dataset
characteristics? Can you fix them with dataset-based interventions?

In [None]:
# fill in code here
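
One candidate dataset-based intervention is reweighing (in the style of Kamiran and Calders): each training row gets a weight so that group membership and label look statistically independent in the weighted data. The sketch below is a minimal version on toy data. The helper `reweighing_weights` and all values are hypothetical, and it is an illustration rather than the approach chosen for this notebook.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight each (group, label) cell by expected / observed frequency: P(g) * P(y) / P(g, y)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    cell_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * cell_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

# Toy data (hypothetical): group "a" is mostly positive, group "b" mostly negative
groups_demo = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels_demo = [1, 1, 1, 0, 1, 0, 0, 0]
weights_demo = reweighing_weights(groups_demo, labels_demo)

def weighted_positive_rate(group):
    # Weighted share of positive labels within one group
    rows = [(w, y) for w, g, y in zip(weights_demo, groups_demo, labels_demo) if g == group]
    return sum(w for w, y in rows if y == 1) / sum(w for w, _ in rows)

print(weighted_positive_rate("a"), weighted_positive_rate("b"))
```

With a scikit-learn `Pipeline`, such weights could be passed as `model.fit(X_train, y_train, classifier__sample_weight=weights)`. How they interact with `class_weight='balanced'` would need to be checked empirically.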

# TASK

How do different modeling choices impact fairness characteristics? Can
you fix them with in-processing interventions?

In [None]:
# fill in code here

# TASK

Can you apply post-processing interventions to achieve desired fairness outcomes?

In [None]:
# fill in code here
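
One simple post-processing option is to choose a separate score threshold per group so that the share of positive predictions is roughly equal across groups. The sketch below illustrates this on toy scores. The helper `threshold_for_rate`, the scores, and the target rate of 0.5 are all hypothetical.

```python
def threshold_for_rate(scores, target_rate):
    """Return the score cut-off that selects roughly target_rate of the given scores."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(target_rate * len(ranked)))  # number of positives to keep
    return ranked[k - 1]

# Toy scores (hypothetical), standing in for model.predict_proba(X_test)[:, 1] split by group
scores_by_group = {
    "a": [0.9, 0.8, 0.7, 0.4, 0.3, 0.2],
    "b": [0.6, 0.5, 0.35, 0.3, 0.2, 0.1],
}
target = 0.5  # desired selection rate for every group (an assumption)

thresholds = {g: threshold_for_rate(s, target) for g, s in scores_by_group.items()}
selection = {
    g: sum(score >= thresholds[g] for score in s) / len(s)
    for g, s in scores_by_group.items()
}
print(thresholds, selection)
```

Tied scores at the cut-off can push a group's selection rate above the target. Equalizing selection rates this way can also move per-group PPV and TPR apart, which is one of the tradeoffs the discussion task asks about.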

# TASK 

Discussion: What types of interventions are most appropriate for your task (e.g.
legal, practical to deploy, etc.)? What are the tradeoffs between them (e.g. how
are other metrics negatively impacted by a particular intervention, etc.)?

FILL HERE

# TASK

Implement a fairness intervention described in a research paper. Some
possibilities include examples from class.

Additionally, attempt to reproduce results similar to those reported in the paper on your
dataset (or comment in detail about any failure to do so).

In [None]:
# fill in code here

# TASK

Summarize the main contributions of the paper and its relevance to
your task.

FILL HERE

# TASK

Is it more effective than other intervention strategies you tried? Why
or why not? Conclude your presentation with a general discussion of what was
and was not effective for your task.

FILL HERE