# Assignment 1: Algorithmic Fairness Definitions

In [1]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

This component of the assignment derives in part, with thanks and permission, from an [assignment](https://web.stanford.edu/class/cs182/assignments/AlgorithmicDecisionMaking.zip) in Stanford's CS182: Ethics, Public Policy, and Technological Change. Their assignment, in turn, is based on the journalistic organization ProPublica's [analysis](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing) of a criminal risk prediction algorithm which we discussed in the algorithmic fairness lecture. Here, you will be assessing how a classifier designed to predict recidivism -- that is, whether someone will commit a crime in the future -- performs in terms of algorithmic fairness metrics.

## Problem 1: Loading the data (5 points)

We have split the data for you into a train set (`recidivism-training-data.csv`) and test set (`recidivism-testing-data.csv`). You will be training the model on the train set and evaluating model predictions on the test set.

1a. Read in the train and test sets. Display the first 10 rows of the data. (4 points)

In [2]:
train_data = pd.read_csv("/Users/srimoyee/Downloads/HW1_DueFeb8/recidivism-training-data.csv")
test_data = pd.read_csv("/Users/srimoyee/Downloads/HW1_DueFeb8/recidivism-testing-data.csv")

In [3]:
train_data.head(10)

Unnamed: 0,Juvenile felony count = 0,Juvenile felony count = 1,Juvenile felony count = 2,Juvenile felony count >= 3,Juvenile misdemeanor count = 0,Juvenile misdemeanor count = 1,Juvenile misdemeanor count = 2,Juvenile misdemeanor count >= 3,Juvenile other offense count = 0,Juvenile other offense count = 1,...,Age > 45,Gender = Female,Gender = Male,Race = Other,Race = Asian,Race = Native American,Race = Caucasian,Race = Hispanic,Race = African American,recidivism_outcome
0,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,0,0,1,1
1,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,1,0,0,1
2,1,0,0,0,1,0,0,0,1,0,...,0,1,0,0,0,0,0,0,1,0
3,1,0,0,0,1,0,0,0,1,0,...,0,1,0,0,0,0,1,0,0,0
4,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,0,0,1,0
5,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,0,1,0,0
6,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,0,0,1,1
7,0,1,0,0,0,1,0,0,0,1,...,0,0,1,0,0,0,0,0,1,1
8,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,0,0,1,0
9,1,0,0,0,1,0,0,0,0,0,...,0,1,0,0,0,0,0,0,1,1


In [4]:
test_data.head(10)

Unnamed: 0,Juvenile felony count = 0,Juvenile felony count = 1,Juvenile felony count = 2,Juvenile felony count >= 3,Juvenile misdemeanor count = 0,Juvenile misdemeanor count = 1,Juvenile misdemeanor count = 2,Juvenile misdemeanor count >= 3,Juvenile other offense count = 0,Juvenile other offense count = 1,...,Age > 45,Gender = Female,Gender = Male,Race = Other,Race = Asian,Race = Native American,Race = Caucasian,Race = Hispanic,Race = African American,recidivism_outcome
0,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,0,0,1,0
1,1,0,0,0,1,0,0,0,1,0,...,0,1,0,0,0,0,0,0,1,0
2,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,1,0,0,0
3,1,0,0,0,1,0,0,0,0,1,...,0,0,1,0,0,0,0,0,1,1
4,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,0,0,1,1
5,1,0,0,0,1,0,0,0,1,0,...,0,0,1,1,0,0,0,0,0,0
6,1,0,0,0,1,0,0,0,1,0,...,0,1,0,0,0,0,0,1,0,1
7,1,0,0,0,1,0,0,0,1,0,...,1,1,0,0,0,0,0,0,1,1
8,1,0,0,0,1,0,0,0,1,0,...,0,0,1,0,0,0,0,0,1,0
9,1,0,0,0,0,1,0,0,1,0,...,0,0,1,0,0,0,0,0,1,1


1b. Read the data documentation in the `Algorithmic Fairness Data Documentation` file. What are the possible values of the `Race` variable in the dataset? (1 point)

Answer:

6 values: 
1. Other (i.e., none of the races below)
2. Asian
3. Native American
4. Caucasian (same as “White” in the ProPublica analysis)
5. Hispanic
6. African-American (same as “Black” in the ProPublica analysis)

## Problem 2: Predicting Recidivism with a Full Logistic Regression (20 points)

Now you will use the train set to train a logistic regression model using `sklearn.linear_model.LogisticRegression`. To train a logistic regression model on features X to predict outcome y, you can use the command:

`LogisticRegression(penalty='none').fit(X, y)`

You will have to replace X and y with the data you actually want to use. [Here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) is the documentation on logistic regression. Use the "recidivism_outcome" column as the variable you are trying to predict (y).

Then, we will ask you to report model performance metrics. Use the test set to compute those quantities. You could take a look at helpful functions from the [scikit-learn metrics module](https://scikit-learn.org/stable/modules/model_evaluation.html), imported at the beginning of the assignment as `metrics`.

2a. Train a logistic regression model using all features (except `recidivism_outcome`) as input features X. (5 points)

In [5]:
y_train = train_data["recidivism_outcome"]
X_train = train_data.drop(["recidivism_outcome"], axis=1)
lr_model = LogisticRegression(penalty='none')
lr_model.fit(X_train, y_train)

LogisticRegression(penalty='none')

2b. Predict recidivism for the test set. Display the first 10 predictions. (1 point)

In [6]:
y_test = test_data["recidivism_outcome"]
X_test = test_data.drop(["recidivism_outcome"], axis=1)
y_pred = lr_model.predict(X_test)
y_prob = lr_model.predict_proba(X_test)[:,1]

2c. Report your model's AUC (i) for all defendants, (ii) for white defendants, and (iii) for Black defendants. Report the values with 4 decimal points. (1 point)

In [7]:
# AUC using class probabilities
#all defendants
auc_all_defendants = metrics.roc_auc_score(y_test, y_prob)

#white defendants
white_defendants_list = test_data[test_data["Race = Caucasian"] == 1]
white_defendants_index = white_defendants_list.index
auc_white_defendants = metrics.roc_auc_score(y_test[white_defendants_index], y_prob[white_defendants_index])

#african-american defendants
african_american_defendants_list = test_data[test_data["Race = African American"] == 1]
african_american_defendants_index = african_american_defendants_list.index
auc_african_american_defendants = metrics.roc_auc_score(y_test[african_american_defendants_index], y_prob[african_american_defendants_index])

print("AUC using class probabilities")
print(f"AUC for all defendants - {auc_all_defendants:.4f}")
print(f"AUC for white defendants - {auc_white_defendants:.4f}")
print(f"AUC for african american defendants - {auc_african_american_defendants:.4f}")

AUC using class probabilities
AUC for all defendants - 0.7247
AUC for white defendants - 0.7369
AUC for african american defendants - 0.6997


In [8]:
# AUC using class predictions
#all defendants
auc_all_defendants = metrics.roc_auc_score(y_test, y_pred)

#white defendants
white_defendants_list = test_data[test_data["Race = Caucasian"] == 1]
white_defendants_index = white_defendants_list.index
auc_white_defendants = metrics.roc_auc_score(y_test[white_defendants_index], y_pred[white_defendants_index])

#african-american defendants
african_american_defendants_list = test_data[test_data["Race = African American"] == 1]
african_american_defendants_index = african_american_defendants_list.index
auc_african_american_defendants = metrics.roc_auc_score(y_test[african_american_defendants_index], y_pred[african_american_defendants_index])

print("AUC using class predictions")
print(f"AUC for all defendants - {auc_all_defendants:.4f}")
print(f"AUC for white defendants - {auc_white_defendants:.4f}")
print(f"AUC for african american defendants - {auc_african_american_defendants:.4f}")

AUC using class predictions
AUC for all defendants - 0.6726
AUC for white defendants - 0.6632
AUC for african american defendants - 0.6564


2d. Report your model's false positive rate (i) for all defendants, (ii) for white defendants, and (iii) for Black defendants. Report the values with 4 decimal points. (1 point)

In [9]:
tn_all_defendants, fp_all_defendants, fn_all_defendants, tp_all_defendants = metrics.confusion_matrix(y_test, y_pred.round()).ravel()
fpr_all_defendants = fp_all_defendants/(fp_all_defendants + tn_all_defendants)

tn_white_defendants, fp_white_defendants, fn_white_defendants, tp_white_defendants = metrics.confusion_matrix(y_test[white_defendants_index], y_pred[white_defendants_index].round()).ravel()
fpr_white_defendants = fp_white_defendants/(fp_white_defendants + tn_white_defendants)

tn_african_american_defendants, fp_african_american_defendants, fn_african_american_defendants, tp_african_american_defendants = metrics.confusion_matrix(y_test[african_american_defendants_index], y_pred[african_american_defendants_index].round()).ravel()
fpr_african_american_defendants = fp_african_american_defendants/(fp_african_american_defendants + tn_african_american_defendants)

print(f"FPR for all defendants - {fpr_all_defendants:.4f}")
print(f"FPR for white defendants - {fpr_white_defendants:.4f}")
print(f"FPR for african american defendants - {fpr_african_american_defendants:.4f}")

FPR for all defendants - 0.2821
FPR for white defendants - 0.1719
FPR for african american defendants - 0.4165


2e. Report your model's false negative rate (i) for all defendants, (ii) for white defendants, and (iii) for Black defendants. Report the values with 4 decimal points. (1 point)

In [10]:
fnr_all_defendants = fn_all_defendants/(fn_all_defendants + tp_all_defendants)
fnr_white_defendants = fn_white_defendants/(fn_white_defendants + tp_white_defendants)
fnr_african_american_defendants = fn_african_american_defendants/(fn_african_american_defendants + tp_african_american_defendants)

print(f"FNR for all defendants - {fnr_all_defendants:.4f}")
print(f"FNR for white defendants - {fnr_white_defendants:.4f}")
print(f"FNR for african american defendants - {fnr_african_american_defendants:.4f}")

FNR for all defendants - 0.3727
FNR for white defendants - 0.5018
FNR for african american defendants - 0.2707


2f. Report the fraction of defendants classified positive by your model (i) for all defendants, (ii) for white defendants, and (iii) for Black defendants. Report the values with 4 decimal points. (1 point)

In [11]:
all_defendants_positive = (tp_all_defendants + fp_all_defendants)/ (fn_all_defendants+tn_all_defendants+tp_all_defendants+fp_all_defendants)
white_defendants_positive = (tp_white_defendants + fp_white_defendants)/(tn_white_defendants + fp_white_defendants + fn_white_defendants + tp_white_defendants)
african_american_defendants_positive = (tp_african_american_defendants + fp_african_american_defendants)/(tn_african_american_defendants + fp_african_american_defendants + fn_african_american_defendants + tp_african_american_defendants)

print(f"Fraction for all defendants classified positive - {all_defendants_positive:.4f}")
print(f"Fraction for white defendants classified positive - {white_defendants_positive:.4f}")
print(f"Fraction for african american defendants classified positive - {african_american_defendants_positive:.4f}")

Fraction for all defendants classified positive - 0.4362
Fraction for white defendants classified positive - 0.2982
Fraction for african american defendants classified positive - 0.5763


2g. In at least 5 sentences, describe what you observe, and any algorithmic fairness concerns it raises, making reference to specific algorithmic fairness concepts discussed in class and using quantitative evidence from 2c-f. Do you believe this algorithm is fair enough to be deployed in practice? Why or why not? (10 points)

Answer:

I see several issues here such as even though the overall AUC for black defendants is higher, the other metrics are raising concerns. Firstly, the False Positive Rate for Black defendants(0.4165) is much higher than that of White defendants(0.1719), which means that there is a higher chance for Black defendants to be falsely tagged for re-offend or recidivate as compared to white defendants, infact its 2.4 times more likely that a Black defendant is falsely tagged to recidivate. Secondly, the False Negative Rate for White defendants is much higher than that of Black defendants. This could mean that, even White defendants who are likely to re-offend are not classified as such, whereas Black defendants are. This is an example of severe racial discrimination. It is clear that the data is biased against Black defendants. Furthermore, there is predictive inequality as the difference between False Positive Rate and False Negative Rate for a particular class(black or white) are differing vastly. Also the fraction of Black defendants classified as postitve (0.5763) is much higher than that of White defendants(0.2982). According to this algorithm, it is nearly twice as likely for a Black defendant to be classified as positive than a White defendant.

This algorithm clearly perpetuates the existing racial bias present in society and if it were to be deployed in practice, it could cause serious harm due to incorrect predictions - either keep an innocent person in prison due to wrongly predicting they would recidivate, or releasing prisoners to would go on to commit more crimes. In both cases there are several ethical issues here. At its present form, the algorithm cannot be used as the sole determinant of recidivism. In order to be used, it has to be modified on the principles of algorithmic fairness, the data needs de-biasing and human input will be needed to make the final call on if a defendant is likely to recidivate based on the model's outputs.

## Problem 3: Predicting Recidivism with Your Own Model (15 points)

Now you will train your own model to predict recidivism.

3a. Train a model of your choice. You can choose any input features that should be used as well as any pre-processing technique. You are welcome to use models which are not logistic regression models, but if you want to use logistic regression, that's also fine! (2 points)

In [12]:
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
import re
from sklearn.preprocessing import StandardScaler
import statistics

In [13]:
train_data = pd.read_csv("/Users/srimoyee/Downloads/HW1_DueFeb8/recidivism-training-data.csv")
test_data = pd.read_csv("/Users/srimoyee/Downloads/HW1_DueFeb8/recidivism-testing-data.csv")

In [14]:
# removing columns that contain race information and other information highly correlated with race

train_data_modified = train_data.drop(['Age < 25', 'Age >= 25 and <=45', 'Age > 45', 'Gender = Female', 'Gender = Male', 'Race = Other','Race = Asian', 
'Race = Native American', 
'Race = Caucasian', 
'Race = Hispanic', 
'Race = African American'], axis=1)
test_data_modified = test_data.drop(['Age < 25', 'Age >= 25 and <=45', 'Age > 45', 'Gender = Female', 'Gender = Male', 'Race = Other','Race = Asian', 
'Race = Native American', 
'Race = Caucasian', 
'Race = Hispanic', 
'Race = African American'], axis=1)

In [15]:
regex = re.compile(r"\[|\]|<", re.IGNORECASE)
train_data_modified.columns = [regex.sub("_", col) if any(x in str(col) for x in set(('[', ']', '<'))) else col for col in train_data_modified.columns.values]
test_data_modified.columns = [regex.sub("_", col) if any(x in str(col) for x in set(('[', ']', '<'))) else col for col in test_data_modified.columns.values]

In [16]:
y_train=train_data_modified["recidivism_outcome"]
X_train=train_data_modified.drop(["recidivism_outcome"], axis=1)

y_test=test_data_modified["recidivism_outcome"]
X_test = test_data_modified.drop(["recidivism_outcome"], axis=1)

In [17]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [18]:
param_grid = {
    'n_estimators': [100, 200, 500, 1000],
    'max_depth': [1, 3, 5, 7],
    'learning_rate': [0.001, 0.01, 0.05, 0.1]
}

In [19]:
XGB_model = XGBClassifier(objective='binary:logistic', eval_metric='auc', booster='gbtree')

In [20]:
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
grid_search = GridSearchCV(XGB_model, param_grid, scoring='roc_auc', cv=cv, n_jobs=-1)
grid_search.fit(X_train_scaled, y_train)

GridSearchCV(cv=StratifiedKFold(n_splits=5, random_state=42, shuffle=True),
             estimator=XGBClassifier(base_score=None, booster='gbtree',
                                     callbacks=None, colsample_bylevel=None,
                                     colsample_bynode=None,
                                     colsample_bytree=None, device=None,
                                     early_stopping_rounds=None,
                                     enable_categorical=False,
                                     eval_metric='auc', feature_types=None,
                                     gamma=None, grow_policy=None,
                                     importanc...
                                     max_cat_to_onehot=None,
                                     max_delta_step=None, max_depth=None,
                                     max_leaves=None, min_child_weight=None,
                                     missing=nan, monotone_constraints=None,
                                

In [21]:
final_model = XGBClassifier(**grid_search.best_params_)

In [22]:
final_model.fit(X_train_scaled, y_train)

y_pred = final_model.predict(X_test_scaled)
y_prob = final_model.predict_proba(X_test_scaled)[:, 1]

3b. Report the model's performance on (i) all, (ii) white, and (iii) Black defendants, using whatever metrics you believe are appropriate (at least 2). You are also welcome to evaluate performance on other sensitive/protected groups. (4 points)

In [23]:
white_defendants_list = test_data[test_data['Race = Caucasian'] == 1]
white_defendants_true_values = y_test[white_defendants_list.index]
white_defendants_predicted_probabilities = y_prob[white_defendants_list.index]
white_defendants_predicted_classes = y_pred[white_defendants_list.index]

african_american_defendants_list = test_data[test_data['Race = African American'] == 1]
african_american_defendants_true_values = y_test[african_american_defendants_list.index]
african_american_defendants_predicted_probabilities = y_prob[african_american_defendants_list.index]
african_american_defendants_predicted_classes = y_pred[african_american_defendants_list.index]

In [24]:
# AUC using class probabilities

print("AUC using class probabilities")

auc=metrics.roc_auc_score(y_test, y_prob)
print(f"AUC for all defendents - {auc:.4f}")


auc_white = metrics.roc_auc_score(white_defendants_true_values, white_defendants_predicted_probabilities)
print(f"AUC for white defendents - {auc_white:.4f}")


auc_black = metrics.roc_auc_score(african_american_defendants_true_values, african_american_defendants_predicted_probabilities)
print(f"AUC for black defendents - {auc_black:.4f}")

AUC using class probabilities
AUC for all defendents - 0.6926
AUC for white defendents - 0.7095
AUC for black defendents - 0.6633


In [25]:
# AUC using class predictions

print("AUC for using class predictions")

auc_=metrics.roc_auc_score(y_test, y_pred)
print(f"AUC for all defendents - {auc_:.4f}")

auc_white = metrics.roc_auc_score(white_defendants_true_values, white_defendants_predicted_classes)
print(f"AUC for white defendents - {auc_white:.4f}")


auc_black = metrics.roc_auc_score(african_american_defendants_true_values, african_american_defendants_predicted_classes)
print(f"AUC for black defendents - {auc_black:.4f}")

AUC for using class predictions
AUC for all defendents - 0.6441
AUC for white defendents - 0.6454
AUC for black defendents - 0.6244


In [26]:
tn_all_defendants, fp_all_defendants, fn_all_defendants, tp_all_defendants = metrics.confusion_matrix(y_test, y_pred.round()).ravel()
fpr_all_defendants = fp_all_defendants/(fp_all_defendants + tn_all_defendants)

tn_white_defendants, fp_white_defendants, fn_white_defendants, tp_white_defendants = metrics.confusion_matrix(white_defendants_true_values, white_defendants_predicted_classes).ravel()
fpr_white_defendants = fp_white_defendants/(fp_white_defendants + tn_white_defendants)

tn_african_american_defendants, fp_african_american_defendants, fn_african_american_defendants, tp_african_american_defendants = metrics.confusion_matrix(african_american_defendants_true_values, african_american_defendants_predicted_classes).ravel()
fpr_african_american_defendants = fp_african_american_defendants/(fp_african_american_defendants + tn_african_american_defendants)

print(f"FPR for all defendants - {fpr_all_defendants:.4f}")
print(f"FPR for white defendants - {fpr_white_defendants:.4f}")
print(f"FPR for african american defendants - {fpr_african_american_defendants:.4f}")

FPR for all defendants - 0.2988
FPR for white defendants - 0.2321
FPR for african american defendants - 0.3927


In [27]:
fnr_all_defendants = fn_all_defendants/(fn_all_defendants + tp_all_defendants)
fnr_white_defendants = fn_white_defendants/(fn_white_defendants + tp_white_defendants)
fnr_african_american_defendants = fn_african_american_defendants/(fn_african_american_defendants + tp_african_american_defendants)

print(f"FNR for all defendants - {fnr_all_defendants:.4f}")
print(f"FNR for white defendants - {fnr_white_defendants:.4f}")
print(f"FNR for african american defendants - {fnr_african_american_defendants:.4f}")

FNR for all defendants - 0.4130
FNR for white defendants - 0.4770
FNR for african american defendants - 0.3585


In [28]:
#F1 = 2 * TP / (2 * TP + FN + FP)

f1_score_all_defendants = (2*tp_all_defendants)/(2*tp_all_defendants+fn_all_defendants+fp_all_defendants)
f1_score_white_defendants = (2*tp_white_defendants)/(2*tp_white_defendants+fn_white_defendants+fp_white_defendants)
f1_score_african_american_defendants = (2*tp_african_american_defendants)/(2*tp_african_american_defendants+fn_african_american_defendants+fp_african_american_defendants)

print(f"F1 score for all defendants - {f1_score_all_defendants:.4f}")
print(f"F1 score for white defendants - {white_defendants_positive:.4f}")
print(f"F1 score for african american - {f1_score_african_american_defendants:.4f}")

F1 score for all defendants - 0.5997
F1 score for white defendants - 0.2982
F1 score for african american - 0.6359


In [29]:
all_defendants_positive = (tp_all_defendants + fp_all_defendants)/ (fn_all_defendants+tn_all_defendants+tp_all_defendants+fp_all_defendants)
white_defendants_positive = (tp_white_defendants + fp_white_defendants)/(tn_white_defendants + fp_white_defendants + fn_white_defendants + tp_white_defendants)
african_american_defendants_positive = (tp_african_american_defendants + fp_african_american_defendants)/(tn_african_american_defendants + fp_african_american_defendants + fn_african_american_defendants + tp_african_american_defendants)

print(f"Fraction for all defendants classified positive - {all_defendants_positive:.4f}")
print(f"Fraction for white defendants classified positive - {white_defendants_positive:.4f}")
print(f"Fraction for african american defendants classified positive - {african_american_defendants_positive:.4f}")

Fraction for all defendants classified positive - 0.4274
Fraction for white defendants classified positive - 0.3447
Fraction for african american defendants classified positive - 0.5197


3c. Write two paragraphs defending your model design choices, and explaining why you designed the model the way you did. Refer to results from 3b to provide evidence. You're welcome to write two paragraphs explaining why you don't think models should be used in criminal risk prediction at all - this is a reasonable perspective! - but you still need to provide quantitative or non-quantitative evidence to back up your claims. (9 points)

While designing my own model, my focus was to reduce the predictive inequalites and difference in odds between the classes. I tried different models such as Random Forest and XGBoost. Out of the two XGBoost performed better. Furthermore, I did a bit of preliminary exploration and also learnt from the class lecture that there are often other factors that are strongly correlated with Race such as Age and Gender. Hence I dropped these before training my model. I also tried to use SMOTE (https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/) to reduce the class imbalance a bit but it did not help so I left it out. I also modified the column names using regex because XGBoost was giving me an error in processing columns which has '[]' or other such characters. Further, since I was going to use the XGBoost classifier I used a Standard Scaler to scale my data before training the model.

I used grid search cross validation to obtain the best possible parameters for my XGBoost classifier model and then used these parameters to define the model. I then used the model for prediction and obtained class predictions as well as class probabilities. I used AUC score, F1 score, False Positive Rate, False Negative Rate and Accuracy (fraction of positive classifications) as metrics to evaluate my model. As you can see, the overall AUC is higher as compared to the logistic regression model. Also the AUC for White and Black defendants is higher. Furthermore, there is more the difference between the FPR and TPR rates becomes narrower than when logistic regression is used. 
Here are the values again for better comparison:

FPR -> A person who is not likely to re-offend is incorrectly labelled as likely to re-offend

FPR values using Logistic Regression:   

White: 0.1719  

Black: 0.4165     

FPR values using XGBoost:

White: 0.2321

Black: 0.3927

FNR -> A person likely to re-offend is incorrectly labelled as not likely to do so

FNR values using Logistic Regression:

White: 0.5018

Black: 0.2707                                             

FNR values using XGBoost:

White: 0.4770

Black: 0.3585

With the logistic regression model, there is a huge risk of very high number of false negatives(0.5018) for white defendants. What this means is that even when a white defendant recidivates, they are incorrectly labelled to not do so. The ethical implication of this is enormous and becomes even more critical if the degree of crime is severe.

These values indicate that the XGBoost Model is more balanced in terms of FPR and FNR for African American defendants which is something I was trying to achieve. Furthermore, it tries to balance out the very high False Negative Rate for white defendants.

I also used the F1 score in an attempt to understand the predictions better and a higher score for African American defendants meant that it was making good classifications for that particular class.

I think models could be used in criminal risk prediction but they should not be considered the absolute ground truth and the decisions made by the models should not be finalised without human input. This is because as seen from the two very different types of models implemented above, there is an issue of bias in the data which becomes really hard to de-bias when the model is designed. These models are thus propagating existing racial biases in the society through biased datasets and amplifying them. Especially in critical scenarios like healthcare and criminal justice, one should always proceed with caution while using these models, and if they are used at all, the decisions must be reviewed and examined by humans/subject matter experts before finalising.

I think the ProPublica Analysis gives an insight as to why there are several issues with deploying such models in the context of criminal justice. To add on to it, the metrics as highlighted above raise serious concerns with how accurate these models are. With hardly ~70% accuracy, it seems like a gamble on someone's life and freedom to treat these models' predictions as absolute ground truth and make decisions based on that.


# Sources cited

Please cite any sources used to complete this homework in the markdown cell below. Nobody remembers everything, and it's always a good idea to use documentation and online resources to ensure you're growing your skills.

Note that copying text from generative AI technology, such as ChatGPT, will be considered plagiarism (and hence result in a 0 grade on this submission). You are allowed to use ChatGPT as a general educational resource (the same way you would a webpage, without copying from it). But if you used ChatGPT as an educational resource, you must include (a) your prompts, (b) ChatGPT's responses, and (c) your validation of why ChatGPT's response is correct; simply noting that the code makes sense does not suffice as an explanation.

**ADD YOUR SOURCES HERE**

1. https://stackoverflow.com/questions/48645846/pythons-xgoost-valueerrorfeature-names-may-not-contain-or
2. https://towardsdatascience.com/predict-vs-predict-proba-scikit-learn-bdc45daa5972
3. ChatGPT

    prompt: how to print upto 4 decimal places in python
    
    answer: In Python, you can use the format() function or f-strings to print numbers with a specific number of decimal places. 
    
    number = 3.141592653589793
    
    formatted_number = f"{number:.4f}"
    
    print(formatted_number)
    
    
 Reason/Validation: I needed a quick way to print the numbers upto 4 decimal places and I had forgotten the exact syntax. I just needed GPT's answer to verify what I already knew and implement it.
 
4. https://datascience.stackexchange.com/questions/123922/different-accuracy-scores-with-sklearn-roc-auc-score-on-same-model-using-sklearn

