## Problem Set 8: Is the Model Fair? 

In [282]:
import pandas as pd
import numpy as np

### Is COMPAS fair?

1.

In [283]:
cs = pd.read_csv('compas-score-data.csv.bz2', sep='\t')
cs

Unnamed: 0,age,c_charge_degree,race,age_cat,sex,priors_count,decile_score,two_year_recid
0,69,F,Other,Greater than 45,Male,0,1,0
1,34,F,African-American,25 - 45,Male,0,3,1
2,24,F,African-American,Less than 25,Male,4,4,1
3,44,M,Other,25 - 45,Male,0,1,0
4,41,F,Caucasian,25 - 45,Male,14,6,1
...,...,...,...,...,...,...,...,...
6167,23,F,African-American,Less than 25,Male,0,7,0
6168,23,F,African-American,Less than 25,Male,0,3,0
6169,57,F,Other,Greater than 45,Male,0,1,0
6170,33,M,African-American,25 - 45,Female,3,2,0


2.

In [284]:
filtered_cs = cs[(cs['race'] == 'Caucasian') | (cs['race'] == 'African-American')]
filtered_cs

Unnamed: 0,age,c_charge_degree,race,age_cat,sex,priors_count,decile_score,two_year_recid
1,34,F,African-American,25 - 45,Male,0,3,1
2,24,F,African-American,Less than 25,Male,4,4,1
4,41,F,Caucasian,25 - 45,Male,14,6,1
6,39,M,Caucasian,25 - 45,Female,0,1,0
7,27,F,Caucasian,25 - 45,Male,0,4,0
...,...,...,...,...,...,...,...,...
6165,30,M,African-American,25 - 45,Male,0,2,1
6166,20,F,African-American,Less than 25,Male,0,9,0
6167,23,F,African-American,Less than 25,Male,0,7,0
6168,23,F,African-American,Less than 25,Male,0,3,0


3.

In [285]:
# Create a new dummy variable based on the COMPAS risk score:
# 1 if decile_score is 5 or higher (high risk), 0 otherwise.
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning.
filtered_cs = filtered_cs.assign(high_score=(filtered_cs['decile_score'] >= 5).astype(int))


In [286]:
filtered_cs

Unnamed: 0,age,c_charge_degree,race,age_cat,sex,priors_count,decile_score,two_year_recid,high_score
1,34,F,African-American,25 - 45,Male,0,3,1,0
2,24,F,African-American,Less than 25,Male,4,4,1,0
4,41,F,Caucasian,25 - 45,Male,14,6,1,1
6,39,M,Caucasian,25 - 45,Female,0,1,0,0
7,27,F,Caucasian,25 - 45,Male,0,4,0,0
...,...,...,...,...,...,...,...,...,...
6165,30,M,African-American,25 - 45,Male,0,2,1,0
6166,20,F,African-American,Less than 25,Male,0,9,0,1
6167,23,F,African-American,Less than 25,Male,0,7,0,1
6168,23,F,African-American,Less than 25,Male,0,3,0,0


4.

In [287]:
# Calculate recidivism rates for low-risk and high-risk individuals
recidivism_rates = filtered_cs.groupby('high_score')['two_year_recid'].mean()

# Calculate recidivism rates for African-Americans and Caucasians
recidivism_rates_by_race = filtered_cs.groupby('race')['two_year_recid'].mean()

recidivism_rates, recidivism_rates_by_race

(high_score
 0    0.320015
 1    0.634455
 Name: two_year_recid, dtype: float64,
 race
 African-American    0.52315
 Caucasian           0.39087
 Name: two_year_recid, dtype: float64)

5.

In [288]:
from sklearn.metrics import confusion_matrix

In [289]:
# Define the true labels (actual recidivism) and the predicted labels (high_score)
true_labels = filtered_cs['two_year_recid']
predicted_labels = filtered_cs['high_score']

# Generate the confusion matrix
cm = confusion_matrix(true_labels, predicted_labels)

# Create a DataFrame for better readability
cm_df = pd.DataFrame(cm, index=['Actual No Recidivism', 'Actual Recidivism'], columns=['Predicted No Recidivism', 'Predicted Recidivism'])

In [290]:
cm_df

Unnamed: 0,Predicted No Recidivism,Predicted Recidivism
Actual No Recidivism,1872,923
Actual Recidivism,881,1602


|            |               | **Predicted** |            |
| ---------- | ------------- | ------------: | ---------: |
|            |               | No Recidivism | Recidivism |
| **Actual** | No Recidivism |          1872 |        923 |
|            | Recidivism    |           881 |       1602 |

In [291]:
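# Unpack the counts from the confusion matrix instead of retyping them.
# For binary labels, ravel() returns them in the order TN, FP, FN, TP.
tn, fp, fn, tp = cm.ravel()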

In [292]:
false_positive_rate = fp / (fp + tn)
false_positive_rate

0.3302325581395349

In [293]:
false_negative_rate = fn / (fn + tp)
false_negative_rate

0.35481272654047524

In [294]:
total = cm.sum()
overall_misclassification_rate = (fp + fn) / total
overall_misclassification_rate

0.34179613489958316

From the judges' perspective, a 33% FPR means that about one-third of the individuals who did not actually reoffend were nevertheless classified as high risk. This could lead to unnecessarily strict measures on those individuals.

A 35% FNR indicates that more than one-third of the individuals who actually reoffended were predicted to be low risk, which means the model is missing a significant number of true recidivists.

From the defendants' perspective, the overall misclassification rate of 34% suggests that about one-third of the defendants are being misclassified by the model, which can lead to unjust outcomes either by being unfairly judged as high risk or by being inaccurately assessed as low risk.
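The rates above are conditioned on the actual outcome: the FPR looks only at non-recidivists, and the FNR looks only at recidivists. The share of high-risk predictions that turn out to be wrong is a different quantity, the false discovery rate (1 − precision). The sketch below recomputes all of them from the confusion-matrix counts. The helper name `error_rates` is illustrative and not part of the assignment.

```python
def error_rates(tn, fp, fn, tp):
    """Return error rates from the four confusion-matrix counts."""
    return {
        # Conditioned on the actual outcome
        "FPR": fp / (fp + tn),   # non-recidivists classified as high risk
        "FNR": fn / (fn + tp),   # recidivists classified as low risk
        # Conditioned on the prediction
        "FDR": fp / (fp + tp),   # high-risk predictions that did not reoffend
        # Over everyone
        "misclassification": (fp + fn) / (tn + fp + fn + tp),
    }

# Counts taken from the overall confusion matrix above
overall = error_rates(tn=1872, fp=923, fn=881, tp=1602)
for name, value in overall.items():
    print(f"{name}: {value:.3f}")
```

Keeping the denominators side by side makes it easier to check which group of people each percentage describes.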

7.

These rates suggest that the COMPAS model, based on this sample, has room for improvement in accurately predicting recidivism, particularly in reducing both the false positive and false negative rates to better identify true recidivists and avoid unjustly labeling individuals as high risk.

I would hope that the error/misclassification rate would be lower if the judge were to use this model. I would assume an experienced judge would be able to assess recidivism risk with just as much accuracy as the model, if not more. The error/misclassification rate would be acceptable to me if it were below 20%.

### 2 Analysis by Race

1.

In [295]:
# Filter the data for African-American and Caucasian offenders
african_american_data = filtered_cs[filtered_cs['race'] == 'African-American']
caucasian_data = filtered_cs[filtered_cs['race'] == 'Caucasian']

#### African-American Metrics

In [296]:
true_labels = african_american_data['two_year_recid']
predicted_labels = african_american_data['high_score']

# Confusion Matrix
cm = confusion_matrix(true_labels, predicted_labels)
cm_df = pd.DataFrame(cm, index=['Actual No Recidivism', 'Actual Recidivism'], columns=['Predicted No Recidivism', 'Predicted Recidivism'])

cm_df

Unnamed: 0,Predicted No Recidivism,Predicted Recidivism
Actual No Recidivism,873,641
Actual Recidivism,473,1188


In [297]:
TN, FP, FN, TP = cm.ravel()

In [298]:
NPV = TN / (TN + FN)
NPV

0.6485884101040119

In [299]:
# Recidivism rate for predicted low-risk offenders
recidivism_rate_low_risk = 1 - NPV
recidivism_rate_low_risk

0.3514115898959881

In [300]:
# Precision
precision = TP / (TP + FP)
precision

0.6495352651722253

In [301]:
# Non-recidivism rate for predicted high-risk offenders
non_recidivism_rate_high_risk = 1 - precision
non_recidivism_rate_high_risk

0.35046473482777474

In [302]:
FPR = FP / (FP + TN)
FPR

0.4233817701453104

In [303]:
FNR = FN / (FN + TP)
FNR

0.2847682119205298

#### Caucasian Metrics

In [304]:
true_labels = caucasian_data['two_year_recid']
predicted_labels = caucasian_data['high_score']

# Confusion Matrix
cm = confusion_matrix(true_labels, predicted_labels)
cm_df = pd.DataFrame(cm, index=['Actual No Recidivism', 'Actual Recidivism'], columns=['Predicted No Recidivism', 'Predicted Recidivism'])

cm_df

Unnamed: 0,Predicted No Recidivism,Predicted Recidivism
Actual No Recidivism,999,282
Actual Recidivism,408,414


In [305]:
TN, FP, FN, TP = cm.ravel()

In [306]:
NPV = TN / (TN + FN)
NPV

0.7100213219616205

In [307]:
# Recidivism rate for predicted low-risk offenders
recidivism_rate_low_risk = 1 - NPV
recidivism_rate_low_risk

0.2899786780383795

In [308]:
# Precision
precision = TP / (TP + FP)
precision

0.5948275862068966

In [309]:
# Non-recidivism rate for predicted high-risk offenders
non_recidivism_rate_high_risk = 1 - precision
non_recidivism_rate_high_risk

0.4051724137931034

In [310]:
FPR = FP / (FP + TN)
FPR

0.22014051522248243

In [311]:
FNR = FN / (FN + TP)
FNR

0.49635036496350365

2.

a) Judges' Perspective: Judges rely on the risk assessment model to make decisions about sentencing, parole, and supervision levels. A high 1 − NPV (i.e., a high recidivism rate among those predicted to be low-risk) indicates that the model is not reliable in identifying individuals who are actually low-risk. This could lead to releasing individuals who are more likely to reoffend, potentially jeopardizing public safety and undermining the judge's trust in the model.

Judges use the model to impose stricter measures on individuals predicted to be high-risk. A high 1 − Precision (i.e., a high non-recidivism rate among those predicted to be high-risk) indicates that many individuals are being unnecessarily subjected to harsher penalties, which can lead to unfair treatment and overcrowding in prisons. It also questions the model's ability to correctly identify high-risk individuals.

b) Favoritism: Among offenders classified as high-risk, the model is wrong more often for Caucasians: 41% of high-risk Caucasians did not reoffend, compared with 35% of high-risk African-Americans. This suggests that a high-risk label is less reliable for Caucasians.

Implication: By this measure of categorization mistakes, the model seems to favor African-Americans, since a smaller share of their high-risk classifications turn out to be incorrect.

c) Probability Comparison:

The probability that a non-recidivist African-American will be classified as high-risk (the FPR) is approximately 42%.

The probability that a non-recidivist Caucasian will be classified as high-risk is approximately 22%.

The model therefore classifies non-recidivist African-Americans as high-risk at nearly twice the rate of non-recidivist Caucasians. In terms of false positive rates, the model is biased against African-Americans. The false negative rates point the same way: about 50% of Caucasian recidivists were classified as low-risk, compared with about 28% of African-American recidivists.
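To compare the two groups without repeating each cell, the four rates can be recomputed from the counts in the two confusion matrices above. This is a minimal sketch; the helper `group_metrics` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
# Counts copied from the per-race confusion matrices above, as (TN, FP, FN, TP)
counts = {
    "African-American": (873, 641, 473, 1188),
    "Caucasian": (999, 282, 408, 414),
}

def group_metrics(tn, fp, fn, tp):
    """Hypothetical helper: the four rates discussed in this section."""
    return {
        "1 - NPV": fn / (tn + fn),        # recidivism rate among predicted low-risk
        "1 - precision": fp / (tp + fp),  # non-recidivism rate among predicted high-risk
        "FPR": fp / (fp + tn),            # non-recidivists classified as high-risk
        "FNR": fn / (fn + tp),            # recidivists classified as low-risk
    }

by_race = {race: group_metrics(*c) for race, c in counts.items()}

# Print one row per metric, one column per group
print(f"{'metric':<15}" + "".join(f"{race:>20}" for race in by_race))
for metric in ["1 - NPV", "1 - precision", "FPR", "FNR"]:
    print(f"{metric:<15}" + "".join(f"{by_race[r][metric]:>20.3f}" for r in by_race))
```

Reading across a row shows how far apart the groups are on each metric, which is the comparison parts b) and c) ask for.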

### Can you make a better model?

1.

For Caucasians:

Recidivism Rate for Predicted Low-Risk Offenders:

In [312]:
tn = 1000
fn = 400

NPV = tn / (tn + fn)
NPV

0.7142857142857143

In [313]:
recidivism_rate = 1 - NPV
recidivism_rate

0.2857142857142857

False Positive Rate (FPR):

In [314]:
fp = 300
tn = 1000

FPR = fp / (fp + tn)
FPR

0.23076923076923078

For African-Americans:

Recidivism Rate for Predicted Low-Risk Offenders:

In [315]:
tn = 900
fn = 500

NPV = tn / (tn + fn)
NPV

0.6428571428571429

In [316]:
recidivism_rate = 1 - NPV
recidivism_rate

0.3571428571428571

False Positive Rate (FPR):

In [317]:
fp = 600
tn = 900

FPR = fp / (fp + tn)
FPR

0.4

In [318]:
data = {
    'Metric': ['Recidivism Rate for Predicted Low-Risk Offenders', 'False Positive Rate (FPR)'],
    'Caucasians': ['28.6%', '23.1%'],
    'African-Americans': ['35.7%', '40%']
}

# Create the DataFrame
summary_df = pd.DataFrame(data)
summary_df

Unnamed: 0,Metric,Caucasians,African-Americans
0,Recidivism Rate for Predicted Low-Risk Offenders,28.6%,35.7%
1,False Positive Rate (FPR),23.1%,40%


2.

Step-by-Step Calculation:


For Caucasians:

False Positive Rate (FPR):

In [319]:
fp_c = 500
tn_c = 800 
FPR_c = fp_c / (fp_c + tn_c)
FPR_c

0.38461538461538464

Negative Predictive Value (NPV):

In [320]:
fn_c = 300
NPV_c = tn_c / (tn_c + fn_c)
NPV_c

0.7272727272727273

For African-Americans:

False Positive Rate (FPR):

In [321]:
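fp_aa = 800
tn_aa = 700

# Use the African-American counts defined in this cell, not the fp/tn left over from earlier cells
FPR_aa = fp_aa / (fp_aa + tn_aa)
FPR_aa

0.5333333333333333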

Negative Predictive Value (NPV):

In [322]:
fn_aa = 350
NPV_aa = tn_aa / (tn_aa + fn_aa)
NPV_aa

0.6666666666666666

Adjusting the Matrices:

To equalize FPR and NPV across both groups, let's set target values for both metrics and adjust the confusion matrices accordingly.

Target Values:

FPR Target: Average of the two current FPRs:

In [323]:
fpr_target = (0.385 + 0.533) / 2
fpr_target

0.459

NPV Target: Average of the two current NPVs:

In [324]:
npv_target = (0.727 + 0.667) / 2
npv_target

0.6970000000000001

Adjusted Confusion Matrices:

Adjusting for Caucasians:

FPR: To achieve the target FPR of 0.459:

In [325]:
FP_c = fpr_target * (fp_c + tn_c)
FP_c

596.7

Adjusted FP_c (for simplicity):

In [326]:
adjusted_FP_c = 600

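NPV: Holding the non-recidivist total at 1300, the adjusted FP determines the adjusted TN, which is the numerator of the NPV:

In [327]:
# TN is whatever remains of the non-recidivists after the adjusted FP is removed
TN_c = (fp_c + tn_c) - adjusted_FP_c
TN_c

700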

Adjusting for African-Americans:

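FPR: To achieve the target FPR of 0.459:

In [332]:
# Scale the non-recidivist total by the target FPR (not by the group's current FPR)
FP_aa = round(fpr_target * (fp_aa + tn_aa), 1)
FP_aa

688.5

NPV: Rounding the adjusted FP up to 700 and holding the non-recidivist total at 1500 determines the adjusted TN. With FN = 350, the NPV is 800 / 1150 ≈ 0.696, close to the 0.697 target:

In [331]:
adjusted_FP_aa = 700
# TN is whatever remains of the non-recidivists after the adjusted FP is removed
TN_aa = (fp_aa + tn_aa) - adjusted_FP_aa
TN_aa

800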

Adjusted Confusion Matrices:

In [333]:
data_caucasians = {
    '': ['R = 0 (Non-recidivists)', 'R = 1 (Recidivists)'],
    'Low Risk': [700, 400],
    'High Risk': [600, 400],
    'Total': [1300, 800]
}

In [334]:
df_caucasians = pd.DataFrame(data_caucasians)
df_caucasians

Unnamed: 0,Unnamed: 1,Low Risk,High Risk,Total
0,R = 0 (Non-recidivists),700,600,1300
1,R = 1 (Recidivists),400,400,800


In [335]:
data_african_americans = {
    '': ['R = 0 (Non-recidivists)', 'R = 1 (Recidivists)'],
    'Low Risk': [800, 350],
    'High Risk': [700, 1350],
    'Total': [1500, 1700]
}

In [337]:
df_african_americans = pd.DataFrame(data_african_americans)
df_african_americans

Unnamed: 0,Unnamed: 1,Low Risk,High Risk,Total
0,R = 0 (Non-recidivists),800,700,1500
1,R = 1 (Recidivists),350,1350,1700
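As a check on the two adjusted tables, the sketch below reads the counts off them and recomputes FPR and NPV, along with precision and FNR. The helper name `summarize` is illustrative. It assumes the "Low Risk" column holds TN and FN and the "High Risk" column holds FP and TP, as in the tables above.

```python
# Counts read off the adjusted tables above, as (TN, FP, FN, TP)
adjusted = {
    "Caucasian": (700, 600, 400, 400),
    "African-American": (800, 700, 350, 1350),
}

def summarize(tn, fp, fn, tp):
    """Hypothetical helper: metrics targeted by the tweak plus two side effects."""
    return {
        "FPR": fp / (fp + tn),
        "NPV": tn / (tn + fn),
        "precision": tp / (tp + fp),
        "FNR": fn / (fn + tp),
    }

adjusted_metrics = {group: summarize(*c) for group, c in adjusted.items()}
for group, metrics in adjusted_metrics.items():
    print(group, {name: round(value, 3) for name, value in metrics.items()})
```

Printing precision and FNR next to the two targeted metrics shows what the adjustment costs as well as what it equalizes.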


3.

Balance: Tweaking the model brings the FPR of the two groups almost level (about 46% for Caucasians and 47% for African-Americans) and leaves the NPVs within a few points of each other (about 64% and 70%). However, this balance comes at a cost to precision and FNR.

Better or Worse: The tweaked model might be better in terms of fairness in FPR and NPV, but it could be worse at flagging the right people (precision) and at catching actual recidivists (FNR). In the adjusted matrices, precision is only 40% and the FNR is 50% for Caucasians, compared with about 66% and 21% for African-Americans.

Realism: Achieving equal FPR and NPV across groups while maintaining high precision and low FNR is challenging. It often involves trade-offs, as seen here, where improving one metric can adversely affect others.

Therefore, while such adjustments can improve certain fairness metrics, they do not necessarily produce a better model overall. A model that is fair on every metric and accurate for both groups is a complex and potentially unrealistic goal.
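One way to see why these trade-offs are hard to avoid is an identity that follows directly from the confusion-matrix definitions: FPR = p / (1 − p) × (1 − precision) / precision × (1 − FNR), where p is the group's base rate of recidivism. If two groups have different base rates, as they do in this sample, they cannot match on precision, FNR, and FPR all at once. The sketch below checks the identity numerically using the African-American counts from the race analysis. The variable names are illustrative.

```python
# African-American counts from the confusion matrix in the race analysis
tn, fp, fn, tp = 873, 641, 473, 1188

p = (tp + fn) / (tn + fp + fn + tp)   # base rate of recidivism in the group
precision = tp / (tp + fp)
fnr = fn / (fn + tp)

# FPR computed directly from its definition
fpr_direct = fp / (fp + tn)

# FPR implied by the base rate, precision, and FNR
fpr_identity = p / (1 - p) * (1 - precision) / precision * (1 - fnr)

print(round(fpr_direct, 4), round(fpr_identity, 4))
```

Because the right-hand side depends on p, equalizing some metrics across groups with different base rates necessarily pushes others apart.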
