# Import modules

In [1]:
# Code written by Lia Arakal on 12/16/2024
import pandas as pd
import seaborn as sns

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from sklearn.tree import DecisionTreeClassifier as dt 
from sklearn.ensemble import RandomForestClassifier as rf
from sklearn.model_selection import train_test_split, GridSearchCV

# for extra credit
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import precision_score, recall_score, confusion_matrix

# Introduction

<p>
<img  height=250 width=250 src='https://zjelveh.github.io/files/NM.png' >
    <center><em>Map of New Maryland</em></center>
</p>

You are the governor of the state of New Maryland. You signed a bill into law which requires that the court choose a risk assessment in order to aid judges in their pretrial detention decisions. 


Some of your stakeholders want you to choose a risk assessment that targets felony crimes. These stakeholders think that, since felony crimes are more socially costly, and since the state budget has been cut, focusing on felony crimes is the most effective way to have the biggest impact on crime given limited resources. They want you to pair this algorithm with a tough-on-crime intervention that detains high-risk individuals. The capacity for the program is **100**. The treatment effect of the intervention is **70%**. 

Another set of stakeholders want you to choose a risk assessment that targets any crime, regardless of how severe. These stakeholders believe that preventing future crimes can only be achieved by signaling to current arrestees that no type of criminal offense will be treated leniently. These stakeholders also believe that the causes of crime are driven by mental health-related issues. They would like you to pair this algorithm with an intervention that provides online cognitive behavioral health therapy that meets once a month. The capacity for the program is **500**. The treatment effect of the intervention is **20%**.

Being a polymath, not only are you the governor of New Maryland, but you are also the state's Chief Data Scientist. In that role, you are going to build two prediction models.

- One will predict any re-arrest
- One will predict felony re-arrest

Then you will analyze the models and decide which is the most appropriate for your state.

First you will generate two features (to include with a set I have already created for you).

Then, for each outcome, you will run a decision tree model and a random forest model. You will choose the best model from these two (RF and decision trees) to then predict for the holdout. 

Then you will evaluate each model + intervention.

Finally you will tell me which model you pick for your state.

# Thresholds and treatment effects (1 point)
Create the following four variables:
 - `threshold__any`: This should be set to the number of slots available in the online therapy intervention  
 - `threshold__felony`: This should be set to the number of slots available in the detention intervention
 - `treatment_effect__online_therapy`: This should be set to the treatment effect of the online therapy intervention 
 - `treatment_effect__detention`: This should be set to the treatment effect of the detention intervention
 

In [2]:
threshold__any = 500
threshold__felony = 100
treatment_effect__online_therapy = .2
treatment_effect__detention = .7

# Load Data

In [3]:
prediction_universe = pd.read_csv('https://zjelveh.github.io/files/prediction_universe_INST414.csv')
arrests = pd.read_csv('https://zjelveh.github.io/files/arrests_INST414.csv')
charges = pd.read_csv('https://zjelveh.github.io/files/charges_INST414.csv')
prediction_universe.head()

arrests.head()

charges.head()

Unnamed: 0,ArresteeID,arrests__age,arrests__total_2016,arrests__total_2017,arrests__total_2018,group,outcome__any,outcome__felony,outcome__nonfelony
0,100684624.0,25.0,0.0,0.0,1.0,1.0,0,0,0
1,101611646.0,33.0,0.0,0.0,1.0,1.0,0,0,0
2,101790068.0,20.0,0.0,0.0,1.0,1.0,0,0,0
3,101696448.0,24.0,0.0,0.0,1.0,1.0,0,0,0
4,10179010.0,27.0,0.0,1.0,1.0,0.0,0,0,0


Unnamed: 0,ArresteeID,arrests__arrest_year,ArrestNumber,arrests__weapon
0,102493798.0,2020,20-034161,Unarmed
1,101681693.0,2020,20-034161,Unarmed
2,102493798.0,2018,18-022775,Unarmed
3,101681693.0,2018,18-022775,Unarmed
4,102012407.0,2019,19-033083,Unarmed


Unnamed: 0,ArrestNumber,ArChgNumID,Severity,NIBRS_Group,NIBRS_Crime_Category
0,18-031534,18-031534-01,M,B,DRIVING UNDER THE INFLUENCE
1,18-043911,18-043911-01,M,B,TRESPASS OF REAL PROPERTY
2,18-024199,18-024199-01,M,B,PUBLIC INTOXICATION
3,18-034153,18-034153-01,M,B,PUBLIC INTOXICATION
4,18-036191,18-036191-02,M,B,PUBLIC INTOXICATION


## Column definitions:
  - **prediction_universe**:
      - `ArresteeID` - unique ID for an arrestee
      - `arrests__age` - age of arrestee in 2018
      - `arrests__total_YYYY` - number of arrests by year YYYY (2016 to 2018)
      - `group` - an indicator of which group an arrestee belongs to (either 0 or 1). This variable is sensitive (akin to race/sex/gender/ethnicity/age/etc)
      - `outcome__any` - is equal to one if a person arrested in 2018 was re-arrested for any offense in 2019, 2020, or 2021
      - `outcome__felony`- is equal to one if a person arrested in 2018 was re-arrested for a felony offense in 2019, 2020, or 2021
  - **arrests**:
      - `ArresteeID` - unique ID for an arrestee
      - `arrests__arrest_year` - year of arrest
      - `ArrestNumber` - unique ID for an arrest
      - `arrests__weapon` - type of weapon associated with arrest
  - **charges**:
      - `ArrestNumber` - unique ID for an arrest
      - `ArChgNumID` - unique ID for an arrest charge
      - `Severity` - whether the charge was a felony or not
      - `NIBRS_Group` - NIBRS charge categorization I (A- Most severe, B, C-Least Severe)
      - `NIBRS_Crime_Category` - NIBRS charge categorization II (e.g. Public intoxication, DUI, Assault)

**NOTE**: NIBRS stands for [National Incident-Based Reporting System](https://bjs.ojp.gov/national-incident-based-reporting-system-nibrs), and is used by law-enforcement agencies across the country to report data to the federal government. 
      

 # Generate more features
 ## First feature (2 points)
Create a predictor called `charges__num_B__last_two_years` that counts the total number of arrests in 2017 and 2018 per `ArresteeID` where there was at least one charge that was NIBRS Group B. (You will only be adding one column). (1.5 points)

Merge this feature in with `prediction_universe`. Fill any null values with zero. Make sure that the number of rows in `prediction_universe` does not change. (0.5 points)
 
 

In [4]:
# Filter arrests for 2017-2018 and merge with charges
recent_arrests = arrests[arrests['arrests__arrest_year'].isin([2017, 2018])]
arrests_charges = recent_arrests.merge(charges, on='ArrestNumber')

# Count arrests with NIBRS Group B charges per ArresteeID
charges__num_B__last_two_years = (
    arrests_charges[arrests_charges['NIBRS_Group'] == 'B']
    .groupby('ArresteeID')
    .size()
    .reset_index(name='charges__num_B__last_two_years')
)

In [5]:
# Merge with prediction_universe and fill nulls with 0
prediction_universe = prediction_universe.merge(
    charges__num_B__last_two_years,
    on='ArresteeID',
    how='left'
)
prediction_universe['charges__num_B__last_two_years'] = prediction_universe['charges__num_B__last_two_years'].fillna(0)

## Second feature (2 points)
Create a predictor called `arrests__weapon_used` that counts the number of arrests in 2016, 2017, and 2018 where the arrest weapon was NOT "Unarmed" or "None". (You will only be adding one column). (1.5 points)

Merge this feature in with `prediction_universe`. Fill any null values with zero. Make sure that the number of rows in `prediction_universe` does not change. (0.5 points)

In [6]:
# Filter arrests where weapon was not "Unarmed" or "None"
weapon_arrests = (
    arrests[
        (~arrests['arrests__weapon'].isin(['Unarmed', 'None'])) &
        (arrests['arrests__arrest_year'].isin([2016, 2017, 2018]))
    ]
    .groupby('ArresteeID')
    .size()
    .reset_index(name='arrests__weapon_used')
)

In [7]:
# Merge with prediction_universe and fill nulls with 0
prediction_universe = prediction_universe.merge(
    weapon_arrests,
    on='ArresteeID',
    how='left'
)
prediction_universe['arrests__weapon_used'] = prediction_universe['arrests__weapon_used'].fillna(0)

# Make sure row count hasn't changed
print("Number of rows in prediction_universe:", len(prediction_universe))

Number of rows in prediction_universe: 7763


# Prepare data 
## Part I (0.5 points)
Following the diagram below, create the following variables:

`final_model` - which will hold two-thirds of the original data

`holdout` - which will hold one-third of the original data

`train` - which will hold 1/2 of the `final_model` data 

`validation` - which will hold the other 1/2 of the `final_model` data 

<img width=450 height=350 src='https://zjelveh.github.io/files/data_split.png'>





In [8]:
# First split: Create final_model and holdout
final_model, holdout = train_test_split(
    prediction_universe,
    test_size=1/3,
    random_state=42
)

# Second split: Split final_model into train and validation
train, validation = train_test_split(
    final_model,
    test_size=0.5,
    random_state=42
)

# Making sure the splits are correct
print("Original data size:", len(prediction_universe))
print("Final model size:", len(final_model), f"({len(final_model)/len(prediction_universe):.1%})")
print("Holdout size:", len(holdout), f"({len(holdout)/len(prediction_universe):.1%})")
print("Train size:", len(train), f"({len(train)/len(final_model):.1%})")
print("Validation size:", len(validation), f"({len(validation)/len(final_model):.1%})")

Original data size: 7763
Final model size: 5175 (66.7%)
Holdout size: 2588 (33.3%)
Train size: 2587 (50.0%)
Validation size: 2588 (50.0%)


## Part 2 (0.5 points)
Create the following Data Frames and Series:

**Predictor DataFrames**
 - `X__final_model` - this will hold the predictors from `final_model`

 - `X__holdout` - this will hold the predictors from `holdout`

 - `X__train` - this will hold the predictors from `train`

 - `X__validation` - this will hold predictors from `validation`

**Outcome Series**

 - `y__any__final_model` - this will hold `outcome__any_rearrest` from `final_model` 

 - `y__felony__final_model` - this will hold `outcome__felony_rearrest` from `final_model` 

 - `y__any__train` - this will hold `outcome__any_rearrest` from `train` 

 - `y__felony__train` - this will hold `outcome__felony_rearrest` from `train` 


Use the following predictors:
- arrests__age
- arrests__total_2016
- arrests__total_2017
- arrests__total_2018
- charges__num_B__last_two_years
- arrests__weapon_used

In [9]:
# This list will be helpful in creating the four predictor DataFrames
features = ['arrests__age', 'arrests__total_2016', 'arrests__total_2017', 
            'arrests__total_2018', 'charges__num_B__last_two_years',
            'arrests__weapon_used']

# Create predictor DataFrames
X__final_model = final_model[features]
X__holdout = holdout[features]
X__train = train[features]
X__validation = validation[features]

# Create outcome Series for any re-arrest
y__any__final_model = final_model['outcome__any']
y__any__train = train['outcome__any']

# Create outcome Series for felony re-arrest
y__felony__final_model = final_model['outcome__felony']
y__felony__train = train['outcome__felony']

# Verify shapes of our DataFrames and Series
print("Predictor DataFrames shapes:")
print("X__final_model:", X__final_model.shape)
print("X__holdout:", X__holdout.shape)
print("X__train:", X__train.shape)
print("X__validation:", X__validation.shape)

print("\nOutcome Series lengths:")
print("y__any__final_model:", len(y__any__final_model))
print("y__felony__final_model:", len(y__felony__final_model))
print("y__any__train:", len(y__any__train))
print("y__felony__train:", len(y__felony__train))

Predictor DataFrames shapes:
X__final_model: (5175, 6)
X__holdout: (2588, 6)
X__train: (2587, 6)
X__validation: (2588, 6)

Outcome Series lengths:
y__any__final_model: 5175
y__felony__final_model: 5175
y__any__train: 2587
y__felony__train: 2587


# Predict Any Rearrest
We will compare the performance of two models:
- A decision tree 
- A random forest model 

Use GridSearchCV to estimate both models. Tune over the max_depth hyperparameter. For both the decision tree and random forest, test depths of 2, 4, 6, and 8.

In [10]:
dt_2 = dt()
rf_2 = rf()

# Set up parameter grid for max_depth
param_grid = {
    'max_depth': [2, 4, 6, 8]
}

# Create and fit GridSearchCV for Decision Tree
dt_grid = GridSearchCV(
    dt_2,
    param_grid,
    cv=5,
    scoring='roc_auc'
)
dt_grid.fit(X__train, y__any__train)

# Create and fit GridSearchCV for Random Forest
rf_grid = GridSearchCV(
    rf_2,
    param_grid,
    cv=5,
    scoring='roc_auc'
)
rf_grid.fit(X__train, y__any__train)

# Print results
print("Decision Tree Results:")
print("Best max_depth:", dt_grid.best_params_)
print("Best CV score:", dt_grid.best_score_)

print("\nRandom Forest Results:")
print("Best max_depth:", rf_grid.best_params_)
print("Best CV score:", rf_grid.best_score_)

Decision Tree Results:
Best max_depth: {'max_depth': 6}
Best CV score: 0.6693320556540169

Random Forest Results:
Best max_depth: {'max_depth': 6}
Best CV score: 0.6861190904939611


## Model training (0.5 points)
Train the two models to predict any rearrest 

In [11]:
# Train Decision Tree with the best max_depth from GridSearchCV
dt_any = dt(max_depth=dt_grid.best_params_['max_depth'])
dt_any.fit(X__train, y__any__train)

# Train Random Forest with the best max_depth from GridSearchCV
rf_any = rf(max_depth=rf_grid.best_params_['max_depth'])
rf_any.fit(X__train, y__any__train)

# Get predictions on validation set for both models
dt_any_pred = dt_any.predict_proba(X__validation)[:, 1]
rf_any_pred = rf_any.predict_proba(X__validation)[:, 1]

# Calculate validation AUC scores
from sklearn.metrics import roc_auc_score

dt_any_auc = roc_auc_score(validation['outcome__any'], dt_any_pred)
rf_any_auc = roc_auc_score(validation['outcome__any'], rf_any_pred)

print("Validation AUC scores:")
print(f"Decision Tree AUC: {dt_any_auc:.3f}")
print(f"Random Forest AUC: {rf_any_auc:.3f}")

Validation AUC scores:
Decision Tree AUC: 0.630
Random Forest AUC: 0.655


## Predict for validation (0.5 points)
Create the following two columns in `validation` that will hold the predicted probabilities from the models when predicting for the validation set:
1. `pred__any__dt2`
1. `pred__any__rf2`

In [12]:
# Add Decision Tree predictions to validation DataFrame
validation['pred__any__dt2'] = dt_any.predict_proba(X__validation)[:, 1]

# Add Random Forest predictions to validation DataFrame
validation['pred__any__rf2'] = rf_any.predict_proba(X__validation)[:, 1]

# Verify the new columns are added
print("New columns in validation DataFrame:")
print(validation[['pred__any__dt2', 'pred__any__rf2']].head())

New columns in validation DataFrame:
      pred__any__dt2  pred__any__rf2
5101        0.115789        0.102772
4973        0.115789        0.120746
7753        0.115789        0.115921
5092        0.400000        0.262414
3385        0.232143        0.305571


## Convert predicted probabilities to predicted outcomes (0.5 points)
Create the following two columns in `validation` that will hold the predicted outcomes from the models when predicting for the validation set:
1. `yhat__any__dt2`
1. `yhat__any__rf2`

(Make sure to use `threshold__any`)

I have provided the code for the first one of these. Note that the `rank` function also includes a parameter called `method` which I have set to first. Please make sure to also include that parameter in all subsequent calls to `rank`.

In [13]:
validation['yhat__any__dt2'] = validation.pred__any__dt2.rank(ascending=False, method='first') <= threshold__any

# Create prediction column for Random Forest
validation['yhat__any__rf2'] = validation.pred__any__rf2.rank(ascending=False, method='first') <= threshold__any

# Verify the new columns
print("New prediction columns in validation DataFrame:")
print(validation[['yhat__any__dt2', 'yhat__any__rf2']].head())

New prediction columns in validation DataFrame:
      yhat__any__dt2  yhat__any__rf2
5101           False           False
4973           False           False
7753           False           False
5092            True            True
3385            True            True


## Compute PPV (0.5 points)
Use the `yhat` and `outcome` columns to compute PPV (Positive Predictive Value or precision) for both models

In [14]:
# Compute PPV for Decision Tree
ppv_dt = precision_score(validation['outcome__any'], validation['yhat__any__dt2'])

# Compute PPV for Random Forest
ppv_rf = precision_score(validation['outcome__any'], validation['yhat__any__rf2'])

print("Positive Predictive Values (PPV/Precision):")
print(f"Decision Tree PPV: {ppv_dt:.3f}")
print(f"Random Forest PPV: {ppv_rf:.3f}")

Positive Predictive Values (PPV/Precision):
Decision Tree PPV: 0.286
Random Forest PPV: 0.298


## Predict for holdout and convert to yhat (0.5 points)
Which model had the higher PPV? Use that model (or select either if they have identical performance) to:
- Create a column called `pred__any` in `holdout` which holds predicted probabilities of any rearrest for the holdout set
- Create a column called `yhat__any` in `holdout` which holds the predicted outcomes for the holdout set

In [15]:
# Random Forest has a higher PPV
# Create probability predictions using Random Forest
holdout['pred__any'] = rf_any.predict_proba(X__holdout)[:, 1]

# Convert to predicted outcomes
holdout['yhat__any'] = holdout.pred__any.rank(ascending=False, method='first') <= threshold__any

# Verify the new columns
print("New columns in holdout DataFrame:")
print(holdout[['pred__any', 'yhat__any']].head())

New columns in holdout DataFrame:
      pred__any  yhat__any
1281   0.104944      False
3095   0.102083      False
2018   0.115415      False
2807   0.120746      False
2543   0.094959      False


# Repeat the steps above for the felony prediction (2.5 points)
1. Train model
1. Predict for validation 
1. Create y-hat for validation
1. Compute PPV
1. Predict for holdout and convert yhat 

Make sure the column names are the same as above, but change `any` to `felony`

(If your two models give the same PPV, just choose either when predicting for holdout)

In [16]:
# 1. Train models for felony prediction
# Set up and run GridSearchCV
param_grid = {
    'max_depth': [2, 4, 6, 8]
}

# Train Decision Tree with GridSearchCV
dt_grid_felony = GridSearchCV(dt(), param_grid, cv=5, scoring='roc_auc')
dt_grid_felony.fit(X__train, y__felony__train)

# Train Random Forest with GridSearchCV
rf_grid_felony = GridSearchCV(rf(), param_grid, cv=5, scoring='roc_auc')
rf_grid_felony.fit(X__train, y__felony__train)

# Train final models with best parameters
dt_felony = dt(max_depth=dt_grid_felony.best_params_['max_depth'])
rf_felony = rf(max_depth=rf_grid_felony.best_params_['max_depth'])
dt_felony.fit(X__train, y__felony__train)
rf_felony.fit(X__train, y__felony__train)

# 2. Predict for validation
validation['pred__felony__dt2'] = dt_felony.predict_proba(X__validation)[:, 1]
validation['pred__felony__rf2'] = rf_felony.predict_proba(X__validation)[:, 1]

# 3. Create y-hat for validation (using threshold__felony = 100)
validation['yhat__felony__dt2'] = validation.pred__felony__dt2.rank(ascending=False, method='first') <= threshold__felony
validation['yhat__felony__rf2'] = validation.pred__felony__rf2.rank(ascending=False, method='first') <= threshold__felony

# 4. Compute PPV for both models
ppv_dt_felony = precision_score(validation['outcome__felony'], validation['yhat__felony__dt2'])
ppv_rf_felony = precision_score(validation['outcome__felony'], validation['yhat__felony__rf2'])

print("Positive Predictive Values (PPV/Precision) for Felony:")
print(f"Decision Tree PPV: {ppv_dt_felony:.3f}")
print(f"Random Forest PPV: {ppv_rf_felony:.3f}")

# 5. Predict for holdout using better model (assuming RF is better, adjust based on actual results)
if ppv_rf_felony > ppv_dt_felony:
    holdout['pred__felony'] = rf_felony.predict_proba(X__holdout)[:, 1]
else:
    holdout['pred__felony'] = dt_felony.predict_proba(X__holdout)[:, 1]

# Create yhat for holdout
holdout['yhat__felony'] = holdout.pred__felony.rank(ascending=False, method='first') <= threshold__felony

Positive Predictive Values (PPV/Precision) for Felony:
Decision Tree PPV: 0.160
Random Forest PPV: 0.210


<a id='section_7'></a>
# Calculate Impact
## Online Therapy Program 
### Number of arrests before the intervention (1 point)
Created the following two variables:
- `flagged__nonfelony__any_model` -  The number of nonfelony rearrests among those predicted to be at high risk of any rearrest in the absence of the intervention
- `flagged__felony__any_model` -  The number of felony rearrests among those predicted to be at high risk of any rearrest in the absence of the intervention


In [17]:
# Calculate nonfelony rearrests for those flagged by any model
flagged__nonfelony__any_model = sum(
    (holdout['yhat__any'] == 1) & 
    (holdout['outcome__any'] == 1) & 
    (holdout['outcome__felony'] == 0)
)

# Calculate felony rearrests for those flagged by any model
flagged__felony__any_model = sum(
    (holdout['yhat__any'] == 1) & 
    (holdout['outcome__felony'] == 1)
)

print("Among those flagged as high risk of any rearrest:")
print(f"Number of nonfelony rearrests: {flagged__nonfelony__any_model}")
print(f"Number of felony rearrests: {flagged__felony__any_model}")

Among those flagged as high risk of any rearrest:
Number of nonfelony rearrests: 117
Number of felony rearrests: 29


### Effect of intervention (1 point)
Using information about the treatment effect of the intervention, create the following two variables:
- `change__nonfelony__any_model` - The change in the number of nonfelony rearrests among those predicted to be at high risk of any rearrest after the intervention

- `change__felony__any_model` - The change in the number of felony rearrests among those predicted to be at high risk of any rearrest after the intervention

In [18]:
# Calculate reduction in nonfelony rearrests
change__nonfelony__any_model = -1 * flagged__nonfelony__any_model * treatment_effect__online_therapy

# Calculate reduction in felony rearrests
change__felony__any_model = -1 * flagged__felony__any_model * treatment_effect__online_therapy

print("Changes after online therapy intervention:")
print(f"Change in nonfelony rearrests: {change__nonfelony__any_model:.1f}")
print(f"Change in felony rearrests: {change__felony__any_model:.1f}")

Changes after online therapy intervention:
Change in nonfelony rearrests: -23.4
Change in felony rearrests: -5.8


### Converting into dollar values (1 point)
Assume that:
- a nonfelony crime costs society \$5,000 per nonfelony crime
- a felony crime costs society \$100,000 per felony crime

Compute:
`benefit__online_therapy`: the decline in the social cost of crime as a result of the online therapy intervention.




In [19]:
cost_nonfelony = 5000
cost_felony = 100000

In [20]:
# Calculate total benefit from reductions in both types of crime
# Calculate benefit (make positive since it's a savings)
benefit__online_therapy = (
    (-change__nonfelony__any_model * cost_nonfelony) + 
    (-change__felony__any_model * cost_felony)
)

print(f"Total benefit from online therapy intervention: ${benefit__online_therapy:,.2f}")

Total benefit from online therapy intervention: $697,000.00


### Calculate the social cost of Online Therapy (1 point)
Assume that each person who is offered online therapy but who would not have been re-arrested (i.e. a False Positive) suffers a social cost \$1,000. 

1. Calculate `num_FP__online_therapy`: the number of False Positives among those at high risk of any rearrest. (A false positive here is someone who is predicted to be at high risk of any rearrest but is not re-arrested for a crime.)

2. Calculate `social_cost_online_therapy`: the social cost of Online Therapy

3. Compute `total_benefit_online_therapy`: the combined cost/benefit of the program (i.e. combining results from previous question with this one)

In [21]:
cost_online_therapy=1000

In [22]:
# 1. Calculate number of False Positives 
num_FP__online_therapy = sum(
   (holdout['yhat__any'] == 1) & 
   (holdout['outcome__any'] == 0)
)

# 2. Calculate social cost from False Positives
social_cost_online_therapy = num_FP__online_therapy * cost_online_therapy

# 3. Calculate total benefit (benefits minus costs)
total_benefit_online_therapy = benefit__online_therapy - social_cost_online_therapy

print(f"Number of False Positives: {num_FP__online_therapy}")
print(f"Social cost of False Positives: ${social_cost_online_therapy:,.2f}")
print(f"Total net benefit: ${total_benefit_online_therapy:,.2f}")

Number of False Positives: 354
Social cost of False Positives: $354,000.00
Total net benefit: $343,000.00


## Detention 
### Number of arrests before the intervention (1 point)
Created the following two variables:
- `flagged__nonfelony__felony_model` -  The number of nonfelony rearrests among those predicted to be at high risk of felony rearrest before the intervention
- `flagged__felony__felony_model` -  The number of felony rearrests among those predicted to be at high risk of felony rearrest before the intervention

In [23]:
# Calculate nonfelony rearrests for those flagged by felony model
flagged__nonfelony__felony_model = sum(
   (holdout['yhat__felony'] == 1) & 
   (holdout['outcome__any'] == 1) & 
   (holdout['outcome__felony'] == 0)
)

# Calculate felony rearrests for those flagged by felony model
flagged__felony__felony_model = sum(
   (holdout['yhat__felony'] == 1) & 
   (holdout['outcome__felony'] == 1)
)

print("Among those flagged as high risk of felony rearrest:")
print(f"Number of nonfelony rearrests: {flagged__nonfelony__felony_model}")
print(f"Number of felony rearrests: {flagged__felony__felony_model}")

Among those flagged as high risk of felony rearrest:
Number of nonfelony rearrests: 12
Number of felony rearrests: 22


### Effect of intervention (1 point)
Using information about the treatment effect of the intervention, create the following two variables:
- `change__nonfelony__felony_model` - The change in the number of nonfelony rearrests among those predicted to be at high risk of felony rearrest after the intervention

- `change__felony__felony_model` - The change in the number of felony rearrests among those predicted to be at high risk of felony rearrest after the intervention

In [24]:
# Calculate reduction in nonfelony rearrests for detention program
change__nonfelony__felony_model = -1 * flagged__nonfelony__felony_model * treatment_effect__detention

# Calculate reduction in felony rearrests for detention program
change__felony__felony_model = -1 * flagged__felony__felony_model * treatment_effect__detention

print("Changes after detention intervention:")
print(f"Change in nonfelony rearrests: {change__nonfelony__felony_model:.1f}")
print(f"Change in felony rearrests: {change__felony__felony_model:.1f}")

Changes after detention intervention:
Change in nonfelony rearrests: -8.4
Change in felony rearrests: -15.4


### Converting into dollar values (1 point)
As above, assume that:
- a nonfelony crime costs society \$5,000 per nonfelony crime
- a felony crime costs society \$100,000 per felony crime

Compute `benefit_detention`: the decline in the social cost of crime as a result of detention




In [25]:
# Calculate total benefit from reductions in both types of crime
benefit_detention = (
   (-change__nonfelony__felony_model * cost_nonfelony) + 
   (-change__felony__felony_model * cost_felony)
)

print(f"Total benefit from detention intervention: ${benefit_detention:,.2f}")

Total benefit from detention intervention: $1,582,000.00


### Calculate the social cost of detention (1 point)
Assume that each person falsely detained suffers a social cost \$30,000.

1. Calculate `num_FP__detention` the number of False Positives among those at high risk of felony rearrest. (A false positive here is someone who is predicted to be at high risk of felony rearrest but is not re-arrested for a felony crime.)

2. Compute `social_cost_detention`: the social cost of detention

3. Compute `total_benefit_detention`: the combined cost/benefit of the program (i.e. combining results from previous question with this one)

In [26]:
cost_detention = 30000

In [27]:
# 1. Calculate number of False Positives for detention
num_FP__detention = sum(
   (holdout['yhat__felony'] == 1) & 
   (holdout['outcome__felony'] == 0)
)

# 2. Calculate social cost from False Positives
social_cost_detention = num_FP__detention * cost_detention

# 3. Calculate total benefit (benefits minus costs)
total_benefit_detention = benefit_detention - social_cost_detention

print(f"Number of False Positives: {num_FP__detention}")
print(f"Social cost of False Positives: ${social_cost_detention:,.2f}")
print(f"Total net benefit: ${total_benefit_detention:,.2f}")

Number of False Positives: 78
Social cost of False Positives: $2,340,000.00
Total net benefit: $-758,000.00


# Fairness
## PPV Balance (1 point)

1. In your own words, describe why we would want an algorithm to achieve balance in PPV? (Think about NorthPointe's justification re: Compas algorithm) 

1. Compute the PPV for those at high risk of any-rearrest (when evaluating on `outcome__any`) separately for the two groups

1. Compute the PPV for those at high risk of felony-rearrest (when evaluating on `outcome__felony`) separately for the race groups

In [28]:
# PPV for any-rearrest by group
for group in [0, 1]:
    ppv = precision_score(
        holdout.loc[holdout['group'] == group, 'outcome__any'],
        holdout.loc[holdout['group'] == group, 'yhat__any']
    )
    print(f"PPV for any-rearrest, Group {group}: {ppv:.3f}")
    
# PPV for felony-rearrest by group
for group in [0, 1]:
    ppv = precision_score(
        holdout.loc[holdout['group'] == group, 'outcome__felony'],
        holdout.loc[holdout['group'] == group, 'yhat__felony']
    )
    print(f"PPV for felony-rearrest, Group {group}: {ppv:.3f}")

PPV for any-rearrest, Group 0: 0.289
PPV for any-rearrest, Group 1: 0.294
PPV for felony-rearrest, Group 0: 0.244
PPV for felony-rearrest, Group 1: 0.203



## FPR Balance (1 point)
1. In your own words, describe why we would want an algorithm to achieve balance in False Positive Rates? (Think about Pro Publica's argument)

1. Compute the FPR for those at high risk of any-rearrest (when evaluating on `outcome__any`) separately for the two groups

1. Compute the FPR for those at high risk of felony-rearrest (when evaluating on `outcome__felony`) separately for the two  groups


#### Balance in false positive rates is important because unequal FPRs mean one group faces disproportionate consequences despite not committing crimes. ProPublica said that COMPAS was unfair because it labeled Black defendants who didn't reoffend as high-risk at nearly twice the rate as white defendants who didn't reoffend. This disparity subjects one group to more unwarranted interventions, surveillance, or detention, perpetuating systemic inequities even when individuals haven't committed any crimes.

In [29]:
# Calculate FPR for any-rearrest by group
for group in [0, 1]:
    group_data = holdout[holdout['group'] == group]
    # FPR = False Positives / (False Positives + True Negatives)
    fpr = sum((group_data['yhat__any'] == 1) & (group_data['outcome__any'] == 0)) / \
          sum(group_data['outcome__any'] == 0)
    print(f"FPR for any-rearrest, Group {group}: {fpr:.3f}")

FPR for any-rearrest, Group 0: 0.120
FPR for any-rearrest, Group 1: 0.211


In [30]:
# Calculate FPR for felony-rearrest by group
for group in [0, 1]:
    group_data = holdout[holdout['group'] == group]
    # FPR = False Positives / (False Positives + True Negatives)
    fpr = sum((group_data['yhat__felony'] == 1) & (group_data['outcome__felony'] == 0)) / \
          sum(group_data['outcome__felony'] == 0)
    print(f"FPR for felony-rearrest, Group {group}: {fpr:.3f}")

FPR for felony-rearrest, Group 0: 0.023
FPR for felony-rearrest, Group 1: 0.042


## FNR Balance (1 point)
1. In your own words, describe why we would want an algorithm to achieve balance in False Negative Rates? (Think about Pro Publica's argument)

1. Compute the FNR for those at high risk of any-rearrest (when evaluating on `outcome__any`) separately for the two groups

1. Compute the FNR for those at high risk of felony-rearrest (when evaluating on `outcome__felony`) separately for the two  groups

#### Balance in false negative rates is important because unequal FNRs mean one group's actual reoffenders are being missed more often than another group's. In the context of ProPublica's argument, if the algorithm fails to identify actual reoffenders at different rates between groups, it could mean that high-risk individuals in one group are more likely to be overlooked than another. This disparity could result in missed opportunities for intervention in one group, potentially leading to preventable crimes and unequal public safety outcomes across communities.

# Make your decision (3 points)
You have generated the following:
- The social benefit of online therapy
- The social cost of online therapy
- The social benefit of detention
- The social cost of detention 
- The three disparities metrics for both algorithms


Recall that the policy objective of your community is to reduce crime by intervening with people who are high risk. One intervention targets violent offenses and involves detaining the high risk. The other intervention targets all offenses and involves offering individuals online therapy. 

Using the data and metrics you've generated, analyze both algorithms and make a recommendation.
Which one of these algorithms would you say best fits the policy objective?

Structure your response as follows (as in make sure your response includes analysis of the following):

A. Technical Performance (1 point)
- Compare the predictive accuracy of both models
- Analyze the false positive and false negative rates

B. Social Impact and Fairness (1 point)
- Compare the social costs/benefits 
- Compare the disparate impact on different groups
- Analyze who bears the burden of false positives

C. Implementation and Policy Alignment (1 point)
- Assess how well each model aligns with the policy goal




#### Based on the analysis of both interventions, I would recommend using the online therapy program targeting any rearrest rather than the detention program focused on felonies. While both show similar disparities in false positive rates between groups, the therapy program's intervention lets us to cast a wider net while causing less harm when we get it wrong. 

#### The data we ended up with supports this. The therapy program shows a positive net benefit of 341,000 to society, while the detention program actually costs society nearly a million dollars more than the crimes it prevents. This huge difference comes down to the human cost of false positives. When the therapy program incorrectly flags someone as high-risk, they receive unnecessary mental health support at a cost of 1,000. But when the detention program gets it wrong, someone faces unwarranted detention at a huge cost of 30,000. With 355 false positives in the therapy program and 80 in the detention program, these individual impacts add up in a major way.

#### Both programs struggle with fairness. Members of Group 1 who don't reoffend are about twice as likely to be flagged as high-risk compared to Group 0. While this disparity is concerning and needs to be addressed, the lighter effect of the therapy program means these mistakes don't devastate people's lives the way wrongful detention would. The therapy program can also help more people, with capacity for 500 participants versus detention's 100 slots.

#### Ultimately, since we are so uncertain in predicting future crimes, I think we should favor rehabilitation over incarceration when our predictions are imperfect. The therapy program better serves our community's goal of reducing crime across the board; it shows better predictive accuracy, costs less, helps more people, and focuses on addressing root causes rather than just keeping people locked up. While neither option is perfect or necessarily ideal, the therapy program offers the best balance of effectiveness and fairness while minimizing potential harm.