# Law, Bias, and Algorithms
## Algorithmic fairness (2/2)

Today we continue building and evaluating our own risk assessment tool on the COMPAS data, this time to examine other aspects of fairness.

To recap, here is what is in the data and the model we built:

```r
# Some initial setup
options(digits = 3)
library(tidyverse)

theme_set(theme_bw())

# Keep plots at a readable size in the notebook
options(repr.plot.width = 6, repr.plot.height = 4)

# Read the data
compas_df <- read_rds("../data/compas.rds")

# Recap the model: logistic regression of recidivism on prior arrests and age
recid_model <- glm(is_recid ~ priors_count + age, data = compas_df, family = "binomial")
compas_df <- compas_df %>%
    mutate(
        # Predicted probability of recidivism
        risk = predict(recid_model, type = "response"),
        # Probability rescaled to an integer score from 0 to 10
        predicted_risk_score = round(risk * 10)
    )
```

── [1mAttaching packages[22m ─────────────────────────────────────── tidyverse 1.2.1 ──
[32m✔[39m [34mggplot2[39m 3.1.1       [32m✔[39m [34mpurrr  [39m 0.3.2  
[32m✔[39m [34mtibble [39m 2.1.1       [32m✔[39m [34mdplyr  [39m 0.8.0.[31m1[39m
[32m✔[39m [34mtidyr  [39m 0.8.3       [32m✔[39m [34mstringr[39m 1.4.0  
[32m✔[39m [34mreadr  [39m 1.3.1       [32m✔[39m [34mforcats[39m 0.4.0  
── [1mConflicts[22m ────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[39m [34mdplyr[39m::[32mfilter()[39m masks [34mstats[39m::filter()
[31m✖[39m [34mdplyr[39m::[32mlag()[39m    masks [34mstats[39m::lag()


## COMPAS data revisited

A cleaned version of the COMPAS data is loaded as `compas_df`, with the following columns:

* `id`: unique identifiers for each case
* `sex`, `dob`, `age`, `race`: demographic information for each defendant
* `recid_score`, `violence_score`: COMPAS scores assessing the risk that a defendant will recidivate (for `violence_score`, commit a new violent crime) within two years of release; higher scores correspond to higher risk
* `priors_count`: number of prior arrests
* `is_recid`, `is_violent_recid`: indicator variables that are `1` if the defendant was arrested for a new (violent) crime within two years of release, and `0` otherwise.

```r
head(compas_df)
```

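| id | sex | dob | age | race | recid_score | violence_score | priors_count | is_recid | is_violent_recid |
|---|---|---|---|---|---|---|---|---|---|
| 3 | Male | 1982-01-22 | 34 | African-American | 3 | 1 | 0 | 1 | 1 |
| 4 | Male | 1991-05-14 | 24 | African-American | 4 | 3 | 4 | 1 | 0 |
| 5 | Male | 1993-01-21 | 23 | African-American | 8 | 6 | 1 | 0 | 0 |
| 8 | Male | 1974-07-23 | 41 | Caucasian | 6 | 2 | 14 | 1 | 0 |
| 10 | Female | 1976-06-03 | 39 | Caucasian | 1 | 1 | 0 | 0 | 0 |
| 13 | Male | 1994-06-10 | 21 | Caucasian | 3 | 5 | 1 | 1 | 1 |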


### Exercise 1: Calibration by gender
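Calibration by gender asks whether defendants with the same predicted score recidivate at similar rates regardless of sex. A minimal sketch, assuming `compas_df` carries the `risk` and `predicted_risk_score` columns created above; the helper name `calibration_by_sex` is hypothetical, not part of the original exercise:

```r
library(dplyr)

# Hypothetical helper (a sketch, not a reference solution):
# for each (predicted score, sex) cell, report the number of defendants,
# the mean predicted risk, and the observed recidivism rate.
# A model calibrated within each sex has observed_rate close to
# mean_risk in every row, for men and women alike.
calibration_by_sex <- function(df) {
  df %>%
    group_by(predicted_risk_score, sex) %>%
    summarise(
      n = n(),
      mean_risk = mean(risk),
      observed_rate = mean(is_recid)
    ) %>%
    ungroup()
}

# Usage on the notebook data:
# calibration_by_sex(compas_df)
```

Plotting `observed_rate` against `predicted_risk_score` with one line per `sex` (for example with `ggplot2::geom_line()`) makes any gap easy to see; cells with small `n` are noisy and worth discounting.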

### Exercise 2: Recalibrate the model by including gender
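One way to recalibrate is to refit the logistic regression with `sex` as an extra predictor, so that men and women with the same priors and age can receive different predicted risks. A minimal sketch under that assumption; the helper `add_sex_aware_risk` and the columns `risk_sex` and `predicted_risk_score_sex` are hypothetical names:

```r
library(dplyr)

# Hypothetical helper (a sketch, not a reference solution):
# refit the recidivism model with sex as an additional covariate and
# attach its predictions next to the original ones.
add_sex_aware_risk <- function(df) {
  model <- glm(is_recid ~ priors_count + age + sex,
               data = df, family = "binomial")
  df %>%
    mutate(
      # Predicted probability from the model that includes sex
      risk_sex = predict(model, type = "response"),
      # Same 0-10 rescaling as predicted_risk_score
      predicted_risk_score_sex = round(risk_sex * 10)
    )
}

# Usage on the notebook data:
# compas_df <- add_sex_aware_risk(compas_df)
```

Whether this closes the gap is an empirical question; rerunning the by-gender calibration check on `predicted_risk_score_sex` is a direct way to find out.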

### Exercise 3: False positive rate and false negative rate
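For a binary high/low risk label, the false positive rate is the share of non-recidivists labeled high risk, P(high risk | `is_recid` = 0), and the false negative rate is the share of recidivists labeled low risk, P(low risk | `is_recid` = 1). A minimal sketch computing both by race, assuming a defendant is labeled high risk when `risk` exceeds a threshold; the 0.5 default and the helper name `error_rates_by_race` are illustrative assumptions:

```r
library(dplyr)

# Hypothetical helper (a sketch, not a reference solution):
# label defendants with risk above `threshold` as high risk, then
# compute FPR and FNR separately for each race.
error_rates_by_race <- function(df, threshold = 0.5) {
  df %>%
    mutate(high_risk = risk > threshold) %>%
    group_by(race) %>%
    summarise(
      # Among people who did not recidivate, share labeled high risk
      fpr = mean(high_risk[is_recid == 0]),
      # Among people who did recidivate, share labeled low risk
      fnr = mean(!high_risk[is_recid == 1])
    )
}

# Usage on the notebook data:
# error_rates_by_race(compas_df, threshold = 0.5)
```

Replacing `race` with `sex` in `group_by()` gives the same comparison by gender.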

### Exercise 4: Equalizing false positive rates
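One common way to equalize false positive rates is post-processing: keep the model, but pick a separate threshold for each group so that roughly the same share of each group's non-recidivists is labeled high risk. A minimal sketch of that approach by race; the helper names `equalize_fpr_thresholds` and `apply_group_thresholds` and the 0.2 default target are hypothetical:

```r
library(dplyr)

# Hypothetical helper (a sketch, not a reference solution):
# for each race, set the threshold at the (1 - target_fpr) quantile of
# risk among non-recidivists (type = 1 is the inverse empirical CDF).
# Because labels use risk > threshold, the achieved FPR is at most
# target_fpr, and lower when risk has ties or n * target_fpr is not
# a whole number.
equalize_fpr_thresholds <- function(df, target_fpr = 0.2) {
  df %>%
    filter(is_recid == 0) %>%
    group_by(race) %>%
    summarise(threshold = quantile(risk, 1 - target_fpr, type = 1, names = FALSE))
}

# Hypothetical helper: label each defendant using their own group's threshold
apply_group_thresholds <- function(df, thresholds) {
  df %>%
    inner_join(thresholds, by = "race") %>%
    mutate(high_risk = risk > threshold)
}

# Usage on the notebook data:
# thresholds <- equalize_fpr_thresholds(compas_df, target_fpr = 0.2)
# labeled <- apply_group_thresholds(compas_df, thresholds)
```

With this design two defendants with the same `risk` can receive different labels depending on their group, and false negative rates will generally still differ across groups.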