# Law, Order, and Algorithms
## Included-variable bias

In this lab, we will investigate how to use a regression model to measure disparities across different groups, and discuss some of the problems that might arise in doing so. We will use the NYC stop and frisk data we have been using in previous labs.

In [0]:
# Some initial setup
options(digits = 3)
library(tidyverse)

theme_set(theme_bw())

# Read the data. For computational reasons, we'll work with a sample of the data.
# We also exclude rare suspected crime categories, and relevel 
# the race variable so that "white" is the base category
set.seed(1)
stop_df <- read_rds("../data/sqf_sample.rds") %>%
  sample_n(1e4) %>%
  group_by(suspected_crime) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  mutate(suspect_race = relevel(suspect_race, "white"))

The loaded data frame is a sample of stops in NYC, each recorded on a
[UF-250 form][uf250_link].

Below is a list of columns in the data, roughly corresponding to the [UF-250 form][uf250_link]:

* Base information regarding stop:
    * `id`, `year`, `date`, `time`, `precinct`, `location_housing`, 
      `suspected_crime`

* Circumstances which led to stop:
    * `stopped_bc_object`, `stopped_bc_desc`, `stopped_bc_casing`,
      `stopped_bc_lookout`, `stopped_bc_clothing`, `stopped_bc_drugs`,
      `stopped_bc_furtive`, `stopped_bc_violent`, `stopped_bc_bulge`,
      `stopped_bc_other` 
    
* Suspect demographics:
    * `suspect_dob`, `suspect_id_type`, `suspect_sex`, `suspect_race`,
      `suspect_hispanic`, `suspect_age`, `suspect_height`, `suspect_weight`,
      `suspect_hair`, `suspect_eye`, `suspect_build`, `reason_explained`,
      `others_stopped`

* Whether physical force was used:
    * `force_hands`, `force_wall`, `force_ground`, `force_drawn`,
      `force_pointed`, `force_baton`, `force_handcuffs`,
      `force_pepper`, `force_other`

* Was suspect arrested?: `arrested`

* Was summons issued?: `summons_issued`

* Officer in uniform?: `officer_uniform`, `officer_verbal`, `officer_shield`

* Was person frisked?: `frisked`
    * if yes: `frisk_reason_suspected_crime`, `frisk_reason_weapons`, 
      `frisk_reason_attire`, `frisk_reason_actual_crime`, 
      `frisk_reason_noncompliance`, `frisk_reason_threats`,
      `frisk_reason_prior`, `frisk_reason_furtive`, `frisk_reason_bulge`

* Was person searched?: `searched`
    * if yes: `searched_hardobject`, `searched_outline`,
      `searched_admission`, `searched_other`

* Was weapon found?: `found_weapon`
    * if yes: `found_gun`, `found_pistol`, `found_rifle`, `found_assault`,
      `found_knife`, `found_machinegun`, `found_other`
      
* Was other contraband found?: `found_contraband`

* Additional circumstances/factors
    * `additional_report`, `additional_investigation`, `additional_proximity`, 
      `additional_evasive`, `additional_associating`, `additional_direction`, 
      `additional_highcrime`, `additional_time`, `additional_sights`, 
      `additional_other`

* Additional reports prepared: `extra_reports`

[uf250_link]: https://www.prisonlegalnews.org/media/publications/Blank%20UF-250%20Form%20-%20Stop%2C%20Question%20and%20Frisk%20Report%20Worksheet%2C%20NYPD%2C%202016.pdf

## Base rate disparities in the decision to frisk

First, let's measure the disparities in police decisions to frisk individuals of different race groups.

### Exercise 1: manual computation of odds and odds ratios

* **Step 1**: For each race group, compute the proportion that were frisked

In [0]:
# With the stop_df data, group by suspect_race and compute the proportion (mean) of frisked == 1
# WRITE CODE HERE


* **Step 2**: Given probability $p$ of being frisked, the *odds* of being frisked are $p / (1-p)$.

For example, if $p = \frac{1}{2}$, you're equally likely to be frisked or not (i.e., odds = 1); if $p = \frac{2}{3}$, you're twice as likely to be frisked as not (odds = 2).
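
As a quick check of the arithmetic above, here is a minimal sketch in base R. The helper name `odds` and the probabilities are illustrative assumptions, not part of the lab's provided code or data.

```r
# Hypothetical helper, not part of the lab's provided code:
# converts a probability p into odds p / (1 - p)
odds <- function(p) p / (1 - p)

# Illustrative probabilities, including the two used in the text
p <- c(1/2, 2/3, 0.8)
print(odds(p))
```

The first two values reproduce the examples in the text (odds of 1 and 2, up to floating-point rounding).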

Using the proportion frisked from Step 1 as an estimate of the probability of being frisked, compute the *odds* of being frisked for each race group.

In [0]:
# Compute the odds, p / (1-p), where p is the proportion from step 1
# WRITE CODE HERE


* **Step 3**: A common method of comparing odds between two groups is to compute the *odds ratio*. 
This is simply the ratio between two odds.

For example, if the odds of being frisked are 0.8 for white pedestrians and 1.6 for Black pedestrians, the odds ratio of being frisked for Black vs. white pedestrians would be $1.6 / 0.8 = 2$. In other words, stopped Black pedestrians would have twice the odds of being frisked compared to stopped white pedestrians.
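
The same example can be worked through from raw counts. The sketch below uses made-up counts (not taken from the stop data), chosen so that the odds come out to 0.8 and 1.6; the data frame `toy` and its column names are assumptions for illustration only.

```r
# Toy counts invented for illustration; they are NOT from the stop data
toy <- data.frame(
  suspect_race = c("white", "black"),
  n_stops      = c(900, 1300),
  n_frisked    = c(400, 800)
)

# Proportion frisked, then odds = p / (1 - p)
toy$p_frisked <- toy$n_frisked / toy$n_stops
toy$odds      <- toy$p_frisked / (1 - toy$p_frisked)

# Odds ratio: odds for the Black group divided by odds for the white group
odds_ratio <- toy$odds[toy$suspect_race == "black"] /
  toy$odds[toy$suspect_race == "white"]

print(toy)
print(odds_ratio)
```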

Using the odds computed in Step 2, compute the odds ratio for minority groups (Black / Hispanic) versus white individuals.

In [0]:
# Compute odds of frisk for minority race group / odds of frisk for whites
# WRITE CODE HERE


### Base rate disparities with (logistic) regression

Another method for comparing differences in frisk rates is to use regression. 
Specifically, logistic regression is commonly used for binary decisions (e.g., where the decision is either "frisk" or "don't frisk").

In `R` we use the `glm` function to fit *generalized* linear models (e.g., logistic regression, Poisson regression).
In its simplest form, `glm` takes a `formula`, the `data`, and a `family` that indicates which type of regression to fit.
A `formula` in `R` has the form `left-hand-side variable ~ right-hand-side specification`.
For example, to fit a logistic regression (which belongs to the `"binomial"` family) of `frisked` on the `suspect_race` variable, using the `stop_df` data, we can write:

In [0]:
base_model <- glm(frisked ~ suspect_race, data = stop_df, family = binomial)

where the first argument to `glm` is assumed to be the `formula`. 

Using mathematical notation, this corresponds to the model:
$$
\Pr(\text{frisked}) = \operatorname{logit}^{-1}(
    \beta_0 + \beta_{\text{Black}}\mathbb{1}_{\text{Black}} + 
    \beta_{\text{Hispanic}}\mathbb{1}_{\text{Hispanic}}
),
$$
where
$$
\operatorname{logit}^{-1}(x) = 
\frac{e^{x}}{1 + e^{x}}
$$
and
$$
\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right).
$$

As a result
$$
\log\left(\frac{\Pr(\text{frisked})}
{1 - \Pr(\text{frisked})}\right) = 
\beta_0 + \beta_{\text{Black}}\mathbb{1}_{\text{Black}} + 
    \beta_{\text{Hispanic}}\mathbb{1}_{\text{Hispanic}},
$$
and so
$$
\frac{\Pr(\text{frisked})}{1 - \Pr(\text{frisked})}  = 
\exp\left(\beta_0 + \beta_{\text{Black}}\mathbb{1}_{\text{Black}} + 
    \beta_{\text{Hispanic}}\mathbb{1}_{\text{Hispanic}}\right).
$$

From the above model, $\exp(\beta_0)$ is the odds of being frisked for white individuals and $\exp(\beta_0+\beta_{\text{Black}}) = \exp(\beta_0)\exp(\beta_{\text{Black}})$ is the odds of being frisked for Black individuals.
Consequently, the odds _ratio_ of being frisked for Black vs. white pedestrians is $\exp(\beta_{\text{Black}})$: the exponentiated coefficient of
the variable indicating whether a pedestrian's race group is Black or not.
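
To see this relationship numerically, the following sketch simulates a small data set and compares the exponentiated coefficients with odds ratios computed by hand. Everything here is an assumption for illustration: the data are simulated (not the stop data), and the names `group`, `p_true`, and `toy_model` are hypothetical.

```r
# Simulated data for illustration only; the frisk probabilities are made up
set.seed(42)
n <- 3000
group <- factor(
  sample(c("white", "black", "hispanic"), n, replace = TRUE),
  levels = c("white", "black", "hispanic")  # "white" is the base category
)
p_true  <- c(white = 0.40, black = 0.60, hispanic = 0.55)
frisked <- rbinom(n, 1, p_true[as.character(group)])

# Logistic regression with group as the only predictor
toy_model <- glm(frisked ~ group, family = binomial)

# Manual computation: proportion -> odds -> odds ratio relative to "white"
p_hat     <- sapply(split(frisked, group), mean)
odds_hat  <- p_hat / (1 - p_hat)
manual_or <- odds_hat[c("black", "hispanic")] / odds_hat[["white"]]

# Exponentiated regression coefficients for the two non-base groups
model_or <- exp(coef(toy_model))[c("groupblack", "grouphispanic")]

print(cbind(manual = unname(manual_or), model = unname(model_or)))
```

With a single categorical predictor, the two columns should agree up to the numerical tolerance of the fitting procedure.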

We can inspect the coefficients of the fitted model using the `coef()` function.

In [0]:
print(coef(base_model))

As we've seen above, the `(Intercept)` ($\beta_0$) term corresponds to the _log_-odds of being frisked for stopped white individuals, while the `suspect_raceblack` coefficient represents the difference in *log*-odds (i.e., the log of the odds ratio) of being frisked for Black individuals compared to white individuals. By exponentiating the coefficients, we can recover the odds of being frisked for white individuals and the odds ratio of being frisked for each minority race group relative to white individuals.

In [0]:
# Exponentiating a race coefficient recovers the odds ratio of being frisked
# for that race group relative to whites (this should match Exercise 1),
# while the exponentiated intercept is the odds of being frisked for the base category (white)
print(exp(coef(base_model)))

## Adjusting for other variables

One concern is that officers might have a legitimate reason to frisk certain individuals more often; it might just be that the "legitimate reason" is also highly correlated with race.

For example, as we discussed in earlier labs, one of the reasons for stopping an individual is if the officer suspects criminal possession of a weapon (encoded in the `suspected_crime` column as `cpw`).
Given that the primary justification of a frisk is concern for officer safety, one could argue that it is reasonable for an officer to
frisk individuals whom they have stopped under suspicion of criminal possession of a weapon.

(Whether an officer's _suspicion_ itself is justified is a different question, which we will address later.)

### Adjusting for `suspected_crime == "cpw"` 

Given a regression model, we can "adjust for" additional variables in our data by including them in the right-hand side of our formula.
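
As a rough illustration of what "adjusting for" a variable does, the sketch below uses simulated data in which the decision depends only on a covariate `z` that happens to be more common in one group. The variables `minority`, `z`, and `y`, and all the probabilities, are assumptions made up for this example; they are not columns of `stop_df`.

```r
# Simulated data for illustration only
set.seed(7)
n <- 5000
minority <- rbinom(n, 1, 0.5)

# Hypothetical covariate that is more common in the minority group
z <- rbinom(n, 1, ifelse(minority == 1, 0.6, 0.2))

# The decision depends only on z, not directly on group membership
y <- rbinom(n, 1, plogis(-1 + 1.5 * z))

# Unadjusted model vs. a model that adds z to the right-hand side
unadjusted <- glm(y ~ minority, family = binomial)
adjusted   <- glm(y ~ minority + z, family = binomial)

print(c(
  unadjusted = coef(unadjusted)[["minority"]],
  adjusted   = coef(adjusted)[["minority"]]
))
```

In this setup the group coefficient should be clearly positive before adjustment and close to zero afterwards, because `z` accounts for the difference. Whether `z` is a *legitimate* reason for the decision is a separate question that the regression itself cannot answer.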

### Exercise 2: 

With `stop_df`, first create a new binary column named `is_cpw` that is `TRUE` if `suspected_crime` is `cpw`. Then fit a model that adjusts for this new covariate, and discuss the results.

In [0]:
# WRITE CODE HERE


### Exercise 3: Adjusting for confounding

Following the above logic, there could be multiple legitimate factors that account for the observed disparity of being frisked between different race groups. 
Explore the effects of adjusting for more covariates on the results.

In [0]:
# Use race_coefs(m) to inspect just the race coefficients of any fitted model m
race_coefs <- function(m) {
    coef(m)[c("suspect_raceblack", "suspect_racehispanic")]
}

# WRITE CODE HERE


## A kitchen-sink approach 

One common method for measuring disparities while addressing some of the omitted-variable bias concerns is to include _all_ recorded data that would have been available to the officer at the time of making the decision (to frisk an individual). This is also known as the "kitchen sink" approach.

The code below applies the kitchen sink approach to measure the disparate impact of 
frisk on minority race groups.

In [0]:
feats <- c(
    "suspected_crime",
    "precinct",
    "location_housing",
    "suspect_sex",
    "suspect_age",
    "suspect_height",
    "suspect_weight",
    "suspect_hair",
    "suspect_eye",
    "suspect_build",
    "additional_report",
    "additional_investigation",
    "additional_proximity",
    "additional_evasive",
    "additional_associating",
    "additional_direction",
    "additional_highcrime",
    "additional_time",
    "additional_sights",
    "additional_other",
    "stopped_bc_object",
    "stopped_bc_desc",
    "stopped_bc_casing",
    "stopped_bc_lookout",
    "stopped_bc_clothing",
    "stopped_bc_drugs",
    "stopped_bc_furtive",
    "stopped_bc_violent",
    "stopped_bc_bulge",
    "stopped_bc_other",
    "suspect_race"
)

# This creates a formula with a specified left-hand side (response = "frisked"),
# and using all the variables in feats on the right-hand side. 
# Constructing a formula in this way (instead of typing out all the variable names)
# is helpful for constructing multiple models that share a long list of variables in the right-hand side.
kitchen_sink_formula <- reformulate(feats, response = "frisked")

# Fit the kitchen-sink model; we are only interested in the race coefficients
ks_model <- glm(kitchen_sink_formula, data = stop_df, family = binomial)
print(race_coefs(ks_model))
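
To make the role of `reformulate` concrete, here is a small sketch on a shortened feature list. The names `toy_feats`, `toy_formula`, and `no_race_formula` are hypothetical and used only for this illustration; no model is fitted.

```r
# A shortened, illustrative feature list (a subset of the names in feats)
toy_feats <- c("suspected_crime", "precinct", "suspect_race")

# Build the formula: frisked ~ suspected_crime + precinct + suspect_race
toy_formula <- reformulate(toy_feats, response = "frisked")
print(toy_formula)

# The same list can be reused to build a variant, e.g. one without the race term
no_race_formula <- reformulate(setdiff(toy_feats, "suspect_race"),
                               response = "frisked")
print(no_race_formula)
```

Keeping the variable names in a character vector makes it straightforward to build several related formulas without retyping the right-hand side.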

### Exercise 4:

Do you think this is a reasonable approach to measuring disparate impact?
What about disparate treatment?

## Included-variable bias
One problem with including all variables in measuring disparate impact is that an empirical connection between a factor and a decision is not necessarily justified.
An obvious example would be something like "skin color": including skin color in the regression will likely account for much of the observed disparity across race groups,
but the correlation between skin color and frisk decisions is unlikely to be justified!
A less obvious example is an officer's suspicion of `cpw`.
While it seems reasonable that an officer would more frequently frisk individuals suspected of possessing a weapon,
the suspicion itself would only be justified if, and to the degree that, it is predictive of achieving the goal of a frisk: recovering weapons.

Blindly including a variable in a regression fails to take into account this _degree_ of justification, 
potentially overcompensating for variables that are correlated with race.
This is the problem known as _included-variable bias_.
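
The following simulation is one way to see the problem, under strong simplifying assumptions. Both groups are given the same distribution of true risk, and a hypothetical `flag` variable (standing in for some recorded factor) is more common in one group but carries no information about risk. All names and numbers are made up for illustration, and the true risk is treated as known, which it would not be in real data.

```r
# Simulated data for illustration only
set.seed(11)
n <- 20000
minority <- rbinom(n, 1, 0.5)

# True risk of carrying a weapon: same distribution in both groups
risk <- rbeta(n, 2, 30)

# Hypothetical recorded factor: more common in the minority group,
# but generated independently of risk (so it is not predictive of risk)
flag <- rbinom(n, 1, ifelse(minority == 1, 0.6, 0.2))

# The frisk decision depends on risk and on the flag, not directly on group
frisked <- rbinom(n, 1, plogis(-1 + 10 * risk + 1.5 * flag))

# Adjusting for the flag absorbs the disparity ...
with_flag <- glm(frisked ~ minority + flag, family = binomial)
# ... while adjusting for risk leaves it visible
with_risk <- glm(frisked ~ minority + risk, family = binomial)

print(c(
  adjust_for_flag = coef(with_flag)[["minority"]],
  adjust_for_risk = coef(with_risk)[["minority"]]
))
```

Here the group coefficient should be near zero when adjusting for `flag`, even though `flag` says nothing about who is carrying a weapon, and clearly positive when adjusting for risk instead.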

## Challenge exercises: risk-adjusted regression

As discussed above, adjusting for any variable (i.e., including it in the regression) may only be justified to the degree that the variable is _predictive of the outcome we are ultimately interested in_ (in this case, recovering a weapon). But the extent to which each variable is justified is rarely clear.

One simple idea for addressing this concern of included-variable bias is to control for an explicit measure of **risk**, instead of controlling for individual variables.
Intuitively, we wish to know whether individuals who have _similar risk_ (of carrying a weapon) were treated (frisked) equally.

### Exercise 5: Estimating risk

In order to adjust for risk, we must first estimate it. This is relatively straightforward in the context of frisk decisions in stop-and-frisk, 
because the goal of a frisk is relatively clear -- we wish to recover weapons. 
In other words, we want to predict whether a weapon would be found if an individual is frisked. 

* **Step 1**: `filter` the `stop_df` data to those individuals _who were frisked_. We will call this new data frame `frisked_df`.

The _risk_ that we are interested in estimating is the probability that a weapon is recovered given that we _frisk_ someone who has already been stopped.
While there are many ways to achieve this, one simple way is to build a predictive model, estimating the probability that a weapon is recovered, 
using just the data for stopped individuals who happened to be frisked. 
(Implicitly, this relies on an assumption of [_ignorability_](https://en.wikipedia.org/wiki/Ignorability).)

In [0]:
# Subset the stop_df data to cases where the individual was frisked
# WRITE CODE HERE


* **Step 2**: Using the `frisked_df` data, fit a logistic regression model to predict whether or not a weapon is found using all features that would reasonably be available to an officer (as listed in `feats` above). Let's call this model `risk_model`. 

Note that we use logistic regression here for simplicity, but more complex methods for predictive modeling could be employed, with additional measures to avoid overfitting (such as splitting the data and regularizing).

In [0]:
# Using the subset of data from Step 1 (frisked_df), fit the logistic regression model: found_weapon ~ (all available features in feats)
# WRITE CODE HERE


* **Step 3**: Use the `risk_model` from above to generate a column of model-estimated risk (we'll name this column `risk`) on the original `stop_df` data.

_Tip_: Given a `glm` model named `risk_model`, a vector of probability predictions for `stop_df` can be created with the command `predict(risk_model, stop_df, type = "response")`.
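
The pattern in the tip (fit on a subset, predict on the full data) can be sketched on toy data as follows. The data frame `toy_df`, its columns `x`, `frisked`, and `found`, and the model `toy_risk_model` are all hypothetical; in the toy data the outcome is simulated for every row, whereas in practice it is only observed for frisked individuals.

```r
# Simulated toy data for illustration only
set.seed(3)
toy_df <- data.frame(
  x       = rnorm(500),
  frisked = rbinom(500, 1, 0.5)
)
toy_df$found <- rbinom(500, 1, plogis(-2 + toy_df$x))

# Fit the model using only the rows where a frisk occurred
toy_frisked    <- toy_df[toy_df$frisked == 1, ]
toy_risk_model <- glm(found ~ x, data = toy_frisked, family = binomial)

# type = "response" returns probabilities for every row of the full data
toy_df$risk <- predict(toy_risk_model, newdata = toy_df, type = "response")

# type = "link" returns log-odds; plogis() maps them back to probabilities
log_odds <- predict(toy_risk_model, newdata = toy_df, type = "link")

print(summary(toy_df$risk))
```

One thing worth checking with real data: if a categorical feature has levels that appear in the full data but not in the frisked subset, `predict` may warn or fail, so it can help to inspect those levels first.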

In [0]:
# Generate a column of predicted risk 
# WRITE CODE HERE


## Distribution of risk

Now that we have an estimate of risk, we can explore the distribution of risk across race groups.

In [0]:
ggplot(filter(stop_df, suspect_race %in% c("white", "black", "hispanic")), aes(x = risk)) +
  geom_histogram(binwidth = .01) +
  facet_wrap(~ suspect_race) +
  scale_x_continuous("Estimated probability of recovering weapon if frisked",
                     labels = scales::percent_format()) +
  coord_cartesian(xlim = c(0, .1))

### Exercise 6: Frisk rates by estimated risk

Given risk estimates, consider individuals who have estimated risk between 4% and 5%. 
For each race group within this range of risk, compute: (1) the number of stops; and (2) the frisk rate.

Explore different ranges of risk. It might be helpful to refer to the histogram above to see roughly how many cases exist for each race group within each range of estimated risk.
Discuss your findings with your partner. What are some implications of these findings?

In [0]:
# WRITE CODE HERE


### Exercise 7: Risk-adjusted regression

Now compute risk-adjusted frisk rates for stopped individuals across different race groups.

How do these results compare to both the naive base rates and the "kitchen sink" approach?

In [0]:
# WRITE CODE HERE


### Exercise 8: Additional considerations

Even after adjusting for risk, disparities across certain features may still be justified.
For example, officers might enforce different standards for different location types: frisking individuals stopped in `transit` whom they would not have frisked
if found in `housing`.  

Given the risk-adjusted regression results, what may be some other legitimate concerns? 
What are possible justifications for the racial disparities that persist after adjusting for risk? 
How could we revise our model to further account for such possibilities?

In [0]:
# WRITE CODE HERE
