# Policing Reservations: White Paper

### This document walks through the analysis that was completed in support of this story. The raw code and its output are shown below, each with a brief explanation of its relevance to the analysis.

### Setup

In [2]:
# Libraries

library(tidyverse)
library(lubridate)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.2.0     ✔ purrr   0.3.3
✔ tibble  2.1.3     ✔ dplyr   0.8.3
✔ tidyr   0.8.3     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date



In [54]:
# Parameters

### File paths
mt_path <- "data/hp256wp2687_mt_statewide_2019_08_13.rds"

mt_path_clean <- "data_cleaning/cleaned_mt_records.csv" # Cleaned data with violations grouped into type

flathead_counties <- c(
    'Flathead County', 
    'Lincoln County', 
    'Lake County', 
    'Mineral County', 
    'Missoula County', 
    'Ravalli County', 
    'Sanders County', 
    'Salish & Kootenai County'
    )

In [55]:
# Load Data

data <-
    mt_path %>%
    read_rds() %>%
    mutate(
        year = year(date)
    ) %>%
    filter(year >= 2010)

#Load cleaned data
data_clean <-
    mt_path_clean %>%
    read_csv() %>%
    mutate(
        year = year(date)
    ) %>%
    filter(year >= 2010)

Parsed with column specification:
cols(
  date = col_date(format = ""),
  time = col_double(),
  year = col_double(),
  lat = col_double(),
  lng = col_double(),
  subject_race = col_character(),
  search_conducted = col_logical(),
  reason_for_stop = col_character(),
  violation_type = col_character(),
  consent_search_conducted = col_logical(),
  raw_search_type = col_character()
)


#### Here you can see an example of what the raw data looks like:

In [10]:
head(data)

raw_row_number,date,time,location,lat,lng,county_name,subject_age,subject_race,subject_sex,⋯,reason_for_stop,vehicle_make,vehicle_model,vehicle_type,vehicle_registration_state,vehicle_year,raw_Race,raw_Ethnicity,raw_SearchType,year
<chr>,<date>,<drtn>,<chr>,<dbl>,<dbl>,<chr>,<int>,<fct>,<fct>,⋯,<chr>,<chr>,<chr>,<chr>,<fct>,<int>,<chr>,<chr>,<chr>,<dbl>
53,2015-08-28,07:24:10,I-90 EB MMM 418,45.59977,-109.0627,Stillwater County,27,white,male,⋯,--- - SPEED OVER LEGAL,FORD (FORD),F35,PICKUP,MT,1996,W,N,NO SEARCH REQUESTED,2015
18438,2010-01-01,00:03:09,WASHINGTON ST AND FIRST AVE,48.77944,-104.5611,Sheridan County,58,white,male,⋯,--- - FAIL TO / IMPROPER SIGNAL,FORD,F25 STYLE,PICKUP,MT,1999,W,N,NO SEARCH REQUESTED,2010
18439,2010-01-01,00:08:56,PINE AT BIG SKY SPUR,45.26102,-111.3094,Gallatin County,37,white,female,⋯,--- - FAIL TO STOP - SIGN OR LIGHT,FORD,FSY Style,CROSSOVER,MT,2005,W,N,NO SEARCH REQUESTED,2010
18440,2010-01-01,00:10:56,I90 MM 299EB,45.75969,-111.161,Gallatin County,55,white,female,⋯,--- - LIGHT VIOLATIONS,TOYOTA,CAMRY,SEDAN,MT,2001,W,N,NO SEARCH REQUESTED,2010
18441,2010-01-01,00:12:28,100 E FIRST AVE,48.77335,-104.5598,Sheridan County,40,white,male,⋯,--- - DISPLAYING ONLY ONE LICENSE PLATE,FORD,F350 SUPER,PICKUP,MT,2001,W,N,NO SEARCH REQUESTED,2010
18442,2010-01-01,00:15:30,HWY 93 MP 178 SB-35 ZONE,48.87421,-115.0429,Lincoln County,27,white,male,⋯,--- - SPEED OVER LEGAL,ROVER,LR2,CROSSOVER,,2008,W,N,NO SEARCH REQUESTED,2010


#### Here you can see all of the columns that were present in the data. These include descriptive details such as time and place of the stop, as well as details about the officer, vehicle, violation, and more.

In [11]:
data %>% colnames()

#### 1) Here you can see how many searches of each type were in the data. Consent searches were of most interest for the analysis because of their utility in examining potential bias. You'll notice that the vast majority of stops didn't result in a search of the vehicle. These are denoted by "NA", which appears as the blank row below.

In [12]:
data %>% count(search_basis, name = "Count") 

“Factor `search_basis` contains implicit NA, consider using `forcats::fct_explicit_na`”

search_basis,Count
<fct>,<int>
plain view,132
consent,2197
probable cause,43
other,575
,803726
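
#### As a rough cross-check of the table above, the following base R sketch recomputes what share of stops had no recorded search and what share of recorded searches were consent searches. The counts are copied by hand from the output; the variable names are illustrative and not part of the original analysis.

```r
# Counts copied from the search_basis output above.
# "no search" stands in for the NA row (stops with no recorded search).
search_counts <- c(
    "plain view"     = 132,
    "consent"        = 2197,
    "probable cause" = 43,
    "other"          = 575,
    "no search"      = 803726
)

total_stops <- sum(search_counts)
searches <- search_counts[names(search_counts) != "no search"]

# Percent of all stops with no recorded search
pct_no_search <- 100 * search_counts[["no search"]] / total_stops

# Percent of recorded searches that were consent searches
pct_consent_of_searches <- 100 * searches[["consent"]] / sum(searches)

print(round(c(no_search = pct_no_search, consent_of_searches = pct_consent_of_searches), 2))
```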


#### 2) Having decided to analyze the consent search rate, our next question was whether drivers of certain races were searched at a disproportionate rate. If such a disparity existed, it could indicate bias in officers' judgment of whom to search. The rates appear low because consent searches are rare across all stops. Notably, the rate for Native American drivers is roughly 4.5x the rate for White drivers.

In [39]:
data %>%
    group_by(raw_Race) %>%
    filter(raw_Race != "NA", raw_Race != "U") %>%
    summarise(
        total_stops = n(),
        consent_searches = sum(search_basis == "consent", na.rm = TRUE),
        consent_search_rate = (consent_searches/total_stops) * 100
    ) %>%
    transmute(
        Race = case_when(
            raw_Race == "A" ~ "Asian Drivers",
            raw_Race == "B" ~ "Black Drivers",
            raw_Race == "I" ~ "Native American Drivers",
            raw_Race == "W" ~ "White Drivers"
        ),
        total_stops,
        total_consent_searches = consent_searches,
        consent_search_rate = consent_search_rate %>% round(digits = 2) %>% as.character() %>% paste0("%")
    )

Race,total_stops,total_consent_searches,consent_search_rate
<chr>,<int>,<int>,<chr>
Asian Drivers,6588,31,0.47%
Black Drivers,8655,92,1.06%
Native American Drivers,37882,386,1.02%
White Drivers,750944,1686,0.22%
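
#### The comparison between groups can be reproduced from the counts in the table above. This is a minimal base R sketch, with the counts copied by hand and the vector names chosen for illustration only. It expresses each group's consent search rate as a multiple of the rate for White drivers.

```r
# Stop and consent-search counts copied from the table above.
stops   <- c(asian = 6588, black = 8655, native = 37882, white = 750944)
consent <- c(asian = 31,   black = 92,   native = 386,   white = 1686)

# Consent search rate per group, in percent
rate_pct <- 100 * consent / stops

# Each group's rate as a multiple of the White drivers' rate
ratio_vs_white <- rate_pct / rate_pct[["white"]]

print(round(rate_pct, 2))
print(round(ratio_vs_white, 1))
```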


#### 3) This result raised many questions. One was whether the disparity was consistent over time or driven by a single anomalous year. We dug in to find out. You'll see that in every year from 2010 through 2016, Native American drivers were searched at a higher rate than White drivers.

In [76]:
data %>%
    group_by(year, raw_Race) %>%
    filter(raw_Race %in% c("W", "I"), !is.na(year)) %>%
    summarise(
        total_stops = n(),
        consent_searches = sum(search_basis == "consent", na.rm = TRUE),
        consent_search_rate = (consent_searches/total_stops) * 100
    ) %>%
    select(-total_stops, -consent_searches) %>%
    spread(key = raw_Race, value = consent_search_rate) %>%
    ungroup() %>%
    transmute(
        year,
        'How Much More Often Were Native Drivers Searched' = round(I/W, digits = 1) %>% as.character() %>% paste0("x"),
        native_american_search_rate = I %>% round(digits = 2) %>% as.character() %>% paste0("%"),
        white_search_rate = W %>% round(digits = 2) %>% as.character() %>% paste0("%")
    )

year,How Much More Often Were Native Drivers Searched,native_american_search_rate,white_search_rate
<dbl>,<chr>,<chr>,<chr>
2010,1.2x,0.21%,0.17%
2011,2.8x,0.41%,0.15%
2012,2.3x,0.34%,0.15%
2013,3.2x,0.6%,0.19%
2014,6.2x,1.87%,0.3%
2015,4.9x,1.59%,0.32%
2016,6.1x,1.94%,0.32%
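
#### The "every year" claim can be checked directly against the table above. The sketch below uses the rounded rates as printed, entered by hand into a data frame whose name and column names are illustrative. Ratios recomputed from rounded rates may differ slightly from the table's ratios, which were computed before rounding.

```r
# Rounded yearly consent search rates (in percent) copied from the table above.
yearly <- data.frame(
    year       = 2010:2016,
    native_pct = c(0.21, 0.41, 0.34, 0.60, 1.87, 1.59, 1.94),
    white_pct  = c(0.17, 0.15, 0.15, 0.19, 0.30, 0.32, 0.32)
)

# TRUE only if the Native American rate exceeds the White rate in every year
native_higher_every_year <- all(yearly$native_pct > yearly$white_pct)

# Ratios recomputed from the rounded rates
yearly$ratio_from_rounded <- round(yearly$native_pct / yearly$white_pct, 1)

print(native_higher_every_year)
print(yearly)
```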


#### 4) Having seen that this disparity persists over time, we turned our attention to geographic differences. Below, we analyze the search rate disparity in and around the Flathead Reservation. We were curious to know whether biased practices seemed to increase when policing in and around the reservation. We found that they did, and that Native drivers were searched roughly 11x as often as White drivers in this area.

In [75]:
data %>%
    group_by(raw_Race) %>%
    filter(raw_Race %in% c("W", "I"), county_name %in% flathead_counties) %>%
    summarise(
        total_stops = n(),
        consent_searches = sum(search_basis == "consent", na.rm = TRUE),
        consent_search_rate = (consent_searches/total_stops) * 100
    ) %>%
    select(-total_stops, -consent_searches) %>%
    spread(key = raw_Race, value = consent_search_rate) %>%
    ungroup() %>%
    transmute(
        'How Much More Often Were Native Drivers Searched' = round(I/W, digits = 1) %>% as.character() %>% paste0("x"),
        native_american_search_rate = I %>% round(digits = 2) %>% as.character() %>% paste0("%"),
        white_search_rate = W %>% round(digits = 2) %>% as.character() %>% paste0("%")
    )

How Much More Often Were Native Drivers Searched,native_american_search_rate,white_search_rate
<chr>,<chr>,<chr>
11.5x,3.02%,0.26%
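
#### The same group-and-summarise pattern appears in several of the steps in this document. One possible way to factor it out is sketched below. The helper `consent_rate_by()` and the `toy` data frame are hypothetical and were not part of the original analysis; the sketch only assumes a `search_basis` column like the one used above.

```r
library(dplyr)

# Hypothetical helper: consent search rate (in percent) for each combination
# of the grouping columns passed in `...`.
consent_rate_by <- function(df, ...) {
    df %>%
        group_by(...) %>%
        summarise(
            total_stops = n(),
            consent_searches = sum(search_basis == "consent", na.rm = TRUE),
            consent_search_rate = (consent_searches / total_stops) * 100
        ) %>%
        ungroup()
}

# Made-up toy rows, used only to show the call; these are not real stops.
toy <- data.frame(
    raw_Race     = c("I", "I", "W", "W", "W", "W"),
    search_basis = c("consent", NA, NA, "consent", NA, NA),
    stringsAsFactors = FALSE
)

toy_rates <- consent_rate_by(toy, raw_Race)
print(toy_rates)
```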


#### 5) Next, we chose to explore nuances in the stop data. What was leading officers to search Native American drivers more often? Perhaps the nature of the stop and the related violation had something to do with it. Below we analyze that data and find that both groups seem to be stopped for very similar things.

In [70]:
data_clean %>%
    filter(subject_race %in% c("white", "indigenous"), !is.na(violation_type)) %>%
    group_by(subject_race) %>%
    count(violation_type) %>%
    arrange(desc(subject_race), desc(n)) %>%
    top_n(4, wt = n) %>%
    ungroup() %>%
    transmute(
        Race = case_when(
            subject_race == "indigenous" ~ "Native American Drivers",
            subject_race == "white" ~ "White Drivers"
        ),
        top_violations = violation_type,
        total_stops = n
    )

Race,top_violations,total_stops
<chr>,<chr>,<int>
White Drivers,motor vehicle hazardous,585626
White Drivers,license/registration/insurance,34039
White Drivers,commercial,16343
White Drivers,equipment,15235
Native American Drivers,motor vehicle hazardous,27771
Native American Drivers,license/registration/insurance,2321
Native American Drivers,equipment,2001
Native American Drivers,other,609


#### 6) When looking only at stops that included the three top violations shared by both groups, was there still a disparity in search rates? The answer is yes: although Native American drivers are being stopped for similar violations, they are still being searched at roughly 5x the rate of White drivers. Thus, the disparity doesn't seem to be related to differences in each demographic's violations.

In [74]:
data_clean %>%
    group_by(subject_race) %>%
    filter(subject_race %in% c("white", "indigenous"), violation_type %in% c("motor vehicle hazardous", "license/registration/insurance", "equipment")) %>%
    summarise(
        total_stops = n(),
        consent_searches = sum(raw_search_type == "CONSENT SEARCH CONDUCTED", na.rm = TRUE),
        consent_search_rate = (consent_searches/total_stops) * 100
    ) %>%
    select(-total_stops, -consent_searches) %>%
    spread(key = subject_race, value = consent_search_rate) %>%
    ungroup() %>%
    transmute(
        'How Much More Often Were Native Drivers Searched' = round(indigenous/white, digits = 1) %>% as.character() %>% paste0("x"),
        native_american_search_rate = indigenous %>% round(digits = 2) %>% as.character() %>% paste0("%"),
        white_search_rate = white %>% round(digits = 2) %>% as.character() %>% paste0("%")
    )

How Much More Often Were Native Drivers Searched,native_american_search_rate,white_search_rate
<chr>,<chr>,<chr>
5.1x,0.98%,0.19%


#### 7) Taking a step back, the analysis returns to the Flathead Reservation. We know that there seems to be a discrepancy in the search rates of Native American drivers compared to White drivers, but are Native American drivers searched more frequently closer to the reservation? To find out, we analyze the Native American search rate in and around Flathead Reservation and compare it to the relevant search rate in Montana as a whole.

#### From above, we know the statewide consent search rate is roughly 1.02% for Native American drivers and roughly 0.22% for White drivers. In and around the Flathead Reservation, the rate for Native American drivers jumped to 3.02%, roughly a threefold increase. For White drivers the increase was minimal, rising only to 0.26%. These numbers can be found above in sections 2 (statewide) and 4 (Flathead).

#### To be sure, the numbers of searches behind those rates are relatively small, but they nevertheless reflect the full population of recorded stops rather than a sample.

In [82]:
data %>%
    group_by(raw_Race) %>%
    filter(raw_Race %in% c("W", "I"), county_name %in% flathead_counties) %>%
    summarise(
        total_stops = n(),
        consent_searches = sum(search_basis == "consent", na.rm = TRUE),
        consent_search_rate = ((consent_searches/total_stops) * 100) %>% round(digits = 2) %>% as.character() %>% paste0("%")
    ) %>%
    transmute(
        Race = case_when(
            raw_Race == "I" ~ "Native American Drivers",
            raw_Race == "W" ~ "White Drivers"
        ),
        total_stops,
        consent_searches,
        consent_search_rate
    )

Race,total_stops,consent_searches,consent_search_rate
<chr>,<int>,<int>,<chr>
Native American Drivers,6883,208,3.02%
White Drivers,208580,549,0.26%
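
#### The statewide and Flathead-area comparison can be recomputed from the counts shown in sections 2 and 7. The base R sketch below copies those counts by hand; the data frame and column names are illustrative. Note that the Flathead-area stops are also included in the statewide totals.

```r
# Counts copied from the outputs in section 2 (statewide) and section 7 (Flathead area).
counts <- data.frame(
    race              = c("Native American", "White"),
    statewide_stops   = c(37882, 750944),
    statewide_consent = c(386, 1686),
    flathead_stops    = c(6883, 208580),
    flathead_consent  = c(208, 549)
)

# Consent search rates, in percent
counts$statewide_pct <- 100 * counts$statewide_consent / counts$statewide_stops
counts$flathead_pct  <- 100 * counts$flathead_consent / counts$flathead_stops

# How many times higher the Flathead-area rate is than the statewide rate
counts$flathead_vs_statewide <- counts$flathead_pct / counts$statewide_pct

# Native-to-White ratio within the Flathead-area counties
flathead_ratio <- counts$flathead_pct[counts$race == "Native American"] /
    counts$flathead_pct[counts$race == "White"]

print(counts[, c("race", "statewide_pct", "flathead_pct", "flathead_vs_statewide")])
print(flathead_ratio)
```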
