# Academy Awards Analysis 🎬
## Investigating Trends in Oscar-Winning Movies
### Author: Judd Jacobs

This project analyzes historical **Academy Award-winning films** using data from **Wikipedia**, **The Movie Database (TMDb)**, and the **Open Movie Database (OMDb)**.

## **Key Analysis Areas**
- **Best Picture trends by genre** (from Wikipedia Scrape & TMDb API) 🏆
- **Box office revenue & IMDb ratings** (OMDb API) 🎭
- **Long-term trends in Oscar-winning films** 📈

## **Step 1:** Import necessary Python Libraries 💽

In [1]:
import pandas as pd
import numpy as np
import requests
import sqlite3
import matplotlib.pyplot as plt
import seaborn as sns
from bs4 import BeautifulSoup
from wordcloud import WordCloud
from dotenv import load_dotenv
import os
from urllib.parse import quote
import time

# # NLTK imports are currently a stretch goal for future development
# import nltk
# from nltk.corpus import stopwords
# from nltk.tokenize import word_tokenize

# # Ensure necessary NLTK components are downloaded
# nltk.download("stopwords")
# nltk.download("punkt")

## **Step 2:** Data Acquisition 🗂

### **Scraping** Wikipedia
Extract **Best Picture winners** and relevant metadata using:
- **`pandas.read_html()`** to extract the table structure.
- **`BeautifulSoup`** to identify "winning" rows based on Wikipedia background color.

In [2]:
# Wikipedia URL for Best Picture winners
wiki_url = "https://en.wikipedia.org/wiki/List_of_Academy_Award%E2%80%93winning_films"

# Use pandas to extract the table
tables = pd.read_html(wiki_url)

# Select the correct table: as of 2025-03-17 it is the first table (index 0); adjust the index if the page changes.
# pd.read_html() already returns a list of DataFrames, so no further conversion is needed.
best_picture_wikipedia = tables[0]

# Print the first few rows to ensure the correct table was selected
best_picture_wikipedia.head()

Unnamed: 0,Film,Year,Awards,Nominations
0,Anora,2024,5,6
1,The Brutalist,2024,3,10
2,Emilia Pérez,2024,2,13
3,Wicked,2024,2,10
4,Dune: Part Two,2024,2,5


In [3]:
# Find the Wikipedia table with BeautifulSoup
response_wikipedia = requests.get(wiki_url)
soup_wikipedia = BeautifulSoup(response_wikipedia.text, "html.parser")
wikipedia_table = soup_wikipedia.find_all("table", {"class": "wikitable"})[0]

# Extract all rows
rows = wikipedia_table.find_all("tr")

# List to store "Winner" status
winning_status = []

# Loop through rows and check for background color "#EEDD82" skipping the header row
for row in rows[1:]:
    style = row.get("style", "")
    
    # Check if the row has the background color for winners and remove spaces for consistency
    if "background:#EEDD82" in style.replace(" ", ""):
        winning_status.append("Winner")
    else:
        winning_status.append("Nominee")

# Ensure the list length matches the DataFrame
if len(winning_status) == len(best_picture_wikipedia):
    best_picture_wikipedia["Status"] = winning_status
else:
    print("List length does not match DataFrame length")

# Keep only the rows flagged as winners; .copy() avoids chained-assignment warnings in later cells
best_picture_winners = best_picture_wikipedia[best_picture_wikipedia["Status"] == "Winner"].copy()

# Display updated DataFrame
best_picture_winners.head()

Unnamed: 0,Film,Year,Awards,Nominations,Status
0,Anora,2024,5,6,Winner
14,Oppenheimer,2023,7,13,Winner
27,Everything Everywhere All at Once,2022,7,11,Winner
40,CODA,2021,3,3,Winner
55,Nomadland,2020/21,3,6,Winner
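
The check in the cell above looks for the exact substring `background:#EEDD82`, so it would miss a lowercase hex code or a `background-color:` declaration if the page markup changes. A minimal sketch of a more tolerant check follows. The helper name `is_winner_style` is hypothetical, and the sketch assumes the winner highlight stays `#EEDD82`.

```python
import re

# Matches "background" or "background-color" followed by the highlight color,
# ignoring letter case and optional whitespace around the colon.
# The color value is the one used in the cell above.
WINNER_STYLE = re.compile(r"background(-color)?\s*:\s*#eedd82", re.IGNORECASE)


def is_winner_style(style: str) -> bool:
    """Return True if an inline style string carries the winner highlight."""
    return bool(WINNER_STYLE.search(style or ""))


print(is_winner_style("background:#EEDD82"))          # True
print(is_winner_style("background-color: #eedd82;"))  # True
print(is_winner_style("text-align:center"))           # False
```

In the loop above, `is_winner_style(row.get("style", ""))` could stand in for the substring test.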


### Fetching Genres from TMDb API 🎭
Use **The Movie Database (TMDb) API** to retrieve **movie genres** for the Best Picture winners listed in the Wikipedia dataset.

In [4]:
# Load environment variables from .env file
load_dotenv()

# Access the TMDB API keys stored in the .env file and define them here
tmdb_api_key = os.getenv('TMDB_API_KEY')
tmdb_api_read_access_token = os.getenv('TMBD_API_READ_ACCESS_TOKEN')

tmdb_api_base_url = "https://api.themoviedb.org/3"

# Function to get genre mappings (ID -> Name)
def get_genre_mapping() -> dict:
    url = f"{tmdb_api_base_url}/genre/movie/list?language=en-US"
    headers = {"accept": "application/json", "Authorization": f"Bearer {tmdb_api_read_access_token}"}
    
    response = requests.get(url, headers=headers)
    data = response.json()
    
    if "genres" in data:
        return {genre["id"]: genre["name"] for genre in data["genres"]}
    return {}

# Function to query TMDB API and get genre names for movies
def get_movie_genres(film_titles) -> dict:
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {tmdb_api_read_access_token}"
        }
    
    # Fetch genre ID-to-name mapping
    genre_mapping = get_genre_mapping()

    # Store results
    movie_genres = {}

    for title in film_titles:
        # Encode spaces and special characters for use in URL
        encoded_title = quote(title)
        
        url = f"{tmdb_api_base_url}/search/movie?query={encoded_title}&include_adult=false&language=en-US&page=1"
        response = requests.get(url, headers=headers)
        data = response.json()
        
        if "results" in data and data["results"]:
            # Ensure exact match
            exact_match = next((movie for movie in data["results"] if movie["title"] == title), None)
            
            if exact_match:
                genre_ids = exact_match["genre_ids"]
                genre_names = [genre_mapping.get(gid, "Unknown Genre") for gid in genre_ids]
                movie_genres[title] = genre_names
            else:
                movie_genres[title] = ["No exact match found"]
        else:
            movie_genres[title] = ["No results found"]
    
    return movie_genres

# Create a List of the movie titles from extracted Wikipedia data
movie_titles = best_picture_winners["Film"].tolist()

# Get genre names for each movie
genre_results = get_movie_genres(movie_titles)

# Convert to DataFrame for display
genre_results = pd.DataFrame(list(genre_results.items()), columns=["Title", "Genres"])
genre_results

Unnamed: 0,Title,Genres
0,Anora,"[Drama, Comedy, Romance]"
1,Oppenheimer,"[Drama, History]"
2,Everything Everywhere All at Once,"[Action, Adventure, Science Fiction]"
3,CODA,"[Drama, Music, Romance]"
4,Nomadland,[Drama]
...,...,...
92,Rebecca,"[Romance, Drama, Mystery, Thriller]"
93,Tom Jones,"[Comedy, Adventure, History, Romance]"
94,West Side Story,"[Crime, Drama, Romance]"
95,Wings,"[Drama, Action, War, Romance]"


### Combine Data from Wikipedia and TMDb 🎞️
Merge the Wikipedia and TMDb data, verifying the column names of both DataFrames first.

In [5]:
# Confirm column names for both DataFrames
print("best_picture_winners columns:", best_picture_winners.columns.tolist())
print("genre_results columns:", genre_results.columns.tolist())

best_picture_winners columns: ['Film', 'Year', 'Awards', 'Nominations', 'Status']
genre_results columns: ['Title', 'Genres']


In [6]:
# Merge best_picture_winners and genre_results DataFrames on "Film"/"Title"
best_picture_winners = best_picture_winners.merge(
    genre_results,
    left_on="Film",
    right_on="Title",
    how="left"
)

# Drop the now redundant "Title" column
best_picture_winners.drop("Title", axis=1, inplace=True)

# Explode to one row per (film, genre) pair, then add one-hot indicator columns for each genre
best_picture_winners = best_picture_winners.explode("Genres")
best_picture_winners = pd.concat([best_picture_winners, best_picture_winners["Genres"].str.get_dummies()], axis=1)

# Print the updated DataFrame
best_picture_winners

Unnamed: 0,Film,Year,Awards,Nominations,Status,Genres,Action,Adventure,Animation,Comedy,...,Horror,Music,Mystery,No exact match found,Romance,Science Fiction,TV Movie,Thriller,War,Western
0,Anora,2024,5,6,Winner,Drama,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
0,Anora,2024,5,6,Winner,Comedy,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
0,Anora,2024,5,6,Winner,Romance,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
1,Oppenheimer,2023,7,13,Winner,Drama,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Oppenheimer,2023,7,13,Winner,History,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,Wings,1927/28,2,2,Winner,Action,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
95,Wings,1927/28,2,2,Winner,War,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
95,Wings,1927/28,2,2,Winner,Romance,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
96,You Can't Take It with You,1938,2,7,Winner,Comedy,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0


### Fetch Box Office Revenue from OMDb API 💸
Use **OMDb API** to fetch **box office revenue** for Best Picture winners and combine with Wikipedia and TMDb data.

In [None]:
# Pull in OMDb API Key
omdb_api_key = os.getenv('OMDB_API_KEY')

# Function to fetch box office revenue from OMDb API given a movie title
def get_box_office(movie_title, api_key) -> str:
    # Ensure spaces & special characters are URL-safe
    encoded_title = quote(movie_title)
    omdb_url = f"http://www.omdbapi.com/?t={encoded_title}&apikey={api_key}"
    
    try:
        # Timeout to prevent hanging
        response = requests.get(omdb_url, timeout=10)
        # Raises an error for 4xx/5xx responses
        response.raise_for_status()
        
        # Parse JSON response
        data = response.json()
        
        # Handle API errors
        if "Error" in data:
            print(f"OMDb API Error for {movie_title}: {data['Error']}")
            return "N/A"
        # Return Box Office revenue or "N/A" if missing
        return data.get("BoxOffice", "N/A")

    except requests.exceptions.Timeout:
        print(f"Timeout error for {movie_title}. Skipping...")
        return "Timeout"
    except requests.exceptions.RequestException as e:
        print(f"API request failed for {movie_title}: {e}")
        return "API Error"
    except ValueError:
        print(f"Invalid JSON response for {movie_title}.")
        return "JSON Error"

# Fetch revenue once per unique title (the DataFrame has one row per genre, so titles repeat),
# pausing briefly between requests to avoid rate limits (the 0.2 s delay is an assumed value)
box_office_by_title = {}
for title in best_picture_winners["Film"].unique():
    box_office_by_title[title] = get_box_office(title, omdb_api_key)
    time.sleep(0.2)

best_picture_winners["Box Office Revenue"] = best_picture_winners["Film"].map(box_office_by_title)

# Display updated DataFrame
best_picture_winners.head()

Unnamed: 0,Film,Year,Awards,Nominations,Status,Genres,Action,Adventure,Animation,Comedy,...,Music,Mystery,No exact match found,Romance,Science Fiction,TV Movie,Thriller,War,Western,Box Office Revenue
0,Anora,2024,5,6,Winner,Drama,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"$16,300,129"
0,Anora,2024,5,6,Winner,Comedy,0,0,0,1,...,0,0,0,0,0,0,0,0,0,"$16,300,129"
0,Anora,2024,5,6,Winner,Romance,0,0,0,0,...,0,0,0,1,0,0,0,0,0,"$16,300,129"
1,Oppenheimer,2023,7,13,Winner,Drama,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"$330,050,270"
1,Oppenheimer,2023,7,13,Winner,History,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"$330,050,270"


## **Step 3:** Data Cleaning & Storage 🛠
The merged data is cleaned and then stored in a local **SQLite database**.

In [8]:
# Get dataset summary
print(best_picture_winners.info())

<class 'pandas.core.frame.DataFrame'>
Index: 229 entries, 0 to 96
Data columns (total 26 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Film                  229 non-null    object
 1   Year                  229 non-null    object
 2   Awards                229 non-null    object
 3   Nominations           229 non-null    object
 4   Status                229 non-null    object
 5   Genres                228 non-null    object
 6   Action                229 non-null    int64 
 7   Adventure             229 non-null    int64 
 8   Animation             229 non-null    int64 
 9   Comedy                229 non-null    int64 
 10  Crime                 229 non-null    int64 
 11  Drama                 229 non-null    int64 
 12  Family                229 non-null    int64 
 13  Fantasy               229 non-null    int64 
 14  History               229 non-null    int64 
 15  Horror                229 non-null    int64 
 

In [9]:
# Check for missing values
print(best_picture_winners.isnull().sum())

Film                    0
Year                    0
Awards                  0
Nominations             0
Status                  0
Genres                  1
Action                  0
Adventure               0
Animation               0
Comedy                  0
Crime                   0
Drama                   0
Family                  0
Fantasy                 0
History                 0
Horror                  0
Music                   0
Mystery                 0
No exact match found    0
Romance                 0
Science Fiction         0
TV Movie                0
Thriller                0
War                     0
Western                 0
Box Office Revenue      0
dtype: int64


In [10]:
# Fill missing values with "Unknown"
best_picture_winners['Genres'] = best_picture_winners['Genres'].fillna('Unknown')
print(best_picture_winners.isnull().sum())

Film                    0
Year                    0
Awards                  0
Nominations             0
Status                  0
Genres                  0
Action                  0
Adventure               0
Animation               0
Comedy                  0
Crime                   0
Drama                   0
Family                  0
Fantasy                 0
History                 0
Horror                  0
Music                   0
Mystery                 0
No exact match found    0
Romance                 0
Science Fiction         0
TV Movie                0
Thriller                0
War                     0
Western                 0
Box Office Revenue      0
dtype: int64


### Address Year column values/formatting
The "Year" column has some values that span two years rather than one (e.g., 2020/21). I have decided to retain only the first year listed and convert it to an integer.
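
As a small illustration of that rule, the sketch below applies it to three `Year` values that appear in the outputs above. It assumes the values are strings, as reported by `info()`.

```python
import pandas as pd

years = pd.Series(["2024", "2020/21", "1927/28"])

# Keep the first four-digit run in each value, then convert to integers
first_year = years.str.extract(r"(\d{4})", expand=False).astype(int)
print(first_year.tolist())  # [2024, 2020, 1927]
```

Passing `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame, which is convenient for assigning back to a single column.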

In [23]:
# Extract the first four-digit year (e.g. "2020/21" -> "2020") and convert to integer
best_picture_winners["Year"] = (
    best_picture_winners["Year"].astype(str).str.extract(r"(\d{4})", expand=False)
)
best_picture_winners["Year"] = pd.to_numeric(best_picture_winners["Year"], errors="coerce")

# Convert 'Box Office Revenue' to numeric, removing any non-numeric characters
best_picture_winners['Box Office Revenue'] = pd.to_numeric(
    best_picture_winners['Box Office Revenue'].replace(r'[\$,]', '', regex=True), 
    errors="coerce"
)


# Display cleaned DataFrame
best_picture_winners


Unnamed: 0,Film,Year,Awards,Nominations,Status,Genres,Action,Adventure,Animation,Comedy,...,Music,Mystery,No exact match found,Romance,Science Fiction,TV Movie,Thriller,War,Western,Box Office Revenue
0,Anora,2024,5,6,Winner,Drama,0,0,0,0,...,0,0,0,0,0,0,0,0,0,16300129.0
0,Anora,2024,5,6,Winner,Comedy,0,0,0,1,...,0,0,0,0,0,0,0,0,0,16300129.0
0,Anora,2024,5,6,Winner,Romance,0,0,0,0,...,0,0,0,1,0,0,0,0,0,16300129.0
1,Oppenheimer,2023,7,13,Winner,Drama,0,0,0,0,...,0,0,0,0,0,0,0,0,0,330050270.0
1,Oppenheimer,2023,7,13,Winner,History,0,0,0,0,...,0,0,0,0,0,0,0,0,0,330050270.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,Wings,1927,2,2,Winner,Action,1,0,0,0,...,0,0,0,0,0,0,0,0,0,
95,Wings,1927,2,2,Winner,War,0,0,0,0,...,0,0,0,0,0,0,0,1,0,
95,Wings,1927,2,2,Winner,Romance,0,0,0,0,...,0,0,0,1,0,0,0,0,0,
96,You Can't Take It with You,1938,2,7,Winner,Comedy,0,0,0,1,...,0,0,0,0,0,0,0,0,0,


In [None]:
# Find duplicates and drop them
duplicates = best_picture_winners.duplicated()
print(f'Duplicate rows: {duplicates.sum()}')
best_picture_winners = best_picture_winners.drop_duplicates()

# Strip surrounding whitespace only; str.title() is avoided because it mangles titles
# (e.g. "You Can't Take It with You" -> "You Can'T Take It With You", "CODA" -> "Coda")
best_picture_winners['Film'] = best_picture_winners['Film'].str.strip()
best_picture_winners['Genres'] = best_picture_winners['Genres'].str.strip()
best_picture_winners

Duplicate rows: 0


Unnamed: 0,Film,Year,Awards,Nominations,Status,Genres,Action,Adventure,Animation,Comedy,...,Music,Mystery,No exact match found,Romance,Science Fiction,TV Movie,Thriller,War,Western,Box Office Revenue
0,Anora,2024,5,6,Winner,Drama,0,0,0,0,...,0,0,0,0,0,0,0,0,0,16300129.0
0,Anora,2024,5,6,Winner,Comedy,0,0,0,1,...,0,0,0,0,0,0,0,0,0,16300129.0
0,Anora,2024,5,6,Winner,Romance,0,0,0,0,...,0,0,0,1,0,0,0,0,0,16300129.0
1,Oppenheimer,2023,7,13,Winner,Drama,0,0,0,0,...,0,0,0,0,0,0,0,0,0,330050270.0
1,Oppenheimer,2023,7,13,Winner,History,0,0,0,0,...,0,0,0,0,0,0,0,0,0,330050270.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,Wings,1927,2,2,Winner,Action,1,0,0,0,...,0,0,0,0,0,0,0,0,0,
95,Wings,1927,2,2,Winner,War,0,0,0,0,...,0,0,0,0,0,0,0,1,0,
95,Wings,1927,2,2,Winner,Romance,0,0,0,0,...,0,0,0,1,0,0,0,0,0,
96,You Can't Take It with You,1938,2,7,Winner,Comedy,0,0,0,1,...,0,0,0,0,0,0,0,0,0,


## START HERE

### Save cleaned data to local SQLite database

In [None]:
# conn = sqlite3.connect("academy_awards.db")
# best_picture_winners.to_sql("best_picture_winners", conn, if_exists="replace", index=False)

# print("Data successfully stored in SQLite database!")
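
A minimal sketch of the save step follows, using an in-memory database and a three-column sample from the outputs above so that it runs without touching disk. Swapping `":memory:"` for `"academy_awards.db"` and the sample for the cleaned DataFrame is the assumed real usage.

```python
import sqlite3

import pandas as pd

# Small sample taken from the outputs above; the real frame has more columns
sample = pd.DataFrame({
    "Film": ["Anora", "Oppenheimer"],
    "Year": [2024, 2023],
    "Awards": [5, 7],
})

conn = sqlite3.connect(":memory:")
try:
    # Replace the table if it already exists; do not store the DataFrame index
    sample.to_sql("best_picture_winners", conn, if_exists="replace", index=False)

    # Read the table back to confirm the write worked
    round_trip = pd.read_sql(
        "SELECT Film, Year FROM best_picture_winners ORDER BY Year", conn
    )
finally:
    conn.close()

print(round_trip)
```

Reading the table back right after writing is a cheap way to confirm that column names and types survived the round trip.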

Below are cells that might be removed

In [None]:
# # This cell was moved down from above and may eventually be removed
# # Function to store movie data in SQLite
# def store_movie_data(movie_data) -> None:
#     # Create/connect to the database
#     conn = sqlite3.connect("academy_awards.db")
#     cursor = conn.cursor()

#     # Create table if it doesn't exist
#     cursor.execute('''
#         CREATE TABLE IF NOT EXISTS movies (
#             id INTEGER PRIMARY KEY AUTOINCREMENT,
#             film TEXT,
#             release_date TEXT,
#             overview TEXT,
#             vote_average REAL,
#             tmdb_id INTEGER UNIQUE
#         )
#     ''')

#     # Extract movie details from API response
#     if movie_data and movie_data.get("results"):
#         for movie in movie_data["results"]:
#             tmdb_id = movie.get("id")
#             title = movie.get("title", "Unknown")
#             release_date = movie.get("release_date", "N/A")
#             overview = movie.get("overview", "No description available.")
#             vote_average = movie.get("vote_average", 0.0)

#             # Insert or ignore if the movie already exists (prevents duplicate entries)
#             cursor.execute('''
#                 INSERT OR IGNORE INTO movies (tmdb_id, film, release_date, overview, vote_average)
#                 VALUES (?, ?, ?, ?, ?)
#             ''', (tmdb_id, title, release_date, overview, vote_average))

#     conn.commit()
#     conn.close()

# # Loop through each movie, fetch data, and store in the database
# for movie in movie_titles:
#     data = fetch_movie_data(movie)
#     if data:
#         store_movie_data(data)

# print("Movie data successfully stored in SQLite database!")

In [None]:
# # Connect to the database
# conn = sqlite3.connect("academy_awards.db")

# # Create a cursor object
# cursor = conn.cursor()

# # Execute the query and fetch all rows
# cursor.execute("SELECT * FROM movies")
# rows = cursor.fetchall()

# # Print the results
# for row in rows:
#     print(row)

# # Close the connection
# conn.close()

## Exploratory Data Analysis 📊
We will explore trends in **Best Picture winners** by genre and other relevant statistics.
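
One way to start on the genre question is to count distinct winning films per genre and decade. The sketch below uses four films and their genres from the outputs above. The `Decade` column and the `counts` table are illustrative names, not part of the notebook.

```python
import pandas as pd

films = pd.DataFrame({
    "Film": ["Anora", "Oppenheimer", "Nomadland", "Wings"],
    "Year": [2024, 2023, 2020, 1927],
    "Genres": [
        ["Drama", "Comedy", "Romance"],
        ["Drama", "History"],
        ["Drama"],
        ["Drama", "Action", "War", "Romance"],
    ],
})

# One row per (film, genre) pair, as in the merged DataFrame
long = films.explode("Genres")

# Integer division drops the last digit of the year: 2024 -> 2020
long["Decade"] = (long["Year"] // 10) * 10

# Distinct films per decade and genre, with genres spread across columns
counts = long.groupby(["Decade", "Genres"])["Film"].nunique().unstack(fill_value=0)
print(counts)
```

Counting with `nunique()` on `Film` rather than counting rows keeps the totals correct even if the same film and genre pair appears more than once.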

## Box Office Ratings 💰
We will analyze **box office revenue** and the number of nominations.
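
Because the merged DataFrame has one row per film and genre pair, revenue and nominations repeat for each genre. A sketch of collapsing back to one row per film before any comparison is shown below. The values come from the outputs above, and `Nominations` starts as strings because `info()` reports it as `object`.

```python
import pandas as pd

exploded = pd.DataFrame({
    "Film": ["Anora"] * 3 + ["Oppenheimer"] * 2,
    "Nominations": ["6", "6", "6", "13", "13"],
    "Genres": ["Drama", "Comedy", "Romance", "Drama", "History"],
    "Box Office Revenue": [16300129.0] * 3 + [330050270.0] * 2,
})

# Keep the first row for each film and only the film-level columns
per_film = exploded.drop_duplicates(subset="Film")[
    ["Film", "Nominations", "Box Office Revenue"]
].copy()

# Convert nominations to numbers; values that cannot be parsed become NaN
per_film["Nominations"] = pd.to_numeric(per_film["Nominations"], errors="coerce")
print(per_film)
```

A scatter plot or correlation computed on `per_film` then weights each film once, regardless of how many genres it has.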

In [None]:
# # Scatter plot: Box Office Revenue vs IMDb Ratings
# plt.figure(figsize=(10, 6))
# sns.scatterplot(x=kaggle_df["BoxOffice"], y=kaggle_df["IMDb Rating"], hue=kaggle_df["Year"], palette="coolwarm")
# plt.xlabel("Box Office Revenue (in millions)")
# plt.ylabel("IMDb Rating")
# plt.title("Box Office Revenue vs IMDb Ratings for Best Picture Winners")
# plt.show()

## Stretch Goal: Word Cloud (Wikipedia Movie Summaries) ☁️
If Wikipedia summaries are accessible, generate a **word cloud** from commonly used words in movie descriptions.

In [None]:
# # Sample Wikipedia summary text (replace with actual summaries if available)
# sample_text = "This is a sample summary of a Best Picture-winning film. It tells the story of love, ambition, and success."

# # Tokenize & remove stopwords
# tokens = word_tokenize(sample_text.lower())
# filtered_words = [word for word in tokens if word.isalnum() and word not in stopwords.words("english")]

# # Generate Word Cloud
# wordcloud = WordCloud(width=800, height=400, background_color="white").generate(" ".join(filtered_words))

# # Display Word Cloud
# plt.figure(figsize=(10, 5))
# plt.imshow(wordcloud, interpolation="bilinear")
# plt.axis("off")
# plt.title("Word Cloud of Wikipedia Movie Summaries")
# plt.show()

In [None]:
# oscars = pd.read_csv("data/oscars.csv", sep='\t', on_bad_lines='skip')
# oscars = oscars.dropna()
# oscars = oscars.drop_duplicates()
# oscars = oscars.reset_index(drop=True)

In [None]:
# # Step 4: Store Data in SQLite Database
# conn = sqlite3.connect("academy_awards.db")
# awards_df.to_sql("awards", conn, if_exists="replace", index=False)
# speech_df.to_sql("speeches", conn, if_exists="replace", index=False)

In [None]:
# # Step 5: SQL Queries & Analysis
# ## Query genres of Best Picture winners over decades
# query = """
# SELECT genre, COUNT(*) AS num_wins, (CAST(strftime('%Y', award_year) AS INTEGER) / 10) * 10 AS decade
# FROM awards
# WHERE category = 'Best Picture'
# GROUP BY genre, decade
# ORDER BY decade ASC;
# """
# genre_trends_df = pd.read_sql(query, conn)

# ## Query word frequency in acceptance speeches
# query = """
# SELECT cleaned_speech FROM speeches;
# """
# speech_texts = pd.read_sql(query, conn)
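
The decade grouping in the query above can be tried out against an in-memory database. The sketch below assumes a `best_picture_winners` table with an integer `Year` column and one row per film and genre pair, which differs from the hypothetical `awards` table in the commented cell. The three sample rows come from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE best_picture_winners (Film TEXT, Year INTEGER, Genres TEXT)")
conn.executemany(
    "INSERT INTO best_picture_winners VALUES (?, ?, ?)",
    [("Anora", 2024, "Drama"), ("Oppenheimer", 2023, "Drama"), ("Wings", 1927, "War")],
)

# SQLite integer division truncates, so (2024 / 10) * 10 gives 2020
query = """
SELECT (Year / 10) * 10 AS decade, Genres AS genre, COUNT(DISTINCT Film) AS num_wins
FROM best_picture_winners
GROUP BY decade, genre
ORDER BY decade ASC;
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)  # [(1920, 'War', 1), (2020, 'Drama', 2)]
```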

### Overview of the Analysis (examples)
- In this analysis, we explored the relationship between the race of law enforcement officers and the race of the drivers they stop. Our goal was to see if there’s any indication of bias in traffic stops based on the racial identity of the officers. To do this, we used a chi-squared test for independence, which helps us understand whether there’s a meaningful connection between these two groups.

### Results of the Chi-Squared Test
- **Chi-Squared Statistic:** We calculated a chi-squared statistic of 122.92. This high number shows that there’s a significant difference between the actual number of stops for different racial groups and what we would expect to see if there were no connection between the officer's race and the driver's race. In other words, this suggests that the patterns we observe in the data are unlikely to be just a coincidence.

- **P-Value:** The p-value we found was about 8.20e-17, which is extremely low. This tells us that the result is statistically significant since it’s much lower than the usual thresholds (like 0.05 or 0.01). A low p-value means we have strong evidence against the idea that there’s no connection between the officer's race and the driver's race.

### Interpretation of Findings
- The results show a strong connection between the race of the officer and the race of the driver being stopped. This means that a driver's chances of being stopped may change depending on the officer's race, suggesting there might be some bias in how traffic stops are carried out.

### Implications
- These findings are important for understanding how race plays a role in law enforcement. They suggest that different racial groups might be treated differently by officers during traffic stops. It's crucial to address these biases to ensure fairness and equality in policing.

### Conclusion
- The strong evidence from the chi-squared statistic and p-value emphasizes the importance of further examining law enforcement practices. Police leaders and community advocacy groups should take these findings into account when reviewing policies and training programs designed to reduce racial bias in policing.