# FBI U.S. Hate Crime Data

## Dataset Description and Initial Wrangling/Cleaning

### Overview

This dataset contains FBI data on hate crime incidents in the U.S. from 1991 to 2020.

### Source

The FBI provides this dataset to the public on its Crime Data Explorer website. It can be found at https://crime-data-explorer.fr.cloud.gov/pages/downloads under the master file downloads, using the hate crime filter.

### FBI Hate Crime Description / Definition

The FBI classifies a crime as a hate crime when it is motivated, in whole or in part, by bias against a race, gender, gender identity, religion, disability, sexual orientation, or ethnicity. An offender's bias alone does not make a crime a hate crime; investigation must show that the particular crime was motivated by that bias.

### Limitations

The FBI collects this data through local law enforcement agencies. Anyone analyzing the data, or reading an analysis of it, should understand that it does not measure law enforcement effectiveness. In addition, local agencies' participation in the Uniform Crime Report varied by state and over time, so hate crimes that local agencies did not report are missing from the data.

Moreover, because of the nature of hate crimes, victims are often members of marginalized communities, and many hate crimes go unreported by victims for fear of re-victimization or retaliation. The data may therefore fall short of a true representation of all hate crime across the United States.
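
The agency-participation caveat above can be probed roughly by counting how many distinct agencies appear in each year. The sketch below is only a proxy: it uses a small made-up frame with the dataset's `DATA_YEAR` and `ORI` column names, and an agency appears only if it submitted at least one incident, so agencies that participated but reported zero hate crimes are invisible here.

```python
import pandas as pd

# Toy rows for illustration only; the real file has one row per incident.
toy = pd.DataFrame({
    "DATA_YEAR": [1991, 1991, 1991, 1992, 1992],
    "ORI": ["AR0040200", "AR0290100", "AR0350100", "AR0350100", "AR0350100"],
})

# Number of distinct agencies with at least one recorded incident per year.
agencies_per_year = toy.groupby("DATA_YEAR")["ORI"].nunique()
print(agencies_per_year)
```

A sharp change in this count from one year to the next is more likely to reflect a change in reporting than a change in the underlying number of hate crimes.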

### Feature Description

This dataset is sourced from the FBI's Uniform Crime Report statistics across the U.S. from 1991 to 2020. The data includes the following features: incident id, year, ORI (originating agency identifier), public agency name, agency type (city, county, state, etc.), state abbreviation, state name, division name, region name, population group id, population group, incident date, adult victims, juvenile victims, total offenders, adult offenders, juvenile offenders, offender race, offender ethnicity, victim count, offense name, total individual victims, location name, bias description, victim types, multiple offense, and multiple bias.

## Initial Assessment

Below, I'll conduct an initial assessment of the cleaning and transformation the dataset needs before analysis.

In [7]:
# imports pandas for loading and manipulating data
import pandas as pd

# sets all columns to be visible in notebook
pd.set_option('display.max_columns', None)

In [8]:
df = pd.read_csv(r"C:\Users\14802\Desktop\hate-crime analysis\datasets\hate_crime.csv", low_memory=False)

In [9]:
df.head()

Unnamed: 0,INCIDENT_ID,DATA_YEAR,ORI,PUB_AGENCY_NAME,PUB_AGENCY_UNIT,AGENCY_TYPE_NAME,STATE_ABBR,STATE_NAME,DIVISION_NAME,REGION_NAME,POPULATION_GROUP_CODE,POPULATION_GROUP_DESC,INCIDENT_DATE,ADULT_VICTIM_COUNT,JUVENILE_VICTIM_COUNT,TOTAL_OFFENDER_COUNT,ADULT_OFFENDER_COUNT,JUVENILE_OFFENDER_COUNT,OFFENDER_RACE,OFFENDER_ETHNICITY,VICTIM_COUNT,OFFENSE_NAME,TOTAL_INDIVIDUAL_VICTIMS,LOCATION_NAME,BIAS_DESC,VICTIM_TYPES,MULTIPLE_OFFENSE,MULTIPLE_BIAS
0,3015,1991,AR0040200,Rogers,,City,AR,Arkansas,West South Central,South,5,"Cities from 10,000 thru 24,999",31-AUG-91,,,1,,,White,,1,Intimidation,1.0,Highway/Road/Alley/Street/Sidewalk,Anti-Black or African American,Individual,S,S
1,3016,1991,AR0290100,Hope,,City,AR,Arkansas,West South Central,South,6,"Cities from 2,500 thru 9,999",19-SEP-91,,,1,,,Black or African American,,1,Simple Assault,1.0,Highway/Road/Alley/Street/Sidewalk,Anti-White,Individual,S,S
2,43,1991,AR0350100,Pine Bluff,,City,AR,Arkansas,West South Central,South,3,"Cities from 50,000 thru 99,999",04-JUL-91,,,1,,,Black or African American,,1,Aggravated Assault,1.0,Residence/Home,Anti-Black or African American,Individual,S,S
3,44,1991,AR0350100,Pine Bluff,,City,AR,Arkansas,West South Central,South,3,"Cities from 50,000 thru 99,999",24-DEC-91,,,1,,,Black or African American,,2,Aggravated Assault;Destruction/Damage/Vandalis...,1.0,Highway/Road/Alley/Street/Sidewalk,Anti-White,Individual,M,S
4,3017,1991,AR0350100,Pine Bluff,,City,AR,Arkansas,West South Central,South,3,"Cities from 50,000 thru 99,999",23-DEC-91,,,1,,,Black or African American,,1,Aggravated Assault,1.0,Service/Gas Station,Anti-White,Individual,S,S


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 219577 entries, 0 to 219576
Data columns (total 28 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   INCIDENT_ID               219577 non-null  int64  
 1   DATA_YEAR                 219577 non-null  int64  
 2   ORI                       219577 non-null  object 
 3   PUB_AGENCY_NAME           219577 non-null  object 
 4   PUB_AGENCY_UNIT           6431 non-null    object 
 5   AGENCY_TYPE_NAME          219577 non-null  object 
 6   STATE_ABBR                219577 non-null  object 
 7   STATE_NAME                219577 non-null  object 
 8   DIVISION_NAME             219577 non-null  object 
 9   REGION_NAME               219577 non-null  object 
 10  POPULATION_GROUP_CODE     219577 non-null  object 
 11  POPULATION_GROUP_DESC     219577 non-null  object 
 12  INCIDENT_DATE             219577 non-null  object 
 13  ADULT_VICTIM_COUNT        51411 non-null   f
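
The non-null counts above already show that some columns are mostly empty. A compact way to rank columns by missingness is to take the mean of `isna()` per column. The sketch below runs on a small made-up frame that borrows three column names from the dataset; the values, including `"Unit A"`, are hypothetical.

```python
import pandas as pd

# Toy frame for illustration; values are not taken from the real file.
toy = pd.DataFrame({
    "INCIDENT_ID": [3015, 3016, 43, 44],
    "PUB_AGENCY_UNIT": [None, None, None, "Unit A"],
    "ADULT_VICTIM_COUNT": [None, 1.0, None, 2.0],
})

# Share of missing values per column, most incomplete first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

On the full dataframe, the same expression would be `df.isna().mean().sort_values(ascending=False)`.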

### Data Cleaning Plan

- unnecessary columns for my analysis -
    - INCIDENT_ID
    - ORI
    - PUB_AGENCY_UNIT
    - STATE_NAME
    - POPULATION_GROUP_CODE
    - POPULATION_GROUP_DESC (both population columns are unnecessary because I will be pulling in US Census data)
<br/>
<br/>
- cleaning - 
    - make column names lowercase (a preference; easier to work with)
    - convert INCIDENT_DATE from object to datetime
    - ADULT_VICTIM_COUNT, VICTIM_COUNT, and TOTAL_INDIVIDUAL_VICTIMS seem redundant; I need to look into these and see whether they can be consolidated into one column
<br/>
<br/>
- transformations - 
    - for many of the analyses I'll group by year and/or state and look at incident totals. However, I'll do this as needed and leave the original cleaned dataframe intact
    - add columns to track the biases I will focus on (for example, boolean columns such as "lgbt_bias", "racial_bias", and "religious_bias"), so incidents can be grouped without changing the original, more detailed bias description
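
As a rough illustration of the cleaning and transformation items above, here is a minimal sketch on a toy frame. It assumes dates follow the `31-AUG-91` pattern seen in `df.head()` and that multiple biases are `;`-separated like `OFFENSE_NAME`. The keyword lists, and the `Anti-Jewish` and `Anti-Gay (Male)` labels, are assumptions made for illustration and should be checked against the actual values in `BIAS_DESC`.

```python
import re
import pandas as pd

# Toy rows for illustration; the third row's date and bias labels are made up.
toy = pd.DataFrame({
    "INCIDENT_DATE": ["31-AUG-91", "19-SEP-91", "05-JAN-20"],
    "BIAS_DESC": [
        "Anti-Black or African American",
        "Anti-White",
        "Anti-Jewish;Anti-Gay (Male)",
    ],
})

# Lowercase every column name.
toy.columns = toy.columns.str.lower()

# Parse day-month-year strings into datetime64 values.
toy["incident_date"] = pd.to_datetime(toy["incident_date"], format="%d-%b-%y")

# Hypothetical keyword lists: extend them after inspecting the real labels.
BIAS_KEYWORDS = {
    "racial_bias": ["Anti-Black", "Anti-White"],
    "religious_bias": ["Anti-Jewish"],
    "lgbt_bias": ["Anti-Gay"],
}

# One boolean flag per bias group; the detailed bias_desc column is untouched.
for flag, keywords in BIAS_KEYWORDS.items():
    pattern = "|".join(re.escape(k) for k in keywords)
    toy[flag] = toy["bias_desc"].str.contains(pattern, case=False, na=False)

print(toy)
```

Because each flag is computed independently, an incident with several biases can be `True` in more than one column, which keeps multi-bias incidents visible in every group they belong to.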

## Cleaning

### Define

Remove unnecessary columns from the dataset.