### Motivation

#### Why are you planning to do this analysis? 

I’m planning to do this analysis because I’m curious whether large events are followed by an increase, a decrease, or no change in crime, and which types of crime are affected.

#### Why is it potentially interesting and useful, from a scientific, practical, and/or human-centered perspective? 

It would be interesting to see if any unexpected crime statistics correlate with historical events. This could be useful to know so that if similar events happen again in the future (i.e. “history repeating itself”), we know what to expect for crime and can hopefully prevent it better.

#### What do you hope to learn? 

I hope to learn about the impact that historical events have on crime rates and the types of crimes committed, specifically in the Seattle area. I’m currently planning to look at events like presidential elections, COVID, and George Floyd’s death, but this list may change. It might also be interesting to see where these crimes tend to happen.

### Data

#### Summarize what is represented in the dataset

This dataset contains information on finalized reported crimes in Seattle from 2008-2021 (and is updated daily). It includes columns such as the offense start and end time, report date, type of crime, and location of the crime.

#### Link to the dataset (it must be publicly available)

This is the link to the dataset: https://data.seattle.gov/Public-Safety/SPD-Crime-Data-2008-Present/tazs-3rd5.

#### Specify license and/or terms of use for the data

This dataset is under a Public Domain license.

#### Explain why this dataset is a suitable one for addressing your research goal listed above

I’m particularly interested in crime in Seattle since I’ve lived around this area my whole life and will be moving there this Fall. Additionally, this is a large dataset (with about 900k rows) that comes directly from the Seattle Police Department, so my analysis is more likely to be accurate than one built on a smaller or secondhand source.

#### Highlight any possible ethical considerations to using this dataset

Something to keep in mind when using this dataset and doing this type of analysis is where the data comes from. Only finalized reports are included, so crimes that don’t make it through the police department are absent and unaccounted for. In addition, many people (particularly people of color) do not trust the police and may not make police reports, even after a crime has occurred. For this reason, the data will be biased and will not be a completely accurate representation of crime in Seattle. If people don’t acknowledge this, it could lead to poor decisions based on this data, such as decisions that only benefit those who made police reports. It’s also worth noting that this dataset includes a lot of detail, so it would likely not be too difficult for someone to figure out who was involved in the crimes.

### Unknowns

#### Are there any factors outside of your control that might impact your ability to complete this project by the end of the quarter? 

I’ll be getting my second dose of the vaccine in a couple weeks, which may push back some of my work. I’m also not very comfortable with Python and haven’t used it for statistical analysis before, so I expect to run into a lot of problems related to that.

### Research Questions

1. Have major historical events impacted crime in Seattle?
    * Has Trump’s 2016 presidential election affected the amount of crime in Seattle?
    * What about the death of George Floyd?
    * What about the COVID-19 pandemic?

### Background

The online newspaper SeattlePI reported that there was a 48.57% increase in homicides in Seattle from 2019 to 2020 and that this increase appeared in other US cities during that same time period as well, showing that crime can fluctuate dramatically (https://www.google.com/amp/s/www.seattlepi.com/local/seattlenews/amp/2020-crime-Seattle-highest-homicide-rate-15864266.php). The article also mentioned that bodies that were only discovered that year threw off the number, which is why it’s important to differentiate between the date a crime actually occurred and the date it was reported (as I will be doing in my research). It additionally noted that property crimes increased, but that a possible reason for this could be the increase in Seattle’s population. This is something that I will need to mention in my analysis as well.

According to the Bureau of Justice Statistics, there is little relationship between the seasons and crime for most types of crime (https://www.bjs.gov/content/pub/pdf/ics.pdf). However, they still suggested taking seasonal fluctuation in crime into account for the sake of quality data analysis. This relationship can vary based on the location of the crime (which is why I will be focusing solely on Seattle) and how crime is reported by the police (which is why I will be using one dataset for all the crime data in hopes of mitigating this variance).

A research paper titled “The Effect of President Trump’s Election on Hate Crimes” investigated the impact that Trump’s election had on hate crime specifically and did find a statistically significant increase, suggesting that major events can indeed impact crime rates (https://poseidon01.ssrn.com/delivery.php?ID=724017066103122088120029027099089022026080077013030029112113018026093025106080082073011056033056027005107125092126030067066001046072056061077089122127084103124099089028058087083108115094089030083084094015102018064027072098104111085009014111088099097093&EXT=pdf&INDEX=TRUE). However, the paper only looked at hate crimes and not overall crime rates, so it would be interesting to see whether overall crime was affected as well.

### Methodology

I will be using descriptive statistics (mainly crime counts) for my analysis. Data visualizations in the form of line graphs will also be included. 

I’ll begin by cleaning the data by removing all crimes that occurred before the year 2008, since the report dates for this dataset do not begin until that year (earlier years appear only for crimes that were committed before 2008 but not reported until 2008 or later, and including these crimes would throw off the analysis). Crimes from the year 2021 will be included with the exception of May 2021, since that month will not be finished by the time I begin analyzing the data. Additionally, since crimes that fell under multiple categories for the type of crime were entered multiple times into this dataset, I will be merging these data entries together (by assuming that entries with the same date, time, and location are for the same crime).
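The cleaning steps above could be sketched roughly as follows. This is a minimal standard-library illustration, not the final implementation: the keys `offense_start`, `location`, and `offense` are hypothetical stand-ins for the dataset’s real column names, and the sample rows are made up.

```python
from datetime import datetime

def clean_reports(rows, start=datetime(2008, 1, 1), end=datetime(2021, 5, 1)):
    """Keep offenses in [start, end) and collapse likely duplicates.

    Each row is a dict with hypothetical keys "offense_start" (datetime)
    and "location" (str); the real dataset uses different column names.
    """
    seen = set()
    cleaned = []
    for row in rows:
        when = row["offense_start"]
        # Drop crimes committed before 2008 and anything from May 2021 onward.
        if not (start <= when < end):
            continue
        # Assumption from the proposal: same date, time, and location
        # means the same crime entered under several categories.
        key = (when, row["location"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

# Made-up rows for illustration only.
sample = [
    {"offense_start": datetime(2007, 12, 31, 23, 0), "location": "A", "offense": "Theft"},
    {"offense_start": datetime(2020, 6, 1, 14, 30), "location": "B", "offense": "Burglary"},
    {"offense_start": datetime(2020, 6, 1, 14, 30), "location": "B", "offense": "Vandalism"},
    {"offense_start": datetime(2021, 5, 2, 9, 0), "location": "C", "offense": "Theft"},
]
cleaned_sample = clean_reports(sample)
```

Note that this sketch keeps only the first entry of each duplicate group, so the other offense categories are discarded. If the type of crime matters for a later breakdown, the categories would need to be combined instead of dropped.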

After this data cleaning, I will find the daily average number of crimes committed for each month present in my dataset. I chose the daily average because aggregating over a whole month leaves little chance for single-day outliers to affect the results, while dividing by the number of days keeps months of different lengths comparable. For each month, I will add up all of the crime counts and then divide that number by the number of days in that month. If the resulting number is quite small, I will multiply it by the average number of days in a month to make it easier to understand. I also plan to account for leap years in order to increase the accuracy of my analysis.
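As a rough sketch of that calculation, the snippet below counts crimes per (year, month) and divides by the month’s length. It uses `calendar.monthrange` from the standard library, which already accounts for leap years. The function name and the sample dates are illustrative assumptions.

```python
import calendar
from collections import Counter
from datetime import date

def daily_average_by_month(offense_dates):
    """Map (year, month) -> crimes per day for that month."""
    counts = Counter((d.year, d.month) for d in offense_dates)
    averages = {}
    for (year, month), n in counts.items():
        # monthrange returns (weekday of first day, number of days);
        # February gets 29 days in leap years automatically.
        days_in_month = calendar.monthrange(year, month)[1]
        averages[(year, month)] = n / days_in_month
    return averages

# Made-up data: 58 crimes in Feb 2020 (leap year), 56 in Feb 2021.
sample_dates = [date(2020, 2, 1)] * 58 + [date(2021, 2, 1)] * 56
averages = daily_average_by_month(sample_dates)
```

In this made-up example, both Februaries come out to the same daily rate even though their raw counts differ, which is the reason for dividing by the month length.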

To reduce possible outside factors that could skew my results, I plan to make a line graph for each year (2008-2021) with the daily average number of crimes committed for each month of that year plotted. By comparing these graphs, I can see if there is any seasonal variation that may throw off my results (e.g. more or fewer crimes are committed in winter than in summer). From there, I plan to create a seasonal index and use it in combination with the daily average number of crimes committed for each month to determine whether the crime rate after a particularly important event was actually impacted or whether the change was the result of outside factors (ideally, I would account for other factors as well, but that’s likely out of scope for this project).
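One simple way to build such an index, shown here only as an assumption about how it might be done, is to divide each calendar month’s mean value (across all years) by the overall mean, and then divide each observed value by its month’s index. The function names and toy numbers below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def seasonal_index(averages):
    """Map calendar month -> (mean for that month) / (overall mean).

    `averages` maps (year, month) -> daily average crime count.
    """
    by_month = defaultdict(list)
    for (_year, month), value in averages.items():
        by_month[month].append(value)
    overall = mean(list(averages.values()))
    return {month: mean(values) / overall for month, values in by_month.items()}

def deseasonalize(averages, index):
    """Divide each value by its month's index to remove the seasonal pattern."""
    return {(y, m): v / index[m] for (y, m), v in averages.items()}

# Toy values: January is consistently low, July consistently high.
toy = {(2019, 1): 80.0, (2019, 7): 120.0, (2020, 1): 80.0, (2020, 7): 120.0}
idx = seasonal_index(toy)
adjusted = deseasonalize(toy, idx)
```

With the toy values, January’s index is below 1 and July’s is above 1, and the adjusted values all land at roughly the same level. A month that still stands out after this adjustment is less likely to be explained by ordinary seasonal variation alone.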

This information will be visualized on a line graph. The daily average number of crimes committed for each month will also be included on each of my three line graphs for large events (Trump’s election, George Floyd’s death, and the COVID-19 pandemic). For example, for the pandemic graph, the months included will run from March 2020 (since this is when the pandemic started) up until the end of the dataset in April 2021 (as the pandemic was still drastically affecting people at this time). This will allow me to see how crime was potentially impacted by these three events.