**A Study of Weather and Crime in Seattle**

**Geoff Coyner**

**Introduction and background**

Just as climate change is increasing global temperature averages, there are increasing reports that more hot days might lead to more crime. On January 19, 2017, the EPA estimated that the Pacific Northwest will see a 3-10 degree (Fahrenheit) increase in average temperatures by the end of the century [5]. If temperature is connected to crime, then an uptick in temperature could have significant governmental budgeting and family planning consequences (among other impacts).  In this project, I seek to better understand the relationship between temperature and crime, specifically in Seattle.

The positive relationship between crime and ambient temperature has been studied in Finland, as reported in an article in Scientific Reports, a Nature Research journal [3]. It has also been studied in non-academic settings by Chicago publications such as the Chicago Tribune [1, 2]. Both sources show that increases in violent crime can be tied to warmer days. Some news reports have indicated that a correlation exists in Seattle; however, these reports seem to be anecdotal [4].

In addition to analyzing the relationship between temperature and crime, I plan to include other variables such as daylight hours and how extreme the temperature is on a given day relative to the climatic average. Because I have the data and it seems interesting, I also plan to expand the scope to cover both violent and non-violent crime, since the studies I reviewed focused on violent crime.

**Hypotheses and deliverables**

Given the positive relationship between ambient temperature and violent crime in Finland, which is generally cooler than Seattle, I hypothesize a positive, statistically significant relationship between Seattle's temperature and violent crime, even when controlling for daylight hours and looking at "extreme" days specifically. I also hypothesize that there will be a statistically significant, positive relationship between heat and non-violent crime in Seattle.

In addition to performing statistical analysis, I will prepare some visualizations similar to those shown in the Chicago Tribune article [1]. If time allows (to improve my skills), I may create these in an interactive format using JavaScript D3 and likely using Keen IO templates or something similar [6].

**Data sources**

For weather information, I will use METAR weather data collected from the Automated Weather Observing System (AWOS) for King County Airport (BFI). This data is provided by Iowa State University under an MIT license [7]. Since this data is intended for pilots, it is detailed: it is recorded hourly and includes temperature, sky coverage and precipitation details.

For data about crime, I plan to use the Seattle Police Department Police Report Incident [8] and Police Report Offense [9] datasets, which are available under a Creative Commons license. These datasets include a coded offense or incident type at the incident level of grain, as well as start and end dates and times for events. I plan to pull both weather and police data from August 2010 through September 2017.

For sunrise and sunset times, I will refer to the sunrise and sunset tables available from the US Naval Observatory [10]. According to their privacy notice, all information on their site is "considered public information and may be distributed or copied unless otherwise specified."

To understand how the temperature observations in the AWOS data relate to the climatic average for a given day, I will use average temperature data from the National Centers for Environmental Information, which is also considered public [11].

**Data processing and analysis**

After collecting the data, I plan to perform some exploratory analysis on each dataset. This might include getting descriptive statistics or performing quick visualizations to identify any missing or anomalous elements. I will likely include outliers (such as extreme temperatures) in my analysis steps, but I plan to remove any obviously erroneous elements that would lead to misleading results. Once scrubbed, I plan to join the police, weather, max/min temperature and sunrise/sunset datasets on the time dimension.
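The join on the time dimension described above could look roughly like the following. This is a minimal sketch on invented rows, not the project's actual pipeline. The daily grain, the column names `occurred`, `offense_type`, `daylight_hours` and `normal_tmpf`, and all of the values are assumptions made for illustration; only `valid` and `tmpf` are names taken from the weather file.

```python
import pandas as pd

# Hypothetical miniature inputs; every value below is invented for illustration.
weather = pd.DataFrame({
    "valid": pd.to_datetime(["2012-01-01 00:53", "2012-01-01 12:53",
                             "2012-01-02 00:53", "2012-01-02 12:53"]),
    "tmpf": [40.0, 46.0, 38.0, 50.0],
})
incidents = pd.DataFrame({
    "occurred": pd.to_datetime(["2012-01-01 03:10", "2012-01-01 22:45",
                                "2012-01-02 14:05"]),
    "offense_type": ["violent", "non-violent", "violent"],
})
daylight = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02"]),
    "daylight_hours": [8.5, 8.6],
})
normals = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02"]),
    "normal_tmpf": [41.0, 41.0],
})

# Reduce hourly weather observations to one row per calendar day.
daily_weather = (
    weather.assign(date=weather["valid"].dt.normalize())
    .groupby("date", as_index=False)
    .agg(max_tmpf=("tmpf", "max"), mean_tmpf=("tmpf", "mean"))
)

# Count incidents per day, with one column per offense type.
daily_counts = pd.crosstab(
    incidents["occurred"].dt.normalize().rename("date"),
    incidents["offense_type"],
).reset_index()

# Left-join everything onto the weather days so days with no incidents are kept.
daily = (
    daily_weather
    .merge(daily_counts, on="date", how="left")
    .merge(daylight, on="date", how="left")
    .merge(normals, on="date", how="left")
)
count_cols = ["violent", "non-violent"]
daily[count_cols] = daily[count_cols].fillna(0).astype(int)

# How far each day's mean temperature sits from the climatic average.
daily["tmpf_anomaly"] = daily["mean_tmpf"] - daily["normal_tmpf"]

print(daily)
```

Starting the chain of left joins from the weather table is a deliberate choice in this sketch: a day with no reported incidents should appear as a zero count rather than disappear from the analysis.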

I will analyze the data through regression analyses performed at various levels of grain. I also expect that I will need to perform some feature engineering steps iteratively. On my first pass, police incidents or offenses will be grouped by type (violent or non-violent). Later, I may group them at a more granular level as well (shootings, theft, criminal damage, etc., as seen in the Chicago Tribune article [1]). My working assumption is that the number of criminal incidents in a given time period follows a Poisson distribution, so I expect to fit the data with a Poisson regression model similar to [3]. I will check this assumption as part of my exploratory analysis.
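To make the Poisson regression idea concrete, here is a minimal sketch that fits log(E[count]) = b0 + b1 * x with a single temperature predictor using Newton-Raphson. It is not the model from [3] or the final model for this project: the function name `fit_poisson` is hypothetical, the data are invented, and a real analysis would more likely use a statistics library that also reports standard errors. The last lines compute a Pearson dispersion statistic, one common way to check the Poisson assumption (values far above 1 suggest overdispersion).

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (one predictor)."""
    # Start from the intercept-only solution: b0 = log(mean count), b1 = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the Fisher information (negative Hessian).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system and update both coefficients.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented daily data: temperature (Fahrenheit) and incident counts.
temps_f = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
counts = [3, 4, 3, 5, 6, 5, 7, 8, 9, 11]

# Center and rescale so one unit of x means 10 degrees F relative to 60 F.
x = [(t - 60) / 10 for t in temps_f]
b0, b1 = fit_poisson(x, counts)

# exp(b1) is the multiplicative change in expected count per 10 degrees F.
rate_ratio = math.exp(b1)

# Pearson dispersion: roughly 1 if the Poisson variance assumption holds.
fitted = [math.exp(b0 + b1 * xi) for xi in x]
dispersion = sum((yi - mi) ** 2 / mi for yi, mi in zip(counts, fitted)) / (len(counts) - 2)

print(f"rate ratio per 10 F: {rate_ratio:.3f}, dispersion: {dispersion:.3f}")
```

Centering and rescaling the temperature keeps the Newton steps well behaved and makes the coefficient easier to read: `rate_ratio` is the expected multiplicative change in incidents for a 10 degree increase.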

My expectation is that the results of my analysis will validate my hypotheses and that these relationships will be clear and easily interpretable in the visualizations I create.

**Limitations and risks**

One limitation I expect is related to sample size. In the Finland study, the dataset was country-wide, and (fortunately) Seattle is much less violent than Chicago. As a result, I expect those studies dealt with larger sample sizes than I will encounter. Even if my approach faces limitations of statistical significance, I hope there is value in the approach I am taking and that, by using open data science best practices, I will enable other researchers (or myself) to quickly and easily replicate this study with a larger dataset.

Even if I do find a correlation between the weather and violent crime in Seattle, another limitation is that it will be difficult to determine causality through this analysis. There will almost certainly be relevant features that explain crime and are missing from my model. In presenting my findings, I must also be careful not to overgeneralize any results or assume that these findings will apply to every city. I must be very careful to avoid confusion or misinterpretation by any stakeholders.

An additional risk (although a relatively unlikely one) is that the exploratory analysis turns up problems with the data or bad assumptions that I am not able to overcome. If this happens, I am prepared to seek out alternative datasets or to re-scope the study as needed.

One last risk is related to privacy issues stemming from the police data. While each incident is assigned an ID that seems unrelated to the suspects or witnesses involved, it is possible that this data could be quasi-identifying if joined with other data about witnesses or suspects (none of which I am aware of at this time). If I encounter issues with this, I may need to reassess my project; however, at this time I think the risk is low. I also think that, ethically, the potential upsides of this project outweigh the very remote privacy concerns presented here.

**Human Centered Data Science considerations**

This study is intended to explore broadly observed phenomena (weather and crime) at a very local level. Thus, my primary audience is local families as well as the local, state and federal governments that bear the cost of combatting crime in Seattle. The findings of this work may also be relevant to stakeholders in other localities who are interested in how crime might increase in their cities as the climate warms and who could leverage my approach. My hope is that all stakeholder groups consider my findings as one of many data points when making budgetary, family planning or other decisions. Since I am not a PhD with deep, rigorous statistical training, I hope that they approach my work with some skepticism and scrutinize any further questions or surprises that these findings pose.

Another consideration in performing this project is that I will emphasize open data science research practices throughout. This includes but is not limited to: citing all data sources carefully, making my downloaded data available, making processing and analysis steps easily reproducible through a Jupyter notebook, and publishing any visualizations and everything else, including a clear README, on GitHub. This will ensure my study is highly reproducible, so that anyone can check for errors or problems in my approach. It will also ensure the study can be replicated, in that the approach can be easily applied to similar datasets in other locales. I also hope that this will turn the data gathering and processing steps into a helpful starting point, even if other researchers decide to use a different approach to the analysis.

Since there is a chance the results of my analysis may be unsurprising or fruitless given the potential limitations of the data, I must emphasize the importance of replicability. Focusing on an approach that may be of interest more broadly is important to avoiding the "file drawer" problem.

**Sources**

[1] [http://www.chicagotribune.com/news/data/ct-crime-heat-analysis-htmlstory.html](http://www.chicagotribune.com/news/data/ct-crime-heat-analysis-htmlstory.html)

[2] [http://crime.static-eric.com/](http://crime.static-eric.com/)

[3] [https://www.nature.com/articles/s41598-017-06720-z](https://www.nature.com/articles/s41598-017-06720-z)

[4] [http://q13fox.com/2017/05/04/gun-related-crime-increasing-especially-in-south-seattle/](http://q13fox.com/2017/05/04/gun-related-crime-increasing-especially-in-south-seattle/)

[5] [https://19january2017snapshot.epa.gov/climate-impacts/climate-impacts-northwest\_.html](https://19january2017snapshot.epa.gov/climate-impacts/climate-impacts-northwest_.html)

[6] [https://github.com/keen/dashboards](https://github.com/keen/dashboards)

[7] [https://mesonet.agron.iastate.edu/request/download.phtml?network=WA\_ASOS](https://mesonet.agron.iastate.edu/request/download.phtml?network=WA_ASOS)

[8] [https://data.seattle.gov/Public-Safety/Seattle-Police-Department-Police-Report-Incident/7ais-f98f](https://data.seattle.gov/Public-Safety/Seattle-Police-Department-Police-Report-Incident/7ais-f98f)

[9] [https://data.seattle.gov/Public-Safety/Seattle-Police-Department-Police-Report-Offense/m2gk-mysw](https://data.seattle.gov/Public-Safety/Seattle-Police-Department-Police-Report-Offense/m2gk-mysw)

[10] [http://aa.usno.navy.mil/cgi-bin/aa\_rstablew.pl?ID=AA&year=2017&task=0&state=WA&place=Seattle](http://aa.usno.navy.mil/cgi-bin/aa_rstablew.pl?ID=AA&year=2017&task=0&state=WA&place=Seattle)

[11] [https://www.ncdc.noaa.gov/cag/time-series/us](https://www.ncdc.noaa.gov/cag/time-series/us)

**Examine weather data**

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import numpy as np

path = 'C:\\Users\\geoffc.REDMOND\\OneDrive\\Data512\\Final_Project\\'

In [3]:
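# Pull the hourly weather observations; the first row of the file holds the column names
weather = pd.read_csv(path + 'asos.txt', header=0)
# Preview the first three rows
weather.head(3)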


| Unnamed: 0 | station | valid | lon | lat | tmpf | dwpf | relh | drct | sknt | p01i | ... | skyc1 | skyc2 | skyc3 | skyc4 | skyl1 | skyl2 | skyl3 | skyl4 | presentwx | metar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SEA | 2012-01-01 00:53 | -122.3144 | 47.4447 | 39.92 | 28.94 | 64.54 | 20.0 | 8.0 | M | ... | SCT | BKN | | | 14000.00 | 20000.00 | M | M | M | KSEA 010053Z 02008KT 10SM SCT140 BKN200 04/M02... |
| 1 | PAE | 2012-01-01 00:53 | -122.2816 | 47.907 | 39.02 | 28.94 | 66.85 | 40.0 | 3.0 | M | ... | CLR | | | | M | M | M | M | M | KPAE 010053Z 04003KT 10SM CLR 04/M02 A3032 RMK... |
| 2 | BFI | 2012-01-01 00:53 | -122.3 | 47.53 | 41.0 | 30.92 | 67.09 | 40.0 | 6.0 | M | ... | FEW | | | | 20000.00 | M | M | M | M | KBFI 010053Z 04006KT 10SM FEW200 05/M01 A3032 ... |
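The preview shows rows for several stations (SEA, PAE, BFI) and the letter `M` in some fields, while the proposal relies on BFI only. A possible next step, sketched here on a few columns copied from the rows above, is to parse the `valid` timestamps, treat `M` as missing and keep only BFI. Reading `M` as a missing-value marker is an assumption about this file's conventions, and the names `sample`, `weather_sample` and `bfi` are hypothetical.

```python
import io
import pandas as pd

# Three rows abridged from the preview (only a handful of columns kept).
sample = io.StringIO(
    "station,valid,tmpf,dwpf,relh,p01i\n"
    "SEA,2012-01-01 00:53,39.92,28.94,64.54,M\n"
    "PAE,2012-01-01 00:53,39.02,28.94,66.85,M\n"
    "BFI,2012-01-01 00:53,41.0,30.92,67.09,M\n"
)

# Assumption: "M" marks a missing observation, so read it as NaN.
weather_sample = pd.read_csv(sample, na_values=["M"], parse_dates=["valid"])

# Keep only the King County Airport (BFI) observations.
bfi = weather_sample[weather_sample["station"] == "BFI"].reset_index(drop=True)

print(bfi.dtypes)
print(bfi)
```

Converting `M` to NaN at read time keeps numeric columns numeric, which matters later when temperatures are aggregated or used as regression inputs.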
