Denver Crime




Why this project?

Denver Crime started as a small web app: a hobby project built with Flask, JavaScript, MongoDB, and Python to showcase interactive queries, data visualization, and seasonal forecasting (skip to section) using Holt-Winters exponential smoothing and additive nonlinear regression models (the latter is explained below; see also the Prophet model publication).

Denver Crime then grew due to interest from friends in local law enforcement. It is currently being tested with Vue.js and Tableau embeds.


Quick visualization of the dataset


The Dataset

Denver Crime takes nearly 400,000 incidents recorded by Denver PD from 2017 to 2022 and stores them in a NoSQL database. Users submit queries through a web form, and the results are presented as tables and interactive visual elements.


Making an incident query by crime type


What does it do?

Because I wanted a full-stack testbed that I could easily repurpose for more complex projects in the future, I had to structure the code (and its object-oriented design) carefully. I started by:

  1. Extracting crime data and offense codes from CSVs and inserting the parsed data into a MongoDB (NoSQL) database using PyMongo.
  2. Using Flask and MongoDB Query Language (MQL) for form-based queries and table outputs (with more query types and analyses to come).
  3. Conducting higher-level analyses in Python (e.g., average time between specified crime types, time windows with the most crime, areas with the most crime types).
  4. Forming interactive visualizations of incidents (maps with tooltips, so far) from user queries using Bokeh. Initial CSS styling and layout were provided via Bootstrap and Grayscale.
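Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names `OFFENSE_TYPE_ID` and `FIRST_OCCURRENCE_DATE` and both function names are assumptions made for the example.

```python
from datetime import datetime, timedelta


def build_incident_filter(offense_type, start, end):
    """Build an MQL filter document from web-form fields.

    The field names are hypothetical; adjust them to the real collection schema.
    """
    return {
        "OFFENSE_TYPE_ID": offense_type,
        # Half-open date range: start <= date < end
        "FIRST_OCCURRENCE_DATE": {"$gte": start, "$lt": end},
    }


def mean_time_between(timestamps):
    """Return the mean gap between consecutive incidents.

    Returns None when fewer than two timestamps are given.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


if __name__ == "__main__":
    query = build_incident_filter(
        "theft-of-motor-vehicle", datetime(2021, 1, 1), datetime(2022, 1, 1)
    )
    print(query)
    sample = [datetime(2021, 1, 1, 0), datetime(2021, 1, 1, 6), datetime(2021, 1, 1, 2)]
    print(mean_time_between(sample))
```

With PyMongo, a filter like this would typically be passed to `collection.find(...)`, and the timestamps of the returned documents could then be fed to `mean_time_between`.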

Mapped query results (red dots), and tooltip information from database on mouse-over of each incident


Forecasting Models

Given the cyclical trends in crime over time (Fig. 2), it seems reasonable to apply forecasting models that accommodate seasonality well; both exponential smoothing and Meta's Prophet forecasting model (additive nonlinear regression) do so. How does Prophet work? Prophet accommodates seasonality as a periodic Fourier term in its general mathematical model:

$$y(t) = g(t) + s(t) + \epsilon_t$$

where $g(t)$ represents the nonperiodic trend, $s(t)$ represents the periodic (seasonal) component, and $\epsilon_t$ represents a normally distributed error term. Seasonality may occur daily, weekly, monthly, and/or yearly. Prophet can also account for other seasonalities and for events such as holidays.
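To make the additive structure concrete, here is a small sketch of a trend plus Fourier-series seasonal terms, with the error term omitted. It is not Prophet's implementation: the function names, the linear trend, and the coefficients are all invented for illustration, whereas Prophet fits its coefficients from data.

```python
import math


def fourier_seasonality(t, period, coeffs):
    """Evaluate sum over n of a_n*cos(2*pi*n*t/P) + b_n*sin(2*pi*n*t/P).

    coeffs is a list of (a_n, b_n) pairs for n = 1, 2, ...
    """
    return sum(
        a * math.cos(2 * math.pi * n * t / period)
        + b * math.sin(2 * math.pi * n * t / period)
        for n, (a, b) in enumerate(coeffs, start=1)
    )


def additive_forecast(t, trend, seasonalities):
    """Trend plus the sum of all seasonal components (noise term omitted).

    trend is a callable g(t); seasonalities is a list of (period, coeffs).
    """
    return trend(t) + sum(
        fourier_seasonality(t, period, coeffs) for period, coeffs in seasonalities
    )


if __name__ == "__main__":
    # Hypothetical values: t is measured in days.
    trend = lambda t: 100 + 0.1 * t
    weekly = (7, [(1.0, 0.0)])
    yearly = (365.25, [(5.0, 2.0), (0.5, 0.0)])
    for day in (0, 3.5, 7):
        print(day, round(additive_forecast(day, trend, [weekly, yearly]), 3))
```

Because the components simply add, each one (trend, weekly, yearly) can be plotted and inspected separately, which is why individual seasonal components may take negative values.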

Why is seasonality significant?

We see strong periodic behavior in Denver Crime, and the most apparent are weekly and yearly seasonalities: Fridays and Summer/early Fall tend to have the highest reported incidences of crime. Indeed, even the onset of the COVID pandemic in the US seems to have had little effect on the seasonality of crime. However, the general (non-periodic) trend in reported crimes appears to be distinctly positive.

How robust is forecasting in accommodating events such as COVID lockdown?

If we fit the model only on data up to the onset of the pandemic (withholding everything after March 2020), Prophet understandably fails to forecast the post-COVID increase in reported crimes (see "Forecast on data up until March 2020"). This is expected: Prophet's non-periodic component is fitted to the relatively flat pre-pandemic period in reported crimes and has no data on crime after the lockdown.


Clean forecast without post-lockdown data


Interestingly, with just 8 more months of data (through the end of 2020, 8 months after the onset of lockdowns), Prophet proves slightly over-sensitive to recent trends, overpredicting the number of reported crimes by 8% by summer 2022. This difference is, once again, attributable to the non-periodic component, which disproportionately weights the increase in reported crimes at the onset of the pandemic. Tuning the non-periodic component would require de-weighting recent values. Since Tableau's exponential smoothing model also weights recent observations heavily, a similar adjustment may be required there as well.
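The effect of weighting recent values can be illustrated with simple (level-only) exponential smoothing. This is a deliberately reduced sketch, not full Holt-Winters and not Tableau's or Prophet's implementation; the series and the `alpha` values are made up. In Prophet itself, trend flexibility is governed by the `changepoint_prior_scale` parameter rather than by a smoothing factor.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: level = alpha*x + (1 - alpha)*previous level.

    A larger alpha weights recent observations more heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        levels.append(level)
    return levels


if __name__ == "__main__":
    # Hypothetical monthly counts: a flat period followed by a sudden jump.
    counts = [100, 100, 100, 100, 100, 130, 130, 130]
    reactive = exponential_smoothing(counts, alpha=0.8)
    damped = exponential_smoothing(counts, alpha=0.2)
    print("alpha=0.8 final level:", round(reactive[-1], 2))
    print("alpha=0.2 final level:", round(damped[-1], 2))
```

A lower `alpha` de-weights the most recent observations, so the smoothed level follows a sudden jump more slowly; that is the kind of adjustment described above.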





Should we use forecasting in criminal policy decision-making?

In the context of policy-making, it is important to be able to form and test hypotheses on trends to determine causative effects (correlation does not imply causation). To test an independent variable for causative effect, we would need to control for all other variables which may affect the outcome. These variables may include the weather, quarantine lockdowns, and even the price of cars and car parts (Denver has seen a ~300% increase in the rate of auto theft).


Denver Crime Forecast for 2023



Seasonal (periodic) components in Crime Forecast


Negative components are part of the additive model. Forecasts generated with the Prophet model.

Challenges?

  • Some incidents in the original dataset seem to be binned into a specific day (e.g., there are dozens of incidents occurring at midnight on January 1, 2018). This needs to be taken into account in time analyses of certain phenomena (e.g., "crime waves"), though it has less of an impact on certain calculations, such as mean time between incidents.
  • PostgreSQL would also have been perfect for this project (though I may be biased due to familiarity with Postgres). The rationale for using MongoDB was to perform higher-level queries in MQL, and with the $lookup aggregation stage, joins across multiple attributes were easy to implement and fast to execute!
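As a rough sketch of the kind of join mentioned above, the pipeline below pairs incidents with their offense-code records. The collection name `offense_codes`, the field names, and the function name are assumptions for illustration, not the project's actual schema. A basic `$lookup` behaves like a left outer join; following it with `$unwind` drops unmatched incidents, which gives inner-join behavior.

```python
def offense_lookup_pipeline(offense_category):
    """Build an aggregation pipeline joining incidents to offense codes.

    All collection and field names here are hypothetical.
    """
    return [
        # Filter incidents first so the join processes fewer documents.
        {"$match": {"OFFENSE_CATEGORY_ID": offense_category}},
        # Left outer join: matching offense-code docs land in "offense_info".
        {
            "$lookup": {
                "from": "offense_codes",
                "localField": "OFFENSE_CODE",
                "foreignField": "OFFENSE_CODE",
                "as": "offense_info",
            }
        },
        # One output document per match; incidents with no match are dropped.
        {"$unwind": "$offense_info"},
    ]


if __name__ == "__main__":
    for stage in offense_lookup_pipeline("auto-theft"):
        print(stage)
```

With PyMongo, a pipeline like this would be passed to `collection.aggregate(...)`. Joining on several attributes at once would need the `let`/`pipeline` form of `$lookup`, which is not shown here.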


Query Results over Time in Plotly


What's left?

  1. Integrating Denver City's address database for additional incident address information (building type: apartment, etc.). A business address database is also available for further locational information.
  2. Additional inferences:
     • "Crime waves": Time-clustered incidences of crime.
     • "Crime hotspots": Spatially-clustered incidences of crime, based on GPS coordinates.
     • "Crime hotspots over time": A visualization of crime hotspots over time.
     • "Risky buildings": A simple inference on building types with the highest incidences of crime.
  3. Implementation of Vue.js elements.
  4. Containerization and deployment to AWS using Docker.
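One possible starting point for the planned "crime waves" inference is gap-based grouping of timestamps, sketched below. This is only one of several ways to define a time cluster; the function name and the `max_gap` and `min_size` thresholds are assumptions that would need tuning against the real data.

```python
from datetime import datetime, timedelta


def find_crime_waves(timestamps, max_gap, min_size):
    """Group incidents into runs whose consecutive gaps are at most max_gap.

    Only runs with at least min_size incidents are returned.
    """
    waves, current = [], []
    for ts in sorted(timestamps):
        # A gap larger than max_gap closes the current run.
        if current and ts - current[-1] > max_gap:
            if len(current) >= min_size:
                waves.append(current)
            current = []
        current.append(ts)
    if len(current) >= min_size:
        waves.append(current)
    return waves


if __name__ == "__main__":
    base = datetime(2021, 6, 1)
    hours = [0, 1, 2, 50, 100, 101, 102, 103]
    incidents = [base + timedelta(hours=h) for h in hours]
    for wave in find_crime_waves(incidents, timedelta(hours=3), min_size=3):
        print(len(wave), "incidents from", wave[0], "to", wave[-1])
```

Incidents that were binned to a single timestamp (such as the midnight, January 1, 2018 entries noted under Challenges) would appear as a spurious wave, so they would need to be filtered or flagged before clustering.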

Motor Crime Hotspots over Time