A Daily Californian analysis of crime in the UC Berkeley area.
This repository contains tools for parsing and visualizing UCPD daily report logs from 2010 to 2015. Much of the code and methodology can be adapted to fit other data sources.
Clone the repo and install the requirements.
pip install -r requirements.txt
npm install
Set the following environment variables:
DB_NAME: name of a PostGIS database
DB_USER: username with access to said database
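In a POSIX shell, these might be set like so (the values here are placeholders, not the project's actual names):

```shell
export DB_NAME=ucpd_crime   # name of a PostGIS database (placeholder value)
export DB_USER=dailycal     # user with access to that database (placeholder value)
```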
If you'd like to deploy to S3 using django-bakery, set these as well:
To get started from scratch, run
python manage.py load, which will call:
load_bins, to import hexagonal bins from a shapefile in
load_ucpd, to load historical UCPD crime data
classify, to collapse incident information into one of three categories: violent, property or quality-of-life
locate, to merge location information with the address database to assign each incident a latitude and longitude
assign_bin, to locate each incident within a bin
compute_stats, to compute some basic statistics about crime across bins, across categories and over time
pack, to serialize incident-level information using Tamper
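The real stages are separate Django management commands; as a rough, self-contained illustration of the pipeline's shape (the function bodies, codes and coordinates below are toy stand-ins, not the project's actual implementation):

```python
# Toy sketch of three of the load stages: classify, assign_bin, compute_stats.

def classify(incident, categories):
    # Collapse a raw incident code into V, P, Q or N; unknown codes fall to N.
    incident["category"] = categories.get(incident["code"], "N")
    return incident

def assign_bin(incident, bins):
    # Nearest-centroid stand-in for locating a point inside a hexagonal bin.
    lat, lng = incident["lat"], incident["lng"]
    incident["bin"] = min(
        bins, key=lambda b: (bins[b][0] - lat) ** 2 + (bins[b][1] - lng) ** 2
    )
    return incident

def compute_stats(incidents):
    # Count incidents per (bin, category) pair.
    stats = {}
    for inc in incidents:
        key = (inc["bin"], inc["category"])
        stats[key] = stats.get(key, 0) + 1
    return stats

categories = {"BURGLARY": "P", "ROBBERY": "V"}  # hypothetical raw codes
bins = {1: (37.8719, -122.2585), 2: (37.8650, -122.2530)}  # hypothetical centroids
incidents = [
    {"code": "BURGLARY", "lat": 37.8720, "lng": -122.2580},
    {"code": "ROBBERY", "lat": 37.8652, "lng": -122.2532},
]
incidents = [assign_bin(classify(i, categories), bins) for i in incidents]
print(compute_stats(incidents))  # → {(1, 'P'): 1, (2, 'V'): 1}
```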
Incident-level reports come from a Public Records Act (PRA) request filed with the UC Police Department. They cover January 2010 to September 2015. These raw data files are stored in
Hexagonal bins were generated in QGIS. The shapefile is stored in
Simple spreadsheet that maps the codes in the raw data to category codes:
V for violent crimes,
P for property crimes and
Q for quality-of-life crimes.
N is reserved for crimes that we aren't interested in analyzing or displaying.
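A two-column mapping like this could be loaded with the standard library's csv module; the column names and codes below are assumptions for illustration, not the project's actual spreadsheet:

```python
import csv
import io

# Hypothetical spreadsheet contents: raw UCPD code -> category letter.
SPREADSHEET = """code,category
BURGLARY,P
ROBBERY,V
NOISE COMPLAINT,Q
FOUND PROPERTY,N
"""

def load_categories(fileobj):
    # Build a lookup dict from the two-column CSV.
    return {row["code"]: row["category"] for row in csv.DictReader(fileobj)}

categories = load_categories(io.StringIO(SPREADSHEET))
print(categories["ROBBERY"])  # → V
```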
Tamper is a New York Times library for efficient serialization of data. We use Tamper instead of raw JSON so we can experiment with sending all incidents to the user's browser, then using Pourover to quickly sort and filter that data on the client side.
This means we can't send coordinates for each individual incident. Instead, we assign incidents to a bin and then send only the incident's bin ID. With small enough bins, this gives a fairly detailed look at the spatial distribution of crime, and keeps the data file being sent remarkably light (41KB, in this case).
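Tamper's bit-packing goes further than this, but even a plain JSON comparison shows the saving from replacing per-incident coordinates with small integer bin IDs (the incident shapes below are hypothetical):

```python
import json

# Hypothetical payloads: full coordinates vs. bin IDs only.
with_coords = [{"lat": 37.871234, "lng": -122.258456, "category": "P"}] * 1000
with_bins = [{"bin": 42, "category": "P"}] * 1000

full_size = len(json.dumps(with_coords))
binned_size = len(json.dumps(with_bins))
print(binned_size < full_size)  # → True: bin IDs are far lighter than coordinates
```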
While it's more of an experiment than something of great use for data of this scale (~10 thousand incidents), it's an interesting model for scaling up to hundreds of thousands of incidents — something we've tried with historical data from the city police department.
Building and deploying
Build this site out as flat files by running
python manage.py build.
If you've set the appropriate environment variables, publish to S3 using
python manage.py publish.
Where is this going?
We want to try scaling up this binning methodology to bigger datasets. That would involve creating a new shapefile and coming up with new address and classification dictionaries, but the rest of the loading, binning and serialization code should work.
We tried a few Pourover filters beyond our basic classification (violent, property, quality-of-life), but none of them proved interesting for this particular dataset. For categorical variables, though, this approach enables very fast visualizations of geospatial data, without running a server.