In the first three months of 2012, 203,500 New Yorkers were stopped by the police.
- 181,457 were totally innocent (89 percent).
- 108,097 were black (54 percent).
- 69,043 were Latino (33 percent).
- 18,387 were white (9 percent).

Source: NYCLU

- NYPD stop, question and frisk databases for 2003-2011
- Over **4.2 million** records
- Each record has 112 columns of information on the stop: justification, demographics, outcome, whether force was used and why, and location (street address)
- ~2.6 million records contain precise geographical coordinates
- Currently running local MySQL and PostgreSQL databases. Some cells contain extra spaces around their content, an artifact of the format in which we originally received the databases. There are also some encoding issues: a PostGIS import with UTF-8 encoding fails, but the same import works with LATIN1.
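
The whitespace and encoding problems described above could be handled in one pass before import. The following is a minimal sketch, assuming the data has been exported to CSV and that the source bytes are LATIN1 (as the PostGIS behavior suggests). The function names and the sample rows are hypothetical, not part of the project.

```python
import csv
import io


def clean_rows(raw_bytes, source_encoding="latin-1"):
    """Decode an exported table and strip stray whitespace from every cell."""
    text = raw_bytes.decode(source_encoding)
    reader = csv.reader(io.StringIO(text, newline=""))
    return [[cell.strip() for cell in row] for row in reader]


def write_utf8(rows, path):
    """Write the cleaned rows back out as UTF-8 so later imports agree."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


# Made-up sample: padded cells plus one LATIN1 byte (0xC9) that is not valid UTF-8.
sample = b"year,pct,note\n2011, 75 ,  CPW  \n2011,40, CAF\xc9 \n"
rows = clean_rows(sample)
for row in rows:
    print(row)
```

Decoding explicitly and re-encoding as UTF-8 keeps the fix in one place, rather than relying on each database client's encoding setting.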
- Migrate the refined database to PostGIS, either locally (?) or on a server environment set up for remote PostGIS access
- Geocode the remaining ~2 million records
- Add additional geographical reference boundaries to entries. They currently contain some precinct information, but community districts, boroughs, etc. might have more meaning to people. NYC has a great listing of community boundary shapefiles on its Bytes of the Big Apple site.
- Viral storming of the data: cross-upload it to as many open data repositories as possible (thedatahub, geocommons, buzzdata, junar), providing the same amount of metadata as on the main site.
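
Attaching a community district or borough to each geocoded stop is a point-in-polygon problem. Once the data is in PostGIS, a spatial join would most likely do this, but the idea can be sketched in plain Python with a ray-casting test. The district names and square polygons below are toy values, and `assign_district` is a hypothetical helper. Real boundaries would come from the shapefiles.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_district(lon, lat, districts):
    """Return the name of the first district containing the point, or None."""
    for name, polygon in districts.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Toy boundaries: two adjacent squares, not real NYC geometry.
districts = {
    "district_a": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "district_b": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
print(assign_district(1.0, 1.0, districts))
```

This simple version ignores polygons with holes and multi-part boundaries, which real district shapefiles may contain.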
- Aggregate overlapping points where appropriate to create elevation columns for particular geographical areas (whether blocks, points, etc). Elevation can then be used to visualize density.
- Review free-text columns, some of which have several hundred thousand unique responses, and consider whether and how we might recode them.
- Basic statistics of stops/frisks for each precinct/community district/other boundary, with attention paid to demographic information as well as the reasoning given for the action taken by the NYPD.
- Make the results of #3 as transparent and accessible as possible to ordinary people who are not statisticians or developers (i.e., plain-language documentation and CSV summaries of precinct aggregate statistics).
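
As a rough illustration of the per-precinct statistics and CSV summaries described above, the sketch below counts stops, frisks and arrests per precinct and renders the result as CSV text. The field names (`pct`, `frisked`, `arstmade`) and the `"Y"`/`"N"` coding are assumptions about the table layout, and the three records are made up for the example.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_by_precinct(stops):
    """Count stops, frisks and arrests per precinct and compute a frisk rate."""
    counts = defaultdict(Counter)
    for stop in stops:
        c = counts[stop["pct"]]
        c["stops"] += 1
        if stop["frisked"] == "Y":
            c["frisks"] += 1
        if stop["arstmade"] == "Y":
            c["arrests"] += 1
    summary = []
    for pct in sorted(counts):
        c = counts[pct]
        summary.append({
            "precinct": pct,
            "stops": c["stops"],
            "frisks": c["frisks"],
            "arrests": c["arrests"],
            "frisk_rate": round(c["frisks"] / c["stops"], 3),
        })
    return summary


def to_csv(summary):
    """Render the summary as CSV text that opens in any spreadsheet program."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(summary[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(summary)
    return buf.getvalue()


# Made-up records, for illustration only.
stops = [
    {"pct": "75", "frisked": "Y", "arstmade": "N"},
    {"pct": "75", "frisked": "N", "arstmade": "N"},
    {"pct": "40", "frisked": "Y", "arstmade": "Y"},
]
summary = summarize_by_precinct(stops)
print(to_csv(summary))
```

The same grouping could be keyed on community district or any other boundary once those columns exist.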
- Heat maps - by year or aggregate, looking at demographics as well as other categories
- Choropleths - same as #1, but using the boundary statistics from data analysis (#3) to help people understand the phenomenon through boundaries they may identify with more closely
- Context / background maps: precinct level crime statistics; census demographic information (community district and census block); points of interest (subway/bus stops, night clubs, dunkin donuts :) )
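
Both the heat maps and the "elevation" columns mentioned earlier come down to counting stops per area. One simple way to do that, sketched here under the assumption that a regular grid is an acceptable stand-in for blocks, is to snap each point to a grid cell and count points per cell. The function name, cell size and coordinates are illustrative only.

```python
import math
from collections import Counter


def bin_points(points, cell_size):
    """Count points per square grid cell; keys are (column, row) cell indices."""
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        counts[cell] += 1
    return counts


# Toy coordinates; real input would be the geocoded stop locations.
points = [(0.1, 0.1), (0.4, 0.2), (0.6, 0.1)]
density = bin_points(points, cell_size=0.5)
for cell, count in sorted(density.items()):
    print(cell, count)
```

The count per cell can then be stored as the elevation value and mapped to color or height.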
- Documentation website - primarily to let developers know how they can use the data.
- Map site - part of number 1 that would feature maps created using data analysis work
- User-friendly, web-accessible instance of the complete database. Allow anyone to find/report stops near them. Allow for rich crowdsourced metadata creation on things like NYPD officer demographics.
- NYCLU Report on 2011 Database (pdf)
- Center for Constitutional Rights Background on how the data became open
- Original NYPD Database Files - Note: they are offered only as portable SPSS database files (.por extensions).
- NYPD Data Dictionary - .zip file with an XLS spreadsheet for each year
- NYPD sample UF 250 form and Metadata
- NYCLU-provided Data Dictionary
- 311 Service Requests from 2010 to Present
- NYPD Public Indicators - Cleaned up version of their dataset is available here
- NYC Geographic Boundary Shapefiles, New York City Bytes of the Big Apple
- OSM Metro Extracts - New York City - OpenStreetMap metro extract for New York City
- NYPD Parking Tickets (.zip file) - MS Access database offered on the NYC Open Data Portal; I have converted it to a SQLite database.
- Research NYPD "Clean Halls" program, in effect since 2001, which allows NYPD officers to enter "clean halls" participating buildings to conduct frisks/stops.