# Problem Description

## What is the problem you are trying to solve?

Identifying the MAX (light rail) stations in Portland, Oregon, that have the highest frequency of crime occurrences near them, in order to support more efficient and effective allocation of scarce police resources.

# Background

In Portland, there is growing concern about crime occurring near light rail (MAX) stations. The problem is compounded by a shortage of officers to provide adequate coverage of these increasingly high-crime areas. Furthermore, Portlanders are averse to creating an armed transit police force, despite the increasing occurrence of outbreaks of violent crime on the transit lines. Given this context, it would be valuable to identify the MAX stations that require additional police coverage, so that officers can be distributed more efficiently and effectively to the areas where they are needed.

# Data

## How can you use data to answer the question?

The primary thrust of my approach will be to use open data to count crime occurrences within a defined buffer (e.g., 1,000 meters) around each Portland light rail station. Secondarily, time permitting, additional data could be used to normalize these counts, yielding more meaningful insights and allowing more substantive comparisons between areas.
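The buffer count can be sketched without any GIS dependencies by treating each buffer as a circle and measuring great-circle (haversine) distance from the station to each crime point. This is a minimal, dependency-free stand-in for the Shapely/GeoPandas buffer approach planned here, assuming stations and crimes are available as (latitude, longitude) pairs in degrees. The function names and input shapes are hypothetical. With GeoPandas, the equivalent would be to reproject both layers to a meter-based projected CRS, call `buffer` on the station geometries, and spatially join the crime points to the buffers.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_crimes_near_stations(stations, crimes, radius_m=1000):
    """Count crime points inside a circular buffer around each station (hypothetical helper).

    stations: dict mapping station name -> (lat, lon)
    crimes:   iterable of (lat, lon) tuples
    Buffers may overlap, so a crime near two stations is counted for both.
    """
    crimes = list(crimes)  # allow a generator to be scanned once per station
    counts = {}
    for name, (s_lat, s_lon) in stations.items():
        counts[name] = sum(
            1
            for c_lat, c_lon in crimes
            if haversine_m(s_lat, s_lon, c_lat, c_lon) <= radius_m
        )
    return counts
```

For example, with two made-up stations about 11 km apart, a crime roughly 556 m north of the first station falls inside its 1,000 m buffer, while a point midway between the stations falls inside neither. The brute-force loop is O(stations × crimes), which should be acceptable for a few dozen stations. A spatial index (as GeoPandas uses in its spatial joins) would scale better.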

## What data do you need to answer the question?

In order to begin the analysis described above, at least two sources of data will be required:

1. Portland MAX station locations
2. Portland crime data

Additionally, in order to normalize the crime data aggregations, a third dataset will be required:

3. Portland population data (citywide, and by neighborhood)
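One simple form of this normalization, assuming crime counts and population are both aggregated by neighborhood, is a rate per 1,000 residents. The symbols are illustrative, and the scale factor of 1,000 is an arbitrary presentation choice:

$$
r_n = \frac{C_n}{P_n} \times 1000
$$

Here $C_n$ is the number of crime occurrences within the station buffers in neighborhood $n$, and $P_n$ is that neighborhood's population. For instance, a hypothetical neighborhood with 150 such occurrences and 10,000 residents would have $r_n = 15$ per 1,000 residents.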

## Where is the data coming from (identify all sources) and how will you get it?

The sources of data have been identified as follows:

Portland MAX station locations:

* Foursquare API
* TriMet Geospatial Data (if needed)

Portland crime data:

* Portland Police Bureau Neighborhood Offense Statistics

Portland population data:

* *PDX Monthly* Report (*Portland Neighborhoods by the Numbers 2018*)

## Is the data to be collected representative of the problem to be solved?

The combined data can reasonably be assumed to be representative of the problem, with the caveat that 2018 data will have to be used, given the absence of complete 2019 data. Portland neighborhood polygons, which are readily accessible, may also be required to support the spatial analysis.

## What additional work is required to manipulate and work with the data?

The response object returned by the Foursquare API call will have to be traversed and its fields appended to lists so that it can be converted into a usable dataframe, which will require some scripting.

Because buffers will be used to aggregate crime counts near MAX stations, specialized libraries (e.g., Shapely and/or GeoPandas) will be required to construct the polygons used in the spatial analysis.

The aggregated crime counts will then be loaded into a dataframe and normalized using the population data. Finally, a choropleth map could be generated from this result to make it easy to identify, visually, the neighborhoods with the highest per-capita frequency of crime occurrences near MAX stations.
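The traversal step might look like the sketch below. It assumes a Foursquare v2-style venue search payload shaped like `{"response": {"venues": [...]}}`, where each venue has a `name` and a `location` object with `lat` and `lng` keys. That shape, and the function name `venues_to_rows`, are assumptions to check against an actual API response.

```python
def venues_to_rows(payload):
    """Flatten a Foursquare-style venue payload into parallel lists (hypothetical helper).

    ASSUMPTION: payload looks like {"response": {"venues": [...]}}, and each venue
    has a "name" plus a "location" dict containing "lat" and "lng".
    """
    names, lats, lngs = [], [], []
    for venue in payload.get("response", {}).get("venues", []):
        loc = venue.get("location", {})
        # Skip venues without coordinates; they cannot be placed in a buffer.
        if "lat" not in loc or "lng" not in loc:
            continue
        names.append(venue.get("name"))
        lats.append(loc["lat"])
        lngs.append(loc["lng"])
    # Column-oriented dict, ready to be passed to a dataframe constructor.
    return {"name": names, "lat": lats, "lng": lngs}
```

Because the result is a dict of equal-length lists, `pandas.DataFrame(venues_to_rows(payload))` would turn it into the station dataframe directly. Keeping the parsing in plain Python makes it easy to test against a saved response before the spatial steps are wired in.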