# Crime Data

Code used to process the yearly crime data published by the Portland Police Bureau.

Raw CSV files were downloaded locally.

**TODO:** Substitute with S3 URL

*source*: https://www.portlandoregon.gov/police/71978

In [1]:
import pandas as pd
import datetime

# Allow columns to be as wide as necessary (None means no limit)
pd.set_option('display.max_colwidth', None)

In [2]:
# Date stamp used in the name of the processed output file
TODAY = datetime.date.today().strftime("%Y_%m_%d")
print('Last Processed', TODAY)

Last Processed 2018_04_26


## Metadata

Column reference. There is no need to rerun this cell; if you want to, you'll need to install `lxml` and `bs4` first.

In [3]:
# Import
metadata = pd.read_html('https://www.portlandoregon.gov/police/article/627228')[0]

# clean up
metadata.dropna(inplace=True)
metadata.set_index(0, drop=True, inplace=True)
metadata.index.name = 'Column Name'
metadata.columns = ['Description']

In [4]:
metadata

Column Name,Description
Case Number,The case year and number for the reported incident (YY-######).Sensitive cases have been randomly assigned a case number and are denoted by an X following the case year (YY-X######).
Occur Month Year,The Month and Year that the incident occured.
Occur Date,"Date the incident occurred. The exact occur date is sometimes unknown. In most situations, the first possible date the crime could have occurred is used as the occur date. (For example, victims return home from a week-long vacation to find their home burglarized. The burglary could have occurred at any point during the week. The first date of their vacation would be listed as the occur date.)"
Occur Time,"Time the incident occured. The exact occur time is sometimes unknown. In most situations, the first possible time the crime could have occured is used as the occur time.The time is reported in the 24-hour clock format, with the first two digits representing hour (ranges from 00 to 23) and the second two digits representing minutes (ranges from 00 to 59).Note: By default, Microsoft Excel removes leading zeroes when importing data. For more help with this issue, refer to Microsoft's help page."
Address,"Address of reported incident at the 100 block level (e.g.: 1111 SW 2nd Ave would be 1000 Block SW 2nd Ave).To protect the identity of victims and other privacy concerns, the address location of certain case types are not released."
Open Data X / Y,"Generalized XY point of the reported incident. For offenses that occurred at a specific address, the point is mapped to the block's midpoint. Offenses that occurred at an intersection is mapped to the intersection centroid. To protect the identity of victims and other privacy concerns, the points of certain case types are not released.XY points use the Oregon State Plane North (3601), NAD83 HARN, US International Feet coordinate system."
Open Data Lat / Lon,"Generalized Latitude / Longitude of the reported incident. For offenses that occurred at a specific address, the point is mapped to the block's midpoint. Offenses that occurred at an intersection is mapped to the intersection centroid. To protect the identity of victims and other privacy concerns, the points of certain case types are not released."
Neighborhood,"Neighborhood where incident occurred.If the neighborhood name is missing, the incident occurred outside of the boundaries of the Portland neighborhoods or at a location that could not be assigned to a specific address in the system (e.g., Portland, near Washington Park, on the streetcar, etc.). Note: Neighborhood boundaries and designations vary slightly from those found on the Office of Neighborhood Involvement website."
Crime Against,"Crime against category (Person, Property, or Society)"
Offense Category,"Category of offense (for example, Assault Offenses)"


A **detailed** breakdown of Offense Category and Offense type is viewable here: https://www.portlandoregon.gov/police/article/618535
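
The Case Number description in the column reference says that sensitive cases carry an X after the case year. A minimal sketch of flagging them, using case numbers that appear in this dataset; the `is_sensitive` column name and the regular expression are assumptions for illustration, not part of the original processing:

```python
import pandas as pd

# Case numbers taken from sample rows of this dataset
cases = pd.DataFrame({"Case Number": ["17-327011", "17-X4944728", "17-32801"]})

# Assumed pattern: two-digit year, hyphen, an X, then digits (YY-X######)
cases["is_sensitive"] = cases["Case Number"].str.match(r"\d{2}-X\d+$")

print(cases)
```

A flag like this makes it easy to separate rows whose address and coordinates may have been withheld from rows that should be mappable.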

## Code

In [5]:
unprocessed_file_location = './crime_data_FULL.csv' # was renamed locally from default 'Open_Data_Sheet_data.csv'
processed_file_location = './crimes_processed_' + TODAY + '.csv'

### Import

In [6]:
# Load file
wanted_columns = ['Address', 'Case Number', 'Crime Against', 'Neighborhood',
       'Number of Records', 'Occur Date', 'Occur Time', 'Offense Category', 'Offense Count',
       'Offense Type', 'Open Data Lat', 'Open Data Lon', 'Report Date']

crimes = pd.read_csv(unprocessed_file_location, 
                   encoding='UTF-16', sep='\t',
                   usecols=wanted_columns,
                   dtype={'Occur Time': str})
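
Because `usecols` raises a `ValueError` when a requested column is absent, it can help to read only the header first and compare it against the wanted list. A rough sketch under stated assumptions: the small file written here is a stand-in for the real export, and `missing_columns` is a hypothetical helper that is not part of the original notebook.

```python
import os
import tempfile

import pandas as pd


def missing_columns(path, wanted):
    """Return the wanted column names that are absent from the file header."""
    # nrows=0 parses only the header row, so this stays cheap for large files
    header = pd.read_csv(path, encoding="UTF-16", sep="\t", nrows=0).columns
    return sorted(set(wanted) - set(header))


# Stand-in for the real export: a tiny tab-separated UTF-16 file
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    sample = pd.DataFrame({"Address": ["1500 BLOCK OF SE GRAND AVE"],
                           "Occur Time": ["2130"]})
    sample.to_csv(sample_path, sep="\t", encoding="UTF-16", index=False)

    missing = missing_columns(sample_path, ["Address", "Occur Time", "Report Date"])

print(missing)
```

An empty list would mean the full `wanted_columns` selection is safe to pass to `pd.read_csv`.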

### Process

In [7]:
# Rename columns
crimes.rename(columns={'Open Data Lon': 'Lon', 'Open Data Lat': 'Lat'}, inplace=True)

# Trim extra whitespace from Address
crimes.Address = crimes.Address.str.strip()

# Combine Date columns into something useful

# Left-pad Occur Time to 4 characters so leading 0's are present
hour_minutes = crimes['Occur Time'].str.rjust(4, '0')

# Combine date and time into an actual datetime object.  Set as 'Occur Date'
crimes["Occur Date"] = pd.to_datetime(crimes['Occur Date'] + ' ' + hour_minutes)
crimes["Report Date"] = pd.to_datetime(crimes["Report Date"])

# Get rid of now useless Occur Time column
crimes.drop(columns=['Occur Time'], inplace=True)

# take a peek
crimes.head(15)

,Address,Case Number,Crime Against,Neighborhood,Number of Records,Occur Date,Offense Category,Offense Count,Offense Type,Lat,Lon,Report Date
0,1500 BLOCK OF SE GRAND AVE,17-327011,Person,Hosford-Abernethy,1,2017-10-03 21:30:00,Assault Offenses,1,Simple Assault,45.511885,-122.660846,2017-10-03
1,,17-X4944728,Person,Montavilla,1,2017-10-04 16:30:00,Assault Offenses,1,Simple Assault,,,2017-10-04
2,SE GRAND AVE / SE TAYLOR ST,17-32801,Person,Buckman West,1,2017-02-02 12:53:00,Assault Offenses,1,Aggravated Assault,45.515091,-122.660765,2017-02-02
3,,17-X4947682,Person,Foster-Powell,1,2017-10-09 08:31:00,Assault Offenses,1,Intimidation,,,2017-10-09
4,,17-X4951466,Person,Pearl,1,2017-10-14 21:54:00,Assault Offenses,1,Simple Assault,,,2017-10-14
5,1900 BLOCK OF SE WATER AVE,17-53970,Person,Hosford-Abernethy,1,2017-02-22 18:22:00,Assault Offenses,1,Simple Assault,45.508957,-122.665792,2017-02-22
6,,17-X4781386,Person,East Columbia,1,2017-02-23 17:45:00,Assault Offenses,1,Simple Assault,,,2017-02-23
7,,17-X4782130,Person,Powellhurst-Gilbert,1,2017-02-24 12:09:00,Assault Offenses,1,Aggravated Assault,,,2017-02-24
8,N LOMBARD ST / N HURON AVE,17-56372,Person,University Park,1,2017-02-24 17:20:00,Assault Offenses,1,Aggravated Assault,45.579416,-122.715703,2017-02-24
9,4700 BLOCK OF SW 50TH AVE,17-43590,Person,Bridlemile,1,2017-02-06 19:30:00,Assault Offenses,1,Simple Assault,45.488754,-122.728398,2017-02-12
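
The padding step matters because an Occur Time such as `53` means 00:53 once its leading zeros are restored. A small sketch of the same idea with an explicit format string; the sample values and the `%m/%d/%Y` date layout are assumptions, since the raw file's date format is not shown in this notebook:

```python
import pandas as pd

# Illustrative values only: times with their leading zeros stripped
sample = pd.DataFrame({
    "Occur Date": ["10/3/2017", "2/2/2017", "2/6/2017"],
    "Occur Time": ["2130", "53", "5"],
})

# Restore the 4-character HHMM form: "53" -> "0053", "5" -> "0005"
padded = sample["Occur Time"].str.zfill(4)

# Parse with an explicit (assumed) format instead of relying on inference
occur = pd.to_datetime(sample["Occur Date"] + " " + padded,
                       format="%m/%d/%Y %H%M")

print(occur)
```

Passing `format` explicitly makes a malformed row fail loudly instead of being parsed under a guessed layout.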


### Export

In [8]:
# Export to CSV
crimes.to_csv(processed_file_location, index=False)
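
CSV stores datetimes as plain text, so anyone loading the processed file needs to parse the date columns again. A minimal sketch of that round trip, using an in-memory buffer in place of the real output file; the single sample row is illustrative:

```python
import io

import pandas as pd

# One illustrative row shaped like the processed data
frame = pd.DataFrame({
    "Case Number": ["17-327011"],
    "Occur Date": pd.to_datetime(["2017-10-03 21:30:00"]),
    "Report Date": pd.to_datetime(["2017-10-03"]),
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates turns the text columns back into datetimes on load
restored = pd.read_csv(buffer, parse_dates=["Occur Date", "Report Date"])

print(restored.dtypes)
```

Without `parse_dates`, both date columns would come back as plain strings.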