# Project 4
## Nick Shinn

### *Note: Popup blocker must be disabled in order to properly view this notebook*

In [1]:
import folium, requests
import pandas, numpy as np

# download and prepare data for use

arrest_table = pandas.read_csv("http://www.hcbravo.org/IntroDataSci/misc/BPD_Arrests.csv")

# The "sex" and "race" columns are swapped in the source file; swap them back
arrest_table["race_new"] = arrest_table["sex"]
arrest_table["sex_new"] = arrest_table["race"]
arrest_table["race"] = arrest_table["race_new"]
arrest_table["sex"] = arrest_table["sex_new"]
arrest_table = arrest_table.drop(columns=["race_new", "sex_new"])

arrest_table = arrest_table[pandas.notnull(arrest_table["Location 1"])]

# Split the "(lat, long)" strings into numeric lat and long columns
arrest_table[["lat", "long"]] = arrest_table["Location 1"].str.split(",", n=1, expand=True)
arrest_table["lat"] = arrest_table["lat"].str.replace("(", "", regex=False).astype(float)
arrest_table["long"] = arrest_table["long"].str.replace(")", "", regex=False).astype(float)

arrest_table.head()

Unnamed: 0,arrest,age,sex,race,arrestDate,arrestTime,arrestLocation,incidentOffense,incidentLocation,charge,chargeDescription,district,post,neighborhood,Location 1,lat,long
1,11127013.0,37,M,B,01/01/2011,00:01,2000 Wilkens Ave,79-Other,Wilkens Av & S Payson St,1 1425,Reckless Endangerment || Hand Gun Violation,SOUTHERN,934.0,Carrollton Ridge,"(39.2814026274, -76.6483635135)",39.281403,-76.648364
2,11126887.0,46,M,B,01/01/2011,00:01,2800 Mayfield Ave,Unknown Offense,,,Unknown Charge,NORTHEASTERN,415.0,Belair-Edison,"(39.3227699160, -76.5735750473)",39.32277,-76.573575
3,11126873.0,50,M,B,01/01/2011,00:04,2100 Ashburton St,79-Other,2100 Ashburton St,1 1106,Reg Firearm:Illegal Possession || Hgv,WESTERN,735.0,Panway/Braddish Avenue,"(39.3117196723, -76.6623546313)",39.31172,-76.662355
4,11126968.0,33,M,B,01/01/2011,00:05,4000 Wilsby Ave,Unknown Offense,1700 Aliceanna St,,Unknown Charge,NORTHERN,525.0,Pen Lucy,"(39.3382885254, -76.6045667070)",39.338289,-76.604567
5,11127041.0,41,M,B,01/01/2011,00:05,2900 Spellman Rd,81-Recovered Property,2900 Spelman Rd,1 1425,Reckless Endangerment || Handgun Violation,SOUTHERN,924.0,Cherry Hill,"(39.2449886230, -76.6273582432)",39.244989,-76.627358


In [2]:
from folium.plugins import HeatMapWithTime

# Create instance of folium map
m = folium.Map(location=[39.29, -76.61], zoom_start=12)

# Get random sample of 5000 entries from dataset
rand_sample = arrest_table.sample(n=5000, axis=0)

# copy sorted version of data table to optimize performance
rand_sample_copy = rand_sample.sort_values(by='arrestTime')

# Helper function to convert HH:MM time string into minutes
def time_int(time):
    h, m = time.split(':')
    return int(h) * 60 + int(m)

# Creates array that defines each hour of the day (converted into minutes)
times = np.arange(0, 1440, 60)

# Group incidents into 24 hourly slots by arrest time
data = []
for time in times:
    time_slot = []
    for index, row in rand_sample_copy.iterrows():
        # Keep arrests whose time falls in the hour starting at `time`
        if 0 <= time_int(row.arrestTime) - time < 60:
            time_slot.append([float(row.lat), float(row.long)])
    data.append(time_slot)

HeatMapWithTime(data,auto_play=True).add_to(m)
m

The map above shows how arrests in a random sample are distributed over a 24-hour period. The goal of this notebook is to help the reader explore Baltimore arrest data interactively. Using the available Python libraries, I downloaded and tidied the data so that arrests could be organized by time and location. The map uses the folium HeatMapWithTime plug-in to draw the data onto an interactive map. The original data contained 63000 rows, and processing all of them takes too long to render onto a map. Disclaimer: because this is a school project, a sample of 5000 rows was taken to reduce run-time (and hopefully grading time) while still giving a reasonable representation of the full dataset.
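
Part of the run-time may come from the grouping loop itself, which scans the whole sample once for each of the 24 hours. As a minimal sketch of a single-pass alternative, the function below assigns each record directly to its hour slot. `bin_by_hour` and `sample_records` are hypothetical names introduced here, and the records are made-up values rather than rows from the dataset; `time_int` mirrors the helper defined in the cell above.

```python
def time_int(time):
    """Convert an HH:MM string into minutes since midnight."""
    h, m = time.split(":")
    return int(h) * 60 + int(m)


def bin_by_hour(records):
    """Group (arrest_time, lat, long) records into 24 hourly lists of [lat, long]."""
    slots = [[] for _ in range(24)]
    for arrest_time, lat, lon in records:
        hour = time_int(arrest_time) // 60  # 0..23 for valid HH:MM strings
        slots[hour].append([float(lat), float(lon)])
    return slots


# Hypothetical records, made up for illustration (not rows from the dataset)
sample_records = [
    ("00:01", 39.28, -76.65),
    ("00:59", 39.32, -76.57),
    ("13:30", 39.31, -76.66),
    ("23:59", 39.24, -76.63),
]
hourly = bin_by_hour(sample_records)
print([len(slot) for slot in hourly])
```

Assuming the `arrestTime`, `lat`, and `long` columns built earlier, a call such as `bin_by_hour(zip(rand_sample.arrestTime, rand_sample.lat, rand_sample.long))` should return a list in the same 24-slot shape that `HeatMapWithTime` receives as `data`.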

The data is grouped by hour of arrest, so each frame of the heatmap shows where arrests were made during that hour. The interactive map cycles through each hour of the day (indicated by the number on the tooltip) and plots the arrests for the hour it is currently displaying. The tooltip also allows you to move forward or backward in time to read the heatmap more closely. Through observation, I can see that at the start of a typical day (6-7am) the heatmap is sparse and low in density. As the hour approaches noon (12-1pm), the map shows more arrests overall, concentrated around the inner city. As the evening approaches, the number of arrests in the inner city continues to grow, while arrests outside the city remain spread out and unconcentrated. As the clock nears the end of its cycle (11pm-12am), the number of arrests drops sharply, and by 6am it appears to reach its minimum, with most of the remaining data located in the city.
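
The hourly pattern described above is read off the map by eye. One way to check it numerically, sketched here under assumptions, is to count arrests per hour directly. `arrests_per_hour` and `example_times` are hypothetical names, and the times are made-up values rather than data from the sample.

```python
from collections import Counter


def arrests_per_hour(arrest_times):
    """Count arrests in each hour of the day from HH:MM strings."""
    counts = Counter(int(t.split(":")[0]) for t in arrest_times)
    return [counts.get(hour, 0) for hour in range(24)]


# Hypothetical times, made up for illustration (not values from the dataset)
example_times = ["06:15", "12:05", "12:40", "18:30", "18:45", "18:59", "23:10"]
per_hour = arrests_per_hour(example_times)

# Hour with the most arrests in the example list
busiest_hour = max(range(24), key=lambda hour: per_hour[hour])
print(busiest_hour, per_hour[busiest_hour])  # 18 3
```

Passing `rand_sample.arrestTime` instead of `example_times` would give the counts behind each frame of the heatmap, assuming every value is a valid HH:MM string.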

Looking at this sample of the larger dataset, most of the arrests are made during the afternoon and evening. This does not, however, imply that the late hours of the day (12am-6am) are less dangerous, because this visualization does not show the type of offense or the description of each arrest.
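
A possible follow-up, sketched under assumptions rather than taken from the analysis above, is to tally offense types separately for the late-night hours and the rest of the day. `offenses_by_period` and `night_end_hour` are hypothetical names, and the records are made-up pairs that only reuse offense labels visible in the table preview.

```python
from collections import Counter


def offenses_by_period(records, night_end_hour=6):
    """Tally offense labels for late-night hours versus the rest of the day."""
    tallies = {"late_night": Counter(), "rest_of_day": Counter()}
    for arrest_time, offense in records:
        hour = int(arrest_time.split(":")[0])
        # Hours before night_end_hour (12am-6am by default) count as late night
        period = "late_night" if hour < night_end_hour else "rest_of_day"
        tallies[period][offense] += 1
    return {period: dict(counts) for period, counts in tallies.items()}


# Hypothetical (arrestTime, incidentOffense) pairs, made up for illustration
example_records = [
    ("00:01", "79-Other"),
    ("00:04", "Unknown Offense"),
    ("14:20", "79-Other"),
    ("19:45", "81-Recovered Property"),
    ("19:50", "79-Other"),
]
by_period = offenses_by_period(example_records)
print(by_period)
```

With the real data, the input could be `zip(rand_sample.arrestTime, rand_sample.incidentOffense)`, assuming neither column has missing values in the sample.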