# Spatial visualization of San Francisco Police Department Incidents for 2016

## Objectives
Visualize geospatial data with Folium

## Introduction
In this notebook, we will create maps for different objectives. To do that, we will part ways with Matplotlib and work with another Python visualization library, namely Folium. What is nice about Folium is that it was developed for the sole purpose of visualizing geospatial data.

## Tool Kit
This notebook relies heavily on pandas and NumPy for data wrangling and analysis, and on Folium for visualization. Folium is the primary plotting library we will explore in this lab.

## Data Set
San Francisco Police Department incidents for the year 2016, taken from the Police Department Incidents dataset on the San Francisco public data portal. Incidents are derived from the San Francisco Police Department (SFPD) Crime Incident Reporting system. The data is updated daily and covers the entire year of 2016. Addresses and locations have been anonymized by moving them to mid-block or to an intersection.

https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry

The full dataset covers 2003 to May 2018. We will filter it down to 2016 only.

### Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import folium

#### Installing Folium

In [2]:
#!conda install -c conda-forge folium=0.5.0 --yes
#print('Folium installed and imported!')

### Initializing Folium

Here we initialize Folium and display a map of the entire world.

In [3]:
# define the world map
world_map = folium.Map()

# display world map
world_map

Now we will center the map on a specific city, in this case Boston.

In [4]:
# Boston map, centered on the city's latitude and longitude
bos_map = folium.Map(location=[42.3601, -71.0589], zoom_start=12)

# display the Boston map
bos_map

We can also change the style of the map by passing a `tiles` argument, as shown in the next two cells.

In [5]:
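# Boston map with the 'Stamen Toner' tile style
# (built-in Stamen tiles may not be available in newer Folium releases)
bos_map = folium.Map(location=[42.3601, -71.0589], zoom_start=12, tiles='Stamen Toner')

# display the Boston map
bos_map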

In [6]:
# Boston map with the 'Stamen Terrain' tile style
bos_map = folium.Map(location=[42.3601, -71.0589], zoom_start=12, tiles='Stamen Terrain')

# display the Boston map
bos_map

### Loading Data Set

In [7]:
df_incidents=pd.read_csv('/Users/mianmibrahim/Documents/Untitled Folder/Police_Department_Incident_Reports__Historical_2003_to_May_2018.csv')
df_incidents.head(10)

Unnamed: 0,PdId,IncidntNum,Incident Code,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,...,Fix It Zones as of 2017-11-06 2 2,DELETE - HSOC Zones 2 2,Fix It Zones as of 2018-02-07 2 2,"CBD, BID and GBD Boundaries as of 2017 2 2","Areas of Vulnerability, 2016 2 2",Central Market/Tenderloin Boundary 2 2,Central Market/Tenderloin Boundary Polygon - Updated 2 2,HSOC Zones as of 2018-06-05 2 2,OWED Public Spaces 2 2,Neighborhoods 2
0,4133422003074,41334220,3074,ROBBERY,"ROBBERY, BODILY FORCE",Monday,11/22/2004,17:50,INGLESIDE,NONE,...,,,,,,,,,,
1,5118535807021,51185358,7021,VEHICLE THEFT,STOLEN AUTOMOBILE,Tuesday,10/18/2005,20:00,PARK,NONE,...,,,,,,,,,,
2,4018830907021,40188309,7021,VEHICLE THEFT,STOLEN AUTOMOBILE,Sunday,02/15/2004,02:00,SOUTHERN,NONE,...,,,,,,,,,,
3,11014543126030,110145431,26030,ARSON,ARSON,Friday,02/18/2011,05:27,INGLESIDE,NONE,...,,,,,1.0,,,,,94.0
4,10108108004134,101081080,4134,ASSAULT,BATTERY,Sunday,11/21/2010,17:00,SOUTHERN,NONE,...,,,,,2.0,,,,,32.0
5,13027069804134,130270698,4134,ASSAULT,BATTERY,Tuesday,04/02/2013,15:50,TARAVAL,NONE,...,,,,,1.0,,,,,44.0
6,17063991304134,170639913,4134,ASSAULT,BATTERY,Sunday,08/06/2017,18:15,SOUTHERN,NONE,...,,,,,2.0,,,,,32.0
7,16020415607020,160204156,7020,VEHICLE THEFT,STOLEN AND RECOVERED VEHICLE,Thursday,03/03/2016,19:30,TARAVAL,NONE,...,,,,,,,,,,
8,6068579904134,60685799,4134,ASSAULT,BATTERY,Saturday,06/17/2006,03:00,TARAVAL,NONE,...,,,,,,,,,,
9,5134166327195,51341663,27195,TRESPASS,TRESPASSING,Monday,11/28/2005,16:04,TENDERLOIN,"ARREST, BOOKED",...,,,,,,,,,,


In [8]:
df_incidents.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2129525 entries, 0 to 2129524
Data columns (total 35 columns):
 #   Column                                                    Dtype  
---  ------                                                    -----  
 0   PdId                                                      int64  
 1   IncidntNum                                                int64  
 2   Incident Code                                             int64  
 3   Category                                                  object 
 4   Descript                                                  object 
 5   DayOfWeek                                                 object 
 6   Date                                                      object 
 7   Time                                                      object 
 8   PdDistrict                                                object 
 9   Resolution                                                object 
 10  Address                       

### Dropping Columns

From the data set above, we can see that many columns are not required for mapping and can be removed. We will keep the following 13 columns in our dataset:

**PdId**: The police department ID

**IncidntNum**: Incident Number

**Category**: Category of crime or incident

**Descript**: Description of the crime or incident

**DayOfWeek**: The day of week on which the incident occurred

**Date**: The Date on which the incident occurred

**Time**: The time of day on which the incident occurred

**PdDistrict**: The police department district

**Resolution**: The resolution of the crime, in terms of whether the perpetrator was arrested or not

**Address**: The closest address to where the incident took place

**X**: The longitude value of the crime location

**Y**: The latitude value of the crime location

**location**: The longitude and latitude values combined into a single `POINT (longitude latitude)` string
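
Since only these columns are needed, one option is to read just those columns at load time instead of loading all 35 and dropping the rest. The sketch below illustrates the idea with the `usecols` parameter of `pd.read_csv` on a tiny made-up CSV. The sample rows and the names `csv_text`, `keep_cols`, and `df_small` are assumptions for illustration and are not part of the notebook.

```python
import io

import pandas as pd

# Hypothetical three-row extract that borrows a few of the real column names.
csv_text = """PdId,Incident Code,Category,X,Y,Neighborhoods 2
1,3074,ROBBERY,-122.42,37.70,
2,7021,VEHICLE THEFT,-122.43,37.72,94
3,4134,ASSAULT,-122.41,37.77,32
"""

# Shortened keep-list (the notebook keeps 13 columns; four are enough here).
keep_cols = ["PdId", "Category", "X", "Y"]

# usecols makes read_csv parse only the listed columns and skip the rest.
df_small = pd.read_csv(io.StringIO(csv_text), usecols=keep_cols)

print(df_small.columns.tolist())
print(df_small.shape)
```

With the real file, the same `usecols` argument could be passed alongside the file path, which may reduce memory use on a table with more than two million rows.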

In [9]:
df_incidents=df_incidents.drop(['Incident Code','SF Find Neighborhoods 2 2', 'Current Police Districts 2 2',
       'Current Supervisor Districts 2 2', 'Analysis Neighborhoods 2 2',
       'DELETE - Fire Prevention Districts 2 2',
       'DELETE - Police Districts 2 2', 'DELETE - Supervisor Districts 2 2',
       'DELETE - Zip Codes 2 2', 'DELETE - Neighborhoods 2 2',
       'DELETE - 2017 Fix It Zones 2 2',
       'Civic Center Harm Reduction Project Boundary 2 2',
       'Fix It Zones as of 2017-11-06  2 2', 'DELETE - HSOC Zones 2 2',
       'Fix It Zones as of 2018-02-07 2 2',
       'CBD, BID and GBD Boundaries as of 2017 2 2',
       'Areas of Vulnerability, 2016 2 2',
       'Central Market/Tenderloin Boundary 2 2',
       'Central Market/Tenderloin Boundary Polygon - Updated 2 2',
       'HSOC Zones as of 2018-06-05 2 2', 'OWED Public Spaces 2 2',
       'Neighborhoods 2'], axis=1)

In [10]:
df_incidents.head()

Unnamed: 0,PdId,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,location
0,4133422003074,41334220,ROBBERY,"ROBBERY, BODILY FORCE",Monday,11/22/2004,17:50,INGLESIDE,NONE,GENEVA AV / SANTOS ST,-122.420084,37.708311,POINT (-122.420084075249 37.7083109744362)
1,5118535807021,51185358,VEHICLE THEFT,STOLEN AUTOMOBILE,Tuesday,10/18/2005,20:00,PARK,NONE,TURK ST / STJOSEPHS AV,-120.5,90.0,POINT (-120.50000000000001 90)
2,4018830907021,40188309,VEHICLE THEFT,STOLEN AUTOMOBILE,Sunday,02/15/2004,02:00,SOUTHERN,NONE,BRANNAN ST / 1ST ST,-120.5,90.0,POINT (-120.50000000000001 90)
3,11014543126030,110145431,ARSON,ARSON,Friday,02/18/2011,05:27,INGLESIDE,NONE,0 Block of SANJUAN AV,-122.43622,37.724377,POINT (-122.43622001281001 37.7243766140428)
4,10108108004134,101081080,ASSAULT,BATTERY,Sunday,11/21/2010,17:00,SOUTHERN,NONE,400 Block of 10TH ST,-122.410541,37.770913,POINT (-122.410541166987 37.7709130566165)


In [11]:
df_incidents.shape

(2129525, 13)
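
The preview above includes rows with X = -120.5 and Y = 90.0, which cannot be a San Francisco location, since a latitude of 90 is the North Pole. Assuming these values are placeholders for missing coordinates, such rows could be excluded before mapping. The sketch below does this on toy rows modeled on the preview. The bounding box values and the names `df_toy`, `in_city`, and `df_valid` are illustrative assumptions, not taken from the dataset documentation.

```python
import pandas as pd

# Toy rows modeled on the preview; the second row carries the
# out-of-range pair (X = -120.5, Y = 90.0).
df_toy = pd.DataFrame({
    "Category": ["ROBBERY", "VEHICLE THEFT", "ARSON"],
    "X": [-122.420084, -120.5, -122.43622],
    "Y": [37.708311, 90.0, 37.724377],
})

# Assumed rough bounding box around San Francisco (illustrative values).
LAT_MIN, LAT_MAX = 37.6, 37.9
LON_MIN, LON_MAX = -122.6, -122.3

# Keep only rows whose latitude (Y) and longitude (X) fall inside the box.
in_city = (
    df_toy["Y"].between(LAT_MIN, LAT_MAX)
    & df_toy["X"].between(LON_MIN, LON_MAX)
)
df_valid = df_toy[in_city]

print(df_valid)
```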

### Filtering Dates

In [12]:
# Filter Rows between dates 1/1/2016 - 12/31/2016

df_incidents['Date'] = pd.to_datetime(df_incidents['Date'], format='%m/%d/%Y')
  
# Filter data between two dates
df_2016 = df_incidents.loc[(df_incidents['Date'] >= '2016-01-01') & (df_incidents['Date'] < '2017-01-01')]

In [13]:
df_2016.shape

(145994, 13)

This shows that there are 145,994 records (around 146K) for 2016.
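
As a quick check of the date filter, the sketch below applies the same half-open range (on or after 2016-01-01 and before 2017-01-01) to four toy dates on the year boundaries, and compares it with an equivalent filter on `.dt.year`. The toy data and the names `df_toy`, `mask_range`, `mask_year`, and `df_toy_2016` are assumptions for illustration.

```python
import pandas as pd

# Toy dates that sit just outside and just inside 2016.
df_toy = pd.DataFrame({
    "Category": ["ASSAULT", "VEHICLE THEFT", "ASSAULT", "ROBBERY"],
    "Date": ["12/31/2015", "01/01/2016", "12/31/2016", "01/01/2017"],
})
df_toy["Date"] = pd.to_datetime(df_toy["Date"], format="%m/%d/%Y")

# Half-open range: includes 2016-01-01, excludes 2017-01-01.
mask_range = (df_toy["Date"] >= "2016-01-01") & (df_toy["Date"] < "2017-01-01")

# Equivalent filter that compares the year component directly.
mask_year = df_toy["Date"].dt.year == 2016

df_toy_2016 = df_toy[mask_year]
print(df_toy_2016)
```

Both masks select the same two rows here. The year-based form is shorter, while the range form extends more naturally to arbitrary start and end dates.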

### Mapping Data on Maps

First we will superimpose crimes on the map using Folium. We will use only the first 100 incidents to reduce computational cost.

In [14]:
# Initializing the map in Folium

# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

# create map and display it
sf_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of San Francisco
sf_map

In [15]:
# Taking the first 100 incidents 

limit = 100
df_2016_100 = df_2016.iloc[0:limit, :]
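
Taking the first 100 rows keeps whatever order the file happens to have. If a more varied subset is wanted, a random sample of the same size is one alternative. The sketch below compares the two on a toy frame. The data, the `random_state` value, and the names `df_toy`, `first_rows`, and `random_rows` are assumptions for illustration.

```python
import pandas as pd

# Toy frame standing in for the filtered 2016 incidents.
df_toy = pd.DataFrame({"Category": [f"C{i}" for i in range(1000)]})

limit = 100

# Same approach as the notebook: the first `limit` rows in file order.
first_rows = df_toy.iloc[0:limit, :]

# Alternative: `limit` rows drawn at random without replacement.
# random_state fixes the seed so the sample is repeatable.
random_rows = df_toy.sample(n=limit, random_state=42)

print(len(first_rows), len(random_rows))
```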

In [16]:
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

#### Visualizing the crimes - 1

In the visualization below, each incident is plotted as a circle marker at its coordinates.

In [17]:
# let's start again with a clean copy of the map of San Francisco

sf_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_2016_100.Y, df_2016_100.X):
    incidents.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='red'
        )
    )

# add incidents to map
sf_map.add_child(incidents)

#### Visualizing the crimes - 2

If we want more detail on what sort of crime occurred at each location, we can add pop-up markers.

In [18]:
# let's start again with a clean copy of the map of San Francisco
sf_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# start with an empty feature group so circles from the previous cell are not added twice
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_2016_100.Y, df_2016_100.X):
    incidents.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='red'
        )
    )

# add a pin with pop-up text (the crime category) for each incident
latitudes = list(df_2016_100.Y)
longitudes = list(df_2016_100.X)
labels = list(df_2016_100.Category)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(sf_map)

# add incidents to map
sf_map.add_child(incidents)

#### Visualizing the crimes - 3

The visualization above looks cluttered. One way to clean it up is to remove the pins and attach the crime labels to the circle markers as pop-ups.

In [19]:
# let's start again with a clean copy of the map of San Francisco
sf_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# loop through the 100 crimes and add each to the map
for lat, lng, label in zip(df_2016_100.Y, df_2016_100.X, df_2016_100.Category):
    folium.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='red'
    ).add_to(sf_map)

# show map
sf_map

In [20]:
from folium import plugins

# let's start again with a clean copy of the map of San Francisco
sf_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sf_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(df_2016_100.Y, df_2016_100.X, df_2016_100.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
sf_map