### About the dataset
This dataset was obtained from the San Francisco Police Department public data portal.  
It contains police incidents for the year 2016: https://data.sfgov.org/Public-Safety/Police-Department-Incidents-Previous-Year-2016-/ritf-b9ki  
Incidents are derived from the San Francisco Police Department (SFPD) Crime Incident Reporting system which is updated daily.  
This specific dataset shows data for the entire year of 2016.   
Addresses and locations have been anonymized by moving them to the middle of the block or to the nearest intersection.  

I will display the incidents on a map of San Francisco.  
Clicking on an incident opens a pop-up with its description.  
Unfortunately, I only managed to visualize 1,000 of the roughly 150,000 incidents recorded in 2016, because displaying all of them is computationally very heavy.  
The map can be accessed at one of my websites: https://www.greenvegan.de

### Downloading the data
The dataset is hosted on IBM's cloud storage and will be downloaded directly from there.  
The Python libraries required for pre-processing and visualization are imported first, and then the data is read into a pandas DataFrame:

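```python
# uncomment the next line to install the folium version this notebook was written with
#!conda install -c conda-forge folium=0.5.0 --yes
import folium
import pandas as pd
import numpy as np

# read the 2016 incidents directly from the hosted CSV file
df_incidents = pd.read_csv('https://ibm.box.com/shared/static/nmcltjmocdi8sd5tk93uembzdec8zyaq.csv')
```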

Let's take a look at the dataframe:

```python
df_incidents.head()
```

|   | IncidntNum | Category | Descript | DayOfWeek | Date | Time | PdDistrict | Resolution | Address | X | Y | Location | PdId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 120058272 | WEAPON LAWS | POSS OF PROHIBITED WEAPON | Friday | 01/29/2016 12:00:00 AM | 11:00 | SOUTHERN | ARREST, BOOKED | 800 Block of BRYANT ST | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) | 12005827212120 |
| 1 | 120058272 | WEAPON LAWS | FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE | Friday | 01/29/2016 12:00:00 AM | 11:00 | SOUTHERN | ARREST, BOOKED | 800 Block of BRYANT ST | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) | 12005827212168 |
| 2 | 141059263 | WARRANTS | WARRANT ARREST | Monday | 04/25/2016 12:00:00 AM | 14:59 | BAYVIEW | ARREST, BOOKED | KEITH ST / SHAFTER AV | -122.388856 | 37.729981 | (37.7299809672996, -122.388856204292) | 14105926363010 |
| 3 | 160013662 | NON-CRIMINAL | LOST PROPERTY | Tuesday | 01/05/2016 12:00:00 AM | 23:50 | TENDERLOIN | NONE | JONES ST / OFARRELL ST | -122.412971 | 37.785788 | (37.7857883766888, -122.412970537591) | 16001366271000 |
| 4 | 160002740 | NON-CRIMINAL | LOST PROPERTY | Friday | 01/01/2016 12:00:00 AM | 00:30 | MISSION | NONE | 16TH ST / MISSION ST | -122.419672 | 37.76505 | (37.7650501214668, -122.419671780296) | 16000274071000 |


Each row consists of 13 features:
1. **IncidntNum**: Incident Number
2. **Category**: Category of crime or incident
3. **Descript**: Description of the crime or incident
4. **DayOfWeek**: The day of week on which the incident occurred
5. **Date**: The Date on which the incident occurred
6. **Time**: The time of day on which the incident occurred
7. **PdDistrict**: The police department district
8. **Resolution**: The resolution of the incident, in terms of whether the perpetrator was arrested or not
9. **Address**: The closest address to where the incident took place
10. **X**: The longitude value of the crime location 
11. **Y**: The latitude value of the crime location
12. **Location**: A tuple of the latitude and the longitude values
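13. **PdId**: A per-record identifier; in the rows shown above it is the incident number followed by a five-digit code (e.g. 120058272 followed by 12120)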
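The `Date` column stores the day with a constant time (in the rows shown above, always `12:00:00 AM`), while the actual time of day is in the separate `Time` column. A minimal sketch, assuming the formats seen in the `head()` output, of combining both into one pandas timestamp; the three rows are copied from that output, and the `Timestamp` column name is an illustrative choice:

```python
import io

import pandas as pd

# Three rows copied from the df_incidents.head() output, reduced to a few columns
csv_text = """IncidntNum,Date,Time
120058272,01/29/2016 12:00:00 AM,11:00
141059263,04/25/2016 12:00:00 AM,14:59
160013662,01/05/2016 12:00:00 AM,23:50
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse the date and drop its time part (assumed to be a placeholder midnight)
dates = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p").dt.normalize()

# Turn "HH:MM" into a duration since midnight ("11:00" -> 11 hours)
times = pd.to_timedelta(df["Time"] + ":00")

df["Timestamp"] = dates + times
print(df[["IncidntNum", "Timestamp"]])
```

The same three lines apply unchanged to `df_incidents`; the resulting column can then be used, for example with `.dt.hour`, to count incidents by hour of day.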

Let's see how many crimes are in the dataset:

```python
df_incidents.shape
```

```
(150500, 13)
```

The dataframe contains 150,500 records, all from 2016.  
To reduce the computational cost, let's work with only the first 1,000 records in this dataset.  
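A row is not quite the same thing as an incident: in the `head()` output above, rows 0 and 1 share the incident number 120058272 and differ only in `Descript` and `PdId`, and rows 3 and 4 are `NON-CRIMINAL` lost-property reports. A small sketch of telling rows and distinct incidents apart, using the five rows shown above as stand-in data:

```python
import pandas as pd

# Stand-in data: the five rows printed by df_incidents.head() above
sample = pd.DataFrame({
    "IncidntNum": [120058272, 120058272, 141059263, 160013662, 160002740],
    "Category": ["WEAPON LAWS", "WEAPON LAWS", "WARRANTS", "NON-CRIMINAL", "NON-CRIMINAL"],
})

n_rows = len(sample)
n_incidents = sample["IncidntNum"].nunique()                    # distinct incident numbers
n_criminal_rows = (sample["Category"] != "NON-CRIMINAL").sum()  # rows outside NON-CRIMINAL

print(f"{n_rows} rows, {n_incidents} distinct incidents, {n_criminal_rows} rows not NON-CRIMINAL")
# prints: 5 rows, 4 distinct incidents, 3 rows not NON-CRIMINAL
```

Running the same `nunique()` call on `df_incidents` before it is truncated gives the number of distinct incidents in the full 2016 data.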

```python
# get the first 1,000 records in the df_incidents dataframe
limit = 1000
df_incidents = df_incidents.iloc[0:limit, :]
```
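Taking the first 1,000 rows is quick, but if the file is ordered in some way (for example by date or incident number), those rows may not be representative of the whole year. A hedged alternative is a uniform random sample with a fixed seed. The sketch below uses a toy stand-in for `df_incidents` built from district names in the `head()` output; the seed value 42 is arbitrary:

```python
import pandas as pd

# Toy stand-in for df_incidents: 2,000 rows cycling through four district names
full = pd.DataFrame({"PdDistrict": ["SOUTHERN", "BAYVIEW", "TENDERLOIN", "MISSION"] * 500})

limit = 1000
# Draw `limit` rows uniformly at random without replacement;
# random_state fixes the seed so the same rows are drawn on every run.
sampled = full.sample(n=min(limit, len(full)), random_state=42)

print(len(sampled))
print(sampled["PdDistrict"].value_counts())
```

With the real data, `df_incidents = df_incidents.sample(n=limit, random_state=42)` would take the place of the `iloc` slice.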

Let's superimpose the locations of the incidents onto a map of San Francisco.  
One way to do this in Folium is to create a `FeatureGroup`, add a marker for each incident to it, and then add the whole group to `sanfran_map`:

```python
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

# create the base map of San Francisco
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 1,000 incidents and add a circle marker for each to the feature group
# (folium.features.CircleMarker matches the pinned folium 0.5.0; newer releases use folium.CircleMarker)
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5,  # radius of the circle markers in pixels
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add the feature group to the map and display it
sanfran_map.add_child(incidents)
```

Next, let's add pop-up text to the markers.  
Each marker will display the category of the crime when it is clicked.
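Further down, the pop-up text is assembled from several fields with an HTML `<br>` tag, which suggests folium renders a plain-string popup as HTML. If a field ever contained characters such as `<` or `&`, they would then be read as markup. A minimal sketch of escaping the field values first with the standard library's `html.escape`; `make_popup_html` is a hypothetical helper, not part of folium:

```python
import html


def make_popup_html(description: str, resolution: str) -> str:
    """Hypothetical helper: build the popup HTML for one incident.

    The raw field values are escaped so that characters like '<' or '&'
    are displayed literally instead of being parsed as HTML; only the
    <br> tag added here is real markup.
    """
    return "Description: {} <br> Resolution: {}".format(
        html.escape(description), html.escape(resolution)
    )


print(make_popup_html("FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE", "ARREST, BOOKED"))
print(make_popup_html("A & B <test>", "NONE"))
# second line prints: Description: A &amp; B &lt;test&gt; <br> Resolution: NONE
```

The returned string can be passed as `popup=` in place of an unescaped label.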

```python
# start again with a fresh base map
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# loop through the 1,000 incidents and add a marker with a category pop-up for each
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,  # radius of the circle markers in pixels
        color='yellow',
        fill=True,
        popup=label,  # text shown when the marker is clicked
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(sanfran_map)

# show map
sanfran_map
```

Next, let's group nearby markers into clusters.  
Each cluster is labeled with the number of incidents it contains, and clusters split into smaller ones as you zoom in.  
These clusters can be thought of as pockets of San Francisco that can be analyzed separately.

To implement this, we start off by instantiating a MarkerCluster object and adding all the data points in the dataframe to this object:
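`MarkerCluster` relies on the Leaflet.markercluster plugin, which groups markers by their on-screen distance and keeps a separate clustering for each zoom level. As a much simpler, swapped-in illustration of summarising nearby points by a count, the sketch below bins coordinates into fixed grid cells instead. The cell size of 0.01 degrees and the helper name `grid_counts` are illustrative assumptions, and the points are the five coordinates from the `head()` output:

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=0.01):
    """Count (lat, lng) points per square grid cell of `cell_deg` degrees.

    A simplified stand-in for marker clustering, NOT the algorithm used by
    Leaflet.markercluster: points that fall into the same cell are grouped.
    """
    counts = Counter()
    for lat, lng in points:
        # Integer cell indices; every point inside one cell maps to the same key
        cell = (math.floor(lat / cell_deg), math.floor(lng / cell_deg))
        counts[cell] += 1
    return counts


# (Y, X) pairs of the five rows from the head() output above
points = [
    (37.775421, -122.403405),
    (37.775421, -122.403405),
    (37.729981, -122.388856),
    (37.785788, -122.412971),
    (37.76505, -122.419672),
]

for cell, n in grid_counts(points).most_common():
    print(cell, n)
```

The two weapon-law rows share one location and land in the same cell with a count of 2, much like overlapping markers collapse into a single cluster bubble on the map.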

```python
from folium import plugins

# let's start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, description, resolution in zip(df_incidents.Y, df_incidents.X, df_incidents.Descript, df_incidents.Resolution):
    label = "Description: {} <br> Resolution: {}".format(description, resolution)
    folium.features.CircleMarker(
        location=[lat, lng],
        radius=10,  # radius of the circle markers in pixels
        color='yellow',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6,
        popup=label
    ).add_to(incidents)

# display map
sanfran_map
```

Great.  
Finally, let's create a choropleth map of San Francisco to find out where most of the incidents occur.  
Luckily, a GeoJSON file is available that defines the boundaries of the San Francisco police districts; otherwise they would have to be defined manually as sets of latitude and longitude coordinates.
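The choropleth matches each row of the counts table to a GeoJSON feature through `key_on='feature.properties.DISTRICT'`, so the district names in `PdDistrict` have to equal the `DISTRICT` property values exactly. A small standard-library sketch for spotting names that do not match; the helper `unmatched_districts` and the two-feature GeoJSON string are hypothetical stand-ins for `sanfran.json`:

```python
import json


def unmatched_districts(geojson_text, district_names, key="DISTRICT"):
    """Hypothetical helper: return the district names from the data that have
    no GeoJSON feature whose properties[key] equals them exactly."""
    geo = json.loads(geojson_text)
    geo_names = {feature["properties"].get(key) for feature in geo["features"]}
    return sorted(set(district_names) - geo_names)


# Toy GeoJSON with two features and no geometry, standing in for sanfran.json
toy_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"DISTRICT": "SOUTHERN"}, "geometry": None},
        {"type": "Feature", "properties": {"DISTRICT": "BAYVIEW"}, "geometry": None},
    ],
})

print(unmatched_districts(toy_geojson, ["SOUTHERN", "BAYVIEW", "TENDERLOIN"]))
# prints: ['TENDERLOIN']
```

Once `sanfran.json` has been downloaded, a call such as `unmatched_districts(open('sanfran.json').read(), df_incidents['PdDistrict'].dropna())` returning an empty list would mean every district in the data has a matching polygon.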

```python
# download the San Francisco police district GeoJSON file
!wget --quiet https://cocl.us/sanfran_geojson -O sanfran.json
```

```python
# count incidents per police district, rename the columns and sort by district name
df_districts = (
    df_incidents.groupby('PdDistrict')
    .size()
    .reset_index(name='Count')
    .rename(columns={'PdDistrict': 'District'})
    .sort_values(by='District')
)

sanfran_geo = 'sanfran.json'  # GeoJSON file with the police district boundaries

# create a base map of San Francisco
latitude = 37.77
longitude = -122.42
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# color each police district by its number of incidents
# (Map.choropleth matches the pinned folium 0.5.0; newer releases use
# folium.Choropleth(...).add_to(sanfran_map) instead)
sanfran_map.choropleth(
    geo_data=sanfran_geo,
    data=df_districts,
    columns=['District', 'Count'],
    key_on='feature.properties.DISTRICT',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Number of incidents per police district',
    reset=True
)

# display map
sanfran_map
```

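In this sample of the first 1,000 records, most incidents occur in the north-east of San Francisco.  
Treasure Island is shaded strongly as well. Because the choropleth colors whole police districts, this most likely reflects the count of the district the island belongs to rather than many incidents on the island itself.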