# <center>____________________________________________________________</center>

# <center>MAP OF SAN FRANCISCO - CRIME INCIDENTS IN 2016</center>

# <center>____________________________________________________________</center>

## Introduction

In this project, we create a map of San Francisco showing the crime incidents reported in 2016. To do that, we work with **Folium**, a free Python library built for visualizing geospatial data. Folium lets us create several types of Leaflet maps, and because the resulting maps are interactive, the library is also well suited to building dashboards.

## Dataset
Toolkits: This project relies heavily on [**pandas**](http://pandas.pydata.org/) and [**NumPy**](http://www.numpy.org/) for data wrangling, analysis, and visualization. The primary plotting library we will use is [**Folium**](https://github.com/python-visualization/folium/).

Dataset:

### San Francisco Police Department Incidents for the year 2016
The dataset consists of police department incidents from the San Francisco public data portal. The incidents are derived from the San Francisco Police Department (SFPD) Crime Incident Reporting system. The data is updated daily and covers the entire year of 2016. Addresses and locations have been anonymized by moving them to mid-block or to an intersection.

Each row of the data consists of 13 features:

> 1.  **IncidntNum**: Incident Number
> 2.  **Category**: Category of crime or incident
> 3.  **Descript**: Description of the crime or incident
> 4.  **DayOfWeek**: The day of week on which the incident occurred
> 5.  **Date**: The Date on which the incident occurred
> 6.  **Time**: The time of day on which the incident occurred
> 7.  **PdDistrict**: The police department district
> 8.  **Resolution**: The resolution of the crime in terms whether the perpetrator was arrested or not
> 9.  **Address**: The closest address to where the incident took place
> 10. **X**: The longitude value of the crime location
> 11. **Y**: The latitude value of the crime location
> 12. **Location**: A tuple of the latitude and the longitude values
> 13. **PdId**: The police department ID
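
To illustrate how the coordinate columns relate, the following sketch parses a `Location` string into a `(latitude, longitude)` pair using only the standard library. The helper name `parse_location` is hypothetical, and the sample value is copied from the first row of the dataset preview. Note that `Y` holds the latitude and `X` the longitude.

```python
import ast


def parse_location(location: str) -> tuple:
    """Parse a '(lat, lon)' string into a (latitude, longitude) tuple of floats.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    lat, lon = ast.literal_eval(location)
    return float(lat), float(lon)


# Sample value copied from the dataset preview
lat, lon = parse_location("(37.775420706711, -122.403404791479)")
print(lat, lon)
```

The first value corresponds to the `Y` column and the second to the `X` column, which is also the order Folium expects for marker locations.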


The original data is a ~32 MB .csv file with 150,500 records. To reduce complexity and improve performance, we will work with 10% of it (15,050 records). The full dataset can be found [here](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Police_Department_Incidents_-_Previous_Year__2016_.csv).
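
The notebook does not say how the 10% subset was produced. One way such a subset could be drawn with pandas is a reproducible random sample. The sketch below uses a small synthetic dataframe (`df_full` and `df_sample` are hypothetical names, and the seed is arbitrary), so it illustrates the idea rather than the method actually used.

```python
import pandas as pd

# Synthetic stand-in for the full dataset (hypothetical data, 1,000 rows)
df_full = pd.DataFrame({"IncidntNum": range(1000)})

# Draw a reproducible 10% random sample; random_state fixes the seed
df_sample = df_full.sample(frac=0.1, random_state=42).reset_index(drop=True)

print(df_sample.shape)  # (100, 1)
```

Fixing `random_state` means the same rows are selected on every run, which keeps any maps built from the sample reproducible.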

## Libraries

In [None]:
#!pip install pandas
#!pip install numpy

#!pip install folium

In [1]:
import numpy as np  # numerical computing library
import pandas as pd # primary data structure library

import folium
from folium import plugins

# <center>____________________________________________________________</center>

# <center>DATA ACQUISITION</center>
***

In [2]:
# Download the 10% version of the dataset and read it into a pandas dataframe
df_incidents = pd.read_csv('https://github.com/efeyemez/Portfolio/raw/main/Datasets/Police_Department_Incidents_-_Previous_Year__2016_Ten_Percent.csv')

print('Dataset downloaded and read into a dataframe!')

Dataset downloaded and read into a dataframe!


The first five rows of the dataset:

In [3]:
df_incidents.head(5)

Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId
0,120058272,WEAPON LAWS,POSS OF PROHIBITED WEAPON,Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212120
1,120058272,WEAPON LAWS,"FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE",Friday,01/29/2016 12:00:00 AM,11:00,SOUTHERN,"ARREST, BOOKED",800 Block of BRYANT ST,-122.403405,37.775421,"(37.775420706711, -122.403404791479)",12005827212168
2,141059263,WARRANTS,WARRANT ARREST,Monday,04/25/2016 12:00:00 AM,14:59,BAYVIEW,"ARREST, BOOKED",KEITH ST / SHAFTER AV,-122.388856,37.729981,"(37.7299809672996, -122.388856204292)",14105926363010
3,160013662,NON-CRIMINAL,LOST PROPERTY,Tuesday,01/05/2016 12:00:00 AM,23:50,TENDERLOIN,NONE,JONES ST / OFARRELL ST,-122.412971,37.785788,"(37.7857883766888, -122.412970537591)",16001366271000
4,160002740,NON-CRIMINAL,LOST PROPERTY,Friday,01/01/2016 12:00:00 AM,00:30,MISSION,NONE,16TH ST / MISSION ST,-122.419672,37.76505,"(37.7650501214668, -122.419671780296)",16000274071000


In [4]:
# Check the number of records and columns in the dataset.

df_incidents.shape

(15050, 13)

# <center>____________________________________________________________</center>

# <center>THE MAPS</center>
***

In [5]:
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42
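
The center above is hard-coded. A data-driven alternative, sketched here under the assumption that the mean of the incident coordinates is a reasonable map center, averages the `Y` (latitude) and `X` (longitude) values. The five coordinate pairs are those of the preview rows shown earlier, and the variable names are illustrative.

```python
from statistics import mean

# (Y, X) = (latitude, longitude) pairs from the five preview rows
coords = [
    (37.775421, -122.403405),
    (37.775421, -122.403405),
    (37.729981, -122.388856),
    (37.785788, -122.412971),
    (37.765050, -122.419672),
]

# Average each coordinate separately to get a candidate map center
center_lat = mean(lat for lat, _ in coords)
center_lng = mean(lng for _, lng in coords)

print(round(center_lat, 4), round(center_lng, 4))
```

On the full dataframe, the same idea would be `df_incidents.Y.mean()` and `df_incidents.X.mean()`. Whether that beats a hand-picked center depends on how evenly the incidents are spread.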

In [None]:
# create the map of SF
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map
sanfran_map

Now we superimpose the locations of the crimes onto the map:

In [None]:
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 15050 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add incidents to map
sanfran_map.add_child(incidents)


We can also add pop-up text that is displayed when a marker is clicked. Here each circle shows the category of the crime in its pop-up. Rather than adding separate location markers, we attach the text to the circles themselves and add each circle directly to a fresh map:

In [None]:
# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# loop through all the crimes and add each to the map
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(sanfran_map)

# show map
sanfran_map

Now we can see which crime category occurred at each circle. For convenience, we can use markers instead of circles and group the markers into clusters. Each cluster is then labeled with the number of crimes in that neighborhood. These clusters can be thought of as pockets of San Francisco that we can analyze separately.

To implement this, we start off by instantiating a *MarkerCluster* object and adding all the data points in the dataframe to this object.

In [None]:
# let's start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
sanfran_map

When you zoom out all the way, all markers are grouped into one cluster, the *global cluster*, of 15050 markers, which is the total number of crimes in our dataframe. Once we start zooming in, the *global cluster* breaks up into smaller clusters. Zooming in all the way results in individual markers, and clicking on one shows the type of crime.
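
Because the global cluster simply counts rows, its total can be cross-checked with pandas. The sketch below counts incidents per police district on a five-row stand-in built from the preview rows (the `preview` dataframe is hypothetical). On the real data, `df_incidents` would take its place. District counts are not the same grouping as the spatial clusters on the map, so this is only a sanity check on totals.

```python
import pandas as pd

# Stand-in for df_incidents, using the districts of the five preview rows
preview = pd.DataFrame(
    {"PdDistrict": ["SOUTHERN", "SOUTHERN", "BAYVIEW", "TENDERLOIN", "MISSION"]}
)

# Number of incidents per district
district_counts = preview["PdDistrict"].value_counts()
print(district_counts)

# The per-district counts add up to the number of rows,
# just as the global cluster shows the total number of markers
total = int(district_counts.sum())
print(total == len(preview))
```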