In [2]:
import pandas as pd
import numpy as np
import folium

San Francisco Police Department incidents for the year 2016, from the San Francisco public data portal. The incidents are derived from the San Francisco Police Department (SFPD) Crime Incident Reporting system. The dataset was updated daily and covers the entire year of 2016. Addresses and locations have been anonymized by moving them to mid-block or to an intersection. Note: this dataset is no longer available on the original website following a systems update in the department. The included link leads to a page explaining the system change that took place after this exercise was created.

IncidntNum: Incident Number

Category: Category of crime or incident

Descript: Description of the crime or incident

DayOfWeek: The day of week on which the incident occurred

Date: The Date on which the incident occurred

Time: The time of day on which the incident occurred

PdDistrict: The police department district

Resolution: The resolution of the crime in terms of whether the perpetrator was arrested or not

Address: The closest address to where the incident took place

X: The longitude value of the crime location

Y: The latitude value of the crime location

location: A tuple of the latitude and longitude values

PdId: The police department ID
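
Because `X` holds the longitude and `Y` the latitude, the two columns have to be swapped when building the `[latitude, longitude]` pairs that mapping libraries such as folium expect. A minimal sketch on a hypothetical two-row frame (the values below are illustrative and not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy rows; only the column names X and Y follow the dataset.
toy = pd.DataFrame({"X": [-122.42, -122.45], "Y": [37.72, 37.77]})

# Y is latitude and X is longitude, so Y comes first in each pair.
coords = [[lat, lng] for lat, lng in zip(toy["Y"], toy["X"])]
print(coords)
```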

In [3]:
df = pd.read_csv(r"C:\Users\nico_\Desktop\fichiers_csv\Police_Department_Incident_Reports__Historical_2003_to_May_2018_20240119.csv")
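
When only some columns are needed, one option is to let `pd.read_csv` skip the rest through its `usecols` parameter, which can reduce memory use on a file of this size. The sketch below uses a small in-memory CSV as a stand-in for the real file; the `Extra` column and the row values are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file (values are illustrative).
csv_text = """PdId,IncidntNum,Category,PdDistrict,X,Y,Extra
1,10,ROBBERY,INGLESIDE,-122.42,37.72,a
2,11,ARSON,PARK,-122.45,37.77,b
"""

wanted = ["IncidntNum", "Category", "PdDistrict", "X", "Y"]

# usecols restricts parsing to the listed columns; "PdId" and "Extra" are skipped.
toy = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(toy.shape)
```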

In [4]:
df.head()

Unnamed: 0,PdId,IncidntNum,Incident Code,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,...,Fix It Zones as of 2017-11-06 2 2,DELETE - HSOC Zones 2 2,Fix It Zones as of 2018-02-07 2 2,"CBD, BID and GBD Boundaries as of 2017 2 2","Areas of Vulnerability, 2016 2 2",Central Market/Tenderloin Boundary 2 2,Central Market/Tenderloin Boundary Polygon - Updated 2 2,HSOC Zones as of 2018-06-05 2 2,OWED Public Spaces 2 2,Neighborhoods 2
0,4133422003074,41334220,3074,ROBBERY,"ROBBERY, BODILY FORCE",Monday,11/22/2004,17:50,INGLESIDE,NONE,...,,,,,,,,,,
1,5118535807021,51185358,7021,VEHICLE THEFT,STOLEN AUTOMOBILE,Tuesday,10/18/2005,20:00,PARK,NONE,...,,,,,,,,,,
2,4018830907021,40188309,7021,VEHICLE THEFT,STOLEN AUTOMOBILE,Sunday,02/15/2004,02:00,SOUTHERN,NONE,...,,,,,,,,,,
3,11014543126030,110145431,26030,ARSON,ARSON,Friday,02/18/2011,05:27,INGLESIDE,NONE,...,,,,,1.0,,,,,94.0
4,10108108004134,101081080,4134,ASSAULT,BATTERY,Sunday,11/21/2010,17:00,SOUTHERN,NONE,...,,,,,2.0,,,,,32.0


In [5]:
df.shape

(2129525, 35)

In [8]:
df_incidents = df[['IncidntNum', 'Category', 'Descript', 'DayOfWeek', 'Date', 'Time',
       'PdDistrict', 'Resolution', 'Address', 'X', 'Y', 'location', 'PdId']]

In [26]:
df_incidents.shape

(2129525, 13)

In [9]:
df_incidents.isna().sum()

IncidntNum    0
Category      0
Descript      0
DayOfWeek     0
Date          0
Time          0
PdDistrict    1
Resolution    0
Address       0
X             0
Y             0
location      0
PdId          0
dtype: int64

In [28]:
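# Avoid naming the variable `max`, which would shadow the Python built-in
most_common_district = df_incidents["PdDistrict"].value_counts().idxmax()
most_common_district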

'SOUTHERN'

In [29]:
# Reassign instead of using inplace=True, which raises a SettingWithCopyWarning
# on a DataFrame obtained by selecting columns from another DataFrame
df_incidents = df_incidents.dropna()
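
Modifying a column selection in place is a common source of pandas' `SettingWithCopyWarning`. One way to sidestep it, sketched here on a hypothetical toy frame, is to take an explicit `.copy()` of the selection and reassign the result of `dropna()` rather than passing `inplace=True`:

```python
import pandas as pd

# Hypothetical toy data with one missing district (values are illustrative).
toy = pd.DataFrame({
    "Category": ["ROBBERY", "ARSON", "ASSAULT"],
    "PdDistrict": ["INGLESIDE", None, "SOUTHERN"],
    "Extra": [1, 2, 3],
})

# .copy() makes the selection an independent DataFrame, so later changes
# are unambiguous and never touch `toy`.
subset = toy[["Category", "PdDistrict"]].copy()

# Reassigning the result avoids inplace=True altogether.
subset = subset.dropna()
print(subset.shape)
```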


In [30]:
df_incidents.isna().sum()

IncidntNum    0
Category      0
Descript      0
DayOfWeek     0
Date          0
Time          0
PdDistrict    0
Resolution    0
Address       0
X             0
Y             0
location      0
PdId          0
dtype: int64

# First 100 crimes 

In [31]:
limit = 100
df_incidents = df_incidents.iloc[0:limit, :]

In [33]:
df_incidents.shape

(100, 13)
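
Taking the first 100 rows keeps the map light, but those rows simply follow the file order. If a more varied subset is wanted, a random sample is one alternative. This is a sketch on a hypothetical toy frame, with an arbitrary `random_state` chosen only for reproducibility:

```python
import pandas as pd

# Hypothetical toy data standing in for df_incidents (values are illustrative).
toy = pd.DataFrame({
    "Category": ["ROBBERY", "ARSON", "ASSAULT", "VEHICLE THEFT", "ROBBERY"],
    "PdDistrict": ["INGLESIDE", "PARK", "SOUTHERN", "PARK", "SOUTHERN"],
})

# Draw 3 rows at random without replacement; random_state fixes the draw.
sample = toy.sample(n=3, random_state=42)
print(sample.shape)
```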

In [34]:
# San Francisco latitude and longitude 
latitude = 37.77
longitude = -122.42

In [35]:
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)
sanfran_map

# Location of crimes on the map

In [38]:
incidents = folium.FeatureGroup()

# Add one circle marker per incident (Y is latitude, X is longitude)
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# Add incidents to map
sanfran_map.add_child(incidents)

# Location and type of crimes on the map

In [39]:
incidents = folium.FeatureGroup()

# Add one circle marker per incident (Y is latitude, X is longitude)
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# Add a pop-up marker showing the crime category at each location
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker([lat, lng], popup=label).add_to(sanfran_map)

# Add incidents to map
sanfran_map.add_child(incidents)

# Group the markers into clusters

In [47]:
from folium import plugins

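# Start again from a clean map so earlier markers are not carried over
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# Create a marker cluster and attach it to the map
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# Add each incident to the cluster, with its category as the pop-up label
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)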

# display map
sanfran_map