# Downloading and Prepping Data <a id="2"></a>

Import Primary Modules:

In [1]:
import numpy as np  # useful for scientific computing in Python
import pandas as pd # primary data structure library

In [2]:
#!conda install -c conda-forge folium=0.5.0 --yes
import folium

# Maps with Markers <a id="6"></a>


Let's import the data on Seattle police department incidents using the *pandas* `read_csv()` function.

Download the dataset and read it into a *pandas* dataframe:

In [3]:
df_incidents = pd.read_csv('SPD_Crime_Data__2008-Present.csv')
print('Dataset downloaded and read into a pandas dataframe!')

Dataset downloaded and read into a pandas dataframe!


In [4]:
## Rename some columns so that code written earlier for another dataset can be reused
df_incidents.rename(columns = {'Report Number':'IncidntNum', 
                               'Offense ID':'PdId', 'Offense':'Descript',
                               'Crime Against Category':'Category', 'Precinct':'PdDistrict',
                              }, inplace = True)



In [5]:
df_incidents = df_incidents.sort_values(by='Report DateTime',
                                       ascending=False)
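
One caveat about the sort above: `read_csv()` was called without date parsing, so `Report DateTime` is most likely stored as text and sorted character by character rather than chronologically. The preview further down is consistent with this, since it lists `12:56:55 AM` ahead of `12:50:53 PM` on the same day. A minimal sketch of parsing the column first, on a made-up three-row frame; the format string is an assumption inferred from the sample rows, and the `Report DateTime Parsed` column name is hypothetical:

```python
import pandas as pd

# Toy frame standing in for df_incidents; the third timestamp is made up.
toy = pd.DataFrame({
    'Report DateTime': [
        '12/31/2021 12:56:55 AM',
        '12/31/2021 12:50:53 PM',
        '01/02/2022 09:00:00 AM',
    ]
})

# Sorting the raw strings compares characters, not points in time.
by_text = toy.sort_values(by='Report DateTime', ascending=False)

# Parse to real timestamps first (format assumed from the sample rows), then sort.
toy['Report DateTime Parsed'] = pd.to_datetime(
    toy['Report DateTime'], format='%m/%d/%Y %I:%M:%S %p'
)
by_time = toy.sort_values(by='Report DateTime Parsed', ascending=False)

print(by_text['Report DateTime'].tolist())
print(by_time['Report DateTime'].tolist())
```

With the parsed column, the January 2022 report comes first; with the text column, it comes last because its string starts with `0`.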

Let's take a look at the first five items in our dataset.

In [6]:
#df_incidents.head()

Let's find out how many entries there are in our dataset.

In [7]:
df_incidents.shape

(965785, 17)

So the dataframe consists of 965,785 records with 17 columns. As the file name indicates, the data runs from 2008 to the present. In order to reduce computational cost, let's just work with the first 500 incidents of the sorted dataframe.

In [8]:
# get a sample of 500 crimes in the df_incidents dataframe
limit = 500

## Random sampling through historical data
#tmp_df_incidents = df_incidents.sample(limit)

## 500 of most recent crimes
tmp_df_incidents = df_incidents.iloc[0:limit, :]

Let's confirm that our dataframe now consists only of 500 crimes.

In [9]:
print(tmp_df_incidents.shape)  # printed explicitly: a cell only displays its last expression
tmp_df_incidents.head()

Unnamed: 0,IncidntNum,PdId,Offense Start DateTime,Offense End DateTime,Report DateTime,Group A B,Category,Offense Parent Group,Descript,Offense Code,PdDistrict,Sector,Beat,MCPP,100 Block Address,Longitude,Latitude
948440,2021-342994,30739230754,12/30/2021 08:15:00 PM,12/30/2021 08:16:00 PM,12/31/2021 12:56:55 AM,A,PROPERTY,ROBBERY,Robbery,120,N,B,B1,BALLARD SOUTH,49XX BLOCK OF 17TH AVE NW,-122.37889,47.664725
948355,2021-343337,30741672454,12/31/2021 10:11:00 AM,,12/31/2021 12:50:53 PM,A,SOCIETY,WEAPON LAW VIOLATIONS,Weapon Law Violations,520,E,C,C3,CENTRAL AREA/SQUIRE PARK,,0.0,0.0
948354,2021-343337,30741554145,12/31/2021 10:11:00 AM,,12/31/2021 12:50:53 PM,A,PERSON,ASSAULT OFFENSES,Aggravated Assault,13A,E,C,C3,CENTRAL AREA/SQUIRE PARK,,0.0,0.0
948356,2021-343281,30741583215,12/31/2021 02:25:00 AM,12/31/2021 02:35:00 AM,12/31/2021 12:47:56 PM,A,PROPERTY,MOTOR VEHICLE THEFT,Motor Vehicle Theft,240,N,N,N3,BITTERLAKE,107XX BLOCK OF AURORA AVE N,-122.344721,47.706707
948357,2021-343355,30741536107,12/31/2021 10:27:00 AM,,12/31/2021 12:44:34 PM,B,SOCIETY,DRIVING UNDER THE INFLUENCE,Driving Under the Influence,90D,SW,F,F2,HIGHLAND PARK,DELRIDGE WAY SW / SW HENDERSON ST,-122.360099,47.522861
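
Two of the rows in this preview have `Longitude` and `Latitude` equal to `0.0`, which presumably means the location was not recorded. Markers for such rows would be drawn at latitude 0, longitude 0, far from Seattle. A minimal sketch of filtering them out before mapping, on a toy frame modelled on the preview; the names `has_location` and `located` are hypothetical:

```python
import pandas as pd

# Toy rows modelled on the preview; 0.0 coordinates are assumed to mean "no location".
toy = pd.DataFrame({
    'Category': ['PROPERTY', 'SOCIETY', 'PERSON', 'PROPERTY'],
    'Latitude': [47.664725, 0.0, 0.0, 47.706707],
    'Longitude': [-122.37889, 0.0, 0.0, -122.344721],
})

# Keep only rows where both coordinates are present and non-zero.
has_location = (
    toy['Latitude'].notna() & toy['Longitude'].notna()
    & (toy['Latitude'] != 0) & (toy['Longitude'] != 0)
)
located = toy[has_location]

print(len(toy), 'rows before,', len(located), 'rows after')
```

The same mask could be applied to `tmp_df_incidents`, at the cost of ending up with fewer than 500 markers.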


Now that we have reduced the data a little bit, let's visualize where these crimes took place in the city of Seattle. We will use the default style and we will initialize the zoom level to 12.

In [10]:
# Seattle latitude and longitude values
latitude = 47.608013
longitude = -122.335167

In [11]:
# create map and display it
seattle_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of Seattle
#seattle_map

Now let's superimpose the locations of the crimes onto the map. The way to do that in **Folium** is to create a *feature group* with its own features and style and then add it to the seattle_map.

In [12]:
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 500 crimes and add each to the incidents feature group
for lat, lng in zip(tmp_df_incidents.Latitude, tmp_df_incidents.Longitude):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add incidents to map
# seattle_map.add_child(incidents)

You can also add pop-up text that is displayed when you click a marker. Let's make each marker display the category of the crime when clicked.

In [13]:
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 500 crimes and add each to the incidents feature group
for lat, lng in zip(tmp_df_incidents.Latitude, tmp_df_incidents.Longitude):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(tmp_df_incidents.Latitude)
longitudes = list(tmp_df_incidents.Longitude)
labels = list(tmp_df_incidents.Category)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(seattle_map)    
    
# add incidents to map
seattle_map.add_child(incidents)

Isn't this really cool? Now you are able to know what crime category occurred at each marker.

If you find the map too congested with all these markers, there are two remedies. The simpler one is to remove the location markers and attach the pop-up text to the circle markers themselves, as follows:

In [14]:
# create map and display it
seattle_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# loop through the 500 crimes and add each to the map
for lat, lng, label in zip(tmp_df_incidents.Latitude, tmp_df_incidents.Longitude, tmp_df_incidents.Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(seattle_map)

# show map
seattle_map

The other, more thorough remedy is to group the markers into clusters. Markers that lie close together at the current zoom level are merged into one cluster, which is labelled with the number of crimes it contains. These clusters can be thought of as pockets of Seattle which you can then analyze separately.

To implement this, we start off by instantiating a *MarkerCluster* object and adding all the data points in the dataframe to this object.

In [15]:
from folium import plugins

# let's start again with a clean copy of the map of Seattle
seattle_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(seattle_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(tmp_df_incidents.Latitude, tmp_df_incidents.Longitude, tmp_df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
seattle_map

Notice how when you zoom out all the way, all markers are grouped into one cluster, *the global cluster*, of 500 markers or crimes, which is the total number of crimes in our dataframe. Once you start zooming in, the *global cluster* will start breaking up into smaller clusters. Zooming in all the way will result in individual markers.