# Mapping Location of Interest

These maps display field operational requests from the districts as markers labeled with the relevant category. The dataframe lists the districts along with the category of each request from that district. Ultimately, the map is a representation and categorization that can be related to the dataframe in the main report for each of the mission or research branches (MSCOE, AVIATION, SURFACE, STIC) discussed in previous sections.
Additionally, a Foursquare rendering of locations of interest in the area is provided. I chose to center the work on San Francisco, where the districts mapped with Folium are located.


Import Primary Modules:

In [38]:
import numpy as np  # useful for scientific computing in Python
import pandas as pd # primary data structure library
import requests # library to handle requests

import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from PIL import Image # converting images into arrays
from IPython.display import Image # note: this rebinds the name Image and shadows the PIL import above
from IPython.core.display import HTML 
    
# transform JSON into a pandas dataframe
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

# define credentials for Foursquare access
CLIENT_ID = 'V3R4GWWHBFL23A4OEPUFIUX3XRT2GICELQSPXRHJ1DAS0C2A' # your Foursquare ID
CLIENT_SECRET = 'WMLA0CKK2CTN2F1D0GBGSKN00EVX5LL4ZLQZSPYJZIOTEEPE' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Folium installed
Libraries imported.
Your credentials:
CLIENT_ID: V3R4GWWHBFL23A4OEPUFIUX3XRT2GICELQSPXRHJ1DAS0C2A
CLIENT_SECRET:WMLA0CKK2CTN2F1D0GBGSKN00EVX5LL4ZLQZSPYJZIOTEEPE
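
Hard-coding the secret and printing it leaves it readable in the saved notebook output. A minimal sketch of one alternative, assuming the credentials are exported as environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_credentials` and `mask` are hypothetical and not part of the Foursquare API:

```python
import os

def load_credentials(env=None):
    """Read the Foursquare credentials from environment variables.

    The variable names are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret

def mask(secret, visible=4):
    """Keep the first few characters and replace the rest with asterisks."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with placeholder values instead of real credentials
demo_env = {'FOURSQUARE_CLIENT_ID': 'EXAMPLEID123',
            'FOURSQUARE_CLIENT_SECRET': 'EXAMPLESECRET456'}
demo_id, demo_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + demo_id)
print('CLIENT_SECRET: ' + mask(demo_secret))
```

Calling `load_credentials()` with no argument reads from the real environment, so the notebook itself never contains the secret.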


# Define location data as latitude/longitude and specify search query preference for vegan restaurant nearby

In [39]:

address = 'Lombard St, San Francisco, CA'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

search_query = 'Vegan'
radius = 500
print(search_query + ' .... OK!')

# Here's where we find the data
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url
results = requests.get(url).json()

results
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
# For future reference: use pandas.json_normalize instead
dataframe = json_normalize(venues)
dataframe.head()

37.802076 -122.4188091
Vegan .... OK!




| | id | name | categories | referralId | hasPerk | location.lat | location.lng | location.labeledLatLngs | location.distance | location.postalCode | location.cc | location.city | location.state | location.country | location.formattedAddress |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 51ba997b8bbd6f2322d7d5c5 | vegan09.com | [{'id': '4bf58dd8d48988d130941735', 'name': 'B... | v-1590321179 | False | 37.803036 | -122.42486 | [{'label': 'display', 'lat': 37.80303649357777... | 542 | 94109 | US | San Francisco | CA | United States | [San Francisco, CA 94109, United States] |
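
The search URL in the cell above is assembled with `str.format`, which works for a one-word query such as `Vegan` but does not escape spaces or other special characters. A small sketch of the same request built with `urllib.parse.urlencode`; the helper name `build_search_url` and the placeholder credentials are assumptions for illustration, while the endpoint and parameter names are the ones already used above:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'  # endpoint used in the cell above

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Assemble the venue search URL with percent-encoded parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll pair readable instead of encoding it
    return SEARCH_ENDPOINT + '?' + urlencode(params, safe=',')

# Placeholder credentials; a two-word query shows the escaping
demo_url = build_search_url('MY_ID', 'MY_SECRET', 37.802076, -122.4188091,
                            '20180604', 'Vegan food', 500, 30)
print(demo_url)
```

`requests.get` also accepts a `params` dictionary and performs the same encoding, which avoids building the string by hand.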


The search returns one venue matching the 'Vegan' query in our area of interest, named vegan09.com.
Foursquare shows its venue_id = '51ba997b8bbd6f2322d7d5c5' in the dataframe column "id".
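
The table reports a `location.distance` of 542 for this venue, slightly more than the requested `radius` of 500. The value can be cross-checked from the two coordinate pairs with the haversine formula. This is a sketch that assumes a spherical Earth with a mean radius of 6,371 km; `haversine_m` is a hypothetical helper, and how Foursquare computes its own distance is not stated in this notebook:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (a common approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

search_center = (37.802076, -122.4188091)  # geocoded Lombard St point printed above
venue = (37.803036, -122.42486)            # location.lat / location.lng of vegan09.com
distance = haversine_m(*search_center, *venue)
print(round(distance))  # should land close to the 542 reported by Foursquare
```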

In [40]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

| | name | categories | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | vegan09.com | Building | 37.803036 | -122.42486 | [{'label': 'display', 'lat': 37.80303649357777... | 542 | 94109 | US | San Francisco | CA | United States | [San Francisco, CA 94109, United States] | 51ba997b8bbd6f2322d7d5c5 |
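
The `categories` cell holds a list of dictionaries, and the cleaning step keeps only the `name` of the first entry. A standalone sketch of that extraction on plain Python data; `first_category_name` is a hypothetical helper, and the sample combines the `id` shown in the raw table with the `Building` name shown in the filtered table:

```python
def first_category_name(categories):
    """Return the name of the first category, or None when the list is empty."""
    if not categories:
        return None
    return categories[0]['name']

# Sample shaped like the 'categories' cell of the venue above
sample = [{'id': '4bf58dd8d48988d130941735', 'name': 'Building'}]
print(first_category_name(sample))
print(first_category_name([]))
```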


In [41]:
# Get a list of names under this category and check more details on one of interest
dataframe_filtered.name
latitude = 37.803036
longitude = -122.42486  # west longitude is negative

In [34]:
venue_id = '51ba997b8bbd6f2322d7d5c5' # ID of our place of interest retrieved from dataframe ID column
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)


In [42]:

venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred on the venue
# add the venue (vegan09.com) as a red circle marker
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='vegan09.com',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(venues_map)

<folium.features.CircleMarker at 0x7f871f02d2e8>


For this report, the map is centered on San Francisco, where Foursquare located our vegan venue at latitude = 37.803036 and longitude = -122.42486.
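
Longitudes west of the prime meridian are negative, so a point in San Francisco has a longitude near -122.4, and dropping the sign moves the point to the other side of the globe. A small sanity check before plotting can catch this. The bounding box below is a rough approximation chosen for this sketch, and `in_san_francisco` is a hypothetical helper:

```python
# Approximate bounding box around San Francisco (an assumption for this sketch)
SF_BOUNDS = {'lat_min': 37.70, 'lat_max': 37.84, 'lng_min': -122.52, 'lng_max': -122.35}

def in_san_francisco(lat, lng, bounds=SF_BOUNDS):
    """Return True when the point falls inside the approximate city bounds."""
    return (bounds['lat_min'] <= lat <= bounds['lat_max']
            and bounds['lng_min'] <= lng <= bounds['lng_max'])

print(in_san_francisco(37.803036, -122.42486))  # venue with a negative (west) longitude
print(in_san_francisco(37.803036, 122.42486))   # same value with the sign dropped
```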

In [43]:
vegan_map = folium.Map(location=[37.774929, -122.419418], zoom_start=14) 
vegan_map

# Stamen Terrain Maps 

These maps feature hill shading and natural vegetation colors. The markers are then added to the map from the dataframe data. I chose the Stamen Terrain map because terrain and environment are important elements for contextual analysis of the data.


In [44]:
filename = "https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Police_Department_Incidents_-_Previous_Year__2016_.csv"
df_incidents = pd.read_csv(filename)

print('Dataset downloaded and read into a pandas dataframe!')
df = df_incidents.copy()  # independent copy so the raw data is preserved
#
#headers = ["IncidntNum","Category","Descript","DayOfWeek","Date","Time","Field","Desc","Date","District","X", "Y","Location"]
#df_map = pd.read_csv(filename, names = headers)

Dataset downloaded and read into a pandas dataframe!


In [45]:
# Read the CSV into df_incidents and copy it to df to preserve the raw data. Work with df_incidents and df_map.


In [47]:
# Keep only columns to use

df_incidents.drop(['IncidntNum', 'DayOfWeek', 'Time', 'Address', 'PdId', 'Resolution', 'Date'], axis=1, inplace=True)
#
#df_map.drop ('IncidntNum', axis = 1, inplace =True)
#df_map.drop ('DayOfWeek', axis = 1, inplace =True)
#df_map.drop ('Time', axis = 1, inplace =True)
#df_map.drop ('Address', axis = 1, inplace =True)
#df_map.drop ('PdId', axis = 1, inplace =True)
#df_map.drop ('Resolution', axis = 1, inplace =True)

In [60]:
# I started with this data frame and wrangled the data so it was relevant to this project keeping the location lat/lon constant.
# I removed columns that were not of interest.
df_incidents.head(20)

| | Category | Descript | PdDistrict | X | Y | Location |
|---|---|---|---|---|---|---|
| 0 | MSCOE | POSS OF PROHIBITED WEAPON | D8 | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) |
| 1 | MSCOE | FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE | D8 | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) |
| 2 | SURFACE | WARRANT ARREST | D11 | -122.388856 | 37.729981 | (37.7299809672996, -122.388856204292) |
| 3 | AVIATION | LOST PROPERTY | D5 | -122.412971 | 37.785788 | (37.7857883766888, -122.412970537591) |
| 4 | AVIATION | LOST PROPERTY | D7 | -122.419672 | 37.76505 | (37.7650501214668, -122.419671780296) |
| 5 | Network | BATTERY | D1 | -122.426077 | 37.788019 | (37.788018555829, -122.426077177375) |
| 6 | COMMS | PAROLE VIOLATION | D8 | -122.405721 | 37.780879 | (37.7808789360214, -122.405721454567) |
| 7 | AVIATION | FIRE REPORT | D5 | -122.411778 | 37.783981 | (37.7839805592634, -122.411778295992) |
| 8 | SURFACE | WARRANT ARREST | D8 | -122.393357 | 37.775788 | (37.7757876218293, -122.393357241451) |
| 9 | LE | FOUND PERSON | D11 | -122.387182 | 37.720967 | (37.7209669615499, -122.387181635995) |
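
To relate the map back to the branch breakdown, the requests can be tallied by branch and by district. The sketch below uses only the ten rows displayed above, copied by hand, so the counts describe this preview and not the full data set:

```python
from collections import Counter

# (Category, PdDistrict) pairs copied from the ten rows displayed above
rows = [
    ('MSCOE', 'D8'), ('MSCOE', 'D8'), ('SURFACE', 'D11'), ('AVIATION', 'D5'),
    ('AVIATION', 'D7'), ('Network', 'D1'), ('COMMS', 'D8'), ('AVIATION', 'D5'),
    ('SURFACE', 'D8'), ('LE', 'D11'),
]

by_branch = Counter(category for category, _ in rows)
by_district = Counter(district for _, district in rows)

print(by_branch.most_common())
print(by_district.most_common())
```

On the dataframe itself, `df_incidents['Category'].value_counts()` gives the same kind of tally.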


In [59]:
# Replace data in Districts and Category
#headers = ["Field","Desc","Date","District","X", "Y","Location"]
#df_map = pd.read_csv(filename, names = headers)

df_incidents.replace({'PdDistrict': {'SOUTHERN': 'D8', 'BAYVIEW': 'D11','MISSION':'D7','TENDERLOIN':'D5','NORTHERN':'D1'}}, inplace=True)
df_incidents.replace({'PdDistrict': {'TARAVAL': 'D13', 'CENTRAL': 'D6','INGLESIDE':'D9','PARK':'D4','RICHMOND':'D10'}}, inplace=True)

df_incidents.replace({'Category': {'WARRANTS': 'SURFACE', 'WEAPON LAWS': 'MSCOE','NON-CRIMINAL':'AVIATION','BURGLARY':'EW'}}, inplace=True)
df_incidents.replace({'Category': {'ASSAULT': 'Network', 'OTHER OFFENSES': 'COMMS','ROBBERY':'STIC','FRAUD':'IT'}}, inplace=True)
df_incidents.replace({'Category': {'LARCENY/THEFT': 'REACT', 'MISSING PERSON': 'LE','VEHICLE THEFT':'Intel','VANDALISM':'AoA'}}, inplace=True)
# 'MISSING PERSON', 'VEHICLE THEFT' and 'VANDALISM' are already relabeled by the previous call, so only the new key is needed here
df_incidents.replace({'Category': {'RECOVERED VEHICLE': 'Explore'}}, inplace=True)
# a dict passed to replace() matches whole cell values only, so e.g. 'GRAND THEFT FROM LOCKED AUTO' is left unchanged
df_incidents.replace({'Descript': {'AUTO': 'VESSEL', 'PROPERTY': 'at SEA','FRAUDULENT GAME OR TRICK':'LE'}}, inplace=True)

df_incidents.head(50)

| | Category | Descript | PdDistrict | X | Y | Location |
|---|---|---|---|---|---|---|
| 0 | MSCOE | POSS OF PROHIBITED WEAPON | D8 | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) |
| 1 | MSCOE | FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE | D8 | -122.403405 | 37.775421 | (37.775420706711, -122.403404791479) |
| 2 | SURFACE | WARRANT ARREST | D11 | -122.388856 | 37.729981 | (37.7299809672996, -122.388856204292) |
| 3 | AVIATION | LOST PROPERTY | D5 | -122.412971 | 37.785788 | (37.7857883766888, -122.412970537591) |
| 4 | AVIATION | LOST PROPERTY | D7 | -122.419672 | 37.76505 | (37.7650501214668, -122.419671780296) |
| 5 | Network | BATTERY | D1 | -122.426077 | 37.788019 | (37.788018555829, -122.426077177375) |
| 6 | COMMS | PAROLE VIOLATION | D8 | -122.405721 | 37.780879 | (37.7808789360214, -122.405721454567) |
| 7 | AVIATION | FIRE REPORT | D5 | -122.411778 | 37.783981 | (37.7839805592634, -122.411778295992) |
| 8 | SURFACE | WARRANT ARREST | D8 | -122.393357 | 37.775788 | (37.7757876218293, -122.393357241451) |
| 9 | LE | FOUND PERSON | D11 | -122.387182 | 37.720967 | (37.7209669615499, -122.387181635995) |
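
The successive `replace` calls on `Category` amount to a single lookup table from the original label to a branch. A plain-Python sketch of that table (not the pandas call itself; `relabel` is a hypothetical helper). It maps 'MISSING PERSON' to 'LE', consistent with the FOUND PERSON row labeled LE in the output above:

```python
# Original category -> branch label, collected from the replace() calls above
CATEGORY_MAP = {
    'WARRANTS': 'SURFACE', 'WEAPON LAWS': 'MSCOE', 'NON-CRIMINAL': 'AVIATION',
    'BURGLARY': 'EW', 'ASSAULT': 'Network', 'OTHER OFFENSES': 'COMMS',
    'ROBBERY': 'STIC', 'FRAUD': 'IT', 'LARCENY/THEFT': 'REACT',
    'MISSING PERSON': 'LE', 'VEHICLE THEFT': 'Intel', 'VANDALISM': 'AoA',
    'RECOVERED VEHICLE': 'Explore',
}

def relabel(category, mapping=CATEGORY_MAP):
    """Return the branch label; values without a mapping pass through unchanged."""
    return mapping.get(category, category)

# 'ARSON' stands in for any value that has no entry in the table
print([relabel(c) for c in ['WARRANTS', 'MISSING PERSON', 'ARSON']])
```

Keeping the table in one dictionary makes it easy to see that each original label can map to only one branch.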


In [61]:
df_incidents.replace({'Descript': {'AUTO': 'VESSEL', 'PROPERTY': 'at SEA','FRAUDULENT GAME OR TRICK':'LE'}}, inplace=True)

So each row consists of 6 features:

> 1. **Category**: Technical branch
> 2. **Descript**: Description of the request
> 3. **PdDistrict**: The district of interest
> 4. **X**: The longitude value of the request location
> 5. **Y**: The latitude value of the request location
> 6. **Location**: A tuple of the latitude and the longitude values
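
The `Location` value repeats `Y` and `X` at higher precision. Assuming the column arrives from the CSV as a string such as `"(37.775420706711, -122.403404791479)"`, it can be parsed into a numeric tuple and compared against the rounded columns; `parse_location` is a hypothetical helper:

```python
from ast import literal_eval

def parse_location(text):
    """Parse a '(lat, lng)' string into a tuple of two floats."""
    lat, lng = literal_eval(text)
    return float(lat), float(lng)

# Location string of row 0 in the table above
lat, lng = parse_location('(37.775420706711, -122.403404791479)')

# Compare with the rounded Y (latitude) and X (longitude) values of the same row
print(abs(lat - 37.775421) < 1e-6, abs(lng - (-122.403405)) < 1e-6)
```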


Originally the set contained over 150K rows. After keeping only the first 100 rows, the cleaned data set contains 100 rows and 6 columns.

In [62]:
# get the first 100 crimes in the df_incidents dataframe
limit = 100
df_incidents = df_incidents.iloc[0:limit, :]
df_incidents.shape

(100, 6)

In [63]:
# Replace lat/lon for realistic districts with categorical requests
df_incidents.shape
#df_ca.set_index('Country', inplace=True)
#df_incidents['Total'] = df_incidents['Total'] = df_can.sum(axis=1)
#df_incidents.set_index('PdDistrict', inplace=True)
#df_incidentsgroup = df_incidents.groupby['PdDistricts'].sum
#df_incidents.replace({'X': {-122.4: -74.6}}, inplace=True)

print(df_incidents)
#df_incidents[50:100]
#df_incidents.dtypes

    Category                                           Descript PdDistrict  \
0      MSCOE                          POSS OF PROHIBITED WEAPON         D8   
1      MSCOE     FIREARM, LOADED, IN VEHICLE, POSSESSION OR USE         D8   
2    SURFACE                                     WARRANT ARREST        D11   
3   AVIATION                                      LOST PROPERTY         D5   
4   AVIATION                                      LOST PROPERTY         D7   
..       ...                                                ...        ...   
95     COMMS  FRAUDULENT GAME OR TRICK, OBTAINING MONEY OR P...        D13   
96  AVIATION                                         AIDED CASE         D7   
97        EW                           BURGLARY, UNLAWFUL ENTRY         D4   
98     REACT                     GRAND THEFT FROM UNLOCKED AUTO         D4   
99     REACT                       GRAND THEFT FROM LOCKED AUTO        D13   

             X          Y        

Now that we have reduced the data, let's visualize where these requests are located in the city of San Francisco. We will use the default map style and initialize the zoom level to 12.

In [69]:
# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

In [70]:
# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)
# display the map of San Francisco
sanfran_map

Now let's superimpose the locations of the district requests onto the map. I do that by defining a feature group of circle markers and adding popups that show the technical branch of each request.

In [73]:
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 requests and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df_incidents.Y)
longitudes = list(df_incidents.X)
labels = list(df_incidents.Category)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(sanfran_map)    
    
# add incidents to map
sanfran_map.add_child(incidents)

To view the data on the map with less clutter, we can drop the separate pin markers and attach the popups to the circle markers themselves, and then consolidate nearby markers into clusters.

In [75]:
# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# loop through the 100 crimes and add each to the map
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(sanfran_map)

# show map
sanfran_map
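
Every circle in the loop above shares the same yellow outline and blue fill, so the branch is visible only after opening a popup. One option is to pick the color from the category. The palette below is purely an assumption for illustration, covering the four branches named in the introduction, and `color_for` is a hypothetical helper:

```python
# Hypothetical palette for the four branches named in the introduction
BRANCH_COLORS = {'MSCOE': 'red', 'AVIATION': 'blue', 'SURFACE': 'green', 'STIC': 'purple'}
DEFAULT_COLOR = 'gray'  # used for any category without an entry in the palette

def color_for(category):
    """Return the marker color for a category, falling back to the default."""
    return BRANCH_COLORS.get(category, DEFAULT_COLOR)

for category in ['MSCOE', 'SURFACE', 'COMMS']:
    print(category, color_for(category))
```

The returned string could be passed as the `color` or `fill_color` argument of `CircleMarker` inside the loop.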

In [76]:
from folium import plugins

# let's start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a marker cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the marker cluster
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map
sanfran_map
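
The cluster view replaces groups of nearby markers with a single counted marker. The idea of consolidating nearby points can be illustrated with a fixed grid. This is a simplified sketch and not the algorithm used by the `MarkerCluster` plugin; the cell size of 0.01 degrees and the helper `grid_clusters` are assumptions:

```python
from collections import defaultdict
from math import floor

def grid_clusters(points, cell_deg=0.01):
    """Group (lat, lng) points by the grid cell of size cell_deg that contains them."""
    cells = defaultdict(list)
    for lat, lng in points:
        key = (floor(lat / cell_deg), floor(lng / cell_deg))
        cells[key].append((lat, lng))
    return dict(cells)

# Y/X values of the four D8 rows displayed earlier in this section
points = [(37.775421, -122.403405), (37.775421, -122.403405),
          (37.780879, -122.405721), (37.775788, -122.393357)]

clusters = grid_clusters(points)
for cell, members in clusters.items():
    print(cell, len(members))
```

A coarser `cell_deg` merges more points into each cell, which loosely mirrors how the cluster counts grow when zooming out of the map.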