# Further Visualisation

Because the categorical nature of the data ruled out more detailed inferential statistical analysis, further exploratory visualisation was undertaken. It should be noted, however, that this visualisation would also become part of the final presentation to the traveller. It would be important for the traveller to see the crime, venue and restaurant data presented in this manner.


In [1]:
# Import Pandas
import pandas as pd

# Import Numpy
import numpy as np

# Use the inline backend to generate the plots within the browser
%matplotlib inline 

import matplotlib as mpl
import matplotlib.pyplot as plt

import seaborn as sns
sns.set_style("darkgrid")

font = {'size'   : 14}
mpl.rc('font', **font)

# Use Folium to display the Maps for Visualisation
import folium
from folium.plugins import MarkerCluster
from folium.plugins import FastMarkerCluster
from folium import plugins

from folium.plugins import HeatMap


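# DistanceMetric lives in sklearn.metrics in newer scikit-learn releases
try:
    from sklearn.metrics import DistanceMetric
except ImportError:
    from sklearn.neighbors import DistanceMetric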



In [2]:
# Very useful function to calculate the great circle distance between two points
# Found here and adapted:
# https://stackoverflow.com/questions/40452759/pandas-latitude-longitude-to-distance-between-successive-rows

def haversine(row):
    dist = DistanceMetric.get_metric('haversine')
   
    lat1, lon1, lat2, lon2 = map(np.radians, [row['latitude'], 
                                              row['longitude'], 
                                              row['ven_lat'], 
                                              row['ven_lon']])

    X = [[lat1, lon1],
         [lat2, lon2]]
    
    kms = 6367  # approximate Earth radius in kilometres
    
    return kms * dist.pairwise(X)[0][1]
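
The row-wise function above builds a `DistanceMetric` object for every row. As a hedged alternative, the haversine formula can be written directly in NumPy so that whole columns are processed at once. This is a minimal sketch, not the notebook's method; the name `haversine_km` and the sample coordinates are assumptions made for illustration.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6367.0):
    """Great-circle distance in km between points given in degrees.

    Accepts scalars or NumPy arrays and broadcasts them against each other.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the squared half-chord length between the points
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# Hypothetical usage: distances from three points to one fixed point
lats = np.array([41.0, 42.0, 41.0])
lons = np.array([-87.0, -87.0, -87.0])
dists = haversine_km(lats, lons, 41.0, -87.0)
```

Because the function broadcasts, passing the crime latitude and longitude columns together with a single venue's coordinates should return one distance per crime without calling `apply`.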

In [3]:
# Import the Pickle of the Crimes DataFrame
df_crimes = pd.read_pickle('./Pickles/crimes.pkl')
df_crimes.drop('index', inplace=True, axis=1)

# Import the Pickle of the Top Venues DataFrame
df_top_venues = pd.read_pickle('./Pickles/top_venues.pkl')
df_top_venues['name'] = df_top_venues['name'].str.replace("'",'')
df_top_venues['name'] = df_top_venues['name'].str.replace("&",' and ')
df_top_venues['name'] = df_top_venues['name'].str.replace(",",'')

# Import the Pickle of the Restaurants DataFrame
df_rest = pd.read_pickle('./Pickles/restaurants.pkl')

# Strip apostrophes and commas and spell out '&' so the names display correctly
df_rest['name'] = df_rest['name'].str.replace("'",'')
df_rest['name'] = df_rest['name'].str.replace("&",' and ')
df_rest['name'] = df_rest['name'].str.replace(",",'')
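
If the characters are removed only because they interfere with the HTML used in the map popups (an assumption; the notebook does not say), an alternative worth trying is to escape them rather than delete them. The sketch below uses the standard library's `html.escape`. The helper name `popup_safe` and the sample name are hypothetical, and whether escaped names render correctly depends on the folium version in use.

```python
import html

def popup_safe(name):
    """Escape characters that have special meaning in HTML (&, <, >, quotes)."""
    return html.escape(name, quote=True)

# Made-up venue name, for illustration only
example = popup_safe("Lou's Bar & Grill")
```

This keeps the original spelling visible to the traveller while leaving the stored names untouched.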

## Top venues with more than 8 Restaurants nearby

In [37]:
# The Top Venues / Sites from the restaurant dataframe
top_venues_list = df_rest.venue_name.unique().tolist()

In [38]:
# Filter the top venues dataframe using the top venues list
df_top_venues = df_top_venues[df_top_venues['name'].isin(top_venues_list)]

# Take just the top 8 venues, sorted by score, to demonstrate the concept
df_top8_venues = df_top_venues.sort_values('score', ascending=False)[:8]

# Create a list of the top 8 venue names
top8_venues_list = df_top8_venues.name.tolist()

# Finally filter the Restaurants data frame to include only the top 8 venues
df_rest = df_rest[df_rest['venue_name'].isin(top8_venues_list)]

df_rest.reset_index(inplace=True)

In [39]:
# Create a list of the 2 most commonly occurring crimes
top_two_crimes = df_crimes[['primary_type', 'case_number']].groupby(
    ['primary_type']).count().sort_values('case_number', ascending=False)[:2].axes[0].tolist()

# Create a smaller DataFrame of only the top two crimes
df_crimes = df_crimes[df_crimes['primary_type'].isin(top_two_crimes)]
df_crimes.reset_index(inplace=True)
df_crimes.drop('index', inplace=True, axis=1)
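
The same top-two selection can be sketched more compactly with `value_counts`, shown here on a toy frame so the block runs on its own. The data values are made up. Note that `value_counts` counts non-null `primary_type` entries, whereas the cell above counts non-null `case_number` values per group, so the two agree only when `case_number` has no gaps.

```python
import pandas as pd

# Toy stand-in for df_crimes (values are made up)
toy_crimes = pd.DataFrame({
    'primary_type': ['THEFT', 'BATTERY', 'THEFT', 'ARSON', 'BATTERY', 'THEFT'],
    'case_number': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
})

# value_counts sorts by frequency, so the first two index labels are the top two
top_two = toy_crimes['primary_type'].value_counts().head(2).index.tolist()

# Keep only those crimes; drop=True avoids creating an extra 'index' column
filtered = toy_crimes[toy_crimes['primary_type'].isin(top_two)].reset_index(drop=True)
```

Using `reset_index(drop=True)` removes the need for a separate `drop('index', ...)` call.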

In [40]:
df_dist = pd.DataFrame()

for name, lat, lon in zip(df_top8_venues.name,
                          df_top8_venues.latitude,
                          df_top8_venues.longitude):
    print('Processing: ', name)
    df_temp = df_crimes.copy()
    df_temp['ven_lat'] = lat
    df_temp['ven_lon'] = lon
    df_dist[name] = df_temp.apply(haversine, axis=1)

Processing:  Millennium Park
Processing:  The Art Institute of Chicago
Processing:  Chicago Riverwalk
Processing:  Symphony Center (Chicago Symphony Orchestra)
Processing:  The Chicago Theatre
Processing:  Publican Quality Meats
Processing:  Thalia Hall
Processing:  Bari


In [41]:
df_dist.head()

| | Millennium Park | The Art Institute of Chicago | Chicago Riverwalk | Symphony Center (Chicago Symphony Orchestra) | The Chicago Theatre | Publican Quality Meats | Thalia Hall | Bari |
|---|---|---|---|---|---|---|---|---|
| 0 | 6.634 | 6.986362 | 6.093742 | 6.975499 | 6.265308 | 5.507981 | 8.585431 | 4.908782 |
| 1 | 13.634288 | 13.994928 | 13.095766 | 13.988276 | 13.272126 | 12.48245 | 15.382653 | 11.828163 |
| 2 | 1.766632 | 1.388901 | 2.207507 | 1.334783 | 2.014041 | 2.77955 | 2.706149 | 3.534604 |
| 3 | 6.331112 | 6.390797 | 6.041942 | 6.302795 | 6.047804 | 4.262594 | 4.772363 | 3.727284 |
| 4 | 6.507962 | 6.140716 | 6.891352 | 6.078914 | 6.703553 | 6.673184 | 3.615139 | 7.225222 |


In [42]:
df_dist.to_pickle('./Pickles/distances.pkl')

# Display each of the Top 8 Venues

In this section a preview of the type of data that will be displayed to a user of the proposed solution is shown.

For each of the Top 8 Venues:
1. All crimes within 750 meters of the venue are added to a dataframe
1. All restaurants associated with the venue are added to a dataframe
1. A folium Map is created centered on the venue
1. A heatmap of the crimes in the area is overlaid
1. The venue is marked on the map
1. The top 10 scored restaurants are marked on the map

This could be fully automated by iterating over the venues, but in order to show each of the 8 maps clearly, each one is generated manually here.
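
As a rough sketch of how the data-preparation steps (the first two items in the list above) might be factored out for such an iteration, the helper below returns the nearby crimes and the top-scored restaurants for one venue. The function name `select_for_venue`, its parameters and the toy data are assumptions made for illustration. It relies on the distance table sharing its index with the crimes frame, as `df_dist` and `df_crimes` do in this notebook.

```python
import pandas as pd

def select_for_venue(venue, crimes, dist, rest, radius_km=0.75, top_n=10):
    """Return (crimes within radius_km of venue, top_n restaurants for venue).

    `dist` is assumed to have one column per venue and the same index as `crimes`.
    """
    nearby = crimes[dist[venue] <= radius_km]
    best = (rest[rest['venue_name'] == venue]
            .sort_values('score', ascending=False)
            .head(top_n))
    return nearby, best

# Toy data standing in for the notebook's frames (all values are made up)
crimes = pd.DataFrame({'latitude': [41.88, 41.90, 41.75],
                       'longitude': [-87.62, -87.63, -87.60]})
dist = pd.DataFrame({'Venue A': [0.2, 0.9, 14.0]})
rest = pd.DataFrame({'name': ['R1', 'R2', 'R3'],
                     'venue_name': ['Venue A', 'Venue A', 'Venue B'],
                     'score': [7.5, 9.1, 8.0]})

nearby, best = select_for_venue('Venue A', crimes, dist, rest)
```

The two returned frames correspond to `df_crimes_venue` and the sorted `df_rest_venue` used in the cells below, so a loop over the venue names could pass them straight to the map-building code.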

## Venue 01

In [43]:
# Get the name of the Top Venue for this iteration
top_venue = top8_venues_list[0]

# Create a crime dataframe for the venue
df_crimes_venue = df_crimes.copy()
df_crimes_venue['dist'] = df_dist[top_venue]
df_crimes_venue = df_crimes_venue[df_crimes_venue['dist'] <= 0.75]

In [44]:
# Create a dataframe of the restaurants associated with the venue
df_rest_venue = df_rest[df_rest.venue_name == top_venue].copy()

# Sort the restaurants so we can pick the top 10
df_rest_venue.sort_values('score', ascending=False, inplace=True)

In [45]:
# Define Venue geolocation coordinates
chicago_latitude = df_top8_venues.latitude[df_top8_venues.name == top_venue].values[0]
chicago_longitude = df_top8_venues.longitude[df_top8_venues.name == top_venue].values[0]

# Create the Folium Map
chicago_heatmat = folium.Map(location=[chicago_latitude, chicago_longitude], zoom_start=16)

# List comprehension to make our list of lists of crime latitude and longitude
heat_data = [[row['latitude'], 
              row['longitude']] for index, row in df_crimes_venue.iterrows()]

# Plot the crimes on the map
HeatMap(heat_data,
        min_opacity=0.5,
        max_zoom=18, 
        max_val=1.0, 
        radius=20,
        blur=30,
        gradient=None,
        overlay=True).add_to(chicago_heatmat)

# Add the Venue to the Map
folium.Marker(
    location=[chicago_latitude, chicago_longitude],
    popup=top_venue,
    icon=folium.Icon(color='blue', icon='info-sign')
).add_to(chicago_heatmat)

# Add the Top 10 Restaurants to the map
for row in df_rest_venue[:10].itertuples():
    popup_text = '<h4>' + row.name + '</h4>'
    popup_text = popup_text + '<h5>' + row.category + '</h5>'
    popup_text = popup_text + '<b>Score: </b>' + str(row.score)
    popup = folium.Popup(popup_text)
    folium.Marker([row.latitude, row.longitude], 
                  popup=popup,
                  icon=folium.Icon(color='red', icon='thumbs-up')
                 ).add_to(chicago_heatmat)


# Display the map
chicago_heatmat

## Venue 02

In [46]:
# Get the name of the Top Venue for this iteration
top_venue = top8_venues_list[1]

# Create a crime dataframe for the venue
df_crimes_venue = df_crimes.copy()
df_crimes_venue['dist'] = df_dist[top_venue]
df_crimes_venue = df_crimes_venue[df_crimes_venue['dist'] <= 0.75]

In [47]:
# Create a dataframe of the restaurants associated with the venue
df_rest_venue = df_rest[df_rest.venue_name == top_venue].copy()

# Sort the restaurants so we can pick the top 10
df_rest_venue.sort_values('score', ascending=False, inplace=True)

In [48]:
# Define Venue geolocation coordinates
chicago_latitude = df_top8_venues.latitude[df_top8_venues.name == top_venue].values[0]
chicago_longitude = df_top8_venues.longitude[df_top8_venues.name == top_venue].values[0]

# Create the Folium Map
chicago_heatmat = folium.Map(location=[chicago_latitude, chicago_longitude], zoom_start=16)

# List comprehension to make our list of lists of crime latitude and longitude
heat_data = [[row['latitude'], 
              row['longitude']] for index, row in df_crimes_venue.iterrows()]

# Plot the crimes on the map
HeatMap(heat_data,
        min_opacity=0.5,
        max_zoom=18, 
        max_val=1.0, 
        radius=20,
        blur=30,
        gradient=None,
        overlay=True).add_to(chicago_heatmat)

# Add the Top 10 Restaurants to the map
for row in df_rest_venue[:10].itertuples():
    popup_text = '<h4>' + row.name + '</h4>'
    popup_text = popup_text + '<h5>' + row.category + '</h5>'
    popup_text = popup_text + '<b>Score: </b>' + str(row.score)
    popup = folium.Popup(popup_text)
    folium.Marker([row.latitude, row.longitude], 
                  popup=popup,
                  icon=folium.Icon(color='red', icon='thumbs-up')
                 ).add_to(chicago_heatmat)

# Add the Venue to the Map
folium.Marker(
    location=[chicago_latitude, chicago_longitude],
    popup=top_venue,
    icon=folium.Icon(color='blue', icon='info-sign')
).add_to(chicago_heatmat)

# Display the map
chicago_heatmat

## Venue 03

In [49]:
# Get the name of the Top Venue for this iteration
top_venue = top8_venues_list[2]

# Create a crime dataframe for the venue
df_crimes_venue = df_crimes.copy()
df_crimes_venue['dist'] = df_dist[top_venue]
df_crimes_venue = df_crimes_venue[df_crimes_venue['dist'] <= 0.75]

In [50]:
# Create a dataframe of the restaurants associated with the venue
df_rest_venue = df_rest[df_rest.venue_name == top_venue].copy()

# Sort the restaurants so we can pick the top 10
df_rest_venue.sort_values('score', ascending=False, inplace=True)

In [52]:
# Define Venue geolocation coordinates
chicago_latitude = df_top8_venues.latitude[df_top8_venues.name == top_venue].values[0]
chicago_longitude = df_top8_venues.longitude[df_top8_venues.name == top_venue].values[0]

# Create the Folium Map
chicago_heatmat = folium.Map(location=[chicago_latitude, chicago_longitude], zoom_start=16)

# List comprehension to make our list of lists of crime latitude and longitude
heat_data = [[row['latitude'], 
              row['longitude']] for index, row in df_crimes_venue.iterrows()]

# Plot the crimes on the map
HeatMap(heat_data,
        min_opacity=0.5,
        max_zoom=18, 
        max_val=1.0, 
        radius=20,
        blur=30,
        gradient=None,
        overlay=True).add_to(chicago_heatmat)

# Add the Top 10 Restaurants to the map
for row in df_rest_venue[:10].itertuples():
    popup_text = '<h4>' + row.name + '</h4>'
    popup_text = popup_text + '<h5>' + row.category + '</h5>'
    popup_text = popup_text + '<b>Score: </b>' + str(row.score)
    popup = folium.Popup(popup_text)
    folium.Marker([row.latitude, row.longitude], 
                  popup=popup,
                  icon=folium.Icon(color='red', icon='thumbs-up')
                 ).add_to(chicago_heatmat)

# Add the Venue to the Map
folium.Marker(
    location=[chicago_latitude, chicago_longitude],
    popup=top_venue,
    icon=folium.Icon(color='blue', icon='info-sign')
).add_to(chicago_heatmat)

# Display the map
chicago_heatmat

## Venue 04

In [53]:
# Get the name of the Top Venue for this iteration
top_venue = top8_venues_list[3]

# Create a crime dataframe for the venue
df_crimes_venue = df_crimes.copy()
df_crimes_venue['dist'] = df_dist[top_venue]
df_crimes_venue = df_crimes_venue[df_crimes_venue['dist'] <= 0.75]

In [54]:
# Create a dataframe of the restaurants associated with the venue
df_rest_venue = df_rest[df_rest.venue_name == top_venue].copy()

# Sort the restaurants so we can pick the top 10
df_rest_venue.sort_values('score', ascending=False, inplace=True)

In [55]:
# Define Venue geolocation coordinates
chicago_latitude = df_top8_venues.latitude[df_top8_venues.name == top_venue].values[0]
chicago_longitude = df_top8_venues.longitude[df_top8_venues.name == top_venue].values[0]

# Create the Folium Map
chicago_heatmat = folium.Map(location=[chicago_latitude, chicago_longitude], zoom_start=16)

# List comprehension to make our list of lists of crime latitude and longitude
heat_data = [[row['latitude'], 
              row['longitude']] for index, row in df_crimes_venue.iterrows()]

# Plot the crimes on the map
HeatMap(heat_data,
        min_opacity=0.5,
        max_zoom=18, 
        max_val=1.0, 
        radius=20,
        blur=30,
        gradient=None,
        overlay=True).add_to(chicago_heatmat)

# Add the Venue to the Map
folium.Marker(
    location=[chicago_latitude, chicago_longitude],
    popup=top_venue,
    icon=folium.Icon(color='blue', icon='info-sign')
).add_to(chicago_heatmat)

# Add the Top 10 Restaurants to the map
for row in df_rest_venue[:10].itertuples():
    popup_text = '<h4>' + row.name + '</h4>'
    popup_text = popup_text + '<h5>' + row.category + '</h5>'
    popup_text = popup_text + '<b>Score: </b>' + str(row.score)
    popup = folium.Popup(popup_text)
    folium.Marker([row.latitude, row.longitude], 
                  popup=popup,
                  icon=folium.Icon(color='red', icon='thumbs-up')
                 ).add_to(chicago_heatmat)


# Display the map
chicago_heatmat

## Venue 05

In [56]:
# Get the name of the Top Venue for this iteration
top_venue = top8_venues_list[4]

# Create a crime dataframe for the venue
df_crimes_venue = df_crimes.copy()
df_crimes_venue['dist'] = df_dist[top_venue]
df_crimes_venue = df_crimes_venue[df_crimes_venue['dist'] <= 0.75]

In [57]:
# Create a dataframe of the restaurants associated with the venue
df_rest_venue = df_rest[df_rest.venue_name == top_venue].copy()

# Sort the restaurants so we can pick the top 10
df_rest_venue.sort_values('score', ascending=False, inplace=True)

In [58]:
# Define Venue geolocation coordinates
chicago_latitude = df_top8_venues.latitude[df_top8_venues.name == top_venue].values[0]
chicago_longitude = df_top8_venues.longitude[df_top8_venues.name == top_venue].values[0]

# Create the Folium Map
chicago_heatmat = folium.Map(location=[chicago_latitude, chicago_longitude], zoom_start=16)

# List comprehension to make our list of lists of crime latitude and longitude
heat_data = [[row['latitude'], 
              row['longitude']] for index, row in df_crimes_venue.iterrows()]

# Plot the crimes on the map
HeatMap(heat_data,
        min_opacity=0.5,
        max_zoom=18, 
        max_val=1.0, 
        radius=20,
        blur=30,
        gradient=None,
        overlay=True).add_to(chicago_heatmat)

# Add the Venue to the Map
folium.Marker(
    location=[chicago_latitude, chicago_longitude],
    popup=top_venue,
    icon=folium.Icon(color='blue', icon='info-sign')
).add_to(chicago_heatmat)

# Add the Top 10 Restaurants to the map
for row in df_rest_venue[:10].itertuples():
    popup_text = '<h4>' + row.name + '</h4>'
    popup_text = popup_text + '<h5>' + row.category + '</h5>'
    popup_text = popup_text + '<b>Score: </b>' + str(row.score)
    popup = folium.Popup(popup_text)
    folium.Marker([row.latitude, row.longitude], 
                  popup=popup,
                  icon=folium.Icon(color='red', icon='thumbs-up')
                 ).add_to(chicago_heatmat)


# Display the map
chicago_heatmat

## Venue 06

In [59]:
# Get the name of the Top Venue for this iteration
top_venue = top8_venues_list[5]

# Create a crime dataframe for the venue
df_crimes_venue = df_crimes.copy()
df_crimes_venue['dist'] = df_dist[top_venue]
df_crimes_venue = df_crimes_venue[df_crimes_venue['dist'] <= 0.75]

In [60]:
# Create a dataframe of the restaurants associated with the venue
df_rest_venue = df_rest[df_rest.venue_name == top_venue].copy()

# Sort the restaurants so we can pick the top 10
df_rest_venue.sort_values('score', ascending=False, inplace=True)

In [61]:
# Define Venue geolocation coordinates
chicago_latitude = df_top8_venues.latitude[df_top8_venues.name == top_venue].values[0]
chicago_longitude = df_top8_venues.longitude[df_top8_venues.name == top_venue].values[0]

# Create the Folium Map
chicago_heatmat = folium.Map(location=[chicago_latitude, chicago_longitude], zoom_start=16)

# List comprehension to make our list of lists of crime latitude and longitude
heat_data = [[row['latitude'], 
              row['longitude']] for index, row in df_crimes_venue.iterrows()]

# Plot the crimes on the map
HeatMap(heat_data,
        min_opacity=0.5,
        max_zoom=18, 
        max_val=1.0, 
        radius=20,
        blur=30,
        gradient=None,
        overlay=True).add_to(chicago_heatmat)

# Add the Venue to the Map
folium.Marker(
    location=[chicago_latitude, chicago_longitude],
    popup=top_venue,
    icon=folium.Icon(color='blue', icon='info-sign')
).add_to(chicago_heatmat)

# Add the Top 10 Restaurants to the map
for row in df_rest_venue[:10].itertuples():
    popup_text = '<h4>' + row.name + '</h4>'
    popup_text = popup_text + '<h5>' + row.category + '</h5>'
    popup_text = popup_text + '<b>Score: </b>' + str(row.score)
    popup = folium.Popup(popup_text)
    folium.Marker([row.latitude, row.longitude], 
                  popup=popup,
                  icon=folium.Icon(color='red', icon='thumbs-up')
                 ).add_to(chicago_heatmat)


# Display the map
chicago_heatmat

## Venue 07

In [62]:
# Get the name of the Top Venue for this iteration
top_venue = top8_venues_list[6]

# Create a crime dataframe for the venue
df_crimes_venue = df_crimes.copy()
df_crimes_venue['dist'] = df_dist[top_venue]
df_crimes_venue = df_crimes_venue[df_crimes_venue['dist'] <= 0.75]

In [63]:
# Create a dataframe of the restaurants associated with the venue
df_rest_venue = df_rest[df_rest.venue_name == top_venue].copy()

# Sort the restaurants so we can pick the top 10
df_rest_venue.sort_values('score', ascending=False, inplace=True)

In [64]:
# Define Venue geolocation coordinates
chicago_latitude = df_top8_venues.latitude[df_top8_venues.name == top_venue].values[0]
chicago_longitude = df_top8_venues.longitude[df_top8_venues.name == top_venue].values[0]

# Create the Folium Map
chicago_heatmat = folium.Map(location=[chicago_latitude, chicago_longitude], zoom_start=16)

# List comprehension to make our list of lists of crime latitude and longitude
heat_data = [[row['latitude'], 
              row['longitude']] for index, row in df_crimes_venue.iterrows()]

# Plot the crimes on the map
HeatMap(heat_data,
        min_opacity=0.5,
        max_zoom=18, 
        max_val=1.0, 
        radius=20,
        blur=30,
        gradient=None,
        overlay=True).add_to(chicago_heatmat)

# Add the Venue to the Map
folium.Marker(
    location=[chicago_latitude, chicago_longitude],
    popup=top_venue,
    icon=folium.Icon(color='blue', icon='info-sign')
).add_to(chicago_heatmat)

# Add the Top 10 Restaurants to the map
for row in df_rest_venue[:10].itertuples():
    popup_text = '<h4>' + row.name + '</h4>'
    popup_text = popup_text + '<h5>' + row.category + '</h5>'
    popup_text = popup_text + '<b>Score: </b>' + str(row.score)
    popup = folium.Popup(popup_text)
    folium.Marker([row.latitude, row.longitude], 
                  popup=popup,
                  icon=folium.Icon(color='red', icon='thumbs-up')
                 ).add_to(chicago_heatmat)


# Display the map
chicago_heatmat

## Venue 08

In [65]:
# Get the name of the Top Venue for this iteration
top_venue = top8_venues_list[7]

# Create a crime dataframe for the venue
df_crimes_venue = df_crimes.copy()
df_crimes_venue['dist'] = df_dist[top_venue]
df_crimes_venue = df_crimes_venue[df_crimes_venue['dist'] <= 0.75]

In [66]:
# Create a dataframe of the restaurants associated with the venue
df_rest_venue = df_rest[df_rest.venue_name == top_venue].copy()

# Sort the restaurants so we can pick the top 10
df_rest_venue.sort_values('score', ascending=False, inplace=True)

In [67]:
# Define Venue geolocation coordinates
chicago_latitude = df_top8_venues.latitude[df_top8_venues.name == top_venue].values[0]
chicago_longitude = df_top8_venues.longitude[df_top8_venues.name == top_venue].values[0]

# Create the Folium Map
chicago_heatmat = folium.Map(location=[chicago_latitude, chicago_longitude], zoom_start=16)

# List comprehension to make our list of lists of crime latitude and longitude
heat_data = [[row['latitude'], 
              row['longitude']] for index, row in df_crimes_venue.iterrows()]

# Plot the crimes on the map
HeatMap(heat_data,
        min_opacity=0.5,
        max_zoom=18, 
        max_val=1.0, 
        radius=20,
        blur=30,
        gradient=None,
        overlay=True).add_to(chicago_heatmat)

# Add the Venue to the Map
folium.Marker(
    location=[chicago_latitude, chicago_longitude],
    popup=top_venue,
    icon=folium.Icon(color='blue', icon='info-sign')
).add_to(chicago_heatmat)

# Add the Top 10 Restaurants to the map
for row in df_rest_venue[:10].itertuples():
    popup_text = '<h4>' + row.name + '</h4>'
    popup_text = popup_text + '<h5>' + row.category + '</h5>'
    popup_text = popup_text + '<b>Score: </b>' + str(row.score)
    popup = folium.Popup(popup_text)
    folium.Marker([row.latitude, row.longitude], 
                  popup=popup,
                  icon=folium.Icon(color='red', icon='thumbs-up')
                 ).add_to(chicago_heatmat)


# Display the map
chicago_heatmat