# Coursera IBM Data Science Professional Capstone Project

# Leveraging the Foursquare API and the Hamilton Neighbourhood Dataset for Auto-Workshop Location
## Report written by Abiola D. Obembe
### Date: 23-01-2020

# 1. Introduction

## 1.1 Background


Hamilton is a port city in the Canadian province of Ontario. An industrialized city in the Golden Horseshoe at the west end of Lake Ontario, Hamilton has a population of 536,917, and its census metropolitan area, which includes Burlington and Grimsby, has a population of 747,545. The city is 58 kilometres (36 mi) southwest of Toronto, with which it forms the Greater Toronto and Hamilton Area (GTHA). The current boundaries of Hamilton were created on January 1, 2001, through the amalgamation of the original city with the other municipalities of the Regional Municipality of Hamilton–Wentworth. Residents of the city are known as Hamiltonians. Since 1981, the metropolitan area has been listed as the ninth largest in Canada and the third largest in Ontario. A population of this size creates opportunities for businesses that serve it. For prospective entrepreneurs, it is therefore valuable to be able to decide which type of business to establish and where to set it up.

## 1.2 Problem statement
After residing in Hamilton for a few months, a wealthy contractor has decided to establish a business in the local area. Given his previous work experience in the auto industry, his preference is to set up an auto business. After a careful study of the city, he is confident that an automobile repair workshop will be profitable, but he is unsure where to locate it. The aim of this project is to use open data from the City of Hamilton and the Foursquare API to guide the contractor in choosing suitable location(s) for his workshop.


## 1.3 Interest

A data-driven approach to choosing a business type and the best location(s) for it can be highly beneficial to entrepreneurs, because it improves the chances that the business will gain traction. It can also help the government plan the infrastructure needed in such neighborhoods to attract more businesses, which would increase employment and generate tax revenue from businesses and residents.



# 2. Data Section

## 2.1 Data sources
The data for this project are obtained from the City of Hamilton open data portal (http://open.hamilton.ca/datasets/ac6fc684043341f6b1d6298c146a0bcf_1), which publishes the distinct municipal addresses in Hamilton. The dataset is available as both CSV and GeoJSON files, and the original dataset consists of 253,876 rows and 12 columns (attributes). This report uses the GeoJSON version, and only its first 100,000 records are used in the analysis below.
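Each record in the GeoJSON file is a `Feature` whose `properties` hold the address attributes and whose `geometry.coordinates` hold the point as `[longitude, latitude]`, which is the GeoJSON axis order. The sketch below flattens one feature into the fields used later in this report; the helper name `feature_to_row` is hypothetical, and the sample feature is the first record of the dataset (as printed in the output of In [2]) trimmed to the keys the helper reads.

```python
def feature_to_row(feature):
    """Hypothetical helper: flatten one GeoJSON address Feature into the fields used in this report."""
    props = feature['properties']
    lon, lat = feature['geometry']['coordinates'][:2]  # GeoJSON order: [longitude, latitude]
    return {
        'ID': props['OBJECTID'],
        'Address': props['FULL_STREET_NAME'],
        'Longitude': lon,
        'Latitude': lat,
        'Community': props['COMMUNITY'],
    }


# First record of the dataset, trimmed to the keys the helper needs
sample_feature = {
    'type': 'Feature',
    'properties': {'OBJECTID': 1, 'FULL_STREET_NAME': 'Hanover Place',
                   'COMMUNITY': 'Hamilton', 'MUNICIPALITY': 'City of Hamilton'},
    'geometry': {'type': 'Point',
                 'coordinates': [-79.7902392805614, 43.20757939303672]},
}

row = feature_to_row(sample_feature)
print(row['Address'], row['Latitude'], row['Longitude'])
```

Reading the point from `geometry` rather than from the string-valued `LATITUDE`/`LONGITUDE` properties gives numeric coordinates directly.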

In [1]:
```python
# Install the mapping/geocoding packages (if needed) and import libraries
!pip install geopy
!pip install folium
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from geopy.geocoders import Nominatim
import requests
import json
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
print('Libraries installed!')
```

Libraries installed!


In [2]:
```python
# Download the address dataset (GeoJSON) and keep the first 100,000 features
!wget -q -O 'hamilton_data.json' https://opendata.arcgis.com/datasets/ac6fc684043341f6b1d6298c146a0bcf_1.geojson

with open('hamilton_data.json') as json_data:
    hamilton_data = json.load(json_data)
hamilton_dataset = hamilton_data['features'][:100000]

hamilton_dataset
```

```text
[{'type': 'Feature',
  'properties': {'OBJECTID': 1,
   'LONGITUDE': '-79.79023580613546',
   'LATITUDE': '43.20757090548786',
   'NUMBER_COMPLETE': '110',
   'UNIT_NUMBER_COMPLETE': None,
   'FULL_STREET_NAME': 'Hanover Place',
   'SETTLEMENT': None,
   'COMMUNITY': 'Hamilton',
   'MUNICIPALITY': 'City of Hamilton',
   'COUNTRY': 'Canada',
   'PROVINCE': 'Ontario'},
  'geometry': {'type': 'Point',
   'coordinates': [-79.7902392805614, 43.20757939303672]}},
 {'type': 'Feature',
  'properties': {'OBJECTID': 2,
   'LONGITUDE': '-79.75926619930769',
   'LATITUDE': '43.21327570171375',
   'NUMBER_COMPLETE': '49',
   'UNIT_NUMBER_COMPLETE': None,
   'FULL_STREET_NAME': 'Upper Lake Avenue',
   'SETTLEMENT': None,
   'COMMUNITY': 'Stoney Creek',
   'MUNICIPALITY': 'City of Hamilton',
   'COUNTRY': 'Canada',
   'PROVINCE': 'Ontario'},
  'geometry': {'type': 'Point',
   'coordinates': [-79.75926966441459, 43.21328419171143]}},
 {'type': 'Feature',
  'properties': {'OBJECTID': 3,
   'LONGITUDE':
```

In [3]:
```python
# Display the number of address records (GeoJSON features) kept
print('Number of address records in the Hamilton dataset:', len(hamilton_dataset))
```

Number of address records in the Hamilton dataset: 100000


In [4]:
```python
# Create a dataframe with one row per address point
column_names = ['ID', 'Address', 'Longitude', 'Latitude', 'Settlement', 'Community', 'Municipal']

rows = []
for data in hamilton_dataset:
    props = data['properties']
    neigh_lon, neigh_lat = data['geometry']['coordinates']  # GeoJSON order is [lon, lat]
    rows.append({'ID': props['OBJECTID'],
                 'Address': props['FULL_STREET_NAME'],
                 'Longitude': neigh_lon,
                 'Latitude': neigh_lat,
                 'Community': props['COMMUNITY']})

# DataFrame.append was removed in pandas 2.0; building from a list of dicts is also much faster.
# Settlement and Municipal are not populated, so they are left as NaN.
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()
```

|   | ID | Address | Longitude | Latitude | Settlement | Community | Municipal |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Hanover Place | -79.790239 | 43.207579 | NaN | Hamilton | NaN |
| 1 | 2 | Upper Lake Avenue | -79.75927 | 43.213284 | NaN | Stoney Creek | NaN |
| 2 | 3 | Orr Crescent | -79.709277 | 43.209003 | NaN | Stoney Creek | NaN |
| 3 | 4 | Concession 8 West | -80.013807 | 43.380473 | NaN | Flamborough | NaN |
| 4 | 5 | Cooper Road | -80.183338 | 43.340202 | NaN | Flamborough | NaN |


## 2.2 Sample of the Dataset for Analysis

In [5]:
```python
# Quality check: drop columns that contain missing values (Settlement and Municipal were never populated)
neighborhoods.dropna(axis=1, how='any', inplace=True)
print('The dataframe has {} communities in the city of Hamilton.'.format(len(neighborhoods['Community'].unique())))
print(neighborhoods.shape)
neighborhoods.head()
```

The dataframe has 7 communities in the city of Hamilton.
(100000, 5)


|   | ID | Address | Longitude | Latitude | Community |
|---|---|---|---|---|---|
| 0 | 1 | Hanover Place | -79.790239 | 43.207579 | Hamilton |
| 1 | 2 | Upper Lake Avenue | -79.75927 | 43.213284 | Stoney Creek |
| 2 | 3 | Orr Crescent | -79.709277 | 43.209003 | Stoney Creek |
| 3 | 4 | Concession 8 West | -80.013807 | 43.380473 | Flamborough |
| 4 | 5 | Cooper Road | -80.183338 | 43.340202 | Flamborough |


In [6]:
```python
# Keep one row per (street name, community) pair and display the number of communities
df = neighborhoods.groupby(['Address', 'Community'], as_index=False).first()

print('The dataframe has {} communities in the city of Hamilton.'.format(len(df['Community'].unique())))
print(df.shape)
```

The dataframe has 7 communities in the city of Hamilton.
(4056, 5)
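Grouping by `Address` and `Community` and taking `.first()` keeps one representative point per street within each community, which is how 100,000 address points shrink to 4,056 rows. A small sketch on made-up house points (all IDs and coordinates below are hypothetical) shows that a street name shared by two communities survives as two separate rows.

```python
import pandas as pd

# Hypothetical address points: two houses on one street in one community,
# plus the same street name in a second community
points = pd.DataFrame({
    'ID':        [1, 2, 3, 4],
    'Address':   ['Aberdeen Avenue', 'Aberdeen Avenue', 'Aberdeen Avenue', 'Hanover Place'],
    'Community': ['Hamilton', 'Hamilton', 'Glanbrook', 'Hamilton'],
    'Longitude': [-79.8884, -79.8890, -79.9211, -79.7902],
    'Latitude':  [43.2497, 43.2499, 43.1535, 43.2076],
})

# One row per (street, community) pair; .first() takes the first non-null value of each column
streets = points.groupby(['Address', 'Community'], as_index=False).first()
print(streets[['Address', 'Community', 'ID']])
```

Groups are returned sorted by their keys, so Aberdeen Avenue in Glanbrook comes before Aberdeen Avenue in Hamilton.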


In [7]:
```python
# More information on the dataframe
print(df['Community'].unique())
df.head()
```

['Flamborough' 'Ancaster' 'Hamilton' 'Glanbrook' 'Dundas' 'Stoney Creek'
 'Burlington']


|   | Address | Community | ID | Longitude | Latitude |
|---|---|---|---|---|---|
| 0 | 3rd Concession Road East | Flamborough | 63379 | -79.860544 | 43.346062 |
| 1 | Abbey Close | Ancaster | 4821 | -80.001446 | 43.211121 |
| 2 | Abbington Drive | Hamilton | 3128 | -79.905134 | 43.228994 |
| 3 | Abbot Court | Hamilton | 28830 | -79.858676 | 43.223276 |
| 4 | Abbot Drive | Hamilton | 11971 | -79.857134 | 43.225534 |


In [8]:
```python
# Use the geopy library to get the latitude and longitude of the city of Hamilton
address = 'Hamilton, ON'
geolocator = Nominatim(user_agent='hamilton_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of the city of Hamilton are {}, {}.'.format(latitude, longitude))
```

The geographical coordinates of the city of Hamilton are 43.255205, -79.868202.


In [9]:
```python
# Create a map of the city of Hamilton with the first 1,000 addresses superimposed
map_hamilton = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to the map
for lat, lng, label in zip(df['Latitude'][:1000], df['Longitude'][:1000], df['Address'][:1000]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='green', fill=True,
                        fill_color='#3186cc', fill_opacity=0.7).add_to(map_hamilton)

map_hamilton
```

In [10]:
```python
# Let's explore the first address in the df dataframe
lati = df.loc[0, 'Latitude']
longi = df.loc[0, 'Longitude']
addy = df.loc[0, 'Address']

print('Latitude and Longitude values of {} are {}, {}'.format(addy, lati, longi))
```

Latitude and Longitude values of 3rd Concession Road East are 43.346061745234074, -79.86054444128337


In [11]:
```python
# The code was removed by Watson Studio for sharing.
```

```text
Your credentials:
CLIENT ID: <redacted>
CLIENT_SECRET: <redacted>
Auto ------ OK!
```
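The cell that defines the Foursquare credentials was removed before sharing, yet `getNearbyVenues` below relies on the globals `CLIENT_ID`, `CLIENT_SECRET`, `VERSION`, `search_query` and `LIMIT`. A minimal sketch of such a cell follows, assuming the credentials are kept in environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical, `search_query = 'Auto'` is inferred from the printed `Auto ------ OK!`, and the `VERSION` and `LIMIT` values are placeholders, not the values used in the original analysis.

```python
import os

# Hypothetical environment variable names: keep real secrets out of the notebook and its output
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')
VERSION = '20200123'   # placeholder API version date (YYYYMMDD), assumed
LIMIT = 50             # placeholder maximum number of venues per request, assumed
search_query = 'Auto'  # inferred from the "Auto ------ OK!" output

print('Credentials loaded:', bool(CLIENT_ID and CLIENT_SECRET))
print(search_query + ' ------ OK!')
```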


In [12]:
```python
# function to extract the name of a venue's first (primary) category
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
```

In [13]:
```python
# Query the Foursquare venues/search endpoint around every address in the dataframe
# (the same request as for a single address, repeated in a loop).
# Relies on CLIENT_ID, CLIENT_SECRET, VERSION, search_query and LIMIT from the (removed) credentials cell.

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request url
        url = ('https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}'
               '&v={}&ll={},{}&query={}&radius={}&limit={}').format(
                   CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, search_query, radius, LIMIT)
        results = requests.get(url).json()['response']['venues']

        # keep the address, its coordinates, and the name, coordinates and categories of each venue
        venues_list.append([(name, lat, lng,
                             v['name'], v['location']['lat'], v['location']['lng'], v['categories'])
                            for v in results])

    # flatten the list of lists into one row per (address, venue) pair
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Address', 'Address Latitude', 'Address Longitude',
                                          'Venue', 'Venue Latitude', 'Venue Longitude', 'Category'])
    return nearby_venues
```
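Each call to `getNearbyVenues` issues one request per address to the Foursquare v2 `venues/search` endpoint, passing the address coordinates as `ll` ("latitude,longitude"), the search term as `query` and the search radius in metres. The sketch below builds the same URL with the standard library's `urllib.parse.urlencode`, which also percent-encodes values such as multi-word queries; the helper name `build_search_url` and the credential and version values are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, version, lat, lng, query, radius=500, limit=50):
    """Hypothetical helper: the venues/search URL requested for one address."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"
        'query': query,
        'radius': radius,                # metres
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Coordinates of Abbot Court from the dataframe above; credentials are placeholders
url = build_search_url('MY_ID', 'MY_SECRET', '20200123', 43.223276, -79.858676, 'Auto')
params = parse_qs(urlparse(url).query)  # decode the query string again to inspect it
print(params['ll'], params['query'], params['radius'])
```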


In [14]:
```python
# Apply the above function to each address and create a new dataframe called hamilton_Venues
hamilton_Venues = getNearbyVenues(names=df['Address'], latitudes=df['Latitude'], longitudes=df['Longitude'])

hamilton_Venues
```

3rd Concession Road East
Abbey Close
Abbington Drive
Abbot Court
Abbot Drive
Abbotsford Trail
Abel Court
Aberdeen Avenue
Aberdeen Avenue
Aberfoyle Avenue
Acacia Street
Academy Street
Acadia Drive
Ackland Street
Acorn Street
Acredale Drive
Ada Court
Adair Avenue North
Adair Avenue South
Adams Street
Adelaide Avenue
Adele Court
Adeline Avenue
Adis Avenue
Adler Avenue
Admiral Place
Adorn Court
Adriatic Boulevard
Afton Avenue
Agawam Court
Agincourt Avenue
Agnes Street
Aikman Avenue
Ainsley Road
Ainslie Avenue
Ainsworth Street
Aintree Court
Airdrie Avenue
Airport Road East
Airport Road West
Alanson Street
Alba Street
Albany Avenue
Albemarle Street
Albert Street
Albert Street
Albert Street
Alberton Road
Albion Falls Boulevard
Albright Road
Alconbury Drive
Alden Street
Alder Court
Aldercrest Avenue
Aldercrest Avenue
Alderlea Avenue
Alderney Avenue
Alderson Drive
Alderson Road
Aldgate Avenue
Aldridge Court
Aldridge Street
Alessio Drive
Alexander Road
Alexander Street
Alexsia Court
Alfrin Court

|   | Address | Address Latitude | Address Longitude | Venue | Venue Latitude | Venue Longitude | Category |
|---|---|---|---|---|---|---|---|
| 0 | Abbot Court | 43.223276 | -79.858676 | Dicks Auto Shop | 43.223578 | -79.862326 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 1 | Abbot Court | 43.223276 | -79.858676 | Beverly Tire & Auto | 43.221661 | -79.853084 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 2 | Abbot Drive | 43.225534 | -79.857134 | Dicks Auto Shop | 43.223578 | -79.862326 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 3 | Abbot Drive | 43.225534 | -79.857134 | Beverly Tire & Auto | 43.221661 | -79.853084 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 4 | Aberdeen Avenue | 43.24975 | -79.88841 | West town auto | 43.25071 | -79.892156 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 5 | Aberdeen Avenue | 43.24975 | -79.88841 | Westown Auto & Tire | 43.250668 | -79.892249 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 6 | Aberdeen Avenue | 43.24975 | -79.88841 | NAPA AUTOPRO - Westown Auto Services | 43.250641 | -79.892299 | [{'id': '56aa371be4b08b9a8d5734d3', 'name': 'A... |
| 7 | Academy Street | 43.228072 | -79.974561 | NAPA AUTOPRO - Glendale Motors | 43.228966 | -79.975432 | [{'id': '56aa371be4b08b9a8d5734d3', 'name': 'A... |
| 8 | Ackland Street | 43.200957 | -79.79321 | In The Clear Auto Glass | 43.198649 | -79.792805 | [{'id': '4eb1c1623b7b52c0e1adc2ec', 'name': 'A... |
| 9 | Acorn Street | 43.252413 | -79.844719 | Auto Key Pro | 43.25022 | -79.850382 | [{'id': '52f2ab2ebcbc57f1066b8b1e', 'name': 'L... |


In [15]:
```python
# function to extract the name of a venue's first (primary) category from the Category column
def get_category_type(row):
    try:
        categories_list = row['Category']
    except KeyError:
        categories_list = row['venue.Category']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
```

In [16]:
```python
# Replace each list of categories with its primary category name and examine the shape of the dataframe
hamilton_Venues['Category'] = hamilton_Venues.apply(get_category_type, axis=1)
print(hamilton_Venues.shape)
```

(3215, 7)


In [17]:
```python
# Let us inspect the final dataframe
hamilton_Venues
```


|   | Address | Address Latitude | Address Longitude | Venue | Venue Latitude | Venue Longitude | Category |
|---|---|---|---|---|---|---|---|
| 0 | Abbot Court | 43.223276 | -79.858676 | Dicks Auto Shop | 43.223578 | -79.862326 | Automotive Shop |
| 1 | Abbot Court | 43.223276 | -79.858676 | Beverly Tire & Auto | 43.221661 | -79.853084 | Automotive Shop |
| 2 | Abbot Drive | 43.225534 | -79.857134 | Dicks Auto Shop | 43.223578 | -79.862326 | Automotive Shop |
| 3 | Abbot Drive | 43.225534 | -79.857134 | Beverly Tire & Auto | 43.221661 | -79.853084 | Automotive Shop |
| 4 | Aberdeen Avenue | 43.24975 | -79.88841 | West town auto | 43.25071 | -79.892156 | Automotive Shop |
| 5 | Aberdeen Avenue | 43.24975 | -79.88841 | Westown Auto & Tire | 43.250668 | -79.892249 | Automotive Shop |
| 6 | Aberdeen Avenue | 43.24975 | -79.88841 | NAPA AUTOPRO - Westown Auto Services | 43.250641 | -79.892299 | Auto Workshop |
| 7 | Academy Street | 43.228072 | -79.974561 | NAPA AUTOPRO - Glendale Motors | 43.228966 | -79.975432 | Auto Workshop |
| 8 | Ackland Street | 43.200957 | -79.79321 | In The Clear Auto Glass | 43.198649 | -79.792805 | Auto Dealership |
| 9 | Acorn Street | 43.252413 | -79.844719 | Auto Key Pro | 43.25022 | -79.850382 | Locksmith |


In [18]:
```python
# Let us also check how many venues were returned for each address
hamilton_Venues.groupby('Address').count()
```

| Address | Address Latitude | Address Longitude | Venue | Venue Latitude | Venue Longitude | Category |
|---|---|---|---|---|---|---|
| Abbot Court | 2 | 2 | 2 | 2 | 2 | 2 |
| Abbot Drive | 2 | 2 | 2 | 2 | 2 | 2 |
| Aberdeen Avenue | 3 | 3 | 3 | 3 | 3 | 3 |
| Academy Street | 1 | 1 | 1 | 1 | 1 | 1 |
| Ackland Street | 1 | 1 | 1 | 1 | 1 | 1 |
| Acorn Street | 2 | 2 | 2 | 2 | 2 | 2 |
| Adair Avenue South | 1 | 1 | 1 | 1 | 1 | 1 |
| Adams Street | 4 | 4 | 4 | 4 | 4 | 4 |
| Adeline Avenue | 2 | 2 | 2 | 2 | 2 | 2 |
| Adler Avenue | 6 | 6 | 6 | 6 | 6 | 6 |


In [19]:
```python
# Count the unique venue categories among all venues returned and display them
print('There are {} unique categories'.format(len(hamilton_Venues['Category'].unique())))
print(hamilton_Venues['Category'].unique())
```

There are 19 unique categories
['Automotive Shop' 'Auto Workshop' 'Auto Dealership' 'Locksmith'
 'Auto Garage' 'Hardware Store' None 'Car Wash' 'Auditorium' 'Shoe Repair'
 'Urgent Care Center' 'Electronics Store' 'College Engineering Building'
 'Jewelry Store' 'Dry Cleaner' 'Building' 'Design Studio'
 'Insurance Office' 'Office']
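`get_category_type` returns `None` for a venue whose `categories` list is empty, which accounts for the `None` among the 19 values above. `pd.get_dummies` skips missing values by default, so such a venue gets a row of zeros and no column of its own, leaving 18 category columns in the one-hot table that follows. A small sketch with three hypothetical venue records (names invented for illustration; the helper `first_category_name` mirrors `get_category_type` but takes the list directly):

```python
import pandas as pd


def first_category_name(categories):
    """Name of the first (primary) category, or None when the venue has no category."""
    return categories[0]['name'] if categories else None


# Hypothetical venue records shaped like the 'categories' field of venues/search results
venues = pd.DataFrame({
    'Venue': ['Shop A', 'Shop B', 'Unlabelled venue'],
    'Category': [[{'name': 'Automotive Shop'}], [{'name': 'Auto Garage'}], []],
})
venues['Category'] = venues['Category'].apply(first_category_name)
print(venues['Category'].unique())  # the missing category shows up as None

onehot = pd.get_dummies(venues[['Category']], prefix='', prefix_sep='', dtype=int)
print(onehot)  # no column for None; the unlabelled venue's row is all zeros
```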


# 3. Analysis of Each Address

In [20]:
```python
# One-hot encode the venue categories (bare category names as column names, 0/1 values)
hamiltonVenues_onehot = pd.get_dummies(hamilton_Venues[['Category']], prefix='', prefix_sep='', dtype=int)
# add the address column to the one-hot dataframe
hamiltonVenues_onehot['Address'] = hamilton_Venues['Address']
# move the address column to the first position
fixed_columns = [hamiltonVenues_onehot.columns[-1]] + list(hamiltonVenues_onehot.columns[:-1])
hamiltonVenues_onehot = hamiltonVenues_onehot[fixed_columns]

print(hamiltonVenues_onehot.shape)
hamiltonVenues_onehot.head()
```

(3215, 19)


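|   | Address | Auditorium | Auto Dealership | Auto Garage | Auto Workshop | Automotive Shop | Building | Car Wash | College Engineering Building | Design Studio | Dry Cleaner | Electronics Store | Hardware Store | Insurance Office | Jewelry Store | Locksmith | Office | Shoe Repair | Urgent Care Center |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbot Court | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Abbot Court | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Abbot Drive | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Abbot Drive | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Aberdeen Avenue | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |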


In [21]:
```python
# Group rows by address and average each category column: the share of the address's venues in that category
hamiltonVenues_grouped = hamiltonVenues_onehot.groupby('Address').mean().reset_index()
hamiltonVenues_grouped
```

|   | Address | Auditorium | Auto Dealership | Auto Garage | Auto Workshop | Automotive Shop | Building | Car Wash | College Engineering Building | Design Studio | Dry Cleaner | Electronics Store | Hardware Store | Insurance Office | Jewelry Store | Locksmith | Office | Shoe Repair | Urgent Care Center |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbot Court | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Abbot Drive | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Aberdeen Avenue | 0.0 | 0.0 | 0.0 | 0.333333 | 0.666667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Academy Street | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Ackland Street | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Acorn Street | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 |
| 6 | Adair Avenue South | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Adams Street | 0.0 | 0.0 | 0.5 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | Adeline Avenue | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Adler Avenue | 0.0 | 0.166667 | 0.0 | 0.166667 | 0.666667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
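Averaging the one-hot columns per address turns counts into the share of each category among the venues returned for that address. Aberdeen Avenue returned three venues (two Automotive Shop, one Auto Workshop, per the venue table above), so its row holds 2/3 and 1/3. The sketch repeats the two steps on just the five venue rows for Abbot Court and Aberdeen Avenue and reproduces those frequencies.

```python
import pandas as pd

# The venue rows for Abbot Court and Aberdeen Avenue from hamilton_Venues above
subset = pd.DataFrame({
    'Address': ['Abbot Court', 'Abbot Court',
                'Aberdeen Avenue', 'Aberdeen Avenue', 'Aberdeen Avenue'],
    'Category': ['Automotive Shop', 'Automotive Shop',
                 'Automotive Shop', 'Automotive Shop', 'Auto Workshop'],
})

# One-hot encode the category, keeping the bare category names as column names
onehot = pd.get_dummies(subset[['Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Address', subset['Address'])

# Mean of each indicator column = fraction of the address's venues in that category
grouped = onehot.groupby('Address').mean().reset_index()
print(grouped.round(6))
```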


In [22]:
```python
# Print each address along with its top 5 most common venue categories
num_top_venues = 5
for address_info in hamiltonVenues_grouped['Address']:
    print('----' + address_info + '------')
    temp = hamiltonVenues_grouped[hamiltonVenues_grouped['Address'] == address_info].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]  # drop the Address row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
```

```text
----Abbot Court------
                 venue  freq
0      Automotive Shop   1.0
1           Auditorium   0.0
2    Electronics Store   0.0
3          Shoe Repair   0.0
4               Office   0.0


----Abbot Drive------
                 venue  freq
0      Automotive Shop   1.0
1           Auditorium   0.0
2    Electronics Store   0.0
3          Shoe Repair   0.0
4               Office   0.0


----Aberdeen Avenue------
                 venue  freq
0      Automotive Shop  0.67
1        Auto Workshop  0.33
2           Auditorium  0.00
3    Electronics Store  0.00
4          Shoe Repair  0.00


----Academy Street------
                 venue  freq
0        Auto Workshop   1.0
1           Auditorium   0.0
2    Electronics Store   0.0
3          Shoe Repair   0.0
4               Office   0.0


----Ackland Street------
               venue  freq
0    Auto Dealership   1.0
1         Auditorium   0.0
2        Shoe Repair   0.0
3             Office   0.0
4          Locksmith   0.0


----Acorn St
```

In [23]:
```python
# Return the names of the num_top_venues categories with the highest frequency in a row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the Address column
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

In [24]:
```python
# Build a dataframe with the top 5 venue categories of each address
num_top_venues = 5
indicators = ['st', 'nd', 'rd']
columns = ['Address']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind + 1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind + 1))

# Create dataframe
hamilton_venues_sorted = pd.DataFrame(columns=columns)
hamilton_venues_sorted['Address'] = hamiltonVenues_grouped['Address']

for ind in np.arange(hamiltonVenues_grouped.shape[0]):
    hamilton_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hamiltonVenues_grouped.iloc[ind, :], num_top_venues)

hamilton_venues_sorted.head()
```

|   | Address | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Abbot Court | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 1 | Abbot Drive | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 2 | Aberdeen Avenue | Automotive Shop | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership |
| 3 | Academy Street | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 4 | Ackland Street | Auto Dealership | Urgent Care Center | Shoe Repair | Auto Garage | Auto Workshop |
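`return_most_common_venues` sorts one row at a time. The same ranking can be computed for all addresses at once with NumPy's `argsort` on the negated frequencies. The sketch uses the Aberdeen Avenue and Ackland Street rows of the grouped table above, restricted to four category columns for brevity. Ties, such as the many zero-frequency categories, may be ordered differently from `sort_values`, so only ranks with non-zero frequency are meaningful.

```python
import numpy as np
import pandas as pd

# Two rows of the grouped frequency table above, restricted to four category columns
freq = pd.DataFrame(
    {'Address': ['Aberdeen Avenue', 'Ackland Street'],
     'Auto Dealership': [0.0, 1.0],
     'Auto Garage': [0.0, 0.0],
     'Auto Workshop': [1 / 3, 0.0],
     'Automotive Shop': [2 / 3, 0.0]})

num_top = 2
values = freq.drop(columns='Address').to_numpy()
categories = np.array(freq.columns[1:])

# Column indices sorted by descending frequency; the stable sort keeps ties in column order
order = np.argsort(-values, axis=1, kind='stable')[:, :num_top]

top = pd.DataFrame(categories[order],
                   columns=['1st Most Common Venue', '2nd Most Common Venue'])
top.insert(0, 'Address', freq['Address'])
print(top)
```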


## 3.1 Clustering Addresses in Hamilton Using k-means

In [25]:
```python
# Perform k-means clustering on the per-address category frequencies
kclusters = 3
hamiltonVenues_grouped_clustering = hamiltonVenues_grouped.drop(columns='Address')

# n_init=10 keeps scikit-learn's former default (it changed to 'auto' in version 1.4)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(hamiltonVenues_grouped_clustering)

hamilton_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
hamilton_venues_sorted.head()
```

|   | Cluster Labels | Address | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Abbot Court | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 1 | 1 | Abbot Drive | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 2 | 1 | Aberdeen Avenue | Automotive Shop | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership |
| 3 | 0 | Academy Street | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 4 | 0 | Ackland Street | Auto Dealership | Urgent Care Center | Shoe Repair | Auto Garage | Auto Workshop |
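k-means assigns each address's frequency vector to the nearest of `kclusters` centroids, moves each centroid to the mean of its members, and repeats until the assignments stop changing. scikit-learn's `KMeans` runs this loop with k-means++ initialisation and several restarts (`n_init`). The bare-bones NumPy sketch below (Lloyd's algorithm with a fixed, hand-picked initialisation) only illustrates the mechanics on four toy frequency vectors and is not a replacement for the call above.

```python
import numpy as np


def lloyd_kmeans(X, init_centroids, max_iter=100):
    """Plain Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = np.array(init_centroids, dtype=float)
    labels = None
    for _ in range(max_iter):
        # squared Euclidean distance from every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members):  # leave an empty cluster's centroid where it is
                centroids[k] = members.mean(axis=0)
    return labels, centroids


# Toy frequency vectors over (Auto Dealership, Auto Workshop, Automotive Shop)
X = np.array([[0.0, 0.0, 1.0],      # shaped like Abbot Court
              [0.0, 1 / 3, 2 / 3],  # shaped like Aberdeen Avenue
              [0.0, 1.0, 0.0],      # shaped like Academy Street
              [1.0, 0.0, 0.0]])     # shaped like Ackland Street
labels, centroids = lloyd_kmeans(X, init_centroids=X[[0, 2, 3]])
print(labels)
print(centroids.round(3))
```

The two Automotive-Shop-dominated vectors end up sharing a centroid at their mean, which is the same grouping effect visible in cluster 1 above.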


In [26]:
```python
# Create a new dataframe with each address, its cluster label and its top five venue categories
hamilton_merged = df.join(hamilton_venues_sorted.set_index('Address'), on='Address')
hamilton_merged.head(5)
```

|   | Address | Community | ID | Longitude | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3rd Concession Road East | Flamborough | 63379 | -79.860544 | 43.346062 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | Abbey Close | Ancaster | 4821 | -80.001446 | 43.211121 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Abbington Drive | Hamilton | 3128 | -79.905134 | 43.228994 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | Abbot Court | Hamilton | 28830 | -79.858676 | 43.223276 | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 4 | Abbot Drive | Hamilton | 11971 | -79.857134 | 43.225534 | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |


In [27]:
```python
# Drop addresses for which no venues were returned, then examine the combined dataframe
hamilton_merged.dropna(axis=0, how='any', inplace=True)
hamilton_merged.head(5)
```

|   | Address | Community | ID | Longitude | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Abbot Court | Hamilton | 28830 | -79.858676 | 43.223276 | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 4 | Abbot Drive | Hamilton | 11971 | -79.857134 | 43.225534 | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 7 | Aberdeen Avenue | Glanbrook | 13465 | -79.921113 | 43.15349 | 1.0 | Automotive Shop | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership |
| 8 | Aberdeen Avenue | Hamilton | 426 | -79.88841 | 43.24975 | 1.0 | Automotive Shop | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership |
| 11 | Academy Street | Ancaster | 8379 | -79.974561 | 43.228072 | 0.0 | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
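Because `hamilton_venues_sorted` is indexed by street name only, `join(..., on='Address')` copies the same cluster label and venue ranking to every community that has a street with that name; Aberdeen Avenue in Glanbrook and in Hamilton above both receive cluster 1. A small sketch of this pandas behaviour on the two Aberdeen Avenue rows:

```python
import pandas as pd

# Two address rows sharing a street name (values from the merged dataframe above)
addresses = pd.DataFrame({'Address': ['Aberdeen Avenue', 'Aberdeen Avenue'],
                          'Community': ['Glanbrook', 'Hamilton'],
                          'Latitude': [43.15349, 43.24975]})

# One row per street name, as in hamilton_venues_sorted
clusters = pd.DataFrame({'Address': ['Aberdeen Avenue'],
                         'Cluster Labels': [1],
                         '1st Most Common Venue': ['Automotive Shop']})

# The left 'Address' column is matched against the right index, so both rows get the same values
merged = addresses.join(clusters.set_index('Address'), on='Address')
print(merged)
```

Keeping the two streets apart would require querying and grouping venues by the (Address, Community) pair rather than by Address alone.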


In [28]:
```python
# Visualize the resulting clusters (first 500 addresses)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one colour per cluster from matplotlib's rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers; cluster labels are 0-based, so they index the colour list directly
for lat, lon, poi, cluster in zip(hamilton_merged['Latitude'][:500], hamilton_merged['Longitude'][:500],
                                  hamilton_merged['Address'][:500], hamilton_merged['Cluster Labels'][:500]):
    label = folium.Popup('{} (Cluster {})'.format(poi, int(cluster)), parse_html=True)
    folium.CircleMarker([lat, lon], radius=5, popup=label, color=rainbow[int(cluster)],
                        fill=True, fill_color=rainbow[int(cluster)], fill_opacity=0.7).add_to(map_clusters)

map_clusters
```
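Because the k-means labels are 0-based, label `k` can index a list of `kclusters` colours directly; after the join they come back as floats (for example `1.0`), so they need an `int` cast first. As a standard-library alternative to matplotlib's rainbow colormap, the sketch below spaces hues evenly around the HSV colour wheel; the helper name `cluster_palette` is hypothetical.

```python
import colorsys


def cluster_palette(n_clusters):
    """Hypothetical helper: n evenly spaced, fully saturated hues as '#rrggbb' strings."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(3)
# Cluster labels are floats after the join, so cast before indexing
for label in [0.0, 1.0, 2.0]:
    print('Cluster', int(label), '->', palette[int(label)])
```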

In [29]:
```python
# Examine Cluster 1 (label 0): community, cluster label and top venue categories
hamilton_merged.loc[hamilton_merged['Cluster Labels'] == 0, hamilton_merged.columns[[1] + list(range(5, hamilton_merged.shape[1]))]]
```

|   | Community | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|
| 11 | Ancaster | 0.0 | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 13 | Stoney Creek | 0.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Auto Garage | Auto Workshop |
| 27 | Stoney Creek | 0.0 | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage | Auto Workshop |
| 40 | Hamilton | 0.0 | Auditorium | Locksmith | College Engineering Building | Auto Dealership | Auto Garage |
| 42 | Hamilton | 0.0 | Auto Dealership | Auto Garage | Automotive Shop | Urgent Care Center | Shoe Repair |
| 44 | Dundas | 0.0 | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 45 | Flamborough | 0.0 | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 46 | Hamilton | 0.0 | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 48 | Hamilton | 0.0 | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage | Auto Workshop |
| 72 | Hamilton | 0.0 | Automotive Shop | Auto Garage | Auto Workshop | Urgent Care Center | Shoe Repair |


In [30]:
```python
# Examine Cluster 2 (label 1)
hamilton_merged.loc[hamilton_merged['Cluster Labels'] == 1, hamilton_merged.columns[[1] + list(range(5, hamilton_merged.shape[1]))]]
```

|   | Community | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|
| 3 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 4 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 7 | Glanbrook | 1.0 | Automotive Shop | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership |
| 8 | Hamilton | 1.0 | Automotive Shop | Auto Workshop | Urgent Care Center | Shoe Repair | Auto Dealership |
| 18 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Garage |
| 22 | Hamilton | 1.0 | Automotive Shop | Hardware Store | Urgent Care Center | College Engineering Building | Auto Dealership |
| 24 | Hamilton | 1.0 | Automotive Shop | Auto Dealership | Auto Workshop | Urgent Care Center | Shoe Repair |
| 25 | Hamilton | 1.0 | Automotive Shop | Hardware Store | Urgent Care Center | College Engineering Building | Auto Dealership |
| 26 | Hamilton | 1.0 | Automotive Shop | Auto Garage | Urgent Care Center | Shoe Repair | Auto Dealership |
| 30 | Hamilton | 1.0 | Automotive Shop | Auto Garage | Car Wash | Urgent Care Center | Shoe Repair |


In [31]:
```python
# Examine Cluster 3 (label 2)
hamilton_merged.loc[hamilton_merged['Cluster Labels'] == 2, hamilton_merged.columns[[1] + list(range(5, hamilton_merged.shape[1]))]]
```

|   | Community | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|
| 14 | Hamilton | 2.0 | Locksmith | Auto Garage | Urgent Care Center | College Engineering Building | Auto Dealership |
| 19 | Hamilton | 2.0 | Auto Garage | Automotive Shop | Urgent Care Center | Shoe Repair | Auto Dealership |
| 32 | Hamilton | 2.0 | Locksmith | Auto Garage | Urgent Care Center | College Engineering Building | Auto Dealership |
| 79 | Hamilton | 2.0 | Auto Garage | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Workshop |
| 88 | Hamilton | 2.0 | Auto Garage | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Workshop |
| 160 | Hamilton | 2.0 | Auto Garage | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Workshop |
| 161 | Hamilton | 2.0 | Locksmith | Auto Garage | Urgent Care Center | College Engineering Building | Auto Dealership |
| 184 | Hamilton | 2.0 | Auto Garage | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Workshop |
| 203 | Hamilton | 2.0 | Locksmith | Auto Garage | Urgent Care Center | College Engineering Building | Auto Dealership |
| 221 | Dundas | 2.0 | Auto Garage | Urgent Care Center | Shoe Repair | Auto Dealership | Auto Workshop |
