# Coursera IBM Data Science Professional Capstone Project

# Auto-workshop location for a new business in Hamilton, Ontario
## Report written by Abiola D. Obembe
### Date: 23-01-2020

# 1. Introduction

# 1.1 Background


Hamilton is a port city in the Canadian province of Ontario. An industrialized city in the Golden Horseshoe at the west end of Lake Ontario, Hamilton has a population of 536,917, and its census metropolitan area, which includes Burlington and Grimsby, has a population of 747,545. The city is 58 kilometres (36 mi) southwest of Toronto, and together with Toronto it forms the Greater Toronto and Hamilton Area (GTHA). On January 1, 2001, the current boundaries of Hamilton were created through the amalgamation of the original city with the other municipalities of the Regional Municipality of Hamilton–Wentworth. Residents of the city are known as Hamiltonians. Since 1981, the metropolitan area has been listed as the ninth largest in Canada and the third largest in Ontario. Hamilton is home to the Royal Botanical Gardens, the Canadian Warplane Heritage Museum, the Bruce Trail, McMaster University, Redeemer University College, and Mohawk College. McMaster University is ranked 4th in Canada and 77th in the world by the Times Higher Education Rankings 2018–19.

# 1.2 Problem statement
Mr. Jenkins recently immigrated to Ontario, Canada, as a permanent resident and currently lives in the city of Hamilton. His long-term goal is to become an entrepreneur, which was part of the reason he relocated to Canada with his family. He has substantial funds to invest in a profitable business, and the Canadian government offers incentives and tax waivers to help entrepreneurs establish auto businesses in Canada. After spending a few months in Hamilton and studying the city carefully, Mr. Jenkins is confident that an automobile repair workshop will be profitable, but he is unsure where to locate it. The aim of this project is to use open data from the City of Hamilton together with Foursquare location data to advise Mr. Jenkins on potential location(s) for his workshop.


# 2. Data Section

The data for this project will be obtained from the City of Hamilton open data portal at http://open.hamilton.ca/datasets/ac6fc684043341f6b1d6298c146a0bcf_1, which lists the distinct municipal addresses in Hamilton. The dataset is available as both CSV and GeoJSON files. The original dataset consists of 253,876 rows and 13 columns (features): "X-Coordinate", "Y-Coordinate", "Object ID", "Longitude", "Latitude", "Number Complete", "Unit Number Complete", "Full Street Name", "Settlement", "Community", "Municipality", "Country", and "Province".

Further inspection of the dataset shows that the City of Hamilton is divided into six communities: Hamilton, Dundas, Stoney Creek, Ancaster, Flamborough, and Glanbrook. The columns "X-Coordinate", "Y-Coordinate", "Object ID", "Number Complete", "Unit Number Complete", "Settlement", "Municipality", "Country", and "Province" add no information relevant to this analysis and will be excluded (the object ID is kept only as a row identifier). In addition, for convenience and to reduce computation time, only the first 10,000 rows of the dataset will be used for the analysis.
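The column selection and row limit described above can be sketched as a small helper that flattens GeoJSON features into a dataframe. This is a minimal illustration, not the notebook's own cell. The helper name `features_to_frame` is hypothetical. The two sample features are copied from the output in Section 2.1 and trimmed to a few properties. The sketch relies on GeoJSON storing point coordinates as `[longitude, latitude]`.

```python
import pandas as pd

# Two sample features mirroring the structure of the Hamilton GeoJSON (trimmed properties)
sample_features = [
    {'properties': {'OBJECTID': 1001, 'FULL_STREET_NAME': 'Thornton Trail',
                    'COMMUNITY': 'Dundas', 'COUNTRY': 'Canada'},
     'geometry': {'type': 'Point', 'coordinates': [-79.986884387345, 43.2607172651189]}},
    {'properties': {'OBJECTID': 1002, 'FULL_STREET_NAME': 'North Service Road',
                    'COMMUNITY': 'Stoney Creek', 'COUNTRY': 'Canada'},
     'geometry': {'type': 'Point', 'coordinates': [-79.70825955290819, 43.23554798910615]}},
]


def features_to_frame(features, n_rows):
    """Keep only the fields used in the analysis for the first n_rows features (hypothetical helper)."""
    rows = []
    for feature in features[:n_rows]:
        props = feature['properties']
        lon, lat = feature['geometry']['coordinates'][:2]  # GeoJSON stores [lon, lat]
        rows.append({'ID': props['OBJECTID'],
                     'Address': props['FULL_STREET_NAME'],
                     'Community': props['COMMUNITY'],
                     'Longitude': lon,
                     'Latitude': lat})
    return pd.DataFrame(rows, columns=['ID', 'Address', 'Community', 'Longitude', 'Latitude'])


frame = features_to_frame(sample_features, n_rows=10000)
print(frame)
```

Slicing with `[:n_rows]` is safe even when fewer features are available, so the same call works on the full download and on small test samples.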

Finally, Foursquare data will be used to explore venues around the sampled addresses in the City of Hamilton. The goal is to understand trending venues, identify the best location for Mr. Jenkins's car workshop and, if possible, propose alternative businesses.

## 2.1 Sample of the Original Dataset

In [1]:
# import libraries (geopy and folium are installed first if missing)
import json

import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
! pip install geopy
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
! pip install folium
import folium
print('Libraries imported!')

Libraries imported!


In [2]:
# Download the GeoJSON file and keep the first 10,000 address features
!wget -q -O 'hamilton_data.json' https://opendata.arcgis.com/datasets/ac6fc684043341f6b1d6298c146a0bcf_1.geojson

with open('hamilton_data.json') as json_data:
    hamilton_data = json.load(json_data)
hamilton_dataset = hamilton_data['features'][0:10000]

hamilton_dataset

[{'type': 'Feature',
  'properties': {'OBJECTID': 1001,
   'LONGITUDE': '-79.98688084600423',
   'LATITUDE': '43.26070877732673',
   'NUMBER_COMPLETE': '4',
   'UNIT_NUMBER_COMPLETE': None,
   'FULL_STREET_NAME': 'Thornton Trail',
   'SETTLEMENT': None,
   'COMMUNITY': 'Dundas',
   'MUNICIPALITY': 'City of Hamilton',
   'COUNTRY': 'Canada',
   'PROVINCE': 'Ontario'},
  'geometry': {'type': 'Point',
   'coordinates': [-79.986884387345, 43.2607172651189]}},
 {'type': 'Feature',
  'properties': {'OBJECTID': 1002,
   'LONGITUDE': '-79.70825610204196',
   'LATITUDE': '43.23553949279581',
   'NUMBER_COMPLETE': '515',
   'UNIT_NUMBER_COMPLETE': '27',
   'FULL_STREET_NAME': 'North Service Road',
   'SETTLEMENT': None,
   'COMMUNITY': 'Stoney Creek',
   'MUNICIPALITY': 'City of Hamilton',
   'COUNTRY': 'Canada',
   'PROVINCE': 'Ontario'},
  'geometry': {'type': 'Point',
   'coordinates': [-79.70825955290819, 43.23554798910615]}},
 {'type': 'Feature',
  'properties': {'OBJECTID': 1003,
   'LONGI

In [3]:
# display the number of address features in hamilton_dataset (a list of dictionaries)
print('The length of the Hamilton dataset/dictionary is', len(hamilton_dataset))

The length of the Hamilton dataset/dictionary is 10000


In [4]:
# Create a dataframe with one row per address point
column_names = ['ID', 'Address', 'Longitude', 'Latitude', 'Settlement', 'Community', 'Municipal']
rows = []

for data in hamilton_dataset:
    neigh_latlon = data['geometry']['coordinates']   # GeoJSON order is [longitude, latitude]
    rows.append({'ID': data['properties']['OBJECTID'],
                 'Longitude': neigh_latlon[0],
                 'Latitude': neigh_latlon[1],
                 'Address': data['properties']['FULL_STREET_NAME'],
                 'Community': data['properties']['COMMUNITY']})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0);
# Settlement and Municipal are not filled and stay empty
neighborhoods = pd.DataFrame(rows, columns=column_names)

neighborhoods.head()

|  | ID | Address | Longitude | Latitude | Settlement | Community | Municipal |
|---|---|---|---|---|---|---|---|
| 0 | 1001 | Thornton Trail | -79.986884 | 43.260717 |  | Dundas |  |
| 1 | 1002 | North Service Road | -79.70826 | 43.235548 |  | Stoney Creek |  |
| 2 | 1003 | Frances Avenue | -79.723719 | 43.240725 |  | Stoney Creek |  |
| 3 | 1004 | Eastview Avenue | -79.754216 | 43.233877 |  | Hamilton |  |
| 4 | 1005 | Concession 2 West | -80.047481 | 43.264434 |  | Flamborough |  |


# 2.2 Sample of dataset for Analysis

In [5]:
# Quality check: drop columns that contain missing values (Settlement and Municipal are empty)
neighborhoods.dropna(axis=1, how='any', inplace=True)
print('The dataframe has {} communities in the city of Hamilton.'.format(len(neighborhoods['Community'].unique())))
print(neighborhoods.shape)
neighborhoods.head()

The dataframe has 6 communities in the city of Hamilton.
(10000, 5)


|  | ID | Address | Longitude | Latitude | Community |
|---|---|---|---|---|---|
| 0 | 1001 | Thornton Trail | -79.986884 | 43.260717 | Dundas |
| 1 | 1002 | North Service Road | -79.70826 | 43.235548 | Stoney Creek |
| 2 | 1003 | Frances Avenue | -79.723719 | 43.240725 | Stoney Creek |
| 3 | 1004 | Eastview Avenue | -79.754216 | 43.233877 | Hamilton |
| 4 | 1005 | Concession 2 West | -80.047481 | 43.264434 | Flamborough |


In [8]:
# keep one address point per (Address, Community) pair and display the number of communities

df = neighborhoods.groupby(['Address', 'Community'], as_index=False).first()

print('The dataframe has {} communities in the city of Hamilton.'.format(len(df['Community'].unique())))
print(df.shape)

The dataframe has 6 communities in the city of Hamilton.
(2455, 5)
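The `groupby(...).first()` call above keeps one row per (Address, Community) pair, which is close to `drop_duplicates`. The two differ when values are missing. `first()` takes the first non-null value of each column separately, whereas `drop_duplicates` keeps the first row intact. The toy rows below use hypothetical street names and coordinates to show that both give the same result when nothing is missing.

```python
import pandas as pd

# Hypothetical rows: two points on "Street A" in Hamilton, one on "Street A" in Stoney Creek
points = pd.DataFrame({
    'Address': ['Street A', 'Street A', 'Street A'],
    'Community': ['Hamilton', 'Hamilton', 'Stoney Creek'],
    'ID': [10, 11, 12],
    'Longitude': [-79.85, -79.86, -79.75],
    'Latitude': [43.24, 43.25, 43.22],
})

# Keep the first point for each (Address, Community) pair, as in the cell above
by_groupby = points.groupby(['Address', 'Community'], as_index=False).first()

# Row-preserving alternative: keeps whole first rows rather than first non-null values per column
by_dedup = points.drop_duplicates(subset=['Address', 'Community'], keep='first')

print(by_groupby)
print(by_dedup)
```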


In [9]:
# More information on dataframe
print(df['Community'].unique())
df.head()

['Ancaster' 'Hamilton' 'Glanbrook' 'Stoney Creek' 'Flamborough' 'Dundas']


|  | Address | Community | ID | Longitude | Latitude |
|---|---|---|---|---|---|
| 0 | Abbey Close | Ancaster | 4594 | -80.001446 | 43.211121 |
| 1 | Abbington Drive | Hamilton | 2991 | -79.905134 | 43.228994 |
| 2 | Abbotsford Trail | Glanbrook | 1825 | -79.912267 | 43.197734 |
| 3 | Aberdeen Avenue | Hamilton | 1675 | -79.888418 | 43.250281 |
| 4 | Aberfoyle Avenue | Hamilton | 1990 | -79.809026 | 43.221419 |


In [10]:
# Use the geopy library to get the latitude and longitude of the city of Hamilton
address    = 'Hamilton, ON'
geolocator = Nominatim(user_agent='hamilton_explorer')
location   = geolocator.geocode(address)
latitude   = location.latitude
longitude  = location.longitude

print('The geographical coordinates of the city of Hamilton are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the city of Hamilton are 43.255205, -79.868202.


In [13]:
# Create a map of the city of Hamilton with the first 1,000 address points superimposed
map_hamilton = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to the map
for lat, lng, label in zip(df['Latitude'][0:1000], df['Longitude'][0:1000], df['Address'][0:1000]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='green', fill=True,
                        fill_color='#3186cc', fill_opacity=0.7).add_to(map_hamilton)

map_hamilton

In [14]:
# Let's explore the first address in the df dataframe
lati  = df.loc[0, 'Latitude']
longi = df.loc[0, 'Longitude']
addy  = df.loc[0, 'Address']

print('Latitude and Longitude values of {} are {}, {}'.format(addy, lati, longi))

Latitude and Longitude values of Abbey Close are 43.211121375501264, -80.001446453813


In [21]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: [redacted]
CLIENT_SECRET: [redacted]
Auto ------ OK!
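The credentials cell was removed before sharing, yet `getNearbyVenues` below relies on the globals `CLIENT_ID`, `CLIENT_SECRET`, `VERSION`, `search_query` and `LIMIT`. A hypothetical reconstruction that keeps secrets out of the notebook might look like this. The environment-variable names, the version date and the limit are assumptions. Only `search_query = 'Auto'` is suggested by the output above.

```python
import os

# Read Foursquare credentials from environment variables instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are assumptions.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')

VERSION = '20180605'   # hypothetical API version date in YYYYMMDD form
LIMIT = 100            # hypothetical maximum number of venues per request
search_query = 'Auto'  # consistent with the "Auto ------ OK!" output above

print(search_query + ' ------ OK!')
```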


In [23]:
# function to extract the category name of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [24]:
# Explore all the addresses in the dataframe: for each address, query the Foursquare
# venue search endpoint and collect the venues returned within `radius` metres.
# CLIENT_ID, CLIENT_SECRET, VERSION, search_query and LIMIT come from the (removed) credentials cell.

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request
        url = 'https://api.foursquare.com/v2/venues/search'
        params = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET, 'v': VERSION,
                  'll': '{},{}'.format(lat, lng), 'query': search_query,
                  'radius': radius, 'limit': LIMIT}
        results = requests.get(url, params=params).json()['response']['venues']

        # keep the address, its coordinates, and each venue's name, location and categories
        venues_list.append([(name, lat, lng,
                             v['name'], v['location']['lat'], v['location']['lng'], v['categories'])
                            for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Address', 'Address Latitude', 'Address Longitude',
                             'Venue', 'Venue Latitude', 'Venue Longitude', 'Category']
    return nearby_venues
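To see exactly what `getNearbyVenues` sends to Foursquare without spending API calls, you can build the query string offline with the standard library. This is a sketch. The helper `build_search_query` and the placeholder credentials are hypothetical, and the parameter names simply mirror those in the request above.

```python
from urllib.parse import urlencode


def build_search_query(client_id, client_secret, version, lat, lng, query, radius=500, limit=100):
    """Return the URL-encoded query string for a /v2/venues/search request (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"; the comma is encoded as %2C
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return urlencode(params)


qs = build_search_query('ID', 'SECRET', '20180605', 43.250281, -79.888418, 'Auto')
print('https://api.foursquare.com/v2/venues/search?' + qs)
```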


In [25]:
# Apply the above function to each address and create a new dataframe called hamilton_Venues

hamilton_Venues = getNearbyVenues(names = df['Address'], latitudes = df['Latitude'], longitudes= df['Longitude'])

hamilton_Venues

Abbey Close
Abbington Drive
Abbotsford Trail
Aberdeen Avenue
Aberfoyle Avenue
Acacia Street
Academy Street
Acadia Drive
Ackland Street
Acredale Drive
Adair Avenue North
Adair Avenue South
Adele Court
Adeline Avenue
Adis Avenue
Adriatic Boulevard
Afton Avenue
Agnes Street
Aikman Avenue
Airdrie Avenue
Airport Road East
Alanson Street
Albany Avenue
Albert Street
Alberton Road
Albion Falls Boulevard
Albright Road
Alconbury Drive
Alden Street
Alder Court
Aldercrest Avenue
Alderlea Avenue
Aldgate Avenue
Aldridge Court
Alessio Drive
Alexander Road
Alfrin Court
Algonquin Avenue
Alice Street
Allan Avenue
Allan Avenue
Allanbrook Street
Allandale Street
Allenby Avenue
Alma Lane
Alma Street
Alpine Avenue
Alterra Boulevard
Amberly Boulevard
Amelia Street
Amherst Circle
Amore Boulevard
Anchor Road
Anderson Court
Andrew Court
Angus Road
Ann Street
Anna Capri Drive
Annalee Drive
Anson Drive
Anthony Court
Antoinette Court
Appalachian Trail
Appaloosa Trail
Appleblossom Drive
Appleby Road
Appleridge Cour

|  | Address | Address Latitude | Address Longitude | Venue | Venue Latitude | Venue Longitude | Category |
|---|---|---|---|---|---|---|---|
| 0 | Aberdeen Avenue | 43.250281 | -79.888418 | West town auto | 43.25071 | -79.892156 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 1 | Aberdeen Avenue | 43.250281 | -79.888418 | Westown Auto & Tire | 43.250668 | -79.892249 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 2 | Aberdeen Avenue | 43.250281 | -79.888418 | NAPA AUTOPRO - Westown Auto Services | 43.250612 | -79.892334 | [{'id': '52f2ab2ebcbc57f1066b8b44', 'name': 'A... |
| 3 | Academy Street | 43.228072 | -79.974561 | NAPA AUTOPRO - Glendale Motors | 43.229021 | -79.975427 | [{'id': '52f2ab2ebcbc57f1066b8b44', 'name': 'A... |
| 4 | Ackland Street | 43.200957 | -79.79321 | In The Clear Auto Glass | 43.198649 | -79.792805 | [{'id': '4eb1c1623b7b52c0e1adc2ec', 'name': 'A... |
| 5 | Adair Avenue South | 43.233583 | -79.791425 | Auto Xperts | 43.235489 | -79.796685 | [] |
| 6 | Adair Avenue South | 43.233583 | -79.791425 | Auto FX Performance | 43.235465 | -79.796947 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 7 | Adair Avenue South | 43.233583 | -79.791425 | Strickland's Automart | 43.23288 | -79.79534 | [{'id': '4bf58dd8d48988d124951735', 'name': 'A... |
| 8 | Adeline Avenue | 43.241832 | -79.791572 | Princess Auto | 43.243707 | -79.78548 | [{'id': '4bf58dd8d48988d112951735', 'name': 'H... |
| 9 | Adriatic Boulevard | 43.210973 | -79.7094 | Dewitt Auto | 43.215283 | -79.712558 | [] |
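Since `getNearbyVenues` searches within `radius = 500` metres, one sanity check is to compute the distance between an address and a returned venue. The sketch below uses the haversine formula, assuming a spherical Earth of radius 6,371 km as an approximation. It applies the formula to the Aberdeen Avenue and West town auto coordinates from the table above.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, assuming a spherical Earth of radius 6,371 km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))


# Aberdeen Avenue address point vs. the "West town auto" venue
d = haversine_m(43.250281, -79.888418, 43.25071, -79.892156)
print(round(d), 'metres')
```

Most of the separation here is east–west, so the longitude difference dominates the result, scaled by the cosine of the latitude.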


In [26]:
# function to extract the primary category name from each venue's Category list
def get_category_type(row):
    try:
        categories_list = row['Category']
    except KeyError:
        categories_list = row['venue.Category']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
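The next cell overwrites the `Category` column, so re-running it fails once the lists have become strings. A hedged alternative is an idempotent helper applied with `Series.map`. The name `primary_category` is illustrative. The two-row sample is based on rows 0 and 5 of the venue table above, where the first venue's category resolves to Automotive Shop.

```python
import pandas as pd


def primary_category(categories):
    """Return the first category name from a Foursquare 'categories' list (illustrative helper).

    Strings are returned unchanged, so the step can safely be re-run on an
    already-converted column; empty lists and None map to None.
    """
    if isinstance(categories, str):
        return categories
    if not categories:
        return None
    return categories[0]['name']


venues = pd.DataFrame({
    'Venue': ['West town auto', 'Auto Xperts'],
    'Category': [[{'id': '4bf58dd8d48988d124951735', 'name': 'Automotive Shop'}], []],
})
venues['Category'] = venues['Category'].map(primary_category)
print(venues)
```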

In [27]:
# Replace each category list with its primary category name and check the shape of the dataframe
hamilton_Venues['Category']= hamilton_Venues.apply(get_category_type, axis = 1)
print(hamilton_Venues.shape)

(2112, 7)


In [28]:
# Let us inspect the final dataframe
hamilton_Venues


|  | Address | Address Latitude | Address Longitude | Venue | Venue Latitude | Venue Longitude | Category |
|---|---|---|---|---|---|---|---|
| 0 | Aberdeen Avenue | 43.250281 | -79.888418 | West town auto | 43.25071 | -79.892156 | Automotive Shop |
| 1 | Aberdeen Avenue | 43.250281 | -79.888418 | Westown Auto & Tire | 43.250668 | -79.892249 | Automotive Shop |
| 2 | Aberdeen Avenue | 43.250281 | -79.888418 | NAPA AUTOPRO - Westown Auto Services | 43.250612 | -79.892334 | Auto Garage |
| 3 | Academy Street | 43.228072 | -79.974561 | NAPA AUTOPRO - Glendale Motors | 43.229021 | -79.975427 | Auto Garage |
| 4 | Ackland Street | 43.200957 | -79.79321 | In The Clear Auto Glass | 43.198649 | -79.792805 | Auto Dealership |
| 5 | Adair Avenue South | 43.233583 | -79.791425 | Auto Xperts | 43.235489 | -79.796685 |  |
| 6 | Adair Avenue South | 43.233583 | -79.791425 | Auto FX Performance | 43.235465 | -79.796947 | Automotive Shop |
| 7 | Adair Avenue South | 43.233583 | -79.791425 | Strickland's Automart | 43.23288 | -79.79534 | Automotive Shop |
| 8 | Adeline Avenue | 43.241832 | -79.791572 | Princess Auto | 43.243707 | -79.78548 | Hardware Store |
| 9 | Adriatic Boulevard | 43.210973 | -79.7094 | Dewitt Auto | 43.215283 | -79.712558 |  |


In [29]:
# Let us also check how many venues were returned for each address
hamilton_Venues.groupby('Address').count()

| Address | Address Latitude | Address Longitude | Venue | Venue Latitude | Venue Longitude | Category |
|---|---|---|---|---|---|---|
| Aberdeen Avenue | 3 | 3 | 3 | 3 | 3 | 3 |
| Academy Street | 1 | 1 | 1 | 1 | 1 | 1 |
| Ackland Street | 1 | 1 | 1 | 1 | 1 | 1 |
| Adair Avenue South | 3 | 3 | 3 | 3 | 3 | 2 |
| Adeline Avenue | 1 | 1 | 1 | 1 | 1 | 1 |
| Adriatic Boulevard | 1 | 1 | 1 | 1 | 1 | 0 |
| Agnes Street | 2 | 2 | 2 | 2 | 2 | 2 |
| Aikman Avenue | 2 | 2 | 2 | 2 | 2 | 2 |
| Airdrie Avenue | 3 | 3 | 3 | 3 | 3 | 3 |
| Alanson Street | 2 | 2 | 2 | 2 | 2 | 2 |


In [30]:
# Let us also examine the number of unique categories among the returned venues and display them
# (unique() also counts None, i.e. venues without a category)
print('There are {} unique categories'. format(len(hamilton_Venues['Category'].unique())))
print(hamilton_Venues['Category'].unique())

There are 17 unique categories
['Automotive Shop' 'Auto Garage' 'Auto Dealership' None 'Hardware Store'
 'Locksmith' 'Auditorium' 'Auto Workshop' 'Urgent Care Center'
 'Shoe Repair' 'College Engineering Building' 'Electronics Store'
 'Car Wash' 'Dry Cleaner' 'Design Studio' 'Insurance Office' 'Building']


# Step 3: Analyze each address

In [31]:
# one-hot encoding (empty prefixes keep the bare category names as column names)
hamiltonVenues_onehot = pd.get_dummies(hamilton_Venues[['Category']], prefix='', prefix_sep='', dtype=int)
# add the Address column to the one-hot dataframe
hamiltonVenues_onehot['Address'] = hamilton_Venues['Address']
# move the Address column to the front
fixed_columns = [hamiltonVenues_onehot.columns[-1]] + list(hamiltonVenues_onehot.columns[:-1])
hamiltonVenues_onehot = hamiltonVenues_onehot[fixed_columns]

print(hamiltonVenues_onehot.shape)
hamiltonVenues_onehot.head()

(2112, 17)


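|  | Address | Auditorium | Auto Dealership | Auto Garage | Auto Workshop | Automotive Shop | Building | Car Wash | College Engineering Building | Design Studio | Dry Cleaner | Electronics Store | Hardware Store | Insurance Office | Locksmith | Shoe Repair | Urgent Care Center |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aberdeen Avenue | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Aberdeen Avenue | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Aberdeen Avenue | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Academy Street | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Ackland Street | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |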
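The one-hot step can be illustrated on a few rows. This sketch assumes the same `Address`/`Category` layout as `hamilton_Venues`. It shows three things:

- empty prefixes leave bare category names as column names;
- `dtype=int` yields 0/1 columns;
- a venue with no category (`None`) becomes a row of zeros.

```python
import pandas as pd

venues = pd.DataFrame({
    'Address': ['Aberdeen Avenue', 'Aberdeen Avenue', 'Aberdeen Avenue', 'Adair Avenue South'],
    'Category': ['Automotive Shop', 'Automotive Shop', 'Auto Garage', None],
})

# prefix='' and prefix_sep='' keep the bare category names as column names;
# dtype=int gives 0/1 columns (recent pandas versions otherwise return booleans)
onehot = pd.get_dummies(venues[['Category']], prefix='', prefix_sep='', dtype=int)
onehot.insert(0, 'Address', venues['Address'])
print(onehot)
```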


In [32]:
# Group rows by address and take the mean of each category column,
# i.e. the frequency of each category among the venues returned for that address
hamiltonVenues_grouped = hamiltonVenues_onehot.groupby('Address').mean().reset_index()
hamiltonVenues_grouped

|  | Address | Auditorium | Auto Dealership | Auto Garage | Auto Workshop | Automotive Shop | Building | Car Wash | College Engineering Building | Design Studio | Dry Cleaner | Electronics Store | Hardware Store | Insurance Office | Locksmith | Shoe Repair | Urgent Care Center |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aberdeen Avenue | 0.0 | 0.0 | 0.333333 | 0.0 | 0.666667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Academy Street | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Ackland Street | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Adair Avenue South | 0.0 | 0.0 | 0.0 | 0.0 | 0.666667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Adeline Avenue | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Adriatic Boulevard | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Agnes Street | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Aikman Avenue | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 |
| 8 | Airdrie Avenue | 0.0 | 0.0 | 0.0 | 0.0 | 0.666667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Alanson Street | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 |
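Averaging 0/1 indicators per address gives each category's share of that address's venues. The sketch below rebuilds two rows from the venue table:

- Aberdeen Avenue: two Automotive Shops and an Auto Garage.
- Adair Avenue South: two Automotive Shops and one venue without a category.

It also shows why the Adair Avenue South frequencies sum to 0.67 rather than 1: the uncategorized venue still counts in the denominator.

```python
import pandas as pd

onehot = pd.DataFrame({
    'Address': ['Aberdeen Avenue'] * 3 + ['Adair Avenue South'] * 3,
    'Auto Garage':     [0, 0, 1, 0, 0, 0],
    'Automotive Shop': [1, 1, 0, 1, 1, 0],  # last Adair venue has no category (all zeros)
})

# mean of 0/1 indicators = share of the address's venues falling in each category
freq = onehot.groupby('Address').mean().reset_index()
print(freq.round(2))
```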


In [33]:
# Now we will print each address along with its top 5 most common venues

num_top_venues = 5
for address_info in hamiltonVenues_grouped['Address']:
    print("----"+address_info+"------")
    temp = hamiltonVenues_grouped[hamiltonVenues_grouped['Address'] == address_info].T.reset_index()
    temp.columns =['venue','freq']
    temp = temp.iloc[1:]
    temp['freq']= temp['freq'].astype(float)
    temp = temp.round({'freq':2})
    print(temp.sort_values('freq', ascending= False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aberdeen Avenue------
               venue  freq
0    Automotive Shop  0.67
1        Auto Garage  0.33
2         Auditorium  0.00
3    Auto Dealership  0.00
4      Auto Workshop  0.00


----Academy Street------
               venue  freq
0        Auto Garage   1.0
1         Auditorium   0.0
2    Auto Dealership   0.0
3      Auto Workshop   0.0
4    Automotive Shop   0.0


----Ackland Street------
               venue  freq
0    Auto Dealership   1.0
1         Auditorium   0.0
2        Auto Garage   0.0
3      Auto Workshop   0.0
4    Automotive Shop   0.0


----Adair Avenue South------
               venue  freq
0    Automotive Shop  0.67
1         Auditorium  0.00
2    Auto Dealership  0.00
3        Auto Garage  0.00
4      Auto Workshop  0.00


----Adeline Avenue------
               venue  freq
0     Hardware Store   1.0
1         Auditorium   0.0
2    Auto Dealership   0.0
3        Auto Garage   0.0
4      Auto Workshop   0.0


----Adriatic Boulevard------
               venue 

In [34]:
# Helper that returns the names of the top num_top_venues categories for one row of hamiltonVenues_grouped
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return  row_categories_sorted.index.values[0:num_top_venues]

In [35]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
columns = ['Address']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
# Create dataframe

hamilton_venues_sorted = pd.DataFrame(columns=columns)
hamilton_venues_sorted['Address']= hamiltonVenues_grouped['Address']

for ind in np.arange(hamiltonVenues_grouped.shape[0]):
    hamilton_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hamiltonVenues_grouped.iloc[ind,:], num_top_venues)
    
hamilton_venues_sorted.head()

|  | Address | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aberdeen Avenue | Automotive Shop | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio |
| 1 | Academy Street | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 2 | Ackland Street | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 3 | Adair Avenue South | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 4 | Adeline Avenue | Hardware Store | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building | Car Wash |
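For most addresses only one or two categories have a non-zero frequency. The venues ranked 2nd to 10th above are therefore largely zero-frequency categories, ordered by tie-breaking rather than by the data. A possible refinement keeps only categories that actually occur and pads the rest with `None`. The sketch assumes the input row has the same layout as `hamiltonVenues_grouped`, and the function name `top_venues` is hypothetical.

```python
import pandas as pd


def top_venues(row, n):
    """Return up to n category names with non-zero frequency, most frequent first (hypothetical helper)."""
    freqs = row.drop('Address').astype(float)
    # stable sort so equal frequencies keep their column order
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind='mergesort')
    names = list(freqs.index[:n])
    return names + [None] * (n - len(names))  # pad so every row has n entries


row = pd.Series({'Address': 'Aberdeen Avenue', 'Auto Garage': 1 / 3,
                 'Automotive Shop': 2 / 3, 'Urgent Care Center': 0.0})
print(top_venues(row, 3))
```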


# Cluster Addresses in Hamilton using k-means

In [36]:
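# Let us perform k-means clustering on the category frequencies
kclusters = 5
hamiltonVenues_grouped_clustering = hamiltonVenues_grouped.drop(columns='Address')

# n_init=10 matches the older scikit-learn default and avoids the n_init FutureWarning in newer releases
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(hamiltonVenues_grouped_clustering)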

hamilton_venues_sorted.insert(0,'Cluster Labels', kmeans.labels_)
hamilton_venues_sorted.head()

|  | Cluster Labels | Address | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | Aberdeen Avenue | Automotive Shop | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio |
| 1 | 0 | Academy Street | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 2 | 2 | Ackland Street | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 3 | 3 | Adair Avenue South | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 4 | 4 | Adeline Avenue | Hardware Store | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building | Car Wash |
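To interpret what each k-means cluster represents, one option is to inspect the cluster centroids, since each centroid is the average category mix of its members. The sketch uses a small hypothetical frequency table rather than the notebook's data. With `hamiltonVenues_grouped_clustering` and `kmeans` from the cell above, the same `cluster_centers_` lookup would apply.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical frequency table: three garage-dominated and three shop-dominated addresses
grouped = pd.DataFrame({
    'Auto Garage':     [1.0, 0.9, 1.0, 0.0, 0.1, 0.0],
    'Automotive Shop': [0.0, 0.1, 0.0, 1.0, 0.9, 1.0],
})
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(grouped)

# Each centroid is the mean category mix of its cluster; its largest entry names the cluster
centers = pd.DataFrame(kmeans.cluster_centers_, columns=grouped.columns)
print(centers.round(2))
print(centers.idxmax(axis=1))
```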


In [37]:
# Create a new dataframe that includes the cluster label and the top ten venues of each address

hamilton_merged = df
hamilton_merged = hamilton_merged.join(hamilton_venues_sorted.set_index('Address'), on= 'Address')
hamilton_merged.head(5)

|  | Address | Community | ID | Longitude | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Abbey Close | Ancaster | 4594 | -80.001446 | 43.211121 |  |  |  |  |  |  |  |  |  |  |  |
| 1 | Abbington Drive | Hamilton | 2991 | -79.905134 | 43.228994 |  |  |  |  |  |  |  |  |  |  |  |
| 2 | Abbotsford Trail | Glanbrook | 1825 | -79.912267 | 43.197734 |  |  |  |  |  |  |  |  |  |  |  |
| 3 | Aberdeen Avenue | Hamilton | 1675 | -79.888418 | 43.250281 | 3.0 | Automotive Shop | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio |
| 4 | Aberfoyle Avenue | Hamilton | 1990 | -79.809026 | 43.221419 |  |  |  |  |  |  |  |  |  |  |  |


In [38]:
# Drop addresses without any returned venues and examine the combined dataframe
hamilton_merged.dropna(axis=0, how='any', inplace=True)
hamilton_merged.head(5)

|  | Address | Community | ID | Longitude | Latitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Aberdeen Avenue | Hamilton | 1675 | -79.888418 | 43.250281 | 3.0 | Automotive Shop | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio |
| 6 | Academy Street | Ancaster | 8769 | -79.974561 | 43.228072 | 0.0 | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 8 | Ackland Street | Stoney Creek | 54 | -79.79321 | 43.200957 | 2.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 11 | Adair Avenue South | Hamilton | 3710 | -79.791425 | 43.233583 | 3.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 13 | Adeline Avenue | Hamilton | 3080 | -79.791572 | 43.241832 | 4.0 | Hardware Store | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building | Car Wash |


In [46]:
# Let's visualize the resulting clusters (first 500 addresses)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one colour per cluster, sampled evenly from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers
for lat, lon, poi, cluster in zip(hamilton_merged['Latitude'][0:500], hamilton_merged['Longitude'][0:500],
                                  hamilton_merged['Address'][0:500], hamilton_merged['Cluster Labels'][0:500]):
    label = folium.Popup('{} - Cluster {}'.format(poi, int(cluster)), parse_html=True)
    # cluster labels start at 0, so they index the colour list directly
    folium.CircleMarker([lat, lon], radius=5, popup=label, color=rainbow[int(cluster)], fill=True,
                        fill_color=rainbow[int(cluster)], fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [41]:
# Examine Cluster 1
hamilton_merged.loc[hamilton_merged['Cluster Labels'] == 0, hamilton_merged.columns[ [1] + list(range(5, hamilton_merged.shape[1]))]]

|  | Community | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Ancaster | 0.0 | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 18 | Hamilton | 0.0 | Locksmith | Auto Garage | Urgent Care Center | Shoe Repair | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 43 | Hamilton | 0.0 | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 83 | Hamilton | 0.0 | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 84 | Hamilton | 0.0 | Locksmith | Auto Garage | Urgent Care Center | Shoe Repair | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 100 | Stoney Creek | 0.0 | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 116 | Dundas | 0.0 | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 122 | Stoney Creek | 0.0 | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 156 | Hamilton | 0.0 | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 161 | Hamilton | 0.0 | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |


In [42]:
# Examine Cluster 2
hamilton_merged.loc[hamilton_merged['Cluster Labels'] == 1, hamilton_merged.columns[ [1] + list(range(5, hamilton_merged.shape[1]))]]

| | Community | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 17 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 30 | Glanbrook | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 52 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 56 | Dundas | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 69 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 70 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 79 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 80 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 82 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 86 | Hamilton | 1.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
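Each cluster cell in this section repeats the same selection expression with a different label. Below is a minimal sketch of a reusable helper. `examine_cluster` is a hypothetical name that does not appear in the original notebook. The function assumes the column layout shown in these tables: the community name at position 1, and `Cluster Labels` followed by the ranked venues from position 5 onward.

```python
import pandas as pd


def examine_cluster(df: pd.DataFrame, label: int) -> pd.DataFrame:
    """Return the community name, cluster label and ranked venues for one cluster.

    Hypothetical helper. Assumes column 1 holds the community name and
    columns 5 onward hold 'Cluster Labels' followed by the ranked venues,
    as in the notebook cells in this section.
    """
    # Same column selection as the original cells: position 1 plus positions 5..end
    cols = df.columns[[1] + list(range(5, df.shape[1]))]
    # Keep only the rows assigned to the requested (zero-based) cluster label
    return df.loc[df['Cluster Labels'] == label, cols]
```

For example, `examine_cluster(hamilton_merged, 1)` should return the same rows as the Cluster 2 cell above.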


```python
# Examine Cluster 3 (Cluster Labels == 2; labels are zero-based)
hamilton_merged.loc[hamilton_merged['Cluster Labels'] == 2,
                    hamilton_merged.columns[[1] + list(range(5, hamilton_merged.shape[1]))]]
```

| | Community | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Stoney Creek | 2.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 103 | Stoney Creek | 2.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 111 | Stoney Creek | 2.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 186 | Hamilton | 2.0 | Dry Cleaner | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Design Studio | College Engineering Building |
| 232 | Stoney Creek | 2.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 282 | Stoney Creek | 2.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 320 | Stoney Creek | 2.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 349 | Stoney Creek | 2.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 352 | Hamilton | 2.0 | Dry Cleaner | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Design Studio | College Engineering Building |
| 409 | Hamilton | 2.0 | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
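Comparing clusters row by row becomes tedious once there are several of them. One way to summarize them is to count how many communities in each cluster share the same top-ranked venue, using `pandas.crosstab`. This sketch assumes that `hamilton_merged` keeps the `Cluster Labels` and `1st Most Common Venue` columns shown above. `top_venue_table` is a hypothetical helper name.

```python
import pandas as pd


def top_venue_table(df: pd.DataFrame) -> pd.DataFrame:
    """Count communities per (cluster label, 1st most common venue) pair.

    Hypothetical helper. Rows are cluster labels, columns are venue
    categories, and each cell is the number of communities in that cluster
    whose top-ranked venue is that category (0 if none).
    """
    return pd.crosstab(df['Cluster Labels'], df['1st Most Common Venue'])
```

For instance, calling `top_venue_table(hamilton_merged)` would show in a single row how many Cluster 3 communities (label 2) have Auto Dealership ranked first.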


```python
# Examine Cluster 4 (Cluster Labels == 3; labels are zero-based)
hamilton_merged.loc[hamilton_merged['Cluster Labels'] == 3,
                    hamilton_merged.columns[[1] + list(range(5, hamilton_merged.shape[1]))]]
```

| | Community | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Hamilton | 3.0 | Automotive Shop | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio |
| 11 | Hamilton | 3.0 | Automotive Shop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 19 | Hamilton | 3.0 | Automotive Shop | Hardware Store | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 22 | Hamilton | 3.0 | Automotive Shop | Auto Workshop | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner |
| 38 | Hamilton | 3.0 | Automotive Shop | Auto Workshop | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner |
| 39 | Dundas | 3.0 | Automotive Shop | Auto Garage | Auto Workshop | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store |
| 40 | Hamilton | 3.0 | Automotive Shop | Auto Garage | Auto Workshop | Auto Dealership | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store |
| 42 | Hamilton | 3.0 | Automotive Shop | Auto Workshop | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner |
| 46 | Hamilton | 3.0 | Automotive Shop | Auditorium | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio |
| 74 | Hamilton | 3.0 | Automotive Shop | Auto Workshop | Auto Garage | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner |
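In the tables above, the ranks after the first few slots repeat the same sequence (Urgent Care Center, Shoe Repair, Locksmith, Insurance Office, ...) for most communities. This suggests that those slots are filled by venue categories with zero frequency, ordered only by how ties are broken. Below is a sketch of a hypothetical variant of the ranking step that leaves such slots blank. It assumes that each `row` holds only one community's numeric venue-category frequencies, with category names as the index. It is not the function the notebook actually used.

```python
import pandas as pd


def most_common_venues(row: pd.Series, num_top_venues: int = 10) -> list:
    """Rank venue categories by frequency, leaving slots blank for zero-frequency categories.

    Hypothetical variant. `row` maps category name -> numeric frequency for a
    single community; it must not contain non-numeric fields.
    """
    # Drop categories never observed in this community, then sort descending.
    # mergesort is stable, so tied categories keep their original column order.
    ranked = row[row > 0].sort_values(ascending=False, kind='mergesort')
    top = list(ranked.index[:num_top_venues])
    # Pad with empty strings so every community still fills num_top_venues columns
    return top + [''] * (num_top_venues - len(top))
```

With this variant, a community whose only nearby venue is an Automotive Shop would show that single entry followed by empty ranks. It would no longer list nine categories that were never observed there.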


```python
# Examine Cluster 5 (Cluster Labels == 4; labels are zero-based)
hamilton_merged.loc[hamilton_merged['Cluster Labels'] == 4,
                    hamilton_merged.columns[[1] + list(range(5, hamilton_merged.shape[1]))]]
```

| | Community | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | Hamilton | 4.0 | Hardware Store | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building | Car Wash |
| 15 | Stoney Creek | 4.0 | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building | Car Wash |
| 21 | Hamilton | 4.0 | Locksmith | Auditorium | Urgent Care Center | Shoe Repair | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 25 | Hamilton | 4.0 | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building | Car Wash |
| 45 | Dundas | 4.0 | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building | Car Wash |
| 78 | Hamilton | 4.0 | College Engineering Building | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | Car Wash |
| 193 | Hamilton | 4.0 | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building | Car Wash |
| 309 | Dundas | 4.0 | Auto Workshop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building |
| 315 | Hamilton | 4.0 | Design Studio | Automotive Shop | Auto Workshop | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner |
| 392 | Flamborough | 4.0 | Urgent Care Center | Shoe Repair | Locksmith | Insurance Office | Hardware Store | Electronics Store | Dry Cleaner | Design Studio | College Engineering Building | Car Wash |
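The goal is to find a workshop location with little existing competition, so it may help to flag communities whose top-ranked venues already include a repair-type business. The sketch below assumes the ranked-venue column names shown above (`1st Most Common Venue`, `2nd Most Common Venue`, ...), stored in rank order. The list of competitor categories and the default `top_n=3` are illustrative assumptions rather than choices made in the report. `flag_competitors` is a hypothetical name.

```python
import pandas as pd


def flag_competitors(df: pd.DataFrame, categories: list, top_n: int = 3) -> pd.Series:
    """Return True for rows where any of the first top_n ranked venues is in `categories`.

    Hypothetical helper. Assumes the ranked-venue columns end with
    'Most Common Venue' and appear in rank order in df.columns.
    """
    # Pick the ranked-venue columns in their existing order and keep the first top_n
    venue_cols = [c for c in df.columns if c.endswith('Most Common Venue')][:top_n]
    # Element-wise membership test, then "any match" across each row
    return df[venue_cols].isin(list(categories)).any(axis=1)
```

For example, `hamilton_merged[~flag_competitors(hamilton_merged, ['Auto Workshop', 'Auto Garage', 'Automotive Shop'])]` would keep only the communities where none of those categories appears in the top three ranks.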
