<a href="https://colab.research.google.com/github/pse843/Geo-Data/blob/master/geo_data_using_machine_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Capstone Project - Geo Data in MTL using Machine Learning



## Table of contents
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results](#results)
* [Discussion](#discussion)

## Data <a name="data"></a>

The data used to characterize neighborhoods is obtained by exploring the city with the Foursquare API, which returns venues and their details: names, categories, coordinates, addresses, and distances.
The exploration is centered on Mont Royal.


After the data is obtained, the venues are clustered by category. Each clustered record is expected to carry the venue's name, category, address, and cluster number. Of these, the category and the cluster number are shown on the map as pop-up text.

## Methodology <a name="methodology"></a>

**Using API by Foursquare:**
To use the Foursquare API, a URL was built from the credentials, the coordinates obtained, and the search radius. A GET request was then sent to that URL.
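
As a minimal sketch of this step, the query string can also be assembled with the standard library's `urllib.parse.urlencode` instead of manual string formatting. The helper name `build_explore_url` and the placeholder credentials are assumptions made for illustration; the endpoint and parameter names are the ones used in this notebook.

```python
from urllib.parse import urlencode

# Endpoint used in this notebook
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Hypothetical helper: assemble the explore URL from its parts."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma inside the "ll" value unescaped
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


# Placeholder credentials, not real ones
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20200610',
                                45.4972159, -73.6103642, 100000, 500)
print(example_url)
```

Building the parameters as a dictionary keeps each value next to its name, which makes it harder to swap two positional arguments by mistake.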


**Obtaining data by JSON:**
The response was returned as JSON. The essential fields were extracted from it and then normalized into a dataframe.
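
To illustrate what normalization means here, the sketch below flattens nested dictionaries into dotted keys using only the standard library. It is a simplified stand-in for what `json_normalize` does with nested dictionaries, not its actual implementation. The function name `flatten` and the abbreviated `item` are assumptions for illustration.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists and scalar values are kept as they are
            flat[full_key] = value
    return flat


# Abbreviated item, shaped like the Foursquare response in this notebook
item = {
    'referralId': 'e-0-4ad8f749f964a520871621e3-0',
    'venue': {
        'name': 'Parc du Mont-Royal',
        'categories': [{'name': 'Park'}],
        'location': {'lat': 45.50407921694641, 'lng': -73.58732075575296},
    },
}

flat_item = flatten(item)
print(sorted(flat_item))
```

This is why the dataframe columns carry names such as `venue.name` and `venue.location.lat`, and why `venue.categories` still holds a list that needs a separate extraction step.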


**Clustered by categories:**
The venues were clustered based on the values in the `categories` column. The clustering result was then merged into the existing dataframe.


**Visualized by Folium:**
The clustered venues were visualized on a map using Folium. The map was centered on downtown Montreal.


In [29]:
import requests 
import pandas as pd 
import numpy as np 
import folium 

from geopy.geocoders import Nominatim 
from IPython.core.display import HTML     
from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated

print('Libraries imported.')

Libraries imported.


In [0]:
# Define Foursquare Credentials and Version

CLIENT_ID = ''
CLIENT_SECRET = '' 
VERSION = ''
LIMIT = 30

In [31]:
# Get the coordinates of Montreal

address = 'Montreal, QC'

geolocator = Nominatim(user_agent="mtl_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
radius = 100000
LIMIT = 500
print('The geographical coordinates of Montreal are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Montreal are 45.4972159, -73.6103642.


In [32]:
# Create a URL

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url


'https://api.foursquare.com/v2/venues/explore?client_id=LHQLMZUF4AG2AIKEIARI2GGKBUTDRBMI2BCDQU3UGLA4BPXZ&client_secret=0CFPDJVYDP3MVM2MEQKPJX1JC2IINL5VT5J5OIVEQWVXY225&ll=45.4972159,-73.6103642&v=20200610&radius=100000&limit=500'

In [0]:
# Send GET request then check the results:

results = requests.get(url).json()

## Analysis <a name="analysis"></a>

**JSON file:**
Since the data was returned as JSON, it had to be inspected before use. The structure of the data, including the hierarchy of keys and values in the nested dictionaries, was checked. This made it possible to locate the fields to extract.
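
One way to carry out this kind of inspection is sketched below. The helper `describe_structure` and the abbreviated `sample` are assumptions for illustration; the helper walks a nested object and lists each key with the type of its value, looking only at the first element of any list.

```python
def describe_structure(obj, depth=0, max_depth=3):
    """Return indented lines naming the keys and value types of a nested object."""
    lines = []
    indent = '  ' * depth
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append('{}{}: {}'.format(indent, key, type(value).__name__))
            if depth < max_depth:
                lines.extend(describe_structure(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only the first element is inspected, assuming a homogeneous list
        lines.extend(describe_structure(obj[0], depth, max_depth))
    return lines


# Abbreviated object with the same nesting as results['response']['groups'][0]['items']
sample = {'response': {'groups': [{'items': [{'venue': {'name': 'Parc du Mont-Royal'}}]}]}}
structure_lines = describe_structure(sample, max_depth=5)
print('\n'.join(structure_lines))
```

The printed hierarchy makes the access path `results['response']['groups'][0]['items']` visible without scrolling through the full response.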


In [34]:
items = results['response']['groups'][0]['items']
items[0]

{'reasons': {'count': 0,
  'items': [{'reasonName': 'globalInteractionReason',
    'summary': 'This spot is popular',
    'type': 'general'}]},
 'referralId': 'e-0-4ad8f749f964a520871621e3-0',
 'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
     'suffix': '.png'},
    'id': '4bf58dd8d48988d163941735',
    'name': 'Park',
    'pluralName': 'Parks',
    'primary': True,
    'shortName': 'Park'}],
  'id': '4ad8f749f964a520871621e3',
  'location': {'address': '1260 Chemin Remembrance',
   'cc': 'CA',
   'city': 'Montréal',
   'country': 'Canada',
   'distance': 1953,
   'formattedAddress': ['1260 Chemin Remembrance',
    'Montréal QC H3H 1A2',
    'Canada'],
   'labeledLatLngs': [{'label': 'display',
     'lat': 45.50407921694641,
     'lng': -73.58732075575296}],
   'lat': 45.50407921694641,
   'lng': -73.58732075575296,
   'postalCode': 'H3H 1A2',
   'state': 'QC'},
  'name': 'Parc du Mont-Royal',
  'photos': {'count': 0, 'grou

**Dataframe:**
To transform the data from the JSON file to a dataframe, normalization was performed. Then the number of venues was checked.


In [35]:
# Transform the JSON file as Pandas Dataframe

df = json_normalize(items)



In [0]:
# Filter the columns:

filtered_columns = ['venue.name', 'venue.categories'] + [col for col in df.columns if col.startswith('venue.location.')] + ['venue.id']
df = df.loc[:, filtered_columns]

In [0]:
# Create a function that extracts the category of the venue

def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [0]:
df['venue.categories'] = df.apply(get_category_type, axis=1)

In [39]:
print('{} venues were returned.'.format(df.shape[0]))

100 venues were returned.


In [0]:
# Clean column names by keeping only last term

df.columns = [col.split('.')[-1] for col in df.columns]

In [41]:
df.head()

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,Parc du Mont-Royal,Park,1260 Chemin Remembrance,45.504079,-73.587321,"[{'label': 'display', 'lat': 45.50407921694641...",1953,H3H 1A2,CA,Montréal,QC,Canada,"[1260 Chemin Remembrance, Montréal QC H3H 1A2,...",,,4ad8f749f964a520871621e3
1,Parc Westmount Park,Park,Parc Westmount,45.481574,-73.597476,"[{'label': 'display', 'lat': 45.48157356036293...",2010,,CA,Westmount,QC,Canada,"[Parc Westmount, Westmount QC, Canada]",,,4b1b4905f964a5208bfa23e3
2,Chalet du Mont-Royal,Historic Site,1196 voie Camillien-Houde,45.503904,-73.58746,"[{'label': 'display', 'lat': 45.503904, 'lng':...",1935,H3H 1A2,CA,Montréal,QC,Canada,[1196 voie Camillien-Houde (Parc du Mont-Royal...,Parc du Mont-Royal,,4cd6eeddaeb1224b49fc2aff
3,Want Apothecary,Clothing Store,"4960 Sherbrooke Street, West",45.477508,-73.605049,"[{'label': 'display', 'lat': 45.47750760210334...",2232,H3Z 1H3,CA,Westmount,QC,Canada,"[4960 Sherbrooke Street, West (Claremont Avenu...",Claremont Avenue,,4de4440ed4c07044059821ef
4,Drawn & Quarterly,Bookstore,211 rue Bernard Ouest,45.524748,-73.604614,"[{'label': 'display', 'lat': 45.52474756423502...",3097,H2T 2K5,CA,Montréal,QC,Canada,"[211 rue Bernard Ouest, Montréal QC H2T 2K5, C...",,,4ad4c06ff964a5205ffb20e3


In [0]:
# Generate a map

venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) 

In [43]:
# Add a red marker to the map to mark the center of Montreal

folium.CircleMarker(
    [latitude, longitude],
    radius=5,
    color='red',
    popup='Center of Montreal',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

<folium.vector_layers.CircleMarker at 0x7fd32fc47438>

In [0]:
# Add the places to the map as blue circle markers

for lat, lng, label in zip(df.lat, df.lng, df.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

In [45]:
venues_map

The venues obtained from the JSON response were then clustered with K-means.


In [46]:
# Cluster the venues using K-means

from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

print('Libraries imported.')

Libraries imported.


In [0]:
# Assign the number of clusters
kclusters = 7

# One-hot encode the categories
montreal_onehot = pd.get_dummies(df[['categories']])

# Run K-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(montreal_onehot)

# Add a column for cluster labels
montreal_onehot.insert(0, 'Cluster Labels', kmeans.labels_)


In [48]:
montreal_onehot.head()

Unnamed: 0,Cluster Labels,categories_Asian Restaurant,categories_BBQ Joint,categories_Bagel Shop,categories_Bakery,categories_Beer Store,categories_Bookstore,categories_Brewery,categories_Café,categories_Canal,categories_Cheese Shop,categories_Church,categories_Clothing Store,categories_Coffee Shop,categories_College Gym,categories_Concert Hall,categories_Deli / Bodega,categories_Falafel Restaurant,categories_Farmers Market,categories_Fish Market,categories_French Restaurant,categories_Garden,categories_Garden Center,categories_Gourmet Shop,categories_Grocery Store,categories_Gym,categories_Harbor / Marina,categories_Historic Site,categories_Hockey Arena,categories_Hotel,categories_Ice Cream Shop,categories_Liquor Store,categories_Malay Restaurant,categories_Market,categories_Martial Arts Dojo,categories_Mediterranean Restaurant,categories_Mexican Restaurant,categories_Middle Eastern Restaurant,categories_Music Venue,categories_Park,categories_Performing Arts Venue,categories_Plaza,categories_Portuguese Restaurant,categories_Pub,categories_Racetrack,categories_Restaurant,categories_Salad Place,categories_Sandwich Place,categories_Smoke Shop,categories_Spa,categories_Supermarket
0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [0]:
# Add the cluster labels into the dataframe
df['Cluster Labels'] = montreal_onehot['Cluster Labels'] 

To visualize the clusters, a color scheme was assigned to them, so that the color of each spot on the map depends on the cluster it belongs to.
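
The idea of one color per cluster can be sketched with the standard library alone. This is a swapped-in alternative to the matplotlib `rainbow` colormap used in the notebook, not the method used here. The helper name `cluster_palette` and the saturation and brightness values are assumptions for illustration.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return one hex color per cluster, evenly spaced around the hue wheel."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(round(r * 255)), int(round(g * 255)), int(round(b * 255))))
    return palette


palette = cluster_palette(7)
# Index the palette directly by cluster label (labels run from 0 to k-1)
color_for_cluster_3 = palette[3]
print(palette)
```

In this sketch the palette is indexed by the label itself. The notebook indexes with `cluster-1`, so label 0 wraps around to the last color; each cluster still gets its own color either way.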


In [50]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# Assign a color scheme to the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys))) 
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers on the map
markers_colors = []
for lat, lon, poi, cluster in zip(df['lat'], df['lng'], df['categories'], df['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## Results <a name="results"></a>

From the JSON file, 100 venues in Montreal and their information were acquired.

The data was transformed into a dataframe. After filtering the columns, extracting the category names, and shortening the column names to their last term, a readable dataframe was obtained.


The venues were visualized on a map using Folium. The red marker shows the center of the city, and the blue markers show the venues.
The map shows that most spots are scattered across downtown and Le Plateau - Mont-Royal.

As the next step, the spots were clustered using K-means. The venues were assigned to 7 clusters.


The clustered spots were visualized on the map, with each spot meant to be colored according to its cluster. Clicking a spot opens a pop-up showing its category and cluster number.



Issue found: a single cluster contains venues of many different categories.

## Discussion <a name="discussion"></a>

While inspecting the visualization, an issue was found:
venues of many different types were displayed in the same color on the map.


Listing one cluster confirmed that these different kinds of businesses had been assigned to the same cluster:

In [51]:
df.loc[df['Cluster Labels'] == 0, df.columns[[1] + [0] + [2]]]

Unnamed: 0,categories,name,address
2,Historic Site,Chalet du Mont-Royal,1196 voie Camillien-Houde
3,Clothing Store,Want Apothecary,"4960 Sherbrooke Street, West"
4,Bookstore,Drawn & Quarterly,211 rue Bernard Ouest
7,Grocery Store,Milano Fruiterie,6862 boul. Saint-Laurent
9,Market,Marché Atwater,138 ave. Atwater
10,Farmers Market,Marché Jean-Talon,7070 avenue Henri-Julien
11,Historic Site,Vieux-Port de Montréal / Old Port of Montreal ...,Vieux-Port de Montréal
12,Performing Arts Venue,Place des Arts,175 rue Sainte-Catherine Ouest
14,Plaza,Place des Festivals,Quartier des Spectacles
15,Canal,Canal-de-Lachine,Lachine canal / Canal de Lachine


The issue persisted when the number of clusters was increased, and only the first cluster was affected. The mix became less severe when the spots were split into 45 clusters. However, 45 clusters for 100 spots is neither a meaningful grouping nor useful for visualization.
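
A possible explanation, offered as a sketch and not as a verified diagnosis of this run: when each venue has exactly one category, its one-hot vector contains a single 1, so any two venues with different categories are the same Euclidean distance apart. K-means then has no information about which categories resemble each other, and with fewer clusters than distinct categories some unrelated categories have to share a cluster. The category list below is a small sample chosen for illustration.

```python
import math
from itertools import combinations

# Small sample of categories that appear in this notebook
categories = ['Park', 'Café', 'Coffee Shop', 'Mexican Restaurant', 'Restaurant']


def one_hot(value, vocabulary):
    """Encode a single category as a 0/1 vector over the vocabulary."""
    return [1 if value == item else 0 for item in vocabulary]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


vectors = {c: one_hot(c, categories) for c in categories}

# Distance between every pair of distinct categories
distances = {
    (a, b): euclidean(vectors[a], vectors[b])
    for a, b in combinations(categories, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Every pair prints the same distance, the square root of 2. In this encoding “Café” is therefore exactly as far from “Coffee Shop” as it is from “Park”.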




Another issue remained even with 45 clusters: the category names themselves are not organized. For example, “Mexican Restaurant” was in a different cluster than “Restaurant”, and “Coffee Shop” did not belong to the same cluster as “Café”.

The goal of the reorganization was to group the venues into 7 general categories. To do this, a new column, “NewCat”, was added to the dataframe, and its values were assigned from the original categories according to a set of conditions. For example, if a category contained the word “Restaurant” or was “Coffee Shop”, it was assigned to the new category “Food & Drink”.
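
The same regrouping rule can be expressed as a lookup table plus one substring check. The sketch below is abbreviated: it lists only some of the categories and three of the seven groups. The names `GROUPS`, `LOOKUP`, and `general_category` are assumptions for illustration, and “Zoo” is a made-up category used to show the unmatched case.

```python
# Abbreviated category-to-group mapping, taken from the groups used in this notebook
GROUPS = {
    'Public Place': ['Park', 'Historic Site', 'Plaza', 'Church', 'Canal', 'Garden'],
    'Market': ['Grocery Store', 'Market', 'Fish Market', 'Farmers Market', 'Supermarket'],
    'Food & Drink': ['Café', 'Coffee Shop', 'Bakery', 'Pub', 'Brewery'],
}

# Invert the mapping: original category -> general group
LOOKUP = {cat: group for group, cats in GROUPS.items() for cat in cats}


def general_category(category):
    """Return the general group for a category, or None if nothing matches."""
    if category in LOOKUP:
        return LOOKUP[category]
    if 'Restaurant' in category:
        return 'Food & Drink'
    return None


examples = ['Park', 'Mexican Restaurant', 'Coffee Shop', 'Fish Market', 'Zoo']
regrouped = [(c, general_category(c)) for c in examples]
print(regrouped)
```

With pandas, a function like this could be applied through `df['categories'].map(general_category)`. Returning `None` for unmatched categories makes gaps in the mapping easy to spot.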

The new category column was created as follows:

In [52]:
# Create a new column in the dataframe
# Assign the values according to the conditions

df.loc[(df['categories'] == 'Park')
  | (df['categories'] == 'Historic Site')
  | (df['categories'] == 'Plaza')
  | (df['categories'] == 'Church')
  | (df['categories'] == 'Canal')
  | (df['categories'] == 'Garden'), 'NewCat'] = 'Public Place'   

df.loc[(df['categories'] == 'Restaurant') 
  | (df['categories'] == 'Café')
  | (df['categories'].str.contains('Restaurant'))
  | (df['categories'] == 'Cheese Shop')
  | (df['categories'] == 'Beer Store')
  | (df['categories'] == 'Liquor Store')
  | (df['categories'] == 'Gourmet Shop')
  | (df['categories'] == 'Bagel Shop')
  | (df['categories'] == 'Deli / Bodega')
  | (df['categories'] == 'Sandwich Place')
  | (df['categories'] == 'Ice Cream Shop')
  | (df['categories'] == 'Pub')
  | (df['categories'] == 'Salad Place')
  | (df['categories'] == 'Brewery')
  | (df['categories'] == 'Bakery')
  | (df['categories'] == 'BBQ Joint')
  | (df['categories'] == 'Coffee Shop'),'NewCat'] = 'Food & Drink' 
   
df.loc[(df['categories'] == 'Grocery Store') 
  | (df['categories'] == 'Market')
  | (df['categories'] == 'Fish Market')
  | (df['categories'] == 'Farmers Market')
  | (df['categories'] == 'Supermarket'),'NewCat'] = 'Market'

df.loc[(df['categories'] == 'Garden Center') 
  | (df['categories'] == 'Smoke Shop')
  | (df['categories'] == 'Clothing Store')
  | (df['categories'] == 'Bookstore'),'NewCat'] = 'Store'

df.loc[(df['categories'] == 'Martial Arts Dojo') 
  | (df['categories'] == 'Gym')
  | (df['categories'] == 'Racetrack')
  | (df['categories'] == 'College Gym')
  | (df['categories'] == 'Hockey Arena')
  | (df['categories'] == 'Harbor / Marina'),'NewCat'] = 'Sports'

df.loc[(df['categories'] == 'Performing Arts Venue') 
  | (df['categories'] == 'Music Venue')
  | (df['categories'] == 'Concert Hall'),'NewCat'] = 'Entertainment' 

df.loc[(df['categories'] == 'Hotel') 
  | (df['categories'] == 'Spa'),'NewCat'] = 'Hotel & Spa' 

df.head(2)

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id,Cluster Labels,NewCat
0,Parc du Mont-Royal,Park,1260 Chemin Remembrance,45.504079,-73.587321,"[{'label': 'display', 'lat': 45.50407921694641...",1953,H3H 1A2,CA,Montréal,QC,Canada,"[1260 Chemin Remembrance, Montréal QC H3H 1A2,...",,,4ad8f749f964a520871621e3,1,Public Place
1,Parc Westmount Park,Park,Parc Westmount,45.481574,-73.597476,"[{'label': 'display', 'lat': 45.48157356036293...",2010,,CA,Westmount,QC,Canada,"[Parc Westmount, Westmount QC, Canada]",,,4b1b4905f964a5208bfa23e3,1,Public Place


Let's replace the old category column with the new one:

In [0]:
# Move the new column as the second one

col_name = 'NewCat'
second_col = df.pop(col_name)
df.insert(1, col_name, second_col)

In [0]:
df_1 = df  # note: df_1 is an alias of df, not a copy, so the in-place edits below also change df

In [55]:
# Remove the old column
df_1.drop(['categories'], axis=1, inplace=True)

# Rename the new column to replace the old one
df_1.rename(columns={'NewCat': 'categories'}, inplace=True)

df_1.head(2)

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id,Cluster Labels
0,Parc du Mont-Royal,Public Place,1260 Chemin Remembrance,45.504079,-73.587321,"[{'label': 'display', 'lat': 45.50407921694641...",1953,H3H 1A2,CA,Montréal,QC,Canada,"[1260 Chemin Remembrance, Montréal QC H3H 1A2,...",,,4ad8f749f964a520871621e3,1
1,Parc Westmount Park,Public Place,Parc Westmount,45.481574,-73.597476,"[{'label': 'display', 'lat': 45.48157356036293...",2010,,CA,Westmount,QC,Canada,"[Parc Westmount, Westmount QC, Canada]",,,4b1b4905f964a5208bfa23e3,1


In [56]:
df_1['categories'].value_counts()

Food & Drink     53
Public Place     24
Market            6
Sports            6
Hotel & Spa       4
Store             4
Entertainment     3
Name: categories, dtype: int64

The regrouped venues were then clustered again:

In [0]:
kclusters = 7
montreal_onehot_1 = pd.get_dummies(df_1[['categories']])
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(montreal_onehot_1)
montreal_onehot_1.insert(0, 'Cluster Labels', kmeans.labels_)

In [58]:
montreal_onehot_1.head(2)

Unnamed: 0,Cluster Labels,categories_Entertainment,categories_Food & Drink,categories_Hotel & Spa,categories_Market,categories_Public Place,categories_Sports,categories_Store
0,2,0,0,0,0,1,0,0
1,2,0,0,0,0,1,0,0


In [59]:
# Add new cluster labels to the dataframe
df_1['Cluster Labels'] = montreal_onehot_1['Cluster Labels'] 

# Figure out the number of places by clusters 
df_1['Cluster Labels'].value_counts()

1    53
2    24
4     6
3     6
5     4
0     4
6     3
Name: Cluster Labels, dtype: int64

The spots were visualized in new clusters as follows:

In [60]:
map_clusters_1 = folium.Map(location=[latitude, longitude], zoom_start=13)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys))) 
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(df_1['lat'], df_1['lng'], df_1['categories'], df_1['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_1)

map_clusters_1

Most spots are shown as purple markers. These belong to Cluster 1, “Food & Drink”, which contains 53 of the 100 venues. The visualization therefore shows that food and drink businesses are the most common type of venue.


To double-check that the category values are consistent within each cluster, each cluster was inspected in turn:



In [61]:
# Investigate Cluster 0
df.loc[df['Cluster Labels'] == 0, df.columns[[1] + [0] + [2]]]

Unnamed: 0,categories,name,address
3,Store,Want Apothecary,"4960 Sherbrooke Street, West"
4,Store,Drawn & Quarterly,211 rue Bernard Ouest
78,Store,Casa del Habano,1434 rue Sherbrooke Ouest
99,Store,Pépinière Jasmin,6305 Boulevard Henri Bourassa


In [62]:
# Investigate Cluster 1
df.loc[df['Cluster Labels'] == 1, df.columns[[1] + [0] + [2]]]

Unnamed: 0,categories,name,address
5,Food & Drink,Boulangerie Guillaume,5132 boulevard St-Laurent
6,Food & Drink,Dispatch Coffee,267 St-Zotique
13,Food & Drink,Cadet,"1431, boul. Saint-Laurent"
17,Food & Drink,Café Olimpico,124 rue Saint-Viateur Ouest
18,Food & Drink,Boucherie Lawrence,5237 St Laurent
20,Food & Drink,Damas,1201 Van Horne
21,Food & Drink,La Tamalera,"226, avenue Fairmount Ouest"
22,Food & Drink,La Bête à Pain,195 rue Young
24,Food & Drink,Larry's,"9, avenue Fairmount Est"
26,Food & Drink,Café Parvis,"433, rue Mayor"


In [63]:
# Investigate Cluster 2
df.loc[df['Cluster Labels'] == 2, df.columns[[1] + [0] + [2]]]

Unnamed: 0,categories,name,address
0,Public Place,Parc du Mont-Royal,1260 Chemin Remembrance
1,Public Place,Parc Westmount Park,Parc Westmount
2,Public Place,Chalet du Mont-Royal,1196 voie Camillien-Houde
8,Public Place,Parc La Fontaine,3933 rue du Parc-Lafontaine
11,Public Place,Vieux-Port de Montréal / Old Port of Montreal ...,Vieux-Port de Montréal
14,Public Place,Place des Festivals,Quartier des Spectacles
15,Public Place,Canal-de-Lachine,Lachine canal / Canal de Lachine
16,Public Place,Parc Sir-Wilfrid-Laurier,Parc Sir-Wilfrid-Laurier
25,Public Place,Basilique Notre-Dame (Basilique Notre-Dame de ...,"110, rue Notre-Dame Ouest"
28,Public Place,Parc Jeanne-Mance,Parc Jeanne-Mance


In [64]:
# Investigate Cluster 3
df.loc[df['Cluster Labels'] == 3, df.columns[[1] + [0] + [2]]]

Unnamed: 0,categories,name,address
27,Sports,CEPSUM,2100 boul. Édouard-Montpetit
57,Sports,Quai de l'Horloge / Clock Tower quay,Rue Quai de l'Horloge
58,Sports,Centre Bell,1909 Ave. des Canadiens-de-Montréal
77,Sports,Circuit Gilles-Villeneuve,"1, circuit Gilles-Villeneuve"
81,Sports,Midtown Le Sporting Club Sanctuaire,6105 avenue du Boisé
93,Sports,Académie Arts Martiaux Brossard,"1505, boulevard Provencher"


In [65]:
# Investigate Cluster 4
df.loc[df['Cluster Labels'] == 4, df.columns[[1] + [0] + [2]]]

Unnamed: 0,categories,name,address
7,Market,Milano Fruiterie,6862 boul. Saint-Laurent
9,Market,Marché Atwater,138 ave. Atwater
10,Market,Marché Jean-Talon,7070 avenue Henri-Julien
34,Market,Adonis,225 rue Peel
65,Market,PA Nature,5029 Ave du Parc
89,Market,Odessa l'Entrepôt,100-4900 rue Molson


In [66]:
# Investigate Cluster 5
df.loc[df['Cluster Labels'] == 5, df.columns[[1] + [0] + [2]]]

Unnamed: 0,categories,name,address
29,Hotel & Spa,Scandinave Les Bains Vieux-Montréal,"71, rue de la Commune Ouest"
39,Hotel & Spa,Hôtel Le Germain Montréal - NOW OPEN,"2050, Rue Mansfield"
45,Hotel & Spa,"Bota Bota, spa-sur-l'eau",358 rue de la Commune O.
90,Hotel & Spa,Strøm Spa Nordique,"1001, boul. de la Forêt"


In [67]:
# Investigate Cluster 6
df.loc[df['Cluster Labels'] == 6, df.columns[[1] + [0] + [2]]]

Unnamed: 0,categories,name,address
12,Entertainment,Place des Arts,175 rue Sainte-Catherine Ouest
19,Entertainment,"96,9 CKOI",1100-800 rue De La Gauchetière Ouest
23,Entertainment,La Maison Symphonique de Montréal,1600 rue Saint-Urbain


The listings above show a single, consistent category value within every cluster.
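
The same check can be done programmatically instead of by reading each table. The sketch below uses only the standard library and a few `(cluster label, category)` pairs copied from the listings above; the variable names are assumptions for illustration.

```python
from collections import defaultdict

# (cluster label, category) pairs; a few rows copied from the listings above
rows = [
    (0, 'Store'), (0, 'Store'),
    (1, 'Food & Drink'), (1, 'Food & Drink'),
    (2, 'Public Place'),
    (6, 'Entertainment'),
]

# Collect the set of categories seen in each cluster
categories_by_cluster = defaultdict(set)
for cluster, category in rows:
    categories_by_cluster[cluster].add(category)

# A cluster is mixed if it holds more than one category
mixed = {c: cats for c, cats in categories_by_cluster.items() if len(cats) > 1}
print('Mixed clusters:', mixed)
```

On the full dataframe, `df_1.groupby('Cluster Labels')['categories'].nunique()` would give the number of distinct categories per cluster, and a value of 1 everywhere would indicate consistent clusters.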
