# SEGMENTING AND CLUSTERING NEIGHBORHOODS IN TORONTO

In this notebook, we are going to segment and cluster neighborhood data in Toronto. First, we import required libraries.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

We use Wikipedia to get the neighborhood data. First we check whether it is legal to scrape the given url. We use the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

response = requests.get(url)
print(response.status_code)

200


The output of 200 shows it is legal to scrape data from this Wikipedia page. So we scrape the data using the package BeautifulSoup. We need the tabular data from the Wikipedia page. On inspecting the Wikipedia page, we see the required data is under the < table > tag. So we find all the data under the < table > tag.

In [3]:
soup = BeautifulSoup(response.text, 'html.parser')

datatable = soup.find('table')
datatable

<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park, Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor, Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park, Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue, Humber Valley Village
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern, Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3B
</td>
<td

We see an html file. Now, we need to read this html file into a pandas dataframe. We use read_html function of pandas to complete this task. This gives us the list of dataframes which has only one element. So, we extract that element.

In [4]:
df = pd.read_html(str(datatable))
df = pd.DataFrame(df[0])
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


Now, we rename the columns as required just for convenience.

In [5]:
df.columns = ['PostalCode', 'Borough', 'Neighborhood']
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


We dont need data with not assigned Borough. So we filter only those data that has a defined Borough.

In [6]:
rowsremove = df.Borough == 'Not assigned'
df = df[~rowsremove]
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Now, we see how many rows have a defined Borough and not assigned Neighborhood and we find there are no such rows. So this task is already accomplished.

In [7]:
naneigh = df.Neighborhood == 'Not assigned'
naneigh.sum()

0

Now, we check the shape of the dataframe.

In [8]:
df.shape

(103, 3)

We now need the latitude and longitude coordinates for the given postal codes for further analysis. We have the csv file with all the latitude and longitude information that we need. So we use pandas read_csv function and store it in a dataframe names latlng_df.

In [9]:
latlng = r'C:\Users\Administrator\Documents\DS\Geospatial_Coordinates.csv'

latlng_df = pd.read_csv(latlng)
latlng_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Rename the columns for convenience and merge this latlng_df with our original dataframe. We use merge function for this task. The two dataframes are merged by matching the PostalCode column as indicated by the parameter 'on'.

In [10]:
latlng_df.columns = ['PostalCode', 'Latitude', 'Longitude']
df = pd.merge(df, latlng_df, on = 'PostalCode', sort = True)
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


In [11]:
print("There are {} Boroughs and {} Neighborhoods in the dataset".format(len(df['Borough'].unique()), df.shape[0]))

There are 10 Boroughs and 103 Neighborhoods in the dataset


There are 10 Boroughs in total. Now we see how many neighborhoods exist in each Borough and analyze the Borough with the highest number of neighborhoods.

In [12]:
df['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
York                 5
East Toronto         5
East York            5
Mississauga          1
Name: Borough, dtype: int64

Since North York spans the most number of PostalCodes, let us analyze data from North York.

In [13]:
northyork_df = df[df['Borough'] == 'North York'].reset_index(drop = True)
northyork_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M2H,North York,Hillcrest Village,43.803762,-79.363452
1,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556
2,M2K,North York,Bayview Village,43.786947,-79.385975
3,M2L,North York,"York Mills, Silver Hills",43.75749,-79.374714
4,M2M,North York,"Willowdale, Newtonbrook",43.789053,-79.408493


We have the location data according to the Postal Code. So we will use Postal Code for identification of a region. So we drop Borough and Neighborhood as we don't need them.

In [14]:
northyork_df.drop(['Borough', 'Neighborhood'], axis = 1, inplace = True)
northyork_df.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M2H,43.803762,-79.363452
1,M2J,43.778517,-79.346556
2,M2K,43.786947,-79.385975
3,M2L,43.75749,-79.374714
4,M2M,43.789053,-79.408493


First, we will get the location of the chosen Borough using geopy package and see it on the map using folium.

In [15]:
from geopy.geocoders import Nominatim
import folium

address = 'North York, Toronto'
geolocator = Nominatim(user_agent = 'my_explorer')
location = geolocator.geocode(address)
ny_latitude = location.latitude
ny_longitude = location.longitude

In [16]:
ny_map = folium.Map(location = [ny_latitude, ny_longitude], zoom_start = 11)

for lat, lng, label in zip(northyork_df['Latitude'], northyork_df['Longitude'], northyork_df['PostalCode']):
    folium.CircleMarker(
        [lat,lng],
        popup = label,
        radius = 3,
        color = 'blue',
        fill = True, 
        fill_opacity = 0.6,
    ).add_to(ny_map)
    
ny_map    

Let's get the data about venues nearby using foursquare API.

In [17]:
CLIENT_ID = 'KIXUSBGSQRTA3NZJKPX0ENV3MX0JBYX4VMXPQTO0G3EOSZEQ' 
CLIENT_SECRET = '1U4D11MNF0KVQJIFXMDPIR52KMPRB1WMPFKG4UGL3XK0WHDA' 
VERSION = '20180605' 
LIMIT = 100
RADIUS = 500

In [18]:
import requests
import json

latitude = northyork_df.loc[0,'Latitude']
longitude = northyork_df.loc[0, 'Longitude']
print('Latitude and Longitude are {} and {}'.format(latitude, longitude))

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID,
                                                                                                                            CLIENT_SECRET,
                                                                                                                            VERSION,
                                                                                                                            latitude,
                                                                                                                            longitude,
                                                                                                                            RADIUS,
                                                                                                                            LIMIT
                                                                                                                           )

results = requests.get(url).json()
results

Latitude and Longitude are 43.8037622 and -79.3634517


{'meta': {'code': 200, 'requestId': '5fda897e20255f5eb0eb228e'},
 'response': {'headerLocation': 'Toronto',
  'headerFullLocation': 'Toronto',
  'headerLocationGranularity': 'city',
  'totalResults': 5,
  'suggestedBounds': {'ne': {'lat': 43.808262204500004,
    'lng': -79.3572281853783},
   'sw': {'lat': 43.7992621955, 'lng': -79.3696752146217}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ad9dce6f964a520651b21e3',
       'name': "Eagle's Nest Golf Club",
       'location': {'address': '10000 Dufferin Rd',
        'lat': 43.805454826002794,
        'lng': -79.36418592243415,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.805454826002794,
          'lng': -79.36418592243415}],
        'distance': 197,
        'cc': 'CA',
        'city': 'Toronto

In [19]:
def get_category(temp):
    try:
        category = temp['categories']
    except:
        category = temp['venue.categories']
        
    return category[0]['name']    

From the JSON file, we will now form a pandas dataframe.

In [20]:
from pandas.io.json import json_normalize

venues = results['response']['groups'][0]['items']
venues_df = json_normalize(venues)

col = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
venues_df = venues_df.loc[:, col]
venues_df['venue.categories'] = venues_df.apply(get_category, axis = 1)

venues_df.columns = ['Name', 'Category', 'Latitude', 'Longitude']

venues_df.head()

  venues_df = json_normalize(venues)


Unnamed: 0,Name,Category,Latitude,Longitude
0,Eagle's Nest Golf Club,Golf Course,43.805455,-79.364186
1,AY Jackson Pool,Pool,43.804515,-79.366138
2,Villa Madina,Mediterranean Restaurant,43.801685,-79.363938
3,Duncan Creek Park,Dog Run,43.805539,-79.360695
4,A.Y. Jackson Secondary School Track,Athletics & Sports,43.805068,-79.366677


Now, let us get the data for all the postal code present in Borough North York.

In [21]:
def getvenues(names, latitude, longitude, radius = 500):
    v_list = []
    for name, lat, lng in zip(names, latitude, longitude):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID,
                                                                                                                            CLIENT_SECRET,
                                                                                                                            VERSION,
                                                                                                                            lat,
                                                                                                                            lng,
                                                                                                                            radius,
                                                                                                                            LIMIT
                                                                                                                           )
        results = requests.get(url).json()['response']['groups'][0]['items']
        v_list.append([(
            name,
            v['venue']['name'], 
            v['venue']['categories'][0]['name'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng']
        ) for v in results]) 
        
    final_df = pd.DataFrame([items for ven in v_list for items in ven])
    final_df.columns = ['PostalCode', 'Venue', 'Category', 'Latitude', 'Longitude']
    
    return(final_df)
        
        

In [22]:
data = getvenues(names = northyork_df['PostalCode'], latitude = northyork_df['Latitude'], longitude = northyork_df['Longitude'])
data.head()

Unnamed: 0,PostalCode,Venue,Category,Latitude,Longitude
0,M2H,Eagle's Nest Golf Club,Golf Course,43.805455,-79.364186
1,M2H,AY Jackson Pool,Pool,43.804515,-79.366138
2,M2H,Villa Madina,Mediterranean Restaurant,43.801685,-79.363938
3,M2H,Duncan Creek Park,Dog Run,43.805539,-79.360695
4,M2H,A.Y. Jackson Secondary School Track,Athletics & Sports,43.805068,-79.366677


In [23]:
onehot = pd.get_dummies(data['Category'], prefix = "", prefix_sep = "")
onehot = pd.concat([data['PostalCode'], onehot], axis = 1)
onehot.head()

Unnamed: 0,PostalCode,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,...,Supermarket,Supplement Shop,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Video Game Store,Vietnamese Restaurant,Women's Store
0,M2H,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M2H,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M2H,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M2H,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M2H,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
mean_df = onehot.groupby('PostalCode').mean().reset_index()
mean_df.head()

Unnamed: 0,PostalCode,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,...,Supermarket,Supplement Shop,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Video Game Store,Vietnamese Restaurant,Women's Store
0,M2H,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M2J,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.028571,0.028571,...,0.0,0.014286,0.0,0.014286,0.0,0.014286,0.014286,0.014286,0.0,0.042857
2,M2K,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M2L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M2N,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,...,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.030303,0.0


Now, we have our data ready for clustering. So we now import KMeans from sklearn package and cluster the Postal Codes.

In [25]:
from sklearn.cluster import KMeans

k = 5

cluster = mean_df.drop('PostalCode', axis = 1)

clust = KMeans(n_clusters = k, random_state = 0)
clust.fit(cluster)

clust.labels_[0:10]

array([0, 0, 0, 1, 0, 3, 0, 2, 0, 0])

Now we form a dataframe containing postal codes, their cluster label and the top 5 common venues in that postal code. 

In [26]:
def sort(df, top):
    to_sort = df.iloc[1:]
    res = to_sort.sort_values(ascending = False)
    
    return res.index.values[0:top]

In [27]:
top = 5

columns = ['PostalCode',
           'ClusterLabels',
           '1st Most Common Venue',
           '2nd Most Common Venue',
           '3rd Most Common Venue',
           '4th Most Common Venue',
           '5th Most Common Venue'
          ]

final = pd.DataFrame(columns = columns)
final['PostalCode'] = mean_df['PostalCode']
final['ClusterLabels'] = clust.labels_

for i in np.arange(mean_df.shape[0]):
    final.iloc[i, 2:] = sort(onehot.iloc[i, :], top)
    
final.head()    

Unnamed: 0,PostalCode,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M2H,0,Golf Course,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
1,M2J,0,Pool,Women's Store,Clothing Store,Comfort Food Restaurant,Construction & Landscaping
2,M2K,0,Mediterranean Restaurant,Women's Store,Golf Course,Comfort Food Restaurant,Construction & Landscaping
3,M2L,1,Dog Run,Women's Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
4,M2N,0,Athletics & Sports,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping


We now add the location data to the dataframe so we can visualize the clustered data.

In [28]:
final = northyork_df.merge(final, on = 'PostalCode')
final.head()

Unnamed: 0,PostalCode,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M2H,43.803762,-79.363452,0,Golf Course,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
1,M2J,43.778517,-79.346556,0,Pool,Women's Store,Clothing Store,Comfort Food Restaurant,Construction & Landscaping
2,M2K,43.786947,-79.385975,0,Mediterranean Restaurant,Women's Store,Golf Course,Comfort Food Restaurant,Construction & Landscaping
3,M2L,43.75749,-79.374714,1,Dog Run,Women's Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
4,M2N,43.77012,-79.408493,0,Athletics & Sports,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping


Now, we define color scheme for the labels and it as a column in the final dataframe. This again helps us in visualizing clustered data.

In [29]:
def getcolor(df):
    colors = []
    for i in range(len(df)):
        if df[i] == 0:
            colors.append('blue')
        elif df[i] == 1:
            colors.append('red')
        elif df[i] == 2:
            colors.append('green')
        elif df[i] == 3:
            colors.append('yellow')
        elif df[i] == 4:
            colors.append('magenta')
            
    return colors

In [30]:
colors = getcolor(final['ClusterLabels'])
final.insert(4, 'ClusterColor', colors)
final.head()

Unnamed: 0,PostalCode,Latitude,Longitude,ClusterLabels,ClusterColor,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M2H,43.803762,-79.363452,0,blue,Golf Course,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
1,M2J,43.778517,-79.346556,0,blue,Pool,Women's Store,Clothing Store,Comfort Food Restaurant,Construction & Landscaping
2,M2K,43.786947,-79.385975,0,blue,Mediterranean Restaurant,Women's Store,Golf Course,Comfort Food Restaurant,Construction & Landscaping
3,M2L,43.75749,-79.374714,1,red,Dog Run,Women's Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
4,M2N,43.77012,-79.408493,0,blue,Athletics & Sports,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping


Now, we visualize the clustered data using folium.

In [31]:
cluster_map = folium.Map(location  = [ny_latitude, ny_longitude], zoom_start = 11)

for lat, lng, pc, clust, col in zip(final['Latitude'], final['Longitude'], final['PostalCode'], final['ClusterLabels'], final['ClusterColor']):
    label = folium.Popup(pc + 'Cluster' + str(clust))
    folium.CircleMarker(
        [lat, lng],
        radius = 3,
        color = col,
        fill = True,
        fill_opacity = 0.6
    ).add_to(cluster_map)
    
cluster_map

In [32]:
cluster0 = final.loc[final['ClusterLabels'] == 0, final.columns[[0] + list(range(5, final.shape[1]))]]
cluster0.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M2H,Golf Course,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping
1,M2J,Pool,Women's Store,Clothing Store,Comfort Food Restaurant,Construction & Landscaping
2,M2K,Mediterranean Restaurant,Women's Store,Golf Course,Comfort Food Restaurant,Construction & Landscaping
4,M2N,Athletics & Sports,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping
6,M2R,Toy / Game Store,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping


In [33]:
cluster1 = final.loc[final['ClusterLabels'] == 1, final.columns[[0] + list(range(5, final.shape[1]))]]
cluster1.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,M2L,Dog Run,Women's Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store


In [34]:
cluster2 = final.loc[final['ClusterLabels'] == 2, final.columns[[0] + list(range(5, final.shape[1]))]]
cluster2.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
7,M3A,American Restaurant,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping
12,M3K,Coffee Shop,Dog Run,Comfort Food Restaurant,Construction & Landscaping,Convenience Store
19,M6B,Theater,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping
20,M6L,Clothing Store,Women's Store,Dog Run,Comfort Food Restaurant,Construction & Landscaping


In [35]:
cluster3 = final.loc[final['ClusterLabels'] == 3, final.columns[[0] + list(range(5, final.shape[1]))]]
cluster3.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,M2P,Shopping Mall,Clothing Store,Comfort Food Restaurant,Construction & Landscaping,Convenience Store


In [36]:
cluster4 = final.loc[final['ClusterLabels'] == 4, final.columns[[0] + list(range(5, final.shape[1]))]]
cluster4.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
22,M9M,Juice Bar,Golf Course,Comfort Food Restaurant,Construction & Landscaping,Convenience Store


Since, there arent many elements in one cluster, it is hard to draw conclusions on it. But it is visible that cluster0 mostly contains sports related venues as the most common venue. This analysis can also be carried for all the Boroughs together in a similar manner. 