# Segmenting and Clustering Neighborhoods in Toronto

##### Index of the notebook.
1. _Information retrieval from Wikipedia and storing into a database,_
2. _Add neighbourhood latitude and longitude to the database,_
3. _Explore and cluster the neighborhoods in Toronto,_
4. _Comment the results._

#### 1. Information retrieval from Wikipedia and storing into a database

In [21]:
import requests as req
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

# Retrieve the HTML code and create a BeautifulSoup object.
wiki_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wiki_page = str(req.get(wiki_url).text)
soup=BeautifulSoup(wiki_page,'html.parser')

# Create a list with the information contained in the table.
tag=soup.table
text=tag.get_text()
tmp_list=text.split('\n')
tmp_list2=tmp_list[1:-1]
new_list=[]
#print(tmp_list2) # Uncomment to understand the for loop below.

for i in range(0,len(tmp_list2),5):
    new_list.append([tmp_list2[i+1],tmp_list2[i+2],tmp_list2[i+3]])


# Create the database.
df_tor=pd.DataFrame(new_list[1:])
df_tor.columns=new_list[0]
df_tor.drop(df_tor[df_tor.Borough == 'Not assigned'].index, inplace=True) # Drop rows with 'Borough' == 'Not assigned'.
# Where 'Neighbourhood' == 'Not assigned', replace it with the 'Borough' name.
df_tor.loc[df_tor['Neighbourhood'] == 'Not assigned', 'Neighbourhood'] = df_tor.loc[df_tor['Neighbourhood'] == 'Not assigned', 'Borough']
df_tor=df_tor.groupby(['Postcode','Borough'])['Neighbourhood'].unique() # Group by a list of keys; recent pandas versions treat a tuple as a single key.
df_tor=df_tor.to_frame()
df_tor.reset_index(inplace=True)
df_tor['Neighbourhood'] = df_tor['Neighbourhood'].apply(', '.join)
df_tor.reset_index(drop=True,inplace=True) # Reset the index so that it starts from 0.

df_tor.head(20) # Show the first rows of the database.



Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


The above code uses BeautifulSoup functions in order to get the text contained between the `<table>...</table>` tags
that the Wikipedia page uses to build the table. See the comments in the code for the individual steps. The database assumes that, if not otherwise specified, the 'Neighbourhood' coincides with the 'Borough'.
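
The cleaning steps can also be tried on a small hand-made table, without downloading the page. The following is a minimal sketch: the sample rows and the names `toy` and `toy_clean` are made up for illustration and are not taken from the Wikipedia table.

```python
import pandas as pd

# Hypothetical sample rows, only to illustrate the cleaning steps.
toy = pd.DataFrame(
    [
        ["M1A", "Not assigned", "Not assigned"],
        ["M5A", "Downtown Toronto", "Harbourfront"],
        ["M5A", "Downtown Toronto", "Regent Park"],
        ["M7A", "Queen's Park", "Not assigned"],
    ],
    columns=["Postcode", "Borough", "Neighbourhood"],
)

# 1. Drop rows whose borough is not assigned.
toy = toy[toy["Borough"] != "Not assigned"].copy()

# 2. Where the neighbourhood is not assigned, reuse the borough name.
missing = toy["Neighbourhood"] == "Not assigned"
toy.loc[missing, "Neighbourhood"] = toy.loc[missing, "Borough"]

# 3. One row per postcode, with its unique neighbourhoods joined by commas.
toy_clean = (
    toy.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .agg(lambda names: ", ".join(names.unique()))
    .reset_index()
)
print(toy_clean)
```

Here the two `M5A` rows collapse into a single row, and the `M7A` row takes its borough name as neighbourhood.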

In [20]:
df_tor.shape

(103, 3)

#### 2. Add neighbourhood latitude and longitude to the database

In [22]:
url_coord = 'http://cocl.us/Geospatial_data'
df_tor2 = pd.merge(left=df_tor,right=pd.read_csv(url_coord), how='left', left_on='Postcode', right_on='Postal Code')
df_tor2.drop('Postal Code',axis=1,inplace=True)
df_tor2.rename(columns={'Postcode':'Postal Code'},inplace=True)

The above code adds the latitude and longitude of each postal code by merging two databases. This is done because the geocoder routine (install `geocoder` first)

```python
import geocoder 
lat_lng_coords = None
while lat_lng_coords is None:
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng  
print(lat_lng_coords)
```

does not work, as anticipated in the assignment instructions. If one wants to obtain the latitude and longitude of a given address (i.e. not of a postal code), the ```geopy``` library works very well.
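
When merging two tables on the postal code, it can be useful to check that every postal code actually found its coordinates. The sketch below shows one possible check using the `indicator` option of `pd.merge`. The tables `codes` and `coords` are miniature stand-ins, and the postal code `M9Z` is invented so that one row has no match.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables ('M9Z' is made up).
codes = pd.DataFrame({"Postcode": ["M1B", "M1C", "M9Z"]})
coords = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C"],
        "Latitude": [43.806686, 43.784535],
        "Longitude": [-79.194353, -79.160497],
    }
)

# indicator=True adds a '_merge' column telling where each row came from.
merged = pd.merge(
    codes, coords, how="left",
    left_on="Postcode", right_on="Postal Code", indicator=True,
)

# Rows marked 'left_only' found no coordinates in the right-hand table.
unmatched = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
print(unmatched)
```

An empty `unmatched` list means the left merge filled the coordinates of every row.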

In [23]:
df_tor2.head(12)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


#### 3. Explore and cluster the neighborhoods in Toronto.

In [24]:
# Package installation; comment out these lines if the packages are already installed.

!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes
print('\nDone.')

Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.0.2p             |       h470a237_1         3.1 MB  conda-forge
    certifi-2018.10.15         |        py36_1000         138 KB  conda-forge
    geopy-1.17.0               |             py_0          49 KB  conda-forge
    ca-certificates-2018.10.15 |       ha4d7672_0         135 KB  conda-forge
    conda-4.5.11               |        py36_1000         651 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         4.1 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0            conda-forge
    geopy:           

In [25]:
from sklearn.cluster import KMeans
from pandas.io.json import json_normalize
from geopy.geocoders import Nominatim 
import folium as fo
import datetime
import matplotlib.cm as cm
import matplotlib.colors as colors
print('Libraries loaded.')

Libraries loaded.


In [26]:
# Geo-query. (Does not work for postal codes.)

query = 'Toronto,Ontario'
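geolocator = Nominatim(user_agent='toronto_explorer') # Recent geopy versions require a user_agent; the name here is an arbitrary choice.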
location = geolocator.geocode(query)
lat_T = location.latitude
lon_T = location.longitude
print(lat_T,lon_T)



43.653963 -79.387207


In [47]:
Toronto_map = fo.Map(location=[lat_T, lon_T], zoom_start=12)

Toronto_map

Let us now select only the boroughs whose name contains the word ```Toronto``` to reduce our dataset (i.e. we consider mainly the historic part, a.k.a. Old Toronto, see https://en.wikipedia.org/wiki/Old_Toronto).

In [28]:
df_rest = df_tor2[df_tor2['Borough'].str.contains('Toronto')]

Let us now explore this dataset a bit.

In [45]:
df_rest.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 38 entries, 37 to 87
Data columns (total 6 columns):
Postal Code       38 non-null object
Borough           38 non-null object
Neighborhood      38 non-null object
Latitude          38 non-null float64
Longitude         38 non-null float64
Cluster labels    38 non-null int32
dtypes: float64(2), int32(1), object(3)
memory usage: 1.9+ KB


In [46]:
df_rest.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster labels
37,M4E,East Toronto,The Beaches,43.676357,-79.293031,0
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0
43,M4M,East Toronto,Studio District,43.659526,-79.340923,0
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0
45,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0
46,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0
47,M4S,Central Toronto,Davisville,43.704324,-79.38879,0
48,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,0
49,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049,0


At this point we can plot the Toronto map with circles at the center of each neighborhood.

In [30]:
# Add markers to map.
for lat, lng, bor, neigh,p_code in zip(df_rest['Latitude'], df_rest['Longitude'], df_rest['Borough'], df_rest['Neighbourhood'],df_rest['Postal Code']):
    label = 'Postal Code: {}, Borough: {}, Neighbourhood: {}.'.format(p_code,bor,neigh)
    label = fo.Popup(label, parse_html=True)
    fo.CircleMarker([lat, lng],
                    radius=5,
                    popup=label,
                    color='black',
                    fill=True,
                    fill_color='coral',
                    fill_opacity=0.5,
                    parse_html=False
                   ).add_to(Toronto_map)  
    
Toronto_map

Let us now retrieve the information from Foursquare.

__Note that the function below raises an error if you have already used up the free query quota for Foursquare. In that case the response is an error message (error 429) that does not contain the fields the function expects, so the function cannot parse it.__
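
One way to make this failure easier to read is to check the status of the parsed response before indexing into it. The helper below is only a sketch and is not part of the notebook. The name `extract_items` is hypothetical, and the field names `meta`, `code` and `errorDetail` are assumptions about the shape of the response envelope.

```python
def extract_items(payload):
    """Return the venue items of a parsed 'explore' response, or raise a clear error."""
    # Assumption: the response carries a 'meta' object with a numeric 'code'.
    meta = payload.get("meta", {})
    code = meta.get("code")
    if code != 200:
        detail = meta.get("errorDetail", "no detail provided")
        raise RuntimeError(
            "Foursquare request failed with code {}: {}".format(code, detail)
        )
    groups = payload.get("response", {}).get("groups", [])
    # An empty 'groups' list means that no venues were returned for this location.
    return groups[0]["items"] if groups else []
```

With such a helper, the line that builds `results` could read `results = extract_items(req.get(url).json())`, so that an exhausted quota shows up as a `RuntimeError` mentioning code 429 instead of a `KeyError`.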

In [31]:
# Foursquare setting

CLIENT_ID='3FTNYFFWXYMEPZDEKDQE0O03VSBTSJK4SBRPPHRQH4GXLIYZ'
CLIENT_SECRET='3YRBYJH3ZOXHAV25ZCDCBJHCKRIG5DYQFXTTRJHZT304KALE'
VERSION=datetime.date.today().strftime("%Y%m%d")
LIMIT=100

# Retrieval function

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)         
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)            
        results = req.get(url).json()["response"]['groups'][0]['items']   
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    return(nearby_venues)

# Retrieve information about nearby venues from Foursquare

df_info=getNearbyVenues(names=df_rest['Neighbourhood'],latitudes=df_rest['Latitude'],longitudes=df_rest['Longitude'])

Let us see what ```df_info``` looks like.

In [32]:
print(df_info.shape)
df_info.head()

(1712, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
1,The Beaches,43.676357,-79.293031,Starbucks,43.678798,-79.298045,Coffee Shop
2,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.67873,-79.297478,Grocery Store
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


Now we check whether the name ```Neighborhood``` appears among the values of the venue category column (retrieved from Foursquare).

#### P.A. 
__Neighborhood and Neighbourhood (note the 'u') have the same meaning, but the first is American English while the second is British English. The database we obtain from Wikipedia uses the British spelling, while the data we obtain from Foursquare uses the American one, so the two names do not coincide. To reuse the data manipulation functions written for the Foursquare data, the easiest solution is to rename the column coming from the Wikipedia database. This generates the issue that we are going to solve below.__

__By the way, I think it is a good habit to run this check (and, if needed, fix such an issue) on real-world databases: you never know whether the name of a database column coincides with a value of a categorical variable.__

In [33]:
check_keyword = 'Neighborhood'
N=df_info['Venue Category'].str.contains(check_keyword).sum()
print('Number of times \'Neighborhood\' appears in \'Venue Category\':',N)

Number of times 'Neighborhood' appears in 'Venue Category': 4


We have to take this fact into account later. Indeed, when we transform the ```Venue Category``` categorical variable into numerical columns (using the function ```pd.get_dummies()```) we obtain a column named ```Neighborhood```. To avoid problems, we rename this column to ```Neighborhood Venue Cat.```.
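
The collision can be reproduced on a tiny table. The sketch below uses invented rows and the hypothetical names `toy_info`, `toy_dummies` and `reserved`. It detects the clash, renames the dummy column, and only then adds the real `Neighborhood` column.

```python
import pandas as pd

# Hypothetical rows: one venue happens to have the category 'Neighborhood'.
toy_info = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "B"],
        "Venue Category": ["Pub", "Neighborhood", "Park"],
    }
)

toy_dummies = pd.get_dummies(toy_info[["Venue Category"]], prefix="", prefix_sep="")

# Category names that would collide with a column we plan to add later.
reserved = {"Neighborhood"}
clashes = sorted(reserved & set(toy_dummies.columns))
print(clashes)

# Rename the clashing dummy columns, then add the real column in first position.
toy_dummies = toy_dummies.rename(
    columns={name: name + " Venue Cat." for name in clashes}
)
toy_dummies.insert(0, "Neighborhood", toy_info["Neighborhood"])
print(toy_dummies.columns.tolist())
```

`DataFrame.insert` places the new column directly at position 0, which is an alternative to appending it and reordering the columns afterwards.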

In [34]:
df_venues = pd.get_dummies(df_info[['Venue Category']], prefix="", prefix_sep="")
df_venues.rename(columns={'Neighborhood':'Neighborhood Venue Cat.'},inplace=True)
print(df_venues.shape)
df_venues.head()

(1712, 238)


Unnamed: 0,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


At this point we can add the ```Neighborhood``` column of the pandas dataframe ```df_info```.

In [35]:
df_venues['ZZZ_Neighborhood'] = df_info['Neighborhood'] ### We added 'ZZZ_' to be sure that this is the last column.
fixed_columns = [df_venues.columns[-1]] + list(df_venues.columns[:-1])
df_info2=df_venues[fixed_columns]
df_info2.rename(columns={'ZZZ_Neighborhood':'Neighborhood'},inplace=True)
print(df_info2.shape)
df_info2.head()

(1712, 239)


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"The Danforth West, Riverdale",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [36]:
df_info2_grouped = df_info2.groupby('Neighborhood').mean().reset_index()


def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues=10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = df_info2_grouped['Neighborhood']

for ind in np.arange(df_info2_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_info2_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Thai Restaurant,American Restaurant,Cosmetics Shop,Hotel,Restaurant,Bar,Breakfast Spot
1,Berczy Park,Coffee Shop,Restaurant,Cocktail Bar,Cheese Shop,Steakhouse,Seafood Restaurant,Farmers Market,Pub,Café,Bakery
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Café,Breakfast Spot,Gym,Furniture / Home Store,Pet Store,Nightclub,Climbing Gym,Caribbean Restaurant,Restaurant
3,Business reply mail Processing Centre969 Eastern,Light Rail Station,Yoga Studio,Auto Workshop,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Comic Shop,Butcher,Restaurant
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Terminal,Airport Service,Plane,Sculpture Garden,Boutique,Boat or Ferry,Harbor / Marina,Airport Gate,Airport
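
The cell above relies on the fact that the mean of a 0/1 column is the fraction of venues that belong to that category. The following sketch illustrates this on made-up data. The names `toy_onehot`, `freq` and `top2` are hypothetical, and `nlargest` is used here in place of the sorting function of the notebook.

```python
import pandas as pd

# Hypothetical one-hot rows: one row per venue, as produced by get_dummies.
toy_onehot = pd.DataFrame(
    {
        "Neighborhood": ["A", "A", "A", "A", "B", "B"],
        "Bakery": [1, 1, 0, 0, 0, 0],
        "Park": [0, 0, 1, 0, 1, 1],
        "Pub": [0, 0, 0, 1, 0, 0],
    }
)

# The mean of a 0/1 column is the share of venues in that category.
freq = toy_onehot.groupby("Neighborhood").mean()

# Top-2 categories per neighborhood, most frequent first.
top2 = {name: row.nlargest(2).index.tolist() for name, row in freq.iterrows()}
print(freq)
print(top2)
```

Note that when a neighborhood has fewer distinct categories than the requested number of top venues, the remaining positions are filled with categories whose frequency is zero.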


At this point we can apply the K-means algorithm to generate clusters (unsupervised learning).

In [69]:
kclusters = 5 # Our choice.
df_info2_cluster = df_info2_grouped.drop('Neighborhood',axis=1)
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(df_info2_cluster)
# rename() without inplace returns a new DataFrame, which avoids the SettingWithCopyWarning raised when modifying a slice.
df_rest = df_rest.rename(columns={'Neighbourhood':'Neighborhood'})
df_clust = df_rest
# kmeans.labels_ follows the row order of df_info2_grouped; the assignment below is positional.
df_clust['Cluster labels'] = kmeans.labels_
df_clust = df_clust.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
df_clust.head()


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Neighborhood Venue Cat.,Pub,Grocery Store,Coffee Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Ice Cream Shop,Bookstore,Italian Restaurant,Yoga Studio,Furniture / Home Store,Pub,Pizza Place,Liquor Store
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Sandwich Place,Fast Food Restaurant,Light Rail Station,Pub,Ice Cream Shop,Movie Theater,Fish & Chips Shop,Burger Joint,Steakhouse,Park
43,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Bakery,Italian Restaurant,Gastropub,American Restaurant,Yoga Studio,Cheese Shop,Fish Market,Latin American Restaurant
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Dim Sum Restaurant,Bus Line,Park,Swim School,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
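
A remark on the label assignment: `kmeans.labels_` follows the row order of the table passed to `fit`, here `df_info2_grouped`, which `groupby` sorts by neighborhood name. If the table that receives the labels is ordered differently, a positional assignment pairs labels with the wrong rows. One way to guard against this is to match the labels by neighborhood name. The sketch below uses invented data, and the names `grouped`, `labels`, `places` and `label_by_name` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'grouped' is sorted by name, 'places' is in postal-code order.
grouped = pd.DataFrame({"Neighborhood": ["Annex", "Beaches", "Davisville"]})
labels = np.array([2, 0, 1])  # stand-in for kmeans.labels_, same order as 'grouped'
places = pd.DataFrame(
    {
        "Postal Code": ["M4E", "M4S", "M5R"],
        "Neighborhood": ["Beaches", "Davisville", "Annex"],
    }
)

# Build a name -> label lookup, then map it onto the other table.
label_by_name = pd.Series(labels, index=grouped["Neighborhood"])
places["Cluster labels"] = places["Neighborhood"].map(label_by_name)
print(places)
```

With this approach each row receives the label computed for its own neighborhood, whatever the order of the two tables.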


In [70]:
map_clusters = fo.Map(location=[lat_T, lon_T], zoom_start=12)

# One color per cluster, spread evenly over the rainbow colormap.
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(df_clust['Latitude'], df_clust['Longitude'], df_clust['Neighborhood'], df_clust['Cluster labels']):
    label = fo.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    fo.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

At this point, if you want, you can explore each cluster. For example, the first cluster can be visualized in the following way.

In [56]:
df_clust.loc[df_clust['Cluster labels']==0,:]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Neighborhood Venue Cat.,Pub,Grocery Store,Coffee Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
41,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Ice Cream Shop,Bookstore,Italian Restaurant,Yoga Studio,Furniture / Home Store,Pub,Pizza Place,Liquor Store
42,M4L,East Toronto,"The Beaches West, India Bazaar",43.668999,-79.315572,0,Sandwich Place,Fast Food Restaurant,Light Rail Station,Pub,Ice Cream Shop,Movie Theater,Fish & Chips Shop,Burger Joint,Steakhouse,Park
43,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Bakery,Italian Restaurant,Gastropub,American Restaurant,Yoga Studio,Cheese Shop,Fish Market,Latin American Restaurant
44,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,0,Dim Sum Restaurant,Bus Line,Park,Swim School,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
45,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Clothing Store,Sandwich Place,Grocery Store,Burger Joint,Park,Breakfast Spot,Hotel,Food & Drink Shop,Donut Shop,Dog Run
46,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Clothing Store,Sporting Goods Shop,Coffee Shop,Yoga Studio,Chinese Restaurant,Dessert Shop,Rental Car Location,Diner,Salon / Barbershop,Sandwich Place
47,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Pizza Place,Dessert Shop,Sandwich Place,Italian Restaurant,Sushi Restaurant,Café,Coffee Shop,Seafood Restaurant,Japanese Restaurant,Farmers Market
48,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,0,Restaurant,Playground,Tennis Court,Park,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Dumpling Restaurant
49,M4V,Central Toronto,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",43.686412,-79.400049,0,Coffee Shop,Pub,Pizza Place,American Restaurant,Light Rail Station,Sports Bar,Bagel Shop,Supermarket,Sushi Restaurant,Fried Chicken Joint


In a similar way we can print the other clusters.

In [58]:
df_clust.loc[df_clust['Cluster labels']==1,:]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
69,M5W,Downtown Toronto,Stn A PO Boxes 25 The Esplanade,43.646435,-79.374846,1,Coffee Shop,Restaurant,Café,Seafood Restaurant,Pub,Hotel,Cocktail Bar,Creperie,Japanese Restaurant,Cosmetics Shop


In [59]:
df_clust.loc[df_clust['Cluster labels']==2,:]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,M5N,Central Toronto,Roselawn,43.711695,-79.416936,2,Garden,Yoga Studio,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


In [60]:
df_clust.loc[df_clust['Cluster labels']==3,:]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
65,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,3,Café,Sandwich Place,Coffee Shop,Pizza Place,French Restaurant,BBQ Joint,Jewish Restaurant,Pub,Martial Arts Dojo,Indian Restaurant
68,M5V,Downtown Toronto,"CN Tower, Bathurst Quay, Island airport, Harbo...",43.628947,-79.39442,3,Airport Lounge,Airport Terminal,Airport Service,Plane,Sculpture Garden,Boutique,Boat or Ferry,Harbor / Marina,Airport Gate,Airport


In [61]:
df_clust.loc[df_clust['Cluster labels']==4,:]

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,4,Coffee Shop,Café,Italian Restaurant,Sandwich Place,Middle Eastern Restaurant,Japanese Restaurant,Bar,Ice Cream Shop,Burger Joint,Indian Restaurant


#### 4. Comment the results

There is one very big cluster (cluster 0), while the other four clusters contain only one or two neighborhoods each. This situation does not change much even if we change the number of clusters.
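
To quantify how unbalanced the clusters are, one can count the samples per label. The helper below is a small sketch: the name `cluster_sizes` is hypothetical and the labels in the example are invented, not taken from the notebook.

```python
import numpy as np

def cluster_sizes(labels, n_clusters):
    """Count how many samples carry each cluster label 0..n_clusters-1."""
    return np.bincount(np.asarray(labels), minlength=n_clusters)

# Hypothetical labels, only to show the call.
example_labels = [0, 0, 0, 0, 1, 2, 0]
sizes = cluster_sizes(example_labels, n_clusters=3)
share_of_largest = sizes.max() / sizes.sum()
print(sizes, share_of_largest)
```

In the notebook, the same call would be `cluster_sizes(kmeans.labels_, kclusters)`, repeated for different values of `kclusters` to compare the results.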