# Explore neighborhoods of Toronto
#### This Notebook has four sections: <br>
   1. Website **Scraping** to build the Toronto neighborhoods table <br>
   2. Using **Geocoder** to retrieve GPS coordinates <br>
   3. **Explore** the neighborhoods <br>
   4. **Cluster** the neighborhoods <br>
<br>
***
### `Scroll down to sections (3) and (4) to see the Python code for exploring and clustering`<br>
***
<br>

# Section (1): Website Scraping
#### Use this Notebook to scrape the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In [219]:
import requests
import lxml.html as lh
import pandas as pd
import numpy as np

### Read data from wikipedia

In [220]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(url)          # Fetch the page; `page` holds the HTTP response
doc = lh.fromstring(page.content) # Parse the HTML content into an element tree
tr_elements = doc.xpath('//tr')   # Select every table row (<tr>..</tr>) on the page

In [221]:
#Check the length of the first 12 rows
[len(T) for T in tr_elements[:12]]

[3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]

#### Get the Columns for the new table

In [222]:
tr_elements = doc.xpath('//tr')
col = []  # List of (header, values) pairs, one per column
i = 0
# For each cell of the header row, store its text together with an empty list
for t in tr_elements[0]:
    i+= 1
    name = t.text_content().rstrip("\n") # ==> USE RSTRIP to REMOVE TRAILING NEW-LINE CHARACTERS
    print("%d: %s" %(i,name))
    col.append((name,[]))

1: Postal Code
2: Borough
3: Neighbourhood


##### Retrieve each row from the web page

In [223]:
#Since our first row is the header, data is stored from the second row onwards
for j in range(1,len(tr_elements)):
    #T is our j'th row
    T=tr_elements[j]
    
    #If row is not of size 3, the //tr data is not from our table 
    if len(T)!=3:
        break
    
    #i is the index of our column
    i = 0
    
    #Iterate through each element of the row
    for t in T.iterchildren():
        data=t.text_content().rstrip("\n") # ==> USE RSTRIP to REMOVE TRAILING NEW-LINE CHARACTERS
        #For every column after the first, convert numeric text to an integer
        if i>0:
            try:
                data=int(data)
            except ValueError:
                pass
        #Append the data to the empty list of the i'th column
        col[i][1].append(data)
        #Increment i for the next column
        i+=1
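The row-by-row collection above can be tried offline on a small inline table. This is a minimal sketch that swaps `lxml` for the standard-library `html.parser`, so it is not the notebook's own parser; the class name `TableRows` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # strip() plays the role of rstrip("\n") in the notebook
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample with the same three-column shape as the Wikipedia table.
SAMPLE_HTML = """
<table>
<tr><th>Postal Code\n</th><th>Borough\n</th><th>Neighbourhood\n</th></tr>
<tr><td>M1A\n</td><td>Not assigned\n</td><td>Not assigned\n</td></tr>
<tr><td>M3A\n</td><td>North York\n</td><td>Parkwoods\n</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)

# The first row is the header; keep only body rows with a matching width.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body if len(row) == len(header)]
print(records)
```

Filtering on `len(row) == len(header)` mirrors the notebook's `len(T) != 3` check, except that it skips an odd row instead of stopping at it.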

In [224]:
# Ensure we retrieve an equal number of rows
[len(C) for (title,C) in col]

[181, 181, 181]

In [225]:
# Create the dataframe
Dict={title:column for (title,column) in col}
df=pd.DataFrame(Dict)

#### Delete rows that don't have an assigned borough. 

In [226]:
df = df[df.Borough != 'Not assigned']

In [227]:
df.shape

(104, 3)

In [228]:
df.tail(5)

Unnamed: 0,Postal Code,Borough,Neighbourhood
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."
180,,Canadian postal codes,


In [229]:
# If a cell has a borough and there is no assigned neighborhood, then the neighborhood will be the same as the borough
df['Neighbourhood'] = np.where(df['Neighbourhood'] == 'Not assigned', df['Borough'], df['Neighbourhood'])
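The two cleaning rules (drop rows without a borough, and fall back to the borough name when the neighbourhood is missing) can be checked on a few rows without pandas. This is a plain-Python sketch of the same logic; the function name `clean_rows` and the `M0X` sample row are hypothetical.

```python
NOT_ASSIGNED = "Not assigned"


def clean_rows(rows):
    """Apply the notebook's two cleaning rules to (postal code, borough, neighbourhood) rows."""
    cleaned = []
    for postal_code, borough, neighbourhood in rows:
        if borough == NOT_ASSIGNED:
            continue                  # rule 1: drop rows without a borough
        if neighbourhood == NOT_ASSIGNED:
            neighbourhood = borough   # rule 2: reuse the borough name
        cleaned.append((postal_code, borough, neighbourhood))
    return cleaned


# Illustrative rows; the M0X entry is invented to exercise rule 2.
sample_rows = [
    ("M1A", "Not assigned", "Not assigned"),
    ("M3A", "North York", "Parkwoods"),
    ("M0X", "Example Borough", "Not assigned"),
]

cleaned_rows = clean_rows(sample_rows)
print(cleaned_rows)
```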

In [230]:
# Delete last row from the table
df = df.iloc[:-1]

In [231]:
df.tail(5)

Unnamed: 0,Postal Code,Borough,Neighbourhood
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


# Section (2): Using Geocoder

### Install Geocoder package

In [232]:
# Geocoder is a simple and consistent geocoding library written in Python.
# See also https://geocoder.readthedocs.io/
# Install Geocoder package
!pip install geocoder



### Call Geocoder API on Open Street Map to retrieve the GPS locations

In [233]:
import geocoder

latitude = []                        # create empty list to temporarily store GPS coordinates
longitude = []                       # create second empty list to store GPS coordinates
lst = df["Neighbourhood"].to_numpy() # retrieve a list of neighbourhoods
    
x = 0
 
# Iterating using while loop 
while x < len(lst): 
    neighbourhood = lst[x]
    query = neighbourhood.split(",", 5)                                                # split the string if it lists multiple neighbourhoods
    g = geocoder.osm('{}, Toronto, Ontario'.format(query[0]))                          # select the first value from the string, as query[0]
    y = g.lat
    z = g.lng
    
    if len(query) > 1 and not y :                                  # If no GPS was found, we can try to check the next Neighbourhood (if available) 
        g = geocoder.osm('{}, Toronto, Ontario'.format(query[1]))  # Take second variable if g return is empty
        y = g.lat
        z = g.lng  
    
    latitude.append(y)
    longitude.append(z) 
    x = x+1
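The fallback idea in the loop above (try the first name, then the next one if nothing is found) can be written as a small helper and tested without network access. In this sketch, `first_coordinates`, `fake_lookup` and the coordinates in `FAKE_INDEX` are all hypothetical stand-ins. Unlike the notebook, it tries every comma-separated name and strips the whitespace around each one.

```python
def first_coordinates(neighbourhood, lookup, city="Toronto, Ontario"):
    """Try each comma-separated name in turn; return (lat, lng) or (None, None)."""
    for name in neighbourhood.split(","):
        name = name.strip()
        if not name:
            continue
        coords = lookup("{}, {}".format(name, city))
        if coords is not None:
            return coords
    return (None, None)


# Stand-in for a real geocoder: a dictionary with one illustrative entry.
FAKE_INDEX = {"Harbourfront, Toronto, Ontario": (43.64, -79.38)}


def fake_lookup(query):
    """Return coordinates for a known query, or None when nothing matches."""
    return FAKE_INDEX.get(query)


found = first_coordinates("Regent Park, Harbourfront", fake_lookup)
missing = first_coordinates("Nowhere", fake_lookup)
print(found, missing)
```

With the real library, `lookup` could be a thin wrapper around `geocoder.osm` that returns `None` when `g.lat` is empty. That wrapper is left out here because it needs a live network call.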

### Add the results as "latitude" and "longitude" columns to the Table

In [234]:
# Add the two lists as columns to the table
df["Latitude"] = latitude
df["Longitude"] = longitude

### Fix a wrong GPS coordinate <br>
By chance, I discovered that the Geocoder API returned the wrong location for **Richmond, Adelaide, King**. <br>
There is a Richmond Park in the Scarborough area, and the Geocoder returned that GPS location instead. <br>
* The postal code: M5H
* Borough: Downtown Toronto

In [235]:
#df['Neighbourhood'] = "Richmond, Adelaide, King"
df.loc[df["Neighbourhood"] == 'Richmond, Adelaide, King']

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
49,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.812589,-79.26337


In [236]:
address = 'Adelaide, Toronto, Ontario'

import geocoder

M5H_geo = geocoder.osm(address, maxrows = 5)   
#M5H_geo.json
print('The geographical coordinates of the M5H postal code are:', M5H_geo.latlng )

The geographical coordinates of the M5H postal code are: [43.65082325, -79.37793584643234]


#### Fix the Latitude and Longitude for this postcode and neighbourhood

In [237]:
latitude = M5H_geo.lat
longitude = M5H_geo.lng

df.loc[df['Neighbourhood'] == 'Richmond, Adelaide, King', ['Latitude','Longitude']] = latitude , longitude 
df.loc[df["Neighbourhood"] == 'Richmond, Adelaide, King']

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
49,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650823,-79.377936
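A misplaced point like this one can also be caught by measuring how far each result lies from the centre of its borough. The sketch below uses the haversine great-circle formula with the Downtown Toronto coordinates that this notebook retrieves in Section (3); the 5 km threshold is an assumption chosen for illustration, not a value from the notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


downtown = (43.6563221, -79.3809161)   # Downtown Toronto, as geocoded in Section (3)
wrong_m5h = (43.812589, -79.26337)     # first result for "Richmond, Adelaide, King"
fixed_m5h = (43.650823, -79.377936)    # corrected result

MAX_KM = 5.0  # assumed threshold for "too far from the borough centre"

wrong_distance = haversine_km(*downtown, *wrong_m5h)
fixed_distance = haversine_km(*downtown, *fixed_m5h)
print(wrong_distance > MAX_KM, fixed_distance > MAX_KM)
```

Applied to every row, a check of this kind would flag suspicious coordinates systematically instead of relying on spotting them on the map.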


### Identify remaining misfits that have no GPS coordinates

In [238]:
# Identify misfits that have no GPS coordinates
df1 = df[df.isna().any(axis=1)]
df1.head(20)
#df.loc[60:100,:]

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
32,M6E,York,Caledonia-Fairbanks,,
40,M5G,Downtown Toronto,Central Bay Street,,
114,M7R,Mississauga,Canada Post Gateway Processing Centre,,
148,M5W,Downtown Toronto,Stn A PO Boxes,,
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",,


### Loop through boroughs to retrieve remaining coordinates

For some neighbourhoods, no matching GPS coordinates were found on OpenStreetMap, so the lookup falls back to the borough instead.

In [239]:
# Loop to retrieve borough coordinates from OpenStreetMap
df1 = df1.copy()                     # work on an explicit copy so the assignments below are unambiguous
lst = df1["Borough"].to_numpy()

x = 0

while x < len(df1):
    borough = lst[x]
    g = geocoder.osm('{}, Toronto, Ontario'.format(borough))
    # Assign by row label; the earlier chained indexing (df1['Latitude'].iloc[x] = ...)
    # triggered pandas' SettingWithCopyWarning
    df1.loc[df1.index[x], 'Latitude'] = g.lat
    df1.loc[df1.index[x], 'Longitude'] = g.lng
    x = x+1


In [240]:
df1

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
32,M6E,York,Caledonia-Fairbanks,43.689619,-79.479188
40,M5G,Downtown Toronto,Central Bay Street,43.656322,-79.380916
114,M7R,Mississauga,Canada Post Gateway Processing Centre,43.668384,-79.587058
148,M5W,Downtown Toronto,Stn A PO Boxes,43.656322,-79.380916
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.721789,-79.374027


### Update dataframe with remaining coordinates

In [241]:
# Merge results into original table
df.update(df1)

In [242]:
# See if there are still records with no GPS data
print(df[df.isna().any(axis=1)])

Empty DataFrame
Columns: [Postal Code, Borough, Neighbourhood, Latitude, Longitude]
Index: []


In [243]:
#df.head(20)
#df.loc[30:50,:]

### Display final table with geographical coordinates of the neighborhoods of TORONTO

This table was created by scraping Wikipedia and using Geocoder to retrieve the latitude and the longitude coordinates of each neighborhood. 

In [244]:
df.shape

(103, 5)

In [245]:
df.head(30)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.761124,-79.324059
3,M4A,North York,Victoria Village,43.732658,-79.311189
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.660706,-79.360457
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.722079,-79.437507
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.659659,-79.39034
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.620087,-79.512783
9,M1B,Scarborough,"Malvern, Rouge",43.809196,-79.221701
11,M3B,North York,Don Mills,43.775347,-79.345944
12,M4B,East York,"Parkview Hill, Woodbine Gardens",43.653482,-79.383935
13,M5B,Downtown Toronto,"Garden District, Ryerson",43.6565,-79.377114


# Section (3): Explore the neighbourhoods in Toronto

In [246]:
df_toronto = df

In [247]:
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.761124,-79.324059
3,M4A,North York,Victoria Village,43.732658,-79.311189
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.660706,-79.360457
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.722079,-79.437507
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.659659,-79.39034


In [248]:
print('The dataframe has {} boroughs and {} neighbourhoods.'.format(
        len(df_toronto['Borough'].unique()),
        df_toronto.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighbourhoods.


### Use geocoder API to get the latitude and longitude values of Toronto.

In [249]:
address = 'Toronto, Ontario'

import geocoder

toronto_geo = geocoder.osm(address)     

latitude = toronto_geo.lat
longitude = toronto_geo.lng

#print('The geographical coordinates of Toronto are:', toronto_geo.osm )
print('The geographical coordinates of Toronto are:', toronto_geo.latlng )

The geographical coordinates of Toronto are: [43.6534817, -79.3839347]


### Create a map of Toronto with neighborhoods superimposed on top

In [250]:
# Install the Folium package to visualize the data on a Leaflet map.
!pip install folium



In [251]:
import folium # map rendering library

In [252]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [253]:
df_toronto['Borough'].unique()

array(['North York', 'Downtown Toronto', 'Etobicoke', 'Scarborough',
       'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

## Segmentation of Neighborhoods
Let's simplify the above map and segment and cluster only the Downtown Toronto neighborhoods. <br>

In [254]:
central_data = df_toronto[df_toronto['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
central_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.660706,-79.360457
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.659659,-79.39034
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6565,-79.377114
3,M5C,Downtown Toronto,St. James Town,43.669403,-79.372704
4,M5E,Downtown Toronto,Berczy Park,43.647984,-79.375396


In [255]:
print('The dataframe has {} boroughs and {} neighbourhoods.'.format(
        len(central_data['Borough'].unique()),
        central_data.shape[0]
    )
)

The dataframe has 1 boroughs and 19 neighbourhoods.


In [256]:
address = 'Downtown Toronto, Ontario'

import geocoder

downtown_geo = geocoder.osm(address)     

latitude1 = downtown_geo.lat
longitude1 = downtown_geo.lng

print('The geographical coordinates of Downtown Toronto are:', downtown_geo.latlng )

The geographical coordinates of Downtown Toronto are: [43.6563221, -79.3809161]


In [257]:
# create map of Downtown Toronto using latitude and longitude values
map_downtown = folium.Map(location=[latitude1, longitude1], zoom_start=14)

# add markers to map
for lat, lng, label in zip(central_data['Latitude'], central_data['Longitude'], central_data['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)  
    
map_downtown

#### Define Foursquare Credentials and Version

In [258]:
import os

# Read the Foursquare credentials from the environment instead of hard-coding them in the notebook.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumed convention.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


## Explore Neighbourhoods in Downtown Toronto

#### Let's create a function to repeat the same process for all the neighbourhoods in Downtown Toronto

In [259]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
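The request and parsing steps of `getNearbyVenues` can be separated so that each is testable without calling the API. This is a hedged sketch: `build_explore_url` and `extract_venues` are hypothetical helper names, the payload is a made-up response in the shape the function above expects, and the category guard is an addition that the notebook's version does not have.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL; urlencode takes care of escaping the values."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing groups or categories."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Made-up payload; the venue names and coordinates are illustrative only.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe", "location": {"lat": 43.65, "lng": -79.38},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabelled Place", "location": {"lat": 43.66, "lng": -79.37},
               "categories": []}},
]}]}}

url = build_explore_url("ID", "SECRET", "20180605", 43.65, -79.38)
venues = extract_venues(fake_payload)
print(url)
print(venues)
```

Returning `None` for a venue without categories avoids the `IndexError` that `v['venue']['categories'][0]` would raise on an empty list.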

#### Run the above function on each neighborhood and create a new dataframe called `toronto_venues`.

In [260]:
# collect nearby venues for every Downtown Toronto neighbourhood
toronto_venues = getNearbyVenues(names=central_data['Neighbourhood'],
                                   latitudes=central_data['Latitude'],
                                   longitudes=central_data['Longitude']
                                  )

# check the size of the resulting dataframe
#print(toronto_venues.shape)
#toronto_venues.head(10)

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
Stn A PO Boxes
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


#### Check how many venues were returned for each neighbourhood

In [261]:
# check how many venues were returned for each neighborhood
toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,100,100,100,100,100,100
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",61,61,61,61,61,61
Central Bay Street,100,100,100,100,100,100
Christie,57,57,57,57,57,57
Church and Wellesley,75,75,75,75,75,75
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Garden District, Ryerson",67,67,67,67,67,67
"Harbourfront East, Union Station, Toronto Islands",100,100,100,100,100,100
"Kensington Market, Chinatown, Grange Park",96,96,96,96,96,96


#### Find out how many unique categories can be curated from all the returned venues

In [262]:
# find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 198 unique categories.


### Analyze each neighbourhood

#### Group rows by neighbourhood and take the mean frequency of occurrence of each category

In [263]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
#toronto_onehot = toronto_onehot[ ['Neighborhood'] + [ col for col in toronto_onehot.columns if col != 'Neighborhood' ] ]
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

# Group rows by neighborhood
toronto_grouped = toronto_onehot.groupby("Neighbourhood").mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
1,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,...,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,...,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0
3,Christie,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017544,0.017544,0.0,0.017544,0.0,0.0,0.0
4,Church and Wellesley,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.026667
5,"Commerce Court, Victoria Hotel",0.0,0.03,0.0,0.0,0.0,0.02,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
6,"First Canadian Place, Underground city",0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.03,...,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
9,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010417,0.0,...,0.0,0.052083,0.0,0.0,0.03125,0.0,0.010417,0.0,0.0,0.0
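Taking the mean of one-hot columns per neighbourhood amounts to computing each category's share of that neighbourhood's venues. The plain-Python sketch below makes this explicit with `collections.Counter`; the function name `category_frequencies` and the sample venues are made up, and unlike the pandas table it leaves out categories whose share is zero.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each neighbourhood to {category: share of that neighbourhood's venues}."""
    counts = defaultdict(Counter)
    for neighbourhood, category in venues:
        counts[neighbourhood][category] += 1
    return {
        neighbourhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for neighbourhood, counter in counts.items()
    }


# Illustrative (neighbourhood, category) pairs, not the real Foursquare results.
sample_venues = [
    ("Berczy Park", "Coffee Shop"),
    ("Berczy Park", "Coffee Shop"),
    ("Berczy Park", "Restaurant"),
    ("Berczy Park", "Café"),
    ("Christie", "Korean Restaurant"),
    ("Christie", "Coffee Shop"),
]

frequencies = category_frequencies(sample_venues)
print(frequencies)
```

Each neighbourhood's shares sum to 1, which is why a value such as 0.01 in the table above corresponds to one venue out of 100.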


### Create new dataframe and display the top 10 venues for each neighborhood.

1. First, let's write a function to sort the venues in descending order.
2. Create a new dataframe and display the top 10 venues for each neighborhood. 

In [264]:
# First, let's write a function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Second, create a new dataframe and display the top 10 venues for each neighborhood.

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head(20)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Restaurant,Italian Restaurant,Café,Japanese Restaurant,Hotel,Beer Bar,Gastropub,Gym,Bakery
1,"CN Tower, King and Spadina, Railway Lands, Har...",Hotel,Coffee Shop,Baseball Stadium,Pizza Place,Restaurant,Italian Restaurant,Scenic Lookout,Ice Cream Shop,Gym,Aquarium
2,Central Bay Street,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Hotel,Movie Theater,Bookstore,Ramen Restaurant,Electronics Store,Sandwich Place,Bubble Tea Shop
3,Christie,Korean Restaurant,Coffee Shop,Grocery Store,Cocktail Bar,Indian Restaurant,Ice Cream Shop,Sandwich Place,Café,Karaoke Bar,Mexican Restaurant
4,Church and Wellesley,Coffee Shop,Japanese Restaurant,Gay Bar,Sushi Restaurant,Restaurant,Café,Men's Store,Mediterranean Restaurant,Hotel,Yoga Studio
5,"Commerce Court, Victoria Hotel",Coffee Shop,Restaurant,Hotel,Gastropub,Japanese Restaurant,Italian Restaurant,Café,Beer Bar,Seafood Restaurant,Gym
6,"First Canadian Place, Underground city",Coffee Shop,Hotel,Café,Japanese Restaurant,American Restaurant,Restaurant,Gym,Seafood Restaurant,Salad Place,Deli / Bodega
7,"Garden District, Ryerson",Clothing Store,Coffee Shop,Restaurant,Theater,Lingerie Store,Café,Electronics Store,Bookstore,Japanese Restaurant,Fast Food Restaurant
8,"Harbourfront East, Union Station, Toronto Islands",Coffee Shop,Café,Hotel,Restaurant,Pizza Place,Italian Restaurant,Brewery,Fried Chicken Joint,Chinese Restaurant,Sporting Goods Shop
9,"Kensington Market, Chinatown, Grange Park",Bar,Café,Mexican Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Vietnamese Restaurant,Hostel,Nightclub,Bakery,Taco Place
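The column labels and the ranking can each be expressed as a small standalone function. This sketch is an alternative formulation and not the notebook's code: `ordinal` and `top_categories` are hypothetical helpers, and ties are broken alphabetically here, which may order equal frequencies differently from `sort_values`.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


def top_categories(frequencies, n):
    """Return the n most frequent categories; ties are broken alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]


# Illustrative frequencies for a single neighbourhood.
sample_frequencies = {"Café": 0.1, "Coffee Shop": 0.3, "Hotel": 0.1}

labels = ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 4)]
top_two = top_categories(sample_frequencies, 2)
print(labels)
print(top_two)
```

Handling 11 to 13 explicitly only matters if `num_top_venues` is raised above 10, but it keeps the labels correct for any count.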


# Section (4) : Cluster Neighbourhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [265]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 3, 1, 1, 1, 1, 1, 1], dtype=int32)
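To see what the clustering step does with the frequency vectors, here is a simplified pure-Python sketch of Lloyd's algorithm, the basic k-means iteration. It is not scikit-learn's implementation: it seeds the centroids with the first `k` points rather than using a smarter initialisation, so its label numbering and results can differ from `KMeans`. The sample profiles are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; the first k points seed the centroids (an assumed, naive choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up frequency profiles over (Coffee Shop, Korean Restaurant, Bar).
profiles = [
    [0.30, 0.00, 0.02],
    [0.28, 0.01, 0.03],
    [0.05, 0.25, 0.04],
    [0.04, 0.22, 0.05],
]

cluster_labels, cluster_centres = kmeans(profiles, k=2)
print(cluster_labels)
```

The two coffee-heavy profiles end up together and the two Korean-restaurant-heavy profiles form the other group, which is the same kind of separation that sets Christie apart in the labels above.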

####  Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [266]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = central_data

# join the sorted venue table onto central_data so each neighbourhood keeps its latitude/longitude
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')  

toronto_merged.head(20) # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.660706,-79.360457,1,Coffee Shop,Thai Restaurant,Pharmacy,Park,Performing Arts Venue,Pet Store,Pool,Pub,Restaurant,Electronics Store
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.659659,-79.39034,1,Coffee Shop,Café,Sandwich Place,Italian Restaurant,French Restaurant,Restaurant,Japanese Restaurant,Thai Restaurant,Bubble Tea Shop,Juice Bar
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.6565,-79.377114,1,Clothing Store,Coffee Shop,Restaurant,Theater,Lingerie Store,Café,Electronics Store,Bookstore,Japanese Restaurant,Fast Food Restaurant
3,M5C,Downtown Toronto,St. James Town,43.669403,-79.372704,4,Coffee Shop,Café,Pizza Place,Grocery Store,Pharmacy,Bar,Sandwich Place,Caribbean Restaurant,Market,Food & Drink Shop
4,M5E,Downtown Toronto,Berczy Park,43.647984,-79.375396,1,Coffee Shop,Restaurant,Italian Restaurant,Café,Japanese Restaurant,Hotel,Beer Bar,Gastropub,Gym,Bakery
5,M5G,Downtown Toronto,Central Bay Street,43.656322,-79.380916,1,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Hotel,Movie Theater,Bookstore,Ramen Restaurant,Electronics Store,Sandwich Place,Bubble Tea Shop
6,M6G,Downtown Toronto,Christie,43.664111,-79.418405,3,Korean Restaurant,Coffee Shop,Grocery Store,Cocktail Bar,Indian Restaurant,Ice Cream Shop,Sandwich Place,Café,Karaoke Bar,Mexican Restaurant
7,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650823,-79.377936,1,Coffee Shop,Restaurant,Gastropub,American Restaurant,Café,Japanese Restaurant,Gym,Cosmetics Shop,Seafood Restaurant,Clothing Store
8,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.64008,-79.38015,1,Coffee Shop,Café,Hotel,Restaurant,Pizza Place,Italian Restaurant,Brewery,Fried Chicken Joint,Chinese Restaurant,Sporting Goods Shop
9,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647377,-79.381372,1,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Japanese Restaurant,Seafood Restaurant,Asian Restaurant,Gastropub,Steakhouse


#### Let's visualize the resulting clusters

In [267]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters


Examine a cluster and determine the **discriminating venue categories** that distinguish it, starting with Cluster 1.

#### Cluster 1 - Downtown Toronto

In [268]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",1,Coffee Shop,Thai Restaurant,Pharmacy,Park,Performing Arts Venue,Pet Store,Pool,Pub,Restaurant,Electronics Store
1,"Queen's Park, Ontario Provincial Government",1,Coffee Shop,Café,Sandwich Place,Italian Restaurant,French Restaurant,Restaurant,Japanese Restaurant,Thai Restaurant,Bubble Tea Shop,Juice Bar
2,"Garden District, Ryerson",1,Clothing Store,Coffee Shop,Restaurant,Theater,Lingerie Store,Café,Electronics Store,Bookstore,Japanese Restaurant,Fast Food Restaurant
4,Berczy Park,1,Coffee Shop,Restaurant,Italian Restaurant,Café,Japanese Restaurant,Hotel,Beer Bar,Gastropub,Gym,Bakery
5,Central Bay Street,1,Coffee Shop,Clothing Store,Middle Eastern Restaurant,Hotel,Movie Theater,Bookstore,Ramen Restaurant,Electronics Store,Sandwich Place,Bubble Tea Shop
7,"Richmond, Adelaide, King",1,Coffee Shop,Restaurant,Gastropub,American Restaurant,Café,Japanese Restaurant,Gym,Cosmetics Shop,Seafood Restaurant,Clothing Store
8,"Harbourfront East, Union Station, Toronto Islands",1,Coffee Shop,Café,Hotel,Restaurant,Pizza Place,Italian Restaurant,Brewery,Fried Chicken Joint,Chinese Restaurant,Sporting Goods Shop
9,"Toronto Dominion Centre, Design Exchange",1,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Japanese Restaurant,Seafood Restaurant,Asian Restaurant,Gastropub,Steakhouse
10,"Commerce Court, Victoria Hotel",1,Coffee Shop,Restaurant,Hotel,Gastropub,Japanese Restaurant,Italian Restaurant,Café,Beer Bar,Seafood Restaurant,Gym
12,"Kensington Market, Chinatown, Grange Park",1,Bar,Café,Mexican Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Vietnamese Restaurant,Hostel,Nightclub,Bakery,Taco Place


<font color='red'>
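Conclusion: Downtown Toronto is full of coffee shops; Coffee Shop is the most common venue category in most of the Cluster 1 neighbourhoods listed above.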
</font>

