### Installing lxml to read html from Wikipedia Page

In [1]:
pip install lxml

Collecting lxml
[?25l  Downloading https://files.pythonhosted.org/packages/68/30/affd16b77edf9537f5be051905f33527021e20d563d013e8c42c7fd01949/lxml-4.4.2-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
[K     |████████████████████████████████| 5.8MB 10.4MB/s eta 0:00:01
[?25hInstalling collected packages: lxml
Successfully installed lxml-4.4.2
Note: you may need to restart the kernel to use updated packages.


### Installing Geopy and Folium Packages

In [1]:
!conda install -c conda-forge geopy --yes #Installing Geopy
!conda install -c conda-forge folium=0.5.0 --yes #Installing Folium

Solving environment: done


  current version: 4.5.11
  latest version: 4.8.1

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    scikit-learn-0.20.1        |   py36h22eb022_0         5.7 MB
    liblapack-3.8.0            |      11_openblas          10 KB  conda-forge
    liblapacke-3.8.0           |      11_openblas          10 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    libopenblas-0.3.6          |       h5a2b251_2         7.7 MB
    scipy-1.4.1                |   py36h921218d_0        18.9 MB  conda-forge
    libcblas-3.8.0             |      11_openblas        

### Installing several necessary libraries

In [2]:
import pandas as pd #Importing pandas library
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Extract the tables from the Wiki Page into Dataframe and view the list of tables obtained

In [3]:
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M') #Reading all tables in html into dataframe
for df in dfs: #Displaying all dataframes in the wiki page
    print(df)

    Postcode           Borough          Neighbourhood
0        M1A      Not assigned           Not assigned
1        M2A      Not assigned           Not assigned
2        M3A        North York              Parkwoods
3        M4A        North York       Victoria Village
4        M5A  Downtown Toronto           Harbourfront
..       ...               ...                    ...
282      M8Z         Etobicoke              Mimico NW
283      M8Z         Etobicoke     The Queensway West
284      M8Z         Etobicoke  Royal York South West
285      M8Z         Etobicoke         South of Bloor
286      M9Z      Not assigned           Not assigned

[287 rows x 3 columns]
                                                  0   \
0                                                NaN   
1  NL NS PE NB QC ON MB SK AB BC NU/NT YT A B C E...   
2                                                 NL   
3                                                  A   

                                               

### Store the table required into another dataframe

In [4]:
pc = dfs[0] #Selecting the required dataframe from all the dataframes available
pc

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
...,...,...,...
282,M8Z,Etobicoke,Mimico NW
283,M8Z,Etobicoke,The Queensway West
284,M8Z,Etobicoke,Royal York South West
285,M8Z,Etobicoke,South of Bloor


### Drop all row entries where Borough is not assigned

In [5]:
pc.drop(pc[pc['Borough']=='Not assigned'].index, inplace = True) #drop rows where Borough is "Not Assigned"
pc

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
...,...,...,...
281,M8Z,Etobicoke,Kingsway Park South West
282,M8Z,Etobicoke,Mimico NW
283,M8Z,Etobicoke,The Queensway West
284,M8Z,Etobicoke,Royal York South West


### Group all rows entries with same Postcode and combine the respective neighbourhoods in those rows separated by commas 

In [6]:
pc = pc.groupby('Postcode').agg(lambda x:", ".join(set(x))) # Group values by Postcode and join neighbourhood values using "," where Post Code is same
pc

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,Scarborough,"Malvern, Rouge"
M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
M1E,Scarborough,"Morningside, Guildwood, West Hill"
M1G,Scarborough,Woburn
M1H,Scarborough,Cedarbrae
...,...,...
M9N,York,Weston
M9P,Etobicoke,Westmount
M9R,Etobicoke,"Richview Gardens, Kingsview Village, St. Phill..."
M9V,Etobicoke,"Mount Olive, South Steeles, Thistletown, Silve..."


### Replace the Neighbourhood name with Borough Name where Neighbourhood name is not attached

In [7]:
pc.loc[pc['Neighbourhood']=='Not assigned',"Neighbourhood"]=pc.loc[pc['Neighbourhood']=='Not assigned',"Borough"] # Replacing corresponding Borough value where Neighbourhood is "Not assigned" 
pc

Unnamed: 0_level_0,Borough,Neighbourhood
Postcode,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,Scarborough,"Malvern, Rouge"
M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
M1E,Scarborough,"Morningside, Guildwood, West Hill"
M1G,Scarborough,Woburn
M1H,Scarborough,Cedarbrae
...,...,...
M9N,York,Weston
M9P,Etobicoke,Westmount
M9R,Etobicoke,"Richview Gardens, Kingsview Village, St. Phill..."
M9V,Etobicoke,"Mount Olive, South Steeles, Thistletown, Silve..."


### Find the number of rows in the dataframe

In [8]:
pc.shape[0] #Number of rows

103

### Read Latitude and Longitude csv file

In [9]:
ld = pd.read_csv("https://cocl.us/Geospatial_data")
ld

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


### Set index of 'ld' dataframe to Postal Code

In [10]:
ld.set_index("Postal Code")

Unnamed: 0_level_0,Latitude,Longitude
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476
...,...,...
M9N,43.706876,-79.518188
M9P,43.696319,-79.532242
M9R,43.688905,-79.554724
M9V,43.739416,-79.588437


### Merge Dataframe 'pc' with Dataframe 'ld'

In [11]:
result=pc.merge(ld, left_on="Postcode", right_on="Postal Code")
result

Unnamed: 0,Borough,Neighbourhood,Postal Code,Latitude,Longitude
0,Scarborough,"Malvern, Rouge",M1B,43.806686,-79.194353
1,Scarborough,"Highland Creek, Rouge Hill, Port Union",M1C,43.784535,-79.160497
2,Scarborough,"Morningside, Guildwood, West Hill",M1E,43.763573,-79.188711
3,Scarborough,Woburn,M1G,43.770992,-79.216917
4,Scarborough,Cedarbrae,M1H,43.773136,-79.239476
...,...,...,...,...,...
98,York,Weston,M9N,43.706876,-79.518188
99,Etobicoke,Westmount,M9P,43.696319,-79.532242
100,Etobicoke,"Richview Gardens, Kingsview Village, St. Phill...",M9R,43.688905,-79.554724
101,Etobicoke,"Mount Olive, South Steeles, Thistletown, Silve...",M9V,43.739416,-79.588437


### Set Index to Postal Code

In [12]:
result.set_index("Postal Code")

Unnamed: 0_level_0,Borough,Neighbourhood,Latitude,Longitude
Postal Code,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
M1E,Scarborough,"Morningside, Guildwood, West Hill",43.763573,-79.188711
M1G,Scarborough,Woburn,43.770992,-79.216917
M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...
M9N,York,Weston,43.706876,-79.518188
M9P,Etobicoke,Westmount,43.696319,-79.532242
M9R,Etobicoke,"Richview Gardens, Kingsview Village, St. Phill...",43.688905,-79.554724
M9V,Etobicoke,"Mount Olive, South Steeles, Thistletown, Silve...",43.739416,-79.588437


### Obtain Co-ordinates of Toronto

In [13]:
address = 'Toronto'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Totonto is {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Totonto is 43.653963, -79.387207.


### Map of Toronto with neighborhoods superimposed on top

In [14]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(result['Latitude'], result['Longitude'], result['Borough'], result['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### List of Boroughs containing the key word Toronto

In [15]:
borough_names = list(result.Borough.unique())

toronto = []

for x in borough_names:
    if "toronto" in x.lower():
        toronto.append(x)
        
toronto

['East Toronto', 'Central Toronto', 'Downtown Toronto', 'West Toronto']

### Creating dataframe of rows entries with Borough name containing "Toronto

In [16]:
toronto_df = result[result['Borough'].isin(toronto)].reset_index(drop=True)
print(toronto_df.shape)
toronto_df.head(39)

(39, 5)


Unnamed: 0,Borough,Neighbourhood,Postal Code,Latitude,Longitude
0,East Toronto,The Beaches,M4E,43.676357,-79.293031
1,East Toronto,"Riverdale, The Danforth West",M4K,43.679557,-79.352188
2,East Toronto,"The Beaches West, India Bazaar",M4L,43.668999,-79.315572
3,East Toronto,Studio District,M4M,43.659526,-79.340923
4,Central Toronto,Lawrence Park,M4N,43.72802,-79.38879
5,Central Toronto,Davisville North,M4P,43.712751,-79.390197
6,Central Toronto,North Toronto West,M4R,43.715383,-79.405678
7,Central Toronto,Davisville,M4S,43.704324,-79.38879
8,Central Toronto,"Summerhill East, Moore Park",M4T,43.689574,-79.38316
9,Central Toronto,"Summerhill West, Forest Hill SE, Deer Park, So...",M4V,43.686412,-79.400049


### Define Foursquare Credentials and Version

In [17]:
CLIENT_ID = 'KZ524DI0U1R2WT4XXDN1Q5NAJ3WV2UC1MBRPJF2YR30SORF0' # your Foursquare ID
CLIENT_SECRET = 'OYM2GF4CQL1GAOMAKKIIQCLWPRHF2WBF0XQWM2A5RZVES0TX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: KZ524DI0U1R2WT4XXDN1Q5NAJ3WV2UC1MBRPJF2YR30SORF0
CLIENT_SECRET:OYM2GF4CQL1GAOMAKKIIQCLWPRHF2WBF0XQWM2A5RZVES0TX


### Creating a function to explore all places around Toronto

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&r0adius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### Get venues near the aforementioned Neighbourhood 

In [20]:
toronto_venues = getNearbyVenues(names=toronto_df['Neighbourhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )

The Beaches
Riverdale, The Danforth West
The Beaches West, India Bazaar
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Summerhill East, Moore Park
Summerhill West, Forest Hill SE, Deer Park, South Hill, Rathnelly
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Harbourfront
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, Richmond, King
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Roselawn
Forest Hill West, Forest Hill North
Yorkville, The Annex, North Midtown
University of Toronto, Harbord
Kensington Market, Grange Park, Chinatown
South Niagara, Bathurst Quay, King and Spadina, Railway Lands, CN Tower, Harbourfront West, Island airport
Stn A PO Boxes 25 The Esplanade
Underground city, First Canadian Place
Christie
Dovercourt Village, Dufferin
Little Portugal, Trinity
Exhibition Place, Parkdale Village, Brockton
High Park, The Junction Sout

In [21]:
toronto_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,The Fox Theatre,43.672801,-79.287272,Indie Movie Theater
1,The Beaches,43.676357,-79.293031,The Beech Tree,43.680493,-79.288846,Gastropub
2,The Beaches,43.676357,-79.293031,Ed's Real Scoop,43.67263,-79.287993,Ice Cream Shop
3,The Beaches,43.676357,-79.293031,Bagels On Fire,43.672864,-79.286784,Bagel Shop
4,The Beaches,43.676357,-79.293031,Beaches Bake Shop,43.680363,-79.289692,Bakery


### Analyze Each Neighborhood

In [22]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Grouping rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [23]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,"Adelaide, Richmond, King",0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0
3,Central Bay Street,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,...,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0
4,Christie,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,...,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
5,Church and Wellesley,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,...,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0
6,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,...,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
7,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.02,0.0
8,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.02,0.0
9,"Design Exchange, Toronto Dominion Centre",0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,...,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0


### Neighbourhood along with the top 5 most common venues

In [24]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, Richmond, King----
              venue  freq
0       Coffee Shop  0.08
1              Café  0.05
2               Bar  0.04
3        Steakhouse  0.04
4  Sushi Restaurant  0.03


----Berczy Park----
         venue  freq
0  Coffee Shop  0.10
1         Café  0.05
2   Restaurant  0.04
3        Hotel  0.04
4     Beer Bar  0.04


----Business Reply Mail Processing Centre 969 Eastern----
         venue  freq
0  Coffee Shop  0.07
1         Park  0.06
2      Brewery  0.06
3         Café  0.06
4       Bakery  0.05


----Central Bay Street----
                 venue  freq
0          Coffee Shop  0.13
1   Italian Restaurant  0.04
2       Ice Cream Shop  0.03
3  Japanese Restaurant  0.03
4      Bubble Tea Shop  0.03


----Christie----
               venue  freq
0               Café  0.11
1        Coffee Shop  0.06
2  Korean Restaurant  0.06
3  Indian Restaurant  0.04
4      Grocery Store  0.04


----Church and Wellesley----
                 venue  freq
0          Coffee Shop  0.11
1   

### Function to sort values in descending order

In [25]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

### Create new sorted Dataframe

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, Richmond, King",Coffee Shop,Café,Bar,Steakhouse,Sushi Restaurant,Asian Restaurant,Thai Restaurant,Theater,Cosmetics Shop,Vegetarian / Vegan Restaurant
1,Berczy Park,Coffee Shop,Café,Restaurant,Hotel,Beer Bar,Park,Cocktail Bar,Japanese Restaurant,Breakfast Spot,Bakery
2,Business Reply Mail Processing Centre 969 Eastern,Coffee Shop,Brewery,Park,Café,Bakery,Beach,Indian Restaurant,Pizza Place,Bar,Italian Restaurant
3,Central Bay Street,Coffee Shop,Italian Restaurant,Bakery,Bubble Tea Shop,Clothing Store,Ice Cream Shop,Café,Japanese Restaurant,Arts & Crafts Store,Thai Restaurant
4,Christie,Café,Coffee Shop,Korean Restaurant,Italian Restaurant,Grocery Store,Bar,Indian Restaurant,Ice Cream Shop,Vegetarian / Vegan Restaurant,Dessert Shop


### Cluster Neighbourhoods

In [30]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 4, 1, 3, 1, 2, 1, 1, 2], dtype=int32)

### Creating new dataframe with all data merged

In [31]:
# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighbourhood,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,The Beaches,M4E,43.676357,-79.293031,4,Park,Beach,Coffee Shop,Pub,Breakfast Spot,Café,Bakery,Indian Restaurant,Ice Cream Shop,Fish & Chips Shop
1,East Toronto,"Riverdale, The Danforth West",M4K,43.679557,-79.352188,4,Greek Restaurant,Café,Park,Vietnamese Restaurant,Bakery,American Restaurant,Italian Restaurant,Ice Cream Shop,Pub,Pizza Place
2,East Toronto,"The Beaches West, India Bazaar",M4L,43.668999,-79.315572,4,Café,Park,Coffee Shop,Beach,Brewery,Italian Restaurant,Bakery,Indian Restaurant,Pizza Place,American Restaurant
3,East Toronto,Studio District,M4M,43.659526,-79.340923,4,Coffee Shop,Brewery,Café,Park,Bakery,Vietnamese Restaurant,French Restaurant,Diner,Thai Restaurant,Bar
4,Central Toronto,Lawrence Park,M4N,43.72802,-79.38879,1,Coffee Shop,Italian Restaurant,Park,Bakery,Bookstore,Sushi Restaurant,Café,Tea Room,Yoga Studio,Grocery Store


### Visualise resulting clusters

In [33]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examining clusters

### Cluster 1

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,"Summerhill East, Moore Park",0,Italian Restaurant,Park,Café,Sushi Restaurant,Bakery,Indian Restaurant,Coffee Shop,Dessert Shop,Restaurant,Yoga Studio
9,"Summerhill West, Forest Hill SE, Deer Park, So...",0,Italian Restaurant,Café,Park,Sushi Restaurant,Middle Eastern Restaurant,Yoga Studio,Vegetarian / Vegan Restaurant,Coffee Shop,Deli / Bodega,Liquor Store
10,Rosedale,0,Park,Coffee Shop,Italian Restaurant,Café,Indian Restaurant,Gourmet Shop,Restaurant,Spa,Japanese Restaurant,Grocery Store
27,"South Niagara, Bathurst Quay, King and Spadina...",0,Park,Café,Coffee Shop,Brewery,Gym,Scenic Lookout,Aquarium,Hotel,Italian Restaurant,Dance Studio


### Cluster 2

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Lawrence Park,1,Coffee Shop,Italian Restaurant,Park,Bakery,Bookstore,Sushi Restaurant,Café,Tea Room,Yoga Studio,Grocery Store
5,Davisville North,1,Coffee Shop,Italian Restaurant,Bakery,Café,Indian Restaurant,Park,Gym,Asian Restaurant,Supermarket,Bookstore
6,North Toronto West,1,Coffee Shop,Italian Restaurant,Café,Bakery,Park,Sushi Restaurant,Bookstore,Yoga Studio,Tea Room,Deli / Bodega
7,Davisville,1,Coffee Shop,Bakery,Italian Restaurant,Café,Indian Restaurant,Park,Gym,Dessert Shop,Deli / Bodega,Bookstore
12,Church and Wellesley,1,Coffee Shop,Gym,Café,Restaurant,Sushi Restaurant,Japanese Restaurant,Burger Joint,Bookstore,Men's Store,Gastropub
14,"Ryerson, Garden District",1,Coffee Shop,Clothing Store,Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Tea Room,Café,Japanese Restaurant,Fast Food Restaurant,Bakery
17,Central Bay Street,1,Coffee Shop,Italian Restaurant,Bakery,Bubble Tea Shop,Clothing Store,Ice Cream Shop,Café,Japanese Restaurant,Arts & Crafts Store,Thai Restaurant
22,Roselawn,1,Coffee Shop,Italian Restaurant,Café,Bakery,Japanese Restaurant,Sporting Goods Shop,Deli / Bodega,Sushi Restaurant,Middle Eastern Restaurant,Bookstore
23,"Forest Hill West, Forest Hill North",1,Coffee Shop,Italian Restaurant,Sushi Restaurant,Gastropub,Park,Middle Eastern Restaurant,Restaurant,Japanese Restaurant,Bakery,Café
37,Queen's Park,1,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Gym,Gastropub,Park,Sushi Restaurant,Burger Joint,Chinese Restaurant


### Cluster 3

In [37]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Harbourfront,2,Coffee Shop,Restaurant,Café,Bakery,Park,Pub,Breakfast Spot,Farmers Market,Theater,Italian Restaurant
15,St. James Town,2,Coffee Shop,Café,Seafood Restaurant,Hotel,Restaurant,Bakery,Breakfast Spot,Italian Restaurant,Cosmetics Shop,BBQ Joint
16,Berczy Park,2,Coffee Shop,Café,Restaurant,Hotel,Beer Bar,Park,Cocktail Bar,Japanese Restaurant,Breakfast Spot,Bakery
18,"Adelaide, Richmond, King",2,Coffee Shop,Café,Bar,Steakhouse,Sushi Restaurant,Asian Restaurant,Thai Restaurant,Theater,Cosmetics Shop,Vegetarian / Vegan Restaurant
19,"Harbourfront East, Toronto Islands, Union Station",2,Coffee Shop,Hotel,Aquarium,Café,Italian Restaurant,Brewery,Park,Scenic Lookout,Restaurant,Bar
20,"Design Exchange, Toronto Dominion Centre",2,Coffee Shop,Hotel,Café,Restaurant,Gastropub,Seafood Restaurant,American Restaurant,Bar,Lounge,Italian Restaurant
21,"Commerce Court, Victoria Hotel",2,Coffee Shop,Café,Hotel,Restaurant,Steakhouse,Gym,Japanese Restaurant,Seafood Restaurant,Gastropub,Vegetarian / Vegan Restaurant
28,Stn A PO Boxes 25 The Esplanade,2,Coffee Shop,Café,Restaurant,Japanese Restaurant,Beer Bar,Hotel,Bakery,Cocktail Bar,Italian Restaurant,Breakfast Spot
29,"Underground city, First Canadian Place",2,Coffee Shop,Café,Hotel,Gastropub,Gym,Restaurant,American Restaurant,Seafood Restaurant,Japanese Restaurant,Steakhouse


### Cluster 4

In [38]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,"St. James Town, Cabbagetown",3,Coffee Shop,Café,Japanese Restaurant,Park,Pub,Thai Restaurant,Diner,Restaurant,Bakery,Gastropub
24,"Yorkville, The Annex, North Midtown",3,Coffee Shop,Café,Italian Restaurant,Restaurant,French Restaurant,Grocery Store,Vegetarian / Vegan Restaurant,Bakery,Japanese Restaurant,Sandwich Place
25,"University of Toronto, Harbord",3,Café,Bakery,Coffee Shop,Vegetarian / Vegan Restaurant,Bookstore,Pub,Bar,Park,Thai Restaurant,Dessert Shop
26,"Kensington Market, Grange Park, Chinatown",3,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Bar,Vietnamese Restaurant,Bakery,Dessert Shop,Coffee Shop,Dumpling Restaurant,Mexican Restaurant
30,Christie,3,Café,Coffee Shop,Korean Restaurant,Italian Restaurant,Grocery Store,Bar,Indian Restaurant,Ice Cream Shop,Vegetarian / Vegan Restaurant,Dessert Shop
31,"Dovercourt Village, Dufferin",3,Café,Italian Restaurant,Bar,Coffee Shop,Bakery,Park,Breakfast Spot,Cocktail Bar,Sushi Restaurant,Brazilian Restaurant
32,"Little Portugal, Trinity",3,Bar,Café,Restaurant,Bakery,Coffee Shop,Pizza Place,Italian Restaurant,Cocktail Bar,Asian Restaurant,Men's Store
33,"Exhibition Place, Parkdale Village, Brockton",3,Café,Coffee Shop,Restaurant,Bar,Gift Shop,Furniture / Home Store,Bakery,Theater,Soccer Stadium,Theme Park
34,"High Park, The Junction South",3,Café,Coffee Shop,Bar,Bakery,Italian Restaurant,Pizza Place,Brewery,Restaurant,Breakfast Spot,Park
35,"Parkdale, Roncesvalles",3,Café,Coffee Shop,Bakery,Restaurant,Park,Bar,Italian Restaurant,Sushi Restaurant,Gastropub,Pizza Place


### Cluster 5

In [39]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Beaches,4,Park,Beach,Coffee Shop,Pub,Breakfast Spot,Café,Bakery,Indian Restaurant,Ice Cream Shop,Fish & Chips Shop
1,"Riverdale, The Danforth West",4,Greek Restaurant,Café,Park,Vietnamese Restaurant,Bakery,American Restaurant,Italian Restaurant,Ice Cream Shop,Pub,Pizza Place
2,"The Beaches West, India Bazaar",4,Café,Park,Coffee Shop,Beach,Brewery,Italian Restaurant,Bakery,Indian Restaurant,Pizza Place,American Restaurant
3,Studio District,4,Coffee Shop,Brewery,Café,Park,Bakery,Vietnamese Restaurant,French Restaurant,Diner,Thai Restaurant,Bar
38,Business Reply Mail Processing Centre 969 Eastern,4,Coffee Shop,Brewery,Park,Café,Bakery,Beach,Indian Restaurant,Pizza Place,Bar,Italian Restaurant


## Observations

##### Cluster 1 consists mostly of neighbourhoods having Italian Restuarants, Park, Coffee Shops and Cafe as the most common venues

##### Cluster 2 consists mostly of neighbourhoods having Italian Restuarants, Gyms, Coffee Shops and Bakery as the most common venues

##### Cluster 3 consists mostly of neighbourhoods having Italian Restuarants, Hotels, Coffee Shops and Aquariums as the most common venues

##### Cluster 4 consists mostly of neighbourhoods having Cafe, Coffee Shops and Asian Restuarants as the most common venues

##### Cluster 5 consists mostly of neighbourhoods having Brewery, Beaches, Coffee Shops and Cafe as the most common venues