# Segmenting and Clustering Neighborhoods in Toronto

# [Part 1] Web scraping postal codes of neighborhoods in Toronto

Use `beautifulsoup4` to scrape this Wikipedia [page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) containing the postal codes of the neighborhoods of Toronto.

The postal codes will be used for geocoding.

From the [documentation](https://beautiful-soup-4.readthedocs.io/en/latest/#making-the-soup) we see that `BeautifulSoup` needs the HTML document as input. We can get it with the `requests` module, which handles the GET call to the Wikipedia page, and then take the response as text (we could also save it to a file if needed).

In [1]:
from bs4 import BeautifulSoup
import requests

url_postal_codes = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
source_page = requests.get(url_postal_codes).text

soup = BeautifulSoup(source_page, 'lxml')

In [None]:
# explore the soup
# print(soup.prettify())

### Extracting the table with postal codes

By inspecting the website (from the browser or from the soup object above) we see that the relevant information is in a `<table>` object with class `wikitable sortable`.

We can use the `find` and `find_all` methods ([docs](https://beautiful-soup-4.readthedocs.io/en/latest/#searching-the-tree)) to look for the table and extract its rows.

In [2]:
table = soup.find('table', class_='wikitable sortable')
#print(table)

In [3]:
rows = table.find_all('tr')
#print(rows)

#### Tests (supplementary)

This is what a row of the table looks like:

In [None]:
test = rows[7]
#print(test.text)

And this is how to split a row into its elements, keeping only the ones with text. This assumes that the first and last lines of the row's text are empty.

In [None]:
test.text.split('\n')[1:4]

In [None]:
test.find_all('td')

#### Get the data from each row in the table

Iterate through all the rows in the table to extract the `Postal Code`, `Borough` and `Neighborhood`, assuming they are the first three non-empty fields in each row's text.

In [4]:
header = rows[0].text
table = []
for r in rows[1:]:
    try:
        line = r.text.split('\n')[1:4]
    except Exception as e:
        print('cannot get line {}'.format(r))
        line = []
    table.append(line)
# print(table)
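The positional split above relies on each row's text having exactly one field per line. A minimal sketch of a cell-based alternative, assuming a small inline HTML table (the `sample_html` string and the `parse_rows` helper are illustrative and not part of the notebook), reads each `<td>` directly:

```python
from bs4 import BeautifulSoup

# Illustrative stand-in for the Wikipedia table (not the real page content)
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

def parse_rows(table_tag):
    """Return one list of stripped cell texts per data row (hypothetical helper)."""
    parsed = []
    for tr in table_tag.find_all('tr'):
        cells = [td.get_text(strip=True) for td in tr.find_all('td')]
        if cells:  # the header row has only <th> cells, so it is skipped
            parsed.append(cells)
    return parsed

# 'html.parser' is the parser bundled with Python, so lxml is not required here
sample_soup = BeautifulSoup(sample_html, 'html.parser')
sample_table = sample_soup.find('table', class_='wikitable sortable')
sample_rows = parse_rows(sample_table)
print(sample_rows)
```

Reading cells by tag avoids depending on where newlines happen to fall in the row text, at the cost of assuming every data row uses `<td>` cells.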

### Create a `pandas` dataframe with the postal codes

In [5]:
import pandas as pd
print(pd.__version__)

0.25.0


Use `from_records` to create a DataFrame directly from the table of data, giving names to the columns:

In [6]:
# do this if the rows in the table were parsed as a whole
# pc = pd.DataFrame.from_records(table,exclude=['0','1'],columns=['0','PostalCode','Borough','Neighborhood','1'])
pc = pd.DataFrame.from_records(table,columns=['PostalCode','Borough','Neighborhood'])

In [None]:
pc.head()

### Process dataframe to remove unwanted items

In [7]:
import numpy as np
print(np.__version__)

1.17.0


Keep only the rows where `Borough` is different from `Not assigned`.

In [8]:
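# take a copy so that later assignments do not act on a view of `pc` (avoids SettingWithCopyWarning)
pc_clean = pc[pc.Borough != 'Not assigned'].copy()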

In [None]:
pc_clean.head()

If a `Neighborhood` does not have a name, assign the name of the corresponding `Borough`:

In [9]:
pc_clean.query('Neighborhood == "Not assigned"')

Unnamed: 0,PostalCode,Borough,Neighborhood
8,M7A,Queen's Park,Not assigned


In [10]:
pc_clean.at[8,'Neighborhood']=pc_clean.at[8,'Borough']

In [11]:
pc_clean.loc[8]

PostalCode               M7A
Borough         Queen's Park
Neighborhood    Queen's Park
Name: 8, dtype: object

If there were multiple instances to change, we could use `DataFrame.where` to replace the values.

In [None]:
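# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html#pandas.DataFrame.where
# where() keeps the values where the condition is True and replaces the others
#test = pc[pc.Borough != 'Not assigned'].copy()
#test['Neighborhood'] = test.Neighborhood.where(test.Neighborhood != 'Not assigned', test.Borough)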
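As a minimal sketch of the bulk replacement, assuming toy data with the same three columns (the values below are illustrative), `Series.mask` can fill every unnamed neighborhood with its borough in one step:

```python
import pandas as pd

# Toy data with the same columns as `pc` (values are illustrative)
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M7A', 'M9A'],
    'Borough': ['Downtown Toronto', "Queen's Park", 'Etobicoke'],
    'Neighborhood': ['Harbourfront', 'Not assigned', 'Not assigned'],
})

# mask() replaces values where the condition is True;
# where() does the opposite and keeps values where the condition is True
missing = toy['Neighborhood'] == 'Not assigned'
toy['Neighborhood'] = toy['Neighborhood'].mask(missing, toy['Borough'])
print(toy)
```

Using `mask` keeps the condition readable ("replace where missing"), whereas `where` needs the negated condition.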

### Merge neighborhoods with the same postal code

Group by `PostalCode` and merge the neighborhoods

In [12]:
pc_grouped = pc_clean.groupby('PostalCode')

You can iterate through all the postal codes and look at the neighborhoods in each group:

In [None]:
# for name, group in pc_grouped:
#     group

#### Create final dataframe

In [13]:
column_names = ['PostalCode','Borough','Neighborhood']
df = pd.DataFrame(columns=column_names)

for name, group in pc_grouped:
    p = list(dict.fromkeys(group.PostalCode))[0]
    b = list(dict.fromkeys(group.Borough))[0]
    n = list(dict.fromkeys(group.Neighborhood))
    df = df.append({'PostalCode': p, 'Borough': b, 'Neighborhood': ', '.join(n)}, ignore_index=True)

# df
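The row-by-row loop can also be written as a single aggregation. The sketch below is an alternative under the assumption that each postal code belongs to one borough; it uses toy data rather than the scraped table:

```python
import pandas as pd

# Toy data: one postal code with two neighborhoods, one with a single neighborhood
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M7A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', "Queen's Park"],
    'Neighborhood': ['Harbourfront', 'Regent Park', "Queen's Park"],
})

merged = (
    toy.groupby('PostalCode', as_index=False)
       .agg({
           # keep the first borough seen for each postal code
           'Borough': 'first',
           # join the distinct neighborhood names, preserving their order
           'Neighborhood': lambda names: ', '.join(dict.fromkeys(names)),
       })
)
print(merged)
```

This avoids growing a dataframe inside a loop with `DataFrame.append`, which is slow for large tables and is no longer available in pandas 2.x.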

Final shape:

In [14]:
df.shape

(103, 3)

# [Part 2] Get geospatial coordinates of the neighborhoods

In [None]:
# import geocoder

In [None]:
# # initialize your variable to None
# lat_lng_coords = None
# postal_code = 'M9W'

# # loop until you get the coordinates
# while(lat_lng_coords is None):
#     g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#     lat_lng_coords = g.latlng

# latitude = lat_lng_coords[0]
# longitude = lat_lng_coords[1]
# print(latitude,longitude)

In [None]:
# g = geocoder.canadapost('{}, Toronto, Ontario'.format(postal_code))
# g

In [None]:
# g = geocoder.google('M9N, Toronto, Ontario')
# g

Since `geocoder.google` does not seem to work (it returns `REQUEST DENIED`) and `geocoder.canadapost` does not provide latitude or longitude, we use the provided `Geospatial_Coordinates.csv` file instead.

In [15]:
gc = pd.read_csv('Geospatial_Coordinates.csv')
gc.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
gc.shape

(103, 3)

Merge the Neighborhood dataset and the coordinates dataset. Be careful about the different naming convention in the columns.

In [17]:
toronto_data = pd.merge(df, gc, left_on='PostalCode', right_on='Postal Code')

In [18]:
toronto_data.drop(columns='Postal Code',inplace=True)

In [19]:
toronto_data.shape

(103, 5)
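One way to check that no postal code was lost in the merge is to ask pandas to report unmatched rows. This is a minimal sketch on toy data; the postal code `M9Z` and its borough are made up to force a mismatch:

```python
import pandas as pd

left = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M9Z'],
                     'Borough': ['Scarborough', 'Scarborough', 'Unknown']})
right = pd.DataFrame({'Postal Code': ['M1B', 'M1C'],
                      'Latitude': [43.806686, 43.784535],
                      'Longitude': [-79.194353, -79.160497]})

# how='left' keeps every row of `left`; indicator=True adds a `_merge` column;
# validate raises an error if a postal code appears more than once on either side
checked = pd.merge(left, right, left_on='PostalCode', right_on='Postal Code',
                   how='left', indicator=True, validate='one_to_one')
unmatched = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty `unmatched` list would mean that every postal code found its coordinates.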

# [Part 3] Maps and segmentation of the city of Toronto

Import libraries for maps and clustering

In [20]:
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform a JSON file into a pandas dataframe
import folium # map rendering library
# import k-means from scikit-learn
from sklearn.cluster import KMeans

## Create a Folium Map for Toronto

First get the address of Toronto to center the map:

In [21]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Then use the coordinates saved in `toronto_data` to place a pin on the map at the center of each neighborhood. To begin with, let's keep the map uncluttered by showing only the first postal code of each borough.

In [22]:
neighborhoods = toronto_data.groupby('Borough').head(1)

In [23]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(width='100%',height='100%',location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, postal, borough in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['PostalCode'], neighborhoods['Borough']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=folium.Popup('{}, {}'.format(postal, borough), parse_html=True),
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

In [24]:
map_toronto

### Explore Downtown Toronto

For illustrative purposes, let's focus on Downtown Toronto

In [25]:
downtown_toronto = toronto_data[toronto_data['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937


In [26]:
downtown_toronto.shape

(18, 5)

Visualize the 18 different postal codes on the map after getting the coordinates of Downtown Toronto:

In [27]:
address = 'Downtown Toronto, Toronto'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6541737, -79.3808116451341.


In [28]:
# create map of Toronto using latitude and longitude values
map_downtown = folium.Map(width='100%',height='100%',location=[latitude, longitude], zoom_start=13)

# add markers to map
# the loop variable is named `neighborhood` so that it does not overwrite the `neighborhoods` dataframe
for lat, lng, postal, neighborhood in zip(downtown_toronto['Latitude'], downtown_toronto['Longitude'], downtown_toronto['PostalCode'], downtown_toronto['Neighborhood']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=folium.Popup('{}, {}'.format(postal, neighborhood), parse_html=True),
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)

In [29]:
map_downtown

We can now reproduce the same analysis we did for Manhattan, New York.

## Clustering the neighborhoods of Downtown Toronto

There are 18 different postal codes in Downtown Toronto, corresponding to different neighborhoods.

#### Use FourSquare API

Gather the credentials to access the API (they are stored in a file). Then define the query to the API to gather the first 100 venues within a 500 m radius of a specific latitude and longitude (Foursquare needs the geospatial coordinates).

In [30]:
# get credentials from file
with open('../credentials.json') as f:
    cred = json.load(f)
CLIENT_ID = cred['client_id'] # your Foursquare ID
CLIENT_SECRET = cred['client_secret'] # your Foursquare Secret
VERSION = '20180605'
LIMIT = 100

For illustration purposes, let's fix the postal code to be `M5T`; later we will repeat the whole process on the full list of postal codes.

In [31]:
radius = 500
latitude = downtown_toronto.query('PostalCode=="M5T"').Latitude.values[0]
longitude = downtown_toronto.query('PostalCode=="M5T"').Longitude.values[0]
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
# print the URL with the credentials masked so that they do not end up in the notebook output
print(url.replace(CLIENT_ID, '<CLIENT_ID>').replace(CLIENT_SECRET, '<CLIENT_SECRET>'))

https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=43.6532057,-79.4000493&v=20180605&radius=500&limit=100
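Instead of formatting the query string by hand, the parameters can be kept in a dictionary and encoded in one step. This is a minimal sketch with placeholder credentials (`YOUR_CLIENT_ID` and `YOUR_CLIENT_SECRET` are stand-ins, not real values):

```python
from urllib.parse import urlencode

# Placeholder values; real credentials should be loaded from a file, never hard-coded
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'v': '20180605',
    'll': '43.6532057,-79.4000493',
    'radius': 500,
    'limit': 100,
}
base_url = 'https://api.foursquare.com/v2/venues/explore'

# safe=',' keeps the comma in the "lat,lng" pair readable instead of percent-encoding it
example_url = base_url + '?' + urlencode(params, safe=',')
print(example_url)
```

With `requests`, the same dictionary can be passed directly as `requests.get(base_url, params=params)`, which builds the query string internally and avoids stray separators.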


In [32]:
results = requests.get(url).json()

In [33]:
#results

#### Clean up and collect different venues for a specific postal code

We extract the information about the venues from the result of the API call

In [34]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [35]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Kid Icarus,Arts & Crafts Store,43.653933,-79.401719
1,Seven Lives - Tacos y Mariscos,Mexican Restaurant,43.654418,-79.400545
2,Little Pebbles,Coffee Shop,43.654883,-79.400264
3,El Rey,Cocktail Bar,43.652764,-79.400048
4,The Moonbean Cafe,Café,43.654147,-79.400182


In [36]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


We need to repeat this process for all the neighborhoods (postal codes) in Downtown Toronto. This is analogous to what we did for Manhattan, New York.

#### Define a function to gather venues from FourSquare for all neighborhoods

In [37]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single list of rows (nested list comprehension):
    # https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
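The function above indexes `categories[0]` directly, which would fail for a venue that has no category. A minimal sketch of a more defensive row extraction follows; `extract_venue_rows` is a hypothetical helper and `sample_items` is hand-written data that only mimics the structure of `results['response']['groups'][0]['items']`:

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten Foursquare-style items into tuples (hypothetical helper)."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None when a venue has no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-written sample; the second venue is invented to show the empty-category case
sample_items = [
    {'venue': {'name': 'El Rey', 'location': {'lat': 43.652764, 'lng': -79.400048},
               'categories': [{'name': 'Cocktail Bar'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 43.65, 'lng': -79.40},
               'categories': []}},
]
venue_rows = extract_venue_rows('Chinatown', 43.6532057, -79.4000493, sample_items)
print(venue_rows)
```

Rows with a `None` category can then be dropped or kept, depending on how the one-hot encoding should treat them.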

#### Get all venues

Run the function for all the rows in the `downtown_toronto` dataframe

In [38]:
venues = getNearbyVenues(names=downtown_toronto['Neighborhood'],
                        latitudes=downtown_toronto['Latitude'],
                        longitudes=downtown_toronto['Longitude']
                        )

Rosedale
Cabbagetown, St. James Town
Church and Wellesley
Harbourfront, Regent Park
Ryerson, Garden District
St. James Town
Berczy Park
Central Bay Street
Adelaide, King, Richmond
Harbourfront East, Toronto Islands, Union Station
Design Exchange, Toronto Dominion Centre
Commerce Court, Victoria Hotel
Harbord, University of Toronto
Chinatown, Grange Park, Kensington Market
CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
Stn A PO Boxes 25 The Esplanade
First Canadian Place, Underground city
Christie


We now have the venues for each group of neighborhoods (each group corresponds to a specific postal code).

In [None]:
venues.groupby('Neighborhood').count()

In [39]:
print('There are {} unique categories.'.format(len(venues['Venue Category'].unique())))

There are 206 unique categories.


### Analyze the neighborhoods

In order to use a clustering algorithm to segment Downtown Toronto, we need to transform the venues and venue categories into numbers. We want to use these numbers as features for each neighborhood.

In [40]:
# one hot encoding of categories for clustering
toronto_onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")
# add the neighborhood column to the one-hot dataframe
toronto_onehot['Neighborhood'] = venues['Neighborhood']
# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Taiwanese Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0


In [41]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
# toronto_grouped
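To see what the grouped mean represents, here is a minimal sketch on toy data (two made-up neighborhoods `A` and `B`): averaging the one-hot columns per neighborhood gives the share of venues in each category.

```python
import pandas as pd

# Toy venues: neighborhood A has 4 venues, neighborhood B has 2
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Bar', 'Park', 'Park'],
})

# one column per category, 1 where the venue belongs to that category
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix='', prefix_sep='').astype(int)
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

# the mean of a 0/1 column is the fraction of venues in that category
toy_freq = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_freq)
```

Each row of the result sums to 1, so neighborhoods with many venues and neighborhoods with few venues become comparable.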

In [42]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [53]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Steakhouse,Thai Restaurant,Breakfast Spot,Gym,Restaurant,Asian Restaurant,Hotel
1,Berczy Park,Coffee Shop,Cocktail Bar,Beer Bar,Bakery,Seafood Restaurant,Farmers Market,Steakhouse,Cheese Shop,Café,Park
2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Terminal,Airport Lounge,Harbor / Marina,Coffee Shop,Plane,Sculpture Garden,Boutique,Boat or Ferry,Airport Food Court
3,"Cabbagetown, St. James Town",Coffee Shop,Park,Restaurant,Café,Italian Restaurant,Pizza Place,Pub,Bakery,Gym / Fitness Center,American Restaurant
4,Central Bay Street,Coffee Shop,Italian Restaurant,Ice Cream Shop,Middle Eastern Restaurant,Sandwich Place,Burger Joint,Café,Bubble Tea Shop,Spa,Bakery
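The column names above rely on catching an exception once the list of suffixes runs out, which works for the first ten ranks. A minimal sketch of an explicit alternative follows; `ordinal` is a hypothetical helper that also handles cases such as 11th or 21st if more ranks are ever needed:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (hypothetical helper)."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

example_top = 10
example_columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                                      for i in range(1, example_top + 1)]
print(example_columns)
```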


### Cluster the neighborhoods

Run the k-means clustering algorithm with 3 clusters

In [50]:
# set number of clusters
kclusters = 3

# remove the Neighborhood name from the dataframe and leave only the category frequencies (one row per neighborhood)
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0], dtype=int32)

Changing the number of clusters does not seem to matter much. Most of the neighborhoods fall in the same cluster because they are quite similar (they are all in Downtown Toronto, after all); the only exceptions are the third and the 15th rows.
These two are very distinctive neighborhoods: one is dominated by airport venues and the other by parks and playgrounds.
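One common way to judge how much the number of clusters matters is to compare the k-means inertia (within-cluster sum of squares) across several values of `k`. The sketch below uses synthetic features, not the Toronto data, with one large group of similar rows and a small group of outliers:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic frequency-like features: 8 similar rows plus 2 clearly different rows
rng = np.random.RandomState(0)
group_a = rng.normal(loc=0.1, scale=0.01, size=(8, 4))
group_b = rng.normal(loc=0.9, scale=0.01, size=(2, 4))
features = np.vstack([group_a, group_b])

# inertia for k = 1..4; a sharp drop followed by a plateau suggests a reasonable k
inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = model.inertia_
    print(k, round(model.inertia_, 4))

# with k=2 the two groups should end up in separate clusters
labels2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels2)
```

On data like this, adding clusters beyond the point where the outliers are separated only splits the large, homogeneous group further, which matches the observation above.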

In [51]:
neighborhoods_venues_sorted.iloc[[2,14]]

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,2,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Service,Airport Terminal,Airport Lounge,Harbor / Marina,Coffee Shop,Plane,Sculpture Garden,Boutique,Boat or Ferry,Airport Food Court
14,1,Rosedale,Park,Playground,Trail,Building,Women's Store,Dim Sum Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop


### Map the different clusters

Create a dataframe with the cluster label for each neighborhood and map them with different colored markers

In [54]:
# add clustering labels to the sorted venue dataframe
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_) # insert as first column.

downtown_merged = downtown_toronto

# join neighborhoods_venues_sorted onto the Downtown Toronto data, which already holds latitude/longitude for each neighborhood
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,2,Park,Playground,Trail,Building,Women's Store,Dim Sum Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
1,M4X,Downtown Toronto,"Cabbagetown, St. James Town",43.667967,-79.367675,0,Coffee Shop,Park,Restaurant,Café,Italian Restaurant,Pizza Place,Pub,Bakery,Gym / Fitness Center,American Restaurant
2,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Restaurant,Gay Bar,Pub,Men's Store,Gastropub,Hotel,Fast Food Restaurant
3,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,0,Coffee Shop,Café,Pub,Bakery,Park,Theater,Breakfast Spot,Mexican Restaurant,Restaurant,Spa
4,M5B,Downtown Toronto,"Ryerson, Garden District",43.657162,-79.378937,0,Coffee Shop,Clothing Store,Cosmetics Shop,Middle Eastern Restaurant,Café,Ramen Restaurant,Diner,Italian Restaurant,Ice Cream Shop,Bubble Tea Shop


The map is centered on the latitude and longitude of Downtown Toronto:

In [47]:
address = 'Downtown Toronto, Toronto'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Downtown Toronto are 43.6541737, -79.3808116451341.


In [55]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))  # array of colors from the rainbow colormap
rainbow = [colors.rgb2hex(i) for i in colors_array]  # get HEX code for each color

# add markers to the map
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'], downtown_merged['Longitude'], downtown_merged['Neighborhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so they index the color list directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

In [56]:
map_clusters