# Segmenting and Clustering Neighborhoods in Toronto

----

In this assignment we use Python to pull data from the Internet for some neighbourhood analysis.

### Step One: Download and Clean the Neighbourhoods

In [1]:
import pandas as pd


The following code uses `pandas` to scrape a table from a Wikipedia page.  Since the table we want
is the first (and only) table on the page, the code is a little simpler than if we had to search through
multiple tables.

In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
print(df.shape)
df.head()

(180, 3)


Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


Cleaning involves removing the rows where the postal code has not been assigned (to a Borough) and then, for the
postal codes that have multiple neighbourhoods, replacing the '/' divider with a ','.  The assignment also wanted
Boroughs with no Neighbourhood names to use the Borough as the Neighbourhood.  A check for NaNs or `Not assigned`
in the Neighbourhood column did not find any of these cases.

In [3]:
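df = df[df.Borough != 'Not assigned'].copy()  # copy so the assignment below does not write to a view
df['Neighborhood'] = df['Neighborhood'].str.replace(' /', ',', regex=False)  # literal, not regex, replacement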

In [4]:
df[df['Neighborhood'].isnull()]

Unnamed: 0,Postal code,Borough,Neighborhood


In [5]:
df[df.Neighborhood == 'Not assigned'] 

Unnamed: 0,Postal code,Borough,Neighborhood


In [6]:
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
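
The borough fallback that the assignment asked for was never exercised on the real table, so here is a minimal sketch of the whole cleaning step on a toy frame. The rows and the helper name `clean_postal_table` are made up for illustration; only the column names match the scraped table.

```python
import pandas as pd

# Toy rows (invented for illustration) that mimic the scraped table's layout.
toy = pd.DataFrame({
    "Postal code": ["M1A", "M5A", "M7A", "M9A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Queen's Park", "Etobicoke"],
    "Neighborhood": [None, "Regent Park / Harbourfront", "Not assigned", None],
})

def clean_postal_table(table):
    """Drop unassigned boroughs, fall back to the borough name, use ',' separators."""
    out = table[table["Borough"] != "Not assigned"].copy()
    # A neighbourhood counts as missing if it is NaN or the literal 'Not assigned'.
    missing = out["Neighborhood"].isna() | (out["Neighborhood"] == "Not assigned")
    out["Neighborhood"] = out["Neighborhood"].mask(missing, out["Borough"])
    out["Neighborhood"] = out["Neighborhood"].str.replace(" /", ",", regex=False)
    return out.reset_index(drop=True)

cleaned = clean_postal_table(toy)
print(cleaned)
```

On the real data the `missing` mask would be all `False`, which is consistent with the two empty checks above.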


In [7]:
df.shape

(103, 3)

A total of 77 rows were removed in this process, leaving 103, which matches the number of Toronto Forward Sortation Areas (FSAs) according to the Wikipedia page.  This seems like a good check that the processing has succeeded.

### Step Two: Get the Lat-Long Coordinates for the Postal Codes

I didn't have much success getting the geocoder to work, so I went to the backup plan of using the CSV file provided.

In [8]:
ll = pd.read_csv("Geospatial_Coordinates.csv")
print(ll.shape)
ll.head()


(103, 3)


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


The shape looks good and the head looks reasonable too.  The only difference is in the column name for the postal
code, which needs adjusting so that a join will work easily.

In [9]:
ll.rename(columns={'Postal Code':'Postal code'}, inplace=True)
ll.head()
#result = pd.concat([df, ll], axis=1, sort=False)

Unnamed: 0,Postal code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [10]:
df = pd.merge(df, ll, on='Postal code')  # name the join key explicitly
df.head()

Unnamed: 0,Postal code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [11]:
df.shape

(103, 5)

Join looks like it had the correct effect and the shape is still good.
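
Checking the shape by eye works here, but the same check can be pushed into the merge itself. The sketch below is one way to do that, using three rows copied from the tables above; the frame names `boroughs` and `coords` are just for illustration.

```python
import pandas as pd

# Three rows copied from the tables above, deliberately in a different order.
boroughs = pd.DataFrame({
    "Postal code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
})
coords = pd.DataFrame({
    "Postal code": ["M5A", "M3A", "M4A"],
    "Latitude": [43.654260, 43.753259, 43.725882],
    "Longitude": [-79.360636, -79.329656, -79.315572],
})

# validate= raises a MergeError if a postal code is duplicated on either side;
# indicator= adds a "_merge" column recording where each row came from.
checked = pd.merge(boroughs, coords, on="Postal code", how="left",
                   validate="one_to_one", indicator=True)

# Rows that found no coordinates would be flagged "left_only".
unmatched = checked[checked["_merge"] != "both"]
print(len(checked), "rows,", len(unmatched), "without coordinates")
```

A left join keeps every postal code from the cleaned table, so a missing coordinate shows up as a flagged row rather than silently dropping out of the result.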

### Step Three: Explore and Cluster Toronto Neighbourhoods

In this brief analysis the centre of focus is Union Station - somewhere you could be arriving either on the Union Pearson
Express from the airport or coming in on the GO trains.  It's on the lake side of downtown.  Let's assume we just arrived from the airport.

In [12]:
import folium # map rendering library

# latitude and longitude values for Union Station postal code
latitude = 43.640816
longitude = -79.381752
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

That's a lot of neighbourhoods, and we just got off the express train, so we would prefer to be walking somewhere.  So let's narrow down by highlighting which of these are in the `Downtown Toronto` borough.  The coordinates we are using were looked up from the postal code that contains Union Station.

In [13]:
downtown = df[df.Borough=="Downtown Toronto"]
urban = df[df.Borough!="Downtown Toronto"]

In [14]:
latitude = 43.640816
longitude = -79.381752
map_toronto2 = folium.Map(location=[latitude, longitude], zoom_start=10)

# add downtown to map
for lat, lng, borough, neighborhood in zip(downtown['Latitude'], downtown['Longitude'], downtown['Borough'], downtown['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#de2f51',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto2)  

# add urban to map
for lat, lng, borough, neighborhood in zip(urban['Latitude'], urban['Longitude'], urban['Borough'], urban['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto2)  


map_toronto2

Moving on to Foursquare queries, we will keep Union Station as the centre of the searches and
look up its latitude / longitude from the Nominatim service using a guessed address for the station.

In [15]:
#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

CLIENT_ID = 'ED44TJOWXHIZ1DRYONSCSN3KI0GWW1HJLCHVXNCV2COL0RUL' # your Foursquare ID
CLIENT_SECRET = 'HOQT5SNOO04NGNMRGCLLLDKBI5LFYZJFWXM3QSL4KMOUHTPG' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30

address = 'Union Station, Toronto, Ontario'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

43.6446934 -79.38013200585965


Let's look for an Irish pub.  A pint would go nicely with the rest of this assignment.  Perhaps we should widen the search to 750 metres since this is a slightly uncommon type of venue.

In [16]:
search_query = 'Irish'
radius = 750

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=ED44TJOWXHIZ1DRYONSCSN3KI0GWW1HJLCHVXNCV2COL0RUL&client_secret=HOQT5SNOO04NGNMRGCLLLDKBI5LFYZJFWXM3QSL4KMOUHTPG&ll=43.6446934,-79.38013200585965&v=20180604&query=Irish&radius=750&limit=30'
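
Formatting the URL by hand works, but it also prints the credentials into the notebook output. A minimal sketch of an alternative is below: it builds the same query string with the standard library and reads the credentials from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_search_url` are assumptions for illustration, not part of the Foursquare API.

```python
import os
from urllib.parse import urlencode

def build_search_url(lat, lng, query, radius, limit, version="20180604"):
    """Assemble a venues/search URL from keyword parameters."""
    params = {
        # Hypothetical environment variable names; placeholders are used if unset.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID"),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET"),
        "ll": f"{lat},{lng}",
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-escapes values, so the comma in ll becomes %2C.
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

example_url = build_search_url(43.6446934, -79.38013200585965, "Irish", 750, 30)
print(example_url)
```

Escaping also matters once a search term contains spaces or an ampersand, which plain string formatting would pass through unescaped.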

Now we can submit the request and take a look at what comes back.

In [17]:
import requests # library to handle requests
import random # library for random number generation
from pandas import json_normalize # pandas >= 1.0; older versions import it from pandas.io.json

results = requests.get(url).json()

In [18]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,venuePage.id,location.neighborhood
0,4ad4c05df964a52039f620e3,Irish Embassy,"[{'id': '52e81612bcbc57f1066b7a06', 'name': 'I...",v-1586208011,False,49 Yonge St.,at Wellington St.,43.647918,-79.377273,"[{'label': 'display', 'lat': 43.64791849936151...",426,M5E 1J1,CA,Toronto,ON,Canada,"[49 Yonge St. (at Wellington St.), Toronto ON ...",,
1,4ee61a462da504839760f8c9,P.J. O'Brien Irish Pub & Restaurant,"[{'id': '4bf58dd8d48988d11b941735', 'name': 'P...",v-1586208011,False,39 Colborne St,at Leader Ln.,43.648909,-79.37533,"[{'label': 'display', 'lat': 43.64890910058636...",608,M5E 1E3,CA,Toronto,ON,Canada,"[39 Colborne St (at Leader Ln.), Toronto ON M5...",,
2,4adaa3d1f964a520ed2321e3,The OverDraught Irish Pub,"[{'id': '52e81612bcbc57f1066b7a06', 'name': 'I...",v-1586208011,False,156 Front St W,btwn York & Simcoe,43.645028,-79.384869,"[{'label': 'display', 'lat': 43.64502770313663...",383,M5J 2L6,CA,Toronto,ON,Canada,"[156 Front St W (btwn York & Simcoe), Toronto ...",33864024.0,
3,5407851f498e4d0e6803514a,Irish Canadian Immigration Centre,"[{'id': '4bf58dd8d48988d12c951735', 'name': 'E...",v-1586208011,False,67 Yonge Street,,43.648739,-79.377409,"[{'label': 'display', 'lat': 43.648739, 'lng':...",500,,CA,Toronto,ON,Canada,"[67 Yonge Street, Toronto ON, Canada]",,
4,4b3db5e4f964a520709625e3,Quinn's Steakhouse & Bar,"[{'id': '4bf58dd8d48988d1cc941735', 'name': 'S...",v-1586208011,False,96 Richmond Street West,at Bay,43.651197,-79.382976,"[{'label': 'display', 'lat': 43.65119745750837...",759,M5H 2A3,CA,Toronto,ON,Canada,"[96 Richmond Street West (at Bay), Toronto ON ...",90490783.0,Financial District


In [19]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # rows from the explore endpoint keep the 'venue.' prefix
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Irish Embassy,Irish Pub,49 Yonge St.,at Wellington St.,43.647918,-79.377273,"[{'label': 'display', 'lat': 43.64791849936151...",426,M5E 1J1,CA,Toronto,ON,Canada,"[49 Yonge St. (at Wellington St.), Toronto ON ...",,4ad4c05df964a52039f620e3
1,P.J. O'Brien Irish Pub & Restaurant,Pub,39 Colborne St,at Leader Ln.,43.648909,-79.37533,"[{'label': 'display', 'lat': 43.64890910058636...",608,M5E 1E3,CA,Toronto,ON,Canada,"[39 Colborne St (at Leader Ln.), Toronto ON M5...",,4ee61a462da504839760f8c9
2,The OverDraught Irish Pub,Irish Pub,156 Front St W,btwn York & Simcoe,43.645028,-79.384869,"[{'label': 'display', 'lat': 43.64502770313663...",383,M5J 2L6,CA,Toronto,ON,Canada,"[156 Front St W (btwn York & Simcoe), Toronto ...",,4adaa3d1f964a520ed2321e3
3,Irish Canadian Immigration Centre,Embassy / Consulate,67 Yonge Street,,43.648739,-79.377409,"[{'label': 'display', 'lat': 43.648739, 'lng':...",500,,CA,Toronto,ON,Canada,"[67 Yonge Street, Toronto ON, Canada]",,5407851f498e4d0e6803514a
4,Quinn's Steakhouse & Bar,Steakhouse,96 Richmond Street West,at Bay,43.651197,-79.382976,"[{'label': 'display', 'lat': 43.65119745750837...",759,M5H 2A3,CA,Toronto,ON,Canada,"[96 Richmond Street West (at Bay), Toronto ON ...",Financial District,4b3db5e4f964a520709625e3
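
The flatten, extract and rename steps in the cell above are easier to follow on a tiny input. The sketch below runs them on two hand-written records: the first uses values from the table above, the second is a hypothetical venue with no category. It uses a `lambda` in place of `get_category_type`, so it is an illustration rather than the cell's exact code.

```python
import pandas as pd

# Two hand-written records shaped like entries of results['response']['venues'].
sample_venues = [
    {"name": "Irish Embassy",
     "categories": [{"name": "Irish Pub"}],
     "location": {"lat": 43.647918, "lng": -79.377273, "distance": 426}},
    {"name": "Hypothetical venue",   # invented, to show the empty-category case
     "categories": [],
     "location": {"lat": 43.645, "lng": -79.38, "distance": 50}},
]

# Nested dicts become dotted column names such as 'location.lat'.
flat = pd.json_normalize(sample_venues)

# Keep only the first category name, or None when the list is empty.
flat["categories"] = flat["categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

# Drop the 'location.' prefix by keeping the last dotted term.
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```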


In [20]:
dataframe_filtered.name

0                          Irish Embassy
1    P.J. O'Brien Irish Pub & Restaurant
2              The OverDraught Irish Pub
3      Irish Canadian Immigration Centre
4               Quinn's Steakhouse & Bar
Name: name, dtype: object
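
The `distance` column can be sanity-checked against the 750 metre radius with a great-circle calculation. The sketch below uses the haversine formula on a spherical Earth, which is an approximation and not necessarily how Foursquare computes its figure; the coordinates are copied from the outputs above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

union_station = (43.6446934, -79.38013200585965)  # from the Nominatim lookup
irish_embassy = (43.647918, -79.377273)           # Foursquare reports 426 m
quinns = (43.651197, -79.382976)                  # Foursquare reports 759 m

d_embassy = haversine_m(*union_station, *irish_embassy)
d_quinns = haversine_m(*union_station, *quinns)
print(round(d_embassy), round(d_quinns))
```

Both values should land within a few metres of the reported distances. Quinn's sits slightly beyond 750 metres, which suggests the radius is not applied as a strict cut-off.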

In [21]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around Union Station

# add a red circle marker to represent Union Station
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Union Station',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Irish pub / restaurants as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

Wait... this isn't going to work!  All the pubs are closed for COVID-19!  We could look for what else there is to do around here other than takeaway coffee...

In [22]:
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=ED44TJOWXHIZ1DRYONSCSN3KI0GWW1HJLCHVXNCV2COL0RUL&client_secret=HOQT5SNOO04NGNMRGCLLLDKBI5LFYZJFWXM3QSL4KMOUHTPG&ll=43.6446934,-79.38013200585965&v=20180604&radius=750&limit=30'

In [23]:
results = requests.get(url).json()
'There are {} venues around Union Station.'.format(len(results['response']['groups'][0]['items']))

'There are 30 venues around Union Station.'

In [24]:
items = results['response']['groups'][0]['items']
dataframe = json_normalize(items) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# filter the category for each row
dataframe_filtered['venue.categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean columns
dataframe_filtered.columns = [col.split('.')[-1] for col in dataframe_filtered.columns]

dataframe_filtered.head(10)

Unnamed: 0,name,categories,address,crossStreet,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,id
0,Real Sports Apparel,Sporting Goods Shop,15 York St. Unit B,Maple Leaf Square,43.64286,-79.380184,"[{'label': 'display', 'lat': 43.64285984835777...",204,M5J,CA,Toronto,ON,Canada,"[15 York St. Unit B (Maple Leaf Square), Toron...",,4be46c832468c92887c1fe42
1,WVRST,Beer Bar,65 Front St W,,43.644968,-79.381376,"[{'label': 'display', 'lat': 43.64496809087762...",104,M5J 1E6,CA,Toronto,ON,Canada,"[65 Front St W, Toronto ON M5J 1E6, Canada]",,5c08474447f876002c736b68
2,Scotiabank Arena,Basketball Stadium,40 Bay St,,43.643446,-79.37904,"[{'label': 'display', 'lat': 43.64344617535107...",164,M5J 2X2,CA,Toronto,ON,Canada,"[40 Bay St, Toronto ON M5J 2X2, Canada]",,4b155081f964a520b4b023e3
3,Pilot Coffee Roasters,Coffee Shop,65 Front St W,btwn Bay St & York St,43.645018,-79.380415,"[{'label': 'display', 'lat': 43.64501814464698...",42,M5J 1E6,CA,Toronto,ON,Canada,"[65 Front St W (btwn Bay St & York St), Toront...",,563d2f2dcd10bcf27ae37c3b
4,The Fairmont Royal York,Hotel,100 Front St W,btwn York St & Bay St,43.645449,-79.381508,"[{'label': 'display', 'lat': 43.64544914616651...",139,M5J 1E3,CA,Toronto,ON,Canada,"[100 Front St W (btwn York St & Bay St), Toron...",,4ad4c05bf964a520a7f520e3
5,Union Pearson Express,Train Station,61 Front St. W,,43.644362,-79.383199,"[{'label': 'display', 'lat': 43.64436200658875...",249,M5J 2Z8,CA,Toronto,ON,Canada,"[61 Front St. W, Toronto ON M5J 2Z8, Canada]",Entertainment District,55561f8a498e2e5c866ac4c9
6,Air Canada Club,Lounge,Air Canada Centre,,43.643224,-79.379159,"[{'label': 'display', 'lat': 43.64322410003041...",181,M5J 2L2,CA,Toronto,ON,Canada,"[Air Canada Centre, Toronto ON M5J 2L2, Canada]",,4b60d5b9f964a52071fc29e3
7,Maple Leaf Square,Plaza,15 York St.,Bremner Blvd.,43.642925,-79.380892,"[{'label': 'display', 'lat': 43.64292522840183...",206,,CA,Toronto,ON,Canada,"[15 York St. (Bremner Blvd.), Toronto ON, Canada]",,4bdb8c1cc79cc928a77583e9
8,iQ Food Co,Salad Place,18 York Street,Bremner Ave,43.642851,-79.382081,"[{'label': 'display', 'lat': 43.642851, 'lng':...",258,M5J 0B2,CA,Toronto,ON,Canada,"[18 York Street (Bremner Ave), Toronto ON M5J ...",,5346c98a498ed612110d0f60
9,DAVIDsTEA,Tea Room,200 Bay St Unit R134,,43.646506,-79.380145,"[{'label': 'display', 'lat': 43.64650585612388...",201,M5J 2J2,CA,Toronto,ON,Canada,"[200 Bay St Unit R134, Toronto ON M5J 2J2, Can...",,4e67c316483bef6eb8171e49


In [25]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred around Union Station


# add Union Station as a red circle marker
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Union Station',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(venues_map)


# add popular spots to the map as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(venues_map)

# display map
venues_map

That's quite a few things, but Foursquare is not telling me if they are open under the current social distancing restrictions.  Looking for the trending venues might help with this...

In [26]:
# define URL
url = 'https://api.foursquare.com/v2/venues/trending?client_id={}&client_secret={}&ll={},{}&v={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION)

# send GET request and get trending venues
results = requests.get(url).json()

if len(results['response']['venues']) == 0:
    trending_venues_df = 'No trending venues are available at the moment!'
    
else:
    trending_venues = results['response']['venues']
    trending_venues_df = json_normalize(trending_venues)

    # filter columns
    columns_filtered = ['name', 'categories'] + ['location.distance', 'location.city', 'location.postalCode', 'location.state', 'location.country', 'location.lat', 'location.lng']
    trending_venues_df = trending_venues_df.loc[:, columns_filtered]

    # filter the category for each row
    trending_venues_df['categories'] = trending_venues_df.apply(get_category_type, axis=1)

In [27]:
# display trending venues
trending_venues_df

'No trending venues are available at the moment!'
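
Storing a message string in a variable named `trending_venues_df` means later code cannot treat it as a dataframe. One possible alternative, sketched below, always returns a dataframe and lets the caller test `.empty`. The helper `trending_to_frame` and the sample payloads are hypothetical; only the `response` / `venues` nesting is taken from the cell above.

```python
import pandas as pd

def trending_to_frame(payload):
    """Return one row per trending venue, or an empty frame if there are none."""
    venues = payload.get("response", {}).get("venues", [])
    rows = []
    for venue in venues:
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        rows.append({
            "name": venue.get("name"),
            "category": categories[0]["name"] if categories else None,
            "distance": location.get("distance"),
        })
    # Passing columns= keeps the same layout even when rows is empty.
    return pd.DataFrame(rows, columns=["name", "category", "distance"])

# Hand-written payloads: an empty response and a one-venue response.
empty = trending_to_frame({"response": {"venues": []}})
one = trending_to_frame({"response": {"venues": [
    {"name": "Hypothetical Cafe",
     "categories": [{"name": "Coffee Shop"}],
     "location": {"distance": 42}},
]}})

if empty.empty:
    print("No trending venues are available at the moment!")
print(len(one), "trending venue in the second payload")
```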

I've checked the trending query a couple of times and I'm pretty sure the empty result is correct - even in the afternoon (Toronto time) there are no trending locations downtown.  Wow.  Only during COVID-19 could you get this in the centre of a major city!  Looks like it has to be Tim Hortons, and even that isn't trending!