<a href="https://colab.research.google.com/github/rezzix/Capstone-Project/blob/master/Toronto_neighbourhood_segmentation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Segmenting and Clustering Neighborhoods in Toronto

### Description on the problem

I have read and heard from friends about advantages of migration to Canada and especially to the city of Toronto, My status as family father and my business as an independant consultant (banks and finance) involves a lot of conditions before taking a big move like moving to a new address

#### Objective
My objective is to evaluate the the opportunity of living in Toronto neighbourhoods based on 4 Criterias
* Proximity to green spots. (family confort)
* Proximity to banks. (proximity to workplaces)
* Proximity to beach. (clean air)

### Data collection and usage
Data will be mainly prepared as follows :
* Boroughs and neighborhouds names will be collected by scrapping wikipedia ( Toronto's addresses page in particular).
* Geolocalisation will be done using  geocoder library and opencage API
* Venues collection will be done using foursquare API.

Once done the data will be grouped by venue's type concentration for each neighborhood. Finally the neighborhoods will be ranked depending on best scores (based on target venue's types concentration)

### install useful modules

In [10]:
# beautiful soup for web scrapping
!pip install beautifulsoup4
# geocoder for geolocalisation
!pip install geocoder
# folium for map rendering
!pip install folium



### import useful libraries

In [12]:
import numpy as np
import pandas as pd
import requests
import re
import os
from bs4 import BeautifulSoup
import geocoder
from getpass import getpass
import folium

### start scrapping the wikipedia page for neighborhoods of Toronto

In [16]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(url, allow_redirects=True)
soup = BeautifulSoup(page.content, 'html.parser')

In [17]:
postalcodes_tab = soup.find('table',class_='wikitable')

neighb_df = pd.DataFrame(columns=['PostalCode','Borough','Neighborhood'])
i=0

for neighborhood_tr in postalcodes_tab.find_all('tr'):
  if (len(neighborhood_tr.find_all('td')) == 3) :
    neighb_row = [td.text.rstrip() for td in neighborhood_tr.find_all('td')]
    if (neighb_row[1] != 'Not assigned') :
      neighb_df.loc[i] = neighb_row
      i+=1

neighb_df.head(10)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [18]:
neighb_df.shape


(103, 3)

In [19]:
#opencage_api_key = getpass("what is your opencage api key : ")

opencage_api_key = "load_me_from_drive"

with open('drive/My Drive/Colab Notebooks/keys/opencage') as f:
    opencage_api_key = f.readline()
    f.close

#print (geocoder.opencage('North York, Victoria Village, CA', key=openkage_api_key).latlng)
#print (geocoder.osm('North York, Victoria Village, CA').latlng)

### add geolocalisation data to the frame

In [20]:
neighb_df['address'] = neighb_df['Neighborhood'] + ', ' + neighb_df['Borough']+', Toronto, Canada'
#
neighb_df['lat'] = neighb_df['PostalCode']
neighb_df['lng'] = neighb_df['PostalCode']

#neighb_df_tst = neighb_df.head(3)

#neighb_df_tst['coordinates']=neighb_df_tst['adress'].apply(geocoder.osm).apply(lambda x: x.latlng if x != None else None)

for index, row in neighb_df.iterrows():
  latlng = geocoder.opencage(row['address'], key=opencage_api_key).latlng
  #print (row['adress'], geocoder.opencage(repr(row['adress']), key=opencage_api_key).latlng)
  if (latlng is not None) :
    row['lat'], row['lng'] = latlng[0], latlng[1]

neighb_df
#neighb_df['lat'] = geocoder.osm(neighb_df['adress']).lat
#geocoder.osm('M3A, Parkwoods, North York, CA').latlng

Unnamed: 0,PostalCode,Borough,Neighborhood,address,lat,lng
0,M3A,North York,Parkwoods,"Parkwoods, North York, Toronto, Canada",43.7611,-79.3241
1,M4A,North York,Victoria Village,"Victoria Village, North York, Toronto, Canada",43.7327,-79.3112
2,M5A,Downtown Toronto,"Regent Park, Harbourfront","Regent Park, Harbourfront, Downtown Toronto, T...",43.7001,-79.4163
3,M6A,North York,"Lawrence Manor, Lawrence Heights","Lawrence Manor, Lawrence Heights, North York, ...",43.7001,-79.4163
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government","Queen's Park, Ontario Provincial Government, D...",43.7001,-79.4163
...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North","The Kingsway, Montgomery Road, Old Mill North,...",43.7001,-79.4163
99,M4Y,Downtown Toronto,Church and Wellesley,"Church and Wellesley, Downtown Toronto, Toront...",43.6615,-79.3829
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...","Business reply mail Processing Centre, South C...",45.7236,7.4575
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...","Old Mill South, King's Mill Park, Sunnylea, Hu...",43.7001,-79.4163


In [21]:
address = 'Toronto, CA'

toronto_latlng = geocoder.opencage(address, key=opencage_api_key).latlng

print('The geograpical coordinate of Toronto are {}, {}.'.format(toronto_latlng[0], toronto_latlng[1]))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


In [22]:
condition = neighb_df['Borough'].str.contains('Toronto')

neighb_toronto_df = neighb_df[condition]

In [23]:
# create map of New York using latitude and longitude values
map_toronto = folium.Map(location=[toronto_latlng[0], toronto_latlng[1]], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighb_toronto_df['lat'], neighb_toronto_df['lng'], neighb_toronto_df['Borough'], neighb_toronto_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Clustering by categories of trending venues

In [24]:
# test the response of one call
CLIENT_ID = 'LETUVSLX3N1JYS23O4KJIJTAMSE2K1WYBCFTZZC52TJ5U5XC' # your Foursquare ID
CLIENT_SECRET = 'HEJX3D2KAC3UN5EOE1PHAONNFQVVRG4KECXPWFODDHXFW2UG' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, 43.7327,	-79.3112, 500, 10)
# make the GET request
results = requests.get(url).json()

results

{'meta': {'code': 200, 'requestId': '5f26b41ce9a64d4c6e9ecc89'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-550df684498ea2dd2c87bb5a-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/thai_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d149941735',
         'name': 'Thai Restaurant',
         'pluralName': 'Thai Restaurants',
         'primary': True,
         'shortName': 'Thai'}],
       'id': '550df684498ea2dd2c87bb5a',
       'location': {'address': '1744  Victoria Park',
        'cc': 'CA',
        'city': 'North York',
        'country': 'Canada',
        'crossStreet': 'Surrey Ave',
        'distance': 482,
        'formattedAddress': ['1744  Victoria Park (Surrey Ave)',
         'North York ON M1R 1R4',
         'Canada'],
        'labeledLa

In [25]:
def getExploreVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return(nearby_venues)

toronto_venues = getExploreVenues(names=neighb_toronto_df['Neighborhood'], latitudes=neighb_toronto_df['lat'], longitudes=neighb_toronto_df['lng'] ) 

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


In [26]:
toronto_venues['Venue Category'].unique()

# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues['Venue Category'].unique())

In [27]:
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

In [35]:
for col in toronto_onehot.columns :
  print(col)

African Restaurant
American Restaurant
Art Museum
Arts & Crafts Store
Asian Restaurant
BBQ Joint
Bakery
Bank
Bar
Beach
Beach Bar
Beer Bar
Beer Store
Bookstore
Brazilian Restaurant
Breakfast Spot
Bubble Tea Shop
Burger Joint
Burrito Place
Café
Caribbean Restaurant
Cheese Shop
Chinese Restaurant
Chiropractor
Chocolate Shop
Clothing Store
Cocktail Bar
Coffee Shop
Comfort Food Restaurant
Comic Shop
Concert Hall
Convenience Store
Cosmetics Shop
Creperie
Dance Studio
Deli / Bodega
Department Store
Dessert Shop
Diner
Distribution Center
Dive Bar
Donut Shop
Eastern European Restaurant
Electronics Store
Falafel Restaurant
Farmers Market
Fast Food Restaurant
Fish Market
Flower Shop
Food Court
Food Truck
Fountain
French Restaurant
Frozen Yogurt Shop
Garden
Gastropub
Gay Bar
General Travel
Gift Shop
Gourmet Shop
Greek Restaurant
Grocery Store
Gym
Gym / Fitness Center
Gym Pool
Hawaiian Restaurant
Historic Site
Hobby Shop
Hotel
Ice Cream Shop
Indian Restaurant
Indonesian Restaurant
Irish Pub
Italian

In [37]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean()

In [38]:
toronto_grouped

Unnamed: 0_level_0,African Restaurant,American Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,Bank,Bar,Beach,Beach Bar,Beer Bar,Beer Store,Bookstore,Brazilian Restaurant,Breakfast Spot,Bubble Tea Shop,Burger Joint,Burrito Place,Café,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Distribution Center,...,Mountain,Museum,Music Venue,Nail Salon,New American Restaurant,North Indian Restaurant,Park,Pet Store,Pharmacy,Pizza Place,Plaza,Poke Place,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Speakeasy,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Tanning Salon,Tea Room,Thai Restaurant,Theater,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1
Berczy Park,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0
Central Bay Street,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.033333,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,...,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.033333,0.033333
Christie,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,...,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Queen's Park, Ontario Provincial Government",0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,...,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0
"Regent Park, Harbourfront",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
St. James Town,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
The Beaches,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.033333,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,...,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.033333,0.033333,0.033333,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0


In [39]:
interesting_venue_categories = ['Bank','Beach','Garden','Fountain']