# Toronto's Neighborhood Analysis

In this notebook I'll be analising the clusters related to Toronto's negihborhood.

## First Part

In [19]:
#let's import everything needed.
import pandas as pd
import numpy as np
import folium
from bs4 import BeautifulSoup
import geocoder
import requests
from geopy.geocoders import Nominatim

So, first let's get the data in the wikipedia website.

In [4]:
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header=0)

data = dfs[0]
data

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
...,...,...,...
175,M5Z,Not assigned,
176,M6Z,Not assigned,
177,M7Z,Not assigned,
178,M8Z,Etobicoke,Mimico NW / The Queensway West / South of Bloo...


With the df, now the lat and long data will be concatenated to it

In [5]:
data = data[data.Borough != 'Not assigned']
data

Unnamed: 0,Postal code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
160,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,Business reply mail Processing CentrE
169,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


Let's reorganize the index and later check for duplicates in the Postal Code to Merge the neighborhoods associated to both.

In [6]:
data = data.reset_index(drop=True)
data

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business reply mail Processing CentrE
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


In [7]:
print(data.duplicated(subset = 'Postal code', keep = False).value_counts())
print(data.shape)
print('103 rows and non postal code replicated. Therefore good to go')

False    103
dtype: int64
(103, 3)
103 rows and non postal code replicated. Therefore good to go


## Second Part

It was imposible to connect to the data using geocoder, so the csv data will be used

In [9]:
latlng_df = pd.read_csv('http://cocl.us/Geospatial_data')
latlng_df

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


Good, so now the merge of both df will be made.

In [15]:
merged_df = data.merge(latlng_df, how='left', left_on='Postal code',
                       right_on = 'Postal Code')

In [16]:
merged_df

Unnamed: 0,Postal code,Borough,Neighborhood,Postal Code,Latitude,Longitude
0,M3A,North York,Parkwoods,M3A,43.753259,-79.329656
1,M4A,North York,Victoria Village,M4A,43.725882,-79.315572
2,M5A,Downtown Toronto,Regent Park / Harbourfront,M5A,43.654260,-79.360636
3,M6A,North York,Lawrence Manor / Lawrence Heights,M6A,43.718518,-79.464763
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government,M7A,43.662301,-79.389494
...,...,...,...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North,M8X,43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,M4Y,43.665860,-79.383160
100,M7Y,East Toronto,Business reply mail Processing CentrE,M7Y,43.662744,-79.321558
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...,M8Y,43.636258,-79.498509


In [17]:
merged_df.drop(labels = 'Postal Code', axis = 1, inplace = True)

In [None]:
merged_df

Now the merged dataframe is go to go.

## Third Part

In [20]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


Let's create a map to see Toronto and its Neighborhoods.

In [22]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(merged_df['Latitude'], merged_df['Longitude'], merged_df['Borough'], merged_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Foursquare credentials

In [23]:
CLIENT_ID = 'TU233DL4CRNQQNZC3LUK0LDIRUCMP5H3KOIXJVUWD04520UN'
# your Foursquare ID
CLIENT_SECRET = 'QQXSBQ0BVKRTQC02YVUDE0FGCIIWT13PIOTHMYHLF5IAYGGU'
# your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: TU233DL4CRNQQNZC3LUK0LDIRUCMP5H3KOIXJVUWD04520UN
CLIENT_SECRET:QQXSBQ0BVKRTQC02YVUDE0FGCIIWT13PIOTHMYHLF5IAYGGU


In [33]:
# type your answer here
query = 'coffee'
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&query={}&ll={},{}&v={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, query, latitude, longitude, VERSION, LIMIT)
venues_data = requests.get(url).json()

In [57]:
venues = venues_data['response']['venues']
nearby_venues = pd.json_normalize(venues)
nearby_venues

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.city,location.state,location.country,location.formattedAddress,location.crossStreet,location.postalCode,location.neighborhood,venuePage.id
0,4b44fc77f964a520cc0026e3,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587683986,False,427 University Avenue,43.654053,-79.38809,"[{'label': 'display', 'lat': 43.65405317976302...",340,CA,Toronto,ON,Canada,"[427 University Avenue, Toronto ON, Canada]",,,,
1,59f784dd28122f14f9d5d63d,HotBlack Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587683986,False,245 Queen Street West,43.650364,-79.388669,"[{'label': 'display', 'lat': 43.65036434800487...",515,CA,Toronto,ON,Canada,"[245 Queen Street West (at St Patrick St), Tor...",at St Patrick St,M5V 1Z4,Entertainment District,463001529.0
2,4b0aaa8ef964a520272623e3,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587683986,False,"483 Bay St,Bell Trinity Square",43.653436,-79.382314,"[{'label': 'display', 'lat': 43.653436, 'lng':...",130,CA,Toronto,ON,Canada,"[483 Bay St,Bell Trinity Square (Bell Trinity ...",Bell Trinity Square,M5G 2C9,,
3,4baa9f6cf964a520817a3ae3,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587683986,False,401 Bay St.,43.652135,-79.381172,"[{'label': 'display', 'lat': 43.65213455850074...",268,CA,Toronto,ON,Canada,"[401 Bay St. (at Richmond St. W), Toronto ON M...",at Richmond St. W,M5H 2Y4,,
4,53e8acc4498ee294fb100183,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587683986,False,425 University Ave,43.65427,-79.387448,"[{'label': 'display', 'lat': 43.65427, 'lng': ...",296,CA,Toronto,ON,Canada,"[425 University Ave (Dundas), Toronto ON M5G 1...",Dundas,M5G 1T6,,
5,4c19447c4ff90f4765ac0f49,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587683986,False,"595 Bay St,Atrium On Bay",43.656219,-79.38329,"[{'label': 'display', 'lat': 43.656219, 'lng':...",309,CA,Toronto,ON,Canada,"[595 Bay St,Atrium On Bay (at Atrium on Bay), ...",at Atrium on Bay,M5G 2C2,,
6,4baa31def964a52037523ae3,Coffee office,[],v-1587683986,False,350 Bay St - 7th Floor,43.649498,-79.386479,"[{'label': 'display', 'lat': 43.649498, 'lng':...",488,CA,Toronto,ON,Canada,"[350 Bay St - 7th Floor, Toronto ON, Canada]",,,,
7,4ec514ec9911232436e364af,Timothy's World Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587683986,False,Yonge,43.6567,-79.379941,"[{'label': 'display', 'lat': 43.65669995833159...",481,CA,Toronto,ON,Canada,"[Yonge (Dundas), Toronto ON M5B 2G9, Canada]",Dundas,M5B 2G9,,
8,4fff1f96e4b042ae8acddca5,Fahrenheit Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587683986,False,120 Lombard St,43.652384,-79.372719,"[{'label': 'display', 'lat': 43.65238358726612...",911,CA,Toronto,ON,Canada,"[120 Lombard St (at Jarvis St), Toronto ON M5C...",at Jarvis St,M5C 3H5,,
9,4fb13c20e4b011e6f93513c0,Balzac's Coffee,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",v-1587683986,False,122 Bond Street,43.657854,-79.3792,"[{'label': 'display', 'lat': 43.65785440672277...",618,CA,Toronto,ON,Canada,"[122 Bond Street (at Gould St.), Toronto ON M5...",at Gould St.,M5B 1X8,,


In [58]:
curated_df = nearby_venues.drop(labels = ['id', 'referralId', 'hasPerk', 'location.address', 'location.labeledLatLngs', 'location.distance', 'location.cc', 'location.city', 'location.country', 'location.formattedAddress', 'location.crossStreet', 'location.neighborhood', 'venuePage.id', 'location.state'], axis = 1)
curated_df = curated_df.rename(columns={'location.lat': 'lat', 'location.lng': 'lng', 'location.postalCode': 'Postal code'})

In [59]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [60]:
curated_df['categories'] = curated_df.apply(get_category_type, axis=1)

In [61]:
curated_df

Unnamed: 0,name,categories,lat,lng,Postal code
0,Timothy's World Coffee,Coffee Shop,43.654053,-79.38809,
1,HotBlack Coffee,Coffee Shop,43.650364,-79.388669,M5V 1Z4
2,Timothy's World Coffee,Coffee Shop,43.653436,-79.382314,M5G 2C9
3,Timothy's World Coffee,Coffee Shop,43.652135,-79.381172,M5H 2Y4
4,Timothy's World Coffee,Coffee Shop,43.65427,-79.387448,M5G 1T6
5,Timothy's World Coffee,Coffee Shop,43.656219,-79.38329,M5G 2C2
6,Coffee office,,43.649498,-79.386479,
7,Timothy's World Coffee,Coffee Shop,43.6567,-79.379941,M5B 2G9
8,Fahrenheit Coffee,Coffee Shop,43.652384,-79.372719,M5C 3H5
9,Balzac's Coffee,Coffee Shop,43.657854,-79.3792,M5B 1X8


In [67]:
curated_df.dtypes

name            object
categories      object
lat            float64
lng            float64
Postal code     object
dtype: object

In [68]:
curated_df['Postal code'].astype('object')

0         NaN
1     M5V 1Z4
2     M5G 2C9
3     M5H 2Y4
4     M5G 1T6
5     M5G 2C2
6         NaN
7     M5B 2G9
8     M5C 3H5
9     M5B 1X8
10    M5C 3G8
11    M5G 1X5
12    M5K 1A1
13    M5B 2H4
14    M5H 4B6
15    M5S 1Y9
16        NaN
17        NaN
18    M5T 1P7
19    M5V 3K2
20    M5J 1C3
21    M5J 1E6
22        NaN
23        NaN
24        NaN
25    M5E 1M6
26    M4X 1P3
27    M5T 1R5
28    M5G 0A6
29        NaN
30        NaN
31        NaN
32        NaN
33    M4Y 1N6
34        NaN
35        NaN
36        NaN
37        NaN
38    M5G 1R3
39        NaN
40        M5G
41    M5A 3C4
42    M5T 3K5
43    M5S 3A9
44    M5G 2B4
45        NaN
46        NaN
47    M5H 3S6
48    M5H 2G4
49        NaN
Name: Postal code, dtype: object

In [75]:
def get_postalcode_standarization(row):
    try:
        postal_codes = row['Postal code']
    except:
        postal_codes = row['Postal Code']
    if postal_codes == 'NaN':
        postal_codes = str(postal_codes)
        return None
    else:
        postal_codes = str(postal_codes)
        postal_codes = postal_codes[0:3]
        return postal_codes

In [76]:
curated_df['Postal code'] = curated_df.apply(get_postalcode_standarization, axis=1)

In [77]:
curated_df

Unnamed: 0,name,categories,lat,lng,Postal code
0,Timothy's World Coffee,Coffee Shop,43.654053,-79.38809,
1,HotBlack Coffee,Coffee Shop,43.650364,-79.388669,M5V
2,Timothy's World Coffee,Coffee Shop,43.653436,-79.382314,M5G
3,Timothy's World Coffee,Coffee Shop,43.652135,-79.381172,M5H
4,Timothy's World Coffee,Coffee Shop,43.65427,-79.387448,M5G
5,Timothy's World Coffee,Coffee Shop,43.656219,-79.38329,M5G
6,Coffee office,,43.649498,-79.386479,
7,Timothy's World Coffee,Coffee Shop,43.6567,-79.379941,M5B
8,Fahrenheit Coffee,Coffee Shop,43.652384,-79.372719,M5C
9,Balzac's Coffee,Coffee Shop,43.657854,-79.3792,M5B


In [82]:
curated_venues = curated_df[curated_df['Postal code'] != 'NaN']
curated_venues = curated_df[curated_df['Postal code'] != 'nan']
curated_venues

Unnamed: 0,name,categories,lat,lng,Postal code
1,HotBlack Coffee,Coffee Shop,43.650364,-79.388669,M5V
2,Timothy's World Coffee,Coffee Shop,43.653436,-79.382314,M5G
3,Timothy's World Coffee,Coffee Shop,43.652135,-79.381172,M5H
4,Timothy's World Coffee,Coffee Shop,43.65427,-79.387448,M5G
5,Timothy's World Coffee,Coffee Shop,43.656219,-79.38329,M5G
7,Timothy's World Coffee,Coffee Shop,43.6567,-79.379941,M5B
8,Fahrenheit Coffee,Coffee Shop,43.652384,-79.372719,M5C
9,Balzac's Coffee,Coffee Shop,43.657854,-79.3792,M5B
10,Timothy's World Coffee,Coffee Shop,43.650948,-79.376825,M5C
11,TEMPORARILY CLOSED-Second Cup Coffee Co. featu...,Café,43.657473,-79.390637,M5G


In [83]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [84]:
toronto_venues = getNearbyVenues(names=merged_df['Neighborhood'],
                                   latitudes=merged_df['Latitude'],
                                   longitudes=merged_df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park / Harbourfront
Lawrence Manor / Lawrence Heights
Queen's Park / Ontario Provincial Government
Islington Avenue
Malvern / Rouge
Don Mills
Parkview Hill / Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park / Princess Gardens / Martin Grove / Islington / Cloverdale
Rouge Hill / Port Union / Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate / Bloordale Gardens / Old Burnhamthorpe / Markland Wood
Guildwood / Morningside / West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor / Wilson Heights / Downsview North
Thorncliffe Park
Richmond / Adelaide / King
Dufferin / Dovercourt Village
Scarborough Village
Fairview / Henry Farm / Oriole
Northwood Park / York University
East Toronto
Harbourfront East / Union Station / Toronto Islands
Little Portugal / Trinity
Kennedy Park / Ionview / East Birchmount Park
Bayview Village
Do