# Clustering Neighborhoods in Toronto

Download and import any libraries we'll need.

In [1]:
#import sys
#!{sys.executable} -m pip install geocoder
import numpy as np
import pandas as pd
import geocoder

### Part 1: Create DataFrame of Toronto, Ontario Postal Codes

Save the URL we will use to get our Toronto neighborhood data.

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
url

'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Read the tables at the URL (`pd.read_html` returns a list of DataFrames) and keep the first one, which holds the data we're interested in.

In [3]:
data = pd.read_html(url)
df = data[0]
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


We are not interested in 'Not assigned' boroughs, so we remove the rows that contain them.
We then reset the index, since removing rows leaves gaps in it.

In [4]:
# Assign the result back: inplace on a column selection may not modify df in newer pandas
df['Borough'] = df['Borough'].replace('Not assigned', np.nan)
df = df.dropna(subset=['Borough']).reset_index(drop=True)
df.head(12)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
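
The same filtering can also be written as a boolean mask, which avoids the intermediate NaN step. This is a minimal sketch on a small hand-made frame; the `toy` data and the `assigned` name are illustrative and are not part of the scraped table.

```python
import pandas as pd

# Illustrative stand-in for the scraped table (not the real data)
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows with an assigned borough, then renumber the index from 0
assigned = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned)
```

Either approach should leave the same rows; the mask version simply states the condition directly.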


Check the shape of the dataframe.

In [5]:
df.shape

(103, 3)

### Part 2: Add Latitude and Longitude Coordinates for Postal Codes

Use the geocoder package to get the latitude and longitude coordinates for each postal code.

In [6]:
latitude = []
longitude = []
for postal_code in df['Postal Code']:
    # The geocoder occasionally returns None, so retry until coordinates come back
    lat_lng_coords = None
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
    latitude.append(lat_lng_coords[0])
    longitude.append(lat_lng_coords[1])
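
Retrying until a result arrives will never finish if a postal code cannot be resolved at all. One possible variation is to cap the number of attempts. The sketch below is not part of the original notebook: `geocode_with_retries`, `max_attempts`, `delay`, and the `flaky_lookup` stand-in are all hypothetical names, and the lookup is passed in as a function so the example runs without network access.

```python
import time

def geocode_with_retries(lookup, query, max_attempts=5, delay=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay)  # brief pause before the next attempt
    return None  # the caller decides how to handle an unresolved query

# Hypothetical stand-in for a geocoding call: fails twice, then succeeds
calls = {'n': 0}

def flaky_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [43.75245, -79.32991]

coords = geocode_with_retries(flaky_lookup, 'M3A, Toronto, Ontario')
print(coords)
```

In this notebook, `lookup` could be something like `lambda q: geocoder.arcgis(q).latlng`, which mirrors the call used in the loop above.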

Add the latitude and longitude coordinates to the dataframe as new columns.

In [7]:
df['Latitude'] = latitude
df['Longitude'] = longitude
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
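
Before clustering, it can help to confirm that every geocoded point lands in a plausible area. The sketch below uses the first rows of the output above; the bounding box limits (`LAT_RANGE`, `LNG_RANGE`) are a deliberately loose assumption, not official city boundaries.

```python
import pandas as pd

# First rows from the output above
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Latitude': [43.75245, 43.73057, 43.65512],
    'Longitude': [-79.32991, -79.31306, -79.36264],
})

# Loose bounding box around Toronto; these limits are an assumption
LAT_RANGE = (43.0, 44.5)
LNG_RANGE = (-80.5, -78.5)

# True for rows whose coordinates fall inside the box on both axes
in_box = (coords['Latitude'].between(*LAT_RANGE)
          & coords['Longitude'].between(*LNG_RANGE))
outliers = coords[~in_box]
print(len(outliers))
```

Any rows left in `outliers` would be worth geocoding again or inspecting by hand.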


### Part 3: Explore and Cluster Neighborhoods in Toronto