# Clustering Analysis of Toronto Neighborhoods

This notebook contains the code to obtain the Toronto neighborhoods table from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and cluster the neighborhoods.

In [1]:
import pandas as pd # for reading and processing tabular data

## 3.1. Exploring the neighborhoods of Toronto

Read the Toronto neighborhoods table from Wikipedia

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

df_nb = pd.read_html(url, header=0)[0]
df_nb.columns = ['Postal Code', 'Borough', 'Neighborhood'] # rename columns
df_nb.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [3]:
# Remove rows with 'Not assigned' values for 'Borough'
df_nb = df_nb[df_nb.Borough != 'Not assigned']
df_nb.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights


Merge neighborhoods of the same postal code into a single row

In [4]:
df_nb = df_nb.groupby(by=['Postal Code', 'Borough']).agg(list)
df_nb.Neighborhood = df_nb.Neighborhood.str.join(', ')
df_nb.reset_index(inplace=True)
df_nb.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"
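The same merge can be written as a single aggregation that joins the names directly, instead of building lists first and joining them afterwards. The sketch below is a minimal illustration on a hand-made `toy` DataFrame (a hypothetical stand-in for `df_nb`, with a few rows borrowed from the output above), not a replacement for the cell above.

```python
import pandas as pd

# Toy rows standing in for the Wikipedia table (hypothetical subset of df_nb)
toy = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M6A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights'],
})

# Join the neighborhood names of each (postal code, borough) group in one step,
# then turn the group keys back into ordinary columns
merged = (
    toy.groupby(['Postal Code', 'Borough'])['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(merged)
```

Selecting the `Neighborhood` column before aggregating keeps the join limited to that column, which matters if the table ever gains additional non-key columns.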


Find the rows whose neighborhood is 'Not assigned' and fill them in

Each neighborhood with a 'Not assigned' value is replaced with its borough's name

In [5]:
# Obtain neighborhoods with 'Not assigned' values and replace them with the borough name
na_indices = df_nb.index[df_nb.Neighborhood.str.contains('Not assigned')].tolist()
for i in na_indices:
    df_nb.iloc[i, 2] = df_nb.iloc[i, 1]

df_nb.iloc[na_indices]

Unnamed: 0,Postal Code,Borough,Neighborhood
85,M7A,Queen's Park,Queen's Park
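The replacement can also be done without a loop, using a boolean mask with `.loc`. The sketch below uses a hypothetical two-row `toy` DataFrame and assumes the placeholder is exactly the string 'Not assigned' (the cell above uses the looser `str.contains` test); it selects rows by label and column name rather than by position.

```python
import pandas as pd

# Toy rows standing in for df_nb after the group-by step (hypothetical subset)
toy = pd.DataFrame({
    'Postal Code': ['M7A', 'M1B'],
    'Borough': ["Queen's Park", 'Scarborough'],
    'Neighborhood': ['Not assigned', 'Rouge, Malvern'],
})

# Boolean mask of rows whose neighborhood is the placeholder value
# (assumption: an exact match is sufficient here)
mask = toy['Neighborhood'] == 'Not assigned'

# Copy the borough name into the neighborhood column for those rows only
toy.loc[mask, 'Neighborhood'] = toy.loc[mask, 'Borough']
print(toy)
```

Because the columns are addressed by name, this version keeps working if the column order changes.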


In [6]:
# Get the shape of the final DataFrame
print('The Toronto neighborhoods table consists of %i rows' % df_nb.shape[0])

The Toronto neighborhoods table consists of 103 rows


## 3.2. Obtain the geospatial coordinates of Toronto neighborhoods

Read the geospatial coordinates of the Toronto neighborhoods, keyed by postal code

In [7]:
df_geo = pd.read_csv('Geospatial_Coordinates.csv')
df_geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the geospatial data with the names of the neighborhoods

In [8]:
df_merged = pd.merge(df_nb, df_geo, on='Postal Code')
df_merged.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
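`pd.merge` performs an inner join by default, so any postal code missing from the coordinates file would be dropped silently. One way to check for this, sketched below on hypothetical toy frames `nb` and `geo` (the code 'M9Z' is made up for illustration), is a left join with `validate` and `indicator`.

```python
import pandas as pd

# Toy stand-ins for df_nb and df_geo; 'M9Z' is a made-up code with no coordinates
nb = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Hypothetical'],
})
geo = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# Left join keeps every neighborhood row; validate raises a MergeError if a
# postal code is duplicated on either side; indicator adds a '_merge' column
checked = pd.merge(nb, geo, on='Postal Code', how='left',
                   validate='one_to_one', indicator=True)

# Postal codes that found no matching coordinates
missing = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(missing)
```

An empty `missing` list on the real data would indicate that every postal code in the neighborhoods table received coordinates.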
