# Canada's boroughs and neighbourhoods

This notebook builds a dataset of Canada's boroughs and neighbourhoods from a Wikipedia table of postal codes.

In [1]:
import pandas as pd

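Wikipedia has a page listing Canadian postal codes that begin with M, which contains several location tables. `pd.read_html` returns a list with one DataFrame per HTML table found on the page; the table with boroughs and neighbourhoods is the first one (index 0).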

In [16]:
data = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

In [67]:
# The borough/neighborhood table is the first one
df = data[0]
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Rows whose borough is 'Not assigned' are removed.

In [69]:
# Keep only rows with an assigned borough; .copy() makes df an independent frame
df = df[df.Borough != 'Not assigned'].copy()
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
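
Note that the filtered frame keeps its original row labels (the output above starts at 2, not 0). A minimal sketch on a made-up three-row frame (the `toy` and `kept` names and the values are illustrative, not taken from the Wikipedia table) shows how label-based `.loc` and position-based `.iloc` diverge after such a filter:

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the notebook
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
})

# Same boolean filter as in the notebook
kept = toy[toy.Borough != 'Not assigned'].copy()

print(list(kept.index))          # labels survive the filter: [1, 2]
print(kept.loc[1, 'Postcode'])   # label 1 -> M3A
print(kept.iloc[1]['Postcode'])  # position 1 -> M4A
```

If a fresh 0-based index is preferred, `reset_index(drop=True)` on the filtered frame would renumber the rows.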


Neighbourhoods marked 'Not assigned' take the name of their borough.

In [70]:
# Boolean mask selecting the rows whose neighbourhood is 'Not assigned'
mask = df['Neighbourhood'] == 'Not assigned'
# Assign through .loc so the values are written into df itself
df.loc[mask, 'Neighbourhood'] = df.loc[mask, 'Borough']
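
As a minimal sketch of this step on made-up data (the `toy` frame and its two rows are hypothetical), a boolean mask combined with `.loc` copies the borough name into every row whose neighbourhood is 'Not assigned':

```python
import pandas as pd

# Hypothetical rows; the second one has no neighbourhood assigned
toy = pd.DataFrame({
    'Postcode': ['M5A', 'M7A'],
    'Borough': ['Downtown Toronto', 'Etobicoke'],
    'Neighbourhood': ['Harbourfront', 'Not assigned'],
})

# True for rows that need the borough name copied over
mask = toy['Neighbourhood'] == 'Not assigned'

# .loc writes into toy directly; rows outside the mask are untouched
toy.loc[mask, 'Neighbourhood'] = toy.loc[mask, 'Borough']

print(toy['Neighbourhood'].tolist())  # ['Harbourfront', 'Etobicoke']
```

A single `.loc` assignment is used here because chained indexing such as `toy.iloc[i]['Neighbourhood'] = ...` assigns to a temporary object and can leave the frame unchanged.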

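Grouping the dataset by postcode and borough, so that the neighbourhoods sharing a postcode are joined into one comma-separated string. The result is also saved to canada.csv.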

In [71]:
res = df.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(','.join).reset_index()
res.to_csv('canada.csv', index=False)
res.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Scarborough,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
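
The grouping can be illustrated on a small frame built from three rows like those shown earlier (the `toy` and `grouped` names are illustrative): rows sharing a postcode and borough collapse into one row whose neighbourhoods are joined with commas.

```python
import pandas as pd

toy = pd.DataFrame({
    'Postcode': ['M5A', 'M5A', 'M6A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Harbourfront', 'Regent Park', 'Lawrence Heights'],
})

# One group per (Postcode, Borough) pair; ','.join concatenates the
# neighbourhood strings of each group, reset_index restores flat columns
grouped = (
    toy.groupby(['Postcode', 'Borough'])['Neighbourhood']
       .apply(','.join)
       .reset_index()
)

print(grouped['Neighbourhood'].tolist())
# ['Harbourfront,Regent Park', 'Lawrence Heights']
```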


Printing the dataset shape

In [72]:
print("Rows: ", res.shape[0])
print("Attributes: ", res.shape[1])

Rows:  103
Attributes:  3
