# Canada's boroughs and neighbourhoods

This notebook builds a dataset of boroughs and neighbourhoods for the Canadian postal codes that start with M (the Toronto area).

In [1]:
import pandas as pd

Wikipedia has a page listing the Canadian postal codes that start with M. `pd.read_html` returns every table on that page as a list of DataFrames; the table with boroughs and neighbourhoods is the first one (index 0).

In [16]:
data = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

In [73]:
# The borough/neighbourhood table is the first one
df = data[0]
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


Rows whose borough is 'Not assigned' are removed.

df = df[df.Borough != 'Not assigned']
df.head()

Where a neighbourhood is 'Not assigned', it is replaced with the borough name.

In [70]:
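# Rows whose neighbourhood is missing. After the filtering above, the index
# labels no longer match row positions, so .iloc[label] would hit the wrong row.
mask = df['Neighbourhood'] == 'Not assigned'
# Assign through .loc so the write reaches df itself; chained indexing such as
# df.iloc[idx]['Neighbourhood'] = ... may only modify a temporary copy.
df.loc[mask, 'Neighbourhood'] = df.loc[mask, 'Borough']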
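
As an illustration of this step, here is a minimal sketch on a made-up toy frame (the rows and the name `toy` are assumptions for the example, not data read from the page). It uses a boolean mask with `.loc`, which assigns by label and therefore stays correct after rows have been filtered out.

```python
import pandas as pd

# Toy rows for illustration only (hypothetical, not read from Wikipedia)
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Drop rows without a borough; copy() makes the result an independent frame
toy = toy[toy['Borough'] != 'Not assigned'].copy()

# Boolean mask of the rows whose neighbourhood is missing
mask = toy['Neighbourhood'] == 'Not assigned'

# Label-based assignment: copy the borough name into those rows
toy.loc[mask, 'Neighbourhood'] = toy.loc[mask, 'Borough']

print(toy)
```

Note that the surviving rows keep their original index labels (1 and 2 here), which is why positional access with `.iloc` and those labels would be unsafe.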

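Grouping the dataset: neighbourhoods that share a postcode and borough are joined into one comma-separated row, and the result is saved to CSV.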

In [71]:
res = df.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(','.join).reset_index()
res.to_csv('canada.csv', index=False)
res.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Scarborough,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
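
To make the grouping step concrete, the following sketch applies the same idea to three rows. It assumes the ungrouped table has one row per neighbourhood; the frame name `toy` is hypothetical, and `agg` is used here in place of `apply` as one possible way to join the strings.

```python
import pandas as pd

# One row per neighbourhood (assumed shape of the ungrouped table)
toy = pd.DataFrame({
    'Postcode': ['M1C', 'M1C', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighbourhood': ['Highland Creek', 'Rouge Hill', 'Woburn'],
})

# Join the neighbourhood names within each (Postcode, Borough) group
grouped = (
    toy.groupby(['Postcode', 'Borough'])['Neighbourhood']
       .agg(','.join)
       .reset_index()
)

print(grouped)
# After grouping, every postcode should appear exactly once
print(grouped['Postcode'].is_unique)
```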


Printing the dataset shape

In [72]:
print("Rows: ", res.shape[0])
print("Attributes: ", res.shape[1])

Rows:  103
Attributes:  3


## Getting geolocation data

Geolocation data will be linked to the boroughs/neighbourhoods dataset.

In [76]:
import geocoder

A function to get the geolocation data for each postcode.

In [88]:
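def getgeo(postcode):
    """Return [latitude, longitude] for a postcode via the Google geocoder."""
    lat_lng_coords = None
    # Retry until coordinates come back. Note: this never terminates
    # if the service keeps returning None.
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(postcode))
        lat_lng_coords = g.latlng
        print(lat_lng_coords)
    return lat_lng_coords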

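Testing the function. It does not work: every request returns `None`, so the loop never ends and the cell has to be interrupted.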

In [89]:
getgeo('M1B')

None
None
None
[... "None" printed 45 times in total before the cell was interrupted ...]


KeyboardInterrupt: 
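
One way to keep such a lookup from looping forever is to cap the number of attempts. The sketch below is not the notebook's code: `lookup_with_retries`, `max_attempts` and `delay` are hypothetical names, and the geocoder call is replaced by a stand-in function so the example runs without network access.

```python
import time


def lookup_with_retries(lookup, query, max_attempts=5, delay=0.0):
    """Call lookup(query) until it returns a non-None value or attempts run out."""
    for _ in range(max_attempts):
        result = lookup(query)
        if result is not None:
            return result
        # Short pause before the next attempt
        time.sleep(delay)
    # Give up instead of looping indefinitely
    return None


# Stand-in for a geocoding call that never succeeds (an assumption for the demo)
def always_fails(query):
    return None


coords = lookup_with_retries(always_fails, 'M1B, Toronto, Ontario', max_attempts=3)
print(coords)
```

With a real service, `lookup` could wrap the geocoder call and return its `latlng` value; the caller then has to handle a `None` result explicitly.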

Since the geocoder lookup fails, the provided geolocation CSV is used instead. It has to be downloaded first.

In [91]:
!wget -O geodata.csv https://cocl.us/Geospatial_data

--2019-03-20 21:38:01--  https://cocl.us/Geospatial_data
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving cocl.us (cocl.us)... 169.48.113.201
Connecting to cocl.us (cocl.us)|169.48.113.201|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-03-20 21:38:01--  https://ibm.box.com/shared/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Resolving ibm.box.com (ibm.box.com)... 107.152.27.197
Connecting to ibm.box.com (ibm.box.com)|107.152.27.197|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv [following]
--2019-03-20 21:38:02--  https://ibm.box.com/public/static/9afzr83pps4pwf2smjjcf1y5mvgb18rr.csv
Reusing existing connection to ibm.box.com:443.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.ent.box.com/public/static/9af

What do we have in the geolocation dataset?

In [93]:
geo = pd.read_csv('geodata.csv')
geo.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Renaming the postal code column so that it matches the boroughs/neighbourhoods dataset before merging.

In [97]:
geo.rename(columns={'Postal Code': 'Postcode'}, inplace=True)
geo.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merging the two datasets on `Postcode`.

In [100]:
df_geo = res.merge(geo, on='Postcode')
df_geo.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Scarborough,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
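
A default merge is an inner join, so postcodes without coordinates would be dropped silently. As a hedged sketch of how that could be checked, the example below uses a left join with `indicator=True` on toy frames. The names `left` and `right` and the postcode `M9Z` are made up for the illustration.

```python
import pandas as pd

# Toy stand-ins for the grouped table and the geolocation table
left = pd.DataFrame({
    'Postcode': ['M1B', 'M1C', 'M9Z'],   # 'M9Z' is a hypothetical unmatched code
    'Borough': ['Scarborough', 'Scarborough', 'Etobicoke'],
})
right = pd.DataFrame({
    'Postcode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# Left join keeps every postcode; validate raises if a key is duplicated;
# indicator adds a '_merge' column showing where each row came from
merged = left.merge(right, on='Postcode', how='left',
                    validate='one_to_one', indicator=True)

# Postcodes that found no coordinates
missing = merged.loc[merged['_merge'] == 'left_only', 'Postcode'].tolist()
print(missing)
```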


In [102]:
df_geo.to_csv('geo_toronto.csv', index=False)