# Toronto Neighborhoods Postal Codes
#### This notebook downloads the Toronto postal codes from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, then builds and cleans a DataFrame.

In [9]:
import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Read the web page and extract its tables.

In [10]:
# Parse every HTML table on the page into a list of DataFrames and keep the first one (the postal-code table)
tables = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

df = tables[0]
df.head()

|   | Postal Code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |


Process the DataFrame as the task requires: rename the `Postal Code` column to `PostalCode` and drop the rows whose Borough is not assigned.

In [11]:
# rename the postal code column as the task requires
df = df.rename(columns={'Postal Code': 'PostalCode'})

# drop the rows whose Borough is not assigned and renumber the index from 0
df = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)
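The row filter can be expressed as a boolean mask followed by `reset_index(drop=True)`, which discards the old index instead of turning it into an `index` column that then has to be dropped. A minimal sketch on a small hand-made DataFrame (rows copied from the first output table); the helper name `drop_unassigned` is hypothetical, and the final check confirms it matches the original `drop`/`reset_index`/`drop` chain:

```python
import pandas as pd

def drop_unassigned(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose Borough is assigned and renumber the index from 0."""
    mask = frame['Borough'] != 'Not assigned'
    return frame[mask].reset_index(drop=True)

sample = pd.DataFrame({
    'PostalCode': ['M1A', 'M2A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York'],
    'Neighborhood': ['Not assigned', 'Not assigned', 'Parkwoods', 'Victoria Village'],
})

cleaned = drop_unassigned(sample)

# The chained form: drop by index, move the old index into a column, then drop that column
original = (
    sample.drop(sample[sample['Borough'] == 'Not assigned'].index)
    .reset_index()
    .drop(['index'], axis=1)
)
assert cleaned.equals(original)  # both approaches produce the same DataFrame
print(cleaned)
```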

Next, we check whether any neighborhoods are still "Not assigned" and, where one is, use the borough name as the neighborhood. To do so, we define a function and apply it row by row to the Borough and Neighborhood columns.

In [12]:
# return the Borough as the Neighborhood when the Neighborhood is 'Not assigned'; otherwise keep the Neighborhood
def assign_neighb(row: pd.Series):
    if 'Not assigned' in row['Neighborhood']:
        return row['Borough']
    else:
        return row['Neighborhood']

In [13]:
# applying the assign_neighb function
df['Neighborhood'] = df[['Borough', 'Neighborhood']].apply(assign_neighb, axis=1)
df.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
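The same replacement can be done without a row-wise `apply` by testing the whole column at once and using `Series.where` to fall back to the Borough. This is a sketch of an alternative, not the notebook's method; the helper `fill_unassigned_neighborhoods` is hypothetical, and the third sample row (code `MXX`, "Hypothetical Borough") is made up to exercise the replacement:

```python
import pandas as pd

def fill_unassigned_neighborhoods(frame: pd.DataFrame) -> pd.Series:
    """Return Neighborhood, with 'Not assigned' entries replaced by Borough."""
    # Same substring test as assign_neighb; na=False treats missing values as assigned
    unassigned = frame['Neighborhood'].str.contains('Not assigned', regex=False, na=False)
    # where() keeps values where the condition is True and takes Borough elsewhere
    return frame['Neighborhood'].where(~unassigned, frame['Borough'])

sample = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'MXX'],
    'Borough': ['North York', 'North York', 'Hypothetical Borough'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Not assigned'],
})
sample['Neighborhood'] = fill_unassigned_neighborhoods(sample)
print(sample)
```

On a table of about a hundred rows the speed difference is negligible; the vectorized form mainly avoids writing a separate per-row function.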


In [14]:
df.shape # checking the shape of the DataFrame

(103, 3)

#### Find and assign the geographic coordinates to the available postal codes

The geocoder option did not work for me, so I use the provided CSV file to attach geographic coordinates to the DataFrame. I first read and preprocess the CSV: I rename its `Postal Code` column to `PostalCode` and make that column the index, so that `join` can match it against the `PostalCode` column of the main DataFrame.

In [20]:
coord = pd.read_csv('Geospatial_Coordinates.csv')  # read the provided CSV file
coord.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)  # match the column name used in df
coord.set_index('PostalCode', inplace=True)  # index by postal code so join() can look codes up
coord.head()

| PostalCode | Latitude | Longitude |
|---|---|---|
| M1B | 43.806686 | -79.194353 |
| M1C | 43.784535 | -79.160497 |
| M1E | 43.763573 | -79.188711 |
| M1G | 43.770992 | -79.216917 |
| M1H | 43.773136 | -79.239476 |
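The read, rename, and index steps can also be chained into one expression. A minimal sketch, assuming the CSV header is `Postal Code,Latitude,Longitude` (as the rename above implies); an inline `StringIO` sample with three rows copied from the output stands in for `Geospatial_Coordinates.csv`:

```python
from io import StringIO

import pandas as pd

# Stand-in for Geospatial_Coordinates.csv (assumed header; values from the output above)
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
"""

coord = (
    pd.read_csv(StringIO(csv_text))
    .rename(columns={'Postal Code': 'PostalCode'})  # same rename as in the notebook
    .set_index('PostalCode')                        # postal codes become the index
)
print(coord.loc['M1C'])  # look up a single postal code by index label
```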


In [22]:
# look up each row's PostalCode in coord's index and append its Latitude and Longitude
df = df.join(coord, on='PostalCode')
df.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
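`join` silently leaves NaN coordinates for any postal code missing from the CSV, and would duplicate rows if a code appeared more than once in it. A hedged sketch of an equivalent left join via `merge`, using `validate` to reject duplicate codes and `indicator` to list unmatched ones; the sample rows, including the made-up code `MXX`, are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'MXX'],  # 'MXX' is a made-up code with no coordinates
    'Borough': ['North York', 'North York', 'Hypothetical Borough'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Hypothetical Borough'],
})
coord = pd.DataFrame(
    {'Latitude': [43.753259, 43.725882], 'Longitude': [-79.329656, -79.315572]},
    index=pd.Index(['M3A', 'M4A'], name='PostalCode'),
)

# Move PostalCode back into a column and left-join on it;
# validate raises MergeError if a code is duplicated on either side,
# indicator adds a '_merge' column saying where each row was found
merged = df.merge(coord.reset_index(), on='PostalCode', how='left',
                  validate='one_to_one', indicator=True)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)  # postal codes that received no coordinates
```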
