<h1>Initial Postal Code Dataframe</h1>

<h5>1. Read the raw postal code data from Wikipedia into a dataframe.</h5>

In [40]:
import pandas as pd

INPUT_TABLE = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

codes = pd.read_html(INPUT_TABLE, header=0)[0]
codes.describe()

Unnamed: 0,Postal Code,Borough,Neighbourhood
count,180,180,180
unique,180,11,100
top,M7V,Not assigned,Not assigned
freq,1,77,77


<h5>2. Remove all entries where borough is "Not assigned".</h5>

Check whether any row in the Dataframe has a Borough equal to "Not assigned".

In [41]:
'Not assigned' in codes['Borough'].values

True

Filter Dataframe to exclude all rows where Borough is equal to "Not assigned".

In [42]:
codes = codes[codes['Borough']!='Not assigned']
codes.describe()

Unnamed: 0,Postal Code,Borough,Neighbourhood
count,103,103,103
unique,103,10,99
top,M9M,North York,Downsview
freq,1,24,4


Confirm cleanup.

In [43]:
'Not assigned' in codes['Borough'].values

False
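
As a small illustration of the check-and-filter pattern above, the sketch below counts the placeholder rows before dropping them. The `toy` frame and its rows are made up for the example and are not taken from the Wikipedia table.

```python
import pandas as pd

# Toy stand-in for the scraped table; the rows are illustrative only.
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods'],
})

# Boolean mask marking the placeholder rows, and how many there are.
unassigned = toy['Borough'].eq('Not assigned')
n_unassigned = int(unassigned.sum())

# Keep the remaining rows; .copy() gives an independent frame so that
# later column assignments do not trigger chained-assignment warnings.
assigned = toy[~unassigned].copy()
```

Counting first makes it easy to confirm that the number of dropped rows matches the change in `count` reported by `describe()`.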

<h5>3. When Neighbourhood is "Not assigned", reassign its value as the corresponding Borough.</h5>

Check for instances where Neighbourhood equals "Not assigned" in Dataframe. There are none, so no further work is needed.

In [44]:
'Not assigned' in codes['Neighbourhood'].values

False
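
Had the check returned True, one possible way to do the reassignment is `Series.mask`, which replaces values where a condition holds. This is only a sketch on made-up rows (the `toy` frame is hypothetical), since the real data needed no such fix.

```python
import pandas as pd

# Illustrative rows: the second one has a placeholder Neighbourhood.
toy = pd.DataFrame({
    'Postal Code': ['M5A', 'M7A'],
    'Borough': ['Downtown Toronto', "Queen's Park"],
    'Neighbourhood': ['Regent Park', 'Not assigned'],
})

# Where Neighbourhood is the placeholder, take the Borough from the same row.
missing = toy['Neighbourhood'].eq('Not assigned')
toy['Neighbourhood'] = toy['Neighbourhood'].mask(missing, toy['Borough'])
```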

<h5>4. Collect all Neighbourhoods as comma-separated lists into their respective postal codes.</h5>

Group the Dataframe by Postal Code and Borough and concatenate the Neighbourhoods in each group into one comma-separated list.

In [45]:
# Collapse each (Postal Code, Borough) group into a single row; sort=False keeps the table's original order
codes = codes.groupby(['Postal Code', 'Borough'], sort=False, as_index=False)['Neighbourhood'].agg(', '.join)
codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [46]:
codes.shape

(103, 3)
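
When grouping and joining strings, it matters whether the result should have one row per group or one row per original entry. The sketch below contrasts `agg` (collapses) with `transform` (broadcasts back) on a made-up long-format frame; the `toy` data is an assumption for illustration only.

```python
import pandas as pd

# Long-format toy data: one row per neighbourhood (illustrative values).
toy = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

keys = ['Postal Code', 'Borough']

# agg returns one row per group; sort=False keeps first-appearance order.
collapsed = toy.groupby(keys, sort=False, as_index=False)['Neighbourhood'].agg(', '.join)

# transform keeps the original row count and repeats the joined string
# on every row of the group.
broadcast = toy.groupby(keys)['Neighbourhood'].transform(lambda x: ', '.join(x))
```

If the source table already has one row per postal code, both approaches give the same frame; they differ only when a code appears on several rows.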

<hr />

<h1>Postal Code Dataframe with Longitude and Latitude</h1>

<h5>1. Install Geocoder API</h5>

In [49]:
import sys
!{sys.executable} -m pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


<h5>2. Add a Latitude and Longitude to each dataframe entry corresponding to the Postal Code.</h5>

NOTE: Geocoder was not returning a result for any postal code in the dataframe, so dummy data is shown below.

In [74]:
import geocoder

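#Function I would actually use if geocoder were functioning correctly.
#max_tries is an added safeguard so a failing lookup cannot loop forever;
#geocoder.google may also need an API key to return results.
def get_coords(postal_code, max_tries=5):
    for _ in range(max_tries):
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        if g.latlng is not None:
            return g.latlng
    return None

#Function to return fake coordinates to demonstrate how to populate the dataframe
def fake_coords(postal_code):
    return ['Fake Lat', 'Fake Long']

#Dummy dataframe: look up each postal code, then split the pairs into two columns
test_codes = codes.copy(deep=True)
coords = test_codes['Postal Code'].apply(fake_coords).tolist()
test_codes['Latitude'] = [lat for lat, _ in coords]
test_codes['Longitude'] = [lng for _, lng in coords]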

test_codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,Fake Lat,Fake Long
1,M4A,North York,Victoria Village,Fake Lat,Fake Long
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",Fake Lat,Fake Long
3,M6A,North York,"Lawrence Manor, Lawrence Heights",Fake Lat,Fake Long
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",Fake Lat,Fake Long
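
With a real geocoder, some lookups may fail while others succeed. The sketch below shows one way to keep rows aligned in that case by recording NaN for failed lookups. The `lookup` function, the `FAKE_RESULTS` table, and its coordinate values are all hypothetical placeholders and do not call any geocoding service.

```python
import pandas as pd

# Hypothetical lookup table standing in for a geocoding service;
# the coordinates are placeholders, not real locations.
FAKE_RESULTS = {'M3A': [1.0, -1.0], 'M4A': [2.0, -2.0]}

# Pair recorded when a lookup fails.
MISSING = (float('nan'), float('nan'))

def lookup(postal_code):
    """Return [lat, lng] for a known code, or None when the lookup fails."""
    return FAKE_RESULTS.get(postal_code)

toy = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M5A']})

# One (lat, lng) pair per row, in row order, with NaN for failures.
pairs = [lookup(pc) or MISSING for pc in toy['Postal Code']]

# Expand the pairs into two columns that share the original index, then join.
coords_df = pd.DataFrame(pairs, index=toy.index, columns=['Latitude', 'Longitude'])
toy = toy.join(coords_df)
```

Rows with NaN coordinates can then be listed with `toy[toy['Latitude'].isna()]` and retried or filled from another source.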


Load the supplied latitude and longitude data into a Dataframe and join it with the postal code Dataframe on Postal Code.

In [80]:
latlng_df = pd.read_csv('https://cocl.us/Geospatial_data')

codes = codes.join(latlng_df.set_index('Postal Code'), on='Postal Code')
codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
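
A left join silently leaves NaN coordinates for any postal code missing from the coordinate file. As a sketch of one way to check for that, `DataFrame.merge` can validate the key relationship and flag unmatched rows. The frames and values below are made up, and 'M9Z' is a deliberately unmatched, hypothetical code.

```python
import pandas as pd

# Toy postal code table; 'M9Z' has no matching coordinates on purpose.
codes_toy = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Example Borough'],
})

# Toy coordinate table with placeholder values.
latlng_toy = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [10.0, 20.0],
    'Longitude': [-10.0, -20.0],
})

# validate raises if either side repeats a postal code;
# indicator adds a '_merge' column naming the source of each row.
merged = codes_toy.merge(
    latlng_toy, on='Postal Code', how='left',
    validate='one_to_one', indicator=True,
)

# Postal codes that found no coordinates.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
```

An empty `unmatched` list would confirm that every postal code received a latitude and longitude.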


<hr />

<h1>Postal Code Clustering</h1>

TBD