## Part 1. Postal codes
1. Use the BeautifulSoup library to extract postal codes from Wikipedia
1. Load the postal codes into a pandas data frame
1. While loading the data, check for 'Not assigned': exclude rows whose borough is not assigned, and replace a not-assigned neighbourhood with the borough name
1. Use GroupBy to combine rows with repeating postal codes

In [2]:
wiki_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

# Import all necessary libraries

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML pages
import pandas as pd # library to process data as dataframes
import numpy as np # library to handle data in a vectorized manner

from geopy.geocoders import Nominatim

import matplotlib.cm as cm # Matplotlib and associated plotting modules
import matplotlib.colors as colors

from sklearn.cluster import KMeans # k-means, used later in the clustering stage

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

In [12]:
# Download the wiki page and parse it into a soup object
wiki_page = requests.get(wiki_url).text
soup = BeautifulSoup(wiki_page, 'lxml')

Wikipedia marks its sortable tables with the class __wikitable sortable__, which I use to find my table on the page.<br>
Then I read all rows of the table (using the __tr__ tag); each data row contains a postal code, a borough and a neighbourhood.

In [13]:
# Locate the postal code table and collect its rows
postal_table = soup.find('table',{'class':'wikitable sortable'})
postal_lines = postal_table.find_all('tr')  # find_all is the current name for findAll
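
As a side note, looking a table up by the exact string 'wikitable sortable' only works while the page keeps exactly that class attribute. The sketch below shows one possible alternative, a CSS selector that requires both classes but ignores their order and any extra classes. The HTML in `sample_html` is a made-up miniature page, not the real Wikipedia markup, and `html.parser` is used only so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page standing in for the Wikipedia HTML.
sample_html = """
<table class="sortable wikitable extra">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, 'html.parser')

# The selector requires both classes, in any order, and tolerates extra ones.
sample_table = sample_soup.select_one('table.wikitable.sortable')

# The header row holds only <th> cells, so it yields no <td> and is skipped.
sample_rows = []
for tr in sample_table.find_all('tr'):
    cells = [td.get_text(strip=True) for td in tr.find_all('td')]
    if cells:
        sample_rows.append(cells)

print(sample_rows)
```

Whether this is worth the change depends on how stable the page markup is; for a one-off scrape the exact-string lookup may be enough.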

In [14]:
# Read <td> and collect the data into a dataframe
col1 = []
col2 = []
col3 = []

for tr in postal_table.find_all('tr'):
    tds = tr.find_all('td')
    if len(tds) < 3:
        continue  # skip the header row (no <td>) and any incomplete row
    cell1, cell2, cell3 = [td.text.strip() for td in tds[:3]]
    if cell2 != 'Not assigned':  # drop rows without a borough
        if cell3 == 'Not assigned':
            cell3 = cell2  # fall back to the borough name
        col1.append(cell1)
        col2.append(cell2)
        col3.append(cell3)

df = pd.DataFrame({'Postalcode': col1, 'Borough': col2, 'Neighborhood': col3})
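
The 'Not assigned' handling in the loop above can be isolated into a small function, which makes the rule easier to check on its own. This is only a sketch: `clean_row` is a hypothetical helper and `raw_rows` contains illustrative values, not data read from the page.

```python
def clean_row(code, borough, neighbourhood):
    """Return the cleaned (code, borough, neighbourhood) tuple, or None to drop the row."""
    if borough == 'Not assigned':
        return None  # rows without a borough are excluded
    if neighbourhood == 'Not assigned':
        neighbourhood = borough  # a missing neighbourhood takes the borough name
    return code, borough, neighbourhood


# Illustrative rows only.
raw_rows = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M5A', 'Downtown Toronto', 'Harbourfront'),
    ('M7A', "Queen's Park", 'Not assigned'),
]

# Apply the rule to every row and keep only the rows that were not dropped.
cleaned = [c for c in (clean_row(*row) for row in raw_rows) if c is not None]
print(cleaned)
```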

In [15]:
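# Group neighbourhoods that share a postal code and borough
df_grouped = df.groupby(['Postalcode','Borough'])['Neighborhood'].apply(list)
# sample(frac=1) shuffles the rows, so the row order differs from run to run
df_grouped = df_grouped.sample(frac=1).reset_index()
# Turn each list of neighbourhoods into one comma-separated string
df_grouped['Neighborhood'] = df_grouped['Neighborhood'].str.join(', ')

df_grouped.shape
df_grouped.head()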

|   | Postalcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M9W | Etobicoke | Northwest |
| 1 | M2M | North York | Newtonbrook, Willowdale |
| 2 | M4T | Central Toronto | Moore Park, Summerhill East |
| 3 | M6E | York | Caledonia-Fairbanks |
| 4 | M3N | North York | Downsview Northwest |
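
The grouping step can also be written as a single aggregation that joins the names directly, without building lists first. The sketch below runs on a toy frame (`toy` is a made-up three-row table reusing names from the output above) and is an alternative formulation, not the code used in this notebook; it also skips the shuffling.

```python
import pandas as pd

# Toy input: two rows share the postal code M2M.
toy = pd.DataFrame({
    'Postalcode': ['M2M', 'M2M', 'M6E'],
    'Borough': ['North York', 'North York', 'York'],
    'Neighborhood': ['Newtonbrook', 'Willowdale', 'Caledonia-Fairbanks'],
})

# Join the neighbourhood names of each (postal code, borough) group into one string.
toy_grouped = (
    toy.groupby(['Postalcode', 'Borough'])['Neighborhood']
       .agg(', '.join)
       .reset_index()
)
print(toy_grouped)
```

The result has one row per postal code, sorted by the grouping keys.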


## Part 2. Add coordinates

I decided not to rely on unstable geocoding services, so I load the coordinates from the provided CSV file.

In [16]:
url_coordinates = 'http://cocl.us/Geospatial_data'
dfCoords = pd.read_csv(url_coordinates)

In [17]:
dfAreas = df_grouped.merge(dfCoords, left_on='Postalcode',right_on='Postal Code')
dfAreas.drop(['Postal Code'], axis=1, inplace=True)
dfAreas.head()

|   | Postalcode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M9W | Etobicoke | Northwest | 43.706748 | -79.594054 |
| 1 | M2M | North York | Newtonbrook, Willowdale | 43.789053 | -79.408493 |
| 2 | M4T | Central Toronto | Moore Park, Summerhill East | 43.689574 | -79.38316 |
| 3 | M6E | York | Caledonia-Fairbanks | 43.689026 | -79.453512 |
| 4 | M3N | North York | Downsview Northwest | 43.761631 | -79.520999 |
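
Because `merge` defaults to an inner join, a postal code that is missing from the coordinates file would be dropped without any warning. One way to check for that, sketched here on toy frames, is a left join with `indicator=True`. The frames `left` and `right` are made up for illustration, and 'M0X' is an invented code that deliberately has no coordinates.

```python
import pandas as pd

# Toy stand-ins for the grouped postal codes and the coordinates file.
left = pd.DataFrame({'Postalcode': ['M9W', 'M2M', 'M0X'],
                     'Borough': ['Etobicoke', 'North York', 'Nowhere']})
right = pd.DataFrame({'Postal Code': ['M9W', 'M2M'],
                      'Latitude': [43.706748, 43.789053],
                      'Longitude': [-79.594054, -79.408493]})

# A left join keeps every postal code; the _merge column records where each row matched.
checked = left.merge(right, how='left', left_on='Postalcode',
                     right_on='Postal Code', indicator=True)

# Rows marked 'left_only' found no coordinates.
missing = checked.loc[checked['_merge'] == 'left_only', 'Postalcode'].tolist()
print(missing)
```

An empty `missing` list on the real data would confirm that the inner join loses nothing.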
