<h1 align=center>(Part 1) Segmenting and Clustering Neighborhoods in Toronto</h1>

## Purpose
Retrieve postal code and neighborhood information for Toronto from the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Then store the table in a cleaned dataframe.

## Import Libraries

In [71]:
import pandas as pd
import numpy as np
import urllib.request
from bs4 import BeautifulSoup

## Extract and Store Table into Dataframe

In [70]:
# Retrieve the Wikipedia page, parse its HTML, and locate the postal code table
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")
table = soup.find('table', class_='wikitable sortable')
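One caveat about the lookup above: when `class_` is given a string containing several class names, Beautiful Soup compares it against the exact value of the `class` attribute, so the table would be missed if the page listed the classes in another order or added a further class. The following is a minimal sketch of a more tolerant lookup using a CSS selector. The HTML snippet and the names `sample_html`, `sample_soup`, `exact_match`, and `selector_match` are hypothetical stand-ins, not the live page, and the standard-library `html.parser` is used so the sketch does not depend on `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for the Wikipedia page (assumption, not the real markup).
sample_html = """
<table class="sortable wikitable jquery-tablesorter">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Exact-string comparison against the class attribute: this table is missed
# because its classes appear in a different order with an extra class.
exact_match = sample_soup.find("table", class_="wikitable sortable")

# CSS selector: matches any table carrying both classes, in any order.
selector_match = sample_soup.select_one("table.wikitable.sortable")

print(exact_match is None)
print(selector_match is not None)
```

If the selector form were used in this notebook, `soup.select_one("table.wikitable.sortable")` would be the drop-in equivalent of the `soup.find(...)` call.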

In [132]:
# Extract the text of each data row in the postal code table
table_array = []

for row in table.find_all('tr'):
    cells = row.find_all('td')

    # Header rows have no <td> cells; require all three columns before indexing
    if len(cells) >= 3:
        table_array.append([cell.get_text(strip=True) for cell in cells[:3]])

table_array[:5]

[['M1A', 'Not assigned', ''],
 ['M2A', 'Not assigned', ''],
 ['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Regent Park, Harbourfront']]
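The row extraction can be tried without network access on a small inline table. This is a minimal sketch under stated assumptions: `extract_rows` is a hypothetical helper (it does not appear in the notebook), the HTML is made-up sample data shaped like the output above, and `html.parser` replaces `lxml` to keep the example dependency-light.

```python
from bs4 import BeautifulSoup

def extract_rows(table, n_cols=3):
    """Return the stripped text of the first n_cols cells of each data row."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        # Header rows hold only <th> cells, so they produce no <td> and are skipped.
        if len(cells) >= n_cols:
            rows.append([cell.get_text(strip=True) for cell in cells[:n_cols]])
    return rows

# Hypothetical sample table (assumption: three columns, as in the notebook).
sample_html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td></td></tr>
  <tr><td> M3A </td><td> North York </td><td> Parkwoods </td></tr>
</table>
"""

sample_table = BeautifulSoup(sample_html, "html.parser").find("table")
sample_rows = extract_rows(sample_table)
print(sample_rows)
```

Passing `strip=True` to `get_text` removes the padding around each cell, which is why the second data row comes back without leading or trailing spaces.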

## Store Table in a Dataframe and Clean It Up

In [133]:
columns = ['PostalCode', 'Borough', 'Neighborhood']
df = pd.DataFrame(data=table_array, columns=columns)

# Drop postal codes with no assigned borough, then renumber the rows from 0
df = df[df.Borough != 'Not assigned']
df = df.reset_index(drop=True)

df.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
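The index in this output starts at 0 only because of the `reset_index(drop=True)` call; filtering on its own keeps the original row labels. A minimal sketch on hand-made data (the frame `sample_df` and its four rows are hypothetical, chosen to mirror the first rows of the scraped table):

```python
import pandas as pd

# Hypothetical sample rows (assumption: same three columns as the notebook's dataframe).
sample_df = pd.DataFrame(
    [
        ["M1A", "Not assigned", ""],
        ["M3A", "North York", "Parkwoods"],
        ["M2A", "Not assigned", ""],
        ["M4A", "North York", "Victoria Village"],
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Boolean mask: keep only rows whose borough is assigned.
filtered = sample_df[sample_df["Borough"] != "Not assigned"]
print(list(filtered.index))  # surviving rows keep their original labels

# drop=True discards the old labels instead of adding them as a new column.
cleaned = filtered.reset_index(drop=True)
print(list(cleaned.index))
```

Without `drop=True`, `reset_index` would insert the old labels as an extra `index` column, which is usually not wanted in a table like this one.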


In [134]:
#Check the shape of the dataframe
df.shape

(103, 3)
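Beyond the shape, a few cheap checks can confirm that the cleaning step did what was intended. The sketch below is illustrative only: `check_postal_table` is a hypothetical helper, and the expectations it encodes (unique postal codes, codes of the form `M` + digit + letter) are assumptions about this table rather than facts stated in the notebook.

```python
import pandas as pd

def check_postal_table(frame):
    """Raise AssertionError if the cleaned table breaks the assumed expectations."""
    assert list(frame.columns) == ["PostalCode", "Borough", "Neighborhood"]
    # The cleaning step should have removed every unassigned borough.
    assert not (frame["Borough"] == "Not assigned").any()
    # Assumption: each postal code appears on one row only.
    assert frame["PostalCode"].is_unique
    # Assumption: codes look like "M3A" (letter M, one digit, one capital letter).
    assert frame["PostalCode"].str.match(r"^M\d[A-Z]$").all()
    return frame.shape

# Hypothetical two-row sample standing in for the notebook's df.
sample_clean = pd.DataFrame(
    [["M3A", "North York", "Parkwoods"], ["M4A", "North York", "Victoria Village"]],
    columns=["PostalCode", "Borough", "Neighborhood"],
)
sample_shape = check_postal_table(sample_clean)
print(sample_shape)
```

In the notebook itself, the call would be `check_postal_table(df)`.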