<h1 align=center>(Part 1) Segmenting and Clustering Neighborhoods in Toronto</h1>

## Purpose
Retrieve postal code and neighborhood information for Toronto from the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Then store the table in a cleaned-up dataframe.

## Import Libraries

In [71]:
import pandas as pd
import numpy as np
import urllib.request
from bs4 import BeautifulSoup

## Extract and Store Table into Dataframe

In [70]:
#Retrieve Wikipedia webpage and HTML elements
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")
table = soup.find('table', class_='wikitable sortable')

In [103]:
#Extract the text from the Postal Code Tables
column1 = []
column2 = []
column3 = []

for row in table.find_all('tr'):
    cells = row.find_all('td')
    #Header rows have no <td> cells; require all three columns before indexing
    if len(cells) >= 3:
        column1.append(cells[0].get_text(strip=True))
        column2.append(cells[1].get_text(strip=True))
        column3.append(cells[2].get_text(strip=True))
        
print(column1[:5])
print(column2[:5])
print(column3[:5])

['M1A', 'M2A', 'M3A', 'M4A', 'M5A']
['Not assigned', 'Not assigned', 'North York', 'North York', 'Downtown Toronto']
['', '', 'Parkwoods', 'Victoria Village', 'Regent Park, Harbourfront']
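
The row loop above can be tried offline on a small hand-written table. The sketch below is only an illustration: the `SAMPLE_HTML` string and the `extract_rows` helper are hypothetical and not part of the notebook, and it uses the built-in `html.parser` instead of `lxml` so that nothing beyond BeautifulSoup is required. Header rows contain only `th` cells, so they yield no `td` cells and are skipped by the length check.

```python
from bs4 import BeautifulSoup

# Hypothetical sample mimicking the layout of the Wikipedia table
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td></td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Regent Park, Harbourfront</td></tr>
</table>
"""

def extract_rows(html):
    """Return a list of (postal_code, borough, neighborhood) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable sortable")
    rows = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Skip header rows (th only) and any row without three data cells
        if len(cells) >= 3:
            rows.append(tuple(cell.get_text(strip=True) for cell in cells[:3]))
    return rows

rows = extract_rows(SAMPLE_HTML)
print(rows)
```

Collecting one tuple per row, rather than three parallel lists, keeps each postal code tied to its borough and neighborhood if a row is ever skipped.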


In [104]:
#Store the columns in a dataframe and drop rows without an assigned borough
df = pd.DataFrame({'PostalCode': column1, 'Borough': column2, 'Neighborhood': column3})
df = df[df.Borough != 'Not assigned']
df = df.reset_index(drop=True)

df.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
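
The cleanup step can be checked on a few hand-typed rows. This is a minimal sketch, assuming a hypothetical `sample` list copied from the first five rows printed by the extraction step; `df_sample` is an illustrative name, not a variable from the notebook.

```python
import pandas as pd

# Hypothetical sample: the first five rows printed by the extraction step
sample = [
    ("M1A", "Not assigned", ""),
    ("M2A", "Not assigned", ""),
    ("M3A", "North York", "Parkwoods"),
    ("M4A", "North York", "Victoria Village"),
    ("M5A", "Downtown Toronto", "Regent Park, Harbourfront"),
]

df_sample = pd.DataFrame(sample, columns=["PostalCode", "Borough", "Neighborhood"])

# Keep only rows with an assigned borough, then renumber the index from 0
df_sample = df_sample[df_sample["Borough"] != "Not assigned"].reset_index(drop=True)

print(df_sample)
```

Without `reset_index(drop=True)` the surviving rows would keep their original labels (2, 3, 4 here), which is why the notebook renumbers the index after filtering.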


In [105]:
#Check the shape of the dataframe
df.shape

(103, 3)
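
Beyond the shape, a couple of sanity checks can catch scraping problems early. The sketch below rests on assumptions that are not in the notebook: the helper `check_postal_codes` is hypothetical, and the pattern (the letter M, one digit, one uppercase letter) is inferred from the codes shown above.

```python
import re

# Assumed format: 'M', one digit, one uppercase letter (e.g. 'M3A')
POSTAL_CODE_PATTERN = re.compile(r"M\d[A-Z]")

def check_postal_codes(codes):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    malformed = [c for c in codes if not POSTAL_CODE_PATTERN.fullmatch(c)]
    if malformed:
        problems.append(f"malformed codes: {malformed}")
    if len(set(codes)) != len(codes):
        problems.append("duplicate codes found")
    return problems

print(check_postal_codes(["M3A", "M4A", "M5A"]))  # clean input
print(check_postal_codes(["M3A", "M3A", "3MA"]))  # one duplicate, one malformed
```

On the notebook's dataframe this could be called as `check_postal_codes(df["PostalCode"].tolist())`; the duplicate check is only meaningful if the page lists one row per postal code.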