<h1 align=center>(Part 2) Segmenting and Clustering Neighborhoods in Toronto</h1>

## Purpose
Retrieve postal code information and neighborhoods in Toronto from the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Then store the table in a cleaned dataframe.

Next, to use Foursquare location data, we need to assign latitude and longitude coordinates to each neighborhood.

## Import Libraries

In [1]:
import pandas as pd
import numpy as np
import urllib.request
from bs4 import BeautifulSoup

## Extract and Store Table into Dataframe

In [2]:
#Retrieve Wikipedia webpage and HTML elements
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")
table = soup.find('table', class_='wikitable sortable')

In [65]:
#Extract the text from the postal code table
table_array = []

for row in table.find_all('tr'):
    cells = row.find_all('td')

    #Keep only data rows with all three columns (header rows have no <td> cells)
    if len(cells) >= 3:
        table_array.append([cell.get_text(strip=True) for cell in cells[:3]])

table_array[:10]

[['M1A', 'Not assigned', 'Not assigned'],
 ['M2A', 'Not assigned', 'Not assigned'],
 ['M3A', 'North York', 'Parkwoods'],
 ['M4A', 'North York', 'Victoria Village'],
 ['M5A', 'Downtown Toronto', 'Regent Park, Harbourfront'],
 ['M6A', 'North York', 'Lawrence Manor, Lawrence Heights'],
 ['M7A', 'Downtown Toronto', "Queen's Park, Ontario Provincial Government"],
 ['M8A', 'Not assigned', 'Not assigned'],
 ['M9A', 'Etobicoke', 'Islington Avenue, Humber Valley Village'],
 ['M1B', 'Scarborough', 'Malvern, Rouge']]
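
The row-extraction loop can be tried without network access on a small inline HTML fragment. The sketch below is only an illustration under stated assumptions: the `sample_html` string and the `extract_rows` helper are hypothetical, and it uses Python's built-in `html.parser` rather than `lxml` so that BeautifulSoup is the only dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment mimicking the layout of the Wikipedia table
sample_html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td> M1A </td><td> Not assigned </td><td> Not assigned </td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""


def extract_rows(html):
    """Return [postal code, borough, neighborhood] for every data row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable sortable")
    rows = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Header rows contain only <th> cells, so they are skipped here
        if len(cells) >= 3:
            rows.append([cell.get_text(strip=True) for cell in cells[:3]])
    return rows


rows = extract_rows(sample_html)
print(rows)
```

Passing `strip=True` to `get_text` removes the surrounding whitespace and trailing newlines that table cells often carry.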

## Store Table into Dataframe and Clean It Up

In [64]:
#Build the dataframe, drop rows without an assigned borough, and renumber the index
columns = ['PostalCode','Borough','Neighborhood']
df_wikidata = pd.DataFrame(data=table_array, columns=columns)
df_wikidata = df_wikidata[df_wikidata.Borough != 'Not assigned']
df_wikidata = df_wikidata.reset_index(drop=True)

df_wikidata.head(10)

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge |
| 7 | M3B | North York | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson |


In [14]:
#Check the shape of the dataframe
df_wikidata.shape

(103, 3)
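
A few quick checks can confirm that the cleanup behaved as intended. This is a minimal sketch on a hypothetical three-row sample (`sample_rows`), not the full Wikipedia table. It applies the same filter as above and then verifies that no unassigned boroughs remain and that each postal code appears only once.

```python
import pandas as pd

# Hypothetical sample in the same shape as table_array
sample_rows = [
    ["M1A", "Not assigned", "Not assigned"],
    ["M3A", "North York", "Parkwoods"],
    ["M4A", "North York", "Victoria Village"],
]

df_sample = pd.DataFrame(sample_rows, columns=["PostalCode", "Borough", "Neighborhood"])

# Same cleanup as above: drop unassigned boroughs and renumber the index
df_sample = df_sample[df_sample.Borough != "Not assigned"].reset_index(drop=True)

# Sanity checks on the cleaned dataframe
no_unassigned = not (df_sample["Borough"] == "Not assigned").any()
unique_codes = df_sample["PostalCode"].is_unique
print(df_sample.shape, no_unassigned, unique_codes)
```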

## Assign Coordinates

In [45]:
#Load the coordinates from the CSV file because Geocoder was not working
df_geodata = pd.read_csv('http://cocl.us/Geospatial_data', header=0)

In [63]:
#Left-join the coordinates on postal code, then drop the duplicate key column
df_merged = pd.merge(df_wikidata, df_geodata, how='left', left_on='PostalCode', right_on='Postal Code')
df_merged.drop(columns='Postal Code', inplace=True)
print(df_merged.shape)
df_merged.head(10)

(103, 5)


|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
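
Because the merge is a left join, any postal code missing from the coordinates file would silently end up with empty `Latitude` and `Longitude` values. The sketch below shows one way to surface such rows using the `indicator` and `validate` options of `pd.merge`. The two small dataframes are hypothetical stand-ins for `df_wikidata` and `df_geodata`, and `M9Z` is a made-up code included only to demonstrate an unmatched row.

```python
import pandas as pd

# Hypothetical stand-in for df_wikidata; "M9Z" is made up and has no coordinates
left = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Example Borough"],
})

# Hypothetical stand-in for df_geodata
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

merged = pd.merge(
    left,
    right,
    how="left",
    left_on="PostalCode",
    right_on="Postal Code",
    validate="one_to_one",  # raises MergeError if a key is duplicated on either side
    indicator=True,         # adds a "_merge" column: "both" or "left_only"
)

# Postal codes that found no match in the coordinates table
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would indicate that every postal code received coordinates.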
