# Assignment - Segmenting and Clustering Neighborhoods in Toronto

### Introduction

In the first part of this assignment, we extract the Toronto postal codes (those starting with M) from the following Wikipedia page:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

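Only postal codes with an assigned borough are considered in this notebook. When a postal code has a borough but no assigned neighbourhood, the borough name is used as the neighbourhood.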
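The two selection rules can be sketched in plain Python. The helper name `clean_rows` and the sample rows are hypothetical. They only illustrate the two cases (unassigned borough, unassigned neighbourhood) and are not copied from the Wikipedia page.

```python
def clean_rows(rows):
    """Apply the selection rules to (postal_code, borough, neighbourhood) tuples.

    Hypothetical helper, for illustration only.
    """
    cleaned = []
    for postal_code, borough, neighbourhood in rows:
        # Rule 1: skip postal codes without an assigned borough
        if borough.strip().upper() == "NOT ASSIGNED":
            continue
        # Rule 2: fall back to the borough name when no neighbourhood is assigned
        if neighbourhood.strip().upper() == "NOT ASSIGNED":
            neighbourhood = borough
        cleaned.append((postal_code, borough, neighbourhood))
    return cleaned


# Hypothetical sample rows (M9Z and "Example Borough" are made up)
sample = [
    ("M1A", "Not assigned", "Not assigned"),
    ("M3A", "North York", "Parkwoods"),
    ("M9Z", "Example Borough", "Not assigned"),
]
print(clean_rows(sample))
# [('M3A', 'North York', 'Parkwoods'), ('M9Z', 'Example Borough', 'Example Borough')]
```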

### Import Libraries

First of all, let's import the necessary modules:

```python
import requests      # library to handle HTTP requests
import pandas as pd  # library for data analysis

# parse HTML data (the bs4 module is provided by the beautifulsoup4 package)
!pip install beautifulsoup4
from bs4 import BeautifulSoup

!pip install geocoder
import geocoder

print('Libraries imported.')
```

Libraries imported.


### Extract Postal Codes from Wiki


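We use Requests to fetch the HTML of the Wikipedia page and BeautifulSoup to parse it. Each data row of the first `wikitable` holds a postal code, its borough and its neighbourhood(s); rows whose borough is "Not assigned" are skipped.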
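The notebook relies on BeautifulSoup for the parsing step. To show what that step does without any third-party dependency, here is a minimal sketch that swaps in the standard library's `html.parser.HTMLParser` instead; it is not the notebook's method. The class name `TableRowParser` and the sample HTML are hypothetical, and the sample only mimics the three-column layout the notebook's loop expects.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row.

    Header cells (<th>) are ignored, so a header-only row yields no data.
    Hypothetical stand-in for the BeautifulSoup calls used in the notebook.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> still lands in the open cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # keep only rows that contained <td> cells
                self.rows.append(self._row)
            self._row = None


# Hypothetical HTML mimicking a three-column wikitable
sample_html = """
<table class="wikitable"><tbody>
<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td><a href="#">North York</a></td><td>Parkwoods</td></tr>
</tbody></table>
"""
parser = TableRowParser()
parser.feed(sample_html)
parser.close()
print(parser.rows)
# [['M1A', 'Not assigned', 'Not assigned'], ['M3A', 'North York', 'Parkwoods']]
```

BeautifulSoup does the same tree walking for you and tolerates malformed markup far better, which is why the notebook uses it for the real page.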

```python
r = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
r.raise_for_status()  # stop early if the page could not be fetched

soup = BeautifulSoup(r.text.replace('\n', ''), "html.parser")  # remove line breaks

# find the correct table based on its class
postal_code_table = soup.find("table", {"class": "wikitable"})

rows = []
for row in postal_code_table.find("tbody").find_all("tr"):
    if row.find_all("th"):  # skip the header row
        continue
    col = row.find_all("td")
    if len(col) < 3:  # skip malformed rows
        continue
    postal_code = col[0].text.strip()
    borough = col[1].text.strip()
    neighbourhood = col[2].text.strip()

    if borough.upper() != 'NOT ASSIGNED':
        if neighbourhood.upper() == 'NOT ASSIGNED':
            neighbourhood = borough
        rows.append({"Postal Code": postal_code, "Borough": borough, "Neighbourhood": neighbourhood})

# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
postal_code_data = pd.DataFrame(rows, columns=["Postal Code", "Borough", "Neighbourhood"])
postal_code_data.head()
```

|   | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


```python
postal_code_data.shape
```

(103, 3)
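The row-by-row filtering in the scraping loop can also be written with vectorized pandas operations. This is a sketch assuming the raw three-column table has already been loaded into a DataFrame; the name `raw` and its values (including the code `M9Z` and "Example Borough") are hypothetical.

```python
import pandas as pd

# Hypothetical raw table with the same columns as postal_code_data
raw = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M9Z"],
    "Borough": ["Not assigned", "North York", "Example Borough"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Keep only rows whose borough is assigned (case-insensitive comparison)
assigned = raw[raw["Borough"].str.upper() != "NOT ASSIGNED"].copy()

# Where the neighbourhood is unassigned, replace it with the borough name
unassigned = assigned["Neighbourhood"].str.upper() == "NOT ASSIGNED"
assigned["Neighbourhood"] = assigned["Neighbourhood"].mask(unassigned, assigned["Borough"])

assigned = assigned.reset_index(drop=True)
print(assigned)
```

`Series.mask` replaces values where the condition is true with the aligned values from `other`, so each unassigned neighbourhood picks up the borough from its own row.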

### Add Latitude and Longitude

Next, we take the latitude and longitude of each postal code from a CSV file (http://cocl.us/Geospatial_data) and add them to `postal_code_data`.

Download the CSV file to get the geo information:

```python
csv_url = 'http://cocl.us/Geospatial_data'
df_csv = pd.read_csv(csv_url)
df_csv.head()
```

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


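Merge both DataFrames on the `Postal Code` column. `pd.merge` performs an inner join by default, so only postal codes present in both tables are kept: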

```python
postal_code_data = pd.merge(postal_code_data, df_csv, on=["Postal Code"])
postal_code_data
```

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 |
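Because `pd.merge` defaults to an inner join, a postal code missing from the CSV would be dropped without any warning. One way to check for this is a left join with `indicator=True`, which adds a `_merge` column recording where each row was found. The small frames below are illustrative: the coordinates for M3A and M4A are taken from the merged output, while `M0Z` and its borough and neighbourhood are hypothetical, standing in for a code with no coordinates.

```python
import pandas as pd

postal_codes = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M0Z"],  # M0Z is hypothetical
    "Borough": ["North York", "North York", "Example Borough"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Example"],
})
coordinates = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# Left join keeps every postal code; _merge is "both" or "left_only"
checked = pd.merge(postal_codes, coordinates, on="Postal Code", how="left", indicator=True)
missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)  # ['M0Z']

# validate="one_to_one" raises pandas.errors.MergeError if a postal code is duplicated on either side
merged = pd.merge(postal_codes, coordinates, on="Postal Code", validate="one_to_one")
print(merged.shape)  # (2, 5)
```

An empty `missing` list on the real data would confirm that the inner join kept every scraped postal code.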
