# Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

### For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

#### 1) Start by creating a new Notebook for this assignment.
#### 2) Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

### (PART 1) Preprocessing

In [1]:
# importing necessary libraries

import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests

In [2]:
# getting data from the internet

wiki_link='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
raw_wiki_data=requests.get(wiki_link).text
 
# using BeautifulSoup to parse the HTML (the page is HTML, so an HTML parser is used rather than 'xml')

soup = BeautifulSoup(raw_wiki_data, 'html.parser')

#print(soup.prettify())
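To make the row-and-cell walk concrete without a network call or an extra dependency, here is a minimal sketch that uses the standard-library `html.parser.HTMLParser` instead of BeautifulSoup. It assumes the postal-code table uses explicit `<tr>`, `<th>` and `<td>` tags and contains no nested tables. `TableRowParser` is a hypothetical helper name, and the HTML snippet is made up for illustration, not copied from the live page.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Hypothetical helper: collect cell text from the first <table>, row by row.

    Assumes explicit <tr>, <th>, <td> tags and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_table = False
        self._done = False      # set once the first table has closed
        self._row = None        # cells of the row currently being read
        self._cell = None       # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == 'table':
            self._in_table = True
        elif self._in_table and tag == 'tr':
            self._row = []
        elif self._in_table and tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in ('td', 'th') and self._cell is not None:
            # join text from the cell itself and from nested tags such as <a>
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == 'table' and self._in_table:
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        # keep only text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)


# made-up snippet shaped like the postal-code table; the second table is ignored
sample_html = """
<table>
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M1T</td><td><a href="#">Scarborough</a></td><td>Tam O'Shanter</td></tr>
</table>
<table><tr><td>ignored</td></tr></table>
"""

parser = TableRowParser()
parser.feed(sample_html)
parser.close()
for row in parser.rows:
    print(row)
```

Only the first table is read, which mirrors `soup.find('table')` returning the first match.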

## Extracting raw table from web page

In [3]:
# extracting the first table on the page (the postal-code table)
table = soup.find('table')

Postcode      = []
Borough       = []
Neighbourhood = []

# print(table)

# extracting a clean form of the table
for tr_cell in table.find_all('tr'):
    td_cells = tr_cell.find_all('td')

    # header rows typically use <th> cells, so skip any row without three <td> columns
    if len(td_cells) < 3:
        continue

    Postcode_var      = td_cells[0].text.strip()
    Borough_var       = td_cells[1].text.strip()
    Neighbourhood_var = td_cells[2].text.strip()

    # skip rows where any field is 'Not assigned'
    if 'Not assigned' in (Postcode_var, Borough_var, Neighbourhood_var):
        continue

    # NOTE: this also skips rows whose Borough or Neighbourhood cell has no <a> link,
    # even when the text itself is valid, so it can lower the final count
    if td_cells[1].find('a') is None or td_cells[2].find('a') is None:
        continue

    Postcode.append(Postcode_var)
    Borough.append(Borough_var)
    Neighbourhood.append(Neighbourhood_var)
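The filtering rules in this cell can be sketched as a pure function over already-extracted cell strings, which makes them easy to test in isolation. This is a minimal sketch under the assumption that each row arrives as a list of cell texts; `clean_rows` and the sample rows are hypothetical. Unlike the cell above, it does not require a hyperlink in the Borough and Neighbourhood cells, so on the real page it may keep more rows than the 77 postal codes reported below.

```python
def clean_rows(rows):
    """Hypothetical helper: keep (postcode, borough, neighbourhood) tuples.

    Rows with fewer than three cells are skipped (e.g. a header row whose
    <th> cells produced no <td> text), as are rows where any field is
    'Not assigned'. No hyperlink check is applied here.
    """
    cleaned = []
    for row in rows:
        if len(row) < 3:
            continue
        # strip stray whitespace/newlines from the first three cells
        postcode, borough, neighbourhood = (cell.strip() for cell in row[:3])
        if 'Not assigned' in (postcode, borough, neighbourhood):
            continue
        cleaned.append((postcode, borough, neighbourhood))
    return cleaned


# made-up rows for illustration
raw_rows = [
    [],                                         # header row: no <td> cells
    ['M1A', 'Not assigned', 'Not assigned'],    # dropped
    ['M6J', 'West Toronto', 'Little Portugal'],
    ['M6J', 'West Toronto', 'Trinity\n'],       # trailing newline is stripped
]
print(clean_rows(raw_rows))
# [('M6J', 'West Toronto', 'Little Portugal'), ('M6J', 'West Toronto', 'Trinity')]
```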

## Combining postal codes that have more than one neighbourhood

In [4]:
unique_p = set(Postcode)
print('num of unique Postal codes:', len(unique_p))
Postcode_u      = []
Borough_u       = []
Neighbourhood_u = []


# for each unique postal code, collect its borough and join all of its neighbourhoods with ', '
for postcode_unique_element in unique_p:
    p_var, b_var, n_list = '', '', []
    for postcode_idx, postcode_element in enumerate(Postcode):
        if postcode_unique_element == postcode_element:
            p_var = postcode_element
            b_var = Borough[postcode_idx]
            n_list.append(Neighbourhood[postcode_idx])
    Postcode_u.append(p_var)
    Borough_u.append(b_var)
    Neighbourhood_u.append(', '.join(n_list))

num of unique Postal codes: 77
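The nested loop above rescans the whole `Postcode` list once per unique code, and iterating over a `set` gives an order that can change between runs. A single-pass alternative groups rows in a dict, which keeps first-seen order (guaranteed for dicts since Python 3.7). This is a sketch, not the notebook's method: `combine_by_postcode` is a hypothetical name, and it assumes every row for a given postal code lists the same borough (it keeps the first one it sees).

```python
def combine_by_postcode(postcodes, boroughs, neighbourhoods):
    """Hypothetical helper: group neighbourhoods sharing a postal code, joined with ', '.

    Assumes all rows for one postal code share the same borough.
    """
    grouped = {}
    for postcode, borough, neighbourhood in zip(postcodes, boroughs, neighbourhoods):
        # create the entry on first sight, then append further neighbourhoods
        entry = grouped.setdefault(postcode, {'borough': borough, 'neighbourhoods': []})
        entry['neighbourhoods'].append(neighbourhood)
    return [(pc, e['borough'], ', '.join(e['neighbourhoods'])) for pc, e in grouped.items()]


combined = combine_by_postcode(
    ['M6J', 'M4B', 'M6J', 'M4B'],
    ['West Toronto', 'East York', 'West Toronto', 'East York'],
    ['Little Portugal', 'Woodbine Gardens', 'Trinity', 'Parkview Hill'],
)
for row in combined:
    print(row)
# ('M6J', 'West Toronto', 'Little Portugal, Trinity')
# ('M4B', 'East York', 'Woodbine Gardens, Parkview Hill')
```

The result can be unpacked into the three lists with `zip(*combined)` if the rest of the notebook expects separate `Postcode_u`, `Borough_u` and `Neighbourhood_u` lists.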


### Creating the pandas DataFrame

In [5]:
toronto_dict = {'Postal Code':Postcode_u, 'Borough':Borough_u, 'Neighbourhood':Neighbourhood_u}
df_toronto = pd.DataFrame.from_dict(toronto_dict)
df_toronto.to_csv('toronto_part1.csv')
df_toronto.head(14)

|   | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1T | Scarborough | Tam O'Shanter |
| 1 | M9A | Etobicoke | Islington Avenue |
| 2 | M6L | North York | Downsview |
| 3 | M6J | West Toronto | Little Portugal, Trinity |
| 4 | M4B | East York | Woodbine Gardens, Parkview Hill |
| 5 | M6A | North York | Lawrence Heights, Lawrence Manor |
| 6 | M3K | North York | CFB Toronto |
| 7 | M5A | Downtown Toronto | Harbourfront, Regent Park |
| 8 | M1P | Scarborough | Dorset Park, Scarborough Town Centre, Wexford ... |
| 9 | M4X | Downtown Toronto | Cabbagetown, St. James Town |
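`DataFrame.to_csv` writes the row index by default, so reading `toronto_part1.csv` back with `pd.read_csv` would produce an extra index column named `Unnamed: 0`. A small sketch with two rows from the output above, round-tripped through an in-memory `io.StringIO` buffer instead of a file:

```python
import io

import pandas as pd

df = pd.DataFrame({'Postal Code': ['M1T', 'M9A'],
                   'Borough': ['Scarborough', 'Etobicoke'],
                   'Neighbourhood': ["Tam O'Shanter", 'Islington Avenue']})

# default: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))     # ['Unnamed: 0', 'Postal Code', 'Borough', 'Neighbourhood']

# index=False writes only the three data columns
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(without_index.columns))  # ['Postal Code', 'Borough', 'Neighbourhood']
```

Passing `index=False` when saving is the simpler fix; alternatively, `pd.read_csv(..., index_col=0)` restores the saved index on load.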


In [6]:
df_toronto.shape

(77, 3)

#### The number of unique postal codes is 77. If you think any part of my code is wrong, feel free to tell me. Thanks!

## PART 2 - Geographical coordinates

In [11]:
# reading the coordinates CSV file into a data frame

co_ordinates=pd.read_csv('Geospatial_Coordinates.csv')

In [14]:
# merging both data frames on the shared 'Postal Code' column (an inner join by default)

df_toronto_co = pd.merge(df_toronto,co_ordinates, on='Postal Code')
df_toronto_co.head(10)

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1T | Scarborough | Tam O'Shanter | 43.781638 | -79.304302 |
| 1 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |
| 2 | M6L | North York | Downsview | 43.713756 | -79.490074 |
| 3 | M6J | West Toronto | Little Portugal, Trinity | 43.647927 | -79.41975 |
| 4 | M4B | East York | Woodbine Gardens, Parkview Hill | 43.706397 | -79.309937 |
| 5 | M6A | North York | Lawrence Heights, Lawrence Manor | 43.718518 | -79.464763 |
| 6 | M3K | North York | CFB Toronto | 43.737473 | -79.464763 |
| 7 | M5A | Downtown Toronto | Harbourfront, Regent Park | 43.65426 | -79.360636 |
| 8 | M1P | Scarborough | Dorset Park, Scarborough Town Centre, Wexford ... | 43.75741 | -79.273304 |
| 9 | M4X | Downtown Toronto | Cabbagetown, St. James Town | 43.667967 | -79.367675 |
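Because `pd.merge` performs an inner join by default, any postal code missing from `Geospatial_Coordinates.csv` would silently disappear from the result. A hedged sketch of a check: a left join with `indicator=True` flags unmatched rows in a `_merge` column, and `validate='one_to_one'` raises `pandas.errors.MergeError` if either side repeats a postal code. The small frames below reuse coordinates from the merged output above plus one made-up code, `M0X`, that has no coordinates.

```python
import pandas as pd

# M1T and M9A come from the output above; M0X is a made-up code with no coordinates
df_codes = pd.DataFrame({'Postal Code': ['M1T', 'M9A', 'M0X'],
                         'Borough': ['Scarborough', 'Etobicoke', 'Unknown']})
coords = pd.DataFrame({'Postal Code': ['M1T', 'M9A'],
                       'Latitude': [43.781638, 43.667856],
                       'Longitude': [-79.304302, -79.532242]})

# left join keeps every postal code; '_merge' records where each row was found
merged = pd.merge(df_codes, coords, on='Postal Code', how='left',
                  validate='one_to_one', indicator=True)
missing = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(missing)  # ['M0X']

# a duplicated postal code on the right-hand side makes validate='one_to_one' raise
try:
    pd.merge(df_codes, pd.concat([coords, coords]), on='Postal Code',
             validate='one_to_one')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)  # True
```

If `missing` is empty, the inner join used in the notebook loses nothing, and the `_merge` column can be dropped with `merged.drop(columns='_merge')`.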
