# Section 1 - Toronto Neighborhood segmenting & clustering

The code is split into four steps:  
1) Converting the Wikipedia table into a dataframe  
2) Cleaning the dataframe to remove all postal codes with an unassigned borough  
3) Cleaning the dataframe to replace the neighborhood with the borough where the neighborhood is not assigned  
4) Merging neighborhood values for rows that share the same postal code

### 1. Converting Wikipedia table into a dataframe

In [1]:
import pandas as pd

# Wikipedia page listing the Canadian postal codes that start with "M"
page = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

# read_html returns a list of DataFrames, one per matching <table>
wikitables = pd.read_html(page, attrs={"class": "wikitable"})
toronto_neighborhood = wikitables[0]
toronto_neighborhood.columns = ['Postalcode', 'Borough', 'Neighborhood']
toronto_neighborhood.head()

| | Postalcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |


In [2]:
toronto_neighborhood.shape

(287, 3)

### 2. Cleaning dataframe to remove all postcodes with unassigned borough

In [3]:
# Keep only rows whose borough is assigned, then renumber the index from 0
toronto_neighborhood = toronto_neighborhood[toronto_neighborhood.Borough != 'Not assigned']
toronto_neighborhood = toronto_neighborhood.reset_index(drop=True)
toronto_neighborhood.head()

| | Postalcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights |
| 4 | M6A | North York | Lawrence Manor |


In [4]:
toronto_neighborhood.shape

(210, 3)
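
The filtering step can be sketched on a toy frame. This is a minimal illustration, assuming made-up postal codes and names (`X1A`, `Borough A`, `Hood 1`, and so on are placeholders, not rows from the Wikipedia table). It also shows that `sort_values` returns a new dataframe by default, so the result only takes effect when it is assigned to a name.

```python
import pandas as pd

# Toy rows, invented for illustration only
df = pd.DataFrame({
    'Postalcode': ['X1A', 'X3A', 'X2A'],
    'Borough': ['Not assigned', 'Borough B', 'Borough A'],
    'Neighborhood': ['Not assigned', 'Hood 1', 'Hood 2'],
})

# Boolean mask keeps rows with an assigned borough;
# reset_index(drop=True) renumbers the remaining rows from 0
assigned = df[df['Borough'] != 'Not assigned'].reset_index(drop=True)

# sort_values returns a new frame, so assign the result to keep the ordering
assigned_sorted = assigned.sort_values(by='Postalcode').reset_index(drop=True)

print(assigned_sorted)
```

Assigning the result, rather than passing `inplace=True`, keeps each step explicit and avoids modifying a filtered slice in place.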

### 3. Cleaning dataframe to replace neighborhood with borough if neighborhood is not assigned

In [5]:
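# Where the neighborhood is unassigned, fall back to the borough name.
# A single .loc assignment writes to the dataframe itself, not to a temporary copy.
mask = toronto_neighborhood['Neighborhood'] == 'Not assigned'
toronto_neighborhood.loc[mask, 'Neighborhood'] = toronto_neighborhood.loc[mask, 'Borough']
toronto_neighborhood.head()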

| | Postalcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights |
| 4 | M6A | North York | Lawrence Manor |
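
One way to express the borough fallback without a row-by-row loop is `Series.where`. The sketch below assumes a toy frame with placeholder values (not data from the notebook) and is only one of several equivalent vectorized forms.

```python
import pandas as pd

# Toy rows, invented for illustration only
df = pd.DataFrame({
    'Postalcode': ['X1A', 'X2A'],
    'Borough': ['Borough A', 'Borough B'],
    'Neighborhood': ['Not assigned', 'Hood 2'],
})

# Series.where keeps a value where the condition is True and takes the
# replacement (here the borough of the same row) where it is False
df['Neighborhood'] = df['Neighborhood'].where(
    df['Neighborhood'] != 'Not assigned', df['Borough']
)

print(df)
```

Because the whole column is reassigned in one statement, this avoids chained indexing such as `df.iloc[i][2] = ...`, which may write to a temporary copy instead of the dataframe.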


Check whether any neighborhoods are still unassigned

In [6]:
# Count rows whose neighborhood is still 'Not assigned'
unassigned = (toronto_neighborhood['Neighborhood'] == 'Not assigned').sum()
if unassigned:
    print("Unassigned neighborhoods exist")
else:
    print("No unassigned neighborhoods exist")

No unassigned neighborhoods exist
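
The same kind of check can cover the `Borough` and `Neighborhood` columns in one pass. This is a hedged sketch on a toy frame with placeholder values; `remaining` and `has_unassigned` are illustrative names.

```python
import pandas as pd

# Toy rows, invented for illustration only
df = pd.DataFrame({
    'Postalcode': ['X1A', 'X2A'],
    'Borough': ['Borough A', 'Borough B'],
    'Neighborhood': ['Borough A', 'Hood 2'],
})

# eq() compares element-wise; any() reduces each column to a single boolean
remaining = df[['Borough', 'Neighborhood']].eq('Not assigned').any()

# True if either column still contains the placeholder value
has_unassigned = bool(remaining.any())

print(remaining)
print("Unassigned values exist" if has_unassigned else "No unassigned values exist")
```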


### 4. Merging neighborhood values for rows that share the same postal code

In [7]:
Toronto = toronto_neighborhood.groupby(['Postalcode','Borough'])['Neighborhood'].apply(', '.join).reset_index()
Toronto.head()

| | Postalcode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |


In [8]:
Toronto.shape

(103, 3)
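
To see what the grouping step does, the sketch below applies the same `groupby` and `', '.join` pattern to a toy frame. The values are placeholders, and the uniqueness check at the end is an added sanity test, not part of the original notebook.

```python
import pandas as pd

# Toy rows, invented for illustration only: X1A appears twice
df = pd.DataFrame({
    'Postalcode': ['X1A', 'X1A', 'X2A'],
    'Borough': ['Borough A', 'Borough A', 'Borough B'],
    'Neighborhood': ['Hood 1', 'Hood 2', 'Hood 3'],
})

# Group rows by postal code and borough, then join the neighborhood
# names of each group into one comma-separated string
merged = (
    df.groupby(['Postalcode', 'Borough'])['Neighborhood']
      .agg(', '.join)
      .reset_index()
)

# After merging, each postal code should appear exactly once
assert merged['Postalcode'].is_unique

print(merged)
```

On the real data, the same uniqueness check is a quick way to confirm that the row count after merging equals the number of distinct postal codes.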