# Segmenting and Clustering Neighborhoods in Toronto - Part 1

In this notebook I scrape the Toronto neighborhood data from the Wikipedia page below and store it in a pandas DataFrame.

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

#### Import all necessary libraries

In [2]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

#### Pull table information from Wikipedia site

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
res = requests.get(url)
res.raise_for_status()  # stop early if the page could not be fetched

soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')  # every <table> element on the page

In [4]:
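tables = table[0]  # the first table on the page holds the postal codes
# Collect the text of every header and data cell, one list per table row
tab_data = [[cell.text for cell in row.find_all(["th","td"])]
                        for row in tables.find_all("tr")]
df = pd.DataFrame(tab_data)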
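
The cell-by-cell extraction above can be tried without a network call. The following is a minimal sketch on a small hand-written HTML snippet: the `html` string and its rows are made up for illustration, `html.parser` is swapped in for `lxml` to avoid an extra dependency, and the names `first_table`, `rows`, and `demo_df` are hypothetical. It also uses `get_text(strip=True)`, which is one possible way to drop the trailing newlines at extraction time instead of cleaning them up afterwards.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia page: a header row plus two data rows,
# with the same trailing newlines inside each cell that the real page produces
html = """
<table>
  <tr><th>Postal Code
</th><th>Borough
</th><th>Neighborhood
</th></tr>
  <tr><td>M1A
</td><td>Not assigned
</td><td>Not assigned
</td></tr>
  <tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
first_table = soup.find_all("table")[0]

# One list of cell texts per <tr>; strip=True removes surrounding whitespace
rows = [[cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in first_table.find_all("tr")]

# Use the rows after the header as data and name the columns explicitly
demo_df = pd.DataFrame(rows[1:], columns=["PostalCode", "Borough", "Neighborhood"])
print(demo_df)
```

Building the frame from `rows[1:]` keeps the header row out of the data from the start, so no later slicing step is needed in this variant.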

#### Format dataframe of postal code information

In [5]:
df.columns = ['PostalCode','Borough','Neighborhood']

In [6]:
df = df.iloc[1:].copy()  # drop the first row, which holds the scraped header cells
df.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 1 | M1A\n | Not assigned\n | Not assigned\n |
| 2 | M2A\n | Not assigned\n | Not assigned\n |
| 3 | M3A\n | North York\n | Parkwoods\n |
| 4 | M4A\n | North York\n | Victoria Village\n |
| 5 | M5A\n | Downtown Toronto\n | Regent Park, Harbourfront\n |


In [7]:
df['PostalCode'] = df['PostalCode'].str.replace('\n','')
df.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 1 | M1A | Not assigned\n | Not assigned\n |
| 2 | M2A | Not assigned\n | Not assigned\n |
| 3 | M3A | North York\n | Parkwoods\n |
| 4 | M4A | North York\n | Victoria Village\n |
| 5 | M5A | Downtown Toronto\n | Regent Park, Harbourfront\n |


In [8]:
df['Borough'] = df['Borough'].str.replace('\n','')
df['Neighborhood'] = df['Neighborhood'].str.replace('\n','')
df.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 1 | M1A | Not assigned | Not assigned |
| 2 | M2A | Not assigned | Not assigned |
| 3 | M3A | North York | Parkwoods |
| 4 | M4A | North York | Victoria Village |
| 5 | M5A | Downtown Toronto | Regent Park, Harbourfront |
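
The three per-column `str.replace` calls could also be collapsed into a single pass. A minimal sketch, assuming every column holds strings; the toy frame `raw` below is built by hand from rows shown in the outputs above and is not the scraped data. Note that `str.strip()` only removes leading and trailing whitespace, whereas `str.replace('\n','')` removes newlines anywhere in the value, so the two are equivalent only when the newlines sit at the ends, as they do here.

```python
import pandas as pd

# Toy rows with the trailing newlines still attached (hypothetical sample)
raw = pd.DataFrame({
    "PostalCode": ["M3A\n", "M4A\n"],
    "Borough": ["North York\n", "North York\n"],
    "Neighborhood": ["Parkwoods\n", "Victoria Village\n"],
})

# Apply Series.str.strip() column by column to clean the whole frame at once
clean = raw.apply(lambda col: col.str.strip())
print(clean)
```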


In [9]:
df = df[df['Borough'] != 'Not assigned']
df.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 3 | M3A | North York | Parkwoods |
| 4 | M4A | North York | Victoria Village |
| 5 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 6 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 7 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


In [14]:
df.shape

(103, 3)
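
Beyond checking the shape, a few assertions can confirm that the filter did what was intended. The sketch below runs on a hand-built toy frame (`toy`, hypothetical, using rows from the outputs above) rather than the scraped data. The uniqueness check is an assumption about the data, namely that each postal code appears in exactly one row; the notebook does not state this. `reset_index(drop=True)` is an optional extra that makes the row labels start at 0 again after filtering.

```python
import pandas as pd

# Toy frame with the same columns as the notebook's df (hypothetical sample)
toy = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Same filter as in the notebook, then renumber the rows from 0
filtered = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)

# No unassigned boroughs should remain
assert (filtered["Borough"] != "Not assigned").all()
# Assumption: one row per postal code
assert filtered["PostalCode"].is_unique

print(filtered.shape)
```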