# Segmenting and Clustering Neighborhoods in **Toronto**

In this notebook, we scrape a table from a Wikipedia page into a data frame. The table lists the Postal Code and Borough of each Neighborhood in the city of Toronto, Canada.


In [1]:
# Dependencies
import pandas as pd
import numpy as np


# FIRST TASK

Using pandas to scrape the table into a data frame


In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


As we can see, some Postal Codes are assigned to more than one neighborhood. In those cases, the neighborhoods already share a single row, separated by commas.
We will drop the rows with unassigned Boroughs, and we will replace each unassigned Neighborhood with the name of its Borough.


In [3]:
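# Drop unassigned Boroughs and replace unassigned Neighborhoods
# .copy() gives an independent frame, so the assignment below does not act on a slice
df = df[df['Borough'] != 'Not assigned'].copy()
# Where the Neighborhood is unassigned, use the Borough name instead
df['Neighborhood'] = df['Neighborhood'].mask(df['Neighborhood'] == 'Not assigned', df['Borough'])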
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
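
To see the two cleaning steps in isolation, here is a minimal sketch on a small made-up frame. The rows are illustrative only (the code `M9Z` and its values are hypothetical, not taken from the Wikipedia table), and the approach with a boolean mask and `.loc` is one possible way to do it, not necessarily the only one.

```python
import pandas as pd

# Hypothetical sample rows that mimic the layout of the scraped table.
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M9Z"],
    "Borough": ["Not assigned", "North York", "Etobicoke"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Step 1: keep only rows whose Borough is assigned.
# .copy() makes an independent frame so later assignments are unambiguous.
cleaned = sample[sample["Borough"] != "Not assigned"].copy()

# Step 2: where the Neighborhood is unassigned, fall back to the Borough name.
unassigned = cleaned["Neighborhood"] == "Not assigned"
cleaned.loc[unassigned, "Neighborhood"] = cleaned.loc[unassigned, "Borough"]

# Renumber the rows from 0 after the filtering.
cleaned = cleaned.reset_index(drop=True)
print(cleaned)
```

In this sketch the `M1A` row is dropped, and the `M9Z` row ends up with its Borough name as its Neighborhood.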


In [4]:
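# reset_index returns a new frame, so assign it back; drop=True discards the old index
df = df.reset_index(drop=True)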
df.shape

(103, 3)

The new data frame has 103 rows.
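
Beyond the row count, a few quick checks can confirm that the cleaning did what we expect. The sketch below uses a hypothetical helper, `check_cleaned`, and a two-row example frame built from rows shown above; it assumes the column names `Postal Code`, `Borough`, and `Neighborhood`.

```python
import pandas as pd

def check_cleaned(frame: pd.DataFrame) -> dict:
    """Return simple sanity checks for a cleaned postal-code table."""
    return {
        "rows": len(frame),
        "unassigned_boroughs": int((frame["Borough"] == "Not assigned").sum()),
        "unassigned_neighborhoods": int((frame["Neighborhood"] == "Not assigned").sum()),
        "duplicate_postal_codes": int(frame["Postal Code"].duplicated().sum()),
    }

# Two rows copied from the output above, used only as example input.
example = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighborhood": ["Parkwoods", "Victoria Village"],
})

report = check_cleaned(example)
print(report)
```

Calling `check_cleaned(df)` on the full data frame should then report zero unassigned values and zero duplicate Postal Codes if the cleaning worked as intended.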

# SECOND TASK

We'll import a CSV file containing the Latitude and Longitude of every Postal Code in Toronto.
(We use this file because geocoder wasn't working.)

In [20]:
# Get the latitude and longitude coordinates of each postal code from a CSV file

df_PC = pd.read_csv('http://cocl.us/Geospatial_data')
df_PC.head()


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [21]:
# Merging both data frames by the 'Postal Code' column

df = pd.merge(df, df_PC, on='Postal Code')
df



Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
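
Note that `pd.merge` performs an inner join by default, so a Postal Code missing from the coordinates file would be dropped silently. As a hedged sketch, a left join with `indicator=True` makes such rows visible, and `validate="one_to_one"` raises an error if a Postal Code appears more than once on either side. The frames below are small illustrations: the coordinates are copied from the output above, while the code `M0X` and its Borough are made up.

```python
import pandas as pd

# Small illustrative frames; "M0X" is a made-up code with no coordinates.
boroughs = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M0X"],
    "Borough": ["North York", "North York", "Hypothetical"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# Left join keeps every row of `boroughs`; the `_merge` column records
# whether a match was found in `coords`.
checked = pd.merge(
    boroughs,
    coords,
    on="Postal Code",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Postal Codes that found no coordinates.
missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

If `missing` is empty for the real data, the inner join used above loses no rows.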
