# Segmenting and Clustering Neighborhoods in Toronto

In this exercise I used the BeautifulSoup package to explore web scraping.
I fetched the page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M with requests and parsed it with BeautifulSoup.

In [1]:
# Importing the required packages
import numpy as np 
import pandas as pd 
import requests
from bs4 import BeautifulSoup

In [17]:
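# Fetching the page and extracting its first table as an HTML string
URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
r = requests.get(URL)
r.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(r.content, 'html5lib')  # the html5lib parser must be installed separately
table = str(soup.table)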



## Data extraction and cleansing

In [32]:
# Using pandas to clean the data
from io import StringIO

# Wrapping the string in StringIO avoids the literal-HTML deprecation warning in newer pandas
dfs = pd.read_html(StringIO(table))
df = dfs[0]

# Dropping the rows where Borough is 'Not assigned'
df1 = df[df['Borough'] != 'Not assigned']

# Combining the neighbourhoods that share a postal code
df2 = df1.groupby(['Postal code', 'Borough'], sort=False).agg(', '.join)
df2.reset_index(inplace=True)

# Replacing 'Not assigned' neighbourhood names with the name of the borough
df2['Neighborhood'] = np.where(df2['Neighborhood'] == 'Not assigned', df2['Borough'], df2['Neighborhood'])

df2.head(11)


| | Postal code | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park / Harbourfront |
| 3 | M6A | North York | Lawrence Manor / Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park / Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue |
| 6 | M1B | Scarborough | Malvern / Rouge |
| 7 | M3B | North York | Don Mills |
| 8 | M4B | East York | Parkview Hill / Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson |
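
To make the three cleaning steps concrete, here is a minimal sketch on a small hand-made table. The rows are invented for illustration (they are not taken from the Wikipedia page), and the names `toy`, `toy1` and `toy2` are hypothetical; only the column names mirror the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, made up for illustration only
toy = pd.DataFrame({
    'Postal code': ['M1A', 'M2A', 'M2A', 'M3A'],
    'Borough': ['Not assigned', 'North York', 'North York', "Queen's Park"],
    'Neighborhood': ['Not assigned', 'Alpha', 'Beta', 'Not assigned'],
})

# Step 1: drop rows that have no borough
toy1 = toy[toy['Borough'] != 'Not assigned']

# Step 2: one row per postal code, neighbourhood names joined with ', '
toy2 = toy1.groupby(['Postal code', 'Borough'], sort=False).agg(', '.join)
toy2.reset_index(inplace=True)

# Step 3: fall back to the borough name where no neighbourhood is assigned
toy2['Neighborhood'] = np.where(
    toy2['Neighborhood'] == 'Not assigned', toy2['Borough'], toy2['Neighborhood']
)

print(toy2)
```

In this toy case the two `M2A` rows collapse into a single row, and the `M3A` row takes its borough name as its neighbourhood.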


In [33]:
# Shape data
df2.shape

(103, 3)

## Getting latitude/longitude and merging

In [34]:
# Loading the latitude and longitude for each postal code
df3 = pd.read_csv('http://cocl.us/Geospatial_data')
df3.dtypes


Postal Code     object
Latitude       float64
Longitude      float64
dtype: object

In [31]:
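# Joining on the postal code; the key column is named differently in each frame, so both are kept
df4 = pd.merge(df2, df3, left_on='Postal code', right_on='Postal Code')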
df4.shape

(103, 6)
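
The merged frame has six columns because the two key columns differ only in capitalisation ('Postal code' and 'Postal Code'), so both survive the join. The following is a hedged sketch of one way to check the join and drop the duplicate key. The frames `left` and `right` are hypothetical stand-ins for `df2` and `df3`, and every value in them is a placeholder rather than real data.

```python
import pandas as pd

# Hypothetical stand-ins for df2 and df3; all values are placeholders
left = pd.DataFrame({
    'Postal code': ['M1A', 'M2A', 'M3A'],
    'Borough': ['B1', 'B2', 'B3'],
    'Neighborhood': ['N1', 'N2', 'N3'],
})
right = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M9Z'],
    'Latitude': [1.0, 2.0, 9.0],
    'Longitude': [-1.0, -2.0, -9.0],
})

# how='left' keeps every row of `left`; indicator=True adds a '_merge' column;
# validate='one_to_one' raises if a postal code is duplicated on either side
merged = pd.merge(
    left, right,
    left_on='Postal code', right_on='Postal Code',
    how='left', indicator=True, validate='one_to_one',
)

# Postal codes that found no coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal code'].tolist()
print(unmatched)

# Drop the helper column and the duplicate key
tidy = merged.drop(columns=['Postal Code', '_merge'])
print(tidy.shape)
```

The notebook's default inner join silently drops unmatched postal codes, so comparing the shape before and after the merge (103 rows in both cases here) serves the same purpose when no rows are expected to be lost.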

## Visualizing the Maps

In [35]:
import folium  # interactive map rendering library

In [37]:
# Centering the map on Toronto
map_toronto = folium.Map(location=[43.651070, -79.347015], zoom_start=10)

# Adding one circle marker per postal code, labelled "Neighborhood, Borough"
for lat, lng, borough, neighbourhood in zip(df4['Latitude'], df4['Longitude'], df4['Borough'], df4['Neighborhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
    ).add_to(map_toronto)

map_toronto
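
The map is centred on a hard-coded coordinate. As an alternative sketch, the centre can be derived from the data by averaging the latitude and longitude columns. The helper name `map_center` and the frame `points` are hypothetical, and the coordinates are placeholders rather than values from the geospatial file.

```python
import pandas as pd


def map_center(frame: pd.DataFrame) -> list:
    """Return [mean latitude, mean longitude] of the rows in `frame`."""
    return [float(frame['Latitude'].mean()), float(frame['Longitude'].mean())]


# Hypothetical stand-in for df4 with placeholder coordinates
points = pd.DataFrame({
    'Latitude': [43.0, 44.0],
    'Longitude': [-79.0, -80.0],
})

center = map_center(points)
print(center)
```

The returned list can be passed as `location=` to `folium.Map`, and `map_toronto.save('toronto_map.html')` (the file name is only an example) writes the finished map to a standalone HTML file for viewing outside the notebook.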