<center><h1>Segmenting and Clustering Neighborhoods in Toronto</h1></center>

## Part 1 :  Obtain Toronto neighbourhood data from Wikipedia and create a dataframe 

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
import urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Download the Wikipedia page listing postal codes that start with M
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
response = requests.get(url)
response.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(response.content, 'html.parser')

# Locate the postal code table and collect its rows
toronto_table = soup.find('table', attrs={'class': 'wikitable sortable'})
toronto_table_rows = toronto_table.tbody.find_all('tr')

# Column names come from the header row
t_headers = []
for th in toronto_table_rows[0].find_all('th'):
    t_headers.append(th.text.replace('\n', ' ').strip())

# One empty list per column, filled cell by cell below
t_data = {header: [] for header in t_headers}

for tr in toronto_table_rows[1:]:
    for td, th in zip(tr.find_all('td'), t_headers):
        t_data[th].append(td.text.replace('\n', ' ').strip())

df = pd.DataFrame(t_data)
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."
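
The header-then-rows parsing pattern used above can be tried offline on a small inline table. The sketch below is only illustrative: the HTML string is a hypothetical stand-in for the Wikipedia markup (its rows are copied from the output above), and `sample_df` is a name introduced here. It calls `find_all('tr')` directly on the table because `html.parser` does not add a `<tbody>` element when the markup lacks one.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table; rows copied from the output above.
html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")
rows = table.find_all("tr")

# The first row holds the column names, the remaining rows hold the data cells.
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
records = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in rows[1:]
]

sample_df = pd.DataFrame(records, columns=headers)
print(sample_df)
```

Building a list of row records and passing `columns=headers` avoids keeping one list per column in sync by hand.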


<p>This is the original table. Next, we remove all rows where <b>Borough</b> is <b>Not assigned</b>: first replace <b>Not assigned</b> with <b>NaN</b>, then drop the rows where <b>Borough</b> is <b>NaN</b>.</p>

In [2]:
df['Borough'] = df['Borough'].replace('Not assigned', np.nan)  # mark unassigned boroughs as missing
df.dropna(subset=['Borough'], inplace=True)                    # drop rows without a borough
df.reset_index(drop=True,inplace=True)
df

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
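
As an alternative to the replace-then-drop steps, the same rows can be selected with a single boolean mask. This is a minimal sketch on a few rows copied from the original table; `sample` and `assigned` are names introduced here for illustration.

```python
import pandas as pd

# A few rows copied from the original table; the real dataframe has many more.
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows whose Borough is assigned, then renumber the index from 0.
assigned = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```

The mask version leaves the original dataframe untouched, which can make it easier to rerun the cell.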


<p>Let's check whether any <b>Not assigned</b> value exists in the <b>Neighbourhood</b> column.</p>

In [3]:
df[df['Neighbourhood']=='Not assigned']

Unnamed: 0,Postal Code,Borough,Neighbourhood


<p>The result is empty, so there is no <b>Not assigned</b> value in the <b>Neighbourhood</b> column.</p>
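
The same check can also be written as a count per column, which is easy to assert on. A minimal sketch on rows copied from the cleaned table; `sample` and `placeholder_counts` are illustrative names.

```python
import pandas as pd

# Rows copied from the cleaned table above.
sample = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
})

# Count how many cells in each column still hold the placeholder text.
placeholder_counts = (sample == "Not assigned").sum()
print(placeholder_counts)
```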

In [4]:
df.shape

(103, 3)

## Part 2 : Get the latitude and the longitude coordinates of each neighborhood

<p>Let's use the <b>Geospatial_Coordinates.csv</b> file to get the latitude and longitude of each postal code.</p>

In [5]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
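
Because the loading cell is hidden, here is a minimal sketch of how such a file might be read with `pd.read_csv`. It is not the original code: the CSV text below only reproduces the five rows shown above, and `geo_sample` is a name introduced here so that it does not overwrite the notebook's `geo_data`.

```python
import io
import pandas as pd

# Stand-in for Geospatial_Coordinates.csv, limited to the five rows shown above.
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476
"""

# With the real file, the argument would be its path instead of a StringIO
# object (the exact location is an assumption, since the cell was removed).
geo_sample = pd.read_csv(io.StringIO(csv_text))
print(geo_sample.head())
```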


<p>Merge the two dataframes on <b>Postal Code</b>.</p>

In [6]:
toronto_data=pd.merge(df,geo_data,on='Postal Code')
toronto_data

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
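
`pd.merge` defaults to an inner join, so a postal code missing from the coordinates file would be dropped silently. The sketch below shows one way to surface such rows with a left join and `indicator=True`. The frames are small illustrative samples, and the `M9Z` row is made up to demonstrate a miss.

```python
import pandas as pd

left = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],  # M9Z is a made-up code with no coordinates
    "Borough": ["North York", "North York", "Hypothetical"],
})
right = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every row of `left`; indicator=True adds a `_merge` column
# saying where each row came from; validate checks the keys are unique on both sides.
checked = pd.merge(left, right, on="Postal Code", how="left",
                   indicator=True, validate="one_to_one")

missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

An empty `missing` list would indicate that the inner join used above loses no rows.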


## Part 3 : Explore and cluster neighbourhoods in Toronto

<p>Let's work only with the boroughs whose name contains the word <b>Toronto</b>.</p>

In [7]:
# Filter and reset the index in one step, so the result is a new dataframe rather than a view
filtered_toronto_data = toronto_data[toronto_data['Borough'].str.contains('Toronto')].reset_index(drop=True)
filtered_toronto_data

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [8]:
filtered_toronto_data.shape

(39, 5)
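
To see how the filtered rows are distributed, the boroughs can be counted with `value_counts`. A minimal sketch on a handful of rows copied from the tables above, so the counts here are not those of the full dataset; `sample`, `toronto_only`, and `per_borough` are illustrative names.

```python
import pandas as pd

# A few rows copied from the tables above.
sample = pd.DataFrame({
    "Borough": ["North York", "Downtown Toronto", "Downtown Toronto",
                "East Toronto", "Etobicoke"],
    "Neighbourhood": ["Parkwoods", "Regent Park, Harbourfront", "St. James Town",
                      "The Beaches", "The Kingsway, Montgomery Road, Old Mill North"],
})

# na=False treats missing borough names as non-matches instead of propagating NaN.
mask = sample["Borough"].str.contains("Toronto", na=False)
toronto_only = sample[mask].reset_index(drop=True)

# Number of rows per borough among the matches.
per_borough = toronto_only["Borough"].value_counts()
print(per_borough)
```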

<p>Let's find the <b>latitude</b> and <b>longitude</b> coordinates of the city of Toronto.</p>

In [9]:
import folium
from geopy.geocoders import Nominatim

address='Toronto, ON'

geolocator = Nominatim(user_agent="foursquare_api")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto City are 43.6534817, -79.3839347.
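
`geolocator.geocode` returns `None` when it finds no match, in which case `location.latitude` would raise an `AttributeError`. The sketch below shows one possible guard. The helper `coordinates_or_fallback` and the stub geocoders are hypothetical names introduced here so the example runs without network access.

```python
from types import SimpleNamespace


def coordinates_or_fallback(geocode, address, fallback):
    """Return (latitude, longitude) from a geocode callable, or `fallback` if nothing is found."""
    location = geocode(address)
    if location is None:
        return fallback
    return (location.latitude, location.longitude)


# Stub geocoders standing in for geolocator.geocode (no network needed).
def found(address):
    return SimpleNamespace(latitude=43.6534817, longitude=-79.3839347)


def not_found(address):
    return None


print(coordinates_or_fallback(found, "Toronto, ON", (0.0, 0.0)))
print(coordinates_or_fallback(not_found, "Nowhere", (43.65, -79.38)))
```

In the notebook, the first argument would be `geolocator.geocode`. Network errors and timeouts raise exceptions rather than returning `None`, so they would need separate handling.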


<h4>Create a map of Toronto with neighborhoods superimposed on top</h4>

In [11]:
# create map of Toronto using latitude and longitude values
map_toronto= folium.Map(location=[latitude, longitude], zoom_start=10)

# add one circle marker per neighbourhood, with a "neighbourhood, borough" popup
for lat, lng, borough, neighborhood in zip(
        filtered_toronto_data['Latitude'],
        filtered_toronto_data['Longitude'],
        filtered_toronto_data['Borough'],
        filtered_toronto_data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto