<a href="https://de.linkedin.com/in/delafuenteaguilerapablo/de-de"><img src="https://media-exp1.licdn.com/dms/image/C4D03AQElWBXe___WeQ/profile-displayphoto-shrink_200_200/0?e=1609372800&v=beta&t=IdUiY7Y78vTI3jcLbSWUboqDUTKDU2XgKg4KPptmGVY" width="200" align="right"></a>

# Segmenting and Clustering Neighborhoods in Toronto

## Project by Pablo de la Fuente Aguilera
### I'm an engineer; my full background is on my LinkedIn profile: https://de.linkedin.com/in/delafuenteaguilerapablo/de-de
### This notebook is part of the Capstone Project.
### This project is the final assignment of the IBM Professional Certificate in Data Science, offered by Coursera.org

In [1]:
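# All libraries are imported here
import pandas as pd                     # tabular data handling
import numpy as np                      # NaN marker and numeric helpers
import geocoder                         # geocoding of addresses
import requests                         # HTTP requests
from bs4 import BeautifulSoup           # HTML parsing
import matplotlib.cm as cm              # colour maps
import matplotlib.colors as colors      # colour conversion
import folium                           # interactive maps
import lxml                             # HTML parser that pd.read_html can use
print('Libraries imported.')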

Libraries imported.


### First Part of the Assignment

In [2]:
print("We read and scrape the table:")
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
tables = pd.read_html(url, header=0)
table = tables[0]
table.head()

We read and scrape the table:


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [3]:
print('The shape of the table is:')
table.shape

The shape of the table is:


(180, 3)

In [4]:
print('Check the number of "Not assigned" in Borough')
table.Borough.value_counts()

Check the number of "Not assigned" in Borough


Not assigned        77
North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
East Toronto         5
East York            5
York                 5
Mississauga          1
Name: Borough, dtype: int64

In [5]:
print('Check the number of "Not assigned" in Neighborhood')
table.Neighbourhood.value_counts()

Check the number of "Not assigned" in Neighborhood


Not assigned                                   77
Downsview                                       4
Don Mills                                       2
Birch Cliff, Cliffside West                     1
Canada Post Gateway Processing Centre           1
                                               ..
Queen's Park, Ontario Provincial Government     1
Christie                                        1
Toronto Dominion Centre, Design Exchange        1
Berczy Park                                     1
Central Bay Street                              1
Name: Neighbourhood, Length: 100, dtype: int64

In [6]:
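print('"Not assigned" boroughs are replaced with NaN and dropped')
# Assign the result back rather than calling replace(inplace=True) on a column,
# which is not guaranteed to modify the original DataFrame in recent pandas.
table['Borough'] = table['Borough'].replace('Not assigned', np.nan)
# Drop the rows that now contain NaN and renumber the index from 0.
table = table.dropna(axis=0).reset_index(drop=True)
table.head(20)

"Not assigned" boroughs are replaced with NaN and dropped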


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"

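The same filtering step can also be written with a boolean mask, which avoids the detour through `NaN`. The following is a minimal sketch on toy data; the `toy` and `assigned` names and the three sample rows are assumptions made for illustration, not part of the notebook.

```python
import pandas as pd

# Toy data in the same layout as the scraped table (values are illustrative).
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Keep only rows whose Borough is assigned, then renumber the index from 0.
assigned = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned)
```

Either approach should leave the same rows; the mask version simply states the condition directly.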

In [7]:
print('We group Neighbourhoods with the same Postal Code')
# Join all neighbourhoods that share a postal code and borough into one comma-separated string
table = table.groupby(['Postal Code', 'Borough'])['Neighbourhood'].apply(', '.join)
table = table.reset_index()
table.head(20)

We group Neighbourhoods with the same Postal Code


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"

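To make the effect of the grouping visible, here is a small sketch on toy data in which one postal code appears on two rows. The `toy` and `grouped` names and the sample rows are assumptions for illustration; the scraped table may already hold one row per postal code, in which case the grouping leaves it unchanged apart from sorting.

```python
import pandas as pd

# Toy data: postal code M5A is split across two rows (illustrative values).
toy = pd.DataFrame({
    'Postal Code': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Regent Park', 'Harbourfront', 'Parkwoods'],
})

# Collapse rows sharing a postal code and borough into one row,
# joining their neighbourhood names with ', '.
grouped = (
    toy.groupby(['Postal Code', 'Borough'])['Neighbourhood']
       .agg(', '.join)
       .reset_index()
)
print(grouped)
```

Note that `groupby` sorts by the group keys by default, which is why the grouped table starts at `M1B` rather than `M3A`.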

In [8]:
print('Replace "Not assigned" with "Queen\'s Park" in Neighbourhood')
table = table.replace({'Not assigned': "Queen's Park"})
# The column is already named 'Postal Code', so the earlier rename of 'Postcode'
# had no effect and is dropped; the later merge joins on 'Postal Code'.
table.head()

Replace "Not assigned" with "Queen's Park" in Neighbourhood


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae

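The cell above substitutes a fixed string. A possible alternative, shown here only as a sketch and not as the notebook's method, is to fall back to the borough name of the same row. The `toy` frame, the `missing` mask, and the `M0X` postal code are hypothetical and used purely for illustration.

```python
import pandas as pd

# Toy data: one row has no neighbourhood name (illustrative values,
# 'M0X' is a made-up postal code).
toy = pd.DataFrame({
    'Postal Code': ['M0X', 'M3A'],
    'Borough': ['Downtown Toronto', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods'],
})

# Where the neighbourhood is unassigned, take the borough name instead.
missing = toy['Neighbourhood'] == 'Not assigned'
toy['Neighbourhood'] = toy['Neighbourhood'].mask(missing, toy['Borough'])
print(toy)
```

This keeps the replacement tied to each row's own borough, so it would still behave sensibly if more than one borough had unassigned neighbourhoods.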

In [9]:
print('The shape of the new table is:')
table.shape

The shape of the new table is:


(103, 3)

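Besides the shape, a few quick integrity checks can confirm that the cleaning did what was intended. The helper below is a hypothetical sketch (the name `check_clean` and the choice of checks are assumptions), demonstrated on toy data rather than on the scraped table.

```python
import pandas as pd

def check_clean(df):
    """Return basic integrity checks for a cleaned postal-code table."""
    return {
        # No borough should still carry the placeholder value.
        'no_unassigned_borough': not (df['Borough'] == 'Not assigned').any(),
        # After grouping, every postal code should appear exactly once.
        'unique_postal_codes': bool(df['Postal Code'].is_unique),
        # No cell should be missing.
        'no_missing_values': not df.isna().any().any(),
    }

# Toy data standing in for the cleaned table (illustrative values).
toy = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighbourhood': ['Malvern, Rouge', 'Rouge Hill, Port Union, Highland Creek'],
})
report = check_clean(toy)
print(report)
```

In the notebook, the same function could be called as `check_clean(table)` after the cleaning cells.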
### Second Part of the Assignment

In [10]:
lat_lon = pd.read_csv('https://cocl.us/Geospatial_data')
lat_lon.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [12]:
lat_lon_list = pd.merge(table,lat_lon,on='Postal Code')
lat_lon_list.head(20)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848

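`pd.merge` performs an inner join by default, so a postal code without coordinates would be dropped silently. A minimal sketch of how to surface such rows, using the `how`, `validate`, and `indicator` parameters of `pd.merge`, follows. The toy frames and the `M9Z` postal code are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for the cleaned table and the coordinates file.
# 'M9Z' is a made-up postal code with no matching coordinates.
left = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Etobicoke'],
})
right = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# A left join keeps every row of `left`; validate raises if postal codes
# are duplicated on either side; indicator adds a '_merge' column that
# records where each row came from.
merged = pd.merge(left, right, on='Postal Code', how='left',
                  validate='one_to_one', indicator=True)

# Postal codes that found no coordinates.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

If `unmatched` is empty, the inner join used in the notebook and this left join should return the same rows.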