# Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

## 1. Getting and cleaning the dataframe

First we read the tables from the Wikipedia page into dataframes.

In [101]:
import pandas as pd
link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

dfs = pd.read_html(link)
len(dfs)

3

The page yields 3 dataframes. The first one (index 0) is the table we need.

In [102]:
df = dfs[0]
df.head(5)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
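
Selecting the table by position works here, but the index could shift if the page layout changes. A minimal sketch of a more defensive selection, assuming the target table is the only one that contains all three expected column names; `pick_table` is a hypothetical helper, and the two small frames stand in for the list returned by `pd.read_html`:

```python
import pandas as pd

EXPECTED_COLUMNS = {"Postal Code", "Borough", "Neighbourhood"}


def pick_table(tables, expected=EXPECTED_COLUMNS):
    """Return the first dataframe whose columns include every expected name."""
    for table in tables:
        if expected.issubset(set(table.columns)):
            return table
    raise ValueError("No table with the expected columns was found")


# Stand-in data (hypothetical); in the notebook this list would be `dfs`.
tables = [
    pd.DataFrame({"Province": ["Ontario"], "Prefix": ["M"]}),
    pd.DataFrame(
        {
            "Postal Code": ["M3A", "M4A"],
            "Borough": ["North York", "North York"],
            "Neighbourhood": ["Parkwoods", "Victoria Village"],
        }
    ),
]

df_example = pick_table(tables)
print(df_example.shape)
```

Raising an error when nothing matches makes a layout change on the page visible immediately instead of silently picking the wrong table.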


The initial shape of the dataframe is 180x3. After dropping the rows whose borough is 'Not assigned', the new shape is 103x3.

In [103]:
print('Init shape:',df.shape)
# df2 = rows whose borough is 'Not assigned'
df2 = df[df.Borough == 'Not assigned']
print('Not assigned df shape:',df2.shape)
# Drop df2 from df
df.drop(df2.index, inplace=True)
print('final shape:',df.shape)
df

Init shape: (180, 3)
Not assigned df shape: (77, 3)
final shape: (103, 3)


Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
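
The same filtering can be written without an in-place drop. A minimal sketch on a small hand-made sample whose rows mirror the first lines of the table above; resetting the index is an optional extra step that the notebook does not perform:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
        "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
        "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
    }
)

# Keep only the rows whose borough is assigned, then renumber the index from 0.
assigned = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```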


We check how many rows in the cleaned dataframe still have a 'Not assigned' neighbourhood. The shape is 0x3, so there are none; we can keep the dataframe as it is without any further processing.

In [104]:
# Count rows with a 'Not assigned' neighbourhood (these would be renamed after their borough)
df2 = df[df.Neighbourhood == 'Not assigned']
print("Shape of not assigned neighbourhood:",df2.shape)

Shape of not assigned neighbourhood: (0, 3)
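
Had any such rows existed, one option would be to name the neighbourhood after its borough. A minimal sketch on made-up data; the "X0X" row is hypothetical and does not come from the Wikipedia table:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "Postal Code": ["M3A", "X0X"],  # "X0X" is a made-up code for illustration
        "Borough": ["North York", "Example Borough"],
        "Neighbourhood": ["Parkwoods", "Not assigned"],
    }
)

# Rows whose neighbourhood is missing take the name of their borough.
mask = sample["Neighbourhood"] == "Not assigned"
sample.loc[mask, "Neighbourhood"] = sample.loc[mask, "Borough"]
print(sample)
```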


As noted earlier, neighbourhoods that share a postal code already sit in the same row, separated by commas. No further modifications are needed; the final shape is 103x3.

In [105]:
df.shape

(103, 3)
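
If the source table had instead listed one neighbourhood per row, the rows could be combined per postal code. A minimal sketch under that assumption, using a hand-made sample in long form:

```python
import pandas as pd

long_form = pd.DataFrame(
    {
        "Postal Code": ["M5A", "M5A", "M3A"],
        "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
        "Neighbourhood": ["Regent Park", "Harbourfront", "Parkwoods"],
    }
)

# Join the neighbourhoods of each postal code into one comma-separated string.
combined = (
    long_form.groupby(["Postal Code", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```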

## 2. Add Latitude and Longitude to the dataframe

We first tried the geocoder package, but the service was not working properly.

In [106]:
"""
# !pip install geocoder
import geocoder # import geocoder

# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
while(lat_lng_coords is None):
  g = geocoder.google('Los Angeles, CA') #geocoder.google('{}, Toronto, Ontario'.format(M5G))
  lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]
print("lat: ",latitude, " long : ",longitude)
"""

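
The loop in the disabled cell above never terminates if the service keeps returning nothing. A minimal sketch of a bounded retry, where `lookup` is a hypothetical callable standing in for something like `lambda: geocoder.google(address).latlng`; the attempt count and delay are arbitrary choices:

```python
import time


def fetch_coords(lookup, max_attempts=5, delay_seconds=0.0):
    """Call `lookup` until it returns coordinates or the attempts run out."""
    for _ in range(max_attempts):
        coords = lookup()
        if coords is not None:
            return coords
        time.sleep(delay_seconds)
    return None  # the caller decides how to fall back


# Stand-in lookup (hypothetical): fails twice, then returns fixed coordinates.
responses = iter([None, None, [43.6534817, -79.3839347]])
coords = fetch_coords(lambda: next(responses))
print(coords)
```

Returning `None` after the last attempt lets the notebook switch to another source, such as a coordinates file, instead of hanging.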

So we loaded the geospatial data file instead.

In [107]:
link2geo = 'http://cocl.us/Geospatial_data'

df_geo = pd.read_csv(link2geo)
df_geo.head(5)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


We then merged it with our cleaned dataframe.

In [108]:
df = pd.merge(df_geo, df, on='Postal Code')
df.head(5)

Unnamed: 0,Postal Code,Latitude,Longitude,Borough,Neighbourhood
0,M1B,43.806686,-79.194353,Scarborough,"Malvern, Rouge"
1,M1C,43.784535,-79.160497,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,43.763573,-79.188711,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,43.770992,-79.216917,Scarborough,Woburn
4,M1H,43.773136,-79.239476,Scarborough,Cedarbrae
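
An inner merge silently drops postal codes that appear in only one of the two frames. A minimal sketch of a check for that, assuming each postal code occurs once per frame; it uses the `validate` and `indicator` options of `pd.merge` on a small sample in which "M9Z" is a made-up code with no coordinates:

```python
import pandas as pd

neighbourhoods = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C", "M9Z"],  # "M9Z" is hypothetical
        "Borough": ["Scarborough", "Scarborough", "Example Borough"],
    }
)
coords = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C"],
        "Latitude": [43.806686, 43.784535],
        "Longitude": [-79.194353, -79.160497],
    }
)

# A left merge keeps every neighbourhood row; `_merge` records where each row matched.
checked = pd.merge(
    neighbourhoods, coords, on="Postal Code", how="left",
    validate="one_to_one", indicator=True,
)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print("Postal codes without coordinates:", unmatched)
```

An empty `unmatched` list would confirm that no rows are lost when the two frames are joined.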


Then we rearrange the columns into the order we want.

In [109]:
df = df[['Postal Code', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude']]
df.head(5)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


## 3. Generating a map

First we get the latitude and longitude of Toronto.

In [110]:
#!pip install geopy
from geopy.geocoders import Nominatim

address = 'Toronto, ON'
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
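
If the geocoding service is unreachable, the map centre can also be approximated from the data itself. A minimal sketch of that alternative (not what the notebook does), taking the mean of the coordinates; the three rows are copied from the geospatial table shown earlier:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C", "M1E"],
        "Latitude": [43.806686, 43.784535, 43.763573],
        "Longitude": [-79.194353, -79.160497, -79.188711],
    }
)

# The mean of the points gives a rough centre for the initial map view.
center_lat = sample["Latitude"].mean()
center_lng = sample["Longitude"].mean()
print(f"Approximate centre: {center_lat:.4f}, {center_lng:.4f}")
```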


Finally, we add the neighbourhood data to the map and display it.

In [116]:
#!pip install folium
import folium

# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto