# Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

### For this assignment, we will explore and cluster the neighborhoods in Toronto.

First, let's import the libraries we will need for the exercise.

In [85]:
import pandas as pd # library for data analysis

# 1. Download and Explore Dataset

We will download the data table from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and transform it into a pandas dataframe.

In [86]:
df = pd.read_html(io="https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M",header=0, na_values=['Not assigned'])[0]
print('Data downloaded!')
df.head()

Data downloaded!


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


In [87]:
df.shape

(287, 3)

If a row has a borough but its neighborhood is "Not assigned" (read in as NaN above), the neighborhood will be the same as the borough.

In [89]:
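# assign the result back instead of calling fillna(inplace=True) on the column,
# which is not guaranteed to update df itself in recent pandas versions
df['Neighbourhood'] = df['Neighbourhood'].fillna(df['Borough'])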

Drop the rows whose borough is "Not assigned".

In [90]:
df.dropna(inplace=True)
df.shape

(210, 3)
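The two clean-up steps above can be sketched on a small toy frame. The rows below are made up for illustration and are not taken from the Wikipedia table; the names `toy`, `filled` and `cleaned` are assumptions of this sketch.

```python
import pandas as pd

# Hypothetical toy rows: None stands for a "Not assigned" cell read as missing
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M7A', 'M3A'],
    'Borough': [None, "Queen's Park", 'North York'],
    'Neighbourhood': [None, None, 'Parkwoods'],
})

# Step 1: a missing neighbourhood takes the borough from the same row
filled = toy.copy()
filled['Neighbourhood'] = filled['Neighbourhood'].fillna(filled['Borough'])

# Step 2: rows that still have a missing value (no borough at all) are dropped
cleaned = filled.dropna()
print(cleaned)
```

Only the row without a borough is removed; the row with a borough but no neighbourhood survives because it was filled first.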

Let's combine postal codes that span multiple neighborhoods into one row, with the neighborhoods separated by commas.

In [91]:
df = df.groupby(['Postcode','Borough'],as_index=False).agg(lambda s: ', '.join(s))
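As a rough illustration of what the grouping does, here is the same idea on a toy frame in which one postal code is split over two rows. The frame and the name `combined` are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical toy rows: postal code M1B appears twice, once per neighbourhood
toy = pd.DataFrame({
    'Postcode': ['M1B', 'M1B', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge', 'Malvern', 'Woburn'],
})

# One row per (Postcode, Borough); the neighbourhood names of each group
# are joined into a single comma-separated string
combined = (
    toy.groupby(['Postcode', 'Borough'], as_index=False)['Neighbourhood']
       .agg(', '.join)
)
print(combined)
```

Selecting the `Neighbourhood` column before aggregating makes it explicit which column is being joined; the notebook's lambda version behaves the same way here because `Neighbourhood` is the only non-key column.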

In [92]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


#### Let's check the number of rows in the df dataframe using the .shape attribute.

In [93]:
df.shape

(103, 3)

# 2. Get the latitude and the longitude coordinates of each neighborhood

Let's get the geographical coordinates of the neighborhoods from the provided CSV file.

In [94]:
latlong = pd.read_csv('http://cocl.us/Geospatial_data')
latlong.head(10)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


#### Joining the two dataframes to get a dataframe with longitude and latitude data

In [69]:
df_post = df.join(latlong.set_index('Postal Code'), on='Postcode')
df_post.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
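`DataFrame.join` performs a left join by default, so a postal code with no entry in the coordinates file would stay in the result with NaN coordinates. A minimal sketch of how one might check for that, using made-up toy frames (the code `M9Z`, the borough `Nowhere` and the variable names are hypothetical):

```python
import pandas as pd

# Hypothetical toy frames; 'M9Z' is a made-up code with no coordinates
codes = pd.DataFrame({
    'Postcode': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Nowhere'],
})
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# Left join: every row of `codes` is kept, matched on the postal code
joined = codes.join(coords.set_index('Postal Code'), on='Postcode')

# Postal codes that did not find a match in the coordinates table
unmatched = joined.loc[joined['Latitude'].isna(), 'Postcode'].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that every postal code received coordinates.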


#### Let's recheck and confirm the number of rows in the dataframe after adding longitude and latitude data, using the .shape attribute.

In [95]:
df_post.shape

(103, 5)

# 3. Explore Neighborhoods in Toronto

We will use the geopy library to get the latitude and longitude of Toronto by geocoding the address.

In [96]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
lat = location.latitude
lng = location.longitude

print('The geographical coordinates of Toronto are {}, {}.'.format(lat, lng))

The geographical coordinates of Toronto are 43.653963, -79.387207.
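geopy's `geocode` returns `None` when no match is found, in which case `location.latitude` would raise an `AttributeError`. A minimal sketch of a guard follows; the helper `coordinates_or_default` and the stand-in `fake_geocode` are hypothetical names introduced only for this illustration, and the stand-in replaces the network call so the sketch runs offline.

```python
from types import SimpleNamespace


def coordinates_or_default(geocode, address, default):
    """Return (latitude, longitude) for address, or default if nothing is found."""
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)


def fake_geocode(address):
    # Stand-in for geolocator.geocode: knows a single address, else returns None
    known = {'Toronto': SimpleNamespace(latitude=43.653963, longitude=-79.387207)}
    return known.get(address)


found = coordinates_or_default(fake_geocode, 'Toronto', (0.0, 0.0))
missing = coordinates_or_default(fake_geocode, 'Nowhereville', (0.0, 0.0))
print(found, missing)
```

In the notebook, one could pass `geolocator.geocode` in place of `fake_geocode`.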


Next, we will install and import the Folium library.

In [73]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library
print('Folium libraries imported.')

Solving environment: done

# All requested packages already installed.

Folium libraries imported.


#### Now, we will create a map of Toronto with neighborhoods superimposed on top.

In [97]:
map_toronto = folium.Map(location=[lat, lng], zoom_start=10)

# loop variables are named nb_lat, nb_lng so the Toronto centre stored in lat, lng stays intact
for nb_lat, nb_lng, borough, neighbourhood in zip(df_post['Latitude'], df_post['Longitude'], df_post['Borough'], df_post['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [nb_lat, nb_lng],
        radius=5,
        popup=label,
        color='maroon',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

### Display the map

In [101]:
map_toronto

### Now, we will group the Toronto postcode markers into clusters

In [102]:
from folium import plugins

# let's start again with a clean copy of the map of Toronto
map_toronto = folium.Map(location = [lat, lng], zoom_start = 10)

# instantiate a marker cluster object for the postcodes in the dataframe
postcodes = plugins.MarkerCluster().add_to(map_toronto)

# loop through the dataframe and add each data point to the marker cluster
for pc_lat, pc_lng, postcode in zip(df_post['Latitude'], df_post['Longitude'], df_post['Postcode']):
    # the popup shows this marker's own coordinates and postcode
    label = 'lat-long: {}, {}<br>Postcode: {}'.format(pc_lat, pc_lng, postcode)
    label = folium.Popup(label, parse_html=False)
    folium.Marker(
        location=[pc_lat, pc_lng],
        icon=folium.Icon(color='green', icon='ok-sign'),
        popup=label
    ).add_to(postcodes)

### Display the postcodes as marker clusters

In [103]:
map_toronto

# Thank you.