# Segmenting and Clustering Neighborhoods of Toronto, Canada


## Part 1 - Getting and wrangling the postal code data for Toronto

Steps:
* Use pandas to read the postal code table from the wiki page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M .

* Remove rows that do not have an assigned borough.

* Rename the columns to PostalCode, Borough, and Neighborhood.

* Replace each "Not assigned" neighborhood with the name of its borough.

* Combine neighborhoods that share a postal code into a single row.
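
The steps above can be sketched end to end on a small hand-made table. This is only an illustration: the rows are made up rather than scraped, `clean_postal_codes` is a hypothetical helper name, and the raw column names `Postcode` and `Neighbourhood` are assumed to match the headers of the wiki table as printed later in this notebook.

```python
import pandas as pd


def clean_postal_codes(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the wrangling steps to a raw postal code table."""
    # Drop rows without an assigned borough.
    df = raw[raw['Borough'] != 'Not assigned']
    # Rename the columns.
    df = df.rename(columns={'Postcode': 'PostalCode', 'Neighbourhood': 'Neighborhood'})
    # Fall back to the borough name when the neighborhood is not assigned.
    has_name = df['Neighborhood'] != 'Not assigned'
    df = df.assign(Neighborhood=df['Neighborhood'].where(has_name, df['Borough']))
    # One row per postal code, neighborhoods joined with commas.
    return df.groupby(['PostalCode', 'Borough'], as_index=False).agg({'Neighborhood': ','.join})


# Illustrative rows only, not the real scraped table.
toy = pd.DataFrame({
    'Postcode': ['M1A', 'M6A', 'M6A', 'M9A'],
    'Borough': ['Not assigned', 'North York', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Lawrence Heights', 'Lawrence Manor', 'Not assigned'],
})
cleaned = clean_postal_codes(toy)
print(cleaned)
```

Each step returns a new DataFrame, so the input table is left untouched and the function can be rerun safely.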


In [92]:
# Required imports
import pandas as pd


In [150]:
# Get postal codes from the wiki using pandas
# pd.read_html returns a list of DataFrames, one per table found on the page; the first one is the postal code table
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df_orig = pd.read_html(url)[0]


In [186]:
# Keep only the rows that have an assigned borough; drop rows whose borough is 'Not assigned'.
# Reset the index so it is contiguous again.
df_1 = df_orig[df_orig['Borough'] != 'Not assigned'].reset_index(drop=True)
print(df_1.columns)

Index(['Postcode', 'Borough', 'Neighbourhood'], dtype='object')


In [187]:
# Rename columns to PostalCode, Borough, and Neighborhood
df_2 = df_1.rename(columns={'Postcode': 'PostalCode', 'Neighbourhood': 'Neighborhood'})
print(df_2.columns)

Index(['PostalCode', 'Borough', 'Neighborhood'], dtype='object')


In [180]:
# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

def fix_neighborhood(row):
    curr_neighborhood = row['Neighborhood']
    if (curr_neighborhood == 'Not assigned'):
        return row['Borough']
    else:
        return curr_neighborhood


# before change: the M9A row reads "M9A, Queen's Park, Not assigned"
print(df_2[df_2['PostalCode'] == "M9A"])

# apply the change
df_2.loc[:, 'Neighborhood'] = df_2.apply(fix_neighborhood, axis=1)

# after change
print(df_2[df_2['PostalCode'] == "M9A"])

  PostalCode       Borough  Neighborhood
6        M9A  Queen's Park  Not assigned
  PostalCode       Borough  Neighborhood
6        M9A  Queen's Park  Queen's Park
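
A row-wise `apply` works, but the same replacement can also be written as one vectorized column operation. A minimal sketch using `Series.mask` on made-up rows (the `sample` table is illustrative, not the notebook's `df_2`):

```python
import pandas as pd

# Illustrative rows only.
sample = pd.DataFrame({
    'PostalCode': ['M9A', 'M3A'],
    'Borough': ["Queen's Park", 'North York'],
    'Neighborhood': ['Not assigned', 'Parkwoods'],
})

# Where the neighborhood is 'Not assigned', take the value from Borough instead.
unassigned = sample['Neighborhood'] == 'Not assigned'
sample['Neighborhood'] = sample['Neighborhood'].mask(unassigned, sample['Borough'])
print(sample)
```

On a table of this size the speed difference is negligible; the main benefit is that the intent fits in a single expression.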


In [172]:
df_2

|  | PostalCode | Borough | Neighborhood |
| --- | --- | --- | --- |
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Harbourfront |
| 3 | M6A | North York | Lawrence Heights |
| 4 | M6A | North York | Lawrence Manor |
| ... | ... | ... | ... |
| 205 | M8Z | Etobicoke | Kingsway Park South West |
| 206 | M8Z | Etobicoke | Mimico NW |
| 207 | M8Z | Etobicoke | The Queensway West |
| 208 | M8Z | Etobicoke | Royal York South West |


In [181]:
# Combine neighborhoods with same postcode

df_3 = df_2.groupby(['PostalCode', 'Borough']).agg({'Neighborhood': ','.join})

df_3.reset_index(inplace=True)

df_3.head(20)


|  | PostalCode | Borough | Neighborhood |
| --- | --- | --- | --- |
| 0 | M1B | Scarborough | Rouge,Malvern |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| 5 | M1J | Scarborough | Scarborough Village |
| 6 | M1K | Scarborough | East Birchmount Park,Ionview,Kennedy Park |
| 7 | M1L | Scarborough | Clairlea,Golden Mile,Oakridge |
| 8 | M1M | Scarborough | Cliffcrest,Cliffside,Scarborough Village West |
| 9 | M1N | Scarborough | Birch Cliff,Cliffside West |


In [190]:
print(df_3.shape)
print('The dataframe has {} postal codes, {} boroughs and {} neighborhoods.'.format(
        df_3['PostalCode'].nunique(),
        df_3['Borough'].nunique(),
        df_3.shape[0]
    )
)

(103, 3)
The dataframe has 103 postal codes, 11 boroughs and 103 neighborhoods.
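
Because neighborhoods were joined per postal code, the row count is the number of neighborhood groups, not the number of individual neighborhood names. A rough sketch of counting the individual names by splitting the joined strings again, shown on a few illustrative rows rather than the full `df_3`:

```python
import pandas as pd

# Illustrative rows in the same shape as the grouped table.
grouped = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M1G'],
    'Borough': ['Scarborough', 'Scarborough', 'Scarborough'],
    'Neighborhood': ['Rouge,Malvern', 'Highland Creek,Rouge Hill,Port Union', 'Woburn'],
})

n_rows = grouped.shape[0]  # one row per postal code
# Split each joined string into a list, then expand to one name per row.
names = grouped['Neighborhood'].str.split(',').explode()
n_names = names.nunique()  # distinct individual neighborhood names
print(n_rows, n_names)
```

This assumes neighborhood names never contain a comma themselves, which holds for the rows shown in this notebook.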


## Part 2 - Adding geographic coordinates of the neighborhoods

The coordinates come from the Geospatial_Coordinates.csv data set, which is merged with the postal code table on PostalCode.

In [183]:
df_coordinates = pd.read_csv('Geospatial_Coordinates.csv')
df_coordinates.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)

In [184]:
df_4 = pd.merge(df_3, df_coordinates, on='PostalCode', how='inner')
df_4

|  | PostalCode | Borough | Neighborhood | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | M1B | Scarborough | Rouge,Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| ... | ... | ... | ... | ... | ... |
| 98 | M9N | York | Weston | 43.706876 | -79.518188 |
| 99 | M9P | Etobicoke | Westmount | 43.696319 | -79.532242 |
| 100 | M9R | Etobicoke | Kingsview Village,Martin Grove Gardens,Richvie... | 43.688905 | -79.554724 |
| 101 | M9V | Etobicoke | Albion Gardens,Beaumond Heights,Humbergate,Jam... | 43.739416 | -79.588437 |
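
An inner merge silently drops any postal code that has no coordinates. One way to check for that, sketched here on made-up rows, is a left merge with `indicator=True`, plus `validate='one_to_one'` to confirm that each postal code appears once on both sides. The code `M9Z` below is a hypothetical unmatched entry added purely for illustration.

```python
import pandas as pd

# Illustrative tables; 'M9Z' is a made-up code with no coordinates.
codes = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Etobicoke'],
})
coords = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# A left merge keeps every postal code; the _merge column records where each row matched.
checked = pd.merge(codes, coords, on='PostalCode', how='left',
                   validate='one_to_one', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

If `missing` is empty, the inner merge used in this notebook loses nothing.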


In [205]:
df_4.head(100)

|  | PostalCode | Borough | Neighborhood | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | M1B | Scarborough | Rouge,Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek,Rouge Hill,Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood,Morningside,West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| ... | ... | ... | ... | ... | ... |
| 95 | M9C | Etobicoke | Bloordale Gardens,Eringate,Markland Wood,Old B... | 43.643515 | -79.577201 |
| 96 | M9L | North York | Humber Summit | 43.756303 | -79.565963 |
| 97 | M9M | North York | Emery,Humberlea | 43.724766 | -79.532242 |
| 98 | M9N | York | Weston | 43.706876 | -79.518188 |


## Part 3 - Explore and cluster neighborhoods in Toronto



In [193]:
# imports

!conda install -c conda-forge geopy --yes # comment out this line if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON into a pandas DataFrame (on pandas >= 1.0, use pd.json_normalize instead)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /usr/local/anaconda3

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    altair-4.0.1               |             py_0         575 KB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    certifi-2019.9.11          |           py37_0         147 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    

In [206]:
# Get the geo coordinates of Toronto

address = 'Toronto'

geolocator = Nominatim(user_agent='toronto_explorer') # recent geopy versions require a user_agent; the name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
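
geopy's `geocode` returns `None` when an address cannot be resolved, in which case reading `.latitude` raises an `AttributeError`. Below is a small defensive wrapper, offered as a sketch. `lookup_coordinates` is a hypothetical helper name, and the stub geocoder stands in for a real one so the example runs offline.

```python
from typing import Callable, Optional, Tuple


def lookup_coordinates(geocode: Callable, address: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for an address, or None if nothing was found."""
    location = geocode(address)
    if location is None:  # no match: avoid AttributeError on .latitude
        return None
    return (location.latitude, location.longitude)


class FakeLocation:
    """Stand-in for a geocoder result, used so the sketch runs offline."""
    latitude = 43.653963
    longitude = -79.387207


def fake_geocode(address):
    # Hypothetical stub: only knows about 'Toronto'.
    return FakeLocation() if address == 'Toronto' else None


print(lookup_coordinates(fake_geocode, 'Toronto'))
print(lookup_coordinates(fake_geocode, 'Nowhere'))
```

With geopy, the first argument would be the bound method of a real geocoder, for example `Nominatim(user_agent='toronto_explorer').geocode`, where the user agent string is an arbitrary name.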



In [210]:
# create map of Toronto
map_toronto= folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map of "postal code (borough) - neighborhoods"
for lat, lng, postalcode, borough, neighborhood in zip(df_4['Latitude'], df_4['Longitude'], df_4['PostalCode'], df_4['Borough'], df_4['Neighborhood']):
    label = '{} ({}) - {}'.format(postalcode, borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [218]:
# Keep only the boroughs whose name contains "Toronto" (case-insensitive)
df_toronto = df_4[df_4['Borough'].str.contains('toronto', case=False)].reset_index(drop=True)
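
The borough filter relies on a substring match. If the column could ever contain missing values, `str.contains` returns a missing value for those rows, which cannot be used directly as a boolean mask. Passing `na=False` treats them as non-matches. A short sketch on illustrative data:

```python
import pandas as pd

# Illustrative boroughs, including one missing value.
boroughs = pd.DataFrame({'Borough': ['East Toronto', 'North York', 'Downtown Toronto', None]})

# case=False makes the match case-insensitive; na=False maps missing values to False.
mask = boroughs['Borough'].str.contains('toronto', case=False, na=False)
toronto_only = boroughs[mask].reset_index(drop=True)
print(toronto_only)
```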

In [223]:
df_toronto

|  | PostalCode | Borough | Neighborhood | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- |
| 0 | M4E | East Toronto | The Beaches | 43.676357 | -79.293031 |
| 1 | M4K | East Toronto | The Danforth West,Riverdale | 43.679557 | -79.352188 |
| 2 | M4L | East Toronto | The Beaches West,India Bazaar | 43.668999 | -79.315572 |
| 3 | M4M | East Toronto | Studio District | 43.659526 | -79.340923 |
| 4 | M4N | Central Toronto | Lawrence Park | 43.72802 | -79.38879 |
| 5 | M4P | Central Toronto | Davisville North | 43.712751 | -79.390197 |
| 6 | M4R | Central Toronto | North Toronto West | 43.715383 | -79.405678 |
| 7 | M4S | Central Toronto | Davisville | 43.704324 | -79.38879 |
| 8 | M4T | Central Toronto | Moore Park,Summerhill East | 43.689574 | -79.38316 |
| 9 | M4V | Central Toronto | Deer Park,Forest Hill SE,Rathnelly,South Hill,... | 43.686412 | -79.400049 |


In [221]:
# create a closer-zoomed map showing only the Toronto boroughs
map_toronto= folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map of "postal code (borough) - neighborhoods"
for lat, lng, postalcode, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['PostalCode'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{} ({}) - {}'.format(postalcode, borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [222]:
# Foursquare credentials

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)
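
Keeping credentials out of the notebook avoids publishing them by accident. A minimal sketch that reads them from environment variables instead. The variable names `FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET` and `FOURSQUARE_VERSION`, and the helper `load_foursquare_credentials`, are assumptions made for this example. The returned keys `client_id`, `client_secret` and `v` are assumed to be the query parameter names the API expects.

```python
import os


def load_foursquare_credentials(env=None):
    """Build the credential parameters from environment variables."""
    env = os.environ if env is None else env
    required = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {
        'client_id': env['FOURSQUARE_CLIENT_ID'],
        'client_secret': env['FOURSQUARE_CLIENT_SECRET'],
        'v': env.get('FOURSQUARE_VERSION', '20180605'),  # falls back to the version used above
    }


# Demonstration with a plain dict standing in for os.environ.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
params = load_foursquare_credentials(demo_env)
print(sorted(params))
```

The resulting dictionary can be merged with the other query parameters of a request, so the secret never appears in a saved cell.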

