# Part 3 | Segmenting and Clustering Neighborhoods in Toronto

1. Start by creating a new Notebook for this assignment.

2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe.

3. To create the above dataframe:

- The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
- Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
- More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
- If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
- Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
- In the last cell of your notebook, use the .shape attribute to print the number of rows of your dataframe.
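
The cleaning rules in the list above can be tried on a small in-memory table before touching the scraped data. This is only a sketch: `clean_postal_table` is a hypothetical helper, and the sample rows are illustrative rather than copied from the Wikipedia page.

```python
import pandas as pd

def clean_postal_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning rules to a PostalCode/Borough/Neighborhood table."""
    # Rule 1: keep only rows whose borough is assigned.
    df = raw[raw["Borough"] != "Not assigned"].copy()
    # Rule 2: an unassigned neighborhood falls back to the borough name.
    missing = df["Neighborhood"] == "Not assigned"
    df.loc[missing, "Neighborhood"] = df.loc[missing, "Borough"]
    # Rule 3: one row per postal code, neighborhoods joined with ", ".
    return (
        df.groupby(["PostalCode", "Borough"])["Neighborhood"]
        .apply(", ".join)
        .reset_index()
    )

# Illustrative rows only (assumed data, not the real table).
raw = pd.DataFrame(
    [
        ("M1A", "Not assigned", "Not assigned"),
        ("M5A", "Downtown Toronto", "Harbourfront"),
        ("M5A", "Downtown Toronto", "Regent Park"),
        ("M7A", "Queen's Park", "Not assigned"),
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)
cleaned = clean_postal_table(raw)
print(cleaned)
print(cleaned.shape)
```

Because `shape` is an attribute and not a method, it is read without parentheses; `cleaned.shape[0]` gives the row count.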

### Import libraries

In [1]:
from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd
from geopy.geocoders import Nominatim
import folium
import json
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

print("Library Imported")

Library Imported


## 1 | Data Cleaning & Data Wrangling

### Scraping the List of Postal Codes from Wikipedia Using the BeautifulSoup Library

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
scraping_data = requests.get(url).text
wiki_scrap = BeautifulSoup(scraping_data, 'lxml')

In [3]:
wiki_scrap

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of postal codes of Canada: M - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"17a2e4ce-0c04-4988-b4d6-0f8a3bd4f32e","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":969510799,"wgRevisionId":969510799,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Communications in Ontario","P

### Convert PostalCode HTML table to pandas dataframe

In [4]:
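columns = ['PostalCode', 'Borough', 'Neighborhood']
content = wiki_scrap.find('div', class_='mw-parser-output')
table = content.table.tbody
postcode = 0
borough = 0
neighborhood = 0

# Collect one dict per table row and build the dataframe once at the end;
# DataFrame.append was removed in pandas 2.0 and is slow inside a loop.
rows = []
for tr in table.find_all('tr'):
    # The first three <td> cells hold postal code, borough, and neighborhood.
    for i, td in enumerate(tr.find_all('td')):
        if i == 0:
            postcode = td.text.strip('\n')
        elif i == 1:
            borough = td.text.strip('\n')
        elif i == 2:
            neighborhood = td.text.strip('\n').replace(']', '')
    # The header row has no <td> cells, so it is stored with the initial 0
    # placeholders and filtered out in the cleaning step.
    rows.append({'PostalCode': postcode, 'Borough': borough, 'Neighborhood': neighborhood})

df_toronto = pd.DataFrame(rows, columns=columns)
df_toronto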

Unnamed: 0,PostalCode,Borough,Neighborhood
0,0,0,0
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
...,...,...,...
176,M5Z,Not assigned,Not assigned
177,M6Z,Not assigned,Not assigned
178,M7Z,Not assigned,Not assigned
179,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."
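
The row of zeros at index 0 in the table above comes from the header row, which contains `<th>` cells and no `<td>` cells. The sketch below shows the same row-by-row traversal while skipping such rows. It deliberately swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra dependencies; `TableRowParser` and the inline HTML snippet are hypothetical stand-ins, not the notebook's method.

```python
from html.parser import HTMLParser

class TableRowParser(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed data rows
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <td> (including <th> text) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Header rows hold only <th> cells, so they stay empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

# Small assumed snippet shaped like the postal code table.
html = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A
</td><td>Not assigned
</td><td>Not assigned
</td></tr>
  <tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
</table>
"""
parser = TableRowParser()
parser.feed(html)
print(parser.rows)
```

Skipping empty rows at parse time means no placeholder row has to be removed later.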


### Cleaning the Data

In [5]:
df_toronto = df_toronto[df_toronto.Borough != 'Not assigned']
df_toronto = df_toronto[df_toronto.Borough != 0]  # drop the placeholder row produced by the table header
df_toronto = df_toronto.reset_index(drop=True)

# A neighborhood that is 'Not assigned' takes the name of its borough.
not_assigned = df_toronto['Neighborhood'] == 'Not assigned'
df_toronto.loc[not_assigned, 'Neighborhood'] = df_toronto.loc[not_assigned, 'Borough']
df_toronto

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [6]:
df_toronto = df_toronto.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()
df_toronto

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."


In [7]:
df_toronto = df_toronto.dropna()  # dropna returns a new dataframe, so keep the result
empty = 'Not assigned'
df_toronto = df_toronto[(df_toronto.PostalCode != empty) & (df_toronto.Borough != empty) & (df_toronto.Neighborhood != empty)]
df_toronto.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff, Cliffside West"


### Load Geospatial Data

In [8]:
df_coords = pd.read_csv('Geospatial_data.csv')
df_coords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
df_toronto['Latitude'] = np.nan
df_toronto['Longitude'] = np.nan

for idx in df_toronto.index:
    coords_idx = df_coords['Postal Code'] == df_toronto.loc[idx, 'PostalCode']
    # The mask selects a one-element Series; take its scalar value before assigning.
    df_toronto.at[idx, 'Latitude'] = df_coords.loc[coords_idx, 'Latitude'].iloc[0]
    df_toronto.at[idx, 'Longitude'] = df_coords.loc[coords_idx, 'Longitude'].iloc[0]

df_toronto.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
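
The coordinate lookup above can also be written as a single join. The sketch below is one possible alternative, not the notebook's approach: the two small frames are stand-ins for `df_toronto` and `df_coords`, and the `M9Z` row with zero coordinates is a made-up extra entry used to show that unmatched codes are ignored.

```python
import pandas as pd

# Toy stand-ins for df_toronto and df_coords.
toronto = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighborhood": ["Malvern, Rouge", "Rouge Hill, Port Union, Highland Creek"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1C", "M1B", "M9Z"],       # M9Z is a hypothetical extra code
    "Latitude": [43.784535, 43.806686, 0.0],
    "Longitude": [-79.160497, -79.194353, 0.0],
})

# Align the key name, then left-join so every neighborhood row is kept.
merged = toronto.merge(
    coords.rename(columns={"Postal Code": "PostalCode"}),
    on="PostalCode",
    how="left",
    validate="one_to_one",  # raise if a postal code is duplicated on either side
)
print(merged)
```

A left join keeps rows that have no coordinates (they get `NaN`), which makes missing postal codes visible instead of raising inside a loop.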


### Create a Geolocator Object & Visualize the Neighborhoods on a Folium Map

In [10]:
address = 'Toronto'
geolocator = Nominatim(user_agent="coursera")
location = geolocator.geocode(address)
latitude_x = location.latitude
longitude_x = location.longitude
print('Coordinates of {} : {}, {}'.format(address, latitude_x, longitude_x))

Coordinates of Toronto : 43.6534817, -79.3839347


In [11]:
toronto_map = folium.Map(location = [latitude_x, longitude_x], zoom_start=10)

for lat, lng, neigh in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Neighborhood']):
    label = folium.Popup(str(neigh), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(toronto_map)
    
toronto_map

## 2 | Loading the Foursquare API

### Define Client Credentials for Foursquare API

In [12]:
# Credentials redacted: substitute your own values and keep them out of shared notebooks.
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
VERSION = '20180604'
LIMIT = 30
print('Credentials defined.')

Credentials defined.


### Make API Connection

In [13]:
radius = 700
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    latitude_x,
    longitude_x,
    radius,
    LIMIT
    )
results = requests.get(url).json()
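
As a hedged alternative to formatting the query string by hand, the request URL can be assembled with `urllib.parse.urlencode`, which also escapes special characters. In this sketch `build_explore_url` is a hypothetical helper, and the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions rather than anything Foursquare prescribes.

```python
import os
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lng, radius=500, limit=100, version="20180604"):
    """Return the explore URL for one location (hypothetical helper)."""
    params = {
        # Assumed environment variable names; adjust them to your own setup.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes values, so the comma in "ll" becomes %2C.
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"

url = build_explore_url(43.6534817, -79.3839347, radius=700)
print(url.split("?")[0])
```

Reading the credentials from the environment keeps them out of the notebook file and its printed output.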

### Load and normalize JSON data

In [14]:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
nearby_venues

Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,venue.location.cc,...,venue.categories,venue.photos.count,venue.photos.groups,venue.location.address,venue.location.crossStreet,venue.location.postalCode,venue.venuePage.id,venue.location.neighborhood,venue.events.count,venue.events.summary
0,e-0-5227bb01498e17bf485e6202-0,0,"[{'summary': 'This spot is popular', 'type': '...",5227bb01498e17bf485e6202,Downtown Toronto,43.653232,-79.385296,"[{'label': 'display', 'lat': 43.65323167517444...",113,CA,...,"[{'id': '4f2a25ac4b909258e854f55f', 'name': 'N...",0,[],,,,,,,
1,e-0-4ad4c05ef964a520a6f620e3-1,0,"[{'summary': 'This spot is popular', 'type': '...",4ad4c05ef964a520a6f620e3,Nathan Phillips Square,43.652270,-79.383516,"[{'label': 'display', 'lat': 43.65227047322295...",138,CA,...,"[{'id': '4bf58dd8d48988d164941735', 'name': 'P...",0,[],100 Queen St W,at Bay St,M5H 2N1,,,,
2,e-0-4ae7b27df964a52068ad21e3-2,0,"[{'summary': 'This spot is popular', 'type': '...",4ae7b27df964a52068ad21e3,Japango,43.655268,-79.385165,"[{'label': 'display', 'lat': 43.65526771691681...",222,CA,...,"[{'id': '4bf58dd8d48988d1d2941735', 'name': 'S...",0,[],122 Elizabeth St.,at Dundas St. W,M5G 1P5,,,,
3,e-0-537773d1498e74a75bb75c1e-3,0,"[{'summary': 'This spot is popular', 'type': '...",537773d1498e74a75bb75c1e,Eggspectation Bell Trinity Square,43.653144,-79.381980,"[{'label': 'display', 'lat': 43.65314383888587...",161,CA,...,"[{'id': '4bf58dd8d48988d143941735', 'name': 'B...",0,[],483 Bay Street,Albert Street,M5G 2C9,97507838,,,
4,e-0-4b2a6eb8f964a52012a924e3-4,0,"[{'summary': 'This spot is popular', 'type': '...",4b2a6eb8f964a52012a924e3,Indigo,43.653515,-79.380696,"[{'label': 'display', 'lat': 43.65351471121164...",260,CA,...,"[{'id': '4bf58dd8d48988d114951735', 'name': 'B...",0,[],220 Yonge St,,M5B 2H1,,Downtown Yonge,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,e-0-4ae734bef964a5205ea921e3-95,0,"[{'summary': 'This spot is popular', 'type': '...",4ae734bef964a5205ea921e3,Umbra Concept Store,43.650417,-79.391136,"[{'label': 'display', 'lat': 43.650417, 'lng':...",672,CA,...,"[{'id': '4bf58dd8d48988d1f8941735', 'name': 'F...",0,[],165 John St.,at Queen St. W,M5T 1X3,,,,
96,e-0-53856fa411d2061fc84a3d0a-96,0,"[{'summary': 'This spot is popular', 'type': '...",53856fa411d2061fc84a3d0a,Lobby Lounge at the Shangri-La Toronto,43.649155,-79.386546,"[{'label': 'display', 'lat': 43.64915499986854...",525,CA,...,"[{'id': '4bf58dd8d48988d121941735', 'name': 'L...",0,[],188 University Ave.,,M5H 0A3,,,,
97,e-0-5a1df4408ad62e3314313889-97,0,"[{'summary': 'This spot is popular', 'type': '...",5a1df4408ad62e3314313889,Craft Beer Market,43.649872,-79.378398,"[{'label': 'display', 'lat': 43.6498722789121,...",600,CA,...,"[{'id': '56aa371ce4b08b9a8d57356c', 'name': 'B...",0,[],1 Adelaide St E,,M5C 2V9,,,,
98,e-0-52ec621e498ec68fa15ee922-98,0,"[{'summary': 'This spot is popular', 'type': '...",52ec621e498ec68fa15ee922,Copacabana Grilled Brazilian,43.648333,-79.388151,"[{'label': 'display', 'lat': 43.64833259480477...",666,CA,...,"[{'id': '4bf58dd8d48988d16b941735', 'name': 'B...",0,[],230 Adelaide St W #2,at Duncan St.,M5H 3H1,,,,


### Create function to get type of category venues from API

In [15]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
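
To see how this function behaves, it can be called on plain dictionaries that mimic the two row layouts it supports. The function body is repeated here only so the sketch runs on its own, and the three sample rows are made up for illustration.

```python
def get_category_type(row):
    """Return the name of the first category, or None when there is none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Assumed sample rows: one per key layout, plus a venue without categories.
explore_row = {'venue.categories': [{'name': 'Sushi Restaurant'}]}
search_row = {'categories': [{'name': 'Plaza'}, {'name': 'Park'}]}
empty_row = {'venue.categories': []}

print(get_category_type(explore_row))
print(get_category_type(search_row))
print(get_category_type(empty_row))
```

Only the first category is kept, so a venue tagged with several categories is represented by its primary one.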

In [16]:
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns].copy()  # copy so the column assignment below is unambiguous
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,Japango,Sushi Restaurant,43.655268,-79.385165
3,Eggspectation Bell Trinity Square,Breakfast Spot,43.653144,-79.38198
4,Indigo,Bookstore,43.653515,-79.380696


In [17]:
print('{} nearby venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 nearby venues were returned by Foursquare.


### Create getNearbyVenues function to get nearest venues

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # Build the explore request for this neighborhood's coordinates.
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        results = requests.get(url).json()['response']['groups'][0]['items']
        # Keep one tuple per venue; venues without any category are skipped.
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])
    # Flatten the per-neighborhood lists into a single dataframe.
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])
    return nearby_venues
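
The flattening step inside `getNearbyVenues` can be exercised without calling the API. The sketch below uses a hand-written list shaped like `results['response']['groups'][0]['items']`; `flatten_items` is a hypothetical helper, and the second sample venue is invented to show what happens when a venue has no category.

```python
def flatten_items(name, lat, lng, items):
    """Turn explore 'items' into flat tuples, tolerating venues without categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None instead of raising IndexError when the list is empty.
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows

# Hand-written stand-in for the API response (assumed structure and values).
sample_items = [
    {'venue': {'name': "Wendy's",
               'location': {'lat': 43.807448, 'lng': -79.199056},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Uncategorized spot',
               'location': {'lat': 43.8, 'lng': -79.2},
               'categories': []}},
]
rows = flatten_items('Malvern, Rouge', 43.806686, -79.194353, sample_items)
for row in rows:
    print(row)
```

Each tuple has seven fields, matching the seven column names assigned to the resulting dataframe.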

### Get nearby venues for every neighborhood

In [19]:
scarborough_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                    latitudes=df_toronto['Latitude'],
                                    longitudes=df_toronto['Longitude'])

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West, 

In [20]:
print(scarborough_venues.shape)
scarborough_venues.head()

(2150, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Scarborough Historical Society,43.788755,-79.162438,History Museum
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


In [21]:
print('There are {} unique venue categories.'.format(scarborough_venues['Venue Category'].nunique()))
scarborough_venues.groupby('Neighborhood').count().head()

There are 274 unique venue categories.


Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
"Alderwood, Long Branch",7,7,7,7,7,7
"Bathurst Manor, Wilson Heights, Downsview North",22,22,22,22,22,22
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",25,25,25,25,25,25


### One-hot encode the venue categories

In [22]:
scarborough_onehot = pd.get_dummies(scarborough_venues[['Venue Category']], prefix="", prefix_sep="")
scarborough_onehot['Neighborhood'] = scarborough_venues['Neighborhood']
# Move the Neighborhood column to the front, selecting it by name.
fixed_columns = ['Neighborhood'] + [col for col in scarborough_onehot.columns if col != 'Neighborhood']
scarborough_onehot = scarborough_onehot[fixed_columns]
scarborough_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
scarborough_onehot.shape

(2150, 274)

### Group rows by neighborhood, taking the mean frequency of occurrence of each category

In [24]:
scarborough_grouped = scarborough_onehot.groupby('Neighborhood').mean().reset_index()
scarborough_grouped

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,...,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,...,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,...,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,...,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,...,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,...,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
92,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,...,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0
93,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,...,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0
94,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,...,0.0,0.0,0.0,0.000,0.0,0.0,0.0,0.0,0.0,0.0


In [25]:
scarborough_grouped.shape

(96, 274)
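
A toy example may make the grouped means easier to read: after one-hot encoding, the mean of each category column within a neighborhood is the share of that neighborhood's venues that fall in the category. The neighborhoods `A` and `B` and their venues below are made up for illustration.

```python
import pandas as pd

# Assumed toy data: four venues in neighborhood A, two in neighborhood B.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Bank", "Pub", "Bank", "Bank"],
})

# One indicator column per category, stored as floats so the mean is a frequency.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the 0/1 indicators is the relative frequency of each category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns. Note that `pd.get_dummies` creates one column per category name, so a category literally named Neighborhood (one appears in the Foursquare output earlier) would collide with the key column; the toy data avoids that case.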

### Write a function to sort the venues in descending order.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0: num_top_venues]

### Create the new dataframe and display the top 10 venues for each neighborhood.

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venues'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = scarborough_grouped['Neighborhood']
for ind in np.arange(scarborough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scarborough_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()    

Unnamed: 0,Neighborhood,1st Most Common Venues,2nd Most Common Venues,3rd Most Common Venues,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Lounge,Latin American Restaurant,Skating Rink,Breakfast Spot,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant
1,"Alderwood, Long Branch",Pizza Place,Gym,Coffee Shop,Skating Rink,Sandwich Place,Pub,Dog Run,Dim Sum Restaurant,Diner,Discount Store
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Park,Fried Chicken Joint,Bridal Shop,Sandwich Place,Diner,Restaurant,Deli / Bodega,Ice Cream Shop
3,Bayview Village,Café,Bank,Chinese Restaurant,Japanese Restaurant,Dim Sum Restaurant,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
4,"Bedford Park, Lawrence Manor East",Thai Restaurant,Coffee Shop,Italian Restaurant,Sandwich Place,Pizza Place,Liquor Store,Indian Restaurant,Pub,Butcher,Sushi Restaurant
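
The column-naming loop above relies on an `IndexError` to switch from 'st', 'nd', 'rd' to 'th', which is fine for ten columns but would label a 21st column as '21th'. A small helper is one way to generalize it; `ordinal` is a hypothetical function, and the column wording below is an assumption that does not match the mixed 'Venues'/'Venue' headers in the table above.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (hypothetical helper)."""
    # 11, 12, and 13 (and 111, 112, ...) take 'th' despite ending in 1, 2, or 3.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

num_top_venues = 10
columns = ['Neighborhood'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)
]
print(columns)
```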


## 3 | Clustering Neighborhoods

### Run k-means to cluster the neighborhoods into 5 clusters.

In [30]:
k = 5
scarborough_grouped_clustering = scarborough_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=k, random_state=0).fit(scarborough_grouped_clustering)

kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
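
To make the clustering step less of a black box, here is a minimal NumPy sketch of Lloyd's algorithm, the iteration that k-means is built on. It is not scikit-learn's implementation: `kmeans_lloyd` is a hypothetical helper, the toy points are made up, and starting from the first `k` points is an assumption chosen for determinism (scikit-learn's `KMeans` defaults to k-means++ initialization).

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=20):
    """Minimal Lloyd's algorithm (hypothetical helper, not scikit-learn's code)."""
    X = np.asarray(X, dtype=float)
    # Assumption: use the first k points as initial centers for reproducibility.
    centers = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: the centers no longer move
        centers = new_centers
    return labels, centers

# Two well-separated toy groups of three points each.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans_lloyd(points, k=2)
print(labels)
print(np.bincount(labels, minlength=2))  # number of points per cluster
```

Counting labels with `np.bincount` is also a quick check on real results: when the first ten labels are all 0, as above, the cluster sizes show whether one cluster dominates.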

### Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [33]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
scarborough_merged = scarborough_venues
scarborough_merged = scarborough_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
scarborough_merged.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venues,2nd Most Common Venues,3rd Most Common Venues,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant,1,Fast Food Restaurant,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Gym / Fitness Center
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar,0,Bar,History Museum,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore
2,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Scarborough Historical Society,43.788755,-79.162438,History Museum,0,Bar,History Museum,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank,0,Electronics Store,Restaurant,Breakfast Spot,Rental Car Location,Bank,Medical Center,Mexican Restaurant,Intersection,Yoga Studio,Distribution Center
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store,0,Electronics Store,Restaurant,Breakfast Spot,Rental Car Location,Bank,Medical Center,Mexican Restaurant,Intersection,Yoga Studio,Distribution Center


### Visualize the resulting clusters

In [34]:
map_clusters = folium.Map(location=[latitude_x, longitude_x], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(scarborough_merged['Neighborhood Latitude'], scarborough_merged['Neighborhood Longitude'], scarborough_merged['Neighborhood'], scarborough_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 4 | Examine Clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.
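
One hedged way to find the defining categories is to summarize, per cluster, how many neighborhoods it contains and which venue type most often ranks first. The frame below is a made-up stand-in with one row per neighborhood; only the column names follow the notebook.

```python
import pandas as pd

# Assumed toy data: four neighborhoods in two clusters.
summary_input = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D"],
    "Cluster Labels": [0, 0, 2, 2],
    "1st Most Common Venues": ["Coffee Shop", "Coffee Shop", "Park", "Park"],
})

# Number of neighborhoods in each cluster.
sizes = summary_input["Cluster Labels"].value_counts().sort_index()

# Most frequent top-ranked venue category within each cluster.
top_by_cluster = (
    summary_input.groupby("Cluster Labels")["1st Most Common Venues"]
    .agg(lambda s: s.value_counts().idxmax())
)
print(sizes)
print(top_by_cluster)
```

A cluster whose top category is the same for nearly all of its neighborhoods, such as Park in the toy data, is a natural candidate for a descriptive name.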

#### Cluster 1

In [36]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 0, scarborough_merged.columns[[0] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venues,2nd Most Common Venues,3rd Most Common Venues,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Rouge Hill, Port Union, Highland Creek",-79.163085,Bar,0,Bar,History Museum,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore
2,"Rouge Hill, Port Union, Highland Creek",-79.162438,History Museum,0,Bar,History Museum,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore
3,"Guildwood, Morningside, West Hill",-79.191151,Bank,0,Electronics Store,Restaurant,Breakfast Spot,Rental Car Location,Bank,Medical Center,Mexican Restaurant,Intersection,Yoga Studio,Distribution Center
4,"Guildwood, Morningside, West Hill",-79.191537,Electronics Store,0,Electronics Store,Restaurant,Breakfast Spot,Rental Car Location,Bank,Medical Center,Mexican Restaurant,Intersection,Yoga Studio,Distribution Center
5,"Guildwood, Morningside, West Hill",-79.191275,Restaurant,0,Electronics Store,Restaurant,Breakfast Spot,Rental Car Location,Bank,Medical Center,Mexican Restaurant,Intersection,Yoga Studio,Distribution Center
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2145,"South Steeles, Silverstone, Humbergate, Jamest...",-79.584230,Fast Food Restaurant,0,Grocery Store,Pizza Place,Sandwich Place,Beer Store,Fried Chicken Joint,Pharmacy,Fast Food Restaurant,Eastern European Restaurant,Dumpling Restaurant,Drugstore
2146,"Northwest, West Humber - Clairville",-79.589943,Rental Car Location,0,Rental Car Location,Garden Center,Bar,Drugstore,Yoga Studio,Dog Run,Diner,Discount Store,Distribution Center,Doner Restaurant
2147,"Northwest, West Humber - Clairville",-79.589252,Bar,0,Rental Car Location,Garden Center,Bar,Drugstore,Yoga Studio,Dog Run,Diner,Discount Store,Distribution Center,Doner Restaurant
2148,"Northwest, West Humber - Clairville",-79.598725,Drugstore,0,Rental Car Location,Garden Center,Bar,Drugstore,Yoga Studio,Dog Run,Diner,Discount Store,Distribution Center,Doner Restaurant


#### Cluster 2

In [37]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 1, scarborough_merged.columns[[0] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venues,2nd Most Common Venues,3rd Most Common Venues,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Malvern, Rouge",-79.199056,Fast Food Restaurant,1,Fast Food Restaurant,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Gym / Fitness Center


#### Cluster 3

In [38]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 2, scarborough_merged.columns[[0] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venues,2nd Most Common Venues,3rd Most Common Venues,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
75,"Milliken, Agincourt North, Steeles East, L'Amo...",-79.289773,Park,2,Park,Playground,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
76,"Milliken, Agincourt North, Steeles East, L'Amo...",-79.289867,Playground,2,Park,Playground,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run
168,"York Mills, Silver Hills",-79.371296,Martial Arts School,2,Park,Martial Arts School,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Dumpling Restaurant
169,"York Mills, Silver Hills",-79.36959,Park,2,Park,Martial Arts School,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Dumpling Restaurant
170,"Willowdale, Newtonbrook",-79.41012,Home Service,2,Home Service,Park,Yoga Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
171,"Willowdale, Newtonbrook",-79.410022,Park,2,Home Service,Park,Yoga Studio,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
204,York Mills West,-79.401393,Convenience Store,2,Park,Construction & Landscaping,Convenience Store,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
205,York Mills West,-79.399717,Park,2,Park,Construction & Landscaping,Convenience Store,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
206,York Mills West,-79.402356,Construction & Landscaping,2,Park,Construction & Landscaping,Convenience Store,Yoga Studio,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop
213,Parkwoods,-79.33214,Park,2,Park,Food & Drink Shop,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore


#### Cluster 4

In [40]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 3, scarborough_merged.columns[[0] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venues,2nd Most Common Venues,3rd Most Common Venues,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2098,"Old Mill South, King's Mill Park, Sunnylea, Hu...",-79.496266,Baseball Field,3,Baseball Field,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Fast Food Restaurant
2126,"Humberlea, Emery",-79.532854,Baseball Field,3,Baseball Field,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Fast Food Restaurant


#### Cluster 5

In [41]:
scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 4, scarborough_merged.columns[[0] + list(range(5, scarborough_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venues,2nd Most Common Venues,3rd Most Common Venues,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2115,"West Deane Park, Princess Gardens, Martin Grov...",-79.556107,Golf Course,4,Golf Course,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore
