# IBM Data Science Capstone - Toronto / Vancouver Neighbourhood Match Maker

## Note re: Folium Maps!
To view folium maps, use the nbviewer URL for this file: https://nbviewer.jupyter.org/github/g3clarke/Coursera_Capstone/blob/master/IBM%20Capstone%20-%20Toronto%20Neighbourhood%20Clustering%20Part%203.ipynb

This is for anyone planning to move from Toronto to Vancouver who is looking to plant their roots in a neighbourhood similar to their current home in the Six (Toronto).

We will gather geographical information about the two cities, as well as the popular venues nearby, to make sure a move can feel as close to home as possible in what I believe are two very different cities, given their climate and location.

Sources used:

<ul>
<li>Data from Geonames.org for both cities</li>
</ul>

Tools used:
<ul>
<li>Python libraries</li>
<li>Foursquare</li>
</ul>

First, we import the libraries and define the variables that will be used when we're ready to import and clean the data.


In [5]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import requests # library to handle requests

print ('Imported Libraries.')

Imported Libraries.


In [6]:
# Main vars

# URL where we can find the neighbourhoods
tdot_url = 'http://www.geonames.org/postalcode-search.html?q=Toronto&country=CA&adminCode1=ON'
vanc_url = 'http://www.geonames.org/postalcode-search.html?q=Vancouver&country=CA&adminCode1=BC'
    
# Data Frame to hold the neighbourhood list
tdot_neighbourhoods_df = []
vanc_neighbourhoods_df = []



## Merge-a-Plenty

Time to get to work now that our libs and vars have been dealt with.  

I needed to split this off into a function to keep things relatively clean. If there's a more efficient, vector-driven way of doing this, please leave a comment; I'd be happy to explore further!

In [165]:
# Data Cleansing Functions

# geocode_merge: attach the latitude/longitude found on each record's second line to its main row
def geocode_merge(df):
    
    second_lines_series = df[~df['Country'].isin(['Canada'])]['Admin3']
    second_lines_df = second_lines_series.to_frame().reset_index()
    second_lines_df['Latitude'] = second_lines_df.apply(lambda row: str(row.Admin3).split('/')[0], axis = 1)
    second_lines_df['Longitude']  = second_lines_df.apply(lambda row: str(row.Admin3).split('/')[-1], axis = 1)
    second_lines_df.drop(second_lines_df.columns[[0,1]],axis=1,inplace=True)

    # copy the filtered rows so the in-place edits below do not act on a view of df
    return_df = df[df['Country'] == 'Canada'].copy()
    return_df.reset_index(inplace=True)
    return_df.drop(return_df.columns[[0,1,7]],axis=1,inplace=True)
    return_df.rename(columns={"Code":"PostalCode","Place":"Neighborhood","Admin1": "Province","Admin2":"Borough"},inplace=True)
    return_df = return_df.merge(second_lines_df, left_index=True, right_index=True)
    return_df["Neighborhood"] = return_df["Neighborhood"].astype(str)
    return_df["PostalCode"] = return_df["PostalCode"].astype(str)
    return_df["Country"] = return_df["Country"].astype(str)
    return_df["Province"] = return_df["Province"].astype(str)
    return_df["Borough"] = return_df["Borough"].astype(str)
    return_df["Latitude"] = pd.to_numeric(return_df["Latitude"], downcast="float")
    return_df["Longitude"] = pd.to_numeric(return_df["Longitude"], downcast="float")
    
    return return_df
        
print ("functions compiled.")

functions compiled.
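
On the question of a vector-driven alternative: pandas string methods can split every `lat/lng` string in one call, instead of running a row-wise lambda through `apply`. This is only a minimal sketch, assuming the coordinates arrive as `"lat/lng"` strings like the ones `geocode_merge` reads from `Admin3`; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample mimicking the "lat/lng" strings on the second lines
coords = pd.Series(["43.714/-79.406", "43.647/-79.382", "49.218/-123.038"],
                   name="Admin3")

# Split every string at once; expand=True returns one column per piece
parts = coords.str.split("/", expand=True)
parts.columns = ["Latitude", "Longitude"]

# Convert both columns to numbers, one column at a time
parts = parts.apply(pd.to_numeric)
print(parts)
```

The resulting frame keeps the index of the input series, so it could be merged back by index the same way `second_lines_df` is.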


## Read and clean the data

It's now time to scrape the web page and clean up the data. I used pandas to scrape the page, but I would like to retry this one day soon using Beautiful Soup (Note to self: Google it).


In [135]:
# Read the html source for this data.

tdot_neighbourhoods_df = pd.read_html(tdot_url)[2]
vanc_neighbourhoods_df = pd.read_html(vanc_url)[2]


In [166]:
tdot_records_df = geocode_merge(tdot_neighbourhoods_df)
vanc_records_df = geocode_merge(vanc_neighbourhoods_df)
vanc_records_df.dtypes

Neighborhood     object
PostalCode       object
Country          object
Province         object
Borough          object
Latitude        float32
Longitude       float32
dtype: object

## Toronto Neighbourhoods by Postal Code

The end result with all cleansing in the rearview mirror...

In [143]:
tdot_records_df

Unnamed: 0,Neighborhood,PostalCode,Country,Province,Borough,Latitude,Longitude
0,Central Toronto (North Toronto West),M4R,Canada,Ontario,Toronto,43.714,-79.406
1,Downtown Toronto (Toronto Dominion Centre / De...,M5K,Canada,Ontario,Toronto,43.647,-79.382
2,Downtown Toronto (University of Toronto / Harb...,M5S,Canada,Ontario,Toronto,43.663,-79.399
3,Downsview East (CFB Toronto),M3K,Canada,Ontario,Toronto,43.739,-79.469
4,East Toronto (The Beaches),M4E,Canada,Ontario,Toronto,43.678,-79.294
5,East Toronto (The Danforth East),M4J,Canada,Ontario,Toronto,43.687,-79.337
6,East Toronto (Studio District),M4M,Canada,Ontario,Toronto,43.656,-79.341
7,Central Toronto (Lawrence Park East),M4N,Canada,Ontario,Toronto,43.73,-79.394
8,Central Toronto (Davisville North),M4P,Canada,Ontario,Toronto,43.714,-79.389
9,Central Toronto (Davisville),M4S,Canada,Ontario,Toronto,43.702,-79.385


## Vancouver Neighbourhoods by Postal Code

The end result with all cleansing in the rearview mirror...

In [145]:
vanc_records_df

Unnamed: 0,Neighborhood,PostalCode,Country,Province,Borough,Latitude,Longitude
0,Vancouver (Killarney),V5S,Canada,British Columbia,Vancouver,49.218,-123.038
1,Vancouver (North Hastings-Sunrise),V5K,Canada,British Columbia,Vancouver,49.281,-123.04
2,Vancouver (North Grandview-Woodlands),V5L,Canada,British Columbia,Vancouver,49.279,-123.067
3,Vancouver (SE Kensington / Victoria-Fraserview),V5P,Canada,British Columbia,Vancouver,49.222,-123.068
4,Vancouver (South Renfrew-Collingwood),V5R,Canada,British Columbia,Vancouver,49.24,-123.041
5,Vancouver (East Mount Pleasant),V5T,Canada,British Columbia,Vancouver,49.262,-123.092
6,Vancouver (East Fairview / South Cambie),V5Z,Canada,British Columbia,Vancouver,49.248,-123.121
7,Vancouver (South West End),V6E,Canada,British Columbia,Vancouver,49.283,-123.13
8,Vancouver (Central Kitsilano),V6K,Canada,British Columbia,Vancouver,49.265,-123.165
9,Vancouver (NW Arbutus Ridge),V6L,Canada,British Columbia,Vancouver,49.25,-123.166


# Mapping Some Results
Now let's have some fun with Maps. We start by adding some Folium into the mix.

In [147]:
import random # library for random number generation

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# library for transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Additional libraries imported.')
print('Folium installed')


In [148]:
# The code was removed by Watson Studio for sharing.

Foursquare credentials loaded.


#### Use geopy library to get the latitude and longitude values of Toronto and Vancouver

In [167]:
tdot_address = 'Toronto, ON'
geolocator = Nominatim(user_agent="on_explorer")
tdot_location = geolocator.geocode(tdot_address)
tdot_latitude = tdot_location.latitude
tdot_longitude = tdot_location.longitude

vanc_address = 'Vancouver, BC'
geolocator = Nominatim(user_agent="on_explorer")
vanc_location = geolocator.geocode(vanc_address)
vanc_latitude = vanc_location.latitude
vanc_longitude = vanc_location.longitude

print('The geographical coordinates of Toronto are {}, {}, while the geographical coordinates of Vancouver are {}, {}.'.format(tdot_latitude, tdot_longitude, vanc_latitude, vanc_longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347, while the geographical coordinates of Vancouver are 49.2608724, -123.1139529.


#### Let's see Toronto

In [173]:
# create map of Toronto using latitude and longitude values
tdot_map = folium.Map(location=[tdot_latitude, tdot_longitude], zoom_start=13)
tdot_neighborhoods = tdot_records_df

# add markers to map
for lat, lng, borough, neighborhood in zip(tdot_neighborhoods['Latitude'], tdot_neighborhoods['Longitude'], tdot_neighborhoods['Borough'], tdot_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(tdot_map)  
    
tdot_map

#### Let's see Vancouver

In [172]:
# create map of Vancouver using latitude and longitude values
vanc_map = folium.Map(location=[vanc_latitude, vanc_longitude], zoom_start=13)
vanc_neighborhoods = vanc_records_df

# add markers to map
for lat, lng, borough, neighborhood in zip(vanc_neighborhoods['Latitude'], vanc_neighborhoods['Longitude'], vanc_neighborhoods['Borough'], vanc_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(vanc_map)  
    
vanc_map

### Venues for each neighbourhood
This function comes courtesy of the good folks at IBM. It calls the Foursquare API to retrieve venue information for each neighbourhood.

In [174]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
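
To see what the list comprehension in `getNearbyVenues` pulls out of each response without calling the API, here is a small sketch that runs the same navigation on a hand-written payload. The payload is hypothetical: it contains only the keys the function reads, and the venue names and coordinates are made up.

```python
import pandas as pd

# Hypothetical payload: only the keys getNearbyVenues reads, with made-up values
payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Sample Cafe",
                               "location": {"lat": 43.715, "lng": -79.405},
                               "categories": [{"name": "Café"}]}},
                    {"venue": {"name": "Sample Park",
                               "location": {"lat": 43.713, "lng": -79.407},
                               "categories": [{"name": "Park"}]}},
                ]
            }
        ]
    }
}

name, lat, lng = "Central Toronto (North Toronto West)", 43.714, -79.406

# Same navigation as in getNearbyVenues: response -> groups[0] -> items
items = payload["response"]["groups"][0]["items"]

# One tuple per venue, prefixed with the neighbourhood it was found near
rows = [(name, lat, lng,
         v["venue"]["name"],
         v["venue"]["location"]["lat"],
         v["venue"]["location"]["lng"],
         v["venue"]["categories"][0]["name"]) for v in items]

nearby = pd.DataFrame(rows, columns=[
    "Neighborhood", "Neighborhood Latitude", "Neighborhood Longitude",
    "Venue", "Venue Latitude", "Venue Longitude", "Venue Category"])
print(nearby[["Venue", "Venue Category"]])
```

Each venue becomes one row that repeats the neighbourhood's name and coordinates, which is what makes the later `groupby('Neighborhood')` possible.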

### Now cycle through the Neighbourhoods, grabbing venue information

In [175]:
# fetch nearby venues for every neighbourhood in both cities

tdot_venues = getNearbyVenues(names=tdot_neighborhoods['Neighborhood'],
                                   latitudes=tdot_neighborhoods['Latitude'],
                                   longitudes=tdot_neighborhoods['Longitude']
                                  )

vanc_venues = getNearbyVenues(names=vanc_neighborhoods['Neighborhood'],
                                   latitudes=vanc_neighborhoods['Latitude'],
                                   longitudes=vanc_neighborhoods['Longitude']
                                  )

Central Toronto (North Toronto West)
Downtown Toronto (Toronto Dominion Centre / Design Exchange)
Downtown Toronto (University of Toronto / Harbord)
Downsview East (CFB Toronto)
East Toronto (The Beaches)
East Toronto (The Danforth East)
East Toronto (Studio District)
Central Toronto (Lawrence Park East)
Central Toronto (Davisville North)
Central Toronto (Davisville)
Downtown Toronto (Rosedale)
Downtown Toronto (Church and Wellesley)
Downtown Toronto (Ryerson)
Downtown Toronto (St. James Park)
Downtown Toronto (Berczy Park)
Downtown Toronto (Central Bay Street)
Downtown Toronto (Richmond / Adelaide / King)
Central Toronto (Roselawn)
Downtown Toronto (Underground city)
Downtown Toronto (Christie)
West Toronto (Dufferin / Dovercourt Village)
West Toronto (Parkdale / Roncesvalles Village)
Downtown Toronto (Harbourfront East / Union Station / Toronto Island)
West Toronto (Rua Açores / Trinity)
East Toronto (The Danforth West / Riverdale)
East Toronto (India Bazaar / The Beaches West)
Cent

In [177]:
#tdot_venues.groupby('Neighborhood').count()
#vanc_venues.groupby('Neighborhood').count()

In [178]:
print('There are {} unique Toronto categories and {} unique Vancouver categories.'.format(len(tdot_venues['Venue Category'].unique()),len(vanc_venues['Venue Category'].unique())))

There are 178 unique Toronto categories and 152 unique Vancouver categories.


### Analyze each neighbourhood

In [183]:
# one hot encoding
tdot_onehot = pd.get_dummies(tdot_venues[['Venue Category']], prefix="", prefix_sep="")
vanc_onehot = pd.get_dummies(vanc_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframes
tdot_onehot['Neighborhood'] = tdot_venues['Neighborhood'] 
vanc_onehot['Neighborhood'] = vanc_venues['Neighborhood'] 

# move neighborhood column to the first column
tdot_fixed_columns = [tdot_onehot.columns[-1]] + list(tdot_onehot.columns[:-1])
tdot_onehot = tdot_onehot[tdot_fixed_columns]

vanc_fixed_columns = [vanc_onehot.columns[-1]] + list(vanc_onehot.columns[:-1])
vanc_onehot = vanc_onehot[vanc_fixed_columns]

#tdot_onehot.head()

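Next, average the one-hot rows per neighbourhood to get the frequency of each venue category. The `return_most_common_venues` function further down was provided by the good folks at IBM and returns the top n venue types per neighbourhood.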

In [186]:
tdot_grouped = tdot_onehot.groupby('Neighborhood').mean().reset_index()
vanc_grouped = vanc_onehot.groupby('Neighborhood').mean().reset_index()
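
As a small illustration of what the one-hot encoding and the grouped mean produce, here is a sketch on a made-up table of six venues. The neighbourhood names `A` and `B` and the categories are placeholders, and `dtype=float` is passed only so the indicator columns are plain numbers regardless of the pandas version.

```python
import pandas as pd

# Made-up venue rows: two neighbourhoods, three categories
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Café", "Café", "Gym", "Park", "Park"],
})

# One column per category, one row per venue (1.0 where the venue matches)
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="",
                        dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Neighbourhood `A` has two cafés out of four venues, so its `Café` value is 0.5; `B` has only parks, so its `Park` value is 1.0. Using `insert(0, ...)` places the name column first in one step, as an alternative to reordering the column list afterwards.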


In [187]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
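
A quick way to check what this helper returns is to call it on a single made-up row shaped like one line of `tdot_grouped`. The row values below are hypothetical.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighbourhood name), then rank the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up row: neighbourhood name first, then one frequency per category
row = pd.Series({"Neighborhood": "A", "Café": 0.5, "Gym": 0.2, "Park": 0.3})

top = return_most_common_venues(row, 2)
print(list(top))  # ['Café', 'Park']
```

Note that categories with a frequency of 0 are still eligible to fill the remaining slots when a neighbourhood has fewer distinct categories than `num_top_venues`, which probably explains the repeated runs of unrelated categories at the end of some rows in the tables that follow.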

### Most Popular Venue Types: Toronto

In [189]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
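
The try/except above works for ten columns because every position past the third falls back to `th`. As a sketch of an alternative that would also handle larger values such as 21 or 22, the suffix can be computed directly. The `ordinal` helper is a hypothetical name introduced here, not part of the original notebook.

```python
def ordinal(n):
    # 11, 12 and 13 take "th" even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10

# Build the same column names as the loop above
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns[1], "|", columns[4], "|", columns[10])
```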

In [197]:
# create a new dataframe
tdot_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
tdot_neighborhoods_venues_sorted['Neighborhood'] = tdot_grouped['Neighborhood']

for ind in np.arange(tdot_grouped.shape[0]):
    tdot_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(tdot_grouped.iloc[ind, :], num_top_venues)

tdot_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto (Davisville North),Gym,Hotel,Department Store,Dog Run,Park,Food & Drink Shop,Breakfast Spot,Gastropub,Gluten-free Restaurant,Electronics Store
1,Central Toronto (Davisville),Coffee Shop,Italian Restaurant,Dessert Shop,Café,Sandwich Place,Sushi Restaurant,Gas Station,Flower Shop,Indian Restaurant,Fast Food Restaurant
2,Central Toronto (Forest Hill North & West),Business Service,Park,Lawyer,Trail,Dessert Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop
3,Central Toronto (Lawrence Park East),Park,Lawyer,Photography Studio,Dessert Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Dog Run
4,Central Toronto (Moore Park / Summerhill East),Gym,Tennis Court,Park,Grocery Store,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


### Most Popular Venue Types: Vancouver

In [198]:
# create a new dataframe
vanc_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
vanc_neighborhoods_venues_sorted['Neighborhood'] = vanc_grouped['Neighborhood']

for ind in np.arange(vanc_grouped.shape[0]):
    vanc_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(vanc_grouped.iloc[ind, :], num_top_venues)

vanc_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North Vancouver North Central,Trail,Mountain,Scenic Lookout,Yoga Studio,Event Space,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant
1,North Vancouver Northwest,Trail,Ski Chalet,Restaurant,Fast Food Restaurant,Mountain,Sporting Goods Shop,Ski Chairlift,Skating Rink,Scenic Lookout,Ski Trail
2,North Vancouver Northwest Central,Trail,Yoga Studio,Event Space,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Fair
3,North Vancouver South Central,Butcher,Breakfast Spot,Sandwich Place,Japanese Restaurant,Yoga Studio,Falafel Restaurant,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market
4,North Vancouver Southwest,Coffee Shop,Diner,Discount Store,Gym,Bus Station,Greek Restaurant,Toy / Game Store,Burger Joint,Other Great Outdoors,Japanese Restaurant


### Let's cluster the Toronto neighbourhoods to see what we can conclude

In [199]:
# set number of clusters
kclusters = 4

tdot_grouped_clustering = tdot_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(tdot_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
# add clustering labels
tdot_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

tdot_merged = tdot_neighborhoods

# join the cluster labels and top venues onto the Toronto neighbourhood records, which carry latitude/longitude
tdot_merged = tdot_merged.join(tdot_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
tdot_merged

Unnamed: 0,Neighborhood,PostalCode,Country,Province,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto (North Toronto West),M4R,Canada,Ontario,Toronto,43.714001,-79.405998,3.0,Yoga Studio,Health & Beauty Service,Salon / Barbershop,Café,Restaurant,Diner,Metro Station,Coffee Shop,Clothing Store,Mexican Restaurant
1,Downtown Toronto (Toronto Dominion Centre / De...,M5K,Canada,Ontario,Toronto,43.646999,-79.382004,3.0,Coffee Shop,Café,Restaurant,Bakery,Steakhouse,Gym / Fitness Center,Hotel,Japanese Restaurant,Deli / Bodega,Concert Hall
2,Downtown Toronto (University of Toronto / Harb...,M5S,Canada,Ontario,Toronto,43.662998,-79.399002,3.0,Café,Restaurant,Japanese Restaurant,Italian Restaurant,Bookstore,Bakery,College Arts Building,Chinese Restaurant,Beer Store,Beer Bar
3,Downsview East (CFB Toronto),M3K,Canada,Ontario,Toronto,43.738998,-79.469002,3.0,Airport,Shoe Store,Coffee Shop,Food Court,Wine Shop,Diner,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
4,East Toronto (The Beaches),M4E,Canada,Ontario,Toronto,43.678001,-79.293999,3.0,Trail,Gastropub,Health Food Store,Bakery,Pub,Wine Shop,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store
5,East Toronto (The Danforth East),M4J,Canada,Ontario,Toronto,43.687,-79.336998,0.0,Park,Coffee Shop,Greek Restaurant,Convenience Store,Cosmetics Shop,Dog Run,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store
6,East Toronto (Studio District),M4M,Canada,Ontario,Toronto,43.655998,-79.341003,0.0,Gym,Garden Center,Coworking Space,Coffee Shop,Diner,Baseball Field,Park,Gay Bar,Deli / Bodega,Eastern European Restaurant
7,Central Toronto (Lawrence Park East),M4N,Canada,Ontario,Toronto,43.73,-79.393997,2.0,Park,Lawyer,Photography Studio,Dessert Shop,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Dog Run
8,Central Toronto (Davisville North),M4P,Canada,Ontario,Toronto,43.714001,-79.389,0.0,Gym,Hotel,Department Store,Dog Run,Park,Food & Drink Shop,Breakfast Spot,Gastropub,Gluten-free Restaurant,Electronics Store
9,Central Toronto (Davisville),M4S,Canada,Ontario,Toronto,43.702,-79.385002,3.0,Coffee Shop,Italian Restaurant,Dessert Shop,Café,Sandwich Place,Sushi Restaurant,Gas Station,Flower Shop,Indian Restaurant,Fast Food Restaurant
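
The cluster labels show up as `3.0`, `0.0` and so on rather than integers. The most likely reason is that `join` keeps every neighbourhood from the left-hand frame, and any neighbourhood for which no venues were returned gets a missing label, which forces the column to float. A minimal sketch of that effect on made-up frames, with one way to tidy it up:

```python
import pandas as pd

# Made-up frames: "C" has coordinates but no venues, so it has no label
neighborhoods = pd.DataFrame({"Neighborhood": ["A", "B", "C"],
                              "Latitude": [43.71, 43.65, 43.74]})
labelled = pd.DataFrame({"Neighborhood": ["A", "B"],
                         "Cluster Labels": [3, 0]})

# Same join pattern as the notebook: left join on the neighbourhood name
merged = neighborhoods.join(labelled.set_index("Neighborhood"),
                            on="Neighborhood")
print(merged["Cluster Labels"].tolist())  # [3.0, 0.0, nan]

# Drop unlabelled rows, then restore integer labels
clean = merged.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)
print(clean["Cluster Labels"].tolist())  # [3, 0]
```

Integer labels are also convenient later if the labels are used to index into a list of colours when drawing clusters on a map.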


### Let's cluster the Vancouver neighbourhoods to see what we can conclude

In [200]:
# set number of clusters
kclusters = 4

vanc_grouped_clustering = vanc_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vanc_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 
# add clustering labels
vanc_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

vanc_merged = vanc_neighborhoods

# join the cluster labels and top venues onto the Vancouver neighbourhood records, which carry latitude/longitude
vanc_merged = vanc_merged.join(vanc_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
vanc_merged

Unnamed: 0,Neighborhood,PostalCode,Country,Province,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Vancouver (Killarney),V5S,Canada,British Columbia,Vancouver,49.217999,-123.038002,1.0,Bus Stop,Pizza Place,Shopping Mall,Chinese Restaurant,Liquor Store,Sandwich Place,Salon / Barbershop,Sushi Restaurant,Juice Bar,Farmers Market
1,Vancouver (North Hastings-Sunrise),V5K,Canada,British Columbia,Vancouver,49.280998,-123.040001,1.0,Theme Park Ride / Attraction,Theme Park,Event Space,Soccer Field,Sushi Restaurant,Sandwich Place,Bus Station,Burger Joint,Farm,Market
2,Vancouver (North Grandview-Woodlands),V5L,Canada,British Columbia,Vancouver,49.278999,-123.067001,1.0,Asian Restaurant,Café,Coffee Shop,Brewery,Theater,Pizza Place,Sushi Restaurant,Convenience Store,Fish & Chips Shop,Chinese Restaurant
3,Vancouver (SE Kensington / Victoria-Fraserview),V5P,Canada,British Columbia,Vancouver,49.222,-123.068001,1.0,Pizza Place,Pharmacy,Fried Chicken Joint,Park,Pet Store,Convenience Store,Motorcycle Shop,Restaurant,Sandwich Place,Middle Eastern Restaurant
4,Vancouver (South Renfrew-Collingwood),V5R,Canada,British Columbia,Vancouver,49.240002,-123.041,1.0,Fish & Chips Shop,Hotel,Park,Asian Restaurant,Bus Stop,Bar,Falafel Restaurant,Financial or Legal Service,Filipino Restaurant,Field
5,Vancouver (East Mount Pleasant),V5T,Canada,British Columbia,Vancouver,49.262001,-123.092003,1.0,Sushi Restaurant,Vietnamese Restaurant,Ethiopian Restaurant,Pharmacy,Outdoor Sculpture,Park,Grocery Store,Pizza Place,Cocktail Bar,Pub
6,Vancouver (East Fairview / South Cambie),V5Z,Canada,British Columbia,Vancouver,49.248001,-123.121002,1.0,Bus Stop,Coffee Shop,Chinese Restaurant,Gym,Bank,Dessert Shop,Café,Liquor Store,Light Rail Station,Sushi Restaurant
7,Vancouver (South West End),V6E,Canada,British Columbia,Vancouver,49.283001,-123.129997,1.0,Bakery,Diner,Bookstore,Greek Restaurant,Japanese Restaurant,Farmers Market,Gay Bar,Middle Eastern Restaurant,Miscellaneous Shop,Seafood Restaurant
8,Vancouver (Central Kitsilano),V6K,Canada,British Columbia,Vancouver,49.264999,-123.165001,1.0,Coffee Shop,Vegetarian / Vegan Restaurant,Café,Yoga Studio,Farmers Market,Japanese Restaurant,Food Truck,Burger Joint,Liquor Store,Restaurant
9,Vancouver (NW Arbutus Ridge),V6L,Canada,British Columbia,Vancouver,49.25,-123.166,1.0,Caribbean Restaurant,Italian Restaurant,Bakery,Yoga Studio,Fair,Filipino Restaurant,Field,Fast Food Restaurant,Farmers Market,Farm


# Conclusions / Insights

<ul>
    <li>What I love about <b>Rosedale</b> is that it is by and large the most outdoor recreation-friendly neighbourhood in the Toronto downtown core, relative to its surroundings, with enough outdoor venue types to set it apart as its own cluster. That's not saying much compared to Vancouver - the data shows they are completely different cities.
    </li>
        <li>If I want to reside in an area of Vancouver with the most outdoor recreation-friendly activities, as I did in Toronto, I should start looking first in the North West / Stanley Park area.
    </li>
  <li>Climate looks like it plays a significant role in the top 10 types of venues of the respective mega cities, and these venues seem to speak to the culture of the respective areas. This at least partially explains the significant difference in top 10 venue types.
    </li>
</ul>
    

### I hope you enjoyed it!