Neighborhood Comparison Between Toronto and New York City
===============================================

## Table of Contents

1. [Introduction](#chapter1)
2. [Acquire NYC Neighborhood Data](#chapter2)
3. [Acquire Toronto Neighborhood Data](#chapter3)
4. [Explore Neighborhoods](#chapter4)
5. [Analyze Neighborhoods](#chapter5)
6. [Cluster Neighborhoods](#chapter6)
7. [Examine Neighborhood Clusters](#chapter7)
8. [Find Similar Neighborhoods](#chapter8)

## 1. Introduction <a class="anchor" id="chapter1"></a>

In this project, we analyze neighborhood venues in Toronto and New York City (NYC) to answer two related questions:
- When all Toronto and NYC neighborhoods are grouped by their nearby venues, what groups emerge, and how are they similar to or different from one another?
- Given a neighborhood in an origin city, which neighborhoods in the destination city, centered around a business location such as the Google NYC office, are similar to it?

## 2. Acquire NYC Neighborhood Data <a class="anchor" id="chapter2"></a>

In [1]:
import numpy as np
import pandas as pd
import json
import requests
from pandas import json_normalize  # top-level import path since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
from geopy.geocoders import Nominatim
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
print('Completed importing libraries.')

Completed importing libraries.


The NYC neighborhood data has already been downloaded and saved locally. Let's read it in and convert it to a dataframe.

In [252]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)['features']

boroughs = []
neighborhoods = []
lats = []
lons = []
for data in newyork_data:
    boroughs.append(data['properties']['borough'])
    neighborhoods.append(data['properties']['name'])
    lats.append(data['geometry']['coordinates'][1])
    lons.append(data['geometry']['coordinates'][0])

nyc_neighborhoods_df = pd.DataFrame(
    {'Borough': boroughs, 'Neighborhood': neighborhoods, 'Latitude': lats, 'Longitude': lons}
)
print(nyc_neighborhoods_df.head())
print(nyc_neighborhoods_df.shape)
print(nyc_neighborhoods_df['Borough'].unique())

  Borough Neighborhood   Latitude  Longitude
0   Bronx    Wakefield  40.894705 -73.847201
1   Bronx   Co-op City  40.874294 -73.829939
2   Bronx  Eastchester  40.887556 -73.827806
3   Bronx    Fieldston  40.895437 -73.905643
4   Bronx    Riverdale  40.890834 -73.912585
(306, 4)
['Bronx' 'Manhattan' 'Brooklyn' 'Queens' 'Staten Island']
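The loop above can also be written with `pd.json_normalize`. The following is a minimal sketch on a two-feature sample that mimics the structure of `newyork_data.json`; the sample list and the helper name `features_to_frame` are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical two-feature sample shaped like the 'features' list in
# newyork_data.json; the values are copied from the output printed above.
sample_features = [
    {"properties": {"borough": "Bronx", "name": "Wakefield"},
     "geometry": {"coordinates": [-73.847201, 40.894705]}},
    {"properties": {"borough": "Bronx", "name": "Co-op City"},
     "geometry": {"coordinates": [-73.829939, 40.874294]}},
]


def features_to_frame(features):
    """Flatten GeoJSON point features into the four columns used in this notebook."""
    flat = pd.json_normalize(features)
    # GeoJSON stores point coordinates as [longitude, latitude].
    coords = pd.DataFrame(flat["geometry.coordinates"].tolist(),
                          columns=["Longitude", "Latitude"])
    return pd.DataFrame({
        "Borough": flat["properties.borough"],
        "Neighborhood": flat["properties.name"],
        "Latitude": coords["Latitude"],
        "Longitude": coords["Longitude"],
    })


sample_df = features_to_frame(sample_features)
print(sample_df)
```

Note the coordinate order: index 0 is the longitude and index 1 is the latitude, which is why the loop above reads `coordinates[1]` for `lats`.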


There are 306 neighborhoods in 5 boroughs in NYC. Let's visualize by displaying them on a map:

In [198]:
def map_city_neighborhoods(city_map, city_neighborhoods_df):
    # add markers to map
    for lat, lng, borough, neighborhood in zip(city_neighborhoods_df['Latitude'], 
                                               city_neighborhoods_df['Longitude'], 
                                               city_neighborhoods_df['Borough'], 
                                               city_neighborhoods_df['Neighborhood']
                                              ):
        label = '{}, {}'.format(neighborhood, borough)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(city_map)  

In [253]:
    
nyc_center_lat_lon = Nominatim(user_agent="ny_explorer").geocode('New York City, NY')
map_nyc = folium.Map(location=[nyc_center_lat_lon.latitude, nyc_center_lat_lon.longitude], zoom_start=10)
map_city_neighborhoods(map_nyc, nyc_neighborhoods_df)
map_nyc
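The map above is centered by geocoding the city name, which requires a network call to Nominatim. As a hedged alternative, the center could be taken as the mean of the neighborhood coordinates. The sketch below uses only the first three rows printed earlier, and the helper name `centroid` is hypothetical.

```python
import pandas as pd

# First three NYC rows, copied from the dataframe preview above.
nyc_sample = pd.DataFrame({
    "Neighborhood": ["Wakefield", "Co-op City", "Eastchester"],
    "Latitude": [40.894705, 40.874294, 40.887556],
    "Longitude": [-73.847201, -73.829939, -73.827806],
})


def centroid(df):
    """Return [mean latitude, mean longitude] as plain Python floats."""
    return [float(df["Latitude"].mean()), float(df["Longitude"].mean())]


center = centroid(nyc_sample)
print(center)
```

The resulting two-element list has the same `[latitude, longitude]` shape that is passed as `location` to `folium.Map` above.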

## 3. Acquire Toronto Neighborhood Data <a class="anchor" id="chapter3"></a>

Scrape the Wikipedia page to read in the Toronto neighborhoods, drop the rows with no assigned borough, and geocode each neighborhood by joining a local file of postal-code coordinates.

In [254]:
toronto_neighborhoods_df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
toronto_neighborhoods_df = toronto_neighborhoods_df.loc[toronto_neighborhoods_df['Borough'] != 'Not assigned'].reset_index(drop=True)

toronto_lat_lon_df = pd.read_csv('Geospatial_Coordinates.csv')
toronto_neighborhoods_df = toronto_neighborhoods_df.join(toronto_lat_lon_df.set_index('Postal Code'), on='Postal Code')
toronto_neighborhoods_df = toronto_neighborhoods_df.rename(columns = {"Neighbourhood": "Neighborhood"}) 

print(toronto_neighborhoods_df.head())
print(toronto_neighborhoods_df.shape)
print(toronto_neighborhoods_df['Borough'].unique())

  Postal Code           Borough                                 Neighborhood  \
0         M3A        North York                                    Parkwoods   
1         M4A        North York                             Victoria Village   
2         M5A  Downtown Toronto                    Regent Park, Harbourfront   
3         M6A        North York             Lawrence Manor, Lawrence Heights   
4         M7A  Downtown Toronto  Queen's Park, Ontario Provincial Government   

    Latitude  Longitude  
0  43.753259 -79.329656  
1  43.725882 -79.315572  
2  43.654260 -79.360636  
3  43.718518 -79.464763  
4  43.662301 -79.389494  
(103, 5)
['North York' 'Downtown Toronto' 'Etobicoke' 'Scarborough' 'East York'
 'York' 'East Toronto' 'West Toronto' 'Central Toronto' 'Mississauga']
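The filter-then-join step can be illustrated on a toy table. This is a sketch under stated assumptions: the `M1A` row is an illustrative "Not assigned" record, while the `M3A` and `M4A` values come from the output above. It uses `DataFrame.merge` with `validate` instead of `join`, which is a swapped-in variant rather than the notebook's own call.

```python
import pandas as pd

# Toy postal-code table; the 'Not assigned' row is illustrative.
postal = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Toy coordinate table shaped like Geospatial_Coordinates.csv.
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# 1. Drop rows with no assigned borough.
assigned = postal.loc[postal["Borough"] != "Not assigned"].reset_index(drop=True)

# 2. Attach coordinates; validate raises if a postal code is duplicated on either side.
merged = assigned.merge(coords, on="Postal Code", how="left", validate="one_to_one")

# 3. Use the same spelling as the NYC dataframe.
merged = merged.rename(columns={"Neighbourhood": "Neighborhood"})
print(merged)
```

With `how="left"`, a postal code missing from the coordinate file would show up as `NaN` in `Latitude` and `Longitude` rather than silently disappearing.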


There are 103 neighborhoods in 10 boroughs in Toronto. Let's visualize the neighborhoods by displaying them on a map:

In [201]:
toronto_center_lat_lon = Nominatim(user_agent="toro_explorer").geocode('Toronto, Ontario')
map_toronto = folium.Map(location=[toronto_center_lat_lon.latitude, toronto_center_lat_lon.longitude], zoom_start=9.5)
map_city_neighborhoods(map_toronto, toronto_neighborhoods_df)
map_toronto

## 4. Explore Neighborhoods <a class="anchor" id="chapter4"></a>

We now use the Foursquare API to explore the neighborhoods. Let's define a function that retrieves each neighborhood's nearby venues and their categories.

In [138]:
import os

# Read the Foursquare credentials from environment variables instead of
# hard-coding (and printing) them in the notebook.
# The environment variable names below are an assumption.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # my Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # my Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
RADIUS = 400 # search radius in meters


def getNearbyVenues(names, latitudes, longitudes, radius=RADIUS):
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request; requests encodes the query string for us
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
            
        # make the GET request and fail loudly on an HTTP error.
        # we know that all the information is in the items key.
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        for v in results:
            venues_list.append((
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']))

    nearby_venues = pd.DataFrame(venues_list, columns=[
                  'Neighborhood', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category'])
    
    return nearby_venues
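The parsing step inside `getNearbyVenues` indexes `categories[0]`, which would raise an `IndexError` if a venue came back with an empty category list. The sketch below shows one way to harden that step. It runs on a hand-written fragment rather than a live response; the second venue, the helper name `extract_venues`, and the `"Unknown"` fallback are all assumptions for illustration.

```python
# Hypothetical fragment shaped like the 'items' list that getNearbyVenues reads.
# The first venue is taken from the NYC output in this notebook; the second is
# invented to exercise the empty-categories case.
sample_items = [
    {"venue": {"name": "Lollipops Gelato",
               "location": {"lat": 40.894123, "lng": -73.845892},
               "categories": [{"name": "Dessert Shop"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 40.8950, "lng": -73.8460},
               "categories": []}},
]


def extract_venues(items, default_category="Unknown"):
    """Return (name, lat, lng, category) tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else default_category
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


rows = extract_venues(sample_items)
print(rows)
```

Whether to keep such venues under a fallback label or to drop them is a design choice; keeping them adds an extra category column to the one-hot encoding later on.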


Get venues for all NYC neighborhoods:

In [139]:
nyc_venues = getNearbyVenues(names=nyc_neighborhoods_df['Neighborhood'],
                                   latitudes=nyc_neighborhoods_df['Latitude'],
                                   longitudes=nyc_neighborhoods_df['Longitude']
                                  )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [140]:
print('There are {} unique venue categories and {} venues in NYC neighborhoods.'.format(
    len(nyc_venues['Venue Category'].unique()), nyc_venues.shape[0]))
nyc_venues.head()

There are 415 unique venue categories and 7717 venues in NYC neighborhoods.


|   | Neighborhood | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Wakefield | 40.894705 | -73.847201 | Lollipops Gelato | 40.894123 | -73.845892 | Dessert Shop |
| 1 | Wakefield | 40.894705 | -73.847201 | Rite Aid | 40.896649 | -73.844846 | Pharmacy |
| 2 | Wakefield | 40.894705 | -73.847201 | Walgreens | 40.896528 | -73.8447 | Pharmacy |
| 3 | Wakefield | 40.894705 | -73.847201 | Shell | 40.894187 | -73.845862 | Gas Station |
| 4 | Wakefield | 40.894705 | -73.847201 | Yafai Corner Store | 40.894745 | -73.850307 | Candy Store |


Get venues for all Toronto neighborhoods:

In [141]:
toronto_venues = getNearbyVenues(names=toronto_neighborhoods_df['Neighborhood'],
                                   latitudes=toronto_neighborhoods_df['Latitude'],
                                   longitudes=toronto_neighborhoods_df['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [142]:
print('There are {} unique venue categories and {} venues in Toronto neighborhoods.'.format(
    len(toronto_venues['Venue Category'].unique()), toronto_venues.shape[0]))
toronto_venues.head()

There are 245 unique venue categories and 1637 venues in Toronto neighborhoods.


|   | Neighborhood | Latitude | Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 2 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |
| 3 | Victoria Village | 43.725882 | -79.315572 | Tim Hortons | 43.725517 | -79.313103 | Coffee Shop |
| 4 | Victoria Village | 43.725882 | -79.315572 | Portugril | 43.725819 | -79.312785 | Portuguese Restaurant |


## 5. Analyze Neighborhoods <a class="anchor" id="chapter5"></a>

We now have the nearby venues for every neighborhood in both NYC and Toronto. Before we can compare them, we need to combine the two venue tables into one master dataframe. Because the same neighborhood name can occur in both cities, we prefix each neighborhood name with its city so that the names in the master dataframe are unique.

In [143]:
nyc_venues['Neighborhood'] = nyc_venues['Neighborhood'].apply(lambda x: 'NYC ' + x)

In [146]:
toronto_venues['Neighborhood'] = toronto_venues['Neighborhood'].apply(lambda x: 'Toronto ' + x)

In [None]:
# Combine nyc venues and toronto venues to form a master venues
nyc_toronto_venues = pd.concat([nyc_venues, toronto_venues], ignore_index=True)
nyc_toronto_venues.rename(columns={'Neighborhood': 'City Neighborhood'}, inplace=True)
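The effect of the city prefix can be seen on a toy example. This is a minimal sketch: the shared name `Chinatown` and both rows are illustrative, not taken from the data. It uses vectorized string concatenation in place of `apply`, which gives the same result.

```python
import pandas as pd

# Hypothetical venue rows; the shared neighborhood name is illustrative only.
nyc = pd.DataFrame({"Neighborhood": ["Chinatown"], "Venue Category": ["Bakery"]})
toronto = pd.DataFrame({"Neighborhood": ["Chinatown"], "Venue Category": ["Café"]})

# Prefix each name with its city (vectorized equivalent of the apply/lambda above).
nyc["Neighborhood"] = "NYC " + nyc["Neighborhood"]
toronto["Neighborhood"] = "Toronto " + toronto["Neighborhood"]

combined = pd.concat([nyc, toronto], ignore_index=True)
combined = combined.rename(columns={"Neighborhood": "City Neighborhood"})
print(combined)
```

Without the prefix, a later `groupby` on the neighborhood name would pool the venues of both cities into a single row.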

In [171]:
# Drop columns not needed for clustering
nyc_toronto_venues.drop(
    ['Latitude', 'Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude'], axis=1, inplace=True)
nyc_toronto_venues.shape

(9354, 2)

In [172]:
nyc_toronto_venues.head()

|   | City Neighborhood | Venue Category |
|---|---|---|
| 0 | NYC Wakefield | Dessert Shop |
| 1 | NYC Wakefield | Pharmacy |
| 2 | NYC Wakefield | Pharmacy |
| 3 | NYC Wakefield | Gas Station |
| 4 | NYC Wakefield | Candy Store |


In [173]:
nyc_toronto_venues.tail()

|   | City Neighborhood | Venue Category |
|---|---|---|
| 9349 | Toronto Mimico NW, The Queensway West, South o... | Grocery Store |
| 9350 | Toronto Mimico NW, The Queensway West, South o... | Social Club |
| 9351 | Toronto Mimico NW, The Queensway West, South o... | Tanning Salon |
| 9352 | Toronto Mimico NW, The Queensway West, South o... | Kids Store |
| 9353 | Toronto Mimico NW, The Queensway West, South o... | Thrift / Vintage Store |


One-hot encode the venue categories so we can use the k-means clustering algorithm later.

In [174]:
venue_categories_onehot = pd.get_dummies(nyc_toronto_venues['Venue Category'])
venues_onehot = pd.concat([nyc_toronto_venues, venue_categories_onehot], axis=1)
venues_onehot.drop(['Venue Category'], axis=1, inplace=True)
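To make the encoding concrete, here is a minimal sketch that applies `pd.get_dummies` to the five Wakefield rows shown in the preview above. The variable names are illustrative, and `dtype=int` is passed only so the output prints as 0/1.

```python
import pandas as pd

# The five 'NYC Wakefield' rows from the preview of nyc_toronto_venues.
venues = pd.DataFrame({
    "City Neighborhood": ["NYC Wakefield"] * 5,
    "Venue Category": ["Dessert Shop", "Pharmacy", "Pharmacy",
                       "Gas Station", "Candy Store"],
})

# One column per distinct category, one row per venue, exactly one 1 per row.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "City Neighborhood", venues["City Neighborhood"])
print(onehot)
```

Each venue becomes a row with a single 1 in the column of its category, so column sums count venues per category.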

Let's group the venues by city neighborhood and take the mean of each one-hot column, which gives the frequency of each category within a neighborhood.

In [175]:
venues_onehot_grouped = venues_onehot.groupby('City Neighborhood').mean().reset_index()
print(venues_onehot_grouped.shape)
venues_onehot_grouped.head()

(392, 444)


|   | City Neighborhood | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | Airport | ... | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NYC Allerton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | NYC Annadale | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | NYC Arden Heights | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | NYC Arlington | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | NYC Arrochar | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 |

5 rows × 444 columns (middle columns omitted)
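Because each one-hot row contains a single 1, the per-neighborhood mean of a column equals the share of that neighborhood's venues in that category. The sketch below demonstrates this on the few rows visible in the earlier previews. It does not use the neighborhoods' complete venue lists, so the frequencies it prints are not the notebook's actual values.

```python
import pandas as pd

# Rows taken from the venue previews earlier in this notebook (not complete lists).
venues = pd.DataFrame({
    "City Neighborhood": ["NYC Wakefield"] * 5 + ["Toronto Parkwoods"] * 2,
    "Venue Category": ["Dessert Shop", "Pharmacy", "Pharmacy", "Gas Station",
                       "Candy Store", "Park", "Food & Drink Shop"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "City Neighborhood", venues["City Neighborhood"])

# Mean of 0/1 indicators per group = relative frequency of each category.
freq = onehot.groupby("City Neighborhood").mean()
print(freq.round(2))
```

Every row of `freq` sums to 1, which puts neighborhoods with many venues and neighborhoods with few venues on the same scale before clustering.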


## 6. Cluster Neighborhoods <a class="anchor" id="chapter6"></a>

Run k-means to cluster NYC and Toronto neighborhoods.

In [352]:
kclusters = 3
venues_onehot_grouped_clustering = venues_onehot_grouped.drop(columns='City Neighborhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(venues_onehot_grouped_clustering)
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 0, 1, 1, 1, 1], dtype=int32)
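As a small illustration of what k-means does with frequency vectors, the sketch below clusters four made-up neighborhoods described by three categories. The matrix `X` is hypothetical, and `n_init=10` is set explicitly here as an assumption; the notebook itself relies on the library default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical category-frequency vectors; each row sums to 1.
X = np.array([
    [0.8, 0.1, 0.1],   # dominated by category 0
    [0.7, 0.2, 0.1],   # dominated by category 0
    [0.1, 0.1, 0.8],   # dominated by category 2
    [0.2, 0.1, 0.7],   # dominated by category 2
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_

print(labels)
# Each center is the mean frequency profile of the rows assigned to that cluster.
print(model.cluster_centers_.round(2))
```

The label numbers themselves carry no meaning; only the grouping does. Inspecting `cluster_centers_` shows which categories characterize each cluster.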

Let's create a new dataframe that has the top 5 venues for each neighborhood as well as the cluster for each neighborhood.

In [353]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['City Neighborhood'] = venues_onehot_grouped['City Neighborhood']

for ind in np.arange(venues_onehot_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(venues_onehot_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,City Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,NYC Allerton,Pizza Place,Chinese Restaurant,Deli / Bodega,Supermarket,Playground
1,NYC Annadale,Pizza Place,American Restaurant,Food,Bagel Shop,Dance Studio
2,NYC Arden Heights,Deli / Bodega,Pharmacy,Coffee Shop,Smoke Shop,Bus Stop
3,NYC Arlington,Deli / Bodega,Coffee Shop,Gay Bar,Home Service,Bus Stop
4,NYC Arrochar,Deli / Bodega,Italian Restaurant,Mediterranean Restaurant,Food Truck,Pizza Place
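
The ranking logic can be checked on a small example. The sketch below uses a made-up frequency table (the dataframe `toy` and the helper `top_venues` are hypothetical names, not part of the notebook) and sorts each row's venue frequencies in descending order to pick the top categories.

```python
import pandas as pd

# hypothetical mean venue frequencies per neighborhood
toy = pd.DataFrame({
    'City Neighborhood': ['NYC A', 'Toronto B'],
    'Park': [0.5, 0.1],
    'Beach': [0.3, 0.0],
    'Cafe': [0.2, 0.9],
})

def top_venues(df, n):
    """Return {neighborhood: [n most frequent venue categories]}."""
    freq = df.set_index('City Neighborhood')
    return {
        name: row.sort_values(ascending=False).index[:n].tolist()
        for name, row in freq.iterrows()
    }

top = top_venues(toy, 2)
print(top)
```

Note that categories with equal frequency have no guaranteed relative order, so low-ranked venues in sparse neighborhoods can be somewhat arbitrary.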


In [354]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)
nyc_neighborhoods_df['City Neighborhood'] = nyc_neighborhoods_df['Neighborhood'].apply(lambda x: 'NYC ' + x)
toronto_neighborhoods_df['City Neighborhood'] = toronto_neighborhoods_df['Neighborhood'].apply(lambda x: 'Toronto ' + x)
nyc_neighborhoods = nyc_neighborhoods_df.loc[:, ['City Neighborhood', 'Latitude', 'Longitude']]
toronto_neighborhoods = toronto_neighborhoods_df.loc[:, ['City Neighborhood', 'Latitude', 'Longitude']]
nyc_toronto_neighborhoods = pd.concat([nyc_neighborhoods, toronto_neighborhoods], ignore_index=True)
# nyc_toronto_neighborhoods.rename(columns={'Neighborhood': 'City Neighborhood'}, inplace=True)
nyc_toronto_neighborhoods_clusters = nyc_toronto_neighborhoods
# merge venues_onehot_grouped with nyc_toronto_neighborhoods_clusters to add latitude/longitude for each neighborhood
nyc_toronto_neighborhoods_clusters = nyc_toronto_neighborhoods_clusters.join(neighborhoods_venues_sorted.set_index('City Neighborhood'), on='City Neighborhood')
nyc_toronto_neighborhoods_clusters.dropna(inplace=True)
nyc_toronto_neighborhoods_clusters = nyc_toronto_neighborhoods_clusters.astype({'Cluster Label': int})

In [355]:
nyc_toronto_neighborhoods_clusters.head()

Unnamed: 0,City Neighborhood,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,NYC Wakefield,40.894705,-73.847201,1,Pharmacy,Deli / Bodega,Gas Station,Dessert Shop,Candy Store
1,NYC Co-op City,40.874294,-73.829939,1,Park,Grocery Store,Trail,Liquor Store,Salon / Barbershop
2,NYC Eastchester,40.887556,-73.827806,1,Caribbean Restaurant,Diner,Deli / Bodega,Metro Station,Bus Station
3,NYC Fieldston,40.895437,-73.905643,1,Plaza,River,Yoga Studio,Escape Room,Ethiopian Restaurant
4,NYC Riverdale,40.890834,-73.912585,1,Plaza,Playground,Moving Target,Farmers Market,Ethiopian Restaurant


Finally, we can visualize the Toronto and NYC neighborhood clusters. First, let us define a map display function.

In [396]:
# Function that creates neighborhood cluster map
def show_map(map_center_lat_lon, included_clusters=None, show_center=False, center_label=''):
    map_clusters = folium.Map(location=[map_center_lat_lon.latitude, map_center_lat_lon.longitude], zoom_start=10)

    if show_center:
        folium.CircleMarker(
            [map_center_lat_lon.latitude, map_center_lat_lon.longitude],
            radius=8,
            popup=folium.Popup(center_label, parse_html=True),
            color='red',
            fill=True,
            fill_color='red',
            fill_opacity=0.7).add_to(map_clusters)
        
    # set color scheme for the clusters: one evenly spaced rainbow color per cluster
    colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
    rainbow = [colors.rgb2hex(i) for i in colors_array]
    display_clusters = included_clusters or nyc_toronto_neighborhoods_clusters['Cluster Label'].unique().tolist()
    # add markers to the map
    for lat, lon, poi, cluster in zip(nyc_toronto_neighborhoods_clusters['Latitude'], 
                                      nyc_toronto_neighborhoods_clusters['Longitude'], 
                                      nyc_toronto_neighborhoods_clusters['City Neighborhood'], 
                                      nyc_toronto_neighborhoods_clusters['Cluster Label']
                                     ):
        if cluster not in display_clusters:
            continue
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster],  # cluster labels are zero-based
            fill=True,
            fill_color=rainbow[cluster],
            fill_opacity=0.7).add_to(map_clusters)

    return map_clusters
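
A note on how cluster labels map to colors: k-means labels are zero-based, so a palette with one entry per cluster can be indexed by the label directly. Indexing by `label - 1` also yields a distinct color per cluster, but only because Python wraps index `-1` around to the last entry. The sketch below illustrates both with a made-up palette (the hex values are assumptions, not the colors produced by `cm.rainbow`).

```python
# hypothetical palette with one color per cluster
palette = ['#ff0000', '#00ff00', '#0000ff']
labels = [0, 1, 2]

# direct indexing: label k gets palette[k]
direct = {k: palette[k] for k in labels}

# shifted indexing: label 0 wraps around to the last color
shifted = {k: palette[k - 1] for k in labels}

print(direct)
print(shifted)
```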

This is the NYC map showing all neighborhood clusters:

In [397]:
map_nyc_all_clusters = show_map(Nominatim(user_agent="ny_toronto_explorer").geocode('New York, NY'))
map_nyc_all_clusters

And this is the Toronto map showing all neighborhood clusters:

In [358]:
map_toronto_all_clusters = show_map(Nominatim(user_agent="ny_toronto_explorer").geocode('Toronto, Ontario'))
map_toronto_all_clusters

## 7. Examine Neighborhood Clusters <a class="anchor" id="chapter7"></a>

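Recall that one of the goals of this project is to discover how NYC neighborhoods and Toronto neighborhoods are similar or dissimilar in terms of nearby venues. So, let us examine each neighborhood cluster. Note that the cluster labels in the dataframe are zero-based, so Cluster 1 below corresponds to `Cluster Label == 0`, Cluster 2 to label 1, and Cluster 3 to label 2.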
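
Before looking at individual clusters, it can help to count how many neighborhoods from each city fall into each cluster. The following is a minimal sketch on made-up rows (the frame `toy` and its labels are hypothetical); it assumes the city is the first word of the `City Neighborhood` value, as in this notebook's naming scheme.

```python
import pandas as pd

toy = pd.DataFrame({
    'City Neighborhood': ['NYC A', 'NYC B', 'NYC C', 'Toronto D', 'Toronto E'],
    'Cluster Label': [0, 1, 1, 1, 2],
})

# the city is the prefix before the first space
city = toy['City Neighborhood'].str.split(' ', n=1).str[0].rename('City')

# rows: city, columns: cluster label, values: number of neighborhoods
counts = pd.crosstab(city, toy['Cluster Label'])
print(counts)
```

Running the same two lines on `nyc_toronto_neighborhoods_clusters` would give the per-city cluster sizes that the observations below refer to.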

### 7.1. Cluster 1 - Beach Neighborhoods

A couple of interesting observations:
- Neighborhoods in this cluster mostly have **beach, yoga studio, and related venues**.
- These are all NYC neighborhoods; Toronto doesn't have any neighborhoods in this cluster.
- If you look at the NYC and Toronto maps below showing this cluster, this seems to make sense, as Toronto doesn't appear to have comparable beach neighborhoods.

In [361]:
nyc_toronto_neighborhoods_clusters.loc[
    nyc_toronto_neighborhoods_clusters['Cluster Label'] == 0, 
    nyc_toronto_neighborhoods_clusters.columns[[0] + list(
        range(4, nyc_toronto_neighborhoods_clusters.shape[1]))]
]

Unnamed: 0,City Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
85,NYC Sea Gate,Beach,Lighthouse,Dog Run,Yoga Studio,Field
172,NYC Breezy Point,Trail,Beach,Monument / Landmark,Yoga Studio,Field
177,NYC Arverne,Surf Spot,Playground,Beach,Bus Stop,Filipino Restaurant
179,NYC Neponsit,Beach,Yoga Studio,Fruit & Vegetable Store,Ethiopian Restaurant,Event Service
190,NYC Belle Harbor,Beach,Playground,Yoga Studio,Escape Room,Event Service
204,NYC South Beach,Beach,Baseball Field,Deli / Bodega,Pier,Gym
302,NYC Hammels,Beach,Neighborhood,Building,Fast Food Restaurant,Bus Stop


In [359]:
map_nyc_cluster_1 = show_map(Nominatim(user_agent="ny_toronto_explorer").geocode('New York, NY'), 
                             included_clusters=[0])
map_nyc_cluster_1

In [394]:
map_toronto_cluster_1 = show_map(Nominatim(user_agent="ny_toronto_explorer").geocode('Toronto, Ontario'), 
                             included_clusters=[0])
map_toronto_cluster_1

### 7.2. Cluster 2 - Restaurant and Food Neighborhoods

A large number of both NYC and Toronto neighborhoods are in this cluster. Many of them have **various types of restaurants, coffee shops, and bakeries as their most common venues**. For example, all neighborhoods listed below have Chinese Restaurant as their 1st most common venue. Somewhat surprisingly, only one Toronto neighborhood has Chinese Restaurant as its 1st most common venue; the rest, including every row shown below, are in NYC.

In [371]:
filt = (nyc_toronto_neighborhoods_clusters['1st Most Common Venue'] == 'Chinese Restaurant')
nyc_toronto_neighborhoods_clusters.loc[filt]

Unnamed: 0,City Neighborhood,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
25,NYC Morrisania,40.823592,-73.901506,1,Chinese Restaurant,Bus Station,Fast Food Restaurant,Sandwich Place,Playground
47,NYC Bensonhurst,40.611009,-73.99518,1,Chinese Restaurant,Italian Restaurant,Butcher,Pet Store,Park
53,NYC Manhattan Terrace,40.614433,-73.957438,1,Chinese Restaurant,Jazz Club,Bowling Alley,Fast Food Restaurant,Filipino Restaurant
56,NYC East Flatbush,40.641718,-73.936103,1,Chinese Restaurant,Caribbean Restaurant,Supermarket,Fast Food Restaurant,Liquor Store
72,NYC East New York,40.669926,-73.880699,1,Chinese Restaurant,Deli / Bodega,Event Service,Bus Station,Asian Restaurant
73,NYC Starrett City,40.647589,-73.87937,1,Chinese Restaurant,Caribbean Restaurant,Bus Stop,American Restaurant,Donut Shop
99,NYC Fort Hamilton,40.614768,-74.031979,1,Chinese Restaurant,Italian Restaurant,Sandwich Place,Donut Shop,Bank
100,NYC Chinatown,40.715618,-73.994279,1,Chinese Restaurant,Cocktail Bar,Bakery,Hotpot Restaurant,Dessert Shop
119,NYC Lower East Side,40.717807,-73.98089,1,Chinese Restaurant,Coffee Shop,Pharmacy,Art Gallery,Yoga Studio
153,NYC Little Neck,40.770826,-73.738898,1,Chinese Restaurant,Korean Restaurant,Pizza Place,Spa,Italian Restaurant


In [362]:
map_nyc_cluster_2 = show_map(Nominatim(user_agent="ny_toronto_explorer").geocode('New York, NY'), 
                             included_clusters=[1])
map_nyc_cluster_2

In [363]:
map_toronto_cluster_2 = show_map(Nominatim(user_agent="ny_toronto_explorer").geocode('Toronto, Ontario'), 
                             included_clusters=[1])
map_toronto_cluster_2

### 7.3. Cluster 3 - Park Neighborhoods

Some observations for this neighborhood cluster:
- Neighborhoods in this cluster mostly have **parks and park-related venues**.
- Relative to the number of neighborhoods in each city, Toronto has more park neighborhoods than NYC, which is not that surprising given that NYC is much more densely populated and its real estate is more heavily developed.
- Neighborhoods in this cluster tend to lie outside downtown, in or around the suburbs, which is again not surprising.

In [367]:
nyc_toronto_neighborhoods_clusters.loc[
    nyc_toronto_neighborhoods_clusters['Cluster Label'] == 2, 
    nyc_toronto_neighborhoods_clusters.columns[[0] + list(
        range(4, nyc_toronto_neighborhoods_clusters.shape[1]))]
]

Unnamed: 0,City Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
27,NYC Clason Point,Park,Pool,Playground,Deli / Bodega,Candy Store
148,NYC South Ozone Park,Park,Hotel Bar,Food,Sandwich Place,Donut Shop
203,NYC Todt Hill,Park,Yoga Studio,Field,Ethiopian Restaurant,Event Service
238,NYC Butler Manor,Baseball Field,Park,Pool,Yoga Studio,Field
275,NYC Stuyvesant Town,Park,Playground,Fountain,Baseball Field,Cocktail Bar
306,Toronto Parkwoods,Food & Drink Shop,Park,Yoga Studio,Field,Ethiopian Restaurant
327,Toronto Caledonia-Fairbanks,Women's Store,Construction & Landscaping,Park,Field,Ethiopian Restaurant
341,"Toronto East Toronto, Broadview North (Old Eas...",Convenience Store,Intersection,Park,Yoga Studio,Field
355,"Toronto North Park, Maple Leaf Park, Upwood Park",Bakery,Park,Yoga Studio,Fruit & Vegetable Store,Event Service
370,Toronto Weston,Park,Yoga Studio,Field,Ethiopian Restaurant,Event Service


In [365]:
map_nyc_cluster_3 = show_map(Nominatim(user_agent="ny_toronto_explorer").geocode('New York, NY'), 
                             included_clusters=[2])
map_nyc_cluster_3

In [398]:
map_toronto_cluster_3 = show_map(Nominatim(user_agent="ny_toronto_explorer").geocode('Toronto, Ontario'), 
                             included_clusters=[2])
map_toronto_cluster_3

## 8. Find Similar Neighborhoods <a class="anchor" id="chapter8"></a>

Recall that another goal of this project is, given a neighborhood in the origin city, to determine the similar neighborhoods in the destination city centered around a business location, for example, the Google NYC office.

Let us demonstrate this using **Toronto Weston** as the origin neighborhood. We want to find all similar neighborhoods in NYC, display them on a map, and also show the location of the **Google NYC office**.

In [387]:
origin_city_neighborhood = 'Toronto Weston'
origin_city_neighborhood_cluster = nyc_toronto_neighborhoods_clusters.loc[
    nyc_toronto_neighborhoods_clusters['City Neighborhood'] == origin_city_neighborhood, 
    'Cluster Label'].tolist()[0]
origin_city_neighborhood_cluster

filt = (nyc_toronto_neighborhoods_clusters['Cluster Label'] == origin_city_neighborhood_cluster) & (nyc_toronto_neighborhoods_clusters['City Neighborhood'].str.startswith('NYC '))
destination_city_neighborhoods = nyc_toronto_neighborhoods_clusters.loc[filt]

These are the NYC neighborhoods that are similar to the Toronto Weston neighborhood:

In [388]:
destination_city_neighborhoods

Unnamed: 0,City Neighborhood,Latitude,Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
27,NYC Clason Point,40.806551,-73.854144,2,Park,Pool,Playground,Deli / Bodega,Candy Store
148,NYC South Ozone Park,40.66855,-73.809865,2,Park,Hotel Bar,Food,Sandwich Place,Donut Shop
203,NYC Todt Hill,40.597069,-74.111329,2,Park,Yoga Studio,Field,Ethiopian Restaurant,Event Service
238,NYC Butler Manor,40.506082,-74.229504,2,Baseball Field,Park,Pool,Yoga Studio,Field
275,NYC Stuyvesant Town,40.731,-73.974052,2,Park,Playground,Fountain,Baseball Field,Cocktail Bar
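
The lookup above can be wrapped in a small reusable function. This is a sketch under the assumption that "similar" simply means "in the same k-means cluster"; the function name `similar_neighborhoods` and the toy dataframe are hypothetical, and the toy labels mirror a few rows from the tables in this notebook.

```python
import pandas as pd

def similar_neighborhoods(clusters_df, origin, destination_prefix):
    """Rows in the destination city that share the origin's cluster label."""
    origin_labels = clusters_df.loc[
        clusters_df['City Neighborhood'] == origin, 'Cluster Label']
    if origin_labels.empty:
        raise KeyError(f'{origin!r} not found in clusters_df')
    cluster = origin_labels.iloc[0]
    same_cluster = clusters_df['Cluster Label'] == cluster
    in_destination = clusters_df['City Neighborhood'].str.startswith(destination_prefix)
    return clusters_df.loc[same_cluster & in_destination]

toy = pd.DataFrame({
    'City Neighborhood': ['NYC Clason Point', 'NYC Chinatown',
                          'NYC Todt Hill', 'Toronto Weston'],
    'Cluster Label': [2, 1, 2, 2],
})
matches = similar_neighborhoods(toy, 'Toronto Weston', 'NYC ')
print(matches)
```

The same function works in the other direction, for example with an NYC neighborhood as the origin and `'Toronto '` as the destination prefix.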


Let us show on a map the **NYC neighborhoods similar to Toronto Weston, together with the Google NYC office**:

In [400]:
map_nyc_similar_neighborhoods = show_map(
    Nominatim(user_agent="ny_toronto_explorer").geocode('111 8th Ave, New York, NY 10011'), 
    included_clusters=[origin_city_neighborhood_cluster],
    show_center=True,
    center_label='Google NYC'
)

map_nyc_similar_neighborhoods
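
Since the map is centered on a business location, one possible follow-up is to rank the similar neighborhoods by their distance to that location. The sketch below uses the haversine great-circle formula; the helper `haversine_km` is hypothetical, the office coordinates are an approximate assumption rather than the geocoder's output, and the neighborhood coordinates are copied from the table above.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

# assumed approximate coordinates for 111 8th Ave, New York
office = (40.7414, -74.0033)

candidates = {
    'NYC Stuyvesant Town': (40.731, -73.974052),
    'NYC Butler Manor': (40.506082, -74.229504),
    'NYC Todt Hill': (40.597069, -74.111329),
}

# closest neighborhood first
ranked = sorted(candidates, key=lambda name: haversine_km(*office, *candidates[name]))
print(ranked)
```

Straight-line distance ignores commute routes, so this ordering is only a rough first filter.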