# **Coursera Capstone - Final Assessment Notebook**

Welcome to my Notebook for the capstone project of the IBM Data Science Professional Certificate.
This notebook will contain all the code I used to produce my final results and conclusions.

Here is an overview of the problem I will be trying to solve:
I currently live in Manchester (UK) city centre, but I am looking to move to either Toronto or New York. How do I decide which city and, more specifically, which neighbourhood I want to live in?

Manchester is an international university city with lots of students from different backgrounds who want to move to the world’s financial centres for post-graduation employment. Everyone is different, but we all agree that the Mancunian social scene is great.
Thus, my selection criteria will be geared towards moving to a neighbourhood in Toronto or New York which has similar amenities.


Please enjoy!

## **Part 1 - Data Mining**

First, I must collect and amalgamate all the data I will be using.
<br>
Before that, I am going to install and import all the relevant Python libraries.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2020.12.5          |   py36h5fab9bb_1         143 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.1.0                |     pyhd3deb0d_0          64 KB  conda-forge
    openssl-1.1.1j             |       h7f98852_0         2.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.4 MB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-2.1.0-pyhd3deb0d_0

The following packages will be

**1a - New York Data**

In this section I will collect the location data for New York's neighbourhoods.

In [2]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
print('Data loaded!')

Data loaded!


In [4]:
#taking a look at the data
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

This is a very large dataset; however, all the information I need is held under the `features` key.

In [5]:
neighborhoods_data = newyork_data['features']
neighborhoods_data

[{'type': 'Feature',
  'id': 'nyu_2451_34572.1',
  'geometry': {'type': 'Point',
   'coordinates': [-73.84720052054902, 40.89470517661]},
  'geometry_name': 'geom',
  'properties': {'name': 'Wakefield',
   'stacked': 1,
   'annoline1': 'Wakefield',
   'annoline2': None,
   'annoline3': None,
   'annoangle': 0.0,
   'borough': 'Bronx',
   'bbox': [-73.84720052054902,
    40.89470517661,
    -73.84720052054902,
    40.89470517661]}},
 {'type': 'Feature',
  'id': 'nyu_2451_34572.2',
  'geometry': {'type': 'Point',
   'coordinates': [-73.82993910812398, 40.87429419303012]},
  'geometry_name': 'geom',
  'properties': {'name': 'Co-op City',
   'stacked': 2,
   'annoline1': 'Co-op',
   'annoline2': 'City',
   'annoline3': None,
   'annoangle': 0.0,
   'borough': 'Bronx',
   'bbox': [-73.82993910812398,
    40.87429419303012,
    -73.82993910812398,
    40.87429419303012]}},
 {'type': 'Feature',
  'id': 'nyu_2451_34572.3',
  'geometry': {'type': 'Point',
   'coordinates': [-73.82780644716412, 

In [6]:
#taking the first item as an example to better understand dataset
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [7]:
#to analyse this data fully I need to convert it into a pandas dataframe
#first I am going to create an empty dataframe, then I will load the data into it
# define the dataframe columns
column_names = ['Borough', 'Neighbourhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
ny_neighbourhoods = pd.DataFrame(columns=column_names)
ny_neighbourhoods

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude


In [8]:
#I will now use a loop to load the data into the dataframe one row at a time

for data in neighborhoods_data:
    borough = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    ny_neighbourhoods = ny_neighbourhoods.append({'Borough': borough,
                                          'Neighbourhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
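
The same extraction can be done without growing the dataframe row by row. The sketch below is a hedged alternative, not the code this notebook ran: `feature_to_row` is a hypothetical helper name, and the sample feature is a trimmed copy of the Wakefield record shown earlier. It relies on the GeoJSON convention that coordinates are ordered longitude first, then latitude.

```python
def feature_to_row(feature):
    """Flatten one GeoJSON-style feature into a flat dict (hypothetical helper)."""
    props = feature["properties"]
    # GeoJSON stores points as [longitude, latitude]
    lon, lat = feature["geometry"]["coordinates"]
    return {
        "Borough": props["borough"],
        "Neighbourhood": props["name"],
        "Latitude": lat,
        "Longitude": lon,
    }


# Trimmed copy of the first feature shown earlier in the notebook
sample_features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point",
                     "coordinates": [-73.84720052054902, 40.89470517661]},
        "properties": {"name": "Wakefield", "borough": "Bronx"},
    }
]

# Collect every row first, then build the table once at the end
rows = [feature_to_row(feature) for feature in sample_features]
print(rows[0])
```

Passing the finished list to `pd.DataFrame(rows)` would build the table in one step. This also avoids `DataFrame.append`, which was removed in pandas 2.0.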

In [9]:
#let's examine this new dataframe!
ny_neighbourhoods.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [10]:
#sense check to ensure there are 306 neighbourhoods and 5 boroughs as expected

print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(ny_neighbourhoods['Borough'].unique()),
        ny_neighbourhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


Now let's visualise our New York location data!

In [154]:
#first i must define an instance of the geocoder

address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [155]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(ny_neighbourhoods['Latitude'], ny_neighbourhoods['Longitude'], ny_neighbourhoods['Borough'], ny_neighbourhoods['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Now that we have our New York location dataset, we can move on to Toronto.

**1b - Toronto Data**

Collecting the Toronto data was slightly different, as a table with Toronto's borough information can be found on Wikipedia (link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)

I used this data in conjunction with the list of latitudes and longitudes provided in week 3 of the capstone project to produce the .csv file named "Toronto_Neighborhood_Data_LL". Prior to loading this data, I filtered out all the boroughs with a value of 'null'.

In [12]:
#cleaned data and removed null boroughs prior to upload 
toronto_neighbourhoods = pd.read_csv("Toronto_Neighborhood_Data_LL.csv")
toronto_neighbourhoods.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [13]:
#confirming there are 103 rows, one per Toronto postal code area
toronto_neighbourhoods.shape

(103, 5)

In [14]:
#sense check to ensure there are 103 neighbourhoods and 10 boroughs as expected

print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(toronto_neighbourhoods['Borough'].unique()),
        toronto_neighbourhoods.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighborhoods.


As we did with the NY location data, let's visualise our Toronto data

In [152]:
#First finding the longitude and latitude of Toronto
address = 'Toronto'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [153]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(toronto_neighbourhoods['Latitude'], toronto_neighbourhoods['Longitude'], toronto_neighbourhoods['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Click on any of the circles to reveal the name of the neighbourhood!!

Last but not least, I need the location data for Manchester

**1c - Manchester Data**

This was the easiest data to get. I went to the following website: http://zip-code.en.mapawi.com/united-kingdom/2/greater-manchester/2/60/manchester/m1/25483/ and downloaded the .csv.
<br>
I then filtered the data so that I was left with just my postcode (M1 5QD) and the relevant latitude and longitude data.

In [15]:
manchester_neighbourhood = pd.read_csv("Manchester Neighbourhoods Data.csv")
manchester_neighbourhood

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1 5QD,Deansgate,Oxford Road,53.472531,-2.240887


In [150]:
#First finding the longitude and latitude of Manchester
address = 'Manchester'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manchester are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manchester are 53.4794892, -2.2451148.


In [151]:
# create map of Manchester using latitude and longitude values
map_manchester = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(manchester_neighbourhood['Latitude'], manchester_neighbourhood['Longitude'], manchester_neighbourhood['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manchester)  
    
map_manchester

**1d - Data Amalgamation**

Now that I have all my data, I am going to amalgamate it into one dataset using Excel.
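
The same amalgamation could also be done in pandas rather than in a spreadsheet. The sketch below is only an illustration under stated assumptions: each frame holds a single row copied from the tables shown earlier, `COMMON_COLUMNS` is a hypothetical name, and the Manchester, Toronto, New York order is assumed rather than confirmed for the whole file.

```python
import pandas as pd

# Columns shared by all three city tables (hypothetical constant name)
COMMON_COLUMNS = ["Borough", "Neighbourhood", "Latitude", "Longitude"]

# One sample row per city, copied from the outputs shown earlier
manchester = pd.DataFrame([{
    "Postal Code": "M1 5QD", "Borough": "Deansgate",
    "Neighbourhood": "Oxford Road",
    "Latitude": 53.472531, "Longitude": -2.240887,
}])
toronto = pd.DataFrame([{
    "Postal Code": "M1B", "Borough": "Scarborough",
    "Neighbourhood": "Malvern, Rouge",
    "Latitude": 43.806686, "Longitude": -79.194353,
}])
new_york = pd.DataFrame([{
    "Borough": "Bronx", "Neighbourhood": "Wakefield",
    "Latitude": 40.894705, "Longitude": -73.847201,
}])

# Keep only the shared columns, then stack the frames with a fresh index
combined = pd.concat(
    [frame[COMMON_COLUMNS] for frame in (manchester, toronto, new_york)],
    ignore_index=True,
)
print(combined)
```

Doing the merge in code keeps the whole pipeline reproducible from the notebook alone, at the cost of a few extra lines.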

In [16]:
#loading the new dataset
neighbourhoods = pd.read_csv("Combined Neighbourhood Data.csv")
neighbourhoods

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude
0,Deansgate,Oxford Road,53.472531,-2.240887
1,Scarborough,"Malvern, Rouge",43.806686,-79.194353
2,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
3,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
4,Scarborough,Woburn,43.770992,-79.216917
5,Scarborough,Cedarbrae,43.773136,-79.239476
6,Scarborough,Scarborough Village,43.744734,-79.239476
7,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
8,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
9,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476


Now I am ready to analyse the dataset and find the venues in each of the neighbourhoods

## **Part 2 - Finding Venue Data**


To find venue data I will use the Foursquare API

In [17]:
#Define Foursquare credentials for exploring neighbourhoods

CLIENT_ID = 'P0S3HZBWGBB5INOTZ4JC3NIUDLVAAEDF544S0JX4O3AOBXKM' # my Foursquare ID
CLIENT_SECRET = 'UROTBKVBPL054TMMUWTWJBT35XASLQ2XVHBIOTZJHJ5IPOBW' # my Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

My credentials:
CLIENT_ID: P0S3HZBWGBB5INOTZ4JC3NIUDLVAAEDF544S0JX4O3AOBXKM
CLIENT_SECRET: UROTBKVBPL054TMMUWTWJBT35XASLQ2XVHBIOTZJHJ5IPOBW
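
Hard-coding and printing API credentials makes them visible to anyone who can read the notebook. A minimal sketch of one common alternative is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of variables.

    The variable names are assumptions made for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo with a plain dict standing in for the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-123",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```

In a real session the function would be called with no argument so that it reads `os.environ`.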


In [18]:
#Defining a new function to retrieve data for venues in the relevant neighbourhoods
#I am setting a slightly smaller radius of 300 metres as I want these venues to be right on my doorstep!!!
def getNearbyVenues(names, latitudes, longitudes, radius=300):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
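
The function above assumes that every response contains a `groups` list and that every venue has at least one category, so an empty or unusual response would raise an error part-way through the loop. The sketch below shows one hedged way to separate URL building from response parsing. `build_explore_url` and `extract_venue_rows` are hypothetical helper names, the endpoint and query parameters are taken from the URL used above, and the sample payload is made up to mimic the response shape that the function reads.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL from a dict so that values are escaped correctly."""
    query = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe=',' keeps the comma in the "ll" value readable
    return f"{EXPLORE_ENDPOINT}?{urlencode(query, safe=',')}"


def extract_venue_rows(name, lat, lng, payload):
    """Turn one decoded JSON response into row tuples, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Made-up payload with one categorised and one uncategorised venue
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 53.4727, "lng": -2.2394},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 53.4720, "lng": -2.2383},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Oxford Road", 53.472531, -2.240887, sample_payload)
url = build_explore_url("ID", "SECRET", "20180605", 53.472531, -2.240887, 300, 100)
print(len(rows), url)
```

Keeping the parsing step separate means it can be checked against a saved response without spending any API calls.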

In [19]:
nearby_venues = getNearbyVenues(names=neighbourhoods['Neighbourhood'],
                                latitudes=neighbourhoods['Latitude'],
                                longitudes=neighbourhoods['Longitude']
                                )

Oxford Road
Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town...
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amo...
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West,

In [20]:
#studying new venue dataframe
print(nearby_venues.shape)
nearby_venues.head()

(6049, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Oxford Road,53.472531,-2.240887,Costa Coffee,53.472734,-2.239394,Coffee Shop
1,Oxford Road,53.472531,-2.240887,Hatch,53.471972,-2.238277,Pop-Up Shop
2,Oxford Road,53.472531,-2.240887,Takk Espresso Bar,53.471834,-2.238618,Café
3,Oxford Road,53.472531,-2.240887,Zouk Tea Bar & Grill,53.472321,-2.240544,Indian Restaurant
4,Oxford Road,53.472531,-2.240887,The Salisbury Ale House,53.473969,-2.241055,Pub


In [21]:
#counting the number of venues in each neighbourhood
nearby_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,2,2,2,2,2,2
"Alderwood, Long Branch",5,5,5,5,5,5
Allerton,17,17,17,17,17,17
Annadale,2,2,2,2,2,2
Arden Heights,4,4,4,4,4,4
Arlington,2,2,2,2,2,2
Arrochar,10,10,10,10,10,10
Arverne,3,3,3,3,3,3
Astoria,20,20,20,20,20,20
Astoria Heights,10,10,10,10,10,10


In [22]:
#finding out how many unique venue categories there are
print('There are {} unique categories.'.format(len(nearby_venues['Venue Category'].unique())))

There are 404 unique categories.


In [23]:
#analysing each neighbourhood

# one hot encoding
neighbourhood_onehot = pd.get_dummies(nearby_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
neighbourhood_onehot['Neighbourhood'] = nearby_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [neighbourhood_onehot.columns[-1]] + list(neighbourhood_onehot.columns[:-1])
neighbourhood_onehot = neighbourhood_onehot[fixed_columns]

neighbourhood_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Canal,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cha Chaan Teng,Cheese Shop,Chinese Restaurant,Chocolate Shop,Christmas Market,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Basketball Court,College Bookstore,College Cafeteria,College Gym,College Rec Center,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cooking School,Cosmetics Shop,Costume Shop,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Distribution Center,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory,Falafel 
Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Hakka Restaurant,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean BBQ Restaurant,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lawyer,Leather Goods Store,Lebanese Restaurant,Library,Light Rail Station,Lighthouse,Lingerie Store,Liquor Store,Lounge,Luggage Store,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Medical Supply Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Newsstand,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic 
Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Paella Restaurant,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Pop-Up Shop,Portuguese Restaurant,Post Office,Print Shop,Pub,Public Art,Puerto Rican Restaurant,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Romanian Restaurant,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Skating Rink,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Social Club,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stationery Store,Steakhouse,Storage Facility,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Temple,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tree,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Oxford Road,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Oxford Road,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Oxford Road,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Oxford Road,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Oxford Road,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [24]:
neighbourhood_onehot.shape

(6049, 405)
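
Each row of the one-hot table marks exactly one venue category with a 1, so averaging the rows that belong to a neighbourhood gives the share of that neighbourhood's venues falling in each category. The pure-Python sketch below illustrates that idea on a handful of made-up (neighbourhood, category) pairs. It is not the pandas code used in this notebook, and the variable names are chosen for this example only.

```python
from collections import Counter, defaultdict

# (neighbourhood, venue category) pairs; made-up sample values for illustration
venues = [
    ("Oxford Road", "Coffee Shop"),
    ("Oxford Road", "Pub"),
    ("Oxford Road", "Coffee Shop"),
    ("Oxford Road", "Café"),
    ("Woburn", "Coffee Shop"),
    ("Woburn", "Park"),
]

# Count how often each category occurs within each neighbourhood
counts = defaultdict(Counter)
for neighbourhood, category in venues:
    counts[neighbourhood][category] += 1

# Every category seen anywhere becomes a column, as in one-hot encoding
categories = sorted({category for _, category in venues})

# The mean of a 0/1 column equals count / number of venues in the neighbourhood
frequencies = {}
for neighbourhood, counter in counts.items():
    total = sum(counter.values())
    frequencies[neighbourhood] = {c: counter[c] / total for c in categories}

print(frequencies["Oxford Road"])
```

Because each neighbourhood's frequencies sum to 1, neighbourhoods with very different venue counts can still be compared on the same scale.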

In [28]:
#grouping rows by neighbourhood and taking the mean of the frequency of occurrence of each category
neighbourhood_grouped = neighbourhood_onehot.groupby('Neighbourhood').mean().reset_index()
neighbourhood_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Food Court,Airport Gate,Airport Lounge,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Canal,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cha Chaan Teng,Cheese Shop,Chinese Restaurant,Chocolate Shop,Christmas Market,Church,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Basketball Court,College Bookstore,College Cafeteria,College Gym,College Rec Center,Colombian Restaurant,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cooking School,Cosmetics Shop,Costume Shop,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Distribution Center,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory,Falafel 
Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Hakka Restaurant,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hockey Arena,Home Service,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Insurance Office,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean BBQ Restaurant,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Laundry Service,Lawyer,Leather Goods Store,Lebanese Restaurant,Library,Light Rail Station,Lighthouse,Lingerie Store,Liquor Store,Lounge,Luggage Store,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Medical Supply Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Newsstand,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic 
Grocery,Other Great Outdoors,Other Nightlife,Other Repair Shop,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Paella Restaurant,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Pop-Up Shop,Portuguese Restaurant,Post Office,Print Shop,Pub,Public Art,Puerto Rican Restaurant,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Romanian Restaurant,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Skate Park,Skating Rink,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Social Club,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stationery Store,Steakhouse,Storage Facility,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tattoo Parlor,Tea Room,Temple,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tree,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.176471,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Annadale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Arden Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [29]:
neighbourhood_grouped.shape

(373, 405)

In [30]:
#taking the top 5 venues for each neighbourhood

num_top_venues = 5

for hood in neighbourhood_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = neighbourhood_grouped[neighbourhood_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0             Breakfast Spot   0.5
1  Latin American Restaurant   0.5
2          Accessories Store   0.0
3               Optical Shop   0.0
4       Pakistani Restaurant   0.0


----Alderwood, Long Branch----
         venue  freq
0          Pub   0.2
1  Coffee Shop   0.2
2          Gym   0.2
3     Pharmacy   0.2
4  Pizza Place   0.2


----Allerton----
                 venue  freq
0          Pizza Place  0.18
1  Martial Arts School  0.06
2       Breakfast Spot  0.06
3    Electronics Store  0.06
4           Bike Trail  0.06


----Annadale----
                  venue  freq
0         Train Station   0.5
1   American Restaurant   0.5
2     Accessories Store   0.0
3           Opera House   0.0
4  Pakistani Restaurant   0.0


----Arden Heights----
               venue  freq
0      Deli / Bodega  0.25
1        Coffee Shop  0.25
2           Bus Stop  0.25
3           Pharmacy  0.25
4  Accessories Store  0.00


----Arlington----
               

Wow, so much data!!

In [31]:
#now I need a function that returns the most frequent venue categories for one neighbourhood
def return_most_common_venues(row, num_top_venues):
    # skip the first entry, which holds the neighbourhood name
    row_categories = row.iloc[1:]
    # sort the remaining frequencies from highest to lowest
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [32]:
#and create a new dataframe and display the top 10 venues for each neighbourhood

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond 'st', 'nd', 'rd' all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = neighbourhood_grouped['Neighbourhood']

for ind in np.arange(neighbourhood_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(neighbourhood_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Breakfast Spot,Latin American Restaurant,Yoga Studio,Farmers Market,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor
1,"Alderwood, Long Branch",Pub,Coffee Shop,Pizza Place,Gym,Pharmacy,Yoga Studio,Factory,Empanada Restaurant,English Restaurant,Entertainment Service
2,Allerton,Pizza Place,Spa,Discount Store,Chinese Restaurant,Martial Arts School,Electronics Store,Bike Trail,Gas Station,Donut Shop,Fast Food Restaurant
3,Annadale,American Restaurant,Train Station,Yoga Studio,Farmers Market,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space
4,Arden Heights,Deli / Bodega,Coffee Shop,Bus Stop,Pharmacy,Yoga Studio,Farm,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service
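
One caveat when reading this table: a neighbourhood with fewer than ten venue categories still gets ten entries, because the zero-frequency columns are sorted along with the rest. Agincourt, for example, has only two categories with a non-zero frequency, so its 3rd to 10th entries do not reflect real venues. Below is a minimal sketch of a variant that blanks those entries out. The name `return_nonzero_venues` and the toy row are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def return_nonzero_venues(row, num_top_venues):
    """Like return_most_common_venues, but pads with NaN once frequencies reach zero."""
    # skip the first entry, which holds the neighbourhood name
    freqs = row.iloc[1:].astype(float)
    # keep only categories that actually occur, most frequent first
    top = freqs[freqs > 0].sort_values(ascending=False).index.tolist()[:num_top_venues]
    # pad so that every neighbourhood still yields num_top_venues entries
    return top + [np.nan] * (num_top_venues - len(top))

# toy row shaped like one row of neighbourhood_grouped (values are illustrative)
toy_row = pd.Series({
    'Neighbourhood': 'Toyville',
    'Bar': 0.0,
    'Breakfast Spot': 0.6,
    'Pub': 0.4,
    'Yoga Studio': 0.0,
})

result = return_nonzero_venues(toy_row, 4)
print(result)
```

Whether to blank these entries or keep them is a judgement call. The clustering itself runs on the frequency table, so this only affects how the summary reads.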


In [33]:
#quickly studying the most common venues where i live
neighbourhoods_venues_sorted.loc[neighbourhoods_venues_sorted['Neighbourhood'] == 'Oxford Road']

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
248,Oxford Road,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool


No surprises here!!! Pubs and Bars galore in the centre of Manchester :)
<br>
Note: I have exported the neighbourhoods_venues_sorted table and may import this version in the future so I don't have to keep rerunning the same code!

## **Part 3 - k-Means Clustering Analysis**

I am not looking to segment these neighbourhoods per se, but rather to find 10-15 similar neighbourhoods in Toronto or New York. Therefore, I will not use the 'elbow' method typically used to find the best k, but will instead arbitrarily set k to 40. After analysing the results, I will adjust k until I have found what I am looking for.
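
The tuning loop described above can be sketched as follows. This is only an illustration on random toy data, not the notebook's actual dataset. The helper name `cluster_mates`, the toy frame, the candidate values of k and the `n_init=10` setting are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def cluster_mates(features, names, target, k):
    """Fit k-means with k clusters and return the names sharing the target's cluster."""
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit(features).labels_
    names = np.asarray(names)
    # look up the label assigned to the target row
    target_label = labels[names == target][0]
    return names[labels == target_label].tolist()

# toy stand-in for neighbourhood_grouped: 30 rows of venue frequencies summing to 1
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.dirichlet(np.ones(6), size=30),
                   columns=['Bar', 'Café', 'Gym', 'Park', 'Pub', 'Spa'])
toy.insert(0, 'Neighbourhood', ['N{}'.format(i) for i in range(30)])

features = toy.drop(columns='Neighbourhood')
sizes = {}
for k in (3, 6, 12):
    mates = cluster_mates(features, toy['Neighbourhood'], 'N0', k)
    sizes[k] = len(mates)
    print(k, len(mates))
```

A larger k tends to shrink the target's cluster, though not necessarily in a steady way, so it helps to print the size at each step rather than assume it.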

### Round 1

In [60]:
# set number of clusters
kclusters = 40

neighbourhood_grouped_clustering1 = neighbourhood_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neighbourhood_grouped_clustering1)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([ 9,  4,  4, 19, 29, 29,  4, 10, 39, 27], dtype=int32)

In [51]:
kmeans.labels_

array([ 9,  4,  4, 19, 29, 29,  4, 10, 39, 27, 26,  4, 39, 39, 39, 27,  4,
       27,  4, 39, 39, 27, 17, 10, 27,  4, 26, 39, 12,  4, 27, 39,  4, 27,
        2, 39,  8, 27, 39, 39, 21, 27,  4, 39, 27, 27, 39,  8, 15, 19, 17,
       39, 39, 27,  4, 27, 11, 27, 39, 39, 27, 17, 39, 39, 27, 39, 17, 27,
        8, 19, 17, 39, 39, 17, 39, 27, 39, 17, 17, 39, 27,  1,  1,  4,  4,
       39, 39,  1, 17, 39, 21, 27, 27,  3, 39, 27, 39, 17, 27, 27,  1, 21,
        8,  4, 39, 39, 27, 17, 27, 21, 30,  4,  4,  4, 18,  4,  4, 39, 27,
       27, 39, 39, 27, 39, 27,  2, 39, 27,  7,  5,  1, 39, 27,  1, 17, 39,
       39, 39, 24,  1,  4,  4, 39, 39, 29, 27, 29,  4,  4, 39, 39, 35, 32,
       39, 10, 39,  4,  5, 27, 39, 27, 27, 27, 27, 27, 39, 27, 22, 35, 39,
       17, 39, 27, 27, 39, 27, 17, 27, 27,  4,  4,  4,  8, 15, 39, 27, 39,
       27, 39, 13, 39,  5, 39, 27, 39, 39, 17, 27, 18, 27,  1, 28, 39, 39,
       19, 17,  4, 29,  4,  4,  2, 27, 39, 39, 11, 17, 14, 36, 27,  8, 39,
       17, 21, 17, 17, 17

In [53]:
#quickly dropping the Cluster Labels column in case I previously ran this code
neighbourhoods_venues_sorted = neighbourhoods_venues_sorted.drop(columns='Cluster Labels', errors='ignore')

Let's create a new dataframe that includes the cluster label as well as the top 10 venues for each neighbourhood.

In [55]:
neighbourhoods_venues_sorted1 = neighbourhoods_venues_sorted

# add clustering labels
neighbourhoods_venues_sorted1.insert(0, 'Cluster Labels', kmeans.labels_)
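
Note that the plain assignment above does not create a second table. Both names refer to the same DataFrame, so `insert` also adds 'Cluster Labels' to `neighbourhoods_venues_sorted`. That is why the column has to be dropped again before each later round. Here is a small sketch of the difference, using a hypothetical toy frame:

```python
import pandas as pd

toy = pd.DataFrame({'Neighbourhood': ['A', 'B'],
                    '1st Most Common Venue': ['Pub', 'Bar']})

alias = toy                                   # same object under a second name
alias.insert(0, 'Cluster Labels', [1, 0])
alias_changed_original = 'Cluster Labels' in toy.columns    # True

toy = toy.drop(columns='Cluster Labels')      # undo, as the notebook does between rounds

independent = toy.copy()                      # a separate DataFrame
independent.insert(0, 'Cluster Labels', [1, 0])
copy_changed_original = 'Cluster Labels' in toy.columns     # False

print(alias_changed_original, copy_changed_original)
```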

In [56]:
neighbourhoods_venues_sorted1.head()

Unnamed: 0,Cluster Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,9,Agincourt,Breakfast Spot,Latin American Restaurant,Yoga Studio,Farmers Market,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor
1,4,"Alderwood, Long Branch",Pub,Coffee Shop,Pizza Place,Gym,Pharmacy,Yoga Studio,Factory,Empanada Restaurant,English Restaurant,Entertainment Service
2,4,Allerton,Pizza Place,Spa,Discount Store,Chinese Restaurant,Martial Arts School,Electronics Store,Bike Trail,Gas Station,Donut Shop,Fast Food Restaurant
3,19,Annadale,American Restaurant,Train Station,Yoga Studio,Farmers Market,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space
4,29,Arden Heights,Deli / Bodega,Coffee Shop,Bus Stop,Pharmacy,Yoga Studio,Farm,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service


In [57]:
neighbourhoods_merged = neighbourhoods

# join the cluster labels and top venues onto the neighbourhood table, which holds latitude/longitude
neighbourhoods_merged1 = neighbourhoods_merged.join(neighbourhoods_venues_sorted1.set_index('Neighbourhood'), on='Neighbourhood')

neighbourhoods_merged1.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,39.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
1,Scarborough,"Malvern, Rouge",43.806686,-79.194353,,,,,,,,,,,
2,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,3.0,Home Service,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
3,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,32.0,Electronics Store,Farmers Market,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
4,Scarborough,Woburn,43.770992,-79.216917,6.0,Korean BBQ Restaurant,Yoga Studio,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor
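
Two details of this output follow from how `join` works rather than from the data itself. It is a left join by default, so a neighbourhood with no row in the venues table (here "Malvern, Rouge") keeps its coordinates but gets NaN everywhere else. The presence of NaN in turn forces the integer cluster labels to be stored as floats (39.0 rather than 39). Below is a minimal sketch with hypothetical toy frames, including one possible way to tidy the result:

```python
import pandas as pd

coords = pd.DataFrame({'Neighbourhood': ['A', 'B', 'C'],
                       'Latitude': [53.47, 43.81, 43.78]})
venues = pd.DataFrame({'Cluster Labels': [39, 3],
                       'Neighbourhood': ['A', 'C'],
                       '1st Most Common Venue': ['Pub', 'Home Service']})

# left join on the neighbourhood name; 'B' has no match in venues
merged = coords.join(venues.set_index('Neighbourhood'), on='Neighbourhood')
print(merged['Cluster Labels'].tolist())   # labels are now floats, with NaN for 'B'

# optional tidy-up: drop unmatched rows and restore integer labels
tidy = merged.dropna(subset=['Cluster Labels']).copy()
tidy['Cluster Labels'] = tidy['Cluster Labels'].astype(int)
print(tidy)
```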


In [58]:
neighbourhoods_merged1.loc[neighbourhoods_merged1['Neighbourhood'] == 'Oxford Road']

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,39.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool


In [59]:
neighbourhoods_merged1.loc[neighbourhoods_merged1['Cluster Labels'] == 39]

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,39.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
19,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,39.0,Clothing Store,Fast Food Restaurant,Restaurant,Japanese Restaurant,Coffee Shop,Cosmetics Shop,Juice Bar,Bank,Women's Store,Food Court
27,North York,Don Mills,43.745906,-79.352188,39.0,Restaurant,Clothing Store,Chinese Restaurant,Asian Restaurant,Bike Shop,Shopping Mall,Discount Store,Coffee Shop,Dim Sum Restaurant,Italian Restaurant
28,North York,Don Mills,43.7259,-79.340923,39.0,Restaurant,Clothing Store,Chinese Restaurant,Asian Restaurant,Bike Shop,Shopping Mall,Discount Store,Coffee Shop,Dim Sum Restaurant,Italian Restaurant
29,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,39.0,Coffee Shop,Ice Cream Shop,Fried Chicken Joint,Grocery Store,Gas Station,Restaurant,Shopping Mall,Sushi Restaurant,Pet Store,Supermarket
35,North York,Victoria Village,43.725882,-79.315572,39.0,Portuguese Restaurant,Intersection,French Restaurant,Coffee Shop,Pizza Place,Hockey Arena,Yoga Studio,Falafel Restaurant,English Restaurant,Entertainment Service
39,East York,Leaside,43.70906,-79.363452,39.0,Sporting Goods Shop,Bank,Burger Joint,Mexican Restaurant,Electronics Store,Coffee Shop,Restaurant,Sandwich Place,Sushi Restaurant,Department Store
40,East York,Thorncliffe Park,43.705369,-79.349372,39.0,Indian Restaurant,Yoga Studio,Discount Store,Bus Line,Supermarket,Gas Station,Sandwich Place,Grocery Store,Gym,Burger Joint
42,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,39.0,Greek Restaurant,Ice Cream Shop,Tibetan Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Fruit & Vegetable Store,Dessert Shop,Indian Restaurant,Cosmetics Shop
43,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,39.0,Fast Food Restaurant,Movie Theater,Fish & Chips Shop,Light Rail Station,Park,Restaurant,Sushi Restaurant,Pet Store,Italian Restaurant,Liquor Store


Wow, Oxford Road's cluster (label 39) is far too large, so I clearly need more clusters to break the dataset up further.

### Round 2

In [61]:
# set number of clusters
kclusters = 80

neighbourhood_grouped_clustering2 = neighbourhood_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans2 = KMeans(n_clusters=kclusters, random_state=0).fit(neighbourhood_grouped_clustering2)

# check cluster labels generated for each row in the dataframe
kmeans2.labels_[0:10] 

array([27, 28, 28, 43,  4, 66,  0, 48,  2, 40], dtype=int32)

In [90]:
neighbourhoods_venues_sorted = neighbourhoods_venues_sorted.drop('Cluster Labels', axis=1)

In [91]:
neighbourhoods_venues_sorted2 = neighbourhoods_venues_sorted

# add clustering labels
neighbourhoods_venues_sorted2.insert(0, 'Cluster Labels', kmeans2.labels_)

In [92]:
neighbourhoods_merged2 = neighbourhoods_merged.join(neighbourhoods_venues_sorted2.set_index('Neighbourhood'), on='Neighbourhood')

neighbourhoods_merged2.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,2.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
1,Scarborough,"Malvern, Rouge",43.806686,-79.194353,,,,,,,,,,,
2,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,23.0,Home Service,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
3,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,22.0,Electronics Store,Farmers Market,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
4,Scarborough,Woburn,43.770992,-79.216917,24.0,Korean BBQ Restaurant,Yoga Studio,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor


In [93]:
neighbourhoods_merged2.loc[neighbourhoods_merged2['Cluster Labels'] == 2]

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,2.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
11,Scarborough,"Dorset Park, Wexford Heights, Scarborough Town...",43.75741,-79.273304,2.0,Brewery,Indian Restaurant,Thrift / Vintage Store,Light Rail Station,Farm,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service
19,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,2.0,Clothing Store,Fast Food Restaurant,Restaurant,Japanese Restaurant,Coffee Shop,Cosmetics Shop,Juice Bar,Bank,Women's Store,Food Court
27,North York,Don Mills,43.745906,-79.352188,2.0,Restaurant,Clothing Store,Chinese Restaurant,Asian Restaurant,Bike Shop,Shopping Mall,Discount Store,Coffee Shop,Dim Sum Restaurant,Italian Restaurant
28,North York,Don Mills,43.7259,-79.340923,2.0,Restaurant,Clothing Store,Chinese Restaurant,Asian Restaurant,Bike Shop,Shopping Mall,Discount Store,Coffee Shop,Dim Sum Restaurant,Italian Restaurant
40,East York,Thorncliffe Park,43.705369,-79.349372,2.0,Indian Restaurant,Yoga Studio,Discount Store,Bus Line,Supermarket,Gas Station,Sandwich Place,Grocery Store,Gym,Burger Joint
42,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,2.0,Greek Restaurant,Ice Cream Shop,Tibetan Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Fruit & Vegetable Store,Dessert Shop,Indian Restaurant,Cosmetics Shop
43,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,2.0,Fast Food Restaurant,Movie Theater,Fish & Chips Shop,Light Rail Station,Park,Restaurant,Sushi Restaurant,Pet Store,Italian Restaurant,Liquor Store
45,Central Toronto,Lawrence Park,43.72802,-79.38879,2.0,Gym / Fitness Center,Jewelry Store,Photography Studio,Yoga Studio,Farm,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space
46,Central Toronto,Davisville North,43.712751,-79.390197,2.0,Gym / Fitness Center,Breakfast Spot,Gym,Convenience Store,Yoga Studio,Farmers Market,Escape Room,Ethiopian Restaurant,Event Service,Event Space


Closer, but we can do better.

### Round 3

In [67]:
# set number of clusters
kclusters = 120

neighbourhood_grouped_clustering3 = neighbourhood_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans3 = KMeans(n_clusters=kclusters, random_state=0).fit(neighbourhood_grouped_clustering3)

# check cluster labels generated for each row in the dataframe
kmeans3.labels_[0:10] 

array([ 18,  90,  90,  57, 102,  49,  26, 116,  79,  35], dtype=int32)

In [94]:
neighbourhoods_venues_sorted = neighbourhoods_venues_sorted.drop('Cluster Labels', axis=1)

In [95]:
neighbourhoods_venues_sorted3 = neighbourhoods_venues_sorted

# add clustering labels
neighbourhoods_venues_sorted3.insert(0, 'Cluster Labels', kmeans3.labels_)

In [96]:
neighbourhoods_merged3 = neighbourhoods_merged.join(neighbourhoods_venues_sorted3.set_index('Neighbourhood'), on='Neighbourhood')

neighbourhoods_merged3.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,5.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
1,Scarborough,"Malvern, Rouge",43.806686,-79.194353,,,,,,,,,,,
2,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,10.0,Home Service,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
3,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,31.0,Electronics Store,Farmers Market,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
4,Scarborough,Woburn,43.770992,-79.216917,3.0,Korean BBQ Restaurant,Yoga Studio,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor


In [97]:
neighbourhoods_merged3.loc[neighbourhoods_merged3['Cluster Labels'] == 5]

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,5.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
19,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,5.0,Clothing Store,Fast Food Restaurant,Restaurant,Japanese Restaurant,Coffee Shop,Cosmetics Shop,Juice Bar,Bank,Women's Store,Food Court
29,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,5.0,Coffee Shop,Ice Cream Shop,Fried Chicken Joint,Grocery Store,Gas Station,Restaurant,Shopping Mall,Sushi Restaurant,Pet Store,Supermarket
35,North York,Victoria Village,43.725882,-79.315572,5.0,Portuguese Restaurant,Intersection,French Restaurant,Coffee Shop,Pizza Place,Hockey Arena,Yoga Studio,Falafel Restaurant,English Restaurant,Entertainment Service
39,East York,Leaside,43.70906,-79.363452,5.0,Sporting Goods Shop,Bank,Burger Joint,Mexican Restaurant,Electronics Store,Coffee Shop,Restaurant,Sandwich Place,Sushi Restaurant,Department Store
40,East York,Thorncliffe Park,43.705369,-79.349372,5.0,Indian Restaurant,Yoga Studio,Discount Store,Bus Line,Supermarket,Gas Station,Sandwich Place,Grocery Store,Gym,Burger Joint
44,East Toronto,Studio District,43.659526,-79.340923,5.0,Coffee Shop,Ice Cream Shop,Comfort Food Restaurant,Café,Fish Market,Gastropub,Bookstore,Gay Bar,Seafood Restaurant,Cheese Shop
48,Central Toronto,Davisville,43.704324,-79.38879,5.0,Coffee Shop,Café,Dessert Shop,Pizza Place,Italian Restaurant,Indian Restaurant,Diner,Thai Restaurant,Seafood Restaurant,Toy / Game Store
52,Downtown Toronto,"St. James Town, Cabbagetown",43.667967,-79.367675,5.0,Coffee Shop,Café,Pizza Place,Restaurant,Jewelry Store,Butcher,General Entertainment,Beer Store,Sandwich Place,Liquor Store
54,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,5.0,Food Truck,Convenience Store,Bakery,Coffee Shop,Park,Distribution Center,Bus Stop,History Museum,Gym / Fitness Center,Breakfast Spot


## Round 4

In [77]:
# set number of clusters
kclusters = 350

neighbourhood_grouped_clustering4 = neighbourhood_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans4 = KMeans(n_clusters=kclusters, random_state=0).fit(neighbourhood_grouped_clustering4)

# check cluster labels generated for each row in the dataframe
kmeans4.labels_[0:10] 

array([ 24, 155,  32,  60, 154,  65, 211,  80, 288, 207], dtype=int32)

In [84]:
kmeans4.labels_

array([ 24, 155,  32,  60, 154,  65, 211,  80, 288, 207,  69, 224, 295,
       319, 321, 259, 213, 298, 248, 251, 191, 263, 199, 217, 102, 302,
       118, 153,  14, 136, 208, 308, 120,  49,  18, 230,  25, 239, 284,
       170,  63, 232, 261, 299, 165,  66, 127,  31,  95, 184,  40,   4,
       122, 129, 193, 106, 262, 316, 209, 344, 107, 179, 349, 311,   3,
       330,  89, 124, 140,  51, 161, 327, 322, 256, 275, 282, 123, 173,
       222, 241,  92, 227,   8, 287, 247, 285, 100, 167, 214, 238,  91,
       114, 234,  86, 246, 204, 328, 210, 141, 187, 190, 181,  47, 194,
       291, 325, 197, 195,  94,  34,  16, 160,  97, 306,  19, 231, 245,
       314,  41,  76, 336, 123,  54,  29, 236,  18, 276, 128,  12,  11,
       104,  75, 300,  33, 177, 290,   7, 212,  44, 134, 142,  43, 226,
       337, 115, 237, 172, 267, 277, 347,  37,  50,  22, 339, 133,   7,
        27,  11, 188, 228, 110,  53, 303, 296,  15, 309, 257,  73,  48,
       343, 175, 286,  88, 249, 326, 148, 312, 107, 189, 215, 10

In [86]:
neighbourhoods_venues_sorted = neighbourhoods_venues_sorted.drop(columns='Cluster Labels')

KeyError: "['Cluster Labels'] not found in axis"
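
The `KeyError` above means `neighbourhoods_venues_sorted` had no 'Cluster Labels' column when this cell ran. One way to avoid the drop-then-insert bookkeeping between rounds is to attach each round's labels to a copy instead. The sketch below is only an illustration: `with_cluster_labels` is a hypothetical helper and the small frame is made-up data, not part of this notebook.

```python
import pandas as pd

def with_cluster_labels(venues_sorted, labels):
    """Return a copy of venues_sorted with 'Cluster Labels' as the first column.

    Hypothetical helper: any existing 'Cluster Labels' column is dropped first,
    and the input DataFrame is left untouched.
    """
    out = venues_sorted.drop(columns='Cluster Labels', errors='ignore').copy()
    out.insert(0, 'Cluster Labels', labels)
    return out

# made-up stand-in for neighbourhoods_venues_sorted
demo = pd.DataFrame({
    'Neighbourhood': ['A', 'B', 'C'],
    '1st Most Common Venue': ['Pub', 'Bar', 'Café'],
})

round_a = with_cluster_labels(demo, [0, 1, 0])
round_b = with_cluster_labels(round_a, [2, 2, 1])  # re-labelling needs no manual drop
```

Because the helper never modifies its input, each round can start from the same unlabelled table.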

In [87]:
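# note: this is the same object, not a copy, so insert() below also adds the column to neighbourhoods_venues_sorted
neighbourhoods_venues_sorted4 = neighbourhoods_venues_sorted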

# add clustering labels
neighbourhoods_venues_sorted4.insert(0, 'Cluster Labels', kmeans4.labels_)

In [88]:
# join each neighbourhood's cluster label and top venues onto its borough and coordinates
neighbourhoods_merged4 = neighbourhoods_merged.join(neighbourhoods_venues_sorted4.set_index('Neighbourhood'), on='Neighbourhood')

neighbourhoods_merged4.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,294.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
1,Scarborough,"Malvern, Rouge",43.806686,-79.194353,,,,,,,,,,,
2,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,9.0,Home Service,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
3,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,22.0,Electronics Store,Farmers Market,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
4,Scarborough,Woburn,43.770992,-79.216917,21.0,Korean BBQ Restaurant,Yoga Studio,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor


In [89]:
neighbourhoods_merged4.loc[neighbourhoods_merged4['Cluster Labels'] == 294]

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,294.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool


Now my neighbourhood is in a cluster all by itself! That's no good, so let's lower the number of clusters.
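
Whether a given number of clusters isolates the home neighbourhood can also be checked straight from the label array, before any joining. A minimal sketch, assuming `home_index` is the row position of my neighbourhood in the clustering input (a hypothetical name; the labels here are made up):

```python
from collections import Counter

def home_cluster_size(labels, home_index):
    """Count how many rows share a cluster label with the row at home_index."""
    counts = Counter(labels)
    return counts[labels[home_index]]

# made-up labels for six neighbourhoods; index 2 plays the role of the home row
labels = [4, 1, 7, 1, 4, 4]
alone = home_cluster_size(labels, home_index=2) == 1

# with a fitted model this could be called as, for example,
# home_cluster_size(kmeans4.labels_.tolist(), home_index)
```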

## Round 5

In [99]:
# set number of clusters
kclusters = 250

neighbourhood_grouped_clustering5 = neighbourhood_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans5 = KMeans(n_clusters=kclusters, random_state=0).fit(neighbourhood_grouped_clustering5)

# check cluster labels generated for each row in the dataframe
kmeans5.labels_ 

array([ 33, 104, 243,  41, 132,  52, 202,  95,  75, 211,  50, 209, 210,
        49, 184, 242, 230, 135, 177, 245, 214,  34, 127, 166,  59,  25,
        73, 147,  13,  69, 170,   3, 124,  42,   2, 135,  76, 193, 210,
        75, 105, 208,  75,  23, 171,  65, 149,  98,  78, 102,  58, 184,
       184,  85, 196, 143,  21, 135, 223, 135, 135, 142, 135,  75,  88,
       135, 232,  48, 111,  35, 197, 135, 135, 224, 239,  75, 210, 191,
       205,   0, 175, 114,  43, 243, 179, 210,  60,  96,  34, 135, 138,
        94,   0,  63, 135, 163, 135, 190, 115, 199, 169,  57,  37, 192,
         3, 135, 212,  54, 165,  32,   4, 164, 122,  34,  68, 229, 204,
       242,  34,  67,   3, 210, 108, 134, 240,   2, 135,  88,  12,   5,
       113, 134,  34,  22, 136,  49, 210, 242,  44, 154, 155, 109, 180,
       135, 126, 221, 194,  34,  25,   3, 184, 112,  26, 135,  77, 210,
       215,   5, 158, 135, 131,  55,  34,   0,  18,   3, 234,  38,  47,
       134, 185,  34,  34, 238, 242, 168,  34, 135, 203, 248, 13

In [100]:
neighbourhoods_venues_sorted = neighbourhoods_venues_sorted.drop(columns='Cluster Labels')

In [102]:
neighbourhoods_venues_sorted5 = neighbourhoods_venues_sorted

# add clustering labels
neighbourhoods_venues_sorted5.insert(0, 'Cluster Labels', kmeans5.labels_)

In [103]:
# join each neighbourhood's cluster label and top venues onto its borough and coordinates
neighbourhoods_merged5 = neighbourhoods_merged.join(neighbourhoods_venues_sorted5.set_index('Neighbourhood'), on='Neighbourhood')

neighbourhoods_merged5.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,3.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
1,Scarborough,"Malvern, Rouge",43.806686,-79.194353,,,,,,,,,,,
2,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,15.0,Home Service,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
3,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,26.0,Electronics Store,Farmers Market,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
4,Scarborough,Woburn,43.770992,-79.216917,14.0,Korean BBQ Restaurant,Yoga Studio,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor


In [104]:
neighbourhoods_merged5.loc[neighbourhoods_merged5['Cluster Labels'] == 3]

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,3.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
78,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,3.0,Bar,Asian Restaurant,Art Gallery,Theater,Bakery,Beer Store,Brewery,Cocktail Bar,Record Shop,Yoga Studio
153,Brooklyn,Greenpoint,40.730201,-73.954241,3.0,Bar,Coffee Shop,Cocktail Bar,Grocery Store,Mexican Restaurant,Supermarket,Spa,Café,Sushi Restaurant,Pizza Place
163,Brooklyn,Prospect Heights,40.676822,-73.964859,3.0,Bar,Cocktail Bar,Mexican Restaurant,Brewery,Thai Restaurant,Grocery Store,Greek Restaurant,Sushi Restaurant,Garden Center,Garden
171,Brooklyn,Red Hook,40.676253,-74.012759,3.0,Ice Cream Shop,Wine Shop,Flower Shop,Brewery,Bagel Shop,Thai Restaurant,Art Gallery,Coffee Shop,Sandwich Place,Bar
191,Brooklyn,Boerum Hill,40.685683,-73.983748,3.0,Furniture / Home Store,Bar,Italian Restaurant,Thrift / Vintage Store,Sandwich Place,Men's Store,Kids Store,Concert Hall,Jewelry Store,Middle Eastern Restaurant
200,Brooklyn,North Side,40.714823,-73.958809,3.0,Coffee Shop,Bar,Dive Bar,Wine Bar,American Restaurant,Bakery,Burger Joint,Pizza Place,Yoga Studio,Ice Cream Shop
201,Brooklyn,South Side,40.710861,-73.958001,3.0,Pizza Place,Bar,Coffee Shop,Latin American Restaurant,Yoga Studio,American Restaurant,Dive Bar,Burger Joint,South American Restaurant,Deli / Bodega
211,Manhattan,Upper East Side,40.775639,-73.960508,3.0,Art Gallery,Cosmetics Shop,Italian Restaurant,Coffee Shop,Hotel,Sandwich Place,Grocery Store,Bakery,Bar,Bridal Shop
222,Manhattan,East Village,40.727847,-73.982226,3.0,Bar,Korean Restaurant,Mexican Restaurant,Cocktail Bar,Vegetarian / Vegan Restaurant,Pizza Place,Ice Cream Shop,Wine Bar,Italian Restaurant,Ramen Restaurant


Getting really close now! Let's see if we can narrow it down a bit more, though.
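
Rather than picking each new value of `kclusters` by hand, the same search could be run as a loop that records the size of the home cluster for every candidate value. This is only a sketch on random data: `home_cluster_sizes`, `home_index`, and the candidate values are hypothetical, and the real input would be the numeric frame used for clustering above.

```python
import numpy as np
from sklearn.cluster import KMeans

def home_cluster_sizes(features, home_index, candidate_ks):
    """Map each k to the number of rows sharing a cluster with the home row."""
    sizes = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
        home_label = model.labels_[home_index]
        sizes[k] = int((model.labels_ == home_label).sum())
    return sizes

# random stand-in for the venue-frequency table: 60 rows, 8 features
rng = np.random.default_rng(0)
features = rng.random((60, 8))

sizes = home_cluster_sizes(features, home_index=0, candidate_ks=[1, 5, 20])
for k, size in sizes.items():
    print(f"k={k}: {size} neighbourhoods in the home cluster")
```

A reasonable stopping point is a value of k where the home cluster is small enough to shortlist but is not a singleton.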

## Round 6

In [144]:
# set number of clusters
kclusters = 265

neighbourhood_grouped_clustering6 = neighbourhood_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans6 = KMeans(n_clusters=kclusters, random_state=0).fit(neighbourhood_grouped_clustering6)

# check cluster labels generated for each row in the dataframe
kmeans6.labels_ 

array([ 23, 126, 262,  28, 164,  35, 163,   3,   1, 207,  70,  94,  25,
       148,   4, 233, 199, 185, 217, 250, 191, 260, 134,  47,  96, 187,
        49, 137,  52,  92, 172, 108, 129,  42,  14, 249,  91,  66, 185,
         1,   6, 219,  94, 108, 171,  68, 138,  58,  88, 147,  71,   4,
         4,  98, 177, 132,  25, 108, 209, 148, 148, 143, 148,   1, 133,
       148, 247,  93,  82,  64, 127, 148, 148, 235, 229,   1,  25, 155,
       202, 241, 104, 258,  30, 240, 254,  25,  97,  86, 159, 148,  61,
       113, 239,  46, 148, 195, 148, 197, 149, 228, 175, 160,  31, 181,
       108, 148, 203, 173,  55, 119,  20, 184,  78, 187,  45, 206, 211,
       233, 252,  76,  25,  25, 109, 148, 248,  14, 148, 133,   8,   2,
        87, 148,   1,  17, 151, 148,  25, 224,  16, 111, 107, 106, 156,
       148,  84, 230, 193, 264, 108, 108,   4,  56,  48, 108, 118,  25,
       223,   2, 158, 257,  41,  38, 185,  83,  29, 148, 214,  67,  51,
         4, 179, 256, 185,   1, 233, 154,  37, 185, 198, 208, 17

In [145]:
neighbourhoods_venues_sorted = neighbourhoods_venues_sorted.drop(columns='Cluster Labels')

In [146]:
neighbourhoods_venues_sorted6 = neighbourhoods_venues_sorted

# add clustering labels
neighbourhoods_venues_sorted6.insert(0, 'Cluster Labels', kmeans6.labels_)

In [147]:
# join each neighbourhood's cluster label and top venues onto its borough and coordinates
neighbourhoods_merged6 = neighbourhoods_merged.join(neighbourhoods_venues_sorted6.set_index('Neighbourhood'), on='Neighbourhood')

neighbourhoods_merged6.head()

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,108.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
1,Scarborough,"Malvern, Rouge",43.806686,-79.194353,,,,,,,,,,,
2,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,24.0,Home Service,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
3,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,48.0,Electronics Store,Farmers Market,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor,Factory
4,Scarborough,Woburn,43.770992,-79.216917,11.0,Korean BBQ Restaurant,Yoga Studio,Electronics Store,English Restaurant,Entertainment Service,Escape Room,Ethiopian Restaurant,Event Service,Event Space,Eye Doctor


In [148]:
neighbourhoods_merged6.loc[neighbourhoods_merged6['Cluster Labels'] == 108]

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Deansgate,Oxford Road,53.472531,-2.240887,108.0,Pub,Bar,Coffee Shop,Burrito Place,Pop-Up Shop,Bakery,Indian Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Pool
78,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,108.0,Bar,Asian Restaurant,Art Gallery,Theater,Bakery,Beer Store,Brewery,Cocktail Bar,Record Shop,Yoga Studio
153,Brooklyn,Greenpoint,40.730201,-73.954241,108.0,Bar,Coffee Shop,Cocktail Bar,Grocery Store,Mexican Restaurant,Supermarket,Spa,Café,Sushi Restaurant,Pizza Place
163,Brooklyn,Prospect Heights,40.676822,-73.964859,108.0,Bar,Cocktail Bar,Mexican Restaurant,Brewery,Thai Restaurant,Grocery Store,Greek Restaurant,Sushi Restaurant,Garden Center,Garden
165,Brooklyn,Williamsburg,40.707144,-73.958115,108.0,Taco Place,Pizza Place,Bar,Latin American Restaurant,Breakfast Spot,Lounge,Liquor Store,Grocery Store,Gym,Coffee Shop
166,Brooklyn,Bushwick,40.698116,-73.925258,108.0,Bar,Mexican Restaurant,Deli / Bodega,Coffee Shop,Thrift / Vintage Store,Sandwich Place,Liquor Store,Chinese Restaurant,Latin American Restaurant,Nightclub
191,Brooklyn,Boerum Hill,40.685683,-73.983748,108.0,Furniture / Home Store,Bar,Italian Restaurant,Thrift / Vintage Store,Sandwich Place,Men's Store,Kids Store,Concert Hall,Jewelry Store,Middle Eastern Restaurant
201,Brooklyn,South Side,40.710861,-73.958001,108.0,Pizza Place,Bar,Coffee Shop,Latin American Restaurant,Yoga Studio,American Restaurant,Dive Bar,Burger Joint,South American Restaurant,Deli / Bodega
207,Manhattan,Hamilton Heights,40.823604,-73.949688,108.0,Café,Mexican Restaurant,Yoga Studio,Bar,Donut Shop,Coffee Shop,Cocktail Bar,Caribbean Restaurant,Pizza Place,Deli / Bodega
209,Manhattan,Central Harlem,40.815976,-73.943211,108.0,Caribbean Restaurant,Deli / Bodega,African Restaurant,Fried Chicken Joint,French Restaurant,Gym / Fitness Center,Beer Bar,Library,Lounge,Breakfast Spot


We did it! These are the 13 neighbourhoods that have similar venues to my neighbourhood.