# The Battle of Neighbourhoods 

## Introduction and context
In this (hypothetical) problem a friend of mine has quit his Wall Street job after having worked for a bank for 10+ years. Wanting a change from corporate culture and having a bit of spare money to invest, but unable to leave the City he’s looking into opening a Hungarian restaurant in New York. While most of his friends think this is at least risky if not borderline madness, New York is probably the only and best place to open a place with a less known and somewhat less health-conscious cuisine. 

New York has more than 8.5 million people, while the greater metropolitan area’s population is more than 23 million, with neighborhoods home to an extremely vast majority of places from eateries through hole in the wall places up to 2-3 michelin star fine dining restaurants. According to the New York City Department of Health and statista (https://www.statista.com/statistics/259776/number-of-people-who-went-to-restaurants-in-new-york-by-type/)  there were more than 26’000 restaurants in the City in 2017, this gives a glimpse of hope for having yet another obscure place to make ends meet. 

The ask was to analyze where it may make sense to open such a restaurant. New York is very diverse with no very obvious concentration of cuisines (apart from Chinatown and some Flushing Meadows districts), so it requires further analysis to see if there is a trend in concentration of small cuisines or maybe even Easter European block in places. 

### Business problem
There are a few ways to approach this problem, I’ll take 3 here and based on the later data analysis it may be possible to pick one (or may not). The approaches are:

#### Chinatown approach
New York’s Chinatown is one the largest of its kind with a massive number of prospering restaurants. Obviously not only residents and people of Chinese origin visit these, but is famous among visitors, tourists and in general as well. This approach assumes that if there are areas with concentrated Hungarian and in a broader sense Eastern European restaurants another one can still fit in as people do visit these parts to eat a particular dish. As Hungarian is a small portion of the city's population with a lesser known cuisine it does seem to make sense to extend the radius with similar cuisines as well. (admittedly with a subjective list)

#### Go against the current
This is the direct opposite - seeing if there are places with no or very limited number of similar restaurants. In other cities this may be a plain bad approach as purely residential areas, suburbs or other industrial districts would not be  a good fit, but NYC, especially Manhattan is so packed with restaurants that this may not be an issue there. However, if such area is found, it does make sense to see how many / how concentrated the place is.

#### Go with the flow
This approach will simply look at the areas with the highest concentration of restaurants (assuming it also is proportional to their variety) and will recommend to set up a place where there are already a lot, as people do go there to eat and is a well-established neighbourhood from this point of view. 

More specifically during the exercise the following questions will be answered:
What are the particular areas with high concentration of Hungarian restaurants?
In a more generic sense, what are the areas with high concentration of Eastern European (Hungarian, Czech, Slovakian, Polish, Romanian) cuisines?
Which areas do not have Eastern European restaurants?
What are the areas with the highest concentration of restaurants?

## Description of the data and how it’ll be used
I will essentially be using the same data sets from the previous week’s exercises, as follows:

 
### New York neighbourhood data, latitude, longitude information
Source: https://cocl.us/new_york_dataset
Usage: mapping restaurant data including address to borough and neighbourhood for classification 

### Hungarian and Eastern European restaurants in New York City
Source: Foursquare API
Usage: getting the list of Hungarian and Eastern European restaurants in NYC for each neighbourhood

### GeoSpacial data
Source: https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
Usage: using for neighbourhood boundaries for visualization

## Methodology
For this particular example the methodology is quite straightforward and analyzing the data is not overly complex either. The steps to achieve the desired answers and to draw some conclusions (if possible) are the following:
- Gather and clean / scrape New York neighbourhood data
- Read data from https://cocl.us/new_york_dataset
- Read and map geo data (latitude and longitude) to each Neighborhood and borough
- Read the list of Hungarian and Eastern European restaurants from the Foursquare API mainly focusing on Manhattan
- Do some exploratory data analysis to see if the scope of the analysis could be or should be changed (i.e.) remove Neighborhoods are   
- Run a simple density analysis on neighbourhoods to see which ones have the highest concentration of restaurants
- Use the K-Means method to cluster neighbourhoods and get answers for the distribution of Eastern European restaurants. 


In [56]:
#importing the necessary libraries
import pandas as pd
import numpy as np

#!pip install geocoder
import geocoder
import os
import requests
#!pip install folium
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

# import k-means from clustering stage
from sklearn.cluster import KMeans

print("dependent libraries are imported...")

dependent libraries are imported...


In [43]:
    #set variables
    radius=500
    LIMIT=100
    limit=100

We need some helper methods:
- convert address to long / lat (so we can match to neighbourhoods)
- reading venues from Foursquare API (for categories)
- reading venue details by ID

In [35]:
# The code was removed by Watson Studio for sharing.

In [36]:
# The code was removed by Watson Studio for sharing.

In [37]:
def get_CEE_venues(lat,lng):
    
    #set variables
    radius=1000
    LIMIT=100
    #categories from https://developer.foursquare.com/docs/api-reference/venues/categories/ and hence https://api.foursquare.com/v2/venues/categories
    CATEGORIES = '52e81612bcbc57f1066b79fa,52960bac3cf9994f4e043ac4,52f2ae52bcbc57f1066b8b81,52e81612bcbc57f1066b7a04,56aa371be4b08b9a8d57355a'
   
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryID={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            CATEGORIES)
    
    # get all the data
    results = requests.get(url).json()
    venue_data=results["response"]['groups'][0]['items']
    venue_details=[]
    for row in venue_data:
        try:
            venue_id=row['venue']['id']
            venue_name=row['venue']['name']
            venue_category=row['venue']['categories'][0]['name']
            venue_details.append([venue_id,venue_name,venue_category])
        except KeyError:
            pass
        
    column_names=['ID','Name','Category']
    df = pd.DataFrame(venue_details,columns=column_names)
    return df


In [38]:
def get_venues(lat,lng):
    

   
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
    
    # get all the data
    results = requests.get(url).json()
    venue_data=results["response"]['groups'][0]['items']
    venue_details=[]
    for row in venue_data:
        try:
            venue_id=row['venue']['id']
            venue_name=row['venue']['name']
            venue_category=row['venue']['categories'][0]['name']
            venue_details.append([venue_id,venue_name,venue_category])
        except KeyError:
            pass
        
    column_names=['ID','Name','Category']
    df = pd.DataFrame(venue_details,columns=column_names)
    return df

In [39]:
# The code was removed by Watson Studio for sharing.

In [40]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # make the GET request#foursquare api connection
        # the following variables are used to connect (already defined in function)
        # CLIENT_ID
        # CLIENT_SECRET
        # VERSION
        # uses the predefined getFoursquareExplore function, which does nothing else but calls the foursquare explore with the credentials. 
        # the function is only hidden for privacy
        results = getFoursquareExplore(lat,lng,radius,limit)["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [41]:
def get_new_york_data():
    url='https://cocl.us/new_york_dataset'
    resp=requests.get(url).json()
    # all data is present in features label
    features=resp['features']
    
    # define the dataframe columns
    column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
    # instantiate the dataframe
    new_york_data = pd.DataFrame(columns=column_names)
    
    for data in features:
        borough = data['properties']['borough'] 
        neighborhood_name = data['properties']['name']
        
        neighborhood_latlon = data['geometry']['coordinates']
        neighborhood_lat = neighborhood_latlon[1]
        neighborhood_lon = neighborhood_latlon[0]
    
        new_york_data = new_york_data.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
    
    return new_york_data

In [8]:
#reading new york data
new_york_data=get_new_york_data()

In [9]:
new_york_data.shape

(306, 4)

In [10]:
new_york_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [12]:
#so there are 306 new york neighborhoods, need to look up Hungarian and other central / eastern european restaurants 
# reading Hungarian, Romanian, Czech, Slovakian and Polish restaurants in one go rather than having 5 calls to the API. 
#(While there is an Eastern European category in Foursquare it is not the one we're looking for)

column_names=['Borough', 'Neighborhood', 'Latitude','Longitutde', 'Hun', 'CEE']
restaurants = pd.DataFrame(columns = column_names)

cee = ["Hungarian Restaurant","Czech Restaurant","Slovak Restaurant", "Polish Restaurant", "Romanian Restaurant"]

#iterating through all neighborhoods, getting venues and adding 2 columns, Hun count and CE count (Hun + all others) 

for row in new_york_data.values.tolist():
    Borough, Neighborhood, Latitude, Longitude=row
    venues = get_CEE_venues(Latitude,Longitude)
    hun_restaurants=venues[venues['Category']=='Hungarian Restaurant']   
    cee_restaurants=venues[venues['Category'].isin(cee)]
    
    print('Hungarian Resturants in '+Neighborhood+', '+Borough+':'+str(len(hun_restaurants)))
    print('CEE Resturants in '+Neighborhood+', '+Borough+':'+str(len(cee_restaurants)))
    #adding a new row to our restaurants frame
    restaurants = restaurants.append({'Borough': Borough,
                        'Neighborhood': Neighborhood, 
                        'Latitude': Latitude,
                        'Longitutde' : Longitude,
                        'Hun' : len(hun_restaurants),
                        'CEE' : len(cee_restaurants)
                                               }, ignore_index=True)
    



Hungarian Resturants in Wakefield, Bronx:0
CEE Resturants in Wakefield, Bronx:0
Hungarian Resturants in Co-op City, Bronx:0
CEE Resturants in Co-op City, Bronx:0
Hungarian Resturants in Eastchester, Bronx:0
CEE Resturants in Eastchester, Bronx:0
Hungarian Resturants in Fieldston, Bronx:0
CEE Resturants in Fieldston, Bronx:0
Hungarian Resturants in Riverdale, Bronx:0
CEE Resturants in Riverdale, Bronx:0
Hungarian Resturants in Kingsbridge, Bronx:0
CEE Resturants in Kingsbridge, Bronx:0
Hungarian Resturants in Marble Hill, Manhattan:0
CEE Resturants in Marble Hill, Manhattan:0
Hungarian Resturants in Woodlawn, Bronx:0
CEE Resturants in Woodlawn, Bronx:0
Hungarian Resturants in Norwood, Bronx:0
CEE Resturants in Norwood, Bronx:0
Hungarian Resturants in Williamsbridge, Bronx:0
CEE Resturants in Williamsbridge, Bronx:0
Hungarian Resturants in Baychester, Bronx:0
CEE Resturants in Baychester, Bronx:0
Hungarian Resturants in Pelham Parkway, Bronx:0
CEE Resturants in Pelham Parkway, Bronx:0
Hu

In [13]:
restaurants.shape

(306, 6)

In [14]:
restaurants.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitutde,Hun,CEE
0,Bronx,Wakefield,40.894705,-73.847201,0,0
1,Bronx,Co-op City,40.874294,-73.829939,0,0
2,Bronx,Eastchester,40.887556,-73.827806,0,0
3,Bronx,Fieldston,40.895437,-73.905643,0,0
4,Bronx,Riverdale,40.890834,-73.912585,0,0


In [20]:
print('Hungarian restaurants: ' + str(restaurants['Hun'].sum()))
print('CEE restaurants: ' + str(restaurants['CEE'].sum()))

Hungarian restaurants: 0
CEE restaurants: 9


## Analysis of results

The outcome of reading all Hungarian and Central / Eastern European restaurants in New York is somewhat surprising. In the given categories only 9 restaurants were found, while there was no dedicated Hungarian restaurant in New York City. This is somewhat contradicting with the assumtions, as there are some places in New Jersey and there seem to be some Hungarian places present in NYC as well. Further analysis, however, showed that those are mainly cateogized as i.e. bakery and others under food. As the original business problem was to analyze if a restaurant is feasible, those are ommited now.

The distribution of Eastern European restaurants does not yield any significant result either, as those are:
- Greenpoint / Brooklyn: 3
- Arrochar / Staten Island: 1
- Blissville / Queens: 1
- Lenox Hill / Manhattan: 1
- Ridgewood / Queens: 1
- Roosevelt Island / Manhattan: 1
- Steinway / Queens: 1

### Conclusion 1
Given the very low number of restaurants we can safely state that neither the Chinatown model (flocking same type of restaurants) seem to be doable, nor the 'Go against the current', when the owner explicitly chooses a place where no similar restaurants are present are really options.

### Conclusion 2
We should recommend a location with a very high density of restaurants so it is likely to get enough visitors - the place is already known and liked for its food selection and variety. For this we will analyze the restaurant density for Manhattan only.

### Finding the best Manhattan districts for restaurants

In [31]:
manhattan_hoods = new_york_data.loc[new_york_data['Borough'] == 'Manhattan']
manhattan_hoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
6,Manhattan,Marble Hill,40.876551,-73.91066
100,Manhattan,Chinatown,40.715618,-73.994279
101,Manhattan,Washington Heights,40.851903,-73.9369
102,Manhattan,Inwood,40.867684,-73.92121
103,Manhattan,Hamilton Heights,40.823604,-73.949688
104,Manhattan,Manhattanville,40.816934,-73.957385
105,Manhattan,Central Harlem,40.815976,-73.943211
106,Manhattan,East Harlem,40.792249,-73.944182
107,Manhattan,Upper East Side,40.775639,-73.960508
108,Manhattan,Yorkville,40.77593,-73.947118


In [44]:
manhattan_venues = getNearbyVenues(names=manhattan_hoods['Neighborhood'],
                                   latitudes=manhattan_hoods['Latitude'],
                                   longitudes=manhattan_hoods['Longitude']
                                  )

manhattan_venues

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.910660,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.910660,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.910660,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.910660,Starbucks,40.877531,-73.905582,Coffee Shop
4,Marble Hill,40.876551,-73.910660,Dunkin',40.877136,-73.906666,Donut Shop
5,Marble Hill,40.876551,-73.910660,Rite Aid,40.875467,-73.908906,Pharmacy
6,Marble Hill,40.876551,-73.910660,TCR The Club of Riverdale,40.878628,-73.914568,Tennis Stadium
7,Marble Hill,40.876551,-73.910660,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant
8,Marble Hill,40.876551,-73.910660,Starbucks,40.873755,-73.908613,Coffee Shop
9,Marble Hill,40.876551,-73.910660,Astral Fitness & Wellness Center,40.876705,-73.906372,Gym


In [45]:
print(manhattan_venues.shape)
print('There are {} uniques categories.'.format(len(manhattan_venues['Venue Category'].unique())))

(2984, 7)
There are 321 uniques categories.


In [46]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,...,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [47]:
#check the shape
manhattan_onehot.shape

(2984, 322)

In [51]:
#need to group by neighborhood and take a mean frequency
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,...,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.011364,...,0.0,0.022727,0.0,0.0,0.0,0.011364,0.034091,0.0,0.0,0.034091
2,Central Harlem,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0
4,Chinatown,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
5,Civic Center,0.0,0.0,0.0,0.0,0.022989,0.011494,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011494,0.022989,0.0,0.0,0.034483
6,Clinton,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0
7,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,...,0.0,0.02,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.0
9,Financial District,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01


In [49]:
#sorting values
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [53]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Hotel,Gym,Boat or Ferry,Memorial Site,Beer Garden,Gourmet Shop,Plaza,Food Court,Shopping Mall
1,Carnegie Hill,Coffee Shop,Café,Yoga Studio,Pizza Place,Bar,Bookstore,Grocery Store,Gym,Gym / Fitness Center,Japanese Restaurant
2,Central Harlem,Chinese Restaurant,Gym / Fitness Center,African Restaurant,American Restaurant,Bar,Fried Chicken Joint,Seafood Restaurant,French Restaurant,Café,Market
3,Chelsea,Art Gallery,Coffee Shop,Italian Restaurant,Ice Cream Shop,American Restaurant,Seafood Restaurant,Bookstore,Boutique,Juice Bar,Market
4,Chinatown,Chinese Restaurant,Bakery,Cocktail Bar,Coffee Shop,Spa,American Restaurant,Salon / Barbershop,Optical Shop,Bar,Dim Sum Restaurant


### Clustering neighborhoods into 3 clusters 
that should be enough and adequate to assess which one to recommend

In [57]:
# set number of clusters
kclusters = 3

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 0, 1, 2, 1, 1, 0, 1, 1, 0, 0, 1, 1], dtype=int32)

In [58]:
#merge the dataframes together with the 10 most frequent and the cluster value
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_hoods

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() 

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Manhattan,Marble Hill,40.876551,-73.91066,0,Sandwich Place,Gym,American Restaurant,Coffee Shop,Yoga Studio,Deli / Bodega,Supplement Shop,Steakhouse,Seafood Restaurant,Pizza Place
100,Manhattan,Chinatown,40.715618,-73.994279,0,Chinese Restaurant,Bakery,Cocktail Bar,Coffee Shop,Spa,American Restaurant,Salon / Barbershop,Optical Shop,Bar,Dim Sum Restaurant
101,Manhattan,Washington Heights,40.851903,-73.9369,0,Café,Bakery,Mobile Phone Shop,Pizza Place,Grocery Store,Chinese Restaurant,Latin American Restaurant,Tapas Restaurant,New American Restaurant,Park
102,Manhattan,Inwood,40.867684,-73.92121,0,Mexican Restaurant,Café,Bakery,Pizza Place,Lounge,Restaurant,Park,Chinese Restaurant,Deli / Bodega,American Restaurant
103,Manhattan,Hamilton Heights,40.823604,-73.949688,0,Pizza Place,Coffee Shop,Mexican Restaurant,Café,Deli / Bodega,Chinese Restaurant,Sushi Restaurant,Cocktail Bar,Yoga Studio,Caribbean Restaurant


In [59]:
manhattan_merged

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Manhattan,Marble Hill,40.876551,-73.91066,0,Sandwich Place,Gym,American Restaurant,Coffee Shop,Yoga Studio,Deli / Bodega,Supplement Shop,Steakhouse,Seafood Restaurant,Pizza Place
100,Manhattan,Chinatown,40.715618,-73.994279,0,Chinese Restaurant,Bakery,Cocktail Bar,Coffee Shop,Spa,American Restaurant,Salon / Barbershop,Optical Shop,Bar,Dim Sum Restaurant
101,Manhattan,Washington Heights,40.851903,-73.9369,0,Café,Bakery,Mobile Phone Shop,Pizza Place,Grocery Store,Chinese Restaurant,Latin American Restaurant,Tapas Restaurant,New American Restaurant,Park
102,Manhattan,Inwood,40.867684,-73.92121,0,Mexican Restaurant,Café,Bakery,Pizza Place,Lounge,Restaurant,Park,Chinese Restaurant,Deli / Bodega,American Restaurant
103,Manhattan,Hamilton Heights,40.823604,-73.949688,0,Pizza Place,Coffee Shop,Mexican Restaurant,Café,Deli / Bodega,Chinese Restaurant,Sushi Restaurant,Cocktail Bar,Yoga Studio,Caribbean Restaurant
104,Manhattan,Manhattanville,40.816934,-73.957385,0,Coffee Shop,Seafood Restaurant,Italian Restaurant,Deli / Bodega,Park,Fried Chicken Joint,Mexican Restaurant,Food & Drink Shop,Bank,Bar
105,Manhattan,Central Harlem,40.815976,-73.943211,0,Chinese Restaurant,Gym / Fitness Center,African Restaurant,American Restaurant,Bar,Fried Chicken Joint,Seafood Restaurant,French Restaurant,Café,Market
106,Manhattan,East Harlem,40.792249,-73.944182,0,Mexican Restaurant,Bakery,Deli / Bodega,Thai Restaurant,Latin American Restaurant,Pizza Place,Dance Studio,Steakhouse,Doctor's Office,Cocktail Bar
107,Manhattan,Upper East Side,40.775639,-73.960508,1,Italian Restaurant,Bakery,Exhibit,Gym / Fitness Center,Spa,Cosmetics Shop,Hotel,Juice Bar,Pizza Place,Yoga Studio
108,Manhattan,Yorkville,40.77593,-73.947118,1,Coffee Shop,Italian Restaurant,Gym,Bar,Deli / Bodega,Mexican Restaurant,Sushi Restaurant,Wine Shop,Japanese Restaurant,Diner


### Visualizing it

In [61]:
#creating a map of greater toronto and visualizing our boroughs and 'hoods
address = 'New York, NY, US'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of New York City are 40.7127281, -74.0060152.


In [63]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Cluster 0

In [64]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Marble Hill,Sandwich Place,Gym,American Restaurant,Coffee Shop,Yoga Studio,Deli / Bodega,Supplement Shop,Steakhouse,Seafood Restaurant,Pizza Place
100,Chinatown,Chinese Restaurant,Bakery,Cocktail Bar,Coffee Shop,Spa,American Restaurant,Salon / Barbershop,Optical Shop,Bar,Dim Sum Restaurant
101,Washington Heights,Café,Bakery,Mobile Phone Shop,Pizza Place,Grocery Store,Chinese Restaurant,Latin American Restaurant,Tapas Restaurant,New American Restaurant,Park
102,Inwood,Mexican Restaurant,Café,Bakery,Pizza Place,Lounge,Restaurant,Park,Chinese Restaurant,Deli / Bodega,American Restaurant
103,Hamilton Heights,Pizza Place,Coffee Shop,Mexican Restaurant,Café,Deli / Bodega,Chinese Restaurant,Sushi Restaurant,Cocktail Bar,Yoga Studio,Caribbean Restaurant
104,Manhattanville,Coffee Shop,Seafood Restaurant,Italian Restaurant,Deli / Bodega,Park,Fried Chicken Joint,Mexican Restaurant,Food & Drink Shop,Bank,Bar
105,Central Harlem,Chinese Restaurant,Gym / Fitness Center,African Restaurant,American Restaurant,Bar,Fried Chicken Joint,Seafood Restaurant,French Restaurant,Café,Market
106,East Harlem,Mexican Restaurant,Bakery,Deli / Bodega,Thai Restaurant,Latin American Restaurant,Pizza Place,Dance Studio,Steakhouse,Doctor's Office,Cocktail Bar
109,Lenox Hill,Italian Restaurant,Coffee Shop,Pizza Place,Cocktail Bar,Sushi Restaurant,Café,Gym,Burger Joint,Gym / Fitness Center,Sporting Goods Shop
110,Roosevelt Island,Deli / Bodega,Park,Gym / Fitness Center,Greek Restaurant,Dog Run,Liquor Store,Outdoors & Recreation,Sandwich Place,Scenic Lookout,Food & Drink Shop


#### Cluster 1

In [65]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
107,Upper East Side,Italian Restaurant,Bakery,Exhibit,Gym / Fitness Center,Spa,Cosmetics Shop,Hotel,Juice Bar,Pizza Place,Yoga Studio
108,Yorkville,Coffee Shop,Italian Restaurant,Gym,Bar,Deli / Bodega,Mexican Restaurant,Sushi Restaurant,Wine Shop,Japanese Restaurant,Diner
112,Lincoln Square,Plaza,Gym / Fitness Center,Café,Italian Restaurant,Concert Hall,Performing Arts Venue,Theater,American Restaurant,Wine Shop,Indie Movie Theater
113,Clinton,Theater,Coffee Shop,Gym / Fitness Center,Gym,Wine Shop,Hotel,Italian Restaurant,Sandwich Place,Pizza Place,Spa
117,Greenwich Village,Italian Restaurant,Coffee Shop,Café,Bakery,Sushi Restaurant,Gym,Wine Bar,Dessert Shop,Comedy Club,Indie Movie Theater
120,Tribeca,Park,Italian Restaurant,Wine Bar,Café,Spa,American Restaurant,Art Gallery,Steakhouse,Skate Park,Scenic Lookout
122,Soho,Italian Restaurant,Mediterranean Restaurant,Coffee Shop,Art Gallery,Gym,Spa,Café,Clothing Store,French Restaurant,Paper / Office Supplies Store
123,West Village,Wine Bar,Italian Restaurant,Park,American Restaurant,New American Restaurant,Coffee Shop,Jazz Club,Bakery,Cocktail Bar,Seafood Restaurant
126,Gramercy,Italian Restaurant,Coffee Shop,Playground,Pizza Place,Bar,Diner,Taco Place,Pub,Park,Mexican Restaurant
127,Battery Park City,Park,Hotel,Gym,Boat or Ferry,Memorial Site,Beer Garden,Gourmet Shop,Plaza,Food Court,Shopping Mall


#### Cluster 2

In [66]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
275,Stuyvesant Town,Boat or Ferry,Park,Bar,Pet Service,Gas Station,Farmers Market,German Restaurant,Gym / Fitness Center,Baseball Field,Harbor / Marina
