# The Battle of Neighborhoods

Import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


New York City has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


Load and explore the data

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

All the relevant data is in the features key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [5]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

The next task is essentially transforming this data of nested Python dictionaries into a pandas dataframe. So let's start by creating an empty dataframe.

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [8]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [9]:
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    # add one row at the end; DataFrame.append was removed in pandas 2.0
    neighborhoods.loc[len(neighborhoods)] = [borough, neighborhood_name,
                                             neighborhood_lat, neighborhood_lon]

Quickly examine the resulting dataframe.

In [10]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
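
As an alternative to filling the dataframe row by row, the rows can be collected first and the dataframe built once. This is only a minimal sketch, assuming each feature has the shape shown earlier; `sample_features` is a hypothetical two-item stand-in for `neighborhoods_data`, with values copied from the output above.

```python
import pandas as pd

# Hypothetical stand-in for neighborhoods_data: two features copied from the output above
sample_features = [
    {'geometry': {'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'geometry': {'coordinates': [-73.82993910812398, 40.87429419303012]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]

# GeoJSON stores points as [longitude, latitude], so index 1 is the latitude
rows = [
    {'Borough': f['properties']['borough'],
     'Neighborhood': f['properties']['name'],
     'Latitude': f['geometry']['coordinates'][1],
     'Longitude': f['geometry']['coordinates'][0]}
    for f in sample_features
]

# build the dataframe in one step from the collected rows
sample_neighborhoods = pd.DataFrame(
    rows, columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])
print(sample_neighborhoods)
```

Building the dataframe once avoids growing it inside the loop, and the latitude and longitude columns come out as floats.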


In [11]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


Use the geopy library to get the latitude and longitude values of New York City.
In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent ny_explorer, as shown below.

In [12]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


Create a map of New York with neighborhoods superimposed on top.

In [13]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

For the rest of the analysis we work with Manhattan only, so let's slice the original dataframe and create a new dataframe of the Manhattan data.

In [14]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


Let's get the geographical coordinates of Manhattan.

In [15]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


As we did with all of New York City, let's visualize Manhattan and the neighborhoods in it.

In [16]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

Define Foursquare Credentials and Version.

In [17]:
CLIENT_ID = 'B2SOBN12MQU3S50UDGCFKZTLMPMGMSAD53AZ5OJIT4FW42QQ' # your Foursquare ID
CLIENT_SECRET = 'GJLTIEY5MGSQCHILQHPK45VCDG3BOO4DOCHUPPQWP2FSMWAI' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 500 
radius = 5000 

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: B2SOBN12MQU3S50UDGCFKZTLMPMGMSAD53AZ5OJIT4FW42QQ
CLIENT_SECRET:GJLTIEY5MGSQCHILQHPK45VCDG3BOO4DOCHUPPQWP2FSMWAI


Explore Neighborhoods in Manhattan.

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, categoryId=''):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # restrict the search to one category when an ID is given
        if categoryId != '':
            url = url + '&categoryId={}'.format(categoryId)

        # make the GET request
        response = requests.get(url).json()
        results = response["response"]['venues']

        # keep only the relevant information for each nearby venue,
        # skipping venues that have no category
        for v in results:
            try:
                category = v['categories'][0]['name']
            except (KeyError, IndexError, TypeError):
                continue

            venues_list.append([(
                name,
                lat,
                lng,
                v['name'],
                v['location']['lat'],
                v['location']['lng'],
                category
            )])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
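
The function above assembles the request URL with string formatting. As an alternative sketch, the standard library's `urllib.parse.urlencode` can build the same query string and escape the values. The `build_search_url` helper and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in `getNearbyVenues`.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_search_url(client_id, client_secret, version, lat, lng,
                     radius, limit, category_id=''):
    """Hypothetical helper: build the venues/search URL used in getNearbyVenues."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # only add the category filter when one is requested
    if category_id:
        params['categoryId'] = category_id
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# placeholder credentials, not real ones
url = build_search_url('YOUR_ID', 'YOUR_SECRET', '20180605',
                       40.876551, -73.91066, 1000, 500,
                       category_id='4bf58dd8d48988d1d0941735')
print(url)

# decode the query string again to confirm that the values survived the round trip
print(parse_qs(urlparse(url).query)['ll'])
```

`requests.get` also accepts a `params` dictionary, so the same dictionary could be passed to it directly instead of building the URL by hand.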

Create a new dataframe called dessert_venues.

In [19]:
#neighborhoods = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
dessert_venues = getNearbyVenues(names=manhattan_data['Neighborhood'], 
                                latitudes=manhattan_data['Latitude'], 
                                longitudes=manhattan_data['Longitude'], 
                                radius=1000, 
                                categoryId='4bf58dd8d48988d1d0941735')

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


Create a map of Manhattan with venues and neighborhoods superimposed on top.

In [20]:
for lat, lng, venue, venue_cat, neighborhood in zip(dessert_venues['Venue Latitude'], dessert_venues['Venue Longitude'], dessert_venues['Venue'], dessert_venues['Venue Category'], dessert_venues['Neighborhood']):
    label = '{}, {}, {}'.format(venue, venue_cat, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

Let's check the size of the resulting dataframe.

In [21]:
print(dessert_venues.shape)
dessert_venues.head()

(1600, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Baskin-Robbins,40.877149,-73.906658,Ice Cream Shop
1,Marble Hill,40.876551,-73.91066,Sugarboy Bakery Cafe,40.877948,-73.90286,Bakery
2,Marble Hill,40.876551,-73.91066,Carvel Ice Cream,40.883657,-73.901655,Ice Cream Shop
3,Marble Hill,40.876551,-73.91066,Room for Dessert,40.877993,-73.906023,Dessert Shop
4,Marble Hill,40.876551,-73.91066,Pinkberry,40.873125,-73.90134,Frozen Yogurt Shop
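
Each row pairs the coordinates of a neighborhood with the coordinates of a venue, so the distance between the two can be compared with the search radius. The sketch below uses the haversine formula for this; `haversine_m` is a hypothetical helper that is not part of the notebook, and the Earth radius is an approximation.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    earth_radius_m = 6371000  # mean Earth radius, an approximation
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# Marble Hill and the first venue returned for it (values from the table above)
distance = haversine_m(40.876551, -73.91066, 40.877149, -73.906658)
print('{:.0f} m'.format(distance))
```

Applying the same function to every row would show how far the returned venues are from the neighborhood coordinates.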


Let's check how many venues were returned for each neighborhood.

In [22]:
dessert_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,47,47,47,47,47,47
Carnegie Hill,48,48,48,48,48,48
Central Harlem,13,13,13,13,13,13
Chelsea,50,50,50,50,50,50
Chinatown,50,50,50,50,50,50
Civic Center,49,49,49,49,49,49
Clinton,48,48,48,48,48,48
East Harlem,14,14,14,14,14,14
East Village,50,50,50,50,50,50
Financial District,47,47,47,47,47,47


Let's find out how many unique categories can be curated from all the returned venues.

In [23]:
print('There are {} unique categories.'.format(len(dessert_venues['Venue Category'].unique())))

There are 30 unique categories.


Analyze Each Neighborhood.

In [24]:
# one hot encoding
manhattan_onehot = pd.get_dummies(dessert_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = dessert_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Bakery,Bar,Bubble Tea Shop,Burger Joint,Butcher,Café,Candy Store,Chocolate Shop,Coffee Shop,Creperie,Cupcake Shop,Deli / Bodega,Dessert Shop,Food & Drink Shop,Food Stand,Food Truck,French Restaurant,Frozen Yogurt Shop,Gift Shop,Ice Cream Shop,Italian Restaurant,Juice Bar,New American Restaurant,Pastry Shop,Pie Shop,Pizza Place,Smoothie Shop,Snack Place,Tea Room
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [25]:
manhattan_onehot.shape

(1600, 31)
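
To see what the one hot encoding does on a small scale, here is a sketch with a hypothetical three-row stand-in for `dessert_venues` (the `toy_` names are illustrative only). Each venue becomes one row with a single indicator set in the column of its category.

```python
import pandas as pd

# Hypothetical miniature of dessert_venues: three venues in two neighborhoods
toy_venues = pd.DataFrame({
    'Neighborhood': ['Marble Hill', 'Marble Hill', 'Chinatown'],
    'Venue Category': ['Ice Cream Shop', 'Bakery', 'Ice Cream Shop'],
})

# one indicator column per category, one row per venue
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix="", prefix_sep="")

# put the neighborhood name in front of the indicator columns
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])
print(toy_onehot)
```

Here `insert(0, ...)` places the neighborhood column first directly, as an alternative to appending it and then reordering the columns.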

Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [26]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,American Restaurant,Bakery,Bar,Bubble Tea Shop,Burger Joint,Butcher,Café,Candy Store,Chocolate Shop,Coffee Shop,Creperie,Cupcake Shop,Deli / Bodega,Dessert Shop,Food & Drink Shop,Food Stand,Food Truck,French Restaurant,Frozen Yogurt Shop,Gift Shop,Ice Cream Shop,Italian Restaurant,Juice Bar,New American Restaurant,Pastry Shop,Pie Shop,Pizza Place,Smoothie Shop,Snack Place,Tea Room
0,Battery Park City,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.042553,0.0,0.085106,0.0,0.212766,0.0,0.0,0.021277,0.0,0.12766,0.0,0.425532,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0
1,Carnegie Hill,0.0,0.041667,0.020833,0.0,0.041667,0.0,0.020833,0.041667,0.020833,0.0,0.0,0.083333,0.0,0.208333,0.020833,0.0,0.041667,0.0,0.041667,0.0,0.375,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833
2,Central Harlem,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.461538,0.0,0.0,0.0,0.0,0.0,0.076923,0.307692,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.1,0.0,0.02,0.02,0.0,0.04,0.02,0.02,0.02,0.02,0.1,0.0,0.26,0.0,0.02,0.04,0.0,0.02,0.0,0.28,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
4,Chinatown,0.0,0.08,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.06,0.0,0.36,0.0,0.0,0.02,0.0,0.0,0.0,0.38,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02
5,Civic Center,0.0,0.081633,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.020408,0.0,0.367347,0.0,0.0,0.0,0.0,0.020408,0.0,0.408163,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408
6,Clinton,0.020833,0.104167,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.104167,0.0,0.1875,0.0,0.0,0.041667,0.0,0.0625,0.0,0.291667,0.0,0.0,0.0,0.020833,0.020833,0.020833,0.020833,0.0,0.020833
7,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.214286,0.0,0.0,0.071429,0.0,0.071429,0.0,0.571429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.08,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.06,0.02,0.4,0.0,0.0,0.02,0.0,0.0,0.0,0.36,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Financial District,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.042553,0.0,0.085106,0.0,0.255319,0.0,0.0,0.021277,0.0,0.12766,0.0,0.404255,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0


In [27]:
manhattan_grouped.shape

(40, 31)
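
Because every one hot column holds only 0s and 1s, its mean within a neighborhood is the share of that neighborhood's venues in the category. For example, 0.021277 in the Battery Park City row is 1/47, one venue out of the 47 returned. A minimal sketch with hypothetical toy data:

```python
import pandas as pd

# Hypothetical one-hot rows: Marble Hill has two venues, Chinatown has one
toy_onehot = pd.DataFrame({
    'Neighborhood': ['Marble Hill', 'Marble Hill', 'Chinatown'],
    'Bakery': [0, 1, 0],
    'Ice Cream Shop': [1, 0, 1],
})

# the mean of a 0/1 column is the share of venues in that category
toy_grouped = toy_onehot.groupby('Neighborhood').mean().reset_index()
print(toy_grouped)

# the shares of each neighborhood add up to 1
row_sums = toy_grouped.drop(columns='Neighborhood').sum(axis=1)
print(row_sums.tolist())
```

Since each row sums to 1, neighborhoods with few venues and neighborhoods with many are compared by their mix of categories and not by the number of venues.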

Let's put the most common venue categories of each neighborhood into a pandas dataframe.

First, let's write a function to sort the venues in descending order.

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
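
A quick way to check the function is to call it on a single row. The sketch below repeats the function so that it runs on its own, and uses three of Central Harlem's nonzero frequencies from `manhattan_grouped` above; the shortened `row` is an illustration, not the full 31-column row.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # same logic as above: skip the neighborhood name, sort the frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# three of Central Harlem's frequencies, copied from manhattan_grouped above
row = pd.Series({
    'Neighborhood': 'Central Harlem',
    'Burger Joint': 0.076923,
    'Dessert Shop': 0.461538,
    'Ice Cream Shop': 0.307692,
})

top = return_most_common_venues(row, 2)
print(list(top))
```

Note that the function always returns `num_top_venues` names, so for a neighborhood with fewer distinct categories than that, the tail of the list is filled with categories whose frequency is 0.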

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Ice Cream Shop,Dessert Shop,Frozen Yogurt Shop,Cupcake Shop,Coffee Shop,Snack Place,Food Truck,Chocolate Shop,Juice Bar,Café
1,Carnegie Hill,Ice Cream Shop,Dessert Shop,Cupcake Shop,Food Truck,Bakery,Burger Joint,Candy Store,Frozen Yogurt Shop,Tea Room,Bar
2,Central Harlem,Dessert Shop,Ice Cream Shop,Burger Joint,Candy Store,Gift Shop,Tea Room,Bakery,Bar,Bubble Tea Shop,Butcher
3,Chelsea,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Food Truck,Café,Frozen Yogurt Shop,Coffee Shop,Food Stand,Chocolate Shop
4,Chinatown,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Creperie,Food Truck,Tea Room,Bubble Tea Shop,Pie Shop,Café


Cluster Neighborhoods.

Run k-means to cluster the neighborhoods into 5 clusters.

In [30]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 3, 0, 0, 3, 4, 0, 1])
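
Before looking at the clusters in detail, it can help to count how many neighborhoods fall into each one, since a cluster with very few members may point to an outlier. A minimal sketch using only the ten labels printed above (the full `kmeans.labels_` has one entry per neighborhood, so the real counts will differ):

```python
import numpy as np

# the first ten labels printed above, not the full kmeans.labels_ array
first_labels = np.array([1, 1, 0, 3, 0, 0, 3, 4, 0, 1])

# count how many of these rows fall into each of the 5 clusters
counts = np.bincount(first_labels, minlength=5)
for cluster, n in enumerate(counts):
    print('cluster {}: {} neighborhoods'.format(cluster, n))
```

On the real data the same call would be `np.bincount(kmeans.labels_, minlength=kclusters)`.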

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [31]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# merge manhattan_data with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,4,Ice Cream Shop,Frozen Yogurt Shop,Dessert Shop,Bakery,Tea Room,Bar,Bubble Tea Shop,Burger Joint,Butcher,Café
1,Manhattan,Chinatown,40.715618,-73.994279,0,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Creperie,Food Truck,Tea Room,Bubble Tea Shop,Pie Shop,Café
2,Manhattan,Washington Heights,40.851903,-73.9369,4,Ice Cream Shop,Dessert Shop,Butcher,Frozen Yogurt Shop,Tea Room,Bakery,Bar,Bubble Tea Shop,Burger Joint,Café
3,Manhattan,Inwood,40.867684,-73.92121,2,Frozen Yogurt Shop,Ice Cream Shop,Dessert Shop,Cupcake Shop,Tea Room,Bakery,Bar,Bubble Tea Shop,Burger Joint,Butcher
4,Manhattan,Hamilton Heights,40.823604,-73.949688,4,Ice Cream Shop,Dessert Shop,Candy Store,Tea Room,Bakery,Bar,Bubble Tea Shop,Burger Joint,Butcher,Café


Finally, let's visualize the resulting clusters.

In [32]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now, examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, a name can be assigned to each cluster. 
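
One way to look for discriminating categories is to average the category frequencies within each cluster and compare the resulting profiles. The sketch below uses a hypothetical four-neighborhood table (the names `toy` and `cluster_profile` and all values are made up for illustration); on the real data, the frequencies in `manhattan_grouped` would be combined with `kmeans.labels_`.

```python
import pandas as pd

# Hypothetical miniature: category frequencies per neighborhood plus a cluster label
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Cluster Labels': [0, 0, 1, 1],
    'Dessert Shop': [0.5, 0.5, 0.25, 0.0],
    'Ice Cream Shop': [0.25, 0.25, 0.5, 0.75],
    'Frozen Yogurt Shop': [0.25, 0.25, 0.25, 0.25],
})

# average frequency of each category within each cluster
cluster_profile = toy.drop(columns='Neighborhood').groupby('Cluster Labels').mean()
print(cluster_profile)

# the category with the highest average frequency in each cluster
dominant = cluster_profile.idxmax(axis=1)
print(dominant)
```

Categories whose average differs strongly between clusters are candidates for naming them; categories with similar averages everywhere, like the frozen yogurt column in this toy table, do not help to tell the clusters apart.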

Cluster 1

In [33]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Creperie,Food Truck,Tea Room,Bubble Tea Shop,Pie Shop,Café
5,Manhattanville,Dessert Shop,Ice Cream Shop,Cupcake Shop,Food Truck,Tea Room,Bakery,Bar,Bubble Tea Shop,Burger Joint,Butcher
6,Central Harlem,Dessert Shop,Ice Cream Shop,Burger Joint,Candy Store,Gift Shop,Tea Room,Bakery,Bar,Bubble Tea Shop,Butcher
10,Lenox Hill,Ice Cream Shop,Dessert Shop,Café,Bakery,Cupcake Shop,Frozen Yogurt Shop,Pastry Shop,Burger Joint,Candy Store,Chocolate Shop
18,Greenwich Village,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Food Truck,Frozen Yogurt Shop,Café,Tea Room,Juice Bar,Bubble Tea Shop
19,East Village,Dessert Shop,Ice Cream Shop,Bakery,Cupcake Shop,Juice Bar,Food Truck,Deli / Bodega,Café,Bubble Tea Shop,Butcher
20,Lower East Side,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Café,Bubble Tea Shop,Deli / Bodega,Pie Shop,Food Truck,Burger Joint
21,Tribeca,Dessert Shop,Ice Cream Shop,Bakery,Cupcake Shop,Creperie,Chocolate Shop,Frozen Yogurt Shop,Tea Room,Pie Shop,Deli / Bodega
22,Little Italy,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Bubble Tea Shop,Creperie,Food Truck,Tea Room,Pie Shop,Deli / Bodega
23,Soho,Ice Cream Shop,Dessert Shop,Bakery,Pie Shop,Cupcake Shop,Creperie,Food Truck,Tea Room,Bubble Tea Shop,Deli / Bodega


Cluster 2

In [34]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Upper East Side,Ice Cream Shop,Dessert Shop,Café,Cupcake Shop,Frozen Yogurt Shop,Bakery,Burger Joint,Candy Store,Food Truck,Tea Room
9,Yorkville,Ice Cream Shop,Dessert Shop,Cupcake Shop,Frozen Yogurt Shop,Burger Joint,Café,Candy Store,Tea Room,Bakery,Bar
13,Lincoln Square,Ice Cream Shop,Dessert Shop,Cupcake Shop,Bakery,Food Truck,Frozen Yogurt Shop,Food Stand,Chocolate Shop,American Restaurant,Candy Store
26,Morningside Heights,Ice Cream Shop,Dessert Shop,Food Truck,Cupcake Shop,French Restaurant,Juice Bar,Frozen Yogurt Shop,Candy Store,Coffee Shop,Chocolate Shop
28,Battery Park City,Ice Cream Shop,Dessert Shop,Frozen Yogurt Shop,Cupcake Shop,Coffee Shop,Snack Place,Food Truck,Chocolate Shop,Juice Bar,Café
29,Financial District,Ice Cream Shop,Dessert Shop,Frozen Yogurt Shop,Cupcake Shop,Coffee Shop,Snack Place,Food Truck,Juice Bar,Café,Chocolate Shop
30,Carnegie Hill,Ice Cream Shop,Dessert Shop,Cupcake Shop,Food Truck,Bakery,Burger Joint,Candy Store,Frozen Yogurt Shop,Tea Room,Bar
36,Tudor City,Ice Cream Shop,Dessert Shop,Bakery,Chocolate Shop,Cupcake Shop,Candy Store,Italian Restaurant,Café,Frozen Yogurt Shop,Deli / Bodega
39,Hudson Yards,Ice Cream Shop,Dessert Shop,Cupcake Shop,Food Truck,Pie Shop,Burger Joint,Frozen Yogurt Shop,Pizza Place,Bakery,Snack Place


Cluster 3

In [35]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Inwood,Frozen Yogurt Shop,Ice Cream Shop,Dessert Shop,Cupcake Shop,Tea Room,Bakery,Bar,Bubble Tea Shop,Burger Joint,Butcher


Cluster 4

In [36]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Clinton,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Frozen Yogurt Shop,Burger Joint,Chocolate Shop,Food Truck,Tea Room,American Restaurant
15,Midtown,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Frozen Yogurt Shop,Food Truck,Chocolate Shop,Pie Shop,Pastry Shop,Burger Joint
16,Murray Hill,Ice Cream Shop,Dessert Shop,Bakery,Food Truck,Chocolate Shop,Cupcake Shop,Burger Joint,Juice Bar,Frozen Yogurt Shop,Candy Store
17,Chelsea,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Food Truck,Café,Frozen Yogurt Shop,Coffee Shop,Food Stand,Chocolate Shop
24,West Village,Ice Cream Shop,Dessert Shop,Cupcake Shop,Bakery,Chocolate Shop,Candy Store,Café,Frozen Yogurt Shop,Creperie,Burger Joint
33,Midtown South,Ice Cream Shop,Dessert Shop,Bakery,Cupcake Shop,Burger Joint,Frozen Yogurt Shop,Pastry Shop,Juice Bar,Coffee Shop,Candy Store
35,Turtle Bay,Ice Cream Shop,Dessert Shop,Bakery,Chocolate Shop,Cupcake Shop,Candy Store,Italian Restaurant,Pastry Shop,Café,Food Truck


Cluster 5

In [37]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Ice Cream Shop,Frozen Yogurt Shop,Dessert Shop,Bakery,Tea Room,Bar,Bubble Tea Shop,Burger Joint,Butcher,Café
2,Washington Heights,Ice Cream Shop,Dessert Shop,Butcher,Frozen Yogurt Shop,Tea Room,Bakery,Bar,Bubble Tea Shop,Burger Joint,Café
4,Hamilton Heights,Ice Cream Shop,Dessert Shop,Candy Store,Tea Room,Bakery,Bar,Bubble Tea Shop,Burger Joint,Butcher,Café
7,East Harlem,Ice Cream Shop,Dessert Shop,Food Truck,Frozen Yogurt Shop,Café,Cupcake Shop,Creperie,Coffee Shop,Chocolate Shop,Tea Room
11,Roosevelt Island,Ice Cream Shop,Dessert Shop,Tea Room,Bar,Bubble Tea Shop,Frozen Yogurt Shop,Bakery,Burger Joint,Butcher,Café
12,Upper West Side,Ice Cream Shop,Dessert Shop,Frozen Yogurt Shop,Cupcake Shop,Burger Joint,Café,Candy Store,Chocolate Shop,Tea Room,Bakery
25,Manhattan Valley,Ice Cream Shop,Dessert Shop,Frozen Yogurt Shop,Candy Store,Food Truck,French Restaurant,Juice Bar,Coffee Shop,Chocolate Shop,Tea Room
