# Coursera Capstone Project - The Battle of Neighborhoods

## 1. Introduction / Business Problem

Opening a new restaurant can be a really exciting thing. But how do we decide where should we open our restaurant? There could be lots of factors that could potentially affect how the business would go, so choosing the right neighborhood is critical!

Luckily, with the data science tools we have nowadays, we can analyze the neighborhoods and decide which one would be the best for opening a new restaurant. Here are a few factors that we should take into consideration.

### Factors that should be taken into consideration

#### Cuisine Type 

Cuisine type is one of the most important things to consider when opening a restaurant. People from different neighborhoods may have very different tastes in food. But at the mean time, the customers who like a specific type of cuisine should share some common interests and characteristics. And these charachteristics can be analyzed using the data science tools we have learned in this series of courses.

#### Demographical Insights of the Neighborhoods

Like the one mentioned above, demographical data can also affect how well the restaurant runs.It is very important to figure out the population of the neighborhoods and what are the preferences that people from different neighborhood have in order to choose the perfect place for a restaurant. By looking at the customers data, we should be able to find out which neiborhood enjoys what kind of food the most and how often do they go to a specific restaurant and so on.

#### Competing Restaurants

Another factor we have to think is other restaurants. Fortunately, Foursquare can provide us with the data of each neighborhood with the information on their top restaurants and even photos and comments. So we can find out what kind of restaurant will have a potential to be a hit in each neighborhood. 

#### Lease and rent prices in each neighborhood

Money is always important when it comes to business! We have to always keep in mind the expences and profits. So analyzing or predicting the profits for potential locations are also very crutial! If we can get the average price for renting spaces for our restaurant, it would really help us to make the predictions and maximize the profits.

### Objectives

In this project, due to time and effort limits, I will be mainly focusing on analyzing the most popular restaurants in different neighborhoods in order to provide insights on the most suitable locations for restaruants that serve different types of cuisines.

## 2. Data

The majority of the data used in this project will be taken from Foursquare API, which is a crowd sourced, comprehensive geographical data source. With Foursquare API, we will be able to get insight on the most popular venues in each neighborhood, their photos, ratings and customer comments on those venues.

For this project, I will only be analyzing neighborhood in Manhattan because the large amount of restaurants and large diversity in population. I will use the information of these neighborhoods from Foursquare, especially the ratings and rankings for restaurants in different neighborhoods to get knowledge of the preferences of customers in each neighborhood. Their preferences will then help us decide the location of a new restaurant.

## 3. Methodology

First let's import the libraries that will be used in the project

In [2]:
import numpy as np 
import pandas as pd 
import json
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests 
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

### Load and explore the data

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

Because all the relevant data is in the features key, let's define a new variable that includes this data.

In [5]:
neighborhoods_data = newyork_data['features']

Now that we have threw out the extra data, let's transform the 'features' into a pandas dataframe for our convenience.

In [6]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
neighborhoods = pd.DataFrame(columns=column_names)

Go through each item to extract useful information and add it to our dataframe.

In [7]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                        'Longitude': neighborhood_lon}, ignore_index=True)

For this project, I will only be looking at the neighborhoods in Manhattan because it is the most populated borough in NYC and should give us lots of useful and exciting information.

In [8]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688
5,Manhattan,Manhattanville,40.816934,-73.957385
6,Manhattan,Central Harlem,40.815976,-73.943211
7,Manhattan,East Harlem,40.792249,-73.944182
8,Manhattan,Upper East Side,40.775639,-73.960508
9,Manhattan,Yorkville,40.77593,-73.947118


In [10]:
# get the coordinates of Manhattan and plot the neighborhoods on the map
address = 'Manhattan, NY'

geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(map_manhattan)  
    
map_manhattan



Next we can get the restaurants information of these neighborhoods from Foursquare

In [11]:
CLIENT_ID = 'J3CKEZA4ZSXW3YJX2B03M02FH1G2JRV0151WTGQFTZODDFNE'
CLIENT_SECRET = '05VAF2XXEXZ4DZAQRPUWO2LAYLMXVNQG44Y2KQUXY5C4YMQH' 
VERSION = '20181006' 

def getNearbyRestaurants(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    restaurants_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&query="restaurant"'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        
        # return only relevant information for each nearby venue
        restaurants_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_restaurants = pd.DataFrame([item for restaurants_list in restaurants_list for item in restaurants_list])
    nearby_restaurants.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Restaurant Name', 
                  'Restaurant Latitude', 
                  'Restaurant Longitude', 
                  'Restaurant Category']
    
    return(nearby_restaurants)

In [12]:
manhattan_restaurants = getNearbyRestaurants(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'])


In [13]:
print(manhattan_restaurants.shape)
manhattan_restaurants.head()

(2843, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Restaurant Name,Restaurant Latitude,Restaurant Longitude,Restaurant Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
2,Marble Hill,40.876551,-73.91066,Dunkin' Donuts,40.876993,-73.906507,Donut Shop
3,Marble Hill,40.876551,-73.91066,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant
4,Marble Hill,40.876551,-73.91066,Boston Market,40.87743,-73.905412,Comfort Food Restaurant


In [14]:
print('There are {} uniques categories.'.format(len(manhattan_restaurants['Restaurant Category'].unique())))

There are 120 uniques categories.


In [15]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_restaurants[['Restaurant Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_restaurants['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,...,Theme Restaurant,Tibetan Restaurant,Tonkatsu Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wings Joint
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [16]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,...,Theme Restaurant,Tibetan Restaurant,Tonkatsu Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wings Joint
0,Battery Park City,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.014493,0.0,0.014493,0.0,0.0,0.0,0.0,0.014493,...,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,0.028986,0.0
2,Central Harlem,0.065217,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.021739,...,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0
3,Chelsea,0.0,0.04,0.01,0.0,0.03,0.01,0.0,0.01,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.03,0.0
4,Chinatown,0.0,0.03,0.0,0.0,0.03,0.01,0.01,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.06,0.0


In [17]:
num_top_restaurants = 1

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_restaurants))
    print('\n')

----Battery Park City----
         venue  freq
0  Pizza Place  0.12


----Carnegie Hill----
         venue  freq
0  Pizza Place  0.12


----Central Harlem----
                 venue  freq
0  Fried Chicken Joint  0.11


----Chelsea----
    venue  freq
0  Bakery  0.07


----Chinatown----
                venue  freq
0  Chinese Restaurant   0.2


----Civic Center----
                venue  freq
0  Italian Restaurant  0.11


----Clinton----
                venue  freq
0  Italian Restaurant  0.09


----East Harlem----
                venue  freq
0  Mexican Restaurant  0.14


----East Village----
         venue  freq
0  Pizza Place  0.07


----Financial District----
            venue  freq
0  Sandwich Place  0.09


----Flatiron----
                venue  freq
0  Italian Restaurant  0.13


----Gramercy----
                venue  freq
0  Italian Restaurant  0.11


----Greenwich Village----
                venue  freq
0  Italian Restaurant   0.2


----Hamilton Heights----
           venue  freq


In [18]:
def return_most_common_restaurants(row, num_top_restaurants):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_restaurants]

In [19]:
num_top_restaurants = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_restaurants):
    try:
        columns.append('{}{} Most Common Restaurant'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Restaurant'.format(ind+1))

# create a new dataframe
neighborhoods_restaurants_sorted = pd.DataFrame(columns=columns)
neighborhoods_restaurants_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_restaurants_sorted.iloc[ind, 1:] = return_most_common_restaurants(manhattan_grouped.iloc[ind, :], num_top_restaurants)

neighborhoods_restaurants_sorted

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
0,Battery Park City,Pizza Place,Italian Restaurant,Burger Joint,American Restaurant,Sandwich Place,BBQ Joint,Bakery,Chinese Restaurant,Burrito Place,Steakhouse
1,Carnegie Hill,Pizza Place,Café,Sushi Restaurant,Bakery,Italian Restaurant,Japanese Restaurant,French Restaurant,Mexican Restaurant,Restaurant,Deli / Bodega
2,Central Harlem,Fried Chicken Joint,Deli / Bodega,Chinese Restaurant,African Restaurant,Pizza Place,Seafood Restaurant,Caribbean Restaurant,Southern / Soul Food Restaurant,Sandwich Place,French Restaurant
3,Chelsea,Bakery,Italian Restaurant,Pizza Place,Tapas Restaurant,Seafood Restaurant,Café,Mexican Restaurant,Japanese Restaurant,American Restaurant,Breakfast Spot
4,Chinatown,Chinese Restaurant,Bakery,Dim Sum Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Mexican Restaurant,Hotpot Restaurant,Café,American Restaurant,Vegetarian / Vegan Restaurant
5,Civic Center,Italian Restaurant,Sandwich Place,French Restaurant,American Restaurant,Mexican Restaurant,Bakery,Sushi Restaurant,Korean Restaurant,Café,Indian Restaurant
6,Clinton,Italian Restaurant,American Restaurant,Deli / Bodega,Chinese Restaurant,Sandwich Place,Pizza Place,Café,Thai Restaurant,Mexican Restaurant,Seafood Restaurant
7,East Harlem,Mexican Restaurant,Pizza Place,Bakery,Latin American Restaurant,Deli / Bodega,Fast Food Restaurant,Burger Joint,Thai Restaurant,Steakhouse,Fried Chicken Joint
8,East Village,Pizza Place,Mexican Restaurant,Japanese Restaurant,Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Italian Restaurant,Deli / Bodega,French Restaurant
9,Financial District,Sandwich Place,Italian Restaurant,Food Truck,American Restaurant,Mexican Restaurant,Steakhouse,Café,Pizza Place,Deli / Bodega,Salad Place


In [20]:
# set number of clusters
kclusters = 10

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([3, 0, 8, 7, 0, 7, 7, 9, 0, 7, 7, 7, 3, 9, 1, 9, 3, 1, 3, 0, 7, 8,
       6, 7, 4, 1, 7, 7, 5, 3, 2, 7, 7, 1, 1, 3, 3, 9, 3, 7])

In [21]:
manhattan_merged = manhattan_data

# add clustering labels
manhattan_merged['Cluster Labels'] = kmeans.labels_
manhattan_merged = manhattan_merged.join(neighborhoods_restaurants_sorted.set_index('Neighborhood'), on='Neighborhood')
manhattan_merged.head() 

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
0,Manhattan,Marble Hill,40.876551,-73.91066,3,Sandwich Place,Deli / Bodega,Steakhouse,Diner,Restaurant,Pizza Place,Seafood Restaurant,Comfort Food Restaurant,Chinese Restaurant,Donut Shop
1,Manhattan,Chinatown,40.715618,-73.994279,0,Chinese Restaurant,Bakery,Dim Sum Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Mexican Restaurant,Hotpot Restaurant,Café,American Restaurant,Vegetarian / Vegan Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,8,Deli / Bodega,Pizza Place,Mexican Restaurant,Sandwich Place,Spanish Restaurant,Chinese Restaurant,Bakery,Café,Donut Shop,Latin American Restaurant
3,Manhattan,Inwood,40.867684,-73.92121,7,Pizza Place,Mexican Restaurant,Deli / Bodega,Bakery,Café,Restaurant,Chinese Restaurant,American Restaurant,Donut Shop,Diner
4,Manhattan,Hamilton Heights,40.823604,-73.949688,0,Deli / Bodega,Pizza Place,Mexican Restaurant,Café,Donut Shop,Chinese Restaurant,Sandwich Place,Indian Restaurant,Sushi Restaurant,Bakery


## 4. Results

Let's look at the clusters of neighborhoods on the map

In [22]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now let's take a closer look at each cluster of neighborhood and their favorite restaurants

In [23]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
1,Chinatown,Chinese Restaurant,Bakery,Dim Sum Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Mexican Restaurant,Hotpot Restaurant,Café,American Restaurant,Vegetarian / Vegan Restaurant
4,Hamilton Heights,Deli / Bodega,Pizza Place,Mexican Restaurant,Café,Donut Shop,Chinese Restaurant,Sandwich Place,Indian Restaurant,Sushi Restaurant,Bakery
8,Upper East Side,Italian Restaurant,American Restaurant,Diner,Pizza Place,French Restaurant,Sushi Restaurant,Deli / Bodega,Bakery,Bagel Shop,Salad Place
19,East Village,Pizza Place,Mexican Restaurant,Japanese Restaurant,Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Italian Restaurant,Deli / Bodega,French Restaurant


In [24]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
14,Clinton,Italian Restaurant,American Restaurant,Deli / Bodega,Chinese Restaurant,Sandwich Place,Pizza Place,Café,Thai Restaurant,Mexican Restaurant,Seafood Restaurant
17,Chelsea,Bakery,Italian Restaurant,Pizza Place,Tapas Restaurant,Seafood Restaurant,Café,Mexican Restaurant,Japanese Restaurant,American Restaurant,Breakfast Spot
25,Manhattan Valley,Pizza Place,Indian Restaurant,Italian Restaurant,Szechuan Restaurant,Café,Deli / Bodega,Thai Restaurant,French Restaurant,Mexican Restaurant,American Restaurant
33,Midtown South,Korean Restaurant,American Restaurant,Italian Restaurant,Sandwich Place,Japanese Restaurant,Food Court,Restaurant,Salad Place,Bakery,Café
34,Sutton Place,Italian Restaurant,Indian Restaurant,Pizza Place,Bagel Shop,Burger Joint,Mexican Restaurant,American Restaurant,Restaurant,Japanese Restaurant,Steakhouse


In [25]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
30,Carnegie Hill,Pizza Place,Café,Sushi Restaurant,Bakery,Italian Restaurant,Japanese Restaurant,French Restaurant,Mexican Restaurant,Restaurant,Deli / Bodega


In [26]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
0,Marble Hill,Sandwich Place,Deli / Bodega,Steakhouse,Diner,Restaurant,Pizza Place,Seafood Restaurant,Comfort Food Restaurant,Chinese Restaurant,Donut Shop
12,Upper West Side,Italian Restaurant,Vegetarian / Vegan Restaurant,Bakery,Indian Restaurant,French Restaurant,Bagel Shop,Burger Joint,Restaurant,Breakfast Spot,Pizza Place
16,Murray Hill,Sandwich Place,American Restaurant,Japanese Restaurant,Pizza Place,Italian Restaurant,Burger Joint,Café,Sushi Restaurant,French Restaurant,Restaurant
18,Greenwich Village,Italian Restaurant,American Restaurant,Café,Sushi Restaurant,Seafood Restaurant,Chinese Restaurant,Indian Restaurant,French Restaurant,Pizza Place,Sandwich Place
29,Financial District,Sandwich Place,Italian Restaurant,Food Truck,American Restaurant,Mexican Restaurant,Steakhouse,Café,Pizza Place,Deli / Bodega,Salad Place
35,Turtle Bay,Italian Restaurant,Deli / Bodega,Café,Steakhouse,Japanese Restaurant,Food Truck,Sushi Restaurant,French Restaurant,Asian Restaurant,Diner
36,Tudor City,Café,Greek Restaurant,Pizza Place,Deli / Bodega,Diner,Mexican Restaurant,Food Truck,Sandwich Place,American Restaurant,Bagel Shop
38,Flatiron,Italian Restaurant,American Restaurant,New American Restaurant,Japanese Restaurant,Sandwich Place,Bakery,Food Truck,Mediterranean Restaurant,Mexican Restaurant,Korean Restaurant


In [27]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
24,West Village,Italian Restaurant,American Restaurant,Japanese Restaurant,New American Restaurant,French Restaurant,Gastropub,Seafood Restaurant,Restaurant,Pizza Place,Mexican Restaurant


In [28]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 5, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
28,Battery Park City,Pizza Place,Italian Restaurant,Burger Joint,American Restaurant,Sandwich Place,BBQ Joint,Bakery,Chinese Restaurant,Burrito Place,Steakhouse


In [29]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 6, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
22,Little Italy,Italian Restaurant,Bakery,Chinese Restaurant,Café,Seafood Restaurant,Vietnamese Restaurant,Sandwich Place,Pizza Place,French Restaurant,Asian Restaurant


In [30]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 7, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
3,Inwood,Pizza Place,Mexican Restaurant,Deli / Bodega,Bakery,Café,Restaurant,Chinese Restaurant,American Restaurant,Donut Shop,Diner
5,Manhattanville,Chinese Restaurant,Deli / Bodega,Mexican Restaurant,Seafood Restaurant,Sushi Restaurant,Italian Restaurant,Fried Chicken Joint,Sandwich Place,Noodle House,Caribbean Restaurant
6,Central Harlem,Fried Chicken Joint,Deli / Bodega,Chinese Restaurant,African Restaurant,Pizza Place,Seafood Restaurant,Caribbean Restaurant,Southern / Soul Food Restaurant,Sandwich Place,French Restaurant
9,Yorkville,Pizza Place,Italian Restaurant,Deli / Bodega,Japanese Restaurant,Sushi Restaurant,Thai Restaurant,Sandwich Place,Restaurant,Bakery,Vietnamese Restaurant
10,Lenox Hill,Italian Restaurant,Sushi Restaurant,Pizza Place,Deli / Bodega,Burger Joint,Diner,Café,Restaurant,Mexican Restaurant,Chinese Restaurant
11,Roosevelt Island,Sandwich Place,Deli / Bodega,Bakery,Kosher Restaurant,Greek Restaurant,Pizza Place,Chinese Restaurant,Café,Japanese Restaurant,American Restaurant
20,Lower East Side,Pizza Place,Chinese Restaurant,Bakery,Café,Japanese Restaurant,Latin American Restaurant,Ramen Restaurant,Sandwich Place,Deli / Bodega,Diner
23,Soho,Italian Restaurant,French Restaurant,Café,American Restaurant,Bakery,Seafood Restaurant,Sandwich Place,Mediterranean Restaurant,Mexican Restaurant,Pizza Place
26,Morningside Heights,Deli / Bodega,Café,Pizza Place,Food Truck,American Restaurant,Burger Joint,Chinese Restaurant,New American Restaurant,Restaurant,Korean Restaurant
27,Gramercy,Italian Restaurant,Bagel Shop,Deli / Bodega,Mexican Restaurant,Restaurant,Pizza Place,Diner,Thai Restaurant,Vietnamese Restaurant,American Restaurant


In [31]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 8, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
2,Washington Heights,Deli / Bodega,Pizza Place,Mexican Restaurant,Sandwich Place,Spanish Restaurant,Chinese Restaurant,Bakery,Café,Donut Shop,Latin American Restaurant
21,Tribeca,American Restaurant,Italian Restaurant,Deli / Bodega,Sandwich Place,Café,Greek Restaurant,Bakery,Steakhouse,French Restaurant,Restaurant


In [32]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 9, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant
7,East Harlem,Mexican Restaurant,Pizza Place,Bakery,Latin American Restaurant,Deli / Bodega,Fast Food Restaurant,Burger Joint,Thai Restaurant,Steakhouse,Fried Chicken Joint
13,Lincoln Square,Food Truck,Café,Italian Restaurant,American Restaurant,French Restaurant,Chinese Restaurant,Pizza Place,Bakery,Deli / Bodega,Mediterranean Restaurant
15,Midtown,American Restaurant,Sandwich Place,Bakery,Japanese Restaurant,Food Truck,Steakhouse,Cuban Restaurant,Pizza Place,French Restaurant,Sushi Restaurant
37,Stuyvesant Town,Sandwich Place,German Restaurant,Food Court,Diner,Donut Shop,Dosa Place,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant


## 5. Discussion

By clustering the neighborhoods in Manhattan, we were able to tell the taste differences between different neighborhoods and also able to find out which neighborhood are silimar to other ones. This gave us some insights in regards to choosing the optimal location for our new restaurant.

For example, the first cluster we had above, which had Chinatown, Hamilton Heights, Upper East Side, and East Villiage, showed high interests in Asian cuisines. The fourth cluster preferred Italian food over other cuisines.

## 6. Conclusion

With the help of machine learning techniques, we were able to merge different neighborhoods into clusters based on their preferences in restaurants. This also provided insights for the problem we tried to solve in the beginning - how to choose the best place to open a restaurant. By analyzing the clusters, we were able to identify neighborhoods that are better for opening a specific type of restaurant.