# Introduction/Business Problem

A restaurant is to be opened in one of the boroughs of New York; for this problem, Brooklyn, NY is selected. The task is to decide the neighborhood in which the restaurant should be opened. The deciding factor is proximity to other popular restaurants. The **Foursquare** API is used to get location data for the top venues in each neighborhood, and the neighborhoods with the fewest restaurants are examined to decide the final location. Locations with few popular restaurants allow a new establishment to grow without much competition. The neighborhoods of the borough are also grouped into clusters using k-means clustering.


Importing libraries


In [1]:
%matplotlib inline
import matplotlib.pyplot
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>


## 1. Methodology


New York has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs, the neighborhoods that exist in each borough, and the latitude and longitude coordinates of each neighborhood.
## **Data**
The New York data can be downloaded from:
https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json


Load the data.


In [2]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
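The cell above expects `newyork_data.json` to already be in the working directory. A minimal sketch that downloads it from the URL in the Data section only when no local copy exists is shown below; the helper name `load_newyork_data` is hypothetical, and it uses the standard library's `urllib.request.urlretrieve` rather than anything from the original notebook.

```python
import json
import os
import urllib.request

# URL taken from the Data section of this notebook
DATA_URL = ("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/"
            "IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json")


def load_newyork_data(path="newyork_data.json", url=DATA_URL):
    """Load the neighborhood GeoJSON, downloading it first if `path` is missing (hypothetical helper)."""
    if not os.path.exists(path):
        # Only touches the network when there is no local copy yet
        urllib.request.urlretrieve(url, path)
    with open(path) as json_data:
        return json.load(json_data)
```

With this helper, `newyork_data = load_newyork_data()` could replace the `with open(...)` cell, and later runs would reuse the downloaded file.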

In [3]:
neighborhoods_data = newyork_data['features']

In [4]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Transforming to dataframe


In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [6]:
neighborhoods

|  | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|


In [7]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores point coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the dataframe once from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)
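As an alternative to the row-by-row loop, `pd.json_normalize` can flatten all features at once into dotted columns such as `properties.borough` and `geometry.coordinates`. The sketch below assumes every feature is a GeoJSON point with `[longitude, latitude]` coordinates, as in the sample feature printed earlier; `features_to_dataframe` is a hypothetical helper name.

```python
import pandas as pd


def features_to_dataframe(features):
    """Flatten GeoJSON point features into Borough/Neighborhood/Latitude/Longitude columns."""
    flat = pd.json_normalize(features)   # nested dicts become dotted column names
    coords = flat["geometry.coordinates"]  # each value is a [longitude, latitude] list
    return pd.DataFrame({
        "Borough": flat["properties.borough"],
        "Neighborhood": flat["properties.name"],
        "Latitude": coords.map(lambda c: c[1]),
        "Longitude": coords.map(lambda c: c[0]),
    })
```

Calling `features_to_dataframe(neighborhoods_data)` should give the same four columns as the loop.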

In [8]:
neighborhoods.head(10)

|  | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
| 5 | Bronx | Kingsbridge | 40.881687 | -73.902818 |
| 6 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 7 | Bronx | Woodlawn | 40.898273 | -73.867315 |
| 8 | Bronx | Norwood | 40.877224 | -73.879391 |
| 9 | Bronx | Williamsbridge | 40.881039 | -73.857446 |


In [9]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


#### Use the geopy library to get the latitude and longitude values of New York City.


In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.


In [10]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Create a map of New York with neighborhoods superimposed on top.


In [11]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.Circle(
        [lat, lng],
        radius=200,
        popup=label,
        color='blue',
        fill=True,
        fill_color='red',
        fill_opacity=0.2,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

We will now cluster only the neighborhoods in Brooklyn. So let's slice the original dataframe and create a new dataframe of the Brooklyn data.


In [12]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

|  | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 |
| 1 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 |
| 2 | Brooklyn | Sunset Park | 40.645103 | -74.010316 |
| 3 | Brooklyn | Greenpoint | 40.730201 | -73.954241 |
| 4 | Brooklyn | Gravesend | 40.59526 | -73.973471 |


Getting Brooklyn geographical location

In [13]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


Visualization of Brooklyn and the neighborhoods in it.


In [14]:
# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  

folium.Circle([latitude,longitude],
             radius=10000,
             color='yellow',
             fill= True,
             fill_opacity=0.15,
             parse_html=False ).add_to(map_brooklyn)
map_brooklyn
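The yellow circle drawn above has a radius of 10,000 meters around the geocoded center of Brooklyn. To check numerically which neighborhoods fall inside it (or inside a Foursquare search radius), one option is the haversine great-circle distance. This is not part of the original notebook; the helper name `haversine_m` and the 6,371 km mean Earth radius are assumptions of this sketch.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius, in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

For example, `brooklyn_data.apply(lambda r: haversine_m(latitude, longitude, r['Latitude'], r['Longitude']), axis=1) <= 10000` would mark the neighborhoods inside the yellow circle.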

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.


#### Define Foursquare Credentials and Version


In [15]:
CLIENT_ID = 'X**********************4' # your Foursquare ID
CLIENT_SECRET = 'T************************J' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
ACCESS_TOKEN='2*****************************ICEMV' # not used by the requests below

print('Credentials established:')


Credentials established:
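The credentials above are pasted directly into the notebook, so they are exposed whenever the notebook is shared. One option is to read them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices for this sketch.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from hypothetical environment variables."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Name the missing variable instead of failing later with an API error
        raise RuntimeError(f"Set the {missing.args[0]} environment variable") from None
```

With the variables exported in the shell, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the two hard-coded assignments.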


#### Let's explore the first neighborhood in our dataframe.


List the neighborhood names; the first one in the dataframe is Bay Ridge.


In [20]:
brooklyn_data.loc[0:, 'Neighborhood']

0         Bay Ridge
1       Bensonhurst
2       Sunset Park
3        Greenpoint
4         Gravesend
          ...      
65            Dumbo
66        Homecrest
67    Highland Park
68          Madison
69          Erasmus
Name: Neighborhood, Length: 70, dtype: object

Get the neighborhood's latitude and longitude values.


In [21]:
neighborhood_latitude = brooklyn_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = brooklyn_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = brooklyn_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Bay Ridge are 40.625801065010656, -74.03062069353813.


#### Get the top 100 venues within a radius of 750 meters of the first neighborhood (Bay Ridge).


In [30]:
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    750, 
    LIMIT)
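Building the query string by hand works, but `urllib.parse.urlencode` handles escaping and makes each parameter explicit. The sketch below rebuilds the same `/v2/venues/explore` request from a dictionary; `build_explore_url` is a hypothetical helper, and the parameter names are copied from the URL above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=750, limit=100):
    """Build the explore URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # "latitude,longitude" as in the original URL
        "radius": radius,      # meters
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"
```

`requests.get` also accepts such a dictionary through its `params` argument, which avoids building the string at all.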

In [31]:
results = requests.get(url).json()

In [32]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # after json_normalize the column is named 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a _pandas_ dataframe.


In [33]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



|  | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Pilo Arts Day Spa and Salon | Spa | 40.624748 | -74.030591 |
| 1 | Bagel Boy | Bagel Shop | 40.627896 | -74.029335 |
| 2 | Pegasus Cafe | Breakfast Spot | 40.623168 | -74.031186 |
| 3 | Cocoa Grinder | Juice Bar | 40.623967 | -74.030863 |
| 4 | Leo's Casa Calamari | Pizza Place | 40.6242 | -74.030931 |
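To see what the flattening step does, here is a toy example on a hand-made item shaped like the entries of `results['response']['groups'][0]['items']`, keeping only the fields this notebook reads (the venue values are made up). Nested dictionaries become dotted column names, while lists such as `categories` are left as lists, which is why `get_category_type` is still needed afterwards.

```python
import pandas as pd

# Made-up item with the same nesting as a Foursquare explore result
items = [
    {"venue": {"name": "Example Cafe",
               "categories": [{"name": "Café"}],
               "location": {"lat": 40.62, "lng": -74.03}}},
]

flat = pd.json_normalize(items)
print(sorted(flat.columns))
# ['venue.categories', 'venue.location.lat', 'venue.location.lng', 'venue.name']

# Same clean-up as the cell above: keep only the last part of each dotted name
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat.loc[0, "categories"][0]["name"])  # the list survives flattening; prints Café
```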


In [34]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


<a id='item2'></a>


## 2. Explore Neighborhoods in Brooklyn


A function that gets the top 100 venues within a given radius (1,000 meters by default) of each neighborhood in Brooklyn

In [35]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000, limit=LIMIT):

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)

        # make the GET request; stop with a clear error on an HTTP failure
        response = requests.get(url)
        response.raise_for_status()
        results = response.json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        # (a venue with an empty category list gets None instead of raising IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue'].get('categories') else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']

    return(nearby_venues)

#### Run the function on all neighborhoods and store the results in a new dataframe called _brooklyn_venues_.


In [36]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )


Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


#### Let's check the size of the resulting dataframe


In [37]:
print(brooklyn_venues.shape)
brooklyn_venues.head()
brooklyn_venues.to_csv('brooklyn_venues.csv')

(5656, 7)
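Fetching venues for all 70 neighborhoods makes 70 API calls, and the cell above already saves the result to `brooklyn_venues.csv`. A small caching wrapper could reuse such a file on later runs; `load_or_fetch` is a hypothetical helper. It writes the CSV without the index (unlike the cell above) so that a reloaded file has the same columns as the fetched dataframe; a file written with the index would come back with an extra `Unnamed: 0` column, so a fresh file name is safer.

```python
import os

import pandas as pd


def load_or_fetch(path, fetch):
    """Return the dataframe cached at `path`, or call `fetch()` and cache its result there."""
    if os.path.exists(path):
        return pd.read_csv(path)
    df = fetch()
    df.to_csv(path, index=False)  # no index column, so a reload matches the fetched frame
    return df
```

For example: `brooklyn_venues = load_or_fetch('brooklyn_venues_cache.csv', lambda: getNearbyVenues(brooklyn_data['Neighborhood'], brooklyn_data['Latitude'], brooklyn_data['Longitude']))`.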


In [38]:
brooklyn_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Bath Beach | 100 | 100 | 100 | 100 | 100 | 100 |
| Bay Ridge | 100 | 100 | 100 | 100 | 100 | 100 |
| Bedford Stuyvesant | 100 | 100 | 100 | 100 | 100 | 100 |
| Bensonhurst | 100 | 100 | 100 | 100 | 100 | 100 |
| Bergen Beach | 8 | 8 | 8 | 8 | 8 | 8 |
| ... | ... | ... | ... | ... | ... | ... |
| Vinegar Hill | 98 | 98 | 98 | 98 | 98 | 98 |
| Weeksville | 94 | 94 | 94 | 94 | 94 | 94 |
| Williamsburg | 100 | 100 | 100 | 100 | 100 | 100 |
| Windsor Terrace | 97 | 97 | 97 | 97 | 97 | 97 |


#### Let's find out how many unique categories can be curated from all the returned venues


In [39]:
print('There are {} unique categories.'.format(len(brooklyn_venues['Venue Category'].unique())))


There are 353 unique categories.


<a id='item3'></a>


## 3. Analyze Each Neighborhood


In [40]:
# one hot encoding
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column back to dataframe
brooklyn_onehot['Neighborhood'] = brooklyn_venues['Neighborhood'] 
# move neighborhood column to the first column
fixed_columns = [brooklyn_onehot.columns[-1]] + list(brooklyn_onehot.columns[:-1])
brooklyn_onehot = brooklyn_onehot[fixed_columns]
brooklyn_onehot.head()

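|  | Yoga Studio | Accessories Store | Adult Boutique | African Restaurant | American Restaurant | Amphitheater | Animal Shelter | Antique Shop | Aquarium | Arcade | ... | Video Store | Vietnamese Restaurant | Warehouse Store | Waterfront | Whisky Bar | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |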


And let's examine the new dataframe size.


In [41]:
brooklyn_onehot.shape

(5656, 353)
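The previous step reported 353 unique categories, so the dummy columns plus `Neighborhood` would be expected to give 354 columns, yet the shape is (5656, 353) and `Yoga Studio`, not `Neighborhood`, was moved to the front. That is consistent with one Foursquare category itself being named `Neighborhood`, in which case `brooklyn_onehot['Neighborhood'] = ...` overwrote that dummy column instead of adding a new one. The sketch below avoids the clash by renaming such a column before inserting the key, then averages per neighborhood as the next step does; `onehot_by_neighborhood` and the `Neighborhood (category)` column name are hypothetical.

```python
import pandas as pd


def onehot_by_neighborhood(venues):
    """One-hot encode 'Venue Category' and return the mean frequency per neighborhood."""
    dummies = pd.get_dummies(venues["Venue Category"], dtype=int)
    if "Neighborhood" in dummies.columns:
        # keep a category literally called 'Neighborhood' under a distinct name
        dummies = dummies.rename(columns={"Neighborhood": "Neighborhood (category)"})
    dummies.insert(0, "Neighborhood", venues["Neighborhood"])  # key column goes first
    return dummies.groupby("Neighborhood").mean().reset_index()
```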

#### Group rows by neighborhood and take the mean of the frequency of occurrence of each category


In [42]:
brooklyn_grouped = brooklyn_onehot.groupby('Neighborhood').mean().reset_index()


In [103]:
brooklyn_grouped.shape

(70, 353)

In [104]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 5 venues for each neighborhood.


In [105]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']

for ind in np.arange(brooklyn_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Bath Beach | Pizza Place | Bank | Japanese Restaurant | Chinese Restaurant | Bakery |
| 1 | Bay Ridge | Spa | Pizza Place | Italian Restaurant | Bar | Chinese Restaurant |
| 2 | Bedford Stuyvesant | Coffee Shop | Pizza Place | Bar | Café | Wine Shop |
| 3 | Bensonhurst | Bank | Pizza Place | Bakery | Japanese Restaurant | Chinese Restaurant |
| 4 | Bergen Beach | Stables | Playground | Italian Restaurant | Deli / Bodega | Peruvian Restaurant |


In [106]:
num_top_venues = 10

for hood in brooklyn_grouped['Neighborhood']:
    print("----"+hood+"----")
    # transpose the neighborhood's row into a (venue, freq) table
    temp = brooklyn_grouped[brooklyn_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:].copy()  # drop the 'Neighborhood' row
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
print(temp)


----Bath Beach----
                  venue  freq
0           Pizza Place  0.08
1                  Bank  0.06
2    Chinese Restaurant  0.05
3   Japanese Restaurant  0.05
4                Bakery  0.04
5  Cantonese Restaurant  0.04
6       Bubble Tea Shop  0.03
7           Coffee Shop  0.03
8    Italian Restaurant  0.03
9      Sushi Restaurant  0.03


----Bay Ridge----
                 venue  freq
0          Pizza Place  0.07
1                  Spa  0.07
2   Italian Restaurant  0.06
3   Chinese Restaurant  0.04
4                  Bar  0.04
5           Bagel Shop  0.03
6     Greek Restaurant  0.03
7       Cosmetics Shop  0.03
8  American Restaurant  0.03
9                 Café  0.02


----Bedford Stuyvesant----
                  venue  freq
0           Coffee Shop  0.11
1                  Café  0.06
2                   Bar  0.06
3           Pizza Place  0.06
4             Wine Shop  0.04
5         Deli / Bodega  0.04
6            Playground  0.03
7    Mexican Restaurant  0.03
...

In [48]:
temp['venue']
temp['freq']
temp.head(5)

|  | venue | freq |
|---|---|---|
| 1 | Yoga Studio | 0.0 |
| 2 | Accessories Store | 0.0 |
| 3 | Adult Boutique | 0.0 |
| 4 | African Restaurant | 0.0 |
| 5 | American Restaurant | 0.0 |


<a id='item4'></a>


In [107]:
data = brooklyn_venues['Venue Category'].value_counts().to_dict()
sr=pd.DataFrame.from_dict(data,orient='index')
sr

|  | 0 |
|---|---|
| Pizza Place | 291 |
| Coffee Shop | 182 |
| Bar | 159 |
| Grocery Store | 146 |
| Bakery | 138 |
| ... | ... |
| Bath House | 1 |
| Shop & Service | 1 |
| Print Shop | 1 |
| Udon Restaurant | 1 |


The table above shows that the most common venue categories among the returned venues are pizza places (291) and coffee shops (182).

## 4. Cluster Neighborhoods


Run _k_-means to cluster the neighborhoods into 5 clusters.


In [108]:
# set number of clusters
kclusters = 5

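# drop the non-numeric key column; positional axis arguments to drop() are not accepted in pandas 2.x
brooklyn_grouped_clustering = brooklyn_grouped.drop(columns='Neighborhood')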

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters,n_init=20,max_iter=500,random_state=0).fit(brooklyn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 1, 1, 4, 1, 1, 4, 4, 2, 1])
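The notebook fixes `kclusters = 5` without showing how that number was chosen. One common heuristic, not used in the original analysis, is the mean silhouette coefficient: fit k-means for several values of k and compare the scores, higher being better. The sketch below uses `sklearn.metrics.silhouette_score` with the same `KMeans` settings as the cell above; `silhouette_by_k` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette coefficient}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=20, max_iter=500,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores
```

Applied to the Brooklyn data, `silhouette_by_k(brooklyn_grouped_clustering, range(2, 11))` would return one score per k; with only 70 neighborhoods, treat the result as a guide rather than a rule.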

Let's create a new dataframe that includes the cluster as well as the top 5 venues for each neighborhood.


In [109]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

brooklyn_merged = brooklyn_data


brooklyn_merged = brooklyn_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

brooklyn_merged.head() # check the last columns!

|  | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 | 1 | Spa | Pizza Place | Italian Restaurant | Bar | Chinese Restaurant |
| 1 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 | 4 | Bank | Pizza Place | Bakery | Japanese Restaurant | Chinese Restaurant |
| 2 | Brooklyn | Sunset Park | 40.645103 | -74.010316 | 4 | Bakery | Mexican Restaurant | Pizza Place | Chinese Restaurant | Asian Restaurant |
| 3 | Brooklyn | Greenpoint | 40.730201 | -73.954241 | 1 | Coffee Shop | Bar | Cocktail Bar | Pizza Place | Yoga Studio |
| 4 | Brooklyn | Gravesend | 40.59526 | -73.973471 | 4 | Pizza Place | Donut Shop | Bakery | Gym | Grocery Store |


Finally, let's visualize the resulting clusters


In [110]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters: one distinct rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label (labels run from 0 to kclusters - 1)
for lat, lon, poi, cluster in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], brooklyn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=20,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.3).add_to(map_clusters)
       
map_clusters

<a id='item5'></a>


## 5. Examine Clusters


We can now examine each cluster and identify the clusters and neighborhoods in which restaurants are not among the most common venues. Based on that, we can choose a location for opening a restaurant.
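The stated criterion is to look for neighborhoods with the fewest restaurants, but the cluster tables only show the five most common categories. A direct count per neighborhood can be sketched from `brooklyn_venues`. The keyword rule used here (a category counts as a restaurant when its name contains "Restaurant") is an assumption: it misses categories such as Pizza Place or Diner. `restaurant_share` is a hypothetical helper.

```python
import pandas as pd


def restaurant_share(venues):
    """Per neighborhood: venues returned, restaurant venues, and their share, fewest first."""
    # assumed keyword rule: 'Caribbean Restaurant' counts, 'Pizza Place' does not
    is_restaurant = venues["Venue Category"].str.contains("Restaurant", na=False)
    grouped = is_restaurant.groupby(venues["Neighborhood"])
    summary = pd.DataFrame({
        "venues": grouped.size(),
        "restaurants": grouped.sum().astype(int),
    })
    summary["share"] = summary["restaurants"] / summary["venues"]
    return summary.sort_values(["restaurants", "share"]).reset_index()
```

`restaurant_share(brooklyn_venues).head(10)` would list the candidate neighborhoods with the fewest restaurant venues among the returned results.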


#### Cluster 1


In [111]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 8 | Flatbush | Pizza Place | Caribbean Restaurant | Mexican Restaurant | Coffee Shop | Bar |
| 28 | Canarsie | Caribbean Restaurant | Pizza Place | Bank | Pharmacy | Grocery Store |
| 42 | Prospect Lefferts Gardens | Caribbean Restaurant | Café | Pizza Place | Bakery | Wine Shop |
| 47 | Prospect Park South | Caribbean Restaurant | Pizza Place | Mexican Restaurant | Grocery Store | Coffee Shop |
| 54 | Ditmas Park | Caribbean Restaurant | Mexican Restaurant | Pizza Place | Coffee Shop | Mobile Phone Shop |
| 56 | Rugby | Caribbean Restaurant | Pizza Place | Pharmacy | Grocery Store | Fast Food Restaurant |
| 59 | Paerdegat Basin | Caribbean Restaurant | Harbor / Marina | Athletics & Sports | Park | Pizza Place |
| 69 | Erasmus | Caribbean Restaurant | Discount Store | Pizza Place | Pharmacy | Mobile Phone Shop |


#### Cluster 2


In [112]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Bay Ridge | Spa | Pizza Place | Italian Restaurant | Bar | Chinese Restaurant |
| 3 | Greenpoint | Coffee Shop | Bar | Cocktail Bar | Pizza Place | Yoga Studio |
| 9 | Crown Heights | Café | Pizza Place | Caribbean Restaurant | Grocery Store | Southern / Soul Food Restaurant |
| 12 | Windsor Terrace | Deli / Bodega | Italian Restaurant | Café | Wine Shop | Playground |
| 13 | Prospect Heights | Bar | Mexican Restaurant | Wine Shop | Cocktail Bar | Yoga Studio |
| 15 | Williamsburg | Bar | Pizza Place | Coffee Shop | American Restaurant | Wine Bar |
| 16 | Bushwick | Bar | Coffee Shop | Mexican Restaurant | Cocktail Bar | Pizza Place |
| 17 | Bedford Stuyvesant | Coffee Shop | Pizza Place | Bar | Café | Wine Shop |
| 18 | Brooklyn Heights | Park | Yoga Studio | Wine Shop | Pizza Place | Coffee Shop |
| 19 | Cobble Hill | Italian Restaurant | Cocktail Bar | Pizza Place | Bar | Yoga Studio |


#### Cluster 3


In [116]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 2, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 10 | East Flatbush | Supermarket | Discount Store | Caribbean Restaurant | Gym | Check Cashing Service |
| 14 | Brownsville | Discount Store | Pizza Place | Fried Chicken Joint | Supermarket | Mobile Phone Shop |
| 25 | Cypress Hills | Latin American Restaurant | Fast Food Restaurant | Grocery Store | Pizza Place | Donut Shop |
| 26 | East New York | Chinese Restaurant | Pizza Place | Supermarket | Metro Station | Yoga Studio |
| 29 | Flatlands | Caribbean Restaurant | Fried Chicken Joint | Pharmacy | Supermarket | Lounge |
| 43 | Ocean Hill | Discount Store | Fried Chicken Joint | Fast Food Restaurant | Café | Southern / Soul Food Restaurant |
| 44 | City Line | Pizza Place | Fast Food Restaurant | Donut Shop | Chinese Restaurant | Grocery Store |
| 55 | Wingate | Grocery Store | Pizza Place | Caribbean Restaurant | Deli / Bodega | Donut Shop |
| 57 | Remsen Village | Caribbean Restaurant | Grocery Store | Donut Shop | Fast Food Restaurant | Chinese Restaurant |
| 58 | New Lots | Deli / Bodega | Fast Food Restaurant | Pizza Place | Fried Chicken Joint | Caribbean Restaurant |


#### Cluster 4


In [117]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 3, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 39 | Sea Gate | Beach | Supermarket | Bus Station | Convenience Store | Spa |


#### Cluster 5


In [118]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 4, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]]

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 1 | Bensonhurst | Bank | Pizza Place | Bakery | Japanese Restaurant | Chinese Restaurant |
| 2 | Sunset Park | Bakery | Mexican Restaurant | Pizza Place | Chinese Restaurant | Asian Restaurant |
| 4 | Gravesend | Pizza Place | Donut Shop | Bakery | Gym | Grocery Store |
| 5 | Brighton Beach | Beach | Sushi Restaurant | Bakery | Restaurant | Pharmacy |
| 6 | Sheepshead Bay | Russian Restaurant | Sushi Restaurant | Italian Restaurant | Turkish Restaurant | Pizza Place |
| 7 | Manhattan Terrace | Pizza Place | Bank | Donut Shop | Sushi Restaurant | Grocery Store |
| 11 | Kensington | Pizza Place | Grocery Store | Gas Station | Thai Restaurant | Café |
| 27 | Starrett City | Pizza Place | Mobile Phone Shop | Department Store | Pharmacy | Park |
| 33 | Bath Beach | Pizza Place | Bank | Japanese Restaurant | Chinese Restaurant | Bakery |
| 34 | Borough Park | Pizza Place | Bank | Grocery Store | Sandwich Place | Pharmacy |


### Conclusion

By examining the clusters above, we can identify neighborhoods where a food venue could be opened. For example, opening a restaurant in **cluster 4, in the neighborhood of Sea Gate,** would be appropriate, since none of its five most common venues (Beach, Supermarket, Bus Station, Convenience Store, Spa) is a restaurant.

**The above procedure can be applied to all boroughs of New York to choose a location for any type of commercial establishment.**

This notebook is a submission for the Capstone Project on **Coursera**.