# Battle of Neighborhoods in New York City

## Table of contents
* [Introduction](#introduction)
* [Business Problem](#business_problem)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

New York is a world-famous city that attracts tourists and workers from all over the world.

Opening a business in this city can be very profitable but also very risky, because it requires a good knowledge of the neighborhoods and of the businesses already operating there. New York City's neighborhoods differ widely, and each has its own advantages and disadvantages. For example, a less famous neighborhood may attract fewer tourists and workers, but it is also likely to have less competition.

## Business Problem <a name="business_problem"></a>

The main goal of this project is to find the best neighborhood/area in which to open an Italian restaurant or pizzeria. Specifically, this report is targeted at stakeholders interested in opening an **Italian restaurant/pizzeria** in **New York**, USA.

Choosing the right neighborhood matters because NYC already has many Italian restaurants and pizzerias, so the location can make the difference between success and failure.

We will try to detect **locations that are not already crowded with restaurants**. We are particularly interested in **areas with no Italian restaurants/pizzerias in the vicinity**. We would also prefer locations **as close to the biggest borough as possible**, assuming that the first two conditions are met.

## Data <a name="data"></a>

Based on the business problem defined above, first of all we must find the biggest borough in New York, looking at both total population and population density. After that, the factors on which we will base our choice are the following:
* number of restaurants in each neighborhood
* number of Italian restaurants or pizzerias in the neighborhood, if any
* population density in each neighborhood
* latitude and longitude of each neighborhood

The following data sources will be needed to extract/generate the required information:
* New York City boroughs are taken from the Wikipedia page [NYC Boroughs](https://en.wikipedia.org/wiki/New_York_City#Boroughs). We will use this table to determine the "target borough".
* Neighborhoods, latitudes and longitudes are taken from here: [NYC neighborhoods, latitudes and longitudes](https://geo.nyu.edu/catalog/nyu_2451_34572)
* Neighborhood population figures are taken from here: [NYC Population Density](https://data.cityofnewyork.us/City-Government/New-York-City-Population-By-Neighborhood-Tabulatio/swpk-hqdp)
* The Foursquare API is used to find restaurants in each neighborhood, especially Italian restaurants and pizzerias.


### Boroughs Candidate

As mentioned before, first of all we need to identify our "target borough". To do so, we need to scrape data (using BeautifulSoup) from the following Wikipedia page: [NYC Boroughs](https://en.wikipedia.org/wiki/New_York_City#Boroughs).

In [1]:
! pip install beautifulsoup4
! pip install requests
! pip install pandas
! pip install geopy
! pip install folium
! pip install numpy
! pip install scikit-learn
! pip install matplotlib



In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

Now we can scrape the data from Wikipedia.

In [3]:
# download the HTML of the Wikipedia page
wikipedia_html = requests.get('https://en.wikipedia.org/wiki/New_York_City#Boroughs').text
# create the BeautifulSoup instance
soup = BeautifulSoup(wikipedia_html, 'html.parser')

From the Wikipedia page we have to extract the table containing the NYC boroughs. If we inspect the page, we can see that the table class is "wikitable sortable".

In [4]:
nyc_boroughs = soup.find('table',{'class': 'wikitable sortable'})
# get headers and data and append them into lists
table_rows = nyc_boroughs.find_all('tr')
headers = table_rows[2].text.strip().split('\n')
data = []
for row in table_rows[3:-3]:
    data.append([table_data.text.strip() for table_data in row.find_all('td')])

In [5]:
# create dataframe
nyc_boroughs_df = pd.DataFrame(data, columns=headers)
nyc_boroughs_df

| | Borough | County | Estimate (2018)[151] | billions(US$)[152] | per capita(US$) | square miles | squarekm | persons / sq. mi | persons /km2 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The Bronx | Bronx | 1432132 | 42.695 | 29200 | 42.1 | 109.04 | 34653 | 13231 |
| 1 | Brooklyn | Kings | 2582830 | 91.559 | 34600 | 70.82 | 183.42 | 37137 | 14649 |
| 2 | Manhattan | New York | 1628701 | 600.244 | 360900 | 22.83 | 59.13 | 72033 | 27826 |
| 3 | Queens | Queens | 2278906 | 93.31 | 39600 | 108.53 | 281.09 | 21460 | 8354 |
| 4 | Staten Island | Richmond | 476179 | 14.514 | 30300 | 58.37 | 151.18 | 8112 | 3132 |


As we can see in the dataframe above, New York has 5 boroughs. Although Manhattan has the highest population density (72,033 persons per square mile), Brooklyn has the highest estimated population (2,582,830), which makes it the city's most populous borough. Moreover, Brooklyn is known for its cultural, social, and ethnic diversity, so we choose **Brooklyn** as our target borough.

Now that we have found our target borough, the next step is to analyze the NYC map, and more precisely the borough of Brooklyn.
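The comparison between population and density can be checked directly from the table values. The following is a minimal sketch, assuming the figures are copied by hand from the table above; the dictionary layout and variable names are illustrative and are not part of the original notebook.

```python
# Figures copied by hand from the borough table (2018 estimates).
# "per_sq_mi" is the population density in persons per square mile.
boroughs = {
    "The Bronx": {"population": 1432132, "per_sq_mi": 34653},
    "Brooklyn": {"population": 2582830, "per_sq_mi": 37137},
    "Manhattan": {"population": 1628701, "per_sq_mi": 72033},
    "Queens": {"population": 2278906, "per_sq_mi": 21460},
    "Staten Island": {"population": 476179, "per_sq_mi": 8112},
}

# Borough with the largest population and borough with the highest density
most_populous = max(boroughs, key=lambda b: boroughs[b]["population"])
densest = max(boroughs, key=lambda b: boroughs[b]["per_sq_mi"])

print(most_populous, densest)
```

The two criteria point to different boroughs, which is why the choice of Brooklyn rests on total population rather than on density.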

### Exploration of the target borough

To explore our target borough, we need geographical data about New York, with the latitude and longitude of each borough/neighborhood. So, let's download the dataset:

In [6]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


Next, let's load the data.

In [7]:
import json

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

If we take a look at the data, we notice that all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [8]:
neighborhoods_data = newyork_data['features']
neighborhoods_data[0:2] 

[{'type': 'Feature',
  'id': 'nyu_2451_34572.1',
  'geometry': {'type': 'Point',
   'coordinates': [-73.84720052054902, 40.89470517661]},
  'geometry_name': 'geom',
  'properties': {'name': 'Wakefield',
   'stacked': 1,
   'annoline1': 'Wakefield',
   'annoline2': None,
   'annoline3': None,
   'annoangle': 0.0,
   'borough': 'Bronx',
   'bbox': [-73.84720052054902,
    40.89470517661,
    -73.84720052054902,
    40.89470517661]}},
 {'type': 'Feature',
  'id': 'nyu_2451_34572.2',
  'geometry': {'type': 'Point',
   'coordinates': [-73.82993910812398, 40.87429419303012]},
  'geometry_name': 'geom',
  'properties': {'name': 'Co-op City',
   'stacked': 2,
   'annoline1': 'Co-op',
   'annoline2': 'City',
   'annoline3': None,
   'annoangle': 0.0,
   'borough': 'Bronx',
   'bbox': [-73.82993910812398,
    40.87429419303012,
    -73.82993910812398,
    40.87429419303012]}}]

Now that we have all the geographic data needed to analyze Brooklyn, we can transform the data into a *pandas* dataframe.

In [9]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Then let's loop through the data and collect one row per neighborhood.

In [10]:
# for each feature in the NYC neighborhoods JSON, extract the values we need
rows = []
for feature in neighborhoods_data:
    borough = feature['properties']['borough']
    neighborhood_name = feature['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = feature['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)
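The extraction step can be isolated in a small helper, which makes the coordinate order explicit. This is only a sketch: `feature_to_row` is a hypothetical name, and the sample feature is a trimmed copy of the first entry shown in the output above.

```python
def feature_to_row(feature):
    """Turn one GeoJSON feature into a (borough, neighborhood, lat, lon) tuple."""
    props = feature["properties"]
    # GeoJSON points list longitude first, then latitude
    lon, lat = feature["geometry"]["coordinates"]
    return (props["borough"], props["name"], lat, lon)


# Trimmed copy of the first feature in the dataset
sample = {
    "geometry": {"type": "Point",
                 "coordinates": [-73.84720052054902, 40.89470517661]},
    "properties": {"name": "Wakefield", "borough": "Bronx"},
}

row = feature_to_row(sample)
print(row)
```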

In [11]:
neighborhoods.head(10)

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
| 5 | Bronx | Kingsbridge | 40.881687 | -73.902818 |
| 6 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 7 | Bronx | Woodlawn | 40.898273 | -73.867315 |
| 8 | Bronx | Norwood | 40.877224 | -73.879391 |
| 9 | Bronx | Williamsbridge | 40.881039 | -73.857446 |


The dataframe above contains neighborhoods from all the NYC boroughs. However, we are interested only in Brooklyn, so we filter the original dataframe and create a new dataframe with Brooklyn's data.

In [12]:
# keep only Brooklyn rows and reset the index so it starts again from 0
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
# dataframe containing only Brooklyn neighborhood
brooklyn_data

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 |
| 1 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 |
| 2 | Brooklyn | Sunset Park | 40.645103 | -74.010316 |
| 3 | Brooklyn | Greenpoint | 40.730201 | -73.954241 |
| 4 | Brooklyn | Gravesend | 40.59526 | -73.973471 |
| 5 | Brooklyn | Brighton Beach | 40.576825 | -73.965094 |
| 6 | Brooklyn | Sheepshead Bay | 40.58689 | -73.943186 |
| 7 | Brooklyn | Manhattan Terrace | 40.614433 | -73.957438 |
| 8 | Brooklyn | Flatbush | 40.636326 | -73.958401 |
| 9 | Brooklyn | Crown Heights | 40.670829 | -73.943291 |


The next step is to find the population figure for each Brooklyn neighborhood. We can download a dataset containing the population by neighborhood from the following site:
[NYC Density](https://data.cityofnewyork.us/City-Government/New-York-City-Population-By-Neighborhood-Tabulatio/swpk-hqdp). Let's load the data from the downloaded file:

In [13]:
with open('brooklyn_pop_density.json') as json_data:
    brooklyn_pop_data = json.load(json_data)['data']
brooklyn_pop_data[0]

['row-ehjv-a8u9.7wyj',
 '00000000-0000-0000-7947-CE85B9FCC11E',
 0,
 1425769289,
 None,
 1425769289,
 None,
 '{ }',
 'Brooklyn',
 '2000',
 '047',
 'BK09',
 'Brooklyn Heights-Cobble Hill',
 '22548']

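The last 2 elements of each record are the neighborhood name and the population. Note that the dataset reports a population count per neighborhood tabulation area; we store it in a column named `Population Density` and use it as a proxy for density. The next step is to integrate this information into our dataframe: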

In [14]:
# create an empty column (np.NaN was removed in NumPy 2.0, so use np.nan)
brooklyn_data['Population Density'] = np.nan
for neigh_population in brooklyn_pop_data:
    for index, row in brooklyn_data.iterrows():
        # if either name contains the other, copy the population value into the dataframe
        if row['Neighborhood'] in neigh_population[12] or neigh_population[12] in row['Neighborhood']:
            brooklyn_data.loc[index, 'Population Density'] = neigh_population[13]
# drop rows that still have NaN in the population density column
brooklyn_data = brooklyn_data[pd.notnull(brooklyn_data['Population Density'])]
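The matching rule used above is a two-way substring test, which is permissive. The sketch below isolates that rule and contrasts it with a stricter variant. `names_match` and `exact_part_match` are hypothetical helper names, `"East Flatbush-Farragut"` is used only as an illustrative tabulation-area name, and the stricter variant is an assumption about how one might tighten the rule, not what the notebook does.

```python
def names_match(neighborhood, nta_name):
    """Two-way substring test, as in the loop above."""
    return neighborhood in nta_name or nta_name in neighborhood


def exact_part_match(neighborhood, nta_name):
    """Stricter, hypothetical variant: compare against each hyphen-separated part."""
    return neighborhood in nta_name.split("-")


# Intended match: the tabulation area groups two neighborhoods
intended = names_match("Cobble Hill", "Brooklyn Heights-Cobble Hill")

# Possible false positive: "Flatbush" is a substring of "East Flatbush"
loose = names_match("Flatbush", "East Flatbush-Farragut")
strict = exact_part_match("Flatbush", "East Flatbush-Farragut")

print(intended, loose, strict)
```

With the substring rule, a later record can silently overwrite an earlier assignment for the same neighborhood, so it is worth spot-checking a few rows of the merged dataframe.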

In [15]:
brooklyn_data.head()

| | Borough | Neighborhood | Latitude | Longitude | Population Density |
|---|---|---|---|---|---|
| 0 | Brooklyn | Bay Ridge | 40.625801 | -74.030621 | 79371 |
| 1 | Brooklyn | Bensonhurst | 40.611009 | -73.99518 | 62978 |
| 2 | Brooklyn | Sunset Park | 40.645103 | -74.010316 | 72340 |
| 3 | Brooklyn | Greenpoint | 40.730201 | -73.954241 | 34719 |
| 4 | Brooklyn | Gravesend | 40.59526 | -73.973471 | 29436 |


### Brooklyn map

Now we have all the geographical data needed to visualize a map of Brooklyn with each of its neighborhoods. To do so, we need to get the geographical coordinates of Brooklyn.

In [16]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="brooklyn_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


Let's visualize Brooklyn neighborhoods.

In [17]:
import folium # map rendering library

# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood, pop_density in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood'], brooklyn_data['Population Density']):
    label = folium.Popup(f"{neighborhood}, {pop_density}", parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

## Methodology <a name="methodology"></a>

In this project we will direct our efforts toward detecting areas of NYC that have a low density of Italian restaurants/pizzerias and the highest possible population density. The target borough is Brooklyn, because it is the most populous borough (Manhattan has the highest population density).

In the first step we identified the target borough and collected *geographical information about it*: neighborhood names, latitude, longitude, and population density. Finally, we visualized it on a map.

The second step in our analysis will be the exploration of '**restaurants**' across the different neighborhoods of Brooklyn. To do so, we will use the **Foursquare API**.

In the third and final step we will focus on the most promising areas and, within those, create **clusters of locations that meet some basic requirements**: the lowest density of Italian restaurants/pizzerias and the highest population density. We will create clusters of those locations (using **k-means clustering**) to identify general zones/neighborhoods that stakeholders can use as a starting point when searching for the optimal venue location.
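To make the clustering idea concrete, here is a dependency-free sketch of Lloyd's algorithm, the usual procedure behind k-means. The notebook installs scikit-learn, so the actual clustering step presumably relies on that library; the function below is only an illustration. The feature values (share of Italian venues, scaled population) are made up, and the initialization with the first `k` points is a simplification chosen to keep the example deterministic.

```python
def kmeans(points, k, iterations=10):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    # Naive deterministic initialization: use the first k points as centroids
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Made-up features per neighborhood: (share of Italian venues, scaled population)
features = [(0.0, 0.9), (0.8, 0.1), (0.05, 0.8), (0.9, 0.2)]
labels, centroids = kmeans(features, k=2)
print(labels)
```

In this toy input, the first and third points (few Italian venues, high population) end up in one cluster and the other two in the second cluster.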

## Analysis <a name="analysis"></a>

As mentioned before, we analyze venues for each neighborhood of Brooklyn using Foursquare API. We need to import the personal credentials:

In [18]:
import os
CLIENT_ID = os.environ.get("CLIENT_ID") # Foursquare ID
CLIENT_SECRET = os.environ.get("CLIENT_SECRET") # Foursquare Secret
VERSION = os.environ.get("VERSION") # Foursquare API version
LIMIT = 100
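Because `os.environ.get` returns `None` for a missing variable, a forgotten credential only surfaces later as a failed API call. A minimal sketch of an early check follows; `load_credentials` is a hypothetical helper, and the sample values (including the date-style version string) are placeholders rather than real credentials.

```python
def load_credentials(env):
    """Return the three Foursquare settings or fail early if one is missing."""
    required = ("CLIENT_ID", "CLIENT_SECRET", "VERSION")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}


# Placeholder values; in the notebook one would pass os.environ instead
creds = load_credentials({
    "CLIENT_ID": "my-client-id",
    "CLIENT_SECRET": "my-client-secret",
    "VERSION": "20180605",
})
print(sorted(creds))
```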

We can get the top venues in Brooklyn:

In [19]:
def get_venues(names, latitudes, longitudes, radius=500):
    # Return venues near each location using the Foursquare API.
    # Params:
    #    names: labels of the locations to examine (e.g. neighborhood names)
    #    latitudes: latitude of each location
    #    longitudes: longitude of each location
    #    radius: search radius around each location, in meters (default=500)
    # Returns a dataframe containing the top venues for each value passed in names.
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        # GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # for each venue, keep only the relevant fields (name, lat, lng, first category)
        venues_list.append([(
            name,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    
    # create a venue df containing the data obtained 
    venues_df = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    venues_df.columns = ['Neighborhood','Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    return venues_df
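The function above reads `categories[0]` for every venue, which raises an `IndexError` if a venue comes back without a category. The sketch below shows one defensive way to extract the same fields. `extract_venue` is a hypothetical helper, and the sample items are hand-written dictionaries that only mimic the keys accessed in `get_venues`; they are not real API output.

```python
def extract_venue(item):
    """Pull name, coordinates and first category out of one response item."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    # Fall back to None when the venue has no category
    category = categories[0]["name"] if categories else None
    return (venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category)


# Hand-written items shaped like the fields used in get_venues
sample_items = [
    {"venue": {"name": "Example Pizzeria",
               "location": {"lat": 40.6258, "lng": -74.0306},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 40.6260, "lng": -74.0300},
               "categories": []}},
]

venue_rows = [extract_venue(item) for item in sample_items]
print(venue_rows)
```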

In [20]:
brooklyn_venues = get_venues(brooklyn_data['Neighborhood'], brooklyn_data['Latitude'], brooklyn_data['Longitude'])

Now that we have retrieved the venues for each Brooklyn neighborhood, we can analyze them.

In [21]:
# one hot encoding for venue category
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="")
# add neighborhood column to dataframe brooklyn_onehot
brooklyn_onehot['Neighborhood'] = brooklyn_venues['Neighborhood']
# move neighborhood column to first position
neighborhood_col = brooklyn_onehot.pop('Neighborhood')
brooklyn_onehot.insert(0, 'Neighborhood', neighborhood_col)

brooklyn_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,History Museum,Home Service,Hookah Bar,Hostel,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Israeli Restaurant,Italian 
Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Latin American Restaurant,Laundromat,Laundry Service,Library,Lingerie Store,Liquor Store,Locksmith,Lounge,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Non-Profit,Noodle House,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoors & Recreation,Outlet Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pie Shop,Piercing Parlor,Pilates Studio,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Pool Hall,Print Shop,Pub,Racetrack,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Trail,Turkish Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Next, we can group the rows by neighborhood and take the mean of the frequency of occurrence of each category. This will be useful later, when clustering the neighborhoods.

In [22]:
brooklyn_grouped = brooklyn_onehot.groupby('Neighborhood').mean().reset_index()
brooklyn_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Court,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Burmese Restaurant,Burrito Place,Bus Line,Bus Station,Bus Stop,Butcher,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Herbs & Spices Store,History Museum,Home Service,Hookah Bar,Hostel,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Israeli Restaurant,Italian 
Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Latin American Restaurant,Laundromat,Laundry Service,Library,Lingerie Store,Liquor Store,Locksmith,Lounge,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Non-Profit,Noodle House,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Outdoors & Recreation,Outlet Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Photography Studio,Pie Shop,Piercing Parlor,Pilates Studio,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Pool Hall,Print Shop,Pub,Racetrack,Ramen Restaurant,Record Shop,Rental Car Location,Rental Service,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Supplement Shop,Surf Spot,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Toy / Game Store,Trail,Turkish Restaurant,Used Bookstore,Vape Store,Varenyky restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Bath Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.085106,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.021277,0.021277,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.021277,0.0,0.06383,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bay Ridge,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022989,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.011494,0.0,0.022989,0.0,0.0,0.011494,0.0,0.022989,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.022989,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022989,0.0,0.011494,0.0,0.022989,0.011494,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.0,0.011494,0.022989,0.0,0.0,0.011494,0.011494,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.057471,0.022989,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022989,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.068966,0.011494,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.022989,0.011494,0.0,0.0,0.0,0.011494,0.0,0.022989,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,0.0,0.011494,0.0,0.0,0.0,0.0,0.0,0.0,0.011494
2,Bedford Stuyvesant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0
3,Bensonhurst,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.060606,0.0,0.0,0.0,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.060606,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bergen Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Cluster Neighborhoods

The third and final step is to cluster the neighborhoods using the K-means clustering algorithm. Before doing this, we need to create a dataframe containing the top venues for each neighborhood.

In [23]:
def most_common_venues(row, num_top_venues):
    # Return the names of the most common venue categories for the row passed as parameter
    # Params:
    #    row: the data to be examined
    #    num_top_venues: the max number of venues
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

In [24]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues (in this case 10 columns)
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd' every ordinal used here takes the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe containing venues
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']

for ind in np.arange(brooklyn_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Chinese Restaurant,Pharmacy,Bubble Tea Shop,Gas Station,Fast Food Restaurant,Italian Restaurant,Sushi Restaurant,Pakistani Restaurant,Sandwich Place,Surf Spot
1,Bay Ridge,Italian Restaurant,Spa,Pizza Place,American Restaurant,Greek Restaurant,Pharmacy,Bar,Chinese Restaurant,Playground,Bagel Shop
2,Bedford Stuyvesant,Pizza Place,Coffee Shop,Bar,Café,Discount Store,Juice Bar,Gift Shop,Cocktail Bar,Gourmet Shop,New American Restaurant
3,Bensonhurst,Italian Restaurant,Sushi Restaurant,Ice Cream Shop,Chinese Restaurant,Grocery Store,Donut Shop,Bridal Shop,Noodle House,Factory,Liquor Store
4,Bergen Beach,Harbor / Marina,Baseball Field,Athletics & Sports,Playground,Donut Shop,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant
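
To make the ranking step concrete, here is a minimal, self-contained sketch on toy data. The frame `toy_grouped`, the helper `top_venues` and the frequencies are hypothetical illustrations, not the real Foursquare values; the logic mirrors `most_common_venues` above (skip the label column, sort the frequencies in descending order, keep the first names).

```python
import pandas as pd

# Hypothetical toy frequencies (NOT the real Foursquare data)
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bar': [0.1, 0.5],
    'Pizza Place': [0.6, 0.2],
    'Pharmacy': [0.3, 0.3],
})

def top_venues(row, num_top_venues):
    # skip the Neighborhood label, then sort the venue frequencies in descending order
    freqs = row.iloc[1:].astype(float)
    return freqs.sort_values(ascending=False).index[:num_top_venues].tolist()

# map each toy neighborhood to its two most frequent venue categories
toy_top = {row['Neighborhood']: top_venues(row, 2) for _, row in toy_grouped.iterrows()}
print(toy_top)
```

In this toy case neighborhood A is led by Pizza Place and neighborhood B by Bar, which is the same kind of information the table above reports for the Brooklyn neighborhoods.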


Now we can execute the K-means clustering algorithm:

In [25]:
from sklearn.cluster import KMeans

# set number of clusters (arbitrary)
kclusters = 5

# drop the Neighborhood column so that only the numeric venue frequencies remain
brooklyn_grouped_clustering = brooklyn_grouped.drop(columns='Neighborhood')

# run k-means clustering, compute it with "fit" method
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brooklyn_grouped_clustering)
kmeans.labels_

array([1, 1, 1, 1, 4, 1, 0, 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 0, 3, 1, 2, 3,
       3, 1, 3, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 3])
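
Since the number of clusters above was chosen arbitrarily, one possible way to sanity-check it is to compare silhouette scores for several values of k. The sketch below is only an illustration under stated assumptions: it uses synthetic stand-in data (`X`) rather than `brooklyn_grouped_clustering`, and the range of k values is a hypothetical choice; this check was not part of the original analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data (assumption): three well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# fit K-means for several candidate k and record the mean silhouette score of each
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# the k with the highest score gives the most compact, best-separated clusters
best_k = max(scores, key=scores.get)
print(best_k)
```

On the real data, `X` would be replaced by `brooklyn_grouped_clustering`; the scores are only a guide, and a value such as 5 can still be reasonable if it yields clusters that are easy to interpret.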

The next step is to create a dataframe containing Brooklyn data and the cluster's labels:

In [26]:
# add clustering labels into the df created above
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# reminder: brooklyn_data is a df containing borough, neighborhood, latitude, longitude and population density
brooklyn_merged = brooklyn_data

# join brooklyn_merged with neighborhoods_venues_sorted, using the Neighborhood column as the key
brooklyn_merged = brooklyn_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
# drop rows with NaN values (neighborhoods without venue data)
brooklyn_merged.dropna(inplace=True)
# cast labels from float to int (astype returns a new Series, so assign it back)
brooklyn_merged["Cluster Labels"] = brooklyn_merged["Cluster Labels"].astype(int)
brooklyn_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Population Density,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Brooklyn,Bay Ridge,40.625801,-74.030621,79371,1,Italian Restaurant,Spa,Pizza Place,American Restaurant,Greek Restaurant,Pharmacy,Bar,Chinese Restaurant,Playground,Bagel Shop
1,Brooklyn,Bensonhurst,40.611009,-73.99518,62978,1,Italian Restaurant,Sushi Restaurant,Ice Cream Shop,Chinese Restaurant,Grocery Store,Donut Shop,Bridal Shop,Noodle House,Factory,Liquor Store
2,Brooklyn,Sunset Park,40.645103,-74.010316,72340,1,Latin American Restaurant,Pizza Place,Mexican Restaurant,Bank,Bakery,Gym,Mobile Phone Shop,Fried Chicken Joint,Pharmacy,Sandwich Place
3,Brooklyn,Greenpoint,40.730201,-73.954241,34719,1,Bar,Pizza Place,Coffee Shop,Cocktail Bar,Yoga Studio,Thrift / Vintage Store,Record Shop,New American Restaurant,Tea Room,Deli / Bodega
4,Brooklyn,Gravesend,40.59526,-73.973471,29436,1,Chinese Restaurant,Pizza Place,Lounge,Bakery,Diner,Gym,Music Store,Spa,Music Venue,Furniture / Home Store


The dataframe above contains the borough, the neighborhood, the latitude and longitude of each zone, the population density, the cluster to which the neighborhood belongs, and the 10 most common venues. Now we have all the information needed to visualize the clusters!

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, pop_dens in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], 
                                  brooklyn_merged['Cluster Labels'], brooklyn_merged['Population Density']):
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ', cluster ' + str(cluster) + ', pop density ' +str(pop_dens), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results and Discussion <a name="results"></a>

Finally, we have all the information needed to analyze the results and discuss which neighborhood is best for opening an Italian restaurant/pizzeria.

Let's start analyzing each cluster:

In [28]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Population Density,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Crown Heights,39670,0,Pizza Place,Café,Museum,Burger Joint,Coffee Shop,Convenience Store,Bank,Bakery,Bagel Shop,Candy Store
34,Borough Park,106357,0,Bank,Pizza Place,Deli / Bodega,Pharmacy,Grocery Store,Café,American Restaurant,Hotel,Farmers Market,Fast Food Restaurant
46,Midwood,52835,0,Pizza Place,Ice Cream Shop,Pharmacy,Candy Store,Bakery,Bagel Shop,Electronics Store,Video Game Store,Convenience Store,Flower Shop


*Cluster 0*: although population density is high (especially in Borough Park), Pizza Place is the first or second most common venue in every neighborhood of this cluster, and therefore opening a pizzeria here is not recommended. However, given the low presence of ethnic restaurants and the high population density in **Borough Park**, this neighborhood could be a **good option** for an Italian restaurant.

In [29]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Population Density,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bay Ridge,79371,1,Italian Restaurant,Spa,Pizza Place,American Restaurant,Greek Restaurant,Pharmacy,Bar,Chinese Restaurant,Playground,Bagel Shop
1,Bensonhurst,62978,1,Italian Restaurant,Sushi Restaurant,Ice Cream Shop,Chinese Restaurant,Grocery Store,Donut Shop,Bridal Shop,Noodle House,Factory,Liquor Store
2,Sunset Park,72340,1,Latin American Restaurant,Pizza Place,Mexican Restaurant,Bank,Bakery,Gym,Mobile Phone Shop,Fried Chicken Joint,Pharmacy,Sandwich Place
3,Greenpoint,34719,1,Bar,Pizza Place,Coffee Shop,Cocktail Bar,Yoga Studio,Thrift / Vintage Store,Record Shop,New American Restaurant,Tea Room,Deli / Bodega
4,Gravesend,29436,1,Chinese Restaurant,Pizza Place,Lounge,Bakery,Diner,Gym,Music Store,Spa,Music Venue,Furniture / Home Store
5,Brighton Beach,35547,1,Restaurant,Eastern European Restaurant,Russian Restaurant,Beach,Mobile Phone Shop,Bank,Sushi Restaurant,Gourmet Shop,Supplement Shop,Taco Place
6,Sheepshead Bay,64518,1,Turkish Restaurant,Dessert Shop,Sandwich Place,Yoga Studio,Buffet,Diner,Miscellaneous Shop,Café,Chinese Restaurant,Russian Restaurant
8,Flatbush,50355,1,Coffee Shop,Juice Bar,Caribbean Restaurant,Bank,Mexican Restaurant,Bakery,Bagel Shop,Lounge,Sandwich Place,Chinese Restaurant
11,Kensington,36891,1,Thai Restaurant,Grocery Store,Ice Cream Shop,Pizza Place,Mexican Restaurant,Mobile Phone Shop,Furniture / Home Store,Burger Joint,Gas Station,Café
12,Windsor Terrace,20988,1,Café,Diner,Park,Plaza,Deli / Bodega,Butcher,Thrift / Vintage Store,Coffee Shop,Bookstore,Sushi Restaurant


*Cluster 1*: this is the largest cluster; it contains most of Brooklyn's neighborhoods. It is characterized by a massive presence of ethnic restaurants of all kinds (Indian, Japanese, Italian, Chinese, etc.), especially in the larger neighborhoods. Opening an Italian restaurant/pizzeria in one of these neighborhoods could be a risky choice.
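
The relative size of the clusters can be checked directly from the labels. The sketch below copies the `kmeans.labels_` array printed earlier and counts how many neighborhoods fall into each cluster; note that these are the counts before the join and `dropna` step, so the merged dataframe may contain fewer rows.

```python
import numpy as np

# Labels copied from the kmeans.labels_ output shown earlier
labels = np.array([1, 1, 1, 1, 4, 1, 0, 1, 1, 1, 1, 2, 1, 1, 1, 1, 3, 0, 3, 1, 2, 3,
                   3, 1, 3, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 0, 1, 1, 1, 1, 1, 1,
                   1, 1, 3, 3, 1, 1, 3, 1, 1, 1, 1, 3])

# np.bincount returns, for each label value 0..4, how many times it occurs
cluster_sizes = np.bincount(labels)
for cluster, size in enumerate(cluster_sizes):
    print(f'cluster {cluster}: {size} neighborhoods')
```

With these labels, cluster 1 accounts for 39 of the 56 neighborhoods, which is why it dominates the map.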

In [30]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 2, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Population Density,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,Canarsie,83693,2,Deli / Bodega,Asian Restaurant,Food,Caribbean Restaurant,Thai Restaurant,Grocery Store,Chinese Restaurant,Gym,Dry Cleaner,Discount Store
35,Dyker Heights,42419,2,Burger Joint,Dance Studio,Golf Course,Bagel Shop,Food,Yoga Studio,Fish & Chips Shop,Farmers Market,Fast Food Restaurant,Field
68,Madison,38917,2,Bagel Shop,Dessert Shop,Italian Restaurant,Deli / Bodega,Candy Store,Pizza Place,Pilates Studio,Filipino Restaurant,Farm,Farmers Market


*Cluster 2*: in this cluster, Italian restaurants and pizzerias are among the most common venues in the Madison neighborhood, so opening an Italian restaurant or pizzeria there could face strong competition from similar businesses already present. By contrast, **Canarsie** could be a valid choice: it has a high population density and no Italian restaurant or pizzeria among its ten most common venues. Still, the presence of other ethnic restaurants there, although not Italian, should not be underestimated.

In [31]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 3, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Population Density,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,East Flatbush,50355,3,Food & Drink Shop,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Park,Supermarket,Caribbean Restaurant,Hardware Store,Print Shop,Moving Target
25,Cypress Hills,49223,3,Fried Chicken Joint,Donut Shop,Latin American Restaurant,Fast Food Restaurant,Ice Cream Shop,Food,Women's Store,Coffee Shop,Baseball Field,Seafood Restaurant
26,East New York,29343,3,Deli / Bodega,Spanish Restaurant,Gym,Pizza Place,Convenience Store,Salon / Barbershop,Event Service,Caribbean Restaurant,Fast Food Restaurant,Fried Chicken Joint
27,Starrett City,13354,3,Moving Target,Pizza Place,Convenience Store,Cosmetics Shop,Donut Shop,Pharmacy,American Restaurant,Caribbean Restaurant,Food & Drink Shop,Food
29,Flatlands,64762,3,Pharmacy,Fried Chicken Joint,Fast Food Restaurant,Caribbean Restaurant,Lounge,Paper / Office Supplies Store,Park,Electronics Store,Nightclub,Video Store
32,Coney Island,31965,3,Caribbean Restaurant,Baseball Stadium,Brewery,Athletics & Sports,Pharmacy,Music Venue,Skating Rink,Food Court,Monument / Landmark,Beach
55,Wingate,67459,3,Fried Chicken Joint,Gym / Fitness Center,Pizza Place,Deli / Bodega,Donut Shop,Pharmacy,Park,Caribbean Restaurant,Fast Food Restaurant,Field
56,Rugby,55326,3,Caribbean Restaurant,Grocery Store,Bank,Pizza Place,Sandwich Place,Chinese Restaurant,Seafood Restaurant,Fast Food Restaurant,Beach,Pharmacy
57,Remsen Village,55326,3,Caribbean Restaurant,Fast Food Restaurant,Gym,Sandwich Place,Deli / Bodega,Breakfast Spot,Fried Chicken Joint,Fish Market,Pharmacy,Donut Shop
69,Erasmus,29938,3,Caribbean Restaurant,Grocery Store,Yoga Studio,Bank,Pharmacy,Donut Shop,Convenience Store,School,Music Venue,Bus Line


*Cluster 3*: this cluster also contains many neighborhoods with a high presence of ethnic restaurants (especially Caribbean restaurants). Some of them could be valid candidates for an Italian restaurant/pizzeria, because they combine a high population density with low competition from similar businesses.

In [32]:
brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 4, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Population Density,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,Bergen Beach,45231,4,Harbor / Marina,Baseball Field,Athletics & Sports,Playground,Donut Shop,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant


*Cluster 4*: this cluster contains only one neighborhood. Although there are no Italian restaurants or pizzerias among its most common venues, its population density is rather low compared to the previous candidates, so other neighborhoods could be better candidates.
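
The screening logic used in the cluster-by-cluster discussion (prefer high population density, avoid neighborhoods where Italian restaurants or pizzerias are already common) can be sketched in a few lines. This is only an illustration under stated assumptions: the `candidates` frame is a hypothetical subset of five rows copied from the tables above, it keeps only the first three venue columns for brevity, and the filter is stricter than the discussion because it excludes any neighborhood with either competitor category.

```python
import pandas as pd

# Rows copied from the cluster tables above (first three venue columns only)
candidates = pd.DataFrame({
    'Neighborhood': ['Borough Park', 'Canarsie', 'Madison', 'Bergen Beach', 'Bay Ridge'],
    'Population Density': [106357, 83693, 38917, 45231, 79371],
    'Top Venues': [
        ['Bank', 'Pizza Place', 'Deli / Bodega'],
        ['Deli / Bodega', 'Asian Restaurant', 'Food'],
        ['Bagel Shop', 'Dessert Shop', 'Italian Restaurant'],
        ['Harbor / Marina', 'Baseball Field', 'Athletics & Sports'],
        ['Italian Restaurant', 'Spa', 'Pizza Place'],
    ],
})

# flag neighborhoods whose listed venues include a direct competitor category
competitors = {'Italian Restaurant', 'Pizza Place'}
candidates['Has Competitor'] = candidates['Top Venues'].apply(
    lambda venues: bool(competitors & set(venues)))

# keep competitor-free neighborhoods, densest first
shortlist = (candidates[~candidates['Has Competitor']]
             .sort_values('Population Density', ascending=False)
             ['Neighborhood'].tolist())
print(shortlist)
```

Under this stricter filter Canarsie ranks first among the sampled rows, while Borough Park drops out because of its pizzerias, even though the discussion above still considers it a good option for an Italian restaurant.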

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify neighborhoods in New York with a low presence of restaurants, especially Italian restaurants or pizzerias, in order to aid stakeholders in narrowing down the search for an optimal location for a new Italian restaurant/pizzeria.

First of all, we analyzed which borough would be best for opening the restaurant. The analysis led us to choose Brooklyn, thanks to its higher population density compared to the other boroughs. Then, using Foursquare data on the venues present in each of Brooklyn's neighborhoods, we clustered the neighborhoods. Clustering was performed in order to create major zones of interest (containing the greatest number of potential locations) that satisfy some basic requirements regarding existing nearby Italian restaurants/pizzerias.

The final decision about the best Italian restaurant/pizzeria location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), noise levels and proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.