## **The Battle of the Neighborhoods**

### **Explore the New York City geographical coordinates dataset**

New York City has a total of 5 boroughs and 306 neighborhoods. To segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Luckily, this dataset exists for free on the web. Link to the dataset: 
- https://geo.nyu.edu/catalog/nyu_2451_34572

First, let's import all the dependencies that we will need.

In [97]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

import csv # implements classes to read and write tabular data in CSV form
from urllib.request import urlopen
import wget
print('Libraries imported.')


Libraries imported.


In [98]:
import wget

#python -m wget -o --output FILE 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json

print('Beginning file download with wget module')

url = 'https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json'
wget.download(url, r'F:\Priya\Python\newyork_data.json')


print('Data downloaded!')

Beginning file download with wget module
Data downloaded!


**Load and explore the data**

In [99]:
with open(r'F:\Priya\Python\newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

All the relevant data is in the features key, which is basically a list of the neighborhoods. So, define a new variable that includes this data.

In [100]:
neighborhoods_data = newyork_data['features']

Take a look at the first item in this list

In [101]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

**Transform the data into a Pandas Dataframe**

The next task is essentially transforming this data of nested Python dictionaries into a Pandas Dataframe. Start by creating an empty Dataframe.

In [102]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [103]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then loop through the data and fill the dataframe one row at a time.

In [104]:
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
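
The row-by-row loop above can also be written by collecting plain dictionaries first. The following is a minimal sketch, not the notebook's own code: `features_to_rows` is a hypothetical helper, and `sample_features` is a one-item stand-in for `newyork_data['features']`, trimmed from the Wakefield record shown earlier. It relies on the GeoJSON convention that point `coordinates` are ordered longitude first, then latitude.

```python
def features_to_rows(features):
    """Flatten GeoJSON point features into one dict per neighborhood."""
    rows = []
    for feature in features:
        # GeoJSON stores point coordinates as [longitude, latitude].
        lon, lat = feature['geometry']['coordinates']
        rows.append({
            'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon,
        })
    return rows

# One-item stand-in for newyork_data['features'] (values from the record above).
sample_features = [{
    'type': 'Feature',
    'geometry': {'type': 'Point',
                 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}]

rows = features_to_rows(sample_features)
print(rows[0])
```

Passing the resulting list to `pd.DataFrame(rows, columns=column_names)` builds the whole table in one step, which also works on pandas versions where `DataFrame.append` is no longer available.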


Let's make sure that the dataset has all 5 boroughs and 306 neighborhoods.

In [105]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


In [106]:
neighborhoods.to_csv('BON1_NYC_GEO.csv',index=False)

**Use geopy library to get the latitude and longitude values of New York City.**

In [107]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


Create a map of New York with neighborhoods superimposed on top.

*Folium* is a great visualization library. We can zoom into the map below and click on each circle marker to reveal the name of the neighborhood and its respective borough.

In [108]:
# create map of NewYork using latitude and longitude values
map_NewYork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_NewYork)  
    
map_NewYork

In [109]:
map_NewYork.save('map_NewYork.html')

### **Web scraping of Population and Demographics data of New York City from Wikipedia**

#### **A : POPULATION DATA**

Web scraping of Population data from the Wikipedia page 
https://en.wikipedia.org/wiki/New_York_City

Install and import all the dependencies that are needed.

In [110]:
import sys
!{sys.executable} -m pip install geocoder

print('Packages installed.')

Packages installed.


In [111]:
!pip install BeautifulSoup4



In [112]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# conda install -c anaconda beautiful-soup --yes
from bs4 import BeautifulSoup # package for parsing HTML and XML documents

import csv # implements classes to read and write tabular data in CSV form

print('Libraries imported.')

Libraries imported.


Web scraping of Population data from the Wikipedia page using BeautifulSoup.

In [113]:
URL = 'https://en.wikipedia.org/wiki/Demographics_of_New_York_City'
r = requests.get(URL) 
  
soup = BeautifulSoup(r.content, 'html5lib') 
table = soup.find('div', attrs = {'id':'container'}) 

# print(soup.prettify()) 
print('Page Scraped.')

Page Scraped.


In [114]:
website_url = requests.get('https://en.wikipedia.org/wiki/Demographics_of_New_York_City').text
soup = BeautifulSoup(website_url, 'html5lib')
table = soup.find('table',{'class':'wikitable sortable'})
#print(soup.prettify())

headers = [header.text for header in table.find_all('th')]

table_rows = table.find_all('tr')        
rows = []
for table_row in table_rows:
    # collect the text of every data cell in this table row
    cells = table_row.find_all('td')
    rows.append([cell.text for cell in cells])

with open('BON2_POPULATION1.csv', 'w') as f:
   writer = csv.writer(f)
   writer.writerow(headers)
   writer.writerows(row for row in rows if row)
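
The `\r\n` sequences that show up in the loaded table further down probably come from two things: the cell text keeps its trailing newline, and the CSV file is opened without `newline=''`. A minimal sketch of a stricter writer follows, assuming that explanation holds. `write_clean_csv` is a hypothetical helper, and the sample header and row are shortened stand-ins for the scraped values.

```python
import csv
import os
import tempfile

def write_clean_csv(path, headers, rows):
    """Write a table to CSV after trimming whitespace from every cell."""
    # newline='' lets the csv module control line endings itself,
    # as the csv documentation recommends.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow([h.strip() for h in headers])
        # Skip empty rows (for example, the header row has no <td> cells).
        writer.writerows([cell.strip() for cell in row] for row in rows if row)

# Shortened stand-ins for the scraped header and rows.
headers = ['Borough\n', 'County\n', 'Population\n']
rows = [[], ['The Bronx\n', '\n Bronx\n', '1,418,207\n']]

path = os.path.join(tempfile.mkdtemp(), 'population_demo.csv')
write_clean_csv(path, headers, rows)

with open(path, newline='', encoding='utf-8') as f:
    cleaned = list(csv.reader(f))
print(cleaned)
```

With the cells trimmed before writing, the later `str.strip('\r\n')` calls on each column would not be needed.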


**Load data from CSV**

In [115]:
Pop_data=pd.read_csv('BON2_POPULATION1.csv')
Pop_data.drop(Pop_data.columns[[3,7,8,9,10,11,12,13,14]], axis=1,inplace=True)
print('Data downloaded!')
Pop_data

Data downloaded!


Unnamed: 0,New York City's five boroughsvte,Jurisdiction,Population,Land area,Density,Borough
0,The Bronx\r\n,\r\n Bronx\r\n,"1,418,207\r\n","30,100\r\n",42.10\r\n,109.04\r\n
1,Brooklyn\r\n,\r\n Kings\r\n,"2,559,903\r\n","35,800\r\n",70.82\r\n,183.42\r\n
2,Manhattan\r\n,\r\n New York\r\n,"1,628,706\r\n","368,500\r\n",22.83\r\n,59.13\r\n
3,Queens\r\n,\r\n Queens\r\n,"2,253,858\r\n","41,400\r\n",108.53\r\n,281.09\r\n
4,Staten Island\r\n,\r\n Richmond\r\n,"476,143\r\n","30,500\r\n",58.37\r\n,151.18\r\n
5,City of New York,8336817,842.343,302.64,783.83,27547
6,State of New York,19453561,1731.910,47214,122284,412
7,Sources:[14] and see individual borough articl...,,,,,


Remove white spaces and rename columns

In [116]:
Pop_data.columns = Pop_data.columns.str.replace(' ', '')
Pop_data.columns = Pop_data.columns.str.replace('\'','')
Pop_data.rename(columns={'Borough':'persons_sq_mi','County':'persons_sq_km'}, inplace=True)
#Pop_data['NewYorkCitysfiveboroughsvte\r\n']
Pop_data.keys()


Index(['NewYorkCitysfiveboroughsvte\r\n', 'Jurisdiction\r\n', 'Population\r\n',
       'Landarea\r\n', 'Density\r\n', 'persons_sq_mi'],
      dtype='object')

In [117]:
#Pop_data.rename(columns={'NewYorkCitysfiveboroughsvte':'Borough'})
Pop_data['NewYorkCitysfiveboroughsvte\r\n']=Pop_data['NewYorkCitysfiveboroughsvte\r\n'].str.strip('\r\n')
Pop_data['Jurisdiction\r\n']=Pop_data['Jurisdiction\r\n'].str.strip('\r\n')
Pop_data['Population\r\n']=Pop_data['Population\r\n'].str.strip('\r\n')
Pop_data['Landarea\r\n']=Pop_data['Landarea\r\n'].str.strip('\r\n')
Pop_data['Density\r\n']=Pop_data['Density\r\n'].str.strip('\r\n')
Pop_data['persons_sq_mi']=Pop_data['persons_sq_mi'].str.strip('\r\n')

#Pop_data['NewYorkCitysfiveboroughsvte\r\n']P
Pop_data.keys()
Pop_data
list(Pop_data.columns)

['NewYorkCitysfiveboroughsvte\r\n',
 'Jurisdiction\r\n',
 'Population\r\n',
 'Landarea\r\n',
 'Density\r\n',
 'persons_sq_mi']
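
The per-column `str.strip` calls above can be condensed into a loop. This is a sketch on a two-row stand-in frame (`demo` is illustrative and reuses two values from the table above), assuming every column holds strings:

```python
import pandas as pd

demo = pd.DataFrame({
    'Borough\r\n': ['The Bronx\r\n', 'Brooklyn\r\n'],
    'Population\r\n': ['1,418,207\r\n', '2,559,903\r\n'],
})

# Strip whitespace (including \r and \n) from the column labels.
demo.columns = demo.columns.str.strip()

# Strip the same characters from every cell; all columns are strings here.
for col in demo.columns:
    demo[col] = demo[col].str.strip()

# Drop the thousands separators and convert the counts to integers.
demo['Population'] = demo['Population'].str.replace(',', '', regex=False).astype(int)

print(demo)
```

Cleaning the labels as well as the values means later cells can refer to `'Population'` instead of `'Population\r\n'`.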

With the newline characters stripped from both ends of each string, rename the columns.

In [118]:
Pop_data.rename(columns={'NewYorkCitysfiveboroughsvte\r\n':'Borough', 'Jurisdiction\r\n':'County',
                   'Population\r\n':'Estimate_2017', 
                   'Landarea\r\n':'square_miles',
                    'Density\r\n':'square_km'}, inplace=True)


Pop_data

Unnamed: 0,Borough,County,Estimate_2017,square_miles,square_km,persons_sq_mi
0,The Bronx,Bronx,1418207.0,30100.0,42.1,109.04
1,Brooklyn,Kings,2559903.0,35800.0,70.82,183.42
2,Manhattan,New York,1628706.0,368500.0,22.83,59.13
3,Queens,Queens,2253858.0,41400.0,108.53,281.09
4,Staten Island,Richmond,476143.0,30500.0,58.37,151.18
5,City of New York,8336817,842.343,302.64,783.83,27547.0
6,State of New York,19453561,1731.91,47214.0,122284.0,412.0
7,Sources:[14] and see individual borough articles,,,,,


In [119]:
Pop_data.iloc[5] = Pop_data.iloc[5].shift(1,axis=0)
Pop_data.iloc[6] = Pop_data.iloc[6].shift(1,axis=0)
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,square_miles,square_km,persons_sq_mi
0,The Bronx,Bronx,1418207.0,30100.0,42.1,109.04
1,Brooklyn,Kings,2559903.0,35800.0,70.82,183.42
2,Manhattan,New York,1628706.0,368500.0,22.83,59.13
3,Queens,Queens,2253858.0,41400.0,108.53,281.09
4,Staten Island,Richmond,476143.0,30500.0,58.37,151.18
5,,City of New York,8336817.0,842.343,302.64,783.83
6,,State of New York,19453561.0,1731.91,47214.0,122284.0
7,Sources:[14] and see individual borough articles,,,,,
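
The two summary rows arrive with their values one column too far to the left, which is why each is shifted by one position. A small sketch of what `Series.shift(1)` does to such a row, using a shortened stand-in (`row` holds only the first four columns of the "City of New York" row):

```python
import pandas as pd

row = pd.Series(
    ['City of New York', 8336817, 842.343, 302.64],
    index=['Borough', 'County', 'Estimate_2017', 'square_miles'],
)

# Every value moves one label to the right; the first slot becomes NaN
# and the last value falls off the end.
shifted = row.shift(1)
print(shifted)
```

Note that the value in the last column is discarded by the shift, so the rightmost figure of each summary row is lost.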


Now let's replace the NaN values with empty strings.

In [120]:
Pop_data = Pop_data.fillna('')
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,square_miles,square_km,persons_sq_mi
0,The Bronx,Bronx,1418207.0,30100.0,42.1,109.04
1,Brooklyn,Kings,2559903.0,35800.0,70.82,183.42
2,Manhattan,New York,1628706.0,368500.0,22.83,59.13
3,Queens,Queens,2253858.0,41400.0,108.53,281.09
4,Staten Island,Richmond,476143.0,30500.0,58.37,151.18
5,,City of New York,8336817.0,842.343,302.64,783.83
6,,State of New York,19453561.0,1731.91,47214.0,122284.0
7,Sources:[14] and see individual borough articles,,,,,


Finally, drop the last row, which only holds the source footnote.

In [121]:
i = Pop_data[((Pop_data.Borough == 'Sources:[14] and see individual borough articles'))].index
Pop_data = Pop_data.drop(i)  # assign the result so the saved CSV omits the footnote row
Pop_data

Unnamed: 0,Borough,County,Estimate_2017,square_miles,square_km,persons_sq_mi
0,The Bronx,Bronx,1418207,30100.0,42.1,109.04
1,Brooklyn,Kings,2559903,35800.0,70.82,183.42
2,Manhattan,New York,1628706,368500.0,22.83,59.13
3,Queens,Queens,2253858,41400.0,108.53,281.09
4,Staten Island,Richmond,476143,30500.0,58.37,151.18
5,,City of New York,8336817,842.343,302.64,783.83
6,,State of New York,19453561,1731.91,47214.0,122284.0


Save Dataframe as CSV 

In [122]:
Pop_data.to_csv('BON2_POPULATION.csv',index=False)

#### **B : DEMOGRAPHICS DATA**

We will scrape Demographics data from the Wikipedia page 
- https://en.wikipedia.org/wiki/Demographic_history_of_New_York_City

Web scraping of Demographics data from the Wikipedia page using BeautifulSoup.

In [123]:
URL = 'https://en.wikipedia.org/wiki/Demographic_history_of_New_York_City'
r = requests.get(URL) 
  
soup = BeautifulSoup(r.content, 'html5lib') 
table = soup.find('div', attrs = {'id':'container'}) 

# print(soup.prettify()) 
print('Page Scraped.')

Page Scraped.


In [124]:
website_url = requests.get('https://en.wikipedia.org/wiki/Demographic_history_of_New_York_City').text
soup = BeautifulSoup(website_url,'html5lib')
table = soup.find('table',{'class':'wikitable sortable'})
#print(soup.prettify())

headers = [header.text for header in table.find_all('th')]

table_rows = table.find_all('tr')        
rows = []
for table_row in table_rows:
    # collect the text of every data cell in this table row
    cells = table_row.find_all('td')
    rows.append([cell.text for cell in cells])

with open('NYC_DEMO.csv', 'w') as f:
   writer = csv.writer(f)
   writer.writerow(headers)
   writer.writerows(row for row in rows if row)

Load data from CSV

In [125]:
Demo_data=pd.read_csv('NYC_DEMO.csv')
print('Data downloaded!')

Data downloaded!


In [126]:
Demo_data

Unnamed: 0,Year,Population,White(includes White Hispanics),%W,Non-Hispanic Whites,%ANG,Black,%B,Asian,%A,Other orMixed,%O/M,Hispanic/Latino,%H/L,Foreignborn,%FB
0,1900,3437202,3369898,98.04,,,60666,1.76,6607,0.19,31,0.0,,,1270080,36.95
1,1910,4766883,4669162,97.95,,,91709,1.92,5669,0.12,343,0.01,,,1944357,40.79
2,1920,5620048,5459463,97.14,,,152467,2.71,7969,0.14,149,0.0,,,2028160,36.09
3,1930,6930446,6589377,95.08,,,327706,4.73,12972,0.19,391,0.01,,,2358686,34.03
4,1940,7454995,6977501,93.59,6856586.0,91.97,458444,6.15,17986,0.24,1064,0.01,120915.0,1.62,2138657,28.69
5,1950,7891957,7116441,90.17,,,747608,9.47,21441,0.27,6467,0.08,,,1784206,22.61
6,1960,7781984,6640662,85.33,,,1087931,13.98,43103,0.55,10288,0.13,,,1558690,20.03
7,1970,7894862,6048841,76.62,4969749.0,62.95,1668115,21.13,94499,1.2,83407,1.06,1278630.0,16.2,1437058,18.2
8,1980,7071639,4294075,60.72,3668945.0,51.88,1784337,25.23,231501,3.27,761762,10.77,1406024.0,19.88,1670199,23.62
9,1990,7322564,3827088,52.26,3163125.0,43.2,2102512,28.71,512719,7.0,880245,12.02,1783511.0,24.36,2082931,28.45


In [127]:
Demo_data.columns

Index(['Year', 'Population', 'White(includes White Hispanics)', '%W',
       'Non-Hispanic Whites', '%ANG', 'Black', '%B', 'Asian', '%A',
       'Other orMixed', '%O/M', 'Hispanic/Latino', '%H/L', 'Foreignborn',
       '%FB\r\n'],
      dtype='object')

Remove spaces from the column names and replace NaN values with empty strings.

In [128]:
Demo_data.columns = Demo_data.columns.str.replace(' ', '')

In [129]:
Demo_data= Demo_data.fillna('')
Demo_data

Unnamed: 0,Year,Population,White(includesWhiteHispanics),%W,Non-HispanicWhites,%ANG,Black,%B,Asian,%A,OtherorMixed,%O/M,Hispanic/Latino,%H/L,Foreignborn,%FB
0,1900,3437202,3369898,98.04,,,60666,1.76,6607,0.19,31,0.0,,,1270080,36.95
1,1910,4766883,4669162,97.95,,,91709,1.92,5669,0.12,343,0.01,,,1944357,40.79
2,1920,5620048,5459463,97.14,,,152467,2.71,7969,0.14,149,0.0,,,2028160,36.09
3,1930,6930446,6589377,95.08,,,327706,4.73,12972,0.19,391,0.01,,,2358686,34.03
4,1940,7454995,6977501,93.59,6856586.0,91.97,458444,6.15,17986,0.24,1064,0.01,120915.0,1.62,2138657,28.69
5,1950,7891957,7116441,90.17,,,747608,9.47,21441,0.27,6467,0.08,,,1784206,22.61
6,1960,7781984,6640662,85.33,,,1087931,13.98,43103,0.55,10288,0.13,,,1558690,20.03
7,1970,7894862,6048841,76.62,4969749.0,62.95,1668115,21.13,94499,1.2,83407,1.06,1278630.0,16.2,1437058,18.2
8,1980,7071639,4294075,60.72,3668945.0,51.88,1784337,25.23,231501,3.27,761762,10.77,1406024.0,19.88,1670199,23.62
9,1990,7322564,3827088,52.26,3163125.0,43.2,2102512,28.71,512719,7.0,880245,12.02,1783511.0,24.36,2082931,28.45


Save data as BON2_DEMOGRAPHICS.csv

In [130]:
Demo_data.to_csv('BON2_DEMOGRAPHICS.csv',index=False)

## **Download and Explore the New York City Restaurants & Cuisine dataset**

This data is extracted from 
- https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j/data

Let's first read our CSV file.

In [131]:
# use only the necessary columns
use_cols = ["DBA", "BORO", "STREET", "CUISINE DESCRIPTION", "Latitude","Longitude"]
ignore_cols = ["CAMIS", "BUILDING ","ZIPCODE","PHONE","INSPECTION DATE","ACTION","VIOLATION CODE","VIOLATION DESCRIPTION","CRITICAL FLAG","SCORE","GRADE","GRADE DATE","RECORD DATE",
               "INSPECTION TYPE","Community Board","Council District","Census Tract","BIN"]
#read csv                                                                                                  
NYC_Rest = pd.read_csv("DOHMH_New_York_City_Restaurant_Inspection_Results.csv", usecols=use_cols)

In [132]:
NYC_Rest.rename(columns={'DBA':'Restaurant','BORO':'Borough','STREET':'Street','CUISINE DESCRIPTION':'Cuisine'}, inplace=True)

NYC_Rest.head()

Unnamed: 0,Restaurant,Borough,Street,Cuisine,Latitude,Longitude
0,THE CLAM,Manhattan,HUDSON STREET,American,40.730429,-74.006809
1,IGNITED RESTAURANT & LOUNGE,Queens,STEINWAY ST,Middle Eastern,40.763017,-73.915688
2,BELLA PIZZA & GRILL,Queens,165TH ST,Pizza,40.707367,-73.796086
3,DAILY BAGEL,Manhattan,1 AVENUE,American,40.76076,-73.961145
4,BAYRIDGE SUSHI,Brooklyn,3 AVENUE,Japanese,40.635517,-74.026136


In [133]:
NYC_Rest.shape

(399661, 6)

In [134]:
NYC_Rest.to_csv('NYC_Rest.csv',index=False)

In [135]:
NYCR_Data = pd.read_csv("NYC_Rest.csv")
NYCR_Data.head()

Unnamed: 0,Restaurant,Borough,Street,Cuisine,Latitude,Longitude
0,THE CLAM,Manhattan,HUDSON STREET,American,40.730429,-74.006809
1,IGNITED RESTAURANT & LOUNGE,Queens,STEINWAY ST,Middle Eastern,40.763017,-73.915688
2,BELLA PIZZA & GRILL,Queens,165TH ST,Pizza,40.707367,-73.796086
3,DAILY BAGEL,Manhattan,1 AVENUE,American,40.76076,-73.961145
4,BAYRIDGE SUSHI,Brooklyn,3 AVENUE,Japanese,40.635517,-74.026136


Now let's explore the boroughs and cuisines in the data.

In [136]:
print(NYCR_Data.Borough.unique())

['Manhattan' 'Queens' 'Brooklyn' 'Staten Island' 'Bronx' '0']


In [137]:
print(NYCR_Data.Cuisine.unique())

['American' 'Middle Eastern' 'Pizza' 'Japanese' 'Jewish/Kosher'
 'Caribbean' 'Donuts' 'Sandwiches' 'Café/Coffee/Tea' 'Italian' 'Peruvian'
 'Latin (Cuban, Dominican, Puerto Rican, South & Central American)'
 'Eastern European' 'Korean' 'Chinese' 'Mexican' 'Irish' 'Spanish'
 'French' 'Portuguese' 'Vegetarian' 'Pizza/Italian' 'Indian'
 'Juice, Smoothies, Fruit Salads' 'Bangladeshi' 'Chicken' 'Greek'
 'Bottled beverages, including water, sodas, juices, etc.' 'African'
 'Mediterranean' 'Delicatessen' 'Pakistani' 'Hamburgers' 'Bakery'
 'Soul Food' 'Asian' 'Tex-Mex' 'Chinese/Japanese' 'Bagels/Pretzels'
 'Ice Cream, Gelato, Yogurt, Ices' 'Other' 'Vietnamese/Cambodian/Malaysia'
 'Thai' 'Hawaiian' 'Creole' 'Tapas' 'Australian'
 'Sandwiches/Salads/Mixed Buffet' 'Salads' 'Hotdogs/Pretzels' 'Seafood'
 'Egyptian' 'Filipino' 'Russian' 'Steak' 'Barbecue' 'Fruits/Vegetables'
 'Cajun' 'Hotdogs' 'Continental' 'Ethiopian' 'Chinese/Cuban' 'Chilean'
 'Polish' 'Soups & Sandwiches' 'Afghan' 'English' 'Pancake

Let's count the number of records per borough.

In [138]:
NYCR_Data['Borough'].value_counts().to_frame()

Unnamed: 0,Borough
Manhattan,157758
Brooklyn,101734
Queens,90423
Bronx,36166
Staten Island,13348
0,232
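
The `0` entry above looks like a placeholder for records without a borough rather than a sixth borough. The sketch below, on a small made-up frame (`demo` and its last row are illustrative), drops those records before counting. It also removes repeated name/street/borough combinations, on the assumption that the source file holds several inspection rows per restaurant, so that the counts approximate restaurants rather than inspection records.

```python
import pandas as pd

demo = pd.DataFrame({
    'Restaurant': ['THE CLAM', 'THE CLAM', 'DAILY BAGEL', 'UNKNOWN'],
    'Borough': ['Manhattan', 'Manhattan', 'Manhattan', '0'],
    'Street': ['HUDSON STREET', 'HUDSON STREET', '1 AVENUE', 'MAIN ST'],
})

# Keep only rows with a real borough name.
valid = demo[demo['Borough'] != '0']

# Treat identical name/borough/street combinations as one restaurant.
unique_restaurants = valid.drop_duplicates(subset=['Restaurant', 'Borough', 'Street'])

counts = unique_restaurants['Borough'].value_counts()
print(counts)
```

Name and street are only a rough proxy for identity; a dedicated restaurant identifier column, if one is kept when reading the file, would be a more reliable key.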


The most common cuisines in New York City:

In [139]:
NYCR_Data['Cuisine'].value_counts().to_frame()

Unnamed: 0,Cuisine
American,83161
Chinese,42239
Café/Coffee/Tea,19782
"Latin (Cuban, Dominican, Puerto Rican, South & Central American)",17585
Pizza,17364
Mexican,16656
Italian,16180
Caribbean,14501
Japanese,14408
Bakery,12448


## **Segmenting and Clustering Neighborhoods - Brooklyn and Manhattan**

Explore Dataset

In [140]:
NYC_Geo=pd.read_csv('BON1_NYC_GEO.csv')
print('Data downloaded!')

Data downloaded!


In [141]:
NYC_Geo.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [142]:
NYC_Geo['Borough'].value_counts().to_frame()

Unnamed: 0,Borough
Queens,81
Brooklyn,70
Staten Island,63
Bronx,52
Manhattan,40


In [143]:
NYC_Geo.shape

(306, 4)

The dataset has a total of 5 boroughs and 306 neighborhoods.

In [144]:
print(NYC_Geo.Borough.unique())

['Bronx' 'Manhattan' 'Brooklyn' 'Queens' 'Staten Island']


Keep only the neighborhoods in Brooklyn and Manhattan.

In [145]:
BM_Geo = NYC_Geo.loc[(NYC_Geo['Borough'] == 'Brooklyn')|(NYC_Geo['Borough'] == 'Manhattan')]
BM_Geo = BM_Geo.reset_index(drop=True)
BM_Geo.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Brooklyn,Bay Ridge,40.625801,-74.030621
2,Brooklyn,Bensonhurst,40.611009,-73.99518
3,Brooklyn,Sunset Park,40.645103,-74.010316
4,Brooklyn,Greenpoint,40.730201,-73.954241


In [146]:
BM_Geo.shape

(110, 4)

Let's use the geopy library to get the latitude and longitude values of New York City.

In [147]:
import time
start_time = time.time()

address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

print("--- %s seconds ---" % round((time.time() - start_time), 2))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
--- 0.4 seconds ---


Create a map of Brooklyn and Manhattan with neighborhoods superimposed on top.

In [148]:
# create map of Brooklyn and Manhattan using latitude and longitude values
map_BM = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(BM_Geo['Latitude'], BM_Geo['Longitude'], BM_Geo['Borough'], BM_Geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_BM)  
    
map_BM

Save the map as an HTML file.

In [149]:
map_BM.save('map_BM.html')

Let's use the Foursquare API to explore the neighborhoods in Brooklyn and Manhattan and extract venue data for each of them.

In [150]:
CLIENT_ID = 'OLN1BAQQBHO234LKFIU1ZNGV4Z3O3P1GS5KIMTNPJHLX1MKL' # your Foursquare ID
CLIENT_SECRET = 'VDM5CGGVSUOGKMY21ETO4J1UAJH5QJEALQCJAIWUF2DJXR2T' # your Foursquare Secret
VERSION = '20181218' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: OLN1BAQQBHO234LKFIU1ZNGV4Z3O3P1GS5KIMTNPJHLX1MKL
CLIENT_SECRET:VDM5CGGVSUOGKMY21ETO4J1UAJH5QJEALQCJAIWUF2DJXR2T
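
Printing the client secret stores it in the notebook output. One common alternative, sketched here as an assumption rather than the notebook's approach, is to read the credentials from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical.

```python
import os

def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables."""
    env = os.environ if env is None else env
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None

# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)

# Confirm the values were found without echoing the secret itself.
print('CLIENT_ID loaded:', bool(client_id))
print('CLIENT_SECRET loaded:', bool(client_secret))
```

Calling `load_foursquare_credentials()` without an argument reads from the real process environment, so the keys never need to appear in the notebook.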


In [151]:
def getNearbyVenues(names, latitudes, longitudes, LIMIT=200, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
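
`getNearbyVenues` indexes straight into the response, so a venue without any category, or a response without groups, would raise an exception. The sketch below shows one way to guard against that. It is not part of the notebook: `extract_venues` is a hypothetical helper, and `sample_response` is a hand-written stand-in that only mimics the nesting the function above reads (`response`, `groups`, `items`, `venue`); its second venue is made up.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Turn one explore-style response into flat venue tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    venues = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue carries no category.
        category = categories[0]['name'] if categories else None
        venues.append((neighborhood, lat, lng,
                       venue['name'],
                       venue['location']['lat'],
                       venue['location']['lng'],
                       category))
    return venues

sample_response = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Rite Aid',
               'location': {'lat': 40.875467, 'lng': -73.908906},
               'categories': [{'name': 'Pharmacy'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 40.8751, 'lng': -73.9094},
               'categories': []}},
]}]}}

venues = extract_venues(sample_response, 'Marble Hill', 40.876551, -73.91066)
print(venues)

# A response without the expected keys yields an empty list instead of an error.
print(extract_venues({}, 'Marble Hill', 40.876551, -73.91066))
```

Inside the loop of `getNearbyVenues`, the parsed JSON of each request could be passed to such a helper so that one malformed response does not abort the whole run.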

Run the above function on each neighborhood and create a new Dataframe

In [161]:
BM_venues = getNearbyVenues(names=BM_Geo['Neighborhood'],
                                  latitudes=BM_Geo['Latitude'],
                                  longitudes=BM_Geo['Longitude'],
                                  LIMIT=200, radius=200)
BM_venues.to_csv('BM_venues.csv', sep=',', encoding='UTF8')
BM_venues.head()

Marble Hill
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Rite Aid,40.875467,-73.908906,Pharmacy
1,Marble Hill,40.876551,-73.91066,Habib Deli,40.875179,-73.909486,Sandwich Place
2,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
3,Bay Ridge,40.625801,-74.030621,Leo's Casa Calamari,40.6242,-74.030931,Pizza Place
4,Bay Ridge,40.625801,-74.030621,Brooklyn Market,40.626939,-74.029948,Grocery Store


In [162]:
colnames = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
BM_venues = pd.read_csv('BM_venues.csv', skiprows=1, names=colnames)
BM_venues.columns = BM_venues.columns.str.replace(' ', '')
BM_venues.head()

Unnamed: 0,Neighborhood,NeighborhoodLatitude,NeighborhoodLongitude,Venue,VenueLatitude,VenueLongitude,VenueCategory
0,Marble Hill,40.876551,-73.91066,Rite Aid,40.875467,-73.908906,Pharmacy
1,Marble Hill,40.876551,-73.91066,Habib Deli,40.875179,-73.909486,Sandwich Place
2,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
3,Bay Ridge,40.625801,-74.030621,Leo's Casa Calamari,40.6242,-74.030931,Pizza Place
4,Bay Ridge,40.625801,-74.030621,Brooklyn Market,40.626939,-74.029948,Grocery Store


In [163]:
BM_venues.shape

(1463, 7)

Let's visualize the venues

In [164]:
def Venues_Map(Borough_name, Borough_neighborhoods):
    
    # Use geopy library to get the latitude and longitude values 
    geolocator = Nominatim(user_agent="Jupyter")
    Borough_location = geolocator.geocode(Borough_name) #'Brooklyn, NY'
    Borough_latitude = Borough_location.latitude
    Borough_longitude = Borough_location.longitude
    print('The geographical coordinates of "{}" are {}, {}.'.format(Borough_name, Borough_latitude, Borough_longitude))
    
    # To verify the number of Boroughs and Neighborhoods in the extracted data
    print('The "{}" dataframe has {} different venue types and {} neighborhoods.'.format(
          Borough_name,
          len(Borough_neighborhoods['VenueCategory'].unique()),
          len(Borough_neighborhoods['Neighborhood'].unique())))
    
    # create map of city using latitude and longitude values
    map_Borough = folium.Map(location=[Borough_latitude, Borough_longitude], zoom_start=10)

    # add markers to map
    for lat, lng, venue, category in zip(Borough_neighborhoods['VenueLatitude'], Borough_neighborhoods['VenueLongitude'], Borough_neighborhoods['Venue'], Borough_neighborhoods['VenueCategory']):
        label = '{}, {}'.format(category, venue)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=0.1,
            popup=label,
            color='red',
            fill=True,
            fill_color='#FF0000',
            fill_opacity=0.3).add_to(map_Borough)  

    return map_Borough

In [165]:
BM_venues.groupby('VenueCategory')['Venue'].count().sort_values(ascending=False)

VenueCategory
Coffee Shop                                 61
Pizza Place                                 43
Italian Restaurant                          42
Café                                        38
Bar                                         35
Chinese Restaurant                          34
Mexican Restaurant                          30
Bakery                                      27
Deli / Bodega                               27
American Restaurant                         27
Gym                                         25
Sandwich Place                              25
Cocktail Bar                                25
Park                                        23
Ice Cream Shop                              23
Grocery Store                               20
Sushi Restaurant                            20
Gym / Fitness Center                        20
Bagel Shop                                  19
Spa                                         18
Cosmetics Shop                              17

Let's see how many venues were returned for each neighborhood 

In [166]:
BM_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,NeighborhoodLatitude,NeighborhoodLongitude,Venue,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bath Beach,5,5,5,5,5,5
Battery Park City,26,26,26,26,26,26
Bay Ridge,20,20,20,20,20,20
Bedford Stuyvesant,1,1,1,1,1,1
Bensonhurst,2,2,2,2,2,2
Boerum Hill,14,14,14,14,14,14
Brighton Beach,16,16,16,16,16,16
Broadway Junction,8,8,8,8,8,8
Brooklyn Heights,41,41,41,41,41,41
Brownsville,4,4,4,4,4,4


In [167]:
print('There are {} unique categories.'.format(len(BM_venues['VenueCategory'].unique())))

There are 263 unique categories.


Let's analyze each neighborhood.

In [169]:
# one hot encoding
BM_onehot = pd.get_dummies(BM_venues[['VenueCategory']], prefix="", prefix_sep="")

#column lists before adding neighborhood
column_names = ['Neighborhood'] + list(BM_onehot.columns)

# add neighborhood column back to dataframe
BM_onehot['Neighborhood'] = BM_venues['Neighborhood'] 

# move neighborhood column to the first column
BM_onehot = BM_onehot[column_names]

BM_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,American Restaurant,Animal Shelter,Antique Shop,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Caribbean Restaurant,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,Comedy Club,Community Center,Concert Hall,Convenience Store,Cooking School,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Event Space,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundry Service,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Moving Target,Museum,Music Store,Music Venue,Nail Salon,New American Restaurant,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Opera House,Optical Shop,Other Great Outdoors,Outdoor Sculpture,Outdoor Supply Store,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Photography Studio,Pier,Piercing Parlor,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Pub,Public Art,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Residential Building (Apartment / Condo),Restaurant,Rock Climbing Spot,Rock Club,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soup Place,South American Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Trail,Turkish Restaurant,Varenyky restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
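
As a small illustration of what the one-hot step produces, here is a sketch on a made-up three-row venue table. The venue rows and the `venues` and `onehot` names are assumptions for this example only. It uses `DataFrame.insert` to place the neighborhood column first, which is an alternative to reordering the columns by hand, not the approach used in the cell above.

```python
import pandas as pd

# Toy venue table; the rows are invented for illustration.
venues = pd.DataFrame({
    "Neighborhood": ["Marble Hill", "Marble Hill", "Bay Ridge"],
    "VenueCategory": ["Gym", "Italian Restaurant", "Italian Restaurant"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["VenueCategory"], dtype=int)

# Put the neighborhood name in the first position.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

print(onehot)
```

Each row has exactly one 1, in the column of that venue's category, so summing the rows of a neighborhood later gives the number of venues per category.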


Filtering Data 

In [170]:
restaurant_List = []
search = 'Restaurant'
for i in BM_onehot.columns :
    if search in i:
        restaurant_List.append(i)

In [171]:
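# keep the neighborhood name plus every restaurant category
col_name = ['Neighborhood'] + restaurant_List
BM_restaurant = BM_onehot[col_name]
# drop duplicated column labels only; slicing with iloc[:, 1:] removed 'Neighborhood' itself,
# which made the groupby below fail with KeyError: 'Neighborhood'
BM_restaurant = BM_restaurant.loc[:, ~BM_restaurant.columns.duplicated()]

In [172]:
BM_restaurant_grouped = BM_restaurant.groupby('Neighborhood').sum().reset_index()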

In [173]:
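# sum only the restaurant columns so that re-running this cell does not add 'Total' to itself
BM_restaurant_grouped['Total'] = BM_restaurant_grouped[restaurant_List].sum(axis=1)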
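
The filter, group and total steps can be sketched on a tiny one-hot table. The data and the `restaurant_cols` and `grouped` names are made up for this example. The sketch also shows why restricting the sum to the restaurant columns matters: the total stays the same when the line is executed twice.

```python
import pandas as pd

# Toy one-hot table: one row per venue (values invented for illustration).
onehot = pd.DataFrame({
    "Neighborhood": ["Bay Ridge", "Bay Ridge", "Gravesend"],
    "Bar": [1, 0, 0],
    "Italian Restaurant": [0, 1, 0],
    "Thai Restaurant": [0, 0, 1],
})

# Keep only the categories whose name contains "Restaurant".
restaurant_cols = [c for c in onehot.columns if "Restaurant" in c]

# Count restaurants per category and neighborhood.
grouped = (
    onehot[["Neighborhood"] + restaurant_cols]
    .groupby("Neighborhood")
    .sum()
    .reset_index()
)

# Summing a fixed column list is idempotent: a second run gives the same total.
grouped["Total"] = grouped[restaurant_cols].sum(axis=1)
grouped["Total"] = grouped[restaurant_cols].sum(axis=1)

print(grouped)
```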

In [174]:
restaurant_List

['Afghan Restaurant',
 'American Restaurant',
 'Argentinian Restaurant',
 'Asian Restaurant',
 'Cajun / Creole Restaurant',
 'Caribbean Restaurant',
 'Caucasian Restaurant',
 'Chinese Restaurant',
 'Cuban Restaurant',
 'Dumpling Restaurant',
 'English Restaurant',
 'Falafel Restaurant',
 'Fast Food Restaurant',
 'Filipino Restaurant',
 'French Restaurant',
 'Greek Restaurant',
 'Halal Restaurant',
 'Hawaiian Restaurant',
 'Himalayan Restaurant',
 'Hotpot Restaurant',
 'Indian Restaurant',
 'Israeli Restaurant',
 'Italian Restaurant',
 'Japanese Curry Restaurant',
 'Japanese Restaurant',
 'Jewish Restaurant',
 'Korean Restaurant',
 'Latin American Restaurant',
 'Malay Restaurant',
 'Mediterranean Restaurant',
 'Mexican Restaurant',
 'Middle Eastern Restaurant',
 'Molecular Gastronomy Restaurant',
 'New American Restaurant',
 'North Indian Restaurant',
 'Pakistani Restaurant',
 'Peruvian Restaurant',
 'Polish Restaurant',
 'Ramen Restaurant',
 'Restaurant',
 'Scandinavian Restaurant',
 '

## Cluster Neighborhoods and Examine Clusters

Let's find the best number of clusters for k-means with scikit-learn, using the silhouette coefficient

In [175]:
from sklearn import metrics
from sklearn.cluster import KMeans

# cluster on the numeric columns only
BM_grouped_clustering = BM_restaurant_grouped.drop('Neighborhood', axis=1)

for n_cluster in range(2, 10):
    kmeans = KMeans(n_clusters=n_cluster).fit(BM_grouped_clustering)
    label = kmeans.labels_
    sil_coeff = metrics.silhouette_score(BM_grouped_clustering, label, metric='euclidean')
    print("For n_clusters={}, The Silhouette Coefficient is {}".format(n_cluster, sil_coeff))

For n_clusters=2, The Silhouette Coefficient is 0.4915926596672915
For n_clusters=3, The Silhouette Coefficient is 0.43477110871642205
For n_clusters=4, The Silhouette Coefficient is 0.35719989316945366
For n_clusters=5, The Silhouette Coefficient is 0.3213171023600002
For n_clusters=6, The Silhouette Coefficient is 0.260922834210102
For n_clusters=7, The Silhouette Coefficient is 0.2853027804385233
For n_clusters=8, The Silhouette Coefficient is 0.25797922877246426
For n_clusters=9, The Silhouette Coefficient is 0.24653786662858576
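
Picking the number of clusters from these scores can be automated. The sketch below uses synthetic data (two well separated blobs generated with NumPy, not the venue data) and fixes `random_state` so the result is repeatable. The `scores` and `best_k` names are assumptions for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: two tight blobs far apart from each other.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.5, size=(30, 2)),
    rng.normal(10, 0.5, size=(30, 2)),
])

# Silhouette coefficient for each candidate number of clusters.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels, metric="euclidean")

# The candidate with the highest coefficient wins.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The silhouette coefficient ranges from -1 to 1, and higher values mean tighter, better separated clusters, which is why the maximum is taken.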


Run k-means with the number of clusters that gives the highest silhouette coefficient, which is 2

In [176]:
# set number of clusters
kclusters = 2

BM_grouped_clustering = BM_restaurant_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(BM_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0,
       0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1,
       0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1,
       0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])

In [177]:
BM_results = pd.DataFrame(kmeans.cluster_centers_)
BM_results.columns = BM_grouped_clustering.columns
BM_results.index = ['cluster0','cluster1']
BM_results['Total Sum'] = BM_results.sum(axis = 1)
BM_results

Unnamed: 0,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,Brazilian Restaurant,Burmese Restaurant,Cajun / Creole Restaurant,Cambodian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Cuban Restaurant,Czech Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hotpot Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Paella Restaurant,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Polish Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Total,Total Sum
cluster0,-1.734723e-18,0.019231,1.25,0.019231,0.038462,0.096154,0.019231,0.038462,-1.734723e-18,-1.734723e-18,0.019231,-6.938894e-18,0.019231,1.346154,0.038462,1.173077,0.019231,0.019231,0.115385,-3.469447e-18,2.775558e-17,0.019231,0.115385,4.1633360000000003e-17,-6.938894e-18,0.038462,0.134615,0.884615,-1.387779e-17,0.423077,0.057692,0.057692,0.019231,2.775558e-17,-1.734723e-18,-6.938894e-18,0.173077,0.019231,1.269231,-3.469447e-18,0.5,-6.938894e-18,0.019231,0.038462,0.019231,0.307692,-1.387779e-17,-6.938894e-18,0.307692,0.653846,0.192308,0.019231,-1.387779e-17,0.25,-1.734723e-18,-1.734723e-18,-6.938894e-18,0.192308,-3.469447e-18,0.076923,0.538462,0.134615,-3.469447e-18,0.634615,-1.734723e-18,0.019231,0.076923,0.096154,0.173077,0.576923,-3.469447e-18,0.019231,0.019231,0.057692,0.230769,0.019231,-6.938894e-18,0.038462,-6.938894e-18,-1.734723e-18,0.192308,-3.469447e-18,0.173077,26.038462,39.057692
cluster1,0.01724138,0.12069,1.793103,0.137931,0.12069,0.551724,0.137931,0.103448,0.01724138,0.01724138,0.086207,0.05172414,0.12069,1.206897,0.051724,1.534483,0.0,0.103448,0.293103,0.03448276,0.0862069,0.12069,0.172414,0.1206897,0.05172414,0.137931,0.241379,0.344828,0.137931,0.586207,0.086207,0.586207,0.137931,0.0862069,0.01724138,0.06896552,0.87931,0.051724,2.741379,0.03448276,1.258621,0.06896552,0.034483,0.293103,0.017241,0.810345,0.1034483,0.06896552,0.62069,2.051724,0.5,0.017241,0.1034483,0.672414,0.01724138,0.01724138,0.06896552,0.172414,0.03448276,0.155172,0.810345,0.206897,0.03448276,1.0,0.01724138,0.086207,0.103448,0.482759,0.482759,1.362069,0.03448276,0.051724,0.017241,0.431034,0.982759,0.017241,0.05172414,0.310345,0.05172414,0.01724138,0.706897,0.03448276,0.482759,56.103448,84.155172


Cluster 0 has the smaller Total and Total Sum, which suggests that the restaurant market in its neighborhoods is less saturated.
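
Because k-means assigns label numbers arbitrarily, it can help to identify the less saturated cluster from the data instead of reading it off the table. The sketch below uses six totals and labels of the kind shown in the results table further down (Bath Beach through Bergen Beach). The `mean_total` and `least_saturated` names are assumptions for this example.

```python
import numpy as np
import pandas as pd

# Restaurant totals and cluster labels for six neighborhoods.
totals = pd.Series([68, 24, 72, 40, 64, 6], name="Total")
labels = np.array([1, 0, 1, 0, 1, 0])

# Average total per cluster; the smallest average marks the less saturated cluster.
mean_total = totals.groupby(labels).mean()
least_saturated = mean_total.idxmin()

print(mean_total)
print("Least saturated cluster:", least_saturated)
```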

In [178]:
BM_results_merged = pd.DataFrame(BM_restaurant_grouped['Neighborhood'])

BM_results_merged['Total'] = BM_restaurant_grouped['Total']
BM_results_merged = BM_results_merged.assign(Cluster_Labels = kmeans.labels_)


In [179]:
print(BM_results_merged.shape)
BM_results_merged

(110, 3)


Unnamed: 0,Neighborhood,Total,Cluster_Labels
0,Bath Beach,68,1
1,Battery Park City,24,0
2,Bay Ridge,72,1
3,Bedford Stuyvesant,40,0
4,Bensonhurst,64,1
5,Bergen Beach,6,0
6,Boerum Hill,36,0
7,Borough Park,12,0
8,Brighton Beach,40,0
9,Broadway Junction,16,0


Merge BM_results_merged with BM_Geo

In [180]:
BM_merged = BM_Geo

BM_merged = BM_merged.join(BM_results_merged.set_index('Neighborhood'), on='Neighborhood')

print(BM_merged.shape)
BM_merged.head(10) # check the last columns!

(110, 6)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Total,Cluster_Labels
0,Manhattan,Marble Hill,40.876551,-73.91066,28,0
1,Brooklyn,Bay Ridge,40.625801,-74.030621,72,1
2,Brooklyn,Bensonhurst,40.611009,-73.99518,64,1
3,Brooklyn,Sunset Park,40.645103,-74.010316,76,1
4,Brooklyn,Greenpoint,40.730201,-73.954241,44,1
5,Brooklyn,Gravesend,40.59526,-73.973471,22,0
6,Brooklyn,Brighton Beach,40.576825,-73.965094,40,0
7,Brooklyn,Sheepshead Bay,40.58689,-73.943186,60,1
8,Brooklyn,Manhattan Terrace,40.614433,-73.957438,36,0
9,Brooklyn,Flatbush,40.636326,-73.958401,64,1


Finally, let's visualize the resulting clusters

In [181]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(BM_merged['Latitude'], BM_merged['Longitude'], BM_merged['Neighborhood'], BM_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [182]:
map_clusters.save('map_clusters.html')
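
The color setup in the map cell boils down to one hex color per cluster. The following sketch reproduces only that part, without folium. The `color_for_label` dictionary is an assumption added for this example. Note that the map cell indexes the list with `cluster-1`, so label 0 wraps around to the last color.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 2  # same number of clusters as above

# One evenly spaced rainbow color per cluster, converted to hex strings.
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# Mirror the cluster-1 indexing: label 0 maps to rainbow[-1], the last color.
color_for_label = {label: rainbow[label - 1] for label in range(kclusters)}
print(color_for_label)
```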

### List Neighborhoods of Interest in New York City

Cluster 1 : Saturated Markets

In [183]:
BM_merged[BM_merged['Cluster_Labels'] == 1].reset_index(drop=True)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Total,Cluster_Labels
0,Brooklyn,Bay Ridge,40.625801,-74.030621,72,1
1,Brooklyn,Bensonhurst,40.611009,-73.99518,64,1
2,Brooklyn,Sunset Park,40.645103,-74.010316,76,1
3,Brooklyn,Greenpoint,40.730201,-73.954241,44,1
4,Brooklyn,Sheepshead Bay,40.58689,-73.943186,60,1
5,Brooklyn,Flatbush,40.636326,-73.958401,64,1
6,Brooklyn,Crown Heights,40.670829,-73.943291,50,1
7,Brooklyn,Kensington,40.642382,-73.980421,44,1
8,Brooklyn,Prospect Heights,40.676822,-73.964859,48,1
9,Brooklyn,Williamsburg,40.707144,-73.958115,52,1


Cluster 0 : Untapped Markets

In [184]:
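# "untapped" here means no restaurant venues at all (Total == 0), which is stricter than Cluster_Labels == 0
BM_merged[BM_merged['Total'] == 0].reset_index(drop=True)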

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Total,Cluster_Labels


No neighborhood in Brooklyn or Manhattan has a restaurant total of 0, so there are no untapped markets for the restaurant business in these two boroughs.

#### Segmenting and Clustering Neighborhoods - Bronx, Queens and Staten Island

In [185]:
BQS_Geo = NYC_Geo.loc[NYC_Geo['Borough'].isin(['Bronx', 'Queens', 'Staten Island'])]
BQS_Geo = BQS_Geo.reset_index(drop=True)
BQS_Geo.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [186]:
BQS_Geo.shape

(196, 4)

Create a map of Bronx, Queens and Staten Island 

In [187]:
# create map of Bronx, Queens and Staten Island using latitude and longitude values
map_BQS = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(BQS_Geo['Latitude'], BQS_Geo['Longitude'], BQS_Geo['Borough'], BQS_Geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_BQS)  
    
map_BQS

Explore Neighborhoods in Bronx, Queens and Staten Island

In [188]:
map_BQS.save('map_BQS.html')

In [190]:
BQS_venues = getNearbyVenues(names=BQS_Geo['Neighborhood'],
                                  latitudes=BQS_Geo['Latitude'],
                                  longitudes=BQS_Geo['Longitude'],radius=100,LIMIT=100)
BQS_venues.to_csv('BQS_venues.csv', sep=',', encoding='UTF8')
BQS_venues

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Astoria
Woodside
Jackson Heights
Elmhurst
Howard Beach
Corona
Forest Hills
Kew Gardens
Richmond Hill
Flushing
Long Island City
Sunnyside
East Elmhurst
Maspeth
Ridgewood
Glendale
Rego Park
Woodhaven
Ozone Park
South Ozone Park
College Point
Whitestone
Bayside
Auburndale
Little Neck
Douglaston
Glen Oaks
Bellerose
Kew Gardens Hills
Fresh Meadows
Briarwood
Jamaica Center
Oakland Gardens
Queens Village
Hollis
South Jamaica
St. Albans
Rochdale
Springfield Gardens
Cam

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Kingsbridge,40.881687,-73.902818,Garden Gourmet Market,40.88135,-73.903389,Gourmet Shop
1,Kingsbridge,40.881687,-73.902818,MyUnique,40.881966,-73.903584,Thrift / Vintage Store
2,Kingsbridge,40.881687,-73.902818,Mattress Firm,40.881641,-73.903061,Mattress Store
3,Kingsbridge,40.881687,-73.902818,Stop & Shop,40.882048,-73.902111,Supermarket
4,Woodlawn,40.898273,-73.867315,Katonah Pizza and Pasta,40.898784,-73.867457,Pizza Place
5,Woodlawn,40.898273,-73.867315,Rambling House,40.898439,-73.867197,Pub
6,Woodlawn,40.898273,-73.867315,Curry Spot,40.897625,-73.867147,Indian Restaurant
7,Woodlawn,40.898273,-73.867315,Sean's Quality Deli,40.897669,-73.867445,Deli / Bodega
8,Woodlawn,40.898273,-73.867315,Behan's Pub,40.898585,-73.867507,Bar
9,Woodlawn,40.898273,-73.867315,CTown Supermarkets,40.897496,-73.86736,Grocery Store


Visualize the BQS_Venues data

In [191]:
BQS_venues.groupby('Venue Category')['Venue'].count().sort_values(ascending=False)

Venue Category
Deli / Bodega                               19
Pizza Place                                 18
Chinese Restaurant                          12
Bus Stop                                    10
Grocery Store                                9
Donut Shop                                   7
Bank                                         7
Italian Restaurant                           6
Sandwich Place                               6
Korean Restaurant                            6
Bus Station                                  6
Park                                         5
Ice Cream Shop                               4
Playground                                   4
Café                                         4
Fast Food Restaurant                         4
Liquor Store                                 4
Pharmacy                                     4
Discount Store                               4
Bar                                          4
Mexican Restaurant                           

Let's check how many venues were returned for each neighborhood

In [192]:
BQS_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allerton,3,3,3,3,3,3
Arlington,2,2,2,2,2,2
Astoria Heights,1,1,1,1,1,1
Bay Terrace,3,3,3,3,3,3
Bayswater,2,2,2,2,2,2
Blissville,1,1,1,1,1,1
Briarwood,1,1,1,1,1,1
Brookville,1,1,1,1,1,1
Bulls Head,1,1,1,1,1,1
Butler Manor,1,1,1,1,1,1


In [193]:
print('There are {} unique categories.'.format(len(BQS_venues['Venue Category'].unique())))

There are 118 unique categories.


Analyze Each Neighborhood

In [194]:
# one hot encoding
BQS_onehot = pd.get_dummies(BQS_venues[['Venue Category']], prefix="", prefix_sep="")

#column lists before adding neighborhood
column_names = ['Neighborhood'] + list(BQS_onehot.columns)

# add neighborhood column back to dataframe
BQS_onehot['Neighborhood'] = BQS_venues['Neighborhood'] 

# move neighborhood column to the first column
BQS_onehot = BQS_onehot[column_names]

BQS_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beach,Beer Garden,Bistro,Bridal Shop,Burger Joint,Bus Line,Bus Station,Bus Stop,Business Service,Café,Cantonese Restaurant,Caribbean Restaurant,Check Cashing Service,Chinese Restaurant,Coffee Shop,Comedy Club,Construction & Landscaping,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,Eye Doctor,Farm,Fast Food Restaurant,Fish & Chips Shop,Food,Food & Drink Shop,Food Truck,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Gas Station,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Health Food Store,Historic Site,History Museum,Home Service,Hotel,Ice Cream Shop,Indian Restaurant,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Latin American Restaurant,Laundromat,Liquor Store,Malay Restaurant,Massage Studio,Mattress Store,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Mobile Phone Shop,Moving Target,Music Venue,Nail Salon,Neighborhood.1,New American Restaurant,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Pub,Residential Building (Apartment / Condo),Rock Climbing Spot,Salon / Barbershop,Sandwich Place,Shipping Store,Shoe Store,Smoothie Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sri Lankan Restaurant,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tapas Restaurant,Tattoo Parlor,Tennis Court,Thai Restaurant,Thrift / Vintage Store,Trail,Train Station,Video Store,Wine Bar,Wine Shop,Yoga Studio
0,Kingsbridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Kingsbridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Kingsbridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Kingsbridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
2,Kingsbridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,Kingsbridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Kingsbridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Kingsbridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Woodlawn,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Woodlawn,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
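
The `Neighborhood.1` column in the output above appears because one venue category is itself called "Neighborhood", so the label ends up in the column list twice. The sketch below reproduces the effect on two invented rows and shows one possible way to detect and drop the duplicate label. The `has_duplicates` and `deduplicated` names are assumptions for this example.

```python
import pandas as pd

# Toy data: the first venue has the category "Neighborhood" (rows invented for illustration).
venues = pd.DataFrame({
    "Neighborhood": ["Kingsbridge", "Woodlawn"],
    "Venue Category": ["Neighborhood", "Pizza Place"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Same steps as the cell above: "Neighborhood" is now listed twice.
column_names = ["Neighborhood"] + list(onehot.columns)
onehot["Neighborhood"] = venues["Neighborhood"]  # overwrites the indicator column
onehot = onehot[column_names]

# Detect the repeated label and keep only its first occurrence.
has_duplicates = onehot.columns.duplicated().any()
deduplicated = onehot.loc[:, ~onehot.columns.duplicated()]

print(list(onehot.columns))
print(list(deduplicated.columns))
```

A side effect worth noting is that the indicator for the "Neighborhood" category is lost when the name column overwrites it. That does not affect the restaurant counts, since the category is not a restaurant.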


In [195]:
restaurant_List1 = []
search = 'Restaurant'
for i in BQS_onehot.columns :
    if search in i:
        restaurant_List1.append(i)

In [196]:
restaurant_List1

['American Restaurant',
 'Asian Restaurant',
 'Cantonese Restaurant',
 'Caribbean Restaurant',
 'Chinese Restaurant',
 'Dumpling Restaurant',
 'Fast Food Restaurant',
 'French Restaurant',
 'Greek Restaurant',
 'Indian Restaurant',
 'Italian Restaurant',
 'Japanese Restaurant',
 'Korean Restaurant',
 'Latin American Restaurant',
 'Malay Restaurant',
 'Mediterranean Restaurant',
 'Mexican Restaurant',
 'New American Restaurant',
 'Spanish Restaurant',
 'Sri Lankan Restaurant',
 'Sushi Restaurant',
 'Szechuan Restaurant',
 'Tapas Restaurant',
 'Thai Restaurant']

In [197]:
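col_name = ['Neighborhood'] + restaurant_List1
BQS_restaurant = BQS_onehot[col_name]
# BQS_onehot carries the 'Neighborhood' label twice (a venue category has the same name);
# keep only the first copy instead of relying on the column position
BQS_restaurant = BQS_restaurant.loc[:, ~BQS_restaurant.columns.duplicated()]
BQS_restaurant.columns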

Index(['Neighborhood', 'American Restaurant', 'Asian Restaurant',
       'Cantonese Restaurant', 'Caribbean Restaurant', 'Chinese Restaurant',
       'Dumpling Restaurant', 'Fast Food Restaurant', 'French Restaurant',
       'Greek Restaurant', 'Indian Restaurant', 'Italian Restaurant',
       'Japanese Restaurant', 'Korean Restaurant', 'Latin American Restaurant',
       'Malay Restaurant', 'Mediterranean Restaurant', 'Mexican Restaurant',
       'New American Restaurant', 'Spanish Restaurant',
       'Sri Lankan Restaurant', 'Sushi Restaurant', 'Szechuan Restaurant',
       'Tapas Restaurant', 'Thai Restaurant'],
      dtype='object')

In [198]:
BQS_restaurant_grouped = BQS_restaurant.groupby('Neighborhood').sum().reset_index()

In [199]:
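# sum only the restaurant columns so that re-running this cell does not add 'Total' to itself
BQS_restaurant_grouped['Total'] = BQS_restaurant_grouped[restaurant_List1].sum(axis=1)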

Cluster Neighborhoods and Examine Clusters

In [200]:
BQS_grouped_clustering = BQS_restaurant_grouped.drop('Neighborhood', axis=1)

for n_cluster in range(2, 10):
    kmeans = KMeans(n_clusters=n_cluster).fit(BQS_grouped_clustering)
    label = kmeans.labels_
    sil_coeff = metrics.silhouette_score(BQS_grouped_clustering, label, metric='euclidean')
    print("For n_clusters={}, The Silhouette Coefficient is {}".format(n_cluster, sil_coeff))

For n_clusters=2, The Silhouette Coefficient is 0.7793696095541235
For n_clusters=3, The Silhouette Coefficient is 0.6460116174922196
For n_clusters=4, The Silhouette Coefficient is 0.6525422289330138
For n_clusters=5, The Silhouette Coefficient is 0.6712446961971045
For n_clusters=6, The Silhouette Coefficient is 0.6720980198163584
For n_clusters=7, The Silhouette Coefficient is 0.6853547793283975
For n_clusters=8, The Silhouette Coefficient is 0.6905652763135313
For n_clusters=9, The Silhouette Coefficient is 0.5898548489118448


Run k-means to cluster the neighborhood 

In [201]:
# set number of clusters
kclusters = 2

BQS_grouped_clustering = BQS_restaurant_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(BQS_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [202]:
BQS_results = pd.DataFrame(kmeans.cluster_centers_)
BQS_results.columns = BQS_grouped_clustering.columns
BQS_results.index = ['cluster0','cluster1']
BQS_results['Total Sum'] = BQS_results.sum(axis = 1)
BQS_results

Unnamed: 0,American Restaurant,Asian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Dumpling Restaurant,Fast Food Restaurant,French Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,New American Restaurant,Spanish Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Total,Total Sum
cluster0,0.02439,0.02439,0.01219512,0.02439024,0.109756,6.938894e-18,0.02439,0.01219512,6.938894e-18,0.02439,0.036585,0.02439,-9.714451000000001e-17,0.01219512,6.938894e-18,6.938894e-18,0.012195,0.01219512,0.01219512,0.01219512,0.012195,6.938894e-18,6.938894e-18,1.387779e-17,0.390244,0.780488
cluster1,0.166667,0.166667,-1.734723e-18,-3.469447e-18,0.5,0.1666667,0.333333,-1.734723e-18,0.1666667,0.166667,0.5,0.166667,1.0,-1.734723e-18,0.1666667,0.1666667,0.5,-1.734723e-18,-1.734723e-18,-1.734723e-18,0.166667,0.1666667,0.1666667,0.3333333,5.0,10.0


Cluster 0 has the smaller Total and Total Sum, which suggests that the restaurant market in its neighborhoods is less saturated.

In [203]:
BQS_results_merged = pd.DataFrame(BQS_restaurant_grouped['Neighborhood'])

BQS_results_merged['Total'] = BQS_restaurant_grouped['Total']
BQS_results_merged = BQS_results_merged.assign(Cluster_Labels = kmeans.labels_)

In [204]:
print(BQS_results_merged.shape)
BQS_results_merged

(88, 3)


Unnamed: 0,Neighborhood,Total,Cluster_Labels
0,Allerton,0,0
1,Arlington,0,0
2,Astoria Heights,0,0
3,Bay Terrace,0,0
4,Bayswater,0,0
5,Blissville,0,0
6,Briarwood,0,0
7,Brookville,0,0
8,Bulls Head,0,0
9,Butler Manor,0,0


Merge BQS_results_merged with BQS_Geo

In [205]:
BQS_merged = BQS_Geo

BQS_merged = BQS_merged.join(BQS_results_merged.set_index('Neighborhood'), on='Neighborhood')

print(BQS_merged.shape)
BQS_merged.head(10) # check the last columns!

(196, 6)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Total,Cluster_Labels
0,Bronx,Wakefield,40.894705,-73.847201,,
1,Bronx,Co-op City,40.874294,-73.829939,,
2,Bronx,Eastchester,40.887556,-73.827806,,
3,Bronx,Fieldston,40.895437,-73.905643,,
4,Bronx,Riverdale,40.890834,-73.912585,,
5,Bronx,Kingsbridge,40.881687,-73.902818,0.0,0.0
6,Bronx,Woodlawn,40.898273,-73.867315,1.0,0.0
7,Bronx,Norwood,40.877224,-73.879391,1.0,0.0
8,Bronx,Williamsbridge,40.881039,-73.857446,,
9,Bronx,Baychester,40.866858,-73.835798,,
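
The empty cells in this output come from the left join: neighborhoods for which no venues were returned have no row in the results table, so they get NaN, and the integer columns become floats. The sketch below reproduces this on three of the rows shown above and shows one possible way to get integer labels back. The `clustered` name is an assumption for this example.

```python
import pandas as pd

# Three neighborhoods; only two of them have clustering results.
geo = pd.DataFrame({"Neighborhood": ["Wakefield", "Kingsbridge", "Woodlawn"]})
results = pd.DataFrame({
    "Neighborhood": ["Kingsbridge", "Woodlawn"],
    "Total": [0, 1],
    "Cluster_Labels": [0, 0],
})

# Left join on the neighborhood name: Wakefield has no match and gets NaN.
merged = geo.join(results.set_index("Neighborhood"), on="Neighborhood")
print(merged.dtypes)

# Keep only the labelled rows and restore the integer dtype.
clustered = merged.dropna(subset=["Cluster_Labels"]).copy()
clustered["Cluster_Labels"] = clustered["Cluster_Labels"].astype(int)
print(clustered)
```

Dropping the unlabelled rows is only one choice. Whether a neighborhood without any returned venue should instead count as a separate group is a modelling decision that this sketch does not make.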


Finally, let's visualize the resulting clusters

In [209]:

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# neighborhoods without venues have NaN labels after the join, which turns the column into floats
# and caused "TypeError: list indices must be integers or slices, not float";
# keep only the labelled rows and cast the labels back to int before indexing the color list
BQS_clustered = BQS_merged.dropna(subset=['Cluster_Labels'])

# add markers to the map
for lat, lon, poi, cluster in zip(BQS_clustered['Latitude'], BQS_clustered['Longitude'], BQS_clustered['Neighborhood'], BQS_clustered['Cluster_Labels'].astype(int)):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [208]:
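# use a separate file name so the Brooklyn and Manhattan cluster map saved earlier is not overwritten
map_clusters.save('map_clusters_BQS.html')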

#### List Neighborhoods of Interest in New York City - Bronx, Queens and Staten Island

Cluster 1 : Saturated Markets

In [None]:
BQS_merged[BQS_merged['Cluster_Labels'] == 1].reset_index(drop=True)

Cluster 0 : Untapped Markets

In [None]:
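# "untapped" here means no restaurant venues at all (Total == 0), which is stricter than Cluster_Labels == 0
BQS_merged[BQS_merged['Total'] == 0].reset_index(drop=True)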

In [None]:
from platform import python_version

print(python_version())