# Introduction 
### New York City, one of the most vibrant financial centers of the United States, is home to thousands of restaurants, each as diverse and unique as the 8.6 million people who live there.  Every business in the city strives to rise above its competitors while meeting strict city codes and laws.  Such a competitive environment demands that prospective restaurant owners weigh many details before opening their businesses, including cuisine, location, atmosphere, and customer base.

# Business Problem
### As outlined above, future New York restaurant owners face a highly competitive business environment.  Therefore, this analysis seeks to capture and present an accurate picture of some of the most successful restaurants in New York City.

### This review examines the following factors in depth to give the target audience the data needed to make a confident, informed decision:
### o Demographics of New York
### o Population distributions
### o Neighborhood statistics
### o Avoidance areas, such as those cornered by competitors
### o Independent markets, such as open markets and farmer’s co-ops
### o Local attractions including malls, tourist attractions, theaters, etc.

# Target Audience
### The target audience for this project consists primarily of three groups: entrepreneurs, professional restaurant staff, and investors.  This analysis uses modern data science principles to provide a well-researched recommendation to each of these groups.

# Data Sources
### Though this review utilizes multiple sources, all data will focus on New York City:

### - Geographical data comes from GPS-coordinates.org and NYC.gov
https://gps-coordinates.org/new-york-city-latitude.php
https://www1.nyc.gov/site/planning/zoning/districts-tools/residence-districts-r1-r10.page

### - Our base data search began primarily on Wikipedia, at the following pages
https://en.wikipedia.org/wiki/New_York_City
https://en.wikipedia.org/wiki/List_of_restaurants_in_New_York_City
https://en.wikipedia.org/wiki/New_York_City#Cuisine
https://en.wikipedia.org/wiki/Economy_of_New_York_City
https://en.wikipedia.org/wiki/New_York_City#Streets_and_highways

### - Derived data covers over 300 neighborhoods and 5 boroughs
https://geo.nyu.edu/?f%5Bdc_subject_sm%5D%5B%5D=Neighborhoods


### - Restaurant data is provided by City-data.com
https://www.city-data.com/city/New-York-New-York.html

### - Locations of sidewalk cafes and open-air eating establishments
https://data.cityofnewyork.us/City-Government/Sidewalk-Caf-Regulations-GIS-Shapefile/qsuf-mgjh

In [None]:
!conda install -c conda-forge geopy --yes

In [None]:
!conda install -c conda-forge folium=0.5.0 --yes

## Importing needed dependencies...

In [None]:
import numpy as np 

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

from geopy.geocoders import Nominatim 

import requests
from bs4 import BeautifulSoup  # HTML parsing for the web-scraping cells below
from pandas.io.json import json_normalize 

import matplotlib.cm as cm
import matplotlib.colors as colors

import folium 

import csv

print('Libraries imported.')

# Methodology
### This analysis uses maps and tables to present detailed data visually.  I loaded the data into a Pandas dataframe and looped through it to render readable results, then verified that the predicted results matched the expected data; the training data and test data matched appropriately.  Next, I created a map overlay showing the New York neighborhoods.  I used web scraping, via Python and Beautiful Soup, to gather population and demographic data, which required several rounds of scraping.  To clean the data, I removed stray white space and renamed some columns to make them usable.  When this portion of the project was complete, I saved the dataframe to a .CSV file.  I then downloaded and analyzed data related to New York cuisine.  Next, I retrieved venue data for each neighborhood with the Foursquare API and used k-means to segment and cluster the neighborhoods.  Lastly, I configured parameters to visualize the data in this report.
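
The cleaning step described above (stripping stray white space and renaming columns) can be sketched on a small, made-up table. The column names and values here are illustrative assumptions, not the scraped Wikipedia data.

```python
import pandas as pd

# Hypothetical raw table that mimics scraped headers and cells with stray whitespace
raw = pd.DataFrame({
    'Borough \n': ['Brooklyn\n', ' Queens\n'],
    'Land area\n': ['10.5\n', '20.0\n'],
})

# Strip whitespace and newlines from the header names
clean = raw.rename(columns=lambda name: name.strip())

# Strip whitespace and newlines from every string cell
clean = clean.apply(lambda col: col.str.strip())

# Give the columns usable, space-free names
clean = clean.rename(columns={'Land area': 'square_miles'})

# Convert the numeric text to floats once it is clean
clean['square_miles'] = clean['square_miles'].astype(float)

print(clean)
```

Cleaning the headers first keeps the later `rename` mapping free of embedded `\n` characters, which makes the keys easier to read and less fragile.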

In [None]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

In [None]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [None]:
neighborhoods_data = newyork_data['features']

In [None]:
neighborhoods_data[0]
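
Each entry in `neighborhoods_data` is a GeoJSON-style feature. Judging from the keys read by the loop further down, a record has roughly the shape sketched here. The sample values and the helper name `feature_to_row` are made up for illustration; note that the coordinates are stored as [longitude, latitude].

```python
# Hypothetical feature record: the keys match those read by the notebook,
# the values are placeholders rather than real dataset entries.
sample_feature = {
    'properties': {'borough': 'Example Borough', 'name': 'Example Neighborhood'},
    'geometry': {'coordinates': [-73.9, 40.7]},  # [longitude, latitude]
}

def feature_to_row(feature):
    """Flatten one feature into the four columns used by the dataframe."""
    lon, lat = feature['geometry']['coordinates'][:2]
    return {
        'Borough': feature['properties']['borough'],
        'Neighborhood': feature['properties']['name'],
        'Latitude': lat,
        'Longitude': lon,
    }

row = feature_to_row(sample_feature)
print(row)
```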

## Loading the data into a Pandas dataframe

In [None]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

In [None]:
## Looping the data

In [None]:
# Collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0).
neighborhood_rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # Coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    neighborhood_rows.append({'Borough': borough,
                              'Neighborhood': neighborhood_name,
                              'Latitude': neighborhood_lat,
                              'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(neighborhood_rows, columns=column_names)

In [None]:
neighborhoods.head()

In [None]:
neighborhoods.to_csv('BON1_NYC_GEO.csv',index=False)

## Using the GeoPy library to get the latitude and longitude coordinates of our city

In [None]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

## Creating our map overlay to depict neighborhoods

In [None]:
map_NewYork = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_NewYork)  
    
map_NewYork

## Here, we begin web scraping for population data 

In [None]:
website_url = requests.get('https://en.wikipedia.org/wiki/Demographics_of_New_York_City').text
soup = BeautifulSoup(website_url,'lxml')
table = soup.find('table',{'class':'wikitable sortable'})

headers = [header.text for header in table.find_all('th')]

table_rows = table.find_all('tr')        
rows = []
for row in table_rows:
   td = row.find_all('td')
   row = [row.text for row in td]
   rows.append(row)

with open('BON2_POPULATION1.csv', 'w') as f:
   writer = csv.writer(f)
   writer.writerow(headers)
   writer.writerows(row for row in rows if row)

In [None]:
Pop_data=pd.read_csv('BON2_POPULATION1.csv')
Pop_data.drop(Pop_data.columns[[9,10,11]], axis=1,inplace=True)
print('Data downloaded!')
Pop_data.head()

In [None]:
Pop_data.columns = Pop_data.columns.str.replace(' ', '')
Pop_data.columns = Pop_data.columns.str.replace('\'','')
Pop_data.rename(columns={'Borough':'persons_sq_mi','County':'persons_sq_km'}, inplace=True)
Pop_data.head(10)

In [None]:
Pop_data.rename(columns = {'NewYorkCitysfiveboroughsvte\n' : 'Borough',
                   'Jurisdiction\n':'County',
                   'Population\n':'Estimate_2017', 
                   'Landarea\n':'square_miles',
                    'Density\n':'square_km',
                          }, inplace=True)
Pop_data

In [None]:
Pop_data['Borough']=Pop_data['Borough'].replace(to_replace='\n', value='', regex=True)
Pop_data['County']=Pop_data['County'].replace(to_replace='\n', value='', regex=True)
Pop_data['Estimate_2017']=Pop_data['Estimate_2017'].replace(to_replace='\n', value='', regex=True)
Pop_data['square_miles']=Pop_data['square_miles'].replace(to_replace='\n', value='', regex=True)
Pop_data['square_km']=Pop_data['square_km'].replace(to_replace='\n', value='', regex=True)
Pop_data['persons_sq_mi']=Pop_data['persons_sq_mi'].replace(to_replace='\n', value='', regex=True)
Pop_data['squarekm']=Pop_data['squarekm'].replace(to_replace='\n', value='', regex=True)
Pop_data

In [None]:
Pop_data.loc[5:,['persons_sq_mi','persons_sq_km']] = Pop_data.loc[2:,['persons_sq_mi','persons_sq_km']].shift(1,axis=1)
Pop_data.loc[5:,['square_km','persons_sq_mi']] = Pop_data.loc[2:,['square_km','persons_sq_mi']].shift(1,axis=1)
Pop_data.loc[5:,['square_miles','square_km']] = Pop_data.loc[2:,['square_miles','square_km']].shift(1,axis=1)
Pop_data.loc[5:,['Estimate_2017','square_miles']] = Pop_data.loc[2:,['Estimate_2017','square_miles']].shift(1,axis=1)
Pop_data.loc[5:,['County','Estimate_2017']] = Pop_data.loc[2:,['County','Estimate_2017']].shift(1,axis=1)
Pop_data.loc[5:,['Borough','County']] = Pop_data.loc[2:,['Borough','County']].shift(1,axis=1)
Pop_data

In [None]:
Pop_data = Pop_data.fillna('')
Pop_data

In [None]:
i = Pop_data[Pop_data.County == 'Sources: [2] and see individual borough articles'].index
# drop() returns a new dataframe, so assign the result back
Pop_data = Pop_data.drop(i)
Pop_data

In [None]:
Pop_data.to_csv('BON2_POPULATION.csv',index=False)

In [None]:
## Here, we draw demographics data through web scraping

In [None]:
website_url = requests.get('https://en.wikipedia.org/w/index.php?title=New_York_City&oldid=861524529').text
soup = BeautifulSoup(website_url,'lxml')
table = soup.find('table',{'class':'wikitable sortable collapsible'})

headers = [header.text for header in table.find_all('th')]

table_rows = table.find_all('tr')        
rows = []
for row in table_rows:
    td = row.find_all('td')
    row = [row.text for row in td]
    rows.append(row)

with open('NYC_DEMO.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(headers)
    writer.writerows(row for row in rows if row)

In [None]:
Demo_data=pd.read_csv('NYC_DEMO.csv')
print('Data downloaded!')

In [None]:
Demo_data

In [None]:
Demo_data.columns

In [None]:
Demo_data.rename(columns = {'2010[239]' : '2010',
                   '1990[241]':'1990',
                   '1970[241]':'1970', 
                   '1940[241]\n':'1940',
                    }, inplace=True)
Demo_data

In [None]:
Demo_data.columns

In [None]:
Demo_data.columns = Demo_data.columns.str.replace(' ', '')

In [None]:
Demo_data= Demo_data.replace('\n',' ', regex=True)
Demo_data

In [None]:
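# Remove the literal footnote marker; rstrip('[242]') would also strip trailing 2s and 4s from the values
Demo_data['1970'] = Demo_data['1970'].str.replace(r'\[242\]', '', regex=True)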
Demo_data

In [None]:
Demo_data.to_csv('BON2_DEMOGRAPHICS.csv',index=False)

In [None]:
from PIL import Image

In [None]:
%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.style.use('ggplot')

print ('Matplotlib version: ', mpl.__version__)

from wordcloud import WordCloud, STOPWORDS

print ('Wordcloud is installed and imported!')

In [None]:
!conda install -c conda-forge wordcloud==1.4.1 --yes
<div class="div-col columns column-width" style="-moz-column-width: 30em; -webkit-column-width: 30em; column-width: 30em;">
<ul><li><a href="/wiki/Bedford_Park,_Bronx" title="Bedford Park, Bronx">Bedford Park</a> – Mexican, Puerto Rican, Dominican, Korean (on 204th St.)</li>
<li><a href="/wiki/Belmont,_Bronx" title="Belmont, Bronx">Belmont</a> – Italian, Albanian (also known as "Arthur Avenue," "Little Italy")</li>
<li><a href="/wiki/City_Island,_Bronx" title="City Island, Bronx">City Island</a> – Italian, Seafood</li>
<li><a href="/wiki/Morris_Park,_Bronx" title="Morris Park, Bronx">Morris Park</a> – Italian, Albanian</li>
<li><a href="/wiki/Norwood,_Bronx" title="Norwood, Bronx">Norwood</a> – Filipino (formerly Irish, less so today)</li>
<li><a href="/wiki/Riverdale,_Bronx" title="Riverdale, Bronx">Riverdale</a> – Jewish</li>
<li><a href="/wiki/South_Bronx" title="South Bronx">South Bronx</a> – Puerto Rican, Dominican</li>
<li><a href="/wiki/Wakefield,_Bronx" title="Wakefield, Bronx">Wakefield</a> – Jamaican, West Indian</li>
<li><a href="/wiki/Woodlawn,_Bronx" title="Woodlawn, Bronx">Woodlawn</a> – Irish</li></ul>
 </div>
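
The markup above shows how each entry is laid out: a linked neighborhood name, an en dash, then a comma-separated list of cuisines with an optional parenthetical note. As a rough sketch of how such entries could be split into neighborhood and cuisine values, the following uses only the standard library's `html.parser`; the class name `ListItemText` and the splitting rules are assumptions made for this illustration, not the method used elsewhere in this notebook.

```python
from html.parser import HTMLParser

class ListItemText(HTMLParser):
    """Collect the visible text of every top-level <li> element."""
    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0      # greater than 0 while inside an <li>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self._depth += 1
            if self._depth == 1:
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == 'li' and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.items.append(''.join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)

# Two entries copied from the markup above
html = (
    '<ul><li><a href="/wiki/Belmont,_Bronx" title="Belmont, Bronx">Belmont</a>'
    ' – Italian, Albanian (also known as "Arthur Avenue," "Little Italy")</li>'
    '<li><a href="/wiki/Woodlawn,_Bronx" title="Woodlawn, Bronx">Woodlawn</a>'
    ' – Irish</li></ul>'
)

parser = ListItemText()
parser.feed(html)
parser.close()

pairs = []
for text in parser.items:
    # Entries look like "Neighborhood – cuisines (optional note)"
    neighborhood, _, rest = text.partition(' – ')
    cuisines = rest.split(' (')[0]   # drop any parenthetical note
    pairs.append((neighborhood, [c.strip() for c in cuisines.split(',')]))

print(pairs)
```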

In [None]:
website_url = requests.get('https://en.wikipedia.org/wiki/Cuisine_of_New_York_City').text
soup = BeautifulSoup(website_url, 'lxml')
# Each neighborhood list sits inside a <div class="div-col ..."> block
divs = soup.find_all('div', class_='div-col')

cuisine_rows = []
for div in divs:
    for li in div.find_all('li'):
        # Skip items that only wrap a nested list
        if li.find('ul'):
            continue
        # Items look like "Belmont – Italian, Albanian (...)"
        neighborhood, _, cuisine = li.get_text(' ', strip=True).partition(' – ')
        cuisine_rows.append([neighborhood, cuisine])

with open('BON3_NYC_CUISINE.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Neighborhood', 'Cuisine'])
    writer.writerows(cuisine_rows)

In [None]:
my_file = project.get_file("BON3_NYC_CUISINE.csv")

my_file.seek(0)
# Read from the project file handle, as the farmers-market cell does
NYC_CUISINE = pd.read_csv(my_file)
NYC_CUISINE.drop(NYC_CUISINE.columns[[3,4,5,6,7]], axis=1,inplace=True) 
NYC_CUISINE.head()

In [None]:
NYC_CUISINE.shape

In [None]:
NYC_CUISINE['Borough'].value_counts().to_frame()

In [None]:
CUISINE_WC = NYC_CUISINE[['Cuisine']]
CUISINE_WC

In [None]:
CUISINE_WC.to_csv('CUISINE_WC.txt', sep=',', index=False)

In [None]:
CUISINE_WC1 = open('CUISINE_WC.txt', 'r').read()

In [None]:
stopwords = set(STOPWORDS)

In [None]:
NYC_CUISINE_WC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)

NYC_CUISINE_WC.generate(CUISINE_WC1)

In [None]:
# Size the figure before drawing so the settings apply to the word cloud
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(NYC_CUISINE_WC, interpolation='bilinear')
plt.axis('off')
plt.show()

In [None]:
Brooklyn_data = NYC_CUISINE[NYC_CUISINE['Borough'] == 'Brooklyn'].reset_index(drop=True)
Brooklyn_data.head()

In [None]:
BR_CUISINE_WC = Brooklyn_data[['Cuisine']]
BR_CUISINE_WC

In [None]:
BR_CUISINE_WC.to_csv('BR_CUISINE.txt', sep=',', index=False)
BR_CUISINE_WC = open('BR_CUISINE.txt', 'r').read()
stopwords = set(STOPWORDS)

In [None]:
BR_CUISINE_NYC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)

BR_CUISINE_NYC.generate(BR_CUISINE_WC)

In [None]:
# Size the figure before drawing so the settings apply to the word cloud
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(BR_CUISINE_NYC, interpolation='bilinear')
plt.axis('off')
plt.show()

In [None]:
Queens_data = NYC_CUISINE[NYC_CUISINE['Borough'] == 'Queens'].reset_index(drop=True)
Queens_data.head()

In [None]:
Q_CUISINE_WC = Queens_data[['Cuisine']]
Q_CUISINE_WC

In [None]:
Q_CUISINE_WC.to_csv('Q_CUISINE.txt', sep=',', index=False)

Q_CUISINE_WC = open('Q_CUISINE.txt', 'r').read()

stopwords = set(STOPWORDS)

Q_CUISINE_NYC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)

Q_CUISINE_NYC.generate(Q_CUISINE_WC)

In [None]:
# Size the figure before drawing so the settings apply to the word cloud
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(Q_CUISINE_NYC, interpolation='bilinear')
plt.axis('off')
plt.show()

In [None]:
Manhattan_data = NYC_CUISINE[NYC_CUISINE['Borough'] == 'Manhattan'].reset_index(drop=True)
Manhattan_data.head()
MN_CUISINE_WC = Manhattan_data[['Cuisine']]
MN_CUISINE_WC

In [None]:
MN_CUISINE_WC.to_csv('MN_CUISINE.txt', sep=',', index=False)

MN_CUISINE_WC = open('MN_CUISINE.txt', 'r').read()

stopwords = set(STOPWORDS)

MN_CUISINE_NYC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)

MN_CUISINE_NYC.generate(MN_CUISINE_WC)

In [None]:
# Size the figure before drawing so the settings apply to the word cloud
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(MN_CUISINE_NYC, interpolation='bilinear')
plt.axis('off')
plt.show()

In [None]:
Bronx_data = NYC_CUISINE[NYC_CUISINE['Borough'] == 'The Bronx'].reset_index(drop=True)
Bronx_data.head()

In [None]:
BX_CUISINE_WC = Bronx_data[['Cuisine']]
BX_CUISINE_WC

In [None]:
BX_CUISINE_WC.to_csv('BX_CUISINE.txt', sep=',', index=False)

BX_CUISINE_WC = open('BX_CUISINE.txt', 'r').read()

stopwords = set(STOPWORDS)

BX_CUISINE_NYC = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)

BX_CUISINE_NYC.generate(BX_CUISINE_WC)

In [None]:
# Size the figure before drawing so the settings apply to the word cloud
fig = plt.figure()
fig.set_figwidth(30)
fig.set_figheight(45)

plt.imshow(BX_CUISINE_NYC, interpolation='bilinear')
plt.axis('off')
plt.show()

In [None]:
import seaborn as sns

In [None]:
my_file = project.get_file("DOHMH_Farmers_Markets_and_Food_Boxes.csv")

my_file.seek(0)
FM_NYC=pd.read_csv(my_file)
FM_NYC.head()

In [None]:
FM_NYC.rename(columns={'Service Type':'Service_Type'}, inplace=True)
print(FM_NYC.Service_Type.unique())
FM_NYC['Service_Type'].value_counts().to_frame()

In [None]:
# Unpack the figure and axes; "fig.ax = ..." left "ax" undefined
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
sns.countplot(x='Service_Type', data=FM_NYC, ax=ax)
ax.set_title("Service_Type")
for t in ax.patches:
    if (np.isnan(float(t.get_height()))):
        ax.annotate('', (t.get_x(), 0))
    else:
        ax.annotate(str(format(int(t.get_height()), ',d')), (t.get_x(), t.get_height()*1.01))
    
plt.show();

In [None]:
FM_NYC_filtered = FM_NYC[FM_NYC['Service_Type'] == 'Farmers Markets'].copy()
FM_NYC_filtered ['Borough'] = FM_NYC_filtered['Borough'].map(lambda x: x.strip())
print(FM_NYC_filtered.shape)
FM_NYC_filtered.head()

In [None]:
# Unpack the figure and axes; "fig.ax = ..." left "ax" undefined
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
sns.countplot(x='Borough', data=FM_NYC_filtered, ax=ax)
ax.set_title("Borough")
for t in ax.patches:
    if (np.isnan(float(t.get_height()))):
        ax.annotate('', (t.get_x(), 0))
    else:
        ax.annotate(str(format(int(t.get_height()), ',d')), (t.get_x(), t.get_height()*1.01))
        ax.set_xticklabels([t.get_text().split("T")[0] for t in ax.get_xticklabels()])

plt.xticks(rotation=90) 
plt.show()

In [None]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

In [None]:
map_markets = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, FacilityName, borough in zip(FM_NYC_filtered['Latitude'], FM_NYC_filtered['Longitude'], FM_NYC_filtered['FacilityName'], FM_NYC_filtered['Borough']):
            label = '{}, {}'.format(FacilityName, borough)
            label = folium.Popup(label, parse_html=True)
            folium.CircleMarker(
                [lat, lng],
                radius=5,
                popup=label,
                color='green',
                fill=True,
                fill_color='green',
                fill_opacity=0.7,
                parse_html = False).add_to(map_markets)  

map_markets

In [None]:
from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score

In [None]:
NYC_Geo=pd.read_csv('BON1_NYC_GEO.csv')
print('Data downloaded!')

In [None]:
NYC_Geo.head()

In [None]:
NYC_Geo['Borough'].value_counts().to_frame()
NYC_Geo.shape
print(NYC_Geo.Borough.unique())
NYC_Geo.isnull().sum()
BM_Geo = NYC_Geo.loc[(NYC_Geo['Borough'] == 'Brooklyn')|(NYC_Geo['Borough'] == 'Manhattan')]
BM_Geo = BM_Geo.reset_index(drop=True)
BM_Geo.head()

In [None]:
BM_Geo.shape

In [None]:
import time
start_time = time.time()

address = 'New York City, NY'

geolocator = Nominatim(user_agent="Jupyter")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

print("--- %s seconds ---" % round((time.time() - start_time), 2))

In [None]:
map_BM = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(BM_Geo['Latitude'], BM_Geo['Longitude'], BM_Geo['Borough'], BM_Geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_BM)  
    
map_BM
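
The venue lookup defined next reads `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION`, none of which are defined in this notebook. One common way to supply them without hard-coding secrets is through environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the `VERSION` date are assumptions chosen for this sketch.

```python
import os

# Hypothetical environment variable names; set them in your shell or
# notebook environment before running the Foursquare cells.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

# The v2 API takes a YYYYMMDD version date; this value is a placeholder.
VERSION = '20180605'

if not CLIENT_ID or not CLIENT_SECRET:
    print('Foursquare credentials are not set; API requests will fail.')
```

Keeping the credentials out of the notebook also means the file can be shared or committed without exposing them.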

In [None]:
def getNearbyVenues(names, latitudes, longitudes, LIMIT=200, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [None]:
BM_venues = getNearbyVenues(names=BM_Geo['Neighborhood'],
                                  latitudes=BM_Geo['Latitude'],
                                  longitudes=BM_Geo['Longitude'],
                                  LIMIT=200)

print('The "BM_venues" dataframe has {} venues and {} unique venue types.'.format(
      len(BM_venues['Venue Category']),
      len(BM_venues['Venue Category'].unique())))

BM_venues.to_csv('BM_venues.csv', sep=',', encoding='UTF8')
BM_venues.head()

In [None]:
colnames = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
BM_venues = pd.read_csv('BM_venues.csv', skiprows=1, names=colnames)
BM_venues.columns = BM_venues.columns.str.replace(' ', '')
BM_venues.head()

In [None]:
BM_venues.shape

In [None]:
def Venues_Map(Borough_name, Borough_neighborhoods):
    
    geolocator = Nominatim(user_agent="Jupyter")
    Borough_location = geolocator.geocode(Borough_name)
    Borough_latitude = Borough_location.latitude
    Borough_longitude = Borough_location.longitude
    print('The geographical coordinates of "{}" are {}, {}.'.format(Borough_name, Borough_latitude, Borough_longitude))
    
    print('The "{}" dataframe has {} different venue types and {} neighborhoods.'.format(
          Borough_name,
          len(Borough_neighborhoods['VenueCategory'].unique()),
          len(Borough_neighborhoods['Neighborhood'].unique())))
    
    map_Borough = folium.Map(location=[Borough_latitude, Borough_longitude], zoom_start=10)

    for lat, lng, venue, category in zip(Borough_neighborhoods['VenueLatitude'], Borough_neighborhoods['VenueLongitude'], Borough_neighborhoods['Venue'], Borough_neighborhoods['VenueCategory']):
        label = '{}, {}'.format(category, venue)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=0.1,
            popup=label,
            color='red',
            fill=True,
            fill_color='#FF0000',
            fill_opacity=0.3).add_to(map_Borough)  

    return map_Borough

In [None]:
Venues_Map('New York City, NY', BM_venues)
BM_venues.groupby('VenueCategory')['Venue'].count().sort_values(ascending=False)

In [None]:
BM_venues.groupby('Neighborhood').count()
print('There are {} unique categories.'.format(len(BM_venues['VenueCategory'].unique())))

Analyzing each neighborhood...

In [None]:
BM_onehot = pd.get_dummies(BM_venues[['VenueCategory']], prefix="", prefix_sep="")

column_names = ['Neighborhood'] + list(BM_onehot.columns)

BM_onehot['Neighborhood'] = BM_venues['Neighborhood'] 

BM_onehot = BM_onehot[column_names]

BM_onehot.head()

In [None]:
restaurant_List = []
search = 'Restaurant'
for i in BM_onehot.columns :
    if search in i:
        restaurant_List.append(i)
restaurant_List

In [None]:
col_name = []
col_name = ['Neighborhood'] + restaurant_List
BM_restaurant = BM_onehot[col_name]
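# Drop repeated column labels only, so 'Neighborhood' is always kept for the groupby
BM_restaurant = BM_restaurant.loc[:, ~BM_restaurant.columns.duplicated()]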

In [None]:
BM_restaurant_grouped = BM_restaurant.groupby('Neighborhood').sum().reset_index()
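# Sum only the numeric restaurant columns; the 'Neighborhood' text column is skipped
BM_restaurant_grouped['Total'] = BM_restaurant_grouped.sum(axis=1, numeric_only=True)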

In [None]:
BM_grouped_clustering = BM_restaurant_grouped.drop(columns='Neighborhood')

for n_cluster in range(2, 10):
    kmeans = KMeans(n_clusters=n_cluster).fit(BM_grouped_clustering)
    label = kmeans.labels_
    sil_coeff = silhouette_score(BM_grouped_clustering, label, metric='euclidean')
    print("For n_clusters={}, The Silhouette Coefficient is {}".format(n_cluster, sil_coeff))
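
The loop above prints one silhouette coefficient per candidate cluster count; the count with the highest score is usually taken as the best fit. A minimal sketch of picking it programmatically follows, using synthetic points in place of the restaurant counts. The data and variable names here are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated groups of 2-D points
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

scores = {}
for n_cluster in range(2, 7):
    # Fix random_state and n_init so repeated runs give the same labels
    model = KMeans(n_clusters=n_cluster, n_init=10, random_state=0).fit(points)
    scores[n_cluster] = silhouette_score(points, model.labels_, metric='euclidean')

# Keep the cluster count with the highest silhouette coefficient
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Setting `random_state` inside the search loop matters: without it, the printed coefficients can change from run to run, which makes the chosen cluster count harder to reproduce.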

In [None]:
kclusters = 2

BM_grouped_clustering = BM_restaurant_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(BM_grouped_clustering)

kmeans.labels_

In [None]:
BM_results = pd.DataFrame(kmeans.cluster_centers_)
BM_results.columns = BM_grouped_clustering.columns
BM_results.index = ['cluster0','cluster1']
BM_results['Total Sum'] = BM_results.sum(axis = 1)
BM_results

In [None]:
BM_results_merged = pd.DataFrame(BM_restaurant_grouped['Neighborhood'])

BM_results_merged['Total'] = BM_restaurant_grouped['Total']
BM_results_merged = BM_results_merged.assign(Cluster_Labels = kmeans.labels_)

In [None]:
print(BM_results_merged.shape)
BM_results_merged
BM_merged = BM_Geo

BM_merged = BM_merged.join(BM_results_merged.set_index('Neighborhood'), on='Neighborhood')

print(BM_merged.shape)
BM_merged.head(10)

In [None]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(BM_merged['Latitude'], BM_merged['Longitude'], BM_merged['Neighborhood'], BM_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
BM_merged[BM_merged['Cluster_Labels'] == 1].reset_index(drop=True)

In [None]:
BM_merged[BM_merged['Total'] == 0].reset_index(drop=True)

In [None]:
BQS_Geo = NYC_Geo.loc[(NYC_Geo['Borough'] == 'Bronx')|(NYC_Geo['Borough'] == 'Queens')|(NYC_Geo['Borough'] == 'Staten Island')]
BQS_Geo = BQS_Geo.reset_index(drop=True)
BQS_Geo.head()

In [None]:
BQS_Geo.shape

In [None]:
map_BQS = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(BQS_Geo['Latitude'], BQS_Geo['Longitude'], BQS_Geo['Borough'], BQS_Geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_BQS)  
    
map_BQS

In [None]:
BQS_venues = getNearbyVenues(names=BQS_Geo['Neighborhood'],
                                  latitudes=BQS_Geo['Latitude'],
                                  longitudes=BQS_Geo['Longitude'],
                                  LIMIT=200)

print('The "BQS_venues" dataframe has {} venues and {} unique venue types.'.format(
      len(BQS_venues['Venue Category']),
      len(BQS_venues['Venue Category'].unique())))

BQS_venues.to_csv('BQS_venues.csv', sep=',', encoding='UTF8')
BQS_venues.head()

In [None]:
colnames = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
BQS_venues = pd.read_csv('BQS_venues.csv', skiprows=1, names=colnames)
BQS_venues.columns = BQS_venues.columns.str.replace(' ', '')
BQS_venues.head()

In [None]:
Venues_Map('New York City, NY', BQS_venues)
BQS_venues.groupby('VenueCategory')['Venue'].count().sort_values(ascending=False)

In [None]:
BQS_venues.groupby('Neighborhood').count()

In [None]:
print('There are {} unique categories.'.format(len(BQS_venues['VenueCategory'].unique())))

In [None]:
BQS_onehot = pd.get_dummies(BQS_venues[['VenueCategory']], prefix="", prefix_sep="")

column_names = ['Neighborhood'] + list(BQS_onehot.columns)

BQS_onehot['Neighborhood'] = BQS_venues['Neighborhood'] 

BQS_onehot = BQS_onehot[column_names]

BQS_onehot.head()

In [None]:
restaurant_List1 = []
search = 'Restaurant'
for i in BQS_onehot.columns :
    if search in i:
        restaurant_List1.append(i)

In [None]:
col_name = []
col_name = ['Neighborhood'] + restaurant_List1
BQS_restaurant = BQS_onehot[col_name]
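# Drop repeated column labels only, so 'Neighborhood' is always kept for the groupby
BQS_restaurant = BQS_restaurant.loc[:, ~BQS_restaurant.columns.duplicated()]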

In [None]:
BQS_restaurant_grouped = BQS_restaurant.groupby('Neighborhood').sum().reset_index()

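# Sum only the numeric restaurant columns; the 'Neighborhood' text column is skipped
BQS_restaurant_grouped['Total'] = BQS_restaurant_grouped.sum(axis=1, numeric_only=True)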

In [None]:
BQS_grouped_clustering = BQS_restaurant_grouped.drop(columns='Neighborhood')

for n_cluster in range(2, 10):
    kmeans = KMeans(n_clusters=n_cluster).fit(BQS_grouped_clustering)
    label = kmeans.labels_
    sil_coeff = silhouette_score(BQS_grouped_clustering, label, metric='euclidean')
    print("For n_clusters={}, The Silhouette Coefficient is {}".format(n_cluster, sil_coeff))

In [None]:
kclusters = 2

BQS_grouped_clustering = BQS_restaurant_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(BQS_grouped_clustering)

kmeans.labels_

In [None]:
BQS_results = pd.DataFrame(kmeans.cluster_centers_)
BQS_results.columns = BQS_grouped_clustering.columns
BQS_results.index = ['cluster0','cluster1']
BQS_results['Total Sum'] = BQS_results.sum(axis = 1)
BQS_results

In [None]:
BQS_results_merged = pd.DataFrame(BQS_restaurant_grouped['Neighborhood'],)
BQS_results_merged['Total'] = BQS_restaurant_grouped['Total']
BQS_results_merged = BQS_results_merged.assign(Cluster_Labels = kmeans.labels_)
print(BQS_results_merged.shape)
BQS_results_merged

In [None]:
BQS_merged = BQS_Geo

BQS_merged = BQS_merged.join(BQS_results_merged.set_index('Neighborhood'), on='Neighborhood')

print(BQS_merged.shape)
BQS_merged.head(10)

In [None]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(BQS_merged['Latitude'], BQS_merged['Longitude'], BQS_merged['Neighborhood'], BQS_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
BQS_merged[BQS_merged['Cluster_Labels'] == 1].reset_index(drop=True)

In [None]:
BQS_merged[BQS_merged['Total'] == 0].reset_index(drop=True)


# Conclusion
### This analysis uses numerous tools to locate, gather, clean, and visualize data from a broad range of sources.  Beyond that, the report does a fair job of predicting the success of specific cuisines, based on the patronage of customers visiting those areas.