### Background

*New York is one of the top restaurant destinations in the United States. Some would even say that it is home to the best food in the world. With such high standards, aspiring restaurant owners face a tough decision on where to locate new restaurants.*

*The purpose of this project is to help aspiring restaurant owners identify the ideal neighborhood to open their business by providing competitor location data, as well as projected population data. This will enable people to make smart and efficient decisions when selecting a neighborhood for their business.*

### Import Libraries

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


<a id='item1'></a>

### Download Neighborhood Data Set

In [5]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

*Transform the data into a pandas dataframe*

In [6]:
neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


#### New York City

In [7]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)

map_newyork

The geographical coordinates of New York City are 40.7127281, -74.0060152.


## Download Foursquare Data

In [8]:
CLIENT_ID = '4HQS2WVLRKYBSGRZTCSOI1HDGLUIABGR1AGUAEQC2GESHWV2' # your Foursquare ID
CLIENT_SECRET = 'TDIQP4RQOHDKUBUOM2K4GKLYVZVHKPRTY0DTMZPMBTVN22MW' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
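
Hard-coding API credentials in a notebook exposes them to anyone who can read the file. A minimal sketch of one alternative, assuming the credentials are stored in environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical and not part of the Foursquare API:

```python
import os

def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None
```

With the variables set, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would replace the two literal assignments.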

In [12]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'Neighborhood'] # neighborhood name


LIMIT = 1000


*Get Restaurant Data*

In [16]:
search_query = 'Restaurant'
radius = 500
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# .copy() so the column assignment below works on an independent dataframe
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]



food_venues = dataframe_filtered[['name','categories','lat','lng']]
food_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Mudville Restaurant & Tap House,Wings Joint,40.715336,-74.008881
1,Amore's Pizza Restaurant,Pizza Place,40.71586,-74.009888
2,TJ Byrnes Bar and Restaurant,Restaurant,40.709233,-74.003747
3,New Shezan Restaurant,Middle Eastern Restaurant,40.715789,-74.007227
4,Win Won Restaurant,Chinese Restaurant,40.709193,-74.009344
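
The flattening step in the cell above can be illustrated on a tiny hand-made payload. This is a sketch, assuming pandas 1.0 or later (where `json_normalize` is available as `pd.json_normalize`); the two venue records are invented for illustration and are not Foursquare results:

```python
import pandas as pd

# hypothetical records shaped like the items in results['response']['venues']
sample_venues = [
    {"id": "a1", "name": "Example Diner",
     "categories": [{"name": "Diner"}],
     "location": {"lat": 40.71, "lng": -74.00}},
    {"id": "b2", "name": "Example Cafe",
     "categories": [],
     "location": {"lat": 40.72, "lng": -74.01}},
]

flat = pd.json_normalize(sample_venues)
# nested keys become dotted column names such as 'location.lat';
# keep only the last term, as the notebook does
flat.columns = [column.split(".")[-1] for column in flat.columns]
# take the first category name, or None when the list is empty
flat["categories"] = flat["categories"].apply(
    lambda cats: cats[0]["name"] if cats else None)
print(flat[["name", "categories", "lat", "lng"]])
```

The second record shows why `get_category_type` checks for an empty list before indexing.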


In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

manhattan_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )
manhattan_venues.head()

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Walgreens,40.896528,-73.8447,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
4,Wakefield,40.894705,-73.847201,Dunkin',40.890459,-73.849089,Donut Shop
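
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which raises `IndexError` for a venue that has no category. A hedged sketch of a more defensive row builder follows; `venue_row` is a hypothetical helper and the sample items are invented, shaped like the `items` list used above:

```python
def venue_row(neighborhood, lat, lng, item):
    """Build one output row from an explore item, tolerating missing categories."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (neighborhood, lat, lng, venue["name"],
            venue["location"]["lat"], venue["location"]["lng"], category)

# invented items for illustration only
sample_items = [
    {"venue": {"name": "Example Bakery",
               "location": {"lat": 40.89, "lng": -73.84},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Unlabeled Spot",
               "location": {"lat": 40.90, "lng": -73.85},
               "categories": []}},
]
rows = [venue_row("Wakefield", 40.894705, -73.847201, item) for item in sample_items]
print(rows)
```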


*Merge Dataframes*

In [28]:
df = pd.merge(left=manhattan_venues, right=food_venues, left_on='Venue', right_on='name')
df = df[["Neighborhood", "Venue", "Venue Category", "lat", "lng"]]
df.head()

Unnamed: 0,Neighborhood,Venue,Venue Category,lat,lng
0,Wakefield,Subway,Sandwich Place,40.713012,-74.008096
1,Wakefield,Subway,Sandwich Place,40.716417,-74.005031
2,Kingsbridge,Subway,Sandwich Place,40.713012,-74.008096
3,Kingsbridge,Subway,Sandwich Place,40.716417,-74.005031
4,Marble Hill,Subway,Sandwich Place,40.713012,-74.008096
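
Because the merge key is the venue name, a chain with several branches matches every same-named row on the other side. The output above shows this: each `Subway` row from `manhattan_venues` is paired with both `Subway` coordinates from `food_venues`. A small sketch of the effect, using invented neighborhoods:

```python
import pandas as pd

left = pd.DataFrame({"Neighborhood": ["A", "B"],
                     "Venue": ["Subway", "Subway"]})
right = pd.DataFrame({"name": ["Subway", "Subway"],
                      "lat": [40.713012, 40.716417]})

# an inner merge on a non-unique key pairs every left row with every matching right row
merged = pd.merge(left=left, right=right, left_on="Venue", right_on="name")
print(len(merged))  # 2 left rows x 2 right rows
```

Passing `validate="many_to_one"` to `pd.merge` raises an error when the right-hand keys are not unique, which can surface this situation early.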


In [33]:
# one-hot encode the venue categories
NY_grouped = pd.get_dummies(df[['Venue Category']], prefix="", prefix_sep="")

# add the neighborhood column and move it to the first position
NY_grouped['Neighborhood'] = df['Neighborhood']
fixed_columns = [NY_grouped.columns[-1]] + list(NY_grouped.columns[:-1])
NY_grouped = NY_grouped[fixed_columns]

# average the indicators per neighborhood to get category frequencies
NY_grouped = NY_grouped.groupby('Neighborhood').mean().reset_index()
NY_grouped.head()

Unnamed: 0,Neighborhood,Fast Food Restaurant,Italian Restaurant,New American Restaurant,Restaurant,Sandwich Place,Sushi Restaurant
0,Arrochar,0.0,0.0,0.0,0.0,1.0,0.0
1,Arverne,0.0,0.0,0.0,0.0,1.0,0.0
2,Bay Ridge,0.0,0.0,0.0,0.0,1.0,0.0
3,Bayside,0.0,0.0,0.0,0.0,1.0,0.0
4,Belmont,0.0,0.0,0.0,0.0,1.0,0.0
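
The one-hot-then-average step turns a list of venues into per-neighborhood category frequencies. A sketch on invented data, following the same calls as the cell above:

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Sandwich Place", "Sandwich Place",
                       "Sushi Restaurant", "Sandwich Place"],
})

# one indicator column per category
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# the mean of the indicators is the share of venues in each category
frequencies = onehot.groupby("Neighborhood").mean().reset_index()
print(frequencies)
```

A value of 1.0, as in the output above, means every matched venue in that neighborhood fell into a single category.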


## Top 5

<a id='item3'></a>

In [34]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 5

indicators = ['st', 'nd', 'rd']
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))


NY_top5 = pd.DataFrame(columns=columns)
NY_top5['Neighborhood'] = NY_grouped['Neighborhood']

for ind in np.arange(NY_grouped.shape[0]):
    NY_top5.iloc[ind, 1:] = return_most_common_venues(NY_grouped.iloc[ind, :], num_top_venues)
NY_top5.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Arrochar,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
1,Arverne,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
2,Bay Ridge,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
3,Bayside,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
4,Belmont,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
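
A sketch of what `return_most_common_venues` does for a single row, on invented frequencies. Categories with frequency 0 still fill the remaining slots when fewer categories occur than slots are requested, and the order among tied values should not be relied on:

```python
import pandas as pd

row = pd.Series({"Neighborhood": "A",
                 "Fast Food Restaurant": 0.0,
                 "Sandwich Place": 0.75,
                 "Sushi Restaurant": 0.25})

# skip the first entry (the neighborhood name) and rank the rest
ranked = row.iloc[1:].sort_values(ascending=False)
top2 = ranked.index.values[0:2]
print(top2)
```

Filtering with `ranked[ranked > 0]` before slicing would keep only categories that actually occur in the neighborhood.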


## Cluster

In [48]:
kclusters = 5 # the outputs below contain labels 0 through 4, i.e. five clusters

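# drop the non-numeric column before clustering
NY_grouped_clustering = NY_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NY_grouped_clustering)

# add the cluster labels as the first column (the 'Cluster Labels' column in the output below)
NY_top5.insert(0, 'Cluster Labels', kmeans.labels_)
NY_top5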

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,Arrochar,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
1,1,Arverne,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
2,1,Bay Ridge,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
3,1,Bayside,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
4,1,Belmont,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
5,1,Broadway Junction,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
6,3,Brownsville,Restaurant,Sushi Restaurant,Sandwich Place,New American Restaurant,Italian Restaurant
7,1,Bulls Head,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
8,1,Castleton Corners,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
9,0,City Line,Sandwich Place,Fast Food Restaurant,Sushi Restaurant,Restaurant,New American Restaurant
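
k-means groups neighborhoods whose category-frequency vectors lie close together. A minimal sketch on invented two-category vectors, assuming scikit-learn is installed; `n_init=10` is set explicitly here only to keep the run independent of version defaults:

```python
import numpy as np
from sklearn.cluster import KMeans

# invented category-frequency vectors for four neighborhoods
frequencies = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [0.1, 0.9],
])

kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(frequencies)
print(kmeans.labels_)
```

The label numbers are arbitrary identifiers: only which rows share a label is meaningful, not whether a group is called 0 or 1.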


In [95]:
NY_merged = neighborhoods.copy()

# merge NY_top5 with the neighborhoods data to add latitude/longitude for each neighborhood
NY_merged = NY_merged.join(NY_top5.set_index('Neighborhood'), on='Neighborhood')

NY_merged.columns = ['Borough', 'Neighborhood', 'Latitude', 'Longitude','ClusterLabels', '1st Most Common Venue' ,'2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue', '5th Most Common Venue' ]

# drop neighborhoods without matched venues (NaN cluster label), then restore integer labels
NY = NY_merged.dropna(axis='rows').copy()
NY['ClusterLabels'] = NY['ClusterLabels'].astype(int)
NY.reset_index(drop=True, inplace=True)
NY


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Bronx,Wakefield,40.894705,-73.847201,1,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
1,Bronx,Kingsbridge,40.881687,-73.902818,1,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
2,Manhattan,Marble Hill,40.876551,-73.91066,1,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
3,Bronx,University Heights,40.855727,-73.910416,1,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
4,Bronx,Fordham,40.860997,-73.896427,0,Sandwich Place,Fast Food Restaurant,Sushi Restaurant,Restaurant,New American Restaurant
5,Bronx,East Tremont,40.842696,-73.887356,4,Fast Food Restaurant,Sushi Restaurant,Sandwich Place,Restaurant,New American Restaurant
6,Bronx,West Farms,40.839475,-73.877745,1,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
7,Bronx,Melrose,40.819754,-73.909422,1,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
8,Bronx,Longwood,40.815099,-73.895788,1,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
9,Bronx,Parkchester,40.837938,-73.856003,1,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant


Visualize the resulting clusters

In [96]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NY['Latitude'], NY['Longitude'], NY['Neighborhood'], NY['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
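
The markers above are colored with `rainbow[cluster-1]`. Since the labels start at 0, label 0 maps to index -1, which Python resolves to the last list element, so each cluster still receives its own color. A small sketch with a placeholder palette (the color names are invented stand-ins for the hex values in `rainbow`):

```python
palette = ["violet", "blue", "green", "red"]  # stand-in for the `rainbow` list
n_clusters = len(palette)

# label 0 -> palette[-1] (last entry), label 1 -> palette[0], and so on
assigned = {label: palette[label - 1] for label in range(n_clusters)}
print(assigned)
```

Indexing with `palette[label]` would give the same one-color-per-cluster result without relying on the wrap-around.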

<a id='item5'></a>

## Examine Clusters

#### Cluster 1

In [98]:
NY.loc[NY['ClusterLabels'] == 0, NY.columns[[1] + list(range(5, NY.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Fordham,Sandwich Place,Fast Food Restaurant,Sushi Restaurant,Restaurant,New American Restaurant
19,City Line,Sandwich Place,Fast Food Restaurant,Sushi Restaurant,Restaurant,New American Restaurant
20,Prospect Park South,Sandwich Place,Fast Food Restaurant,Sushi Restaurant,Restaurant,New American Restaurant
58,Concourse Village,Sandwich Place,Fast Food Restaurant,Sushi Restaurant,Restaurant,New American Restaurant
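
The expression `NY.columns[[1] + list(range(5, NY.shape[1]))]` used in these cells selects the column at position 1 (`Neighborhood`) plus every column from position 5 onward (the ranked venue columns), skipping borough, coordinates, and the cluster label. A sketch on an invented one-row frame with the same leading column layout:

```python
import pandas as pd

toy = pd.DataFrame(
    [["Bronx", "A", 40.0, -73.0, 0, "Sandwich Place", "Sushi Restaurant"]],
    columns=["Borough", "Neighborhood", "Latitude", "Longitude",
             "ClusterLabels", "1st Most Common Venue", "2nd Most Common Venue"])

# positions: 1 = Neighborhood, 5 onward = ranked venue columns
selected = toy.columns[[1] + list(range(5, toy.shape[1]))]
print(list(selected))
```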


#### Cluster 2

In [99]:
NY.loc[NY['ClusterLabels'] == 1, NY.columns[[1] + list(range(5, NY.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Wakefield,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
1,Kingsbridge,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
2,Marble Hill,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
3,University Heights,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
6,West Farms,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
7,Melrose,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
8,Longwood,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
9,Parkchester,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
10,Westchester Square,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant
11,Belmont,Sandwich Place,Sushi Restaurant,Restaurant,New American Restaurant,Italian Restaurant


#### Cluster 3

In [100]:
NY.loc[NY['ClusterLabels'] == 2, NY.columns[[1] + list(range(5, NY.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
25,East Village,Sushi Restaurant,Sandwich Place,Restaurant,New American Restaurant,Italian Restaurant
55,Noho,Sushi Restaurant,Sandwich Place,Restaurant,New American Restaurant,Italian Restaurant
56,Civic Center,Sushi Restaurant,New American Restaurant,Italian Restaurant,Sandwich Place,Restaurant


#### Cluster 4

In [101]:
NY.loc[NY['ClusterLabels'] == 3, NY.columns[[1] + list(range(5, NY.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
17,Brownsville,Restaurant,Sushi Restaurant,Sandwich Place,New American Restaurant,Italian Restaurant


#### Cluster 5

In [102]:
NY.loc[NY['ClusterLabels'] == 4, NY.columns[[1] + list(range(5, NY.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,East Tremont,Fast Food Restaurant,Sushi Restaurant,Sandwich Place,Restaurant,New American Restaurant
39,Rochdale,Fast Food Restaurant,Sushi Restaurant,Sandwich Place,Restaurant,New American Restaurant
