# Capstone Project - The Battle of Neighborhoods
# Finding the Best Neighborhood in New York

## Introduction:
Recently, one of my clients approached me to find the most suitable place to live in New York.
He wants to move to the liveliest part of the city.
### Business Problem
New York has many neighborhoods, so the analysis proceeds in three steps:
- Explore the neighborhoods.
- Use the Foursquare API to analyse the venues in each one.
- Find the most sought-after, trendy, popular and commented-on venues.

## Importing Necessary Libraries

In [None]:
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library


# Data Collection
## Web Scraping with BeautifulSoup

In [None]:
from bs4 import BeautifulSoup
import requests
import re

url = "https://www.google.com/search?q=new+york+neighbourhoods&oq=new+york+neighbourhoods&aqs=chrome..69i57j0l7.12674j0j7&sourceid=chrome&ie=UTF-8"
response = requests.get(url)
NY_data = BeautifulSoup(response.text, 'lxml')
contents = NY_data.find_all('div', class_='RWuggc kCrYT')
# collect the names first; DataFrame.append was removed in pandas 2.0
neighborhood_names = [content.find('div').find('div').text for content in contents]
NY_neighborhood = pd.DataFrame({'Neighborhood': neighborhood_names})
NY_neighborhood


Unnamed: 0,Neighborhood
0,Midtown Manhattan
1,SoHo
2,Harlem
3,Upper East Side
4,Tribeca
5,Upper West Side
6,Williamsburg
7,Lower East Side
8,Lower Manhattan
9,Greenwich Village


## Geocoding each neighborhood to get its latitude and longitude



In [None]:
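geolocator = Nominatim(user_agent="New York")
coords = []
for row in NY_neighborhood['Neighborhood']:
    # the query is the bare neighborhood name, so matches are not limited to New York
    location = geolocator.geocode(row)
    # geocode returns None when nothing matches; keep the row and store NaN
    lat, lng = (location.latitude, location.longitude) if location else (np.nan, np.nan)
    coords.append({'Latitude': lat, 'Longitude': lng})
# build the frame in one step; DataFrame.append was removed in pandas 2.0
NY_coord = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
NY_neighborhood = NY_neighborhood.join(NY_coord)
NY_neighborhood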

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Midtown Manhattan,40.760109,-73.978163
1,SoHo,51.513163,-0.131175
2,Harlem,40.807879,-73.945415
3,Upper East Side,40.773702,-73.96412
4,Tribeca,40.71538,-74.009306
5,Upper West Side,40.787045,-73.975416
6,Williamsburg,37.278921,-76.694486
7,Lower East Side,40.715936,-73.986806
8,Lower Manhattan,51.113231,17.018655
9,Greenwich Village,40.733584,-74.002817
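
Three rows in the table above (SoHo, Williamsburg and Lower Manhattan) have coordinates far from the rest, which all sit near latitude 40.7 and longitude -74. This suggests the geocoder matched places with the same name elsewhere. A minimal sketch of a sanity check is shown below; the bounding box values and the helper name `in_nyc` are assumptions made for illustration, not an official city boundary.

```python
# Rough bounding box around New York City (assumed values, not an official boundary)
NYC_BOUNDS = {"lat": (40.4, 41.0), "lon": (-74.3, -73.6)}


def in_nyc(lat, lon, bounds=NYC_BOUNDS):
    """Return True when the point falls inside the assumed bounding box."""
    lat_min, lat_max = bounds["lat"]
    lon_min, lon_max = bounds["lon"]
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


# (neighborhood, latitude, longitude) copied from the geocoding output
geocoded = [
    ("Midtown Manhattan", 40.760109, -73.978163),
    ("SoHo", 51.513163, -0.131175),
    ("Harlem", 40.807879, -73.945415),
    ("Upper East Side", 40.773702, -73.96412),
    ("Tribeca", 40.71538, -74.009306),
    ("Upper West Side", 40.787045, -73.975416),
    ("Williamsburg", 37.278921, -76.694486),
    ("Lower East Side", 40.715936, -73.986806),
    ("Lower Manhattan", 51.113231, 17.018655),
    ("Greenwich Village", 40.733584, -74.002817),
]

# names whose coordinates fall outside the box and need a second look
misplaced = [name for name, lat, lon in geocoded if not in_nyc(lat, lon)]
print(misplaced)
```

One possible follow-up is to append a qualifier such as `, New York, NY` to each query and run the check again; whether that resolves every name correctly would still need to be verified against the geocoder's results.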


### Plotting the map with Folium

In [None]:
address = 'New York'

geolocator = Nominatim(user_agent="New York")
location = geolocator.geocode(address)
latitude_NY = location.latitude
longitude_NY = location.longitude
map_NY = folium.Map(location=[latitude_NY, longitude_NY], zoom_start=12)

# add one marker per neighborhood to the map
for lat, lng, neighborhood in zip(NY_neighborhood['Latitude'], NY_neighborhood['Longitude'], NY_neighborhood['Neighborhood']):
    label = folium.Popup(str(neighborhood), parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lng)],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_NY)
    
map_NY

## Getting Venues List via FourSquare API

In [None]:
CLIENT_ID = 'ILLMWN3IEHII4JAWIVV13BLYXJ31QCGQ23FL2KG552DNEFPD' #Foursquare ID
CLIENT_SECRET ='BXPXXEIO4AUTWA2IJDFFTY1JGLJYIWO4GPCSSSRVQBFAYOP1'
VERSION = '20180605' # Foursquare API version

LIMIT = 100  # maximum number of venues returned per request

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    # radius is in metres; the call below relies on the 500 m default
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of venue items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one row per venue
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues

NY_venues = getNearbyVenues(names=NY_neighborhood['Neighborhood'],
                            latitudes=NY_neighborhood['Latitude'],
                            longitudes=NY_neighborhood['Longitude'])

In [None]:
NY_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Midtown Manhattan,40.760109,-73.978163,Radio City Music Hall,40.759850,-73.979344,Concert Hall
1,Midtown Manhattan,40.760109,-73.978163,Equinox,40.759180,-73.977784,Gym
2,Midtown Manhattan,40.760109,-73.978163,MoMA: Architecture and Design,40.761547,-73.976732,Art Museum
3,Midtown Manhattan,40.760109,-73.978163,Rockefeller Center,40.758668,-73.978730,Plaza
4,Midtown Manhattan,40.760109,-73.978163,Museum of Modern Art (MoMA),40.761412,-73.977462,Art Museum
...,...,...,...,...,...,...,...
3727,Forest Hills,38.215348,-85.585793,Plato's Closet,38.217318,-85.589029,Clothing Store
3728,Forest Hills,38.215348,-85.585793,Neatbeat,38.212127,-85.588286,Salon / Barbershop
3729,Forest Hills,38.215348,-85.585793,El Taco Luchador,38.211735,-85.587660,Mexican Restaurant
3730,Forest Hills,38.215348,-85.585793,Happy China,38.212563,-85.589912,Chinese Restaurant
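
Each row in this table is built from one item of the API response, where the venue details sit under a nested `venue` key. The sketch below shows that flattening step on a single hand-trimmed item that keeps only the fields `getNearbyVenues` reads. The helper name `flatten_item` and the fallback to `None` for a venue without categories are assumptions added for illustration; the notebook itself indexes `categories[0]` directly.

```python
def flatten_item(name, lat, lng, item):
    """Turn one response item into the 7-field row used for NY_venues."""
    venue = item["venue"]
    # guard against a venue that carries no category (assumed edge case)
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else None
    return (
        name,
        lat,
        lng,
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )


# one item trimmed down to the fields that are actually used
sample_item = {
    "venue": {
        "name": "Radio City Music Hall",
        "location": {"lat": 40.759850056828405, "lng": -73.97934397752239},
        "categories": [{"name": "Concert Hall"}],
    }
}

row = flatten_item("Midtown Manhattan", 40.760109, -73.978163, sample_item)
print(row)

# the same helper on a venue with an empty category list
uncategorised = {"venue": {"name": "Unnamed", "location": {"lat": 0.0, "lng": 0.0}, "categories": []}}
row_without_category = flatten_item("Test", 0.0, 0.0, uncategorised)
```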


In [None]:
NY_venues.head()


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Midtown Manhattan,40.760109,-73.978163,Radio City Music Hall,40.75985,-73.979344,Concert Hall
1,Midtown Manhattan,40.760109,-73.978163,Equinox,40.75918,-73.977784,Gym
2,Midtown Manhattan,40.760109,-73.978163,MoMA: Architecture and Design,40.761547,-73.976732,Art Museum
3,Midtown Manhattan,40.760109,-73.978163,Rockefeller Center,40.758668,-73.97873,Plaza
4,Midtown Manhattan,40.760109,-73.978163,Museum of Modern Art (MoMA),40.761412,-73.977462,Art Museum


In [None]:
NY_venues_count = NY_venues.groupby('Neighborhood').count().reset_index()
NY_venues_count[['Neighborhood', 'Venue']]


Unnamed: 0,Neighborhood,Venue
0,Astoria,84
1,Battery Park City,92
2,Bay Ridge,27
3,Bedford-Stuyvesant,55
4,Boerum Hill,82
5,Brooklyn Heights,100
6,Bushwick,11
7,Carroll Gardens,98
8,Chelsea,45
9,Chinatown,100
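
The counts above vary widely. A neighborhood with only a handful of venues gives noisy category frequencies, and a count equal to `LIMIT` may mean the API result was cut off at 100. A small sketch of how both cases could be flagged follows; the threshold `MIN_VENUES` is a hypothetical value chosen for illustration, and the counts are copied from the ten rows shown.

```python
LIMIT = 100       # per-request cap used when calling the API
MIN_VENUES = 30   # hypothetical threshold for "too few venues to trust"

# venue counts copied from the ten rows displayed above
venue_counts = {
    "Astoria": 84,
    "Battery Park City": 92,
    "Bay Ridge": 27,
    "Bedford-Stuyvesant": 55,
    "Boerum Hill": 82,
    "Brooklyn Heights": 100,
    "Bushwick": 11,
    "Carroll Gardens": 98,
    "Chelsea": 45,
    "Chinatown": 100,
}

# neighborhoods whose frequencies rest on very few venues
sparse = sorted(n for n, c in venue_counts.items() if c < MIN_VENUES)
# neighborhoods that reached the cap, so more venues may exist than were returned
capped = sorted(n for n, c in venue_counts.items() if c >= LIMIT)

print("sparse:", sparse)
print("capped:", capped)
```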


In [None]:
print('There are {} unique categories.'.format(len(NY_venues['Venue Category'].unique())))

There are 335 unique categories.


### There are 335 unique venue categories.

## One-Hot Encoding

In [None]:
NY_onehot = pd.get_dummies(NY_venues[['Venue Category']], prefix="", prefix_sep="") 

# add neighborhood column back to dataframe
NY_onehot['Neighborhood'] = NY_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [NY_onehot.columns[-1]] + list(NY_onehot.columns[:-1])
NY_onehot = NY_onehot[fixed_columns]

NY_onehot.head()

Unnamed: 0,Yoga Studio,ATM,Accessories Store,African Restaurant,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,...,Supermarket,Supplement Shop,Sushi Restaurant,Synagogue,Szechuan Restaurant,TV Station,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trail,Tram Station,Tree,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
NY_grouped = NY_onehot.groupby('Neighborhood').mean().reset_index()
NY_grouped

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,African Restaurant,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Dealership,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Basketball Stadium,Bath House,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,...,Supermarket,Supplement Shop,Sushi Restaurant,Synagogue,Szechuan Restaurant,TV Station,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Track Stadium,Trail,Tram Station,Tree,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Astoria,0.0,0.02381,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Battery Park City,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032609,0.0,0.0
2,Bay Ridge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0
3,Bedford-Stuyvesant,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.054545,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.036364,0.0,0.0
4,Boerum Hill,0.02439,0.0,0.0,0.0,0.012195,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.036585,0.012195,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.012195,0.0,0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.0,0.0
5,Brooklyn Heights,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.0,0.01
6,Bushwick,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Carroll Gardens,0.010204,0.0,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.020408,0.030612,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.010204,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,0.0,0.0,0.0,0.0,0.010204,0.020408,0.010204,0.0
8,Chelsea,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222
9,Chinatown,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.0,0.0,0.0
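
Because each venue contributes exactly one 1 across the indicator columns, the per-neighborhood mean of those columns is the share of venues in each category, and every row sums to 1. The toy example below illustrates this with made-up neighborhoods `A` and `B`; the data is invented purely for demonstration.

```python
import pandas as pd

# invented toy data: four venues in A, two in B
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Bakery", "Park", "Park", "Park"],
})

# one indicator column per category, one row per venue
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# averaging the indicators gives the category share within each neighborhood
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Here half of the venues in `A` are bars, so `freq.loc["A", "Bar"]` is 0.5, while every venue in `B` is a park.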


In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # past the 3rd entry, fall back to the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = NY_grouped['Neighborhood']

for ind in np.arange(NY_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(NY_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Astoria,Brewery,Seafood Restaurant,American Restaurant,Bar,Cocktail Bar,Hotel,Coffee Shop,Breakfast Spot,Eastern European Restaurant,Chinese Restaurant
1,Battery Park City,Park,Coffee Shop,Memorial Site,Hotel,Plaza,Wine Shop,Gym,Mexican Restaurant,Food Truck,Sandwich Place
2,Bay Ridge,Chinese Restaurant,Playground,Dessert Shop,Seafood Restaurant,Noodle House,Gift Shop,Rental Car Location,Park,Dive Bar,Nightclub
3,Bedford-Stuyvesant,Coffee Shop,Pizza Place,Café,Bar,Boutique,Fried Chicken Joint,Sandwich Place,Wine Shop,Deli / Bodega,Caribbean Restaurant
4,Boerum Hill,Coffee Shop,Bar,Bakery,Furniture / Home Store,French Restaurant,Dance Studio,Sandwich Place,Spa,Yoga Studio,Gym / Fitness Center
5,Brooklyn Heights,Park,Yoga Studio,Deli / Bodega,Mexican Restaurant,Juice Bar,Italian Restaurant,Ice Cream Shop,Gym,Wine Shop,Bar
6,Bushwick,Café,Pizza Place,Indian Restaurant,Mexican Restaurant,Supermarket,Fast Food Restaurant,Vape Store,Cocktail Bar,Coffee Shop,Burrito Place
7,Carroll Gardens,Italian Restaurant,Coffee Shop,Deli / Bodega,Bar,Pizza Place,Food Truck,Bank,Café,Cocktail Bar,Thai Restaurant
8,Chelsea,Bakery,Pub,Italian Restaurant,French Restaurant,Coffee Shop,Ice Cream Shop,English Restaurant,Park,Burger Joint,Supermarket
9,Chinatown,Chinese Restaurant,Bakery,Bubble Tea Shop,Sandwich Place,Vietnamese Restaurant,Hotpot Restaurant,Spa,Malay Restaurant,Dessert Shop,Cocktail Bar
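
The column labels rely on a three-element suffix list with a `th` fallback, which is correct for the ten columns used here but would produce labels such as "21th" if `num_top_venues` were raised past 20. A minimal sketch of a more general helper is given below; the name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' despite ending in 1, 2, 3
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# rebuild the same ten column labels used in the table above
example_columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(example_columns)
```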


## Running the Clustering Algorithm

In [None]:
kclusters = 4

# drop the label column so only the category frequencies are clustered
NY_grouped_clustering = NY_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NY_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[:] 


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 3], dtype=int32)
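
Before interpreting the clusters it helps to look at how many neighborhoods fall into each one. The sketch below tallies the label array printed above with the standard library; the labels are copied from that output.

```python
from collections import Counter

# cluster labels copied from the kmeans.labels_ output
labels = [
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 3,
]

# number of neighborhoods assigned to each label
cluster_sizes = Counter(labels)
for label in sorted(cluster_sizes):
    print(f"label {label}: {cluster_sizes[label]} neighborhoods")
```

With 49 of the 52 neighborhoods under label 0 and one each under the other labels, the split is very uneven, so it may be worth comparing other values of `kclusters` before drawing firm conclusions.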

In [None]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

NY_merged = NY_neighborhood

# join the cluster labels and top venues onto NY_neighborhood, which holds each neighborhood's latitude/longitude
NY_merged = NY_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# drop neighborhoods that have no cluster label or venue data
NY_merged = NY_merged.dropna()
NY_merged.head() # check the last columns!

In [None]:

map_clusters = folium.Map(location=[latitude_NY, longitude_NY], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(NY_merged['Latitude'], NY_merged['Longitude'], NY_merged['Neighborhood'], NY_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # labels are 0-based, so they index the color list directly
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [None]:
print("Cluster 1")
NY_merged.loc[NY_merged['Cluster Labels'] == 0, NY_merged.columns[[0] + list(range(4, NY_merged.shape[1]))]]


Cluster 1


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Midtown Manhattan,Hotel,Gym,Theater,Boutique,French Restaurant,Coffee Shop,Clothing Store,Sushi Restaurant,Steakhouse,Lebanese Restaurant
1,SoHo,Coffee Shop,Theater,Ice Cream Shop,Hotel,Pizza Place,Arts & Crafts Store,Cocktail Bar,Comic Shop,Bakery,Chocolate Shop
2,Harlem,Clothing Store,Southern / Soul Food Restaurant,Mobile Phone Shop,Cosmetics Shop,African Restaurant,Theater,Pizza Place,Burger Joint,Coffee Shop,Kids Store
3,Upper East Side,Italian Restaurant,Art Museum,Boutique,Coffee Shop,French Restaurant,Outdoor Sculpture,Clothing Store,Women's Store,Hotel,Gift Shop
4,Tribeca,Coffee Shop,Gym / Fitness Center,American Restaurant,Gym,Spa,Plaza,Bakery,Hotel,Bar,Italian Restaurant
5,Upper West Side,Italian Restaurant,Bar,Pizza Place,Wine Bar,Coffee Shop,Indian Restaurant,American Restaurant,Bakery,Middle Eastern Restaurant,Bagel Shop
7,Lower East Side,Mexican Restaurant,Café,Chinese Restaurant,Coffee Shop,Bar,Cocktail Bar,Ice Cream Shop,Sandwich Place,Boutique,Art Gallery
8,Lower Manhattan,Hostel,Tram Station,Polish Restaurant,Public Art,Dumpling Restaurant,Chinese Restaurant,Nightclub,Café,Burrito Place,Burger Joint
9,Greenwich Village,Italian Restaurant,Pizza Place,American Restaurant,Coffee Shop,Bakery,Jazz Club,Sandwich Place,Cocktail Bar,Ice Cream Shop,Indian Restaurant
10,East Village,Japanese Restaurant,Grocery Store,Pizza Place,Coffee Shop,Ice Cream Shop,Vegetarian / Vegan Restaurant,Sushi Restaurant,Chinese Restaurant,Speakeasy,Dessert Shop


In [None]:
print("Cluster 2")
NY_merged.loc[NY_merged['Cluster Labels'] == 1, NY_merged.columns[[0] + list(range(4, NY_merged.shape[1]))]]


Cluster 2


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
38,Inwood,Bar,Food,Hotel,Playground,Event Space,Falafel Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant


In [None]:
print("Cluster 3")
NY_merged.loc[NY_merged['Cluster Labels'] == 2, NY_merged.columns[[0] + list(range(4, NY_merged.shape[1]))]]


Cluster 3


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,Flushing,Korean Restaurant,Pizza Place,Coffee Shop,Mobile Phone Shop,Bank,Bar,Bath House,Fried Chicken Joint,Supermarket,Fish & Chips Shop


In [None]:
print("Cluster 4")
NY_merged.loc[NY_merged['Cluster Labels'] == 3, NY_merged.columns[[0] + list(range(4, NY_merged.shape[1]))]]


Cluster 4


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Williamsburg,Hotel,Hotel Pool,Tourist Information Center,Road,Park,Café,Breakfast Spot,Gift Shop,Bookstore,American Restaurant


# Conclusion:
After running k-means clustering and the other analysis in this notebook, we are of the opinion that the client should live in one of the neighborhoods in Cluster 1. He wants to live somewhere lively, and the neighborhoods in Cluster 1 offer the mix of venues he is looking for, such as restaurants, coffee shops, bars and theaters.
