# Introduction

New York City is one of the largest cities in the world, with about 8 million people. It has five boroughs: Manhattan, Brooklyn, Queens, the Bronx, and Staten Island. As an NYC resident, I decided to explore the city in this project. One interesting question is: if one wants to purchase an apartment in New York City, which neighborhoods are the best to consider? The tradeoff is usually between price and convenience. Centrally located neighborhoods with many restaurants and grocery stores are often more expensive, while more affordable neighborhoods tend to be farther from the city center and less convenient, with fewer shops and restaurants.

With this in mind, I will rank neighborhoods by their housing price, distance to the city center, number of restaurants, and number of grocery stores. For simplicity, I will only consider neighborhoods in Manhattan.

# Data

The data used to explore this question come from the following sources:

**1. Housing price data**

Housing price data are downloaded from the StreetEasy Data Dashboard: https://streeteasy.com/blog/data-dashboard/. It provides downloadable CSV files of the price index, median sale price, and median asking price for each neighborhood. Because the median sale price is missing for any month in which a neighborhood had no sales, I use the median asking price as the gauge of housing price. The data are monthly time series, and I use only the latest available month, December 2019.

**2. Venue data**

I use the Foursquare API to get venue data, such as the number of restaurants and grocery stores in each neighborhood.

**3. Distance data**

I use Google Maps to get each neighborhood's distance to the city center. The official New York City coordinates are those of City Hall, but most professionals work around Midtown, and distance to the center matters mainly for the work commute, so I measure distance to Midtown Manhattan at 40.7549° N, 73.9840° W.
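
As a rough cross-check on the Google Maps figures, the straight-line distance to the Midtown reference point can be computed from coordinates alone. The sketch below uses the haversine formula, which is a substitute technique and not the method used in this report; the function name is hypothetical, and the conversion to miles is an assumption because the source file does not state its unit.

```python
import math

# Midtown Manhattan reference point used in this report.
MIDTOWN = (40.7549, -73.9840)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    km = 2 * earth_radius_km * math.asin(math.sqrt(a))
    return km / 1.609344  # kilometers to miles

# Battery Park City coordinates as geocoded later in this report.
battery_park_city = (40.711017, -74.016937)
distance = haversine_miles(*battery_park_city, *MIDTOWN)
print(round(distance, 1))
```

A straight line is a lower bound on any route, so a value somewhat below the tabulated 3.8 for Battery Park City would be expected if the table is in miles.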

# Methodology

## Import Data

I download the Manhattan median asking price by neighborhood from StreetEasy and combine it with the distance-to-center data from Google Maps in one CSV file:

In [138]:
import pandas as pd
import numpy as np

In [139]:
df=pd.read_csv('neighborhoods.csv')
df

Unnamed: 0,Neighborhood,Price,DistanceToCenter
0,Battery Park City,1340000.0,3.8
1,Central Harlem,980000.0,4.5
2,Central Park South,2345000.0,0.9
3,Chelsea,2275000.0,1.4
4,Chinatown,1995000.0,3.0
5,Civic Center,,3.1
6,East Harlem,950000.0,4.3
7,East Village,1250000.0,2.4
8,Financial District,1397000.0,3.7
9,Flatiron,2695000.0,1.1


The file covers 32 Manhattan neighborhoods (the first ten rows are shown above). Let's drop the neighborhoods that have missing values:

In [140]:
df.dropna(inplace=True)
df

Unnamed: 0,Neighborhood,Price,DistanceToCenter
0,Battery Park City,1340000.0,3.8
1,Central Harlem,980000.0,4.5
2,Central Park South,2345000.0,0.9
3,Chelsea,2275000.0,1.4
4,Chinatown,1995000.0,3.0
6,East Harlem,950000.0,4.3
7,East Village,1250000.0,2.4
8,Financial District,1397000.0,3.7
9,Flatiron,2695000.0,1.1
10,Gramercy Park,1715000.0,1.5
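
Before dropping rows, it can help to see exactly which neighborhoods are about to disappear. The following is a minimal sketch on three rows copied from the table above; the variable names `sample`, `missing`, and `clean` are illustrative only.

```python
import numpy as np
import pandas as pd

# Three rows copied from the table above; Civic Center has no asking price.
sample = pd.DataFrame({
    'Neighborhood': ['Chinatown', 'Civic Center', 'East Harlem'],
    'Price': [1995000.0, np.nan, 950000.0],
    'DistanceToCenter': [3.0, 3.1, 4.3],
})

# List the rows that dropna() is about to remove.
missing = sample[sample.isna().any(axis=1)]
print(missing['Neighborhood'].tolist())

# Return a new frame instead of mutating in place, and renumber the index.
clean = sample.dropna(subset=['Price']).reset_index(drop=True)
print(len(clean))
```

Resetting the index is optional; it only avoids the gap in row labels (for example, the jump from 4 to 6) that appears in the output above.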


## Get the latitude and longitude coordinates of each neighborhood

In [141]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [142]:
neighborhoods=df['Neighborhood'].tolist()
geolocator = Nominatim(user_agent="newyork_explorer")
latitude=[]
longitude=[]
for n in neighborhoods:
    address = n+', New York'
    location = geolocator.geocode(address)
    latitude.append(location.latitude)
    longitude.append(location.longitude)
df['Latitude']=latitude
df['Longitude']=longitude
df

Unnamed: 0,Neighborhood,Price,DistanceToCenter,Latitude,Longitude
0,Battery Park City,1340000.0,3.8,40.711017,-74.016937
1,Central Harlem,980000.0,4.5,40.807879,-73.945415
2,Central Park South,2345000.0,0.9,40.76775,-73.981318
3,Chelsea,2275000.0,1.4,40.746491,-74.001528
4,Chinatown,1995000.0,3.0,40.716491,-73.99625
6,East Harlem,950000.0,4.3,40.794722,-73.9425
7,East Village,1250000.0,2.4,40.729269,-73.987361
8,Financial District,1397000.0,3.7,40.707612,-74.009378
9,Flatiron,2695000.0,1.1,40.741059,-73.989642
10,Gramercy Park,1715000.0,1.5,40.737925,-73.985932
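
The loop above assumes every lookup succeeds, but a geocoder can return `None` when it finds no match, which would raise an `AttributeError` on `location.latitude`. The sketch below shows one way to guard against that and to pause between requests. The helper `geocode_all` and the stub `fake_geocode` are hypothetical; in the notebook the real `geolocator.geocode` would be passed in place of the stub.

```python
import time
from collections import namedtuple

def geocode_all(names, geocode, suffix=', New York', delay_seconds=1.0):
    """Geocode each name and return (found, not_found).

    `geocode` is any callable that takes an address string and returns an
    object with .latitude and .longitude, or None when nothing matches.
    """
    found, not_found = {}, []
    for i, name in enumerate(names):
        if i > 0:
            time.sleep(delay_seconds)  # pause between consecutive requests
        location = geocode(name + suffix)
        if location is None:
            not_found.append(name)  # record the miss instead of crashing
        else:
            found[name] = (location.latitude, location.longitude)
    return found, not_found

# Offline stand-in for a geocoder, so the example runs without network access.
Location = namedtuple('Location', ['latitude', 'longitude'])

def fake_geocode(address):
    table = {'Chelsea, New York': Location(40.746491, -74.001528)}
    return table.get(address)

found, not_found = geocode_all(['Chelsea', 'Nowhereville'], fake_geocode, delay_seconds=0)
print(found)
print(not_found)
```

Any name that lands in `not_found` can then be fixed by hand (for example, with a more specific address) before the coordinates are written back to the DataFrame.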


## Create a map of Manhattan with the neighborhoods superimposed on top

In [143]:
import folium
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[40.762268,-73.979544], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

## Foursquare

Now we can find venues in each neighborhood using the Foursquare API. First, let's define the Foursquare credentials and API version:

In [144]:
CLIENT_ID = 'OKIBPTJLJIFBJDZR3NN0CYN0YWG35ANBIEGDBQWDOQCWN10G' # your Foursquare ID
CLIENT_SECRET = 'NC4WTAGQX23WZ1UYSK5YYDEHVV1L2PKXDKFTK4VYWOZRT5MV' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: OKIBPTJLJIFBJDZR3NN0CYN0YWG35ANBIEGDBQWDOQCWN10G
CLIENT_SECRET: NC4WTAGQX23WZ1UYSK5YYDEHVV1L2PKXDKFTK4VYWOZRT5MV
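
Hard-coding and printing API credentials in a shared notebook exposes them to every reader. One common alternative, sketched here under assumptions, is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, and the demo values are placeholders.

```python
import os

def load_foursquare_credentials(environ=os.environ):
    """Read the client ID and secret from environment variables.

    The variable names are hypothetical; any names can be used.
    """
    client_id = environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = environ.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.')
    return client_id, client_secret

def mask(secret, visible=4):
    """Keep only the first few characters when printing a secret."""
    return secret[:visible] + '*' * (len(secret) - visible)

# Demonstration with a stand-in dictionary instead of the real environment.
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'demo-id-123456',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret-abcdef',
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(client_id))
```

In the notebook, calling `load_foursquare_credentials()` with no argument would read the real environment instead of the demo dictionary.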


In [145]:
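# Create a function to explore venues.
# It relies on `requests` and the global LIMIT, both of which are set in the next cell before the first call.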
def getNearbyVenues(names, latitudes, longitudes, category, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            category,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
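
The function above indexes straight into the JSON response, so a response without groups, or a venue without any category, would raise a `KeyError` or `IndexError`. The sketch below shows a more defensive way to read the same nesting. The helper `extract_venues` is hypothetical, and `fake_payload` is a hand-written stand-in that only mirrors the keys accessed above; it is not real API output.

```python
def extract_venues(payload, neighborhood):
    """Turn one explore-style JSON payload into (neighborhood, venue, category) rows.

    Missing keys and venues without categories are tolerated instead of
    raising KeyError or IndexError.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((neighborhood, venue.get('name'), category))
    return rows

# Hand-written stand-in payload with the same nesting that the function above reads.
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A', 'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Venue B', 'categories': []}},
]}]}}

rows = extract_venues(fake_payload, 'Chelsea')
empty = extract_venues({}, 'Chelsea')
print(rows)
print(empty)
```

With this shape, a neighborhood whose request fails or returns nothing simply contributes no rows, and the loop over the remaining neighborhoods can continue.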

## Find number of restaurants in each neighborhood

In [146]:
import requests
# run the above function on each neighborhood and create a new dataframe 
LIMIT = 200 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
food_category='4d4b7105d754a06374d81259' # 'Root' category for all food-related venues
nyc_restaurants = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   category=food_category,
                                   radius=radius
                                  )
nyc_restaurants.shape

Battery Park City
Central Harlem
Central Park South
Chelsea
Chinatown
East Harlem
East Village
Financial District
Flatiron
Gramercy Park
Greenwich Village
Hamilton Heights
Inwood
Lower East Side
Midtown
Midtown East
Midtown South
Midtown West
Morningside Heights
Nolita
Roosevelt Island
Soho
Tribeca
Upper East Side
Upper West Side
Washington Heights
West Harlem
West Village


(2233, 7)

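The query returns 2,233 food venues in total, each within 500 m of a neighborhood's coordinates, so this is a sample rather than a full count of Manhattan restaurants. Several neighborhoods return exactly 100 venues, which suggests a per-request cap, so a count of 100 is best read as a lower bound. Now let's calculate the number of restaurants in each neighborhood.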

In [147]:
manhattan=pd.merge(df,nyc_restaurants.groupby('Neighborhood',as_index=False).count()[['Neighborhood','Venue']])
manhattan.rename(columns={'Venue':'NumRestaurant'},inplace=True)
manhattan

Unnamed: 0,Neighborhood,Price,DistanceToCenter,Latitude,Longitude,NumRestaurant
0,Battery Park City,1340000.0,3.8,40.711017,-74.016937,70
1,Central Harlem,980000.0,4.5,40.807879,-73.945415,68
2,Central Park South,2345000.0,0.9,40.76775,-73.981318,100
3,Chelsea,2275000.0,1.4,40.746491,-74.001528,66
4,Chinatown,1995000.0,3.0,40.716491,-73.99625,100
5,East Harlem,950000.0,4.3,40.794722,-73.9425,49
6,East Village,1250000.0,2.4,40.729269,-73.987361,100
7,Financial District,1397000.0,3.7,40.707612,-74.009378,100
8,Flatiron,2695000.0,1.1,40.741059,-73.989642,100
9,Gramercy Park,1715000.0,1.5,40.737925,-73.985932,100


## Find number of grocery stores in each neighborhood

In [148]:
grocery_category='4bf58dd8d48988d118951735' # category for grocery stores from https://developer.foursquare.com/docs/resources/categories
nyc_grocery = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   category=grocery_category
                                  )
nyc_grocery.shape

Battery Park City
Central Harlem
Central Park South
Chelsea
Chinatown
East Harlem
East Village
Financial District
Flatiron
Gramercy Park
Greenwich Village
Hamilton Heights
Inwood
Lower East Side
Midtown
Midtown East
Midtown South
Midtown West
Morningside Heights
Nolita
Roosevelt Island
Soho
Tribeca
Upper East Side
Upper West Side
Washington Heights
West Harlem
West Village


(310, 7)

The query returns 310 grocery stores in total across the neighborhoods. Now let's calculate the number of grocery stores in each neighborhood.

In [149]:
manhattan2=pd.merge(manhattan,nyc_grocery.groupby('Neighborhood',as_index=False).count()[['Neighborhood','Venue']])
manhattan2.rename(columns={'Venue':'NumGrocery'},inplace=True)
manhattan2

Unnamed: 0,Neighborhood,Price,DistanceToCenter,Latitude,Longitude,NumRestaurant,NumGrocery
0,Battery Park City,1340000.0,3.8,40.711017,-74.016937,70,7
1,Central Harlem,980000.0,4.5,40.807879,-73.945415,68,5
2,Central Park South,2345000.0,0.9,40.76775,-73.981318,100,7
3,Chelsea,2275000.0,1.4,40.746491,-74.001528,66,10
4,Chinatown,1995000.0,3.0,40.716491,-73.99625,100,65
5,East Harlem,950000.0,4.3,40.794722,-73.9425,49,14
6,East Village,1250000.0,2.4,40.729269,-73.987361,100,15
7,Financial District,1397000.0,3.7,40.707612,-74.009378,100,12
8,Flatiron,2695000.0,1.1,40.741059,-73.989642,100,11
9,Gramercy Park,1715000.0,1.5,40.737925,-73.985932,100,10


In [150]:
manhattan2.describe()

Unnamed: 0,Price,DistanceToCenter,Latitude,Longitude,NumRestaurant,NumGrocery
count,28.0,28.0,28.0,28.0,28.0,28.0
mean,1779359.0,2.957143,40.761105,-73.978035,79.75,11.071429
std,1031887.0,2.012264,0.042684,0.025049,25.383904,13.213149
min,487500.0,0.0,40.707612,-74.016937,14.0,1.0
25%,1061750.0,1.475,40.727672,-73.996329,64.0,5.0
50%,1529025.0,2.5,40.754832,-73.985092,100.0,7.0
75%,2131250.0,3.925,40.788965,-73.959432,100.0,11.25
max,4750000.0,8.9,40.869258,-73.920495,100.0,65.0


## Rank neighborhoods

We now have complete data for each neighborhood: housing price, distance to center, number of restaurants, and number of grocery stores. Next we rank the neighborhoods on these four metrics. I first rank the neighborhoods on each metric separately: lower is better for price and distance, and higher is better for the restaurant and grocery counts. Then, assuming equal weight for each metric, I sum the four ranks into an aggregate score. The smaller the aggregate score, the higher a neighborhood ranks overall.

In [151]:
#Create rank columns for each metric
manhattan2['RankPrice']=manhattan2['Price'].rank()
manhattan2['RankDistance']=manhattan2['DistanceToCenter'].rank()
manhattan2['RankRes']=manhattan2['NumRestaurant'].rank(ascending=False)
manhattan2['RankGrocery']=manhattan2['NumGrocery'].rank(ascending=False)
manhattan2

Unnamed: 0,Neighborhood,Price,DistanceToCenter,Latitude,Longitude,NumRestaurant,NumGrocery,RankPrice,RankDistance,RankRes,RankGrocery
0,Battery Park City,1340000.0,3.8,40.711017,-74.016937,70,7,11.0,21.0,17.0,15.5
1,Central Harlem,980000.0,4.5,40.807879,-73.945415,68,5,7.0,24.0,18.5,22.0
2,Central Park South,2345000.0,0.9,40.76775,-73.981318,100,7,24.0,4.0,8.0,15.5
3,Chelsea,2275000.0,1.4,40.746491,-74.001528,66,10,23.0,7.0,20.0,10.5
4,Chinatown,1995000.0,3.0,40.716491,-73.99625,100,65,20.0,16.5,8.0,1.0
5,East Harlem,950000.0,4.3,40.794722,-73.9425,49,14,5.5,22.5,24.5,5.0
6,East Village,1250000.0,2.4,40.729269,-73.987361,100,15,10.0,11.0,8.0,4.0
7,Financial District,1397000.0,3.7,40.707612,-74.009378,100,12,12.0,20.0,8.0,7.0
8,Flatiron,2695000.0,1.1,40.741059,-73.989642,100,11,25.0,6.0,8.0,8.5
9,Gramercy Park,1715000.0,1.5,40.737925,-73.985932,100,10,16.0,8.0,8.0,10.5
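
The fractional ranks in the table above (for example 15.5, or 8.0 for every neighborhood with 100 restaurants) come from how pandas handles ties: by default `rank` assigns tied values the average of the positions they occupy. A rank of 8.0 for the top count is consistent with fifteen neighborhoods tying for positions 1 through 15. The toy series below, with made-up labels A to E, illustrates the default next to the `method='min'` alternative.

```python
import pandas as pd

# Three values tie for the top spot, as happens with the capped restaurant counts.
counts = pd.Series([100, 100, 100, 70, 49], index=['A', 'B', 'C', 'D', 'E'])

average_ranks = counts.rank(ascending=False)            # default method='average'
min_ranks = counts.rank(ascending=False, method='min')  # tied values share the best position

print(average_ranks.tolist())  # [2.0, 2.0, 2.0, 4.0, 5.0]
print(min_ranks.tolist())      # [1.0, 1.0, 1.0, 4.0, 5.0]
```

The choice matters when ranks are summed: with the average method, a large tie at the top contributes a mid-sized rank to every tied neighborhood, which compresses the differences on that metric.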


In [152]:
#Create overall rank by adding all four ranks
manhattan2['OverallRank']=manhattan2[['RankPrice','RankDistance','RankRes','RankGrocery']].sum(axis=1)
manhattan2

Unnamed: 0,Neighborhood,Price,DistanceToCenter,Latitude,Longitude,NumRestaurant,NumGrocery,RankPrice,RankDistance,RankRes,RankGrocery,OverallRank
0,Battery Park City,1340000.0,3.8,40.711017,-74.016937,70,7,11.0,21.0,17.0,15.5,64.5
1,Central Harlem,980000.0,4.5,40.807879,-73.945415,68,5,7.0,24.0,18.5,22.0,71.5
2,Central Park South,2345000.0,0.9,40.76775,-73.981318,100,7,24.0,4.0,8.0,15.5,51.5
3,Chelsea,2275000.0,1.4,40.746491,-74.001528,66,10,23.0,7.0,20.0,10.5,60.5
4,Chinatown,1995000.0,3.0,40.716491,-73.99625,100,65,20.0,16.5,8.0,1.0,45.5
5,East Harlem,950000.0,4.3,40.794722,-73.9425,49,14,5.5,22.5,24.5,5.0,57.5
6,East Village,1250000.0,2.4,40.729269,-73.987361,100,15,10.0,11.0,8.0,4.0,33.0
7,Financial District,1397000.0,3.7,40.707612,-74.009378,100,12,12.0,20.0,8.0,7.0,47.0
8,Flatiron,2695000.0,1.1,40.741059,-73.989642,100,11,25.0,6.0,8.0,8.5,47.5
9,Gramercy Park,1715000.0,1.5,40.737925,-73.985932,100,10,16.0,8.0,8.0,10.5,42.5


Sorting by the aggregate score in ascending order gives the final ranking of the neighborhoods:

In [153]:
#sort by overall rank
manhattan2.sort_values(by=['OverallRank'],inplace=True)
manhattan2

Unnamed: 0,Neighborhood,Price,DistanceToCenter,Latitude,Longitude,NumRestaurant,NumGrocery,RankPrice,RankDistance,RankRes,RankGrocery,OverallRank
6,East Village,1250000.0,2.4,40.729269,-73.987361,100,15,10.0,11.0,8.0,4.0,33.0
17,Midtown West,1492500.0,0.9,40.764423,-73.992392,100,7,14.0,4.0,8.0,15.5,41.5
13,Lower East Side,1397500.0,3.3,40.715936,-73.986806,100,16,13.0,18.0,8.0,3.0,42.0
9,Gramercy Park,1715000.0,1.5,40.737925,-73.985932,100,10,16.0,8.0,8.0,10.5,42.5
15,Midtown East,950000.0,0.9,40.759822,-73.972471,100,3,5.5,4.0,8.0,25.0,42.5
16,Midtown South,1565550.0,0.3,40.749842,-73.984251,100,6,15.0,2.0,8.0,19.0,44.0
4,Chinatown,1995000.0,3.0,40.716491,-73.99625,100,65,20.0,16.5,8.0,1.0,45.5
7,Financial District,1397000.0,3.7,40.707612,-74.009378,100,12,12.0,20.0,8.0,7.0,47.0
8,Flatiron,2695000.0,1.1,40.741059,-73.989642,100,11,25.0,6.0,8.0,8.5,47.5
10,Greenwich Village,1975000.0,1.8,40.73198,-73.996566,100,9,19.0,9.0,8.0,12.0,48.0


The top 5 neighborhoods to consider when purchasing an apartment are East Village, Midtown West, Lower East Side, Gramercy Park, and Midtown East. We can visualize these top choices on the map:

In [154]:
# create map of New York using latitude and longitude values
map_manhattan = folium.Map(location=[40.762268,-73.979544], zoom_start=12)

# add markers for the top 5 neighborhoods to the map
for lat, lng in zip(manhattan2['Latitude'].head(5), manhattan2['Longitude'].head(5)):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.5,
        parse_html=False).add_to(map_manhattan)  

map_manhattan

# Results and Discussion

In the analysis above, we collect four metrics for the Manhattan neighborhoods: median asking price, distance to the city center, number of restaurants, and number of grocery stores. The source file lists 32 neighborhoods, of which 28 have complete data and are ranked. We rank the neighborhoods on each of the four metrics, which we consider the most relevant to an apartment purchase, and then combine the four ranks with equal weights into an overall rank. The top 5 neighborhoods in Manhattan are East Village, Midtown West, Lower East Side, Gramercy Park, and Midtown East. Overall, these candidates offer the best balance of affordability, central location, and abundance of restaurants and grocery stores.

Of course, we can add more metrics to evaluate the neighborhoods, depending on the target client's preferences. For example, we can add the number of nearby subway stations to represent commuting convenience, the number of movie theaters for moviegoers, the number of gyms for athletic people, the number of bars to represent nightlife, or school zone ratings for families with school-aged children.

In addition, we can apply different weights to the selected metrics. One can adjust the weight of each metric up or down according to one's own preferences. For example, if the target client is more sensitive to housing price, we can put a higher weight on the price metric. This makes the model flexible.
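
A minimal sketch of such a weighted score follows. The ranks for the three neighborhoods are copied from the ranking table in the Methodology section; the function `weighted_score` and the price weight of 4 are hypothetical choices made only to show the effect.

```python
import pandas as pd

# Per-metric ranks copied from the ranking table for three neighborhoods.
ranks = pd.DataFrame(
    {
        'RankPrice': [10.0, 14.0, 5.5],
        'RankDistance': [11.0, 4.0, 4.0],
        'RankRes': [8.0, 8.0, 8.0],
        'RankGrocery': [4.0, 15.5, 25.0],
    },
    index=['East Village', 'Midtown West', 'Midtown East'],
)

def weighted_score(rank_table, weights):
    """Multiply each rank column by its weight and sum across columns."""
    w = pd.Series(weights)
    return rank_table[w.index].mul(w, axis='columns').sum(axis=1)

# Equal weights reproduce the aggregate scores used in this report.
equal = weighted_score(ranks, {'RankPrice': 1, 'RankDistance': 1, 'RankRes': 1, 'RankGrocery': 1})

# A price-sensitive buyer: price counts four times as much as each other metric.
price_heavy = weighted_score(ranks, {'RankPrice': 4, 'RankDistance': 1, 'RankRes': 1, 'RankGrocery': 1})

print(equal.to_dict())
print(price_heavy.sort_values().to_dict())
```

With equal weights East Village leads with 33.0, while under the price-heavy weights Midtown East (59.0) moves ahead of East Village (63.0), which illustrates how the ordering depends on the buyer's priorities.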

# Conclusion

In this project, we try to answer the question: which neighborhoods are the best to consider when purchasing an apartment in Manhattan? We present a simple approach based on four metrics: housing price, distance to city center, number of restaurants, and number of grocery stores. Using Foursquare API data, we count the restaurants and grocery stores in each neighborhood. We then rank the neighborhoods on each metric and sum the ranks to get an overall score. We find that the best neighborhood options are East Village, Midtown West, Lower East Side, Gramercy Park, and Midtown East.

The model is flexible and can easily be modified according to the stakeholder's preferences. Using the same framework, we can collect data on different metrics and apply different weights to each metric. As an extension of the project, we can also cluster the neighborhoods to find commonalities among them.