<h1> Marble Hill vs Chinatown </h1>

<h2> Introduction/Business Problem </h2>

Which of two Manhattan neighborhoods, Marble Hill or Chinatown, has more venues of type Coffee Shop within a radius of 800 meters?

<h2> Data </h2>

The neighborhood data comes from https://geo.nyu.edu/catalog/nyu_2451_34572. This dataset lists the 5 boroughs of New York City, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood. Venue data for each neighborhood is retrieved from the Foursquare API.
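Each neighborhood in this dataset is stored as a GeoJSON feature. The sketch below uses a hand-written feature that mimics only the fields this notebook reads (properties.borough, properties.name and geometry.coordinates). The coordinates are Marble Hill's values as reported later in this notebook, and the rest of the real record is omitted. It shows the one detail that is easy to get wrong: GeoJSON stores a point as [longitude, latitude], not [latitude, longitude].

```python
# Hand-written example feature: only the fields this notebook reads are included.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-73.91066, 40.876551]},
    "properties": {"name": "Marble Hill", "borough": "Manhattan"},
}

borough = feature["properties"]["borough"]
name = feature["properties"]["name"]

# GeoJSON orders point coordinates as [longitude, latitude]
longitude = feature["geometry"]["coordinates"][0]
latitude = feature["geometry"]["coordinates"][1]

print(borough, name, latitude, longitude)
```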

<h2> Methodology </h2>

First, we import the dependencies we need.

In [9]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

import json # library to handle JSON files

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Next, we download the data and load it into a pandas DataFrame.

In [10]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores point coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

Next, we keep only the neighborhoods in Manhattan and look up the latitude and longitude of Marble Hill and Chinatown.

In [11]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

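<table>
<tr><th></th><th>Borough</th><th>Neighborhood</th><th>Latitude</th><th>Longitude</th></tr>
<tr><td>0</td><td>Manhattan</td><td>Marble Hill</td><td>40.876551</td><td>-73.91066</td></tr>
<tr><td>1</td><td>Manhattan</td><td>Chinatown</td><td>40.715618</td><td>-73.994279</td></tr>
<tr><td>2</td><td>Manhattan</td><td>Washington Heights</td><td>40.851903</td><td>-73.9369</td></tr>
<tr><td>3</td><td>Manhattan</td><td>Inwood</td><td>40.867684</td><td>-73.92121</td></tr>
<tr><td>4</td><td>Manhattan</td><td>Hamilton Heights</td><td>40.823604</td><td>-73.949688</td></tr>
</table>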


In [35]:
MarbleHillIndex = manhattan_data[manhattan_data['Neighborhood'] == 'Marble Hill'].index
ChinatownIndex = manhattan_data[manhattan_data['Neighborhood'] == 'Chinatown'].index

MarbleHill_latitude = manhattan_data.loc[MarbleHillIndex[0], 'Latitude'] # neighborhood latitude value
MarbleHill_longitude = manhattan_data.loc[MarbleHillIndex[0], 'Longitude'] # neighborhood longitude value

Chinatown_latitude = manhattan_data.loc[ChinatownIndex[0], 'Latitude'] # neighborhood latitude value
Chinatown_longitude = manhattan_data.loc[ChinatownIndex[0], 'Longitude'] # neighborhood longitude value
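The lookup above indexes the match with [0], so a misspelled neighborhood name surfaces as a bare IndexError. As a hedged alternative, the hypothetical helper get_coordinates below wraps the same boolean-mask lookup and raises a descriptive error instead. It runs on a small stand-in DataFrame (sample_manhattan) built from three rows of the Manhattan preview, not on the full data.

```python
import pandas as pd

# Stand-in for manhattan_data, built from three rows of the preview table
sample_manhattan = pd.DataFrame({
    "Borough": ["Manhattan"] * 3,
    "Neighborhood": ["Marble Hill", "Chinatown", "Washington Heights"],
    "Latitude": [40.876551, 40.715618, 40.851903],
    "Longitude": [-73.91066, -73.994279, -73.9369],
})

def get_coordinates(df, neighborhood):
    """Return (latitude, longitude) for a neighborhood; raise ValueError if it is missing."""
    match = df.loc[df["Neighborhood"] == neighborhood, ["Latitude", "Longitude"]]
    if match.empty:
        raise ValueError("Neighborhood not found: {}".format(neighborhood))
    # use the first matching row, as the notebook does with index[0]
    row = match.iloc[0]
    return row["Latitude"], row["Longitude"]

marble_hill_coords = get_coordinates(sample_manhattan, "Marble Hill")
chinatown_coords = get_coordinates(sample_manhattan, "Chinatown")
print(marble_hill_coords, chinatown_coords)
```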

We then define the Foursquare API credentials (CLIENT_ID and CLIENT_SECRET) and the API version (VERSION).

In [16]:
# The code was removed by Watson Studio for sharing.
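Because the original cell was removed, here is a hedged sketch of what it needs to define so that the later cells run. It reads the keys from environment variables whose names (FOURSQUARE_CLIENT_ID, FOURSQUARE_CLIENT_SECRET) are hypothetical, falling back to obvious placeholders. This avoids pasting secrets into a shared notebook. The Foursquare v2 "v" parameter is a version date in YYYYMMDD form; the specific date used here is only an example.

```python
import os

# Hypothetical environment variable names; the original credentials were removed before sharing.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")

# Version date in YYYYMMDD form (example value, not necessarily the one the author used)
VERSION = "20180605"
```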

Now we retrieve the venues within 800 meters of Marble Hill and Chinatown from the Foursquare explore endpoint.

In [42]:
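# function that extracts the category of the venue
def get_category_type(row):
    # rows may carry the categories under 'categories' or, after flattening, 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # use the first listed category as the venue's type
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']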

In [50]:
radius = 800 # search radius in meters

def get_nearby_venues(latitude, longitude, radius):
    # query the Foursquare explore endpoint around (latitude, longitude)
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        latitude, 
        longitude, 
        radius)
    results = requests.get(url).json()

    # flatten the JSON list of venue items into a DataFrame
    venues = json_normalize(results['response']['groups'][0]['items'])

    # filter columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
    venues = venues.loc[:, filtered_columns]

    # keep only the first category name for each venue
    venues['venue.categories'] = venues.apply(get_category_type, axis=1)

    # clean columns: 'venue.location.lat' -> 'lat', and so on
    venues.columns = [col.split(".")[-1] for col in venues.columns]
    return venues

MarbleHill_venues = get_nearby_venues(MarbleHill_latitude, MarbleHill_longitude, radius)
Chinatown_venues = get_nearby_venues(Chinatown_latitude, Chinatown_longitude, radius)
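Calling the API requires valid credentials, so the flattening step can be hard to inspect on its own. The sketch below runs the same transformation on a hypothetical, heavily trimmed explore response that contains only the keys this notebook reads. The venue names, categories and coordinates are taken from the Marble Hill preview that follows. It uses pd.json_normalize, which turns nested keys into dotted column names, and a lambda that plays the role of get_category_type.

```python
import pandas as pd

# Hypothetical, heavily trimmed explore response with only the keys the notebook reads
results = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Starbucks",
                           "categories": [{"name": "Coffee Shop"}],
                           "location": {"lat": 40.877531, "lng": -73.905582}}},
                {"venue": {"name": "Arturo's",
                           "categories": [{"name": "Pizza Place"}],
                           "location": {"lat": 40.874412, "lng": -73.910271}}},
            ]
        }]
    }
}

items = results["response"]["groups"][0]["items"]
# nested keys become dotted column names, e.g. "venue.location.lat"
venues = pd.json_normalize(items)
venues = venues.loc[:, ["venue.name", "venue.categories", "venue.location.lat", "venue.location.lng"]]

# keep only the first category name, as get_category_type does
venues["venue.categories"] = venues["venue.categories"].apply(
    lambda cats: cats[0]["name"] if len(cats) > 0 else None)

# strip the dotted prefixes from the column names
venues.columns = [col.split(".")[-1] for col in venues.columns]
print(venues)
```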

Below are the first five venues returned for Marble Hill and for Chinatown.

In [46]:
MarbleHill_venues.head()

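<table>
<tr><th></th><th>name</th><th>categories</th><th>lat</th><th>lng</th></tr>
<tr><td>0</td><td>Arturo's</td><td>Pizza Place</td><td>40.874412</td><td>-73.910271</td></tr>
<tr><td>1</td><td>Bikram Yoga</td><td>Yoga Studio</td><td>40.876844</td><td>-73.906204</td></tr>
<tr><td>2</td><td>Tibbett Diner</td><td>Diner</td><td>40.880404</td><td>-73.908937</td></tr>
<tr><td>3</td><td>Sam's Pizza</td><td>Pizza Place</td><td>40.879435</td><td>-73.905859</td></tr>
<tr><td>4</td><td>Starbucks</td><td>Coffee Shop</td><td>40.877531</td><td>-73.905582</td></tr>
</table>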


In [44]:
Chinatown_venues.head()

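<table>
<tr><th></th><th>name</th><th>categories</th><th>lat</th><th>lng</th></tr>
<tr><td>0</td><td>Cheeky Sandwiches</td><td>Sandwich Place</td><td>40.715821</td><td>-73.99183</td></tr>
<tr><td>1</td><td>Kiki's</td><td>Greek Restaurant</td><td>40.714476</td><td>-73.992036</td></tr>
<tr><td>2</td><td>Bar Belly</td><td>Cocktail Bar</td><td>40.715135</td><td>-73.991802</td></tr>
<tr><td>3</td><td>Hotel 50 Bowery NYC</td><td>Hotel</td><td>40.715936</td><td>-73.996789</td></tr>
<tr><td>4</td><td>Scarr's Pizza</td><td>Pizza Place</td><td>40.715335</td><td>-73.991649</td></tr>
</table>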
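As an optional sanity check that is not part of the original analysis, the haversine formula gives the great-circle distance between a neighborhood's reference point and a returned venue. This confirms the venue lies inside the 800-meter search radius. The sketch assumes a spherical Earth with a mean radius of 6,371 km and checks the Starbucks row from the Marble Hill preview.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, assuming a spherical Earth of radius 6,371 km."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Marble Hill reference point and the Starbucks venue from the preview
d = haversine_m(40.876551, -73.91066, 40.877531, -73.905582)
print("{:.0f} m".format(d))  # expected to be well inside the 800 m radius
```

The same function could be applied row by row to a venues DataFrame (for example with apply and axis=1) to check every returned venue at once.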


<h2> Results </h2>

Now we keep only the venues of type Coffee Shop and count them for each neighborhood.

In [55]:
MarbleHill_coffee = MarbleHill_venues[MarbleHill_venues['categories'] == 'Coffee Shop']
Chinatown_coffee = Chinatown_venues[Chinatown_venues['categories'] == 'Coffee Shop']

print('Number of Coffee Shops in Marble Hill: {}'.format(MarbleHill_coffee.shape[0]))
print('Number of Coffee Shops in Chinatown: {}'.format(Chinatown_coffee.shape[0]))

Number of Coffee Shops in Marble Hill: 2
Number of Coffee Shops in Chinatown: 0
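An alternative way to produce the same kind of comparison is to stack the venues of both neighborhoods into a single DataFrame with a Neighborhood column, flag Coffee Shop rows, and sum the flags per neighborhood. A neighborhood with no matches still appears with a count of 0, and the approach extends to more than two neighborhoods without duplicating code. The sketch uses only the ten preview rows shown earlier, so its counts are not the full result reported above.

```python
import pandas as pd

# Only the ten preview rows shown earlier, so these counts are NOT the full result
sample = pd.DataFrame({
    "Neighborhood": ["Marble Hill"] * 5 + ["Chinatown"] * 5,
    "categories": ["Pizza Place", "Yoga Studio", "Diner", "Pizza Place", "Coffee Shop",
                   "Sandwich Place", "Greek Restaurant", "Cocktail Bar", "Hotel", "Pizza Place"],
})

# boolean flag per venue, summed per neighborhood (neighborhoods with no match get 0)
coffee_counts = (sample["categories"] == "Coffee Shop").groupby(sample["Neighborhood"]).sum()
print(coffee_counts)
```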


<h2> Discussion </h2>

As the counts above show, Marble Hill has more venues of type Coffee Shop than Chinatown within the 800-meter radius (2 versus 0).

<h2> Conclusion </h2>

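Marble Hill having more venues of type Coffee Shop than Chinatown does not mean it has more places to buy coffee: venues in other categories can also sell coffee and effectively act as coffee shops. In addition, the Foursquare explore endpoint returns a list of recommended venues, which is not necessarily every venue within the radius, so the counts reflect only the venues the API returned.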
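One way to follow up on this limitation is to count a broader set of categories instead of the single Coffee Shop label. The sketch below assumes a hypothetical set COFFEE_LIKE of category names that sell coffee, and small made-up venues. In practice the set would be chosen after inspecting the categories the API actually returns, for example with value_counts on the categories column.

```python
import pandas as pd

# Hypothetical set of categories assumed to sell coffee; adjust after inspecting real data
COFFEE_LIKE = {"Coffee Shop", "Café", "Bakery", "Donut Shop"}

# Made-up venues for illustration only
sample = pd.DataFrame({
    "name": ["Venue A", "Venue B", "Venue C"],
    "categories": ["Coffee Shop", "Café", "Pizza Place"],
})

# keep every venue whose category is in the broader set
coffee_like = sample[sample["categories"].isin(COFFEE_LIKE)]
print(len(coffee_like))
```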