# The Battle of Neighbourhoods 

## Introduction and context
In this (hypothetical) problem, a friend of mine has quit his Wall Street job after working for a bank for 10+ years. Wanting a change from corporate culture and having a bit of spare money to invest, but unable to leave the City, he’s looking into opening a Hungarian restaurant in New York. While most of his friends think this is risky at best and borderline madness at worst, New York is probably the best place to open a restaurant with a lesser-known and somewhat less health-conscious cuisine.

New York has more than 8.5 million people, and the greater metropolitan area’s population is more than 23 million. Its neighborhoods are home to an extremely wide variety of places to eat, from simple eateries and hole-in-the-wall places up to 2-3 Michelin star fine dining restaurants. According to the New York City Department of Health and Statista (https://www.statista.com/statistics/259776/number-of-people-who-went-to-restaurants-in-new-york-by-type/), there were more than 26,000 restaurants in the City in 2017, which gives a glimpse of hope that yet another obscure place can make ends meet.

The ask was to analyze where it may make sense to open such a restaurant. New York is very diverse, with no obvious concentration of cuisines (apart from Chinatown and some Flushing Meadows districts), so further analysis is needed to see whether smaller cuisines, or perhaps Eastern European restaurants as a group, are concentrated in particular places.

### Business problem
There are a few ways to approach this problem. I’ll take three here; based on the later data analysis it may (or may not) be possible to pick one. The approaches are:

#### Chinatown approach
New York’s Chinatown is one of the largest of its kind, with a massive number of prospering restaurants. These are visited not only by residents and people of Chinese origin; the area is famous among visitors and tourists as well. This approach assumes that if there are areas with a concentration of Hungarian and, in a broader sense, Eastern European restaurants, another one can still fit in, as people visit these areas to eat a particular dish. As Hungarians are a small portion of the city's population and the cuisine is lesser known, it makes sense to widen the search to similar cuisines as well (admittedly with a subjective list).

#### Go against the current
This is the direct opposite: seeing if there are places with no or very few similar restaurants. In other cities this may be a plainly bad approach, as purely residential areas, suburbs or industrial districts would not be a good fit, but NYC, especially Manhattan, is so packed with restaurants that this may not be an issue there. However, if such an area is found, it makes sense to check how many restaurants it has overall and how concentrated they are.

#### Go with the flow
This approach simply looks at the areas with the highest concentration of restaurants (assuming concentration is proportional to variety) and recommends setting up where there are already a lot, as people go there to eat and the neighbourhood is well established from this point of view.

More specifically, the exercise will answer the following questions:
- What are the particular areas with a high concentration of Hungarian restaurants?
- In a more generic sense, what are the areas with a high concentration of Eastern European (Hungarian, Czech, Slovakian, Polish, Romanian) cuisines?
- Which areas do not have Eastern European restaurants?
- What are the areas with the highest concentration of restaurants?

## Description of the data and how it’ll be used
I will essentially be using the same data sets from the previous week’s exercises, as follows:

 
### New York neighbourhood data, latitude, longitude information
Source: https://cocl.us/new_york_dataset
Usage: mapping restaurant data including address to borough and neighbourhood for classification 

### Hungarian and Eastern European restaurants in New York City
Source: Foursquare API
Usage: getting the list of Hungarian and Eastern European restaurants in NYC for each neighbourhood

### Geospatial data
Source: https://data.cityofnewyork.us/City-Government/Borough-Boundaries/tqmj-j8zm
Usage: boundary data for map visualization

## Methodology
For this particular example the methodology is quite straightforward, and analyzing the data is not overly complex either. The steps to reach the desired answers and to draw some conclusions (if possible) are the following:
- Gather and clean New York neighbourhood data
- Read data from https://cocl.us/new_york_dataset
- Read and map geo data (latitude and longitude) to each neighbourhood and borough
- Read the list of Hungarian and Eastern European restaurants from the Foursquare API, focusing mainly on Manhattan
- Do some exploratory data analysis to see if the scope of the analysis could or should be changed (e.g. by removing neighbourhoods)
- Run a simple density analysis on neighbourhoods to see which ones have the highest concentration of restaurants
- Use the K-Means method to cluster neighbourhoods and get answers for the distribution of Eastern European restaurants
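
The density step in the list above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual analysis: the function name `restaurant_density` and the rule "a category containing the word Restaurant counts as a restaurant" are assumptions, and the data is a toy frame using the column names that `getNearbyVenues` produces.

```python
import pandas as pd

def restaurant_density(venues: pd.DataFrame) -> pd.DataFrame:
    """Count restaurant venues per neighbourhood, highest count first."""
    # Assumption: any category whose name contains "Restaurant" is a restaurant.
    is_restaurant = venues["Venue Category"].str.contains("Restaurant", case=False)
    return (
        venues[is_restaurant]
        .groupby("Neighborhood")
        .size()
        .rename("Restaurants")
        .sort_values(ascending=False)
        .reset_index()
    )

# Toy data for illustration only, not real Foursquare output.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "C"],
    "Venue Category": ["Polish Restaurant", "Italian Restaurant", "Park",
                       "Thai Restaurant", "Gym", "Bakery"],
})
density = restaurant_density(toy_venues)
print(density)
```

Neighbourhoods without any matching venue simply drop out of the result, so a real run may want to merge the counts back onto the full neighbourhood list.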


In [2]:
#importing the necessary libraries
import pandas as pd
import numpy as np

!pip install geocoder
import geocoder
import os
import requests
!pip install folium
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

print("dependent libraries are imported...")


We need some helper methods:
- converting an address to latitude / longitude (so we can match it to neighbourhoods)
- reading venues from the Foursquare API (by category)
- reading venue details by ID
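
Since the helper cells themselves are hidden, here is a hedged sketch of how a latitude / longitude pair could be matched to a neighbourhood once it has been geocoded. It assumes that "nearest neighbourhood centre by great-circle distance" is an acceptable matching rule; the function names `haversine_km` and `nearest_neighbourhood` are hypothetical and not taken from the notebook.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

def nearest_neighbourhood(lat, lng, neighbourhoods):
    """Return the (name, lat, lng) tuple whose centre is closest to the point."""
    return min(neighbourhoods, key=lambda n: haversine_km(lat, lng, n[1], n[2]))

# Two neighbourhood centres from the New York dataset used in this notebook.
centres = [
    ("Wakefield", 40.894705, -73.847201),
    ("Riverdale", 40.890834, -73.912585),
]
closest = nearest_neighbourhood(40.8900, -73.9100, centres)
print(closest[0])
```

A nearest-centre rule ignores the actual neighbourhood boundaries, so points near a border may be assigned to the wrong side.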

In [3]:
# The code was removed by Watson Studio for sharing.

In [5]:
# The code was removed by Watson Studio for sharing.

In [6]:
def get_CEE_venues(lat,lng):
    
    #set variables
    radius=1000
    LIMIT=100
    #categories from https://developer.foursquare.com/docs/api-reference/venues/categories/ and hence https://api.foursquare.com/v2/venues/categories
    CATEGORIES = '52e81612bcbc57f1066b79fa,52960bac3cf9994f4e043ac4,52f2ae52bcbc57f1066b8b81,52e81612bcbc57f1066b7a04,56aa371be4b08b9a8d57355a'
   
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryID={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            CATEGORIES)
    
    # get all the data
    results = requests.get(url).json()
    venue_data=results["response"]['groups'][0]['items']
    venue_details=[]
    for row in venue_data:
        try:
            venue_id=row['venue']['id']
            venue_name=row['venue']['name']
            venue_category=row['venue']['categories'][0]['name']
            venue_details.append([venue_id,venue_name,venue_category])
        except KeyError:
            pass
        
    column_names=['ID','Name','Category']
    df = pd.DataFrame(venue_details,columns=column_names)
    return df


In [None]:
def get_venues(lat,lng):
    
    #set variables
    radius=1000
    LIMIT=100
   
    #url to fetch data from foursquare api
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
    
    # get all the data
    results = requests.get(url).json()
    venue_data=results["response"]['groups'][0]['items']
    venue_details=[]
    for row in venue_data:
        try:
            venue_id=row['venue']['id']
            venue_name=row['venue']['name']
            venue_category=row['venue']['categories'][0]['name']
            venue_details.append([venue_id,venue_name,venue_category])
        except KeyError:
            pass
        
    column_names=['ID','Name','Category']
    df = pd.DataFrame(venue_details,columns=column_names)
    return df
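
Both functions above catch `KeyError` per venue, but a venue with an empty `categories` list would raise `IndexError`, and a response without any `groups` would fail before the loop starts. A more defensive way to parse the same structure might look like the sketch below. The function name `parse_explore_items` and the sample payload are made up for illustration; the payload only mirrors the keys the functions above index into.

```python
def parse_explore_items(results):
    """Extract [id, name, category] rows from an explore-style response dict."""
    groups = results.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        category = categories[0].get("name") if categories else None
        if "id" not in venue or "name" not in venue or category is None:
            continue  # skip incomplete entries instead of raising
        rows.append([venue["id"], venue["name"], category])
    return rows

# Hypothetical payload, shaped like the structure indexed by the functions above.
sample = {"response": {"groups": [{"items": [
    {"venue": {"id": "1", "name": "Venue A",
               "categories": [{"name": "Polish Restaurant"}]}},
    {"venue": {"id": "2", "name": "Venue B", "categories": []}},
]}]}}
parsed = parse_explore_items(sample)
print(parsed)
```

The returned rows can be passed to `pd.DataFrame(rows, columns=['ID', 'Name', 'Category'])` exactly as in the functions above.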

In [None]:
# The code was removed by Watson Studio for sharing.

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # make the GET request to the Foursquare explore endpoint
        # getFoursquareExplore is defined in a hidden cell; it does nothing else
        # but call the Foursquare explore endpoint with the credentials
        # (CLIENT_ID, CLIENT_SECRET, VERSION) and is only hidden for privacy
        results = getFoursquareExplore(lat,lng,radius,limit)["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [7]:
def get_new_york_data():
    url='https://cocl.us/new_york_dataset'
    resp=requests.get(url).json()
    # all data is present in features label
    features=resp['features']
    
    # define the dataframe columns
    column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
    # collect one record per neighbourhood and build the dataframe in one go
    # (DataFrame.append was removed in pandas 2.0)
    rows = []
    
    for data in features:
        borough = data['properties']['borough'] 
        neighborhood_name = data['properties']['name']
        
        # GeoJSON coordinates are ordered [longitude, latitude]
        neighborhood_latlon = data['geometry']['coordinates']
        neighborhood_lat = neighborhood_latlon[1]
        neighborhood_lon = neighborhood_latlon[0]
    
        rows.append({'Borough': borough,
                     'Neighborhood': neighborhood_name,
                     'Latitude': neighborhood_lat,
                     'Longitude': neighborhood_lon})
    
    new_york_data = pd.DataFrame(rows, columns=column_names)
    return new_york_data

In [8]:
#reading new york data
new_york_data=get_new_york_data()

In [9]:
new_york_data.shape

(306, 4)

In [10]:
new_york_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


In [12]:
# So there are 306 New York neighborhoods; we need to look up Hungarian and other Central / Eastern European restaurants.
# Reading Hungarian, Romanian, Czech, Slovakian and Polish restaurants in one go rather than making 5 calls to the API.
# (While there is an Eastern European category in Foursquare, it is not the one we're looking for.)

column_names=['Borough', 'Neighborhood', 'Latitude','Longitutde', 'Hun', 'CEE']
# collect one record per neighborhood; DataFrame.append was removed in pandas 2.0
rows = []

cee = ["Hungarian Restaurant","Czech Restaurant","Slovak Restaurant", "Polish Restaurant", "Romanian Restaurant"]

# iterating through all neighborhoods, getting venues and adding 2 columns: Hun count and CEE count (Hun + all others)

for row in new_york_data.values.tolist():
    Borough, Neighborhood, Latitude, Longitude=row
    venues = get_CEE_venues(Latitude,Longitude)
    hun_restaurants=venues[venues['Category']=='Hungarian Restaurant']   
    cee_restaurants=venues[venues['Category'].isin(cee)]
    
    print('Hungarian Resturants in '+Neighborhood+', '+Borough+':'+str(len(hun_restaurants)))
    print('CEE Resturants in '+Neighborhood+', '+Borough+':'+str(len(cee_restaurants)))
    # adding a new row to our restaurants records
    rows.append({'Borough': Borough,
                 'Neighborhood': Neighborhood, 
                 'Latitude': Latitude,
                 'Longitutde' : Longitude,
                 'Hun' : len(hun_restaurants),
                 'CEE' : len(cee_restaurants)})

restaurants = pd.DataFrame(rows, columns=column_names)
    



Hungarian Resturants in Wakefield, Bronx:0
CEE Resturants in Wakefield, Bronx:0
Hungarian Resturants in Co-op City, Bronx:0
CEE Resturants in Co-op City, Bronx:0
Hungarian Resturants in Eastchester, Bronx:0
CEE Resturants in Eastchester, Bronx:0
Hungarian Resturants in Fieldston, Bronx:0
CEE Resturants in Fieldston, Bronx:0
Hungarian Resturants in Riverdale, Bronx:0
CEE Resturants in Riverdale, Bronx:0
Hungarian Resturants in Kingsbridge, Bronx:0
CEE Resturants in Kingsbridge, Bronx:0
Hungarian Resturants in Marble Hill, Manhattan:0
CEE Resturants in Marble Hill, Manhattan:0
Hungarian Resturants in Woodlawn, Bronx:0
CEE Resturants in Woodlawn, Bronx:0
Hungarian Resturants in Norwood, Bronx:0
CEE Resturants in Norwood, Bronx:0
Hungarian Resturants in Williamsbridge, Bronx:0
CEE Resturants in Williamsbridge, Bronx:0
Hungarian Resturants in Baychester, Bronx:0
CEE Resturants in Baychester, Bronx:0
Hungarian Resturants in Pelham Parkway, Bronx:0
CEE Resturants in Pelham Parkway, Bronx:0
Hu

In [13]:
restaurants.shape

(306, 6)

In [14]:
restaurants.head()

| | Borough | Neighborhood | Latitude | Longitutde | Hun | CEE |
|---|---|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 | 0 | 0 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 | 0 | 0 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 | 0 | 0 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 | 0 | 0 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 | 0 | 0 |


In [20]:
print('Hungarian restaurants: ' + str(restaurants['Hun'].sum()))
print('CEE restaurants: ' + str(restaurants['CEE'].sum()))

Hungarian restaurants: 0
CEE restaurants: 9


## Analysis of results

The outcome of reading all Hungarian and Central / Eastern European restaurants in New York is somewhat surprising. In the given categories only 9 restaurants were found, and not a single dedicated Hungarian restaurant in New York City. This somewhat contradicts the assumptions, as there are some places in New Jersey and there seem to be some Hungarian places in NYC as well. Further analysis, however, showed that those are mainly categorized differently, e.g. as bakeries or under other food categories. As the original business problem was to analyze whether a restaurant is feasible, those are omitted for now.

The distribution of Eastern European restaurants does not yield any significant result either. The counts per neighbourhood are:
- Greenpoint / Brooklyn: 3
- Arrochar / Staten Island: 1
- Blissville / Queens: 1
- Lenox Hill / Manhattan: 1
- Ridgewood / Queens: 1
- Roosevelt Island / Manhattan: 1
- Steinway / Queens: 1
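
A breakdown like the one above can be read off the `restaurants` frame by keeping only the rows with a non-zero `CEE` count. The sketch below assumes the column names used earlier in this notebook; the function name `cee_distribution` is hypothetical, and the toy frame holds just three of the rows so that it runs without the API.

```python
import pandas as pd

def cee_distribution(restaurants: pd.DataFrame) -> pd.DataFrame:
    """Neighbourhoods with at least one CEE restaurant, largest count first."""
    hits = restaurants[restaurants["CEE"] > 0]
    return (
        hits[["Neighborhood", "Borough", "CEE"]]
        .sort_values(["CEE", "Neighborhood"], ascending=[False, True])
        .reset_index(drop=True)
    )

# Three rows with the counts reported in this notebook, for illustration.
toy_restaurants = pd.DataFrame({
    "Borough": ["Bronx", "Queens", "Brooklyn"],
    "Neighborhood": ["Wakefield", "Ridgewood", "Greenpoint"],
    "CEE": [0, 1, 3],
})
distribution = cee_distribution(toy_restaurants)
print(distribution)
```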

### Conclusion 1
Given the very low number of restaurants, we can safely state that neither the Chinatown model (clustering with restaurants of the same type) nor 'Go against the current' (explicitly choosing a place where no similar restaurants are present) is really an option. There is no cluster to join, and since almost every neighbourhood has no similar restaurant, that criterion does not narrow down the choice.

### Conclusion 2
We should therefore recommend a location with a very high density of restaurants, where it is likely to get enough visitors because the place is already known and liked for its food selection and variety. For this we will analyze the restaurant density for Manhattan only.

### Finding the best Manhattan districts