# __Capstone Project: The Battle of Neighborhoods__
## *Opening a Japanese restaurant in New York*

## __1. Introduction__

New York City is perhaps the most difficult market in which to open a restaurant. Businesses must distinguish themselves from countless others, both new and old; diners’ tastes are constantly evolving; and the rent is, well, too damn high. One trend that has picked up steam in the past year or two in the Big Apple is the opening of Japanese chain (or mini-chain) restaurants. These restaurants have all seized on growth opportunities in New York and diners have been quick to gravitate towards them (Embricos, 2018).

This Capstone project explores different neighbourhoods in New York and attempts to answer the following business problem: "*If an investor is looking to open a new Japanese restaurant, where would you recommend that they open it?*"

This project may be of interest to potential *business investors* specializing in Japanese cuisine or restaurant chains, as well as to aspiring *Data Scientists* who want to learn how to apply some of the most common Exploratory Data Analysis techniques to obtain the necessary data, analyze it and, finally, present the resulting insights as a compelling story.

## __2. Data__

For this project we need the following data:

1. New York City data that contains all Boroughs and Neighbourhoods along with their latitudes and longitudes. <br>
Data Source: https://cocl.us/new_york_dataset

2. Median price per square foot for each Neighbourhood in NYC. <br>
Data Source: https://www.zumper.com/blog/nyc-by-square-foot-see-which-neighborhood-gets-you-the-most-space-for-your-money/

3. Data related to locations and ratings of Japanese restaurants in NYC. <br>
Data Source: **Foursquare API**

## __3. Methodology__

- import all required libraries

- obtain information about NYC boroughs/neighbourhoods along with their *coordinates* (using **requests** library) from https://cocl.us/new_york_dataset and load it into a data frame

- obtain the *median rental price per sq foot* for each NYC neighbourhood from https://www.zumper.com/blog/nyc-by-square-foot-see-which-neighborhood-gets-you-the-most-space-for-your-money/ (using the **BeautifulSoup** package) and load it into a data frame

- merge the above data frames on their Neighbourhood value (not all neighbourhoods have rental price information, so some data cleansing steps are required along the way)

- use the **Foursquare API** to explore the neighbourhoods and segment them:

    1. define Foursquare credentials and version
    2. define a function **get_venues** that returns the top 100 venues for a Neighbourhood within a radius of 500 meters (using a request URL to fetch data from the Foursquare API)
    3. analyze how many Japanese restaurants there are in each Neighbourhood and borough
        - prepare a list of Japanese restaurants using the **get_venues** function
        - calculate how many Japanese restaurants there are in each Neighbourhood
        - merge the new data with the data frame from the previous step on Neighbourhood to link the number of Japanese restaurants in each Neighbourhood to its coordinates and median rental price per sq foot
        - data cleansing and pre-processing to prepare data frame for clustering


- run clustering on the final data frame containing Neighbourhoods, the median rental price per sq foot and the total number of Japanese restaurants in the area using the **K-means algorithm** (number of clusters set to 5)

- add clustering labels to the final data frame

- visualize the results

    1. use the **geopy** library to get the latitude and longitude values of New York City
    2. create a map to visualize the clusters
    3. examine the clusters
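
The **get_venues** step can be illustrated with a minimal sketch. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its usual response layout, neither of which is stated in this report. The helper `build_explore_url`, the function signatures and the credential strings are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "explore" endpoint; the report does not name one.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret, version,
                      radius=500, limit=100):
    """Build the request URL for venues around one neighbourhood centre."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date string, e.g. "20180605"
        "ll": f"{lat},{lng}",  # latitude,longitude of the neighbourhood
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def get_venues(lat, lng, client_id, client_secret, version):
    """Return (venue name, category name) pairs; needs network access and valid credentials."""
    import requests  # third-party; imported here so the URL helper works without it

    url = build_explore_url(lat, lng, client_id, client_secret, version)
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Assumed response layout: response -> groups[0] -> items[] -> venue
    items = response.json()["response"]["groups"][0]["items"]
    return [
        (item["venue"]["name"], item["venue"]["categories"][0]["name"])
        for item in items
        if item["venue"].get("categories")  # skip venues without a category
    ]


# Placeholder credentials, for illustration only
url = build_explore_url(40.7, -73.9, "CLIENT_ID", "CLIENT_SECRET", "20180605")
print(url)
```

Keeping URL construction separate from the network call makes the radius and limit settings easy to check without valid credentials.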


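The merge, counting and clustering steps of the methodology might look roughly like the sketch below. All data values are made up for illustration, the column names are hypothetical, and the feature scaling with `StandardScaler` is an assumption rather than something this report states. The toy example uses 3 clusters because it has only five rows, whereas the project itself uses 5.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy stand-ins for the real data frames; every value here is invented.
neighbourhoods = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D", "E", "F"],
    "Latitude": [40.70, 40.71, 40.72, 40.73, 40.74, 40.75],
    "Longitude": [-74.00, -73.99, -73.98, -73.97, -73.96, -73.95],
})
prices = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D", "E"],  # "F" has no price information
    "MedianPricePerSqFt": [60.0, 58.0, 30.0, 32.0, 45.0],
})
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B", "E"],
    "Category": ["Japanese Restaurant", "Japanese Restaurant", "Japanese Restaurant",
                 "Japanese Restaurant", "Pizza Place", "Japanese Restaurant"],
})

# Inner merge drops neighbourhoods without a rental price (a simple cleansing step).
merged = neighbourhoods.merge(prices, on="Neighbourhood", how="inner")

# Count Japanese restaurants per neighbourhood.
counts = (
    venues[venues["Category"] == "Japanese Restaurant"]
    .groupby("Neighbourhood")
    .size()
    .reset_index(name="JapaneseRestaurants")
)

# Left merge keeps neighbourhoods with no Japanese restaurant; fill those with 0.
merged = merged.merge(counts, on="Neighbourhood", how="left")
merged["JapaneseRestaurants"] = merged["JapaneseRestaurants"].fillna(0).astype(int)

# Scale both features so that neither dominates the distance computation.
features = StandardScaler().fit_transform(
    merged[["MedianPricePerSqFt", "JapaneseRestaurants"]]
)

# Cluster and attach the labels to the data frame.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
merged["Cluster"] = kmeans.labels_

print(merged)
```
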
## __4. Results__

Now we can examine each cluster and determine the features that distinguish it from the others.

>Neighbourhoods in **Cluster 1** have a *high* Median Price per Sq Foot and a *medium* number of Japanese restaurants in the area.

>Neighbourhoods in **Cluster 2** have a *low* Median Price per Sq Foot and almost _no Japanese restaurants in the area_.

>Neighbourhoods in **Cluster 3** have a *low* Median Price per Sq Foot and a _low_ number of Japanese restaurants in the area.

>Neighbourhoods in **Cluster 4** have a *high* Median Price per Sq Foot and a _high_ number of Japanese restaurants in the area.

>Neighbourhoods in **Cluster 5** have a *medium* Median Price per Sq Foot and a _medium_ number of Japanese restaurants in the area.
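
One way to arrive at high/medium/low descriptions like these is to compare per-cluster averages. The sketch below uses invented values and hypothetical column names, so it shows only the technique and not the project's actual results.

```python
import pandas as pd

# Invented example of a clustered data frame; not the project's data.
clustered = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D"],
    "MedianPricePerSqFt": [60.0, 58.0, 30.0, 32.0],
    "JapaneseRestaurants": [3, 1, 0, 0],
    "Cluster": [0, 0, 1, 1],
})

# Summarize each cluster by size and by the mean of each feature.
summary = clustered.groupby("Cluster").agg(
    neighbourhoods=("Neighbourhood", "count"),
    mean_price=("MedianPricePerSqFt", "mean"),
    mean_restaurants=("JapaneseRestaurants", "mean"),
)

print(summary)
```

Comparing the rows of such a summary shows which clusters sit at the high or low end of each feature.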

## __5. Discussion__

There are several options for potential improvement of the analysis results:

- add an extra step to find an optimal value of K for clustering (e.g. using the Elbow method)
- add other features to the analysis (e.g. the average rating of the restaurants in the neighbourhood)
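
As a rough illustration of the Elbow method, the sketch below fits K-means for a range of K values on synthetic data and records the inertia (within-cluster sum of squares) for each. The data, the range of K and the variable names are assumptions for demonstration only. In practice the feature matrix would be the price and restaurant-count columns of the final data frame.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic feature matrix: three well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Fit K-means for each candidate K and record the inertia.
inertias = {}
for k in range(1, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# The "elbow" is the K after which inertia stops dropping sharply.
for k, inertia in inertias.items():
    print(k, round(inertia, 1))
```

Plotting inertia against K (for example with matplotlib) makes the elbow easier to spot than reading the printed values.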

## __6. Conclusion__

- Cluster 1: high rental price & medium competition/demand
- Cluster 2: low rental price & almost no competition/demand
- Cluster 3: low rental price & low competition/demand
- Cluster 4: high rental price & high competition/demand
- Cluster 5: medium rental price & medium competition/demand

Based on the cluster descriptions above, **any neighborhood in Cluster 5** that does not yet have a Japanese restaurant seems to be the most attractive option for investment, although this choice depends heavily on the investor's budget.