<h1><center> The Battle of the Neighborhoods - The Capstone Project </center></h1>

<h2><center> Opening a restaurant of Russian national cuisine in New York City </center></h2>

<img src = "https://cache.marriott.com/marriottassets/destinations/hero/new-york-city-destination.jpg?interpolation=progressive-bilinear&resize=2880:960" width = 1920>

<h3><center>Table of Contents</center></h3>
<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li> 
            <a href="#intro"> Business Problem </a>
        </li>
        <li> 
            <a href="#data"> Data </a>
        </li>
        <li> 
            <a href="#methodology"> Methodology </a>
        </li>
             <ul>
                <li> 
                    <a href="#import"> Importing all relevant libraries </a>
                </li>
                <li> 
                    <a href="#upload"> Uploading neighborhoods database </a>
                </li>
                <li> 
                    <a href="#choose"> Choosing district - Brooklyn </a>
                </li>
                <li> 
                    <a href="#coord"> Getting the geographical coordinates of the district </a>
                </li>
                <li> 
                    <a href="#map_1"> Creating a map of Brooklyn's neighborhoods using latitude and longitude values </a>
                </li>
                 <li> 
                    <a href="#api"> Foursquare API to explore the neighborhoods </a>
                </li>
                 <li> 
                    <a href="#map_2"> Creating a map of 30 Brooklyn (Bay Ridge) restaurants using latitude and longitude values </a>
                </li>
            </ul>
        <li> 
            <a href="#results"> Results </a>
        </li>
        <li> 
            <a href="#discussion"> Discussion </a>
        </li>
        <li> 
            <a href="#conclusion"> Conclusion </a>
        </li>
     </ul>
</div>


<a id="intro"></a>
<h2>Business Problem</h2>

The target city of my interest is **New York**. 

The aim of this project is to define the optimal location for opening a new **restaurant** in one of the New York districts. 

The main difficulty is that New York is crowded with restaurants of every kind, so the challenge is to select a location with as few existing restaurants as possible in the nearest neighborhood.

The project may be relevant to business owners who want to expand, since New York has always been an attractive market.

<a id="data"></a>
<h2>Data</h2>

The main data will be extracted using **Foursquare API** and the **"newyork_data.json"** file provided in the course earlier, as there are several problems with Foursquare when trying to fetch data of other cities.

**Firstly**, the newyork_data.json file will be used to extract information about New York districts and to choose one of them.

**Secondly**, using the Foursquare API, the relevant data about neighborhood venues will be retrieved.

**Then**, only "food" types of venues will be chosen and mapped to determine the possible location of a new restaurant.

<a id="methodology"></a>
<h2>Methodology</h2>

<a id="import"></a>
<h3>Importing all relevant libraries</h3>

In [None]:
import numpy as np
import pandas as pd
import json
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # top-level location since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

pd.options.display.max_columns = None
pd.options.display.max_rows = None

print('Libraries imported.')

<a id="upload"></a>
<h3>Uploading neighborhoods database</h3>

In [None]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

In [None]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [None]:
neighborhoods_data = newyork_data['features']

In [None]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [None]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [None]:
neighborhoods

<a id="choose"></a>
<h3>Choosing district - Brooklyn</h3>

In [None]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data

Brooklyn has 70 neighborhoods

<a id="coord"></a>
<h3>Getting the geographical coordinates of the district</h3>

In [43]:
address = 'Brooklyn, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


<a id="map_1"></a>
<h3>Creating a map of Brooklyn's neighborhoods using latitude and longitude values</h3>

In [41]:
map_brooklyn = folium.Map(location=[40.6501038, -73.9495823], zoom_start=11)

for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)
map_brooklyn

<a id="api"></a>
<h3>Foursquare API to explore the neighborhoods</h3>

**Defining Foursquare Credentials and Version**

In [None]:
CLIENT_ID = 'FNLZPWYNF5YI0RT3AHCNH2HUMRTZLDVKYBRLFDPV52KJQOFI'
CLIENT_SECRET = 'TNGIE1IZFPATQEFROT1ICBP04WCLHCDBP4O4ILWML3ALJZCK'
VERSION = '20200325'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id=FNLZPWYNF5YI0RT3AHCNH2HUMRTZLDVKYBRLFDPV52KJQOFI&client_secret=TNGIE1IZFPATQEFROT1ICBP04WCLHCDBP4O4ILWML3ALJZCK&v=20200325&ll=40.6501038, -73.9495823&radius=5000&limit=1000'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
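
The request URL above is a fixed string: it contains no `{}` placeholders, so `.format(...)` returns it unchanged and `lat`, `lng` and `radius` never reach the request. A minimal sketch of a URL built from the function's arguments follows, assuming the same `venues/explore` endpoint and query parameters that appear in the string above. The helper name `build_explore_url`, the placeholder credentials and the default `limit=100` are illustrative assumptions, not part of the original notebook.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL used in getNearbyVenues
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Hypothetical helper: build an explore URL for one pair of coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",   # latitude,longitude of this neighborhood
        "radius": radius,       # search radius in meters
        "limit": limit,         # assumed default, adjust as needed
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


# Placeholder credentials; coordinates are the ones shown in this notebook
url_bay_ridge = build_explore_url("ID", "SECRET", "20200325", 40.625801, -74.030621)
url_center = build_explore_url("ID", "SECRET", "20200325", 40.6501038, -73.9495823)

print(url_bay_ridge)
print(url_bay_ridge != url_center)
```

With a URL built this way, two neighborhoods with different coordinates produce different requests, which is the precondition for getting different venues back.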

#### Run the function above on each neighborhood and store the result in a new dataframe called *brooklyn_venues*.

In [None]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude'])

**Let's check the size of the resulting dataframe**

In [34]:
print(brooklyn_venues.shape)

(7000, 7)


**7000 looks strange at first: the limit was 1000 and no other setting equals 7000. It is, however, exactly 70 neighborhoods × 100 rows.**

In [35]:
brooklyn_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Bay Ridge | 40.625801 | -74.030621 | Kings Theatre | 40.64611 | -73.957175 | Theater |
| 1 | Bay Ridge | 40.625801 | -74.030621 | Peppa's Jerk Chicken | 40.654953 | -73.959783 | Caribbean Restaurant |
| 2 | Bay Ridge | 40.625801 | -74.030621 | Risbo | 40.656012 | -73.959912 | Restaurant |
| 3 | Bay Ridge | 40.625801 | -74.030621 | PLG Coffee House and Tavern | 40.660007 | -73.953362 | Café |
| 4 | Bay Ridge | 40.625801 | -74.030621 | Little Mo Wine & Spirits | 40.660346 | -73.95036 | Liquor Store |


**Let's see how many unique values we have among the 7000 rows**

In [36]:
brooklyn_venues['Venue'].unique()

array(['Kings Theatre', "Peppa's Jerk Chicken", 'Risbo',
       'PLG Coffee House and Tavern', 'Little Mo Wine & Spirits',
       'Prospect Park - Parkside', 'Empanada City', "Erv's On Beekman",
       'Glou', 'Ix', 'Silver Rice', 'The Food Sermon',
       'LeFrak Center at Lakeside',
       'Prospect Park Boathouse & Audubon Center', 'De Hot Pot Roti Shop',
       'Prospect Park (Lincoln Rd Playground)', 'Drink',
       'Smorgasburg Prospect Park', 'Prospect Park (Peninsula)',
       'Prospect Park (Nethermead)', 'The Castello Plan', 'Der Pioneer',
       'Werkstatt', 'Wheated', 'Prospect Park', 'Brooklyn Botanic Garden',
       'Bonsai Museum', 'Discovery Garden @ Brooklyn Botanic Garden',
       'Colina Cuervo', 'Japanese Garden', "Brancaccio's",
       'Prospect Park (Dog Run)', 'Barboncino', "Sal's Restaurant",
       'Brooklyn Botanic Garden - Visitor Center', 'Cherry Esplanade',
       'Brooklyn Museum', 'Lula Bird',
       'Trinidad Golden Palace Restaurant', "Glady's", 'Super 

In [37]:
len(brooklyn_venues['Venue'].unique())

99

**We have 7000 rows, but only 99 unique Venues, which means that this data consists of hundreds of duplicates.**

**Now we are looking for a special Venue category**

In [None]:
display(brooklyn_venues)

**A closer look makes it clear that there are 99 venue names and they are the same for every neighborhood; moreover, the coordinates are also the same.**

In [39]:
len(brooklyn_venues['Venue Latitude'].unique())

100

In [40]:
len(brooklyn_venues['Venue Longitude'].unique())

100
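
The claim that every neighborhood received the same venues can be checked directly by comparing the set of venues per neighborhood. The sketch below does this in plain Python on a few toy rows shaped like `brooklyn_venues`; the second neighborhood name and the variable names are illustrative assumptions, and only the venue values are copied from the output above.

```python
from collections import defaultdict

# Toy rows: (neighborhood, venue, venue latitude, venue longitude)
rows = [
    ("Bay Ridge", "Kings Theatre", 40.64611, -73.957175),
    ("Bay Ridge", "Risbo", 40.656012, -73.959912),
    ("Another Neighborhood", "Kings Theatre", 40.64611, -73.957175),  # hypothetical name
    ("Another Neighborhood", "Risbo", 40.656012, -73.959912),
]

# Collect the distinct venues seen for each neighborhood
venues_by_neighborhood = defaultdict(set)
for neighborhood, venue, lat, lng in rows:
    venues_by_neighborhood[neighborhood].add((venue, lat, lng))

# If every neighborhood maps to the same set, only one distinct set exists
distinct_sets = {frozenset(v) for v in venues_by_neighborhood.values()}
all_identical = len(distinct_sets) == 1

# Venues that remain once the per-neighborhood repetition is removed
unique_venues = set().union(*venues_by_neighborhood.values())

print(all_identical, len(unique_venues))
```

When `all_identical` is true, keeping the rows of a single neighborhood loses no venue information, which is what the next cells do.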

**Here is the list of unique venues**

In [None]:
venues = brooklyn_venues['Venue'].drop_duplicates()
venues

**Let us take the first neighborhood (Bay Ridge), since the data returned by the Foursquare API shows no difference in venues between neighborhoods**

In [None]:
df = brooklyn_venues[0:100]
df

In [None]:
# df is a slice of brooklyn_venues, so reassign instead of dropping in place
df = df.drop(columns=['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude'])
df

In [None]:
df['Venue Category'].unique()

**From the full list of venues we need only the restaurants**

In [33]:
new = df[df["Venue Category"].str.contains('Restaurant', na=False)]  # rows with a missing category are excluded
new

| | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|
| 1 | Peppa's Jerk Chicken | 40.654953 | -73.959783 | Caribbean Restaurant |
| 2 | Risbo | 40.656012 | -73.959912 | Restaurant |
| 6 | Empanada City | 40.661631 | -73.950436 | Empanada Restaurant |
| 8 | Glou | 40.662949 | -73.953869 | Tapas Restaurant |
| 10 | Silver Rice | 40.659673 | -73.960368 | Sushi Restaurant |
| 11 | The Food Sermon | 40.664588 | -73.953735 | Caribbean Restaurant |
| 14 | De Hot Pot Roti Shop | 40.661424 | -73.960884 | Caribbean Restaurant |
| 22 | Werkstatt | 40.645252 | -73.970341 | Austrian Restaurant |
| 37 | Lula Bird | 40.670854 | -73.950421 | Southern / Soul Food Restaurant |
| 39 | Glady's | 40.671711 | -73.95776 | Caribbean Restaurant |


In [32]:
new['Venue'].count()

30

**So, we have 30 restaurants to be mapped**
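
The filter keeps every venue whose category contains the word "Restaurant". The plain-Python sketch below applies the same rule to the five categories visible in the `head()` output, plus one missing value added as an assumption, to show which kinds of venue pass and which do not. The function name `is_restaurant` is hypothetical.

```python
# Categories from the head() output; the trailing None is an assumed missing value
categories = [
    "Theater",
    "Caribbean Restaurant",
    "Restaurant",
    "Café",
    "Liquor Store",
    None,
]


def is_restaurant(category):
    """Substring rule: a missing category never counts as a restaurant."""
    return category is not None and "Restaurant" in category


kept = [c for c in categories if is_restaurant(c)]
skipped = [c for c in categories if not is_restaurant(c)]

print(kept)
print(skipped)
```

Note that a food venue such as "Café" is skipped by this rule, so the 30 mapped venues are restaurants in the narrow sense of the category name.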

<a id="map_2"></a>
<h3>Creating a map of 30 Brooklyn (Bay Ridge) restaurants using latitude and longitude values</h3>

In [31]:
map_rest_bay = folium.Map(location=[40.6501038, -73.9495823], zoom_start=12, tiles = 'Stamen Terrain')

for lat, lng, label in zip(new['Venue Latitude'], new['Venue Longitude'], new['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng],
                        radius=6,
                        popup=label,
                        color='green',
                        fill=True,
                        fill_color='blue',
                        fill_opacity=0.3).add_to(map_rest_bay)  
    
map_rest_bay

<a id="results"></a>
<h2>Results</h2>

In my opinion, the main result of the project has been achieved: all potential competitors are mapped, and the business side can now decide where exactly to locate the new restaurant.
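
One way the business side could compare candidate locations is to count the mapped restaurants within a fixed distance of each candidate and prefer the lowest count. The sketch below is only an illustration under stated assumptions: the 1500 m radius, the two candidate points and the helper names are hypothetical choices, and the four restaurant coordinates are copied from the table above. Distances use the standard haversine formula.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_within(center, venues, radius_m):
    """Number of venues no farther than radius_m from center."""
    return sum(
        1 for lat, lng in venues
        if haversine_m(center[0], center[1], lat, lng) <= radius_m
    )


# A few restaurant coordinates from the table above
restaurants = [
    (40.654953, -73.959783),  # Peppa's Jerk Chicken
    (40.656012, -73.959912),  # Risbo
    (40.661631, -73.950436),  # Empanada City
    (40.645252, -73.970341),  # Werkstatt
]

# Hypothetical candidates: two points whose coordinates appear in this notebook
candidates = {
    "Bay Ridge": (40.625801, -74.030621),
    "Brooklyn center": (40.6501038, -73.9495823),
}

RADIUS_M = 1500  # assumed competition radius
counts = {name: count_within(c, restaurants, RADIUS_M) for name, c in candidates.items()}
best = min(counts, key=counts.get)

print(counts)
print(best)
```

The same loop could be run over all Brooklyn neighborhood centers once venue data that differs per neighborhood is available.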

<a id="discussion"></a>
<h2>Discussion</h2>

I thoroughly studied the lab "Segmenting and Clustering Neighborhoods in New York City" and noticed that it simply does not work: the downloaded data shows the same venues for every neighborhood, which makes the neighborhoods identical from the start, and that is why the clustering in the lab does not work either. I hope the tutors will revise this very important module, because, judging by the forum, many of us struggle with it. This API causes a lot of problems.

<a id="conclusion"></a>
<h2>Conclusion</h2>

In my opinion, this report shows the most important steps of a business-oriented data task.