# Capstone Project - The Battle of the Neighborhoods 

## Introduction: Business Problem

In this project we look for an optimal location for a restaurant. The report is aimed at stakeholders interested in opening an **Italian restaurant** or a **Japanese restaurant** in **Toronto**, Canada.

Because Toronto already has many restaurants, we try to identify the avenues and streets where people commonly go to eat. We prefer locations **as close to the city center as possible** or **locations with a large flow of people**. The goal is to show which type of restaurant is the better one to open and where best to open it.


## Data

Based on our business problem, the factors that will influence our decision are:
* number of existing restaurants
* number of and distance to Italian/Japanese restaurants in the neighborhood, if any
* distance of the restaurant from the city center or from a location with a large flow of people

The following data sources are needed to extract/generate the required information:
* the coordinates of the Toronto city center will be obtained with the **Nominatim** geocoder (through `geopy`), and the restaurants around that point, with their category and location, will be obtained using the **Foursquare API**
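
The distance factor listed above can be made concrete with a great-circle distance. The following is a minimal sketch and not part of the original notebook: the helper name `haversine_km` and the Earth radius constant are assumptions introduced here, while the two coordinate pairs are the city center returned by the geocoder and the first Japanese venue listed later in this report.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

city_center = (43.6534817, -79.3839347)  # geocoder result for "Toronto, ON"
japango = (43.655268, -79.385165)        # first row of the Japanese venues table

distance = haversine_km(*city_center, *japango)
print('Japango is about {:.2f} km from the city center.'.format(distance))
```

Applied to every venue, the same function would give the "distance from city center" factor as a number rather than a visual impression from the map.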

In [1]:
!pip install BeautifulSoup4
!pip install requests



In [2]:
#Importing Libraries

from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests
import folium 

from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim 

import matplotlib.cm as cm
import matplotlib.colors as colors

In [3]:
#Latitude and Longitude values

address = "Toronto, ON"

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [4]:
#Creating a map

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
map_toronto

In [5]:
#Foursquare Credentials

CLIENT_ID = 'VBCNKVIQHD2JM4E5VDNEK2N3TSAQ4TEXEBJ0S5QA54YLEIT1'
CLIENT_SECRET = 'N1X3FYWEKSQQD5YSYYZ2UHUAZDJGDV01VPIIKIEVIC0SDNKH'
VERSION = '20180604'

In [26]:
#Searching for japanese restaurants in Toronto

search_query = 'japanese'
LIMIT = 200 # limit of number of venues returned by Foursquare API
radius = 1500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&query={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION,
    search_query,
    latitude, 
    longitude, 
    radius, 
    LIMIT)

# send the request and parse the JSON response into a dict
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f295787a52b5d02f7a05c8b'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'query': 'japanese',
  'totalResults': 171,
  'suggestedBounds': {'ne': {'lat': 43.666981713500014,
    'lng': -79.36531094178297},
   'sw': {'lat': 43.639981686499986, 'lng': -79.40255845821703}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ae7b27df964a52068ad21e3',
       'name': 'Japango',
       'location': {'address': '122 Elizabeth St.',
        'crossStreet': 'at Dundas St. W',
        'lat': 43.65526771691681,
        'lng': -79.3851

In [27]:
# function that extracts the category of the venue
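def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the response
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may come back without any category
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']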

In [28]:
venues = results['response']['groups'][0]['items']
    
jap_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
jap_venues = jap_venues.loc[:, filtered_columns]

# filter the category for each row
jap_venues['venue.categories'] = jap_venues.apply(get_category_type, axis=1)

# clean columns
jap_venues.columns = [col.split(".")[-1] for col in jap_venues.columns]

jap_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Japango | Japanese Restaurant | 43.655268 | -79.385165 |
| 1 | Kinka Izakaya Original | Japanese Restaurant | 43.660596 | -79.378891 |
| 2 | NAMI | Japanese Restaurant | 43.650853 | -79.375887 |
| 3 | JaBistro | Japanese Restaurant | 43.649687 | -79.38809 |
| 4 | Rolltation | Japanese Restaurant | 43.654918 | -79.387424 |


In [29]:
#Checking how many places we have

print('{} venues were returned by Foursquare.'.format(jap_venues.shape[0]))

100 venues were returned by Foursquare.


In [30]:
#Creating a map to see where the restaurants are

map_toronto_jap = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, categories, name in zip(
        jap_venues['lat'], 
        jap_venues['lng'], 
        jap_venues['categories'], 
        jap_venues['name']):
    label = '{}, {}'.format(categories, name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_jap)  

map_toronto_jap

Here we can see a heavy concentration of Japanese restaurants downtown, most of them along Dundas Street. The area near the financial district has the second-highest concentration. Overall the city offers many Japanese restaurants, and wherever they are located they tend to sit very close to one another. Competition among Japanese restaurants is therefore strong. We will see how this compares with the Italian restaurants.

In [31]:
#Searching for Italian restaurants in Toronto

search_query = 'italian'
LIMIT = 200 # limit of number of venues returned by Foursquare API
radius = 1500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&query={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION,
    search_query,
    latitude, 
    longitude, 
    radius, 
    LIMIT)

# send the request and parse the JSON response into a dict
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f2957ed5b5de7522f59d98a'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'query': 'italian',
  'totalResults': 85,
  'suggestedBounds': {'ne': {'lat': 43.666981713500014,
    'lng': -79.36531094178297},
   'sw': {'lat': 43.639981686499986, 'lng': -79.40255845821703}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d306dd82748b60c62b6dba0',
       'name': 'Trattoria Mercatto',
       'location': {'address': '220 Yonge St.',
        'crossStreet': 'in Toronto Eaton Centre',
        'lat': 43.65445314470199,
        'l

In [32]:
# function that extracts the category of the venue
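def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the response
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may come back without any category
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']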

In [33]:
venues = results['response']['groups'][0]['items']
    
italian_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
italian_venues = italian_venues.loc[:, filtered_columns]

# filter the category for each row
italian_venues['venue.categories'] = italian_venues.apply(get_category_type, axis=1)

# clean columns
italian_venues.columns = [col.split(".")[-1] for col in italian_venues.columns]

italian_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Trattoria Mercatto | Italian Restaurant | 43.654453 | -79.380974 |
| 1 | Scaddabush Italian Kitchen & Bar | Italian Restaurant | 43.65892 | -79.382891 |
| 2 | Terroni | Italian Restaurant | 43.650927 | -79.375602 |
| 3 | Donatello Restaurant | Italian Restaurant | 43.657489 | -79.383605 |
| 4 | Pizzeria Libretto | Italian Restaurant | 43.648334 | -79.385111 |


In [34]:
print('{} venues were returned by Foursquare.'.format(italian_venues.shape[0]))

82 venues were returned by Foursquare.


In [35]:
map_toronto_italian = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, categories, name in zip(
        italian_venues['lat'], 
        italian_venues['lng'], 
        italian_venues['categories'], 
        italian_venues['name']):
    label = '{}, {}'.format(categories, name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_italian)  

map_toronto_italian

For the Italian restaurants, Foursquare returned 82 venues, and they are not evenly distributed. Some are in the center of downtown Toronto, but most are further south, in the neighborhood around the financial district. These two neighborhoods are where the restaurants cluster, and they are very likely the areas with the largest flow of people. Italian restaurants are spaced further apart than Japanese restaurants. Another point in favor of an Italian restaurant is that there are fewer of them, so competition tends to be lower.
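
The spacing comparison above is read off the maps. One way to put a number on it is the mean nearest-neighbor distance of each group. The sketch below is an illustration, not the author's method: the function names are hypothetical, the distance uses an equirectangular approximation (assumed adequate at city scale), and the inputs are only the first five rows of each results table, so the printed figures say nothing reliable about the full 100 and 82 venues.

```python
import math

def approx_km(p, q):
    """Approximate distance in km between two (lat, lon) points (equirectangular projection)."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dx = math.radians(q[1] - p[1]) * math.cos(mean_lat)
    dy = math.radians(q[0] - p[0])
    return 6371.0 * math.hypot(dx, dy)

def mean_nearest_neighbor_km(points):
    """Average, over all points, of the distance to the closest other point."""
    nearest = [
        min(approx_km(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(nearest) / len(nearest)

# first five rows of each table only, not the full data
japanese_sample = [(43.655268, -79.385165), (43.660596, -79.378891),
                   (43.650853, -79.375887), (43.649687, -79.38809),
                   (43.654918, -79.387424)]
italian_sample = [(43.654453, -79.380974), (43.65892, -79.382891),
                  (43.650927, -79.375602), (43.657489, -79.383605),
                  (43.648334, -79.385111)]

print('Japanese: {:.2f} km'.format(mean_nearest_neighbor_km(japanese_sample)))
print('Italian:  {:.2f} km'.format(mean_nearest_neighbor_km(italian_sample)))
```

In the notebook itself, the same function could be fed `list(zip(jap_venues['lat'], jap_venues['lng']))` and the Italian equivalent to compare the complete result sets.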

Now we will see how many kinds of restaurant the city has, and which ones are the most common.

In [36]:
#Searching for restaurants in Toronto

search_query = 'restaurants'
LIMIT = 400 # limit of number of venues returned by Foursquare API
radius = 1500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&query={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION,
    search_query,
    latitude, 
    longitude, 
    radius, 
    LIMIT)

# send the request and parse the JSON response into a dict
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5f29588b780c337071c43778'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'query': 'restaurants',
  'totalResults': 242,
  'suggestedBounds': {'ne': {'lat': 43.666981713500014,
    'lng': -79.36531094178297},
   'sw': {'lat': 43.639981686499986, 'lng': -79.40255845821703}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ae7b27df964a52068ad21e3',
       'name': 'Japango',
       'location': {'address': '122 Elizabeth St.',
        'crossStreet': 'at Dundas St. W',
        'lat': 43.65526771691681,
        'lng': -79.3

In [37]:
venues = results['response']['groups'][0]['items']
    
rest_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
rest_venues = rest_venues.loc[:, filtered_columns]

# filter the category for each row
rest_venues['venue.categories'] = rest_venues.apply(get_category_type, axis=1)

# clean columns
rest_venues.columns = [col.split(".")[-1] for col in rest_venues.columns]

rest_venues.head()

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Japango | Sushi Restaurant | 43.655268 | -79.385165 |
| 1 | Poke Guys | Poke Place | 43.654895 | -79.385052 |
| 2 | Richmond Station | American Restaurant | 43.651569 | -79.379266 |
| 3 | The Keg Steakhouse + Bar - York Street | Restaurant | 43.649987 | -79.384103 |
| 4 | Rosalinda | Vegetarian / Vegan Restaurant | 43.650252 | -79.385156 |


In [38]:
print('{} venues were returned by Foursquare.'.format(rest_venues.shape[0]))

100 venues were returned by Foursquare.


In [60]:
#Getting only the categories of restaurants

rest_cat = rest_venues['categories']
rest_cat

0                    Sushi Restaurant
1                          Poke Place
2                 American Restaurant
3                          Restaurant
4       Vegetarian / Vegan Restaurant
                   ...               
95                     Sandwich Place
96                         Restaurant
97                    Thai Restaurant
98                             Bistro
99    Molecular Gastronomy Restaurant
Name: categories, Length: 100, dtype: object

In [59]:
#Checking which types of restaurant are most common among the 100 venues

rest_cat.value_counts()

Restaurant                         11
Café                                9
Italian Restaurant                  8
Japanese Restaurant                 7
Pizza Place                         5
Gastropub                           5
Vegetarian / Vegan Restaurant       4
Thai Restaurant                     4
Sandwich Place                      4
Seafood Restaurant                  3
Diner                               3
French Restaurant                   2
Steakhouse                          2
Creperie                            2
Sushi Restaurant                    2
Burrito Place                       2
Ramen Restaurant                    2
American Restaurant                 2
Middle Eastern Restaurant           2
Bakery                              2
Poke Place                          2
New American Restaurant             1
Food Court                          1
Asian Restaurant                    1
Latin American Restaurant           1
Deli / Bodega                       1
Burger Joint

Of the 100 venues, 11 are labeled only as Restaurant, without specifying the kind of food they serve, so we will not consider them. After Café, Italian and Japanese restaurants are the most common kinds of venue in our sample of central Toronto.
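
The ranking described above can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the counts are copied by hand from the top of the `value_counts()` output (which is only partially shown), the variable names are introduced here, and plain Python is used so that the snippet runs without the notebook's state.

```python
# counts copied from the top of the value_counts() output (partial list)
category_counts = {
    'Restaurant': 11,
    'Café': 9,
    'Italian Restaurant': 8,
    'Japanese Restaurant': 7,
    'Pizza Place': 5,
    'Gastropub': 5,
}
TOTAL_VENUES = 100  # number of rows in rest_venues

# drop the generic label, then rank the remaining categories by count
specific = {name: n for name, n in category_counts.items() if name != 'Restaurant'}
ranked = sorted(specific.items(), key=lambda item: item[1], reverse=True)

for name, n in ranked:
    print('{:<22} {:>2} venues ({:.0%} of the sample)'.format(name, n, n / TOTAL_VENUES))
```

With the notebook's data loaded, `rest_cat[rest_cat != 'Restaurant'].value_counts(normalize=True)` would give the same ranking as shares of the remaining venues.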

In [70]:
#Removing Japanese and Italian restaurants

rest_venues_edit = rest_venues[(rest_venues.categories != 'Italian Restaurant') & (rest_venues.categories != 'Japanese Restaurant')]

In [71]:
rest_venues_edit.shape

(85, 4)

In [72]:
#Creating a map without Japanese and Italian restaurants

map_toronto_all = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, categories, name in zip(
        rest_venues_edit['lat'], 
        rest_venues_edit['lng'], 
        rest_venues_edit['categories'], 
        rest_venues_edit['name']):
    label = '{}, {}'.format(categories, name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_all)  

map_toronto_all

For the remaining kinds of restaurant, the neighborhoods near the financial district have a very high concentration. So, considering all restaurants, competition there would be greater than in the rest of downtown.
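
A crude way to check a claim like this is to count venues on each side of the city center. The sketch below is an illustration rather than the author's method: it uses only the first five rows of the general restaurant table, and it treats "south of the geocoded city center" as a stand-in for the financial district, which is an assumption based on the earlier observation that the financial district lies to the south. The helper name `side_of_center` is hypothetical.

```python
from collections import Counter

CENTER_LAT = 43.6534817  # latitude returned by the geocoder for "Toronto, ON"

# (name, lat) for the first five rows of rest_venues;
# none of them is categorized as Italian or Japanese
sample = [
    ('Japango', 43.655268),
    ('Poke Guys', 43.654895),
    ('Richmond Station', 43.651569),
    ('The Keg Steakhouse + Bar - York Street', 43.649987),
    ('Rosalinda', 43.650252),
]

def side_of_center(lat, center_lat=CENTER_LAT):
    """Label a venue as north or south of the city center latitude."""
    return 'south' if lat < center_lat else 'north'

counts = Counter(side_of_center(lat) for _, lat in sample)
print(dict(counts))
```

Run over all 85 rows of `rest_venues_edit`, the same count would show whether the southern concentration seen on the map holds numerically.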

## Conclusion

After plotting the three maps (the first for Japanese restaurants only, the second for Italian restaurants, and the last for all other kinds of restaurant), we gained some insights that can help us and the stakeholders make the decision. Our problem poses two questions: should we open a Japanese restaurant or an Italian restaurant, and where should we open it?

For the first question, considering the competition and the type of food the city consumes, the better option is an Italian restaurant. First, there is less competition: Foursquare returned 82 Italian venues against 100 Japanese ones, and the Italian restaurants are spaced further apart and less concentrated. Second, Italian Restaurant is among the five most common categories in our sample of 100 venues, which suggests that Italian food is widely consumed in the city.

For the second question, a new restaurant has to take the flow of people into account. Many companies are located in the financial district, so from Monday to Friday restaurants there could sell more than those elsewhere downtown. For steady consumption throughout the week, however, downtown is the better option. First, there are few Italian restaurants downtown and the distance between them is relatively large. Second, the financial district's large volume of people is limited to working days: on weekends and holidays, the number of people and their consumption there drop sharply. In addition, tourists add to the normal flow of people downtown, while few tourists visit the financial district. The final decision is therefore to open an Italian restaurant in downtown Toronto.