# Restaurant Business In Istanbul

## Introduction

Istanbul is the largest city in Turkey and the country's economic, cultural and historical center. Almost 15 million people live within its vast area of 5,343 square kilometers. Istanbul straddles the Bosphorus, one of the world's busiest waterways, in northwestern Turkey, between the Sea of Marmara and the Black Sea. Its commercial and historical center lies in Europe, while a third of its population lives on the Asian side.

Foreign investors in Turkey often choose to open a restaurant or another business linked to tourism. Turkey is an increasingly popular tourist destination, and a large number of visitors represents a good clientele for a restaurant. Much like the country itself, Turkish cuisine is a blend of different cultures: Asian, Caucasian, Middle Eastern, Mediterranean and Balkan cuisines have all shaped and influenced Turkish food.

## Business Problem

In recent months and years, the restaurant industry has seen its share of bankruptcies, including, but not limited to, filings from Bertucci’s, Logan’s Roadhouse, Real Mex Restaurants, fast casual Noon Mediterranean, Romano’s Macaroni Grill, Scotty’s Brewhouse, Ruby’s Diner, and Iron Chef Jose Garces. Bankruptcies aren’t anything new in the industry, but they’ve been proliferating at a rapid pace, compared to more sporadic declarations 10 years ago.

If you are an investor who has decided to open a new restaurant, you should know that location is one of the most important decisions you need to make. In this project, I will analyze restaurant locations and try to provide valuable recommendations to foreign or domestic investors who want to enter the restaurant business in Istanbul.

## Data

Based on the definition of our problem, the factors that will influence our decision are:
* Number of existing restaurants in every borough, from the Foursquare API
* Latitude and longitude values of the 39 boroughs of Istanbul, from GitHub (https://gist.github.com/ismailbaskin/2492196)

## Methodology

In this project we will direct our efforts toward detecting areas of Istanbul that have low restaurant density, particularly those with a low number of Italian restaurants. We will limit our analysis to the area within ~6 km of the city center.

In the first step we collected the location data of the 39 boroughs of Istanbul from GitHub. Since the data found there was not in a ready-to-use format, I prepared the location dataframe manually from the original data. I have also **identified Italian restaurants** (according to Foursquare categorization).

The second step in our analysis will be the calculation and exploration of '**restaurant density**' across different areas of Istanbul.

In the third and final step we will focus on the most promising areas and within those create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no more than two restaurants within a radius of 250 meters**, and we want locations **without Italian restaurants within a radius of 400 meters**. We will present a map of all such locations but also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for the final 'street level' exploration and search for the optimal venue location by stakeholders.
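
The two distance requirements above can be sketched as a simple filter. This is a minimal illustration, assuming candidate and restaurant positions are already expressed as planar (x, y) coordinates in meters; the function names `count_within` and `is_good_location` and the sample points are hypothetical and not part of the original notebook.

```python
import math

def count_within(point, others, radius):
    """Count how many points in `others` lie within `radius` meters of `point`."""
    px, py = point
    return sum(1 for ox, oy in others if math.hypot(ox - px, oy - py) <= radius)

def is_good_location(candidate, restaurants, italian_restaurants,
                     max_restaurants=2, restaurant_radius=250, italian_radius=400):
    """Apply both criteria: at most two restaurants within 250 m
    and no Italian restaurant within 400 m."""
    few_restaurants = count_within(candidate, restaurants, restaurant_radius) <= max_restaurants
    no_italian = count_within(candidate, italian_restaurants, italian_radius) == 0
    return few_restaurants and no_italian

# Hypothetical planar coordinates in meters (illustrative only)
sample_restaurants = [(100, 0), (0, 200), (300, 300), (1000, 1000)]
sample_italian = [(300, 300)]

print(is_good_location((0, 0), sample_restaurants, sample_italian))
print(is_good_location((250, 250), sample_restaurants, sample_italian))
```

The first candidate has two restaurants within 250 m and its nearest Italian restaurant is about 424 m away, so it passes; the second sits roughly 70 m from an Italian restaurant and is rejected.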

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming json file into a pandas dataframe
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')


## Location Data 

First, I took the location information for the 39 boroughs of Istanbul from GitHub (https://gist.github.com/ismailbaskin/2492196) and converted it to a dataframe:

In [2]:
data={'Borough':['Adalar','Arnavutkoy','Atasehir','Avcilar','Bagcilar','Bahcelievler','Bakirkoy','Basaksehir','Bayrampasa','Besiktas','Beykoz','Beylikduzu','Beyoglu','Buyukcekmece','Catalca','Cekmekoy','Esenler','Esenyurt','Eyup','Fatih','Gaziosmanpasa','Gungoren','Kadikoy','Kagithane','Kartal','Kucukcekmece','Maltepe','Pendik','Sancaktepe','Sariyer','Silivri','Sultanbeyli','Sultangazi','Sile','Sisli','Tuzla','Umraniye','Uskudar','Zeytinburnu'],
      'Latitude':[40.87637720,41.20000000,40.98333330,41.01534790,41.04555560,40.99750000,40.96815500,41.07789500,41.04815030,41.06861600,41.13271900,40.99103810,41.03828640,41.03413300,41.14823900,41.10423500,41.07941330,41.03428060,41.18715980,41.01666670,41.07594770,41.01666670,40.98014100,41.07100000,40.89965100,41.00865800,40.94904700,40.87932600,41.02870280,41.16632800,41.08015800,40.96111230,41.12557940,41.17638890,41.06000000,40.84200000,41.03030000,41.03223600,40.99063500],
      'Longitude':[29.09544400,28.73333300,29.11666670,28.73146180,28.84055560,28.85055560,28.82280000,28.81255100,28.90045530,29.02853550,29.10569000,28.64981440,28.97033040,28.59000300,28.46773000,29.31772720,28.85385450,28.68011940,28.88298160,28.93333330,28.90045530,28.88333330,29.08227000,28.97000000,29.19364900,28.77534200,29.17410900,29.25813500,29.29018290,29.04995000,28.26829000,29.26694380,28.87133140,29.61277780,28.98700000,29.29500000,29.10650000,29.03193800,28.89614000]
     }
df = pd.DataFrame.from_dict(data)
df

| | Borough | Latitude | Longitude |
|---|---|---|---|
| 0 | Adalar | 40.876377 | 29.095444 |
| 1 | Arnavutkoy | 41.2 | 28.733333 |
| 2 | Atasehir | 40.983333 | 29.116667 |
| 3 | Avcilar | 41.015348 | 28.731462 |
| 4 | Bagcilar | 41.045556 | 28.840556 |
| 5 | Bahcelievler | 40.9975 | 28.850556 |
| 6 | Bakirkoy | 40.968155 | 28.8228 |
| 7 | Basaksehir | 41.077895 | 28.812551 |
| 8 | Bayrampasa | 41.04815 | 28.900455 |
| 9 | Besiktas | 41.068616 | 29.028536 |


Let's look at them on a map

In [25]:
istanbul_map = folium.Map(location=[41.01384, 28.94966], zoom_start=11)

# Add one marker per borough, using the coordinates stored in df
for borough, lat, lon in zip(df['Borough'], df['Latitude'], df['Longitude']):
    folium.Marker([lat, lon], popup=borough).add_to(istanbul_map)


istanbul_map

## Foursquare Data

Now that I have the location candidates, let's use the Foursquare API to get information on restaurants in each neighborhood.

We're interested in venues in the 'food' category, but only those that are proper restaurants: coffee shops, pizza places, bakeries etc. are not direct competitors, so we don't care about those. We will therefore include in our list only venues that have 'restaurant' in the category name, and we'll make sure to detect and include all the subcategories of the specific 'Italian restaurant' category, as we need information on Italian restaurants in the neighborhood.

I have added my Foursquare API credentials:

In [3]:
CLIENT_ID = 'SYIQ0G4IJMQZRNR1TY4QBVFPXTSHD03EJSESWYEPVQWWYJF0' # your Foursquare ID
CLIENT_SECRET = '3GAWUHFWNJ3IYUH204ORS2TXGHCKUVIE23ZIENZSGTXWXIVM' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30

Category IDs corresponding to Italian restaurants were taken from the Foursquare website (https://developer.foursquare.com/docs/resources/categories):

In [7]:
food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

italian_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Istanbul', '')
    address = address.replace(', İstanbul', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    # Query the Foursquare 'explore' endpoint for venues of the given category around (lat, lon)
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        # Each venue becomes a tuple: (id, name, categories, (lat, lon), address, distance)
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except Exception:
        # Any request or parsing failure is treated as "no venues found"
        venues = []
    return venues
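
The notebook later calls `lonlat_to_xy` and `xy_to_lonlat`, whose definitions are not shown. A minimal stand-in could look like the sketch below. It assumes that a local equirectangular approximation around the map center is accurate enough at city scale; the original may well have used a proper projected coordinate system instead, and the constants and default reference point here are illustrative choices.

```python
import math

EARTH_RADIUS_M = 6371000.0              # mean Earth radius in meters
REF_LAT, REF_LON = 41.01384, 28.94966   # map center used for the borough map

def lonlat_to_xy(lon, lat, ref_lon=REF_LON, ref_lat=REF_LAT):
    """Approximate planar offsets (meters) of (lon, lat) from the reference point."""
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y

def xy_to_lonlat(x, y, ref_lon=REF_LON, ref_lat=REF_LAT):
    """Inverse of lonlat_to_xy: planar offsets (meters) back to (lon, lat)."""
    lon = ref_lon + math.degrees(x / (EARTH_RADIUS_M * math.cos(math.radians(ref_lat))))
    lat = ref_lat + math.degrees(y / EARTH_RADIUS_M)
    return lon, lat

# Round trip for the Besiktas coordinates from the location table
x, y = lonlat_to_xy(29.0285355, 41.068616)
print(x, y)
print(xy_to_lonlat(x, y))
```

Distances computed on these x/y values are in meters, which is what radius checks such as 250 m or 400 m need. The approximation degrades with distance from the reference point, so it should be treated as a convenience for short ranges only.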


Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all restaurants found and another of all Italian restaurants found.

In [None]:
def get_restaurants(lats, lons):
    restaurants = {}
    italian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_italian = is_restaurant(venue_categories, specific_filter=italian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_italian:
                    italian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, italian_restaurants, location_restaurants

import pickle  # used to cache the Foursquare results on disk

# Try to load from local file system in case we did this before
restaurants = {}
italian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('italian_restaurants_350.pkl', 'rb') as f:
        italian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except (OSError, pickle.UnpicklingError, EOFError):
    # Cache files are missing or unreadable; fall through to the API call below
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, italian_restaurants, location_restaurants = get_restaurants(df['Latitude'], df['Longitude'])
    
    # Let's persist this in local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('italian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(italian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Let's look at restaurant statistics

In [None]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

In [None]:
print('List of Italian restaurants')
print('---------------------------')
for r in list(italian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(italian_restaurants))

## Analysis

I have performed some basic exploratory data analysis and derived some additional information from my raw data. First let's count the **number of restaurants in every area candidate**:

In [None]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations = df.copy()  # one row per borough center queried above
df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

What we have now is a clear indication of zones with a low number of restaurants in the vicinity and *no* Italian restaurants nearby.

Let us now **cluster** those locations to create **centers of zones containing good locations**. Those zones, their centers and addresses will be the final result of our analysis. 

In [None]:
from sklearn.cluster import KMeans
from folium.plugins import HeatMap

# Note: df_good_locations, roi_center, istanbul_center, restaurant_latlons,
# good_latitudes, good_longitudes, istanbul_boroughs (borough boundary GeoJSON)
# and boroughs_style are assumed to be defined in cells not shown here.

number_of_clusters = 39

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

# Convert cluster centers from projected X/Y back to (lon, lat) for plotting
cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

istanbul_map = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(istanbul_map)
HeatMap(restaurant_latlons).add_to(istanbul_map)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(istanbul_map)
folium.Marker(istanbul_center).add_to(istanbul_map)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(istanbul_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(istanbul_map)
folium.GeoJson(istanbul_boroughs, style_function=boroughs_style, name='geojson').add_to(istanbul_map)
istanbul_map

Our clusters represent groupings of most of the candidate locations and cluster centers are placed nicely in the middle of the zones 'rich' with location candidates.
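
To make the clustering step concrete, here is a bare-bones version of Lloyd's algorithm, the iteration that k-means is usually based on. It is a teaching sketch in plain Python with a naive deterministic initialization (the first k points), not a replacement for scikit-learn's `KMeans`; the function name `simple_kmeans` and the sample points are made up.

```python
import math

def simple_kmeans(points, k, iterations=20):
    """Tiny k-means (Lloyd's algorithm) for 2-D points.
    Uses the first k points as initial centers, which is a simplification."""
    centers = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center
        groups = [[] for _ in range(k)]
        for px, py in points:
            nearest = min(range(k),
                          key=lambda i: math.hypot(px - centers[i][0], py - centers[i][1]))
            groups[nearest].append((px, py))
        # Update step: move each center to the mean of its group
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                new_centers.append((sum(x for x, _ in group) / len(group),
                                    sum(y for _, y in group) / len(group)))
            else:
                new_centers.append(center)  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers

# Two hypothetical groups of candidate locations (planar meters)
sample_points = [(0, 0), (1000, 1000), (100, 0), (0, 100), (1100, 1000), (1000, 1100)]
zone_centers = simple_kmeans(sample_points, k=2)
print(zone_centers)
```

Each returned center is the mean position of the candidates assigned to it, which is why the cluster centers end up in the middle of areas that contain many candidate locations.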

Addresses of those cluster centers will be a good starting point for exploring the neighborhoods to find the best possible location based on neighborhood specifics.

In [None]:
# Same assumptions as the previous map cell: roi_center, istanbul_center, good_latitudes,
# good_longitudes, istanbul_boroughs and boroughs_style are defined in cells not shown here.
istanbul_map = folium.Map(location=roi_center, zoom_start=14)
folium.Marker(istanbul_center).add_to(istanbul_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(istanbul_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(istanbul_map)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=False).add_to(istanbul_map)
folium.GeoJson(istanbul_boroughs, style_function=boroughs_style, name='geojson').add_to(istanbul_map)
istanbul_map

## Results and Discussions

Our analysis shows that although there is a great number of restaurants in Istanbul, there are pockets of low restaurant density fairly close to the city center. The highest concentration of restaurants was detected near the Bosphorus, so we focused our attention on those areas. Boroughs that combine popularity among tourists, closeness to the city center, strong socio-economic dynamics and a number of pockets of low restaurant density are the ones most likely to yield useful conclusions.

After directing our attention to this narrower area of interest, I first created a dense grid of location candidates (spaced 100 m apart); those locations were then filtered so that those with more than two restaurants within a radius of 250 m and those with an Italian restaurant closer than 400 m were removed.
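
A candidate grid of this kind can be sketched as follows. This is a simplified square grid in planar meter coordinates, assuming a circular region of interest; the function name `make_grid`, the square layout and the small radius in the example are illustrative assumptions, as the original grid-generation code is not shown.

```python
import math

def make_grid(center_x, center_y, region_radius, spacing=100):
    """Return (x, y) grid points, `spacing` meters apart, that fall inside
    a circle of `region_radius` meters around the center."""
    steps = int(region_radius // spacing)
    points = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            x = center_x + i * spacing
            y = center_y + j * spacing
            # Keep only the points inside the circular region of interest
            if math.hypot(x - center_x, y - center_y) <= region_radius:
                points.append((x, y))
    return points

# Small example: 100 m spacing inside a 300 m circle
grid = make_grid(0, 0, region_radius=300)
print(len(grid))
```

Each grid point can then be converted back to latitude and longitude and checked against the restaurant and Italian-restaurant distance criteria.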

Those location candidates were then clustered to create zones of interest that contain the greatest number of location candidates. The addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for more detailed local analysis based on other factors.
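
Reverse geocoding a zone center can be done with geopy, which the notebook already imports. The sketch below is an assumption about how that step might look, not the original code. The helper `address_for` is hypothetical and takes the reverse-geocoding callable as a parameter, so it can be tried without network access; the commented lines show how it could be wired to the public Nominatim service with a made-up `user_agent` string.

```python
def address_for(lat, lon, reverse):
    """Return a human-readable address for (lat, lon).

    `reverse` is any callable that accepts a (lat, lon) tuple and returns an
    object with an `.address` attribute, or None when nothing is found.
    """
    location = reverse((lat, lon))
    return location.address if location is not None else None

# With geopy this could be wired up as follows (requires network access):
#   from geopy.geocoders import Nominatim
#   geolocator = Nominatim(user_agent="istanbul_restaurant_analysis")  # hypothetical name
#   print(address_for(41.01384, 28.94966, geolocator.reverse))
```

Public geocoding services are rate limited, so when looking up many zone centers it is worth pausing between requests and caching the results.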

The result of all this is 39 zones containing the largest number of potential new restaurant locations, based on the number of and distance to existing venues, both restaurants in general and Italian restaurants in particular. This, of course, does not imply that those zones are actually optimal locations for a new restaurant! The purpose of this analysis was only to provide information on areas close to the Bosphorus that are not crowded with existing restaurants (particularly Italian ones). It is entirely possible that there is a very good reason for the small number of restaurants in any of those areas, one that would make them unsuitable for a new restaurant regardless of the lack of competition. Recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually lead to a location that not only has no nearby competition but also meets all other relevant conditions.

## Conclusion

The purpose of this project was to identify areas of Istanbul close to the center with a low number of restaurants (particularly Italian restaurants) in order to aid stakeholders in narrowing down the search for an optimal location for a new Italian restaurant. By calculating the restaurant density distribution from Foursquare data we first identified general boroughs that justify further analysis, and then generated an extensive collection of locations which satisfy some basic requirements regarding existing nearby restaurants. Clustering of those locations was then performed in order to create major zones of interest (containing the greatest number of potential locations), and addresses of those zone centers were created to be used as starting points for final exploration by stakeholders.

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like the attractiveness of each location, levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.