## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a **Hotpot restaurant** in **Singapore**.


Since there are many restaurants in Singapore, we will try to detect **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no Hotpot restaurants in the vicinity**. We would also prefer locations **as close to the city center as possible**, assuming the first two conditions are met.

We will use data science techniques to identify a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Hotpot restaurants in the neighborhood, if any
* distance of neighborhood from city center.

To measure these factors, the following data are needed:
1. Foursquare location data (restaurants, their categories and their locations)
2. coordinate data to define the candidate neighborhoods

### Neighborhood Candidates

Let's first find the latitude and longitude of the Singapore city center. We will create a grid of cells covering our area of interest, which is approx. 6x6 kilometers centered on the Singapore city center. According to Google, the Singapore Downtown Core is at [1.287953, 103.851784].

In [1]:
singapore_center = [1.287953, 103.851784]

Now let's create a grid of equally spaced area candidates, centered on the city center and within ~3 km of downtown. Our neighborhoods will be defined as circular areas with a radius of 500 meters, so our neighborhood centers will be 1000 meters apart.

To accurately calculate distances we need to create our grid of locations in a Cartesian 2D coordinate system, which allows us to calculate distances in meters (not in latitude/longitude degrees). Then we'll project those coordinates back to latitude/longitude degrees to be shown on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [5]:
#!pip install shapely
import shapely.geometry

#!pip install pyproj
import pyproj

import math

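# pyproj.transform() is deprecated in recent pyproj releases, so we build reusable Transformer objects instead.
# EPSG:4326 = WGS84 latitude/longitude, EPSG:32648 = WGS84 / UTM zone 48N (the zone covering Singapore).
# always_xy=True keeps the (lon, lat) / (x, y) argument order used throughout this notebook.
transformer_to_xy = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32648', always_xy=True)
transformer_to_lonlat = pyproj.Transformer.from_crs('EPSG:32648', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    x, y = transformer_to_xy.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    lon, lat = transformer_to_lonlat.transform(x, y)
    return lon, lat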

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Singapore center longitude={}, latitude={}'.format(singapore_center[1], singapore_center[0]))
x, y = lonlat_to_xy(singapore_center[1], singapore_center[0])
print('Singapore center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Singapore center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Singapore center longitude=103.851784, latitude=1.287953
Singapore center UTM X=372255.770901449, Y=142386.51904085424
Singapore center longitude=103.851784, latitude=1.2879529999999997


Let's create a **hexagonal grid of cells**: we offset every other row, and adjust the vertical row spacing so that **every cell center is equally distant from all its neighbors**.

In [20]:
singapore_center_x, singapore_center_y = lonlat_to_xy(singapore_center[1], singapore_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = singapore_center_x - 3000
x_step = 1000
y_min = singapore_center_y - 3000 - (int(21/k)*k*1000 - 6000)/2
y_step = 1000 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 500 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(singapore_center_x, singapore_center_y, x, y)
        if (distance_from_center <= 3001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

30 candidate neighborhood centers generated.
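
The geometry of the grid can be checked without any projection library. The following is a minimal sketch, not the notebook's own cell: it works in offsets (meters) relative to the city center, and the helper names `hex_grid_offsets` and `nearest_neighbor_distance` are hypothetical. It assumes the same 1000 m spacing, 3000 m radius and 1 m slack as above, and that the row passing through the center is the shifted one.

```python
import math

def hex_grid_offsets(radius_m=3000, spacing_m=1000):
    """Return (x, y) offsets in meters from the city center for a hexagonal grid."""
    k = math.sqrt(3) / 2                  # row height as a fraction of the cell spacing
    row_step = spacing_m * k
    n_rows = int(radius_m / row_step) + 1
    n_cols = int(radius_m / spacing_m) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        y = i * row_step
        # Shift every other row by half a cell (assumption: the center row is shifted)
        x_offset = spacing_m / 2 if i % 2 == 0 else 0
        for j in range(-n_cols, n_cols + 1):
            x = j * spacing_m + x_offset
            if math.hypot(x, y) <= radius_m + 1:   # same 1 m slack as the notebook
                points.append((x, y))
    return points

def nearest_neighbor_distance(point, points):
    """Distance from `point` to the closest other point in `points`."""
    return min(math.hypot(point[0] - q[0], point[1] - q[1])
               for q in points if q != point)

grid = hex_grid_offsets()
spacings = [nearest_neighbor_distance(p, grid) for p in grid]
print(len(grid), 'cells; nearest-neighbor spacing between',
      round(min(spacings), 3), 'and', round(max(spacings), 3), 'm')
```

Because the row spacing is `1000 * sqrt(3) / 2` and alternate rows are shifted by 500 m, a diagonal neighbor sits at `sqrt(500² + 866.03²) ≈ 1000` m, the same as a neighbor in the same row.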


Let's visualize the data we have so far: center location and candidate neighborhood centers:

In [10]:
import folium

In [21]:
map_singapore = folium.Map(location=singapore_center, zoom_start=13)
folium.Marker(singapore_center, popup='Downtown').add_to(map_singapore)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=500, color='blue', fill=False).add_to(map_singapore)
map_singapore

OK, we now have the coordinates of the centers of the neighborhoods/areas to be evaluated, equally spaced (the distance from every point to its neighbors is exactly the same) and within ~3 km of the Downtown Core.

Let's now get approximate addresses of those locations. Instead of calling a geocoding API, we print the coordinates and look each one up manually.

In [25]:
for (lat, lon) in zip(latitudes, longitudes):
    print(str.format('{0:.6f}', lat), str.format('{0:.6f}', lon))

1.264448 103.842807
1.264452 103.851794
1.264456 103.860782
1.272280 103.838310
1.272284 103.847297
1.272288 103.856285
1.272292 103.865272
1.280111 103.833813
1.280115 103.842800
1.280119 103.851787
1.280123 103.860775
1.280127 103.869762
1.287943 103.829316
1.287947 103.838303
1.287951 103.847290
1.287955 103.856278
1.287959 103.865265
1.287963 103.874252
1.295778 103.833806
1.295782 103.842793
1.295787 103.851780
1.295791 103.860768
1.295795 103.869755
1.303614 103.838296
1.303618 103.847283
1.303622 103.856271
1.303626 103.865258
1.311450 103.842786
1.311454 103.851773
1.311458 103.860761
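
As an independent sanity check on the projection, the spacing between neighboring centers can be recomputed directly from the printed latitude/longitude pairs. This is a rough sketch using the haversine formula on a spherical Earth (mean radius 6371 km, an assumption), so it is expected to agree with the 1000 m UTM spacing only to within about one percent. The function name `haversine_m` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius; spherical approximation (assumption)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Two neighbors in the same row, taken from the printed list above
same_row = haversine_m(1.287951, 103.847290, 1.287955, 103.856278)
# Two neighbors in adjacent rows, taken from the printed list above
adjacent_rows = haversine_m(1.280119, 103.851787, 1.287951, 103.847290)

print('same row: {:.1f} m, adjacent rows: {:.1f} m'.format(same_row, adjacent_rows))
```

Both distances should come out close to 1000 m, which supports the claim that the candidate centers are evenly spaced.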


In [26]:
# check the address in https://www.latlong.net/Show-Latitude-Longitude.html

addresses = ['Road J, Singapore Bukit Merah','Road A, Singapore Bukit Merah', 'Marina Coastal Drive', '37 Keppel Road',
'Keppel Road, Singapore Marina, 07 Marina', 'Marina Coastal Expressway',
'Singapore Marina, 01 Marina','16 College Road','1A Kreta Ayer Road', 'Central Boulevard',
'Sheares Avenue',
'Singapore Marina, 01 Marina',
'Havelock Link,16 Tiong Bahru',
'5 Jalan Minyak',
'20B Hongkong Street',
'Raffles Avenue',
'03 Downtown Core',
'02 Marina Singapore',
'341 River Valley Road',
'3 Jalan Rumbia',
'Victoria Street, 18 Downtown Core',
'Rochor Road, 03 Downtown Core',
'185 Tanjong Rhu Road',
'5 Hullet Road, Newton',
'4 Upper Wilkie Road, Newton',
'50 Ban San Street, Rochor',
'Beach Road, Kallang',
'174 Bukit Timah Road, Novena',
'21 Northumberland Road, Rochor',
'11A Hamilton Road, Kallang'
]

In [27]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | Road J, Singapore Bukit Merah | 1.264448 | 103.842807 | 371255.770901 | 139788.44283 | 2783.882181 |
| 1 | Road A, Singapore Bukit Merah | 1.264452 | 103.851794 | 372255.770901 | 139788.44283 | 2598.076211 |
| 2 | Marina Coastal Drive | 1.264456 | 103.860782 | 373255.770901 | 139788.44283 | 2783.882181 |
| 3 | 37 Keppel Road | 1.27228 | 103.83831 | 370755.770901 | 140654.468233 | 2291.287847 |
| 4 | Keppel Road, Singapore Marina, 07 Marina | 1.272284 | 103.847297 | 371755.770901 | 140654.468233 | 1802.775638 |
| 5 | Marina Coastal Expressway | 1.272288 | 103.856285 | 372755.770901 | 140654.468233 | 1802.775638 |
| 6 | Singapore Marina, 01 Marina | 1.272292 | 103.865272 | 373755.770901 | 140654.468233 | 2291.287847 |
| 7 | 16 College Road | 1.280111 | 103.833813 | 370255.770901 | 141520.493637 | 2179.449472 |
| 8 | 1A Kreta Ayer Road | 1.280115 | 103.8428 | 371255.770901 | 141520.493637 | 1322.875656 |
| 9 | Central Boulevard | 1.280119 | 103.851787 | 372255.770901 | 141520.493637 | 866.025404 |


In [28]:
df_locations.to_pickle('./locations.pkl')  

### Foursquare
Now that we have our location candidates, let's use the Foursquare API to get information on the restaurants in each neighborhood.

In [50]:
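# set Foursquare API credentials; read them from environment variables so that
# secrets are not stored in the notebook (the variable names are arbitrary)
import os

foursquare_client_id = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
foursquare_client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret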

In [59]:
import requests

In [82]:
# Category IDs corresponding to Hotpot restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

hotpot_restaurant_categories = ['52af0bd33cf9994f4e043bdd']

def is_restaurant(categories, specific_filter=None): 
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    hotpot_words = ['hot pot', 'steamboat','bijin nabe']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None):
            if (category_id in specific_filter):
                specific = True
                restaurant = True
            for r in hotpot_words:
                if r in category_name:
                    specific = True
                    restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20190101'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:
        # Request failed or the response had an unexpected shape: treat as "no venues found"
        venues = []
    return venues
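
To make the classification rules easier to follow, here is a small self-contained sketch that mirrors the logic of `is_restaurant()` and applies it to a few made-up venues. The helper name `classify`, the sample category names and the `hypothetical-id-*` values are illustrative assumptions. Only the Hotpot category ID is taken from the cell above.

```python
HOTPOT_CATEGORY_IDS = ['52af0bd33cf9994f4e043bdd']  # category ID used in the notebook

def classify(categories, hotpot_ids):
    """Mirror of the is_restaurant() rules; returns (is_restaurant, is_hotpot)."""
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    hotpot_words = ['hot pot', 'steamboat', 'bijin nabe']
    restaurant = hotpot = False
    for name, category_id in categories:
        name = name.lower()
        if any(word in name for word in restaurant_words):
            restaurant = True
        if 'fast food' in name:
            restaurant = False          # fast food venues are excluded
        if category_id in hotpot_ids or any(word in name for word in hotpot_words):
            restaurant = hotpot = True  # match by category ID or by keyword
    return restaurant, hotpot

# Hypothetical (category name, category ID) tuples, one venue per entry
samples = {
    'hotpot by id': [('Hotpot Restaurant', '52af0bd33cf9994f4e043bdd')],
    'hotpot by keyword': [('Steamboat Restaurant', 'hypothetical-id-1')],
    'other restaurant': [('Chinese Restaurant', 'hypothetical-id-2')],
    'fast food': [('Fast Food Restaurant', 'hypothetical-id-3')],
    'not a restaurant': [('Coffee Shop', 'hypothetical-id-4')],
}

results = {label: classify(cats, HOTPOT_CATEGORY_IDS) for label, cats in samples.items()}
for label, (is_res, is_hotpot) in results.items():
    print('{:<18} restaurant={} hotpot={}'.format(label, is_res, is_hotpot))
```

Note that the keyword `'hot pot'` (with a space) does not match a name such as "Hotpot Restaurant", so in that case the match relies on the category ID.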

In [86]:
# Let's now go over our neighborhood locations and get nearby restaurants; 
# we'll also maintain a dictionary of all found restaurants and all found hotpot restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    hotpot_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=550 to make sure we have overlaps/full coverage so we don't miss any restaurant
        # (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, foursquare_client_id, foursquare_client_secret, radius=550, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_hotpot = is_restaurant(venue_categories, specific_filter=hotpot_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_hotpot, x, y)
                if venue_distance<=500:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_hotpot:
                    hotpot_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, hotpot_restaurants, location_restaurants

# Try to load from local file system in case we did this before
# restaurants = {}
# hotpot_restaurants = {}
# location_restaurants = []
# loaded = False
# try:
#     with open('restaurants_550.pkl', 'rb') as f:
#         restaurants = pickle.load(f)
#     with open('hotpot_restaurants_550.pkl', 'rb') as f:
#         hotpot_restaurants = pickle.load(f)
#     with open('location_restaurants_550.pkl', 'rb') as f:
#         location_restaurants = pickle.load(f)
#     print('Restaurant data loaded.')
#     loaded = True
# except:
#     pass

# If load failed use the Foursquare API to get the data
#if not loaded:
if True:
    restaurants, hotpot_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_550.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('hotpot_restaurants_550.pkl', 'wb') as f:
        pickle.dump(hotpot_restaurants, f)
    with open('location_restaurants_550.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
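
The interplay between the 550 m query radius, the 500 m counting radius and the ID-keyed dictionaries can be illustrated with a tiny example. This is only a sketch on made-up data: the venue IDs, names and distances are hypothetical, and `collect` is a simplified stand-in for `get_restaurants()` that skips the API call and the restaurant filter.

```python
def collect(area_results, count_radius=500):
    """Combine per-area query results into a de-duplicated dict and per-area lists."""
    all_venues = {}   # venue_id -> name; a venue seen from several areas is stored once
    per_area = []     # one list of venue IDs per area, limited to count_radius
    for venues in area_results:
        in_area = []
        for venue_id, name, distance in venues:
            if distance <= count_radius:
                in_area.append(venue_id)
            all_venues[venue_id] = name
        per_area.append(in_area)
    return all_venues, per_area

# Hypothetical query results (venue_id, name, distance in meters) for two neighboring areas
area_a = [('v1', 'Venue One', 120), ('v2', 'Venue Two', 530)]
area_b = [('v2', 'Venue Two', 480), ('v3', 'Venue Three', 510)]

all_venues, per_area = collect([area_a, area_b])
print(len(all_venues), 'unique venues; per-area counts:', [len(a) for a in per_area])
```

Here `v2` is returned by both queries but stored once, and `v3` lies between 500 m and 550 m from its area center, so it appears in the overall dictionary without being counted for any area. The de-duplicated total can therefore be larger than the sum of the per-area counts.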


In [87]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Hotpot restaurants:', len(hotpot_restaurants))
print('Percentage of Hotpot restaurants: {:.2f}%'.format(len(hotpot_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 727
Total number of Hotpot restaurants: 7
Percentage of Hotpot restaurants: 0.96%
Average number of restaurants in neighborhood: 20.333333333333332


In [88]:
print('List of Hotpot restaurants')
print('--------------------------')
for r in list(hotpot_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(hotpot_restaurants))

List of Hotpot restaurants
--------------------------
('53ae48b4498ec970f3cc0455', 'CITY Hot Pot Shabu shabu', 1.284172851493935, 103.85158494974766, 'One Raffles Place #04-28, 048616, Singapore', 451, True, 372233.4346508933, 141968.62249332076)
('50ae28fde4b0062752d48b38', 'Hai Di Lao Hot Pot', 1.2894538145879413, 103.84565413368888, '#02-04 Blk 3D River Valley Rd, Clarke Quay, 179023, Singapore', 247, True, 371573.7781421836, 142552.74695106354)
('5ab9b9acd7627e775da498b0', 'Spice World Hot Pot 香天下火锅', 1.290611289561904, 103.84507235886304, 'Block B, 3 River Valley Road(Clarke Quay) #01-06/07, 179021, Singapore', 385, True, 371509.1023130056, 142680.73886923498)
('5889fcf13e88350e4e2a266a', 'Da Miao Hotpot', 1.2900504965819144, 103.84598717740977, '3C River Valley Road, #01-11 The Cannery, 179022, Singapore', 275, True, 371610.8657191406, 142618.69523009294)
('5a268cf98173cb7b40543d4c', 'Hai Di Lao Hot Pot', 1.3000075971167873, 103.84496501271408, 'Plaza Singapura #04-01 (68 Orcha

In [89]:
map_singapore = folium.Map(location=singapore_center, zoom_start=13)
folium.Marker(singapore_center, popup='Downtown Core').add_to(map_singapore)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_hotpot = res[6]
    color = 'red' if is_hotpot else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_singapore)
map_singapore

Looking good. So now we have all the restaurants in the area within a few kilometers of the Downtown Core, and we know which ones are Hotpot restaurants! We also know exactly which restaurants are in the vicinity of every neighborhood candidate center.

This concludes the data gathering phase - we're now ready to use this data for analysis to produce the report on optimal locations for a new Hotpot restaurant!

## Analysis <a name="analysis"></a>

First let's count the **number of restaurants in every area candidate**:

In [91]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=500m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=500m: 20.333333333333332


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | Road J, Singapore Bukit Merah | 1.264448 | 103.842807 | 371255.770901 | 139788.44283 | 2783.882181 | 1 |
| 1 | Road A, Singapore Bukit Merah | 1.264452 | 103.851794 | 372255.770901 | 139788.44283 | 2598.076211 | 0 |
| 2 | Marina Coastal Drive | 1.264456 | 103.860782 | 373255.770901 | 139788.44283 | 2783.882181 | 0 |
| 3 | 37 Keppel Road | 1.27228 | 103.83831 | 370755.770901 | 140654.468233 | 2291.287847 | 1 |
| 4 | Keppel Road, Singapore Marina, 07 Marina | 1.272284 | 103.847297 | 371755.770901 | 140654.468233 | 1802.775638 | 9 |
| 5 | Marina Coastal Expressway | 1.272288 | 103.856285 | 372755.770901 | 140654.468233 | 1802.775638 | 0 |
| 6 | Singapore Marina, 01 Marina | 1.272292 | 103.865272 | 373755.770901 | 140654.468233 | 2291.287847 | 2 |
| 7 | 16 College Road | 1.280111 | 103.833813 | 370255.770901 | 141520.493637 | 2179.449472 | 28 |
| 8 | 1A Kreta Ayer Road | 1.280115 | 103.8428 | 371255.770901 | 141520.493637 | 1322.875656 | 69 |
| 9 | Central Boulevard | 1.280119 | 103.851787 | 372255.770901 | 141520.493637 | 866.025404 | 46 |


Now let's calculate the **distance to the nearest Hotpot restaurant from every area candidate center**:

In [92]:
distances_to_hotpot_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in hotpot_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_hotpot_restaurant.append(min_distance)

df_locations['Distance to Hotpot restaurant'] = distances_to_hotpot_restaurant

In [93]:
df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Hotpot restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | Road J, Singapore Bukit Merah | 1.264448 | 103.842807 | 371255.770901 | 139788.44283 | 2783.882181 | 1 | 2389.353422 |
| 1 | Road A, Singapore Bukit Merah | 1.264452 | 103.851794 | 372255.770901 | 139788.44283 | 2598.076211 | 0 | 2180.29408 |
| 2 | Marina Coastal Drive | 1.264456 | 103.860782 | 373255.770901 | 139788.44283 | 2783.882181 | 0 | 2407.97732 |
| 3 | 37 Keppel Road | 1.27228 | 103.83831 | 370755.770901 | 140654.468233 | 2291.287847 | 1 | 1977.49629 |
| 4 | Keppel Road, Singapore Marina, 07 Marina | 1.272284 | 103.847297 | 371755.770901 | 140654.468233 | 1802.775638 | 9 | 1398.271818 |
| 5 | Marina Coastal Expressway | 1.272288 | 103.856285 | 372755.770901 | 140654.468233 | 1802.775638 | 0 | 1414.155783 |
| 6 | Singapore Marina, 01 Marina | 1.272292 | 103.865272 | 373755.770901 | 140654.468233 | 2291.287847 | 2 | 2011.096487 |
| 7 | 16 College Road | 1.280111 | 103.833813 | 370255.770901 | 141520.493637 | 2179.449472 | 28 | 1674.123649 |
| 8 | 1A Kreta Ayer Road | 1.280115 | 103.8428 | 371255.770901 | 141520.493637 | 1322.875656 | 69 | 1075.474723 |
| 9 | Central Boulevard | 1.280119 | 103.851787 | 372255.770901 | 141520.493637 | 866.025404 | 46 | 448.685168 |


In [94]:
print('Average distance to closest Hotpot restaurant from each area center:', df_locations['Distance to Hotpot restaurant'].mean())

Average distance to closest Hotpot restaurant from each area center: 1238.90210058013
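
The nearest-distance computation above can be reproduced for a single candidate as a quick check. The sketch below uses the UTM coordinates of the "Central Boulevard" candidate and of three Hotpot restaurants from the printed list. The helper name `nearest_distance` is hypothetical. It returns infinity, rather than the 10000 m placeholder used above, when no Hotpot restaurant is available.

```python
import math

# Candidate center (X, Y) in meters, from the "Central Boulevard" row of the table
central_boulevard = (372255.770901, 141520.493637)

# (x, y) of three Hotpot restaurants from the printed list above
hotpot_xy = [
    (372233.4346508933, 141968.62249332076),  # CITY Hot Pot Shabu shabu
    (371573.7781421836, 142552.74695106354),  # Hai Di Lao Hot Pot, River Valley Rd
    (371610.8657191406, 142618.69523009294),  # Da Miao Hotpot
]

def nearest_distance(center, points):
    """Euclidean distance from `center` to the closest point, or infinity if there are none."""
    return min((math.hypot(center[0] - x, center[1] - y) for x, y in points),
               default=float('inf'))

nearest = nearest_distance(central_boulevard, hotpot_xy)
print('Nearest Hotpot restaurant: {:.3f} m'.format(nearest))
```

The result agrees with the value in the "Distance to Hotpot restaurant" column for that row (about 448.7 m).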


In [95]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

hotpot_latlons = [[res[2], res[3]] for res in hotpot_restaurants.values()]

In [96]:
from folium import plugins
from folium.plugins import HeatMap

map_singapore = folium.Map(location=singapore_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_singapore) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_singapore)
folium.Marker(singapore_center).add_to(map_singapore)
folium.Circle(singapore_center, radius=1000, fill=False, color='white').add_to(map_singapore)
folium.Circle(singapore_center, radius=2000, fill=False, color='white').add_to(map_singapore)
folium.Circle(singapore_center, radius=3000, fill=False, color='white').add_to(map_singapore)
map_singapore

It looks like a few pockets of low restaurant density close to the city center can be found **west, north-east and south-east of the Downtown Core**.

In [97]:
from folium import plugins
from folium.plugins import HeatMap

map_singapore = folium.Map(location=singapore_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_singapore) #cartodbpositron cartodbdark_matter
HeatMap(hotpot_latlons).add_to(map_singapore)
folium.Marker(singapore_center).add_to(map_singapore)
folium.Circle(singapore_center, radius=1000, fill=False, color='white').add_to(map_singapore)
folium.Circle(singapore_center, radius=2000, fill=False, color='white').add_to(map_singapore)
folium.Circle(singapore_center, radius=3000, fill=False, color='white').add_to(map_singapore)
map_singapore

There are only 7 Hotpot restaurants, and most of them are in the western part of the Downtown Core, near Clarke Quay. There are no Hotpot restaurants north-east of the Downtown Core (the Suntec City area). This suggests that **Suntec City (3 Temasek Blvd, Singapore 038983)** is a good place to open a new Hotpot restaurant.

## Results and Discussion <a name="results"></a>

Our analysis shows that although there is a great number of restaurants near Singapore's Downtown Core, only 7 of them are Hotpot restaurants. As shown in the heatmap, most of these are in the western and southern parts of the Downtown Core. To the north-east, Suntec City is a well-known destination close to downtown, yet there is no Hotpot restaurant there. Therefore, we recommend **Suntec City (3 Temasek Blvd, Singapore 038983)** as a good place to open a new Hotpot restaurant.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of downtown Singapore close to the center with a low number of restaurants (particularly Hotpot restaurants), in order to aid stakeholders in narrowing down the search for an optimal location for a new Hotpot restaurant. By calculating the restaurant density distribution from Foursquare data, we found that the Suntec City area, which is very near the Downtown Core, has no Hotpot restaurants, while the west and south of the Downtown Core already have them. So we recommend **Suntec City (3 Temasek Blvd, Singapore 038983)** as a very good place to open a new Hotpot restaurant.