# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by Michael PD

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report will be targeted to stakeholders interested in franchising a **Fast Food Restaurant** in **Central Jakarta**, Indonesia.

Since there are lots of restaurants in Central Jakarta we will try to detect **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no Fast Food restaurants in vicinity**. We would also prefer locations **as close to city center as possible**, assuming that first two conditions are met.

We will use data science to generate the most promising neighborhoods for this business, based on the predefined criteria above. Advantages of each area will then be clearly expressed so that best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>

Based on definition of our problem, factors that will influence our decisions are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Fast Food restaurants in the neighborhood, if any
* distance of neighborhood from city center

We decided to use regularly spaced grid of locations, centered around city center, to define our neighborhoods.

For this project, we will **explore five Jakarta regions: West, East, Central, North, and South.**

Following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using **Google Maps geocoding**
* number of restaurants and their type and location in every neighborhood will be obtained using **Foursquare API**
* coordinate of Jakarta center will be obtained using **Google Maps geocoding**

### Neighborhood Candidates

Let's create latitude & longitude coordinates for centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest which is aprox. 12x12 killometers centered around Jakarta city center.

Let's first find the latitude & longitude of Central Jakarta city center, using specific, well known address and Google Maps geocoding API.

For later use, we will import necessary libaries to this notebook.

In [37]:
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install geopy
from geopy.geocoders import Nominatim
import json
import requests # library to handle requests
!pip install geocoder
import geocoder

!pip install shapely
import shapely.geometry
!pip install pyproj
import pyproj
import math
import pickle

print('Libraries imported.')

Libraries imported.


Let's find the postal codes dataframe for Jakarta

a. Retrieving the postal codes of neighborhoods (Town) in Central Jakarta
(https://kodepos.nomor.net/_kodepos.php?_i=kota-kodepos&sby=100000&daerah=Provinsi&jobs=DKI%20Jakarta)

After a separate cleaning through excel, we finally read the csv file into this notebook.

In [38]:
df = pd.read_csv('Central Jakarta Postal Codes.csv',encoding= 'unicode_escape')
df.dropna(axis=0,inplace=True)
df.tail()

Unnamed: 0,Postal Code,Sub-District,Town
39,10250,Tanah Abang,Kampung Bali
40,10220,Tanah Abang,Karet Tengsin
41,10240,Tanah Abang,Kebon Kacang
42,10230,Tanah Abang,Kebon Melati
43,10260,Tanah Abang,Petamburan


In [39]:
print("There are {} Towns in Central Jakarta and \
{} unique sub-districts".format(df.shape[0],len(df["Sub-District"].unique())))

There are 44 Towns in Central Jakarta and 8 unique sub-districts


According to statistics, the area of Central Jakarta is 48.13 km²

b. We will now find the coordinates of each Towns in the dataframe above.

In [40]:
# initialize your variable to None
lat_lng_coords = None
latitude = []
longitude = []

for postal_code in df["Postal Code"]:
    # loop until you get the coordinates
    while True:
        g = geocoder.arcgis('{}, Central Jakarta City, Jakarta'.format(postal_code))
        lat_lng_coords = g.latlng
        latitude.append(lat_lng_coords[0])
        longitude.append(lat_lng_coords[1])
        break

In [41]:
df["Latitude"] = latitude
df["Longitude"] = longitude
df.head()

Unnamed: 0,Postal Code,Sub-District,Town,Latitude,Longitude
0,10520,Cempaka Putih,Cempaka Putih Barat,-6.17961,106.863545
1,10510,Cempaka Putih,Cempaka Putih Timur,-6.176818,106.8716
2,10570,Cempaka Putih,Rawasari,-6.19066,106.866442
3,10150,Gambir,Cideng,-6.170603,106.806985
4,10140,Gambir,Duri Pulo,-6.162677,106.804603


c. Visualizing all postal codes region using Folium

In [42]:
address = 'Central Jakarta City'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of {} is {}, {}.'.format(address,latitude,longitude))

The geograpical coordinate of Central Jakarta City is -6.18233995, 106.84287153600738.


In [43]:
jakpus_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# add the trending venues as blue circle markers
for lat, lng, postcode, subdistrict, town in \
    zip(df['Latitude'], df['Longitude'], df['Postal Code'], df['Sub-District'],df['Town']):
        label = '{}, {}, {}'.format(town, subdistrict, postcode)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            fill=True,
            color='blue',
            fill_color='blue',
            fill_opacity=0.6,
            parse_html=False).add_to(jakpus_map)
    
#Show the map
jakpus_map

In [8]:
jakarta_center = [latitude,longitude]
jakarta_center

[-6.18233995, 106.84287153600738]

In [44]:
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=48, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=48, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Jakarta center longitude={}, latitude={}'.format(jakarta_center[1], jakarta_center[0]))
x, y = lonlat_to_xy(jakarta_center[1], jakarta_center[0])
print('Jakarta center UTM X={}, Y={}'.format(x, y))
long, lat = xy_to_lonlat(x, y)
print('Jakarta center longitude={}, latitude={}'.format(long, lat))

Coordinate transformation check
-------------------------------
Jakarta center longitude=106.84287153600738, latitude=-6.18233995
Jakarta center UTM X=703915.3363898612, Y=-683714.1481157349
Jakarta center longitude=106.84287153600738, latitude=-6.18233995


Creating a **hexagonal grid of cells**: we offset every other row, and adjust vertical row spacing so that **every cell center is equally distant from all it's neighbors**.

In [45]:
jkt_center_x, jkt_center_y = lonlat_to_xy(jakarta_center[1], jakarta_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = jkt_center_x - 6000
x_step = 600
y_min = jkt_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(jkt_center_x, jkt_center_y, x, y)
        if (distance_from_center <= 6000):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.


**Visualizing the data**: City center location and candidate neighborhood centers

In [46]:
jakpus_map = folium.Map(location=jakarta_center, zoom_start=13)
folium.Marker(jakarta_center, popup='Central Jakarta City').add_to(jakpus_map)
for lat, lon in zip(latitudes, longitudes):
   
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(jakpus_map)
    
jakpus_map

The coordinates of centers of neighborhoods/areas to be evaluated have been found, equally spaced and within 6km from Central Jakarta City. 

Let's now use Google Maps API to get approximate addresses of those locations.

In [47]:
# @hidden cell
google_api_key = "AIzaSyDkfklLUAmFjFxVhydpQIp33KLZSCP75vU"

In [48]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except:
        return None

addr = get_address(google_api_key, jakarta_center[0], jakarta_center[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(jakarta_center[0], jakarta_center[1], addr))

Reverse geocoding check
-----------------------
Address of [-6.18233995, 106.84287153600738] is: RW, RT.2/RW.9, Kwitang, Kec. Senen, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10420, Indonesia


In [61]:
try:
# Try to load from local file system in case we did this before
    with open('locations.pkl', 'rb') as f:
        df_loc = pickle.load(f)
except:
    addresses = []
    for lat, lon in zip(latitudes, longitudes):
        address = get_address(google_api_key, lat, lon)
        if address is None:
            address = 'NO ADDRESS'
        address = address.replace(', Indonesia', '') # We don't need country part of address
        addresses.append(address)
    
    df_loc = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center}) #dictionary to dataframe

    df_loc.to_pickle('./locations.pkl')    

df_loc.head()


Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center
0,"Jl. Denpasar Raya Blok N No.3, RT.8/RW.3, Kuni...",-6.234076,106.826788,702115.33639,-689429.915781,5992.495307
1,"Jl. Patra Kuningan Raya No.5, RT.6/RW.4, Kunin...",-6.234057,106.832209,702715.33639,-689429.915781,5840.3767
2,"Jl. Rasamala III No.87, RT.5/RW.13, Menteng Da...",-6.234038,106.83763,703315.33639,-689429.915781,5747.173218
3,"Jl. Rasamala I No.62, RT.3/RW.3, Menteng Dalam...",-6.234019,106.843051,703915.33639,-689429.915781,5715.767665
4,"Jl. Tebet Barat IV No.21, RT.8/RW.3, Tebet Bar...",-6.234,106.848472,704515.33639,-689429.915781,5747.173218


### Integrating Foursquare
Using Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in **'food' category**, but only those that are proper restaurants - coffee shops, pizza places, bakeries etc. are not direct competitors so we don't care about those. So we will **include only** venues that have 'restaurant' in category name. We need to detect and include all the subcategories of specific 'Fast food restaurant' category, as we need info on fast food restaurants in the neighborhood.

In [62]:
# @hidden_cell

CLIENT_ID = 'CVM4ZCKF2U50OIDUTGMHNSW0KG5VP3ZRPH5XFXZPOSHR4JIZ' # your Foursquare ID
CLIENT_SECRET = 'GH05WFIM0ZNQEQRQ5J3AVS4OCY0OSYJN05WXJBB4FSACW0GW' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30

In [63]:
# Category IDs corresponding to Fast Food restaurants were taken 
# from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

fast_food_cat = '4bf58dd8d48988d16e941735'

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Indonesia', '')
    return address

def get_venues_near_location(lat, lon, category, CLIENT_ID, CLIENT_SECRET, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, VERSION, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues

In [64]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found italian restaurants
def get_restaurants(lats, lons):
    restaurants = {}
    fastfood_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to meke sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_fastfood = is_restaurant(venue_categories, specific_filter=fast_food_cat)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_categories, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_fastfood, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_fastfood:
                    fastfood_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, fastfood_restaurants, location_restaurants

In [68]:
#Try to load from local file system in case we did this before
restaurants = {}
fastfood_restaurants = {}
location_restaurants = []

try:
    with open('restaurants_jkt.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('fastfood_restaurants_jkt.pkl', 'rb') as f:
        fastfood_restaurants = pickle.load(f)
    with open('location_restaurants_jkt.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')

except:
    restaurants, fastfood_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    # Let's persists this in local file system
    with open('restaurants_jkt.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('fastfood_restaurants_jkt.pkl', 'wb') as f:
        pickle.dump(fastfood_restaurants, f)
    with open('location_restaurants_jkt.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [69]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Fast Food restaurants:', len(fastfood_restaurants))
print('Percentage of Fast Food restaurants: {:.2f}%'.format(len(fastfood_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 2057
Total number of Fast Food restaurants: 134
Percentage of Fast Food restaurants: 6.51%
Average number of restaurants in neighborhood: 4.917582417582418


In [70]:
df_resto = pd.DataFrame(restaurants).transpose()
df_resto.reset_index(drop=True,inplace=True)
df_resto.columns = ["Unique Id","Food Category","Restaurant Name","Latitude",
                    "Longitude", "Address","Distance","IsFastFood","x","y"]
df_resto["Food Category"] = df_resto["Food Category"][0][0][0]
df_resto.head()

Unnamed: 0,Unique Id,Food Category,Restaurant Name,Latitude,Longitude,Address,Distance,IsFastFood,x,y
0,4bcea93f937ca593e8e9ae92,Steakhouse,Stonegrill,-6.23169,106.828,"Menara Anugerah, Kantor Taman Blok E3.3 no. 9 ...",328,False,702211,-689166
1,52c50e7911d2a3d73b07c686,Steakhouse,Kekun Tempo Scan Tower,-6.23522,106.829,"Tempo Scan tower, Jl. HR Rasuna said (Kuningan...",289,False,702374,-689558
2,4cb07cb6562d224b6e9f1688,Steakhouse,Kedai Sunda Cipayung,-6.23266,106.829,"Jl. Raya Puncak km. 75 (Cipayung), Bogor, Jawa...",321,False,702396,-689274
3,4ce8e2e5f1c6236ad9a562f0,Steakhouse,Yoshi Izakaya,-6.23534,106.831,Gran Meliá Jakarta (Jalan HR Rasuna Said Kav. ...,230,False,702534,-689571
4,4c11cd74d41e76b09e1d320d,Steakhouse,"Kantin Sehat, KEMENKES RI",-6.23185,106.832,"Jl. Rasuna Said kav. 4-9 Blok X-5 (Kuningan), ...",247,False,702692,-689186


Now let's look at the fast food restaurants available

In [91]:
df_fastfood = df_resto[df_resto["IsFastFood"]==True]
df_fastfood.reset_index(drop=True,inplace=True)
df_fastfood.tail()

Unnamed: 0,Unique Id,Food Category,Restaurant Name,Latitude,Longitude,Address,Distance,IsFastFood,x,y
129,4d22ddd45acaa35db9b0db35,Steakhouse,A&w WTC Mangga Dua,-6.13357,106.831,"Lobby Gn. Sahari (Jl. Mangga Dua Raya 8), Jaka...",333,True,702620,-678316
130,4c146cfb7f7f2d7f92f9e068,Steakhouse,AW mangga dua square,-6.13599,106.831,"Jl.mangga dua raya, Jakarta, Jakarta",228,True,702634,-678583
131,4fddaae0e4b06f69854feeea,Steakhouse,A&W,-6.13345,106.831,"WTC Mangga Dua GF,Jalan Gunung Sahari Raya, Ja...",320,True,702620,-678303
132,4f52dd3de4b02cf6be592799,Steakhouse,Nasi Akwang Pontianak,-6.13602,106.842,"Jl. Pademangan IV Gg 22 No. 7 (Pademangan), Ja...",208,True,703811,-678591
133,50f68038e4b0255d3d501bd4,Steakhouse,Shihlin,-6.13413,106.847,Sunter mall lantai 3,206,True,704372,-678384


In [92]:
print("There are", len(df_fastfood), "Fast Food Restaurants captured.")

There are 134 Fast Food Restaurants captured.


Now let's see which franchise business is most common in the vicinity.

In [93]:
df_fastfood = df_fastfood.sort_values(by=["Restaurant Name"],ascending = True)
df_fastfood.reset_index(drop=True,inplace=True)
df_fastfood.loc[120:125]

Unnamed: 0,Unique Id,Food Category,Restaurant Name,Latitude,Longitude,Address,Distance,IsFastFood,x,y
120,4bb173a5f964a5206e943ce3,Steakhouse,RM. Padang Abdul Muis,-6.17516,106.82,"Jl. Abdul Muis Raya, Jakarta, Jakarta",303,True,701404,-682912
121,4edb43339911a3e78e37c688,Steakhouse,Raffel's 'World Famous Roast Beef Sandwich',-6.1868,106.814,"Urban Kitcen, Plaza Indonesia",210,True,700708,-684196
122,549eace7498e198131fc091f,Steakhouse,Richeese Factory,-6.17655,106.874,Jl.Cempaka Putih Raya no.139,172,True,707382,-683086
123,5008e2cce4b0639ac2834969,Steakhouse,Richeese Factory,-6.1607,106.819,"Gajah Mada Plaza, Lt. 1 (Jl. Gajah Mada No. 19...",297,True,701291,-681311
124,4fb35d51e4b076d9e9a24a3b,Steakhouse,Richeese Factory,-6.19402,106.891,"Arion Mall, Lt. 2 (Jl. Pemuda Raya, Rawamangun...",329,True,709201,-685024
125,50f68038e4b0255d3d501bd4,Steakhouse,Shihlin,-6.13413,106.847,Sunter mall lantai 3,206,True,704372,-678384


In [75]:
df_fastfood.shape

(134, 10)

In [76]:
df_fastfood["Restaurant Name"] = df_fastfood["Restaurant Name"].str.upper()

In [77]:
#Some more cleaning
df_fastfood[df_fastfood["Restaurant Name"].astype(str).str.contains('A&W')] = 'A&W'
df_fastfood[df_fastfood["Restaurant Name"].astype(str).str.contains('TEXAS')] = 'TEXAS FRIED CHICKEN'
df_fastfood[df_fastfood["Restaurant Name"].astype(str).str.contains('ES TELER')]= 'ES TELER 77'
df_fastfood[df_fastfood["Restaurant Name"].astype(str).str.contains('HOK')]='HOKA HOKA BENTO'
df_fastfood[df_fastfood["Restaurant Name"].astype(str).str.contains('MC','DONALD')]="MC DONALD'S"
df_fastfood[df_fastfood["Restaurant Name"].astype(str).str.contains('MAC')]="MC DONALD'S"
df_fastfood[df_fastfood["Restaurant Name"].astype(str).str.contains('SHIHLIN')]="SHIHLIN"
df_fastfood[df_fastfood["Restaurant Name"].astype(str).str.contains('CARL')]="CARL'S JR."

In [78]:
df_fastfood.groupby('Restaurant Name').count().sort_values(by="Unique Id",ascending=False)

Unnamed: 0_level_0,Unique Id,Food Category,Latitude,Longitude,Address,Distance,IsFastFood,x,y
Restaurant Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
KFC,30,30,30,30,30,30,30,30,30
MC DONALD'S,25,25,25,25,25,25,25,25,25
BURGER KING,15,15,15,15,15,15,15,15,15
A&W,13,13,13,13,13,13,13,13,13
SHIHLIN,3,3,3,3,3,3,3,3,3
CARL'S JR.,3,3,3,3,3,3,3,3,3
CFC,3,3,3,3,3,3,3,3,3
RICHEESE FACTORY,3,3,3,3,3,3,3,3,3
TEXAS FRIED CHICKEN,2,2,2,2,2,2,2,2,2
ES TELER 77,2,2,2,2,2,2,2,2,2


From this basic exploration, we could see that the top 4 Fast Food restaurants in Central Jakarta are KFC, Mc Donald's, Burger King, A&W. Investors might consider on franchising on either restaurants. Yet, we need to determine the best possible location which will generate best possible return on investment, ignoring the details of venue purchases, law enforcements, etc.

For a quick visualisation, let's see and distinguish fast food restaurants with the other non-fast-food restaurants in the vicinity.

In [79]:
jakpus_map = folium.Map(location=jakarta_center, zoom_start=12)
folium.Marker(jakarta_center, popup='Central Jakarta City').add_to(jakpus_map)
for x in range(len(df_resto)):
    lat = df_resto["Latitude"][x]; lon = df_resto["Longitude"][x]
    is_fastfood = df_resto["IsFastFood"][x]
    if is_fastfood == True:
        color = 'red' 
    else:
        color = 'blue'
    label = '{}, {}'.format(df_resto["Restaurant Name"][x], 
                                df_resto["Address"][x])
    label = folium.Popup(label, parse_html=True)
    
    folium.CircleMarker([lat, lon], radius=3, color=color, 
                        popup=label,fill=True, fill_color=color,
                        fill_opacity=1).add_to(jakpus_map)
jakpus_map

We can also visualize the restaurants with distance up to 300 m from the hexagonal grids we created earlier.

In [80]:
df_loc_resto = pd.DataFrame(location_restaurants)
df_temp_resto = df_loc_resto[0].dropna(axis=0)

for i in range(1,39):
    df_temp_resto = df_temp_resto.append(df_loc_resto[i].dropna(axis=0))

In [81]:
df_temp_resto.shape

(1790,)

In [82]:
df_temp_resto.iloc[0]

('4bcea93f937ca593e8e9ae92',
 [('Steakhouse', '4bf58dd8d48988d1cc941735')],
 'Stonegrill',
 -6.231686079903219,
 106.82764659232964,
 'Menara Anugerah, Kantor Taman Blok E3.3 no. 9 (Mega Kuningan), Jakarta Selatan, Jakarta',
 282,
 False,
 702211.2895325976,
 -689165.9328901849)

In [83]:
df_location_UID = []
df_location_FoodCat = []
df_location_Name = []
df_location_lat = []
df_location_long = []
df_location_addr = []
df_location_dist = []
df_location_ff = []
df_location_x = []
df_location_y = []

for i in range(len(df_temp_resto)):
    df_location_UID.append(df_temp_resto.iloc[i][0])
    df_location_FoodCat.append(df_temp_resto.iloc[i][1][0][0])
    df_location_Name.append(df_temp_resto.iloc[i][2])
    df_location_lat.append(df_temp_resto.iloc[i][3])
    df_location_long.append(df_temp_resto.iloc[i][4])
    df_location_addr.append(df_temp_resto.iloc[i][5])
    df_location_dist.append(df_temp_resto.iloc[i][6])
    df_location_ff.append(df_temp_resto.iloc[i][7])
    df_location_x.append(df_temp_resto.iloc[i][8])
    df_location_y.append(df_temp_resto.iloc[i][9])

In [84]:
df_loc_less300 = pd.DataFrame(list(zip(df_location_UID,df_location_FoodCat,
                                      df_location_Name,df_location_lat,
                                      df_location_long,df_location_addr,
                                       df_location_dist,df_location_ff,
                                       df_location_x, df_location_y)), 
                              columns = ["Unique Id","Food Category",
                                         "Restaurant Name","Latitude",
                                         "Longitude","Address","Distance",
                                         "IsFastFood","x","y"])
df_loc_less300.head()

Unnamed: 0,Unique Id,Food Category,Restaurant Name,Latitude,Longitude,Address,Distance,IsFastFood,x,y
0,4bcea93f937ca593e8e9ae92,Steakhouse,Stonegrill,-6.231686,106.827647,"Menara Anugerah, Kantor Taman Blok E3.3 no. 9 ...",282,False,702211.289533,-689165.93289
1,4ce8e2e5f1c6236ad9a562f0,Japanese Restaurant,Yoshi Izakaya,-6.235336,106.830574,Gran Meliá Jakarta (Jalan HR Rasuna Said Kav. ...,230,False,702533.91678,-689570.727135
2,4bd564086f649521b2f66eec,Asian Restaurant,Bakso Topo*,-6.234437,106.839713,"Jl Rasamala, Jakarta, Jakarta",234,False,703545.692856,-689474.84423
3,51f5021e498ee02281777173,Indonesian Restaurant,Sate Khas Senayan,-6.233162,106.844556,"Jl. Prof. Dr. Soepomo (Tebet), Jakarta, Jakart...",191,False,704082.218506,-689335.685286
4,5405957c498e2fef357dbee9,French Restaurant,frenchie,-6.236188,106.84692,"Tebet Barat Dalam no.29, Jakarta, Jakarta 12810",298,False,704342.62827,-689671.325664


In [85]:
jakpus_map = folium.Map(location=jakarta_center, zoom_start=12)
folium.Marker(jakarta_center, popup='Central Jakarta City').add_to(jakpus_map)
for x in range(len(df_loc_less300)):
    lat = df_loc_less300["Latitude"][x]; lon = df_loc_less300["Longitude"][x]
    is_fastfood = df_loc_less300["IsFastFood"][x]
    if is_fastfood == True:
        color = 'red' 
    else:
        color = 'blue'
    label = '{}, {}'.format(df_loc_less300["Restaurant Name"][x], 
                                df_loc_less300["Address"][x])
    label = folium.Popup(label, parse_html=True)
    
    folium.CircleMarker([lat, lon], radius=3, color=color, 
                        popup=label,fill=True, fill_color=color,
                        fill_opacity=1).add_to(jakpus_map)
jakpus_map

Now we know all restaurants located few kilometers from the center of Central Jakarta City. We also could differentiate which one is a fast food restaurant.

## Methodology <a name="methodology"></a>

In this project we will detect areas at Central Jakarta that have a low restaurant density, particularly those with low number of Fast Food restaurants. We will limit our analysis to area ~6km around city center.

In first step we have collected the required **data: location and type (category) of every restaurant within 6km from Central Jakarta**. We have also **identified Fast Food restaurants** (according to Foursquare categorization).

Second step in our analysis will be calculation and exploration of '**restaurant density**' across different areas of Central Jakarta by using heatmaps to identify a few promising areas close to center with low number of restaurants in general (*and* no Fast Food restaurants in vicinity) and focus our attention on those areas.

In third and final step we will focus on most promising areas and within those create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no more than two restaurants in radius of 300 meters**, and we want locations **without Fast Food restaurants in radius of 400 meters**. We will present map of all such locations but also create clusters using **k-means clustering** to those locations to identify general zones / neighborhoods / addresses. This should be a starting point for final exploration and search for optimal venue location by stakeholders.

## Analysis <a name="analysis"></a>

Let's perform some basic explanatory data analysis and derive some additional info from our raw data. First let's count the **number of restaurants in every area candidate**:

In [88]:
loc_resto_count = [len(res) for res in location_restaurants]

df_loc['Restaurants Count'] = loc_resto_count

print('Average number of restaurants in every area with radius=300m:', np.array(loc_resto_count).mean())

df_loc.head()

Average number of restaurants in every area with radius=300m: 4.917582417582418


Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center,Restaurants Count
0,"Jl. Denpasar Raya Blok N No.3, RT.8/RW.3, Kuni...",-6.234076,106.826788,702115.33639,-689429.915781,5992.495307,2
1,"Jl. Patra Kuningan Raya No.5, RT.6/RW.4, Kunin...",-6.234057,106.832209,702715.33639,-689429.915781,5840.3767,12
2,"Jl. Rasamala III No.87, RT.5/RW.13, Menteng Da...",-6.234038,106.83763,703315.33639,-689429.915781,5747.173218,1
3,"Jl. Rasamala I No.62, RT.3/RW.3, Menteng Dalam...",-6.234019,106.843051,703915.33639,-689429.915781,5715.767665,3
4,"Jl. Tebet Barat IV No.21, RT.8/RW.3, Tebet Bar...",-6.234,106.848472,704515.33639,-689429.915781,5747.173218,12


Now, we will calculate the **distance to nearest Fast Food restaurant from every area candidate nhood center**, regardless how distant it is.

In [95]:
dist_to_ff_resto = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for i in range(len(df_fastfood)):
        res_x = df_fastfood["x"][i]
        res_y = df_fastfood["y"][i]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    dist_to_ff_resto.append(min_distance)

df_loc['Dist to Fast Food resto'] = dist_to_ff_resto

In [96]:
df_loc.head()

Unnamed: 0,Address,Latitude,Longitude,X,Y,Distance from center,Restaurants Count,Dist to Fast Food resto
0,"Jl. Denpasar Raya Blok N No.3, RT.8/RW.3, Kuni...",-6.234076,106.826788,702115.33639,-689429.915781,5992.495307,2,717.447243
1,"Jl. Patra Kuningan Raya No.5, RT.6/RW.4, Kunin...",-6.234057,106.832209,702715.33639,-689429.915781,5840.3767,12,1038.098733
2,"Jl. Rasamala III No.87, RT.5/RW.13, Menteng Da...",-6.234038,106.83763,703315.33639,-689429.915781,5747.173218,1,902.545243
3,"Jl. Rasamala I No.62, RT.3/RW.3, Menteng Dalam...",-6.234019,106.843051,703915.33639,-689429.915781,5715.767665,3,354.467972
4,"Jl. Tebet Barat IV No.21, RT.8/RW.3, Tebet Bar...",-6.234,106.848472,704515.33639,-689429.915781,5747.173218,12,395.862565


In [97]:
print('Average distance to closest Fast FOod restaurant from each area center:', df_loc['Dist to Fast Food resto'].mean(),'m')

Average distance to closest Fast FOod restaurant from each area center: 554.9716288399853 m


**On average Fast Food restaurants can be found within ~550m** from every area center candidate. 
Now we need to filter our areas carefully!

Create a map showing **heatmap or density of restaurants** and attempt to meaningful info from that. **Borders of Central Jakarta sub-districts** will be shown on our map and a few circles indicating distance of 1km, 2km and 3km from Central Jakarta too.

In [98]:
resto_latlons = [[df_resto["Latitude"][x], df_resto["Longitude"][x]] 
                 for x in range(len(df_resto))]

fastfood_latlons = [[df_fastfood["Latitude"][x],df_fastfood["Longitude"][x]]
                    for x in range(len(df_fastfood))]

In [99]:
from folium import plugins
from folium.plugins import HeatMap

jakpus_map = folium.Map(location=jakarta_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(jakpus_map) #cartodbpositron cartodbdark_matter
HeatMap(resto_latlons,min_opacity=0.5).add_to(jakpus_map)
folium.Marker(jakarta_center).add_to(jakpus_map)
folium.Circle(jakarta_center, radius=1000, fill=False, color='white').add_to(jakpus_map)
folium.Circle(jakarta_center, radius=2000, fill=False, color='white').add_to(jakpus_map)
folium.Circle(jakarta_center, radius=3000, fill=False, color='white').add_to(jakpus_map)

for lat, lng, postcode, subdistrict, town in \
    zip(df['Latitude'], df['Longitude'], df['Postal Code'], df['Sub-District'],df['Town']):
        label = '{}, {}, {}'.format(town, subdistrict, postcode)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            fill=True,
            color='blue',
            fill_color='blue',
            fill_opacity=0.6,
            parse_html=False).add_to(jakpus_map)

jakpus_map

Take a look within radius=2000m, there are still some less dense areas in the east side of the marker.

Let's create another heatmap map showing **heatmap/density of Fast Food restaurants** only.

In [100]:
from folium import plugins
from folium.plugins import HeatMap

jakpus_map = folium.Map(location=jakarta_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(jakpus_map) #cartodbpositron cartodbdark_matter
HeatMap(fastfood_latlons,min_opacity=0.5).add_to(jakpus_map)
folium.Marker(jakarta_center).add_to(jakpus_map)
folium.Circle(jakarta_center, radius=1000, fill=False, color='white').add_to(jakpus_map)
folium.Circle(jakarta_center, radius=2000, fill=False, color='white').add_to(jakpus_map)
folium.Circle(jakarta_center, radius=3000, fill=False, color='white').add_to(jakpus_map)

for lat, lng, postcode, subdistrict, town in \
    zip(df['Latitude'], df['Longitude'], df['Postal Code'], df['Sub-District'],df['Town']):
        label = '{}, {}, {}'.format(town, subdistrict, postcode)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            fill=True,
            color='blue',
            fill_color='blue',
            fill_opacity=0.6,
            parse_html=False).add_to(jakpus_map)
    
jakpus_map

In [None]:
print("Fast Food Restaurants / Total Restaurants =",len(df_fastfood)/len(df_resto)*100,"%")

This map is blue-ish rather than red-ish. In fact, Fast Food restaurants  (Fast Food restaurants represent a subset of ~6.6% of all restaurants in Central Jakarta). If the stakeholders would like a place with low restaurant density and less fast food competitors, then **we should explore east from city center.**

We will move the center of our area of interest and reduce it's size to have a radius of **2 km**.

Let's define new, more narrow region of interest, which will include low-restaurant-count parts of Senen and Johar Baru closest to Central Jakarta City.

In [101]:
roi_x_min = jkt_center_x - 2000
roi_y_max = jkt_center_y + 900
roi_width = 5000
roi_height = 5000
roi_center_x = roi_x_min + 2000
roi_center_y = roi_y_max - 2000
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

jakpus_map = folium.Map(location=roi_center, zoom_start=14)
HeatMap(fastfood_latlons).add_to(jakpus_map)
folium.Marker(jakarta_center).add_to(jakpus_map)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(jakpus_map)

for lat, lng, postcode, subdistrict, town in \
    zip(df['Latitude'], df['Longitude'], df['Postal Code'], df['Sub-District'],df['Town']):
        label = '{}, {}, {}'.format(town, subdistrict, postcode)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            fill=True,
            color='blue',
            fill_color='blue',
            fill_opacity=0.6,
            parse_html=False).add_to(jakpus_map)
    
jakpus_map

Create new, more dense grid of location candidates restricted to our new region of interest (let's make our location candidates 100m apart)

In [102]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 2500

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 2501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

2155 candidate neighborhood centers generated.


OK. Now let's calculate two most important things for each location candidate: **number of restaurants in vicinity** (radius of **250 meters**) and **distance to closest Fast Food restaurant**.

In [103]:
def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[8]; res_y = res[9]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[8]; res_y = res[9]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_fastfood_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, fastfood_restaurants)
    roi_fastfood_distances.append(distance)
print('done.')


Generating data on location candidates... done.


In [104]:
# Let's put this into dataframe
df_roi_loc = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Fast Food restaurant':
                                 roi_fastfood_distances})

df_roi_loc.head(10)

Unnamed: 0,Latitude,Longitude,X,Y,Restaurants nearby,Distance to Fast Food restaurant
0,-6.214891,106.842533,703865.33639,-687314.148116,3,374.099575
1,-6.214888,106.843436,703965.33639,-687314.148116,3,284.532209
2,-6.214125,106.837561,703315.33639,-687227.545575,0,786.076465
3,-6.214122,106.838464,703415.33639,-687227.545575,2,827.940839
4,-6.214119,106.839368,703515.33639,-687227.545575,2,732.457589
5,-6.214116,106.840271,703615.33639,-687227.545575,2,638.358994
6,-6.214113,106.841175,703715.33639,-687227.545575,1,546.360953
7,-6.21411,106.842078,703815.33639,-687227.545575,3,457.731775
8,-6.214106,106.842982,703915.33639,-687227.545575,5,374.868595
9,-6.214103,106.843885,704015.33639,-687227.545575,5,302.546773


We need to **filter** those locations. 
The conditions are: 
1. Locations with no more than two restaurants in radius of 250 meters, and 
2. No fast food restaurants in radius of 600 meters.

In [109]:
meet_res_count = np.array((df_roi_loc['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', meet_res_count.sum())

meet_ff_dist = np.array(df_roi_loc['Distance to Fast Food restaurant']>=600)
print('Locations with no Fast Food restaurants within 600m:', meet_ff_dist.sum())

meet_locations = np.logical_and(meet_res_count, meet_ff_dist)
print('Locations with both conditions met:', meet_locations.sum())

df_good_locations = df_roi_loc[meet_locations]


Locations with no more than two restaurants nearby: 1181
Locations with no Fast Food restaurants within 600m: 454
Locations with both conditions met: 274


In [114]:
try:
# Try to load from local file system in case we did this before
    with open('good_locations.pkl', 'rb') as f:
        df_good_locations = pickle.load(f)
except:
    addresses = []
    for lat, lon in zip(df_good_locations["Latitude"], df_good_locations["Longitude"]):
        address = get_address(google_api_key, lat, lon)
        if address is None:
            address = 'NO ADDRESS'
        address = address.replace(', Indonesia', '') # We don't need country part of address
        addresses.append(address)

    df_good_locations["Address"] = addresses
    df_good_locations.to_pickle('./good_locations.pkl') 

df_good_locations.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Latitude,Longitude,X,Y,Restaurants nearby,Distance to Fast Food restaurant,Address
2,-6.214125,106.837561,703315.33639,-687227.545575,0,786.076465,"Jl. Epicentrum Boulevard Tim. No.1, RT.5/RW.1,..."
3,-6.214122,106.838464,703415.33639,-687227.545575,2,827.940839,"Jl. Menteng Atas No.13, RT.12/RW.12, Menteng A..."
4,-6.214119,106.839368,703515.33639,-687227.545575,2,732.457589,"Jl. Menteng Atas No.34, RT.11/RW.12, Menteng A..."
5,-6.214116,106.840271,703615.33639,-687227.545575,2,638.358994,"Jl. Menteng Atas No.9b, RT.9/RW.12, Kuningan, ..."
15,-6.21335,106.835299,703065.33639,-687140.943035,2,658.66962,"Jl. Kuningan Mulia RT. RW. 7/1, RT.7/RW.1, Men..."


In [125]:
df_good_locations["Restaurants nearby"].max()

2

**Visualizing on Map**

In [121]:
meet_latitudes = df_good_locations['Latitude'].values
meet_longitudes = df_good_locations['Longitude'].values

meet_locations = [[lat, lon] for lat, lon in zip(meet_latitudes, meet_longitudes)]

jakpus_map = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(jakpus_map)
HeatMap(resto_latlons).add_to(jakpus_map)

folium.Circle(roi_center, radius=2500, color='white', fill=True, 
              fill_opacity=0.6).add_to(jakpus_map)
folium.Marker(jakarta_center).add_to(jakpus_map)
for lat, lon, nearby, addr in zip(meet_latitudes, meet_longitudes,
                                 df_good_locations["Restaurants nearby"],
                                 df_good_locations["Address"]):
    label = 'Nearby resto: {} |\nAddress: {}'.format(nearby, addr)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, 
                        fill_color='blue',popup=label, 
                        fill_opacity=1).add_to(jakpus_map) 
    
jakpus_map

We are now able to visualize 274 potential locations to consider when opting to start a Fast Food franchise business at Central Jakarta.

In [124]:
jakpus_map = folium.Map(location=roi_center, zoom_start=14)
HeatMap(meet_locations, radius=25).add_to(jakpus_map)
folium.Marker(jakarta_center).add_to(jakpus_map)
for lat, lon, nearby, addr in zip(meet_latitudes, meet_longitudes,
                                 df_good_locations["Restaurants nearby"],
                                 df_good_locations["Address"]):
    label = 'Nearby resto: {} |\nAddress: {}'.format(nearby, addr)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, 
                        fill_color='blue',popup=label, 
                        fill_opacity=1).add_to(jakpus_map) 
jakpus_map

Above is the heat map of the locations that meet the defined requirements. We will then **cluster** these locations to create **"centers" of zones containing good locations**. This will scale down the area of analysis from 274 locations to 15 locations when we use k-means clustering with k=15. These zones, their centers and addresses will be the final result of this analysis project.

In [127]:
number_of_clusters = 15

meet_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(meet_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

jakpus_map = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(jakpus_map)
HeatMap(resto_latlons).add_to(jakpus_map)
folium.Circle(roi_center, radius=2500, color='white', fill=True, 
              fill_opacity=0.4).add_to(jakpus_map)
folium.Marker(jakarta_center).add_to(jakpus_map)

for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, 
                  fill_opacity=0.25).add_to(jakpus_map) 
for lat, lon in zip(meet_latitudes, meet_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', 
                        fill=True, fill_color='blue', 
                        fill_opacity=1).add_to(jakpus_map)
jakpus_map

The clusters are ready. Now we need to compile the center candidate address to be presented for further analysis.

In [163]:
#Set pandas to be able to display address in its full form
pd.set_option('display.max_colwidth', -1)

try:
# Try to load from local file system in case we did this before
    with open('candidate_locations.pkl', 'rb') as f:
        df_final_15 = pickle.load(f)
except:
    candidate_area_addresses = []
    d = []

    for lon, lat in cluster_centers:
        addr = get_address(google_api_key, lat, lon).replace(', Indonesia', '')
        candidate_area_addresses.append(addr)    
        x, y = lonlat_to_xy(lon, lat)
        d.append(calc_xy_distance(x, y, jkt_center_x, jkt_center_y))

    df_final_15 = pd.DataFrame(list(zip(candidate_area_addresses,d)),
                              columns = ["Candidate Address",
                                         "Distance from Central Jakarta City(km)"]) 
    df_final_15 = df_final_15.sort_values(by="Distance from Central Jakarta City(km)")
    df_final_15["Distance from Central Jakarta City(km)"] = df_final_15["Distance from Central Jakarta City(km)"]/1000
    df_final_15 = df_final_15.round(1)
    df_final_15.reset_index(drop=True,inplace=True)
    df_final_15.to_pickle('./candidate_locations.pkl')
    
df_final_15

  


Unnamed: 0,Candidate Address,Distance from Central Jakarta City(km)
0,"Jl. Kramat Kwitang 3C No.290, RT.6/RW.6, Kwitang, Kec. Senen, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10420",0.4
1,"Jl. Kramat Pulo Dalam I No.B150, RT.10/RW.5, Kramat, Kec. Senen, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10450",0.5
2,"Jl. Rawa Sawah III No.29, RT.5/RW.1, Kp. Rw., Kec. Johar Baru, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10550",1.2
3,"Jl. Dr. Wahidin Raya No.1, Ps. Baru, Kecamatan Sawah Besar, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10710",1.4
4,"5, Jl. Tanah Tinggi Timur No.90, RT.5/RW.2, Harapan Mulya, Kec. Kemayoran, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10460",1.4
5,"Jl. Teuku Umar No.20, RT.1/RW.1, Gondangdia, Kec. Menteng, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10350",1.5
6,"Jl. Kp. Rw. Sel. 2 No.53, RT.14/RW.5, Kp. Rw., Kec. Johar Baru, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10550",1.7
7,"Perpustakaan Nasional, Gambir, Kecamatan Gambir, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta",1.9
8,"Gg. S No.20B, RT.11/RW.5, Johar Baru, Kec. Johar Baru, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10560",2.0
9,"9, Jl. Borobudur No.11, RT.9/RW.2, Pegangsaan, Kec. Menteng, Kota Jakarta Pusat, Daerah Khusus Ibukota Jakarta 10320",2.0


Finally let's visualize again these chosen candidates!

In [165]:
jakpus_map = folium.Map(location=roi_center, zoom_start=13)
folium.Circle(jakarta_center, radius=50, color='red', 
              fill=True, fill_color='red', fill_opacity=1).add_to(jakpus_map)
for lonlat, addr, dist in zip(cluster_centers, candidate_area_addresses,
                             df_final_15.iloc[:,1]):
    label = 'Address: {} | Distance from Central Jakarta: {} km'.format(addr,dist)
    label = folium.Popup(label, parse_html=True)
    folium.Marker([lonlat[1], lonlat[0]], popup=label).add_to(jakpus_map) 
for lat, lon in zip(meet_latitudes, meet_longitudes):
    folium.Circle([lat, lon], radius=250, color='#0000ff00', fill=True, 
                  fill_color='#0066ff', fill_opacity=0.05).add_to(jakpus_map)
jakpus_map

## Results and Discussion <a name="results"></a>

Through this analysis, we are able to locate areas with low-density of restaurants around south, southeast and east of Central Jakarta, despite there are at least 2000 restaurants spreaded across the vicinity. We successfully recognize 15 final locations to be presented to the stakeholders who opt to open a fast food franchise business. The locations are nonetheless are located at the following sub-districts: Senen (2 locations), Johar Baru (3 locations), Menteng (3 locations), Sawah Besar (1 location), Gambir (1 location), Kemayoran (1 location), Setiabudi (2 locations), Matraman (2 locations).

Notice that Setiabudi and Matraman are sub-districts of South Jakarta not Central Jakarta. This is interesting because South Jakarta people are well-known for being more elite in terms of living than the others. And as these locations do meet our pre-defined conditions, these might be another alternatives to consider.

Venue cost wise, we could rank from the most expensive to cheapest as follow: Menteng, Kemayoran > Senen, Johar Baru, Sawah Besar, Gambir > Matraman, Setiabudi. This rankings do make sense as the more central the location is, the more strategic the place is. 

The 15 locations above met the requirement of no more than two restaurants in radius of 250 meters, and no fast food restaurants in radius of 600 meters. However, these findings should only be a starting point for further analysis (stage 2) in terms of fixed cost (including tax), variable costs such as inventories and employee salaries, target consumers, potential revenues, and much more.

The location with the least number of breakeven days should be picked. Another deeper analysis needs to be done until the stakeholders are convinced on the business prospect.

## Conclusion <a name="conclusion"></a>

We have started from a simple postal codes of Central Jakarta, retrieve their coordinates, and fetching nearby venues in the vicinity using Foursquare API. We have narrowed down the location candidates by two requirements of low restaurant density. And we have filtered 364 locations to 15 potential locations, which are pending for further technical analysis, using k-means clustering to extract the center location candidate.

Final decision is on the stakeholders who will take into account many additional factors such as attractiveness of each location (proximity to park or water), levels of noise / proximity to major roads, real estate availability, prices, social and economic dynamics of every neighborhood etc.