# Restaurant to Open in Yogyakarta, Indonesia

## 1. Introduction

Yogyakarta is a highly diverse city in Indonesia. People of all sorts come here to study, travel, do business, settle permanently, and for many other reasons. These groups tend to cluster in particular parts of the city, and this clustering shapes their food choices. Because local palates differ from one area to another, the success of a restaurant depends in part on where it opens.

This project is meant to help **entrepreneurs, local cooks and chefs** better understand the relationship between a location and the type of restaurant best suited to it, based on the local palate. By choosing a location that matches the type of restaurant they plan to open, they can hopefully reduce the risk of bankruptcy and increase profit.

To better understand palate preferences in Yogyakarta, Foursquare venue data gives us relevant information about restaurant types and their locations. Using a clustering algorithm such as k-means, we can group neighborhoods that have similar food palates based on their nearby restaurants, giving us clusters of food preferences.
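As a rough illustration of the clustering idea, the sketch below groups four neighborhoods by the share of each restaurant category found nearby. The venue list, the neighborhood labels `A` to `D`, and the choice of two clusters are all hypothetical and are not taken from the project data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue list: one row per restaurant found near a neighborhood.
venues = pd.DataFrame({
    "Kecamatan": ["A", "A", "A", "B", "B", "B", "C", "C", "D", "D", "D"],
    "Category": ["Javanese", "Javanese", "Cafe",
                 "Javanese", "Javanese", "Noodle",
                 "Cafe", "Cafe",
                 "Cafe", "Cafe", "Pizza"],
})

# Share of each restaurant category per neighborhood (each row sums to 1).
profile = pd.crosstab(venues["Kecamatan"], venues["Category"], normalize="index")

# Group neighborhoods whose category profiles are similar.
# n_clusters=2 is an arbitrary choice for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profile)
clusters = pd.Series(kmeans.labels_, index=profile.index, name="Cluster")
print(clusters)
```

In this toy data, `A` and `B` are dominated by Javanese restaurants and `C` and `D` by cafes, so each pair ends up in its own cluster. The cluster numbers themselves carry no meaning.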

## 2. Data

Data used in this project:
<br>**1.** Data retrieved from https://kodepos.nomor.net/_kodepos.php?_i=kota-kodepos&daerah=Provinsi&jobs=DI+Yogyakarta&perhal=400&urut=10&asc=00001111&sby=110000&no1=2 for region names and postal code information.
<br>**2.** The Geopy library, to find the geographical coordinates of each region.
<br>**3.** The Foursquare API, to find information about nearby restaurants, such as their location and the types of dishes they sell.
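To make the third data source more concrete, here is a minimal sketch of how a request URL for nearby venues could be assembled. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` parameters. The helper name `build_explore_url`, the default values, and the placeholder credentials are hypothetical, and the endpoint should be checked against the current Foursquare documentation before use.

```python
from urllib.parse import urlencode

# Assumption: the legacy Foursquare v2 "explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=500, limit=100, version="20180605"):
    """Build (but do not send) a venue search URL around one coordinate."""
    params = {
        "client_id": client_id,          # credentials from a Foursquare developer account
        "client_secret": client_secret,
        "v": version,                    # API version date (hypothetical default)
        "ll": "{},{}".format(lat, lng),  # "latitude,longitude"
        "radius": radius,                # search radius in meters
        "limit": limit,                  # maximum number of venues to return
    }
    return EXPLORE_URL + "?" + urlencode(params)


# Placeholder credentials; the coordinates are the Yogyakarta central point.
url = build_explore_url(-7.8011945, 110.364917,
                        "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL could then be passed to `requests.get(url).json()` once real credentials are filled in.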

## 3. Data Acquisition and Cleaning

### Importing Necessary Libraries

In [1]:
import time # for time delay while working with API

import requests # library to handle requests

import bs4 # library to parse webpages

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas import json_normalize # transform JSON into a flat pandas dataframe (pandas >= 1.0)

# Convert an address into latitude and longitude values
from geopy.geocoders import Nominatim
import geopy.geocoders

import json # library to handle JSON files

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means for the clustering stage
from sklearn.cluster import KMeans

# Map rendering library
import folium

# regular expressions
import re

### Scraping Region Name and Postal Code Data from Website

In [3]:
# Download the webpage
url = 'https://kodepos.nomor.net/_kodepos.php?_i=kecamatan-kodepos&sby=110000&daerah=Kota&jobs=Yogyakarta'
res = requests.get(url)
res.raise_for_status()

In [4]:
# Create a BeautifulSoup object, naming the parser explicitly
yogyakarta_soup = bs4.BeautifulSoup(res.text, 'html.parser')

In [6]:
# Selecting all elements inside the corresponding tags
elements = yogyakarta_soup.select('div table tbody tr td')

In [9]:
# Take a look at the raw data (assuming each table row spans 6 cells)
for i in range(2, len(elements) - 1, 6):
    print('{0} | {1} | {2}'.format(i//6 + 1, elements[i].getText(), elements[i+1].getText()))
    if elements[i].getText() == 'Wirobrajan': # the last location in the table
        break

In [29]:
len(elements)

0
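An empty result like this can happen when the selector names a tag that is not present in the raw HTML. One possible cause, which has not been verified against the actual page, is the `tbody` step: browsers insert `<tbody>` when rendering a table, but `html.parser` only sees what the server sent. The sketch below uses a small hypothetical HTML snippet to show the effect and a more tolerant way to collect the rows.

```python
import bs4

# Hypothetical stand-in for the downloaded page: note there is no <tbody> tag.
html = """
<div><table>
  <tr><th>No</th><th>Kecamatan</th><th>Kode POS</th></tr>
  <tr><td>1</td><td>Danurejan</td><td>55211</td></tr>
  <tr><td>2</td><td>Jetis</td><td>55231</td></tr>
</table></div>
"""
soup = bs4.BeautifulSoup(html, "html.parser")

# html.parser does not add <tbody>, so this selector matches nothing here.
print(len(soup.select("div table tbody tr td")))  # 0

# Walk the table row by row instead of indexing into a flat list of cells.
rows = []
for tr in soup.select("table tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 3:  # skips the header row, which only has <th> cells
        rows.append(cells[1:])  # keep [Kecamatan, Kode POS]
print(rows)  # [['Danurejan', '55211'], ['Jetis', '55231']]
```

Collecting cells per row also removes the need for a fixed step size or a hard-coded upper bound in the loop.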

In [11]:
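# Creating a new list of rows: [Kecamatan, Kode POS, Latitude, Longitude]
# Bound the loop by len(elements): a hard-coded upper bound raises
# "IndexError: list index out of range" when the selector matches fewer
# cells than expected (here, none at all)
lst = []
for i in range(2, len(elements) - 1, 6):
    Kecamatan, Kode_POS = elements[i].getText(), elements[i+1].getText()
    lst.append([Kecamatan, Kode_POS, 0, 0])
lst[1:10]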

In [13]:
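# The scrape returned no rows, so the district names and postal codes are entered by hand.
# 'Gondongtengen' and 'Mergasangan' look like misspellings of 'Gedongtengen' and
# 'Mergangsan', which would explain why geocoding finds no match for them later on.
data = [
    ['Danurejan', 55211], ['Gondongtengen', 55271], ['Gondokusuman', 55221],
    ['Gondomanan', 55121], ['Jetis', 55231], ['Kotagede', 55171],
    ['Kraton', 55131], ['Mantrijeron', 55141], ['Mergasangan', 55151],
    ['Ngampilan', 55261], ['Pakualaman', 55111], ['Tegalrejo', 55241],
    ['Umbulharjo', 55161], ['Wirobrajan', 55251],
]
df = pd.DataFrame(data, columns=['Kecamatan', 'Kode POS'])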

In [14]:
df

| | Kecamatan | Kode POS |
|---|---|---|
| 0 | Danurejan | 55211 |
| 1 | Gondongtengen | 55271 |
| 2 | Gondokusuman | 55221 |
| 3 | Gondomanan | 55121 |
| 4 | Jetis | 55231 |
| 5 | Kotagede | 55171 |
| 6 | Kraton | 55131 |
| 7 | Mantrijeron | 55141 |
| 8 | Mergasangan | 55151 |
| 9 | Ngampilan | 55261 |


### Adding Coordinates

In [16]:
# Take the first district name as a test address
address = df.iloc[0, 0]
# Geocode it with geopy's Nominatim geocoder
geolocator = Nominatim(user_agent='opening_restaurant_yogyakarta')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {0} are {1}, {2}.'.format(address, latitude, longitude))

The geographical coordinates of Danurejan are -7.79284905, 110.37177162547343.


In [18]:
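# Initialise both columns with Danurejan's coordinates as a placeholder;
# the geocoding loop that follows overwrites them row by row
df['Latitude'] = latitude
df['Longitude'] = longitude
df.head()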

| | Kecamatan | Kode POS | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Danurejan | 55211 | -7.792849 | 110.371772 |
| 1 | Gondongtengen | 55271 | -7.792849 | 110.371772 |
| 2 | Gondokusuman | 55221 | -7.792849 | 110.371772 |
| 3 | Gondomanan | 55121 | -7.792849 | 110.371772 |
| 4 | Jetis | 55231 | -7.792849 | 110.371772 |


In [21]:

# Geocode every district; one geocoder instance is enough for all requests
geolocator = Nominatim(user_agent='opening_restaurant_yogyakarta')
for i in range(len(df)):
    time.sleep(2.5) # pause between requests to stay within Nominatim's rate limit
    address = df.iloc[i, 0]
    location = geolocator.geocode(address)
    if location is None:
        continue # no match: the row keeps the placeholder coordinates
    df.iloc[i, 2] = location.latitude
    df.iloc[i, 3] = location.longitude
df

| | Kecamatan | Kode POS | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Danurejan | 55211 | -7.792849 | 110.371772 |
| 1 | Gondongtengen | 55271 | -7.792849 | 110.371772 |
| 2 | Gondokusuman | 55221 | -7.786791 | 110.381157 |
| 3 | Gondomanan | 55121 | -7.802395 | 110.366112 |
| 4 | Jetis | 55231 | -7.781514 | 110.364669 |
| 5 | Kotagede | 55171 | -7.818311 | 110.397941 |
| 6 | Kraton | 55131 | -7.808799 | 110.362726 |
| 7 | Mantrijeron | 55141 | -7.818067 | 110.359731 |
| 8 | Mergasangan | 55151 | -7.792849 | 110.371772 |
| 9 | Ngampilan | 55261 | -7.802183 | 110.357615 |
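Rows 1 and 8 still carry Danurejan's coordinates, which suggests their lookups returned no match and the placeholder silently survived. A possible alternative, sketched below, starts from `NaN` so that failed lookups stay visible, and appends the city name to each query to make it less ambiguous. The helper `add_coordinates`, the query suffix, and the stub geocoder are hypothetical; the stub stands in for `geolocator.geocode` so the example runs without network access.

```python
import math
from types import SimpleNamespace

import pandas as pd


def add_coordinates(df, geocode, suffix=", Yogyakarta, Indonesia"):
    """Return a copy of df with Latitude/Longitude columns; NaN where lookup fails."""
    out = df.copy()
    out["Latitude"] = float("nan")
    out["Longitude"] = float("nan")
    for idx, name in out["Kecamatan"].items():
        location = geocode(name + suffix)  # any callable returning an object or None
        if location is not None:
            out.at[idx, "Latitude"] = location.latitude
            out.at[idx, "Longitude"] = location.longitude
    return out


def fake_geocode(query):
    """Stub geocoder for the demo: knows one place and returns None otherwise."""
    known = {
        "Jetis, Yogyakarta, Indonesia":
            SimpleNamespace(latitude=-7.781514, longitude=110.364669),
    }
    return known.get(query)


demo = pd.DataFrame({"Kecamatan": ["Jetis", "Gondongtengen"]})
result = add_coordinates(demo, fake_geocode)

# Districts that still need attention (for example, a spelling check).
print(result[result["Latitude"].isna()]["Kecamatan"].tolist())  # ['Gondongtengen']
```

With geopy, `geocode` could be `Nominatim(user_agent=...).geocode`, optionally wrapped in `geopy.extra.rate_limiter.RateLimiter(..., min_delay_seconds=1)` in place of the manual `time.sleep` call.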


### Get Yogyakarta Central Point

In [24]:
# Get the Yogyakarta "central" point
yogyakarta_address = 'yogyakarta, indonesia'
geolocator = Nominatim(user_agent='opening_restaurant_yogyakarta')
location = geolocator.geocode(yogyakarta_address)
yog_lat = location.latitude
yog_lon = location.longitude
print('The geographical coordinates of {0} are {1}, {2}.'.format(yogyakarta_address, yog_lat, yog_lon))

The geographical coordinates of yogyakarta, indonesia are -7.8011945, 110.364917.


In [28]:
yog_map = folium.Map(location=[yog_lat, yog_lon], zoom_start=13)

# Add one circle marker per district, with the district name as popup
for lat, lng, kec in zip(df['Latitude'], df['Longitude'], df['Kecamatan']):
    label = folium.Popup(kec, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
    ).add_to(yog_map)

yog_map