<h1 align="center"> Battle of Neighborhoods - Clustering and Segmenting the Neighborhoods of Jakarta and Surabaya</h1>

<p align ="center"> Muhammad Adisatriyo Pratama
<br>
<br>
14 December 2020
</p>




# 1. Introduction

In this report I analyze and cluster the neighborhoods of two major metropolitan areas in Indonesia: Jakarta and Surabaya. Both are very popular and are among the most populous metropolitan areas in Indonesia. Although Jakarta has approximately three times the population of Surabaya, Surabaya has its own destinations, unique places, and landmarks to visit.

# 2. Business Problem

The aim of this report is to help tourists or business owners decide where to open new destinations or places, depending on what each neighborhood already offers. Once the data is obtained, the neighborhoods are clustered and segmented to see which ones are similar in terms of destinations and places. This can also help people decide whether to move to another neighborhood.

# 3. Data Collecting

In this report, we need the list of neighborhoods (kecamatan) for Jakarta and Surabaya, together with the coordinates (geographical location) of each neighborhood. Using the location of each neighborhood, we can search for the most popular venues or places in each category with the **Foursquare API**. Using the same coordinates, we can visualize the neighborhoods on **OpenStreetMap** tiles with the **Folium** library.

## 3.1 Jakarta

To get the neighborhoods (kecamatan) in Jakarta, we scrape the data from: https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Daerah_Khusus_Ibukota_Jakarta

This Wikipedia page contains several tables, one for each town in Jakarta. Each table lists the names of the neighborhoods (kecamatan) in that town and the names of the villages (kelurahan) in each neighborhood.

After processing the data, we concatenate the 6 per-town tables (the 5 administrative cities plus Kepulauan Seribu) into 1 table containing the following information:

1. *Neighborhood* : Name of the kecamatan; we call it a neighborhood to keep the report simple.
2. *Town* : Name of the administrative town of each neighborhood.

At the end, we obtained 44 rows of data, each representing one neighborhood.

## 3.2 Surabaya

We also scrape the Surabaya neighborhood data from a Wikipedia page: https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Surabaya

This page contains just 1 table, with the same kind of information as the Jakarta page. The table includes some columns that we do not need, so we keep only the same information that we kept from the Jakarta tables.

## 3.3 Nominatim OpenStreetMap

The data scraped from the Wikipedia pages does not include the coordinates of each neighborhood, so we use the Nominatim OpenStreetMap API to get the *latitude* and *longitude* of each neighborhood.

To use Nominatim from Python, we use the **geopy** library and import the **Nominatim** class from **geopy.geocoders** into the notebook.

We pass each neighborhood name as a query to the Nominatim object and get back the corresponding latitude and longitude, which we add to the neighborhood tables for Jakarta and Surabaya.

## 3.4 Foursquare API

Foursquare is a company focusing on social media services. One of its products, Foursquare City Guide (commonly just called Foursquare), provides information about venues, places, or events within an area of interest. The app also provides personalized recommendations of places to go near the user's current location, based on other users' ratings of those places. With the Foursquare API, we can make a call containing a neighborhood's location and retrieve information about the venues and places around it.

After calling the Foursquare API, we collect the venue data for each neighborhood into a **pandas DataFrame** for Jakarta and for Surabaya. The information we obtain is as follows:

1. *Neighborhood* : Name of kecamatan, we call this neighborhood to make it easy for report.
2. *Town* : Name of Administrative Town for each neighborhood.
3. *Latitude* : Latitude coordinates of the neighborhood.
4. *Longitude* : Longitude coordinates of the neighborhood.
5. *Venue* : Name of the venue.
6. *Venue Category* : Category of the venue.
7. *Venue Latitude* : Latitude coordinates of the venue.
8. *Venue Longitude* : Longitude coordinates of the venue.


# 4. Methodology

In this section, I collect (scrape) data from Wikipedia pages to get the **neighborhood information** for **Jakarta** and **Surabaya**. After getting that information, I use the name of each neighborhood as a keyword to obtain the **neighborhood coordinates** (latitude and longitude) using **Nominatim** from the *geopy.geocoders* package. Using the coordinates of each neighborhood, I call the **Foursquare API** to get relevant venues and places near the given **latitude** and **longitude**. From that information we create a pandas dataframe listing the **5 most common venue categories for each neighborhood**.

### Import library
Before we start collecting and processing data, we import the libraries used in this notebook.

In [1]:
# basic library
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

# import folium library for map visualization
# !pip install folium # uncomment this if you have not already installed folium
import folium

# Import Nominatim API from geopy.geocoders.Nominatim package for providing information about latitude and longitude
# !pip install geopy # uncomment this if you have not already installed geopy
from geopy.geocoders import Nominatim

# import k-means for the clustering stage
from sklearn.cluster import KMeans

## 4.1 Data Collection

## Explore Jakarta

### In this part I wrangle the data from the Wikipedia page to obtain the neighborhood data for Jakarta
URL : https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Daerah_Khusus_Ibukota_Jakarta

This Wikipedia page contains several tables that we need, so we use the **pandas.read_html()** function to get a list of all tables on the page.

In [2]:
jakarta_url = 'https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Daerah_Khusus_Ibukota_Jakarta'
jakarta_wiki = requests.get(jakarta_url)
jakarta_wiki

<Response [200]>

We got response 200, which means the request succeeded.

In [3]:
# Read data
jakarta_data = pd.read_html(jakarta_wiki.text)

# See first couple of data
jakarta_data[0:2]

[   No. Kode Kemendagri         Kabupaten/Kota Luas Wilayah (km²)  \
    No. Kode Kemendagri         Kabupaten/Kota Luas Wilayah (km²)   
 0  1.0           31.01  Kab. Kepulauan Seribu               1018   
 1  2.0           31.73     Kota Jakarta Barat              12444   
 2  3.0           31.71     Kota Jakarta Pusat               5238   
 3  4.0           31.74   Kota Jakarta Selatan              15432   
 4  5.0           31.75     Kota Jakarta Timur              18270   
 5  6.0           31.72     Kota Jakarta Utara              13999   
 6  NaN             NaN                  TOTAL              66401   
 
   Penduduk (jiwa) Kepadatan (jiwa/km²)      2017                 
   Penduduk (jiwa) Kepadatan (jiwa/km²) Kecamatan Kelurahan Desa  
 0          27.123              2.66434         2         6    -  
 1       2.324.121             18.67664         8        56    -  
 2       1.138.346             21.73246         8        44    -  
 3       2.188.457             14.18129   

Here is a sample of the data that we want

In [4]:
jakarta_data[1].head(2)

Unnamed: 0,Kode Kemendagri,Kecamatan,Jumlah Kelurahan,Daftar Kelurahan
0,31.71.05,Cempaka Putih,3,Cempaka Putih Barat Cempaka Putih Timur Rawasari
1,31.71.01,Gambir,6,Cideng Duri Pulo Gambir Kebon Kelapa Petojo Se...


## Data Processing

Add information about administrative town

In [5]:
jakarta_data[1]['Town'] = 'Central Jakarta'
jakarta_data[2]['Town'] = 'North Jakarta'
jakarta_data[3]['Town'] = 'East Jakarta'
jakarta_data[4]['Town'] = 'South Jakarta'
jakarta_data[5]['Town'] = 'West Jakarta'
jakarta_data[6]['Town'] = 'Kepulauan Seribu'

Some of the tables use a different column name with the same meaning, so we rename those columns

In [6]:
jakarta_data[2].rename(columns={'Kemendagri':'Kode Kemendagri'}, inplace=True)
jakarta_data[3].rename(columns={'Kemendagri':'Kode Kemendagri'}, inplace=True)


Concatenate all tables into one dataframe containing all of the information

In [7]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result here
jakarta_df = pd.concat(jakarta_data[1:7], ignore_index=True).dropna()

Here is the combined data for the neighborhoods in Jakarta

In [8]:
jakarta_df

Unnamed: 0,Kode Kemendagri,Kecamatan,Jumlah Kelurahan,Daftar Kelurahan,Town
0,31.71.05,Cempaka Putih,3,Cempaka Putih Barat Cempaka Putih Timur Rawasari,Central Jakarta
1,31.71.01,Gambir,6,Cideng Duri Pulo Gambir Kebon Kelapa Petojo Se...,Central Jakarta
2,31.71.08,Johar Baru,4,Galur Johar Baru Kampung Rawa Tanah Tinggi,Central Jakarta
3,31.71.03,Kemayoran,8,Cempaka Baru Gunung Sahari Selatan Harapan Mul...,Central Jakarta
4,31.71.06,Menteng,5,Cikini Gondangdia Kebon Sirih Menteng Pegangsaan,Central Jakarta
5,31.71.02,Sawah Besar,5,Gunung Sahari Utara Karang Anyar Kartini Mangg...,Central Jakarta
6,31.71.04,Senen,6,Bungur Kenari Kramat Kwitang Paseban Senen,Central Jakarta
7,31.71.07,Tanah Abang,7,Bendungan Hilir Gelora Kampung Bali Karet Teng...,Central Jakarta
9,31.72.04,Cilincing,7,Cilincing Kalibaru Marunda Rorotan Semper Bara...,North Jakarta
10,31.72.06,Kelapa Gading,3,Kelapa Gading Barat Kelapa Gading Timur Pegang...,North Jakarta


## Feature Selection

In this part we drop the columns/features that we don't need and rename some of the columns to English to make further analysis easier

In [9]:
# Drop columns 'Kode Kemendagri', 'Jumlah Kelurahan', and 'Daftar Kelurahan'
jakarta_df.drop(columns=['Kode Kemendagri', 'Jumlah Kelurahan', 'Daftar Kelurahan'], inplace=True)
jakarta_df.head()

Unnamed: 0,Kecamatan,Town
0,Cempaka Putih,Central Jakarta
1,Gambir,Central Jakarta
2,Johar Baru,Central Jakarta
3,Kemayoran,Central Jakarta
4,Menteng,Central Jakarta


In [10]:
# Rename column 'Kecamatan' into 'Neighborhood'
jakarta_df.rename(columns={'Kecamatan':'Neighborhood'}, inplace=True)
jakarta_df.head()

Unnamed: 0,Neighborhood,Town
0,Cempaka Putih,Central Jakarta
1,Gambir,Central Jakarta
2,Johar Baru,Central Jakarta
3,Kemayoran,Central Jakarta
4,Menteng,Central Jakarta


Here is the information about jakarta_df

In [11]:
jakarta_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 44 entries, 0 to 48
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Neighborhood  44 non-null     object
 1   Town          44 non-null     object
dtypes: object(2)
memory usage: 1.0+ KB


In [12]:
jakarta_df.describe()

Unnamed: 0,Neighborhood,Town
count,44,44
unique,44,6
top,Tebet,East Jakarta
freq,1,10


## Explore Surabaya

### In this part I wrangle the data from the Wikipedia page to obtain the neighborhood data for Surabaya
URL : https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Surabaya

The approach is much the same as the one I used for the Jakarta neighborhoods

In [13]:
# Read html
surabaya_url = 'https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Surabaya'
surabaya_wiki = requests.get(surabaya_url)

# Read data
surabaya_data = pd.read_html(surabaya_wiki.text)

# Get first data from wiki page and assign it to pandas dataframe
surabaya_df = surabaya_data[0]

# Add column town
surabaya_df['Town'] = 'Surabaya'

# Remove unnecessary rows (rows containing missing values)
surabaya_df.dropna(inplace=True)

## Feature Selection

In [14]:
# Drop columns 'Kode Kemendagri', 'Jumlah Kelurahan', and 'Daftar Kelurahan'
surabaya_df.drop(columns=['Kode Kemendagri', 'Jumlah Kelurahan', 'Daftar Kelurahan'], inplace=True)
surabaya_df.head()

Unnamed: 0,Kecamatan,Town
0,Asemrowo,Surabaya
1,Benowo,Surabaya
2,Bubutan,Surabaya
3,Bulak,Surabaya
4,Dukuh Pakis,Surabaya


In [15]:
# Rename column 'Kecamatan' into 'Neighborhood'
surabaya_df.rename(columns={'Kecamatan':'Neighborhood'}, inplace=True)
surabaya_df.head()

Unnamed: 0,Neighborhood,Town
0,Asemrowo,Surabaya
1,Benowo,Surabaya
2,Bubutan,Surabaya
3,Bulak,Surabaya
4,Dukuh Pakis,Surabaya


Here is the information about surabaya_df

In [16]:
surabaya_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 31 entries, 0 to 30
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Neighborhood  31 non-null     object
 1   Town          31 non-null     object
dtypes: object(2)
memory usage: 744.0+ bytes


## Nominatim OpenStreetMap API

To get the latitude and longitude of each neighborhood in Jakarta and Surabaya, we use the Nominatim geocoder from **geopy.geocoders**, passing the neighborhood name as the query.

First we create the Nominatim object

In [17]:
# Create Nominatim object as 'geolocator'
geolocator = Nominatim(user_agent='explorer')

Now we create the functions that we will apply to both dataframes

In [18]:
# Each of these functions geocodes a neighborhood name and returns its latitude or longitude
def get_latitude_jakarta(neighborhood):
    location = geolocator.geocode(f'{neighborhood}, Jakarta, Indonesia')
    latitude = location.latitude
    return latitude

def get_longitude_jakarta(neighborhood):
    location = geolocator.geocode(f'{neighborhood}, Jakarta, Indonesia')
    longitude = location.longitude
    return longitude

def get_latitude_surabaya(neighborhood):
    location = geolocator.geocode(f'{neighborhood}, Surabaya, Indonesia')
    latitude = location.latitude
    return latitude

def get_longitude_surabaya(neighborhood):
    location = geolocator.geocode(f'{neighborhood}, Surabaya, Indonesia')
    longitude = location.longitude
    return longitude
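
The four functions above send two geocoding requests per neighborhood and raise an `AttributeError` if Nominatim finds no match, because geopy's `geocode` returns `None` in that case. A minimal sketch of a single helper that makes one lookup and tolerates missing results is shown below. The name `geocode_neighborhood` and its `geocode` parameter are hypothetical and not part of the original notebook; the `geocode` argument is assumed to behave like `geolocator.geocode`.

```python
from typing import Callable, Optional, Tuple


def geocode_neighborhood(
    neighborhood: str,
    city: str,
    geocode: Callable[[str], Optional[object]],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (latitude, longitude) for a neighborhood, or (None, None) if nothing is found.

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude` attributes, or None when there is no match.
    """
    # Build the same query string that the notebook's functions use
    location = geocode(f'{neighborhood}, {city}, Indonesia')
    if location is None:
        return None, None
    return location.latitude, location.longitude
```

In the notebook, this could be called as `geocode_neighborhood(name, 'Jakarta', geolocator.geocode)`, which would halve the number of requests sent to Nominatim.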

Find the latitude and longitude of each neighborhood in Jakarta

In [19]:
jakarta_df['Latitude'] = jakarta_df['Neighborhood'].apply(get_latitude_jakarta)
jakarta_df['Longitude'] = jakarta_df['Neighborhood'].apply(get_longitude_jakarta)
jakarta_df.head()

Unnamed: 0,Neighborhood,Town,Latitude,Longitude
0,Cempaka Putih,Central Jakarta,-6.181214,106.868548
1,Gambir,Central Jakarta,-6.176684,106.830653
2,Johar Baru,Central Jakarta,-6.183125,106.855332
3,Kemayoran,Central Jakarta,-6.162546,106.85689
4,Menteng,Central Jakarta,-6.195026,106.832224


Find the latitude and longitude of each neighborhood in Surabaya

In [20]:
surabaya_df['Latitude'] = surabaya_df['Neighborhood'].apply(get_latitude_surabaya)
surabaya_df['Longitude'] = surabaya_df['Neighborhood'].apply(get_longitude_surabaya)
surabaya_df.head()

Unnamed: 0,Neighborhood,Town,Latitude,Longitude
0,Asemrowo,Surabaya,-7.24174,112.688802
1,Benowo,Surabaya,-7.229055,112.649775
2,Bubutan,Surabaya,-7.252671,112.730062
3,Bulak,Surabaya,-7.228354,112.787631
4,Dukuh Pakis,Surabaya,-7.293024,112.695125


#### Saving dataframe as csv for further use

In [21]:
jakarta_df.to_csv('data/jakarta_neighborhood.csv')
surabaya_df.to_csv('data/surabaya_neighborhood.csv')

## 4.2 Map Visualize

Visualize the neighborhoods from both dataframes on an OpenStreetMap view using Folium

### Jakarta Neighborhood Map View

Get the coordinates of Jakarta

In [22]:
address = 'Jakarta'

location = geolocator.geocode(address)
jakarta_latitude = location.latitude
jakarta_longitude = location.longitude
print(f'Coordinates of Jakarta are {jakarta_latitude}, {jakarta_longitude}')

Coordinates of Jakarta are -6.1753942, 106.827183


### Folium OpenStreetMap of Jakarta Neighborhood

In [23]:
jakarta_map = folium.Map(location=[jakarta_latitude, jakarta_longitude], zoom_start=11)

for latitude, longitude, borough, neighborhood in zip(jakarta_df['Latitude'], jakarta_df['Longitude'], jakarta_df['Town'], jakarta_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(jakarta_map)
    
jakarta_map

### Surabaya Neighborhood Map View

Get the coordinates of Surabaya

In [24]:
address = 'Surabaya'

location = geolocator.geocode(address)
surabaya_latitude = location.latitude
surabaya_longitude = location.longitude
print(f'Coordinates of Surabaya are {surabaya_latitude}, {surabaya_longitude}')

Coordinates of Surabaya are -7.2459717, 112.7378266


### Folium OpenStreetMap of Surabaya Neighborhood

In [25]:
surabaya_map = folium.Map(location=[surabaya_latitude, surabaya_longitude], zoom_start=11)

for latitude, longitude, borough, neighborhood in zip(surabaya_df['Latitude'], surabaya_df['Longitude'], surabaya_df['Town'], surabaya_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True
        ).add_to(surabaya_map)
    
surabaya_map

## 4.3 Foursquare API

Defining Foursquare API Credentials and Version

In [26]:
#@hidden cells
import os

# Read the credentials from environment variables instead of hard-coding them
# (the variable names below are an assumption; use whatever names you have set)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # Intended per-request venue limit (defined here but not passed to the request URL below)

print('Credentials loaded.')

Credentials loaded.


### Get nearby venues
Create a function that retrieves information about venues and places near the given latitudes and longitudes using the Foursquare API

In [27]:
# Function that returns a dataframe of neighborhood name, latitude, longitude, venue name, and venue category
def get_nearby_venues(names, latitudes, longitudes, radius=500):
    
    # create an empty list
    venues_list=[]
    
    # for loop that iterate through dataframe
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

        
    # Create pandas dataframe from venues_list
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)
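
The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so it fails if a response has no groups or a venue has no category. The sketch below shows one way to flatten a single response more defensively. It assumes only the JSON layout that the function above already relies on; the name `extract_venues` is hypothetical and not part of the original notebook.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style JSON payload into (name, lat, lng, venue, category) rows.

    Missing groups yield an empty list; a venue without categories gets None.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use the first category name when present, as the notebook does
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'], category))
    return rows
```

The returned rows have the same five fields, in the same order, as the columns assigned to `nearby_venues` above, so they could be passed to `pd.DataFrame` in the same way.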

### Get venues data for each neighborhood in Jakarta

In [28]:
jakarta_venues = get_nearby_venues(jakarta_df['Neighborhood'], jakarta_df['Latitude'], jakarta_df['Longitude'])

Cempaka Putih
Gambir
Johar Baru
Kemayoran
Menteng
Sawah Besar
Senen
Tanah Abang
Cilincing
Kelapa Gading
Koja
Pademangan
Penjaringan
Tanjung Priok
Cakung
Cipayung
Ciracas
Duren Sawit
Jatinegara
Kramat Jati
Makasar
Matraman
Pasar Rebo
Pulo Gadung
Cilandak
Jagakarsa
Kebayoran Baru
Kebayoran Lama
Mampang Prapatan
Pancoran
Pasar Minggu
Pesanggrahan
Setiabudi
Tebet
Cengkareng
Grogol Petamburan
Taman Sari
Tambora
Kebon Jeruk
Kalideres
Palmerah
Kembangan
Kepulauan Seribu Utara
Kepulauan Seribu Selatan


### Get venues data for each neighborhood in Surabaya

In [29]:
surabaya_venues = get_nearby_venues(surabaya_df['Neighborhood'], surabaya_df['Latitude'], surabaya_df['Longitude'])

Asemrowo
Benowo
Bubutan
Bulak
Dukuh Pakis
Gayungan
Genteng
Gubeng
Gunung Anyar
Jambangan
Karang Pilang
Kenjeran
Krembangan
Lakarsantri
Mulyorejo
Pabean Cantian
Pakal
Rungkut
Sambikerep
Sawahan
Semampir
Simokerto
Sukolilo
Sukomanunggal
Tambaksari
Tandes
Tegalsari
Tenggilis Mejoyo
Wiyung
Wonocolo
Wonokromo


### Check the size of the resulting dataframe (Jakarta and Surabaya)

In [30]:
# Jakarta Venues dataframe
print(f'Shape of jakarta_venues dataframe : {jakarta_venues.shape}\n')
print('Head of jakarta_venues dataframe : ')
jakarta_venues.head()


Shape of jakarta_venues dataframe : (567, 5)

Head of jakarta_venues dataframe : 


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
0,Cempaka Putih,-6.181214,106.868548,Mie Aceh Bungong Cempaka,Acehnese Restaurant
1,Cempaka Putih,-6.181214,106.868548,Arcici Swiming Pool™,Pool
2,Cempaka Putih,-6.181214,106.868548,Pizza Hut,Pizza Place
3,Cempaka Putih,-6.181214,106.868548,Pizza Hut,Pizza Place
4,Cempaka Putih,-6.181214,106.868548,Bebek Bentu,BBQ Joint


In [31]:
# Surabaya venues dataframe
print(f'Shape of surabaya_venues dataframe : {surabaya_venues.shape}\n')
print('Head of surabaya_venues dataframe : ')
surabaya_venues.head()

Shape of surabaya_venues dataframe : (224, 5)

Head of surabaya_venues dataframe : 


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
0,Asemrowo,-7.24174,112.688802,CANTEEN,Wine Bar
1,Benowo,-7.229055,112.649775,"Benowo Trade Centre [ BTC ], Benowo, Surabaya",Shoe Store
2,Benowo,-7.229055,112.649775,Stadion GBT Benowo,Soccer Field
3,Benowo,-7.229055,112.649775,Pecel B.Yatin ketabang kali,Food Court
4,Bubutan,-7.252671,112.730062,CGV Cinemas,Multiplex


#### Saving dataframe as csv for further use

In [32]:
jakarta_venues.to_csv('data/jakarta_venues.csv')
surabaya_venues.to_csv('data/surabaya_venues.csv')

## 4.4 Check how many venues were returned for each neighborhood

### Jakarta

In [33]:
jakarta_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Cakung,3,3,3,3
Cempaka Putih,6,6,6,6
Cengkareng,4,4,4,4
Cilandak,22,22,22,22
Cilincing,3,3,3,3
Cipayung,2,2,2,2
Ciracas,3,3,3,3
Duren Sawit,5,5,5,5
Gambir,25,25,25,25
Grogol Petamburan,30,30,30,30


Unique Categories in Jakarta

In [34]:
print(f'There are {len(jakarta_venues["Venue Category"].unique())} unique categories in Jakarta.')

There are 141 unique categories in Jakarta.


### Surabaya

In [35]:
surabaya_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Asemrowo,1,1,1,1
Benowo,3,3,3,3
Bubutan,13,13,13,13
Bulak,8,8,8,8
Dukuh Pakis,3,3,3,3
Gayungan,8,8,8,8
Genteng,11,11,11,11
Gubeng,28,28,28,28
Jambangan,5,5,5,5
Karang Pilang,1,1,1,1


Unique Categories in Surabaya

In [36]:
print(f'There are {len(surabaya_venues["Venue Category"].unique())} unique categories in Surabaya.')

There are 76 unique categories in Surabaya.


## 4.5 One Hot Encoding

To find the top 5 most common venue categories, we first transform the categorical data into numeric indicator columns with one-hot encoding, using the **pandas.get_dummies()** function

### One hot encoding for Jakarta venues 

In [37]:
# one hot encoding
jakarta_onehot = pd.get_dummies(jakarta_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
jakarta_onehot['Neighborhood'] = jakarta_venues['Neighborhood'] 

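# move the last column to the front; this is meant to be 'Neighborhood', but if a venue category is itself
# named 'Neighborhood', the assignment above overwrites that column in place and a different column ends up first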
fixed_columns = [jakarta_onehot.columns[-1]] + list(jakarta_onehot.columns[:-1])
jakarta_onehot = jakarta_onehot[fixed_columns]

jakarta_onehot.head()

Unnamed: 0,Wings Joint,Accessories Store,Acehnese Restaurant,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Track,Track Stadium,Trail,Train,Train Station,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [38]:
# Shape of jakarta_onehot
jakarta_onehot.shape

(567, 141)

Group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category

In [39]:
jakarta_grouped = jakarta_onehot.groupby('Neighborhood').mean().reset_index()
jakarta_grouped.head()

Unnamed: 0,Neighborhood,Wings Joint,Accessories Store,Acehnese Restaurant,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Track,Track Stadium,Trail,Train,Train Station,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,Cakung,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Cempaka Putih,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Cengkareng,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Cilandak,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Cilincing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
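
The mean of a one-hot column within a neighborhood is simply the share of that neighborhood's venues that fall in the category. The sketch below reproduces that calculation with plain Python counters so the arithmetic is visible. The sample rows are modeled on the Cempaka Putih venues shown in this notebook; the sixth row is inferred from the ranked venue table and is an assumption, and the function name `category_shares` is hypothetical.

```python
from collections import Counter, defaultdict


def category_shares(rows):
    """Map each neighborhood to {category: share of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1

    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares


# (neighborhood, venue category) pairs; the last row is an assumed sixth venue
sample = [
    ('Cempaka Putih', 'Acehnese Restaurant'),
    ('Cempaka Putih', 'Pool'),
    ('Cempaka Putih', 'Pizza Place'),
    ('Cempaka Putih', 'Pizza Place'),
    ('Cempaka Putih', 'BBQ Joint'),
    ('Cempaka Putih', 'Indonesian Meatball Place'),
]

shares = category_shares(sample)
print(shares['Cempaka Putih']['Acehnese Restaurant'])
```

With six venues, one Acehnese restaurant gives a share of 1/6, which matches the 0.166667 in the grouped table above.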


Create a pandas dataframe listing the top 5 most common venue categories for each neighborhood

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
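
Note that `return_most_common_venues` always returns exactly `num_top_venues` names. A neighborhood with fewer distinct categories therefore has its remaining slots filled with categories whose frequency is zero; Asemrowo in Surabaya, for example, has a single venue but still shows five ranked categories. The sketch below is one possible variant that keeps only categories that actually occur. The function `top_categories` is hypothetical and works on a plain dictionary rather than a dataframe row.

```python
def top_categories(shares, n):
    """Return up to n category names with a non-zero share, highest share first.

    Ties are broken alphabetically so the result is deterministic.
    """
    nonzero = [(cat, share) for cat, share in shares.items() if share > 0]
    nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in nonzero[:n]]
```

A result shorter than `n` then signals that the neighborhood has too few venues for a full ranking, which may be worth knowing before clustering.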

In [41]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
jakarta_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
jakarta_neighborhoods_venues_sorted['Neighborhood'] = jakarta_grouped['Neighborhood']

for ind in np.arange(jakarta_grouped.shape[0]):
    jakarta_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(jakarta_grouped.iloc[ind, :], num_top_venues)

jakarta_neighborhoods_venues_sorted.head()


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Cakung,Lounge,Gas Station,Wine Bar,French Restaurant,Food Stand
1,Cempaka Putih,Pizza Place,Acehnese Restaurant,Pool,Indonesian Meatball Place,BBQ Joint
2,Cengkareng,Music Venue,Pet Store,Restaurant,Movie Theater,Electronics Store
3,Cilandak,Gym,Indonesian Restaurant,Food Truck,Convenience Store,Pizza Place
4,Cilincing,Park,Diner,Shopping Mall,Wine Bar,Farmers Market


In [42]:
# shape
jakarta_neighborhoods_venues_sorted.shape

(44, 6)

### One hot encoding for Surabaya venues 

In [43]:
# one hot encoding
surabaya_onehot = pd.get_dummies(surabaya_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
surabaya_onehot['Neighborhood'] = surabaya_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [surabaya_onehot.columns[-1]] + list(surabaya_onehot.columns[:-1])
surabaya_onehot = surabaya_onehot[fixed_columns]

surabaya_onehot.head()

Unnamed: 0,Neighborhood,Arcade,Asian Restaurant,Australian Restaurant,Bakery,Balinese Restaurant,Basketball Court,Batik Shop,Beach,Bed & Breakfast,...,Soccer Field,Soccer Stadium,Soup Place,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Theme Park,Vegetarian / Vegan Restaurant,Wine Bar
0,Asemrowo,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,Benowo,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Benowo,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
3,Benowo,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bubutan,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [44]:
# Shape of surabaya_onehot
surabaya_onehot.shape

(224, 77)

Group the rows by neighborhood and take the mean of each category column, which gives the frequency of occurrence of each category

In [45]:
surabaya_grouped = surabaya_onehot.groupby('Neighborhood').mean().reset_index()
surabaya_grouped.head()

Unnamed: 0,Neighborhood,Arcade,Asian Restaurant,Australian Restaurant,Bakery,Balinese Restaurant,Basketball Court,Batik Shop,Beach,Bed & Breakfast,...,Soccer Field,Soccer Stadium,Soup Place,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Theme Park,Vegetarian / Vegan Restaurant,Wine Bar
0,Asemrowo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1,Benowo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bubutan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0
3,Bulak,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.375,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0
4,Dukuh Pakis,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Create a pandas dataframe listing the top 5 most common venue categories for each neighborhood

In [46]:
# create a new dataframe
surabaya_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
surabaya_neighborhoods_venues_sorted['Neighborhood'] = surabaya_grouped['Neighborhood']

for ind in np.arange(surabaya_grouped.shape[0]):
    surabaya_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(surabaya_grouped.iloc[ind, :], num_top_venues)

surabaya_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Asemrowo,Wine Bar,Fish & Chips Shop,Dessert Shop,Diner,Donut Shop
1,Benowo,Shoe Store,Food Court,Soccer Field,Wine Bar,Dessert Shop
2,Bubutan,Indonesian Restaurant,Hotel,Multiplex,Donut Shop,Pizza Place
3,Bulak,Beach,Indonesian Restaurant,Food Truck,Fish & Chips Shop,Dessert Shop
4,Dukuh Pakis,Convenience Store,Mobile Phone Shop,Boutique,Fast Food Restaurant,Diner


In [47]:
# shape
surabaya_neighborhoods_venues_sorted.shape

(29, 6)

In [48]:
# Save into csv for further use
jakarta_grouped.to_csv('data/jakarta_grouped_onehot.csv')
jakarta_neighborhoods_venues_sorted.to_csv('data/jakarta_neighborhoods_venues_sorted.csv')

surabaya_grouped.to_csv('data/surabaya_grouped_onehot.csv')
surabaya_neighborhoods_venues_sorted.to_csv('data/surabaya_neighborhoods_venues_sorted.csv')

# 5. Modeling

After obtaining the top 5 most common venue categories for each neighborhood in Jakarta and Surabaya, we can create a clustering model using the **K-Means** implementation from scikit-learn.

We run K-Means to cluster and segment the neighborhoods into 5 different clusters based on the types of venues and places.

In [49]:
# set number of clusters
kclusters = 5

# instantiate kmeans model
kmeans = KMeans(n_clusters=kclusters, random_state=0)

## 5.1 Prepare the data (features) for modeling

We use the grouped dataframes for Jakarta and Surabaya, which contain the mean one-hot encoded values for venues and places, and drop the 'Neighborhood' column, which contains the neighborhood names (string dtype)

In [50]:
# Jakarta data
jakarta_cluster = jakarta_grouped.drop(columns=['Neighborhood'])

# Surabaya data
surabaya_cluster = surabaya_grouped.drop(columns=['Neighborhood'])

## 5.2 Begin modeling

### Clustering in Jakarta Neighborhood

In [51]:
# fit the data
jakarta_kmmeans = kmeans.fit(jakarta_cluster)

# check cluster labels generated for each row in the dataframe
jakarta_kmmeans.labels_

array([3, 3, 3, 3, 0, 3, 4, 3, 3, 3, 4, 3, 4, 3, 3, 3, 3, 4, 4, 3, 3, 2,
       4, 4, 4, 4, 3, 3, 3, 3, 4, 3, 4, 3, 4, 4, 4, 4, 3, 3, 3, 4, 1, 4])

In [52]:
# add clustering labels
jakarta_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join the sorted venues (now carrying the cluster labels) onto jakarta_df,
# so each neighborhood keeps its town and latitude/longitude
jakarta_merged = jakarta_df.join(jakarta_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

jakarta_merged.head() 

| | Neighborhood | Town | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cempaka Putih | Central Jakarta | -6.181214 | 106.868548 | 3 | Pizza Place | Acehnese Restaurant | Pool | Indonesian Meatball Place | BBQ Joint |
| 1 | Gambir | Central Jakarta | -6.176684 | 106.830653 | 3 | Indonesian Restaurant | Park | Coffee Shop | Fast Food Restaurant | Food Court |
| 2 | Johar Baru | Central Jakarta | -6.183125 | 106.855332 | 4 | Food Truck | Convenience Store | Arcade | Indonesian Restaurant | Italian Restaurant |
| 3 | Kemayoran | Central Jakarta | -6.162546 | 106.85689 | 4 | Diner | Arcade | Indonesian Restaurant | Noodle House | Wine Bar |
| 4 | Menteng | Central Jakarta | -6.195026 | 106.832224 | 3 | Indonesian Restaurant | Breakfast Spot | Coffee Shop | Park | Sushi Restaurant |


In [53]:
# Check na value for 'Cluster Label'
jakarta_merged['Cluster Labels'].isna().sum()

0
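
One detail worth keeping in mind before fitting a second dataset: scikit-learn's `fit` returns the estimator itself, so a variable assigned from `kmeans.fit(...)` is just another name for `kmeans`. Fitting the same object again overwrites its `labels_`. The sketch below uses random toy data and hypothetical variable names to show the effect, and one possible way around it with `sklearn.base.clone`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data_a = rng.random((12, 4))  # stand-in for one city's features
data_b = rng.random((8, 4))   # stand-in for the other city's features

base = KMeans(n_clusters=3, random_state=0, n_init=10)

# fit() returns the same object, so both names point to one estimator
fitted_a = base.fit(data_a)
fitted_b = base.fit(data_b)
same_object = fitted_a is fitted_b
print(same_object, len(fitted_a.labels_))  # labels now describe data_b

# clone() makes an unfitted copy with the same parameters,
# so each dataset keeps its own fitted model and labels
model_a = clone(base).fit(data_a)
model_b = clone(base).fit(data_b)
print(len(model_a.labels_), len(model_b.labels_))
```

In the notebook the labels are read immediately after each fit, so the results are unaffected. The separate-model pattern only matters if the cells are re-run out of order or the fitted models are needed later.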

### Clustering in Surabaya Neighborhood

In [54]:
# fit the data
surabaya_kmmeans = kmeans.fit(surabaya_cluster)

# check cluster labels generated for each row in the dataframe
surabaya_kmmeans.labels_

array([4, 1, 1, 1, 0, 0, 1, 1, 0, 3, 1, 0, 2, 1, 1, 1, 0, 1, 2, 1, 1, 2,
       2, 1, 1, 1, 1, 1, 2])

In [55]:
# add clustering labels
surabaya_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join the sorted venues (now carrying the cluster labels) onto surabaya_df,
# so each neighborhood keeps its town and latitude/longitude
surabaya_merged = surabaya_df.join(surabaya_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

surabaya_merged.head() 

| | Neighborhood | Town | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Asemrowo | Surabaya | -7.24174 | 112.688802 | 4.0 | Wine Bar | Fish & Chips Shop | Dessert Shop | Diner | Donut Shop |
| 1 | Benowo | Surabaya | -7.229055 | 112.649775 | 1.0 | Shoe Store | Food Court | Soccer Field | Wine Bar | Dessert Shop |
| 2 | Bubutan | Surabaya | -7.252671 | 112.730062 | 1.0 | Indonesian Restaurant | Hotel | Multiplex | Donut Shop | Pizza Place |
| 3 | Bulak | Surabaya | -7.228354 | 112.787631 | 1.0 | Beach | Indonesian Restaurant | Food Truck | Fish & Chips Shop | Dessert Shop |
| 4 | Dukuh Pakis | Surabaya | -7.293024 | 112.695125 | 0.0 | Convenience Store | Mobile Phone Shop | Boutique | Fast Food Restaurant | Diner |


In [56]:
# Check na value for 'Cluster Label'
surabaya_merged['Cluster Labels'].isna().sum()

2

In [57]:
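# drop neighborhoods that are missing from surabaya_grouped (no cluster label),
# then restore integer labels (the NaN values forced the column to float)
surabaya_merged.dropna(subset=['Cluster Labels'], inplace=True)
surabaya_merged['Cluster Labels'] = surabaya_merged['Cluster Labels'].astype(int)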
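
The missing labels above come from the join itself: any neighborhood in the left dataframe without a matching row on the right receives `NaN`, and the integer label column is upcast to float (hence values such as `4.0`). A minimal sketch with made-up rows and coordinates, not the project data:

```python
import pandas as pd

# Left side: every neighborhood with its coordinates (illustrative values)
left = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [-7.24, -7.22, -7.25],
})

# Right side: only neighborhoods that received a cluster label
right = pd.DataFrame({
    'Neighborhood': ['A', 'C'],
    'Cluster Labels': [4, 1],
})

# Left join: 'B' has no match, so its label becomes NaN and the column turns float
merged = left.join(right.set_index('Neighborhood'), on='Neighborhood')
n_missing = int(merged['Cluster Labels'].isna().sum())
print(merged['Cluster Labels'].dtype, n_missing)

# Drop unlabeled rows and cast the labels back to integers
cleaned = merged.dropna(subset=['Cluster Labels']).copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```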

Check the shape of each dataframe and save both as CSV for further use.

In [58]:
# shape of jakarta_merged and surabaya_merged
print(f'Shape of jakarta_merged : {jakarta_merged.shape}')
print(f'Shape of surabaya_merged : {surabaya_merged.shape}')

# save dataframe as csv
jakarta_merged.to_csv('data/jakarta_clustering.csv')
surabaya_merged.to_csv('data/surabaya_clustering.csv')

Shape of jakarta_merged : (44, 10)
Shape of surabaya_merged : (29, 10)


## 5.3 Visualizing the clusters

### Jakarta Clusters

In [59]:
# create map
jakarta_map_clusters = folium.Map(location=[jakarta_latitude, jakarta_longitude], zoom_start=11)

# set color scheme for the clusters: one hex color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(jakarta_merged['Latitude'], jakarta_merged['Longitude'], jakarta_merged['Neighborhood'], jakarta_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(jakarta_map_clusters)
       
jakarta_map_clusters

### Surabaya Clusters

In [60]:
# create map
surabaya_map_clusters = folium.Map(location=[surabaya_latitude, surabaya_longitude], zoom_start=11)

# set color scheme for the clusters: one hex color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster label
for lat, lon, poi, cluster in zip(surabaya_merged['Latitude'], surabaya_merged['Longitude'], surabaya_merged['Neighborhood'], surabaya_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(surabaya_map_clusters)
       
surabaya_map_clusters
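
The color scheme used for both maps boils down to sampling evenly spaced colors from a matplotlib colormap and converting them to hex strings, one per cluster label. The standalone sketch below shows just that step. The value of `k` and the name `color_for_label` are only for illustration.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

k = 5  # example number of clusters

# Sample k evenly spaced RGBA colors from the 'rainbow' colormap
rgba = cm.rainbow(np.linspace(0, 1, k))

# Convert each RGBA row to a hex string that folium accepts
palette = [colors.rgb2hex(c) for c in rgba]

# Map each 0-based cluster label to its color
color_for_label = {label: palette[label] for label in range(k)}
print(color_for_label)
```

Indexing the palette directly with the 0-based label keeps the label-to-color mapping easy to read off a legend.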

# 6. Results and Discussion

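This section presents the clustering results for the neighborhoods of Jakarta and Surabaya. The headings below are 1-based while the labels in the code are 0-based, so Cluster 1 corresponds to `Cluster Labels == 0`, and so on.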

## 6.1 Results in Jakarta Neighborhood

### Cluster 1

This cluster contains only one neighborhood, Cilincing, where the most common venue is Park.

In [75]:
jakarta_merged.loc[jakarta_merged['Cluster Labels'] == 0, jakarta_merged.columns[[0] + list(range(5, jakarta_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 9 | Cilincing | Park | Diner | Shopping Mall | Wine Bar | Farmers Market |


### Cluster 2

This cluster contains a single neighborhood, Tanjung Priok, where the most common venue is Bakery.

In [72]:
jakarta_merged.loc[jakarta_merged['Cluster Labels'] == 1, jakarta_merged.columns[[0] + list(range(5, jakarta_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 14 | Tanjung Priok | Bakery | Wine Bar | Frozen Yogurt Shop | French Restaurant | Food Truck |


### Cluster 3

This cluster contains a single neighborhood, Kepulauan Seribu Utara, where the most common venue is Resort.

In [73]:
jakarta_merged.loc[jakarta_merged['Cluster Labels'] == 2, jakarta_merged.columns[[0] + list(range(5, jakarta_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 47 | Kepulauan Seribu Utara | Resort | Wine Bar | Electronics Store | Food Stand | Food Court |


### Cluster 4

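According to the label array in section 5.2, this is the largest Jakarta cluster (24 of the 44 neighborhoods). The rows shown below have a varied mix of first-ranked venues, with no single category dominating.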

In [74]:
jakarta_merged.loc[jakarta_merged['Cluster Labels'] == 3, jakarta_merged.columns[[0] + list(range(5, jakarta_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Cempaka Putih | Pizza Place | Acehnese Restaurant | Pool | Indonesian Meatball Place | BBQ Joint |
| 1 | Gambir | Indonesian Restaurant | Park | Coffee Shop | Fast Food Restaurant | Food Court |
| 4 | Menteng | Indonesian Restaurant | Breakfast Spot | Coffee Shop | Park | Sushi Restaurant |
| 12 | Pademangan | Hotel | Pool | Bowling Alley | Theme Park | Vegetarian / Vegan Restaurant |
| 13 | Penjaringan | Boutique | Theme Park | Gift Shop | Grocery Store | Food Truck |
| 16 | Cakung | Lounge | Gas Station | Wine Bar | French Restaurant | Food Stand |
| 17 | Cipayung | Food | Asian Restaurant | Wine Bar | Frozen Yogurt Shop | French Restaurant |
| 19 | Duren Sawit | Indonesian Meatball Place | Convenience Store | Coffee Shop | Mediterranean Restaurant | Wine Bar |
| 20 | Jatinegara | Jewelry Store | Asian Restaurant | Arts & Crafts Store | Donut Shop | Food Truck |
| 23 | Matraman | Convenience Store | Pizza Place | Electronics Store | Food Stand | Food Court |


### Cluster 5

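According to the label array in section 5.2, this cluster has 17 neighborhoods. In every row shown below, Indonesian Restaurant is among the five most common venues.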

In [76]:
jakarta_merged.loc[jakarta_merged['Cluster Labels'] == 4, jakarta_merged.columns[[0] + list(range(5, jakarta_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 2 | Johar Baru | Food Truck | Convenience Store | Arcade | Indonesian Restaurant | Italian Restaurant |
| 3 | Kemayoran | Diner | Arcade | Indonesian Restaurant | Noodle House | Wine Bar |
| 5 | Sawah Besar | Indonesian Restaurant | Pet Store | Convenience Store | Noodle House | Asian Restaurant |
| 6 | Senen | Hotel | Indonesian Restaurant | Grocery Store | History Museum | University |
| 7 | Tanah Abang | Indonesian Restaurant | Coffee Shop | Seafood Restaurant | Pizza Place | Noodle House |
| 10 | Kelapa Gading | Indonesian Restaurant | Asian Restaurant | Steakhouse | Korean Restaurant | Japanese Restaurant |
| 11 | Koja | Pizza Place | Indonesian Restaurant | Restaurant | Grocery Store | Donut Shop |
| 18 | Ciracas | Noodle House | Indonesian Restaurant | Golf Course | Farmers Market | Food Stand |
| 21 | Kramat Jati | Hospital | Indonesian Restaurant | Seafood Restaurant | Chinese Restaurant | Noodle House |
| 22 | Makasar | Indonesian Restaurant | Airport Terminal | Asian Restaurant | Wine Bar | Farmers Market |


## 6.2 Results in Surabaya Neighborhood

### Cluster 1

This cluster contains five neighborhoods. Convenience Store is the first or second most common venue in each of them.

In [77]:
surabaya_merged.loc[surabaya_merged['Cluster Labels'] == 0, surabaya_merged.columns[[0] + list(range(5, surabaya_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 4 | Dukuh Pakis | Convenience Store | Mobile Phone Shop | Boutique | Fast Food Restaurant | Diner |
| 5 | Gayungan | Asian Restaurant | Convenience Store | Hotel | Café | Boutique |
| 9 | Jambangan | Coffee Shop | Convenience Store | Asian Restaurant | Food Truck | Fish & Chips Shop |
| 12 | Krembangan | Convenience Store | Farmers Market | Museum | Coffee Shop | Food Truck |
| 17 | Rungkut | Convenience Store | Coffee Shop | Grocery Store | Fast Food Restaurant | Dessert Shop |


### Cluster 2

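According to the label array in section 5.2, this is the largest Surabaya cluster (17 of the 29 neighborhoods). Indonesian Restaurant is the first or second most common venue in 6 of the 10 rows shown below.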

In [78]:
surabaya_merged.loc[surabaya_merged['Cluster Labels'] == 1, surabaya_merged.columns[[0] + list(range(5, surabaya_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 1 | Benowo | Shoe Store | Food Court | Soccer Field | Wine Bar | Dessert Shop |
| 2 | Bubutan | Indonesian Restaurant | Hotel | Multiplex | Donut Shop | Pizza Place |
| 3 | Bulak | Beach | Indonesian Restaurant | Food Truck | Fish & Chips Shop | Dessert Shop |
| 6 | Genteng | Soup Place | Furniture / Home Store | Park | Breakfast Spot | Bed & Breakfast |
| 7 | Gubeng | Indonesian Restaurant | Food Truck | Electronics Store | Chinese Restaurant | Café |
| 11 | Kenjeran | Food Court | Pizza Place | Mosque | Mobile Phone Shop | Wine Bar |
| 14 | Mulyorejo | Indonesian Restaurant | Coffee Shop | Convenience Store | Pizza Place | Mobile Phone Shop |
| 15 | Pabean Cantian | Indonesian Restaurant | History Museum | Food Truck | Café | Farmers Market |
| 16 | Pakal | Campground | Wine Bar | Fish & Chips Shop | Diner | Donut Shop |
| 18 | Sambikerep | Food Truck | Indonesian Restaurant | Wine Bar | Fast Food Restaurant | Dessert Shop |


### Cluster 3

This cluster contains five neighborhoods. Café is the first or second most common venue in each of them.

In [79]:
surabaya_merged.loc[surabaya_merged['Cluster Labels'] == 2, surabaya_merged.columns[[0] + list(range(5, surabaya_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 13 | Lakarsantri | Café | Wine Bar | Fish & Chips Shop | Diner | Donut Shop |
| 19 | Sawahan | Karaoke Bar | Café | Seafood Restaurant | Convention Center | Diner |
| 22 | Sukolilo | Café | Supermarket | Steakhouse | Gym / Fitness Center | Asian Restaurant |
| 23 | Sukomanunggal | Noodle House | Café | Hardware Store | Soup Place | Juice Bar |
| 30 | Wonokromo | Coffee Shop | Café | Wine Bar | Convention Center | Diner |


### Cluster 4

This cluster contains a single neighborhood, Karang Pilang, where the most common venue is Soccer Stadium.

In [80]:
surabaya_merged.loc[surabaya_merged['Cluster Labels'] == 3, surabaya_merged.columns[[0] + list(range(5, surabaya_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 10 | Karang Pilang | Soccer Stadium | Wine Bar | Fish & Chips Shop | Dessert Shop | Diner |


### Cluster 5

This cluster contains a single neighborhood, Asemrowo, where the most common venue is Wine Bar.

In [81]:
surabaya_merged.loc[surabaya_merged['Cluster Labels'] == 4, surabaya_merged.columns[[0] + list(range(5, surabaya_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Asemrowo | Wine Bar | Fish & Chips Shop | Dessert Shop | Diner | Donut Shop |
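
The per-cluster tables in this section can also be condensed into a single summary. The sketch below assumes a dataframe shaped like `surabaya_merged`. Here it is rebuilt by hand from a few of the rows shown above, under the hypothetical name `sample`. It counts the neighborhoods in each cluster and reports the most frequent first-ranked venue.

```python
import pandas as pd

# A handful of rows copied from the Surabaya tables above (not the full data)
sample = pd.DataFrame({
    'Neighborhood': ['Dukuh Pakis', 'Krembangan', 'Rungkut',
                     'Lakarsantri', 'Sukolilo', 'Karang Pilang', 'Asemrowo'],
    'Cluster Labels': [0, 0, 0, 2, 2, 3, 4],
    '1st Most Common Venue': ['Convenience Store', 'Convenience Store',
                              'Convenience Store', 'Café', 'Café',
                              'Soccer Stadium', 'Wine Bar'],
})

# One row per cluster: size and the most frequent top-ranked venue
summary = sample.groupby('Cluster Labels').agg(
    neighborhoods=('Neighborhood', 'count'),
    top_venue=('1st Most Common Venue', lambda s: s.mode().iloc[0]),
)
print(summary)
```

Running the same aggregation on the full merged dataframes would give a compact starting point for comparing the clusters of the two cities.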


## 6.3 Discussion

# 7. Conclusion

## References :
- [Coursera Applied Data Science Capstone Course](https://www.coursera.org/learn/applied-data-science-capstone)

## Thanks to : 
- [Foursquare Developer API](https://foursquare.com/developers/)
- [Indonesian Ministry of Internal Affairs (Kemendagri)](https://www.kemendagri.go.id/page/read/48/peraturan-menteri-dalam-negeri-no72-tahun-2019) Accessed via Wikipedia page [neighborhood Jakarta](https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Daerah_Khusus_Ibukota_Jakarta) and [neighborhood Surabaya](https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Surabaya).
- OpenStreetMap
- Python libraries (pandas, numpy, matplotlib, folium, scikit-learn, and geopy)

## Tools :
- Jupyter Notebooks using Visual Studio Code
- GitHub (version control)