## Recommendation for Hotel Construction in Bandung, Indonesia through Clustering Analysis

## Introduction
This notebook builds a recommender based on K-Means clustering to suggest where to construct a hotel in Bandung, Indonesia.

In [1]:
#Basics
import numpy as np
import pandas as pd 

#JSON
import json
import requests
from pandas.io.json import json_normalize 

#Geopy
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 
!conda install -c conda-forge folium=0.5.0 --yes 
import folium

# Plotting
import matplotlib.cm as mpcm
import matplotlib.colors as mpcol

#SKLearn for KMN
from sklearn.cluster import KMeans

# BeautifulSoup for Webscrape
from bs4 import BeautifulSoup

#XML Reader
import xml

print('OK done')


<h3>Creating Bandung Map</h3>

The Bandung GIS dataset is **courtesy of https://github.com/tyohan/bandung-map-dataset**

In [4]:
#Load JSON
with open('bandung-kelurahan.json') as json_data:
    bandung_data = json.load(json_data)

In [5]:
#All relevant data is under the 'features' key
neighborhoods_data = bandung_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'properties': {'FID': 0,
  'KELURAHAN': 'Isola',
  'KECAMATAN': 'Sukasari',
  'PENDUDUK': 9722,
  'AREA': 1987190.006,
  'PERIMETER': 9143,
  'ACRES': 491.043,
  'HECTARES': 198.719,
  'KPDTN': 156.27,
  'KPDTA_BRUT': 48.92},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[107.592640917144, -6.865201563331791],
    [107.58980224084617, -6.864648600836308],
    [107.58960461148367, -6.864425631906367],
    [107.58797866081942, -6.8639618561974896],
    [107.58463692796249, -6.862659714291793],
    [107.58367573060849, -6.862517013592402],
    [107.58347810124597, -6.862802414948386],
    [107.58195096526298, -6.861990804393804],
    [107.5815197739266, -6.862392150445362],
    [107.58076518908793, -6.861865941108766],
    [107.5808011216993, -6.8613397311905855],
    [107.58199588102718, -6.860635144118334],
    [107.58246300497491, -6.860064338637211],
    [107.58328945503631, -6.860037582113488],
    [107.58353200016302, -6.860189202394663],
    [107.58449319

<h4>Create Pandas Dataframe</h4>

"Kelurahan" means neighborhood in Indonesian, and "kecamatan" is roughly equivalent to a borough.

In [6]:
# define the dataframe columns
column_names = ['Boroughs', 'Neighborhoods', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Boroughs,Neighborhoods,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [7]:
rows = []
for data in neighborhoods_data:
    borough_name = data['properties']['KECAMATAN']
    neighborhood_name = data['properties']['KELURAHAN']

    # use the first vertex of the polygon's outer ring as the neighborhood location
    # (GeoJSON stores each coordinate as [longitude, latitude])
    first_vertex = data['geometry']['coordinates'][0][0]
    neighborhood_lat = first_vertex[1]
    neighborhood_lon = first_vertex[0]

    rows.append({'Boroughs': borough_name,
                 'Neighborhoods': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)
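The loop above uses the first vertex of each polygon as the neighborhood's location, so the point sits on the boundary rather than inside the area. Below is a minimal sketch of one alternative, assuming that a plain average of the outer ring's vertices is close enough for a 1 km search radius. `ring_vertex_mean` is a hypothetical helper that is not part of the original notebook, and it is not a true area-weighted centroid.

```python
def ring_vertex_mean(ring):
    """Return (lat, lon) as the plain average of a GeoJSON ring's vertices."""
    # GeoJSON rings are closed: the last vertex repeats the first, so drop it
    points = ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else ring
    lon = sum(p[0] for p in points) / len(points)
    lat = sum(p[1] for p in points) / len(points)
    return lat, lon

# toy square (made-up data), coordinates given as [longitude, latitude]
square = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0], [0.0, 0.0]]
center = ring_vertex_mean(square)
print(center)
```

In the notebook this could be called as `ring_vertex_mean(data['geometry']['coordinates'][0])`, assuming every feature is a single `Polygon`.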

In [8]:
neighborhoods.head()

Unnamed: 0,Boroughs,Neighborhoods,Latitude,Longitude
0,Sukasari,Isola,-6.865202,107.592641
1,Sukasari,Geger Kalong,-6.86134,107.580801
2,Sukasari,Sukarasa,-6.876823,107.579175
3,Sukasari,Sarijadi,-6.867003,107.575214
4,Sukajadi,Sukawarna,-6.881309,107.579723


Count the boroughs and neighborhoods in the dataframe.

In [9]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Boroughs'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 26 boroughs and 139 neighborhoods.


Geopy is used to look up the coordinates of Bandung.

In [10]:
address = 'Bandung, ID'

geolocator = Nominatim(user_agent="bdg_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bandung are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bandung are -6.9344694, 107.6049539.


Creating map of Bandung using folium

In [11]:
# create map of Bandung using latitude and longitude values
map_bandung = folium.Map(location=[latitude, longitude], zoom_start=12.2)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Boroughs'], neighborhoods['Neighborhoods']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.6).add_to(map_bandung)
    
map_bandung

<h3>Getting Venue Data</h3>

<h4>Foursquare Credentials</h4>

In [12]:
CLIENT_ID = 'GSSY4GWNQGVOSNI0KJIZR1RDEVSBXRPGAW3BJCXYEGJLVAV2' # your Foursquare ID
CLIENT_SECRET = 'U3AKXJWQVVSZCDQ2L0UQCLSBERTVNHAJTFBTXOVPYT2NATZY' # your Foursquare Secret
VERSION = '20190505' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: GSSY4GWNQGVOSNI0KJIZR1RDEVSBXRPGAW3BJCXYEGJLVAV2
CLIENT_SECRET:U3AKXJWQVVSZCDQ2L0UQCLSBERTVNHAJTFBTXOVPYT2NATZY


The Foursquare crawler below is credited to: https://github.com/chenyang03/Foursquare_Crawler

In [13]:
def foursquare_crawler(neighborhood_list, lat_list, lng_list, LIMIT=500, radius=1000):
    """Query the Foursquare explore endpoint once per neighborhood and collect the raw items."""
    result_ds = []
    for counter, (neighborhood, lat, lng) in enumerate(
            zip(neighborhood_list, lat_list, lng_list), start=1):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION,
            lat, lng, radius, LIMIT)

        # make the GET request and keep only the list of recommended items
        results = requests.get(url).json()['response']['groups'][0]['items']

        # store the raw items together with the neighborhood they belong to
        result_ds.append({'Neighborhood(s)': neighborhood,
                          'Latitude': lat,
                          'Longitude': lng,
                          'Crawling_result': results})
        print('{}.'.format(counter))
        print('Data is Obtained, neighborhoods {} SUCCESSFULLY.'.format(neighborhood))
    return result_ds
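The crawler indexes straight into `["response"]['groups'][0]['items']`, which raises an exception if a request fails or returns no groups. Below is a minimal sketch of a more defensive lookup. `extract_items` is a hypothetical helper, and both payloads are hand-written samples that only mimic the nesting the crawler relies on; they are not real API output.

```python
def extract_items(payload):
    """Return the venue items from an explore-style response, or [] if absent."""
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        return []
    return groups[0].get('items', [])

# hand-written sample payloads (assumptions, not real API responses)
ok_payload = {'response': {'groups': [{'items': [{'venue': {'name': 'Example Venue'}}]}]}}
error_payload = {'meta': {'code': 429}, 'response': {}}

print(len(extract_items(ok_payload)))
print(extract_items(error_payload))
```

With a helper like this, one failed request would yield an empty venue list for that neighborhood instead of stopping the whole crawl.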

In [14]:
print('Crawling on Bandung')
bandung_foursquare_dataset = foursquare_crawler(list(neighborhoods['Neighborhoods']),
                                                   list(neighborhoods['Latitude']),
                                                   list(neighborhoods['Longitude']),)

Crawling on Bandung
1.
Data is Obtained, neighborhoods Isola SUCCESSFULLY.
2.
Data is Obtained, neighborhoods Geger Kalong SUCCESSFULLY.
3.
Data is Obtained, neighborhoods Sukarasa SUCCESSFULLY.
4.
Data is Obtained, neighborhoods Sarijadi SUCCESSFULLY.
5.
Data is Obtained, neighborhoods Sukawarna SUCCESSFULLY.
6.
Data is Obtained, neighborhoods Sukagalih SUCCESSFULLY.
7.
Data is Obtained, neighborhoods Cipedes SUCCESSFULLY.
8.
Data is Obtained, neighborhoods Sukabungah SUCCESSFULLY.
9.
Data is Obtained, neighborhoods Pasteur SUCCESSFULLY.
10.
Data is Obtained, neighborhoods Ledeng SUCCESSFULLY.
11.
Data is Obtained, neighborhoods Ciumbuleuit SUCCESSFULLY.
12.
Data is Obtained, neighborhoods Hegarmanah SUCCESSFULLY.
13.
Data is Obtained, neighborhoods Dago SUCCESSFULLY.
14.
Data is Obtained, neighborhoods Cipaganti SUCCESSFULLY.
15.
Data is Obtained, neighborhoods Babakan Siliwangi SUCCESSFULLY.
16.
Data is Obtained, neighborhoods Sekeloa SUCCESSFULLY.
17.
Data is Obtained, neighborhood

<h4>Cleaning Foursquare Data Set</h4>

Credit for this function goes to https://github.com/alidastgheib

In [15]:
# This function walks through the saved crawl results and extracts every venue
# of every neighborhood into one flat dataframe

def get_venue_dataset(foursquare_dataset):
    columns = ['Neighborhood',
               'Neighborhood Latitude', 'Neighborhood Longitude',
               'Venue', 'Venue Summary', 'Venue Category', 'Distance']
    rows = []

    for neigh_dict in foursquare_dataset:
        neigh = neigh_dict['Neighborhood(s)']
        lat = neigh_dict['Latitude']
        lng = neigh_dict['Longitude']
        print('Number of Venues in Coordination "{}" Neighborhood(s) is:'.format(neigh))
        print(len(neigh_dict['Crawling_result']))

        for venue_dict in neigh_dict['Crawling_result']:
            summary = venue_dict['reasons']['items'][0]['summary']
            name = venue_dict['venue']['name']
            dist = venue_dict['venue']['location']['distance']
            cat = venue_dict['venue']['categories'][0]['name']

            rows.append({'Neighborhood': neigh,
                         'Neighborhood Latitude': lat, 'Neighborhood Longitude': lng,
                         'Venue': name, 'Venue Summary': summary,
                         'Venue Category': cat, 'Distance': dist})

    # build the dataframe once; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(rows, columns=columns)
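The extraction above assumes every item carries at least one category and one reason; an item without them would raise an `IndexError` or `KeyError`. Below is a sketch of a tolerant per-item conversion that fills missing fields with `None`. `venue_to_row` is a hypothetical helper, and the two items are made-up samples shaped like the fields the function reads.

```python
def venue_to_row(neigh, lat, lng, item):
    """Flatten one crawled item into a row dict, using None for missing fields."""
    venue = item.get('venue', {})
    categories = venue.get('categories') or []
    reasons = item.get('reasons', {}).get('items') or []
    return {
        'Neighborhood': neigh,
        'Neighborhood Latitude': lat,
        'Neighborhood Longitude': lng,
        'Venue': venue.get('name'),
        'Venue Summary': reasons[0].get('summary') if reasons else None,
        'Venue Category': categories[0].get('name') if categories else None,
        'Distance': venue.get('location', {}).get('distance'),
    }

# made-up sample items: one complete, one missing categories, reasons and location
full_item = {'reasons': {'items': [{'summary': 'This spot is popular'}]},
             'venue': {'name': 'Sample Hotel',
                       'location': {'distance': 120},
                       'categories': [{'name': 'Hotel'}]}}
sparse_item = {'venue': {'name': 'Unnamed Spot', 'categories': []}}

row_full = venue_to_row('Isola', -6.865202, 107.592641, full_item)
row_sparse = venue_to_row('Isola', -6.865202, 107.592641, sparse_item)
print(row_full['Venue Category'], row_sparse['Venue Category'])
```

Rows with a `None` category could then be dropped or counted before the one-hot encoding step.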

In [16]:
bandung_venues = get_venue_dataset(bandung_foursquare_dataset)

Number of Venues in Coordination "Isola" Neighborhood(s) is:
68
Number of Venues in Coordination "Geger Kalong" Neighborhood(s) is:
17
Number of Venues in Coordination "Sukarasa" Neighborhood(s) is:
53
Number of Venues in Coordination "Sarijadi" Neighborhood(s) is:
10
Number of Venues in Coordination "Sukawarna" Neighborhood(s) is:
65
Number of Venues in Coordination "Sukagalih" Neighborhood(s) is:
47
Number of Venues in Coordination "Cipedes" Neighborhood(s) is:
79
Number of Venues in Coordination "Sukabungah" Neighborhood(s) is:
96
Number of Venues in Coordination "Pasteur" Neighborhood(s) is:
100
Number of Venues in Coordination "Ledeng" Neighborhood(s) is:
69
Number of Venues in Coordination "Ciumbuleuit" Neighborhood(s) is:
28
Number of Venues in Coordination "Hegarmanah" Neighborhood(s) is:
100
Number of Venues in Coordination "Dago" Neighborhood(s) is:
100
Number of Venues in Coordination "Cipaganti" Neighborhood(s) is:
100
Number of Venues in Coordination "Babakan Siliwangi" Ne

See the head of Bandung Venue List

In [17]:
bandung_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Venue Category,Distance
0,Isola,-6.865202,107.592641,Kedai Utama,This spot is popular,Restaurant,132
1,Isola,-6.865202,107.592641,My Little Kitchen (MYLK) Steakhouse,This spot is popular,Steakhouse,414
2,Isola,-6.865202,107.592641,Mie Baso Gerlong,This spot is popular,Food Truck,75
3,Isola,-6.865202,107.592641,Soerabi Enhaii - Dapoer Ndeso,This spot is popular,Café,245
4,Isola,-6.865202,107.592641,Travello Hotels,This spot is popular,Hotel,552


See the tail of Bandung Venue List

In [18]:
bandung_venues.tail()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Venue Category,Distance
5699,Mekar Mulya,-6.933282,107.691483,MaxOne Hotels,This spot is popular,Hotel,762
5700,Mekar Mulya,-6.933282,107.691483,Pasar Cimol Gedebage,This spot is popular,Clothing Store,739
5701,Mekar Mulya,-6.933282,107.691483,Pasar Gedebage,This spot is popular,Market,718
5702,Pasir Jati,-6.904353,107.713348,Pizza Hut,This spot is popular,Pizza Place,470
5703,Pasir Jati,-6.904353,107.713348,Kampung Wisata Pasir Kunci,This spot is popular,Garden Center,511


Save to CSV and load it again, so the Foursquare query quota is not spent on re-crawling.

In [19]:
bandung_venues.columns

Index(['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
       'Venue', 'Venue Summary', 'Venue Category', 'Distance'],
      dtype='object')

In [20]:
bandung_venues.to_csv('bandung_venues.csv')

In [21]:
bandung_venues = pd.read_csv('bandung_venues.csv')
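Writing with the default `to_csv` settings stores the row index as an unnamed first column, which is why an `Unnamed: 0` column appears after reloading. The sketch below shows the round trip on a small made-up dataframe, using an in-memory buffer in place of a file.

```python
import io
import pandas as pd

# small made-up dataframe standing in for the venue table
df = pd.DataFrame({'Neighborhood': ['Isola', 'Dago'],
                   'Venue Category': ['Hotel', 'Park']})

# default behaviour: the index is written as an unnamed first column
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)

# index=False writes only the real columns
buf2 = io.StringIO()
df.to_csv(buf2, index=False)
buf2.seek(0)
without_index = pd.read_csv(buf2)

print(list(with_index.columns))
print(list(without_index.columns))
```

Later cells in this notebook drop `Unnamed: 0` explicitly, so switching to `index=False` would also mean removing that `drop` call.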

A brief summary of the captured data

In [22]:
neigh_list = list(bandung_venues['Neighborhood'].unique())
print('Number of Neighborhoods inside Bandung:')
print(len(neigh_list))
print('List of Neighborhoods inside Bandung:')
neigh_list

Number of Neighborhoods inside Bandung:
139
List of Neighborhoods inside Bandung:


['Isola',
 'Geger Kalong',
 'Sukarasa',
 'Sarijadi',
 'Sukawarna',
 'Sukagalih',
 'Cipedes',
 'Sukabungah',
 'Pasteur',
 'Ledeng',
 'Ciumbuleuit',
 'Hegarmanah',
 'Dago',
 'Cipaganti',
 'Babakan Siliwangi',
 'Sekeloa',
 'Lebak Gede',
 'Sadang Serang',
 'Sukaraja',
 'Campaka',
 'Husein Sastranegara',
 'Pajajaran',
 'Pamoyanan',
 'Arjuna',
 'Pasir Kaliki',
 'Taman Sari',
 'Sukaluyu',
 'Cigadung',
 'Neglasari',
 'Citarum',
 'Cihaurgeulis',
 'Babakan Ciamis',
 'Merdeka',
 'Cihapit',
 'Sukamaju',
 'Cicadas',
 'Cikutra',
 'Padasuka',
 'Sukapada',
 'Pasir Layung',
 'Maleber',
 'Garuda',
 'Dungus Cariang',
 'Ciroyom',
 'Kebon Jeruk',
 'Braga',
 'Kebon Pisang',
 'Cijerah',
 'Gempol Sari',
 'Warung Muncang',
 'Cibuntu',
 'Sukahaji',
 'Caringin',
 'Cigondewah Rahayu',
 'Cigondewah Kidul',
 'Cigondewah Kaler',
 'Babakan',
 'Babakan Ciparay',
 'Margahayu Utara',
 'Margasuka',
 'Cirangrang',
 'Jamika',
 'Babakan Tarogong',
 'Babakan Asih',
 'Suka Asih',
 'Cibadak',
 'Karang Anyar',
 'Panjunan',
 'Ny

In [23]:
neigh_venue_summary = bandung_venues.groupby('Neighborhood').count()
neigh_venue_summary.drop(columns = ['Unnamed: 0']).head()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Venue Category,Distance
Ancol,22,22,22,22,22,22
Antapani,7,7,7,7,7,7
Antapani Kidul,32,32,32,32,32,32
Antapani Tengah,41,41,41,41,41,41
Arjuna,100,100,100,100,100,100


In [29]:
print('There are {} unique categories.'.format(len(bandung_venues['Venue Category'].unique())))
print('List:')
list(bandung_venues['Venue Category'].unique())

There are 227 unique categories.
List:


['Restaurant',
 'Steakhouse',
 'Food Truck',
 'Café',
 'Hotel',
 'Noodle House',
 'Bistro',
 'German Restaurant',
 'Coffee Shop',
 'Soup Place',
 'Museum',
 'BBQ Joint',
 'Park',
 'Sundanese Restaurant',
 'Breakfast Spot',
 'Spa',
 'College Academic Building',
 'Indonesian Restaurant',
 'Baby Store',
 'Bakery',
 'Supermarket',
 'Donut Shop',
 'Convenience Store',
 'Food Court',
 'Motorcycle Shop',
 'Field',
 'Asian Restaurant',
 'Department Store',
 'Pool',
 'Music Venue',
 'Soccer Field',
 'Resort',
 'Event Space',
 'Golf Course',
 'Korean Restaurant',
 'Seafood Restaurant',
 'Art Gallery',
 'Tech Startup',
 'Pizza Place',
 'Martial Arts Dojo',
 'Comfort Food Restaurant',
 'Japanese Restaurant',
 'Indie Movie Theater',
 'Massage Studio',
 'Salon / Barbershop',
 'Padangnese Restaurant',
 'Gift Shop',
 'Cosmetics Shop',
 'Flea Market',
 'Hotel Bar',
 'Bar',
 'Gym / Fitness Center',
 'Video Store',
 'Sculpture Garden',
 'Gastropub',
 'Airport',
 'Video Game Store',
 'Bed & Breakfast',
 '

<h4>One-hot encoding the dataframe</h4>

In [30]:
#one hot encoding
bandung_onehot = pd.get_dummies(data = bandung_venues, drop_first  = False, 
                              prefix = "", prefix_sep = "", columns = ['Venue Category'])
bandung_onehot.head()

Unnamed: 0.1,Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Summary,Distance,Accessories Store,Acehnese Restaurant,African Restaurant,...,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Wine Bar,Winery,Wings Joint,Women's Store,Yoga Studio
0,0,Isola,-6.865202,107.592641,Kedai Utama,This spot is popular,132,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,Isola,-6.865202,107.592641,My Little Kitchen (MYLK) Steakhouse,This spot is popular,414,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,2,Isola,-6.865202,107.592641,Mie Baso Gerlong,This spot is popular,75,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,3,Isola,-6.865202,107.592641,Soerabi Enhaii - Dapoer Ndeso,This spot is popular,245,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4,Isola,-6.865202,107.592641,Travello Hotels,This spot is popular,552,0,0,0,...,0,0,0,0,0,0,0,0,0,0
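To make the encoding step concrete, here is a small sketch on made-up data: each venue row gets a 1 in the column of its category, and summing per neighborhood turns those flags into venue counts. The neighborhood names `A` and `B` are placeholders, and `dtype=int` is an addition not used in the notebook, chosen so the flags are integers on every pandas version.

```python
import pandas as pd

# made-up venue table with the same two key columns as the notebook
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Hotel', 'Park', 'Park', 'Hotel'],
})

# one column per category, named after the category itself
onehot = pd.get_dummies(venues, prefix='', prefix_sep='',
                        columns=['Venue Category'], dtype=int)

# summing the flags per neighborhood gives the number of venues per category
counts = onehot.groupby('Neighborhood').sum()
print(counts)
```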


<h4>Filtering out Subsets</h4>

The client wants the hotel to be built near points of interest or attractions, so we manually keep only the categories that count as travel destinations and drop the rest.

In [35]:
chosen= [ 'Neighborhood',
 'Neighborhood Latitude',
 'Neighborhood Longitude',
  'Museum',
 'Park',
 'Spa',
 'Field',
 'Pool',
 'Soccer Stadium',
 'Arcade',
 'Art Gallery',
 'Massage Studio',
 'Salon / Barbershop',
 'Gift Shop',
 'Cosmetics Shop',
 'Hotel Bar',
 'Bar',
 'Sculpture Garden',
 'Gastropub',
 'Tea Room',
 'Boutique',
 'Movie Theater',
 'Theme Park',
 'Tailor Shop',
 'Beer Garden',
 'Accessories Store',
 'Campground',
 'Art Museum',
 'Gym Pool',
 'Bridge',
 'College Stadium',
 'Karaoke Bar',
 'Nightclub',
 'Hobby Shop',
 'Garden',
 'Acehnese Restaurant',
 'Kids Store',
 'Health & Beauty Service',
 'Track',
 'Winery',
 'Wine Bar',
 'Playground',
 'Plaza',
 'Event Space',
 'Udon Restaurant',
 'Basketball Court',
 'Electronics Store',
 'Flower Shop',
 'Toy / Game Store',
 'Track Stadium',
 'Vegetarian / Vegan Restaurant',
 'Gaming Cafe',
'General Entertainment',
 'Golf Course',
 'Theater',
 'Music Venue',
 'Scenic Lookout',
 'Performing Arts Venue',
 'Historic Site',
 'Basketball Stadium',
 'Arts & Crafts Store',
 'Shopping Plaza',
 'Stadium',
 'Pool Hall',
 'Building',
 'Skate Park',
 'Miscellaneous Shop',
 'Hotel Pool',
 'Pub',
 'Spanish Restaurant',
 'Other Event',
 'Jewelry Store',
 'Monument / Landmark',
 'Bathing Area',
 'Mobile Phone Shop',
 'Baseball Field',
 'Aquarium',
 'Theme Park Ride / Attraction',
 'Mountain',
 'Garden Center',
 'Soccer Field',
 'Dance Studio',
 'Jazz Club',
 'Concert Hall',
 'Beach']

In [36]:
# keep only the chosen categories and count venues per neighborhood
bandung_onehot2= bandung_onehot[chosen].drop(
    columns = ['Neighborhood Latitude', 'Neighborhood Longitude']).groupby(
    'Neighborhood').sum()

bandung_onehot2.head()

Neighborhood,Museum,Park,Spa,Field,Pool,Soccer Stadium,Arcade,Art Gallery,Massage Studio,Salon / Barbershop,...,Baseball Field,Aquarium,Theme Park Ride / Attraction,Mountain,Garden Center,Soccer Field,Dance Studio,Jazz Club,Concert Hall,Beach
Ancol,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Antapani,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
Antapani Kidul,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
Antapani Tengah,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,1,0,0,0
Arjuna,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
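Selecting columns from a hand-written list raises a `KeyError` if any listed category is absent from the crawl results, which can happen when the crawl is repeated at a later date. Below is a sketch of a pre-check, with `split_known_columns` as a hypothetical helper and a made-up one-hot table.

```python
import pandas as pd

def split_known_columns(df, wanted):
    """Split the wanted column names into those present in df and those missing."""
    known = [c for c in wanted if c in df.columns]
    missing = [c for c in wanted if c not in df.columns]
    return known, missing

# made-up one-hot table with only two category columns
onehot = pd.DataFrame({'Neighborhood': ['A', 'B'],
                       'Museum': [1, 0],
                       'Hotel': [0, 1]})

known, missing = split_known_columns(onehot, ['Neighborhood', 'Museum', 'Beach'])
print(known)
print(missing)
```

Indexing with `known` instead of the full list keeps the selection working, while `missing` shows which categories silently dropped out.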


<h3>Machine Learning Step</h3>

In this section we cluster the neighborhoods with K-Means.

In [37]:
# run k-means clustering with 5 clusters
kmeans = KMeans(n_clusters = 5, random_state = 0).fit(bandung_onehot2)
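The number of clusters is fixed at 5 here without a stated justification. One common way to check such a choice is to compare silhouette scores across several values of k; the sketch below does this on synthetic 2-D data, not on the Bandung table. The blob positions, noise level, and the range of k are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic data: three well-separated blobs of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# fit K-Means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# the k with the highest score is the best-separated clustering
best_k = max(scores, key=scores.get)
print(best_k)
```

Applied to this notebook, `X` would be `bandung_onehot2`; the resulting k may or may not agree with the value of 5 used above.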

<h4>Ranking the Clusters</h4>

In [38]:
#ranking by sum of the centroids
means_df = pd.DataFrame(kmeans.cluster_centers_)
means_df.columns = bandung_onehot2.columns
means_df.index = ['Cluster1','Cluster2','Cluster3','Cluster4','Cluster5']
means_df['Total Sum'] = means_df.sum(axis = 1)
means_df_ranked = means_df.sort_values(axis = 0, by = ['Total Sum'], ascending=False)
means_df_ranked

Unnamed: 0,Museum,Park,Spa,Field,Pool,Soccer Stadium,Arcade,Art Gallery,Massage Studio,Salon / Barbershop,...,Aquarium,Theme Park Ride / Attraction,Mountain,Garden Center,Soccer Field,Dance Studio,Jazz Club,Concert Hall,Beach,Total Sum
Cluster5,0.0,-5.5511150000000004e-17,0.25,1.387779e-17,2.775558e-17,1.387779e-17,1.0,1.0,0.0,0.875,...,1.734723e-18,2.0,1.734723e-18,1.734723e-18,0.0,3.469447e-18,1.734723e-18,-6.938894e-18,1.734723e-18,18.125
Cluster3,1.0,2.625,1.0,0.25,0.375,1.387779e-17,0.125,2.775558e-17,0.375,0.375,...,1.734723e-18,2.775558e-17,1.734723e-18,1.734723e-18,0.25,3.469447e-18,1.734723e-18,-6.938894e-18,1.734723e-18,13.125
Cluster2,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,3.75,...,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,12.5
Cluster4,0.125,0.0625,1.0,-1.387779e-17,0.0625,2.0816680000000002e-17,0.25,0.125,0.6875,0.375,...,2.6020850000000002e-18,4.1633360000000003e-17,2.6020850000000002e-18,2.6020850000000002e-18,2.775558e-17,5.2041700000000004e-18,2.6020850000000002e-18,-1.387779e-17,2.6020850000000002e-18,11.125
Cluster1,0.087379,0.2427184,0.067961,0.1067961,0.1650485,0.0776699,0.417476,0.05825243,0.029126,0.067961,...,9.540979e-18,1.526557e-16,0.009708738,0.009708738,0.09708738,0.01941748,0.009708738,0.06796117,0.009708738,3.262136


In [39]:
#Rank
means_df_ranked['Total Sum']

Cluster5    18.125000
Cluster3    13.125000
Cluster2    12.500000
Cluster4    11.125000
Cluster1     3.262136
Name: Total Sum, dtype: float64
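The ranking adds up each centroid's coordinates, so a cluster scores higher when its neighborhoods contain more of the chosen venue types on average. Values such as `1.387779e-17` in the table are floating-point noise and effectively zero. The sketch below repeats the ranking on made-up centroids and rounds the noise away first; the numbers and the three column names are illustrative only.

```python
import numpy as np
import pandas as pd

# made-up centroids: 3 clusters x 3 venue categories, with float noise near zero
centers = np.array([[0.0, 0.0, 0.1],
                    [0.25, 1e-17, 2.0],
                    [1.0, 0.5, -5e-17]])
means = pd.DataFrame(centers,
                     columns=['Spa', 'Park', 'Theme Park'],
                     index=['Cluster1', 'Cluster2', 'Cluster3'])

# round away floating-point noise before summing
means = means.round(6)
means['Total Sum'] = means.sum(axis=1)

# highest total first
ranked = means.sort_values('Total Sum', ascending=False)
print(ranked['Total Sum'])
```

Because this score is a plain sum, it treats every category as equally attractive; weighting categories differently would be a separate design choice.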

In [40]:
neigh_summary = pd.DataFrame([bandung_onehot2.index, 1 + kmeans.labels_]).T
neigh_summary.columns = ['Neighborhood', 'Cluster']
neigh_summary

Unnamed: 0,Neighborhood,Cluster
0,Ancol,1
1,Antapani,1
2,Antapani Kidul,1
3,Antapani Tengah,1
4,Arjuna,4
5,Babakan,1
6,Babakan Asih,1
7,Babakan Ciamis,3
8,Babakan Ciparay,1
9,Babakan Sari,5


According to means_df_ranked, Cluster 5 has the highest total and is therefore the best, so the best neighborhoods for hotel construction are:

In [42]:
neigh_summary[neigh_summary['Cluster'] == 5]


Unnamed: 0,Neighborhood,Cluster
9,Babakan Sari,5
15,Binong,5
25,Cibangkong,5
63,Gumuruh,5
68,Kacapiring,5
72,Kebon Gedang,5
77,Kebon Waru,5
86,Maleer,5


## Conclusion
The best place for hotel construction is Cluster 5, which covers the neighborhoods of Babakan Sari, Binong, Cibangkong, Gumuruh, Kacapiring, Kebon Gedang, Kebon Waru, and Maleer.

<h3>Visualization of Clusters</h3>

In [43]:
bandung_onehot_cut = bandung_onehot[['Neighborhood','Neighborhood Latitude','Neighborhood Longitude']]
bandung_merged2 = neigh_summary.merge(bandung_onehot_cut, on = 'Neighborhood')
bandung_merged2.drop_duplicates(subset ="Neighborhood", 
                     keep = 'first', inplace = True) 
bandung_merged2

Unnamed: 0,Neighborhood,Cluster,Neighborhood Latitude,Neighborhood Longitude
0,Ancol,1,-6.948861,107.610904
22,Antapani,1,-6.903640,107.659682
29,Antapani Kidul,1,-6.920566,107.652972
61,Antapani Tengah,1,-6.916589,107.649495
102,Arjuna,4,-6.906824,107.596063
202,Babakan,1,-6.944001,107.566958
207,Babakan Asih,1,-6.931864,107.589057
247,Babakan Ciamis,3,-6.914974,107.603061
347,Babakan Ciparay,1,-6.951786,107.584278
368,Babakan Sari,5,-6.924463,107.642902


In [44]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(5)
ys = [i + x + (i*x)**2 for i in range(5)]
colors_array = mpcm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [mpcol.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bandung_merged2['Neighborhood Latitude'], bandung_merged2['Neighborhood Longitude'], bandung_merged2['Neighborhood'], bandung_merged2['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The best cluster (Cluster 5) is shown as red circles in the visualization.