IBM Applied Data Science Capstone Course by Coursera
Week 5 Final Report
Opening a business in Bali, Indonesia

Build a dataframe of neighborhoods in Bali by web scraping the data from Wikipedia page
Get the geographical coordinates of the neighborhoods
Obtain the venue data for the neighborhoods from Foursquare API
Explore and cluster the neighborhoods
Select the best cluster to open a new shopping mall

1. Install libraries

In [1]:
!pip install geopy
!pip install geocoder
!pip install BeautifulSoup4

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Collecting geopy
[?25l  Downloading https://files.pythonhosted.org/packages/07/e1/9c72de674d5c2b8fcb0738a5ceeb5424941fefa080bfe4e240d0bacb5a38/geopy-2.0.0-py3-none-any.whl (111kB)
[K     |████████████████████████████████| 112kB 6.0MB/s eta 0:00:01
[?25hCollecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-2.0.0
Collecting geocoder
[?25l  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
[K     |████████████████████████████████| 102kB 5.0MB/s ta 0:00:011
[?25hCollecting click (from geocoder)
[?25l  Downloading https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.

2. Scrap data from Wikipedia page into a DataFrame

In [2]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Districts_of_Bali").text

In [3]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

In [5]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)

In [6]:
# create a new DataFrame from the list
Bali_df = pd.DataFrame({"Neighborhood": neighborhoodList})

Bali_df.head()

Unnamed: 0,Neighborhood
0,List of districts of Bali
1,Abiansemal District
2,"Banjar, Buleleng"
3,Banjarangkan
4,Blahbatuh


In [7]:
# print the number of rows of the dataframe
Bali_df.shape

(24, 1)

In [8]:
Bali_df.tail()

Unnamed: 0,Neighborhood
19,"Sukasada, Buleleng"
20,Sukawati
21,Tegallalang
22,"Tejakula, Buleleng"
23,Ubud District


3. Get the geographical coordinates

In [9]:
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Bali, Indonesia'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [10]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in Bali_df["Neighborhood"].tolist() ]
coords

[[-8.729689999999948, 115.16812000000004],
 [-8.540969999999959, 115.22325000000001],
 [-8.255219999999952, 115.09030000000007],
 [-8.567619999999977, 115.39053000000001],
 [-8.599719999999934, 115.32788000000005],
 [-8.115909999999928, 115.09037000000001],
 [-8.311459999999954, 114.91358000000002],
 [-8.170419999999979, 114.74011000000007],
 [-8.503659999999968, 115.29287000000011],
 [-8.132109999999955, 115.20392000000004],
 [-8.728629999999953, 115.16895000000011],
 [-8.619789999999966, 115.17547000000002],
 [-8.43787999999995, 115.17709000000002],
 [-8.56015999999994, 115.19683000000009],
 [-8.673909999999978, 115.55202000000008],
 [-8.396149999999977, 115.25114000000008],
 [-8.080169999999953, 115.14852000000008],
 [-8.19680999999997, 114.90411000000006],
 [-8.809189999999944, 115.15787000000012],
 [-8.237979999999936, 115.12071000000003],
 [-8.601239999999962, 115.2642800000001],
 [-8.37860999999998, 115.30259000000001],
 [-8.120259999999973, 115.32592000000011],
 [-8.51961999999

In [11]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [12]:
# merge the coordinates into the original dataframe
Bali_df['Latitude'] = df_coords['Latitude']
Bali_df['Longitude'] = df_coords['Longitude']

In [13]:
# check the neighborhoods and the coordinates
print(Bali_df.shape)
Bali_df

(24, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,List of districts of Bali,-8.72969,115.16812
1,Abiansemal District,-8.54097,115.22325
2,"Banjar, Buleleng",-8.25522,115.0903
3,Banjarangkan,-8.56762,115.39053
4,Blahbatuh,-8.59972,115.32788
5,"Buleleng, Bali",-8.11591,115.09037
6,"Busung Biu, Buleleng",-8.31146,114.91358
7,"Gerokgak, Buleleng",-8.17042,114.74011
8,Tampaksiring,-8.50366,115.29287
9,"Kubutambahan, Buleleng",-8.13211,115.20392


4. Create a map with neighborhoods superimposed on top

In [14]:
address = 'Bali, Indonesia'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Bali, Indonesia {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Bali, Indonesia -8.3304977, 115.0906401.


In [15]:
# create map of Bali using latitude and longitude values
map_Bali = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(Bali_df['Latitude'], Bali_df['Longitude'], Bali_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Bali)  
    
map_Bali

In [16]:
# save the map as HTML file
map_Bali.save('map_Bali.html')

5. Use the Foursquare API to explore the neighborhoods

In [17]:
# define Foursquare Credentials and Version
CLIENT_ID = 'GW2AA04CL1MYLF5OMERKX3GVAH0WOJ4ZSBKZ3WOXFGFNURJR'
CLIENT_SECRET = '3EF0FU5AEQNFP12ESGWCB3LVJORGGEMNCL3EKHHPS0HAYF0L'
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: GW2AA04CL1MYLF5OMERKX3GVAH0WOJ4ZSBKZ3WOXFGFNURJR
CLIENT_SECRET:3EF0FU5AEQNFP12ESGWCB3LVJORGGEMNCL3EKHHPS0HAYF0L


In [18]:
#Now, let's get the top 100 venues that are within a radius of 10,000 meters.
radius = 10000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(Bali_df['Latitude'], Bali_df['Longitude'], Bali_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))


In [19]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1461, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,List of districts of Bali,-8.72969,115.16812,Odysseys Surf School,-8.720849,115.169901,Surf Spot
1,List of districts of Bali,-8.72969,115.16812,Young Spa,-8.722417,115.17528,Spa
2,List of districts of Bali,-8.72969,115.16812,Cara Cara Inn,-8.722761,115.17332,Hotel
3,List of districts of Bali,-8.72969,115.16812,Sheraton Bali Kuta Resort,-8.717966,115.169126,Hotel
4,List of districts of Bali,-8.72969,115.16812,Discovery Kartika Plaza Hotel,-8.729493,115.166609,Hotel


In [20]:
#Let's check how many venues were returned for each neighorhood
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abiansemal District,100,100,100,100,100,100
"Banjar, Buleleng",49,49,49,49,49,49
Banjarangkan,30,30,30,30,30,30
Blahbatuh,100,100,100,100,100,100
"Buleleng, Bali",47,47,47,47,47,47
"Busung Biu, Buleleng",4,4,4,4,4,4
"Gerokgak, Buleleng",18,18,18,18,18,18
"Kubutambahan, Buleleng",5,5,5,5,5,5
Kuta District,100,100,100,100,100,100
Kuta North,100,100,100,100,100,100


In [21]:
#Let's find out how many unique categories can be curated from all the returned venues
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))


There are 166 uniques categories.


In [22]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:170]

array(['Surf Spot', 'Spa', 'Hotel', 'Shopping Mall', 'Coffee Shop',
       'Airport Lounge', 'Café', 'Clothing Store',
       'Indonesian Restaurant', 'Supermarket', 'Discount Store',
       'Satay Restaurant', 'Resort', 'Sushi Restaurant', 'Breakfast Spot',
       'Hunting Supply', 'French Restaurant', 'Ice Cream Shop',
       'Vietnamese Restaurant', 'Water Park', 'Restaurant', 'Beach Bar',
       'Japanese Restaurant', 'Tea Room', 'Convenience Store', 'Church',
       'Italian Restaurant', 'BBQ Joint', 'Gym / Fitness Center', 'Pool',
       'Mexican Restaurant', 'Beach', 'Baby Store', 'Seafood Restaurant',
       'Bakery', 'Yoga Studio', 'Beer Garden', 'Kids Store',
       'Vegetarian / Vegan Restaurant', 'Cocktail Bar', 'Garden Center',
       'Bistro', 'Arts & Crafts Store', 'Farm', 'Peruvian Restaurant',
       'Art Gallery', 'Trail', 'Gift Shop', 'Tourist Information Center',
       'Asian Restaurant', 'Cosmetics Shop', 'Food & Drink Shop',
       'Dessert Shop', 'Museum', 'Mode

In [23]:
# check if the results contain "Coffee Shop"
"Coffee Shop" in venues_df['VenueCategory'].unique()

True

6. Analyze Each Neighborhood

In [24]:
# one hot encoding
Bali_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Bali_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Bali_onehot.columns[-1]] + list(Bali_onehot.columns[:-1])
Bali_onehot = Bali_onehot[fixed_columns]

print(Bali_onehot.shape)
Bali_onehot.head()

(1461, 167)


Unnamed: 0,Neighborhoods,Airport Food Court,Airport Lounge,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Baby Store,Bakery,Balinese Restaurant,Bar,Basketball Court,Basketball Stadium,Beach,Beach Bar,Bed & Breakfast,Beer Garden,Big Box Store,Bike Trail,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Café,Cajun / Creole Restaurant,Campground,Cemetery,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Spot,Donut Shop,Electronics Store,Exhibit,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Flea Market,Food & Drink Shop,Food Court,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hawaiian Restaurant,High School,Hindu Temple,Historic Site,History Museum,Hot Spring,Hotel,Hotel Bar,Hotel Pool,Hunting Supply,Ice Cream Shop,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Javanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Lounge,Market,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Museum,National Park,Nature Preserve,Neighborhood,New American Restaurant,Night Market,Office,Other Great Outdoors,Outdoors & Recreation,Paintball Field,Park,Pedestrian Plaza,Peruvian Restaurant,Pharmacy,Pie Shop,Pier,Pizza Place,Plaza,Pool,Racetrack,Rafting,Recreation Center,Resort,Rest Area,Restaurant,River,Rock Club,Salad Place,Satay Restaurant,Scenic Lookout,Seafood Restaurant,Shopping Mall,Snack Place,Soccer Stadium,Soup Place,Spa,Sports Club,Stables,Steakhouse,Sundanese Restaurant,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Tea Room,Temple,Thai Restaurant,Theater,Theme Restaurant,Tourist Information Center,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Waterfall,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,List of districts of Bali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,List of districts of Bali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,List of districts of Bali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,List of districts of Bali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,List of districts of Bali,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [25]:
#Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
Bali_grouped = Bali_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(Bali_grouped.shape)
Bali_grouped

(24, 167)


Unnamed: 0,Neighborhoods,Airport Food Court,Airport Lounge,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Baby Store,Bakery,Balinese Restaurant,Bar,Basketball Court,Basketball Stadium,Beach,Beach Bar,Bed & Breakfast,Beer Garden,Big Box Store,Bike Trail,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Café,Cajun / Creole Restaurant,Campground,Cemetery,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Spot,Donut Shop,Electronics Store,Exhibit,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Flea Market,Food & Drink Shop,Food Court,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hawaiian Restaurant,High School,Hindu Temple,Historic Site,History Museum,Hot Spring,Hotel,Hotel Bar,Hotel Pool,Hunting Supply,Ice Cream Shop,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Javanese Restaurant,Jewelry Store,Juice Bar,Kids Store,Lounge,Market,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Motorcycle Shop,Mountain,Museum,National Park,Nature Preserve,Neighborhood,New American Restaurant,Night Market,Office,Other Great Outdoors,Outdoors & Recreation,Paintball Field,Park,Pedestrian Plaza,Peruvian Restaurant,Pharmacy,Pie Shop,Pier,Pizza Place,Plaza,Pool,Racetrack,Rafting,Recreation Center,Resort,Rest Area,Restaurant,River,Rock Club,Salad Place,Satay Restaurant,Scenic Lookout,Seafood Restaurant,Shopping Mall,Snack Place,Soccer Stadium,Soup Place,Spa,Sports Club,Stables,Steakhouse,Sundanese Restaurant,Supermarket,Surf Spot,Sushi Restaurant,Tapas Restaurant,Tea Room,Temple,Thai Restaurant,Theater,Theme Restaurant,Tourist Information Center,Trail,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Waterfall,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,Abiansemal District,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.01,0.06,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.23,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.05,0.0,0.0,0.0,0.0,0.03,0.0,0.0
1,"Banjar, Buleleng",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081633,0.0,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.102041,0.040816,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061224,0.020408,0.040816,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0
2,Banjarangkan,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.066667,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333
3,Blahbatuh,0.01,0.0,0.0,0.0,0.1,0.0,0.03,0.03,0.01,0.0,0.03,0.0,0.02,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.1,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.1,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02
4,"Buleleng, Bali",0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.06383,0.0,0.0,0.0,0.0,0.021277,0.0,0.042553,0.0,0.0,0.085106,0.042553,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06383,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.021277,0.0,0.0,0.0,0.106383,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085106,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0
5,"Busung Biu, Buleleng",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Gerokgak, Buleleng",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Kubutambahan, Buleleng",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Kuta District,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.08,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.01,0.04,0.06,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.11,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0
9,Kuta North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.15,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.01,0.05,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.03,0.0,0.0


In [28]:
len(Bali_grouped[Bali_grouped["Coffee Shop"] > 0])

17

In [29]:
#Create a new DataFrame for this category of data only
Bali_Resto = Bali_grouped[["Neighborhoods","Coffee Shop"]]
Bali_Resto.head()

Unnamed: 0,Neighborhoods,Coffee Shop
0,Abiansemal District,0.01
1,"Banjar, Buleleng",0.020408
2,Banjarangkan,0.0
3,Blahbatuh,0.03
4,"Buleleng, Bali",0.021277


7. Cluster Neighborhoods

In [30]:
#Run k-means to cluster the neighborhoods
# set number of clusters
kclusters = 3

Bali_clustering = Bali_Resto.drop(["Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Bali_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 0, 2, 2, 0, 0, 0, 1, 2], dtype=int32)

In [31]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
Bali_merged = Bali_Resto.copy()

# add clustering labels
Bali_merged["Cluster Labels"] = kmeans.labels_

In [32]:
Bali_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
Bali_merged.head()

Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels
0,Abiansemal District,0.01,0
1,"Banjar, Buleleng",0.020408,2
2,Banjarangkan,0.0,0
3,Blahbatuh,0.03,2
4,"Buleleng, Bali",0.021277,2


In [33]:
Bali_merged = Bali_merged.join(Bali_df.set_index("Neighborhood"), on="Neighborhood")

print(Bali_merged.shape)
Bali_merged.head() # check the last columns!

(24, 5)


Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels,Latitude,Longitude
0,Abiansemal District,0.01,0,-8.54097,115.22325
1,"Banjar, Buleleng",0.020408,2,-8.25522,115.0903
2,Banjarangkan,0.0,0,-8.56762,115.39053
3,Blahbatuh,0.03,2,-8.59972,115.32788
4,"Buleleng, Bali",0.021277,2,-8.11591,115.09037


In [34]:
# sort the results by Cluster Labels
print(Bali_merged.shape)
Bali_merged.sort_values(["Cluster Labels"], inplace=True)
Bali_merged

(24, 5)


Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels,Latitude,Longitude
0,Abiansemal District,0.01,0,-8.54097,115.22325
20,Tampaksiring,0.01,0,-8.50366,115.29287
16,Seririt,0.0,0,-8.19681,114.90411
12,Mengwi,0.01,0,-8.56016,115.19683
22,"Tejakula, Buleleng",0.0,0,-8.12026,115.32592
7,"Kubutambahan, Buleleng",0.0,0,-8.13211,115.20392
11,"Marga, Tabanan",0.0,0,-8.43788,115.17709
5,"Busung Biu, Buleleng",0.0,0,-8.31146,114.91358
2,Banjarangkan,0.0,0,-8.56762,115.39053
6,"Gerokgak, Buleleng",0.0,0,-8.17042,114.74011


In [35]:
#Finally, let's visualize the resulting clusters
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Bali_merged['Latitude'], Bali_merged['Longitude'], Bali_merged['Neighborhood'], Bali_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [56]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

8. Examine Clusters

Cluster 0

In [36]:
Bali_merged.loc[Bali_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels,Latitude,Longitude
0,Abiansemal District,0.01,0,-8.54097,115.22325
20,Tampaksiring,0.01,0,-8.50366,115.29287
16,Seririt,0.0,0,-8.19681,114.90411
12,Mengwi,0.01,0,-8.56016,115.19683
22,"Tejakula, Buleleng",0.0,0,-8.12026,115.32592
7,"Kubutambahan, Buleleng",0.0,0,-8.13211,115.20392
11,"Marga, Tabanan",0.0,0,-8.43788,115.17709
5,"Busung Biu, Buleleng",0.0,0,-8.31146,114.91358
2,Banjarangkan,0.0,0,-8.56762,115.39053
6,"Gerokgak, Buleleng",0.0,0,-8.17042,114.74011


Cluster 1

In [37]:
Bali_merged.loc[Bali_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels,Latitude,Longitude
13,Nusa Penida,0.052632,1,-8.67391,115.55202
8,Kuta District,0.08,1,-8.72863,115.16895
21,Tegallalang,0.068966,1,-8.37861,115.30259
10,List of districts of Bali,0.06,1,-8.72969,115.16812


Cluster 2

In [38]:
Bali_merged.loc[Bali_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Coffee Shop,Cluster Labels,Latitude,Longitude
4,"Buleleng, Bali",0.021277,2,-8.11591,115.09037
3,Blahbatuh,0.03,2,-8.59972,115.32788
14,Payangan,0.02381,2,-8.39615,115.25114
9,Kuta North,0.03,2,-8.61979,115.17547
17,South Kuta,0.03,2,-8.80919,115.15787
18,"Sukasada, Buleleng",0.021739,2,-8.23798,115.12071
19,Sukawati,0.03,2,-8.60124,115.26428
1,"Banjar, Buleleng",0.020408,2,-8.25522,115.0903
15,"Sawan, Buleleng",0.034483,2,-8.08017,115.14852


Observations:
Most of the cafes are concentrated in cluster 2 which is actually where the famous ubud rice field teraces are. We have a very low number to nearly zero for cluster 0 and moderate number in cluster 1. Cluster 0 represents a great opportunity and high potential area to open a business as there is very little to no competition. However access and supply will be an issue especially when we look at Nusa Penida for example. Therefore, this project recommends to open a business in cluster 0 considering as well the government tourism development. Property developers with unique selling propositions to stand out from the competition can also open new shopping malls in neighborhoods in cluster 0 with moderate competition. Lastly, property developers are advised to avoid neighborhoods in cluster 2 which already have high concentration of shopping malls and suffering from intense competition.