# Capstone Project: Singapore and Kuala Lumpur Compared

# Purpose
This exercise serves as the document for the peer review assignment for the IBM Data Science Professional Certificate - Applied Data Science Capstone.

# Introduction
This exercise demonstrates the use of the Foursquare API and the k-means machine learning algorithm.


The objectives of this exercise was to illustrate the qualitative differences of two neighborhoods in two cities – Singapore and Kuala Lumpur, Malaysia. This exercise could presumably be helpful in two ways: a guide for potential visitors of these two cities on the kinds of places they offer and the food scenes these two places offer; and for potential restauranteurs in thinking about what kinds of dining options to offer in these two places.


Singapore and Kuala Lumpur are two cities that are often compared with each other. They are physically just 400km from each other - about four hours ride, and share several qualities: both are diverse multicultural cities where people of different ethnicities, faiths, and cultures live with each other. They have also developed their own food cultures, which often compete with each other. This exercise does not claim to be definititve in either way - merely using Foursquare data to look at how visitors interact with both cities on Foursquare, and how that data might reveal patterns about the food culture as experienced by users who check in on Foursquare.

The findings from this small limited study will still provide interesting nuggets of information for potential visitors and businesses to consider as they find out more about both cities. 

##Data

The data from this comes from the Foursquare API. Foursquare API calls offers information on the venue and category of the venue – which is sufficient information for the tasks to perform. The file format can be organized for further analysis in the pandas dataframe.
The main fields of the Foursquare API that will be used would be:
- Venue Name
- Latitude
- Longitude
- Venue cateogry.
For the geographical data, I have relied on Google searches to determine the approximate coordinates of the places of interest.

# Methodology

The main tools I will be using will be K-means clustering, as I am trying to understand the qualitative nature of the various neighbourhoods within Singapore and Kuala Lumpur, and how these neighbourhoods would be distinctive in their own right. Such a description would be useful to potential visitors and restauranteurs and the places they might choose to visit and set up their businesses. 

Owing to the small-scale nature of this exercise, this is certainly not a comprehensive analysis, and further data collection and analysis would be required. Nonetheless, this small demonstration would be sufficient to reveal the insights of these two cities and their constitutent neighbourhoods. 

I will be collecting the names of the neighbourhoods, followed by their latitude and longitude coordinates for use in the Foursquare API. In situations where it is difficult to collect the latitude and longitude coordinates using the Geocoder API, I will manually obtain them through Google searches. 

The coordinates will then be fed into the Foursquare Places API calls to collect information of the venues around the neighbourhoods. For ease of analysis I will collect up a limit of 50 places for neighbourhoods in these two cities. 

I will be collecting the coordinates for Singapore and Kuala Lumpur neighbourhoods/districts. I will then call the Foursquare API and obtain the venues for both places.
I will convert the venues data into vectors through one-hot encoding.
I will then cluster the neighbourhoods/districts in both places, and attempt to interpret the findings. 
I will also take a look at the most popular venues for both cities.
All of these together, will provide an illustrative view of the food scenes in both places, and inform potential visitors and business owners as they make decisions on where to visit/put a restaurant in. 


In [1]:
# importing the various libraries

import folium
import numpy as np
import pandas as pd
import json
from geopy.geocoders import Nominatim
from pandas.io.json import json_normalize 
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import requests # library to handle requests
import lxml.html as lh
import bs4 as bs
import urllib.request

In [2]:
from IPython.display import HTML
import base64

# Extra Helper scripts to generate download links for saved dataframes in csv format.
def create_download_link( df, title = "Download CSV file", filename = "data.csv"):  
    csv = df.to_csv()
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)

In [3]:
#listing Singapore's main housing districts, called 'new towns'/

sg_new_towns = ['Ang Mo Kio', 'Bedok', 'Bishan', 'Bukit Batok', 'Bukit Merah', 'Bukit Panjang', 'Bukit Timah', 'Choa Chu Kang', 'Clementi', 'Geylang', 'Hougang', 'Jurong East', 'Jurong West', 'Kallang/Whampoa', 'Marine Parade', 'Pasir Ris', 'Punggol', 'Queenstown', 'Sembawang', 'Sengkang', 'Serangoon', 'Tampines', 'Tanjong Pagar', 'Toa Payoh', 'Woodlands', 'Yishun']

In [4]:
sg_towns = pd.DataFrame()
sg_towns['Neighbourhoods'] = sg_new_towns
sg_towns

Unnamed: 0,Neighbourhoods
0,Ang Mo Kio
1,Bedok
2,Bishan
3,Bukit Batok
4,Bukit Merah
5,Bukit Panjang
6,Bukit Timah
7,Choa Chu Kang
8,Clementi
9,Geylang


In [5]:
#obtaining latitude coordinates
sg_latitude = []


for i in range(len(sg_new_towns)):
    geolocator = Nominatim(user_agent = "sg_explorer")
    location = geolocator.geocode("Singapore " + sg_new_towns[i] + " town centre")
    latitude = location.latitude
    sg_latitude.append(latitude)

print(sg_latitude)


[1.3712845, 1.3259277, 1.3489326, 1.3482831, 1.2839191, 1.377921, 1.3294484, 1.3848961499999999, 1.3132179499999999, 1.3181862, 1.3733601, 1.3338016, 1.3396365, 1.3218517, 1.3026889, 1.3742214, 1.398033, 1.2994371, 1.4480646, 1.3909487, 1.363236, 1.3546528, 1.2764189, 1.3353906, 1.436897, 1.428136]


In [6]:
#obtaining longitude coordinates
sg_longitude = []

for i in range(len(sg_new_towns)):
    geolocator = Nominatim(user_agent = "sg_explorer")
    location = geolocator.geocode("Singapore " + sg_new_towns[i] + " town centre")
    longitude = location.longitude
    sg_longitude.append(longitude)

print(sg_longitude)

[103.84699364624538, 103.93181266590753, 103.84890573407641, 103.7490191, 103.817807, 103.7718658, 103.794166, 103.74300455709542, 103.7650857156276, 103.8870563, 103.8860907, 103.7419081, 103.7073387, 103.86358, 103.9073952, 103.950796, 103.9073312, 103.8000884, 103.8207604, 103.8951748, 103.8744617, 103.9435712, 103.8429295, 103.8497414, 103.786216, 103.8336942]


In [7]:
#combining the names and the latitudes and longitudes into a single dataframe

sg_towns['Latitude'] = sg_latitude
sg_towns['Longitude'] = sg_longitude
sg_towns

Unnamed: 0,Neighbourhoods,Latitude,Longitude
0,Ang Mo Kio,1.371285,103.846994
1,Bedok,1.325928,103.931813
2,Bishan,1.348933,103.848906
3,Bukit Batok,1.348283,103.749019
4,Bukit Merah,1.283919,103.817807
5,Bukit Panjang,1.377921,103.771866
6,Bukit Timah,1.329448,103.794166
7,Choa Chu Kang,1.384896,103.743005
8,Clementi,1.313218,103.765086
9,Geylang,1.318186,103.887056


In [8]:
#creating a map to make sure that the neighbourhoods are roughly in the correct places. 

address = 'Singapore'

geolocator = Nominatim(user_agent = 'sg_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_singapore = folium.Map(location=[latitude, longitude],tiles="OpenStreetMap", zoom_start=10)

# add markers to map
for lat, lng, town in zip(
    sg_towns['Latitude'],
    sg_towns['Longitude'],
    sg_towns['Neighbourhoods']):
    label = town
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5,
        parse_html=False).add_to(map_singapore)
map_singapore

In [103]:
#As I am not familiar with Kuala Lumpur, I have gone to Wikipedia and Google Maps to look at the districts and environs of Kuala Lumpur and the surrounding areas. 
#I am also aware that some of these districts are not part of KL proper, but have nonetheless include them as they are popularly mentioned as being part of KL informally.

kl_districts = ['Bandar Tun Razak','Batu','Bukit Bintang', 'Cheras', 'Kepong','Lembah Pantai','Segambut','Seputeh','Setiawangsa', 'Titiwangsa',  'Wangsa Maju', 'Subang Jaya','Petaling Jaya','Putrajaya','Kajang','Klang','Puchong','Port Klang','Sungai Buloh','Ampang Jaya', 'Shah Alam','Seri Kembangan']

In [104]:
kl_districts_df=pd.DataFrame()
kl_districts_df['Name']=kl_districts
kl_districts_df

Unnamed: 0,Name
0,Bandar Tun Razak
1,Batu
2,Bukit Bintang
3,Cheras
4,Kepong
5,Lembah Pantai
6,Segambut
7,Seputeh
8,Setiawangsa
9,Titiwangsa


In [105]:
#these coordinates are collected manually. 
kl_latitudes = [3.092, 3.139, 3.1468,3.1068,3.214,3.1252,3.1917,3.115,3.183,3.1774,3.2038,3.0567,3.1279,2.9264,2.9935,3.0449,3.0327,2.9999,3.2093,3.1491,3.0733,3.0220]

In [106]:
#these coordinates are collected manually. 
kl_longitudes = [101.7211,101.6869,101.7113,101.7259,101.635,101.6683,101.6734,101.6797,101.7462,101.7077,101.7367,101.5851,101.5945,101.6964,101.7874,101.4456,101.6188,101.3928,101.5613,101.7625,101.5185,101.7055]


In [107]:
#Full dataframe constructed for KL. 
kl_districts_df['Latitudes'] = kl_latitudes
kl_districts_df['Longitudes'] = kl_longitudes
kl_districts_df

Unnamed: 0,Name,Latitudes,Longitudes
0,Bandar Tun Razak,3.092,101.7211
1,Batu,3.139,101.6869
2,Bukit Bintang,3.1468,101.7113
3,Cheras,3.1068,101.7259
4,Kepong,3.214,101.635
5,Lembah Pantai,3.1252,101.6683
6,Segambut,3.1917,101.6734
7,Seputeh,3.115,101.6797
8,Setiawangsa,3.183,101.7462
9,Titiwangsa,3.1774,101.7077


In [108]:
#Making a map to check that the coordinates are just about in the right place. 
address = 'Kuala Lumpur'

geolocator = Nominatim(user_agent = 'kl_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_kl = folium.Map(location=[latitude, longitude],tiles="OpenStreetMap", zoom_start=10)

# add markers to map
for lat, lng, town in zip(
    kl_districts_df['Latitudes'],
    kl_districts_df['Longitudes'],
    kl_districts_df['Name']):
    label = town
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5,
        parse_html=False).add_to(map_kl)
map_kl

In [None]:
#creating the API call
CLIENT_ID = 'SECRET' # your Foursquare ID
CLIENT_SECRET = 'SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

In [24]:
#Using on of the Singapore neighbourhoods as test for the Foursquare API. 
#The Singapore neighbourhood is "Tanjong Pagar", hence "tp_lat" and "tp_long" for the latitudes and longitudes.
tp_lat = sg_towns.loc[22, 'Latitude']
tp_long = sg_towns.loc[22, 'Longitude']
tp_name = sg_towns.loc[22, 'Neighbourhoods']

In [None]:
# Limit the number of venues returned by Foursquare API
LIMIT = 50

# Define radius
radius = 500

# Create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    tp_lat, 
    tp_long, 
    radius, 
    LIMIT)

# Display URL
url



In [None]:
#results from the Foursquare API call
results = requests.get(url).json()
results

In [27]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [28]:
#creating the venues dataframe for Tanjong Pagar from the Foursquare API call
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Oasia Downtown Hotel,Hotel,1.27607,103.844334
1,Binomio Spanish Restaurante,Spanish Restaurant,1.277713,103.842248
2,Baristart Coffee,Coffee Shop,1.277694,103.84439
3,Salmon Samurai,Sushi Restaurant,1.275067,103.84373
4,DON DON DONKI,Discount Store,1.274742,103.843383


In [29]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

50 venues were returned by Foursquare.


In [30]:
#creating a helper function for the api call to obtain the venues around the neighbourhoods
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [31]:
#calling the API for the variou Singapore neighbourhoods. 
sg_venues = getNearbyVenues(names=sg_towns['Neighbourhoods'],
                                   latitudes=sg_towns['Latitude'],
                                   longitudes=sg_towns['Longitude']
                                  )



Ang Mo Kio
Bedok
Bishan
Bukit Batok
Bukit Merah
Bukit Panjang
Bukit Timah
Choa Chu Kang
Clementi
Geylang
Hougang
Jurong East
Jurong West
Kallang/Whampoa
Marine Parade
Pasir Ris
Punggol
Queenstown
Sembawang
Sengkang
Serangoon
Tampines
Tanjong Pagar
Toa Payoh
Woodlands
Yishun


In [32]:
#This gives a preview of the venues collected and the category in all of the neighbourhoods. 
#A total of 908 venues were identified. 
print(sg_venues.shape)
sg_venues.head()

(908, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ang Mo Kio,1.371285,103.846994,Face Ban Mian 非板面 (Ang Mo Kio),1.372031,103.847504,Noodle House
1,Ang Mo Kio,1.371285,103.846994,NTUC FairPrice,1.371507,103.847082,Supermarket
2,Ang Mo Kio,1.371285,103.846994,Old Chang Kee,1.369094,103.848389,Snack Place
3,Ang Mo Kio,1.371285,103.846994,FairPrice Xtra,1.369279,103.848886,Supermarket
4,Ang Mo Kio,1.371285,103.846994,True Fitness,1.372891,103.847661,Gym


In [33]:
sg_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ang Mo Kio,50,50,50,50,50,50
Bedok,50,50,50,50,50,50
Bishan,41,41,41,41,41,41
Bukit Batok,28,28,28,28,28,28
Bukit Merah,26,26,26,26,26,26
Bukit Panjang,7,7,7,7,7,7
Bukit Timah,29,29,29,29,29,29
Choa Chu Kang,27,27,27,27,27,27
Clementi,47,47,47,47,47,47
Geylang,34,34,34,34,34,34


In [34]:
#This gives a count of the number of unique categories. 
print('There are {} uniques categories.'.format(len(sg_venues['Venue Category'].unique())))

There are 149 uniques categories.


In [35]:
# one hot encoding - vectorizing the data
sg_onehot = pd.get_dummies(sg_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sg_onehot['Neighborhood'] = sg_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sg_onehot.columns[-1]] + list(sg_onehot.columns[:-1])
sg_onehot = sg_onehot[fixed_columns]

sg_onehot.head()

Unnamed: 0,Yoga Studio,ATM,Accessories Store,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,...,Tea Room,Thai Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Wine Bar,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [36]:
#take note of this number - 908 venues from 149 categories. This can be compared to the KL dataset later. 
sg_onehot.shape

(908, 149)

In [37]:
#grouping the dataset
sg_grouped = sg_onehot.groupby('Neighborhood').mean().reset_index()
sg_grouped

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,...,Tea Room,Thai Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Wine Bar,Wings Joint
0,Ang Mo Kio,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
1,Bedok,0.0,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,...,0.0,0.04,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.02
2,Bishan,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,...,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bukit Batok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bukit Merah,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bukit Panjang,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bukit Timah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,...,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0
7,Choa Chu Kang,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,...,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Clementi,0.0,0.0,0.0,0.021277,0.0,0.021277,0.06383,0.0,0.021277,...,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0
9,Geylang,0.0,0.0,0.0,0.0,0.0,0.0,0.088235,0.0,0.029412,...,0.0,0.0,0.0,0.0,0.0,0.029412,0.088235,0.0,0.0,0.029412


In [38]:
sg_grouped.shape

(26, 149)

In [39]:
#This will give the top 5 venues for each neighbourhood.
num_top_venues = 5

for hood in sg_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sg_grouped[sg_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ang Mo Kio----
                  venue  freq
0            Food Court  0.12
1           Coffee Shop  0.08
2          Dessert Shop  0.06
3  Fast Food Restaurant  0.06
4   Japanese Restaurant  0.04


----Bedok----
                           venue  freq
0             Chinese Restaurant  0.08
1                    Coffee Shop  0.06
2                     Food Court  0.06
3            Japanese Restaurant  0.06
4  Vegetarian / Vegan Restaurant  0.04


----Bishan----
                venue  freq
0          Food Court  0.12
1         Coffee Shop  0.12
2     Bubble Tea Shop  0.07
3      Ice Cream Shop  0.05
4  Chinese Restaurant  0.05


----Bukit Batok----
                venue  freq
0         Coffee Shop  0.18
1  Chinese Restaurant  0.11
2          Food Court  0.07
3              Bakery  0.07
4    Department Store  0.04


----Bukit Merah----
                  venue  freq
0    Chinese Restaurant  0.27
1            Food Court  0.08
2  Fast Food Restaurant  0.08
3           Coffee Shop  0.08
4   

In [109]:
#I extract the venues for the various districts in Kuala Lumpur.
kl_venues = getNearbyVenues(names=kl_districts_df['Name'],
                                   latitudes=kl_districts_df['Latitudes'],
                                   longitudes=kl_districts_df['Longitudes']
                                  )

Bandar Tun Razak
Batu
Bukit Bintang
Cheras
Kepong
Lembah Pantai
Segambut
Seputeh
Setiawangsa
Titiwangsa
Wangsa Maju
Subang Jaya
Petaling Jaya
Putrajaya
Kajang
Klang
Puchong
Port Klang
Sungai Buloh
Ampang Jaya
Shah Alam
Seri Kembangan


In [110]:
#Printing out the number of venues. 
print(kl_venues.shape)
kl_venues.head()

(582, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bandar Tun Razak,3.092,101.7211,Restaurant Al Amal Taman Jaya,3.094404,101.724547,Food
1,Bandar Tun Razak,3.092,101.7211,KB Restoran,3.093953,101.724432,Malay Restaurant
2,Bandar Tun Razak,3.092,101.7211,Rakanda Ent Sdn Bhd,3.088694,101.722114,Bakery
3,Bandar Tun Razak,3.092,101.7211,Pasar Malam Khamis,3.087836,101.721736,Flea Market
4,Bandar Tun Razak,3.092,101.7211,Padang Bola Sepak Bandar Tun Razak,3.092137,101.717157,Soccer Field


In [42]:
#Counting the number of venues and categories
kl_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ampang Jaya,46,46,46,46,46,46
Bandar Tun Razak,17,17,17,17,17,17
Batu,20,20,20,20,20,20
Bukit Bintang,50,50,50,50,50,50
Cheras,5,5,5,5,5,5
Kajang,47,47,47,47,47,47
Kepong,50,50,50,50,50,50
Klang,33,33,33,33,33,33
Lembah Pantai,12,12,12,12,12,12
Petaling Jaya,18,18,18,18,18,18


In [111]:
#Note that there are 163 unique categories
print('There are {} uniques categories.'.format(len(kl_venues['Venue Category'].unique())))

There are 163 uniques categories.


In [112]:
# one hot encoding
kl_onehot = pd.get_dummies(kl_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
kl_onehot['Neighborhood'] = kl_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

kl_onehot.head()

Unnamed: 0,Women's Store,American Restaurant,Arts & Entertainment,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,...,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Volleyball Court,Wine Bar
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [113]:
#Note that the figure for the various districts in KL is 582 total venues from 163 categories.
#Note that Singapore's figure was 908 venues from 149 categories. This alone would already be interesting.

kl_onehot.shape

(582, 163)

## Results Part 1

Singapore has a smaller physical size than KL; KL should have more venues in much more categories.

This could be due to a smaller number of districts sampled for KL, although the difference is not small, and it is unlikely that the coordinates would have missed an important district. Nonetheless, future iterations of such a study should look at expanding the radius from the coordinates to lower the chances of missing key areas that might have many venues. 

On the other hand, the similarity in the number of venue categories show that both cities are sophisticated cities with a wide range of offerings, and the small difference between them - 149 and 163 could just be down to the larger population of KL (c. 8 million) vs Singapore (c. 5.5 million). 

In [114]:
#one-hot encoding for KL. 
kl_grouped = kl_onehot.groupby('Neighborhood').mean().reset_index()
kl_grouped

Unnamed: 0,Neighborhood,Women's Store,American Restaurant,Arts & Entertainment,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,...,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Volleyball Court,Wine Bar
0,Ampang Jaya,0.0,0.0,0.0,0.065217,0.0,0.021739,0.021739,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0
1,Bandar Tun Razak,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Batu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bukit Bintang,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
4,Cheras,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Kajang,0.0,0.0,0.0,0.106383,0.0,0.0,0.021277,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0
6,Kepong,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0
7,Klang,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.030303,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Lembah Pantai,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0
9,Petaling Jaya,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556


In [115]:
#top 5 venues per district in KL. 
num_top_venues = 5

for hood in kl_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = kl_grouped[kl_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ampang Jaya----
                venue  freq
0  Chinese Restaurant  0.15
1    Malay Restaurant  0.09
2    Asian Restaurant  0.07
3          Food Truck  0.04
4         Pizza Place  0.04


----Bandar Tun Razak----
                  venue  freq
0            Food Court  0.06
1     Indian Restaurant  0.06
2     Convenience Store  0.06
3  Gym / Fitness Center  0.06
4          Night Market  0.06


----Batu----
                  venue  freq
0                 Hotel  0.15
1             Hotel Bar  0.10
2            Steakhouse  0.05
3  Other Great Outdoors  0.05
4         Deli / Bodega  0.05


----Bukit Bintang----
            venue  freq
0           Hotel  0.14
1            Café  0.06
2  Clothing Store  0.04
3   Shopping Mall  0.04
4        Boutique  0.04


----Cheras----
              venue  freq
0       Flea Market   0.2
1  Malay Restaurant   0.2
2        Food Court   0.2
3            Market   0.2
4        Food Truck   0.2


----Kajang----
                  venue  freq
0      Malay Restauran

In [116]:
kl_grouped.shape

(22, 163)

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [92]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sg_grouped['Neighborhood']

for ind in np.arange(sg_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sg_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ang Mo Kio,Food Court,Coffee Shop,Fast Food Restaurant,Dessert Shop,Japanese Restaurant,Sandwich Place,Bubble Tea Shop,Supermarket,Convenience Store,Shopping Mall
1,Bedok,Chinese Restaurant,Coffee Shop,Japanese Restaurant,Food Court,Thai Restaurant,Sandwich Place,Supermarket,Café,Fast Food Restaurant,Asian Restaurant
2,Bishan,Food Court,Coffee Shop,Bubble Tea Shop,Café,Japanese Restaurant,Supermarket,Chinese Restaurant,Ice Cream Shop,Dumpling Restaurant,Eastern European Restaurant
3,Bukit Batok,Coffee Shop,Chinese Restaurant,Bakery,Food Court,Fast Food Restaurant,Malay Restaurant,Plaza,Pool,Department Store,Park
4,Bukit Merah,Chinese Restaurant,Fast Food Restaurant,Food Court,Coffee Shop,Bistro,Malay Restaurant,Café,Sandwich Place,Noodle House,Residential Building (Apartment / Condo)


In [117]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
kl_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
kl_neighborhoods_venues_sorted['Neighborhood'] = kl_grouped['Neighborhood']

for ind in np.arange(kl_grouped.shape[0]):
    kl_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sg_grouped.iloc[ind, :], num_top_venues)

kl_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ampang Jaya,Food Court,Coffee Shop,Fast Food Restaurant,Dessert Shop,Japanese Restaurant,Sandwich Place,Bubble Tea Shop,Supermarket,Convenience Store,Shopping Mall
1,Bandar Tun Razak,Chinese Restaurant,Coffee Shop,Japanese Restaurant,Food Court,Thai Restaurant,Sandwich Place,Supermarket,Café,Fast Food Restaurant,Asian Restaurant
2,Batu,Food Court,Coffee Shop,Bubble Tea Shop,Café,Japanese Restaurant,Supermarket,Chinese Restaurant,Ice Cream Shop,Dumpling Restaurant,Eastern European Restaurant
3,Bukit Bintang,Coffee Shop,Chinese Restaurant,Bakery,Food Court,Fast Food Restaurant,Malay Restaurant,Plaza,Pool,Department Store,Park
4,Cheras,Chinese Restaurant,Fast Food Restaurant,Food Court,Coffee Shop,Bistro,Malay Restaurant,Café,Sandwich Place,Noodle House,Residential Building (Apartment / Condo)


## Results Part 2: K-Means Clustering
In this next section, I will use the K-Means machine learning algorithm to look at the various clusters within each city. This clustering exercise will help us see how diverse each city is and how their neighbourhoods might share similar qualities. 

I have set the number of clusters at 10 - this gives about 2-3 neighbourhoods per cluster on average. A higher number might reduce the meaningfulness of the categories. 

The other clustering methods might not be as useful - there is no need for hierarchical clustering as there are no hierarchies; Density-Based SCAN Clustering might be helpful but the number of clusters identified is small. DBSCAN might be more useful for looking at the distribution of shops in both locations. 

In [87]:
# set number of clusters
sg_kclusters = 10

sg_grouped_clustering = sg_grouped.drop('Neighborhood', 1)

# run k-means clustering
sg_kmeans = KMeans(n_clusters=sg_kclusters, random_state=0).fit(sg_grouped_clustering)

# check cluster labels generated for each row in the dataframe
sg_kmeans.labels_[0:10] 

array([3, 3, 3, 9, 5, 2, 8, 3, 3, 7], dtype=int32)

In [93]:
neighborhoods_venues_sorted['Cluster Labels']=sg_kmeans.labels_

In [96]:
sg_merged = sg_towns.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhoods')

In [97]:
sg_merged.head()

Unnamed: 0,Neighbourhoods,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Ang Mo Kio,1.371285,103.846994,Food Court,Coffee Shop,Fast Food Restaurant,Dessert Shop,Japanese Restaurant,Sandwich Place,Bubble Tea Shop,Supermarket,Convenience Store,Shopping Mall,3
1,Bedok,1.325928,103.931813,Chinese Restaurant,Coffee Shop,Japanese Restaurant,Food Court,Thai Restaurant,Sandwich Place,Supermarket,Café,Fast Food Restaurant,Asian Restaurant,3
2,Bishan,1.348933,103.848906,Food Court,Coffee Shop,Bubble Tea Shop,Café,Japanese Restaurant,Supermarket,Chinese Restaurant,Ice Cream Shop,Dumpling Restaurant,Eastern European Restaurant,3
3,Bukit Batok,1.348283,103.749019,Coffee Shop,Chinese Restaurant,Bakery,Food Court,Fast Food Restaurant,Malay Restaurant,Plaza,Pool,Department Store,Park,9
4,Bukit Merah,1.283919,103.817807,Chinese Restaurant,Fast Food Restaurant,Food Court,Coffee Shop,Bistro,Malay Restaurant,Café,Sandwich Place,Noodle House,Residential Building (Apartment / Condo),5


In [99]:
sg_merged.to_csv('sg_merged.csv')

In [98]:
# create map
address = 'Singapore'

geolocator = Nominatim(user_agent = 'sg_explorer')
sg_location = geolocator.geocode(address)
sg_latitude = sg_location.latitude
sg_longitude = sg_location.longitude

sg_map_clusters = folium.Map(location=[sg_latitude, sg_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sg_merged['Latitude'], sg_merged['Longitude'], sg_merged['Neighbourhoods'], sg_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(sg_map_clusters)
       
sg_map_clusters

In [157]:
#displaying the SG dataframe with the cluster labels
sg_merged

Unnamed: 0,Neighbourhoods,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Ang Mo Kio,1.371285,103.846994,Food Court,Coffee Shop,Fast Food Restaurant,Dessert Shop,Japanese Restaurant,Sandwich Place,Bubble Tea Shop,Supermarket,Convenience Store,Shopping Mall,3
1,Bedok,1.325928,103.931813,Chinese Restaurant,Coffee Shop,Japanese Restaurant,Food Court,Thai Restaurant,Sandwich Place,Supermarket,Café,Fast Food Restaurant,Asian Restaurant,3
2,Bishan,1.348933,103.848906,Food Court,Coffee Shop,Bubble Tea Shop,Café,Japanese Restaurant,Supermarket,Chinese Restaurant,Ice Cream Shop,Dumpling Restaurant,Eastern European Restaurant,3
3,Bukit Batok,1.348283,103.749019,Coffee Shop,Chinese Restaurant,Bakery,Food Court,Fast Food Restaurant,Malay Restaurant,Plaza,Pool,Department Store,Park,9
4,Bukit Merah,1.283919,103.817807,Chinese Restaurant,Fast Food Restaurant,Food Court,Coffee Shop,Bistro,Malay Restaurant,Café,Sandwich Place,Noodle House,Residential Building (Apartment / Condo),5
5,Bukit Panjang,1.377921,103.771866,Park,Food Court,Miscellaneous Shop,Market,Noodle House,Fishing Spot,Fast Food Restaurant,Falafel Restaurant,Electronics Store,Eastern European Restaurant,2
6,Bukit Timah,1.329448,103.794166,Food Court,Indian Restaurant,Bus Station,Coffee Shop,Italian Restaurant,Bakery,Churrascaria,Seafood Restaurant,Building,Café,8
7,Choa Chu Kang,1.384896,103.743005,Coffee Shop,Food Court,Bubble Tea Shop,Furniture / Home Store,Bus Line,Sandwich Place,Shop & Service,Café,Fast Food Restaurant,Pet Store,3
8,Clementi,1.313218,103.765086,Food Court,Asian Restaurant,Chinese Restaurant,Chinese Breakfast Place,Fried Chicken Joint,Fast Food Restaurant,Dim Sum Restaurant,Dessert Shop,Coffee Shop,Gym,3
9,Geylang,1.318186,103.887056,Chinese Restaurant,Noodle House,Vegetarian / Vegan Restaurant,Food Court,Asian Restaurant,Grocery Store,Dim Sum Restaurant,Steakhouse,Dessert Shop,Pool,7


In [118]:
# set number of clusters
kclusters = 10

kl_grouped_clustering = kl_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([5, 7, 1, 1, 6, 7, 0, 5, 5, 0], dtype=int32)

In [120]:
kl_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

kl_merged = kl_districts_df.join(kl_neighborhoods_venues_sorted.set_index('Neighborhood'),on='Name')
kl_merged.head()


Unnamed: 0,Name,Latitudes,Longitudes,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bandar Tun Razak,3.092,101.7211,7,Chinese Restaurant,Coffee Shop,Japanese Restaurant,Food Court,Thai Restaurant,Sandwich Place,Supermarket,Café,Fast Food Restaurant,Asian Restaurant
1,Batu,3.139,101.6869,1,Food Court,Coffee Shop,Bubble Tea Shop,Café,Japanese Restaurant,Supermarket,Chinese Restaurant,Ice Cream Shop,Dumpling Restaurant,Eastern European Restaurant
2,Bukit Bintang,3.1468,101.7113,1,Coffee Shop,Chinese Restaurant,Bakery,Food Court,Fast Food Restaurant,Malay Restaurant,Plaza,Pool,Department Store,Park
3,Cheras,3.1068,101.7259,6,Chinese Restaurant,Fast Food Restaurant,Food Court,Coffee Shop,Bistro,Malay Restaurant,Café,Sandwich Place,Noodle House,Residential Building (Apartment / Condo)
4,Kepong,3.214,101.635,0,Food Court,Indian Restaurant,Bus Station,Coffee Shop,Italian Restaurant,Bakery,Churrascaria,Seafood Restaurant,Building,Café


In [121]:
kl_merged.to_csv('kl_merged.csv')

In [122]:
kl_address = 'Kuala Lumpur'

geolocator = Nominatim(user_agent = 'kl_explorer')
kl_location = geolocator.geocode(kl_address)
kl_latitude = kl_location.latitude
kl_longitude = kl_location.longitude

kl_map_clusters = folium.Map(location=[kl_latitude, kl_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_merged['Latitudes'], kl_merged['Longitudes'], kl_merged['Name'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(kl_map_clusters)
       
kl_map_clusters

In [156]:
kl_merged

Unnamed: 0,Name,Latitudes,Longitudes,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bandar Tun Razak,3.092,101.7211,7,Chinese Restaurant,Coffee Shop,Japanese Restaurant,Food Court,Thai Restaurant,Sandwich Place,Supermarket,Café,Fast Food Restaurant,Asian Restaurant
1,Batu,3.139,101.6869,1,Food Court,Coffee Shop,Bubble Tea Shop,Café,Japanese Restaurant,Supermarket,Chinese Restaurant,Ice Cream Shop,Dumpling Restaurant,Eastern European Restaurant
2,Bukit Bintang,3.1468,101.7113,1,Coffee Shop,Chinese Restaurant,Bakery,Food Court,Fast Food Restaurant,Malay Restaurant,Plaza,Pool,Department Store,Park
3,Cheras,3.1068,101.7259,6,Chinese Restaurant,Fast Food Restaurant,Food Court,Coffee Shop,Bistro,Malay Restaurant,Café,Sandwich Place,Noodle House,Residential Building (Apartment / Condo)
4,Kepong,3.214,101.635,0,Food Court,Indian Restaurant,Bus Station,Coffee Shop,Italian Restaurant,Bakery,Churrascaria,Seafood Restaurant,Building,Café
5,Lembah Pantai,3.1252,101.6683,5,Food Court,Asian Restaurant,Chinese Restaurant,Chinese Breakfast Place,Fried Chicken Joint,Fast Food Restaurant,Dim Sum Restaurant,Dessert Shop,Coffee Shop,Gym
6,Segambut,3.1917,101.6734,4,Chinese Restaurant,Coffee Shop,Noodle House,Asian Restaurant,Bakery,Fast Food Restaurant,Seafood Restaurant,Soccer Stadium,Soup Place,Food Court
7,Seputeh,3.115,101.6797,1,Hotel,Japanese Restaurant,Massage Studio,Multiplex,Chinese Restaurant,Salad Place,Indian Restaurant,Noodle House,Recreation Center,Clothing Store
8,Setiawangsa,3.183,101.7462,8,Bus Station,High School,Chinese Restaurant,Basketball Court,Market,Wings Joint,Fishing Spot,Fast Food Restaurant,Falafel Restaurant,Electronics Store
9,Titiwangsa,3.1774,101.7077,1,Basketball Court,Bus Station,Dessert Shop,Café,Bus Line,Wings Joint,Flea Market,Fishing Spot,Fast Food Restaurant,Falafel Restaurant


In [151]:
sg_popvenue = sg_venues.groupby('Venue Category')['Neighborhood'].count().sort_values(ascending=False)[:30].to_frame(name='Count').reset_index()

In [154]:
sg_popvenue.to_csv('sg_popvenue.csv')

In [None]:
# Check top 30 most frequently occuring venue type
kl_venues.groupby('Venue Category')['Neighborhood'].count().sort_values(ascending=False)[:30]

In [155]:
kl_popvenue.to_csv('kl_popvenue.csv')

# Results Part 2a: Singapore Cluster intro

In this section, we will be looking at the neighbourhoods clusters and attempt to interpret the results. 

In [128]:
# Looks like Japanese Restaurant X Coffeeshop X Shopping mall seem to be the attributes

sg_merged.loc[sg_merged['Cluster Labels'] == 0, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
11,Jurong East,Coffee Shop,Japanese Restaurant,Chinese Restaurant,Department Store,Bubble Tea Shop,Café,Clothing Store,Shopping Mall,Hotpot Restaurant,Seafood Restaurant,0
21,Tampines,Café,Bakery,Sushi Restaurant,Shopping Mall,Pharmacy,Chinese Restaurant,Clothing Store,Coffee Shop,Supermarket,Gym,0
24,Woodlands,Japanese Restaurant,Coffee Shop,Café,Shopping Mall,Electronics Store,Fast Food Restaurant,Clothing Store,Asian Restaurant,Chinese Restaurant,Indian Restaurant,0


In [129]:
# Bus X School X Chinese Restaurant are the main attributes

sg_merged.loc[sg_merged['Cluster Labels'] == 1, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
16,Punggol,Bus Station,High School,Chinese Restaurant,Basketball Court,Market,Wings Joint,Fishing Spot,Fast Food Restaurant,Falafel Restaurant,Electronics Store,1


In [130]:
#Park X Food Court X Misc. Shop are the main attributes

sg_merged.loc[sg_merged['Cluster Labels'] == 2, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
5,Bukit Panjang,Park,Food Court,Miscellaneous Shop,Market,Noodle House,Fishing Spot,Fast Food Restaurant,Falafel Restaurant,Electronics Store,Eastern European Restaurant,2


In [131]:
#Food Court X Coffee Shop X Fast Food Restaurant are the main attributes
#This cluster contains the 'heartland' of Singapore, with several housing estates. 

sg_merged.loc[sg_merged['Cluster Labels'] == 3, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Ang Mo Kio,Food Court,Coffee Shop,Fast Food Restaurant,Dessert Shop,Japanese Restaurant,Sandwich Place,Bubble Tea Shop,Supermarket,Convenience Store,Shopping Mall,3
1,Bedok,Chinese Restaurant,Coffee Shop,Japanese Restaurant,Food Court,Thai Restaurant,Sandwich Place,Supermarket,Café,Fast Food Restaurant,Asian Restaurant,3
2,Bishan,Food Court,Coffee Shop,Bubble Tea Shop,Café,Japanese Restaurant,Supermarket,Chinese Restaurant,Ice Cream Shop,Dumpling Restaurant,Eastern European Restaurant,3
7,Choa Chu Kang,Coffee Shop,Food Court,Bubble Tea Shop,Furniture / Home Store,Bus Line,Sandwich Place,Shop & Service,Café,Fast Food Restaurant,Pet Store,3
8,Clementi,Food Court,Asian Restaurant,Chinese Restaurant,Chinese Breakfast Place,Fried Chicken Joint,Fast Food Restaurant,Dim Sum Restaurant,Dessert Shop,Coffee Shop,Gym,3
10,Hougang,Food Court,Coffee Shop,Noodle House,Food,Stadium,Pool,Park,Café,Sandwich Place,Bus Line,3
12,Jurong West,Asian Restaurant,Japanese Restaurant,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Dessert Shop,Café,Pizza Place,Bubble Tea Shop,Shopping Mall,3
15,Pasir Ris,Fast Food Restaurant,Food Court,Bus Station,Coffee Shop,Sandwich Place,Asian Restaurant,Karaoke Bar,Japanese Restaurant,Snack Place,Italian Restaurant,3
19,Sengkang,Fast Food Restaurant,Coffee Shop,Asian Restaurant,Food Court,Supermarket,Shopping Mall,Sculpture Garden,Sandwich Place,Bus Line,Bus Station,3
25,Yishun,Chinese Restaurant,Fried Chicken Joint,Coffee Shop,Food Court,Grocery Store,Hainan Restaurant,Dessert Shop,Fast Food Restaurant,Supermarket,Bubble Tea Shop,3


In [132]:
#Basketball Court X Bus Station X Dessert Shop are the main attributes

sg_merged.loc[sg_merged['Cluster Labels'] == 4, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
20,Serangoon,Basketball Court,Bus Station,Dessert Shop,Café,Bus Line,Wings Joint,Flea Market,Fishing Spot,Fast Food Restaurant,Falafel Restaurant,4


In [133]:
#Although Chinese Restaurant, Fast Food Restaurant and Food COurt are the main attributes, this differs from 
#Cluster 3 in the subsequent areas


sg_merged.loc[sg_merged['Cluster Labels'] == 5, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
4,Bukit Merah,Chinese Restaurant,Fast Food Restaurant,Food Court,Coffee Shop,Bistro,Malay Restaurant,Café,Sandwich Place,Noodle House,Residential Building (Apartment / Condo),5


In [134]:
#Hotels X Japanese Restaurant are the main drivers

sg_merged.loc[sg_merged['Cluster Labels'] == 6, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
14,Marine Parade,Hotel,Japanese Restaurant,Massage Studio,Multiplex,Chinese Restaurant,Salad Place,Indian Restaurant,Noodle House,Recreation Center,Clothing Store,6
22,Tanjong Pagar,Japanese Restaurant,Coffee Shop,Hotel,Bakery,Gym / Fitness Center,Cocktail Bar,Mexican Restaurant,Ramen Restaurant,Yoga Studio,Italian Restaurant,6


In [135]:
#Chinese Restauran X Noodle House X are the main drivers

sg_merged.loc[sg_merged['Cluster Labels'] == 7, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
9,Geylang,Chinese Restaurant,Noodle House,Vegetarian / Vegan Restaurant,Food Court,Asian Restaurant,Grocery Store,Dim Sum Restaurant,Steakhouse,Dessert Shop,Pool,7
13,Kallang/Whampoa,Chinese Restaurant,Coffee Shop,Noodle House,Asian Restaurant,Bakery,Fast Food Restaurant,Seafood Restaurant,Soccer Stadium,Soup Place,Food Court,7
17,Queenstown,Chinese Restaurant,Asian Restaurant,Noodle House,Stadium,Supermarket,Cosmetics Shop,Fried Chicken Joint,Malay Restaurant,Market,Basketball Court,7
23,Toa Payoh,Noodle House,Chinese Restaurant,Coffee Shop,Food Court,Asian Restaurant,Steakhouse,Bakery,Bus Station,Café,Snack Place,7


In [136]:
#Food Court X Indian Restaurant X Bus station are the main drivers

sg_merged.loc[sg_merged['Cluster Labels'] == 8, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
6,Bukit Timah,Food Court,Indian Restaurant,Bus Station,Coffee Shop,Italian Restaurant,Bakery,Churrascaria,Seafood Restaurant,Building,Café,8


In [137]:
#Coffee Shop seems to be the main attribute. 

sg_merged.loc[sg_merged['Cluster Labels'] == 9, sg_merged.columns[[0] + list(range(3, sg_merged.shape[1]))]]

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
3,Bukit Batok,Coffee Shop,Chinese Restaurant,Bakery,Food Court,Fast Food Restaurant,Malay Restaurant,Plaza,Pool,Department Store,Park,9
18,Sembawang,Coffee Shop,Fast Food Restaurant,Japanese Restaurant,Bakery,Shopping Mall,Buffet,Malay Restaurant,Pharmacy,Bistro,Basketball Court,9


## Results Part 2a: Singapore Clusters

The main way the algorithm has clustered is by the most common category of venues in each cluster. The main way the clusters have differed is by the difference in food shops. The biggest cluster is Cluster 3, which contains several housing estates, and other amenities. 



## Results Part 2b: KL Clusters intro

In this section, we will be looking at the neighbourhoods clusters and attempt to interpret the results. 

In [140]:
#Food Court X Restaurant (Chinese/Indian) X Bus station are the main attributes

kl_merged.loc[kl_merged['Cluster Labels'] == 0, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Kepong,0,Food Court,Indian Restaurant,Bus Station,Coffee Shop,Italian Restaurant,Bakery,Churrascaria,Seafood Restaurant,Building,Café
12,Petaling Jaya,0,Chinese Restaurant,Noodle House,Vegetarian / Vegan Restaurant,Food Court,Asian Restaurant,Grocery Store,Dim Sum Restaurant,Steakhouse,Dessert Shop,Pool
16,Puchong,0,Coffee Shop,Japanese Restaurant,Chinese Restaurant,Department Store,Bubble Tea Shop,Café,Clothing Store,Shopping Mall,Hotpot Restaurant,Seafood Restaurant
21,Seri Kembangan,0,Fast Food Restaurant,Food Court,Bus Station,Coffee Shop,Sandwich Place,Asian Restaurant,Karaoke Bar,Japanese Restaurant,Snack Place,Italian Restaurant


In [141]:
#Food Court X Coffee shop are the main attributes

kl_merged.loc[kl_merged['Cluster Labels'] == 1, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Batu,1,Food Court,Coffee Shop,Bubble Tea Shop,Café,Japanese Restaurant,Supermarket,Chinese Restaurant,Ice Cream Shop,Dumpling Restaurant,Eastern European Restaurant
2,Bukit Bintang,1,Coffee Shop,Chinese Restaurant,Bakery,Food Court,Fast Food Restaurant,Malay Restaurant,Plaza,Pool,Department Store,Park
7,Seputeh,1,Hotel,Japanese Restaurant,Massage Studio,Multiplex,Chinese Restaurant,Salad Place,Indian Restaurant,Noodle House,Recreation Center,Clothing Store
9,Titiwangsa,1,Basketball Court,Bus Station,Dessert Shop,Café,Bus Line,Wings Joint,Flea Market,Fishing Spot,Fast Food Restaurant,Falafel Restaurant
10,Wangsa Maju,1,Café,Bakery,Sushi Restaurant,Shopping Mall,Pharmacy,Chinese Restaurant,Clothing Store,Coffee Shop,Supermarket,Gym


In [142]:
#Coffee Shop X Fast Food X Japanese Restaurant are the main attributes

kl_merged.loc[kl_merged['Cluster Labels'] == 2, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Subang Jaya,2,Coffee Shop,Fast Food Restaurant,Japanese Restaurant,Bakery,Shopping Mall,Buffet,Malay Restaurant,Pharmacy,Bistro,Basketball Court


In [143]:
#Food Court X Coffee Shop X Noodle House are the main attributes

kl_merged.loc[kl_merged['Cluster Labels'] == 3, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Port Klang,3,Food Court,Coffee Shop,Noodle House,Food,Stadium,Pool,Park,Café,Sandwich Place,Bus Line


In [144]:
#Chinese Restaurant X Coffee Shop X Noodle House are the main attributes

kl_merged.loc[kl_merged['Cluster Labels'] == 4, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Segambut,4,Chinese Restaurant,Coffee Shop,Noodle House,Asian Restaurant,Bakery,Fast Food Restaurant,Seafood Restaurant,Soccer Stadium,Soup Place,Food Court


In [146]:
#Food Court X Restaurant X Coffee Shop are the main attributes 

kl_merged.loc[kl_merged['Cluster Labels'] == 5, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Lembah Pantai,5,Food Court,Asian Restaurant,Chinese Restaurant,Chinese Breakfast Place,Fried Chicken Joint,Fast Food Restaurant,Dim Sum Restaurant,Dessert Shop,Coffee Shop,Gym
15,Klang,5,Coffee Shop,Food Court,Bubble Tea Shop,Furniture / Home Store,Bus Line,Sandwich Place,Shop & Service,Café,Fast Food Restaurant,Pet Store
19,Ampang Jaya,5,Food Court,Coffee Shop,Fast Food Restaurant,Dessert Shop,Japanese Restaurant,Sandwich Place,Bubble Tea Shop,Supermarket,Convenience Store,Shopping Mall


In [147]:
#Chinese Restaurant X Fast Food Restaurant X Food Court are the main attributes

kl_merged.loc[kl_merged['Cluster Labels'] == 6, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Cheras,6,Chinese Restaurant,Fast Food Restaurant,Food Court,Coffee Shop,Bistro,Malay Restaurant,Café,Sandwich Place,Noodle House,Residential Building (Apartment / Condo)


In [148]:
# Chinese/Fast Food Restaurant X Coffee Shop X Resaturant are the main attributes

kl_merged.loc[kl_merged['Cluster Labels'] == 7, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bandar Tun Razak,7,Chinese Restaurant,Coffee Shop,Japanese Restaurant,Food Court,Thai Restaurant,Sandwich Place,Supermarket,Café,Fast Food Restaurant,Asian Restaurant
14,Kajang,7,Park,Food Court,Miscellaneous Shop,Market,Noodle House,Fishing Spot,Fast Food Restaurant,Falafel Restaurant,Electronics Store,Eastern European Restaurant
18,Sungai Buloh,7,Fast Food Restaurant,Coffee Shop,Asian Restaurant,Food Court,Supermarket,Shopping Mall,Sculpture Garden,Sandwich Place,Bus Line,Bus Station
20,Shah Alam,7,Chinese Restaurant,Asian Restaurant,Noodle House,Stadium,Supermarket,Cosmetics Shop,Fried Chicken Joint,Malay Restaurant,Market,Basketball Court


In [149]:
#Bus Station X High School X Chinese Restaurant are the main attributes

kl_merged.loc[kl_merged['Cluster Labels'] == 8, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Setiawangsa,8,Bus Station,High School,Chinese Restaurant,Basketball Court,Market,Wings Joint,Fishing Spot,Fast Food Restaurant,Falafel Restaurant,Electronics Store


In [150]:
#Asian Restaurant X Japanese Restaurant X Fast Food Restaurant are the main attributes

kl_merged.loc[kl_merged['Cluster Labels'] == 9, kl_merged.columns[[0] + list(range(3, kl_merged.shape[1]))]]

Unnamed: 0,Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Putrajaya,9,Asian Restaurant,Japanese Restaurant,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Dessert Shop,Café,Pizza Place,Bubble Tea Shop,Shopping Mall


## Results 2b: KL Clusters

The KL clusters revolve around restaurants - Chinese/Japanese/Asian/Fast Food, and coffee shops. 

## Results 3: Comparing Venue Category Frequency Counts

We can zoom back and take a look at the overall popularity of venue categories in both cities. 

In [153]:
kl_popvenue = kl_venues.groupby('Venue Category')['Neighborhood'].count().sort_values(ascending=False)[:30].to_frame(name = 'Count').reset_index()
kl_popvenue

Unnamed: 0,Venue Category,Count
0,Chinese Restaurant,64
1,Malay Restaurant,39
2,Asian Restaurant,36
3,Indian Restaurant,18
4,Café,16
5,Hotel,15
6,Convenience Store,14
7,Coffee Shop,12
8,Shopping Mall,11
9,Restaurant,9


In [152]:
sg_popvenue

Unnamed: 0,Venue Category,Count
0,Coffee Shop,67
1,Chinese Restaurant,61
2,Food Court,53
3,Japanese Restaurant,41
4,Asian Restaurant,37
5,Fast Food Restaurant,32
6,Café,28
7,Noodle House,26
8,Bakery,26
9,Supermarket,22


## Results 3: Comparing Venue Category Frequency Counts

While both cities have similar constitutes in their venue category frequency counts, they differ significantly in the overall order. In both cities, restaurants and food outlets occupy the top spots, in Singapore supermarkets make an appearance, highlighting the high residential density there. 

## Discussion

Here are the main observations from this clustering analysis:
1. KL has 582 total venues from 163 categories. Singapore's figure was 908 venues from 149 categories. This alone would already be interesting.

2. Singapore's clusters are dominated by housing estates, with possibly Marine Parade and Tanjong Pagar as more unusual neighbourhoods.

3. KL's clusters are more heterogenous, with more obviously different neighbourhoods. 

4. Singapore's common places has slightly higher number of food outlets, edging out Malaysia - by about 22 to 19. This suggests that the food landscape in dense Singapore might be more competitive than KL. This is what this dataset suggests, and refinements of this finding should be undertaken. 

5. A visitor that likes food places should consider Singapore owing to the higher placements of food places, although KL could be more interesting as a location with more distinct clusters. 

6. A potential restauranteur can consider KL to avoid competition, although there is a trade off when it comes to density. KL's sprawling nature means more traveling time for consumers when compared to Singapore. 




## Conclusion

This concludes the report for the IBM Data Science Professional Certificate - Applied Science Capstone module. 

I have formulated objectives and utilised data sources relevant for the project - via Geopy, and from Google's API. I have used the API calls from Foursquare's Places API, and collected venue names, and venue categories, and processed them for analysis. 

I have made use of k-means clustering, an unsupervised machine learning classification algorithm to classify the neighbourhoods. This approach allows for an initial exploration into the data collected that can be further refined through subsequent investigation ("Now we know what to look out for"). 

