# Battle of The Neighborhoods in Singapore

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. [Introduction and Problem Statement](#introduction)

2. [Description of Data and How to use it](#datadescription)
    
3. [Methodology](#method)

4. [Results](#results)

5. [Discussion](#discussion)
    
6. [Conclusion](#conclusion)
    
</font>
</div>

## 1. Introduction and Problem Statement  <a name="introduction"></a>

Singapore is a small country with an area of 725.7 sq km and a population of 5.6 million. Despite its small size, Singapore has a diversity of languages, religions, and cultures. It does not fit the traditional description of a nation; it is a society in transition, given that Singaporeans do not all speak the same language, share the same religion, or have the same customs. This diversity has given rise to neighborhoods that can be distinguished from each other by cuisine, culture, nationality, religion and many other features.

I want to explore what the distribution of venues tells us about each neighborhood.

I am an Indian national residing in Singapore, and I have stayed in places such as Clementi and Bukit Panjang.

1. Are the locations I have stayed at similar to each other?
2. I am looking to move to the east of the island because of a job change. Which areas there would match my preferences?


## 2. Description of Data and How to use it <a name="datadescription"></a>

#### Wikipedia:
The Wikipedia article on Singapore's postal codes shows how the country is divided into postal districts and locations.  
https://en.wikipedia.org/wiki/Postal_codes_in_Singapore

We can scrape this Wikipedia page for the postal code table and the locations in Singapore.

#### Geopy library Nominatim API
The search API looks up a location from a textual description.
For each location found on Wikipedia, we obtain geographical coordinates through the geopy library, which queries Nominatim (https://nominatim.openstreetmap.org/).

#### Foursquare API
This API offers access to Foursquare’s global database of venue data and user content.
We construct a URL to send a request to the Foursquare API and explore the venues around each location.  
https://foursquare.com/



## 3. Methodology <a name="method"></a>
We approach this problem with data as follows:

- Collect the Singapore location data from https://en.wikipedia.org/wiki/Postal_codes_in_Singapore
- Use the Geopy library and the Nominatim API to determine the coordinates of each location
- Use the Foursquare API to find the venues in each neighborhood
- Use venue category and frequency to rank the venues for each location
- Visualize the neighborhoods using the folium library
- Cluster the neighborhoods using the k-means clustering algorithm
- Draw conclusions based on the clusters and venue data

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [254]:
from bs4 import BeautifulSoup as bs
from urllib.request import urlopen
import pandas as pd
import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library
import geocoder
print('Libraries imported.')

Libraries imported.


Singapore is divided into multiple postal districts, and each postal district contains postal sectors.
Each postal sector has locations associated with it.
Here we extract the list of locations from the Wikipedia page.

In [255]:
URL = "https://en.wikipedia.org/wiki/Postal_codes_in_Singapore"
soup = bs(urlopen(URL), 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
#print(soup.prettify())
My_table = soup.find('table',{'class':'wikitable'})

I also add Latitude and Longitude columns, which we will fill in later. Right now they hold None values.

In [256]:

column_names = ['Postal district','Postal sector','Location','Latitude','Longitude']
df = pd.DataFrame(columns = column_names)

output_rows = []
for table_row in My_table.findAll('tr'):    
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        output_row.append(column.get_text(strip=True))
    if len(output_row)==3:    
        output_row.append(None)
        output_row.append(None)
        df.loc[len(df)]=output_row

print(df.shape)
df

(28, 5)


| | Postal district | Postal sector | Location | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 1 | 01, 02, 03, 04, 05, 06 | Raffles Place, Cecil, Marina, People's Park | None | None |
| 1 | 2 | 07, 08 | Anson, Tanjong Pagar | None | None |
| 2 | 3 | 14, 15, 16 | Bukit Merah,Queenstown,Tiong Bahru | None | None |
| 3 | 4 | 09, 10 | Telok Blangah, Harbourfront | None | None |
| 4 | 5 | 11, 12, 13 | Pasir Panjang, Hong Leong Garden, Clementi New... | None | None |
| 5 | 6 | 17 | High Street, Beach Road (part) | None | None |
| 6 | 7 | 18, 19 | Middle Road, Golden Mile | None | None |
| 7 | 8 | 20, 21 | Little India,Farrer Park,Jalan Besar,Lavender | None | None |
| 8 | 9 | 22, 23 | Orchard, Cairnhill, River Valley | None | None |
| 9 | 10 | 24, 25, 26, 27 | Ardmore, Bukit Timah, Holland Road, Tanglin | None | None |
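The row-filtering logic above (keep only rows with exactly three `td` cells, which skips the header row) can be tried offline on a small HTML snippet. The sketch below swaps BeautifulSoup for the standard library's `html.parser` so that it runs without a network connection or extra packages; `TableRowParser` and `SAMPLE_HTML` are hypothetical names introduced for illustration, and the sample rows are copied from the table above.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical miniature of the Wikipedia table (header row uses <th>, data rows use <td>)
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Postal district</th><th>Postal sector</th><th>Location</th></tr>
  <tr><td>2</td><td>07, 08</td><td>Anson, Tanjong Pagar</td></tr>
  <tr><td>6</td><td>17</td><td>High Street, Beach Road (part)</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(SAMPLE_HTML)

# Same filter as the notebook: only rows with exactly three data cells are kept
data_rows = [row for row in parser.rows if len(row) == 3]
print(data_rows)
```

The header row produces an empty list of `td` cells, so the `len(row) == 3` check drops it, just as in the notebook cell.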


Explode the Location column: split each comma-separated Location string so that every location in Singapore gets its own row. <br> The Latitude and Longitude columns are still empty.

In [257]:
# split the comma-separated Location strings into one row per location
df_sg =df.set_index(df.columns.drop('Location').tolist()).Location.str.split(',', expand=True).stack().reset_index().rename(columns={0:'Location'}).loc[:, df.columns]
df_sg = df_sg[['Location','Latitude','Longitude']]
df_sg['Location']=df_sg['Location'].str.strip()
df_sg

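| | Location | Latitude | Longitude |
|---|---|---|---|
| 0 | Raffles Place | None | None |
| 1 | Cecil | None | None |
| 2 | Marina | None | None |
| 3 | People's Park | None | None |
| 4 | Anson | None | None |
| ... | ... | ... | ... |
| 70 | Upper Thomson | None | None |
| 71 | Springleaf | None | None |
| 72 | Yishun | None | None |
| 73 | Sembawang | None | None |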
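The same split-into-rows step can also be written with `str.split` followed by `DataFrame.explode` (available in pandas 0.25 and later), which some readers may find easier to follow than the `set_index`/`stack` chain. This is a minimal sketch on a two-row toy frame built from the first rows of the table above; `toy` and `exploded` are hypothetical names.

```python
import pandas as pd

# Toy frame mirroring the first two rows of the scraped table
toy = pd.DataFrame({
    "Postal district": ["1", "2"],
    "Location": ["Raffles Place, Cecil, Marina, People's Park", "Anson, Tanjong Pagar"],
})

exploded = (
    toy.assign(Location=toy["Location"].str.split(","))  # string -> list of strings
       .explode("Location")                              # one row per list element
       .reset_index(drop=True)
)
exploded["Location"] = exploded["Location"].str.strip()  # remove the space left after each comma

print(exploded)
```

Each exploded row keeps the other columns of its parent row, so the postal district stays attached to every location.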


#### Use geopy library to get the latitude and longitude values of all locations in Singapore 

In [258]:
geolocator = Nominatim(user_agent="sg1_explorer")

for index, row in df_sg.iterrows():
    address=row['Location']
    location = geolocator.geocode('{} ,Singapore'.format(address))
    if(location is not None):
        df_sg.iloc[index,df_sg.columns.get_loc('Latitude')]=location.latitude
        df_sg.iloc[index,df_sg.columns.get_loc('Longitude')]=location.longitude
        print('The geographical coordinates of {} are {}, {}.'.format(address,location.latitude, location.longitude))
        #df_sg.to_csv('sg lat long.csv',index=False)

The geographical coordinates of Raffles Place are 1.2844077, 103.85139.
The geographical coordinates of Cecil are 1.2826449, 103.8507869.
The geographical coordinates of Marina are 1.2904753, 103.8520359.
The geographical coordinates of People's Park are 1.2858105, 103.8441598.
The geographical coordinates of Anson are 1.2758152, 103.8464915.
The geographical coordinates of Tanjong Pagar are 1.2765707, 103.845848.
The geographical coordinates of Bukit Merah are 1.2806275, 103.8305915.
The geographical coordinates of Queenstown are 1.2946235, 103.8060454.
The geographical coordinates of Tiong Bahru are 1.2861968, 103.8257646.
The geographical coordinates of Telok Blangah are 1.2705858, 103.8098632.
The geographical coordinates of Harbourfront are 1.2653951, 103.8224032.
The geographical coordinates of Pasir Panjang are 1.27620135, 103.7914758234202.
The geographical coordinates of Clementi New Town are 1.3140256, 103.7624098.
The geographical coordinates of High Street are 1.2893011, 103.8511455.
...
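The public Nominatim service asks clients to stay at or below one request per second, and some queries return no match. The sketch below shows one way to pace the lookups and keep track of the misses. It takes the geocoding function as an argument so that it can be tried without network access; `geocode_all` and `fake_geocode` are hypothetical helpers, not part of geopy. With geopy itself, a wrapper such as `lambda q: (lambda loc: None if loc is None else (loc.latitude, loc.longitude))(geolocator.geocode(q))` could be passed in, or geopy's own `RateLimiter` from `geopy.extra.rate_limiter` could be used instead.

```python
import time


def geocode_all(names, geocode, min_delay_seconds=1.0, suffix=", Singapore"):
    """Look up each name, waiting at least min_delay_seconds between calls.

    `geocode` is any callable that takes a query string and returns a
    (latitude, longitude) tuple, or None when nothing is found.
    """
    results = {}
    last_call = None
    for name in names:
        if last_call is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results[name] = geocode(name + suffix)
    return results


def fake_geocode(query):
    """Stand-in for a real geocoder; knows a single address from the output above."""
    known = {"Raffles Place, Singapore": (1.2844077, 103.85139)}
    return known.get(query)


# min_delay_seconds=0 only because the fake geocoder makes no network calls
coords = geocode_all(["Raffles Place", "Nowhere Lane"], fake_geocode, min_delay_seconds=0)
missing = [name for name, value in coords.items() if value is None]
print(coords)
print("Not found:", missing)
```

Collecting the misses in one place makes it easier to decide whether to retry them with a different spelling or drop them before mapping.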

In [259]:
#write to csv for future use
#df_sg.to_csv('sg lat long.csv',index=False)

The dataframe is now populated with the extracted coordinates. We also look up the coordinates of Singapore itself, which we will use to center the maps.

In [260]:
location = geolocator.geocode('Singapore')
latitude=location.latitude
longitude = location.longitude
df_sg

| | Location | Latitude | Longitude |
|---|---|---|---|
| 0 | Raffles Place | 1.284408 | 103.851390 |
| 1 | Cecil | 1.282645 | 103.850787 |
| 2 | Marina | 1.290475 | 103.852036 |
| 3 | People's Park | 1.285810 | 103.844160 |
| 4 | Anson | 1.275815 | 103.846491 |
| ... | ... | ... | ... |
| 70 | Upper Thomson | 1.354498 | 103.832821 |
| 71 | Springleaf | 1.398315 | 103.818057 |
| 72 | Yishun | 1.429384 | 103.835028 |
| 73 | Sembawang | 1.449093 | 103.820055 |


#### Create a map of Singapore with neighborhood locations superimposed on top.

In [261]:
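# reload the saved coordinates (the file produced by the commented-out to_csv call above)
# and drop locations that could not be geocoded
df_sg=pd.read_csv('sg lat long.csv', delimiter = ',')
df_sg=df_sg.dropna(how='any')

# create map of Singapore using latitude and longitude values
map_sg = folium.Map(location=[latitude, longitude], zoom_start=12)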

# add markers to map
for lat, lng, location in zip(df_sg['Latitude'], df_sg['Longitude'], df_sg['Location']):
    label = '{}, Singapore'.format(location)
    label = folium.Popup(label, parse_html=True)
    if not pd.isna(lat):
        folium.CircleMarker(
            [lat, lng],
            radius=9,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(map_sg)  
    
map_sg

#### Define Foursquare Credentials and Version

In [262]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted; never publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20190605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET
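Credentials written into a notebook cell end up in every shared copy of the notebook. One common alternative is to read them from environment variables and print only a masked form. The sketch below assumes the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, which are a convention chosen here for illustration and not something Foursquare prescribes; `load_foursquare_credentials` and `mask` are hypothetical helpers.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Set {} before running the notebook".format(missing)) from None


def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demonstration with a stand-in mapping; in the notebook, call
# load_foursquare_credentials() with no argument to read os.environ.
demo_env = {"FOURSQUARE_CLIENT_ID": "ABCDEFGH", "FOURSQUARE_CLIENT_SECRET": "12345678"}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```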



### Explore Neighborhoods in Singapore
Let's create a function to find all nearby venues for each location in Singapore through Foursquare API

In [263]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
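        # create the API request URL
        # NOTE: the venue counts reported below never exceed 30 per location, which suggests
        # that the upper-case 'LIMIT' parameter is ignored by the API and its default applies;
        # the lower-case 'limit' is probably the intended parameter name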
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&LIMIT={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,100)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 
                  'Location Latitude', 
                  'Location Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
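The function above does two separable things: it builds the request URL and it pulls a few fields out of the JSON response. The sketch below isolates both steps so they can be checked without calling the API. It builds the query string with `urllib.parse.urlencode`, using the lower-case `limit` parameter (an assumption about the intended parameter name), and it tolerates venues with an empty `categories` list. `build_explore_url`, `extract_venues` and `sample_payload` are hypothetical; the payload only mirrors the keys the notebook code accesses, and its second venue is invented to exercise the empty-category case.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL; urlencode takes care of escaping the values."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-made payload shaped like the keys used in getNearbyVenues
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "CITY Hot Pot Shabu shabu",
               "location": {"lat": 1.284173, "lng": 103.851585},
               "categories": [{"name": "Hotpot Restaurant"}]}},
    {"venue": {"name": "Uncategorized Venue (made up)",
               "location": {"lat": 1.0, "lng": 103.0},
               "categories": []}},
]}]}}

url = build_explore_url("ID", "SECRET", "20190605", 1.2844077, 103.85139)
venues = extract_venues(sample_payload)
print(url)
print(venues)
```

Keeping the parsing in its own function also makes it easy to skip a location whose response has no `groups` instead of failing with an `IndexError`.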

Run the function to get the venues for each neighborhood/location.

In [264]:
sg_venues = getNearbyVenues(names=df_sg['Location'],
                                   latitudes=df_sg['Latitude'],
                                   longitudes=df_sg['Longitude']
                                  )
print(sg_venues.shape)

Raffles Place
Cecil
Marina
People's Park
Anson
Tanjong Pagar
Bukit Merah
Queenstown
Tiong Bahru
Telok Blangah
Harbourfront
Pasir Panjang
Clementi New Town
High Street
Middle Road
Golden Mile
Little India
Farrer Park
Jalan Besar
Lavender
Orchard
Cairnhill
River Valley
Ardmore
Bukit Timah
Holland Road
Tanglin
Watten Estate
Novena
Thomson
Balestier
Toa Payoh
Serangoon
Macpherson
Braddell
Geylang
Eunos
Katong
Joo Chiat
Amber Road
Bedok
Upper East Coast
Eastwood
Kew Drive
Loyang
Changi
Simei
Tampines
Pasir Ris
Serangoon Garden
Hougang
Punggol
Bishan
Ang Mo Kio
Upper Bukit Timah
Clementi Park
Ulu Pandan
Jurong
Tuas
Hillview
Dairy Farm
Bukit Panjang
Choa Chu Kang
Lim Chu Kang
Tengah
Kranji
Woodgrove
Woodlands
Upper Thomson
Springleaf
Yishun
Sembawang
Seletar
(1691, 7)


#### Output of the above function in a dataframe.
This is what the dataframe returned by the Foursquare API calls looks like.

In [265]:
print(sg_venues.shape)
sg_venues.head()

(1691, 7)


| | Location | Location Latitude | Location Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Raffles Place | 1.284408 | 103.85139 | CITY Hot Pot Shabu shabu | 1.284173 | 103.851585 | Hotpot Restaurant |
| 1 | Raffles Place | 1.284408 | 103.85139 | The Fullerton Bay Hotel | 1.283878 | 103.853314 | Hotel |
| 2 | Raffles Place | 1.284408 | 103.85139 | CULINARYON | 1.284876 | 103.850933 | Comfort Food Restaurant |
| 3 | Raffles Place | 1.284408 | 103.85139 | The Salad Shop | 1.285523 | 103.851177 | Salad Place |
| 4 | Raffles Place | 1.284408 | 103.85139 | Virgin Active | 1.284608 | 103.850815 | Gym / Fitness Center |


Count how many venues were returned for each neighborhood.

In [282]:
sg_bar= sg_venues.groupby('Location').count()
sg_bar=sg_bar[['Venue']]
sg_bar

| Location | Venue |
|---|---|
| Amber Road | 30 |
| Ang Mo Kio | 30 |
| Anson | 30 |
| Ardmore | 30 |
| Balestier | 30 |
| ... | ... |
| Upper Thomson | 30 |
| Watten Estate | 18 |
| Woodgrove | 13 |
| Woodlands | 30 |


One-hot encode the venue categories, then sum per location to count how many venues of each category each location has.

In [187]:
# one-hot encode the venue categories
sg_onehot = pd.get_dummies(sg_venues[['Venue Category']], prefix="", prefix_sep="")

# add the location column back to the dataframe
sg_onehot['Location'] = sg_venues['Location'] 

# sum the one-hot rows per location to get the venue count per category
sg_grouped = sg_onehot.groupby('Location').sum().reset_index()

sg_grouped.to_csv('sg_neighborhood.csv',index=False)
sg_grouped.head()

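| | Location | Accessories Store | Airport | Airport Lounge | American Restaurant | Arcade | Art Gallery | Art Museum | Arts & Crafts Store | Asian Restaurant | ... | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Waterfront | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Yoga Studio | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amber Road | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Ang Mo Kio | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Anson | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Ardmore | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Balestier | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |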


In [188]:
sg_grouped.shape

(71, 219)
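To make the one-hot step concrete, here is a small sketch on made-up data: five venues in two locations. It shows that `get_dummies` followed by `groupby(...).sum()` yields one count per location and category, the same numbers `pd.crosstab` would give. It also shows an optional normalization to per-location frequencies, which is not what the notebook does (the notebook clusters on raw counts) but can be useful when locations return different numbers of venues. All names and values in the block are hypothetical.

```python
import pandas as pd

# Made-up venues: location A has two cafes and a hotel, B has a hotel and a park
venues = pd.DataFrame({
    "Location": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Cafe", "Cafe", "Hotel", "Hotel", "Park"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
onehot["Location"] = venues["Location"]

# Summing the indicators per location gives venue counts per category
counts = onehot.groupby("Location").sum().astype(int)

# Equivalent counts in a single call
xtab = pd.crosstab(venues["Location"], venues["Venue Category"])

# Optional: turn counts into per-location frequencies (each row sums to 1)
freq = counts.div(counts.sum(axis=1), axis=0)

print(counts)
print(freq.round(2))
```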

### Which are the top 5 venue categories in each neighbourhood?

In [189]:
num_top_venues = 5

for hood in sg_grouped['Location']:
    print("----"+hood+"----")
    temp = sg_grouped[sg_grouped['Location'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Amber Road----
                 venue  freq
0                Hotel   4.0
1    Indian Restaurant   3.0
2  Japanese Restaurant   3.0
3                 Café   2.0
4                  Bar   2.0


----Ang Mo Kio----
                 venue  freq
0          Coffee Shop   4.0
1  Japanese Restaurant   2.0
2          Supermarket   2.0
3           Food Court   2.0
4           Hobby Shop   1.0


----Anson----
                  venue  freq
0   Japanese Restaurant   4.0
1           Coffee Shop   4.0
2                Bakery   3.0
3      Ramen Restaurant   2.0
4  Gym / Fitness Center   2.0


----Ardmore----
                 venue  freq
0                Hotel   7.0
1  Japanese Restaurant   4.0
2   Chinese Restaurant   2.0
3             Boutique   2.0
4    French Restaurant   1.0


----Balestier----
                  venue  freq
0    Chinese Restaurant   5.0
1                 Hotel   3.0
2                Bakery   2.0
3            Food Court   2.0
4  Fast Food Restaurant   2.0


----Bedok----
        

### Let's put that into a pandas dataframe
First, let's write a function to sort the venues in descending order.

In [190]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [289]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Location']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Location'] = sg_grouped['Location']

for ind in np.arange(sg_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sg_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | Location | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amber Road | Hotel | Indian Restaurant | Japanese Restaurant | Bakery | Café | Bar | Ice Cream Shop | Bistro | Sushi Restaurant | Lounge |
| 1 | Ang Mo Kio | Coffee Shop | Japanese Restaurant | Food Court | Supermarket | Shopping Mall | Dessert Shop | Bubble Tea Shop | Modern European Restaurant | Burger Joint | Snack Place |
| 2 | Anson | Coffee Shop | Japanese Restaurant | Bakery | Hotel | Gym / Fitness Center | Ramen Restaurant | Spanish Restaurant | Kebab Restaurant | Shopping Mall | Mexican Restaurant |
| 3 | Ardmore | Hotel | Japanese Restaurant | Chinese Restaurant | Boutique | Thai Restaurant | Club House | Café | French Restaurant | Burger Joint | Miscellaneous Shop |
| 4 | Balestier | Chinese Restaurant | Hotel | Food Court | Fast Food Restaurant | Bakery | Noodle House | Tea Room | Shopping Mall | Middle Eastern Restaurant | Dessert Shop |
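The column names above rely on a three-element suffix list plus an exception fallback, which is fine for ten columns but would produce labels such as "21th" if `num_top_venues` were raised. A small helper, sketched below, handles the general case; `ordinal` is a hypothetical function name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"   # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 10
columns = ["Location"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```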


### Cluster Neighborhoods
Run k-means to cluster the neighborhood into 5 clusters.

In [290]:
# set number of clusters
kclusters = 5

sg_grouped_clustering = sg_grouped.drop(columns='Location')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sg_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:100]



array([3, 1, 1, 3, 0, 1, 1, 0, 1, 1, 4, 4, 4, 4, 1, 4, 1, 4, 4, 1, 3, 0,
       4, 4, 4, 1, 4, 1, 3, 4, 3, 4, 4, 1, 4, 2, 4, 4, 4, 3, 1, 1, 4, 1,
       4, 1, 4, 4, 0, 4, 1, 1, 0, 1, 4, 1, 4, 1, 1, 4, 0, 0, 4, 1, 4, 1,
       1, 4, 4, 1, 1])

## 4. Results <a name="results"></a>

After running k-means clustering, the table below shows the cluster label added to each location in Singapore.

In [291]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sg_merged = df_sg

# join the cluster labels and top venues onto df_sg, which holds the latitude/longitude of each location
sg_merged = sg_merged.join(neighborhoods_venues_sorted.set_index('Location'), on='Location')

sg_merged.head() # check the last columns!

| | Location | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Raffles Place | 1.284408 | 103.85139 | 4.0 | Hotel | Gym / Fitness Center | Cocktail Bar | Japanese Restaurant | Salad Place | Modern European Restaurant | Building | Bridge | Massage Studio | Restaurant |
| 1 | Cecil | 1.282645 | 103.850787 | 4.0 | Salad Place | Hotel | Gym / Fitness Center | Cocktail Bar | Food Court | Café | Martial Arts Dojo | Beer Garden | Restaurant | Lounge |
| 2 | Marina | 1.290475 | 103.852036 | 4.0 | Concert Hall | Event Space | Performing Arts Venue | Business Service | Café | Restaurant | Club House | Cocktail Bar | Coffee Shop | Park |
| 3 | People's Park | 1.28581 | 103.84416 | 4.0 | Chinese Restaurant | Hostel | Spa | Hotel | BBQ Joint | Food Court | Seafood Restaurant | Soup Place | Flea Market | Noodle House |
| 4 | Anson | 1.275815 | 103.846491 | 1.0 | Coffee Shop | Japanese Restaurant | Bakery | Hotel | Gym / Fitness Center | Ramen Restaurant | Spanish Restaurant | Kebab Restaurant | Shopping Mall | Mexican Restaurant |


In [292]:
# we can write this output to csv file for future reference
sg_merged.to_csv('sg_neighborhood.csv',index=False)

### Finally, let's visualize the resulting clusters

In [293]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sg_merged['Latitude'], sg_merged['Longitude'],sg_merged['Location'], sg_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    if(pd.isna(cluster)==False):
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster)],
            fill=True,
            fill_color=rainbow[int(cluster)],
            fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster.

In [294]:
sg_merged.loc[sg_merged['Cluster Labels'] == 0]

| | Location | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 | Tiong Bahru | 1.286197 | 103.825765 | 0.0 | Chinese Restaurant | Sushi Restaurant | Food Court | Coffee Shop | Bakery | Korean Restaurant | Park | Sandwich Place | Supermarket | Fast Food Restaurant |
| 24 | River Valley | 1.308398 | 103.886149 | 0.0 | Noodle House | Seafood Restaurant | Chinese Restaurant | Food Court | Asian Restaurant | Dim Sum Restaurant | Supermarket | Bar | Korean Restaurant | BBQ Joint |
| 32 | Balestier | 1.326226 | 103.847315 | 0.0 | Chinese Restaurant | Hotel | Food Court | Fast Food Restaurant | Bakery | Noodle House | Tea Room | Shopping Mall | Middle Eastern Restaurant | Dessert Shop |
| 33 | Toa Payoh | 1.335391 | 103.849741 | 0.0 | Coffee Shop | Chinese Restaurant | Snack Place | Noodle House | Food Court | Café | Thai Restaurant | Fast Food Restaurant | Cosmetics Shop | Bubble Tea Shop |
| 36 | Braddell | 1.340458 | 103.846767 | 0.0 | Chinese Restaurant | Food Court | Noodle House | Bakery | Café | Seafood Restaurant | Asian Restaurant | Hakka Restaurant | Fast Food Restaurant | Ice Cream Shop |
| 37 | Geylang | 1.318186 | 103.887056 | 0.0 | Chinese Restaurant | Noodle House | Food Court | Vegetarian / Vegan Restaurant | Dim Sum Restaurant | Asian Restaurant | Grocery Store | Dessert Shop | Steakhouse | Cantonese Restaurant |
| 51 | Serangoon Garden | 1.362458 | 103.866013 | 0.0 | Bakery | Chinese Restaurant | Food Court | Noodle House | Sushi Restaurant | Supermarket | Korean Restaurant | Beer Garden | Mediterranean Restaurant | Ice Cream Shop |


In [295]:
sg_merged.loc[sg_merged['Cluster Labels'] == 1]

| | Location | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Anson | 1.275815 | 103.846491 | 1.0 | Coffee Shop | Japanese Restaurant | Bakery | Hotel | Gym / Fitness Center | Ramen Restaurant | Spanish Restaurant | Kebab Restaurant | Shopping Mall | Mexican Restaurant |
| 5 | Tanjong Pagar | 1.276571 | 103.845848 | 1.0 | Japanese Restaurant | Bakery | Coffee Shop | Ramen Restaurant | Hotel | Gym / Fitness Center | Salad Place | Café | Burrito Place | Sushi Restaurant |
| 6 | Bukit Merah | 1.280628 | 103.830591 | 1.0 | Asian Restaurant | Café | Coffee Shop | Noodle House | Chinese Restaurant | Bookstore | Ice Cream Shop | Bakery | Pizza Place | Department Store |
| 9 | Telok Blangah | 1.270586 | 103.809863 | 1.0 | Food Court | Coffee Shop | Hotel | Shopping Mall | Bus Station | Metro Station | Board Shop | Supermarket | Market | Chinese Restaurant |
| 21 | Lavender | 1.307372 | 103.862772 | 1.0 | Coffee Shop | Hotel | Food Court | BBQ Joint | Restaurant | Dessert Shop | Supermarket | French Restaurant | Fast Food Restaurant | Dim Sum Restaurant |
| 22 | Orchard | 1.305272 | 103.832876 | 1.0 | Hotel | Boutique | Coffee Shop | Cosmetics Shop | Bubble Tea Shop | Sushi Restaurant | Bakery | Chocolate Shop | Shopping Mall | Frozen Yogurt Shop |
| 30 | Novena | 1.320526 | 103.843881 | 1.0 | Café | Coffee Shop | Bakery | Hotel | Ramen Restaurant | Hotpot Restaurant | Japanese Restaurant | Juice Bar | Gym | German Restaurant |
| 34 | Serangoon | 1.349763 | 103.873721 | 1.0 | Clothing Store | Coffee Shop | Chinese Restaurant | Supermarket | Soup Place | Shopping Mall | Café | Scenic Lookout | Sandwich Place | Portuguese Restaurant |
| 38 | Eunos | 1.325406 | 103.90256 | 1.0 | Coffee Shop | Martial Arts Dojo | Convenience Store | Grocery Store | Diner | Japanese Restaurant | Fujian Restaurant | Bakery | Food Court | Food |
| 42 | Bedok | 1.323976 | 103.930216 | 1.0 | Chinese Restaurant | Coffee Shop | Food Court | Supermarket | Bakery | Frozen Yogurt Shop | Burger Joint | Burrito Place | French Restaurant | Soup Place |


In [296]:
sg_merged.loc[sg_merged['Cluster Labels'] == 2]

| | Location | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | Little India | 1.306648 | 103.849269 | 2.0 | Indian Restaurant | Hotel | Vegetarian / Vegan Restaurant | Motel | Hospital | General College & University | Playground | Coffee Shop | Bakery | Restaurant |


In [297]:
sg_merged.loc[sg_merged['Cluster Labels'] == 3]

| | Location | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | Middle Road | 1.300168 | 103.852145 | 3.0 | Hotel | Café | Art Gallery | Sushi Restaurant | Ice Cream Shop | Bakery | Mobile Phone Shop | Building | Middle Eastern Restaurant | Dessert Shop |
| 19 | Farrer Park | 1.312591 | 103.854229 | 3.0 | Indian Restaurant | Chinese Restaurant | Café | Hotel | Climbing Gym | Bakery | Sporting Goods Shop | Food Court | North Indian Restaurant | Restaurant |
| 20 | Jalan Besar | 1.303643 | 103.854192 | 3.0 | Vegetarian / Vegan Restaurant | Indian Restaurant | Hostel | Art Gallery | Chinese Restaurant | Café | Dessert Shop | Hotel | Shopping Mall | Clothing Store |
| 25 | Ardmore | 1.30876 | 103.829589 | 3.0 | Hotel | Japanese Restaurant | Chinese Restaurant | Boutique | Thai Restaurant | Club House | Café | French Restaurant | Burger Joint | Miscellaneous Shop |
| 39 | Katong | 1.305233 | 103.905052 | 3.0 | Hotel | Coffee Shop | Salad Place | Japanese Restaurant | Neighborhood | Steakhouse | Massage Studio | Café | Multiplex | Ice Cream Shop |
| 41 | Amber Road | 1.303121 | 103.900556 | 3.0 | Hotel | Indian Restaurant | Japanese Restaurant | Bakery | Café | Bar | Ice Cream Shop | Bistro | Sushi Restaurant | Lounge |


In [298]:
sg_merged.loc[sg_merged['Cluster Labels'] == 4]

| | Location | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Raffles Place | 1.284408 | 103.85139 | 4.0 | Hotel | Gym / Fitness Center | Cocktail Bar | Japanese Restaurant | Salad Place | Modern European Restaurant | Building | Bridge | Massage Studio | Restaurant |
| 1 | Cecil | 1.282645 | 103.850787 | 4.0 | Salad Place | Hotel | Gym / Fitness Center | Cocktail Bar | Food Court | Café | Martial Arts Dojo | Beer Garden | Restaurant | Lounge |
| 2 | Marina | 1.290475 | 103.852036 | 4.0 | Concert Hall | Event Space | Performing Arts Venue | Business Service | Café | Restaurant | Club House | Cocktail Bar | Coffee Shop | Park |
| 3 | People's Park | 1.28581 | 103.84416 | 4.0 | Chinese Restaurant | Hostel | Spa | Hotel | BBQ Joint | Food Court | Seafood Restaurant | Soup Place | Flea Market | Noodle House |
| 7 | Queenstown | 1.294624 | 103.806045 | 4.0 | Food Court | Noodle House | Chinese Restaurant | Seafood Restaurant | Pool | Café | Stadium | Italian Restaurant | Spa | Train Station |
| 10 | Harbourfront | 1.265395 | 103.822403 | 4.0 | Clothing Store | Shoe Store | Toy / Game Store | Multiplex | Boutique | Supermarket | German Restaurant | Swiss Restaurant | Sushi Restaurant | Chocolate Shop |
| 11 | Pasir Panjang | 1.276201 | 103.791476 | 4.0 | Chinese Restaurant | Asian Restaurant | Food Court | Gas Station | Indian Restaurant | Thai Restaurant | Office | Metro Station | Café | Seafood Restaurant |
| 13 | Clementi New Town | 1.314026 | 103.76241 | 4.0 | Food Court | Asian Restaurant | Fried Chicken Joint | Dessert Shop | Noodle House | Thai Restaurant | Stadium | Sandwich Place | Chinese Breakfast Place | Chinese Restaurant |
| 14 | High Street | 1.289301 | 103.851146 | 4.0 | Concert Hall | Cocktail Bar | Italian Restaurant | Coffee Shop | Monument / Landmark | French Restaurant | Business Service | Bridge | Seafood Restaurant | Cantonese Restaurant |
| 17 | Golden Mile | 1.302747 | 103.865186 | 4.0 | Thai Restaurant | Coffee Shop | Food Court | Vietnamese Restaurant | French Restaurant | Hotel | BBQ Joint | Performing Arts Venue | Park | Multiplex |


## 5. Discussion <a name="discussion"></a>

The resulting clusters answer our questions and also reveal some interesting patterns.
Some observations:

1. Bukit Panjang and Clementi, the places I have stayed in Singapore, are part of the same cluster, cluster 1 (aqua blue).
2. If I have to look for a place to stay after my job change, the favorable locations are Simei, Punggol, Tampines, Bedok and Pasir Ris.
3. Cluster 4 (red) is an interesting cluster. It is spread all across Singapore: it appears at the corners of the island as well as in the Central Business District. The likely reason is the presence of tourist attractions, which are natural attractions in the suburbs and museums in the central district.
4. Little India forms cluster 2 (green) on its own. It is a popular spot for the Indian community, with a plethora of shopping options and restaurants. It is unique in its own way, and the experience cannot be found anywhere else.
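
Cluster membership gives a yes/no answer to "are these two locations similar?". A graded answer can be obtained by comparing the venue-count vectors directly, for example with cosine similarity. The sketch below is an addition for illustration and was not part of the analysis above. It uses only the top-5 counts printed earlier for Amber Road, Ardmore and Anson, so the vectors are partial and the resulting numbers are indicative at best; `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two {category: count} dictionaries."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Top-5 venue counts as printed in the "top 5 venues" output (partial vectors)
amber_road = {"Hotel": 4, "Indian Restaurant": 3, "Japanese Restaurant": 3, "Café": 2, "Bar": 2}
ardmore = {"Hotel": 7, "Japanese Restaurant": 4, "Chinese Restaurant": 2, "Boutique": 2, "French Restaurant": 1}
anson = {"Japanese Restaurant": 4, "Coffee Shop": 4, "Bakery": 3, "Ramen Restaurant": 2, "Gym / Fitness Center": 2}

sim_same_cluster = cosine_similarity(amber_road, ardmore)   # both are in cluster 3
sim_other_cluster = cosine_similarity(amber_road, anson)    # Anson is in cluster 1
print(round(sim_same_cluster, 2), round(sim_other_cluster, 2))
```

On these partial vectors, Amber Road scores closer to Ardmore than to Anson, which agrees with the cluster assignments shown in the tables. The same comparison could be run on the full rows of `sg_grouped` for Clementi and Bukit Panjang.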



## 6. Conclusion <a name="conclusion"></a>

The methodology and analysis above give a practical approach to answering our initial questions.
The study also uncovers some interesting observations that can be used to answer other questions.

Combined with ratings of restaurants, tourist attractions or shopping experiences, this analysis could help pinpoint the exact venue of interest.<br>
For example: where can I find the coffee I like when I move to a different location?

Overall, this case study has been very useful for gaining knowledge of data wrangling, data engineering, machine learning, visualization and presentation.
