<h1>Location Clustering in Singapore Based on Residential Home Value and Nearby Venues</h1>
<h3>Data provided by Foursquare developer API and Gov.sg</h3>

<h2>The issue</h2>
<p>Suppose we own a franchise and want to launch a new restaurant concept in Singapore. We first need to identify <strong>areas that look promising for entering the market</strong> and for later expansion. By the end of the study we would like to know which areas have a relatively high density of other restaurants (a sign of footfall as a dining "area"), so that we have a good idea of where to launch.</p>

<h2>Data</h2>
<p>We would like to use machine learning to find areas where <strong>there are many other restaurants</strong>. As this is a relatively upmarket restaurant chain, we would also like to use <strong>household property values</strong> as a variable when considering the areas, since property value is a reasonable indicator of the affluence of households, and thus of the people, in an area.</p>

<h3>Getting the median resale price dataset from Gov.sg to obtain home property values</h3>

In [3]:
from zipfile import ZipFile
!wget -q -O 'hdb_sales.zip' https://data.gov.sg/dataset/44b852a9-e7f8-4381-b896-e7c809da0f9c/download
print('Data downloaded!')

Data downloaded!


In [58]:
#Create a ZipFile object for hdb_sales.zip
with ZipFile('hdb_sales.zip', 'r') as zipObj:
   # Extract all the contents of zip file in current directory
   zipObj.extractall()

In [59]:
import os
import pandas as pd
import numpy as np
os.rename('median-resale-prices-for-registered-applications-by-town-and-flat-type.csv', 'hdb_data.csv')
os.remove('metadata-median-resale-prices-for-registered-applications-by-town-and-flat-type.txt')
files = os.listdir(os.curdir)
files

['hdb_data.csv', 'hdb_sales.zip']

In [60]:
hdb_df = pd.read_csv('hdb_data.csv')
hdb_df.tail()

Unnamed: 0,quarter,town,flat_type,price
8107,2020-Q1,YISHUN,2-ROOM,-
8108,2020-Q1,YISHUN,3-ROOM,270000
8109,2020-Q1,YISHUN,4-ROOM,360000
8110,2020-Q1,YISHUN,5-ROOM,482500
8111,2020-Q1,YISHUN,EXEC,584000


In [61]:
#Slice the df so that we only have the latest data because that's all we need.
hdb_df = hdb_df.loc[hdb_df['quarter'] == "2020-Q1"]

#we are only using the price of 4-room flats as it is the most common type of property, and also to reduce complexity
hdb_df = hdb_df.loc[hdb_df['flat_type'] == "4-ROOM"]

#remove the towns that have no recorded 4-room price (shown as "-")
hdb_df = hdb_df[hdb_df['price'] != "-"].reset_index(drop=True)

hdb_df

Unnamed: 0,quarter,town,flat_type,price
0,2020-Q1,ANG MO KIO,4-ROOM,397500
1,2020-Q1,BEDOK,4-ROOM,387300
2,2020-Q1,BISHAN,4-ROOM,521500
3,2020-Q1,BUKIT BATOK,4-ROOM,350000
4,2020-Q1,BUKIT MERAH,4-ROOM,640000
5,2020-Q1,BUKIT PANJANG,4-ROOM,411000
6,2020-Q1,CHOA CHU KANG,4-ROOM,339000
7,2020-Q1,CLEMENTI,4-ROOM,644000
8,2020-Q1,GEYLANG,4-ROOM,450000
9,2020-Q1,HOUGANG,4-ROOM,400000


Now we have the dataframe of household prices and town names ready!
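
<p>Because missing prices are recorded as "-", pandas reads the <code>price</code> column as strings, which is why later cells cast it with <code>astype</code>. The following is a minimal sketch of converting the column up front, assuming the same column names as <code>hdb_df</code>. The sample rows are copied from the output above, and <code>clean_prices</code> is a hypothetical helper name, not part of the original notebook.</p>

```python
import pandas as pd

def clean_prices(df):
    """Return a copy with 'price' as integers, dropping rows whose price is missing ('-')."""
    out = df.copy()
    # errors="coerce" turns non-numeric entries such as "-" into NaN
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    out = out.dropna(subset=["price"]).reset_index(drop=True)
    out["price"] = out["price"].astype("int64")
    return out

# Sample rows copied from the hdb_df.tail() output
sample = pd.DataFrame({
    "town": ["YISHUN", "YISHUN", "YISHUN"],
    "flat_type": ["2-ROOM", "3-ROOM", "4-ROOM"],
    "price": ["-", "270000", "360000"],
})
cleaned = clean_prices(sample)
print(cleaned)
```

<p>With a numeric column in place, calls such as <code>mean()</code> work directly without a per-cell cast.</p>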

<h2>Installing dependencies for analysis and map visualisation</h2>

In [62]:
!pip install folium
import folium 
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)
!pip install geocoder
import geocoder
print("Execution completed")

Collecting folium
Collecting branca>=0.3.0 (from folium)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Execution completed


<h3>Using the geocoder library to get the latitude and longitude of the towns</h3>

In [71]:
latitude = []
longitude = []
for town in hdb_df['town']:
    g = geocoder.osm(town + ", Singapore")
    latitude.append(g.latlng[0])
    longitude.append(g.latlng[1])

hdb_df['Latitude'] = latitude
hdb_df['Longitude'] = longitude
hdb_df

Unnamed: 0,quarter,town,flat_type,price,Latitude,Longitude
0,2020-Q1,ANG MO KIO,4-ROOM,397500,1.370073,103.849516
1,2020-Q1,BEDOK,4-ROOM,387300,1.323976,103.930216
2,2020-Q1,BISHAN,4-ROOM,521500,1.350986,103.848255
3,2020-Q1,BUKIT BATOK,4-ROOM,350000,1.349057,103.749591
4,2020-Q1,BUKIT MERAH,4-ROOM,640000,1.270439,103.828318
5,2020-Q1,BUKIT PANJANG,4-ROOM,411000,1.378629,103.762136
6,2020-Q1,CHOA CHU KANG,4-ROOM,339000,1.384749,103.744534
7,2020-Q1,CLEMENTI,4-ROOM,644000,1.3151,103.765231
8,2020-Q1,GEYLANG,4-ROOM,450000,1.318186,103.887056
9,2020-Q1,HOUGANG,4-ROOM,400000,1.370682,103.892545
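
<p>The loop above indexes <code>g.latlng</code> directly, so a single failed lookup would stop the cell. The sketch below shows one way to collect failures instead. It assumes a failed lookup yields an empty result (<code>None</code> or an empty list); that is an assumption about the geocoder library, not something verified here. <code>geocode_towns</code> and the stand-in lookup are hypothetical names so the example runs offline.</p>

```python
def geocode_towns(towns, lookup):
    """Resolve each town to (lat, lng); collect towns the lookup could not resolve.

    `lookup` is any callable that takes a query string and returns a
    [lat, lng] pair, or an empty result (None or []) on failure.
    With the geocoder library it could be `lambda q: geocoder.osm(q).latlng`
    (an assumption, not run here).
    """
    coords, failed = {}, []
    for town in towns:
        latlng = lookup(town + ", Singapore")
        if not latlng:
            failed.append(town)
        else:
            coords[town] = (float(latlng[0]), float(latlng[1]))
    return coords, failed

# Stand-in lookup so the sketch runs offline; coordinates copied from the table above.
fake_results = {
    "BISHAN, Singapore": [1.350986, 103.848255],
    "BEDOK, Singapore": [1.323976, 103.930216],
}
coords, failed = geocode_towns(["BISHAN", "BEDOK", "NOWHERE"], fake_results.get)
print(coords)
print(failed)
```

<p>Towns listed in <code>failed</code> can then be retried or filled in by hand before the coordinates are attached to the dataframe.</p>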


<h2>Let's take a look at the towns on the Singapore map</h2>

In [88]:
SIN_lat = 1.3494661
SIN_lon = 103.8405051
map_singapore = folium.Map(location=[SIN_lat, SIN_lon], zoom_start=12)

for lat, lng, town in zip(hdb_df['Latitude'], hdb_df['Longitude'], hdb_df['town']):
    label = '{}'.format(town)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_singapore)  
    
map_singapore

<h3>Testing the latitude and longitude of Bishan Town on the Foursquare API:</h3>

In [72]:
# The code was removed by Watson Studio for sharing.

In [73]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [221]:
test_latitude = '1.350986'
test_longitude = '103.848255'
LIMIT = 100
radius = 1000 # search radius in metres
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    test_latitude, 
    test_longitude, 
    radius, 
    LIMIT)

In [77]:
results = requests.get(url).json()
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Tori-Q,Japanese Restaurant,1.350549,103.848659
1,Dian Xiao Er 店小二,Chinese Restaurant,1.350426,103.848988
2,Starbucks,Coffee Shop,1.349849,103.850415
3,Gymm Boxx XL,Gym,1.349909,103.850689
4,Bishan Cafeteria (Eating House),Food Court,1.350579,103.849336


<p>Now we are sure that the Foursquare API works.</p>
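
<p>The request URL above is assembled with a long format string. As an alternative sketch, the same query parameters (<code>client_id</code>, <code>client_secret</code>, <code>v</code>, <code>ll</code>, <code>radius</code>, <code>limit</code>) can be kept in a dictionary and encoded with the standard library. <code>build_explore_url</code> is a hypothetical helper, and the credentials and version string below are placeholders, since the real values were removed from the notebook.</p>

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=1000, limit=100):
    """Build the explore URL from a dict of query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes values, so the comma in "ll" becomes %2C
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# Placeholder credentials and version; coordinates are the Bishan test point.
url = build_explore_url("MY_ID", "MY_SECRET", "20200101", 1.350986, 103.848255)
print(url)
```

<p>Keeping the parameters in one dictionary makes it harder to misorder the positional arguments of a long <code>format</code> call.</p>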
<h2>Creating a function to iterate through all the towns</h2>

In [96]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL.
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Town', 
                  'Town Latitude', 
                  'Town Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    print("Completed")
    return nearby_venues

In [97]:
sin_venues = getNearbyVenues(names=hdb_df['town'],
                                   latitudes=hdb_df['Latitude'],
                                   longitudes=hdb_df['Longitude']
                                  )

ANG MO KIO
BEDOK
BISHAN
BUKIT BATOK
BUKIT MERAH
BUKIT PANJANG
CHOA CHU KANG
CLEMENTI
GEYLANG
HOUGANG
JURONG EAST
JURONG WEST
KALLANG/WHAMPOA
PASIR RIS
PUNGGOL
QUEENSTOWN
SEMBAWANG
SENGKANG
SERANGOON
TAMPINES
TOA PAYOH
WOODLANDS
YISHUN
Completed


In [149]:
sin_venues.head()

Unnamed: 0,Town,Town Latitude,Town Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ANG MO KIO,1.370073,103.849516,FairPrice Xtra,1.369279,103.848886,Supermarket
1,ANG MO KIO,1.370073,103.849516,Old Chang Kee,1.369094,103.848389,Snack Place
2,ANG MO KIO,1.370073,103.849516,Face Ban Mian 非板面 (Ang Mo Kio),1.372031,103.847504,Noodle House
3,ANG MO KIO,1.370073,103.849516,NTUC FairPrice,1.371507,103.847082,Supermarket
4,ANG MO KIO,1.370073,103.849516,MOS Burger,1.36917,103.847831,Burger Joint


In [150]:
sin_venues.shape

(1644, 7)

<h3>Now we have a dataframe of towns and their respective venues.</h3>
<p>In order for the analysis to work, we have to convert the venue categories into numerical values, and then calculate how often they appear in each town.</p>
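
<p>To make the idea concrete, here is a small sketch on made-up data (the towns "A" and "B" and their venues are hypothetical). Each venue category becomes a 0/1 column, and the per-town mean of a 0/1 column is the share of that town's venues falling in the category.</p>

```python
import pandas as pd

# Toy data (hypothetical): three venues in town A, two in town B
venues = pd.DataFrame({
    "Town": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Food Court", "Coffee Shop", "Food Court", "Gym", "Coffee Shop"],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Town", venues["Town"])

# The mean of a 0/1 column per town is that category's share of the town's venues
freq = onehot.groupby("Town").mean().reset_index()
print(freq)
```

<p>Each row of the result sums to 1, so towns with different venue counts become comparable.</p>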

In [152]:
# one hot encoding
sin_onehot = pd.get_dummies(sin_venues[['Venue Category']], prefix="", prefix_sep="")

# add town column back to dataframe
sin_onehot['Town'] = sin_venues['Town'] 

# move town column to the first column
fixed_columns = [sin_onehot.columns[-1]] + list(sin_onehot.columns[:-1])
sin_onehot = sin_onehot[fixed_columns]
sin_onehot.head()

Unnamed: 0,Town,ATM,Accessories Store,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,...,Track,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Wine Bar,Wings Joint
0,ANG MO KIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,ANG MO KIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,ANG MO KIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,ANG MO KIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,ANG MO KIO,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [153]:
sin_onehot.shape

(1644, 193)

In [154]:
sin_grouped = sin_onehot.groupby('Town').mean().reset_index()
sin_grouped

Unnamed: 0,Town,ATM,Accessories Store,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,...,Track,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Wine Bar,Wings Joint
0,ANG MO KIO,0.0,0.0,0.011364,0.0,0.0,0.0,0.034091,0.0,0.0,...,0.0,0.0,0.0,0.0,0.034091,0.0,0.0,0.0,0.0,0.0
1,BEDOK,0.0,0.010417,0.010417,0.0,0.0,0.0,0.03125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.010417
2,BISHAN,0.0,0.0,0.0,0.0,0.0,0.0,0.042254,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,BUKIT BATOK,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BUKIT MERAH,0.0,0.0,0.016129,0.0,0.016129,0.0,0.048387,0.016129,0.0,...,0.0,0.0,0.016129,0.0,0.016129,0.0,0.0,0.0,0.0,0.0
5,BUKIT PANJANG,0.0,0.0,0.019231,0.0,0.0,0.0,0.057692,0.0,0.0,...,0.0,0.0,0.0,0.0,0.019231,0.0,0.0,0.0,0.0,0.0
6,CHOA CHU KANG,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0
7,CLEMENTI,0.0,0.0,0.0,0.0,0.0,0.011111,0.044444,0.0,0.0,...,0.0,0.0,0.011111,0.0,0.0,0.011111,0.0,0.0,0.0,0.0
8,GEYLANG,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,...,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.0
9,HOUGANG,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.016667,0.0,...,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0


In [155]:
sin_grouped.shape

(23, 193)

<h3>Now we find the 5 most common venue categories in each town, which we will use to interpret the clusters.</h3>

In [177]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Town']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
town_venues_sorted = pd.DataFrame(columns=columns)
town_venues_sorted['Town'] = sin_grouped['Town']

for ind in np.arange(sin_grouped.shape[0]):
    town_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sin_grouped.iloc[ind, :], num_top_venues)

town_venues_sorted.head()

Unnamed: 0,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,ANG MO KIO,Food Court,Coffee Shop,Asian Restaurant,Japanese Restaurant,Chinese Restaurant
1,BEDOK,Coffee Shop,Chinese Restaurant,Food Court,Café,Supermarket
2,BISHAN,Food Court,Coffee Shop,Chinese Restaurant,Seafood Restaurant,Japanese Restaurant
3,BUKIT BATOK,Food Court,Coffee Shop,Fast Food Restaurant,Chinese Restaurant,Malay Restaurant
4,BUKIT MERAH,Coffee Shop,Asian Restaurant,Clothing Store,Food Court,Bus Stop


In [169]:
medianprice = []
for town in town_venues_sorted['Town']:
    index = hdb_df[hdb_df['town'] == town].index
    medianprice.append(hdb_df['price'][index].tolist()[0])

In [178]:
# set number of clusters
kclusters = 4

sin_grouped_clusters = sin_grouped.drop(columns='Town')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sin_grouped_clusters)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

# add clustering labels
town_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [179]:
town_venues_sorted.head()

Unnamed: 0,Cluster Labels,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,2,ANG MO KIO,Food Court,Coffee Shop,Asian Restaurant,Japanese Restaurant,Chinese Restaurant
1,0,BEDOK,Coffee Shop,Chinese Restaurant,Food Court,Café,Supermarket
2,3,BISHAN,Food Court,Coffee Shop,Chinese Restaurant,Seafood Restaurant,Japanese Restaurant
3,2,BUKIT BATOK,Food Court,Coffee Shop,Fast Food Restaurant,Chinese Restaurant,Malay Restaurant
4,0,BUKIT MERAH,Coffee Shop,Asian Restaurant,Clothing Store,Food Court,Bus Stop
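
<p>The value <code>kclusters = 4</code> is set by hand above. One common way to sanity-check such a choice is the elbow heuristic: fit k-means for several values of k and look for the point where the inertia (within-cluster sum of squares) stops dropping sharply. This is an additional check that the analysis above did not perform. The sketch runs on synthetic data standing in for <code>sin_grouped_clusters</code>, and <code>inertia_by_k</code> is a hypothetical helper.</p>

```python
import numpy as np
from sklearn.cluster import KMeans

def inertia_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: inertia}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit(features)
        scores[k] = float(model.inertia_)
    return scores

# Synthetic stand-in for the town-by-category frequency matrix:
# three tight groups of points, so the curve should flatten after k=3.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
features = np.vstack([c + 0.1 * rng.standard_normal((8, 2)) for c in centers])

scores = inertia_by_k(features, range(1, 6))
for k, inertia in scores.items():
    print(k, round(inertia, 3))
```

<p>On the real data, <code>features</code> would be replaced by <code>sin_grouped_clusters</code>. With only 23 towns the curve may not show a clear elbow, so the result is a guide rather than a rule.</p>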


In [180]:
sin_merged = hdb_df

sin_merged = pd.merge(sin_merged, town_venues_sorted, left_on='town', right_on='Town')

sin_merged.head()

Unnamed: 0,quarter,town,flat_type,price,Latitude,Longitude,Cluster Labels,Town,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,2020-Q1,ANG MO KIO,4-ROOM,397500,1.370073,103.849516,2,ANG MO KIO,Food Court,Coffee Shop,Asian Restaurant,Japanese Restaurant,Chinese Restaurant
1,2020-Q1,BEDOK,4-ROOM,387300,1.323976,103.930216,0,BEDOK,Coffee Shop,Chinese Restaurant,Food Court,Café,Supermarket
2,2020-Q1,BISHAN,4-ROOM,521500,1.350986,103.848255,3,BISHAN,Food Court,Coffee Shop,Chinese Restaurant,Seafood Restaurant,Japanese Restaurant
3,2020-Q1,BUKIT BATOK,4-ROOM,350000,1.349057,103.749591,2,BUKIT BATOK,Food Court,Coffee Shop,Fast Food Restaurant,Chinese Restaurant,Malay Restaurant
4,2020-Q1,BUKIT MERAH,4-ROOM,640000,1.270439,103.828318,0,BUKIT MERAH,Coffee Shop,Asian Restaurant,Clothing Store,Food Court,Bus Stop


In [181]:
sin_merged.drop(['Town', 'flat_type','quarter'], axis=1, inplace=True)
sin_merged.head()

Unnamed: 0,town,price,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,ANG MO KIO,397500,1.370073,103.849516,2,Food Court,Coffee Shop,Asian Restaurant,Japanese Restaurant,Chinese Restaurant
1,BEDOK,387300,1.323976,103.930216,0,Coffee Shop,Chinese Restaurant,Food Court,Café,Supermarket
2,BISHAN,521500,1.350986,103.848255,3,Food Court,Coffee Shop,Chinese Restaurant,Seafood Restaurant,Japanese Restaurant
3,BUKIT BATOK,350000,1.349057,103.749591,2,Food Court,Coffee Shop,Fast Food Restaurant,Chinese Restaurant,Malay Restaurant
4,BUKIT MERAH,640000,1.270439,103.828318,0,Coffee Shop,Asian Restaurant,Clothing Store,Food Court,Bus Stop


In [182]:
# create map
map_clusters = folium.Map(location=[SIN_lat, SIN_lon], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sin_merged['Latitude'], sin_merged['Longitude'], sin_merged['town'], sin_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [215]:
cl1 = sin_merged.loc[sin_merged['Cluster Labels'] == 0, sin_merged.columns[[1] + list(range(5,sin_merged.shape[1]))]]
cl1 = cl1.astype({'price': 'int32'})
print("The mean price is {}".format(cl1['price'].mean(axis=0)))
cl1

The mean price is 424572.7272727273


Unnamed: 0,price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,387300,Coffee Shop,Chinese Restaurant,Food Court,Café,Supermarket
4,640000,Coffee Shop,Asian Restaurant,Clothing Store,Food Court,Bus Stop
6,339000,Coffee Shop,Fast Food Restaurant,Food Court,Asian Restaurant,Gym
9,400000,Coffee Shop,Food Court,Chinese Restaurant,Fast Food Restaurant,Asian Restaurant
10,375000,Food Court,Coffee Shop,Japanese Restaurant,Chinese Restaurant,Café
12,500000,Coffee Shop,Chinese Restaurant,Convenience Store,Food Court,Bus Line
13,435500,Coffee Shop,Fast Food Restaurant,Food Court,Park,Supermarket
14,460000,Café,Fast Food Restaurant,Supermarket,Electronics Store,Japanese Restaurant
16,351000,Coffee Shop,Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Japanese Restaurant
19,422500,Coffee Shop,Café,Bakery,Supermarket,Bubble Tea Shop


In [217]:
cl2 = sin_merged.loc[sin_merged['Cluster Labels'] == 1, sin_merged.columns[[1] + list(range(5,sin_merged.shape[1]))]]
cl2 = cl2.astype({'price': 'int32'})
print("The mean price is {}".format(cl2['price'].mean(axis=0)))
cl2

The mean price is 418000.0


Unnamed: 0,price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,411000,Fast Food Restaurant,Café,Coffee Shop,Asian Restaurant,Supermarket
17,425000,Bus Station,Food Court,Fast Food Restaurant,Coffee Shop,Supermarket


In [218]:
cl3 = sin_merged.loc[sin_merged['Cluster Labels'] == 2, sin_merged.columns[[1] + list(range(5,sin_merged.shape[1]))]]
cl3 = cl3.astype({'price': 'int32'})
print("The mean price is {}".format(cl3['price'].mean(axis=0)))
cl3

The mean price is 362500.0


Unnamed: 0,price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,397500,Food Court,Coffee Shop,Asian Restaurant,Japanese Restaurant,Chinese Restaurant
3,350000,Food Court,Coffee Shop,Fast Food Restaurant,Chinese Restaurant,Malay Restaurant
21,340000,Food Court,Coffee Shop,Fast Food Restaurant,Café,Asian Restaurant


In [219]:
cl4 = sin_merged.loc[sin_merged['Cluster Labels'] == 3, sin_merged.columns[[1] + list(range(5,sin_merged.shape[1]))]]
cl4 = cl4.astype({'price': 'int32'})
print("The mean price is {}".format(cl4['price'].mean(axis=0)))
cl4

The mean price is 524500.0


Unnamed: 0,price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,521500,Food Court,Coffee Shop,Chinese Restaurant,Seafood Restaurant,Japanese Restaurant
7,644000,Food Court,Chinese Restaurant,Indian Restaurant,Asian Restaurant,Supermarket
8,450000,Chinese Restaurant,Asian Restaurant,Food Court,Vegetarian / Vegan Restaurant,Noodle House
11,360000,Japanese Restaurant,Asian Restaurant,Fast Food Restaurant,Chinese Restaurant,Coffee Shop
15,728000,Chinese Restaurant,Coffee Shop,Food Court,Café,Noodle House
18,429000,Asian Restaurant,Coffee Shop,Chinese Restaurant,Café,Bus Station
20,539000,Chinese Restaurant,Noodle House,Food Court,Asian Restaurant,Coffee Shop
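
<p>The four cells above repeat the same filter, cast and mean for each cluster. As a sketch, the per-cluster price summary can be computed in one step with <code>groupby</code>. The sample rows are copied from the tables for cluster labels 1 and 2 above; on the real data the same chain would be applied to <code>sin_merged</code>.</p>

```python
import pandas as pd

# Rows copied from the cluster tables above (cluster labels 1 and 2 only)
sample = pd.DataFrame({
    "Cluster Labels": [1, 1, 2, 2, 2],
    "price": ["411000", "425000", "397500", "350000", "340000"],
})

summary = (
    sample.astype({"price": "int64"})   # prices are stored as strings
          .groupby("Cluster Labels")["price"]
          .agg(["count", "mean"])
)
print(summary)
```

<p>The means reproduce the values printed above for these two clusters (418000.0 and 362500.0), and the <code>count</code> column shows how many towns each mean rests on.</p>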


<h4>Based on the clusters above, we can start to find some meaningful insights, and even come up with some potential categories for the clusters</h4>
<ul>
    <li>Cluster 1: Coffee shops & Restaurants