# IBM Data Science Capstone Project

This notebook will be used for my coursework for the IBM Data Science Capstone Project.

## Problem

The business problem in this project is to identify suitable locations for opening a food business in Singapore. Using location data, the analysis aims to answer this business question:

If someone wants to open a food business in Singapore, which locations will be recommended? 

## Steps to take 

#### Import libraries >> Scrape data >> Data cleaning >> Data exploration >> Data modelling >> Evaluation

### Import libraries

In [1]:
#Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import preprocessing
%matplotlib inline
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim
import geocoder
import requests 
from pandas.io.json import json_normalize
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes  
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


### Scrape data

In [2]:
from bs4 import BeautifulSoup

# Download the Wikipedia page and keep the first table with class "wikitable"
html = requests.get('https://en.wikipedia.org/wiki/Postal_codes_in_Singapore').text
soup = BeautifulSoup(html, 'html.parser')
wiki_table = soup.find('table', {'class': 'wikitable'})
#wiki_table

### Data cleaning

In [50]:
#read the html file to a dataframe
df= pd.read_html(str(wiki_table))
df= pd.DataFrame(df[0])
#df.head()
print(df.shape)
df

(28, 3)


Unnamed: 0,Postal district,Postal sector(1st 2 digits of 6-digit postal codes),General location
0,1,"01, 02, 03, 04, 05, 06","Raffles Place, Cecil, Marina, People's Park"
1,2,"07, 08","Anson, Tanjong Pagar"
2,3,"14, 15, 16","Bukit Merah, Queenstown, Tiong Bahru"
3,4,"09, 10","Telok Blangah, Harbourfront"
4,5,"11, 12, 13","Pasir Panjang, Hong Leong Garden, Clementi New..."
5,6,17,"High Street, Beach Road (part)"
6,7,"18, 19","Middle Road, Golden Mile"
7,8,"20, 21","Little India, Farrer Park, Jalan Besar, Lavender"
8,9,"22, 23","Orchard, Cairnhill, River Valley"
9,10,"24, 25, 26, 27","Ardmore, Bukit Timah, Holland Road, Tanglin"


In [5]:
#Define a get coordinates function
def get_latlng(general_l):
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Singapore'.format(general_l))
        lat_lng_coords = g.latlng
    return lat_lng_coords
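
The `while` loop in `get_latlng` above keeps calling the geocoder until it returns a result, so a location that never resolves would loop forever. A minimal sketch of a bounded retry follows. The function name `get_latlng_with_retry` and the parameters `geocode`, `max_attempts`, and `delay_seconds` are hypothetical names introduced for illustration only; the geocoding call is passed in as a plain callable.

```python
import time


def get_latlng_with_retry(query, geocode, max_attempts=5, delay_seconds=1.0):
    """Return [lat, lng] for `query`, or None if every attempt fails.

    `geocode` is any callable that takes a query string and returns
    [lat, lng] or None (an assumed interface for this sketch).
    """
    for attempt in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        # Pause before the next attempt, but not after the last one
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return None
```

With the `geocoder` call used in this notebook, the callable could be `lambda q: geocoder.arcgis(q).latlng`. Locations that come back as `None` can then be inspected by hand instead of stalling the notebook.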

In [6]:
#call the function and store to a list
coords = [ get_latlng(general_l) for general_l in df["General location"].tolist() ]

In [7]:
#view the first five lines of the list
coords[0:5]

[[1.2818900000000326, 103.84912000000008],
 [1.2788900000000467, 103.84539000000007],
 [1.2929730216474127, 103.80565615548085],
 [1.2653312143819655, 103.81886147149507],
 [1.3132035831516617, 103.75570981938193]]

In [8]:
#Change the list to a dataframe
latlong_df= pd.DataFrame(coords, columns= ['Latitude','Longitude'])
print(latlong_df.shape)
latlong_df.head()

(28, 2)


Unnamed: 0,Latitude,Longitude
0,1.28189,103.84912
1,1.27889,103.84539
2,1.292973,103.805656
3,1.265331,103.818861
4,1.313204,103.75571


In [22]:
#Merge latlong dataframe with actual dataframe
df['Latitude']= latlong_df['Latitude']
df['Longitude']= latlong_df['Longitude']
df.rename(columns={"Postal district": "Postal_district","General location":"General_location"}, inplace= True)
df.head()

Unnamed: 0,Postal_district,Postal sector(1st 2 digits of 6-digit postal codes),General_location,Latitude,Longitude
0,1,"01, 02, 03, 04, 05, 06","Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912
1,2,"07, 08","Anson, Tanjong Pagar",1.27889,103.84539
2,3,"14, 15, 16","Bukit Merah, Queenstown, Tiong Bahru",1.292973,103.805656
3,4,"09, 10","Telok Blangah, Harbourfront",1.265331,103.818861
4,5,"11, 12, 13","Pasir Panjang, Hong Leong Garden, Clementi New...",1.313204,103.75571


In [24]:
#Rename and save dataframe to csv file
sg_df= df
sg_df.to_csv('sg_df.csv', index= False)

In [25]:
sg_df.head()

Unnamed: 0,Postal_district,Postal sector(1st 2 digits of 6-digit postal codes),General_location,Latitude,Longitude
0,1,"01, 02, 03, 04, 05, 06","Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912
1,2,"07, 08","Anson, Tanjong Pagar",1.27889,103.84539
2,3,"14, 15, 16","Bukit Merah, Queenstown, Tiong Bahru",1.292973,103.805656
3,4,"09, 10","Telok Blangah, Harbourfront",1.265331,103.818861
4,5,"11, 12, 13","Pasir Panjang, Hong Leong Garden, Clementi New...",1.313204,103.75571


In [26]:
#Get geographical coordinates of Singapore 
address= 'Singapore'

geolocator = Nominatim(user_agent="sg_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Singapore are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Singapore are 1.3408630000000001, 103.83039182212079.


In [27]:
#Visualize the neighbourhoods in Singapore

#create map of Singapore using latitude and longitude values
map_sg = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(sg_df['Latitude'], sg_df['Longitude'], sg_df['General_location']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sg)  
map_sg  

### Data exploration

In [28]:
#Define foursquare credentials and version

CLIENT_ID = 'YZUEBDPMA4W3LVDSXUBYVXLYHLE5QPTBGDIWW53VX5YBGL3S' # Foursquare ID
CLIENT_SECRET = 'VUVQCXQHWZALCWL4JLFYA0SOT0B3EC3NPQBU0SITKUI0XH1G' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YZUEBDPMA4W3LVDSXUBYVXLYHLE5QPTBGDIWW53VX5YBGL3S
CLIENT_SECRET:VUVQCXQHWZALCWL4JLFYA0SOT0B3EC3NPQBU0SITKUI0XH1G
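
The cell above hard-codes the credentials and prints them in the output. One way to keep them out of a shared notebook is to read them from environment variables. This is a minimal sketch: the function name `load_foursquare_credentials` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for illustration, not names that Foursquare requires.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare ID and secret from environment variables.

    The variable names below are assumptions made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError('Missing environment variable: {}'.format(missing))
```

The result could then be unpacked as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, with the two variables set in the shell before the notebook is started.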


In [29]:
#create a function to get the top 100 venues within a radius of 1000 m

def getNearbyVenues(names, latitudes, longitudes, radius=1000, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 
                  'Location Latitude', 
                  'Location Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
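
`getNearbyVenues` indexes straight into the JSON response, so a response without `groups`, or a venue without any category, would raise an error partway through the loop. The sketch below shows one defensive way to pull out the same fields. The helper name `extract_venues` and the `'Unknown'` placeholder are hypothetical, and the nested layout is assumed to be the one the function above already relies on.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response.

    Assumes the nested layout that getNearbyVenues indexes into:
    payload['response']['groups'][0]['items'][i]['venue'].
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        # Fall back to a placeholder when a venue has no category listed
        categories = venue.get('categories') or [{}]
        rows.append((
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            categories[0].get('name', 'Unknown'),
        ))
    return rows
```

Called as `extract_venues(requests.get(url).json())`, it returns an empty list for a location with no results, so the remaining locations can still be processed.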

In [30]:
sg_venues = getNearbyVenues(names=sg_df['General_location'],
                                   latitudes=sg_df['Latitude'],
                                   longitudes=sg_df['Longitude']
                                  )

Raffles Place, Cecil, Marina, People's Park
Anson, Tanjong Pagar
Bukit Merah, Queenstown, Tiong Bahru
Telok Blangah, Harbourfront
Pasir Panjang, Hong Leong Garden, Clementi New Town
High Street, Beach Road (part)
Middle Road, Golden Mile
Little India, Farrer Park, Jalan Besar, Lavender
Orchard, Cairnhill, River Valley
Ardmore, Bukit Timah, Holland Road, Tanglin
Watten Estate, Novena, Thomson
Balestier, Toa Payoh, Serangoon
Macpherson, Braddell
Geylang, Eunos, Aljunied
Katong, Joo Chiat, Amber Road
Bedok, Upper East Coast, Eastwood, Kew Drive
Loyang, Changi
Simei, Tampines, Pasir Ris
Serangoon Garden, Hougang, Punggol
Bishan, Ang Mo Kio
Upper Bukit Timah, Clementi Park, Ulu Pandan
Penjuru, Jurong, Pioneer, Tuas
Hillview, Dairy Farm, Bukit Panjang, Choa Chu Kang
Lim Chu Kang, Tengah
Kranji, Woodgrove, Woodlands
Upper Thomson, Springleaf
Yishun, Sembawang, Senoko
Seletar


In [31]:
#Check the venues
sg_venues.head()

Unnamed: 0,Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Amoy Hotel,1.283118,103.848539,Hotel
1,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Luke's Oyster Bar & Chop House,1.282459,103.84724,Seafood Restaurant
2,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Napoleon Food & Wine Bar,1.279925,103.847333,Wine Bar
3,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Freehouse,1.281254,103.848513,Beer Garden
4,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Lau Pa Sat Satay Street,1.280261,103.850235,Street Food Gathering


In [52]:
sg_venues.groupby('Location').count().head()

Location,Location Latitude,Location Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Anson, Tanjong Pagar",100,100,100,100,100,100
"Ardmore, Bukit Timah, Holland Road, Tanglin",18,18,18,18,18,18
"Balestier, Toa Payoh, Serangoon",77,77,77,77,77,77
"Bedok, Upper East Coast, Eastwood, Kew Drive",34,34,34,34,34,34
"Bishan, Ang Mo Kio",32,32,32,32,32,32


In [33]:
# one hot encoding
sg_onehot = pd.get_dummies(sg_venues[['Venue Category']], prefix="", prefix_sep="")

# add location column back to dataframe
sg_onehot['Location'] = sg_venues['Location'] 

# move location column to the first column
fixed_columns = [sg_onehot.columns[-1]] + list(sg_onehot.columns[:-1])
sg_onehot = sg_onehot[fixed_columns]

sg_onehot.head()

Unnamed: 0,Location,Airport,Airport Terminal,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,...,Video Game Store,Vietnamese Restaurant,Water Park,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
3,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [35]:
#Group venues by location and take the mean of each one-hot category column

sg_grouped = sg_onehot.groupby('Location').mean().reset_index()
sg_grouped

Unnamed: 0,Location,Airport,Airport Terminal,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,...,Video Game Store,Vietnamese Restaurant,Water Park,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Anson, Tanjong Pagar",0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01
1,"Ardmore, Bukit Timah, Holland Road, Tanglin",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Balestier, Toa Payoh, Serangoon",0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.064935,...,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0
3,"Bedok, Upper East Coast, Eastwood, Kew Drive",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.058824,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bishan, Ang Mo Kio",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Bukit Merah, Queenstown, Tiong Bahru",0.0,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.036585,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Geylang, Eunos, Aljunied",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01
7,"High Street, Beach Road (part)",0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01
8,"Hillview, Dairy Farm, Bukit Panjang, Choa Chu ...",0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Katong, Joo Chiat, Amber Road",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02


In [36]:
sg_grouped.shape

(28, 245)
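
Taking the mean of the one-hot columns per location gives the share of that location's venues that fall in each category. For example, the 0.03 for Asian Restaurant in "Anson, Tanjong Pagar" corresponds to 3 of its 100 venues. The sketch below reproduces the same calculation with the standard library on made-up rows; `category_shares` and the `toy` data are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter, defaultdict


def category_shares(venues):
    """Map each location to {category: share of that location's venues}."""
    counts = defaultdict(Counter)
    for location, category in venues:
        counts[location][category] += 1
    return {
        location: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for location, cats in counts.items()
    }


# Made-up (location, category) rows, for illustration only
toy = [('A', 'Hotel'), ('A', 'Café'), ('A', 'Hotel'), ('A', 'Bar'), ('B', 'Food Court')]
shares = category_shares(toy)
print(shares['A']['Hotel'])  # 0.5 (2 of the 4 venues at location A)
```

Unlike the pandas version, this sketch omits categories with a share of zero rather than storing them as 0.0.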

In [37]:
#Return the most common venue categories for one location (one row of sg_grouped)
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [51]:
#Display the top 5 most common venues based on the location
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Location']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
location_venues_sorted = pd.DataFrame(columns=columns)
location_venues_sorted['Location'] = sg_grouped['Location']

for ind in np.arange(sg_grouped.shape[0]):
    location_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sg_grouped.iloc[ind, :], num_top_venues)

location_venues_sorted.head()

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,"Anson, Tanjong Pagar",Coffee Shop,Japanese Restaurant,Korean Restaurant,Hotel,Café
1,"Ardmore, Bukit Timah, Holland Road, Tanglin",Bus Station,Café,Coffee Shop,Gourmet Shop,Bar
2,"Balestier, Toa Payoh, Serangoon",Coffee Shop,Chinese Restaurant,Asian Restaurant,Noodle House,Park
3,"Bedok, Upper East Coast, Eastwood, Kew Drive",Bus Station,Coffee Shop,Noodle House,Food Court,Asian Restaurant
4,"Bishan, Ang Mo Kio",Chinese Restaurant,Soup Place,Bus Station,Food Court,Indian Restaurant


### Data modelling

In [39]:
#k-means clustering will be used to form clusters for deriving insights
# set number of clusters
kclusters = 4

sg_cluster = sg_grouped.drop(columns='Location')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sg_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 2, 2, 2, 2, 2, 0, 0, 0])

In [40]:
#add clustering labels
location_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sg_merged= sg_df

#merge sg_df with location_venues_sorted to add cluster labels and top venues for each location
sg_merged = sg_merged.join(location_venues_sorted.set_index('Location'), on='General_location')


In [45]:
sg_merged['Cluster Labels'].value_counts()

0    14
2    12
3     1
1     1
Name: Cluster Labels, dtype: int64

In [49]:
import matplotlib.cm as cm
import matplotlib.colors as colors
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters: one distinct color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(sg_merged['Latitude'], sg_merged['Longitude'], sg_merged['General_location'], sg_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Cluster 0

In [28]:
sg_merged.loc[sg_merged['Cluster Labels'] == 0]

Unnamed: 0,Postal_district,Postal sector(1st 2 digits of 6-digit postal codes),General_location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,1,"01, 02, 03, 04, 05, 06","Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,0,Hotel,Cocktail Bar,Gym / Fitness Center,Japanese Restaurant,Restaurant
1,2,"07, 08","Anson, Tanjong Pagar",1.27889,103.84539,0,Coffee Shop,Japanese Restaurant,Korean Restaurant,Hotel,Café
3,4,"09, 10","Telok Blangah, Harbourfront",1.265331,103.818861,0,Japanese Restaurant,Chinese Restaurant,Clothing Store,Toy / Game Store,Shopping Mall
5,6,17,"High Street, Beach Road (part)",1.290619,103.849451,0,Hotel,Cocktail Bar,Japanese Restaurant,Café,Waterfront
6,7,"18, 19","Middle Road, Golden Mile",1.299462,103.852847,0,Hotel,Café,Japanese Restaurant,Shopping Mall,Bookstore
7,8,"20, 21","Little India, Farrer Park, Jalan Besar, Lavender",1.3071,103.85842,0,Indian Restaurant,Chinese Restaurant,Hotel,Café,Italian Restaurant
8,9,"22, 23","Orchard, Cairnhill, River Valley",1.30656,103.83945,0,Hotel,Japanese Restaurant,Shopping Mall,Clothing Store,Chinese Restaurant
9,10,"24, 25, 26, 27","Ardmore, Bukit Timah, Holland Road, Tanglin",1.323305,103.784985,0,Bus Station,Café,Coffee Shop,Gourmet Shop,Bar
10,11,"28, 29, 30","Watten Estate, Novena, Thomson",1.32667,103.81139,0,Café,Japanese Restaurant,Bakery,Shopping Mall,Asian Restaurant
14,15,"42, 43, 44, 45","Katong, Joo Chiat, Amber Road",1.300876,103.901634,0,Chinese Restaurant,Noodle House,Asian Restaurant,Indian Restaurant,Hotel


#### Cluster 1

In [29]:
sg_merged.loc[sg_merged['Cluster Labels'] == 1]

Unnamed: 0,Postal_district,Postal sector(1st 2 digits of 6-digit postal codes),General_location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
23,24,"69, 70, 71","Lim Chu Kang, Tengah",1.41967,103.70232,1,Asian Restaurant,Café,History Museum,Clothing Store,Military Base


#### Cluster 2

In [30]:
sg_merged.loc[sg_merged['Cluster Labels'] == 2]

Unnamed: 0,Postal_district,Postal sector(1st 2 digits of 6-digit postal codes),General_location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,3,"14, 15, 16","Bukit Merah, Queenstown, Tiong Bahru",1.292973,103.805656,2,Chinese Restaurant,Food Court,Café,Coffee Shop,Supermarket
4,5,"11, 12, 13","Pasir Panjang, Hong Leong Garden, Clementi New...",1.313204,103.75571,2,Food Court,Food & Drink Shop,Indian Restaurant,Noodle House,Chinese Breakfast Place
11,12,"31, 32, 33","Balestier, Toa Payoh, Serangoon",1.35554,103.8766,2,Coffee Shop,Chinese Restaurant,Asian Restaurant,Noodle House,Park
12,13,"34, 35, 36, 37","Macpherson, Braddell",1.32789,103.88519,2,Food Court,Chinese Restaurant,Coffee Shop,Asian Restaurant,Grocery Store
13,14,"38, 39, 40, 41","Geylang, Eunos, Aljunied",1.31399,103.88197,2,Chinese Restaurant,Noodle House,Food Court,Seafood Restaurant,Asian Restaurant
15,16,"46, 47, 48","Bedok, Upper East Coast, Eastwood, Kew Drive",1.320397,103.950729,2,Bus Station,Coffee Shop,Noodle House,Food Court,Asian Restaurant
17,18,"51, 52","Simei, Tampines, Pasir Ris",1.37194,103.94994,2,Coffee Shop,Park,Food Court,Sandwich Place,Fast Food Restaurant
18,19,"53, 54, 55, 82","Serangoon Garden, Hougang, Punggol",1.364027,103.860205,2,Chinese Restaurant,Food Court,Asian Restaurant,Coffee Shop,Bakery
19,20,"56, 57","Bishan, Ang Mo Kio",1.36447,103.83506,2,Chinese Restaurant,Soup Place,Bus Station,Food Court,Indian Restaurant
21,22,"60, 61, 62, 63, 64","Penjuru, Jurong, Pioneer, Tuas",1.32088,103.74532,2,Coffee Shop,Café,Food Court,Gym / Fitness Center,Baseball Stadium


#### Cluster 3

In [31]:
sg_merged.loc[sg_merged['Cluster Labels'] == 3]

Unnamed: 0,Postal_district,Postal sector(1st 2 digits of 6-digit postal codes),General_location,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
16,17,"49, 50, 81","Loyang, Changi",1.373017,103.968394,3,Bus Station,Asian Restaurant,Supermarket,Convenience Store,Cafeteria


### Evaluation

Summary of results

|Cluster|Quantity|
|-------|--------|
|0      |14      |
|2      |12      |
|3      |1       |
|1      |1       |

From the folium map and the cluster tables above, we can derive the following:

Clusters 0 and 2 are the ones we are most interested in, as they contain most of the locations (14 and 12 respectively). Clusters 3 and 1 contain only one location each.

Cluster 1 (Lim Chu Kang, Tengah) is unlikely to appeal to many people, as a military base is among its top five most common venues. We would not expect large crowds there, since a military base does not draw visitors and its personnel spend most of their time on base. In addition, clusters 1 and 3 both sit at the edges of the country, whereas tourists usually stay around the central region, which is more convenient for getting around. Hence, these two clusters are not recommended for opening a food business.

Cluster 2 is dominated by food venues such as Chinese and Asian restaurants, food courts, and coffee shops, which makes it a strong cluster to consider when opening a food business. The high number of food businesses may also indicate that these locations are where the crowds are. Because the restaurants in this cluster are mainly Asian and Chinese, I assume that they target locals, as a large portion of Singapore's population is Chinese.

Lastly, the data set shows one venue type that only cluster 0 has among its most common venues: hotels. Italian restaurants also appear among the common venues. From this, I assume that the cluster caters more to foreigners, which also makes it an excellent area in which to open a food business.
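
The observations about clusters 0 and 2 can be checked with a simple tally. The sketch below counts the "1st Most Common Venue" values copied from the cluster tables shown above. It covers only the rows displayed there (10 for each cluster), not the full clusters, and the variable names `first_venue` and `tally` are introduced here for illustration.

```python
from collections import Counter

# "1st Most Common Venue" values copied from the cluster tables shown above
# (only the displayed rows, not the full clusters)
first_venue = {
    0: ['Hotel', 'Coffee Shop', 'Japanese Restaurant', 'Hotel', 'Hotel',
        'Indian Restaurant', 'Hotel', 'Bus Station', 'Café', 'Chinese Restaurant'],
    2: ['Chinese Restaurant', 'Food Court', 'Coffee Shop', 'Food Court',
        'Chinese Restaurant', 'Bus Station', 'Coffee Shop', 'Chinese Restaurant',
        'Chinese Restaurant', 'Coffee Shop'],
}

# Count how often each category is the top venue within each cluster
tally = {cluster: Counter(venues) for cluster, venues in first_venue.items()}
for cluster, counts in tally.items():
    print(cluster, counts.most_common(3))
```

In the displayed rows, Hotel is the top venue four times in cluster 0 and never in cluster 2, where Chinese Restaurant leads with four. On the full data, `sg_merged.groupby('Cluster Labels')['1st Most Common Venue'].value_counts()` would give the same kind of tally.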

Based on the k-means model applied, and after evaluating the maps and data sets using location data only, I would recommend opening a food business in cluster 0, as the cluster is populated not just by Singaporeans but also by foreigners, which could mean more customers and more profit for the business.
