# Healthy Café Location Selection in Singapore

IMB Data Science Capstone Project – The Battles of Neighbourhoods

<b>Introduction<br>
I.	Background</b>

About 400,000 Singaporeans are diabetes and one in three has lifetime risk of developing the disease. And if nothing is done, the number of diabetes under age 70 is expected to rise to 670,000 by 2030 and one million by 2050.
<br><br>
Singapore government adopts a multi-pronged strategy to encourage Singaporeans to eat healthily and reduce their sugar intake from foods and drinks.<br>
Coffee is one of the common sugar intakes for Singaporean. We are going to open a café to server coffee with less sugar and food with balanced nutrition.


<b>II.	Healthy Café Concept</b>

<b>Target customers:</b> Office clericals who care about healthy lifestyle but don’t have time to prepare food for lunch or rush for dinner due to overtime.<br>
<b>Food Service:</b> Coffee with sugar level choices and different types of milk (e.g. non-fat, skimmed milk, soya milk, etc). Food with nutrition label, and different size options to match and mix.


<b>III.	Objective</b><br>
To find out suitable locations in Singapore<br>
a.	Near business area<br>
b.	Ares with high density of gyms (To filter those areas passed by more people with healthy concept)<br>
c.	Distribution of restaurants (types and density) – to understand competitor distribution<br>

<b>Data Acquisition</b><br>
I.	Target business area in Singapore: https://www.corporateservicessingapore.com/7-popular-business-locations-singapore/ <br>
So, we can get below starting points<br>
1.	Raffles Place Area<br>
2.	Marina Bay Area<br>
3.	Tanjong Pagar / Anson Road<br>
4.	Orchard Road Area<br>
5.	Shenton Way Area<br>
6.	River Valley<br>
7.	Suntec City<br>
<br>

Uploaded datafile to GitHub as a start.

In [1]:
!conda install -c conda-forge folium=0.5.0 --yes # comment/uncomment if not yet installed.
!conda install -c conda-forge geopy --yes        # comment/uncomment if not yet installed

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis

# Numpy and Pandas libraries were already imported at the beginning of this notebook.
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library

import requests # library to handle requests
import lxml.html as lh
import bs4 as bs
import urllib.request

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.11.28         |           py36_0         149 KB  conda-forge
    altair-3.3.0               |           py36_0         747 KB  conda-forge
    openssl-1.1.1d             |       h516909a_0         2.1 MB  conda-forge
    branca-0.3.1               |             py_0          25 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    ca-certificates-2019.11.28 |       hecc5488_0         145 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.2 MB

The following NEW packages will be 

In [2]:
data_area=pd.read_csv("https://raw.githubusercontent.com/shayy07/Coursera_Capstone/master/OfficeArea.csv")
data_area

Unnamed: 0,Area,Landmark
0,Raffles Place Area,Raffles Place MRT
1,Marina Bay Area,Marina Bay Finacial Centre
2,Tanjong Pagar / Anson Road,Tanjong Pagar MRT
3,Orchard Road Area,Ngee Ann City
4,Shenton Way Area,Capital Tower
5,River Valley,Great World City
6,Suntec City,Suntec City


In [3]:
MyLM=data_area["Landmark"]
MyLM

0             Raffles Place MRT
1    Marina Bay Finacial Centre
2             Tanjong Pagar MRT
3                 Ngee Ann City
4                 Capital Tower
5              Great World City
6                   Suntec City
Name: Landmark, dtype: object

In [4]:
# @hidden_cell
google_key="AIzaSyBkPw7COxbYTtvc4gTXDMRyxbYdxHHSo3M"
cid='R0A4V12LCTHAPQXIP5103KSSIWCCLOLSTZB3PRJUGMSPJNRN'
csecret='20SFFWC52FAZJEMZBRE450QB1H05F4YJL5QRGTK0I0F4ANRR'

II.	Get geocodes of above areas via Google Map

In [5]:
data_area['Latitude'] = 0.0
data_area['Longitude'] = 0.0

for idx,area in data_area['Landmark'].iteritems():
    area=area + ' ' + 'Singapore'
    url = 'https://maps.googleapis.com/maps/api/geocode/json?address={}&key={}'.format(area,google_key)
    #url='https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input={}&key={}.format(area,google_key)
    lat = requests.get(url).json()["results"][0]["geometry"]["location"]['lat']
    lng = requests.get(url).json()["results"][0]["geometry"]["location"]['lng']
    data_area.loc[idx,'Latitude'] = lat
    data_area.loc[idx,'Longitude'] = lng
    
data_area

Unnamed: 0,Area,Landmark,Latitude,Longitude
0,Raffles Place Area,Raffles Place MRT,1.283969,103.85154
1,Marina Bay Area,Marina Bay Finacial Centre,1.280283,103.854307
2,Tanjong Pagar / Anson Road,Tanjong Pagar MRT,1.276342,103.846792
3,Orchard Road Area,Ngee Ann City,1.302557,103.834568
4,Shenton Way Area,Capital Tower,1.277737,103.847622
5,River Valley,Great World City,1.293642,103.831929
6,Suntec City,Suntec City,1.29478,103.858526


In [62]:
geo = Nominatim(user_agent='My-IBMNotebook')
address = 'Singapore'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map
map_area = folium.Map(location=[latitude, longitude], tiles="Openstreetmap", zoom_start=11)

# set color scheme for the clusters
x = np.arange(7)
ys = [i+x+(i*x)**2 for i in range(7)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
cluster=0
for lat, lon, poi, cat in zip(data_area['Latitude'], data_area['Longitude'], data_area['Area'], data_area['Landmark']):
    cluster=cluster+1
    label = folium.Popup(str(cat) + '-' + str(poi), parse_html=True)
    folium.Marker(
        [lat, lon],
        popup=label).add_to(map_area)
       
map_area

III.	Collect location data from Foursquare website: https://developer.foursquare.com/<br>
1.	Location of Venues

In [6]:
url = 'https://api.foursquare.com/v2/venues/explore'
venue_list=[]

In [7]:
for idx,lat in data_area['Latitude'].iteritems():
    lng=data_area.loc[idx,'Longitude']
    name=data_area.loc[idx,'Area']
    sll=str(lat) + ',' + str(lng)
    params = dict(
      client_id=cid,
      client_secret=csecret,
      v='20180323',
      ll=sll,
      radius=300,
      limit=80,
      query='food'
    )
    resp = requests.get(url=url, params=params).json()["response"]['groups'][0]['items']
    main_category="Other Food"
    venue_list.append([(name,lat,lng,main_category,v['venue']['id'],v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],v['venue']['categories'][0]['name']) for v in resp])

In [8]:
for idx,lat in data_area['Latitude'].iteritems():
    lng=data_area.loc[idx,'Longitude']
    name=data_area.loc[idx,'Area']
    sll=str(lat) + ',' + str(lng)
    params = dict(
      client_id=cid,
      client_secret=csecret,
      v='20180323',
      ll=sll,
      radius=300,
      limit=80,
      query='Café'
    )
    resp = requests.get(url=url, params=params).json()["response"]['groups'][0]['items']
    main_category="Café"
    venue_list.append([(name,lat,lng,main_category,v['venue']['id'],v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],v['venue']['categories'][0]['name']) for v in resp])

In [9]:
for idx,lat in data_area['Latitude'].iteritems():
    lng=data_area.loc[idx,'Longitude']
    name=data_area.loc[idx,'Area']
    sll=str(lat) + ',' + str(lng)
    params = dict(
      client_id=cid,
      client_secret=csecret,
      v='20180323',
      ll=sll,
      radius=300,
      limit=80,
      query='gym'
    )
    resp = requests.get(url=url, params=params).json()["response"]['groups'][0]['items']
    main_category="Gym"
    venue_list.append([(name,lat,lng,main_category,v['venue']['id'],v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],v['venue']['categories'][0]['name']) for v in resp])

In [10]:
nearby_venues = pd.DataFrame([item for venue_list in venue_list for item in venue_list])
nearby_venues.columns = ['Area','Area_Latitude','Area_Longitude','Venue_Main_Category','Venue_ID','Venue','Venue_Latitude','Venue_Longitude','Venue_Category']

nearby_venues.head()

Unnamed: 0,Area,Area_Latitude,Area_Longitude,Venue_Main_Category,Venue_ID,Venue,Venue_Latitude,Venue_Longitude,Venue_Category
0,Raffles Place Area,1.283969,103.85154,Other Food,53ae48b4498ec970f3cc0455,CITY Hot Pot Shabu shabu,1.284173,103.851585,Hotpot Restaurant
1,Raffles Place Area,1.283969,103.85154,Other Food,55fe3da6498e0c5eaa442250,CULINARYON,1.284876,103.850933,Comfort Food Restaurant
2,Raffles Place Area,1.283969,103.85154,Other Food,59538b0ac824ae5235423946,Waa Cow!,1.284284,103.851215,Japanese Restaurant
3,Raffles Place Area,1.283969,103.85154,Other Food,4bb2edcb2397b713669e37b3,The Salad Shop,1.285523,103.851177,Salad Place
4,Raffles Place Area,1.283969,103.85154,Other Food,4b568e7cf964a520151528e3,The Sandwich Shop,1.284266,103.852673,Sandwich Place


In [11]:
nearby_venues.groupby(['Venue_Main_Category'])['Venue_Category'].value_counts(normalize=False)

Venue_Main_Category  Venue_Category                          
Café                 Café                                        276
                     Coffee Shop                                   4
                     Bar                                           2
                     Italian Restaurant                            2
                     Asian Restaurant                              1
                     Bakery                                        1
                     Breakfast Spot                                1
                     Fast Food Restaurant                          1
                     Food Court                                    1
                     Noodle House                                  1
                     Vietnamese Restaurant                         1
Gym                  Gym / Fitness Center                         35
                     Gym                                          27
                     Yoga Studio         

2. Clean data

In [12]:
nearby_venues_clean=nearby_venues.copy()

list_cafe_filter=['Bar','Fast Food Restaurant']
indexNames = nearby_venues_clean[ (nearby_venues_clean['Venue_Main_Category'] == 'Café') & (nearby_venues_clean['Venue_Category'].isin(list_cafe_filter)) ].index
nearby_venues_clean.drop(indexNames , inplace=True)

list_Gym_filter=['Hotel','Residential Building (Apartment / Condo)','Martial Arts Dojo','Building','Hotel Pool','Track']
indexNames = nearby_venues_clean[ (nearby_venues_clean['Venue_Main_Category'] == 'Gym') & (nearby_venues_clean['Venue_Category'].isin(list_Gym_filter)) ].index
nearby_venues_clean.drop(indexNames , inplace=True)

list_Food_filter=['Café']
indexNames = nearby_venues_clean[ (nearby_venues_clean['Venue_Main_Category'] == 'Other Food') & (nearby_venues_clean['Venue_Category'].isin(list_Food_filter)) ].index
nearby_venues_clean.drop(indexNames , inplace=True)


for idx,cat_m in nearby_venues_clean['Venue_Main_Category'].iteritems():
    cat=nearby_venues_clean.loc[idx,'Venue_Category']
    if cat_m=='Café' or cat_m=='Gym':
        nearby_venues_clean.loc[idx,'Final_Category']=cat_m
    else:
        nearby_venues_clean.loc[idx,'Final_Category']=cat
    
nearby_venues_clean.head()
#nearby_venues_clean[nearby_venues_clean['Area']=='Raffles Place Area'].groupby(['Venue_Main_Category'])['Venue_Category'].value_counts(normalize=False)

Unnamed: 0,Area,Area_Latitude,Area_Longitude,Venue_Main_Category,Venue_ID,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,Final_Category
0,Raffles Place Area,1.283969,103.85154,Other Food,53ae48b4498ec970f3cc0455,CITY Hot Pot Shabu shabu,1.284173,103.851585,Hotpot Restaurant,Hotpot Restaurant
1,Raffles Place Area,1.283969,103.85154,Other Food,55fe3da6498e0c5eaa442250,CULINARYON,1.284876,103.850933,Comfort Food Restaurant,Comfort Food Restaurant
2,Raffles Place Area,1.283969,103.85154,Other Food,59538b0ac824ae5235423946,Waa Cow!,1.284284,103.851215,Japanese Restaurant,Japanese Restaurant
3,Raffles Place Area,1.283969,103.85154,Other Food,4bb2edcb2397b713669e37b3,The Salad Shop,1.285523,103.851177,Salad Place,Salad Place
4,Raffles Place Area,1.283969,103.85154,Other Food,4b568e7cf964a520151528e3,The Sandwich Shop,1.284266,103.852673,Sandwich Place,Sandwich Place


In [13]:
venue_freq=pd.crosstab(nearby_venues_clean.Area,nearby_venues_clean.Venue_Main_Category,margins=False)
venue_freq.sort_values('Gym',ascending=False)

Venue_Main_Category,Café,Gym,Other Food
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Shenton Way Area,57,22,43
Raffles Place Area,55,17,19
Tanjong Pagar / Anson Road,45,16,29
Orchard Road Area,46,10,68
Suntec City,56,10,59
Marina Bay Area,20,4,27
River Valley,9,4,32


Do a simple check, our assumption is correct. The correlation between the density of café and gyms are positive.

In [23]:
from scipy.stats import pearsonr
# calculate Pearson's correlation
corr, _ = pearsonr(venue_freq["Café"], venue_freq["Gym"])
print('Pearsons correlation: %.3f' % corr)

Pearsons correlation: 0.811


# Conclusion 

There are most Gyms in Shenton Way Area. So, relatively we can say people near Shenton Way Area have more healthy awareness.<br>
There're always cafe near gym. It's not easy to find a place without competitors. <br>
Based on the density of competitor and gyms, relatively Shenton Way Area is ideal.

In [29]:
shenton=nearby_venues_clean[nearby_venues_clean['Area']=='Shenton Way Area']
shenton.head()

Unnamed: 0,Area,Area_Latitude,Area_Longitude,Venue_Main_Category,Venue_ID,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,Final_Category
158,Shenton Way Area,1.277737,103.847622,Other Food,5bab10b9364d97002cc5e458,Muchachos,1.279072,103.847026,Burrito Place,Burrito Place
159,Shenton Way Area,1.277737,103.847622,Other Food,50248ddde4b0bb3db3a6ff8a,Pepper Bowl,1.279371,103.84671,Asian Restaurant,Asian Restaurant
160,Shenton Way Area,1.277737,103.847622,Other Food,5887008018dc5375de1b11e0,Kuro Maguro,1.276699,103.845951,Japanese Restaurant,Japanese Restaurant
161,Shenton Way Area,1.277737,103.847622,Other Food,59410e436f706a795d106c76,Ippudo (一風堂),1.27707,103.84579,Ramen Restaurant,Ramen Restaurant
162,Shenton Way Area,1.277737,103.847622,Other Food,591fc82fe96d0c63d9997b98,Venue By Sebastian,1.276363,103.848251,Restaurant,Restaurant


In [37]:
geo = Nominatim(user_agent='My-IBMNotebook')
address = 'Singapore'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map
map_shenton = folium.Map(location=[latitude, longitude], tiles="Openstreetmap", zoom_start=11)

# set color scheme for the clusters
x = np.arange(3)
ys = [i+x+(i*x)**2 for i in range(3)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
cluster=0
for lat, lon, poi, cat in zip(shenton['Venue_Latitude'], shenton['Venue_Longitude'], shenton['Venue'], shenton['Venue_Main_Category']):
    if cat=='Gym':
        cluster=0
        color='green'
    elif cat=='Café':
        cluster=1
        color='red'
    else:
        cluster=2
        color='yellow'
        
    label = folium.Popup(str(cat) + '-' + str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=6,
        popup=label,
        color=color,
        #color=rainbow[cluster-1],
        fill=True,
        fill_color=color,
        #fill_color=rainbow[cluster-1],
        fill_opacity=1).add_to(map_shenton)
       
map_shenton

In [38]:
shenton_gym=shenton[shenton['Venue_Main_Category']=='Gym']
venue_list=[]

In [39]:
for idx,lat in shenton_gym['Venue_Latitude'].iteritems():
    lng=shenton_gym.loc[idx,'Venue_Longitude']
    name=shenton_gym.loc[idx,'Venue']
    sll=str(lat) + ',' + str(lng)
    params = dict(
      client_id=cid,
      client_secret=csecret,
      v='20180323',
      ll=sll,
      radius=50,
      query='Café'
    )
    resp = requests.get(url=url, params=params).json()["response"]['groups'][0]['items']
    main_category="Café"
    venue_list.append([(name,lat,lng,main_category,v['venue']['id'],v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],v['venue']['categories'][0]['name']) for v in resp])

In [40]:
for idx,lat in shenton_gym['Venue_Latitude'].iteritems():
    lng=shenton_gym.loc[idx,'Venue_Longitude']
    name=shenton_gym.loc[idx,'Venue']
    sll=str(lat) + ',' + str(lng)
    params = dict(
      client_id=cid,
      client_secret=csecret,
      v='20180323',
      ll=sll,
      radius=50,
      query='food'
    )
    resp = requests.get(url=url, params=params).json()["response"]['groups'][0]['items']
    main_category="Other food"
    venue_list.append([(name,lat,lng,main_category,v['venue']['id'],v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],v['venue']['categories'][0]['name']) for v in resp])

In [41]:
for idx,lat in shenton_gym['Venue_Latitude'].iteritems():
    lng=shenton_gym.loc[idx,'Venue_Longitude']
    name=shenton_gym.loc[idx,'Venue']
    sll=str(lat) + ',' + str(lng)
    params = dict(
      client_id=cid,
      client_secret=csecret,
      v='20180323',
      ll=sll,
      radius=50,
      query='Gym'
    )
    resp = requests.get(url=url, params=params).json()["response"]['groups'][0]['items']
    main_category="Gym"
    venue_list.append([(name,lat,lng,main_category,v['venue']['id'],v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],v['venue']['categories'][0]['name']) for v in resp])

In [42]:
nearby_shenton = pd.DataFrame([item for venue_list in venue_list for item in venue_list])
nearby_shenton.columns = ['Venue','Gym_Latitude','Gym_Longitude','Venue_Main_Category','Venue_ID','Sub_Venue','Venue_Latitude','Venue_Longitude','Venue_Category']

nearby_shenton.head()

Unnamed: 0,Venue,Gym_Latitude,Gym_Longitude,Venue_Main_Category,Venue_ID,Sub_Venue,Venue_Latitude,Venue_Longitude,Venue_Category
0,Virgin Active,1.277055,103.846435,Café,5a1e3809a0215b152b57dedf,Cedele,1.276936,103.84626,Café
1,Virgin Active,1.277055,103.846435,Café,4e256401a80967678ca3d6f2,Da Pai Dang 大牌档,1.277222,103.846639,Café
2,Virgin Active,1.277055,103.846435,Café,4e23c2ca1495f18f036537c2,Share Tea 歇脚亭,1.277387,103.846283,Café
3,Virgin Active,1.277055,103.846435,Café,4d8987d9401a224b3bd88e18,Coffee & Toast,1.276724,103.846634,Café
4,Fitness First Platinum,1.277843,103.847654,Café,5625d21a498e596424dac931,Joe & Dough,1.277715,103.847619,Café


In [43]:
nearby_shenton.groupby(['Venue_Main_Category'])['Venue_Category'].value_counts(normalize=False)

Venue_Main_Category  Venue_Category                          
Café                 Café                                        47
                     Bar                                          3
                     Coffee Shop                                  1
Gym                  Gym / Fitness Center                        21
                     Gym                                          9
                     Yoga Studio                                  9
                     Residential Building (Apartment / Condo)     3
                     Boxing Gym                                   2
                     Climbing Gym                                 2
                     Cycle Studio                                 1
Other food           Japanese Restaurant                          9
                     Café                                         8
                     Asian Restaurant                             4
                     Bakery                           

In [44]:
indexNames = nearby_shenton[ (nearby_shenton['Venue_Category'] == 'Residential Building (Apartment / Condo)')].index
nearby_shenton.drop(indexNames , inplace=True)
nearby_shenton.groupby(['Venue_Main_Category'])['Venue_Category'].value_counts(normalize=False)

list_Food_filter=['Café']
indexNames = nearby_shenton[ (nearby_shenton['Venue_Main_Category'] == 'Other Food') & (nearby_shenton['Venue_Category'].isin(list_Food_filter)) ].index
nearby_shenton.drop(indexNames , inplace=True)

In [45]:
nearby_shenton.head()

Unnamed: 0,Venue,Gym_Latitude,Gym_Longitude,Venue_Main_Category,Venue_ID,Sub_Venue,Venue_Latitude,Venue_Longitude,Venue_Category
0,Virgin Active,1.277055,103.846435,Café,5a1e3809a0215b152b57dedf,Cedele,1.276936,103.84626,Café
1,Virgin Active,1.277055,103.846435,Café,4e256401a80967678ca3d6f2,Da Pai Dang 大牌档,1.277222,103.846639,Café
2,Virgin Active,1.277055,103.846435,Café,4e23c2ca1495f18f036537c2,Share Tea 歇脚亭,1.277387,103.846283,Café
3,Virgin Active,1.277055,103.846435,Café,4d8987d9401a224b3bd88e18,Coffee & Toast,1.276724,103.846634,Café
4,Fitness First Platinum,1.277843,103.847654,Café,5625d21a498e596424dac931,Joe & Dough,1.277715,103.847619,Café


In [46]:
shenton_gym_freq=pd.crosstab(nearby_shenton.Venue,nearby_shenton.Venue_Main_Category,margins=False)
shenton_gym_freq.sort_values('Gym',ascending=False)

Venue_Main_Category,Café,Gym,Other food
Venue,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Bodytec Studio,3,4,6
Athlete Lab,4,3,6
Uppercut Boxing,2,3,5
STILL,2,3,2
Haus Athletics,1,3,0
GuavaLabs,1,3,0
Hale Yoga Studio,2,2,3
Anytime Fitness Cecil Street,1,2,0
URA Gymnasium @ 4th Floor,1,2,4
Sweatbox,1,2,0


In [47]:
shenton_gym[['Venue','Venue_Latitude','Venue_Longitude']]

Unnamed: 0,Venue,Venue_Latitude,Venue_Longitude
655,Virgin Active,1.277055,103.846435
656,Fitness First Platinum,1.277843,103.847654
657,Vanda Boxing Club,1.279218,103.849207
658,GIC Gym Studio,1.277402,103.847426
659,STILL,1.277296,103.848878
660,GuavaLabs,1.277162,103.84876
661,Haus Athletics,1.277023,103.848485
662,Ritual Gym,1.27824,103.848499
663,Gym @ The Clift,1.279529,103.847373
664,Anytime Fitness,1.276658,103.845944


In [48]:
shenton_gym_freq2=pd.merge(shenton_gym_freq,shenton_gym[['Venue','Venue_Latitude','Venue_Longitude']],on='Venue',how='left')
shenton_gym_freq2

Unnamed: 0,Venue,Café,Gym,Other food,Venue_Latitude,Venue_Longitude
0,Anytime Fitness,7,1,9,1.276658,103.845944
1,Anytime Fitness Cecil Street,1,2,0,1.279642,103.848224
2,Athlete Lab,4,3,6,1.280222,103.846832
3,Bodytec Studio,3,4,6,1.280082,103.847438
4,Boulder Movement,1,2,2,1.277651,103.84888
5,Fitness Chemistry,1,2,4,1.279504,103.84656
6,Fitness First Platinum,1,2,2,1.277843,103.847654
7,Freedom Yoga,0,2,0,1.279755,103.848431
8,GIC Gym Studio,4,1,2,1.277402,103.847426
9,GuavaLabs,1,3,0,1.277162,103.84876


In [49]:
shenton_gym_freq3=shenton_gym_freq2.drop('Venue', 1)

In [50]:
kclusters = 4
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(shenton_gym_freq3)

# check cluster labels generated for each row in the dataframe
print(kmeans.labels_[0:30])
print(len(kmeans.labels_))

[3 2 1 1 2 1 2 2 0 2 1 1 2 0 0 2 2 1 1 0 0 0]
22


In [51]:
shenton_gym_freq2['Cluster_Labels'] = kmeans.labels_
shenton_gym_freq2.sort_values('Cluster_Labels',ascending=False)

Unnamed: 0,Venue,Café,Gym,Other food,Venue_Latitude,Venue_Longitude,Cluster_Labels
0,Anytime Fitness,7,1,9,1.276658,103.845944,3
1,Anytime Fitness Cecil Street,1,2,0,1.279642,103.848224,2
4,Boulder Movement,1,2,2,1.277651,103.84888,2
16,Sweatbox,1,2,0,1.276672,103.84869,2
6,Fitness First Platinum,1,2,2,1.277843,103.847654,2
7,Freedom Yoga,0,2,0,1.279755,103.848431,2
15,STILL,2,3,2,1.277296,103.848878,2
9,GuavaLabs,1,3,0,1.277162,103.84876,2
12,Haus Athletics,1,3,0,1.277023,103.848485,2
18,Uppercut Boxing,2,3,5,1.28033,103.847431,1


In [54]:
shenton=shenton_gym_freq2
geo = Nominatim(user_agent='My-IBMNotebook')
address = 'Singapore'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map
map_shenton2 = folium.Map(location=[latitude, longitude], tiles="Openstreetmap", zoom_start=11)

# set color scheme for the clusters
x = np.arange(4)
ys = [i+x+(i*x)**2 for i in range(4)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(shenton['Venue_Latitude'], shenton['Venue_Longitude'], shenton['Venue'], shenton['Cluster_Labels']):        
    label = folium.Popup(str(poi) + ' cluster:' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=1).add_to(map_shenton2)
       
map_shenton2

Relatively cluster 2 is gym indensitive and less Cafe around area. It is recomended.