<b><font size = 5>Introduction & Problem</font></b>
   

Singapore is a well-known food paradise, shaped by its multiracial demographics, cross-cultural historical roots, and a general, insatiable appetite for good food. As Singapore is a small country, tourists tend to visit for a short duration, and consequently they have only a limited number of meals in which to sample the many food gems out there. 

With Foursquare, the best food places can be shortlisted by crowdsourcing the ratings of each food place from the combined experience of locals and tourists. This Best Food Places list can then be used by tourists to plan their itinerary in Singapore. 


<b><font size = 5>Data to be used</font></b>

1. Areas in Singapore 
2. Food venues in Singapore sorted by ratings 

Areas in Singapore will be derived from the 'Town' values listed in a dataset published on https://data.gov.sg. In this case, the median rent by town and flat type dataset at https://data.gov.sg/dataset/b35046dc-7428-4cff-968d-ef4c3e9e6c99 will be used. Town coordinates will be retrieved with the Nominatim geocoder through geopy. 
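
As a rough illustration of how a town list can be derived from the rent dataset, the sketch below collects the distinct values of the `town` column using only the standard library. The inline rows are sample data: the ANG MO KIO rows mirror the dataset preview shown in this notebook, the BEDOK row is invented for illustration, and `unique_towns` is a hypothetical helper name.

```python
import csv
import io

# Sample rows in the layout of median-rent-by-town-and-flat-type.csv.
# The BEDOK row is invented purely for illustration.
SAMPLE_CSV = """quarter,town,flat_type,median_rent
2005-Q2,ANG MO KIO,3-RM,800
2005-Q2,ANG MO KIO,4-RM,950
2005-Q2,BEDOK,3-RM,na
"""

def unique_towns(csv_text):
    """Return the distinct town names in sorted order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row["town"].strip() for row in reader})

towns = unique_towns(SAMPLE_CSV)
print(towns)
```

Every town appears once per quarter and flat type in the source file, so collecting the names into a set is enough to remove the repeats.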

Food venues in Singapore sorted by ratings will be obtained from Foursquare. The venue explore and search endpoints will be used to get each venue, its category of establishment, and its coordinates; the rating of a venue comes from the venue details endpoint.
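
The request URLs later in this notebook are assembled by string formatting. A minimal sketch of the same idea with `urllib.parse.urlencode` is shown here, assuming the query parameters that appear in the notebook (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`, `categoryId`). The helper name `build_search_url` and the `MY_ID` / `MY_SECRET` placeholders are hypothetical.

```python
from urllib.parse import urlencode

FOURSQUARE_SEARCH_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_url(client_id, client_secret, version, lat, lng,
                     category_id, radius=500, limit=80):
    """Assemble a venue search URL; urlencode escapes every parameter value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
        "categoryId": category_id,
    }
    # safe="," keeps the comma in "lat,lng" readable instead of %2C
    return FOURSQUARE_SEARCH_URL + "?" + urlencode(params, safe=",")

# Placeholder credentials; the coordinates are those of ANG MO KIO
url = build_search_url("MY_ID", "MY_SECRET", 20190102, 1.369842, 103.846609,
                       "4d4b7105d754a06374d81259")
print(url)
```

Building the query from a dictionary avoids misplaced `?` and `&` separators and escapes any special characters in the values.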



<b><font size = 5>Method</font></b>

Singapore will be segmented into areas, and each food place will be grouped under the area it belongs to. Knowing how the food scene is distributed across the island also helps a tourist decide where to arrange accommodation. 

<b>Setup & import libraries</b>

In [1]:
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes
!pip install wget

print("completed")

Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.





Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.





completed


In [2]:
import numpy as np 
import pandas as pd 

import json 
from geopy.geocoders import Nominatim
from pandas.io.json import json_normalize 
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
import folium 

import requests 
import lxml.html as lh
import bs4 as bs
from bs4 import BeautifulSoup
import urllib.request

print("completed")

completed


In [3]:
# to automate csv format downloads

from IPython.display import HTML
import base64

def create_download_link( df, title = "Download CSV file", filename = "data.csv"):  
    csv = df.to_csv()
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload,title=title,filename=filename)
    return HTML(html)

print("completed")

completed


<b>Singapore towns retrieval</b>

In [4]:
import zipfile
import os
import csv

SG_TOWNS_DATA = pd.read_csv("median-rent-by-town-and-flat-type.csv")
SG_TOWNS_DATA.rename(columns = {'town':'Town'}, inplace = True)
SG_TOWNS_DATA.head()

Unnamed: 0,quarter,Town,flat_type,median_rent
0,2005-Q2,ANG MO KIO,1-RM,na
1,2005-Q2,ANG MO KIO,2-RM,na
2,2005-Q2,ANG MO KIO,3-RM,800
3,2005-Q2,ANG MO KIO,4-RM,950
4,2005-Q2,ANG MO KIO,5-RM,-


In [5]:
SG_TOWNS_LIST= SG_TOWNS_DATA.drop(['quarter','flat_type','median_rent'], axis = 1)
SG_TOWNS = SG_TOWNS_LIST.groupby(['Town']).count().reset_index()
SG_TOWNS.head()

Unnamed: 0,Town
0,ANG MO KIO
1,BEDOK
2,BISHAN
3,BUKIT BATOK
4,BUKIT MERAH


In [6]:
SG_TOWNS

Unnamed: 0,Town
0,ANG MO KIO
1,BEDOK
2,BISHAN
3,BUKIT BATOK
4,BUKIT MERAH
5,BUKIT PANJANG
6,BUKIT TIMAH
7,CENTRAL
8,CHOA CHU KANG
9,CLEMENTI


In [7]:
SG_TOWNS['Latitude'] = 0.0
SG_TOWNS['Longitude'] = 0.0

In [8]:
SG_TOWNS

Unnamed: 0,Town,Latitude,Longitude
0,ANG MO KIO,0.0,0.0
1,BEDOK,0.0,0.0
2,BISHAN,0.0,0.0
3,BUKIT BATOK,0.0,0.0
4,BUKIT MERAH,0.0,0.0
5,BUKIT PANJANG,0.0,0.0
6,BUKIT TIMAH,0.0,0.0
7,CENTRAL,0.0,0.0
8,CHOA CHU KANG,0.0,0.0
9,CLEMENTI,0.0,0.0


In [9]:
geo = Nominatim(user_agent='Mypythonapi')
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for idx,town in SG_TOWNS['Town'].items():
    coord = geo.geocode(town + ' ' + "Singapore", timeout = 10)

    if coord:
        SG_TOWNS.loc[idx,'Latitude'] = coord.latitude
        SG_TOWNS.loc[idx,'Longitude'] = coord.longitude
    else:
        # town could not be geocoded: mark its coordinates as missing
        SG_TOWNS.loc[idx,'Latitude'] = np.nan
        SG_TOWNS.loc[idx,'Longitude'] = np.nan
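
The geocoding loop can also be written as a small function that is easy to try without network access. The sketch below is only an illustration: `geocode_towns` is a hypothetical helper, and the geocoder is passed in as a callable that returns `(lat, lon)` or `None`. With geopy, that callable could wrap `geo.geocode` and return `(location.latitude, location.longitude)`.

```python
import math
import time

def geocode_towns(towns, geocode_fn, suffix="Singapore", delay=0.0):
    """Return {town: (lat, lon)}; towns that cannot be resolved map to (nan, nan)."""
    coords = {}
    for town in towns:
        result = geocode_fn("{} {}".format(town, suffix))
        if result is None:
            coords[town] = (math.nan, math.nan)
        else:
            coords[town] = (result[0], result[1])
        time.sleep(delay)  # pause between lookups to respect usage limits
    return coords

# Stand-in geocoder for the example: a lookup table instead of a network call.
FAKE_INDEX = {"BEDOK Singapore": (1.323976, 103.930216)}
coords = geocode_towns(["BEDOK", "ATLANTIS"], FAKE_INDEX.get)
print(coords)
```

When calling the public Nominatim service, a `delay` of at least one second between requests is advisable, since its usage policy limits clients to one request per second.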

In [10]:
SG_TOWNS

Unnamed: 0,Town,Latitude,Longitude
0,ANG MO KIO,1.369842,103.846609
1,BEDOK,1.323976,103.930216
2,BISHAN,1.351455,103.848263
3,BUKIT BATOK,1.349057,103.749591
4,BUKIT MERAH,1.280628,103.830591
5,BUKIT PANJANG,1.377921,103.771866
6,BUKIT TIMAH,1.35469,103.776372
7,CENTRAL,1.290475,103.852036
8,CHOA CHU KANG,1.38926,103.743728
9,CLEMENTI,1.314026,103.76241
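
Free-text geocoding can occasionally resolve a name to the wrong place, so a quick sanity check on the coordinates may be worthwhile. The sketch below flags any town whose coordinates fall outside a rough bounding box around Singapore. The box limits are approximate values chosen for this illustration, not an official boundary, and `outside_singapore` is a hypothetical helper.

```python
# Approximate bounding box for Singapore (an assumption for this sketch).
SG_LAT_RANGE = (1.15, 1.48)
SG_LON_RANGE = (103.59, 104.10)

def outside_singapore(coords):
    """Return the names whose (lat, lon) are missing or outside the box."""
    flagged = []
    for name, (lat, lon) in coords.items():
        # comparisons with NaN are False, so missing values are flagged too
        inside = (SG_LAT_RANGE[0] <= lat <= SG_LAT_RANGE[1]
                  and SG_LON_RANGE[0] <= lon <= SG_LON_RANGE[1])
        if not inside:
            flagged.append(name)
    return flagged

# Two rows from the table above plus an unresolved placeholder row
sample = {
    "ANG MO KIO": (1.369842, 103.846609),
    "CENTRAL": (1.290475, 103.852036),
    "UNRESOLVED": (0.0, 0.0),
}
flagged = outside_singapore(sample)
print(flagged)
```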


In [11]:
SG_TOWNS.set_index("Town")

Town,Latitude,Longitude
ANG MO KIO,1.369842,103.846609
BEDOK,1.323976,103.930216
BISHAN,1.351455,103.848263
BUKIT BATOK,1.349057,103.749591
BUKIT MERAH,1.280628,103.830591
BUKIT PANJANG,1.377921,103.771866
BUKIT TIMAH,1.35469,103.776372
CENTRAL,1.290475,103.852036
CHOA CHU KANG,1.38926,103.743728
CLEMENTI,1.314026,103.76241


In [12]:
geo = Nominatim(user_agent='My-IBMNotebook')
address = 'Singapore'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude


# create map of Singapore using latitude and longitude values
map_singapore = folium.Map(location=[latitude, longitude],tiles="OpenStreetMap", zoom_start=10)

# add markers to map
for lat, lng, town in zip(
    SG_TOWNS['Latitude'],
    SG_TOWNS['Longitude'],
    SG_TOWNS['Town']):
    label = town
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#87cefa',
        fill_opacity=0.5,
        parse_html=False).add_to(map_singapore)
    
map_singapore

<b>Retrieving data from Foursquare</b>

In [13]:

import os
# Credentials are read from environment variables (hypothetical names) so the secret is not stored in the notebook
CLIENT_ID     = os.environ["FOURSQUARE_CLIENT_ID"]
CLIENT_SECRET = os.environ["FOURSQUARE_CLIENT_SECRET"]
VERSION       = 20190102
LIMIT         = 80

In [14]:
import time

FOURSQUARE_EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore?'
FOURSQUARE_SEARCH_URL = 'https://api.foursquare.com/v2/venues/search?'

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Explore venues around each town and return them as one DataFrame."""
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('getNearbyVenues',name)
        
        # create the API request URL (the base URL already ends with '?')
        url = '{}client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            FOURSQUARE_EXPLORE_URL,CLIENT_ID,CLIENT_SECRET,VERSION,
            lat,lng,radius,LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,lat,lng, 
            v['venue']['id'],v['venue']['name'], 
            v['venue']['location']['lat'],v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        time.sleep(2)  # pause between requests

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    # eight column names, matching the eight values collected per venue above
    nearby_venues.columns = ['Town','Town Latitude','Town Longitude','Venue ID','Venue','Venue Latitude','Venue Longitude','Venue Category']
    
    return nearby_venues

In [15]:
FOURSQUARE_SEARCH_URL = 'https://api.foursquare.com/v2/venues/search?'


# Dataframe : venue_id_recover 
# - store venue id to recover failed venues id score retrieval later if foursquare limit is exceeded when getting score.
venue_id_rcols = ['VenueID']
venue_id_recover = pd.DataFrame(columns=venue_id_rcols)

def getVenuesByCategory(names, latitudes, longitudes, categoryID, radius=500):
    """Search Foursquare for venues of one category around each town."""
    venue_columns = ['Town','Town Latitude','Town Longitude','VenueID','VenueName','score','category','catID','latitude','longitude']
    rows = []
    print("[#Start getVenuesByCategory]")
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        print(name,",",end='')
        # create the API request URL
        url = '{}client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            FOURSQUARE_SEARCH_URL,CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,LIMIT,categoryID)
        # make the GET request
        results = requests.get(url).json()
        
        # Collect one row per venue in the JSON response.
        # Rows are gathered in a list because DataFrame.append was removed in pandas 2.0.
        for jsonSub in results['response']['venues']:
            try:
                rows.append({
                    'Town'      : name,
                    'Town Latitude' : lat,
                    'Town Longitude': lng,
                    'VenueID'   : jsonSub['id'],
                    'VenueName' : jsonSub['name'],
                    'score'     : 'nan',  # placeholder until the rating is retrieved
                    'category'  : jsonSub['categories'][0]['pluralName'],
                    'catID'     : jsonSub['categories'][0]['id'],
                    'latitude'  : jsonSub['location']['lat'],
                    'longitude' : jsonSub['location']['lng']})
            except (KeyError, IndexError):
                # The venue record is incomplete (for example, no category): skip it
                continue
    # END OF LOOP, return.

    print("\n[#Done getVenuesByCategory]")
    return pd.DataFrame(rows, columns=venue_columns)
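
The search function above indexes straight into `results['response']['venues']`, which raises a `KeyError` when a request is rejected (for example, once the quota is exceeded). Foursquare v2 responses carry a `meta` object with a status `code`, so a small check can surface the reason instead. The sketch below is an illustration only: `extract_venues` is a hypothetical helper and both payloads are hand-written samples, not real API output.

```python
def extract_venues(payload):
    """Return the venue list from a search response, or raise with the API's error."""
    meta = payload.get("meta", {})
    if meta.get("code") != 200:
        raise RuntimeError("Foursquare request failed: {} {}".format(
            meta.get("code"), meta.get("errorType", "unknown error")))
    return payload.get("response", {}).get("venues", [])

# Hand-written sample payloads
ok_payload = {"meta": {"code": 200},
              "response": {"venues": [{"id": "abc", "name": "Demo Stall"}]}}
quota_payload = {"meta": {"code": 429, "errorType": "quota_exceeded"},
                 "response": {}}

venues = extract_venues(ok_payload)
print(len(venues))

try:
    extract_venues(quota_payload)
except RuntimeError as err:
    print(err)
```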

In [16]:
FOURSQUARE_SEARCH_URL = 'https://api.foursquare.com/v2/venues/search?'
# SEARCH VENUES BY CATEGORY

# Dataframe : venue_id_recover 
# - store venue id to recover failed venues id score retrieval later if foursquare limit is exceeded when getting score.
venue_id_rcols = ['VenueID','Score']
venue_id_recover = pd.DataFrame(columns=venue_id_rcols)

def getVenuesIDScore(venueID):
    """Return ["success", rating] for a venue, or ["error", 0.0] if the lookup fails."""
    global venue_id_recover
    print("[#getVenuesIDScore]", venueID)
    venID_URL = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venueID,CLIENT_ID,CLIENT_SECRET,VERSION)
    # the URL is not printed because it contains the client secret
    try:
        venID_result = requests.get(venID_URL).json()
        venID_score  = venID_result['response']['venue']['rating']
    except (requests.RequestException, ValueError, KeyError):
        # Remember the venue so that its score can be retrieved again later
        failed = pd.DataFrame([{'VenueID' : venueID, 'Score' : 0.0}])
        venue_id_recover = pd.concat([venue_id_recover, failed], ignore_index=True)
        return ["error",0.0]
    return ["success",venID_score]
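
The notebook stores the IDs of venues whose score lookup failed so that they can be retried later. One possible shape for that recovery pass is sketched below. `retry_failed_scores` is a hypothetical helper, and `fetch_score` stands for any function with the same return convention as `getVenuesIDScore` (`["success", score]` or `["error", 0.0]`). The fake fetcher and its ratings are invented for the example.

```python
def retry_failed_scores(venue_ids, fetch_score, max_rounds=3):
    """Retry score lookups; return (scores, ids that still fail)."""
    scores = {}
    pending = list(venue_ids)
    for _ in range(max_rounds):
        still_failing = []
        for venue_id in pending:
            status, score = fetch_score(venue_id)
            if status == "success":
                scores[venue_id] = score
            else:
                still_failing.append(venue_id)
        pending = still_failing
        if not pending:
            break
    return scores, pending

# Fake fetcher: "v2" fails once and then succeeds, "v3" always fails.
attempts = {}
def fake_fetch(venue_id):
    attempts[venue_id] = attempts.get(venue_id, 0) + 1
    if venue_id == "v3" or (venue_id == "v2" and attempts[venue_id] == 1):
        return ["error", 0.0]
    return ["success", 8.1]

scores, unresolved = retry_failed_scores(["v1", "v2", "v3"], fake_fetch)
print(scores, unresolved)
```

In practice a pause between rounds (for instance with `time.sleep`) gives a rate limit time to reset before the next attempt.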

In [17]:
SG_TOWNS.dtypes

Town          object
Latitude     float64
Longitude    float64
dtype: object

In [36]:
venue_columns = ['Town','Town Latitude','Town Longitude','VenueID','VenueName','score','category','catID','latitude','longitude']
singapore_town_venues = pd.DataFrame(columns=venue_columns)

In [37]:
# Food Venues : Restaurants, Fastfoods, Etc
categoryID = "4d4b7105d754a06374d81259"
town_names = SG_TOWNS['Town']
lat_list   = SG_TOWNS['Latitude']
lng_list   = SG_TOWNS['Longitude']

singapore_food_venues = getVenuesByCategory(names=town_names,latitudes=lat_list,longitudes=lng_list,categoryID=categoryID)

[#Start getVenuesByCategory]
ANG MO KIO ,BEDOK ,BISHAN ,BUKIT BATOK ,BUKIT MERAH ,BUKIT PANJANG ,BUKIT TIMAH ,CENTRAL ,CHOA CHU KANG ,CLEMENTI ,GEYLANG ,HOUGANG ,JURONG EAST ,JURONG WEST ,KALLANG/WHAMPOA ,LIM CHU KANG ,MARINE PARADE ,PASIR RIS ,PUNGGOL ,QUEENSTOWN ,SEMBAWANG ,SENGKANG ,SERANGOON ,TAMPINES ,TOA PAYOH ,WOODLANDS ,YISHUN ,
[#Done getVenuesByCategory]
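
The Results cells refer to `kmeans` and `kclusters`, whose definitions are not among the cells shown here. As a hedged illustration of how towns could be clustered by their mix of venue categories with scikit-learn's `KMeans` (imported earlier), the sketch below one-hot encodes the category, averages per town, and fits two clusters. The toy table, the `_demo` names, and the choice of two clusters are all assumptions made for this example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy venue table with the 'Town' and 'category' columns (values are invented).
venues_demo = pd.DataFrame({
    "Town": ["A"] * 5 + ["B"] * 2 + ["C"] * 5 + ["D"] * 2,
    "category": (["Food Courts"] * 4 + ["Coffee Shops"]      # town A
                 + ["Food Courts"] * 2                        # town B
                 + ["Coffee Shops"] * 4 + ["Food Courts"]     # town C
                 + ["Coffee Shops"] * 2),                     # town D
})

# One-hot encode the category, then average per town to get category frequencies.
onehot = pd.get_dummies(venues_demo["category"]).astype(float)
onehot.insert(0, "Town", venues_demo["Town"])
town_profile = onehot.groupby("Town").mean()

kclusters_demo = 2  # the number of clusters is a free choice in this sketch
kmeans_demo = KMeans(n_clusters=kclusters_demo, n_init=10, random_state=0)
kmeans_demo.fit(town_profile)

# Map each town to its cluster label
labels = dict(zip(town_profile.index, kmeans_demo.labels_))
print(labels)
```

Towns dominated by the same category end up with the same label; the label numbers themselves are arbitrary.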


<b><font size = 5>Results</font></b>
    

In [None]:
town_venues_sorted = town_venues_sorted.set_index("Town")
sg_merged = SG_TOWNS.set_index("Town")

sg_merged['Cluster Labels'] = kmeans.labels_

sg_merged = sg_merged.join(town_venues_sorted)
sg_merged

In [None]:
map_clusters = folium.Map(location=[latitude, longitude], tiles="Openstreetmap", zoom_start=11)

x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(sg_merged['Latitude'], sg_merged['Longitude'], sg_merged.index.values,kmeans.labels_):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=1).add_to(map_clusters)
       
map_clusters

<b><font size = 5>Conclusion</font></b>

The most common food venue categories, in descending order of popularity, are Food Court, Coffee Shop, and Fast Food Restaurant. The highest-rated Food Courts are located in the Marine Parade and Central areas. 
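
A ranking of this kind can be computed from the venue table by counting categories and comparing ratings. The sketch below shows one way to do so with the standard library. The records, town names, and ratings are invented for illustration and are not the results of this analysis.

```python
from collections import Counter

# Invented venue records: (town, category, rating)
records = [
    ("Town A", "Food Court", 8.0),
    ("Town B", "Food Court", 7.9),
    ("Town C", "Food Court", 6.5),
    ("Town C", "Coffee Shop", 7.0),
    ("Town B", "Coffee Shop", 6.8),
    ("Town A", "Fast Food Restaurant", 6.0),
]

# Popularity: number of venues per category, most frequent first
category_counts = Counter(category for _, category, _ in records)
ranking = [name for name, _ in category_counts.most_common()]

# Highest-rated venue within one category
food_courts = [r for r in records if r[1] == "Food Court"]
best_food_court = max(food_courts, key=lambda r: r[2])

print(ranking)
print(best_food_court)
```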