# The Effects of Inequality in Singapore

## Problem

Inequality is a growing problem in Singapore, a city built on extracting wealth from the global elite and channelling it, through the government's developmental state, into further investment, creating a positive wealth cycle. Some inequality is inevitable: part of the population cannot keep up with the rate of change in the global economy, needs to upskill but is unable to, lacks starting capital, or is burdened by debt. However, anecdotal observations suggest that the parts of Singapore catering to high-income individuals (e.g. the Central Business District, or luxury property districts such as Tanglin) are diverging noticeably from the rest of the city, both in their public facilities and in the types of private businesses operating there.


## Methodology

This project aims to investigate whether the differences in physical spaces can be linked to income levels within each neighbourhood, by clustering neighbourhoods based on public facilities (number of parks, number of schools, number of transport nodes) and private businesses. These clusters will then be compared against known income levels of each neighbourhood to determine if there is a significant relationship between income and the physical development of shared spaces.


## Data Sources

Public infrastructure data will come from the Singapore government's Onemap API, the Land Transport Authority Datamall API, and the data available on GovTech's data.gov.sg. 

Private business data will come from the Foursquare API. 

Both datasets will be combined with planning-area data from Onemap into a single dataset.

In [1]:
import requests
import pandas as pd
import json
import numpy as np

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
import os

# Read the Onemap access token from the environment rather than hard-coding it
# (the variable name ONEMAP_TOKEN is an assumption; use whatever name you export)
access_token = os.environ["ONEMAP_TOKEN"]

### Getting Singapore's Planning Areas

Using the Onemap API, I downloaded the list of planning areas. 

In [3]:
response=requests.get('https://developers.onemap.sg/privateapi/popapi/getPlanningareaNames?token='+access_token+'&year=2014')

In [4]:
data=json.loads(response.text)

In [5]:
planning_areas=pd.DataFrame.from_dict(data)

In [6]:
planning_areas=planning_areas.sort_values(by='pln_area_n', ascending=True)

In [7]:
planning_areas=planning_areas.drop(columns='id')
planning_areas

Unnamed: 0,pln_area_n
31,ANG MO KIO
20,BEDOK
32,BISHAN
19,BOON LAY
33,BUKIT BATOK
21,BUKIT MERAH
22,BUKIT PANJANG
24,BUKIT TIMAH
25,CENTRAL WATER CATCHMENT
26,CHANGI


In [8]:
planning_areas.reset_index(inplace=True)

In [9]:
planning_areas=planning_areas.drop(columns='index')

In [10]:
planning_areas

Unnamed: 0,pln_area_n
0,ANG MO KIO
1,BEDOK
2,BISHAN
3,BOON LAY
4,BUKIT BATOK
5,BUKIT MERAH
6,BUKIT PANJANG
7,BUKIT TIMAH
8,CENTRAL WATER CATCHMENT
9,CHANGI
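The sort, drop, and index-reset steps above can also be written as one chained expression. This is a minimal sketch, assuming the API returns a list of records with `id` and `pln_area_n` keys (as the cells above imply). The `sample` payload and its `id` values are made up for illustration.

```python
import pandas as pd

# Hypothetical payload shaped like the Onemap response used above;
# the id values are placeholders, not real API output.
sample = [
    {"id": 1, "pln_area_n": "BISHAN"},
    {"id": 2, "pln_area_n": "ANG MO KIO"},
    {"id": 3, "pln_area_n": "BEDOK"},
]

planning_areas_demo = (
    pd.DataFrame(sample)
    .drop(columns="id")             # the id column is not needed
    .sort_values(by="pln_area_n")   # alphabetical order
    .reset_index(drop=True)         # renumber rows from 0 and discard the old index
)
print(planning_areas_demo)
```

Passing `drop=True` to `reset_index` avoids creating the extra `index` column that the notebook removes in a separate cell.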


### Getting Number of Schools in each Planning Area

Starting from the School Directory dataset on data.gov.sg, I used an online batch geocoder to obtain the coordinates of each school, then passed them through the Onemap API's Planning Area Query to find the planning area each school falls in and so count the schools per planning area. The results were saved to a CSV file.


In [11]:
coordinates=pd.read_csv("schools.csv")
coordinates.head()

Unnamed: 0,lat,long
0,1.354756,103.84662
1,1.298644,103.882094
2,1.374109,103.834462
3,1.34389,103.709446
4,1.329967,103.94148


In [15]:
latitude=coordinates[0:2]['lat'].tolist()
longitude=coordinates[0:2]['long'].tolist()
length=len(longitude)

In [13]:
def get_pln_areas(lat, long):
    """Return the name of the planning area containing the point (lat, long)."""
    geocode_url = "http://developers.onemap.sg/privateapi/popapi/getPlanningarea?token="+access_token+"&lat={}".format(lat) + "&lng={}".format(long) + "&year=2014"
    results=requests.get(geocode_url)
    xdd=json.loads(results.text)
    # the API returns a list of matches; use the first one
    return xdd[0]['pln_area_n']

In [16]:
geocode_results=[]
for coords in range(length):
    # look up the planning area for each coordinate pair and collect the names
    geocode_results.append(get_pln_areas(latitude[coords], longitude[coords]))
    # report progress after every 50 lookups
    if len(geocode_results) % 50 == 0:
        print("completed {} out of {}".format(len(geocode_results), length))
print("finished geocoding all addresses")
geocode_results

finished geocoding all addresses


['BISHAN', 'KALLANG']
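The notebook does not show how the list of planning-area names is turned into a count per planning area. One way to do it, shown as a sketch rather than the method actually used, is `value_counts`. The input list `geocode_demo` is made up for illustration.

```python
import pandas as pd

# Hypothetical geocoding output: one planning-area name per school
geocode_demo = ["BISHAN", "KALLANG", "BISHAN", "BEDOK"]

school_counts = (
    pd.Series(geocode_demo)
    .value_counts()                          # occurrences of each name
    .rename_axis("Planning Area")            # name the index before turning it into a column
    .reset_index(name="number of schools")
)
print(school_counts)
```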

### Getting Number of Parks in each Planning Area

Government investment may also affect the amount of leisure space in each neighbourhood. I therefore took the NParks dataset from the same website (data.gov.sg) and passed the park coordinates through the Onemap API again to count the parks in each planning area.

In [17]:
coordinates1=pd.read_csv("park_coords.csv")
coordinates1.head()

Unnamed: 0,lat,long
0,1.288312,103.852346
1,1.294018,103.84542
2,1.299254,103.843993
3,1.365191,103.836444
4,1.371724,103.832808


In [18]:
latitude1=coordinates1[301:390]['lat'].tolist()
longitude1=coordinates1[301:390]['long'].tolist()
length1=len(longitude1)

In [82]:
def get_pln_areas(lat, long):
    """Return the name of the planning area containing the point (lat, long)."""
    geocode_url = "http://developers.onemap.sg/privateapi/popapi/getPlanningarea?token="+access_token+"&lat={}".format(lat) + "&lng={}".format(long) + "&year=2014"
    results=requests.get(geocode_url)
    xdd=json.loads(results.text)
    # the API returns a list of matches; use the first one
    return xdd[0]['pln_area_n']

In [5]:
geocode_results_parks=[]
for coords in range(length1):
    # look up the planning area for each park and collect the names
    geocode_results_parks.append(get_pln_areas(latitude1[coords], longitude1[coords]))
    # report progress after every 50 lookups
    if len(geocode_results_parks) % 50 == 0:
        print("completed {} out of {}".format(len(geocode_results_parks), length1))
print("finished geocoding all addresses")
geocode_results_parks

NameError: name 'length1' is not defined

### Preparing for Clustering 

In [43]:
dataset=pd.read_csv("inequality.csv")

In [44]:
inequality=pd.DataFrame(dataset)

In [45]:
inequality.head()

Unnamed: 0,Planning Area,% of population under $3000,number of schools,number of parks,number of bus stops
0,Ang Mo Kio,43.379447,14,30,168
1,Bedok,38.057219,24,55,281
2,Bishan,30.505051,10,16,95
3,Bukit Batok,37.698413,13,10,161
4,Bukit Merah,43.309002,10,5,179


In [164]:
dgp=inequality['Planning Area'].tolist()
n=len(dgp)


Getting the coordinates of each of the Planning Areas. 

In [165]:
lel=", Singapore"
geolocator = Nominatim(user_agent="to_explorer")
latitude=[]
longitude=[]
for i in range(n):
    location = geolocator.geocode(dgp[i]+lel)
    latitude.append(location.latitude)
    longitude.append(location.longitude)
latitude    

[1.3700803,
 1.3239765,
 1.3509859,
 1.3490572,
 1.2704395,
 1.3791486,
 1.3546901,
 1.3847493,
 1.3151003,
 1.3181862,
 1.3708011,
 1.333108,
 1.3396365,
 1.310759,
 1.3026889,
 1.3205257000000001,
 1.2828695,
 1.3730307,
 1.40519735,
 1.2946226,
 1.4490928,
 1.3919236499999998,
 1.3497610500000001,
 1.3546528,
 1.3060443,
 1.3353906,
 1.436897,
 1.4293839]

In [166]:
longitude

[103.8495228,
 103.930216,
 103.84825507492937,
 103.7495906,
 103.82831840176755,
 103.76141301431002,
 103.7763724,
 103.7445341,
 103.7652311,
 103.8870563,
 103.89254433997465,
 103.7422939,
 103.7073387,
 103.866262,
 103.9073952,
 103.84388133927948,
 103.8378603,
 103.949255,
 103.90234976571602,
 103.8060366,
 103.8200555,
 103.89549093760694,
 103.87368414801405,
 103.9435712,
 103.8152804,
 103.8497414,
 103.786216,
 103.8350282]
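The notebook does not show how these lists became the `Lat of PA` and `Long of PA` columns of `inequality_better.csv`. One plausible route, given here as a sketch and not as the method actually used, is to assemble the names and coordinates into a DataFrame and write it out. Only the first three planning areas are included, and the output filename is hypothetical.

```python
import pandas as pd

# First three planning areas and the coordinates returned for them above
areas = ["Ang Mo Kio", "Bedok", "Bishan"]
lats = [1.3700803, 1.3239765, 1.3509859]
longs = [103.8495228, 103.930216, 103.84825507492937]

coords_demo = pd.DataFrame({
    "Planning Area": areas,
    "Lat of PA": lats,
    "Long of PA": longs,
})

# coords_demo.to_csv("planning_area_coords.csv")  # hypothetical filename
print(coords_demo.round(6))
```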

In [2]:
inequality_coords=pd.read_csv("inequality_better.csv")
inequality_coords.head()

Unnamed: 0,Planning Area,Pecent of population under 3000,number of schools,number of parks,Lat of PA,Long of PA
0,Ang Mo Kio,43.379447,14,30,1.37008,103.849523
1,Bedok,38.057219,24,55,1.323976,103.930216
2,Bishan,30.505051,10,16,1.350986,103.848255
3,Bukit Batok,37.698413,13,10,1.349057,103.749591
4,Bukit Merah,43.309002,10,5,1.270439,103.828318


In [3]:
latlong=inequality_coords.drop(columns=["Pecent of population under 3000", "number of schools", "number of parks"])
latlong


Unnamed: 0,Planning Area,Lat of PA,Long of PA
0,Ang Mo Kio,1.37008,103.849523
1,Bedok,1.323976,103.930216
2,Bishan,1.350986,103.848255
3,Bukit Batok,1.349057,103.749591
4,Bukit Merah,1.270439,103.828318
5,Bukit Panjang,1.379149,103.761413
6,Bukit Timah,1.35469,103.776372
7,Choa Chu Kang,1.384749,103.744534
8,Clementi,1.3151,103.765231
9,Geylang,1.318186,103.887056


In [169]:
dataset=pd.read_csv("inequality.csv")

In [170]:
inequality=pd.DataFrame(dataset)

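As a first test, the clustering is run without any commercial data, so it reflects only the governmental perspective. Note that only the `Planning Area` column is dropped before fitting, so the income column is still among the clustering features here.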

In [171]:
k=4
dgp=inequality.drop(columns="Planning Area") # keep only the numeric feature columns
knn=KMeans(n_clusters=k, random_state=0).fit(dgp)
inequality.insert(0, 'Cluster Labels', knn.labels_)
inequality.head()

Unnamed: 0,Cluster Labels,Planning Area,% of population under $3000,number of schools,number of parks,number of bus stops
0,2,Ang Mo Kio,43.379447,14,30,168
1,0,Bedok,38.057219,24,55,281
2,3,Bishan,30.505051,10,16,95
3,2,Bukit Batok,37.698413,13,10,161
4,2,Bukit Merah,43.309002,10,5,179


In [172]:
inequality

Unnamed: 0,Cluster Labels,Planning Area,% of population under $3000,number of schools,number of parks,number of bus stops
0,2,Ang Mo Kio,43.379447,14,30,168
1,0,Bedok,38.057219,24,55,281
2,3,Bishan,30.505051,10,16,95
3,2,Bukit Batok,37.698413,13,10,161
4,2,Bukit Merah,43.309002,10,5,179
5,3,Bukit Panjang,36.953808,13,9,103
6,3,Bukit Timah,17.934783,11,35,109
7,3,Choa Chu Kang,36.419753,14,1,122
8,3,Clementi,32.026144,9,12,103
9,2,Geylang,41.693811,10,13,156
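The notebook fixes `k=4` without stating why. One common sanity check is to compare silhouette scores across several values of k. The sketch below uses only the ten rows displayed above (not the full dataset) and only the three infrastructure columns, so its scores are illustrative and say nothing definitive about the right k for the full data.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# The ten rows shown above, infrastructure columns only
demo = pd.DataFrame({
    "number of schools":   [14, 24, 10, 13, 10, 13, 11, 14, 9, 10],
    "number of parks":     [30, 55, 16, 10, 5, 9, 35, 1, 12, 13],
    "number of bus stops": [168, 281, 95, 161, 179, 103, 109, 122, 103, 156],
})

scores = {}
for n_clusters in range(2, 6):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(demo)
    # silhouette lies in [-1, 1]; higher means tighter, better-separated clusters
    scores[n_clusters] = silhouette_score(demo, labels)

for n_clusters, score in scores.items():
    print(n_clusters, round(score, 3))
```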


In [173]:
merged=inequality.sort_values(by="Cluster Labels")
merged=merged.join(latlong.set_index("Planning Area"), on='Planning Area')
merged

Unnamed: 0,Cluster Labels,Planning Area,% of population under $3000,number of schools,number of parks,number of bus stops,Lat of PA,Long of PA
1,0,Bedok,38.057219,24,55,281,1.323976,103.930216
23,0,Tampines,38.089005,17,11,240,1.354653,103.943571
19,0,Queenstown,36.788618,10,8,226,1.294623,103.806037
10,0,Hougang,39.072848,17,18,202,1.370801,103.892544
12,0,Jurong West,39.193447,21,6,241,1.339636,103.707339
26,0,Woodlands,43.84273,23,4,206,1.436897,103.786216
24,1,Tanglin,12.222222,1,1,46,1.306044,103.81528
16,1,Outram,42.735043,1,5,28,1.282869,103.83786
14,1,Marine Parade,28.372093,6,8,56,1.302689,103.907395
0,2,Ang Mo Kio,43.379447,14,30,168,1.37008,103.849523
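To compare the clusters against income, as the Methodology section proposes, one simple starting point is a per-cluster summary of the income column. The sketch below uses only the ten rows displayed above, so the figures are partial. Because the income column was itself among the clustering features, any pattern seen here is partly built in.

```python
import pandas as pd

# The ten rows of `merged` shown above (cluster label, area, income share)
merged_demo = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 0, 0, 0, 1, 1, 1, 2],
    "Planning Area": ["Bedok", "Tampines", "Queenstown", "Hougang", "Jurong West",
                      "Woodlands", "Tanglin", "Outram", "Marine Parade", "Ang Mo Kio"],
    "% of population under $3000": [38.057219, 38.089005, 36.788618, 39.072848, 39.193447,
                                    43.842730, 12.222222, 42.735043, 28.372093, 43.379447],
})

income_by_cluster = (
    merged_demo.groupby("Cluster Labels")["% of population under $3000"]
    .agg(["count", "mean", "min", "max"])
)
print(income_by_cluster.round(2))
```

A wide min-max range inside a single cluster (such as Tanglin and Outram sharing cluster 1 in these rows) suggests the cluster is not tracking income closely.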


#### Visualising Initial Clusters

In [40]:
# initialising Singapore's coordinates for Folium
latitude=1.3521
longitude=103.8198

In [184]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one colour per cluster label
colors_array = cm.rainbow(np.linspace(0.2, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per planning area, coloured by its cluster label
for lat, lon, poi, cluster in zip(merged['Lat of PA'], merged['Long of PA'], merged['Planning Area'], merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Adding Foursquare Location Data

Foursquare venue data is now added to see whether the clustering changes and whether any patterns emerge.

In [4]:
import os

# Foursquare credentials are read from the environment instead of being hard-coded or printed
# (the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumption)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20200820' # Foursquare API version


In [5]:
latlong

Unnamed: 0,Planning Area,Lat of PA,Long of PA
0,Ang Mo Kio,1.37008,103.849523
1,Bedok,1.323976,103.930216
2,Bishan,1.350986,103.848255
3,Bukit Batok,1.349057,103.749591
4,Bukit Merah,1.270439,103.828318
5,Bukit Panjang,1.379149,103.761413
6,Bukit Timah,1.35469,103.776372
7,Choa Chu Kang,1.384749,103.744534
8,Clementi,1.3151,103.765231
9,Geylang,1.318186,103.887056


In [6]:
nb_lat=latlong.loc[0, 'Lat of PA']
nb_long=latlong.loc[0, 'Long of PA']

In [7]:
LIMIT = 200 # limit of number of venues returned by Foursquare API

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
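        # create the API request URL
        # note: the `radius` argument is not included in this request, so the search radius is left to the API
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,  
            LIMIT)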
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
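`getNearbyVenues` accepts a `radius` argument. If the search should be limited to that radius, one option is to pass the query as a `params` dictionary that includes `radius`. The sketch below only builds the URL and sends nothing. The helper name `build_explore_url` and the placeholder credentials are hypothetical.

```python
import requests

def build_explore_url(lat, lng, radius=1500, limit=200,
                      client_id="YOUR_CLIENT_ID",
                      client_secret="YOUR_CLIENT_SECRET",
                      version="20200820"):
    """Build (but do not send) a venues/explore request URL that includes the radius."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,   # search radius in metres
        "limit": limit,
    }
    request = requests.Request(
        "GET", "https://api.foursquare.com/v2/venues/explore", params=params
    )
    # prepare() encodes the query string without making a network call
    return request.prepare().url

print(build_explore_url(1.37008, 103.849523))
```

Letting `requests` encode the parameters also avoids hand-assembling the query string with `format`.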

In [9]:
sg_venues = getNearbyVenues(names=latlong['Planning Area'],
                                   latitudes=latlong['Lat of PA'],
                                   longitudes=latlong['Long of PA']
                                  )

Ang Mo Kio
Bedok
Bishan
Bukit Batok
Bukit Merah
Bukit Panjang
Bukit Timah
Choa Chu Kang
Clementi
Geylang
Hougang
Jurong East
Jurong West
Kallang
Marine Parade
Novena
Outram
Pasir Ris
Punggol
Queenstown
Sembawang
Sengkang
Serangoon
Tampines
Tanglin
Toa Payoh
Woodlands
Yishun


In [10]:
sg_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ang Mo Kio,1.37008,103.849523,FairPrice Xtra,1.369279,103.848886,Supermarket
1,Ang Mo Kio,1.37008,103.849523,Old Chang Kee,1.369094,103.848389,Snack Place
2,Ang Mo Kio,1.37008,103.849523,Face Ban Mian 非板面 (Ang Mo Kio),1.372031,103.847504,Noodle House
3,Ang Mo Kio,1.37008,103.849523,MOS Burger,1.36917,103.847831,Burger Joint
4,Ang Mo Kio,1.37008,103.849523,NTUC FairPrice,1.371507,103.847082,Supermarket


In [11]:
sg_onehot=pd.get_dummies(sg_venues['Venue Category'], prefix="", prefix_sep="")
sg_onehot["Planning Area"]=sg_venues["Neighborhood"]
fixed_columns = [sg_onehot.columns[-1]] + list(sg_onehot.columns[:-1])
sg_onehot = sg_onehot[fixed_columns]

sg_onehot.head()

Unnamed: 0,Planning Area,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Waterfront,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Ang Mo Kio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Ang Mo Kio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Ang Mo Kio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Ang Mo Kio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Ang Mo Kio,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [12]:
sg_grouped=sg_onehot.groupby("Planning Area").mean().reset_index()
sg_grouped

Unnamed: 0,Planning Area,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Waterfront,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Ang Mo Kio,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bedok,0.0,0.011628,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,...,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0
2,Bishan,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bukit Batok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bukit Merah,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,...,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.01,0.01
5,Bukit Panjang,0.0,0.019608,0.0,0.0,0.0,0.0,0.078431,0.0,0.0,...,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bukit Timah,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Choa Chu Kang,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Clementi,0.0,0.0125,0.0,0.0,0.0,0.0125,0.05,0.0,0.0,...,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Geylang,0.0,0.012346,0.0,0.0,0.0,0.0,0.061728,0.0,0.0,...,0.061728,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0
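Each value in `sg_grouped` is the share of a planning area's returned venues that fall in that category: one-hot encoding gives one indicator column per category, and the group mean of an indicator is a proportion. A small worked example using the five Ang Mo Kio venues shown earlier:

```python
import pandas as pd

# The five Ang Mo Kio venues from sg_venues.head()
venues_demo = pd.DataFrame({
    "Neighborhood": ["Ang Mo Kio"] * 5,
    "Venue Category": ["Supermarket", "Snack Place", "Noodle House",
                       "Burger Joint", "Supermarket"],
})

# one indicator column per category (as floats so the mean is a plain proportion)
onehot_demo = pd.get_dummies(venues_demo["Venue Category"], dtype=float)
onehot_demo["Planning Area"] = venues_demo["Neighborhood"]

# mean of each indicator = share of venues in that category
shares_demo = onehot_demo.groupby("Planning Area").mean()
print(shares_demo)
```

Two of the five venues are supermarkets, so the `Supermarket` share is 0.4, and the shares in each row sum to 1.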


In [13]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Planning Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # positions beyond 3rd have no entry in `indicators`
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
sg_sorted = pd.DataFrame(columns=columns)
sg_sorted['Planning Area'] = sg_grouped['Planning Area']

for ind in np.arange(sg_grouped.shape[0]):
    sg_sorted.iloc[ind, 1:] = return_most_common_venues(sg_grouped.iloc[ind, :], num_top_venues)

sg_sorted.head()

Unnamed: 0,Planning Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ang Mo Kio,Coffee Shop,Dessert Shop,Food Court,Japanese Restaurant,Supermarket,Snack Place,Fast Food Restaurant,Bubble Tea Shop,Asian Restaurant,Fried Chicken Joint
1,Bedok,Chinese Restaurant,Coffee Shop,Café,Food Court,Asian Restaurant,Bakery,Sandwich Place,Noodle House,Japanese Restaurant,Supermarket
2,Bishan,Food Court,Coffee Shop,Bubble Tea Shop,Café,Chinese Restaurant,Cosmetics Shop,Ice Cream Shop,Stadium,Supermarket,Thai Restaurant
3,Bukit Batok,Coffee Shop,Food Court,Chinese Restaurant,Fast Food Restaurant,Bus Station,Bus Stop,Bowling Alley,Malay Restaurant,Food & Drink Shop,Shopping Mall
4,Bukit Merah,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Café,Clothing Store,Fast Food Restaurant,Toy / Game Store,Multiplex,Asian Restaurant,Scenic Lookout


In [32]:
sg_sorted

Unnamed: 0,Planning Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ang Mo Kio,Coffee Shop,Dessert Shop,Food Court,Japanese Restaurant,Supermarket,Snack Place,Fast Food Restaurant,Bubble Tea Shop,Asian Restaurant,Fried Chicken Joint
1,Bedok,Chinese Restaurant,Coffee Shop,Café,Food Court,Asian Restaurant,Bakery,Sandwich Place,Noodle House,Japanese Restaurant,Supermarket
2,Bishan,Food Court,Coffee Shop,Bubble Tea Shop,Café,Chinese Restaurant,Cosmetics Shop,Ice Cream Shop,Stadium,Supermarket,Thai Restaurant
3,Bukit Batok,Coffee Shop,Food Court,Chinese Restaurant,Fast Food Restaurant,Bus Station,Bus Stop,Bowling Alley,Malay Restaurant,Food & Drink Shop,Shopping Mall
4,Bukit Merah,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Café,Clothing Store,Fast Food Restaurant,Toy / Game Store,Multiplex,Asian Restaurant,Scenic Lookout
5,Bukit Panjang,Coffee Shop,Asian Restaurant,Fast Food Restaurant,Convenience Store,Indonesian Restaurant,Supermarket,Sushi Restaurant,Café,Shopping Mall,Bus Station
6,Bukit Timah,Café,Chinese Restaurant,Korean Restaurant,Thai Restaurant,Indian Restaurant,Nature Preserve,Coffee Shop,Food Court,Supermarket,Park
7,Choa Chu Kang,Fast Food Restaurant,Coffee Shop,Food Court,Golf Course,Sushi Restaurant,Café,Sandwich Place,Bubble Tea Shop,Miscellaneous Shop,Italian Restaurant
8,Clementi,Food Court,Chinese Restaurant,Bus Station,Asian Restaurant,Convenience Store,Fast Food Restaurant,Playground,Chinese Breakfast Place,Pet Store,Supermarket
9,Geylang,Chinese Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,Noodle House,Food Court,Coffee Shop,Fast Food Restaurant,Dessert Shop,Seafood Restaurant,Steakhouse


In [33]:
sg_grouped.head()

Unnamed: 0,Planning Area,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Waterfront,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Ang Mo Kio,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bedok,0.0,0.011628,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,...,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0
2,Bishan,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bukit Batok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bukit Merah,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,...,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.01,0.01


In [34]:
k=4
sg_clustered=sg_grouped.drop(columns="Planning Area") # keep only the venue-share columns
knn=KMeans(n_clusters=k, random_state=0).fit(sg_clustered)
sg_sorted.insert(0, 'Cluster Labels', knn.labels_)
sg_sorted.head()

Unnamed: 0,Cluster Labels,Planning Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Ang Mo Kio,Coffee Shop,Dessert Shop,Food Court,Japanese Restaurant,Supermarket,Snack Place,Fast Food Restaurant,Bubble Tea Shop,Asian Restaurant,Fried Chicken Joint
1,3,Bedok,Chinese Restaurant,Coffee Shop,Café,Food Court,Asian Restaurant,Bakery,Sandwich Place,Noodle House,Japanese Restaurant,Supermarket
2,1,Bishan,Food Court,Coffee Shop,Bubble Tea Shop,Café,Chinese Restaurant,Cosmetics Shop,Ice Cream Shop,Stadium,Supermarket,Thai Restaurant
3,1,Bukit Batok,Coffee Shop,Food Court,Chinese Restaurant,Fast Food Restaurant,Bus Station,Bus Stop,Bowling Alley,Malay Restaurant,Food & Drink Shop,Shopping Mall
4,3,Bukit Merah,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Café,Clothing Store,Fast Food Restaurant,Toy / Game Store,Multiplex,Asian Restaurant,Scenic Lookout


In [35]:
sg_sorted=sg_sorted.join(latlong.set_index("Planning Area"), on="Planning Area")
sg_sorted.head()

Unnamed: 0,Cluster Labels,Planning Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Lat of PA,Long of PA
0,1,Ang Mo Kio,Coffee Shop,Dessert Shop,Food Court,Japanese Restaurant,Supermarket,Snack Place,Fast Food Restaurant,Bubble Tea Shop,Asian Restaurant,Fried Chicken Joint,1.37008,103.849523
1,3,Bedok,Chinese Restaurant,Coffee Shop,Café,Food Court,Asian Restaurant,Bakery,Sandwich Place,Noodle House,Japanese Restaurant,Supermarket,1.323976,103.930216
2,1,Bishan,Food Court,Coffee Shop,Bubble Tea Shop,Café,Chinese Restaurant,Cosmetics Shop,Ice Cream Shop,Stadium,Supermarket,Thai Restaurant,1.350986,103.848255
3,1,Bukit Batok,Coffee Shop,Food Court,Chinese Restaurant,Fast Food Restaurant,Bus Station,Bus Stop,Bowling Alley,Malay Restaurant,Food & Drink Shop,Shopping Mall,1.349057,103.749591
4,3,Bukit Merah,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Café,Clothing Store,Fast Food Restaurant,Toy / Game Store,Multiplex,Asian Restaurant,Scenic Lookout,1.270439,103.828318


In [37]:
sg_sorted.sort_values(by="Cluster Labels")

Unnamed: 0,Cluster Labels,Planning Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Lat of PA,Long of PA
24,0,Tanglin,Hotel,Café,Bar,Lounge,Garden,Modern European Restaurant,French Restaurant,Indian Restaurant,Park,Italian Restaurant,1.306044,103.81528
15,0,Novena,Café,Coffee Shop,Hotel,Italian Restaurant,Japanese Restaurant,Chinese Restaurant,Bakery,Hainan Restaurant,Ramen Restaurant,Asian Restaurant,1.320526,103.843881
0,1,Ang Mo Kio,Coffee Shop,Dessert Shop,Food Court,Japanese Restaurant,Supermarket,Snack Place,Fast Food Restaurant,Bubble Tea Shop,Asian Restaurant,Fried Chicken Joint,1.37008,103.849523
21,1,Sengkang,Coffee Shop,Food Court,Fast Food Restaurant,Metro Station,Grocery Store,Sculpture Garden,Supermarket,Sushi Restaurant,Sandwich Place,Café,1.391924,103.895491
20,1,Sembawang,Coffee Shop,Chinese Restaurant,Bus Station,Asian Restaurant,Fast Food Restaurant,Food Court,Park,Food Truck,Spa,Italian Restaurant,1.449093,103.820056
17,1,Pasir Ris,Fast Food Restaurant,Food Court,Coffee Shop,Bakery,Supermarket,Sandwich Place,Bus Station,Seafood Restaurant,Shopping Mall,Snack Place,1.373031,103.949255
10,1,Hougang,Food Court,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Asian Restaurant,Indian Restaurant,Shopping Mall,Supermarket,Vegetarian / Vegan Restaurant,Pharmacy,1.370801,103.892544
8,1,Clementi,Food Court,Chinese Restaurant,Bus Station,Asian Restaurant,Convenience Store,Fast Food Restaurant,Playground,Chinese Breakfast Place,Pet Store,Supermarket,1.3151,103.765231
13,1,Kallang,Coffee Shop,Hostel,BBQ Joint,Noodle House,Restaurant,Food Court,Café,Fast Food Restaurant,Soup Place,Supermarket,1.310759,103.866262
27,1,Yishun,Food Court,Coffee Shop,Chinese Restaurant,Hainan Restaurant,Fried Chicken Joint,Supermarket,Fast Food Restaurant,Italian Restaurant,Pharmacy,Park,1.429384,103.835028


In [42]:
map_clusters_a = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one colour per cluster label
colors_array = cm.rainbow(np.linspace(0.2, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per planning area, coloured by its cluster label
for lat, lon, poi, cluster in zip(sg_sorted['Lat of PA'], sg_sorted['Long of PA'], sg_sorted['Planning Area'], sg_sorted['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters_a)
       
map_clusters_a

### Does Adding Commercial Data to Infrastructure Data Change the Clusters?

In [48]:
sg_final=sg_grouped.join(inequality.set_index("Planning Area"), on="Planning Area")

In [49]:
sg_final

Unnamed: 0,Planning Area,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,...,Waterfront,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,% of population under $3000,number of schools,number of parks,number of bus stops
0,Ang Mo Kio,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,43.379447,14,30,168
1,Bedok,0.0,0.011628,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,...,0.0,0.0,0.0,0.011628,0.0,0.0,38.057219,24,55,281
2,Bishan,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,30.505051,10,16,95
3,Bukit Batok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,37.698413,13,10,161
4,Bukit Merah,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,...,0.01,0.01,0.01,0.0,0.01,0.01,43.309002,10,5,179
5,Bukit Panjang,0.0,0.019608,0.0,0.0,0.0,0.0,0.078431,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,36.953808,13,9,103
6,Bukit Timah,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,17.934783,11,35,109
7,Choa Chu Kang,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,36.419753,14,1,122
8,Clementi,0.0,0.0125,0.0,0.0,0.0,0.0125,0.05,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,32.026144,9,12,103
9,Geylang,0.0,0.012346,0.0,0.0,0.0,0.0,0.061728,0.0,0.0,...,0.0,0.0,0.0,0.012346,0.0,0.0,41.693811,10,13,156


In [52]:
k=4
sg_final_c=sg_final.drop(columns="Planning Area") # keep only the numeric feature columns
knn=KMeans(n_clusters=k, random_state=0).fit(sg_final_c)
sg_final.insert(0, 'Cluster Labels', knn.labels_)
sg_final.head()

Unnamed: 0,Cluster Labels,Planning Area,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Waterfront,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,% of population under $3000,number of schools,number of parks,number of bus stops
0,2,Ang Mo Kio,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,43.379447,14,30,168
1,0,Bedok,0.0,0.011628,0.0,0.0,0.0,0.0,0.046512,0.0,...,0.0,0.0,0.0,0.011628,0.0,0.0,38.057219,24,55,281
2,3,Bishan,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,30.505051,10,16,95
3,2,Bukit Batok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,37.698413,13,10,161
4,2,Bukit Merah,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,...,0.01,0.01,0.01,0.0,0.01,0.01,43.309002,10,5,179
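The labels for these five rows (2, 0, 3, 2, 2) are the same as in the infrastructure-only run. That is consistent with k-means being driven by the columns with the largest numeric range: bus-stop counts are in the hundreds, while venue shares lie between 0 and 1. Standardising the features before clustering is one common remedy. The sketch below uses made-up toy data, not the notebook's dataset, to show how scaling can change the grouping.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy data (made up): column 0 is a venue share in [0, 1], column 1 a bus-stop count
X = np.array([
    [0.0, 100.0],
    [0.1, 160.0],
    [0.9, 120.0],
    [1.0, 180.0],
])

# Without scaling, distances are dominated by the bus-stop column
raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# With scaling, each column has mean 0 and unit variance, so both contribute
X_scaled = StandardScaler().fit_transform(X)
scaled_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

print("raw:   ", raw_labels)
print("scaled:", scaled_labels)
```

In this toy case the unscaled run pairs rows by bus-stop count (rows 0 and 2 together), while the scaled run pairs them by venue share (rows 0 and 1 together). Whether scaling changes the Singapore clusters would need to be checked on the full dataset.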


In [53]:
sg_final=sg_final.join(latlong.set_index("Planning Area"), on="Planning Area")


In [55]:
sg_final.sort_values(by="Cluster Labels")

Unnamed: 0,Cluster Labels,Planning Area,Accessories Store,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Wine Bar,Wings Joint,Women's Store,Yoga Studio,% of population under $3000,number of schools,number of parks,number of bus stops,Lat of PA,Long of PA
1,0,Bedok,0.0,0.011628,0.0,0.0,0.0,0.0,0.046512,0.0,...,0.0,0.011628,0.0,0.0,38.057219,24,55,281,1.323976,103.930216
23,0,Tampines,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,...,0.0,0.0,0.0,0.0,38.089005,17,11,240,1.354653,103.943571
19,0,Queenstown,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.0,0.0,36.788618,10,8,226,1.294623,103.806037
10,0,Hougang,0.0,0.0,0.0,0.0,0.0,0.0,0.057692,0.019231,...,0.0,0.0,0.0,0.0,39.072848,17,18,202,1.370801,103.892544
12,0,Jurong West,0.0,0.013889,0.0,0.0,0.0,0.0,0.083333,0.0,...,0.0,0.013889,0.0,0.0,39.193447,21,6,241,1.339636,103.707339
26,0,Woodlands,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,43.84273,23,4,206,1.436897,103.786216
24,1,Tanglin,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,...,0.02,0.0,0.0,0.0,12.222222,1,1,46,1.306044,103.81528
16,1,Outram,0.0,0.016393,0.0,0.0,0.0,0.0,0.04918,0.0,...,0.032787,0.0,0.0,0.0,42.735043,1,5,28,1.282869,103.83786
14,1,Marine Parade,0.0,0.0,0.0,0.013889,0.0,0.0,0.027778,0.013889,...,0.0,0.0,0.0,0.027778,28.372093,6,8,56,1.302689,103.907395
0,2,Ang Mo Kio,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,...,0.0,0.0,0.0,0.0,43.379447,14,30,168,1.37008,103.849523


In [57]:
map_clusters_b = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one colour per cluster label
colors_array = cm.rainbow(np.linspace(0.2, 1, k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per planning area, coloured by its cluster label
for lat, lon, poi, cluster in zip(sg_final['Lat of PA'], sg_final['Long of PA'], sg_final['Planning Area'], sg_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters_b)
       
map_clusters_b