# The Battle of Neighborhoods

## Content Page
1. Introduction
2. Data Extraction
3. Data Transformation
4. Data Visualisation
5. Data Exploration
6. Data Analysis
7. K-Means Clustering
8. Evaluating the Clusters
9. Conclusion

## 1. Introduction
Discuss the business problem and the audience who would be interested in this project.

### Business Problem: Recommend the *best place* to open a *Cafe* in Singapore.

The Cafe will serve *pastries and good coffee*. Seating will also be provided so that visitors can enjoy their coffee and meet up with friends. People visit cafes for many reasons and on different occasions. On *weekdays*, students might visit after school to *do assignments* with their schoolmates. Working adults may use cafes near offices as a place for *informal meetings and discussions*, or just to *grab a cup of coffee* in the morning before heading to the office. On *weekends*, adults usually visit a cafe to *meet and catch up with friends*. Tourists tend to visit cafes near *public transport* and *attractions, hostels or hotels* for the sake of *convenience*. We can therefore infer that the customers a cafe attracts are *heavily dependent on its location on weekdays*, and less so on weekends.

To maximise revenue, a cafe needs to be located near offices to capture the weekday crowd, but not too far from the attractions and hotels that tourists visit. As working adults have greater spending power than students, the analysis will focus more on working adults.

## 2. Data Extraction
Describe the data that you will be using to solve the problem or execute your idea. 

### Requires Singapore city data that contains towns and neighborhoods along with their latitude and longitude.

The *5* Regions of Singapore are:
1. Central Region
2. East Region
3. North-East Region
4. North Region
5. West Region

There are a total of *55* planning areas organised into these *5* regions. Information on these 55 planning areas can be extracted from Wikipedia.

### Import Libraries

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (pd.json_normalize in pandas 1.0+)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import seaborn as sns

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')


### 2a. Web-scrape Singapore's Neighborhoods from Wikipedia
   (Data Source:  Wikipedia: https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore)

Then transform the data into a pandas dataframe with nine columns: the planning area's name in English (EName), Malay (MName), Chinese (CName), Pinyin and Tamil (TName), plus Region, Area, Population and Density.

In [378]:
from bs4 import BeautifulSoup

source = requests.get('https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore').text

soup = BeautifulSoup(source, 'html5lib')

In [379]:
table_post = soup.find_all('table')[2] #Grab the third table
fields = table_post.find_all('td') 

EName = []
MName = []
CName = []
Pinyin = []
TName = []
Region = []
Area = []
Population = []
Density = []

for i in range(0, len(fields), 9):
    EName.append(fields[i].text.strip())
    MName.append(fields[i+1].text.strip())
    CName.append(fields[i+2].text.strip())
    Pinyin.append(fields[i+3].text.strip())
    TName.append(fields[i+4].text.strip())
    Region.append(fields[i+5].text.strip())
    Area.append(fields[i+6].text.strip())
    Population.append(fields[i+7].text.strip())
    Density.append(fields[i+8].text.strip())

df_sg = pd.DataFrame(data=[EName, MName, CName, Pinyin, TName, Region, Area, Population, Density]).transpose()
df_sg.columns = ['EName', 'MName', 'CName','Pinyin','TName','Region','Area','Population','Density']
df_sg.head()

Unnamed: 0,EName,MName,CName,Pinyin,TName,Region,Area,Population,Density
0,Ang Mo Kio,,宏茂桥,Hóng mào qiáo,ஆங் மோ கியோ,North-East,13.94,165710,12000.0
1,Bedok,*,勿洛,Wù luò,பிடோ,East,21.69,281300,13000.0
2,Bishan,,碧山,Bì shān,பீஷான்,Central,7.62,88490,12000.0
3,Boon Lay,,文礼,Wén lǐ,பூன் லே,West,8.23,30,3.6
4,Bukit Batok,*,武吉巴督,Wǔjí bā dū,புக்கிட் பாத்தோக்,West,11.13,144410,13000.0
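
The scraping loop above relies on every table row contributing exactly nine `<td>` cells, so stepping through `fields` in strides of nine recovers the rows. Below is a minimal sketch of that chunking idea on a hand-made list, using three cells per row to keep it short. `chunk_rows` is a hypothetical helper, not part of the notebook; it fails loudly if the cell count is not a multiple of the row width, which is what would happen if the layout of the Wikipedia table changed.

```python
def chunk_rows(cells, width):
    """Split a flat list of table cells into rows of `width` cells each."""
    if len(cells) % width != 0:
        raise ValueError(
            'expected a multiple of {} cells, got {}'.format(width, len(cells)))
    return [cells[i:i + width] for i in range(0, len(cells), width)]


# Hand-made sample (name, region, area) taken from the output above
cells = ['Ang Mo Kio', 'North-East', '13.94',
         'Bedok', 'East', '21.69']

rows = chunk_rows(cells, width=3)
names = [row[0] for row in rows]
regions = [row[1] for row in rows]
print(names)
print(regions)
```

With the real table the same call would use `width=9`, and each column list could then be read off by position.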


In [380]:
# Check 55 Neighborhoods
df_sg.shape

(55, 9)

In [381]:
# Extract only column 'EName'
df_sg2 = pd.DataFrame(data=[EName]).transpose()
df_sg2.columns = ['EName2']
df_sg2.head()

Unnamed: 0,EName2
0,Ang Mo Kio
1,Bedok
2,Bishan
3,Boon Lay
4,Bukit Batok


### 2b. Use Google Maps to get Geographical Coordinates for all 55 Singapore Neighborhoods
   (Data Source:  Google Maps API)
   

In [382]:
# install the Google Maps API client library
!pip install -U googlemaps


In [383]:
# The code was removed by Watson Studio for sharing.

In [384]:
lat = []
long = []

# gmaps is presumably the Google Maps client created in the cell removed above
for name in df_sg2['EName2']:
    geocode_result = gmaps.geocode('{}, Singapore'.format(name))
    lat.append(geocode_result[0]['geometry']['location']['lat'])
    long.append(geocode_result[0]['geometry']['location']['lng'])
print(lat, long)

[1.3691149, 1.3236038, 1.3525845, 1.3142556, 1.3590288, 1.2819046, 1.3774142, 1.3294113, 1.3551526, 1.3450101, 1.3219708, 1.3839803, 1.3161811, 1.2866961, 1.3200544, 1.3612182, 1.3328572, 1.3403898, 1.3100334, 1.4304941, 1.4260074, 1.2912788, 1.279294, 1.3019687, 1.2966147, 1.3075517, 1.4063775, 1.3208572, 1.301674, 1.2848825, 1.3720937, 1.3516087, 1.2857497, 1.3984457, 1.2941664, 1.2959376, 1.3051345, 1.405163, 1.4491107, 1.3868121, 1.3553567, 1.4443218, 1.2895263, 1.2536568, 1.2785501, 1.4073821, 1.3495907, 1.306932, 1.3555189, 1.3343035, 1.2949472, 1.2478844, 1.3471977, 1.4381922, 1.430368] [103.8454342, 103.9273405, 103.8352116, 103.7093099, 103.7636796, 103.8239182, 103.7719498, 103.8020777, 103.7972022, 103.9832089, 104.0290022, 103.7469611, 103.7649377, 103.8535097, 103.8917746, 103.8862529, 103.7435522, 103.7089875, 103.8651056, 103.7173325, 103.8241046, 103.8709039, 103.8701686, 103.8970821, 103.8485095, 103.8403765, 104.0323021, 103.8424319, 103.8380766, 103.8438992, 103.9473
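
The geocoding loop indexes `geocode_result[0]` directly, which would raise an `IndexError` if a lookup came back empty. The sketch below shows one way to guard against that. It runs offline: `fake_geocode` is a hypothetical stand-in that only mimics the list-of-dicts shape the notebook reads from `gmaps.geocode`, and `extract_lat_lng` is an illustrative helper, not part of the original code.

```python
def extract_lat_lng(geocode_result):
    """Return (lat, lng) from a geocoding result list, or (None, None) if empty."""
    if not geocode_result:
        return None, None
    location = geocode_result[0]['geometry']['location']
    return location['lat'], location['lng']


def fake_geocode(query):
    """Hypothetical stand-in for gmaps.geocode so the sketch runs offline."""
    known = {
        'Ang Mo Kio, Singapore': [
            {'geometry': {'location': {'lat': 1.3691149, 'lng': 103.8454342}}}
        ],
    }
    return known.get(query, [])


lat, long = [], []
for name in ['Ang Mo Kio', 'Nowhere']:
    la, lo = extract_lat_lng(fake_geocode('{}, Singapore'.format(name)))
    lat.append(la)
    long.append(lo)

print(lat, long)
```

Recording `None` keeps the coordinate lists aligned with the neighborhood list, so a failed lookup can be spotted and handled later instead of stopping the loop.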

## 3. Data Transformation
Transform the data into a new *pandas* dataframe

In [385]:
# define the dataframe columns
neighborhood = ['Region','Neighborhood','Latitude','Longitude'] 

# instantiate the dataframe
df_sgNeigh = pd.DataFrame(columns=neighborhood)
df_sgNeigh

Unnamed: 0,Region,Neighborhood,Latitude,Longitude


In [386]:
for i in range(0, df_sg.shape[0],1):
    
    df_sgNeigh = df_sgNeigh.append({'Region': df_sg['Region'][i], 
                                    'Neighborhood': df_sg['EName'][i],
                                    'Latitude': lat[i],
                                    'Longitude': long[i],
                                    'Population Density': df_sg['Density'][i] }, ignore_index=True)
    
df_sgNeigh.head(55)

Unnamed: 0,Region,Neighborhood,Latitude,Longitude,Population Density
0,North-East,Ang Mo Kio,1.369115,103.845434,12000
1,East,Bedok,1.323604,103.92734,13000
2,Central,Bishan,1.352585,103.835212,12000
3,West,Boon Lay,1.314256,103.70931,3.6
4,West,Bukit Batok,1.359029,103.76368,13000
5,Central,Bukit Merah,1.281905,103.823918,11000
6,West,Bukit Panjang,1.377414,103.77195,16000
7,Central,Bukit Timah,1.329411,103.802078,4400
8,North,Central Water Catchment,1.355153,103.797202,*
9,East,Changi,1.34501,103.983209,62.3
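
Appending one row at a time works in the pandas version used here, but `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0. The sketch below builds the same kind of table in a single call and, as an extra step the notebook does not perform, converts the density column to numbers so that the `*` placeholder becomes `NaN`. The sample values are copied from the table above, assuming the densities arrive as plain digit strings.

```python
import pandas as pd

# Sample values copied from rows of the table above
region = ['North-East', 'East', 'North']
name = ['Ang Mo Kio', 'Bedok', 'Central Water Catchment']
lat = [1.369115, 1.323604, 1.355153]
long = [103.845434, 103.927340, 103.797202]
density = ['12000', '13000', '*']

columns = ['Region', 'Neighborhood', 'Latitude', 'Longitude', 'Population Density']
df = pd.DataFrame(
    {'Region': region, 'Neighborhood': name, 'Latitude': lat,
     'Longitude': long, 'Population Density': density},
    columns=columns)

# '*' marks a missing density in the source table; coerce it to NaN
df['Population Density'] = pd.to_numeric(df['Population Density'], errors='coerce')
print(df)
```

Passing `columns=` fixes the column order explicitly, which matters on older Python versions where dictionaries are not ordered.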


In [387]:
print('The dataframe has {} Region and {} neighborhoods.'.format(
        len(df_sgNeigh['Region'].unique()),
        df_sgNeigh.shape[0]
    )
)

The dataframe has 5 Region and 55 neighborhoods.


In [388]:
address = 'Singapore'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
SGlatitude = location.latitude
SGlongitude = location.longitude
print('The geographical coordinates of Singapore are {}, {}.'.format(SGlatitude, SGlongitude))

The geographical coordinates of Singapore are 1.3408528, 103.878446863736.


## 4. Data Visualisation
Using Folium Library

### Create a map of Singapore with neighborhoods superimposed on top.

In [431]:
# create map of Singapore using latitude and longitude values
map_sg = folium.Map(location=[SGlatitude, SGlongitude], zoom_start=10)

# add markers to map
for lat, lng, region, neighborhood in zip(df_sgNeigh['Latitude'], df_sgNeigh['Longitude'], df_sgNeigh['Region'], df_sgNeigh['Neighborhood']):
    label = '{}, {}'.format(neighborhood, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sg)  
    
map_sg

## 5. Data Exploration
Using Foursquare API
### Define Foursquare Credentials and Version
(Data Source:  Foursquare API)

In [391]:
# The code was removed by Watson Studio for sharing.

### Explore the top 10 venues of each neighborhood in Singapore
We start by exploring the first neighborhood, Ang Mo Kio.

In [392]:
# request up to 100 venues within a 500-metre radius of the neighborhood coordinates
LIMIT = 100
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    df_sgNeigh['Latitude'][0], 
    df_sgNeigh['Longitude'][0], 
    radius, 
    LIMIT)
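
The request URL above is assembled with `str.format`. As an alternative sketch, the query string can be built with `urllib.parse.urlencode`, which escapes each value. The parameter names are the ones used in the URL above; the credential and version values are placeholders, since the real ones were removed from the notebook, and `build_explore_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder values: the real credentials and version are not in the notebook
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = 'YYYYMMDD'
LIMIT = 100
radius = 500


def build_explore_url(lat, lng):
    """Build the venues/explore URL with the same parameters as the cell above."""
    params = [
        ('client_id', CLIENT_ID),
        ('client_secret', CLIENT_SECRET),
        ('v', VERSION),
        ('ll', '{},{}'.format(lat, lng)),
        ('radius', radius),
        ('limit', LIMIT),
    ]
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


url = build_explore_url(1.3691149, 103.8454342)
print(url)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`; this is standard URL escaping and is decoded back to a comma when the query string is parsed.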

In [393]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cf92bb0db04f52f63047f9a'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4b9f647df964a520032037e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/asian_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d142941735',
         'name': 'Asian Restaurant',
         'pluralName': 'Asian Restaurants',
         'primary': True,
         'shortName': 'Asian'}],
       'id': '4b9f647df964a520032037e3',
       'location': {'address': 'Blk 202 Ang Mo Kio Ave 3',
        'cc': 'SG',
        'country': 'Singapore',
        'distance': 180,
        'formattedAddress': ['Blk 202 Ang Mo Kio Ave 3', 'Singapore'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 1.3681666144843387,
          'lng': 103.84411848687625}

### Define the get_category_type function from the Foursquare lab

In [394]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # the flattened dataframe uses 'venue.categories' instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [395]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [396]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Kam Jia Zhuang Restaurant,Asian Restaurant,1.368167,103.844118
1,Old Chang Kee,Snack Place,1.369094,103.848389
2,Subway,Sandwich Place,1.369136,103.847612
3,MOS Burger,Burger Joint,1.36917,103.847831
4,Bun Master,Bakery,1.369242,103.849031


In [397]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

63 venues were returned by Foursquare.


Now let's explore the top venues of all 55 neighborhoods in Singapore.

In [398]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
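
The list comprehension in `getNearbyVenues` reads `v['venue']['categories'][0]['name']`, which assumes every venue has at least one category. The sketch below flattens items of the same shape while tolerating an empty category list. `venue_row` is a hypothetical helper; the first sample item mirrors the response shown earlier, and the second is made up purely to exercise the fallback.

```python
def venue_row(name, lat, lng, item):
    """Flatten one 'items' entry into a tuple, tolerating an empty category list."""
    venue = item['venue']
    categories = venue.get('categories') or []
    category = categories[0]['name'] if categories else None
    return (name, lat, lng, venue['name'],
            venue['location']['lat'], venue['location']['lng'], category)


# First item mirrors the response shown earlier; the second is a made-up
# venue with no category, included only to exercise the fallback.
items = [
    {'venue': {'name': 'Kam Jia Zhuang Restaurant',
               'location': {'lat': 1.3681666144843387, 'lng': 103.84411848687625},
               'categories': [{'name': 'Asian Restaurant'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 1.369, 'lng': 103.845},
               'categories': []}},
]

rows = [venue_row('Ang Mo Kio', 1.3691149, 103.8454342, item) for item in items]
for row in rows:
    print(row)
```

Each tuple has the same seven fields, in the same order, as the columns assigned to `nearby_venues` above.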

### Create a new dataframe called *sg_venues*

In [399]:
sg_venues = getNearbyVenues(names=df_sgNeigh['Neighborhood'],
                                   latitudes=df_sgNeigh['Latitude'],
                                   longitudes=df_sgNeigh['Longitude']
                                  )

Ang Mo Kio
Bedok
Bishan
Boon Lay
Bukit Batok
Bukit Merah
Bukit Panjang
Bukit Timah
Central Water Catchment
Changi
Changi Bay
Choa Chu Kang
Clementi
Downtown Core
Geylang
Hougang
Jurong East
Jurong West
Kallang
Lim Chu Kang
Mandai
Marina East
Marina South
Marine Parade
Museum
Newton
North-Eastern Islands
Novena
Orchard
Outram
Pasir Ris
Paya Lebar
Pioneer
Punggol
Queenstown
River Valley
Rochor
Seletar
Sembawang
Sengkang
Serangoon
Simpang
Singapore River
Southern Islands
Straits View
Sungei Kadut
Tampines
Tanglin
Tengah
Toa Payoh
Tuas
Western Islands
Western Water Catchment
Woodlands
Yishun


In [401]:
print(sg_venues.shape)
sg_venues.head()

(1545, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Ang Mo Kio,1.369115,103.845434,Kam Jia Zhuang Restaurant,1.368167,103.844118,Asian Restaurant
1,Ang Mo Kio,1.369115,103.845434,Old Chang Kee,1.369094,103.848389,Snack Place
2,Ang Mo Kio,1.369115,103.845434,Subway,1.369136,103.847612,Sandwich Place
3,Ang Mo Kio,1.369115,103.845434,MOS Burger,1.36917,103.847831,Burger Joint
4,Ang Mo Kio,1.369115,103.845434,Bun Master,1.369242,103.849031,Bakery


### Number of Venues returned for each neighborhood

In [402]:
sg_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ang Mo Kio,63,63,63,63,63,63
Bedok,61,61,61,61,61,61
Bishan,49,49,49,49,49,49
Boon Lay,1,1,1,1,1,1
Bukit Batok,34,34,34,34,34,34
Bukit Merah,12,12,12,12,12,12
Bukit Panjang,11,11,11,11,11,11
Bukit Timah,3,3,3,3,3,3
Central Water Catchment,2,2,2,2,2,2
Changi,12,12,12,12,12,12


### Check number of unique categories from all the returned venues from the 55 neighborhoods

In [405]:
print('There are {} unique categories.'.format(len(sg_venues['Venue Category'].unique())))

There are 227 unique categories.


## 6. Data Analysis
Use one-hot encoding to convert the returned venue categories into numerical values for K-Means clustering

In [406]:
# one hot encoding
sg_onehot = pd.get_dummies(sg_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sg_onehot['Neighborhood'] = sg_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sg_onehot.columns[-1]] + list(sg_onehot.columns[:-1])
sg_onehot = sg_onehot[fixed_columns]

sg_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Bay,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Green,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Cafeteria,Café,Campground,Canal,Candy Store,Cantonese Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Theater,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Dongbei Restaurant,Dumpling Restaurant,Electronics Store,English Restaurant,Event Space,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden Center,Gastropub,General College & University,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Harbor / Marina,Health Food Store,High School,History Museum,Hobby Shop,Hong Kong Restaurant,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Housing Development,Ice Cream Shop,Indian Restaurant,Indie Theater,Indonesian Restaurant,Indoor Play Area,Island,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Lake,Library,Lingerie Store,Lottery Retailer,Lounge,Malay Restaurant,Manchu Restaurant,Market,Massage Studio,Medical Center,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Military Base,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Multiplex,Museum,Music Venue,Nightclub,Noodle House,North Indian Restaurant,Office,Optical Shop,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Peking Duck Restaurant,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Post Office,Pub,Ramen Restaurant,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Satay Restaurant,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Swiss Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Tourist Information Center,Toy / Game Store,Track Stadium,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,Ang Mo Kio,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ang Mo Kio,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ang Mo Kio,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ang Mo Kio,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ang Mo Kio,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [407]:
sg_onehot.shape

(1545, 228)
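
The shape above is consistent with the earlier counts: 1545 venue rows, and 227 category columns plus the `Neighborhood` column. A small sketch of the same encoding on three rows copied from the `sg_venues` preview may make the layout easier to see. It passes `dtype=int` so the output shows 0/1 as in the notebook (recent pandas versions return booleans by default), and it uses `DataFrame.insert` as an alternative to the column reordering done above.

```python
import pandas as pd

# Toy venue table; rows copied from the sg_venues preview above
venues = pd.DataFrame({
    'Neighborhood': ['Ang Mo Kio', 'Ang Mo Kio', 'Ang Mo Kio'],
    'Venue Category': ['Asian Restaurant', 'Snack Place', 'Sandwich Place'],
})

# One column per category, with a 1 marking the category of each venue
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)

# Put the neighborhood name in front of the indicator columns
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
print(onehot)
```

Every row contains exactly one 1, because each venue contributes a single category.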

### Group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [408]:
sg_grouped = sg_onehot.groupby('Neighborhood').mean().reset_index()
sg_grouped

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Bay,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Green,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Cafeteria,Café,Campground,Canal,Candy Store,Cantonese Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Theater,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Dongbei Restaurant,Dumpling Restaurant,Electronics Store,English Restaurant,Event Space,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden Center,Gastropub,General College & University,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Harbor / Marina,Health Food Store,High School,History Museum,Hobby Shop,Hong Kong Restaurant,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Housing Development,Ice Cream Shop,Indian Restaurant,Indie Theater,Indonesian Restaurant,Indoor Play Area,Island,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Lake,Library,Lingerie Store,Lottery Retailer,Lounge,Malay Restaurant,Manchu Restaurant,Market,Massage Studio,Medical Center,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Military Base,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motel,Movie Theater,Multiplex,Museum,Music Venue,Nightclub,Noodle House,North Indian Restaurant,Office,Optical Shop,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Paper / Office Supplies Store,Park,Pedestrian Plaza,Peking Duck Restaurant,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Post Office,Pub,Ramen Restaurant,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Satay Restaurant,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skating Rink,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Swiss Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Theme Restaurant,Tourist Information Center,Toy / Game Store,Track Stadium,Trail,Train Station,Tunnel,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wings Joint,Yoga Studio
0,Ang Mo Kio,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.047619,0.0,0.0,0.0,0.015873,0.0,0.015873,0.015873,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.063492,0.0,0.0,0.0,0.0,0.0,0.0,0.079365,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.031746,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.015873,0.0,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.047619,0.031746,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bedok,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.0,0.016393,0.032787,0.0,0.0,0.0,0.0,0.04918,0.016393,0.0,0.065574,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.04918,0.0,0.016393,0.032787,0.016393,0.0,0.0,0.0,0.032787,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.016393,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04918,0.04918,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.0
2,Bishan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.122449,0.0,0.0,0.0,0.0,0.102041,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061224,0.061224,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.020408,0.0,0.020408,0.0,0.0,0.020408,0.081633,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Boon Lay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bukit Batok,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.058824,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.058824,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.088235,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bukit Merah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bukit Panjang,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bukit Timah,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Central Water Catchment,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Changi,0.0,0.083333,0.083333,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [106]:
sg_grouped.shape

(52, 228)

### Print each neighborhood along with its top 5 most common venues

In [409]:
num_top_venues = 5

for hood in sg_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sg_grouped[sg_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ang Mo Kio----
                  venue  freq
0           Coffee Shop  0.10
1            Food Court  0.08
2  Fast Food Restaurant  0.06
3           Supermarket  0.05
4          Dessert Shop  0.05


----Bedok----
                venue  freq
0         Coffee Shop  0.07
1    Sushi Restaurant  0.05
2          Food Court  0.05
3  Chinese Restaurant  0.05
4         Supermarket  0.05


----Bishan----
                venue  freq
0                Café  0.12
1  Chinese Restaurant  0.10
2     Thai Restaurant  0.08
3   Indian Restaurant  0.06
4      Ice Cream Shop  0.06


----Boon Lay----
               venue  freq
0   Botanical Garden   1.0
1  Accessories Store   0.0
2  Other Repair Shop   0.0
3              Motel   0.0
4      Movie Theater   0.0


----Bukit Batok----
                venue  freq
0  Italian Restaurant  0.09
1       Shopping Mall  0.06
2                Café  0.06
3         Supermarket  0.06
4                 Gym  0.06


----Bukit Merah----
                 venue  freq
0   Chines

### Create a function to sort the venues in descending order

In [410]:
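def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name); the rest are venue frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    # the index labels are the venue category names, highest frequency first
    return row_categories_sorted.index.values[0:num_top_venues]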
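
As a quick check of how this function behaves, here is a minimal sketch that applies it to a single made-up row. The neighborhood name `Toyville`, the four venue columns and their frequencies are invented for illustration; only the function itself comes from the notebook.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # drop the first entry (the neighborhood name), then rank venues by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical one-row frame laid out like sg_grouped
toy = pd.DataFrame(
    [["Toyville", 0.10, 0.50, 0.05, 0.35]],
    columns=["Neighborhood", "Bakery", "Café", "Gym", "Park"],
)

top3 = list(return_most_common_venues(toy.iloc[0, :], 3))
print(top3)
```

Venues with equal frequency, including 0.0, come back in no meaningful order. This is why a neighborhood with a single recorded venue type, such as Boon Lay, still gets ten "most common" venues, nine of which have a frequency of zero.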

### Create a new dataframe that displays the top 10 venues for each neighborhood

In [411]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # only 'st', 'nd', 'rd' are listed; every later rank uses 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sg_grouped['Neighborhood']

for ind in np.arange(sg_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sg_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(55)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ang Mo Kio,Coffee Shop,Food Court,Fast Food Restaurant,Supermarket,Dessert Shop,Bubble Tea Shop,Sushi Restaurant,Sandwich Place,Halal Restaurant,Seafood Restaurant
1,Bedok,Coffee Shop,Supermarket,Sushi Restaurant,Food Court,Chinese Restaurant,Café,Fried Chicken Joint,Dessert Shop,Fast Food Restaurant,Indian Restaurant
2,Bishan,Café,Chinese Restaurant,Thai Restaurant,Ice Cream Shop,Indian Restaurant,Food Court,French Restaurant,Coffee Shop,Park,Spa
3,Boon Lay,Botanical Garden,Yoga Studio,Fast Food Restaurant,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Food
4,Bukit Batok,Italian Restaurant,Gym,Café,Ice Cream Shop,Indian Restaurant,Supermarket,Shopping Mall,Coffee Shop,Grocery Store,Bus Stop
5,Bukit Merah,Chinese Restaurant,Convenience Store,Residential Building (Apartment / Condo),Food Court,Seafood Restaurant,Bus Line,Japanese Restaurant,Asian Restaurant,Coffee Shop,Yoga Studio
6,Bukit Panjang,Food Court,Park,Miscellaneous Shop,Grocery Store,Dance Studio,Fruit & Vegetable Store,Market,Noodle House,Food Stand,Food & Drink Shop
7,Bukit Timah,Pool,Food,Gym / Fitness Center,Farmers Market,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Flower Shop
8,Central Water Catchment,Trail,Yoga Studio,Farmers Market,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Food
9,Changi,Airport Terminal,Bus Station,Movie Theater,Tunnel,Café,Coffee Shop,Sporting Goods Shop,Road,Airport Service,Airport
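
The column headers above are built by falling back to 'th' when the `indicators` list runs out. The sketch below shows one alternative way to build the same headers with a small helper. The name `ordinal` is hypothetical and not part of the notebook. For the 10 columns used here it gives identical labels, and it would only differ from the original approach for ranks such as 21 or 22.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[1], "|", columns[-1])
```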


## 7. K-Means Clustering
Run k-means to cluster the neighborhoods into 10 clusters.

In [413]:
# set number of clusters
kclusters = 10

# keep only the venue-frequency columns; recent pandas versions require axis as a keyword
sg_grouped_clustering = sg_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sg_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:54] 

array([1, 1, 1, 4, 1, 1, 1, 1, 3, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 8, 1, 7, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 5, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 5, 1,
       1, 1, 9, 1, 1, 1], dtype=int32)
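
A quick tally of the labels printed above shows how the 52 neighborhoods are spread across the clusters. This is a minimal sketch using only the standard library. The `labels` list is copied by hand from the array output, so it does not depend on re-running k-means.

```python
from collections import Counter

# labels copied from the kmeans.labels_ output shown above
labels = [
    1, 1, 1, 4, 1, 1, 1, 1, 3, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1, 8, 1, 7, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 5, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 5, 1,
    1, 1, 9, 1, 1, 1,
]

cluster_sizes = Counter(labels)
for cluster in sorted(cluster_sizes):
    print(cluster, cluster_sizes[cluster])
```

With a live model, `pd.Series(kmeans.labels_).value_counts()` would give the same tally directly.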

### New Dataframe that includes the cluster labels as well as the top 10 venues for each neighborhood

In [414]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sg_merged = df_sgNeigh

# join the cluster labels and top venues onto df_sgNeigh, which holds the latitude/longitude for each neighborhood
sg_merged = sg_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# Replace NaN with 0 (neighborhoods with no venue data will then be listed under cluster 0)
sg_merged['Cluster Labels'] = sg_merged['Cluster Labels'].fillna(0)
sg_merged.head() # check the last columns!

Unnamed: 0,Region,Neighborhood,Latitude,Longitude,Population Density,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North-East,Ang Mo Kio,1.369115,103.845434,12000.0,1.0,Coffee Shop,Food Court,Fast Food Restaurant,Supermarket,Dessert Shop,Bubble Tea Shop,Sushi Restaurant,Sandwich Place,Halal Restaurant,Seafood Restaurant
1,East,Bedok,1.323604,103.92734,13000.0,1.0,Coffee Shop,Supermarket,Sushi Restaurant,Food Court,Chinese Restaurant,Café,Fried Chicken Joint,Dessert Shop,Fast Food Restaurant,Indian Restaurant
2,Central,Bishan,1.352585,103.835212,12000.0,1.0,Café,Chinese Restaurant,Thai Restaurant,Ice Cream Shop,Indian Restaurant,Food Court,French Restaurant,Coffee Shop,Park,Spa
3,West,Boon Lay,1.314256,103.70931,3.6,4.0,Botanical Garden,Yoga Studio,Fast Food Restaurant,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Food
4,West,Bukit Batok,1.359029,103.76368,13000.0,1.0,Italian Restaurant,Gym,Café,Ice Cream Shop,Indian Restaurant,Supermarket,Shopping Mall,Coffee Shop,Grocery Store,Bus Stop


In [415]:
# create map
map_clusters = folium.Map(location=[SGlatitude, SGlongitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sg_merged['Latitude'], sg_merged['Longitude'], sg_merged['Neighborhood'], sg_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.6).add_to(map_clusters)
       
map_clusters

### Visualise the resulting 10 clusters

In [416]:
sg_merged.loc[sg_merged['Cluster Labels'] == 0, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39,Sengkang,0.0,Bus Station,Grocery Store,Food Stand,General College & University,Metro Station,Bus Line,Basketball Court,Yoga Studio,Flea Market,Fried Chicken Joint
41,Simpang,0.0,,,,,,,,,,
48,Tengah,0.0,,,,,,,,,,
51,Western Islands,0.0,,,,,,,,,,


These three areas returned no venues, so their 'Cluster Labels' were filled with 0 rather than assigned by k-means. The 'NaN' values arise for the following reasons:
1. Simpang is not yet developed and is still a forested area.
2. Tengah is a new town that is still under construction.
3. Western Islands (Jurong Island, to be specific) is a full-fledged industrial area.
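
Because missing labels are replaced with 0, areas without venue data end up listed alongside Sengkang, the only neighborhood that k-means actually placed in cluster 0. The sketch below shows one possible way to keep the two cases apart, using a sentinel value of -1 and a flag column. The miniature frames and the `Has Venues` column are hypothetical stand-ins for `df_sgNeigh` and `neighborhoods_venues_sorted`, not the notebook's data.

```python
import pandas as pd

# hypothetical miniature versions of df_sgNeigh and neighborhoods_venues_sorted
all_areas = pd.DataFrame({"Neighborhood": ["Sengkang", "Simpang", "Tengah"]})
clustered = pd.DataFrame({"Neighborhood": ["Sengkang"], "Cluster Labels": [0]})

# left join: areas missing from `clustered` get NaN in 'Cluster Labels'
merged = all_areas.join(clustered.set_index("Neighborhood"), on="Neighborhood")

# flag rows that had venue data before any fill is applied
merged["Has Venues"] = merged["Cluster Labels"].notna()

# use -1 as a sentinel so unclustered areas are not mistaken for cluster 0
merged["Cluster Labels"] = merged["Cluster Labels"].fillna(-1).astype(int)
print(merged)
```

If a sentinel like this were adopted, the map-colouring step would need to skip or separately colour the -1 rows.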

In [417]:
sg_merged.loc[sg_merged['Cluster Labels'] == 1, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Ang Mo Kio,1.0,Coffee Shop,Food Court,Fast Food Restaurant,Supermarket,Dessert Shop,Bubble Tea Shop,Sushi Restaurant,Sandwich Place,Halal Restaurant,Seafood Restaurant
1,Bedok,1.0,Coffee Shop,Supermarket,Sushi Restaurant,Food Court,Chinese Restaurant,Café,Fried Chicken Joint,Dessert Shop,Fast Food Restaurant,Indian Restaurant
2,Bishan,1.0,Café,Chinese Restaurant,Thai Restaurant,Ice Cream Shop,Indian Restaurant,Food Court,French Restaurant,Coffee Shop,Park,Spa
4,Bukit Batok,1.0,Italian Restaurant,Gym,Café,Ice Cream Shop,Indian Restaurant,Supermarket,Shopping Mall,Coffee Shop,Grocery Store,Bus Stop
5,Bukit Merah,1.0,Chinese Restaurant,Convenience Store,Residential Building (Apartment / Condo),Food Court,Seafood Restaurant,Bus Line,Japanese Restaurant,Asian Restaurant,Coffee Shop,Yoga Studio
6,Bukit Panjang,1.0,Food Court,Park,Miscellaneous Shop,Grocery Store,Dance Studio,Fruit & Vegetable Store,Market,Noodle House,Food Stand,Food & Drink Shop
7,Bukit Timah,1.0,Pool,Food,Gym / Fitness Center,Farmers Market,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Flower Shop
9,Changi,1.0,Airport Terminal,Bus Station,Movie Theater,Tunnel,Café,Coffee Shop,Sporting Goods Shop,Road,Airport Service,Airport
11,Choa Chu Kang,1.0,Food Court,Coffee Shop,Bus Station,Fast Food Restaurant,Bubble Tea Shop,Furniture / Home Store,Sandwich Place,Lottery Retailer,Thai Restaurant,Bakery
12,Clementi,1.0,Coffee Shop,Noodle House,Chinese Restaurant,Asian Restaurant,Snack Place,Japanese Restaurant,Shopping Mall,Bakery,Dessert Shop,Electronics Store


In [418]:
sg_merged.loc[sg_merged['Cluster Labels'] == 2, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,Paya Lebar,2.0,Military Base,Yoga Studio,Fast Food Restaurant,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Food


In [420]:
sg_merged.loc[sg_merged['Cluster Labels'] == 3, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Central Water Catchment,3.0,Trail,Yoga Studio,Farmers Market,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Food


In [421]:
sg_merged.loc[sg_merged['Cluster Labels'] == 4, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Boon Lay,4.0,Botanical Garden,Yoga Studio,Fast Food Restaurant,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Food


In [422]:
sg_merged.loc[sg_merged['Cluster Labels'] == 5, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Punggol,5.0,Bus Station,High School,Chinese Restaurant,Yoga Studio,Fast Food Restaurant,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop
45,Sungei Kadut,5.0,Bus Station,Café,Furniture / Home Store,Chinese Restaurant,Fast Food Restaurant,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop


In [423]:
sg_merged.loc[sg_merged['Cluster Labels'] == 6, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Changi Bay,6.0,Boat or Ferry,Military Base,Field,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop


In [424]:
sg_merged.loc[sg_merged['Cluster Labels'] == 7, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Marina East,7.0,Golf Course,Park,Yoga Studio,Farmers Market,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Food


In [425]:
sg_merged.loc[sg_merged['Cluster Labels'] == 8, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Lim Chu Kang,8.0,Farm,Farmers Market,Theme Park Ride / Attraction,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Stand,Food Court,Food & Drink Shop,Food


In [426]:
sg_merged.loc[sg_merged['Cluster Labels'] == 9, sg_merged.columns[[1] + list(range(5, sg_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,Tuas,9.0,Food Court,Asian Restaurant,Yoga Studio,Farmers Market,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Stand,Food & Drink Shop,Food


## 8. Evaluating the 10 Clusters

Ten clusters were chosen to better distinguish the neighborhoods, as it was initially found that Singapore's neighborhoods are all too similar. Most neighborhoods fall into Cluster '1' (42 of the 52 neighborhoods with venue data), which is characterised by *eateries*. The 1st most common venues are *'Coffee Shop', 'Restaurant', 'Food Court', 'Bar'* etc. Areas with coffee shops are *residential neighborhoods*, as it is common to have coffee shops within HDB estates. Since *80%* of Singapore's residential typologies are HDBs, Cluster '1' contains most of the 55 neighborhoods.

*Cluster '1'* will be the chosen cluster in which to locate the cafe, as the other clusters are less populated areas that do not have enough human traffic to generate consistent revenue.

The *Museum* neighborhood is the only neighborhood in Cluster '1' that has *hotels* as the *first most common venue*. Therefore, the second part of the analysis will use Museum's *geographic coordinates (Lat 1.296615, Long 103.848510)* as the epicentre to *query* for potential locations for our cafe.

In [446]:
search_query = 'Hotel'
radius = 500
LIMIT = 100
print(search_query + ' .... OK!')

Hotel .... OK!


### Define the corresponding URL

In [447]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, 1.296615, 103.848510, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=1.296615,103.84851&v=20180604&query=Hotel&radius=500&limit=100'
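
The same query string can also be assembled from a dictionary, which keeps each parameter visible and avoids hand-written `&` and `=` separators. This is a minimal sketch using the standard library. The parameter names are taken from the URL above, while the credential values are placeholders and should be replaced with real ones loaded from a private source.

```python
from urllib.parse import urlencode

# placeholder credentials (assumption): load real values from an environment variable or config file
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180604"

params = {
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "ll": "{},{}".format(1.296615, 103.848510),  # Museum neighborhood coordinates
    "v": VERSION,
    "query": "Hotel",
    "radius": 500,
    "limit": 100,
}

# safe="," keeps the comma inside the ll value unescaped
url = "https://api.foursquare.com/v2/venues/search?" + urlencode(params, safe=",")
print(url)
```

Alternatively, `requests.get` accepts the dictionary directly through its `params` argument, so the URL never has to be printed with the credentials in it.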

### Send the GET request and examine the results

In [448]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5cf93ecddb04f52f5edd7a52'},
 'response': {'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/hotel_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d1fa931735',
      'name': 'Hotel',
      'pluralName': 'Hotels',
      'primary': True,
      'shortName': 'Hotel'}],
    'hasPerk': False,
    'id': '4b05880af964a520aead22e3',
    'location': {'address': '9 Bras Basah Road',
     'cc': 'SG',
     'city': 'Singapore',
     'country': 'Singapore',
     'distance': 227,
     'formattedAddress': ['9 Bras Basah Road', '189559', 'Singapore'],
     'labeledLatLngs': [{'label': 'display',
       'lat': 1.2985851758211133,
       'lng': 103.84906056770934}],
     'lat': 1.2985851758211133,
     'lng': 103.84906056770934,
     'postalCode': '189559'},
    'name': 'Rendezvous Hotel Singapore',
    'referralId': 'v-1559838413'},
   {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/

### Get relevant part of JSON and transform it into a *pandas* dataframe

In [449]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.shape

(50, 19)
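
To illustrate what the flattening step does, here is a small sketch on a single venue record abridged from the response shown earlier. It assumes pandas 1.0 or later, where the function is available as `pd.json_normalize`. Nested dictionaries such as `location` become dotted column names, while lists such as `categories` are kept as-is in a single cell.

```python
import pandas as pd

# one venue record, abridged from the JSON response shown earlier
venues = [
    {
        "id": "4b05880af964a520aead22e3",
        "name": "Rendezvous Hotel Singapore",
        "categories": [{"name": "Hotel", "primary": True}],
        "location": {
            "address": "9 Bras Basah Road",
            "distance": 227,
            "lat": 1.2985851758211133,
            "lng": 103.84906056770934,
        },
    }
]

flat = pd.json_normalize(venues)
print(sorted(flat.columns))
```

This is why the later filtering step looks for columns starting with `location.` and extracts the category name from the first element of the `categories` list.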

### Filter for venue categories

In [367]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so the column assignment below does not touch the original frame

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # some responses nest the categories under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Rendezvous Hotel Singapore,Hotel,9 Bras Basah Road,SG,Singapore,Singapore,,227,"[9 Bras Basah Road, 189559, Singapore]","[{'lng': 103.84906056770934, 'lat': 1.29858517...",1.298585,103.849061,,189559.0,,4b05880af964a520aead22e3
1,Carlton Hotel,Hotel,76 Bras Basah Rd,SG,Singapore,Singapore,,471,"[76 Bras Basah Rd, 189558, Singapore]","[{'lng': 103.85265190696873, 'lat': 1.29571376...",1.295714,103.852652,,189558.0,,4b05880af964a5208aad22e3
2,JW Marriott Hotel Singapore South Beach,Hotel,"30 Beach Road, Access Via Nicoll Highway",SG,Singapore,Singapore,,844,"[30 Beach Road, Access Via Nicoll Highway, 189...","[{'lng': 103.85579288005829, 'lat': 1.29446877...",1.294469,103.855793,,189763.0,SG-01,56a80c97498e568a286384c6
3,Hotel Bencoolen,Hotel,47 Bencoolen St.,SG,Singapore,Singapore,,333,"[47 Bencoolen St., 189626, Singapore]","[{'lng': 103.850186415232, 'lat': 1.2991005826...",1.299101,103.850186,,189626.0,,4b05880af964a52095ad22e3
4,Strand Hotel,Hotel,25 Bencoolen St,SG,Singapore,Singapore,,259,"[25 Bencoolen St, 189619, Singapore]","[{'lng': 103.84995898321019, 'lat': 1.29844189...",1.298442,103.849959,,189619.0,,4b6e3d6df964a520a5b32ce3
5,Marina Bay Sands Hotel,Hotel,10 Bayfront Ave.,SG,Singapore,Singapore,,1996,"[10 Bayfront Ave., 018956, Singapore]",,1.283096,103.860296,,18956.0,,4bf7c0404a67c928061924cf
6,Hotel G Singapore,Hotel,200 Middle Road,SG,Singapore,Singapore,,558,"[200 Middle Road, 188980, Singapore]","[{'lng': 103.85147545152623, 'lat': 1.30066073...",1.300661,103.851475,,188980.0,,515d4b70e8897cf50c48cc03
7,Mercure Hotel,Hotel,122 Middle Road,SG,Singapore,Singapore,,610,"[122 Middle Road, 188973, Singapore]","[{'lng': 103.85317286291435, 'lat': 1.29950306...",1.299503,103.853173,,188973.0,,4b164b8ef964a52011b823e3
8,V Hotel Bencoolen,Hotel,48 Bencoolen Street #01-01,SG,Bugis,Singapore,,362,"[48 Bencoolen Street #01-01, 189627, Singapore]","[{'lng': 103.85058095759739, 'lat': 1.29912982...",1.29913,103.850581,,189627.0,,5109c750e4b0987e64028b97
9,Grand Park City Hall Hotel,Hotel,10 Coleman St.,SG,Singapore,Singapore,,517,"[10 Coleman St., 179809, Singapore]","[{'lng': 103.85031946942136, 'lat': 1.29232969...",1.29233,103.850319,,179809.0,,4b4ea3a5f964a52029f326e3
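
The `distance` column above shows that some results lie beyond the requested 500 m (for example, 844 for the JW Marriott and 1996 for Marina Bay Sands). To restrict the candidates to the intended radius, the distances can be recomputed and filtered. The sketch below uses the haversine formula with a spherical-Earth radius of 6,371 km, which is an assumption. The helper name `haversine_m` is hypothetical, and the three coordinates are copied from the table.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres, assuming a spherical Earth of radius 6,371 km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

museum = (1.296615, 103.848510)

# (name, lat, lng) copied from three rows of the table above
hotels = [
    ("Rendezvous Hotel Singapore", 1.298585, 103.849061),
    ("JW Marriott Hotel Singapore South Beach", 1.294469, 103.855793),
    ("Marina Bay Sands Hotel", 1.283096, 103.860296),
]

radius = 500  # metres
within = [
    name
    for name, lat, lng in hotels
    if haversine_m(museum[0], museum[1], lat, lng) <= radius
]
print(within)
```

On the full dataframe, filtering on the reported column with `dataframe_filtered[dataframe_filtered['distance'] <= radius]` would achieve the same result without recomputing anything.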


### Visualise the hotels near the *Museum Neighborhood*

In [368]:
dataframe_filtered.name

0                  Rendezvous Hotel Singapore
1                               Carlton Hotel
2     JW Marriott Hotel Singapore South Beach
3                             Hotel Bencoolen
4                                Strand Hotel
5                      Marina Bay Sands Hotel
6                           Hotel G Singapore
7                               Mercure Hotel
8                           V Hotel Bencoolen
9                  Grand Park City Hall Hotel
10                  Peninsula Excelsior Hotel
11                             Studio M Hotel
12                         Hotel Fort Canning
13                        The Fullerton Hotel
14         Hotel Jen Orchardgateway Singapore
15                             Concorde Hotel
16                             Victoria Hotel
17                              Raffles Hotel
18                        Hotel Grand Central
19             Tower 3 Marina Bay Sands Hotel
20                        Village Hotel Bugis
21          Fragrance Hotel Selegi

In [432]:
venues_map = folium.Map(location=[SGlatitude, SGlongitude], zoom_start=13) # generate map centred on Singapore

# add a red circle radius with museum neighborhood as epicentre
folium.features.CircleMarker(
    [1.296615, 103.848510],
    radius=100,
    color='red',
    popup='Museum Neighborhood Radius',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.1
).add_to(venues_map)

# add the hotels as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

## 9. Conclusion

The decision is to locate the cafe within a *500-metre radius* of the *Museum Neighborhood* because of its large number of hotels, restaurants, bars and other eating places. The area is also within walking distance of Singapore's attractions such as Chinatown, Marina Bay Sands, Singapore Flyer, National Gallery and many more.