Project: building a mall in Singapore.
Steps:
-scrape Wikipedia page with BeautifulSoup,
-create dataframe with pandas,
-get coordinates with geocoder,
-get venues with Foursquare API,
-cluster neighborhoods,
-select cluster.

In [1]:
!conda install -c conda-forge beautifulsoup4 --yes

!conda install -c conda-forge geopy --yes

!conda install -c conda-forge folium=0.5.0 --yes

print('Libraries installed!')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - beautifulsoup4


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2019.6.16  |       hecc5488_0         145 KB  conda-forge
    certifi-2018.8.24          |        py35_1001         139 KB  conda-forge
    beautifulsoup4-4.6.3       |           py35_0         139 KB  conda-forge
    openssl-1.0.2r             |       h14c3975_0         3.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:         3.6 MB

The following packages will be UPDATED:

    beautifulsoup4:  4.6.0-py35h442a8c9_1 --> 4.6.3-py35_0         conda-forge
    ca-certificates: 2019.1.23-0          --> 2019.6.16-hecc5488_0 conda-forge
    certifi:         2018.8.24-py35_1     --> 2018.8.24-py35_100

In [2]:
!conda install -c conda-forge geocoder --yes

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - geocoder


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geocoder-1.38.1            |             py_0          52 KB  conda-forge
    ratelim-0.1.6              |           py35_0           5 KB  conda-forge
    orderedset-2.0             |           py35_0         685 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         742 KB

The following NEW packages will be INSTALLED:

    geocoder:   1.38.1-py_0  conda-forge
    orderedset: 2.0-py35_0   conda-forge
    ratelim:    0.1.6-py35_0 conda-forge


Downloading and Extracting Packages
geocoder-1.38.1      | 52 KB     | ##################################### | 100% 
ratelim-0.1.6        | 5 KB      | #######################

In [3]:
from bs4 import BeautifulSoup  #parse html and xml
import requests   # deal with requests
import numpy as np  # vector data
import pandas as pd  # dataframe analysis 
pd.set_option('display.max_columns', None) # display all columns
pd.set_option('display.max_rows', None) # display all rows

from pandas.io.json import json_normalize #transform json in pandas df
import json #read json files

import folium #Manipulate your data in Python, then visualize it in a Leaflet map via folium
from geopy.geocoders import Nominatim #locate addresses
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
import geocoder # to get coordinates


print('Libraries installed')
 


Libraries installed


Import Libraries

In [4]:
data = requests.get("https://en.wikipedia.org/wiki/Postal_codes_in_Singapore").text

In [5]:
soup = BeautifulSoup(data, 'html.parser')

In [6]:
neighborhoodList3 = []
postalsectorList3 = []

In [7]:
for row in soup.find('table').find_all('tr'):   # iterate over all rows of the first table
    cells = row.find_all('td')
    if(len(cells) > 0):
        postalsectorList3.append(cells[0].text.rstrip('\n'))
        neighborhoodList3.append(cells[2].text.rstrip('\n'))
        

In [8]:
singa_df = pd.DataFrame({"PostalSector": postalsectorList3,
                           "Neighborhood": neighborhoodList3})

singa_df.head()

Unnamed: 0,Neighborhood,PostalSector
0,"Raffles Place, Cecil, Marina, People's Park",1
1,"Anson, Tanjong Pagar",2
2,"Bukit Merah, Queenstown, Tiong Bahru",3
3,"Telok Blangah, Harbourfront",4
4,"Pasir Panjang, Hong Leong Garden, Clementi New...",5


In [9]:
singa_df.shape

(28, 2)
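
The row-by-row extraction above can be tried offline on a small inline table. The following is a minimal sketch, assuming a three-column layout like the one the loop indexes (`cells[0]` and `cells[2]`); the HTML snippet, its header names and its values are made up for illustration and are not taken from the Wikipedia page.

```python
from bs4 import BeautifulSoup

# Made-up table with the same shape the scraping loop expects:
# the header row uses <th> cells, the data rows use <td> cells.
sample_html = """
<table>
  <tr><th>Code</th><th>Ignored</th><th>Location</th></tr>
  <tr><td>01
</td><td>x</td><td>Area A, Area B
</td></tr>
  <tr><td>02
</td><td>y</td><td>Area C
</td></tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

sample_rows = []
for row in sample_soup.find("table").find_all("tr"):
    cells = row.find_all("td")
    if cells:  # the header row has no <td> cells, so it is skipped
        # get_text(strip=True) drops surrounding whitespace, including '\n'
        sample_rows.append((cells[0].get_text(strip=True),
                            cells[2].get_text(strip=True)))

print(sample_rows)
```

Using `get_text(strip=True)` instead of `.text.rstrip('\n')` is a design choice here: it also removes leading whitespace and stray spaces, which keeps the later geocoding queries clean.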

In [10]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Singapore'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
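
The `while` loop above never terminates if the geocoding service keeps returning no result for a query. One possible alternative is a bounded retry, sketched below. The helper name `get_latlng_bounded`, its parameters, and the fake geocoder used for the demonstration are hypothetical and not part of the original notebook; the lookup is passed in as a callable so the sketch runs without network access.

```python
import time

def get_latlng_bounded(query, geocode, max_attempts=5, delay_seconds=0.0):
    """Call geocode(query) up to max_attempts times.

    Returns the first non-None result, or None if every attempt fails.
    """
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # optional pause between attempts
    return None

# Demonstration with a fake geocoder that fails twice and then succeeds.
calls = []

def flaky_geocode(query):
    calls.append(query)
    return None if len(calls) < 3 else [1.3, 103.8]

result = get_latlng_bounded("Somewhere, Singapore", flaky_geocode)
print(result, len(calls))

# With the real service, the callable could be (assumption, not run here):
# lambda q: geocoder.arcgis(q).latlng
```

A neighborhood that still has no coordinates after the last attempt yields `None`, which can then be inspected or dropped before mapping.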

In [11]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in singa_df["Neighborhood"].tolist() ]


In [12]:
coords

[[1.2818900000000326, 103.84912000000008],
 [1.2788900000000467, 103.84539000000007],
 [1.2895300000000702, 103.83208000000008],
 [1.2727710000000059, 103.8096601268702],
 [1.315125000000009, 103.75577550000003],
 [1.2906179999999985, 103.8494475],
 [1.2994090927865445, 103.85290178261569],
 [1.3071000000000481, 103.85842000000008],
 [1.3064600038951875, 103.83898002668911],
 [1.3404100000000199, 103.77221000000009],
 [1.3266700000000355, 103.81139000000007],
 [1.355540000000076, 103.87660000000005],
 [1.3278300693923344, 103.88544992281749],
 [1.3114700000000425, 103.88218000000006],
 [1.3008653140007216, 103.90163630630467],
 [1.320698967626008, 103.95086828863809],
 [1.3749700000000757, 103.97395000000006],
 [1.371940000000052, 103.94994000000008],
 [1.3638900000000262, 103.85750000000007],
 [1.3644700000000398, 103.83506000000006],
 [1.3277700261122058, 103.76665002221708],
 [1.3208800000000451, 103.74532000000005],
 [1.3787700000000314, 103.76977000000005],
 [1.4196700000000533, 1

In [13]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [14]:
singa_df['Latitude'] = df_coords['Latitude']
singa_df['Longitude'] = df_coords['Longitude']

In [15]:
singa_df

Unnamed: 0,Neighborhood,PostalSector,Latitude,Longitude
0,"Raffles Place, Cecil, Marina, People's Park",1,1.28189,103.84912
1,"Anson, Tanjong Pagar",2,1.27889,103.84539
2,"Bukit Merah, Queenstown, Tiong Bahru",3,1.28953,103.83208
3,"Telok Blangah, Harbourfront",4,1.272771,103.80966
4,"Pasir Panjang, Hong Leong Garden, Clementi New...",5,1.315125,103.755776
5,"High Street, Beach Road (part)",6,1.290618,103.849447
6,"Middle Road, Golden Mile",7,1.299409,103.852902
7,"Little India, Farrer Park, Jalan Besar, Lavender",8,1.3071,103.85842
8,"Orchard, Cairnhill, River Valley",9,1.30646,103.83898
9,"Ardmore, Bukit Timah, Holland Road, Tanglin",10,1.34041,103.77221


In [16]:
# save as csv
singa_df.to_csv("singa_df.csv", index=False)

## create a map of Singapore with neighborhoods superimposed on top

In [17]:
address = 'Singapore'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Singapore are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Singapore are 1.3408528, 103.878446863736.


In [18]:
#create a map of Singapore using these coordinates
map_singa = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(singa_df['Latitude'], singa_df['Longitude'], singa_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_singa)  
    
map_singa

In [23]:
#save map as html
map_singa.save('map_singa.html')

In [24]:
# define Foursquare credentials and API version
# (replace the placeholders with your own credentials; never publish real keys)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


## now let's use the Foursquare API

In [25]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(singa_df['Latitude'], singa_df['Longitude'], singa_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
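
The loop above indexes straight into the JSON response, so a venue without categories (or a response without groups) would raise an exception and stop the whole collection. The sketch below shows one defensive way to pull out the same seven fields. The helper name `extract_venues` and the sample payload are hypothetical; the payload only mimics the nesting that the cell above relies on and is not real Foursquare data.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Return one tuple per venue, skipping venues that have no category."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        if not categories:
            continue  # nothing to one-hot encode later, so skip this venue
        location = venue.get("location", {})
        rows.append((neighborhood, lat, lng,
                     venue.get("name"),
                     location.get("lat"), location.get("lng"),
                     categories[0].get("name")))
    return rows

# Made-up payload with the same nesting as the response used above.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue One",
               "location": {"lat": 1.28, "lng": 103.85},
               "categories": [{"name": "Shopping Mall"}]}},
    {"venue": {"name": "Venue Two",
               "location": {"lat": 1.29, "lng": 103.86},
               "categories": []}},
]}]}}

sample_venues = extract_venues(sample_payload, "Toy Neighborhood", 1.28, 103.85)
print(sample_venues)
```

In the notebook loop, the call would take the place of the inner `for venue in results` block, with `venues.extend(...)` collecting the returned tuples.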

In [26]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2442, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Napoleon Food & Wine Bar,1.279925,103.847333,Wine Bar
1,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Fat Saigon Boy,1.282977,103.849068,Vietnamese Restaurant
2,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Freehouse,1.281254,103.848513,Beer Garden
3,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Luke's Oyster Bar & Chop House,1.282459,103.84724,Seafood Restaurant
4,"Raffles Place, Cecil, Marina, People's Park",1.28189,103.84912,Wine Connection,1.283888,103.848359,Wine Bar


Let's check how many venues were returned for each neighborhood

In [27]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Anson, Tanjong Pagar",100,100,100,100,100,100
"Ardmore, Bukit Timah, Holland Road, Tanglin",93,93,93,93,93,93
"Balestier, Toa Payoh, Serangoon",100,100,100,100,100,100
"Bedok, Upper East Coast, Eastwood, Kew Drive",100,100,100,100,100,100
"Bishan, Ang Mo Kio",100,100,100,100,100,100
"Bukit Merah, Queenstown, Tiong Bahru",100,100,100,100,100,100
"Geylang, Eunos",100,100,100,100,100,100
"High Street, Beach Road (part)",100,100,100,100,100,100
"Hillview, Dairy Farm, Bukit Panjang, Choa Chu Kang",72,72,72,72,72,72
"Jurong, Tuas",100,100,100,100,100,100


Let's find out how many unique categories can be curated from all the returned venues

In [28]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 247 unique categories.


In [29]:
# print the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Wine Bar', 'Vietnamese Restaurant', 'Beer Garden',
       'Seafood Restaurant', 'Modern European Restaurant', 'Restaurant',
       'Korean Restaurant', 'Hotpot Restaurant', 'Café',
       'Gym / Fitness Center', 'Food Court', 'Cocktail Bar',
       'Sandwich Place', 'Martial Arts Dojo', 'Mediterranean Restaurant',
       'Buddhist Temple', 'Japanese Restaurant', 'Hotel',
       'Street Food Gathering', 'Deli / Bodega', 'Gym',
       'Comfort Food Restaurant', 'Yoga Studio', 'Spanish Restaurant',
       'Other Great Outdoors', 'Salad Place', 'Chinese Restaurant',
       'Hostel', 'Pub', 'Waterfront', 'Dumpling Restaurant', 'Bar',
       'Brewery', 'Coffee Shop', 'Burrito Place', 'Bookstore',
       'Harbor / Marina', 'Massage Studio', 'Ice Cream Shop',
       'Ramen Restaurant', 'Plaza', 'Pool', 'Bridge', 'Bakery',
       'Tapas Restaurant', 'Outdoor Sculpture', 'Italian Restaurant',
       'Nightclub', 'Beer Bar', 'Kebab Restaurant'], dtype=object)

In [31]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

In [32]:
# one hot encoding
singa_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# (named 'Neighborhoods' so it does not clash with the 'Neighborhood' venue category column)
singa_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [singa_onehot.columns[-1]] + list(singa_onehot.columns[:-1])
singa_onehot = singa_onehot[fixed_columns]

print(singa_onehot.shape)
singa_onehot.head()

(2442, 248)


Unnamed: 0,Neighborhoods,Accessories Store,Airport,Airport Service,Airport Terminal,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bay,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Betting Shop,Bike Trail,Bistro,Boarding House,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Cafeteria,Café,Campground,Candy Store,Cantonese Restaurant,Chinese Restaurant,Chocolate Shop,Churrascaria,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,College Theater,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Dumpling Restaurant,Electronics Store,English Restaurant,Event Space,Farm,Fast Food Restaurant,Filipino Restaurant,Fishing Spot,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fujian Restaurant,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Harbor / Marina,High School,Historic Site,History Museum,Hobby Shop,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hot Spring,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Island,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Lingerie Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Mosque,Mountain,Movie Theater,Multiplex,Museum,Nature Preserve,Neighborhood,Nightclub,Non-Profit,Noodle House,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Post Office,Print Shop,Pub,Racetrack,Ramen Restaurant,Recreation Center,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,Rock Climbing Spot,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Ski Area,Ski Chalet,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Club,Stadium,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Swiss Restaurant,Taiwanese Restaurant,Tapas Restaurant,Taxi Stand,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio
0,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
1,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
2,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Raffles Place, Cecil, Marina, People's Park",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0


In [33]:
singa_grouped = singa_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(singa_grouped.shape)
singa_grouped

(28, 248)


Unnamed: 0,Neighborhoods,Accessories Store,Airport,Airport Service,Airport Terminal,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bay,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Betting Shop,Bike Trail,Bistro,Boarding House,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Cafeteria,Café,Campground,Candy Store,Cantonese Restaurant,Chinese Restaurant,Chocolate Shop,Churrascaria,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,College Theater,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Dumpling Restaurant,Electronics Store,English Restaurant,Event Space,Farm,Fast Food Restaurant,Filipino Restaurant,Fishing Spot,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fujian Restaurant,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Harbor / Marina,High School,Historic Site,History Museum,Hobby Shop,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hot Spring,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Island,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Lingerie Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Mosque,Mountain,Movie Theater,Multiplex,Museum,Nature Preserve,Neighborhood,Nightclub,Non-Profit,Noodle House,Other Great Outdoors,Other Repair Shop,Outdoor Sculpture,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Pizza Place,Planetarium,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Post Office,Print Shop,Pub,Racetrack,Ramen Restaurant,Recreation Center,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,Rock Climbing Spot,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Ski Area,Ski Chalet,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Club,Stadium,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Swiss Restaurant,Taiwanese Restaurant,Tapas Restaurant,Taxi Stand,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Water Park,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio
0,"Anson, Tanjong Pagar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.08,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.02
1,"Ardmore, Bukit Timah, Holland Road, Tanglin",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.010753,0.010753,0.010753,0.0,0.053763,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.010753,0.010753,0.0,0.0,0.0,0.0,0.010753,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.021505,0.010753,0.0,0.064516,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753,0.010753,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.086022,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021505,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.021505,0.010753,0.010753,0.0,0.010753,0.0,0.0,0.021505,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021505,0.021505,0.0,0.0,0.010753,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.043011,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.021505,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.075269,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0
2,"Balestier, Toa Payoh, Serangoon",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.02,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.03,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0
3,"Bedok, Upper East Coast, Eastwood, Kew Drive",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.07,0.01,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
4,"Bishan, Ang Mo Kio",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.11,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Bukit Merah, Queenstown, Tiong Bahru",0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.01,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.02,0.0,0.03
6,"Geylang, Eunos",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.01,0.09,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.03,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.05,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01
7,"High Street, Beach Road (part)",0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.12,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.04,0.02,0.03,0.0,0.01
8,"Hillview, Dairy Farm, Bukit Panjang, Choa Chu ...",0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.013889,0.027778,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.069444,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.013889,0.0,0.013889,0.013889,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.013889,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.013889,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Jurong, Tuas",0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.02,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.03,0.07,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.06,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [34]:
len(singa_grouped[singa_grouped["Shopping Mall"] > 0])

19

Create a new DataFrame for Shopping Mall data only

In [35]:
singa_mall = singa_grouped[["Neighborhoods","Shopping Mall"]]

In [36]:
singa_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,"Anson, Tanjong Pagar",0.0
1,"Ardmore, Bukit Timah, Holland Road, Tanglin",0.0
2,"Balestier, Toa Payoh, Serangoon",0.01
3,"Bedok, Upper East Coast, Eastwood, Kew Drive",0.02
4,"Bishan, Ang Mo Kio",0.0


Cluster Neighborhoods

Run k-means to cluster the neighborhoods in Singapore into 3 clusters.

In [37]:
# set number of clusters
kclusters = 3

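# keep only the numeric feature column; pass axis by keyword (the positional form is deprecated)
singa_clustering = singa_mall.drop(["Neighborhoods"], axis=1)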

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(singa_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 1, 1, 2, 0, 1, 1, 0, 0], dtype=int32)
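The label numbers are arbitrary, so it can help to look at the cluster centers to see which label corresponds to a high or low mall frequency. The following is a minimal sketch, assuming a small hand-made frame (`sample`, a hypothetical name) with a few frequencies copied from the tables in this notebook rather than the full `singa_mall` data:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Illustrative subset only (assumption): a few "Shopping Mall" frequencies
sample = pd.DataFrame({
    "Neighborhood": [
        "Anson, Tanjong Pagar",
        "Balestier, Toa Payoh, Serangoon",
        "Bedok, Upper East Coast, Eastwood, Kew Drive",
        "Bukit Merah, Queenstown, Tiong Bahru",
        "Orchard, Cairnhill, River Valley",
        "Bishan, Ang Mo Kio",
    ],
    "Shopping Mall": [0.0, 0.01, 0.02, 0.04, 0.06, 0.0],
})

km = KMeans(n_clusters=3, random_state=0, n_init=10).fit(sample[["Shopping Mall"]])

# One center per cluster: the mean mall frequency of the cluster's members
centers = pd.Series(km.cluster_centers_.ravel(), name="Mean mall frequency")
centers.index.name = "Cluster Labels"
print(centers.sort_values())
```

Sorting the centers gives a low-to-high ordering of the clusters that does not depend on which integer k-means happened to assign to each one.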

In [38]:
# create a new dataframe that holds the Shopping Mall frequency and the cluster label for each neighborhood
singa_merged = singa_mall.copy()

# add clustering labels
singa_merged["Cluster Labels"] = kmeans.labels_

In [39]:
singa_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
singa_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,"Anson, Tanjong Pagar",0.0,2
1,"Ardmore, Bukit Timah, Holland Road, Tanglin",0.0,2
2,"Balestier, Toa Payoh, Serangoon",0.01,1
3,"Bedok, Upper East Coast, Eastwood, Kew Drive",0.02,1
4,"Bishan, Ang Mo Kio",0.0,2


In [40]:
# join singa_merged with singa_df to add postal sector, latitude and longitude for each neighborhood
singa_merged = singa_merged.join(singa_df.set_index("Neighborhood"), on="Neighborhood")

print(singa_merged.shape)
singa_merged.head() # check the last columns!

(28, 6)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,PostalSector,Latitude,Longitude
0,"Anson, Tanjong Pagar",0.0,2,2,1.27889,103.84539
1,"Ardmore, Bukit Timah, Holland Road, Tanglin",0.0,2,10,1.34041,103.77221
2,"Balestier, Toa Payoh, Serangoon",0.01,1,12,1.35554,103.8766
3,"Bedok, Upper East Coast, Eastwood, Kew Drive",0.02,1,16,1.320699,103.950868
4,"Bishan, Ang Mo Kio",0.0,2,20,1.36447,103.83506
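A join on the neighborhood name silently produces NaN coordinates wherever the names do not match exactly, so a quick check after the join is worthwhile. A minimal sketch with toy frames; the names `left`, `coords` and the "Nowhere" row are hypothetical and only there to trigger a mismatch:

```python
import pandas as pd

# Toy stand-ins (assumption) for singa_merged and singa_df
left = pd.DataFrame({
    "Neighborhood": ["Jurong, Tuas", "Seletar", "Nowhere"],
    "Cluster Labels": [0, 2, 1],
})
coords = pd.DataFrame({
    "Neighborhood": ["Jurong, Tuas", "Seletar"],
    "Latitude": [1.32088, 1.41],
    "Longitude": [103.74532, 103.87417],
})

merged = left.join(coords.set_index("Neighborhood"), on="Neighborhood")

# Rows whose name found no match end up with NaN coordinates
missing = merged[merged[["Latitude", "Longitude"]].isna().any(axis=1)]
print(missing["Neighborhood"].tolist())
```

An empty list here would mean every neighborhood received coordinates and can be drawn on the map.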


In [41]:
# sort the results by Cluster Labels
print(singa_merged.shape)
singa_merged.sort_values(["Cluster Labels"], inplace=True)
singa_merged

(28, 6)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,PostalSector,Latitude,Longitude
5,"Bukit Merah, Queenstown, Tiong Bahru",0.04,0,3,1.28953,103.83208
17,"Orchard, Cairnhill, River Valley",0.06,0,9,1.30646,103.83898
8,"Hillview, Dairy Farm, Bukit Panjang, Choa Chu ...",0.041667,0,23,1.37877,103.76977
9,"Jurong, Tuas",0.06,0,22,1.32088,103.74532
13,"Little India, Farrer Park, Jalan Besar, Lavender",0.02,1,8,1.3071,103.85842
23,"Telok Blangah, Harbourfront",0.01,1,4,1.272771,103.80966
22,"Simei, Tampines, Pasir Ris",0.01087,1,18,1.37194,103.94994
21,"Serangoon Garden, Hougang, Punggol",0.01,1,19,1.36389,103.8575
18,"Pasir Panjang, Hong Leong Garden, Clementi New...",0.02,1,5,1.315125,103.755776
16,"Middle Road, Golden Mile",0.02,1,7,1.299409,103.852902


Finally, let's visualize the resulting clusters

In [42]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, one color per cluster label
for lat, lon, poi, cluster in zip(singa_merged['Latitude'], singa_merged['Longitude'], singa_merged['Neighborhood'], singa_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels start at 0, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
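The color scheme only needs one hex color per cluster label. A minimal sketch of that mapping on its own, assuming matplotlib's `rainbow` colormap as in the cell above; the names `palette` and `color_of` are hypothetical:

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 3

# Sample the colormap at evenly spaced points, one per cluster
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]

# Cluster labels run from 0 to kclusters - 1, so they index the palette directly
color_of = {cluster: palette[cluster] for cluster in range(kclusters)}
print(color_of)
```

Printing the mapping makes it easy to write a legend next to the saved map, since folium does not add one by default.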

In [43]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

Examine Clusters

Cluster 0

In [45]:
singa_merged.loc[singa_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,PostalSector,Latitude,Longitude
5,"Bukit Merah, Queenstown, Tiong Bahru",0.04,0,3,1.28953,103.83208
17,"Orchard, Cairnhill, River Valley",0.06,0,9,1.30646,103.83898
8,"Hillview, Dairy Farm, Bukit Panjang, Choa Chu ...",0.041667,0,23,1.37877,103.76977
9,"Jurong, Tuas",0.06,0,22,1.32088,103.74532


Cluster 1

In [46]:
singa_merged.loc[singa_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,PostalSector,Latitude,Longitude
13,"Little India, Farrer Park, Jalan Besar, Lavender",0.02,1,8,1.3071,103.85842
23,"Telok Blangah, Harbourfront",0.01,1,4,1.272771,103.80966
22,"Simei, Tampines, Pasir Ris",0.01087,1,18,1.37194,103.94994
21,"Serangoon Garden, Hougang, Punggol",0.01,1,19,1.36389,103.8575
18,"Pasir Panjang, Hong Leong Garden, Clementi New...",0.02,1,5,1.315125,103.755776
16,"Middle Road, Golden Mile",0.02,1,7,1.299409,103.852902
15,"Macpherson, Braddell",0.01,1,13,1.32783,103.88545
26,"Watten Estate, Novena, Thomson",0.01,1,11,1.32667,103.81139
27,"Yishun, Sembawang",0.018868,1,27,1.44794,103.81891
11,"Kranji, Woodgrove, Woodlands",0.025316,1,25,1.4294,103.78149


Cluster 2

In [48]:
singa_merged.loc[singa_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,PostalSector,Latitude,Longitude
14,"Loyang, Changi",0.0,2,17,1.37497,103.97395
4,"Bishan, Ang Mo Kio",0.0,2,20,1.36447,103.83506
19,"Raffles Place, Cecil, Marina, People's Park",0.0,2,1,1.28189,103.84912
20,Seletar,0.0,2,28,1.41,103.87417
1,"Ardmore, Bukit Timah, Holland Road, Tanglin",0.0,2,10,1.34041,103.77221
24,"Upper Bukit Timah, Clementi Park, Ulu Pandan",0.0,2,21,1.32777,103.76665
25,"Upper Thomson, Springleaf",0.0,2,26,1.401178,103.817162
12,"Lim Chu Kang, Tengah",0.0,2,24,1.41967,103.70232
0,"Anson, Tanjong Pagar",0.0,2,2,1.27889,103.84539


Shopping malls are most concentrated in Cluster 0, followed by Cluster 1, and the neighborhoods in Cluster 2 all have a Shopping Mall frequency of 0.0, so the north-west part of the city looks like an area where a shopping mall could face less competition.
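The ranking of the clusters can be checked with a per-cluster summary. A minimal sketch, assuming a hand-copied subset of the rows shown above (`subset` is a hypothetical name) rather than the full `singa_merged` frame:

```python
import pandas as pd

# Subset of rows copied from the cluster tables (assumption: not the full data)
rows = [
    ("Bukit Merah, Queenstown, Tiong Bahru", 0.04, 0),
    ("Orchard, Cairnhill, River Valley", 0.06, 0),
    ("Jurong, Tuas", 0.06, 0),
    ("Telok Blangah, Harbourfront", 0.01, 1),
    ("Middle Road, Golden Mile", 0.02, 1),
    ("Macpherson, Braddell", 0.01, 1),
    ("Loyang, Changi", 0.0, 2),
    ("Seletar", 0.0, 2),
    ("Lim Chu Kang, Tengah", 0.0, 2),
]
subset = pd.DataFrame(rows, columns=["Neighborhood", "Shopping Mall", "Cluster Labels"])

# Count, mean and maximum mall frequency per cluster
summary = subset.groupby("Cluster Labels")["Shopping Mall"].agg(["count", "mean", "max"])
print(summary)
```

Running the same `groupby` on the full `singa_merged` frame would give the figures for all 28 neighborhoods.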