# Capstone Project - The Battle of Neighborhoods

In this project I will use location data to explore a geographical location, and use data science techniques such as clustering and visualization to solve the problem defined below.

## Problem

In this project I will answer one question: 'Where is the proper location to open a restaurant in Hong Kong?'

## Data

The main data comes from two sources:

1. List of districts and neighborhoods in Hong Kong from Wikipedia (https://en.wikipedia.org/wiki/List_of_places_in_Hong_Kong)
2. Foursquare

_**Note:** Some data may contain Chinese characters (for example, some place names), though I've tried my best to avoid them. Please be aware that this doesn't affect the analysis or the report at all._

## Prepare data

First, load the necessary libraries.

In [1]:
import requests
import folium

import numpy as np
import pandas as pd

import matplotlib.cm as cm
import matplotlib.colors as colors

from pandas import json_normalize  # top-level import; the pandas.io.json path is deprecated since pandas 1.0
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans

The list of districts and neighborhoods in Hong Kong is from [Wikipedia](https://en.wikipedia.org/wiki/List_of_places_in_Hong_Kong), and the coordinate data is from [https://www.maps.ie/coordinates.html](https://www.maps.ie/coordinates.html).

Hong Kong consists of Hong Kong Island, the Kowloon Peninsula, the New Territories, Lantau Island, and over 200 other islands. This project will focus on Hong Kong Island and Kowloon.

I created the `.csv` file manually from these two sources.

Let's load and explore it.

In [2]:
df_hk = pd.read_csv('neighborhoods_hong_kong.csv')

df_hk.head()

Unnamed: 0,District,Neighborhood,Latitude,Longitude
0,Central & Western,Central District,22.281322,114.160258
1,Central & Western,Mid-Levels,22.282405,114.145809
2,Central & Western,The Peak,22.272003,114.152417
3,Central & Western,Sai Wan,22.285838,114.134023
4,Central & Western,Sheung Wan,22.28687,114.150267


Check the shape of the DataFrame.

In [3]:
df_hk.shape

(60, 4)

## Visualize the geographic data

In [4]:
latitude = 22.2793278
longitude = 114.1828131

map_hk = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_hk['Latitude'], df_hk['Longitude'], df_hk['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hk)  
    
map_hk

## Explore the Foursquare's API

In [5]:
CLIENT_ID = 'ACSOOGP1BDKW4B4SPRD3AZESLUCZD4GP5BXLYV0DALNLA42A'
CLIENT_SECRET = 'JGORWI5LFBDW4YIXDXQERQCTFBJ2IVOXYP1E5HFVVCRZVSMU'
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: ACSOOGP1BDKW4B4SPRD3AZESLUCZD4GP5BXLYV0DALNLA42A
CLIENT_SECRET:JGORWI5LFBDW4YIXDXQERQCTFBJ2IVOXYP1E5HFVVCRZVSMU


Generate the request URL.

In [6]:
LIMIT = 100
radius = 1000

neighborhood_latitude = 22.30383
neighborhood_longitude = 114.18297

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=ACSOOGP1BDKW4B4SPRD3AZESLUCZD4GP5BXLYV0DALNLA42A&client_secret=JGORWI5LFBDW4YIXDXQERQCTFBJ2IVOXYP1E5HFVVCRZVSMU&v=20180605&ll=22.30383,114.18297&radius=1000&limit=100'
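
The same kind of URL can also be assembled from a dictionary of query parameters, which makes each parameter explicit and avoids the stray `&` right after the `?`. A minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper name and the credentials are placeholders:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble an explore request URL from named query parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in "lat,lng" readable instead of percent-encoding it
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')

# Placeholder credentials, same coordinates and limits as the cell above
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 22.30383, 114.18297, 1000, 100)
print(example_url)
```

Keeping the parameters in a dictionary also makes it easy to change the radius or limit without editing a long format string.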

Send the request and examine the results.

In [7]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ca03757351e3d4c79e09804'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4b9248b1f964a520fcef33e3-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/arts_entertainment/stadium_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d184941735',
         'name': 'Stadium',
         'pluralName': 'Stadiums',
         'primary': True,
         'shortName': 'Stadium'}],
       'id': '4b9248b1f964a520fcef33e3',
       'location': {'address': '9 Cheong Wan Rd',
        'cc': 'HK',
        'city': '红磡',
        'country': '香港',
        'distance': 285,
        'formattedAddress': ['9 Cheong Wan Rd', '香港'],
        'labeledLatLngs': [{'label': 'display',
          'lat': 22.301417,
          'lng': 114.1820305}],
        'lat': 22.30141
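
In the response above, the fields of interest sit under `response`, then `groups[0]`, then `items[i]['venue']`. A minimal sketch of pulling them out defensively, run on a hand-written payload that only mimics this shape; `extract_venues` is a hypothetical helper and the sample names and coordinates are illustrative, not real API output:

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # A venue without categories yields None instead of raising IndexError
        category = categories[0].get('name') if categories else None
        rows.append((venue.get('name'), location.get('lat'), location.get('lng'), category))
    return rows

# Hand-written sample mimicking the response shape; values are illustrative
sample = {
    'meta': {'code': 200},
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Example Stadium',
                   'location': {'lat': 22.3014, 'lng': 114.182},
                   'categories': [{'name': 'Stadium', 'primary': True}]}},
        {'venue': {'name': 'Uncategorized Spot',
                   'location': {'lat': 22.302, 'lng': 114.183},
                   'categories': []}},
    ]}]},
}

venue_rows = extract_venues(sample)
print(venue_rows)
```

Returning `None` for a missing category is one possible design choice; dropping such venues would be another.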

## Explore neighborhoods

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for venues near each neighborhood and return them as a single DataFrame."""
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [9]:
hk_venues = getNearbyVenues(names=df_hk['Neighborhood'],
                                   latitudes=df_hk['Latitude'],
                                   longitudes=df_hk['Longitude']
                                  )

Central District
Mid-Levels
The Peak
Sai Wan
Sheung Wan
Chai Wan
North Point
Quarry Bay
Sai Wan Ho
Shau Kei Wan
Siu Sai Wan
Aberdeen
Ap Lei Chau
Chung Hom Kok
Cyberport
Deep Water Bay
Pok Fu Lam
Tin Wan
Repulse Bay
Stanley
Shek O
Tai Tam
Wong Chuk Hang
Causeway Bay
Happy Valley
Tai Hang
Wan Chai
Ho Man Tin
Hung Hom
Kowloon City
Kowloon Tong
Kowloon Tsai
Ma Tau Kok
Ma Tau Wai
To Kwa Wan
Cha Kwo Ling
Kwun Tong
Lam Tin
Ngau Tau Kok
Kowloon Bay
Sau Mau Ping
Yau Tong
Cheung Sha Wan
Lai Chi Kok
Sham Shui Po
Shek Kip Mei
Stonecutters Island
Yau Yat Chuen
Diamond Hill
Kowloon Peak
Ngau Chi Wan
San Po Kong
Tsz Wan Shan
Wang Tau Hom
Wong Tai Sin
Mong Kok
Tai Kok Tsui
Tsim Sha Tsui
Tsim Sha Tsui East
Yau Ma Tei


Check the size of the resulting dataframe.

In [10]:
print(hk_venues.shape)
hk_venues.head()

(1901, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central District,22.281322,114.160258,Mott 32 (卅二公館),22.280696,114.15938,Dim Sum Restaurant
1,Central District,22.281322,114.160258,Mandarin Oriental Hong Kong (香港文華東方酒店),22.281879,114.159443,Hotel
2,Central District,22.281322,114.160258,Mandarin Grill + Bar 文華扒房＋酒吧,22.281462,114.160156,Steakhouse
3,Central District,22.281322,114.160258,8½ Otto e Mezzo Bombana,22.281726,114.158767,Italian Restaurant
4,Central District,22.281322,114.160258,The Mandarin Cake Shop,22.281959,114.159416,Bakery


In [11]:
hk_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Aberdeen,27,27,27,27,27,27
Ap Lei Chau,28,28,28,28,28,28
Causeway Bay,100,100,100,100,100,100
Central District,100,100,100,100,100,100
Cha Kwo Ling,5,5,5,5,5,5
Chai Wan,24,24,24,24,24,24
Cheung Sha Wan,34,34,34,34,34,34
Chung Hom Kok,2,2,2,2,2,2
Cyberport,24,24,24,24,24,24
Deep Water Bay,2,2,2,2,2,2


In [12]:
print('There are {} unique categories.'.format(len(hk_venues['Venue Category'].unique())))

There are 216 unique categories.


## Pre-Processing

In [13]:
# one hot encoding
hk_onehot = pd.get_dummies(hk_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hk_onehot['Neighborhood'] = hk_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hk_onehot.columns[-1]] + list(hk_onehot.columns[:-1])
hk_onehot = hk_onehot[fixed_columns]

hk_onehot.head()

Unnamed: 0,Zoo,Accessories Store,Airport Service,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Tram Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zhejiang Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [14]:
hk_onehot.shape

(1901, 216)
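
Note that the first column in the output above is `Zoo` rather than `Neighborhood`, and that the frame has 216 columns, the same as the number of unique categories. This is consistent with the venue data containing a category that is itself named `Neighborhood`: the assignment then overwrites that dummy column in place instead of appending a new one, and the last column (`Zoo`) is what gets moved to the front. The later `groupby('Neighborhood')` still works because the column holds neighborhood names. A minimal sketch, on a toy frame with made-up rows, of placing the column by name so the result does not depend on column position:

```python
import pandas as pd

# Toy data (made up); note the category literally named 'Neighborhood'
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Bakery', 'Neighborhood', 'Zoo'],
})

toy_onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix="", prefix_sep="")

# Drop any dummy column that clashes with the label column, then insert the label first
toy_onehot = toy_onehot.drop(columns='Neighborhood', errors='ignore')
toy_onehot.insert(0, 'Neighborhood', toy_venues['Neighborhood'])

print(toy_onehot.columns.tolist())
```

Dropping the clashing dummy column loses that one category, which matches what the overwrite does; renaming the category before encoding would be an alternative that keeps it.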

In [15]:
hk_grouped = hk_onehot.groupby('Neighborhood').mean().reset_index()
hk_grouped

Unnamed: 0,Neighborhood,Zoo,Accessories Store,Airport Service,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Tram Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zhejiang Restaurant
0,Aberdeen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0
1,Ap Lei Chau,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0
2,Causeway Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0
3,Central District,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0
4,Cha Kwo Ling,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Chai Wan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0
6,Cheung Sha Wan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0
7,Chung Hom Kok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Cyberport,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0
9,Deep Water Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
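
Taking the mean of the one-hot columns per neighborhood gives each category's share of that neighborhood's venues. For example, Aberdeen has 27 venues, so a single Waterfront venue shows up as 1/27 ≈ 0.037037. The same shares can be obtained by counting directly. A minimal sketch using only the standard library; the venue pairs are made up for illustration:

```python
from collections import Counter, defaultdict

# (neighborhood, category) pairs; a made-up sample, not the real Foursquare data
pairs = [
    ('X', 'Park'), ('X', 'Park'), ('X', 'Hotel'), ('X', 'Bakery'),
    ('Y', 'Park'),
]

counts = defaultdict(Counter)
for neighborhood, category in pairs:
    counts[neighborhood][category] += 1

# Share of each category within its neighborhood, i.e. what the grouped mean
# of one-hot columns computes
frequencies = {
    hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for hood, counter in counts.items()
}

print(frequencies['X'])
print(round(1 / 27, 6))  # one venue out of 27
```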


Check the size of the grouped dataframe.

In [16]:
hk_grouped.shape

(59, 216)

The grouped dataframe has 59 rows, one fewer than the 60 neighborhoods in the neighborhood dataframe. Let's find out which one is missing.

In [17]:
missing_neighborhood = [i for i in df_hk['Neighborhood'].unique() if i not in hk_grouped['Neighborhood'].unique()]

missing_neighborhood

['Stonecutters Island']

'Stonecutters Island' is missing from the grouped dataframe, which means no venues were returned for it. After some research, I found that Stonecutters Island is a military port, so I decided to exclude it from the dataset.

In [18]:
df_hk = df_hk[df_hk['Neighborhood'] != 'Stonecutters Island']

Print each neighborhood along with the top 5 most common venues.

In [19]:
num_top_venues = 5

for hood in hk_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = hk_grouped[hk_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aberdeen----
                  venue  freq
0  Fast Food Restaurant  0.11
1         Grocery Store  0.07
2      Sushi Restaurant  0.07
3        Cha Chaan Teng  0.07
4    Chinese Restaurant  0.07


----Ap Lei Chau----
                    venue  freq
0    Fast Food Restaurant  0.11
1  Furniture / Home Store  0.11
2      Seafood Restaurant  0.07
3           Grocery Store  0.07
4           Shopping Mall  0.07


----Causeway Bay----
                 venue  freq
0  Japanese Restaurant  0.10
1   Chinese Restaurant  0.08
2         Dessert Shop  0.05
3          Coffee Shop  0.05
4     Sushi Restaurant  0.04


----Central District----
                  venue  freq
0    Chinese Restaurant  0.05
1                Lounge  0.04
2     French Restaurant  0.04
3  Gym / Fitness Center  0.04
4           Social Club  0.04


----Cha Kwo Ling----
                  venue  freq
0     Convenience Store   0.4
1  Fast Food Restaurant   0.2
2         Shopping Mall   0.2
3          Noodle House   0.2
4         Je

                   venue  freq
0                   Café  0.12
1         Cha Chaan Teng  0.07
2     Chinese Restaurant  0.07
3    Japanese Restaurant  0.05
4  Vietnamese Restaurant  0.05


----Tai Kok Tsui----
                 venue  freq
0         Noodle House  0.11
1          Coffee Shop  0.08
2  Japanese Restaurant  0.07
3         Burger Joint  0.03
4    Hotpot Restaurant  0.03


----Tai Tam----
                     venue  freq
0                     Park   1.0
1                      Zoo   0.0
2  New American Restaurant   0.0
3        Outdoor Sculpture   0.0
4     Outdoor Supply Store   0.0


----The Peak----
              venue  freq
0    Scenic Lookout  0.11
1  Asian Restaurant  0.07
2     Shopping Mall  0.07
3    Ice Cream Shop  0.07
4       Pizza Place  0.04


----Tin Wan----
           venue  freq
0     Restaurant  0.25
1    Fish Market  0.25
2         Hostel  0.25
3  Shopping Mall  0.25
4            Zoo  0.00


----To Kwa Wan----
                  venue  freq
0    Chinese Restau

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create the new dataframe and display the top 10 venues for each neighborhood.

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = hk_grouped['Neighborhood']

for ind in np.arange(hk_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hk_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aberdeen,Fast Food Restaurant,Sushi Restaurant,Grocery Store,Cha Chaan Teng,Thai Restaurant,Chinese Restaurant,Athletics & Sports,Market,Shopping Mall,Taiwanese Restaurant
1,Ap Lei Chau,Fast Food Restaurant,Furniture / Home Store,Chinese Restaurant,Shopping Mall,Grocery Store,Seafood Restaurant,Cupcake Shop,Bus Station,Outlet Store,Café
2,Causeway Bay,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Dessert Shop,Bakery,Sushi Restaurant,Cantonese Restaurant,Noodle House,Café,Hotel
3,Central District,Chinese Restaurant,French Restaurant,Gym / Fitness Center,Social Club,Lounge,Cantonese Restaurant,Hotel,Italian Restaurant,Steakhouse,Spa
4,Cha Kwo Ling,Convenience Store,Noodle House,Fast Food Restaurant,Shopping Mall,Donburi Restaurant,Flea Market,Fish Market,Field,Farmers Market,English Restaurant
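
The column labels above rely on only the first three suffixes being special and every later position taking `th`, which holds for the ten columns used here. A minimal sketch of a more general helper, in case a larger `num_top_venues` is ever used; `ordinal` is a hypothetical name:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Same column layout as the notebook: 'Neighborhood' plus ten ranked venue columns
top_columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(top_columns[1:4], top_columns[-1])
```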


Find venues whose category ends in `Station` or starts with `Bus`, that is, bus stops and bus, tram, or metro stations.

In [22]:
df_station = hk_venues[hk_venues['Venue Category'].str.contains('Station$') |
                       hk_venues['Venue Category'].str.contains('^Bus')]
df_station.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
147,The Peak,22.272003,114.152417,Peak Tram Upper Terminus (山頂纜車凌霄閣總站),22.271115,114.150183,Tram Station
162,The Peak,22.272003,114.152417,Peak Tram May Road Station (山頂纜車梅道站),22.273614,114.155979,Light Rail Station
461,Shau Kei Wan,22.279343,114.228898,Shau Kei Wan Tram Terminus (筲箕灣電車總站),22.277801,114.23022,Tram Station
473,Shau Kei Wan,22.279343,114.228898,Chai Wan Road Tram Stop (101E/02W) (柴灣道電車站),22.276824,114.228662,Tram Station
481,Shau Kei Wan,22.279343,114.228898,Hoi Foo Street Tram Stop (95E/06W) (海富街電車站),22.280369,114.224526,Tram Station
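
The two patterns select categories that end in `Station` or start with `Bus`. A small sketch of what they match on a list of category names, some taken from this notebook's output and some hypothetical (marked in the comments). It also shows that such broad patterns could pick up non-transit categories if any were present, and one possible stricter pattern, which is an assumption and not part of the original analysis:

```python
import re

categories = [
    'Tram Station', 'Light Rail Station', 'Bus Station', 'Bus Stop',  # seen in this notebook's output
    'Metro Station', 'Gas Station', 'Business Center',                # hypothetical examples
    'Chinese Restaurant',
]

# Same logic as the two str.contains() calls combined with |
broad = re.compile(r'Station$|^Bus')
broad_matches = [c for c in categories if broad.search(c)]

# A stricter alternative that lists the transit types explicitly (illustrative assumption)
strict = re.compile(r'^(Bus|Tram|Metro|Light Rail) (Station|Stop)$')
strict_matches = [c for c in categories if strict.search(c)]

print(broad_matches)
print(strict_matches)
```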


Insert a new column that indicates whether there is a station nearby.

In [24]:
cols = df_station['Neighborhood'].unique()
indice = neighborhoods_venues_sorted[neighborhoods_venues_sorted['Neighborhood'].isin(cols)].index.values

neighborhoods_venues_sorted['Station'] = 'No'
neighborhoods_venues_sorted.loc[indice, 'Station'] = 'Yes'

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
0,Aberdeen,Fast Food Restaurant,Sushi Restaurant,Grocery Store,Cha Chaan Teng,Thai Restaurant,Chinese Restaurant,Athletics & Sports,Market,Shopping Mall,Taiwanese Restaurant,Yes
1,Ap Lei Chau,Fast Food Restaurant,Furniture / Home Store,Chinese Restaurant,Shopping Mall,Grocery Store,Seafood Restaurant,Cupcake Shop,Bus Station,Outlet Store,Café,Yes
2,Causeway Bay,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Dessert Shop,Bakery,Sushi Restaurant,Cantonese Restaurant,Noodle House,Café,Hotel,No
3,Central District,Chinese Restaurant,French Restaurant,Gym / Fitness Center,Social Club,Lounge,Cantonese Restaurant,Hotel,Italian Restaurant,Steakhouse,Spa,No
4,Cha Kwo Ling,Convenience Store,Noodle House,Fast Food Restaurant,Shopping Mall,Donburi Restaurant,Flea Market,Fish Market,Field,Farmers Market,English Restaurant,No


## Clustering

Run k-means to group the neighborhoods into 5 clusters.

In [25]:
# set number of clusters
kclusters = 5

hk_grouped_clustering = hk_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hk_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 2, 1, 3], dtype=int32)
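
The number of clusters is fixed at 5 here. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. A minimal sketch on synthetic data, not on the Hong Kong features; the range of k and the use of the silhouette score are assumptions for illustration and not part of the original analysis:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for hk_grouped_clustering
X, _ = make_blobs(n_samples=60, centers=3, cluster_std=0.5, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette score lies in [-1, 1]; higher means tighter, better separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('best k by silhouette:', best_k)
```

On the real data the same loop could be run on `hk_grouped_clustering`, keeping in mind that clusters containing a single neighborhood can make the score harder to interpret.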

Create a new dataframe that includes the cluster for each neighborhood.

In [26]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

hk_merged = df_hk

# join df_hk with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
hk_merged = hk_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

hk_merged.head() # check the last columns!

Unnamed: 0,District,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
0,Central & Western,Central District,22.281322,114.160258,1,Chinese Restaurant,French Restaurant,Gym / Fitness Center,Social Club,Lounge,Cantonese Restaurant,Hotel,Italian Restaurant,Steakhouse,Spa,No
1,Central & Western,Mid-Levels,22.282405,114.145809,1,Thai Restaurant,Coffee Shop,Italian Restaurant,French Restaurant,Noodle House,Café,Bar,Scandinavian Restaurant,Beer Store,Bookstore,No
2,Central & Western,The Peak,22.272003,114.152417,1,Scenic Lookout,Ice Cream Shop,Shopping Mall,Asian Restaurant,Sushi Restaurant,Burger Joint,Shoe Store,Seafood Restaurant,Restaurant,Clothing Store,Yes
3,Central & Western,Sai Wan,22.285838,114.134023,0,Dessert Shop,Hong Kong Restaurant,Noodle House,Spanish Restaurant,Café,Sushi Restaurant,Park,Furniture / Home Store,Cantonese Restaurant,Malay Restaurant,No
4,Central & Western,Sheung Wan,22.28687,114.150267,1,Café,Coffee Shop,Japanese Restaurant,Chinese Restaurant,Indian Restaurant,Cha Chaan Teng,Italian Restaurant,French Restaurant,Ramen Restaurant,Vegetarian / Vegan Restaurant,No


### Visualize the result

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hk_merged['Latitude'], hk_merged['Longitude'], hk_merged['Neighborhood'], hk_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster+1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster 1

In [28]:
hk_merged.loc[hk_merged['Cluster Labels'] == 0, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
3,Sai Wan,Dessert Shop,Hong Kong Restaurant,Noodle House,Spanish Restaurant,Café,Sushi Restaurant,Park,Furniture / Home Store,Cantonese Restaurant,Malay Restaurant,No
6,North Point,Chinese Restaurant,Snack Place,Hotpot Restaurant,Hong Kong Restaurant,Noodle House,Burger Joint,Café,Bar,Shopping Mall,Park,No
8,Sai Wan Ho,Chinese Restaurant,Hong Kong Restaurant,French Restaurant,Japanese Restaurant,Restaurant,Park,Cantonese Restaurant,Indian Restaurant,Deli / Bodega,Cycle Studio,No
9,Shau Kei Wan,Chinese Restaurant,Tram Station,Noodle House,Dessert Shop,Fast Food Restaurant,Hong Kong Restaurant,Cha Chaan Teng,Convenience Store,Hainan Restaurant,Shopping Mall,Yes
24,Happy Valley,Chinese Restaurant,Hong Kong Restaurant,Japanese Restaurant,French Restaurant,Pub,Coffee Shop,Bakery,Café,Italian Restaurant,Dim Sum Restaurant,Yes
28,Hung Hom,Chinese Restaurant,Hotel,Japanese Restaurant,Coffee Shop,Hotpot Restaurant,Hong Kong Restaurant,Snack Place,Noodle House,Dessert Shop,Cha Chaan Teng,Yes
30,Kowloon Tong,Grocery Store,Pool,Park,Track,Playground,Chinese Restaurant,Basketball Court,Dongbei Restaurant,Dumpling Restaurant,Zhejiang Restaurant,No
34,To Kwa Wan,Chinese Restaurant,Cha Chaan Teng,Café,Theater,Food & Drink Shop,Fast Food Restaurant,Hostel,Vietnamese Restaurant,Hong Kong Restaurant,Steakhouse,Yes
39,Kowloon Bay,Chinese Restaurant,Cantonese Restaurant,Coffee Shop,Furniture / Home Store,Café,Multiplex,Bistro,Hong Kong Restaurant,Portuguese Restaurant,Sushi Restaurant,No
44,Sham Shui Po,Noodle House,Chinese Restaurant,Dessert Shop,Snack Place,Hong Kong Restaurant,Shopping Mall,Bus Stop,Italian Restaurant,Sushi Restaurant,Szechuan Restaurant,Yes


### Cluster 2

In [29]:
hk_merged.loc[hk_merged['Cluster Labels'] == 1, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
0,Central District,Chinese Restaurant,French Restaurant,Gym / Fitness Center,Social Club,Lounge,Cantonese Restaurant,Hotel,Italian Restaurant,Steakhouse,Spa,No
1,Mid-Levels,Thai Restaurant,Coffee Shop,Italian Restaurant,French Restaurant,Noodle House,Café,Bar,Scandinavian Restaurant,Beer Store,Bookstore,No
2,The Peak,Scenic Lookout,Ice Cream Shop,Shopping Mall,Asian Restaurant,Sushi Restaurant,Burger Joint,Shoe Store,Seafood Restaurant,Restaurant,Clothing Store,Yes
4,Sheung Wan,Café,Coffee Shop,Japanese Restaurant,Chinese Restaurant,Indian Restaurant,Cha Chaan Teng,Italian Restaurant,French Restaurant,Ramen Restaurant,Vegetarian / Vegan Restaurant,No
5,Chai Wan,Chinese Restaurant,Fast Food Restaurant,Vietnamese Restaurant,Coffee Shop,Cha Chaan Teng,Bakery,Cosmetics Shop,Seafood Restaurant,Multiplex,Park,No
7,Quarry Bay,Café,Japanese Restaurant,Coffee Shop,Department Store,Thai Restaurant,Vietnamese Restaurant,Chinese Restaurant,Korean Restaurant,Food Court,Sandwich Place,No
10,Siu Sai Wan,Fast Food Restaurant,Hong Kong Restaurant,Supermarket,Restaurant,Thai Restaurant,Dessert Shop,Dim Sum Restaurant,Park,Café,Bus Station,Yes
11,Aberdeen,Fast Food Restaurant,Sushi Restaurant,Grocery Store,Cha Chaan Teng,Thai Restaurant,Chinese Restaurant,Athletics & Sports,Market,Shopping Mall,Taiwanese Restaurant,Yes
12,Ap Lei Chau,Fast Food Restaurant,Furniture / Home Store,Chinese Restaurant,Shopping Mall,Grocery Store,Seafood Restaurant,Cupcake Shop,Bus Station,Outlet Store,Café,Yes
14,Cyberport,Coffee Shop,Café,Bus Stop,Gym,Hotel,Asian Restaurant,Cantonese Restaurant,Shopping Mall,Buffet,Multiplex,Yes


### Cluster 3

In [30]:
hk_merged.loc[hk_merged['Cluster Labels'] == 2, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
13,Chung Hom Kok,Beach,Park,Zhejiang Restaurant,Dumpling Restaurant,Food & Drink Shop,Flea Market,Fish Market,Field,Fast Food Restaurant,Farmers Market,No
21,Tai Tam,Park,Zhejiang Restaurant,Dongbei Restaurant,Food & Drink Shop,Flea Market,Fish Market,Field,Fast Food Restaurant,Farmers Market,English Restaurant,No


### Cluster 4

In [31]:
hk_merged.loc[hk_merged['Cluster Labels'] == 3, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
15,Deep Water Bay,Coffee Shop,Furniture / Home Store,Dongbei Restaurant,Food & Drink Shop,Flea Market,Fish Market,Field,Fast Food Restaurant,Farmers Market,English Restaurant,No


### Cluster 5

In [32]:
hk_merged.loc[hk_merged['Cluster Labels'] == 4, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
49,Kowloon Peak,Mountain,Campground,Zhejiang Restaurant,Dongbei Restaurant,Food & Drink Shop,Flea Market,Fish Market,Field,Fast Food Restaurant,Farmers Market,No


## Conclusion

Our question is "Where is the proper location to open a restaurant?". Clusters 3-5 can be excluded from the candidates because they are mostly mountains, parks, or beaches with very few venues (this is also visible on the map).

After examining cluster 1 and cluster 2, I would say that cluster 1 represents residential areas and cluster 2 represents commercial areas. So the answer to our question depends on the type of restaurant.

A detailed conclusion will be included in the report.