<H1>Clustering of Hong Kong Districts by Restaurant Mix</H1>

This project is the final assignment for my IBM professional certificate. I reviewed the script, added more detail to the notebook, and then published it on Medium.


<H4>Business problems</H4>
Hong Kong is well known as a "food paradise", and visitors can enjoy both Eastern and Western food in such a small city.<br>
I am curious about the mix of restaurant types in each Hong Kong district, and whether some districts have a similar mix.<br>
This project uses k-means clustering to group the districts.

In [1]:
# Loading necessary packages

import pandas as pd
import numpy as np
import geopandas
import geopy
from IPython.display import IFrame, display
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library
from sklearn.cluster import KMeans

<H4>Data collection and manipulation</H4>
First, the Hong Kong district metadata is read from Wikipedia with pandas.<br>
The coordinates of each district were collected previously with the Google Maps API,
so in this project I import them from a CSV file instead.

In [2]:
# Read table from Wiki
HK_district_wiki = pd.read_html('https://en.wikipedia.org/wiki/Districts_of_Hong_Kong')[5]

# Exclude irrelevant rows: the 'subtotal' rows and the Marine row
HK_district = HK_district_wiki[~HK_district_wiki['District'].str.contains('subtotal')&~HK_district_wiki['District'].str.contains('Marine')]['District'].to_frame()

# Read area table from Wiki
HK_district_wiki_area = pd.read_html('https://en.wikipedia.org/wiki/Districts_of_Hong_Kong')[6][['District','Area(km2)']]

# Merge the Area data
HK_district = HK_district.merge(HK_district_wiki_area)

# Reset index
HK_district.reset_index(inplace = True, drop = True)

print(HK_district)

               District  Area(km2)
0   Central and Western      12.44
1               Eastern      18.56
2              Southern      38.85
3              Wan Chai       9.83
4          Sham Shui Po       9.35
5          Kowloon City      10.02
6             Kwun Tong      11.27
7          Wong Tai Sin       9.30
8         Yau Tsim Mong       6.99
9               Islands     175.12
10           Kwai Tsing      23.34
11                North     136.61
12             Sai Kung     129.65
13              Sha Tin      68.71
14               Tai Po     136.15
15            Tsuen Wan      61.71
16             Tuen Mun      82.89
17            Yuen Long     138.46


Add the district coordinates from a previously prepared CSV file.

In [3]:
# Load the HK District Geodata which I collected previously
hk_district_geo = pd.read_csv('HKDistrictGeo.csv')

# Merge to the District DataFrame
HK_district = HK_district.merge(hk_district_geo)
print(HK_district)

               District  Area(km2)   latitude   longitude
0   Central and Western      12.44  22.273022  114.149881
1               Eastern      18.56  22.273389  114.236078
2              Southern      38.85  22.243216  114.197440
3              Wan Chai       9.83  22.276022  114.175147
4          Sham Shui Po       9.35  22.328590  114.160285
5          Kowloon City      10.02  22.323210  114.185550
6             Kwun Tong      11.27  22.310369  114.222703
7          Wong Tai Sin       9.30  22.342961  114.192981
8         Yau Tsim Mong       6.99  22.311603  114.170688
9               Islands     175.12  22.262800  113.965500
10           Kwai Tsing      23.34  22.354908  114.126099
11                North     136.61  22.500090  114.155800
12             Sai Kung     129.65  22.383689  114.270787
13              Sha Tin      68.71  22.377130  114.197440
14               Tai Po     136.15  22.442328  114.165521
15            Tsuen Wan      61.71  22.369912  114.114431
16             Tuen Mun      82.89  22.390830  113.972513
17            Yuen Long     138.46  22.444538  114.022208
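
Because `merge` performs an inner join by default, a district whose name is spelled differently in the CSV would silently drop out of the table. The following is a minimal sketch of a check for that, using toy stand-in frames; the values and the misspelled name are made up for illustration and are not taken from `HKDistrictGeo.csv`.

```python
import pandas as pd

# Toy stand-ins for the Wikipedia table and the coordinates CSV (illustrative values only)
districts = pd.DataFrame({'District': ['Eastern', 'Wan Chai', 'Sha Tin']})
geo = pd.DataFrame({
    'District': ['Eastern', 'Wan Chai', 'Shatin'],  # 'Shatin' is a deliberate mismatch
    'latitude': [22.27, 22.28, 22.38],
    'longitude': [114.24, 114.18, 114.20],
})

# A left join keeps every district; the indicator column records where each row came from
checked = districts.merge(geo, on='District', how='left', indicator=True)

# Districts that found no coordinates in the CSV
missing = checked.loc[checked['_merge'] == 'left_only', 'District'].tolist()
print(missing)
```

An empty list means every district found its coordinates.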

To check that the coordinates are correct, plot the districts on a map using Folium.

In [4]:
# create map
# set the location so that Hong Kong is at the centre of the page
# set zoom_start = 12 for better visualization
# (named hk_map to avoid shadowing the built-in map function)
hk_map = folium.Map(tiles='cartodbpositron', location=[22.3529808, 114.107615], zoom_start=12)

# add markers to the map to show the districts
for lat, lon, poi in zip(HK_district['latitude'], HK_district['longitude'], HK_district['District']):
    label = folium.Popup(str(poi))
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        fill=True,
        fill_color='#FFFFFF',
        fill_opacity=0.7).add_to(hk_map)

hk_map.save('hk_district.html')
display(IFrame('hk_district.html', width = 1500, height = 1200))

The map above shows that the district coordinates are located correctly.<br>
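
A numeric check can complement the visual one. The sketch below flags any coordinates that fall outside a rough bounding box around Hong Kong. The box limits are approximate values assumed for illustration, and `outside_hong_kong` is a hypothetical helper, not part of the original notebook.

```python
# Approximate bounding box for Hong Kong; these limits are an assumption
HK_LAT_RANGE = (22.13, 22.58)
HK_LON_RANGE = (113.82, 114.52)

def outside_hong_kong(rows):
    """Return the names of (name, lat, lon) rows that fall outside the box."""
    return [
        name for name, lat, lon in rows
        if not (HK_LAT_RANGE[0] <= lat <= HK_LAT_RANGE[1]
                and HK_LON_RANGE[0] <= lon <= HK_LON_RANGE[1])
    ]

sample = [
    ('Central and Western', 22.273022, 114.149881),
    ('Islands', 22.262800, 113.965500),
    ('Swapped example', 114.149881, 22.273022),  # made-up row with lat/lon swapped
]
print(outside_hong_kong(sample))
```

With the notebook's dataframe, the same helper could be called as `outside_hong_kong(zip(HK_district['District'], HK_district['latitude'], HK_district['longitude']))`.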

Next, Foursquare is used to explore the restaurant data in each district.<br>
Through its API, the location and type of each restaurant venue can be extracted.

In [5]:
# Define Foursquare credentials
CLIENT_ID = 'XXX' # your Foursquare ID
CLIENT_SECRET = 'XXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 1000 # maximum number of venues requested per call

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XXX
CLIENT_SECRET:XXX


In [6]:
# Create a function to explore the food venues around the given coordinates
import requests
def getNearbyVenues(names, latitudes, longitudes, radius):
    
    venues_list=[]
    for name, lat, lng, rad in zip(names, latitudes, longitudes, radius):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId=4d4b7105d754a06374d81259'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            rad, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
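
The function above indexes straight into the response (`["response"]['groups'][0]['items']` and `categories[0]`), so an error response or a venue without a category would raise an exception partway through the loop. The sketch below shows one way to parse the payload more defensively. `extract_venue_rows` is a hypothetical helper, and the payload shape is assumed to be the one the function above already reads.

```python
def extract_venue_rows(payload, district, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        if not categories:
            continue  # skip venues without a category; they cannot be one-hot encoded
        location = venue.get('location', {})
        rows.append((district, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'),
                     categories[0]['name']))
    return rows

# Made-up payload in the shape the function above expects
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'A', 'location': {'lat': 22.3, 'lng': 114.2},
               'categories': [{'name': 'Noodle House'}]}},
    {'venue': {'name': 'B', 'location': {'lat': 22.3, 'lng': 114.2},
               'categories': []}},
]}]}}
rows = extract_venue_rows(fake_payload, 'Toy District', 22.3, 114.2)
print(rows)

# An error-style payload without a 'response' key yields no rows instead of raising
print(extract_venue_rows({'meta': {'code': 400}}, 'Toy District', 22.3, 114.2))
```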

In [7]:
# Run the function for each HK district
# Since some districts are much larger than others, I treat each district as a circle and derive the search radius (in metres) from the district area
HK_district_venues = getNearbyVenues(names = HK_district['District'],
                                   latitudes = HK_district['latitude'],
                                   longitudes = HK_district['longitude'],
                                   
                                   #assume district is a circle, determine radius by the equation of circle area: Area = pi * r^2
                                   radius = np.sqrt(HK_district['Area(km2)']/np.pi)*1000
                                  )

Central and Western
Eastern
Southern
Wan Chai
Sham Shui Po
Kowloon City
Kwun Tong
Wong Tai Sin
Yau Tsim Mong
Islands
Kwai Tsing
North
Sai Kung
Sha Tin
Tai Po
Tsuen Wan
Tuen Mun
Yuen Long
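
As a worked example of the radius calculation, the circle-area formula Area = pi * r^2 gives r = sqrt(Area / pi), and multiplying by 1000 converts kilometres to metres. The sketch below applies it to the smallest and largest districts in the table; `equivalent_radius_m` is a hypothetical helper name used only here.

```python
import math

def equivalent_radius_m(area_km2):
    """Radius in metres of a circle whose area equals area_km2."""
    return math.sqrt(area_km2 / math.pi) * 1000

for name, area in [('Yau Tsim Mong', 6.99), ('Islands', 175.12)]:
    print(name, round(equivalent_radius_m(area)), 'm')
```

This gives a search radius of roughly 1.5 km for Yau Tsim Mong and roughly 7.5 km for Islands, so the search area scales with district size.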


In [8]:
# Check how many restaurants can be extracted
print(HK_district_venues.groupby('District').count())

                     District Latitude  District Longitude  Venue  \
District                                                            
Central and Western                100                 100    100   
Eastern                             89                  89     89   
Islands                            100                 100    100   
Kowloon City                        94                  94     94   
Kwai Tsing                         100                 100    100   
Kwun Tong                           93                  93     93   
North                               72                  72     72   
Sai Kung                            71                  71     71   
Sha Tin                            100                 100    100   
Sham Shui Po                       100                 100    100   
Southern                            98                  98     98   
Tai Po                              91                  91     91   
Tsuen Wan                         

<H4>Analyze the restaurants in each district</H4>

Convert the categorical restaurant-type attribute to numerical columns using one-hot encoding.

In [9]:
# one hot encoding
HK_district_venues_onehot = pd.get_dummies(HK_district_venues[['Venue Category']], prefix="", prefix_sep="")

# add District column back to dataframe
HK_district_venues_onehot['District'] = HK_district_venues['District'] 

# move District column to the first column
fixed_columns = [HK_district_venues_onehot.columns[-1]] + list(HK_district_venues_onehot.columns[:-1])
HK_district_venues_onehot = HK_district_venues_onehot[fixed_columns]

HK_district_venues_onehot.head()

Unnamed: 0,District,American Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bakery,Balinese Restaurant,Beijing Restaurant,Bistro,Brazilian Restaurant,...,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Xinjiang Restaurant,Yunnan Restaurant,Zhejiang Restaurant
0,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Central and Western,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [10]:
# Group by district and take the mean of each one-hot column,
# which gives the proportion of each restaurant type in the district
HK_district_grouped = HK_district_venues_onehot.groupby('District').mean().reset_index()
HK_district_grouped

Unnamed: 0,District,American Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bakery,Balinese Restaurant,Beijing Restaurant,Bistro,Brazilian Restaurant,...,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Xinjiang Restaurant,Yunnan Restaurant,Zhejiang Restaurant
0,Central and Western,0.0,0.01,0.01,0.02,0.03,0.0,0.0,0.01,0.01,...,0.03,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
1,Eastern,0.011236,0.022472,0.0,0.011236,0.022472,0.0,0.0,0.0,0.0,...,0.0,0.022472,0.0,0.0,0.022472,0.0,0.011236,0.0,0.0,0.0
2,Islands,0.01,0.01,0.01,0.01,0.06,0.0,0.0,0.01,0.0,...,0.01,0.06,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.0
3,Kowloon City,0.0,0.031915,0.0,0.0,0.042553,0.0,0.0,0.010638,0.0,...,0.0,0.117021,0.0,0.0,0.0,0.010638,0.021277,0.0,0.0,0.0
4,Kwai Tsing,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.01,0.0,...,0.0,0.03,0.0,0.0,0.03,0.0,0.01,0.0,0.01,0.0
5,Kwun Tong,0.010753,0.0,0.0,0.021505,0.0,0.0,0.010753,0.010753,0.0,...,0.0,0.021505,0.0,0.0,0.0,0.0,0.021505,0.0,0.0,0.0
6,North,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,0.0,0.0,...,0.0,0.013889,0.0,0.041667,0.0,0.0,0.027778,0.0,0.0,0.0
7,Sai Kung,0.014085,0.0,0.0,0.042254,0.014085,0.0,0.0,0.014085,0.0,...,0.014085,0.070423,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0
8,Sha Tin,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.01,0.02,0.02,0.0,0.0,0.0
9,Sham Shui Po,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.02,0.03,0.01,0.0,0.0
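
To see why the mean of the one-hot columns is a proportion, here is a small sketch on made-up data with the same two columns the notebook uses. It also shows `pd.crosstab` with `normalize='index'` as an alternative way to get the same table in one step; this is an illustration, not the method used above.

```python
import pandas as pd

# Toy venue list (made-up rows) with the same two columns the notebook uses
toy_venues = pd.DataFrame({
    'District': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Bakery', 'Bakery', 'Noodle House', 'Thai Restaurant',
                       'Bakery', 'Thai Restaurant'],
})

# One-hot encode, then average per district: the mean of a 0/1 column is a share
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)
onehot.insert(0, 'District', toy_venues['District'])
shares = onehot.groupby('District').mean()

# The same proportions computed directly from the two columns
shares_crosstab = pd.crosstab(toy_venues['District'], toy_venues['Venue Category'],
                              normalize='index')
print(shares)
```

Each row sums to 1, so districts with different venue counts remain comparable.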


In [11]:
# Find the top 10 restaurant types in each district

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Restaurant Type'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Restaurant Type'.format(ind+1))

# create a new dataframe
HK_district_dish_sorted = pd.DataFrame(columns=columns)
HK_district_dish_sorted['District'] = HK_district_grouped['District']

for ind in np.arange(HK_district_grouped.shape[0]):
    HK_district_dish_sorted.iloc[ind, 1:] = return_most_common_venues(HK_district_grouped.iloc[ind, :], num_top_venues)

HK_district_dish_sorted

Unnamed: 0,District,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type
0,Central and Western,Japanese Restaurant,Café,Steakhouse,French Restaurant,Indian Restaurant,Italian Restaurant,Restaurant,Sushi Restaurant,Cantonese Restaurant,Bakery
1,Eastern,Fast Food Restaurant,Chinese Restaurant,Seafood Restaurant,Noodle House,Hong Kong Restaurant,Cantonese Restaurant,Cha Chaan Teng,Japanese Restaurant,Restaurant,Café
2,Islands,Café,Chinese Restaurant,Bakery,Thai Restaurant,Mediterranean Restaurant,Japanese Restaurant,Pizza Place,Restaurant,Seafood Restaurant,Middle Eastern Restaurant
3,Kowloon City,Thai Restaurant,Chinese Restaurant,Cha Chaan Teng,Noodle House,Café,Fast Food Restaurant,Cantonese Restaurant,Restaurant,Dim Sum Restaurant,Hotpot Restaurant
4,Kwai Tsing,Fast Food Restaurant,Chinese Restaurant,Noodle House,Café,Cha Chaan Teng,Hong Kong Restaurant,Japanese Restaurant,Dim Sum Restaurant,Sushi Restaurant,Taiwanese Restaurant
5,Kwun Tong,Fast Food Restaurant,Chinese Restaurant,Cha Chaan Teng,Café,Japanese Restaurant,Cantonese Restaurant,Sushi Restaurant,Hong Kong Restaurant,Fried Chicken Joint,Vietnamese Restaurant
6,North,Fast Food Restaurant,Chinese Restaurant,Noodle House,Turkish Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Sushi Restaurant,Seafood Restaurant,Café,Burger Joint
7,Sai Kung,Café,Seafood Restaurant,Fast Food Restaurant,Chinese Restaurant,Thai Restaurant,BBQ Joint,Pizza Place,Noodle House,Burger Joint,Shanghai Restaurant
8,Sha Tin,Fast Food Restaurant,Chinese Restaurant,Café,Cantonese Restaurant,Italian Restaurant,Noodle House,Japanese Restaurant,Hong Kong Restaurant,Cha Chaan Teng,Shanghai Restaurant
9,Sham Shui Po,Noodle House,Chinese Restaurant,Cha Chaan Teng,Dim Sum Restaurant,Bakery,Café,Hong Kong Restaurant,Japanese Restaurant,Dumpling Restaurant,Cantonese Restaurant
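
The `try`/`except` approach to the column suffixes works for a top 10, but it would produce "21th" if `num_top_venues` were raised above 20. A minimal sketch of a more general suffix helper follows; `ordinal` is a hypothetical function, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

# Rebuild the same column names the notebook uses
columns = ['District'] + [f'{ordinal(i)} Most Restaurant Type' for i in range(1, 11)]
print(columns[1], '|', columns[-1])
```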


<H4>Cluster the districts by the restaurant type</H4>

K-means clustering is used as the clustering method.<br>
As a simple trial, the number of clusters is set to 4.

In [12]:
# set number of clusters
kclusters = 4

# set the attributes used for clustering
HK_district_grouped_clustering = HK_district_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0)
kmeans.fit(HK_district_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 2, 3, 0, 2, 2, 2, 3, 2, 1], dtype=int32)
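
The choice of 4 clusters here is a trial value. One common heuristic for comparing candidate values of k is the silhouette score, sketched below on synthetic data. This is not what the notebook does; `score_k_values` is a hypothetical helper, and the toy features are random points standing in for `HK_district_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_k_values(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores

# Toy stand-in: 18 rows drawn around 3 well-separated centres (synthetic, not real data)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
toy_features = np.vstack([c + 0.1 * rng.standard_normal((6, 2)) for c in centers])

scores = score_k_values(toy_features, range(2, 7))
best_k = max(scores, key=scores.get)
print(best_k)
```

With only 18 districts the scores on the real data can be noisy, so a result like this is better read as a hint than as a definitive answer.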

In [13]:
# Merge the result into the district dataframe
# add clustering labels
HK_district_dish_sorted['Cluster Labels'] = kmeans.labels_
#HK_district_dish_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

HK_district_merged = HK_district

# join the top restaurant types and cluster labels onto the district table, which holds the latitude/longitude of each district
HK_district_merged = HK_district_merged.join(HK_district_dish_sorted.set_index('District'), on='District')

HK_district_merged

Unnamed: 0,District,Area(km2),latitude,longitude,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type,Cluster Labels
0,Central and Western,12.44,22.273022,114.149881,Japanese Restaurant,Café,Steakhouse,French Restaurant,Indian Restaurant,Italian Restaurant,Restaurant,Sushi Restaurant,Cantonese Restaurant,Bakery,1
1,Eastern,18.56,22.273389,114.236078,Fast Food Restaurant,Chinese Restaurant,Seafood Restaurant,Noodle House,Hong Kong Restaurant,Cantonese Restaurant,Cha Chaan Teng,Japanese Restaurant,Restaurant,Café,2
2,Southern,38.85,22.243216,114.19744,Café,Restaurant,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Pizza Place,Asian Restaurant,Italian Restaurant,Seafood Restaurant,Bakery,3
3,Wan Chai,9.83,22.276022,114.175147,Café,Japanese Restaurant,Hong Kong Restaurant,Chinese Restaurant,Cantonese Restaurant,Italian Restaurant,Cha Chaan Teng,Bakery,Steakhouse,Thai Restaurant,1
4,Sham Shui Po,9.35,22.32859,114.160285,Noodle House,Chinese Restaurant,Cha Chaan Teng,Dim Sum Restaurant,Bakery,Café,Hong Kong Restaurant,Japanese Restaurant,Dumpling Restaurant,Cantonese Restaurant,1
5,Kowloon City,10.02,22.32321,114.18555,Thai Restaurant,Chinese Restaurant,Cha Chaan Teng,Noodle House,Café,Fast Food Restaurant,Cantonese Restaurant,Restaurant,Dim Sum Restaurant,Hotpot Restaurant,0
6,Kwun Tong,11.27,22.310369,114.222703,Fast Food Restaurant,Chinese Restaurant,Cha Chaan Teng,Café,Japanese Restaurant,Cantonese Restaurant,Sushi Restaurant,Hong Kong Restaurant,Fried Chicken Joint,Vietnamese Restaurant,2
7,Wong Tai Sin,9.3,22.342961,114.192981,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Cha Chaan Teng,Cantonese Restaurant,Café,Noodle House,Asian Restaurant,Vietnamese Restaurant,Snack Place,0
8,Yau Tsim Mong,6.99,22.311603,114.170688,Chinese Restaurant,Japanese Restaurant,Noodle House,Café,Cantonese Restaurant,Hotpot Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Indian Restaurant,Dim Sum Restaurant,1
9,Islands,175.12,22.2628,113.9655,Café,Chinese Restaurant,Bakery,Thai Restaurant,Mediterranean Restaurant,Japanese Restaurant,Pizza Place,Restaurant,Seafood Restaurant,Middle Eastern Restaurant,3


Display the resulting clusters on the map for better visualization.

In [14]:
# create map
map_cluster = folium.Map(tiles='cartodbpositron', location=[22.3529808, 114.107615], zoom_start=12)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]


# add markers to the map, coloring each district by its cluster label (0 to kclusters - 1)
for lat, lon, poi, cluster in zip(HK_district_merged['latitude'], HK_district_merged['longitude'], HK_district_merged['District'], HK_district_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_cluster)
       
map_cluster.save('hk_district_cluster.html')
display(IFrame('hk_district_cluster.html', width = 1500, height = 1200))

<H1>Examine Clusters</H1>

<H4>Cluster 1</H4>
Kowloon City, Wong Tai Sin and Tsuen Wan fall into Cluster 1 (label 0).<br>
Kowloon City is well known for having the best and the largest number of Thai restaurants in Hong Kong, and Thai Restaurant also ranks third in Wong Tai Sin.<br>
All three districts have Chinese Restaurant and Cha Chaan Teng among their top four types, so it is reasonable that they end up in the same cluster.

In [15]:
HK_district_merged[HK_district_merged['Cluster Labels'] == 0]

Unnamed: 0,District,Area(km2),latitude,longitude,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type,Cluster Labels
5,Kowloon City,10.02,22.32321,114.18555,Thai Restaurant,Chinese Restaurant,Cha Chaan Teng,Noodle House,Café,Fast Food Restaurant,Cantonese Restaurant,Restaurant,Dim Sum Restaurant,Hotpot Restaurant,0
7,Wong Tai Sin,9.3,22.342961,114.192981,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Cha Chaan Teng,Cantonese Restaurant,Café,Noodle House,Asian Restaurant,Vietnamese Restaurant,Snack Place,0
15,Tsuen Wan,61.71,22.369912,114.114431,Chinese Restaurant,Noodle House,Cha Chaan Teng,Japanese Restaurant,Fast Food Restaurant,Dim Sum Restaurant,Cantonese Restaurant,Shanghai Restaurant,Café,Burger Joint,0


<H4>Cluster 2</H4>

Cluster 2 (label 1) consists mostly of business and urban areas in Hong Kong, with many different types of restaurants.<br>
The variety of restaurant types in these districts is high, which demonstrates why Hong Kong is known as a "food paradise".

In [16]:
HK_district_merged[HK_district_merged['Cluster Labels'] == 1]

Unnamed: 0,District,Area(km2),latitude,longitude,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type,Cluster Labels
0,Central and Western,12.44,22.273022,114.149881,Japanese Restaurant,Café,Steakhouse,French Restaurant,Indian Restaurant,Italian Restaurant,Restaurant,Sushi Restaurant,Cantonese Restaurant,Bakery,1
3,Wan Chai,9.83,22.276022,114.175147,Café,Japanese Restaurant,Hong Kong Restaurant,Chinese Restaurant,Cantonese Restaurant,Italian Restaurant,Cha Chaan Teng,Bakery,Steakhouse,Thai Restaurant,1
4,Sham Shui Po,9.35,22.32859,114.160285,Noodle House,Chinese Restaurant,Cha Chaan Teng,Dim Sum Restaurant,Bakery,Café,Hong Kong Restaurant,Japanese Restaurant,Dumpling Restaurant,Cantonese Restaurant,1
8,Yau Tsim Mong,6.99,22.311603,114.170688,Chinese Restaurant,Japanese Restaurant,Noodle House,Café,Cantonese Restaurant,Hotpot Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Indian Restaurant,Dim Sum Restaurant,1


<H4>Cluster 3</H4>

Most Cluster 3 (label 2) districts are residential areas in Hong Kong.<br>
The restaurants mainly serve locals and provide cheap and convenient meal options.<br>
As shown in the table below, Chinese Restaurant and Fast Food Restaurant rank at or near the top in every one of these districts.



In [17]:
HK_district_merged[HK_district_merged['Cluster Labels'] == 2]

Unnamed: 0,District,Area(km2),latitude,longitude,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type,Cluster Labels
1,Eastern,18.56,22.273389,114.236078,Fast Food Restaurant,Chinese Restaurant,Seafood Restaurant,Noodle House,Hong Kong Restaurant,Cantonese Restaurant,Cha Chaan Teng,Japanese Restaurant,Restaurant,Café,2
6,Kwun Tong,11.27,22.310369,114.222703,Fast Food Restaurant,Chinese Restaurant,Cha Chaan Teng,Café,Japanese Restaurant,Cantonese Restaurant,Sushi Restaurant,Hong Kong Restaurant,Fried Chicken Joint,Vietnamese Restaurant,2
10,Kwai Tsing,23.34,22.354908,114.126099,Fast Food Restaurant,Chinese Restaurant,Noodle House,Café,Cha Chaan Teng,Hong Kong Restaurant,Japanese Restaurant,Dim Sum Restaurant,Sushi Restaurant,Taiwanese Restaurant,2
11,North,136.61,22.50009,114.1558,Fast Food Restaurant,Chinese Restaurant,Noodle House,Turkish Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Sushi Restaurant,Seafood Restaurant,Café,Burger Joint,2
13,Sha Tin,68.71,22.37713,114.19744,Fast Food Restaurant,Chinese Restaurant,Café,Cantonese Restaurant,Italian Restaurant,Noodle House,Japanese Restaurant,Hong Kong Restaurant,Cha Chaan Teng,Shanghai Restaurant,2
14,Tai Po,136.15,22.442328,114.165521,Fast Food Restaurant,Chinese Restaurant,Hong Kong Restaurant,Café,Cha Chaan Teng,Restaurant,BBQ Joint,Sushi Restaurant,Cantonese Restaurant,Snack Place,2
16,Tuen Mun,82.89,22.39083,113.972513,Fast Food Restaurant,Seafood Restaurant,Chinese Restaurant,Hong Kong Restaurant,Cantonese Restaurant,Italian Restaurant,Café,BBQ Joint,Cha Chaan Teng,Japanese Restaurant,2
17,Yuen Long,138.46,22.444538,114.022208,Chinese Restaurant,Fast Food Restaurant,Seafood Restaurant,Noodle House,Café,Hong Kong Restaurant,Japanese Restaurant,Thai Restaurant,Sushi Restaurant,Dumpling Restaurant,2


<H4>Cluster 4</H4>

Cluster 4 (label 3) contains Southern, Islands and Sai Kung, the more remote areas of Hong Kong, where many foreigners love to live.<br>
To suit their tastes, there are many cafés and Western restaurants in these areas.

In [18]:
HK_district_merged[HK_district_merged['Cluster Labels'] == 3]

Unnamed: 0,District,Area(km2),latitude,longitude,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type,Cluster Labels
2,Southern,38.85,22.243216,114.19744,Café,Restaurant,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Pizza Place,Asian Restaurant,Italian Restaurant,Seafood Restaurant,Bakery,3
9,Islands,175.12,22.2628,113.9655,Café,Chinese Restaurant,Bakery,Thai Restaurant,Mediterranean Restaurant,Japanese Restaurant,Pizza Place,Restaurant,Seafood Restaurant,Middle Eastern Restaurant,3
12,Sai Kung,129.65,22.383689,114.270787,Café,Seafood Restaurant,Fast Food Restaurant,Chinese Restaurant,Thai Restaurant,BBQ Joint,Pizza Place,Noodle House,Burger Joint,Shanghai Restaurant,3


<H1>Conclusion</H1>

Using the clustering above, the districts can be roughly grouped into 4 segments by their mix of restaurants.<br>
Although the result may not surprise anyone from Hong Kong, the technique can be applied to other cities even when we have no prior knowledge of them.

<H1>Next step</H1>

Since I was using the free version of Foursquare, the number of venues that can be extracted is limited.<br>
More restaurants can be retrieved with the paid version.<br>
In addition, Foursquare is a US-based company and may not have complete restaurant data for Hong Kong.<br>
If restaurant data from "OpenRice", a popular Hong Kong dining review website, can be obtained, more accurate clustering can be expected.