<H1>The Battle of Neighborhoods - Clustering Hong Kong Districts by Restaurant Mix</H1>

<H4>Business problem</H4>
This project analyzes the restaurant mix in each district of Hong Kong.<br>
The districts are clustered with KMeans so that districts with a similar mix of restaurants are grouped together.

Anyone planning to open a new restaurant in Hong Kong may find this project useful when deciding which district to choose.

In [1]:
import pandas as pd
import numpy as np
import geopandas
import geopy

<H4>Data collection and manipulation</H4>
First, the list of Hong Kong districts is read from Wikipedia using pandas.

In [2]:
#Read table from Wiki
HK_district_wiki = pd.read_html('https://en.wikipedia.org/wiki/Districts_of_Hong_Kong')[5]

#Exclude the rows whose District name contains 'subtotal' or 'Marine'
is_excluded = HK_district_wiki['District'].str.contains('subtotal|Marine')
HK_district = HK_district_wiki.loc[~is_excluded, ['District']]

#Read area table from Wiki
HK_district_wiki_area = pd.read_html('https://en.wikipedia.org/wiki/Districts_of_Hong_Kong')[6][['District','Area(km2)']]

#Merge the Area data
HK_district = HK_district.merge(HK_district_wiki_area)

#Reset index
HK_district.reset_index(inplace = True, drop = True)

print(HK_district)

               District  Area(km2)
0   Central and Western      12.44
1               Eastern      18.56
2              Southern      38.85
3              Wan Chai       9.83
4          Sham Shui Po       9.35
5          Kowloon City      10.02
6             Kwun Tong      11.27
7          Wong Tai Sin       9.30
8         Yau Tsim Mong       6.99
9               Islands     175.12
10           Kwai Tsing      23.34
11                North     136.61
12             Sai Kung     129.65
13              Sha Tin      68.71
14               Tai Po     136.15
15            Tsuen Wan      61.71
16             Tuen Mun      82.89
17            Yuen Long     138.46


Next, add each district's coordinates from a CSV file prepared in advance.

In [3]:
#Load the HK District Geodata
hk_district_geo = pd.read_csv('HKDistrictGeo.csv')

#Merge to the District DataFrame
HK_district = HK_district.merge(hk_district_geo)
print(HK_district)

               District  Area(km2)                     place_id   latitude  \
0   Central and Western      12.44  ChIJsWdv64D_AzQRRxIBOZvIw0Y  22.273022   
1               Eastern      18.56  ChIJ98wBvQYBBDQR-QRxlG6U-y0  22.273389   
2              Southern      38.85  ChIJseIpbvL_AzQREzBG1q-p7sA  22.243216   
3              Wan Chai       9.83  ChIJef5hflsABDQR7tjDNeZxJSQ  22.276022   
4          Sham Shui Po       9.35  ChIJC6pfobQABDQRTn_pueERhuM  22.328590   
5          Kowloon City      10.02  ChIJ4YVsN9QABDQRp6hYS6D6tso  22.323210   
6             Kwun Tong      11.27  ChIJlU2gg0gBBDQRTIVhSaIQK84  22.310369   
7          Wong Tai Sin       9.30  ChIJ7Xj8OCcHBDQRooODyCoZHeA  22.342961   
8         Yau Tsim Mong       6.99  ChIJV4T2M5UABDQREoOIKqiR83o  22.311603   
9               Islands     175.12  ChIJhz5GfisABDQRQBiwD46PxeQ  22.262800   
10           Kwai Tsing      23.34  ChIJi6vp4Jf4AzQR8U4A1xv0KYU  22.354908   
11                North     136.61  ChIJD5gyo-3iAzQRfMnq27qzivA 
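
Because the merge above is an inner join by default, a district whose name is missing or spelled differently in the CSV would silently disappear. The sketch below shows one way to check for that, using a small made-up pair of tables (the `geo` frame deliberately omits one district). It is an illustration under those assumptions, not part of the original pipeline.

```python
import pandas as pd

# Toy stand-ins for HK_district and the geodata CSV; the geodata
# deliberately lacks 'Southern' to show what the check catches.
districts = pd.DataFrame({'District': ['Central and Western', 'Eastern', 'Southern']})
geo = pd.DataFrame({'District': ['Central and Western', 'Eastern'],
                    'latitude': [22.273022, 22.273389]})

# A left join keeps every district; indicator=True adds a '_merge' column
# recording whether each row found a match in the geodata.
checked = districts.merge(geo, on='District', how='left',
                          indicator=True, validate='one_to_one')

# Districts without coordinates are the rows marked 'left_only'.
missing = checked.loc[checked['_merge'] == 'left_only', 'District'].tolist()
print(missing)
```

If `missing` is empty, every district received coordinates and the plain inner merge is safe to use.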

To check that the coordinates are correct, plot the districts on a map using Folium.

In [4]:
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # map rendering library

# create map (named district_map to avoid shadowing the built-in map)
district_map = folium.Map(location=[22.3529808, 114.107615], zoom_start=12)

# add markers to the map
for lat, lon, poi in zip(HK_district['latitude'], HK_district['longitude'], HK_district['District']):
    label = folium.Popup(str(poi))
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_color='#000000',
        fill_opacity=0.7).add_to(district_map)

district_map

Next, the Foursquare API is used to explore the food venues in each district.
The API returns each venue's location and restaurant type.

In [5]:
#Define Foursquare credentials
#Read them from environment variables instead of hard-coding secrets in the notebook
#(the variable names below are an assumed convention)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 1000 # requested maximum number of venues per call


In [6]:
#Create a function to explore the food venues around the given coordinates
import requests
def getNearbyVenues(names, latitudes, longitudes, radius):
    
    venues_list=[]
    for name, lat, lng, rad in zip(names, latitudes, longitudes, radius):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId=4d4b7105d754a06374d81259'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            rad, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
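
The request URL above is assembled with `str.format`. As a minimal sketch, the same query string can be built with the standard library's `urlencode`, which escapes each value. The helper name `build_explore_url` and the placeholder credentials are hypothetical; the endpoint, parameter names and category ID are the ones used in the function above. Rounding the radius to whole metres is an assumption made here for tidiness.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'
FOOD_CATEGORY_ID = '4d4b7105d754a06374d81259'  # category ID from the request above


def build_explore_url(lat, lng, radius_m, client_id, client_secret,
                      version='20180605', limit=1000):
    """Return the explore URL for one district (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': int(round(radius_m)),  # assumption: whole metres
        'limit': limit,
        'categoryId': FOOD_CATEGORY_ID,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials; no request is sent here.
url = build_explore_url(22.273022, 114.149881, 1990, 'DEMO_ID', 'DEMO_SECRET')
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['radius'])
```

Parsing the URL back with `parse_qs` is a quick way to confirm that each parameter survived encoding before any request is made.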

In [7]:
# Run the function for each Hong Kong district
# Since some districts are much larger than others, each district is approximated as a circle
# and its area is used to derive the search radius: r = sqrt(area / pi), converted from km to m
HK_district_venues = getNearbyVenues(names = HK_district['District'],
                                   latitudes = HK_district['latitude'],
                                   longitudes = HK_district['longitude'],
                                   radius = np.sqrt(HK_district['Area(km2)']/np.pi)*1000
                                  )

Central and Western
Eastern
Southern
Wan Chai
Sham Shui Po
Kowloon City
Kwun Tong
Wong Tai Sin
Yau Tsim Mong
Islands
Kwai Tsing
North
Sai Kung
Sha Tin
Tai Po
Tsuen Wan
Tuen Mun
Yuen Long
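
The radius passed above comes from treating each district as a circle of equal area, so that area = pi * r^2 and r = sqrt(area / pi). The short sketch below works this through for the smallest and largest districts in the area table; the helper name `equivalent_radius_m` is introduced here only for illustration.

```python
import math


def equivalent_radius_m(area_km2):
    """Radius in metres of a circle whose area equals area_km2."""
    return math.sqrt(area_km2 / math.pi) * 1000


# Areas taken from the district table above.
areas_km2 = {'Yau Tsim Mong': 6.99, 'Islands': 175.12}
radii_m = {name: equivalent_radius_m(area) for name, area in areas_km2.items()}

for name, radius in radii_m.items():
    print('{}: {:.0f} m'.format(name, radius))
```

Real districts are not circular, so a circle centred on a single coordinate can miss parts of a district or reach into a neighbouring one. The radius is best read as a rough scale for the search, not a boundary.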


In [8]:
#Check how many restaurants can be extracted
print(HK_district_venues.shape)
print(HK_district_venues.groupby('District').count())

(1563, 7)
                     District Latitude  District Longitude  Venue  \
District                                                            
Central and Western                100                 100    100   
Eastern                             85                  85     85   
Islands                            100                 100    100   
Kowloon City                        92                  92     92   
Kwai Tsing                         100                 100    100   
Kwun Tong                           97                  97     97   
North                               64                  64     64   
Sai Kung                            73                  73     73   
Sha Tin                            100                 100    100   
Sham Shui Po                       100                 100    100   
Southern                            97                  97     97   
Tai Po                              77                  77     77   
Tsuen Wan               

<H4>Analyze the restaurants in each district</H4>

Convert the categorical venue categories to numerical features by one-hot encoding.

In [9]:
# one hot encoding
HK_district_venues_onehot = pd.get_dummies(HK_district_venues[['Venue Category']], prefix="", prefix_sep="")

# add District column back to dataframe
HK_district_venues_onehot['District'] = HK_district_venues['District'] 

# move District column to the first column
fixed_columns = [HK_district_venues_onehot.columns[-1]] + list(HK_district_venues_onehot.columns[:-1])
HK_district_venues_onehot = HK_district_venues_onehot[fixed_columns]

HK_district_venues_onehot.head()

Unnamed: 0,District,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bakery,Balinese Restaurant,Beijing Restaurant,Bistro,...,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Xinjiang Restaurant,Yunnan Restaurant,Zhejiang Restaurant
0,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [10]:
#Group by district and average the one-hot columns to get each restaurant type's share of the district's venues
HK_district_grouped = HK_district_venues_onehot.groupby('District').mean().reset_index()
HK_district_grouped

Unnamed: 0,District,American Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bakery,Balinese Restaurant,Beijing Restaurant,Bistro,...,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Xinjiang Restaurant,Yunnan Restaurant,Zhejiang Restaurant
0,Central and Western,0.01,0.0,0.01,0.0,0.02,0.02,0.0,0.0,0.01,...,0.02,0.02,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0
1,Eastern,0.011765,0.0,0.011765,0.0,0.011765,0.035294,0.0,0.0,0.0,...,0.0,0.023529,0.0,0.0,0.011765,0.0,0.011765,0.0,0.0,0.0
2,Islands,0.01,0.0,0.01,0.01,0.01,0.06,0.0,0.0,0.01,...,0.01,0.06,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.0
3,Kowloon City,0.0,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.01087,...,0.0,0.141304,0.0,0.0,0.0,0.01087,0.032609,0.0,0.0,0.0
4,Kwai Tsing,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0
5,Kwun Tong,0.010309,0.0,0.010309,0.0,0.020619,0.0,0.0,0.010309,0.010309,...,0.0,0.020619,0.0,0.0,0.0,0.0,0.030928,0.0,0.0,0.0
6,North,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,...,0.0,0.015625,0.0,0.046875,0.0,0.0,0.0,0.0,0.0,0.0
7,Sai Kung,0.013699,0.0,0.013699,0.0,0.027397,0.013699,0.0,0.013699,0.013699,...,0.013699,0.068493,0.0,0.0,0.0,0.013699,0.0,0.0,0.0,0.0
8,Sha Tin,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.01,0.02,0.02,0.0,0.0,0.0
9,Sham Shui Po,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.02,0.03,0.01,0.0,0.0
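
To make the two steps above concrete, the sketch below applies the same one-hot encoding and per-district mean to a made-up four-venue table (districts `A` and `B` are hypothetical). The mean of a 0/1 column within a district is the fraction of that district's venues in the category, which is why each row of the grouped table sums to 1.

```python
import pandas as pd

# Toy venue list: district A has two cafés and one bakery, district B one bakery.
toy = pd.DataFrame({
    'District': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Bakery', 'Bakery'],
})

# One column per category, 1 where the venue belongs to it.
onehot = pd.get_dummies(toy['Venue Category'], dtype=int)
onehot.insert(0, 'District', toy['District'])

# Averaging the 0/1 columns per district gives category shares.
shares = onehot.groupby('District').mean()
print(shares)
```

Because the features are shares and not counts, a district with 64 venues and one with 100 venues are compared on the same scale.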


In [11]:
#Find the top 10 restaurant types in each district

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Restaurant Type'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Restaurant Type'.format(ind+1))

# create a new dataframe
HK_district_dish_sorted = pd.DataFrame(columns=columns)
HK_district_dish_sorted['District'] = HK_district_grouped['District']

for ind in np.arange(HK_district_grouped.shape[0]):
    HK_district_dish_sorted.iloc[ind, 1:] = return_most_common_venues(HK_district_grouped.iloc[ind, :], num_top_venues)

HK_district_dish_sorted

Unnamed: 0,District,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type
0,Central and Western,Café,Japanese Restaurant,Italian Restaurant,French Restaurant,Steakhouse,Restaurant,Indian Restaurant,Sushi Restaurant,Cantonese Restaurant,Chinese Restaurant
1,Eastern,Chinese Restaurant,Seafood Restaurant,Fast Food Restaurant,Cha Chaan Teng,Cantonese Restaurant,Noodle House,Hong Kong Restaurant,Japanese Restaurant,Bakery,Hainan Restaurant
2,Islands,Chinese Restaurant,Café,Bakery,Thai Restaurant,Pizza Place,Restaurant,Seafood Restaurant,Mediterranean Restaurant,Japanese Restaurant,Hong Kong Restaurant
3,Kowloon City,Thai Restaurant,Chinese Restaurant,Café,Cha Chaan Teng,Noodle House,Fast Food Restaurant,Dim Sum Restaurant,Bakery,Asian Restaurant,Hotpot Restaurant
4,Kwai Tsing,Fast Food Restaurant,Chinese Restaurant,Noodle House,Cha Chaan Teng,Hong Kong Restaurant,Cantonese Restaurant,Café,Japanese Restaurant,Shanghai Restaurant,Italian Restaurant
5,Kwun Tong,Fast Food Restaurant,Chinese Restaurant,Café,Cha Chaan Teng,Japanese Restaurant,Hong Kong Restaurant,Cantonese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Pizza Place
6,North,Fast Food Restaurant,Chinese Restaurant,Noodle House,Cha Chaan Teng,Hong Kong Restaurant,Café,Turkish Restaurant,Seafood Restaurant,Burger Joint,Sushi Restaurant
7,Sai Kung,Café,Seafood Restaurant,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Hong Kong Restaurant,Pizza Place,Italian Restaurant,Burger Joint,BBQ Joint
8,Sha Tin,Fast Food Restaurant,Chinese Restaurant,Café,Cantonese Restaurant,Ramen Restaurant,Italian Restaurant,Hong Kong Restaurant,Pizza Place,Noodle House,Vietnamese Restaurant
9,Sham Shui Po,Noodle House,Chinese Restaurant,Cha Chaan Teng,Café,Bakery,Dim Sum Restaurant,Dumpling Restaurant,Hong Kong Restaurant,Cantonese Restaurant,Japanese Restaurant
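
The ranking step above sorts each row and keeps the first entries. As an alternative sketch, pandas' `Series.nlargest` does the same selection directly. The shares table here is made up for illustration, and this is not the code that produced the table above.

```python
import pandas as pd

# Toy category shares for two hypothetical districts.
shares = pd.DataFrame(
    {'Café': [0.5, 0.1], 'Bakery': [0.3, 0.2], 'Noodle House': [0.2, 0.7]},
    index=['A', 'B'],
)

num_top = 2

# For each district, keep the names of the num_top largest shares.
top_types = {
    district: shares.loc[district].nlargest(num_top).index.tolist()
    for district in shares.index
}
print(top_types)
```

Either approach ranks by share only, so categories with equal shares are ordered arbitrarily with respect to each other.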


<H4>Cluster the districts by the restaurant type</H4>

K-means is used as the clustering method below, with the number of clusters set to 4.

In [12]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 4

HK_district_grouped_clustering = HK_district_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0)
kmeans.fit(HK_district_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 2, 3, 1, 1, 1, 2, 1, 0], dtype=int32)
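
The number of clusters is fixed at 4 above. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic data with three well-separated groups (generated by `make_blobs`, not the restaurant data), so it only illustrates the procedure and says nothing about the best k for the districts.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three tight groups around known centres.
X, _ = make_blobs(n_samples=60, centers=[[0, 0], [5, 5], [0, 5]],
                  cluster_std=0.3, random_state=0)

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

With only 18 districts the scores are likely to be noisy, so such a check would be a guide alongside the qualitative reading of the clusters, not a replacement for it.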

In [13]:
#Merge the result into the district dataframe
# add clustering labels
HK_district_dish_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

HK_district_merged = HK_district

# join the cluster labels and top restaurant types onto the district table, which holds the latitude/longitude
HK_district_merged = HK_district_merged.join(HK_district_dish_sorted.set_index('District'), on='District')

HK_district_merged

Unnamed: 0,District,Area(km2),place_id,latitude,longitude,Cluster Labels,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type
0,Central and Western,12.44,ChIJsWdv64D_AzQRRxIBOZvIw0Y,22.273022,114.149881,0,Café,Japanese Restaurant,Italian Restaurant,French Restaurant,Steakhouse,Restaurant,Indian Restaurant,Sushi Restaurant,Cantonese Restaurant,Chinese Restaurant
1,Eastern,18.56,ChIJ98wBvQYBBDQR-QRxlG6U-y0,22.273389,114.236078,1,Chinese Restaurant,Seafood Restaurant,Fast Food Restaurant,Cha Chaan Teng,Cantonese Restaurant,Noodle House,Hong Kong Restaurant,Japanese Restaurant,Bakery,Hainan Restaurant
2,Southern,38.85,ChIJseIpbvL_AzQREzBG1q-p7sA,22.243216,114.19744,2,Café,Restaurant,Chinese Restaurant,Thai Restaurant,Fast Food Restaurant,Italian Restaurant,Seafood Restaurant,Asian Restaurant,Pizza Place,Bakery
3,Wan Chai,9.83,ChIJef5hflsABDQR7tjDNeZxJSQ,22.276022,114.175147,0,Café,Hong Kong Restaurant,Bakery,Cantonese Restaurant,Japanese Restaurant,Italian Restaurant,Noodle House,Steakhouse,Chinese Restaurant,Cha Chaan Teng
4,Sham Shui Po,9.35,ChIJC6pfobQABDQRTn_pueERhuM,22.32859,114.160285,0,Noodle House,Chinese Restaurant,Cha Chaan Teng,Café,Bakery,Dim Sum Restaurant,Dumpling Restaurant,Hong Kong Restaurant,Cantonese Restaurant,Japanese Restaurant
5,Kowloon City,10.02,ChIJ4YVsN9QABDQRp6hYS6D6tso,22.32321,114.18555,3,Thai Restaurant,Chinese Restaurant,Café,Cha Chaan Teng,Noodle House,Fast Food Restaurant,Dim Sum Restaurant,Bakery,Asian Restaurant,Hotpot Restaurant
6,Kwun Tong,11.27,ChIJlU2gg0gBBDQRTIVhSaIQK84,22.310369,114.222703,1,Fast Food Restaurant,Chinese Restaurant,Café,Cha Chaan Teng,Japanese Restaurant,Hong Kong Restaurant,Cantonese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Pizza Place
7,Wong Tai Sin,9.3,ChIJ7Xj8OCcHBDQRooODyCoZHeA,22.342961,114.192981,3,Thai Restaurant,Fast Food Restaurant,Chinese Restaurant,Cha Chaan Teng,Cantonese Restaurant,Vietnamese Restaurant,Café,Restaurant,Hotpot Restaurant,Asian Restaurant
8,Yau Tsim Mong,6.99,ChIJV4T2M5UABDQREoOIKqiR83o,22.311603,114.170688,0,Chinese Restaurant,Japanese Restaurant,Noodle House,Cantonese Restaurant,Cha Chaan Teng,Café,Hotpot Restaurant,Hong Kong Restaurant,Indian Restaurant,Dim Sum Restaurant
9,Islands,175.12,ChIJhz5GfisABDQRQBiwD46PxeQ,22.2628,113.9655,2,Chinese Restaurant,Café,Bakery,Thai Restaurant,Pizza Place,Restaurant,Seafood Restaurant,Mediterranean Restaurant,Japanese Restaurant,Hong Kong Restaurant


In [14]:
# create map
map_cluster = folium.Map(location=[22.3529808, 114.107615], zoom_start=12)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]


# add markers to the map
for lat, lon, poi, cluster in zip(HK_district_merged['latitude'], HK_district_merged['longitude'], HK_district_merged['District'], HK_district_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_cluster)
       
map_cluster

<H1>Examine Clusters</H1>

<H4>Cluster 1 (label 0)</H4>
Cluster 1 consists mostly of business districts in Hong Kong, which host many different types of restaurants.<br>
The variety of restaurant types in these districts is high, which illustrates why Hong Kong is also known as a "Food Paradise".

In [15]:
HK_district_merged[HK_district_merged['Cluster Labels'] == 0]

Unnamed: 0,District,Area(km2),place_id,latitude,longitude,Cluster Labels,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type
0,Central and Western,12.44,ChIJsWdv64D_AzQRRxIBOZvIw0Y,22.273022,114.149881,0,Café,Japanese Restaurant,Italian Restaurant,French Restaurant,Steakhouse,Restaurant,Indian Restaurant,Sushi Restaurant,Cantonese Restaurant,Chinese Restaurant
3,Wan Chai,9.83,ChIJef5hflsABDQR7tjDNeZxJSQ,22.276022,114.175147,0,Café,Hong Kong Restaurant,Bakery,Cantonese Restaurant,Japanese Restaurant,Italian Restaurant,Noodle House,Steakhouse,Chinese Restaurant,Cha Chaan Teng
4,Sham Shui Po,9.35,ChIJC6pfobQABDQRTn_pueERhuM,22.32859,114.160285,0,Noodle House,Chinese Restaurant,Cha Chaan Teng,Café,Bakery,Dim Sum Restaurant,Dumpling Restaurant,Hong Kong Restaurant,Cantonese Restaurant,Japanese Restaurant
8,Yau Tsim Mong,6.99,ChIJV4T2M5UABDQREoOIKqiR83o,22.311603,114.170688,0,Chinese Restaurant,Japanese Restaurant,Noodle House,Cantonese Restaurant,Cha Chaan Teng,Café,Hotpot Restaurant,Hong Kong Restaurant,Indian Restaurant,Dim Sum Restaurant


<H4>Cluster 2 (label 1)</H4>
Most of the Cluster 2 districts are residential areas in Hong Kong.<br>
The restaurants mainly serve locals and provide cheap and convenient meal options.<br>
As the table below shows, Chinese restaurants, fast food restaurants, noodle houses and cha chaan tengs are common in these districts.

In [16]:
HK_district_merged[HK_district_merged['Cluster Labels'] == 1]

Unnamed: 0,District,Area(km2),place_id,latitude,longitude,Cluster Labels,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type
1,Eastern,18.56,ChIJ98wBvQYBBDQR-QRxlG6U-y0,22.273389,114.236078,1,Chinese Restaurant,Seafood Restaurant,Fast Food Restaurant,Cha Chaan Teng,Cantonese Restaurant,Noodle House,Hong Kong Restaurant,Japanese Restaurant,Bakery,Hainan Restaurant
6,Kwun Tong,11.27,ChIJlU2gg0gBBDQRTIVhSaIQK84,22.310369,114.222703,1,Fast Food Restaurant,Chinese Restaurant,Café,Cha Chaan Teng,Japanese Restaurant,Hong Kong Restaurant,Cantonese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Pizza Place
10,Kwai Tsing,23.34,ChIJi6vp4Jf4AzQR8U4A1xv0KYU,22.354908,114.126099,1,Fast Food Restaurant,Chinese Restaurant,Noodle House,Cha Chaan Teng,Hong Kong Restaurant,Cantonese Restaurant,Café,Japanese Restaurant,Shanghai Restaurant,Italian Restaurant
11,North,136.61,ChIJD5gyo-3iAzQRfMnq27qzivA,22.50009,114.1558,1,Fast Food Restaurant,Chinese Restaurant,Noodle House,Cha Chaan Teng,Hong Kong Restaurant,Café,Turkish Restaurant,Seafood Restaurant,Burger Joint,Sushi Restaurant
13,Sha Tin,68.71,ChIJE7LqOkkGBDQROkTLhUO0GXU,22.37713,114.19744,1,Fast Food Restaurant,Chinese Restaurant,Café,Cantonese Restaurant,Ramen Restaurant,Italian Restaurant,Hong Kong Restaurant,Pizza Place,Noodle House,Vietnamese Restaurant
14,Tai Po,136.15,ChIJz3uR-ML3AzQRMJDlfs07kMg,22.442328,114.165521,1,Fast Food Restaurant,Chinese Restaurant,Hong Kong Restaurant,Café,Cha Chaan Teng,Noodle House,Restaurant,BBQ Joint,Snack Place,Cantonese Restaurant
15,Tsuen Wan,61.71,ChIJmXQjje74AzQRsINigpbDAAo,22.369912,114.114431,1,Chinese Restaurant,Noodle House,Cha Chaan Teng,Japanese Restaurant,Fast Food Restaurant,Shanghai Restaurant,Italian Restaurant,Sushi Restaurant,Cantonese Restaurant,Café
16,Tuen Mun,82.89,ChIJT7oNlTv7AzQRgir6z3WI6fY,22.39083,113.972513,1,Fast Food Restaurant,Seafood Restaurant,Chinese Restaurant,Hong Kong Restaurant,Cantonese Restaurant,Italian Restaurant,Burger Joint,Cha Chaan Teng,Café,BBQ Joint
17,Yuen Long,138.46,ChIJuWZv9rvwAzQR5kO1teLKjTU,22.444538,114.022208,1,Chinese Restaurant,Fast Food Restaurant,Noodle House,Seafood Restaurant,Café,Japanese Restaurant,Hong Kong Restaurant,Italian Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant


<H4>Cluster 3 (label 2)</H4>
Southern, Islands and Sai Kung are the more remote areas of Hong Kong, and many foreigners like to live there.<br>
To suit their tastes, these areas have many cafés and Western restaurants.

In [17]:
HK_district_merged[HK_district_merged['Cluster Labels'] == 2]

Unnamed: 0,District,Area(km2),place_id,latitude,longitude,Cluster Labels,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type
2,Southern,38.85,ChIJseIpbvL_AzQREzBG1q-p7sA,22.243216,114.19744,2,Café,Restaurant,Chinese Restaurant,Thai Restaurant,Fast Food Restaurant,Italian Restaurant,Seafood Restaurant,Asian Restaurant,Pizza Place,Bakery
9,Islands,175.12,ChIJhz5GfisABDQRQBiwD46PxeQ,22.2628,113.9655,2,Chinese Restaurant,Café,Bakery,Thai Restaurant,Pizza Place,Restaurant,Seafood Restaurant,Mediterranean Restaurant,Japanese Restaurant,Hong Kong Restaurant
12,Sai Kung,129.65,ChIJMVcosp4FBDQRoHRigpbDAAo,22.383689,114.270787,2,Café,Seafood Restaurant,Chinese Restaurant,Fast Food Restaurant,Thai Restaurant,Hong Kong Restaurant,Pizza Place,Italian Restaurant,Burger Joint,BBQ Joint


<H4>Cluster 4 (label 3)</H4>
Kowloon City and Wong Tai Sin are the two districts that make up Cluster 4.<br>
Kowloon City is well known for having the best and the largest number of Thai restaurants in Hong Kong.<br>
Thai Restaurant is the most common restaurant type in both districts, so it is reasonable that they are grouped into the same cluster.

In [18]:
HK_district_merged[HK_district_merged['Cluster Labels'] == 3]

Unnamed: 0,District,Area(km2),place_id,latitude,longitude,Cluster Labels,1st Most Restaurant Type,2nd Most Restaurant Type,3rd Most Restaurant Type,4th Most Restaurant Type,5th Most Restaurant Type,6th Most Restaurant Type,7th Most Restaurant Type,8th Most Restaurant Type,9th Most Restaurant Type,10th Most Restaurant Type
5,Kowloon City,10.02,ChIJ4YVsN9QABDQRp6hYS6D6tso,22.32321,114.18555,3,Thai Restaurant,Chinese Restaurant,Café,Cha Chaan Teng,Noodle House,Fast Food Restaurant,Dim Sum Restaurant,Bakery,Asian Restaurant,Hotpot Restaurant
7,Wong Tai Sin,9.3,ChIJ7Xj8OCcHBDQRooODyCoZHeA,22.342961,114.192981,3,Thai Restaurant,Fast Food Restaurant,Chinese Restaurant,Cha Chaan Teng,Cantonese Restaurant,Vietnamese Restaurant,Café,Restaurant,Hotpot Restaurant,Asian Restaurant
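
The cluster descriptions above are read off the top-10 tables. A complementary check is to average a category's share within each cluster. The sketch below does this for Thai Restaurant, using only the ten districts whose shares and labels are visible in the outputs above. The remaining eight districts, including Wong Tai Sin, are not shown there, so these means are partial.

```python
import pandas as pd

# Thai Restaurant shares for the ten districts shown in the grouped table.
thai_share = pd.Series({
    'Central and Western': 0.02, 'Eastern': 0.023529, 'Islands': 0.06,
    'Kowloon City': 0.141304, 'Kwai Tsing': 0.02, 'Kwun Tong': 0.020619,
    'North': 0.015625, 'Sai Kung': 0.068493, 'Sha Tin': 0.01,
    'Sham Shui Po': 0.02,
})

# Cluster labels for the same ten districts, in the same (alphabetical) order,
# as printed by kmeans.labels_[0:10].
labels = pd.Series([0, 1, 2, 3, 1, 1, 1, 2, 1, 0],
                   index=thai_share.index, name='Cluster Labels')

# Mean Thai Restaurant share per cluster label.
mean_by_cluster = thai_share.groupby(labels).mean()
print(mean_by_cluster.sort_values(ascending=False))
```

Applying the same `groupby` to the full grouped table with all 18 labels would give a per-cluster profile for every restaurant type.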


<H1>Conclusion</H1>

Based on the clustering above, the districts can be broadly grouped into 4 segments according to their restaurant mix.<br>
A prospective restaurant owner can consider which cuisine they are best at and choose the most appropriate district for the new restaurant accordingly.