# IBM Data Science Capstone Project

### Introduction

Many Hong Kongers travel to Japan frequently to enjoy Japanese food, sightseeing and Japanese culture. However, COVID-19 has blocked travel to Japan, and most of them have not been there for more than a year and a half. This is a golden opportunity for Japanese catering groups to grow their business in Hong Kong. Many Japanese companies, such as Donki (supermarket) and Sukiya (fast-food chain), are already eager to expand in Hong Kong. There appears to be a large market, with plenty of room, for Japanese F&B groups to develop in Hong Kong.

### Business Problem
Nevertheless, selecting an optimal location is always a challenge for foreign catering companies, as rent in Hong Kong may well account for around 30 - 40% of operating costs. The choice between a good and a bad location can critically influence the profitability and popularity of a Japanese restaurant. In this project, we use a data science approach and machine learning techniques to identify the best locations, giving Japanese F&B companies insight for deciding where to open restaurants in Hong Kong.

### Data Acquisition

The following APIs and data sources are used to extract or generate the required information:

- Areas and districts list from HK Government: split and define the 18 districts in Hong Kong
    https://www.rvd.gov.hk/doc/tc/hkpr20/Appendix_TC.xlsx
- geopy (Nominatim geocoder) and Folium: retrieve the latitude and longitude of these neighborhoods and plot them on a map
- Foursquare API: collect venue data related to these neighborhoods
    https://foursquare.com/products/places/

### Import Libraries

In [4]:
import pandas as pd
import numpy as np
import folium
import geopy as geo
import requests
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline
import matplotlib.cm as cm
import matplotlib.colors as colors

# transforming a json file into a pandas dataframe
# (json_normalize lives in the top-level pandas namespace since pandas 1.0)
from pandas import json_normalize
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

### Read the District File from the government website

In [5]:
districtdf = pd.read_excel("https://www.rvd.gov.hk/doc/tc/hkpr20/Appendix_TC.xlsx", engine = 'openpyxl', header = 4)
districtdf.head()

Unnamed: 0,區 域 \nArea,地 區 \nDistrict,地 區 內 的 分 區 名 稱,Names of Sub-districts\nwithin District Boundaries,小 規 劃 統 計 區 \nTertiary Planning Units
0,,,,,
1,港 島 \nHONG KONG,中 西 區 \nCentral and\nWestern,堅 尼 地 城 、 石 塘 咀 、\n西 營 盤 、 上 環 、\n中 環 、 金 鐘 、\...,"Kennedy Town, Shek Tong Tsui,\nSai Ying Pun, S...","111, 112, 113, 114, 115, 116, \n121, 122, 123,..."
2,,灣 仔 \nWan Chai,灣 仔 、 銅 鑼 灣 、\n天 后 、 跑 馬 地 、 大 坑 、\n掃 桿 埔 、 渣 ...,"Wan Chai, Causeway Bay,\nTin Hau, Happy Valley...","124(p), 131, 132, 133, 134, 135, \n140, 144, 1..."
3,,東 區 \nEastern,寶 馬 山 、 北 角 、\n鰂 魚 涌 、 西 灣 河 、\n筲 箕 灣 、 柴 灣 、 ...,"Braemar Hill, North Point,\nQuarry Bay, Sai Wa...","148(p), 151(p), 152(p), 153, \n154, 155, 156, ..."
4,,南 區 \nSouthern,薄 扶 林 、 香 港 仔 、 \n鴨 脷 洲 、 黃 竹 坑 、\n壽 臣 山 、 淺 水...,"Pok Fu Lam, Aberdeen,\nAp Lei Chau, Wong Chuk ...","171, 172, 173, 174, 175, 176, \n191, 192, 193,..."


### Transform the data in the District file into a proper table

The table should be machine-readable so that it can be extended with location information.

In [6]:
districtdf = districtdf[['地 區 \nDistrict','Names of Sub-districts\nwithin District Boundaries']]
districtdf = districtdf.rename(columns={'地 區 \nDistrict':'District','Names of Sub-districts\nwithin District Boundaries':'Neighborhood'})
districtdf = districtdf.replace('\n',' ',regex=True)
districtdf['Neighborhood'] = districtdf['Neighborhood'].replace(r'\(including ', '', regex=True)
districtdf['Neighborhood'] = districtdf['Neighborhood'].replace(r'\)','', regex =True)
districtdf = districtdf.iloc[1:,:]
districtdf = districtdf[pd.notnull(districtdf['District'])]
districtdf['District'] = districtdf["District"].apply(lambda x: ''.join([" " if ord(i) <= 32 or ord(i) > 126 else i for i in x]))
districtdf = districtdf.assign(Neighborhood = districtdf['Neighborhood'].str.split(',')).explode('Neighborhood')
districtdf = districtdf.reset_index(drop=True)
districtdf

Unnamed: 0,District,Neighborhood
0,Central and Western,Kennedy Town
1,Central and Western,Shek Tong Tsui
2,Central and Western,Sai Ying Pun
3,Central and Western,Sheung Wan
4,Central and Western,Central
5,Central and Western,Admiralty
6,Central and Western,Mid-levels
7,Central and Western,Peak
8,Wan Chai,Wan Chai
9,Wan Chai,Causeway Bay


### Peek at the data

In [7]:
districtdf2 = districtdf.loc[districtdf['District'] == 'Wan Chai', :]
districtdf2

Unnamed: 0,District,Neighborhood
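
The filter above comes back empty, most likely because the `District` values are not exactly `'Wan Chai'`: the character filter in the transformation step replaces every non-ASCII and whitespace character with a space, which leaves padding around the English name (later outputs show the same leading space on neighborhood names such as `' Tai Tam'`). Below is a minimal sketch of a cleanup on plain strings; `ascii_label` is a hypothetical helper, not part of the original notebook.

```python
def ascii_label(text):
    """Keep printable ASCII, then collapse runs of whitespace and trim the ends."""
    # Same character filter as the notebook: anything outside 33..126 becomes a space.
    ascii_only = ''.join(ch if 32 < ord(ch) <= 126 else ' ' for ch in text)
    # split() with no arguments drops leading, trailing and repeated whitespace.
    return ' '.join(ascii_only.split())


raw_district = '灣 仔 \nWan Chai'          # shape of a raw cell in the government file
print(repr(ascii_label(raw_district)))     # 'Wan Chai'
print(repr(ascii_label(' Tai Tam')))       # 'Tai Tam'
```

Applied with something like `districtdf['District'].apply(ascii_label)`, an exact comparison such as `== 'Wan Chai'` should then match.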


In [7]:
districtdf['Location'] = districtdf['Neighborhood'] + ', Hong Kong, China'
districtdf.head()


Unnamed: 0,District,Neighborhood,Location
0,Central and Western,Kennedy Town,"Kennedy Town, Hong Kong, China"
1,Central and Western,Shek Tong Tsui,"Shek Tong Tsui, Hong Kong, China"
2,Central and Western,Sai Ying Pun,"Sai Ying Pun, Hong Kong, China"
3,Central and Western,Sheung Wan,"Sheung Wan, Hong Kong, China"
4,Central and Western,Central,"Central, Hong Kong, China"


In [8]:
geolocator = geo.Nominatim(user_agent='folium')
districtdf['Coordinates'] = districtdf['Location'].apply(geolocator.geocode)
# geocode returns None when a place cannot be resolved, so guard with an identity check
districtdf['Latitude'] = districtdf['Coordinates'].apply(lambda x: x.latitude if x is not None else None)
districtdf['Longitude'] = districtdf['Coordinates'].apply(lambda x: x.longitude if x is not None else None)
districtdf.head(127)

Unnamed: 0,District,Neighborhood,Location,Coordinates,Latitude,Longitude
0,Central and Western,Kennedy Town,"Kennedy Town, Hong Kong, China","(堅尼地城 Kennedy Town, 12N, 士美菲路 Smithfield, 堅尼地城...",22.281312,114.12916
1,Central and Western,Shek Tong Tsui,"Shek Tong Tsui, Hong Kong, China","(石塘咀 Shek Tong Tsui, 中西區 Central and Western D...",22.285876,114.135749
2,Central and Western,Sai Ying Pun,"Sai Ying Pun, Hong Kong, China","(西營盤 Sai Ying Pun, 中西區 Central and Western Dis...",22.286121,114.142086
3,Central and Western,Sheung Wan,"Sheung Wan, Hong Kong, China","(上環 Sheung Wan, 中西區 Central and Western Distri...",22.286483,114.150197
4,Central and Western,Central,"Central, Hong Kong, China","(中環 Central, 中西區 Central and Western District,...",22.281829,114.158278
5,Central and Western,Admiralty,"Admiralty, Hong Kong, China","(金鐘 Admiralty, 德立街 Drake Street, 金鐘 Admiralty,...",22.278616,114.166269
6,Central and Western,Mid-levels,"Mid-levels, Hong Kong, China","(半山 Mid-Levels, 中西區 Central and Western Distri...",22.276935,114.155937
7,Central and Western,Peak,"Peak, Hong Kong, China","(山頂 The Peak, 中西區 Central and Western District...",22.269917,114.150667
8,Wan Chai,Wan Chai,"Wan Chai, Hong Kong, China","(灣仔 Wan Chai, 灣仔區 Wan Chai District, 香港島 Hong ...",22.279015,114.172483
9,Wan Chai,Causeway Bay,"Causeway Bay, Hong Kong, China","(銅鑼灣 Causeway Bay, 灣仔區 Wan Chai District, 香港島 ...",22.280511,114.185559
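
The geocoding cell sends one Nominatim request per neighborhood with no pause in between, while the public Nominatim service asks clients to stay at or below one request per second. The sketch below shows one way to space calls out, using only the standard library. `throttled` and `fake_geocode` are hypothetical names, and the stand-in geocoder exists only so the example runs offline. geopy also ships a `RateLimiter` class in `geopy.extra.rate_limiter` that serves the same purpose.

```python
import time


def throttled(func, min_delay_seconds=1.0):
    """Wrap func so that consecutive calls are at least min_delay_seconds apart."""
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        try:
            return func(*args, **kwargs)
        finally:
            # Record the time even if func raised, so the next call still waits.
            last_call = time.monotonic()

    return wrapper


def fake_geocode(query):
    """Stand-in for geolocator.geocode (hypothetical); returns None when unresolved."""
    return None if query.startswith('Unknown') else (22.28, 114.13)


# A short delay keeps the demo fast; for the real service use 1.0 or more.
geocode = throttled(fake_geocode, min_delay_seconds=0.05)
coords = [geocode(q) for q in ['Kennedy Town, Hong Kong, China', 'Unknown place']]
# Unresolved queries stay None, mirroring the Latitude/Longitude lambdas above.
latitudes = [c[0] if c is not None else None for c in coords]
```

In the notebook, the wrapped function would take the place of `geolocator.geocode` in the `apply` call.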


In [9]:
map = folium.Map(location=[districtdf.Latitude.mean(),
                           districtdf.Longitude.mean()], zoom_start=11, control_scale=True, tiles="OpenStreetMap")

In [10]:
"""
for index, location_info in districtdf.iterrows():
    #folium.Marker([districtdf["Latitude"], districtdf["Longitude"]], popup=districtdf['Neighborhood']).add_to(map)
    folium.Marker([[location_info['Latitude'], location_info['Longitude']], popup= location_info['Neighborhood']).add_to(map)
"""                   
for index, location_info in districtdf.iterrows():
    folium.Circle([location_info["Latitude"], location_info["Longitude"]], 
                  popup=location_info["Neighborhood"],
                  radius=300,
                  color="purple",
                  fill=True,
                  fill_color="yellow").add_to(map)

In [11]:
map

In [12]:
# Never commit real credentials; the values below are placeholders
CLIENT_ID = 'your-client-id' # your Foursquare ID
CLIENT_SECRET = 'your-client-secret' # your Foursquare Secret
ACCESS_TOKEN = 'your-access-token' # your Foursquare Access Token
VERSION = '20210824'
LIMIT = 10
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: your-client-id
CLIENT_SECRET: your-client-secret
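
Credentials typed into a notebook cell end up in every exported copy of it. One common alternative is to read them from environment variables and print only a masked form. The sketch below assumes a variable named `FOURSQUARE_CLIENT_ID`; that name, `load_credential` and `mask` are all hypothetical and not part of the original notebook.

```python
import os


def load_credential(name, default=None):
    """Read one credential from the environment; fail loudly if it is missing."""
    value = os.environ.get(name, default)
    if not value:
        raise RuntimeError('Set the {} environment variable first'.format(name))
    return value


def mask(secret, visible=4):
    """Show only the first few characters when printing a secret."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Demo value only, so that the sketch runs on its own.
os.environ.setdefault('FOURSQUARE_CLIENT_ID', 'demo-client-id')
client_id = load_credential('FOURSQUARE_CLIENT_ID')
print('CLIENT_ID: ' + mask(client_id))
```

The same pattern would apply to the client secret and the access token.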


### Search for Japanese restaurant

In [13]:
search_query = 'Japanese Restaurant'
radius = 500
frames = []  # one dataframe of venues per neighborhood

for index, x in districtdf.iterrows():
    #print(x['Latitude'] + ';' + x['Longitude'])
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, x['Latitude'], x['Longitude'] ,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)

    results = requests.get(url).json()
    venues = results['response']['venues']

    # transform venues into a dataframe
    dataframe_filtered = json_normalize(venues)
    dataframe_filtered['Neighborhood'] = x['Neighborhood']
    frames.append(dataframe_filtered)

# DataFrame.append was removed in pandas 2.0, so concatenate once after the loop
jptable = pd.concat(frames, ignore_index=True)

jptable.shape

(818, 20)
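
The search cell builds its URL by string formatting, which leaves the space in `Japanese Restaurant` unescaped. The sketch below shows how the same query string could be assembled with the standard library's `urlencode`, which escapes every value. The endpoint and parameter names come from the cell above; `build_search_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(lat, lng, query, radius, limit, version, credentials):
    """Return the search URL with every parameter value URL-encoded."""
    params = dict(credentials)  # copy so the caller's dict is not modified
    params.update({
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    })
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials, for illustration only.
demo_credentials = {'client_id': 'your-client-id', 'client_secret': 'your-client-secret'}
url = build_search_url(22.281312, 114.12916, 'Japanese Restaurant', 500, 10,
                       '20210824', demo_credentials)
print(url)
```

Passing a `params` dictionary to `requests.get` has the same effect without building the string by hand.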

In [14]:
jptable

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.city,location.country,location.formattedAddress,location.neighborhood,Neighborhood,location.state,location.crossStreet,location.postalCode,venuePage.id
0,4c8780d4d8086dcbad6ea252,Kamukura Japanese Restaurant (神座日本料理),"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",v-1629878455,False,"Shop E, G/F, Luen Tak Apartment, 45-51 Smithfield",22.281105,114.12825,"[{'label': 'display', 'lat': 22.28110527718966...",96.0,HK,坚尼地城,香港,"[Shop E, G/F, Luen Tak Apartment, 45-51 Smithf...",,Kennedy Town,,,,
1,538c2001498e379b2c90d5a9,K House Japanese Restaurant (城屋日本料理),"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",v-1629878455,False,6-18 Hau Wo St,22.282904,114.127919,"[{'label': 'display', 'lat': 22.28290360196949...",218.0,HK,,香港,[6-18 Hau Wo St],,Kennedy Town,,,,
2,547ef886498e2615336edb6d,Kyo Japanese (京日本料理),"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",v-1629878455,False,"G/F, 34 Davis St",22.281499,114.12711,"[{'label': 'display', 'lat': 22.28149936219063...",212.0,HK,坚尼地城,香港,"[G/F, 34 Davis St]",,Kennedy Town,,,,
3,53638f2f498e5c347c5d75be,Shin Shu Japanese (信州日本料理),"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",v-1629878455,False,"Shop 5, G/F, Grand Fortune Mansion, 1 Davis St",22.28347,114.126485,"[{'label': 'display', 'lat': 22.28346965870307...",365.0,HK,坚尼地城,香港,"[Shop 5, G/F, Grand Fortune Mansion, 1 Davis St]",,Kennedy Town,,,,
4,4d00d08b75d3236af378ecf7,Iron Japanese Cuisine (鐵坂日式料理),"[{'id': '4bf58dd8d48988d111941735', 'name': 'J...",v-1629878455,False,"Shop C, 47 Hau Wo St",22.282819,114.126863,"[{'label': 'display', 'lat': 22.28281886774259...",290.0,HK,坚尼地城,香港,"[Shop C, 47 Hau Wo St]",,Kennedy Town,,,,
5,5b28f23b82a750002c4540dd,Man Wah Restaurant (民華餐廳),"[{'id': '58daa1558bbb0b01f18ec1d3', 'name': 'C...",v-1629878455,False,"12 Rock Hill St, Kennedy Town",22.282121,114.128534,"[{'label': 'display', 'lat': 22.282121, 'lng':...",110.0,HK,香港,香港,"[12 Rock Hill St, Kennedy Town]",,Kennedy Town,,,,
6,59523dcd6bdee678d3ba2a45,Xiao Yu Hotpot Restaurant (渝味曉宇重慶老火鍋),"[{'id': '52af3b773cf9994f4e043c03', 'name': 'S...",v-1629878455,False,"1/F, Westview Height, 163 Belcher's St",22.282716,114.128821,"[{'label': 'display', 'lat': 22.282716, 'lng':...",160.0,HK,坚尼地城,香港,"[1/F, Westview Height, 163 Belcher's St]",Kennedy Town,Kennedy Town,,,,
7,4b59359ff964a5207b8128e3,Cheung Hong Yuen Tea Restaurant (祥香茶餐廳),"[{'id': '58daa1558bbb0b01f18ec1d3', 'name': 'C...",v-1629878455,False,"G/F, 107 Belcher's St",22.282561,114.128434,"[{'label': 'display', 'lat': 22.28256051674656...",157.0,HK,坚尼地城,香港,"[G/F, 107 Belcher's St]",Kennedy Town,Kennedy Town,,,,
8,4cfb20f7084f548149a17f09,Kai Kee Restaurant (佳記餐廳),"[{'id': '58daa1558bbb0b01f18ec1d3', 'name': 'C...",v-1629878455,False,110-112 Belcher's St,22.282489,114.127628,"[{'label': 'display', 'lat': 22.28248911752310...",205.0,HK,坚尼地城,香港,[110-112 Belcher's St],Kennedy Town,Kennedy Town,,,,
9,4cbad6e0c7228cfa54cd19ce,Ho Choi Seafood Restaurant (好彩海鮮酒家),"[{'id': '52af3a7c3cf9994f4e043bed', 'name': 'C...",v-1629878455,False,67-83 Belcher's St,22.282931,114.129206,"[{'label': 'display', 'lat': 22.28293149285392...",180.0,HK,坚尼地城,香港,[67-83 Belcher's St],,Kennedy Town,,,,
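
The `location.distance` column can be sanity-checked against the coordinates in the same table. The sketch below computes the great-circle (haversine) distance between the geocoded Kennedy Town point and the first venue returned for it. It assumes a mean Earth radius of 6,371 km; how Foursquare computes its own figure is not stated in the source, so only rough agreement with the reported 96 m should be expected.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6371000.0  # mean Earth radius (assumption)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Coordinates copied from the outputs above.
kennedy_town = (22.281312, 114.12916)   # geocoded neighborhood point
kamukura = (22.281105, 114.12825)       # first venue returned for Kennedy Town
distance = haversine_m(*kennedy_town, *kamukura)
print(round(distance), 'm; inside the 500 m radius:', distance <= 500)
```

The same function could be used to drop venues that fall inside the search radius of more than one neighborhood.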


In [15]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories','Neighborhood'] + [col for col in jptable.columns if col.startswith('location.')] + ['id']
dataframe_filtered = jptable.loc[:, filtered_columns]

# remove rows with a null name; otherwise the function below raises an error
dataframe_filtered = dataframe_filtered[pd.notnull(dataframe_filtered['name'])]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]


# remove unnecessary columns and rename them
filtered_columns = ['name', 'categories','Neighborhood','lat','lng']
dataframe_filtered = dataframe_filtered.loc[:, filtered_columns]
dataframe_filtered = dataframe_filtered.rename(columns=
                                               {'name':'Venue','categories':'VCat','lat':'VLat','lng':'VLng'})


dataframe_filtered.head()

Unnamed: 0,Venue,VCat,Neighborhood,VLat,VLng
0,Kamukura Japanese Restaurant (神座日本料理),Japanese Restaurant,Kennedy Town,22.281105,114.12825
1,K House Japanese Restaurant (城屋日本料理),Japanese Restaurant,Kennedy Town,22.282904,114.127919
2,Kyo Japanese (京日本料理),Japanese Restaurant,Kennedy Town,22.281499,114.12711
3,Shin Shu Japanese (信州日本料理),Japanese Restaurant,Kennedy Town,22.28347,114.126485
4,Iron Japanese Cuisine (鐵坂日式料理),Japanese Restaurant,Kennedy Town,22.282819,114.126863


In [16]:
dataframe_filtered.groupby('Neighborhood').count()

Unnamed: 0_level_0,Venue,VCat,VLat,VLng
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Aberdeen,10,10,10,10
Admiralty,10,10,10,10
Ap Lei Chau,9,9,9,9
Causeway Bay,10,10,10,10
Central,10,10,10,10
Chai Wan,10,10,10,10
Cheung Sha Wan,10,10,10,10
Chung Hom Kok,3,3,3,3
Diamond Hill,10,10,10,10
Discovery Bay,5,5,5,5


In [17]:
UniqueCat = dataframe_filtered['VCat'].unique()
list(UniqueCat)

['Japanese Restaurant',
 'Cha Chaan Teng',
 'Szechuan Restaurant',
 'Cantonese Restaurant',
 'Dim Sum Restaurant',
 'Chinese Restaurant',
 'Vietnamese Restaurant',
 'Hong Kong Restaurant',
 'Restaurant',
 'Noodle House',
 'Sushi Restaurant',
 'BBQ Joint',
 'Ramen Restaurant',
 'Thai Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Asian Restaurant',
 'Shanghai Restaurant',
 'Grocery Store',
 'Taiwanese Restaurant',
 'Elementary School',
 'American Restaurant',
 'Food & Drink Shop',
 'Café',
 'French Restaurant',
 'Supermarket',
 'Korean Restaurant',
 'Seafood Restaurant',
 'Indian Restaurant',
 'Diner',
 'Fast Food Restaurant',
 'Hotpot Restaurant',
 'Theme Park',
 'German Restaurant',
 'Bar',
 'Buffet',
 'Malay Restaurant',
 'Italian Restaurant',
 'Dongbei Restaurant',
 'Udon Restaurant',
 'Dumpling Restaurant',
 'Shandong Restaurant',
 'Preschool',
 'Steakhouse',
 'Turkish Restaurant',
 'Hotel Bar',
 'Shabu-Shabu Restaurant',
 'Singaporean Restaurant',
 'Snack Place',
 'Breakfast Spo

Udon Restaurant, Shabu-Shabu Restaurant, Ramen Restaurant and Sushi Restaurant are all types of Japanese restaurant, so they are merged into the Japanese Restaurant category.

In [18]:
# the category names match exactly, so a plain list replacement works without regex
japanese_subtypes = ['Udon Restaurant', 'Ramen Restaurant', 'Shabu-Shabu Restaurant', 'Sushi Restaurant']
dataframe_filtered['VCat'] = dataframe_filtered['VCat'].replace(japanese_subtypes, 'Japanese Restaurant')
dataframe_filtered['VCat'].unique()

array(['Japanese Restaurant', 'Cha Chaan Teng', 'Szechuan Restaurant',
       'Cantonese Restaurant', 'Dim Sum Restaurant', 'Chinese Restaurant',
       'Vietnamese Restaurant', 'Hong Kong Restaurant', 'Restaurant',
       'Noodle House', 'BBQ Joint', 'Thai Restaurant',
       'Vegetarian / Vegan Restaurant', 'Asian Restaurant',
       'Shanghai Restaurant', 'Grocery Store', 'Taiwanese Restaurant',
       'Elementary School', 'American Restaurant', 'Food & Drink Shop',
       'Café', 'French Restaurant', 'Supermarket', 'Korean Restaurant',
       'Seafood Restaurant', 'Indian Restaurant', 'Diner',
       'Fast Food Restaurant', 'Hotpot Restaurant', 'Theme Park',
       'German Restaurant', 'Bar', 'Buffet', 'Malay Restaurant',
       'Italian Restaurant', 'Dongbei Restaurant', 'Dumpling Restaurant',
       'Shandong Restaurant', 'Preschool', 'Steakhouse',
       'Turkish Restaurant', 'Hotel Bar', 'Singaporean Restaurant',
       'Snack Place', 'Breakfast Spot', 'Korean BBQ Restaurant',


Regenerate the distinct category list

In [19]:
UniqueCat = dataframe_filtered['VCat'].unique()
list(UniqueCat)

['Japanese Restaurant',
 'Cha Chaan Teng',
 'Szechuan Restaurant',
 'Cantonese Restaurant',
 'Dim Sum Restaurant',
 'Chinese Restaurant',
 'Vietnamese Restaurant',
 'Hong Kong Restaurant',
 'Restaurant',
 'Noodle House',
 'BBQ Joint',
 'Thai Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Asian Restaurant',
 'Shanghai Restaurant',
 'Grocery Store',
 'Taiwanese Restaurant',
 'Elementary School',
 'American Restaurant',
 'Food & Drink Shop',
 'Café',
 'French Restaurant',
 'Supermarket',
 'Korean Restaurant',
 'Seafood Restaurant',
 'Indian Restaurant',
 'Diner',
 'Fast Food Restaurant',
 'Hotpot Restaurant',
 'Theme Park',
 'German Restaurant',
 'Bar',
 'Buffet',
 'Malay Restaurant',
 'Italian Restaurant',
 'Dongbei Restaurant',
 'Dumpling Restaurant',
 'Shandong Restaurant',
 'Preschool',
 'Steakhouse',
 'Turkish Restaurant',
 'Hotel Bar',
 'Singaporean Restaurant',
 'Snack Place',
 'Breakfast Spot',
 'Korean BBQ Restaurant',
 'Indonesian Restaurant',
 'Tibetan Restaurant',
 'Pub',
 '

In [20]:
print(len(UniqueCat))

54


There are 54 distinct venue categories, all listed above.

### Analyze the districts

One-hot encode the venue categories, then take the mean of each category by neighborhood to get its relative frequency there.

In [21]:
# one hot encoding
hk_onehot = pd.get_dummies(dataframe_filtered[['VCat']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hk_onehot['Neighborhood'] = dataframe_filtered['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hk_onehot.columns[-1]] + list(hk_onehot.columns[:-1])
hk_onehot = hk_onehot[fixed_columns]

hk_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bar,Breakfast Spot,Buffet,Cafeteria,Café,Cantonese Restaurant,Cha Chaan Teng,Chinese Restaurant,Dim Sum Restaurant,Diner,Dongbei Restaurant,Dumpling Restaurant,Elementary School,Fast Food Restaurant,Food & Drink Shop,French Restaurant,General College & University,German Restaurant,Gourmet Shop,Grocery Store,Hong Kong Restaurant,Hotel Bar,Hotpot Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Korean BBQ Restaurant,Korean Restaurant,Malay Restaurant,Noodle House,Preschool,Pub,Restaurant,Seafood Restaurant,Shandong Restaurant,Shanghai Restaurant,Shopping Mall,Singaporean Restaurant,Snack Place,Steakhouse,Supermarket,Szechuan Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Park,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Kennedy Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Kennedy Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Kennedy Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Kennedy Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Kennedy Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [22]:
hk_onehot.shape

(818, 55)

In [23]:
hk_grouped = hk_onehot.groupby('Neighborhood').mean().reset_index()
hk_grouped

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bar,Breakfast Spot,Buffet,Cafeteria,Café,Cantonese Restaurant,Cha Chaan Teng,Chinese Restaurant,Dim Sum Restaurant,Diner,Dongbei Restaurant,Dumpling Restaurant,Elementary School,Fast Food Restaurant,Food & Drink Shop,French Restaurant,General College & University,German Restaurant,Gourmet Shop,Grocery Store,Hong Kong Restaurant,Hotel Bar,Hotpot Restaurant,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Korean BBQ Restaurant,Korean Restaurant,Malay Restaurant,Noodle House,Preschool,Pub,Restaurant,Seafood Restaurant,Shandong Restaurant,Shanghai Restaurant,Shopping Mall,Singaporean Restaurant,Snack Place,Steakhouse,Supermarket,Szechuan Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Park,Tibetan Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Aberdeen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0
1,Admiralty,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1
2,Ap Lei Chau,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.444444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111
3,Causeway Bay,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.8,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Chai Wan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.1,0.2,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1
6,Cheung Sha Wan,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Chung Hom Kok,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Diamond Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1
9,Discovery Bay,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0


In [24]:
hk_grouped.shape

(101, 55)
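
One-hot encoding followed by a group mean amounts to computing, for each neighborhood, the share of its venues that falls in each category. The sketch below reproduces that arithmetic with the standard library on a small sample modeled on the Causeway Bay row (nine Japanese restaurants and one BBQ joint). `category_shares` is a hypothetical helper, not code from the notebook.

```python
from collections import Counter, defaultdict


def category_shares(venues):
    """Map each neighborhood to {category: share of that neighborhood's venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1

    shares = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return shares


# Sample (neighborhood, category) pairs modeled on the Causeway Bay row above.
sample = [('Causeway Bay', 'Japanese Restaurant')] * 9 + [('Causeway Bay', 'BBQ Joint')]
shares = category_shares(sample)
print(shares['Causeway Bay'])
```

Unlike the one-hot table, this version stores no zero entries for categories that never occur in a neighborhood.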

### Find the neighborhoods with no venue results

In [25]:
missing_district = [i for i in districtdf['Neighborhood'].unique() if i not in hk_grouped['Neighborhood'].unique()]
missing_district

[" Jardine's Lookout",
 'Pok Fu Lam',
 ' Tai Tam',
 ' Tai Wo Ping',
 ' Stonecutters Island',
 ' Beacon Hill',
 ' Tsing Yi',
 ' Ting Kau',
 ' Tsing Lung Tau',
 ' Sunny Bay',
 'Tai Lam Chung',
 ' Ha Tsuen',
 ' San Tin',
 ' Shek Kong',
 ' Pat Heung',
 ' Luk Keng',
 ' Wu Kau Tang',
 ' Shuen Wan',
 ' Cheung Muk Tau',
 ' Kei Ling Ha',
 'Clear Water Bay',
 ' Tai Mong Tsai',
 ' Tseung Kwan O',
 ' Ma Yau Tong',
 ' Lantau Island',
 ' Lamma Island']
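
Many names in the list above carry a leading space, a leftover from splitting the comma-separated sub-district text. The comparison still works in the notebook because both tables contain the same padded strings, but trimming before comparing makes the check robust to that kind of noise. This is a sketch under that assumption, and `missing_neighborhoods` is a hypothetical helper.

```python
def missing_neighborhoods(all_names, names_with_venues):
    """Return trimmed names from all_names that never appear in names_with_venues."""
    seen = {name.strip() for name in names_with_venues}
    missing = []
    for name in all_names:
        clean = name.strip()
        # Preserve the original order and skip duplicates.
        if clean not in seen and clean not in missing:
            missing.append(clean)
    return missing


all_names = ['Aberdeen', ' Tai Tam', 'Pok Fu Lam', ' Aberdeen']
with_venues = [' Aberdeen']
print(missing_neighborhoods(all_names, with_venues))  # ['Tai Tam', 'Pok Fu Lam']
```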

### Each neighborhood with its top 5 most common venue categories

In [26]:
num_top_venues = 5

for hood in hk_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = hk_grouped[hk_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Aberdeen----
                           venue  freq
0                 Cha Chaan Teng   0.4
1             Chinese Restaurant   0.2
2            Japanese Restaurant   0.1
3  Vegetarian / Vegan Restaurant   0.1
4              Hotpot Restaurant   0.1


---- Admiralty----
                   venue  freq
0    Japanese Restaurant   0.5
1   Cantonese Restaurant   0.2
2     Dim Sum Restaurant   0.1
3     Chinese Restaurant   0.1
4  Vietnamese Restaurant   0.1


---- Ap Lei Chau----
                   venue  freq
0     Chinese Restaurant  0.44
1         Cha Chaan Teng  0.22
2  Vietnamese Restaurant  0.11
3        Thai Restaurant  0.11
4     Seafood Restaurant  0.11


---- Causeway Bay----
                   venue  freq
0    Japanese Restaurant   0.9
1              BBQ Joint   0.1
2    American Restaurant   0.0
3          Shopping Mall   0.0
4  Korean BBQ Restaurant   0.0


---- Central----
                 venue  freq
0  Japanese Restaurant   0.8
1         Noodle House   0.1
2           Rest

4  Japanese Restaurant  0.00


---- Shau Kei Wan----
                  venue  freq
0    Chinese Restaurant   0.4
1  Hong Kong Restaurant   0.2
2     Indian Restaurant   0.1
3        Cha Chaan Teng   0.1
4    Dim Sum Restaurant   0.1


---- Shek Kip Mei----
                  venue  freq
0        Cha Chaan Teng   0.4
1  Hong Kong Restaurant   0.3
2   Japanese Restaurant   0.2
3       Thai Restaurant   0.1
4   American Restaurant   0.0


---- Shek O----
                 venue  freq
0      Thai Restaurant  0.50
1  American Restaurant  0.25
2   Dim Sum Restaurant  0.25
3  Shanghai Restaurant  0.00
4  Japanese Restaurant  0.00


---- Shek Tong Tsui----
                   venue  freq
0   Cantonese Restaurant   0.3
1    Japanese Restaurant   0.2
2  Vietnamese Restaurant   0.1
3   Hong Kong Restaurant   0.1
4     Dim Sum Restaurant   0.1


---- Shek Wu Hui----
                venue  freq
0      Cha Chaan Teng   0.4
1    Asian Restaurant   0.2
2  Chinese Restaurant   0.2
3                 Pub   

In [27]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
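
When a neighborhood has fewer than `num_top_venues` categories with a non-zero share, sorting the whole row fills the remaining slots with zero-frequency categories in an order that depends on how the sort treats ties. That is why the Causeway Bay row lists Vietnamese Restaurant third although its frequency is 0.0. Below is a sketch of a variant that drops zero shares and breaks ties alphabetically; `top_categories` is a hypothetical helper working on a plain dictionary.

```python
def top_categories(shares, n):
    """Return up to n category names by descending share, padded with None."""
    present = [(cat, share) for cat, share in shares.items() if share > 0]
    # Sort by share (largest first), then by name so that ties are deterministic.
    present.sort(key=lambda item: (-item[1], item[0]))
    names = [cat for cat, _ in present[:n]]
    return names + [None] * (n - len(names))


causeway_bay = {'Japanese Restaurant': 0.9, 'BBQ Joint': 0.1,
                'Vietnamese Restaurant': 0.0, 'Diner': 0.0}
print(top_categories(causeway_bay, 4))  # ['Japanese Restaurant', 'BBQ Joint', None, None]
```

Whether empty slots are preferable to zero-frequency fillers depends on how the table is read later, so treat this as one option rather than a correction.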

### Build a table of the 10 most common venue categories for each neighborhood

In [28]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = hk_grouped['Neighborhood']

for ind in np.arange(hk_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hk_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(20)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aberdeen,Cha Chaan Teng,Chinese Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Japanese Restaurant,Vegetarian / Vegan Restaurant,Breakfast Spot,Buffet,Asian Restaurant,Grocery Store
1,Admiralty,Japanese Restaurant,Cantonese Restaurant,Vietnamese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Bar,Breakfast Spot,Grocery Store,Gourmet Shop,Asian Restaurant
2,Ap Lei Chau,Chinese Restaurant,Cha Chaan Teng,Vietnamese Restaurant,Thai Restaurant,Seafood Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University
3,Causeway Bay,Japanese Restaurant,BBQ Joint,Vietnamese Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
4,Central,Japanese Restaurant,Noodle House,Restaurant,Vietnamese Restaurant,Hong Kong Restaurant,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
5,Chai Wan,Cantonese Restaurant,Chinese Restaurant,Vietnamese Restaurant,Japanese Restaurant,Fast Food Restaurant,Cha Chaan Teng,Shanghai Restaurant,Dongbei Restaurant,Grocery Store,Gourmet Shop
6,Cheung Sha Wan,Cha Chaan Teng,Hong Kong Restaurant,Japanese Restaurant,Restaurant,Asian Restaurant,Cantonese Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University
7,Chung Hom Kok,Shanghai Restaurant,Asian Restaurant,German Restaurant,Dim Sum Restaurant,Grocery Store,Gourmet Shop,General College & University,French Restaurant,Food & Drink Shop,Fast Food Restaurant
8,Diamond Hill,Chinese Restaurant,Cha Chaan Teng,Cantonese Restaurant,Singaporean Restaurant,Japanese Restaurant,Vietnamese Restaurant,Thai Restaurant,Bar,Dumpling Restaurant,Gourmet Shop
9,Discovery Bay,Vegetarian / Vegan Restaurant,Australian Restaurant,Thai Restaurant,Korean Restaurant,Chinese Restaurant,Dongbei Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University
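
The column labels above come from indexing a three-element suffix list and falling back to `th` on failure. That is enough for ten columns but would produce labels such as `21th` for longer tables. The sketch below shows a general ordinal helper; `ordinal` is a hypothetical function, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns[:4])
```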


### Cluster Neighborhoods

Run k-means to group the neighborhoods into 5 clusters.

### Use Elbow Method to find the optimal K

In [31]:
# k-means needs numeric features, so fit on the category frequencies
# rather than on the table of venue names
hk_features = hk_grouped.drop(columns='Neighborhood')

WSS = []
K = range(1,10)
for n in K:
    algo = KMeans(n_clusters = n, random_state = 0)
    algo.fit(hk_features)
    WSS.append(algo.inertia_)

plt.plot(K, WSS)
plt.xlabel('Values of K') 
plt.ylabel('Sum of squared distances/Inertia') 
plt.title('Elbow Method For Optimal k')
plt.show()
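
The elbow is usually read off the plot by eye. As a rough numerical aid, the bend can be located where the inertia curve changes slope most sharply, that is, where its discrete second difference is largest. The sketch below is a heuristic that does not appear in the original notebook: `elbow_by_second_difference` is a hypothetical helper and the inertia values are made up for illustration.

```python
def elbow_by_second_difference(ks, wss):
    """Return the k at which the inertia curve bends most sharply."""
    if len(ks) != len(wss) or len(ks) < 3:
        raise ValueError('need at least three (k, inertia) pairs of equal length')

    best_k, best_bend = None, float('-inf')
    for i in range(1, len(wss) - 1):
        # Discrete second difference: large when the drop before k is much
        # bigger than the drop after k.
        bend = wss[i - 1] - 2 * wss[i] + wss[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


ks = list(range(1, 8))
wss = [100.0, 60.0, 35.0, 30.0, 27.0, 25.0, 24.0]  # made-up inertia values
print(elbow_by_second_difference(ks, wss))          # 3
```

On noisy curves this can disagree with a visual reading, so it is best used alongside the plot and the silhouette score.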

### Use the Silhouette Score to find the optimal K

In [30]:
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans

# numeric features only: drop the neighborhood name before clustering
hk_grouped_clustering = hk_grouped.drop(columns='Neighborhood')

sil = []
# Number of clusters in range 2-9
K = range(2, 10)
for n in K:
    algorithm = KMeans(n_clusters = n, random_state = 0)
    algorithm.fit(hk_grouped_clustering)
    labels = algorithm.labels_
    sil.append(silhouette_score(hk_grouped_clustering, labels, metric = 'euclidean'))

plt.plot(K, sil)
plt.xlabel('Values of K') 
plt.ylabel('Silhouette Score') 
plt.title('Silhouette Score For Optimal k')
plt.show()

In [32]:
# set number of clusters
kclusters = 5

# the positional axis argument was removed in pandas 2.0; name the columns instead
hk_grouped_clustering = hk_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hk_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 1, 0, 0, 1, 3, 1, 1, 1])
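
The label array alone says little about what each cluster represents. A possible follow-up, sketched here on a made-up frequency table, is to count the members of each cluster and look at the dominant category of each centroid. The table `toy_grouped` and its values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue-frequency table (rows sum to 1), standing in for hk_grouped
toy_grouped = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D", "E", "F"],
    "Japanese Restaurant": [0.7, 0.6, 0.8, 0.1, 0.0, 0.1],
    "Cha Chaan Teng":      [0.1, 0.2, 0.1, 0.6, 0.7, 0.5],
    "Seafood Restaurant":  [0.2, 0.2, 0.1, 0.3, 0.3, 0.4],
})
features = toy_grouped.drop(columns="Neighborhood")

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# Number of neighborhoods assigned to each cluster label
cluster_sizes = np.bincount(toy_kmeans.labels_, minlength=2)

# Each centroid is the mean venue-frequency profile of its cluster
centroids = pd.DataFrame(toy_kmeans.cluster_centers_, columns=features.columns)
top_category = centroids.idxmax(axis=1)

print(cluster_sizes)
print(top_category)
```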

### Create a new dataframe that includes the cluster label and the top 10 venues for each neighborhood

In [33]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
#hk_merged = hk_merged.astype({"Cluster Labels":'int'})

hk_merged = districtdf

# join the cluster labels and top venues onto districtdf to add latitude/longitude for each neighborhood
hk_merged = hk_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
hk_merged  = hk_merged[pd.notnull(hk_merged['Cluster Labels'])]
hk_merged = hk_merged.astype({"Cluster Labels":'int'})

hk_merged # check the last columns!

Unnamed: 0,District,Neighborhood,Location,Coordinates,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,Kennedy Town,"Kennedy Town, Hong Kong, China","(堅尼地城 Kennedy Town, 12N, 士美菲路 Smithfield, 堅尼地城...",22.281312,114.12916,0,Japanese Restaurant,Cha Chaan Teng,Szechuan Restaurant,Cantonese Restaurant,Vietnamese Restaurant,Dongbei Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University
1,Central and Western,Shek Tong Tsui,"Shek Tong Tsui, Hong Kong, China","(石塘咀 Shek Tong Tsui, 中西區 Central and Western D...",22.285876,114.135749,1,Cantonese Restaurant,Japanese Restaurant,Vietnamese Restaurant,Chinese Restaurant,Cha Chaan Teng,Dim Sum Restaurant,Hong Kong Restaurant,Elementary School,Grocery Store,Gourmet Shop
2,Central and Western,Sai Ying Pun,"Sai Ying Pun, Hong Kong, China","(西營盤 Sai Ying Pun, 中西區 Central and Western Dis...",22.286121,114.142086,0,Japanese Restaurant,Restaurant,Szechuan Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Vietnamese Restaurant,Gourmet Shop,German Restaurant,General College & University,French Restaurant
3,Central and Western,Sheung Wan,"Sheung Wan, Hong Kong, China","(上環 Sheung Wan, 中西區 Central and Western Distri...",22.286483,114.150197,0,Japanese Restaurant,BBQ Joint,Noodle House,Cha Chaan Teng,Vietnamese Restaurant,Dongbei Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University
4,Central and Western,Central,"Central, Hong Kong, China","(中環 Central, 中西區 Central and Western District,...",22.281829,114.158278,0,Japanese Restaurant,Noodle House,Restaurant,Vietnamese Restaurant,Hong Kong Restaurant,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
5,Central and Western,Admiralty,"Admiralty, Hong Kong, China","(金鐘 Admiralty, 德立街 Drake Street, 金鐘 Admiralty,...",22.278616,114.166269,0,Japanese Restaurant,Cantonese Restaurant,Vietnamese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Bar,Breakfast Spot,Grocery Store,Gourmet Shop,Asian Restaurant
6,Central and Western,Mid-levels,"Mid-levels, Hong Kong, China","(半山 Mid-Levels, 中西區 Central and Western Distri...",22.276935,114.155937,0,Japanese Restaurant,Hong Kong Restaurant,Restaurant,Thai Restaurant,Cantonese Restaurant,Chinese Restaurant,Vietnamese Restaurant,Gourmet Shop,German Restaurant,General College & University
7,Central and Western,Peak,"Peak, Hong Kong, China","(山頂 The Peak, 中西區 Central and Western District...",22.269917,114.150667,3,Hong Kong Restaurant,Asian Restaurant,BBQ Joint,Cha Chaan Teng,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Grocery Store,Gourmet Shop,German Restaurant
8,Wan Chai,Wan Chai,"Wan Chai, Hong Kong, China","(灣仔 Wan Chai, 灣仔區 Wan Chai District, 香港島 Hong ...",22.279015,114.172483,0,Japanese Restaurant,Hong Kong Restaurant,Shanghai Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
9,Wan Chai,Causeway Bay,"Causeway Bay, Hong Kong, China","(銅鑼灣 Causeway Bay, 灣仔區 Wan Chai District, 香港島 ...",22.280511,114.185559,0,Japanese Restaurant,BBQ Joint,Vietnamese Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
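
The join, null filter and integer cast in the cell above can be seen more easily on a tiny example. This sketch assumes that neighborhoods without any returned venues are missing from the clustered table, which is why the filter is needed; the frames `toy_districts` and `toy_sorted` are hypothetical stand-ins for `districtdf` and `neighborhoods_venues_sorted`.

```python
import pandas as pd

# Hypothetical district table with coordinates (stand-in for districtdf)
toy_districts = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Latitude": [22.28, 22.29, 22.30],
    "Longitude": [114.13, 114.14, 114.15],
})

# Hypothetical clustered table; neighborhood "C" has no venues, so it is absent
toy_sorted = pd.DataFrame({
    "Cluster Labels": [0, 1],
    "Neighborhood": ["A", "B"],
    "1st Most Common Venue": ["Japanese Restaurant", "Cha Chaan Teng"],
})

# Left join on the neighborhood name: "C" gets NaN in the joined columns
toy_merged = toy_districts.join(toy_sorted.set_index("Neighborhood"), on="Neighborhood")
label_dtype_after_join = toy_merged["Cluster Labels"].dtype  # float, because of the NaN

# Drop unmatched rows, then the labels can be cast back to integers
toy_merged = toy_merged[toy_merged["Cluster Labels"].notna()]
toy_merged = toy_merged.astype({"Cluster Labels": "int"})
print(label_dtype_after_join)
print(toy_merged)
```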


In [34]:
# create map
map_clusters = folium.Map(location=[districtdf.Latitude.mean(),
                           districtdf.Longitude.mean()], zoom_start=11, control_scale=True, tiles="OpenStreetMap")

# set color scheme for the clusters
import matplotlib.colors as colors  # needed for colors.rgb2hex below
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hk_merged['Latitude'], hk_merged['Longitude'], hk_merged['Neighborhood'], hk_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
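
The colour assignment used for the markers can be checked in isolation. The sketch below builds one hex colour per cluster and looks it up directly by the 0-based label. The helper `color_for` is a hypothetical name; the notebook itself indexes with `cluster-1`, which also works because index `-1` wraps around to the last colour.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5  # same value as in the notebook

# Sample kclusters evenly spaced points of the rainbow colormap (RGBA rows)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))

# Convert each RGBA row to a hex string that folium accepts
rainbow = [colors.rgb2hex(rgba) for rgba in colors_array]

def color_for(cluster_label):
    """Hypothetical helper: hex colour for a 0-based cluster label."""
    return rainbow[int(cluster_label)]

print({label: color_for(label) for label in range(kclusters)})
```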

### Examine Clusters

##### Cluster 1

In [35]:
hk_merged.loc[hk_merged['Cluster Labels'] == 0, hk_merged.columns[[1] + list(range(7, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Kennedy Town,Japanese Restaurant,Cha Chaan Teng,Szechuan Restaurant,Cantonese Restaurant,Vietnamese Restaurant,Dongbei Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University
2,Sai Ying Pun,Japanese Restaurant,Restaurant,Szechuan Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Vietnamese Restaurant,Gourmet Shop,German Restaurant,General College & University,French Restaurant
3,Sheung Wan,Japanese Restaurant,BBQ Joint,Noodle House,Cha Chaan Teng,Vietnamese Restaurant,Dongbei Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University
4,Central,Japanese Restaurant,Noodle House,Restaurant,Vietnamese Restaurant,Hong Kong Restaurant,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
5,Admiralty,Japanese Restaurant,Cantonese Restaurant,Vietnamese Restaurant,Chinese Restaurant,Dim Sum Restaurant,Bar,Breakfast Spot,Grocery Store,Gourmet Shop,Asian Restaurant
6,Mid-levels,Japanese Restaurant,Hong Kong Restaurant,Restaurant,Thai Restaurant,Cantonese Restaurant,Chinese Restaurant,Vietnamese Restaurant,Gourmet Shop,German Restaurant,General College & University
8,Wan Chai,Japanese Restaurant,Hong Kong Restaurant,Shanghai Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
9,Causeway Bay,Japanese Restaurant,BBQ Joint,Vietnamese Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
10,Tin Hau,Japanese Restaurant,Grocery Store,Thai Restaurant,Taiwanese Restaurant,Dim Sum Restaurant,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
12,Tai Hang,Japanese Restaurant,American Restaurant,Chinese Restaurant,Grocery Store,Hong Kong Restaurant,Bar,Breakfast Spot,Asian Restaurant,Gourmet Shop,Australian Restaurant


##### Cluster 2

In [36]:
hk_merged.loc[hk_merged['Cluster Labels'] == 1, hk_merged.columns[[1] + list(range(7, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Shek Tong Tsui,Cantonese Restaurant,Japanese Restaurant,Vietnamese Restaurant,Chinese Restaurant,Cha Chaan Teng,Dim Sum Restaurant,Hong Kong Restaurant,Elementary School,Grocery Store,Gourmet Shop
11,Happy Valley,Elementary School,Vietnamese Restaurant,Diner,Hong Kong Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
13,So Kon Po,Japanese Restaurant,Café,Chinese Restaurant,Hong Kong Restaurant,French Restaurant,Food & Drink Shop,Cha Chaan Teng,Dongbei Restaurant,Grocery Store,Gourmet Shop
19,Shau Kei Wan,Chinese Restaurant,Hong Kong Restaurant,Dim Sum Restaurant,Cha Chaan Teng,Diner,Indian Restaurant,Breakfast Spot,Elementary School,Asian Restaurant,Grocery Store
20,Chai Wan,Cantonese Restaurant,Chinese Restaurant,Vietnamese Restaurant,Japanese Restaurant,Fast Food Restaurant,Cha Chaan Teng,Shanghai Restaurant,Dongbei Restaurant,Grocery Store,Gourmet Shop
21,Siu Sai Wan,Chinese Restaurant,Thai Restaurant,Dim Sum Restaurant,Cha Chaan Teng,Taiwanese Restaurant,Hong Kong Restaurant,BBQ Joint,Bar,Grocery Store,Gourmet Shop
24,Ap Lei Chau,Chinese Restaurant,Cha Chaan Teng,Vietnamese Restaurant,Thai Restaurant,Seafood Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University
26,Shouson Hill,Theme Park,Vietnamese Restaurant,Dim Sum Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop,Fast Food Restaurant
28,Chung Hom Kok,Shanghai Restaurant,Asian Restaurant,German Restaurant,Dim Sum Restaurant,Grocery Store,Gourmet Shop,General College & University,French Restaurant,Food & Drink Shop,Fast Food Restaurant
31,Shek O,Thai Restaurant,American Restaurant,Dim Sum Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop,Fast Food Restaurant


##### Cluster 3

In [37]:
hk_merged.loc[hk_merged['Cluster Labels'] == 2, hk_merged.columns[[1] + list(range(7, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
86,Lau Fau Shan,Seafood Restaurant,Dim Sum Restaurant,Chinese Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
98,Sha Tau Kok,Seafood Restaurant,Vietnamese Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop,Fast Food Restaurant


##### Cluster 4

In [38]:
hk_merged.loc[hk_merged['Cluster Labels'] == 3, hk_merged.columns[[1] + list(range(7, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Peak,Hong Kong Restaurant,Asian Restaurant,BBQ Joint,Cha Chaan Teng,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Grocery Store,Gourmet Shop,German Restaurant
23,Aberdeen,Cha Chaan Teng,Chinese Restaurant,Hotpot Restaurant,Hong Kong Restaurant,Japanese Restaurant,Vegetarian / Vegan Restaurant,Breakfast Spot,Buffet,Asian Restaurant,Grocery Store
25,Wong Chuk Hang,Hong Kong Restaurant,Seafood Restaurant,Cha Chaan Teng,Chinese Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop
29,Stanley,Asian Restaurant,Bar,Cantonese Restaurant,Cha Chaan Teng,Vietnamese Restaurant,Dumpling Restaurant,Hong Kong Restaurant,Grocery Store,Gourmet Shop,German Restaurant
35,King's Park,Cha Chaan Teng,Fast Food Restaurant,Chinese Restaurant,Dongbei Restaurant,Cantonese Restaurant,Vietnamese Restaurant,Hong Kong Restaurant,Grocery Store,Gourmet Shop,German Restaurant
38,Mei Foo,Cha Chaan Teng,Japanese Restaurant,Vietnamese Restaurant,Seafood Restaurant,Dim Sum Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant
40,Cheung Sha Wan,Cha Chaan Teng,Hong Kong Restaurant,Japanese Restaurant,Restaurant,Asian Restaurant,Cantonese Restaurant,Grocery Store,Gourmet Shop,German Restaurant,General College & University
42,Shek Kip Mei,Cha Chaan Teng,Hong Kong Restaurant,Japanese Restaurant,Thai Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant
47,To Kwa Wan,Japanese Restaurant,Cantonese Restaurant,Cha Chaan Teng,Vietnamese Restaurant,Korean Restaurant,Steakhouse,Chinese Restaurant,Dongbei Restaurant,Gourmet Shop,German Restaurant
48,Ma Tau Kok,Cha Chaan Teng,Vietnamese Restaurant,Cantonese Restaurant,Japanese Restaurant,Korean Restaurant,Chinese Restaurant,Hong Kong Restaurant,Buffet,Fast Food Restaurant,Grocery Store


##### Cluster 5

In [39]:
hk_merged.loc[hk_merged['Cluster Labels'] == 4, hk_merged.columns[[1] + list(range(7, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
103,Tai Po Kau,Restaurant,Vietnamese Restaurant,Diner,Grocery Store,Gourmet Shop,German Restaurant,General College & University,French Restaurant,Food & Drink Shop,Fast Food Restaurant


### Conclusion
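Cluster 1 (cluster label 0) is the most promising target for Japanese restaurant groups looking to expand: in every Cluster 1 neighborhood listed above, such as Kennedy Town, Central and Causeway Bay, Japanese Restaurant is the most common venue category, which suggests established local demand for Japanese food.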
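
One way to put a number on this conclusion is to compute, per cluster, the share of neighborhoods whose most common venue is a Japanese restaurant. The sketch below uses only a few rows copied from the cluster tables above, so the resulting shares describe this sample and not the full dataset; the names `sample` and `japanese_share` are illustrative.

```python
import pandas as pd

# A few rows copied from the cluster tables above (label, most common venue)
sample = pd.DataFrame({
    "Neighborhood": ["Kennedy Town", "Sai Ying Pun", "Central",
                     "Shek Tong Tsui", "So Kon Po", "Chai Wan",
                     "Peak", "Aberdeen", "To Kwa Wan"],
    "Cluster Labels": [0, 0, 0, 1, 1, 1, 3, 3, 3],
    "1st Most Common Venue": [
        "Japanese Restaurant", "Japanese Restaurant", "Japanese Restaurant",
        "Cantonese Restaurant", "Japanese Restaurant", "Cantonese Restaurant",
        "Hong Kong Restaurant", "Cha Chaan Teng", "Japanese Restaurant",
    ],
})

# Share of neighborhoods per cluster whose top venue category is Japanese Restaurant
is_japanese = sample["1st Most Common Venue"].eq("Japanese Restaurant")
japanese_share = is_japanese.groupby(sample["Cluster Labels"]).mean()
print(japanese_share)
```

Running the same two lines on the full `hk_merged` frame would give the shares for all neighborhoods.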

In [20]:
i = 1
Test_Latitude = districtdf['Latitude'][i]
Test_Longitude = districtdf['Longitude'][i]
radius = 500
LIMIT = 10
search_query = 'Japanese'
print(Test_Latitude,",",Test_Longitude)
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, Test_Latitude, Test_Longitude ,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
results = requests.get(url).json()
#venues = results['response']['venues']
#df_filtered = json_normalize(venues)
#df_filtered
url

22.2858761 , 114.1357494


'https://api.foursquare.com/v2/venues/search?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=22.2858761,114.1357494&oauth_token=<ACCESS_TOKEN>&v=20210824&query=Japanese&radius=500&limit=10'
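
Displaying the full request URL exposes the API credentials in the notebook output. A minimal sketch of an alternative is to pass the query as a `params` dictionary to `requests` and to mask the secrets before printing. It assumes the same v2 search endpoint and parameter names as the cell above, leaves out `oauth_token` for brevity, and reads the credentials from environment variables whose names (`FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET`) are made up for this example. The `redact` helper is hypothetical as well.

```python
import os
import requests

# Assumption: credentials come from environment variables; the defaults are placeholders
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "demo-client-id")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "demo-client-secret")

params = {
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "v": "20210824",                 # API version date used in the notebook
    "ll": "22.2858761,114.1357494",  # latitude,longitude of the test point
    "query": "Japanese",
    "radius": 500,
    "limit": 10,
}

# Prepare the request without sending it, to inspect the encoded URL
prepared = requests.Request(
    "GET", "https://api.foursquare.com/v2/venues/search", params=params
).prepare()

def redact(url, secrets):
    """Hypothetical helper: mask secret values before a URL is printed or logged."""
    for secret in secrets:
        url = url.replace(secret, "***")
    return url

safe_url = redact(prepared.url, [CLIENT_ID, CLIENT_SECRET])
print(safe_url)
```

To actually send the request, `requests.get(url, params=params)` takes the same dictionary. The simple string replacement in `redact` assumes the secrets contain no characters that get percent-encoded.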

In [None]:
# centre point of the map (marked as Kennedy Town below)
latitude = 22.285221790034562
longitude = 114.13319169114956

venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around Kennedy Town

# add a red circle marker to represent Kennedy Town
folium.CircleMarker(
    [latitude, longitude],
    radius=11,
    color='red',
    popup='Kennedy Town',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Japanese restaurants as blue circle markers
# (dataframe_filtered is assumed to hold the flattened search results with lat, lng and categories columns)
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map
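
The map cell relies on a `dataframe_filtered` frame with `lat`, `lng` and `categories` columns. A possible way to build it from the search response is sketched below. The `results` dictionary is a hypothetical, trimmed-down response with invented venue names, shaped like the `results['response']['venues']` access in the commented-out code; the field names `location.lat`, `location.lng` and `categories[].name` are assumptions about that response.

```python
import pandas as pd

# Hypothetical, trimmed-down response in the shape the commented-out code expects
results = {
    "response": {
        "venues": [
            {"name": "Sushi Place",
             "location": {"lat": 22.2860, "lng": 114.1350},
             "categories": [{"name": "Japanese Restaurant"}]},
            {"name": "Ramen Bar",
             "location": {"lat": 22.2851, "lng": 114.1362},
             "categories": [{"name": "Ramen Restaurant"}]},
            {"name": "Unlabelled Shop",
             "location": {"lat": 22.2849, "lng": 114.1340},
             "categories": []},
        ]
    }
}

venues = results["response"]["venues"]
flat = pd.json_normalize(venues)  # nested keys become 'location.lat', 'location.lng'

def first_category(categories):
    """Return the first category name, or None when the list is empty."""
    return categories[0]["name"] if categories else None

# Keep only the columns the map cell needs, under short names
dataframe_filtered = pd.DataFrame({
    "name": flat["name"],
    "lat": flat["location.lat"],
    "lng": flat["location.lng"],
    "categories": flat["categories"].apply(first_category),
})
print(dataframe_filtered)
```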