This notebook will be used for the Capstone Project


# Introduction/Business Problem

This Capstone Project identifies the venues around each district of Hong Kong and determines which venues have the best ratings for tourists.

Tourists visiting Hong Kong want to stay in a neighborhood surrounded by the best venues in terms of ratings and price.

Using the Foursquare API, we identify which district best suits a tourist, depending on their preferred restaurants and venues.

# Data Collection

This project requires three sets of data:

1. A list of the districts of Hong Kong
2. The latitude and longitude of each district
3. Venue data around each district, retrieved with the Foursquare API


In [1]:
import numpy as np
import pandas as pd

from urllib.request import urlopen
from bs4 import BeautifulSoup

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')



In [2]:
#getting the html of our desired url
url = "https://en.wikipedia.org/wiki/Districts_of_Hong_Kong"
html = urlopen(url)

In [3]:
#creating the Beautifulsoup object
soup = BeautifulSoup(html, 'lxml')
type(soup)

bs4.BeautifulSoup

In [4]:
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Districts of Hong Kong - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"14214c98-4a9c-4938-88c2-4dba6a87161f","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Districts_of_Hong_Kong","wgTitle":"Districts of Hong Kong","wgCurRevisionId":959380263,"wgRevisionId":959380263,"wgArticleId":151994,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Webarchive template wayback links","EngvarB from July 2014","Use dmy dates from July 2014","Articles containing Chinese-langu

In [5]:
all_tables=soup.find_all("table")
all_tables
right_table=soup.find('table',class_='wikitable sortable')
right_table

<table class="wikitable sortable">
<tbody><tr>
<th>District
</th>
<th><a href="/wiki/Traditional_Chinese_characters" title="Traditional Chinese characters">Chinese</a>
</th>
<th>Population<sup class="noprint Inline-Template" style="white-space:nowrap;">[<i><a href="/wiki/Wikipedia:Manual_of_Style/Dates_and_numbers#Chronological_items" title="Wikipedia:Manual of Style/Dates and numbers"><span title="The time period mentioned near this tag is ambiguous. (December 2019)">when?</span></a></i>]</sup> <sup class="reference" id="cite_ref-6"><a href="#cite_note-6">[6]</a></sup>
</th>
<th>Area<br/>(km<sup>2</sup>)
</th>
<th>Density<br/>(/km<sup>2</sup>)
</th>
<th>Region
</th></tr>
<tr>
<td><a href="/wiki/Central_and_Western_District" title="Central and Western District"><span class="nowrap">Central and Western</span></a></td>
<td><span lang="zh-HK" title="Chinese language text">中西區</span></td>
<td align="right">244,600</td>
<td align="right">12.44</td>
<td align="right">19,983.92</td>
<td><span

In [6]:
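# one list per table column: District, Chinese, Population, Area, Density, Region
A=[]
B=[]
C=[]
D=[]
E=[]
F=[]

# walk every table row; the header row holds only <th> cells, so it yields no <td> and is skipped
for row in right_table.find_all('tr'):
    cells=row.find_all('td')
    if len(cells)==6:  # keep only complete data rows
        # find(text=True) returns the first text node inside each cell
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))
        D.append(cells[3].find(text=True))
        E.append(cells[4].find(text=True))
        F.append(cells[5].find(text=True))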
        


In [7]:
df=pd.DataFrame(A,columns=['District'])
df['Chinese']=B
df['Population']=C
df['Area km^2']=D
df['Density /km^2']=E
df['Region']=F
df

| | District | Chinese | Population | Area km^2 | Density /km^2 | Region |
|---|---|---|---|---|---|---|
| 0 | Central and Western | 中西區 | 244600 | 12.44 | 19983.92 | Hong Kong Island |
| 1 | Eastern | 東區 | 574500 | 18.56 | 31217.67 | Hong Kong Island |
| 2 | Southern | 南區 | 269200 | 38.85 | 6962.68 | Hong Kong Island |
| 3 | Wan Chai | 灣仔區 | 150900 | 9.83 | 15300.1 | Hong Kong Island |
| 4 | Sham Shui Po | 深水埗區 | 390600 | 9.35 | 41529.41 | Kowloon |
| 5 | Kowloon City | 九龍城區 | 405400 | 10.02 | 40194.7 | Kowloon |
| 6 | Kwun Tong | 觀塘區 | 641100 | 11.27 | 56779.05 | Kowloon |
| 7 | Wong Tai Sin | 黃大仙區 | 426200 | 9.3 | 45645.16 | Kowloon |
| 8 | Yau Tsim Mong | 油尖旺區 | 318100 | 6.99 | 44864.09 | Kowloon |
| 9 | Islands | 離島區 | 146900 | 175.12 | 825.14 | New Territories |
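
The values scraped from the Wikipedia table are text, not numbers, and the raw HTML above shows the population written as `244,600`, thousands separator included. Before doing arithmetic on `Population`, `Area km^2`, or `Density /km^2`, such strings may need converting. A minimal sketch of one way to do that, where `to_number` is a hypothetical helper and not part of the original notebook:

```python
def to_number(text):
    """Convert a scraped numeric string such as '244,600' to int or float."""
    cleaned = text.strip().replace(',', '')
    # Values containing a decimal point become floats; the rest become ints.
    return float(cleaned) if '.' in cleaned else int(cleaned)

print(to_number('244,600'))
print(to_number('19,983.92'))
```

With the DataFrame above this could be applied as `df['Population'].apply(to_number)`, assuming every cell holds a well-formed number.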


In [8]:
geolocator = Nominatim(user_agent="Hong Kong Districts")

df['Coordinates']=df['Chinese'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))
df[['Latitude', 'Longitude']] = df['Coordinates'].apply(pd.Series)

df.drop(['Coordinates'], axis = 1, inplace=True)
df

| | District | Chinese | Population | Area km^2 | Density /km^2 | Region | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | Central and Western | 中西區 | 244600 | 12.44 | 19983.92 | Hong Kong Island | 22.274848 | 114.148725 |
| 1 | Eastern | 東區 | 574500 | 18.56 | 31217.67 | Hong Kong Island | 22.273078 | 114.233594 |
| 2 | Southern | 南區 | 269200 | 38.85 | 6962.68 | Hong Kong Island | 22.219263 | 114.22523 |
| 3 | Wan Chai | 灣仔區 | 150900 | 9.83 | 15300.1 | Hong Kong Island | 22.273947 | 114.181749 |
| 4 | Sham Shui Po | 深水埗區 | 390600 | 9.35 | 41529.41 | Kowloon | 22.331254 | 114.159321 |
| 5 | Kowloon City | 九龍城區 | 405400 | 10.02 | 40194.7 | Kowloon | 22.3218 | 114.188594 |
| 6 | Kwun Tong | 觀塘區 | 641100 | 11.27 | 56779.05 | Kowloon | 22.308649 | 114.227661 |
| 7 | Wong Tai Sin | 黃大仙區 | 426200 | 9.3 | 45645.16 | Kowloon | 22.344322 | 114.20215 |
| 8 | Yau Tsim Mong | 油尖旺區 | 318100 | 6.99 | 44864.09 | Kowloon | 22.307404 | 114.165526 |
| 9 | Islands | 離島區 | 146900 | 175.12 | 825.14 | New Territories | 35.736156 | 139.714222 |
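
A geocoder can match a district name to a place outside Hong Kong, so the returned coordinates are worth checking. In the table above, the Islands row (35.736156, 139.714222) lies far from the other districts, which all sit near latitude 22.3 and longitude 114.2. A minimal sketch of such a check follows; the bounding box values are approximate assumptions and `in_hong_kong` is a hypothetical helper, neither taken from the original notebook:

```python
# Approximate bounding box for Hong Kong (assumed values, for illustration only).
HK_LAT_RANGE = (22.1, 22.6)
HK_LON_RANGE = (113.8, 114.5)

def in_hong_kong(lat, lon):
    """Return True if the point falls inside the assumed bounding box."""
    return (HK_LAT_RANGE[0] <= lat <= HK_LAT_RANGE[1]
            and HK_LON_RANGE[0] <= lon <= HK_LON_RANGE[1])

# Coordinates copied from the table above.
coords = {
    'Central and Western': (22.274848, 114.148725),
    'Southern': (22.219263, 114.22523),
    'Islands': (35.736156, 139.714222),
}

# Collect the districts whose coordinates fall outside the box.
suspect = [name for name, (lat, lon) in coords.items() if not in_hong_kong(lat, lon)]
print(suspect)  # ['Islands']
```

Rows flagged this way could be geocoded again with a more specific query. Until then, venue results for a flagged row are likely to describe the wrong location.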


In [9]:
address = 'Hong Kong, China'

geolocator = Nominatim(user_agent="j.chau8261@gmail.com")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hong Kong are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hong Kong are 22.2793278, 114.1628131.


In [10]:
# creating a map of Hong Kong using latitude and longitude values
map_HK = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, district, region in zip(df['Latitude'], df['Longitude'], df['District'], df['Region']):
    label = '{}, {}'.format(district, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_HK)  
    
map_HK

In [11]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted here)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted here)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
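
The request URL in `getNearbyVenues` is assembled with string formatting. As an alternative sketch, the same query string can be built from a dictionary with the standard library, which also takes care of URL-encoding. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones used above:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the venues/explore URL with the same parameters as getNearbyVenues."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes the values, so the comma in 'll' becomes %2C.
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials, not real ones.
url = build_explore_url('ID', 'SECRET', '20180605', 22.274848, 114.148725)
print(url)
```

`requests.get` also accepts such a dictionary directly through its `params` argument, which avoids building the URL by hand.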

In [13]:
HK_venues = getNearbyVenues(names=df['District'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Central and Western
Eastern
Southern
Wan Chai
Sham Shui Po
Kowloon City
Kwun Tong
Wong Tai Sin
Yau Tsim Mong
Islands
Kwai Tsing
North
Sai Kung
Sha Tin
Tai Po
Tsuen Wan
Tuen Mun
Yuen Long


In [14]:
HK_venues.head()


| | District | District Latitude | District Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Central and Western | 22.274848 | 114.148725 | Hong Kong Trail (Section 1) (港島徑（第一段）) | 22.272874 | 114.145895 | Trail |
| 1 | Central and Western | 22.274848 | 114.148725 | Victoria Peak (太平山) | 22.27128 | 114.149976 | Scenic Lookout |
| 2 | Central and Western | 22.274848 | 114.148725 | The Sky Terrace 428 (凌霄閣摩天臺428) | 22.271304 | 114.149991 | Scenic Lookout |
| 3 | Central and Western | 22.274848 | 114.148725 | Mount Austin Playground (柯士甸山遊樂場) | 22.272387 | 114.147578 | Playground |
| 4 | Central and Western | 22.274848 | 114.148725 | The Peak Tower (凌霄閣) | 22.271307 | 114.149977 | Monument / Landmark |


In [15]:
print (HK_venues['Venue Category'].value_counts())

Chinese Restaurant               23
Ramen Restaurant                 15
Convenience Store                13
Noodle House                     13
Fast Food Restaurant             12
Café                             11
Dessert Shop                      9
Coffee Shop                       9
Japanese Restaurant               8
Shopping Mall                     8
Hotel                             7
Sushi Restaurant                  7
Hong Kong Restaurant              6
Hotpot Restaurant                 5
Pizza Place                       5
Park                              5
Cha Chaan Teng                    4
Sake Bar                          4
Cantonese Restaurant              4
Italian Restaurant                4
Indian Restaurant                 4
BBQ Joint                         4
Szechuan Restaurant               3
Thai Restaurant                   3
Snack Place                       3
Scenic Lookout                    3
Market                            3
Restaurant                  

In [16]:
HK_venue_frequency = HK_venues['Venue Category'].value_counts()[0:20].to_frame(name='frequency')
HK_venue_frequency=HK_venue_frequency.reset_index()

HK_venue_frequency.rename(index=str, columns={"index": "Venue Category", "frequency": "Frequency"}, inplace=True)
HK_venue_frequency

| | Venue Category | Frequency |
|---|---|---|
| 0 | Chinese Restaurant | 23 |
| 1 | Ramen Restaurant | 15 |
| 2 | Convenience Store | 13 |
| 3 | Noodle House | 13 |
| 4 | Fast Food Restaurant | 12 |
| 5 | Café | 11 |
| 6 | Dessert Shop | 9 |
| 7 | Coffee Shop | 9 |
| 8 | Japanese Restaurant | 8 |
| 9 | Shopping Mall | 8 |


In [17]:
import seaborn as sns
from matplotlib import pyplot as plt

# create the figure first so the bar plot is drawn on the 18x7 canvas
fig = plt.figure(figsize=(18,7))
s=sns.barplot(x="Venue Category", y="Frequency", data=HK_venue_frequency)
s.set_xticklabels(s.get_xticklabels(), rotation=45, horizontalalignment='right')

plt.title('Top 20 Most Frequent Venues in Hong Kong Districts', fontsize=15)
plt.xlabel("Venue Category", fontsize=15)
plt.ylabel ("Frequency", fontsize=15)
plt.tight_layout() # keep the rotated tick labels inside the saved image
plt.savefig("Most_Freq_Venues1.png", dpi=300)
plt.show()

<Figure size 1800x700 with 1 Axes>

In [18]:
# one hot encoding
HK_onehot = pd.get_dummies(HK_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
HK_onehot['District'] = HK_venues['District'] 

# move district column to the first column
fixed_columns = [HK_onehot.columns[-1]] + list(HK_onehot.columns[:-1])
HK_onehot = HK_onehot[fixed_columns]

HK_onehot.head(15)


Unnamed: 0,District,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach,...,Track,Trail,Train Station,Tram Station,Tunnel,Udon Restaurant,Video Store,Vietnamese Restaurant,Waterfall,Yoshoku Restaurant
0,Central and Western,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Central and Western,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [19]:
HK_grouped = HK_onehot.groupby('District').mean().reset_index()
HK_grouped

Unnamed: 0,District,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach,...,Track,Trail,Train Station,Tram Station,Tunnel,Udon Restaurant,Video Store,Vietnamese Restaurant,Waterfall,Yoshoku Restaurant
0,Central and Western,0.0,0.0,0.041667,0.041667,0.041667,0.0,0.041667,0.0,0.0,...,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0
1,Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Islands,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641
3,Kowloon City,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Kwai Tsing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0
5,Kwun Tong,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,North,0.0,0.018868,0.0,0.0,0.0,0.037736,0.018868,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.018868,0.0
7,Sha Tin,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Sham Shui Po,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0
9,Tsuen Wan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
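
Because each venue contributes a single 1 in its category column, the per-district mean of the one-hot columns equals the share of that district's venues that fall in each category. A minimal sketch of the same arithmetic in plain Python, using hypothetical (district, category) pairs rather than the real Foursquare results:

```python
from collections import Counter, defaultdict

# Hypothetical (district, category) pairs standing in for HK_venues rows.
venues = [
    ('Eastern', 'Bus Stop'),
    ('Eastern', 'Castle'),
    ('Eastern', 'Historic Site'),
    ('Kowloon City', 'Café'),
    ('Kowloon City', 'Chinese Restaurant'),
    ('Kowloon City', 'Chinese Restaurant'),
    ('Kowloon City', 'Park'),
]

# Count venues per category within each district.
counts = defaultdict(Counter)
for district, category in venues:
    counts[district][category] += 1

# Share of each category = category count / total venues in the district.
frequencies = {
    district: {cat: n / sum(c.values()) for cat, n in c.items()}
    for district, c in counts.items()
}

print(frequencies['Kowloon City']['Chinese Restaurant'])  # 0.5
```

A district with only three venues, each in a different category, gives every one of those categories a share of about 0.33, so districts with few venues produce coarse profiles.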


In [20]:
num_top_venues = 5

for hood in HK_grouped['District']:
    print("----"+hood+"----")
    temp = HK_grouped[HK_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central and Western----
                  venue  freq
0        Scenic Lookout  0.12
1        Ice Cream Shop  0.08
2  Fast Food Restaurant  0.04
3     Indian Restaurant  0.04
4           Coffee Shop  0.04


----Eastern----
                 venue  freq
0             Bus Stop  0.33
1               Castle  0.33
2        Historic Site  0.33
3  American Restaurant  0.00
4       Scenic Lookout  0.00


----Islands----
                 venue  freq
0     Ramen Restaurant  0.13
1          Music Store  0.05
2   Chinese Restaurant  0.05
3  Japanese Restaurant  0.05
4    Convenience Store  0.05


----Kowloon City----
                  venue  freq
0  Fast Food Restaurant  0.19
1    Chinese Restaurant  0.19
2                  Café  0.12
3                  Park  0.06
4                 Hotel  0.06


----Kwai Tsing----
                 venue  freq
0               Tunnel  0.33
1   Chinese Restaurant  0.33
2                 Café  0.33
3  American Restaurant  0.00
4          Pizza Place  0.00


----Kwun

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the third have no 'st'/'nd'/'rd' entry
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
district_venues_sorted = pd.DataFrame(columns=columns)
district_venues_sorted['District'] = HK_grouped['District']

for ind in np.arange(HK_grouped.shape[0]):
    district_venues_sorted.iloc[ind, 1:] = return_most_common_venues(HK_grouped.iloc[ind, :], num_top_venues)

district_venues_sorted.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,Scenic Lookout,Ice Cream Shop,Coffee Shop,Italian Restaurant,Playground,Pizza Place,Noodle House,Seafood Restaurant,Fast Food Restaurant,Monument / Landmark
1,Eastern,Historic Site,Castle,Bus Stop,Yoshoku Restaurant,Fried Chicken Joint,Donburi Restaurant,Dongbei Restaurant,Drugstore,Dumpling Restaurant,Electronics Store
2,Islands,Ramen Restaurant,Music Store,Chinese Restaurant,Convenience Store,Japanese Restaurant,Yoshoku Restaurant,Hobby Shop,Recording Studio,Park,Noodle House
3,Kowloon City,Chinese Restaurant,Fast Food Restaurant,Café,Hong Kong Restaurant,Noodle House,Park,Seafood Restaurant,Hostel,Hotel,Art Gallery
4,Kwai Tsing,Tunnel,Chinese Restaurant,Café,Yoshoku Restaurant,Fried Chicken Joint,Donburi Restaurant,Dongbei Restaurant,Drugstore,Dumpling Restaurant,Electronics Store
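
The column names above are built from the list `['st', 'nd', 'rd']`, with every later position falling back to 'th'. That works for the ten columns used here but would produce labels such as '21th' for larger values of `num_top_venues`. A more general sketch, where `ordinal` is a hypothetical helper not present in the original notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers ending in 11, 12 or 13 take 'th' despite their last digit.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Rebuild the same ten column names as in the cell above.
columns = ['District'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], '|', columns[4], '|', columns[10])
```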


In [23]:
# set number of clusters
kclusters = 5

HK_grouped_clustering = HK_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(HK_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 0, 4, 3, 0, 0, 4, 0, 1], dtype=int32)

In [28]:
# add clustering labels
district_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
HK_merged = df

# join the sorted top venues (with cluster labels) onto df, which holds latitude/longitude for each district
HK_merged = HK_merged.join(district_venues_sorted.set_index('District'), on='District')

HK_merged # check the last columns!

Unnamed: 0,District,Chinese,Population,Area km^2,Density /km^2,Region,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,中西區,244600,12.44,19983.92,Hong Kong Island,22.274848,114.148725,0.0,Scenic Lookout,Ice Cream Shop,Coffee Shop,Italian Restaurant,Playground,Pizza Place,Noodle House,Seafood Restaurant,Fast Food Restaurant,Monument / Landmark
1,Eastern,東區,574500,18.56,31217.67,Hong Kong Island,22.273078,114.233594,2.0,Historic Site,Castle,Bus Stop,Yoshoku Restaurant,Fried Chicken Joint,Donburi Restaurant,Dongbei Restaurant,Drugstore,Dumpling Restaurant,Electronics Store
2,Southern,南區,269200,38.85,6962.68,Hong Kong Island,22.219263,114.22523,,,,,,,,,,,
3,Wan Chai,灣仔區,150900,9.83,15300.1,Hong Kong Island,22.273947,114.181749,0.0,Hotel,Szechuan Restaurant,Restaurant,Chinese Restaurant,Cantonese Restaurant,Sports Club,Sushi Restaurant,Ramen Restaurant,Pizza Place,Coffee Shop
4,Sham Shui Po,深水埗區,390600,9.35,41529.41,Kowloon,22.331254,114.159321,0.0,Noodle House,Dessert Shop,Chinese Restaurant,Shopping Mall,Fast Food Restaurant,Ramen Restaurant,Dumpling Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Snack Place
5,Kowloon City,九龍城區,405400,10.02,40194.7,Kowloon,22.3218,114.188594,4.0,Chinese Restaurant,Fast Food Restaurant,Café,Hong Kong Restaurant,Noodle House,Park,Seafood Restaurant,Hostel,Hotel,Art Gallery
6,Kwun Tong,觀塘區,641100,11.27,56779.05,Kowloon,22.308649,114.227661,0.0,Coffee Shop,Chinese Restaurant,Café,Fast Food Restaurant,Japanese Restaurant,Shopping Mall,Bus Station,Hong Kong Restaurant,Hotpot Restaurant,Climbing Gym
7,Wong Tai Sin,黃大仙區,426200,9.3,45645.16,Kowloon,22.344322,114.20215,0.0,Shopping Mall,Convenience Store,Cantonese Restaurant,Taiwanese Restaurant,Multiplex,Cha Chaan Teng,Chinese Restaurant,Park,Pizza Place,Japanese Curry Restaurant
8,Yau Tsim Mong,油尖旺區,318100,6.99,44864.09,Kowloon,22.307404,114.165526,0.0,Chinese Restaurant,Sandwich Place,Indian Restaurant,Café,Dessert Shop,Clothing Store,Bookstore,Park,Seafood Restaurant,Shaanxi Restaurant
9,Islands,離島區,146900,175.12,825.14,New Territories,35.736156,139.714222,0.0,Ramen Restaurant,Music Store,Chinese Restaurant,Convenience Store,Japanese Restaurant,Yoshoku Restaurant,Hobby Shop,Recording Studio,Park,Noodle House


In [29]:
#drop districts with no cluster label (NaN), i.e. those for which no venues were returned
HK_merged=HK_merged.dropna(subset=['Cluster Labels'])
HK_merged

Unnamed: 0,District,Chinese,Population,Area km^2,Density /km^2,Region,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,中西區,244600,12.44,19983.92,Hong Kong Island,22.274848,114.148725,0.0,Scenic Lookout,Ice Cream Shop,Coffee Shop,Italian Restaurant,Playground,Pizza Place,Noodle House,Seafood Restaurant,Fast Food Restaurant,Monument / Landmark
1,Eastern,東區,574500,18.56,31217.67,Hong Kong Island,22.273078,114.233594,2.0,Historic Site,Castle,Bus Stop,Yoshoku Restaurant,Fried Chicken Joint,Donburi Restaurant,Dongbei Restaurant,Drugstore,Dumpling Restaurant,Electronics Store
3,Wan Chai,灣仔區,150900,9.83,15300.1,Hong Kong Island,22.273947,114.181749,0.0,Hotel,Szechuan Restaurant,Restaurant,Chinese Restaurant,Cantonese Restaurant,Sports Club,Sushi Restaurant,Ramen Restaurant,Pizza Place,Coffee Shop
4,Sham Shui Po,深水埗區,390600,9.35,41529.41,Kowloon,22.331254,114.159321,0.0,Noodle House,Dessert Shop,Chinese Restaurant,Shopping Mall,Fast Food Restaurant,Ramen Restaurant,Dumpling Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Snack Place
5,Kowloon City,九龍城區,405400,10.02,40194.7,Kowloon,22.3218,114.188594,4.0,Chinese Restaurant,Fast Food Restaurant,Café,Hong Kong Restaurant,Noodle House,Park,Seafood Restaurant,Hostel,Hotel,Art Gallery
6,Kwun Tong,觀塘區,641100,11.27,56779.05,Kowloon,22.308649,114.227661,0.0,Coffee Shop,Chinese Restaurant,Café,Fast Food Restaurant,Japanese Restaurant,Shopping Mall,Bus Station,Hong Kong Restaurant,Hotpot Restaurant,Climbing Gym
7,Wong Tai Sin,黃大仙區,426200,9.3,45645.16,Kowloon,22.344322,114.20215,0.0,Shopping Mall,Convenience Store,Cantonese Restaurant,Taiwanese Restaurant,Multiplex,Cha Chaan Teng,Chinese Restaurant,Park,Pizza Place,Japanese Curry Restaurant
8,Yau Tsim Mong,油尖旺區,318100,6.99,44864.09,Kowloon,22.307404,114.165526,0.0,Chinese Restaurant,Sandwich Place,Indian Restaurant,Café,Dessert Shop,Clothing Store,Bookstore,Park,Seafood Restaurant,Shaanxi Restaurant
9,Islands,離島區,146900,175.12,825.14,New Territories,35.736156,139.714222,0.0,Ramen Restaurant,Music Store,Chinese Restaurant,Convenience Store,Japanese Restaurant,Yoshoku Restaurant,Hobby Shop,Recording Studio,Park,Noodle House
10,Kwai Tsing,葵青區,507100,23.34,21503.86,New Territories,22.341007,114.104285,3.0,Tunnel,Chinese Restaurant,Café,Yoshoku Restaurant,Fried Chicken Joint,Donburi Restaurant,Dongbei Restaurant,Drugstore,Dumpling Restaurant,Electronics Store


In [30]:
HK_merged['Cluster Labels']=HK_merged['Cluster Labels'].astype(np.int64)
print (HK_merged['Cluster Labels'])

0     0
1     2
3     0
4     0
5     4
6     0
7     0
8     0
9     0
10    3
11    0
13    4
15    1
Name: Cluster Labels, dtype: int64
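
Counting how many districts fall into each cluster gives a quick view of how balanced the clustering is. A minimal sketch using the labels printed above, copied by hand into a list:

```python
from collections import Counter

# Cluster labels copied from the output above, in row order.
labels = [0, 2, 0, 0, 4, 0, 0, 0, 0, 3, 0, 4, 1]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print('Cluster {}: {} districts'.format(cluster, size))
```

Here 8 of the 13 districts share cluster 0, cluster 4 holds two, and clusters 1, 2, and 3 each contain a single district. On the DataFrame itself, `HK_merged['Cluster Labels'].value_counts()` gives the same counts.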


In [31]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(HK_merged['Latitude'], HK_merged['Longitude'], HK_merged['District'], HK_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters