### Introduction/Background:

<p>Hong Kong's Gini coefficient is very high. A successful businessperson's profit for one month can exceed what a poor resident earns in more than three years. The gap between the richest and the poorest is widening, and it shows not only in salary but also in lifestyle, education and living conditions. In Hong Kong, the district a person lives in can be one of the markers of social status.</p>
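
For readers unfamiliar with the measure, the sketch below shows one common way to compute a Gini coefficient from a list of incomes. The function name `gini` and the sample incomes are illustrative assumptions, not figures from the Hong Kong census.

```python
import numpy as np

def gini(incomes):
    """Gini coefficient of non-negative incomes (hypothetical helper)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Rank-based formula: G = 2*sum(i*x_i) / (n*sum(x)) - (n+1)/n
    return float(2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n)

equal = gini([10, 10, 10, 10])   # everyone earns the same
unequal = gini([0, 0, 0, 1])     # one person earns everything
print(equal, unequal)
```

A value of 0 means perfectly equal incomes, and the value rises towards 1 as income concentrates in fewer hands.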






### Business Problem


<p>For a businessperson, location is one of the critical factors for success. Because richer residents tend to spend more and poorer residents less, a businessperson wants to know which kinds of business suit richer districts and which suit poorer ones, so as to earn more.</p>

### Data

The 18 districts of Hong Kong:

https://en.wikipedia.org/wiki/Districts_of_Hong_Kong

Latitude and longitude of each district:
Obtained with the Nominatim geocoder (geopy) in the code below; venue data come from Foursquare

Income of residents in each district:

https://www.censtatd.gov.hk/fd.jsp?file=B11303012018AN18B0100.pdf&product_id=B1130301&lang=2

### Import necessary libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#conda install -c conda-forge geopy   # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## Access the Hong Kong districts dataset from Wikipedia

In [2]:
url = "https://en.wikipedia.org/wiki/Districts_of_Hong_Kong"

In [3]:
df = pd.read_html(url,match="District")
df = df[2]
df

Unnamed: 0,District,Chinese,Population [6],Area(km²),Density(/km²),Region
0,Central and Western,中西區,244600,12.44,19983.92,Hong Kong Island
1,Eastern,東區,574500,18.56,31217.67,Hong Kong Island
2,Southern,南區,269200,38.85,6962.68,Hong Kong Island
3,Wan Chai,灣仔區,150900,9.83,15300.1,Hong Kong Island
4,Sham Shui Po,深水埗區,390600,9.35,41529.41,Kowloon
5,Kowloon City,九龍城區,405400,10.02,40194.7,Kowloon
6,Kwun Tong,觀塘區,641100,11.27,56779.05,Kowloon
7,Wong Tai Sin,黃大仙區,426200,9.3,45645.16,Kowloon
8,Yau Tsim Mong,油尖旺區,318100,6.99,44864.09,Kowloon
9,Islands,離島區,146900,175.12,825.14,New Territories


## Finding the latitude and longitude of those districts

In [4]:
lat=[]
lon=[]
# create the geocoder once; Nominatim requires user_agent to name the application
geolocator = Nominatim(user_agent="capstone")
for i in df['District']:
    # get the latitude and longitude of each district
    location = geolocator.geocode(i+" District, HK")
    latitude = location.latitude
    longitude = location.longitude
    print('The geographical',location,'are {}, {}.'.format(latitude, longitude))
    lat.append(latitude)
    lon.append(longitude)

df['latitude']=lat
df['longitude']=lon

The geographical 中西區 Central and Western District, HK, China 中国 are 22.27484785, 114.148724944187.
The geographical 東區 Eastern District, HK, China 中国 are 22.2730777, 114.233593773416.
The geographical 南區 Southern District, HK, China 中国 are 22.2192627, 114.225229848547.
The geographical 灣仔區 Wan Chai District, HK, China 中国 are 22.27394695, 114.18174874679.
The geographical 深水埗區 Sham Shui Po District, HK, China 中国 are 22.33125395, 114.15932119658.
The geographical 九龍城區 Kowloon City District, HK, China 中国 are 22.32179955, 114.188594186382.
The geographical 觀塘區 Kwun Tong District, HK, China 中国 are 22.3086486, 114.227677478014.
The geographical 黃大仙區 Wong Tai Sin District, HK, China 中国 are 22.34432185, 114.202150264315.
The geographical 油尖旺區 Yau Tsim Mong District, HK, China 中国 are 22.30745265, 114.165503213127.
The geographical 離島區 Islands District, HK, 中国 are 22.22806975, 113.987896201265.
The geographical 葵青區 Kwai Tsing District, HK, 852, China 中国 are 22.3410033, 114.104264338884.
The geographical 北區 
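
The loop above assumes that every query returns a result, but geopy's `geocode` returns `None` when no match is found. A minimal sketch of a more defensive loop follows. The names `geocode_districts` and `fake_geocode` are hypothetical and used only for illustration; in the notebook the `geocode` argument would be `geolocator.geocode`.

```python
import time
from collections import namedtuple

def geocode_districts(names, geocode, pause=0.0):
    """Return (latitudes, longitudes); districts with no match get None."""
    lats, lons = [], []
    for name in names:
        location = geocode(name + " District, HK")
        if location is None:
            # no match: keep the row aligned and leave the coordinates empty
            lats.append(None)
            lons.append(None)
        else:
            lats.append(location.latitude)
            lons.append(location.longitude)
        time.sleep(pause)  # optional pause between requests to a public geocoder
    return lats, lons

# Stand-in for geolocator.geocode so the sketch runs offline (hypothetical)
Point = namedtuple("Point", ["latitude", "longitude"])

def fake_geocode(query):
    known = {"Wan Chai District, HK": Point(22.27394695, 114.18174874679)}
    return known.get(query)

lats, lons = geocode_districts(["Wan Chai", "Nowhere"], fake_geocode)
print(lats, lons)
```

Keeping a `None` placeholder preserves the one-to-one alignment with `df['District']`, so the lists can still be assigned as columns and the missing rows inspected afterwards.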

In [5]:
# rename the columns so that they are easier to access
df.rename(columns={"Population [6]":"Population","latitude":"Latitude","longitude":"Longitude"},inplace=True)
df

Unnamed: 0,District,Chinese,Population,Area(km²),Density(/km²),Region,Latitude,Longitude
0,Central and Western,中西區,244600,12.44,19983.92,Hong Kong Island,22.274848,114.148725
1,Eastern,東區,574500,18.56,31217.67,Hong Kong Island,22.273078,114.233594
2,Southern,南區,269200,38.85,6962.68,Hong Kong Island,22.219263,114.22523
3,Wan Chai,灣仔區,150900,9.83,15300.1,Hong Kong Island,22.273947,114.181749
4,Sham Shui Po,深水埗區,390600,9.35,41529.41,Kowloon,22.331254,114.159321
5,Kowloon City,九龍城區,405400,10.02,40194.7,Kowloon,22.3218,114.188594
6,Kwun Tong,觀塘區,641100,11.27,56779.05,Kowloon,22.308649,114.227677
7,Wong Tai Sin,黃大仙區,426200,9.3,45645.16,Kowloon,22.344322,114.20215
8,Yau Tsim Mong,油尖旺區,318100,6.99,44864.09,Kowloon,22.307453,114.165503
9,Islands,離島區,146900,175.12,825.14,New Territories,22.22807,113.987896


## Plotting a map with districts

In [6]:
# get the geographic location of Hong Kong
address = 'Hong Kong'

geolocator = Nominatim(user_agent="capstone") # Nominatim requires user_agent to be set to your application name so that requests can be limited per application
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hong Kong are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hong Kong are 22.2793278, 114.1628131.


In [7]:
# create map of Hong Kong using latitude and longitude values
map_HongKong = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a marker to the map for each district

for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(map_HongKong)
    
map_HongKong

## Getting information from Foursquare

In [8]:
# Foursquare client ID and secret (replace the placeholders with your own credentials)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum number of venues to return per request
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    # the first column is named 'District' because later cells group and join on it
    nearby_venues.columns = ['District', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
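
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue without a category would raise an `IndexError`. The sketch below parses a hand-written payload with the same nesting as the response accessed above (`response`, `groups`, `items`, `venue`). The helper name `parse_items` and the second sample venue are assumptions for illustration, not real Foursquare output.

```python
import pandas as pd

def parse_items(district, lat, lng, payload):
    """Flatten one explore-style payload into rows (hypothetical helper)."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append({
            "District": district,
            "Neighborhood Latitude": lat,
            "Neighborhood Longitude": lng,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            # fall back to None instead of failing when no category is listed
            "Venue Category": categories[0]["name"] if categories else None,
        })
    return pd.DataFrame(rows)

# Hand-written sample with the same nesting as the response used above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Victoria Peak (太平山)",
               "location": {"lat": 22.27128, "lng": 114.149976},
               "categories": [{"name": "Scenic Lookout"}]}},
    {"venue": {"name": "Uncategorised venue (made up)",
               "location": {"lat": 22.27, "lng": 114.15},
               "categories": []}},
]}]}}

venues = parse_items("Central and Western", 22.274848, 114.148725, sample)
print(venues[["Venue", "Venue Category"]])
```

Rows with a missing category can then be dropped or labelled explicitly before the one-hot encoding step.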

In [19]:
# collect the nearby venues for every district

HongKong_venues = getNearbyVenues(names=df['District'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )




Central and Western
Eastern
Southern
Wan Chai
Sham Shui Po
Kowloon City
Kwun Tong
Wong Tai Sin
Yau Tsim Mong
Islands
Kwai Tsing
North
Sai Kung
Sha Tin
Tai Po
Tsuen Wan
Tuen Mun
Yuen Long


In [43]:
print(HongKong_venues.shape)
HongKong_venues.head(230)


(230, 7)


Unnamed: 0,District,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central and Western,22.274848,114.148725,Victoria Peak (太平山),22.27128,114.149976,Scenic Lookout
1,Central and Western,22.274848,114.148725,The Sky Terrace 428 (凌霄閣摩天臺428),22.271304,114.149991,Scenic Lookout
2,Central and Western,22.274848,114.148725,Hong Kong Trail (Section 1) 港島徑（第一段）,22.272874,114.145895,Trail
3,Central and Western,22.274848,114.148725,Mount Austin Playground (柯士甸山遊樂場),22.272387,114.147578,Playground
4,Central and Western,22.274848,114.148725,Madame Tussauds (杜莎夫人蠟像館),22.2714,114.14997,Art Gallery
5,Central and Western,22.274848,114.148725,The Peak Tower (凌霄閣),22.271307,114.149977,Shopping Mall
6,Central and Western,22.274848,114.148725,Bubba Gump (阿甘蝦餐廳),22.271323,114.150003,Seafood Restaurant
7,Central and Western,22.274848,114.148725,Rajasthan Rifles,22.271008,114.15031,Indian Restaurant
8,Central and Western,22.274848,114.148725,Wildfire Pizzabar & Grill,22.271292,114.150293,Pizza Place
9,Central and Western,22.274848,114.148725,Lion’s Pavilion (太平山獅子亭),22.270935,114.15081,Scenic Lookout


In [41]:
# number of venues returned for each district
HongKong_venues.groupby('District').count()

District,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central and Western,25,25,25,25,25,25
Islands,2,2,2,2,2,2
Kowloon City,18,18,18,18,18,18
Kwai Tsing,5,5,5,5,5,5
Kwun Tong,45,45,45,45,45,45
Sha Tin,12,12,12,12,12,12
Sham Shui Po,56,56,56,56,56,56
Tsuen Wan,4,4,4,4,4,4
Wan Chai,26,26,26,26,26,26
Wong Tai Sin,12,12,12,12,12,12
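
The counts above vary widely, and a frequency profile built from only a handful of venues is fragile. As a rough illustration, the sketch below flags the districts in the table above whose venue count falls below a threshold. The cut-off of 10 is an arbitrary assumption, not a value used in the original analysis.

```python
import pandas as pd

# Venue counts copied from the table above
venue_counts = pd.Series({
    "Central and Western": 25, "Islands": 2, "Kowloon City": 18,
    "Kwai Tsing": 5, "Kwun Tong": 45, "Sha Tin": 12,
    "Sham Shui Po": 56, "Tsuen Wan": 4, "Wan Chai": 26, "Wong Tai Sin": 12,
})

MIN_VENUES = 10  # arbitrary threshold chosen for illustration
sparse = venue_counts[venue_counts < MIN_VENUES].sort_values()
print(sparse)
```

Islands, Tsuen Wan and Kwai Tsing each have five venues or fewer within the 500 m radius, so any profile or cluster assignment for them rests on very little data.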


## Analyzing each district

In [38]:
# one hot encoding (dtype=int keeps the 0/1 indicators shown below)
HK_onehot = pd.get_dummies(HongKong_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add district column back to dataframe
HK_onehot['District'] = HongKong_venues['District']

# move district column to the first column
fixed_columns = [HK_onehot.columns[-1]] + list(HK_onehot.columns[:-1])
HK_onehot = HK_onehot[fixed_columns]

HK_onehot.head()

Unnamed: 0,District,Art Gallery,Art Museum,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach,Beijing Restaurant,Bookstore,Buffet,Burger Joint,Bus Station,Bus Stop,Café,Cantonese Restaurant,Cha Chaan Teng,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Department Store,Dessert Shop,Dim Sum Restaurant,Donburi Restaurant,Dumpling Restaurant,Electronics Store,Fast Food Restaurant,Flea Market,French Restaurant,Gift Shop,Gym,Halal Restaurant,Hobby Shop,Hong Kong Restaurant,Hostel,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Malay Restaurant,Market,Miscellaneous Shop,Mountain,Multiplex,Nightclub,Noodle House,Park,Pizza Place,Playground,Pool,Racecourse,Ramen Restaurant,Record Shop,Restaurant,Rock Climbing Spot,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shaanxi Restaurant,Shanghai Restaurant,Shopping Mall,Snack Place,Soccer Field,Sporting Goods Shop,Sports Bar,Sports Club,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tennis Court,Thai Restaurant,Track,Trail,Train Station,Tram Station,Tunnel,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Central and Western,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Central and Western,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Central and Western,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
3,Central and Western,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Central and Western,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [45]:
HK_grouped = HK_onehot.groupby('District').mean().reset_index()
HK_grouped

Unnamed: 0,District,Art Gallery,Art Museum,Asian Restaurant,BBQ Joint,Bakery,Bar,Beach,Beijing Restaurant,Bookstore,Buffet,Burger Joint,Bus Station,Bus Stop,Café,Cantonese Restaurant,Cha Chaan Teng,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Department Store,Dessert Shop,Dim Sum Restaurant,Donburi Restaurant,Dumpling Restaurant,Electronics Store,Fast Food Restaurant,Flea Market,French Restaurant,Gift Shop,Gym,Halal Restaurant,Hobby Shop,Hong Kong Restaurant,Hostel,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Malay Restaurant,Market,Miscellaneous Shop,Mountain,Multiplex,Nightclub,Noodle House,Park,Pizza Place,Playground,Pool,Racecourse,Ramen Restaurant,Record Shop,Restaurant,Rock Climbing Spot,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shaanxi Restaurant,Shanghai Restaurant,Shopping Mall,Snack Place,Soccer Field,Sporting Goods Shop,Sports Bar,Sports Club,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tennis Court,Thai Restaurant,Track,Trail,Train Station,Tram Station,Tunnel,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Central and Western,0.04,0.04,0.08,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.12,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0
1,Islands,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Kowloon City,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Kwai Tsing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0
4,Kwun Tong,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.022222,0.111111,0.022222,0.022222,0.0,0.088889,0.0,0.022222,0.0,0.0,0.022222,0.022222,0.022222,0.088889,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.022222,0.0,0.022222,0.0,0.022222,0.0,0.0,0.022222,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.044444,0.0,0.022222,0.044444,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0
5,Sha Tin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0
6,Sham Shui Po,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.053571,0.089286,0.0,0.0,0.0,0.0,0.0,0.0,0.107143,0.0,0.0,0.035714,0.017857,0.035714,0.017857,0.0,0.0,0.0,0.0,0.017857,0.017857,0.0,0.017857,0.017857,0.0,0.017857,0.017857,0.035714,0.0,0.017857,0.017857,0.0,0.0,0.0,0.0,0.178571,0.0,0.017857,0.0,0.017857,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053571,0.035714,0.0,0.0,0.0,0.0,0.0,0.017857,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.017857
7,Tsuen Wan,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Wan Chai,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.076923,0.0,0.0,0.038462,0.076923,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.038462,0.038462,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.038462,0.0,0.0,0.038462,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.076923,0.076923,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0
9,Wong Tai Sin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333
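
To see what the two steps above compute, here is a toy version with made-up data: one-hot encoding turns each venue into a row of indicator columns, and the per-district mean of those indicators is the share of that district's venues in each category. The district names and categories below are invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "District": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Hotel", "Park", "Park"],
})

# one indicator column per category, one row per venue
onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "District", toy["District"])

# mean of the indicators = share of venues in each category
shares = onehot.groupby("District").mean().reset_index()
print(shares)
```

Each row of shares sums to 1, which is why districts with very few venues show large values such as 0.5 in the table above.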


In [47]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [49]:
# return the top 10 venues
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
hk_venues_sorted = pd.DataFrame(columns=columns)
hk_venues_sorted['District'] = HK_grouped['District']

for ind in np.arange(HK_grouped.shape[0]):
    hk_venues_sorted.iloc[ind, 1:] = return_most_common_venues(HK_grouped.iloc[ind, :], num_top_venues)

hk_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,Scenic Lookout,Asian Restaurant,Ice Cream Shop,Art Gallery,Indian Restaurant,Coffee Shop,Dessert Shop,Restaurant,Sandwich Place,Seafood Restaurant
1,Islands,Rock Climbing Spot,Mountain,Vietnamese Restaurant,French Restaurant,Dessert Shop,Dim Sum Restaurant,Donburi Restaurant,Dumpling Restaurant,Electronics Store,Fast Food Restaurant
2,Kowloon City,Fast Food Restaurant,Noodle House,Hong Kong Restaurant,Chinese Restaurant,Café,Park,Dessert Shop,Seafood Restaurant,Art Gallery,Tennis Court
3,Kwai Tsing,Tunnel,Chinese Restaurant,Taiwanese Restaurant,Bus Station,French Restaurant,Vietnamese Restaurant,Flea Market,Dessert Shop,Dim Sum Restaurant,Donburi Restaurant
4,Kwun Tong,Chinese Restaurant,Fast Food Restaurant,Coffee Shop,Café,Shopping Mall,Sporting Goods Shop,Art Gallery,Multiplex,Climbing Gym,Clothing Store
5,Sha Tin,Cantonese Restaurant,Hotel,Convenience Store,Train Station,Japanese Restaurant,Hong Kong Restaurant,Market,Ramen Restaurant,Buffet,Chinese Restaurant
6,Sham Shui Po,Noodle House,Dessert Shop,Chinese Restaurant,Cha Chaan Teng,Shopping Mall,Dumpling Restaurant,Fast Food Restaurant,Italian Restaurant,Ramen Restaurant,Snack Place
7,Tsuen Wan,Hotel,Beach,Buffet,Café,Vietnamese Restaurant,French Restaurant,Dim Sum Restaurant,Donburi Restaurant,Dumpling Restaurant,Electronics Store
8,Wan Chai,Sushi Restaurant,Chinese Restaurant,Cantonese Restaurant,Coffee Shop,Szechuan Restaurant,Shanghai Restaurant,Dessert Shop,Restaurant,Cocktail Bar,Sports Club
9,Wong Tai Sin,Fast Food Restaurant,Vietnamese Restaurant,Multiplex,Coffee Shop,Chinese Restaurant,Shopping Mall,Cantonese Restaurant,Bus Stop,Park,Noodle House
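
The column labels above come from a three-element suffix list with a fallback to `th`, which is correct for ten columns but would produce `21th` if more than twenty were requested. A small sketch of a general suffix helper follows; the name `ordinal` is hypothetical and not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st' (hypothetical helper)."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["District"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```

For `num_top_venues = 10` this yields the same labels as the cell above.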


## Clustering districts

In [50]:
# set number of clusters
kclusters = 3

HK_grouped_clustering = HK_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(HK_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 0, 0, 0, 0, 0, 2, 0, 0], dtype=int32)
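
Before mapping the clusters it helps to check how many districts fall into each one. The sketch below counts the ten labels displayed above; it uses only NumPy and copies the label array from the output rather than re-running k-means.

```python
import numpy as np

# Labels copied from the kmeans.labels_ output above
labels = np.array([0, 1, 0, 0, 0, 0, 0, 2, 0, 0])

cluster_sizes = np.bincount(labels, minlength=3)
for cluster, size in enumerate(cluster_sizes):
    print(f"cluster {cluster}: {size} district(s)")
```

Eight of the ten labels shown belong to cluster 0 and the other two clusters hold one district each, so the grouping is dominated by a single cluster.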

In [54]:
# add clustering labels (skipped if this cell has already been run)
if 'Cluster Labels' not in hk_venues_sorted.columns:
    hk_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

HK_merged = df

# join the district data with the sorted venues to add the cluster label and top venues for each district
HK_merged = HK_merged.join(hk_venues_sorted.set_index('District'), on='District')

In [55]:
HK_merged

Unnamed: 0,District,Chinese,Population,Area(km²),Density(/km²),Region,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,中西區,244600,12.44,19983.92,Hong Kong Island,22.274848,114.148725,0.0,Scenic Lookout,Asian Restaurant,Ice Cream Shop,Art Gallery,Indian Restaurant,Coffee Shop,Dessert Shop,Restaurant,Sandwich Place,Seafood Restaurant
1,Eastern,東區,574500,18.56,31217.67,Hong Kong Island,22.273078,114.233594,,,,,,,,,,,
2,Southern,南區,269200,38.85,6962.68,Hong Kong Island,22.219263,114.22523,,,,,,,,,,,
3,Wan Chai,灣仔區,150900,9.83,15300.1,Hong Kong Island,22.273947,114.181749,0.0,Sushi Restaurant,Chinese Restaurant,Cantonese Restaurant,Coffee Shop,Szechuan Restaurant,Shanghai Restaurant,Dessert Shop,Restaurant,Cocktail Bar,Sports Club
4,Sham Shui Po,深水埗區,390600,9.35,41529.41,Kowloon,22.331254,114.159321,0.0,Noodle House,Dessert Shop,Chinese Restaurant,Cha Chaan Teng,Shopping Mall,Dumpling Restaurant,Fast Food Restaurant,Italian Restaurant,Ramen Restaurant,Snack Place
5,Kowloon City,九龍城區,405400,10.02,40194.7,Kowloon,22.3218,114.188594,0.0,Fast Food Restaurant,Noodle House,Hong Kong Restaurant,Chinese Restaurant,Café,Park,Dessert Shop,Seafood Restaurant,Art Gallery,Tennis Court
6,Kwun Tong,觀塘區,641100,11.27,56779.05,Kowloon,22.308649,114.227677,0.0,Chinese Restaurant,Fast Food Restaurant,Coffee Shop,Café,Shopping Mall,Sporting Goods Shop,Art Gallery,Multiplex,Climbing Gym,Clothing Store
7,Wong Tai Sin,黃大仙區,426200,9.3,45645.16,Kowloon,22.344322,114.20215,0.0,Fast Food Restaurant,Vietnamese Restaurant,Multiplex,Coffee Shop,Chinese Restaurant,Shopping Mall,Cantonese Restaurant,Bus Stop,Park,Noodle House
8,Yau Tsim Mong,油尖旺區,318100,6.99,44864.09,Kowloon,22.307453,114.165503,0.0,Chinese Restaurant,Indian Restaurant,Dessert Shop,Sandwich Place,Café,Snack Place,Hotpot Restaurant,Japanese Restaurant,Market,Multiplex
9,Islands,離島區,146900,175.12,825.14,New Territories,22.22807,113.987896,1.0,Rock Climbing Spot,Mountain,Vietnamese Restaurant,French Restaurant,Dessert Shop,Dim Sum Restaurant,Donburi Restaurant,Dumpling Restaurant,Electronics Store,Fast Food Restaurant


In [61]:
# drop the districts excluded from the analysis; errors='ignore' lets the cell be re-run safely
HK_merged = HK_merged.drop(index=[1, 2, 11, 12, 14, 16, 17], errors='ignore')
HK_merged

Unnamed: 0,District,Chinese,Population,Area(km²),Density(/km²),Region,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,中西區,244600,12.44,19983.92,Hong Kong Island,22.274848,114.148725,0.0,Scenic Lookout,Asian Restaurant,Ice Cream Shop,Art Gallery,Indian Restaurant,Coffee Shop,Dessert Shop,Restaurant,Sandwich Place,Seafood Restaurant
3,Wan Chai,灣仔區,150900,9.83,15300.1,Hong Kong Island,22.273947,114.181749,0.0,Sushi Restaurant,Chinese Restaurant,Cantonese Restaurant,Coffee Shop,Szechuan Restaurant,Shanghai Restaurant,Dessert Shop,Restaurant,Cocktail Bar,Sports Club
4,Sham Shui Po,深水埗區,390600,9.35,41529.41,Kowloon,22.331254,114.159321,0.0,Noodle House,Dessert Shop,Chinese Restaurant,Cha Chaan Teng,Shopping Mall,Dumpling Restaurant,Fast Food Restaurant,Italian Restaurant,Ramen Restaurant,Snack Place
5,Kowloon City,九龍城區,405400,10.02,40194.7,Kowloon,22.3218,114.188594,0.0,Fast Food Restaurant,Noodle House,Hong Kong Restaurant,Chinese Restaurant,Café,Park,Dessert Shop,Seafood Restaurant,Art Gallery,Tennis Court
6,Kwun Tong,觀塘區,641100,11.27,56779.05,Kowloon,22.308649,114.227677,0.0,Chinese Restaurant,Fast Food Restaurant,Coffee Shop,Café,Shopping Mall,Sporting Goods Shop,Art Gallery,Multiplex,Climbing Gym,Clothing Store
7,Wong Tai Sin,黃大仙區,426200,9.3,45645.16,Kowloon,22.344322,114.20215,0.0,Fast Food Restaurant,Vietnamese Restaurant,Multiplex,Coffee Shop,Chinese Restaurant,Shopping Mall,Cantonese Restaurant,Bus Stop,Park,Noodle House
8,Yau Tsim Mong,油尖旺區,318100,6.99,44864.09,Kowloon,22.307453,114.165503,0.0,Chinese Restaurant,Indian Restaurant,Dessert Shop,Sandwich Place,Café,Snack Place,Hotpot Restaurant,Japanese Restaurant,Market,Multiplex
9,Islands,離島區,146900,175.12,825.14,New Territories,22.22807,113.987896,1.0,Rock Climbing Spot,Mountain,Vietnamese Restaurant,French Restaurant,Dessert Shop,Dim Sum Restaurant,Donburi Restaurant,Dumpling Restaurant,Electronics Store,Fast Food Restaurant
10,Kwai Tsing,葵青區,507100,23.34,21503.86,New Territories,22.341003,114.104264,0.0,Tunnel,Chinese Restaurant,Taiwanese Restaurant,Bus Station,French Restaurant,Vietnamese Restaurant,Flea Market,Dessert Shop,Dim Sum Restaurant,Donburi Restaurant
13,Sha Tin,沙田區,648200,68.71,9433.85,New Territories,22.391573,114.208098,0.0,Cantonese Restaurant,Hotel,Convenience Store,Train Station,Japanese Restaurant,Hong Kong Restaurant,Market,Ramen Restaurant,Buffet,Chinese Restaurant


In [65]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(HK_merged['Latitude'], HK_merged['Longitude'], HK_merged['District'], HK_merged['Cluster Labels']):
    cluster = int(cluster) # the labels became floats after the join
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1], # cluster 0 wraps around to the last color
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Report

Most districts in Hong Kong fall into the same cluster; the exceptions lie outside Kowloon and Hong Kong Island. The result suggests that districts have similar facilities and shops regardless of whether their residents' income is low or high. The one factor that does appear to change the result is whether or not a district is in the downtown area.

The outcome is surprising, as people may think that higher-income districts have more entertainment facilities while lower-income districts have more restaurants and supermarkets.
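
The first observation can be checked with a cross-tabulation of region against cluster label. The sketch below uses only the ten rows visible in the merged table above, so it is an illustration over that subset rather than a full re-analysis.

```python
import pandas as pd

# (District, Region, Cluster Labels) copied from the merged table above
rows = [
    ("Central and Western", "Hong Kong Island", 0),
    ("Wan Chai", "Hong Kong Island", 0),
    ("Sham Shui Po", "Kowloon", 0),
    ("Kowloon City", "Kowloon", 0),
    ("Kwun Tong", "Kowloon", 0),
    ("Wong Tai Sin", "Kowloon", 0),
    ("Yau Tsim Mong", "Kowloon", 0),
    ("Islands", "New Territories", 1),
    ("Kwai Tsing", "New Territories", 0),
    ("Sha Tin", "New Territories", 0),
]
visible = pd.DataFrame(rows, columns=["District", "Region", "Cluster Labels"])

# rows: region, columns: cluster label, cells: number of districts
region_by_cluster = pd.crosstab(visible["Region"], visible["Cluster Labels"])
print(region_by_cluster)
```

In this subset every Hong Kong Island and Kowloon district is in cluster 0, and the only district outside it is in the New Territories.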