# Capstone Project - Best District for New Coffee Shop in Hong Kong

### Applied Data Science Capstone by IBM/Coursera

## 1. Introduction


### 1.1 Background

Hong Kong is a major gateway to international markets because it scores highly on several factors, including its strategic location, productive workforce, attractive tax regime, world-class infrastructure, and effective legal system. Many investors and entrepreneurs have chosen to set up their businesses in Hong Kong. Although Hong Kong's land area is small, its population is comparatively large, which creates substantial business opportunities.

### 1.2 Problem

Hong Kong is small, only around 1,100 km², and most of its landscape consists of steep, undeveloped mountains and hills, which explains why land for development is limited. Officially, there are 18 districts in Hong Kong. The first question for a startup in Hong Kong is where to locate the shop. For a new coffee shop, the question is which district to choose, given the business opportunities and the competition in each.

### 1.3 Stakeholders

The quantitative analysis aims to give potential investors and startup entrepreneurs, especially those interested in opening a new coffee shop, a guide for analyzing this problem scientifically. Supplementary information, such as rental prices of candidate retail units and nearby community facilities, is needed for a more thorough assessment. In addition, government authorities can refer to the analysis to better understand the city's cultural diversity.

## 2. Data


The analysis to find the best districts for new coffee shops is based on the following aspects:

* number of existing coffee shops in the districts;
* population density in the districts.
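
One way the two aspects might be combined is a simple score: population density divided by the number of existing coffee venues. The sketch below is only an illustration of that idea. The scoring function `demand_score` is an assumption introduced here, not a method taken from this analysis, and the coffee-venue counts are made-up placeholders rather than measured values. The densities are the figures from the district table in this notebook.

```python
# Densities (/km²) come from the district table in this notebook.
# The coffee_venues counts are HYPOTHETICAL placeholders, not measured data.
districts = {
    "Central and Western": {"density": 19983.92, "coffee_venues": 9},
    "Kwun Tong": {"density": 56779.05, "coffee_venues": 6},
    "Islands": {"density": 825.14, "coffee_venues": 0},
}


def demand_score(density, coffee_venues):
    """Density per existing coffee venue (assumed score).

    The +1 avoids division by zero for districts with no venues found.
    """
    return density / (coffee_venues + 1)


# Rank districts from the highest score (dense, little competition) downward.
ranking = sorted(
    districts,
    key=lambda name: demand_score(**districts[name]),
    reverse=True,
)
print(ranking)
```

A higher score suggests more potential customers per competing venue under these assumptions; any real ranking would need actual venue counts and probably further factors such as rent.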

The sources of data are the following to achieve their respective aims:

* **Wikipedia**: To obtain the district data, including names of regions, names of districts, population density;
* **OpenCage Geocoder API**: To look up the latitudes and longitudes of all districts;
* **Foursquare API**: To obtain the number of coffee shops, their types and locations in all districts.

Python offers several web-scraping libraries and packages. To scrape the table from Wikipedia, `pandas` is used to read the table directly into a dataframe. Then a free API, OpenCage Geocoder, is used to find the latitude-longitude coordinates for the list of districts in Hong Kong.

### Scraping District Data (Names of Regions, Names of Districts, Population Density)

Before scraping and exploring the data, all required dependencies should be installed and imported.

In [None]:
import pandas as pd
!pip install lxml

!pip install opencage
from opencage.geocoder import OpenCageGeocode

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

!conda install -c conda-forge folium=0.5.0 --yes
import folium

import requests

import numpy as np

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

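try:
    from pandas import json_normalize  # top-level location since pandas 1.0
except ImportError:
    from pandas.io.json import json_normalize  # older pandas; deprecated since 1.0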

print('Libraries imported.')

Next, `pandas` reads the tables on the Wikipedia page, and a `for` loop locates the district table by checking that its first column heading is 'District'. One column is renamed, and a dataframe with the columns Region, District, Population, Area(km²), and Density(/km²) is built.

In [2]:
tables = pd.read_html('https://en.wikipedia.org/wiki/Districts_of_Hong_Kong', header=0)

headings = ['District']

for table in tables:
    current_headings = table.columns.values[:1]
    if len(current_headings) != len(headings):
        continue
    if all(current_headings == headings):
        break

df = table.rename(columns={"Population[when?] [6]":"Population",})
df = df[['Region','District','Population','Area(km²)','Density(/km²)']]

df

Unnamed: 0,Region,District,Population,Area(km²),Density(/km²)
0,Hong Kong Island,Central and Western,244600,12.44,19983.92
1,Hong Kong Island,Eastern,574500,18.56,31217.67
2,Hong Kong Island,Southern,269200,38.85,6962.68
3,Hong Kong Island,Wan Chai,150900,9.83,15300.1
4,Kowloon,Sham Shui Po,390600,9.35,41529.41
5,Kowloon,Kowloon City,405400,10.02,40194.7
6,Kowloon,Kwun Tong,641100,11.27,56779.05
7,Kowloon,Wong Tai Sin,426200,9.3,45645.16
8,Kowloon,Yau Tsim Mong,318100,6.99,44864.09
9,New Territories,Islands,146900,175.12,825.14


To use the Foursquare location data, the latitude and longitude of each district are needed. OpenCage Geocoder, a free API that looks up the coordinates of a place and can also find the place that a set of coordinates corresponds to, is used to obtain them.

In [3]:
#Geocoding Tutorial from Amaral Lab: https://amaral.northwestern.edu/blog/getting-long-lat-list-cities

key = '1cfb1dbb86d54891a7c74a57c4761949'
geocoder = OpenCageGeocode(key)

In [4]:
list_lat = []
list_long = []

for index, row in df.iterrows():
    
    District = row['District']
    Region = row['Region']       
    query = str(District)+','+str(Region)
    
    geo_results = geocoder.geocode(query)   
    district_lat = geo_results[0]['geometry']['lat']
    district_long = geo_results[0]['geometry']['lng']
    
    list_lat.append(district_lat)
    list_long.append(district_long)

df['Latitude'] = list_lat
df['Longitude'] = list_long

df

Unnamed: 0,Region,District,Population,Area(km²),Density(/km²),Latitude,Longitude
0,Hong Kong Island,Central and Western,244600,12.44,19983.92,22.281938,114.158077
1,Hong Kong Island,Eastern,574500,18.56,31217.67,22.273078,114.233594
2,Hong Kong Island,Southern,269200,38.85,6962.68,22.244541,114.205376
3,Hong Kong Island,Wan Chai,150900,9.83,15300.1,22.279015,114.172483
4,Kowloon,Sham Shui Po,390600,9.35,41529.41,22.32819,114.160854
5,Kowloon,Kowloon City,405400,10.02,40194.7,22.33016,114.189937
6,Kowloon,Kwun Tong,641100,11.27,56779.05,22.312937,114.22561
7,Kowloon,Wong Tai Sin,426200,9.3,45645.16,22.341654,114.193859
8,Kowloon,Yau Tsim Mong,318100,6.99,44864.09,22.302857,114.182032
9,New Territories,Islands,146900,175.12,825.14,22.230076,113.986785


In [5]:
print('The dataframe has {} regions and {} districts.'.format(
        len(df['Region'].unique()),
        df.shape[0]
    )
)

The dataframe has 3 regions and 18 districts.
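
A geocoder can return a match outside the intended territory when a query is ambiguous. In the cluster tables later in this notebook, the North district appears with coordinates 40.722105, -73.988081, which lie outside Hong Kong. The following is a minimal sketch of a sanity check, assuming a rough bounding box for Hong Kong. The box limits and the helper name `in_hong_kong` are assumptions made for illustration; the coordinates are copied from this notebook's outputs.

```python
# Approximate bounding box for Hong Kong (assumed, deliberately generous).
HK_LAT_RANGE = (22.1, 22.6)
HK_LNG_RANGE = (113.8, 114.5)


def in_hong_kong(lat, lng):
    """Return True if the point falls inside the assumed bounding box."""
    lat_ok = HK_LAT_RANGE[0] <= lat <= HK_LAT_RANGE[1]
    lng_ok = HK_LNG_RANGE[0] <= lng <= HK_LNG_RANGE[1]
    return lat_ok and lng_ok


# Coordinates as they appear in the notebook outputs.
geocoded = {
    "Central and Western": (22.281938, 114.158077),
    "Islands": (22.230076, 113.986785),
    "North": (40.722105, -73.988081),
}

# Collect the districts whose coordinates fall outside the box.
suspect = [name for name, (lat, lng) in geocoded.items() if not in_hong_kong(lat, lng)]
print(suspect)  # ['North']
```

Any district flagged this way would need to be geocoded again, for example with a more specific query string that includes "Hong Kong", before its venues are requested.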


## 3. Methodology


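To achieve the aim of the study, venues within a 500 m radius of each district's geocoded coordinates are retrieved from the Foursquare API, one-hot encoded by venue category, and averaged per district to give category frequencies. The ten most common venue categories are listed for each district, and the districts are then grouped into six clusters with k-means and plotted on a map.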

## 4. Analysis

In [6]:
address = 'Hong Kong'

geolocator = Nominatim(user_agent="hk_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hong Kong are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hong Kong are 22.2793278, 114.1628131.


In [7]:
# create map of Hong Kong using latitude and longitude values
map_hongkong = folium.Map(location=[latitude+0.08, longitude],zoom_start=11)

# add markers to map
for lat, lng, region, district in zip(df['Latitude'], df['Longitude'], df['Region'], df['District']):
    label = '{}, {}'.format(district, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hongkong)  

map_hongkong

In [8]:
CLIENT_ID = '1FZMJSXAV4VP3X0THT2EI1SF0EA5YH05T3JKYV0YDS5BAOVJ' # your Foursquare ID
CLIENT_SECRET = 'GD3V0SHKWYOQKA0XR1WF3CCOOIRXMTIIQ3LSA4ASYSCJ5WTR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 1FZMJSXAV4VP3X0THT2EI1SF0EA5YH05T3JKYV0YDS5BAOVJ
CLIENT_SECRET:GD3V0SHKWYOQKA0XR1WF3CCOOIRXMTIIQ3LSA4ASYSCJ5WTR


In [9]:
df.loc[0, 'District']

'Central and Western'

In [10]:
district_latitude = df.loc[0, 'Latitude']
district_longitude = df.loc[0, 'Longitude']

district_name = df.loc[0, 'District']

print('Latitude and longitude values of {} are {}, {}.'.format(district_name, 
                                                               district_latitude, 
                                                               district_longitude))

Latitude and longitude values of Central and Western are 22.2819378, 114.1580765.


In [11]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    district_latitude, 
    district_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=1FZMJSXAV4VP3X0THT2EI1SF0EA5YH05T3JKYV0YDS5BAOVJ&client_secret=GD3V0SHKWYOQKA0XR1WF3CCOOIRXMTIIQ3LSA4ASYSCJ5WTR&v=20180605&ll=22.2819378,114.1580765&radius=500&limit=100'
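
As an alternative to assembling the query string with `str.format`, the same URL can be built from a dictionary of parameters with the standard library, which takes care of escaping. This is a minimal sketch: the helper name `build_explore_url` and the placeholder credentials are assumptions for illustration, while the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build an explore request URL from named parameters."""
    base = "https://api.foursquare.com/v2/venues/explore"
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude,longitude as one value
        "radius": radius,      # metres
        "limit": limit,
    }
    return f"{base}?{urlencode(params)}"


# Placeholder credentials; real values would be supplied by the user.
example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605", 22.2819378, 114.1580765
)

# Parse the query string back to confirm the parameters round-trip.
query = parse_qs(urlsplit(example_url).query)
print(query["ll"], query["radius"], query["limit"])
```

Keeping the parameters in a dictionary also makes it easier to change one value, such as the radius, without editing a long format string.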

In [None]:
url_results = requests.get(url).json()
url_results

In [13]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [14]:
venues = url_results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Mott 32 (卅二公館),Dim Sum Restaurant,22.280696,114.15938
1,Mandarin Oriental Hong Kong (香港文華東方酒店),Hotel,22.281857,114.159382
2,Mandarin Grill + Bar (文華扒房＋酒吧),Steakhouse,22.281928,114.159408
3,The Mandarin Cake Shop,Bakery,22.281959,114.159416
4,XYZ,Cycle Studio,22.280877,114.157108


In [15]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        url_results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in url_results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
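
The function above indexes `['groups'][0]` and `['categories'][0]` directly, so a response with no groups or a venue with no category would raise an error. The following is a hedged sketch of a more defensive extraction step. The helper name `extract_venues` and the sample payload are hypothetical; the nesting of keys mirrors what the function above reads from the response.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload.

    Missing groups yield an empty list; a venue without categories gets None.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        location = venue["location"]
        rows.append((venue["name"], location["lat"], location["lng"], category))
    return rows


# Hypothetical payload shaped like the JSON the notebook indexes into.
sample = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Shop A",
                               "location": {"lat": 22.28, "lng": 114.15},
                               "categories": [{"name": "Coffee Shop"}]}},
                    {"venue": {"name": "Shop B",
                               "location": {"lat": 22.29, "lng": 114.16},
                               "categories": []}},
                ]
            }
        ]
    }
}

rows = extract_venues(sample)
empty_rows = extract_venues({"response": {}})
print(rows)
print(empty_rows)
```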

In [17]:
district_venues = getNearbyVenues(names=df['District'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Central and Western
Eastern
Southern
Wan Chai
Sham Shui Po
Kowloon City
Kwun Tong
Wong Tai Sin
Yau Tsim Mong
Islands
Kwai Tsing
North
Sai Kung
Sha Tin
Tai Po
Tsuen Wan
Tuen Mun
Yuen Long


In [18]:
print(district_venues.shape)
district_venues.head()

(743, 7)


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central and Western,22.281938,114.158077,Mott 32 (卅二公館),22.280696,114.15938,Dim Sum Restaurant
1,Central and Western,22.281938,114.158077,Mandarin Oriental Hong Kong (香港文華東方酒店),22.281857,114.159382,Hotel
2,Central and Western,22.281938,114.158077,Mandarin Grill + Bar (文華扒房＋酒吧),22.281928,114.159408,Steakhouse
3,Central and Western,22.281938,114.158077,The Mandarin Cake Shop,22.281959,114.159416,Bakery
4,Central and Western,22.281938,114.158077,XYZ,22.280877,114.157108,Cycle Studio


In [19]:
district_venues.groupby('District').count()

District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central and Western,100,100,100,100,100,100
Eastern,1,1,1,1,1,1
Islands,2,2,2,2,2,2
Kowloon City,67,67,67,67,67,67
Kwun Tong,66,66,66,66,66,66
North,100,100,100,100,100,100
Sai Kung,51,51,51,51,51,51
Sha Tin,54,54,54,54,54,54
Sham Shui Po,32,32,32,32,32,32
Southern,1,1,1,1,1,1


In [20]:
print('There are {} unique categories.'.format(len(district_venues['Venue Category'].unique())))

There are 157 unique categories.


In [21]:
# one hot encoding
district_onehot = pd.get_dummies(district_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
district_onehot['District'] = district_venues['District'] 

# move district column to the first column
fixed_columns = [district_onehot.columns[-1]] + list(district_onehot.columns[:-1])
district_onehot = district_onehot[fixed_columns]

district_onehot.head()

Unnamed: 0,District,Airport Service,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Astrologer,Australian Restaurant,BBQ Joint,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio,Yunnan Restaurant,Zoo
0,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Central and Western,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
district_onehot.shape

(743, 158)

In [23]:
district_grouped = district_onehot.groupby('District').mean().reset_index()
district_grouped

Unnamed: 0,District,Airport Service,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Astrologer,Australian Restaurant,BBQ Joint,...,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio,Yunnan Restaurant,Zoo
0,Central and Western,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.02,...,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0
1,Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Islands,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Kowloon City,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,...,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.0,0.0
4,Kwun Tong,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.015152,...,0.0,0.0,0.0,0.015152,0.0,0.0,0.0,0.0,0.0,0.0
5,North,0.0,0.02,0.01,0.03,0.0,0.02,0.0,0.01,0.0,...,0.0,0.0,0.01,0.01,0.01,0.01,0.02,0.01,0.0,0.0
6,Sai Kung,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,...,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.0,0.0,0.0
7,Sha Tin,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,...,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0
8,Sham Shui Po,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,...,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0
9,Southern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [24]:
district_grouped.shape

(17, 158)
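
Taking the mean of one-hot columns per district gives the share of that district's returned venues falling in each category. The plain-Python sketch below reproduces that calculation on a few made-up rows to show what the numbers mean. The district names and venue rows are hypothetical, and `collections.Counter` is used here instead of `pandas` only to keep the example self-contained.

```python
from collections import Counter, defaultdict

# Hypothetical (district, venue category) rows, one per returned venue.
venue_rows = [
    ("District A", "Coffee Shop"),
    ("District A", "Café"),
    ("District A", "Coffee Shop"),
    ("District A", "Hotel"),
    ("District B", "Bus Stop"),
]

# Count venues per category within each district.
counts = defaultdict(Counter)
for district, category in venue_rows:
    counts[district][category] += 1

# Divide by the district total: equivalent to the mean of one-hot columns.
frequencies = {
    district: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for district, counter in counts.items()
}

print(frequencies["District A"]["Coffee Shop"])  # 0.5
print(frequencies["District B"])                 # {'Bus Stop': 1.0}
```

Two consequences follow. A district with a single returned venue gets a frequency of 1.0 for that one category, so its profile is not very informative. A district with no returned venues does not appear at all, which is consistent with the grouped dataframe having 17 rows although there are 18 districts.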

In [None]:
num_top_venues = 5

for hood in district_grouped['District']:
    print("----"+hood+"----")
    temp = district_grouped[district_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
district_venues_sorted = pd.DataFrame(columns=columns)
district_venues_sorted['District'] = district_grouped['District']

for ind in np.arange(district_grouped.shape[0]):
    district_venues_sorted.iloc[ind, 1:] = return_most_common_venues(district_grouped.iloc[ind, :], num_top_venues)

district_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,Italian Restaurant,Japanese Restaurant,Chinese Restaurant,Social Club,Lounge,Coffee Shop,Cocktail Bar,Gym,Cantonese Restaurant,Café
1,Eastern,Bus Stop,Zoo,Food Court,Greek Restaurant,Gourmet Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Flea Market
2,Islands,Mountain,Rock Climbing Spot,Zoo,Greek Restaurant,Gourmet Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Court
3,Kowloon City,Thai Restaurant,Dessert Shop,Chinese Restaurant,Café,Coffee Shop,Cha Chaan Teng,Bakery,Noodle House,Halal Restaurant,Fast Food Restaurant
4,Kwun Tong,Chinese Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cantonese Restaurant,Japanese Restaurant,Sushi Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Restaurant
5,North,Italian Restaurant,French Restaurant,Cocktail Bar,Coffee Shop,Mexican Restaurant,Bakery,Ice Cream Shop,Pizza Place,Rock Club,Sandwich Place
6,Sai Kung,Seafood Restaurant,Café,Thai Restaurant,Coffee Shop,Dessert Shop,Burger Joint,Pub,Pizza Place,Sri Lankan Restaurant,Chinese Restaurant
7,Sha Tin,Café,Shopping Mall,Dessert Shop,Clothing Store,Dim Sum Restaurant,Chinese Restaurant,Japanese Restaurant,Cantonese Restaurant,Coffee Shop,Hong Kong Restaurant
8,Sham Shui Po,Noodle House,Dessert Shop,Chinese Restaurant,Italian Restaurant,Snack Place,Café,Shopping Mall,Hong Kong Restaurant,Hostel,Cha Chaan Teng
9,Southern,Reservoir,Zoo,Flea Market,Greek Restaurant,Gourmet Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Court


In [28]:
# set number of clusters
kclusters = 6

district_grouped_clustering = district_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(district_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 2, 0, 5, 5, 1, 5, 5, 4, 3], dtype=int32)
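
To make clear what the k-means fit does with the category-frequency vectors, here is a plain-Python sketch of the basic procedure (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its members, and repeat. It is not scikit-learn's implementation, which among other things uses k-means++ initialization by default. The function name `simple_kmeans` and the frequency profiles are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simple_kmeans(points, k, iterations=20, seed=0):
    """Return a cluster label for each point using basic Lloyd iterations."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical category-frequency profiles for four districts.
profiles = [
    [0.30, 0.10, 0.00],
    [0.28, 0.12, 0.00],
    [0.00, 0.05, 0.50],
    [0.02, 0.04, 0.45],
]
labels = simple_kmeans(profiles, k=2)
print(labels)
```

The label numbers themselves carry no meaning; only which districts share a label matters. In this toy input the first two profiles end up together and the last two together.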

In [29]:
# add clustering labels
district_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

district_merged = df

# join the district data with the sorted venues to add latitude/longitude for each district
district_merged = district_merged.join(district_venues_sorted.set_index('District'), on='District', how='right')

district_merged.head() # check the last columns!

Unnamed: 0,Region,District,Population,Area(km²),Density(/km²),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Hong Kong Island,Central and Western,244600,12.44,19983.92,22.281938,114.158077,1,Italian Restaurant,Japanese Restaurant,Chinese Restaurant,Social Club,Lounge,Coffee Shop,Cocktail Bar,Gym,Cantonese Restaurant,Café
1,Hong Kong Island,Eastern,574500,18.56,31217.67,22.273078,114.233594,2,Bus Stop,Zoo,Food Court,Greek Restaurant,Gourmet Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Flea Market
2,Hong Kong Island,Southern,269200,38.85,6962.68,22.244541,114.205376,3,Reservoir,Zoo,Flea Market,Greek Restaurant,Gourmet Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Court
3,Hong Kong Island,Wan Chai,150900,9.83,15300.1,22.279015,114.172483,1,Coffee Shop,Café,Italian Restaurant,Hotel,Hong Kong Restaurant,Lounge,Korean Restaurant,Cantonese Restaurant,Chinese Restaurant,Sandwich Place
4,Kowloon,Sham Shui Po,390600,9.35,41529.41,22.32819,114.160854,4,Noodle House,Dessert Shop,Chinese Restaurant,Italian Restaurant,Snack Place,Café,Shopping Mall,Hong Kong Restaurant,Hostel,Cha Chaan Teng


In [30]:
# create map
map_clusters = folium.Map(location=[latitude+0.08, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(district_merged['Latitude'], district_merged['Longitude'], district_merged['District'], district_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [31]:
#Cluster 1
district_merged.loc[district_merged['Cluster Labels'] == 0, district_merged.columns[[1] + [2] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Islands,146900,22.230076,113.986785,0,Mountain,Rock Climbing Spot,Zoo,Greek Restaurant,Gourmet Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Court


In [32]:
#Cluster 2
district_merged.loc[district_merged['Cluster Labels'] == 1, district_merged.columns[[1] + [2] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,244600,22.281938,114.158077,1,Italian Restaurant,Japanese Restaurant,Chinese Restaurant,Social Club,Lounge,Coffee Shop,Cocktail Bar,Gym,Cantonese Restaurant,Café
3,Wan Chai,150900,22.279015,114.172483,1,Coffee Shop,Café,Italian Restaurant,Hotel,Hong Kong Restaurant,Lounge,Korean Restaurant,Cantonese Restaurant,Chinese Restaurant,Sandwich Place
8,Yau Tsim Mong,318100,22.302857,114.182032,1,Hotel,Coffee Shop,Buffet,Cocktail Bar,Train Station,Stadium,Burger Joint,Chinese Restaurant,Hong Kong Restaurant,Café
11,North,310800,40.722105,-73.988081,1,Italian Restaurant,French Restaurant,Cocktail Bar,Coffee Shop,Mexican Restaurant,Bakery,Ice Cream Shop,Pizza Place,Rock Club,Sandwich Place


In [33]:
#Cluster 3
district_merged.loc[district_merged['Cluster Labels'] == 2, district_merged.columns[[1] + [2] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Eastern,574500,22.273078,114.233594,2,Bus Stop,Zoo,Food Court,Greek Restaurant,Gourmet Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Flea Market


In [34]:
#Cluster 4
district_merged.loc[district_merged['Cluster Labels'] == 3, district_merged.columns[[1] + [2] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Southern,269200,22.244541,114.205376,3,Reservoir,Zoo,Flea Market,Greek Restaurant,Gourmet Shop,Gastropub,Furniture / Home Store,Fried Chicken Joint,French Restaurant,Food Court


In [35]:
#Cluster 5
district_merged.loc[district_merged['Cluster Labels'] == 4, district_merged.columns[[1] + [2] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Sham Shui Po,390600,22.32819,114.160854,4,Noodle House,Dessert Shop,Chinese Restaurant,Italian Restaurant,Snack Place,Café,Shopping Mall,Hong Kong Restaurant,Hostel,Cha Chaan Teng
14,Tai Po,307100,22.449402,114.171133,4,Chinese Restaurant,Fast Food Restaurant,Shopping Mall,Noodle House,Music Venue,Hong Kong Restaurant,Café,Coffee Shop,Sushi Restaurant,Plaza
15,Tsuen Wan,303600,22.371661,114.11347,4,Shopping Mall,Chinese Restaurant,Noodle House,Dessert Shop,Coffee Shop,Cha Chaan Teng,Fast Food Restaurant,Japanese Restaurant,Italian Restaurant,Sushi Restaurant
17,Yuen Long,607200,22.442646,114.030434,4,Chinese Restaurant,Dessert Shop,Noodle House,Fast Food Restaurant,Japanese Restaurant,Ramen Restaurant,Market,Bookstore,Shopping Mall,Cantonese Restaurant


In [36]:
#Cluster 6
district_merged.loc[district_merged['Cluster Labels'] == 5, district_merged.columns[[1] + [2] + list(range(5, district_merged.shape[1]))]]

Unnamed: 0,District,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Kowloon City,405400,22.33016,114.189937,5,Thai Restaurant,Dessert Shop,Chinese Restaurant,Café,Coffee Shop,Cha Chaan Teng,Bakery,Noodle House,Halal Restaurant,Fast Food Restaurant
6,Kwun Tong,641100,22.312937,114.22561,5,Chinese Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cantonese Restaurant,Japanese Restaurant,Sushi Restaurant,Cha Chaan Teng,Hong Kong Restaurant,Restaurant
7,Wong Tai Sin,426200,22.341654,114.193859,5,Coffee Shop,Chinese Restaurant,Fast Food Restaurant,Burger Joint,Cantonese Restaurant,Szechuan Restaurant,Park,Temple,Café,Astrologer
12,Sai Kung,448600,22.382249,114.272828,5,Seafood Restaurant,Café,Thai Restaurant,Coffee Shop,Dessert Shop,Burger Joint,Pub,Pizza Place,Sri Lankan Restaurant,Chinese Restaurant
13,Sha Tin,648200,22.381056,114.188879,5,Café,Shopping Mall,Dessert Shop,Clothing Store,Dim Sum Restaurant,Chinese Restaurant,Japanese Restaurant,Cantonese Restaurant,Coffee Shop,Hong Kong Restaurant
16,Tuen Mun,495900,22.390826,113.973169,5,Shopping Mall,Coffee Shop,Cantonese Restaurant,Zoo,Burger Joint,Miscellaneous Shop,Park,Fast Food Restaurant,Dessert Shop,Department Store


https://towardsdatascience.com/exploring-the-taste-of-nyc-neighborhoods-1a51394049a4

## 5. Results

## 6. Discussion

## 7. Conclusion

