<h1>Analysis of Fitness Studios in Hong Kong</h1>

Introduction:

This system visually displays gyms conveniently located along the railway in Hong Kong and recommends them based on their ratings.

In this project, a list of railway stations is first scraped from a website and combined with geospatial data for the stations, obtained from a geocoder, to form a dataframe. Next, the Foursquare API is used to obtain metadata for venues near the stations, from which the gym data are extracted. The gyms are then segmented according to their ratings and their distances from the station. K-means clustering is applied to group gyms with similar characteristics into clusters. Z-scores are calculated to categorize the clusters and produce a rating scale, and the walking time is calculated to give the user an estimate. Finally, a map is produced showing the clustered gyms so that people can choose a gym based on location, rating and walking distance.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment out this line if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle HTTP requests
from pandas.io.json import json_normalize # transform JSON into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

# import Beautiful Soup for web scraping
!pip install BeautifulSoup4
from bs4 import BeautifulSoup
!pip install lxml

from scipy import stats # for calculating z-scores

print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.8.1

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.

Libraries imported.


Read the stations extracted from the exploremetro website.

In [2]:
# Web scraping

res = requests.get("https://www.exploremetro.com/blog/hong-kong-mtr-station-names-in-cantonese-jyutping/")

soup = BeautifulSoup(res.content,"html.parser") 
table = soup.find_all('table')[0] 
hk_station = pd.read_html(str(table))[0]

hk_station.head(120)

Unnamed: 0,English,Chinese,Jyutping
0,Admiralty,金鐘,gam1 zung1
1,Airport,機場,gei1 coeng4
2,AsiaWorld-Expo,博覽館,bok3 laam5 gun2
3,Austin,柯士甸,o1 si6 din1
4,Causeway Bay,銅鑼灣,tung4 lo4 waan1
5,Central,中環,zung1 waan4
6,Chai Wan,柴灣,caai4 waan1
7,Che Kung Temple,車公廟,ce1 gung1 miu6
8,Cheung Sha Wan,長沙灣,coeng4 saa1 waan1
9,Choi Hung,彩虹,coi2 hung4


Data wrangling: rename the column label, remove unwanted columns and add stations that are newly built.

In [4]:
# data wrangling

hk_station.rename(columns={"English": "Station"}, inplace = True)
hk_station.drop(columns=['Chinese','Jyutping'], inplace = True)

south_island_line = pd.DataFrame({'Station':['Ocean Park','Wong Chuk Hang','Lei Tung','South Horizons']})
west_island_line = pd.DataFrame({'Station':['Sai Ying Pun','HKU','Kennedy Town']})
tuen_ma_line = pd.DataFrame({'Station':['Hin Keng','Kai Tak']})

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
hk_station = pd.concat([hk_station, south_island_line, west_island_line, tuen_ma_line], ignore_index=True)

In [5]:
hk_station

Unnamed: 0,Station
0,Admiralty
1,Airport
2,AsiaWorld-Expo
3,Austin
4,Causeway Bay
5,Central
6,Chai Wan
7,Che Kung Temple
8,Cheung Sha Wan
9,Choi Hung


Download the geospatial coordinates of the stations using the geopy Nominatim geocoder

In [6]:
## add latitude and longitude for each station

columns = ['Latitude', 'Longitude']
hk_coor = pd.DataFrame(columns=columns)

# create the geocoder once and reuse it for every station
geolocator = Nominatim(user_agent="hk_explorer", timeout = 5)

for index in range(0,len(hk_station)):

    address = str(hk_station['Station'][index]+' Station, Hong Kong')
    location = geolocator.geocode(address)   # note: geocode returns None if the address is not found
    latitude = location.latitude
    longitude = location.longitude

    hk_coor.loc[index]=[latitude,longitude]    # rows are assigned by index rather than appended, for performance
 #   print(hk_coor.loc[index])

hk_station[['Latitude','Longitude']]=hk_coor[['Latitude','Longitude']]
hk_station

Unnamed: 0,Station,Latitude,Longitude
0,Admiralty,22.278475,114.164646
1,Airport,22.316087,113.936478
2,AsiaWorld-Expo,22.321251,113.942971
3,Austin,22.306364,114.164398
4,Causeway Bay,22.280208,114.184841
5,Central,22.285039,114.158382
6,Chai Wan,22.264819,114.237107
7,Che Kung Temple,22.374746,114.186186
8,Cheung Sha Wan,22.33534,114.15779
9,Choi Hung,22.334068,114.210044
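
The loop above assumes that every station name resolves to a location. A minimal sketch of a more defensive variant is given below. `geocode_stations`, `Location` and `fake_geocode` are hypothetical names used only for illustration; in the notebook the callable passed in would be `geolocator.geocode`, which returns `None` when nothing is found.

```python
from collections import namedtuple

# Stand-in for the object returned by a geocoder (illustrative assumption).
Location = namedtuple("Location", ["latitude", "longitude"])

def geocode_stations(names, geocode, suffix=" Station, Hong Kong"):
    """Return (rows, missing): coordinates for resolved names, and names not found."""
    rows, missing = [], []
    for name in names:
        location = geocode(name + suffix)
        if location is None:          # the geocoder found nothing for this address
            missing.append(name)
            continue
        rows.append((name, location.latitude, location.longitude))
    return rows, missing

# Hypothetical lookup table standing in for a real geocoding service.
_known = {"Admiralty Station, Hong Kong": Location(22.278475, 114.164646)}

def fake_geocode(address):
    return _known.get(address)

rows, missing = geocode_stations(["Admiralty", "Nowhere"], fake_geocode)
print(rows)     # resolved stations with their coordinates
print(missing)  # names that need manual checking
```

Collecting the unresolved names separately makes it possible to correct them by hand instead of having the loop stop with an `AttributeError` on `location.latitude`.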


Plot the map of Hong Kong with MTR Station markers

In [7]:
address = 'Tsuen Wan, Hong Kong'   # Tsuen Wan, near the geographical centre of Hong Kong, is selected so that the whole region can be displayed

geolocator = Nominatim(user_agent="hk_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Tsuen Wan are {}, {}.'.format(latitude, longitude))

#map_hk = folium.Map(location=[latitude, longitude], zoom_start=10.5)
#map_hk

The geographical coordinates of Tsuen Wan are 22.3716605, 114.1134699.


In [8]:
map_hk = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(hk_station['Latitude'], hk_station['Longitude'], hk_station['Station']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hk)  
    
map_hk

In [9]:
# @hidden_cell

CLIENT_ID = 'GGHVT23P5IV0HT0KJZK0S0CYPOXD4ZE23MSRM1SX4L51PFGV' # your Foursquare ID
CLIENT_SECRET = 'XB2ZCCD1JNKLZKTGFSD3BLKWILFQTZWNR1KCRCN0YQ0LDHCP' # your Foursquare Secret
VERSION = '20200130' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: GGHVT23P5IV0HT0KJZK0S0CYPOXD4ZE23MSRM1SX4L51PFGV
CLIENT_SECRET:XB2ZCCD1JNKLZKTGFSD3BLKWILFQTZWNR1KCRCN0YQ0LDHCP


In [10]:
LIMIT = 20000 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

In [11]:
def get_category_type(row):
    # the categories may sit under either key, depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
def getNearbyVenues(names, latitudes, longitudes, radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Station',
                  'Latitude',
                  'Longitude',
                  'Venue',
                  'Venue ID',           
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
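
`getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` for a venue that has no category. The sketch below shows one way to flatten the items more defensively. `flatten_items` is a hypothetical helper, and the sample items are made up to mirror the fields the function above reads; they are not a real API response.

```python
def flatten_items(station, lat, lng, items):
    """Turn explore-style items into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            station, lat, lng,
            venue["name"], venue["id"],
            venue["location"]["lat"], venue["location"]["lng"],
            category,
        ))
    return rows

# Made-up items with the same shape as the fields read by getNearbyVenues.
sample_items = [
    {"venue": {"name": "Gym A", "id": "a1",
               "location": {"lat": 22.28, "lng": 114.16},
               "categories": [{"name": "Gym"}]}},
    {"venue": {"name": "Unlabelled", "id": "b2",
               "location": {"lat": 22.29, "lng": 114.17},
               "categories": []}},
]

rows = flatten_items("Admiralty", 22.278475, 114.164646, sample_items)
for row in rows:
    print(row)
```

A venue without a category is kept with `None` in the last field, so it simply drops out when the gym categories are selected later.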


In [12]:
hk_venues = getNearbyVenues(names=hk_station['Station'],
                            latitudes=hk_station['Latitude'],
                            longitudes=hk_station['Longitude'],
                            radius=radius
                           )

In [13]:
hk_venues.tail(50)

Unnamed: 0,Station,Latitude,Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
3193,Kennedy Town,22.281363,114.127832,Beeger,551e8d77498eb9c15d93e67f,22.283913,114.127581,Burger Joint
3194,Kennedy Town,22.281363,114.127832,Blue Place Café,52f734d2498e27465d59ce2d,22.282856,114.128883,Café
3195,Kennedy Town,22.281363,114.127832,Holland 100% Dessert 荷蘭糖水 (荷蘭糖水),4d04efda28926ea818316ec2,22.284566,114.130763,Dessert Shop
3196,Kennedy Town,22.281363,114.127832,Ice Monkey,5571ae35498edb3cbe95d8cd,22.283139,114.126845,Ice Cream Shop
3197,Kennedy Town,22.281363,114.127832,Tequila on Davis,513b1ffde4b093aa3ccfa99f,22.28346,114.126494,Mexican Restaurant
3198,Kennedy Town,22.281363,114.127832,Shiba,5448fc2d498ea61e417d45d1,22.283528,114.12814,Japanese Restaurant
3199,Kennedy Town,22.281363,114.127832,Forbes 36,5243b70011d2da3078fb64c8,22.281558,114.127928,Australian Restaurant
3200,Kennedy Town,22.281363,114.127832,Tai Hing (太興),51d0071b498ef748760cb4a1,22.282002,114.12828,Hong Kong Restaurant
3201,Kennedy Town,22.281363,114.127832,Cadogan Street Temporary Garden,52083b4d11d263e532f88144,22.282346,114.12492,Park
3202,Kennedy Town,22.281363,114.127832,Starbucks (星巴克),58faf77ca35dce24a1d3d11f,22.282668,114.128672,Coffee Shop


In [14]:
# A list of venue categories in the region

hk_venues['Venue Category'].unique()

array(['Hotel', 'Café', 'Multiplex', 'Shopping Mall', 'Steakhouse',
       'Non-Profit', 'Supermarket', 'Yoga Studio',
       'Furniture / Home Store', 'Seafood Restaurant', 'Park',
       'Cantonese Restaurant', 'Hotel Bar', 'Gym', 'Burger Joint', 'Zoo',
       'Gym / Fitness Center', 'Juice Bar', 'Bookstore',
       'Sushi Restaurant', 'Lounge', 'Italian Restaurant',
       'French Restaurant', 'Chinese Restaurant', 'Tea Room',
       'Thai Restaurant', 'Vegetarian / Vegan Restaurant',
       'Clothing Store', 'Modern European Restaurant', 'Dessert Shop',
       'Chocolate Shop', 'Noodle House', 'Vietnamese Restaurant',
       'Dim Sum Restaurant', 'Perfume Shop', 'Social Club',
       'Cocktail Bar', 'Spanish Restaurant', 'Wine Bar', 'Plaza', 'Spa',
       'Coffee Shop', 'Squash Court', 'Botanical Garden', 'Tennis Court',
       'Airport', 'Deli / Bodega', 'Airport Gate', 'Gift Shop', 'Bakery',
       'Airport Food Court', 'Airport Lounge', 'Movie Theater',
       'Dumpling Restaura

Extract the rows associated with a gym, fitness center or yoga studio

In [15]:
gym_category = ['Gym','Gym / Fitness Center','Gym Pool','Yoga Studio']

# keep only the venues whose category is in gym_category;
# .copy() makes hk_gym an independent dataframe so that columns can be added to it later
hk_gym = hk_venues[hk_venues['Venue Category'].isin(gym_category)].copy()
hk_gym.reset_index(drop = True, inplace = True)
hk_gym

Unnamed: 0,Station,Latitude,Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
0,Admiralty,22.278475,114.164646,Pure Yoga,59a94e82c5b11c443d0083e0,22.278106,114.164754,Yoga Studio
1,Admiralty,22.278475,114.164646,Pure Fitness,4c7b310793ef236a360bb50f,22.279925,114.163022,Gym
2,Admiralty,22.278475,114.164646,Pure Fitness,56ad7bf6498e83964e53a64d,22.278475,114.161363,Gym / Fitness Center
3,Admiralty,22.278475,114.164646,Pure Yoga,5ba22ace31fd14002ce5930e,22.276904,114.168365,Yoga Studio
4,Airport,22.316087,113.936478,Om Spa & Fitness Centre,50dc73f5e4b0c1f301014a4c,22.318796,113.934504,Gym / Fitness Center
5,AsiaWorld-Expo,22.321251,113.942971,Marriott SkyCity Gym,52d28c93498e0806dab1aa57,22.319435,113.943461,Gym
6,Causeway Bay,22.280208,114.184841,Pure Fitness,50545f8ee4b0f42fb69d529b,22.278437,114.183919,Gym / Fitness Center
7,Causeway Bay,22.280208,114.184841,Pure Yoga,4b2f15e3f964a52075e924e3,22.278688,114.18261,Yoga Studio
8,Central,22.285039,114.158382,Pure Fitness,4b3082a2f964a520f2f924e3,22.285137,114.159455,Gym / Fitness Center
9,Central,22.285039,114.158382,Pure Yoga,51f78c00498e852dc483a48c,22.283022,114.155674,Yoga Studio


Get the like counts from the Foursquare API

In [16]:
#Extract a list of Gym ID

gym_id = hk_gym['Venue ID'].tolist()

In [17]:
# Extract the like counts using Foursquare API
url_list = []
like_count_list = []
json_list = []

for i in gym_id:
    venue_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(i, CLIENT_ID, CLIENT_SECRET, VERSION)
    url_list.append(venue_url)
for link in url_list:
    result = requests.get(link).json()
    likes = result['response']['likes']['count']
    like_count_list.append(likes)
print(like_count_list)

[10, 55, 31, 8, 11, 7, 99, 38, 66, 74, 10, 8, 33, 15, 9, 31, 55, 10, 12, 4, 0, 55, 15, 12, 7, 74, 8, 135, 7, 0, 0, 6, 0, 4, 2, 0, 8, 38, 2, 10, 13, 5, 0]


In [18]:
hk_gym['Number of Likes'] = like_count_list
hk_gym.head(100)



Unnamed: 0,Station,Latitude,Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category,Number of Likes
0,Admiralty,22.278475,114.164646,Pure Yoga,59a94e82c5b11c443d0083e0,22.278106,114.164754,Yoga Studio,10
1,Admiralty,22.278475,114.164646,Pure Fitness,4c7b310793ef236a360bb50f,22.279925,114.163022,Gym,55
2,Admiralty,22.278475,114.164646,Pure Fitness,56ad7bf6498e83964e53a64d,22.278475,114.161363,Gym / Fitness Center,31
3,Admiralty,22.278475,114.164646,Pure Yoga,5ba22ace31fd14002ce5930e,22.276904,114.168365,Yoga Studio,8
4,Airport,22.316087,113.936478,Om Spa & Fitness Centre,50dc73f5e4b0c1f301014a4c,22.318796,113.934504,Gym / Fitness Center,11
5,AsiaWorld-Expo,22.321251,113.942971,Marriott SkyCity Gym,52d28c93498e0806dab1aa57,22.319435,113.943461,Gym,7
6,Causeway Bay,22.280208,114.184841,Pure Fitness,50545f8ee4b0f42fb69d529b,22.278437,114.183919,Gym / Fitness Center,99
7,Causeway Bay,22.280208,114.184841,Pure Yoga,4b2f15e3f964a52075e924e3,22.278688,114.18261,Yoga Studio,38
8,Central,22.285039,114.158382,Pure Fitness,4b3082a2f964a520f2f924e3,22.285137,114.159455,Gym / Fitness Center,66
9,Central,22.285039,114.158382,Pure Yoga,51f78c00498e852dc483a48c,22.283022,114.155674,Yoga Studio,74


Calculate the distance from the closest station using longitudes and latitudes.

In [19]:
# Calculate the distance from the closest station; the nearer the better.

from math import sin, cos, sqrt, atan2, radians

# approximate radius of earth in km
R = 6373.0

columns = ['Distance from Station']
distance_df = pd.DataFrame(columns=columns)

for index in range(0,len(hk_gym)):
    
    lat1 = radians(hk_gym['Latitude'][index])
    lon1 = radians(hk_gym['Longitude'][index])
    lat2 = radians(hk_gym['Venue Latitude'][index])
    lon2 = radians(hk_gym['Venue Longitude'][index])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    distance = R * c
    distance_df.loc[index]=[distance]
    
hk_gym['Distance from Station']=distance_df['Distance from Station']

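
The distance computed in the cell above is the haversine (great-circle) distance. As a minimal sketch, the same formula can be wrapped in a reusable function; the name `haversine_km` and the constant name are illustrative assumptions, while the radius value is the one used above.

```python
from math import radians, sin, cos, sqrt, atan2

EARTH_RADIUS_KM = 6373.0  # same approximate radius as in the cell above

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return EARTH_RADIUS_KM * 2 * atan2(sqrt(a), sqrt(1 - a))

# Admiralty station to the Pure Yoga venue in the first row of the gym table
d = haversine_km(22.278475, 114.164646, 22.278106, 114.164754)
print(round(d, 4))
```

The result should be close to the roughly 0.0425 km listed for that row in the notebook; small differences are expected because the displayed coordinates are rounded. A function like this could also be applied row by row with `DataFrame.apply` instead of an explicit loop.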


Calculate the z-scores for Number of Likes and Distance from the nearest MTR station. Z-scores are used to ease the statistical analysis: standardizing both variables puts their values on a common scale (roughly -3 to 3), so the parameter ranges of each cluster can be distinguished and compared more easily.

In [20]:
hk_gym['Z-score for Likes']=stats.zscore(hk_gym['Number of Likes'], axis=0, ddof=1)
hk_gym['Z-score for Distance']=stats.zscore(hk_gym['Distance from Station'], axis=0, ddof=1)
#print(hk_gym)

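
For reference, a z-score with `ddof=1` is (x - mean) / sample standard deviation. The sketch below reproduces that calculation with the standard library only, using the like counts printed earlier; the helper name `zscores` is an illustrative assumption and is not part of the notebook.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize values as (x - mean) / sample standard deviation (ddof = 1)."""
    mu = mean(values)
    sigma = stdev(values)          # sample standard deviation, i.e. ddof = 1
    return [(x - mu) / sigma for x in values]

# like counts as printed by the like-count cell earlier in the notebook
likes = [10, 55, 31, 8, 11, 7, 99, 38, 66, 74, 10, 8, 33, 15, 9, 31, 55, 10,
         12, 4, 0, 55, 15, 12, 7, 74, 8, 135, 7, 0, 0, 6, 0, 4, 2, 0, 8, 38,
         2, 10, 13, 5, 0]

z = zscores(likes)
print([round(v, 4) for v in z[:3]])
```

The first values should agree, up to rounding, with the `Z-score for Likes` column of the gym table further down (about -0.4326 and 1.0701 for the first two rows).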
  


In [21]:
# set number of clusters
kclusters = 4

# copy the two z-score columns so that the cluster labels can be added to this dataframe later
hk_gym_clustering = hk_gym[['Z-score for Likes','Z-score for Distance']].copy()

# run k-means clustering (KMeans was imported in the first cell)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hk_gym_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 2, 1, 1, 1, 0, 3, 2, 2, 3, 1, 1, 2, 1, 1, 2, 2, 0, 1, 1, 0, 2,
       1, 1, 1, 3, 1, 3, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0],
      dtype=int32)
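
To make the clustering step less of a black box, here is a minimal plain-Python sketch of Lloyd's algorithm, the iterative procedure that k-means is usually based on. It is not scikit-learn's implementation, which adds smarter initialisation and other refinements. The function name `lloyd_kmeans`, the toy points and the fixed starting centroids are all illustrative assumptions.

```python
def lloyd_kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # assignment step: index of the closest centroid (squared Euclidean distance)
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # update step: move each centroid to the mean of its assigned points
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# toy (z-score for likes, z-score for distance) pairs, made up for illustration
points = [(-0.5, -1.5), (-0.4, -1.2), (2.5, 0.0), (2.8, 0.3), (0.0, 1.2), (0.2, 1.5)]
labels, centroids = lloyd_kmeans(points, centroids=[points[0], points[2], points[4]])
print(labels)
```

Note that the label numbers only reflect the order of the starting centroids. The same is true of the labels returned by `KMeans`: they identify groups but carry no ranking by themselves.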

In [22]:
# add clustering labels
hk_gym_clustering.insert(0, 'Cluster Labels', kmeans.labels_)


# copy the labels into the main dataframe
hk_gym['Cluster Label']=hk_gym_clustering['Cluster Labels']

  


Plot the map with the 4 clusters.

In [23]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(hk_gym['Venue Latitude'], hk_gym['Venue Longitude'], hk_gym['Venue'], hk_gym_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters

Statistical analysis of the z-scores to categorize the clusters by quality of service and walking distance

In [24]:
hk_gym_clustering.groupby('Cluster Labels').max()

Cluster Labels,Z-score for Likes,Z-score for Distance
0,-0.332373,-0.280367
1,0.502442,1.588076
2,1.437436,0.055013
3,3.741526,1.237633


In [25]:
hk_gym_clustering.groupby('Cluster Labels').min()

Cluster Labels,Z-score for Likes,Z-score for Distance
0,-0.766477,-1.885622
1,-0.766477,0.231729
2,0.268694,-1.601107
3,1.704577,-0.456578


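Reading the minimum and maximum z-scores of each cluster gives the following interpretation:

Cluster 0: bad rating; distance average to convenient.

Cluster 1: acceptable rating; distance average to far.

Cluster 2: satisfactory rating; distance average to convenient.

Cluster 3: excellent rating; distance ranging from below average to far.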
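
The cluster numbers produced by k-means are arbitrary identifiers, so the interpretation above holds for this particular run. As a hedged sketch of how the star rating could instead be derived from the data, the clusters can be ranked by their mean like z-score. `rank_clusters_by_likes` is a hypothetical helper and not part of the notebook; the sample values are taken from four rows of the gym table in this notebook.

```python
from statistics import mean

def rank_clusters_by_likes(labels, like_zscores):
    """Map each cluster label to a star count: 1 for the lowest mean likes, up to k."""
    by_cluster = {}
    for label, z in zip(labels, like_zscores):
        by_cluster.setdefault(label, []).append(z)
    ordered = sorted(by_cluster, key=lambda lab: mean(by_cluster[lab]))
    return {label: rank + 1 for rank, label in enumerate(ordered)}

# cluster labels and like z-scores for rows 0, 1, 2 and 6 of the gym table
labels = [0, 2, 1, 3]
like_z = [-0.432551, 1.070117, 0.268694, 2.539392]

stars = rank_clusters_by_likes(labels, like_z)
print(stars)
```

With a mapping like this, re-running the clustering with a different seed or a different number of clusters would not silently swap the meaning of the ratings.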
 


Determining the rating, quality of service and walking time

In [26]:
## add rating, quality of service and walking time
## the rating is determined by the k-means cluster: cluster 0 = *, cluster 1 = **, ...

rating_by_cluster = {0: ('*', 'Bad'), 1: ('**', 'Acceptable'), 2: ('***', 'Satisfactory'), 3: ('****', 'Excellent')}

rating_df = pd.DataFrame(columns=['Rating (*-Worst, ****-Best)','Quality of Service','Walking Time from MTR Station'])

for index in range(0,len(hk_gym_clustering)):      # rows are assigned by index rather than appended, for performance

    stars, quality = rating_by_cluster[int(hk_gym_clustering['Cluster Labels'][index])]
    # walking time in minutes: distance in km * 25, plus 3
    walking_time = int(abs(hk_gym['Distance from Station'][index])*25+3)
    rating_df.loc[index] = [stars, quality, walking_time]

#print(rating_df)

hk_gym_clustering[['Rating (*-Worst, ****-Best)','Quality of Service','Walking Time from MTR']]=rating_df[['Rating (*-Worst, ****-Best)','Quality of Service','Walking Time from MTR Station']]
#hk_gym_clustering.head(40)



In [27]:
# Adding columns to main dataframe

hk_gym[['Rating (*-Worst, ****-Best)','Quality of Service','Walking Time from MTR (minute)']]=rating_df[['Rating (*-Worst, ****-Best)','Quality of Service','Walking Time from MTR Station']]


In [28]:
hk_gym

Unnamed: 0,Station,Latitude,Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category,Number of Likes,Distance from Station,Z-score for Likes,Z-score for Distance,Cluster Label,"Rating (*-Worst, ****-Best)",Quality of Service,Walking Time from MTR (minute)
0,Admiralty,22.278475,114.164646,Pure Yoga,59a94e82c5b11c443d0083e0,22.278106,114.164754,Yoga Studio,10,0.042504,-0.432551,-1.811568,0,*,Bad,4
1,Admiralty,22.278475,114.164646,Pure Fitness,4c7b310793ef236a360bb50f,22.279925,114.163022,Gym,55,0.23231,1.070117,-0.351754,2,***,Satisfactory,8
2,Admiralty,22.278475,114.164646,Pure Fitness,56ad7bf6498e83964e53a64d,22.278475,114.161363,Gym / Fitness Center,31,0.337851,0.268694,0.459969,1,**,Acceptable,11
3,Admiralty,22.278475,114.164646,Pure Yoga,5ba22ace31fd14002ce5930e,22.276904,114.168365,Yoga Studio,8,0.42079,-0.499336,1.09786,1,**,Acceptable,13
4,Airport,22.316087,113.936478,Om Spa & Fitness Centre,50dc73f5e4b0c1f301014a4c,22.318796,113.934504,Gym / Fitness Center,11,0.363367,-0.399158,0.656215,1,**,Acceptable,12
5,AsiaWorld-Expo,22.321251,113.942971,Marriott SkyCity Gym,52d28c93498e0806dab1aa57,22.319435,113.943461,Gym,7,0.208144,-0.532729,-0.537618,0,*,Bad,8
6,Causeway Bay,22.280208,114.184841,Pure Fitness,50545f8ee4b0f42fb69d529b,22.278437,114.183919,Gym / Fitness Center,99,0.21868,2.539392,-0.456578,3,****,Excellent,8
7,Causeway Bay,22.280208,114.184841,Pure Yoga,4b2f15e3f964a52075e924e3,22.278688,114.18261,Yoga Studio,38,0.285198,0.502442,0.055013,2,***,Satisfactory,10
8,Central,22.285039,114.158382,Pure Fitness,4b3082a2f964a520f2f924e3,22.285137,114.159455,Gym / Fitness Center,66,0.110982,1.437436,-1.284897,2,***,Satisfactory,5
9,Central,22.285039,114.158382,Pure Yoga,51f78c00498e852dc483a48c,22.283022,114.155674,Yoga Studio,74,0.357753,1.704577,0.613038,3,****,Excellent,11
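
The walking times in the table above come from the expression `int(abs(distance) * 25 + 3)` used in the rating cell. A small sketch of that rule as a function follows; the parameter names `minutes_per_km` and `fixed_minutes` are illustrative labels, since the notebook does not name or explain the two constants.

```python
def walking_minutes(distance_km, minutes_per_km=25, fixed_minutes=3):
    """Estimate walking time as the notebook does: int(|d| * 25 + 3)."""
    return int(abs(distance_km) * minutes_per_km + fixed_minutes)

# distances (km) taken from the first three rows of the table above
for d in (0.042504, 0.23231, 0.337851):
    print(d, walking_minutes(d))
```

A rate of 25 minutes per kilometre corresponds to a walking speed of 2.4 km/h, and the estimate is truncated to whole minutes rather than rounded.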


Generate a dataframe and a map for final presentation.

In [29]:
final_df=hk_gym[['Station','Venue', 'Rating (*-Worst, ****-Best)','Quality of Service','Walking Time from MTR (minute)']]

In [30]:
final_df

Unnamed: 0,Station,Venue,"Rating (*-Worst, ****-Best)",Quality of Service,Walking Time from MTR (minute)
0,Admiralty,Pure Yoga,*,Bad,4
1,Admiralty,Pure Fitness,***,Satisfactory,8
2,Admiralty,Pure Fitness,**,Acceptable,11
3,Admiralty,Pure Yoga,**,Acceptable,13
4,Airport,Om Spa & Fitness Centre,**,Acceptable,12
5,AsiaWorld-Expo,Marriott SkyCity Gym,*,Bad,8
6,Causeway Bay,Pure Fitness,****,Excellent,8
7,Causeway Bay,Pure Yoga,***,Satisfactory,10
8,Central,Pure Fitness,***,Satisfactory,5
9,Central,Pure Yoga,****,Excellent,11


In [32]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; each popup shows the rating, quality and walking time
for lat, lon, poi, cluster, rating, qual, walk_time in zip(hk_gym['Venue Latitude'], hk_gym['Venue Longitude'], hk_gym['Venue'], hk_gym['Cluster Label'], hk_gym['Rating (*-Worst, ****-Best)'],hk_gym['Quality of Service'], hk_gym['Walking Time from MTR (minute)']):
    label = folium.Popup(str(poi) + '   Rating:' + str(rating) +'  Quality:'+ str(qual)+ '  Walking Time:'+ str(walk_time) + ' mins', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=3,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

END