# Clustering analysis of cities in Penang (Pulau Pinang) state, Malaysia

## Introduction:
The aim of this project is to demonstrate how Foursquare data can be used to explore and compare neighbourhoods or cities of choice, and which problems this can solve.

## Problem description:
Penang (a.k.a. Pulau Pinang) is a famous tourist state in Malaysia. I frequently visit Penang during long holidays, and one of our main intentions is to enjoy good food. This led me to think about the food business in and around the cities of Penang. I decided to run a clustering analysis on the most common venues in different cities of Penang, since the most common venues are what attract visitors, and those visitors are the source of business for food outlets.

## Target audience:
In the future, this approach could be offered as a service, helping food business owners understand which categories of food business are preferred in each city of Penang.

## Data description:
I will use the dataset from https://www.weatherdatasource.com/MY-Pulau_Pinang to get the list of cities in Penang. I then use this list to retrieve latitude and longitude coordinates with Geopy. With the City, Latitude and Longitude data, the 'Venue Category' of nearby venues can be retrieved using the Foursquare API.

In [16]:
#import library
!pip install folium
!pip install geopy

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np # library to handle data in a vectorized manner

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import works on pandas 1.0+)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# import requests and BeautifulSoup
import requests
from bs4 import BeautifulSoup

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
from folium import plugins
from folium.plugins import MarkerCluster

print('Libraries imported.')

Libraries imported.


In [17]:
#scrape weather website for city list in Pulau Pinang
url = 'https://www.weatherdatasource.com/MY-Pulau_Pinang'
website_text = requests.get(url).text
soup = BeautifulSoup(website_text,'xml')
table_contents = []
divs = soup.find_all("ul", {"class": "list-group w-100 align-items-stretch"})
                     
for ultag in divs:
    for litag in ultag.find_all('li'):
        cell = {}
        cell['City']=litag.text
        cell['State']='Penang'
        table_contents.append(cell)
#print(table_contents)
df_CityList=pd.DataFrame(table_contents).sort_values(by='City')
#df_CityList

In [18]:
#Correct "Tasek Glugor" to "Tasek Gelugor" and "George Town" to "Bandaraya George Town"
#Remove "Permatang Kuching" because Geopy is unable to geocode it (it is added back manually later)
df_CityList.loc[(df_CityList.City == 'Tasek Glugor'),'City']='Tasek Gelugor'
df_CityList.loc[(df_CityList.City == 'George Town'),'City']='Bandaraya George Town'
df_CityList.drop(df_CityList.loc[df_CityList['City']=='Permatang Kuching'].index, inplace=True)
df_CityList

Unnamed: 0,City,State
0,Bayan Lepas,Penang
1,Bukit Mertajam,Penang
2,Butterworth,Penang
3,Bandaraya George Town,Penang
4,Juru,Penang
5,Kepala Batas,Penang
6,Nibong Tebal,Penang
7,Perai,Penang
9,Sungai Ara,Penang
10,Tanjung Tokong,Penang


In [19]:
#Get Latitude and Longitude
address = 'Tanjung Tokong, Penang'

geolocator = Nominatim(user_agent="Coursera-workshop")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))


The geographical coordinates of Tanjung Tokong, Penang are 5.4511119, 100.3045892.


In [20]:
#Loop through df_CityList and geocode each "City, State" address
list_lat = []   # create empty lists
list_long = []
geolocator = Nominatim(user_agent="ny_explorer")  # create the geocoder once, outside the loop
for index, row in df_CityList.iterrows():
    address = str(row['City'])+', '+str(row['State'])
    location = geolocator.geocode(address)  # returns None if the address cannot be found
    latitude = location.latitude
    longitude = location.longitude

    list_lat.append(latitude)
    list_long.append(longitude)

#create new columns from lists    
df_CityList['Latitude'] = list_lat   
df_CityList['Longitude'] = list_long
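
The loop above assumes that every lookup succeeds. geopy's `geocode` returns `None` when it finds no match, so a single unknown city would stop the loop with an `AttributeError`. Below is a minimal sketch of a more forgiving loop. The helper name `geocode_cities` and the stand-in `fake_geocode` function are illustrative assumptions and not part of the notebook; in the notebook, `geolocator.geocode` would be passed in instead.

```python
from collections import namedtuple

# Stand-in for the object geopy returns; only the two attributes used here.
Location = namedtuple("Location", ["latitude", "longitude"])

def geocode_cities(addresses, geocode):
    """Return (found, missing): coordinates per address, and addresses with no match."""
    found, missing = {}, []
    for address in addresses:
        location = geocode(address)  # geopy returns None when nothing matches
        if location is None:
            missing.append(address)
        else:
            found[address] = (location.latitude, location.longitude)
    return found, missing

# Hypothetical lookup table so the sketch runs without network access.
_known = {"Juru, Penang": Location(5.31572, 100.438)}

def fake_geocode(address):
    return _known.get(address)

found, missing = geocode_cities(
    ["Juru, Penang", "Permatang Kuching, Penang"], fake_geocode
)
print(found)
print(missing)
```

Collecting the misses in a list makes it easy to see which cities still need coordinates entered by hand.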

In [21]:
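#Geopy is unable to find "Permatang Kuching", hence I add its coordinates manually (as floats, to match the other rows).
new_row = {'City':'Permatang Kuching', 'State':'Penang', 'Latitude':5.46339, 'Longitude':100.381}
#append row to the dataframe (DataFrame.append was removed in pandas 2.0, so pd.concat is used)
df_CityList = pd.concat([df_CityList, pd.DataFrame([new_row])], ignore_index=True)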
df_CityList

Unnamed: 0,City,State,Latitude,Longitude
0,Bayan Lepas,Penang,5.29513,100.26
1,Bukit Mertajam,Penang,5.36424,100.461
2,Butterworth,Penang,5.39349,100.366
3,Bandaraya George Town,Penang,5.41457,100.33
4,Juru,Penang,5.31572,100.438
5,Kepala Batas,Penang,5.51438,100.436
6,Nibong Tebal,Penang,5.1701,100.479
7,Perai,Penang,5.38708,100.382
8,Sungai Ara,Penang,5.32088,100.268
9,Tanjung Tokong,Penang,5.45111,100.305


In [22]:
#Create a map of Penang with City superimposed on top.
# create map of Penang using latitude and longitude values
map_Penang = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, city in zip(df_CityList['Latitude'], df_CityList['Longitude'], df_CityList['City']):
    label = '{}'.format(city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Penang)
    
map_Penang

In [23]:
# The code was removed by Watson Studio for sharing.
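
The hidden cell presumably defines the Foursquare credentials and request settings that `getNearbyVenues` uses below (`CLIENT_ID`, `CLIENT_SECRET`, `VERSION`, `LIMIT`). One way to define them without storing secrets in the notebook is to read them from environment variables. This is only a sketch: the environment variable names, the `VERSION` date and the default `LIMIT` are assumptions, not values taken from the original cell.

```python
import os

def load_foursquare_settings(env=os.environ):
    """Read Foursquare settings from a mapping such as os.environ."""
    return {
        "CLIENT_ID": env.get("FOURSQUARE_CLIENT_ID", ""),
        "CLIENT_SECRET": env.get("FOURSQUARE_CLIENT_SECRET", ""),
        # Assumed API version date in YYYYMMDD form.
        "VERSION": env.get("FOURSQUARE_VERSION", "20180605"),
        # Assumed maximum number of venues per request.
        "LIMIT": int(env.get("FOURSQUARE_LIMIT", "100")),
    }

# A plain dict is passed here so the sketch runs without real credentials.
settings = load_foursquare_settings({
    "FOURSQUARE_CLIENT_ID": "my-id",
    "FOURSQUARE_CLIENT_SECRET": "my-secret",
})
CLIENT_ID = settings["CLIENT_ID"]
CLIENT_SECRET = settings["CLIENT_SECRET"]
VERSION = settings["VERSION"]
LIMIT = settings["LIMIT"]
print(VERSION, LIMIT)
```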

In [24]:
#Define a function to get venues within a given radius (default 500 meters) of each city
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL (CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT are expected to come from the hidden cell above)
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
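
The list comprehension above indexes `v['venue']['categories'][0]` directly, so it would raise an `IndexError` for a venue that has no category. A small sketch of a more defensive row builder is shown here. The function name `venue_rows` and the sample `items` list are made up for illustration and only mimic the shape of the response fields read above.

```python
def venue_rows(name, lat, lng, items):
    """Build one row per venue, skipping venues that have no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # nothing to cluster on without a category
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"],
        ))
    return rows

# Illustrative items in the same shape as the fields read by getNearbyVenues.
items = [
    {"venue": {"name": "Cargas Cafe",
               "location": {"lat": 5.294794, "lng": 100.25838},
               "categories": [{"name": "Malay Restaurant"}]}},
    {"venue": {"name": "Uncategorised place",
               "location": {"lat": 5.2950, "lng": 100.2600},
               "categories": []}},
]
rows = venue_rows("Bayan Lepas", 5.29513, 100.26, items)
print(rows)
```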

In [25]:
# Call getNearbyVenues to get up to 100 venues within 500 meters of each city
PG_City_Venues = getNearbyVenues(names=df_CityList['City'],
                                   latitudes=df_CityList['Latitude'],
                                   longitudes=df_CityList['Longitude']
                                  )

Bayan Lepas
Bukit Mertajam
Butterworth
Bandaraya George Town
Juru
Kepala Batas
Nibong Tebal
Perai
Sungai Ara
Tanjung Tokong
Tasek Gelugor
Permatang Kuching


In [26]:
#Let's check the size of the resulting dataframe
print(PG_City_Venues.shape)
PG_City_Venues.head()

(252, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bayan Lepas,5.29513,100.26,Nasi Dalca Briyani,5.296738,100.259892,Malay Restaurant
1,Bayan Lepas,5.29513,100.26,Pasar Ramadhan Bayan Lepas,5.29533,100.262206,Food Truck
2,Bayan Lepas,5.29513,100.26,Cargas Cafe,5.294794,100.25838,Malay Restaurant
3,Bayan Lepas,5.29513,100.26,Fizzy Cafe Mee Udang Ketam,5.296352,100.259004,Asian Restaurant
4,Bayan Lepas,5.29513,100.26,Bangkok Tomyam,5.297739,100.263921,Thai Restaurant


In [28]:
#Check how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(PG_City_Venues['Venue Category'].unique())))

There are 90 unique categories.


In [30]:
#Analyse each Neighborhood
# one hot encoding
PG_City_Venues_onehot = pd.get_dummies(PG_City_Venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
PG_City_Venues_onehot['Neighborhood'] = PG_City_Venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [PG_City_Venues_onehot.columns[-1]] + list(PG_City_Venues_onehot.columns[:-1])
PG_City_Venues_onehot = PG_City_Venues_onehot[fixed_columns]

PG_City_Venues_onehot.head()

Unnamed: 0,Neighborhood,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Automotive Shop,BBQ Joint,Bakery,Bank,Basketball Court,Bistro,Boarding House,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dance Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Fishing Store,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,General Entertainment,Golf Course,Gourmet Shop,Government Building,Grocery Store,Gym,Halal Restaurant,Harbor / Marina,Health & Beauty Service,History Museum,Hotel,IT Services,Indian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Laundromat,Lawyer,Malay Restaurant,Mamak Restaurant,Market,Miscellaneous Shop,Music Venue,Nail Salon,Nightclub,Noodle House,Office,Outdoors & Recreation,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Pool,Restaurant,Salon / Barbershop,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Soccer Field,Sporting Goods Shop,Supermarket,Temple,Thai Restaurant,Theater,Trail,Train Station,Vegetarian / Vegan Restaurant,Vineyard
0,Bayan Lepas,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Bayan Lepas,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bayan Lepas,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bayan Lepas,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Bayan Lepas,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0


In [32]:
#Group rows by neighborhood, taking the mean of the frequency of occurrence of each category
PG_City_grouped = PG_City_Venues_onehot.groupby('Neighborhood').mean().reset_index()
PG_City_grouped.head()

Unnamed: 0,Neighborhood,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Automotive Shop,BBQ Joint,Bakery,Bank,Basketball Court,Bistro,Boarding House,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dance Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Fishing Store,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,General Entertainment,Golf Course,Gourmet Shop,Government Building,Grocery Store,Gym,Halal Restaurant,Harbor / Marina,Health & Beauty Service,History Museum,Hotel,IT Services,Indian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Laundromat,Lawyer,Malay Restaurant,Mamak Restaurant,Market,Miscellaneous Shop,Music Venue,Nail Salon,Nightclub,Noodle House,Office,Outdoors & Recreation,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Pool,Restaurant,Salon / Barbershop,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Soccer Field,Sporting Goods Shop,Supermarket,Temple,Thai Restaurant,Theater,Trail,Train Station,Vegetarian / Vegan Restaurant,Vineyard
0,Bandaraya George Town,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.060606,0.090909,0.0,0.060606,0.0,0.0,0.0,0.0,0.090909,0.030303,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.151515,0.0
1,Bayan Lepas,0.0,0.0,0.117647,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.235294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0
2,Bukit Mertajam,0.0,0.0,0.051282,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.230769,0.0,0.051282,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.025641,0.051282,0.0,0.076923,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.179487,0.0,0.025641,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.025641,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025641,0.0,0.0,0.025641,0.0,0.0,0.0
3,Butterworth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Juru,0.0,0.058824,0.176471,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.235294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
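
The one-hot encoding followed by `groupby(...).mean()` yields, for each neighborhood, the share of its venues that fall in each category. As a cross-check, `pd.crosstab` with `normalize='index'` should give the same shares in one step. The tiny dataframe below is made-up sample data, not the notebook's `PG_City_Venues`.

```python
import pandas as pd

# Made-up sample in the same two-column shape as PG_City_Venues.
sample = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Malay Restaurant", "Malay Restaurant", "Food Truck",
                       "Coffee Shop", "Coffee Shop", "Coffee Shop"],
})

# Approach used in the notebook: one-hot encode, then average per neighborhood.
onehot = pd.get_dummies(sample[["Venue Category"]], prefix="", prefix_sep="")
onehot["Neighborhood"] = sample["Neighborhood"]
grouped = onehot.groupby("Neighborhood").mean()

# Equivalent shortcut: a contingency table normalised across each row.
shares = pd.crosstab(sample["Neighborhood"], sample["Venue Category"],
                     normalize="index")

print(shares)
```

Each row of `shares` sums to 1, which is why cities with very few venues (such as Butterworth above, where every non-zero entry is 0.111111) can look more uniform than they really are.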


In [33]:
#function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [35]:
#Create a new dataframe and display the top 10 venues for each neighborhood.

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix defined for this position, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
PG_City_venues_sorted = pd.DataFrame(columns=columns)
PG_City_venues_sorted['Neighborhood'] = PG_City_grouped['Neighborhood']

for ind in np.arange(PG_City_grouped.shape[0]):
    PG_City_venues_sorted.iloc[ind, 1:] = return_most_common_venues(PG_City_grouped.iloc[ind, :], num_top_venues)

PG_City_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bandaraya George Town,Vegetarian / Vegan Restaurant,Dessert Shop,Hotel,Chinese Restaurant,Coffee Shop,Café,Dim Sum Restaurant,Nightclub,General Entertainment,Pizza Place
1,Bayan Lepas,Malay Restaurant,Food Truck,Asian Restaurant,Thai Restaurant,Convenience Store,Coffee Shop,Auto Garage,Restaurant,Japanese Restaurant,Outdoors & Recreation
2,Bukit Mertajam,Chinese Restaurant,Malay Restaurant,Food Truck,Food Court,Asian Restaurant,Coffee Shop,Grocery Store,Breakfast Spot,Food & Drink Shop,Health & Beauty Service
3,Butterworth,Coffee Shop,Food & Drink Shop,Business Service,Platform,Bubble Tea Shop,Malay Restaurant,Fast Food Restaurant,Farmers Market,Convenience Store,Diner
4,Juru,Chinese Restaurant,Asian Restaurant,Food & Drink Shop,Malay Restaurant,Arts & Crafts Store,Food Truck,Restaurant,Gym,Lake,Food
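
The column labels above rely on a three-element suffix list plus a fallback to 'th', which is enough for positions 1 to 10. If `num_top_venues` were ever raised past 20, labels such as '21th' would appear. A hypothetical helper that handles the general case could look like this (`ordinal` is not part of the notebook):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

# Rebuild the same column labels as in the cell above.
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```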


In [36]:
#Run k-means to cluster the neighborhood into 5 clusters.
# set number of clusters
kclusters = 5

PG_City_grouped_clustering = PG_City_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(PG_City_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 0, 2, 1, 2, 3, 2, 0, 4, 1], dtype=int32)
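
The number of clusters is fixed by hand at five here. One common, informal way to sanity-check such a choice is to look at how the k-means inertia (the within-cluster sum of squared distances) falls as k grows. The sketch below uses small made-up 2-D points rather than `PG_City_grouped_clustering`, and the helper name `inertia_by_k` is an assumption for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def inertia_by_k(data, k_values):
    """Fit k-means for each k and return {k: inertia}."""
    result = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
        result[k] = float(model.inertia_)
    return result

# Made-up points forming three tight groups.
points = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.1, 0.1],
    [10.0, 10.0], [10.1, 10.2], [10.2, 10.1], [10.1, 10.1],
    [20.0, 0.0], [20.1, 0.2], [20.2, 0.1], [20.1, 0.1],
])

inertias = inertia_by_k(points, range(1, 6))
for k, value in inertias.items():
    print(k, round(value, 3))
```

With only twelve cities, any such curve for the Penang data would be rough, so it is best read as a hint rather than a rule.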

In [37]:
#Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
#Rename the 'City' column of the initial dataframe 'df_CityList' so it can be joined on 'Neighborhood'
df_CityList.rename(columns = {'City':'Neighborhood'}, inplace = True)

#add clustering labels
PG_City_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

PG_City_merged = df_CityList

# join PG_City_venues_sorted onto the city list so each neighborhood keeps its latitude/longitude
PG_City_merged = PG_City_merged.join(PG_City_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

PG_City_merged.head()

Unnamed: 0,Neighborhood,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bayan Lepas,Penang,5.29513,100.26,0,Malay Restaurant,Food Truck,Asian Restaurant,Thai Restaurant,Convenience Store,Coffee Shop,Auto Garage,Restaurant,Japanese Restaurant,Outdoors & Recreation
1,Bukit Mertajam,Penang,5.36424,100.461,2,Chinese Restaurant,Malay Restaurant,Food Truck,Food Court,Asian Restaurant,Coffee Shop,Grocery Store,Breakfast Spot,Food & Drink Shop,Health & Beauty Service
2,Butterworth,Penang,5.39349,100.366,1,Coffee Shop,Food & Drink Shop,Business Service,Platform,Bubble Tea Shop,Malay Restaurant,Fast Food Restaurant,Farmers Market,Convenience Store,Diner
3,Bandaraya George Town,Penang,5.41457,100.33,2,Vegetarian / Vegan Restaurant,Dessert Shop,Hotel,Chinese Restaurant,Coffee Shop,Café,Dim Sum Restaurant,Nightclub,General Entertainment,Pizza Place
4,Juru,Penang,5.31572,100.438,2,Chinese Restaurant,Asian Restaurant,Food & Drink Shop,Malay Restaurant,Arts & Crafts Store,Food Truck,Restaurant,Gym,Lake,Food


In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(PG_City_merged['Latitude'], PG_City_merged['Longitude'], PG_City_merged['Neighborhood'], PG_City_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # index directly by the 0-based cluster label
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters


In [50]:
#Examine Cluster 0
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 0, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Penang,Bayan Lepas,Malay Restaurant,Food Truck,Asian Restaurant,Thai Restaurant,Convenience Store,Coffee Shop,Auto Garage,Restaurant,Japanese Restaurant,Outdoors & Recreation
7,Penang,Perai,Malay Restaurant,Fast Food Restaurant,BBQ Joint,Coffee Shop,Lawyer,Indian Restaurant,Playground,Basketball Court,Harbor / Marina,Miscellaneous Shop
10,Penang,Tasek Gelugor,Department Store,Coffee Shop,Asian Restaurant,Clothing Store,Malay Restaurant,Flea Market,Bistro,Restaurant,Farmers Market,Dessert Shop


In [51]:
#Examine Cluster 1
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 1, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Penang,Butterworth,Coffee Shop,Food & Drink Shop,Business Service,Platform,Bubble Tea Shop,Malay Restaurant,Fast Food Restaurant,Farmers Market,Convenience Store,Diner
8,Penang,Sungai Ara,Asian Restaurant,Chinese Restaurant,Seafood Restaurant,Coffee Shop,Convenience Store,Thai Restaurant,Cosmetics Shop,Government Building,Laundromat,Pizza Place
9,Penang,Tanjung Tokong,Japanese Restaurant,Coffee Shop,Korean Restaurant,Café,Asian Restaurant,Food Truck,Burger Joint,Convenience Store,Cosmetics Shop,Chinese Restaurant


In [52]:
#Examine Cluster 2
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 2, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Penang,Bukit Mertajam,Chinese Restaurant,Malay Restaurant,Food Truck,Food Court,Asian Restaurant,Coffee Shop,Grocery Store,Breakfast Spot,Food & Drink Shop,Health & Beauty Service
3,Penang,Bandaraya George Town,Vegetarian / Vegan Restaurant,Dessert Shop,Hotel,Chinese Restaurant,Coffee Shop,Café,Dim Sum Restaurant,Nightclub,General Entertainment,Pizza Place
4,Penang,Juru,Chinese Restaurant,Asian Restaurant,Food & Drink Shop,Malay Restaurant,Arts & Crafts Store,Food Truck,Restaurant,Gym,Lake,Food
6,Penang,Nibong Tebal,Food Court,Noodle House,Chinese Restaurant,Vineyard,Bakery,Coffee Shop,Fast Food Restaurant,Bank,Seafood Restaurant,Supermarket


In [53]:
#Examine Cluster 3
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 3, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Penang,Kepala Batas,Indian Restaurant,Athletics & Sports,Diner,Malay Restaurant,Bank,Breakfast Spot,Vineyard,Fast Food Restaurant,Dessert Shop,Dim Sum Restaurant


In [54]:
#Examine Cluster 4
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 4, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Penang,Permatang Kuching,Golf Course,Asian Restaurant,Malay Restaurant,Breakfast Spot,Skate Park,Vineyard,Farmers Market,Department Store,Dessert Shop,Dim Sum Restaurant
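
To summarise the five tables above, the cities can be counted per cluster. The mapping below is typed in by hand from those tables, and the variable names are illustrative.

```python
from collections import Counter

# Cluster label of each city, copied from the cluster tables above.
city_cluster = {
    "Bayan Lepas": 0, "Perai": 0, "Tasek Gelugor": 0,
    "Butterworth": 1, "Sungai Ara": 1, "Tanjung Tokong": 1,
    "Bukit Mertajam": 2, "Bandaraya George Town": 2, "Juru": 2, "Nibong Tebal": 2,
    "Kepala Batas": 3,
    "Permatang Kuching": 4,
}

cluster_sizes = Counter(city_cluster.values())
for label in sorted(cluster_sizes):
    print("Cluster {}: {} cities".format(label, cluster_sizes[label]))
```

In the notebook itself, the same counts should be available from `PG_City_merged['Cluster Labels'].value_counts()`. Clusters 3 and 4 each contain a single city, so they describe outliers rather than groups.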
