# Capstone Project - The Battle of the Neighborhoods (Week 2)

## Applied Data Science Capstone by IBM/Coursera

## Introduction:
The aim of this project is to explore the cities in Penang state and their common venues. Analysing the common venues around each city shows which types of business are popular there.

## Problem description:
Penang (also known as Pulau Pinang) is a famous tourist state in Malaysia. I frequently visit Penang
during long holidays, and one of my main intentions is to enjoy good food. This led me to think about
the food business in and around the cities of Penang. I decided to run a clustering analysis on the most
common venues in the different cities of Penang, since the most common venues attract visitors and
those visitors are the source of business for food outlets.

## Target audience:
In the future, this approach could be offered as a service that helps food business entrepreneurs
understand which category of food business is preferred in each city of Penang.

## Data description:
I will use the page at https://www.weatherdatasource.com/MY-Pulau_Pinang to get the
list of cities in Penang. I then use this list to retrieve the latitude and longitude of each city
with Geopy. With the City, Latitude and Longitude data, the 'Venue Category' of nearby venues can be
retrieved using the FourSquare API. From the JSON data returned by the FourSquare API, I will extract
the top 20 most common venue categories for each city and use k-means to cluster the cities by their
venue categories. The city clusters will be presented on a Folium map.
The clusters show which food business categories are most common in each city, so that a blue ocean
strategy can be applied to avoid stiff competition (too many outlets of the same food category in the same city area).

In [55]:
#import library
!pip install folium
!pip install geopy

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np # library to handle data in a vectorized manner

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# import request and BeautifulSoup
import requests
from bs4 import BeautifulSoup

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
from folium import plugins
from folium.plugins import MarkerCluster

print('Libraries imported.')

Libraries imported.


In [56]:
#scrape weather website for city list in Pulau Pinang
url = 'https://www.weatherdatasource.com/MY-Pulau_Pinang'
website_text = requests.get(url).text
soup = BeautifulSoup(website_text,'xml')
table_contents = []
divs = soup.find_all("ul", {"class": "list-group w-100 align-items-stretch"})
                     
for ultag in divs:
    for litag in ultag.find_all('li'):
        cell = {}
        cell['City']=litag.text
        cell['State']='Penang'
        table_contents.append(cell)
#print(table_contents)
df_CityList=pd.DataFrame(table_contents).sort_values(by='City')
#df_CityList

In [57]:
#Correction on "Tasek Glugor" to "Tasek Gelugor" and "George Town" to "Bandaraya George Town"
#remove "Permatang Kuching" as unable to get City with Geopy
df_CityList.loc[(df_CityList.City == 'Tasek Glugor'),'City']='Tasek Gelugor'
df_CityList.loc[(df_CityList.City == 'George Town'),'City']='Bandaraya George Town'
df_CityList.drop(df_CityList.loc[df_CityList['City']=='Permatang Kuching'].index, inplace=True)
df_CityList

Unnamed: 0,City,State
0,Bayan Lepas,Penang
1,Bukit Mertajam,Penang
2,Butterworth,Penang
3,Bandaraya George Town,Penang
4,Juru,Penang
5,Kepala Batas,Penang
6,Nibong Tebal,Penang
7,Perai,Penang
9,Sungai Ara,Penang
10,Tanjung Tokong,Penang


In [58]:
#Loop through df_CityList, geocode each city and add its latitude and longitude to the city list dataframe
list_lat = []   # create empty lists
list_long = []
geolocator = Nominatim(user_agent="ny_explorer")  # create the geocoder once, outside the loop
for index, row in df_CityList.iterrows():
    address = str(row['City'])+', '+str(row['State'])
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude

    list_lat.append(latitude)
    list_long.append(longitude)

#create new columns from lists
df_CityList['Latitude'] = list_lat
df_CityList['Longitude'] = list_long

In [59]:
#Geopy is unable to find "Permatang Kuching", hence I add its coordinates manually (as numbers, not strings).
new_row = {'City':'Permatang Kuching', 'State':'Penang', 'Latitude':5.46339, 'Longitude':100.381}
#append row to the dataframe (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
df_CityList = pd.concat([df_CityList, pd.DataFrame([new_row])], ignore_index=True)
df_CityList

Unnamed: 0,City,State,Latitude,Longitude
0,Bayan Lepas,Penang,5.29513,100.26
1,Bukit Mertajam,Penang,5.36424,100.461
2,Butterworth,Penang,5.39349,100.366
3,Bandaraya George Town,Penang,5.41457,100.33
4,Juru,Penang,5.31572,100.438
5,Kepala Batas,Penang,5.51438,100.436
6,Nibong Tebal,Penang,5.1701,100.479
7,Perai,Penang,5.38708,100.382
8,Sungai Ara,Penang,5.32088,100.268
9,Tanjung Tokong,Penang,5.45111,100.305
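
The manual fix for "Permatang Kuching" can also be folded into the lookup step itself. The following is a minimal sketch, not the code used in this notebook: `geocode_cities`, `fake_lookup` and the `manual` dictionary are hypothetical names, and the lookup is passed in as a function so the example runs without network access.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coordinates = Tuple[float, float]


def geocode_cities(
    cities: List[str],
    state: str,
    lookup: Callable[[str], Optional[Coordinates]],
    manual: Dict[str, Coordinates],
) -> List[dict]:
    """Return one record per city, using `manual` when `lookup` finds nothing."""
    records = []
    for city in cities:
        coords = lookup(f"{city}, {state}")
        if coords is None:
            # Fall back to hand-entered coordinates; stays None if absent.
            coords = manual.get(city)
        lat, lng = coords if coords is not None else (None, None)
        records.append(
            {"City": city, "State": state, "Latitude": lat, "Longitude": lng}
        )
    return records


# Stand-in for a real geocoder so the sketch runs offline (hypothetical helper).
def fake_lookup(address: str) -> Optional[Coordinates]:
    known = {"Juru, Penang": (5.31572, 100.438)}
    return known.get(address)


rows = geocode_cities(
    ["Juru", "Permatang Kuching"],
    "Penang",
    fake_lookup,
    manual={"Permatang Kuching": (5.46339, 100.381)},
)
print(rows)
```

With Geopy, `lookup` would wrap `geolocator.geocode(address)` and return `None` when no location is found, so a missing city no longer raises an `AttributeError` inside the loop.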


In [60]:
#Create a map of Penang with City superimposed on top.
#Create map of Penang using latitude and longitude values
map_Penang = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, city in zip(df_CityList['Latitude'], df_CityList['Longitude'], df_CityList['City']):
    label = '{}'.format(city)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Penang)  
    
map_Penang

## Methodology 

In this project I focus on the 20 most common venue categories in each city.

In the first step I collect the required data: the location and type (category) of the venues (not only restaurants) within 1000 m of each city center.
In the second step I one-hot encode the venue categories and group them by city, so that each city is described by the relative frequency of each category, and I list the 20 most common categories per city.
Finally, the cities are clustered according to these category frequencies, which lets us see what kind of business is popular in each area.
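
To make the encoding and grouping steps concrete, here is a small sketch on toy data. The neighborhoods "A" and "B" and their venues are made up for illustration and are not Foursquare output.

```python
import pandas as pd

# Toy data (hypothetical venues, not Foursquare output).
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Malay Restaurant", "Hotel",
                       "Malay Restaurant", "Food Truck"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of the indicators is the share of each category in a neighborhood.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1, so neighborhoods with very different venue counts remain comparable when they are clustered.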

In [61]:
# The code was removed by Watson Studio for sharing.

In [62]:
#Define function to get venues around 1000 meters of city
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
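
The list comprehension in `getNearbyVenues` assumes that every item has at least one category and that the response always contains a group. A hedged sketch of a more defensive extraction step follows; `extract_venues` is a hypothetical helper, and the sample payload is hand-written to mirror only the keys that `getNearbyVenues` reads, not a full API response.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten an explore-style response into rows, skipping incomplete items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # no category or coordinates: nothing to analyse
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Hand-written payload; the second venue is hypothetical and has no category.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cargas Cafe",
               "location": {"lat": 5.294794, "lng": 100.25838},
               "categories": [{"name": "Malay Restaurant"}]}},
    {"venue": {"name": "Uncategorised place",
               "location": {"lat": 5.29, "lng": 100.26},
               "categories": []}},
]}]}}

print(extract_venues(sample, "Bayan Lepas", 5.29513, 100.26))
print(extract_venues({"response": {}}, "Bayan Lepas", 5.29513, 100.26))
```

Inside `getNearbyVenues`, the result of `requests.get(url).json()` would be passed as `payload`, so that a city with no venues yields an empty list instead of an `IndexError`.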

In [63]:
# Call function of getNearbyVenues to get list of 100 venues around 1000 meters of City
PG_City_Venues = getNearbyVenues(names=df_CityList['City'],
                                   latitudes=df_CityList['Latitude'],
                                   longitudes=df_CityList['Longitude']
                                  )

Bayan Lepas
Bukit Mertajam
Butterworth
Bandaraya George Town
Juru
Kepala Batas
Nibong Tebal
Perai
Sungai Ara
Tanjung Tokong
Tasek Gelugor
Permatang Kuching


## Analysis
The details of each analysis step are given in the comments of the corresponding cell.

In [64]:
#Let's check the size of the resulting dataframe
print(PG_City_Venues.shape)
PG_City_Venues.head()

(508, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bayan Lepas,5.29513,100.26,Nasi Dalca Briyani,5.296738,100.259892,Malay Restaurant
1,Bayan Lepas,5.29513,100.26,Bawal Goreng Pokok Cheri,5.299255,100.262345,Malay Restaurant
2,Bayan Lepas,5.29513,100.26,Cargas Cafe,5.294794,100.25838,Malay Restaurant
3,Bayan Lepas,5.29513,100.26,Pasar Ramadhan Bayan Lepas,5.29533,100.262206,Food Truck
4,Bayan Lepas,5.29513,100.26,Fizzy Cafe Mee Udang Ketam,5.296352,100.259004,Asian Restaurant


In [65]:
#Check how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(PG_City_Venues['Venue Category'].unique())))

There are 127 unique categories.


In [66]:
#Analyse each Neighborhood
# one hot encoding
PG_City_Venues_onehot = pd.get_dummies(PG_City_Venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
PG_City_Venues_onehot['Neighborhood'] = PG_City_Venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [PG_City_Venues_onehot.columns[-1]] + list(PG_City_Venues_onehot.columns[:-1])
PG_City_Venues_onehot = PG_City_Venues_onehot[fixed_columns]

PG_City_Venues_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Lounge,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Basketball Court,Bed & Breakfast,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Station,Business Service,Café,Camera Store,Casino,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Dongbei Restaurant,Eastern European Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Gas Station,Golf Course,Gourmet Shop,Government Building,Grocery Store,Gym,Gymnastics Gym,Hainan Restaurant,Halal Restaurant,Harbor / Marina,Health & Beauty Service,History Museum,Home Service,Hookah Bar,Hostel,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Laundromat,Lottery Retailer,Lounge,Malay Restaurant,Mamak Restaurant,Market,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Motorcycle Shop,Night Market,Nightclub,Noodle House,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Pool,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,Sandwich Place,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Soup Place,Spa,Street Art,Supermarket,Tea Room,Temple,Thai Restaurant,Theater,Toll Plaza,Tourist Information Center,Track,Train Station,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Yoga Studio
0,Bayan Lepas,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Bayan Lepas,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bayan Lepas,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bayan Lepas,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Bayan Lepas,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [67]:
#Group rows by neighborhood, taking the mean of the frequency of occurrence of each category
PG_City_grouped = PG_City_Venues_onehot.groupby('Neighborhood').mean().reset_index()
PG_City_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Lounge,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Basketball Court,Bed & Breakfast,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Station,Business Service,Café,Camera Store,Casino,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Dongbei Restaurant,Eastern European Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Gas Station,Golf Course,Gourmet Shop,Government Building,Grocery Store,Gym,Gymnastics Gym,Hainan Restaurant,Halal Restaurant,Harbor / Marina,Health & Beauty Service,History Museum,Home Service,Hookah Bar,Hostel,Hotel,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Laundromat,Lottery Retailer,Lounge,Malay Restaurant,Mamak Restaurant,Market,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Motorcycle Shop,Night Market,Nightclub,Noodle House,Outdoors & Recreation,Paper / Office Supplies Store,Park,Pastry Shop,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Pool,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,Sandwich Place,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Soup Place,Spa,Street Art,Supermarket,Tea Room,Temple,Thai Restaurant,Theater,Toll Plaza,Tourist Information Center,Track,Train Station,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Yoga Studio
0,Bandaraya George Town,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.07,0.0,0.01,0.0,0.03,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.11,0.0,0.0,0.08,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.06,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.09,0.01,0.0,0.02,0.01,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.02,0.0
1,Bayan Lepas,0.021277,0.021277,0.021277,0.021277,0.0,0.0,0.0,0.085106,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12766,0.0,0.0,0.0,0.0,0.085106,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.042553,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.106383,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.021277,0.0,0.042553,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.042553,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0
2,Bukit Mertajam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.022222,0.0,0.022222,0.0,0.0,0.155556,0.0,0.066667,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.088889,0.088889,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Butterworth,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.03125,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.03125,0.03125,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.09375,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0
4,Juru,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.142857,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.035714,0.0,0.035714,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.178571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [68]:
#function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [78]:
#New dataframe and display the top 20 venues for each neighborhood.

num_top_venues = 20

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # beyond the 3rd column, fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
PG_City_venues_sorted = pd.DataFrame(columns=columns)
PG_City_venues_sorted['Neighborhood'] = PG_City_grouped['Neighborhood']

for ind in np.arange(PG_City_grouped.shape[0]):
    PG_City_venues_sorted.iloc[ind, 1:] = return_most_common_venues(PG_City_grouped.iloc[ind, :], num_top_venues)

PG_City_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Bandaraya George Town,Café,Hotel,Chinese Restaurant,Vegetarian / Vegan Restaurant,Bakery,Food Truck,Dessert Shop,Bed & Breakfast,Halal Restaurant,Wine Bar,Hostel,Indian Restaurant,Japanese Restaurant,Malay Restaurant,Coffee Shop,Restaurant,Noodle House,Snack Place,Pastry Shop,Art Museum
1,Bayan Lepas,Café,Malay Restaurant,Asian Restaurant,Coffee Shop,Convenience Store,Pharmacy,Food Truck,Indian Restaurant,Thai Restaurant,Pastry Shop,Outdoors & Recreation,Noodle House,Accessories Store,Pool,Fast Food Restaurant,Japanese Restaurant,Hotel,History Museum,Gymnastics Gym,Pizza Place
2,Bukit Mertajam,Chinese Restaurant,Malay Restaurant,Food Court,Food Truck,Coffee Shop,Asian Restaurant,Electronics Store,Pizza Place,Market,Department Store,Café,Bus Station,Bubble Tea Shop,Restaurant,Breakfast Spot,Flower Shop,Eastern European Restaurant,Shoe Store,Noodle House,Fried Chicken Joint
3,Butterworth,Asian Restaurant,Harbor / Marina,Malay Restaurant,Coffee Shop,Thai Restaurant,Electronics Store,Burger Joint,Business Service,Platform,Pier,Convenience Store,Hookah Bar,Farmers Market,Restaurant,Fast Food Restaurant,Flea Market,Food & Drink Shop,Food Court,Indian Restaurant,Gym
4,Juru,Malay Restaurant,Chinese Restaurant,Asian Restaurant,Food Truck,Breakfast Spot,Shipping Store,Hotel,Comfort Food Restaurant,Coffee Shop,Food Court,Food,Park,Fast Food Restaurant,Juice Bar,Farmers Market,Café,Lake,Yoga Studio,Flea Market,Eastern European Restaurant
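
The `try`/`except` approach to the column suffixes works for the 20 columns used here, but it would label a 21st column "21th". If more columns are ever needed, a small helper such as the hypothetical `ordinal` below is one way to build the names; this is a sketch, not part of the original notebook.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 20
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[:4], columns[-1])
```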


In [79]:
#Run k-means to cluster the neighborhood into 5 clusters.
# set number of clusters
kclusters = 5

PG_City_grouped_clustering = PG_City_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(PG_City_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 2, 0, 2, 0, 4, 0, 1, 2], dtype=int32)
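
The choice of 5 clusters is fixed by hand in this notebook. One common way to compare candidate values of k is the silhouette score. The sketch below is an illustration on synthetic points, not a result for the Penang data; `silhouette_by_k` is a hypothetical helper, and the three toy groups are generated only so that the example has a known answer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic data: three tight, well-separated groups of 10 points each.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + 0.1 * rng.randn(10, 2) for c in centers])

scores = silhouette_by_k(X, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

For this notebook, `X` would be `PG_City_grouped_clustering`. The silhouette score needs at least 2 clusters and fewer clusters than samples, so with 12 cities k can range from 2 to 11.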

In [80]:
#create a new dataframe that includes the cluster as well as the top 20 venues for each neighborhood.
#Rename the 'City' column of the initial dataframe 'df_CityList' so it can be joined on 'Neighborhood'
df_CityList.rename(columns = {'City':'Neighborhood'}, inplace = True)

#add clustering labels
PG_City_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

PG_City_merged = df_CityList

# join PG_City_venues_sorted onto df_CityList so each neighborhood has latitude/longitude, cluster label and top venues
PG_City_merged = PG_City_merged.join(PG_City_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

PG_City_merged.head()

Unnamed: 0,Neighborhood,State,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Bayan Lepas,Penang,5.29513,100.26,0,Café,Malay Restaurant,Asian Restaurant,Coffee Shop,Convenience Store,Pharmacy,Food Truck,Indian Restaurant,Thai Restaurant,Pastry Shop,Outdoors & Recreation,Noodle House,Accessories Store,Pool,Fast Food Restaurant,Japanese Restaurant,Hotel,History Museum,Gymnastics Gym,Pizza Place
1,Bukit Mertajam,Penang,5.36424,100.461,2,Chinese Restaurant,Malay Restaurant,Food Court,Food Truck,Coffee Shop,Asian Restaurant,Electronics Store,Pizza Place,Market,Department Store,Café,Bus Station,Bubble Tea Shop,Restaurant,Breakfast Spot,Flower Shop,Eastern European Restaurant,Shoe Store,Noodle House,Fried Chicken Joint
2,Butterworth,Penang,5.39349,100.366,0,Asian Restaurant,Harbor / Marina,Malay Restaurant,Coffee Shop,Thai Restaurant,Electronics Store,Burger Joint,Business Service,Platform,Pier,Convenience Store,Hookah Bar,Farmers Market,Restaurant,Fast Food Restaurant,Flea Market,Food & Drink Shop,Food Court,Indian Restaurant,Gym
3,Bandaraya George Town,Penang,5.41457,100.33,0,Café,Hotel,Chinese Restaurant,Vegetarian / Vegan Restaurant,Bakery,Food Truck,Dessert Shop,Bed & Breakfast,Halal Restaurant,Wine Bar,Hostel,Indian Restaurant,Japanese Restaurant,Malay Restaurant,Coffee Shop,Restaurant,Noodle House,Snack Place,Pastry Shop,Art Museum
4,Juru,Penang,5.31572,100.438,2,Malay Restaurant,Chinese Restaurant,Asian Restaurant,Food Truck,Breakfast Spot,Shipping Store,Hotel,Comfort Food Restaurant,Coffee Shop,Food Court,Food,Park,Fast Food Restaurant,Juice Bar,Farmers Market,Café,Lake,Yoga Studio,Flea Market,Eastern European Restaurant


In [81]:
# create map of the Cluster
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(PG_City_merged['Latitude'], PG_City_merged['Longitude'], PG_City_merged['Neighborhood'], PG_City_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [82]:
#Examine Cluster 0
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 0, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
0,Penang,Bayan Lepas,Café,Malay Restaurant,Asian Restaurant,Coffee Shop,Convenience Store,Pharmacy,Food Truck,Indian Restaurant,Thai Restaurant,Pastry Shop,Outdoors & Recreation,Noodle House,Accessories Store,Pool,Fast Food Restaurant,Japanese Restaurant,Hotel,History Museum,Gymnastics Gym,Pizza Place
2,Penang,Butterworth,Asian Restaurant,Harbor / Marina,Malay Restaurant,Coffee Shop,Thai Restaurant,Electronics Store,Burger Joint,Business Service,Platform,Pier,Convenience Store,Hookah Bar,Farmers Market,Restaurant,Fast Food Restaurant,Flea Market,Food & Drink Shop,Food Court,Indian Restaurant,Gym
3,Penang,Bandaraya George Town,Café,Hotel,Chinese Restaurant,Vegetarian / Vegan Restaurant,Bakery,Food Truck,Dessert Shop,Bed & Breakfast,Halal Restaurant,Wine Bar,Hostel,Indian Restaurant,Japanese Restaurant,Malay Restaurant,Coffee Shop,Restaurant,Noodle House,Snack Place,Pastry Shop,Art Museum
5,Penang,Kepala Batas,Asian Restaurant,Mobile Phone Shop,Breakfast Spot,Indian Restaurant,Burger Joint,Café,Boutique,Convenience Store,Malay Restaurant,Chinese Restaurant,Fast Food Restaurant,Park,Pizza Place,Camera Store,Kids Store,Diner,Bubble Tea Shop,Motorcycle Shop,Accessories Store,Middle Eastern Restaurant
7,Penang,Perai,Malay Restaurant,Diner,Coffee Shop,Asian Restaurant,Chinese Restaurant,Indian Restaurant,Vegetarian / Vegan Restaurant,Furniture / Home Store,Food Truck,Food Court,Fast Food Restaurant,Basketball Court,Convenience Store,Hotel,Casino,Café,Sandwich Place,Seafood Restaurant,Accessories Store,Thai Restaurant
9,Penang,Tanjung Tokong,Japanese Restaurant,Coffee Shop,Korean Restaurant,Café,Malay Restaurant,Chinese Restaurant,Asian Restaurant,Food Truck,Convenience Store,Fast Food Restaurant,Department Store,Market,Cosmetics Shop,Bookstore,Dongbei Restaurant,Paper / Office Supplies Store,Park,Lounge,Pet Store,Pharmacy


In [83]:
#Examine Cluster 1
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 1, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
11,Penang,Permatang Kuching,Golf Course,Malay Restaurant,Diner,Asian Restaurant,Water Park,Soccer Field,Night Market,Breakfast Spot,Flea Market,Fast Food Restaurant,Farmers Market,Yoga Studio,Electronics Store,Food,Eastern European Restaurant,Dongbei Restaurant,Dessert Shop,Department Store,Flower Shop,Food Court


In [84]:
#Examine Cluster 2
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 2, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
1,Penang,Bukit Mertajam,Chinese Restaurant,Malay Restaurant,Food Court,Food Truck,Coffee Shop,Asian Restaurant,Electronics Store,Pizza Place,Market,Department Store,Café,Bus Station,Bubble Tea Shop,Restaurant,Breakfast Spot,Flower Shop,Eastern European Restaurant,Shoe Store,Noodle House,Fried Chicken Joint
4,Penang,Juru,Malay Restaurant,Chinese Restaurant,Asian Restaurant,Food Truck,Breakfast Spot,Shipping Store,Hotel,Comfort Food Restaurant,Coffee Shop,Food Court,Food,Park,Fast Food Restaurant,Juice Bar,Farmers Market,Café,Lake,Yoga Studio,Flea Market,Eastern European Restaurant
8,Penang,Sungai Ara,Malay Restaurant,Asian Restaurant,Coffee Shop,Chinese Restaurant,Convenience Store,Seafood Restaurant,Bakery,Farmers Market,Restaurant,Resort,Yoga Studio,Laundromat,Japanese Restaurant,Pizza Place,Pharmacy,Noodle House,Playground,Cosmetics Shop,Grocery Store,Vegetarian / Vegan Restaurant


In [85]:
#Examine Cluster 3
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 3, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
10,Penang,Tasek Gelugor,Asian Restaurant,Flea Market,Department Store,Pharmacy,Restaurant,Bistro,Food Truck,Basketball Court,Malay Restaurant,Clothing Store,Train Station,Coffee Shop,Dongbei Restaurant,Flower Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Diner,Dessert Shop,Eastern European Restaurant


In [86]:
#Examine Cluster 4
PG_City_merged.loc[PG_City_merged['Cluster Labels'] == 4, PG_City_merged.columns[[1] + [0] + list(range(5, PG_City_merged.shape[1]))]]

Unnamed: 0,State,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue,16th Most Common Venue,17th Most Common Venue,18th Most Common Venue,19th Most Common Venue,20th Most Common Venue
6,Penang,Nibong Tebal,Noodle House,Seafood Restaurant,Food Court,Café,Chinese Restaurant,Coffee Shop,Bakery,Asian Restaurant,Fast Food Restaurant,Restaurant,Sandwich Place,Flea Market,Bank,Supermarket,Breakfast Spot,Athletics & Sports,Thai Restaurant,Market,Electronics Store,Eastern European Restaurant


## Results and Discussion

The analysis shows that restaurants and other food outlets are among the most common venues in every city.
Cluster 0 represents the major cities in Penang, which are better known than the other cities. These could be good spots for a food truck business.
Cluster 1 is near a military base and is not a common place for tourists to visit.
Clusters 2 and 3 are considered outskirts rather than tourist hotspots; local restaurants, which residents visit frequently, are among the top venues there.
Cluster 4 is far from the center of Penang and close to a fishing village, hence seafood restaurants are popular in that area.
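
The cluster tables can also be condensed into a short summary. The sketch below re-enters the cluster label and the 1st most common venue of each city by hand from the cluster tables in the Analysis section, then counts the cluster sizes and cross-tabulates the top category per cluster. In the notebook itself, the same two calls could be applied to `PG_City_merged` directly.

```python
import pandas as pd

# Cluster label and 1st most common venue per city, copied from the cluster tables.
summary = pd.DataFrame({
    "Neighborhood": [
        "Bayan Lepas", "Butterworth", "Bandaraya George Town", "Kepala Batas",
        "Perai", "Tanjung Tokong", "Permatang Kuching", "Bukit Mertajam",
        "Juru", "Sungai Ara", "Tasek Gelugor", "Nibong Tebal",
    ],
    "Cluster Labels": [0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3, 4],
    "1st Most Common Venue": [
        "Café", "Asian Restaurant", "Café", "Asian Restaurant",
        "Malay Restaurant", "Japanese Restaurant", "Golf Course",
        "Chinese Restaurant", "Malay Restaurant", "Malay Restaurant",
        "Asian Restaurant", "Noodle House",
    ],
})

# Number of cities in each cluster.
sizes = summary.groupby("Cluster Labels").size()

# How often each category is the top venue within each cluster.
top_by_cluster = pd.crosstab(summary["Cluster Labels"],
                             summary["1st Most Common Venue"])
print(sizes)
print(top_by_cluster)
```

The sizes show that clusters 1, 3 and 4 each contain a single city, so the interpretation of those clusters rests on one city each.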


## Conclusion

The purpose of this project was to identify the common venues in each city area of Penang. Because Penang is a tourist state, these venues are the main attractions, and with crowds of tourists food becomes a necessity, which presents an opportunity to open restaurants. The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every city zone.