# Find potential locations for a new Family Mart store

## Introduction

Family Mart is a Japanese convenience store franchise chain. It is planning to expand its franchise to Bangkok, where 7-Eleven is the leading convenience store chain. To find potential locations for new Family Mart stores, k-means clustering will be used to segment and cluster neighborhoods in Bangkok based on nearby venue categories. Within each cluster of similar neighborhoods, those with fewer 7-Eleven stores will then be selected for further exploration.
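The selection rule described above can be sketched on toy data. This is a minimal illustration, not the notebook's final analysis: the neighborhood names, cluster labels, and store counts are made up, and the helper `lowest_per_cluster` is a hypothetical name introduced only for this example.

```python
import pandas as pd

# Toy data: names, cluster labels and store counts are illustrative only.
toy = pd.DataFrame({
    "Neighborhood": ["A", "B", "C", "D", "E"],
    "Cluster Labels": [0, 0, 0, 1, 1],
    "7-Eleven": [26, 9, 19, 30, 14],
})

def lowest_per_cluster(df, n=1):
    """Return the n neighborhoods with the fewest 7-Eleven stores in each cluster."""
    return (df.sort_values("7-Eleven")            # fewest stores first
              .groupby("Cluster Labels")
              .head(n)                            # keep the first n rows of every cluster
              .sort_values(["Cluster Labels", "7-Eleven"])
              .reset_index(drop=True))

candidates = lowest_per_cluster(toy)
print(candidates)
```

Raising `n` returns a short list per cluster instead of a single candidate.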

## Data

The data required for this project are the Bangkok neighborhoods, the nearby venues in each neighborhood, and the number of 7-Eleven stores in each neighborhood. The neighborhood data, which contains each neighborhood's name and location, can be obtained from Wikipedia, while nearby venues and the number of 7-Eleven stores in each neighborhood can be retrieved with the Foursquare API.

To count the 7-Eleven stores in a neighborhood accurately, the radius parameter should be specified in the Foursquare request. The radius will therefore be calculated from the area of each neighborhood, treating the neighborhood as a circle of equal area. The area data will be obtained from the Bangkok database website.

## 1. Scraping the neighborhood table from Wikipedia and exploring the dataset

Install the pandas and lxml libraries

In [1]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install pandas 
import pandas as pd

# import lxml.html to read data from html
!{sys.executable} -m pip install lxml
import lxml.html



Scrape Bangkok neighborhood data from Wikipedia

In [2]:
path = 'https://en.wikipedia.org/wiki/List_of_districts_of_Bangkok'
Bangkok_temp = pd.read_html(path)
Bangkok_data = Bangkok_temp[0]
Bangkok_data.head()

Unnamed: 0,District(Khet),MapNr,Post-code,Thai,Popu-lation,No. ofSubdis-trictsKhwaeng,Latitude,Longitude
0,Bang Bon,50,10150,บางบอน,105161,4,13.6592,100.3991
1,Bang Kapi,6,10240,บางกะปิ,148465,2,13.765833,100.647778
2,Bang Khae,40,10160,บางแค,191781,4,13.696111,100.409444
3,Bang Khen,5,10220,บางเขน,189539,2,13.873889,100.596389
4,Bang Kho Laem,31,10120,บางคอแหลม,94956,3,13.693333,100.5025


Drop irrelevant columns

In [3]:
Bangkok_data.drop(columns=['MapNr'], inplace=True)
Bangkok_data.head()

Unnamed: 0,District(Khet),Post-code,Thai,Popu-lation,No. ofSubdis-trictsKhwaeng,Latitude,Longitude
0,Bang Bon,10150,บางบอน,105161,4,13.6592,100.3991
1,Bang Kapi,10240,บางกะปิ,148465,2,13.765833,100.647778
2,Bang Khae,10160,บางแค,191781,4,13.696111,100.409444
3,Bang Khen,10220,บางเขน,189539,2,13.873889,100.596389
4,Bang Kho Laem,10120,บางคอแหลม,94956,3,13.693333,100.5025


Rename columns

In [4]:
Bangkok_data.rename(columns={'District(Khet)': 'Neighborhood',
                             'Post-code': 'PostalCode',
                             'Popu-lation': 'Population', 
                             'No. ofSubdis-trictsKhwaeng': 'No.ofSub'},
                    inplace=True)
Bangkok_data.head()

Unnamed: 0,Neighborhood,PostalCode,Thai,Population,No.ofSub,Latitude,Longitude
0,Bang Bon,10150,บางบอน,105161,4,13.6592,100.3991
1,Bang Kapi,10240,บางกะปิ,148465,2,13.765833,100.647778
2,Bang Khae,10160,บางแค,191781,4,13.696111,100.409444
3,Bang Khen,10220,บางเขน,189539,2,13.873889,100.596389
4,Bang Kho Laem,10120,บางคอแหลม,94956,3,13.693333,100.5025


In [5]:
Bangkok_data.tail(15)

Unnamed: 0,Neighborhood,PostalCode,Thai,Population,No.ofSub,Latitude,Longitude
35,Prawet,10250,ประเวศ,160671,3,13.716944,100.694444
36,Rat Burana,10140,ราษฏร์บูรณะ,86695,2,13.682222,100.505556
37,Ratchathewi,10400,ราชเทวี,73035,4,13.758889,100.534444
38,Sai Mai,10220,สายไหม,188123,3,13.919167,100.645833
39,Samphanthawong,10100,สัมพันธวงศ์,27452,3,13.731389,100.514167
40,Saphan Sung,10240,สะพานสูง,89825,3,13.77,100.684722
41,Sathon,10120,สาทร,84916,3,13.708056,100.526389
42,Suan Luang,10250,สวนหลวง,115658,3,13.730278,100.651389
43,Taling Chan,10170,ตลิ่งชัน,106604,6,13.776944,100.456667
44,Thawi Watthana,10170,ทวีวัฒนา,76351,2,13.7878,100.3638


In [6]:
Bangkok_data.shape

(50, 7)

To count the 7-Eleven stores in each neighborhood, the search needs a radius limit. The radius can be calculated from the neighborhood's area, which can be obtained from the Bangkok database website.

In [7]:
url = 'http://www.e-report.energy.go.th/area/Bangkok.htm'
Temp_area = pd.read_html(url)
Bangkok_area = Temp_area[0]
Bangkok_area.head(15)

Unnamed: 0,0,1,2
0,ลำดับ,อำเภอ/กิ่งอำเภอ,เนื้อที่ (ตร.กม.)
1,1,เขตคลองเตย,12.99
2,,แขวงคลองเตย,7.25
3,,แขวงคลองตัน,1.90
4,,แขวงพระโขนง,3.85
5,2,เขตคลองสาน,6.05
6,,แขวงคลองต้นไทร,1.77
7,,แขวงคลองสาน,0.73
8,,แขวงบางลำภูล่าง,2.23
9,,แขวงสมเด็จเจ้าพระยา,1.32


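The dataframe has both district (เขต, khet) and subdistrict (แขวง, khwaeng) rows. We only need the district rows. Subdistrict rows have no value in the first column, so dropping rows with missing values removes them.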

In [8]:
Bangkok_area.dropna(inplace=True)
Bangkok_area.head()

Unnamed: 0,0,1,2
0,ลำดับ,อำเภอ/กิ่งอำเภอ,เนื้อที่ (ตร.กม.)
1,1,เขตคลองเตย,12.99
5,2,เขตคลองสาน,6.05
10,3,เขตคลองสามวา,110.69
16,4,เขตคันนายาว,25.98


Rename the columns, then drop the header row and the index column

In [9]:
Bangkok_area.columns = ['Index', 'Thai', 'Area']

# drop first row
Bangkok_area.drop([0], axis=0, inplace=True)

# drop index column
Bangkok_area.drop(columns=['Index'], inplace=True)
Bangkok_area.head()

Unnamed: 0,Thai,Area
1,เขตคลองเตย,12.99
5,เขตคลองสาน,6.05
10,เขตคลองสามวา,110.69
16,เขตคันนายาว,25.98
18,เขตจตุจักร,32.91


Strip the เขต (district) prefix so the names match the Thai column in Bangkok_data

In [10]:
Bangkok_area['Thai'] = [col.replace('เขต', "") for col in Bangkok_area['Thai']]
Bangkok_area.head()

Unnamed: 0,Thai,Area
1,คลองเตย,12.99
5,คลองสาน,6.05
10,คลองสามวา,110.69
16,คันนายาว,25.98
18,จตุจักร,32.91


In [31]:
Bangkok_area.tail(20)

Unnamed: 0,Thai,Area
123,พระโขนง,13.99
125,พระนคร,5.54
138,ภาษีเจริญ,17.83
146,มีนบุรี,63.65
149,ยานนาวา,16.66
152,ราชเทวี,7.13
157,ราษฎร์บูรณะ,15.78
160,ลาดกระบัง,123.86
167,ลาดพร้าว,21.86
170,วังทองหลาง,19.57


Merge the two dataframes

In [11]:
# merge on the shared Thai name column
Bangkok_merged = Bangkok_data.merge(Bangkok_area, on='Thai')
Bangkok_merged.head()

Unnamed: 0,Neighborhood,PostalCode,Thai,Population,No.ofSub,Latitude,Longitude,Area
0,Bang Bon,10150,บางบอน,105161,4,13.6592,100.3991,34.75
1,Bang Kapi,10240,บางกะปิ,148465,2,13.765833,100.647778,28.52
2,Bang Khae,10160,บางแค,191781,4,13.696111,100.409444,44.46
3,Bang Khen,10220,บางเขน,189539,2,13.873889,100.596389,42.12
4,Bang Kho Laem,10120,บางคอแหลม,94956,3,13.693333,100.5025,10.92


In [12]:
Bangkok_merged.shape

(49, 8)

One row is missing. Let's find out which one.

In [13]:
# Check to see which row is missing
Bangkok_merged['Neighborhood']

0                 Bang Bon
1                Bang Kapi
2                Bang Khae
3                Bang Khen
4            Bang Kho Laem
5          Bang Khun Thian
6                  Bang Na
7               Bang Phlat
8                 Bang Rak
9                 Bang Sue
10             Bangkok Noi
11             Bangkok Yai
12               Bueng Kum
13               Chatuchak
14              Chom Thong
15               Din Daeng
16              Don Mueang
17                   Dusit
18             Huai Khwang
19             Khan Na Yao
20           Khlong Sam Wa
21              Khlong San
22             Khlong Toei
23                  Lak Si
24             Lat Krabang
25               Lat Phrao
26                Min Buri
27               Nong Chok
28              Nong Khaem
29              Pathum Wan
30           Phasi Charoen
31              Phaya Thai
32            Phra Khanong
33             Phra Nakhon
34    Pom Prap Sattru Phai
35                  Prawet
36             Ratchathewi
3

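Rat Burana is missing. Its Thai name is spelled ราษฏร์บูรณะ (with ฏ) in the Wikipedia table but ราษฎร์บูรณะ (with ฎ) in the area table, so the merge on the Thai name finds no match. Let's add it back to the dataframe.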

In [14]:
# Get all the data of Rat Burana from Bangkok_data dataframe
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
Bangkok_merged = pd.concat([Bangkok_merged, Bangkok_data.loc[[36]]])

In [15]:
# Manually add area value of Rat Burana to Bangkok_merged dataframe
Bangkok_merged.set_index(['Neighborhood'], inplace=True)
Bangkok_merged.loc[['Rat Burana'], ['Area']] = 15.78
Bangkok_merged.tail()

Neighborhood,PostalCode,Thai,Population,No.ofSub,Latitude,Longitude,Area
Thung Khru,10140,ทุ่งครุ,116473,2,13.6472,100.4958,30.74
Wang Thonglang,10310,วังทองหลาง,114768,4,13.7864,100.6087,19.57
Watthana,10110,วัฒนา,81623,3,13.742222,100.585833,12.57
Yan Nawa,10120,ยานนาวา,81521,2,13.696944,100.543056,16.66
Rat Burana,10140,ราษฏร์บูรณะ,86695,2,13.682222,100.505556,15.78


In [16]:
Bangkok_merged.reset_index(inplace=True)
Bangkok_merged.shape

(50, 8)
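A quicker way to spot rows that fail to match is a left merge with pandas' `indicator=True` option. The sketch below uses two-row stand-ins for the two tables, with values copied from the outputs above. The frame names `left` and `right` are assumptions for this example, and the differing Thai character is written as a Unicode escape so the difference is visible.

```python
import pandas as pd

# Stand-in for Bangkok_data: Wikipedia spells Rat Burana with U+0E0F (ฏ).
left = pd.DataFrame({
    "Neighborhood": ["Ratchathewi", "Rat Burana"],
    "Thai": ["ราชเทวี", "ราษ" + "\u0e0f" + "ร์บูรณะ"],
})

# Stand-in for Bangkok_area: the area table spells it with U+0E0E (ฎ).
right = pd.DataFrame({
    "Thai": ["ราชเทวี", "ราษ" + "\u0e0e" + "ร์บูรณะ"],
    "Area": [7.13, 15.78],
})

# how="left" keeps every neighborhood; the _merge column says where each row was found.
checked = left.merge(right, on="Thai", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Neighborhood"].tolist()
print(unmatched)
```

Any name in `unmatched` needs its key fixed, or its value filled in by hand as was done for Rat Burana.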

We do not need the Thai name anymore, so let's drop it.

In [17]:
Bangkok_merged.drop(columns=['Thai'], inplace=True)
Bangkok_merged.head()

Unnamed: 0,Neighborhood,PostalCode,Population,No.ofSub,Latitude,Longitude,Area
0,Bang Bon,10150,105161,4,13.6592,100.3991,34.75
1,Bang Kapi,10240,148465,2,13.765833,100.647778,28.52
2,Bang Khae,10160,191781,4,13.696111,100.409444,44.46
3,Bang Khen,10220,189539,2,13.873889,100.596389,42.12
4,Bang Kho Laem,10120,94956,3,13.693333,100.5025,10.92


## 2. Get the number of 7-Eleven stores in each neighborhood from Foursquare

In [18]:
import requests # library to handle requests
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

To count the 7-Eleven stores in each neighborhood, a search radius is needed. It is calculated from each neighborhood's area and then passed to the Foursquare API.

In [19]:
# First import math library to get pi value for radius calculation from area
import math
print(math.pi)

3.141592653589793


Cast Area to float and calculate the radius of a circle with the same area, radius = sqrt(area / pi).

In [20]:
pi = math.pi
Bangkok_merged['RADIUS'] = [math.sqrt(area/pi) for area in Bangkok_merged['Area'].astype('float')]
Bangkok_merged.head()

Unnamed: 0,Neighborhood,PostalCode,Population,No.ofSub,Latitude,Longitude,Area,RADIUS
0,Bang Bon,10150,105161,4,13.6592,100.3991,34.75,3.325849
1,Bang Kapi,10240,148465,2,13.765833,100.647778,28.52,3.013005
2,Bang Khae,10160,191781,4,13.696111,100.409444,44.46,3.761922
3,Bang Khen,10220,189539,2,13.873889,100.596389,42.12,3.661586
4,Bang Kho Laem,10120,94956,3,13.693333,100.5025,10.92,1.864388


The area is in square kilometers, so the calculated RADIUS is in kilometers. Foursquare expects the radius in meters, so we convert RADIUS to meters, add it to the dataframe as Radius, and drop the RADIUS column that is in kilometers.

In [21]:
Bangkok_merged['Radius'] = [r*1000 for r in Bangkok_merged['RADIUS']]
Bangkok_merged.drop(columns=['RADIUS'], inplace=True)
Bangkok_merged.head()

Unnamed: 0,Neighborhood,PostalCode,Population,No.ofSub,Latitude,Longitude,Area,Radius
0,Bang Bon,10150,105161,4,13.6592,100.3991,34.75,3325.848545
1,Bang Kapi,10240,148465,2,13.765833,100.647778,28.52,3013.004805
2,Bang Khae,10160,191781,4,13.696111,100.409444,44.46,3761.922054
3,Bang Khen,10220,189539,2,13.873889,100.596389,42.12,3661.586051
4,Bang Kho Laem,10120,94956,3,13.693333,100.5025,10.92,1864.38836
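The two steps above (area to radius, then kilometers to meters) can be folded into one small helper. This is a sketch only: `area_km2_to_radius_m` is a hypothetical name, and it assumes each neighborhood is approximated by a circle of equal area, as in the cells above.

```python
import math

def area_km2_to_radius_m(area_km2):
    """Radius in meters of a circle whose area equals area_km2 (square kilometers)."""
    radius_km = math.sqrt(float(area_km2) / math.pi)  # from area = pi * r**2
    return radius_km * 1000                           # kilometers -> meters

# Bang Bon has an area of 34.75 square kilometers in the table above.
bang_bon_radius = area_km2_to_radius_m(34.75)
print(round(bang_bon_radius))
```

Because the helper calls `float()` on its input, it also accepts area values that were read from HTML as strings.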


#### Now let's explore the first neighborhood

In [22]:
Bangkok_merged.loc[0, 'Neighborhood']

'Bang Bon'

Get the neighborhood's latitude and longitude values.

In [23]:
neighborhood_latitude = Bangkok_merged.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = Bangkok_merged.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = Bangkok_merged.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Bang Bon are 13.6592, 100.3991.


#### Now, let's get up to 100 7-Eleven results in Bang Bon within a radius of 3,326 meters.

First, let's create the GET request URL and store it in a variable named url.

In [24]:
radius = 3326
LIMIT = 100
query = '7-Eleven'
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}&query={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT, query)
url

'https://api.foursquare.com/v2/venues/explore?client_id=YOUR_FOURSQUARE_CLIENT_ID&client_secret=YOUR_FOURSQUARE_CLIENT_SECRET&ll=13.6592,100.3991&v=20180605&radius=3326&limit=100&query=7-Eleven'
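Building the query string with `urllib.parse.urlencode` escapes special characters in the search term and keeps the parameters in one place. The sketch below is one possible way to do it. `build_explore_url` and `EXPLORE_ENDPOINT` are hypothetical names, the credentials are placeholders, and the parameter names are the ones already used in the URL above.

```python
from urllib.parse import urlencode

# Endpoint used by this notebook's requests.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius, limit=100, query=None):
    """Assemble an explore URL; the radius is rounded to whole meters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": int(round(radius)),
        "limit": limit,
    }
    if query is not None:          # only add the search term when one is given
        params["query"] = query
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

example_url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                                13.6592, 100.3991, 3325.848545,
                                query="7-Eleven")
print(example_url)
```

The comma in `ll` is percent-encoded as `%2C` by `urlencode`, which is the standard encoding of the same value.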

Send the GET request, extract the relevant results, and count the number of 7-Eleven stores in the neighborhood

In [25]:
results = requests.get(url).json()["response"]['groups'][0]['items']
num7_store = len(results)
print(num7_store)

19
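The count above indexes straight into `["response"]["groups"][0]["items"]`, which raises an error if a response comes back without groups. A more defensive counter might look like the sketch below. `count_items` is a hypothetical helper, it makes no network call, and the payload shape is the one the cell above relies on.

```python
def count_items(payload):
    """Count venues in an explore-style payload; return 0 if the shape is missing."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return 0
    return len(groups[0].get("items", []))

# Hand-made payloads that mimic the structure used above.
full_payload = {"response": {"groups": [{"items": [{"venue": {}}, {"venue": {}}]}]}}
empty_payload = {"response": {}}

print(count_items(full_payload), count_items(empty_payload))
```

With a helper like this, one empty response does not stop a loop over all 50 neighborhoods.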


Create a function to get the number of 7-Eleven stores in each neighborhood

In [26]:
def GetSevenStore(names, latitudes, longitudes, radius, query='7-Eleven'):
    
    venues_list=[]
    for name, lat, lng, r in zip(names, latitudes, longitudes, radius):
        print(name)
            
        # create the API request URL
        # NOTE: no limit parameter is passed here, so the API's default cap on
        # the number of returned results applies and large counts may be truncated
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&query={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            r,
            query)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only the number of matching venues for this neighborhood
        venues_list.append(len(results))
   
    return venues_list

Run the GetSevenStore function to get a list with the number of 7-Eleven stores in each neighborhood

In [27]:
seven_stores = GetSevenStore(names=Bangkok_merged['Neighborhood'], 
                             latitudes=Bangkok_merged['Latitude'], 
                             longitudes=Bangkok_merged['Longitude'], 
                             radius=Bangkok_merged['Radius'])

Bang Bon
Bang Kapi
Bang Khae
Bang Khen
Bang Kho Laem
Bang Khun Thian
Bang Na
Bang Phlat
Bang Rak
Bang Sue
Bangkok Noi
Bangkok Yai
Bueng Kum
Chatuchak
Chom Thong
Din Daeng
Don Mueang
Dusit
Huai Khwang
Khan Na Yao
Khlong Sam Wa
Khlong San
Khlong Toei
Lak Si
Lat Krabang
Lat Phrao
Min Buri
Nong Chok
Nong Khaem
Pathum Wan
Phasi Charoen
Phaya Thai
Phra Khanong
Phra Nakhon
Pom Prap Sattru Phai
Prawet
Ratchathewi
Sai Mai
Samphanthawong
Saphan Sung
Sathon
Suan Luang
Taling Chan
Thawi Watthana
Thon Buri
Thung Khru
Wang Thonglang
Watthana
Yan Nawa
Rat Burana


Add number of 7-Eleven stores located in each neighborhood to Bangkok_merged dataframe

In [28]:
Bangkok_merged['7-Eleven'] = seven_stores
Bangkok_merged.head()

Unnamed: 0,Neighborhood,PostalCode,Population,No.ofSub,Latitude,Longitude,Area,Radius,7-Eleven
0,Bang Bon,10150,105161,4,13.6592,100.3991,34.75,3325.848545,19
1,Bang Kapi,10240,148465,2,13.765833,100.647778,28.52,3013.004805,30
2,Bang Khae,10160,191781,4,13.696111,100.409444,44.46,3761.922054,28
3,Bang Khen,10220,189539,2,13.873889,100.596389,42.12,3661.586051,30
4,Bang Kho Laem,10120,94956,3,13.693333,100.5025,10.92,1864.38836,26


#### Find nearby venues in each neighborhood

In [29]:
# create a function to get nearby venues for each neighborhood
# (the number of venues per request is taken from the global LIMIT)
def getNearbyVenues(names, latitudes, longitudes, radius):
    
    venues_list=[]
    for name, lat, lng, radius in zip(names, latitudes, longitudes, radius):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    
    # create a dataframe from the collected venue tuples
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    # rename the columns in the dataframe
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [32]:
# Now run the function above to get 100 nearby venues for the neighborhood in Bangkok_merged
LIMIT=100
Bangkok_venues = getNearbyVenues(names=Bangkok_merged['Neighborhood'],
                                   latitudes=Bangkok_merged['Latitude'],
                                   longitudes=Bangkok_merged['Longitude'],
                                   radius=Bangkok_merged['Radius']
                                  )

Bang Bon
Bang Kapi
Bang Khae
Bang Khen
Bang Kho Laem
Bang Khun Thian
Bang Na
Bang Phlat
Bang Rak
Bang Sue
Bangkok Noi
Bangkok Yai
Bueng Kum
Chatuchak
Chom Thong
Din Daeng
Don Mueang
Dusit
Huai Khwang
Khan Na Yao
Khlong Sam Wa
Khlong San
Khlong Toei
Lak Si
Lat Krabang
Lat Phrao
Min Buri
Nong Chok
Nong Khaem
Pathum Wan
Phasi Charoen
Phaya Thai
Phra Khanong
Phra Nakhon
Pom Prap Sattru Phai
Prawet
Ratchathewi
Sai Mai
Samphanthawong
Saphan Sung
Sathon
Suan Luang
Taling Chan
Thawi Watthana
Thon Buri
Thung Khru
Wang Thonglang
Watthana
Yan Nawa
Rat Burana


Explore the nearby venues dataframe

In [33]:
Bangkok_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bang Bon,13.6592,100.3991,ขาหมูบางหว้า,13.657136,100.39523,Thai Restaurant
1,Bang Bon,13.6592,100.3991,ร้านต้นไม้ ริมถนนกาญจนาภิเษก,13.654098,100.405054,Garden Center
2,Bang Bon,13.6592,100.3991,KFC,13.670449,100.405502,Fast Food Restaurant
3,Bang Bon,13.6592,100.3991,TPD Bowling,13.663977,100.408965,Bowling Alley
4,Bang Bon,13.6592,100.3991,Burger King (เบอร์เกอร์คิง),13.67083,100.405089,Fast Food Restaurant


In [34]:
Bangkok_venues.shape

(4788, 7)

Check how many venues were returned for each neighborhood

In [35]:
Bangkok_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Bang Bon,100,100,100,100,100,100
Bang Kapi,100,100,100,100,100,100
Bang Khae,100,100,100,100,100,100
Bang Khen,100,100,100,100,100,100
Bang Kho Laem,100,100,100,100,100,100
Bang Khun Thian,100,100,100,100,100,100
Bang Na,100,100,100,100,100,100
Bang Phlat,100,100,100,100,100,100
Bang Rak,100,100,100,100,100,100
Bang Sue,100,100,100,100,100,100


Check how many unique categories can be curated from the nearby venues

In [36]:
print('There are {} uniques categories.'.format(len(Bangkok_venues['Venue Category'].unique())))

There are 250 uniques categories.


## 3. Analyze each neighborhood

In [37]:
# one hot encoding
Bangkok_onehot = pd.get_dummies(Bangkok_venues[['Venue Category']], prefix="", prefix_sep="")
Bangkok_onehot.shape

(4788, 250)

In [38]:
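# drop the 'Neighborhood' column produced by one-hot encoding (a venue category with that name),
# since it would clash with the neighborhood name column added in the next cell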
Bangkok_onehot.drop(columns=['Neighborhood'], inplace=True)
Bangkok_onehot.shape

(4788, 249)

In [166]:
# add the correct neighborhood column back to dataframe
Bangkok_onehot['Neighborhood'] = Bangkok_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Bangkok_onehot.columns[-1]] + list(Bangkok_onehot.columns[:-1])
Bangkok_onehot = Bangkok_onehot[fixed_columns]

Bangkok_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Lounge,Airport Service,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Water Park,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant,Zoo,Zoo Exhibit
0,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bang Bon,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
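To make the one-hot step concrete, here is the same transformation on three venues. The two Bang Bon rows use categories from the venue table above. The Bang Kapi row is made up for illustration, and `dtype=int` is passed so the indicator columns print as 0/1.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    "Neighborhood": ["Bang Bon", "Bang Bon", "Bang Kapi"],
    "Venue Category": ["Thai Restaurant", "Bowling Alley", "Thai Restaurant"],
})

# One indicator column per category; empty prefix keeps the raw category names.
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]],
                            prefix="", prefix_sep="", dtype=int)

# Put the neighborhood name back as the first column.
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])
print(toy_onehot)
```

Each row has exactly one 1 among the category columns, so summing a column counts the venues in that category.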


#### Next, group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [167]:
Bangkok_grouped = Bangkok_onehot.groupby('Neighborhood').mean().reset_index()
Bangkok_grouped

Unnamed: 0,Neighborhood,Accessories Store,Airport,Airport Lounge,Airport Service,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Water Park,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant,Zoo,Zoo Exhibit
0,Bang Bon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bang Kapi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
2,Bang Khae,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bang Khen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bang Kho Laem,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bang Khun Thian,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
6,Bang Na,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bang Phlat,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bang Rak,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Bang Sue,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01
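The mean of a 0/1 indicator column is simply the share of a neighborhood's venues that fall into that category. The sketch below shows this equivalence without pandas, using the five Bang Bon categories from the venue table above. `category_frequencies` is a hypothetical helper name.

```python
from collections import Counter

def category_frequencies(categories):
    """Share of venues per category, equivalent to the mean of one-hot columns."""
    counts = Counter(categories)
    total = len(categories)
    return {category: n / total for category, n in counts.items()}

# Categories of the first five Bang Bon venues shown earlier.
bang_bon_sample = ["Thai Restaurant", "Garden Center", "Fast Food Restaurant",
                   "Bowling Alley", "Fast Food Restaurant"]

freqs = category_frequencies(bang_bon_sample)
print(freqs)
```

With 100 venues per neighborhood, a value of 0.01 in the grouped table corresponds to a single venue.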


#### Let's find each neighborhood's top 10 most common venues

First write a function to sort the venues in descending order.

In [168]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Then create a new dataframe and display the top 10 venues for each neighborhood.

In [169]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']    # name the first column 'Neighborhood'
for ind in np.arange(num_top_venues):   # name the remaining columns 'Nth Most Common Venue'
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind])) # 1st, 2nd and 3rd Most Common Venue
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1)) # 4th - 10th Most Common Venue

# create a new dataframe using the column names from above
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Bangkok_grouped['Neighborhood']   # add data to the Neighborhood column from the Bangkok_grouped dataframe

for ind in np.arange(Bangkok_grouped.shape[0]):     # fill the remaining columns from the Bangkok_grouped dataframe
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Bangkok_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bang Bon,Convenience Store,Asian Restaurant,Noodle House,Coffee Shop,Thai Restaurant,Som Tum Restaurant,Bar,Fast Food Restaurant,Supermarket,Market
1,Bang Kapi,Noodle House,Coffee Shop,Thai Restaurant,Japanese Restaurant,Dessert Shop,Fast Food Restaurant,Som Tum Restaurant,Steakhouse,Clothing Store,Supermarket
2,Bang Khae,Coffee Shop,Noodle House,Thai Restaurant,Japanese Restaurant,BBQ Joint,Convenience Store,Fast Food Restaurant,Ice Cream Shop,Hotpot Restaurant,Pizza Place
3,Bang Khen,Coffee Shop,Noodle House,Thai Restaurant,Hotpot Restaurant,Hotel,Supermarket,Golf Course,Gym / Fitness Center,Convenience Store,Gun Range
4,Bang Kho Laem,Noodle House,Coffee Shop,Thai Restaurant,Hotel,Convenience Store,Chinese Restaurant,Hotpot Restaurant,Pub,BBQ Joint,Breakfast Spot
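The column names above rely on a try/except around a three-item suffix list, which is fine for ten columns but would label an 11th column "11th" only by accident and a 21st column "21th". A small ordinal helper avoids that. `ordinal` is a hypothetical function introduced for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:        # 11th, 12th, 13th and the other teens
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns)
```

The resulting list matches the column names used in the table above.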


## 4. Cluster neighborhoods

Run *k*-means to group the neighborhoods into 5 clusters, using the mean frequency of each venue category in each neighborhood as the features.

In [170]:
# import Kmeans library
!{sys.executable} -m pip install scikit-learn
from sklearn.cluster import KMeans 

# set number of clusters
kclusters = 5

Bangkok_grouped_clustering = Bangkok_grouped.drop('Neighborhood', axis=1)

# run k-means clustering using the Bangkok_grouped_clustering dataframe (mean of each category per neighborhood)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Bangkok_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 



array([1, 2, 2, 2, 4, 2, 1, 0, 3, 4], dtype=int32)
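Before interpreting the clusters it helps to check how many neighborhoods each one holds, since a cluster with a single member says little about similarity. The sketch below counts only the ten labels printed above. On the full data the same count could be taken over all of `kmeans.labels_`.

```python
from collections import Counter

# First ten cluster labels, copied from the output above.
first_ten_labels = [1, 2, 2, 2, 4, 2, 1, 0, 3, 4]

cluster_sizes = Counter(first_ten_labels)
for label in sorted(cluster_sizes):
    print("cluster", label, "->", cluster_sizes[label], "neighborhoods")
```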

Then create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [177]:
# add clustering labels as a column to neighborhoods_venues_sorted dataframe
# (guarded so the cell can be re-run without inserting the column twice)
if 'Cluster Labels' not in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Bangkok_merged = Bangkok_data

# join the cluster labels and top venues onto Bangkok_data, which holds the latitude/longitude of each neighborhood
Bangkok_merged = Bangkok_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Bangkok_merged.head() # check the merged dataframe!

Unnamed: 0,Neighborhood,PostalCode,Thai,Population,No.ofSub,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bang Bon,10150,บางบอน,105161,4,13.6592,100.3991,1,Convenience Store,Asian Restaurant,Noodle House,Coffee Shop,Thai Restaurant,Som Tum Restaurant,Bar,Fast Food Restaurant,Supermarket,Market
1,Bang Kapi,10240,บางกะปิ,148465,2,13.765833,100.647778,2,Noodle House,Coffee Shop,Thai Restaurant,Japanese Restaurant,Dessert Shop,Fast Food Restaurant,Som Tum Restaurant,Steakhouse,Clothing Store,Supermarket
2,Bang Khae,10160,บางแค,191781,4,13.696111,100.409444,2,Coffee Shop,Noodle House,Thai Restaurant,Japanese Restaurant,BBQ Joint,Convenience Store,Fast Food Restaurant,Ice Cream Shop,Hotpot Restaurant,Pizza Place
3,Bang Khen,10220,บางเขน,189539,2,13.873889,100.596389,2,Coffee Shop,Noodle House,Thai Restaurant,Hotpot Restaurant,Hotel,Supermarket,Golf Course,Gym / Fitness Center,Convenience Store,Gun Range
4,Bang Kho Laem,10120,บางคอแหลม,94956,3,13.693333,100.5025,4,Noodle House,Coffee Shop,Thai Restaurant,Hotel,Convenience Store,Chinese Restaurant,Hotpot Restaurant,Pub,BBQ Joint,Breakfast Spot


In [178]:
Bangkok_merged.shape

(50, 18)

In [180]:
Bangkok_merged.head()

Unnamed: 0,Neighborhood,PostalCode,Thai,Population,No.ofSub,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,7-Eleven
0,Bang Bon,10150,บางบอน,105161,4,13.6592,100.3991,1,Convenience Store,Asian Restaurant,Noodle House,Coffee Shop,Thai Restaurant,Som Tum Restaurant,Bar,Fast Food Restaurant,Supermarket,Market,19
1,Bang Kapi,10240,บางกะปิ,148465,2,13.765833,100.647778,2,Noodle House,Coffee Shop,Thai Restaurant,Japanese Restaurant,Dessert Shop,Fast Food Restaurant,Som Tum Restaurant,Steakhouse,Clothing Store,Supermarket,30
2,Bang Khae,10160,บางแค,191781,4,13.696111,100.409444,2,Coffee Shop,Noodle House,Thai Restaurant,Japanese Restaurant,BBQ Joint,Convenience Store,Fast Food Restaurant,Ice Cream Shop,Hotpot Restaurant,Pizza Place,28
3,Bang Khen,10220,บางเขน,189539,2,13.873889,100.596389,2,Coffee Shop,Noodle House,Thai Restaurant,Hotpot Restaurant,Hotel,Supermarket,Golf Course,Gym / Fitness Center,Convenience Store,Gun Range,30
4,Bang Kho Laem,10120,บางคอแหลม,94956,3,13.693333,100.5025,4,Noodle House,Coffee Shop,Thai Restaurant,Hotel,Convenience Store,Chinese Restaurant,Hotpot Restaurant,Pub,BBQ Joint,Breakfast Spot,26


Final step is to visualize the resulting clusters

In [175]:
# install folium
!{sys.executable} -m pip install folium
import folium



In [176]:
# install matplotlib and import colormaps to visualize the map
!{sys.executable} -m pip install matplotlib
import matplotlib.cm as cm
import matplotlib.colors as colors



In [196]:
# create map
latitude = 13.7563
longitude = 100.5018
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster, seven in zip(Bangkok_merged['Latitude'], Bangkok_merged['Longitude'], Bangkok_merged['Neighborhood'], Bangkok_merged['Cluster Labels'], Bangkok_merged['7-Eleven']):
    label = folium.Popup(str(poi) + ' Cluster:' + str(cluster) + ' 7-11:' + str(seven), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
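The marker colors in the cell above are looked up with `rainbow[cluster-1]`, so label 0 lands on index -1, which Python resolves to the last color. That works, but a palette indexed directly by the label is easier to read. The sketch below builds one with the standard library. `cluster_palette` is a hypothetical helper and the saturation and brightness values are arbitrary choices.

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)   # spread hues around the color wheel
        palette.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_palette(5)
color_for_cluster_0 = palette[0]   # direct lookup, no offset needed
print(palette)
```

Any of these strings can be passed as the `color` and `fill_color` of a marker.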

## 5. Examine Results

Now, let's examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we can then assign a name to each cluster.

### Cluster 1 (label 0) is a center of convenience stores, Thai restaurants, and noodle houses.

In [185]:
Bangkok_merged.loc[Bangkok_merged['Cluster Labels'] == 0, Bangkok_merged.columns[[0] + list(range(5, Bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,7-Eleven
7,Bang Phlat,13.793889,100.505,0,Thai Restaurant,Convenience Store,Noodle House,Coffee Shop,Café,Asian Restaurant,Hotel,Ice Cream Shop,Vietnamese Restaurant,Flea Market,26
26,Min Buri,13.813889,100.748056,0,Thai Restaurant,Convenience Store,Noodle House,Coffee Shop,Café,Flea Market,Golf Course,Asian Restaurant,Furniture / Home Store,Hotpot Restaurant,22
28,Nong Khaem,13.704722,100.348889,0,Convenience Store,Thai Restaurant,Noodle House,Hotpot Restaurant,Restaurant,Flea Market,Asian Restaurant,Bar,Fast Food Restaurant,Diner,19
34,Pom Prap Sattru Phai,13.758056,100.513056,0,Noodle House,Convenience Store,Café,Thai Restaurant,Asian Restaurant,Chinese Restaurant,Museum,Steakhouse,Government Building,Mediterranean Restaurant,11
38,Sai Mai,13.919167,100.645833,0,Convenience Store,Thai Restaurant,Noodle House,Café,Coffee Shop,Vietnamese Restaurant,Market,BBQ Joint,Asian Restaurant,Fast Food Restaurant,9
44,Thawi Watthana,13.7878,100.3638,0,Thai Restaurant,Convenience Store,Noodle House,Coffee Shop,Seafood Restaurant,Shabu-Shabu Restaurant,Café,Italian Restaurant,Bakery,Dessert Shop,25
45,Thon Buri,13.725,100.485833,0,Noodle House,Thai Restaurant,Convenience Store,Coffee Shop,Asian Restaurant,Seafood Restaurant,Dessert Shop,Train Station,Steakhouse,Café,28
46,Thung Khru,13.6472,100.4958,0,Convenience Store,Thai Restaurant,Noodle House,Coffee Shop,Café,Bar,Som Tum Restaurant,Hotpot Restaurant,Seafood Restaurant,BBQ Joint,29


### Cluster 2 is a mixed area of coffee shops, convenience stores, and Thai restaurants.

In [186]:
Bangkok_merged.loc[Bangkok_merged['Cluster Labels'] == 1, Bangkok_merged.columns[[0] + list(range(5, Bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,7-Eleven
0,Bang Bon,13.6592,100.3991,1,Convenience Store,Asian Restaurant,Noodle House,Coffee Shop,Thai Restaurant,Som Tum Restaurant,Bar,Fast Food Restaurant,Supermarket,Market,19
6,Bang Na,13.680081,100.5918,1,Coffee Shop,Thai Restaurant,Noodle House,Convenience Store,Fast Food Restaurant,Café,Train Station,Hotel,Chinese Restaurant,Asian Restaurant,20
10,Bangkok Noi,13.770867,100.467933,1,Convenience Store,Noodle House,Japanese Restaurant,Coffee Shop,Som Tum Restaurant,Café,Dessert Shop,Steakhouse,Supermarket,Seafood Restaurant,25
12,Bueng Kum,13.785278,100.669167,1,Thai Restaurant,Noodle House,Coffee Shop,Japanese Restaurant,Supermarket,Convenience Store,Café,Halal Restaurant,Fast Food Restaurant,Bakery,27
16,Don Mueang,13.913611,100.589722,1,Noodle House,Coffee Shop,Convenience Store,Thai Restaurant,Flea Market,Som Tum Restaurant,Hotel,Market,Café,Fast Food Restaurant,30
23,Lak Si,13.8875,100.578889,1,Thai Restaurant,Coffee Shop,Convenience Store,Noodle House,Som Tum Restaurant,Hotel,Fast Food Restaurant,Gym / Fitness Center,Hotpot Restaurant,Supermarket,29
24,Lat Krabang,13.722317,100.759669,1,Coffee Shop,Noodle House,Thai Restaurant,Airport Lounge,Airport Service,Park,Convenience Store,Hotel,Som Tum Restaurant,Asian Restaurant,30
27,Nong Chok,13.855556,100.8625,1,Thai Restaurant,Café,Golf Course,Convenience Store,Coffee Shop,Shopping Mall,Restaurant,Flea Market,Pub,Shabu-Shabu Restaurant,14
30,Phasi Charoen,13.714722,100.437222,1,Convenience Store,Coffee Shop,Noodle House,Thai Restaurant,BBQ Joint,Asian Restaurant,Steakhouse,Hotpot Restaurant,Fast Food Restaurant,Som Tum Restaurant,23
32,Phra Khanong,13.702222,100.601667,1,Coffee Shop,Convenience Store,Noodle House,Hotel,Italian Restaurant,Chinese Restaurant,Ice Cream Shop,Gym / Fitness Center,Spa,Train Station,30


### Cluster 3 is heaven for noodle lovers and coffee shop hoppers.

In [187]:
Bangkok_merged.loc[Bangkok_merged['Cluster Labels'] == 2, Bangkok_merged.columns[[0] + list(range(5, Bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,7-Eleven
1,Bang Kapi,13.765833,100.647778,2,Noodle House,Coffee Shop,Thai Restaurant,Japanese Restaurant,Dessert Shop,Fast Food Restaurant,Som Tum Restaurant,Steakhouse,Clothing Store,Supermarket,30
2,Bang Khae,13.696111,100.409444,2,Coffee Shop,Noodle House,Thai Restaurant,Japanese Restaurant,BBQ Joint,Convenience Store,Fast Food Restaurant,Ice Cream Shop,Hotpot Restaurant,Pizza Place,28
3,Bang Khen,13.873889,100.596389,2,Coffee Shop,Noodle House,Thai Restaurant,Hotpot Restaurant,Hotel,Supermarket,Golf Course,Gym / Fitness Center,Convenience Store,Gun Range,30
5,Bang Khun Thian,13.660833,100.435833,2,Coffee Shop,Noodle House,Thai Restaurant,BBQ Joint,Seafood Restaurant,Asian Restaurant,Fast Food Restaurant,Hotpot Restaurant,Ice Cream Shop,Supermarket,30
11,Bangkok Yai,13.722778,100.476389,2,Noodle House,Japanese Restaurant,Convenience Store,Asian Restaurant,Restaurant,Train Station,Fast Food Restaurant,Hotpot Restaurant,Ice Cream Shop,Chinese Restaurant,16
13,Chatuchak,13.828611,100.559722,2,Coffee Shop,Thai Restaurant,Asian Restaurant,Seafood Restaurant,Ice Cream Shop,Noodle House,Dessert Shop,Gym / Fitness Center,Som Tum Restaurant,Café,30
15,Din Daeng,13.769722,100.552778,2,Coffee Shop,Noodle House,Thai Restaurant,Som Tum Restaurant,Bar,Restaurant,Convenience Store,Japanese Restaurant,Bakery,Steakhouse,30
18,Huai Khwang,13.776667,100.579444,2,Noodle House,Thai Restaurant,Som Tum Restaurant,Hotel,Coffee Shop,Seafood Restaurant,Korean Restaurant,Japanese Restaurant,Dessert Shop,Asian Restaurant,30
19,Khan Na Yao,13.8271,100.6743,2,Thai Restaurant,Coffee Shop,Noodle House,Japanese Restaurant,Dessert Shop,Ice Cream Shop,Café,Som Tum Restaurant,Asian Restaurant,Bakery,30
20,Khlong Sam Wa,13.859722,100.704167,2,Thai Restaurant,Coffee Shop,Japanese Restaurant,Dessert Shop,Noodle House,Asian Restaurant,Som Tum Restaurant,Supermarket,Bakery,Convenience Store,30


### Cluster 4 is the tourist district. Plenty of hotels are available in this cluster.

In [188]:
Bangkok_merged.loc[Bangkok_merged['Cluster Labels'] == 3, Bangkok_merged.columns[[0] + list(range(5, Bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,7-Eleven
8,Bang Rak,13.730833,100.524167,3,Noodle House,Café,Chinese Restaurant,Coffee Shop,Thai Restaurant,Hotel,Seafood Restaurant,Japanese Restaurant,Hostel,Hotel Bar,30
21,Khlong San,13.730278,100.509722,3,Noodle House,Coffee Shop,Chinese Restaurant,Dessert Shop,Café,Hotel,Thai Restaurant,Hotel Bar,Art Gallery,Restaurant,24
29,Pathum Wan,13.744942,100.5222,3,Noodle House,Coffee Shop,Dessert Shop,Hotel,Asian Restaurant,Seafood Restaurant,Thai Restaurant,Hostel,Bar,Bakery,30
31,Phaya Thai,13.78,100.542778,3,Thai Restaurant,Coffee Shop,Bar,Japanese Restaurant,Café,Restaurant,Noodle House,Som Tum Restaurant,Sushi Restaurant,Burger Joint,30
33,Phra Nakhon,13.764444,100.499167,3,Bar,Thai Restaurant,Hotel,Café,Noodle House,Massage Studio,Asian Restaurant,Spa,Coffee Shop,Vegetarian / Vegan Restaurant,30
37,Ratchathewi,13.758889,100.534444,3,Hotel,Hostel,Coffee Shop,Steakhouse,Café,Massage Studio,Noodle House,Thai Restaurant,Shopping Mall,Convenience Store,23
39,Samphanthawong,13.731389,100.514167,3,Thai Restaurant,Coffee Shop,Dessert Shop,Art Gallery,Café,Noodle House,Hotel Bar,Chinese Restaurant,Satay Restaurant,Restaurant,19
41,Sathon,13.708056,100.526389,3,Noodle House,Thai Restaurant,Asian Restaurant,Hotel,Coffee Shop,Café,Bar,Chinese Restaurant,Dessert Shop,Som Tum Restaurant,29
48,Watthana,13.742222,100.585833,3,Café,Coffee Shop,Japanese Restaurant,Thai Restaurant,Bar,Noodle House,BBQ Joint,Hotel,Supermarket,Nightclub,30
49,Yan Nawa,13.696944,100.543056,3,Thai Restaurant,Asian Restaurant,Coffee Shop,Noodle House,Café,Japanese Restaurant,Bakery,Restaurant,BBQ Joint,Seafood Restaurant,27


### Cluster 5 is a hotpot hotspot.

In [189]:
Bangkok_merged.loc[Bangkok_merged['Cluster Labels'] == 4, Bangkok_merged.columns[[0] + list(range(5, Bangkok_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,7-Eleven
4,Bang Kho Laem,13.693333,100.5025,4,Noodle House,Coffee Shop,Thai Restaurant,Hotel,Convenience Store,Chinese Restaurant,Hotpot Restaurant,Pub,BBQ Joint,Breakfast Spot,26
9,Bang Sue,13.809722,100.537222,4,Thai Restaurant,Noodle House,Coffee Shop,Café,Ice Cream Shop,Clothing Store,Hotpot Restaurant,Som Tum Restaurant,Park,Train Station,22
14,Chom Thong,13.677222,100.484722,4,Noodle House,Thai Restaurant,Coffee Shop,Convenience Store,Hotpot Restaurant,Bar,BBQ Joint,Asian Restaurant,Chinese Restaurant,Seafood Restaurant,23
17,Dusit,13.776944,100.520556,4,Noodle House,Thai Restaurant,Coffee Shop,Café,Palace,Asian Restaurant,Chinese Restaurant,Dessert Shop,Museum,Som Tum Restaurant,30
36,Rat Burana,13.682222,100.505556,4,Coffee Shop,Noodle House,Thai Restaurant,Chinese Restaurant,Hotpot Restaurant,Convenience Store,BBQ Joint,Pub,Seafood Restaurant,Spa,30
42,Suan Luang,13.730278,100.651389,4,Noodle House,Coffee Shop,Thai Restaurant,Som Tum Restaurant,Chinese Restaurant,Café,Dessert Shop,Japanese Restaurant,Asian Restaurant,Burger Joint,30


### Discussion

The clustering results, combined with the number of competitor (7-Eleven) stores in each neighborhood, suggest that the tourist district (cluster 4) has relatively few convenience stores: Convenience Store appears among the ten most common venue categories in only one of its ten neighborhoods (Ratchathewi, in 10th place). Within this cluster, Samphanthawong has the fewest 7-Eleven stores, with only 19. This could be a good area to open a Family Mart.
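The reasoning in this discussion can be checked programmatically. The following is a rough sketch under stated assumptions, not the code used in the study: `convenience_share` and `fewest_competitors` are hypothetical helpers, and `sample` holds only five rows copied from the tables above (two from cluster 1, three from cluster 4), laid out like `Bangkok_merged`.

```python
import pandas as pd

ordinals = ["1st", "2nd", "3rd"] + [f"{i}th" for i in range(4, 11)]
venue_cols = [f"{o} Most Common Venue" for o in ordinals]

# Five rows copied from the cluster tables above: (name, label, top-10 venues, 7-Eleven count).
rows = [
    ("Sai Mai", 0, ["Convenience Store", "Thai Restaurant", "Noodle House", "Café",
                    "Coffee Shop", "Vietnamese Restaurant", "Market", "BBQ Joint",
                    "Asian Restaurant", "Fast Food Restaurant"], 9),
    ("Pom Prap Sattru Phai", 0, ["Noodle House", "Convenience Store", "Café",
                                 "Thai Restaurant", "Asian Restaurant", "Chinese Restaurant",
                                 "Museum", "Steakhouse", "Government Building",
                                 "Mediterranean Restaurant"], 11),
    ("Khlong San", 3, ["Noodle House", "Coffee Shop", "Chinese Restaurant", "Dessert Shop",
                       "Café", "Hotel", "Thai Restaurant", "Hotel Bar", "Art Gallery",
                       "Restaurant"], 24),
    ("Ratchathewi", 3, ["Hotel", "Hostel", "Coffee Shop", "Steakhouse", "Café",
                        "Massage Studio", "Noodle House", "Thai Restaurant",
                        "Shopping Mall", "Convenience Store"], 23),
    ("Samphanthawong", 3, ["Thai Restaurant", "Coffee Shop", "Dessert Shop", "Art Gallery",
                           "Café", "Noodle House", "Hotel Bar", "Chinese Restaurant",
                           "Satay Restaurant", "Restaurant"], 19),
]
sample = pd.DataFrame(
    [[name, label, *venues, count] for name, label, venues, count in rows],
    columns=["Neighborhood", "Cluster Labels", *venue_cols, "7-Eleven"],
)

def convenience_share(df, category="Convenience Store"):
    """Fraction of neighborhoods per cluster that list `category` in their top ten venues."""
    has_category = df[venue_cols].eq(category).any(axis=1)
    return has_category.groupby(df["Cluster Labels"]).mean()

def fewest_competitors(df, cluster_label):
    """Neighborhood with the lowest 7-Eleven count inside one cluster."""
    members = df[df["Cluster Labels"] == cluster_label]
    return members.loc[members["7-Eleven"].idxmin(), ["Neighborhood", "7-Eleven"]]

share = convenience_share(sample)
candidate = fewest_competitors(sample, 3)
print(share)
print(candidate)
```

On this five-row sample, both cluster 1 rows list Convenience Store while only one of the three cluster 4 rows does, and Samphanthawong is the cluster 4 row with the fewest 7-Eleven stores. Running the helpers on the full `Bangkok_merged` frame would cover every neighborhood rather than this subset.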

### Conclusion

In this project, I used K-means clustering to segment neighborhoods in Bangkok based on nearby venue categories, in order to identify potential locations for a Family Mart store. The K-means approach segmented Bangkok neighborhoods into 5 clusters: 1. the center of convenience stores, Thai restaurants, and noodle houses, 2. a mixed area of coffee shops, convenience stores, and Thai restaurants, 3. the heaven for noodle lovers and coffee shop hoppers, 4. the tourist district, and 5. the hotpot hotspot. Of these 5 clusters, convenience stores seem to be lacking in the tourist district, especially in the Samphanthawong neighborhood. Therefore, Samphanthawong is nominated as the best location for a Family Mart store based on this study. Further studies could pinpoint a smaller area for the new store by analysing sub-neighborhoods within Samphanthawong. It is also possible to take into account the population per convenience store in each neighborhood.
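The population-per-store idea mentioned above could be sketched as follows. This is only an illustration of the calculation: the district names, populations, and store counts in `demo` are placeholder values invented for the example and do not come from this study or from any real dataset.

```python
import pandas as pd

# Placeholder values only: neither the names nor the numbers come from this study.
demo = pd.DataFrame({
    "Neighborhood": ["District A", "District B", "District C"],
    "Population": [120000, 45000, 80000],
    "7-Eleven": [30, 9, 25],
})

# Residents served by each existing store; a higher value suggests a less saturated area.
demo["Residents per Store"] = demo["Population"] / demo["7-Eleven"]

# Rank neighborhoods from least to most saturated.
ranked = demo.sort_values("Residents per Store", ascending=False).reset_index(drop=True)
print(ranked)
```

With real population figures merged in by neighborhood name, the same ratio could be used alongside the cluster labels to compare candidate areas.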