## Battle of the Neighborhoods - hotel edition


Bangkok has many hotels spread across its districts. The objective here is to find the most suitable area for a new 5-star hotel. What we want to understand is:

* Where are 5-star hotels located?
* What are the similarities between the various 5-star hotels?
* Is there a common pattern for success based on the neighbourhood facilities?

After this exploratory part, we will proceed to analyze hotel density per area and understand where we can open a new hotel.

* What is the best location to open a new 5-star hotel?

### Tools for the job

We will use different tools.

a. Exploratory part:

   * folium
   * open data related to Bangkok neighbourhoods
   * requests and foursquare API to gather hotel data
   * for each 5-star hotel, we will analyze the surroundings to understand what pattern its location follows.

b. Analysis

   * TBD

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import json
from pandas import json_normalize  # pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium 

In [2]:
# alternate scraping
url = 'https://en.wikipedia.org/w/index.php?title=List_of_districts_of_Bangkok&oldid=884382933'
response = requests.get(url)
# make the soup
soup = BeautifulSoup(response.text, 'html.parser')
# find table
table = soup.find("table", {"class": "wikitable"})
rows = table.find_all('tr')[1:]
rows[0]

<tr>
<td><a href="/wiki/Bang_Bon_District" title="Bang Bon District">Bang Bon</a></td>
<td>50</td>
<td>บางบอน</td>
<td align="right">105,161</td>
<td>4
</td>
<td>NA
</td>
<td>NA
</td></tr>

In [3]:
# build df from table
cols = ['Neighborhood', 'Code', 'Thai','Population', 'Subdistricts', 'Latitude', 'Longitude'] 
#Initialize df
df = pd.DataFrame(columns=cols)

In [4]:
for i in range(len(rows)):
    df.loc[i] = [td.text.strip() for td in rows[i].find_all('td')]
    

In [5]:
df.head()

Unnamed: 0,Neighborhood,Code,Thai,Population,Subdistricts,Latitude,Longitude
0,Bang Bon,50,บางบอน,105161,4,,
1,Bang Kapi,6,บางกะปิ,148465,2,13.765833,100.647778
2,Bang Khae,40,บางแค,191781,4,13.696111,100.409444
3,Bang Khen,5,บางเขน,189539,2,13.873889,100.596389
4,Bang Kho Laem,31,บางคอแหลม,94956,3,13.693333,100.5025


In [6]:
df = df.drop(['Code', 'Thai', 'Population', 'Subdistricts'], axis=1)
df = df.drop(df[df.Latitude == 'NA'].index)
df = df.rename(columns={'Latitude':'latitude', 'Longitude':'longitude'})
# convert to numbers; anything that is not numeric becomes NaN instead of staying a string
df['latitude'] = pd.to_numeric(df['latitude'], errors='coerce')
df['longitude'] = pd.to_numeric(df['longitude'], errors='coerce')
df.head()

Unnamed: 0,Neighborhood,latitude,longitude
1,Bang Kapi,13.765833,100.647778
2,Bang Khae,13.696111,100.409444
3,Bang Khen,13.873889,100.596389
4,Bang Kho Laem,13.693333,100.5025
5,Bang Khun Thian,13.660833,100.435833
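
The same clean-up can be expressed without pandas, which makes the rule explicit: a row is kept only when both coordinates parse as numbers. This is a minimal sketch on three rows taken from the scraped table; `parse_coordinate` is a hypothetical helper, not part of the notebook.

```python
def parse_coordinate(text):
    """Return the coordinate as a float, or None when the cell is 'NA' or not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None

# (neighborhood, latitude text, longitude text) as scraped from the table
rows = [
    ("Bang Bon", "NA", "NA"),
    ("Bang Kapi", "13.765833", "100.647778"),
    ("Bang Khae", "13.696111", "100.409444"),
]

clean = []
for name, lat_text, lon_text in rows:
    lat, lon = parse_coordinate(lat_text), parse_coordinate(lon_text)
    # keep the row only when both coordinates are usable
    if lat is not None and lon is not None:
        clean.append((name, lat, lon))

print(clean)
```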


In [7]:
# Bangkok map
bkk_lat = 13.7563
bkk_lon = 100.5018
bkk = folium.Map(location=[bkk_lat, bkk_lon], zoom_start=10)
bkk

In [8]:
# Let's store the path to the neighbourhood data file for later use
bkk_hoods = r'adm2_greaterBK_hD4.json'

Now add markers to map

In [9]:
# add markers to map
for lat, lng, neighborhood in zip(df['latitude'], df['longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(bkk)

bkk

In [10]:
# Load foursquare credentials

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder: keep real keys out of shared notebooks)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version

print('Your credentials are loaded')

Your credentials are loaded


In [11]:
# now let's isolate a sample neighborhood (index label 30 is Phasi Charoen, not Bang Rak)

hood_latitude = df.loc[30, 'latitude'] 
hood_longitude = df.loc[30, 'longitude'] 

hood_name = df.loc[30, 'Neighborhood'] 

print('Latitude and longitude values of {} are {}, {}.'.format(hood_name, 
                                                               hood_latitude, 
                                                               hood_longitude))

Latitude and longitude values of Phasi Charoen are 13.714722, 100.43722199999999.


In [12]:
# now let's search hotels. Build a URL:
# note: no radius or limit is sent, so results can lie far from the query point

search_url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query=hotel'.format(CLIENT_ID, CLIENT_SECRET, hood_latitude, hood_longitude, VERSION)
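
Formatting a long query string by position makes it easy to pass arguments that never reach the URL. Below is a minimal sketch of the same search request built from named parameters with the standard library's `urllib.parse.urlencode`. The helper name `build_search_url` and the placeholder credentials are assumptions for illustration only.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_url(client_id, client_secret, lat, lng, version, query):
    """Build the venue search URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
    }
    # urlencode escapes each value, so every parameter listed here ends up in the URL
    return BASE_URL + "?" + urlencode(params)

url = build_search_url("ID", "SECRET", 13.714722, 100.437222, "20180605", "hotel")

# decode the query string again to check what would actually be sent
sent = parse_qs(urlsplit(url).query)
print(sorted(sent))
```

Decoding the URL back into a dictionary is a quick way to confirm which parameters the request really carries.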


In [13]:
search_results = requests.get(search_url).json()
search_results

{'meta': {'code': 200, 'requestId': '5c6faf811ed2196e4aa41b29'},
 'response': {'venues': [{'id': '4b0587f6f964a5200da922e3',
    'name': 'Shangri-La Hotel, Bangkok',
    'location': {'address': '89 Soi Charoen Krung 42/1',
     'lat': 13.721573379038752,
     'lng': 100.51381182368625,
     'labeledLatLngs': [{'label': 'display',
       'lat': 13.721573379038752,
       'lng': 100.51381182368625}],
     'distance': 8317,
     'postalCode': '10500',
     'cc': 'TH',
     'city': 'บางรัก',
     'state': 'กรุงเทพมหานคร',
     'country': 'ประเทศไทย',
     'formattedAddress': ['89 Soi Charoen Krung 42/1',
      'บางรัก',
      'กรุงเทพมหานคร 10500',
      'ประเทศไทย']},
    'categories': [{'id': '4bf58dd8d48988d1fa931735',
      'name': 'Hotel',
      'pluralName': 'Hotels',
      'shortName': 'Hotel',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/hotel_',
       'suffix': '.png'},
      'primary': True}],
    'venuePage': {'id': '32930345'},
    'referralId': 'v-1

In [14]:
len(search_results['response']['venues'])

30

Let's check how many fields we have per response.


In [15]:
search_results['response']['venues'][0].keys()

dict_keys(['id', 'name', 'location', 'categories', 'venuePage', 'referralId', 'hasPerk'])

In [16]:
search_results['response']['venues'][0]

{'id': '4b0587f6f964a5200da922e3',
 'name': 'Shangri-La Hotel, Bangkok',
 'location': {'address': '89 Soi Charoen Krung 42/1',
  'lat': 13.721573379038752,
  'lng': 100.51381182368625,
  'labeledLatLngs': [{'label': 'display',
    'lat': 13.721573379038752,
    'lng': 100.51381182368625}],
  'distance': 8317,
  'postalCode': '10500',
  'cc': 'TH',
  'city': 'บางรัก',
  'state': 'กรุงเทพมหานคร',
  'country': 'ประเทศไทย',
  'formattedAddress': ['89 Soi Charoen Krung 42/1',
   'บางรัก',
   'กรุงเทพมหานคร 10500',
   'ประเทศไทย']},
 'categories': [{'id': '4bf58dd8d48988d1fa931735',
   'name': 'Hotel',
   'pluralName': 'Hotels',
   'shortName': 'Hotel',
   'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/hotel_',
    'suffix': '.png'},
   'primary': True}],
 'venuePage': {'id': '32930345'},
 'referralId': 'v-1550823297',
 'hasPerk': False}
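
Each venue is a nested dictionary, so fields such as the coordinates sit one level down under `location`. The plain-Python sketch below shows the kind of dotted-key flattening that `json_normalize` produces for nested dictionaries (for example `location.lat`). The `flatten` helper is a hypothetical illustration, not the pandas implementation, and it leaves lists such as `categories` untouched.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dictionaries, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# a trimmed copy of the first venue in the response
venue = {
    "id": "4b0587f6f964a5200da922e3",
    "name": "Shangri-La Hotel, Bangkok",
    "location": {
        "address": "89 Soi Charoen Krung 42/1",
        "lat": 13.721573379038752,
        "lng": 100.51381182368625,
    },
    "hasPerk": False,
}

flat_venue = flatten(venue)
print(sorted(flat_venue))
```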

### Neighborhood check - number of hotels per neighborhood

Let's check how many hotels we have in each neighborhood. The method is flawed: the search is limited to a radius around a single latitude/longitude point per neighborhood, taken from the initial dataframe, and those points are only rough stand-ins for the neighborhood as a whole. Let's try to get usable results anyway.
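
One way to see the limitation: the first result of the sample search is located in Bang Rak, far from the Phasi Charoen point used for the query. The sketch below computes the great-circle (haversine) distance between the two, using coordinates from the outputs above. The function name and the mean Earth radius of 6,371 km are assumptions of this sketch.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (assumed)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

query_point = (13.714722, 100.437222)                   # Phasi Charoen
shangri_la = (13.721573379038752, 100.51381182368625)   # first search result

distance = haversine_m(*query_point, *shangri_la)
print(round(distance))
```

The result is roughly 8.3 km, in line with the `distance` of 8317 reported by the API, so a hit from a search without a radius is not necessarily inside the neighborhood whose coordinates were used.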

Step 1 - build a new dataframe with hotels associated to neighborhood.


In [17]:
venues = search_results['response']['venues']
hotels_list = json_normalize(venues)

# filter columns (the search response has no 'Neighborhood' field, so it cannot be selected here)
filtered_columns = ['id', 'name', 'location.address', 'location.lat', 'location.lng']
hotels_list_b = hotels_list.loc[:, filtered_columns]

hotels_list_b.head()

for venue in venues:
    print(venue['location']['address'])

89 Soi Charoen Krung 42/1
946 Rama IV Rd
991/9 Rama I Rd
257 Charoen Nakhon Rd
61 Wireless Road (Witthayu), Lumpini, Pathumwan, Bangkok
518/8 Ploenchit Road, Lumphini
222 Ratchaprarop Rd
4 Sukhumvit Road, Khlong Toei
29 Sukhumvit 3
2 Charoen Krung 30
444 Phaya Thai Rd
1091/336 Phetchaburi Rd
143
9/99 Charoen Krung Road Bangkoleam
28 Soi Charoen Krung 70
Suan Dusit University
54 Surawong Rd
33/1 Sathon Tai Rd
188 Silom Rd
865 Rama I Rd.
222 Silom Rd.
1091/343 Nikhom Makkasan Rd.
29/9 Soi Ngam Du Phli
99 Sukhumvit 6
99 Ratchadamri Rd.
23/34-35 Soi Sukhon 1
Ratchadamnoen Klang Rd.
31 Sathon Tai Rd
268 Petchburi Rd.
777 Mahachai Rd


Now let's repeat the process for all the neighborhoods

In [18]:
def getHotels(hoods, latitudes, longitudes, LIMIT=50, radius=500):
    
    hotels_list = []
    
    for hood, lat, lng in zip(hoods, latitudes, longitudes):
        print(hood, lat, lng)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&categoryId=4bf58dd8d48988d1fa931735&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        
        # make the GET request
        results = requests.get(url).json()
        hotel_response = results['response']['venues']
        
        # one row per hotel, tagged with the neighborhood point used for the search
        for venue in hotel_response:
            hotels_list.append([
                hood, 
                lat, 
                lng,
                venue['id'], 
                venue['name'],
                venue['location']['lat'],
                venue['location']['lng']])
                
    columns = ['Neighborhood', 
               'Neighborhood Latitude', 
               'Neighborhood Longitude', 
               'Venue Id',
               'Venue Name',
               'Venue Latitude', 
               'Venue Longitude']
    
    # build the dataframe in one step instead of appending row by row
    return pd.DataFrame(hotels_list, columns=columns)

In [19]:
bkk_hotels = getHotels(hoods=df['Neighborhood'],
                       latitudes=df['latitude'],
                       longitudes=df['longitude']
                       )

Bang Kapi 13.765832999999999 100.647778
Bang Khae 13.696110999999998 100.409444
Bang Khen 13.873889000000002 100.596389
Bang Kho Laem 13.693332999999999 100.5025
Bang Khun Thian 13.660832999999998 100.435833
Bang Na 13.680081 100.5918
Bang Phlat 13.793889000000002 100.505
Bang Rak 13.730832999999999 100.524167
Bang Sue 13.809722 100.537222
Bangkok Noi 13.770867 100.467933
Bangkok Yai 13.722778 100.476389
Bueng Kum 13.785278 100.669167
Chatuchak 13.828610999999999 100.559722
Chom Thong 13.677222 100.48472199999999
Din Daeng 13.769722 100.552778
Don Mueang 13.913610999999998 100.589722
Dusit 13.776944 100.520556
Huai Khwang 13.776667000000002 100.579444
Khlong Sam Wa 13.859722 100.704167
Khlong San 13.730278 100.509722
Khlong Toei 13.708056 100.583889
Lak Si 13.8875 100.578889
Lat Krabang 13.722317000000002 100.75966899999999
Lat Phrao 13.803610999999998 100.6075
Min Buri 13.813889000000001 100.748056
Nong Chok 13.855556 100.8625
Nong Khaem 13.704722 100.348889
Pathum Wan 13.744942000000

Let's drop duplicates.

In [20]:
bkk_hotels = bkk_hotels.drop_duplicates(subset='Venue Id', keep='first')
bkk_hotels

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue Id,Venue Name,Venue Latitude,Venue Longitude
0,Bang Kapi,13.765833,100.647778,58ceb8c2ef469468c98052e2,The Pantip Hotel,13.766296,100.645489
1,Bang Kapi,13.765833,100.647778,4df90af2483bc58a610bc092,Peep Inn Hotel,13.763555,100.650416
2,Bang Kapi,13.765833,100.647778,518ff1f4498e36d160a6fd65,Mall Inn Hotel Lad Phrao 144,13.766454,100.643690
3,Bang Kapi,13.765833,100.647778,518ede52498e5714a820d674,Mall Inn Hotel ลาดพร้าว 144,13.766504,100.643558
4,Bang Khen,13.873889,100.596389,4f78ea7ce4b0b9643bc9b5ec,B8 Rooms Hotel,13.870801,100.593676
5,Bang Kho Laem,13.693333,100.502500,4e0fc9547d8bb178a8b67710,Gym & Swimming Pool @ Charoenkrung Place,13.693619,100.500752
6,Bang Kho Laem,13.693333,100.502500,513e3d7ce4b0bb777c93904a,โรงแรมแม่นํ้า รามาดา,13.693803,100.505441
7,Bang Kho Laem,13.693333,100.502500,4de4a78445dd180ae57a1422,Vivi's Room ,13.691904,100.502248
8,Bang Kho Laem,13.693333,100.502500,4d7364a2f7c38cfa8248b63d,Bangkok Marriot Resort and Spa,13.689827,100.502746
9,Bang Kho Laem,13.693333,100.502500,4e4efa48aeb70f12849a835d,Forum park,13.693858,100.507046
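
Dropping duplicates on `Venue Id` only removes the same venue returned by two searches. The listing above still contains what looks like one hotel entered twice under different ids (the two "Mall Inn Hotel" rows). The sketch below flags such pairs by proximity, using rows from the table above. The 25 m threshold and the helper names are assumptions, and a flagged pair is only a candidate for manual review.

```python
from itertools import combinations
from math import cos, hypot, radians

METERS_PER_DEGREE = 111195  # approximate length of one degree of latitude

def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation, adequate for points a few hundred metres apart."""
    mean_lat = radians((lat1 + lat2) / 2)
    dy = (lat2 - lat1) * METERS_PER_DEGREE
    dx = (lon2 - lon1) * METERS_PER_DEGREE * cos(mean_lat)
    return hypot(dx, dy)

def likely_duplicates(venues, threshold_m=25):
    """Return name pairs of venues that lie within threshold_m of each other."""
    pairs = []
    for (name_a, lat_a, lon_a), (name_b, lat_b, lon_b) in combinations(venues, 2):
        if approx_distance_m(lat_a, lon_a, lat_b, lon_b) <= threshold_m:
            pairs.append((name_a, name_b))
    return pairs

# (name, latitude, longitude) for the Bang Kapi rows
venues = [
    ("The Pantip Hotel", 13.766296, 100.645489),
    ("Peep Inn Hotel", 13.763555, 100.650416),
    ("Mall Inn Hotel Lad Phrao 144", 13.766454, 100.643690),
    ("Mall Inn Hotel ลาดพร้าว 144", 13.766504, 100.643558),
]

pairs = likely_duplicates(venues)
print(pairs)
```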


Which neighborhoods have the most hotels?

In [21]:
hotels_grouped = bkk_hotels.groupby('Neighborhood')['Venue Name'].count().sort_values(axis=0, ascending=False).to_frame().reset_index()
hotels_grouped = hotels_grouped.rename(columns={'Venue Name':'Hotels'})
hotels_grouped.head()

Unnamed: 0,Neighborhood,Hotels
0,Phra Nakhon,50
1,Ratchathewi,49
2,Bang Rak,45
3,Huai Khwang,21
4,Khlong San,21
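
The top count equals the default `LIMIT=50` passed to `getHotels`, which suggests that the search for that neighborhood may have been cut off rather than exhaustive. The sketch below flags such cases, using the counts from the table above. Treating a count at the limit as "possibly truncated" is an assumption, since the counts are taken after de-duplication.

```python
LIMIT = 50  # default limit used by getHotels

# hotel counts per neighborhood, copied from the table above
hotel_counts = {
    "Phra Nakhon": 50,
    "Ratchathewi": 49,
    "Bang Rak": 45,
    "Huai Khwang": 21,
    "Khlong San": 21,
}

# a count that reaches the limit may hide further hotels that were never returned
possibly_truncated = [hood for hood, count in hotel_counts.items() if count >= LIMIT]
print(possibly_truncated)
```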


Which neighborhoods have the fewest hotels?

In [22]:
hotels_grouped.tail()

Unnamed: 0,Neighborhood,Hotels
30,Khlong Sam Wa,1
31,Min Buri,1
32,Sai Mai,1
33,Thon Buri,1
34,Bang Khen,1


To understand why the neighborhoods differ so much, let's look at what makes these data points distinctive. How can we do it?

In [23]:
# Let's create a map with all the hotels we found.
# add markers to map
for lat, lng, neighbourhood, hotel_name in zip(bkk_hotels['Venue Latitude'], bkk_hotels['Venue Longitude'], bkk_hotels['Neighborhood'], bkk_hotels['Venue Name']):
    label = '{}, {}'.format(neighbourhood, hotel_name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(bkk)

bkk

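From the map above we can see that hotels are clustered in a few spots, mainly around the initial datapoints. This is partly a consequence of the method, since each search only covered a 500 m radius around a neighborhood's coordinates.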

How can we check what the various areas have in common?

We will adopt a simplified approach:

* Find the most common traits among hotels
* Find an area which has all these traits
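
The two steps can be sketched in plain Python: count how often each venue category appears near hotels, keep the most frequent ones as the "traits", then score each candidate area by the share of traits it contains. All data below is made up for illustration, and the choice of three traits and the coverage score are assumptions of this sketch.

```python
from collections import Counter

# hypothetical (hotel, nearby venue category) pairs
nearby = [
    ("Hotel A", "Thai Restaurant"), ("Hotel A", "Café"), ("Hotel A", "Bar"),
    ("Hotel B", "Thai Restaurant"), ("Hotel B", "Café"), ("Hotel B", "Noodle House"),
    ("Hotel C", "Thai Restaurant"), ("Hotel C", "Noodle House"), ("Hotel C", "Café"),
]

def common_traits(pairs, top_n=3):
    """Return the top_n most frequent venue categories around hotels."""
    counts = Counter(category for _, category in pairs)
    return [category for category, _ in counts.most_common(top_n)]

def coverage(area_categories, traits):
    """Share of the traits that are present in an area (0.0 to 1.0)."""
    present = sum(1 for trait in traits if trait in area_categories)
    return present / len(traits)

traits = common_traits(nearby)

# hypothetical candidate areas with the venue categories found there
candidate_areas = {
    "Area 1": {"Thai Restaurant", "Café", "Noodle House", "Park"},
    "Area 2": {"Café", "Gym"},
}

scores = {area: coverage(cats, traits) for area, cats in candidate_areas.items()}
best_area = max(scores, key=scores.get)
print(traits, scores, best_area)
```

A coverage of 1.0 means the area has every trait. Weighting traits by frequency would be a natural refinement.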
    

    

In [44]:
import time

def getNearbyVenues(names, latitudes, longitudes, LIMIT=100, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        time.sleep(2)  # pause between calls (presumably to respect the API rate limit)
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
        
        # make the GET request; 'groups' can be missing or empty when a call fails or finds nothing
        groups = requests.get(url).json().get('response', {}).get('groups', [])
        results = groups[0]['items'] if groups else []
        
        for v in results:
            categories = v['venue']['categories']
            venues_list.append((
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                categories[0]['name'] if categories else None))  # some venues have no category

    columns = ['Neighborhood', 
               'Neighborhood Latitude', 
               'Neighborhood Longitude', 
               'Venue', 
               'Venue Latitude', 
               'Venue Longitude', 
               'Venue Category']
    
    return pd.DataFrame(venues_list, columns=columns)

In [46]:
hotel_clusters = getNearbyVenues(names=bkk_hotels['Venue Name'],
                                   latitudes=bkk_hotels['Venue Latitude'],
                                   longitudes=bkk_hotels['Venue Longitude']
                                  )

The Pantip Hotel
Peep Inn Hotel
Mall Inn Hotel Lad Phrao 144
Mall Inn Hotel ลาดพร้าว 144
B8 Rooms Hotel
Gym & Swimming Pool @ Charoenkrung Place
โรงแรมแม่นํ้า รามาดา
Vivi's Room 
Bangkok Marriot Resort and Spa
Forum park
Park Village
Maxx hotel
Ban Chom Duan
Beyond Suite Hotel
โรงแรมบียอนด์ สวีท
โรงแรม ลักษ์ซัวรี่สวีท
โรงแรมลักษ์ชัวร์รี่
โรงแรมบียอนด์ เขตบางพลัด
Luxury Suite Hotel (โรงแรมลักษ์ชัวร์รี่ สวีท)
Resort Bangphlat
มาย รีสอร์ท แอท ริเวอร์
Simply Sleep Hostel
BED time
Amara Bangkok (อัมรา กรุงเทพ)
The Backpack Hostel
Money Room Apartment
Royal Asia Paradise Hotel Bangkok
Swimming Pool At Marriott Surawangse
Lod.D Bangkok
Bangkok Marriott Hotel The Surawongse
Mandarin Hotel (โรงแรมแมนดาริน)
Swimming Pool
iSanook Hostel (ไอสนุก โฮสเทล)
Bed By City Surawong - Patpong
La Residence Bangkok
Everyday Bangkok Hostel
HOFT Hostel
The Lobby Lounge Marriott Surawongse
Luxx Hotel
Amara Bangkok Pool Bar
Amara Bangkok
Red Planet Surawong
Grand Ballroom @ Mandarin Hotel
iSanook Residence
koko

In [47]:
hotel_results = hotel_clusters

In [48]:
hotel_results.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Pantip Hotel,13.766296,100.645489,Burger King (เบอร์เกอร์ คิง),13.767083,100.643001,Fast Food Restaurant
1,The Pantip Hotel,13.766296,100.645489,The Mall Bangkapi (เดอะมอลล์ บางกะปิ),13.766983,100.64248,Shopping Mall
2,The Pantip Hotel,13.766296,100.645489,H&M,13.765972,100.642471,Clothing Store
3,The Pantip Hotel,13.766296,100.645489,Starbucks (สตาร์บัคส์),13.767339,100.642136,Coffee Shop
4,The Pantip Hotel,13.766296,100.645489,UNIQLO (ยูนิโคล่) ユニクロ,13.767217,100.642717,Clothing Store


In [49]:
cat_overview = hotel_results.groupby('Venue Category')['Venue'].count()
cat_overview = cat_overview.sort_values(axis=0, ascending=False).to_frame().reset_index()
cat_overview

Unnamed: 0,Venue Category,Venue
0,Hotel,1135
1,Thai Restaurant,1029
2,Noodle House,993
3,Café,870
4,Coffee Shop,694
5,Convenience Store,615
6,Hostel,603
7,Bar,541
8,Asian Restaurant,539
9,Chinese Restaurant,520


It appears that hotels are close to... hotels! This supports the idea that hotels tend to cluster together. The next step is to pick a location that is close to venues of specific categories, so that we can target a specific clientele: for example, a Japanese restaurant and an onsen nearby are likely to attract Japanese customers.
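
Since lodging categories lead the counts simply because hotels sit next to other hotels, it can help to set them aside before looking for distinguishing venue types. The sketch below uses the ten counts shown above. Which categories count as "lodging" is an assumption of this sketch.

```python
# venue category counts copied from the table above
category_counts = {
    "Hotel": 1135,
    "Thai Restaurant": 1029,
    "Noodle House": 993,
    "Café": 870,
    "Coffee Shop": 694,
    "Convenience Store": 615,
    "Hostel": 603,
    "Bar": 541,
    "Asian Restaurant": 539,
    "Chinese Restaurant": 520,
}

LODGING = {"Hotel", "Hostel"}  # assumed set of lodging categories

# drop lodging categories, then express the rest as a share of the remaining total
non_lodging = {cat: n for cat, n in category_counts.items() if cat not in LODGING}
total = sum(non_lodging.values())
shares = {cat: n / total for cat, n in non_lodging.items()}

top_three = sorted(non_lodging, key=non_lodging.get, reverse=True)[:3]
print(top_three)
```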