# Hotel’s surroundings

## Introduction
Many travel search engines are available on the internet, for example Trivago, Hotwire and Kayak. Most of them let the user search for hotels by travel dates and price. Sometimes, however, the user is more interested in what surrounds the hotel and in how accessible it is. A user who is not familiar with the location then has to read reviews to get a sense of the area, which is time-consuming and not always feasible. This study aims to show the applicability of a search engine that takes into account what the user expects from the area around a hotel. The aim of this report is not to produce a final guideline for an application; instead, it presents a case study of how such a tool could work.
The case study is based in Montreal. A hypothetical user is looking for a hotel in Montreal, a city he has never visited, for a business trip. Since his company is paying for the stay, he is less concerned about the price than about the hotel’s surroundings. He is looking for the following three features:

- Easy access to public transit, especially the metro, because it is easier to ride than buses for someone who doesn’t know the city. He is willing to walk 500 m to a metro station.
- Access to a grocery store, because he hates hotel food and always forgets to pack important hygiene items. He is willing to walk 500 m to a grocery store.
- Easy access to a post office, because he loves sending postcards when he travels. He is willing to walk 300 m to a post office.

Although this is a single-user case, the study shows that a hotel location can be chosen more effectively when a person’s preferences are taken into account.
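
The three requirements above can be written down as data, which makes the walking thresholds explicit and easy to change for another traveller. The sketch below is illustrative only: the names `PREFERENCES` and `meets_preferences` are hypothetical and are not used elsewhere in this notebook, and the distances in the example are made up.

```python
# Hypothetical encoding of the traveller's requirements:
# amenity name -> maximum acceptable walking distance in metres
PREFERENCES = {
    "metro": 500,
    "grocery": 500,
    "post_office": 300,
}


def meets_preferences(nearest_m, preferences=PREFERENCES):
    """Return, per amenity, whether the nearest venue is within the threshold.

    nearest_m maps an amenity name to the distance (in metres) from the
    hotel to the closest venue of that type; a missing key means none found.
    """
    return {
        amenity: nearest_m.get(amenity, float("inf")) <= limit
        for amenity, limit in preferences.items()
    }


# Example with made-up distances for one hotel
flags = meets_preferences({"metro": 220.0, "grocery": 640.0})
print(flags)  # {'metro': True, 'grocery': False, 'post_office': False}
```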

This notebook contains the methodology used to build this business case.


## Methodology


1. Import Libraries

In [108]:
# Required Libraries
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

import urllib.request # library that helps with opening URLs
from bs4 import BeautifulSoup # library to scrape webpages

print('Libraries imported.')

Libraries imported.


2. Extract Data from Foursquare

In [156]:
# Client and ID information
CLIENT_ID = 'PNOJAZLGPCGBAPS5UFIRNE0BOHET5Q5CEXXUW4ROO53WILCV'
CLIENT_SECRET = '42CVWRZQKOLHZF4AZF0NQ5TOKUPN2L4FYIPNU1FF54WSZWKD'
VERSION = '20180604'
hotel = '4bf58dd8d48988d1fa931735'
grocery = '4bf58dd8d48988d118951735'
metro = '4bf58dd8d48988d1fd931735'
poste = '4bf58dd8d48988d172941735'

#Montreal
lat = 45.5017
lng = -73.5673
radius = 8000
limit=1000

# create the API request URL for each search
url_h = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        radius,
        hotel,
        limit)

url_g = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        radius,
        grocery,
        limit)

url_m = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        radius,
        metro,
        limit)

url_p = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        radius,
        poste,
        limit)

# make the GET request for each search
h_json = requests.get(url_h).json()
g_json = requests.get(url_g).json()
m_json = requests.get(url_m).json()
p_json = requests.get(url_p).json()
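
The four URLs above differ only in the category ID, so they could be generated by one helper. The following is a minimal sketch, assuming the same `venues/explore` endpoint and query parameters used in the cell above; `build_explore_url` is a hypothetical name, and the credentials shown are placeholders.

```python
from urllib.parse import urlencode

# Same endpoint as in the cell above
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(category_id, client_id, client_secret, version,
                      lat, lng, radius, limit):
    """Build one explore URL; only category_id changes between searches."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # urlencode escapes the comma as %2C
        "radius": radius,
        "categoryId": category_id,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Category IDs copied from the cell above; credentials are placeholders
categories = {
    "hotel": "4bf58dd8d48988d1fa931735",
    "grocery": "4bf58dd8d48988d118951735",
    "metro": "4bf58dd8d48988d1fd931735",
    "poste": "4bf58dd8d48988d172941735",
}
urls = {
    name: build_explore_url(cat_id, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                            "20180604", 45.5017, -73.5673, 8000, 1000)
    for name, cat_id in categories.items()
}
print(urls["metro"])
```

Each URL can then be passed to `requests.get(...).json()` exactly as in the cell above. Keeping the search radius as a single argument also avoids the four requests drifting apart.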

3. Organize JSON in dataframes

In [234]:
################   Hotel   #################

results_h = h_json["response"]['groups'][0]['items']
hotel_list = []

# return only relevant information for each nearby venue
hotel_list.append([(
    v['venue']['name'], 
    v['venue']['location']['lat'], 
    v['venue']['location']['lng'],  
    v['venue']['categories'][0]['name']) for v in results_h])


hotel_dt = pd.DataFrame([item for hotel_list in hotel_list for item in hotel_list])
hotel_dt.columns  = [
    'Name', 
    'VenueLatitude', 
    'VenueLongitude',
    'VenueCategory']

hotel_dt.head()

| | Name | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|
| 0 | Hôtel Le Germain Montréal - NOW OPEN | 45.502524 | -73.574383 | Hotel |
| 1 | Fairmont The Queen Elizabeth | 45.500451 | -73.56887 | Hotel |
| 2 | The Ritz-Carlton Montréal | 45.500191 | -73.578127 | Hotel |
| 3 | Sofitel Montréal Le Carré Doré | 45.501499 | -73.577526 | Hotel |
| 4 | Hôtel Nelligan | 45.504038 | -73.554549 | Hotel |


In [159]:
################   Groceries  #################

results_g = g_json["response"]['groups'][0]['items']
grocery_list = []

# return only relevant information for each nearby venue
grocery_list.append([(
    v['venue']['name'], 
    v['venue']['location']['lat'], 
    v['venue']['location']['lng'],  
    v['venue']['categories'][0]['name']) for v in results_g])


grocery_dt = pd.DataFrame([item for grocery_list in grocery_list for item in grocery_list])
grocery_dt.columns  = [
    'Name', 
    'VenueLatitude', 
    'VenueLongitude',
    'VenueCategory']

grocery_dt.head()

| | Name | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|
| 0 | Provigo | 45.496439 | -73.570662 | Grocery Store |
| 1 | Adonis | 45.493788 | -73.558723 | Grocery Store |
| 2 | FOU D'ICI | 45.507057 | -73.569161 | Gourmet Shop |
| 3 | Metro Plus | 45.493179 | -73.564574 | Supermarket |
| 4 | Adonis | 45.490523 | -73.583151 | Grocery Store |


In [212]:
################   Post Office   #################

results_p = p_json["response"]['groups'][0]['items']
poste_list = []

# return only relevant information for each nearby venue
poste_list.append([(
    v['venue']['name'], 
    v['venue']['location']['lat'], 
    v['venue']['location']['lng'],  
    v['venue']['categories'][0]['name']) for v in results_p])


poste_dt = pd.DataFrame([item for poste_list in poste_list for item in poste_list])
poste_dt.columns  = [
    'Name', 
    'VenueLatitude', 
    'VenueLongitude',
    'VenueCategory']

poste_dt.head()

| | Name | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|
| 0 | Purolator | 45.502926 | -73.569026 | Post Office |
| 1 | Postes Canada Post - Boutique Voilà | 45.503798 | -73.571202 | Post Office |
| 2 | Postes Canada Post | 45.501159 | -73.567928 | Post Office |
| 3 | Postes Canada | 45.500611 | -73.561832 | Post Office |
| 4 | Poste Canada | 45.504778 | -73.558809 | Post Office |


In [195]:
################   Metro Stations   #################

results_m = m_json["response"]['groups'][0]['items']
metro_list = []

# return only relevant information for each nearby venue
metro_list.append([(
    v['venue']['name'], 
    v['venue']['location']['lat'], 
    v['venue']['location']['lng'],  
    v['venue']['categories'][0]['name']) for v in results_m])


metro_dt = pd.DataFrame([item for metro_list in metro_list for item in metro_list])
metro_dt.columns  = [
    'Name', 
    'VenueLatitude', 
    'VenueLongitude',
    'VenueCategory']

# keep only metro stations and renumber the rows from 0
metro_dt = metro_dt[metro_dt['VenueCategory'] == 'Metro Station'].reset_index(drop=True)


metro_dt.head()

260
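
The four cells above repeat the same extraction with different variable names. A possible refactoring is sketched below, assuming the response layout used above (`response`, then `groups[0]`, then `items`, each item holding a `venue`); `extract_venues` is a hypothetical helper, and the sample response is made up for illustration.

```python
def extract_venues(response_json, keep_category=None):
    """Flatten an explore response into a list of row dictionaries.

    keep_category, if given, keeps only venues whose first category
    has that name (as done for 'Metro Station' above).
    """
    items = response_json["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        # Fall back to an empty category if a venue has none listed
        categories = venue.get("categories") or [{}]
        category = categories[0].get("name")
        if keep_category is not None and category != keep_category:
            continue
        rows.append({
            "Name": venue["name"],
            "VenueLatitude": venue["location"]["lat"],
            "VenueLongitude": venue["location"]["lng"],
            "VenueCategory": category,
        })
    return rows


# Made-up response with the same nesting as the API output used above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Station A", "location": {"lat": 45.50, "lng": -73.57},
               "categories": [{"name": "Metro Station"}]}},
    {"venue": {"name": "Stop B", "location": {"lat": 45.51, "lng": -73.56},
               "categories": [{"name": "Bus Stop"}]}},
]}]}}

metro_rows = extract_venues(sample, keep_category="Metro Station")
print(metro_rows)
```

Passing the returned list to `pd.DataFrame(...)` gives a frame with the same four columns and a fresh 0-based index, so no separate drop or reset step is needed after filtering.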

4. Overview geographical data with Folium

In [214]:
# create map centered on Montreal
m = folium.Map(location=[lat, lng], zoom_start=14)

# one marker color per venue type
layers = [
    (hotel_dt, 'blue'),
    (metro_dt, 'red'),
    (grocery_dt, 'purple'),
    (poste_dt, 'yellow'),
]

# add markers to the map; the loop variables are named v_lat/v_lon so that
# the Montreal center stored in lat/lng is not overwritten
for venues, color in layers:
    for v_lat, v_lon, poi in zip(venues['VenueLatitude'], venues['VenueLongitude'], venues['Name']):
        label = folium.Popup(str(poi), parse_html=True)
        folium.CircleMarker(
            [v_lat, v_lon],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            ).add_to(m)

m

5. Build main dataframe 

Distances were calculated with Pythagoras’ theorem on a locally flat projection of the coordinates. This approach has several limitations:
- It is an 'as the crow flies' distance, not the distance he would actually walk (a Manhattan distance might have been a better choice).
- It is a rough estimate because it treats the area as flat and does not fully account for the earth’s curvature.

However, for the purpose of this study, Pythagoras was assumed to be sufficient to illustrate the problem.
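
To make the approximation concrete, the sketch below writes the flat (equirectangular) distance as a function and compares it with the haversine great-circle distance, which does account for curvature. It assumes a spherical earth with a radius of 6,371 km, the same value used in the cell below; the function names are hypothetical. The two sample points are a hotel and a post office taken from the tables above.

```python
import math

EARTH_RADIUS_M = 6371 * 1000  # mean earth radius in metres (spherical model)


def equirectangular_m(lat1, lon1, lat2, lon2):
    """Pythagoras on a locally flat projection; inputs in decimal degrees."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)  # east-west component
    dy = math.radians(lat2 - lat1)                       # north-south component
    return math.hypot(dx, dy) * EARTH_RADIUS_M


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere; inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Fairmont The Queen Elizabeth and Postes Canada Post (coordinates from the tables above)
hotel = (45.500451, -73.56887)
post_office = (45.501159, -73.567928)

flat = equirectangular_m(*hotel, *post_office)
sphere = haversine_m(*hotel, *post_office)
print(f"equirectangular: {flat:.1f} m, haversine: {sphere:.1f} m")
```

At walking distances the two results differ by far less than a metre, so the flat approximation is adequate here. The larger source of error is that neither formula measures the actual walking route.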

In [251]:
##############   Groceries  #######################

# Distances are calculated with Pythagoras' theorem because only small distances are involved
import math 
radius = 500
R = 6371 * 1000
groceryL = []

for rowH in range(len(hotel_dt)):
    n_grocery =0
    for rowG in range(len(grocery_dt)):
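        # scale the longitude difference by the cosine of the mean latitude (in radians)
        mean_lat = math.radians((hotel_dt['VenueLatitude'][rowH]+grocery_dt['VenueLatitude'][rowG])/2)
        dx = math.radians(grocery_dt['VenueLongitude'][rowG]-hotel_dt['VenueLongitude'][rowH])*math.cos(mean_lat)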
        dy = math.radians(hotel_dt['VenueLatitude'][rowH]-grocery_dt['VenueLatitude'][rowG])
        distance = math.sqrt(dx*dx+dy*dy)*R
        if distance < radius:
           n_grocery = n_grocery+1
    if(n_grocery > 0):
        groceryL.append(1)
    else:
        groceryL.append(0)
        
hotel_dt['n_groceries']=groceryL

#################  Post Office  ####################
posteL = []
radius = 300

for rowH in range(len(hotel_dt)):
    n_poste =0
    for rowP in range(len(poste_dt)):
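        # scale the longitude difference by the cosine of the mean latitude (in radians)
        mean_lat = math.radians((hotel_dt['VenueLatitude'][rowH]+poste_dt['VenueLatitude'][rowP])/2)
        dx = math.radians(poste_dt['VenueLongitude'][rowP]-hotel_dt['VenueLongitude'][rowH])*math.cos(mean_lat)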
        dy = math.radians(hotel_dt['VenueLatitude'][rowH]-poste_dt['VenueLatitude'][rowP])
        distance = math.sqrt(dx*dx+dy*dy)*R
        if distance < radius:
           n_poste = n_poste+1
    if(n_poste > 0):
        posteL.append(1)
    else:
        posteL.append(0)

hotel_dt['n_poste']=posteL

############################  Metro  ############################

metroL = []
radius = 500

for rowH in range(len(hotel_dt)):
    n_metro =0
    for rowM in range(len(metro_dt)):
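        # scale the longitude difference by the cosine of the mean latitude (in radians)
        mean_lat = math.radians((hotel_dt['VenueLatitude'][rowH]+metro_dt['VenueLatitude'][rowM])/2)
        dx = math.radians(metro_dt['VenueLongitude'][rowM]-hotel_dt['VenueLongitude'][rowH])*math.cos(mean_lat)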
        dy = math.radians(hotel_dt['VenueLatitude'][rowH]-metro_dt['VenueLatitude'][rowM])
        distance = math.sqrt(dx*dx+dy*dy)*R
        if distance < radius:
           n_metro = n_metro+1
    if(n_metro > 0):
        metroL.append(1)
    else:
        metroL.append(0)
hotel_dt['n_metro']=metroL

hotel_dt.head(20)


| | Name | VenueLatitude | VenueLongitude | VenueCategory | n_groceries | n_metro | n_poste | cluster | sum |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Hôtel Le Germain Montréal - NOW OPEN | 45.502524 | -73.574383 | Hotel | 0 | 2 | 0 | 1 | 2 |
| 1 | Fairmont The Queen Elizabeth | 45.500451 | -73.56887 | Hotel | 1 | 1 | 1 | 1 | 3 |
| 2 | The Ritz-Carlton Montréal | 45.500191 | -73.578127 | Hotel | 1 | 1 | 0 | 4 | 3 |
| 3 | Sofitel Montréal Le Carré Doré | 45.501499 | -73.577526 | Hotel | 0 | 1 | 0 | 1 | 1 |
| 4 | Hôtel Nelligan | 45.504038 | -73.554549 | Hotel | 0 | 0 | 1 | 2 | 0 |
| 5 | Le Westin Montreal | 45.503606 | -73.559566 | Hotel | 1 | 2 | 1 | 1 | 4 |
| 6 | Hotel Gault | 45.501589 | -73.558355 | Hotel | 0 | 0 | 0 | 1 | 0 |
| 7 | Courtyard Montreal Downtown | 45.505269 | -73.564198 | Hotel | 1 | 2 | 0 | 0 | 4 |
| 8 | Le Petit Hôtel | 45.503041 | -73.555072 | Hotel | 0 | 0 | 1 | 2 | 0 |
| 9 | LHotel Montreal | 45.503215 | -73.558738 | Hotel | 1 | 1 | 1 | 1 | 2 |
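
The three loops above share the same structure and could be expressed once. The sketch below is one possible refactoring in plain Python, assuming the flat-projection distance with the cosine taken at the mean latitude in radians; `distance_m` and `has_nearby` are hypothetical names. The sample coordinates are two hotels and the five post offices listed in the tables above, not the full dataset.

```python
import math

EARTH_RADIUS_M = 6371 * 1000


def distance_m(p, q):
    """Flat-projection distance in metres between (lat, lon) pairs in degrees."""
    mean_lat = math.radians((p[0] + q[0]) / 2)
    dx = math.radians(q[1] - p[1]) * math.cos(mean_lat)
    dy = math.radians(q[0] - p[0])
    return math.hypot(dx, dy) * EARTH_RADIUS_M


def has_nearby(hotels, venues, max_distance_m):
    """Return one 0/1 flag per hotel: 1 if any venue lies within max_distance_m."""
    return [
        int(any(distance_m(h, v) < max_distance_m for v in venues))
        for h in hotels
    ]


# (lat, lon) pairs copied from the hotel and post office tables above
hotels = [(45.500451, -73.56887),   # Fairmont The Queen Elizabeth
          (45.500191, -73.578127)]  # The Ritz-Carlton Montréal
post_offices = [(45.502926, -73.569026), (45.503798, -73.571202),
                (45.501159, -73.567928), (45.500611, -73.561832),
                (45.504778, -73.558809)]

print(has_nearby(hotels, post_offices, 300))  # [1, 0]
```

With the notebook's dataframes, the inputs could be built with `list(zip(df['VenueLatitude'], df['VenueLongitude']))`, and the same function would serve all three amenities by changing only the venue list and the threshold.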


6. K-means to group hotels into 5 groups

In [253]:
from sklearn.cluster import KMeans 
from sklearn.preprocessing import StandardScaler

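# cluster on the three accessibility indicators only
X = hotel_dt[['n_groceries', 'n_metro', 'n_poste']].values
X = np.nan_to_num(X)
# scaled copy, kept for reference; the indicators are all 0/1, so the model below is fitted on the unscaled X
Clus_dataSet = StandardScaler().fit_transform(X)

kclusters = 5

k_means = KMeans(init = "k-means++", n_clusters = kclusters, n_init = 30)
k_means.fit(X)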
labels = k_means.labels_

hotel_dt["cluster"] = labels
hotel_dt.head(5)

| | Name | VenueLatitude | VenueLongitude | VenueCategory | n_groceries | n_metro | n_poste | cluster | sum |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Hôtel Le Germain Montréal - NOW OPEN | 45.502524 | -73.574383 | Hotel | 0 | 1 | 0 | 4 | 2 |
| 1 | Fairmont The Queen Elizabeth | 45.500451 | -73.56887 | Hotel | 1 | 1 | 1 | 1 | 3 |
| 2 | The Ritz-Carlton Montréal | 45.500191 | -73.578127 | Hotel | 1 | 1 | 0 | 0 | 3 |
| 3 | Sofitel Montréal Le Carré Doré | 45.501499 | -73.577526 | Hotel | 0 | 1 | 0 | 4 | 1 |
| 4 | Hôtel Nelligan | 45.504038 | -73.554549 | Hotel | 0 | 0 | 1 | 3 | 0 |
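
Because each hotel is described by three 0/1 indicators, there are at most eight (2 × 2 × 2) distinct feature vectors, and k-means can only separate hotels whose vectors differ. Tallying the patterns is a quick sanity check before choosing the number of clusters. The sketch below uses the (n_groceries, n_metro, n_poste) values of the five hotels shown in the output above; with the full dataframe, the same tally could be obtained with `hotel_dt.groupby(['n_groceries', 'n_metro', 'n_poste']).size()`.

```python
from collections import Counter

# (n_groceries, n_metro, n_poste) for the five hotels listed above
patterns = [
    (0, 1, 0),  # Hôtel Le Germain Montréal
    (1, 1, 1),  # Fairmont The Queen Elizabeth
    (1, 1, 0),  # The Ritz-Carlton Montréal
    (0, 1, 0),  # Sofitel Montréal Le Carré Doré
    (0, 0, 1),  # Hôtel Nelligan
]

counts = Counter(patterns)
for pattern, n in counts.most_common():
    print(pattern, n)

print("distinct patterns:", len(counts))  # 4 for this sample
```

Hotels with an identical pattern sit at the same point in feature space and therefore receive the same cluster label, as the first and fourth rows above do.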


In [258]:
# number of satisfied criteria per hotel
hotel_dt['sum'] = hotel_dt[['n_groceries', 'n_metro', 'n_poste']].sum(axis=1)
hotel_dt.head()

| | Name | VenueLatitude | VenueLongitude | VenueCategory | n_groceries | n_metro | n_poste | cluster | sum |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Hôtel Le Germain Montréal - NOW OPEN | 45.502524 | -73.574383 | Hotel | 0 | 1 | 0 | 4 | 1 |
| 1 | Fairmont The Queen Elizabeth | 45.500451 | -73.56887 | Hotel | 1 | 1 | 1 | 1 | 3 |
| 2 | The Ritz-Carlton Montréal | 45.500191 | -73.578127 | Hotel | 1 | 1 | 0 | 0 | 2 |
| 3 | Sofitel Montréal Le Carré Doré | 45.501499 | -73.577526 | Hotel | 0 | 1 | 0 | 4 | 1 |
| 4 | Hôtel Nelligan | 45.504038 | -73.554549 | Hotel | 0 | 0 | 1 | 3 | 1 |


7. Create grades based on clusters (cluster labels are arbitrary and don't reflect user preference, so the clusters are ranked by their mean score to find which are best for the user)

In [267]:
# average the numeric columns per cluster (the text columns cannot be averaged)
summ = hotel_dt.groupby(['cluster']).mean(numeric_only=True)
summ['grade'] = ""
summ.sort_values(by=['sum'], inplace=True)

In [283]:
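# summ is sorted by mean score, so grade 1 is the worst cluster and the highest grade the best
grade = list(range(1, len(summ) + 1))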
summ['grade']=grade

In [285]:
summ

| cluster | VenueLatitude | VenueLongitude | n_groceries | n_metro | n_poste | sum | grade |
|---|---|---|---|---|---|---|---|
| 3 | 45.50371 | -73.550177 | 0.153846 | 0.0 | 0.461538 | 0.615385 | 1 |
| 4 | 45.50562 | -73.57081 | 0.111111 | 1.0 | 0.111111 | 1.222222 | 2 |
| 0 | 45.501198 | -73.571792 | 1.0 | 1.0 | 0.0 | 2.0 | 3 |
| 2 | 45.507127 | -73.569949 | 1.0 | 1.0 | 0.1 | 2.1 | 4 |
| 1 | 45.506612 | -73.565106 | 1.0 | 1.0 | 0.428571 | 2.428571 | 5 |


In [294]:
# cluster label -> grade (named grade_by_cluster to avoid shadowing the built-in dict)
grade_by_cluster = summ['grade'].to_dict()
grade_by_cluster

{3: 1, 4: 2, 0: 3, 2: 4, 1: 5}

In [296]:
hotel_dt['grade']=hotel_dt['cluster'].map(grade_by_cluster)
hotel_dt

| | Name | VenueLatitude | VenueLongitude | VenueCategory | n_groceries | n_metro | n_poste | cluster | sum | grade |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Hôtel Le Germain Montréal - NOW OPEN | 45.502524 | -73.574383 | Hotel | 0 | 1 | 0 | 4 | 1 | 2 |
| 1 | Fairmont The Queen Elizabeth | 45.500451 | -73.56887 | Hotel | 1 | 1 | 1 | 1 | 3 | 5 |
| 2 | The Ritz-Carlton Montréal | 45.500191 | -73.578127 | Hotel | 1 | 1 | 0 | 0 | 2 | 3 |
| 3 | Sofitel Montréal Le Carré Doré | 45.501499 | -73.577526 | Hotel | 0 | 1 | 0 | 4 | 1 | 2 |
| 4 | Hôtel Nelligan | 45.504038 | -73.554549 | Hotel | 0 | 0 | 1 | 3 | 1 | 1 |
| 5 | Le Westin Montreal | 45.503606 | -73.559566 | Hotel | 1 | 1 | 1 | 1 | 3 | 5 |
| 6 | Hotel Gault | 45.501589 | -73.558355 | Hotel | 0 | 0 | 0 | 3 | 0 | 1 |
| 7 | Courtyard Montreal Downtown | 45.505269 | -73.564198 | Hotel | 1 | 1 | 0 | 1 | 2 | 5 |
| 8 | Le Petit Hôtel | 45.503041 | -73.555072 | Hotel | 0 | 0 | 1 | 3 | 1 | 1 |
| 9 | LHotel Montreal | 45.503215 | -73.558738 | Hotel | 1 | 1 | 1 | 1 | 3 | 5 |
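
The grading step can be summarised as: average the score within each cluster, sort the clusters by that average, and number them from worst to best. A minimal pure-Python sketch of that logic follows; `grade_clusters` is a hypothetical name, and the (cluster, sum) pairs are only the ten rows shown in the table above, so the averages differ from those computed on the full dataset.

```python
from collections import defaultdict


def grade_clusters(rows):
    """Map each cluster label to a grade (1 = lowest mean score).

    rows is an iterable of (cluster_label, score) pairs.
    """
    scores = defaultdict(list)
    for cluster, score in rows:
        scores[cluster].append(score)
    means = {c: sum(v) / len(v) for c, v in scores.items()}
    ranked = sorted(means, key=means.get)  # ascending mean score
    return {cluster: grade for grade, cluster in enumerate(ranked, start=1)}


# (cluster, sum) for the ten hotels listed above
rows = [(4, 1), (1, 3), (0, 2), (4, 1), (3, 1),
        (1, 3), (3, 0), (1, 2), (3, 1), (1, 3)]

print(grade_clusters(rows))  # {3: 1, 4: 2, 0: 3, 1: 4}
```

Only four clusters appear in these ten rows, so the top grade here is 4 rather than 5. A grade describes the cluster average rather than the individual hotel: in the table above, Courtyard Montreal Downtown has no post office within range yet inherits the top grade from its cluster.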


8. Create a color-coded map that shows the best hotels for the user in green and the worst in red

In [311]:
# center on Montreal (same coordinates as in step 2)
m2 = folium.Map(location=[45.5017, -73.5673], zoom_start=13)

# set color scheme for the clusters: sample the red-yellow-green colormap at one point per grade
colors_array = cm.RdYlGn(np.linspace(0, 1, kclusters))

rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for h_lat, h_lon, poi, grade in zip(hotel_dt['VenueLatitude'], hotel_dt['VenueLongitude'], hotel_dt['Name'], hotel_dt['grade']):
    label = folium.Popup(str(poi) + ' Grade ' + str(grade), parse_html=True)
    folium.CircleMarker(
        [h_lat, h_lon],
        radius=5,
        popup=label,
        color=rainbow[grade-1],
        fill=True,
        fill_color=rainbow[grade-1],
        ).add_to(m2)
m2