# Ranking hotels by the number of top attractions around them

**IBM Data Science Professional Certificate Specialization Capstone Project**

## Table of Contents

1. [Introduction](#Introduction)
    1. [Description of the Problem](#Description-of-the-Problem)
    2. [Description of the Data](#Description-of-the-Data)
2. Methodology
    1. [Overview](#Methodology)
    2. [The categories](#The-categories)
    3. [The most popular local attractions](#The-most-popular-local-attractions)
    4. [The most popular hotels](#The-most-popular-hotels)
    5. [The most popular local attractions around the most popular hotels](#The-most-popular-local-attractions-around-the-most-popular-hotels)
3. [Results](#Results)
4. [Discussion](#Discussion)
5. [Conclusion](#Conclusion)

## Introduction

### Description of the Problem

Finding the best hotel for a short stay can be tricky. On the one hand, travellers want to stay somewhere nice; on the other, they want to experience as much of the local culture as possible.

The following report explores the feasibility of locating the best possible hotel within reach of as many local attractions as possible. For the sake of the exercise, the most popular hotels in Prague, Czech Republic, are used.

**Target audience**:
- End users: with a bit of technical knowledge, the approach can be used to find a suitable hotel in any city of one's choice.
- Investors: the approach can be implemented in any travel-related app.

### Description of the Data

The Foursquare API is used to fetch all required data:
- The list of categories supported by Foursquare. (Endpoint: categories)
- The list of the most popular local attractions in a given city. (Endpoint: explore)
- The list of the most popular hotels amongst Foursquare users. (Endpoint: search)
- The list of the most popular local attractions around the listed hotels. (Endpoint: explore)

The category IDs from the list of categories are used in subsequent API calls to Foursquare to filter the results by category.

The list of the most popular local attractions in a given city is the first of two main datasets.

The list of the most popular hotels is joined with the lists of the most popular attractions around each of them. This creates the second of two main datasets. 

The main datasets are then used to rank the hotels by the number of top attractions around them.
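The sketch below illustrates the ranking step. It assumes that an attraction counts for a hotel when the same Foursquare venue ID appears in two places: the city-wide top list and that hotel's list of nearby attractions. The function name `rank_hotels_by_top_attractions` and the toy IDs are hypothetical.

```python
def rank_hotels_by_top_attractions(top_attraction_ids, hotel_venue_pairs):
    """Rank hotels by how many of their nearby venues are also city-wide top attractions.

    top_attraction_ids: venue IDs from the city-wide explore call.
    hotel_venue_pairs: (hotel_name, venue_id) rows from the per-hotel explore calls.
    Returns (hotel_name, count) tuples, highest count first.
    """
    top_ids = set(top_attraction_ids)
    counts = {}
    for hotel_name, venue_id in hotel_venue_pairs:
        counts.setdefault(hotel_name, 0)  # keep hotels with no matches in the ranking
        if venue_id in top_ids:
            counts[hotel_name] += 1
    # Highest count first; ties broken alphabetically for a deterministic order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy data with hypothetical IDs
ranking = rank_hotels_by_top_attractions(
    ['v1', 'v2', 'v3'],
    [('Hotel A', 'v1'), ('Hotel A', 'v9'),
     ('Hotel B', 'v1'), ('Hotel B', 'v2'),
     ('Hotel C', 'v8')],
)
print(ranking)  # [('Hotel B', 2), ('Hotel A', 1), ('Hotel C', 0)]
```

With the notebook's dataframes, the inputs could be `most_popular_attractions['Venue Id']` and `zip(most_popular_attracations_around_most_popular_hotels['Hotel Name'], most_popular_attracations_around_most_popular_hotels['Venue Id'])`.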

## Methodology

The following section describes in detail the datasets used in the report.

The data is fetched with the Python library `requests`, and stored and manipulated with the `pandas` library.

The following global variables are defined for the HTTP requests:

In [1]:
import os

CLIENT_ID = os.environ['CAPSTONE_FOURSQUARE_CLIENT_ID'] # Foursquare Client ID
CLIENT_SECRET = os.environ['CAPSTONE_FOURSQUARE_CLIENT_SECRET'] # Foursquare Client Secret
VERSION = '20180605' # Foursquare API version
DEFAULT_NEAR = 'Prague, Czech Republic' # City of interest

In [2]:
import requests
import pandas as pd

### The categories

The _categories_ endpoint returns a hierarchical list of Foursquare categories. The full API reference can be found here: https://developer.foursquare.com/docs/api/venues/categories

The endpoint returns a JSON response in which each level of the hierarchy is stored in an array named _categories_.

The tree structure is converted into a table containing the category ID, the full path, and the name at each level.
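The fetching code in this section uses three nested loops, so it handles exactly three levels. A depth-agnostic alternative walks the tree recursively instead. It pads the level columns to a fixed width, or truncates them if the tree is deeper. The sketch assumes every node is a dict with `id`, `name` and an optional `categories` list of children. The helper `flatten_categories` and the toy tree are hypothetical.

```python
def flatten_categories(nodes, max_depth=3):
    """Flatten a nested category tree into (id, path, level_1, ..., level_max_depth) rows.

    Each node is assumed to be a dict with 'id', 'name' and an optional
    'categories' list of children. Rows are emitted depth-first, parents first.
    """
    rows = []

    def walk(children, parents):
        for node in children:
            names = parents + (node['name'],)
            # Pad with '' (or cut off deeper names) so every row has max_depth level columns
            levels = (names + ('',) * max_depth)[:max_depth]
            rows.append((node['id'], ' - '.join(names)) + levels)
            walk(node.get('categories', []), names)

    walk(nodes, ())
    return rows


# Hypothetical three-level tree
tree = [{'id': 'a', 'name': 'Arts', 'categories': [
    {'id': 'b', 'name': 'Museum', 'categories': [
        {'id': 'c', 'name': 'Art Museum', 'categories': []}]}]}]

for row in flatten_categories(tree):
    print(row)
```

The resulting rows have the same shape as the tuples built by the nested loops. They can be passed to `pd.DataFrame` with the same five column names.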

**Fetching and parsing the data**

In [3]:
data = []

url = 'https://api.foursquare.com/v2/venues/categories?client_id={}&client_secret={}&v={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION
)

response = requests.get(url).json()['response']

for level1 in response['categories']:
    data.append((
        level1['id'],
        level1['name'],
        level1['name'],
        '',
        ''
    ))
    
    for level2 in level1['categories']:
        data.append((
            level2['id'],
            level1['name'] + ' - ' + level2['name'],
            level1['name'],
            level2['name'],
            ''
        ))
        
        for level3 in level2['categories']:
            data.append((
                level3['id'],
                level1['name'] + ' - ' + level2['name'] + ' - ' + level3['name'],
                level1['name'],
                level2['name'],
                level3['name']
            ))
            
categories = pd.DataFrame(data, columns = [
    'Category Id', 
    'Category Path', 
    'Category Level 1', 
    'Category Level 2', 
    'Category Level 3'
])

In [4]:
print("Shape of the dataframe", categories.shape)

Shape of the dataframe (833, 5)


In [5]:
categories.head(10)

| | Category Id | Category Path | Category Level 1 | Category Level 2 | Category Level 3 |
|---|---|---|---|---|---|
| 0 | 4d4b7104d754a06370d81259 | Arts & Entertainment | Arts & Entertainment | | |
| 1 | 56aa371be4b08b9a8d5734db | Arts & Entertainment - Amphitheater | Arts & Entertainment | Amphitheater | |
| 2 | 4fceea171983d5d06c3e9823 | Arts & Entertainment - Aquarium | Arts & Entertainment | Aquarium | |
| 3 | 4bf58dd8d48988d1e1931735 | Arts & Entertainment - Arcade | Arts & Entertainment | Arcade | |
| 4 | 4bf58dd8d48988d1e2931735 | Arts & Entertainment - Art Gallery | Arts & Entertainment | Art Gallery | |
| 5 | 4bf58dd8d48988d1e4931735 | Arts & Entertainment - Bowling Alley | Arts & Entertainment | Bowling Alley | |
| 6 | 4bf58dd8d48988d17c941735 | Arts & Entertainment - Casino | Arts & Entertainment | Casino | |
| 7 | 52e81612bcbc57f1066b79e7 | Arts & Entertainment - Circus | Arts & Entertainment | Circus | |
| 8 | 4bf58dd8d48988d18e941735 | Arts & Entertainment - Comedy Club | Arts & Entertainment | Comedy Club | |
| 9 | 5032792091d4c4b30a586d5c | Arts & Entertainment - Concert Hall | Arts & Entertainment | Concert Hall | |


A quick inspection of the data shows that there are 833 categories. No further analysis of the categories is made, as they are only needed to look up category IDs for the subsequent API calls.

Let's try to identify the category ID for hotels:

In [6]:
categories[categories['Category Path'].str.contains('Hotel')]

| | Category Id | Category Path | Category Level 1 | Category Level 2 | Category Level 3 |
|---|---|---|---|---|---|
| 384 | 4bf58dd8d48988d1d5941735 | Nightlife Spot - Bar - Hotel Bar | Nightlife Spot | Bar | Hotel Bar |
| 802 | 4bf58dd8d48988d1fa931735 | Travel & Transport - Hotel | Travel & Transport | Hotel | |
| 803 | 4bf58dd8d48988d1f8931735 | Travel & Transport - Hotel - Bed & Breakfast | Travel & Transport | Hotel | Bed & Breakfast |
| 804 | 4f4530a74b9074f6e4fb0100 | Travel & Transport - Hotel - Boarding House | Travel & Transport | Hotel | Boarding House |
| 805 | 4bf58dd8d48988d1ee931735 | Travel & Transport - Hotel - Hostel | Travel & Transport | Hotel | Hostel |
| 806 | 4bf58dd8d48988d132951735 | Travel & Transport - Hotel - Hotel Pool | Travel & Transport | Hotel | Hotel Pool |
| 807 | 5bae9231bedf3950379f89cb | Travel & Transport - Hotel - Inn | Travel & Transport | Hotel | Inn |
| 808 | 4bf58dd8d48988d1fb931735 | Travel & Transport - Hotel - Motel | Travel & Transport | Hotel | Motel |
| 809 | 4bf58dd8d48988d12f951735 | Travel & Transport - Hotel - Resort | Travel & Transport | Hotel | Resort |
| 810 | 56aa371be4b08b9a8d5734e1 | Travel & Transport - Hotel - Vacation Rental | Travel & Transport | Hotel | Vacation Rental |


As the search results show, there are multiple matching rows: a separate category for hotel bars, and several different types of hotels and hotel facilities.

However, the most relevant entry is the second row (index 802), as it is the general _Hotel_ category.

In [7]:
categories.iloc[[802]]

| | Category Id | Category Path | Category Level 1 | Category Level 2 | Category Level 3 |
|---|---|---|---|---|---|
| 802 | 4bf58dd8d48988d1fa931735 | Travel & Transport - Hotel | Travel & Transport | Hotel | |
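Selecting the row by position (802) works for this snapshot of the data. It may break, though, if Foursquare adds or reorders categories. A more robust lookup matches the level columns exactly instead of searching the path for the substring "Hotel". In the sketch below, the three rows are copied from the search output above, and the variable names are illustrative. The toy frame is deliberately not named `categories`, so it does not overwrite the notebook's dataframe.

```python
import pandas as pd

# Three rows copied from the search output above
sample_categories = pd.DataFrame([
    ('4bf58dd8d48988d1d5941735', 'Nightlife Spot - Bar - Hotel Bar', 'Nightlife Spot', 'Bar', 'Hotel Bar'),
    ('4bf58dd8d48988d1fa931735', 'Travel & Transport - Hotel', 'Travel & Transport', 'Hotel', ''),
    ('4bf58dd8d48988d1ee931735', 'Travel & Transport - Hotel - Hostel', 'Travel & Transport', 'Hotel', 'Hostel'),
], columns=['Category Id', 'Category Path', 'Category Level 1', 'Category Level 2', 'Category Level 3'])

# Exact match on the level columns: excludes 'Hotel Bar' and the hotel sub-types
is_general_hotel = (
    (sample_categories['Category Level 2'] == 'Hotel')
    & (sample_categories['Category Level 3'] == '')
)
# .item() raises ValueError unless exactly one row matches
hotel_category_id = sample_categories.loc[is_general_hotel, 'Category Id'].item()
print(hotel_category_id)  # 4bf58dd8d48988d1fa931735
```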


### The most popular local attractions

The _explore_ endpoint is used to fetch the data. The full API reference can be found here: https://developer.foursquare.com/docs/api/venues/explore

The endpoint returns a JSON response with object _groups_ containing an array _items_ with the list of recommended places.

The most relevant fields are:
- Venue ID
- Venue Name
- Venue Location: Latitude and Longitude
- Venue Category

This list is the list of *the most popular local attractions in the city*.
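The parsing code in this section takes the first category of each venue with `['categories'][0]`. That expression would raise an `IndexError` if a venue were returned without a category. The defensive variant below is tested against a hand-made payload shaped like the response described above. The helper name `parse_explore_items` is hypothetical, and so are the second venue and its ID `x1`, which are invented for illustration.

```python
def parse_explore_items(payload):
    """Convert an explore-style JSON payload into (id, name, lat, lng, category) tuples.

    Follows the layout described above (response -> groups[0] -> items -> venue).
    A venue without categories gets None instead of raising IndexError.
    """
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((
            venue['id'],
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'] if categories else None,
        ))
    return rows


# Hand-made payload: the first venue reuses the Stromovka values from the results,
# the second is invented to show the missing-category case
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'id': '4adcda9ff964a520654d21e3', 'name': 'Stromovka',
               'location': {'lat': 50.105098, 'lng': 14.42184},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'id': 'x1', 'name': 'Uncategorised spot',
               'location': {'lat': 50.0, 'lng': 14.4},
               'categories': []}},
]}]}}

for row in parse_explore_items(sample_payload):
    print(row)
```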

**Fetching and parsing the data**

In [8]:
NUMBER_OF_MOST_POPULAR_ATTRACTIONS=100

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&limit={}&near={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    NUMBER_OF_MOST_POPULAR_ATTRACTIONS,
    DEFAULT_NEAR
)

results = requests.get(url).json()["response"]['groups'][0]['items']
        
data = [(
    v['venue']['id'],
    v['venue']['name'], 
    v['venue']['location']['lat'], 
    v['venue']['location']['lng'],  
    v['venue']['categories'][0]['name']
) for v in results]

most_popular_attractions = pd.DataFrame(data, columns = [
    'Venue Id', 
    'Venue Name', 
    'Venue Latitude', 
    'Venue Longitude', 
    'Venue Category'
])

In [9]:
print("Shape of the dataframe", most_popular_attractions.shape)

Shape of the dataframe (100, 5)


In [10]:
most_popular_attractions.head(10)

| | Venue Id | Venue Name | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|
| 0 | 4adcda9ff964a520654d21e3 | Stromovka | 50.105098 | 14.42184 | Park |
| 1 | 4c5ed6b67735c9b617ca9272 | Havlíčkovy sady (Grébovka) | 50.068765 | 14.443674 | Park |
| 2 | 4b78047af964a5203bb22ee3 | Letenské sady | 50.096275 | 14.414406 | Park |
| 3 | 4affd3cdf964a5201c3a22e3 | Vyšehrad | 50.064095 | 14.419387 | Castle |
| 4 | 51c23e25498e8ef76e18c2c6 | Vyhlídka Riegrovy sady | 50.079692 | 14.440136 | Scenic Lookout |
| 5 | 5311ae7311d2b14c76832d24 | Naše maso | 50.090763 | 14.42696 | Butcher |
| 6 | 4b464feef964a520451d26e3 | Kampa | 50.083981 | 14.407711 | Park |
| 7 | 4bd473f46798ef3be09c618d | Vyhlídková cesta | 50.085683 | 14.391567 | Scenic Lookout |
| 8 | 539842a2498ee08ed9e0b8ce | Mozzarellart | 50.065568 | 14.439399 | Cheese Shop |
| 9 | 4adcdaa0f964a5209a4d21e3 | Riegrovy sady | 50.080498 | 14.441271 | Park |


A quick inspection of the created dataframe shows that the top 100 local attractions were fetched correctly. Each record contains the attraction's ID, name, category and GPS coordinates.

### The most popular hotels

The _search_ endpoint is used to fetch the data. The full API reference can be found here: https://developer.foursquare.com/docs/api/venues/search

The endpoint returns a JSON response with an array _venues_.

The most relevant fields are:
- Hotel ID
- Hotel Name
- Hotel Location: Latitude and Longitude

This list is the list of *the most popular hotels*.
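The URLs in this notebook are assembled with `str.format`, which leaves the encoding of values such as `Prague, Czech Republic` to `requests`. The sketch below instead builds the search URL from a dictionary with the standard library's `urllib.parse.urlencode`, so every value is percent-encoded explicitly. The helper `build_search_url` is hypothetical. The parameter names are the ones used in this notebook, and the credentials are placeholders.

```python
from urllib.parse import urlencode

SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, version, near, category_id, limit):
    """Build a search URL from a dict so every query value is percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'limit': limit,
        'near': near,
        'categoryId': category_id,
    }
    return SEARCH_URL + '?' + urlencode(params)


# Placeholder credentials; the other values are the ones used in this notebook
example_url = build_search_url('MY_ID', 'MY_SECRET', '20180605',
                               'Prague, Czech Republic', '4bf58dd8d48988d1fa931735', 25)
print(example_url)
```

Alternatively, `requests.get` accepts such a dictionary directly through its `params` argument.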

**Fetching and parsing the data**

In [11]:
HOTEL_CATEGORY_ID = '4bf58dd8d48988d1fa931735'
NUMBER_OF_HOTELS = 25

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&limit={}&near={}&categoryId={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    NUMBER_OF_HOTELS,
    DEFAULT_NEAR,
    HOTEL_CATEGORY_ID
)

results = requests.get(url).json()["response"]['venues']
        
data = [(
    v['id'],
    v['name'],
    v['location']['lat'],
    v['location']['lng']
) for v in results]

most_popular_hotels = pd.DataFrame(data, columns = [
  'Hotel Id', 
  'Hotel Name', 
  'Hotel Latitude', 
  'Hotel Longitude'
])

In [12]:
most_popular_hotels

| | Hotel Id | Hotel Name | Hotel Latitude | Hotel Longitude |
|---|---|---|---|---|
| 0 | 4adcda9af964a520544c21e3 | InterContinental Prague | 50.091498 | 14.41866 |
| 1 | 5b486ba2666116002c231866 | Mama Shelter | 50.102394 | 14.431907 |
| 2 | 4adcda9af964a520104c21e3 | Hotel International Prague | 50.109227 | 14.393567 |
| 3 | 4adcda9af964a5202b4c21e3 | Krystal Praha | 50.093888 | 14.34117 |
| 4 | 4adcda9af964a520204c21e3 | Occidental Praha | 50.043463 | 14.439222 |
| 5 | 4bd0fbac20cd9960319b2e9e | ibis Praha Malá Strana | 50.072277 | 14.400727 |
| 6 | 56e3fbfe498e069c58721350 | Marriott Prague | 50.088136 | 14.431245 |
| 7 | 4bcd56c40687ef3b4fcee0cc | Grand Majestic Plaza | 50.090354 | 14.430538 |
| 8 | 4adcda9bf964a5206d4c21e3 | Hotel Josef | 50.089958 | 14.425959 |
| 9 | 4b801a9ef964a5208b5230e3 | Hotel Juno | 50.071841 | 14.499934 |


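Inspection of the dataframe shows that the hotels were fetched correctly and that their IDs, names and locations are available. Note that the request asks for up to `NUMBER_OF_HOTELS` (25) hotels, so the 10 rows listed above are not the complete list.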

### The most popular local attractions around the most popular hotels

The _explore_ endpoint is used to fetch the data. The full API reference can be found here: https://developer.foursquare.com/docs/api/venues/explore

This endpoint has been previously described for *the most popular attractions*. The difference here is that the HTTP call is made repeatedly for each hotel of *the most popular hotels* list.
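Separating the loop from the HTTP call lets it be tested without network access. The sketch below assumes a hypothetical `fetch_items(lat, lng)` callable. It stands in for the explore request made with `ll=lat,lng` and returns `(venue_id, venue_name)` pairs. The sample hotels and the `fake_fetch` stub are invented for illustration.

```python
def collect_attractions_per_hotel(hotels, fetch_items):
    """Call fetch_items once per hotel and return one flat list of rows.

    hotels: iterable of (hotel_id, hotel_name, lat, lng) tuples.
    fetch_items: hypothetical callable (lat, lng) -> list of (venue_id, venue_name)
    tuples; in the notebook this role is played by the explore request with ll=lat,lng.
    """
    rows = []
    for hotel_id, hotel_name, lat, lng in hotels:
        # extend (not append) adds one row per venue instead of one nested list per hotel
        rows.extend((hotel_id, hotel_name) + venue for venue in fetch_items(lat, lng))
    return rows


def fake_fetch(lat, lng):
    """Offline stand-in for the explore call, returning two fixed venues."""
    return [('v1', 'Venue near {:.2f},{:.2f}'.format(lat, lng)), ('v2', 'Another venue')]


sample_hotels = [('h1', 'Hotel A', 50.09, 14.42), ('h2', 'Hotel B', 50.07, 14.40)]
sample_rows = collect_attractions_per_hotel(sample_hotels, fake_fetch)
print(len(sample_rows))  # 4
print(sample_rows[0])    # ('h1', 'Hotel A', 'v1', 'Venue near 50.09,14.42')
```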

**Fetching and parsing the data**

In [13]:
NUMBER_OF_ATTRACTIONS_PER_HOTEL=25
ATTRACTIONS_RADIUS=5000

base_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&limit={}&radius={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    NUMBER_OF_ATTRACTIONS_PER_HOTEL,
    ATTRACTIONS_RADIUS
)

data = []

for _, hotel in most_popular_hotels.iterrows():
    # Centre each explore call on the hotel's coordinates
    url = base_url + '&ll={},{}'.format(
        hotel['Hotel Latitude'],
        hotel['Hotel Longitude']
    )
    
    results = requests.get(url).json()["response"]['groups'][0]['items']

    # extend (rather than append) keeps one flat list of (hotel, venue) tuples
    data.extend((
        hotel['Hotel Id'],
        hotel['Hotel Name'],
        v['venue']['id'],
        v['venue']['name'], 
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']
    ) for v in results)
    
most_popular_attracations_around_most_popular_hotels = pd.DataFrame(data, columns = [
    'Hotel Id', 
    'Hotel Name', 
    'Venue Id', 
    'Venue Name', 
    'Venue Latitude', 
    'Venue Longitude', 
    'Venue Category'
])

In [14]:
most_popular_attracations_around_most_popular_hotels.shape

(625, 7)

In [15]:
most_popular_attracations_around_most_popular_hotels.head(10)

| | Hotel Id | Hotel Name | Venue Id | Venue Name | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | 4adcda9af964a520544c21e3 | InterContinental Prague | 56aa3cca498e65a06469bda1 | COS | 50.091086 | 14.41842 | Boutique |
| 1 | 4adcda9af964a520544c21e3 | InterContinental Prague | 4db6ed13cda1c57c828d673c | Galerie Rudolfinum | 50.090029 | 14.415059 | Art Gallery |
| 2 | 4adcda9af964a520544c21e3 | InterContinental Prague | 4ffc4783e4b068da70934f42 | Ingredients | 50.088439 | 14.419088 | Cosmetics Shop |
| 3 | 4adcda9af964a520544c21e3 | InterContinental Prague | 54f7836f498ec71694f873dc | L'Fleur Bar | 50.089745 | 14.421682 | Cocktail Bar |
| 4 | 4adcda9af964a520544c21e3 | InterContinental Prague | 4adcda9bf964a520af4c21e3 | Bugsy's Bar | 50.088948 | 14.419832 | Cocktail Bar |
| 5 | 4adcda9af964a520544c21e3 | InterContinental Prague | 4b51eb43f964a520505b27e3 | Yami Sushi House | 50.089428 | 14.423481 | Sushi Restaurant |
| 6 | 4adcda9af964a520544c21e3 | InterContinental Prague | 4bbdfa09f57ba593a6b3aeb9 | Staroměstské náměstí \| Old Town Square (Starom... | 50.087493 | 14.421091 | Plaza |
| 7 | 4adcda9af964a520544c21e3 | InterContinental Prague | 555375e2498e72fcb27e402a | Stalin Containall | 50.094588 | 14.415922 | Beer Garden |
| 8 | 4adcda9af964a520544c21e3 | InterContinental Prague | 4cdfd87ef8cdb1f7e9119412 | Stalin Skate Plaza | 50.095013 | 14.415987 | Skate Park |
| 9 | 4adcda9af964a520544c21e3 | InterContinental Prague | 4adcda9af964a520234c21e3 | Hotel Maximilian | 50.09107 | 14.424626 | Hotel |


In [16]:
most_popular_attracations_around_most_popular_hotels.tail(10)

| | Hotel Id | Hotel Name | Venue Id | Venue Name | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 615 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 52d501e3498e0793fa053add | Sisters | 50.090752 | 14.426957 | Sandwich Place |
| 616 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 521dccba498e314c4c758d38 | Pot au Feu | 50.089119 | 14.426156 | French Restaurant |
| 617 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 56817ae2498e729ea7c59838 | Hotel BoHo | 50.086021 | 14.429245 | Hotel |
| 618 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 554219d8498e5c23e728d00e | Meat & Greet | 50.085919 | 14.429909 | Burger Joint |
| 619 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 5662c71d498e4003dfb5de01 | onesip coffee | 50.091269 | 14.425565 | Coffee Shop |
| 620 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 4adcda9af964a520234c21e3 | Hotel Maximilian | 50.09107 | 14.424626 | Hotel |
| 621 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 4b51eb43f964a520505b27e3 | Yami Sushi House | 50.089428 | 14.423481 | Sushi Restaurant |
| 622 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 4e66411318a8ce02fe7cea46 | Whiskeria | 50.084959 | 14.429694 | Whisky Bar |
| 623 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 5718a7f7cd10d897c72d6126 | Hamleys | 50.085347 | 14.425668 | Toy / Game Store |
| 624 | 4bdebff56316d13afbb7a011 | ibis Praha Old Town | 58b9236d06f1a31b2627d3ae | Kantýna | 50.083593 | 14.429031 | Steakhouse |


A quick look at the data shows that the dataframe holds 625 records. This matches 25 hotels (`NUMBER_OF_HOTELS`) times 25 attractions per hotel (`NUMBER_OF_ATTRACTIONS_PER_HOTEL`). Each row contains both the hotel and the attraction details.
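The `limit` parameter caps the number of results rather than fixing it. A matching total therefore does not by itself guarantee that every hotel received exactly 25 attractions. Below is a small per-hotel check based on `collections.Counter`. The helper name `hotels_with_unexpected_counts` is hypothetical, and the toy input is invented.

```python
from collections import Counter


def hotels_with_unexpected_counts(hotel_ids, expected_per_hotel):
    """Return {hotel_id: row_count} for hotels whose row count differs from the expectation."""
    counts = Counter(hotel_ids)
    return {hotel: n for hotel, n in counts.items() if n != expected_per_hotel}


# Toy input: 'h2' received one venue fewer than requested
print(hotels_with_unexpected_counts(['h1'] * 25 + ['h2'] * 24, 25))  # {'h2': 24}
```

In the notebook, passing `most_popular_attracations_around_most_popular_hotels['Hotel Id']` and `NUMBER_OF_ATTRACTIONS_PER_HOTEL` should return an empty dict when every hotel received the full count.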

## Results

TODO:
- Stats for venues categories
- Map of top 100 venues
- Map of hotels
- Calculate how many top venues are near each hotel
- That ^ + split on each category?
- Map everything?
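For the item "Calculate how many top venues are near each hotel", one possible approach is a great-circle (haversine) distance check. The sketch below assumes a 500 m radius, which matches the `folium.Circle` radius drawn around each hotel on the maps. It also treats the Earth as a sphere, which should be accurate enough at city scale. The function names and toy coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lng2 - lng1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_top_attractions_nearby(hotels, attractions, radius_m=500):
    """Map each hotel name to the number of attractions within radius_m metres.

    hotels: iterable of (name, lat, lng); attractions: list of (name, lat, lng).
    """
    return {
        hotel_name: sum(
            1 for _, a_lat, a_lng in attractions
            if haversine_m(h_lat, h_lng, a_lat, a_lng) <= radius_m
        )
        for hotel_name, h_lat, h_lng in hotels
    }


# Toy coordinates: 0.001 degree of latitude is roughly 111 m
toy_hotels = [('Hotel A', 50.0, 14.0), ('Hotel B', 50.02, 14.0)]
toy_attractions = [('P1', 50.0, 14.0), ('P2', 50.003, 14.0), ('P3', 50.01, 14.0)]
print(count_top_attractions_nearby(toy_hotels, toy_attractions))  # {'Hotel A': 2, 'Hotel B': 0}
```

With the notebook's data, the hotels could come from `zip(most_popular_hotels['Hotel Name'], most_popular_hotels['Hotel Latitude'], most_popular_hotels['Hotel Longitude'])`. The attractions could come from the same kind of `zip` over `most_popular_attractions`, wrapped in `list(...)` so it can be iterated once per hotel.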

All maps below are centred on Prague (latitude 50.08804, longitude 14.42076).

In [17]:
import folium

In [18]:
map1 = folium.Map(location=[50.08804,14.42076], zoom_start=13)

for _, hotel in most_popular_hotels.iterrows():
    folium.CircleMarker(
        [hotel['Hotel Latitude'], hotel['Hotel Longitude']],
        radius=5,
        color='blue',
        popup=folium.Popup(hotel['Hotel Name'])
    ).add_to(map1)

In [19]:
map1

In [20]:
map2 = folium.Map(location=[50.08804,14.42076], zoom_start=13)

for _, venue in most_popular_attractions.iterrows():
    folium.CircleMarker(
        [venue['Venue Latitude'], venue['Venue Longitude']],
        radius=4,
        color='black',
        popup=folium.Popup(venue['Venue Name'])
    ).add_to(map2)

In [21]:
map2

In [22]:
map3 = folium.Map(location=[50.08804,14.42076], zoom_start=13)

for _, hotel in most_popular_hotels.iterrows():
    folium.CircleMarker(
        [hotel['Hotel Latitude'], hotel['Hotel Longitude']],
        radius=5,
        color='blue',
        popup=folium.Popup(hotel['Hotel Name'])
    ).add_to(map3)
    
    folium.Circle(
        [hotel['Hotel Latitude'], hotel['Hotel Longitude']],
        radius=500,
        color='blue',
        opacity=0.25,
        fill=True,
        fill_opacity=0.25
    ).add_to(map3)
    
    for _, venue in most_popular_attracations_around_most_popular_hotels[most_popular_attracations_around_most_popular_hotels['Hotel Id'] == hotel['Hotel Id']].iterrows():
        folium.CircleMarker(
            [venue['Venue Latitude'], venue['Venue Longitude']],
            radius=1,
            color='red',
            popup=folium.Popup(venue['Venue Name'])
        ).add_to(map3) 
    

In [23]:
map3

In [24]:
map4 = folium.Map(location=[50.08804,14.42076], zoom_start=13)

for _, hotel in most_popular_hotels.iterrows():
    folium.CircleMarker(
        [hotel['Hotel Latitude'], hotel['Hotel Longitude']],
        radius=5,
        color='blue',
        popup=folium.Popup(hotel['Hotel Name'])
    ).add_to(map4)
    
    folium.Circle(
        [hotel['Hotel Latitude'], hotel['Hotel Longitude']],
        radius=500,
        color='blue',
        opacity=0.25,
        fill=True,
        fill_opacity=0.25
    ).add_to(map4)
    
    for _, venue in most_popular_attracations_around_most_popular_hotels[most_popular_attracations_around_most_popular_hotels['Hotel Id'] == hotel['Hotel Id']].iterrows():
        folium.CircleMarker(
            [venue['Venue Latitude'], venue['Venue Longitude']],
            radius=1,
            color='red',
            popup=folium.Popup(venue['Venue Name'])
        ).add_to(map4) 
   
for _, venue in most_popular_attractions.iterrows():
    folium.CircleMarker(
        [venue['Venue Latitude'], venue['Venue Longitude']],
        radius=4,
        color='black',
        popup=folium.Popup(venue['Venue Name'])
    ).add_to(map4)

In [25]:
map4

## Discussion

## Conclusion