# W2PT - Welcome to Portugal!
A quick guide towards your next vacation destination.
#### by fil.coutinho

## 1. Business Problem
**Tourism is a big deal in Portugal.** Elected as the best European destination for three years in a row (2017 to 2019), it is a great place to visit and to stay.\
**But where should you go?**


This project uses Foursquare data on accommodation (the number of likes each venue has received) for the most popular locations in Portugal to rate and classify the best destinations and to guide you and other tourists towards the most preferred places.

The rating of different locations is also relevant for new hotel businesses and tourism agencies, allowing them to invest in the tourism hotspots or focus on improving the tourist experience where it is currently less attractive.

## 2. Data
**We will import data from *Foursquare*** on the main accommodation venues and their ratings for the district capitals of Portugal (the three main destinations with airports are in **bold**):
1. Aveiro
2. Beja
3. Braga
4. Bragança
5. Castelo Branco
6. Coimbra
7. Évora
8. **Faro**
9. Guarda
10. Leiria
11. **Lisboa**
12. Portalegre
13. **Porto**
14. Santarém
15. Setúbal
16. Viana do Castelo
17. Vila Real
18. Viseu

This will give us a good overview of the most preferred and most popular regions. Other variables, such as the number of comments and photos, can also serve as measures of popularity. We will associate these popularity metrics with the coordinates of each district capital.

We will then **rank the different locations and cluster them into 5 regions**, based on their popularity.  
**The output will be a clear recommendation of where you should place your bet for your next vacation!**

## 3. Extract Data from Foursquare

In [2]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

from pandas.io.json import json_normalize

!pip install folium
import folium # plotting library



In [4]:
# The code was removed by Watson Studio for sharing.

Credentials Loaded
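
The credentials cell was stripped before sharing, but the later cells rely on `CLIENT_ID`, `CLIENT_SECRET` and `VERSION`. A minimal sketch of how they might be defined without hard-coding secrets is shown here. The environment variable names and the version date are assumptions for illustration and are not taken from the original cell.

```python
import os

# Hypothetical environment variable names; the original cell is not available.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

# The v2 API expects a version date in YYYYMMDD form; this value is a placeholder.
VERSION = '20200601'

if CLIENT_ID and CLIENT_SECRET:
    print('Credentials Loaded')
else:
    print('Credentials missing: set the environment variables first')
```

Keeping the secrets outside the notebook means it can be shared without removing the cell.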


### 3.1 Single-Case Example

We will retrieve a list of popular venues for a specific city and count the number of likes for each venue. The sum of likes will give us a scoring mechanism for each city (tourist spot). 

In [5]:
#Get the 30 most popular venues for the city Porto (sorted by popularity)

venue_id = '4bf58dd8d48988d1fa931735' #Foursquare Category ID for hotels
LOCATION = 'Porto, Portugal'
radius = 1000
LIMIT = 30

api_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&near={}&v={}&categoryId={}&radius={}&limit={}&sortByPopularity=1'.format(
    CLIENT_ID, CLIENT_SECRET, LOCATION, VERSION, venue_id, radius, LIMIT)
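
The URL above is assembled with `str.format`, so the comma, space and accented characters in location names such as `'Évora, Portugal'` are left unescaped. As a hedged alternative, the query string can be built from a dictionary with the standard library. The credential values below are placeholders so that the sketch runs on its own.

```python
from urllib.parse import urlencode

# Placeholder values so the sketch is self-contained; the notebook defines the real ones.
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20200601'

params = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'near': 'Porto, Portugal',
    'v': VERSION,
    'categoryId': '4bf58dd8d48988d1fa931735',  # Foursquare category ID for hotels
    'radius': 1000,
    'limit': 30,
    'sortByPopularity': 1,
}

# urlencode escapes the comma and the space in the location string
api_url = 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)
print(api_url)
```

With `requests`, passing the same dictionary as `requests.get(base_url, params=params)` performs the equivalent encoding.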

In [6]:
results = requests.get(api_url).json()
results

{'meta': {'code': 200, 'requestId': '5eef27c67828ae001bc4e042'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'geocode': {'what': '',
   'where': 'porto portugal',
   'center': {'lat': 41.14961, 'lng': -8.61099},
   'displayString': 'Porto, Portugal',
   'cc': 'PT',
   'geometry': {'bounds': {'ne': {'lat': 41.18485249999986,
      'lng': -8.556522500000199},
     'sw': {'lat': 41.13876354754274, 'lng': -8.691275103633359}}},
   'slug': 'porto-portugal',
   'longId': '72057594040663879'},
  'headerLocation': 'Porto',
  'headerFullLocation': 'Porto',
  'headerLocationGranularity': 'city',
  'query': 'hotel',
  'totalResults': 124,
  'suggestedBounds': {'ne': {'lat': 41.1899576356188,
    'lng': -8.580240784597008},
   'sw': {'lat': 41.13812991126631, 'lng': -8.653197736795976}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary'

In [7]:
# assign relevant part of JSON to venues
venues = results['response']['groups'][0]['items']

# transform venues into a dataframe
df_venues = json_normalize(venues)
df_venues.head()

Unnamed: 0,flags.outsideRadius,reasons.count,reasons.items,referralId,venue.categories,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,...,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups,venue.venuePage.id
0,True,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4cbc54f07a5d9eb0ed5b31e9-0,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4cbc54f07a5d9eb0ed5b31e9,R. Tenente Valadim 146,PT,Porto,Portugal,...,"[R. Tenente Valadim 146, 4100-476 Porto, Portu...","[{'label': 'display', 'lat': 41.16106404315841...",41.161064,-8.640411,4100-476,Porto,Sheraton Porto Hotel & Spa,0,[],
1,,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4bcc5b3aaeaaeee151ec3d6d-1,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4bcc5b3aaeaaeee151ec3d6d,"R. Guedes de Azevedo, 179",PT,Porto,Portugal,...,"[R. Guedes de Azevedo, 179, Porto, Portugal]","[{'label': 'display', 'lat': 41.15218346930591...",41.152183,-8.607009,,Porto,Hotel Dom Henrique,0,[],
2,,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4e15cb9bc65b14b6ca369fa4-2,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4e15cb9bc65b14b6ca369fa4,"Praça Da Liberdade, 25",PT,Porto,Portugal,...,"[Praça Da Liberdade, 25, 4000-322 Porto, Portu...","[{'label': 'display', 'lat': 41.1458686624337,...",41.145869,-8.61154,4000-322,Porto,InterContinental,0,[],
3,True,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4b9572c0f964a52025a334e3-3,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4b9572c0f964a52025a334e3,"Av. Boavista, 1269",PT,Porto,Portugal,...,"[Av. Boavista, 1269, 4100-130 Porto, Portugal]","[{'label': 'display', 'lat': 41.15940787208058...",41.159408,-8.638681,4100-130,Porto,Hotel Porto Palácio,0,[],
4,True,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4da69f026e81162ae782263e-4,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4da69f026e81162ae782263e,"R. Maria Feliciana, 100",PT,Matosinhos,Portugal,...,"[R. Maria Feliciana, 100, 4465-283 Matosinhos,...","[{'label': 'display', 'lat': 41.18760182996641...",41.187602,-8.597501,4465-283,Porto,Axis Porto Business & SPA Hotel,0,[],
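
Several rows above carry `flags.outsideRadius = True`, which suggests that the endpoint can return venues beyond the requested `radius`. The sketch below checks this for two of the listed hotels, assuming the radius is measured from the geocoded centre reported in the response (`41.14961, -8.61099`). The helper `haversine_m` is introduced here for illustration and is not part of the notebook.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    earth_radius_m = 6371000  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# Geocoded centre of 'Porto, Portugal' from the API response
center = (41.14961, -8.61099)

# Two venues from the table: the first is flagged outsideRadius, the second is not
sheraton = (41.161064, -8.640411)
dom_henrique = (41.152183, -8.607009)

radius = 1000  # metres, as in the request
for name, (lat, lon) in [('Sheraton Porto Hotel & Spa', sheraton),
                         ('Hotel Dom Henrique', dom_henrique)]:
    distance = haversine_m(center[0], center[1], lat, lon)
    print(name, round(distance), 'm', '(outside)' if distance > radius else '(inside)')
```

A check like this could be used to keep only venues that are actually close to the city being scored.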


In [8]:
#Clean unnecessary columns

df_venues.drop(['flags.outsideRadius', 'reasons.count', 'reasons.items', 'referralId',
       'venue.categories','venue.location.cc','venue.location.country','venue.location.formattedAddress',
       'venue.location.crossStreet','venue.location.labeledLatLngs','venue.location.postalCode','venue.location.state','venue.photos.count',
       'venue.photos.groups', 'venue.venuePage.id'], axis=1, inplace=True)
df_venues.head()

Unnamed: 0,venue.id,venue.location.address,venue.location.city,venue.location.lat,venue.location.lng,venue.name
0,4cbc54f07a5d9eb0ed5b31e9,R. Tenente Valadim 146,Porto,41.161064,-8.640411,Sheraton Porto Hotel & Spa
1,4bcc5b3aaeaaeee151ec3d6d,"R. Guedes de Azevedo, 179",Porto,41.152183,-8.607009,Hotel Dom Henrique
2,4e15cb9bc65b14b6ca369fa4,"Praça Da Liberdade, 25",Porto,41.145869,-8.61154,InterContinental
3,4b9572c0f964a52025a334e3,"Av. Boavista, 1269",Porto,41.159408,-8.638681,Hotel Porto Palácio
4,4da69f026e81162ae782263e,"R. Maria Feliciana, 100",Matosinhos,41.187602,-8.597501,Axis Porto Business & SPA Hotel


In [9]:
#Get the number of likes for a specific venue (let's test with the first one).

venue_id = '4cbc54f07a5d9eb0ed5b31e9'
api_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(
    venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

results = requests.get(api_url).json()
results

{'meta': {'code': 200, 'requestId': '5eef2813949393001baa072f'},
 'response': {'likes': {'count': 265,
   'summary': '265 Likes',
   'items': [{'id': '26563642',
     'firstName': 'Patrick',
     'lastName': 'J',
     'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
      'suffix': '/2KRD0KMXOJYSOB3J.jpg'}},
    {'id': '565771369',
     'firstName': 'Gokcen',
     'lastName': 'M',
     'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
      'suffix': '/blank_girl.png',
      'default': True}},
    {'id': '864699',
     'firstName': 'Antonio',
     'lastName': 'M',
     'photo': {'prefix': 'https://fastly.4sqi.net/img/user/',
      'suffix': '/TCLJ0HSMXHB4H0ZH.jpg'}}]}}}

In [10]:
# extract the like count from the JSON response
likes = results['response']['likes']['count']
likes


265
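
The lookup `results['response']['likes']['count']` assumes a successful response and raises a `KeyError` otherwise. A more defensive version might look like the sketch below. The helper name `extract_like_count` and the error payload are hypothetical and only illustrate the idea.

```python
def extract_like_count(payload):
    """Return the like count from a likes response, or None if it is missing."""
    if payload.get('meta', {}).get('code') != 200:
        return None
    return payload.get('response', {}).get('likes', {}).get('count')

# Trimmed copy of the response shown above
ok = {'meta': {'code': 200}, 'response': {'likes': {'count': 265, 'summary': '265 Likes'}}}
# Hypothetical error payload, for example when a request quota is exhausted
failed = {'meta': {'code': 429}, 'response': {}}

print(extract_like_count(ok))
print(extract_like_count(failed))
```

Returning `None` rather than `0` keeps failed lookups distinguishable from venues that genuinely have no likes.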

### 3.2 Associate likes to venues in a specific city
We now need to merge information from the venue list with the API call to retrieve likes for a specific venue.

In [11]:
#Cycle through the venues dataframe

df_venues['likes'] = 0

for i,row in df_venues.iterrows():
    
    venue_id = df_venues.loc[i,'venue.id']
    
    api_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(
    venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    results = requests.get(api_url).json()
    likes = results['response']['likes']['count']
    
    df_venues.at[i, 'likes'] = likes
    
df_venues.head()
    

Unnamed: 0,venue.id,venue.location.address,venue.location.city,venue.location.lat,venue.location.lng,venue.name,likes
0,4cbc54f07a5d9eb0ed5b31e9,R. Tenente Valadim 146,Porto,41.161064,-8.640411,Sheraton Porto Hotel & Spa,265
1,4bcc5b3aaeaaeee151ec3d6d,"R. Guedes de Azevedo, 179",Porto,41.152183,-8.607009,Hotel Dom Henrique,73
2,4e15cb9bc65b14b6ca369fa4,"Praça Da Liberdade, 25",Porto,41.145869,-8.61154,InterContinental,111
3,4b9572c0f964a52025a334e3,"Av. Boavista, 1269",Porto,41.159408,-8.638681,Hotel Porto Palácio,123
4,4da69f026e81162ae782263e,"R. Maria Feliciana, 100",Matosinhos,41.187602,-8.597501,Axis Porto Business & SPA Hotel,59
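
With likes attached to each venue, the per-city score described earlier is the sum of likes grouped by city. The sketch below uses only the five rows shown above, copied by hand, so the totals are illustrative rather than the full result.

```python
from collections import defaultdict

# (city, likes) pairs copied from the five rows shown above
rows = [
    ('Porto', 265),
    ('Porto', 73),
    ('Porto', 111),
    ('Porto', 123),
    ('Matosinhos', 59),
]

score = defaultdict(int)
for city, likes in rows:
    score[city] += likes

# Highest score first
ranking = sorted(score.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

On the dataframe itself, `df_venues.groupby('venue.location.city')['likes'].sum()` would compute the same totals.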


### 3.3 Repeat procedure for all district capitals
Finally, we will extend the loop to build a complete list of venues for all district capitals, each with its number of likes.

In [12]:
# List of cities to be searched (district capitals)

city_list = ['Aveiro', 'Beja', 'Braga', 'Bragança', 'Castelo Branco', 'Coimbra', 'Évora',
             'Faro', 'Guarda', 'Leiria', 'Lisboa', 'Portalegre', 'Porto', 'Santarém',
             'Setúbal', 'Viana do Castelo', 'Vila Real', 'Viseu']

len(city_list)

18

In [13]:
# build the search string for each city
search_string = [city + ', Portugal' for city in city_list]

search_string

['Aveiro, Portugal',
 'Beja, Portugal',
 'Braga, Portugal',
 'Bragança, Portugal',
 'Castelo Branco, Portugal',
 'Coimbra, Portugal',
 'Évora, Portugal',
 'Faro, Portugal',
 'Guarda, Portugal',
 'Leiria, Portugal',
 'Lisboa, Portugal',
 'Portalegre, Portugal',
 'Porto, Portugal',
 'Santarém, Portugal',
 'Setúbal, Portugal',
 'Viana do Castelo, Portugal',
 'Vila Real, Portugal',
 'Viseu, Portugal']

In [14]:
venue_id = '4bf58dd8d48988d1fa931735' #Foursquare Category ID for hotels
LOCATION = 'Porto, Portugal'
radius = 1000
LIMIT = 30

api_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&near={}&v={}&categoryId={}&radius={}&limit={}&sortByPopularity=1'.format(
    CLIENT_ID, CLIENT_SECRET, LOCATION, VERSION, venue_id, radius, LIMIT)

results = requests.get(api_url).json()
    
venues = results['response']['groups'][0]['items']

df_venues = json_normalize(venues)

df_result = df_venues[0:0]
df_result

Unnamed: 0,flags.outsideRadius,reasons.count,reasons.items,referralId,venue.categories,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,...,venue.location.formattedAddress,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups,venue.venuePage.id


In [15]:
for i in range(len(search_string)):
    
    LOCATION = search_string[i] #cycle through the different cities and then use the same method as before
    
    api_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&near={}&v={}&categoryId={}&radius={}&limit={}&sortByPopularity=1'.format(
    CLIENT_ID, CLIENT_SECRET, LOCATION, VERSION, venue_id, radius, LIMIT)
    results = requests.get(api_url).json()
    
    # assign relevant part of JSON to venues
    venues = results['response']['groups'][0]['items']
    
    # transform venues into a dataframe
    df_city = json_normalize(venues)
    
    # label every venue with the district capital that was queried
    df_city['venue.location.city'] = city_list[i]
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df_result = pd.concat([df_result, df_city], ignore_index=True)
    
    print(search_string[i], '...DONE')
    
df_result.head()
    

Aveiro, Portugal ...DONE
Beja, Portugal ...DONE




Braga, Portugal ...DONE
Bragança, Portugal ...DONE
Castelo Branco, Portugal ...DONE
Coimbra, Portugal ...DONE
Évora, Portugal ...DONE
Faro, Portugal ...DONE
Guarda, Portugal ...DONE
Leiria, Portugal ...DONE
Lisboa, Portugal ...DONE
Portalegre, Portugal ...DONE
Porto, Portugal ...DONE
Santarém, Portugal ...DONE
Setúbal, Portugal ...DONE
Viana do Castelo, Portugal ...DONE
Vila Real, Portugal ...DONE
Viseu, Portugal ...DONE


Unnamed: 0,flags.outsideRadius,reasons.count,reasons.items,referralId,venue.categories,venue.id,venue.location.address,venue.location.cc,venue.location.city,venue.location.country,...,venue.location.labeledLatLngs,venue.location.lat,venue.location.lng,venue.location.neighborhood,venue.location.postalCode,venue.location.state,venue.name,venue.photos.count,venue.photos.groups,venue.venuePage.id
0,True,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4bd88ada2ecdce72d0cfd0f2-0,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4bd88ada2ecdce72d0cfd0f2,Lugar da Ponte,PT,Aveiro,Portugal,...,"[{'label': 'display', 'lat': 41.18965405538785...",41.189654,-7.543781,,5085-034,Vila Real,Vintage House Hotel,0,[],
1,True,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-53aaf061498e0f4d35e1423b-1,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",53aaf061498e0f4d35e1423b,Av. Brasil,PT,Aveiro,Portugal,...,"[{'label': 'display', 'lat': 40.15601098173873...",40.156011,-8.867282,,3080-323,Coimbra,Eurostars Oásis Plaza Hotel,0,[],
2,True,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4cd0483df6378cfa9b2db4d6-2,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4cd0483df6378cfa9b2db4d6,Rotunda da Exponor,PT,Aveiro,Portugal,...,"[{'label': 'display', 'lat': 41.20250225, 'lng...",41.202502,-8.692774,,4450-801,Porto,Tryp Porto Expo Hotel,0,[],506987548.0
3,True,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4cbc54f07a5d9eb0ed5b31e9-3,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4cbc54f07a5d9eb0ed5b31e9,R. Tenente Valadim 146,PT,Aveiro,Portugal,...,"[{'label': 'display', 'lat': 41.16106404315841...",41.161064,-8.640411,,4100-476,Porto,Sheraton Porto Hotel & Spa,0,[],
4,True,0,"[{'summary': 'This spot is popular', 'type': '...",e-0-4bcc5b3aaeaaeee151ec3d6d-4,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",4bcc5b3aaeaaeee151ec3d6d,"R. Guedes de Azevedo, 179",PT,Aveiro,Portugal,...,"[{'label': 'display', 'lat': 41.15218346930591...",41.152183,-8.607009,,,Porto,Hotel Dom Henrique,0,[],


In [45]:
#Clean unnecessary columns
df_result.drop(['flags.outsideRadius', 'reasons.count', 'reasons.items', 'referralId',
       'venue.categories','venue.location.cc','venue.location.country','venue.location.formattedAddress',
       'venue.location.crossStreet','venue.location.labeledLatLngs','venue.location.postalCode','venue.location.state','venue.photos.count',
       'venue.photos.groups', 'venue.venuePage.id', 'venue.location.neighborhood'], axis=1, inplace=True)

#Count total number of rows
df_result.shape

(499, 7)

In [17]:
df_result.head()

Unnamed: 0,venue.id,venue.location.address,venue.location.city,venue.location.lat,venue.location.lng,venue.location.neighborhood,venue.name
0,4bd88ada2ecdce72d0cfd0f2,Lugar da Ponte,Aveiro,41.189654,-7.543781,,Vintage House Hotel
1,53aaf061498e0f4d35e1423b,Av. Brasil,Aveiro,40.156011,-8.867282,,Eurostars Oásis Plaza Hotel
2,4cd0483df6378cfa9b2db4d6,Rotunda da Exponor,Aveiro,41.202502,-8.692774,,Tryp Porto Expo Hotel
3,4cbc54f07a5d9eb0ed5b31e9,R. Tenente Valadim 146,Aveiro,41.161064,-8.640411,,Sheraton Porto Hotel & Spa
4,4bcc5b3aaeaaeee151ec3d6d,"R. Guedes de Azevedo, 179",Aveiro,41.152183,-8.607009,,Hotel Dom Henrique


In [23]:
#Check results by district (max. 30 as dictated in the LIMIT of the API call)
df_result['venue.location.city'].value_counts()

Beja                30
Viana do Castelo    30
Bragança            30
Lisboa              30
Porto               30
Aveiro              30
Coimbra             30
Leiria              30
Viseu               30
Vila Real           30
Castelo Branco      30
Setúbal             30
Guarda              30
Faro                30
Braga               30
Santarém            30
Évora               15
Portalegre           4
Name: venue.location.city, dtype: int64
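
The counts above are per query, not per unique venue. The Sheraton Porto Hotel & Spa, for example, appears in the Aveiro results and also in the earlier Porto example, so the same venue has likely been labelled with more than one city. The sketch below shows one way to detect and remove such duplicates on a small stand-in dataframe. `df_demo` is illustrative, and which row to keep is a design choice the notebook does not make.

```python
import pandas as pd

# Toy stand-in for df_result: the Sheraton appears under two different cities
df_demo = pd.DataFrame({
    'venue.id': ['4cbc54f07a5d9eb0ed5b31e9', '4cbc54f07a5d9eb0ed5b31e9', '4bd88ada2ecdce72d0cfd0f2'],
    'venue.name': ['Sheraton Porto Hotel & Spa', 'Sheraton Porto Hotel & Spa', 'Vintage House Hotel'],
    'venue.location.city': ['Aveiro', 'Porto', 'Aveiro'],
})

# Count how many cities each venue was returned for
cities_per_venue = df_demo.groupby('venue.id')['venue.location.city'].nunique()
print(cities_per_venue[cities_per_venue > 1])

# Keep one row per venue (here: the first occurrence)
df_unique = df_demo.drop_duplicates(subset='venue.id', keep='first')
print(len(df_demo), '->', len(df_unique))
```

Keeping the first occurrence is the simplest rule. Assigning each venue to its nearest district capital would be a more faithful alternative.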

In [43]:
# The code was removed by Watson Studio for sharing.

In [44]:
#Append the likes

df_result['likes'] = 0

for i,row in df_result.iterrows():
    
    venue_id = df_result.loc[i,'venue.id']
    
    api_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(
    venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    results = requests.get(api_url).json()
    likes = results['response']['likes']['count']
    
    df_result.at[i, 'likes'] = likes

    print('Likes for', df_result.loc[i, 'venue.name'], '...DONE')
df_result.head()
    

Likes for Vintage House Hotel ...DONE
Likes for Eurostars Oásis Plaza Hotel ...DONE
Likes for Tryp Porto Expo Hotel ...DONE
Likes for Sheraton Porto Hotel & Spa ...DONE
Likes for Hotel Dom Henrique ...DONE
Likes for Hotel Porto Palácio ...DONE
Likes for Axis Porto Business & SPA Hotel ...DONE
Likes for Hotel Vila Galé Coimbra ...DONE
Likes for Meliá Ria Hotel & Spa ...DONE
Likes for Palace Hotel Monte Real ...DONE
Likes for Hotel HF Tuela Porto ...DONE
Likes for Stay Hotel ...DONE
Likes for Hotel Serra da Estrela ...DONE
Likes for Hotel das Salinas ...DONE
Likes for Hotel Quinta das Lágrimas ...DONE
Likes for Hotel da Música ...DONE
Likes for Eurostars Rio Douro Hotel & SPA ...DONE
Likes for Tulip Inn Estarreja Hotel & SPA ...DONE
Likes for Hotel Veneza ...DONE
Likes for Hotel de Ílhavo ...DONE
Likes for Hotel Dighton ...DONE
Likes for Montebelo Viseu Hotel & SPA ...DONE
Likes for Hotel Moliceiro ...DONE
Likes for Palace Hotel do Bussaco ...DONE
Likes for Hotel Teatro ...DONE
Likes for

Unnamed: 0,venue.id,venue.location.address,venue.location.city,venue.location.lat,venue.location.lng,venue.name,likes
0,4bd88ada2ecdce72d0cfd0f2,Lugar da Ponte,Aveiro,41.189654,-7.543781,Vintage House Hotel,63
1,53aaf061498e0f4d35e1423b,Av. Brasil,Aveiro,40.156011,-8.867282,Eurostars Oásis Plaza Hotel,54
2,4cd0483df6378cfa9b2db4d6,Rotunda da Exponor,Aveiro,41.202502,-8.692774,Tryp Porto Expo Hotel,51
3,4cbc54f07a5d9eb0ed5b31e9,R. Tenente Valadim 146,Aveiro,41.161064,-8.640411,Sheraton Porto Hotel & Spa,265
4,4bcc5b3aaeaaeee151ec3d6d,"R. Guedes de Azevedo, 179",Aveiro,41.152183,-8.607009,Hotel Dom Henrique,73


#### Cell below saves dataframe to a CSV for later.

In [None]:
# The code was removed by Watson Studio for sharing.
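
The export cell itself was removed, but a later cell reads `export_Foursquare.csv` with `index_col=0`. A plausible sketch of the save step, under that assumption, is shown below on a one-row stand-in dataframe (`df_demo` is illustrative, not the real `df_result`).

```python
import pandas as pd

# Toy stand-in for df_result with the same columns as the notebook's dataframe
df_demo = pd.DataFrame({
    'venue.id': ['4bd88ada2ecdce72d0cfd0f2'],
    'venue.location.address': ['Lugar da Ponte'],
    'venue.location.city': ['Aveiro'],
    'venue.location.lat': [41.189654],
    'venue.location.lng': [-7.543781],
    'venue.name': ['Vintage House Hotel'],
    'likes': [63],
})

# Writing the index (the default) matches a later read_csv(..., index_col=0)
df_demo.to_csv('export_Foursquare.csv')

# Read the file back to confirm the round trip
df_back = pd.read_csv('export_Foursquare.csv', index_col=0)
print(df_back.head())
```

Saving the result avoids repeating several hundred API calls each time the notebook is reopened.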

## 4. Visualization
With our database created with the data exported from Foursquare, let's visualize the results!

### 4.1 Create a map with all results
Let's create a map of Portugal with the best hotel venues and their associated score (number of likes).

In [5]:
#Coordinates for Portugal (numeric values rather than strings)
latitude = 39.557191
longitude = -7.8536599

# create map
map = folium.Map(location=[latitude, longitude], zoom_start=7)

# add markers to the map

for i, row in df_result.iterrows():
    
    lat = df_result.loc[i, 'venue.location.lat']
    lon = df_result.loc[i, 'venue.location.lng']
    venue = df_result.loc[i, 'venue.name']
    likes = df_result.loc[i, 'likes']
    
    label = str(venue) + ':' + str(likes)
    
    folium.CircleMarker(location=(lat, lon),
                        fill = True,
                        radius = 5,
                        tooltip = label).add_to(map)
    
map

### 4.2 Cluster Results (hotspots)
Now let's cluster the best areas in order to inform the user about hotspots for tourism.

In [7]:
df_result = pd.read_csv('export_Foursquare.csv', index_col = 0)

df_result.head()

Unnamed: 0,venue.id,venue.location.address,venue.location.city,venue.location.lat,venue.location.lng,venue.name,likes
0,4bd88ada2ecdce72d0cfd0f2,Lugar da Ponte,Aveiro,41.189654,-7.543781,Vintage House Hotel,63
1,53aaf061498e0f4d35e1423b,Av. Brasil,Aveiro,40.156011,-8.867282,Eurostars Oásis Plaza Hotel,54
2,4cd0483df6378cfa9b2db4d6,Rotunda da Exponor,Aveiro,41.202502,-8.692774,Tryp Porto Expo Hotel,51
3,4cbc54f07a5d9eb0ed5b31e9,R. Tenente Valadim 146,Aveiro,41.161064,-8.640411,Sheraton Porto Hotel & Spa,265
4,4bcc5b3aaeaaeee151ec3d6d,"R. Guedes de Azevedo, 179",Aveiro,41.152183,-8.607009,Hotel Dom Henrique,73


In [11]:
# Clustering based on location and score
from sklearn.cluster import KMeans

# set number of clusters - 5 categories
kclusters = 5
df_cluster = df_result.drop(['venue.id','venue.location.address','venue.location.city','venue.name'], axis=1)

df_cluster.head()

Unnamed: 0,venue.location.lat,venue.location.lng,likes
0,41.189654,-7.543781,63
1,40.156011,-8.867282,54
2,41.202502,-8.692774,51
3,41.161064,-8.640411,265
4,41.152183,-8.607009,73


In [12]:
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 3, 3, 2, 1, 4, 3, 1, 1, 3], dtype=int32)
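
K-Means groups points by squared Euclidean distance, so a column with a large numeric range can outweigh the others. The sketch below uses the first five rows of `df_cluster` to compare how much of the squared distance between two hotels comes from `likes`, before and after z-scoring each column. It relies on the standard library only. In the notebook, scikit-learn's `StandardScaler` could play the same role, which is a suggestion rather than something the original does.

```python
from statistics import mean, pstdev

# First five rows of df_cluster as shown above: (lat, lng, likes)
rows = [
    (41.189654, -7.543781, 63),
    (40.156011, -8.867282, 54),
    (41.202502, -8.692774, 51),
    (41.161064, -8.640411, 265),
    (41.152183, -8.607009, 73),
]

def squared_gaps(a, b):
    """Per-feature squared differences between two rows."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

# Raw features: rows 0 and 3 are more than one degree of longitude apart
raw = squared_gaps(rows[0], rows[3])
likes_share_raw = raw[2] / sum(raw)

# z-score each column (subtract the mean, divide by the standard deviation)
columns = list(zip(*rows))
mus = [mean(c) for c in columns]
sigmas = [pstdev(c) for c in columns]
scaled = [tuple((v - m) / s for v, m, s in zip(r, mus, sigmas)) for r in rows]

std = squared_gaps(scaled[0], scaled[3])
likes_share_scaled = std[2] / sum(std)

print(f"likes share of squared distance, raw:    {likes_share_raw:.4f}")
print(f"likes share of squared distance, scaled: {likes_share_scaled:.4f}")
```

On the raw values almost all of the distance comes from `likes`. After scaling, location and likes contribute on a comparable footing.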

In [13]:
for i,row in df_cluster.iterrows():
    df_cluster.at[i, 'cluster'] = kmeans.labels_[i]

df_cluster.head()

Unnamed: 0,venue.location.lat,venue.location.lng,likes,cluster
0,41.189654,-7.543781,63,1.0
1,40.156011,-8.867282,54,3.0
2,41.202502,-8.692774,51,3.0
3,41.161064,-8.640411,265,2.0
4,41.152183,-8.607009,73,1.0


In [14]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

rainbow

['#8000ff', '#00b5eb', '#80ffb4', '#ffb360', '#ff0000']

In [22]:
# add markers to the map

map_clusters = map

for i, row in df_cluster.iterrows():
    
    lat = df_cluster.loc[i, 'venue.location.lat']
    lon = df_cluster.loc[i, 'venue.location.lng']
    venue = df_result.loc[i, 'venue.name']
    likes = df_cluster.loc[i, 'likes']
    cluster = int(df_cluster.loc[i, 'cluster'])
    
    label = str(venue) + ':' + str(likes)
    
    folium.CircleMarker(location=(lat, lon),
                        radius = 5,
                        tooltip = label,
                        color=rainbow[cluster],  # cluster labels run from 0 to kclusters-1
                        fill=True,
                        fill_color=rainbow[cluster],
                        fill_opacity=0.7).add_to(map)
       
map_clusters

In [21]:
#K-Means took location and likes as the variables for clustering. Let's check the "quality" (likes) of each cluster.
df_cluster.groupby(['cluster']).mean()

cluster,venue.location.lat,venue.location.lng,likes
0.0,39.931064,-8.07639,13.495238
1.0,40.436021,-8.517542,78.46
2.0,40.118389,-8.859974,263.5
3.0,39.854827,-8.587103,43.868056
4.0,39.660465,-8.985743,122.83871


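The mean number of likes differs sharply between clusters (from about 13 to about 264), while the mean coordinates are almost the same for all five. Location was included as a clustering variable, but the features were not scaled, so the likes column, which spans hundreds, dominates the distance calculation. The clusters are therefore mostly tiers of popularity rather than geographic regions. Standardizing the features, or clustering on likes alone, would make either intent explicit.\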
**Let's change the colors to make it easier to read - green for good ratings, red for bad ratings.**

In [33]:
#Apply colors to cluster numbers

new_colors = ['red', 'gold', 'limegreen', 'orange', 'palegreen'] #according to the likes distribution above (cluster 0, 1, 2, 3, 4)

for i,row in df_cluster.iterrows():
    
    cluster = int(df_cluster.loc[i, 'cluster'])    
    df_cluster.at[i, 'color'] = new_colors[cluster]
    
df_cluster.head()

Unnamed: 0,venue.location.lat,venue.location.lng,likes,cluster,color
0,41.189654,-7.543781,63,1.0,gold
1,40.156011,-8.867282,54,3.0,orange
2,41.202502,-8.692774,51,3.0,orange
3,41.161064,-8.640411,265,2.0,limegreen
4,41.152183,-8.607009,73,1.0,gold
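
The colour list above is hard-coded against the cluster numbers of this particular run, so it would need editing whenever the clustering changes. A hedged alternative is to rank the clusters by mean likes and assign colours by rank. The means below are copied from the `groupby` output, and the names `palette` and `color_by_cluster` are introduced for illustration.

```python
# Mean likes per cluster, copied from the groupby output
cluster_mean_likes = {0: 13.495238, 1: 78.46, 2: 263.5, 3: 43.868056, 4: 122.83871}

# Colours ordered from worst to best
palette = ['red', 'orange', 'gold', 'palegreen', 'limegreen']

# Cluster labels sorted by increasing mean likes
ranked = sorted(cluster_mean_likes, key=cluster_mean_likes.get)

# Map each cluster label to the colour matching its rank
color_by_cluster = {cluster: palette[rank] for rank, cluster in enumerate(ranked)}
print(color_by_cluster)
```

For this run the mapping reproduces the manual `new_colors` list, and it adapts automatically if the labels are permuted.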


In [36]:
# add markers to the map

map_clusters = map

for i, row in df_cluster.iterrows():
    
    lat = df_cluster.loc[i, 'venue.location.lat']
    lon = df_cluster.loc[i, 'venue.location.lng']
    venue = df_result.loc[i, 'venue.name']
    likes = df_cluster.loc[i, 'likes']
    cluster = int(df_cluster.loc[i, 'cluster'])
    color = df_cluster.loc[i, 'color']
    
    label = str(venue) + ': ' + str(likes) + ' - cluster:' + str(cluster)
    
    folium.CircleMarker(location=(lat, lon),
                        radius = 5,
                        tooltip = label,
                        color=color,
                        fill=True,
                        fill_color=color,
                        fill_opacity=1).add_to(map)
       
map_clusters

## 5. Conclusion

We imported a selection of venues (hotels) from Foursquare for each of the district capitals of Portugal and their nearby locations. The Folium map lets us visualize these locations and quickly check their score (number of likes). Using the K-Means algorithm, we automatically clustered the locations and differentiated them based on likes (red being the worst and green the best).

We can conclude that Porto and Lisbon have the largest number of high-ranking places. There is also a wide variety of choices to the south (Algarve) and the north (around Vila Real and Braga), although with fewer likes.