# A Foursquare European Trip

## 1. Business Understanding

*What is the problem that you are trying to solve?*

**Problem**

Many people say that the inner cities in Europe all look basically the same: a big square with a cathedral; nearby, the same types of restaurants, bars and shops, most of which belong to a larger chain; a castle, a big river, the cool trendy quarter, the not-so-cool run-down quarter, and the train station. A European trip can therefore be boring for tourists.


**Idea**

Why not use open data to avoid boring holidays? Why not use, for example, Foursquare data to plan a trip through the "most diverse" cities or city centers in Europe?


**Goals**

The goals are:
* To compare 70 cities (or rather their city centers) in Italy, France, Spain and Portugal
* To select 3 Italian, 3 French, 3 Spanish and 1 Portuguese city so that the selection is as diverse as possible.


**Target Audience**

The results of this analysis might be useful for tourists planning to visit European cities.


## 2. Analytic Approach and Data Requirements

*How can you use data to answer the question?*

*What data do you need to answer the question?*

To tackle this problem we need:
* A list of cities from Italy (21 cities), France (21), Spain (21) and Portugal (7). Please don't feel discriminated against if you are Portuguese. The list of cities is taken from Wikipedia
* The coordinates (latitude and longitude) of each city, which should also roughly mark the city center, based on geopy / Nominatim data
* Access to Foursquare to determine venues of interest
* The Folium library for visualization


The data will be processed using Python in a Jupyter environment. 


A clustering method (most likely k-means) will be used to compare the cities. The algorithm shall be optimized by using different evaluation methods.


## 3. Data Collection

*Where is the data coming from (identify all sources) and how will you get it?*

### 3.1 Import libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
from bs4 import BeautifulSoup # library to scrape websites

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### 3.2 Define cities of interest

Ten tuples of cities of interest were defined, each with 7 cities. Splitting the list this way was necessary to avoid coordinate retrieval issues with Nominatim.

In [2]:
cities_italy1 = ("Rome","Milan","Naples","Turin","Palermo","Genoa","Bologna")
cities_italy2 = ("Florence","Bari","Catania","Venice","Verona","Messina","Padua")
cities_italy3 = ("Trieste","Brescia","Taranto","Parma","Prato","Modena","Reggio Calabria")
cities_france1 = ("Paris","Marseille","Lyon","Toulouse","Nice","Nantes","Strasbourg")
cities_france2 = ("Montpellier","Bordeaux","Lille","Rennes","Reims","Le Havre","Saint-Etienne")
cities_france3 = ("Toulon","Grenoble","Dijon","Nimes","Angers","Villeurbanne","Le Mans")
cities_spain1 = ("Madrid","Barcelona","Valencia","Seville","Zaragoza","Malaga","Murcia")
cities_spain2 = ("Palma","Las Palmas de Gran Canaria","Bilbao","Alicante","Cordoba","Valladolid","Vigo")
cities_spain3 = ("Gijon","L'Hospitalet de Llobregat","A Coruna","Vitoria-Gasteiz","Granada","Elche","Oviedo")
cities_portugal = ("Lisbon","Porto","Vila Nova de Gaia","Amadora","Braga","Coimbra","Funchal")


### 3.3 Create a dataframe with cities, country and coordinates

Define a function to retrieve the coordinates of the cities in a subset.

In [3]:
def getCoord(city_subset, country):
    """Geocode each city and append the results to the global lists."""
    # create the geocoder once instead of once per city
    geolocator = Nominatim(user_agent="europe_explorer")
    for city in city_subset:
        location = geolocator.geocode(city)
        Lat.append(location.latitude)
        Lon.append(location.longitude)
        City.append(city)
        Country.append(country)
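
Geocoding bare city names can be ambiguous, since several of these names also exist outside Europe, and a geocoder may return nothing for a query. The following is a minimal sketch of a more defensive variant, assuming the geocoder is passed in as a callable. The names `geocode_cities`, `pause_seconds` and the offline stub are illustrative and not part of the notebook.

```python
import time

def geocode_cities(cities, country, geocode, pause_seconds=1.0):
    """Return (rows, missing) for the given cities.

    `geocode` is any callable that takes a query string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found.
    """
    rows, missing = [], []
    for i, city in enumerate(cities):
        if i > 0:
            time.sleep(pause_seconds)  # pause between requests to a shared service
        # Include the country in the query to reduce ambiguity.
        location = geocode(f"{city}, {country}")
        if location is None:
            missing.append(city)
            continue
        rows.append({"City": city, "country": country,
                     "Latitude": location.latitude,
                     "Longitude": location.longitude})
    return rows, missing


# Offline stand-in for a real geocoder, used only to demonstrate the flow.
class _FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude

def _fake_geocode(query):
    known = {"Rome, Italy": _FakeLocation(41.894802, 12.485338)}
    return known.get(query)

rows, missing = geocode_cities(["Rome", "Atlantis"], "Italy",
                               _fake_geocode, pause_seconds=0)
```

With geopy, `geocode` could be `Nominatim(user_agent="europe_explorer").geocode`, and the returned rows can be turned into a dataframe with `pd.DataFrame(rows)`. Collecting the misses separately avoids an `AttributeError` on a `None` result halfway through a batch.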

Initialize lists to store city, country, latitude and longitude coordinates.

In [4]:
Lat = []
Lon = []
City = []
Country = []

Retrieve the coordinates in subsets of cities to avoid problems with Nominatim.

In [5]:
getCoord(cities_italy1,"Italy")

In [6]:
getCoord(cities_italy2,"Italy")

In [7]:
getCoord(cities_italy3,"Italy")

In [8]:
getCoord(cities_france1,"France")

In [9]:
getCoord(cities_france2,"France")

In [10]:
getCoord(cities_france3,"France")

In [11]:
getCoord(cities_spain1,"Spain")

In [12]:
getCoord(cities_spain2,"Spain")

In [13]:
getCoord(cities_spain3,"Spain")

In [14]:
getCoord(cities_portugal,"Portugal")

Create a dataframe out of the lists.

In [15]:
df = pd.DataFrame(City,columns=['City'])
df['country']=Country
df['Longitude']=Lon
df['Latitude']=Lat

## 4. Data Understanding and Data Preparation

*Is the data that you collected representative of the problem to be solved?*

*What additional work is required to manipulate and work with the data?*

### 4.1 Create a map showing the cities of interest

Calculate the centroid of all coordinates of the cities of interest to center the map.

In [16]:
centroid_Lat=sum(Lat)/len(Lat)
centroid_Lon=sum(Lon)/len(Lon)

In [68]:
# named map_cities to avoid shadowing the built-in map()
map_cities = folium.Map(location=[centroid_Lat,centroid_Lon], zoom_start=4)

# add markers to map
for lat, lng, city, country in zip(df['Latitude'], df['Longitude'], df['City'], df['country']):
    label = '{}, {}'.format(city, country)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_cities)

map_cities

### 4.2 Get venues for each city center

Define the radius around the city center and the maximum number of venues to retrieve.

In [18]:
radius=2000
LIMIT=100

Foursquare Credentials

In [19]:
#hidden
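
The credentials cell is hidden here. One common way to keep secrets out of a notebook is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions made for this sketch; the function fails loudly if either value is missing.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from a mapping (os.environ by default)."""
    names = ("FOURSQUARE_CLIENT_ID", "FOURSQUARE_CLIENT_SECRET")
    missing = [n for n in names if not env.get(n)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[names[0]], env[names[1]]

# Demonstration with a plain dict instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id",
            "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
client_id, client_secret = load_credentials(demo_env)
```

In the notebook, the result of `load_credentials()` could be assigned to `CLIENT_ID` and `CLIENT_SECRET`, which the function in the next cell expects as globals.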

A function that retrieves the venues for all cities

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
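
The list comprehension in the function assumes that every response contains at least one group and that every venue has at least one category; otherwise it raises an exception. Below is a sketch of a more tolerant parser for the same response shape. The function name `parse_explore_items` and the sample payload are made up for illustration.

```python
def parse_explore_items(payload):
    """Extract (name, lat, lng, category) tuples from an explore response.

    Venues without any category are kept with category None instead of
    raising IndexError; a response without groups yields an empty list.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hand-written sample payload mimicking the structure used in the function.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Plaza"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 3.0, "lng": 4.0},
               "categories": []}},
]}]}}

parsed = parse_explore_items(sample)
empty = parse_explore_items({"response": {}})
```

Returning an empty list for a city without results lets the loop over all cities continue instead of stopping at the first sparse response.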

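Use the function to get nearby venues for all cities. Because `radius` is not passed in the call, the function's default of 500 m applies instead of the 2000 m defined earlier.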

In [21]:
city_venues = getNearbyVenues(names=df['City'],
                              latitudes=df['Latitude'],
                              longitudes=df['Longitude'])

Rome
Milan
Naples
Turin
Palermo
Genoa
Bologna
Florence
Bari
Catania
Venice
Verona
Messina
Padua
Trieste
Brescia
Taranto
Parma
Prato
Modena
Reggio Calabria
Paris
Marseille
Lyon
Toulouse
Nice
Nantes
Strasbourg
Montpellier
Bordeaux
Lille
Rennes
Reims
Le Havre
Saint-Etienne
Toulon
Grenoble
Dijon
Nimes
Angers
Villeurbanne
Le Mans
Madrid
Barcelona
Valencia
Seville
Zaragoza
Malaga
Murcia
Palma
Las Palmas de Gran Canaria
Bilbao
Alicante
Cordoba
Valladolid
Vigo
Gijon
L'Hospitalet de Llobregat
A Coruna
Vitoria-Gasteiz
Granada
Elche
Oviedo
Lisbon
Porto
Vila Nova de Gaia
Amadora
Braga
Coimbra
Funchal


Check how many venues were returned for each city

In [22]:
city_venues.groupby('City').count()

City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
A Coruna,100,100,100,100,100,100
Alicante,16,16,16,16,16,16
Amadora,13,13,13,13,13,13
Angers,53,53,53,53,53,53
Barcelona,100,100,100,100,100,100
Bari,24,24,24,24,24,24
Bilbao,100,100,100,100,100,100
Bologna,100,100,100,100,100,100
Bordeaux,90,90,90,90,90,90
Braga,100,100,100,100,100,100


### 4.3 Analyze each city

A dataframe is created which displays the top 10 venues for each city.

In [50]:
# one hot encoding
cities_onehot = pd.get_dummies(city_venues[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
cities_onehot['City'] = city_venues['City'] 

# move city column to the first column
fixed_columns = [cities_onehot.columns[-1]] + list(cities_onehot.columns[:-1])
cities_onehot = cities_onehot[fixed_columns]

# group rows by city and take the mean of the frequency of occurrence of each category
cities_grouped = cities_onehot.groupby('City').mean().reset_index()
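
Taking the mean of the one-hot columns per city yields, for each category, the share of that city's venues that fall into the category. The same quantity can be computed by hand, which may help when checking the pandas result. The helper `category_shares` and the toy venue list are invented for this example.

```python
from collections import Counter, defaultdict

def category_shares(city_category_pairs):
    """Map each city to {category: share of that city's venues}."""
    counts = defaultdict(Counter)
    for city, category in city_category_pairs:
        counts[city][category] += 1
    return {
        city: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for city, counter in counts.items()
    }

toy = [("X", "Bar"), ("X", "Bar"), ("X", "Plaza"), ("X", "Hotel"),
       ("Y", "Café")]
shares = category_shares(toy)
```

Unlike the pandas version, this sketch omits categories that do not occur in a city, whereas `cities_grouped` holds an explicit 0 for them.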

In [51]:
# Let's write a function to sort the venues in descending order.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
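
For cities with very few venues, most categories are tied at a frequency of zero, so the tail of a top-10 list is filled with categories the city does not actually have. The sketch below shows one possible alternative that drops zero-frequency categories and breaks ties alphabetically so the ranking is reproducible. `top_categories` is a hypothetical helper, not the function used in this notebook.

```python
def top_categories(shares, n):
    """Return up to n categories with the highest share.

    Ties are broken alphabetically; categories with zero share are dropped,
    so a city with few venues gets a shorter list instead of filler entries.
    """
    nonzero = [(cat, share) for cat, share in shares.items() if share > 0]
    ranked = sorted(nonzero, key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in ranked[:n]]

example = {"Plaza": 0.25, "Bar": 0.5, "Hotel": 0.25, "Museum": 0.0}
top = top_categories(example, 3)
top_all = top_categories(example, 10)
```

The trade-off is that the resulting lists can have different lengths per city, so they would need padding before being written into fixed dataframe columns.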

In [52]:
# Now let's create the new dataframe and display the top 10 venues for each city.
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix beyond the third position
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
city_venues_sorted = pd.DataFrame(columns=columns)
city_venues_sorted['City'] = cities_grouped['City']

for ind in np.arange(cities_grouped.shape[0]):
    city_venues_sorted.iloc[ind, 1:] = return_most_common_venues(cities_grouped.iloc[ind, :], num_top_venues)

city_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,A Coruna,Bar,Tapas Restaurant,Restaurant,Spanish Restaurant,Pub,Plaza,Ice Cream Shop,Café,Seafood Restaurant,Brewery
1,Alicante,Accessories Store,Bar,Plaza,Performing Arts Venue,Pub,Public Art,Park,Moroccan Restaurant,Gaming Cafe,Basketball Stadium
2,Amadora,Portuguese Restaurant,Restaurant,Pizza Place,Metro Station,Supermarket,Gym,Café,Bakery,Playground,Fast Food Restaurant
3,Angers,Bar,Pub,French Restaurant,Lounge,Sandwich Place,Japanese Restaurant,Italian Restaurant,Tram Station,Bakery,Nightclub
4,Barcelona,Tapas Restaurant,Plaza,Bar,Wine Bar,Cocktail Bar,Dessert Shop,Coffee Shop,Pizza Place,Spanish Restaurant,Italian Restaurant


## 5. Modeling

*In what way can the data be visualized to get to the answer that is required?*

### 5.1 k-means Clustering

In [53]:
# set number of clusters
kclusters = 10

cities_grouped_clustering = cities_grouped.drop(columns='City')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cities_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 7, 5, 8, 0, 3, 0, 3, 8, 5, 7, 5, 0, 8, 0, 3, 5, 3, 0, 6, 8, 7,
       0, 8, 8, 8, 5, 6, 0, 0, 8, 7, 3, 3, 8, 7, 8, 7, 8, 8, 1, 3, 7, 0,
       8, 3, 5, 2, 6, 8, 8, 3, 6, 0, 8, 6, 8, 6, 3, 0, 0, 3, 3, 9, 5, 4,
       0], dtype=int32)
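
Before interpreting the clusters it can help to look at their sizes, since k-means may place an outlier in a cluster of its own. This short sketch uses only the label array printed above; the variable names are illustrative.

```python
from collections import Counter

# Labels copied from the k-means output above (one entry per clustered city).
labels = [0, 7, 5, 8, 0, 3, 0, 3, 8, 5, 7, 5, 0, 8, 0, 3, 5, 3, 0, 6, 8, 7,
          0, 8, 8, 8, 5, 6, 0, 0, 8, 7, 3, 3, 8, 7, 8, 7, 8, 8, 1, 3, 7, 0,
          8, 3, 5, 2, 6, 8, 8, 3, 6, 0, 8, 6, 8, 6, 3, 0, 0, 3, 3, 9, 5, 4,
          0]

cluster_sizes = Counter(labels)
# Clusters that contain a single city deserve a closer look.
singletons = sorted(c for c, size in cluster_sizes.items() if size == 1)
print(dict(sorted(cluster_sizes.items())))
print(singletons)
```

The array holds 67 labels although 70 cities were geocoded, which suggests that three cities returned no venues at all and therefore never reached the clustering step.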

### 5.2 Add clustering labels to dataframe

In [54]:
# add clustering labels
city_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)

cities_merged = df

# merge cities_grouped with city df to add latitude/longitude for each city
cities_merged = cities_merged.join(city_venues_sorted.set_index('City'), on='City')

cities_merged.head() # check the last columns!

Unnamed: 0,City,country,Longitude,Latitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Rome,Italy,12.485338,41.894802,3.0,Italian Restaurant,Historic Site,Plaza,Temple,History Museum,Museum,Hotel,Monument / Landmark,Scenic Lookout,Café
1,Milan,Italy,9.1905,45.4668,3.0,Italian Restaurant,Boutique,Hotel,Plaza,Pizza Place,Cocktail Bar,Ice Cream Shop,Bookstore,Coffee Shop,Monument / Landmark
2,Naples,Italy,14.248783,40.835934,7.0,Pizza Place,Ice Cream Shop,Café,Italian Restaurant,Trattoria/Osteria,Historic Site,Harbor / Marina,Plaza,Pastry Shop,Seafood Restaurant
3,Turin,Italy,7.682489,45.067755,3.0,Café,Ice Cream Shop,Plaza,Pizza Place,Italian Restaurant,Hotel,Bookstore,Historic Site,Piedmontese Restaurant,Boutique
4,Palermo,Italy,13.352443,38.111227,7.0,Plaza,Italian Restaurant,History Museum,Historic Site,Restaurant,Cocktail Bar,Sandwich Place,Bar,Café,Brewery


### 5.3 Display Clustering on Map

In [55]:
cities_merged=cities_merged.dropna()
cities_merged['Cluster_Labels'] = cities_merged.Cluster_Labels.astype(int)

# create map
map_clusters = folium.Map(location=[centroid_Lat,centroid_Lon], zoom_start=4)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(cities_merged['Latitude'], cities_merged['Longitude'], cities_merged['City'], cities_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 5.4 Explore Clusters

#### 5.4.1 Cluster 0

In [56]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 0, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,Madrid,Spain,Spanish Restaurant,Tapas Restaurant,Hotel,Plaza,Hostel,Bookstore,Restaurant,Gourmet Shop,Theater,Cocktail Bar
43,Barcelona,Spain,Tapas Restaurant,Plaza,Bar,Wine Bar,Cocktail Bar,Dessert Shop,Coffee Shop,Pizza Place,Spanish Restaurant,Italian Restaurant
44,Valencia,Spain,Restaurant,Hotel,Spanish Restaurant,Mediterranean Restaurant,Plaza,Coffee Shop,Clothing Store,Ice Cream Shop,Burger Joint,Italian Restaurant
45,Seville,Spain,Tapas Restaurant,Plaza,Hotel,Spanish Restaurant,Ice Cream Shop,Café,Restaurant,Historic Site,Bar,Mediterranean Restaurant
47,Malaga,Spain,Spanish Restaurant,Tapas Restaurant,Café,Plaza,Bar,Hotel,Bookstore,Seafood Restaurant,Mediterranean Restaurant,Restaurant
49,Palma,Spain,Tapas Restaurant,Plaza,Café,Hotel,Spanish Restaurant,Restaurant,Bar,Ice Cream Shop,Vegetarian / Vegan Restaurant,Coffee Shop
50,Las Palmas de Gran Canaria,Spain,Tapas Restaurant,Indian Restaurant,Gym,Breakfast Spot,Shopping Mall,Spanish Restaurant,Supermarket,Road,Snack Place,Coffee Shop
51,Bilbao,Spain,Spanish Restaurant,Restaurant,Tapas Restaurant,Seafood Restaurant,Plaza,Bar,Café,Wine Bar,Cocktail Bar,Bakery
53,Cordoba,Spain,Spanish Restaurant,Hotel,Bar,Tapas Restaurant,Plaza,Café,Hostel,Restaurant,Sandwich Place,Theater
54,Valladolid,Spain,Tapas Restaurant,Bar,Spanish Restaurant,Café,Plaza,Mediterranean Restaurant,Restaurant,Gourmet Shop,Nightclub,Art Gallery


#### 5.4.2 Cluster 1

In [57]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 1, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
62,Oviedo,Spain,Supermarket,Women's Store,Financial or Legal Service,Event Space,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fried Chicken Joint


#### 5.4.3 Cluster 2

In [58]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 2, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Prato,Italy,Italian Restaurant,Women's Store,Fish & Chips Shop,Event Space,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Fish Market


#### 5.4.4 Cluster 3

In [59]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 3, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Rome,Italy,Italian Restaurant,Historic Site,Plaza,Temple,History Museum,Museum,Hotel,Monument / Landmark,Scenic Lookout,Café
1,Milan,Italy,Italian Restaurant,Boutique,Hotel,Plaza,Pizza Place,Cocktail Bar,Ice Cream Shop,Bookstore,Coffee Shop,Monument / Landmark
3,Turin,Italy,Café,Ice Cream Shop,Plaza,Pizza Place,Italian Restaurant,Hotel,Bookstore,Historic Site,Piedmontese Restaurant,Boutique
5,Genoa,Italy,Italian Restaurant,Plaza,Pub,Café,Bar,Historic Site,Ice Cream Shop,Restaurant,Hotel,Pizza Place
6,Bologna,Italy,Italian Restaurant,Plaza,Café,Ice Cream Shop,Hotel,Wine Bar,History Museum,Bookstore,Sandwich Place,Toy / Game Store
7,Florence,Italy,Italian Restaurant,Hotel,Ice Cream Shop,Plaza,Café,Art Museum,Sandwich Place,Art Gallery,Cocktail Bar,Boutique
8,Bari,Italy,Pizza Place,Hotel,Café,Restaurant,Plaza,Italian Restaurant,Seafood Restaurant,Ice Cream Shop,Burger Joint,Food
10,Venice,Italy,Italian Restaurant,Hotel,Ice Cream Shop,Plaza,Bar,Wine Bar,Café,Restaurant,Museum,Cocktail Bar
11,Verona,Italy,Italian Restaurant,Pizza Place,Hotel,Bar,Restaurant,Café,Ice Cream Shop,Historic Site,Boutique,Theater
13,Padua,Italy,Café,Italian Restaurant,Plaza,Pizza Place,Wine Bar,Ice Cream Shop,Bar,Bakery,Pub,Sandwich Place


#### 5.4.5 Cluster 4

In [60]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 4, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
40,Villeurbanne,France,Women's Store,Art Gallery,Supermarket,Bakery,Hookah Bar,Food,Flower Shop,Fish Market,Food & Drink Shop,Food Service


#### 5.4.6 Cluster 5

In [61]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 5, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,Lisbon,Portugal,Portuguese Restaurant,Hotel,Ice Cream Shop,Restaurant,Bar,Plaza,Wine Bar,Coffee Shop,Gastropub,Tapas Restaurant
64,Porto,Portugal,Portuguese Restaurant,Bar,Hotel,Café,Plaza,Hostel,Italian Restaurant,Breakfast Spot,Restaurant,Ice Cream Shop
65,Vila Nova de Gaia,Portugal,Café,Bakery,Restaurant,Coffee Shop,Sushi Restaurant,Grocery Store,Gastropub,Spanish Restaurant,Steakhouse,Supermarket
66,Amadora,Portugal,Portuguese Restaurant,Restaurant,Pizza Place,Metro Station,Supermarket,Gym,Café,Bakery,Playground,Fast Food Restaurant
67,Braga,Portugal,Bar,Restaurant,Portuguese Restaurant,Café,Plaza,Coffee Shop,Burger Joint,Tapas Restaurant,Bakery,Japanese Restaurant
68,Coimbra,Portugal,Portuguese Restaurant,Bakery,Plaza,Restaurant,Bar,Hotel,Tapas Restaurant,Café,Gastropub,Supermarket
69,Funchal,Portugal,Café,Restaurant,Plaza,Portuguese Restaurant,Bakery,Hotel,Seafood Restaurant,Ice Cream Shop,Historic Site,Coffee Shop


#### 5.4.7 Cluster 6

In [62]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 6, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Trieste,Italy,Hotel,Café,Plaza,Italian Restaurant,Ice Cream Shop,Bar,Trattoria/Osteria,Historic Site,Pizza Place,Coffee Shop
20,Reggio Calabria,Italy,Plaza,Fast Food Restaurant,Castle,Café,Steakhouse,Train Station,Burger Joint,Garden,Trattoria/Osteria,Hotel
23,Lyon,France,Plaza,Hotel,Italian Restaurant,Wine Bar,Café,French Restaurant,Coffee Shop,Resort,Pub,Optical Shop
34,Saint-Etienne,France,Plaza,Bookstore,Pub,French Restaurant,Multiplex,Farmers Market,Café,Sandwich Place,Steakhouse,Sushi Restaurant
35,Toulon,France,Plaza,Hotel,Irish Pub,Pizza Place,Train Station,Fast Food Restaurant,Café,Salad Place,Lounge,Restaurant
60,Granada,Spain,Hotel,Café,Plaza,Bar,Tapas Restaurant,Spanish Restaurant,Scenic Lookout,Nightclub,Coffee Shop,Breakfast Spot


#### 5.4.8 Cluster 7

In [63]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 7, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Naples,Italy,Pizza Place,Ice Cream Shop,Café,Italian Restaurant,Trattoria/Osteria,Historic Site,Harbor / Marina,Plaza,Pastry Shop,Seafood Restaurant
4,Palermo,Italy,Plaza,Italian Restaurant,History Museum,Historic Site,Restaurant,Cocktail Bar,Sandwich Place,Bar,Café,Brewery
9,Catania,Italy,Pizza Place,Cocktail Bar,Italian Restaurant,Plaza,Bed & Breakfast,Dessert Shop,Church,Pub,Restaurant,Café
12,Messina,Italy,Italian Restaurant,Dessert Shop,Greek Restaurant,Cocktail Bar,Movie Theater,Pizza Place,Café,Scenic Lookout,Gastropub,Theater
48,Murcia,Spain,Tapas Restaurant,Plaza,Pizza Place,Park,Steakhouse,Bagel Shop,Tennis Court,Light Rail Station,Nightclub,Mediterranean Restaurant
52,Alicante,Spain,Accessories Store,Bar,Plaza,Performing Arts Venue,Pub,Public Art,Park,Moroccan Restaurant,Gaming Cafe,Basketball Stadium
57,L'Hospitalet de Llobregat,Spain,Mediterranean Restaurant,Restaurant,Pizza Place,Cocktail Bar,Café,Bar,Bakery,Diner,Spanish Restaurant,Train Station


#### 5.4.9 Cluster 8

In [64]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 8, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,Paris,France,French Restaurant,Ice Cream Shop,Plaza,Art Gallery,Pub,Cocktail Bar,Souvenir Shop,Bar,Cosmetics Shop,Park
22,Marseille,France,French Restaurant,Hotel,Bar,Plaza,Pub,Ice Cream Shop,Pizza Place,Steakhouse,Seafood Restaurant,Café
24,Toulouse,France,French Restaurant,Plaza,Tea Room,Bar,Coffee Shop,Ice Cream Shop,Burger Joint,Restaurant,Hotel,Argentinian Restaurant
25,Nice,France,Hotel,French Restaurant,Ice Cream Shop,Fast Food Restaurant,Clothing Store,Coffee Shop,Italian Restaurant,Middle Eastern Restaurant,Restaurant,Bed & Breakfast
26,Nantes,France,Bar,French Restaurant,Plaza,Restaurant,Pizza Place,Greek Restaurant,Indian Restaurant,Coffee Shop,Salad Place,Castle
27,Strasbourg,France,French Restaurant,Plaza,Bar,Bakery,Hotel,Alsatian Restaurant,Cupcake Shop,Brewery,Italian Restaurant,Restaurant
28,Montpellier,France,French Restaurant,Bar,Café,Pub,Cocktail Bar,Plaza,Coffee Shop,Wine Bar,Burger Joint,Pizza Place
29,Bordeaux,France,French Restaurant,Plaza,Coffee Shop,Pedestrian Plaza,Hotel,Café,Shopping Mall,Bistro,Electronics Store,Bakery
30,Lille,France,French Restaurant,Coffee Shop,Cocktail Bar,Bar,Creperie,Plaza,Burger Joint,Café,Italian Restaurant,Japanese Restaurant
31,Rennes,France,Bar,Plaza,Creperie,Coffee Shop,Irish Pub,Historic Site,Bakery,Pub,Market,French Restaurant


#### 5.4.10 Cluster 9

In [65]:
cities_merged.loc[cities_merged['Cluster_Labels'] == 9, cities_merged.columns[[0] + [1] + list(range(5, cities_merged.shape[1]))]]

Unnamed: 0,City,country,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
55,Vigo,Spain,Café,Gaming Cafe,Coffee Shop,Women's Store,Fish & Chips Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Financial or Legal Service,Flower Shop


#### 5.4.11 Summarize clusters

- Cluster 0 - "Spanish Cluster"
- Cluster 1 - Oviedo, Spain (1 venue) - excluded
- Cluster 2 - Prato, Italy (1 venue) - excluded
- Cluster 3 - "Italian Cluster"
- Cluster 4 - Villeurbanne, France (5 venues) - excluded
- Cluster 5 - "Portuguese Cluster"
- Cluster 6 - Cities in Italy, France, Spain
- Cluster 7 - Cities in Italy and Spain
- Cluster 8 - "French Cluster"
- Cluster 9 - Vigo, Spain (3 venues) - excluded

## 6. Select cities

### Selection of 3 Italian cities

- Select 1 city from cluster 3 ("Italian Cluster") - Rome (first entry)
- Select 1 city from cluster 6 - Trieste (first Italian entry)
- Select 1 city from cluster 7 - Naples (first Italian entry)

### Selection of the Portuguese city

- Select 1 city from cluster 5 ("Portuguese Cluster") - Lisbon

### Selection of 3 French cities

- Select 2 cities from cluster 8 ("French Cluster") - Paris (first entry) and Nice
- Select 1 city from cluster 6 - Lyon (first French entry)

### Selection of 3 Spanish cities

- Select 1 city from cluster 0 ("Spanish Cluster") - Madrid (first entry)
- Select 1 city from cluster 6 - Granada (first Spanish entry)
- Select 1 city from cluster 7 - Murcia (first Spanish entry)

### Excluded clusters

Clusters 1, 2, 4 and 9 were excluded because they contain only 1 city each with very few venues (between 1 and 5).
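
Most of the picks follow one rule: within a chosen cluster, take the first city listed for the country. Writing that rule down explicitly makes the selection repeatable. The sketch below uses a subset of rows copied from the cluster tables, keeping the table order within each cluster. `first_city` is a hypothetical helper, and the second French pick from cluster 8 (Nice) is a manual choice that this rule does not cover.

```python
# (city, country, cluster) for a subset of the rows in the cluster tables.
rows = [
    ("Rome", "Italy", 3), ("Milan", "Italy", 3),
    ("Naples", "Italy", 7), ("Palermo", "Italy", 7),
    ("Trieste", "Italy", 6), ("Reggio Calabria", "Italy", 6),
    ("Lyon", "France", 6), ("Murcia", "Spain", 7),
    ("Granada", "Spain", 6), ("Madrid", "Spain", 0),
    ("Paris", "France", 8), ("Lisbon", "Portugal", 5),
]

def first_city(rows, country, cluster):
    """Return the first city of `country` found in `cluster`, or None."""
    for city, row_country, row_cluster in rows:
        if row_country == country and row_cluster == cluster:
            return city
    return None

italy = [first_city(rows, "Italy", c) for c in (3, 6, 7)]
spain = [first_city(rows, "Spain", c) for c in (0, 6, 7)]
```

Because the rule depends on row order, a different ordering of the input (for example by population or by venue count) would change which city is picked.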

### Show the selected cities with their cluster labels on the map

In [66]:
cities_selected = ("Rome","Trieste","Naples","Paris","Nice","Lyon","Madrid","Granada","Murcia","Lisbon")
cities_merged_selected = cities_merged.loc[cities_merged['City'].isin(cities_selected)]
cities_merged_selected

Unnamed: 0,City,country,Longitude,Latitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Rome,Italy,12.485338,41.894802,3,Italian Restaurant,Historic Site,Plaza,Temple,History Museum,Museum,Hotel,Monument / Landmark,Scenic Lookout,Café
2,Naples,Italy,14.248783,40.835934,7,Pizza Place,Ice Cream Shop,Café,Italian Restaurant,Trattoria/Osteria,Historic Site,Harbor / Marina,Plaza,Pastry Shop,Seafood Restaurant
14,Trieste,Italy,13.770656,45.650033,6,Hotel,Café,Plaza,Italian Restaurant,Ice Cream Shop,Bar,Trattoria/Osteria,Historic Site,Pizza Place,Coffee Shop
21,Paris,France,2.351499,48.85661,8,French Restaurant,Ice Cream Shop,Plaza,Art Gallery,Pub,Cocktail Bar,Souvenir Shop,Bar,Cosmetics Shop,Park
23,Lyon,France,4.832011,45.757814,6,Plaza,Hotel,Italian Restaurant,Wine Bar,Café,French Restaurant,Coffee Shop,Resort,Pub,Optical Shop
25,Nice,France,7.268391,43.700936,8,Hotel,French Restaurant,Ice Cream Shop,Fast Food Restaurant,Clothing Store,Coffee Shop,Italian Restaurant,Middle Eastern Restaurant,Restaurant,Bed & Breakfast
42,Madrid,Spain,-3.703582,40.416705,0,Spanish Restaurant,Tapas Restaurant,Hotel,Plaza,Hostel,Bookstore,Restaurant,Gourmet Shop,Theater,Cocktail Bar
48,Murcia,Spain,-1.130543,37.992379,7,Tapas Restaurant,Plaza,Pizza Place,Park,Steakhouse,Bagel Shop,Tennis Court,Light Rail Station,Nightclub,Mediterranean Restaurant
60,Granada,Spain,-3.602193,37.183054,6,Hotel,Café,Plaza,Bar,Tapas Restaurant,Spanish Restaurant,Scenic Lookout,Nightclub,Coffee Shop,Breakfast Spot
63,Lisbon,Portugal,-9.136592,38.707751,5,Portuguese Restaurant,Hotel,Ice Cream Shop,Restaurant,Bar,Plaza,Wine Bar,Coffee Shop,Gastropub,Tapas Restaurant


In [67]:
# create map
map_clusters = folium.Map(location=[centroid_Lat,centroid_Lon], zoom_start=5)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(cities_merged_selected['Latitude'], cities_merged_selected['Longitude'], cities_merged_selected['City'], cities_merged_selected['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters