# Holiday Destination Recommender

## 1. Prepare Necessary Libraries

In [239]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


In [240]:
# geocoding (latitude and longitude lookup)
from geopy.geocoders import Nominatim

# data handling and modeling
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans

# visualization
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# utilities
import requests 
import time

## 2. Retrieve Data

### 2.1 Get Coordinates

Enter the cities you have visited and would like to find similar destinations for:

In [241]:
visited = ["Barcelona, Spain", "Budapest, Hungary"]

Retrieve coordinates:

In [242]:
geolocator = Nominatim(user_agent="coursera_hhu")
rows = []

for place in visited:
    location = geolocator.geocode(place)
    if location is None:
        # geocode() returns None when the place cannot be resolved
        print(place + " not found")
        continue
    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinates of {} are {}, {}.'.format(place, latitude, longitude))
    rows.append({'Place': place, 'Latitude': latitude, 'Longitude': longitude})

# build the frame in one step; DataFrame.append was removed in pandas 2.0
dfVisited = pd.DataFrame(rows, columns=['Place', 'Latitude', 'Longitude'])
dfVisited

The geographical coordinates of Barcelona, Spain are 41.3828939, 2.1774322.
The geographical coordinates of Budapest, Hungary are 47.48138955, 19.14607278448202.


Unnamed: 0,Place,Latitude,Longitude
0,"Barcelona, Spain",41.382894,2.177432
1,"Budapest, Hungary",47.48139,19.146073


Load the list of world cities (source: https://datahub.io/core/world-cities):

In [243]:
dfCities = pd.read_csv("world-cities.csv")

Randomly pick 100 cities and retrieve their coordinates. The sample is limited to 100 cities so as not to exceed the daily request limit that the Foursquare Places API imposes on sandbox accounts; random selection adds an element of exploration.

In [246]:
dfCities = dfCities.sample(100)

# build "City, Country" query strings
new = (dfCities['name'] + ', ' + dfCities['country']).tolist()

geolocator = Nominatim(user_agent="coursera_hhu")
rows = []

for place in new:
    location = geolocator.geocode(place)
    if location is None:
        # geocode() returns None when the place cannot be resolved
        print(place + " not found")
    else:
        latitude = location.latitude
        longitude = location.longitude
        print('The geographical coordinates of {} are {}, {}.'.format(place, latitude, longitude))
        rows.append({'Place': place, 'Latitude': latitude, 'Longitude': longitude})
    time.sleep(2)  # pause between requests to avoid overloading the geocoding service

# build the frame in one step; DataFrame.append was removed in pandas 2.0
dfNew = pd.DataFrame(rows, columns=['Place', 'Latitude', 'Longitude'])
dfNew

The geographical coordinates of Ammi Moussa, Algeria are 35.8702048, 1.1081265.
Sarankhola, Bangladesh not found
The geographical coordinates of Almaty, Kazakhstan are 43.2363924, 76.9457275.
The geographical coordinates of Bilāsipāra, India are 26.2316753, 90.2324363.
The geographical coordinates of Kuching, Malaysia are 1.5574127, 110.3439862.
The geographical coordinates of Ust’-Dzheguta, Russia are 44.0858961, 41.9722557.
The geographical coordinates of Shorewood, United States are 44.930272, -93.54483032110988.
The geographical coordinates of Parral, Chile are -36.1417466, -71.8227767.
The geographical coordinates of Kangdong-ŭp, North Korea are 39.1416596, 126.0995107.
The geographical coordinates of Santo Tomas, Philippines are 14.1078443, 121.1453304.
The geographical coordinates of Uelzen, Germany are 52.9840679, 10.538588154650403.
The geographical coordinates of Dorsten, Germany are 51.6604071, 6.9647431.
The geographical coordinates of Baheri, India are 28.745743949999998, 79.51381790408576.
The g

Unnamed: 0,Place,Latitude,Longitude
0,"Ammi Moussa, Algeria",35.870205,1.108127
1,"Almaty, Kazakhstan",43.236392,76.945728
2,"Bilāsipāra, India",26.231675,90.232436
3,"Kuching, Malaysia",1.557413,110.343986
4,"Ust’-Dzheguta, Russia",44.085896,41.972256
5,"Shorewood, United States",44.930272,-93.544830
6,"Parral, Chile",-36.141747,-71.822777
7,"Kangdong-ŭp, North Korea",39.141660,126.099511
8,"Santo Tomas, Philippines",14.107844,121.145330
9,"Uelzen, Germany",52.984068,10.538588
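
The sampling step above draws a different set of cities on every run. As a minimal sketch of how the selection could be made repeatable, the snippet below uses Python's standard `random` module with a fixed seed. The helper name `sample_places`, the seed value, and the toy city list are assumptions for illustration and are not part of the notebook. In pandas, `DataFrame.sample` accepts a `random_state` argument for the same purpose.

```python
import random

def sample_places(cities, k, seed=None):
    """Pick up to k (name, country) pairs and format them as 'City, Country'."""
    rng = random.Random(seed)   # a fixed seed makes the draw repeatable
    k = min(k, len(cities))     # never ask for more cities than exist
    return [f"{name}, {country}" for name, country in rng.sample(cities, k)]

# toy input standing in for the rows of world-cities.csv (hypothetical subset)
toy_cities = [
    ("Uelzen", "Germany"),
    ("Parral", "Chile"),
    ("Kuching", "Malaysia"),
    ("Almaty", "Kazakhstan"),
]

first = sample_places(toy_cities, k=2, seed=42)
second = sample_places(toy_cities, k=2, seed=42)
print(first == second)  # True
```

Leaving the seed unset keeps the exploratory behaviour of the original cell.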


### 2.2 Get Venues

Prepare requests to the Foursquare Places API:

In [248]:
CLIENT_ID = 'C30IS2UE2KULOT4PMMQ24TZQWFKP0Y4PLPKI4NNQ2X2CBZIT' # your Foursquare ID
CLIENT_SECRET = 'OYIZN0VZFPOKH4G3SFHUHL2AIKPZRSR4CPV4X5HW4F5NTN1Q' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: C30IS2UE2KULOT4PMMQ24TZQWFKP0Y4PLPKI4NNQ2X2CBZIT
CLIENT_SECRET: OYIZN0VZFPOKH4G3SFHUHL2AIKPZRSR4CPV4X5HW4F5NTN1Q
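
Hard-coding and printing API credentials exposes them to anyone who can read the notebook. The following is a minimal sketch of one alternative, assuming the credentials are supplied through environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helpers `load_foursquare_credentials` and `mask` are hypothetical and not part of the notebook.

```python
import os

def load_foursquare_credentials(env=None):
    """Read the credentials from environment variables (names are hypothetical)."""
    env = os.environ if env is None else env
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# demo with a stand-in mapping instead of the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-123",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))  # CLIENT_ID: demo*******
```

Calling `load_foursquare_credentials()` without an argument reads from the real environment.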


In [249]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per venue found within `radius` meters of each place.

    Reads the module-level LIMIT (maximum venues per request) at call time.
    """
    columns = ['Place', 
               'Place Latitude', 
               'Place Longitude', 
               'Venue', 
               'Venue Latitude', 
               'Venue Longitude', 
               'Venue Category']

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)

        try:
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']

            # keep only the relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
        except (requests.RequestException, KeyError, IndexError, TypeError, ValueError):
            # the request failed or the response lacks the expected fields
            print(name + " not found")

    # flatten the per-place lists into one table; passing columns here also handles an empty result
    return pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                        columns=columns)
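
To make the nested lookups in the function above easier to follow, here is a small sketch that applies the same field extraction to a hand-written dictionary. The payload is made up to mimic the nesting the function relies on (`response` → `groups` → `items` → `venue`), and the helper name `extract_venues` is hypothetical. Unlike the function above, this sketch substitutes the placeholder `"Unknown"` when a venue has no category instead of discarding the whole place.

```python
def extract_venues(place, lat, lng, payload):
    """Flatten one explore-style response into (place, lat, lng, venue, ...) rows."""
    groups = payload.get("response", {}).get("groups") or [{}]
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or [{}]
        rows.append((
            place, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0].get("name", "Unknown"),  # placeholder when no category is given
        ))
    return rows

# made-up payload; the first venue echoes a row of the Barcelona output
mock_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Plaça del Rei",
                   "location": {"lat": 41.384080, "lng": 2.177496},
                   "categories": [{"name": "Plaza"}]}},
        {"venue": {"name": "Uncategorized spot",
                   "location": {"lat": 41.38, "lng": 2.17},
                   "categories": []}},
    ]}]}
}

rows = extract_venues("Barcelona, Spain", 41.382894, 2.177432, mock_payload)
print(len(rows))    # 2
print(rows[1][-1])  # Unknown
print(extract_venues("Nowhere", 0.0, 0.0, {}))  # []
```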

Retrieve venues for visited and new places:

In [267]:
LIMIT = 100

visited_venues = getNearbyVenues(names=dfVisited['Place'],
                                   latitudes=dfVisited['Latitude'],
                                   longitudes=dfVisited['Longitude']
                                  )

visited_venues

Barcelona, Spain
Budapest, Hungary


Unnamed: 0,Place,Place Latitude,Place Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Barcelona, Spain",41.382894,2.177432,Gelaaati! di Marco,41.383186,2.177369,Ice Cream Shop
1,"Barcelona, Spain",41.382894,2.177432,Barrio Gótico,41.383660,2.177290,Neighborhood
2,"Barcelona, Spain",41.382894,2.177432,La Alcoba Azul,41.382833,2.175506,Spanish Restaurant
3,"Barcelona, Spain",41.382894,2.177432,Plaça de Sant Felip Neri,41.383378,2.175152,Plaza
4,"Barcelona, Spain",41.382894,2.177432,Plaça de Sant Jaume,41.382690,2.177010,Plaza
5,"Barcelona, Spain",41.382894,2.177432,Pont del Carrer del Bisbe,41.383310,2.176413,Bridge
6,"Barcelona, Spain",41.382894,2.177432,Plaça del Rei,41.384080,2.177496,Plaza
7,"Barcelona, Spain",41.382894,2.177432,Frankfurt Sant Jaume,41.382275,2.176912,Hot Dog Joint
8,"Barcelona, Spain",41.382894,2.177432,El Cuiner de Damasc,41.381490,2.177677,Falafel Restaurant
9,"Barcelona, Spain",41.382894,2.177432,Xurrería Manuel San Román,41.382241,2.175085,Snack Place


In [268]:
new_venues = getNearbyVenues(names=dfNew['Place'],
                                   latitudes=dfNew['Latitude'],
                                   longitudes=dfNew['Longitude']
                                  )

new_venues

Ammi Moussa, Algeria
Almaty, Kazakhstan
Bilāsipāra, India
Kuching, Malaysia
Ust’-Dzheguta, Russia
Shorewood, United States
Parral, Chile
Kangdong-ŭp, North Korea
Santo Tomas, Philippines
Uelzen, Germany
Dorsten, Germany
Baheri, India
Āsind, India
Talāja, India
Durango, United States
Nāḩīyat Saddat al Hindīyah, Iraq
Tal’ne, Ukraine
Amesbury, United States
Elkton, United States
Baidyabāti, India
Gaogou, China
Pickering, Canada
Aïn Touta, Algeria
Wetteren, Belgium
Hayward, United States
Rybnik, Poland
Al Jahrā’, Kuwait
Oakdale, United States
Akbarpur, India
Eisenach, Germany
Gus’-Khrustal’nyy, Russia
Fond Parisien, Haiti
East Ridge, United States
Les Clayes-sous-Bois, France
Rouissat, Algeria
Bako, Ethiopia
Goshen, United States
Plast, Russia
Garden City, United States
Câmpia Turzii, Romania
‘Aïn el Hadjel, Algeria
Vratsa, Bulgaria
Kaluga, Russia
Encantado, Brazil
Poggibonsi, Italy
Warrensburg, United States
Koh Kong, Cambodia
Konongo, Ghana
Comé, Benin
Foso, Ghana
Kikwit, Democratic Repu

Unnamed: 0,Place,Place Latitude,Place Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Almaty, Kazakhstan",43.236392,76.945728,Президентский парк,43.234283,76.944700,Park
1,"Almaty, Kazakhstan",43.236392,76.945728,Республика алаңы / Площадь Республики / Republ...,43.238878,76.945406,Plaza
2,"Almaty, Kazakhstan",43.236392,76.945728,Ziyafet Steakhouse,43.238639,76.941904,Steakhouse
3,"Almaty, Kazakhstan",43.236392,76.945728,Жиробас бар,43.239296,76.948726,Cocktail Bar
4,"Almaty, Kazakhstan",43.236392,76.945728,InterContinental Almaty,43.235748,76.940222,Hotel
5,"Almaty, Kazakhstan",43.236392,76.945728,"Health Club ""SPA InterContinental""",43.235546,76.940371,Spa
6,"Almaty, Kazakhstan",43.236392,76.945728,Coffee and the City,43.237836,76.945510,Coffee Shop
7,"Almaty, Kazakhstan",43.236392,76.945728,Leo's Cafe & Terrace,43.238969,76.949472,Café
8,"Almaty, Kazakhstan",43.236392,76.945728,Fitnation Sport & Fitness Club,43.239289,76.945077,Gym / Fitness Center
9,"Almaty, Kazakhstan",43.236392,76.945728,Trattoria,43.239043,76.950158,Italian Restaurant


## 3. Feature Transformation

### 3.1 One Hot Encoding of Venue Categories

In [251]:
# one hot encoding
visited_onehot = pd.get_dummies(visited_venues[['Venue Category']], prefix="", prefix_sep="")

visited_onehot.insert(0, 'Place', visited_venues['Place'] , True) 

visited_onehot

Unnamed: 0,Place,Art Museum,Baby Store,Bakery,Bar,Beer Bar,Bistro,Bridge,Burger Joint,Bus Stop,...,Snack Place,Spanish Restaurant,Steakhouse,Tapas Restaurant,Tea Room,Track,Vegetarian / Vegan Restaurant,Wine Bar,Wine Shop,Women's Store
0,"Barcelona, Spain",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Barcelona, Spain",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Barcelona, Spain",0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
3,"Barcelona, Spain",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Barcelona, Spain",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,"Barcelona, Spain",0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
6,"Barcelona, Spain",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,"Barcelona, Spain",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"Barcelona, Spain",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Barcelona, Spain",0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0


In [252]:
new_onehot = pd.get_dummies(new_venues[['Venue Category']], prefix="", prefix_sep="")

new_onehot.insert(0, 'Place', new_venues['Place'] , True) 

new_onehot

Unnamed: 0,Place,ATM,Accessories Store,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Train,Train Station,Turkish Restaurant,Video Store,Vietnamese Restaurant,Weight Loss Center,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,"Almaty, Kazakhstan",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### 3.2 Calculating the Frequency of Venue Categories

In [253]:
visited_onehot = visited_onehot.loc[:,~visited_onehot.columns.duplicated()]

visited_grouped = visited_onehot.groupby('Place').mean().reset_index()
visited_grouped

Unnamed: 0,Place,Art Museum,Baby Store,Bakery,Bar,Beer Bar,Bistro,Bridge,Burger Joint,Bus Stop,...,Snack Place,Spanish Restaurant,Steakhouse,Tapas Restaurant,Tea Room,Track,Vegetarian / Vegan Restaurant,Wine Bar,Wine Shop,Women's Store
0,"Barcelona, Spain",0.01,0.01,0.02,0.04,0.01,0.01,0.01,0.02,0.0,...,0.01,0.04,0.01,0.1,0.01,0.0,0.02,0.04,0.01,0.01
1,"Budapest, Hungary",0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,...,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0


In [254]:
new_onehot = new_onehot.loc[:,~new_onehot.columns.duplicated()]

new_grouped = new_onehot.groupby('Place').mean().reset_index()
new_grouped

Unnamed: 0,Place,ATM,Accessories Store,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Train,Train Station,Turkish Restaurant,Video Store,Vietnamese Restaurant,Weight Loss Center,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Almaty, Kazakhstan",0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.030303,...,0.000000,0.000000,0.030303,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,"Amesbury, United States",0.000000,0.00,0.181818,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.045455
2,"Amuntai, Indonesia",0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,"An Nabk, Syria",0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,"Artëm, Russia",0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
5,"Aïn Touta, Algeria",0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
6,"Baidyabāti, India",0.500000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.500000,0.000000
7,"Boufarik, Algeria",0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
8,"Buchen, Germany",0.125000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
9,"Burnham-on-Sea, United Kingdom",0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
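
The two steps above (one-hot encoding, then averaging per place) amount to computing, for each place, the share of its venues that fall into each category. The sketch below reproduces that calculation in plain Python on a made-up venue list so the arithmetic is visible; the helper name `category_frequencies` and the toy data are assumptions for illustration.

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """Map each place to {category: share of that place's venues}."""
    counts = defaultdict(Counter)
    for place, category in venues:
        counts[place][category] += 1
    return {
        place: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for place, counter in counts.items()
    }

# toy (place, category) pairs standing in for the venue table
toy_venues = [
    ("City A", "Plaza"),
    ("City A", "Plaza"),
    ("City A", "Bar"),
    ("City A", "Café"),
    ("City B", "Park"),
]

freqs = category_frequencies(toy_venues)
print(freqs["City A"]["Plaza"])  # 0.5
print(freqs["City B"]["Park"])   # 1.0
```

A place with few venues gets coarse frequencies, which is why some rows in the table above contain values such as 0.5.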


## 4. K-Means Clustering 

### 4.1 Calculate Cosine Similarity Between New and Visited Places

In [255]:
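visited_grouped_numerical = visited_grouped.drop(columns="Place")
new_grouped_numerical = new_grouped.drop(columns="Place")

# align the new cities to the visited cities' categories; categories a new city lacks become 0
new_grouped_numerical = new_grouped_numerical.reindex(columns=visited_grouped_numerical.columns).fillna(0.0)

# label the columns in the row order of visited_grouped (sorted by groupby), not the order of `visited`
dfSimilarity = pd.DataFrame(data=cosine_similarity(new_grouped_numerical, visited_grouped_numerical),columns=visited_grouped['Place'].tolist())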

print(dfSimilarity)

    Barcelona, Spain  Budapest, Hungary
0           0.483288           0.280745
1           0.323197           0.000000
2           0.204124           0.000000
3           0.051031           0.000000
4           0.357217           0.188982
..               ...                ...
56          0.294628           0.218218
57          0.291667           0.000000
58          0.359446           0.130931
59          0.273861           0.000000
60          0.000000           0.000000

[61 rows x 2 columns]
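
As a worked illustration of the similarity computed above, the sketch below evaluates the cosine similarity $\frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$ by hand for two made-up category profiles. As in the cell above, the new city's profile is first restricted to the categories of the visited cities, with missing categories set to 0. The helper names `align` and `cosine` and the toy profiles are assumptions; returning 0 for an all-zero vector is a choice made in this sketch.

```python
import math

def align(profile, columns):
    """Express a {category: frequency} profile over a fixed list of categories."""
    return [profile.get(c, 0.0) for c in columns]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # no shared information; treated as zero similarity here
    return dot / (norm_u * norm_v)

columns = ["Bar", "Plaza", "Tapas Restaurant"]  # categories of the visited city
visited_city = {"Bar": 0.5, "Plaza": 0.25, "Tapas Restaurant": 0.25}
new_city = {"Plaza": 0.5, "Park": 0.5}          # "Park" is dropped: not a visited category

similarity = cosine(align(new_city, columns), align(visited_city, columns))
print(round(similarity, 4))  # 0.4082
```

Because categories outside the visited profile are ignored, a new city is scored only on what it shares with the benchmark, not on what else it offers.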




### 4.2 Calculate Optimal K Using Silhouette Method

In [256]:
scores = []
kmax = 10

for k in range(2, kmax+1):
    # fixed seed, matching the final model fit in section 4.3
    kmeans = KMeans(n_clusters=k, random_state=0).fit(dfSimilarity)
    labels = kmeans.labels_
    scores.append(silhouette_score(dfSimilarity, labels, metric='euclidean'))

scores

[0.6220777535153776,
 0.5668605299571667,
 0.6110393140388912,
 0.6489429104301974,
 0.6465254049835942,
 0.6201313202445963,
 0.6170728042585615,
 0.6218453567908208,
 0.6100973252173566]

In [257]:
opt_k = scores.index(max(scores))+2

opt_k

5
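
For readers unfamiliar with the silhouette method, the sketch below computes the mean silhouette coefficient by hand for a tiny one-dimensional example. For each point, `a` is the mean distance to the other members of its own cluster, `b` is the mean distance to the nearest other cluster, and the coefficient is `(b - a) / max(a, b)`. This is a simplified illustration, not the scikit-learn implementation used above; the function name `silhouette` and the toy points are assumptions. The last lines recheck the `+2` offset using the scores printed above.

```python
def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points, using absolute distance."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)

    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton cluster contributes a coefficient of 0
        # a: mean distance to the other members of the same cluster
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(sum(abs(p - q) for q in members) / len(members)
                for other, members in clusters.items() if other != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

points = [0.0, 1.0, 10.0, 11.0]
good = silhouette(points, [0, 0, 1, 1])  # neighbours grouped together
bad = silhouette(points, [0, 1, 0, 1])   # neighbours split apart
print(good > bad)  # True

# scores copied from the output above; scores[0] belongs to k = 2
scores = [0.6220777535153776, 0.5668605299571667, 0.6110393140388912,
          0.6489429104301974, 0.6465254049835942, 0.6201313202445963,
          0.6170728042585615, 0.6218453567908208, 0.6100973252173566]
best_k = scores.index(max(scores)) + 2
print(best_k)  # 5
```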

### 4.3 Run K-Means

In [258]:
# run k-means clustering
kmeans = KMeans(n_clusters=opt_k, random_state=0).fit(dfSimilarity)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 0, 2, 3, 2, 2, 2, 0, 0], dtype=int32)

In [259]:
new_clustered = new_grouped.loc[:,["Place"]]
new_clustered.insert(1, 'Cluster Labels', kmeans.labels_)

new_clustered

Unnamed: 0,Place,Cluster Labels
0,"Almaty, Kazakhstan",3
1,"Amesbury, United States",0
2,"Amuntai, Indonesia",0
3,"An Nabk, Syria",2
4,"Artëm, Russia",3
5,"Aïn Touta, Algeria",2
6,"Baidyabāti, India",2
7,"Boufarik, Algeria",2
8,"Buchen, Germany",0
9,"Burnham-on-Sea, United Kingdom",0


## 5. Visualization

### 5.1 Prepare Data for Visualization

Merge cities data with cluster labels:

In [260]:
dfNew_clustered = pd.merge(dfNew, new_clustered, on='Place', how="inner")

dfNew_clustered.sort_values(by="Cluster Labels")

Unnamed: 0,Place,Latitude,Longitude,Cluster Labels
30,"Poggibonsi, Italy",43.465028,11.148337,0
40,"Burnham-on-Sea, United Kingdom",51.237736,-2.998695,0
39,"Buchen, Germany",49.522297,9.324319,0
49,"Versmold, Germany",52.041675,8.149388,0
34,"Ifakara, Tanzania",-8.132718,36.681804,0
31,"Warrensburg, United States",38.762437,-93.740960,0
59,"Amuntai, Indonesia",-2.419618,115.253971,0
26,"Garden City, United States",37.971690,-100.872662,0
24,"Goshen, United States",41.582409,-85.834366,0
53,"Takahashi, Japan",34.790897,133.616911,0


### 5.2 Draw Map with Clustered Cities

In [266]:
# create a world map centered on the mean position of the clustered cities
map_clusters = folium.Map(
    location=[dfNew_clustered['Latitude'].mean(), dfNew_clustered['Longitude'].mean()],
    zoom_start=1)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, opt_k))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(dfNew_clustered['Latitude'], dfNew_clustered['Longitude'], dfNew_clustered['Place'], dfNew_clustered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
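
The marker colors above come from a matplotlib colormap. As a dependency-free alternative, the sketch below builds one hex color per cluster by spacing hues evenly around the color wheel with the standard-library `colorsys` module. This is a swapped-in technique for illustration, not the notebook's method, and the helper name `cluster_colors` is hypothetical.

```python
import colorsys

def cluster_colors(n_clusters):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        # full saturation and brightness; only the hue varies per cluster
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_colors(5)
print(palette[0])         # #ff0000
print(len(set(palette)))  # 5
```

A list built this way can be indexed by cluster label in the same manner as the colormap-based list.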

#### If the map does not render properly on GitHub, copy the URL of this notebook and view it at https://nbviewer.jupyter.org/

## 6. Analysis of Clusters

Interpret the clusters by comparing each cluster's average similarity to the benchmark cities:

In [262]:
dfSimilarity_clustered = dfSimilarity.copy()
dfSimilarity_clustered.insert(0, 'Cluster Labels', kmeans.labels_)

# mean similarity to each benchmark city, per cluster
dfMeans = dfSimilarity_clustered.groupby('Cluster Labels').mean()

dfMeans

Cluster Labels,"Barcelona, Spain","Budapest, Hungary"
0,0.235006,0.0
1,0.113924,0.575725
2,0.023106,0.0
3,0.360149,0.190732
4,0.469447,0.015029


The cluster means suggest which cities are worth a visit:
- Cluster 4: cities most similar to Barcelona (mean similarity 0.47) and almost entirely dissimilar to Budapest (0.02)
- Cluster 1: cities most similar to Budapest (0.58) and only weakly similar to Barcelona (0.11)
- Cluster 3: cities moderately similar to both Barcelona (0.36) and Budapest (0.19)

Cities from Cluster 2 are the least promising, as they are similar to neither Barcelona (0.02) nor Budapest (0.00).
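
One way to turn the table of cluster means into an explicit ranking is to order the clusters by their strongest similarity to any benchmark city. The sketch below does this with the values copied from the table above. Using the maximum as the ranking criterion is an assumption made here, not a rule taken from the notebook, and the helper name `rank_clusters` is hypothetical.

```python
# mean similarities per cluster, copied from the dfMeans output
cluster_means = {
    0: {"Barcelona, Spain": 0.235006, "Budapest, Hungary": 0.0},
    1: {"Barcelona, Spain": 0.113924, "Budapest, Hungary": 0.575725},
    2: {"Barcelona, Spain": 0.023106, "Budapest, Hungary": 0.0},
    3: {"Barcelona, Spain": 0.360149, "Budapest, Hungary": 0.190732},
    4: {"Barcelona, Spain": 0.469447, "Budapest, Hungary": 0.015029},
}

def rank_clusters(means):
    """Order cluster labels by their highest mean similarity to any benchmark city."""
    return sorted(means, key=lambda c: max(means[c].values()), reverse=True)

ranking = rank_clusters(cluster_means)
print(ranking)  # [1, 4, 3, 0, 2]

for cluster in ranking:
    closest = max(cluster_means[cluster], key=cluster_means[cluster].get)
    print(cluster, closest, cluster_means[cluster][closest])
```

Other criteria, such as the sum of both similarities, would favour clusters like Cluster 3 that resemble both benchmark cities.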

In [264]:
dfNew_clustered.sort_values(by="Cluster Labels").drop(["Latitude", "Longitude"], axis=1)

Unnamed: 0,Place,Cluster Labels
30,"Poggibonsi, Italy",0
40,"Burnham-on-Sea, United Kingdom",0
39,"Buchen, Germany",0
49,"Versmold, Germany",0
34,"Ifakara, Tanzania",0
31,"Warrensburg, United States",0
59,"Amuntai, Indonesia",0
26,"Garden City, United States",0
24,"Goshen, United States",0
53,"Takahashi, Japan",0
