# 250 Global Attractions

Who can resist those articles on Facebook: "I've read 39/250 of these classic books! How many have you read?" [This](http://www.listchallenges.com/print-list/27700) article is "the ultimate list of the greatest wonders in the world" and lists 250 global attractions. I've visited 61 of the 250 listed attractions. I didn't count the ones I have been to but did not go inside.

But there are so many places on this list that I haven't visited! So I want to plot all these attractions on a map and use the list of places I haven't been to yet to get some ideas for my next holiday. Are there many attractions I haven't visited close together?

> This project is once again inspired by Lecture 2 (about data scraping) of Harvard's CS109 Data Science course. I'm using this project to practice the skills taught in these lectures.

Let's start by importing all the libraries needed:

In [1]:
# All imports (only the libraries used below)
import re
import time
from math import radians, cos, sin, asin, sqrt

import bs4
import folium
import pandas as pd
import requests
from pygeocoder import Geocoder

I am using `requests` and `BeautifulSoup` to scrape and parse the HTML from the website: http://www.listchallenges.com/print-list/27700.

In [2]:
url = 'http://www.listchallenges.com/print-list/27700'
req = requests.get(url)
page = req.text
tree = bs4.BeautifulSoup(page, 'html.parser')

# print tree.prettify()

Inspecting the page's elements shows that the list of global attractions is contained within one giant `<div>` tag with `id="MainContent_repeaterItems"`. The extract of this tag's contents below shows that the attraction names are mixed in with a large number of tab, carriage return and line feed characters, whitespace and list numbers, all of which need to be stripped to form a usable list.

In [3]:
div_tag = tree.find('div', id = 'MainContent_repeaterItems')

# attractions variable currently a long, single string of unicode
attractions = div_tag.text
attractions[:550]

u'\r\n\t\t\t\t# 1\r\n\t\t\t\tGreat Wall of China\r\n\t\t\t\r\n\t\t\t\t# 2\r\n\t\t\t\tStatue of Liberty\r\n\t\t\t\r\n\t\t\t\t# 3\r\n\t\t\t\tEiffel Tower\r\n\t\t\t\r\n\t\t\t\t# 4\r\n\t\t\t\tBig Ben\r\n\t\t\t\r\n\t\t\t\t# 5\r\n\t\t\t\tSydney Opera House\r\n\t\t\t\r\n\t\t\t\t# 6\r\n\t\t\t\tHollywood Sign\r\n\t\t\t\r\n\t\t\t\t# 7\r\n\t\t\t\tColosseum\r\n\t\t\t\r\n\t\t\t\t# 8\r\n\t\t\t\tWhite House\r\n\t\t\t\r\n\t\t\t\t# 9\r\n\t\t\t\tSagrada Familia\r\n\t\t\t\r\n\t\t\t\t# 10\r\n\t\t\t\tLittle Mermaid\r\n\t\t\t\r\n\t\t\t\t# 11\r\n\t\t\t\tTaj Mahal\r\n\t\t\t\r\n\t\t\t\t# 12\r\n\t\t\t\tBurj Al Arab Hotel\r\n\t\t\t\r\n\t\t\t\t# 13\r\n\t\t\t\tThe Pyramids Of Giza\r\n\t\t\t\r\n\t\t\t\t# 14\r\n\t\t\t\tGrand Canyon\r\n\t\t\t\r\n\t\t\t\t# 15\r\n\t\t\t\tArc de Triomphe\r\n\t\t\t\r\n\t\t\t\t# 16\r\n\t\t\t\tTimes Square\r\n\t\t\t\r'

Let's start by removing all the `\t` and `\r` characters and replacing each `\n` with a space. The name of each global attraction is preceded by a '#' character, which can be used to split the single unicode string into a list of attraction names. Since there is a '#' before the first attraction name, the first element of the new `attractions` list contains only whitespace, so let's remove it.

Printing the first 10 elements of `attractions` shows how far we've come towards a usable list:

In [4]:
# Use .replace to remove \t, \r, \n and split single string into list using '#' as break point
attractions = attractions.replace(u'\t', '').replace(u'\r', '').replace(u'\n', ' ').split('#')
# Remove first element of list, does not contain desired information
attractions = attractions[1:]
attractions[:10]

[u' 1 Great Wall of China  ',
 u' 2 Statue of Liberty  ',
 u' 3 Eiffel Tower  ',
 u' 4 Big Ben  ',
 u' 5 Sydney Opera House  ',
 u' 6 Hollywood Sign  ',
 u' 7 Colosseum  ',
 u' 8 White House  ',
 u' 9 Sagrada Familia  ',
 u' 10 Little Mermaid  ']

The manual numbering of the attractions is unnecessary, so we can use a list comprehension to go through the characters of each unicode string, keeping only the non-digit characters. Leading and trailing whitespace is then removed using `.strip()`.

In [5]:
# Remove digits and strip whitespace from list
attractions = [''.join([i for i in s if not i.isdigit()]).strip() for s in attractions]
attractions[:10]

[u'Great Wall of China',
 u'Statue of Liberty',
 u'Eiffel Tower',
 u'Big Ben',
 u'Sydney Opera House',
 u'Hollywood Sign',
 u'Colosseum',
 u'White House',
 u'Sagrada Familia',
 u'Little Mermaid']
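Filtering with `isdigit` removes every digit in a string, so if any attraction name contained digits (a hypothetical "Route 66", say) they would be lost too. A minimal alternative sketch, assuming the scraped text keeps the `# <number> <name>` layout shown above and that no name contains a '#', strips only the leading list number; `parse_attractions` is a hypothetical helper name.

```python
import re


def parse_attractions(raw):
    """Split the scraped '# 1 Name # 2 Name ...' text into a list of names."""
    names = []
    # Each '#' starts a new list entry; the chunk before the first '#' is just whitespace
    for chunk in raw.split('#'):
        # Collapse runs of tabs, carriage returns, newlines and spaces into single spaces
        chunk = ' '.join(chunk.split())
        # Remove only the leading list number, keeping any digits inside the name itself
        name = re.sub(r'^\d+\s*', '', chunk)
        if name:
            names.append(name)
    return names


sample = u'\r\n\t\t\t\t# 1\r\n\t\t\t\tGreat Wall of China\r\n\t\t\t\r\n\t\t\t\t# 2\r\n\t\t\t\tStatue of Liberty\r\n\t\t\t\r'
names = parse_attractions(sample)
# names == ['Great Wall of China', 'Statue of Liberty']
```

Because the regex is anchored to the start of each chunk, a hypothetical entry such as `# 3 Route 66` becomes `Route 66` rather than `Route`.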

If we print out the entire list at this point, we can see inconsistent labelling: some attraction names include the city or country in which the attraction is located. This is often understandable, particularly when the name of the attraction is not unique (e.g. Grand Bazaar) and the city is needed to distinguish it from another location.

In [6]:
attractions[10:20]

[u'Taj Mahal',
 u'Burj Al Arab Hotel',
 u'The Pyramids Of Giza',
 u'Grand Canyon',
 u'Arc de Triomphe',
 u'Times Square',
 u'Acropolis, Greece',
 u'Tokyo Tower',
 u'Brussels: Mannekin-Pis',
 u'Christ the Redeemer - Rio De Janerio, Brazil']

Since I plan to use geocoding to find the city and country of *every* attraction, I want to strip this information from the few attraction names that already include it. The way city and country names have been included is not very consistent, but they are most commonly separated from the attraction name by a comma (e.g. Acropolis, Greece). So let's use `.split(',')` to remove anything after the comma, and also strip out any brackets.

Further, some attractions are re-labelled below to correct spelling errors, add descriptive information to the attraction name, or remove any remaining city / country names that were not caught by the comma split.

In [7]:
# Remove country / city information after comma, if exists
attractions = [t.split(",")[0] for t in attractions]
# Remove brackets, if exist
attractions = [re.sub('[(){}<>]', '', t) for t in attractions]

# Re-label some attractions to correct spelling or add local name
attractions[9] = 'The Little Mermaid (statue)'.decode('utf-8')
attractions[18] = 'Manneken-Pis (statue)'.decode('utf-8') 
attractions[19] = 'Christ the Redeemer (statue)'.decode('utf-8')
attractions[26] = 'Musee du Louvre'.decode('utf-8') 
attractions[41] = "Mo'ai, Easter Island".decode('utf-8')
attractions[46] = 'Willis (Sears) Tower'.decode('utf-8')
attractions[49] = 'Chichen Itza'.decode('utf-8')
attractions[54] = 'Mont Saint-Michel'.decode('utf-8')
attractions[55] = "St. Peter's Basilica".decode('utf-8')
attractions[61] = 'Pont du Gard'.decode('utf-8')
attractions[65] = 'Potala Palace'.decode('utf-8')
attractions[67] = "St. Mark's Square".decode('utf-8')
attractions[91] = 'The Centre Pompidou'.decode('utf-8')
attractions[93] = 'Torre de Belem'.decode('utf-8')
attractions[128] = 'Sydney Harbour Bridge'.decode('utf-8')
attractions[137] = 'Freedom Tower, Ground Zero'.decode('utf-8')
attractions[135] = 'Kamakura Daibutsu (Great Buddha)'.decode('utf-8')
attractions[149] = 'Papal Palace'.decode('utf-8')
attractions[159] = 'Guggenheim Museum, Bilbao'.decode('utf-8')
attractions[167] = 'Sultan Ahmet (Blue) Mosque'.decode('utf-8')
attractions[178] = 'D-Day Beaches, Normandy American Cemetery'.decode('utf-8')
attractions[186] = 'Wawel Cathedral'.decode('utf-8')
attractions[189] = 'Guggenheim Museum, New York City'.decode('utf-8')
attractions[190] = 'Temppeliaukio Church (Rock Church)'.decode('utf-8')
attractions[207] = 'Meteor Crater, Winslow'.decode('utf-8')
attractions[217] = 'Aswan Dam'.decode('utf-8')
attractions[219] = 'Harmandir Sahib (Golden Temple)'.decode('utf-8')
attractions[238] = 'Salar de Uyuni'.decode('utf-8')

The next step is to geocode the attractions to obtain their latitude and longitude coordinates, so that I can plot them on a map. Here I used `pygeocoder` and the Google Maps Geocoding API. Although the geocoder is designed for street addresses, I found it worked relatively well with the attraction names alone, as attraction names often appear in the results as address components of type `natural_feature` or `point_of_interest`.

Some attraction names initially returned zero results, so a separate list was created specifically for geocoding, in which the query for an attraction could be changed to help find the correct result. There were three main reasons for attraction names returning zero (or incorrect) results:
- The attraction name does not exist in the geocoding database. In this case, the query was changed to the closest existing address to preserve the (approximate) geographic coordinates of the attraction.
- The English name of the attraction returns zero results, but the local name returns the correct result.
- Particularly where the attraction name is not unique (e.g. Grand Bazaar), additional information such as the city or country was required to find the correct result. As the Google Maps Geocoding API biases its results towards the United States unless a `region` is specified, this was more common for attractions outside the US.
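The `geo_list` edits in the cell below are keyed by list position, so they silently point at the wrong attraction if the order of the scraped list ever changes. A minimal alternative sketch, assuming attraction names are unique, keys the substitutions by name instead; `build_geo_queries` and `overrides` are hypothetical names, and only two of the notebook's substitutions are shown.

```python
def build_geo_queries(names, overrides):
    """Return the geocoding query for each attraction name.

    `overrides` maps an attraction name to the query that should be sent
    instead; names without an override are geocoded unchanged.
    """
    return [overrides.get(name, name) for name in names]


# Two of the substitutions used in this notebook, keyed by name instead of index
overrides = {
    'Big Ben': 'Palace of Westminster',
    'Grand Bazaar': 'Grand Bazaar, Istanbul',
}

queries = build_geo_queries(['Big Ben', 'Eiffel Tower', 'Grand Bazaar'], overrides)
# queries == ['Palace of Westminster', 'Eiffel Tower', 'Grand Bazaar, Istanbul']
```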

After geocoding, I reverse geocode the coordinates to extract the country and city (or geographic area) of each location. Not all locations have a city component, so where the city is `None`, the loop below falls back, in order, to the first-, second- and third-level administrative areas, the postal town and finally the route.

In [8]:
# Create a separate list with changes required for geocoding
geo_list = attractions[:]
geo_list[3] = 'Palace of Westminster'
geo_list[8] = 'La Sagrada Familia, Barcelona'
geo_list[9] = 'Langelinie, Copenhagen'
geo_list[16] = 'Acropolis, Athens' 
geo_list[18] = '1000 Bruxelles'
geo_list[19] = 'Cristo Redento, Rio di Janeiro'
geo_list[21] = "30 The Queen's Walk, London"
geo_list[24] = 'The Palace Museum, Beijing'
geo_list[27] = 'Maan' 
geo_list[30] = 'Oriental Pearl Tower, Shanghai'
geo_list[38] = 'Berlin'
geo_list[39] = 'Ancient City Walls, Dubrovnik' 
geo_list[46] = 'Willis Tower, Chicago'
geo_list[55] = 'Vatican'
geo_list[59] = 'Pantheon, Rome'
geo_list[60] = 'Trafalgar Square, London'
geo_list[61] = 'Vers-Pont-du-Gard'
geo_list[62] = 'Alhambra, Granada'
geo_list[67] = "St. Mark's Square, Venice"
geo_list[82] = 'Grand Bazaar, Istanbul' 
geo_list[88] = 'Cappadocia, Turkey'
geo_list[91] = '55 Rue Rambuteau, Paris'
geo_list[93] = 'Jardim da Torre de Belém, Lisboa'
geo_list[94] = 'Torre di Pisa'
geo_list[95] = 'Table Mountain, Cape Town'
geo_list[96] = 'The Twelve Apostles, Victoria'
geo_list[98] = 'Westminster Abbey, London'
geo_list[104] = 'Cathedrale Notre-Dame de Paris'
geo_list[106] = 'Basilica Cattedrale Patriarcale di San Marco'
geo_list[111] = "St. Paul's Cathedral, London"
geo_list[112] = 'Ponte di Rialto, Venice'
geo_list[114] = 'White Cliffs of Dover, England'
geo_list[115] = 'Washington Monument, DC'
geo_list[119] = 'Pentagon, Arlington'
geo_list[120] = 'Cloud Gate, Chicago'
geo_list[121] = 'Lintong'
geo_list[123] = 'The Shard, London'
geo_list[131] = 'Riva degli Schiavoni'
geo_list[133] = 'Lascaux'
geo_list[136] = 'Santissima Trinita al Monte Pincio'
geo_list[138] = 'Gateway Arch Trail'
geo_list[142] = 'Kinderdijk'
geo_list[146] = 'Am Lustgarten, Berlin'
geo_list[149] = 'Palais des Papes, Avignon'
geo_list[151] = 'Tsarskoye Selo'
geo_list[153] = 'Via Cappello, 23, 37121 Verona'
geo_list[158] = 'Kapellbrücke'
geo_list[163] = 'British Museum Reading Room'
geo_list[165] = 'Blue Lagoon, Iceland'
geo_list[166] = 'Oxford University, England'
geo_list[167] = 'Sultan Ahmet Cami'
geo_list[176] = 'Karluv most'
geo_list[178] = 'Omaha Beach, Normandy'
geo_list[186] = 'Zamek Wawel 5'
geo_list[190] = 'Temppeliaukio Church'
geo_list[201] = 'rijksmuseum'
geo_list[202] = 'Cambridge University, England'
geo_list[215] = 'Valley of the Kings, Luxor'
geo_list[219] = 'Harmandir Sahib'
geo_list[226] = 'Constitution Hill, London'
geo_list[228] = 'Duomo di Siena'
geo_list[231] = '1000 Constitution Ave. NW, Washington, DC'
geo_list[234] = 'Thingvellir'
geo_list[237] = 'Nazca'
geo_list[240] = 'Ngong Ping'
geo_list[244] = 'Efes'
geo_list[245] = 'Blue Hole, Belize'
geo_list[246] = 'Taktsang trail, Bhutan'
geo_list[247] = 'Gullfoss, Iceland'

# Geocoding and reverse geocoding assistance from:
# http://chrisalbon.com/python/geocoding_and_reverse_geocoding.html

# Geocoding to find latitude and longitude coordinates
coordinates = []

for location in geo_list:
    result = Geocoder.geocode(location)
    coordinates.append(result[0].coordinates)
    time.sleep(1) # slow number of requests to avoid OVER_QUERY_LIMIT error

coordinates = pd.DataFrame(data = coordinates, columns = ['Lat', 'Lon'])

# Reverse geocoding to find country and city / geographic area
cities = []
countries = []

for row in range(len(coordinates)):
    try:
        result = Geocoder.reverse_geocode(coordinates['Lat'][row], coordinates['Lon'][row])
        # Use the city if there is one; otherwise fall back, in order, to the
        # administrative areas (levels 1 to 3), the postal town and finally the route
        city = (result.city or
                result.administrative_area_level_1 or
                result.administrative_area_level_2 or
                result.administrative_area_level_3 or
                result.postal_town or
                result.route)
        country = result.country
    except Exception:
        # Leave this location blank; missing values are filled in manually later
        city, country = None, None
    # Append both together so cities and countries stay aligned with the rows of coordinates
    cities.append(city)
    countries.append(country)
    time.sleep(1) # slow number of requests to avoid OVER_QUERY_LIMIT error
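Because the fallback logic above is easy to get wrong and slow to test against the live API, it can help to pull it out into a small function and check it offline. This is a sketch under two assumptions: the address components of a reverse-geocoding result have been collected into a plain `{type: name}` dictionary, and pygeocoder's `city` corresponds to Google's `locality` component type. `pick_area` and `FALLBACK_ORDER` are hypothetical names.

```python
# Order in which address component types are tried, mirroring the loop above.
# Assumption: pygeocoder's `result.city` corresponds to Google's 'locality' type.
FALLBACK_ORDER = ['locality',
                  'administrative_area_level_1',
                  'administrative_area_level_2',
                  'administrative_area_level_3',
                  'postal_town',
                  'route']


def pick_area(components):
    """Return the first available area name from a {type: name} dict, else None."""
    for component_type in FALLBACK_ORDER:
        name = components.get(component_type)
        if name is not None:
            return name
    return None


area = pick_area({'locality': 'Paris', 'administrative_area_level_1': 'Ile-de-France'})
# area == 'Paris'
```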

Putting the attraction names, cities and countries together, we get a table of attractions and their locations.

In [9]:
attractions_table = pd.DataFrame(data = zip(attractions, cities, countries), 
                      columns = ['Global Attractions', 'City / Area', 'Country'])
attractions_table[:10]

| | Global Attractions | City / Area | Country |
|---|---|---|---|
| 0 | Great Wall of China | Zunyi Shi | China |
| 1 | Statue of Liberty | New York | United States |
| 2 | Eiffel Tower | Paris | France |
| 3 | Big Ben | London | United Kingdom |
| 4 | Sydney Opera House | Sydney | Australia |
| 5 | Hollywood Sign | Los Angeles | United States |
| 6 | Colosseum | Roma | Italy |
| 7 | White House | Washington | United States |
| 8 | Sagrada Familia | Barcelona | Spain |
| 9 | The Little Mermaid (statue) | København | Denmark |


A few rows still have a `None` city or country, which can be filled in manually.

In [10]:
# Rows with a missing city or country (only displayed if this line is run on its own)
attractions_table[attractions_table.isnull().any(axis = 1)]

attractions_table.iloc[126, 1] = 'Jerusalem'
attractions_table.iloc[126, 2] = 'Israel'
attractions_table.iloc[147, 1] = 'Brighton'
attractions_table.iloc[205, 1] = 'Colorado River'
attractions_table.iloc[245, 1] = 'Belize Barrier Reef Reserve System'
attractions_table.iloc[216, 1] = 'Jerusalem'
attractions_table.iloc[216, 2] = 'Israel'
attractions_table.iloc[239, 1] = 'Dead Sea'
# Dead Sea shores in Israel, Jordan and West Bank, taken first country alphabetically
attractions_table.iloc[239, 2] = 'Israel'

So, how many of these places have I visited? Let's make a list.

In [11]:
visited = ['Statue of Liberty', 'Eiffel Tower', 'Big Ben', 'Sydney Opera House', 
           'Hollywood Sign', 'White House', 'Sagrada Familia', 'The Little Mermaid (statue)',
           'Taj Mahal', 'Burj Al Arab Hotel', 'The Pyramids Of Giza', 'Arc de Triomphe', 
           'Times Square', 'London Eye', 'Empire State Building', 'Burj Khalifa', 
           'Musee du Louvre', 'Petra', 'Hagia Sophia', 'Stonehenge', 
           'Capitol Hill', 'Tower of London', 'Willis (Sears) Tower', 'Tower Bridge',
           'The Great Sphinx', 'Brooklyn Bridge', 'Trafalgar Square', 'Buckingham Palace',
           'Windsor Castle', 'Luxor', 'Grand Bazaar', 'Cappadocia',
           'Central Park', 'Westminster Abbey', 'Lincoln Memorial', 'Notre Dame Cathedral', 
           'Las Vegas', "St. Paul’s Cathedral", 'Washington Monument', 'Cloud Gate',
           'Nyhavn', 'Wailing Wall', 'Sydney Harbour Bridge', 'Kronborg Castle',
           'Tivoli Gardens', 'Hollywood Walk of Fame', 'British Museum', 'Amalienborg Palace',
           'Oxford University', 'Sultan Ahmet (Blue) Mosque', 'Piccadilly Circus', "Lover's Bridge",
           'Oresund Bridge', 'Guggenheim Museum, New York City', 'Valley of the Kings', 'Jerusalem Old City', 
           'Aswan Dam', 'Smithsonian National Museum of Natural History', 'Dead Sea', 'Ephesus',
           'Halong Bay']

# Convert to unicode so we can compare with contents of attractions_table
visited = [i.decode('utf-8') for i in visited]
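The `visited` list mixes apostrophe styles (`"St. Paul’s Cathedral"` uses a curly apostrophe, `"Lover's Bridge"` a straight one), and the membership test in the next cell is an exact string match, so a name that differs only in apostrophe style or capitalisation would silently be counted as not visited. A hedged sketch of a more forgiving comparison, assuming names that differ only in these ways refer to the same attraction; `normalise` and `visited_flags` are hypothetical helpers.

```python
# Map curly (typographic) apostrophes to a plain ASCII apostrophe
QUOTE_MAP = {ord(u'\u2018'): u"'", ord(u'\u2019'): u"'"}


def normalise(name):
    """Lower-case a name, straighten apostrophes and collapse whitespace."""
    return u' '.join(name.translate(QUOTE_MAP).split()).lower()


def visited_flags(attraction_names, visited_names):
    """Return 1 for each attraction whose normalised name is in visited_names, else 0."""
    visited_set = set(normalise(n) for n in visited_names)
    return [1 if normalise(n) in visited_set else 0 for n in attraction_names]


flags = visited_flags([u"St. Paul's Cathedral", u'Big Ben', u'Colosseum'],
                      [u'St. Paul\u2019s Cathedral', u'big ben'])
# flags == [1, 1, 0]
```

Comparing `sum(flags)` with `len(visited)` is a cheap check: if the sum is smaller, some names in `visited` did not match any attraction.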

Now I want to use this `visited` list to split the table of attractions into a table of attractions I have visited and those I have yet to visit, and plot them all on a map. The places I have visited are marked using a blue marker and those I haven't with a red marker.

In [12]:
# Flag each attraction with 1 if I have visited it and 0 otherwise
visited_df = pd.DataFrame({'Visited?': attractions_table['Global Attractions'].isin(visited).astype(int)})
map_data = pd.concat([attractions_table, visited_df, coordinates], axis = 1)

# Divide map_data dataframe into 'visited' and 'not visited', renumbering each from 0
# (reset_index does not assume every name in `visited` matched an attraction)
map_data_visited = map_data[map_data['Visited?'] == 1].reset_index(drop = True)
map_data_not_visited = map_data[map_data['Visited?'] == 0].reset_index(drop = True)

# Map data: blue markers for visited attractions, red for those not yet visited
map_osm = folium.Map(location = [map_data['Lat'][0], map_data['Lon'][0]],
                     zoom_start = 2)
for _, place in map_data_visited.iterrows():
    folium.Marker([place['Lat'], place['Lon']],
                  popup = place['Global Attractions'],
                  icon = folium.Icon(color = 'blue', icon = 'ok-sign')).add_to(map_osm)
for _, place in map_data_not_visited.iterrows():
    folium.Marker([place['Lat'], place['Lon']],
                  popup = place['Global Attractions'],
                  icon = folium.Icon(color = 'red', icon = 'remove-sign')).add_to(map_osm)

map_osm

That's too many red markers! I should go visit some! Can I go see a number of them in one trip?

Let's look at how close these attractions are to each other. Within a 200 km radius, which part of the world has the most global attractions that I haven't been to yet? We can use the haversine formula to calculate the great-circle distance between each pair of locations.
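For reference, the `haversine` function in the next cell implements the standard haversine formula. With latitudes $\varphi_1, \varphi_2$ and longitudes $\lambda_1, \lambda_2$ in radians, and the Earth's mean radius taken as $R = 6371$ km (the value used in the code), the distance $d$ is:

$$
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right)
$$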

In [13]:
# Haversine function from Stack Overflow:
# http://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kms.
    kms = 6371 * c  
    return kms
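A quick sanity check of `haversine` against two distances that can be worked out by hand: one degree of latitude along a meridian is $\pi R / 180 \approx 111.19$ km, and two points on opposite sides of the equator are half the Earth's circumference, $\pi R$, apart. The function is repeated so the snippet runs on its own. Note the argument order: longitude comes before latitude, which is easy to swap by accident.

```python
from math import radians, cos, sin, asin, sqrt, pi


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))


# One degree of latitude along the prime meridian: 6371 * pi / 180, roughly 111.19 km
one_degree = haversine(0, 0, 0, 1)
# Antipodal points on the equator: half the circumference, 6371 * pi, roughly 20015 km
half_way = haversine(0, 0, 180, 0)
```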

There are 189 attractions in `map_data_not_visited`. For each attraction, we calculate the great-circle distance to every attraction in `map_data_not_visited` and record the names of those less than 200 km away. The result is a list of 189 sub-lists, each containing the attraction from which the distances were calculated (its distance to itself is zero) together with every other unvisited attraction within 200 km of it.

In [14]:
points = map_data_not_visited[['Lat', 'Lon']]

threshold = 200 # 200km radius
pts_len = len(points)
grouped_attractions = []
    
for num in range(pts_len):
    lat1 = points.iloc[num, 0]
    lon1 = points.iloc[num, 1]
    sub_list = []
    for point in range(pts_len):
        lat2 = points.iloc[point, 0]
        lon2 = points.iloc[point, 1]
        # Keep every unvisited attraction within the threshold (including this one)
        if haversine(lon1, lat1, lon2, lat2) < threshold:
            sub_list.append(map_data_not_visited.iloc[point, 0])
    grouped_attractions.append(sub_list)

grouped_attractions[:5]

[[u'Great Wall of China'],
 [u'Colosseum',
  u"St. Peter's Basilica",
  u'Pantheon',
  u'Trevi Fountain',
  u'Sistine Chapel',
  u'Spanish Steps',
  u'Piazza Del Campo',
  u'Pienza',
  u'Siena Cathedral',
  u'Basilica in Assisi'],
 [u'Grand Canyon',
  u'Bryce Canyon National Park',
  u'Meteor Crater, Winslow',
  u'Zion National Park',
  u'Antelope Canyon'],
 [u'Acropolis', u'Delphi'],
 [u'Tokyo Tower',
  u'Fuji',
  u'Kamakura Daibutsu (Great Buddha)',
  u'Matsumoto Castle']]
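The nested loop above computes every distance twice (A to B and B to A) and also compares each attraction with itself. Since the haversine distance is symmetric, a sketch that computes each pair once and records the result for both attractions gives the same groups with roughly half the distance calculations. `group_within` is a hypothetical helper, and the coordinates in the example are approximate city-centre values chosen for illustration only.

```python
from math import radians, cos, sin, asin, sqrt


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))


def group_within(names, coords, threshold_km):
    """For each name, list every name within threshold_km of it (itself included).

    `coords` is a list of (lat, lon) pairs in the same order as `names`.
    """
    n = len(names)
    groups = [[i] for i in range(n)]  # each attraction is 0 km from itself
    for i in range(n):
        lat1, lon1 = coords[i]
        for j in range(i + 1, n):  # visit each unordered pair exactly once
            lat2, lon2 = coords[j]
            if haversine(lon1, lat1, lon2, lat2) < threshold_km:
                groups[i].append(j)
                groups[j].append(i)
    # Sort indices so each group keeps the original list order, as in the loop above
    return [[names[k] for k in sorted(g)] for g in groups]


example = group_within(['Rome', 'Florence', 'Tokyo'],
                       [(41.9, 12.5), (43.77, 11.25), (35.68, 139.69)], 300)
# example == [['Rome', 'Florence'], ['Rome', 'Florence'], ['Tokyo']]
```

With these approximate coordinates Rome and Florence are roughly 230 km apart, so they fall into one group at a 300 km threshold but not at 200 km.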

If we calculate the length of each sub-list, we can see which cluster of locations is the biggest. 

In [15]:
# Calculate length of each sub-list in grouped_attractions
group_size = [len(row) for row in grouped_attractions]
# Find index of sublists with the largest number of attractions
indx = [i for i, j in enumerate(group_size) if j == max(group_size)]

# Print longest sublist of attractions
grouped_attractions[indx[0]]

[u'Milan Cathedral',
 u"St. Mark's Square",
 u"St. Mark's Basilica & Campanile",
 u'Florence Cathedral',
 u'Rialto Bridge',
 u'Arena di Verona',
 u'Ponte Vecchio',
 u'Bridge of Sighs',
 u"Juliet's Balcony",
 u'Portofino',
 u'Lago di Garda',
 u'Trentino Dolomites']
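`indx` collects every index whose sub-list ties for the largest size, but `max(group_size)` is recomputed for each element of the comprehension. A minimal sketch of the same selection that computes the maximum once; `largest_groups` is a hypothetical helper name.

```python
def largest_groups(groups):
    """Return the indices of every group whose size equals the maximum size."""
    sizes = [len(g) for g in groups]
    biggest = max(sizes)  # compute the maximum once rather than per element
    return [i for i, size in enumerate(sizes) if size == biggest]


ties = largest_groups([['a'], ['a', 'b'], ['b', 'a'], ['c']])
# ties == [1, 2]
```

Because each sub-list is centred on a different attraction, several clusters can tie for the largest size; `indx[0]` simply takes the first of them.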

A printout of the largest cluster of attractions tells me I should go to Northern Italy! 

In [16]:
# Extract the clustered attractions and their coordinates from map_data_not_visited
# (rows keep the order of map_data_not_visited, which is also the order of the cluster list)
cluster = grouped_attractions[indx[0]]
in_cluster = map_data_not_visited['Global Attractions'].isin(cluster)

# Dataframe with clustered attractions and corresponding coordinates
to_visit = map_data_not_visited.loc[in_cluster, ['Global Attractions', 'Lat', 'Lon']].reset_index(drop = True)

# Map data
map_osm = folium.Map(location = [to_visit.iloc[5, 1], to_visit.iloc[5, 2]], 
                    zoom_start = 7)
for row in range(len(to_visit)):
    folium.Marker([to_visit.iloc[row, 1], to_visit.iloc[row, 2]], 
                  popup = to_visit.iloc[row, 0], 
                  icon = folium.Icon(color = 'red', icon = 'remove-sign')).add_to(map_osm)
map_osm

Only 8 locations appear to be marked on the map above, but zooming in to Venezia and Verona will reveal the remaining markers. 

Now, where is my passport?