## Battle of the Neighborhoods submission - Full report

### 1. Introduction

<u>The business problem (who would be interested in this project)?</u>

A tourist agency covering entire Europe wants to classify their city trips to European capitals on their website and in their travel booklets based on the characteristics of the neighborhood of the city center. 

Accurate classification of European cities will help clients focus on relevant sections for their individual interests. For instance, some clients like to go to theaters, some love food, and others want to explore ruins. What are the really unique capitals in Europe based on neighborhood characteristics in the capital centers?

While the immediate target group is the Management of the travel company, the ultimate target group are the users of the website and travel booklets.

### 2. Data

<u>2. What data will be used to solve the problem (including data source).</u>

A list of European capital centers will be created using this list of global capital centers: http://techslides.com/list-of-countries-and-capitals . The list is very comprehensive which is intentionally selected for this analysis to increase reliability and ensure full coverage for the travel company.

After creating a dataframe of capitals in this list, the list needs to be filtered for non-European countries to create our data list with relevant latitudes and longitudes.

Regions will be defined in Foursquare using this location data with a larger scope than the previous exercise (top 100 venues with 1500 meter radius). This should be sufficient to capture the nature of the capital as the offering of this tourist agency consists of hotels in the city centers.

Subsequently a similar analysis will be conducted in a similar manner as in the previous week using Foursquare location data.


### 3. Methodology

<u>Description of any exploratory data analysis, inferential statistical testing, what machine learnings were used and why.</u>

<i>A) Create dataframe of European capitals from table on website</i>

In [1]:
# import modules and libraries

!conda install -c conda-forge folium=0.5.0 --yes
import folium
import requests
from bs4 import BeautifulSoup
import pandas as pd
import random
import numpy as np 
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

%matplotlib inline 

from sklearn.cluster import KMeans 
from sklearn.datasets.samples_generator import make_blobs

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [2]:
# table of broader European capitals (chosen over the more limited list because more comprehensive)
website_url = requests.get('http://techslides.com/list-of-countries-and-capitals').text
soup = BeautifulSoup(website_url,'lxml')

# extract table
my_table = soup.find('table')

In [3]:
# create list with all capitals in the world
table = []

table_rows = my_table.find_all('tr')

# extract rows
for tr in table_rows:
    td = tr.find_all('td')
    row = [tr.text for tr in td]
    table.append(row)

In [4]:
# create dataframe
all_capitals = pd.DataFrame(table[1:], columns=["Country", "Capital", "Latitude", "Longitude", "Code", "Continent" ])

In [5]:
# show structure dataframe
all_capitals.head()

Unnamed: 0,Country,Capital,Latitude,Longitude,Code,Continent
0,Afghanistan,Kabul,34.51666667,69.183333,AF,Asia
1,Aland Islands,Mariehamn,60.116667,19.9,AX,Europe
2,Albania,Tirana,41.31666667,19.816667,AL,Europe
3,Algeria,Algiers,36.75,3.05,DZ,Africa
4,American Samoa,Pago Pago,-14.26666667,-170.7,AS,Australia


In [6]:
# remove non-European continents
european_capitals = all_capitals.loc[all_capitals['Continent'] == "Europe"]
european_capitals.shape

(58, 6)

In [7]:
# remove code and continent columns
european_capitals.drop(columns=['Continent', 'Code']).head()
european_capitals = european_capitals.reset_index(drop=True)

european_capitals

Unnamed: 0,Country,Capital,Latitude,Longitude,Code,Continent
0,Aland Islands,Mariehamn,60.116667,19.9,AX,Europe
1,Albania,Tirana,41.31666667,19.816667,AL,Europe
2,Andorra,Andorra la Vella,42.5,1.516667,AD,Europe
3,Armenia,Yerevan,40.16666667,44.5,AM,Europe
4,Austria,Vienna,48.2,16.366667,AT,Europe
5,Azerbaijan,Baku,40.38333333,49.866667,AZ,Europe
6,Belarus,Minsk,53.9,27.566667,BY,Europe
7,Belgium,Brussels,50.83333333,4.333333,BE,Europe
8,Bosnia and Herzegovina,Sarajevo,43.86666667,18.416667,BA,Europe
9,Bulgaria,Sofia,42.68333333,23.316667,BG,Europe


In [8]:
# convert location in floats
# convert column "a" to int64 dtype and "b" to complex type
european_capitals = european_capitals.astype({"Latitude": float, "Longitude": float})

<i> B) Show capital centers on map</i>

In [9]:
# show capital centers
latitude = 64.5260
longitude = 15.2551

map_europe = folium.Map(location=[latitude, longitude], zoom_start=3)

# add markers to map
for lat, lng, capital in zip(european_capitals['Latitude'], european_capitals['Longitude'], european_capitals['Capital']):
    label = '{}, {}'.format(european_capitals, capital)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_europe)  
    
map_europe

<i> C) Generate venues for each capital</i>

In [10]:
# foursquare credentials
CLIENT_ID = 'DN4JF3XIOXNM45NJJAEQIDMUWG5KZBX2MPYRBEC3WR1IPJG2'

CLIENT_SECRET = 'F3EKQJWMA4P4ZJWVDE51GUSQEGTD1GEIDARNTLURMZDLD2E5'

VERSION = '20180605' 

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: DN4JF3XIOXNM45NJJAEQIDMUWG5KZBX2MPYRBEC3WR1IPJG2
CLIENT_SECRET: F3EKQJWMA4P4ZJWVDE51GUSQEGTD1GEIDARNTLURMZDLD2E5


In [11]:
# top 100 venues with 1500 meter radius for all neighborhoods
LIMIT = 100
RADIUS = 1500

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)          
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Capital', 'Capital Latitude', 'Capital Longitude', 'Venue', 
                                            'Venue Latitude', 'Venue Longitude', 'Venue Category']
    return(nearby_venues)

europe_venues = getNearbyVenues(names=european_capitals['Capital'],
                                   latitudes=european_capitals['Latitude'],
                                   longitudes=european_capitals['Longitude'])

print(europe_venues.shape)

Mariehamn
Tirana
Andorra la Vella
Yerevan
Vienna
Baku
Minsk
Brussels
Sarajevo
Sofia
Zagreb
Nicosia
Prague
Copenhagen
Tallinn
Torshavn
Helsinki
Paris
Tbilisi
Berlin
Gibraltar
Athens
Saint Peter Port
Budapest
Reykjavik
Dublin
Douglas
Rome
Saint Helier
Pristina
Riga
Vaduz
Vilnius
Luxembourg
Skopje
Valletta
Chisinau
Monaco
Podgorica
Amsterdam
North Nicosia
Oslo
Warsaw
Lisbon
Bucharest
Moscow
San Marino
Belgrade
Bratislava
Ljubljana
Madrid
Longyearbyen
Stockholm
Bern
Ankara
Kyiv
London
Vatican City
(2382, 7)


In [12]:
# show dataframe with results
europe_venues.head()

Unnamed: 0,Capital,Capital Latitude,Capital Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Tirana,41.316667,19.816667,Xheko Imperial Hotel,41.31827,19.818883,Hotel
1,Tirana,41.316667,19.816667,The Carlsberg,41.318043,19.818864,Diner
2,Tirana,41.316667,19.816667,Sophie Caffe & Snacks,41.317183,19.818986,Café
3,Tirana,41.316667,19.816667,Pagus Steakhouse,41.318313,19.818585,Italian Restaurant
4,Tirana,41.316667,19.816667,"Creperie ""Pietro Nini""",41.317311,19.81489,Creperie


In [42]:
# show how many venues per capital
europe_venues.groupby('Capital').count()

Unnamed: 0_level_0,Capital Latitude,Capital Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Capital,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Amsterdam,32,32,32,32,32,32
Andorra la Vella,10,10,10,10,10,10
Ankara,100,100,100,100,100,100
Athens,100,100,100,100,100,100
Baku,31,31,31,31,31,31
Belgrade,4,4,4,4,4,4
Berlin,56,56,56,56,56,56
Bern,3,3,3,3,3,3
Bratislava,87,87,87,87,87,87
Brussels,44,44,44,44,44,44


In [14]:
# unique categories for all these neighborhoods
print('There are {} uniques categories.'.format(len(europe_venues['Venue Category'].unique())))

There are 308 uniques categories.


<i> D) Aggregate venues for each capital</i>

In [15]:
# one hot encoding
europe_onehot = pd.get_dummies(europe_venues[['Venue Category']], prefix="", prefix_sep="")

# add capital column back to dataframe
europe_onehot['Capital'] = europe_venues['Capital'] 

# move neighborhood column to first column
fixed_columns = [europe_onehot.columns[-1]] + list(europe_onehot.columns[:-1])

europe_onehot = europe_onehot[fixed_columns]

europe_onehot.head()

Unnamed: 0,Capital,Accessories Store,African Restaurant,Airport Service,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio,Zoo
0,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Tirana,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [16]:
europe_onehot.shape

(2382, 309)

In [17]:
# group rows by capital and by taking mean of frequency of occurrence of each category.
europe_grouped = europe_onehot.groupby('Capital').mean().reset_index()

europe_grouped.head()

Unnamed: 0,Capital,Accessories Store,African Restaurant,Airport Service,American Restaurant,Antique Shop,Aquarium,Arcade,Art Gallery,Art Museum,...,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio,Zoo
0,Amsterdam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Andorra la Vella,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ankara,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.04,0.02,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Athens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.03,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0
4,Baku,0.0,0.0,0.032258,0.0,0.0,0.0,0.032258,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [18]:
europe_grouped.shape

(57, 309)

In [19]:
# print capitals with top 5 venues
num_top_venues = 5

for hood in europe_grouped['Capital']:
    print("----"+hood+"----")
    temp = europe_grouped[europe_grouped['Capital'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Amsterdam----
                venue  freq
0         Coffee Shop  0.09
1               Plaza  0.06
2  Italian Restaurant  0.06
3                Café  0.06
4          Restaurant  0.06


----Andorra la Vella----
                  venue  freq
0            Restaurant   0.3
1                 Hotel   0.2
2          Tennis Court   0.1
3  Gym / Fitness Center   0.1
4          Cocktail Bar   0.1


----Ankara----
                venue  freq
0                Café  0.22
1       Historic Site  0.06
2        Antique Shop  0.04
3  Turkish Restaurant  0.04
4         Art Gallery  0.04


----Athens----
         venue  freq
0          Bar  0.16
1         Café  0.12
2  Coffee Shop  0.06
3      Theater  0.05
4    Bookstore  0.04


----Baku----
                     venue  freq
0               Restaurant  0.26
1  Health & Beauty Service  0.06
2                 Tea Room  0.06
3                     Park  0.06
4                     Café  0.06


----Belgrade----
                  venue  freq
0    Seafood Rest

In [20]:
# create dataframe and display top 10 venues for each capital

# sort capitals in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Capital']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

capitals_venues_sorted = pd.DataFrame(columns=columns)
capitals_venues_sorted['Capital'] = europe_grouped['Capital']

for ind in np.arange(europe_grouped.shape[0]):
    capitals_venues_sorted.iloc[ind, 1:] = return_most_common_venues(europe_grouped.iloc[ind, :], num_top_venues)

capitals_venues_sorted

Unnamed: 0,Capital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Amsterdam,Coffee Shop,Plaza,Restaurant,Italian Restaurant,Café,Hotel,Roof Deck,Bookstore,Music Store,Office
1,Andorra la Vella,Restaurant,Hotel,Plaza,Tennis Court,Cocktail Bar,Gym / Fitness Center,Football Stadium,Farmers Market,Empanada Restaurant,English Restaurant
2,Ankara,Café,Historic Site,Art Gallery,Antique Shop,Turkish Restaurant,Doner Restaurant,Museum,Kebab Restaurant,Breakfast Spot,Art Museum
3,Athens,Bar,Café,Coffee Shop,Theater,Wine Bar,Bookstore,Souvlaki Shop,Greek Restaurant,Vegetarian / Vegan Restaurant,Cretan Restaurant
4,Baku,Restaurant,Tea Room,Park,Café,Health & Beauty Service,Lounge,Eastern European Restaurant,Grocery Store,Gym / Fitness Center,Middle Eastern Restaurant
5,Belgrade,Seafood Restaurant,Restaurant,Zoo,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market
6,Berlin,Hotel,German Restaurant,History Museum,Café,Plaza,Art Museum,Museum,Outdoor Sculpture,Art Gallery,Restaurant
7,Bern,Japanese Restaurant,Train Station,Electronics Store,Zoo,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant
8,Bratislava,Café,Coffee Shop,Vegetarian / Vegan Restaurant,Bar,Brewery,Bakery,Plaza,Hostel,Wine Shop,Gym / Fitness Center
9,Brussels,Greek Restaurant,Sandwich Place,Hotel,Coffee Shop,Bar,Deli / Bodega,Vegetarian / Vegan Restaurant,French Restaurant,Furniture / Home Store,Salad Place


<i> E) Assign cluster to each capital</i>

In [21]:
# cluster capitals
kclusters = 15

europe_grouped_clustering = europe_grouped.drop('Capital', 1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(europe_grouped_clustering)

kmeans.labels_

array([13,  2, 13, 13,  2,  5,  0,  8, 13, 13, 13, 13,  9, 13, 13,  2,  0,
       13, 13, 13, 13, 13,  0, 10,  2, 13, 10, 13, 13, 13, 13, 13, 13, 13,
        2, 13,  2, 10,  0,  0,  1, 12, 13,  2,  0,  2,  7, 13,  6,  3,  4,
        0, 13, 14, 13, 11,  2], dtype=int32)

In [22]:
# new dataframe including cluster and top 10 venues for each capital.

capitals_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

capitals_venues_sorted.dropna(subset=["1st Most Common Venue"], axis = 0, inplace=True)

capitals_venues_sorted

Unnamed: 0,Cluster Labels,Capital,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,13,Amsterdam,Coffee Shop,Plaza,Restaurant,Italian Restaurant,Café,Hotel,Roof Deck,Bookstore,Music Store,Office
1,2,Andorra la Vella,Restaurant,Hotel,Plaza,Tennis Court,Cocktail Bar,Gym / Fitness Center,Football Stadium,Farmers Market,Empanada Restaurant,English Restaurant
2,13,Ankara,Café,Historic Site,Art Gallery,Antique Shop,Turkish Restaurant,Doner Restaurant,Museum,Kebab Restaurant,Breakfast Spot,Art Museum
3,13,Athens,Bar,Café,Coffee Shop,Theater,Wine Bar,Bookstore,Souvlaki Shop,Greek Restaurant,Vegetarian / Vegan Restaurant,Cretan Restaurant
4,2,Baku,Restaurant,Tea Room,Park,Café,Health & Beauty Service,Lounge,Eastern European Restaurant,Grocery Store,Gym / Fitness Center,Middle Eastern Restaurant
5,5,Belgrade,Seafood Restaurant,Restaurant,Zoo,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market
6,0,Berlin,Hotel,German Restaurant,History Museum,Café,Plaza,Art Museum,Museum,Outdoor Sculpture,Art Gallery,Restaurant
7,8,Bern,Japanese Restaurant,Train Station,Electronics Store,Zoo,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant
8,13,Bratislava,Café,Coffee Shop,Vegetarian / Vegan Restaurant,Bar,Brewery,Bakery,Plaza,Hostel,Wine Shop,Gym / Fitness Center
9,13,Brussels,Greek Restaurant,Sandwich Place,Hotel,Coffee Shop,Bar,Deli / Bodega,Vegetarian / Vegan Restaurant,French Restaurant,Furniture / Home Store,Salad Place


In [23]:
europe_merged = european_capitals

# borough_dataframe = pd.merge(borough_dataframe, coordinates, left_on='PostalCode', right_on='Postal Code')

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
europe_merged = europe_merged.join(capitals_venues_sorted.set_index('Capital'), on='Capital')

# remove NaN rows
europe_merged.dropna(subset=["1st Most Common Venue"], inplace=True)
europe_merged["Cluster Labels"].astype('int')
europe_merged

# remove irrelevant columns
europe_merged.drop(['Country', 'Code', 'Continent'], axis=1)

europe_merged

Unnamed: 0,Country,Capital,Latitude,Longitude,Code,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Albania,Tirana,41.316667,19.816667,AL,Europe,13.0,Italian Restaurant,Café,Coffee Shop,Bar,Hotel,Cocktail Bar,Ice Cream Shop,Diner,Pizza Place,Fast Food Restaurant
2,Andorra,Andorra la Vella,42.5,1.516667,AD,Europe,2.0,Restaurant,Hotel,Plaza,Tennis Court,Cocktail Bar,Gym / Fitness Center,Football Stadium,Farmers Market,Empanada Restaurant,English Restaurant
3,Armenia,Yerevan,40.166667,44.5,AM,Europe,11.0,Fast Food Restaurant,Food Court,Public Art,Snack Place,Zoo,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant
4,Austria,Vienna,48.2,16.366667,AT,Europe,13.0,Café,Hotel,Sushi Restaurant,Japanese Restaurant,Bar,Coffee Shop,Indian Restaurant,Park,Asian Restaurant,Seafood Restaurant
5,Azerbaijan,Baku,40.383333,49.866667,AZ,Europe,2.0,Restaurant,Tea Room,Park,Café,Health & Beauty Service,Lounge,Eastern European Restaurant,Grocery Store,Gym / Fitness Center,Middle Eastern Restaurant
6,Belarus,Minsk,53.9,27.566667,BY,Europe,13.0,Café,Theater,Bookstore,Museum,Art Gallery,Doner Restaurant,Park,Art Museum,Plaza,Circus
7,Belgium,Brussels,50.833333,4.333333,BE,Europe,13.0,Greek Restaurant,Sandwich Place,Hotel,Coffee Shop,Bar,Deli / Bodega,Vegetarian / Vegan Restaurant,French Restaurant,Furniture / Home Store,Salad Place
8,Bosnia and Herzegovina,Sarajevo,43.866667,18.416667,BA,Europe,12.0,Lounge,Farmers Market,Modern European Restaurant,Café,Restaurant,Theater,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space
9,Bulgaria,Sofia,42.683333,23.316667,BG,Europe,2.0,Restaurant,Nightclub,Cocktail Bar,Bakery,Bar,Café,Dessert Shop,Hotel,Liquor Store,Vegetarian / Vegan Restaurant
10,Croatia,Zagreb,45.8,16.0,HR,Europe,2.0,Café,Hotel,Restaurant,Nightclub,Sushi Restaurant,Bar,Light Rail Station,BBQ Joint,Chinese Restaurant,French Restaurant


<i> G) Visualize cluster to each capital</i>

In [24]:
# visualize clusters

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=3)

# set color scheme
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to map
markers_colors = []
for lat, lon, poi, cluster in zip(europe_merged['Latitude'], europe_merged['Longitude'], europe_merged['Capital'], europe_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<i> H) Show list of each cluster of capitals</i>

In [25]:
# cluster 1
europe_merged.loc[europe_merged['Cluster Labels'] == 0, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Berlin,Europe,0.0,Hotel,German Restaurant,History Museum,Café,Plaza,Art Museum,Museum,Outdoor Sculpture,Art Gallery,Restaurant
20,Gibraltar,Europe,0.0,Scenic Lookout,Hotel,Botanical Garden,Rock Club,Nature Preserve,Restaurant,Café,Mediterranean Restaurant,Trail,Boat or Ferry
22,Saint Peter Port,Europe,0.0,Hotel,Bar,Restaurant,Seafood Restaurant,Bus Station,Eastern European Restaurant,Museum,Cocktail Bar,Fish Market,Fish & Chips Shop
28,Saint Helier,Europe,0.0,Hotel,Coffee Shop,Department Store,Pub,Restaurant,Molecular Gastronomy Restaurant,Bar,Pizza Place,Café,Sandwich Place
51,Longyearbyen,Europe,0.0,Hotel,Bar,Department Store,Liquor Store,Bakery,Outdoor Sculpture,Café,Scandinavian Restaurant,Pub,Grocery Store
52,Stockholm,Europe,0.0,Hotel,Hostel,Theater,Middle Eastern Restaurant,Café,Scandinavian Restaurant,Modern European Restaurant,Seafood Restaurant,Cheese Shop,Plaza
57,Vatican City,Europe,0.0,Hotel,Café,Italian Restaurant,Historic Site,Ice Cream Shop,Restaurant,Church,Supermarket,Scenic Lookout,Basketball Court


In [26]:
# cluster 2
europe_merged.loc[europe_merged['Cluster Labels'] == 1, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
46,San Marino,Europe,1.0,Health Food Store,Zoo,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant


In [27]:
# cluster 3
europe_merged.loc[europe_merged['Cluster Labels'] == 2, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Andorra la Vella,Europe,2.0,Restaurant,Hotel,Plaza,Tennis Court,Cocktail Bar,Gym / Fitness Center,Football Stadium,Farmers Market,Empanada Restaurant,English Restaurant
5,Baku,Europe,2.0,Restaurant,Tea Room,Park,Café,Health & Beauty Service,Lounge,Eastern European Restaurant,Grocery Store,Gym / Fitness Center,Middle Eastern Restaurant
9,Sofia,Europe,2.0,Restaurant,Nightclub,Cocktail Bar,Bakery,Bar,Café,Dessert Shop,Hotel,Liquor Store,Vegetarian / Vegan Restaurant
10,Zagreb,Europe,2.0,Café,Hotel,Restaurant,Nightclub,Sushi Restaurant,Bar,Light Rail Station,BBQ Joint,Chinese Restaurant,French Restaurant
14,Tallinn,Europe,2.0,Restaurant,Bistro,Nightclub,Pet Store,Wine Shop,Yoga Studio,Bar,Performing Arts Venue,Sushi Restaurant,Gym / Fitness Center
25,Dublin,Europe,2.0,Pub,Bus Stop,Restaurant,Indian Restaurant,Coffee Shop,Stadium,Café,Thai Restaurant,Park,Gym
29,Pristina,Europe,2.0,Hotel,Restaurant,Pizza Place,Burger Joint,History Museum,Café,Eastern European Restaurant,Park,Theater,Dessert Shop
30,Riga,Europe,2.0,Restaurant,Bar,Plaza,Hotel,Eastern European Restaurant,Café,Lounge,Coffee Shop,Art Gallery,Cocktail Bar
50,Madrid,Europe,2.0,Restaurant,Chinese Restaurant,Italian Restaurant,Café,Spanish Restaurant,Mediterranean Restaurant,Bar,Breakfast Spot,Tapas Restaurant,Pizza Place


In [28]:
# cluster 4
europe_merged.loc[europe_merged['Cluster Labels'] == 3, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
31,Vaduz,Europe,3.0,Bed & Breakfast,Border Crossing,Hotel,Gym Pool,Zoo,Filipino Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant


In [29]:
# cluster 5
europe_merged.loc[europe_merged['Cluster Labels'] == 4, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,Valletta,Europe,4.0,Sandwich Place,Pier,Boat or Ferry,Restaurant,Zoo,Farmers Market,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit


In [30]:
# cluster 6
europe_merged.loc[europe_merged['Cluster Labels'] == 5, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
47,Belgrade,Europe,5.0,Seafood Restaurant,Restaurant,Zoo,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market


In [31]:
# cluster 7
europe_merged.loc[europe_merged['Cluster Labels'] == 6, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Torshavn,Europe,6.0,Aquarium,Pub,Zoo,Filipino Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market


In [32]:
# cluster 8
europe_merged.loc[europe_merged['Cluster Labels'] == 7, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,Tbilisi,Europe,7.0,Hotel,Coffee Shop,Comfort Food Restaurant,Filipino Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market


In [33]:
# cluster 9
europe_merged.loc[europe_merged['Cluster Labels'] == 8, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
53,Bern,Europe,8.0,Japanese Restaurant,Train Station,Electronics Store,Zoo,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant


In [34]:
# cluster 10
europe_merged.loc[europe_merged['Cluster Labels'] == 9, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Chisinau,Europe,9.0,Pizza Place,Hotel,Ballroom,Romanian Restaurant,Park,Theme Park,Czech Restaurant,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant


In [35]:
# cluster 11
europe_merged.loc[europe_merged['Cluster Labels'] == 10, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,Rome,Europe,10.0,Italian Restaurant,Plaza,Ice Cream Shop,Hotel,Sandwich Place,Monument / Landmark,Restaurant,Pizza Place,Shopping Mall,Trattoria/Osteria
33,Luxembourg,Europe,10.0,Italian Restaurant,Indian Restaurant,French Restaurant,Asian Restaurant,Restaurant,Lebanese Restaurant,Nightclub,Auto Dealership,Coffee Shop,Gym
37,Monaco,Europe,10.0,Italian Restaurant,French Restaurant,Supermarket,Sushi Restaurant,Pizza Place,Plaza,Restaurant,Middle Eastern Restaurant,Fast Food Restaurant,Shopping Mall


In [36]:
# cluster 12
europe_merged.loc[europe_merged['Cluster Labels'] == 11, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Yerevan,Europe,11.0,Fast Food Restaurant,Food Court,Public Art,Snack Place,Zoo,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant


In [37]:
# cluster 13
europe_merged.loc[europe_merged['Cluster Labels'] == 12, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Sarajevo,Europe,12.0,Lounge,Farmers Market,Modern European Restaurant,Café,Restaurant,Theater,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space


In [38]:
# cluster 14
europe_merged.loc[europe_merged['Cluster Labels'] == 13, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Tirana,Europe,13.0,Italian Restaurant,Café,Coffee Shop,Bar,Hotel,Cocktail Bar,Ice Cream Shop,Diner,Pizza Place,Fast Food Restaurant
4,Vienna,Europe,13.0,Café,Hotel,Sushi Restaurant,Japanese Restaurant,Bar,Coffee Shop,Indian Restaurant,Park,Asian Restaurant,Seafood Restaurant
6,Minsk,Europe,13.0,Café,Theater,Bookstore,Museum,Art Gallery,Doner Restaurant,Park,Art Museum,Plaza,Circus
7,Brussels,Europe,13.0,Greek Restaurant,Sandwich Place,Hotel,Coffee Shop,Bar,Deli / Bodega,Vegetarian / Vegan Restaurant,French Restaurant,Furniture / Home Store,Salad Place
11,Nicosia,Europe,13.0,Bar,Café,Italian Restaurant,Coffee Shop,Burger Joint,Greek Restaurant,Nightclub,Restaurant,Falafel Restaurant,Meze Restaurant
12,Prague,Europe,13.0,Grocery Store,Park,Tennis Court,Café,Gym / Fitness Center,General Entertainment,Bus Stop,Tram Station,Burger Joint,Steakhouse
13,Copenhagen,Europe,13.0,Café,Bakery,Coffee Shop,Hotel,Sushi Restaurant,Scandinavian Restaurant,Italian Restaurant,Pool,Pizza Place,Casino
16,Helsinki,Europe,13.0,Scandinavian Restaurant,Café,Wine Bar,Chinese Restaurant,Sushi Restaurant,Bar,Middle Eastern Restaurant,Japanese Restaurant,Record Shop,Pizza Place
17,Paris,Europe,13.0,Japanese Restaurant,Hotel,French Restaurant,Korean Restaurant,Italian Restaurant,Café,Bakery,Plaza,Udon Restaurant,Ramen Restaurant
21,Athens,Europe,13.0,Bar,Café,Coffee Shop,Theater,Wine Bar,Bookstore,Souvlaki Shop,Greek Restaurant,Vegetarian / Vegan Restaurant,Cretan Restaurant


In [39]:
# cluster 15
europe_merged.loc[europe_merged['Cluster Labels'] == 14, europe_merged.columns[[1] + list(range(5, europe_merged.shape[1]))]]

Unnamed: 0,Capital,Continent,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Vilnius,Europe,14.0,Forest,Food & Drink Shop,Grocery Store,Supermarket,Fast Food Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant


## Background information of the exercise

## Week 1

<u>Description of exercise</u>

- Use location data to explore geographical location being as creative as you want.

- Leverage Foursquare location data to explore or compare neighborhoods or cities of your choice or come up with a problem that you can use the Foursquare location data to solve.


**Examples**

- Compare neighborhoods of Toronto and NYC and determine how similar or dissimilar they are. Is New York City more like Toronto or Paris or some other multicultural city?
- In a city of your choice, where would you recommend that they open a restaurant or where to set up an office? 

Make sure to provide sufficient justification of why you think what you want to do or solve is important and why would a client or a group of people be interested in your project.

**Submit the following**

<u>1. Description of the problem and a discussion of the background (Introduction/Business Problem section)</u>

Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem.









<u>2. Description of the data and how it will be used to solve the problem (Data section)</u>



Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.

This will become your Introduction/Business Problem section in your final report. So I recommend that you push the report (having your Introduction/Business Problem section only for now) to your Github repository and submit a link to it.

## Week 2

**Submit the following**

1. A link to your Notebook on your Github repository, showing your code.



   


2. A full report consisting of all of the following components:
   - Introduction where you discuss the business problem and who would be interested in this project.
   - Data where you describe the data that will be used to solve the problem and the source of the data.
   - Methodology section which represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and what machine learnings were used and why.
   - Results section where you discuss the results.
   - Discussion section where you discuss any observations you noted and any recommendations you can make based on the results.
   - Conclusion section where you conclude the report.

3. Your choice of a presentation or blogpost.