# Capstone Project

The purpose of this capstone project is to compare major cities around the world using the Foursquare API and web scraping. In the project, we will find the countries that have the 19 largest economies in the world as of mid-2021, and then compare the largest city of each country.

## Import the required libraries

The first thing we need to do is to import the libraries required for our project.

In [1]:
from bs4 import BeautifulSoup #used for webscraping
import requests #used for getting data from urls
import pandas as pd #used for creating and managing dataframes
import numpy as np
from geopy.geocoders import Nominatim #used to get geographical data
from sklearn.cluster import KMeans #used for K-Means clustering

In [2]:
# @hidden_cell
# Replace the placeholders with your own Foursquare credentials; real keys should never be published
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
VERSION = '20180605'
LIMIT = 1000

## Webscrape to get data

To get the list of countries with the nineteen largest economies, we will scrape the Wikipedia page for the G20 and store it as a BeautifulSoup object. The European Union is also a G20 member, but we will remove it from the list later because it is not a country and several of its member states are already G20 members.

In [3]:
url = "https://en.wikipedia.org/wiki/G20"
data = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')

Next, we will search through the web page for the table that contains the list of G20 members. Once we determine which table has the required data, we extract the list of countries and store it as a pandas dataframe. Our dataframe has an extra column for the largest city of each country which we will fill later.

The table has an extra row for the European Union which we will drop since the EU is not a country.

In [4]:
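tables = soup.find_all('table')
table_index = None
for index, table in enumerate(tables):
    # the members table is the one that mentions both "Member" and "Population"
    if "Member" in str(table) and "Population" in str(table):
        table_index = index

# collect the first cell of every data row (header rows have no <td> cells)
countries = []
for row in tables[table_index].tbody.find_all('tr'):
    col = row.find_all('td')
    if col:
        countries.append(col[0].text.strip())

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
g20_data = pd.DataFrame({'Country': countries, 'Largest City': None})

# drop the European Union row and renumber the index
g20_data = g20_data[g20_data['Country'] != 'European Union'].reset_index(drop=True)
g20_data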

Unnamed: 0,Country,Largest City
0,Argentina,
1,Australia,
2,Brazil,
3,Canada,
4,China,
5,France,
6,Germany,
7,India,
8,Indonesia,
9,Italy,
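
The row-collection step above can be tried offline. The sketch below is only an illustration under assumptions: `scraped_names` is made-up sample data standing in for the first cell of each table row, and collecting names into a list before building the dataframe is one common alternative to appending row by row.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped first column of the members table
scraped_names = ['Argentina', 'Australia', 'European Union', 'France']

# Build the dataframe in one step instead of appending row by row
sample = pd.DataFrame({'Country': scraped_names})
sample['Largest City'] = None  # placeholder column, filled in later

# Keep every row except the European Union, then renumber the index
sample = sample[sample['Country'] != 'European Union'].reset_index(drop=True)
print(sample)
```

Filtering with a boolean mask avoids looking up the row's index first, and `reset_index(drop=True)` keeps the row numbers contiguous for later positional loops.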


Wikipedia's country pages have a standard format that uses the country name in the url. We will use the country names that we got above to generate a list of urls to scrape for additional data.

In [5]:
urls = []
for country in g20_data['Country']:
    # Wikipedia page names use underscores in place of spaces
    urls.append('https://en.wikipedia.org/wiki/' + country.replace(' ', '_'))

Now we will store the list of urls that we got above in our dataframe in a new column.

In [6]:
g20_data['url'] = urls

Let's take a look at our dataframe now.

In [7]:
g20_data

Unnamed: 0,Country,Largest City,url
0,Argentina,,https://en.wikipedia.org/wiki/Argentina
1,Australia,,https://en.wikipedia.org/wiki/Australia
2,Brazil,,https://en.wikipedia.org/wiki/Brazil
3,Canada,,https://en.wikipedia.org/wiki/Canada
4,China,,https://en.wikipedia.org/wiki/China
5,France,,https://en.wikipedia.org/wiki/France
6,Germany,,https://en.wikipedia.org/wiki/Germany
7,India,,https://en.wikipedia.org/wiki/India
8,Indonesia,,https://en.wikipedia.org/wiki/Indonesia
9,Italy,,https://en.wikipedia.org/wiki/Italy
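
The URL construction can also be wrapped in a small helper. This is a minimal sketch, assuming the same base URL as above; `build_wiki_url` and `WIKI_BASE` are hypothetical names, and the percent-encoding step via `urllib.parse.quote` is an addition that is not part of the original notebook.

```python
from urllib.parse import quote

WIKI_BASE = 'https://en.wikipedia.org/wiki/'

def build_wiki_url(country):
    """Return the Wikipedia URL for a country name (hypothetical helper)."""
    # Wikipedia page names use underscores for spaces
    page_name = country.strip().replace(' ', '_')
    # Percent-encode anything that is not URL-safe, such as accented letters
    return WIKI_BASE + quote(page_name)

for name in ['France', 'South Korea', 'Saudi Arabia']:
    print(build_wiki_url(name))
```

Names made only of ASCII letters and underscores pass through `quote` unchanged, so the helper produces the same URLs as the loop above for the countries shown.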


Now we'll go through each country's Wikipedia page to extract from it the name of the largest city and store it in our dataframe.

In [8]:
for url in urls:
    data = requests.get(url).text
    soup = BeautifulSoup(data, 'html5lib')
    largest_city = None  # reset so a failed lookup does not reuse the previous country's city
    for table in soup.find_all('table'):
        for row in table.find_all('tr'):
            if 'largest city' in str(row).lower():
                try:
                    largest_city = row.td.a.string
                except AttributeError:
                    # the row has no linked data cell; stop scanning this table
                    break
    g20_data.loc[g20_data['url'] == url, 'Largest City'] = largest_city

g20_data

Unnamed: 0,Country,Largest City,url
0,Argentina,Buenos Aires,https://en.wikipedia.org/wiki/Argentina
1,Australia,Sydney,https://en.wikipedia.org/wiki/Australia
2,Brazil,São Paulo,https://en.wikipedia.org/wiki/Brazil
3,Canada,Toronto,https://en.wikipedia.org/wiki/Canada
4,China,Shanghai,https://en.wikipedia.org/wiki/China
5,France,Paris,https://en.wikipedia.org/wiki/France
6,Germany,Berlin,https://en.wikipedia.org/wiki/Germany
7,India,Mumbai,https://en.wikipedia.org/wiki/India
8,Indonesia,Jakarta,https://en.wikipedia.org/wiki/Indonesia
9,Italy,Rome,https://en.wikipedia.org/wiki/Italy


Now that we have the names of the largest cities, we can find their latitudes and longitudes and add them to the dataframe.

In [9]:
lats = []
longs = []

# create the geocoder once and reuse it for every city
geolocator = Nominatim(user_agent="city_explorer")

for city, country in zip(g20_data['Largest City'], g20_data['Country']):
    address = '{}, {}'.format(city, country)
    location = geolocator.geocode(address)
    if location is None:
        # geocode returns None when the address cannot be resolved
        raise ValueError('Could not geocode ' + address)
    lats.append(location.latitude)
    longs.append(location.longitude)

g20_data['Latitude'] = lats
g20_data['Longitude'] = longs
g20_data

Unnamed: 0,Country,Largest City,url,Latitude,Longitude
0,Argentina,Buenos Aires,https://en.wikipedia.org/wiki/Argentina,-34.607568,-58.437089
1,Australia,Sydney,https://en.wikipedia.org/wiki/Australia,-33.854816,151.216454
2,Brazil,São Paulo,https://en.wikipedia.org/wiki/Brazil,-23.550651,-46.633382
3,Canada,Toronto,https://en.wikipedia.org/wiki/Canada,43.653482,-79.383935
4,China,Shanghai,https://en.wikipedia.org/wiki/China,31.232276,121.469207
5,France,Paris,https://en.wikipedia.org/wiki/France,48.856697,2.351462
6,Germany,Berlin,https://en.wikipedia.org/wiki/Germany,52.517037,13.38886
7,India,Mumbai,https://en.wikipedia.org/wiki/India,19.07599,72.877393
8,Indonesia,Jakarta,https://en.wikipedia.org/wiki/Indonesia,-6.175394,106.827183
9,Italy,Rome,https://en.wikipedia.org/wiki/Italy,41.89332,12.482932
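
A geocoder can fail to resolve an address, in which case geopy's `geocode` returns `None` and reading `.latitude` raises an error. The sketch below shows one way to guard against that. It is an illustration only: `geocode_cities` is a hypothetical helper, and a stand-in geocoder with made-up coordinates replaces the network call so the example runs offline.

```python
from collections import namedtuple

def geocode_cities(addresses, geocode):
    """Return {address: (lat, lon)}, skipping addresses that cannot be resolved.

    `geocode` is any callable that returns an object with `latitude` and
    `longitude` attributes, or None when nothing is found.
    """
    coordinates = {}
    for address in addresses:
        location = geocode(address)
        if location is None:
            # Record nothing rather than crash on location.latitude
            continue
        coordinates[address] = (location.latitude, location.longitude)
    return coordinates

# Stand-in geocoder with made-up coordinates, used here instead of a network call
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
fake_results = {'Paris, France': FakeLocation(48.85, 2.35)}

result = geocode_cities(['Paris, France', 'Nowhere, Atlantis'], fake_results.get)
print(result)
```

In the notebook, `geolocator.geocode` could be passed as the `geocode` argument in place of `fake_results.get`.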


## Getting venues in cities

We start by defining a function that gets the venues Foursquare returns within a 1000 m radius of a given set of coordinates, up to <code>LIMIT</code> results per request. We will use this function to get the venues near the latitudes and longitudes stored in our dataframe.

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
                    
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

Now that the function is defined, let's use it to get the venues and store them in a new dataframe called <code><b>g20_venues</b></code>.

In [11]:
g20_venues = getNearbyVenues(names=g20_data['Largest City'], latitudes=g20_data['Latitude'], longitudes=g20_data['Longitude'], radius=1000)

### How many venues did we get?

In [12]:
print(g20_venues.shape)
g20_venues.head()

(1712, 7)


Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Buenos Aires,-34.607568,-58.437089,Co-Pain Boulangerie (ex-Franck Dauffouis),-34.607693,-58.438662,Bakery
1,Buenos Aires,-34.607568,-58.437089,Teglia,-34.609058,-58.441282,Pizza Place
2,Buenos Aires,-34.607568,-58.437089,Heladería Tino,-34.608942,-58.430741,Ice Cream Shop
3,Buenos Aires,-34.607568,-58.437089,Anfiteatro Eva Perón,-34.605891,-58.436829,Amphitheater
4,Buenos Aires,-34.607568,-58.437089,Parque Centenario,-34.606597,-58.435464,Park
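
The list comprehension inside `getNearbyVenues` assumes that every venue has at least one category and that the response always contains a first group. The sketch below shows a more defensive way to flatten a response of that shape. It is an assumption-laden illustration: `extract_venues` is a hypothetical helper, the `'Uncategorized'` fallback label is invented here, and `sample_payload` is made-up data that mirrors only the keys the function above reads.

```python
def extract_venues(payload):
    """Flatten an explore-style response into (name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to a label instead of raising IndexError on an empty list
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((venue['name'], venue['location']['lat'],
                     venue['location']['lng'], category))
    return rows

# Made-up payload that mirrors only the keys read by getNearbyVenues
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Sample Café', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 1.5, 'lng': 2.5},
               'categories': []}},
]}]}}

print(extract_venues(sample_payload))
print(extract_venues({'response': {}}))
```

Separating the parsing from the HTTP request also makes it possible to test the parsing without API credentials.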


### How many unique venue categories were found?

In [13]:
print('There are {} unique categories of venues found'.format(len(g20_venues['Venue Category'].unique())))

There are 292 unique categories of venues found


Let's take a quick look at the venue categories found.

In [14]:
g20_venues.groupby(['Venue Category']).count()

Venue Category,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude
Accessories Store,2,2,2,2,2,2
African Restaurant,2,2,2,2,2,2
American Restaurant,7,7,7,7,7,7
Amphitheater,1,1,1,1,1,1
Antique Shop,3,3,3,3,3,3
...,...,...,...,...,...,...
Women's Store,2,2,2,2,2,2
Xinjiang Restaurant,1,1,1,1,1,1
Yakitori Restaurant,1,1,1,1,1,1
Yoga Studio,4,4,4,4,4,4


### One-hot Encoding

Now we will use one-hot encoding to create a separate column for each venue category. We will also add the city name to the beginning of the dataframe so we can group the data later.

In [15]:
g20_onehot = pd.get_dummies(g20_venues[['Venue Category']], prefix="", prefix_sep="")
g20_onehot['City'] = g20_venues['City']
fixed_columns = [g20_onehot.columns[-1]] + list(g20_onehot.columns[:-1])
g20_onehot = g20_onehot[fixed_columns]

g20_onehot.head()

Unnamed: 0,City,Accessories Store,African Restaurant,American Restaurant,Amphitheater,Antique Shop,Argentinian Restaurant,Armenian Restaurant,Art Gallery,Art Museum,...,Watch Shop,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Xinjiang Restaurant,Yakitori Restaurant,Yoga Studio,Yoshoku Restaurant
0,Buenos Aires,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Buenos Aires,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Buenos Aires,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Buenos Aires,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Buenos Aires,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


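Let's group the data by city and take the mean of each column, which gives the share of each venue category among a city's venues, and store the result in a new dataframe.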

In [16]:
g20_grouped = g20_onehot.groupby('City').mean().reset_index()
g20_grouped

Unnamed: 0,City,Accessories Store,African Restaurant,American Restaurant,Amphitheater,Antique Shop,Argentinian Restaurant,Armenian Restaurant,Art Gallery,Art Museum,...,Watch Shop,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Xinjiang Restaurant,Yakitori Restaurant,Yoga Studio,Yoshoku Restaurant
0,Berlin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.03,...,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0
1,Buenos Aires,0.0,0.01,0.0,0.01,0.0,0.1,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Istanbul,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
3,Jakarta,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Johannesburg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051282,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,London,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0
6,Mexico City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Moscow,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.02,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Mumbai,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,New York City,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,...,0.0,0.0,0.0,0.01,0.05,0.01,0.0,0.0,0.01,0.0
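
To see why the grouped mean of one-hot columns can be read as a frequency, here is a minimal sketch on made-up data. The `venues` frame and its city names are invented for illustration, and `dtype=int` is passed only so the indicator values are 0/1 integers regardless of the pandas version.

```python
import pandas as pd

# Made-up venues: three in city A, one in city B
venues = pd.DataFrame({
    'City': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Café', 'Café', 'Park', 'Hotel'],
})

# One indicator column per category; dtype=int keeps 0/1 values across pandas versions
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)

# Averaging the indicators per city gives each category's share of that city's venues
shares = onehot.groupby(venues['City']).mean()
print(shares)
```

City A has two cafés out of three venues, so its `Café` share is two thirds, and every row of `shares` sums to 1.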


The dataframe has too many columns to give us a clear idea of which venues are the most common. We'll define a new function to return the most common venues for each city.

In [17]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Let's use the function defined above to get the 10 most common venues for each city and store it in a new dataframe.

In [18]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
g20_venues_sorted = pd.DataFrame(columns=columns)
g20_venues_sorted['City'] = g20_grouped['City']

for ind in np.arange(g20_grouped.shape[0]):
    g20_venues_sorted.iloc[ind, 1:] = return_most_common_venues(g20_grouped.iloc[ind, :], num_top_venues)

g20_venues_sorted

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berlin,History Museum,Drugstore,Hotel,Coffee Shop,Bookstore,Monument / Landmark,Cocktail Bar,Art Museum,Theater,Art Gallery
1,Buenos Aires,Café,Pizza Place,Argentinian Restaurant,Bakery,Ice Cream Shop,Burger Joint,Coffee Shop,Gym,Italian Restaurant,Indie Theater
2,Istanbul,Hotel,Turkish Restaurant,Mosque,Café,Historic Site,Restaurant,Jewelry Store,Kebab Restaurant,Bookstore,Seafood Restaurant
3,Jakarta,Indonesian Restaurant,Fast Food Restaurant,Asian Restaurant,Café,Hotel,Coffee Shop,Padangnese Restaurant,Bakery,Food Truck,Noodle House
4,Johannesburg,Café,Fast Food Restaurant,Portuguese Restaurant,Breakfast Spot,Art Gallery,Historic Site,Coffee Shop,Hotel,Scenic Lookout,Public Art
5,London,Hotel,Ice Cream Shop,Garden,Bakery,Gelato Shop,Steakhouse,Lounge,Coffee Shop,Plaza,Cocktail Bar
6,Mexico City,Mexican Restaurant,Ice Cream Shop,Art Museum,Museum,Arts & Crafts Store,Hotel,Restaurant,Jewelry Store,Clothing Store,Boutique
7,Moscow,Boutique,Hotel,Coffee Shop,Plaza,Italian Restaurant,Cosmetics Shop,History Museum,Art Gallery,Beer Bar,Caucasian Restaurant
8,Mumbai,Bar,Indian Restaurant,Coffee Shop,Flea Market,Multicuisine Indian Restaurant,Mexican Restaurant,Pizza Place,Italian Restaurant,Food & Drink Shop,Food Court
9,New York City,Coffee Shop,Wine Shop,Spa,Gym / Fitness Center,Memorial Site,Café,French Restaurant,Park,Gym,Burger Joint
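
The column-naming loop above relies on an `IndexError` to fall back to "th", which is correct for the ten columns used here but would produce "21th" if `num_top_venues` were raised past 20. The sketch below shows one way to generate the suffixes and pick the top categories in plain Python. Both `ordinal` and `top_categories` are hypothetical helpers, and `city_shares` is made-up data.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_categories(shares, num_top):
    """Return the num_top category names with the largest shares."""
    ranked = sorted(shares.items(), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:num_top]]

# Made-up category shares for one city
city_shares = {'Café': 0.10, 'Hotel': 0.04, 'Park': 0.07, 'Bakery': 0.01}

labels = ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(3)]
print(dict(zip(labels, top_categories(city_shares, 3))))
```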


## K-Means Clustering

Now that we have a dataframe with the most common venues in the nineteen cities, let us use k-means clustering to group these cities into 4 different clusters based on their associated venues.

We start by setting the number of clusters to 4 and fitting our data using the <code>KMeans</code> class that we imported at the beginning of the notebook.

In [19]:
kclusters = 4

g20_grouped_clustering = g20_grouped.drop(columns='City')

kmeans = KMeans(n_clusters=kclusters).fit(g20_grouped_clustering)

Let's check what the resultant cluster labels are.

In [20]:
kmeans.labels_

array([2, 1, 2, 0, 1, 2, 2, 2, 3, 2, 2, 2, 2, 2, 0, 1, 2, 2, 1])
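
K-means starts from randomly chosen centroids, so the label numbers (and sometimes the grouping itself) can differ between runs unless a seed is fixed. The sketch below illustrates this on a made-up feature matrix. Passing `random_state` and `n_init` is an assumption added here for reproducibility and is not part of the original cell.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up feature matrix: rows 0-1 and rows 2-3 form two well-separated groups
features = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.2, 0.8],
])

# Fixing random_state makes the centroid initialisation, and so the labels, repeatable
first_run = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
second_run = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

print(first_run.labels_)
print((first_run.labels_ == second_run.labels_).all())
```

The label values themselves carry no meaning beyond distinguishing groups, so only the grouping of rows should be compared across runs with different seeds.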

Now we will add the cluster labels to the dataframe with the most common venues and merge it with the original dataframe.

In [21]:
g20_venues_sorted.insert(0, 'Cluster labels', kmeans.labels_)

g20_merged = g20_data

In [22]:
g20_merged = g20_merged.join(g20_venues_sorted.set_index('City'), on='Largest City')

In [23]:
g20_merged

Unnamed: 0,Country,Largest City,url,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Argentina,Buenos Aires,https://en.wikipedia.org/wiki/Argentina,-34.607568,-58.437089,1,Café,Pizza Place,Argentinian Restaurant,Bakery,Ice Cream Shop,Burger Joint,Coffee Shop,Gym,Italian Restaurant,Indie Theater
1,Australia,Sydney,https://en.wikipedia.org/wiki/Australia,-33.854816,151.216454,1,Café,Australian Restaurant,Scenic Lookout,Hotel,Japanese Restaurant,Italian Restaurant,Ice Cream Shop,Cocktail Bar,Theater,Thai Restaurant
2,Brazil,São Paulo,https://en.wikipedia.org/wiki/Brazil,-23.550651,-46.633382,2,Japanese Restaurant,Cultural Center,Café,Sake Bar,Grocery Store,Theater,Bookstore,Dessert Shop,Bakery,Snack Place
3,Canada,Toronto,https://en.wikipedia.org/wiki/Canada,43.653482,-79.383935,1,Café,Coffee Shop,Japanese Restaurant,Restaurant,Sushi Restaurant,Clothing Store,Gym,Furniture / Home Store,Plaza,Middle Eastern Restaurant
4,China,Shanghai,https://en.wikipedia.org/wiki/China,31.232276,121.469207,0,Coffee Shop,Fast Food Restaurant,Hotel,Chinese Restaurant,Café,Lounge,Indian Restaurant,Asian Restaurant,Gym,French Restaurant
5,France,Paris,https://en.wikipedia.org/wiki/France,48.856697,2.351462,2,French Restaurant,Ice Cream Shop,Plaza,Bookstore,Restaurant,Art Museum,Bakery,Tea Room,Lebanese Restaurant,Bar
6,Germany,Berlin,https://en.wikipedia.org/wiki/Germany,52.517037,13.38886,2,History Museum,Drugstore,Hotel,Coffee Shop,Bookstore,Monument / Landmark,Cocktail Bar,Art Museum,Theater,Art Gallery
7,India,Mumbai,https://en.wikipedia.org/wiki/India,19.07599,72.877393,3,Bar,Indian Restaurant,Coffee Shop,Flea Market,Multicuisine Indian Restaurant,Mexican Restaurant,Pizza Place,Italian Restaurant,Food & Drink Shop,Food Court
8,Indonesia,Jakarta,https://en.wikipedia.org/wiki/Indonesia,-6.175394,106.827183,0,Indonesian Restaurant,Fast Food Restaurant,Asian Restaurant,Café,Hotel,Coffee Shop,Padangnese Restaurant,Bakery,Food Truck,Noodle House
9,Italy,Rome,https://en.wikipedia.org/wiki/Italy,41.89332,12.482932,2,Historic Site,Italian Restaurant,Plaza,Sandwich Place,Ice Cream Shop,Monument / Landmark,Wine Bar,Temple,Garden,Church
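
The merge step matches the `Largest City` column of one dataframe against the `City` index of the other. The sketch below reproduces that join on made-up data (the `cities` and `labelled` frames are invented stand-ins) and shows how a city with no matching row can be detected afterwards.

```python
import pandas as pd

# Made-up stand-ins for g20_data and g20_venues_sorted
cities = pd.DataFrame({'Country': ['X', 'Y', 'Z'],
                       'Largest City': ['Alpha', 'Beta', 'Gamma']})
labelled = pd.DataFrame({'City': ['Beta', 'Alpha'],
                         'Cluster labels': [1, 0]})

# Match the 'Largest City' column on the left against the 'City' index on the right
merged = cities.join(labelled.set_index('City'), on='Largest City')
print(merged)

# Cities without a match (here 'Gamma') get NaN and can be listed explicitly
missing = merged[merged['Cluster labels'].isna()]['Largest City'].tolist()
print(missing)
```

A check like this is useful when a city returns no venues, because its cluster label would otherwise be silently missing.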


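The four resulting clusters can be seen below. The headings are numbered from 1 while the cluster labels start at 0, so Cluster 1 corresponds to label 0, Cluster 2 to label 1, and so on.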

### Cluster 1

In [24]:
g20_merged.loc[g20_merged['Cluster labels'] == 0, g20_merged.columns[[1] + list(range(5, g20_merged.shape[1]))]]

Unnamed: 0,Largest City,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Shanghai,0,Coffee Shop,Fast Food Restaurant,Hotel,Chinese Restaurant,Café,Lounge,Indian Restaurant,Asian Restaurant,Gym,French Restaurant
8,Jakarta,0,Indonesian Restaurant,Fast Food Restaurant,Asian Restaurant,Café,Hotel,Coffee Shop,Padangnese Restaurant,Bakery,Food Truck,Noodle House


### Cluster 2

In [25]:
g20_merged.loc[g20_merged['Cluster labels'] == 1, g20_merged.columns[[1] + list(range(5, g20_merged.shape[1]))]]

Unnamed: 0,Largest City,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Buenos Aires,1,Café,Pizza Place,Argentinian Restaurant,Bakery,Ice Cream Shop,Burger Joint,Coffee Shop,Gym,Italian Restaurant,Indie Theater
1,Sydney,1,Café,Australian Restaurant,Scenic Lookout,Hotel,Japanese Restaurant,Italian Restaurant,Ice Cream Shop,Cocktail Bar,Theater,Thai Restaurant
3,Toronto,1,Café,Coffee Shop,Japanese Restaurant,Restaurant,Sushi Restaurant,Clothing Store,Gym,Furniture / Home Store,Plaza,Middle Eastern Restaurant
15,Johannesburg,1,Café,Fast Food Restaurant,Portuguese Restaurant,Breakfast Spot,Art Gallery,Historic Site,Coffee Shop,Hotel,Scenic Lookout,Public Art


### Cluster 3

In [26]:
g20_merged.loc[g20_merged['Cluster labels'] == 2, g20_merged.columns[[1] + list(range(5, g20_merged.shape[1]))]]

Unnamed: 0,Largest City,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,São Paulo,2,Japanese Restaurant,Cultural Center,Café,Sake Bar,Grocery Store,Theater,Bookstore,Dessert Shop,Bakery,Snack Place
5,Paris,2,French Restaurant,Ice Cream Shop,Plaza,Bookstore,Restaurant,Art Museum,Bakery,Tea Room,Lebanese Restaurant,Bar
6,Berlin,2,History Museum,Drugstore,Hotel,Coffee Shop,Bookstore,Monument / Landmark,Cocktail Bar,Art Museum,Theater,Art Gallery
9,Rome,2,Historic Site,Italian Restaurant,Plaza,Sandwich Place,Ice Cream Shop,Monument / Landmark,Wine Bar,Temple,Garden,Church
10,Tokyo,2,Hotel,Café,Japanese Restaurant,Chinese Restaurant,Chocolate Shop,Italian Restaurant,French Restaurant,Nabe Restaurant,Coffee Shop,Historic Site
11,Seoul,2,Hotel,Korean Restaurant,Coffee Shop,Café,Chinese Restaurant,Japanese Restaurant,Sushi Restaurant,Plaza,Historic Site,Bakery
12,Mexico City,2,Mexican Restaurant,Ice Cream Shop,Art Museum,Museum,Arts & Crafts Store,Hotel,Restaurant,Jewelry Store,Clothing Store,Boutique
13,Moscow,2,Boutique,Hotel,Coffee Shop,Plaza,Italian Restaurant,Cosmetics Shop,History Museum,Art Gallery,Beer Bar,Caucasian Restaurant
14,Riyadh,2,Jewelry Store,Asian Restaurant,Hotel,Middle Eastern Restaurant,Park,Historic Site,Shopping Mall,Electronics Store,Toy / Game Store,Market
16,Istanbul,2,Hotel,Turkish Restaurant,Mosque,Café,Historic Site,Restaurant,Jewelry Store,Kebab Restaurant,Bookstore,Seafood Restaurant


### Cluster 4

In [27]:
g20_merged.loc[g20_merged['Cluster labels'] == 3, g20_merged.columns[[1] + list(range(5, g20_merged.shape[1]))]]

Unnamed: 0,Largest City,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Mumbai,3,Bar,Indian Restaurant,Coffee Shop,Flea Market,Multicuisine Indian Restaurant,Mexican Restaurant,Pizza Place,Italian Restaurant,Food & Drink Shop,Food Court


---
This concludes the requirements of Week 5's assignment for the Capstone Project Course.