# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, I try to find countries where you can eat the kind of food you usually eat at home.

Suppose you have to live abroad for some reason and are wondering where to go. If you are a gourmet, the 'lineup of restaurants' (the most common restaurant categories) will probably affect your decision. Even if you are not, the kinds of restaurants available in a country will affect how comfortable you feel there.

I will cluster the countries of the world into 5 clusters based on their lineup of restaurants. Choosing a country in the same cluster as your own should make it more likely that you find familiar food there.

## Data <a name="data"></a>

Based on the definition of the problem, the factors that will influence the decision are:
- the number of existing restaurants in the neighborhood (here, the area around the center of a capital city)
- the category of each restaurant

Each country has many cities. I use the capital city to evaluate a country's lineup of restaurants, assuming that the capital is a typical city of the country.

The following data sources are needed to extract or generate the required information:
- The list of countries and their capital cities is obtained from this site: https://geographyfieldwork.com/WorldCapitalCities.htm
- The location of each capital city is obtained with the *geopy.geocoders* library.
- The list of restaurants near the center of each capital city, and their categories, is obtained with the Foursquare API.

### Download libraries

In [1]:
!conda install -c conda-forge geopy --yes # skip if geopy is already installed
!conda install -c conda-forge folium=0.5.0 --yes # skip if folium is already installed
!conda install -c conda-forge lxml html5lib beautifulsoup4 --yes # HTML parsers used by pd.read_html

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.




### Import libraries

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# Regular Expression
import re

print('Libraries imported.')

Libraries imported.


### Get the list of capital cities

In [3]:
url = 'https://geographyfieldwork.com/WorldCapitalCities.htm'
df_cap0 = pd.read_html(url)[2]

# Remove the last line (total)
df_cap0 = df_cap0.drop(len(df_cap0)-1)

# Remove duplicated cities (London)
df_cap0 = df_cap0.groupby(['Capital City']).nth(0)
df_cap0.reset_index(inplace=True)
df_cap0 = df_cap0[['Country', 'Capital City']]

# Remove numbers
# (the third argument of str.maketrans lists the characters to delete)
df_cap0['Country'] = df_cap0['Country'].str.translate(str.maketrans('', '', '0123456789'))
df_cap0['Capital City'] = df_cap0['Capital City'].str.translate(str.maketrans('', '', '0123456789'))

df_cap0.head()

Unnamed: 0,Country,Capital City
0,United Arab Emirates,Abu Dhabi
1,Nigeria,Abuja
2,Ghana,Accra
3,Ethiopia,Addis Ababa
4,Algeria,Algiers
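
The digit-stripping step above relies on the three-argument form of `str.maketrans`, where the third argument lists characters to delete. Here is a minimal sketch on made-up strings; the sample names with trailing footnote digits are hypothetical and are not taken from the source table.

```python
# Third argument of str.maketrans lists characters to delete;
# the first two (equal-length) arguments would map characters one to one.
DIGITS = '0123456789'
strip_digits = str.maketrans('', '', DIGITS)

def remove_digits(text):
    """Return text with every ASCII digit removed."""
    return text.translate(strip_digits)

# Hypothetical names carrying footnote markers
samples = ['London1', 'Bolivia2', 'Paris']
cleaned = [remove_digits(s) for s in samples]
print(cleaned)
```

The same translation table is what `Series.str.translate` applies to each cell of a column.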


### Get the latitude and longitude of each capital city by using geopy.geocoders

In [4]:
df_cap1 = df_cap0.copy()
geolocator = Nominatim(user_agent="ny_explorer")

for index, row in df_cap1.iterrows():
    country = row['Country']
    capital_city = row['Capital City']
    try:
        location = geolocator.geocode(capital_city + ", " + country)
        latitude = location.latitude
        longitude = location.longitude
    except Exception:
        # lookup failed or returned None: mark the row with 0, 0 and drop it below
        latitude = 0
        longitude = 0
    print('{}: The geographical coordinates of {}, {} are {}, {}.'.format(index, country, capital_city, latitude, longitude))
    df_cap1.loc[index, 'Latitude']=latitude
    df_cap1.loc[index, 'Longitude']=longitude

# Remove rows with Latitude 0.0 (the geocoder could not resolve the capital city name)
df_cap1 = df_cap1[df_cap1['Latitude'] != 0.0]
df_cap1.reset_index(drop=True, inplace=True)

# Save the DataFrame
df_cap1.to_csv('capitals.csv')
df_cap1.head()

0: The geographical coordinates of United Arab Emirates, Abu Dhabi are 23.99764435, 53.6439097569213.
1: The geographical coordinates of Nigeria, Abuja are 9.0643305, 7.4892974.
2: The geographical coordinates of Ghana, Accra are 5.5600141, -0.2057437.
3: The geographical coordinates of Ethiopia, Addis Ababa are 9.0107934, 38.7612525.
4: The geographical coordinates of Algeria, Algiers are 28.0000272, 2.9999825.
5: The geographical coordinates of Jordan, Amman are 31.9515694, 35.9239625.
6: The geographical coordinates of Netherlands, Amsterdam are 52.3745403, 4.89797550561798.
7: The geographical coordinates of Andorra, Andorra la Vella are 42.5069391, 1.5212467.
8: The geographical coordinates of Turkey, Ankara are 39.9207774, 32.854067.
9: The geographical coordinates of Madagascar, Antananarivo are -18.9100122, 47.5255809.
10: The geographical coordinates of Samoa, Apia are -13.8343691, -171.7692793.
11: The geographical coordinates of Turkmenistan, Ashgabat are 37.9404379, 58.3822788.
12: The geographical

Unnamed: 0,Country,Capital City,Latitude,Longitude
0,United Arab Emirates,Abu Dhabi,23.997644,53.64391
1,Nigeria,Abuja,9.064331,7.489297
2,Ghana,Accra,5.560014,-0.205744
3,Ethiopia,Addis Ababa,9.010793,38.761252
4,Algeria,Algiers,28.000027,2.999983
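
The geocoding loop marks a failed lookup with latitude 0 and longitude 0 and later filters on latitude. A possible variant, sketched here with a stand-in lookup function instead of a real geocoder, keeps failures in a separate list so that a genuine coordinate of 0 can never be mistaken for a failure. `fake_geocode`, `locate`, and the tiny lookup table are assumptions made for illustration; the two coordinates are copied from the printed output.

```python
# Stand-in for a geocoder: returns (lat, lon), or None when the query is unknown.
KNOWN = {
    'Abuja, Nigeria': (9.0643305, 7.4892974),
    'Accra, Ghana': (5.5600141, -0.2057437),
}

def fake_geocode(query):
    """Hypothetical lookup used instead of a network call."""
    return KNOWN.get(query)

def locate(capitals, geocode):
    """Split (country, city) pairs into located rows and failed lookups."""
    located, failed = [], []
    for country, city in capitals:
        result = geocode('{}, {}'.format(city, country))
        if result is None:
            failed.append((country, city))
        else:
            located.append((country, city, result[0], result[1]))
    return located, failed

# The third entry is deliberately unknown to the stand-in geocoder
capitals = [('Nigeria', 'Abuja'), ('Ghana', 'Accra'), ('Atlantis', 'Poseidonis')]
located, failed = locate(capitals, fake_geocode)
print(located)
print(failed)
```

Keeping the failed pairs also makes it easy to report which capitals were dropped before the Foursquare step.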


### Or, get the latitude and longitude of each capital city from the saved file.

In [5]:
df_cap1 = pd.read_csv('capitals.csv', index_col=0)
# df_cap1 = df_cap1[:20] # To reduce Foursquare API calls
df_cap1.head()

Unnamed: 0,Country,Capital City,Latitude,Longitude
0,United Arab Emirates,Abu Dhabi,23.997644,53.64391
1,Nigeria,Abuja,9.06433,7.489297
2,Ghana,Accra,5.560014,-0.205744
3,Ethiopia,Addis Ababa,9.010793,38.761253
4,Algeria,Algiers,28.000027,2.999983


### Display the world map with capital cities

In [6]:
# create world map using latitude and longitude values

# Center
latitude = 0.0
longitude = 0.0

map_world= folium.Map(location=[latitude, longitude], zoom_start=2)
world_data = df_cap1

# add markers to map
for lat, lng, country, city in zip(world_data['Latitude'], world_data['Longitude'], world_data['Country'], world_data['Capital City']):
    # popup text, e.g. "Abuja, Nigeria"
    label = '{}, {}'.format(city, country)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_world)  
    
map_world

## Methodology <a name="methodology"></a>

In this project, I will cluster almost all the countries in the world into 5 clusters based on the lineup of restaurants.

I assume that a capital city has a lineup of restaurants typical of its country. The locations of the capital cities have already been collected above. The Foursquare API can explore venues around a given location, and specifying section=food restricts the results to food-related venues.

In the first step, I will use the Foursquare API to find the restaurants, and their categories, within a radius of 1000 meters from the center of each capital city.

In the second step, I will get the top 10 categories for each capital city.

In the third step, I will run k-means to cluster all the capital cities into 5 clusters.
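
The three steps can be tried end to end on a toy dataset before spending any API calls. The sketch below is illustrative only: the cities `A` to `D`, their venue categories, and the choice of 2 clusters are all made up, and pandas and scikit-learn are assumed to be installed as in the rest of this notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy venue list: one row per venue (all values are invented)
venues = pd.DataFrame({
    'City': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D'],
    'Category': ['Café', 'Café', 'Pizza Place',
                 'Café', 'Café', 'Bakery',
                 'Sushi Restaurant', 'Ramen Restaurant',
                 'Sushi Restaurant', 'Sushi Restaurant'],
})

# Steps 1 and 2: one-hot encode the categories, then average per city
# so that each row holds the share of every category in that city
onehot = pd.get_dummies(venues['Category']).astype(float)
onehot.insert(0, 'City', venues['City'])
shares = onehot.groupby('City').mean()

# Step 3: cluster the cities on their category shares (k=2 for this toy data)
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(shares)
labels = dict(zip(shares.index, kmeans.labels_))

print(shares.round(2))
print(labels)
```

On this toy data the two café-heavy cities should share one label and the two sushi-heavy cities the other; the label numbers themselves carry no meaning.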

## Analysis <a name="analysis"></a>

### Define Foursquare Credentials and Version

In [7]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted; never publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Credentials set.')

Credentials set.


### Create a list of capital cities with venues in food section

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [8]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
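
To see what kind of structure the helper works on, here is a small sketch using a hand-written dictionary shaped like the `response` → `groups` → `items` path that this notebook reads. The venue entries are invented, the real payload carries many more fields, and `first_category` is a hypothetical helper written for this example.

```python
# Hand-written stand-in for an explore response (illustrative values only)
fake_json = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Venue One',
                           'categories': [{'name': 'Pizza Place'}]}},
                {'venue': {'name': 'Venue Two', 'categories': []}},
            ]
        }]
    }
}

def first_category(venue):
    """Name of the first category, or None when the list is empty or missing."""
    categories = venue.get('categories', [])
    return categories[0]['name'] if categories else None

items = fake_json['response']['groups'][0]['items']
pairs = [(item['venue']['name'], first_category(item['venue'])) for item in items]
print(pairs)
```

Returning `None` for a venue without categories avoids the `IndexError` that indexing `[0]` on an empty list would raise.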

#### Function to explore venues with specified section

In [9]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

def getNearbyVenues(names, latitudes, longitudes, section='food', radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&section={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            section
            )
            
        # make the GET request
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']

            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
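        except (requests.RequestException, KeyError, IndexError, ValueError) as err:
            # stop at the first failure (for example, an exhausted API quota);
            # the URL is not printed because it embeds the credentials
            print('Error for {}: {!r}'.format(name, err))
            break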


    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
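
The request URL in `getNearbyVenues` is assembled with `str.format`. One alternative, sketched below with only the standard library, keeps the query parameters in a dictionary and lets `urllib.parse.urlencode` handle the escaping. `build_explore_url` is a hypothetical helper, and the credential strings are placeholders; no request is sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint used by the notebook's getNearbyVenues function
BASE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, client_id, client_secret, version,
                      section='food', radius=1000, limit=100):
    """Build the explore URL from a parameter dictionary (no network access)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
        'section': section,
    }
    return BASE_URL + '?' + urlencode(params)

url = build_explore_url(9.0643305, 7.4892974,
                        'ID_PLACEHOLDER', 'SECRET_PLACEHOLDER', '20180605')

# Parse the query string back to check what was encoded
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['section'], query['radius'])
```

A dictionary also makes it easier to log the request without the secret, for instance by printing every parameter except `client_secret`.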

#### Execute the function for all the capital cities in the world

In [10]:
world_venues = getNearbyVenues(names=world_data['Capital City'], 
                                   latitudes=world_data['Latitude'],
                                   longitudes=world_data['Longitude'],
                                   section='food'
                              )

Abu Dhabi
Abuja
Accra
Addis Ababa
Algiers
Amman
Amsterdam
Andorra la Vella
Ankara
Antananarivo
Apia
Ashgabat
Asmara
Astana
Asuncion
Athens
Baghdad
Baku
Bamako
Bandar Seri Begawan
Bangkok
Bangui
Banjul
Basseterre
Beijing
Beirut
Belfast
Belgrade
Belmopan
Berlin
Bern
Bishkek
Bissau
Bogota
Brasilia
Bratislava
Bridgetown
Brussels
Bucharest
Budapest
Buenos Aires
Cairo
Canberra
Caracas
Cardiff
Castries
Cayenne
Chisinau
Colombo
Conakry
Copenhagen
Dakar
Damascus
Dhaka
Dili
Djibouti
Dodoma
Doha
Dublin
Dushanbe
Edinburgh
Freetown
Funafuti
Gaborone
Georgetown
Gitega
Guatemala City
Hanoi
Harare
Havana
Helsinki
Honiara
Islamabad
Jakarta
Juba
Kabul
Kampala
Kathmandu
Khartoum
Kiev
Kigali
Kingston
Kingstown
Kuala Lumpur
Kuwait City
La Paz
Libreville
Lilongwe
Lima
Lisbon
Ljubljana
Lome
London
Luanda
Lusaka
Luxembourg
Madrid
Majuro
Malabo
Male
Managua
Manama
Manila
Maputo
Maseru
Melekeok
Mexico City
Minsk
Mogadishu
Monaco
Monrovia
Montevideo
Moroni
Moscow
Muscat
N'Djamena
Nairobi
Nassau
New Delhi
Niamey


Display the results.

In [11]:
print(world_venues.shape)
world_venues.to_csv('world_venues.csv')
world_venues.head()

(8276, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abuja,9.06433,7.489297,the secret garden,9.066731,7.490142,Italian Restaurant
1,Abuja,9.06433,7.489297,River Plate Garden,9.06626,7.49006,Pizza Place
2,Abuja,9.06433,7.489297,Papillion Restaurant,9.064327,7.484541,African Restaurant
3,Abuja,9.06433,7.489297,Yahuza Suya Spot,9.071558,7.485696,BBQ Joint
4,Abuja,9.06433,7.489297,Hatlab,9.071944,7.488913,Deli / Bodega


In [12]:
world_venues = pd.read_csv('world_venues.csv', index_col=0)
world_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abuja,9.064331,7.489297,the secret garden,9.066731,7.490142,Italian Restaurant
1,Abuja,9.064331,7.489297,River Plate Garden,9.06626,7.49006,Pizza Place
2,Abuja,9.064331,7.489297,Papillion Restaurant,9.064327,7.484541,African Restaurant
3,Abuja,9.064331,7.489297,Yahuza Suya Spot,9.071558,7.485696,BBQ Joint
4,Abuja,9.064331,7.489297,Hatlab,9.071944,7.488913,Deli / Bodega


Display the number of unique food categories.

In [13]:
print('There are {} unique categories.'.format(len(world_venues['Venue Category'].unique())))

There are 187 unique categories.


### Analyze each capital city

In [14]:
# one hot encoding: one column per venue category, one row per venue
world_onehot = pd.get_dummies(world_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood (capital city) column back to dataframe
world_onehot['Neighborhood'] = world_venues['Neighborhood'] 

# move neighborhood column from the last position to the first
fixed_columns = [world_onehot.columns[-1]] + list(world_onehot.columns[:-1])
world_onehot = world_onehot[fixed_columns]

world_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,Auvergne Restaurant,BBQ Joint,Bagel Shop,Bakery,Bavarian Restaurant,Beijing Restaurant,Belarusian Restaurant,Belgian Restaurant,Bistro,Blini House,Bossam/Jokbal Restaurant,Brasserie,Bratwurst Joint,Brazilian Restaurant,Breakfast Spot,Buffet,Bulgarian Restaurant,Burger Joint,Burgundian Restaurant,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Campanian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chettinad Restaurant,Chinese Restaurant,Cigkofte Place,Comfort Food Restaurant,Corsican Restaurant,Creperie,Cretan Restaurant,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Taverna,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Friterie,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Grilled Meat Restaurant,Gukbap Restaurant,Halal Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hot Dog Joint,Hotpot Restaurant,Hunan Restaurant,Hungarian Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kafenio,Kebab Restaurant,Kofte Place,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lyonese Bouchon,Mac & Cheese Joint,Magirio,Malay Restaurant,Manadonese Restaurant,Manti Place,Mediterranean Restaurant,Mexican Restaurant,Meyhane,Meze Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Modern Greek Restaurant,Molecular Gastronomy 
Restaurant,Mongolian Restaurant,Moroccan Restaurant,Nabe Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Ouzeri,Padangnese Restaurant,Paella Restaurant,Pakistani Restaurant,Peking Duck Restaurant,Persian Restaurant,Peruvian Restaurant,Pet Café,Pide Place,Pizza Place,Poke Place,Polish Restaurant,Portuguese Restaurant,Poutine Place,Ramen Restaurant,Restaurant,Rhenisch Restaurant,Roman Restaurant,Romanian Restaurant,Russian Restaurant,Salad Place,Samgyetang Restaurant,Sandwich Place,Scandinavian Restaurant,Scottish Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shanxi Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sukiyaki Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tatar Restaurant,Taverna,Tempura Restaurant,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Tonkatsu Restaurant,Trattoria/Osteria,Turkish Home Cooking Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Unagi Restaurant,Varenyky restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wagashi Place,Wings Joint,Yakitori Restaurant,Yoshoku Restaurant,Yunnan Restaurant
0,Abuja,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abuja,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abuja,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abuja,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abuja,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [15]:
world_onehot.shape

(8276, 188)

#### Group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [16]:
world_grouped = world_onehot.groupby('Neighborhood').mean().reset_index()
world_grouped.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,African Restaurant,Alsatian Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,Auvergne Restaurant,BBQ Joint,Bagel Shop,Bakery,Bavarian Restaurant,Beijing Restaurant,Belarusian Restaurant,Belgian Restaurant,Bistro,Blini House,Bossam/Jokbal Restaurant,Brasserie,Bratwurst Joint,Brazilian Restaurant,Breakfast Spot,Buffet,Bulgarian Restaurant,Burger Joint,Burgundian Restaurant,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Campanian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chettinad Restaurant,Chinese Restaurant,Cigkofte Place,Comfort Food Restaurant,Corsican Restaurant,Creperie,Cretan Restaurant,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Taverna,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Friterie,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Grilled Meat Restaurant,Gukbap Restaurant,Halal Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hot Dog Joint,Hotpot Restaurant,Hunan Restaurant,Hungarian Restaurant,Indian Chinese Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kafenio,Kebab Restaurant,Kofte Place,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lyonese Bouchon,Mac & Cheese Joint,Magirio,Malay Restaurant,Manadonese Restaurant,Manti Place,Mediterranean Restaurant,Mexican Restaurant,Meyhane,Meze Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Modern Greek Restaurant,Molecular Gastronomy 
Restaurant,Mongolian Restaurant,Moroccan Restaurant,Nabe Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Ouzeri,Padangnese Restaurant,Paella Restaurant,Pakistani Restaurant,Peking Duck Restaurant,Persian Restaurant,Peruvian Restaurant,Pet Café,Pide Place,Pizza Place,Poke Place,Polish Restaurant,Portuguese Restaurant,Poutine Place,Ramen Restaurant,Restaurant,Rhenisch Restaurant,Roman Restaurant,Romanian Restaurant,Russian Restaurant,Salad Place,Samgyetang Restaurant,Sandwich Place,Scandinavian Restaurant,Scottish Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Shanxi Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sukiyaki Restaurant,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tatar Restaurant,Taverna,Tempura Restaurant,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Tibetan Restaurant,Tonkatsu Restaurant,Trattoria/Osteria,Turkish Home Cooking Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Unagi Restaurant,Varenyky restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wagashi Place,Wings Joint,Yakitori Restaurant,Yoshoku Restaurant,Yunnan Restaurant
0,Abuja,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.210526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.157895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Accra,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.444444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Addis Ababa,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Algiers,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Amman,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.05,0.0,0.0,0.0,0.36,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
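
Averaging one-hot columns per group yields the share of each category among that neighborhood's venues, so every row sums to 1. The sketch below checks this on invented data and compares it with a row-normalised `pd.crosstab`, which is one alternative way to get the same table. The neighborhoods `X` and `Y` and their venues are made up.

```python
import pandas as pd

# Invented venues for two neighborhoods
venues = pd.DataFrame({
    'Neighborhood': ['X', 'X', 'X', 'X', 'Y', 'Y'],
    'Venue Category': ['Café', 'Café', 'Bakery', 'Diner', 'Diner', 'Diner'],
})

# Route used in this notebook: one-hot encode, then average per group
onehot = pd.get_dummies(venues['Venue Category']).astype(float)
by_mean = onehot.groupby(venues['Neighborhood']).mean()

# Alternative route: contingency table normalised over each row
by_crosstab = pd.crosstab(venues['Neighborhood'], venues['Venue Category'],
                          normalize='index')

print(by_mean)
print(by_crosstab)
print(by_mean.sum(axis=1))
```

Because the features are shares rather than counts, a capital with 100 venues and one with 10 venues are compared on the same scale.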


#### Check the size of the grouped dataframe

In [17]:
world_grouped.shape

(173, 188)

#### Print each neighborhood along with the top 5 most common venues

In [18]:
num_top_venues = 5

for hood in world_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = world_grouped[world_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abuja----
                venue  freq
0  Italian Restaurant  0.21
1         Pizza Place  0.16
2           BBQ Joint  0.11
3                Café  0.11
4  African Restaurant  0.11


----Accra----
                venue  freq
0          Restaurant  0.44
1  African Restaurant  0.22
2      Breakfast Spot  0.11
3    Swiss Restaurant  0.11
4   Indian Restaurant  0.11


----Addis Ababa----
                  venue  freq
0  Ethiopian Restaurant  0.33
1            Restaurant  0.17
2      Greek Restaurant  0.08
3   American Restaurant  0.08
4                  Café  0.08


----Algiers----
                   venue  freq
0                   Café   1.0
1      Afghan Restaurant   0.0
2            Pizza Place   0.0
3  Padangnese Restaurant   0.0
4      Paella Restaurant   0.0


----Amman----
                       venue  freq
0                       Café  0.36
1  Middle Eastern Restaurant  0.14
2         Italian Restaurant  0.07
3               Burger Joint  0.05
4         Falafel Restaurant  0.04




### Put that into a *pandas* dataframe

#### Function to sort the venues in descending order

In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
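
When several categories have the same frequency, and in particular when many are 0, the order returned by a plain sort depends on the sorting algorithm. A possible variant, shown here as a sketch rather than as the function used in this notebook, breaks ties alphabetically and skips categories that never occur. `top_categories` and the sample rows are hypothetical.

```python
import pandas as pd

def top_categories(row, n):
    """Top-n category names by share; ties broken by name, zero shares skipped."""
    shares = row.iloc[1:].astype(float)   # drop the leading 'Neighborhood' entry
    shares = shares[shares > 0]           # ignore categories that never occur
    ordered = sorted(shares.index, key=lambda name: (-shares[name], name))
    return ordered[:n]

# Invented rows in the same layout as world_grouped: name first, then shares
row = pd.Series({'Neighborhood': 'Toyville', 'Bakery': 0.1, 'Café': 0.6, 'Diner': 0.3})
tied = pd.Series({'Neighborhood': 'Tietown', 'Diner': 0.5, 'Bakery': 0.5})
single = pd.Series({'Neighborhood': 'Solo', 'Bakery': 0.0, 'Café': 1.0})

print(top_categories(row, 2))
print(top_categories(tied, 2))
print(top_categories(single, 2))
```

With this variant a capital that has a single category would show one entry instead of padding the ranking with categories whose frequency is 0.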

#### Create the new dataframe and display the top 10 venues for each neighborhood.

In [20]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = world_grouped['Neighborhood']

for ind in np.arange(world_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(world_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abuja,Italian Restaurant,Pizza Place,African Restaurant,Café,BBQ Joint,Bakery,Deli / Bodega,Steakhouse,Burrito Place,Food
1,Accra,Restaurant,African Restaurant,Breakfast Spot,Indian Restaurant,Swiss Restaurant,Donut Shop,Dumpling Restaurant,French Restaurant,Food Truck,Food Stand
2,Addis Ababa,Ethiopian Restaurant,Restaurant,American Restaurant,Italian Restaurant,Greek Restaurant,Café,Chinese Restaurant,French Restaurant,Filipino Restaurant,Food Stand
3,Algiers,Café,Yunnan Restaurant,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
4,Amman,Café,Middle Eastern Restaurant,Italian Restaurant,Burger Joint,Breakfast Spot,Falafel Restaurant,Sandwich Place,Bakery,Doner Restaurant,Snack Place


### Cluster capital cities

#### Run *k*-means to cluster the capital cities into 5 clusters.

In [21]:
# set number of clusters
kclusters = 5

world_grouped_clustering = world_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(world_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 0, 2, 1, 0, 0, 1, 0, 1], dtype=int32)
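The number of clusters is fixed at 5 in this notebook. As a hedged sketch of how other values of *k* could be compared, the example below scores several candidates with the silhouette coefficient on synthetic data. The array `X` is a made-up stand-in for `world_grouped_clustering`, built with three loose groups, so the result says nothing about the real restaurant data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)

# synthetic stand-in: 60 rows, 4 frequency-like features, three loose groups
centers = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.1, 0.7, 0.1, 0.1],
                    [0.1, 0.1, 0.4, 0.4]])
X = np.vstack([c + rng.normal(scale=0.05, size=(20, 4)) for c in centers])

# fit k-means for each candidate k and record the mean silhouette coefficient
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    scores[k] = silhouette_score(X, model.labels_)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print('best k on the synthetic data:', best_k)
```

The label numbers that k-means assigns are arbitrary, so only the grouping matters, not whether a city ends up in cluster 0 or cluster 3.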

#### Create a new dataframe that includes the cluster as well as the top 10 venues for each capital city.

In [22]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
# print(type(neighborhoods_venues_sorted)) # DataFrame
# print(neighborhoods_venues_sorted.shape) # (99, 12)
# print(world_data.shape) # (103, 5)

In [23]:
world_merged = world_data
# print(world_merged)

# join neighborhoods_venues_sorted onto world_data to add latitude/longitude for each capital city
world_merged = world_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Capital City', how='inner')
# print(type(world_merged['Cluster Labels'].iloc[0])) # int32 if joined with how='inner'. float64 if how='left'

world_merged.to_csv('world_merged.csv')
world_merged # check the last columns!

Unnamed: 0,Country,Capital City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Nigeria,Abuja,9.06433,7.489297,0,Italian Restaurant,Pizza Place,African Restaurant,Café,BBQ Joint,Bakery,Deli / Bodega,Steakhouse,Burrito Place,Food
2,Ghana,Accra,5.560014,-0.205744,3,Restaurant,African Restaurant,Breakfast Spot,Indian Restaurant,Swiss Restaurant,Donut Shop,Dumpling Restaurant,French Restaurant,Food Truck,Food Stand
3,Ethiopia,Addis Ababa,9.010793,38.761253,0,Ethiopian Restaurant,Restaurant,American Restaurant,Italian Restaurant,Greek Restaurant,Café,Chinese Restaurant,French Restaurant,Filipino Restaurant,Food Stand
4,Algeria,Algiers,28.000027,2.999983,2,Café,Yunnan Restaurant,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
5,Jordan,Amman,31.951569,35.923963,1,Café,Middle Eastern Restaurant,Italian Restaurant,Burger Joint,Breakfast Spot,Falafel Restaurant,Sandwich Place,Bakery,Doner Restaurant,Snack Place
6,Netherlands,Amsterdam,52.37454,4.897976,0,Café,Italian Restaurant,Restaurant,Chinese Restaurant,French Restaurant,Bakery,Deli / Bodega,Thai Restaurant,Indian Restaurant,Breakfast Spot
7,Andorra,Andorra la Vella,42.506939,1.521247,0,Restaurant,Spanish Restaurant,Tapas Restaurant,Burger Joint,Café,Diner,French Restaurant,Pizza Place,BBQ Joint,Fast Food Restaurant
8,Turkey,Ankara,39.920777,32.854067,1,Café,Turkish Restaurant,Kebab Restaurant,Sandwich Place,Restaurant,Doner Restaurant,Seafood Restaurant,Diner,Bakery,Food Truck
9,Madagascar,Antananarivo,-18.910012,47.525581,0,Restaurant,Burger Joint,Café,French Restaurant,African Restaurant,Italian Restaurant,Steakhouse,Vietnamese Restaurant,Sushi Restaurant,BBQ Joint
10,Samoa,Apia,-13.834369,-171.769279,1,Café,Restaurant,Pizza Place,Burger Joint,Fast Food Restaurant,Indian Restaurant,Chinese Restaurant,Bakery,Food Stand,Food Court
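The choice of `how='inner'` in the join matters for the cluster label column. The sketch below uses two miniature frames (hypothetical stand-ins for `world_data` and `neighborhoods_venues_sorted`) to show that a left join keeps unmatched cities and turns the labels into floats, while an inner join drops those cities and keeps integer labels.

```python
import pandas as pd

# hypothetical miniature versions of world_data and neighborhoods_venues_sorted
cities = pd.DataFrame({
    'Country': ['Nigeria', 'Ghana', 'Nowhere'],
    'Capital City': ['Abuja', 'Accra', 'Missing City'],
})
labels = pd.DataFrame({
    'Neighborhood': ['Abuja', 'Accra'],
    'Cluster Labels': [0, 3],
})
right = labels.set_index('Neighborhood')

inner = cities.join(right, on='Capital City', how='inner')
left = cities.join(right, on='Capital City', how='left')

print(inner['Cluster Labels'].dtype)  # integer dtype: every kept row has a label
print(left['Cluster Labels'].dtype)   # float dtype: the unmatched row holds NaN
print(len(inner), len(left))
```

Integer labels are convenient later on, because they can be used directly as list indices when picking a marker colour.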


#### Visualize the resulting clusters

In [24]:
# create map
latitude = 0.0
longitude = 0.0
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=2)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(world_merged['Latitude'], world_merged['Longitude'], world_merged['Capital City'], world_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
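The marker colour is looked up with `rainbow[cluster-1]` although the labels start at 0. The short check below, a sketch that rebuilds a five-colour palette in the same way (the names `palette`, `offset_map` and `direct_map` are introduced here for illustration), shows why this still works: label 0 wraps around to the last colour, and every cluster keeps a distinct colour.

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 5
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# labels run from 0 to kclusters-1, so label 0 with the "-1" offset uses index -1
offset_map = {label: palette[label - 1] for label in range(kclusters)}
direct_map = {label: palette[label] for label in range(kclusters)}

# negative indexing wraps around to the end of the list
print(offset_map[0] == palette[-1])
# the offset only rotates the palette; no two clusters share a colour
print(len(set(offset_map.values())) == kclusters)
print(offset_map)
```

Indexing with `rainbow[cluster]` would give the same set of colours in an unrotated order.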

### Examine Clusters

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster. I will leave this exercise to you.
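One possible starting point for that examination is to count, per cluster, which category most often ranks first. The sketch below does this for six rows copied from the `world_merged` output above; the frame name `sample` and the summary column names are introduced here for illustration, and six rows are far too few to characterise the real clusters.

```python
import pandas as pd

# a few rows copied from the world_merged output above
sample = pd.DataFrame({
    'Capital City': ['Abuja', 'Amman', 'Ankara', 'Algiers', 'Accra', 'Apia'],
    'Cluster Labels': [0, 1, 1, 2, 3, 1],
    '1st Most Common Venue': ['Italian Restaurant', 'Café', 'Café', 'Café',
                              'Restaurant', 'Café'],
})

# for each cluster: how many cities it has and which category most often ranks first
summary = sample.groupby('Cluster Labels')['1st Most Common Venue'].agg(
    cities='count',
    top_category=lambda s: s.value_counts().idxmax(),
)
print(summary)
```

Running the same aggregation on the full `world_merged` frame would give a one-line description per cluster that can then be checked against the tables below.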

#### Cluster 1

In [25]:
cluster1 = world_merged.loc[world_merged['Cluster Labels'] == 0, world_merged.columns[[1] + list(range(5, world_merged.shape[1]))]]
print(cluster1.shape)
cluster1

(104, 11)


Unnamed: 0,Capital City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Abuja,Italian Restaurant,Pizza Place,African Restaurant,Café,BBQ Joint,Bakery,Deli / Bodega,Steakhouse,Burrito Place,Food
3,Addis Ababa,Ethiopian Restaurant,Restaurant,American Restaurant,Italian Restaurant,Greek Restaurant,Café,Chinese Restaurant,French Restaurant,Filipino Restaurant,Food Stand
6,Amsterdam,Café,Italian Restaurant,Restaurant,Chinese Restaurant,French Restaurant,Bakery,Deli / Bodega,Thai Restaurant,Indian Restaurant,Breakfast Spot
7,Andorra la Vella,Restaurant,Spanish Restaurant,Tapas Restaurant,Burger Joint,Café,Diner,French Restaurant,Pizza Place,BBQ Joint,Fast Food Restaurant
9,Antananarivo,Restaurant,Burger Joint,Café,French Restaurant,African Restaurant,Italian Restaurant,Steakhouse,Vietnamese Restaurant,Sushi Restaurant,BBQ Joint
13,Astana,Asian Restaurant,Fast Food Restaurant,Restaurant,Eastern European Restaurant,Café,Modern European Restaurant,Middle Eastern Restaurant,Turkish Restaurant,BBQ Joint,Yunnan Restaurant
14,Asuncion,Restaurant,Café,Fast Food Restaurant,Bakery,Pizza Place,Breakfast Spot,South American Restaurant,Buffet,Italian Restaurant,German Restaurant
20,Bangkok,Buffet,Food Truck,Fast Food Restaurant,French Restaurant,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
24,Beijing,Chinese Restaurant,French Restaurant,Asian Restaurant,Café,Peking Duck Restaurant,Fast Food Restaurant,Yunnan Restaurant,Restaurant,American Restaurant,Eastern European Restaurant
25,Beirut,Café,Restaurant,Middle Eastern Restaurant,Lebanese Restaurant,Fast Food Restaurant,Diner,American Restaurant,Mediterranean Restaurant,Pizza Place,Bakery


#### Cluster 2

In [26]:
cluster2 = world_merged.loc[world_merged['Cluster Labels'] == 1, world_merged.columns[[1] + list(range(5, world_merged.shape[1]))]]
print(cluster2.shape)
cluster2

(42, 11)


Unnamed: 0,Capital City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Amman,Café,Middle Eastern Restaurant,Italian Restaurant,Burger Joint,Breakfast Spot,Falafel Restaurant,Sandwich Place,Bakery,Doner Restaurant,Snack Place
8,Ankara,Café,Turkish Restaurant,Kebab Restaurant,Sandwich Place,Restaurant,Doner Restaurant,Seafood Restaurant,Diner,Bakery,Food Truck
10,Apia,Café,Restaurant,Pizza Place,Burger Joint,Fast Food Restaurant,Indian Restaurant,Chinese Restaurant,Bakery,Food Stand,Food Court
11,Ashgabat,Café,Restaurant,Snack Place,Fast Food Restaurant,Italian Restaurant,Gastropub,Eastern European Restaurant,Pizza Place,BBQ Joint,Fish & Chips Shop
12,Asmara,Café,Restaurant,Asian Restaurant,BBQ Joint,Yunnan Restaurant,Filipino Restaurant,French Restaurant,Food Truck,Food Stand,Food Court
15,Athens,Café,Greek Restaurant,Souvlaki Shop,Falafel Restaurant,Bistro,Taverna,Modern Greek Restaurant,Vegetarian / Vegan Restaurant,Magirio,Indian Restaurant
17,Baku,Café,Restaurant,Turkish Restaurant,Steakhouse,Italian Restaurant,Eastern European Restaurant,Comfort Food Restaurant,Middle Eastern Restaurant,Kebab Restaurant,Pizza Place
19,Bandar Seri Begawan,Asian Restaurant,Café,Food Court,Seafood Restaurant,Chinese Restaurant,Italian Restaurant,Fast Food Restaurant,Indian Restaurant,Japanese Restaurant,BBQ Joint
28,Belmopan,Café,Deli / Bodega,Wings Joint,Restaurant,Chinese Restaurant,Pizza Place,Fast Food Restaurant,Food Stand,Food Court,Food
35,Bratislava,Café,Vegetarian / Vegan Restaurant,Bistro,Bakery,Burger Joint,Indian Restaurant,Italian Restaurant,Vietnamese Restaurant,Restaurant,Thai Restaurant


#### Cluster 3

In [27]:
cluster3 = world_merged.loc[world_merged['Cluster Labels'] == 2, world_merged.columns[[1] + list(range(5, world_merged.shape[1]))]]
print(cluster3.shape)
cluster3

(6, 11)


Unnamed: 0,Capital City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Algiers,Café,Yunnan Restaurant,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
16,Baghdad,Café,Yunnan Restaurant,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
97,Majuro,Café,Yunnan Restaurant,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
105,Melekeok,Café,Yunnan Restaurant,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
154,Sana'a,Café,Kebab Restaurant,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
170,Tehran,Café,Persian Restaurant,Breakfast Spot,Pizza Place,Falafel Restaurant,Restaurant,Kebab Restaurant,Bagel Shop,Sandwich Place,Comfort Food Restaurant


#### Cluster 4

In [28]:
cluster4 = world_merged.loc[world_merged['Cluster Labels'] == 3, world_merged.columns[[1] + list(range(5, world_merged.shape[1]))]]
print(cluster4.shape)
cluster4

(10, 11)


Unnamed: 0,Capital City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Accra,Restaurant,African Restaurant,Breakfast Spot,Indian Restaurant,Swiss Restaurant,Donut Shop,Dumpling Restaurant,French Restaurant,Food Truck,Food Stand
42,Canberra,Restaurant,Yunnan Restaurant,Falafel Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
52,Damascus,Restaurant,Diner,Café,Persian Restaurant,Fast Food Restaurant,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna
65,Gitega,American Restaurant,Restaurant,Yunnan Restaurant,Fast Food Restaurant,French Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant
74,Juba,Restaurant,Eastern European Restaurant,Indian Restaurant,Yunnan Restaurant,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant
91,Lome,Restaurant,Japanese Restaurant,Spanish Restaurant,Falafel Restaurant,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
119,Niamey,Restaurant,French Restaurant,African Restaurant,Italian Restaurant,Fast Food Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant
125,Ouagadougou,Restaurant,Bistro,Yunnan Restaurant,Filipino Restaurant,French Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant
135,Port au Prince,Restaurant,Buffet,Fast Food Restaurant,French Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna
139,Praia,Restaurant,Bakery,Café,Yunnan Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna


#### Cluster 5

In [29]:
cluster5 = world_merged.loc[world_merged['Cluster Labels'] == 4, world_merged.columns[[1] + list(range(5, world_merged.shape[1]))]]
print(cluster5.shape)
cluster5

(11, 11)


Unnamed: 0,Capital City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Basseterre,Fast Food Restaurant,Caribbean Restaurant,Restaurant,American Restaurant,Chinese Restaurant,Pizza Place,Yunnan Restaurant,Food Stand,Food Court,Food
36,Bridgetown,Fast Food Restaurant,Caribbean Restaurant,Seafood Restaurant,Restaurant,Burger Joint,Fish & Chips Shop,Pizza Place,Snack Place,Food Court,Sandwich Place
46,Cayenne,Fast Food Restaurant,Japanese Restaurant,Chinese Restaurant,French Restaurant,Vietnamese Restaurant,Falafel Restaurant,Food Truck,Food Stand,Food Court,Food
63,Gaborone,Fast Food Restaurant,Restaurant,Steakhouse,Portuguese Restaurant,Bistro,Pizza Place,Yunnan Restaurant,Food Court,Food,Fondue Restaurant
81,Kingston,Fast Food Restaurant,Restaurant,Bakery,Pizza Place,Café,Caribbean Restaurant,Chinese Restaurant,Diner,Sandwich Place,Empanada Restaurant
87,Lilongwe,Fast Food Restaurant,Mexican Restaurant,Italian Restaurant,Café,Pizza Place,Yunnan Restaurant,Food Stand,Food Court,Food,Fondue Restaurant
94,Lusaka,Fast Food Restaurant,American Restaurant,Café,Indian Restaurant,Bakery,Yunnan Restaurant,Food Truck,Food Stand,Food Court,Food
102,Manila,Fast Food Restaurant,Chinese Restaurant,Filipino Restaurant,Café,Pizza Place,Bakery,Restaurant,Japanese Restaurant,Asian Restaurant,BBQ Joint
110,Monrovia,Fast Food Restaurant,Yunnan Restaurant,French Restaurant,Food Truck,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna,Fish & Chips Shop
149,Saint George's,Caribbean Restaurant,Seafood Restaurant,Fast Food Restaurant,Pizza Place,Ethiopian Restaurant,Food Stand,Food Court,Food,Fondue Restaurant,Fish Taverna


## Results and Discussion <a name="results"></a>

173 out of 200 countries were clustered into 5 clusters based on their lineup of restaurants. 18 countries were not clustered because the Foursquare API did not return any venues for them. The other 9 countries were not clustered because the list of countries and capital cities I used contained extra characters, so geolocator.geocode failed. Cleaning up the list beforehand would have avoided this.
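The clean-up mentioned above could look roughly like the sketch below. It is an assumption about what the "extra characters" are (bracketed notes, footnote symbols, non-breaking spaces); the function `clean_place_name` and the raw entries are hypothetical, and the real list may need different rules.

```python
import re

def clean_place_name(raw):
    """Strip bracketed notes, footnote markers and stray whitespace from a name."""
    name = re.sub(r'\(.*?\)|\[.*?\]', ' ', raw)  # drop "(note)" or "[1]" style additions
    name = re.sub(r'[*†‡]+', ' ', name)           # drop footnote symbols
    name = re.sub(r'\s+', ' ', name)              # collapse whitespace, including non-breaking spaces
    return name.strip()

# hypothetical raw entries; the real list may be dirty in other ways
raw_names = ['La Paz (administrative)', 'Pretoria*', '  Kuala\u00a0Lumpur ', 'Abuja']
cleaned = [clean_place_name(n) for n in raw_names]
print(cleaned)
```

Passing the cleaned names to the geocoder, and logging any that still return no result, would make it easier to see which of the 9 failures are due to formatting.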

**Cluster 1**
104 countries, like China, Japan and the United States, were clustered into Cluster 1.

**Cluster 2**
42 countries, like Greece, Egypt and Singapore, were clustered into Cluster 2.

**Cluster 3**
6 countries, like Algeria, Iraq and Iran, were clustered into Cluster 3.

**Cluster 4**
10 countries, like Ghana, Australia and Syria, were clustered into Cluster 4.

**Cluster 5**
11 countries, like Jamaica, the Philippines and Grenada, were clustered into Cluster 5.

**For improvements**
* The granularity of categories is not consistent. For example, some restaurants are categorized simply as 'Restaurant' while others are categorized as 'Sushi Restaurant'. It may be better to ignore the restaurants categorized as 'Restaurant'.
* I explored only within a radius of 1000 meters from the center of each capital city. A larger area would provide more information and a more accurate analysis.
* This analysis focuses on the relative frequency of categories rather than the number of restaurants. You may not be able to find a restaurant easily in some countries. Another analysis is required to avoid choosing countries with few restaurants.
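Two of the points above, dropping the generic 'Restaurant' label and keeping track of how many venues each city actually has, can be sketched as follows. The frame `venues`, its column names and the threshold `min_venues` are assumptions made for this toy example, not values taken from the analysis.

```python
import pandas as pd

# hypothetical venue rows, one row per venue returned for a city
venues = pd.DataFrame({
    'Neighborhood': ['City A'] * 4 + ['City B'] * 2,
    'Venue Category': ['Restaurant', 'Sushi Restaurant', 'Café', 'Café',
                       'Restaurant', 'Pizza Place'],
})

# 1) ignore the generic 'Restaurant' label before computing frequencies
specific = venues[venues['Venue Category'] != 'Restaurant']

# 2) share of each category per city (each row sums to 1)
freq = pd.crosstab(specific['Neighborhood'], specific['Venue Category'], normalize='index')

# 3) keep the raw counts so that cities with very few venues can be flagged
counts = specific.groupby('Neighborhood').size()
min_venues = 2  # arbitrary threshold chosen for this toy example
enough = counts[counts >= min_venues].index

print(freq.loc[enough])
```

Clustering only the rows in `freq.loc[enough]` would keep cities with too little data from forming clusters of their own.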

## Conclusion <a name="conclusion"></a>

This project successfully clustered most of the countries in the world into 5 clusters based on their lineup of restaurants. By selecting one of the countries in the same cluster as yours, you should be able to find restaurants there similar to those in your own country.

Technically, the geographical information obtained with the geopy.geocoders library and the venue information provided by the Foursquare API can be applied not only to restaurants but also to other categories such as arts, outdoors and sights, with only a small modification to the source code above.