## __Capstone Project - The Battle of Neighborhoods__

### A PIECE OF ART IN THE CAPITAL CITIES OF ALL COUNTRIES

### __INTRODUCTION__

Where would you send an art adventurer who wants to satisfy a hunger for all kinds of art? The world is full of wonders, and perhaps the most ambiguous of them are the ones we call art. Many people dedicate themselves to traveling the world to discover these ambiguities. This project looks for art-related venues in the capital cities of all countries, clusters the cities by those venues, and reveals how cities differ in their art venues even when they are geographically close. Travel companies and their customers might therefore use the results to find similar and different places around the world.

### __DATA__

To accomplish this goal, the latitudes and longitudes of the capital cities of all countries are required. "simplemaps.com" offers a simple, accurate and up-to-date database of the world's cities and their locations. From this data, I select the capital cities and, using their latitude and longitude values, explore the venues in the "arts" section around each city with the Foursquare API.

The "simplemaps" data contains the city name, latitude, longitude and country name, along with many other features, for cities around the world. I use only these four features, after filtering for the capital city of each country. After cleaning, the data has 225 countries with 225 capital cities.

Then I use the acquired location data to explore nearby art venues through the Foursquare API. Using the "explore" option, I request the top 25 art venues within a 10 km radius of each city center and keep the "Venue", "Venue Latitude", "Venue Longitude" and "Venue Category" columns, with 3316 rows in total.

The data cleaning steps are as follows:

### __PART 1: DATA PREPROCESSING__

__Before we get the data and start exploring it, let's import the libraries that we will need__

In [1]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner

import requests # library to handle requests
import zipfile # library to handle .zip files
import io # library to handle io

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed yet
import folium # map rendering library

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import warnings
warnings.filterwarnings("ignore")

print('Libraries imported.')

Libraries imported.


__Now we can download our data. It comes from the accurate and up-to-date database of the world's cities offered by "simplemaps"__

In [2]:
url = "https://simplemaps.com/static/data/world-cities/basic/simplemaps_worldcities_basicv1.4.zip"

r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))

df = pd.read_csv(z.open('worldcities.csv'))

print('Data is downloaded.')
print('The shape of the data is:', df.shape)
df.head()

Data is downloaded.
The shape of the data is: (12893, 11)


Unnamed: 0,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
0,Malishevë,Malisheve,42.4822,20.7458,Kosovo,XK,XKS,Malishevë,admin,,1901597212
1,Prizren,Prizren,42.2139,20.7397,Kosovo,XK,XKS,Prizren,admin,,1901360309
2,Zubin Potok,Zubin Potok,42.9144,20.6897,Kosovo,XK,XKS,Zubin Potok,admin,,1901608808
3,Kamenicë,Kamenice,42.5781,21.5803,Kosovo,XK,XKS,Kamenicë,admin,,1901851592
4,Viti,Viti,42.3214,21.3583,Kosovo,XK,XKS,Viti,admin,,1901328795


__In case the link stops working, I also saved the data in .csv format; it can be read by uncommenting the following cell__

In [3]:
# df = pd.read_csv('worldcities.csv')
# print('Data is downloaded.')
# print('The shape of the data is:', df.shape)
# df.head()
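
The download and the local fallback can also be combined in one helper. The following is only a minimal sketch: the function names `read_cities_from_zip` and `load_cities`, the 30-second timeout and the choice of exceptions to catch are assumptions made for illustration, not part of the original notebook.

```python
import io
import zipfile

import pandas as pd
import requests


def read_cities_from_zip(content, member='worldcities.csv'):
    """Read one CSV member out of an in-memory .zip archive."""
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        with archive.open(member) as handle:
            return pd.read_csv(handle)


def load_cities(url, local_path='worldcities.csv', timeout=30):
    """Try the download first; fall back to the local .csv copy if it fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn HTTP errors into exceptions
        return read_cities_from_zip(response.content)
    except (requests.RequestException, zipfile.BadZipFile, KeyError):
        # network problem, broken archive, or missing member: use the local file
        return pd.read_csv(local_path)
```

Calling `load_cities(url)` with the `url` defined earlier would then return the same dataframe whether the link works or not, provided the local file exists.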

__Next, we can filter the data and keep only the capitals__

In [4]:
df = df[df['capital'] == 'primary'].reset_index(drop=True)
print('The shape of the data is:', df.shape)
print('However there are only {} countries' .format(len(df['country'].unique())))
df.head()

The shape of the data is: (234, 11)
However there are only 225 countries


Unnamed: 0,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
0,Pristina,Pristina,42.6666,21.1724,Kosovo,XK,XKS,Prishtinë,primary,,1901760068
1,Longyearbyen,Longyearbyen,78.2167,15.6333,Svalbard,XR,XSV,,primary,,1930654114
2,Sanaa,Sanaa,15.3547,44.2066,Yemen,YE,YEM,Amānat al ‘Āşimah,primary,2008000.0,1887750814
3,Pretoria,Pretoria,-25.7069,28.2294,South Africa,ZA,ZAF,Gauteng,primary,1338000.0,1710176249
4,Bloemfontein,Bloemfontein,-29.12,26.2299,South Africa,ZA,ZAF,Free State,primary,463064.0,1710495933


__Although there are 225 countries, there are 234 capital cities, because some countries have multiple capital cities. Since I want only one city per country, I will select the capital with the largest population__

In [5]:
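# sort so that, within each country, the most populous capital comes last;
# na_position='first' keeps capitals with a missing population from sorting after it
df.sort_values(['country', 'population'], na_position='first', inplace=True)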

In [6]:
df.shape

(234, 11)

In [7]:
df = df.drop_duplicates(subset='country', keep="last").reset_index(drop=True)
print('There are {} unique cities' .format(len(df['city'])))
print('There are {} unique countries' .format(len(df['country'])))

There are 225 unique cities
There are 225 unique countries
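
A toy illustration of the "keep the most populous capital" step may help here. By default `sort_values` places missing values last, so this sketch passes `na_position='first'` to make sure a capital with a known population wins over one without. The dataframe `toy` and its values are invented for the example.

```python
import numpy as np
import pandas as pd

# invented data: country A has three capitals, one with unknown population
toy = pd.DataFrame({
    'country': ['A', 'A', 'A', 'B'],
    'city': ['A1', 'A2', 'A3', 'B1'],
    'population': [1000.0, 5000.0, np.nan, np.nan],
})

largest = (
    toy.sort_values(['country', 'population'], na_position='first')
       .drop_duplicates(subset='country', keep='last')  # last row = largest population
       .reset_index(drop=True)
)
print(largest)
```

Country B keeps its only capital even though the population is missing, while country A keeps `A2` rather than the row with the unknown population.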


__So now there is only one city for one country. Let's keep only the relevant columns in the data__

In [8]:
df = df[['country', 'city', 'lat', 'lng']].sort_values('country').reset_index(drop=True)
print(df.shape)
df.head()

(225, 4)


Unnamed: 0,country,city,lat,lng
0,Afghanistan,Kabul,34.5167,69.1833
1,Albania,Tirana,41.3275,19.8189
2,Algeria,Algiers,36.7631,3.0506
3,American Samoa,Pago Pago,-14.274,-170.7046
4,Andorra,Andorra,42.5,1.5165


In [9]:
# Create a world map using the data
map_world = folium.Map(location=[0, 0], zoom_start=1)

# add markers to map
for lat, lng, city, country in zip(df['lat'], df['lng'], df['city'], df['country']):
    label = '{}, {}'.format(city, country)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=1,
        popup=label,
        color='red',
        fill=True,
        fill_color='#000000',
        fill_opacity=0.5,
        parse_html=False).add_to(map_world)  
    
map_world

In [10]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=10000, LIMIT=25, section='arts'):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&section={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            section,
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['shortName']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
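
The function indexes straight into `["response"]['groups'][0]['items']`, which raises an error as soon as a city returns no groups or a venue has no category. A more defensive parsing step could look like the sketch below. The helper name `extract_venue_rows` is hypothetical, and the payload layout is assumed to be exactly the one the function above relies on.

```python
def extract_venue_rows(city, lat, lng, payload):
    """Flatten one 'explore' JSON payload into row tuples, tolerating missing keys."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # a venue without categories gets None instead of raising IndexError
        category = categories[0].get('shortName') if categories else None
        rows.append((city, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows


# usage with a hand-written payload in the assumed layout
payload = {'response': {'groups': [{'items': [{'venue': {
    'name': 'Air Museum Kabul',
    'location': {'lat': 34.518995, 'lng': 69.191667},
    'categories': [{'shortName': 'History Museum'}],
}}]}]}}
rows = extract_venue_rows('Kabul', 34.5167, 69.1833, payload)
print(rows)
```

An empty or malformed payload simply yields an empty list, so one failing city does not stop the whole loop.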

In [12]:
svenues = getNearbyVenues(names=df['city'], latitudes=df['lat'], longitudes=df['lng'])

Kabul
Tirana
Algiers
Pago Pago
Andorra
Luanda
The Valley
Saint John's
Buenos Aires
Yerevan
Oranjestad
Canberra
Vienna
Baku
Nassau
Manama
Dhaka
Bridgetown
Minsk
Brussels
Belmopan
Cotonou
Hamilton
Thimphu
La Paz
Sarajevo
Gaborone
Brasília
Bandar Seri Begawan
Sofia
Ouagadougou
Rangoon
Bujumbura
Praia
Phnom Penh
Yaounde
Ottawa
George Town
Bangui
Ndjamena
Santiago
Beijing
Flying Fish Cove
Bogota
Moroni
Brazzaville
Kinshasa
Avarua
San José
Zagreb
Havana
Willemstad
Nicosia
Prague
Yamoussoukro
København
Djibouti
Roseau
Santo Domingo
Quito
Cairo
San Salvador
Malabo
Asmara
Tallinn
Addis Ababa
Stanley
Tórshavn
Suva
Helsinki
Paris
Papeete
Libreville
Banjul
Tbilisi
Berlin
Accra
Gibraltar
Athens
Nuuk
Saint George's
Hagåtña
Guatemala
Saint Peter Port
Conakry
Bissau
Georgetown
Port-au-Prince
Tegucigalpa
Budapest
Reykjavík
New Delhi
Jakarta
Tehran
Baghdad
Dublin
Douglas
Rome
Kingston
Tokyo
Saint Helier
Amman
Astana
Nairobi
Tarawa
Pyongyang
Seoul
Pristina
Kuwait
Bishkek
Vientiane
Riga
Beirut
Maseru
Monr

__Let's check the size of the resulting dataframe__

In [11]:
print(svenues.shape)
svenues.head()

(3316, 7)


Unnamed: 0,City,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Kabul,34.5167,69.1833,Air Museum Kabul,34.518995,69.191667,History Museum
1,Kabul,34.5167,69.1833,Cinema Pamir,34.513448,69.172384,Movie Theater
2,Kabul,34.5167,69.1833,Ariana cinema,34.524251,69.190468,Indie Movies
3,Kabul,34.5167,69.1833,Gholghola Gallery,34.529611,69.169659,Art Gallery
4,Kabul,34.5167,69.1833,Cinema Park,34.533814,69.171044,Movie Theater


__Let's check how many venues were returned for each city__

In [12]:
svenues.groupby('City').count()

City,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Abu Dhabi,25,25,25,25,25,25
Abuja,7,7,7,7,7,7
Accra,22,22,22,22,22,22
Addis Ababa,5,5,5,5,5,5
Algiers,5,5,5,5,5,5
Alofi,1,1,1,1,1,1
Amman,25,25,25,25,25,25
Andorra,5,5,5,5,5,5
Ankara,25,25,25,25,25,25
Antananarivo,5,5,5,5,5,5


__Let's find out how many unique categories can be created from all the returned venues__

In [13]:
print('There are {} unique categories.'.format(len(svenues['Venue Category'].unique())))

There are 41 unique categories.


__Analyzing Each City__

In [14]:
# one hot encoding
world_onehot = pd.get_dummies(svenues[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
world_onehot['City'] = svenues['City'] 

# move city column to the first column
fixed_columns = [world_onehot.columns[-1]] + list(world_onehot.columns[:-1])
world_onehot = world_onehot[fixed_columns]

world_onehot.head()

Unnamed: 0,City,Amphitheater,Art Gallery,Art Museum,Arts & Entertainment,Cineplex,Circus,Comedy Club,Concert Hall,Country Dance Club,Dance Studio,Disc Golf,Drive-in Theater,Exhibit,Go Kart,History Museum,Indie,Indie Movies,Jazz Club,Karaoke,Laser Tag,Memorial Site,Mini Golf,Movie Theater,Museum,Music Venue,Opera House,Outdoor Sculpture,Performing Arts,Piano Bar,Planetarium,Public Art,Racecourse,Rock Club,Roller Rink,Rugby,Salsa Club,Science Museum,Street Art,Theater,Tour Provider,Zoo Exhibit
0,Kabul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Kabul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Kabul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Kabul,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Kabul,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


__And let's examine the new dataframe size__

In [15]:
print(world_onehot.shape)

(3316, 42)


__Next, let's group the rows by city and sum the occurrences of each category__

In [16]:
world_grouped = world_onehot.groupby('City').sum().reset_index()
print(world_grouped.shape)
world_grouped.head()

(215, 42)


Unnamed: 0,City,Amphitheater,Art Gallery,Art Museum,Arts & Entertainment,Cineplex,Circus,Comedy Club,Concert Hall,Country Dance Club,Dance Studio,Disc Golf,Drive-in Theater,Exhibit,Go Kart,History Museum,Indie,Indie Movies,Jazz Club,Karaoke,Laser Tag,Memorial Site,Mini Golf,Movie Theater,Museum,Music Venue,Opera House,Outdoor Sculpture,Performing Arts,Piano Bar,Planetarium,Public Art,Racecourse,Rock Club,Roller Rink,Rugby,Salsa Club,Science Museum,Street Art,Theater,Tour Provider,Zoo Exhibit
0,Abu Dhabi,0,4,2,0,5,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,8,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,2,0,0
1,Abuja,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Accra,0,3,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,2,0,0,0,0,2,0,7,0,0,0,0,0,0,0,0,0,0,0,2,0,4,0,0
3,Addis Ababa,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Algiers,0,0,2,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0
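
To see the one-hot encoding and the grouping at a small scale, the sketch below repeats both steps on ten rows. The Kabul categories are the first five rows of `svenues` shown earlier; the Algiers rows are reconstructed from the grouped counts (2 Art Museum, 2 History Museum, 1 Theater), so their order is an assumption.

```python
import pandas as pd

sample = pd.DataFrame({
    'City': ['Kabul'] * 5 + ['Algiers'] * 5,
    'Venue Category': [
        'History Museum', 'Movie Theater', 'Indie Movies', 'Art Gallery', 'Movie Theater',
        'Art Museum', 'Art Museum', 'History Museum', 'History Museum', 'Theater',
    ],
})

# one column per category, one row per venue
onehot = pd.get_dummies(sample['Venue Category'], dtype=int)
onehot.insert(0, 'City', sample['City'])

# one row per city, each cell counting the venues of that category
grouped = onehot.groupby('City').sum().reset_index()
print(grouped)
```

Each cell of `grouped` is a plain count, so cities with more returned venues have larger row totals.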
