# 1. Introduction

### An important gym franchise wants to establish gyms in Central America. Its studies show that gyms are trending in the region.
### To start, the franchise has decided to open two gyms in Costa Rica and then expand to the other countries of Central America.
### It wants to know in which cities it would be a good idea to open them without facing strong competition at the beginning.
### It therefore decided to look for the most populated cities in which gyms are not among the five most common venues.

# 2. Data

### For this project I'm going to import data from the webpage: https://simplemaps.com/data/world-cities.
### This webpage provides information on cities in many countries, including: city, latitude, longitude, country, country abbreviation, state or province, and population, among others.
### The webpage lets you download the information as a CSV file, so I'm going to use pandas to read that file.
### Then I'm going to filter the data to keep only the available cities in Costa Rica, as requested for this project.
### After that, I'm going to use the Foursquare API to fetch information about the common venues in every city.
### Then, I will use the K-means method to cluster the cities by their most common venues and focus on the most populated cities that are grouped together.
### Finally, in the most populated clusters, I will look for the two most populated cities where gyms are not among the five most common venues, and recommend establishing the gyms in those cities.

# 3. Methodology

### First, the data was downloaded from the webpage and read with pandas.
### Then the Foursquare API was used to extract information about the most common venues in the cities of interest.
### After that, the K-means method was used to cluster the cities by their most common venues, so that the most populated cities that are grouped together could be examined.
### Finally, the clusters were used to find the two most populated cities where gyms are not among the five most common venues, and to recommend establishing the gyms in those cities.

#### Import Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


#### Import Data

#### Import Data from webpage: https://simplemaps.com/data/world-cities

In [91]:
df=pd.read_csv('worldcities.csv')
df.head()

Unnamed: 0,city,city_ascii,lat,lng,country,iso2,iso3,admin_name,capital,population,id
0,Tokyo,Tokyo,35.6897,139.6922,Japan,JP,JPN,Tōkyō,primary,37977000.0,1392685764
1,Jakarta,Jakarta,-6.2146,106.8451,Indonesia,ID,IDN,Jakarta,primary,34540000.0,1360771077
2,Delhi,Delhi,28.66,77.23,India,IN,IND,Delhi,admin,29617000.0,1356872604
3,Mumbai,Mumbai,18.9667,72.8333,India,IN,IND,Mahārāshtra,admin,23355000.0,1356226629
4,Manila,Manila,14.5958,120.9772,Philippines,PH,PHL,Manila,primary,23088000.0,1608618140


#### Eliminate unnecessary columns

In [92]:
df = df.drop(['iso2','iso3','id','city_ascii','capital'], axis=1)
df.head()

Unnamed: 0,city,lat,lng,country,admin_name,population
0,Tokyo,35.6897,139.6922,Japan,Tōkyō,37977000.0
1,Jakarta,-6.2146,106.8451,Indonesia,Jakarta,34540000.0
2,Delhi,28.66,77.23,India,Delhi,29617000.0
3,Mumbai,18.9667,72.8333,India,Mahārāshtra,23355000.0
4,Manila,14.5958,120.9772,Philippines,Manila,23088000.0


#### Rename columns

In [93]:
df.columns = ['City','Latitude','Longitude','Country','Province','Population']
df.head()

Unnamed: 0,City,Latitude,Longitude,Country,Province,Population
0,Tokyo,35.6897,139.6922,Japan,Tōkyō,37977000.0
1,Jakarta,-6.2146,106.8451,Indonesia,Jakarta,34540000.0
2,Delhi,28.66,77.23,India,Delhi,29617000.0
3,Mumbai,18.9667,72.8333,India,Mahārāshtra,23355000.0
4,Manila,14.5958,120.9772,Philippines,Manila,23088000.0


#### Filter the table to obtain the information of the available cities from Costa Rica

In [94]:
df2 = df[df['Country'] == 'Costa Rica'].reset_index(drop=True)
df2.head()

Unnamed: 0,City,Latitude,Longitude,Country,Province,Population
0,San José,9.9333,-84.0833,Costa Rica,San José,288054.0
1,Cartago,9.8667,-83.9167,Costa Rica,Cartago,221733.0
2,Puerto Limón,10.0022,-83.084,Costa Rica,Limón,61072.0
3,Liberia,10.6338,-85.4333,Costa Rica,Guanacaste,45380.0
4,Alajuela,10.0278,-84.2041,Costa Rica,Alajuela,42975.0


#### Examine number of cities available

In [95]:
df2.shape

(18, 6)

#### Get the geographical coordinates of San Jose, capital of Costa Rica

In [96]:
address = 'San Jose, CR'

geolocator = Nominatim(user_agent="cr_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of San José are {}, {}.'.format(latitude, longitude))

The geographical coordinates of San José are 9.9325427, -84.0795782.


#### Create map of Costa Rica using latitude and longitude values

In [97]:

map_cr = folium.Map(location=[latitude, longitude], zoom_start=9)

# add markers to map
for lat, lng, label in zip(df2['Latitude'], df2['Longitude'], df2['City']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cr)  
    
map_cr

#### Define Foursquare Credentials and Version

In [98]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


#### Function to extract the information from all the cities

In [99]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City', 
                  'City Latitude', 
                  'City Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
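
The request step in `getNearbyVenues` can be made more defensive. The sketch below is not the notebook's code: it assumes the same v2 `venues/explore` endpoint and response layout used above, passes the query through `params`, sets a timeout, and skips venues that carry no category. The helper names `explore_params`, `parse_items` and `fetch_city_venues` are hypothetical.

```python
import requests

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'  # endpoint used in the notebook


def explore_params(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the query parameters for one venues/explore request."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }


def parse_items(city, lat, lng, payload):
    """Turn one decoded explore response into flat rows, one per venue."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        if not categories:  # skip venues that carry no category
            continue
        rows.append((city, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name']))
    return rows


def fetch_city_venues(city, lat, lng, params):
    """Request the venues around one city; raises on HTTP errors."""
    response = requests.get(EXPLORE_URL, params=params, timeout=10)
    response.raise_for_status()
    return parse_items(city, lat, lng, response.json())
```

Keeping the parsing separate from the HTTP call means the row-building logic can be checked on a saved response without calling the API again.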

#### Code to run the above function on each city

In [100]:
CR_venues = getNearbyVenues(names=df2['City'],
                                   latitudes=df2['Latitude'],
                                   longitudes=df2['Longitude']
                                  )

San José
Cartago
Puerto Limón
Liberia
Alajuela
Puntarenas
San Juan
Heredia
Santa Ana
Buenos Aires
Quesada
Cañas
El Roble
Santiago
Sixaola
La Cruz
Golfito
Ciudad Cortés


#### Check the size of the resulting dataframe

In [101]:
print(CR_venues.shape)
CR_venues.head()

(255, 7)


Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,San José,9.9333,-84.0833,La Sorbetera de Lolo Mora,9.934467,-84.081841,Ice Cream Shop
1,San José,9.9333,-84.0833,Rincón Retana,9.934561,-84.082022,Sandwich Place
2,San José,9.9333,-84.0833,El Tostador,9.934511,-84.083321,Café
3,San José,9.9333,-84.0833,Mercado Central de San José,9.934492,-84.08183,Market
4,San José,9.9333,-84.0833,Soda Tala,9.934671,-84.081785,Restaurant


#### Check how many venues were returned for each city

In [102]:
CR_venues.groupby('City').count()

City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Alajuela,4,4,4,4,4,4
Buenos Aires,4,4,4,4,4,4
Cartago,29,29,29,29,29,29
Cañas,2,2,2,2,2,2
Ciudad Cortés,6,6,6,6,6,6
El Roble,5,5,5,5,5,5
Heredia,28,28,28,28,28,28
La Cruz,4,4,4,4,4,4
Liberia,15,15,15,15,15,15
Puerto Limón,1,1,1,1,1,1


#### Let's find out how many unique categories can be curated from all the returned venues

In [103]:
print('There are {} unique categories.'.format(len(CR_venues['Venue Category'].unique())))

There are 88 unique categories.


#### Analyze Each City

In [104]:
# one hot encoding
CR_onehot = pd.get_dummies(CR_venues[['Venue Category']], prefix="", prefix_sep="")

# add city column back to dataframe
CR_onehot['City'] = CR_venues['City'] 

# move city column to the first column
fixed_columns = [CR_onehot.columns[-1]] + list(CR_onehot.columns[:-1])
CR_onehot = CR_onehot[fixed_columns]

CR_onehot.head()

Unnamed: 0,City,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Bakery,Bar,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Boutique,Boxing Gym,Brewery,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Caribbean Restaurant,Chinese Restaurant,Church,Coffee Shop,Convenience Store,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Electronics Store,Event Space,Falafel Restaurant,Fast Food Restaurant,Food,Food & Drink Shop,Fried Chicken Joint,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Historic Site,Hotel,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Latin American Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Museum,Music Venue,Other Repair Shop,Park,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Pub,Racetrack,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Snack Place,Soccer Stadium,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Theater,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Store,Wings Joint,Women's Store
0,San José,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,San José,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,San José,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,San José,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,San José,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Let's examine the new dataframe size

In [105]:
CR_onehot.shape

(255, 89)

#### Let's group rows by City and take the mean of the frequency of occurrence of each category

In [106]:
CR_grouped = CR_onehot.groupby('City').mean().reset_index()
CR_grouped

Unnamed: 0,City,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Bakery,Bar,Bed & Breakfast,Beer Garden,Big Box Store,Bistro,Boutique,Boxing Gym,Brewery,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Caribbean Restaurant,Chinese Restaurant,Church,Coffee Shop,Convenience Store,Creperie,Deli / Bodega,Department Store,Dessert Shop,Diner,Electronics Store,Event Space,Falafel Restaurant,Fast Food Restaurant,Food,Food & Drink Shop,Fried Chicken Joint,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Historic Site,Hotel,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Latin American Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Museum,Music Venue,Other Repair Shop,Park,Peruvian Restaurant,Pet Store,Pharmacy,Pizza Place,Plaza,Pool,Pub,Racetrack,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Snack Place,Soccer Stadium,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Theater,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Store,Wings Joint,Women's Store
0,Alajuela,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Buenos Aires,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Cartago,0.0,0.0,0.0,0.034483,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.068966,0.034483,0.0,0.0,0.0,0.0,0.0,0.034483,0.103448,0.034483,0.0,0.034483,0.0,0.034483,0.034483,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483
3,Cañas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Ciudad Cortés,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,El Roble,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0
6,Heredia,0.0,0.035714,0.0,0.0,0.035714,0.0,0.035714,0.071429,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.107143,0.035714,0.0,0.0,0.035714,0.0,0.107143,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0
7,La Cruz,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Liberia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Puerto Limón,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
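
To make the grouping step concrete, here is a small sketch on made-up data (the two cities and six venues below are illustrative, not taken from the Foursquare results). It shows that the per-city mean of the one-hot columns is the share of that city's venues in each category.

```python
import pandas as pd

# Toy venue table: hypothetical rows, not real Foursquare output
venues = pd.DataFrame({
    'City': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Gym', 'Café', 'Café', 'Park', 'Gym', 'Gym'],
})

# One-hot encode the category, then put the city back as the first column
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'City', venues['City'])

# The per-city mean of each 0/1 column is the category's relative frequency
grouped = onehot.groupby('City').mean().reset_index()
print(grouped)
```

City A has four venues, two of them cafés, so its `Café` frequency is 0.5; city B has only gyms, so its `Gym` frequency is 1.0.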


#### Let's print each city along with its top 5 most common venues

In [107]:
num_top_venues = 5

for City in CR_grouped['City']:
    print("----"+City+"----")
    temp = CR_grouped[CR_grouped['City'] == City].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alajuela----
               venue  freq
0  Food & Drink Shop  0.25
1               Pool  0.25
2                Bar  0.25
3         Restaurant  0.25
4  Other Repair Shop  0.00


----Buenos Aires----
                       venue  freq
0                 Steakhouse  0.25
1                  Juice Bar  0.25
2                Karaoke Bar  0.25
3  Latin American Restaurant  0.25
4          Other Repair Shop  0.00


----Cartago----
                venue  freq
0         Pizza Place  0.10
1  Mexican Restaurant  0.07
2              Bakery  0.07
3       Women's Store  0.03
4                 Gym  0.03


----Cañas----
         venue  freq
0        Hotel   1.0
1  Art Gallery   0.0
2          Pub   0.0
3         Pool   0.0
4        Plaza   0.0


----Ciudad Cortés----
           venue  freq
0           Park  0.17
1       Pharmacy  0.17
2  Grocery Store  0.17
3    Bus Station  0.17
4    Snack Place  0.17


----El Roble----
                 venue  freq
0             Bus Stop   0.4
1    Other Repair Sho

#### Function to sort the venues in descending order

In [108]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Let's create the new dataframe and display the top 5 venues for each City.

In [109]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # ranks beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Cities_venues_sorted = pd.DataFrame(columns=columns)
Cities_venues_sorted['City'] = CR_grouped['City']

for ind in np.arange(CR_grouped.shape[0]):
    Cities_venues_sorted.iloc[ind, 1:] = return_most_common_venues(CR_grouped.iloc[ind, :], num_top_venues)

Cities_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Alajuela,Restaurant,Food & Drink Shop,Bar,Pool,Women's Store
1,Buenos Aires,Juice Bar,Karaoke Bar,Latin American Restaurant,Steakhouse,Women's Store
2,Cartago,Pizza Place,Bakery,Mexican Restaurant,Women's Store,Gym
3,Cañas,Hotel,Harbor / Marina,Coffee Shop,Convenience Store,Creperie
4,Ciudad Cortés,Snack Place,Grocery Store,Pharmacy,Bus Station,Park
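
Note that sorting by frequency fills the remaining slots with zero-frequency categories when a city has fewer than five distinct categories (Cañas, for example, has only Hotel venues, yet five categories are listed for it). A possible variant, sketched below with the hypothetical helper `top_venues_nonzero`, keeps only categories that actually occur and pads the rest with `None`.

```python
import pandas as pd


def top_venues_nonzero(row, num_top_venues):
    """Return the most frequent categories of one city, skipping zero frequencies."""
    freqs = row.iloc[1:].astype(float)  # drop the leading 'City' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    top = list(freqs.index[:num_top_venues])
    # Pad with None so every city yields the same number of columns
    return top + [None] * (num_top_venues - len(top))


# Hypothetical one-row example shaped like a row of CR_grouped
row = pd.Series({'City': 'Cañas', 'Bar': 0.0, 'Hotel': 1.0, 'Park': 0.0})
print(top_venues_nonzero(row, 3))
```

With this variant a category only appears in a city's top list if at least one venue of that category was returned for the city.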


#### Cluster Cities

In [110]:
# set number of clusters
kclusters = 5

CR_grouped_clustering = CR_grouped.drop('City', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(CR_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 3, 0, 0, 0, 0, 0, 1], dtype=int32)
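
With `kclusters = 5` and a small number of cities, some clusters can end up holding a single city. One way to inspect this, sketched here on made-up points rather than on `CR_grouped_clustering`, is to count the members of each cluster for a few values of k. The helper `cluster_sizes` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy feature matrix: two well-separated groups of three points (illustrative only)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])


def cluster_sizes(features, k, seed=0):
    """Fit k-means and return (labels, size of each cluster, inertia)."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features)
    sizes = np.bincount(model.labels_, minlength=k)
    return model.labels_, sizes, model.inertia_


for k in (2, 3):
    labels, sizes, inertia = cluster_sizes(X, k)
    print(k, sizes.tolist(), round(inertia, 4))
```

Applied to the real feature matrix, the same call would show how many cities fall into each cluster before the clusters are examined one by one.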

#### Let's create a new dataframe that includes the cluster as well as the previous information

In [111]:
# add clustering labels
Cities_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

CR_merged = df2

# join df2 with Cities_venues_sorted to add the cluster label and top venues for each city
CR_merged = CR_merged.join(Cities_venues_sorted.set_index('City'), on='City')

CR_merged.head() # check the last columns!

Unnamed: 0,City,Latitude,Longitude,Country,Province,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,San José,9.9333,-84.0833,Costa Rica,San José,288054.0,0.0,Coffee Shop,Sandwich Place,Fast Food Restaurant,Restaurant,Convenience Store
1,Cartago,9.8667,-83.9167,Costa Rica,Cartago,221733.0,0.0,Pizza Place,Bakery,Mexican Restaurant,Women's Store,Gym
2,Puerto Limón,10.0022,-83.084,Costa Rica,Limón,61072.0,1.0,Harbor / Marina,Wings Joint,Coffee Shop,Convenience Store,Creperie
3,Liberia,10.6338,-85.4333,Costa Rica,Guanacaste,45380.0,0.0,Chinese Restaurant,Hotel,Restaurant,Bar,Bed & Breakfast
4,Alajuela,10.0278,-84.2041,Costa Rica,Alajuela,42975.0,0.0,Restaurant,Food & Drink Shop,Bar,Pool,Women's Store


In [114]:
CR_merged.dropna()

Unnamed: 0,City,Latitude,Longitude,Country,Province,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,San José,9.9333,-84.0833,Costa Rica,San José,288054.0,0.0,Coffee Shop,Sandwich Place,Fast Food Restaurant,Restaurant,Convenience Store
1,Cartago,9.8667,-83.9167,Costa Rica,Cartago,221733.0,0.0,Pizza Place,Bakery,Mexican Restaurant,Women's Store,Gym
2,Puerto Limón,10.0022,-83.084,Costa Rica,Limón,61072.0,1.0,Harbor / Marina,Wings Joint,Coffee Shop,Convenience Store,Creperie
3,Liberia,10.6338,-85.4333,Costa Rica,Guanacaste,45380.0,0.0,Chinese Restaurant,Hotel,Restaurant,Bar,Bed & Breakfast
4,Alajuela,10.0278,-84.2041,Costa Rica,Alajuela,42975.0,0.0,Restaurant,Food & Drink Shop,Bar,Pool,Women's Store
5,Puntarenas,9.9764,-84.8339,Costa Rica,Puntarenas,41528.0,0.0,Seafood Restaurant,Ice Cream Shop,Chinese Restaurant,Restaurant,Fast Food Restaurant
6,San Juan,9.9609,-84.0731,Costa Rica,San José,24944.0,0.0,Latin American Restaurant,Pub,Sandwich Place,Fast Food Restaurant,Pet Store
7,Heredia,9.9985,-84.1169,Costa Rica,Heredia,22700.0,0.0,Gym,Ice Cream Shop,Coffee Shop,Bar,Bistro
8,Santa Ana,9.932,-84.176,Costa Rica,San José,11320.0,0.0,Restaurant,Steakhouse,Shop & Service,Brewery,Ice Cream Shop
9,Buenos Aires,9.1497,-83.3334,Costa Rica,Puntarenas,45000.0,0.0,Juice Bar,Karaoke Bar,Latin American Restaurant,Steakhouse,Women's Store
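
The float cluster labels in the output above (0.0, 1.0) suggest that the join left `NaN` values for at least one city, which would happen for any city with no venue rows. The sketch below, on a hypothetical three-city table, shows one way to drop such rows and restore integer labels. Note that `dropna()` returns a new dataframe, so the result has to be assigned to take effect.

```python
import pandas as pd

# Hypothetical stand-ins for df2 and Cities_venues_sorted
cities = pd.DataFrame({'City': ['A', 'B', 'C'], 'Population': [300.0, 200.0, 100.0]})
venues_sorted = pd.DataFrame({
    'Cluster Labels': [0, 1],
    'City': ['A', 'C'],
    '1st Most Common Venue': ['Café', 'Gym'],
})

# Left join: city 'B' has no venue row, so its new columns are NaN
merged = cities.join(venues_sorted.set_index('City'), on='City')

# Keep only cities with venue data and turn the labels back into integers
merged = merged.dropna(subset=['Cluster Labels']).reset_index(drop=True)
merged['Cluster Labels'] = merged['Cluster Labels'].astype(int)
print(merged)
```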


#### Examine Clusters

#### Cluster 1

In [118]:
CR_merged.loc[CR_merged['Cluster Labels'] == 0, CR_merged.columns[[0] + list(range(5, CR_merged.shape[1]))]]

Unnamed: 0,City,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,San José,288054.0,0.0,Coffee Shop,Sandwich Place,Fast Food Restaurant,Restaurant,Convenience Store
1,Cartago,221733.0,0.0,Pizza Place,Bakery,Mexican Restaurant,Women's Store,Gym
3,Liberia,45380.0,0.0,Chinese Restaurant,Hotel,Restaurant,Bar,Bed & Breakfast
4,Alajuela,42975.0,0.0,Restaurant,Food & Drink Shop,Bar,Pool,Women's Store
5,Puntarenas,41528.0,0.0,Seafood Restaurant,Ice Cream Shop,Chinese Restaurant,Restaurant,Fast Food Restaurant
6,San Juan,24944.0,0.0,Latin American Restaurant,Pub,Sandwich Place,Fast Food Restaurant,Pet Store
7,Heredia,22700.0,0.0,Gym,Ice Cream Shop,Coffee Shop,Bar,Bistro
8,Santa Ana,11320.0,0.0,Restaurant,Steakhouse,Shop & Service,Brewery,Ice Cream Shop
9,Buenos Aires,45000.0,0.0,Juice Bar,Karaoke Bar,Latin American Restaurant,Steakhouse,Women's Store
12,El Roble,15759.0,0.0,Bus Stop,Big Box Store,Tree,Other Repair Shop,Event Space


#### Cluster 2

In [119]:
CR_merged.loc[CR_merged['Cluster Labels'] == 1, CR_merged.columns[[0] + list(range(5, CR_merged.shape[1]))]]

Unnamed: 0,City,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Puerto Limón,61072.0,1.0,Harbor / Marina,Wings Joint,Coffee Shop,Convenience Store,Creperie


#### Cluster 3

In [120]:
CR_merged.loc[CR_merged['Cluster Labels'] == 2, CR_merged.columns[[0] + list(range(5, CR_merged.shape[1]))]]

Unnamed: 0,City,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
14,Sixaola,10234.0,2.0,Bus Station,Women's Store,Church,Convenience Store,Creperie


#### Cluster 4

In [121]:
CR_merged.loc[CR_merged['Cluster Labels'] == 3, CR_merged.columns[[0] + list(range(5, CR_merged.shape[1]))]]

Unnamed: 0,City,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
11,Cañas,20306.0,3.0,Hotel,Harbor / Marina,Coffee Shop,Convenience Store,Creperie


#### Cluster 5

In [122]:
CR_merged.loc[CR_merged['Cluster Labels'] == 4, CR_merged.columns[[0] + list(range(5, CR_merged.shape[1]))]]

Unnamed: 0,City,Population,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
10,Quesada,31106.0,4.0,Gymnastics Gym,Market,Event Space,Coffee Shop,Convenience Store


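# 4. Results

### From Cluster 1, it is recommended to establish the gyms in the cities of San José and Liberia. Among the cities listed for that cluster, San José is the most populated; Cartago, the second most populated, is skipped because Gym is its fifth most common venue, which makes Liberia the next choice.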
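
The selection rule behind this recommendation can be sketched as follows. The table is a hand-copied subset of the cluster-0 rows of `CR_merged` shown earlier (city, population, and the five venue columns, with shortened column names), and `pick_cities` is a hypothetical helper. The rule treats any category containing the word "Gym" as a gym.

```python
import pandas as pd

venue_cols = ['1st', '2nd', '3rd', '4th', '5th']

# Subset of the cluster-0 rows from CR_merged, copied by hand for illustration
data = pd.DataFrame([
    ['San José', 288054, 'Coffee Shop', 'Sandwich Place', 'Fast Food Restaurant', 'Restaurant', 'Convenience Store'],
    ['Cartago', 221733, 'Pizza Place', 'Bakery', 'Mexican Restaurant', "Women's Store", 'Gym'],
    ['Liberia', 45380, 'Chinese Restaurant', 'Hotel', 'Restaurant', 'Bar', 'Bed & Breakfast'],
    ['Buenos Aires', 45000, 'Juice Bar', 'Karaoke Bar', 'Latin American Restaurant', 'Steakhouse', "Women's Store"],
    ['Heredia', 22700, 'Gym', 'Ice Cream Shop', 'Coffee Shop', 'Bar', 'Bistro'],
], columns=['City', 'Population'] + venue_cols)


def pick_cities(df, n=2):
    """Return the n most populated cities with no gym among their top venues."""
    has_gym = df[venue_cols].apply(lambda col: col.str.contains('Gym')).any(axis=1)
    return df[~has_gym].nlargest(n, 'Population')['City'].tolist()


print(pick_cities(data))
```

Matching on the substring also catches categories such as "Gym / Fitness Center" and "Boxing Gym", which appear in the category list above.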

# 5. Discussion

### The results may not reflect reality, because the Foursquare data for the selected country does not necessarily include every venue that exists in each area. For that reason, a second analysis is recommended.

# 6. Conclusion