# Capstone Project - The Battle of Neighborhods - Moving to Santiago de Chile.

## 1. Discussion of the problem and problem background.

### 1.1. Introduction.
The main purpose of this project is to help people moving to Santiago de Chile to explore the facilities around different neighborhoods of the city, in order to settle in one of them.

### 1.2. Background of Santiago.

Santiago de Chile, is the capital and largest city of Chile as well as one of the largest cities in the Americas. It is the center of Chile's most densely populated region, the Santiago Metropolitan Region, the total population of which is 7 million, and more than 6 million of them live in the city's continuous urban area.

Founded in 1541 by the Spanish conquistador Pedro de Valdivia, Santiago has been the capital city of Chile since colonial times. The city has a downtown core of 19th-century neoclassical architecture and winding side-streets.

Santiago is the cultural, political and financial center of Chile and is home to the regional headquarters of many multinational corporations. The Chilean executive and judiciary branches are located in Santiago, but Congress itself meets mostly in nearby Valparaíso.

### 1.3. Target audience. 
The project aims to aid people moving to Santiago de Chile to select the suitable neighborhood that suits their necessities. Particularly young professionals and newly graduated students. Since the biggest companies are located in the capital, fresh grads from different regions of the country are likely to relocate in a city that is mostly unknown for them. Therefore, it’s important for them to get access to the services that provide information necessary to select a new place to settle for a new life. And basically, this could be of use to anyone looking to relocate in the capital.

### 1.4. Problem to solve.
Provide a clustering algorithm that suggests anyone looking to relocate in Santiago de Chile with the amenities included in the neighborhoods.

## 2. Description of the data.
The zip codes of the Metropolitan Region were obtained from: (The Santiago Metropolitan Region is made up of 6 provinces and 52 communes, from these I selected the 11 most important communes from the website) http://www.codigopostalchile.com/santiago-436

Data about different venues is necessary, about different neighborhoods of the specific borough. In order to gain that information, we will use "Foursquare" location information. As previously discussed, Foursquare is a location data provider with information about all manner of venues and events within an area of interest. Such information includes venue names, locations, menus and even picures. As such, the foursquare location platform will be used as the sole data source since all the stated required information can be obtained through the API.

The data retrieved from Foursquare contained information of venues within a specified distance of the longitude and latitude of the postcodes. The information obtained per venue is as follows:

1. Neighborhood
2. Neighborhood Latitude
3. Neighborhood Longitude
4. Venue
5. Name of the venue
6. Venue Latitude
7. Venue Longitude
8. Venue Category

## 3. Body.
### 3.1. Installing and importying Python Libraries to work with.

In [1]:
!pip install geocoder
!pip install folium

import pandas as pd
import requests
import numpy as np
import geocoder
import folium
import requests 
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
import xml
import matplotlib.pyplot as plt

%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

from pandas.io.json import json_normalize 
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim 
from bs4 import BeautifulSoup

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print("Ready to go!")

Ready to go!


## 3.1. Data extrantion and cleaning.
I downloaded the data from the website specified in point 2 to a CSV file and then I will upload the data to work with. Due to its simplicity, I figured that the website contained the data in a confusing way and since its not much data it was the fastest way.

In [3]:
import os
path = 'C:\\Users\\HP\\Documents\\Career Develpment\\Business Analytics\\Analytics\\Applied Data Science Capstone'
os.chdir(path)
df_stg = pd.read_csv('Santiago.csv')
df_stg.head()

Unnamed: 0,Código,Neighborhood,Latitud,Longitud
0,7500000,Providencia,-33.43485,-70.61573
1,7550000,Las Condes,-33.40033,-70.50269
2,7630000,Vitacura,-33.37763,-70.56219
3,7690000,Lo Barnechea,-33.35153,-70.33607
4,7750000,Ñuñoa,-33.45449,-70.60383


### 3.1.1. Map of Santiago de Chile.

In [4]:
address = 'Provincia de Santiago'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude_x = location.latitude
longitude_y = location.longitude

In [5]:
map_Santiago = folium.Map(location=[latitude_x, longitude_y], zoom_start=10)

for lat, lng, nei in zip(df_stg['Latitud'], df_stg['Longitud'], df_stg['Neighborhood']):
    
    label = '{}'.format(nei)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Santiago)  
    
map_Santiago

In [6]:
address = 'Provincia de Santiago'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude_n1 = location.latitude
longitude_n1 = location.longitude

In [7]:
CLIENT_ID = '0IUDXWL4OG11O3WD2EWHJJAK2ACN45NHUPUD4ZJNPIULG1MG' # your Foursquare ID
CLIENT_SECRET = 'NET11B22KLN55CI5ZH0EXIL45DTKWUAB1MXHCTWKXZHDCVZL' # your Foursquare Secret
ACCESS_TOKEN = '{"access_token":"XJN3NZGVXNBXNJJ5C1H5SSLBP12CXW5WCC1L1I2MTFJKD4RQ"}' # your FourSquare Access Token
VERSION = '20180604'
LIMIT = 30
radius = 700 
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude_n1, 
   longitude_n1, 
    radius, 
   LIMIT)
results = requests.get(url).json()

In [8]:
venues=results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
nearby_venues.columns

Index(['referralId', 'reasons.count', 'reasons.items', 'venue.id',
       'venue.name', 'venue.location.lat', 'venue.location.lng',
       'venue.location.labeledLatLngs', 'venue.location.distance',
       'venue.location.cc', 'venue.location.city', 'venue.location.state',
       'venue.location.country', 'venue.location.formattedAddress',
       'venue.categories', 'venue.photos.count', 'venue.photos.groups',
       'venue.location.address', 'venue.location.crossStreet',
       'venue.location.neighborhood'],
      dtype='object')

In [9]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## 3.2. Nearby venues

In [10]:
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]
nearby_venues.head()

Unnamed: 0,venue.name,venue.categories,venue.location.lat,venue.location.lng
0,KTM,"[{'id': '5032833091d4c4b30a586d60', 'name': 'M...",-33.36212,-70.50715
1,Camino A Farellones Km.0,"[{'id': '4eb1d4d54b900d56c88a45fc', 'name': 'M...",-33.366907,-70.498399
2,La Divina Comida,"[{'id': '4bf58dd8d48988d110941735', 'name': 'I...",-33.35957,-70.507337
3,El Mesón de la Patagonia,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",-33.357492,-70.506567
4,Pollo Al Cognac,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",-33.360354,-70.506882


## 3.3. Categories of nearby venues

In [11]:
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(5)

Unnamed: 0,name,categories,lat,lng
0,KTM,Motorcycle Shop,-33.36212,-70.50715
1,Camino A Farellones Km.0,Mountain,-33.366907,-70.498399
2,La Divina Comida,Italian Restaurant,-33.35957,-70.507337
3,El Mesón de la Patagonia,Restaurant,-33.357492,-70.506567
4,Pollo Al Cognac,Restaurant,-33.360354,-70.506882


In [12]:
# Top 10 Categories
a=pd.Series(nearby_venues.categories)
a.value_counts()[:10]

Restaurant            3
College Gym           1
Mountain              1
Gym                   1
Bus Station           1
Italian Restaurant    1
Garden                1
Motorcycle Shop       1
Chinese Restaurant    1
Auto Garage           1
Name: categories, dtype: int64

In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=700):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # making GET request
        venue_results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in venue_results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [14]:
# Nearby Venues
Santiago_venues = getNearbyVenues(names=df_stg['Neighborhood'],
                                   latitudes=df_stg['Latitud'],
                                   longitudes=df_stg['Longitud']
                                  )

Providencia
Las Condes
Vitacura
Lo Barnechea
Ñuñoa
Macul
La Reina
Peñalolén
La Cisterna
El Bosque
La Florida
Santiago
Independencia
Recoleta
Pedro Aguirre Cerda
Quinta Normal
Conchalí
Huechuraba
Renca
Quilicura
La Granja
La Pintana
San Ramón
San Miguel
San Joaquín
Lo Prado
Pudahuel
Cerro Navia
Lo Espejo
Estación Central
Cerrillos
Maipú


In [15]:
print('There are {} Uniques Categories.'.format(len(Santiago_venues['Venue Category'].unique())))
Santiago_venues.groupby('Neighborhood').count().head()

There are 135 Uniques Categories.


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Cerro Navia,5,5,5,5,5,5
Conchalí,4,4,4,4,4,4
Estación Central,18,18,18,18,18,18
Huechuraba,5,5,5,5,5,5
Independencia,20,20,20,20,20,20


### One hot encoding of feauteres.

In [16]:
Santiago_onehot = pd.get_dummies(Santiago_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Santiago_onehot['Neighborhood'] = Santiago_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Santiago_onehot.columns[-1]] + list(Santiago_onehot.columns[:-1])
Santiago_onehot = Santiago_onehot[fixed_columns]
Santiago_grouped = Santiago_onehot.groupby('Neighborhood').mean().reset_index()
Santiago_onehot.head(5)

Unnamed: 0,Neighborhood,Accessories Store,Advertising Agency,American Restaurant,Argentinian Restaurant,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Bakery,Bar,Basketball Court,Beach,Beer Bar,Big Box Store,Board Shop,Bookstore,Brazilian Restaurant,Breakfast Spot,Burger Joint,Bus Line,Bus Station,Cafeteria,Café,Cajun / Creole Restaurant,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Comfort Food Restaurant,Convenience Store,Convention Center,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Electronics Store,Elementary School,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,Fried Chicken Joint,Furniture / Home Store,Garden Center,Gastropub,General Entertainment,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Hardware Store,Hockey Field,Hostel,Hot Dog Joint,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Karaoke Bar,Kids Store,Latin American Restaurant,Liquor Store,Martial Arts School,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Mountain,Moving Target,Music Store,Nightclub,Other Great Outdoors,Outdoors & Recreation,Outlet Mall,Park,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pool,Pool Hall,Pub,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Climbing Spot,Salad Place,Sandwich Place,Seafood Restaurant,Shopping Mall,Ski Area,Snack Place,Soccer Field,Soccer Stadium,South American Restaurant,Southern / Soul Food Restaurant,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Stables,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Temple,Tennis Court,Theater,Track Stadium,Trail,Vegetarian / Vegan Restaurant,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Providencia,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Providencia,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Providencia,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Providencia,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Providencia,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [17]:
num_top_venues = 5
for hood in Santiago_grouped['Neighborhood']:
    print("---- "+hood+" ----")
    temp =Santiago_grouped[Santiago_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Cerro Navia ----
                  venue  freq
0            Food Truck   0.2
1           Bus Station   0.2
2  Arts & Entertainment   0.2
3                  Café   0.2
4                 Plaza   0.2


---- Conchalí ----
                  venue  freq
0  Fast Food Restaurant  0.25
1                 Plaza  0.25
2      Sushi Restaurant  0.25
3          Cupcake Shop  0.25
4             Pool Hall  0.00


---- Estación Central ----
                                      venue  freq
0                               Bus Station  0.11
1                          Sushi Restaurant  0.11
2                              Dance Studio  0.06
3                             Hot Dog Joint  0.06
4  Residential Building (Apartment / Condo)  0.06


---- Huechuraba ----
                   venue  freq
0         Ice Cream Shop   0.2
1         Sandwich Place   0.2
2                Theater   0.2
3  Outdoors & Recreation   0.2
4                 Bakery   0.2


---- Independencia ----
                 venue  freq
0   

### Most common venues around neighborhood

In [18]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [19]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Santiago_grouped['Neighborhood']

for ind in np.arange(Santiago_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Santiago_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Cerro Navia,Food Truck,Plaza,Arts & Entertainment,Bus Station,Café,Farmers Market,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
1,Conchalí,Cupcake Shop,Plaza,Sushi Restaurant,Fast Food Restaurant,Yoga Studio,Elementary School,Flea Market,Fish & Chips Shop,Farmers Market,Diner
2,Estación Central,Bus Station,Sushi Restaurant,Residential Building (Apartment / Condo),Bus Line,Burger Joint,Dance Studio,Food Truck,Bakery,Hot Dog Joint,Japanese Restaurant
3,Huechuraba,Ice Cream Shop,Outdoors & Recreation,Sandwich Place,Theater,Bakery,Elementary School,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
4,Independencia,Plaza,Fried Chicken Joint,Restaurant,Department Store,Asian Restaurant,Food,Fast Food Restaurant,Farmers Market,Big Box Store,Park


### 4. K-means Clustering Unsupervised approach.

In [20]:
# Using K-Means to cluster neighborhood into 3 clusters
Santiago_grouped_clustering = Santiago_grouped.drop('Neighborhood', 1)
kmeans = KMeans(n_clusters=3, random_state=0).fit(Santiago_grouped_clustering)
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 2, 0, 0, 0])

In [22]:
df_stg.head()

Unnamed: 0,Código,Neighborhood,Latitud,Longitud
0,7500000,Providencia,-33.43485,-70.61573
1,7550000,Las Condes,-33.40033,-70.50269
2,7630000,Vitacura,-33.37763,-70.56219
3,7690000,Lo Barnechea,-33.35153,-70.33607
4,7750000,Ñuñoa,-33.45449,-70.60383


In [23]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Santiago_merged =df_stg.iloc[:16,:]

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
Santiago_merged = Santiago_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Santiago_merged.head()# check the last columns!

Unnamed: 0,Código,Neighborhood,Latitud,Longitud,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,7500000,Providencia,-33.43485,-70.61573,0.0,Pizza Place,Coffee Shop,Sandwich Place,Restaurant,Burger Joint,Chinese Restaurant,Peruvian Restaurant,Bakery,Martial Arts School,Comfort Food Restaurant
1,7550000,Las Condes,-33.40033,-70.50269,0.0,Soccer Stadium,Pharmacy,Athletics & Sports,Hockey Field,Fast Food Restaurant,College Cafeteria,Soccer Field,Stables,Gym / Fitness Center,Park
2,7630000,Vitacura,-33.37763,-70.56219,0.0,Coffee Shop,Sushi Restaurant,Pharmacy,Park,Café,Farmers Market,Board Shop,Plaza,Supermarket,Sports Bar
3,7690000,Lo Barnechea,-33.35153,-70.33607,0.0,Ski Area,Coffee Shop,Mountain,Rock Climbing Spot,Snack Place,Latin American Restaurant,Yoga Studio,Fast Food Restaurant,Farmers Market,Elementary School
4,7750000,Ñuñoa,-33.45449,-70.60383,0.0,Bakery,Chinese Restaurant,Coffee Shop,Pizza Place,Bar,Sandwich Place,Sushi Restaurant,Yoga Studio,Gymnastics Gym,Italian Restaurant


In [40]:
Santiago_merged.shape

(16, 15)

In [43]:
Santiago_merged = Santiago_merged.drop(9)

In [50]:
Santiago_merged.shape

(15, 15)

In [53]:
Santiago_merged = Santiago_merged.astype({'Cluster Labels': int})

### 5. Map of Clusters

In [54]:
kclusters = 6
# create map
map_clusters = folium.Map(location=[latitude_x, longitude_y], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
print(rainbow)
# add markers to the map

markers_colors = []
for lat, lon, nei , cluster in zip(Santiago_merged['Latitud'], 
                                   Santiago_merged['Longitud'], 
                                   Santiago_merged['Neighborhood'], 
                                   Santiago_merged['Cluster Labels']):
    label = folium.Popup(str(nei) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

['#8000ff', '#1996f3', '#4df3ce', '#b2f396', '#ff964f', '#ff0000']


In [55]:
df1=Santiago_merged.loc[Santiago_merged['Cluster Labels'] == 0,Santiago_merged.columns[[2] + list(range(5, Santiago_merged.shape[1]))]]
df2=Santiago_merged.loc[Santiago_merged['Cluster Labels'] == 2,Santiago_merged.columns[[2] + list(range(5, Santiago_merged.shape[1]))]]

### 5.1. Examining Clusters
Finally, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. 

In [57]:
df1

Unnamed: 0,Latitud,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,-33.43485,Pizza Place,Coffee Shop,Sandwich Place,Restaurant,Burger Joint,Chinese Restaurant,Peruvian Restaurant,Bakery,Martial Arts School,Comfort Food Restaurant
1,-33.40033,Soccer Stadium,Pharmacy,Athletics & Sports,Hockey Field,Fast Food Restaurant,College Cafeteria,Soccer Field,Stables,Gym / Fitness Center,Park
2,-33.37763,Coffee Shop,Sushi Restaurant,Pharmacy,Park,Café,Farmers Market,Board Shop,Plaza,Supermarket,Sports Bar
3,-33.35153,Ski Area,Coffee Shop,Mountain,Rock Climbing Spot,Snack Place,Latin American Restaurant,Yoga Studio,Fast Food Restaurant,Farmers Market,Elementary School
4,-33.45449,Bakery,Chinese Restaurant,Coffee Shop,Pizza Place,Bar,Sandwich Place,Sushi Restaurant,Yoga Studio,Gymnastics Gym,Italian Restaurant
5,-33.48684,Restaurant,Pharmacy,Seafood Restaurant,Sushi Restaurant,Gymnastics Gym,Bar,Pool,Peruvian Restaurant,Chinese Restaurant,Sandwich Place
6,-33.44291,Plaza,Chinese Restaurant,Cupcake Shop,Mobile Phone Shop,Café,Sushi Restaurant,Shopping Mall,Liquor Store,Fish & Chips Shop,General Entertainment
7,-33.4816,Pub,BBQ Joint,Fast Food Restaurant,Coffee Shop,Shopping Mall,Soccer Field,South American Restaurant,Sushi Restaurant,Gym / Fitness Center,Pizza Place
8,-33.5288,Pizza Place,Chinese Restaurant,Gym,Pharmacy,Bakery,Bar,Fast Food Restaurant,Middle Eastern Restaurant,Park,Fried Chicken Joint
10,-33.5247,Cajun / Creole Restaurant,Rental Car Location,Moving Target,Gym / Fitness Center,Mountain,Park,Yoga Studio,Fast Food Restaurant,Farmers Market,Elementary School


In [58]:
df2

Unnamed: 0,Latitud,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


Due to the similarities in the venues founded by the API Foursquare in Santiago and probably due to people not activily giving their input about the venues available in the app, there were not richness of data; so the algorithm clustered all the venues in one big cluster, not particularly what we were looking for. 