<h1 align="center"> A Way to Find The Best Place</h1>
<h2 align="center">Clustering the Neighbourhoods of Bogota</h2>

<p align ="center"> Richard Alejandro Mora Perilla 
<br>
<br>
06 February 2021
</p>




# Introduction


When people travel, they usually plan an itinerary so they can visit specific places at their destination. Before that, they tend to look for the places worth visiting, relying on the opinions and comments of people who live or have been nearby.

# Business Problem

The problem is to identify which places are worth visiting, based on the good opinions of people about the venues nearby, so that a traveller can decide before the trip. The same data can also help identify a good place to live in the city.


# Data Description

We require geolocation data for Bogota. The city's postal codes serve as a starting point: from them we can find the localities, their most popular venues, and the categories of those venues.


## Bogota

To derive our solution, we scrape our data from https://es.wikipedia.org/wiki/Anexo:Localidades_de_Bogot%C3%A1

This Wikipedia page has information about all the localities. From it we keep:

1. *town* : Name of the locality
2. *post_code* : Postal codes of the locality in Bogota.

## Foursquare API Data

We will need data about different venues. To get it, we use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus and even photos. The Foursquare platform is therefore our sole source of venue data, since all the required venue information can be obtained through its API.

The data retrieved from Foursquare contains information about venues within a specified distance of the latitude and longitude of each postal code. The information obtained per venue is as follows:

1. *Neighborhood*
2. *Neighborhood Latitude*
3. *Neighborhood Longitude*
4. *Venue* : Name of the venue, e.g. the name of a store or restaurant
5. *Venue Latitude*
6. *Venue Longitude*
7. *Venue Category*

# Methodology

We will be creating our model with the help of Python so we start off by importing all the required packages.

In [3]:
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from sklearn.cluster import KMeans

### Localities

We begin by collecting and refining the data needed for the business solution.

### Data Collection

To get the localities of Bogota, we start by scraping the Wikipedia page that lists them.

In [4]:
url = "https://es.wikipedia.org/wiki/Anexo:Localidades_de_Bogot%C3%A1"
wiki = requests.get(url)
wiki

<Response [200]>

A 200 response means the request succeeded and we received the page.

In [12]:
wiki_data = pd.read_html(wiki.text)
wiki_data

[    Nº           Localidad Códigos Postales  Superficie km²[2]​ Población[3]​  \
 0    1             Usaquén    110111-110151               65.31       501 999   
 1    2           Chapinero    110211-110231               38.15       139 701   
 2    3            Santa Fe    110311-110321               45.17       110 048   
 3    4       San Cristóbal    110411-110441               49.09       404 697   
 4    5                Usme    110511-110571              215.06       457 302   
 5    6          Tunjuelito    110611-110621                9.91       199 430   
 6    7                Bosa    110711-110741               23.93       673 077   
 7    8             Kennedy    110811-110881               38.59     1 088 443   
 8    9            Fontibón    110911-110931               33.28       394 648   
 9   10            Engativá    111011-111071               35.88       887 080   
 10  11                Suba    111111-111176              100.56     1 218 513   
 11  12      Bar

Scraping the page returns every table it contains. We need the first table, so we select index 0.

In [13]:
wiki_data = wiki_data[0]
wiki_data

Unnamed: 0,Nº,Localidad,Códigos Postales,Superficie km²[2]​,Población[3]​,Densidad hab/km²
0,1,Usaquén,110111-110151,65.31,501 999,7 686.4
1,2,Chapinero,110211-110231,38.15,139 701,3 661.88
2,3,Santa Fe,110311-110321,45.17,110 048,2 436.3
3,4,San Cristóbal,110411-110441,49.09,404 697,8 243.98
4,5,Usme,110511-110571,215.06,457 302,2 126.39
5,6,Tunjuelito,110611-110621,9.91,199 430,20 124.11
6,7,Bosa,110711-110741,23.93,673 077,28 126.91
7,8,Kennedy,110811-110881,38.59,1 088 443,28 205.31
8,9,Fontibón,110911-110931,33.28,394 648,11 858.41
9,10,Engativá,111011-111071,35.88,887 080,24 723.52


### Data Preprocessing

We strip the surrounding whitespace from the column titles and replace the spaces between words with `_`.

In [14]:
wiki_data.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)
wiki_data

Unnamed: 0,Nº,Localidad,Códigos_Postales,Superficie_km²[2]​,Población[3]​,Densidad_hab/km²
0,1,Usaquén,110111-110151,65.31,501 999,7 686.4
1,2,Chapinero,110211-110231,38.15,139 701,3 661.88
2,3,Santa Fe,110311-110321,45.17,110 048,2 436.3
3,4,San Cristóbal,110411-110441,49.09,404 697,8 243.98
4,5,Usme,110511-110571,215.06,457 302,2 126.39
5,6,Tunjuelito,110611-110621,9.91,199 430,20 124.11
6,7,Bosa,110711-110741,23.93,673 077,28 126.91
7,8,Kennedy,110811-110881,38.59,1 088 443,28 205.31
8,9,Fontibón,110911-110931,33.28,394 648,11 858.41
9,10,Engativá,111011-111071,35.88,887 080,24 723.52


The spaces are now replaced with `_`, but some column titles still carry footnote markers such as `[2]` and other special characters that this renaming does not remove.
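
One way to deal with such special characters is to strip them with a regular expression before replacing the spaces. The sketch below is only an illustration: `clean_column` is a hypothetical helper, and it assumes the invisible character trailing the footnote markers is a zero-width space (`\u200b`).

```python
import re
import pandas as pd

# Column titles as they appear in the scraped table.
# Assumption: the invisible trailing character is a zero-width space.
raw_columns = ["Nº", "Localidad", "Códigos Postales",
               "Superficie km²[2]\u200b", "Población[3]\u200b", "Densidad hab/km²"]

def clean_column(name):
    """Hypothetical helper: drop footnote markers and zero-width spaces."""
    name = re.sub(r"\[\d+\]", "", name)   # remove Wikipedia footnote markers like [2]
    name = name.replace("\u200b", "")      # remove zero-width spaces
    return name.strip().replace(" ", "_")  # same renaming rule as in the notebook

demo = pd.DataFrame(columns=raw_columns).rename(columns=clean_column)
print(list(demo.columns))
```

Since the affected columns are dropped in the next step anyway, this cleanup is optional for the rest of the analysis.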

### Feature Selection

We only need the locality names and postal codes for the next steps. We can drop the number, area, population and density columns.

In [127]:
df = wiki_data.drop( [ wiki_data.columns[0], wiki_data.columns[3], wiki_data.columns[4], wiki_data.columns[5]], axis=1)

In [103]:
df.head()

Unnamed: 0,Localidad,Códigos_Postales
0,Usaquén,110111-110151
1,Chapinero,110211-110231
2,Santa Fe,110311-110321
3,San Cristóbal,110411-110441
4,Usme,110511-110571


We rename the columns to English.

In [128]:
df.columns = ['town','post_code']
df

Unnamed: 0,town,post_code
0,Usaquén,110111-110151
1,Chapinero,110211-110231
2,Santa Fe,110311-110321
3,San Cristóbal,110411-110441
4,Usme,110511-110571
5,Tunjuelito,110611-110621
6,Bosa,110711-110741
7,Kennedy,110811-110881
8,Fontibón,110911-110931
9,Engativá,111011-111071


Each locality has a range of postal codes. We keep only the first code of each range.

In [129]:
df['post_code'] = df['post_code'].str.split('-').str[0]  # keep the first code of each range
df

Unnamed: 0,town,post_code
0,Usaquén,110111
1,Chapinero,110211
2,Santa Fe,110311
3,San Cristóbal,110411
4,Usme,110511
5,Tunjuelito,110611
6,Bosa,110711
7,Kennedy,110811
8,Fontibón,110911
9,Engativá,111011
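
As a small illustration of how a postal code range can be reduced to a single code, the sketch below works on two of the ranges from the table. The variable names are chosen only for this example.

```python
import pandas as pd

# Two postal code ranges taken from the table above.
codes = pd.Series(["110111-110151", "110211-110231"], name="post_code")

# .str.split('-') gives a list per row; .str[0] picks the first element of each list.
first_code = codes.str.split("-").str[0]

# With expand=True the same split returns one column per part instead.
parts = codes.str.split("-", expand=True)
parts.columns = ["first", "last"]

print(first_code.tolist())
print(parts)
```

Selecting the part explicitly with `.str[0]` yields a single Series, which can be assigned to one DataFrame column without ambiguity.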


## Geolocations of the localities of Bogota

### ArcGis API

We need the geographical coordinates of the localities to plot our map. We will use the arcgis package to get them.

In [41]:
from arcgis.geocoding import geocode
from arcgis.gis import GIS
gis = GIS()



Defining Bogota arcgis geocode function to return latitude and longitude

In [115]:
def get_x_y_uk(address1):
    # Geocode the postal code within Bogota and keep the first (best) match
    g = geocode(address='{}, Bogota, Colombia'.format(address1))[0]
    lng_coords = g['location']['x']  # x is the longitude
    lat_coords = g['location']['y']  # y is the latitude
    return str(lat_coords) + "," + str(lng_coords)

Checking sample data

In [45]:
c = get_x_y_uk('110111')

In [46]:
c

'4.691000000000031,-74.03344796899995'
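
Indexing the geocoder result with `[0]` fails when no match is found. A minimal sketch of a more defensive lookup is shown below. Everything in it is an assumption made for illustration: `lat_lng_or_none` is a hypothetical helper, and `fake_geocode` is a stand-in that only mimics the `location` / `x` / `y` structure used above, so that the example runs without the arcgis package.

```python
def lat_lng_or_none(geocode_fn, address):
    """Return (lat, lng) for the first match, or None when nothing matches."""
    matches = geocode_fn(address=address)
    if not matches:
        return None
    location = matches[0]["location"]
    return location["y"], location["x"]  # y is latitude, x is longitude

# Stand-in for the real geocoder, used only to make the sketch runnable.
def fake_geocode(address):
    known = {
        "110111, Bogota, Colombia": [{"location": {"x": -74.0334, "y": 4.691}}],
    }
    return known.get(address, [])

print(lat_lng_or_none(fake_geocode, "110111, Bogota, Colombia"))
print(lat_lng_or_none(fake_geocode, "000000, Bogota, Colombia"))
```

Returning a tuple of floats instead of a comma-joined string would also avoid having to split the text again later.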

The result looks good. We copy the postal codes of Bogota so we can pass them to the geocoding function defined above.

In [106]:
coordinates = df['post_code']    
coordinates

0     110111
1     110211
2     110311
3     110411
4     110511
5     110611
6     110711
7     110811
8     110911
9     111011
10    111111
11    111211
12    111311
13    111411
14    111511
15    111611
16    111711
17    111811
18    111911
19    112011
Name: post_code, dtype: object

Passing postal codes of Bogota to get the geographical co-ordinates

In [107]:
coordinates_latlng_bg = coordinates.apply(lambda x: get_x_y_uk(x))
coordinates_latlng_bg

0      4.691000000000031,-74.03344796899995
1      4.667800000000057,-74.02591031199995
2     4.5902545530000225,-74.02563020599996
3      4.560310000000072,-74.05352330099998
4      4.496341219000044,-74.13080999999994
5      4.584725000000049,-74.14117047199994
6      4.638727325000048,-74.18991999999997
7     4.6525550000000635,-74.15529871899997
8      4.695895000000064,-74.14457572699996
9      4.724539094000022,-74.11550499999998
10    4.7080964370000515,-74.06316999999996
11      4.67783038400006,-74.07160499999998
12     4.633319682000035,-74.07290770799995
13     4.604255667000075,-74.08991499999996
14     4.587130000000059,-74.09692978299995
15     4.624250000000075,-74.10372384099998
16     4.599095000000034,-74.07197925799994
17    4.5797300000000405,-74.11682462899995
18     4.577027933000068,-74.15146499999997
19     4.223149782000064,-74.18503057699996
Name: post_code, dtype: object

### Latitude

Extracting the latitude from our previously collected coordinates

In [108]:
lat = coordinates_latlng_bg.apply(lambda x: x.split(',')[0])
lat

0      4.691000000000031
1      4.667800000000057
2     4.5902545530000225
3      4.560310000000072
4      4.496341219000044
5      4.584725000000049
6      4.638727325000048
7     4.6525550000000635
8      4.695895000000064
9      4.724539094000022
10    4.7080964370000515
11      4.67783038400006
12     4.633319682000035
13     4.604255667000075
14     4.587130000000059
15     4.624250000000075
16     4.599095000000034
17    4.5797300000000405
18     4.577027933000068
19     4.223149782000064
Name: post_code, dtype: object

### Longitude

Extracting the Longitude from our previously collected coordinates

In [109]:
lng = coordinates_latlng_bg.apply(lambda x: x.split(',')[1])
lng

0     -74.03344796899995
1     -74.02591031199995
2     -74.02563020599996
3     -74.05352330099998
4     -74.13080999999994
5     -74.14117047199994
6     -74.18991999999997
7     -74.15529871899997
8     -74.14457572699996
9     -74.11550499999998
10    -74.06316999999996
11    -74.07160499999998
12    -74.07290770799995
13    -74.08991499999996
14    -74.09692978299995
15    -74.10372384099998
16    -74.07197925799994
17    -74.11682462899995
18    -74.15146499999997
19    -74.18503057699996
Name: post_code, dtype: object

We now have the geographical co-ordinates of the localities of Bogota.

In [111]:
bogota = pd.concat([df,lat.astype(float), lng.astype(float)], axis=1)
bogota.columns= ['town','post_code','latitude','longitude']
bogota

Unnamed: 0,town,post_code,latitude,longitude
0,Usaquén,110111,4.691,-74.033448
1,Chapinero,110211,4.6678,-74.02591
2,Santa Fe,110311,4.590255,-74.02563
3,San Cristóbal,110411,4.56031,-74.053523
4,Usme,110511,4.496341,-74.13081
5,Tunjuelito,110611,4.584725,-74.14117
6,Bosa,110711,4.638727,-74.18992
7,Kennedy,110811,4.652555,-74.155299
8,Fontibón,110911,4.695895,-74.144576
9,Engativá,111011,4.724539,-74.115505


In [112]:
bogota.dtypes

town          object
post_code     object
latitude     float64
longitude    float64
dtype: object
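
The latitude and longitude extraction can also be done in a single step. The sketch below is an alternative under the assumption that every value has the `"lat,lng"` form shown earlier. It uses two of the coordinate strings from the output above.

```python
import pandas as pd

# Two "lat,lng" strings copied from the geocoding output.
pairs = pd.Series(["4.691000000000031,-74.03344796899995",
                   "4.667800000000057,-74.02591031199995"])

# Split once into two columns and convert both to float.
coords = pairs.str.split(",", expand=True).astype(float)
coords.columns = ["latitude", "longitude"]

print(coords.dtypes)
```

The resulting frame can be concatenated with the locality table in the same way as the separate `lat` and `lng` Series.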

### Co-ordinates for Bogota

Getting the geocode for Bogota to help visualize it on the map

In [117]:
bogota_cood = geocode(address='Bogota, Colombia')[0]
bogota_lng_coords = bogota_cood['location']['x']
bogota_lat_coords = bogota_cood['location']['y']
bogota_lng_coords

-74.06940999999995

In [118]:
bogota_lat_coords

4.614960000000053

## Visualize the Map of Bogota

In [123]:
map_bogota = folium.Map(location=[bogota_lat_coords, bogota_lng_coords], zoom_start=12)
map_bogota

for latitude, longitude, town in zip(bogota['latitude'], bogota['longitude'], bogota['town']):
    label = '{}'.format(town)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(map_bogota)  
    
map_bogota

### Venues in Bogota

In [130]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # placeholder: supply your own credential
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # placeholder: never publish the real value
VERSION = '20180605' # Foursquare API version

In [165]:
LIMIT=100

def getVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    venues.columns = ['Localities', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(venues)
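
The parsing step inside `getVenues` assumes that every response has a group with items and that every venue has at least one category. The sketch below shows one way to make that step tolerant of missing pieces. It is only an illustration: `extract_venues` is a hypothetical helper, and `sample` is a hand-written payload that imitates the `response` / `groups` / `items` / `venue` structure the function reads, not real API output.

```python
def extract_venues(payload, locality):
    """Return (locality, venue name, category) tuples from an explore-style payload."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a label when a venue has no category at all.
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((locality, venue["name"], category))
    return rows

# Hand-written sample payload (assumption: same shape as the real response).
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Juan Valdéz Café", "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Example kiosk", "categories": []}},
]}]}}

print(extract_venues(sample, "Usaquén"))
print(extract_venues({"response": {}}, "Sumapaz"))
```

An empty list for a locality makes it explicit that no venues were returned for it, which matters later when cluster labels are joined back to the locality table.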

Getting the venues in Bogota

In [166]:
venues_bogota = getVenues(bogota['town'], bogota['latitude'], bogota['longitude'])

Usaquén
Chapinero
Santa Fe
San Cristóbal
Usme
Tunjuelito
Bosa
Kennedy
Fontibón
Engativá
Suba
Barrios Unidos
Teusaquillo
Los Mártires
Antonio Nariño
Puente Aranda
La Candelaria
Rafael Uribe Uribe
Ciudad Bolívar
Sumapaz


In [167]:
venues_bogota.head()

Unnamed: 0,Localities,Latitude,Longitude,Venue,Venue Category
0,Usaquén,4.691,-74.033448,Hotel NH Collection Bogotá Hacienda Royal,Hotel
1,Usaquén,4.691,-74.033448,Juan Valdéz Café,Coffee Shop
2,Usaquén,4.691,-74.033448,Parque Publico Usaquén II,Dog Run
3,Usaquén,4.691,-74.033448,W Bogotá Hotel,Hotel
4,Usaquén,4.691,-74.033448,Bodytech Hacienda,Gymnastics Gym


In [146]:
venues_bogota.shape

(248, 5)

We have 248 results.

### Grouping by Venue Categories

In [168]:
venues_bogota.groupby('Venue Category').max()

Venue Category,Localities,Latitude,Longitude,Venue
Airport,Fontibón,4.695895,-74.144576,Aeropuerto Internacional El Dorado (BOG) (Aero...
Airport Lounge,Fontibón,4.695895,-74.144576,Sala VIP LATAM
Airport Service,Fontibón,4.695895,-74.144576,International Connections Security
Airport Terminal,Fontibón,4.695895,-74.144576,Gate 92
Arcade,La Candelaria,4.599095,-74.071979,Teatro Odeon
...,...,...,...,...
Sushi Restaurant,Usaquén,4.691000,-74.033448,Sushisan Usaquén
Theater,La Candelaria,4.599095,-74.071979,Teatro Colón
Theme Restaurant,Fontibón,4.695895,-74.144576,Andrés Paradero
Toy / Game Store,Engativá,4.724539,-74.115505,Pepe Ganga Outlet


### One Hot Encoding 

In [148]:
bogota_ohe = pd.get_dummies(venues_bogota[['Venue Category']], prefix="", prefix_sep="")
bogota_ohe

Unnamed: 0,Airport,Airport Lounge,Airport Service,Airport Terminal,Arcade,Argentinian Restaurant,Art Museum,Asian Restaurant,Auto Garage,BBQ Joint,...,Shop & Service,Shopping Mall,Snack Place,South American Restaurant,Steakhouse,Sushi Restaurant,Theater,Theme Restaurant,Toy / Game Store,Vegetarian / Vegan Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
243,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
244,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
245,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
246,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Adding localities into the mix.

In [152]:
bogota_ohe['Localities'] = venues_bogota['Localities'] 

# move the "Localities" column to the front
fixed_columns = [bogota_ohe.columns[-1]] + list(bogota_ohe.columns[:-1])
bogota_ohe = bogota_ohe[fixed_columns]

bogota_ohe

Unnamed: 0,Toy / Game Store,Vegetarian / Vegan Restaurant,Localities,Airport,Airport Lounge,Airport Service,Airport Terminal,Arcade,Argentinian Restaurant,Art Museum,...,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,South American Restaurant,Steakhouse,Sushi Restaurant,Theater,Theme Restaurant
0,0,0,Usaquén,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,Usaquén,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,Usaquén,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,Usaquén,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,Usaquén,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
243,0,0,Rafael Uribe Uribe,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
244,0,0,Ciudad Bolívar,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
245,0,0,Ciudad Bolívar,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
246,0,0,Ciudad Bolívar,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Venue categories

In [155]:
bogota_gp = bogota_ohe.groupby('Localities').mean().reset_index()
bogota_gp

Unnamed: 0,Localities,Toy / Game Store,Vegetarian / Vegan Restaurant,Airport,Airport Lounge,Airport Service,Airport Terminal,Arcade,Argentinian Restaurant,Art Museum,...,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,South American Restaurant,Steakhouse,Sushi Restaurant,Theater,Theme Restaurant
0,Antonio Nariño,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0
1,Barrios Unidos,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bosa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
3,Chapinero,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Ciudad Bolívar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Engativá,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Fontibón,0.0,0.0,0.026316,0.131579,0.026316,0.026316,0.0,0.0,0.0,...,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.026316
7,Kennedy,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0
8,La Candelaria,0.0,0.025641,0.0,0.0,0.0,0.0,0.012821,0.025641,0.025641,...,0.012821,0.0,0.0,0.0,0.025641,0.012821,0.0,0.012821,0.012821,0.0
9,Los Mártires,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.375,0.0,0.0,0.0,0.0,0.0,0.0
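
To see why the mean of the one-hot columns gives the share of each venue category per locality, here is a toy example. The localities `A` and `B` and their venues are invented for illustration only.

```python
import pandas as pd

# Invented data: locality A has 4 venues, locality B has 2.
toy = pd.DataFrame({
    "Localities": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Hotel", "Park", "Park", "Park"],
})

# One column per category, with 1 where the venue belongs to that category.
ohe = pd.get_dummies(toy["Venue Category"], dtype=int)
ohe.insert(0, "Localities", toy["Localities"])

# The mean of a 0/1 column is the fraction of venues in that category.
freq = ohe.groupby("Localities").mean()
print(freq)
```

Each row of `freq` sums to 1, so localities with very different venue counts become comparable before clustering.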


In [156]:
def common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

There are far too many venue categories to inspect one by one, so we keep the 10 most common categories of each locality to describe the clusters.

In [157]:
indicators = ['st', 'nd', 'rd']

columns = ['Localities']
for ind in np.arange(10):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
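
The column labels can also be built without relying on an exception. The sketch below is one possible alternative, where `ordinal` is a hypothetical helper that is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Same column layout as above: the locality name followed by ten ranked slots.
columns = ["Localities"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns)
```

For the ten columns used here both approaches give the same labels. The helper only differs for larger ranks such as 21 or 22.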


### Top venues in Bogota

In [158]:
bogota_localities_venues = pd.DataFrame(columns=columns)
bogota_localities_venues['Localities'] = bogota_gp['Localities']

for ind in np.arange(bogota_gp.shape[0]):
    bogota_localities_venues.iloc[ind, 1:] = common_venues(bogota_gp.iloc[ind, :], 10)

bogota_localities_venues

Unnamed: 0,Localities,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Antonio Nariño,Department Store,Clothing Store,Restaurant,Fried Chicken Joint,Pizza Place,Drugstore,Deli / Bodega,Gym,Mexican Restaurant,Mobile Phone Shop
1,Barrios Unidos,Mexican Restaurant,Latin American Restaurant,Theme Restaurant,Café,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store
2,Bosa,Movie Theater,Shopping Mall,Ice Cream Shop,Pharmacy,Cosmetics Shop,Café,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop
3,Chapinero,Mountain,Theme Restaurant,French Restaurant,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store
4,Ciudad Bolívar,Fast Food Restaurant,Seafood Restaurant,Park,Auto Garage,Theme Restaurant,Cultural Center,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop
5,Engativá,Pizza Place,Seafood Restaurant,Multiplex,Park,Bar,Bakery,Pub,Gym / Fitness Center,Toy / Game Store,Shop & Service
6,Fontibón,Airport Lounge,Coffee Shop,Café,Duty-free Shop,Pizza Place,Donut Shop,Cosmetics Shop,Fried Chicken Joint,Gift Shop,Cafeteria
7,Kennedy,Burger Joint,Shopping Mall,Beer Garden,Health Food Store,Cupcake Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall
8,La Candelaria,Café,Italian Restaurant,History Museum,Hostel,Restaurant,Cocktail Bar,Mexican Restaurant,Argentinian Restaurant,Bookstore,Breakfast Spot
9,Los Mártires,Shopping Mall,Mobile Phone Shop,Boutique,Clothing Store,Department Store,Restaurant,Theme Restaurant,Cosmetics Shop,Caribbean Restaurant,Cocktail Bar
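
Several localities in this table end with the same run of categories (for example Clothing Store, Cocktail Bar, Coffee Shop). This appears to happen when a locality has fewer than 10 distinct categories, so categories with a frequency of zero fill the remaining slots. A minimal sketch of a ranking that skips them is shown below. `top_venues` is a hypothetical helper and the frequencies are invented for illustration.

```python
import pandas as pd

# Invented frequencies for one locality: only three categories are present.
row = pd.Series({"Mountain": 0.5, "Theme Restaurant": 0.25, "French Restaurant": 0.25,
                 "Café": 0.0, "Hotel": 0.0})

def top_venues(freqs, n):
    """Return up to n category names with non-zero frequency, most frequent first."""
    present = freqs[freqs > 0]
    return present.sort_values(ascending=False).index[:n].tolist()

result = top_venues(row, 10)
print(result)
```

With this variant a locality may have fewer than 10 entries, so the remaining columns would need to stay empty rather than be filled with categories that were never observed there.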


##  KMeans Model Building

In [189]:
clusters = 5

bogota_gp_clustering = bogota_gp.drop(columns='Localities')

kmeans_bogota = KMeans(n_clusters=clusters, random_state=0).fit(bogota_gp_clustering)
kmeans_bogota

KMeans(n_clusters=5, random_state=0)

### Labelling Clustered Data

In [190]:
kmeans_bogota.labels_

array([0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 3, 0], dtype=int32)
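
A quick way to read this array is to count how many localities fall into each cluster. The sketch below copies the labels shown above and adds 1 to each label when printing, matching the numbering used for the cluster tables.

```python
from collections import Counter

# Labels copied from the KMeans output above (one entry per locality with venues).
labels = [0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 3, 0]

sizes = Counter(labels)
for label in sorted(sizes):
    print(f"cluster {label + 1}: {sizes[label]} localities")
```

The counts show one large cluster and four clusters with a single locality each, which is worth keeping in mind when interpreting the results.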

In [161]:
bogota_localities_venues.insert(0, 'Cluster Labels', kmeans_bogota.labels_ +1)

We join the dataframe "bogota" with the sorted venues of each locality, adding the latitude and longitude of each locality to prepare the data for plotting.

In [191]:
bogota_data = bogota

bogota_data = bogota_data.join(bogota_localities_venues.set_index('Localities'), on='town')

bogota_data

Unnamed: 0,town,post_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Usaquén,110111,4.691,-74.033448,1.0,Hotel,Café,French Restaurant,Asian Restaurant,Pub,Steakhouse,Bar,Restaurant,Gymnastics Gym,Gym / Fitness Center
1,Chapinero,110211,4.6678,-74.02591,2.0,Mountain,Theme Restaurant,French Restaurant,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store
2,Santa Fe,110311,4.590255,-74.02563,,,,,,,,,,,
3,San Cristóbal,110411,4.56031,-74.053523,,,,,,,,,,,
4,Usme,110511,4.496341,-74.13081,,,,,,,,,,,
5,Tunjuelito,110611,4.584725,-74.14117,4.0,Gym,Latin American Restaurant,Restaurant,Theme Restaurant,Cultural Center,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant
6,Bosa,110711,4.638727,-74.18992,1.0,Movie Theater,Shopping Mall,Ice Cream Shop,Pharmacy,Cosmetics Shop,Café,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop
7,Kennedy,110811,4.652555,-74.155299,1.0,Burger Joint,Shopping Mall,Beer Garden,Health Food Store,Cupcake Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall
8,Fontibón,110911,4.695895,-74.144576,1.0,Airport Lounge,Coffee Shop,Café,Duty-free Shop,Pizza Place,Donut Shop,Cosmetics Shop,Fried Chicken Joint,Gift Shop,Cafeteria
9,Engativá,111011,4.724539,-74.115505,1.0,Pizza Place,Seafood Restaurant,Multiplex,Park,Bar,Bakery,Pub,Gym / Fitness Center,Toy / Game Store,Shop & Service



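Localities for which no venues were returned have no cluster label, so they appear as NaN after the join. We drop those rows.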

In [192]:
bogota_data_wo_nan = bogota_data.dropna(subset=['Cluster Labels'])
bogota_data = bogota_data_wo_nan
bogota_data

Unnamed: 0,town,post_code,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Usaquén,110111,4.691,-74.033448,1.0,Hotel,Café,French Restaurant,Asian Restaurant,Pub,Steakhouse,Bar,Restaurant,Gymnastics Gym,Gym / Fitness Center
1,Chapinero,110211,4.6678,-74.02591,2.0,Mountain,Theme Restaurant,French Restaurant,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store
5,Tunjuelito,110611,4.584725,-74.14117,4.0,Gym,Latin American Restaurant,Restaurant,Theme Restaurant,Cultural Center,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant
6,Bosa,110711,4.638727,-74.18992,1.0,Movie Theater,Shopping Mall,Ice Cream Shop,Pharmacy,Cosmetics Shop,Café,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop
7,Kennedy,110811,4.652555,-74.155299,1.0,Burger Joint,Shopping Mall,Beer Garden,Health Food Store,Cupcake Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall
8,Fontibón,110911,4.695895,-74.144576,1.0,Airport Lounge,Coffee Shop,Café,Duty-free Shop,Pizza Place,Donut Shop,Cosmetics Shop,Fried Chicken Joint,Gift Shop,Cafeteria
9,Engativá,111011,4.724539,-74.115505,1.0,Pizza Place,Seafood Restaurant,Multiplex,Park,Bar,Bakery,Pub,Gym / Fitness Center,Toy / Game Store,Shop & Service
10,Suba,111111,4.708096,-74.06317,1.0,Coffee Shop,Fast Food Restaurant,Gastropub,Seafood Restaurant,Ice Cream Shop,Restaurant,Pub,Park,Theme Restaurant,Cosmetics Shop
11,Barrios Unidos,111211,4.67783,-74.071605,3.0,Mexican Restaurant,Latin American Restaurant,Theme Restaurant,Café,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store
12,Teusaquillo,111311,4.63332,-74.072908,1.0,Brewery,Café,Burger Joint,Indie Theater,Peruvian Restaurant,New American Restaurant,Cocktail Bar,Ice Cream Shop,Restaurant,Sandwich Place
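
Before dropping rows, it can be useful to list which localities have no match. The sketch below is a toy version of the join with invented cluster labels, using the `indicator` option of `merge` to mark unmatched rows.

```python
import pandas as pd

# Toy data: three localities, but only two of them received a cluster label.
localities = pd.DataFrame({"town": ["Usaquén", "Santa Fe", "Bosa"]})
clustered = pd.DataFrame({"Localities": ["Usaquén", "Bosa"], "Cluster Labels": [1, 1]})

# how="left" keeps every locality; indicator=True adds a "_merge" column.
merged = localities.merge(clustered, left_on="town", right_on="Localities",
                          how="left", indicator=True)

# Rows marked "left_only" had no cluster label to join.
missing = merged.loc[merged["_merge"] == "left_only", "town"].tolist()
print(missing)
```

Listing the unmatched localities first makes it clear what is excluded from the map and the cluster tables.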


### Visualizing the clusters

In [186]:
map_clusters = folium.Map(location=[bogota_lat_coords, bogota_lng_coords], zoom_start=12)

x = np.arange(clusters)
ys = [i + x + (i*x)**2 for i in range(clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(bogota_data['latitude'], bogota_data['longitude'], bogota_data['town'], bogota_data['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster)) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=20,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters)
        
map_clusters

## Examining our Clusters

### Cluster 1

In [178]:
bogota_data.loc[bogota_data['Cluster Labels'] == 1, bogota_data.columns[[1] + list(range(5, bogota_data.shape[1]))]]

Unnamed: 0,post_code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,110111,Hotel,Café,French Restaurant,Asian Restaurant,Pub,Steakhouse,Bar,Restaurant,Gymnastics Gym,Gym / Fitness Center
6,110711,Movie Theater,Shopping Mall,Ice Cream Shop,Pharmacy,Cosmetics Shop,Café,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop
7,110811,Burger Joint,Shopping Mall,Beer Garden,Health Food Store,Cupcake Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall
8,110911,Airport Lounge,Coffee Shop,Café,Duty-free Shop,Pizza Place,Donut Shop,Cosmetics Shop,Fried Chicken Joint,Gift Shop,Cafeteria
9,111011,Pizza Place,Seafood Restaurant,Multiplex,Park,Bar,Bakery,Pub,Gym / Fitness Center,Toy / Game Store,Shop & Service
10,111111,Coffee Shop,Fast Food Restaurant,Gastropub,Seafood Restaurant,Ice Cream Shop,Restaurant,Pub,Park,Theme Restaurant,Cosmetics Shop
12,111311,Brewery,Café,Burger Joint,Indie Theater,Peruvian Restaurant,New American Restaurant,Cocktail Bar,Ice Cream Shop,Restaurant,Sandwich Place
13,111411,Shopping Mall,Mobile Phone Shop,Boutique,Clothing Store,Department Store,Restaurant,Theme Restaurant,Cosmetics Shop,Caribbean Restaurant,Cocktail Bar
14,111511,Department Store,Clothing Store,Restaurant,Fried Chicken Joint,Pizza Place,Drugstore,Deli / Bodega,Gym,Mexican Restaurant,Mobile Phone Shop
16,111711,Café,Italian Restaurant,History Museum,Hostel,Restaurant,Cocktail Bar,Mexican Restaurant,Argentinian Restaurant,Bookstore,Breakfast Spot


### Cluster 2

In [177]:
bogota_data.loc[bogota_data['Cluster Labels'] == 2, bogota_data.columns[[1] + list(range(5, bogota_data.shape[1]))]]

Unnamed: 0,post_code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,110211,Mountain,Theme Restaurant,French Restaurant,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store


### Cluster 3

In [179]:
bogota_data.loc[bogota_data['Cluster Labels'] == 3, bogota_data.columns[[1] + list(range(5, bogota_data.shape[1]))]]

Unnamed: 0,post_code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,111211,Mexican Restaurant,Latin American Restaurant,Theme Restaurant,Café,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store


### Cluster 4

In [180]:
bogota_data.loc[bogota_data['Cluster Labels'] == 4, bogota_data.columns[[1] + list(range(5, bogota_data.shape[1]))]]

Unnamed: 0,post_code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,110611,Gym,Latin American Restaurant,Restaurant,Theme Restaurant,Cultural Center,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant


### Cluster 5

In [181]:
bogota_data.loc[bogota_data['Cluster Labels'] == 5, bogota_data.columns[[1] + list(range(5, bogota_data.shape[1]))]]

Unnamed: 0,post_code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,111611,BBQ Joint,Seafood Restaurant,Bakery,Motorcycle Shop,Fast Food Restaurant,Duty-free Shop,Caribbean Restaurant,Clothing Store,Cocktail Bar,Coffee Shop


# Results and Discussion

The localities of Bogota offer a diverse range of places to visit. The number of localities is small, and the clustering placed most of them in a single cluster, with each of the remaining clusters holding one locality. Even so, Bogota shows up as a multicultural city with a great variety of activities, depending on what a person is looking for. Its restaurants are mainly typical Colombian, French, Chinese, Italian and, in some cases, theme restaurants. It has many museums and green areas that people seem to like very much, many bars, and gyms in several localities for people looking to exercise.

# Conclusion

The purpose of this project was to explore the localities of Bogota and see how attractive they are for tourists, for residents, and for people who would like to live in the city. We explored the city starting from the postal codes of its localities, extracted the most common venues in each locality, and finally grouped similar localities into clusters.

We were able to see that each locality in Bogota offers a wide variety of experiences that are unique in their own way. The cultural diversity is quite evident in the range of international venues, which also gives a feeling of inclusion.

Not every locality seems to offer a vacation or romantic getaway with many places to explore, beautiful landscapes and a wide variety of cultures. But a large number of them could make for a great short vacation and a pleasant memory, not only for their museums, restaurants and parks, but also for their culture, tourist sites and the wonderful Colombian coffee.

## References

1. [The Battle of Neighbourhood — A Tale of Two Cities by Thomas George](https://medium.com/analytics-vidhya/a-tale-of-two-cities-clustering-neighborhoods-of-london-and-paris-5328f69cd8b6)
2. [Foursquare API](https://foursquare.com/)
3. [ArcGIS API](https://www.arcgis.com/index.html)