# Final Capstone Project: Bogota Foodie Tour


## Background
Bogota's gastronomic offering has grown dramatically. Dozens of restaurants open every day, and the popular dining zones widen every month as alternative venues open in completely new places, far away from the traditional restaurant zones.

## Introduction and business reason

This project intends to cluster neighborhoods in Bogota according to their restaurant offering so that foodies can easily define a tour route based on their preferences. Two deliverables will be presented at the end of the project:

1. Descriptive statistics for Bogotá venues, including the number of restaurants per category, and a map showing their geographical distribution
2. Clustering of the neighborhoods based on offer density and restaurant categories. The following deliverables will be presented: 
    a. A description of each cluster 
    b. A table with the classification of each neighborhood in Bogota, identifying the cluster it belongs to, the number of restaurants in the area, and the distribution of restaurants per category
    c. A map representing the neighborhood clusters graphically

This project will give users an accurate map of the restaurant offer in Bogota. This can be an input for commercial zones to increase their visibility for foodie tourism, or for the tourism industry to better structure its gastronomic tourism offerings.



## Data 

### Data Source for Bogota Neighborhood Location

Bogota's Urban Laboratory Institute ("Laboratorio Urbano de Bogota") has a very complete database that includes the name, location and description of each neighborhood in Bogota. This will be the main source of external data for the project. Raw data can be downloaded from the following link: https://bogota-laburbano.opendatasoft.com/explore/dataset/barrios_prueba/download?format=xls

### Data Uploading and Data Cleaning for Bogota Data

To interact with the data in my Jupyter notebook, I will upload the file to the project storage using the "File and Add data" functionality of the Notebook application. 

Once it is uploaded, I will have to clean the data: 
1. I will drop the rows that don't have a Borough identified ("Localidad"), that are not a legal neighborhood, or that don't have a name
2. The original table has the following columns: "OBJECTID, Codigo Localidad, Localidad, Estado, Nombre, Codigo, SHAPE.AREA, SHAPE.LEN, geo_shape, geo_point_2d". I will only use the columns "OBJECTID, Localidad, Nombre, geo_point_2d". Column geo_point_2d includes both Latitude and Longitude and has to be split.
3. I will rename the columns as follows: "Neighborhood_ID, Borough, Name, Latitude, Longitude" 

<b>For Example</b>: Neighborhood "Usaquén" is one of the most important ones for the restaurant industry in Bogota. This neighborhood will definitely have to be included in the data set. The final values for the mentioned columns shall be:

1. <b>Neighborhood_ID</b>: 657
2. <b>Borough</b>: Usaquén
3. <b>Name</b>: Usaquén
4. <b>Latitude</b>: 4.69474025606
5. <b>Longitude</b>: -74.030740809

Neighborhood "Ciudadela El Poblado" in line 3000 of the original table, on the contrary, is not a legal neighborhood and doesn't specify a Borough, so it will be dropped accordingly.

The code in this notebook shows the steps to manage the data and the resulting table.


### Foursquare query

Once the Bogota information is clean and in the proper format, I will search Foursquare for restaurant venues in each of the neighborhoods. The output of the query will have to be cleaned and grouped in order to provide meaningful results:
1. The number of categories returned by the Foursquare query is extensive, which makes the results harder to interpret and cluster

<b>For Example</b>: When I query Usaquén, it will probably show a wide range of categories for different Asian restaurants, probably including Asian, Filipino and Chinese. I will group them as Asian. 

The final categories will be defined based on an exploratory analysis that will be performed once all the 

2. The information will have to be merged with the original dataframe and consolidated.
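
The grouping and merging steps above could look roughly like the sketch below. This is only an illustration under stated assumptions: the mapping `CATEGORY_GROUPS`, the helper `group_category`, the column names "Venue Category" and "Category Group", and all venue rows are hypothetical, since the final categories will only be defined after the exploratory analysis.

```python
import pandas as pd

# Hypothetical mapping from raw venue categories to broader groups.
# The real mapping depends on the exploratory analysis.
CATEGORY_GROUPS = {
    "Asian Restaurant": "Asian",
    "Filipino Restaurant": "Asian",
    "Chinese Restaurant": "Asian",
    "Italian Restaurant": "Italian",
    "Pizza Place": "Italian",
}


def group_category(raw_category):
    """Return the broad group for a raw category, or 'Other' if unmapped."""
    return CATEGORY_GROUPS.get(raw_category, "Other")


# Made-up venue rows standing in for the cleaned query output.
venues = pd.DataFrame({
    "Neighborhood_ID": [657, 657, 657, 1430],
    "Venue Category": ["Filipino Restaurant", "Chinese Restaurant",
                       "Pizza Place", "Bakery"],
})
venues["Category Group"] = venues["Venue Category"].map(group_category)

# Count venues per neighborhood and category group.
counts = pd.crosstab(venues["Neighborhood_ID"], venues["Category Group"])
group_cols = list(counts.columns)

# Made-up neighborhood rows using the column names from this notebook.
neighborhoods = pd.DataFrame({
    "Neighborhood_ID": [657, 1430, 1431],
    "Name": ["Usaquén", "Jerusalén La Pradera", "Gibraltar I"],
})

# A left merge keeps neighborhoods without venues; their counts become 0.
merged = neighborhoods.merge(counts.reset_index(), how="left",
                             on="Neighborhood_ID")
merged[group_cols] = merged[group_cols].fillna(0).astype(int)
merged["Total"] = merged[group_cols].sum(axis=1)
print(merged)
```

Keeping the mapping in a plain dictionary makes it easy to revise the groups without touching the rest of the pipeline, and the "Other" fallback avoids losing venues whose category was not anticipated.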


### Clustering

Once the final dataframe is properly set, clustering will be performed. The number of clusters will be defined based on the analysis performed with the data.
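
One possible way to choose the number of clusters from the data is to compare silhouette scores for several values of k. The sketch below is an assumption about how this could be done, not the method fixed for this project; the feature matrix `X` is made up and simply plants three obvious groups of neighborhoods.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Made-up features: one row per neighborhood, columns are the share of
# restaurants in three hypothetical category groups.
centers = np.array([[0.8, 0.1, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.1, 0.1, 0.8]])
offsets = np.array([[0.00, 0.00, 0.00],
                    [0.02, -0.02, 0.00],
                    [0.00, 0.02, -0.02],
                    [-0.02, 0.00, 0.02]])
X = np.vstack([center + offsets for center in centers])  # 12 rows, 3 columns

# Fit k-means for each candidate k and record the silhouette score.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Keep the k with the highest score and fit the final model with it.
best_k = max(scores, key=scores.get)
final_labels = KMeans(n_clusters=best_k, n_init=10,
                      random_state=0).fit_predict(X)
print(best_k, scores)
```

On real data the scores are usually much closer together than on this toy matrix, so the chosen k should also be checked against how interpretable the resulting clusters are.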


### Analysis and presentation

Each cluster will be analyzed to provide a description and characterization. A coloured map will be presented showing the different neighborhoods and the cluster each one belongs to.
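
A cluster description could be derived from a per-cluster summary like the sketch below. The table, the category columns and the colour palette are all hypothetical; the "Cluster" column stands in for the labels produced by the clustering step.

```python
import pandas as pd

# Made-up clustered table; "Cluster" stands in for the k-means labels.
clustered = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Cluster": [0, 0, 1, 1],
    "Asian": [4, 6, 0, 1],
    "Italian": [1, 1, 5, 3],
})
category_cols = ["Asian", "Italian"]
clustered["Total"] = clustered[category_cols].sum(axis=1)

# Share of each category within a cluster.
sums = clustered.groupby("Cluster")[category_cols].sum()
profile = sums.div(sums.sum(axis=1), axis=0)

# Cluster size, average number of restaurants, and dominant category.
profile["Neighborhoods"] = clustered.groupby("Cluster").size()
profile["Mean restaurants"] = clustered.groupby("Cluster")["Total"].mean()
profile["Dominant category"] = profile[category_cols].idxmax(axis=1)

# One fixed hex colour per cluster, usable as the marker colour on a map.
palette = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]
clustered["Colour"] = clustered["Cluster"].map(
    lambda c: palette[c % len(palette)])
print(profile)
```

The "Colour" column can then be passed as the marker colour when the neighborhoods are drawn on the map.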


In [1]:
# Import Libraries
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3
import numpy as np # library to handle data in a vectorized manner
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [7]:
# Code to access IBM Cloud Object Storage, including credentials.
# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_05f8314fda894ba0b966bc0d65084f49 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='L6FVRNp97UsiIX4tlvcrIKTer1mTqXqWSBrvIoXQD71z',
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

In [53]:
body = client_05f8314fda894ba0b966bc0d65084f49.get_object(Bucket='capstonew45finalprojectbogotafood-donotdelete-pr-opjjibxwizmams',Key='barrios_bogota_2.xlsx')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
def __iter__(self): return 0
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_bog_neigh = pd.read_excel(body)
df_bog_neigh.head()

Unnamed: 0,OBJECTID,Codigo Localidad,Localidad,Estado,Nombre,Codigo,SHAPE.AREA,SHAPE.LEN,geo_shape,geo_point_2d
0,1430,19,Ciudad Bolívar,LEGALIZADO,Jerusalén La Pradera,190083,2.199423e-05,0.022444,"{""type"": ""Polygon"", ""coordinates"": [[[-74.1599...","4.57170783064, -74.1623352778"
1,1431,19,Ciudad Bolívar,LEGALIZADO,Gibraltar I,190076,7.35052e-07,0.003827,"{""type"": ""Polygon"", ""coordinates"": [[[-74.1409...","4.55942859701, -74.1412673602"
2,1434,19,Ciudad Bolívar,LEGALIZADO,San Francisco La Palmera,190212,7.373517e-07,0.005711,"{""type"": ""Polygon"", ""coordinates"": [[[-74.1438...","4.56057102018, -74.143503033"
3,1437,19,Ciudad Bolívar,LEGALIZADO,La Florida del Sur,190100,1.1878e-06,0.006444,"{""type"": ""Polygon"", ""coordinates"": [[[-74.1388...","4.54954369286, -74.1388429955"
4,1440,19,Ciudad Bolívar,SIN LEGALIZAR,Urb. Villa del Rio Zona Comercial,190285,1.229367e-05,0.016802,"{""type"": ""Polygon"", ""coordinates"": [[[-74.1550...","4.59753400889, -74.152870721"


In [54]:
# Check shape
df_bog_neigh.shape

(3871, 10)

In [55]:
# Keep only legal neighborhoods
# .copy() makes the filtered table independent, so later in-place edits are safe
df_bog_neigh = df_bog_neigh[df_bog_neigh.Estado == 'LEGALIZADO'].copy()

In [56]:
# Drop neighborhoods without Borough
# Treat empty strings as missing values so dropna can remove them
df_bog_neigh['Localidad'] = df_bog_neigh['Localidad'].replace('', np.nan)
df_bog_neigh.dropna(subset=['Localidad'], inplace=True)
df_bog_neigh.shape

(1618, 10)

In [57]:
# Drop Columns
df_bog_neigh.drop(['Codigo Localidad', 'Estado', 'Codigo', 'SHAPE.AREA', 'SHAPE.LEN', 'geo_shape'], axis=1, inplace=True)
df_bog_neigh.head()

Unnamed: 0,OBJECTID,Localidad,Nombre,geo_point_2d
0,1430,Ciudad Bolívar,Jerusalén La Pradera,"4.57170783064, -74.1623352778"
1,1431,Ciudad Bolívar,Gibraltar I,"4.55942859701, -74.1412673602"
2,1434,Ciudad Bolívar,San Francisco La Palmera,"4.56057102018, -74.143503033"
3,1437,Ciudad Bolívar,La Florida del Sur,"4.54954369286, -74.1388429955"
6,1448,Ciudad Bolívar,Naciones Unidas (Santa Rosa),"4.53915461488, -74.1508507618"


In [58]:
# Split coordinates in two columns and rename
coord = df_bog_neigh["geo_point_2d"].str.split(", ", n = 1, expand = True)
df_bog_neigh["Latitude"]=coord[0]
df_bog_neigh["Longitude"]=coord[1]
df_bog_neigh.drop(['geo_point_2d'], axis=1, inplace=True)
df_bog_neigh.rename(columns={'OBJECTID': 'Neighborhood_ID', 'Localidad': 'Borough', 'Nombre': 'Name'}, inplace=True)
df_bog_neigh

Unnamed: 0,Neighborhood_ID,Borough,Name,Latitude,Longitude
0,1430,Ciudad Bolívar,Jerusalén La Pradera,4.57170783064,-74.1623352778
1,1431,Ciudad Bolívar,Gibraltar I,4.55942859701,-74.1412673602
2,1434,Ciudad Bolívar,San Francisco La Palmera,4.56057102018,-74.143503033
3,1437,Ciudad Bolívar,La Florida del Sur,4.54954369286,-74.1388429955
6,1448,Ciudad Bolívar,Naciones Unidas (Santa Rosa),4.53915461488,-74.1508507618
7,1450,Ciudad Bolívar,El Tesoro,4.53922451762,-74.146721501
9,1453,Ciudad Bolívar,Villa Jacqui,4.54380177541,-74.1349323586
10,1459,Ciudad Bolívar,Bella Flor,4.54351911036,-74.1610882451
11,1464,Ciudad Bolívar,Bellavista Lucero Alto,4.54928656394,-74.1454955628
13,1471,Ciudad Bolívar,El Castillo_,4.55186147742,-74.1468462235
