# CAPSTONE PROJECT - THE BATTLE OF THE NEIGHBORHOODS 
### Applied Data Science Capstone by IBM/Coursera

Author: Eduardo José Mendoza Chávez


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The main purpose of this project is to find the optimal location to open a new restaurant in the municipality of San Pedro Garza García, Nuevo León, México. After analyzing the neighborhoods, we will try to identify one that is not already saturated with such venues and is still close to the city center.

This analysis is also helpful because it may tell us that opening a restaurant is not the best idea. It covers every neighborhood in the city, so it can reveal trends or missing venue types in a neighborhood, and we can exploit that information in our favor, which is one of the benefits of using data science.



## Data <a name="data"></a>

The most important factors in the decision-making process are the following:
* number of existing restaurants in the neighborhood
* types of venues present in the neighborhood
* distance from the city center (away from exclusively residential areas)
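
The third factor, distance from the city center, can be made concrete with a great-circle distance. The sketch below is not part of the original notebook: it assumes the haversine formula with a mean Earth radius of 6371 km, and the helper name `haversine_km` is hypothetical. The center coordinates are the ones the geocoder returns later in this notebook, and the sample point is the centroid of postal code 66200.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of the haversine model)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

center = (25.6651051, -100.4022714)   # geocoded center of the municipality
cp_66200 = (25.661658, -100.41)       # centroid of postal code 66200

distance = haversine_km(*center, *cp_66200)
print(round(distance, 2), "km")
```

Computing this value for every postal code would give a numeric column that can be sorted or thresholded, instead of judging centrality by eye on the map.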

The following data sources were used to extract the required information: 
* Postal Codes (shown as CP from now on) were obtained from Mexican Postal Service (https://www.correosdemexico.gob.mx/SSLServicios/ConsultaCP/Descarga.aspx)
* Latitude and Longitude of each CP were obtained from the free online database named GeoNames (http://download.geonames.org/export/zip/)
* Venue names, types and locations were obtained using the **Foursquare API**



In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

# libraries for displaying images and HTML
from IPython.display import Image, HTML

# transforming a json file into a pandas dataframe
# (in pandas >= 1.0 the preferred import is `from pandas import json_normalize`)
from pandas.io.json import json_normalize

!pip install folium
import folium 
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')


Collecting folium
  Downloading https://files.pythonhosted.org/packages/a4/f0/44e69d50519880287cc41e7c8a6acc58daa9a9acf5f6afc52bcc70f69a6d/folium-0.11.0-py2.py3-none-any.whl (93kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/13/fb/9eacc24ba3216510c6b59a4ea1cd53d87f25ba76237d7f4393abeaf4c94e/branca-0.4.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Folium installed
Libraries imported.


**Importing the database file**

In [2]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,PAIS,CP,ZONA,ESTADO,MUNICIPIO,LATITUD,LONGITUD,DUPLICATE
0,MX,66200,San Pedro Garza Garcia Centro,Nuevo Leon,San Pedro Garza Garc√≠a,25.661658,-100.41,0
1,MX,66210,La Leona,Nuevo Leon,San Pedro Garza Garc√≠a,25.684703,-100.414234,0
2,MX,66214,El Obispo,Nuevo Leon,San Pedro Garza Garc√≠a,25.679291,-100.417176,0
3,MX,66215,San Pedro,Nuevo Leon,San Pedro Garza Garc√≠a,25.679291,-100.417176,0
4,MX,66216,El Obispo,Nuevo Leon,San Pedro Garza Garc√≠a,25.681186,-100.403745,0
5,MX,66217,Zona Industrial,Nuevo Leon,San Pedro Garza Garc√≠a,25.673186,-100.412024,0
6,MX,66218,Valle del Seminario 1 Sector,Nuevo Leon,San Pedro Garza Garc√≠a,25.672331,-100.401856,0
7,MX,66219,Revoluci√≥n 1er Sector,Nuevo Leon,San Pedro Garza Garc√≠a,25.683388,-100.409248,0
8,MX,66220,Del Valle,Nuevo Leon,San Pedro Garza Garc√≠a,25.658021,-100.371921,0
9,MX,66224,Fuentes del Valle Sector Colinas,Nuevo Leon,San Pedro Garza Garc√≠a,25.665108,-100.364288,0


Replacing the garbled (mis-encoded) municipality name with a plain ASCII spelling:

In [3]:
# assign the result back instead of using inplace=True on a column selection
df["MUNICIPIO"] = df["MUNICIPIO"].replace("San Pedro Garza Garc√≠a", "San Pedro Garza Garcia")
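
The garbled value looks like the usual result of UTF-8 bytes being decoded as Mac OS Roman. The following sketch is not from the original notebook and rests on that assumption; the helper name `fix_mojibake` is hypothetical. It reproduces the garbling and reverses it, which keeps the accent instead of replacing it with a plain "i".

```python
def fix_mojibake(text):
    """Undo a UTF-8 string that was wrongly decoded as Mac OS Roman."""
    try:
        return text.encode("mac_roman").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # not a Mac Roman mis-decoding; leave the value untouched
        return text

# reproduce the garbling: UTF-8 bytes read with the wrong codec
garbled = "García".encode("utf-8").decode("mac_roman")
print(garbled)

# reverse it
print(fix_mojibake(garbled))

# clean values pass through unchanged
print(fix_mojibake("Nuevo Leon"))
```

If the assumption holds, the same repair would also apply to neighborhood names such as "Revoluci√≥n", and reading the source file with the correct `encoding` argument in `pd.read_csv` would avoid the problem altogether.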

In [4]:
print(df.shape)

(57, 8)


Longitude values were imported as object (string) type, so they were converted to float:

In [5]:
df["LONGITUD"] = df["LONGITUD"].astype("float64")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57 entries, 0 to 56
Data columns (total 8 columns):
PAIS         57 non-null object
CP           57 non-null int64
ZONA         57 non-null object
ESTADO       57 non-null object
MUNICIPIO    57 non-null object
LATITUD      57 non-null float64
LONGITUD     57 non-null float64
DUPLICATE    57 non-null int64
dtypes: float64(2), int64(2), object(4)
memory usage: 3.6+ KB


In [6]:
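# drop the unused columns, then remove rows 16 and 53 by index label
df2 = df.drop(['PAIS', 'DUPLICATE'], axis=1)
df2 = df2.drop([16, 53], axis=0)
df2 = df2[['CP', 'ZONA', 'MUNICIPIO', 'ESTADO', 'LATITUD', 'LONGITUD']]

print(df2.shape)

# note: the index is not reset, so the original row labels are kept
df2.head()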

(55, 6)


Unnamed: 0,CP,ZONA,MUNICIPIO,ESTADO,LATITUD,LONGITUD
0,66200,San Pedro Garza Garcia Centro,San Pedro Garza Garcia,Nuevo Leon,25.661658,-100.41
1,66210,La Leona,San Pedro Garza Garcia,Nuevo Leon,25.684703,-100.414234
2,66214,El Obispo,San Pedro Garza Garcia,Nuevo Leon,25.679291,-100.417176
3,66215,San Pedro,San Pedro Garza Garcia,Nuevo Leon,25.679291,-100.417176
4,66216,El Obispo,San Pedro Garza Garcia,Nuevo Leon,25.681186,-100.403745
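
In the rows shown above, postal codes 66214 and 66215 carry exactly the same coordinates. A radius search around identical points returns the same venues, so such rows describe the same surroundings twice. The check below is a minimal sketch that is not part of the original notebook; it uses only the five rows displayed above and plain Python.

```python
from collections import defaultdict

# (CP, latitude, longitude) for the first rows shown above
rows = [
    (66200, 25.661658, -100.41),
    (66210, 25.684703, -100.414234),
    (66214, 25.679291, -100.417176),
    (66215, 25.679291, -100.417176),
    (66216, 25.681186, -100.403745),
]

# group postal codes by their exact coordinate pair
codes_by_point = defaultdict(list)
for cp, lat, lon in rows:
    codes_by_point[(lat, lon)].append(cp)

# keep only the coordinate pairs shared by more than one postal code
shared = {point: cps for point, cps in codes_by_point.items() if len(cps) > 1}
print(shared)
```

Running the same grouping over all 55 rows would show how many postal codes are affected before the venue search is performed.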


## Methodology <a name="methodology"></a>

For this project we analyze every neighborhood in the municipality of San Pedro Garza García: we map every postal code in the city and examine the most common venues around each neighborhood.

The first step was to collect the required data: the postal codes of the municipality and the latitude and longitude of each one, so that they can be placed on a map.

The second step is to plot the data on a map and connect to the Foursquare API to collect the types of venues, and the most popular ones, around each neighborhood.

Finally, in the third step we analyze the top 10 venues per neighborhood and use K-means to cluster the neighborhoods by similarity, in order to decide whether opening a restaurant is the best way to go and, if so, where. This information is meant to be presented to the stakeholders, who will make the final decision.


## Analysis <a name="analysis"></a>

Let's begin the process by mapping the neighborhoods using geopy.geocoders and folium maps.

In [7]:
from geopy.geocoders import Nominatim

address = 'San Pedro Garza García, Monterrey'

geolocator = Nominatim(user_agent="tl-SPG-neigh")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of San Pedro Garza García are {}, {}.'.format(latitude, longitude))

The geographical coordinates of San Pedro Garza García are 25.6651051, -100.4022714.


Mapping the Postal Codes of the city of San Pedro Garza Garcia: 

In [8]:

map_SPG = folium.Map(location=[latitude, longitude], zoom_start=13)

for lat, long, post, borough, neigh in zip(df2['LATITUD'],df2['LONGITUD'],df2['CP'], df2['MUNICIPIO'],df2['ZONA']):
    label = "{} ({}): {}".format(borough, post, neigh)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_SPG)
    
map_SPG

The hidden cell contains sensitive information used to connect to the Foursquare API.

In [9]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: [redacted]
CLIENT_SECRET: [redacted]


In [10]:
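radius = 500  # search radius in meters around each postal code centroid
LIMIT = 100   # maximum number of venues requested per postal code
# CLIENT_ID, CLIENT_SECRET and VERSION come from the hidden cell above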

venues = []

for lat, long, post, borough, neighborhood in zip(df2['LATITUD'],df2['LONGITUD'],df2['CP'],df2['MUNICIPIO'],df2['ZONA']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        venues.append((
            post, 
            borough,
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
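
The loop above indexes straight into the JSON response, so a response without groups or a venue without categories would raise an exception. The following is a minimal defensive sketch, not the author's code: the helper name `extract_venues`, the `"Unknown"` placeholder and the sample payload are all made up for illustration, and only the field names (`response`, `groups`, `items`, `venue`, `location`, `categories`) are taken from the loop above.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # fall back to a placeholder when a venue has no category
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# hand-written payload for illustration, shaped like the fields the loop reads
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Tacos Barney",
               "location": {"lat": 25.662008, "lng": -100.414186},
               "categories": [{"name": "Mexican Restaurant"}]}},
    {"venue": {"name": "Example venue without category",
               "location": {"lat": 25.66, "lng": -100.41},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))  # empty response yields an empty list
```

Separating the parsing from the HTTP call also makes it possible to test the parsing offline, without spending API quota.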

Converting the venues information into a dataframe:

In [14]:
venues_df = pd.DataFrame(venues)
venues_df.columns = ['CP', 'MUNICIPIO', 'ZONA', 'MUNICIPIOLATITUD', 'MUNICIPIOLONGITUD', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.shape)
venues_df.head()

(1020, 9)


Unnamed: 0,CP,MUNICIPIO,ZONA,MUNICIPIOLATITUD,MUNICIPIOLONGITUD,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,25.661658,-100.41,DSTRTO,25.662664,-100.406337,Accessories Store
1,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,25.661658,-100.41,Oxxo Jimenez,25.663479,-100.409464,Convenience Store
2,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,25.661658,-100.41,Liga de Futbol Brillamont,25.664277,-100.409887,Soccer Field
3,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,25.661658,-100.41,Los Picosos de Puebla,25.659473,-100.410842,Mexican Restaurant
4,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,25.661658,-100.41,Tacos Barney,25.662008,-100.414186,Mexican Restaurant


Grouping venues by Postal Code: 

In [15]:
venues_df.groupby(['CP', 'MUNICIPIO', 'ZONA'])['VenueName'].count()

CP     MUNICIPIO               ZONA                            
66200  San Pedro Garza Garcia  San Pedro Garza Garcia Centro        23
66210  San Pedro Garza Garcia  La Leona                              3
66214  San Pedro Garza Garcia  El Obispo                            14
66215  San Pedro Garza Garcia  San Pedro                            14
66216  San Pedro Garza Garcia  El Obispo                            12
66217  San Pedro Garza Garcia  Zona Industrial                       4
66218  San Pedro Garza Garcia  Valle del Seminario 1 Sector          6
66219  San Pedro Garza Garcia  Revoluci√≥n 1er Sector                5
66220  San Pedro Garza Garcia  Del Valle                            68
66224  San Pedro Garza Garcia  Fuentes del Valle Sector Colinas      6
66225  San Pedro Garza Garcia  La Joya                              39
66226  San Pedro Garza Garcia  Bugambilias                          39
66227  San Pedro Garza Garcia  Las Capillas                         25
66228  San Pe
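
The counts above include every kind of venue. Since the first decision factor is the number of existing restaurants, a per-postal-code restaurant count is a natural companion. The sketch below is an illustration only, not part of the original notebook: the helper name `restaurants_per_cp` is hypothetical, the rows are the five venues shown earlier for postal code 66200, and matching on the word "restaurant" is a simplifying assumption.

```python
from collections import Counter

# (CP, VenueCategory) pairs taken from the first rows of venues_df shown earlier
sample_rows = [
    (66200, "Accessories Store"),
    (66200, "Convenience Store"),
    (66200, "Soccer Field"),
    (66200, "Mexican Restaurant"),
    (66200, "Mexican Restaurant"),
]

def restaurants_per_cp(rows):
    """Count venues per postal code whose category name contains 'restaurant'."""
    return Counter(cp for cp, category in rows if "restaurant" in category.lower())

print(restaurants_per_cp(sample_rows))
```

A plain substring match misses food venues such as "Burger Joint", "Taco Place" or "Pizza Place", so a real count would need an explicit list of food-related categories.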

The number of unique venue categories in the area is the following:

In [16]:
len(venues_df['VenueCategory'].unique())

172

Analyzing venues in each area:

In [17]:
# one hot encoding
SPG_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add postal, borough and neighborhood column back to dataframe
SPG_onehot['CP'] = venues_df['CP'] 
SPG_onehot['MUNICIPIO'] = venues_df['MUNICIPIO'] 
SPG_onehot['ZONA'] = venues_df['ZONA'] 

# move postal, borough and neighborhood column to the first column
fixed_columns = list(SPG_onehot.columns[-3:]) + list( SPG_onehot.columns[:-3])
SPG_onehot = SPG_onehot[fixed_columns]

print(SPG_onehot.shape)
SPG_onehot.head()

(1020, 175)


Unnamed: 0,CP,MUNICIPIO,ZONA,ATM,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,...,Toy / Game Store,Trail,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Warehouse Store,Wine Shop,Wings Joint,Yoga Studio,Zoo Exhibit
0,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Getting the relative frequency of each venue category per postal code:

In [18]:
SPG_venues_freq = SPG_onehot.groupby(['CP', 'MUNICIPIO', 'ZONA']).mean().reset_index()
print(SPG_venues_freq.shape)
SPG_venues_freq.head()

(55, 175)


Unnamed: 0,CP,MUNICIPIO,ZONA,ATM,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,...,Toy / Game Store,Trail,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Warehouse Store,Wine Shop,Wings Joint,Yoga Studio,Zoo Exhibit
0,66200,San Pedro Garza Garcia,San Pedro Garza Garcia Centro,0.0,0.043478,0.043478,0.043478,0.043478,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,66210,San Pedro Garza Garcia,La Leona,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,66214,San Pedro Garza Garcia,El Obispo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0
3,66215,San Pedro Garza Garcia,San Pedro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0
4,66216,San Pedro Garza Garcia,El Obispo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
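
Taking the mean of one-hot rows within a group gives the share of each category among that group's venues. The sketch below shows the same arithmetic in plain Python; it is not the notebook's code, the helper name `category_shares` is hypothetical, and the four-venue list is made up for illustration.

```python
from collections import Counter

def category_shares(categories):
    """Share of each category among the venues of one neighborhood."""
    counts = Counter(categories)
    total = len(categories)
    return {name: count / total for name, count in counts.items()}

# toy neighborhood with four venues (made-up list for illustration)
shares = category_shares(
    ["Mexican Restaurant", "Mexican Restaurant", "Convenience Store", "Soccer Field"]
)
print(shares)

# one venue among the 23 found for CP 66200 gives the 0.043478 seen in the table
print(round(1 / 23, 6))
```

The same reasoning explains the other values in the table: 0.071429 is one venue out of 14 and 0.083333 is one venue out of 12.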


### Top 10 venues in each area

Creating a dataframe with the top 10 venues per location:

In [19]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
areaColumns = ['CP', 'MUNICIPIO', 'ZONA']
freqColumns = []
for ind in np.arange(num_top_venues):
    try:
        freqColumns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix beyond 3rd, fall back to "th"
        freqColumns.append('{}th Most Common Venue'.format(ind+1))
columns = areaColumns+freqColumns
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['CP'] = SPG_venues_freq['CP']
neighborhoods_venues_sorted['MUNICIPIO'] = SPG_venues_freq['MUNICIPIO']
neighborhoods_venues_sorted['ZONA'] = SPG_venues_freq['ZONA']

for ind in np.arange(SPG_venues_freq.shape[0]):
    row_categories = SPG_venues_freq.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    neighborhoods_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

neighborhoods_venues_sorted.sort_values(freqColumns, inplace=True)
neighborhoods_venues_sorted

Unnamed: 0,CP,MUNICIPIO,ZONA,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
52,66295,San Pedro Garza Garcia,Lomas de San Angel,American Restaurant,Forest,Golf Course,Zoo Exhibit,Dog Run,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space
51,66287,San Pedro Garza Garcia,Lomas Del Rosario,Beach Bar,Zoo Exhibit,Food Court,Food,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store
42,66274,San Pedro Garza Garcia,Jardines de San Agustin,Botanical Garden,Dance Studio,Dog Run,Food,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store
7,66219,San Pedro Garza Garcia,Revoluci√≥n 1er Sector,Burger Joint,Dog Run,Park,Department Store,Zoo Exhibit,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store
10,66225,San Pedro Garza Garcia,La Joya,Café,Pizza Place,Paper / Office Supplies Store,Fast Food Restaurant,Supermarket,Convenience Store,Boutique,Ice Cream Shop,Gym,Mexican Restaurant
13,66228,San Pedro Garza Garcia,Residencial San Carlos,Convenience Store,Big Box Store,Food Court,Food,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store
49,66285,San Pedro Garza Garcia,Sierra Nevada,Convenience Store,Pharmacy,Wine Shop,Dive Bar,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store
19,66238,San Pedro Garza Garcia,Lucio Blanco 3er Sector,Convenience Store,Taco Place,Pharmacy,Athletics & Sports,Mexican Restaurant,Soccer Field,Art Gallery,Cupcake Shop,Donut Shop,Flower Shop
48,66280,San Pedro Garza Garcia,Villa Del Pedregal,Cosmetics Shop,Cheese Shop,Business Service,Jewelry Store,Convenience Store,Park,Dance Studio,Electronics Store,Flower Shop,Flea Market
50,66286,San Pedro Garza Garcia,Colonial La Sierra,Department Store,Zoo Exhibit,Dive Bar,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store,Donut Shop
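
A postal code with fewer than ten distinct categories still receives ten entries in this table. The surplus positions are categories whose share is zero, which is why the same tail (Flea Market, Fast Food Restaurant, Farmers Market, Event Space and so on) repeats across rows. La Leona, for example, has only 3 venues in the count listing above. The sketch below is one possible way to keep only categories that are actually present; it is not the author's method, the helper name `top_categories` is hypothetical, and the shares are illustrative values for a three-venue neighborhood.

```python
def top_categories(shares, n=10):
    """Return up to n category names ordered by share, skipping zero shares."""
    present = [(name, share) for name, share in shares.items() if share > 0]
    # sort by descending share, then by name so that ties are reproducible
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in present[:n]]

# illustrative shares for a neighborhood with only three venues
shares = {
    "Soccer Field": 1 / 3,
    "Park": 1 / 3,
    "Athletics & Sports": 1 / 3,
    "Zoo Exhibit": 0.0,
    "Flower Shop": 0.0,
}
print(top_categories(shares))
```

With this variant a row can have fewer than ten entries, so the resulting table would need empty cells for the missing positions.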


Clustering areas using K-means methodology: 

In [20]:
from sklearn.cluster import KMeans


kclusters = 4

SPG_venues_freq_clustering = SPG_venues_freq.drop(['CP', 'MUNICIPIO', 'ZONA'], axis=1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(SPG_venues_freq_clustering)

# work on a copy so that df2 itself is not modified
SPG_venues_clustered_df = df2.copy()

print(kmeans.labels_)
# labels follow the row order of SPG_venues_freq (sorted by CP);
# this assignment assumes df2 is in the same order and has no CP without venues
SPG_venues_clustered_df['Cluster'] = kmeans.labels_

SPG_venues_clustered_df = SPG_venues_clustered_df.join(neighborhoods_venues_sorted.drop(['MUNICIPIO', 'ZONA'], axis=1).set_index('CP'), on='CP')
SPG_venues_clustered_df.sort_values(['Cluster'] + freqColumns, inplace=True)
SPG_venues_clustered_df

[3 1 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 3 3 3 3 3 1 3 3 3 1 3 1 3 3 3 3 3 3 3
 3 3 3 3 3 3 2 3 3 1 3 3 3 0 3 3 3 3]


Unnamed: 0,CP,ZONA,MUNICIPIO,ESTADO,LATITUD,LONGITUD,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
51,66286,Colonial La Sierra,San Pedro Garza Garcia,Nuevo Leon,25.631403,-100.381748,0,Department Store,Zoo Exhibit,Dive Bar,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store,Donut Shop
30,66256,Lomas Del Valle,San Pedro Garza Garcia,Nuevo Leon,25.64331,-100.379767,1,Martial Arts Dojo,Park,Food Truck,Movie Theater,Zoo Exhibit,Dog Run,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store
17,66235,Los Olmos,San Pedro Garza Garcia,Nuevo Leon,25.645319,-100.405364,1,Park,College Administrative Building,Athletics & Sports,Japanese Restaurant,Resort,Cupcake Shop,Dance Studio,Flea Market,Fast Food Restaurant,Farmers Market
47,66278,Flor de Mayo,San Pedro Garza Garcia,Nuevo Leon,25.638673,-100.326904,1,Park,Electronics Store,Zoo Exhibit,Diner,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Donut Shop,Dog Run
28,66250,Zona Jer√≥nimo Siller,San Pedro Garza Garcia,Nuevo Leon,25.644645,-100.371054,1,Park,Pharmacy,Convenience Store,Dive Bar,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store
14,66230,San Pedro,San Pedro Garza Garcia,Nuevo Leon,25.649847,-100.400158,1,Park,Resort,Zoo Exhibit,Diner,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store,Donut Shop
24,66245,Barranca Del Pedregal,San Pedro Garza Garcia,Nuevo Leon,25.643588,-100.385197,1,Park,Snack Place,BBQ Joint,Zoo Exhibit,Dive Bar,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store
1,66210,La Leona,San Pedro Garza Garcia,Nuevo Leon,25.684703,-100.414234,1,Soccer Field,Park,Athletics & Sports,Dive Bar,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store
44,66275,Mesa de la Corona,San Pedro Garza Garcia,Nuevo Leon,25.625756,-100.34855,2,Recreation Center,Zoo Exhibit,Diner,Flea Market,Fast Food Restaurant,Farmers Market,Event Space,Electronics Store,Donut Shop,Dog Run
54,66295,Lomas de San Angel,San Pedro Garza Garcia,Nuevo Leon,25.625943,-100.358439,3,American Restaurant,Forest,Golf Course,Zoo Exhibit,Dog Run,Flower Shop,Flea Market,Fast Food Restaurant,Farmers Market,Event Space
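
Before interpreting the clusters it helps to know how many postal codes fall into each one. The sketch below simply tallies the labels printed above; it is not part of the original notebook and uses only the standard library.

```python
from collections import Counter

# labels exactly as printed by kmeans.labels_ above
printed = """
3 1 3 3 3 3 3 3 3 3 3 3 3 3 1 3 1 3 3 3 3 3 3 1 3 3 3 1 3 1 3 3 3 3 3 3 3
3 3 3 3 3 3 2 3 3 1 3 3 3 0 3 3 3 3
"""
labels = [int(token) for token in printed.split()]

# number of postal codes assigned to each cluster
sizes = Counter(labels)
print(len(labels), dict(sorted(sizes.items())))
```

The tally shows that cluster 3 holds 46 of the 55 postal codes, cluster 1 holds 7, and clusters 0 and 2 contain a single postal code each, so the descriptions of clusters 0 and 2 rest on one neighborhood apiece.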


Creating a map from the above dataframe, with the locations already clustered using the K-means methodology:

In [21]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, post, bor, poi, cluster in zip(SPG_venues_clustered_df['LATITUD'], SPG_venues_clustered_df['LONGITUD'], SPG_venues_clustered_df['CP'], SPG_venues_clustered_df['MUNICIPIO'], SPG_venues_clustered_df['ZONA'], SPG_venues_clustered_df['Cluster']):
    label = folium.Popup('{} ({}): {} - Cluster {}'.format(bor, post, poi, cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
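
Note that the marker color is looked up with `rainbow[cluster-1]`, so cluster 0 takes the last color of the palette through Python's negative indexing. The sketch below illustrates that lookup with color names instead of hex codes. The palette order is an assumption inferred from the cluster legend that follows (four samples of a rainbow colormap running from purple to red), and the helper name `marker_color` is hypothetical.

```python
# palette order assumed from the legend: four samples of a rainbow colormap
palette = ["purple", "blue", "yellow", "red"]

def marker_color(cluster, colors=palette):
    """Mimic the notebook's rainbow[cluster-1] lookup."""
    return colors[cluster - 1]

# cluster 0 wraps around to the last palette entry
for cluster in range(4):
    print(cluster, marker_color(cluster))
```

Switching the lookup to `rainbow[cluster]` would be more conventional, but it would also change which color each cluster gets, so the legend would have to be updated with it.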


#### From the results we can identify the following characteristics per cluster
* Cluster 0 (RED): the main venues in this neighborhood are department stores, zoo exhibits and dive bars.
* Cluster 1 (PURPLE): these neighborhoods have many parks, school buildings and recreational places for sports.
* Cluster 2 (BLUE): recreation centers, zoo exhibits and diners.
* Cluster 3 (YELLOW): burger joints, parks and department stores.

## Results and Discussion <a name="results"></a>

As shown above, we processed the data from the Foursquare API and built a richer view of each neighborhood by ranking its top venues and then clustering the neighborhoods by similarity. One thing to keep in mind is that the data changes constantly, since people upload new information to Foursquare every day.

Another important point is that the quality of this study depends on how much information Foursquare has to share. I personally think that Foursquare is not that popular in Latin America, and after reviewing the venues downloaded from the API I can say that some of the information is outdated. Nevertheless, this is a good exercise to demonstrate the power of data science and how it can be used for this type of complex analysis.

Looking at the results, cluster 3 already has many restaurants, parks and a variety of stores, ranging from flower shops to supermarkets. The blue cluster (cluster 2) also has diners among its top 3 venues, and the red cluster (cluster 0) is mostly about department stores and dive bars. That leaves cluster 1, which has many recreational places such as parks, gyms and school buildings.

## Conclusion <a name="conclusion"></a>

After analyzing the previous information, one detail caught my eye: cluster 1. It has interesting characteristics that I think make it suitable for starting a restaurant or a juice bar: it is surrounded by recreational places such as parks and gyms, as well as school buildings. Given this, plus the growing trend over the last decade toward a healthier lifestyle and taking care of your body, I think that opening a healthy-food restaurant, a smoothie/juice bar or even a different type of gym in these neighborhoods is very promising. People who visit the recreational places nearby would find it convenient to have a healthy restaurant or a smoothie bar where they can have a snack or a drink after a long walk in the park. There are also the teenagers who attend the schools around these neighborhoods, and their parents, who often have to wait for their children and would probably prefer to wait in a pleasant, healthy bar or restaurant instead of sitting parked in their cars.

