# Applied Data Science Capstone final project

This notebook is my final project for the Applied Data Science Capstone course from Coursera. It studies a location problem: where to open a restaurant near a large urban park.

# Problem description

One of the major natural sites of São Paulo, Brazil, is Ibirapuera park. The park covers over 100 hectares and is located in the Moema neighborhood, in the center-south of São Paulo. It is one of the most famous parks in Latin America, attracting millions of visitors a year.

Due to the large number of visitors the park receives every day, an investor would like to know whether the surroundings of the park would be a good place to build his restaurant. He will open the restaurant if there is a region next to the park without too many competitors. If such an area is available, he also wants to know whether it is better to open an Italian, a vegetarian or a Japanese restaurant, the types of food he already invests in.

1. First, let's import the modules for the analysis.

In [1]:
# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

# library to handle requests
import requests 

# library for transforming a JSON file into a pandas dataframe
# (pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize)
from pandas import json_normalize

#If the folium library is not installed, download it
#!conda install -c conda-forge folium=0.5.0 --yes
!pip install folium
import folium # plotting library

Collecting folium
  Downloading https://files.pythonhosted.org/packages/72/ff/004bfe344150a064e558cb2aedeaa02ecbf75e60e148a55a9198f0c41765/folium-0.10.0-py2.py3-none-any.whl (91kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/63/36/1c93318e9653f4e414a2e0c3b98fc898b4970e939afeedeee6075dd3b703/branca-0.3.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.3.1 folium-0.10.0


# Data collection

To answer the question posed in the business problem, some location information is required. First, the latitude and longitude of the park are needed. Due to its size, the center of the park is used as the reference for its location. The coordinates were identified using Google Maps and are -23.588375, -46.658258. With these coordinates, the closest food venues are identified through the Foursquare API, using a radius of 2500 meters. This large radius was chosen due to the size of the park: with a 2500-meter radius, at least a five-block radius surrounding the park is considered when looking for food venues.

After collection, the data is reviewed and cleaned. A cluster analysis is then performed using the latitude and longitude of the venues to identify regions with a lower density of food venues. The clusters are plotted with folium to make the visualization easier. If there is a low-density region, the categories of the venues are reviewed to see which food types are most popular, and a type of restaurant is recommended based on the other types of restaurants available in the selected region.

2. Connect to Foursquare API (each developer should use their own ID).

In [2]:
CLIENT_ID = 'F5HJDHZEFG4JTDOGREYAUHRZH1ZGY1MYDWJCMZEDLPJ3DVHF' # your Foursquare ID
CLIENT_SECRET = '4K334FEQ4Y3UQBGWHJDWN1QIFRXWWXDQINHB0QYP4DJWDUPP' # your Foursquare Secret
VERSION = '20191005'
LIMIT = 100

3. Building the request to Foursquare.

Observation:

A) Since we are investigating food places, the search query will be food.

B) The search radius will be 2500 meters due to the size of the park. With a 2500-meter radius, at least a five-block radius surrounding the park will be considered when looking for food venues.

C) The latitude and longitude of the center of the park were identified through Google Maps.

In [3]:
#First we define what we are looking for
search_query = 'Food'

#Then we define the radius (in meters)
radius = 2500

#Latitude and longitude of the center of Ibirapuera park
lat = -23.588375
long = -46.658258

#Finally we set the url for our search
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, long, VERSION, search_query, radius, LIMIT)
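The same URL can also be assembled from a dictionary of parameters, which keeps each value URL-encoded. The sketch below is only an illustration: `build_search_url` is a hypothetical helper, not part of the notebook or of Foursquare's tooling, and the credentials are placeholders.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Return a venue-search URL with URL-encoded query parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'v': version,
        'query': query,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# placeholder credentials, for illustration only
example_url = build_search_url('MY_ID', 'MY_SECRET', -23.588375, -46.658258,
                               '20191005', 'Food', 2500, 100)
print(example_url)
```

Encoding the parameters this way mainly matters when the search query contains spaces or special characters.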

4. Requesting the results.

In [4]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5da1cc1666fc65002c825ab2'},
 'response': {'venues': [{'id': '54528c3f498ea48fc80daa59',
    'name': 'Goa Food Truck (TCC)',
    'location': {'address': 'Rua José Antônio Coelho, 879',
     'lat': -23.58184,
     'lng': -46.649849,
     'labeledLatLngs': [{'label': 'display',
       'lat': -23.58184,
       'lng': -46.649849}],
     'distance': 1124,
     'cc': 'BR',
     'city': 'São Paulo',
     'state': 'SP',
     'country': 'Brasil',
     'formattedAddress': ['Rua José Antônio Coelho, 879',
      'São Paulo, SP',
      'Brasil']},
    'categories': [{'id': '4bf58dd8d48988d1d3941735',
      'name': 'Vegetarian / Vegan Restaurant',
      'pluralName': 'Vegetarian / Vegan Restaurants',
      'shortName': 'Vegetarian / Vegan',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/vegetarian_',
       'suffix': '.png'},
      'primary': True}],
    'referralId': 'v-1570884630',
    'hasPerk': False},
   {'id': '4bf30cc1706e20a1fdc8a798
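The response carries a status code under `meta` and the venue list under `response`. A minimal sketch of a defensive check, assuming the structure shown in the output above, could look like this; `extract_venues` is a hypothetical helper and the sample dictionary is a shortened copy of the first venue.

```python
def extract_venues(payload):
    """Return the list of venues, or raise if the response reports an error (hypothetical helper)."""
    code = payload.get('meta', {}).get('code')
    if code != 200:
        raise ValueError('Foursquare request failed with code {}'.format(code))
    return payload.get('response', {}).get('venues', [])

# shortened sample based on the output above
sample = {
    'meta': {'code': 200, 'requestId': '5da1cc1666fc65002c825ab2'},
    'response': {'venues': [{'id': '54528c3f498ea48fc80daa59',
                             'name': 'Goa Food Truck (TCC)'}]},
}
sample_venues = extract_venues(sample)
print(len(sample_venues), sample_venues[0]['name'])
```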

5. Assigning the data to a dataframe.

In [5]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
food_venues = json_normalize(venues)
food_venues.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d1d3941735', 'name': 'V...",False,54528c3f498ea48fc80daa59,"Rua José Antônio Coelho, 879",BR,São Paulo,Brasil,,1124,"[Rua José Antônio Coelho, 879, São Paulo, SP, ...","[{'label': 'display', 'lat': -23.58184, 'lng':...",-23.58184,-46.649849,,,SP,Goa Food Truck (TCC),v-1570884630,
1,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,4bf30cc1706e20a1fdc8a798,Reebok Sports Club,BR,São Paulo,Brasil,,2777,"[Reebok Sports Club, São Paulo, SP, 04551-000,...","[{'label': 'display', 'lat': -23.5955329614596...",-23.595533,-46.684341,,04551-000,SP,Fit Food,v-1570884630,
2,"[{'id': '4bf58dd8d48988d1d2941735', 'name': 'S...",False,4c548403b426ef3b53d8838a,"R. Dr. Fausto Ferraz, 28",BR,São Paulo,Brasil,R. Carlos Sampaio,2778,"[R. Dr. Fausto Ferraz, 28 (R. Carlos Sampaio),...","[{'label': 'display', 'lat': -23.5663916712315...",-23.566392,-46.64537,Bela Vista,01333-030,SP,Irashai Japanese Food,v-1570884630,
3,"[{'id': '4bf58dd8d48988d16b941735', 'name': 'B...",False,4c3ca244933b0f470856e421,"R. Peixoto Gomide, 1052",BR,São Paulo,Brasil,,2918,"[R. Peixoto Gomide, 1052, São Paulo, SP, 01409...","[{'label': 'display', 'lat': -23.5621563705620...",-23.562156,-46.658713,,01409-000,SP,Quality Food,v-1570884630,
4,"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",False,520ba1f711d2955c26da4960,Bodytech,BR,São Paulo,Brasil,,1994,"[Bodytech, São Paulo, SP, 04542-000, Brasil]","[{'label': 'display', 'lat': -23.5874505404352...",-23.587451,-46.677784,,04542-000,SP,Fit Food,v-1570884630,
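The `location.distance` column can be sanity-checked against the park coordinates. The sketch below uses the haversine formula with a spherical Earth, which is an assumption of this example and not necessarily how Foursquare computes the value. Note that some rows above report distances larger than 2500 (for example 2777 and 2918), so in this output the radius does not seem to act as a hard cutoff.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

park = (-23.588375, -46.658258)           # center of Ibirapuera park
goa_food_truck = (-23.58184, -46.649849)  # first venue in the table above

distance_m = haversine_m(*park, *goa_food_truck)
print(round(distance_m))  # should land close to the 1124 reported for this venue
```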


6. Filtering the data and updating the dataframe.

In [6]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in food_venues.columns if col.startswith('location.')] + ['id']
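# copy so that reassigning the categories column below does not trigger pandas' SettingWithCopyWarning
food_filtered = food_venues.loc[:, filtered_columns].copy()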

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
food_filtered['categories'] = food_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
food_filtered.columns = [column.split('.')[-1] for column in food_filtered.columns]

food_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Goa Food Truck (TCC),Vegetarian / Vegan Restaurant,"Rua José Antônio Coelho, 879",BR,São Paulo,Brasil,,1124,"[Rua José Antônio Coelho, 879, São Paulo, SP, ...","[{'label': 'display', 'lat': -23.58184, 'lng':...",-23.58184,-46.649849,,,SP,54528c3f498ea48fc80daa59
1,Fit Food,Restaurant,Reebok Sports Club,BR,São Paulo,Brasil,,2777,"[Reebok Sports Club, São Paulo, SP, 04551-000,...","[{'label': 'display', 'lat': -23.5955329614596...",-23.595533,-46.684341,,04551-000,SP,4bf30cc1706e20a1fdc8a798
2,Irashai Japanese Food,Sushi Restaurant,"R. Dr. Fausto Ferraz, 28",BR,São Paulo,Brasil,R. Carlos Sampaio,2778,"[R. Dr. Fausto Ferraz, 28 (R. Carlos Sampaio),...","[{'label': 'display', 'lat': -23.5663916712315...",-23.566392,-46.64537,Bela Vista,01333-030,SP,4c548403b426ef3b53d8838a
3,Quality Food,Brazilian Restaurant,"R. Peixoto Gomide, 1052",BR,São Paulo,Brasil,,2918,"[R. Peixoto Gomide, 1052, São Paulo, SP, 01409...","[{'label': 'display', 'lat': -23.5621563705620...",-23.562156,-46.658713,,01409-000,SP,4c3ca244933b0f470856e421
4,Fit Food,Restaurant,Bodytech,BR,São Paulo,Brasil,,1994,"[Bodytech, São Paulo, SP, 04542-000, Brasil]","[{'label': 'display', 'lat': -23.5874505404352...",-23.587451,-46.677784,,04542-000,SP,520ba1f711d2955c26da4960


Now the IDs and names are easier to interpret. Looking at the categories of the venues, some of them are related to food but are not restaurants. The sixth venue (index 5), for example, is a tech startup.

Therefore, it is necessary to clean the dataframe.

7. Data cleaning.

To clean the dataframe, the categories column will be used. If none of the words "Restaurant", "Cafe" or "Food" is present in the category, it is unlikely that the venue will be a competitor to our investor, so these entries will be dropped.

In [7]:
#Keywords that mark a venue as a potential competitor (case-sensitive match)
keywords = ['Restaurant', 'Food', 'Cafe']

#Creating an empty list that will keep the positions of the venues to be dropped
not_restaurant = []

#Checking if none of the keywords is in the column categories
for i, category in enumerate(food_filtered['categories']):
    #venues without a category (None) are also treated as non-competitors
    if category is None or not any(word in category for word in keywords):
        not_restaurant.append(i)

#Dropping the rows identified in the for loop
food_filtered.drop(food_filtered.index[not_restaurant], axis=0, inplace=True)
food_filtered.head(10)


Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Goa Food Truck (TCC),Vegetarian / Vegan Restaurant,"Rua José Antônio Coelho, 879",BR,São Paulo,Brasil,,1124,"[Rua José Antônio Coelho, 879, São Paulo, SP, ...","[{'label': 'display', 'lat': -23.58184, 'lng':...",-23.58184,-46.649849,,,SP,54528c3f498ea48fc80daa59
1,Fit Food,Restaurant,Reebok Sports Club,BR,São Paulo,Brasil,,2777,"[Reebok Sports Club, São Paulo, SP, 04551-000,...","[{'label': 'display', 'lat': -23.5955329614596...",-23.595533,-46.684341,,04551-000,SP,4bf30cc1706e20a1fdc8a798
2,Irashai Japanese Food,Sushi Restaurant,"R. Dr. Fausto Ferraz, 28",BR,São Paulo,Brasil,R. Carlos Sampaio,2778,"[R. Dr. Fausto Ferraz, 28 (R. Carlos Sampaio),...","[{'label': 'display', 'lat': -23.5663916712315...",-23.566392,-46.64537,Bela Vista,01333-030,SP,4c548403b426ef3b53d8838a
3,Quality Food,Brazilian Restaurant,"R. Peixoto Gomide, 1052",BR,São Paulo,Brasil,,2918,"[R. Peixoto Gomide, 1052, São Paulo, SP, 01409...","[{'label': 'display', 'lat': -23.5621563705620...",-23.562156,-46.658713,,01409-000,SP,4c3ca244933b0f470856e421
4,Fit Food,Restaurant,Bodytech,BR,São Paulo,Brasil,,1994,"[Bodytech, São Paulo, SP, 04542-000, Brasil]","[{'label': 'display', 'lat': -23.5874505404352...",-23.587451,-46.677784,,04542-000,SP,520ba1f711d2955c26da4960
6,Vila Olímpia Food Hall,Street Food Gathering,"R. Tenerife, 74",BR,São Paulo,Brasil,,3191,"[R. Tenerife, 74, São Paulo, SP, 04548-040, Br...","[{'label': 'display', 'lat': -23.596512, 'lng'...",-23.596512,-46.688254,,04548-040,SP,59a9aa63d8fe7a0d1368ee55
7,Pança's Fast Food,Brazilian Restaurant,"R. Correia Dias, 370",BR,São Paulo,Brasil,,2301,"[R. Correia Dias, 370, São Paulo, SP, 04104-00...","[{'label': 'display', 'lat': -23.576486, 'lng'...",-23.576486,-46.639803,,04104-001,SP,4c2a215f9fb5d13a9e329c57
8,Komê Japonese Food,Sushi Restaurant,Rua São Miguel,BR,São Paulo,Brasil,Rua Frei Caneca,3703,"[Rua São Miguel (Rua Frei Caneca), São Paulo, ...","[{'label': 'display', 'lat': -23.5552848935594...",-23.555285,-46.65453,,,SP,54302711498e92eb4a152e79
9,Sapporo Japanese Food,Japanese Restaurant,"Pç. N.Sa. Aparecida, 114",BR,São Paulo,Brasil,,1734,"[Pç. N.Sa. Aparecida, 114, São Paulo, SP, 0407...","[{'label': 'display', 'lat': -23.6037767738160...",-23.603777,-46.660836,,04075-010,SP,4b92f3e2f964a520f92934e3
10,NaturALL Fit - Healthy Food & Supplements,Food & Drink Shop,240 Rua Urussuí,BR,São Paulo,Brasil,,2575,"[240 Rua Urussuí, São Paulo, SP, 04542-051, Br...","[{'label': 'display', 'lat': -23.604378, 'lng'...",-23.604378,-46.640018,,04542-051,SP,591b62a4b9ac384466c171c7


As we can see, the sixth row (index 5) and all other rows whose category did not contain the words 'Restaurant', 'Cafe' or 'Food' were dropped; therefore, the dataframe is clean.
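The same keyword filter can be written without an explicit loop. This is a minimal sketch on a small hand-made dataframe, assuming pandas is available; the 'Hypothetical Startup' and 'Venue Without Category' rows are invented for illustration, while the other two rows come from the tables above.

```python
import pandas as pd

sample = pd.DataFrame({
    'name': ['Goa Food Truck (TCC)', 'Hypothetical Startup',
             'Venue Without Category', 'NaturALL Fit - Healthy Food & Supplements'],
    'categories': ['Vegetarian / Vegan Restaurant', 'Tech Startup',
                   None, 'Food & Drink Shop'],
})

# True where the category contains any keyword; missing categories count as False
is_food = sample['categories'].str.contains('Restaurant|Food|Cafe', na=False)

# keep only the potential competitors
kept = sample[is_food].copy()
print(kept['name'].tolist())
```

The match is case-sensitive, like the loop in the cell above, so the keywords must be written exactly as they appear in the categories.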

8. Explore the location of the venues using their latitude and longitude.

To explore such locations, let's use the folium module to plot them.

In [8]:
venues_map = folium.Map(location=[lat, long], zoom_start=14) # generate map centred around Ibirapuera park

# add a red circle marker to represent Ibirapuera park
folium.vector_layers.CircleMarker(
    location = [lat, long],
    radius=10,
    color='red',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the food venues as blue circle markers
# (venue_lat / venue_lng are used so the park's lat variable is not overwritten)
for venue_lat, venue_lng, label in zip(food_filtered.lat, food_filtered.lng, food_filtered.categories):
    folium.vector_layers.CircleMarker(
        [venue_lat, venue_lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

We can already see that there are some regions with higher concentration of restaurants. To further explore such regions, let's group the venues using K-means.

# Research method

To further analyze the food venues in the region, a cluster analysis will be used to group the venues based on their latitude and longitude. The clustering will be performed with K-means, chosen because it is simple to implement and performs well for this number of data points.

After forming the clusters, the region that doesn’t present a cluster, or that presents the cluster with the lowest density, will be chosen as the potential location. With the potential location in mind, its surrounding venues will be evaluated. If the region has no venues, the closest cluster will be used to choose the appropriate restaurant. Since our investor wants to open a restaurant with fewer competitors, the categories of the venues will be reviewed to see how many Japanese, Italian and vegetarian restaurants are in the region, and the type of restaurant with the fewest direct competitors will be chosen.
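K-means needs the number of clusters to be chosen up front. One common way to inform that choice is to compare the inertia (within-cluster sum of squared distances) for several values of k. The sketch below is an illustration only, not the procedure used in this notebook: it uses ten venue coordinates copied from the tables above and treats latitude and longitude as plain Euclidean coordinates, which is a simplification.

```python
import numpy as np
from sklearn.cluster import KMeans

# (lat, lng) of ten venues taken from the tables above
coords = np.array([
    [-23.58184, -46.649849], [-23.595533, -46.684341],
    [-23.566392, -46.64537], [-23.562156, -46.658713],
    [-23.587451, -46.677784], [-23.596512, -46.688254],
    [-23.576486, -46.639803], [-23.555285, -46.65453],
    [-23.603777, -46.660836], [-23.604378, -46.640018],
])

# fit K-means for k = 1..5 and record the inertia of each fit
inertias = {}
for k in range(1, 6):
    model = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0)
    model.fit(coords)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, value)
```

A value of k after which the inertia stops dropping sharply is often taken as a reasonable number of clusters; with so few points the curve is only indicative.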

9. Cluster analysis.

First, we need to load an extra module.

In [9]:
from sklearn.cluster import KMeans 

Second, let's use the latitude and longitude of the venues to cluster them.

Since we have a large radius, let's use seven clusters for the analysis.

In [10]:
#Initializing k-means with seven clusters
k_means = KMeans(init="k-means++", n_clusters=7, n_init=12)

#Grouping the location based on latitude and longitude
k_means.fit(food_filtered[['lat','lng']])

#Get the label of the cluster
k_means_labels = k_means.labels_
k_means_labels

array([2, 0, 4, 4, 0, 0, 6, 4, 5, 1, 2, 3, 2, 6, 2, 0, 0, 6, 6, 2, 2, 2,
       5, 2, 5, 2, 3, 0, 1, 3, 6, 6, 4, 5, 6, 6, 5, 5, 5], dtype=int32)

Third, let's add a column in the dataframe to store the cluster labels.

In [11]:
food_filtered['cluster'] = k_means_labels
food_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,cluster
0,Goa Food Truck (TCC),Vegetarian / Vegan Restaurant,"Rua José Antônio Coelho, 879",BR,São Paulo,Brasil,,1124,"[Rua José Antônio Coelho, 879, São Paulo, SP, ...","[{'label': 'display', 'lat': -23.58184, 'lng':...",-23.58184,-46.649849,,,SP,54528c3f498ea48fc80daa59,2
1,Fit Food,Restaurant,Reebok Sports Club,BR,São Paulo,Brasil,,2777,"[Reebok Sports Club, São Paulo, SP, 04551-000,...","[{'label': 'display', 'lat': -23.5955329614596...",-23.595533,-46.684341,,04551-000,SP,4bf30cc1706e20a1fdc8a798,0
2,Irashai Japanese Food,Sushi Restaurant,"R. Dr. Fausto Ferraz, 28",BR,São Paulo,Brasil,R. Carlos Sampaio,2778,"[R. Dr. Fausto Ferraz, 28 (R. Carlos Sampaio),...","[{'label': 'display', 'lat': -23.5663916712315...",-23.566392,-46.64537,Bela Vista,01333-030,SP,4c548403b426ef3b53d8838a,4
3,Quality Food,Brazilian Restaurant,"R. Peixoto Gomide, 1052",BR,São Paulo,Brasil,,2918,"[R. Peixoto Gomide, 1052, São Paulo, SP, 01409...","[{'label': 'display', 'lat': -23.5621563705620...",-23.562156,-46.658713,,01409-000,SP,4c3ca244933b0f470856e421,4
4,Fit Food,Restaurant,Bodytech,BR,São Paulo,Brasil,,1994,"[Bodytech, São Paulo, SP, 04542-000, Brasil]","[{'label': 'display', 'lat': -23.5874505404352...",-23.587451,-46.677784,,04542-000,SP,520ba1f711d2955c26da4960,0
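Counting how many venues fall into each cluster gives a first idea of which clusters are sparse. The sketch below simply tallies the label array printed above. A venue count is not a true density, since the area covered by each cluster is not taken into account.

```python
from collections import Counter

# cluster labels copied from the k-means output above
labels = [2, 0, 4, 4, 0, 0, 6, 4, 5, 1, 2, 3, 2, 6, 2, 0, 0, 6, 6, 2, 2, 2,
          5, 2, 5, 2, 3, 0, 1, 3, 6, 6, 4, 5, 6, 6, 5, 5, 5]

# number of venues assigned to each cluster
venues_per_cluster = Counter(labels)
for cluster, count in sorted(venues_per_cluster.items()):
    print(cluster, count)

# cluster with the fewest venues
smallest = min(venues_per_cluster, key=venues_per_cluster.get)
print('smallest cluster:', smallest)
```

In the dataframe itself, `food_filtered['cluster'].value_counts()` would give the same tally.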


Finally, let's review the map, using different colors for each cluster.

In [12]:
venues_map = folium.Map(location=[lat, long], zoom_start=14) # generate map centred around Ibirapuera park

# add a red circle marker to represent Ibirapuera park
folium.vector_layers.CircleMarker(
    location = [lat, long],
    radius=10,
    color='red',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the venues as colored circle markers
# each cluster has its own color so it can be identified (labels 0 to 6);
# any other label falls back to black
cluster_colors = ['blue', 'red', 'purple', 'green', 'orange', 'pink', 'gray']

# venue_lat / venue_lng are used so the park's lat variable is not overwritten
for venue_lat, venue_lng, cluster in zip(food_filtered.lat, food_filtered.lng, food_filtered.cluster):
    color = cluster_colors[cluster] if 0 <= cluster < len(cluster_colors) else 'black'
    folium.vector_layers.CircleMarker(
        [venue_lat, venue_lng],
        radius=5,
        color=color,
        fill = True,
        fill_color=color,
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
display(venues_map)

We can see that the purple cluster is the closest to the park and has several venues within it.

We can also see that, between the park and the blue cluster, there are several blocks with no food venues. The same goes for the red cluster.

Therefore, the West and South regions of the park would be good regions for the new restaurant, especially because they are closer to República do Líbano Avenue and Quarto Centenário Avenue.

Let's investigate both clusters to see which food options they provide.

10. Exploring the red and blue clusters.

In [13]:
#Displaying the blue cluster
food_filtered.loc[food_filtered['cluster'] == 0,:]

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,cluster
1,Fit Food,Restaurant,Reebok Sports Club,BR,São Paulo,Brasil,,2777,"[Reebok Sports Club, São Paulo, SP, 04551-000,...","[{'label': 'display', 'lat': -23.5955329614596...",-23.595533,-46.684341,,04551-000,SP,4bf30cc1706e20a1fdc8a798,0
4,Fit Food,Restaurant,Bodytech,BR,São Paulo,Brasil,,1994,"[Bodytech, São Paulo, SP, 04542-000, Brasil]","[{'label': 'display', 'lat': -23.5874505404352...",-23.587451,-46.677784,,04542-000,SP,520ba1f711d2955c26da4960,0
6,Vila Olímpia Food Hall,Street Food Gathering,"R. Tenerife, 74",BR,São Paulo,Brasil,,3191,"[R. Tenerife, 74, São Paulo, SP, 04548-040, Br...","[{'label': 'display', 'lat': -23.596512, 'lng'...",-23.596512,-46.688254,,04548-040,SP,59a9aa63d8fe7a0d1368ee55,0
18,Shigueru Japanese Food,Japanese Restaurant,"R. Leopoldo Couto de Magalhaes Júnior, 275",BR,São Paulo,Brasil,,1777,"[R. Leopoldo Couto de Magalhaes Júnior, 275, S...","[{'label': 'display', 'lat': -23.5869510955304...",-23.586951,-46.67561,,,SP,4d49cbf59544a0934a432ce7,0
19,Hale I'a Hawaiian Fish Food,Hawaiian Restaurant,"R. Prof. Atílio Innocenti, 693",BR,São Paulo,Brasil,,2236,"[R. Prof. Atílio Innocenti, 693, São Paulo, SP...","[{'label': 'display', 'lat': -23.5937617201006...",-23.593762,-46.679374,,04538-001,SP,5acdf3c56fa81f479105ddf4,0
35,Kebebariak Casual Food,Falafel Restaurant,,BR,,Brasil,,1461,[Brasil],"[{'label': 'display', 'lat': -23.584044, 'lng'...",-23.584044,-46.671783,,,,5058ae4de4b02049c1ed7626,0


In [14]:
#Displaying the red cluster
food_filtered.loc[food_filtered['cluster'] == 1,:]

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id,cluster
10,NaturALL Fit - Healthy Food & Supplements,Food & Drink Shop,240 Rua Urussuí,BR,São Paulo,Brasil,,2575,"[240 Rua Urussuí, São Paulo, SP, 04542-051, Br...","[{'label': 'display', 'lat': -23.604378, 'lng'...",-23.604378,-46.640018,,04542-051,SP,591b62a4b9ac384466c171c7,1
36,Food Trucker,Fast Food Restaurant,"Rua Dr. Bacelar, 1043",BR,São Paulo,Brasil,,2000,"[Rua Dr. Bacelar, 1043, São Paulo, SP, Brasil]","[{'label': 'display', 'lat': -23.6038580540477...",-23.603858,-46.648309,,,SP,543306ca498e1a4f90a35455,1


Considering that our investor has experience with Japanese, vegetarian and Italian food: if he chooses to stay in the South, a Japanese or an Italian restaurant would be a good option; if he chooses to stay in the West, the best choice would be an Italian restaurant.
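The comparison can also be made by counting, in each cluster, how many venues match the investor's three cuisines. The sketch below uses the categories listed in the two tables above. The keyword lists (for example counting 'Sushi' as Japanese) are assumptions of this example, and `count_competitors` is a hypothetical helper. A pure category match cannot capture judgment calls such as a venue that probably serves vegetarian food without being categorized as vegetarian.

```python
# categories of the venues in each cluster, copied from the tables above
cluster_categories = {
    'blue (West)': ['Restaurant', 'Restaurant', 'Street Food Gathering',
                    'Japanese Restaurant', 'Hawaiian Restaurant', 'Falafel Restaurant'],
    'red (South)': ['Food & Drink Shop', 'Fast Food Restaurant'],
}

# keywords assumed to identify each cuisine of interest
cuisine_keywords = {
    'Japanese': ['Japanese', 'Sushi'],
    'Italian': ['Italian'],
    'Vegetarian': ['Vegetarian', 'Vegan'],
}

def count_competitors(categories):
    """Count, per cuisine, the categories containing any of its keywords."""
    return {
        cuisine: sum(any(word in category for word in words) for category in categories)
        for cuisine, words in cuisine_keywords.items()
    }

competitors = {cluster: count_competitors(categories)
               for cluster, categories in cluster_categories.items()}
for cluster, counts in competitors.items():
    print(cluster, counts)
```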

# Summary of the results and discussion

After performing the cluster analysis, two regions next to the park, South and West, were found to have no food venues. Both regions would be good locations, not only because they have no food venues, but also because they are close to two avenues that would make access easier: Quarto Centenário Avenue in the South and República do Líbano Avenue in the West.

Since there are two good options, both were evaluated. In the West, the nearest cluster is the blue cluster. It contains one Japanese restaurant and two restaurants called Fit Food, which might provide vegetarian options. Therefore, if the investor chooses to stay in the West, the option would be an Italian restaurant.
In the South, there are just two venues, a food truck (Food Trucker) and a venue called NaturALL Fit, which most likely has vegetarian options. Therefore, both Italian and Japanese restaurants would be good options if the investor chooses to stay in the South region.

# Conclusion

The West and South regions of the park presented no competition for several blocks; therefore, the recommended places to start the restaurant are close to República do Líbano Avenue and Quarto Centenário Avenue, which not only border these regions but also provide easy access to the restaurant.

The two clusters nearest to these regions (red and blue) had no venues categorized as Italian or vegetarian restaurants, so if the investor opens one of these two types of restaurant, he will not only be in a region with low competition but will also differentiate himself from the closest options.