# Introduction/Business Problem
---

`Clearly define a problem or an idea of your choice, where you would need to leverage the Foursquare location data to solve or execute. Remember that data science problems always target an audience and are meant to help a group of stakeholders solve a problem, so make sure that you explicitly describe your audience and why they would care about your problem`


If someone is looking to open a restaurant in blumenau, which neighbourhood would you recommend?

This is the defining problem for this capstone final project. The audience would be anyone who wants to or is thinking about starting a restuarent in Blumenau. Blumenau, is a small, yet, rapidly growing city in the south of Brazil. Because the city is growing, Blumenau has become a hot spot or perfect place to begin a restaurant. 

Numerous events occur in the city that promote the ever increaseing influx of foreigners, domestic and international, such as Oktober fest. As such, Blumenau does indeed offer a prefect place to begin a restaurant.  


# Data
`Describe the data that you will be using to solve the problem or execute your idea. Remember that you will need to use the Foursquare location data to solve the problem or execute your idea. You can absolutely use other datasets in combination with the Foursquare location data. So make sure that you provide adequate explanation and discussion, with examples, of the data that you will be using, even if it is only Foursquare location data.`

I will be using a simple table of neighbourhoods also known as bairros in portguese. The data can be acquired from the local government [website](https://www.blumenau.sc.gov.br/secretarias/secretaria-de-desenvolvimento-urbano/pagina/historia-sobre-municipio/divisa-administrativa-bairros). Foursquare can then be used after the geocoordinates of each barrio is found.

The following is the data from the table that will be scrapped using BeautifulSoup:

Sobre o Município - Bairros - Divisão Administrativa
Bairros - Divisão Administrativa
Bairro Água Verde
Bairro Badenfurt
Bairro Boa Vista
Bairro Bom Retiro
Bairro Centro
Bairro Da Glória
Bairro Do Salto
Bairro Escola Agrícola
Bairro Fidélis
Bairro Fortaleza
Bairro Fortaleza Alta
Bairro Garcia
Bairro Itoupava Central
Bairro Itoupava Norte
Bairro Itoupava Seca
Bairro Itoupavazinha
Bairro Jardim Blumenau
Bairro Nova Esperança
Bairro Passo Manso
Bairro Ponta Aguda
Bairro Progresso
Bairro Ribeirão Fresco
Bairro Salto do Norte
Bairro Salto Weissbach
Bairro Testo Salto
Bairro Tribess
Bairro Valparaíso
Bairro Velha
Bairro Velha Central
Bairro Velha Grande
Bairro Victor Konder
Bairro Vila Formosa
Bairro Vila Itoupava
Bairro Vila Nova
Bairro Vorstardt



In [57]:
import os
import time
import json, requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

from dotenv import load_dotenv
load_dotenv()

client_id = os.getenv("client_id")
client_secret = os.getenv("client_secret")
version = '20180604'
limit = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore'

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize, MinMaxScaler
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.metrics import silhouette_score

import folium 

from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="capstone project app")

resp = requests.get('https://www.blumenau.sc.gov.br/secretarias/secretaria-de-desenvolvimento-urbano/pagina/historia-sobre-municipio/divisa-administrativa-bairros').text
soup = BeautifulSoup(resp, 'lxml')
data = soup.find('div',{'id':'ultimas'})

address = 'Blumenau, Brazil'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Blumenau are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Blumenau are -26.9195567, -49.0658025.


In [58]:
bairros = []
for row in data.findAll('li')[1:]:
    cells = row.find_all(['span'])
   
    try:
        
        if(cells[0].text== 'Bairro Vorstardt'):
            bairro = 'Bairro Vorstadt'
        else:
            bairro = cells[0].text
        
        
    except IndexError:
        continue

    bairros.append(bairro.rstrip())

print(bairros)

['Bairro Água Verde', 'Bairro Badenfurt', 'Bairro Boa Vista', 'Bairro Bom Retiro', 'Bairro Centro', 'Bairro Da Glória', 'Bairro Do Salto', 'Bairro Escola Agrícola', 'Bairro Fidélis', 'Bairro Fortaleza', 'Bairro Fortaleza Alta', 'Bairro Garcia', 'Bairro Itoupava Central', 'Bairro Itoupava Norte', 'Bairro Itoupava Seca', 'Bairro Itoupavazinha', 'Bairro Jardim Blumenau', 'Bairro Nova Esperança', 'Bairro Passo Manso', 'Bairro Ponta Aguda', 'Bairro Progresso', 'Bairro Ribeirão Fresco', 'Bairro Salto do Norte', 'Bairro Salto Weissbach', 'Bairro Testo Salto', 'Bairro Tribess', 'Bairro Valparaíso', 'Bairro Velha', 'Bairro Velha Central', 'Bairro Velha Grande', 'Bairro Victor Konder', 'Bairro Vila Formosa', 'Bairro Vila Itoupava', 'Bairro Vila Nova', 'Bairro Vorstadt']


In [59]:
df = pd.DataFrame(bairros, columns=['Bairros'])
df['Bairros'] = df['Bairros'].map(lambda x: str(x)[7:])

df.head()

Unnamed: 0,Bairros
0,Água Verde
1,Badenfurt
2,Boa Vista
3,Bom Retiro
4,Centro


In [60]:
# def locate(x):
#     try:
#         location = geolocator.geocode('blumenau {}'.format(x))
#         print(x, location.latitude, location.longitude)
#     except:
#         time.sleep(2)
#         location = geolocator.geocode('blumenau {}'.format(x))
#         print(x, location.latitude, location.longitude)
#     time.sleep(2)
#     return location.latitude, location.longitude

# df["Latitude"], df["Longitude"] = zip(*df["Bairros"].map(locate))

# in fact we could just use the coords file, but in case we add addition columns in future df I will leave this way
latlong = pd.read_csv('coords.csv')
df = pd.merge(df, latlong, on='Bairros')

df.head()

Unnamed: 0,Bairros,Latitude,Longitude
0,Água Verde,-26.910743,-49.107369
1,Badenfurt,-26.88306,-49.135753
2,Boa Vista,-26.901357,-49.066842
3,Bom Retiro,-26.925561,-49.071635
4,Centro,-26.919902,-49.065934


# Methodology 

`the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, and what machine learnings were used and why.`

**important**

Because we work with a unlabeled dataset, I will use K-means clustering to find interesting groups/clusters within the dataset. I will also use cross validation and ensemble learning to fine-tune the model.

---

After data processing, the latitude and longitude for each bairro was used with Foursquare to obtain a list of venues specifically restuarants. 43 unique categories were found. The 10 most popular venues were selected and then a K-Fold and GridSearchCV with the following values were used:

```python
rand_state=50

folds=3

k_fold = KFold(n_splits=folds, shuffle=True, random_state=rand_state)

hyperparams = {
    "n_clusters": [2, 3, 4],
    "n_init": [10, 15, 20],
    "max_iter": [100, 200, 300, 400, 500],
    "tol": [.0000001, .000001, .00001, .0001],
}
```

`GridSearchCV()` typicall returns best parameters of `{'max_iter': 200, 'n_clusters': 4, 'n_init': 15, 'tol': 1e-05}` with a score of 0.33 (closer to 1 is best).




In [61]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        params = dict(
          client_id = client_id,
          client_secret = client_secret,
          v=version,
          ll='{},{}'.format(lat,lng),
          radius=radius,
          query='Restaurant',
          limit=limit
        )

        resp = requests.get(url=url, params=params)
        data = json.loads(resp.text)

        results = data["response"]['groups'][0]['items']      
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Bairros', 
                  'Bairros Latitude', 
                  'Bairros Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

blumenau_venues = getNearbyVenues(
    names=df['Bairros'],
    latitudes=df['Latitude'],
    longitudes=df['Longitude'])

Água Verde
Badenfurt
Boa Vista
Bom Retiro
Centro
Da Glória
Do Salto
Escola Agrícola
Fidélis
Fortaleza
Fortaleza Alta
Garcia
Itoupava Central
Itoupava Norte
Itoupava Seca
Itoupavazinha
Jardim Blumenau
Nova Esperança
Passo Manso
Ponta Aguda
Progresso
Ribeirão Fresco
Salto do Norte
Salto Weissbach
Testo Salto
Tribess
Valparaíso
Velha
Velha Central
Velha Grande
Victor Konder
Vila Formosa
Vila Itoupava
Vila Nova
Vorstadt


In [62]:
print('There are {} uniques categories.'.format(len(blumenau_venues['Venue Category'].unique())))

There are 43 uniques categories.


In [63]:
# one hot encoding
blumenau_onehot = pd.get_dummies(blumenau_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
blumenau_onehot['Bairros'] = blumenau_venues['Bairros'] 

# move neighborhood column to the first column
fixed_columns = [blumenau_onehot.columns[-1]] + list(blumenau_onehot.columns[:-1])
blumenau_onehot = blumenau_onehot[fixed_columns]

blumenau_onehot.head()

Unnamed: 0,Bairros,American Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Cafeteria,Café,Chinese Restaurant,Churrascaria,Creperie,Deli / Bodega,Diner,Fast Food Restaurant,Fish & Chips Shop,Food,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Hawaiian Restaurant,Hot Dog Joint,Italian Restaurant,Japanese Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Pastelaria,Pizza Place,Restaurant,Salad Place,Sandwich Place,Snack Place,Southern Brazilian Restaurant,Steakhouse,Sushi Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant
0,Água Verde,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Água Verde,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Badenfurt,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
3,Badenfurt,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Boa Vista,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [64]:
blumenau_grouped = blumenau_onehot.groupby('Bairros').mean().reset_index()
blumenau_grouped

Unnamed: 0,Bairros,American Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Cafeteria,Café,Chinese Restaurant,Churrascaria,Creperie,Deli / Bodega,Diner,Fast Food Restaurant,Fish & Chips Shop,Food,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Hawaiian Restaurant,Hot Dog Joint,Italian Restaurant,Japanese Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Pastelaria,Pizza Place,Restaurant,Salad Place,Sandwich Place,Snack Place,Southern Brazilian Restaurant,Steakhouse,Sushi Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant
0,Badenfurt,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Boa Vista,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0
2,Bom Retiro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Centro,0.0,0.0,0.0,0.016949,0.0,0.152542,0.016949,0.084746,0.0,0.186441,0.016949,0.0,0.0,0.0,0.016949,0.050847,0.0,0.0,0.033898,0.0,0.0,0.0,0.016949,0.0,0.016949,0.016949,0.0,0.067797,0.033898,0.0,0.0,0.0,0.016949,0.067797,0.084746,0.0,0.016949,0.016949,0.016949,0.0,0.0,0.0,0.050847
4,Da Glória,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
5,Do Salto,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Escola Agrícola,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Fortaleza,0.0,0.0,0.0,0.166667,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.083333,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0
8,Fortaleza Alta,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Garcia,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0


In [65]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [66]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Bairros']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
bairros_venues_sorted = pd.DataFrame(columns=columns)
bairros_venues_sorted['Bairros'] = blumenau_grouped['Bairros']

for ind in np.arange(blumenau_grouped.shape[0]):
    bairros_venues_sorted.iloc[ind, 1:] = return_most_common_venues(blumenau_grouped.iloc[ind, :], num_top_venues)

bairros_venues_sorted

Unnamed: 0,Bairros,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Badenfurt,Brazilian Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega
1,Boa Vista,Steakhouse,Pastelaria,Food Stand,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega
2,Bom Retiro,Italian Restaurant,Vegetarian / Vegan Restaurant,Food Truck,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,Creperie
3,Centro,Café,Brazilian Restaurant,Restaurant,Burger Joint,Italian Restaurant,Pizza Place,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Japanese Restaurant,Food Court
4,Da Glória,Snack Place,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,Creperie
5,Do Salto,Bakery,Bistro,Restaurant,Food Truck,Vegetarian / Vegan Restaurant,Churrascaria,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant
6,Escola Agrícola,Bakery,Café,Food,Creperie,Chinese Restaurant,Food Court,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega
7,Fortaleza,Bakery,Restaurant,Pizza Place,Hot Dog Joint,Italian Restaurant,Snack Place,Brazilian Restaurant,Burger Joint,Vegetarian / Vegan Restaurant,Fish & Chips Shop
8,Fortaleza Alta,Bakery,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,Creperie
9,Garcia,Bakery,Sushi Restaurant,Fast Food Restaurant,Restaurant,Deli / Bodega,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food,Fish & Chips Shop,Diner


In [67]:
rand_state=50
folds=3
k_fold = KFold(n_splits=folds, shuffle=True, random_state=rand_state)
hyperparams = {
    "n_clusters": [2, 3, 4],
    "n_init": [10, 15, 20],
    "max_iter": [100, 200, 300, 400, 500],
    "tol": [.0000001, .000001, .00001, .0001],
}

k_means = KMeans()

ensemble = GridSearchCV(
    estimator=k_means,
    param_grid=hyperparams,
    cv=k_fold,
    n_jobs=-1
)

blumenau_grouped_clustering = blumenau_grouped.drop('Bairros', 1)
ensemble.fit(blumenau_grouped_clustering)

labels = ensemble.predict(blumenau_grouped_clustering)
score = silhouette_score(blumenau_grouped_clustering, labels)

print(score)
print(ensemble.best_params_)

0.40750074160474825
{'max_iter': 200, 'n_clusters': 4, 'n_init': 10, 'tol': 1e-06}


# Results 

Using the following parameters for K-means:
`{'max_iter': 200, 'n_clusters': 4, 'n_init': 10, 'tol': 1e-06}`, which showed the best silhoutte score. The silhoutte score shows how close the points are to the center of their clusters where tighter clusters will give a better score. If the data points are very scattered, the clusters are too loose. 

We obtained 4 clusters and dropped all NaN rows leaving 31 bairros to examine while 4 were removed. 

Based on the Folium map below and from the output for cluster label 0, we can see that this cluster represents the bulk of the bairros (26). Cluster label 1 and 3 only represent a single neighbour while cluster label 2 represents two. 


In [77]:
kmeans = KMeans(n_clusters=ensemble.best_params_['n_clusters'], max_iter=ensemble.best_params_['max_iter'], n_init=ensemble.best_params_['n_init'], tol=ensemble.best_params_['tol'] ,random_state=RAND_STATE).fit(blumenau_grouped_clustering)
print(len(kmeans.labels_), len(blumenau_grouped_clustering), len(df), len(bairros_venues_sorted))

31 31 35 31


In [78]:
blumenau_merged = df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
blumenau_merged = blumenau_merged.join(bairros_venues_sorted.set_index('Bairros'), on='Bairros')

# drop all rows with NaN
blumenau_merged = blumenau_merged.dropna()

# add clustering labels
blumenau_merged['Cluster Labels'] = kmeans.labels_

blumenau_merged.head()

Unnamed: 0,Bairros,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Água Verde,-26.910743,-49.107369,Fast Food Restaurant,Hot Dog Joint,Vegetarian / Vegan Restaurant,Food Truck,Food Court,Food,Fish & Chips Shop,Diner,Deli / Bodega,Creperie,0
1,Badenfurt,-26.88306,-49.135753,Brazilian Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,0
2,Boa Vista,-26.901357,-49.066842,Steakhouse,Pastelaria,Food Stand,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,1
3,Bom Retiro,-26.925561,-49.071635,Italian Restaurant,Vegetarian / Vegan Restaurant,Food Truck,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,Creperie,0
4,Centro,-26.919902,-49.065934,Café,Brazilian Restaurant,Restaurant,Burger Joint,Italian Restaurant,Pizza Place,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Japanese Restaurant,Food Court,2


In [79]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(ensemble.best_params_['n_clusters'])
ys = [i+x+(i*x)**2 for i in range(ensemble.best_params_['n_clusters'])]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(blumenau_merged['Latitude'], blumenau_merged['Longitude'], blumenau_merged['Bairros'], blumenau_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [80]:
blumenau_merged.loc[blumenau_merged['Cluster Labels'] == 0, blumenau_merged.columns[list(range(blumenau_merged.shape[1]))]]

Unnamed: 0,Bairros,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Água Verde,-26.910743,-49.107369,Fast Food Restaurant,Hot Dog Joint,Vegetarian / Vegan Restaurant,Food Truck,Food Court,Food,Fish & Chips Shop,Diner,Deli / Bodega,Creperie,0
1,Badenfurt,-26.88306,-49.135753,Brazilian Restaurant,Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,0
3,Bom Retiro,-26.925561,-49.071635,Italian Restaurant,Vegetarian / Vegan Restaurant,Food Truck,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,Creperie,0
5,Da Glória,-26.964187,-49.059479,Snack Place,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,Creperie,0
6,Do Salto,-26.883472,-49.102599,Bakery,Bistro,Restaurant,Food Truck,Vegetarian / Vegan Restaurant,Churrascaria,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,0
7,Escola Agrícola,-26.895078,-49.099026,Bakery,Café,Food,Creperie,Chinese Restaurant,Food Court,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,0
10,Fortaleza Alta,-26.847192,-49.050457,Bakery,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,Creperie,0
11,Garcia,-26.934577,-49.059467,Bakery,Sushi Restaurant,Fast Food Restaurant,Restaurant,Deli / Bodega,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food,Fish & Chips Shop,Diner,0
12,Itoupava Central,-26.81619,-49.089223,Bakery,Restaurant,Pizza Place,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,0
13,Itoupava Norte,-26.879553,-49.07824,Bakery,Sushi Restaurant,Burger Joint,Japanese Restaurant,Vegetarian / Vegan Restaurant,Churrascaria,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,0


In [81]:
blumenau_merged.loc[blumenau_merged['Cluster Labels'] == 1, blumenau_merged.columns[list(range(blumenau_merged.shape[1]))]]

Unnamed: 0,Bairros,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
2,Boa Vista,-26.901357,-49.066842,Steakhouse,Pastelaria,Food Stand,Vegetarian / Vegan Restaurant,Chinese Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,1


In [82]:
blumenau_merged.loc[blumenau_merged['Cluster Labels'] == 2, blumenau_merged.columns[list(range(blumenau_merged.shape[1]))]]

Unnamed: 0,Bairros,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
4,Centro,-26.919902,-49.065934,Café,Brazilian Restaurant,Restaurant,Burger Joint,Italian Restaurant,Pizza Place,Vegetarian / Vegan Restaurant,Fast Food Restaurant,Japanese Restaurant,Food Court,2
25,Tribess,-26.871691,-49.049748,Fried Chicken Joint,Bakery,Snack Place,Hot Dog Joint,Chinese Restaurant,Food,Fish & Chips Shop,Fast Food Restaurant,Diner,Deli / Bodega,2


In [83]:
blumenau_merged.loc[blumenau_merged['Cluster Labels'] == 3, blumenau_merged.columns[list(range(blumenau_merged.shape[1]))]]

Unnamed: 0,Bairros,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
9,Fortaleza,-26.879053,-49.065259,Bakery,Restaurant,Pizza Place,Hot Dog Joint,Italian Restaurant,Snack Place,Brazilian Restaurant,Burger Joint,Vegetarian / Vegan Restaurant,Fish & Chips Shop,3
23,Salto Weissbach,-26.896694,-49.129936,German Restaurant,Diner,Vegetarian / Vegan Restaurant,Food Truck,Food Court,Food,Fish & Chips Shop,Fast Food Restaurant,Deli / Bodega,Creperie,3


# Discussion
---

Using a groupby we can see that a bulk of venues in category label 0 are infact just bakeries. However, this category contains the most restaurants.

Although we did not specify, in the business plan, the type of restaurant, we can see that majority of fast food resturants are located in Água Verde. Brazilian Restaurants are mainly found in Badenfurt, Passo Manso, and Ponta Aguda. Vila Formosa is a common place for burger joints, Tribess for fried chicken, Salto Weissbach for German food, Bom Retiro for italian food, Da Glória for snacks, and Boa Vista for steak. Meanwhile, for restaurants in general, the following bairros are popular Itoupavazinha, Jardim Blumenau, Velha, and Victor Konder.

These results may be interrupted in several ways for a new restaurant idea. The following bairros, Itoupavazinha, Jardim Blumenau, Velha, and Victor Konder, may be the best locations for generic restaurants; however, one may need to consider the overall cost for that location along with competition factors. Alternatively, the location may be best due to general thinking that this area contains the most generic restaurants bringing forth the most customers. 

In [75]:
blumenau_merged.groupby(['Cluster Labels', '1st Most Common Venue']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Bairros,Latitude,Longitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
Cluster Labels,1st Most Common Venue,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
0,BBQ Joint,1,1,1,1,1,1,1,1,1,1,1,1
0,Bakery,10,10,10,10,10,10,10,10,10,10,10,10
0,Brazilian Restaurant,3,3,3,3,3,3,3,3,3,3,3,3
0,Burger Joint,1,1,1,1,1,1,1,1,1,1,1,1
0,Café,3,3,3,3,3,3,3,3,3,3,3,3
0,Fast Food Restaurant,1,1,1,1,1,1,1,1,1,1,1,1
0,Italian Restaurant,1,1,1,1,1,1,1,1,1,1,1,1
0,Restaurant,4,4,4,4,4,4,4,4,4,4,4,4
0,Snack Place,2,2,2,2,2,2,2,2,2,2,2,2
1,Steakhouse,1,1,1,1,1,1,1,1,1,1,1,1


In [89]:
blumenau_merged.groupby(['1st Most Common Venue', 'Bairros']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Latitude,Longitude,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
1st Most Common Venue,Bairros,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1
BBQ Joint,Velha Central,1,1,1,1,1,1,1,1,1,1,1,1
Bakery,Do Salto,1,1,1,1,1,1,1,1,1,1,1,1
Bakery,Escola Agrícola,1,1,1,1,1,1,1,1,1,1,1,1
Bakery,Fortaleza,1,1,1,1,1,1,1,1,1,1,1,1
Bakery,Fortaleza Alta,1,1,1,1,1,1,1,1,1,1,1,1
Bakery,Garcia,1,1,1,1,1,1,1,1,1,1,1,1
Bakery,Itoupava Central,1,1,1,1,1,1,1,1,1,1,1,1
Bakery,Itoupava Norte,1,1,1,1,1,1,1,1,1,1,1,1
Bakery,Progresso,1,1,1,1,1,1,1,1,1,1,1,1
Bakery,Testo Salto,1,1,1,1,1,1,1,1,1,1,1,1


# Conclusion
---

In summary, there are four distinct categories described by the K-means method. I used K-fold and GridSearchCV to optimize the best values for the K-means method. The first category contains the bulk of the bairros that contain restaurants while category 1, 2, and 3 contain two or less bairros. Selection of bairro is strongly dictated by the type or theme of the restaurant; however, the most common bairros for general restaurants are Itoupavazinha, Jardim Blumenau, Velha, and Victor Konder. These bairros can represent either the best location to place a new restuarant or the worse. Factors such as rental costs, competition, etc. play an important role in the decision to open a restaurant in these locations, and must be considered.