# Battle of the Neighbourhoods week 2

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>
Montes Claros is a small city in the north of Minas Gerais, Brazil. It is a growing, cosmopolitan city, full of business investment opportunities. We would like to help invest in a new type of restaurant, one that faces little competition, so we are looking for types of restaurant that are rare in the city, as well as good locations for the new restaurant.

## Data <a name="data"></a>
We will use the Foursquare API to gather venue data around downtown Montes Claros, and plot the venues on a map to check their locations. We need to check the data for any venues that are not food-related, and make graphs to quantify the different types of restaurant that are already available.

In [361]:
import numpy as np # library to handle data in a vectorized manner
import time
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import plotly.express as px

import json # library to handle JSON files
import requests # library to handle requests

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium # map rendering library
import seaborn as sns

print('Libraries imported.')

Libraries imported.


The next group of cells gets the coordinates for Montes Claros and connects to the Foursquare API to fetch venue information for plotting.

In [362]:
address = 'Montes Claros'

geolocator = Nominatim(user_agent="moc_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Montes Claros are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Montes Claros are -16.7273538, -43.8717676.


In [363]:
moc_latitude = -16.7273538
moc_longitude = -43.8717676

In [364]:
food_category = '4d4b7105d754a06374d81259'
CLIENT_ID = '********************************'
CLIENT_SECRET = '*********************************'
VERSION = '20180604'

In [365]:
LIMIT = 200 # limit of number of venues returned by Foursquare API
radius = 10000 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    moc_latitude, 
    moc_longitude, 
    food_category,
    radius, 
    LIMIT)
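
The same request URL can also be assembled from a dictionary of named parameters, which avoids positional mistakes in a long `format` call. The following is a minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper name, and the credentials shown are placeholders.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng,
                      category_id, radius, limit):
    """Assemble a Foursquare v2 'explore' URL from named parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'categoryId': category_id,
        'radius': radius,                # metres
        'limit': limit,
    }
    # urlencode percent-encodes reserved characters, e.g. the comma in 'll'
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


example_url = build_explore_url(
    'YOUR_ID', 'YOUR_SECRET', '20180604',
    -16.7273538, -43.8717676,
    '4d4b7105d754a06374d81259', 10000, 200)
print(example_url)
```

Naming each parameter makes it easier to see which value goes where when the radius or category changes.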

In [366]:
results = requests.get(url).json()

In [367]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened Foursquare rows use the 'venue.categories' key
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
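
To illustrate what the function returns, the sketch below repeats it and applies it to two hand-made rows. The sample rows are hypothetical and only mimic the shape of a flattened Foursquare item.

```python
def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']


# Hypothetical rows: one with a category, one with an empty category list
sample_with_category = {'venue.categories': [{'name': 'Bakery'}]}
sample_without_category = {'venue.categories': []}

first = get_category_type(sample_with_category)
second = get_category_type(sample_without_category)
print(first, second)
```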

In [368]:
venues = results['response']['groups'][0]['items']

MOCnearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
MOCnearby_venues = MOCnearby_venues.loc[:, filtered_columns]

# filter the category for each row
MOCnearby_venues['venue.categories'] = MOCnearby_venues.apply(get_category_type, axis=1)

# clean columns
MOCnearby_venues.columns = [col.split(".")[-1] for col in MOCnearby_venues.columns]

MOCnearby_venues

|   | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Subway | Sandwich Place | -16.728784 | -43.87162 |
| 1 | Center Pão | Bakery | -16.730044 | -43.87534 |
| 2 | Kalifa | Indian Restaurant | -16.721859 | -43.86656 |
| 3 | Center Pão | Bakery | -16.731479 | -43.871844 |
| 4 | Vila 61 Casa Bar | Brazilian Restaurant | -16.732331 | -43.876665 |
| 5 | Casa do Pastel | Bakery | -16.72365 | -43.864111 |
| 6 | Padaria Marron Glacê | Bakery | -16.718741 | -43.872313 |
| 7 | No Tempero Self Service | Brazilian Restaurant | -16.723626 | -43.867107 |
| 8 | Big's | Burger Joint | -16.718613 | -43.867057 |
| 9 | Jac's Café | Fast Food Restaurant | -16.741613 | -43.871184 |


Now that we have the venues, it's time to see them on the map of Montes Claros:

In [369]:
map_moc = folium.Map(location=[latitude, longitude], zoom_start=14)
for lat, lng, name in zip(MOCnearby_venues['lat'], MOCnearby_venues['lng'], MOCnearby_venues['name']):
    label = '{}'.format(name)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_moc) 
map_moc

## Methodology <a name="methodology"></a>

In this project we will try to determine the best type of restaurant to open in Montes Claros/MG, and also the best locations in which it can be opened.

We have collected Foursquare data on the food venues that already exist in the city and plotted them on a map. The map will help us decide on the best locations. We have also generated a dataframe containing all the venues, along with their locations and food categories.

During the data wrangling process, we will join similar restaurants together, since there are overlapping categories, e.g. Japanese Restaurant and Sushi Restaurant.

## Analysis <a name="analysis"></a>

We start the analysis by looking at a basic distribution of the categories by quantity in a histogram.

In [370]:
hist = MOCnearby_venues.groupby(['categories']).count().reset_index().sort_values(by='name', ascending=False)
fig = px.histogram(data_frame=hist, x='categories', y='name')
fig.update_layout(
    title="Quantity of food venues per category.",
    xaxis_title="Category",
    yaxis_title="Quantity",
    font=dict(
        family="Courier New, monospace",
        size=11,
        color="RebeccaPurple"
    )
)

fig.show()

- We see that there are overlapping categories, e.g. Burger Joint and Sandwich Place. They might seem different, but in Brazil they refer to the same type of food in most cases.
- We will join the overlapping categories together.
- We will also remove bakeries, since they follow a different business model. To simplify, we will label any venue that should be removed as a bakery.
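
One compact way to express this kind of merge is a single mapping from each original category to its merged label, applied only to the `categories` column so that venue names are never touched. The following is a sketch under that assumption; `category_map` and the small sample frame are hypothetical and are not the notebook's data.

```python
import pandas as pd

# Hypothetical sample, shaped like the venues dataframe
sample = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'categories': ['Sandwich Place', 'Sushi Restaurant', 'Bakery', 'Burger Joint'],
})

# Original category -> merged category
category_map = {
    'Sandwich Place': 'Burger Joint',
    'Sushi Restaurant': 'Japanese Restaurant',
}

# Apply the mapping to the category column only
sample['categories'] = sample['categories'].replace(category_map)

# Drop bakeries and count what is left per category
kept = sample[sample['categories'] != 'Bakery'].copy()
counts = kept['categories'].value_counts()
print(counts)
```

Keeping the merges in one dictionary also documents the wrangling decisions in a single place.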

In [371]:
MOCnearby_venues.replace('Sandwich Place', 'Burger Joint', inplace=True)
MOCnearby_venues.replace('Sushi Restaurant', 'Japanese Restaurant', inplace=True)
MOCnearby_venues.replace('Brazilian Restaurant', 'Restaurant', inplace=True)
MOCnearby_venues.loc[MOCnearby_venues.name == 'Casa do Pastel', 'categories'] = 'Pastelaria'
MOCnearby_venues.loc[MOCnearby_venues.name == 'Pesca e pague Caminho do Pescador', 'categories'] = 'Seafood Restaurant'
MOCnearby_venues.loc[MOCnearby_venues['name'].str.contains('Praça'), 'categories'] = 'Bakery'
MOCnearby_venues.loc[MOCnearby_venues['name'].str.contains('Bem Família'), 'categories'] = 'Restaurant'
MOCnearby_venues.loc[MOCnearby_venues['name'].str.contains('Sabor E'), 'categories'] = 'Restaurant'
MOCnearby_venues.replace('Pastelaria', 'Snack Place', inplace=True)
MOCnearby_venues.replace('Food', 'Snack Place', inplace=True)
MOCnearby_venues.replace('Steakhouse', 'BBQ Joint', inplace=True)
MOCnearby_venues.replace('Diner', 'Restaurant', inplace=True)
MOCnearby_venues.replace('Café', 'Bakery', inplace=True)
MOCnearby_venues.replace('Acai House', 'Snack Place', inplace=True)
MOCnearby_venues.replace('Gastropub', 'Restaurant', inplace=True)
MOCnearby_venues.replace('American Restaurant', 'Hot Dog Joint', inplace=True)
MOCnearby_venues.replace('Food Court', 'Bakery', inplace=True)
MOCnearby_venues.replace('Fish & Chips Shop', 'Seafood Restaurant', inplace=True)
MOCnearby_venues.replace('Food Truck', 'Hot Dog Joint', inplace=True)
MOCnearby_venues.replace('Mac & Cheese Joint', 'Snack Place', inplace=True)
venues = MOCnearby_venues.loc[(MOCnearby_venues['categories'] != 'Bakery')]
hist = venues.groupby(['categories']).count().reset_index().sort_values(by='name', ascending=False)
fig = px.histogram(data_frame=hist, x='categories', y='name')
fig.update_layout(
    title="Quantity of food venues per category.",
    xaxis_title="Category",
    yaxis_title="Quantity",
    font=dict(
        family="Courier New, monospace",
        size=11,
        color="RebeccaPurple"
    )
)

fig.show()

After this dataframe cleanup, every remaining Fast Food Restaurant is a burger place, so we merge that category into Burger Joint.

In [372]:
MOCnearby_venues.replace('Fast Food Restaurant', 'Burger Joint', inplace=True)
venues = MOCnearby_venues.loc[(MOCnearby_venues['categories'] != 'Bakery')]
hist = venues.groupby(['categories']).count().reset_index().sort_values(by='name', ascending=False)
fig = px.histogram(data_frame=hist, x='categories', y='name')
fig.update_layout(
    title="Quantity of food venues per category.",
    xaxis_title="Category",
    yaxis_title="Quantity",
    font=dict(
        family="Courier New, monospace",
        size=11,
        color="RebeccaPurple"
    )
)

fig.show()

The next step will help us choose a location for our venue. We use DBSCAN to find clusters of restaurants, on the assumption that restaurants cluster around places with a high daily flow of people, which makes those areas potentially more advantageous for opening a new restaurant.

In [373]:
from sklearn.cluster import DBSCAN
import sklearn.utils
from sklearn.preprocessing import StandardScaler
X = venues[['lat', 'lng']]
X = StandardScaler().fit_transform(X)

In [374]:
db = DBSCAN(eps=0.33, min_samples=5).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
venues = venues.copy()  # work on an explicit copy to avoid SettingWithCopyWarning
venues["Clus_Db"] = labels
realClusterNum = len(set(labels)) - (1 if -1 in labels else 0)  # clusters, excluding noise (-1)
clusterNum = len(set(labels))  # distinct labels, including noise

In [375]:
venues1 = venues.loc[venues['Clus_Db'] == 1]
venues2 = venues.loc[venues['Clus_Db'] == 2]
venues0 = venues.loc[venues['Clus_Db'] == 0]
venuesn1 = venues.loc[venues['Clus_Db'] == -1]

In [376]:
map_moc = folium.Map(location=[latitude, longitude], zoom_start=14)

# one colour per DBSCAN cluster; black marks the outliers (label -1)
cluster_layers = [(venues1, 'blue'), (venues2, 'red'), (venues0, 'green'), (venuesn1, 'black')]
for cluster_df, colour in cluster_layers:
    for lat, lng, name in zip(cluster_df['lat'], cluster_df['lng'], cluster_df['name']):
        label = folium.Popup('{}'.format(name), parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=3,
            popup=label,
            color=colour,
            fill=True,
            fill_color=colour,
            fill_opacity=0.7,
            parse_html=False).add_to(map_moc)

map_moc

The map shows us 3 different clusters, and the black dots are outliers.
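
For reference, DBSCAN marks points that belong to no cluster with the label `-1`, which is why the black dots are treated as outliers. The sketch below shows this on a handful of made-up coordinates (not the notebook's venues). It also uses the haversine metric on coordinates in radians, so that `eps` can be stated as a distance in kilometres. This is an alternative to the standardized latitude/longitude used above, not what the notebook does, and the 0.3 km radius is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up points: two tight groups and one isolated point (lat, lng in degrees)
coords_deg = np.array([
    [-16.7270, -43.8720], [-16.7273, -43.8723], [-16.7268, -43.8718], [-16.7275, -43.8721],
    [-16.7400, -43.8600], [-16.7403, -43.8603], [-16.7398, -43.8598], [-16.7405, -43.8601],
    [-16.8000, -43.9500],
])

EARTH_RADIUS_KM = 6371.0088
eps_km = 0.3  # assumed neighbourhood radius, in kilometres

# The haversine metric expects [lat, lng] in radians and returns angular distance,
# so the radius is divided by the Earth's radius.
db = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=3,
            metric='haversine', algorithm='ball_tree')
labels = db.fit_predict(np.radians(coords_deg))

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # exclude the noise label
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

Expressing `eps` as a physical distance can make the parameter easier to justify than a value in standardized units.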

## Results and Discussion <a name="results"></a>

The histogram shows us that the most common types of food venue are restaurants and burger joints. Since we are interested in opening a new place with less competition, we focus on the types with only one venue, which are Italian, Indian, German and Chinese restaurants. Another option would be to open a kind of restaurant that does not yet exist in the city, such as a Mexican restaurant. This could potentially be a better idea, because Mexican cuisine shares many aspects with the local cuisine while bringing new elements to it.
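
Categories with a single venue can be pulled out of a list of category labels with a simple count. A minimal sketch, using a hypothetical list of labels rather than the notebook's data:

```python
from collections import Counter

# Hypothetical category labels, one per venue
sample_categories = [
    'Restaurant', 'Burger Joint', 'Restaurant', 'Pizza Place',
    'Burger Joint', 'Indian Restaurant', 'Pizza Place', 'Chinese Restaurant',
]

counts = Counter(sample_categories)

# Keep only the categories that appear exactly once
single_venue_categories = sorted(c for c, n in counts.items() if n == 1)
print(single_venue_categories)
```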

A different approach would be to take advantage of the existing market for traditional restaurants and burger joints. This approach would face far more competition, but the local population is already accustomed to this type of food.

In [377]:
fig.show()

- To decide where to open the restaurant, we look at the clustered map. We see three clusters. The blue cluster is around one of the main boulevards in the city (which includes a mall), the green cluster is downtown Montes Claros, and the red cluster is around a secondary commercial zone. Black dots are outliers.
- Our recommendation is that a new restaurant, especially one of a new type, be opened near one of these clusters, to increase visibility.

In [378]:
map_moc

## Conclusion <a name="conclusion"></a>

The purpose of this project was to determine which kind of restaurant would be more advantageous to open in the city of Montes Claros, and where it should be located.

We used the Foursquare API to find the food venues in the city and determine their categories. In the data wrangling phase we made modifications to the dataframe based on our knowledge of the market. We identified two approaches: either take advantage of the established taste for burgers and traditional cuisine, at the cost of increased competition, or bring in a rarely found type of restaurant, perhaps an entirely new one. For a new type of restaurant, we recommend Mexican food, because it has similarities with the local cuisine.

We used DBSCAN to group the restaurants into three clusters, which point to the areas where we recommend opening the restaurant, i.e. near the blue or green clusters.