# Capstone Project - The Battle of Neighborhoods (Week 1-2)

## Business Problem section

### Background

Italian immigration to Brazil peaked between 1880 and 1930. Italo-Brazilians, the descendants of the large wave of Italian immigrants who arrived in Brazil between 1870 and 1960, live mainly in the southern and southeastern states. There are no reliable data on the number of descendants of Italians in Brazil.
Immigration directly influenced the local cuisine of the southern states of Brazil, the region that received the greatest number of immigrants.
The city of Rio de Janeiro, despite not having received a large number of immigrants, was influenced by Italian tourists, who regularly visit the city.

Italian cuisine is rich and varied, differs from region to region within Italy, and has influenced cooking in much of the world. In Brazil, that influence arrived with Italian immigration between 1880 and 1930. Immigrants mixed the flavors and aromas of their homeland with the ingredients of the new land, combining recipes from different regions of Italy and developing them at Sunday lunches that brought several families, from different regions, together in the same backyard.
In Brazil there are many Italian restaurants with menus full of delicious pasta, but for those who want to enjoy a good and healthy meal, it is advisable to pay attention to two points: the quality of the place and the quality of the food.

### Business Problem

Italian restaurants are spread across the large Brazilian urban centers, which increases competition. This leads to the central question of this project.

If a food company needs advice on opening a restaurant in a Brazilian capital, which capital should we recommend?
If the main prerequisite is opening the restaurant in a populous region with little competition, which city should we recommend?

A restaurant consultancy helps the company better understand its situation, offers a fresh view of the competition, and guides the company strategically toward opening a new establishment that delivers results.

### Data section

Based on immigration surveys (http://www.imigrantesitalianos.com.br), the most populous Brazilian capitals with the largest number of Italian immigrants were selected:

- São Paulo
- Porto Alegre
- Florianópolis
- Curitiba
- Rio de Janeiro

These data support the analysis of where to open a new restaurant.

### Methodology section

We use the Foursquare API to collect data about Italian restaurants in five major Brazilian cities. The venues are plotted on a map to help analyze the distance between restaurants, a prerequisite for deciding where to start a business.
Foursquare limits us to a maximum of 100 venues per request. The analysis covers only the five cities mentioned above.
Then, to obtain an indicator of the density of Italian restaurants, we compute the central coordinate of the venues in each city as the average of their latitudes and longitudes, and average the Euclidean distance from each venue to that central coordinate. The indicator is therefore the mean distance to the mean coordinate (MDMC), measured in degrees of latitude and longitude.
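
As a rough illustration of the indicator described above, the sketch below computes the mean coordinate and the mean distance to it using only the Python standard library. The function name and the sample coordinates are hypothetical and chosen only to make the arithmetic easy to follow; the notebook itself computes the same quantity with NumPy on the Foursquare data.

```python
import math
from statistics import mean


def mean_distance_to_mean_coordinate(points):
    """Return (mean_coordinate, MDMC) for a list of (lat, lng) pairs in degrees."""
    mean_lat = mean(lat for lat, _ in points)
    mean_lng = mean(lng for _, lng in points)
    # Plain Euclidean distance in degree space, as in the methodology
    distances = [math.hypot(lat - mean_lat, lng - mean_lng) for lat, lng in points]
    return (mean_lat, mean_lng), mean(distances)


# Hypothetical coordinates, for illustration only (not real venues)
sample = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)]
center, mdmc = mean_distance_to_mean_coordinate(sample)
print(center, mdmc)
```

A smaller value means the venues sit closer to their common center, which is read here as tighter clustering.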

### Results

Below, we can see that all cities have a large number of Italian restaurants, and that competition is strong in each of them.
The maps were generated with folium:

In [4]:
# import libraries
import numpy as np
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated since pandas 1.0)
import folium # map rendering library

In [5]:
# Foursquare credentials
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (do not publish real credentials)
VERSION = '20180323'

In [6]:
# Query Foursquare for Italian restaurants in each of the five capitals
LIMIT = 100 # the API returns at most 100 venues per request
cities = ['São Paulo, SP', 'Porto Alegre, RS', 'Florianópolis, SC', 'Curitiba, PR', 'Rio de Janeiro, RJ']
url = 'https://api.foursquare.com/v2/venues/explore'
results = {}
for city in cities:
    # Passing params lets requests URL-encode the city name (accents, spaces, commas)
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'near': city,
        'limit': LIMIT,
        'categoryId': '4bf58dd8d48988d110941735', # Italian Food CATEGORY ID
    }
    results[city] = requests.get(url, params=params).json()

In [7]:
results

{'São Paulo, SP': {'meta': {'code': 200,
   'requestId': '5ea7c59f6d8c560027c01ce4'},
  'response': {'suggestedFilters': {'header': 'Tap to show:',
    'filters': [{'name': 'Open now', 'key': 'openNow'}]},
   'geocode': {'what': '',
    'where': 'são paulo sp',
    'center': {'lat': -23.5475, 'lng': -46.63611},
    'displayString': 'São Paulo, SP, Brazil',
    'cc': 'BR',
    'geometry': {'bounds': {'ne': {'lat': -23.356903009710976,
       'lng': -46.36505302809169},
      'sw': {'lat': -24.007316994733713, 'lng': -46.82635794604196}}},
    'slug': 'sao-paulo',
    'longId': '72057594041376375'},
   'headerLocation': 'São Paulo',
   'headerFullLocation': 'São Paulo',
   'headerLocationGranularity': 'city',
   'query': 'italian',
   'totalResults': 188,
   'suggestedBounds': {'ne': {'lat': -23.466525905651366,
     'lng': -46.529600349001825},
    'sw': {'lat': -23.655072360285345, 'lng': -46.72037303641319}},
   'groups': [{'type': 'Recommended Places',
     'name': 'recommended',
   

In [8]:
len(results)

5

In [9]:
# The Foursquare API returns at most 100 venues per city.
df_venues = {}
for city in cities:
    # Flatten the nested venue records into one row per venue
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    # Keep name, address and coordinates; copy so that renaming columns does not act on a slice
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']].copy()
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']



In [10]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of Italian restaurants in {city} = {results[city]['response']['totalResults']}")
    print("Showing up to 100")



Total number of Italian restaurants in São Paulo, SP = 188
Showing up to 100
Total number of Italian restaurants in Porto Alegre, RS = 87
Showing up to 100
Total number of Italian restaurants in Florianópolis, SC = 68
Showing up to 100
Total number of Italian restaurants in Curitiba, PR = 104
Showing up to 100
Total number of Italian restaurants in Rio de Janeiro, RJ = 115
Showing up to 100



#### São Paulo

In [11]:
maps[cities[0]]

#### Porto Alegre

In [12]:
maps[cities[1]]

#### Florianópolis

In [13]:
maps[cities[2]]

#### Curitiba

In [14]:
maps[cities[3]]

#### Rio de Janeiro

In [15]:
maps[cities[4]]

Analysis of Results (only establishments registered on Foursquare):
- Total number of Italian restaurants in São Paulo, SP = 188
- Total number of Italian restaurants in Porto Alegre, RS = 87
- Total number of Italian restaurants in Florianópolis, SC = 68
- Total number of Italian restaurants in Curitiba, PR = 104
- Total number of Italian restaurants in Rio de Janeiro, RJ = 115
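
Because each request returns at most 100 venues, cities whose reported total exceeds 100 are only partially represented on the maps. The short check below, a sketch that simply reuses the `totalResults` values printed by the notebook cell above, lists the cities where that happens.

```python
LIMIT = 100  # maximum venues per request, as noted in the methodology

# totalResults values printed by the notebook cell
total_results = {
    'São Paulo, SP': 188,
    'Porto Alegre, RS': 87,
    'Florianópolis, SC': 68,
    'Curitiba, PR': 104,
    'Rio de Janeiro, RJ': 115,
}

# Cities where the API reports more venues than a single request can return
truncated = [city for city, total in total_results.items() if total > LIMIT]
print(truncated)
```

For those cities the density indicator is computed on a sample of the venues rather than on all of them.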

On first inspection, the venues in São Paulo, Curitiba, Porto Alegre and Florianópolis appear the most densely clustered. In the next step we calculate the mean coordinate and the mean distance to the mean coordinate (MDMC). The mean coordinate is drawn as a large green circle and the distances as green lines.

In [16]:
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)
    venues_mean_coor = [df_venues[city]['Lat'].mean(), df_venues[city]['Lng'].mean()] 
    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])
        folium.PolyLine([venues_mean_coor, [lat, lng]], color="green", weight=1.5, opacity=0.5).add_to(maps[city])
    
    label = folium.Popup("Mean Co-ordinate", parse_html=True)
    folium.CircleMarker(
        venues_mean_coor,
        radius=10,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(maps[city])

    print(city)
    print("Mean Distance from Mean coordinates")
    # Euclidean distance (in degrees) from each venue to the mean coordinate, averaged over all venues
    coords = df_venues[city][['Lat', 'Lng']].values
    print(np.mean(np.linalg.norm(coords - venues_mean_coor, axis=1)))

São Paulo, SP
Mean Distance from Mean coordinates
0.03834513392561405
Porto Alegre, RS
Mean Distance from Mean coordinates
0.03106063987001945
Florianópolis, SC
Mean Distance from Mean coordinates
0.08019835743825189
Curitiba, PR
Mean Distance from Mean coordinates
0.027956087712631295
Rio de Janeiro, RJ
Mean Distance from Mean coordinates
0.09823943423737182


#### São Paulo

In [17]:
maps[cities[0]]

#### Porto Alegre

In [18]:
maps[cities[1]]

#### Florianópolis

In [19]:
maps[cities[2]]

#### Curitiba

In [20]:
maps[cities[3]]

#### Rio de Janeiro

In [21]:
maps[cities[4]]

Results - Mean Distance from Mean coordinates (in degrees, from most to least clustered):
- Curitiba, PR: 0.027956087712631295;
- Porto Alegre, RS: 0.03106063987001945;
- São Paulo, SP: 0.03834513392561405;
- Florianópolis, SC: 0.08019835743825189;
- Rio de Janeiro, RJ: 0.09823943423737182
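
These distances are expressed in degrees, and away from the equator a degree of longitude covers less ground than a degree of latitude, so the values are only roughly comparable between cities at different latitudes. One possible refinement, which is not part of the original analysis, is to measure the same indicator in kilometers with the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function names are hypothetical.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mdmc_km(points):
    """Mean haversine distance (km) from each (lat, lng) point to the mean coordinate."""
    mean_lat = mean(lat for lat, _ in points)
    mean_lng = mean(lng for _, lng in points)
    return mean(haversine_km(lat, lng, mean_lat, mean_lng) for lat, lng in points)
```

With the notebook's data, `mdmc_km(list(zip(df_venues[city]['Lat'], df_venues[city]['Lng'])))` would give the indicator for one city in kilometers.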


## Discussion

We can see that São Paulo is the city with the largest number of establishments specializing in Italian food. The cities of southern Brazil also have a significant number of establishments.
Another important factor is the population of these capitals; São Paulo and Rio de Janeiro are the largest, in that order.
Visually, the restaurants are concentrated in the shopping centers of these capitals, except in Rio de Janeiro, where they are far apart.
To get a concrete measure of this density, we used some basic statistics. First, we computed the average location of the establishments in each city.
Next, we took the average distance from each establishment to that average location.
In short, Italian restaurants are tightly clustered in most capitals, but not in Rio de Janeiro.
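
To make the comparison explicit, the cities can be ordered by the indicator. The sketch below hard-codes the MDMC values printed by the notebook cell in the Results section and sorts them from most spread out to most clustered.

```python
# MDMC values (in degrees) as printed by the notebook for each city
mdmc = {
    'São Paulo, SP': 0.03834513392561405,
    'Porto Alegre, RS': 0.03106063987001945,
    'Florianópolis, SC': 0.08019835743825189,
    'Curitiba, PR': 0.027956087712631295,
    'Rio de Janeiro, RJ': 0.09823943423737182,
}

# A larger MDMC means the venues lie farther from their common center
ranking = sorted(mdmc, key=mdmc.get, reverse=True)
for city in ranking:
    print(f"{city}: {mdmc[city]:.4f}")
```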

### Conclusion

Given that the consultancy was hired to identify the best city for opening a restaurant away from the competition, we conclude that Rio de Janeiro is the only one of the five cities without Italian food on every corner. There is therefore plenty of room there to cook pasta and make money, away from the competition found in all the other cities.
Buon appetito!