<a href="https://en.wikipedia.org/wiki/Bras%C3%ADlia"><img src = "https://www.jota.info/wp-content/uploads/2018/11/71a97b3fe90b01b307ae56f4a59e2dba.jpg" width = 400, align = "center"></a>

<h1 align=center><font size = 5>Using foursquare API to estimate per capita income in Brasilia neighborhoods</font></h1>

# Introduction

Brasília is the federal capital of Brazil. The city is located atop the Brazilian highlands in the center-western region and has the highest GDP per capita in Brazil. Even among major Latin American cities, Brasília has the highest GDP per capita. 

The local legislation treats the federal district as a unique city. As a consequence, official statistics are not segmented by neighborhood nor the so-called administrative regions within the federal district. For instance, there are no criminality records or per capita income segmented by neighborhood.

We attempt to produce useful information segmented by the neighborhoods of Brasilia. 
From the beginning, the neighborhood is an informal concept in Brasilia, since the government uses the concept of administrative regions. Administrative Regions שרק 

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">The Lack of Segmented Information Problem</a>

2. <a href="#item2">Brasilia's neighborhoods from GuiaMais.com.br</a>

3. <a href="#item3">Using Geopy to Get Brasilia neighborhood informations</a>

4. <a href="#item4">Storing in pandas dataset</a>
    
5. <a href="#item5">Map of Brasilia using Folium</a>
    
6. <a href="#item6">More to come</a>
    
</font>
</div>

In [2]:
#Install de required packages
#! conda install -c anaconda beautifulsoup4 --yes
! conda install -c conda-forge geopy --yes 
#! conda install -c anaconda lxml --yes
#! conda install -c conda-forge geopy --yes 

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         238 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.50-py_0        conda-forge
    geopy:         1.20.0-py_0      conda-forge

The following packages will be UPDATED:

    certifi:       2019.6.

## 1. The Lack of Segmented Information Problem

Although we can say that Brasilia is a city and the Federal District is a state, officially they share the same borders. It's a state with only one city within its borders. Even though people use to group together Asa Sul, Asa Norte, Sudoeste, Noroeste, Lago Sul, Lago Norte, and Park Way and call only it Brasilia. It's an informal denomination that relies on personal taste and can change from people to people.

The legislative design treats the federal district as a unique city. As a consequence, official statistics are not segmented by neighborhood or the so-called administrative regions within the federal district.

<img src = "https://www.infoescola.com/wp-content/uploads/2011/08/mapa-distrito-federal.jpg" width = 400, align = "center"></a>
    

Brazil's Federal District is a square-shaped territory within the borders of Goias state. That's why we got the local nickname "goianos from the square"
Can you see the 'airplane' in the middle of the square? 

## 2.  Brasilia's neighborhoods from Guia Mais

## Methodology
1. I got the list of neighborhoods using the informal list provided by [Guia Mais website](https://www.guiamais.com.br/bairros/brasilia-df).
2. I store the names of the neighborhoods in a list, and passed it to geopy query.
3. I stored the latitudes, longitudes, postal codes and other information in a pandas dataframe.
4. I intend to use the bsb dataframe to construct indicators such as the average price spends in restaurants, numbers of cafes, the number of pizza places, price of a pizza, neighborhood importance, to construct a proxy for per capita income by neighborhood.


## 3.  Using Geopy to Get Brasilia neighborhood informations

In [13]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
#from bs4 import BeautifulSoup

from geopy.geocoders import Nominatim

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Libraries imported.')

Libraries imported.


In [4]:
# Storing the neighborhood (bairros) in a list

In [14]:
bairros = ['Águas Claras','Asa Norte','Asa Sul','Brazlândia','Candangolândia','Ceilândia','Ceilândia Sul','Cruzeiro','Delegado Lago Ii','Eixo Monumental','Gama',
'Guará','Guará I','Lago Norte','Lago Sul','Núcleo Bandeirante','Núcleo Rural Lago Oeste','Octogonal','Sudoeste','Paranoá','Park Way',
'Planaltina','Ponte Alta','Recanto das Emas','Região dos Lagos','Residencial Itaipu','Riacho Fundo','Samambaia','Santa Maria','São Sebastião',
'Taguatinga','Setor de Habitaco Lago Norte','Setor Econômico de Sobradinho','Setor Habitacional Samambaia','Vicente Pires',
'Setor M Ceilândia','Monte D Armas Planaltina','Noroeste','Setor Norte Gama','Setor Norte Vila E Guará','Setor Oeste Guará',
'Setor Oficinas Asa Norte','Arniqueiras','SIA','Sobradinho','Águas Claras Sul','Taguatinga Centro','Taguatinga Sul','Vila Planalto',
'Vila São José Vicente Pires','Zona Indústrial Guará']


## 4. Storing in pandas dataset

In [16]:
bsb = pd.DataFrame({'bairro': [], 'latitude': [], 'longitude': [], 'type': [], 'postcode':[], 'importance':[] })
       
geolocator = Nominatim(user_agent="foursquare_agent")

for bairro in bairros:
    bairro2 = bairro + ', DF'
    location = geolocator.geocode(bairro2)
    try:
        lat = location.latitude
        lng = location.longitude
        type = location.raw['type']
        zipc = location.raw['display_name'].split()
        impo = location.raw['importance']
        
        bsb = bsb.append({'bairro': bairro, 'latitude': lat, 'longitude': lng, 'type': type, 'postcode': zipc[-2], 'importance': impo }, ignore_index=True)
    except:
        pass
    print('Neighborhood {0} has latitude {1} and longitude {2} '.format(bairro, lat, lng))

Neighborhood Águas Claras has latitude -15.8419933 and longitude -48.0281208 
Neighborhood Asa Norte has latitude -15.7627976 and longitude -47.883951 
Neighborhood Asa Sul has latitude -15.8169455 and longitude -47.900049 
Neighborhood Brazlândia has latitude -15.6808898 and longitude -48.1942621 
Neighborhood Candangolândia has latitude -15.8536609 and longitude -47.9493775 
Neighborhood Ceilândia has latitude -15.8173391 and longitude -48.1045766 
Neighborhood Ceilândia Sul has latitude -15.8287665 and longitude -48.0971053 
Neighborhood Cruzeiro has latitude -15.7908774 and longitude -47.9373916 
Neighborhood Delegado Lago Ii has latitude -15.7908774 and longitude -47.9373916 
Neighborhood Eixo Monumental has latitude -15.7842364 and longitude -47.9162597 
Neighborhood Gama has latitude -16.0170857 and longitude -48.0653054 
Neighborhood Guará has latitude -15.8235629 and longitude -47.9768165 
Neighborhood Guará I has latitude -15.8235629 and longitude -47.9768165 
Neighborhood La

In [20]:
bsb['postcode']=bsb['postcode'].str.replace('Centro-Oeste','').replace(',','').replace('-','')

In [21]:
bsb.head()

Unnamed: 0,bairro,latitude,longitude,type,postcode,importance
0,Águas Claras,-15.841993,-48.028121,city,,0.577761
1,Asa Norte,-15.762798,-47.883951,suburb,"70744-020,",0.56
2,Asa Sul,-15.816945,-47.900049,suburb,70347090,0.56
3,Brazlândia,-15.68089,-48.194262,administrative,,0.539662
4,Candangolândia,-15.853661,-47.949377,administrative,,0.515802


In [26]:
bsb[['latitude','longitude','bairro']].iloc[1]

latitude    -15.762798
longitude   -47.883951
Name: 1, dtype: float64

In [30]:
latitude = bsb[['latitude','longitude']].iloc[1][0]
longitude = bsb[['latitude','longitude']].iloc[1][1]

## 5.  Map of Brasilia using Folium

In [31]:

map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(bsb['latitude'], bsb['longitude'], bsb['bairro']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
display(map_newyork)

In [32]:
map_newyork.save("bsb.png")

## 5. More to come

Now its time to use foursquare API to get relevant information by Neighborhood

In [111]:
CLIENT_ID = 'USC23TI2Y3K0WLFYXPJUXXEJVZ4NUEMT3CR5T3ZCNTJ3IB55' # your Foursquare ID
CLIENT_SECRET = 'U2QVNANSIRP4OBT5MWECVSUP0VR1PVEQLDKY5S4HY4CNCP0S' # your Foursquare Secret
VERSION = '20191013'
LIMIT = 50
print('Patrick credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: USC23TI2Y3K0WLFYXPJUXXEJVZ4NUEMT3CR5T3ZCNTJ3IB55
CLIENT_SECRET:U2QVNANSIRP4OBT5MWECVSUP0VR1PVEQLDKY5S4HY4CNCP0S


In [33]:
import json 
import requests 
from pandas.io.json import json_normalize 
CLIENT_ID = 'V53MHPBBL23EVUFTF31HTKJAFRJIK2QL1WJURTZ3ARRFF3KE'
CLIENT_SECRET = 'G5Z0ALDAAKEOPESPXKBZPXNB2TIEJKGWM33ZGSBCNW0VGW0T'
VERSION = '20191018' 
LIMIT = 100

In [34]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
        'PostalCode', 'Latitude', 'Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return(nearby_venues)

In [35]:
venues = getNearbyVenues(
    names=bsb['bairro'],
    latitudes=bsb['latitude'],
    longitudes=bsb['longitude'] )

In [36]:
print(venues.shape)
venues.head()

(762, 7)


Unnamed: 0,PostalCode,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Águas Claras,-15.841993,-48.028121,Ahi Poke Bar,-15.841719,-48.027315,Hawaiian Restaurant
1,Águas Claras,-15.841993,-48.028121,Me Gusta Sandubaria Artesanal,-15.840484,-48.029573,Food Truck
2,Águas Claras,-15.841993,-48.028121,Casero Bistrô,-15.843929,-48.029564,Restaurant
3,Águas Claras,-15.841993,-48.028121,Adorável Café,-15.843863,-48.029299,Café
4,Águas Claras,-15.841993,-48.028121,Bonnapan,-15.839519,-48.026673,Bakery


Finish for while

More to come...

More to come...

More to come...

More to come...



In [39]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in bsb.columns if col.startswith('location.')] + ['id']
bsb_filtered = bsb.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

In [40]:
bsb2 = bsb_filtered[['lat','lng','formattedAddress']]

In [41]:
bsb2.head()

In [42]:
bsb2.shape