# Capstone Project - Location of a Pet Store in São Paulo

### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem

In this project we will try to find an optimal location for a pet store in the city of São Paulo. Specifically, this report is targeted at stakeholders interested in opening a pet store in São Paulo, Brazil.

Since there are already many pet stores in São Paulo, we will try to detect locations that are not yet crowded with competitors.

We will use our data science powers to identify a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

## Data

Based on the definition of our problem, the factors that will influence our decision are the number of existing pet stores in each neighborhood and its population.

I decided to use the list of São Paulo neighborhoods published by the city hall at https://www.prefeitura.sp.gov.br/cidade/secretarias/subprefeituras/subprefeituras/dados_demograficos/index.php?p=12758 and obtained their locations using the arcgis method from geocoder.

The number and locations of pet stores in every neighborhood will be obtained using the Foursquare API.

The coordinates of the center of São Paulo will be obtained using Nominatim from geopy.

First, let's import the required libraries:
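
To make the two criteria concrete, here is a minimal sketch of one possible way to combine them: rank neighborhoods by residents per existing pet store. This is our own illustration and not necessarily the method used in the analysis. The scoring function, the +1 smoothing and the pet store counts are hypothetical. The populations are the city hall figures for Aricanduva, Carrão and Vila Formosa, read with "." as a thousands separator.

```python
# Hypothetical sketch: rank neighborhoods by residents per existing pet store.
# Populations: city hall values for these districts ("89.622" read as 89,622).
# Pet store counts: made-up numbers, for illustration only.
population = {"Aricanduva": 89622, "Carrão": 83281, "Vila Formosa": 94799}
pet_stores = {"Aricanduva": 1, "Carrão": 3, "Vila Formosa": 0}


def residents_per_store(pop, stores):
    """Residents per pet store; the +1 in the denominator is a smoothing
    assumption that avoids dividing by zero where no store exists yet."""
    return pop / (stores + 1)


# Higher score = more potential customers per competitor.
ranking = sorted(
    population,
    key=lambda n: residents_per_store(population[n], pet_stores[n]),
    reverse=True,
)
print(ranking)
```

With these inputs the ranking is `['Vila Formosa', 'Aricanduva', 'Carrão']`. Other weightings, such as population density or distance to the center, would be equally valid choices.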

In [1]:
!pip install beautifulsoup4 # the PyPI package that provides bs4
from bs4 import BeautifulSoup

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!pip install geocoder 
import geocoder # import geocoder

!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
json_normalize = pd.json_normalize # transform JSON into a pandas dataframe (the pandas.io.json import is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# install folium if it is not already available
!pip install folium
# map rendering library
import folium 

print('Libraries imported.')

Libraries imported.


### Import Data from São Paulo City Hall

In [2]:
# using beautiful soup to import data from the city hall site
url='https://www.prefeitura.sp.gov.br/cidade/secretarias/subprefeituras/subprefeituras/dados_demograficos/index.php?p=12758'
data=requests.get(url).text
soup=BeautifulSoup(data,'html.parser')

In [3]:
table = soup.find('table')
rows = []  # collected as dicts; DataFrame.append was removed in pandas 2.0
for row in table.tbody.find_all('tr'):
    col = row.find_all('td')
    if not col:
        continue  # skip rows without <td> cells (e.g. header rows)
    # 5-cell rows start with the subprefeitura name; the other rows do not,
    # so in them the neighborhood name and population sit one cell to the left
    offset = 1 if len(col) == 5 else 0
    neighborhood = col[offset].text.strip()
    population = col[offset + 2].text.strip()  # kept as text, e.g. '89.622'
    if neighborhood != 'TOTAL':  # skip the 'TOTAL' summary rows
        rows.append({'Neighborhood': neighborhood, 'Population': population})

population_data = pd.DataFrame(rows, columns=['Neighborhood', 'Population'])  # population dataframe
neighborhoods = population_data[['Neighborhood']].copy()  # neighborhood names only

In [4]:
neighborhoods.head() # examine the dataframe

|   | Neighborhood |
|---|---|
| 0 | Aricanduva |
| 1 | Carrão |
| 2 | Vila Formosa |
| 3 | Butantã |
| 4 | Morumbi |


In this project we will be examining 96 neighborhoods.

In [5]:
neighborhoods.shape

(96, 1)
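
The scraping loop treats rows with five `<td>` cells differently from the others. A plausible reason is that the subprefeitura cell spans several rows via `rowspan`, so only the first row of each group carries it. We have not inspected the page markup in this notebook to confirm this. The standard-library sketch below replays the same offset logic on a small hypothetical table. The markup, the "-" placeholders and the `CellCollector` class are ours; only the two name/population pairs come from the city hall data.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the stripped text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # list of rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # rows made only of <th> cells yield nothing
                self.rows.append(self._row)
            self._row = None


# Hypothetical markup: "-" stands in for the columns the notebook ignores.
html = """<table><tbody>
<tr><th>header row</th></tr>
<tr><td rowspan="2">Subprefeitura X</td><td>Aricanduva</td><td>-</td><td>89.622</td><td>-</td></tr>
<tr><td>Carrão</td><td>-</td><td>83.281</td><td>-</td></tr>
<tr><td>TOTAL</td><td>-</td><td>-</td><td>-</td></tr>
</tbody></table>"""

parser = CellCollector()
parser.feed(html)

records = []
for cells in parser.rows:
    offset = 1 if len(cells) == 5 else 0  # 5 cells: the first is the subprefeitura
    name = cells[offset]
    if name == "TOTAL":
        continue
    records.append((name, cells[offset + 2]))
print(records)
```

Both the 5-cell row and the shorter row yield a correct (name, population) pair, and the TOTAL row is dropped.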

In [6]:
population_data.head() #examine the dataframe

|   | Neighborhood | Population |
|---|---|---|
| 0 | Aricanduva | 89.622 |
| 1 | Carrão | 83.281 |
| 2 | Vila Formosa | 94.799 |
| 3 | Butantã | 54.196 |
| 4 | Morumbi | 46.957 |
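
The Population column is still text, and values such as `89.622` appear to use the Brazilian convention of "." as a thousands separator (89,622 residents). We assume this from the magnitudes rather than from a statement on the source page. Before doing arithmetic with these values, they need converting to integers. Here is a minimal sketch of that conversion; `parse_br_int` is a hypothetical helper name.

```python
def parse_br_int(text):
    """Convert a Brazilian-formatted integer string such as '89.622' to 89622.

    Assumes '.' is only a thousands separator and that there is no decimal
    part (',' would be the decimal mark in pt-BR formatting)."""
    return int(text.strip().replace(".", ""))


raw = ["89.622", "83.281", "94.799", "54.196", "46.957"]  # values shown above
print([parse_br_int(v) for v in raw])
```

This prints `[89622, 83281, 94799, 54196, 46957]`. The pandas equivalent is `population_data['Population'].str.replace('.', '', regex=False).astype(int)`. The `regex=False` matters, because as a regular expression `.` would match every character.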


In [7]:
# use arcgis to find the latitude and longitude of each neighborhood
rows = []  # one dict per neighborhood; DataFrame.append was removed in pandas 2.0
for neighborhood in neighborhoods['Neighborhood']:
    address = '{}, São Paulo, São Paulo, Brasil'.format(neighborhood)
    g = geocoder.arcgis(address)
    while g.latlng is None:  # arcgis sometimes times out: retry until it answers
        g = geocoder.arcgis(address)
        print(address, g.latlng)
    lat, lng = g.latlng
    rows.append({'Neighborhood': neighborhood, 'Latitude': lat, 'Longitude': lng})
sp_data = pd.DataFrame(rows, columns=['Neighborhood', 'Latitude', 'Longitude'])

Status code Unknown from https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find: ERROR - HTTPSConnectionPool(host='geocode.arcgis.com', port=443): Read timed out. (read timeout=5.0)


Cidade Dutra, São Paulo, São Paulo, Brasil [-23.711959999999976, -46.70392999999996]


Status code Unknown from https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find: ERROR - HTTPSConnectionPool(host='geocode.arcgis.com', port=443): Read timed out. (read timeout=5.0)


Socorro, São Paulo, São Paulo, Brasil [-23.66939999999994, -46.71623999999997]


Status code Unknown from https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find: ERROR - HTTPSConnectionPool(host='geocode.arcgis.com', port=443): Read timed out. (read timeout=5.0)


Jaçanã, São Paulo, São Paulo, Brasil [-23.46805999999998, -46.582289999999944]


Status code Unknown from https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer/find: ERROR - HTTPSConnectionPool(host='geocode.arcgis.com', port=443): Read timed out. (read timeout=5.0)


Tucuruvi, São Paulo, São Paulo, Brasil [-23.474069999999983, -46.610739999999964]
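
The loop above retries indefinitely while arcgis keeps timing out, as the log shows for Cidade Dutra, Socorro, Jaçanã and Tucuruvi. Below is a sketch of a bounded alternative under our own assumptions. `geocode_with_retry` is a hypothetical helper, not part of the geocoder library. It accepts any callable that returns an object with a `.latlng` attribute (which `geocoder.arcgis` does) and waits exponentially longer between attempts.

```python
import time


def geocode_with_retry(geocode, address, max_attempts=5, base_delay=1.0):
    """Call geocode(address) until it returns coordinates.

    Makes at most max_attempts calls and sleeps base_delay * 2**k seconds
    after the k-th failure (exponential backoff). Returns the [lat, lng]
    list, or None if every attempt failed."""
    for attempt in range(max_attempts):
        result = geocode(address)
        if result.latlng is not None:
            return result.latlng
        if attempt < max_attempts - 1:  # no need to wait after the last try
            time.sleep(base_delay * 2 ** attempt)
    return None
```

In the notebook it could be called as `latlng = geocode_with_retry(geocoder.arcgis, address)`. A `None` result would flag a neighborhood to geocode by hand instead of stalling the whole loop.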


In [8]:
sp_data.head() # examine the dataframe

|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Aricanduva | -23.56771 | -46.51025 |
| 1 | Carrão | -23.54798 | -46.53885 |
| 2 | Vila Formosa | -23.56642 | -46.5394 |
| 3 | Butantã | -23.57089 | -46.70968 |
| 4 | Morumbi | -23.601 | -46.71551 |


In [9]:
#using Nominatim to get the latitude and longitude of São Paulo
address = 'Sao Paulo, Sao Paulo'

geolocator = Nominatim(user_agent="sp_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of São Paulo are {}, {}.'.format(latitude, longitude))

The geographical coordinates of São Paulo are -23.5506507, -46.6333824.
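
With the city center known, we can measure how far each neighborhood lies from downtown. This could be one of the advantages of each area that stakeholders weigh. Below is a minimal sketch using the haversine great-circle formula, assuming a spherical Earth with a mean radius of 6371 km. The function name is ours; `geopy.distance.geodesic` from the already-installed geopy is a more precise alternative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


center = (-23.5506507, -46.6333824)  # Nominatim result above
aricanduva = (-23.56771, -46.51025)  # arcgis result for Aricanduva
print(round(haversine_km(*center, *aricanduva), 1))
```

For Aricanduva this gives about 12.7 km. Applied row by row to `sp_data`, it would add a distance-to-center column.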


In [10]:
# create map of São Paulo using latitude and longitude values
map_sp = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(sp_data['Latitude'], sp_data['Longitude'], sp_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sp)
    
map_sp

### Foursquare

Now that we have the locations of the neighborhoods, let's use the Foursquare API to get information about the venues in each neighborhood.

In [11]:
# Replace with your own Foursquare credentials; never publish real keys in a notebook
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
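
The venue requests pass six query parameters: `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit`. As a sketch, the same query string can be assembled with `urllib.parse.urlencode` from the standard library instead of `str.format`. This percent-encodes every value, so the comma in `ll` becomes `%2C`, which is the standard encoding. `explore_url` is a hypothetical helper name, and we have not checked its output against the live API.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a venues/explore URL with the parameters used in this notebook."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude,longitude of the search center
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues returned
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


print(explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605", -23.56771, -46.51025))
```

`requests.get(EXPLORE_ENDPOINT, params=params)` performs the same encoding, so the dictionary could also be passed directly to requests.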

In [12]:
# Function to get the venues from the neighborhoods

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
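
`getNearbyVenues` reads `v['venue']['categories'][0]` directly, which would raise an `IndexError` if a venue were returned without any category. Below is a minimal, more defensive sketch of the same flattening step. It is tested on a hand-written response containing only the fields the function reads. The first venue is Burlina Pet Shop, as returned for Aricanduva; the second venue is hypothetical. `parse_explore_items` is our own name.

```python
def parse_explore_items(name, lat, lng, response_json):
    """Flatten a venues/explore response into tuples shaped like the rows
    getNearbyVenues builds. Venues without categories get None instead of
    raising IndexError."""
    items = response_json["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


sample = {  # hand-written response with only the fields used above
    "response": {"groups": [{"items": [
        {"venue": {"name": "Burlina Pet Shop",
                   "location": {"lat": -23.567432, "lng": -46.506863},
                   "categories": [{"name": "Pet Store"}]}},
        {"venue": {"name": "Hypothetical venue",  # no category, for illustration
                   "location": {"lat": -23.5677, "lng": -46.5102},
                   "categories": []}},
    ]}]}
}
for row in parse_explore_items("Aricanduva", -23.56771, -46.51025, sample):
    print(row)
```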


In [13]:
sp_venues=getNearbyVenues(names=sp_data['Neighborhood'],latitudes=sp_data['Latitude'],longitudes=sp_data['Longitude'])
sp_venues.head()

Aricanduva
Carrão
Vila Formosa
Butantã
Morumbi
Raposo Tavares
Rio Pequeno
Vila Sônia
Campo Limpo
Capão Redondo
Vila Andrade
Cidade Dutra
Grajaú
Socorro
Cachoeirinha
Casa Verde
Limão
Cidade Ademar
Pedreira
Cidade Tiradentes
Ermelino Matarazzo
Ponte Rasa
Brasilândia
Freguesia do Ó
Lajeado
Guaianases
Cursino
Ipiranga
Sacomã
Itaim Paulista
Vila Curuçá
Cidade Líder
Itaquera
José Bonifácio
Parque do Carmo
Jabaquara
Jaçanã
Tremembé
Barra Funda
Jaguara
Jaguaré
Lapa
Perdizes
Vila Leopoldina
Jardim Ângela
Jardim São Luís
Água Rasa
Belém
Brás
Mooca
Pari
Tatuapé
Marsilac
Parelheiros
Artur Alvim
Cangaíba
Penha
Vila Matilde
Anhanguera
Perus
Alto de Pinheiros
Itaim Bibi
Jardim Paulista
Pinheiros
Jaraguá
Pirituba
São Domingos
Mandaqui
Santana
Tucuruvi
Campo Belo
Campo Grande
Santo Amaro
Iguatemi
São Rafael
São Mateus
São Miguel
Jardim Helena
Vila Jacuí
Sapopemba
Bela Vista
Bom Retiro
Cambuci
Consolação
Liberdade
República
Santa Cecília
Sé
Vila Guilherme
Vila Maria
Vila Medeiros
Moema
Saúde
Vila Marian

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Aricanduva | -23.56771 | -46.51025 | O Pasteleiro | -23.568446 | -46.509513 | Food & Drink Shop |
| 1 | Aricanduva | -23.56771 | -46.51025 | Sodiê Doces | -23.569948 | -46.508913 | Dessert Shop |
| 2 | Aricanduva | -23.56771 | -46.51025 | Burlina Pet Shop | -23.567432 | -46.506863 | Pet Store |
| 3 | Aricanduva | -23.56771 | -46.51025 | Pães e Doces Rio das Pedras | -23.566978 | -46.510838 | Bakery |
| 4 | Aricanduva | -23.56771 | -46.51025 | X Personal Studio | -23.56821 | -46.512536 | Gym / Fitness Center |
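
Since the number of existing pet stores is one of our two decision factors, the key figure per neighborhood is how many venues fall in the `Pet Store` category. Below is a minimal sketch using the five rows displayed above and `collections.Counter`.

```python
from collections import Counter

# (Neighborhood, Venue, Venue Category) for the five rows displayed above
venues = [
    ("Aricanduva", "O Pasteleiro", "Food & Drink Shop"),
    ("Aricanduva", "Sodiê Doces", "Dessert Shop"),
    ("Aricanduva", "Burlina Pet Shop", "Pet Store"),
    ("Aricanduva", "Pães e Doces Rio das Pedras", "Bakery"),
    ("Aricanduva", "X Personal Studio", "Gym / Fitness Center"),
]

# count only venues whose category is exactly "Pet Store"
pet_store_counts = Counter(hood for hood, _, category in venues if category == "Pet Store")
print(pet_store_counts)
```

On the full dataframe, the pandas equivalent would be `sp_venues[sp_venues['Venue Category'] == 'Pet Store'].groupby('Neighborhood').size().reindex(sp_data['Neighborhood'], fill_value=0)`. The `reindex` keeps neighborhoods with no pet store as explicit zeros. Related categories such as `Pet Café` and `Veterinarian` are separate Foursquare categories and would need to be added explicitly if they should count as competitors.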


In [14]:
# getting the number of venues in each neighborhood
sp_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Alto de Pinheiros | 18 | 18 | 18 | 18 | 18 | 18 |
| Anhanguera | 3 | 3 | 3 | 3 | 3 | 3 |
| Aricanduva | 11 | 11 | 11 | 11 | 11 | 11 |
| Artur Alvim | 8 | 8 | 8 | 8 | 8 | 8 |
| Barra Funda | 86 | 86 | 86 | 86 | 86 | 86 |
| Bela Vista | 65 | 65 | 65 | 65 | 65 | 65 |
| Belém | 26 | 26 | 26 | 26 | 26 | 26 |
| Bom Retiro | 37 | 37 | 37 | 37 | 37 | 37 |
| Brasilândia | 24 | 24 | 24 | 24 | 24 | 24 |
| Brás | 42 | 42 | 42 | 42 | 42 | 42 |


In [15]:
print('There are {} unique categories.'.format(sp_venues['Venue Category'].nunique()))

There are 285 unique categories.


In [16]:
# one hot encoding
sp_onehot = pd.get_dummies(sp_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int) # 0/1 integers; pandas >= 2.0 returns booleans by default

# add neighborhood column back to dataframe
sp_onehot['Neighborhood'] = sp_venues['Neighborhood'] 

sp_onehot.head()

|   | Bakery | Dessert Shop | Food & Drink Shop | Gym / Fitness Center | Pet Store | Neighborhood |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 0 | Aricanduva |
| 1 | 0 | 1 | 0 | 0 | 0 | Aricanduva |
| 2 | 0 | 0 | 0 | 0 | 1 | Aricanduva |
| 3 | 1 | 0 | 0 | 0 | 0 | Aricanduva |
| 4 | 0 | 0 | 0 | 1 | 0 | Aricanduva |

(5 rows × 286 columns: one column per venue category, from Acai House to Yoga Studio, plus Neighborhood. Only the five category columns that are non-zero in these rows are shown; the other 280 category columns are all 0.)
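
One-hot encoding followed by a per-neighborhood mean produces, for every category, the share of that neighborhood's venues belonging to it. Below is a minimal pure-Python sketch of the same computation. The function name and the toy rows are ours. Categories a neighborhood lacks are simply absent here, whereas the dataframe stores them as 0.

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """rows: iterable of (neighborhood, category) pairs, one per venue.

    Returns {neighborhood: {category: share of that neighborhood's venues}},
    the same numbers as one-hot encoding plus a per-neighborhood mean."""
    by_hood = defaultdict(Counter)
    for hood, category in rows:
        by_hood[hood][category] += 1
    result = {}
    for hood, counts in by_hood.items():
        total = sum(counts.values())
        result[hood] = {cat: n / total for cat, n in counts.items()}
    return result


toy = [("A", "Bakery"), ("A", "Bakery"), ("A", "Pet Store"), ("B", "Bar")]
print(category_frequencies(toy))
```

For example, each of Aricanduva's 11 venues contributes 1/11 ≈ 0.090909 to its category, which is the value that appears in the grouped table for that neighborhood.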


In [17]:
# group rows by neighborhood and take the mean of each category column (the share of each category among the neighborhood's venues)
sp_grouped = sp_onehot.groupby('Neighborhood').mean().reset_index()
sp_grouped

Unnamed: 0,Neighborhood,Acai House,Accessories Store,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Art Studio,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Auto Workshop,Automotive Shop,BBQ Joint,Baby Store,Bagel Shop,Baiano Restaurant,Bakery,Bank,Bar,Baseball Stadium,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bistro,Board Shop,Bookstore,Borek Place,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Camera Store,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Churrascaria,Circus,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,College Quad,College Theater,Comedy Club,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Credit Union,Creperie,Cultural Center,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Dive Shop,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Empada House,Empanada Restaurant,Escape Room,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Go Kart Track,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Range,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Hardware Store,Health & Beauty Service,Health Food Store,Heliport,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotpot Restaurant,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irish Pub,Israeli 
Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Laundromat,Leather Goods Store,Lebanese Restaurant,Lingerie Store,Liquor Store,Lottery Retailer,Lounge,Market,Martial Arts School,Massage Studio,Mattress Store,Mediterranean Restaurant,Men's Store,Mental Health Office,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Miscellaneous Shop,Mobile Phone Shop,Molecular Gastronomy Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music School,Music Store,Music Venue,Nail Salon,Nightclub,Northeastern Brazilian Restaurant,Northern Brazilian Restaurant,Office,Optical Shop,Organic Grocery,Outdoors & Recreation,Outlet Store,Paella Restaurant,Paintball Field,Paper / Office Supplies Store,Park,Pastelaria,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pet Café,Pet Store,Pharmacy,Piadineria,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool Hall,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Recording Studio,Residential Building (Apartment / Condo),Rest Area,Restaurant,Road,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South American Restaurant,Southeastern Brazilian Restaurant,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Stationery Store,Steakhouse,Student Center,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tapiocaria,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Toy / Game Store,Track,Trail,Train Station,Travel Agency,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan 
Restaurant,Veterinarian,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Alto de Pinheiros,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Anhanguera,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Aricanduva,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Artur Alvim,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Barra Funda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.011628,0.0,0.0,0.023256,0.0,0.0,0.011628,0.011628,0.0,0.0,0.0,0.0,0.034884,0.0,0.081395,0.0,0.034884,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034884,0.0,0.023256,0.023256,0.011628,0.011628,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.034884,0.0,0.0,0.0,0.0,0.0,0.011628,0.011628,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.011628,0.011628,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.011628,0.034884,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.023256,0.0,0.0,0.023256,0.0,0.081395,0.011628,0.0,0.0,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.0,0.034884,0.0,0.0,0.011628,0.0,0.011628,0.0,0.0,0.0,0.0,0.0,0.081395,0.0,0.0,0.0,0.0,0.0,0.0,0.034884,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011628,0.011628,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bela Vista,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061538,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.030769,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046154,0.030769,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.076923,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.015385,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.015385,0.0,0.0
6,Belém,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.115385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bom Retiro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.054054,0.0,0.027027,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.081081,0.0
8,Brasilândia,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.041667,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Brás,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.02381,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.071429,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.02381,0.0,0.047619,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.02381,0.0


In [18]:
# getting the top 5 venues of each neighborhood
num_top_venues = 5

for hood in sp_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sp_grouped[sp_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alto de Pinheiros----
               venue  freq
0              Plaza  0.17
1  Convenience Store  0.17
2         Restaurant  0.06
3               Café  0.06
4          Bookstore  0.06


----Anhanguera----
                  venue  freq
0  Brazilian Restaurant  0.33
1                  Lake  0.33
2            Restaurant  0.33
3                Office  0.00
4       Paintball Field  0.00


----Aricanduva----
                  venue  freq
0  Brazilian Restaurant  0.09
1   Arts & Crafts Store  0.09
2             Pet Store  0.09
3        Farmers Market  0.09
4  Fast Food Restaurant  0.09


----Artur Alvim----
            venue  freq
0          Bakery  0.25
1        Pharmacy  0.25
2  Farmers Market  0.12
3     Pizza Place  0.12
4     Flower Shop  0.12


----Barra Funda----
        venue  freq
0  Restaurant  0.08
1   Nightclub  0.08
2         Bar  0.08
3       Plaza  0.03
4      Bakery  0.03


----Bela Vista----
                venue  freq
0         Pizza Place  0.08
1                 Bar  0.

                  venue  freq
0                 Plaza  0.06
1  Fast Food Restaurant  0.06
2        Ice Cream Shop  0.06
3  Gym / Fitness Center  0.06
4                Bakery  0.06


----José Bonifácio----
                  venue  freq
0  Brazilian Restaurant  0.12
1  Gym / Fitness Center  0.12
2           Supermarket  0.12
3         Grocery Store  0.06
4                Bakery  0.06


----Lajeado----
            venue  freq
0          Bakery   0.4
1  Ice Cream Shop   0.4
2     Pizza Place   0.2
3      Acai House   0.0
4          Office   0.0


----Lapa----
                  venue  freq
0                Bakery  0.17
1         Grocery Store  0.08
2         Women's Store  0.08
3   Martial Arts School  0.08
4  Brazilian Restaurant  0.08


----Liberdade----
                 venue  freq
0  Japanese Restaurant  0.14
1        Grocery Store  0.05
2   Chinese Restaurant  0.05
3             Sake Bar  0.04
4     Ramen Restaurant  0.03


----Limão----
            venue  freq
0      Restaurant  0.14


4          Dessert Shop  0.08


----Vila Sônia----
                  venue  freq
0           Pizza Place  0.20
1                Bakery  0.08
2  Gym / Fitness Center  0.08
3        Ice Cream Shop  0.08
4             Pet Store  0.08


----Água Rasa----
                    venue  freq
0                  Bakery  0.08
1  Furniture / Home Store  0.08
2          Farmers Market  0.08
3    Brazilian Restaurant  0.08
4                Pharmacy  0.04
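The transpose-and-sort loop above can be written more compactly. The sketch below is an alternative, not the notebook's own method. It assumes a frame shaped like `sp_grouped`: a `Neighborhood` column plus one numeric frequency column per venue category. It uses `Series.nlargest`, which keeps the earliest column when values tie. The helper name `top_venues` and the toy values are hypothetical.

```python
import pandas as pd

def top_venues(grouped: pd.DataFrame, n: int = 5) -> dict:
    """Map each neighborhood to its n most frequent venue categories.

    Assumes `grouped` has a 'Neighborhood' column plus one numeric
    frequency column per venue category, like sp_grouped.
    """
    freqs = grouped.set_index('Neighborhood')
    # nlargest keeps the earliest column on ties, so the order is deterministic
    return {hood: row.nlargest(n).index.tolist() for hood, row in freqs.iterrows()}

# hypothetical toy frequencies for two neighborhoods
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Bakery': [0.50, 0.10],
    'Bar': [0.25, 0.60],
    'Pet Store': [0.25, 0.30],
})
print(top_venues(toy, n=2))
```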




In [19]:
# Function to get the most common venues in a neighborhood
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [20]:
# getting the top 10 venues in each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # 4th, 5th, ... have no entry in `indicators`
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sp_grouped['Neighborhood']

for ind in np.arange(sp_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sp_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Alto de Pinheiros,Plaza,Convenience Store,Trail,Tennis Court,Bar,Supermarket,Fast Food Restaurant,Restaurant,Market,Café
1,Anhanguera,Restaurant,Brazilian Restaurant,Lake,Yoga Studio,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market
2,Aricanduva,Arts & Crafts Store,Food & Drink Shop,Bakery,Brazilian Restaurant,Bank,Fast Food Restaurant,Grocery Store,Farmers Market,Gym / Fitness Center,Pet Store
3,Artur Alvim,Bakery,Pharmacy,Supermarket,Pizza Place,Farmers Market,Flower Shop,Yoga Studio,Farm,Fast Food Restaurant,Fish Market
4,Barra Funda,Bar,Nightclub,Restaurant,Fast Food Restaurant,Beer Bar,Martial Arts School,Bakery,Sandwich Place,Plaza,Brazilian Restaurant
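The `indicators` list with its try/except fallback works for ten columns. It would mislabel 21, 22 and 23 as "21th", "22th" and "23th" if more venues were requested. Below is a minimal sketch of a general ordinal-suffix helper. The name `ordinal` is hypothetical, and the notebook does not use it.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

# same column names as the cell above, for any number of top venues
num_top_venues = 10
columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)]
print(columns[1], columns[-1])
```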


In [21]:
#Function to get pet stores using Foursquare
def getPetStore(names, latitudes, longitudes,category, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            category,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
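`getPetStore` indexes the JSON response directly. It fails with an error if a response has no groups, or if a venue has an empty category list. The sketch below separates the parsing step so it can be checked offline. The payload shape (`response` → `groups[0]` → `items` → `venue`) is assumed from the indexing used in `getPetStore` and is not taken from Foursquare's documentation. The helper name `parse_explore_items` is hypothetical. Skipping uncategorized venues is a design choice of this sketch.

```python
def parse_explore_items(payload: dict, name: str, lat: float, lng: float) -> list:
    """Flatten an 'explore' JSON payload into the row tuples used by getPetStore.

    Shape assumed from getPetStore: payload['response']['groups'][0]['items'],
    each item holding a 'venue' with name, location and categories.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        if not venue.get('categories'):
            continue  # no category to report, so skip instead of raising IndexError
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     venue['categories'][0]['name']))
    return rows

# toy payload built from the first row of the output below
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Burlina Pet Shop',
               'location': {'lat': -23.567432, 'lng': -46.506863},
               'categories': [{'name': 'Pet Store'}]}},
    {'venue': {'name': 'Uncategorized place',
               'location': {'lat': 0.0, 'lng': 0.0},
               'categories': []}},
]}]}}
print(parse_explore_items(sample, 'Aricanduva', -23.56771, -46.51025))
```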

In [22]:
# category of pet store:
category='4bf58dd8d48988d100951735'
# getting pet data:
pet_data=getPetStore(names=sp_data['Neighborhood'],latitudes=sp_data['Latitude'],longitudes=sp_data['Longitude'],category=category)
pet_data.head()

Aricanduva
Carrão
Vila Formosa
Butantã
Morumbi
Raposo Tavares
Rio Pequeno
Vila Sônia
Campo Limpo
Capão Redondo
Vila Andrade
Cidade Dutra
Grajaú
Socorro
Cachoeirinha
Casa Verde
Limão
Cidade Ademar
Pedreira
Cidade Tiradentes
Ermelino Matarazzo
Ponte Rasa
Brasilândia
Freguesia do Ó
Lajeado
Guaianases
Cursino
Ipiranga
Sacomã
Itaim Paulista
Vila Curuçá
Cidade Líder
Itaquera
José Bonifácio
Parque do Carmo
Jabaquara
Jaçanã
Tremembé
Barra Funda
Jaguara
Jaguaré
Lapa
Perdizes
Vila Leopoldina
Jardim Ângela
Jardim São Luís
Água Rasa
Belém
Brás
Mooca
Pari
Tatuapé
Marsilac
Parelheiros
Artur Alvim
Cangaíba
Penha
Vila Matilde
Anhanguera
Perus
Alto de Pinheiros
Itaim Bibi
Jardim Paulista
Pinheiros
Jaraguá
Pirituba
São Domingos
Mandaqui
Santana
Tucuruvi
Campo Belo
Campo Grande
Santo Amaro
Iguatemi
São Rafael
São Mateus
São Miguel
Jardim Helena
Vila Jacuí
Sapopemba
Bela Vista
Bom Retiro
Cambuci
Consolação
Liberdade
República
Santa Cecília
Sé
Vila Guilherme
Vila Maria
Vila Medeiros
Moema
Saúde
Vila Marian

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Aricanduva,-23.56771,-46.51025,Burlina Pet Shop,-23.567432,-46.506863,Pet Store
1,Aricanduva,-23.56771,-46.51025,Ao Passaredo Pet Shop,-23.565888,-46.511883,Pet Store
2,Aricanduva,-23.56771,-46.51025,Animal e Companhia,-23.566862,-46.507447,Pet Store
3,Aricanduva,-23.56771,-46.51025,Clínica Veterinária Animais & cia,-23.563318,-46.509354,Pet Store
4,Carrão,-23.54798,-46.53885,BB PetShop,-23.549625,-46.540639,Pet Store


In [94]:
pet_data.shape

(239, 7)

In [95]:
# create map of São Paulo using latitude and longitude values
map_sp = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(pet_data['Venue Latitude'], pet_data['Venue Longitude'], pet_data['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_sp)  
    
map_sp

## Methodology

In this project we will focus on detecting neighborhoods of São Paulo that have low pet store density and high population, assuming that the pet population of a neighborhood is proportional to its human population.

In the first step, we collected the required **data: the location and venues of each neighborhood, and the pet store venues in each neighborhood** (according to Foursquare categorization).

The second step of our analysis will be to cluster the neighborhoods by the types of venues they contain, using **k-means clustering**. We will then select the cluster with the most promising areas: the highest population, proximity to the city center and the highest pet store density.

In the third and final step, we will select the most promising neighborhoods within that cluster, taking into consideration their distance to the city center, the number of pet stores (1 to 3 at most) and the population.

## Analysis

### Clustering Neighborhoods

In [23]:
# set number of clusters
kclusters = 5

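# k-means needs only the numeric venue frequencies
# (a positional axis argument to drop is no longer accepted in pandas 2.0)
sp_grouped_clustering = sp_grouped.drop(columns='Neighborhood')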

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sp_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20]

array([3, 1, 0, 2, 0, 0, 3, 0, 0, 1, 0, 1, 1, 0, 3, 0, 3, 2, 0, 0])
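To show what `KMeans(...).fit` does with these venue-frequency vectors, here is a minimal NumPy sketch of Lloyd's algorithm. Each step assigns every row to its nearest centroid, then moves each centroid to the mean of its rows, and the two steps repeat until the centroids stop moving. This is an illustration only, not scikit-learn's implementation, which uses k-means++ seeding among other refinements. The `init_idx` parameter is a hypothetical, deterministic stand-in for that seeding, and the toy vectors are invented.

```python
import numpy as np

def lloyd_kmeans(X, init_idx, n_iter=100):
    """Cluster the rows of X with Lloyd's algorithm.

    init_idx: row indices used as starting centroids (hypothetical,
    deterministic replacement for k-means++ seeding).
    Returns (labels, centroids).
    """
    X = np.asarray(X, dtype=float)
    centroids = X[list(init_idx)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # squared Euclidean distance from every row to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # move each centroid to the mean of its rows (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(len(centroids))
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

# toy frequencies for (Bakery, Bar, Pet Store) in four hypothetical neighborhoods
X = [[0.6, 0.3, 0.1],
     [0.5, 0.4, 0.1],
     [0.1, 0.2, 0.7],
     [0.0, 0.3, 0.7]]
labels, centroids = lloyd_kmeans(X, init_idx=(0, 2))
print(labels, centroids)
```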

In [24]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [25]:
sp_merged = sp_data.copy() # work on a copy so sp_data itself is left unchanged

In [26]:
# examine the dataframes before merge
print(sp_merged.shape)
print(neighborhoods_venues_sorted.shape)

(96, 3)
(95, 12)


In [27]:
# join the cluster labels and top venues onto sp_data, which holds the latitude/longitude of each neighborhood
sp_merged = sp_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
sp_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aricanduva,-23.56771,-46.51025,0.0,Arts & Crafts Store,Food & Drink Shop,Bakery,Brazilian Restaurant,Bank,Fast Food Restaurant,Grocery Store,Farmers Market,Gym / Fitness Center,Pet Store
1,Carrão,-23.54798,-46.53885,0.0,Pharmacy,BBQ Joint,Dessert Shop,Restaurant,Burger Joint,Café,Supermarket,Steakhouse,Furniture / Home Store,Brazilian Restaurant
2,Vila Formosa,-23.56642,-46.5394,0.0,Clothing Store,Farmers Market,Bakery,Health & Beauty Service,Food & Drink Shop,Food,Furniture / Home Store,Northeastern Brazilian Restaurant,Scenic Lookout,Chocolate Shop
3,Butantã,-23.57089,-46.70968,0.0,Brazilian Restaurant,Bar,Bakery,Paper / Office Supplies Store,Pharmacy,Pizza Place,Martial Arts School,Food Truck,Dessert Shop,Hardware Store
4,Morumbi,-23.601,-46.71551,3.0,Café,Snack Place,Coffee Shop,Restaurant,Soccer Stadium,Sports Bar,Gym / Fitness Center,Athletics & Sports,Track,Stadium


In [28]:
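# drop neighborhoods without venue data (their cluster label is NaN after the join)
sp_merged.dropna(axis=0,inplace=True)
# the NaNs had turned the labels into floats; restore integers so they can index the color list
sp_merged['Cluster Labels']=sp_merged['Cluster Labels'].astype('int32')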
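The shapes printed before the join differ by one (96 vs 95). Presumably one neighborhood received no venues from Foursquare, so it is missing from `neighborhoods_venues_sorted`. The left join then leaves NaN in its columns, which is why the labels showed up as floats and why `dropna` is needed. The sketch below lists such neighborhoods by using `DataFrame.merge` with `indicator=True`. The helper name `unmatched_neighborhoods` and the toy names are hypothetical.

```python
import pandas as pd

def unmatched_neighborhoods(coords: pd.DataFrame, venue_summary: pd.DataFrame) -> list:
    """Neighborhoods present in coords but absent from venue_summary."""
    merged = coords.merge(venue_summary[['Neighborhood']], on='Neighborhood',
                          how='left', indicator=True)
    # '_merge' is 'left_only' for rows with no match on the right-hand side
    return merged.loc[merged['_merge'] == 'left_only', 'Neighborhood'].tolist()

# hypothetical toy frames standing in for sp_data and neighborhoods_venues_sorted
coords = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                       'Latitude': [-23.5, -23.6, -23.7],
                       'Longitude': [-46.5, -46.6, -46.7]})
summary = pd.DataFrame({'Neighborhood': ['A', 'B'],
                        '1st Most Common Venue': ['Bakery', 'Bar']})
print(unmatched_neighborhoods(coords, summary))
```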

In [29]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(sp_merged['Latitude'], sp_merged['Longitude'], sp_merged['Neighborhood'], sp_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

In [30]:
sp_merged.loc[sp_merged['Cluster Labels'] == 0, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aricanduva,Arts & Crafts Store,Food & Drink Shop,Bakery,Brazilian Restaurant,Bank,Fast Food Restaurant,Grocery Store,Farmers Market,Gym / Fitness Center,Pet Store
1,Carrão,Pharmacy,BBQ Joint,Dessert Shop,Restaurant,Burger Joint,Café,Supermarket,Steakhouse,Furniture / Home Store,Brazilian Restaurant
2,Vila Formosa,Clothing Store,Farmers Market,Bakery,Health & Beauty Service,Food & Drink Shop,Food,Furniture / Home Store,Northeastern Brazilian Restaurant,Scenic Lookout,Chocolate Shop
3,Butantã,Brazilian Restaurant,Bar,Bakery,Paper / Office Supplies Store,Pharmacy,Pizza Place,Martial Arts School,Food Truck,Dessert Shop,Hardware Store
7,Vila Sônia,Pizza Place,Pet Store,Grocery Store,Gym / Fitness Center,Bakery,Ice Cream Shop,Bar,Burger Joint,Food Truck,Chinese Restaurant
8,Campo Limpo,Pizza Place,Gym,Plaza,Auto Workshop,Paintball Field,Nightclub,Chinese Restaurant,Food,Food & Drink Shop,Food Truck
10,Vila Andrade,Chocolate Shop,Dessert Shop,Pizza Place,Shopping Mall,Middle Eastern Restaurant,Market,Bakery,Restaurant,Movie Theater,Multiplex
11,Cidade Dutra,Dessert Shop,Bar,Pharmacy,Brazilian Restaurant,Cosmetics Shop,Snack Place,Sushi Restaurant,German Restaurant,Steakhouse,Burger Joint
13,Socorro,Pizza Place,Department Store,Comfort Food Restaurant,Soccer Field,Steakhouse,Grocery Store,Restaurant,BBQ Joint,Breakfast Spot,Dessert Shop
15,Casa Verde,Gym / Fitness Center,Bar,Restaurant,Burger Joint,Gym,Dance Studio,Bookstore,Sushi Restaurant,Music Venue,Martial Arts School


In [31]:
sp_merged.loc[sp_merged['Cluster Labels'] == 1, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Cachoeirinha,Brazilian Restaurant,Department Store,Pharmacy,Chocolate Shop,Athletics & Sports,Dessert Shop,Bus Stop,Furniture / Home Store,Café,Gift Shop
25,Guaianases,Ice Cream Shop,Arts & Crafts Store,Fried Chicken Joint,Food Truck,Cheese Shop,Sandwich Place,Gym Pool,Brazilian Restaurant,Clothing Store,Auto Garage
33,José Bonifácio,Brazilian Restaurant,Supermarket,Gym / Fitness Center,Restaurant,Bakery,Grocery Store,Warehouse Store,Ice Cream Shop,Bagel Shop,Outlet Store
35,Jabaquara,Brazilian Restaurant,Café,Burger Joint,Gym / Fitness Center,Chocolate Shop,Japanese Restaurant,Italian Restaurant,Park,Fried Chicken Joint,Restaurant
40,Jaguaré,Brazilian Restaurant,Music Venue,Grocery Store,Food Truck,Burger Joint,Fast Food Restaurant,Supermarket,Department Store,Plaza,Snack Place
43,Vila Leopoldina,Brazilian Restaurant,Food Truck,Tennis Court,Fruit & Vegetable Store,Soup Place,Restaurant,Buffet,Outlet Store,Flower Shop,Art Studio
48,Brás,Brazilian Restaurant,Italian Restaurant,Furniture / Home Store,Restaurant,Clothing Store,Shopping Mall,Pizza Place,Plaza,Snack Place,Buffet
50,Pari,Restaurant,Middle Eastern Restaurant,Brazilian Restaurant,Bar,Snack Place,Kids Store,South American Restaurant,Café,Chinese Restaurant,Chocolate Shop
56,Penha,Brazilian Restaurant,Gym / Fitness Center,Pharmacy,Bakery,Grocery Store,Ice Cream Shop,Department Store,Restaurant,Paper / Office Supplies Store,Coffee Shop
58,Anhanguera,Restaurant,Brazilian Restaurant,Lake,Yoga Studio,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market


In [32]:
sp_merged.loc[sp_merged['Cluster Labels'] == 2, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Capão Redondo,Pizza Place,Bakery,Gym,Health & Beauty Service,Men's Store,Farmers Market,Sandwich Place,Soup Place,Brazilian Restaurant,Convenience Store
17,Cidade Ademar,Gymnastics Gym,Pizza Place,Bakery,Music Venue,Yoga Studio,Flea Market,Farm,Farmers Market,Fast Food Restaurant,Fish Market
18,Pedreira,Pizza Place,Department Store,Bakery,Park,Soccer Field,Yoga Studio,Flea Market,Farm,Farmers Market,Fast Food Restaurant
24,Lajeado,Ice Cream Shop,Bakery,Pizza Place,Yoga Studio,Flower Shop,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market
30,Vila Curuçá,Pizza Place,Plaza,Supermarket,Bakery,Soccer Field,Gym / Fitness Center,Gaming Cafe,Gymnastics Gym,Event Space,Falafel Restaurant
39,Jaguara,Bakery,Spa,Gym / Fitness Center,Convenience Store,Food & Drink Shop,Sushi Restaurant,Pizza Place,Japanese Restaurant,Hostel,Fast Food Restaurant
54,Artur Alvim,Bakery,Pharmacy,Supermarket,Pizza Place,Farmers Market,Flower Shop,Yoga Studio,Farm,Fast Food Restaurant,Fish Market
64,Jaraguá,Gym,Bakery,Grocery Store,Food,Yoga Studio,Flea Market,Farmers Market,Fast Food Restaurant,Fish Market,Flower Shop
65,Pirituba,Bakery,Historic Site,Yoga Studio,Event Space,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop
73,Iguatemi,Bakery,Steakhouse,Fast Food Restaurant,Pizza Place,Theme Park Ride / Attraction,Comfort Food Restaurant,Coffee Shop,Restaurant,Falafel Restaurant,Farm


In [33]:
sp_merged.loc[sp_merged['Cluster Labels'] == 3, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Morumbi,Café,Snack Place,Coffee Shop,Restaurant,Soccer Stadium,Sports Bar,Gym / Fitness Center,Athletics & Sports,Track,Stadium
5,Raposo Tavares,Restaurant,Cafeteria,Gym / Fitness Center,Cosmetics Shop,Pharmacy,Snack Place,Café,Chocolate Shop,Fast Food Restaurant,Pastry Shop
6,Rio Pequeno,Bakery,Chocolate Shop,Park,Plaza,Pharmacy,Pet Store,Food & Drink Shop,Furniture / Home Store,Brazilian Restaurant,Restaurant
12,Grajaú,Ice Cream Shop,Food Truck,Farm,Soccer Stadium,Bakery,Rock Club,Fish Market,Falafel Restaurant,Farmers Market,Fast Food Restaurant
19,Cidade Tiradentes,Snack Place,Theater,Bus Station,Bakery,Fast Food Restaurant,Flea Market,BBQ Joint,Gym / Fitness Center,Pizza Place,Food & Drink Shop
20,Ermelino Matarazzo,Grocery Store,Ice Cream Shop,Pharmacy,Bakery,Plaza,Thrift / Vintage Store,Gym,Sporting Goods Shop,Liquor Store,Arts & Entertainment
29,Itaim Paulista,Dessert Shop,Bowling Alley,Bakery,BBQ Joint,Brazilian Restaurant,Food Truck,Shipping Store,Fast Food Restaurant,Supermarket,Gym
31,Cidade Líder,Dessert Shop,Pharmacy,Soccer Stadium,Furniture / Home Store,Liquor Store,Market,Gift Shop,Food Truck,Gym / Fitness Center,Pastelaria
32,Itaquera,Fast Food Restaurant,Bakery,Souvlaki Shop,Train Station,Brazilian Restaurant,Bus Station,Seafood Restaurant,Sandwich Place,Market,Bar
36,Jaçanã,Plaza,Ice Cream Shop,Pharmacy,Gym / Fitness Center,Fast Food Restaurant,Bakery,Food Stand,Fried Chicken Joint,Brazilian Restaurant,Soccer Field


In [34]:
sp_merged.loc[sp_merged['Cluster Labels'] == 4, sp_merged.columns[[0] + list(range(4, sp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
53,Parelheiros,Athletics & Sports,Yoga Studio,Event Space,Farm,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food


### Examine Pet Stores in Neighborhoods

In [36]:
# add markers to map
for lat, lng, label in zip(pet_data['Venue Latitude'], pet_data['Venue Longitude'], pet_data['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_clusters)  
    
map_clusters

In [46]:
pet_grouped=pet_data.groupby('Neighborhood').count() # counting the number of pet stores per neighborhood
pet_grouped=pet_grouped.merge(sp_merged,on='Neighborhood') # adding the cluster labels
pet_grouped=pet_grouped[['Neighborhood','Cluster Labels','Venue']] # keeping the columns of interest
pet_grouped=pet_grouped.merge(population_data,on='Neighborhood') # adding the population
# convert Population from string to float; the source appears to write e.g. 89.622 with '.'
# as the thousands separator, so these values are effectively thousands of inhabitants
pet_grouped['Population']=pet_grouped['Population'].astype('float64')
# sum only the numeric columns (recent pandas would otherwise concatenate the Neighborhood strings)
pet_cluster=pet_grouped.groupby('Cluster Labels')[['Venue','Population']].sum()
pet_cluster

Cluster Labels,Venue,Population
0,163,4027.251
1,27,1212.261
2,7,1207.612
3,42,2081.285


### Cluster 0

Since Cluster 0 has the highest number of pet stores and the highest total population, we will continue this analysis with the neighborhoods of Cluster 0.

In [60]:
cluster_data = pet_grouped[pet_grouped['Cluster Labels']==0]
cluster_data

Unnamed: 0,Neighborhood,Cluster Labels,Venue,Population
0,Aricanduva,0,4,89.622
1,Barra Funda,0,3,14.383
2,Bela Vista,0,4,69.46
4,Bom Retiro,0,1,33.892
5,Butantã,0,1,54.196
8,Campo Belo,0,13,65.752
9,Campo Limpo,0,1,211.361
12,Carrão,0,5,83.281
13,Casa Verde,0,4,85.624
14,Cidade Dutra,0,3,196.36


To find the neighborhoods nearest to the city center, we calculate each neighborhood's great-circle distance (in km) to the center with the haversine formula.

In [85]:
!pip install haversine
import haversine as hs

# coordinates of the São Paulo city center
SP=(-23.5506507, -46.6333824)

cluster_data=cluster_data.merge(sp_merged,on='Neighborhood') # add latitude/longitude
cluster_data=cluster_data[['Neighborhood','Latitude','Longitude']]

# great-circle distance to the center in km (the haversine package's default unit);
# rows are collected in a list because DataFrame.append was removed in pandas 2.0
rows=[]
for neighborhood,lat,lon in zip(cluster_data['Neighborhood'],cluster_data['Latitude'],cluster_data['Longitude']):
    d=hs.haversine((lat,lon),SP)
    rows.append({'Neighborhood':neighborhood,'Distance':d})
distance_data=pd.DataFrame(rows,columns=['Neighborhood','Distance'])

distance_data.head()



Unnamed: 0,Neighborhood,Distance
0,Aricanduva,12.693023
1,Barra Funda,3.781646
2,Bela Vista,1.594185
3,Bom Retiro,3.001169
4,Butantã,8.095764


We consider the nearest neighborhoods to be those within 6 km of the city center.

In [91]:
nearest_neighborhoods=pet_grouped[pet_grouped['Cluster Labels']==0]
nearest_neighborhoods=nearest_neighborhoods.merge(distance_data,on='Neighborhood')
nearest_neighborhoods=nearest_neighborhoods[nearest_neighborhoods['Distance']<=6]

nearest_neighborhoods

Unnamed: 0,Neighborhood,Cluster Labels,Venue,Population,Distance
1,Barra Funda,0,3,14.383,3.781646
2,Bela Vista,0,4,69.46,1.594185
3,Bom Retiro,0,1,33.892,3.001169
8,Casa Verde,0,4,85.624,4.938269
10,Consolação,0,9,57.365,2.472922
12,Ipiranga,0,5,106.865,5.252008
15,Jardim Paulista,0,7,88.692,3.695919
16,Liberdade,0,4,69.092,0.861862
20,Perdizes,0,5,111.161,4.128358
21,Pinheiros,0,11,65.364,5.506753


Finally, the top neighborhoods for a new pet store are those with fewer than 4 pet stores, sorted by population.

In [92]:
top_neighborhoods = nearest_neighborhoods[nearest_neighborhoods['Venue']<4]
top_neighborhoods=top_neighborhoods.sort_values('Population',ascending=False)
top_neighborhoods

Unnamed: 0,Neighborhood,Cluster Labels,Venue,Population,Distance
39,Água Rasa,0,2,84.963,5.422305
3,Bom Retiro,0,1,33.892,3.001169
29,Sé,0,3,23.651,0.238769
1,Barra Funda,0,3,14.383,3.781646
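The two filters (distance at most 6 km, fewer than four pet stores) and the population sort can be wrapped in one parameterized function. That makes it easy to see how the shortlist changes if either threshold is relaxed. The sketch below is a hypothetical helper, not part of the original notebook. It uses rows copied from the Cluster 0 outputs above. This is only a subset, so Água Rasa and Sé are absent, and Butantã is included to show the distance filter at work.

```python
import pandas as pd

def shortlist(df: pd.DataFrame, max_km: float = 6.0, max_stores: int = 3) -> pd.DataFrame:
    """Keep neighborhoods within max_km of the center with at most max_stores
    pet stores (i.e. fewer than four by default), most populous first."""
    mask = (df['Distance'] <= max_km) & (df['Venue'] <= max_stores)
    return df[mask].sort_values('Population', ascending=False)

# subset of the Cluster 0 rows shown above (Population in the notebook's units)
sample = pd.DataFrame({
    'Neighborhood': ['Barra Funda', 'Bela Vista', 'Bom Retiro', 'Butantã', 'Casa Verde', 'Liberdade'],
    'Venue': [3, 4, 1, 1, 4, 4],
    'Population': [14.383, 69.46, 33.892, 54.196, 85.624, 69.092],
    'Distance': [3.781646, 1.594185, 3.001169, 8.095764, 4.938269, 0.861862],
})
print(shortlist(sample)['Neighborhood'].tolist())
print(shortlist(sample, max_km=10)['Neighborhood'].tolist())
```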


## Results and Discussion

Our analysis shows that although there is a large number of pet stores in São Paulo, there are pockets of low pet store density fairly close to the city center. The highest concentration of pet stores was found in Cluster 0, so we focused our attention on its neighborhoods.

Within this narrower area of interest, we first kept the neighborhoods within 6 km of the city center and then those with fewer than four pet stores.

The result is four candidate neighborhoods for a new pet store: Água Rasa, Bom Retiro, Sé and Barra Funda. The purpose of this analysis was only to provide information on areas close to the São Paulo center that are not crowded with existing pet stores. There may well be a good reason for the small number of pet stores in any of these areas, one that would make them unsuitable for a new pet store regardless of the lack of competition. The recommended neighborhoods should therefore be considered only a starting point for a more detailed analysis, which could eventually result in a location that not only has no nearby competition but also meets all other relevant conditions.

## Conclusion

The purpose of this project was to identify São Paulo neighborhoods close to the city center with a low number of pet stores, in order to help stakeholders narrow down the search for an optimal location for a new pet store. By calculating the pet store distribution from Foursquare data, we first identified a cluster of neighborhoods that justified further analysis. Then, using distance and population data, we found the top neighborhoods that satisfy two basic requirements: proximity to the city center and few existing pet stores.

The final decision on the optimal pet store location will be made by stakeholders based on the specific characteristics of each neighborhood. These include, for example, the pet population in the area, the attractiveness of the neighborhood (e.g. proximity to parks), noise levels and proximity to major roads, real estate availability and prices, and the social and economic dynamics of the neighborhood.