<a href="https://colab.research.google.com/github/mourodrigo/dataScienceCapstone/blob/master/The_Battle_of_Neighborhoods_P2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in New York City</font></h1>

## Introduction

In this lab, you will learn how to convert addresses into their equivalent latitude and longitude values. Also, you will use the Foursquare API to explore neighborhoods in New York City. You will use the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. You will use the *k*-means clustering algorithm to complete this task. Finally, you will use the Folium library to visualize the neighborhoods in New York City and their emerging clusters.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Download and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in New York City</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    
</font>
</div>

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<a id='item1'></a>

## 1. Download and Explore Dataset

New York City has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough, as well as the latitude and longitude coordinates of each neighborhood.

Luckily, this dataset exists for free on the web. Feel free to try to find this dataset on your own, but here is the link to the dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

For your convenience, I downloaded the files and placed them on the server, so you can simply run a `wget` command and access the data. So let's go ahead and do that.

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Next, let's load the data.

In [0]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [0]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

#### Transform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [0]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Take a look at the empty dataframe to confirm that the columns are as intended.

In [6]:
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [0]:
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
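
Appending inside a loop copies the dataframe on every iteration, and `DataFrame.append` was removed in pandas 2.0. A minimal sketch of an alternative, assuming the same `properties`/`geometry` layout as `neighborhoods_data`, collects plain dictionaries first and builds the dataframe once. The `sample_features` list and the `features_to_frame` helper are illustrative names, not part of the lab.

```python
import pandas as pd

# Two hand-written records in the same shape as the entries of neighborhoods_data
# (coordinates are stored as [longitude, latitude]).
sample_features = [
    {"properties": {"borough": "Bronx", "name": "Wakefield"},
     "geometry": {"coordinates": [-73.847201, 40.894705]}},
    {"properties": {"borough": "Bronx", "name": "Co-op City"},
     "geometry": {"coordinates": [-73.829939, 40.874294]}},
]

def features_to_frame(features):
    """Collect one dict per feature, then build the dataframe in a single call."""
    rows = []
    for feature in features:
        lon, lat = feature["geometry"]["coordinates"][:2]
        rows.append({
            "Borough": feature["properties"]["borough"],
            "Neighborhood": feature["properties"]["name"],
            "Latitude": lat,
            "Longitude": lon,
        })
    return pd.DataFrame(rows, columns=["Borough", "Neighborhood", "Latitude", "Longitude"])

sample_df = features_to_frame(sample_features)
print(sample_df)
```

Calling `features_to_frame(neighborhoods_data)` should give the same four columns as the loop above.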

Quickly examine the resulting dataframe.

In [8]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


And make sure that the dataset has all 5 boroughs and 306 neighborhoods.

In [9]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


#### Use geopy library to get the latitude and longitude values of Florianópolis.

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>foursquare_explorer</em>, as shown below.

In [10]:
city_name = 'Florianopolis, SC'

geolocator = Nominatim(user_agent="foursquare_explorer")
location = geolocator.geocode(city_name)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Florianopolis are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Florianopolis are -27.5973002, -48.5496098.


#### Create a map of Florianópolis on which the neighborhoods will be superimposed.

In [11]:
# create map of Florianópolis using latitude and longitude values
city_map = folium.Map(location=[latitude, longitude], zoom_start=11)

city_map

#### Preparing the dataframe structure

Let's get the geographical coordinates of the neighborhoods of Florianópolis.

In [12]:
!pip install wikipedia

import wikipedia
import requests
from bs4 import BeautifulSoup as bs
import time
import numpy as np
import pandas as pd

Collecting wikipedia
  Downloading https://files.pythonhosted.org/packages/67/35/25e68fbc99e672127cc6fbb14b8ec1ba3dfef035bf1e4c90f78f24a80b7d/wikipedia-1.4.0.tar.gz
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-cp36-none-any.whl size=11686 sha256=b6a6ad39575f9f93c08a417442d197ead227f60c1e72ffd739d92c94cdb4a513
  Stored in directory: /root/.cache/pip/wheels/87/2a/18/4e471fd96d12114d16fe4a446d00c3b38fb9efcb744bd31f4a
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


In [13]:
wiki = requests.get('https://pt.wikipedia.org/wiki/Lista_de_distritos_e_bairros_de_Florian%C3%B3polis')
soup = bs(wiki.text, 'lxml') 
table = soup.findAll("table",class_="wikitable")[1]
neighborhood_names = []
for items in table.find_all("tr")[:-1]:
    data = [' '.join(item.text.split()) for item in items.find_all(['th','td'])]
    neighborhood_names.append(data[1])

neighborhood_names.remove(neighborhood_names[0])
neighborhood_names

['Centro',
 'Capoeiras',
 'Trindade',
 'Agronômica',
 'Saco dos Limões',
 'Coqueiros',
 'Monte Cristo',
 'Jardim Atlântico',
 'Itacorubi',
 'Costeira do Pirajubaé',
 'Capivari',
 'Tapera da Base',
 'Estreito',
 'Monte Verde',
 'Balneário',
 'São João do Rio Vermelho',
 'Canto',
 'Abraão',
 'Santa Mônica',
 'Lagoa',
 'Saco Grande',
 'Córrego Grande',
 'Canasvieiras',
 'Pantanal',
 'Coloninha',
 'Barra da Lagoa',
 'Carianos',
 'José Mendes',
 'Ingleses Centro',
 'João Paulo',
 'Campeche Leste',
 'Campeche Sul',
 'Rio Tavares Central',
 'Santinho',
 'Ponta das Canas',
 'Vargem do Bom Jesus',
 'Armação',
 'Cachoeira do Bom Jesus Leste',
 'Pântano do Sul',
 'Itaguaçu',
 'Jurere Leste',
 'Campeche Norte',
 'Vargem Grande',
 'Campeche Central',
 'Ressacada',
 'Morro das Pedras',
 'Alto Ribeirão Leste',
 'Alto Ribeirão',
 'Ribeirão da Ilha',
 'Santo Antônio',
 'Sambaqui',
 'Ingleses Sul',
 'Bom Abrigo',
 'Jurere Oeste',
 'Porto da Lagoa',
 'Cachoeira do Bom Jesus',
 'Rio Tavares do Norte',
 'P

In [14]:
#preparing neighborhoods dataframe
column_names = ['Name', 'Latitude', 'Longitude'] 
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Name,Latitude,Longitude


In [15]:
print("Looking up neighborhood latitude/longitude with Nominatim (geopy)...")

# create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="foursquare_explorer")

for name in neighborhood_names:
  location = geolocator.geocode(name+","+city_name)
  #print(location)
  if location is not None:
    latitude = location.latitude
    longitude = location.longitude
    #print('The geographical coordinates of {} are {}, {}.'.format(name, latitude, longitude))
    neighborhoods = neighborhoods.append({'Name': name,
                                            'Latitude':latitude,
                                            'Longitude':longitude}, ignore_index=True)
  
print("Done!")

Looking up neighborhood latitude/longitude with Nominatim (geopy)...
Done!
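
The public Nominatim service asks clients to stay at or below one request per second, and some names return no match at all. The sketch below shows one way to handle both, under the assumption that any callable returning an object with `latitude` and `longitude` (or `None`) can stand in for `geolocator.geocode`. The names `geocode_all` and `fake_geocode` are hypothetical helpers, and the fake lookup only exists so the example runs offline.

```python
import time
from types import SimpleNamespace

def geocode_all(names, geocode, suffix="", delay_seconds=1.0, sleep=time.sleep):
    """Geocode each name with the given callable, pausing between requests.

    Returns (found, missing): a list of row dicts and a list of unmatched names.
    """
    found, missing = [], []
    for i, name in enumerate(names):
        if i > 0:
            sleep(delay_seconds)  # pause between consecutive requests
        location = geocode(name + suffix)
        if location is None:
            missing.append(name)
        else:
            found.append({"Name": name,
                          "Latitude": location.latitude,
                          "Longitude": location.longitude})
    return found, missing

# Offline stand-in for geolocator.geocode (assumption: same return shape).
_FAKE_INDEX = {
    "Centro, Florianopolis, SC": SimpleNamespace(latitude=-27.592925, longitude=-48.55006),
}

def fake_geocode(query):
    return _FAKE_INDEX.get(query)

found, missing = geocode_all(["Centro", "Nowhere"], fake_geocode,
                             suffix=", Florianopolis, SC", delay_seconds=0)
print(found)
print(missing)
```

With geopy, `geolocator.geocode` would be passed as the `geocode` argument. Keeping the unmatched names in `missing` makes it visible which neighborhoods were silently dropped from the dataframe.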


Now we have the coordinates of the neighborhoods.

In [16]:
neighborhoods.head()

Unnamed: 0,Name,Latitude,Longitude
0,Centro,-27.592925,-48.55006
1,Capoeiras,-27.597333,-48.590008
2,Trindade,-27.589383,-48.5224
3,Agronômica,-27.578145,-48.535717
4,Saco dos Limões,-27.608268,-48.534343


In [17]:
# add markers to map
for lat, lng, label in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(city_map)  
    
city_map

In [18]:
#get list of beaches from wikipedia
wiki = requests.get('https://pt.wikipedia.org/wiki/Lista_de_praias_de_Florianópolis')
soup = bs(wiki.text, 'lxml') 
table = soup.findAll("li")

#filtering
beach_names=[]
for item in table:
      a = item.find("a")
      if a is not None:
        title = a.get('title')
        if title is not None and title.find("Prai") >= 0:
          if title.endswith(' (página não existe)'):
              newTitle = title[:-20]
              beach_names.append(newTitle)
          else:
              beach_names.append(title)

#removing the last unnecessary item which is still on the list
del beach_names[-1]
beach_names

['Praia de Caieira da Barra do Sul',
 'Praia do Ribeirão da Ilha',
 'Praia da Tapera',
 'Praia de Sambaqui',
 'Praia de Santo Antônio de Lisboa',
 'Praia do Cacupé',
 'Praia do Santinho',
 'Praia dos Ingleses',
 'Praia da Lagoinha',
 'Praia Brava (Florianópolis)',
 'Praia da Ponta das Canas',
 'Praia Cachoeira do Bom Jesus',
 'Praia de Canasvieiras',
 'Praia de Canajurê',
 'Praia de Jurerê',
 'Praia do Forte (Florianópolis)',
 'Praia da Daniela',
 'Praia do Pontal',
 'Praia da Joaquina',
 'Praia do Gravatá',
 'Praia Mole',
 'Praia da Galheta',
 'Prainha (Barra da Lagoa)',
 'Praia da Barra da Lagoa',
 'Praia da Lagoa da Conceição',
 'Praia de Moçambique',
 'Praia do Campeche',
 'Praia da Ilha do Campeche',
 'Praia do Morro das Pedras',
 'Praia do Caldeirão',
 'Praia da Armação do Pântano do Sul',
 'Praia do Matadeiro',
 'Praia da Lagoinha do Leste',
 'Praia do Pântano do Sul',
 'Praia dos Açores',
 'Praia da Solidão',
 'Praia do Saquinho',
 'Praia de Naufragados',
 'Praia de Bom Abrigo'
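
The filtering cell above trims the Portuguese "page does not exist" marker with a hard-coded slice of 20 characters. A small sketch of the same idea, where the slice length is derived from the suffix itself, is shown here. The helper names and the sample titles are made up for illustration.

```python
MISSING_PAGE_SUFFIX = " (página não existe)"

def clean_title(title, suffix=MISSING_PAGE_SUFFIX):
    """Drop the missing-page marker from a link title, if present."""
    if title.endswith(suffix):
        return title[:-len(suffix)]
    return title

def beach_titles(titles):
    """Keep titles that mention a beach ("Praia", "Prainha") and clean them."""
    return [clean_title(t) for t in titles if t is not None and "Prai" in t]

# Hypothetical link titles, including one without a title and one non-beach entry.
sample_titles = ["Praia do Gravatá (página não existe)", "Praia Mole", None, "Florianópolis"]
cleaned = beach_titles(sample_titles)
print(cleaned)
```

Deriving the length from the suffix keeps the slice correct if the marker text ever changes.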

In [19]:
#beach dataframe preparation
beaches = pd.DataFrame(columns=['Beach', 'Latitude', 'Longitude'])
beaches

Unnamed: 0,Beach,Latitude,Longitude


In [20]:
print("Looking up beach latitude/longitude with Nominatim (geopy)...")

# create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="foursquare_explorer")

for name in beach_names:
  location = geolocator.geocode(name+", "+city_name)
  #print(location)
  if location is not None:
    latitude = location.latitude
    longitude = location.longitude
    #print('The geographical coordinates of {} are {}, {}.'.format(name, latitude, longitude))
    beaches = beaches.append({'Beach': name,
                                            'Latitude':latitude,
                                            'Longitude':longitude}, ignore_index=True)
  
print("Done!")

Looking up beach latitude/longitude with Nominatim (geopy)...
Done!


As we did with the neighborhoods, let's add the beaches to the map.

In [21]:
# add markers to map
for lat, lng, label in zip(beaches['Latitude'], beaches['Longitude'], beaches['Beach']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(city_map)  
    
city_map

Before we start using the Foursquare API to explore the neighborhoods and segment them, let's find out which neighborhood is closest to the largest number of beaches.

#### Which neighborhood has the most beaches?

In [22]:
from scipy.spatial.distance import cdist

def closest_point(point, points):
    """ Find closest point from a list of points. """
    return points[cdist([point], points).argmin()]

def match_value(df, col1, x, col2):
    """ Match value x from col1 row to value in col2. """
    return df[df[col1] == x][col2].values[0]

df1 = neighborhoods
df2 = beaches

df1['point'] = [(x, y) for x,y in zip(df1['Latitude'], df1['Longitude'])]
df2['point'] = [(x, y) for x,y in zip(df2['Latitude'], df2['Longitude'])]

df2['closest'] = [closest_point(x, list(df1['point'])) for x in df2['point']]
df2['Neighborhood'] = [match_value(df1, 'point', x, 'Name') for x in df2['closest']]

df2 = df2.drop(columns=['point', 'closest'])
df1 = df1.drop(columns=['point'])
neighborhoods = neighborhoods.drop(columns=['point'])
df2.head()

Unnamed: 0,Beach,Latitude,Longitude,Neighborhood
0,Praia da Tapera,-27.688573,-48.568508,Tapera
1,Praia do Santinho,-27.458246,-48.374768,Santinho
2,Praia dos Ingleses,-27.429447,-48.396534,Ingleses Centro
3,Praia da Lagoinha,-27.388978,-48.42637,Lagoinha do Norte
4,Praia Brava (Florianópolis),-27.397613,-48.415825,Praia Brava
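
`cdist` with its default metric measures Euclidean distance on the raw latitude/longitude pairs, which treats a degree of longitude as equal in length to a degree of latitude. Over a single city that is usually a workable approximation. As a hedged alternative, the sketch below computes great-circle distances in kilometers with the haversine formula. The function names are illustrative, the three neighborhood centers are copied from this notebook's outputs, and the query point is a hypothetical location near the Pântano do Sul waterfront.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def nearest(point, candidates):
    """Return (name, distance_km) of the candidate closest to point.

    candidates maps a name to a (latitude, longitude) pair.
    """
    name = min(candidates, key=lambda n: haversine_km(point, candidates[n]))
    return name, haversine_km(point, candidates[name])

centers = {
    "Centro": (-27.592925, -48.55006),
    "Barra da Lagoa": (-27.574778, -48.425835),
    "Pântano do Sul": (-27.779951, -48.507319),
}
sample_point = (-27.78184, -48.508255)  # hypothetical point near the waterfront

nearest_name, nearest_km = nearest(sample_point, centers)
print(nearest_name, round(nearest_km, 3))
```

Either way, a beach is assigned to the nearest neighborhood center, not to the neighborhood whose boundary actually contains it.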


In [23]:
#now we get the counts for each neighborhood
counts = df2.groupby(['Neighborhood']).size()
counts

Neighborhood
Armação                     2
Barra da Lagoa              3
Caiacanga                   1
Campeche Leste              1
Canasvieiras                1
Canto dos Araçás            2
Coqueiros                   2
Daniela                     3
Ingleses Centro             1
Itaguaçu                    1
Jurere Leste                2
Lagoinha do Norte           2
Morro das Pedras            1
Porto da Lagoa              1
Praia Brava                 2
Pântano do Sul              4
Santinho                    1
São João do Rio Vermelho    2
Tapera                      1
dtype: int64

In [0]:
#adding the BeachesCount column to dataframe with default 0
neighborhoods.insert(3, 'BeachesCount', 0)

In [0]:
neighborhoods.set_index('Name', inplace=True) #Name is the index

In [0]:
# Assigning the value of the count list to each neighborhood
for n in counts.index:
  neighborhoods.loc[n,'BeachesCount'] = counts[n]
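
The three cells above (insert a zero column, set the index, loop over the counts) can also be written as one aligned assignment. This is a sketch on a tiny hand-made frame, assuming the index holds the neighborhood names as it does after `set_index('Name')`. The `nb` and `beach_neighborhood` objects are hypothetical stand-ins for `neighborhoods` and the `Neighborhood` column of `df2`.

```python
import pandas as pd

# Stand-in for the neighborhoods dataframe, indexed by name.
nb = pd.DataFrame(
    {"Latitude": [-27.592925, -27.779951, -27.574778],
     "Longitude": [-48.55006, -48.507319, -48.425835]},
    index=pd.Index(["Centro", "Pântano do Sul", "Barra da Lagoa"], name="Name"),
)

# Hypothetical beach-to-neighborhood assignments, one entry per beach.
beach_neighborhood = pd.Series(["Pântano do Sul", "Barra da Lagoa", "Pântano do Sul"])

# Count beaches per neighborhood, then align to the index; unmatched names get 0.
counts = beach_neighborhood.value_counts()
nb["BeachesCount"] = counts.reindex(nb.index, fill_value=0)

print(nb.sort_values(by="BeachesCount", ascending=False))
```

`reindex` with `fill_value=0` gives neighborhoods without any beach a zero, so no default column is needed beforehand.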

In [27]:
neighborhoods.sort_values(by='BeachesCount', ascending=False).head()

Unnamed: 0_level_0,Latitude,Longitude,BeachesCount
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Pântano do Sul,-27.779951,-48.507319,4
Daniela,-27.448954,-48.53125,3
Barra da Lagoa,-27.574778,-48.425835,3
São João do Rio Vermelho,-27.483806,-48.409638,2
Canto dos Araçás,-27.594075,-48.46094,2


In [104]:
winner = neighborhoods.sort_values(by='BeachesCount', ascending=False)[:1]
winner

Unnamed: 0_level_0,Latitude,Longitude,BeachesCount
Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Pântano do Sul,-27.779951,-48.507319,4


#### Define Foursquare Credentials and Version

In [28]:
CLIENT_ID = 'XCZZBWCOGQ5BJ5KD3NOGA4J0LW52XU25S4IJCS5Z5GRJOGCV' # your Foursquare ID
CLIENT_SECRET = '5JWA0FLEQE0OEDHAJRXKDD2HPWBSUH5LKUASZXDPP13Q0OQ5' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XCZZBWCOGQ5BJ5KD3NOGA4J0LW52XU25S4IJCS5Z5GRJOGCV
CLIENT_SECRET:5JWA0FLEQE0OEDHAJRXKDD2HPWBSUH5LKUASZXDPP13Q0OQ5
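
Hard-coding and printing credentials means they end up in every shared copy of the notebook. One common alternative, sketched here under the assumption that the keys are exported as environment variables, reads them at run time and prints only a masked form. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helpers are hypothetical, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=None):
    """Read the client id and secret from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing.args[0])) from None

def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demo with an in-memory mapping instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-client-id",
            "FOURSQUARE_CLIENT_SECRET": "demo-client-secret"}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(demo_id))
print("CLIENT_SECRET: " + mask(demo_secret))
```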


#### Let's explore the neighborhood with the most beaches.

Get the neighborhood's name.

In [29]:
winner = neighborhoods.sort_values(by='BeachesCount', ascending=False).iloc[0]
winner

Latitude       -27.779951
Longitude      -48.507319
BeachesCount     4.000000
Name: Pântano do Sul, dtype: float64

Get the neighborhood's latitude and longitude values.

In [30]:
neighborhood_latitude = winner['Latitude'] # neighborhood latitude value
neighborhood_longitude = winner['Longitude'] # neighborhood longitude value
neighborhood_name = neighborhoods.sort_values(by='BeachesCount', ascending=False).index[0] # neighborhood name


print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Pântano do Sul are -27.7799509, -48.5073187.


#### Now, let's get the top 100 venues that are in Pântano do Sul within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [0]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
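
Building the query string by hand works here because none of the values contain reserved characters. As a sketch of a more defensive variant, the parameters can be kept in a dictionary and encoded with the standard library. The endpoint and parameter names are the ones used in the cell above. `build_explore_url` and the demo credentials are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the explore URL from named parameters, with proper URL encoding."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

demo_url = build_explore_url("demo-id", "demo-secret", "20180605", -27.779951, -48.507319)
demo_query = parse_qs(urlparse(demo_url).query)  # decode again to inspect the values
print(demo_url)
```

`requests.get` also accepts such a dictionary through its `params` argument, which avoids assembling the URL manually.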

Send the GET request and examine the results.

In [32]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d9bd443bcbf7a002c97dac2'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4ef07f8c9adf96dc8f9fca7b-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/hikingtrail_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d159941735',
         'name': 'Trail',
         'pluralName': 'Trails',
         'primary': True,
         'shortName': 'Trail'}],
       'id': '4ef07f8c9adf96dc8f9fca7b',
       'location': {'address': 'Manoel Vidal',
        'cc': 'BR',
        'city': 'Florianópolis',
        'country': 'Brasil',
        'distance': 106,
        'formattedAddress': ['Manoel Vidal', 'Florianópolis, SC', 'Brasil'],
        'labeledLatLngs': [{'label': 'display',
          'lat': -27.77899408630737,
          'lng': -

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [0]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [34]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Trilha Da Lagoinha Do Leste,Trail,-27.778994,-48.507371
1,Praia do Pântano do Sul,Beach,-27.78184,-48.508255
2,Bar do Arante,Seafood Restaurant,-27.781907,-48.508127
3,Bar do Vadinho,Seafood Restaurant,-27.782623,-48.506746
4,Restaurante Mandala,Seafood Restaurant,-27.782081,-48.507639
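
The flatten, filter and rename steps above can also be expressed directly on the nested dictionaries, which makes the handling of venues without a category explicit. This is a sketch on two hand-written items shaped like the entries of `results['response']['groups'][0]['items']`. The first uses values shown in the table above, the second is invented to exercise the empty-category case, and `venue_row` is a hypothetical helper.

```python
def venue_row(item):
    """Pick name, first category name, latitude and longitude from one explore item."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    return {
        "name": venue["name"],
        "categories": categories[0]["name"] if categories else None,
        "lat": venue["location"]["lat"],
        "lng": venue["location"]["lng"],
    }

sample_items = [
    {"venue": {"name": "Trilha Da Lagoinha Do Leste",
               "categories": [{"name": "Trail"}],
               "location": {"lat": -27.778994, "lng": -48.507371}}},
    {"venue": {"name": "Uncategorized place",  # invented entry without categories
               "categories": [],
               "location": {"lat": -27.78, "lng": -48.508}}},
]

rows = [venue_row(item) for item in sample_items]
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame` should yield the same four columns as `nearby_venues`.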


And how many venues were returned by Foursquare?

In [35]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

10 venues were returned by Foursquare.


## Finding some place to live

In [134]:
# import requests

import unicodedata

# neighborhood_name holds the winner's name; strip accents and build a lowercase, hyphenated slug
winnerNormalizedName = unicodedata.normalize('NFKD', neighborhood_name.replace(" ", "-"))
winnerNormalizedName = winnerNormalizedName.encode('ascii', 'ignore').decode('ascii').lower()

request = requests.get('https://sc.olx.com.br/florianopolis-e-regiao/sul/'+winnerNormalizedName+'/imoveis/venda')
soup = bs(request.text, 'html.parser')
ads = soup.findAll("a",class_="OLXad-list-link")

# define the dataframe columns
placesToLive = pd.DataFrame(columns=['Title','Details', 'Price', 'Image', 'Link'])

#creating a function to remove tabs and newlines from text
def cleanText(text):
  return text.replace("\t", "").replace("\n", "")

for ad in ads:
  title = ad.find("h2",class_="OLXad-list-title").find(text=True)
  img = ad.find("img",class_="image")['src']
  details = ad.find("p",class_="text detail-specific").find(text=True)
  price = ad.find("p",class_="OLXad-list-price").find(text=True)
  href = ad['href']
  row = {'Title': cleanText(title),
                        'Details': cleanText(details),
                        'Price': price,
                        'Link': href,
                        'Image': img}
  placesToLive = placesToLive.append(row,ignore_index=True)
  
placesToLive



Unnamed: 0,Title,Details,Price,Image,Link
0,Casa à venda com 2 dormitórios em Rio vermelho...,2 quartos | 62 m² | 2 vagas,R$ 138.000,https://img.olx.com.br/thumbs256x256/80/800903...,https://sc.olx.com.br/florianopolis-e-regiao/i...
1,"Apartamento com 2 dormitórios à venda, por R$ ...",2 quartos | 73 m² | 1 vaga,R$ 430.000,https://img.olx.com.br/thumbs256x256/73/736904...,https://sc.olx.com.br/florianopolis-e-regiao/i...
2,"Ampla casa pântano do sul - florianópolis, 600...",3 quartos | 250 m²,R$ 690.000,https://img.olx.com.br/thumbs256x256/73/730903...,https://sc.olx.com.br/florianopolis-e-regiao/i...
3,Apartamento à venda com 2 dormitórios em Açore...,2 quartos | 75 m² | 1 vaga,R$ 485.000,https://static.bn-static.com/img-49626/desktop...,https://sc.olx.com.br/florianopolis-e-regiao/i...
4,"Apartamento residencial à venda, pântano do su...",2 quartos | 73 m² | 1 vaga,R$ 560.000,https://static.bn-static.com/img-49626/desktop...,https://sc.olx.com.br/florianopolis-e-regiao/i...
5,Apartamento com 2 dormitórios à venda por R$ 5...,2 quartos | 97 m² | Condomínio: R$ 339 | 1 vaga,R$ 560.000,https://static.bn-static.com/img-49626/desktop...,https://sc.olx.com.br/florianopolis-e-regiao/i...
6,1696 - Casa vista Mar na praia dos Açores - Ac...,3 quartos | 230 m² | 1 vaga,R$ 467.000,https://static.bn-static.com/img-49626/desktop...,https://sc.olx.com.br/florianopolis-e-regiao/i...
7,"1695 - Linda demais! Casa na Praia dos Açores,...",2 quartos | 370 m² | 2 vagas,R$ 280.000,https://static.bn-static.com/img-49626/desktop...,https://sc.olx.com.br/florianopolis-e-regiao/i...
8,Belíssima casa de Praia,4 quartos | 265 m² | 5 vagas,R$ 1.500.000,https://static.bn-static.com/img-49626/desktop...,https://sc.olx.com.br/florianopolis-e-regiao/i...
9,Ms5 = Apartamento 2 dorm 1 km da praia com pis...,2 quartos,R$ 270.000,https://static.bn-static.com/img-49626/desktop...,https://sc.olx.com.br/florianopolis-e-regiao/i...
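
Two small text transformations are involved in this step: turning the neighborhood name into a URL slug, and reading the listed prices, which are plain strings. Both are sketched below as standalone helpers. `slugify` and `parse_price` are hypothetical names, and the price parser assumes whole amounts in reais with `.` as the thousands separator, as in the table above.

```python
import re
import unicodedata

def slugify(name):
    """Strip accents, lowercase, and replace spaces with hyphens."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return ascii_name.lower().replace(" ", "-")

def parse_price(text):
    """Convert a string such as 'R$ 1.500.000' to an integer, or None if it has no digits."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None

print(slugify("Pântano do Sul"))
print(parse_price("R$ 1.500.000"))
```

With numeric prices, the listings could be sorted or filtered by budget instead of being compared as text.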


In [133]:
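import io
import matplotlib.pyplot as plt
from PIL import Image

# plt.imread raised a ValueError on this URL, so download the image bytes first
image_url = placesToLive.loc[0, "Image"]
im = Image.open(io.BytesIO(requests.get(image_url).content))
# show the image
plt.imshow(im)
plt.show()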

<a id='item2'></a>

## 2. Explore Neighborhoods in Florianópolis

#### Let's create a function to repeat the same process for all the neighborhoods in Florianópolis

In [0]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    print("Looking for Venues in ",names.size," places")
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
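
The function above indexes straight into `["response"]['groups'][0]['items']`, so a failed request or an empty result would raise an exception partway through the loop. A hedged sketch of a more tolerant extraction step follows. It relies only on the `meta`/`response` layout visible in the output earlier in this notebook. `explore_items` is a hypothetical helper, and the error payload is invented.

```python
def explore_items(payload):
    """Return the venue items of an explore response, or [] if there are none."""
    if payload.get("meta", {}).get("code") != 200:
        return []
    groups = payload.get("response", {}).get("groups") or []
    return groups[0].get("items", []) if groups else []

ok_payload = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [{"venue": {"name": "Bar do Arante"}}]}]},
}
error_payload = {"meta": {"code": 429}, "response": {}}  # invented failure case
empty_payload = {"meta": {"code": 200}, "response": {"groups": []}}

print(len(explore_items(ok_payload)))
print(explore_items(error_payload))
print(explore_items(empty_payload))
```

Inside `getNearbyVenues`, `explore_items(requests.get(url).json())` could replace the chained indexing so that one bad response does not abort the whole run.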

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *manhattan_venues*.

In [37]:
# type your answer here

manhattan_venues = getNearbyVenues(names=neighborhoods.index,
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )



Looking for Venues in  75  places



#### Let's check the size of the resulting dataframe

In [38]:
print(manhattan_venues.shape)
manhattan_venues.head()

(1115, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro,-27.592925,-48.55006,Porto da Ilha Hotel,-27.592261,-48.551944,Hotel
1,Centro,-27.592925,-48.55006,Arbor Café,-27.593294,-48.55125,Coffee Shop
2,Centro,-27.592925,-48.55006,Bendito Fruto,-27.594434,-48.551179,Café
3,Centro,-27.592925,-48.55006,Ponto do Pão,-27.592904,-48.551332,Bakery
4,Centro,-27.592925,-48.55006,Art Gourmet,-27.591754,-48.549769,Buffet


Let's check how many venues were returned for each neighborhood.

In [39]:
manhattan_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abraão,22,22,22,22,22,22
Agronômica,5,5,5,5,5,5
Alto Ribeirão,5,5,5,5,5,5
Alto Ribeirão Leste,5,5,5,5,5,5
Armação,12,12,12,12,12,12
Açores,19,19,19,19,19,19
Balneário,30,30,30,30,30,30
Barra da Lagoa,42,42,42,42,42,42
Barra do Sambaqui,3,3,3,3,3,3
Base Aérea,1,1,1,1,1,1


#### Let's find out how many unique categories can be curated from all the returned venues

In [40]:
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 185 unique categories.


<a id='item3'></a>

## 3. Analyze Each Neighborhood

In [41]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Argentinian Restaurant,Arts & Crafts Store,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Bookstore,Botanical Garden,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Café,Campground,Candy Store,Cheese Shop,Chocolate Shop,Church,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cupcake Shop,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Empada House,Event Space,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Gluten-free Restaurant,Goiano Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Laundromat,Lighthouse,Liquor Store,Lottery Retailer,Market,Martial Arts Dojo,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motorcycle Shop,Mountain,Museum,Music Venue,Nail Salon,Newsstand,Nightclub,Organic Grocery,Paintball Field,Paper / Office Supplies Store,Park,Pastelaria,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Pizza Place,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Public Art,Racecourse,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Southern Brazilian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Water Park,Waterfront,Whisky Bar,Wine Bar,Women's Store
0,Centro,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Centro,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Centro,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Centro,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Centro,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [42]:
manhattan_onehot.shape

(1115, 186)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [43]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Argentinian Restaurant,Arts & Crafts Store,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach,Beach Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Bookstore,Botanical Garden,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Buffet,Burger Joint,Café,Campground,Candy Store,Cheese Shop,Chocolate Shop,Church,Circus,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cupcake Shop,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dive Bar,Empada House,Event Space,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Gluten-free Restaurant,Goiano Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Home Service,Hostel,Hot Dog Joint,Hotel,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Laundromat,Lighthouse,Liquor Store,Lottery Retailer,Market,Martial Arts Dojo,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Motorcycle Shop,Mountain,Museum,Music Venue,Nail Salon,Newsstand,Nightclub,Organic Grocery,Paintball Field,Paper / Office Supplies Store,Park,Pastelaria,Pastry Shop,Pedestrian Plaza,Performing Arts Venue,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Pizza Place,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Public Art,Racecourse,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Southern Brazilian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Taco Place,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park,Theme Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Video Store,Water Park,Waterfront,Whisky Bar,Wine Bar,Women's Store
0,Abraão,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.090909,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.181818,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0
1,Agronômica,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alto Ribeirão,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Alto Ribeirão Leste,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Armação,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Açores,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Balneário,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.1,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.066667,0.033333,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0
7,Barra da Lagoa,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.095238,0.02381,0.071429,0.0,0.02381,0.0,0.02381,0.0,0.0,0.0,0.071429,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.214286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Barra do Sambaqui,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Base Aérea,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [44]:
manhattan_grouped.shape

(73, 186)
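
To see why the mean of one-hot rows gives the frequency of each category, here is a minimal sketch on made-up data. The frame `toy_onehot` and its values are hypothetical and are not part of the notebook's dataset; only the `groupby(...).mean().reset_index()` pattern is taken from the cell above.

```python
import pandas as pd

# Hypothetical toy data: one row per venue, one-hot encoded by category.
toy_onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Bakery": [1, 1, 0, 0, 0, 0],
    "Bar":    [0, 0, 1, 0, 1, 0],
    "Beach":  [0, 0, 0, 1, 0, 1],
})

# Same pattern as above: the mean of 0/1 indicators is the share of venues
# in that neighborhood that belong to each category.
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Each row of the grouped frame sums to 1 across the category columns, and the number of rows drops from one per venue to one per neighborhood, which matches the change from 1115 rows to 73 rows seen above.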

#### Let's print each neighborhood along with the top 5 most common venues

In [45]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abraão----
                venue  freq
0        Burger Joint  0.18
1              Bakery  0.09
2  Salon / Barbershop  0.05
3           Pet Store  0.05
4            Pharmacy  0.05


----Agronômica----
                  venue  freq
0            Restaurant   0.2
1              Mountain   0.2
2  Gym / Fitness Center   0.2
3             Bookstore   0.2
4           Candy Store   0.2


----Alto Ribeirão----
                  venue  freq
0                 Diner   0.2
1  Brazilian Restaurant   0.2
2                Bakery   0.2
3          Soccer Field   0.2
4         Hot Dog Joint   0.2


----Alto Ribeirão Leste----
                  venue  freq
0                 Diner   0.2
1  Brazilian Restaurant   0.2
2                Bakery   0.2
3          Soccer Field   0.2
4         Hot Dog Joint   0.2


----Armação----
                venue  freq
0  Seafood Restaurant  0.08
1         Pizza Place  0.08
2       Hot Dog Joint  0.08
3                Pier  0.08
4     Bed & Breakfast  0.08


----Açores----
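
Many neighborhoods above have several categories tied at the same frequency (for example, five categories at 0.2), so the order among tied entries depends on how the sort breaks ties. As a hedged alternative sketch, `Series.nlargest` with `keep="first"` picks tied entries in their original column order, which makes the result reproducible. The series `toy_freq` below is hypothetical illustration data.

```python
import pandas as pd

# Hypothetical frequencies for one neighborhood, with a three-way tie at 0.2.
toy_freq = pd.Series({
    "Bakery": 0.2,
    "Diner": 0.2,
    "Soccer Field": 0.2,
    "Bar": 0.4,
    "Beach": 0.0,
})

# keep="first" resolves ties by order of appearance in the series.
top3 = toy_freq.nlargest(3, keep="first")
print(top3)
```

With this approach the tied categories always appear in the same order, so two views of the same data (a printed list and a summary table) would agree on which venues rank third or fourth.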

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [0]:
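def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent categories in one row of the grouped dataframe."""
    row_categories = row.iloc[1:]  # skip the leading 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]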

Now let's create the new dataframe and display the top 3 venues for each neighborhood.

In [50]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Abraão,Burger Joint,Bakery,Scenic Lookout
1,Agronômica,Candy Store,Restaurant,Mountain
2,Alto Ribeirão,Diner,Brazilian Restaurant,Soccer Field
3,Alto Ribeirão Leste,Diner,Brazilian Restaurant,Soccer Field
4,Armação,Bed & Breakfast,Convenience Store,Seafood Restaurant
5,Açores,Bakery,Vegetarian / Vegan Restaurant,Italian Restaurant
6,Balneário,Burger Joint,Pet Store,Harbor / Marina
7,Barra da Lagoa,Seafood Restaurant,Beach,Pizza Place
8,Barra do Sambaqui,Racecourse,Shopping Plaza,Bar
9,Base Aérea,Hotel,Women's Store,Food
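
The column labels above are built from a short list of suffixes with a fallback to "th". That works for small values of `num_top_venues`, but it would produce labels such as "21th" for larger ones. The helper below is a hypothetical sketch (the name `ordinal` is not from the notebook) that handles the general English rule, including 11th to 13th.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    # Numbers ending in 11, 12, 13 take "th" even though they end in 1, 2, 3.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top = 3  # assumed value, mirroring num_top_venues in the cell above
column_names = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top + 1)
]
print(column_names)
```

For three venues this yields the same headers as the table above; the difference only shows up for larger counts.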


<a id='item4'></a>

## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [0]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# run k-means clustering (KMeans comes from sklearn.cluster)
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

Let's create a new dataframe that includes the cluster label as well as the top 3 venues for each neighborhood.

In [49]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# join neighborhoods_venues_sorted onto manhattan_data so each neighborhood keeps its latitude/longitude and gains its cluster label and top venues
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!
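
The clustering step described above can be sketched on a tiny, made-up frequency matrix. This is only an illustration under stated assumptions: the array `toy_freqs` is hypothetical, and `n_init=10` is set explicitly here even though the notebook's call does not pass it.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency rows: two "restaurant-heavy" and two "beach-heavy" neighborhoods.
toy_freqs = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

# Fit k-means with 2 clusters; random_state makes the run repeatable.
toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(toy_freqs)
toy_labels = toy_kmeans.labels_
print(toy_labels)
```

The label values themselves are arbitrary (cluster 0 and cluster 1 could be swapped); what matters is which rows share a label. `labels_` has one entry per input row, which is why it can be inserted directly as a column of a dataframe with the same row order.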

Finally, let's visualize the resulting clusters.

In [0]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
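
The marker loop above indexes `rainbow` with the cluster label, which assumes every row of the merged dataframe has an integer label. If the join leaves some neighborhoods without a match (an assumption here: for instance, neighborhoods for which no venues were returned), pandas fills the label with `NaN` and the column becomes floating point. The sketch below uses hypothetical toy frames to show one way to drop such rows and restore integer labels before plotting.

```python
import pandas as pd

# Hypothetical location table with three neighborhoods.
toy_data = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Latitude": [-27.60, -27.58, -27.70],
    "Longitude": [-48.55, -48.52, -48.50],
})

# Hypothetical cluster table: neighborhood "C" has no label.
toy_sorted = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Cluster Labels": [0, 1],
})

# Same join pattern as in the notebook; "C" gets NaN in 'Cluster Labels'.
toy_merged = toy_data.join(toy_sorted.set_index("Neighborhood"), on="Neighborhood")

# Drop unlabeled rows and cast back to int so labels can index a color list.
toy_clean = toy_merged.dropna(subset=["Cluster Labels"]).copy()
toy_clean["Cluster Labels"] = toy_clean["Cluster Labels"].astype(int)
print(toy_clean)
```

Whether dropping is the right choice depends on the analysis; an alternative would be to keep unlabeled neighborhoods and draw them in a neutral color.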

<a id='item5'></a>