# Capstone Project – The Battle of Neighborhoods | Finding the Better Suited Neighborhood in Porto (Portugal)

## 1. Installing and importing libraries

### 1.1 Installing Folium and Geocoder

In [1]:
!pip install geocoder
!pip install folium
!pip install geopy
!pip install BeautifulSoup4


### 1.2 Importing required libraries

In [2]:
import pandas as pd
import requests
import numpy as np
import geocoder
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
import xml
import matplotlib.pyplot as plt
%matplotlib inline
import warnings

from pandas import json_normalize  # top-level import (pandas >= 1.0); pandas.io.json.json_normalize is deprecated
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim 
from bs4 import BeautifulSoup

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## 2. Data extraction and wrangling

### 2.1 Creating a DataFrame from a CSV that contains the zip codes of all of Porto's parishes

In [56]:
#creates a dataframe based on the information in a CSV file provided by the Portuguese post office
porto_zip_codes_df = pd.read_csv('Base de dados porto 2.csv', sep=';')

In [94]:
#Drops duplicates 
unique_zip_codes = pd.DataFrame(porto_zip_codes_df['Codigo Posta'].drop_duplicates())
unique_zip_codes.reset_index(drop=True,inplace=True)

In [95]:
unique_zip_codes

Unnamed: 0,Codigo Posta
0,4000
1,4049
2,4050
3,4099
4,4100
5,4149
6,4150
7,4169
8,4199
9,4200


In [96]:
#List with all of Porto's parishes
Parishes = pd.DataFrame(['Bonfim_1','Bonfim_2','Cedofeita_1','Cedofeita_2','Aldoar_1','Aldoar_2','Aldoar_3','Foz do Douro','Lordelo de Ouro','Paranhos_1','Bonfim_1','Paranhos_2','Cedofeita_3','Campanhã_1','Bonfim_2','Campanhã_2','Bonfim_3']) 

In [97]:
#Adds the list of Parishes to the dataframe with the unique zip codes
unique_zip_codes['Parishes'] = Parishes
unique_zip_codes.columns = ['Zip Code', 'Parishes']
unique_zip_codes

Unnamed: 0,Zip Code,Parishes
0,4000,Bonfim_1
1,4049,Bonfim_2
2,4050,Cedofeita_1
3,4099,Cedofeita_2
4,4100,Aldoar_1
5,4149,Aldoar_2
6,4150,Aldoar_3
7,4169,Foz do Douro
8,4199,Lordelo de Ouro
9,4200,Paranhos_1


In [98]:
#converts the zip code column to string format
unique_zip_codes['Zip Code'] = unique_zip_codes['Zip Code'].astype(str)

In [99]:
unique_zip_codes

Unnamed: 0,Zip Code,Parishes
0,4000,Bonfim_1
1,4049,Bonfim_2
2,4050,Cedofeita_1
3,4099,Cedofeita_2
4,4100,Aldoar_1
5,4149,Aldoar_2
6,4150,Aldoar_3
7,4169,Foz do Douro
8,4199,Lordelo de Ouro
9,4200,Paranhos_1


In [100]:
unique_zip_codes.describe()

Unnamed: 0,Zip Code,Parishes
count,17,17
unique,17,15
top,4369,Bonfim_2
freq,1,2


### 2.2 Getting the coordinates of all of Porto's parishes

In [101]:
#Function that retrieves the coordinates of a specific geographical location
def get_latilong(zip_code, max_tries=5):
    # Retry a bounded number of times instead of looping forever when the geocoder returns nothing
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, Porto'.format(zip_code))
        if g.latlng is not None:
            return g.latlng
    raise RuntimeError('Could not geocode zip code {}'.format(zip_code))
    
get_latilong('4250')

[41.173505000000034, -8.628429920999963]

In [102]:
#Gathering the coordinates of Porto's parishes
zip_codes = unique_zip_codes['Zip Code']
coordinates = [get_latilong(zip_code) for zip_code in zip_codes.tolist()]
coordinates

[[41.151711748000025, -8.602319999999963],
 [41.14584000000008, -8.610809999999958],
 [41.153101055000036, -8.621014999999943],
 [41.14584000000008, -8.610809999999958],
 [41.16890500000005, -8.664942924999934],
 [41.14584000000008, -8.610809999999958],
 [41.15614500000004, -8.655310821999933],
 [41.14584000000008, -8.610809999999958],
 [41.14584000000008, -8.610809999999958],
 [41.17324000000008, -8.599796803999936],
 [41.14584000000008, -8.610809999999958],
 [41.173505000000034, -8.628429920999963],
 [41.14584000000008, -8.610809999999958],
 [41.15313000000003, -8.575648140999931],
 [41.14584000000008, -8.610809999999958],
 [41.169999878000056, -8.580749999999966],
 [41.14584000000008, -8.610809999999958]]
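
Many zip codes in the list above resolve to the same point (41.14584, -8.61081). This suggests, as an interpretation rather than something the geocoder reports, that it fell back to a generic location for Porto for those zip codes. A minimal sketch for spotting such repeats before dropping them, using a hand-copied subset of the list; the helper name `repeated_points` is an assumption for illustration:

```python
from collections import Counter

# A subset of the coordinates returned above, copied by hand for illustration
sample_coordinates = [
    [41.151711748000025, -8.602319999999963],
    [41.14584000000008, -8.610809999999958],
    [41.153101055000036, -8.621014999999943],
    [41.14584000000008, -8.610809999999958],
    [41.16890500000005, -8.664942924999934],
    [41.14584000000008, -8.610809999999958],
]

def repeated_points(coords, ndigits=6):
    """Return {(lat, lng): count} for points that appear more than once."""
    counts = Counter((round(lat, ndigits), round(lng, ndigits)) for lat, lng in coords)
    return {point: n for point, n in counts.items() if n > 1}

repeats = repeated_points(sample_coordinates)
print(repeats)
```

Parishes that share a repeated point are the ones that disappear when duplicates on `Latitude` and `Longitude` are dropped.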

In [103]:
#adding two columns with the coordinates of Porto's parishes to the data frame containing the zip codes and parishes names
df_coordinates = pd.DataFrame(coordinates, columns=['Latitude','Longitude'])
unique_zip_codes['Latitude'] = df_coordinates['Latitude']
unique_zip_codes['Longitude'] = df_coordinates['Longitude']
unique_zip_codes.drop_duplicates(subset=['Latitude','Longitude'],keep='first',inplace=True)
unique_zip_codes.reset_index(drop=True,inplace=True)
unique_zip_codes

Unnamed: 0,Zip Code,Parishes,Latitude,Longitude
0,4000,Bonfim_1,41.151712,-8.60232
1,4049,Bonfim_2,41.14584,-8.61081
2,4050,Cedofeita_1,41.153101,-8.621015
3,4100,Aldoar_1,41.168905,-8.664943
4,4150,Aldoar_3,41.156145,-8.655311
5,4200,Paranhos_1,41.17324,-8.599797
6,4250,Paranhos_2,41.173505,-8.62843
7,4300,Campanhã_1,41.15313,-8.575648
8,4350,Campanhã_2,41.17,-8.58075


In [104]:
#Gets the coordinates of Porto through the geocode method of the geolocator
address = 'Porto'

geolocator = Nominatim(user_agent='Porto_explorer')
location = geolocator.geocode(address)
latitude_x = location.latitude
longitude_y = location.longitude
print('The geographical coordinates of Porto, Portugal are {}, {}.'.format(latitude_x, longitude_y))

The geographical coordinates of Porto, Portugal are 41.1494512, -8.6107884.
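
To relate these coordinates to the 700 m search radius used in section 4, it can help to measure how far a parish point lies from the city centre. The following is a sketch using the haversine formula with a spherical Earth of radius 6,371 km (an approximation); the function name `haversine_m` is an assumption, and the coordinates are copied from the outputs above:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# Coordinates copied from the outputs above
porto_center = (41.1494512, -8.6107884)
bonfim_1 = (41.151712, -8.60232)

distance_m = haversine_m(*porto_center, *bonfim_1)
print(round(distance_m))
```

The Bonfim_1 point is roughly 750 m from the city centre, so 700 m circles around the two points overlap substantially.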


## 3. Mapping Porto

In [105]:
#Draws a map of the geographic coordinates of Porto's Parishes with Folium

map_Porto = folium.Map(location=[latitude_x, longitude_y], zoom_start=10)

for lat, lng, nei in zip(unique_zip_codes['Latitude'], unique_zip_codes['Longitude'], unique_zip_codes['Parishes']):
    
    label = '{}'.format(nei)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Porto)  
    
map_Porto

In [106]:
address = 'Bonfim, Porto'

geolocator = Nominatim(user_agent='Porto_explorer')
location = geolocator.geocode(address)
latitude_Bonfim_1 = location.latitude
longitude_Bonfim_1 = location.longitude
print('The geographical coordinates of Bonfim_1 are {}, {}.'.format(latitude_Bonfim_1, longitude_Bonfim_1))

The geographical coordinates of Bonfim_1 are 41.1510697, -8.5939568.


## 4. Exploring the venues in each parish of Porto

### 4.1 Getting information about the venues of Porto's parishes with the Foursquare API and storing it in a dataframe

In [107]:
# Create CLIENT_ID and CLIENT_SECRET objects (replace the placeholders with your own Foursquare credentials)
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180604'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: '+CLIENT_ID)
print('CLIENT_SECRET: '+CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


In [108]:
#Creating URL object and sending a GET request to the Foursquare API
radius = 700 
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude_x, 
    longitude_y, 
    radius, 
    LIMIT)
results = requests.get(url).json()
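
The same request URL can be assembled from a dictionary of parameters, which avoids mistakes in a long format string. A minimal sketch using only the standard library; the helper name `build_explore_url` and the placeholder credentials are assumptions for illustration, and no request is sent:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the Foursquare v2 'explore' URL used above from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in 'll' unescaped, matching the hand-built URL
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

example_url = build_explore_url('ID', 'SECRET', '20180604', 41.1494512, -8.6107884, 700, 100)
print(example_url)
```

With `requests`, an equivalent option is to pass the dictionary through the `params` argument of `requests.get` and let the library do the encoding.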

In [109]:
venues=results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues)
nearby_venues.columns

  


Index(['referralId', 'reasons.count', 'reasons.items', 'venue.id',
       'venue.name', 'venue.location.address', 'venue.location.lat',
       'venue.location.lng', 'venue.location.labeledLatLngs',
       'venue.location.distance', 'venue.location.postalCode',
       'venue.location.cc', 'venue.location.city', 'venue.location.state',
       'venue.location.country', 'venue.location.formattedAddress',
       'venue.categories', 'venue.photos.count', 'venue.photos.groups',
       'venue.venuePage.id', 'venue.location.neighborhood',
       'venue.location.crossStreet'],
      dtype='object')

In [110]:
#Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
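
To see what the function returns, it can be applied to hand-made rows shaped like the Foursquare response. The sample rows below are invented for illustration, and the function is repeated so that the snippet runs on its own:

```python
def get_category_type(row):
    """Return the name of the first category of a venue row, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hypothetical rows mimicking the two layouts the function accepts
row_flat = {'categories': [{'name': 'Plaza'}]}
row_nested = {'venue.categories': [{'name': 'Hostel'}, {'name': 'Hotel'}]}
row_empty = {'venue.categories': []}

print(get_category_type(row_flat))    # Plaza
print(get_category_type(row_nested))  # Hostel
print(get_category_type(row_empty))   # None
```

Only the first category is kept, so a venue tagged with several categories is counted under its first one.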

In [111]:
#cleans the JSON and structures it into a pandas dataframe
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(5)

Unnamed: 0,name,categories,lat,lng
0,Avenida dos Aliados,Plaza,41.148302,-8.61104
1,Rivoli Cinema Hostel,Hostel,41.147622,-8.609883
2,Boa-Bao,Asian Restaurant,41.149274,-8.613109
3,Tábua Rasa,Portuguese Restaurant,41.149303,-8.612494
4,Cruel,Modern European Restaurant,41.149641,-8.612595


In [112]:
# 10 most frequent venues in a 700 meter distance of Porto's center
a=pd.Series(nearby_venues.categories)
a.value_counts()[:10]

Portuguese Restaurant    10
Hostel                    8
Bar                       7
Café                      4
Tapas Restaurant          4
Plaza                     3
Hotel                     3
Coffee Shop               3
Restaurant                2
Ice Cream Shop            2
Name: categories, dtype: int64

In [115]:
#Function that gets and stores the information about the venues of Porto's parishes
def getNearbyVenues(names, latitudes, longitudes, radius=700):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # making GET request
        venue_results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in venue_results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Parishes', 
                  'Parishes Latitude', 
                  'Parishes Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
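
`getNearbyVenues` indexes `categories[0]` directly, so a venue with an empty category list would raise an `IndexError`. A minimal sketch of a more defensive row builder; the helper name `venue_rows` and the sample payload are hypothetical, and only the keys already used in the function above are assumed:

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare 'items' into tuples, using None when a venue has no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Invented payload with one uncategorised venue
sample_items = [
    {'venue': {'name': 'A', 'location': {'lat': 41.15, 'lng': -8.60},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'B', 'location': {'lat': 41.16, 'lng': -8.61},
               'categories': []}},
]
rows = venue_rows('Bonfim_1', 41.151712, -8.60232, sample_items)
print(rows)
```

Rows with a `None` category can then be dropped or kept before one-hot encoding, depending on how uncategorised venues should be treated.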

In [118]:
#Nearby Venues in each parish
Parishes_venues = getNearbyVenues(names=unique_zip_codes['Parishes'],
                                   latitudes=unique_zip_codes['Latitude'],
                                   longitudes=unique_zip_codes['Longitude']
                                  )
Parishes_venues.head()

Unnamed: 0,Parishes,Parishes Latitude,Parishes Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bonfim_1,41.151712,-8.60232,Porto Spot Hostel,41.153593,-8.60526,Hostel
1,Bonfim_1,41.151712,-8.60232,The Artist Porto Hotel and Bistro,41.151033,-8.601361,Hotel
2,Bonfim_1,41.151712,-8.60232,Capela das Almas,41.149846,-8.605589,Church
3,Bonfim_1,41.151712,-8.60232,Chocolataria Equador,41.151758,-8.606364,Chocolate Shop
4,Bonfim_1,41.151712,-8.60232,Letraria - Craft Beer Garden Porto,41.148394,-8.604088,Brewery


In [119]:
print('There are {} unique categories.'.format(len(Parishes_venues['Venue Category'].unique())))
Parishes_venues.groupby('Parishes').count()

There are 101 unique categories.


Unnamed: 0_level_0,Parishes Latitude,Parishes Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Parishes,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Aldoar_1,20,20,20,20,20,20
Aldoar_3,31,31,31,31,31,31
Bonfim_1,100,100,100,100,100,100
Bonfim_2,100,100,100,100,100,100
Campanhã_1,8,8,8,8,8,8
Campanhã_2,16,16,16,16,16,16
Cedofeita_1,99,99,99,99,99,99
Paranhos_1,11,11,11,11,11,11
Paranhos_2,16,16,16,16,16,16


### 4.2 One-hot encoding

In [121]:
# one hot encoding
Parishes_onehot = pd.get_dummies(Parishes_venues[['Venue Category']], prefix="", prefix_sep="")

# add the Parishes column back to the dataframe
Parishes_onehot['Parishes'] = Parishes_venues['Parishes'] 

# move the Parishes column to the first position
fixed_columns = [Parishes_onehot.columns[-1]] + list(Parishes_onehot.columns[:-1])
Parishes_onehot = Parishes_onehot[fixed_columns]
Parishes_grouped = Parishes_onehot.groupby('Parishes').mean().reset_index()
Parishes_onehot.head(5)

Unnamed: 0,Parishes,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,BBQ Joint,Bakery,Bar,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Boarding House,Breakfast Spot,Brewery,Bridge,Burger Joint,Bus Station,Café,Camera Store,Candy Store,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Creperie,Diner,Electronics Store,Empanada Restaurant,Escape Room,Exhibit,Fast Food Restaurant,Food & Drink Shop,Garden,Gas Station,Gastropub,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Historic Site,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indie Movie Theater,Internet Cafe,Italian Restaurant,Japanese Restaurant,Laundromat,Light Rail Station,Liquor Store,Market,Martial Arts School,Mediterranean Restaurant,Mexican Restaurant,Modern European Restaurant,Monument / Landmark,Museum,Music Venue,Nightclub,Nightlife Spot,Paper / Office Supplies Store,Park,Pastry Shop,Pedestrian Plaza,Pharmacy,Pizza Place,Platform,Plaza,Pool,Portuguese Restaurant,Ramen Restaurant,Restaurant,Roof Deck,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Steakhouse,Supermarket,Sushi Restaurant,Syrian Restaurant,Tapas Restaurant,Tea Room,Theater,Trail,Train Station,Vegetarian / Vegan Restaurant,Wine Bar
0,Bonfim_1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Bonfim_1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bonfim_1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bonfim_1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Bonfim_1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
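
Grouping the one-hot rows by parish and taking the mean gives, for each category, the share of that parish's venues that fall in the category. A small standard-library sketch of the same computation on invented data (the venue list below is made up, not taken from Foursquare, and `category_shares` is a hypothetical helper):

```python
from collections import Counter, defaultdict

# Invented (parish, category) pairs standing in for Parishes_venues
venues = [
    ('Parish_A', 'Café'), ('Parish_A', 'Café'), ('Parish_A', 'Bakery'), ('Parish_A', 'Hotel'),
    ('Parish_B', 'Bar'), ('Parish_B', 'Café'),
]

def category_shares(pairs):
    """Return {parish: {category: share}}, i.e. the mean of each one-hot column per parish."""
    counts = defaultdict(Counter)
    for parish, category in pairs:
        counts[parish][category] += 1
    return {
        parish: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for parish, counter in counts.items()
    }

shares = category_shares(venues)
print(shares['Parish_A'])
```

Unlike the one-hot dataframe, this sketch omits categories with a share of zero; the non-zero values are the same frequencies that `groupby('Parishes').mean()` produces.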


In [122]:
#Gets the top 5 venues in each parish
num_top_venues = 5
for hood in Parishes_grouped['Parishes']:
    print("---- "+hood+" ----")
    temp =Parishes_grouped[Parishes_grouped['Parishes'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

---- Aldoar_1 ----
                           venue  freq
0                           Café  0.30
1               Sushi Restaurant  0.15
2                          Plaza  0.05
3                         Bakery  0.05
4  Paper / Office Supplies Store  0.05


---- Aldoar_3 ----
                   venue  freq
0            Supermarket  0.10
1  Portuguese Restaurant  0.10
2                    Gym  0.06
3             Restaurant  0.06
4                 Bakery  0.06


---- Bonfim_1 ----
                   venue  freq
0  Portuguese Restaurant  0.15
1                   Café  0.09
2             Restaurant  0.09
3            Coffee Shop  0.06
4                  Hotel  0.06


---- Bonfim_2 ----
                   venue  freq
0  Portuguese Restaurant  0.10
1                    Bar  0.07
2                 Hostel  0.07
3       Tapas Restaurant  0.06
4                  Plaza  0.05


---- Campanhã_1 ----
                   venue  freq
0                   Café  0.25
1  Portuguese Restaurant  0.12
2         

In [131]:
#Function that returns the most common venues in each parish
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [133]:
# extracting the 10 most common venues in each parish

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Parishes']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

Parishes_venues_sorted = pd.DataFrame(columns=columns)
Parishes_venues_sorted['Parishes'] = Parishes_grouped['Parishes']

for ind in np.arange(Parishes_grouped.shape[0]):
    Parishes_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Parishes_grouped.iloc[ind, :], num_top_venues)

Parishes_venues_sorted

Unnamed: 0,Parishes,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aldoar_1,Café,Sushi Restaurant,Pizza Place,Bakery,Pharmacy,Plaza,Portuguese Restaurant,Burger Joint,Sandwich Place,Paper / Office Supplies Store
1,Aldoar_3,Portuguese Restaurant,Supermarket,Gym,Fast Food Restaurant,Restaurant,Bakery,Park,Pool,Exhibit,Roof Deck
2,Bonfim_1,Portuguese Restaurant,Restaurant,Café,Coffee Shop,Hotel,Hostel,Bakery,Gourmet Shop,Grocery Store,Clothing Store
3,Bonfim_2,Portuguese Restaurant,Bar,Hostel,Tapas Restaurant,Plaza,Wine Bar,Ice Cream Shop,Breakfast Spot,Theater,Japanese Restaurant
4,Campanhã_1,Café,Park,Portuguese Restaurant,Grocery Store,Bakery,Garden,Shoe Store,Wine Bar,Escape Room,Coffee Shop
5,Campanhã_2,Bakery,Hotel,Platform,Chinese Restaurant,Electronics Store,Restaurant,Gas Station,Light Rail Station,Martial Arts School,Italian Restaurant
6,Cedofeita_1,Café,Bar,Portuguese Restaurant,Hotel,Italian Restaurant,Bakery,Restaurant,Hostel,Burger Joint,Plaza
7,Paranhos_1,College Cafeteria,Café,Coffee Shop,Plaza,Portuguese Restaurant,Bakery,Supermarket,Bar,Restaurant,Wine Bar
8,Paranhos_2,Bakery,Supermarket,Portuguese Restaurant,Trail,Park,Soccer Field,Bus Station,Asian Restaurant,BBQ Joint,Mediterranean Restaurant
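
The column labels above rely on a three-element suffix list, which works for positions 1 to 10 but would mislabel later ones (21 would become "21th", for example). A small helper that handles the general case, offered as an optional alternative; the name `ordinal` is an assumption:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Parishes'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[1], columns[4], columns[10])
```

For `num_top_venues = 10` this yields the same column names as the table above.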


## 5. Clustering Porto's Parishes with K-Means Clustering Approach

In [137]:
# Using K-Means to cluster the parishes into 4 clusters
Parishes_grouped_clustering = Parishes_grouped.drop(columns='Parishes')
kmeans = KMeans(n_clusters=4, random_state=0).fit(Parishes_grouped_clustering)
kmeans.labels_

array([3, 1, 1, 1, 3, 0, 1, 1, 2], dtype=int32)
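
The label array is easier to read once it is paired with the parish names. The sketch below copies the labels from the output above and assumes they follow the alphabetical parish order of `Parishes_grouped` (the order shown in the most-common-venues table):

```python
from collections import Counter

# Labels copied from the output above, in the row order of Parishes_grouped
labels = [3, 1, 1, 1, 3, 0, 1, 1, 2]
parishes = ['Aldoar_1', 'Aldoar_3', 'Bonfim_1', 'Bonfim_2', 'Campanhã_1',
            'Campanhã_2', 'Cedofeita_1', 'Paranhos_1', 'Paranhos_2']

sizes = Counter(labels)
members = {}
for parish, label in zip(parishes, labels):
    members.setdefault(label, []).append(parish)

for label in sorted(members):
    print(label, sizes[label], members[label])
```

Label 1 holds five of the nine parishes, label 3 holds two, and labels 0 and 2 hold one each.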

In [143]:
#insert a column with the cluster labels into the dataframe with the venue information
Parishes_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Parishes_merged = unique_zip_codes.iloc[:16,:]

# join Parishes_venues_sorted onto the zip-code dataframe so each parish keeps its latitude/longitude
Parishes_merged = Parishes_merged.join(Parishes_venues_sorted.set_index('Parishes'), on='Parishes')

Parishes_merged

Unnamed: 0,Zip Code,Parishes,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4000,Bonfim_1,41.151712,-8.60232,1,Portuguese Restaurant,Restaurant,Café,Coffee Shop,Hotel,Hostel,Bakery,Gourmet Shop,Grocery Store,Clothing Store
1,4049,Bonfim_2,41.14584,-8.61081,1,Portuguese Restaurant,Bar,Hostel,Tapas Restaurant,Plaza,Wine Bar,Ice Cream Shop,Breakfast Spot,Theater,Japanese Restaurant
2,4050,Cedofeita_1,41.153101,-8.621015,1,Café,Bar,Portuguese Restaurant,Hotel,Italian Restaurant,Bakery,Restaurant,Hostel,Burger Joint,Plaza
3,4100,Aldoar_1,41.168905,-8.664943,3,Café,Sushi Restaurant,Pizza Place,Bakery,Pharmacy,Plaza,Portuguese Restaurant,Burger Joint,Sandwich Place,Paper / Office Supplies Store
4,4150,Aldoar_3,41.156145,-8.655311,1,Portuguese Restaurant,Supermarket,Gym,Fast Food Restaurant,Restaurant,Bakery,Park,Pool,Exhibit,Roof Deck
5,4200,Paranhos_1,41.17324,-8.599797,1,College Cafeteria,Café,Coffee Shop,Plaza,Portuguese Restaurant,Bakery,Supermarket,Bar,Restaurant,Wine Bar
6,4250,Paranhos_2,41.173505,-8.62843,2,Bakery,Supermarket,Portuguese Restaurant,Trail,Park,Soccer Field,Bus Station,Asian Restaurant,BBQ Joint,Mediterranean Restaurant
7,4300,Campanhã_1,41.15313,-8.575648,3,Café,Park,Portuguese Restaurant,Grocery Store,Bakery,Garden,Shoe Store,Wine Bar,Escape Room,Coffee Shop
8,4350,Campanhã_2,41.17,-8.58075,0,Bakery,Hotel,Platform,Chinese Restaurant,Electronics Store,Restaurant,Gas Station,Light Rail Station,Martial Arts School,Italian Restaurant


### 5.1 Mapping the Clusters

In [144]:
kclusters = 10

In [146]:
# create map
map_Parishes_clusters = folium.Map(location=[latitude_x, longitude_y], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
print(rainbow)
# add markers to the map

markers_colors = []
for lat, lon, nei , cluster in zip(Parishes_merged['Latitude'], 
                                   Parishes_merged['Longitude'], 
                                   Parishes_merged['Parishes'], 
                                   Parishes_merged['Cluster Labels']):
    label = folium.Popup(str(nei) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_Parishes_clusters)
       
map_Parishes_clusters

['#8000ff', '#4856fb', '#10a2f0', '#2adddd', '#62fbc4', '#9cfba4', '#d4dd80', '#ffa256', '#ff562c', '#ff0000']
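
Note that the marker colour is looked up with `rainbow[cluster-1]`, so label 0 maps to index -1, which in Python is the last colour in the list. A short check using the palette printed above:

```python
# Palette copied from the output above
rainbow = ['#8000ff', '#4856fb', '#10a2f0', '#2adddd', '#62fbc4',
           '#9cfba4', '#d4dd80', '#ffa256', '#ff562c', '#ff0000']

# Colour each of the four k-means labels receives under rainbow[cluster - 1]
label_colors = {label: rainbow[label - 1] for label in range(4)}
print(label_colors)
```

Labels 1 to 3 receive three neighbouring purple and blue shades, which can be hard to tell apart on the map. Setting `kclusters` to the number of clusters actually fitted (4) would likely spread the colours further, though that is a suggestion rather than what the notebook does.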


### 5.2 Examining the clusters

#### Cluster 1

In [148]:
Parishes_merged.loc[Parishes_merged['Cluster Labels'] == 0, Parishes_merged.columns[[1] + list(range(5, Parishes_merged.shape[1]))]]

Unnamed: 0,Parishes,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Campanhã_2,Bakery,Hotel,Platform,Chinese Restaurant,Electronics Store,Restaurant,Gas Station,Light Rail Station,Martial Arts School,Italian Restaurant


#### Cluster 2

In [149]:
Parishes_merged.loc[Parishes_merged['Cluster Labels'] == 1, Parishes_merged.columns[[1] + list(range(5, Parishes_merged.shape[1]))]]

Unnamed: 0,Parishes,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bonfim_1,Portuguese Restaurant,Restaurant,Café,Coffee Shop,Hotel,Hostel,Bakery,Gourmet Shop,Grocery Store,Clothing Store
1,Bonfim_2,Portuguese Restaurant,Bar,Hostel,Tapas Restaurant,Plaza,Wine Bar,Ice Cream Shop,Breakfast Spot,Theater,Japanese Restaurant
2,Cedofeita_1,Café,Bar,Portuguese Restaurant,Hotel,Italian Restaurant,Bakery,Restaurant,Hostel,Burger Joint,Plaza
4,Aldoar_3,Portuguese Restaurant,Supermarket,Gym,Fast Food Restaurant,Restaurant,Bakery,Park,Pool,Exhibit,Roof Deck
5,Paranhos_1,College Cafeteria,Café,Coffee Shop,Plaza,Portuguese Restaurant,Bakery,Supermarket,Bar,Restaurant,Wine Bar


#### Cluster 3

In [150]:
Parishes_merged.loc[Parishes_merged['Cluster Labels'] == 2, Parishes_merged.columns[[1] + list(range(5, Parishes_merged.shape[1]))]]

Unnamed: 0,Parishes,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Paranhos_2,Bakery,Supermarket,Portuguese Restaurant,Trail,Park,Soccer Field,Bus Station,Asian Restaurant,BBQ Joint,Mediterranean Restaurant


#### Cluster 4

In [151]:
Parishes_merged.loc[Parishes_merged['Cluster Labels'] == 3, Parishes_merged.columns[[1] + list(range(5, Parishes_merged.shape[1]))]]

Unnamed: 0,Parishes,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Aldoar_1,Café,Sushi Restaurant,Pizza Place,Bakery,Pharmacy,Plaza,Portuguese Restaurant,Burger Joint,Sandwich Place,Paper / Office Supplies Store
7,Campanhã_1,Café,Park,Portuguese Restaurant,Grocery Store,Bakery,Garden,Shoe Store,Wine Bar,Escape Room,Coffee Shop


## 6. Conclusion

In this project, we used the k-means clustering algorithm to group Porto's parishes into 4 clusters. Most parishes fell into Cluster 2 (Bonfim_1, Bonfim_2, Cedofeita_1, Aldoar_3 and Paranhos_1), which suggests that most of Porto's parishes are very similar in terms of their venues. The most frequent venues in these neighborhoods are Portuguese restaurants, cafés, bars, supermarkets and hostels.

The algorithm also revealed a cluster containing two parishes, Aldoar_1 and Campanhã_1 (Cluster 4). In both, cafés are the most common venue, and bakeries and Portuguese restaurants also rank among the ten most common venues, which may explain why these two neighborhoods were grouped together.

Finally, the algorithm found two clusters made up of a single parish each. This suggests that these two parishes are quite distinct from the others in terms of their venues. Cluster 1 contains only Campanhã_2, where the most common venues are bakeries, hotels and platforms; Cluster 3 contains only Paranhos_2, where the most common venues are bakeries, supermarkets and Portuguese restaurants.

## 7. Publication link

https://www.linkedin.com/pulse/final-report-capstone-project-battle-neighborhoods-finding-monteiro/?published=t