# Capstone Project - The Battle of the Neighborhoods (Week 2)
## Applied Data Science Capstone by IBM/Coursera
### Downtown vs Suburbs comparison for the city of Santander 
#### by Angel San Emeterio Herrera


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, our goal is to identify which streets in the outermost area of Santander, a medium-sized city on the north coast of Spain, are similar to those in the innermost downtown in terms of venues and services.

In other words, we compare two zones of the city: the innermost downtown (identified by the city's first postal code, 39001) and the outermost suburbs (identified by its last postal code, 39012), in order to pinpoint which areas of the latter resemble the former according to the venues and places of interest found nearby.

Urban development in Spain, and in Santander in particular, tends to concentrate services and venues downtown, leaving some suburban zones short of places of interest. We want to find out to what extent this holds for Santander.

That is, we want to know which streets in the outermost area of the city are similar to those in the very center.


## Data <a name="data"></a>

Based on the above definition of our problem, we will need:
- First, the postal codes, streets and coordinates (latitude and longitude) of the city of Santander. We will get these
  from the **Santander city council's API**, part of the city's open data repository.
  
- Second, the venues of the city of Santander and their locations, which we will obtain from the
  **Foursquare API**.

## Methodology <a name="methodology"></a>

We'll complete the project in several steps:

1) Get the geo data for the city of Santander by calling the city council API 
   (http://datos.santander.es/api/rest/datasets/callejero_numpostales.json).
   The city provides the coordinates in UTM format, so we will convert them into GPS (latitude/longitude) coordinates with the
   `utm` library. We will end up with a clean dataset containing just two postal codes and all the streets belonging to each one: the innermost, 39001, and the outermost, 39012.

2) Use the Foursquare API to get the venues associated with each location (street) belonging to each postal code.

3) Make the comparison through three complementary clustering analyses (using **k-means clustering**):

    - 2 clusters (minimum): because we have just two sets of locations, grouped by postal code
  
    - 5 clusters (medium): an intermediate solution, with more than double the number of original sets
  
    - 20 clusters (maximum): to get the most dispersion for postal code 39012 and the least for postal code 39001.

We will present one map for each clustering option, to better show which areas of the outermost suburb of Santander are similar to those in the very center of the city.
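
To make the clustering step more concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the usual procedure behind k-means. It is only an illustration on made-up 2D points: the notebook itself relies on `sklearn.cluster.KMeans`, and the function name `kmeans`, the toy points and the fixed starting centroids are assumptions made for this example.

```python
from math import dist  # Python 3.8+

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid,
    move each centroid to the mean of its points, repeat until stable."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # assignment step: index of the closest centroid for every point
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # update step: mean of the points assigned to each centroid
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# toy data: two tight groups of points (hypothetical, not Santander data)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

In the project, each "point" would be a street described by its venue profile rather than a 2D coordinate, and the number of starting centroids corresponds to the 2, 5 or 20 clusters listed above.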

## Analysis <a name="analysis"></a>

#### First, install and import all the libraries we will need

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes  
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions: pandas.io.json)

!conda install -c conda-forge folium=0.5.0 --yes  
import folium # map rendering library

# libraries for web scraping
%pip install beautifulsoup4   
from bs4 import BeautifulSoup
import urllib.request

# Convert an UTM coordinate into a (latitude, longitude) tuple
import utm

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans


print('<<<<<<<<<<<<<<<<<<< Libraries imported >>>>>>>>>>>>>>>>>>>')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... 
  - anaconda/win-64::openssl-1.1.1d-he774522_2
  - defaults/win-64::openssl-1.1.1d-he774522_2done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... 
  - anaconda/win-64::openssl-1.1.1d-he774522_2
  - defaults/win-64::openssl-1.1.1d-he774522_2done

# All requested packages already installed.

Note: you may need to restart the kernel to use updated packages.
<<<<<<<<<<<<<<<<<<< Libraries imported >>>>>>>>>>>>>>>>>>>


#### Get the data from the Santander city council open repository

In [2]:
url = "http://datos.santander.es/api/datos/callejero_numpostales.json"

results = requests.get(url).json()
results

{'summary': {'items': 13878,
  'items_per_page': 1000,
  'pages': 14,
  'current_page': 1},
 'resources': [{'callej:sigla': 'CL',
   'callej:distrito': '08',
   'rdf:type': 'callej:Numero-postal',
   'dct:spatial': 'POINT  ( 430592.56000000 4810310.05000000)',
   'gn:postalCode': '39011',
   'callej:portal-bis': ' ',
   'dc:modified': '2020-04-28T22:06:28.35Z',
   'callej:num-portal': '7',
   'callej:seccion': '020',
   'dc:identifier': '4728',
   'callej:nombre-clasif': 'FAUSTINO CAVADAS',
   'callej:portal-bloque': ' ',
   'callej:portal': ' ',
   'dc:description': '',
   'uri': 'http://datos.santander.es/api/datos/callejero_numpostales/2.json'},
  {'callej:sigla': 'CL',
   'callej:distrito': '08',
   'rdf:type': 'callej:Numero-postal',
   'dct:spatial': 'POINT  ( 429633.68000000 4813410.42000000)',
   'gn:postalCode': '39012',
   'callej:portal-bis': 'B',
   'dc:modified': '2020-04-28T22:06:28.35Z',
   'callej:num-portal': '6',
   'callej:seccion': '007',
   'dc:identifier': '3411',
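
The `summary` block above reports 13878 items spread over 14 pages of 1000 items each, so this single request only returns the first page. Below is a minimal sketch of how all pages could be collected. How a specific page is requested is not shown in this notebook, so that part is abstracted behind a `fetch_page` callable; `fetch_all_resources` and `fake_fetch_page` are hypothetical names, and the fake data only mimics the response shape.

```python
def fetch_all_resources(fetch_page):
    """Collect the 'resources' of every page.

    fetch_page(n) must return the decoded JSON of page n, shaped like the
    response above: {'summary': {'pages': ...}, 'resources': [...]}.
    """
    first = fetch_page(1)
    resources = list(first['resources'])
    for page in range(2, first['summary']['pages'] + 1):
        resources.extend(fetch_page(page)['resources'])
    return resources

# stand-in for the real API so the sketch runs offline (hypothetical data)
def fake_fetch_page(page, items=5, per_page=2):
    pages = -(-items // per_page)  # ceiling division
    start = (page - 1) * per_page
    return {
        'summary': {'items': items, 'items_per_page': per_page,
                    'pages': pages, 'current_page': page},
        'resources': [{'dc:identifier': str(i)}
                      for i in range(start, min(start + per_page, items))],
    }

all_resources = fetch_all_resources(fake_fetch_page)
print(len(all_resources))
```

With the real API, `fetch_page` would wrap `requests.get(...)` using whatever paging parameter the city council's documentation specifies.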

In [3]:
# assign the relevant part of the JSON (the street address records) to venues
venues = results['resources']

# transform the records into a dataframe
dataframe = json_normalize(venues)
dataframe.head()


Unnamed: 0,callej:sigla,callej:distrito,rdf:type,dct:spatial,gn:postalCode,callej:portal-bis,dc:modified,callej:num-portal,callej:seccion,dc:identifier,callej:nombre-clasif,callej:portal-bloque,callej:portal,dc:description,uri
0,CL,8,callej:Numero-postal,POINT ( 430592.56000000 4810310.05000000),39011,,2020-04-28T22:06:28.35Z,7,20,4728,FAUSTINO CAVADAS,,,,http://datos.santander.es/api/datos/callejero_...
1,CL,8,callej:Numero-postal,POINT ( 429633.68000000 4813410.42000000),39012,B,2020-04-28T22:06:28.35Z,6,7,3411,COSTA QUEBRADA,,,,http://datos.santander.es/api/datos/callejero_...
2,CL,8,callej:Numero-postal,POINT ( 434920.66000000 4814743.53000000),39012,,2020-04-28T22:06:28.35Z,162,12,1399,INES D. NOVAL,,,,http://datos.santander.es/api/datos/callejero_...
3,CL,8,callej:Numero-postal,POINT ( 434890.16000000 4814688.66000000),39012,,2020-04-28T22:06:28.35Z,174,12,1399,INES D. NOVAL,,,,http://datos.santander.es/api/datos/callejero_...
4,CL,7,callej:Numero-postal,POINT ( 435178.29000000 4813424.31000000),39006,,2020-04-28T22:06:28.35Z,62,12,172,FERNANDO DE LOS RIOS,,,,http://datos.santander.es/api/datos/callejero_...


In [4]:
# keep only columns with relevant information
filtered_columns = ['callej:sigla','callej:distrito','dct:spatial','gn:postalCode','callej:seccion','callej:nombre-clasif']
dataframe_filtered = dataframe.loc[:, filtered_columns]
dataframe_filtered.head()

Unnamed: 0,callej:sigla,callej:distrito,dct:spatial,gn:postalCode,callej:seccion,callej:nombre-clasif
0,CL,8,POINT ( 430592.56000000 4810310.05000000),39011,20,FAUSTINO CAVADAS
1,CL,8,POINT ( 429633.68000000 4813410.42000000),39012,7,COSTA QUEBRADA
2,CL,8,POINT ( 434920.66000000 4814743.53000000),39012,12,INES D. NOVAL
3,CL,8,POINT ( 434890.16000000 4814688.66000000),39012,12,INES D. NOVAL
4,CL,7,POINT ( 435178.29000000 4813424.31000000),39006,12,FERNANDO DE LOS RIOS


In [5]:
# rename columns
dataframe_filtered.rename(columns = {'callej:sigla':'StreetType','callej:distrito':'District','dct:spatial':'Coordinates','gn:postalCode':'PostalCode','callej:seccion':'Section','callej:nombre-clasif':'StreetName'}, inplace = True)
dataframe_filtered.head()

Unnamed: 0,StreetType,District,Coordinates,PostalCode,Section,StreetName
0,CL,8,POINT ( 430592.56000000 4810310.05000000),39011,20,FAUSTINO CAVADAS
1,CL,8,POINT ( 429633.68000000 4813410.42000000),39012,7,COSTA QUEBRADA
2,CL,8,POINT ( 434920.66000000 4814743.53000000),39012,12,INES D. NOVAL
3,CL,8,POINT ( 434890.16000000 4814688.66000000),39012,12,INES D. NOVAL
4,CL,7,POINT ( 435178.29000000 4813424.31000000),39006,12,FERNANDO DE LOS RIOS


In [6]:
# keep only records whose postal code is not empty (length > 1)
dataframe_filtered = dataframe_filtered[dataframe_filtered['PostalCode'].map(len) > 1].reset_index(drop=True)
dataframe_filtered.to_excel("test_filtered.xlsx")
print(dataframe_filtered.shape)

(970, 6)


In [7]:
dataframe_filtered

Unnamed: 0,StreetType,District,Coordinates,PostalCode,Section,StreetName
0,CL,8,POINT ( 430592.56000000 4810310.05000000),39011,20,FAUSTINO CAVADAS
1,CL,8,POINT ( 429633.68000000 4813410.42000000),39012,7,COSTA QUEBRADA
2,CL,8,POINT ( 434920.66000000 4814743.53000000),39012,12,INES D. NOVAL
3,CL,8,POINT ( 434890.16000000 4814688.66000000),39012,12,INES D. NOVAL
4,CL,7,POINT ( 435178.29000000 4813424.31000000),39006,12,FERNANDO DE LOS RIOS
5,CL,8,POINT ( 430363.59000000 4811007.80000000),39011,3,ALBERICO PARDO
6,AV,8,POINT ( 431483.33000000 4810345.64000000),39011,17,NUEVA MONTAÑA
7,CL,8,POINT ( 429610.03000000 4813102.36000000),39012,7,CORBAN
8,CL,5,POINT ( 433512.12000000 4811648.78000000),39009,15,MARQUES HERMIDA
9,CL,8,POINT ( 431163.14000000 4812256.02000000),39011,23,LOS CIRUELOS


In [8]:
# strip the POINT( ) wrapper and split the coordinates into 2 columns

# strip the POINT( ) wrapper (regex=False: treat the patterns as literal text)
dataframe_filtered['Coordinates'] = dataframe_filtered['Coordinates'].str.replace('POINT', '', regex=False)
dataframe_filtered['Coordinates'] = dataframe_filtered['Coordinates'].str.replace('(', '', regex=False)
dataframe_filtered['Coordinates'] = dataframe_filtered['Coordinates'].str.replace(')', '', regex=False)

# split coordinates into 2 columns
# with no arguments, split() separates on any run of whitespace
# note: despite the column names, the first value is the UTM easting and the second the UTM northing
dataframe_filtered[['Lat_UTM','Long_UTM']] = dataframe_filtered.Coordinates.str.split(expand=True) 

dataframe_filtered.drop(['Coordinates'], axis = 1, inplace=True)

dataframe_filtered.head()


Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Lat_UTM,Long_UTM
0,CL,8,39011,20,FAUSTINO CAVADAS,430592.56,4810310.05
1,CL,8,39012,7,COSTA QUEBRADA,429633.68,4813410.42
2,CL,8,39012,12,INES D. NOVAL,434920.66,4814743.53
3,CL,8,39012,12,INES D. NOVAL,434890.16,4814688.66
4,CL,7,39006,12,FERNANDO DE LOS RIOS,435178.29,4813424.31
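
As an alternative to chaining three `str.replace` calls, the `POINT ( x y )` strings can be parsed in one step with a regular expression. The sketch below is plain Python and independent of the dataframe; `parse_point` and `POINT_RE` are hypothetical names introduced for illustration, and the sample string is the first record shown above.

```python
import re

# matches e.g. 'POINT  ( 430592.56000000 4810310.05000000)'
POINT_RE = re.compile(r'POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)')

def parse_point(text):
    """Return (easting, northing) as floats from a 'POINT ( x y )' string."""
    match = POINT_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError('not a POINT string: {!r}'.format(text))
    return float(match.group(1)), float(match.group(2))

easting, northing = parse_point('POINT  ( 430592.56000000 4810310.05000000)')
print(easting, northing)
```

Raising an error on unexpected input makes malformed records visible instead of silently producing missing values.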


In [9]:
dataframe_filtered.shape

(970, 7)

In [10]:
dataframe_filtered.head()

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Lat_UTM,Long_UTM
0,CL,8,39011,20,FAUSTINO CAVADAS,430592.56,4810310.05
1,CL,8,39012,7,COSTA QUEBRADA,429633.68,4813410.42
2,CL,8,39012,12,INES D. NOVAL,434920.66,4814743.53
3,CL,8,39012,12,INES D. NOVAL,434890.16,4814688.66
4,CL,7,39006,12,FERNANDO DE LOS RIOS,435178.29,4813424.31


In [11]:
# Bidirectional UTM-WGS84 converter for Python: http://pypi.python.org/pypi/utm
# Used here to get WGS84 (geographical lat/long) coordinates

# utm.to_latlon needs the UTM zone; Santander lies in zone 30, northern hemisphere
# https://epsg.io/3042

gps_lat = dataframe_filtered['Lat_UTM']    # UTM easting, despite the name
gps_long = dataframe_filtered['Long_UTM']  # UTM northing, despite the name
gps_coordinates = []

# get the GPS coordinates for every UTM location
for lat, long in zip(gps_lat, gps_long):
    f_lat = float(lat)
    f_long = float(long)
    # to_latlon(easting, northing, zone_number, zone_letter) returns a (LATITUDE, LONGITUDE) tuple
    gps_coordinates.append(utm.to_latlon(f_lat, f_long, 30, 'N'))
    
# add the newly obtained GPS coordinates to the dataframe
dataframe_filtered['Coordinates_GPS'] = gps_coordinates

# split Coordinates_GPS in two columns
dataframe_filtered[['Latitude','Longitude']] = pd.DataFrame(dataframe_filtered['Coordinates_GPS'].tolist(),index=dataframe_filtered.index)

dataframe_filtered_GPS = dataframe_filtered
dataframe_filtered_GPS.head()


Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Lat_UTM,Long_UTM,Coordinates_GPS,Latitude,Longitude
0,CL,8,39011,20,FAUSTINO CAVADAS,430592.56,4810310.05,"(43.442475049099215, -3.8577163390893805)",43.442475,-3.857716
1,CL,8,39012,7,COSTA QUEBRADA,429633.68,4813410.42,"(43.47029921592573, -3.8699646672425647)",43.470299,-3.869965
2,CL,8,39012,12,INES D. NOVAL,434920.66,4814743.53,"(43.4827804017062, -3.8047656625950004)",43.48278,-3.804766
3,CL,8,39012,12,INES D. NOVAL,434890.16,4814688.66,"(43.482283728116016, -3.8051362254977525)",43.482284,-3.805136
4,CL,7,39006,12,FERNANDO DE LOS RIOS,435178.29,4813424.31,"(43.470925241101796, -3.8014231395857867)",43.470925,-3.801423


In [12]:
dataframe_filtered_GPS.shape

(970, 10)

In [13]:
# drop no longer needed UTM coordinates 
dataframe_filtered_GPS.drop(['Lat_UTM','Long_UTM'], axis = 1, inplace=True)
dataframe_filtered_GPS.head()

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Coordinates_GPS,Latitude,Longitude
0,CL,8,39011,20,FAUSTINO CAVADAS,"(43.442475049099215, -3.8577163390893805)",43.442475,-3.857716
1,CL,8,39012,7,COSTA QUEBRADA,"(43.47029921592573, -3.8699646672425647)",43.470299,-3.869965
2,CL,8,39012,12,INES D. NOVAL,"(43.4827804017062, -3.8047656625950004)",43.48278,-3.804766
3,CL,8,39012,12,INES D. NOVAL,"(43.482283728116016, -3.8051362254977525)",43.482284,-3.805136
4,CL,7,39006,12,FERNANDO DE LOS RIOS,"(43.470925241101796, -3.8014231395857867)",43.470925,-3.801423


In [14]:
dataframe_filtered_GPS.shape

(970, 8)

In [15]:
# drop no longer needed Coordinates_GPS tuple column
dataframe_filtered_GPS.drop(['Coordinates_GPS'], axis = 1, inplace=True)
dataframe_filtered_GPS.shape

(970, 7)

In [16]:
dataframe_filtered_GPS.head()

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude
0,CL,8,39011,20,FAUSTINO CAVADAS,43.442475,-3.857716
1,CL,8,39012,7,COSTA QUEBRADA,43.470299,-3.869965
2,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766
3,CL,8,39012,12,INES D. NOVAL,43.482284,-3.805136
4,CL,7,39006,12,FERNANDO DE LOS RIOS,43.470925,-3.801423


In [17]:
# Now we have the Santander dataframe with all the relevant data in the desired format
df_santander = dataframe_filtered_GPS

print('The Santander dataframe has {} districts and {} postal codes.'.format(
        len(df_santander['District'].unique()),
        len(df_santander['PostalCode'].unique()),
    )
)

The Santander dataframe has 8 districts and 13 postal codes.


#### Begin to prepare filtered dataframes

First, we list the postal codes present in the dataframe.

In [18]:
df_aux1 = df_santander['PostalCode'].unique()
df_aux1.sort()
print(df_aux1)

['36005' '39001' '39002' '39003' '39004' '39005' '39006' '39007' '39008'
 '39009' '39010' '39011' '39012']


Now we remove postal code 36005, an anomaly: it does not share the 39 prefix of all the other codes.

In [19]:
df_santander_f1 = df_santander.drop(df_santander[df_santander.PostalCode == '36005'].index)

Then we sort the remaining postal codes to find the first and the last.

In [20]:
df_aux2 = df_santander_f1['PostalCode'].unique()
df_aux2.sort()
print(df_aux2)

['39001' '39002' '39003' '39004' '39005' '39006' '39007' '39008' '39009'
 '39010' '39011' '39012']


In [21]:
df_santander_f1.shape

(969, 7)

Remove all rows that do not belong to the first or last postal code (39001, 39012).

In [22]:
df_santander_f2 = df_santander_f1[df_santander_f1['PostalCode'].isin(['39001', '39012'])]
df_aux2 = df_santander_f2['PostalCode'].unique()
df_aux2.sort()
print(df_aux2)

['39001' '39012']


In [23]:
df_santander_f2.groupby('PostalCode').count()

PostalCode,StreetType,District,Section,StreetName,Latitude,Longitude
39001,34,34,34,34,34,34
39012,322,322,322,322,322,322


In [24]:
df_santander_filtered = df_santander_f2

In [25]:
df_santander_filtered.shape

(356, 7)

Check for duplicate streets:

In [26]:
df_santander_filtered.groupby('StreetName').count()

StreetName,StreetType,District,PostalCode,Section,Latitude,Longitude
ALBERICIA,3,3,3,3,3,3
ALFONSINA STORNI,1,1,1,1,1,1
ALSEDO BUSTAMANTE,1,1,1,1,1,1
ARCILLERO,1,1,1,1,1,1
ARRIBA,2,2,2,2,2,2
ARSENIO ODRIOZOLA,2,2,2,2,2,2
ASILO,2,2,2,2,2,2
ATALAYA,2,2,2,2,2,2
AURELIO RUIZ CRESPO,1,1,1,1,1,1
AUTONOMIA,5,5,5,5,5,5


Remove the duplicates, keeping just the first occurrence of each street.

In [27]:
df_santander_filtered_unique = df_santander_filtered.drop_duplicates(subset=['StreetName'], keep='first')

In [28]:
df_santander_filtered_unique.reset_index(drop=True,inplace=True)
df_santander_filtered_unique

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude
0,CL,8,39012,7,COSTA QUEBRADA,43.470299,-3.869965
1,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766
2,CL,8,39012,7,CORBAN,43.467523,-3.870217
3,CL,7,39012,20,REPUENTE,43.465691,-3.834289
4,CL,7,39012,1,PRONILLO,43.464794,-3.830979
5,CL,8,39012,8,AVICHE,43.474001,-3.822505
6,CL,8,39012,24,SOMONTE,43.470577,-3.859825
7,CL,8,39012,9,CORBANERA,43.477869,-3.834161
8,CL,8,39012,24,ELENA QUIROGA,43.467943,-3.86489
9,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043


Check how many streets belong to each of the two postal codes under study:

In [29]:
df_santander_filtered_unique.groupby('PostalCode').count()

PostalCode,StreetType,District,Section,StreetName,Latitude,Longitude
39001,21,21,21,21,21,21
39012,96,96,96,96,96,96


In [30]:
df_santander_filtered_unique.shape

(117, 7)

#### Final Santander dataframe with just 2 postal codes and no duplicates

In [31]:
# Santander DataFrame
sdf = df_santander_filtered_unique

In [32]:
sdf

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude
0,CL,8,39012,7,COSTA QUEBRADA,43.470299,-3.869965
1,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766
2,CL,8,39012,7,CORBAN,43.467523,-3.870217
3,CL,7,39012,20,REPUENTE,43.465691,-3.834289
4,CL,7,39012,1,PRONILLO,43.464794,-3.830979
5,CL,8,39012,8,AVICHE,43.474001,-3.822505
6,CL,8,39012,24,SOMONTE,43.470577,-3.859825
7,CL,8,39012,9,CORBANERA,43.477869,-3.834161
8,CL,8,39012,24,ELENA QUIROGA,43.467943,-3.86489
9,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043


#### Plot the data on maps

In [33]:
address = 'Santander, ES'

geolocator = Nominatim(user_agent="santander_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the city of Santander are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the city of Santander are 43.4620412, -3.8099719.


Create a map of Santander with streets superimposed on top, each one with its district and postal code.

In [34]:
# create map of Santander using latitude and longitude values
map_santander = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for street, lat, long, district, postalcode in zip(sdf['StreetName'], sdf['Latitude'], 
                                                   sdf['Longitude'], sdf['District'], 
                                                   sdf['PostalCode']):
    label = '{} ({}, {})'.format(street, district, postalcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_santander)
    
map_santander

In [35]:
sdf.shape

(117, 7)

Now, we create a map of Santander with streets superimposed on top, but differentiating by postal code

In [36]:
# Dataframe of Postal Code 39001 alone
sdf_39001 = sdf.drop(sdf[ (sdf['PostalCode'] != '39001') ].index) 
sdf_39001 

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude
15,CL,1,39001,3,GUEVARA,43.46555,-3.804091
18,CL,3,39001,2,LA LEVA,43.467443,-3.807361
30,CL,3,39001,3,MARIA CRISTINA,43.468321,-3.806775
36,CT,3,39001,2,ATALAYA,43.467079,-3.806902
51,CL,2,39001,21,JUAN XXIII,43.46604,-3.812874
53,CL,3,39001,2,SAN CELEDONIO,43.46666,-3.807685
72,CL,3,39001,7,RIO DE LA PILA,43.468416,-3.804033
76,CL,1,39001,2,SAN JOSE,43.464434,-3.80418
77,CL,3,39001,4,ALSEDO BUSTAMANTE,43.465899,-3.80546
81,CL,2,39001,21,VIA CORNELIA,43.467975,-3.810516


In [37]:
# Remove duplicate streets to better compare in the following map
sdf_39001_unique = sdf_39001.drop_duplicates(subset=['StreetName'], keep='first').reset_index(drop=True)
sdf_39001_unique.shape

(21, 7)

In [38]:
# Dataframe of Postal Code 39012 alone
sdf_39012 = sdf.drop(sdf[ (sdf['PostalCode'] != '39012') ].index) 
sdf_39012 

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude
0,CL,8,39012,7,COSTA QUEBRADA,43.470299,-3.869965
1,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766
2,CL,8,39012,7,CORBAN,43.467523,-3.870217
3,CL,7,39012,20,REPUENTE,43.465691,-3.834289
4,CL,7,39012,1,PRONILLO,43.464794,-3.830979
5,CL,8,39012,8,AVICHE,43.474001,-3.822505
6,CL,8,39012,24,SOMONTE,43.470577,-3.859825
7,CL,8,39012,9,CORBANERA,43.477869,-3.834161
8,CL,8,39012,24,ELENA QUIROGA,43.467943,-3.86489
9,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043


In [39]:
# Remove duplicate streets to better compare in the following map
sdf_39012_unique = sdf_39012.drop_duplicates(subset=['StreetName'], keep='first').reset_index(drop=True)
sdf_39012_unique.shape

(96, 7)

First, we plot 39001 streets on a new map

In [40]:
# create map of Santander using latitude and longitude values
map_santander2 = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map for dataframe with 39001 streets
for street, lat, long, district, postalcode in zip(sdf_39001_unique['StreetName'], sdf_39001_unique['Latitude'], 
                                                   sdf_39001_unique['Longitude'], sdf_39001_unique['District'], 
                                                   sdf_39001_unique['PostalCode']):
    label = '{} ({}, {})'.format(street, district, postalcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='#62cc31',
        fill_opacity=0.7,
        parse_html=False).add_to(map_santander2)
    
map_santander2

Then, we add the 39012 streets to the same map

In [41]:
# add markers to map for dataframe with 39012 streets
for street, lat, long, district, postalcode in zip(sdf_39012_unique['StreetName'], sdf_39012_unique['Latitude'], 
                                                   sdf_39012_unique['Longitude'], sdf_39012_unique['District'], 
                                                   sdf_39012_unique['PostalCode']):
    label = '{} ({}, {})'.format(street, district, postalcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#FF0000',
        fill_opacity=0.7,
        parse_html=False).add_to(map_santander2)
    
map_santander2

So we have green for 39001 streets and red for 39012 streets.

#### Next, we are going to start utilizing the Foursquare API to explore the streets and segment them

In [42]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET:YOUR_CLIENT_SECRET


Let's explore the first street in our dataframe

Get the street's name.

In [43]:
sdf.loc[0, 'StreetName']

'COSTA QUEBRADA'

In [44]:
street_latitude = sdf.loc[0, 'Latitude'] # street latitude value
street_longitude = sdf.loc[0, 'Longitude'] # street longitude value

street_name = sdf.loc[0, 'StreetName'] # street name

print('Latitude and longitude values of {} are {}, {}.'.format(street_name, 
                                                               street_latitude, 
                                                               street_longitude))

Latitude and longitude values of COSTA QUEBRADA are 43.47029921592573, -3.8699646672425647.


Now we get the venues around the first street of the dataframe.

In [45]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # search radius around the street, in meters

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    street_latitude, 
    street_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20180605&ll=43.47029921592573,-3.8699646672425647&radius=500&limit=100'

In [46]:
results = requests.get(url).json()
results


{'meta': {'code': 200, 'requestId': '5ea9c8e214a1267c9ca13caf'},
 'response': {'headerLocation': 'Soto de la Marina',
  'headerFullLocation': 'Soto de la Marina',
  'headerLocationGranularity': 'city',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 43.47479922042573,
    'lng': -3.86377559463083},
   'sw': {'lat': 43.465799211425725, 'lng': -3.8761537398542996}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c863eb52f1c236ac23c5b43',
       'name': 'El Llar',
       'location': {'lat': 43.4661330769018,
        'lng': -3.8708171856596785,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.4661330769018,
          'lng': -3.8708171856596785}],
        'distance': 468,
        'cc': 'ES',
        'country': 'España',
        'formattedAddress'
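
The `'distance': 468` field in the response above is the distance in meters from the query point to the venue, which is why it stays below the 500 m radius. As a sanity check, the haversine formula gives a similar figure from the two coordinate pairs. This is only a sketch: `haversine_m` is a hypothetical helper, the Earth radius of 6371 km is an assumed mean value, and Foursquare's own distance computation is not documented here.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

street = (43.47029921592573, -3.8699646672425647)   # COSTA QUEBRADA
venue = (43.4661330769018, -3.8708171856596785)     # El Llar
d = haversine_m(*street, *venue)
print(round(d))  # should land close to the 468 m reported by Foursquare
```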

We know that all the information is in the items key. Now, let's borrow the get_category_type function from the Foursquare lab.

In [47]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a pandas dataframe.

In [48]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,El Llar,Spanish Restaurant,43.466133,-3.870817
1,Casa Miguel,Spanish Restaurant,43.466709,-3.868025
2,Restaurante Casa Miguel,Spanish Restaurant,43.467554,-3.87201
3,El Llar Terraza,Pub,43.466329,-3.870943


And how many venues were returned by Foursquare? 

In [49]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


In [50]:
sdf.head()

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude
0,CL,8,39012,7,COSTA QUEBRADA,43.470299,-3.869965
1,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766
2,CL,8,39012,7,CORBAN,43.467523,-3.870217
3,CL,7,39012,20,REPUENTE,43.465691,-3.834289
4,CL,7,39012,1,PRONILLO,43.464794,-3.830979


Let's create a function to repeat the same process for all the streets in our dataframe

In [51]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for i, (name, lat, lng) in enumerate(zip(names, latitudes, longitudes)):  # no dependency on the global sdf
        print(i, name, lat, lng)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
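
The core of `getNearbyVenues` is the step that flattens the nested `items` list of each response into one flat row per venue. The sketch below isolates that step so it can be tried without calling the API. `extract_venue_rows` is a hypothetical helper, the first item is a trimmed-down version of the El Llar entry shown earlier, and the second item is invented to show how a venue with no category could be handled.

```python
def extract_venue_rows(street, lat, lng, items):
    """Flatten Foursquare 'items' into
    (street, lat, lng, venue, venue_lat, venue_lng, category) tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # use the first category's name, or None when the list is empty
        category = categories[0]['name'] if categories else None
        rows.append((street, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# trimmed-down items mimicking the structure of the response shown earlier
items = [
    {'venue': {'name': 'El Llar',
               'location': {'lat': 43.4661330769018, 'lng': -3.8708171856596785},
               'categories': [{'name': 'Spanish Restaurant'}]}},
    {'venue': {'name': 'Venue without category',   # hypothetical entry
               'location': {'lat': 43.0, 'lng': -3.0},
               'categories': []}},
]
rows = extract_venue_rows('COSTA QUEBRADA', 43.470299, -3.869965, items)
for row in rows:
    print(row)
```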

In [52]:
sdf_venues = getNearbyVenues(names=sdf['StreetName'],
                                   latitudes=sdf['Latitude'],
                                   longitudes=sdf['Longitude']
                                  )

0 COSTA QUEBRADA 43.47029921592573 -3.8699646672425647
1 INES D. NOVAL 43.4827804017062 -3.8047656625950004
2 CORBAN 43.46752343168151 -3.8702172330613376
3 REPUENTE 43.46569124599082 -3.834288618599536
4 PRONILLO 43.464794477934085 -3.8309787917538314
5 AVICHE 43.47400132304138 -3.8225048599030576
6 SOMONTE 43.47057652735494 -3.859825362324643
7 CORBANERA 43.47786941190596 -3.8341612830904124
8 ELENA QUIROGA 43.46794284583219 -3.8648895519109225
9 MAZO DE ABAJO 43.47042717028424 -3.8610426415214008
10 LA TORRE 43.47528082309851 -3.8158705417663645
11 FUMORIL 43.48394842198416 -3.8087238553145037
12 TRISTANA 43.47291994327368 -3.829728243192483
13 ARRIBA 43.479822944740825 -3.8119278521899695
14 DOCTOR DIEGO MADRAZO 43.48477563021825 -3.794004841854865
15 GUEVARA 43.46555039455035 -3.8040906355932336
16 JESUS OTERO 43.48123073340663 -3.8108875583491573
17 SAN MIGUEL 43.47102909219792 -3.8382517145933024
18 LA LEVA 43.46744283213899 -3.807361437234898
19 CAMUS 43.48632584178901 -3.79812

Let's check how many venues were returned for each street

In [53]:
sdf_venues.head()

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,COSTA QUEBRADA,43.470299,-3.869965,El Llar,43.466133,-3.870817,Spanish Restaurant
1,COSTA QUEBRADA,43.470299,-3.869965,Casa Miguel,43.466709,-3.868025,Spanish Restaurant
2,COSTA QUEBRADA,43.470299,-3.869965,Restaurante Casa Miguel,43.467554,-3.87201,Spanish Restaurant
3,COSTA QUEBRADA,43.470299,-3.869965,El Llar Terraza,43.466329,-3.870943,Pub
4,INES D. NOVAL,43.48278,-3.804766,Mercadona,43.481584,-3.800052,Supermarket


In [54]:
sdf_venues.shape

(1103, 7)

In [55]:
sdf_venues.groupby('Street').count()

Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
ALBERICIA,5,5,5,5,5,5
ALFONSINA STORNI,6,6,6,6,6,6
ALSEDO BUSTAMANTE,66,66,66,66,66,66
ARCILLERO,57,57,57,57,57,57
ARRIBA,2,2,2,2,2,2
ARSENIO ODRIOZOLA,8,8,8,8,8,8
ASILO,35,35,35,35,35,35
ATALAYA,29,29,29,29,29,29
AURELIO RUIZ CRESPO,31,31,31,31,31,31
AUTONOMIA,8,8,8,8,8,8


Let's find out how many unique categories appear among all the returned venues.

In [56]:
print('There are {} unique categories.'.format(len(sdf_venues['Venue Category'].unique())))

There are 74 unique categories.


#### Analyze Each Street

In [57]:
# one hot encoding
sdf_onehot = pd.get_dummies(sdf_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
sdf_onehot['Street'] = sdf_venues['Street'] 

# move street column to the first column
street_df = sdf_onehot['Street']
sdf_onehot.drop(labels=['Street'], axis=1,inplace = True)
sdf_onehot.insert(0, 'Street', street_df)

sdf_onehot.head()

Unnamed: 0,Street,Art Gallery,Arts & Crafts Store,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach,Beer Bar,Big Box Store,Bike Rental / Bike Share,Bookstore,Boutique,Breakfast Spot,Brewery,Building,Burger Joint,Café,Campground,Castle,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store,Dessert Shop,Diner,Electronics Store,Fast Food Restaurant,Food & Drink Shop,Food Court,Football Stadium,Frozen Yogurt Shop,Garden,Gastropub,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hostel,Hotel,Ice Cream Shop,Italian Restaurant,Light Rail Station,Lighthouse,Mediterranean Restaurant,Mexican Restaurant,Movie Theater,Nightclub,Park,Pharmacy,Pizza Place,Playground,Plaza,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Snack Place,Soccer Field,Spanish Restaurant,Sports Bar,Sports Club,Stables,Stadium,Supermarket,Surf Spot,Tapas Restaurant,Theme Park,Video Store,Wine Bar
0,COSTA QUEBRADA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
1,COSTA QUEBRADA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2,COSTA QUEBRADA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
3,COSTA QUEBRADA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,INES D. NOVAL,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0


And let's examine the new dataframe size.

In [58]:
sdf_onehot.shape

(1103, 75)

Next, let's group rows by street and take the mean of the frequency of occurrence of each category.

In [59]:
sdf_grouped = sdf_onehot.groupby('Street').mean().reset_index()
sdf_grouped

Unnamed: 0,Street,Art Gallery,Arts & Crafts Store,Athletics & Sports,BBQ Joint,Bakery,Bar,Beach,Beer Bar,Big Box Store,Bike Rental / Bike Share,Bookstore,Boutique,Breakfast Spot,Brewery,Building,Burger Joint,Café,Campground,Castle,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store,Dessert Shop,Diner,Electronics Store,Fast Food Restaurant,Food & Drink Shop,Food Court,Football Stadium,Frozen Yogurt Shop,Garden,Gastropub,Grocery Store,Gym,Gym / Fitness Center,Historic Site,History Museum,Hostel,Hotel,Ice Cream Shop,Italian Restaurant,Light Rail Station,Lighthouse,Mediterranean Restaurant,Mexican Restaurant,Movie Theater,Nightclub,Park,Pharmacy,Pizza Place,Playground,Plaza,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Snack Place,Soccer Field,Spanish Restaurant,Sports Bar,Sports Club,Stables,Stadium,Supermarket,Surf Spot,Tapas Restaurant,Theme Park,Video Store,Wine Bar
0,ALBERICIA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0
1,ALFONSINA STORNI,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0
2,ALSEDO BUSTAMANTE,0.0,0.0,0.0,0.0,0.030303,0.106061,0.0,0.015152,0.0,0.0,0.030303,0.0,0.0,0.0,0.015152,0.030303,0.090909,0.0,0.0,0.0,0.0,0.045455,0.015152,0.015152,0.0,0.0,0.0,0.015152,0.030303,0.0,0.015152,0.0,0.030303,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.015152,0.015152,0.0,0.015152,0.030303,0.015152,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.015152,0.0,0.045455,0.0,0.030303,0.015152,0.0,0.0,0.0,0.121212,0.0,0.015152,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.015152
3,ARCILLERO,0.0,0.0,0.0,0.0,0.017544,0.122807,0.0,0.017544,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.017544,0.122807,0.0,0.0,0.017544,0.017544,0.017544,0.035088,0.017544,0.0,0.017544,0.0,0.017544,0.017544,0.017544,0.017544,0.017544,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.017544,0.035088,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.0,0.052632,0.0,0.070175,0.0,0.0,0.0,0.0,0.070175,0.0,0.017544,0.0,0.0,0.0,0.0,0.087719,0.0,0.0,0.035088
4,ARRIBA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,ARSENIO ODRIOZOLA,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,ASILO,0.0,0.0,0.0,0.0,0.028571,0.085714,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.171429,0.0,0.0,0.0,0.028571,0.057143,0.028571,0.028571,0.0,0.0,0.0,0.0,0.057143,0.028571,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.028571,0.0,0.0,0.0,0.0,0.057143,0.0,0.028571,0.0,0.0,0.0,0.0,0.114286,0.0,0.0,0.028571
7,ATALAYA,0.0,0.0,0.0,0.0,0.034483,0.103448,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.172414,0.0,0.0,0.0,0.0,0.068966,0.0,0.034483,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.034483,0.0,0.0,0.0,0.068966,0.0,0.034483,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.034483
8,AURELIO RUIZ CRESPO,0.0,0.0,0.0,0.0,0.0,0.225806,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.129032,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.032258,0.064516,0.032258,0.0,0.0,0.0,0.064516,0.0,0.032258,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0
9,AUTONOMIA,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


And let's examine also the size of this dataframe.

In [60]:
sdf_grouped.shape

(117, 75)
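
To make the encode-then-average steps above concrete, here is a minimal sketch on a toy table. The street names are hypothetical, and `dtype=float` is an assumption added so that the dummy columns are numeric regardless of the pandas version.

```python
import pandas as pd

# Toy venue table: hypothetical streets, one row per venue.
venues = pd.DataFrame({
    "Street": ["STREET A", "STREET A", "STREET A", "STREET A", "STREET B"],
    "Venue Category": ["Spanish Restaurant", "Spanish Restaurant",
                       "Spanish Restaurant", "Pub", "Supermarket"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Street", venues["Street"])

# Averaging the indicators per street gives the share of each category on that street.
grouped = onehot.groupby("Street").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, so the features describe the mix of categories on a street and not the number of venues: a street with 2 venues and a street with 66 can end up with similar profiles.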

Let's print each street along with its top 5 most common venue categories.

In [61]:
num_top_venues = 5

for st in sdf_grouped['Street']:
    print("----"+st+"----")
    temp = sdf_grouped[sdf_grouped['Street'] == st].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ALBERICIA----
               venue  freq
0         Restaurant   0.2
1        Supermarket   0.2
2      Big Box Store   0.2
3  Food & Drink Shop   0.2
4        Pizza Place   0.2


----ALFONSINA STORNI----
         venue  freq
0         Park  0.17
1       Bakery  0.17
2        Hotel  0.17
3  Supermarket  0.17
4      Stadium  0.17


----ALSEDO BUSTAMANTE----
                venue  freq
0  Spanish Restaurant  0.12
1                 Bar  0.11
2                Café  0.09
3    Tapas Restaurant  0.09
4           Nightclub  0.05


----ARCILLERO----
                venue  freq
0                 Bar  0.12
1                Café  0.12
2    Tapas Restaurant  0.09
3  Spanish Restaurant  0.07
4          Restaurant  0.07


----ARRIBA----
                venue  freq
0  Italian Restaurant   0.5
1  Spanish Restaurant   0.5
2         Art Gallery   0.0
3          Lighthouse   0.0
4            Pharmacy   0.0


----ARSENIO ODRIOZOLA----
                venue  freq
0          Restaurant  0.12
1  Spanish Res

4  Mediterranean Restaurant  0.00


----LA CUEVONA----
                venue  freq
0  Spanish Restaurant  0.67
1                 Pub  0.33
2         Art Gallery  0.00
3          Lighthouse  0.00
4                Park  0.00


----LA GLORIA----
                      venue  freq
0               Coffee Shop  0.25
1                  Boutique  0.25
2          Football Stadium  0.25
3  Bike Rental / Bike Share  0.25
4               Pizza Place  0.00


----LA HONDAL----
               venue  freq
0        Supermarket   0.4
1         Restaurant   0.2
2  Food & Drink Shop   0.2
3        Pizza Place   0.2
4        Art Gallery   0.0


----LA LEVA----
                venue  freq
0                Café  0.24
1    Tapas Restaurant  0.12
2  Spanish Restaurant  0.12
3          Restaurant  0.06
4        Burger Joint  0.06


----LA LLANILLA----
                      venue  freq
0                Restaurant  0.33
1                    Bakery  0.33
2                Playground  0.33
3               Art Gallery

              venue  freq
0              Café  0.21
1               Bar  0.16
2  Tapas Restaurant  0.11
3        Restaurant  0.05
4      Concert Hall  0.05




Let's put it into a new dataframe

First, let's write a function to sort the venues in descending order.

In [62]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each street.

In [63]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no suffix defined beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
streets_venues_sorted = pd.DataFrame(columns=columns)
streets_venues_sorted['Street'] = sdf_grouped['Street']

for ind in np.arange(sdf_grouped.shape[0]):
    streets_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sdf_grouped.iloc[ind, :], num_top_venues)

streets_venues_sorted.head()

Unnamed: 0,Street,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ALBERICIA,Pizza Place,Supermarket,Restaurant,Big Box Store,Food & Drink Shop,Fast Food Restaurant,Electronics Store,Diner,Dessert Shop,Department Store
1,ALFONSINA STORNI,Park,Seafood Restaurant,Supermarket,Stadium,Bakery,Hotel,Castle,Church,Clothing Store,Cocktail Bar
2,ALSEDO BUSTAMANTE,Spanish Restaurant,Bar,Café,Tapas Restaurant,Nightclub,Plaza,Cocktail Bar,Bakery,Frozen Yogurt Shop,Restaurant
3,ARCILLERO,Café,Bar,Tapas Restaurant,Restaurant,Spanish Restaurant,Plaza,Ice Cream Shop,Wine Bar,Coffee Shop,Convention Center
4,ARRIBA,Spanish Restaurant,Italian Restaurant,Wine Bar,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store


In [64]:
sdf_grouped.shape

(117, 75)

In [65]:
streets_venues_sorted.shape

(117, 11)

In [66]:
streets_venues_sorted

Unnamed: 0,Street,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ALBERICIA,Pizza Place,Supermarket,Restaurant,Big Box Store,Food & Drink Shop,Fast Food Restaurant,Electronics Store,Diner,Dessert Shop,Department Store
1,ALFONSINA STORNI,Park,Seafood Restaurant,Supermarket,Stadium,Bakery,Hotel,Castle,Church,Clothing Store,Cocktail Bar
2,ALSEDO BUSTAMANTE,Spanish Restaurant,Bar,Café,Tapas Restaurant,Nightclub,Plaza,Cocktail Bar,Bakery,Frozen Yogurt Shop,Restaurant
3,ARCILLERO,Café,Bar,Tapas Restaurant,Restaurant,Spanish Restaurant,Plaza,Ice Cream Shop,Wine Bar,Coffee Shop,Convention Center
4,ARRIBA,Spanish Restaurant,Italian Restaurant,Wine Bar,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store
5,ARSENIO ODRIOZOLA,Campground,Gym,Brewery,Beach,Restaurant,Spanish Restaurant,Italian Restaurant,Snack Place,Convenience Store,Clothing Store
6,ASILO,Café,Tapas Restaurant,Bar,Plaza,Diner,Cocktail Bar,Spanish Restaurant,Wine Bar,Electronics Store,Concert Hall
7,ATALAYA,Café,Tapas Restaurant,Bar,Diner,Cocktail Bar,Spanish Restaurant,Restaurant,Wine Bar,Sandwich Place,Concert Hall
8,AURELIO RUIZ CRESPO,Bar,Café,Tapas Restaurant,Diner,Burger Joint,Restaurant,Cocktail Bar,Spanish Restaurant,Grocery Store,Light Rail Station
9,AUTONOMIA,Seafood Restaurant,Bakery,Gym,Gym / Fitness Center,Brewery,Italian Restaurant,Restaurant,Snack Place,Wine Bar,Cocktail Bar
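
One thing to keep in mind when reading this table: a street with fewer than ten distinct categories still gets ten entries, and the trailing ones have frequency 0 (ARRIBA, for example, has only two non-zero categories in the top-5 printout above). The order among tied frequencies is also arbitrary. A possible variant, sketched under the assumption that blanks are preferable to filler, keeps only the categories that actually occur. The function name `top_nonzero_categories` is hypothetical and not part of the notebook.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Return up to num_top_venues category names with frequency > 0, padded with None."""
    freqs = row.iloc[1:].astype(float)          # skip the leading 'Street' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind="stable")
    names = list(freqs.index[:num_top_venues])
    return names + [None] * (num_top_venues - len(names))

# ARRIBA's non-zero frequencies as shown above, plus two zero-frequency categories.
row = pd.Series({
    "Street": "ARRIBA",
    "Art Gallery": 0.0,
    "Italian Restaurant": 0.5,
    "Spanish Restaurant": 0.5,
    "Wine Bar": 0.0,
})
result = top_nonzero_categories(row, 4)
print(result)
```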


#### Cluster Streets

We'll compare three clusterings and pick the best one, defined here as the one that places the fewest streets from postal code 39012 in the same cluster as the streets from postal code 39001.

So we will implement three clustering options:

- a) 2 clusters (given that we are comparing 2 sets of streets),
- b) 5 clusters, as an intermediate solution, and
- c) 20 clusters, as a way of obtaining the fewest possible streets from postal code 39012 clustered with 39001 ones.
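
The comparison criterion described above can be sketched as a small helper that, for a given number of clusters, counts how many outer streets land in the cluster holding most of the downtown streets. This is only an illustration on synthetic data: the helper name `count_outer_with_inner`, the made-up feature values and the explicit `n_init=10` are assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def count_outer_with_inner(features, postal_codes, k, inner="39001", outer="39012"):
    """Fit k-means and count `outer` streets in the cluster that holds most `inner` streets."""
    labels = pd.Series(KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(features))
    codes = pd.Series(np.asarray(postal_codes))
    inner_cluster = labels[codes == inner].mode().iloc[0]
    return int(((labels == inner_cluster) & (codes == outer)).sum())

# Synthetic stand-in for sdf_grouped: 3 category frequencies for 12 made-up streets.
rng = np.random.default_rng(0)
downtown = rng.normal([0.7, 0.2, 0.1], 0.02, size=(4, 3))    # postal code 39001
lookalike = rng.normal([0.7, 0.2, 0.1], 0.02, size=(3, 3))   # 39012, downtown-like
different = rng.normal([0.1, 0.1, 0.8], 0.02, size=(5, 3))   # 39012, unlike downtown
features = np.vstack([downtown, lookalike, different])
postal_codes = ["39001"] * 4 + ["39012"] * 8

counts = {k: count_outer_with_inner(features, postal_codes, k) for k in (2, 3, 4)}
print(counts)
```

Note that this count can only stay the same or shrink as clusters get smaller, so the criterion on its own tends to favor the largest number of clusters; it is worth reading it together with how many downtown streets remain in that cluster.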

First of all, we check that the dataframes to be joined have the same number of rows.

In [67]:
print(sdf.shape , '<-- sdf shape, to test if dataframes to join have equal dimensions')
print(streets_venues_sorted.shape , '<-- streets venues, to test if dataframes to join have equal dimensions')

(117, 7) <-- sdf shape, to test if dataframes to join have equal dimensions
(117, 11) <-- streets venues, to test if dataframes to join have equal dimensions


We'll export both dataframes to Excel to keep a reference copy.

In [68]:
# send both dataframes to excel
sdf.to_excel("sdf.xlsx")
streets_venues_sorted.to_excel("streets_venues_sorted.xlsx")

Now we start the clustering process itself.

#### Run k-means to cluster the streets into 2 clusters.

In [69]:
# make a working copy to keep a reference of the original sorted dataframe
streets_venues_sorted2 = streets_venues_sorted.copy()

In [70]:
# set number of clusters
kclusters = 2

sdf_grouped_clustering2 = sdf_grouped.drop(columns='Street')

# run k-means clustering
kmeans2 = KMeans(n_clusters=kclusters, random_state=0).fit(sdf_grouped_clustering2)

# check cluster labels generated for each row in the dataframe
kmeans2.labels_[0:10] 

array([1, 1, 1, 1, 0, 1, 1, 1, 1, 1])

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each street.

In [71]:
streets_venues_sorted2.insert(0, 'Cluster Labels', kmeans2.labels_)

sdf_merged2 = sdf

# join the street data in sdf (postal code, latitude/longitude) with the cluster label and top venues of each street
sdf_merged2 = sdf_merged2.join(streets_venues_sorted2.set_index('Street'), on='StreetName')

In [72]:
sdf_merged2

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,8,39012,7,COSTA QUEBRADA,43.470299,-3.869965,0,Spanish Restaurant,Pub,Campground,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
1,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766,1,Video Store,BBQ Joint,Supermarket,Italian Restaurant,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
2,CL,8,39012,7,CORBAN,43.467523,-3.870217,0,Spanish Restaurant,Bakery,Sports Bar,Seafood Restaurant,Convention Center,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall
3,CL,7,39012,20,REPUENTE,43.465691,-3.834289,0,Supermarket,Pizza Place,Spanish Restaurant,Food & Drink Shop,Convenience Store,Castle,Church,Clothing Store,Cocktail Bar,Coffee Shop
4,CL,7,39012,1,PRONILLO,43.464794,-3.830979,0,Spanish Restaurant,Supermarket,Pizza Place,Food & Drink Shop,Convenience Store,Castle,Church,Clothing Store,Cocktail Bar,Coffee Shop
5,CL,8,39012,8,AVICHE,43.474001,-3.822505,0,Spanish Restaurant,Restaurant,Campground,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
6,CL,8,39012,24,SOMONTE,43.470577,-3.859825,1,Arts & Crafts Store,Tapas Restaurant,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
7,CL,8,39012,9,CORBANERA,43.477869,-3.834161,1,Restaurant,Beach,Seafood Restaurant,Wine Bar,Convention Center,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
8,CL,8,39012,24,ELENA QUIROGA,43.467943,-3.86489,0,Spanish Restaurant,Bakery,Restaurant,Seafood Restaurant,Convention Center,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall
9,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043,1,Restaurant,Arts & Crafts Store,Tapas Restaurant,Playground,Wine Bar,Convenience Store,Church,Clothing Store,Cocktail Bar,Coffee Shop
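
`DataFrame.join` with `on=` behaves as a left join: every row of `sdf` is kept, and a street with no match in the venues table would get `NaN` in `Cluster Labels`. Both frames have 117 rows here, so no gaps are expected, but the check is cheap. A minimal sketch on toy data follows; the street `NO VENUES ST` is hypothetical, while the other two rows reuse values shown in this notebook.

```python
import pandas as pd

streets = pd.DataFrame({
    "StreetName": ["GUEVARA", "LA LEVA", "NO VENUES ST"],
    "PostalCode": ["39001", "39001", "39012"],
})
labels = pd.DataFrame({
    "Street": ["GUEVARA", "LA LEVA"],
    "Cluster Labels": [1, 1],
})

# Left join on the street name: rows of `streets` are never dropped.
merged = streets.join(labels.set_index("Street"), on="StreetName")

# Streets that did not receive a cluster label.
unmatched = merged[merged["Cluster Labels"].isna()]
print(unmatched["StreetName"].tolist())
```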


Finally, let's visualize the resulting clusters

In [73]:
# create map
map_clusters_2 = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sdf_merged2['Latitude'], sdf_merged2['Longitude'], sdf_merged2['StreetName'], sdf_merged2['Cluster Labels']):
#   label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    label = folium.Popup(poi + ' Street ')
   
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_2)
       
map_clusters_2
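
In the cell above, `ys` is only used for its length, so the palette is simply `kclusters` evenly spaced colors. Also, `rainbow[cluster-1]` maps cluster 0 to the last color through negative indexing, which works but is easy to misread. Below is a minimal dependency-free sketch of the same idea. It swaps matplotlib's `cm.rainbow` for the standard library's `colorsys`, so the exact colors differ, and the helper name `make_palette` is hypothetical.

```python
import colorsys

def make_palette(n_clusters):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.9, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = make_palette(5)
for cluster in range(5):
    # Index the palette directly by cluster label.
    print(cluster, palette[cluster])
```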

The cluster containing all the streets from postal code 39001, plus several from 39012 (in this case, cluster 1):

In [74]:
sdf_cluster2 = sdf_merged2[sdf_merged2['Cluster Labels'] == 1].reset_index(drop=True)
sdf_cluster2 

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766,1,Video Store,BBQ Joint,Supermarket,Italian Restaurant,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
1,CL,8,39012,24,SOMONTE,43.470577,-3.859825,1,Arts & Crafts Store,Tapas Restaurant,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
2,CL,8,39012,9,CORBANERA,43.477869,-3.834161,1,Restaurant,Beach,Seafood Restaurant,Wine Bar,Convention Center,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
3,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043,1,Restaurant,Arts & Crafts Store,Tapas Restaurant,Playground,Wine Bar,Convenience Store,Church,Clothing Store,Cocktail Bar,Coffee Shop
4,CL,8,39012,8,LA TORRE,43.475281,-3.815871,1,Italian Restaurant,Bar,Wine Bar,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store
5,CL,8,39012,11,FUMORIL,43.483948,-3.808724,1,Italian Restaurant,Castle,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store,Wine Bar
6,AV,8,39012,27,DOCTOR DIEGO MADRAZO,43.484776,-3.794005,1,Snack Place,Restaurant,Campground,Bakery,Gym,Beach,Italian Restaurant,Brewery,Spanish Restaurant,Seafood Restaurant
7,CL,1,39001,3,GUEVARA,43.46555,-3.804091,1,Spanish Restaurant,Café,Bar,Tapas Restaurant,Restaurant,Wine Bar,Ice Cream Shop,Pizza Place,Plaza,Coffee Shop
8,CL,8,39012,9,SAN MIGUEL,43.471029,-3.838252,1,Gym / Fitness Center,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Dessert Shop
9,CL,3,39001,2,LA LEVA,43.467443,-3.807361,1,Café,Tapas Restaurant,Spanish Restaurant,Burger Joint,Diner,Bar,Sports Club,Cocktail Bar,Beer Bar,Concert Hall


In [75]:
sdf_cluster2.groupby('PostalCode').count()

PostalCode,StreetType,District,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39001,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21
39012,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66,66
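
The `groupby(...).count()` output repeats the same number in every column. A more compact view of how postal codes spread over all clusters at once could be obtained with `pd.crosstab`. The sketch below uses toy data (the five rows are made up for illustration); with the notebook's data the same call would take the `PostalCode` and `Cluster Labels` columns of `sdf_merged2`.

```python
import pandas as pd

# Toy stand-in for the merged dataframe: five hypothetical streets.
merged = pd.DataFrame({
    "PostalCode": ["39001", "39001", "39012", "39012", "39012"],
    "Cluster Labels": [1, 1, 1, 0, 0],
})

# Rows: postal codes; columns: cluster labels; cells: number of streets.
table = pd.crosstab(merged["PostalCode"], merged["Cluster Labels"])
print(table)
```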


In [77]:
sdf_cluster2_39012 = sdf_cluster2[sdf_cluster2['PostalCode'] == '39012'].reset_index(drop=True)
sdf_cluster2_39012

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766,1,Video Store,BBQ Joint,Supermarket,Italian Restaurant,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
1,CL,8,39012,24,SOMONTE,43.470577,-3.859825,1,Arts & Crafts Store,Tapas Restaurant,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
2,CL,8,39012,9,CORBANERA,43.477869,-3.834161,1,Restaurant,Beach,Seafood Restaurant,Wine Bar,Convention Center,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
3,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043,1,Restaurant,Arts & Crafts Store,Tapas Restaurant,Playground,Wine Bar,Convenience Store,Church,Clothing Store,Cocktail Bar,Coffee Shop
4,CL,8,39012,8,LA TORRE,43.475281,-3.815871,1,Italian Restaurant,Bar,Wine Bar,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store
5,CL,8,39012,11,FUMORIL,43.483948,-3.808724,1,Italian Restaurant,Castle,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store,Wine Bar
6,AV,8,39012,27,DOCTOR DIEGO MADRAZO,43.484776,-3.794005,1,Snack Place,Restaurant,Campground,Bakery,Gym,Beach,Italian Restaurant,Brewery,Spanish Restaurant,Seafood Restaurant
7,CL,8,39012,9,SAN MIGUEL,43.471029,-3.838252,1,Gym / Fitness Center,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Dessert Shop
8,CL,8,39012,11,CAMUS,43.486326,-3.798123,1,Campground,BBQ Joint,Bar,Lighthouse,Snack Place,Department Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
9,CL,8,39012,24,MANUEL CACICEDO,43.470066,-3.85603,1,Arts & Crafts Store,Tapas Restaurant,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center


#### Run k-means to cluster the streets into 5 clusters.

In [78]:
# make a working copy to keep a reference of the original sorted dataframe
streets_venues_sorted5 = streets_venues_sorted.copy()

In [79]:
# set number of clusters
kclusters = 5

sdf_grouped_clustering5 = sdf_grouped.drop(columns='Street')

# run k-means clustering
kmeans5 = KMeans(n_clusters=kclusters, random_state=0).fit(sdf_grouped_clustering5)

# check cluster labels generated for each row in the dataframe
kmeans5.labels_[0:10] 

array([0, 3, 0, 0, 1, 0, 0, 0, 0, 3])
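
Only the first ten labels are printed above. A quick way to see how streets spread over the clusters is to count the labels. The sketch below does this for the ten printed labels only; the full `kmeans5.labels_` array could be passed in the same way.

```python
import numpy as np

# The ten labels printed above (not the full set of 117).
first_labels = np.array([0, 3, 0, 0, 1, 0, 0, 0, 0, 3])

# Number of streets per cluster label; minlength keeps empty clusters visible.
sizes = np.bincount(first_labels, minlength=5)
print(sizes)  # -> [7 1 0 2 0]
```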

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each street.

In [80]:
streets_venues_sorted5.insert(0, 'Cluster Labels', kmeans5.labels_)

sdf_merged5 = sdf

# join the street data in sdf (postal code, latitude/longitude) with the cluster label and top venues of each street
sdf_merged5 = sdf_merged5.join(streets_venues_sorted5.set_index('Street'), on='StreetName')

In [81]:
sdf_merged5

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,8,39012,7,COSTA QUEBRADA,43.470299,-3.869965,1,Spanish Restaurant,Pub,Campground,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
1,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766,0,Video Store,BBQ Joint,Supermarket,Italian Restaurant,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
2,CL,8,39012,7,CORBAN,43.467523,-3.870217,3,Spanish Restaurant,Bakery,Sports Bar,Seafood Restaurant,Convention Center,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall
3,CL,7,39012,20,REPUENTE,43.465691,-3.834289,1,Supermarket,Pizza Place,Spanish Restaurant,Food & Drink Shop,Convenience Store,Castle,Church,Clothing Store,Cocktail Bar,Coffee Shop
4,CL,7,39012,1,PRONILLO,43.464794,-3.830979,1,Spanish Restaurant,Supermarket,Pizza Place,Food & Drink Shop,Convenience Store,Castle,Church,Clothing Store,Cocktail Bar,Coffee Shop
5,CL,8,39012,8,AVICHE,43.474001,-3.822505,1,Spanish Restaurant,Restaurant,Campground,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
6,CL,8,39012,24,SOMONTE,43.470577,-3.859825,2,Arts & Crafts Store,Tapas Restaurant,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
7,CL,8,39012,9,CORBANERA,43.477869,-3.834161,3,Restaurant,Beach,Seafood Restaurant,Wine Bar,Convention Center,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
8,CL,8,39012,24,ELENA QUIROGA,43.467943,-3.86489,3,Spanish Restaurant,Bakery,Restaurant,Seafood Restaurant,Convention Center,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall
9,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043,0,Restaurant,Arts & Crafts Store,Tapas Restaurant,Playground,Wine Bar,Convenience Store,Church,Clothing Store,Cocktail Bar,Coffee Shop


Finally, let's visualize the resulting clusters

In [82]:
# create map
map_clusters_5 = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sdf_merged5['Latitude'], sdf_merged5['Longitude'], sdf_merged5['StreetName'], sdf_merged5['Cluster Labels']):
#   label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    label = folium.Popup(poi + ' Street ')
   
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_5)
       
map_clusters_5

The cluster containing all the streets from postal code 39001, plus several from 39012 (in this case, cluster 0):

In [83]:
sdf_cluster5 = sdf_merged5[sdf_merged5['Cluster Labels'] == 0].reset_index(drop=True)
sdf_cluster5 

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766,0,Video Store,BBQ Joint,Supermarket,Italian Restaurant,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
1,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043,0,Restaurant,Arts & Crafts Store,Tapas Restaurant,Playground,Wine Bar,Convenience Store,Church,Clothing Store,Cocktail Bar,Coffee Shop
2,CL,8,39012,8,LA TORRE,43.475281,-3.815871,0,Italian Restaurant,Bar,Wine Bar,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store
3,CL,8,39012,11,FUMORIL,43.483948,-3.808724,0,Italian Restaurant,Castle,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store,Wine Bar
4,CL,1,39001,3,GUEVARA,43.46555,-3.804091,0,Spanish Restaurant,Café,Bar,Tapas Restaurant,Restaurant,Wine Bar,Ice Cream Shop,Pizza Place,Plaza,Coffee Shop
5,CL,8,39012,9,SAN MIGUEL,43.471029,-3.838252,0,Gym / Fitness Center,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Dessert Shop
6,CL,3,39001,2,LA LEVA,43.467443,-3.807361,0,Café,Tapas Restaurant,Spanish Restaurant,Burger Joint,Diner,Bar,Sports Club,Cocktail Bar,Beer Bar,Concert Hall
7,CL,8,39012,11,CAMUS,43.486326,-3.798123,0,Campground,BBQ Joint,Bar,Lighthouse,Snack Place,Department Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
8,CL,8,39012,7,CORCEÑO,43.467073,-3.852841,0,Italian Restaurant,Arts & Crafts Store,Clothing Store,Wine Bar,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store
9,CL,8,39012,8,BOLADO,43.475206,-3.823026,0,Hostel,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Dessert Shop


In [84]:
sdf_cluster5.groupby('PostalCode').count()

PostalCode,StreetType,District,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39001,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21
39012,44,44,44,44,44,44,44,44,44,44,44,44,44,44,44,44,44


In [87]:
# keep only the suburb streets (postal code 39012)
sdf_cluster5_39012 = sdf_cluster5[sdf_cluster5['PostalCode'] == '39012'].reset_index(drop=True)
sdf_cluster5_39012

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766,0,Video Store,BBQ Joint,Supermarket,Italian Restaurant,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
1,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043,0,Restaurant,Arts & Crafts Store,Tapas Restaurant,Playground,Wine Bar,Convenience Store,Church,Clothing Store,Cocktail Bar,Coffee Shop
2,CL,8,39012,8,LA TORRE,43.475281,-3.815871,0,Italian Restaurant,Bar,Wine Bar,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store
3,CL,8,39012,11,FUMORIL,43.483948,-3.808724,0,Italian Restaurant,Castle,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store,Wine Bar
4,CL,8,39012,9,SAN MIGUEL,43.471029,-3.838252,0,Gym / Fitness Center,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Dessert Shop
5,CL,8,39012,11,CAMUS,43.486326,-3.798123,0,Campground,BBQ Joint,Bar,Lighthouse,Snack Place,Department Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
6,CL,8,39012,7,CORCEÑO,43.467073,-3.852841,0,Italian Restaurant,Arts & Crafts Store,Clothing Store,Wine Bar,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Department Store
7,CL,8,39012,8,BOLADO,43.475206,-3.823026,0,Hostel,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center,Dessert Shop
8,CL,8,39012,15,LA TEJERA,43.463518,-3.836478,0,Pizza Place,Supermarket,Restaurant,Big Box Store,Food & Drink Shop,Fast Food Restaurant,Electronics Store,Diner,Dessert Shop,Department Store
9,CL,7,39012,21,JORGE SEPULVEDA,43.476015,-3.811762,0,Fast Food Restaurant,Bar,Spanish Restaurant,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store


#### Run k-means to cluster the streets into 20 clusters.

In [88]:
# make a working copy to keep a reference of the original sorted dataframe
streets_venues_sorted20 = streets_venues_sorted.copy()

In [89]:
# set number of clusters
kclusters = 20

# drop the street name so that only the numeric venue features remain
sdf_grouped_clustering20 = sdf_grouped.drop(columns='Street')

# run k-means clustering
kmeans20 = KMeans(n_clusters=kclusters, random_state=0).fit(sdf_grouped_clustering20)

# check cluster labels generated for each row in the dataframe
kmeans20.labels_[0:10] 

array([10,  4,  1,  1, 11,  8,  1,  1,  1,  8])
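To make the clustering step more concrete, here is a minimal sketch that runs `KMeans` on a small synthetic matrix and counts how many rows fall into each cluster. The toy data, the variable names and the explicit `n_init=10` are assumptions made for illustration only; the notebook itself clusters `sdf_grouped_clustering20`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: 6 "streets" described by 3 venue-frequency features.
toy_features = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 0.9],
])

# Fit k-means with 3 clusters; n_init is set explicitly (an assumption,
# the notebook relies on the library default).
toy_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(toy_features)
toy_labels = toy_kmeans.labels_

# Number of rows assigned to each cluster label.
cluster_sizes = np.bincount(toy_labels, minlength=3)
print(toy_labels)
print(cluster_sizes)
```

The label numbers themselves are arbitrary. What matters is which rows share a label, which is why the analysis looks at the cluster that contains the downtown streets rather than at a fixed label number.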

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each street.

In [90]:
streets_venues_sorted20.insert(0, 'Cluster Labels', kmeans20.labels_)

# join the Santander street data in sdf with streets_venues_sorted20
# to add the cluster label and the top 10 venues for each street
sdf_merged20 = sdf.join(streets_venues_sorted20.set_index('Street'), on='StreetName')
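The join above can be illustrated with a small sketch. The two toy frames below are hypothetical stand-ins for `sdf` and `streets_venues_sorted20`, reusing a few street names from this notebook; BOLADO is deliberately left out of the toy venue table to show what an unmatched street looks like.

```python
import pandas as pd

# Hypothetical stand-in for sdf: one row per street.
streets = pd.DataFrame({
    'StreetName': ['GUEVARA', 'CAMUS', 'BOLADO'],
    'PostalCode': ['39001', '39012', '39012'],
})

# Hypothetical stand-in for streets_venues_sorted20: cluster label plus venues.
venues = pd.DataFrame({
    'Cluster Labels': [1, 1],
    'Street': ['GUEVARA', 'CAMUS'],
    '1st Most Common Venue': ['Spanish Restaurant', 'Campground'],
})

# Left join: every row of `streets` is kept, and columns from `venues`
# are attached where 'StreetName' matches the 'Street' index.
merged = streets.join(venues.set_index('Street'), on='StreetName')
print(merged)
```

Because this is a left join, a street without any venue data keeps its row but receives `NaN` in the added columns, and an integer label column is then stored as floats. If that happens in the real data, such rows may need to be dropped or filled before the labels are used as list indices.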

In [91]:
sdf_merged20

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,8,39012,7,COSTA QUEBRADA,43.470299,-3.869965,16,Spanish Restaurant,Pub,Campground,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
1,CL,8,39012,12,INES D. NOVAL,43.48278,-3.804766,8,Video Store,BBQ Joint,Supermarket,Italian Restaurant,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
2,CL,8,39012,7,CORBAN,43.467523,-3.870217,9,Spanish Restaurant,Bakery,Sports Bar,Seafood Restaurant,Convention Center,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall
3,CL,7,39012,20,REPUENTE,43.465691,-3.834289,10,Supermarket,Pizza Place,Spanish Restaurant,Food & Drink Shop,Convenience Store,Castle,Church,Clothing Store,Cocktail Bar,Coffee Shop
4,CL,7,39012,1,PRONILLO,43.464794,-3.830979,3,Spanish Restaurant,Supermarket,Pizza Place,Food & Drink Shop,Convenience Store,Castle,Church,Clothing Store,Cocktail Bar,Coffee Shop
5,CL,8,39012,8,AVICHE,43.474001,-3.822505,9,Spanish Restaurant,Restaurant,Campground,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
6,CL,8,39012,24,SOMONTE,43.470577,-3.859825,15,Arts & Crafts Store,Tapas Restaurant,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
7,CL,8,39012,9,CORBANERA,43.477869,-3.834161,9,Restaurant,Beach,Seafood Restaurant,Wine Bar,Convention Center,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
8,CL,8,39012,24,ELENA QUIROGA,43.467943,-3.86489,9,Spanish Restaurant,Bakery,Restaurant,Seafood Restaurant,Convention Center,Church,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall
9,CL,8,39012,24,MAZO DE ABAJO,43.470427,-3.861043,7,Restaurant,Arts & Crafts Store,Tapas Restaurant,Playground,Wine Bar,Convenience Store,Church,Clothing Store,Cocktail Bar,Coffee Shop


Finally, let's visualize the resulting clusters on a map.

In [92]:
# create map
map_clusters_20 = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sdf_merged20['Latitude'], sdf_merged20['Longitude'], sdf_merged20['StreetName'], sdf_merged20['Cluster Labels']):
    # popup text: the street name
    label = folium.Popup(poi + ' Street ')

    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_20)
       
map_clusters_20
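As a side note on the color scheme used in the map cell, the sketch below builds the same kind of palette in isolation and shows how the `rainbow[cluster-1]` lookup behaves. It assumes only `numpy` and `matplotlib`; the `color_for` dictionary is a hypothetical helper introduced here for illustration.

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 20

# One evenly spaced color per cluster, converted to hex strings for folium.
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# The map cell indexes the palette with rainbow[cluster - 1].
# Hypothetical helper: the palette entry each cluster label ends up with.
color_for = {cluster: rainbow[cluster - 1] for cluster in range(kclusters)}

# Cluster 0 is looked up at index -1, i.e. the last entry of the palette.
print(color_for[0] == rainbow[-1])
```

With this indexing, cluster 0 takes the last palette entry through Python's negative indexing, so every label still maps to its own position in the list.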

Cluster containing all the streets in postal code 39001 and several from 39012 (in this case, cluster 1):

In [93]:
# keep only the streets assigned to cluster 1
sdf_cluster20 = sdf_merged20[sdf_merged20['Cluster Labels'] == 1].reset_index(drop=True)
sdf_cluster20 

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,1,39001,3,GUEVARA,43.46555,-3.804091,1,Spanish Restaurant,Café,Bar,Tapas Restaurant,Restaurant,Wine Bar,Ice Cream Shop,Pizza Place,Plaza,Coffee Shop
1,CL,3,39001,2,LA LEVA,43.467443,-3.807361,1,Café,Tapas Restaurant,Spanish Restaurant,Burger Joint,Diner,Bar,Sports Club,Cocktail Bar,Beer Bar,Concert Hall
2,CL,8,39012,11,CAMUS,43.486326,-3.798123,1,Campground,BBQ Joint,Bar,Lighthouse,Snack Place,Department Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
3,CL,3,39001,3,MARIA CRISTINA,43.468321,-3.806775,1,Bar,Café,Spanish Restaurant,Sports Club,Restaurant,Sandwich Place,Cocktail Bar,Beer Bar,Pub,Concert Hall
4,CT,3,39001,2,ATALAYA,43.467079,-3.806902,1,Café,Tapas Restaurant,Bar,Diner,Cocktail Bar,Spanish Restaurant,Restaurant,Wine Bar,Sandwich Place,Concert Hall
5,CL,2,39001,21,JUAN XXIII,43.46604,-3.812874,1,Bar,Tapas Restaurant,Gastropub,Mexican Restaurant,Burger Joint,Concert Hall,Food & Drink Shop,Ice Cream Shop,Italian Restaurant,Art Gallery
6,CL,3,39001,2,SAN CELEDONIO,43.46666,-3.807685,1,Café,Bar,Tapas Restaurant,Restaurant,Cocktail Bar,Diner,Wine Bar,Beer Bar,Sandwich Place,Italian Restaurant
7,CL,8,39012,11,RICARDO LORENZO,43.492528,-3.796664,1,Bar,Lighthouse,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
8,AV,4,39012,14,CANTABRIA,43.479272,-3.795595,1,Park,Bakery,Mexican Restaurant,Hotel,Gym / Fitness Center,Grocery Store,Café,Snack Place,Seafood Restaurant,Theme Park
9,CL,3,39001,7,RIO DE LA PILA,43.468416,-3.804033,1,Bar,Café,Tapas Restaurant,Cocktail Bar,Restaurant,Spanish Restaurant,Light Rail Station,Burger Joint,Hotel,Plaza


In [94]:
sdf_cluster20.groupby('PostalCode').count()

PostalCode,StreetType,District,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39001,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21,21
39012,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5


In [95]:
# keep only the suburb streets (postal code 39012)
sdf_cluster20_39012 = sdf_cluster20[sdf_cluster20['PostalCode'] == '39012'].reset_index(drop=True)
sdf_cluster20_39012

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,8,39012,11,CAMUS,43.486326,-3.798123,1,Campground,BBQ Joint,Bar,Lighthouse,Snack Place,Department Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
1,CL,8,39012,11,RICARDO LORENZO,43.492528,-3.796664,1,Bar,Lighthouse,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
2,AV,4,39012,14,CANTABRIA,43.479272,-3.795595,1,Park,Bakery,Mexican Restaurant,Hotel,Gym / Fitness Center,Grocery Store,Café,Snack Place,Seafood Restaurant,Theme Park
3,CL,8,39012,5,LA GLORIA,43.467476,-3.841331,1,Bike Rental / Bike Share,Football Stadium,Coffee Shop,Boutique,Wine Bar,Cocktail Bar,Concert Hall,Convenience Store,Convention Center,Department Store
4,CL,8,39012,5,LOS FORAMONTANOS,43.469578,-3.84135,1,Bike Rental / Bike Share,Football Stadium,Coffee Shop,Wine Bar,Clothing Store,Cocktail Bar,Concert Hall,Convenience Store,Convention Center,Department Store


## Results and Discussion <a name="results"></a>

At this point, we have run the clustering three times, going from the minimum possible number of clusters (2) up to 20.
The results in each case were the following:
   - 2 clusters: Cluster 1 contains the 21 downtown streets and 66 streets belonging to the suburbs.
   - 5 clusters: Cluster 0 contains the 21 downtown streets and 44 streets belonging to the suburbs.
   - 20 clusters: Cluster 1 contains the 21 downtown streets and 5 streets belonging to the suburbs.

So, as we increase the number of clusters, the number of suburban streets grouped together with the downtown ones decreases,
reaching a minimum of 5 streets in the suburbs that share a cluster label with all of the downtown streets.
We can therefore say that the streets in postal code 39012 most similar to those in 39001 are:
    

In [96]:
sdf_cluster20_39012

Unnamed: 0,StreetType,District,PostalCode,Section,StreetName,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CL,8,39012,11,CAMUS,43.486326,-3.798123,1,Campground,BBQ Joint,Bar,Lighthouse,Snack Place,Department Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store
1,CL,8,39012,11,RICARDO LORENZO,43.492528,-3.796664,1,Bar,Lighthouse,Wine Bar,Department Store,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Convention Center
2,AV,4,39012,14,CANTABRIA,43.479272,-3.795595,1,Park,Bakery,Mexican Restaurant,Hotel,Gym / Fitness Center,Grocery Store,Café,Snack Place,Seafood Restaurant,Theme Park
3,CL,8,39012,5,LA GLORIA,43.467476,-3.841331,1,Bike Rental / Bike Share,Football Stadium,Coffee Shop,Boutique,Wine Bar,Cocktail Bar,Concert Hall,Convenience Store,Convention Center,Department Store
4,CL,8,39012,5,LOS FORAMONTANOS,43.469578,-3.84135,1,Bike Rental / Bike Share,Football Stadium,Coffee Shop,Wine Bar,Clothing Store,Cocktail Bar,Concert Hall,Convenience Store,Convention Center,Department Store
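The comparison made for each number of clusters (66, 44 and 5 suburban streets sharing the downtown cluster) could be wrapped in one small helper. This is only a sketch under assumptions: the function name `suburb_streets_in_downtown_cluster` is hypothetical, the sample frame reuses a handful of rows from the 20-cluster output, and the "downtown cluster" is taken to be the most frequent label among the 39001 streets.

```python
import pandas as pd

def suburb_streets_in_downtown_cluster(df, downtown='39001', suburb='39012'):
    """Return the cluster holding most downtown streets and the suburb streets in it."""
    downtown_labels = df.loc[df['PostalCode'] == downtown, 'Cluster Labels']
    # Most frequent cluster label among the downtown streets.
    downtown_cluster = downtown_labels.mode().iloc[0]
    in_cluster = (df['PostalCode'] == suburb) & (df['Cluster Labels'] == downtown_cluster)
    return downtown_cluster, df.loc[in_cluster, 'StreetName'].tolist()

# A few rows taken from the 20-cluster results shown in this notebook.
sample = pd.DataFrame({
    'StreetName': ['GUEVARA', 'LA LEVA', 'CAMUS', 'COSTA QUEBRADA', 'INES D. NOVAL'],
    'PostalCode': ['39001', '39001', '39012', '39012', '39012'],
    'Cluster Labels': [1, 1, 1, 16, 8],
})

cluster, similar = suburb_streets_in_downtown_cluster(sample)
print(cluster, similar)
```

Applied to `sdf_merged20` (and to the equivalent merged frames for 2 and 5 clusters), a helper like this would avoid hard-coding the cluster number in each filtering cell.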


## Conclusion <a name="conclusion"></a>

Our goal when starting this project was to combine data from two main sources, the city of Santander's open data repository
and Foursquare, and compare two areas of the city, the very center and the outermost suburb, in order to
find out which places in the latter are similar to those in the former in terms of venues and places to go.
As we have seen, our analysis has shown that there are 5 candidates to consider. In other words,
5 streets in the suburbs are similar to the downtown.