<a href="https://colab.research.google.com/github/josedandrade/Coursera_Capstone/blob/main/Capstone_Final_Assignment_Mall_for_SDE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A Shopping Mall for Santo Domingo Este



# Introduction 

Shopping malls have been increasingly important in modern society. Our visits are not limited to buying things any more. They have become the children's playground. Adults spent the day walking through elegant alleys equipped with benches, flowers and even palms as a means of exercising. Today, most shopping malls have many restaurants, bars, cafes or even hairdressers, beauty salons, gyms, cinemas and other entertainment attractions which enables us to fulfil a lot of different needs in the area of a single building.  Social life is gradually transferring from the areas of old towns and main streets to shopping centres.

The Distrito Nacional is a subdivision of the Dominican Republic enclosing the capital Santo Domingo. Before 2001, the Distrito Nacional was a large area that included what is now known as Santo Domingo Province. The Law 163-01 created the province and separated the Distrito Nacional from other municipalities. Santo Domingo Este was created.

Santo Domingo Este is across the Ozama River which divides the east and west sections of metropolitan Santo Domingo. It is more residential and less commercially developed, but it has experienced growth since its creation, with new malls and department stores.

# Business Problem

The **Distrito National** has a high density of housing and businesses. Transportation is a growing issue. In the last two decades many shopping centers have experienced a decline in attracting visiting public and a drop in their commercial activities. The current economic climate and culture of "new is better than old" has left many commercial centers built in the 80's, 90's and early 2000, vacant and disused. It may be time to look elsewhere when thinking about new commercial plazas.

**Santo Domingo Este**, has a booming economy which is rapidly attracting the interest of many not just as a living destination, but for investing purposes as well. The rapid growth poses a problem when trying to decide where to open a business. This work aims to provide a guide to answer the business question: 

Considering the issues in the National District, where in the city of Santo Domingo Este would be the best location to build a new shopping mall?

# Data

> **Neighbourhoods**. Geographically localised communities within a larger city, town, suburb or rural area. The scope of this project is constrained to locations in the city of Santo Domingo Este, the second most important municipality of the province of Santo Domingo. 

> **Latitude and longitude coordinates**. Latitude and longitude, coordinate system by means of which the position or location of any place on Earth’s surface can be determined and described.

> **Venues**. Data of businesses in the vicinity of the geocoded neighbourhoods. We will use this data for cluster analysis.

## Data Sources, APIs and Python Libraries
This project requires many data science skills from web scraping, working with API, data wrangling, to machine learning and data visualization.

> **Governmen Data**. Data from this government page https://www.one.gob.do/ , Dominican Republic’s head department in charge of statistics. There are several databases and Keyhole Markup Language (KML) files with geographic annotation.

> **Foursquare and Google Places API**. After obtaining geocoded data, we will use these APIs to get venue data for those neighbourhoods. These two service providers have some of the largest databases of places.

> **Python Libraries**.
We will get geographical coordinates using Python Geocoder package which will give us coordinates of the neighbourhoods. Other libraries to be used:
**Pandas**: For creating and manipulating dataframes.
**Folium**: Python visualization library would be used to visualize the neighborhoods cluster distribution of using interactive leaflet map.
**Scikit Learn**: For importing k-means clustering.
**JSON**: Library to handle JSON files.
**Beautiful Soup** and **Requests**: To scrap web pages and to handle http requests.
Matplotlib: Python Plotting Module.
Foursquare API will provide many categories of the venue data. Our main interest is the Shopping Mall category.

# Methodology

Geospatial analysis can help us to select the best location for opening a new shopping mall in the city of Santo Domingo Este. We will follow a data science methodology and utilize machine learning techniques to create a model.

We will follow a process of geocoding neighbourhoods. This is the process of transforming a description of a location, such as an address, or a name of a place, to a location on the earth's surface. The resulting locations are output as geographic features with attributes, which can be used for mapping or spatial analysis. The features are expressed in terms of latitude and longitude, or coordinates, of neighbourhoods. 

For every neighbourhood, we will search for venues in the area. We will use FOURSQUARE services to get access to global Points Of Interes data and rich content, such as Shooping Mall, Restaurant, etc. The resulting venue categories will be new attributes that describe neighbourhoods.

Every observed neighbourhood, described by it's attributes, will be grouped (a cluster analysis) using an algorithm and technique that allows us to find similar neighbourhood. Groups with low density or absense of Shopping Malls will become our choice to recommend stakeholders where to build their project.

We will use the general-purpose programming language Python.

# Getting the Data

We start off by importing all the required packages.

In [1]:
!pip install geocoder

import numpy as np                          # vectors
import pandas as pd                         # data analysis
#pd.set_option("display.max_columns", None)
#pd.set_option("display.max_rows", None)

import json                                 # JSON files
import geocoder                             # geocoding
import requests                             # requests
from bs4 import BeautifulSoup               # parsing 
from pandas.io.json import json_normalize   # tranform JSON files for data analysis
import matplotlib.cm as cm                  # plotting
import matplotlib.colors as colors
from sklearn.cluster import KMeans          # machine learning for clustering
import folium                               # map rendering

print("Libraries imported.")

Libraries imported.


## Geolocation of Santo Domingo Este and Neighbourhoods

### Note: Web Scraping

We could build a list of neighbourhoods in **Santo Domingo Este** by scraping a web page and then geocode using any good provider. But, we had visited a government building that provided us with KML files, and other data sources.

Code to build a list by scraping the data from this [Wikipedia](https://en.wikipedia.org/wiki/Santo_Domingo_Este) page is as follows:

url = https://en.wikipedia.org/wiki/Santo_Domingo_Este'

```
# # send the GET request and parse response into a beautifulsoup object
data = requests.get(url).text
soup = BeautifulSoup(data, 'html.parser')

# store neighborhood data in a List
neighborhoodList = []
for row in soup.find_all("table", class_="multicol")[0].findAll("li"):
    neighborhoodList.append(row.text)

# create a DataFrame from the list
sde_df = pd.DataFrame({"Neighborhood": neighborhoodList})
print(sde_df.shape)
sde_df.head()
```






We chose to build our list of neighbourhoods in **Santo Domingo Este** by following government preestablished demarcations and geocoded points available at the government site https://www.one.gob.do/ , Dominican Republic’s head department in charge of statistics. We transformed the one KML file to JSON format.

In [2]:
# file with demarcations and coordinates
filename = 'https://raw.githubusercontent.com/josedandrade/Coursera_Capstone/main/sdePoints.json'

# create our initial dataFrame with data from file
sde_df = pd.read_json(filename)   

# a copy dataframe to populate the coordinates later on
sde_df_copy = sde_df

# sample some data
sde_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,El Cuatro,18.556292,-69.814507
1,La Ureña,18.472346,-69.753835
2,Los Paredones,18.493994,-69.751161
3,El Valiente,18.467266,-69.715294
4,San Miguel,18.501591,-69.804967


In [3]:
# number of geocoded areas or neighbourhoods
print(sde_df.shape) 

(34, 3)


**So we have 34 geocoded neighbourhoods. We will search venues around those points to find a most suitable location for our desired Shopping Mall.**


**Precautions must be taken when geocoding using APIs and libraries, there might be errors. We geocoded with ArcGIS and Google Maps and found a few errors.**

The purpose the the following two sections is to justify our choosing of official data. We found some neighbourhoods where completely in another city. We will show the differences in a map.

We will define a global variable of our main location of interest, **Santo Domingo Este** and geocode the previous list of 34 neighbourhoods.

In [4]:
# let's define a global variable to hold our main location of interest
location_of_interest = 'Santo Domingo Este, Dominican Republic'

### **Geocoding with Google Maps API**

We need an API Key in order to use Google APIs.

In [5]:
# Google API Key
google_api_key = ''  # your Google API key

Get the coordinates of Santo Domingo Este and neighbourhoods.

In [6]:
def get_coordinates(api_key, location_of_interest, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, location_of_interest)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except:
        return [None, None]
  
location_center = get_coordinates(google_api_key, location_of_interest)
print('Coordinates of {}: {}'.format(location_of_interest, location_center))

Coordinates of Santo Domingo Este, Dominican Republic: [18.4893469, -69.8255369]


In [7]:
# store coordinates in a list calling function on every location
coords = []
coords = [ get_coordinates(google_api_key, '{}, {}'.format(neighborhood, location_of_interest)) for neighborhood in sde_df["Neighborhood"].tolist() ]

# uncomment to inspect coordinates
# coords

In [8]:
# create a dataframe to populate the coordinates into Latitude and Longitude
google_coords = pd.DataFrame(coords, columns=['LatitudeGoogle', 'LongitudeGoogle'])

In [9]:
# merge into the initial dataframe
sde_df['LatitudeGoogle'] = google_coords['LatitudeGoogle']
sde_df['LongitudeGoogle'] = google_coords['LongitudeGoogle']

In [10]:
# sample some data
print(sde_df.shape)
sde_df.head()

(34, 5)


Unnamed: 0,Neighborhood,Latitude,Longitude,LatitudeGoogle,LongitudeGoogle
0,El Cuatro,18.556292,-69.814507,18.487608,-69.84775
1,La Ureña,18.472346,-69.753835,18.470018,-69.73465
2,Los Paredones,18.493994,-69.751161,18.467393,-69.734941
3,El Valiente,18.467266,-69.715294,18.461279,-69.69711
4,San Miguel,18.501591,-69.804967,18.51593,-69.842611


### **Geocoding with ARCGIS**

Get the coordinates of Santo Domingo Este

In [11]:
geocoded_location_of_interest = geocoder.arcgis(location_of_interest)
geolocation_of_interest = geocoded_location_of_interest.latlng
print('The geographical coordinate of {} {}.'.format(location_of_interest, geolocation_of_interest))

The geographical coordinate of Santo Domingo Este, Dominican Republic [18.50532000000004, -69.85663999999997].


Now, we can implement that as a function to get geographical coordinates of our neighbourhoods.

In [12]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, {}'.format(location_of_interest, neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [13]:
# store coordinates in a list calling function on every location
coords = []
coords = [ get_latlng(neighborhood) for neighborhood in sde_df["Neighborhood"].tolist() ]

# uncomment to inspect coordinates
# coords

In [14]:
# create a dataframe to populate the coordinates into Latitude and Longitude
merge_arcgis_coords = pd.DataFrame(coords, columns=['LatitudeArcGIS', 'LongitudeArcGIS'])

In [15]:
# merge into the original dataframe
sde_df['LatitudeArcGIS'] = merge_arcgis_coords['LatitudeArcGIS']
sde_df['LongitudeArcGIS'] = merge_arcgis_coords['LongitudeArcGIS']

In [16]:
# inspect neighborhoods and coordinates
print(sde_df.shape)
sde_df.head()

(34, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,LatitudeGoogle,LongitudeGoogle,LatitudeArcGIS,LongitudeArcGIS
0,El Cuatro,18.556292,-69.814507,18.487608,-69.84775,18.54982,-69.80892
1,La Ureña,18.472346,-69.753835,18.470018,-69.73465,18.46572,-69.73383
2,Los Paredones,18.493994,-69.751161,18.467393,-69.734941,18.50532,-69.85664
3,El Valiente,18.467266,-69.715294,18.461279,-69.69711,18.46456,-69.69571
4,San Miguel,18.501591,-69.804967,18.51593,-69.842611,18.511155,-69.860275


Let's map all coordinates and see if there are errors with neighbourhoods that fall outside the boundaries of our location of interest.

> We will use the geographical boundaries of towns and rural areas from KML files we got from official sources [here](https://www.one.gob.do).



In [17]:
from folium import plugins

filename = 'https://raw.githubusercontent.com/josedandrade/Coursera_Capstone/main/barriosSDE.geojson'
location_boroughs = requests.get(filename).json()

def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }

# create map of Santo Domingo Este using latitude and longitude values
map_location = folium.Map(location=location_center, zoom_start=12)
folium.Marker(location_center, popup=location_of_interest).add_to(map_location)

# add boroughs
folium.TileLayer('cartodbpositron').add_to(map_location)            #cartodbpositron cartodbdark_matter
folium.GeoJson(location_boroughs, style_function=boroughs_style, name='geojson').add_to(map_location)

# add blue markers to map to display official data
for lat, lng, latG, lngG, LatA, lngA, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['LatitudeGoogle'], sde_df['LongitudeGoogle'], sde_df['LatitudeArcGIS'], sde_df['LongitudeArcGIS'], sde_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=7, popup=label, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_location)  

# add green markers to map to display Google geocoded data
for lat, lng, latG, lngG, LatA, lngA, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['LatitudeGoogle'], sde_df['LongitudeGoogle'], sde_df['LatitudeArcGIS'], sde_df['LongitudeArcGIS'], sde_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([LatA, lngA], radius=4, popup=label, color='green', fill=True, fill_color='green', fill_opacity=1).add_to(map_location)  

# add red markers to map to display ArcGIS geocoded data
for lat, lng, latG, lngG, LatA, lngA, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['LatitudeGoogle'], sde_df['LongitudeGoogle'], sde_df['LatitudeArcGIS'], sde_df['LongitudeArcGIS'], sde_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([latG, lngG], radius=4, popup=label, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_location)  

map_location


**We can see some red and green points outside the boundaries of Santo Domingo Este**. The blue dots are right in place, and they are from official sources.

> **FINDING** We should use official data and discard geocoding neighbourhoods.

## Venues
We will use Foursquare to explore neighbourhood venues. We use the geographical coordinates of neighborhoods, then obtain venue data from neighborhoods via Foursquare API.


### Using Foursquare API to get Venues

Now that we have geocoded our locations we use Foursquare API to get information on businesses in each area. We nee a Key to access the API.

In [18]:
# Foursquare API Key
CLIENT_ID = ''     # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180604'

We are interested in venues in 'Shop and Service' Category ID 4d4b7105d754a06378d81259, but we shall get information on other businesses like coffe shops, pizza places, bakeries etc. so that we can find similarities and later on perform a cluster analysis.

In [19]:
radius = 3000
LIMIT = 200

venues = []

for lat, long, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

Sampling our data

In [20]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head(10)

(1841, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,El Cuatro,18.556292,-69.814507,Demolition Gym,18.530249,-69.812116,Gym / Fitness Center
1,El Cuatro,18.556292,-69.814507,Terraza Car Wash San Luis,18.532166,-69.807607,Beer Garden
2,El Cuatro,18.556292,-69.814507,Supermarket Pristine,18.536692,-69.813238,Market
3,El Cuatro,18.556292,-69.814507,Parque Club Invicea,18.532229,-69.816012,Park
4,El Cuatro,18.556292,-69.814507,Control de la Omsa,18.530601,-69.819003,Bus Station
5,La Ureña,18.472346,-69.753835,Autodromo Sunix,18.464667,-69.749162,Racetrack
6,La Ureña,18.472346,-69.753835,Autódromo Mobil 1,18.465494,-69.747178,Racetrack
7,La Ureña,18.472346,-69.753835,Hipódromo V Centenario,18.477778,-69.778161,Racetrack
8,La Ureña,18.472346,-69.753835,Club de la Direccion General de Aduanas,18.476786,-69.753682,Café
9,La Ureña,18.472346,-69.753835,Hipermercados Olé,18.493034,-69.74675,Big Box Store


### Venue Categories
We need to now see how many Venue Categories are there for further processing

In [21]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alma Rosa,100,100,100,100,100,100
Brisa Oriental,17,17,17,17,17,17
Cancino,53,53,53,53,53,53
Cancino Adentro,29,29,29,29,29,29
Cancino Afuera,27,27,27,27,27,27
El Almirante,34,34,34,34,34,34
El Cachón de la Rubia,41,41,41,41,41,41
El Cuatro,5,5,5,5,5,5
El Tamarindo,22,22,22,22,22,22
El Valiente,10,10,10,10,10,10


In [22]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 119 uniques categories.


In [23]:
venues_df['VenueCategory'].unique()[:50]

array(['Gym / Fitness Center', 'Beer Garden', 'Market', 'Park',
       'Bus Station', 'Racetrack', 'Café', 'Big Box Store',
       'Baseball Field', 'BBQ Joint', 'Hotel', 'Coffee Shop',
       'Latin American Restaurant', 'Toll Booth', 'Bus Stop',
       'Gas Station', 'Bar', 'Gym', 'Toll Plaza', 'Burger Joint',
       'Bakery', 'Supermarket', 'Steakhouse', 'Restaurant',
       'Cupcake Shop', 'Fast Food Restaurant', 'Caribbean Restaurant',
       'Ice Cream Shop', 'Pharmacy', 'Taco Place', 'Nightclub',
       'Food Truck', 'Bank', 'Grocery Store', 'Shopping Mall', 'Dive Bar',
       'Sandwich Place', 'Snack Place', 'Fried Chicken Joint',
       'Department Store', 'Seafood Restaurant', 'Cable Car',
       'Pizza Place', 'Hookah Bar', 'Furniture / Home Store', 'Plaza',
       'History Museum', 'French Restaurant', 'Spanish Restaurant',
       'Music Venue'], dtype=object)

Evaluate if our category of interest is present on all unique categories from all the returned venues

In [24]:
"Shopping Mall" in venues_df['VenueCategory'].unique() #displays all the category names

True

In [25]:
venues_df.query('VenueCategory == "Shopping Mall"').agg(['nunique','count','size'])

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
nunique,21,21,21,4,4,4,1
count,25,25,25,25,25,25,25
size,25,25,25,25,25,25,25


So now we have all the venues in our location of interest and it's demarcations. We have collected all venues within a radius of 3 km of every neighbourhood center. We also know there are Shopping Malls in the area.

This concludes the data gathering aspect of our study. We are going to use this data for analysis and to produce the report on optimal locations for a new Shopping Mall.

# Model Building

Our focus is on detecting areas of Santo Domingo Este that have low Shopping Malls density, We will limit our analysis to an area from a location that is highly centered. And, we have collected the required data: location and venues.

Second step in our analysis will be calculation and exploration of geographic segmentation across different areas of Santo Domingo Este. We will identify a few promising areas and focus our attention on those areas.

We will create clusters of locations that meet some basic requirements established in discussion with stakeholders. We will present a map of all such locations but also create clusters (using k-means clustering) of those locations to identify general neighborhoods which should be a starting point for final exploration and search for optimal venue location by stakeholders.

In [26]:
from folium.plugins import HeatMap

map_location = folium.Map(location=location_center, zoom_start=13)  

folium.TileLayer('cartodbpositron').add_to(map_location) #cartodbpositron cartodbdark_matter
folium.Marker(location_center).add_to(map_location)
folium.Circle(location_center, radius=6000, fill=False, color='red').add_to(map_location)
folium.GeoJson(location_boroughs, style_function=boroughs_style, name='geojson').add_to(map_location)

# add blue markers to map to display official data
for lat, lng, latG, lngG, LatA, lngA, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['LatitudeGoogle'], sde_df['LongitudeGoogle'], sde_df['LatitudeArcGIS'], sde_df['LongitudeArcGIS'], sde_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=7, popup=label, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_location)  
    
map_location

## One Hot Encoding 
We need to Encode our venue categories to get a better result for our clustering

In [27]:
# one hot encoding
sde_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
sde_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sde_onehot.columns[-1]] + list(sde_onehot.columns[:-1])
sde_onehot = sde_onehot[fixed_columns]

print(sde_onehot.shape)
sde_onehot.head(20)

(1841, 120)


Unnamed: 0,Neighborhoods,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Nightclub,Optical Shop,Paella Restaurant,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Post Office,Pub,Racetrack,Resort,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Vegetarian / Vegan Restaurant,Water Park,Wine Bar
0,El Cuatro,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,El Cuatro,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,El Cuatro,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,El Cuatro,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,El Cuatro,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,La Ureña,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,La Ureña,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,La Ureña,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,La Ureña,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,La Ureña,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [28]:
sde_grouped = sde_onehot.groupby(["Neighborhoods"]).sum().reset_index()

print(sde_grouped.shape)
sde_grouped

(34, 120)


Unnamed: 0,Neighborhoods,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Nightclub,Optical Shop,Paella Restaurant,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Post Office,Pub,Racetrack,Resort,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Vegetarian / Vegan Restaurant,Water Park,Wine Bar
0,Alma Rosa,1,1,0,1,2,0,6,2,2,5,0,0,0,2,0,1,0,6,0,0,0,1,1,0,0,0,1,0,0,1,0,2,3,1,0,0,2,0,3,...,5,0,0,3,0,0,1,6,0,1,2,0,0,2,0,0,1,1,0,0,1,0,1,0,1,0,0,2,4,1,1,0,1,0,0,0,0,0,1,0
1,Brisa Oriental,0,0,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,...,0,0,0,1,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
2,Cancino,0,0,0,1,1,0,0,1,4,1,0,0,0,0,0,1,0,3,0,0,2,1,1,0,0,0,1,0,0,0,0,1,2,0,0,0,1,0,2,...,2,0,0,0,0,0,2,2,0,1,0,0,0,3,0,0,0,1,0,0,1,0,1,0,1,1,0,2,5,0,0,0,0,0,0,0,0,0,0,0
3,Cancino Adentro,0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,...,1,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,0,1,4,0,0,0,0,0,0,0,0,0,0,0
4,Cancino Afuera,0,0,0,1,0,0,0,0,3,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,...,2,0,0,1,0,0,2,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,1,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0
5,El Almirante,0,0,0,0,0,0,0,1,3,2,0,0,0,0,0,0,0,1,1,0,0,0,2,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,...,2,0,0,2,0,0,2,0,0,0,0,0,0,1,0,0,0,1,0,1,2,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0
6,El Cachón de la Rubia,0,0,0,1,1,0,0,0,3,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,0,2,0,0,0,0,0,4,0,0,0,0,0,2,...,0,0,0,0,0,0,1,3,0,1,0,0,0,1,0,0,0,1,0,1,1,0,1,0,0,1,0,1,3,0,0,0,0,0,0,0,0,0,0,0
7,El Cuatro,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,El Tamarindo,0,0,0,0,0,1,0,0,3,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,...,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0
9,El Valiente,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0


We will group the Neighbourhoods and calculate the mean venue categories value in each Neighbourhood

In [29]:
len((sde_grouped[sde_grouped["Shopping Mall"] > 0])) 

21

In [30]:
sde_mall = sde_grouped[["Neighborhoods","Shopping Mall"]]

In [31]:
sde_mall

Unnamed: 0,Neighborhoods,Shopping Mall
0,Alma Rosa,1
1,Brisa Oriental,1
2,Cancino,1
3,Cancino Adentro,0
4,Cancino Afuera,0
5,El Almirante,2
6,El Cachón de la Rubia,1
7,El Cuatro,0
8,El Tamarindo,0
9,El Valiente,0


## K Means Clustering

Run k-means to cluster the neighborhoods into 3 clusters.

In [32]:
sde_mall = sde_grouped
sde_mall.head()

Unnamed: 0,Neighborhoods,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Nightclub,Optical Shop,Paella Restaurant,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Post Office,Pub,Racetrack,Resort,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Vegetarian / Vegan Restaurant,Water Park,Wine Bar
0,Alma Rosa,1,1,0,1,2,0,6,2,2,5,0,0,0,2,0,1,0,6,0,0,0,1,1,0,0,0,1,0,0,1,0,2,3,1,0,0,2,0,3,...,5,0,0,3,0,0,1,6,0,1,2,0,0,2,0,0,1,1,0,0,1,0,1,0,1,0,0,2,4,1,1,0,1,0,0,0,0,0,1,0
1,Brisa Oriental,0,0,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,...,0,0,0,1,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
2,Cancino,0,0,0,1,1,0,0,1,4,1,0,0,0,0,0,1,0,3,0,0,2,1,1,0,0,0,1,0,0,0,0,1,2,0,0,0,1,0,2,...,2,0,0,0,0,0,2,2,0,1,0,0,0,3,0,0,0,1,0,0,1,0,1,0,1,1,0,2,5,0,0,0,0,0,0,0,0,0,0,0
3,Cancino Adentro,0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,...,1,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,0,1,4,0,0,0,0,0,0,0,0,0,0,0
4,Cancino Afuera,0,0,0,1,0,0,0,0,3,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,...,2,0,0,1,0,0,2,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,1,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0


In [33]:
# set number of clusters
kclusters = 3

sde_clustering = sde_mall.drop(["Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sde_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 0, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

## Labelling Clustered Data

In [34]:
kmeans.labels_

array([0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 2, 0, 1, 0, 1, 1, 2, 0, 1, 2, 0,
       0, 1, 1, 0, 0, 0, 1, 0, 1, 2, 2, 0], dtype=int32)

In [35]:
# create a new dataframe that includes the cluster
sde_merged = sde_mall.copy()

# add clustering labels
sde_merged["Cluster Labels"] = kmeans.labels_

In [36]:
sde_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
sde_merged

Unnamed: 0,Neighborhood,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Optical Shop,Paella Restaurant,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Post Office,Pub,Racetrack,Resort,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Cluster Labels
0,Alma Rosa,1,1,0,1,2,0,6,2,2,5,0,0,0,2,0,1,0,6,0,0,0,1,1,0,0,0,1,0,0,1,0,2,3,1,0,0,2,0,3,...,0,0,3,0,0,1,6,0,1,2,0,0,2,0,0,1,1,0,0,1,0,1,0,1,0,0,2,4,1,1,0,1,0,0,0,0,0,1,0,0
1,Brisa Oriental,0,0,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,...,0,0,1,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1
2,Cancino,0,0,0,1,1,0,0,1,4,1,0,0,0,0,0,1,0,3,0,0,2,1,1,0,0,0,1,0,0,0,0,1,2,0,0,0,1,0,2,...,0,0,0,0,0,2,2,0,1,0,0,0,3,0,0,0,1,0,0,1,0,1,0,1,1,0,2,5,0,0,0,0,0,0,0,0,0,0,0,0
3,Cancino Adentro,0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,...,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,0,1,4,0,0,0,0,0,0,0,0,0,0,0,1
4,Cancino Afuera,0,0,0,1,0,0,0,0,3,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,...,0,0,1,0,0,2,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,1,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,1
5,El Almirante,0,0,0,0,0,0,0,1,3,2,0,0,0,0,0,0,0,1,1,0,0,0,2,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,...,0,0,2,0,0,2,0,0,0,0,0,0,1,0,0,0,1,0,1,2,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,1
6,El Cachón de la Rubia,0,0,0,1,1,0,0,0,3,0,0,0,0,0,0,1,0,1,0,0,1,2,0,0,0,0,2,0,0,0,0,0,4,0,0,0,0,0,2,...,0,0,0,0,0,1,3,0,1,0,0,0,1,0,0,0,1,0,1,1,0,1,0,0,1,0,1,3,0,0,0,0,0,0,0,0,0,0,0,1
7,El Cuatro,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
8,El Tamarindo,0,0,0,0,0,1,0,0,3,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,...,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,1
9,El Valiente,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1


In [37]:
df_tmp = sde_df_copy.sort_values('Neighborhood').reset_index(drop=True)
df_tmp

Unnamed: 0,Neighborhood,Latitude,Longitude,LatitudeGoogle,LongitudeGoogle,LatitudeArcGIS,LongitudeArcGIS
0,Alma Rosa,18.492066,-69.854715,18.494693,-69.851387,18.49417,-69.85453
1,Brisa Oriental,18.489187,-69.795586,18.489556,-69.79522,18.48877,-69.79605
2,Cancino,18.518586,-69.849113,18.521362,-69.844691,18.51377,-69.84089
3,Cancino Adentro,18.535158,-69.842649,18.53181,-69.842383,18.53676,-69.84154
4,Cancino Afuera,18.525038,-69.834488,18.522167,-69.83482,18.52734,-69.83831
5,El Almirante,18.519651,-69.80756,18.521651,-69.807377,18.50532,-69.85664
6,El Cachón de la Rubia,18.531825,-69.852934,18.531336,-69.847387,18.51106,-69.84841
7,El Cuatro,18.556292,-69.814507,18.487608,-69.84775,18.54982,-69.80892
8,El Tamarindo,18.537185,-69.821381,18.535525,-69.835246,18.53184,-69.82803
9,El Valiente,18.467266,-69.715294,18.461279,-69.69711,18.46456,-69.69571


In [38]:
# add coordinates
sde_merged['Latitude'] = df_tmp['Latitude']
sde_merged['Longitude'] = df_tmp['Longitude']
sde_merged.head() 

Unnamed: 0,Neighborhood,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Post Office,Pub,Racetrack,Resort,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Cluster Labels,Latitude,Longitude
0,Alma Rosa,1,1,0,1,2,0,6,2,2,5,0,0,0,2,0,1,0,6,0,0,0,1,1,0,0,0,1,0,0,1,0,2,3,1,0,0,2,0,3,...,3,0,0,1,6,0,1,2,0,0,2,0,0,1,1,0,0,1,0,1,0,1,0,0,2,4,1,1,0,1,0,0,0,0,0,1,0,0,18.492066,-69.854715
1,Brisa Oriental,0,0,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,...,1,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,18.489187,-69.795586
2,Cancino,0,0,0,1,1,0,0,1,4,1,0,0,0,0,0,1,0,3,0,0,2,1,1,0,0,0,1,0,0,0,0,1,2,0,0,0,1,0,2,...,0,0,0,2,2,0,1,0,0,0,3,0,0,0,1,0,0,1,0,1,0,1,1,0,2,5,0,0,0,0,0,0,0,0,0,0,0,0,18.518586,-69.849113
3,Cancino Adentro,0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,...,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,0,1,4,0,0,0,0,0,0,0,0,0,0,0,1,18.535158,-69.842649
4,Cancino Afuera,0,0,0,1,0,0,0,0,3,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,...,1,0,0,2,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,1,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,1,18.525038,-69.834488


In [39]:
# sorting the results by Cluster Labels
print(sde_merged.shape)
sde_merged.sort_values(["Cluster Labels"], inplace=True)
sde_merged

(34, 123)


Unnamed: 0,Neighborhood,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Post Office,Pub,Racetrack,Resort,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Cluster Labels,Latitude,Longitude
0,Alma Rosa,1,1,0,1,2,0,6,2,2,5,0,0,0,2,0,1,0,6,0,0,0,1,1,0,0,0,1,0,0,1,0,2,3,1,0,0,2,0,3,...,3,0,0,1,6,0,1,2,0,0,2,0,0,1,1,0,0,1,0,1,0,1,0,0,2,4,1,1,0,1,0,0,0,0,0,1,0,0,18.492066,-69.854715
29,San José de Mendoza,0,0,0,0,0,0,4,2,5,2,0,0,0,0,0,1,0,2,0,0,0,0,4,0,0,0,0,0,0,0,0,2,1,1,0,0,0,0,2,...,4,0,0,3,3,0,0,0,0,0,1,0,0,0,1,0,0,2,0,0,0,0,0,0,2,4,0,1,0,0,0,0,0,0,0,0,0,0,18.496395,-69.817361
27,Mendoza,1,0,0,1,2,0,6,2,3,3,0,0,0,1,0,2,0,7,0,0,0,1,1,0,0,0,1,0,0,1,0,3,3,1,0,1,1,0,3,...,4,0,0,2,6,0,1,1,0,0,4,0,0,1,2,0,0,1,0,1,0,0,0,0,2,5,0,1,0,0,0,0,0,0,0,0,0,0,18.504841,-69.847502
26,Los Trinitarios,0,0,0,1,1,0,4,3,8,2,0,0,0,1,0,2,0,4,0,0,0,2,3,0,0,0,2,0,0,0,0,3,5,1,0,1,2,0,4,...,3,0,0,2,3,0,1,0,0,0,3,0,0,0,2,0,1,2,0,1,0,0,0,0,3,8,1,1,0,0,0,0,0,0,0,0,0,0,18.50587,-69.834931
25,Los Tres Ojos,0,1,0,0,0,0,4,3,2,2,0,0,0,1,0,1,0,4,0,0,0,0,4,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,2,...,4,0,0,1,1,0,0,0,0,0,2,0,0,0,1,0,0,1,0,0,0,0,0,0,2,4,0,1,0,1,0,0,0,0,0,1,0,0,18.482405,-69.833204
22,Los Mina Sur,1,0,0,1,2,0,5,1,2,5,0,1,0,1,0,1,0,6,0,0,0,2,0,0,0,0,2,0,0,1,0,1,3,0,0,1,2,0,3,...,1,0,0,1,7,0,1,2,0,0,3,0,0,1,1,0,0,1,0,1,0,1,1,0,2,3,1,0,0,1,0,0,0,0,0,0,0,0,18.503515,-69.866777
21,Los Mina Norte,1,0,0,1,2,0,4,1,2,2,0,0,0,0,0,1,0,5,0,0,1,2,0,0,0,0,2,0,0,1,0,1,4,0,0,1,1,0,3,...,0,0,0,1,7,0,1,1,0,0,4,0,0,1,1,0,0,1,0,1,0,1,1,0,1,3,0,0,0,0,0,0,0,0,0,0,0,0,18.514073,-69.864281
18,Las Canas,0,1,0,0,0,0,4,2,4,3,0,0,0,2,0,1,0,3,0,0,0,0,3,0,0,0,0,0,1,0,0,1,1,1,1,0,0,0,1,...,3,0,0,1,5,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,3,1,1,0,0,0,0,0,0,0,0,0,0,18.473049,-69.82278
14,Juan López,0,0,0,0,0,0,2,2,5,1,0,0,0,0,0,1,0,2,0,0,0,0,4,0,0,0,0,0,1,0,0,2,1,0,1,0,0,0,2,...,5,0,0,4,1,0,0,0,0,1,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,3,0,1,0,0,0,0,0,0,0,0,0,0,18.488003,-69.812614
12,Hainamosa,0,0,0,0,0,0,0,2,6,2,0,0,0,0,0,1,0,2,1,0,0,0,3,0,0,0,0,0,0,0,0,2,3,0,0,0,0,0,2,...,2,0,0,3,1,0,0,0,0,0,2,0,0,0,1,0,1,1,0,1,0,0,0,0,3,5,0,1,0,0,0,0,0,0,0,0,0,0,18.511071,-69.822362


## Visualization

In [40]:
# create map
map_clusters = folium.Map(location=location_center, zoom_start=11)
folium.GeoJson(location_boroughs, style_function=boroughs_style, name='geojson').add_to(map_clusters)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, cluster in zip(sde_merged['Latitude'], sde_merged['Longitude'], sde_merged['Cluster Labels']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

for lat, lng, latG, lngG, LatA, lngA, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['LatitudeGoogle'], sde_df['LongitudeGoogle'], sde_df['LatitudeArcGIS'], sde_df['LongitudeArcGIS'], sde_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=1, popup=label, color='blue', fill=True, fill_color='blue', fill_opacity=0).add_to(map_clusters)          

map_clusters

## Examine clusters

In [41]:
# number of neighbourhoods in cluster
print(len(sde_merged.loc[sde_merged['Cluster Labels'] == 0]))
print(len(sde_merged.loc[sde_merged['Cluster Labels'] == 1]))
print(len(sde_merged.loc[sde_merged['Cluster Labels'] == 2]))

13
16
5


Cluster 0

In [42]:
print('Cluster 0: Number of neighbourhoods/places: {}'.format(len(sde_merged.loc[sde_merged['Cluster Labels'] == 0])))
sde_merged.loc[sde_merged['Cluster Labels'] == 0]

Cluster 0: Number of neighbourhoods/places: 13


Unnamed: 0,Neighborhood,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Post Office,Pub,Racetrack,Resort,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Cluster Labels,Latitude,Longitude
0,Alma Rosa,1,1,0,1,2,0,6,2,2,5,0,0,0,2,0,1,0,6,0,0,0,1,1,0,0,0,1,0,0,1,0,2,3,1,0,0,2,0,3,...,3,0,0,1,6,0,1,2,0,0,2,0,0,1,1,0,0,1,0,1,0,1,0,0,2,4,1,1,0,1,0,0,0,0,0,1,0,0,18.492066,-69.854715
29,San José de Mendoza,0,0,0,0,0,0,4,2,5,2,0,0,0,0,0,1,0,2,0,0,0,0,4,0,0,0,0,0,0,0,0,2,1,1,0,0,0,0,2,...,4,0,0,3,3,0,0,0,0,0,1,0,0,0,1,0,0,2,0,0,0,0,0,0,2,4,0,1,0,0,0,0,0,0,0,0,0,0,18.496395,-69.817361
27,Mendoza,1,0,0,1,2,0,6,2,3,3,0,0,0,1,0,2,0,7,0,0,0,1,1,0,0,0,1,0,0,1,0,3,3,1,0,1,1,0,3,...,4,0,0,2,6,0,1,1,0,0,4,0,0,1,2,0,0,1,0,1,0,0,0,0,2,5,0,1,0,0,0,0,0,0,0,0,0,0,18.504841,-69.847502
26,Los Trinitarios,0,0,0,1,1,0,4,3,8,2,0,0,0,1,0,2,0,4,0,0,0,2,3,0,0,0,2,0,0,0,0,3,5,1,0,1,2,0,4,...,3,0,0,2,3,0,1,0,0,0,3,0,0,0,2,0,1,2,0,1,0,0,0,0,3,8,1,1,0,0,0,0,0,0,0,0,0,0,18.50587,-69.834931
25,Los Tres Ojos,0,1,0,0,0,0,4,3,2,2,0,0,0,1,0,1,0,4,0,0,0,0,4,0,0,0,0,0,0,0,0,3,0,1,0,0,0,0,2,...,4,0,0,1,1,0,0,0,0,0,2,0,0,0,1,0,0,1,0,0,0,0,0,0,2,4,0,1,0,1,0,0,0,0,0,1,0,0,18.482405,-69.833204
22,Los Mina Sur,1,0,0,1,2,0,5,1,2,5,0,1,0,1,0,1,0,6,0,0,0,2,0,0,0,0,2,0,0,1,0,1,3,0,0,1,2,0,3,...,1,0,0,1,7,0,1,2,0,0,3,0,0,1,1,0,0,1,0,1,0,1,1,0,2,3,1,0,0,1,0,0,0,0,0,0,0,0,18.503515,-69.866777
21,Los Mina Norte,1,0,0,1,2,0,4,1,2,2,0,0,0,0,0,1,0,5,0,0,1,2,0,0,0,0,2,0,0,1,0,1,4,0,0,1,1,0,3,...,0,0,0,1,7,0,1,1,0,0,4,0,0,1,1,0,0,1,0,1,0,1,1,0,1,3,0,0,0,0,0,0,0,0,0,0,0,0,18.514073,-69.864281
18,Las Canas,0,1,0,0,0,0,4,2,4,3,0,0,0,2,0,1,0,3,0,0,0,0,3,0,0,0,0,0,1,0,0,1,1,1,1,0,0,0,1,...,3,0,0,1,5,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,3,1,1,0,0,0,0,0,0,0,0,0,0,18.473049,-69.82278
14,Juan López,0,0,0,0,0,0,2,2,5,1,0,0,0,0,0,1,0,2,0,0,0,0,4,0,0,0,0,0,1,0,0,2,1,0,1,0,0,0,2,...,5,0,0,4,1,0,0,0,0,1,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,3,0,1,0,0,0,0,0,0,0,0,0,0,18.488003,-69.812614
12,Hainamosa,0,0,0,0,0,0,0,2,6,2,0,0,0,0,0,1,0,2,1,0,0,0,3,0,0,0,0,0,0,0,0,2,3,0,0,0,0,0,2,...,2,0,0,3,1,0,0,0,0,0,2,0,0,0,1,0,1,1,0,1,0,0,0,0,3,5,0,1,0,0,0,0,0,0,0,0,0,0,18.511071,-69.822362


Cluster 1

In [43]:
print('Cluster 1: Number of neighbourhoods/places: {}'.format(len(sde_merged.loc[sde_merged['Cluster Labels'] == 1])))
sde_merged.loc[sde_merged['Cluster Labels'] == 1]

Cluster 1: Number of neighbourhoods/places: 16


Unnamed: 0,Neighborhood,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Post Office,Pub,Racetrack,Resort,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Cluster Labels,Latitude,Longitude
5,El Almirante,0,0,0,0,0,0,0,1,3,2,0,0,0,0,0,0,0,1,1,0,0,0,2,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,...,2,0,0,2,0,0,0,0,0,0,1,0,0,0,1,0,1,2,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,1,18.519651,-69.80756
30,San Miguel,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0,3,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,2,...,2,0,0,2,0,0,0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,2,0,1,0,0,0,0,0,0,0,0,0,1,18.501591,-69.804967
1,Brisa Oriental,0,0,0,0,0,0,2,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,...,1,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,18.489187,-69.795586
28,Prado Oriental,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,...,1,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,18.491295,-69.784123
3,Cancino Adentro,0,0,0,0,0,0,0,0,3,1,0,0,0,0,0,1,0,1,1,0,2,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,...,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,1,0,0,0,0,1,4,0,0,0,0,0,0,0,0,0,0,0,1,18.535158,-69.842649
4,Cancino Afuera,0,0,0,1,0,0,0,0,3,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,...,1,0,0,2,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,1,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,1,18.525038,-69.834488
24,Los Tres Brazos,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,2,0,0,0,0,1,0,0,0,0,0,2,0,0,0,0,0,2,...,0,0,0,1,3,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,18.517222,-69.882235
23,Los Paredones,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,18.493994,-69.751161
16,La Ureña,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,18.472346,-69.753835
19,Los Frailes,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,...,1,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,1,18.471529,-69.79927


Cluster 2

In [44]:
print('Cluster 2: Number of neighbourhoods/places: {}'.format(len(sde_merged.loc[sde_merged['Cluster Labels'] == 2])))
sde_merged.loc[sde_merged['Cluster Labels'] == 2]

Cluster 2: Number of neighbourhoods/places: 5


Unnamed: 0,Neighborhood,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,...,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Post Office,Pub,Racetrack,Resort,Restaurant,Road,Roof Deck,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Stadium,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Cluster Labels,Latitude,Longitude
20,Los Mameyes,0,1,2,0,1,0,4,0,0,7,0,0,2,2,0,1,1,2,0,0,0,3,2,1,1,1,0,1,1,1,0,1,0,1,0,0,0,1,0,...,5,0,1,0,2,2,0,2,0,0,2,0,0,0,1,2,0,0,0,0,2,0,0,0,1,1,0,0,1,1,1,1,0,0,1,1,2,2,18.471205,-69.861227
17,Las Américas,1,0,2,0,1,0,4,1,0,7,0,0,2,1,0,1,1,4,0,0,0,2,2,1,1,1,0,1,1,1,0,1,0,1,0,0,0,0,1,...,4,0,1,0,2,2,0,2,0,0,2,0,0,1,1,2,0,0,0,0,2,0,0,0,1,2,0,0,1,1,1,0,0,0,1,0,2,2,18.483856,-69.866139
32,Villa Duarte,0,0,1,0,0,0,4,0,0,4,0,0,1,0,0,0,1,1,0,0,0,2,2,1,1,1,0,1,1,1,0,1,1,0,0,0,1,0,1,...,4,1,1,0,1,1,0,0,0,0,0,0,0,0,2,1,0,0,1,0,4,0,0,1,1,2,1,0,1,0,0,0,0,0,0,1,1,2,18.480788,-69.876387
31,Sans Soucí,0,1,1,0,1,0,4,0,0,4,0,0,1,3,1,0,1,0,0,0,0,3,3,1,1,1,1,2,1,1,0,0,0,0,0,0,0,1,0,...,4,1,1,0,2,2,0,2,0,0,1,0,1,0,1,2,0,0,0,0,3,0,0,0,1,1,1,0,2,1,1,1,0,0,1,0,1,2,18.469144,-69.873595
11,Faro a Colón,0,1,2,0,1,0,3,0,0,8,0,0,2,2,0,0,1,3,0,0,0,3,2,2,1,1,0,1,1,1,0,0,0,1,0,0,0,1,0,...,5,0,1,0,2,2,0,2,0,0,2,0,0,0,0,2,0,0,0,0,2,0,0,0,1,1,0,0,1,1,1,1,0,0,1,1,2,2,18.475734,-69.86572


# Final observation

A good number of shopping locations are in the central area of Santo Domingo Este, with the highest number in cluster 1 and almost same moderate number in cluster 0.

It seems there is opportunity and high potential areas to open new shopping malls as there is no competition from existing malls in Cluster 2. 

Shopping malls in cluster 0 and 1 are likely suffering from intense competition due to high concentration of shopping locations. Therefore, this project recommends property developers to open new shopping malls in neighbourhoods in cluster 0 with little to no competition. Property developers with unique selling propositions to stand out from the competition can also open new shopping malls in neighbourhoods in cluster 0 with moderate competition.

One other observation is the surrounding areas in Cluster 2. Museum, Aquatic Park, Aquarium and Racetrack are among the attractions in the area that are a walking distance away or few minutes driving. And, better yet, there is unbuilt land.