<a href="https://colab.research.google.com/github/josedandrade/Coursera_Capstone/blob/main/Capstone_Final_Assignment_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A Shopping Mall for Santo Domingo Este



# Introduction 

Shopping malls have become increasingly important in modern society. Our visits are no longer limited to buying things. Malls have become children's playgrounds, and adults spend the day walking for exercise through elegant alleys lined with benches, flowers and even palms. Today, most shopping malls house restaurants, bars, cafes, hairdressers, beauty salons, gyms, cinemas and other entertainment attractions, which lets us fulfil many different needs within a single building. Social life is gradually moving from old towns and main streets to shopping centres.

The Distrito Nacional is a subdivision of the Dominican Republic enclosing the capital, Santo Domingo. Before 2001, the Distrito Nacional was a large area that included what is now known as Santo Domingo Province. Law 163-01 created the province and separated the Distrito Nacional from the other municipalities. As part of that change, Santo Domingo Este was created.

Santo Domingo Este lies across the Ozama River, which divides the east and west sections of metropolitan Santo Domingo. It is more residential and less commercially developed, but it has grown since its creation, with new malls and department stores.

# Business Problem

The **Distrito Nacional** has a high density of housing and businesses, and transportation is a growing issue. In the last two decades, many shopping centers have seen fewer visitors and a drop in commercial activity. The current economic climate and a culture of "new is better than old" have left many commercial centers built in the 1980s, 1990s and early 2000s vacant and disused. It may be time to look elsewhere when planning new commercial plazas.

**Santo Domingo Este** has a booming economy that is rapidly attracting interest, not just as a place to live but also as a place to invest. This rapid growth makes it hard to decide where to open a business.

Geospatial analysis can help us to select the best location for opening a new shopping mall in the city of Santo Domingo Este. Following a data science methodology and utilizing machine learning techniques, this work aims to provide a guide to answer the business question: 

Considering the issues in the National District, where in the city of Santo Domingo Este would be the best location to build a new shopping mall?

# Data

> **Neighbourhoods**. The scope of this project is constrained to locations in the city of Santo Domingo Este, the second most important municipality of the province of Santo Domingo. 

> **Latitude and longitude coordinates**. Geocoding is the process of transforming a description of a location, such as an address, or a name of a place, to a location on the earth's surface. The resulting locations are output as geographic features with attributes, which can be used for mapping or spatial analysis.

> **Venues**. Data of businesses in the vicinity of the geocoded neighbourhoods. We will use this data for cluster analysis.

## Data Sources, APIs and Python Libraries
This project draws on many data science skills, from web scraping, working with APIs and data wrangling to machine learning and data visualization.

> **Government**. Data from https://www.one.gob.do/ , the website of the Dominican Republic's head department in charge of statistics. It offers several databases and Keyhole Markup Language (KML) files with geographic annotation. One such database contains demographic information about every province and municipality in the country, down to the neighbourhood level.

> **Foursquare and Google Places API**. After obtaining geocoded data, we will use these APIs to get venue data for those neighbourhoods. These two service providers have some of the largest databases of places.

> **Python Libraries**.
We will get the geographical coordinates of the neighbourhoods using the Python Geocoder package. Other libraries to be used:

- **Pandas**: for creating and manipulating dataframes.
- **Folium**: visualization library used to show the neighbourhood cluster distribution on an interactive Leaflet map.
- **Scikit Learn**: for k-means clustering.
- **JSON**: library to handle JSON files.
- **Beautiful Soup** and **Requests**: to scrape web pages and to handle HTTP requests.
- **Matplotlib**: Python plotting module.

The Foursquare API provides many categories of venue data. Our main interest is the Shopping Mall category.

In [1]:
# Google API Key
google_api_key = '1234567890SNPKvXPuRzynq14G9HJT4WDzBOsqw'

In [2]:
# Foursquare API Key
CLIENT_ID = 'HGUZJNKFLB12345678902MPBR53UIDGT' # your Foursquare ID
CLIENT_SECRET = 'DSPFDADCYVWNF1234567890GK3AHIQZRODB' # your Foursquare Secret
VERSION = '20180604'

In [3]:
location_of_interest = 'Santo Domingo Este, Dominican Republic'
country = ', Dominican Republic'

In [4]:
!pip install geocoder
#!pip install folium

import numpy as np                          # vectors

import pandas as pd                         # data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json                                 # JSON files

import geocoder                             # geocoding

import requests                             # requests
from bs4 import BeautifulSoup               # parsing of HTML and XML

from pandas import json_normalize           # transform JSON files for data analysis

import matplotlib.cm as cm                  # plotting
import matplotlib.colors as colors

from sklearn.cluster import KMeans          # machine learning for clustering

import folium                               # map rendering

print("Libraries imported.")


# Getting the Data

## Neighbourhoods
We will build our list of neighbourhoods in **Santo Domingo Este** by web scraping the data from [Wikipedia](https://en.wikipedia.org/wiki/Santo_Domingo_Este).

In [5]:
# send the GET request and parse response into a beautifulsoup object
data = requests.get("https://en.wikipedia.org/wiki/Santo_Domingo_Este").text
soup = BeautifulSoup(data, 'html.parser')

# store neighborhood data in a List
neighborhoodList = []
for row in soup.find_all("table", class_="multicol")[0].find_all("li"):
    neighborhoodList.append(row.text)

# create a DataFrame from the list
sde_df = pd.DataFrame({"Neighborhood": neighborhoodList})
print(sde_df.shape)
sde_df.head()

(60, 1)


| | Neighborhood |
|---|---|
| 0 | Alma Rosa I |
| 1 | Alma Rosa II |
| 2 | Ana Teresa Balaguer |
| 3 | Arismar |
| 4 | Barrio Ámbar |


### Geocoding

First we will try to get geographical coordinates of our location of interest.

#### **Geocoding with ArcGIS**

Get the coordinates of Santo Domingo Este

In [6]:
geocoded_location_of_interest = geocoder.arcgis(location_of_interest)
geolocation_of_interest = geocoded_location_of_interest.latlng
print('The geographical coordinate of {} {}.'.format(location_of_interest, geolocation_of_interest))

The geographical coordinate of Santo Domingo Este, Dominican Republic [18.50532000000004, -69.85663999999997].


Now, we can implement that as a function to get geographical coordinates of our neighbourhoods.

In [7]:
# define a function to get coordinates
def get_latlng(neighborhood, max_attempts=5):
    # query ArcGIS a bounded number of times so a failed lookup cannot loop forever
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, {}'.format(location_of_interest, neighborhood))
        if g.latlng is not None:
            return g.latlng
    # give up: [None, None] keeps the two-column shape expected by the DataFrame below
    return [None, None]

In [8]:
# store coordinates in a list calling function on every location
coords = [ get_latlng(neighborhood) for neighborhood in sde_df["Neighborhood"].tolist() ]

# uncomment to inspect coordinates
# coords

In [9]:
# create a dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

# merge into the original dataframe
sde_df['Latitude'] = df_coords['Latitude']
sde_df['Longitude'] = df_coords['Longitude']

In [10]:
# inspect neighborhoods and coordinates
print(sde_df.shape)
sde_df.head()

(60, 3)


| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | Alma Rosa I | 18.49417 | -69.85453 |
| 1 | Alma Rosa II | 18.48742 | -69.85057 |
| 2 | Ana Teresa Balaguer | 18.52523 | -69.82409 |
| 3 | Arismar | 18.4706 | -69.81673 |
| 4 | Barrio Ámbar | 18.49847 | -69.87071 |


In [11]:
# create map of Santo Domingo Este using latitude and longitude values
map_sde = folium.Map(location=geolocation_of_interest, zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_sde)  
    
map_sde

In [12]:
# save the map as HTML file
#map_sde.save('/users/dandrade/desktop/map_sde.html')

#### **Geocoding with Google Maps API**

Get the coordinates of Santo Domingo Este

In [13]:
def get_coordinates(api_key, location_of_interest, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        # pass the query as params so requests URL-encodes spaces and accented characters
        response = requests.get(url, params={'key': api_key, 'address': location_of_interest}).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except (requests.RequestException, KeyError, IndexError, ValueError):
        # network failure, invalid JSON, or no match for the address
        return [None, None]
    
location_center = get_coordinates(google_api_key, location_of_interest)
print('Coordinates of {}: {}'.format(location_of_interest, location_center))

Coordinates of Santo Domingo Este, Dominican Republic: [18.4893469, -69.8255369]


In [14]:
# store coordinates in a list calling function on every location
coords = [ get_coordinates(google_api_key, '{}, {}'.format(neighborhood, location_of_interest)) for neighborhood in sde_df["Neighborhood"].tolist() ]

# uncomment to inspect coordinates
# coords

In [15]:
# create a dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['LatitudeGoogle', 'LongitudeGoogle'])

# merge into the original dataframe
sde_df['LatitudeGoogle'] = df_coords['LatitudeGoogle']
sde_df['LongitudeGoogle'] = df_coords['LongitudeGoogle']

In [16]:
# inspect neighborhoods and coordinates
print(sde_df.shape)
sde_df.head()

(60, 5)


| | Neighborhood | Latitude | Longitude | LatitudeGoogle | LongitudeGoogle |
|---|---|---|---|---|---|
| 0 | Alma Rosa I | 18.49417 | -69.85453 | 18.490636 | -69.863463 |
| 1 | Alma Rosa II | 18.48742 | -69.85057 | 18.495101 | -69.84597 |
| 2 | Ana Teresa Balaguer | 18.52523 | -69.82409 | 18.524252 | -69.821769 |
| 3 | Arismar | 18.4706 | -69.81673 | 18.469575 | -69.816929 |
| 4 | Barrio Ámbar | 18.49847 | -69.87071 | 18.49861 | -69.869978 |


> Because geocoded data differs between service providers, we now check whether any neighbourhoods fall outside the boundaries of our location of interest.

> We will use the geographical boundaries of all the towns and rural areas published on the official website of the State. Link [here](https://www.one.gob.do)



In [17]:
from folium import plugins

filename = 'https://raw.githubusercontent.com/josedandrade/Coursera_Capstone/main/barriosSDE.geojson'
location_boroughs = requests.get(filename).json()

def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }

# create map of Santo Domingo Este using latitude and longitude values
map_location = folium.Map(location=location_center, zoom_start=11)
folium.Marker(location_center, popup=location_of_interest).add_to(map_location)

# add radius of interest and boroughs
folium.TileLayer('cartodbpositron').add_to(map_location)            #cartodbpositron cartodbdark_matter
folium.Circle(location_center, radius=6000, fill=False, color='red').add_to(map_location)
folium.GeoJson(location_boroughs, style_function=boroughs_style, name='geojson').add_to(map_location)

# add markers to map
for lat, lng, latG, lngG, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['LatitudeGoogle'], sde_df['LongitudeGoogle'], sde_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_location)  

for lat, lng, latG, lngG, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['LatitudeGoogle'], sde_df['LongitudeGoogle'], sde_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latG, lngG],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=1).add_to(map_location)  

map_location


> **FINDING** I will discard ArcGIS geocoded data because of outliers.
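One way to put numbers on such outliers is to measure how far apart the two providers place each neighbourhood, and how far each point lies from the center. The sketch below is only an illustration, not the procedure behind the finding: `haversine_km` is a hypothetical helper, the three rows are copied from the comparison table, and the 6 km threshold is borrowed from the red circle on the map.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# (neighbourhood, ArcGIS lat, ArcGIS lng, Google lat, Google lng) from the table above
rows = [
    ("Alma Rosa I", 18.49417, -69.85453, 18.490636, -69.863463),
    ("Arismar", 18.4706, -69.81673, 18.469575, -69.816929),
    ("Barrio Ámbar", 18.49847, -69.87071, 18.49861, -69.869978),
]

center = (18.4893469, -69.8255369)  # Google coordinates of Santo Domingo Este printed earlier
radius_km = 6.0                     # assumption: same radius as the circle on the map

report = []
for name, lat, lng, lat_g, lng_g in rows:
    gap = haversine_km(lat, lng, lat_g, lng_g)     # disagreement between the two providers
    from_center = haversine_km(lat, lng, *center)  # ArcGIS point to the center
    report.append((name, round(gap, 2), from_center > radius_km))

for name, gap, outside in report:
    print(f"{name}: providers differ by {gap} km, outside the circle: {outside}")
```

Applied to all 60 rows of `sde_df`, a large gap or an `outside` flag would point to the neighbourhoods worth inspecting by hand.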



## Venues
We use the Foursquare API to explore neighbourhood venues. Having obtained the geographical coordinates of the neighbourhoods with the Python Geocoder package, we now request the venue data for each neighbourhood from the Foursquare API.


### Data Wrangling

### Foursquare API

Now that we have geocoded our locations, we use the Foursquare API to get information on businesses in each area. We are interested in venues in the 'Shop and Service' category (ID 4d4b7105d754a06378d81259), but we also collect other businesses such as coffee shops, pizza places and bakeries so that we can find similarities and later perform a cluster analysis. Our focus remains on Shopping Mall and Shopping Plaza, identified by the IDs 4bf58dd8d48988d1fd941735 and 5744ccdfe4b0c0459246b4dc respectively. See [here](https://developer.foursquare.com/docs/build-with-foursquare/categories/).

In [18]:
radius = 3000
LIMIT = 100

venues = []

for lat, lng, latG, lngG, neighborhood in zip(sde_df['Latitude'], sde_df['Longitude'], sde_df['LatitudeGoogle'], sde_df['LongitudeGoogle'], sde_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lng,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            lng, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
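
The loop above indexes straight into the JSON response, so a request that returns no groups, or a venue without a category, would raise an exception. A more defensive parser could look like the sketch below. The function name `extract_venues`, the `"Uncategorized"` fallback label and the sample payload are hypothetical; the payload mirrors only the keys the loop above already relies on.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Return one tuple per venue found in an 'explore'-style payload.

    Missing keys or empty lists yield fewer (or zero) rows instead of raising.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        # hypothetical fallback label for venues that carry no category
        category = categories[0].get("name") if categories else "Uncategorized"
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows

# hand-written payload for illustration only, not real API output
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Helados Bon",
                           "location": {"lat": 18.489972, "lng": -69.86538},
                           "categories": [{"name": "Ice Cream Shop"}]}},
                {"venue": {"name": "Unnamed kiosk",
                           "location": {"lat": 18.49, "lng": -69.86},
                           "categories": []}},
            ]
        }]
    }
}

parsed = extract_venues(sample, "Alma Rosa I", 18.49417, -69.85453)
empty = extract_venues({}, "Alma Rosa I", 18.49417, -69.85453)
print(parsed)
print(empty)
```

The tuples keep the same field order as the `venues` list, so they could be appended to it unchanged.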

#### Convert the venues list into a new DataFrame

In [19]:
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head(10)

(3903, 7)


| | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Alma Rosa I | 18.49417 | -69.85453 | Kissairis Panaderia | 18.500252 | -69.852752 | Bakery |
| 1 | Alma Rosa I | 18.49417 | -69.85453 | Antojitos Premium | 18.486981 | -69.847865 | Burger Joint |
| 2 | Alma Rosa I | 18.49417 | -69.85453 | Helados Bon | 18.489972 | -69.86538 | Ice Cream Shop |
| 3 | Alma Rosa I | 18.49417 | -69.85453 | Wicho | 18.499029 | -69.864061 | Burger Joint |
| 4 | Alma Rosa I | 18.49417 | -69.85453 | Supermercados Bravo | 18.503335 | -69.855931 | Supermarket |
| 5 | Alma Rosa I | 18.49417 | -69.85453 | Smart Fit | 18.50582 | -69.856721 | Gym / Fitness Center |
| 6 | Alma Rosa I | 18.49417 | -69.85453 | Dial Bar and Lounge | 18.48848 | -69.869165 | Lounge |
| 7 | Alma Rosa I | 18.49417 | -69.85453 | Chimi El Patricio | 18.489007 | -69.850788 | Burger Joint |
| 8 | Alma Rosa I | 18.49417 | -69.85453 | Excellent Cakes | 18.491431 | -69.850293 | Cupcake Shop |
| 9 | Alma Rosa I | 18.49417 | -69.85453 | Parque Nacional Los Tres Ojos | 18.481558 | -69.84337 | Park |


#### Evaluate number of venues per neighbourhood

In [20]:
venues_df.groupby(["Neighborhood"]).count()

| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| Alma Rosa I | 100 | 100 | 100 | 100 | 100 | 100 |
| Alma Rosa II | 100 | 100 | 100 | 100 | 100 | 100 |
| Ana Teresa Balaguer | 25 | 25 | 25 | 25 | 25 | 25 |
| Arismar | 43 | 43 | 43 | 43 | 43 | 43 |
| Barrio La Isla | 49 | 49 | 49 | 49 | 49 | 49 |
| Barrio Ámbar | 100 | 100 | 100 | 100 | 100 | 100 |
| Brisas del Este | 15 | 15 | 15 | 15 | 15 | 15 |
| Cansino Adentro | 32 | 32 | 32 | 32 | 32 | 32 |
| Corales del Este | 58 | 58 | 58 | 58 | 58 | 58 |
| El Almirante | 200 | 200 | 200 | 200 | 200 | 200 |


#### Check whether our category of interest is among the unique categories of the returned venues

In [21]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 138 unique categories.


In [22]:
unique_categories = venues_df['VenueCategory'].unique() #displays all the category names
df = pd.DataFrame(unique_categories, columns = ['Unique Categories'])
df.tail()

| | Unique Categories |
|---|---|
| 133 | Paintball Field |
| 134 | Rental Car Location |
| 135 | Liquor Store |
| 136 | Automotive Shop |
| 137 | Pie Shop |


In [23]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

#### Grouping by Venue Categories
We now check how many venue categories there are before further processing.

In [24]:
venues_df.groupby(["VenueCategory"]).max()

| VenueCategory | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude |
|---|---|---|---|---|---|---|
| Accessories Store | Villa Olímpica | 18.51022 | -69.83749 | Mr. Letta | 18.491104 | -69.865287 |
| Aquarium | Villa Olímpica | 18.49417 | -69.82365 | Acuario Nacional | 18.468197 | -69.850437 |
| Art Gallery | Villa Olímpica | 18.49847 | -69.85937 | Quinta Dominica | 18.476392 | -69.884128 |
| Arts & Crafts Store | Vista Hermosa | 18.53427 | -69.83571 | Dume | 18.508019 | -69.855701 |
| Asian Restaurant | Vista Hermosa | 18.52232 | -69.83571 | Yokomo Sushi | 18.5074 | -69.856491 |
| Athletics & Sports | Villa Eloisa | 18.492345 | -69.799065 | Play Villa Carmen | 18.501259 | -69.823418 |
| Auto Workshop | Villa Eloisa | 18.50473 | -69.80721 | Azua Muffler | 18.478997 | -69.82806 |
| Automotive Shop | Villa Cumbre | 18.505144 | -69.965599 | Repuestos Y Centro De Servicios Sidra | 18.479779 | -69.969279 |
| BBQ Joint | Villa Olímpica | 18.511313 | -69.794995 | Vulcano Grill and Beer Market | 18.507559 | -69.818085 |
| Bakery | Vista Hermosa | 18.52232 | -69.79017 | Reposteria La Bonashe | 18.500252 | -69.811284 |


In [25]:
venues_df.groupby(["VenueCategory"]).max().shape

(138, 6)

The result has 138 rows, one per venue category, which shows how diverse the venues are. We now have the venues in our area of interest, collected within a 3 km radius of every neighbourhood (up to 100 per request). We also know there are Shopping Malls in the area.

This concludes the data gathering aspect of our study. We are going to use this data for analysis and to produce the report on optimal locations for a new Shopping Mall.

# Methodology

Our focus is on detecting areas of Santo Domingo Este with a low density of shopping malls. We will limit our analysis to an area of about 6 km radius around a centrally located point.

The first step, collecting the required data (locations and venues), is complete.

The second step will be the calculation and exploration of geographic segmentation across different areas of Santo Domingo Este. We will use heatmaps to identify a few promising areas close to the center and focus our attention on those.

In the third and final step, we will focus on the most promising areas and, within those, create clusters of locations that meet some basic requirements established in discussion with stakeholders. We will present a map of all such locations and also cluster them (using k-means clustering) to identify general neighbourhoods that should serve as a starting point for the stakeholders' final exploration and search for an optimal venue location.
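
As a rough illustration of the clustering step described above, the sketch below groups a handful of made-up candidate locations with scikit-learn's `KMeans`. The coordinates, the choice of three clusters and the fixed `random_state` are assumptions for the example, not values from this study. Treating latitude and longitude as plain Euclidean coordinates is also a simplification that is only reasonable over a small area like this one.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical candidate locations as (latitude, longitude), forming three loose groups
candidates = np.array([
    [18.470, -69.860], [18.471, -69.861], [18.469, -69.859],
    [18.500, -69.820], [18.501, -69.821], [18.499, -69.819],
    [18.530, -69.790], [18.531, -69.791], [18.529, -69.789],
])

# n_clusters=3 is an assumption; in practice it would be chosen by inspection or a metric
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(candidates)

labels = kmeans.labels_            # cluster index assigned to each candidate
centers = kmeans.cluster_centers_  # one (lat, lng) centroid per cluster

for label, (lat, lng) in enumerate(centers):
    members = int((labels == label).sum())
    print(f"cluster {label}: center=({lat:.4f}, {lng:.4f}), members={members}")
```

Each centroid can then be read as the approximate middle of a general area to explore further.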

In [26]:
#!pip install geopandas

from folium.plugins import HeatMap

map_location = folium.Map(location=location_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_location) #cartodbpositron cartodbdark_matter
folium.Marker(location_center).add_to(map_location)
folium.Circle(location_center, radius=1000, fill=False, color='red').add_to(map_location)
folium.Circle(location_center, radius=3000, fill=False, color='red').add_to(map_location)
folium.Circle(location_center, radius=6000, fill=False, color='red').add_to(map_location)
folium.GeoJson(location_boroughs, style_function=boroughs_style, name='geojson').add_to(map_location)
map_location

## One Hot Encoding 
We need to encode the venue categories as numeric indicator columns so that they can be used for clustering.
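
To make the idea concrete before applying it to the full dataset, here is a toy example with three made-up rows. The `toy` frame is hypothetical. `dtype=int` is passed so the indicators print as 0/1, since recent pandas versions return booleans by default.

```python
import pandas as pd

# three made-up venues, for illustration only
toy = pd.DataFrame({
    "Neighborhood": ["Alma Rosa I", "Alma Rosa I", "Arismar"],
    "VenueCategory": ["Bakery", "Burger Joint", "Bakery"],
})

# one indicator column per distinct category; empty prefix keeps the bare category names
onehot = pd.get_dummies(toy[["VenueCategory"]], prefix="", prefix_sep="", dtype=int)
print(onehot)
```

Each row has exactly one 1, in the column of that venue's category.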

In [27]:
# one hot encoding
sde_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")
sde_onehot.head(5)


Unnamed: 0,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Botanical Garden,Brewery,Burger Joint,Bus Station,Cable Car,Cafeteria,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,Food Court,Food Service,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,German Restaurant,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Ice Cream Shop,Intersection,Italian Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Latin American Restaurant,Lighthouse,Liquor Store,Lounge,Market,Massage Studio,Medical Lab,Metro Station,Mexican Restaurant,Mobile Phone Shop,Motel,Movie Theater,Museum,Music Venue,Neighborhood,Nightclub,Optical Shop,Paella Restaurant,Paintball Field,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pie Shop,Pier,Pizza Place,Plaza,Port,Post Office,Print Shop,Pub,Racetrack,Rental Car Location,Resort,Restaurant,Road,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Social Club,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Court,Tennis Stadium,Theater,Theme Restaurant,Toy / Game Store,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Yoga Studio
0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0


In [28]:
sde_onehot.shape

(3903, 138)

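Add the neighbourhood column back to the dataframe. It is named `Neighborhoods` because `Neighborhood` already exists as a venue category column.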

In [29]:
sde_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [sde_onehot.columns[-1]] + list(sde_onehot.columns[:-1])
sde_onehot = sde_onehot[fixed_columns]

print(sde_onehot.shape)
sde_onehot.head(10)

(3903, 139)


Unnamed: 0,Neighborhoods,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Botanical Garden,Brewery,Burger Joint,Bus Station,Cable Car,Cafeteria,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,Food Court,Food Service,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,German Restaurant,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Ice Cream Shop,Intersection,Italian Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Latin American Restaurant,Lighthouse,Liquor Store,Lounge,Market,Massage Studio,Medical Lab,Metro Station,Mexican Restaurant,Mobile Phone Shop,Motel,Movie Theater,Museum,Music Venue,Neighborhood,Nightclub,Optical Shop,Paella Restaurant,Paintball Field,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pie Shop,Pier,Pizza Place,Plaza,Port,Post Office,Print Shop,Pub,Racetrack,Rental Car Location,Resort,Restaurant,Road,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Social Club,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Court,Tennis Stadium,Theater,Theme Restaurant,Toy / Game Store,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Yoga Studio
0,Alma Rosa I,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Alma Rosa I,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Alma Rosa I,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Alma Rosa I,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Alma Rosa I,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
5,Alma Rosa I,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Alma Rosa I,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,Alma Rosa I,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Alma Rosa I,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Alma Rosa I,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


We will group the rows by neighbourhood and take the mean of each one-hot column, which gives the share of each venue category among that neighbourhood's venues.
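
The toy example below shows why the mean of one-hot columns is a share: with four venues in neighbourhood `A`, two of them bakeries, the `Bakery` value for `A` is 0.5. The data is made up. The comparison with `pd.crosstab(..., normalize="index")` is only a cross-check and an alternative route to the same table, not something the notebook itself uses.

```python
import pandas as pd

# made-up venues for two hypothetical neighbourhoods
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Bakery", "Bakery", "Park", "Shopping Mall", "Bakery", "Park"],
})

# same pattern as the notebook: one-hot encode, attach the neighbourhood, group and average
onehot = pd.get_dummies(toy["VenueCategory"], dtype=int)
onehot.insert(0, "Neighborhoods", toy["Neighborhood"])
grouped = onehot.groupby("Neighborhoods").mean()

# cross-check: row-normalised counts give the same shares
shares = pd.crosstab(toy["Neighborhood"], toy["VenueCategory"], normalize="index")

print(grouped)
print(shares)
```

Because every venue contributes exactly one 1, the values in each row of `grouped` add up to 1.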

In [30]:
sde_grouped = sde_onehot.groupby(["Neighborhoods"]).mean().reset_index()                        
print(sde_grouped.shape)
sde_grouped

(59, 139)


Unnamed: 0,Neighborhoods,Accessories Store,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,Automotive Shop,BBQ Joint,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Garden,Beer Store,Big Box Store,Bistro,Botanical Garden,Brewery,Burger Joint,Bus Station,Cable Car,Cafeteria,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Church,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Cupcake Shop,Department Store,Dessert Shop,Dive Bar,Electronics Store,Empanada Restaurant,Falafel Restaurant,Fast Food Restaurant,Food Court,Food Service,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,German Restaurant,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Gymnastics Gym,Harbor / Marina,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Ice Cream Shop,Intersection,Italian Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Latin American Restaurant,Lighthouse,Liquor Store,Lounge,Market,Massage Studio,Medical Lab,Metro Station,Mexican Restaurant,Mobile Phone Shop,Motel,Movie Theater,Museum,Music Venue,Neighborhood,Nightclub,Optical Shop,Paella Restaurant,Paintball Field,Park,Pedestrian Plaza,Performing Arts Venue,Pharmacy,Pie Shop,Pier,Pizza Place,Plaza,Port,Post Office,Print Shop,Pub,Racetrack,Rental Car Location,Resort,Restaurant,Road,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Snack Place,Soccer Field,Social Club,Spanish Restaurant,Sports Bar,Sports Club,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Court,Tennis Stadium,Theater,Theme Restaurant,Toy / Game Store,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Yoga Studio
0,Alma Rosa I,0.01,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.06,0.02,0.02,0.04,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.03,0.01,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.05,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.04,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0
1,Alma Rosa II,0.01,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.06,0.02,0.02,0.06,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.05,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.04,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0
2,Ana Teresa Balaguer,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Arismar,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.069767,0.046512,0.093023,0.069767,0.0,0.0,0.0,0.023256,0.0,0.023256,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.046512,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.046512,0.0,0.0,0.023256,0.0,0.0,0.046512,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.046512,0.0,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Barrio La Isla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.020408,0.040816,0.102041,0.0,0.0,0.040816,0.0,0.020408,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.020408,0.0,0.020408,0.061224,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.0,0.0,0.0,0.020408,0.0,0.0,0.020408,0.0,0.0,0.040816,0.020408,0.020408,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Barrio Ámbar,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.04,0.01,0.02,0.04,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.02,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.01,0.02,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.05,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.03,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0
6,Brisas del Este,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Cansino Adentro,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09375,0.03125,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.03125,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.03125,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Corales del Este,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0,0.086207,0.017241,0.068966,0.034483,0.0,0.0,0.0,0.034483,0.0,0.017241,0.0,0.0,0.0,0.051724,0.0,0.0,0.0,0.0,0.051724,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.034483,0.017241,0.0,0.0,0.017241,0.0,0.017241,0.017241,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.051724,0.0,0.0,0.017241,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.034483,0.017241,0.017241,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017241,0.0,0.0
9,El Almirante,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.06,0.01,0.03,0.04,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.03,0.01,0.0,0.01,0.02,0.0,0.03,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.07,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.04,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.05,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
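To see why the grouped values can be read as frequencies, here is a minimal sketch on made-up data. The `venues` table and its `Venue Category` column are hypothetical stand-ins; only the `groupby(...).mean()` step mirrors the cell above.

```python
import pandas as pd

# Hypothetical venue list: one row per venue returned for a neighbourhood.
venues = pd.DataFrame({
    "Neighborhoods": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Bar", "Bar", "Gym", "Bank", "Gym", "Gym"],
})

# One-hot encode the category, then put the neighbourhood name back in front.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhoods", venues["Neighborhoods"])

# The mean of a 0/1 column is the share of venues in that category.
grouped = onehot.groupby("Neighborhoods").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, so a value such as 0.5 for `Bar` means that half of the venues found in that neighbourhood are bars.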


Let's define a function that returns the most common venue categories for a neighbourhood

In [31]:
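def return_most_common_venues(row, num_top_venues):
    # skip the first entry, which holds the neighbourhood name
    row_categories = row.iloc[1:]
    # sort the venue frequencies from highest to lowest
    row_categories_sorted = row_categories.sort_values(ascending=False)

    # return the names of the top categories
    return row_categories_sorted.index.values[0:num_top_venues]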
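A quick check of the function on a toy table may help. The data below is hypothetical and avoids ties on purpose: with the default sort, categories that share the same frequency come back in no guaranteed order.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # skip the first entry, which holds the neighbourhood name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical frequencies for two neighbourhoods and three categories.
toy = pd.DataFrame({
    "Neighborhoods": ["A", "B"],
    "Bank": [0.10, 0.00],
    "Bar": [0.30, 0.20],
    "Gym": [0.20, 0.50],
})

top2_a = list(return_most_common_venues(toy.iloc[0, :], 2))
top2_b = list(return_most_common_venues(toy.iloc[1, :], 2))
print(top2_a)  # ['Bar', 'Gym']
print(top2_b)  # ['Gym', 'Bar']
```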

There are far too many venue categories to inspect one by one, so we keep only the top 10 for each neighbourhood.

Building the column labels for the top-venue table

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhoods']
for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd and 3rd take their own suffix
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # every later position falls back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))
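The loop above is enough for 10 venues. If `num_top_venues` were ever raised past 20, positions such as 21 or 22 would be labelled "21th" and "22th". A small helper like the one below, which is only a sketch and not part of the original notebook, handles those cases; `ordinal` and `venue_columns` are names made up for this example.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th and 13th (and the rest of the teens) always take 'th'
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

def venue_columns(num_top_venues):
    """Build the column labels for the top-venue table."""
    labels = ["Neighborhoods"]
    for position in range(1, num_top_venues + 1):
        labels.append("{} Most Common Venue".format(ordinal(position)))
    return labels

print(venue_columns(3))
# ['Neighborhoods', '1st Most Common Venue', '2nd Most Common Venue', '3rd Most Common Venue']
```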

## Top venue categories

Getting the top venue categories in Santo Domingo Este

In [33]:
# create a new dataframe
neighborhoods_venues_sorted_sde = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_sde['Neighborhoods'] = sde_grouped['Neighborhoods']

for ind in np.arange(sde_grouped.shape[0]):
    neighborhoods_venues_sorted_sde.iloc[ind, 1:] = return_most_common_venues(sde_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_sde.head()

| | Neighborhoods | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alma Rosa I | Burger Joint | Pizza Place | BBQ Joint | Nightclub | Bar | Supermarket | Gym | Fast Food Restaurant | Fried Chicken Joint | Park |
| 1 | Alma Rosa II | Bar | Pizza Place | BBQ Joint | Burger Joint | Nightclub | Supermarket | Fried Chicken Joint | Fast Food Restaurant | Steakhouse | Park |
| 2 | Ana Teresa Balaguer | Bank | Pharmacy | Restaurant | Supermarket | Grocery Store | Cupcake Shop | Park | Nightclub | Fast Food Restaurant | Caribbean Restaurant |
| 3 | Arismar | Bank | BBQ Joint | Bar | Burger Joint | Gym | Pizza Place | Supermarket | Bakery | Caribbean Restaurant | Fried Chicken Joint |
| 4 | Barrio La Isla | Gym | Pizza Place | Supermarket | Harbor / Marina | Grocery Store | Department Store | Park | Restaurant | Burger Joint | Social Club |


In [34]:
# How many neighbourhoods have at least one shopping mall among their venues?
len(sde_grouped[sde_grouped["Shopping Mall"] > 0])

34

## Model Building

## K Means

Run k-means to cluster the neighborhoods into 5 clusters, using the shopping mall frequency as the only feature.

In [37]:
# set number of clusters
k = 5

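# keep only the numeric feature; pass the axis by keyword, since recent pandas versions reject it as a positional argument
sde_clustering = sde_mall.drop(columns=["Neighborhoods"])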

# run k-means clustering
kmeans = KMeans(n_clusters=k, random_state=0).fit(sde_clustering)
kmeans

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=5, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)
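The number of clusters is fixed by hand in the cell above. One common sanity check, which is not part of the original notebook, is the elbow heuristic: compute the within-cluster sum of squares for several values of k and look for the point where it stops dropping sharply. The sketch below uses a plain NumPy version of Lloyd's iterations on a single feature rather than scikit-learn, and the input values are hypothetical.

```python
import numpy as np

def kmeans_1d_inertia(values, k, n_iter=50):
    """Run plain Lloyd iterations on one feature and return the
    within-cluster sum of squared distances (the 'inertia')."""
    x = np.asarray(values, dtype=float)
    # Deterministic start: spread the initial centres over the data quantiles.
    centers = np.quantile(x, np.linspace(0.0, 1.0, k))
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = x[labels == j]
            if members.size:  # keep the old centre if a cluster is empty
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return float(np.sum((x - centers[labels]) ** 2))

# Hypothetical feature with three obvious groups.
toy_values = [0, 0, 0, 1, 1, 1, 5, 5, 5]
inertias = {k: kmeans_1d_inertia(toy_values, k) for k in (1, 2, 3)}
print(inertias)  # {1: 42.0, 2: 1.5, 3: 0.0}
```

With scikit-learn the same quantity is available as the `inertia_` attribute of a fitted `KMeans` object, so the loop over k can be run on `sde_clustering` directly.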

## Labelling Clustered Data

In [40]:
kmeans.labels_

array([0, 0, 2, 2, 3, 0, 2, 2, 3, 0, 2, 2, 0, 2, 3, 4, 2, 2, 2, 0, 2, 3,
       1, 2, 0, 0, 2, 3, 3, 2, 3, 2, 2, 2, 1, 2, 0, 2, 3, 0, 2, 2, 3, 0,
       2, 2, 0, 3, 0, 3, 3, 0, 3, 1, 2, 4, 0, 2, 0], dtype=int32)

In [41]:
# copy the per-neighborhood shopping mall frequencies so the cluster labels can be attached
sde_merged = sde_mall.copy()

# add clustering labels
sde_merged["Cluster Labels"] = kmeans.labels_

In [42]:
sde_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
sde_merged.head(10)

| | Neighborhood | Shopping Mall | Cluster Labels |
|---|---|---|---|
| 0 | Alma Rosa I | 0.01 | 0 |
| 1 | Alma Rosa II | 0.01 | 0 |
| 2 | Ana Teresa Balaguer | 0.0 | 2 |
| 3 | Arismar | 0.0 | 2 |
| 4 | Barrio La Isla | 0.020408 | 3 |
| 5 | Barrio Ámbar | 0.01 | 0 |
| 6 | Brisas del Este | 0.0 | 2 |
| 7 | Cansino Adentro | 0.0 | 2 |
| 8 | Corales del Este | 0.017241 | 3 |
| 9 | El Almirante | 0.01 | 0 |


In [43]:
# add latitude and longitude values; these assignments align on the row index

print(sde_merged.shape)
sde_merged['Latitude'] = sde_df['Latitude']
sde_merged['Longitude'] = sde_df['Longitude']
sde_merged['LatitudeGoogle'] = sde_df['LatitudeGoogle']
sde_merged['LongitudeGoogle'] = sde_df['LongitudeGoogle']
sde_merged.head()

(59, 3)


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude | LatitudeGoogle | LongitudeGoogle |
|---|---|---|---|---|---|---|---|
| 0 | Alma Rosa I | 0.01 | 0 | 18.49417 | -69.85453 | 18.490636 | -69.863463 |
| 1 | Alma Rosa II | 0.01 | 0 | 18.48742 | -69.85057 | 18.495101 | -69.84597 |
| 2 | Ana Teresa Balaguer | 0.0 | 2 | 18.52523 | -69.82409 | 18.524252 | -69.821769 |
| 3 | Arismar | 0.0 | 2 | 18.4706 | -69.81673 | 18.469575 | -69.816929 |
| 4 | Barrio La Isla | 0.020408 | 3 | 18.49847 | -69.87071 | 18.49861 | -69.869978 |
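Assigning the coordinate columns directly works as long as both tables list the neighbourhoods in the same row order. When that is not guaranteed, joining on the neighbourhood name is a safer option. The sketch below uses two small made-up tables; it assumes the coordinates table carries a `Neighborhood` column, which may not match the real layout of `sde_df`.

```python
import pandas as pd

# Hypothetical cluster table and coordinates table, deliberately in different row order.
clusters = pd.DataFrame({
    "Neighborhood": ["Alma Rosa I", "Arismar"],
    "Cluster Labels": [0, 2],
})
coords = pd.DataFrame({
    "Neighborhood": ["Arismar", "Alma Rosa I"],
    "Latitude": [18.4706, 18.49417],
    "Longitude": [-69.81673, -69.85453],
})

# Join on the name; validate raises if a neighbourhood appears more than once on either side.
merged = clusters.merge(coords, on="Neighborhood", how="left", validate="one_to_one")
print(merged)
```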


In [44]:
# sorting the results by Cluster Labels
sde_merged.sort_values(["Cluster Labels"], inplace=True)
sde_merged

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude | LatitudeGoogle | LongitudeGoogle |
|---|---|---|---|---|---|---|---|
| 0 | Alma Rosa I | 0.01 | 0 | 18.49417 | -69.85453 | 18.490636 | -69.863463 |
| 56 | Villa Faro | 0.01 | 0 | 18.47878 | -69.81663 | 18.480503 | -69.816557 |
| 51 | Valle del Este | 0.011494 | 0 | 18.50244 | -69.84783 | 18.50315 | -69.848573 |
| 48 | Urbanizacion Mi Hogar | 0.01 | 0 | 18.47059 | -69.82365 | 18.472043 | -69.823258 |
| 46 | Sans Souci | 0.01 | 0 | 18.54526 | -69.79389 | 18.543305 | -69.795709 |
| 43 | Residencial del Este | 0.01 | 0 | 18.53544 | -69.84654 | 18.493313 | -69.812226 |
| 39 | Reparto Alma Rosa | 0.01 | 0 | 18.49417 | -69.83749 | 18.496171 | -69.83666 |
| 36 | Ozama | 0.011905 | 0 | 18.47593 | -69.85946 | 18.475227 | -69.864952 |
| 25 | Los Minas Sur | 0.01 | 0 | 18.49891 | -69.85762 | 18.502707 | -69.86653 |
| 24 | Los Minas | 0.01 | 0 | 18.47294 | -69.86787 | 18.472586 | -69.855826 |


In [45]:
sde_merged['Shopping Mall'].max()

0.047619047619047616

## Now we visualize the resulting clusters

In [46]:
# create map
map_clusters = folium.Map(location=location_center, zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i+x+(i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, latG, lonG, poi, cluster in zip(sde_merged['Latitude'], sde_merged['Longitude'], sde_merged['LatitudeGoogle'], sde_merged['LongitudeGoogle'], sde_merged['Neighborhood'], sde_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        #[lat, lon],
        [latG, lonG],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
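The palette only needs one distinct colour per cluster label. As an alternative sketch that avoids the matplotlib dependency, the standard library can generate evenly spaced hues; `cluster_palette` is a hypothetical helper, and its output is indexed directly by the label (0 to k-1).

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hues as hex strings, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread the hues around the colour wheel; keep saturation and value fixed.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.9, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = cluster_palette(5)
# Colour for a neighbourhood assigned to cluster 3:
print(palette[3])
```

The resulting strings can be passed to the `color` and `fill_color` arguments of `folium.CircleMarker` in place of the `rainbow` list.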

In [None]:
# save the map as HTML file
map_clusters.save('/users/dandrade/desktop/map_clusters.html')

## Examine clusters

Cluster 0

In [47]:
print('Cluster 0: Number of neighbourhoods/places: {}'.format(len(sde_merged.loc[sde_merged['Cluster Labels'] == 0])))
sde_merged.loc[sde_merged['Cluster Labels'] == 0]

Cluster 0: Number of neighbourhoods/places: 16


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude | LatitudeGoogle | LongitudeGoogle |
|---|---|---|---|---|---|---|---|
| 0 | Alma Rosa I | 0.01 | 0 | 18.49417 | -69.85453 | 18.490636 | -69.863463 |
| 56 | Villa Faro | 0.01 | 0 | 18.47878 | -69.81663 | 18.480503 | -69.816557 |
| 51 | Valle del Este | 0.011494 | 0 | 18.50244 | -69.84783 | 18.50315 | -69.848573 |
| 48 | Urbanizacion Mi Hogar | 0.01 | 0 | 18.47059 | -69.82365 | 18.472043 | -69.823258 |
| 46 | Sans Souci | 0.01 | 0 | 18.54526 | -69.79389 | 18.543305 | -69.795709 |
| 43 | Residencial del Este | 0.01 | 0 | 18.53544 | -69.84654 | 18.493313 | -69.812226 |
| 39 | Reparto Alma Rosa | 0.01 | 0 | 18.49417 | -69.83749 | 18.496171 | -69.83666 |
| 36 | Ozama | 0.011905 | 0 | 18.47593 | -69.85946 | 18.475227 | -69.864952 |
| 25 | Los Minas Sur | 0.01 | 0 | 18.49891 | -69.85762 | 18.502707 | -69.86653 |
| 24 | Los Minas | 0.01 | 0 | 18.47294 | -69.86787 | 18.472586 | -69.855826 |


Cluster 1

In [48]:
print('Cluster 1: Number of neighbourhoods/places: {}'.format(len(sde_merged.loc[sde_merged['Cluster Labels'] == 1])))
sde_merged.loc[sde_merged['Cluster Labels'] == 1]

Cluster 1: Number of neighbourhoods/places: 3


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude | LatitudeGoogle | LongitudeGoogle |
|---|---|---|---|---|---|---|---|
| 22 | Los Frailes II | 0.04 | 1 | 18.47424 | -69.8201 | 18.47234 | -69.819535 |
| 53 | Villa Cumbre | 0.044944 | 1 | 18.50473 | -69.82533 | 18.503809 | -69.822513 |
| 34 | Milagrosa | 0.047619 | 1 | 18.50387 | -69.81732 | 18.505649 | -69.847629 |


Cluster 2

In [49]:
print('Cluster 2: Number of neighbourhoods/places: {}'.format(len(sde_merged.loc[sde_merged['Cluster Labels'] == 2])))
sde_merged.loc[sde_merged['Cluster Labels'] == 2]

Cluster 2: Number of neighbourhoods/places: 25


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude | LatitudeGoogle | LongitudeGoogle |
|---|---|---|---|---|---|---|---|
| 11 | El Paredón | 0.0 | 2 | 18.492345 | -69.799065 | 18.508648 | -69.847084 |
| 7 | Cansino Adentro | 0.0 | 2 | 18.53676 | -69.84154 | 18.53181 | -69.842383 |
| 37 | Paraíso Oriental | 0.0 | 2 | 18.51022 | -69.87509 | 18.50151 | -69.868741 |
| 6 | Brisas del Este | 0.0 | 2 | 18.47897 | -69.79017 | 18.480384 | -69.797556 |
| 40 | Residencial Don Oscar | 0.0 | 2 | 18.50133 | -69.85729 | 18.500798 | -69.857024 |
| 13 | Hainamosa | 0.0 | 2 | 18.48608 | -69.847023 | 18.487413 | -69.847084 |
| 44 | San Isidro | 0.0 | 2 | 18.52232 | -69.85071 | 18.522822 | -69.850434 |
| 45 | San Luis | 0.0 | 2 | 18.529 | -69.7619 | 18.528748 | -69.779327 |
| 3 | Arismar | 0.0 | 2 | 18.4706 | -69.81673 | 18.469575 | -69.816929 |
| 2 | Ana Teresa Balaguer | 0.0 | 2 | 18.52523 | -69.82409 | 18.524252 | -69.821769 |


Cluster 3

In [50]:
print('Cluster 3: Number of neighbourhoods/places: {}'.format(len(sde_merged.loc[sde_merged['Cluster Labels'] == 3])))
sde_merged.loc[sde_merged['Cluster Labels'] == 3]

Cluster 3: Number of neighbourhoods/places: 13


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude | LatitudeGoogle | LongitudeGoogle |
|---|---|---|---|---|---|---|---|
| 38 | Ralma | 0.02 | 3 | 18.521582 | -69.753584 | 18.485407 | -69.830517 |
| 27 | Los Tres Ojos | 0.017857 | 3 | 18.47572 | -69.87688 | 18.475288 | -69.878354 |
| 4 | Barrio La Isla | 0.020408 | 3 | 18.49847 | -69.87071 | 18.49861 | -69.869978 |
| 47 | Tropical del Este | 0.017544 | 3 | 18.50532 | -69.85664 | 18.468601 | -69.874981 |
| 28 | Los Trinitarios | 0.020408 | 3 | 18.47956 | -69.83192 | 18.479833 | -69.843039 |
| 8 | Corales del Este | 0.017241 | 3 | 18.469919 | -69.826187 | 18.472537 | -69.827725 |
| 50 | Urbanización San Cirilo | 0.02381 | 3 | 18.511313 | -69.81064 | 18.510556 | -69.811531 |
| 42 | Residencial Tito IV | 0.015873 | 3 | 18.53427 | -69.84958 | 18.534458 | -69.848619 |
| 52 | Villa Carmen | 0.023256 | 3 | 18.476352 | -69.863277 | 18.482378 | -69.830331 |
| 30 | Lucerna | 0.017241 | 3 | 18.48045 | -69.80846 | 18.478527 | -69.798687 |


Cluster 4

In [51]:
print('Cluster 4: Number of neighbourhoods/places: {}'.format(len(sde_merged.loc[sde_merged['Cluster Labels'] == 4])))
sde_merged.loc[sde_merged['Cluster Labels'] == 4]

Cluster 4: Number of neighbourhoods/places: 2


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude | LatitudeGoogle | LongitudeGoogle |
|---|---|---|---|---|---|---|---|
| 15 | Invivienda | 0.029851 | 4 | 18.518 | -69.82064 | 18.520005 | -69.821769 |
| 55 | Villa Eloisa | 0.03125 | 4 | 18.47991 | -69.87673 | 18.480498 | -69.87606 |
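The cluster numbers assigned by k-means are arbitrary, so a higher label does not mean more malls. Before comparing the clusters it helps to rank them by their average shopping mall frequency. The sketch below does this with a handful of rows copied from the cluster tables above (two per cluster, not the full data set).

```python
import pandas as pd

# Two rows per cluster, copied from the tables above.
sample = pd.DataFrame({
    "Neighborhood": ["Alma Rosa I", "Ozama", "Los Frailes II", "Milagrosa",
                     "Arismar", "San Luis", "Barrio La Isla", "Lucerna",
                     "Invivienda", "Villa Eloisa"],
    "Shopping Mall": [0.01, 0.011905, 0.04, 0.047619,
                      0.0, 0.0, 0.020408, 0.017241,
                      0.029851, 0.03125],
    "Cluster Labels": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
})

# Summarise the mall frequency per cluster and order from lowest to highest.
summary = (sample.groupby("Cluster Labels")["Shopping Mall"]
           .agg(["count", "mean", "min", "max"])
           .sort_values("mean"))
ranked_clusters = summary.index.tolist()
print(summary)
print(ranked_clusters)  # [2, 0, 3, 4, 1]
```

Running the same `groupby` on the full `sde_merged` table gives the ranking for all 59 neighbourhoods.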


# Final observation

A good number of shopping malls are concentrated in the central area of Santo Domingo Este. Cluster 1 has the highest frequency of shopping malls (about 0.04 to 0.05 of the venues returned for its 3 neighbourhoods), followed by cluster 4 (about 0.03) and cluster 3 (about 0.02). Cluster 0 has a low frequency (about 0.01), and none of the 25 neighbourhoods in cluster 2 has a shopping mall among its venues. Clusters 2 and 0 therefore represent high-potential areas for new shopping malls, as there is little to no competition from existing malls. Meanwhile, shopping malls in cluster 1 are likely facing the strongest competition because of the higher concentration of malls. This project therefore recommends that property developers open new shopping malls in neighbourhoods in cluster 2, followed by cluster 0. Developers with a unique selling proposition that stands out from the competition can also consider neighbourhoods in clusters 3 and 4, where competition is moderate. Lastly, developers are advised to avoid neighbourhoods in cluster 1, which already have the highest concentration of shopping malls.


