# Capstone Project: What is the best location to open a CrossFit Box in Paris?

**April 2019**

## Table of contents
* [Business Problem](#introduction)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## PART 1: Business Problem <a name="introduction"></a>

<h3><center><font color='royalblue'> What is the best location to open a CrossFit Box in Paris? </font></center></h3>


In this project we will try to find an optimal location for a CrossFit Box in Paris. We will look for **locations that are not already crowded with gyms or fitness centers**. In particular, we are interested in areas with no gyms, fitness centers or CrossFit boxes in the vicinity, and with few sporting facilities nearby. We will target **residential areas** rather than tourist ones, and we will **prefer locations as close to the city center as possible**.

## PART 2: Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number and distance to existing gyms (any type of gym/fitness center)
* number of sporting facilities nearby (football pitch, tennis court, boxing gym, ...)
* number and distance to existing crossfit boxes
* distance of neighborhood from city center

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.

The following data sources will be needed to extract or generate the required information:
* centers of candidate areas will be generated algorithmically, and their approximate addresses will be obtained using **Google Maps API reverse geocoding**
* the number, type and location of gyms, sporting facilities and CrossFit boxes in every neighborhood will be obtained using the **Foursquare API**
* the coordinates of the Paris city center will be obtained using **Google Maps API geocoding**

First, let's import the libraries we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

import json # library to handle JSON files
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # map rendering library

### Define Neighborhoods

We are going to divide Paris into multiple areas of the same size. **Each area will be considered as a neighborhood** and will have a centroid defined with a latitude and a longitude. 

The first step is to find the latitude and longitude of the Paris city center, using a specific, well-known address and the Google Maps geocoding API. The geographic center of the French capital is considered to be _Place Dauphine_, next to the famous Notre-Dame Cathedral. Let's find the coordinates for this address.

In [2]:
# API KEY 
# This code is not shown

In [3]:
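def get_coordinates(api_key, address):
    """Geocode an address with the Google Maps API; returns [lat, lon] or [None, None]."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json'
        # Passing params lets requests URL-encode the address (spaces, commas, accents)
        response = requests.get(url, params={'key': api_key, 'address': address}).json()
        results = response['results']
        geographical_data = results[0]['geometry']['location']
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except (requests.RequestException, KeyError, IndexError, ValueError):
        # Network failure, malformed JSON, or no match for the address
        return [None, None]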
    
address = 'Place Dauphine, 75001 Paris, France'
paris_center = get_coordinates(google_api_key, address)
print('Coordinates of {} are: {}'.format(address, paris_center))

Coordinates of Place Dauphine, 75001 Paris, France are: [48.8565422, 2.3425083]
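
The request and the response handling can also be looked at separately, without calling the API. The sketch below is only an illustration: `build_geocode_url`, `parse_location` and `sample_response` are hypothetical names, and the sample response is hand-written to mimic the fields that `get_coordinates` reads, plus the `status` field that the Geocoding API returns alongside `results`.

```python
from urllib.parse import urlencode

GEOCODE_URL = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    """Return the request URL with the key and address percent-encoded."""
    return GEOCODE_URL + '?' + urlencode({'key': api_key, 'address': address})

def parse_location(response):
    """Extract [lat, lon] from a decoded geocoding response, or [None, None]."""
    if response.get('status') != 'OK' or not response.get('results'):
        return [None, None]
    location = response['results'][0]['geometry']['location']
    return [location['lat'], location['lng']]

# Hypothetical, hand-written response with the same shape as the real one
sample_response = {
    'status': 'OK',
    'results': [{'geometry': {'location': {'lat': 48.8565422, 'lng': 2.3425083}}}],
}

url = build_geocode_url('YOUR_API_KEY', 'Place Dauphine, 75001 Paris, France')
print(url)
print(parse_location(sample_response))
print(parse_location({'status': 'ZERO_RESULTS', 'results': []}))
```

Checking `status` before indexing into `results` makes the "no match" case explicit instead of relying on an exception.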


Now that we have the city center, we need to decide how much area our grid should cover. We create a grid centered on Place Dauphine that extends 6 km from that point. This covers all of Paris, and some areas beyond it, which we will filter out later.

Each neighborhood is defined as a circular area with a radius of 300 meters, so neighboring centers are 600 meters apart.

To calculate distances accurately we need to create our grid in a 2D Cartesian coordinate system, which lets us work in meters rather than in degrees of latitude/longitude. We then project those coordinates back to latitude/longitude so they can be shown on a Folium map.

First, let's define the functions we need to convert between 2D Cartesian coordinates and latitude/longitude. To do that, we need to import the **pyproj** module.

In [4]:
import shapely.geometry

# pyproj is used to convert latitudes/longitudes to cartesian 2D
import pyproj

# import math module to calculate euclidean distance
import math

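# NOTE: Paris lies in UTM zone 31; zone 33 is kept here so that the X/Y values match the outputs shown below.
# Newer pyproj releases deprecate pyproj.transform in favor of pyproj.Transformer.
def lonlat_to_xy(lon, lat):
    """Project WGS84 longitude/latitude to UTM x/y in meters."""
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    """Convert UTM x/y in meters back to WGS84 longitude/latitude."""
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]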

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
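
A projected distance can be sanity-checked against a great-circle distance computed directly from latitude and longitude. The sketch below is an independent cross-check, not part of the original pipeline: `haversine_m` is a hypothetical helper that assumes a spherical Earth with a mean radius of 6,371,008.8 m, and the two points are the Place Dauphine coordinates and one grid point taken from this notebook's outputs.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371008.8):
    """Great-circle distance in meters between two points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Paris center (Place Dauphine) and one grid point near the southern edge of the grid
center = (48.8565422, 2.3425083)
grid_point = (48.803704, 2.331432)

d = haversine_m(center[0], center[1], grid_point[0], grid_point[1])
print(round(d), 'meters')
```

If the great-circle value differs noticeably from the projected "Distance from center" for the same point, the choice of UTM zone is the first thing to check, since a projection stretches distances more the farther a location is from the zone's central meridian.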

Let's create a **hexagonal grid of cells**:

In [5]:
# City center in Cartesian coordinates
paris_center_x, paris_center_y = lonlat_to_xy(paris_center[1], paris_center[0])

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = paris_center_x - 6000
x_step = 600
y_min = paris_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(paris_center_x, paris_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print('Number of neighborhoods generated = ', len(latitudes))

Number of neighborhoods generated =  364
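
The geometry of the grid can be checked without any projection by placing the center at the origin. The sketch below is a simplified re-implementation under that assumption (`hex_grid` is a hypothetical helper, not the cell above). It counts the centers, measures the smallest distance between any two of them, and compares the count with a rough area-based estimate: the area of the 6 km disc divided by the area of one hexagonal cell, $600^2 \cdot \sqrt{3}/2$ square meters.

```python
import math

def hex_grid(radius_m=6000, spacing_m=600):
    """Return (x, y) centers of a hexagonal grid inside a disc centered on (0, 0)."""
    k = math.sqrt(3) / 2                       # row height as a fraction of the spacing
    cols = int(2 * radius_m / spacing_m) + 1   # 21 columns for the values used here
    rows = int(cols / k)
    x_min = -radius_m
    y_min = -radius_m - (rows * k * spacing_m - 2 * radius_m) / 2
    points = []
    for i in range(rows):
        y = y_min + i * spacing_m * k
        x_offset = spacing_m / 2 if i % 2 == 0 else 0   # shift every other row
        for j in range(cols):
            x = x_min + j * spacing_m + x_offset
            if math.hypot(x, y) <= radius_m + 1:
                points.append((x, y))
    return points

points = hex_grid()
# Smallest distance between any two centers (brute force is fine for a few hundred points)
nearest = min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])
# Disc area divided by the area of one hexagonal cell
estimate = math.pi * 6000 ** 2 / (600 ** 2 * math.sqrt(3) / 2)
print(len(points), round(nearest, 6), round(estimate, 1))
```

With the default values this should reproduce the 364 centers reported above, with every center 600 meters from its closest neighbors.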


Using folium we can visualize these neighborhoods:

In [6]:
map_paris = folium.Map(location=paris_center, zoom_start=12)
folium.Marker(paris_center, popup='Place Dauphine').add_to(map_paris)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='royalblue', fill=False).add_to(map_paris)

map_paris

Now, let's create a function that takes a latitude and a longitude as inputs and returns a formatted address.

In [7]:
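def get_address(api_key, latitude, longitude, verbose=False):
    """Reverse geocode a point with the Google Maps API; returns the formatted address or None."""
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except (requests.RequestException, KeyError, IndexError, ValueError):
        # Network failure, malformed JSON, or no address found for this point
        return None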

Now we can use this function to get all addresses:

In [8]:
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', France', '') # We don't need country part of address
    print(address)
    addresses.append(address)

3 Rue Berthollet, 94110 Arcueil
60 Rue de la Division du Général Leclerc, 94110 Arcueil
112 Avenue Charles Gide, 94270 Le Kremlin-Bicêtre
15 Avenue Charles Gide, 94270 Le Kremlin-Bicêtre
141 Avenue de Fontainebleau, 94270 Le Kremlin-Bicêtre
21 Rue Carnot, 94200 Ivry-sur-Seine
14 Rue Michelet, 94200 Ivry-sur-Seine
8 Avenue de Stalingrad, 92220 Bagneux
43 D920, 94110 Arcueil
Laplace RER, 94110 Arcueil
3 Rue Emile Bougard, 94110 Arcueil
63 Rue Gabriel Péri, 94270 Le Kremlin-Bicêtre
61 Rue du Général Leclerc, 94270 Le Kremlin-Bicêtre
1 Avenue du Boulodrome, 94270 Le Kremlin-Bicêtre
5 Allée Irène Joliot-Curie, 94200 Ivry-sur-Seine
42 Rue Gabriel Péri, 94200 Ivry-sur-Seine
6 Rue Marcel Cachin, 94200 Ivry-sur-Seine
85 Avenue Marx Dormoy, 92220 Bagneux
58 Rue Fénelon, 92120 Montrouge
24 Rue du Stade Buffalo, 92120 Montrouge
45 Avenue du Président Nelson Mandela, 94110 Arcueil
49 Rue Pierre Marcel, 94250 Gentilly
15 Rue Raymond Lefebvre, 94250 Gentilly
14 Avenue Raspail, 94250 Gentilly
27 Avenu

129 Boulevard Mortier, 75020 Paris
68 Rue Louis David, 93170 Bagnolet
2 Boulevard Suchet, 75016 Paris
119 Rue de la Tour, 75116 Paris
17 Rue Greuze, 75116 Paris
15 Rue de Longchamp, 75016 Paris
4 Rue Léonce Reynaud, 75116 Paris
18 Rue François 1er, 75008 Paris
10 Av. des Champs-Élysées, 75008 Paris
1 Rue du Faubourg Saint-Honoré, 75008 Paris
13 Rue de la Paix, 75002 Paris
85Z Rue de Richelieu, 75002 Paris
10 Boulevard Poissonnière, 75009 Paris
13 Passage Reilhac, 75010 Paris
25 Rue des Vinaigriers, 75010 Paris
4 Rue Jean et Marie Moinon, 75010 Paris
1 Rue Lauzin, 75019 Paris
7 Voie Communale H 19, 75019 Paris
1 Rue Henri Ribière, 75019 Paris
55 Rue de Romainville, 75019 Paris
57 Rue des Frères Flavien, 75020 Paris
47 Avenue du Maréchal Fayolle, 75116 Paris
7 Villa Spontini, 75116 Paris
79 Rue Boissière, 75116 Paris
16 Rue Dumont d'Urville, 75116 Paris
23 Rue Quentin-Bauchart, 75008 Paris
61 Avenue Franklin Delano Roosevelt, 75008 Paris
11 Rue Cambacérès, 75008 Paris
34 Rue des Mathurin

Let's put the results in a dataframe.

In [9]:
df = pd.DataFrame({'Neighborhood': addresses,
                   'Latitude': latitudes,
                   'Longitude': longitudes,
                   'X': xs,
                   'Y': ys,
                   'Distance from center': distances_from_center})

df.head(10)

| | Neighborhood | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | 3 Rue Berthollet, 94110 Arcueil | 48.803704 | 2.331432 | -429193.491168 | 5483527.0 | 5992.495307 |
| 1 | 60 Rue de la Division du Général Leclerc, 9411... | 48.804594 | 2.339404 | -428593.491168 | 5483527.0 | 5840.3767 |
| 2 | 112 Avenue Charles Gide, 94270 Le Kremlin-Bicêtre | 48.805484 | 2.347376 | -427993.491168 | 5483527.0 | 5747.173218 |
| 3 | 15 Avenue Charles Gide, 94270 Le Kremlin-Bicêtre | 48.806374 | 2.355349 | -427393.491168 | 5483527.0 | 5715.767665 |
| 4 | 141 Avenue de Fontainebleau, 94270 Le Kremlin-... | 48.807263 | 2.363322 | -426793.491168 | 5483527.0 | 5747.173218 |
| 5 | 21 Rue Carnot, 94200 Ivry-sur-Seine | 48.808151 | 2.371296 | -426193.491168 | 5483527.0 | 5840.3767 |
| 6 | 14 Rue Michelet, 94200 Ivry-sur-Seine | 48.809039 | 2.37927 | -425593.491168 | 5483527.0 | 5992.495307 |
| 7 | 8 Avenue de Stalingrad, 92220 Bagneux | 48.806927 | 2.318305 | -430093.491168 | 5484047.0 | 5855.766389 |
| 8 | 43 D920, 94110 Arcueil | 48.807819 | 2.326277 | -429493.491168 | 5484047.0 | 5604.462508 |
| 9 | Laplace RER, 94110 Arcueil | 48.80871 | 2.33425 | -428893.491168 | 5484047.0 | 5408.326913 |


As you can see, some addresses lie outside Paris. Paris is not a perfect circle, so the 6 km disc also covers parts of the neighboring towns.

To fix this, we remove every neighborhood outside Paris, using a regular expression that checks whether the address ends with 'Paris'.

In [10]:
df = df[df.Neighborhood.str.match(".*Paris$")].reset_index(drop=True)
print('Final number of Neighborhoods = ', df.shape[0], '\n')
df.head()

Final number of Neighborhoods =  281 



| | Neighborhood | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | 27 Avenue de la Porte d'Italie, 75013 Paris | 48.816385 | 2.360992 | -426793.491168 | 5484566.0 | 4714.870094 |
| 1 | 9 B Boulevard Jourdan, 75014 Paris | 48.818721 | 2.339888 | -428293.491168 | 5485086.0 | 4253.234064 |
| 2 | jardin Jean-Claude-Nicolas-Forestier, Rue Thom... | 48.819611 | 2.347863 | -427693.491168 | 5485086.0 | 4167.733197 |
| 3 | 4 Rue Keufer, 75013 Paris | 48.820501 | 2.355838 | -427093.491168 | 5485086.0 | 4167.733197 |
| 4 | 16 Avenue de Choisy, 75013 Paris | 48.82139 | 2.363814 | -426493.491168 | 5485086.0 | 4253.234064 |
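
A slightly stricter variant of the filter is to require a Paris postal code (starting with 75) right before the city name, rather than only checking the last word. The sketch below uses the standard `re` module on a few addresses from the output above; the pattern and the last two test strings are assumptions added for illustration, not part of the original analysis.

```python
import re

# Assumed pattern: a five-digit postal code starting with 75, then "Paris" at the end
paris_pattern = re.compile(r'\b75\d{3} Paris$')

addresses = [
    '3 Rue Berthollet, 94110 Arcueil',
    '129 Boulevard Mortier, 75020 Paris',
    '119 Rue de la Tour, 75116 Paris',
    'NO ADDRESS',          # placeholder used when reverse geocoding fails
    'Porte de Paris',      # hypothetical string ending in "Paris" without a postal code
]

kept = [a for a in addresses if paris_pattern.search(a)]
print(kept)
```

The simpler `.*Paris$` check would also keep the hypothetical last entry, which is why anchoring on the postal code can be a safer choice when the address format varies.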


We will visualize the 281 neighborhoods to check if everything is OK:

In [11]:
map_paris = folium.Map(location=paris_center, zoom_start=12)

for label, lat, lon in zip(df['Neighborhood'], df['Latitude'], df['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.Circle([lat, lon], 
                  radius=300, 
                  color='royalblue', 
                  fill=True, 
                  fill_color='#3186cc',
                  fill_opacity=0.1,
                  popup=label).add_to(map_paris)

map_paris

It looks perfect now. We can move on to the venues.

### Get the venues for each Neighborhood (Foursquare API)
Now that we have the location of each neighborhood, let's use the Foursquare API to get information on the venues in each one.

We will use a 350 meter radius around each neighborhood center.

In [12]:
# Foursquare api credentials
# This code is not shown

In [13]:
# Foursquare API version
VERSION = '20180605'

# define the limit
LIMIT = 100

We need to define the function to get the venues' info for each neighborhood. We need to know the number of sporting facilities for each neighborhood and the coordinates of gyms and crossfit boxes.

Venues considered as Sporting Facilities (or Sport Venues) are:  
  * Basketball Court
  * Tennis Court
  * Soccer Field
  * Track Field
  * Boxing Club
  * Swimming Pool
  * Martial Arts Dojo

Venues considered as gyms range from Yoga and Pilates studios to Fitness Centers and CrossFit Boxes.
  
We want to know:

- Number of Sporting Facilities per neighborhood
- Distance to closest Gym (Gym/Fitness Center)
- Distance to closest CrossFit Box

In [14]:
import re

# the id's are available at https://developer.foursquare.com/docs/resources/categories

sport_venues_ids = ['4bf58dd8d48988d18b941735', '4bf58dd8d48988d188941735', '4e39a891bd410d7aed40cbc2', 
                   '4bf58dd8d48988d187941735', '4bf58dd8d48988d1b4941735', '4bf58dd8d48988d1b7941735', 
                   '4e39a9cebd410d7aed40cbc4', '4bf58dd8d48988d1b6941735', '4f4528bc4b90abdf24c9de85', 
                   '52f2ab2ebcbc57f1066b8b47', '4bf58dd8d48988d105941735', '4bf58dd8d48988d101941735', 
                   '4bf58dd8d48988d106941735', '52e81612bcbc57f1066b7a2c', '4cce455aebf7b749d5e191f5', 
                   '52e81612bcbc57f1066b7a2e', '52e81612bcbc57f1066b7a2d', '4e39a956bd410d7aed40cbc3'
                   ]

gym_crossfit_ids = ['4bf58dd8d48988d1b2941735', '4bf58dd8d48988d175941735', '52f2ab2ebcbc57f1066b8b48',
                    '4bf58dd8d48988d176941735', '58daa1558bbb0b01f18ec203', '5744ccdfe4b0c0459246b4b2',
                    '590a0744340a5803fd8508c3', '4bf58dd8d48988d102941735'
                   ]

crossfitRegex = re.compile(r'.*[Cc]ross[Ff]it.*')


def is_sporting_facility(venue):
    """Returns True if the venue is a sporting facility"""
    return venue['venue']['categories'][0]['id'] in sport_venues_ids

def is_gym(venue):
    """Returns True if the venue is a Gym, but not a Crossfit Gym"""
    if venue['venue']['categories'][0]['id'] in gym_crossfit_ids:
        return not crossfitRegex.match(venue['venue']['name'])
    return False

def is_crossfit_box(venue):
    """Returns True if the venue is a Crossfit Box"""
    if venue['venue']['categories'][0]['id'] in gym_crossfit_ids:
        return bool(crossfitRegex.match(venue['venue']['name']))
    return False
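
The three checks can also be folded into a single classifier that tolerates venues without any category. The sketch below is one possible arrangement, not the notebook's own code: `classify_venue`, the placeholder IDs and the sample venues are hypothetical, and the name pattern is deliberately broader than the one above (case-insensitive, optional space between "cross" and "fit").

```python
import re

# Hypothetical placeholder IDs; in the notebook these would be the Foursquare category IDs
SPORT_IDS = {'sport-id-1'}
GYM_IDS = {'gym-id-1'}
CROSSFIT_RE = re.compile(r'cross\s*fit', re.IGNORECASE)

def classify_venue(item):
    """Return 'crossfit', 'gym', 'sport' or None for one venue item."""
    venue = item.get('venue', {})
    categories = venue.get('categories') or []
    if not categories:
        return None                      # nothing to classify without a category
    category_id = categories[0].get('id')
    if category_id in GYM_IDS:
        # A gym-type venue counts as a CrossFit box when its name says so
        return 'crossfit' if CROSSFIT_RE.search(venue.get('name', '')) else 'gym'
    if category_id in SPORT_IDS:
        return 'sport'
    return None

# Hand-written sample items that mimic the nested structure used above
items = [
    {'venue': {'name': 'CrossFit Example', 'categories': [{'id': 'gym-id-1'}]}},
    {'venue': {'name': 'Example Fitness Club', 'categories': [{'id': 'gym-id-1'}]}},
    {'venue': {'name': 'Example Tennis Court', 'categories': [{'id': 'sport-id-1'}]}},
    {'venue': {'name': 'Example Café', 'categories': []}},
]

labels = [classify_venue(item) for item in items]
print(labels)
```

Returning a single label per venue keeps the gym and CrossFit cases mutually exclusive by construction.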

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=350):
    
    sporting_facilities, gyms_coordinates, crossfit_coordinates = [], [], []
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            foursquare_id, 
            foursquare_secret, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        sporting_fs = 0
        for v in results:
            if is_sporting_facility(v):
                sporting_fs += 1
            if is_gym(v):
                gym_name = v['venue']['name']
                latitude = v['venue']['location']['lat']
                longitude = v['venue']['location']['lng']
                x, y = lonlat_to_xy(longitude, latitude)
                gyms_coordinates.append([gym_name, latitude, longitude, x, y])
            if is_crossfit_box(v):
                crossfit_name = v['venue']['name']
                latitude = v['venue']['location']['lat']
                longitude = v['venue']['location']['lng']
                x, y = lonlat_to_xy(longitude, latitude)
                crossfit_coordinates.append([crossfit_name, latitude, longitude, x, y])
        sporting_facilities.append([name, sporting_fs])
    
    return sporting_facilities, gyms_coordinates, crossfit_coordinates
                      

def getMinimumDistance(x1, y1, coordinates_list):
    distances = np.zeros(len(coordinates_list))
    for i in range(len(coordinates_list)):
        x2 = coordinates_list[i][3]
        y2 = coordinates_list[i][4]
        distance = calc_xy_distance(x1, y1, x2, y2)
        distances[i] = distance
    return min(distances)
        
        
def getDistances(df, gyms_coordinates, crossfit_coordinates):
    # Initialize as floats, since the distances stored below are fractional
    df['Distance to Gym'] = 0.0
    df['Distance to Crossfit'] = 0.0
    for index, row in df.iterrows():
        x1 = row['X']
        y1 = row['Y']
        df.loc[index, 'Distance to Gym'] = getMinimumDistance(x1, y1, gyms_coordinates)
        df.loc[index, 'Distance to Crossfit'] = getMinimumDistance(x1, y1, crossfit_coordinates)
    return df
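
Note that `getMinimumDistance` raises an error if the list of venues is empty, which could happen if Foursquare returned no CrossFit box at all. A minimal sketch of a more defensive variant follows; `minimum_distance` and the sample entries are hypothetical, and returning infinity for an empty list is an assumption about the desired behavior.

```python
import math

def minimum_distance(x1, y1, coordinates_list):
    """Distance from (x1, y1) to the closest entry, or math.inf for an empty list.

    Each entry is expected as [name, latitude, longitude, x, y].
    """
    return min(
        (math.hypot(entry[3] - x1, entry[4] - y1) for entry in coordinates_list),
        default=math.inf,
    )

# Hypothetical venues with simple projected coordinates (meters)
sample_venues = [
    ['Venue A', None, None, 3.0, 4.0],
    ['Venue B', None, None, 30.0, 40.0],
]

print(minimum_distance(0.0, 0.0, sample_venues))
print(minimum_distance(0.0, 0.0, []))
```

An infinite distance passes any "farther than" filter, so a neighborhood is not rejected just because no venue of that type was found.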

In [17]:
sporting_facilities, gyms_coordinates, crossfit_coordinates = getNearbyVenues(names = df['Neighborhood'],
                                                                              latitudes = df['Latitude'],
                                                                              longitudes = df['Longitude']
                                                                             )

In [18]:
df_final = getDistances(df, gyms_coordinates, crossfit_coordinates)
df_final.head()

| | Neighborhood | Latitude | Longitude | X | Y | Distance from center | Distance to Gym | Distance to Crossfit |
|---|---|---|---|---|---|---|---|---|
| 0 | 27 Avenue de la Porte d'Italie, 75013 Paris | 48.816385 | 2.360992 | -426793.491168 | 5484566.0 | 4714.870094 | 1383.497554 | 4844.821592 |
| 1 | 9 B Boulevard Jourdan, 75014 Paris | 48.818721 | 2.339888 | -428293.491168 | 5485086.0 | 4253.234064 | 713.899536 | 4325.95122 |
| 2 | jardin Jean-Claude-Nicolas-Forestier, Rue Thom... | 48.819611 | 2.347863 | -427693.491168 | 5485086.0 | 4167.733197 | 521.73237 | 4770.713609 |
| 3 | 4 Rue Keufer, 75013 Paris | 48.820501 | 2.355838 | -427093.491168 | 5485086.0 | 4167.733197 | 842.899983 | 4505.527699 |
| 4 | 16 Avenue de Choisy, 75013 Paris | 48.82139 | 2.363814 | -426493.491168 | 5485086.0 | 4253.234064 | 1004.469723 | 4250.547462 |


Let's add the number of sporting facilities per neighborhood.

In [19]:
sporting_df = pd.DataFrame(sporting_facilities, columns=['Neighborhood', 'Sporting Facilities'])
df_merged = df_final.join(sporting_df.set_index('Neighborhood'), on='Neighborhood')
df_merged.head(10)

| | Neighborhood | Latitude | Longitude | X | Y | Distance from center | Distance to Gym | Distance to Crossfit | Sporting Facilities |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 27 Avenue de la Porte d'Italie, 75013 Paris | 48.816385 | 2.360992 | -426793.491168 | 5484566.0 | 4714.870094 | 1383.497554 | 4844.821592 | 0 |
| 1 | 9 B Boulevard Jourdan, 75014 Paris | 48.818721 | 2.339888 | -428293.491168 | 5485086.0 | 4253.234064 | 713.899536 | 4325.95122 | 0 |
| 2 | jardin Jean-Claude-Nicolas-Forestier, Rue Thom... | 48.819611 | 2.347863 | -427693.491168 | 5485086.0 | 4167.733197 | 521.73237 | 4770.713609 | 0 |
| 3 | 4 Rue Keufer, 75013 Paris | 48.820501 | 2.355838 | -427093.491168 | 5485086.0 | 4167.733197 | 842.899983 | 4505.527699 | 0 |
| 4 | 16 Avenue de Choisy, 75013 Paris | 48.82139 | 2.363814 | -426493.491168 | 5485086.0 | 4253.234064 | 1004.469723 | 4250.547462 | 1 |
| 5 | 44 Rue Péan, 75013 Paris | 48.822279 | 2.37179 | -425893.491168 | 5485086.0 | 4419.275959 | 525.474122 | 3948.790729 | 0 |
| 6 | 16 Avenue de la Porte de Vitry, 75013 Paris | 48.823166 | 2.379766 | -425293.491168 | 5485086.0 | 4657.252409 | 413.690369 | 3602.394331 | 0 |
| 7 | 9 Rue Henri Barboux, 75014 Paris | 48.821946 | 2.326758 | -429193.491168 | 5485606.0 | 4058.324778 | 503.32977 | 3331.568175 | 1 |
| 8 | 25 Rue du Parc de Montsouris, 75014 Paris | 48.822837 | 2.334733 | -428593.491168 | 5485606.0 | 3830.1436 | 525.377745 | 3750.3601 | 0 |
| 9 | 77 Rue Brillat-Savarin, 75013 Paris | 48.823727 | 2.342708 | -427993.491168 | 5485606.0 | 3686.461718 | 198.033916 | 4213.200121 | 0 |


## PART 3: Analysis <a name="analysis"></a>

OK, now that we have the dataframe ready for analysis, let's get some preliminary info.

In [20]:
df_merged[df_merged['Sporting Facilities'] > 1 ]

| | Neighborhood | Latitude | Longitude | X | Y | Distance from center | Distance to Gym | Distance to Crossfit | Sporting Facilities |
|---|---|---|---|---|---|---|---|---|---|
| 18 | 10 Rue Henry de Bournazel, 75014 Paris | 48.825168 | 2.313626 | -430093.491168 | 5486125.0 | 4124.318125 | 109.675514 | 2375.887002 | 2 |
| 32 | 2 Rue Antonin Mercié, 75015 Paris | 48.828389 | 2.300493 | -430993.491168 | 5486645.0 | 4439.594576 | 561.560821 | 1533.079121 | 2 |
| 46 | Porte de Montempoivre, 75012 Paris | 48.840841 | 2.41218 | -422593.491168 | 5486645.0 | 5458.02162 | 619.897831 | 1722.117364 | 2 |
| 100 | 5 Rue des Docteurs Dejerine, 75020 Paris | 48.85497 | 2.412684 | -422293.491168 | 5488204.0 | 5204.805472 | 692.959304 | 1121.135469 | 4 |
| 137 | 10 Rue Joseph Python, 75020 Paris | 48.864094 | 2.410358 | -422293.491168 | 5489243.0 | 5100.0 | 1110.003778 | 1558.744215 | 2 |
| 195 | 47 Avenue du Maréchal Fayolle, 75116 Paris | 48.867189 | 2.269985 | -432493.491168 | 5491321.0 | 5507.267925 | 402.943023 | 3382.922205 | 4 |
| 214 | 63 Rue Pergolèse, 75116 Paris | 48.873092 | 2.280781 | -431593.491168 | 5491841.0 | 4938.62329 | 630.760421 | 3715.806551 | 2 |
| 266 | 173 Rue Ordener, 75018 Paris | 48.893472 | 2.33714 | -427093.491168 | 5493400.0 | 4167.733197 | 221.232069 | 2956.535026 | 2 |
| 269 | 1001 Rue du Pré, 75018 Paris | 48.896145 | 2.361101 | -425293.491168 | 5493400.0 | 4657.252409 | 316.819396 | 3836.119701 | 3 |
| 277 | 4 Rue Camille Flammarion, 75018 Paris | 48.898479 | 2.339963 | -426793.491168 | 5493919.0 | 4714.870094 | 215.686905 | 3539.245169 | 2 |


In [21]:
print('The average distance from neighborhood to closest Gym is ', np.round(df_merged['Distance to Gym'].mean()), ' meters.')
print('The average distance from neighborhood to closest CrossFit Box is ', np.round(df_merged['Distance to Crossfit'].mean()), ' meters.')

The average distance from neighborhood to closest Gym is  549.0  meters.
The average distance from neighborhood to closest CrossFit Box is  2217.0  meters.


Now that we have this information, we can define our criteria to find the optimal location to open a CrossFit Box in Paris.  
  
We will target locations that match the **following criteria**:
* Distance to Gym > 700 meters
* Distance to CrossFit Box > 2,200 meters
* A maximum of 1 Sporting Facility within the neighborhood radius (350 meters)

Let's first visualize a heatmap of the gyms and fitness centers in Paris.

In [22]:
gyms_latlongs = [[gym[1], gym[2]] for gym in gyms_coordinates]

In [23]:
from folium import plugins
from folium.plugins import HeatMap

map_paris = folium.Map(location=paris_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_paris) #cartodbpositron cartodbdark_matter
HeatMap(gyms_latlongs).add_to(map_paris)
folium.Marker(paris_center).add_to(map_paris)
map_paris

There are 4 areas where we can clearly see fewer gyms and fitness centers:
* North East
* Center of Paris (probably because it is a more touristic area, with less space available and very high prices)
* West
* South East

Now let's visualize where the CrossFit Boxes are in Paris.

In [24]:
map_paris = folium.Map(location=paris_center, zoom_start=12)
for box in crossfit_coordinates:
    label = folium.Popup(box[0], parse_html=True)
    folium.Circle([box[1], box[2]], 
                    radius=200, 
                    color='orangered', 
                    fill=True, 
                    fill_color='orangered',
                    fill_opacity=0.1,
                    popup=label).add_to(map_paris)
map_paris

According to Foursquare, there are 4 CrossFit Boxes in Paris. 3 of them are located on the 'Rive Droite', the northern bank of the river Seine, and they are concentrated in the same area as the sporting facilities. The other one is located in the 15th district (South West).  
  
We are interested in three main zones:  
 * North of Paris
 * South East
 * West

Now let's build the dataframe matching the conditions we previously defined for further analysis.

In [25]:
df_selected = df_merged[(df_merged['Sporting Facilities'] < 2) 
                        & (df_merged['Distance to Gym'] > 700) 
                        & (df_merged['Distance to Crossfit'] > 2200)]
df_selected

| | Neighborhood | Latitude | Longitude | X | Y | Distance from center | Distance to Gym | Distance to Crossfit | Sporting Facilities |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 27 Avenue de la Porte d'Italie, 75013 Paris | 48.816385 | 2.360992 | -426793.491168 | 5484566.0 | 4714.870094 | 1383.497554 | 4844.821592 | 0 |
| 1 | 9 B Boulevard Jourdan, 75014 Paris | 48.818721 | 2.339888 | -428293.491168 | 5485086.0 | 4253.234064 | 713.899536 | 4325.95122 | 0 |
| 3 | 4 Rue Keufer, 75013 Paris | 48.820501 | 2.355838 | -427093.491168 | 5485086.0 | 4167.733197 | 842.899983 | 4505.527699 | 0 |
| 4 | 16 Avenue de Choisy, 75013 Paris | 48.82139 | 2.363814 | -426493.491168 | 5485086.0 | 4253.234064 | 1004.469723 | 4250.547462 | 1 |
| 12 | 67 Rue Nationale, 75013 Paris | 48.826395 | 2.366636 | -426193.491168 | 5485606.0 | 3830.1436 | 706.935166 | 3658.143136 | 0 |
| 25 | 22 Rue Duchefdelaville, 75013 Paris | 48.831401 | 2.369459 | -425893.491168 | 5486125.0 | 3459.768778 | 786.389392 | 3068.691688 | 0 |
| 41 | 25 Rue Fernand Braudel, 75013 Paris | 48.836406 | 2.372283 | -425593.491168 | 5486645.0 | 3160.696126 | 808.169987 | 2484.295906 | 0 |
| 54 | 83 Boulevard de Port-Royal, 75013 Paris | 48.837854 | 2.343193 | -427693.491168 | 5487164.0 | 2100.0 | 766.742721 | 3337.227607 | 0 |
| 73 | 26 Rue Lhomond, 75005 Paris | 48.84286 | 2.346014 | -427393.491168 | 5487684.0 | 1558.845727 | 908.337228 | 2809.260681 | 0 |
| 157 | Porte de Passy, 75016 Paris | 48.857175 | 2.264358 | -433093.491168 | 5490282.0 | 5793.962375 | 842.397763 | 2758.579353 | 0 |


We can visualize the neighborhoods that made the cut:

In [26]:
map_paris = folium.Map(location=paris_center, zoom_start=12)
for label, lat, lon in zip(df_selected['Neighborhood'], df_selected['Latitude'], df_selected['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.Circle([lat, lon],
                  radius=200,
                  color='orangered',
                  fill=True,
                  fill_color='orangered',
                  fill_opacity=0.2,
                  popup=label).add_to(map_paris)
map_paris

Looking good. What we have now is a clear indication of zones with a low number of sporting venues in the vicinity and *no* CrossFit Boxes nearby.

Let us now **cluster** those locations to create **centers of zones containing good locations**. Those zones, their centers and addresses will be the final result of our analysis. 

In [27]:
good_latitudes = df_selected['Latitude'].values
good_longitudes = df_selected['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

In [28]:
from sklearn.cluster import KMeans

number_of_clusters = 6

good_xys = df_selected[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_paris = folium.Map(location=paris_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_paris)
HeatMap(gyms_latlongs).add_to(map_paris)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='orangered', fill=True, fill_opacity=0.25).add_to(map_paris) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_paris)
for box in crossfit_coordinates:
    folium.Marker([box[1], box[2]], popup=box[0]).add_to(map_paris)
map_paris
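
To make the clustering step less of a black box, here is a plain-Python sketch of Lloyd's algorithm, the iteration at the heart of k-means. It is a didactic stand-in, not scikit-learn's implementation: `kmeans` is a hypothetical helper, it uses the first `k` points as initial centers (scikit-learn defaults to k-means++ initialization), and the six points are made-up coordinates in meters.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; the first k points serve as initial centers."""
    centers = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        if new_centers == centers:              # converged: centers stopped moving
            break
        centers = new_centers
    return centers, labels

# Made-up projected coordinates (meters) forming two well-separated groups
sample_points = [(0, 0), (600, 0), (250, 520), (5000, 5000), (5600, 5000), (5300, 5520)]

centers, labels = kmeans(sample_points, k=2)
print(labels)
print(centers)
```

Because the algorithm works on Euclidean distances, it is run on projected X/Y coordinates in meters rather than on raw latitude/longitude.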

Finally, let's **reverse geocode those candidate area centers to get the addresses** which can be presented to stakeholders.

In [29]:
candidate_area_addresses = []
for lon, lat in cluster_centers:
    candidate = []
    add = get_address(google_api_key, lat, lon)
    candidate.append(add)
    x, y = lonlat_to_xy(lon, lat)
    distance_to_center = calc_xy_distance(x, y, paris_center_x, paris_center_y)
    candidate.append(distance_to_center)
    candidate.append(lat)
    candidate.append(lon)
    candidate_area_addresses.append(candidate)

### Clusters Comparison  
  
To choose the optimal cluster, we need to keep two things in mind:  
* The distance to the center of the city.
* The kind of neighborhood it is (touristic, residential, ...)

The first one can easily be calculated, but to get an idea of the kind of neighborhood, we will need to use the Foursquare API and get the 15 most common venues per cluster within a radius of 500 m.

Let's first define the function to get that:

In [32]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Function to get nearby venues from an address"""
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            foursquare_id, 
            foursquare_secret, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


def get_grouped_dataframe(df):
    """Function that takes the venues dataframe as input and returns a dataframe of venues grouped by neighborhoods"""
    # one hot encoding
    df_onehot = pd.get_dummies(df[['Venue Category']], prefix="", prefix_sep="")
    # add neighborhood column back to dataframe
    df_onehot['Neighborhood'] = df['Neighborhood'] 
    # move neighborhood column to the first column
    fixed_columns = [df_onehot.columns[-1]] + list(df_onehot.columns[:-1])
    df_onehot = df_onehot[fixed_columns]
    df_grouped = df_onehot.groupby('Neighborhood').mean().reset_index()
    return df_grouped

def return_most_common_venues(row, num_top_venues):
    """Function to sort the venues in descending order"""
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


def create_dataframe_top_venues(num_top_venues, df_grouped):
    """Function that creates a dataframe and display the top X venues for each neighborhood."""
    
    indicators = ['st', 'nd', 'rd']
    # create columns according to number of top venues
    columns = ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except IndexError:
            columns.append('{}th Most Common Venue'.format(ind+1))

    # create a new dataframe
    neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Neighborhood'] = df_grouped['Neighborhood']

    for ind in np.arange(df_grouped.shape[0]):
        neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_grouped.iloc[ind, :], num_top_venues)

    return neighborhoods_venues_sorted
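
To make the helpers above more concrete, here is a minimal, pandas-free sketch of the same idea on made-up data: the share of each venue category per neighborhood (which is what the mean over one-hot columns yields) and the top N categories by that share. The neighborhood labels, the venue categories and the helper names `category_shares` and `top_categories` are hypothetical and exist only for this illustration. Ties are broken alphabetically here, which is an assumption of the sketch and not something pandas guarantees.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs, standing in for the
# 'Neighborhood' and 'Venue Category' columns of the venues dataframe.
toy_venues = [
    ("A", "Café"), ("A", "Café"), ("A", "Bakery"), ("A", "Gym"),
    ("B", "Park"), ("B", "Gym"), ("B", "Gym"),
]

def category_shares(venues):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        n: {cat: k / sum(c.values()) for cat, k in c.items()}
        for n, c in counts.items()
    }

def top_categories(shares, num_top_venues):
    """Sort categories by descending share (ties alphabetical) and keep the top N."""
    ranked = sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, _ in ranked[:num_top_venues]]

shares = category_shares(toy_venues)
top_by_neighborhood = {n: top_categories(s, 2) for n, s in shares.items()}
print(top_by_neighborhood)
```

One difference from the dataframe version: categories that never occur in a neighborhood are simply absent here, whereas the one-hot encoding gives them a frequency of 0.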

In [33]:
candidate_names = [item[0] for item in candidate_area_addresses]
candidate_lats = [item[2] for item in candidate_area_addresses]
candidate_lons = [item[3] for item in candidate_area_addresses]

nearby_venues = getNearbyVenues(names=candidate_names, 
                               latitudes=candidate_lats,
                               longitudes=candidate_lons
                               )

grouped_venues = get_grouped_dataframe(nearby_venues)

top_venues = create_dataframe_top_venues(15, grouped_venues)

top_venues

33 Rue du Département, 75018 Paris, France
61 Rue Claude Bernard, 75005 Paris, France
63 Avenue du Maréchal Fayolle, 75116 Paris, France
89 Avenue de Choisy, 75013 Paris, France
108 Bd Sérurier, 75019 Paris, France
24 Rue Laure Diebold, 75008 Paris, France


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,"108 Bd Sérurier, 75019 Paris, France",French Restaurant,Bar,Hotel,Pizza Place,Grocery Store,Supermarket,Pool,Brewery,Burger Joint,Sandwich Place,Pub,Japanese Restaurant,Bistro,Plaza,Park
1,"24 Rue Laure Diebold, 75008 Paris, France",French Restaurant,Hotel,Café,Clothing Store,Asian Restaurant,Salad Place,Italian Restaurant,Hotel Bar,Chocolate Shop,Sushi Restaurant,Pizza Place,Coffee Shop,Hookah Bar,Playground,Cocktail Bar
2,"33 Rue du Département, 75018 Paris, France",Indian Restaurant,French Restaurant,Hotel,Plaza,Lounge,Bakery,Theater,Café,Grocery Store,Garden,Wine Bar,Motorcycle Shop,Gym Pool,Dim Sum Restaurant,Diner
3,"61 Rue Claude Bernard, 75005 Paris, France",Café,French Restaurant,Hotel,Greek Restaurant,Bar,Science Museum,Pizza Place,Coffee Shop,Bistro,Creperie,Dessert Shop,Wine Bar,Italian Restaurant,Farmers Market,Indie Movie Theater
4,"63 Avenue du Maréchal Fayolle, 75116 Paris, Fr...",Café,Italian Restaurant,Bakery,Diner,Soccer Stadium,Plaza,Chinese Restaurant,Supermarket,Tennis Court,French Restaurant,Park,Track,Garden,Train Station,Gym Pool
5,"89 Avenue de Choisy, 75013 Paris, France",Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,Cantonese Restaurant,French Restaurant,Cambodian Restaurant,Fast Food Restaurant,Supermarket,Bakery,Park,Coffee Shop,Café,Gourmet Shop,Japanese Restaurant
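
Since the goal is to avoid areas with many sporting facilities, it can help to check which candidate areas have sport-related categories among their top venues. The sketch below does this on the category lists copied from the table above. The keyword list `SPORT_KEYWORDS` and the helper `sport_categories` are assumptions made for this illustration, and a category appearing in a top-15 list is only a rough signal, because low-ranked categories may correspond to a single venue.

```python
# Top-15 category lists copied from the table above, keyed by the street part
# of each candidate address.
top_venues_rows = {
    "108 Bd Sérurier": "French Restaurant,Bar,Hotel,Pizza Place,Grocery Store,Supermarket,Pool,Brewery,Burger Joint,Sandwich Place,Pub,Japanese Restaurant,Bistro,Plaza,Park",
    "24 Rue Laure Diebold": "French Restaurant,Hotel,Café,Clothing Store,Asian Restaurant,Salad Place,Italian Restaurant,Hotel Bar,Chocolate Shop,Sushi Restaurant,Pizza Place,Coffee Shop,Hookah Bar,Playground,Cocktail Bar",
    "33 Rue du Département": "Indian Restaurant,French Restaurant,Hotel,Plaza,Lounge,Bakery,Theater,Café,Grocery Store,Garden,Wine Bar,Motorcycle Shop,Gym Pool,Dim Sum Restaurant,Diner",
    "61 Rue Claude Bernard": "Café,French Restaurant,Hotel,Greek Restaurant,Bar,Science Museum,Pizza Place,Coffee Shop,Bistro,Creperie,Dessert Shop,Wine Bar,Italian Restaurant,Farmers Market,Indie Movie Theater",
    "63 Avenue du Maréchal Fayolle": "Café,Italian Restaurant,Bakery,Diner,Soccer Stadium,Plaza,Chinese Restaurant,Supermarket,Tennis Court,French Restaurant,Park,Track,Garden,Train Station,Gym Pool",
    "89 Avenue de Choisy": "Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,Cantonese Restaurant,French Restaurant,Cambodian Restaurant,Fast Food Restaurant,Supermarket,Bakery,Park,Coffee Shop,Café,Gourmet Shop,Japanese Restaurant",
}

# Assumption: these substrings are a rough proxy for "sporting facility".
SPORT_KEYWORDS = ("Gym", "Pool", "Stadium", "Tennis", "Track")

def sport_categories(categories, keywords=SPORT_KEYWORDS):
    """Return the categories whose name contains any of the keywords."""
    return [c for c in categories if any(k in c for k in keywords)]

flagged = {
    address: sport_categories(row.split(","))
    for address, row in top_venues_rows.items()
}
for address, categories in flagged.items():
    print(address, "->", categories if categories else "none")
```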


## **<font color='mediumblue'>CLUSTER 1:</font>**

In [82]:
print('Address is: ', candidate_area_addresses[0][0])
print('Distance to Paris Center is : ', np.round(candidate_area_addresses[0][1]), ' m.')
print('\n Top venues are:', '\n')
top_venues.loc[top_venues['Neighborhood'] == candidate_area_addresses[0][0]]

Address is:  33 Rue du Département, 75018 Paris, France
Distance to Paris Center is :  3693.0  m.

 Top venues are: 



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
2,"33 Rue du Département, 75018 Paris, France",Indian Restaurant,French Restaurant,Hotel,Plaza,Lounge,Bakery,Theater,Café,Grocery Store,Garden,Wine Bar,Motorcycle Shop,Gym Pool,Dim Sum Restaurant,Diner


**XVIII DISTRICT**: "A district of all paradoxes, the 18th is home to Montmartre, one of the most beautiful - and most touristic - places in the world, as well as popular areas long neglected.
Neighborhoods that are renewed and welcome a new young population, specially around the City Hall of the 18th (Jules Joffrin), which has very nice surprises. Ideal for going out at night while being close to the main Parisian attractions."

_Source: https://www.unjourdeplusaparis.com/paris-essentiel/paris-par-arrondissements_

Our cluster is located in the "popular" (working-class) part of the district. **If our CrossFit Box targets a lower-income clientele, this residential area could be considered.**

## **<font color='mediumblue'>CLUSTER 2:</font>**

In [74]:
print('Address is: ', candidate_area_addresses[1][0])
print('Distance to Paris Center is : ', np.round(candidate_area_addresses[1][1]), ' m.')
print('\n Top venues are:', '\n')
top_venues.loc[top_venues['Neighborhood'] == candidate_area_addresses[1][0]]

Address is:  61 Rue Claude Bernard, 75005 Paris, France
Distance to Paris Center is :  1825.0  m.

 Top venues are: 



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
3,"61 Rue Claude Bernard, 75005 Paris, France",Café,French Restaurant,Hotel,Greek Restaurant,Bar,Science Museum,Pizza Place,Coffee Shop,Bistro,Creperie,Dessert Shop,Wine Bar,Italian Restaurant,Farmers Market,Indie Movie Theater


**V DISTRICT**: "The Latin District, a neighborhood known throughout the world for its history and cultural influence. The oldest part of Paris with the Ile de la Cité, there are some vestiges of the Gallo-Roman era and the mythical Sorbonne, symbol of the University established in Paris in the 12th century.

A district where each street breathes history, and where exceptional heritage meets, great Parisian museums and student atmosphere."

_Source: https://www.unjourdeplusaparis.com/paris-essentiel/paris-par-arrondissements_

**Best location if we target younger people (students) and want a very central spot: it is less than 2 km from the center.**

## **<font color='mediumblue'>CLUSTER 3:</font>**

In [75]:
print('Address is: ', candidate_area_addresses[2][0])
print('Distance to Paris Center is : ', np.round(candidate_area_addresses[2][1]), ' m.')
print('\n Top venues are:', '\n')
top_venues.loc[top_venues['Neighborhood'] == candidate_area_addresses[2][0]]

Address is:  63 Avenue du Maréchal Fayolle, 75116 Paris, France
Distance to Paris Center is :  5520.0  m.

 Top venues are: 



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
4,"63 Avenue du Maréchal Fayolle, 75116 Paris, Fr...",Café,Italian Restaurant,Bakery,Diner,Soccer Stadium,Plaza,Chinese Restaurant,Supermarket,Tennis Court,French Restaurant,Park,Track,Garden,Train Station,Gym Pool


**XVI DISTRICT**: "Nowadays, this district is completely inside of Paris, but it has not always been the case since for more than 1,000 years, the entire district - now bourgeois and very green - was located outside the capital. More residential than touristic, it is still visited by the curious since it has the famous Trocadero, the Bois de Boulogne, the Parc des Princes or the Roland-Garros stadium."

_Source: https://www.unjourdeplusaparis.com/paris-essentiel/paris-par-arrondissements_

**This could be interesting if we target the higher-income population of Paris. However, the distance to the center is not optimal: at about 5.5 km, it is quite far away.**

## **<font color='mediumblue'>CLUSTER 4:</font>**

In [76]:
print('Address is: ', candidate_area_addresses[3][0])
print('Distance to Paris Center is : ', np.round(candidate_area_addresses[3][1]), ' m.')
print('\n Top venues are:', '\n')
top_venues.loc[top_venues['Neighborhood'] == candidate_area_addresses[3][0]]

Address is:  89 Avenue de Choisy, 75013 Paris, France
Distance to Paris Center is :  3864.0  m.

 Top venues are: 



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
5,"89 Avenue de Choisy, 75013 Paris, France",Vietnamese Restaurant,Asian Restaurant,Thai Restaurant,Chinese Restaurant,Cantonese Restaurant,French Restaurant,Cambodian Restaurant,Fast Food Restaurant,Supermarket,Bakery,Park,Coffee Shop,Café,Gourmet Shop,Japanese Restaurant


**XIII DISTRICT**: "More local and less touristic, the 13th arrondissement is known to be a former working class district but also the current Asian neighborhood and the district of Butte-aux-Cailles located on the heights of Paris.
The area is also home to the Bibliothèque François-Mitterrand, the Austerlitz train station, the Gobelins factory and the renowned hospital of La Pitié-Salpêtrière."

_Source: https://www.unjourdeplusaparis.com/paris-essentiel/paris-par-arrondissements_

**As we can observe from the most common venues, our cluster is inside the Asian neighborhood.**

## **<font color='mediumblue'>CLUSTER 5:</font>**

In [77]:
print('Address is: ', candidate_area_addresses[4][0])
print('Distance to Paris Center is : ', np.round(candidate_area_addresses[4][1]), ' m.')
print('\n Top venues are:', '\n')
top_venues.loc[top_venues['Neighborhood'] == candidate_area_addresses[4][0]]

Address is:  108 Bd Sérurier, 75019 Paris, France
Distance to Paris Center is :  4994.0  m.

 Top venues are: 



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,"108 Bd Sérurier, 75019 Paris, France",French Restaurant,Bar,Hotel,Pizza Place,Grocery Store,Supermarket,Pool,Brewery,Burger Joint,Sandwich Place,Pub,Japanese Restaurant,Bistro,Plaza,Park


**XIX DISTRICT**: "Former industrial district developed around the Canal de l'Ourcq, the 19th arrondissement knows today very important transformations, symbolized by the Parc de la Villette and its museums, new major cultural center of Paris.

A popular district with a deep Parisian soul, which also reserves secret walks around Buttes Chaumont. A neighborhood that goes up, and deserves to be discovered."

_Source: https://www.unjourdeplusaparis.com/paris-essentiel/paris-par-arrondissements_

**Trendy neighborhood, but quite far from the center (about 5 km).**

## **<font color='mediumblue'>CLUSTER 6:</font>**

In [78]:
print('Address is: ', candidate_area_addresses[5][0])
print('Distance to Paris Center is : ', np.round(candidate_area_addresses[5][1]), ' m.')
print('\n Top venues are:', '\n')
top_venues.loc[top_venues['Neighborhood'] == candidate_area_addresses[5][0]]

Address is:  24 Rue Laure Diebold, 75008 Paris, France
Distance to Paris Center is :  3537.0  m.

 Top venues are: 



Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
1,"24 Rue Laure Diebold, 75008 Paris, France",French Restaurant,Hotel,Café,Clothing Store,Asian Restaurant,Salad Place,Italian Restaurant,Hotel Bar,Chocolate Shop,Sushi Restaurant,Pizza Place,Coffee Shop,Hookah Bar,Playground,Cocktail Bar


**VIII DISTRICT**: "On the right bank, the 8th arrondissement is that of luxury and fashion, symbolized by the "golden triangle" that form the rue Montaigne, the rue George V and the avenue des Champs-Élysées, whose atmosphere extend to rue Royale and that of Faubourg Saint-Honoré.

The 8th arrondissement is also home to a few ministries, as well as the Palais de l'Elysée, residence of the President of the French Republic."

_Source: https://www.unjourdeplusaparis.com/paris-essentiel/paris-par-arrondissements_

**Very good location if we were to open a restaurant, but not a CrossFit Box (tourists are not our target). Close to center, but surely expensive.**

## PART 4: Results and Discussion <a name="results"></a>

Our analysis shows that there are _6 zones_ we should consider if we want to open a CrossFit Box in Paris. However, **CLUSTER 4** sits in a very specific district (the Asian neighborhood), which could be risky. **CLUSTER 6** is not what we are looking for because it is a very touristic neighborhood, full of hotels and less residential. **CLUSTER 5** is located in a trendy, growing neighborhood, but it is far from the city center, which we think could be risky as well.  
  
This leaves us with three interesting options:

* **CLUSTER 1**: Located in a "popular" (working-class) residential area in the north of the city. Well suited to a low investment and a **LOW-COST STRATEGY** for our CrossFit Box.
  
  
* **CLUSTER 2**: Located in a very famous and central area. Particularly interesting if we are targeting **students** (a very good profile for a CrossFit gym). The investment would be higher, as it is considered a historical district.  
  
  
* **CLUSTER 3**: Located in the most "bourgeois" district. Residential area. Very interesting if we target the upper-middle-class with a **DIFFERENTIATION STRATEGY** (Higher investment).
  
    
The recommended zones should be considered only as a starting point for a more detailed analysis, which could eventually identify a location that not only has no nearby competition but also meets all other relevant conditions.
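
The shortlisting described in this section can be summarized in a few lines. The sketch below is only an illustration: the distances are the ones printed in the cluster cells, the exclusion reasons paraphrase the discussion, and ordering the remaining clusters by distance to the center is an assumption about how a stakeholder might want to read the shortlist, not a step of the analysis.

```python
# (cluster number, distance to center in meters, reason for exclusion or None).
# Distances come from the cluster cell outputs; reasons paraphrase the discussion.
clusters = [
    (1, 3693, None),
    (2, 1825, None),
    (3, 5520, None),
    (4, 3864, "very specific district (Asian neighborhood)"),
    (5, 4994, "far from the city center"),
    (6, 3537, "touristic, less residential"),
]

# Target profile attached to each retained cluster in the discussion.
strategy = {1: "low-cost strategy", 2: "students", 3: "differentiation strategy"}

# Keep the clusters that were not excluded and order them by distance to center.
shortlist = sorted((c for c in clusters if c[2] is None), key=lambda c: c[1])
for number, distance_m, _ in shortlist:
    print(f"CLUSTER {number}: {distance_m} m from center, {strategy[number]}")
```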

## PART 5: Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Paris close to the center with a low number of sporting facilities (particularly gyms and CrossFit Boxes) in order to help stakeholders narrow down the search for an optimal location for a new CrossFit Box. By calculating the density distribution of sporting venues from Foursquare data, we first identified general areas that justify further analysis (North, South East, West), and then generated an extensive collection of locations that satisfy some basic requirements regarding existing nearby gyms and CrossFit boxes. Those locations were then clustered to create major zones of interest, and the addresses of the zone centers were obtained to be used as starting points for final exploration by stakeholders.
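
As a rough illustration of the clustering step summarized above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. This part of the report does not show which algorithm or parameters were used, so the choice of k-means, the seeding with the first `k` points, and the sample coordinates (hypothetical x/y offsets in meters) are all assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centers (assumption)."""
    centers = list(points[:k])
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                new_centers.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                new_centers.append(center)
        if new_centers == centers:
            break  # assignments are stable, so the centers no longer move
        centers = new_centers
    return centers, groups

# Hypothetical candidate locations: two loose groups of three points each.
candidates = [(0, 0), (100, 0), (0, 100), (5000, 5000), (5100, 5000), (5000, 5100)]
centers, groups = kmeans(candidates, k=2)
print(centers)
print(groups)
```

Each resulting center plays the role of a zone center, which could then be reverse geocoded to obtain an approximate address.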

The final decision on the optimal CrossFit Box location will be made by stakeholders based on the specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (_economic status_: upper-middle class vs lower class, _age_: undetermined vs students, _residential character_: very high vs high).