<H2>Capstone Project: Finding Locations to Open a Gym in Brooklyn, NY</H2>

<h2>1. Introduction</h2>

A fitness club group is interested in opening their gym/fitness center in Brooklyn, NY. This project report is for the director board of the fitness club to suggest potential gym/fitness center locations closer to the city center of Brooklyn, NY & away from other gym/fitness centers, Boxing clubs or Gym pools.

There are several gym & fitness centers already operating in the Brooklyn area. <b>Our goal is to identify locations within 5km from the Brooklyn city center and 2.5km away from an existing Gym or Fitness club</b>. We will leverage the <b>Foursquare Places API</b> to find the candidate neighborhood centers for the Gym.

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [1]:

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    ca-certificates-2019.9.11  |       hecc5488_0         144 KB  conda-forge
    openssl-1.1.1c             |       h516909a_0         2.1 MB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.50-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

<h2>2. Download and Explore Dataset</h2>

We will essentially need a dataset that contains the borough 'Brooklyn' and the neighborhoods that exist in each borough as well as the the latitude and logitude coordinates of each neighborhood.

Fortunately, we have a dataset available free in the New York University's spatial data repository geo.nyu.edu. The dataset is named 'New York City Neighborhood Names' in JSON format consists of the boroughs, neighborhood names, geo coordinates, etc of New York. Here are the links to the New York neighborhood dataset:

Description: https://geo.nyu.edu/catalog/nyu-2451-34572 <br />
Download GeoJSON: https://geo.nyu.edu/catalog/nyu_2451_34572

Let's run the `wget` command and access the data. So let's go ahead and do that.

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Next, let's load the data.

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

In [4]:
#Explore data

neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Tranform the data into a *pandas* dataframe

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [5]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Then let's loop through the data and fill the dataframe one row at a time.

In [6]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)

Quickly examine the resulting dataframe.

In [7]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


Let's slice the original dataframe and create a new dataframe of the Brooklyn data.

In [8]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Bay Ridge,40.625801,-74.030621
1,Brooklyn,Bensonhurst,40.611009,-73.99518
2,Brooklyn,Sunset Park,40.645103,-74.010316
3,Brooklyn,Greenpoint,40.730201,-73.954241
4,Brooklyn,Gravesend,40.59526,-73.973471


Let's see the shape of the dataframe

In [9]:
brooklyn_data.shape

(70, 4)

#### Use geopy library to get the latitude and longitude values of Brooklyn, NY.
In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>ny_explorer</em>, as shown below.

In [10]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Brooklyn are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Brooklyn are 40.6501038, -73.9495823.


<h3>Visualize neighborhoods</h3>

#### Create a map of Brooklyn with neighborhoods superimposed on top.

Let's visualize the Brooklyn neighborhood locations using the data we have using python's folium library for map rendering. Feel free to zoom into the above map, and click on each circle mark to reveal the name of the neighborhood and its respective borough.

In [41]:
# create map of Manhattan using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)
brooklyn = [40.6501038, -73.9495823]

# add markers to map
folium.Marker(brooklyn).add_to(map_brooklyn)
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

## 3. Explore Neighborhods in Brooklyn

<h3>Foursquare API</h3>
Next, we are going to start utilizing the Foursquare Places API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [12]:
CLIENT_ID = 'QOZYZ3COF2WH1YL5Z4KU5YIP24DD2J0NZBJRKVOUFDHBK0EY' # your Foursquare ID
CLIENT_SECRET = 'QPHPHREZBWB0QJTC2ARM5BPLDAKXDDG05XO51KV3RBR2CGZG' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
radius = 500
LIMIT = 100
print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: QOZYZ3COF2WH1YL5Z4KU5YIP24DD2J0NZBJRKVOUFDHBK0EY
CLIENT_SECRET:QPHPHREZBWB0QJTC2ARM5BPLDAKXDDG05XO51KV3RBR2CGZG


#### Get the top 100 venues that are in neighborhoods within a radius of 500 meters.

Let's create a function with the GET request URL to explore venues for all the neighborhoods in Brooklyn.

In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)


#### Now write the code to run the above function on each neighborhood and create a new dataframe called *brooklyn_venues*.

In [14]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )

Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


List the resulting dataframe

In [15]:
brooklyn_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
1,Bay Ridge,40.625801,-74.030621,Bagel Boy,40.627896,-74.029335,Bagel Shop
2,Bay Ridge,40.625801,-74.030621,Cocoa Grinder,40.623967,-74.030863,Juice Bar
3,Bay Ridge,40.625801,-74.030621,Pegasus Cafe,40.623168,-74.031186,Breakfast Spot
4,Bay Ridge,40.625801,-74.030621,Ho' Brah Taco Joint,40.62296,-74.031371,Taco Place


### List the existing Gyms from *brooklyn_venues*

Let's list the gym/fitness centers from brooklyn_venues dataframe 

In [16]:
brooklyn_gym = brooklyn_venues[brooklyn_venues['Venue Category'].str.contains('Gym')]
brooklyn_gym.reset_index(drop=True)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,New York Sports Clubs,40.622364,-74.027163,Gym / Fitness Center
1,Sunset Park,40.645103,-74.010316,Blink Fitness Sunset Park,40.645622,-74.013302,Gym
2,Sunset Park,40.645103,-74.010316,Richie's Gym,40.645354,-74.013609,Gym
3,Greenpoint,40.730201,-73.954241,IncrediPole,40.731838,-73.955069,Gymnastics Gym
4,Gravesend,40.59526,-73.973471,Fitness by bobby,40.591779,-73.973823,Gym
5,Prospect Heights,40.676822,-73.964859,Tabata Ultimate Fitness,40.679674,-73.969058,Gym
6,Prospect Heights,40.676822,-73.964859,Crossfit Kingsboro,40.680065,-73.960838,Gym / Fitness Center
7,Williamsburg,40.707144,-73.958115,Blink Fitness Williamsburg,40.708756,-73.958248,Gym
8,Bushwick,40.698116,-73.925258,Blink Fitness Bushwick,40.700033,-73.920319,Gym
9,Brooklyn Heights,40.695864,-73.993782,Xtend Barre Brooklyn Heights,40.693599,-73.992376,Gym / Fitness Center


### Create the map of existing gyms in Brooklyn

In [40]:
# create map of Manhattan using latitude and longitude values
brooklyn = [40.6501038, -73.9495823]
brooklyn_map = folium.Map(location=brooklyn, zoom_start=12)
folium.Marker(brooklyn).add_to(brooklyn_map)
# add markers to map
for lat, lng, label in zip(brooklyn_gym['Venue Latitude'], brooklyn_gym['Venue Longitude'], brooklyn_gym['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(brooklyn_map)  
    
brooklyn_map

We have all the gyms/fitness centers (500m from the neighborhood centers) listed and shown in the map. We're now ready to use this data for analysis to produce the report on optimal locations for a new gym/finess center in Brooklyn, NY

## 4. Methodology

In this project, we are limiting our area of analysis to an area of 5km around the Brooklyn city center. We are looking for areas with a lower density of gym/fitness centers.

As the first step, we have collected the New York neighborhood data from New York University's spatial data repository geo.nyu.edu. The dataset is named 'New York City Neighborhood Names' in JSON format consists of the boroughs, neighborhood names, geo coordinates, etc of New York.

The second step will be calculating the gym/fitness center density in our target area. We will use heatmaps to find a few areas 5km around Brooklyn city center with low gym/fitness center density and focus our attention to those areas.

In the third and final step, we will create a denser grid of location candidates in our area of interest. Let’s make the location candidates 200m apart from each other. We will apply the K-Means clustering model on the candidate locations data to group the location candidates into zones. The centers of the zones will be our resulted locations for a potential new gym/fitness center.

## 5. Analysis

### Distance from Centre & Nearest Gym

Let's write the functions to calculate the distance of the venues from te city center & distance to the nearest gym

But while exploring the venues in each neighborhood & filtering the gym/fitness center, we have to add important features such as coordinates of the location in UTM cartesian coordinate system (X/Y coordinates in meters) and the distance from the Brooklyn city center to the venue.

We will create python functions to convert between WGS84 spherical coordinate system (latitude/longitude degrees) and UTM Cartesian coordinate system (X/Y coordinates in meters) as well as finding the distance of each Gym/Fitness center from the city center.

In [18]:
#Functions

!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)


Collecting shapely
[?25l  Downloading https://files.pythonhosted.org/packages/38/b6/b53f19062afd49bb5abd049aeed36f13bf8d57ef8f3fa07a5203531a0252/Shapely-1.6.4.post2-cp36-cp36m-manylinux1_x86_64.whl (1.5MB)
[K     |████████████████████████████████| 1.5MB 11.1MB/s eta 0:00:01
[?25hInstalling collected packages: shapely
Successfully installed shapely-1.6.4.post2
Collecting pyproj
[?25l  Downloading https://files.pythonhosted.org/packages/7a/b1/ab67ad924770e1c1432fa0953a665b8ea193b60c7494457b69da052d6e83/pyproj-2.4.0-cp36-cp36m-manylinux1_x86_64.whl (10.1MB)
[K     |████████████████████████████████| 10.1MB 13.8MB/s eta 0:00:01
[?25hInstalling collected packages: pyproj
Successfully installed pyproj-2.4.0


Let's convert WGS84 spherical coordinate system (latitude/longitude degrees) to UTM Cartesian coordinate system (X/Y coordinates in meters)

In [19]:
X = []
Y = []
for lat, lon in zip (brooklyn_gym['Venue Latitude'], brooklyn_gym['Venue Longitude']):
    lo, la = lonlat_to_xy(lon, lat)
    X.append(lo)
    Y.append(la)

Add the cartesian coordinates into *brooklyn_gym* dataframe

In [20]:
brooklyn_gym['X'] = X
brooklyn_gym['Y'] = Y

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app


List the updated *brooklyn_gym* dataframe 

In [21]:
brooklyn_gym.reset_index(drop=True)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,X,Y
0,Bay Ridge,40.625801,-74.030621,New York Sports Clubs,40.622364,-74.027163,Gym / Fitness Center,-5837706.0,9871937.0
1,Sunset Park,40.645103,-74.010316,Blink Fitness Sunset Park,40.645622,-74.013302,Gym,-5833708.0,9870246.0
2,Sunset Park,40.645103,-74.010316,Richie's Gym,40.645354,-74.013609,Gym,-5833754.0,9870285.0
3,Greenpoint,40.730201,-73.954241,IncrediPole,40.731838,-73.955069,Gymnastics Gym,-5818871.0,9863118.0
4,Gravesend,40.59526,-73.973471,Fitness by bobby,40.591779,-73.973823,Gym,-5842722.0,9864887.0
5,Prospect Heights,40.676822,-73.964859,Tabata Ultimate Fitness,40.679674,-73.969058,Gym,-5827771.0,9864680.0
6,Prospect Heights,40.676822,-73.964859,Crossfit Kingsboro,40.680065,-73.960838,Gym / Fitness Center,-5827676.0,9863620.0
7,Williamsburg,40.707144,-73.958115,Blink Fitness Williamsburg,40.708756,-73.958248,Gym,-5822797.0,9863420.0
8,Bushwick,40.698116,-73.925258,Blink Fitness Bushwick,40.700033,-73.920319,Gym,-5824139.0,9858481.0
9,Brooklyn Heights,40.695864,-73.993782,Xtend Barre Brooklyn Heights,40.693599,-73.992376,Gym / Fitness Center,-5825490.0,9867757.0


Calculate the distance from existing gym/fitness centers to brooklyn city center

In [23]:
brooklyn_x, brooklyn_y = lonlat_to_xy(longitude, latitude) # City center in Cartesian coordinates

distances_from_center = []
for i in range(len(brooklyn_gym)):
        ds = calc_xy_distance(brooklyn_x, brooklyn_y, brooklyn_gym.iloc[i, 7], brooklyn_gym.iloc[i, 8])
        distances_from_center.append(ds)

In [24]:
distances_from_center[0:5]

[11096.556208660997,
 8282.19140784975,
 8326.085154117502,
 13894.753652070982,
 10401.383936303522]

Add the distance from center to *brooklyn_gym* dataframe

In [25]:
brooklyn_gym['distances_from_center'] = distances_from_center

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


Let's list the updated dataframe with distance from center column

In [26]:
brooklyn_gym.reset_index(drop=True)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,X,Y,distances_from_center
0,Bay Ridge,40.625801,-74.030621,New York Sports Clubs,40.622364,-74.027163,Gym / Fitness Center,-5837706.0,9871937.0,11096.556209
1,Sunset Park,40.645103,-74.010316,Blink Fitness Sunset Park,40.645622,-74.013302,Gym,-5833708.0,9870246.0,8282.191408
2,Sunset Park,40.645103,-74.010316,Richie's Gym,40.645354,-74.013609,Gym,-5833754.0,9870285.0,8326.085154
3,Greenpoint,40.730201,-73.954241,IncrediPole,40.731838,-73.955069,Gymnastics Gym,-5818871.0,9863118.0,13894.753652
4,Gravesend,40.59526,-73.973471,Fitness by bobby,40.591779,-73.973823,Gym,-5842722.0,9864887.0,10401.383936
5,Prospect Heights,40.676822,-73.964859,Tabata Ultimate Fitness,40.679674,-73.969058,Gym,-5827771.0,9864680.0,5619.288109
6,Prospect Heights,40.676822,-73.964859,Crossfit Kingsboro,40.680065,-73.960838,Gym / Fitness Center,-5827676.0,9863620.0,5293.496694
7,Williamsburg,40.707144,-73.958115,Blink Fitness Williamsburg,40.708756,-73.958248,Gym,-5822797.0,9863420.0,10022.914315
8,Bushwick,40.698116,-73.925258,Blink Fitness Bushwick,40.700033,-73.920319,Gym,-5824139.0,9858481.0,9285.306089
9,Brooklyn Heights,40.695864,-73.993782,Xtend Barre Brooklyn Heights,40.693599,-73.992376,Gym / Fitness Center,-5825490.0,9867757.0,9230.296221


There is a total of 55 existing gym/fitness centers superimposed on the map for further analysis.

### Heatmap

Let's create a heatmap using the *brooklyn_gym* locations. And direct our point of attention to 5km around Brooklyn city center.

In [42]:
#Heatmap
from folium import plugins
from folium.plugins import HeatMap

gym_latlons = brooklyn_gym[['Venue Latitude','Venue Longitude']].values.tolist()
brooklyn_map = folium.Map(location=brooklyn, zoom_start=11)
folium.Marker(brooklyn).add_to(brooklyn_map)
HeatMap(gym_latlons).add_to(brooklyn_map)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
    
brooklyn_map

The heatmap will help us to identify a few promising areas with a low number of gyms in general. Here we can see the pockets from the heatmap where low density of gym/fitness centers in our focus area.

### New location candidates

As we already know, 5km around the city center is our target area. Let’s create denser grid of location candidates in our area of interest. Let’s make the location candidates 200m apart from each other.

We will generate the location candidates using our python function and save the latitude and longitude of the location candidates in the candidate location data frame. We will also calculate the cartesian coordinates in meter & distance to the nearest gym using python functions and add in to the candidate location dataframe.

In [28]:
# Create location candidates 200M apart

brooklyn_center_x, brooklyn_center_y = lonlat_to_xy(brooklyn[1], brooklyn[0]) # City center in Cartesian coordinates


k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 200
y_step = 200 * k 
roi_x_min = brooklyn_center_x - 5000
roi_y_min = brooklyn_center_y - 5000
roi_y_max = brooklyn_center_y + 5000

roi_center_x = roi_x_min + 5000
roi_center_y = roi_y_max - 5000

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 5001):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

2263 candidate neighborhood centers generated.


Now, let's calculate the distance from candidate centers to the nearest gym/fitness center.

In [29]:
def find_nearest_gym(x, y, gym):
    d_min = 100000
    for res in gym:
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_gym_distances = []

gym = brooklyn_gym.values.tolist()

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    distance = find_nearest_gym(x, y, gym)
    roi_gym_distances.append(distance)
print('done.')

Generating data on location candidates... done.


Our new location candidate dataframe is given below

In [30]:
# Let's put this into dataframe
df_gym_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Distance to nearest Gym':roi_gym_distances})

df_gym_locations.head(10)

Unnamed: 0,Latitude,Longitude,X,Y,Distance to nearest Gym
0,40.649559,-73.910952,-5832672.0,9857023.0,2792.590294
1,40.642236,-73.912569,-5833922.0,9857196.0,1674.436758
2,40.643412,-73.912525,-5833722.0,9857196.0,1852.669494
3,40.644589,-73.91248,-5833522.0,9857196.0,2034.9521
4,40.645765,-73.912435,-5833322.0,9857196.0,2220.287334
5,40.646942,-73.91239,-5833122.0,9857196.0,2407.97044
6,40.648118,-73.912345,-5832922.0,9857196.0,2597.492529
7,40.649295,-73.912301,-5832722.0,9857196.0,2788.478659
8,40.650471,-73.912256,-5832522.0,9857196.0,2980.647418
9,40.651648,-73.912211,-5832322.0,9857196.0,3173.783991


Let's filter hose location. We need the only locations having no gym/fitness centers in the radius of 2.5km (least distance to the nearest gym).

In [31]:
# relevant location candidates

good_gym_distance = np.array(df_gym_locations['Distance to nearest Gym']>=2500)

Let's list our *df_good_location* dataframe with relevant location candiates 

In [32]:
df_good_location = df_gym_locations[good_gym_distance] 
df_good_location.reset_index(drop="True")

Unnamed: 0,Latitude,Longitude,X,Y,Distance to nearest Gym
0,40.649559,-73.910952,-5832672.0,9857023.0,2792.590294
1,40.648118,-73.912345,-5832922.0,9857196.0,2597.492529
2,40.649295,-73.912301,-5832722.0,9857196.0,2788.478659
3,40.650471,-73.912256,-5832522.0,9857196.0,2980.647418
4,40.651648,-73.912211,-5832322.0,9857196.0,3173.783991
5,40.652824,-73.912166,-5832122.0,9857196.0,3212.486624
6,40.654001,-73.912122,-5831922.0,9857196.0,3110.753376
7,40.655178,-73.912077,-5831722.0,9857196.0,3018.857866
8,40.656354,-73.912032,-5831522.0,9857196.0,2937.72345
9,40.647265,-73.913717,-5833072.0,9857369.0,2516.882441


In [51]:
print ("Number of location candidates with no other gym in a radius of 2.5km:", df_good_location.shape[0])

Number of location candidates with no other gym in a radius of 2.5km: 690


Let's see how the selected location candidates looks on a map.

In [43]:
good_latitudes = df_good_location['Latitude'].values
good_longitudes = df_good_location['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

brooklyn_map = folium.Map(location=brooklyn, zoom_start=13)
HeatMap(gym_latlons).add_to(brooklyn_map)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
folium.Marker(brooklyn).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(brooklyn_map) 

brooklyn_map

Let's now show those good location candidates in a form of heatmap:

In [44]:
brooklyn_map = folium.Map(location=brooklyn, zoom_start=13)
HeatMap(good_locations).add_to(brooklyn_map)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
folium.Marker(brooklyn).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(brooklyn_map) 

brooklyn_map

<h2> 6. Modeling</h2>

We have already identified this as an <b>unsupervised learning problem</b>. Having unlabeled data, we should apply a <b>clustering model</b> to segment the locations. Here we are using <b>K-Means clustering</b> model. It’s one of the simplest clustering models and it is vastly used for clustering in many data science applications, especially useful if you need to quickly discover insights from unlabeled data. 

Let’s apply the K-Means clustering model on the candidate locations data to create centers of zones containing good locations. The map shows the resulted grouping of location candidates. The centers are placed in the middle of the zone’s dense with candidate locations.

In [45]:
#K-Means clustering

from sklearn.cluster import KMeans

number_of_clusters = 10

good_xys = df_good_location[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)
cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

brooklyn_map = folium.Map(location=brooklyn, zoom_start=13)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
folium.Marker(brooklyn).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(brooklyn_map) 
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(brooklyn_map) 

brooklyn_map

Those zone centers and addresses will be the final result of our analysis.

In [46]:
cluster_centers

[(-73.9627999710185, 40.6620131332326),
 (-73.94125851543225, 40.62733464325463),
 (-73.92699203056455, 40.643786839906745),
 (-73.97458409579738, 40.63370057452641),
 (-73.98217622034818, 40.64372781145237),
 (-73.93235434768745, 40.6330347917024),
 (-73.91606672257319, 40.65127428275275),
 (-73.93962768287722, 40.67780562945376),
 (-73.98047435650962, 40.654508022753404),
 (-73.95363980708976, 40.62717500919314)]

<h3>Addresses of the zone centers</h3>
    
Let’s find the addresses of the locations using  <b>Google Map API reverse geocoding</b>. These are the best possible locations suggestion we are making for the new gym/fitness center. This will help the management to find the best possible location based on the neighborhood specifics.

In [37]:
google_api_key = 'AIzaSyD2QcZXSnJ0sWSQnj3q2xRdqE8HLg-y68c'
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except:
        return None

In [38]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
    addr = get_address(google_api_key, lat, lon).replace(', USA', '')
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, brooklyn_center_x, brooklyn_center_y)
    print('{}{} => {:.1f}km from Brooklyn center'.format(addr, ' '*(50-len(addr)), d/1000))

Addresses of centers of areas recommended for further analysis

39 Ocean Ave, Brooklyn, NY 11225                   => 2.6km from Brooklyn center
3602 Avenue J, Brooklyn, NY 11210                  => 4.0km from Brooklyn center
638 E 53rd St, Brooklyn, NY 11203                  => 3.1km from Brooklyn center
401 Avenue F, Brooklyn, NY 11218                   => 4.3km from Brooklyn center
48 Clara St, Brooklyn, NY 11218                    => 4.4km from Brooklyn center
4524 Glendale Ct, Brooklyn, NY 11234               => 3.7km from Brooklyn center
652 E 92nd St, Brooklyn, NY 11236                  => 4.3km from Brooklyn center
1562 Atlantic Ave, Brooklyn, NY 11213              => 4.9km from Brooklyn center
593 20th St, Brooklyn, NY 11218                    => 4.1km from Brooklyn center
951 E 23rd St, Brooklyn, NY 11210                  => 3.9km from Brooklyn center


In [47]:
brooklyn_map = folium.Map(location=brooklyn, zoom_start=13)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
folium.Marker(brooklyn).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(brooklyn_map)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(brooklyn_map) 

brooklyn_map

<h2> Results & Discussion</h2>

Our analysis clearly identifies areas with lower density of gym/fitness centers in Brooklyn. We focused on an area of 5km around Brooklyn as we need the gym/fitness center location closer to the Brooklyn city center.  We have created a dense grid of location candidates spaced 200m apart and applied filtering on the dataset to list the only location candidates with no other gym or fitness center present within 2.5km.

We have used machine learning model K-Means Clustering cluster the location candidates into create zones of interest which contain dense location candidates. We used Google Map API reverse geocoding to generate the addresses of the 10 zone centers
The result of our analysis in the 10 location which are the centers of the location candidate zones. The result is based on target area – around 5km from Brooklyn city center - and 2.5km to a nearest gym or fitness center. Our goal is to provide this locations to the director board of the fitness club. And based on the analysis I recommend this 10 locations in your pursuit to identify the best location for a gym/fitness center.  <b>But it is entirely possible that there is a very good reason for no gym/fitness centers in those areas</b>. Regardless of the competition, the reasons can make those areas unsuitable for a gym/fitness center. <b>These recommended locations will be a good starting point for further analysis</b> and finding the best location for the new gym/fitness center.

<h2>Conclusion</h2>

We have identified 10 locations for a new gym/fitness center with low density of gym/fitness centers based on our goal to assist the director board of a fitness club. By finding the existing gym/fitness centers using Foursquare API we identified areas with low density of gym/fitness centers. We generated a number of locations in these areas based on the criteria and listed the location candidates for  a new gym/fitness center.

We performed clustering of those locations and grouped the location candidates into zones based on the density of location candidates.  The address of those zone centers are the 10 locations we have identified as potential locations for a new gym/fitness center.
The final decision on the new location will be made by the director board of the fitness club. They should consider specific characteristics of the neighborhoods of each recommended location. Additional factors like residential areas, traffic, roads, social & economic dynamics of the neighborhoods should be taken into consideration. 