<H2>Capstone Project: Finding Locations to Open a Gym in Brooklyn, NY</H2>

<h2>1. Introduction</h2>

A fitness club group is interested in opening a gym/fitness center in Brooklyn, NY. This report is written for the club's board of directors and suggests potential locations that are close to the center of Brooklyn, NY and away from existing gyms/fitness centers, boxing clubs, and gym pools.

Several gyms and fitness centers already operate in the Brooklyn area. <b>Our goal is to identify locations within 5 km of the Brooklyn city center and about 3 km away from any existing gym or fitness club</b>. We will leverage the <b>Foursquare Places API</b> to locate existing gyms and find candidate neighborhood centers for the new gym.

In [1]:

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')


In [2]:
#Capstone Project
print("Hello Capstone Project Course!")

Hello Capstone Project Course!


<h2>Data</h2>

In [3]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [4]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [5]:
#Explore data

neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
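
GeoJSON stores point coordinates as `[longitude, latitude]`, which is why the extraction loop reads index 1 for the latitude. As a minimal sketch, assuming every feature has the structure printed above, one feature can be flattened into a row like this (the helper name `feature_to_row` is hypothetical and not part of the notebook):

```python
def feature_to_row(feature):
    """Flatten one GeoJSON feature into a row dict (hypothetical helper)."""
    props = feature["properties"]
    # GeoJSON order is longitude first, then latitude
    lon, lat = feature["geometry"]["coordinates"]
    return {
        "Borough": props["borough"],
        "Neighborhood": props["name"],
        "Latitude": lat,
        "Longitude": lon,
    }


# Trimmed copy of the first feature shown above
sample = {
    "type": "Feature",
    "geometry": {"type": "Point",
                 "coordinates": [-73.84720052054902, 40.89470517661]},
    "properties": {"name": "Wakefield", "borough": "Bronx"},
}
row = feature_to_row(sample)
print(row)
```

A list of such dicts can be passed directly to `pd.DataFrame` to build the whole table in one call.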

In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [7]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates']

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [8]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


Because this project focuses on Brooklyn, let's slice the original dataframe and create a new dataframe that contains only the Brooklyn neighborhoods.

In [9]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Bay Ridge,40.625801,-74.030621
1,Brooklyn,Bensonhurst,40.611009,-73.99518
2,Brooklyn,Sunset Park,40.645103,-74.010316
3,Brooklyn,Greenpoint,40.730201,-73.954241
4,Brooklyn,Gravesend,40.59526,-73.973471


In [13]:
brooklyn_data.shape

(70, 4)

In [10]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


## Visualize

In [11]:
# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)
brooklyn = [40.6501038, -73.9495823]

# add markers to map
folium.Marker(brooklyn).add_to(map_brooklyn)
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

## Explore Neighborhoods in Brooklyn

In [12]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
radius = 500
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)


Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
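
The request URL in `getNearbyVenues` is assembled with positional `format` arguments, which makes it easy to mix up the parameters. The sketch below shows one possible alternative that builds the same query string with `urllib.parse.urlencode` and reads the credentials from environment variables rather than from the notebook. The helper name `build_explore_url` and the variable names `FOURSQUARE_CLIENT_ID` / `FOURSQUARE_CLIENT_SECRET` are assumptions made for illustration; the endpoint and parameter names are the ones used in the function above.

```python
import os
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180605", radius=500, limit=100):
    """Build the explore URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes each value, so nothing has to be quoted by hand
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Assumed environment variable names; the fallbacks are dummy values.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "demo-id")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "demo-secret")

url = build_explore_url(40.625801, -74.030621, client_id, client_secret)
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

Keeping the credentials out of the notebook also means they never end up in printed cell output.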


In [14]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude']
                                  )

Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


In [15]:
brooklyn_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
1,Bay Ridge,40.625801,-74.030621,Bagel Boy,40.627896,-74.029335,Bagel Shop
2,Bay Ridge,40.625801,-74.030621,Cocoa Grinder,40.623967,-74.030863,Juice Bar
3,Bay Ridge,40.625801,-74.030621,Pegasus Cafe,40.623168,-74.031186,Breakfast Spot
4,Bay Ridge,40.625801,-74.030621,Ho' Brah Taco Joint,40.62296,-74.031371,Taco Place
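
`getNearbyVenues` indexes `['groups'][0]` and `['categories'][0]` directly, so a response without groups or a venue without categories would raise an exception. A more defensive parser could look like the sketch below. It assumes the response shape that the function above already relies on; the helper name `venue_rows` and the second sample venue are hypothetical.

```python
def venue_rows(payload, neighborhood):
    """Return (neighborhood, venue, lat, lng, category) tuples from one response."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hand-written sample in the shape the notebook expects; the second venue is made up.
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Bagel Boy",
               "location": {"lat": 40.627896, "lng": -74.029335},
               "categories": [{"name": "Bagel Shop"}]}},
    {"venue": {"name": "Example Venue",
               "location": {"lat": 40.62, "lng": -74.03},
               "categories": []}},
]}]}}

rows = venue_rows(sample_payload, "Bay Ridge")
print(rows)
print(venue_rows({}, "Bay Ridge"))
```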


## Gym

In [16]:
brooklyn_gym = brooklyn_venues[brooklyn_venues['Venue Category'].str.contains('Gym')]
brooklyn_gym.shape
brooklyn_gym.reset_index(drop=True)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,New York Sports Clubs,40.622364,-74.027163,Gym / Fitness Center
1,Sunset Park,40.645103,-74.010316,Blink Fitness Sunset Park,40.645622,-74.013302,Gym
2,Sunset Park,40.645103,-74.010316,Richie's Gym,40.645354,-74.013609,Gym
3,Greenpoint,40.730201,-73.954241,IncrediPole,40.731838,-73.955069,Gymnastics Gym
4,Gravesend,40.59526,-73.973471,Fitness by bobby,40.591779,-73.973823,Gym
5,Kensington,40.642382,-73.980421,Yeled VYalda Fitness Center,40.640745,-73.985359,Gym
6,Prospect Heights,40.676822,-73.964859,Tabata Ultimate Fitness,40.679674,-73.969058,Gym
7,Prospect Heights,40.676822,-73.964859,Crossfit Kingsboro,40.680065,-73.960838,Gym / Fitness Center
8,Williamsburg,40.707144,-73.958115,Blink Fitness Williamsburg,40.708756,-73.958248,Gym
9,Bushwick,40.698116,-73.925258,Blink Fitness Bushwick,40.700033,-73.920319,Gym


In [17]:
# create map of Brooklyn using latitude and longitude values
brooklyn = [40.6501038, -73.9495823]
brooklyn_map = folium.Map(location=brooklyn, zoom_start=12)
folium.Marker(brooklyn).add_to(brooklyn_map)
# add markers to map
for lat, lng, label in zip(brooklyn_gym['Venue Latitude'], brooklyn_gym['Venue Longitude'], brooklyn_gym['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(brooklyn_map)  
    
brooklyn_map

## Distance from Centre & Nearest Gym

In [22]:
#Functions

!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

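# NOTE: UTM zone 33 spans 12°E to 18°E (central Europe). Brooklyn (about 74°W) lies in
# UTM zone 18, so X/Y values computed with zone=33 are heavily distorted and distances
# derived from them are not true meters. Use zone=18 for metric-accurate results.
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]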

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)
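
Planar distances from a UTM projection are only close to true meters near the zone that the projection is built for. A quick sanity check that does not depend on pyproj is to compute the standard zone number from the longitude and to compare projected distances with a great-circle distance. For Brooklyn's longitude the formula gives zone 18, whereas the functions above pass `zone=33`. The sketch below is illustrative only: the helper names are hypothetical and the mean Earth radius of 6,371 km is an approximation.

```python
import math


def utm_zone(lon):
    """Standard UTM zone number for a longitude in degrees.

    Ignores the special-case zones around Norway and Svalbard.
    """
    return int((lon + 180) // 6) + 1


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


center = (40.6501038, -73.9495823)   # Brooklyn center from the geocoder
gym = (40.622364, -74.027163)        # New York Sports Clubs, Bay Ridge

zone = utm_zone(center[1])
dist = haversine_m(center[0], center[1], gym[0], gym[1])
print("UTM zone:", zone)
print("Great-circle distance in meters:", round(dist))
```

If the great-circle distance and the projected distance for the same pair of points differ by more than a few percent, the projection settings deserve a second look.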




In [23]:
X = []
Y = []
for lat, lon in zip (brooklyn_gym['Venue Latitude'], brooklyn_gym['Venue Longitude']):
    lo, la = lonlat_to_xy(lon, lat)
    X.append(lo)
    Y.append(la)

In [24]:
# assign returns a new dataframe, which avoids pandas' SettingWithCopyWarning
brooklyn_gym = brooklyn_gym.assign(X=X, Y=Y)


In [25]:
brooklyn_gym.reset_index(drop=True)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,X,Y
0,Bay Ridge,40.625801,-74.030621,New York Sports Clubs,40.622364,-74.027163,Gym / Fitness Center,-5837706.0,9871937.0
1,Sunset Park,40.645103,-74.010316,Blink Fitness Sunset Park,40.645622,-74.013302,Gym,-5833708.0,9870246.0
2,Sunset Park,40.645103,-74.010316,Richie's Gym,40.645354,-74.013609,Gym,-5833754.0,9870285.0
3,Greenpoint,40.730201,-73.954241,IncrediPole,40.731838,-73.955069,Gymnastics Gym,-5818871.0,9863118.0
4,Gravesend,40.59526,-73.973471,Fitness by bobby,40.591779,-73.973823,Gym,-5842722.0,9864887.0
5,Kensington,40.642382,-73.980421,Yeled VYalda Fitness Center,40.640745,-73.985359,Gym,-5834440.0,9866608.0
6,Prospect Heights,40.676822,-73.964859,Tabata Ultimate Fitness,40.679674,-73.969058,Gym,-5827771.0,9864680.0
7,Prospect Heights,40.676822,-73.964859,Crossfit Kingsboro,40.680065,-73.960838,Gym / Fitness Center,-5827676.0,9863620.0
8,Williamsburg,40.707144,-73.958115,Blink Fitness Williamsburg,40.708756,-73.958248,Gym,-5822797.0,9863420.0
9,Bushwick,40.698116,-73.925258,Blink Fitness Bushwick,40.700033,-73.920319,Gym,-5824139.0,9858481.0


In [26]:
brooklyn_x, brooklyn_y = lonlat_to_xy(longitude, latitude) # City center in Cartesian coordinates

distances_from_center = []
for x, y in zip(brooklyn_gym['X'], brooklyn_gym['Y']):
    ds = calc_xy_distance(brooklyn_x, brooklyn_y, x, y)
    distances_from_center.append(ds)

In [27]:
distances_from_center

[11096.556208660997,
 8282.19140784975,
 8326.085154117502,
 13894.753652070982,
 10401.383936303522,
 4896.269325425681,
 5619.28810922445,
 5293.496694440897,
 10022.914314676374,
 9285.306089178075,
 9230.296220979302,
 9214.674554383879,
 9023.971209150868,
 9154.754015344888,
 10000.751109382927,
 8337.05778258257,
 8680.467198296014,
 8121.604324191346,
 8252.567786445707,
 8337.05778258257,
 7721.986631606686,
 7131.473100449047,
 6486.963254748933,
 7394.415122846099,
 5312.5985472180055,
 9804.23706533292,
 9295.593999693876,
 6227.752176793899,
 7883.257045262405,
 8245.859734261137,
 7791.717929773839,
 8438.519605277825,
 8638.600441526165,
 7791.717929773839,
 7081.222302144858,
 1328.440703542655,
 10263.085263948491,
 10613.230575537746,
 10618.815400813079,
 10816.789145966824,
 11256.719142029866,
 10022.914314676374,
 10816.789145966824,
 6606.184000880814,
 7291.8912921917035,
 7491.816559562766,
 11805.777673190922,
 11712.045933432417,
 1328.440703542655,
 3124.669

In [28]:
# assign returns a new dataframe, which avoids pandas' SettingWithCopyWarning
brooklyn_gym = brooklyn_gym.assign(distances_from_center=distances_from_center)


In [29]:
brooklyn_gym.reset_index(drop=True)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,X,Y,distances_from_center
0,Bay Ridge,40.625801,-74.030621,New York Sports Clubs,40.622364,-74.027163,Gym / Fitness Center,-5837706.0,9871937.0,11096.556209
1,Sunset Park,40.645103,-74.010316,Blink Fitness Sunset Park,40.645622,-74.013302,Gym,-5833708.0,9870246.0,8282.191408
2,Sunset Park,40.645103,-74.010316,Richie's Gym,40.645354,-74.013609,Gym,-5833754.0,9870285.0,8326.085154
3,Greenpoint,40.730201,-73.954241,IncrediPole,40.731838,-73.955069,Gymnastics Gym,-5818871.0,9863118.0,13894.753652
4,Gravesend,40.59526,-73.973471,Fitness by bobby,40.591779,-73.973823,Gym,-5842722.0,9864887.0,10401.383936
5,Kensington,40.642382,-73.980421,Yeled VYalda Fitness Center,40.640745,-73.985359,Gym,-5834440.0,9866608.0,4896.269325
6,Prospect Heights,40.676822,-73.964859,Tabata Ultimate Fitness,40.679674,-73.969058,Gym,-5827771.0,9864680.0,5619.288109
7,Prospect Heights,40.676822,-73.964859,Crossfit Kingsboro,40.680065,-73.960838,Gym / Fitness Center,-5827676.0,9863620.0,5293.496694
8,Williamsburg,40.707144,-73.958115,Blink Fitness Williamsburg,40.708756,-73.958248,Gym,-5822797.0,9863420.0,10022.914315
9,Bushwick,40.698116,-73.925258,Blink Fitness Bushwick,40.700033,-73.920319,Gym,-5824139.0,9858481.0,9285.306089


In [30]:
brooklyn_gym['Venue Longitude']

68     -74.027163
142    -74.013302
152    -74.013609
164    -73.955069
272    -73.973823
465    -73.985359
550    -73.969058
566    -73.960838
592    -73.958248
645    -73.920319
726    -73.992376
728    -73.991519
754    -73.991587
780    -73.993344
805    -73.993948
871    -73.994875
890    -73.992940
953    -73.995996
990    -73.995209
999    -73.994875
1096   -73.999934
1118   -73.996560
1123   -73.990245
1205   -73.977862
1237   -73.975628
1327   -73.880805
1334   -73.877843
1345   -73.906732
1500   -73.930101
1538   -73.968389
1624   -73.983285
1652   -73.985522
1667   -73.986172
1713   -73.983285
1784   -73.978484
1937   -73.958149
2032   -73.936069
2041   -73.935655
2061   -73.935681
2083   -73.958759
2090   -73.959977
2203   -73.958248
2205   -73.958759
2289   -73.971906
2301   -73.973559
2305   -73.971504
2321   -74.030291
2332   -74.030213
2375   -73.958149
2423   -73.932808
2513   -73.912844
2666   -73.988892
2668   -73.988930
2677   -73.987791
2680   -73.989122
2698   -73

## Distance to nearest Gym

In [31]:
#Heatmap
from folium import plugins
from folium.plugins import HeatMap

gym_latlons = brooklyn_gym[['Venue Latitude','Venue Longitude']].values.tolist()
brooklyn_map = folium.Map(location=brooklyn, zoom_start=11)
folium.Marker(brooklyn).add_to(brooklyn_map)
HeatMap(gym_latlons).add_to(brooklyn_map)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
    
brooklyn_map

In [72]:
# Create location candidates on a grid with 200 m spacing inside a 5 km radius

brooklyn_center_x, brooklyn_center_y = lonlat_to_xy(brooklyn[1], brooklyn[0]) # City center in Cartesian coordinates


k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 200
y_step = 200 * k 
roi_x_min = brooklyn_center_x - 5000
roi_y_min = brooklyn_center_y - 5000
roi_y_max = brooklyn_center_y + 5000

roi_center_x = roi_x_min + 5000
roi_center_y = roi_y_max - 5000

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 5001):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

2263 candidate neighborhood centers generated.
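
In a regular hexagonal layout, alternate rows are shifted by half the horizontal step, which would be 100 m for a 200 m step; the cell above shifts them by 50. The sketch below generates such a grid inside a circle using only the standard library. It is an illustration under stated assumptions, not the notebook's exact procedure: the function name `hex_grid` is hypothetical and the half-step offset differs from the value used above.

```python
import math


def hex_grid(center_x, center_y, radius, step):
    """Return (x, y) points of a hexagonal grid that fall inside a circle."""
    row_step = step * math.sqrt(3) / 2      # vertical distance between rows
    n_rows = int(2 * radius / row_step) + 1
    n_cols = int(2 * radius / step) + 1
    points = []
    for i in range(n_rows):
        y = center_y - radius + i * row_step
        # Shift every other row by half a step to get the hexagonal pattern
        x_offset = step / 2 if i % 2 == 0 else 0.0
        for j in range(n_cols):
            x = center_x - radius + j * step + x_offset
            if math.hypot(x - center_x, y - center_y) <= radius:
                points.append((x, y))
    return points


grid = hex_grid(0.0, 0.0, radius=5000, step=200)
print(len(grid), "candidate points")
```

The point count should be close to the circle area divided by the area of one grid cell, `step * row_step`.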


In [73]:
def find_nearest_gym(x, y, gym):
    d_min = 100000
    for res in gym:
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_gym_distances = []

gym = brooklyn_gym.values.tolist()

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    distance = find_nearest_gym(x, y, gym)
    roi_gym_distances.append(distance)
print('done.')

Generating data on location candidates... done.
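
`find_nearest_gym` starts from a sentinel of 100000, so it can never report a distance above 100 km. Taking the minimum over all gyms avoids the sentinel altogether. A minimal sketch, assuming the gyms are given as plain `(x, y)` pairs (the function name `nearest_distance` is hypothetical):

```python
import math


def nearest_distance(x, y, points):
    """Distance from (x, y) to the closest point, or infinity if there are none."""
    if not points:
        return math.inf
    return min(math.hypot(px - x, py - y) for px, py in points)


gyms_xy = [(0.0, 0.0), (3000.0, 4000.0)]
print(nearest_distance(3000.0, 0.0, gyms_xy))  # 3000.0
```

For the notebook's dataframe, the pairs could be produced with `list(zip(brooklyn_gym['X'], brooklyn_gym['Y']))`.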


In [74]:
# Let's put this into dataframe
df_gym_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Distance to nearest Gym':roi_gym_distances})

df_gym_locations.head(10)

Unnamed: 0,Latitude,Longitude,X,Y,Distance to nearest Gym
0,40.649559,-73.910952,-5832672.0,9857023.0,2792.590294
1,40.642236,-73.912569,-5833922.0,9857196.0,1674.436758
2,40.643412,-73.912525,-5833722.0,9857196.0,1852.669494
3,40.644589,-73.91248,-5833522.0,9857196.0,2034.9521
4,40.645765,-73.912435,-5833322.0,9857196.0,2220.287334
5,40.646942,-73.91239,-5833122.0,9857196.0,2407.97044
6,40.648118,-73.912345,-5832922.0,9857196.0,2597.492529
7,40.649295,-73.912301,-5832722.0,9857196.0,2788.478659
8,40.650471,-73.912256,-5832522.0,9857196.0,2980.647418
9,40.651648,-73.912211,-5832322.0,9857196.0,3173.783991


In [75]:
# keep candidates that are at least 2750 m (about 3 km) from the nearest gym

good_gym_distance = np.array(df_gym_locations['Distance to nearest Gym']>=2750)

In [76]:
df_good_location = df_gym_locations[good_gym_distance] 
df_good_location

Unnamed: 0,Latitude,Longitude,X,Y,Distance to nearest Gym
0,40.649559,-73.910952,-5832672.0,9857023.0,2792.590294
7,40.649295,-73.912301,-5832722.0,9857196.0,2788.478659
8,40.650471,-73.912256,-5832522.0,9857196.0,2980.647418
9,40.651648,-73.912211,-5832322.0,9857196.0,3173.783991
10,40.652824,-73.912166,-5832122.0,9857196.0,3212.486624
11,40.654001,-73.912122,-5831922.0,9857196.0,3110.753376
12,40.655178,-73.912077,-5831722.0,9857196.0,3018.857866
13,40.656354,-73.912032,-5831522.0,9857196.0,2937.72345
23,40.649618,-73.913627,-5832672.0,9857369.0,2889.980763
24,40.650795,-73.913583,-5832472.0,9857369.0,3079.063917


In [80]:
good_latitudes = df_good_location['Latitude'].values
good_longitudes = df_good_location['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

brooklyn_map = folium.Map(location=brooklyn, zoom_start=13)
HeatMap(gym_latlons).add_to(brooklyn_map)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
folium.Marker(brooklyn).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(brooklyn_map) 

brooklyn_map

In [81]:
brooklyn_map = folium.Map(location=brooklyn, zoom_start=13)
HeatMap(good_locations).add_to(brooklyn_map)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
folium.Marker(brooklyn).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(brooklyn_map) 

brooklyn_map

In [84]:
#Cluster

from sklearn.cluster import KMeans

number_of_clusters = 10

good_xys = df_good_location[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)
cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

brooklyn_map = folium.Map(location=brooklyn, zoom_start=13)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
folium.Marker(brooklyn).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(brooklyn_map) 
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(brooklyn_map) 

brooklyn_map

In [85]:
cluster_centers

[(-73.94789806895461, 40.62354495968689),
 (-73.9250429404015, 40.645325833639255),
 (-73.97730622873766, 40.65739034596175),
 (-73.93793015486995, 40.63309202535489),
 (-73.91497204945802, 40.651601617789986),
 (-73.93105605447062, 40.62856449338722),
 (-73.9454265349885, 40.62943156070039),
 (-73.93078179684107, 40.63750705318921),
 (-73.93870366069508, 40.62520139001369),
 (-73.95469899239583, 40.62770905136861)]
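
Note that `cluster_centers` holds `(longitude, latitude)` tuples, because `xy_to_lonlat` returns the longitude first. To make the clustering step less of a black box, the sketch below implements the basic assign-and-update loop of k-means (Lloyd's algorithm) in plain Python. It is illustrative only and is not scikit-learn's implementation: the deterministic initialization from evenly spaced sorted points is an assumption chosen to keep the example reproducible.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm on 2-D points; returns k cluster centers."""
    pts = sorted(points)
    # Assumed initialization: pick k evenly spaced points from the sorted list
    centers = [pts[i * len(pts) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: attach each point to its nearest center
        for p in pts:
            idx = min(range(k),
                      key=lambda c: math.hypot(p[0] - centers[c][0],
                                               p[1] - centers[c][1]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(sum(v) / len(members)
                                         for v in zip(*members)))
            else:
                new_centers.append(center)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centers = kmeans(points, 2)
print(centers)
```

K-means can converge to different centers depending on the starting points, which is why the notebook fixes `random_state=0` when calling scikit-learn.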

In [86]:
google_api_key = 'YOUR_GOOGLE_API_KEY'  # your Google Maps Geocoding API key
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    # network errors and empty result lists both end up here
    except Exception:
        return None
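
`get_address` returns `None` whenever anything goes wrong, so callers need to handle that case before using the result as a string. The parsing step on its own can be written without a `try` block, as in the sketch below. It assumes only the two response fields the function above already reads (`results` and `formatted_address`); the helper name `first_formatted_address` is hypothetical, and the sample address is taken from the printed results further down with the `, USA` suffix restored.

```python
def first_formatted_address(response):
    """Return the first formatted address in a geocoding response, or None."""
    results = response.get("results") or []
    if not results:
        return None
    return results[0].get("formatted_address")


ok = {"results": [{"formatted_address": "51 Sherman St, Brooklyn, NY 11215, USA"}]}
empty = {"results": []}

address = first_formatted_address(ok)
missing = first_formatted_address(empty)

# Guard against None before calling string methods on the result
label = address.replace(", USA", "") if address else "Address not found"
print(label)
print(missing)
```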

In [89]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lon, lat in cluster_centers:
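    # get_address returns None on failure, so fall back to a placeholder string
    addr = (get_address(google_api_key, lat, lon) or 'Address not found').replace(', USA', '')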
    candidate_area_addresses.append(addr)    
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, brooklyn_center_x, brooklyn_center_y)
    print('{}{} => {:.1f}km from Brooklyn center'.format(addr, ' '*(50-len(addr)), d/1000))

Addresses of centers of areas recommended for further analysis

1125 E 28th St, Brooklyn, NY 11210                 => 4.5km from Brooklyn center
5520 Kings Hwy, Brooklyn, NY 11203                 => 3.3km from Brooklyn center
51 Sherman St, Brooklyn, NY 11215                  => 3.8km from Brooklyn center
915 E 40th St, Brooklyn, NY 11210                  => 3.3km from Brooklyn center
658 E 93rd St, Brooklyn, NY 11236                  => 4.5km from Brooklyn center
1800 Schenectady Ave, Brooklyn, NY 11234           => 4.4km from Brooklyn center
3115 Avenue I, Brooklyn, NY 11210                  => 3.6km from Brooklyn center
979 E 48th St, Brooklyn, NY 11203                  => 3.2km from Brooklyn center
1282 E 39th St, Brooklyn, NY 11210                 => 4.5km from Brooklyn center
997 E 22nd St, Brooklyn, NY 11210                  => 3.9km from Brooklyn center


In [90]:
brooklyn_map = folium.Map(location=brooklyn, zoom_start=13)
folium.Circle(brooklyn, radius=5000, fill=False, color='white').add_to(brooklyn_map)
folium.Marker(brooklyn).add_to(brooklyn_map)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(brooklyn_map)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(brooklyn_map) 

brooklyn_map