# IBM Applied Data Science Capstone

### Project: Opening a gym in New York City

#### Steps:
- Extract the neighborhoods of New York City
- Get the coordinates of the neighborhoods
- Explore the venues near each neighborhood
- Cluster the neighborhoods with a machine learning method (KMeans clustering)
- Find the most suitable places to open a gym


#### Import libraries

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

import matplotlib.cm as cm
import matplotlib.colors as colors

from geopy.geocoders import Nominatim
import matplotlib.pyplot as plt
import folium
from geopy.extra.rate_limiter import RateLimiter
from pandas import json_normalize  # top-level import (pandas >= 1.0)

### Scrape the neighborhoods from Wikipedia

In [2]:
url="https://en.wikipedia.org/wiki/Neighborhoods_in_New_York_City"
page = requests.get(url)
page

<Response [200]>

In [3]:
soup = BeautifulSoup(page.text,'html.parser')
table=soup.find_all('table')[0]

In [4]:
df = pd.DataFrame(np.arange(5))

In [5]:
row_index=0
for row in table.find_all('tr'):
    col_index=0
    for col in row.find_all('td'):
        df.loc[row_index,col_index] = col.get_text().replace('\n','')
        col_index +=1
    row_index +=1

In [6]:
df.columns = ['Borough','Area km2','Pop Census','Pop/km2','Neighborhoods']
df = df[(df['Borough'] != 'New York City') & (df['Borough'] != 0)]
df.head()

Unnamed: 0,Borough,Area km2,Pop Census,Pop/km2,Neighborhoods
1,Bronx CB 1,7.17,91497,12761,"Melrose, Mott Haven, Port Morris"
2,Bronx CB 2,5.54,52246,9792,"Hunts Point, Longwood"
3,Bronx CB 3,4.07,79762,19598,"Claremont, Concourse Village, Crotona Park, Mo..."
4,Bronx CB 4,5.28,146441,27735,"Concourse, Highbridge"
5,Bronx CB 5,3.55,128200,36145,"Fordham, Morris Heights, Mount Hope, Universit..."
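The cell-by-cell `df.loc` fill above works, but the same table can also be collected as a list of rows and turned into a dataframe in one step. The sketch below is an illustration only: it uses the standard-library `html.parser` in place of BeautifulSoup so that it runs without a network connection, the `TableRows` class and the `sample_html` string are made up for this example, and only the two data rows are taken from the output above.

```python
from html.parser import HTMLParser

import pandas as pd


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._in_td = True
            self._row.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False
        elif tag == 'tr':
            # Header rows contain only <th> cells, so they stay empty and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_td:
            self._row[-1] += data.strip()


# Hand-written sample standing in for the Wikipedia table (assumption).
sample_html = """
<table>
<tr><th>Community Board</th><th>Area</th><th>Pop</th><th>Density</th><th>Neighborhoods</th></tr>
<tr><td>Bronx CB 1</td><td>7.17</td><td>91497</td><td>12761</td><td>Melrose, Mott Haven, Port Morris</td></tr>
<tr><td>Bronx CB 2</td><td>5.54</td><td>52246</td><td>9792</td><td>Hunts Point, Longwood</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample_html)
toy_df = pd.DataFrame(
    parser.rows,
    columns=['Borough', 'Area km2', 'Pop Census', 'Pop/km2', 'Neighborhoods'],
)
```

Building the dataframe from complete rows avoids the placeholder `0` row that the notebook later has to filter out with `df['Borough'] != 0`.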


### Normalize the Borough column to the five borough names

In [7]:
df.loc[df['Borough'].str.contains('Bronx'),'Borough'] = 'Bronx'
df.loc[df['Borough'].str.contains('Brooklyn'),'Borough'] = 'Brooklyn'
df.loc[df['Borough'].str.contains('Queens'),'Borough'] = 'Queens'
df.loc[df['Borough'].str.contains('Staten Island'),'Borough'] = 'Staten Island'
df.loc[df['Borough'].str.contains('Manhattan'),'Borough'] = 'Manhattan'

In [8]:
df['Borough'].unique()

array(['Bronx', 'Brooklyn', 'Manhattan', 'Queens', 'Staten Island'],
      dtype=object)

### Split the Neighborhoods column into multiple rows

In [9]:
neighborhood=df['Neighborhoods'].str.split(',').apply(pd.Series,1).stack()
neighborhood.name = 'Neighborhoods'
neighborhood.index = neighborhood.index.droplevel(-1)

df.drop('Neighborhoods',axis=1,inplace=True)

df=df.join(neighborhood).reset_index()
df["Neighborhoods"]=df["Neighborhoods"].str.strip()
df["Borough"]=df["Borough"].str.strip()
del df['index']

In [10]:
df.head()

Unnamed: 0,Borough,Area km2,Pop Census,Pop/km2,Neighborhoods
0,Bronx,7.17,91497,12761,Melrose
1,Bronx,7.17,91497,12761,Mott Haven
2,Bronx,7.17,91497,12761,Port Morris
3,Bronx,5.54,52246,9792,Hunts Point
4,Bronx,5.54,52246,9792,Longwood
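For reference, recent pandas versions (0.25 and later) offer `DataFrame.explode`, which gives the same one-row-per-neighborhood result with less index juggling. This is a minimal sketch on a toy frame whose values are copied from the table above; it is an alternative, not the method used in this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx'],
    'Neighborhoods': ['Melrose, Mott Haven, Port Morris', 'Hunts Point, Longwood'],
})

# Turn each comma-separated string into a list, then emit one row per list element.
exploded = (
    toy.assign(Neighborhoods=toy['Neighborhoods'].str.split(','))
       .explode('Neighborhoods')
       .reset_index(drop=True)
)
exploded['Neighborhoods'] = exploded['Neighborhoods'].str.strip()
```

The other columns (here only `Borough`) are repeated for every neighborhood, which matches the repeated area and population values seen in the output above.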


In [11]:
df.groupby('Borough').count()

Unnamed: 0_level_0,Area km2,Pop Census,Pop/km2,Neighborhoods
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bronx,60,60,60,60
Brooklyn,80,80,80,80
Manhattan,48,48,48,48
Queens,86,86,86,86
Staten Island,56,56,56,56


### Create a function to get coordinates

In [12]:
geolocator = Nominatim(user_agent="newyork_loc",timeout=3)

def get_location(name,borough):
    name = name + "," + borough
    print(name)
    location = geolocator.geocode(name,country_codes='US')
    return(location.latitude,location.longitude)

### Get the coordinates of the neighborhoods

In [13]:
for index,row in df.iterrows():
    try:
        location = get_location(row['Neighborhoods'],row['Borough'])
        df.loc[index,'Latitude'] = location[0]
        df.loc[index,'Longitude'] = location[1]
    except Exception:
        # geocode() returned None or timed out: try once more before giving up
        print("Retrying....")
        try:
            location = get_location(row['Neighborhoods'],row['Borough'])
            df.loc[index,'Latitude'] = location[0]
            df.loc[index,'Longitude'] = location[1]
        except Exception:
            # leave Latitude/Longitude as NaN; these rows are dropped later
            print("***** Not Found in second attempt ****")

Melrose,Bronx
Mott Haven,Bronx
Port Morris,Bronx
Hunts Point,Bronx
Longwood,Bronx
Claremont,Bronx
Concourse Village,Bronx
Crotona Park,Bronx
Morrisania,Bronx
Concourse,Bronx
Highbridge,Bronx
Fordham,Bronx
Morris Heights,Bronx
Mount Hope,Bronx
University Heights,Bronx
Bathgate,Bronx
Belmont,Bronx
East Tremont,Bronx
West Farms,Bronx
Bedford Park,Bronx
Norwood,Bronx
University Heights,Bronx
Fieldston,Bronx
Kingsbridge,Bronx
Kingsbridge Heights,Bronx
Marble Hill,Bronx
Riverdale,Bronx
Spuyten Duyvil,Bronx
Van Cortlandt Village,Bronx
Retrying....
Van Cortlandt Village,Bronx
***** Not Found in second attempt ****
Bronx River,Bronx
Bruckner,Bronx
Castle Hill,Bronx
Clason Point,Bronx
Harding Park,Bronx
Parkchester,Bronx
Soundview,Bronx
Unionport,Bronx
City Island,Bronx
Co-op City,Bronx
Locust Point,Bronx
Pelham Bay,Bronx
Silver Beach,Bronx
Throgs Neck,Bronx
Westchester Square,Bronx
Allerton,Bronx
Bronxdale,Bronx
Indian Village,Bronx
Laconia,Bronx
Morris Park,Bronx
Pelham Gardens,Bronx
Pelham Pa
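The nested try/except above repeats the same lookup twice. One way to express the same idea is a small retry helper, sketched below. The names `call_with_retry` and `flaky_geocode` are hypothetical; `flaky_geocode` only simulates a geocoder that times out once, so the example runs without calling Nominatim.

```python
import time


def call_with_retry(func, *args, attempts=2, delay=0.0):
    """Call func(*args) up to `attempts` times; return the first non-None result, else None."""
    for attempt in range(attempts):
        try:
            result = func(*args)
            if result is not None:
                return result
        except Exception:
            pass  # treat any error as a failed attempt
        if attempt < attempts - 1:
            time.sleep(delay)  # pause before the next attempt
    return None


# Hypothetical stand-in for geolocator.geocode: fails once, then succeeds.
calls = {'n': 0}


def flaky_geocode(query):
    calls['n'] += 1
    if calls['n'] == 1:
        raise TimeoutError('simulated timeout')
    return (40.82567, -73.915242)


coords = call_with_retry(flaky_geocode, 'Melrose,Bronx')
missing = call_with_retry(lambda query: None, 'Nowhere,Bronx')
```

A `None` result corresponds to the "Not Found in second attempt" case, where the row keeps missing coordinates.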

### Check for locations without coordinates and drop them from the dataframe

In [14]:
df[df.isnull().any(axis=1)]

Unnamed: 0,Borough,Area km2,Pop Census,Pop/km2,Neighborhoods,Latitude,Longitude
28,Bronx,8.83,101731,11521,Van Cortlandt Village,,
94,Brooklyn,4.07,104014,25556,Prospect Lefferts Gardens,,
228,Queens,19.17,146594,7647,Kew Gardens Hills,,
237,Queens,16.19,127274,7861,Lindenwood,,
262,Queens,33.31,196284,5893,New Hyde Park,,
284,Staten Island,36.62,162609,4440,Meiers Corners,,
292,Staten Island,36.62,162609,4440,Silver Lake,,


In [15]:
df=df[~df.isnull().any(axis=1)]
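The boolean mask used above drops every row that has a missing value in any column. An equivalent and slightly more explicit form is `dropna` restricted to the coordinate columns, sketched here on a three-row toy frame built from values shown earlier.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'Neighborhoods': ['Melrose', 'Van Cortlandt Village', 'Mott Haven'],
    'Latitude': [40.82567, np.nan, 40.80899],
    'Longitude': [-73.915242, np.nan, -73.922915],
})

# Same filter written two ways: a row-wise null mask, and dropna on named columns.
by_mask = toy[~toy.isnull().any(axis=1)]
by_dropna = toy.dropna(subset=['Latitude', 'Longitude'])
```

Naming the columns in `subset` keeps the filter from silently dropping rows if an unrelated column later gains missing values.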

### Select the required columns and plot them on the map

In [16]:
df_neighborhood = df[['Borough','Neighborhoods','Latitude','Longitude']]
df_neighborhood.head()

Unnamed: 0,Borough,Neighborhoods,Latitude,Longitude
0,Bronx,Melrose,40.82567,-73.915242
1,Bronx,Mott Haven,40.80899,-73.922915
2,Bronx,Port Morris,40.801515,-73.909581
3,Bronx,Hunts Point,40.812601,-73.884025
4,Bronx,Longwood,40.816292,-73.89622


In [17]:
address = "New York City"

geolocator = Nominatim(user_agent="newyork_loc")
location = geolocator.geocode(address,country_codes='US')
latitude = location.latitude
longitude = location.longitude
print("Latitude:{} and Longitude:{}".format(latitude,longitude))


Latitude:40.7127281 and Longitude:-74.0060152


In [18]:
df_neighborhood.drop(df_neighborhood[(df_neighborhood['Neighborhoods'] == 'Highland Park')].index, inplace=True)
df_neighborhood.drop(df_neighborhood[(df_neighborhood['Neighborhoods'] == 'Floral Park')].index, inplace=True)
df_neighborhood.drop(df_neighborhood[(df_neighborhood['Neighborhoods'] == 'Liberty Park')].index, inplace=True)
df_neighborhood.drop(df_neighborhood[(df_neighborhood['Neighborhoods'] == 'Plum Beach')].index, inplace=True)
df_neighborhood.drop(df_neighborhood[(df_neighborhood['Neighborhoods'] == 'Madison')].index, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


In [19]:
newyork_map = folium.Map(location=[latitude,longitude],zoom_start=10)

for neigh, lat, lng in zip(df_neighborhood['Neighborhoods'],df_neighborhood['Latitude'],df_neighborhood['Longitude']):
    # one blue circle per neighborhood; the popup shows its name and coordinates
    folium.CircleMarker(
           [lat,lng],
           radius=5,
           popup="{} : {}, {}".format(neigh,lat,lng),
           color='blue',
           fill=True,
           fill_color='#3186cc',
           fill_opacity=0.7).add_to(newyork_map)

newyork_map

### Explore the venues of the neighborhoods

In [20]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET


In [21]:
def GetNearbyVenue(names, boroughs, latitudes, longitudes, radius=500):
    venues_list=[]
    LIMIT=100
    
    for name, borough, lat, lng in zip(names, boroughs, latitudes, longitudes):
        print(name)
        
        # Foursquare "explore" endpoint: up to LIMIT venues within `radius` metres of (lat, lng)
        url = "https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
            CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,LIMIT)
        
        result = requests.get(url)

        result = result.json()['response']['groups'][0]['items']
        
        # One tuple per venue, tagged with the neighborhood it was found near
        venues_list.append([(
        name,
        borough,
        lat,
        lng,
        v['venue']['name'],
        v['venue']['location']['lat'],
        v['venue']['location']['lng'],
        v['venue']['categories'][0]['name']) for v in result
        ])
        
    # Build the dataframe once, after every neighborhood has been queried
    nearby_venue = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venue.columns = ['Neighborhood',
                            'Borough',
                            'Neighborhood Latitude', 
                            'Neighborhood Longitude', 
                            'Venue', 
                            'Venue Latitude', 
                            'Venue Longitude', 
                            'Venue Category']
    return nearby_venue
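The function above indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` if a venue comes back with an empty category list. The sketch below shows one defensive way to flatten the response. `parse_explore_items` is a hypothetical helper, and `sample_items` is a hand-written stand-in for `result.json()['response']['groups'][0]['items']` that mirrors only the fields this notebook reads; the second venue is invented.

```python
def parse_explore_items(items, name, borough, lat, lng):
    """Flatten explore 'items' into row tuples, skipping venues without a category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        if not categories:
            continue  # nothing to one-hot encode later, so skip this venue
        rows.append((
            name, borough, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'],
        ))
    return rows


# Hand-written sample (assumption): only the keys used above are included.
sample_items = [
    {'venue': {'name': 'Porto Salvo',
               'location': {'lat': 40.823887, 'lng': -73.91291},
               'categories': [{'name': 'Italian Restaurant'}]}},
    {'venue': {'name': 'Uncategorised Place',
               'location': {'lat': 40.82, 'lng': -73.91},
               'categories': []}},
]

rows = parse_explore_items(sample_items, 'Melrose', 'Bronx', 40.82567, -73.915242)
```

Each returned tuple has the same eight fields, in the same order, as the columns assigned to `nearby_venue`.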
        

In [22]:
df_neighborhood.isnull().any(axis=1).unique()

array([False])

In [23]:
newyork_venues = GetNearbyVenue(df_neighborhood['Neighborhoods'],
                                df_neighborhood['Borough'],
                               df_neighborhood['Latitude'],
                               df_neighborhood['Longitude'])

Melrose
Mott Haven
Port Morris
Hunts Point
Longwood
Claremont
Concourse Village
Crotona Park
Morrisania
Concourse
Highbridge
Fordham
Morris Heights
Mount Hope
University Heights
Bathgate
Belmont
East Tremont
West Farms
Bedford Park
Norwood
University Heights
Fieldston
Kingsbridge
Kingsbridge Heights
Marble Hill
Riverdale
Spuyten Duyvil
Bronx River
Bruckner
Castle Hill
Clason Point
Harding Park
Parkchester
Soundview
Unionport
City Island
Co-op City
Locust Point
Pelham Bay
Silver Beach
Throgs Neck
Westchester Square
Allerton
Bronxdale
Indian Village
Laconia
Morris Park
Pelham Gardens
Pelham Parkway
Van Nest
Baychester
Edenwald
Eastchester
Fish Bay
Olinville
Wakefield
Williamsbridge
Woodlawn
Greenpoint
Williamsburg
Williamsburg Houses
Boerum Hill
Brooklyn Heights
Brooklyn Navy Yard
Clinton Hill
Dumbo
Fort Greene
Fulton Ferry
Fulton Mall
Vinegar Hill
Bedford-Stuyvesant
Ocean Hill
Stuyvesant Heights
Bushwick
City Line
Cypress Hills
East New York
New Lots
Starrett City
Carroll Gardens
Cobble

In [24]:
newyork_venues.head()

Unnamed: 0,Neighborhood,Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Melrose,Bronx,40.82567,-73.915242,Porto Salvo,40.823887,-73.91291,Italian Restaurant
1,Melrose,Bronx,40.82567,-73.915242,Perry Coffee Shop.,40.823181,-73.910928,Diner
2,Melrose,Bronx,40.82567,-73.915242,Chipotle Mexican Grill,40.82589,-73.919534,Mexican Restaurant
3,Melrose,Bronx,40.82567,-73.915242,Starbucks,40.825556,-73.918865,Coffee Shop
4,Melrose,Bronx,40.82567,-73.915242,Concourse Village,40.823697,-73.919607,Shopping Mall


In [25]:
newyork_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Allerton,34,34,34,34,34,34,34
Alphabet City,100,100,100,100,100,100,100
Annadale,21,21,21,21,21,21,21
Arlington,6,6,6,6,6,6,6
Arrochar,8,8,8,8,8,8,8
Arverne,8,8,8,8,8,8,8
Astoria,21,21,21,21,21,21,21
Astoria Heights,54,54,54,54,54,54,54
Auburndale,24,24,24,24,24,24,24
Baisley Park,5,5,5,5,5,5,5


In [26]:
len(newyork_venues['Venue Category'].unique())

439

In [27]:
'Gym' in newyork_venues['Venue Category'].unique()

True

In [28]:
newyork_venues[newyork_venues['Venue Category'].str.contains('Gym')]['Venue Category'].unique()

array(['Gym / Fitness Center', 'Gym', 'Boxing Gym', 'Climbing Gym',
       'Gym Pool', 'Outdoor Gym', 'Gymnastics Gym'], dtype=object)

### Update all the Gym categories to "Gym"

In [29]:
newyork_venues.loc[newyork_venues['Venue Category'].str.contains('Gym'),'Venue Category'] = 'Gym'

In [30]:
newyork_venues[newyork_venues['Venue Category'].str.contains('Gym')]['Venue Category'].unique()

array(['Gym'], dtype=object)

In [31]:
newyork_venues.head()

Unnamed: 0,Neighborhood,Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Melrose,Bronx,40.82567,-73.915242,Porto Salvo,40.823887,-73.91291,Italian Restaurant
1,Melrose,Bronx,40.82567,-73.915242,Perry Coffee Shop.,40.823181,-73.910928,Diner
2,Melrose,Bronx,40.82567,-73.915242,Chipotle Mexican Grill,40.82589,-73.919534,Mexican Restaurant
3,Melrose,Bronx,40.82567,-73.915242,Starbucks,40.825556,-73.918865,Coffee Shop
4,Melrose,Bronx,40.82567,-73.915242,Concourse Village,40.823697,-73.919607,Shopping Mall


### Create a dataframe with categories as columns and group by neighborhood

In [32]:
ny_venues = pd.get_dummies(newyork_venues[['Venue Category']], prefix="", prefix_sep="")
ny_venues['Neighborhood'] = newyork_venues['Neighborhood']

temp_col = list(ny_venues.columns)
temp_col.remove('Neighborhood')

columns = ['Neighborhood'] + temp_col
ny_venues = ny_venues[columns]

In [33]:
ny_venues.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Food Court,Airport Lounge,Airport Terminal,American Restaurant,Amphitheater,...,Weight Loss Center,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,Melrose,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Melrose,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Melrose,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Melrose,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Melrose,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [34]:
ny_venue_grouped = ny_venues.groupby('Neighborhood').mean().reset_index()

In [35]:
ny_venue_grouped.shape

(301, 433)
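Taking the mean of one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues that fall in it. This is the "frequency" that the `Gym` column holds in the following steps. A toy example with made-up venues (neighborhoods `A` and `B` are placeholders):

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Gym', 'Diner', 'Diner', 'Park', 'Gym', 'Gym'],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of 0/1 indicators = fraction of venues in each category.
freq = onehot.groupby('Neighborhood').mean()
```

In this toy data one of A's four venues is a gym, so its `Gym` value is 0.25, while both of B's venues are gyms, giving 1.0. A value such as 0.01 for Alphabet City therefore means one gym among the 100 venues returned for it.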

In [36]:
# Rename the column and merge with dataframe for coordinates

df_neighborhood.rename(columns={'Neighborhoods':'Neighborhood'},inplace=True)
ny_venue_grouped = ny_venue_grouped.merge(df_neighborhood, on='Neighborhood')

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  return super(DataFrame, self).rename(**kwargs)


In [37]:
ny_venue_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport Food Court,Airport Lounge,Airport Terminal,American Restaurant,Amphitheater,...,Wine Shop,Winery,Wings Joint,Women's Store,Yoga Studio,Zoo,Zoo Exhibit,Borough,Latitude,Longitude
0,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Bronx,40.86543,-73.867365
1,Alphabet City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.01,0.0,0.0,0.0,0.02,0.0,0.0,Manhattan,40.725102,-73.979583
2,Annadale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Staten Island,40.54455,-74.176532
3,Arlington,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Staten Island,40.632326,-74.165144
4,Arrochar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,Staten Island,40.598438,-74.072641


### Keep only the Gym column for each neighborhood

In [38]:
len(ny_venue_grouped[ny_venue_grouped['Gym'] > 0])

116

In [39]:
ny_gym = ny_venue_grouped[['Borough','Neighborhood','Gym']]
ny_gym.head()

Unnamed: 0,Borough,Neighborhood,Gym
0,Bronx,Allerton,0.0
1,Manhattan,Alphabet City,0.01
2,Staten Island,Annadale,0.0
3,Staten Island,Arlington,0.0
4,Staten Island,Arrochar,0.0


In [40]:
ny_gym['Borough'].unique()

array(['Bronx', 'Manhattan', 'Staten Island', 'Queens', 'Brooklyn'],
      dtype=object)

### Apply a machine learning algorithm to the dataset

In [41]:
ny_gym[ny_gym.isnull().any(axis=1)]

Unnamed: 0,Borough,Neighborhood,Gym


In [42]:
from sklearn.cluster import KMeans

In [43]:
# Set the number of clusters to 3
k_clusters = 3
ny_gym_tmp = ny_gym.copy()

for borough in ny_gym['Borough'].unique():
    print(borough)
    ny_clusters = ny_gym_tmp[ny_gym_tmp['Borough'] == borough].drop(['Borough','Neighborhood'],axis=1)
    
    # Kmeans
    ny_kmeans = KMeans(n_clusters=k_clusters,random_state=0).fit(ny_clusters)
    ny_gym.loc[ny_gym[ny_gym['Borough'] == borough].index,'cluster'] = ny_kmeans.labels_

Bronx
Manhattan


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)

Staten Island
Queens
Brooklyn




In [44]:
ny_gym = ny_gym.merge(df_neighborhood, on=['Neighborhood','Borough'])

In [45]:
ny_gym['cluster'] = ny_gym['cluster'].astype(int)

In [46]:
ny_gym['cluster'].unique()

array([0, 2, 1])

In [47]:
ny_gym['cluster']=ny_gym['cluster'].astype(int)

In [48]:
ny_gym.head()

Unnamed: 0,Borough,Neighborhood,Gym,cluster,Latitude,Longitude
0,Bronx,Allerton,0.0,0,40.86543,-73.867365
1,Manhattan,Alphabet City,0.01,0,40.725102,-73.979583
2,Staten Island,Annadale,0.0,2,40.54455,-74.176532
3,Staten Island,Arlington,0.0,2,40.632326,-74.165144
4,Staten Island,Arrochar,0.0,2,40.598438,-74.072641
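Because the only feature is the single `Gym` frequency, the clustering is one-dimensional: each neighborhood is assigned to the nearest of three centers. The sketch below is a plain NumPy version of Lloyd's algorithm, written for illustration; it is not scikit-learn's `KMeans` implementation. The function `kmeans_1d` and the input values are assumptions made up for this example. It also renumbers the labels by ascending center, which `KMeans` does not do on its own.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on 1-D data.

    Returns (labels, centers), with labels renumbered so that
    0 is the cluster with the lowest center.
    """
    x = np.asarray(values, dtype=float)
    # Initialise the centers at evenly spaced quantiles of the data.
    centers = np.quantile(x, np.linspace(0, 1, k))
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Renumber clusters so that ids follow ascending center value.
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels], centers[order]


# Illustrative gym frequencies (made up, not taken from the dataset).
gym_freq = [0.0, 0.0, 0.0, 0.03, 0.04, 0.05, 0.12, 0.125]
labels, centers = kmeans_1d(gym_freq, k=3)
```

With ordered labels, 0 always means the lowest gym frequency and 2 the highest, which makes the clusters comparable across boroughs.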


### Plot the neighborhoods with their cluster labels on the map

In [49]:
ny_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(k_clusters)
ys = [i+x+(i*x)**2 for i in range(k_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for bor,lat, lon, poi, cluster in zip(ny_gym['Borough'],ny_gym['Latitude'], ny_gym['Longitude'], ny_gym['Neighborhood'], ny_gym['cluster']):
    label = folium.Popup(str(poi) + ', ' + bor + ' - Cluster: ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(ny_map)
       
ny_map

### Save the map

In [50]:
ny_map.save(outfile='ny_gym.html')

### Check the clusters of each borough

In [51]:
ny_gym.groupby(['Borough','cluster']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,Neighborhood,Gym,Latitude,Longitude
Borough,cluster,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Bronx,0,47,47,47,47
Bronx,1,5,5,5,5
Bronx,2,9,9,9,9
Brooklyn,0,69,69,69,69
Brooklyn,1,1,1,1,1
Brooklyn,2,20,20,20,20
Manhattan,0,31,31,31,31
Manhattan,1,10,10,10,10
Manhattan,2,11,11,11,11
Queens,0,62,62,62,62


### Analyze clusters of Queens

In [52]:
ny_gym[(ny_gym['Borough'] == 'Queens') & (ny_gym['cluster'] == 0)]

Unnamed: 0,Borough,Neighborhood,Gym,cluster,Latitude,Longitude
5,Queens,Arverne,0.0,0,40.593417,-73.789546
6,Queens,Astoria,0.0,0,40.772014,-73.930267
8,Queens,Auburndale,0.0,0,40.761452,-73.789724
9,Queens,Baisley Park,0.0,0,40.676015,-73.784897
14,Queens,Bay Terrace,0.0,0,40.561639,-73.920204
18,Queens,Bayswater,0.0,0,40.610278,-73.767222
22,Queens,Belle Harbor,0.0,0,40.577552,-73.848577
23,Queens,Bellerose,0.0,0,40.726769,-73.741521
30,Queens,Blissville,0.0,0,40.734721,-73.937780
34,Queens,Breezy Point,0.0,0,40.556240,-73.926718


In [53]:
ny_gym[(ny_gym['Borough'] == 'Queens') & (ny_gym['cluster'] == 1)]

Unnamed: 0,Borough,Neighborhood,Gym,cluster,Latitude,Longitude
17,Queens,Bayside,0.0625,1,40.768435,-73.777077
21,Queens,Beechhurst,0.125,1,40.79149,-73.804578
46,Queens,Cambria Heights,0.076923,1,40.694547,-73.738465
161,Queens,Kew Gardens,0.0625,1,40.713941,-73.830742
305,Queens,Tudor Village,0.082353,1,40.746394,-73.971705


In [54]:
ny_gym[(ny_gym['Borough'] == 'Queens') & (ny_gym['cluster'] == 2)]

Unnamed: 0,Borough,Neighborhood,Gym,cluster,Latitude,Longitude
7,Queens,Astoria Heights,0.037037,2,40.760527,-73.911649
67,Queens,College Point,0.047619,2,40.787601,-73.845968
72,Queens,Corona,0.021739,2,40.746959,-73.860146
85,Queens,East Elmhurst,0.043478,2,40.761212,-73.865136
111,Queens,Forest Hills,0.033333,2,40.719594,-73.844855
122,Queens,Glendale,0.027027,2,40.701492,-73.886803
233,Queens,Ozone Park,0.051282,2,40.67677,-73.843746
250,Queens,Queensbridge,0.037037,2,40.754495,-73.945613
257,Queens,Richmond Hill,0.045455,2,40.699425,-73.830967
258,Queens,Richmond Hill,0.045455,2,40.699425,-73.830967


### Analysis

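Looking at the data above, the neighborhoods of Queens fall into 3 clusters:

Cluster 0: neighborhoods with a low frequency of gyms (a Gym frequency of 0.0 in the rows shown)

Cluster 1: neighborhoods with a high frequency of gyms

Cluster 2: neighborhoods with a moderate frequency of gyms


So cluster 0 neighborhoods are the best candidates for opening a gym, since they have little or no competition nearby, and cluster 2 neighborhoods are a second choice with moderate competition.
Cluster 1 neighborhoods are best avoided, as they already have the highest share of gyms among nearby venues. Because KMeans is fitted separately for each borough, this meaning of the cluster numbers applies to Queens only; the labels in the other boroughs need to be checked in the same way.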
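Since the cluster numbers returned by KMeans carry no inherent order, the low/moderate/high reading can be derived from the data rather than by eye. The sketch below ranks clusters by their mean `Gym` frequency. The six rows are copied from the Queens tables above; the `competition` column name is an assumption introduced for this example.

```python
import pandas as pd

queens = pd.DataFrame({
    'Neighborhood': ['Arverne', 'Astoria', 'Bayside', 'Beechhurst', 'Corona', 'Forest Hills'],
    'Gym': [0.0, 0.0, 0.0625, 0.125, 0.021739, 0.033333],
    'cluster': [0, 0, 1, 1, 2, 2],
})

# Order the cluster ids by mean gym frequency, lowest first.
cluster_mean = queens.groupby('cluster')['Gym'].mean().sort_values()
level_of = dict(zip(cluster_mean.index, ['low', 'moderate', 'high']))

queens['competition'] = queens['cluster'].map(level_of)
candidates = queens.loc[queens['competition'] == 'low', 'Neighborhood'].tolist()
```

Running the same steps on each borough's rows would give a consistent reading of its clusters without relying on the label numbers.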
