# Peer-graded Assignment: Capstone Project
# The Battle of Neighborhoods (Week 2)

Created by Hugo Robalino

---

## Introduction

Manhattan, land of opportunities! The slogan draws many people toward the idea of starting a business that pays off quickly. Manhattan has a charm of its own: the friendliness and honesty of its residents create an atmosphere of opportunity, and its culture values helping those who need it most. In short, it is a place where everyone works hard to show that they can lead a satisfying life.

Various local media outlets have studied which kinds of jobs generate income most quickly. In this context, a population census revealed that people prefer four types of businesses, depending on the social group they move in day to day. We can also infer that these businesses are closely tied to immediate customer satisfaction.

Opening a boxing gym is one of those ventures taken on by people who are passionate about fitness, health and well-being. It is also a booming business, thanks to the population's growing interest in practicing sports and leading a healthy lifestyle. This project could interest young entrepreneurs who want to start their own business by renting a space and equipping it to offer physical conditioning, whether with exercise machines or with body weight alone.

## Business Problem

### Which neighborhood should I choose to open a boxing gym in Manhattan?

There are around 200 gyms in Manhattan, but not all of them are designed for boxers, who need a different kind of training and environment. The goal is to find a suitable place for the target audience, taking into account accessibility and visibility, whether customers can park their vehicles nearby and, above all, whether the area is quiet and safe.

## Data

We are going to use a JSON file that contains New York neighborhood data: https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json. We will also use the Foursquare API to explore the neighborhoods and segment them.

Combining the data from these sources should show the following:
- Which neighborhoods in Manhattan have clusters of boxing gyms.
- How populated each neighborhood is.
- Which neighborhood an entrepreneur should target to open a new boxing gym.

## Methodology

In this project we will use the k-means clustering algorithm to segment the neighborhoods. We will then use the Folium library to visualize the neighborhoods in Manhattan and their emerging clusters.

First, we collect the location of every gym in Manhattan, using the Foursquare venue categories.

Then, our analysis focuses on calculating and exploring the density of gyms across different areas of Manhattan. We will use maps to identify areas that are far from existing gyms and could be a good choice for renting a place.

Finally, we focus on the most promising areas and create clusters of locations within them. We take into consideration locations with no gyms on the same street or within a radius of 200 meters. We present a map of all such locations and also cluster them (using k-means) to identify a location that can serve as a starting point for exploring and searching for the optimal venue location.
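
The 200-meter criterion above can be checked with a great-circle distance. The following is a minimal sketch, not the method used in the cells below: the helper names `haversine_m` and `has_gym_within` and the sample coordinates are hypothetical, chosen only for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def has_gym_within(candidate, gyms, radius_m=200):
    """True if any gym lies within radius_m meters of the candidate point."""
    lat, lon = candidate
    return any(haversine_m(lat, lon, g_lat, g_lon) <= radius_m for g_lat, g_lon in gyms)


# Hypothetical coordinates, for illustration only
gyms = [(40.7150, -73.9940)]
candidates = [(40.7151, -73.9941), (40.7300, -73.9900)]

# Keep only the candidate locations with no gym inside the 200 m radius
free_spots = [c for c in candidates if not has_gym_within(c, gyms)]
print(free_spots)
```

In the notebook, the `gyms` list would presumably come from the venue coordinates returned by Foursquare, and the candidates from a grid of points over the area of interest.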

### 1. Load all the dependencies that we will need

In [1]:
import numpy as np # Library to handle data in a vectorized manner

import pandas as pd # Library for data analysis

import json # Library to handle JSON files

from geopy.geocoders import Nominatim # Convert an address into latitude and longitude values

import requests # Library to handle requests

from pandas import json_normalize # Transform JSON into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # Map rendering library

print('Libraries imported.')

Libraries imported.


### 2. Download and Explore Dataset

In [2]:
# Download the json file
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json

# Load and explore the data
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
# Define a new variable that includes the features key
neighborhoods_data = newyork_data['features']

# Define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# Collect one row per neighborhood
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# Build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)

# Create a dataframe only with the Borough equals to Manhattan
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop = True)

# Examine the resulting dataframe
manhattan_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


### 3. Create a map of Manhattan with neighborhoods superimposed on top

In [3]:
# Use geopy library to get the latitude and longitude values of Manhattan
address = 'Manhattan, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


In [4]:
# Create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)

# Show the map
map_manhattan

### 4. Use the Foursquare API to explore the gyms in each neighborhood and segment them

In [5]:
CLIENT_ID = 'HIDDEN' # Your Foursquare ID
CLIENT_SECRET = 'HIDDEN' # Your Foursquare Secret
VERSION = '20181218' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
radius = 500 # Define radius

In [6]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?categoryId=4bf58dd8d48988d175941735&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
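
The function above indexes straight into the response, so a neighborhood with no results or a venue without a category would raise an exception. As a hedged sketch of a more defensive parsing step, the hypothetical helper `extract_venues` below walks the same structure that cell 6 assumes (`response` → `groups[0]` → `items` → `venue`) and skips incomplete entries. The sample payload is made up for illustration and is not real Foursquare output.

```python
def extract_venues(payload, neighborhood):
    """Return (neighborhood, name, lat, lng, category) tuples from an
    explore-style response dict, skipping malformed entries instead of raising."""
    groups = payload.get("response", {}).get("groups") or []
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Skip venues that lack coordinates or a category
        if "lat" not in location or "lng" not in location or not categories:
            continue
        rows.append((neighborhood,
                     venue.get("name"),
                     location["lat"],
                     location["lng"],
                     categories[0].get("name")))
    return rows


# Made-up payload mirroring the structure indexed in cell 6
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Boxing Club",
                           "location": {"lat": 40.71, "lng": -73.99},
                           "categories": [{"name": "Boxing Gym"}]}},
                {"venue": {"name": "No Category Gym",
                           "location": {"lat": 40.72, "lng": -73.98},
                           "categories": []}},
            ]
        }]
    }
}

print(extract_venues(sample, "Chinatown"))
print(extract_venues({}, "Chinatown"))
```

A helper like this could replace the list comprehension inside `getNearbyVenues`, with the rest of the function unchanged.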

In [7]:
# Run the above function on each neighborhood and create a new dataframe called manhattan_venues
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'], latitudes=manhattan_data['Latitude'], longitudes=manhattan_data['Longitude'])

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [8]:
# Check how many venues were returned for each neighborhood
manhattan_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Battery Park City | 33 | 33 | 33 | 33 | 33 | 33 |
| Carnegie Hill | 51 | 51 | 51 | 51 | 51 | 51 |
| Central Harlem | 12 | 12 | 12 | 12 | 12 | 12 |
| Chelsea | 45 | 45 | 45 | 45 | 45 | 45 |
| Chinatown | 19 | 19 | 19 | 19 | 19 | 19 |
| Civic Center | 93 | 93 | 93 | 93 | 93 | 93 |
| Clinton | 53 | 53 | 53 | 53 | 53 | 53 |
| East Harlem | 10 | 10 | 10 | 10 | 10 | 10 |
| East Village | 31 | 31 | 31 | 31 | 31 | 31 |
| Financial District | 100 | 100 | 100 | 100 | 100 | 100 |


In [9]:
# Find out how many unique categories can be curated from all the returned venues
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 35 unique categories.


### 5. Analyze Each Neighborhood

In [10]:
# One hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# Move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

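| | Neighborhood | Athletics & Sports | Bike Shop | Boxing Gym | Building | Chiropractor | Climbing Gym | Clothing Store | Club House | College Gym | ... | Pilates Studio | Pool | Residential Building (Apartment / Condo) | Spa | Spiritual Center | Tennis Court | Track | Weight Loss Center | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |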


In [11]:
# Group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

| | Neighborhood | Athletics & Sports | Bike Shop | Boxing Gym | Building | Chiropractor | Climbing Gym | Clothing Store | Club House | College Gym | ... | Pilates Studio | Pool | Residential Building (Apartment / Condo) | Spa | Spiritual Center | Tennis Court | Track | Weight Loss Center | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battery Park City | 0.0 | 0.030303 | 0.060606 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Carnegie Hill | 0.0 | 0.0 | 0.019608 | 0.019608 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.019608 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.039216 | 0.0 | 0.078431 |
| 2 | Central Harlem | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Chelsea | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.022222 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.022222 | 0.0 | 0.155556 |
| 4 | Chinatown | 0.0 | 0.0 | 0.052632 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.105263 |
| 5 | Civic Center | 0.0 | 0.0 | 0.021505 | 0.0 | 0.0 | 0.021505 | 0.0 | 0.0 | 0.0 | ... | 0.053763 | 0.0 | 0.0 | 0.010753 | 0.0 | 0.0 | 0.0 | 0.0 | 0.010753 | 0.129032 |
| 6 | Clinton | 0.0 | 0.0 | 0.0 | 0.018868 | 0.018868 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.018868 | 0.0 | 0.0 | 0.018868 | 0.018868 | 0.0 | 0.0 | 0.018868 |
| 7 | East Harlem | 0.0 | 0.0 | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 |
| 8 | East Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.16129 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.032258 | 0.032258 | 0.0 | 0.16129 |
| 9 | Financial District | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.07 |


In [12]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
                  venue  freq
0                   Gym  0.58
1  Gym / Fitness Center  0.27
2            Boxing Gym  0.06
3             Bike Shop  0.03
4        Gymnastics Gym  0.03


----Carnegie Hill----
                  venue  freq
0  Gym / Fitness Center  0.35
1                   Gym  0.33
2           Yoga Studio  0.08
3   Martial Arts School  0.06
4    Weight Loss Center  0.04


----Central Harlem----
                  venue  freq
0  Gym / Fitness Center  0.42
1                   Gym  0.42
2   Martial Arts School  0.08
3          Cycle Studio  0.08
4          Nutritionist  0.00


----Chelsea----
                  venue  freq
0  Gym / Fitness Center  0.53
1           Yoga Studio  0.16
2                   Gym  0.11
3   Martial Arts School  0.09
4          Cycle Studio  0.04


----Chinatown----
                  venue  freq
0  Gym / Fitness Center  0.42
1                   Gym  0.32
2           Yoga Studio  0.11
3   Martial Arts School  0.11
4            Boxi

In [13]:
# Write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [14]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battery Park City | Gym | Gym / Fitness Center | Boxing Gym | Gym Pool | Bike Shop | Gymnastics Gym | Community Center | Doctor's Office | Deli / Bodega | Cycle Studio |
| 1 | Carnegie Hill | Gym / Fitness Center | Gym | Yoga Studio | Martial Arts School | Weight Loss Center | Pilates Studio | Gymnastics Gym | Deli / Bodega | Community Center | Gym Pool |
| 2 | Central Harlem | Gym / Fitness Center | Gym | Martial Arts School | Cycle Studio | Yoga Studio | Gymnastics Gym | Doctor's Office | Deli / Bodega | Community Center | College Gym |
| 3 | Chelsea | Gym / Fitness Center | Yoga Studio | Gym | Martial Arts School | Cycle Studio | Weight Loss Center | Gym Pool | Pilates Studio | Clothing Store | Climbing Gym |
| 4 | Chinatown | Gym / Fitness Center | Gym | Yoga Studio | Martial Arts School | Boxing Gym | Community Center | Doctor's Office | Deli / Bodega | Cycle Studio | College Gym |
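
The column labels in cell 14 come from a three-item suffix list with a fallback to `th`. That is correct for the ten columns used here, but the same loop would produce labels such as `21th` if `num_top_venues` were raised above 20. A small sketch of a general helper follows; the name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and 111th, 112th, ...) are exceptions to the last-digit rule
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Rebuild the same column labels as cell 14 for the top 10 venues
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns)
```

With this helper, the `try`/`except` block in cell 14 would no longer be needed.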


### 6. Cluster Neighborhoods

In [15]:
# Set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 1, 1, 0, 1, 2, 1, 2, 2, 1], dtype=int32)

In [16]:
# Add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# Merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head()

| | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 | 0 | Gym / Fitness Center | Yoga Studio | Boxing Gym | Gym | College Gym | Doctor's Office | Deli / Bodega | Cycle Studio | Community Center | Club House |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 | 1 | Gym / Fitness Center | Gym | Yoga Studio | Martial Arts School | Boxing Gym | Community Center | Doctor's Office | Deli / Bodega | Cycle Studio | College Gym |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 | 4 | Gym | Gym / Fitness Center | Pilates Studio | Gymnastics Gym | Doctor's Office | Deli / Bodega | Cycle Studio | Community Center | College Gym | Yoga Studio |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 | 2 | Pilates Studio | Gym | Gym / Fitness Center | Yoga Studio | Building | Chiropractor | Climbing Gym | Clothing Store | Club House | Gymnastics Gym |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 | 3 | Yoga Studio | Gym | College Gym | Gym / Fitness Center | Doctor's Office | Deli / Bodega | Cycle Studio | Community Center | Club House | Gymnastics Gym |


In [17]:
# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### 7. Examine Clusters

#### 7.1 Cluster 1

In [18]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | Gym / Fitness Center | Yoga Studio | Boxing Gym | Gym | College Gym | Doctor's Office | Deli / Bodega | Cycle Studio | Community Center | Club House |
| 5 | Manhattanville | Gym / Fitness Center | Climbing Gym | Yoga Studio | College Gym | Gym | Doctor's Office | Deli / Bodega | Cycle Studio | Community Center | Club House |
| 17 | Chelsea | Gym / Fitness Center | Yoga Studio | Gym | Martial Arts School | Cycle Studio | Weight Loss Center | Gym Pool | Pilates Studio | Clothing Store | Climbing Gym |
| 33 | Midtown South | Gym / Fitness Center | Gym | Martial Arts School | Yoga Studio | Boxing Gym | Building | Medical Center | Chiropractor | Climbing Gym | Clothing Store |
| 37 | Stuyvesant Town | Gym / Fitness Center | Yoga Studio | Gym | College Gym | Doctor's Office | Deli / Bodega | Cycle Studio | Community Center | Club House | Gymnastics Gym |
| 38 | Flatiron | Gym / Fitness Center | Gym | Yoga Studio | Pilates Studio | Residential Building (Apartment / Condo) | Women's Store | Martial Arts School | Athletics & Sports | Pool | Doctor's Office |


#### 7.2 Cluster 2

In [19]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Chinatown | Gym / Fitness Center | Gym | Yoga Studio | Martial Arts School | Boxing Gym | Community Center | Doctor's Office | Deli / Bodega | Cycle Studio | College Gym |
| 6 | Central Harlem | Gym / Fitness Center | Gym | Martial Arts School | Cycle Studio | Yoga Studio | Gymnastics Gym | Doctor's Office | Deli / Bodega | Community Center | College Gym |
| 9 | Yorkville | Gym | Gym / Fitness Center | Gymnastics Gym | Martial Arts School | Gym Pool | Pilates Studio | Boxing Gym | Climbing Gym | Clothing Store | Club House |
| 12 | Upper West Side | Gym / Fitness Center | Gym | Yoga Studio | Gymnastics Gym | Pilates Studio | Boxing Gym | Chiropractor | Climbing Gym | Clothing Store | Club House |
| 13 | Lincoln Square | Gym / Fitness Center | Gym | Martial Arts School | Gym Pool | Cycle Studio | Residential Building (Apartment / Condo) | Yoga Studio | Climbing Gym | Pilates Studio | Clothing Store |
| 14 | Clinton | Gym / Fitness Center | Gym | Building | Chiropractor | Residential Building (Apartment / Condo) | Yoga Studio | Tennis Court | Track | Boxing Gym | Gym Pool |
| 15 | Midtown | Gym / Fitness Center | Gym | Yoga Studio | Pilates Studio | Martial Arts School | Weight Loss Center | Boxing Gym | Chiropractor | Cycle Studio | Hospital |
| 16 | Murray Hill | Gym / Fitness Center | Gym | Yoga Studio | Martial Arts School | Track | Nutritionist | Boxing Gym | Doctor's Office | Pilates Studio | Weight Loss Center |
| 21 | Tribeca | Gym / Fitness Center | Gym | Gym Pool | Yoga Studio | Pilates Studio | Cycle Studio | Gymnastics Gym | Athletics & Sports | Track | Doctor's Office |
| 24 | West Village | Gym | Gym / Fitness Center | Yoga Studio | Cycle Studio | Track | College Gym | Doctor's Office | Deli / Bodega | Community Center | Club House |


#### 7.3 Cluster 3

In [20]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Inwood | Pilates Studio | Gym | Gym / Fitness Center | Yoga Studio | Building | Chiropractor | Climbing Gym | Clothing Store | Club House | Gymnastics Gym |
| 7 | East Harlem | Gym / Fitness Center | Martial Arts School | Yoga Studio | Boxing Gym | Building | Gym | Community Center | Doctor's Office | Deli / Bodega | Cycle Studio |
| 8 | Upper East Side | Gym / Fitness Center | Gym | Yoga Studio | Doctor's Office | Pilates Studio | Cycle Studio | Spa | Martial Arts School | Boxing Gym | Building |
| 10 | Lenox Hill | Gym / Fitness Center | Gym | Yoga Studio | Cycle Studio | Martial Arts School | Pilates Studio | Spa | Club House | Boxing Gym | Building |
| 18 | Greenwich Village | Gym / Fitness Center | Gym | Yoga Studio | Pilates Studio | Martial Arts School | Boxing Gym | Spa | Medical Center | Building | Chiropractor |
| 19 | East Village | Gym / Fitness Center | Gym | Pilates Studio | Yoga Studio | Outdoor Gym | Martial Arts School | Track | Weight Loss Center | Bike Shop | Boxing Gym |
| 20 | Lower East Side | Gym | Martial Arts School | Yoga Studio | Gym / Fitness Center | Pool | Community Center | Building | Chiropractor | Climbing Gym | Clothing Store |
| 22 | Little Italy | Gym / Fitness Center | Gym | Yoga Studio | Martial Arts School | Boxing Gym | Pilates Studio | Spa | Cycle Studio | Building | Chiropractor |
| 23 | Soho | Gym / Fitness Center | Gym | Yoga Studio | Pilates Studio | Boxing Gym | Martial Arts School | Office | Gymnastics Gym | Medical Center | Cycle Studio |
| 26 | Morningside Heights | Yoga Studio | Gym Pool | Gym | Medical Center | Park | College Gym | Gym / Fitness Center | Climbing Gym | Clothing Store | Club House |


#### 7.4 Cluster 4

In [21]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Hamilton Heights | Yoga Studio | Gym | College Gym | Gym / Fitness Center | Doctor's Office | Deli / Bodega | Cycle Studio | Community Center | Club House | Gymnastics Gym |


#### 7.5 Cluster 5

In [22]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Washington Heights | Gym | Gym / Fitness Center | Pilates Studio | Gymnastics Gym | Doctor's Office | Deli / Bodega | Cycle Studio | Community Center | College Gym | Yoga Studio |
| 11 | Roosevelt Island | Gym | Gym / Fitness Center | Yoga Studio | College Gym | Doctor's Office | Deli / Bodega | Cycle Studio | Community Center | Club House | Gymnastics Gym |
| 28 | Battery Park City | Gym | Gym / Fitness Center | Boxing Gym | Gym Pool | Bike Shop | Gymnastics Gym | Community Center | Doctor's Office | Deli / Bodega | Cycle Studio |
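
The tables above rank categories by relative frequency, which does not say how many boxing gyms each cluster actually contains. One way to compare clusters on that count is sketched below. The function name `boxing_gyms_per_cluster` and the toy data are hypothetical; in the notebook the inputs would presumably come from `manhattan_venues` and the `Cluster Labels` column of `manhattan_merged`.

```python
from collections import Counter


def boxing_gyms_per_cluster(venues, cluster_of, category="Boxing Gym"):
    """Count venues of the given category per cluster.

    venues: iterable of (neighborhood, venue_category) pairs
    cluster_of: dict mapping neighborhood name to cluster label
    Clusters with no matching venue are reported with a count of 0.
    """
    counts = Counter({label: 0 for label in set(cluster_of.values())})
    for neighborhood, venue_category in venues:
        if venue_category == category and neighborhood in cluster_of:
            counts[cluster_of[neighborhood]] += 1
    return dict(counts)


# Toy data, for illustration only (not taken from the Foursquare results)
venues = [
    ("A", "Boxing Gym"),
    ("A", "Gym"),
    ("B", "Yoga Studio"),
    ("C", "Boxing Gym"),
    ("C", "Boxing Gym"),
]
cluster_of = {"A": 0, "B": 1, "C": 0}

print(boxing_gyms_per_cluster(venues, cluster_of))
```

Clusters that report a count of zero are the ones with no existing boxing gym among the returned venues.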


## Results and Discussion

Our analysis shows that Manhattan has a large number of gyms, but boxing gyms are not very common. As each cluster shows, few places offer this service. Cluster 4 would be a good place to open a boxing gym because it has no gym of this type. Cluster 5 could also be a good choice, because Battery Park City is its only neighborhood with a boxing gym.

Of course, this does not imply that those zones are actually optimal locations for a new boxing gym. The purpose of this analysis was only to provide information on areas of Manhattan that are not crowded with existing boxing gyms. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually identify a location that not only has no nearby competition but also takes other factors into account and meets all other relevant conditions.

## Conclusion

The purpose of this project was to identify areas of Manhattan, close to the center, with a low number of boxing gyms, in order to aid the search for an optimal location for a new boxing gym. By identifying clusters, we can say that clusters 4 and 5 may be good choices for opening a new boxing gym.

The final decision on the optimal gym location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in each recommended zone, taking into consideration additional factors such as the attractiveness of each location, costs, and the social and economic dynamics of each neighborhood.