# Capstone Project - The Battle of the Neighborhoods - Location to Open a Gym (Week 4 - Week 2)

### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a gym/fitness center. Specifically, this report is targeted at stakeholders interested in opening a **Gym / Fitness Center** in **Poughkeepsie, NY**, USA.

Gyms and fitness centers are important for the physical and mental well-being of people in all age groups. A gym needs to be convenient for members to travel to: they should not need to drive more than 7 miles to get there.

While there are not many fitness centers in Poughkeepsie, we will try to detect **locations where members do not need to drive more than 7 miles to the gym**. We are also particularly interested in **areas with no fitness centers in the vicinity**. We would also prefer locations **as close to the city center as possible**, assuming that the first two conditions are met.

We will use data science techniques to generate a few of the most promising locations based on these criteria. The advantages of each area will then be clearly expressed so that the best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>

Based on definition of our problem, factors that will influence our decision are:
* number of existing fitness centers in the four Poughkeepsie zipcodes
* number of and distance to fitness centers in the neighborhoods
* distance of neighborhood from city center within 7 miles

I will use zipcodes to define our Poughkeepsie neighborhoods.

The following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using the **Foursquare API**
* number of gym/fitness centers and their location in every neighborhood will be obtained using **Foursquare API**

A dataframe containing the 4 zipcode neighborhoods in Poughkeepsie is hardcoded using information looked up on Wikipedia and https://www.unitedstateszipcodes.org/. From this data, I created the dataframe for the 4 Poughkeepsie neighborhoods.


| | Zipcode | Town | Neighbourhood | Latitude | Longitude | Population |
|---|---|---|---|---|---|---|
| 0 | 12601 | Poughkeepsie | Western | 41.7004 | -73.923912 | 43398 |
| 1 | 12602 | Poughkeepsie | City of Poughkeepsie | 41.7004 | -73.921000 | 30398 |
| 2 | 12603 | Poughkeepsie | Arlington | 41.6959 | -73.896800 | 42810 |
| 3 | 12604 | Poughkeepsie | Vassar | 41.6862 | -73.897300 | 584 |




## Methodology <a name="methodology"></a>

In this project we will direct our efforts on detecting areas of Poughkeepsie that have low gym / fitness center density. We will limit our analysis to the area within ~7 miles of the city center.

In the first step we have collected the required **data: location and type (category) of every gym / fitness center within 7 miles of the Poughkeepsie center**. We have also **identified gyms / fitness centers** (according to Foursquare categorization).

In the second and final step we will focus on the most promising areas and within those identify **locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no existing gym within one mile**. We will present a map of all such locations, which should be a starting point for final 'street level' exploration and the search for an optimal venue location by stakeholders.

## Analysis <a name="analysis"></a>

In [64]:
!pip install bs4
!pip install shapely
!pip install pyproj



## Create a dataframe of the 4 neighbourhood locations in and around Poughkeepsie NY.  

### Data was entered manually from the Poughkeepsie Wikipedia page

In [65]:
import pandas as pd

#initialize a dataframe
Poughkeepsie_zipcodes = pd.DataFrame(
[['12601', 'Poughkeepsie', 'Western',41.7004, -73.923912,43398],
['12602', 'Poughkeepsie', 'City of Poughkeepsie',41.7004,-73.9210, 30398],
['12603', 'Poughkeepsie', 'Arlington',41.6959, -73.8968, 42810],
['12604', 'Poughkeepsie', 'Vassar',41.6862,-73.8973, 584]],
columns=['Zipcode','Town','Neighbourhood','Latitude','Longitude','Population'])

print(Poughkeepsie_zipcodes.shape)
Poughkeepsie_zipcodes 

(4, 6)


| | Zipcode | Town | Neighbourhood | Latitude | Longitude | Population |
|---|---|---|---|---|---|---|
| 0 | 12601 | Poughkeepsie | Western | 41.7004 | -73.923912 | 43398 |
| 1 | 12602 | Poughkeepsie | City of Poughkeepsie | 41.7004 | -73.921 | 30398 |
| 2 | 12603 | Poughkeepsie | Arlington | 41.6959 | -73.8968 | 42810 |
| 3 | 12604 | Poughkeepsie | Vassar | 41.6862 | -73.8973 | 584 |


## Map the 4 Neighbourhoods of Poughkeepsie, NY

In [66]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'Poughkeepsie, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Poughkeepsie are {}, {}.'.format(latitude, longitude))

import folium # map rendering library
# create map of Poughkeepsie using latitude and longitude values
map_Poughkeepsie = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, town, area in zip(Poughkeepsie_zipcodes['Latitude'], Poughkeepsie_zipcodes['Longitude'], Poughkeepsie_zipcodes['Town'], Poughkeepsie_zipcodes['Neighbourhood']):
    label = '{}, {}'.format(area, town)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Poughkeepsie)  
map_Poughkeepsie

The geographical coordinates of Poughkeepsie are 41.7065539, -73.9283672.


## Use Foursquare to find the locations of gyms / fitness centers around Poughkeepsie.

In [67]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 5000 # maximum number of venues requested from the Foursquare API

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe

### Use the City of Poughkeepsie as the center point of a "Gym / Fitness Center" search covering 7 miles in any direction.

In [68]:
locval=1
Poughkeepsie_zipcodes.loc[locval, 'Neighbourhood']
neighbourhood_latitude = Poughkeepsie_zipcodes.loc[locval, 'Latitude'] # neighborhood latitude value
neighbourhood_longitude = Poughkeepsie_zipcodes.loc[locval, 'Longitude'] # neighborhood longitude value

neighbourhood_name = Poughkeepsie_zipcodes.loc[locval, 'Neighbourhood'] # neighborhood name
print('Latitude and longitude values of {} are {}, {}.'.format(neighbourhood_name, 
                                                               neighbourhood_latitude, 
                                                               neighbourhood_longitude))
LIMIT = 5000 # limit of number of venues returned by Foursquare API

radius = 11265.4 # search radius in meters, which is about 7 miles

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighbourhood_latitude, 
    neighbourhood_longitude, 
    radius, 
    LIMIT)
url # display URL

results = requests.get(url).json()

Latitude and longitude values of City of Poughkeepsie are 41.7004, -73.921.
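
The radius in the request above is expressed in meters, so the 7-mile limit has to be converted first. The following is a minimal sketch of that conversion and of assembling the same query string with the standard library's `urlencode`. The helper names `miles_to_meters` and `build_explore_url` are illustrative and not part of the notebook, and the credentials are placeholders.

```python
from urllib.parse import urlencode

METERS_PER_MILE = 1609.344  # length of the international mile in meters


def miles_to_meters(miles):
    """Convert a distance in miles to meters."""
    return miles * METERS_PER_MILE


def build_explore_url(client_id, client_secret, version, lat, lng, radius_m, limit):
    """Assemble the venues/explore URL used above (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius_m,
        "limit": limit,
    }
    # urlencode percent-encodes the comma in "ll"; the server decodes it again.
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


radius_m = round(miles_to_meters(7), 1)  # 11265.4, the value used in the notebook
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                        41.7004, -73.921, radius_m, 5000)
```

Building the query from a dictionary keeps each parameter next to its name, which makes a unit mix-up in `radius` easier to spot than in a long positional `format` call.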


In [69]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## Create a table of the gyms within 7 miles of center of "City of Poughkeepsie".

In [70]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.address','venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

# select gym type facilities
nearby_venues.loc[(nearby_venues["categories"]=='Gym / Fitness Center') | (nearby_venues["categories"]=='Gym')]






| | name | categories | address | lat | lng |
|---|---|---|---|---|---|
| 56 | Planet Fitness | Gym / Fitness Center | 3675 Albany Post Rd | 41.740156 | -73.930774 |
| 60 | Crunch - Poughkeepsie | Gym / Fitness Center | 2 Neptune Road | 41.644061 | -73.924041 |
| 75 | Gold's Gym | Gym | 258 Titusville Rd | 41.657765 | -73.855858 |
| 93 | Mike Arteaga's Health and Fitness | Gym / Fitness Center | 3425 US Highway 9W | 41.73919 | -73.963842 |


In [71]:
existing_gyms=nearby_venues.loc[(nearby_venues["categories"]=='Gym / Fitness Center') | (nearby_venues["categories"]=='Gym')]
existing_gyms

| | name | categories | address | lat | lng |
|---|---|---|---|---|---|
| 56 | Planet Fitness | Gym / Fitness Center | 3675 Albany Post Rd | 41.740156 | -73.930774 |
| 60 | Crunch - Poughkeepsie | Gym / Fitness Center | 2 Neptune Road | 41.644061 | -73.924041 |
| 75 | Gold's Gym | Gym | 258 Titusville Rd | 41.657765 | -73.855858 |
| 93 | Mike Arteaga's Health and Fitness | Gym / Fitness Center | 3425 US Highway 9W | 41.73919 | -73.963842 |
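
The same selection can also be made directly on the raw list of items, before it is flattened into a dataframe. This is a sketch under the assumption that each item has the `venue` → `categories` → `name` layout that `get_category_type` relies on. The names `first_category` and `select_gyms` and the sample data are hypothetical.

```python
GYM_CATEGORIES = {"Gym / Fitness Center", "Gym"}


def first_category(item):
    """Return the name of the first category of a venue item, or None."""
    categories = item.get("venue", {}).get("categories", [])
    return categories[0]["name"] if categories else None


def select_gyms(items):
    """Keep only the items whose first category is one of the gym categories."""
    return [item for item in items if first_category(item) in GYM_CATEGORIES]


# Made-up sample items for illustration only (not real API output).
sample_items = [
    {"venue": {"name": "Example Gym", "categories": [{"name": "Gym"}]}},
    {"venue": {"name": "Example Cafe", "categories": [{"name": "Café"}]}},
    {"venue": {"name": "No Category", "categories": []}},
]
gym_names = [item["venue"]["name"] for item in select_gyms(sample_items)]
```

Keeping the accepted category names in one set means the filter condition does not have to be repeated each time the gym subset is needed.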


## Map the location of the 4 gyms of Poughkeepsie.

### Markers in green are the existing gyms in and around Poughkeepsie.
### Markers in blue are the 4 neighborhoods/zipcodes of Poughkeepsie

In [72]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'Poughkeepsie, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Poughkeepsie are {}, {}.'.format(latitude, longitude))

import folium # map rendering library
# create map of Poughkeepsie using latitude and longitude values
map_Poughkeepsie_gyms = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, town, area in zip(Poughkeepsie_zipcodes['Latitude'], Poughkeepsie_zipcodes['Longitude'], Poughkeepsie_zipcodes['Town'], Poughkeepsie_zipcodes['Neighbourhood']):
    label = '{}, {}'.format(area, town)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Poughkeepsie_gyms) 
    
# add markers to map
for lat, lng, town, area in zip(existing_gyms['lat'], existing_gyms['lng'], existing_gyms['name'], 
                                existing_gyms['address']):
    label = '{}, {}'.format(town,area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Poughkeepsie_gyms)  
map_Poughkeepsie_gyms

The geographical coordinates of Poughkeepsie are 41.7065539, -73.9283672.


### Compute the distance from the "City of Poughkeepsie" center to the existing gym / fitness centers

In [73]:
from math import radians, cos, sin, asin, sqrt
def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance in kilometers between two points 
    on the earth (specified in decimal degrees).
    Note the argument order: longitude first, then latitude, for each point.
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    # Radius of earth in kilometers is 6371
    km = 6371* c
    return km
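
Because `haversine` takes longitude before latitude, it is easy to pass coordinates in the wrong order. The sketch below shows one way to guard against that: a thin wrapper that accepts `(latitude, longitude)` pairs, plus a sanity check based on the fact that one degree of latitude is about 111.19 km on a sphere of radius 6371 km. The name `distance_miles` is an assumption and not part of the notebook.

```python
from math import radians, cos, sin, asin, sqrt

KM_PER_MILE = 1.609  # factor used in this notebook (1.609344 is the exact value)


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km; arguments are in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))


def distance_miles(point_a, point_b):
    """Distance in miles between two (latitude, longitude) pairs."""
    (lat1, lon1), (lat2, lon2) = point_a, point_b
    return haversine(lon1, lat1, lon2, lat2) / KM_PER_MILE


# Sanity check: one degree of latitude along a meridian near Poughkeepsie.
one_degree_km = haversine(-73.921, 41.0, -73.921, 42.0)

# City of Poughkeepsie center to Planet Fitness, both as (latitude, longitude).
center_to_planet_fitness = distance_miles((41.7004, -73.921), (41.740156, -73.930774))
```

If latitude and longitude are swapped at these coordinates, the same check returns a value far below 111 km, which makes the mistake visible immediately.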

In [74]:
# haversine() expects (lon1, lat1, lon2, lat2); the origin is the "City of Poughkeepsie" center (41.7004, -73.921).
print("City of Poughkeepsie to Planet Fitness is {:.2f} miles.".format(haversine(-73.921, 41.7004, -73.930774, 41.740156)/1.609))
print("City of Poughkeepsie to Crunch Fitness is {:.2f} miles.".format(haversine(-73.921, 41.7004, -73.924041, 41.64406)/1.609))
print("City of Poughkeepsie to Mike Arteaga's Health Fitness Center is {:.2f} miles.".format(haversine(-73.921, 41.7004, -73.963842, 41.739190)/1.609))
print("City of Poughkeepsie to Gold's Gym is {:.2f} miles.".format(haversine(-73.921, 41.7004, -73.855858, 41.657765)/1.609))
print(" ")

City of Poughkeepsie to Planet Fitness is 2.79 miles.
City of Poughkeepsie to Crunch Fitness is 3.90 miles.
City of Poughkeepsie to Mike Arteaga's Health Fitness Center is 3.47 miles.
City of Poughkeepsie to Gold's Gym is 4.47 miles.
 


### Calculate the distance of the candidate gym locations from the "City of Poughkeepsie" center to ensure candidate gym is located within Poughkeepsie limits.  No existing gym should be within 1 mile of a candidate gym location.

In [75]:
# Calculate the distance between the "City of Poughkeepsie" center and the candidate areas.
# haversine() expects (lon1, lat1, lon2, lat2).
print("City of Poughkeepsie to Candidate - Arlington Adams is {:.2f} miles.".format(haversine(-73.921, 41.7004, -73.877971, 41.700432)/1.609))
print("City of Poughkeepsie to Candidate - Corlies is {:.2f} miles.".format(haversine(-73.921, 41.7004, -73.905860, 41.707070)/1.609))
print("City of Poughkeepsie to Candidate - Dutchess Turnpike is {:.2f} miles.".format(haversine(-73.921, 41.7004, -73.8862521, 41.7004962)/1.609))
print("City of Poughkeepsie to Candidate - Vassar College is {:.2f} miles.".format(haversine(-73.921, 41.7004, -73.8973, 41.6862)/1.609))


City of Poughkeepsie to Candidate - Arlington Adams is 2.22 miles.
City of Poughkeepsie to Candidate - Corlies is 0.91 miles.
City of Poughkeepsie to Candidate - Dutchess Turnpike is 1.79 miles.
City of Poughkeepsie to Candidate - Vassar College is 1.57 miles.


### Calculate the distance between existing gym locations and candidate gym locations to ensure candidates are appropriately spaced from existing gyms.

In [76]:
# Calculate the distance between existing gym locations and candidate areas.
# haversine() expects (lon1, lat1, lon2, lat2).
print("Candidate - Arlington Adams to Planet Fitness is {:.2f} miles.".format(haversine(-73.877971, 41.700432, -73.930774, 41.740156)/1.609))
print("Candidate - Arlington Adams to Crunch Fitness is {:.2f} miles.".format(haversine(-73.877971, 41.700432, -73.924041, 41.64406)/1.609))
print("Candidate - Arlington Adams to Mike Arteaga's Health Fitness Center is {:.2f} miles.".format(haversine(-73.877971, 41.700432, -73.963842, 41.739190)/1.609))
print("Candidate - Arlington Adams to Gold's Gym is {:.2f} miles.".format(haversine(-73.877971, 41.700432, -73.855858, 41.657765)/1.609))
print(" ")
print("Candidate - Corlies to Planet Fitness is {:.2f} miles.".format(haversine(-73.905860, 41.707070, -73.930774, 41.740156)/1.609))
print("Candidate - Corlies to Crunch Fitness is {:.2f} miles.".format(haversine(-73.905860, 41.707070, -73.924041, 41.64406)/1.609))
print("Candidate - Corlies to Mike Arteaga's Health Fitness Center is {:.2f} miles.".format(haversine(-73.905860, 41.707070, -73.963842, 41.739190)/1.609))
print("Candidate - Corlies to Gold's Gym is {:.2f} miles.".format(haversine(-73.905860, 41.707070, -73.855858, 41.657765)/1.609))
print(" ")
print("Candidate - Dutchess Turnpike to Planet Fitness is {:.2f} miles.".format(haversine(-73.886252, 41.700496, -73.930774, 41.740156)/1.609))
print("Candidate - Dutchess Turnpike to Crunch Fitness is {:.2f} miles.".format(haversine(-73.886252, 41.700496, -73.924041, 41.64406)/1.609))
print("Candidate - Dutchess Turnpike to Mike Arteaga's Health Fitness Center is {:.2f} miles.".format(haversine(-73.886252, 41.700496, -73.963842, 41.739190)/1.609))
print("Candidate - Dutchess Turnpike to Gold's Gym is {:.2f} miles.".format(haversine(-73.886252, 41.700496, -73.855858, 41.657765)/1.609))
print(" ")
print("Candidate - Vassar College to Planet Fitness is {:.2f} miles.".format(haversine(-73.8973, 41.6862, -73.930774, 41.740156)/1.609))
print("Candidate - Vassar College to Crunch Fitness is {:.2f} miles.".format(haversine(-73.8973, 41.6862, -73.924041, 41.64406)/1.609))
print("Candidate - Vassar College to Mike Arteaga's Health Fitness Center is {:.2f} miles.".format(haversine(-73.8973, 41.6862, -73.963842, 41.739190)/1.609))
print("Candidate - Vassar College to Gold's Gym is {:.2f} miles.".format(haversine(-73.8973, 41.6862, -73.855858, 41.657765)/1.609))

Candidate - Arlington Adams to Planet Fitness is 3.87 miles.
Candidate - Arlington Adams to Crunch Fitness is 4.56 miles.
Candidate - Arlington Adams to Mike Arteaga's Health Fitness Center is 5.18 miles.
Candidate - Arlington Adams to Gold's Gym is 3.16 miles.
 
Candidate - Corlies to Planet Fitness is 2.62 miles.
Candidate - Corlies to Crunch Fitness is 4.45 miles.
Candidate - Corlies to Mike Arteaga's Health Fitness Center is 3.72 miles.
Candidate - Corlies to Gold's Gym is 4.27 miles.
 
Candidate - Dutchess Turnpike to Planet Fitness is 3.58 miles.
Candidate - Dutchess Turnpike to Crunch Fitness is 4.36 miles.
Candidate - Dutchess Turnpike to Mike Arteaga's Health Fitness Center is 4.81 miles.
Candidate - Dutchess Turnpike to Gold's Gym is 3.34 miles.
 
Candidate - Vassar College to Planet Fitness is 4.11 miles.
Candidate - Vassar College to Crunch Fitness is 3.22 miles.
Candidate - Vassar College to Mike Arteaga's Health Fitness Center is 5.02 miles.
Candidate - Vassar College to Gold's Gym is 2.90 miles.

### Identify a few candidate areas in and around Poughkeepsie for gym / fitness center locations.  Candidate areas are areas with major route access, far from other gym fitness center locations, within 7 miles of City of Poughkeepsie limits.

#### By visually looking at the map of existing gym locations, there are 4 candidate locations.


In [77]:
#initialize a dataframe
candidate_areas = pd.DataFrame(
[['12603', 'Poughkeepsie', 'Candidate - Arlington Adams',41.700432, -73.877971],
['12603', 'Poughkeepsie', 'Candidate - Corlies',41.707070, -73.905860],
['12603', 'Poughkeepsie', 'Candidate - Dutchess Turnpike',41.7004962, -73.8862521],
['12604', 'Poughkeepsie', 'Candidate - Vassar College',41.6862,-73.8973]],
columns=['Zipcode','Town','address','lat','lng'])

print(candidate_areas)

  Zipcode          Town                        address        lat        lng
0   12603  Poughkeepsie    Candidate - Arlington Adams  41.700432 -73.877971
1   12603  Poughkeepsie            Candidate - Corlies  41.707070 -73.905860
2   12603  Poughkeepsie  Candidate - Dutchess Turnpike  41.700496 -73.886252
3   12604  Poughkeepsie     Candidate - Vassar College  41.686200 -73.897300
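
The two location requirements (within 7 miles of the city center, and no existing gym within 1 mile) can be checked for all candidates at once. The following is a sketch rather than the notebook's own method: it copies the coordinates listed above into plain dictionaries as `(latitude, longitude)` pairs, and the names `miles_between` and `check_candidate` are illustrative.

```python
from math import radians, cos, sin, asin, sqrt

CITY_CENTER = (41.7004, -73.921)  # "City of Poughkeepsie" (latitude, longitude)
GYMS = {
    "Planet Fitness": (41.740156, -73.930774),
    "Crunch - Poughkeepsie": (41.644061, -73.924041),
    "Gold's Gym": (41.657765, -73.855858),
    "Mike Arteaga's Health and Fitness": (41.73919, -73.963842),
}
CANDIDATES = {
    "Arlington Adams": (41.700432, -73.877971),
    "Corlies": (41.707070, -73.905860),
    "Dutchess Turnpike": (41.7004962, -73.8862521),
    "Vassar College": (41.6862, -73.8973),
}


def miles_between(a, b):
    """Haversine distance in miles between two (latitude, longitude) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, [a[0], a[1], b[0], b[1]])
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(h)) / 1.609


def check_candidate(location, max_from_center=7.0, min_from_gym=1.0):
    """Return (miles to center, nearest gym name, miles to that gym, passes both rules)."""
    to_center = miles_between(CITY_CENTER, location)
    nearest = min(GYMS, key=lambda name: miles_between(location, GYMS[name]))
    to_nearest = miles_between(location, GYMS[nearest])
    ok = to_center <= max_from_center and to_nearest >= min_from_gym
    return to_center, nearest, to_nearest, ok


results = {name: check_candidate(loc) for name, loc in CANDIDATES.items()}
for name, (to_center, nearest, to_nearest, ok) in results.items():
    print("{}: {:.2f} mi from center, nearest gym {} at {:.2f} mi, ok={}".format(
        name, to_center, nearest, to_nearest, ok))
```

Only the nearest gym matters for the 1-mile rule, so the check reports that single distance per candidate instead of every candidate/gym pair.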


### Map the 4 candidate areas in relation to the 4 Poughkeepsie neighborhoods and the 4 existing gym / fitness centers.

#### red marker - candidate gym location, blue - neighborhood, green - existing gym

In [78]:
map_Poughkeepsie_candidate_gyms = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, town, area in zip(Poughkeepsie_zipcodes['Latitude'], Poughkeepsie_zipcodes['Longitude'], Poughkeepsie_zipcodes['Town'], Poughkeepsie_zipcodes['Neighbourhood']):
    label = '{}, {}'.format(area, town)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Poughkeepsie_candidate_gyms) 
    
# add markers to map
for lat, lng, town, area in zip(existing_gyms['lat'], existing_gyms['lng'], existing_gyms['name'], 
                                existing_gyms['address']):
    label = '{}, {}'.format(town,area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='green',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Poughkeepsie_candidate_gyms)  
    
# add markers to map
for lat, lng, town, area in zip(candidate_areas['lat'], candidate_areas['lng'], candidate_areas['address'], 
                                candidate_areas['Town']):
    label = '{}, {}'.format(town,area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='red',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Poughkeepsie_candidate_gyms) 
map_Poughkeepsie_candidate_gyms

## Conclusion <a name="conclusion"></a>

#### The purpose of this project was to identify areas of Poughkeepsie close to the city center with a low number of gyms / fitness centers, in order to aid stakeholders in narrowing down the search for an optimal location for a new gym / fitness center.  Any of the 4 candidate locations would be a suitable choice.  

#### The optimal location for a gym would be the "Dutchess Turnpike" location.  This location has multiple state-managed roads that provide quick and convenient access.  There are also many large empty retail spaces in this area with existing parking lots.

#### The second-best location for a gym would be "Arlington Adams".  This location is the furthest east and on the fringe of Poughkeepsie.  The next town over has no gyms in the vicinity.  This location is also good because it has one major state-managed road.  The area also has a lot of new business and housing development.  Costs may be higher for this area because there are no existing empty buildings, so a building would need to be constructed.

#### The third-best location would be "Vassar College".  While it is a great area, it is already congested with traffic, which would frustrate potential customers.  

#### The least preferred location would be "Corlies".  While it is a good area, it is closer to the city center, in a lower-income neighborhood.

#### The final decision on the optimal gym / fitness center location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended area, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.