# Capstone Project - The Battle of Neighborhoods

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>
1. <a href="#item1">Introduction/Business Problem</a>    
2. <a href="#item2">Data/Methodology</a> 
3. <a href="#item2">Results/Discussion</a> 
4. <a href="#item5">Conclusion</a>     
</font>
</div>

## 1. Introduction/Business Problem 
**The purpose of this project is to find potential neighborhoods in the Bronx, NY, in which to open a new gym. First, I identify the neighborhoods that do not have a gym yet. Second, I find the 3 most common venue categories for each of these neighborhoods. Finally, I select potential neighborhoods based on these venues. The venue category of interest is public transportation.**

## 2. Data/Methodology

## 2.1. I used Foursquare location data to find potential neighborhoods in Bronx, NY that do not have a gym.

### Import necessary Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


#### Download and explore dataset (New York)

In [3]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


In [4]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [2]:
#newyork_data

In [5]:
neighborhoods_data = newyork_data['features']

In [168]:
#neighborhoods_data[0]

#### Transform the data into a *pandas* dataframe

In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [7]:
# collect one row per neighborhood; DataFrame.append was removed in pandas 2.0
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)
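
The loop above relies on the GeoJSON convention that a point's `coordinates` are ordered longitude first, then latitude. A minimal sketch on a single hand-written feature illustrates this. The helper name `feature_to_row` is hypothetical, the feature layout is assumed to match the entries in `newyork_data['features']`, and the values are those of the Wakefield row in the `bronx_data` preview.

```python
# Hypothetical single feature, assumed to be shaped like one entry
# of newyork_data['features'].
sample_feature = {
    "properties": {"borough": "Bronx", "name": "Wakefield"},
    "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]},
}

def feature_to_row(feature):
    """Flatten one GeoJSON point feature into a dict of dataframe columns."""
    # GeoJSON order is longitude first, latitude second.
    lon, lat = feature["geometry"]["coordinates"][:2]
    return {
        "Borough": feature["properties"]["borough"],
        "Neighborhood": feature["properties"]["name"],
        "Latitude": lat,
        "Longitude": lon,
    }

row = feature_to_row(sample_feature)
print(row)
```

Swapping the two indices would silently place every neighborhood at the wrong location, so a quick check like this on one known neighborhood can be worthwhile.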

#### Select the Borough Bronx

In [8]:
bronx_data = neighborhoods[neighborhoods['Borough'] == 'Bronx'].reset_index(drop=True)
bronx_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |


#### Define Foursquare Credentials and Version

In [115]:
#not shown

#### Find venues for each neighborhood

In [10]:
LIMIT = 100 
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)    
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'VenueLatitude', 
                  'VenueLongitude', 
                  'VenueCategory']
    
    return(nearby_venues)

In [11]:
bronx_venues = getNearbyVenues(names=bronx_data['Neighborhood'],
                                   latitudes=bronx_data['Latitude'],
                                   longitudes=bronx_data['Longitude']
                                  )
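
`getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` for any venue returned without a category. A minimal sketch of a more defensive row builder follows, assuming the item layout used above (`item['venue']['name']`, `['location']`, `['categories']`). The helper name `items_to_rows` and the `'Uncategorized'` placeholder are hypothetical, and the sample items are made up for illustration rather than taken from real API output.

```python
def items_to_rows(name, lat, lng, items, default_category="Uncategorized"):
    """Turn Foursquare-style 'items' into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder instead of failing on an empty list.
        category = categories[0]["name"] if categories else default_category
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Made-up items for illustration only (not real API output).
sample_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 40.0, "lng": -73.0},
               "categories": [{"name": "Gym"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 40.1, "lng": -73.1},
               "categories": []}},
]

rows = items_to_rows("Sample Neighborhood", 40.05, -73.05, sample_items)
for r in rows:
    print(r)
```

Keeping the parsing step separate from the HTTP request also makes it possible to test it without network access or credentials.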

In [12]:
#bronx_venues.head()

#### Print existing gyms and their neighborhoods, then select all venues from neighborhoods without a gym

In [13]:
bronx_nogym = []
print('Neighborhood:Gym')
for row,venuecategory in enumerate(bronx_venues.VenueCategory):
    if "Gym" in venuecategory:
        print(bronx_venues.loc[row,"Neighborhood"],":",bronx_venues.loc[row,"Venue"])
    else: 
        bronx_nogym.append(bronx_venues.loc[row,])
bronx_nogym=pd.DataFrame(bronx_nogym)
#bronx_nogym.head()

Neighborhood:Gym
Riverdale : Hayden On Hudson Gym
Baychester : Planet Fitness
Pelham Parkway : B-Well Studio
Fordham : Blink Fitness Fordham
Fordham : Blink Fitness
Fordham : Lucille Roberts
Fordham : Planet Fitness
Fordham : 24 Hour Fitness
Fordham : 24 Hour Fitness
High  Bridge : Retro Fitness
Melrose : Blink Fitness
Melrose : Blink Fitness St Ann's
Mott Haven : CrossFit SoBro (South Bronx)
Mott Haven : The Bronx Box
Parkchester : Blink Fitness Parkchester
Westchester Square : Star Fitness
Pelham Bay : iLoveKickboxing - Bronx
Pelham Bay : Planet Fitness
Pelham Bay : Thin Is In
Claremont Village : Retro Fitness
Mount Eden : Blink Fitness Mt. Eden
Mount Eden : Planet Fitness
Bronxdale : Bronx House Fitness


In [191]:
print('Bronx has {} neighborhoods without a gym.'.format(len(bronx_nogym["Neighborhood"].unique())))

Bronx has 52 neighborhoods without a gym.
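
Note that the loop above drops individual gym rows, so a neighborhood that has a gym can remain in `bronx_nogym` through its other venues (Claremont Village, for instance, is printed with Retro Fitness). If the intent is to exclude whole neighborhoods, one possible approach is to collect the names of neighborhoods with at least one gym and filter on them. The sketch below uses a small made-up dataframe with the same column names; `toy_venues`, `gym_neighborhoods`, and `nogym` are hypothetical names.

```python
import pandas as pd

# Made-up venues with the same column names as bronx_venues.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B", "B", "C"],
    "Venue": ["Fit Club", "Pizza One", "Deli Two", "Bus Stop", "Park Three"],
    "VenueCategory": ["Gym / Fitness Center", "Pizza Place", "Deli / Bodega",
                      "Bus Station", "Park"],
})

# Rows whose category mentions "Gym" (same substring test as the loop above).
is_gym = toy_venues["VenueCategory"].str.contains("Gym", regex=False)

# Neighborhoods that contain at least one gym venue.
gym_neighborhoods = sorted(toy_venues.loc[is_gym, "Neighborhood"].unique())

# Keep only venues from neighborhoods with no gym at all.
nogym = toy_venues[~toy_venues["Neighborhood"].isin(gym_neighborhoods)]
nogym = nogym.reset_index(drop=True)

print(sorted(nogym["Neighborhood"].unique()))
```

With this filter, neighborhood "A" disappears entirely, including its pizza place, whereas a row-level filter would keep it.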


## 2.2. I found the top 3 venues for each of the neighborhoods that do not have a gym yet.

In [14]:
# one hot encoding
bronx_nogym_onehot = pd.get_dummies(bronx_nogym[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bronx_nogym_onehot['Neighborhood'] = bronx_nogym['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bronx_nogym_onehot.columns[-1]] + list(bronx_nogym_onehot.columns[:-1])
bronx_nogym_onehot = bronx_nogym_onehot[fixed_columns]

#bronx_nogym_onehot.head()

In [15]:
bronx_nogym_grouped = bronx_nogym_onehot.groupby('Neighborhood').mean().reset_index()
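
Because each one-hot column is 1 when a venue belongs to that category and 0 otherwise, the per-neighborhood mean is the share of that neighborhood's venues in each category. A small illustration on made-up data follows (the `toy` frame and its values are hypothetical).

```python
import pandas as pd

# Made-up venues: neighborhood "A" has 4 venues, "B" has 2.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Pizza Place", "Pizza Place", "Bank", "Park",
                      "Bus Station", "Park"],
})

# One-hot encode the categories, then cast to float so the mean is numeric.
onehot = pd.get_dummies(toy["VenueCategory"]).astype(float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# Mean of 0/1 indicators = fraction of venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Here half of the venues in "A" are pizza places, so its `Pizza Place` frequency is 0.5.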

In [16]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
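
`return_most_common_venues` sorts one row of category frequencies in descending order and keeps the first `num_top_venues` labels. A minimal check on a made-up row follows. `toy_row` and its values are hypothetical, and `Series.nlargest` is shown only as an alternative way to get the same labels when there are no ties.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank the rest.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up frequencies for a single neighborhood.
toy_row = pd.Series({
    "Neighborhood": "A",
    "Bank": 0.10,
    "Bus Station": 0.40,
    "Park": 0.20,
    "Pizza Place": 0.30,
})

top3 = list(return_most_common_venues(toy_row, 3))
print(top3)

# Alternative: nlargest on the numeric part of the row.
alt = list(toy_row.iloc[1:].astype(float).nlargest(3).index)
print(alt)
```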

In [19]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = bronx_nogym_grouped['Neighborhood']

for ind in np.arange(bronx_nogym_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bronx_nogym_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|
| 0 | Allerton | Pizza Place | Supermarket | Spa |
| 1 | Baychester | Pizza Place | Bank | Discount Store |
| 2 | Bedford Park | Chinese Restaurant | Deli / Bodega | Diner |
| 3 | Belmont | Italian Restaurant | Pizza Place | Deli / Bodega |
| 4 | Bronxdale | Italian Restaurant | Chinese Restaurant | Breakfast Spot |
| 5 | Castle Hill | Latin American Restaurant | Pharmacy | Pizza Place |
| 6 | City Island | Harbor / Marina | Seafood Restaurant | Thrift / Vintage Store |
| 7 | Claremont Village | Bus Station | Pizza Place | Bakery |
| 8 | Clason Point | Park | Spa | Business Service |
| 9 | Co-op City | Bus Station | Discount Store | Ice Cream Shop |

## 2.3. Select potential neighborhoods: public transportation is the 1st most common venue

In [100]:
bronx_potential_gym = []

for index,row in neighborhoods_venues_sorted.iterrows():
    if "Station" in row["1st Most Common Venue"]:
        print(row["Neighborhood"],":",row["1st Most Common Venue"])
        bronx_potential_gym.append(row["Neighborhood"])       

Claremont Village : Bus Station
Co-op City : Bus Station
Morris Heights : Bus Station
Pelham Gardens : Bus Station
West Farms : Bus Station


In [101]:
bronx_potential_gym

['Claremont Village',
 'Co-op City',
 'Morris Heights',
 'Pelham Gardens',
 'West Farms']
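
The same selection can be written without an explicit loop. The sketch below uses `str.contains` on a small frame whose first three rows are copied from the table above; the "Example Heights" row is made up to show that a substring test on "Station" also matches unrelated categories. The names `toy_sorted`, `by_substring`, `by_category`, and the list of transit category names are assumptions for illustration.

```python
import pandas as pd

toy_sorted = pd.DataFrame({
    "Neighborhood": ["Allerton", "Claremont Village", "Co-op City",
                     "Example Heights"],  # last row is made up
    "1st Most Common Venue": ["Pizza Place", "Bus Station", "Bus Station",
                              "Gas Station"],
})

# Substring match, equivalent to the loop above.
mask = toy_sorted["1st Most Common Venue"].str.contains("Station", regex=False)
by_substring = toy_sorted.loc[mask, "Neighborhood"].tolist()
print(by_substring)

# Stricter alternative: an explicit list of transit categories (assumed names).
transit_categories = ["Bus Station", "Metro Station", "Train Station"]
exact = toy_sorted["1st Most Common Venue"].isin(transit_categories)
by_category = toy_sorted.loc[exact, "Neighborhood"].tolist()
print(by_category)
```

The explicit list is more verbose but avoids selecting a neighborhood only because its top category happens to contain the word "Station".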

In [107]:
# keep only the venues that belong to the selected neighborhoods
bronx_nogym_potential = bronx_nogym[bronx_nogym['Neighborhood'].isin(bronx_potential_gym)].reset_index(drop=True)

#### Create a map of the Bronx with the 5 potential neighborhoods superimposed on top.

In [109]:
address = 'Bronx, NY'

# Nominatim needs a user_agent; the name below is an arbitrary placeholder
geolocator = Nominatim(user_agent="bronx_gym_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the Bronx are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the Bronx are 40.85048545, -73.8404035580209.


In [113]:
# create map of the Bronx using latitude and longitude values
map_bronx_potential = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(bronx_nogym_potential["Neighborhood Latitude"], bronx_nogym_potential["Neighborhood Longitude"], bronx_nogym_potential["Neighborhood"]):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bronx_potential)  
    
map_bronx_potential
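
`bronx_nogym_potential` holds one row per venue, so the loop above draws a marker for every venue at the same neighborhood coordinates. A minimal sketch of reducing the frame to one row per neighborhood before plotting follows. The data is made up except for the Co-op City coordinates, which come from the `bronx_data` preview; `toy_potential` and `centers` are hypothetical names.

```python
import pandas as pd

# Made-up venue rows; "Example Heights" and its coordinates are invented.
toy_potential = pd.DataFrame({
    "Neighborhood": ["Co-op City", "Co-op City", "Example Heights"],
    "Neighborhood Latitude": [40.874294, 40.874294, 40.85],
    "Neighborhood Longitude": [-73.829939, -73.829939, -73.90],
    "Venue": ["Venue A", "Venue B", "Venue C"],
})

# One row per neighborhood: drop the venue columns, then the duplicates.
coord_cols = ["Neighborhood", "Neighborhood Latitude", "Neighborhood Longitude"]
centers = toy_potential[coord_cols].drop_duplicates().reset_index(drop=True)

# These triples are what the marker loop would iterate over.
for name, lat, lng in zip(centers["Neighborhood"],
                          centers["Neighborhood Latitude"],
                          centers["Neighborhood Longitude"]):
    print(name, lat, lng)
```

The rendered map looks the same either way, but the popup for each neighborhood is then backed by a single marker instead of a stack of identical ones.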

## 3. Results/Discussion

1. I found that the Bronx has **52** neighborhoods **without a gym** among the Foursquare venues returned within 500 m of each neighborhood's coordinates. A new gym in these neighborhoods would therefore face no local competition.
2. I identified the **3 most common venues** for each of the selected 52 neighborhoods. These venues were then used to narrow down the best location to open the gym.
3. To identify promising neighborhoods, I selected only those whose 1st most common venue contains "Station", that is, public transportation (bus or metro).
4. Based only on public transportation, I found 5 potential neighborhoods: **Claremont Village, Co-op City, Morris Heights, Pelham Gardens, and West Farms**. In each of them, Bus Station is the 1st most common venue.
5. I created a map of the Bronx showing the 5 potential locations for a new gym.

## 4. Conclusion

I identified **Claremont Village, Co-op City, Morris Heights, Pelham Gardens, and West Farms** as promising neighborhoods in which to open a new gym. These neighborhoods do not yet have a gym, which eliminates local competition. Bus stations are also their most common venue, which suggests frequent bus use. I used public transportation as the only favorable venue category; other categories should be considered to narrow the list down to the single neighborhood with the best potential for a successful gym.