# Capstone Project - The Battle of Neighborhoods

## 1. Introduction/Business Problem section

#### A short description of the business problem and the target audience interested

My business partner and I plan to launch a startup offering indoor rock climbing facilities in Singapore. From a list of locations provided by our property consultant, we have shortlisted 5 locations based on their rental prices.

Besides rental prices, we would like to use the characteristics of our competitors' locations (e.g. proximity to public transport and food outlets) as a reference when choosing our venue.

My partner has asked me to find out how many similar rock climbing facilities there are in Singapore and where they are located. Using the Foursquare location API covered in this course, I will identify the characteristics of the venues surrounding our competitors' facilities (e.g. public transport facilities and food outlets). We will then compare our shortlisted venues against these characteristics to make our decision.

## 2. Data section

#### Description of the data and its sources that will be used to solve the problem

**2.1 Data on existing rock climbing facilities in Singapore and their location**

To identify the characteristics of our competitors' venues in Singapore, I first need to find out how many rock climbing facilities there currently are in Singapore and where they are located. I found a list of popular rock climbing facilities in Singapore on the website __[thehoneycombers.com](https://thehoneycombers.com/singapore/rock-climbing-in-singapore-conquer-this-extreme-sport-at-these-best-indoor-climbing-walls-and-bouldering-gyms/)__ and obtained their addresses and postal codes.

Next, I will use the Google Maps API to find their geographic coordinates from their postal codes.

In [98]:
!pip install beautifulsoup4
!pip install lxml
!pip install html5lib
!pip install requests
!pip install geocoder

from bs4 import BeautifulSoup
import requests
import csv
import json
import xml
import pandas as pd

print("Installed successfully!")

Installed successfully!


In [99]:
# Fetch the web page that lists the competitors' names and addresses
res = requests.get("https://thehoneycombers.com/singapore/rock-climbing-in-singapore-conquer-this-extreme-sport-at-these-best-indoor-climbing-walls-and-bouldering-gyms/")
res.raise_for_status()  # stop early if the page could not be retrieved
df_rock_climbing = BeautifulSoup(res.content,'lxml')


In [100]:
col_names = ['Name','Address']
pd.set_option('display.max_colwidth', 0)

name = None
address = None
rows = []

# Extract the competitors' names and addresses from the article body
content = df_rock_climbing.find('div', class_='entry-content')

for para in content.find_all('p'):
    # Names are in bold tags; a paragraph without one keeps the previous name
    for outlets in para.find_all(['b','strong']):
        name = outlets.text.strip('\n').replace('.','')

    # Addresses are in italic tags; a paragraph without one keeps the previous address
    for add in para.find_all(['i','em']):
        address = add.text.strip('\n').replace(',','')

    rows.append({'Name': name, 'Address': address})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
df_climbing = pd.DataFrame(rows, columns=col_names)

# Keep only the rows that hold gym listings, then drop duplicated addresses
df_climbing = df_climbing[3:15].drop_duplicates(subset=['Address'], keep='first').reset_index(drop=True)

df_climbing.head(15)

| | Name | Address |
|---|---|---|
| 0 | Gorilla Climbing Gym | #01-02 Viva Business Park 750B Chai Chee Road Singapore 469002 p. 6243 0386 |
| 1 | Let ‘em Play | #01-100 18 Boon Lay Way Singapore 609966 p. 6266 6125 |
| 2 | Oyeyo Boulder Home | 148 Mackenzie Road Singapore 228724 |
| 3 | Boulder Movement | #B1-03 OUE Downtown Gallery 6A Shenton Way Singapore 068815 |
| 4 | Climbers Laboratory | #05-153 Enterprise Hub 48 Toh Guan Road East Singapore 608586 p. 6515 9363 |
| 5 | Onsight Climbing | 100 Guillemard Road Singapore 399718 p. 6348 8272 |
| 6 | Climb Central | #B1-01 Kallang Wave Mall 1 Stadium Place Singapore 397718 p. 6702 7972 |
| 7 | Kinetics Climbing | #02-07 Orion @ Paya Lebar 160 Paya Lebar Road Singapore 409022 p. 6745 6426 |
| 8 | Climb Asia | 60 Tessensohn Road Singapore 217664 p. 6292 7701 |
| 9 | The Rock School | 850 New Upper Changi Road Singapore 467352 p. 6242 2106 |


In [101]:
# Create a new Postalcode field from the first six-digit number in the Address field
df_climbing['Postalcode'] = df_climbing['Address'].str.extract(r'(\d{6})', expand=False)
df_climbing.head()

| | Name | Address | Postalcode |
|---|---|---|---|
| 0 | Gorilla Climbing Gym | #01-02 Viva Business Park 750B Chai Chee Road Singapore 469002 p. 6243 0386 | 469002 |
| 1 | Let ‘em Play | #01-100 18 Boon Lay Way Singapore 609966 p. 6266 6125 | 609966 |
| 2 | Oyeyo Boulder Home | 148 Mackenzie Road Singapore 228724 | 228724 |
| 3 | Boulder Movement | #B1-03 OUE Downtown Gallery 6A Shenton Way Singapore 068815 | 068815 |
| 4 | Climbers Laboratory | #05-153 Enterprise Hub 48 Toh Guan Road East Singapore 608586 p. 6515 9363 | 608586 |
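
The extraction step can also be sketched with the standard library alone. Some Singapore postal codes, such as 068815 for Boulder Movement, start with a zero, so the code is best kept as a string rather than converted to a number. The helper name `extract_postal_code` is hypothetical, and the pattern assumes the postal code is the first run of exactly six digits in the address.

```python
import re

# Exactly six consecutive digits, not part of a longer run of digits
POSTAL_CODE = re.compile(r'(?<!\d)\d{6}(?!\d)')

def extract_postal_code(address):
    """Return the first six-digit postal code in the address as a string, or None."""
    match = POSTAL_CODE.search(address)
    return match.group(0) if match else None

addresses = [
    '#B1-03 OUE Downtown Gallery 6A Shenton Way Singapore 068815',
    '#01-02 Viva Business Park 750B Chai Chee Road Singapore 469002 p. 6243 0386',
    '148 Mackenzie Road Singapore 228724',
]
postal_codes = [extract_postal_code(a) for a in addresses]
print(postal_codes)  # ['068815', '469002', '228724']
```

The phone numbers in the addresses are written as two groups of four digits, so they do not match the six-digit pattern.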


In [102]:
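# @hidden_cell
# Read the key from an environment variable (the variable name is an assumption) rather than hard-coding it
import os
GOOGLE_API_KEY = os.environ['GOOGLE_API_KEY']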

In [103]:
# Obtain the latitude and longitude of each competitor's venue using geocoder and the Google API
import time
import geocoder

# Look up [latitude, longitude] for a postal code or address, retrying a limited number of times
def get_latlng(postal_code, max_retries=5):
    for attempt in range(max_retries):
        g = geocoder.google('{}, Singapore'.format(postal_code), key=GOOGLE_API_KEY)
        if g.latlng:
            return g.latlng
        time.sleep(1)  # brief pause before retrying
    raise RuntimeError('No coordinates returned for {}'.format(postal_code))

# Insert new columns of latitude and longitude into the dataframe
coords = [get_latlng(postal_code) for postal_code in df_climbing['Postalcode'].tolist()]

df_climbing_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df_climbing['Latitude'] = df_climbing_coords['Latitude']
df_climbing['Longitude'] = df_climbing_coords['Longitude']
df_climbing.head(15)

| | Name | Address | Postalcode | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Gorilla Climbing Gym | #01-02 Viva Business Park 750B Chai Chee Road Singapore 469002 p. 6243 0386 | 469002 | 1.323535 | 103.920741 |
| 1 | Let ‘em Play | #01-100 18 Boon Lay Way Singapore 609966 p. 6266 6125 | 609966 | 1.328234 | 103.753597 |
| 2 | Oyeyo Boulder Home | 148 Mackenzie Road Singapore 228724 | 228724 | 1.306886 | 103.846657 |
| 3 | Boulder Movement | #B1-03 OUE Downtown Gallery 6A Shenton Way Singapore 068815 | 068815 | 1.352083 | 103.819836 |
| 4 | Climbers Laboratory | #05-153 Enterprise Hub 48 Toh Guan Road East Singapore 608586 p. 6515 9363 | 608586 | 1.337078 | 103.754725 |
| 5 | Onsight Climbing | 100 Guillemard Road Singapore 399718 p. 6348 8272 | 399718 | 1.310077 | 103.881833 |
| 6 | Climb Central | #B1-01 Kallang Wave Mall 1 Stadium Place Singapore 397718 p. 6702 7972 | 397718 | 1.303242 | 103.874757 |
| 7 | Kinetics Climbing | #02-07 Orion @ Paya Lebar 160 Paya Lebar Road Singapore 409022 p. 6745 6426 | 409022 | 1.32932 | 103.890342 |
| 8 | Climb Asia | 60 Tessensohn Road Singapore 217664 p. 6292 7701 | 217664 | 1.315722 | 103.856214 |
| 9 | The Rock School | 850 New Upper Changi Road Singapore 467352 p. 6242 2106 | 467352 | 1.324349 | 103.935899 |
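
Geocoded coordinates are worth a quick sanity check before they are used in later queries. The sketch below flags rows that fall outside a rough bounding box for Singapore, or that match the coordinates listed above for Boulder Movement (1.352083, 103.819836). Those coordinates appear to be the generic point a geocoder returns for the bare query "Singapore", but that reading is an assumption, as are the bounding box values and the function name `suspicious_geocodes`.

```python
# Approximate bounding box for Singapore (assumed values, for illustration only)
LAT_RANGE = (1.15, 1.48)
LNG_RANGE = (103.59, 104.10)

# Assumed to be the generic result returned for the bare query "Singapore"
FALLBACK = (1.352083, 103.819836)

def suspicious_geocodes(rows, tolerance=1e-6):
    """Return the names of rows whose coordinates look like a failed lookup."""
    flagged = []
    for name, lat, lng in rows:
        outside = not (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
                       and LNG_RANGE[0] <= lng <= LNG_RANGE[1])
        is_fallback = (abs(lat - FALLBACK[0]) < tolerance
                       and abs(lng - FALLBACK[1]) < tolerance)
        if outside or is_fallback:
            flagged.append(name)
    return flagged

# Three rows copied from the table above
rows = [
    ('Gorilla Climbing Gym', 1.323535, 103.920741),
    ('Boulder Movement', 1.352083, 103.819836),
    ('Climb Central', 1.303242, 103.874757),
]
flagged = suspicious_geocodes(rows)
print(flagged)  # ['Boulder Movement']
```

A flagged row can then be geocoded again using its full street address instead of the postal code alone.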


**2.2 Data on geographic coordinates of the 5 shortlisted locations**

We have also prepared a dataset of the addresses of the 5 locations we have shortlisted. Using the Google Maps API, we extracted their geographic coordinates from these addresses.

In [104]:
# To list the 5 locations we have shortlisted
shortlisted_locations = [
    { 'Location': 'L1', 'Address': 'The Grandstand, 200 Turf Club Rd','Postalcode': '287994'},
    { 'Location': 'L2', 'Address': 'IMM, 2 Jurong East Street 21','Postalcode': '609601'},
    { 'Location': 'L3', 'Address': 'KINEX, 11 Tanjong Katong Rd','Postalcode': '437157'},
    { 'Location': 'L4', 'Address': 'The Star Vista, 1 Vista Exchange Green','Postalcode': '138617'}, 
    { 'Location': 'L5', 'Address': 'Causeway Point, 1 Woodlands Square','Postalcode': '738099'},
]

In [105]:
# Extract latitude and longitude details and create a dataframe for the 5 shortlisted locations
for loc in shortlisted_locations:        
    (lat, lng) = get_latlng(loc['Address'])
    loc['Latitude'] = lat
    loc['Longitude'] = lng
    
df_shortlisted_locations = pd.DataFrame(shortlisted_locations, columns=['Location', 'Address', 'Latitude', 'Longitude'])
df_shortlisted_locations

| | Location | Address | Latitude | Longitude |
|---|---|---|---|---|
| 0 | L1 | The Grandstand, 200 Turf Club Rd | 1.337892 | 103.793338 |
| 1 | L2 | IMM, 2 Jurong East Street 21 | 1.334816 | 103.746834 |
| 2 | L3 | KINEX, 11 Tanjong Katong Rd | 1.31485 | 103.894613 |
| 3 | L4 | The Star Vista, 1 Vista Exchange Green | 1.306763 | 103.788424 |
| 4 | L5 | Causeway Point, 1 Woodlands Square | 1.435984 | 103.786013 |


**2.3 Data on transportation, food and other amenities around competitors' venue**

We expect our competitors' venues to be located in areas with good access to public transportation (train stations and bus stops), food outlets (fast-food chains, cafes and restaurants) and retail shops. We will use the Foursquare API to find the services around each competitor's venue for comparison. The search radius is set at 350 meters, which is about a 5-minute walk.
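
As a rough check on that radius, 350 meters in 5 minutes works out to 70 meters per minute, or about 4.2 km/h. The sketch below works through the arithmetic and adds a great-circle (haversine) distance helper for comparing two coordinate pairs. The helper name `haversine_m` and the constant names are illustrative, and the two points are the coordinates listed earlier for Kinetics Climbing and shortlisted location L3.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres
RADIUS_M = 350            # search radius used in this section
WALK_MINUTES = 5          # walking time assumed for that radius

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Walking speed implied by "350 m in about 5 minutes"
speed_m_per_min = RADIUS_M / WALK_MINUTES   # 70 metres per minute
speed_kmh = speed_m_per_min * 60 / 1000     # about 4.2 km/h

# Distance from Kinetics Climbing to shortlisted location L3 (KINEX)
distance_m = haversine_m(1.32932, 103.890342, 1.31485, 103.894613)
walk_minutes = distance_m / speed_m_per_min

print('Implied walking speed: {:.1f} km/h'.format(speed_kmh))
print('Distance: {:.0f} m, about {:.0f} minutes on foot'.format(distance_m, walk_minutes))
```

This measures straight-line distance, so actual walking routes will usually be somewhat longer.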

In [117]:
# Types of services that we would like to examine
services_categories = {
    'Food': '4d4b7105d754a06374d81259',
    'Shop & Service': '4d4b7105d754a06378d81259',
    'Bus Stop': '52f2ab2ebcbc57f1066b8b4f',
    'Metro Station': '4bf58dd8d48988d1fd931735',
}

In [107]:
# List the service categories that we would like to examine
', '.join(services_categories)

'Food, Shop & Service, Bus Stop, Metro Station'

In [149]:
# Use one of the competitors as an example to explore the services within 350m
climb_venue = df_climbing.loc[7]
print(climb_venue)

Name          Kinetics Climbing                                                           
Address        #02-07 Orion @ Paya Lebar 160 Paya Lebar Road Singapore 409022 p. 6745 6426
Postalcode    409022                                                                      
Latitude      1.32932                                                                     
Longitude     103.89                                                                      
Name: 7, dtype: object


In [109]:
# Install FourSquare client library
!pip install foursquare



In [110]:
import foursquare
# json_normalize is available at the top level of pandas from version 1.0
from pandas import json_normalize

In [111]:
# Read the credentials from environment variables (the variable names are an assumption) rather than hard-coding them
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20180605' # Foursquare API version

In [112]:
fs = foursquare.Foursquare(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)

In [138]:
# Define a function to search for nearby venues and return the result as a dataframe
RADIUS = 350  # search radius in metres

def venues_nearby(latitude, longitude, category, verbose=True):
    # 'query' matches the category label against venue names, in addition to the categoryId filter
    results = fs.venues.search(
        params = {
            'query': category,
            'll': '{},{}'.format(latitude, longitude),
            'radius': RADIUS,
            'categoryId': services_categories[category]
        }
    )
    df = json_normalize(results['venues'])
    cols = ['Name','Latitude','Longitude','Tips','Users','Visits']
    if df.empty:
        # No venues found: return an empty dataframe with the expected columns
        df = pd.DataFrame(columns=cols)
    else:
        df = df[['name','location.lat','location.lng','stats.tipCount','stats.usersCount','stats.visitsCount']]
        df.columns = cols
    if verbose:
        print('{} "{}" venues are found within {}m of location'.format(len(df), category, RADIUS))
    return df
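
The dotted column names used in `venues_nearby` (such as `location.lat`) come from `json_normalize` flattening the nested venue records. The sketch below imitates that flattening with plain dictionaries to show where the names come from. The `flatten` helper and the `venue` record are illustrative only: the record is a simplified, made-up stand-in, not an actual Foursquare response.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dictionaries into one level, joining keys with a separator."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

# Made-up venue record with the same nesting as the fields selected above
venue = {
    'name': 'Example Station',
    'location': {'lat': 1.3261, 'lng': 103.8901},
    'stats': {'tipCount': 0, 'usersCount': 0, 'visitsCount': 0},
}
flat_venue = flatten(venue)
print(sorted(flat_venue))
```

Each nested field becomes one flat key, which is why the function can select `location.lat` and `stats.tipCount` as ordinary dataframe columns.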

In [145]:
# Metro stations around the venue of our sample competitor
venues_nearby(climb_venue['Latitude'], climb_venue['Longitude'], 'Metro Station').head()

1 "Metro Station" venues are found within 350m of location


| | Name | Latitude | Longitude | Tips | Users | Visits |
|---|---|---|---|---|---|---|
| 0 | MacPherson MRT Interchange (CC10/DT26) | 1.32612 | 103.890066 | 0 | 0 | 0 |


In [150]:
# Bus stops around the venue of our sample competitor
venues_nearby(climb_venue['Latitude'], climb_venue['Longitude'], 'Bus Stop').head()

4 "Bus Stop" venues are found within 350m of location


| | Name | Latitude | Longitude | Tips | Users | Visits |
|---|---|---|---|---|---|---|
| 0 | Bus Stop 70099 (Traffic Police) | 1.330498 | 103.890291 | 0 | 0 | 0 |
| 1 | Bus Stop 70379 (Opp MacPherson Stn Exit A) | 1.326391 | 103.889509 | 0 | 0 | 0 |
| 2 | Bus Stop 70381 (Kong Hwa Sch) | 1.327464 | 103.887688 | 0 | 0 | 0 |
| 3 | macpherson busstop 70191 | 1.326664 | 103.889572 | 0 | 0 | 0 |


In [153]:
# Food outlets around the venue of our sample competitor
venues_nearby(climb_venue['Latitude'], climb_venue['Longitude'], 'Food').head()

5 "Food" venues are found within 350m of location


| | Name | Latitude | Longitude | Tips | Users | Visits |
|---|---|---|---|---|---|---|
| 0 | Blk 80 Circuit Road Market & Food Centre | 1.327835 | 103.88709 | 0 | 0 | 0 |
| 1 | Food Loft | 1.329035 | 103.893789 | 0 | 0 | 0 |
| 2 | Foodhub | 1.331463 | 103.889779 | 0 | 0 | 0 |
| 3 | Mr Pnut Food Industries | 1.327508 | 103.893338 | 0 | 0 | 0 |
| 4 | senoko food junction | 1.328311 | 103.894085 | 0 | 0 | 0 |


In [152]:
# Retail shopping outlets around the venue of our sample competitor
venues_nearby(climb_venue['Latitude'], climb_venue['Longitude'], 'Shop & Service').head()

3 "Shop & Service" venues are found within 350m of location


| | Name | Latitude | Longitude | Tips | Users | Visits |
|---|---|---|---|---|---|---|
| 0 | Number Plate & Sticker Shop | 1.330521 | 103.892029 | 0 | 0 | 0 |
| 1 | TopMax Auto Body Shop | 1.330419 | 103.893785 | 0 | 0 | 0 |
| 2 | MyCarShop Car Accessories | 1.330061 | 103.892749 | 0 | 0 | 0 |
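
To compare venues, the four lookups above can be collected into one profile per venue that counts the services found in each category. The sketch below shows one possible shape for this. The names `venue_profile`, `fake_lookup` and `SAMPLE_COUNTS` are hypothetical, and the stand-in lookup simply reproduces the counts printed above for Kinetics Climbing so that the example runs without calling the Foursquare API.

```python
def venue_profile(latitude, longitude, categories, lookup):
    """Count the venues returned by lookup() for each category around one location."""
    return {
        category: len(lookup(latitude, longitude, category))
        for category in categories
    }

# Counts printed above for Kinetics Climbing, used here in place of live API calls
SAMPLE_COUNTS = {'Metro Station': 1, 'Bus Stop': 4, 'Food': 5, 'Shop & Service': 3}

def fake_lookup(latitude, longitude, category):
    # Stand-in for venues_nearby: returns a list with one placeholder per venue
    return [None] * SAMPLE_COUNTS[category]

profile = venue_profile(1.32932, 103.890342, SAMPLE_COUNTS, fake_lookup)
print(profile)
```

In the notebook itself, `services_categories` could be passed as `categories` and a wrapper around `venues_nearby` with `verbose=False` as `lookup`, since `len()` also works on the dataframe it returns.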
