# Battle of London Borough

#### Finding the Best Place to Live for Bodybuilding Enthusiasts

## 1. Introduction

### 1.1. Background Story

People who are starting out in bodybuilding need a workout routine to build the body they want, and people who have been bodybuilding for a long time need to keep up their routine to stay in shape.

To maintain a routine, working out at a gym or fitness center is likely to be more effective than working out at home. However, people can be reluctant to go to the gym regularly (beginners especially) because:
    1. The gym is far away, and
    2. There are very few alternative gyms if they don't like the one nearest their home

We are going to help bodybuilding enthusiasts, beginners and experts alike, choose the best place to live in a city, in this case London. We will find the areas of London that have many gyms within a small surrounding area.

### 1.2. Ideas

Ideas to solve the problem:

1. Find nearby venues in each borough of London
2. Retrieve all gym venues in London for analysis
3. Cluster the gym venues using Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
4. Select the cluster with the largest number of gyms
5. If the largest, densest cluster covers too wide an area, find a subcluster within it to obtain the most suitable area, one surrounded by many gym venues

### 1.3. Data

1. List of London Boroughs
Source: Wikipedia (London boroughs, link: https://en.wikipedia.org/wiki/London_boroughs)

    How to use the data:
    - Obtain the data using BeautifulSoup
    - The borough names will be used to look up each borough's coordinates, and then to find nearby gym venues

2. Coordinates of Each Borough
Source: the longitude and latitude of each borough can be retrieved with the Geopy library

    How to use the data:
    - Obtain the data with the Geopy library
    - The data will be used to search for nearby gym venues

3. Venues
Source: Foursquare API

    How to use the data:
    - Obtain with the Foursquare API
    - The data will be used to cluster the venues by location

### 1.4. Methodology

1. Get the table of London boroughs from the Wikipedia page using Beautiful Soup
2. Get the coordinates (longitude and latitude) of each borough using the geopy library
3. For each borough, find the venues within a 2 km radius that match the category ID for gym / fitness center
4. Clean and preprocess the data
5. Cluster the gym data using DBSCAN
6. Select the cluster with the largest number of gyms
7. If the largest, densest cluster covers too wide an area, find a subcluster within it to obtain the most suitable area, one surrounded by many gym venues
8. Summarize the entire process and its output, and draw conclusions
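
Steps 5 and 6 can be sketched without any external library. The block below is a minimal, illustrative DBSCAN over latitude/longitude pairs using great-circle (haversine) distance. The function names, the `eps_km` and `min_samples` values, and the sample coordinates are all assumptions made for illustration; they are not the parameters used in this notebook. In practice, `sklearn.cluster.DBSCAN` with `metric='haversine'` would be the more usual choice.

```python
import math
from collections import Counter

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def dbscan(points, eps_km, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # indices of all points within eps_km of point i (including i itself)
        return [j for j, q in enumerate(points) if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE            # may be relabelled later as a border point
            continue
        cluster += 1                     # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)       # j is also a core point, so expand through it
    return labels

# hypothetical gym coordinates: one tight group, one pair, one isolated venue
gyms = [
    (51.5150, -0.0900), (51.5155, -0.0910), (51.5160, -0.0895),
    (51.4620, -0.0100), (51.4625, -0.0105),
    (51.3575, -0.1736),
]
labels = dbscan(gyms, eps_km=0.3, min_samples=3)

# step 6: pick the cluster with the most gyms, ignoring noise
largest = Counter(l for l in labels if l != -1).most_common(1)[0][0]
```

With these sample points only the first three gyms are dense enough to form a cluster; the pair and the isolated gym are labelled as noise. Step 7 would amount to running the same function again on the members of `largest` with a smaller `eps_km`.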

## 2. Data (Collection and Preparation)

### 2.1 Import required libraries
These are needed mainly for downloading and working with the table from Wikipedia.

In [1]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

### 2.2 Get the URL and create Beautiful Soup Object

In [2]:
URL = 'https://en.wikipedia.org/wiki/London_boroughs'
page = requests.get(URL)

In [3]:
soup = BeautifulSoup(page.content, 'html.parser')

### 2.3 Get the Table from Wikipedia

In [4]:
table = soup.find_all('table')

In [5]:
len(table)

9

In [6]:
table_raw = pd.read_html(str(table[2]), flavor='bs4')[0]  # table[2] is the borough table
table_raw = table_raw.reset_index(drop=True)  # assign the result; reset_index is not in-place
table_raw

Unnamed: 0,London borough,Designation,Former areas,Former areas.1,Former areas.2,Former areas.3,Former areas.4
0,Camden,Inner,Hampstead (11a),St Pancras (11b),Holborn (11c),,
1,Greenwich,Inner,Greenwich (22a),Woolwich (part) (22b),,,
2,Hackney,Inner,Hackney (9a),Shoreditch (9b),Stoke Newington (9c),,
3,Hammersmith[notes 2],Inner,Hammersmith (4a),Fulham (4b),,,
4,Islington,Inner,Islington (10a),Finsbury (10b),,,
5,Kensington and Chelsea,Inner,Kensington (3a),Chelsea (3b),,,
6,Lambeth,Inner,Lambeth (6a),Wandsworth (part) (6b),,,
7,Lewisham,Inner,Lewisham (21a),Deptford (21b),,,
8,Southwark,Inner,Bermondsey (7b),Camberwell (7c),Southwark (7a),,
9,Tower Hamlets,Inner,Bethnal Green (8a),Poplar (8c),Stepney (8b),,


In [7]:
london_bor = table_raw[['London borough','Designation']].copy()  # copy so later edits do not touch table_raw
london_bor.rename(columns = {'London borough':'Borough'}, inplace=True)
london_bor

Unnamed: 0,Borough,Designation
0,Camden,Inner
1,Greenwich,Inner
2,Hackney,Inner
3,Hammersmith[notes 2],Inner
4,Islington,Inner
5,Kensington and Chelsea,Inner
6,Lambeth,Inner
7,Lewisham,Inner
8,Southwark,Inner
9,Tower Hamlets,Inner


In [8]:
# strip the Wikipedia footnote markers from the borough names
london_bor['Borough'] = london_bor['Borough'].str.replace('[notes 2]', '', regex=False)
london_bor['Borough'] = london_bor['Borough'].str.replace('[notes 3]', '', regex=False)
london_bor

Unnamed: 0,Borough,Designation
0,Camden,Inner
1,Greenwich,Inner
2,Hackney,Inner
3,Hammersmith,Inner
4,Islington,Inner
5,Kensington and Chelsea,Inner
6,Lambeth,Inner
7,Lewisham,Inner
8,Southwark,Inner
9,Tower Hamlets,Inner
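
The two `str.replace` calls above remove the specific footnote markers that were present when the page was scraped. A more general sketch, assuming footnote markers always look like `[notes N]`, `[note N]` or `[N]`, is shown below. The helper name `strip_footnotes` and the sample names are illustrative only.

```python
import re

# matches markers such as "[notes 2]", "[note 3]" or "[4]", plus any space before them
FOOTNOTE = re.compile(r'\s*\[(?:notes?\s*)?\d+\]')

def strip_footnotes(name):
    """Remove bracketed footnote markers from a scraped name."""
    return FOOTNOTE.sub('', name).strip()

# sample inputs; 'Example Borough' is a made-up name for illustration
names = ['Camden', 'Hammersmith[notes 2]', 'Example Borough [notes 3]']
cleaned = [strip_footnotes(n) for n in names]
```

On the DataFrame itself, the same pattern could be applied in one call with `london_bor['Borough'].str.replace(r'\s*\[(?:notes?\s*)?\d+\]', '', regex=True)`.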


Add the City of London, which is not one of the 32 boroughs:

In [9]:
london_bor.loc[len(london_bor.index)] = ['City of London', 'Inner'] 

In [10]:
london_bor

Unnamed: 0,Borough,Designation
0,Camden,Inner
1,Greenwich,Inner
2,Hackney,Inner
3,Hammersmith,Inner
4,Islington,Inner
5,Kensington and Chelsea,Inner
6,Lambeth,Inner
7,Lewisham,Inner
8,Southwark,Inner
9,Tower Hamlets,Inner


### 2.4. Add Geo Data of The Boroughs

In [11]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [12]:
address = ', London'

# create the geocoder once and reuse it for every borough
geolocator = Nominatim(user_agent="t_explorer")

latitudes, longitudes = [], []
for borough in london_bor['Borough']:
    location = geolocator.geocode(str(borough) + address)
    latitudes.append(location.latitude)
    longitudes.append(location.longitude)

# assign whole columns at once instead of chained per-cell assignment
london_bor['Longitude'] = longitudes
london_bor['Latitude'] = latitudes

In [13]:
london_bor

Unnamed: 0,Borough,Designation,Longitude,Latitude
0,Camden,Inner,-0.13956,51.542305
1,Greenwich,Inner,-0.004542,51.482084
2,Hackney,Inner,-0.049362,51.54324
3,Hammersmith,Inner,-0.22364,51.492038
4,Islington,Inner,-0.099905,51.538429
5,Kensington and Chelsea,Inner,-0.199043,51.49848
6,Lambeth,Inner,-0.117287,51.501301
7,Lewisham,Inner,-0.010133,51.462432
8,Southwark,Inner,-0.103458,51.502922
9,Tower Hamlets,Inner,-0.076222,51.50812
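
One caveat about the geocoding loop: geopy's `geocode` returns `None` when it finds no match, which would make the loop fail with an `AttributeError`, and the public Nominatim service asks for no more than one request per second. The sketch below handles both. The helper `geocode_all`, its parameters, and the stand-in lookup table are hypothetical; the real `geolocator.geocode` would be passed in as the `geocode` argument.

```python
import time
from types import SimpleNamespace

def geocode_all(names, geocode, suffix=', London', delay_seconds=1.0):
    """Geocode each name with the given callable.

    `geocode` is any callable that returns an object with .latitude and
    .longitude, or None when nothing is found (as geopy does).
    """
    coords = {}
    for i, name in enumerate(names):
        if i:
            time.sleep(delay_seconds)    # pause between requests to respect the rate limit
        location = geocode(name + suffix)
        coords[name] = (location.latitude, location.longitude) if location else None
    return coords

# stand-in for a real geocoder so the sketch runs offline
known = {'Camden, London': SimpleNamespace(latitude=51.542305, longitude=-0.13956)}
result = geocode_all(['Camden', 'Nowhere'], known.get, delay_seconds=0)
```

With a real geocoder the call would look like `geocode_all(london_bor['Borough'], geolocator.geocode)`, and any borough mapped to `None` can then be inspected by hand before continuing.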


### 2.5. Visualize The Boroughs in London

In [14]:
address = 'London'

geolocator = Nominatim(user_agent="t_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [15]:
import folium # map rendering library

In [16]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, Des, Borough in zip(london_bor['Latitude'], london_bor['Longitude'], london_bor['Designation'], london_bor['Borough']):
    label = '{}, {}'.format(Borough, Des)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

## 3. Get Venues From Foursquare

### 3.1. Define Keys and Function

In [17]:
import os

# read the credentials from environment variables instead of hard-coding them;
# the variable names below are placeholders, use whatever names you have set
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum number of venues requested per call

# confirm the keys were found without echoing them into the notebook output
print('Credentials loaded.')

Credentials loaded.


In [18]:
category_id = '4bf58dd8d48988d175941735'

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    """Query Foursquare for venues in `category_id` within `radius` meters of each point."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, category_id, radius, LIMIT)

        # make the GET request and keep the list of returned items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-borough lists into a single table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough',
                  'Borough Latitude',
                  'Borough Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
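
The function above assumes every response contains at least one group and that every venue has at least one category; a response that breaks either assumption raises an `IndexError` or `KeyError`. The sketch below shows one way to flatten a response defensively. The helper `extract_venues` is hypothetical, and the sample payload only mimics the `response → groups → items → venue` nesting that the function above relies on. Its second venue is made up to show the missing-category case.

```python
def extract_venues(payload, borough):
    """Flatten one explore-style response into a list of row dictionaries."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append({
            'Borough': borough,
            'Venue': venue['name'],
            'Venue Latitude': venue['location']['lat'],
            'Venue Longitude': venue['location']['lng'],
            # fall back to None instead of failing when no category is attached
            'Venue Category': categories[0]['name'] if categories else None,
        })
    return rows

# sample payload shaped like the response used above (second venue is invented)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'PureGym',
               'location': {'lat': 51.539250, 'lng': -0.143077},
               'categories': [{'name': 'Gym / Fitness Center'}]}},
    {'venue': {'name': 'Uncategorised Venue',
               'location': {'lat': 51.54, 'lng': -0.14},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Camden')
empty = extract_venues({'response': {}}, 'Camden')
```

Rows with a `None` category would simply be dropped later, when only the gym categories are selected.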

### 3.2. Retrieve Venues in London

In [20]:
london_venues = getNearbyVenues(names=london_bor['Borough'],
                                   latitudes=london_bor['Latitude'],
                                   longitudes=london_bor['Longitude']
                                  )

Camden
Greenwich
Hackney
Hammersmith
Islington
Kensington and Chelsea
Lambeth
Lewisham
Southwark
Tower Hamlets
Wandsworth
Westminster
Barking
Barnet
Bexley
Brent
Bromley
Croydon
Ealing
Enfield
Haringey
Harrow
Havering
Hillingdon
Hounslow
Kingston upon Thames
Merton
Newham
Redbridge
Richmond upon Thames
Sutton
Waltham Forest
City of London


In [21]:
london_venues

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Camden,51.542305,-0.139560,PureGym,51.539250,-0.143077,Gym / Fitness Center
1,Camden,51.542305,-0.139560,Barry's Bootcamp,51.527075,-0.131056,Gym / Fitness Center
2,Camden,51.542305,-0.139560,FRAME,51.536593,-0.122553,Gym
3,Camden,51.542305,-0.139560,PureGym,51.554052,-0.144984,Gym / Fitness Center
4,Camden,51.542305,-0.139560,Urban Kings,51.531300,-0.121950,Gym / Fitness Center
...,...,...,...,...,...,...,...
1030,City of London,51.515618,-0.091998,Crossfit Aldgate East,51.514626,-0.069102,Gym
1031,City of London,51.515618,-0.091998,Assam Place Gym,51.515643,-0.067872,Gym
1032,City of London,51.515618,-0.091998,Perseverance Works,51.527964,-0.077165,Gym
1033,City of London,51.515618,-0.091998,adidas studio LDN,51.521747,-0.071571,Pilates Studio


In [22]:
london_venues.groupby('Borough').count()


Unnamed: 0_level_0,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Barking,4,4,4,4,4,4
Barnet,6,6,6,6,6,6
Bexley,2,2,2,2,2,2
Brent,5,5,5,5,5,5
Bromley,11,11,11,11,11,11
Camden,49,49,49,49,49,49
City of London,100,100,100,100,100,100
Croydon,8,8,8,8,8,8
Ealing,12,12,12,12,12,12
Enfield,6,6,6,6,6,6


### 3.3. Explore the Venue Categories

In [23]:
print('There are {} unique categories.'.format(len(london_venues['Venue Category'].unique())))

There are 28 unique categories.


In [24]:
london_venues['Venue Category'].unique()

array(['Gym / Fitness Center', 'Gym', 'Yoga Studio', 'Gymnastics Gym',
       'Gym Pool', 'School', 'Martial Arts School', 'Pool', 'Track',
       'Pilates Studio', 'Buddhist Temple', 'Climbing Gym',
       'Chiropractor', 'Business Service', "Women's Store", 'Boxing Gym',
       'Dance Studio', 'Community Center', 'Cycle Studio', 'College Gym',
       'Athletics & Sports', 'Temple', 'Office', 'Spa', 'Cricket Ground',
       'Massage Studio', 'Soccer Field', 'Physical Therapist'],
      dtype=object)

### 3.4. Select Gym Venues

In [25]:
gym_categories = ['Gym / Fitness Center', 'Gym']

# DataFrame.append was removed in pandas 2.0, so build the per-category slices and concatenate them
gym_data_raw = pd.concat(
    [london_venues[london_venues['Venue Category'] == cat] for cat in gym_categories],
    ignore_index=True,
)
gym_data_raw

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Camden,51.542305,-0.139560,PureGym,51.539250,-0.143077,Gym / Fitness Center
1,Camden,51.542305,-0.139560,Barry's Bootcamp,51.527075,-0.131056,Gym / Fitness Center
2,Camden,51.542305,-0.139560,PureGym,51.554052,-0.144984,Gym / Fitness Center
3,Camden,51.542305,-0.139560,Urban Kings,51.531300,-0.121950,Gym / Fitness Center
4,Camden,51.542305,-0.139560,The Armoury Hampstead,51.552551,-0.160132,Gym / Fitness Center
...,...,...,...,...,...,...,...
831,City of London,51.515618,-0.091998,OBSESSIVE GYM DISORDER,51.527809,-0.086541,Gym
832,City of London,51.515618,-0.091998,Crossfit Aldgate East,51.514626,-0.069102,Gym
833,City of London,51.515618,-0.091998,Assam Place Gym,51.515643,-0.067872,Gym
834,City of London,51.515618,-0.091998,Perseverance Works,51.527964,-0.077165,Gym


In [26]:
gym_data_raw.shape

(836, 7)

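The 2 km search areas of neighboring boroughs overlap, so the same gym can be returned for more than one borough. Drop the duplicates to get the unique venues: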

In [27]:
gym_data = gym_data_raw.drop_duplicates(subset=["Venue", "Venue Latitude"]).reset_index(drop=True)

In [28]:
gym_data.shape

(534, 7)
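
The deduplication keys on the venue name plus its latitude, because a chain such as PureGym has several branches with the same name. The sketch below shows the same idea in plain Python, using both coordinates as the key, which is a slightly stricter choice than the one made above. The helper name is hypothetical, and the Islington row is invented to stand for a gym returned a second time from a neighboring borough's search.

```python
def unique_venues(rows):
    """Keep the first row seen for each (name, latitude, longitude) key."""
    seen = set()
    unique = []
    for row in rows:
        key = (row['Venue'], row['Venue Latitude'], row['Venue Longitude'])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

rows = [
    {'Borough': 'Camden', 'Venue': 'PureGym', 'Venue Latitude': 51.539250, 'Venue Longitude': -0.143077},
    # same gym found again from another borough's search (invented row)
    {'Borough': 'Islington', 'Venue': 'PureGym', 'Venue Latitude': 51.539250, 'Venue Longitude': -0.143077},
    # a different branch with the same name must be kept
    {'Borough': 'Camden', 'Venue': 'PureGym', 'Venue Latitude': 51.554052, 'Venue Longitude': -0.144984},
]
deduped = unique_venues(rows)
```

The pandas equivalent would be `drop_duplicates(subset=['Venue', 'Venue Latitude', 'Venue Longitude'])`.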

Because the row that was dropped as a duplicate may have been the one for the gym's closest borough, the `Borough` column is no longer reliable. We therefore find the nearest borough for each unique gym venue:

In [29]:
nearest_borough = []

In [30]:
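for venue in range(len(gym_data.index)):
    # squared distance (in degrees) from this venue to every borough center
    distances = list((gym_data['Venue Latitude'][venue] - london_bor['Latitude'])**2 + (gym_data['Venue Longitude'][venue] - london_bor['Longitude'])**2)
    nearest_borough.append(distances.index(min(distances)))  # position of the closest borough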

In [31]:
gym_data['Nearest Borough'] = list(london_bor['Borough'][nearest_borough])

In [32]:
gym_data

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Nearest Borough
0,Camden,51.542305,-0.139560,PureGym,51.539250,-0.143077,Gym / Fitness Center,Camden
1,Camden,51.542305,-0.139560,Barry's Bootcamp,51.527075,-0.131056,Gym / Fitness Center,Camden
2,Camden,51.542305,-0.139560,PureGym,51.554052,-0.144984,Gym / Fitness Center,Havering
3,Camden,51.542305,-0.139560,Urban Kings,51.531300,-0.121950,Gym / Fitness Center,Camden
4,Camden,51.542305,-0.139560,The Armoury Hampstead,51.552551,-0.160132,Gym / Fitness Center,Havering
...,...,...,...,...,...,...,...,...
529,Newham,51.530000,0.029318,Atherton Leisure Centre,51.544646,0.015263,Gym,Newham
530,Newham,51.530000,0.029318,East River Spa,51.516235,0.010779,Gym,Newham
531,Richmond upon Thames,51.440553,-0.307639,Blitz CrossFit,51.448928,-0.332068,Gym,Richmond upon Thames
532,Sutton,51.357464,-0.173627,Go-Gym,51.360355,-0.195039,Gym,Sutton
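
The nearest-borough search above compares squared differences in degrees. At London's latitude a degree of longitude is noticeably shorter than a degree of latitude, so this can mis-rank boroughs that lie in different directions from the venue. The sketch below uses great-circle distance instead. The function names are illustrative, and the three borough centers are copied from the coordinate table earlier in this notebook.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    lat1, lng1, lat2, lng2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_borough_name(venue, boroughs):
    """Return the name of the borough center closest to `venue`.

    `boroughs` maps a borough name to its (lat, lng) center.
    """
    return min(boroughs, key=lambda name: haversine_km(venue, boroughs[name]))

boroughs = {
    'Camden': (51.542305, -0.13956),
    'Islington': (51.538429, -0.099905),
    'Greenwich': (51.482084, -0.004542),
}
closest = nearest_borough_name((51.539250, -0.143077), boroughs)  # first PureGym row
```

Either method is only as good as the geocoded borough centers, so any surprising assignment is worth checking against the coordinates returned by the geocoder.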


In [33]:
gym_data = gym_data.drop(columns=['Borough','Borough Latitude', 'Borough Longitude'])

In [34]:
gym_data = gym_data.rename(columns={'Nearest Borough':'Borough'})

In [35]:
gym_data

Unnamed: 0,Venue,Venue Latitude,Venue Longitude,Venue Category,Borough
0,PureGym,51.539250,-0.143077,Gym / Fitness Center,Camden
1,Barry's Bootcamp,51.527075,-0.131056,Gym / Fitness Center,Camden
2,PureGym,51.554052,-0.144984,Gym / Fitness Center,Havering
3,Urban Kings,51.531300,-0.121950,Gym / Fitness Center,Camden
4,The Armoury Hampstead,51.552551,-0.160132,Gym / Fitness Center,Havering
...,...,...,...,...,...
529,Atherton Leisure Centre,51.544646,0.015263,Gym,Newham
530,East River Spa,51.516235,0.010779,Gym,Newham
531,Blitz CrossFit,51.448928,-0.332068,Gym,Richmond upon Thames
532,Go-Gym,51.360355,-0.195039,Gym,Sutton


### 3.5. Visualize Gym Venues

In [36]:
address = 'London'

geolocator = Nominatim(user_agent="t_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [37]:
# create map of London using latitude and longitude values
map_london_selected = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, Ven, Ven_cat, Borough in zip(gym_data['Venue Latitude'], 
                                           gym_data['Venue Longitude'], 
                                           gym_data['Venue'], 
                                           gym_data['Venue Category'], 
                                           gym_data['Borough']
                                          ):
    label = 'Venue: {}, Category: {}, Borough: {}'.format(Ven, Ven_cat, Borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london_selected)  
    
map_london_selected
