# The battle of the neighbourhoods 

## Introduction

### Background & Problem description

BYO Ltd. is a startup company aiming to launch modern restaurants/bars/co-working spaces. Management is currently evaluating options for opening the first restaurant/bar/co-working space in Glasgow, Scotland.

Restaurants/bars/co-working spaces opened by BYO Ltd. target professionals between 20 and 45. During the day, the location serves as a co-working, event and meeting space. In the evening, it turns into a restaurant and bar. Although the cocktails are excellent, BYO Ltd. offers clients with a paid subscription the possibility to bring their own drinks to the restaurant/bar. As part of the business model, (paying) customers can use the BYO App to inform staff that they intend to bring wine/spirits to the bar. BYO Ltd.'s staff will schedule a pick-up time and fetch the wine/spirits from the client's house (or any other location requested). This makes it easier for customers to enjoy their favorite drink in their favorite restaurant, without the hassle of having to carry it. Further premium options are available on request, e.g. asking BYO Ltd.'s staff to source a specific bottle of wine (or gin, vodka, and so on).

BYO Ltd. takes great interest in their customers and the local community. Easy access, interesting environment and safety are some of the main concerns for the customers. 
At the same time, BYO Ltd. wants to help develop local communities and always aims to choose developing areas, rather than the well-developed (and more expensive) downtown / main business districts. 

The key question asked by BYO Ltd.'s management is:
_Which of the location candidates in Glasgow is the best place for our venture?_

Based on past projects, some evaluation criteria were suggested by the CEO: 
- Parking spaces and/or public transport -> How easy is it for customers to reach us?
- Competitors in neighbourhood (co-working spaces, cocktail bars, higher-end restaurants) -> Is the market already saturated? 
- Other businesses (e.g. coffee shops, gyms, small restaurants) -> Is there potential for cooperation with other businesses? 
- Possible activities in surrounding area (guided tours, parks etc) -> Can customers (during the day) have additional activities to enhance their experience, bring their kids etc?
- Wine shops / Supermarkets / farmers' markets in the area -> Are there suppliers nearby?

(The above list is not complete.)

After field trips and requirements workshops, four initial location candidates were identified:

- Òran Mór in the West End
- Park Circus
- Merchant City
- Craighton 

The CEO wants to know which of the location candidates could offer the best chance to successfully launch the new venture.

To ensure the business side is also 'represented', several workshops were held. Participants were business representatives, location managers, and other staff members. As a group, they defined the following 'business rules', i.e. criteria a location candidate must fulfill:
- 'Class 1' defines venues within 500 m around the location
- 'Class 2' defines venues within 1000 m around the location

For each class, a set of four (4) business rules exists:
#### Business rules & criteria - Class 1

- Number of competitors       : [the smaller the better]
- Number of suppliers         : > 3
- Number of restaurants       : > 2 && < 10
- Number of parks & recreation: > 2

#### Business rules & criteria - Class 2
Here the business rules are easier to fulfill, and we focus only on:
- Number of competitors       : [the smaller the better]
- Number of suppliers         : [the higher the better]
- Number of restaurants       : [the higher the better]
- Number of parks & recreation: [the higher the better]

Location candidates will be ranked for each business rule (place 1 through 4). If a location candidate does not fulfill the criteria, the name is excluded from the ranking. 
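
The ranking procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the candidate names and venue counts are made-up placeholders (not results of this analysis), `rank_candidates` is a hypothetical helper, and for the threshold rules it is assumed that a higher count ranks better among the candidates that pass.

```python
# Illustrative sketch only: names and counts are placeholders, not results.
CLASS1_RULES = {
    "suppliers": lambda n: n > 3,        # Number of suppliers: > 3
    "restaurants": lambda n: 2 < n < 10, # Number of restaurants: > 2 && < 10
    "parks": lambda n: n > 2,            # Number of parks & recreation: > 2
}

def rank_candidates(counts, rule, key, smaller_is_better=False):
    """Rank candidate names on `key`; candidates failing `rule` are excluded."""
    eligible = {name: c[key] for name, c in counts.items() if rule(c[key])}
    return sorted(eligible, key=eligible.get, reverse=not smaller_is_better)

example_counts = {
    "A": {"competitors": 5, "suppliers": 4, "restaurants": 6, "parks": 3},
    "B": {"competitors": 2, "suppliers": 2, "restaurants": 9, "parks": 5},
    "C": {"competitors": 8, "suppliers": 7, "restaurants": 12, "parks": 1},
}

# "B" has too few suppliers and drops out of this ranking.
supplier_ranking = rank_candidates(example_counts, CLASS1_RULES["suppliers"], "suppliers")

# Competitors have no threshold: every candidate is ranked, fewest first.
competitor_ranking = rank_candidates(
    example_counts, lambda n: True, "competitors", smaller_is_better=True
)
```

Keeping each rule as a small predicate makes it easy to swap in the more relaxed Class 2 rules without touching the ranking logic.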

## Target Audience

The main audience of this project is the board of BYO Ltd. 
The results, however, can also be useful to any company that is looking for a similar business opportunity in Glasgow and has its own set of business rules and location candidates.

## Outcome and success criteria

The expected outcome is a data-driven recommendation on where in Glasgow to open the next BYO location. It needs to be based on facts and open to further exploration and evaluation by the board.

Success is, in this context, defined as a solid recommendation that holds up to interrogation on any level. Criteria defined by the CEO need to be included in the model and weigh in on the recommendation.

The ideal location should offer the optimal combination of the criteria listed above, i.e. fulfill the business rules.

# Data

In [37]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library
import urllib.request
#beautiful soup
from bs4 import BeautifulSoup
import requests

### Location candidates

First, let's fetch the lat/lon coordinates for all of the location candidates.

- Òran Mór in the West End [https://oran-mor.co.uk/]
- Park Circus [https://en.wikipedia.org/wiki/Park_District,_Glasgow]
- Merchant City [https://en.wikipedia.org/wiki/Merchant_City]
- Craighton [https://en.wikipedia.org/wiki/Craigton,_Glasgow]

In [38]:
add_oran_mor = 'G12 8QX, Glasgow, Scotland' 
add_park_circus = 'G3 6AX, Glasgow, Scotland' 
add_merch_city = 'G1 1NQ, Glasgow, Scotland' 
add_craighton = 'G41 5BW,Glasgow, Scotland'
location_candidates = {'oran_mor': {'address':add_oran_mor, 'lat':'','lon':'','name':'Òran Mór','color':'red'},
                        'park_circus': {'address':add_park_circus, 'lat':'','lon':'','name':'Park Circus','color':'green'},
                        'merchant_city': {'address':add_merch_city, 'lat':'','lon':'', 'name':'Merchant City','color':'blue'},
                        'craighton': {'address':add_craighton, 'lat':'','lon':'', 'name':'Craighton','color':'yellow'} 
                        }

The map needs a center point, so I will use Gordon Street, right next to Glasgow Central Station.

In [39]:
geolocator = Nominatim(user_agent="gl_data_explorer")
for loc in location_candidates:
    address = location_candidates[loc]['address']
    _location = geolocator.geocode(address)
    location_candidates[loc]['lat'] = _location.latitude
    location_candidates[loc]['lon'] = _location.longitude
    
print("Loaded coordinates for all location candidates.")

central_point_addr = 'Gordon St, Glasgow G1 3SL, UK'
_location = geolocator.geocode(central_point_addr)
central_point = (_location.latitude, _location.longitude)
print('Coordinate of map center {}: {}'.format(central_point_addr, central_point))

Loaded coordinates for all location candidates.
Coordinate of map center Gordon St, Glasgow G1 3SL, UK: (55.8605571, -4.2569757)
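
The geocoding loop above assumes that every lookup succeeds. geopy's `geocode` returns `None` when no match is found, so a slightly more defensive variant may be useful. The sketch below is an assumption-laden illustration: `resolve_coordinates` and `fake_geocode` are hypothetical names, and the stand-in geocoder only exists so that the example runs without network access.

```python
from types import SimpleNamespace

def resolve_coordinates(candidates, geocode):
    """Fill 'lat'/'lon' per candidate; return the keys that could not be geocoded."""
    unresolved = []
    for key, info in candidates.items():
        result = geocode(info["address"])
        if result is None:  # no match for this address
            unresolved.append(key)
            continue
        info["lat"] = result.latitude
        info["lon"] = result.longitude
    return unresolved

def fake_geocode(address):
    # Hypothetical stand-in for geolocator.geocode, with placeholder coordinates.
    known = {"known address": SimpleNamespace(latitude=1.0, longitude=2.0)}
    return known.get(address)

demo = {
    "a": {"address": "known address", "lat": "", "lon": ""},
    "b": {"address": "unknown address", "lat": "", "lon": ""},
}
missing = resolve_coordinates(demo, fake_geocode)
```

In the notebook, the call would presumably be `resolve_coordinates(location_candidates, geolocator.geocode)`, followed by a check that the returned list is empty.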


In [40]:
for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]
    print('The geographical coordinates of {} are {}, {}.'.format(name,latitude, longitude))

The geographical coordinates of Òran Mór are 55.8775610893258, -4.2895424129414.
The geographical coordinates of Park Circus are 55.8698479457035, -4.27857721843912.
The geographical coordinates of Merchant City are 55.8587330614908, -4.24470424975926.
The geographical coordinates of Craighton are 55.8473585303755, -4.3138599892371.


To provide some orientation, here's a map of Glasgow with the location candidates.
The center of the map, Gordon Street, is highlighted in yellow. For each of the location candidates, you can see the name if you click the marker with your mouse.

In [41]:
map_candidates = folium.Map(location=central_point, zoom_start=13)

# mark the map center once (not once per candidate)
folium.CircleMarker(central_point,radius=10,popup='Glasgow Central',color='yellow',fill=True).add_to(map_candidates)

for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]
    # clicking a marker shows the candidate's name
    folium.Marker((latitude,longitude), popup=name).add_to(map_candidates)
    
map_candidates

In [73]:
# @hidden_cell
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT=1000

In [74]:
import shapely.geometry
import pyproj
import math

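# methods from example project (reuse is better than re-inventing the wheel)
# NOTE: UTM zone 33 is carried over from the example project. Glasgow (about 4.3° W)
# lies in UTM zone 30, which is why the X values printed below are negative and
# distances derived from these coordinates are slightly distorted.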
def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

for loc in location_candidates:
    lat = location_candidates[loc]['lat']
    lon = location_candidates[loc]['lon']
    name = location_candidates[loc]['name']
    
    print('\nCoordinate transformation check for',name)
    print('----------------------------------------------')
    print('\t{} longitude={}, latitude={}'.format(name,lon, lat))
    x, y = lonlat_to_xy(lon, lat)
    lo, la = xy_to_lonlat(x, y)
    print('\t{} UTM X={}, Y={}'.format(name,x, y))
    print('\t{} longitude={}, latitude={}'.format(name,lo, la))


Coordinate transformation check for Òran Mór
----------------------------------------------
	Òran Mór longitude=-4.2895424129414, latitude=55.8775610893258
	Òran Mór UTM X=-698031.2685291516, Y=6362012.91139814
	Òran Mór longitude=-4.28954241294139, latitude=55.8775610893258

Coordinate transformation check for Park Circus
----------------------------------------------
	Park Circus longitude=-4.27857721843912, latitude=55.8698479457035
	Park Circus UTM X=-697603.670315566, Y=6360979.559254846
	Park Circus longitude=-4.2785772184391195, latitude=55.869847945703484

Coordinate transformation check for Merchant City
----------------------------------------------
	Merchant City longitude=-4.24470424975926, latitude=55.8587330614908
	Merchant City UTM X=-695881.0832927255, Y=6359170.8527913885
	Merchant City longitude=-4.244704249759258, latitude=55.85873306149078

Coordinate transformation check for Craighton
----------------------------------------------
	Craighton longitude=-4.313859989
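
As a cross-check that does not depend on any map projection, distances can also be computed directly from latitude and longitude with the haversine formula. This is a minimal sketch under the assumption of a spherical Earth with mean radius 6371 km; `haversine_m` is a hypothetical helper, and the two coordinate pairs are the geocoded values printed earlier.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

oran_mor = (55.8775610893258, -4.2895424129414)
park_circus = (55.8698479457035, -4.27857721843912)

# Distance between two of the location candidates, in metres.
d_oran_park = haversine_m(*oran_mor, *park_circus)
```

Comparing this value with `calc_xy_distance` on the projected coordinates gives a quick indication of how much the chosen projection distorts distances in the Glasgow area.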

## Foursquare API to generate information
Now it's time to start using Foursquare's API and generate some information.
Below are lists of categories (competitors, shopping/supply chain, and environment).
You can find (and cross-check) the category IDs at [https://developer.foursquare.com/docs/resources/categories].

In [44]:
## competitors (bars, coworking_spaces)
bar = '4bf58dd8d48988d116941735'
nightlife_spot = '4d4b7105d754a06376d81259'
cocktail_bar = '4bf58dd8d48988d11e941735'
wine_bar = '4bf58dd8d48988d123941735'
champagne_bar = '52e81612bcbc57f1066b7a0e'
speakeasy = '4bf58dd8d48988d1d4941735'
whisky_bar = '4bf58dd8d48988d122941735'
pub = '4bf58dd8d48988d11b941735'
nightlife_categories = [nightlife_spot,bar, cocktail_bar, wine_bar, champagne_bar, speakeasy, whisky_bar]
gaming_cafe = '4bf58dd8d48988d18d941735'
internet_cafe = '4bf58dd8d48988d1f0941735'
coworking_spaces = ['4bf58dd8d48988d174941735',gaming_cafe, internet_cafe]
# concatenate so the list stays flat (coworking_spaces already contains the gaming and internet cafe IDs)
competitor_categories = coworking_spaces + [bar, nightlife_spot, cocktail_bar, wine_bar, champagne_bar, speakeasy, whisky_bar]


## Food & restaurant categories
# Food is used as overall category - if the category name contains restaurant, it will be marked as restaurant
# Food, indian, italian, kebab
restaurant_categories = ['4d4b7105d754a06374d81259','4bf58dd8d48988d10f941735','4bf58dd8d48988d110941735', '5283c7b4e4b094cb91ec88d7']

## shopping / supermarkets
wine_shop = '4bf58dd8d48988d119951735'
liquor_store = '4bf58dd8d48988d186941735'
coffee_shop = '4bf58dd8d48988d1e0931735'
food_drink_shop = '4bf58dd8d48988d1f9941735'
supermarket = '52f2ab2ebcbc57f1066b8b46'
fruit_vegetable_store = '52f2ab2ebcbc57f1066b8b1c'
farmers_market = '4bf58dd8d48988d1fa941735'
shopping_categories = [wine_shop, liquor_store, coffee_shop, food_drink_shop,supermarket,fruit_vegetable_store,farmers_market]

## environment
parking = '4c38df4de52ce0d596b336e1'
park = '4bf58dd8d48988d163941735'
botanical_garden = '52e81612bcbc57f1066b7a22'
bus_station = '4bf58dd8d48988d1fe931735'
bus_stop = '52f2ab2ebcbc57f1066b8b4f'
train_station = '4bf58dd8d48988d129951735'
tram_station = '52f2ab2ebcbc57f1066b8b51'
fitness_center = '4bf58dd8d48988d175941735'
playground = '4bf58dd8d48988d1e7941735'
library = '4bf58dd8d48988d12f941735'
arts = '4d4b7104d754a06370d81259'
uni = '4d4b7105d754a06372d81259'
environment_categories = [parking, park,botanical_garden,arts,uni, bus_station, bus_stop, train_station, tram_station, fitness_center, playground,library]

In [75]:
all_search_cats = competitor_categories + restaurant_categories + shopping_categories + environment_categories
# ensure there are no duplicates
search_categories = []

for el in all_search_cats:
    if el not in search_categories:
        search_categories.append(el)
    else:
        print('Duplicate %s' %el)  # added this on purpose to do a small sense check ;) 
        
print("We are going to search in " + str(len(search_categories)) + " categories.")

We are going to search in 33 categories.
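
The duplicate check above can also be written more compactly. The following sketch uses `dict.fromkeys`, which keeps the first occurrence of each element in order; `unique_in_order` and `find_duplicates` are hypothetical helper names, and the sample list reuses two category IDs from the lists above.

```python
from collections import Counter

def unique_in_order(items):
    """Drop repeated items while keeping the order of first appearance."""
    return list(dict.fromkeys(items))

def find_duplicates(items):
    """Return the items that occur more than once, in order of first appearance."""
    counts = Counter(items)
    return [item for item in unique_in_order(items) if counts[item] > 1]

sample_ids = [
    "4bf58dd8d48988d116941735",  # bar
    "4bf58dd8d48988d11e941735",  # cocktail_bar
    "4bf58dd8d48988d116941735",  # bar again
]
unique_ids = unique_in_order(sample_ids)
duplicate_ids = find_duplicates(sample_ids)
```

One side effect of this approach is that every element must be hashable, so a list accidentally nested inside the category list raises a `TypeError` instead of passing through silently.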


A small lookup to make the output look a bit nicer (to human eyes) while fetching the data:

In [76]:
category_dict = {}
category_dict['4bf58dd8d48988d116941735'] = 'bar'
category_dict['4d4b7105d754a06376d81259'] = 'nightlife_spot'
category_dict['4bf58dd8d48988d11e941735'] = 'cocktail_bar'
category_dict['4bf58dd8d48988d123941735'] = 'wine_bar'
category_dict['52e81612bcbc57f1066b7a0e'] = 'champagne_bar'
category_dict['4bf58dd8d48988d1d4941735'] = 'speakeasy'
category_dict['4bf58dd8d48988d122941735'] = 'whisky_bar'
category_dict['4bf58dd8d48988d11b941735'] = 'pub'
category_dict['4bf58dd8d48988d174941735'] = 'coworking_space'
category_dict['4d4b7105d754a06374d81259'] = 'food'
category_dict['4bf58dd8d48988d1e0931735'] = 'coffee_shop'
category_dict['4bf58dd8d48988d119951735'] = 'wine_shop'
category_dict['4bf58dd8d48988d186941735'] = 'liquor_store'
category_dict['4bf58dd8d48988d1e0931735'] = 'coffee_shop'
category_dict['4bf58dd8d48988d1f9941735'] = 'food_drink_shop'
category_dict['52f2ab2ebcbc57f1066b8b46'] = 'supermarket'
category_dict['52f2ab2ebcbc57f1066b8b1c'] = 'fruit_vegetable_store'
category_dict['4bf58dd8d48988d1fa941735'] = 'farmers_market'
category_dict['4c38df4de52ce0d596b336e1'] = 'parking'
category_dict['4bf58dd8d48988d163941735'] = 'park'
category_dict['4bf58dd8d48988d1fe931735'] = 'bus_station'
category_dict['52f2ab2ebcbc57f1066b8b4f'] = 'bus_stop'
category_dict['4bf58dd8d48988d129951735'] = 'train_station'
category_dict['52f2ab2ebcbc57f1066b8b51'] = 'tram_station'
category_dict['4bf58dd8d48988d175941735'] = 'fitness_center'
category_dict['4bf58dd8d48988d1e7941735'] = 'playground'
category_dict['4bf58dd8d48988d12f941735'] = 'library'
category_dict['52e81612bcbc57f1066b7a22'] = "botanical garden"
category_dict['4d4b7105d754a06374d81259'] = 'food'
category_dict['4bf58dd8d48988d10f941735'] = 'Indian restaurant'
category_dict['4bf58dd8d48988d110941735'] = 'Italian restaurant'
category_dict['5283c7b4e4b094cb91ec88d7'] = 'Kebab restaurant'
category_dict['4d4b7104d754a06370d81259'] = 'arts&entertainment'
category_dict['4bf58dd8d48988d18d941735'] = 'gaming cafe'
category_dict['4bf58dd8d48988d1f0941735'] = 'internet cafe'

def get_cat_name(cat_id):
    # fall back to the string 'NaN' for unknown category IDs
    return category_dict.get(cat_id, 'NaN')

print("Test: \tExpected outcome for '4bf58dd8d48988d174941735' is 'coworking_space'")
result = get_cat_name('4bf58dd8d48988d174941735')
if result == "coworking_space":
    print("\tSUCCESS: test passed --> %s" % result)
else:
    print("ERROR: \ttest failed --> %s" % result)
    

Test: 	Expected outcome for '4bf58dd8d48988d174941735' is 'coworking_space'
	SUCCESS: test passed --> coworking_space


Below are a few utility functions.

In [91]:
import unidecode
def get_category_type(cat):
    category = ""
    for c in cat:
        cid = ""
        cid = c[1]
        # remove accents
        cname = unidecode.unidecode(c[0].replace('-',''))
        if cid == "":
            continue
        if cid in competitor_categories or 'coworking' in cname.lower():
            category = 'competitor'
        elif cid in shopping_categories:
            category = 'supply'
        elif cid in environment_categories: 
            category = 'environment'
        elif cid in restaurant_categories or 'restaurant' in cname.lower() or 'gastropub' in cname.lower() or 'joint' in cname.lower()  or 'cafe' in cname.lower() or 'pub' in cname.lower():
            category = 'restaurant'
        if category != "":
            break
    if category == "":
        category = "NaN"
    return category


print("Test: \tExpected outcome for '4bf58dd8d48988d174941735' (coworking_space) is 'competitor'")
result = get_category_type([('Coworking Space', '4bf58dd8d48988d174941735')])
if result == "competitor":
    print("\tSUCCESS: test passed --> %s" % result)
else:
    print("ERROR: \ttest failed --> %s" % result)

Test: 	Expected outcome for '4bf58dd8d48988d174941735' (coworking_space) is 'competitor'
	SUCCESS: test passed --> competitor


In [92]:
intent = "browse"

def get_categories(cats):
    return [(c['name'], c['id']) for c in cats]

def format_address(loc):
    return ', '.join(loc['formattedAddress'])

def search_venues(lat, lon, category, client_id, client_secret, radius=500, limit=1000):
    # use the credentials passed in as arguments rather than the module-level globals
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&intent={}&radius={}&limit={}'.format(
        client_id, client_secret, VERSION, lat, lon, category, intent,radius, limit)
    #try:
    results = requests.get(url).json()['response']['groups'][0]['items']
    # one tuple per venue: (id, name, categories, (lat, lon), address, distance)
    venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    #except:
    #    print("No venues found")
    #    venues = []

    return venues  

# build dataframe
def build_df(venues):
    dataset = []
    cols = ['id','type','name','categories','lat','lon','address','distance','x','y']
    
    for loc in location_candidates:
        cols.append('dist_{}'.format(location_candidates[loc]['name']))

    ids = []
    for v in venues:
        v_id = v[0]
        ids.append(v_id)
        v_name = v[1]
        v_categories = v[2]
        v_latlon = v[3]
        v_address = v[4]
        v_distance = v[5]
        cat_type = get_category_type(v_categories)
        x,y = lonlat_to_xy(v_latlon[1],v_latlon[0])
        record = (v_id, cat_type, v_name, v_categories, v_latlon[0], v_latlon[1], v_address, v_distance, x, y,0,0,0,0)
        #print("Name: ",v_name," | Distance : ",v_distance," | cat type: ", cat_type)
        dataset.append(record)
    df = pd.DataFrame(dataset, columns=cols, index=ids)
    #df.set_index('id', inplace=True)
    return df
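
The request in `search_venues` is built by string formatting and indexes straight into the response. A possible alternative, sketched below under stated assumptions, builds the query string with the standard library and tolerates responses without venue groups. `build_explore_url` and `extract_items` are hypothetical helpers; the endpoint and parameter names are the ones used in the cell above, and the credentials in the demo are placeholders.

```python
from urllib.parse import urlencode

# Same endpoint as used in search_venues above.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(lat, lon, category, client_id, client_secret, version, radius, limit):
    """Build the explore URL; urlencode takes care of escaping the values."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lon),
        "categoryId": category,
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

def extract_items(payload):
    """Return the venue items of a response dict, or [] if the expected keys are missing."""
    groups = payload.get("response", {}).get("groups", [])
    return groups[0].get("items", []) if groups else []

# Placeholder credentials and coordinates, for illustration only.
demo_url = build_explore_url(55.0, -4.0, "abc", "id", "secret", "20180605", 500, 50)
demo_items = extract_items({"response": {"groups": [{"items": [1, 2]}]}})
```

With a helper like `extract_items`, an empty or malformed response yields an empty venue list instead of a `KeyError` or `IndexError`, which is roughly what the commented-out `try`/`except` in `search_venues` was meant to achieve.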

Let's define radius and limit.

In [49]:
radius = 6000
limit = 1000
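
The search radius is much larger than the 500 m and 1000 m used by the Class 1 and Class 2 business rules, so the fetched venues need to be filtered by distance later on. A minimal sketch of that step follows; `count_within` is a hypothetical helper, the distances are illustrative values in metres, and it is assumed that the class boundaries are inclusive and that Class 2 also contains the Class 1 venues.

```python
CLASS_1_RADIUS_M = 500   # 'Class 1': venues within 500 m of the location
CLASS_2_RADIUS_M = 1000  # 'Class 2': venues within 1000 m of the location

def count_within(distances_m, radius_m):
    """Count the venues whose distance to a candidate is at most radius_m."""
    return sum(1 for d in distances_m if d <= radius_m)

# Illustrative distances (in metres) from one candidate to five venues.
example_distances = [85, 258, 378, 646, 1495]

class_1_count = count_within(example_distances, CLASS_1_RADIUS_M)
class_2_count = count_within(example_distances, CLASS_2_RADIUS_M)
```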

### Coworking spaces

Realistically, 'official' coworking spaces are not the only venues we need to look at; gaming and internet cafes need to be included, too.

In [93]:
frames = []
print("searching for all coworking spaces, gaming & internet cafes")
for loc in location_candidates:
    lat = location_candidates[loc]['lat']
    lon = location_candidates[loc]['lon']
    name = location_candidates[loc]['name']
    print('\t within {}m of {}'.format(radius,name))
    for space in coworking_spaces:
        venues = search_venues(lat, lon,space,CLIENT_ID, CLIENT_SECRET,radius=radius,limit=LIMIT)
        df = build_df(venues)
        frames.append(df)
df_coworking = pd.concat(frames,sort=True) 
#remove any duplicates
df_coworking.drop_duplicates(subset='id',inplace=True)


searching for all coworking spaces, gaming & internet cafes
	 within 6000m of Òran Mór
	 within 6000m of Park Circus
	 within 6000m of Merchant City
	 within 6000m of Craighton


In [97]:
df_coworking.head()

index,address,categories,dist_Craighton,dist_Merchant City,dist_Park Circus,dist_Òran Mór,distance,id,lat,lon,name,type,x,y
4b9608dbf964a520bdba34e3,"First Floor, 169 Elderslie Street, Glasgow, Gl...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,1495,4b9608dbf964a520bdba34e3,55.866158,-4.276892,Alienation Digital,competitor,-697616.790985,6360548.0
51dea287498eb194932ea493,"Dawson Road (M8), Glasgow, Glasgow City, g4 9s...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,1947,51dea287498eb194932ea493,55.876154,-4.258453,The Whisky Bond,competitor,-696174.233147,6361309.0
4c722e20376da09395c3a5c6,"144 Elliot St., Glasgow, Glasgow City, G3 8EX,...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,2146,4c722e20376da09395c3a5c6,55.859261,-4.278724,Equator,competitor,-697946.129788,6359830.0
4f292591e4b083b610837939,"84 Miller Street, Glasgow, Glasgow City, G1 1D...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,2314,4f292591e4b083b610837939,55.864299,-4.261004,Snook,competitor,-696703.378464,6360065.0
4f195f14e4b0b62f9b348789,"21 Tyndrum Street,, Glasgow, Glasgow City, G4 ...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,2453,4f195f14e4b0b62f9b348789,55.869459,-4.253004,KURA Citipoint,competitor,-696051.556822,6360485.0


### Nightlife Spots

In [95]:
frames = []
print("Load nightlife spots")
for loc in location_candidates:
    lat = location_candidates[loc]['lat']
    lon = location_candidates[loc]['lon']
    name = location_candidates[loc]['name']
    print('\t within {}m of {}'.format(radius,name))
    for space in nightlife_categories:
        venues = search_venues(lat, lon,space,CLIENT_ID, CLIENT_SECRET,radius=radius,limit=LIMIT)
        df = build_df(venues)
        frames.append(df)

df_nightlife = pd.concat(frames,sort=True)   

#remove any duplicates
df_nightlife.drop_duplicates(subset='id',inplace=True)

Load nightlife spots
	 within 6000m of Òran Mór
	 within 6000m of Park Circus
	 within 6000m of Merchant City
	 within 6000m of Craighton


In [98]:
df_nightlife.head()

index,address,categories,dist_Craighton,dist_Merchant City,dist_Park Circus,dist_Òran Mór,distance,id,lat,lon,name,type,x,y
4b58d626f964a520d56d28e3,"8-12 Ashton Ln. (Byres Rd.), Glasgow, Glasgow ...","[(Bar, 4bf58dd8d48988d116941735)]",0,0,0,0,378,4b58d626f964a520d56d28e3,55.874798,-4.293064,Ubiquitous Chip,competitor,-698333.714792,6361775.0
4be5f1282468c9281fce0043,"617 Great Western Road (Hillhead Street), Glas...","[(Bar, 4bf58dd8d48988d116941735)]",0,0,0,0,258,4be5f1282468c9281fce0043,55.876611,-4.285762,The Belle,competitor,-697830.020737,6361843.0
4bd91f5eefb2ef3b1666ec9c,"17 Vinicombe St., Glasgow, Glasgow City, G12 8...","[(Pub, 4bf58dd8d48988d11b941735)]",0,0,0,0,85,4bd91f5eefb2ef3b1666ec9c,55.877149,-4.290705,Hillhead Bookclub,restaurant,-698115.393254,6361989.0
4b47d429f964a520964026e3,"731-735 Great Western Rd. (Byres Rd.), Glasgow...","[(Gastropub, 4bf58dd8d48988d155941735)]",0,0,0,0,31,4b47d429f964a520964026e3,55.877835,-4.289663,Òran Mór,restaurant,-698029.985435,6362045.0
50489002e4b02a1c178f2de7,"445 Great Western Road (Caledonian Street), Gl...","[(Bar, 4bf58dd8d48988d116941735)]",0,0,0,0,646,50489002e4b02a1c178f2de7,55.874657,-4.280578,Inn Deep,competitor,-697574.520867,6361538.0


### Possible suppliers

In [99]:
frames = []
print("Load supplier")
for loc in location_candidates:
    lat = location_candidates[loc]['lat']
    lon = location_candidates[loc]['lon']
    name = location_candidates[loc]['name']
    print('\t within {}m of {}'.format(radius,name))
    for space in shopping_categories:
        venues = search_venues(lat, lon,space,CLIENT_ID, CLIENT_SECRET,radius=radius,limit=limit)
        df = build_df(venues)
        frames.append(df)

df_suppliers = pd.concat(frames,sort=True)    
#remove any duplicates
df_suppliers.drop_duplicates(subset='id',inplace=True)

Load supplier
	 within 6000m of Òran Mór
	 within 6000m of Park Circus
	 within 6000m of Merchant City
	 within 6000m of Craighton


In [100]:
df_suppliers.head()

index,address,categories,dist_Craighton,dist_Merchant City,dist_Park Circus,dist_Òran Mór,distance,id,lat,lon,name,type,x,y
51e839ba498e2a68e9ed5e53,"449 Great Western Road, Glasgow, Glasgow City,...","[(Wine Shop, 4bf58dd8d48988d119951735)]",0,0,0,0,612,51e839ba498e2a68e9ed5e53,55.875069,-4.280804,Valhalla's Goat,supply,-697575.379286,6361587.0
4bd805435cf276b0e4539c00,"124-126 Byres Rd, Glasgow, Glasgow City, G12 8...","[(Wine Shop, 4bf58dd8d48988d119951735)]",0,0,0,0,593,4bd805435cf276b0e4539c00,55.873515,-4.295742,Peckham's,supply,-698538.004553,6361683.0
4eb4f8a693ad23656ff5349a,"449 Great Western Rd, Glasgow, Glasgow City, G...","[(Wine Shop, 4bf58dd8d48988d119951735)]",0,0,0,0,597,4eb4f8a693ad23656ff5349a,55.87513,-4.281009,Quel Vin,supply,-697586.023509,6361597.0
4d90eb529b3841bd0eaa445f,"89 Dumbarton Rd, Glasgow, Glasgow City, G11 6P...","[(Wine Shop, 4bf58dd8d48988d119951735)]",0,0,0,0,1001,4d90eb529b3841bd0eaa445f,55.869961,-4.298131,Majestic Wine,supply,-698796.127384,6361338.0
4bb5ed2e2ea195214ee4aa2f,"164A/165A Hyndland Rd, Glasgow, Glasgow City, ...","[(Wine Shop, 4bf58dd8d48988d119951735)]",0,0,0,0,1038,4bb5ed2e2ea195214ee4aa2f,55.876219,-4.306007,Oddbins,supply,-699080.496727,6362159.0


### Parks & Recreation

In [101]:
frames = []
print("Load parks & recreation")
for loc in location_candidates:
    lat = location_candidates[loc]['lat']
    lon = location_candidates[loc]['lon']
    name = location_candidates[loc]['name']
    print('\t within {}m of {}'.format(radius,name))
    for space in environment_categories:
        venues = search_venues(lat, lon,space,CLIENT_ID, CLIENT_SECRET,radius=radius,limit=limit)
        df = build_df(venues)
        frames.append(df)
        #print('\t found {} entries'.format(df['id'].count()))

df_environment = pd.concat(frames,sort=True)    
#remove any duplicates
df_environment.drop_duplicates(subset='id',inplace=True)

Load parks & recreation
	 within 6000m of Òran Mór
	 within 6000m of Park Circus
	 within 6000m of Merchant City
	 within 6000m of Craighton


In [102]:
df_environment.head()

index,address,categories,dist_Craighton,dist_Merchant City,dist_Park Circus,dist_Òran Mór,distance,id,lat,lon,name,type,x,y
4b5d887af964a5206d6029e3,"Exhibition Way, Glasgow, Glasgow City, G3 8YW,...","[(Event Space, 4bf58dd8d48988d171941735)]",0,0,0,0,1898,4b5d887af964a5206d6029e3,55.860537,-4.287699,Scottish Event Campus (SEC),,-698455.069737,6360128.0
4b4489b4f964a5203cf625e3,"220 Buchanan St, Glasgow, Glasgow City, G1 2FF...","[(Shopping Mall, 4bf58dd8d48988d1fd941735)]",0,0,0,0,2793,4b4489b4f964a5203cf625e3,55.863262,-4.25279,Buchanan Galleries,,-696233.407465,6359807.0
4f193962e4b025b060c490bd,United Kingdom,"[(Parking, 4c38df4de52ce0d596b336e1)]",0,0,0,0,713,4f193962e4b025b060c490bd,55.874276,-4.279731,Kelvinbridge Subway Station Park and Ride,environment,-697534.732194,6361482.0
4ba0f550f964a520218a37e3,"1053 Great Western Rd, Glasgow, Glasgow City, ...","[(Hospital, 4bf58dd8d48988d196941735)]",0,0,0,0,1526,4ba0f550f964a520218a37e3,55.883075,-4.311919,Gartnavel General Hospital,,-699225.597637,6363010.0
4cd800f9a42b236aac3cfd08,"Elmbank Cr, Glasgow, G2 4PF, United Kingdom","[(Parking, 4c38df4de52ce0d596b336e1)]",0,0,0,0,1938,4cd800f9a42b236aac3cfd08,55.864587,-4.268835,Elmbank Crescent Car Park,environment,-697173.365454,6360235.0


### Restaurants

In [103]:
frames = []
print("Load restaurants")
for loc in location_candidates:
    lat = location_candidates[loc]['lat']
    lon = location_candidates[loc]['lon']
    name = location_candidates[loc]['name']
    print('\t within {}m of {}'.format(radius,name))
    for space in restaurant_categories:
        venues = search_venues(lat, lon,space,CLIENT_ID, CLIENT_SECRET,radius=radius,limit=limit)
        df = build_df(venues)
        frames.append(df)
        #print('\t found {} entries'.format(df['id'].count()))

df_restaurants = pd.concat(frames,sort=True)    
# let's remove any duplicates
df_restaurants.drop_duplicates(subset='id',inplace=True)

Load restaurants
	 within 6000m of Òran Mór
	 within 6000m of Park Circus
	 within 6000m of Merchant City
	 within 6000m of Craighton


In [104]:
df_restaurants.head()

index,address,categories,dist_Craighton,dist_Merchant City,dist_Park Circus,dist_Òran Mór,distance,id,lat,lon,name,type,x,y
5049b3b6e4b0f60ad68304f5,"Ruthven Lane (West End), Glasgow, Glasgow City...","[(Vietnamese Restaurant, 4bf58dd8d48988d14a941...",0,0,0,0,391,5049b3b6e4b0f60ad68304f5,55.875273,-4.294308,The Hanoi Bike Shop,restaurant,-698394.85074,6361848.0
52c18626498e804badc1c874,"497 Great Western Rd, Hillhead, Glasgow City, ...","[(Bakery, 4bf58dd8d48988d16a941735)]",0,0,0,0,514,52c18626498e804badc1c874,55.875505,-4.282161,Cottonrake Bakery,,-697644.668545,6361658.0
4b47d429f964a520964026e3,"731-735 Great Western Rd. (Byres Rd.), Glasgow...","[(Gastropub, 4bf58dd8d48988d155941735)]",0,0,0,0,31,4b47d429f964a520964026e3,55.877835,-4.289663,Òran Mór,restaurant,-698029.985435,6362045.0
4b742075f964a52017c92de3,"134 Byres Rd, Glasgow, Glasgow City, G12 8TD, ...","[(Café, 4bf58dd8d48988d16d941735)]",0,0,0,0,552,4b742075f964a52017c92de3,55.873805,-4.295313,Kember & Jones,restaurant,-698502.59506,6361707.0
4b75b213f964a520a31d2ee3,"205 Byres Rd., Glasgow, Glasgow City, G12 8TN,...","[(Italian Restaurant, 4bf58dd8d48988d110941735)]",0,0,0,0,482,4b75b213f964a520a31d2ee3,55.874278,-4.2946,Little Italy,restaurant,-698444.038252,6361745.0


#### Short overview

A first look at the total numbers will be helpful. __Of course, not every nightlife spot and restaurant is a competitor!__

In fact, BYO Ltd. __wants__ to have other restaurants and nightlife spots in the area (think of the business rules!). 

But for the sake of simplicity, we will list them here under competitors anyway. 

In [120]:
print("Total number of potential competitors: ", df_coworking['id'].count()+df_nightlife['id'].count()+df_restaurants['id'].count())
print("\tTotal number of coworking spaces: ", df_coworking['id'].count())
print("\tTotal number of nightlife spots: ", df_nightlife['id'].count())
print("\tTotal number of restaurants: ", df_restaurants['id'].count())
print("Total number of potential suppliers: ", df_suppliers['id'].count())
print("Total number of parks & recreation venues: ", df_environment['id'].count())

Total number of potential competitors:  498
	Total number of coworking spaces:  18
	Total number of nightlife spots:  192
	Total number of restaurants:  288
Total number of potential suppliers:  282
Total number of parks & recreation venues:  546
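
Note that summing the per-dataframe counts counts a venue once for every dataframe it appears in. Òran Mór, for example, shows up in both the nightlife and the restaurant previews above. The sketch below shows one way to count distinct venues instead; `count_unique_ids` is a hypothetical helper and the ID lists are small excerpts taken from those previews.

```python
def count_unique_ids(*id_groups):
    """Count the distinct venue IDs across any number of ID collections."""
    seen = set()
    for group in id_groups:
        seen.update(group)
    return len(seen)

nightlife_ids = [
    "4b58d626f964a520d56d28e3",  # Ubiquitous Chip
    "4b47d429f964a520964026e3",  # Òran Mór
]
restaurant_ids = [
    "5049b3b6e4b0f60ad68304f5",  # The Hanoi Bike Shop
    "4b47d429f964a520964026e3",  # Òran Mór (also listed under nightlife)
]

naive_total = len(nightlife_ids) + len(restaurant_ids)
unique_total = count_unique_ids(nightlife_ids, restaurant_ids)
```

In the notebook, the same idea would presumably be applied to the `id` columns of `df_coworking`, `df_nightlife` and `df_restaurants`.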


Now that we have all the relevant data, let's combine it and see how it looks on the map.

In [121]:
frames = [df_coworking, df_suppliers,df_nightlife,df_environment,df_restaurants]
df_all = pd.concat(frames, sort=True)
df_all.set_index('id',inplace=True)
#remove any duplicates
#df_all.drop_duplicates(subset='id',inplace=True)
df_all.head()

id,address,categories,dist_Craighton,dist_Merchant City,dist_Park Circus,dist_Òran Mór,distance,lat,lon,name,type,x,y
4b9608dbf964a520bdba34e3,"First Floor, 169 Elderslie Street, Glasgow, Gl...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,1495,55.866158,-4.276892,Alienation Digital,competitor,-697616.790985,6360548.0
51dea287498eb194932ea493,"Dawson Road (M8), Glasgow, Glasgow City, g4 9s...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,1947,55.876154,-4.258453,The Whisky Bond,competitor,-696174.233147,6361309.0
4c722e20376da09395c3a5c6,"144 Elliot St., Glasgow, Glasgow City, G3 8EX,...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,2146,55.859261,-4.278724,Equator,competitor,-697946.129788,6359830.0
4f292591e4b083b610837939,"84 Miller Street, Glasgow, Glasgow City, G1 1D...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,2314,55.864299,-4.261004,Snook,competitor,-696703.378464,6360065.0
4f195f14e4b0b62f9b348789,"21 Tyndrum Street,, Glasgow, Glasgow City, G4 ...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,2453,55.869459,-4.253004,KURA Citipoint,competitor,-696051.556822,6360485.0


In [122]:
def format_category(cat):
    s = ">>"
    for c in cat:
        s = s + str(c[0]) + ","
        
    return s
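
As a quick illustration of what this helper produces, the sketch below repeats the function and calls it on a sample `categories` value copied from the venue table (a list of `(name, category id)` tuples). The variable names `sample` and `label` are only for illustration.

```python
def format_category(cat):
    # Same logic as the notebook cell above: join the category names,
    # prefixed with ">>" and each followed by a comma.
    s = ">>"
    for c in cat:
        s = s + str(c[0]) + ","
    return s


# Sample value taken from the Cottonrake Bakery row
sample = [("Bakery", "4bf58dd8d48988d16a941735")]
label = format_category(sample)
print(label)  # >>Bakery,
```

Note that this assumes `categories` still holds Python tuples. If the column has been reloaded from CSV it is likely to be a plain string, in which case the loop would iterate over single characters instead.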

In [123]:
# save data to local drive
file_coworking = 'data/gla_coworking.csv'
file_restaurants = 'data/gla_restaurants.csv'
file_suppliers = 'data/gla_suppliers.csv'
file_environment = 'data/gla_environment.csv'
file_nightlife = 'data/gla_nigtlife.csv'
try:
    df_coworking.to_csv(file_coworking,sep=';',header=True)
    df_suppliers.to_csv(file_suppliers,sep=';',header=True)
    df_environment.to_csv(file_environment,sep=';',header=True)
    df_nightlife.to_csv(file_nightlife,sep=';',header=True)
    df_restaurants.to_csv(file_restaurants,sep=';',header=True)
    print("saved data to local drive")
except OSError as err:
    # e.g. the data/ directory is missing or not writable
    print("Could not save files:", err)

saved data to local drive


In [124]:
color_competitor = 'red'
color_nightlife = 'white'
color_environment = 'green'
color_restaurant = 'gray'
color_default = 'lightgray'
color_supplier = 'purple'

def get_color_for_type(typeStr,index):
    # map a venue type to its marker colour
    if typeStr == "competitor":
        color = color_competitor
        # nightlife spots are typed 'competitor' too, but get their own colour
        if index in df_nightlife.index:
            color = color_nightlife
    elif typeStr == "supplier":
        color = color_supplier
    elif typeStr == "environment":
        color = color_environment
    elif typeStr == "restaurant":
        color = color_restaurant
    else:
        color = color_default

    return color

Now that we have all the data available, let's just map it out and take a first look. 

In [116]:
map_glasgow_data = folium.Map(location=central_point, zoom_start=11, 
                             tiles='CartoDB dark_matter')
folium.CircleMarker(central_point,radius=10,popup='Glasgow Central',color='yellow',fill=True).add_to(map_glasgow_data)

for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]

    label = location_candidates[loc]["name"]+" "+location_candidates[loc]["address"]
    color = location_candidates[loc]["color"]
    folium.Marker((latitude,longitude), popup=name).add_to(map_glasgow_data)
    
for index, row in df_all.iterrows():
    fill = False
    color = get_color_for_type(row['type'], index)
    popup_txt = str(row["name"])+" | type:"+str(row["type"]) + " || " + format_category(row["categories"])
    folium.Circle([row["lat"], row["lon"]], radius=3, tooltip=popup_txt,color=color, fill=fill).add_to(map_glasgow_data)

map_glasgow_data

While this is a very colourful display, it does not help much with the analysis.
Let's break it down a bit further and look at the details first.

## Map of coworking spaces and nightlife spots

In [125]:
map_gla_competitors = folium.Map(location=central_point, zoom_start=12, 
                             tiles='CartoDB dark_matter')
folium.CircleMarker(central_point,radius=10,popup='Glasgow Central',color='yellow',fill=True).add_to(map_gla_competitors)

for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]

    label = location_candidates[loc]["name"]+" "+location_candidates[loc]["address"]
    color = location_candidates[loc]["color"]
    folium.Marker((latitude,longitude), popup=name).add_to(map_gla_competitors)
    
frames = [df_coworking,df_nightlife]
df_competitors = pd.concat(frames,sort=True)
df_competitors.set_index('id',inplace=True)
for index, row in df_competitors.iterrows():
    fill = False
    color = get_color_for_type(row['type'], index)
        
    popup_txt = str(row["name"])+" | type:"+str(row["type"]) + " || " + format_category(row["categories"])
    folium.Circle([row["lat"], row["lon"]], radius=3, 
                         tooltip=popup_txt,color=color, fill=fill).add_to(map_gla_competitors)

    
map_gla_competitors

The above map shows nightlife spots and coworking spaces across central Glasgow. As indicated, we don't want to be in the absolute hotspot (too much competition). However, our location should also not be too far out. 

Let's look at a heatmap for nightlife spots and coworking spaces. Maybe that draws a clearer picture.

## Heatmaps for coworking spaces

Together with our stakeholders and subject matter experts, we defined the 'location sweetspots' (see business rules above):

##### Coworking spaces
- *Class 1*: a place where there is no _coworking space_ within 500m and the heatmap shows nothing or green
- *Class 2*: a place where there is no _coworking space_ within 1000m and the heatmap shows yellow or red
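
The distance half of these two definitions can be checked mechanically; the heatmap colour still has to be read off the map by eye. The following is a minimal sketch, assuming a plain list of distances in metres from one candidate to each coworking space. The helper name `no_venue_within` is hypothetical, and the sample values are the first five `dist_Park Circus` entries from the distance table computed later in this notebook, so they do not cover all 18 coworking spaces.

```python
def no_venue_within(distances_m, radius_m):
    """True if every venue is farther away than radius_m."""
    return all(d > radius_m for d in distances_m)


# Illustrative sample only: first five coworking rows, distance to Park Circus
sample = [595.17, 3085.14, 1554.15, 2695.63, 3740.11]

print(no_venue_within(sample, 500))   # True
print(no_venue_within(sample, 1000))  # False
```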



##### Heatmap coworking space

In [130]:
dist_class1 = int(500)
dist_class2 = int(1000)

In [131]:
co_data = []
for index,row in df_coworking.iterrows():
    co_data.append([row['lat'],row['lon']])

from folium import plugins
from folium.plugins import HeatMap

map_gla = folium.Map(location=central_point, zoom_start=11, 
                             tiles='CartoDB dark_matter')
folium.Marker(central_point, popup='Glasgow Central').add_to(map_gla)

for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]

    label = location_candidates[loc]["name"]+" "+location_candidates[loc]["address"]
    color = location_candidates[loc]["color"]
    folium.Marker((latitude,longitude), popup=name).add_to(map_gla)
    folium.Circle((latitude,longitude), radius=dist_class1,tooltip='Class 1: {}'.format(dist_class1), fill=False, color='green',opacity=0.7).add_to(map_gla)
    folium.Circle((latitude,longitude), radius=dist_class2,tooltip='Class 2: {}'.format(dist_class2), fill=False, color='yellow').add_to(map_gla)

for index, row in df_coworking.iterrows():
    fill = False
    color = get_color_for_type(row['type'], index)
        
    popup_txt = str(row["name"])+" | type:"+str(row["type"]) + " || " + format_category(row["categories"])
    folium.Circle([row["lat"], row["lon"]], radius=3, 
                         tooltip=popup_txt,color=color, fill=fill).add_to(map_gla)
    
HeatMap(co_data).add_to(map_gla)
#folium.Marker(central_point).add_to(map_gla)


map_gla    

This map is more informative and helps to understand the distribution of coworking spaces much better. 



##### Heatmap Nightlife

Of course we want to be in an area with *some* nightlife. However, too much competition would make it much harder to establish our 'brand'. Therefore, after many long discussions, the decision was made to use the heatmap for nightlife spots as an indicator:
- *Class 1* : green to yellow
- *Class 2* : red

This simplified approach allows us to apply the same concept quickly, effectively and easily to other potential cities.

In [134]:
nl_data = []
for index,row in df_nightlife.iterrows():
    # collect nightlife coordinates in their own list (not in co_data)
    nl_data.append([row['lat'],row['lon']])

from folium import plugins
from folium.plugins import HeatMap

map_gla_nl = folium.Map(location=central_point, zoom_start=12.2, 
                             tiles='CartoDB dark_matter')
folium.Marker(central_point, popup='Glasgow Central').add_to(map_gla_nl)

for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]

    label = location_candidates[loc]["name"]+" "+location_candidates[loc]["address"]
    color = location_candidates[loc]["color"]
    folium.Marker((latitude,longitude), popup=name).add_to(map_gla_nl)
    folium.Circle((latitude,longitude), radius=dist_class1,tooltip='Class 1: {}'.format(dist_class1), fill=False, color='green',opacity=0.7).add_to(map_gla_nl)
    folium.Circle((latitude,longitude), radius=dist_class2,tooltip='Class 2: {}'.format(dist_class2), fill=False, color='yellow').add_to(map_gla_nl)

for index, row in df_nightlife.iterrows():
    fill = False
    color = get_color_for_type(row['type'], index)
        
    popup_txt = str(row["name"])+" | type:"+str(row["type"]) + " || " + format_category(row["categories"])
    folium.Circle([row["lat"], row["lon"]], radius=3, 
                         tooltip=popup_txt,color=color, fill=fill).add_to(map_gla_nl)
    
# heatmap of the nightlife coordinates only
HeatMap(nl_data).add_to(map_gla_nl)

map_gla_nl

Nightlife in Glasgow is centered downtown.

##### Heatmap Parks & Recreation

Let's look at the parks & recreation sector. During the day customers will want to have a chance to take a break and walk through the park. 

In [140]:
env_data = []
frames = [df_environment]
df_pr = pd.concat(frames, sort=True)
df_pr.set_index('id',inplace=True)
for index,row in df_environment.iterrows():
    env_data.append([row['lat'],row['lon']])


map_gla_env = folium.Map(location=central_point, zoom_start=12.2, 
                             tiles='CartoDB dark_matter')
folium.Marker(central_point, popup='Glasgow Central').add_to(map_gla_env)

for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]

    label = location_candidates[loc]["name"]+" "+location_candidates[loc]["address"]
    color = location_candidates[loc]["color"]
    folium.Marker((latitude,longitude), popup=name).add_to(map_gla_env)
    folium.Circle((latitude,longitude), radius=dist_class1,tooltip='Class 1: {}'.format(dist_class1), fill=False, color='green',opacity=0.7).add_to(map_gla_env)
    folium.Circle((latitude,longitude), radius=dist_class2,tooltip='Class 2: {}'.format(dist_class2), fill=False, color='yellow').add_to(map_gla_env)

for index, row in df_pr.iterrows():
    fill = False
    color = get_color_for_type(row['type'], index)
        
    popup_txt = str(row["name"])+" | type:"+str(row["type"]) + " || " + format_category(row["categories"])
    folium.Circle([row["lat"], row["lon"]], radius=3, 
                         tooltip=popup_txt,color=color, fill=fill).add_to(map_gla_env)
    
HeatMap(env_data).add_to(map_gla_env)

map_gla_env

##### Heatmap Restaurants

Culinary delights are offered by BYO Ltd.'s own kitchen. However, we understand that we cannot offer everything a customer may want. Rather than trying to satisfy every possible request, our team focusses on a small menu executed perfectly. 

We allow customers (during the day) to pick up something from nearby restaurants. 
As stated in the business rules, location candidates should have other restaurants in the close vicinity (class 1/class 2). 

Now, let's look at the restaurant distribution in Glasgow. 

In [142]:
res_data = []
df_res = df_restaurants
for index,row in df_restaurants.iterrows():
    res_data.append([row['lat'],row['lon']])


map_gla_res = folium.Map(location=central_point, zoom_start=12.2, 
                             tiles='CartoDB dark_matter')
folium.Marker(central_point, popup='Glasgow Central').add_to(map_gla_res)

for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]

    label = location_candidates[loc]["name"]+" "+location_candidates[loc]["address"]
    color = location_candidates[loc]["color"]
    folium.Marker((latitude,longitude), popup=name).add_to(map_gla_res)
    folium.Circle((latitude,longitude), radius=dist_class1,tooltip='Class 1: {}'.format(dist_class1), fill=False, color='green',opacity=0.7).add_to(map_gla_res)
    folium.Circle((latitude,longitude), radius=dist_class2,tooltip='Class 2: {}'.format(dist_class2), fill=False, color='yellow').add_to(map_gla_res)

for index, row in df_res.iterrows():
    fill = False
    color = get_color_for_type(row['type'], index)
        
    popup_txt = str(row["name"])+" | type:"+str(row["type"]) + " || " + format_category(row["categories"])
    folium.Circle([row["lat"], row["lon"]], radius=3, 
                         tooltip=popup_txt,color=color, fill=fill).add_to(map_gla_res)
    
HeatMap(res_data).add_to(map_gla_res)

map_gla_res

Glasgow is a city that likes to eat. Most location candidates, apart from Craighton, seem to have a good selection of restaurants within class 1/class 2. 

In [144]:
df_all.reset_index(inplace=True)

In [145]:
df_all.head()

Unnamed: 0,index,id,address,categories,dist_Craighton,dist_Merchant City,dist_Park Circus,dist_Òran Mór,distance,lat,lon,name,type,x,y
0,0,4b9608dbf964a520bdba34e3,"First Floor, 169 Elderslie Street, Glasgow, Gl...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,1495,55.866158,-4.276892,Alienation Digital,competitor,-697616.790985,6360548.0
1,1,51dea287498eb194932ea493,"Dawson Road (M8), Glasgow, Glasgow City, g4 9s...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,1947,55.876154,-4.258453,The Whisky Bond,competitor,-696174.233147,6361309.0
2,2,4c722e20376da09395c3a5c6,"144 Elliot St., Glasgow, Glasgow City, G3 8EX,...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,2146,55.859261,-4.278724,Equator,competitor,-697946.129788,6359830.0
3,3,4f292591e4b083b610837939,"84 Miller Street, Glasgow, Glasgow City, G1 1D...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,2314,55.864299,-4.261004,Snook,competitor,-696703.378464,6360065.0
4,4,4f195f14e4b0b62f9b348789,"21 Tyndrum Street,, Glasgow, Glasgow City, G4 ...","[(Coworking Space, 4bf58dd8d48988d174941735)]",0,0,0,0,2453,55.869459,-4.253004,KURA Citipoint,competitor,-696051.556822,6360485.0


Now let's calculate the distance from each candidate location.

In [146]:
# Calculate the distance from each candidate
for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]
    loc_x, loc_y = lonlat_to_xy(latitude,longitude)
    for index, row in df_all.iterrows():
        x,y = lonlat_to_xy(row['lat'], row['lon'])
        distance = calc_xy_distance(loc_x, loc_y, x, y)
        df_all.loc[index,'dist_{}'.format(name)] = distance
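
The helpers `lonlat_to_xy` and `calc_xy_distance` are defined earlier in the notebook and work on projected coordinates. As an independent cross-check, a great-circle (haversine) distance can be computed directly from latitude and longitude. The sketch below is not the method used in this notebook; `haversine_m` and `EARTH_RADIUS_M` are names introduced here for illustration, and the result can differ from the projected distances depending on the projection behind `lonlat_to_xy`. The Òran Mór coordinates are taken from the venue row of that name, which may not match the candidate entry in `location_candidates` exactly.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Coordinates copied from the tables in this notebook
oran_mor = (55.877835, -4.289663)            # Òran Mór venue row
alienation_digital = (55.866158, -4.276892)  # first coworking row

distance = haversine_m(*oran_mor, *alienation_digital)
print(round(distance), "m")
```

If the two methods disagree noticeably for the same pair of points, the argument order and the projection used by `lonlat_to_xy` are the first things worth checking.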
        

In [147]:
df_all.head()

Unnamed: 0,index,id,address,categories,dist_Craighton,dist_Merchant City,dist_Park Circus,dist_Òran Mór,distance,lat,lon,name,type,x,y
0,0,4b9608dbf964a520bdba34e3,"First Floor, 169 Elderslie Street, Glasgow, Gl...","[(Coworking Space, 4bf58dd8d48988d174941735)]",6068.3,4830.88,595.17,2495.0,1495,55.866158,-4.276892,Alienation Digital,competitor,-697616.790985,6360548.0
1,1,51dea287498eb194932ea493,"Dawson Road (M8), Glasgow, Glasgow City, g4 9s...","[(Coworking Space, 4bf58dd8d48988d174941735)]",9137.47,3253.09,3085.14,4551.44,1947,55.876154,-4.258453,The Whisky Bond,competitor,-696174.233147,6361309.0
2,2,4c722e20376da09395c3a5c6,"144 Elliot St., Glasgow, Glasgow City, G3 8EX,...","[(Coworking Space, 4bf58dd8d48988d174941735)]",5425.49,4974.65,1554.15,3117.51,2146,55.859261,-4.278724,Equator,competitor,-697946.129788,6359830.0
3,3,4f292591e4b083b610837939,"84 Miller Street, Glasgow, Glasgow City, G1 1D...","[(Coworking Space, 4bf58dd8d48988d174941735)]",8117.55,2519.5,2695.63,4605.05,2314,55.864299,-4.261004,Snook,competitor,-696703.378464,6360065.0
4,4,4f195f14e4b0b62f9b348789,"21 Tyndrum Street,, Glasgow, Glasgow City, G4 ...","[(Coworking Space, 4bf58dd8d48988d174941735)]",9470.36,1987.96,3740.11,5474.22,2453,55.869459,-4.253004,KURA Citipoint,competitor,-696051.556822,6360485.0


# Methodology

The goal of this analysis is to evaluate the four given location candidates for suitability to open a BYO Ltd. venue. 

Several business rules and criteria were defined, in cooperation with key stakeholders and subject matter experts. The ideal candidate location will match all of the criteria defined in the business rules. If this is not possible, the next best candidate will be identified. 

At first, I collected required data (location, type of venue) around the candidate locations. The venues were further categorized into the following groups:
- competitor
- supplier
- restaurant
- parks & recreation

Then I looked at the distribution of the venues, using heatmaps, to get a quick visual overview of the potential locations and their suitability. Focus here is to understand the density of venues in the class1/class2 areas around each location.

In the last step, I looked at each of the candidates in more detail, focussing on the distribution of the categorized venues per class. Business rules were applied to build the final ranking of candidate locations. 



#### Business rules & criteria - Class 1

- Number of competitors       : [the smaller the better]
- Number of suppliers         : > 3
- Number of restaurants       : > 2 && < 10
- Number of parks & recreation: > 2

#### Business rules & criteria - Class 2
Here the business rules are easier and we focus only on :
- Number of competitors       : [the smaller the better]
- Number of suppliers         : [the higher the better]
- Number of restaurants       : [the higher the better]
- Number of parks & recreation: [the higher the better]
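
To make the class 1 thresholds concrete, they can be written as simple predicates. This is a minimal sketch with made-up counts; `CLASS1_RULES`, `check_class1` and `sample_counts` are hypothetical names and are not used elsewhere in this notebook. The competitor rule is left out because it is a ranking ("the smaller the better") rather than a threshold. The sketch follows the strict inequalities listed above, whereas the ranking cell further down uses inclusive bounds (`minVal=2`, `maxVal=10`).

```python
# Class 1 threshold rules as listed above (strict inequalities)
CLASS1_RULES = {
    "suppliers": lambda n: n > 3,
    "restaurants": lambda n: 2 < n < 10,
    "parks_recreation": lambda n: n > 2,
}


def check_class1(counts):
    """Return, per rule, whether the venue counts satisfy it."""
    return {rule: check(counts[rule]) for rule, check in CLASS1_RULES.items()}


# Made-up counts for a single candidate, for illustration only
sample_counts = {"suppliers": 5, "restaurants": 12, "parks_recreation": 3}
print(check_class1(sample_counts))
# {'suppliers': True, 'restaurants': False, 'parks_recreation': True}
```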

# Analysis

Now I will perform some exploratory data analysis and apply the business rules.
Let's look at how many venues are in class 1 and class 2 around each of the locations.

In [148]:
for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]

    df_class1 = df_all[df_all['dist_{}'.format(name)] <= dist_class1]
    df_class2 = df_all[(df_all['dist_{}'.format(name)] > dist_class1) & (df_all['dist_{}'.format(name)] <= dist_class2)]

    # let's remove any duplicates; reassigning instead of using inplace=True
    # on a slice avoids pandas' SettingWithCopyWarning
    df_class1 = df_class1.drop_duplicates(subset='id')
    df_class2 = df_class2.drop_duplicates(subset='id')
    
    print("Looking at the whole dataset for {}".format(name))
    print("\t-there are  {} items in class 1 \t(within {}m around location)".format(df_class1['id'].count(),dist_class1))
    print("\t-there are  {} items in class 2 \t(within {}m around location)".format(df_class2['id'].count(),dist_class2))
    print('----------------------------------------------------------')



Looking at the whole dataset for Òran Mór
	-there are  19 items in class 1 	(within 500m around location)
	-there are  59 items in class 2 	(within 1000m around location)
----------------------------------------------------------
Looking at the whole dataset for Park Circus
	-there are  14 items in class 1 	(within 500m around location)
	-there are  50 items in class 2 	(within 1000m around location)
----------------------------------------------------------
Looking at the whole dataset for Merchant City
	-there are  68 items in class 1 	(within 500m around location)
	-there are  78 items in class 2 	(within 1000m around location)
----------------------------------------------------------
Looking at the whole dataset for Craighton
	-there are  2 items in class 1 	(within 500m around location)
	-there are  3 items in class 2 	(within 1000m around location)
----------------------------------------------------------


In [149]:
# set show_output = True to show details below
show_output = False

Let's break it down a bit more and look at the details of each location candidate. 

In [150]:
results = {}

for loc in location_candidates:
    name = location_candidates[loc]["name"]
    latitude = location_candidates[loc]["lat"]
    longitude = location_candidates[loc]["lon"]

    df_class1 = df_all[df_all['dist_{}'.format(name)] <= dist_class1]
    df_class2 = df_all[(df_all['dist_{}'.format(name)] > dist_class1) & (df_all['dist_{}'.format(name)] <= dist_class2)]

    # let's remove any duplicates; reassigning instead of using inplace=True
    # on a slice avoids pandas' SettingWithCopyWarning
    df_class1 = df_class1.drop_duplicates(subset='id')
    df_class2 = df_class2.drop_duplicates(subset='id')
    
    df_class1_competitors = df_class1[df_class1['type']=='competitor']
    df_class1_supply = df_class1[df_class1['type']=='supply']
    df_class1_restaurants = df_class1[df_class1['type']=='restaurant']
    df_class1_env = df_class1[(df_class1['type']=='environment') | (df_class1['type']=='NaN')]

    df_class2_competitors = df_class2[df_class2['type']=='competitor']
    df_class2_supply = df_class2[df_class2['type']=='supply']
    df_class2_restaurants = df_class2[df_class2['type']=='restaurant']
    df_class2_env = df_class2[(df_class2['type']=='environment') | (df_class2['type']=='NaN')]

    num_class1_competitors = df_class1_competitors['id'].count()
    num_class1_supply = df_class1_supply['id'].count()
    num_class1_restaurants = df_class1_restaurants['id'].count()
    num_class1_env = df_class1_env['id'].count()
    
    num_class2_competitors = df_class2_competitors['id'].count()
    num_class2_supply = df_class2_supply['id'].count()
    num_class2_restaurants = df_class2_restaurants['id'].count()
    num_class2_env = df_class2_env['id'].count()
    results[loc] = {
                    'name': name,
                    'num_class1_competitors':num_class1_competitors,
                    'num_class1_supply':num_class1_supply,
                    'num_class1_restaurants': num_class1_restaurants,
                    'num_class1_env':num_class1_env,
                    'num_class2_competitors':num_class2_competitors,
                    'num_class2_supply':num_class2_supply,
                    'num_class2_restaurants': num_class2_restaurants,
                    'num_class2_env':num_class2_env
                    }
    if show_output:
        print('Name: {}'.format(name))
        print("Class 1 Stats for {}".format(name))
        print("\t competitors: {}".format(num_class1_competitors))
        print("\t supply: {}".format(num_class1_supply))
        print("\t restaurants: {}".format(num_class1_restaurants))
        print("\t parks & recreation: {}".format(num_class1_env))
        print("Class 2 Stats for {}".format(name))
        print("\t competitors: {}".format(num_class2_competitors))
        print("\t supply: {}".format(num_class2_supply))
        print("\t restaurants: {}".format(num_class2_restaurants))
        print("\t parks & recreation: {}".format(num_class2_env))
        print("------------------------------------------------------------")
    


Now we have the results for each of the candidates. Let's apply the business rules and see which location candidate comes out on top.

#### Apply business rules

In [151]:
# hilo:         True  -> highest to lowest
#               False -> lowest to highest
# excludeZero : True  -> entries with 0 EXCLUDED
# .             False -> entries with 0 INCLUDED
def get_ranking(candidates,prop,hilo=True,excludeZero=False):
    ranking = []
    data = {}
    for c in candidates:
        #print("{}=>{}".format(c,candidates[c][prop]))
        if excludeZero and candidates[c][prop]==0:
            continue
            
        data[c] = candidates[c][prop]
    
    ranking = sorted(data, key=data.get, reverse=hilo)
    return {k: v for v, k in enumerate(ranking)}

def get_ranking_between(candidates, prop,minVal,maxVal):
    ranking = []
    for c in candidates:
        #print("{}=>{}".format(c,candidates[c][prop]))
        if maxVal == 0:
            if candidates[c][prop] >= minVal:
                ranking.append(c)
        else:    
            if candidates[c][prop] >= minVal and candidates[c][prop] <= maxVal:
                ranking.append(c)
    return {k: v for v, k in enumerate(ranking)}
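
To show what the two helpers return, here is a small usage sketch. The functions are repeated in compact form so that the block runs on its own, and the `sample` dictionary with candidates `A`, `B` and `C` is made up for illustration. Both helpers return a mapping from candidate key to a zero-based position.

```python
def get_ranking(candidates, prop, hilo=True, excludeZero=False):
    # compact copy of the helper defined above
    data = {}
    for c in candidates:
        if excludeZero and candidates[c][prop] == 0:
            continue
        data[c] = candidates[c][prop]
    ranking = sorted(data, key=data.get, reverse=hilo)
    return {k: v for v, k in enumerate(ranking)}


def get_ranking_between(candidates, prop, minVal, maxVal):
    # compact copy of the helper defined above; maxVal == 0 means "no upper bound"
    ranking = []
    for c in candidates:
        value = candidates[c][prop]
        if value >= minVal and (maxVal == 0 or value <= maxVal):
            ranking.append(c)
    return {k: v for v, k in enumerate(ranking)}


# Made-up counts for three hypothetical candidates
sample = {
    "A": {"num_class1_competitors": 4, "num_class1_restaurants": 12},
    "B": {"num_class1_competitors": 0, "num_class1_restaurants": 5},
    "C": {"num_class1_competitors": 1, "num_class1_restaurants": 2},
}

print(get_ranking(sample, "num_class1_competitors", hilo=False, excludeZero=True))
# {'C': 0, 'A': 1}
print(get_ranking_between(sample, "num_class1_restaurants", minVal=2, maxVal=10))
# {'B': 0, 'C': 1}
```

Two details are worth keeping in mind when reading the final table: with `excludeZero=True` a candidate with zero competitors drops out of the ranking entirely, and `get_ranking_between` keeps the candidates in input order rather than sorting them by value.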
    

            

In [152]:
ranking = {"class1":{"br1":'','br2':'','br3':'','br4':''},
           "class2":{"br1":'','br2':'','br3':'','br4':''}
          }
# smallest number of competitors in class 1
ranking['class1']['br1'] = get_ranking(results,prop='num_class1_competitors',hilo=False,excludeZero=True)

# biggest number of suppliers in class 1
ranking['class1']['br2'] = get_ranking(results,prop='num_class1_supply',hilo=True,excludeZero=False)

# number of restaurants in class 1 between >=2 && <=10
ranking['class1']['br3'] = get_ranking_between(results,prop='num_class1_restaurants',minVal=2,maxVal=10)    

# number of P&R in class 1 >= 2 (maxVal=0 means no upper bound)
ranking['class1']['br4'] = get_ranking_between(results,prop='num_class1_env',minVal=2,maxVal=0)    

# For Class 2, the business rules are easier
ranking['class2']['br1'] = get_ranking(results,prop='num_class2_competitors',hilo=False,excludeZero=True)
ranking['class2']['br2'] = get_ranking(results,prop='num_class2_supply',hilo=True,excludeZero=False)
ranking['class2']['br3'] = get_ranking(results,prop='num_class2_restaurants',hilo=True,excludeZero=False)
ranking['class2']['br4'] = get_ranking(results,prop='num_class2_env',hilo=True,excludeZero=False)

Below I show the final ranking (based on business rules). 


__Please keep in mind 'NaN' is used when no location candidate could satisfy the business rule__

In [153]:
from prettytable import PrettyTable
cols = ['Class', 'Business Rule','#1','#2','#3','#4']
t = PrettyTable(cols)

for cname,y in ranking.items():
    for business_rule,rows in ranking['class1'].items():
        #print("Business Rule: {}".format(business_rule.replace('br','')))
        foo = [cname,business_rule]
        for i,r in rows.items():
            #print("#{} {}".format(r+1,location_candidates[i]['name']), end="\t")
            foo.append(location_candidates[i]['name'])
        #print("\n")
        if len(foo) <= len(cols)-1:
            while len(foo) <= len(cols)-1:
                foo.append('NaN')
        t.add_row(foo)
    
print(t)

+--------+---------------+---------------+-------------+---------------+-----------+
| Class  | Business Rule |       #1      |      #2     |       #3      |     #4    |
+--------+---------------+---------------+-------------+---------------+-----------+
| class1 |      br1      |    Òran Mór   | Park Circus | Merchant City |    NaN    |
| class1 |      br2      | Merchant City |   Òran Mór  |  Park Circus  | Craighton |
| class1 |      br3      |  Park Circus  |     NaN     |      NaN      |    NaN    |
| class1 |      br4      |    Òran Mór   | Park Circus | Merchant City |    NaN    |
| class2 |      br1      |    Òran Mór   | Park Circus | Merchant City |    NaN    |
| class2 |      br2      | Merchant City |   Òran Mór  |  Park Circus  | Craighton |
| class2 |      br3      |  Park Circus  |     NaN     |      NaN      |    NaN    |
| class2 |      br4      |    Òran Mór   | Park Circus | Merchant City |    NaN    |
+--------+---------------+---------------+-------------+---------------+-----------+
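
The class 2 rows in this table repeat the class 1 rows. This is what the cell above produces, because its inner loop reads `ranking['class1']` on every pass of the outer loop. A minimal sketch of a loop that reads each class's own rules instead is shown below; it uses plain lists rather than `PrettyTable`, and `build_rows`, `sample_ranking` and `names` are hypothetical names with made-up values. Applying it to the real `ranking` dictionary may give different class 2 rows than those printed above.

```python
def build_rows(ranking, names, n_places=4):
    """Build one table row per class and business rule."""
    rows = []
    for class_name, rules in ranking.items():
        # iterate over the rules of *this* class
        for rule, placed in rules.items():
            row = [class_name, rule] + [names[c] for c in placed]
            # pad with 'NaN' when fewer than n_places candidates qualified
            row += ["NaN"] * (2 + n_places - len(row))
            rows.append(row)
    return rows


# Made-up rankings in the same shape as the `ranking` dictionary above
sample_ranking = {
    "class1": {"br1": {"a": 0, "b": 1}},
    "class2": {"br1": {"b": 0}},
}
names = {"a": "Site A", "b": "Site B"}

for row in build_rows(sample_ranking, names):
    print(row)
# ['class1', 'br1', 'Site A', 'Site B', 'NaN', 'NaN']
# ['class2', 'br1', 'Site B', 'NaN', 'NaN', 'NaN']
```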

### Summary 
With this the analysis is concluded. 

Based on the given criteria and business rules, Òran Mór is the best location candidate. 
The location fulfils two (2) of the class 1 business rules and two (2) of the class 2 business rules. 

The next best location is Park Circus, fulfilling one (1) class 1 business rule and one (1) class 2 business rule. Also, Park Circus comes in second place for a total of four business rules (two class 1 and two class 2). 

Merchant City ranks in third place (one class 1 business rule and four class 2 business rules), while Craighton comes in last. 

__Therefore the recommendation is to look for a venue close to Òran Mór.__

# Results / Discussion

The analysis shows that each of the location candidates has its own merits, ranging from nearly no competition/supply/restaurants in the area (Craighton) to the opposite (Merchant City). Applying the business rules made certain that the criteria identified as 'vital' by the business owners were included in the analysis. 

For all location candidates, heatmaps show the density and distribution of venues for the selected categories, which should prove helpful in discussions with the business side. 

Displaying the 'class1' and 'class 2' areas per category on the maps makes it easier to (visually)  comprehend the environment of venues (restaurants, competitors, supply etc). 

Local subject matter experts and realtors now need to be included to identify available properties around the winning location. Also, additional information, e.g. socio-economic factors, city building plans, crime statistics etc., should be included in a more in-depth analysis to reach a final decision. 

# Conclusion

The defined task was to identify the most suitable of the four location candidates and to provide supporting data that empowers stakeholders to make a data-driven decision. 

Pre-defined business rules (aka criteria the location needs to fulfill) needed to be applied and the results evaluated. 

Òran Mór proved to be the best location within the scope of this analysis.
A final decision can be made by the board of BYO Ltd., in cooperation with local subject matter experts, once available properties around Òran Mór are identified and the financial aspect is evaluated. 