# Capstone Project - Donut Venue Location Analysis
### Applied Data Science Capstone by IBM/Coursera

## Table of Contents

* [Introduction](#introduction)
* [Neighbourhood Candidates](#candidates)
* [Foursquare](#foursquare)
* [Methodology](#methodology)
* [Result](#result)

## Introduction <a name="introduction"></a>
This analysis examines the current donut market in Glasgow city centre in order to choose a location for a new shop where there are not already too many competing establishments. Among the locations that satisfy this first condition, preference is given to those nearest the city centre.
The project generates a map and suggests some promising locations based on this market research.

In [1]:
#imports
import numpy as np
import pandas as pd
import folium
import json
import requests

In [2]:
#find coordinates for Glasgow City Centre with Nominatim
from geopy.geocoders import Nominatim
address = 'Glasgow'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Glasgow are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Glasgow are 55.8609825, -4.2488787.


In [294]:
#failsafe: uncomment the next two lines to hard-code the coordinates when Nominatim times out
#latitude = 55.8609825
#longitude = -4.2488787
#centre is used by the cells below, so it must always be defined
centre = [latitude, longitude]

## Neighbourhood Candidates <a name="candidates"></a>
To create a grid of candidate neighbourhoods, equally spaced around the city centre, the locations are laid out in a 2D Cartesian coordinate system. This allows distances to be calculated in metres instead of lat/lon degrees. The coordinates are then projected back to lat/lon degrees to be shown on a Folium map. The functions below convert between the WGS84 geographic coordinate system (latitude/longitude) and the UTM Cartesian coordinate system.

In [295]:
#!pip install shapely
import shapely

#!pip install pyproj
import pyproj

import math

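def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude (degrees) to UTM x/y (metres).
    # NOTE: zone=33 is kept so the outputs shown below stay reproducible;
    # Glasgow (about 4.25 degrees west) actually lies in UTM zone 30.
    # pyproj.transform is deprecated in newer pyproj releases (see pyproj.Transformer).
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy: UTM x/y (metres) back to longitude/latitude (degrees).
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]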

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

#test functions by entering glasgow coordinates and converting twice

print('Coordinate transformation check')
print('-------------------------------')
print('Glasgow centre longitude={}, latitude={}'.format(centre[1], centre[0]))
x, y = lonlat_to_xy(centre[1], centre[0])
print('Glasgow centre UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Glasgow centre longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Glasgow centre longitude=-4.2488787, latitude=55.8609825
Glasgow centre UTM X=-696065.7893588608, Y=6359489.415378951
Glasgow centre longitude=-4.248878699999992, latitude=55.86098249999999
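
As an independent sanity check on projected distances, a great-circle distance can be computed directly from latitude and longitude. The sketch below uses only the standard library. The function name `haversine_m` and the spherical Earth radius of 6,371 km are assumptions made for illustration and are not part of the notebook. The second point is the first candidate neighbourhood centre listed in the dataframe further down.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

centre_latlon = (55.8609825, -4.2488787)     # Glasgow centre from Nominatim
candidate_latlon = (55.848738, -4.245856)    # first candidate neighbourhood centre
d = haversine_m(*centre_latlon, *candidate_latlon)
print('Great-circle distance: {:.0f} m'.format(d))
```

A result that differs from the projected distance by a percent or two would be consistent with projecting in a UTM zone far from the city's own. Agreement to within a few tens of metres is enough to confirm that the conversion functions are behaving sensibly.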


The next cell creates a hexagonal grid of cell centres: every other row is offset horizontally by half a cell, and the vertical row spacing is scaled by √3/2 so that every cell centre is equally distant from its neighbours.

In [296]:
centre_x, centre_y = lonlat_to_xy(centre[1], centre[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = centre_x - 6000
x_step = 400
y_min = centre_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 400 * k 

latitudes = []
longitudes = []
distances_from_centre = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 200 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_centre = calc_xy_distance(centre_x, centre_y, x, y)
        if (distance_from_centre <= 1501):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_centre.append(distance_from_centre)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centres generated.')

48 candidate neighborhood centres generated.
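
The equal-spacing property of the offset grid can be checked in isolation. The following is a minimal sketch, not the notebook's exact construction: the hypothetical helper `hex_grid` centres the lattice on the origin point rather than on a bounding box, so the number of points it returns need not match the 48 above.

```python
import math

def hex_grid(cx, cy, spacing, max_dist):
    """Return (x, y) points of a hexagonal lattice within max_dist of (cx, cy)."""
    k = math.sqrt(3) / 2                        # row spacing factor for a hexagonal lattice
    n = int(max_dist / (spacing * k)) + 1       # rows/columns needed on each side
    points = []
    for i in range(-n, n + 1):
        y = cy + i * spacing * k
        x_offset = spacing / 2 if i % 2 else 0.0  # shift every other row by half a cell
        for j in range(-n - 1, n + 2):
            x = cx + j * spacing + x_offset
            if math.hypot(x - cx, y - cy) <= max_dist:
                points.append((x, y))
    return points

points = hex_grid(0.0, 0.0, 400.0, 1501.0)

# Smallest distance between any two distinct grid points
nearest = min(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])
print(len(points), 'points; closest pair is', round(nearest, 3), 'm apart')
```

The closest pair should be one cell spacing apart whether the two points share a row or sit in adjacent rows, which is the property the √3/2 factor is there to guarantee.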


Visualise these neighbourhoods with Folium

In [297]:
map_glasgow = folium.Map(location=centre, zoom_start=13)
folium.Marker(centre, popup='City Centre').add_to(map_glasgow)
for lat, lon in zip(latitudes, longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_glasgow) 
    folium.Circle([lat, lon], radius=200, color='blue', fill=False).add_to(map_glasgow)
    folium.Marker([lat, lon]).add_to(map_glasgow)
map_glasgow

Dataframe containing centre coordinates of each neighbourhood with distances to city centre

In [177]:
d = {'Latitude': latitudes, 'Longitude': longitudes, 'Distances':distances_from_centre}
neighbourhoods = pd.DataFrame(data=d)
neighbourhoods

Unnamed: 0,Latitude,Longitude,Distances
0,55.848738,-4.245856,1400.0
1,55.849719,-4.239824,1400.0
2,55.850206,-4.256413,1311.487705
3,55.851187,-4.250381,1113.552873
4,55.852167,-4.244349,1039.230485
5,55.853147,-4.238317,1113.552873
6,55.854127,-4.232285,1311.487705
7,55.852653,-4.260939,1216.552506
8,55.853635,-4.254907,916.515139
9,55.854615,-4.248875,721.110255


In [178]:
lat=latitudes
lng=longitudes

## Foursquare <a name="foursquare"></a>
The main challenge when building the Foursquare API query is choosing categories so that the query returns only venues that would compete with this donut business. The available categories are listed in the Foursquare documentation. Initially, all establishments that might sell donuts were considered. These were identified by searching for “Donut Shop” in Glasgow in the Foursquare app on Android, which returned donut shops, bakeries, cafes, and sandwich shops, since any of them may have donuts on the menu. However, many of the venues returned by this broad query did not sell donuts. Querying donut shops alone produced a data set that was too small and missed venues that do sell donuts. The final categories chosen were therefore donut shops and bakeries, as this combination gave the most accurate data.

Foursquare details removed for submission

In [3]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET: 


Set the categories to bakeries and donut shops. The category IDs are listed in the Foursquare documentation.

In [180]:
category = '4bf58dd8d48988d16a941735,4bf58dd8d48988d148941735' # bakery & donut shop
radius = 1500 # not used by getNearbyVenues below, which takes its own radius argument (default 200)
print(category + ' .... OK!')

4bf58dd8d48988d16a941735,4bf58dd8d48988d148941735 .... OK!


Find venues near each neighbourhood centre (search radius 200 m)

In [181]:
def getNearbyVenues(latitudes, longitudes, radius=200):
    
    venues_list=[]
    for lat, lng in zip(latitudes, longitudes):
        print('.')
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            category,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            #name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [#'Postcode', 
                  'Neighbourhood_Latitude', 
                  'Neighbourhood_Longitude', 
                  'Venue', 
                  'Venue_Latitude', 
                  'Venue_Longitude', 
                  'Venue_Category']
    
    return(nearby_venues)
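
The query string assembled in `getNearbyVenues` can also be built with `urllib.parse.urlencode`, which handles escaping and avoids a long positional `format` call. This is a sketch under assumptions: the helper name `build_explore_url` and the placeholder credentials are hypothetical, and the endpoint and parameter names are simply the ones used in the function above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint as used in the notebook's getNearbyVenues function
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lng, category_ids, client_id, client_secret,
                      radius=200, limit=100, version='20180604'):
    """Return the explore URL for one neighbourhood centre (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'categoryId': ','.join(category_ids),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

url = build_explore_url(
    55.857063, -4.253401,
    ['4bf58dd8d48988d16a941735', '4bf58dd8d48988d148941735'],  # bakery, donut shop
    client_id='YOUR_ID', client_secret='YOUR_SECRET',          # placeholders
)

# Parse the URL back to confirm the parameters survive the round trip
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['radius'])
```

Passing the parameters as a dictionary also makes it harder to swap two positional arguments by accident, and keeps the radius an explicit argument rather than a global.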

In [182]:
venues =getNearbyVenues(lat,lng)

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.


Dataframe containing all the venues, with venue location and neighbourhood location

In [183]:
print(venues.shape)
venues

(17, 6)


Unnamed: 0,Neighbourhood_Latitude,Neighbourhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category
0,55.857063,-4.253401,Greggs,55.858022,-4.253581,Bakery
1,55.857063,-4.253401,Greggs,55.857437,-4.254928,Bakery
2,55.858044,-4.247369,Greggs,55.857467,-4.24736,Bakery
3,55.858044,-4.247369,McDonalds Bakers,55.857741,-4.249066,Bakery
4,55.859511,-4.257928,Tantrum Doughnuts,55.860512,-4.255771,Donut Shop
5,55.859511,-4.257928,Krispy Kreme,55.859836,-4.257977,Donut Shop
6,55.859511,-4.257928,Greggs,55.860515,-4.257039,Bakery
7,55.859511,-4.257928,Greggs,55.858297,-4.256496,Bakery
8,55.860492,-4.251895,Greggs,55.860916,-4.251275,Bakery
9,55.861473,-4.245862,Greggs,55.860845,-4.243839,Bakery


Create dataframe containing neighbourhoods with venues

In [197]:
d = {'Latitude': venues.Neighbourhood_Latitude, 'Longitude': venues.Neighbourhood_Longitude}
nwv = pd.DataFrame(data=d)

nwv

Unnamed: 0,Latitude,Longitude
0,55.857063,-4.253401
1,55.857063,-4.253401
2,55.858044,-4.247369
3,55.858044,-4.247369
4,55.859511,-4.257928
5,55.859511,-4.257928
6,55.859511,-4.257928
7,55.859511,-4.257928
8,55.860492,-4.251895
9,55.861473,-4.245862


Visualise venues on top of neighbourhoods

In [185]:
area_venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)

for lat, lng, label in zip(venues.Venue_Latitude, venues.Venue_Longitude, venues.Venue_Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(area_venues_map)

for lat, lon in zip(latitudes, longitudes):    
        folium.Circle([lat, lon], radius=200, color='blue', fill=False).add_to(area_venues_map)
        
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Glasgow City Centre',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(area_venues_map)

# display map
area_venues_map

## Methodology <a name="methodology"></a>

Display only neighbourhoods with venues

In [204]:
map_nwv = folium.Map(location=centre, zoom_start=13)
folium.Marker(centre, popup='City Centre').add_to(map_nwv)
for lat, lon in zip(nwv.Latitude, nwv.Longitude):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_nwv) 
    folium.Circle([lat, lon], radius=200, color='blue', fill=False).add_to(map_nwv)
    folium.Marker([lat, lon]).add_to(map_nwv)
    
for lat, lng, label in zip(venues.Venue_Latitude, venues.Venue_Longitude, venues.Venue_Category):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(map_nwv)
        
map_nwv

Before merging, make a dataframe with only the coordinates of all neighbourhoods (excluding distance to centre).

In [222]:
d = {'Latitude': latitudes, 'Longitude': longitudes}
clean_neighbourhoods = pd.DataFrame(data=d)
clean_neighbourhoods

Unnamed: 0,Latitude,Longitude
0,55.848738,-4.245856
1,55.849719,-4.239824
2,55.850206,-4.256413
3,55.851187,-4.250381
4,55.852167,-4.244349
5,55.853147,-4.238317
6,55.854127,-4.232285
7,55.852653,-4.260939
8,55.853635,-4.254907
9,55.854615,-4.248875


Find the neighbourhoods that do not appear in the "neighbourhoods with venues" dataframe, i.e. the neighbourhoods without venues. Each neighbourhood is matched on its (Latitude, Longitude) pair rather than on each column separately.

In [228]:
# left merge with an indicator column; rows marked 'left_only' have no match in nwv
merged = clean_neighbourhoods.merge(nwv.drop_duplicates(), on=['Latitude','Longitude'], how='left', indicator=True)
empty_n = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
empty_n

Unnamed: 0,Latitude,Longitude
0,55.848738,-4.245856
1,55.849719,-4.239824
2,55.850206,-4.256413
3,55.851187,-4.250381
4,55.852167,-4.244349
5,55.853147,-4.238317
6,55.854127,-4.232285
7,55.852653,-4.260939
8,55.853635,-4.254907
9,55.854615,-4.248875
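
Matching on coordinate pairs and matching on each column separately can give different answers. The sketch below shows the difference with plain Python sets. The four sample points are hypothetical and chosen only to make the effect visible.

```python
# Hypothetical 2 x 2 block of neighbourhood centres (lat, lon)
all_points = [(55.0, -4.0), (55.0, -4.1), (55.1, -4.0), (55.1, -4.1)]
# Two of them contain venues
with_venues = [(55.0, -4.0), (55.1, -4.1)]

# Pairwise anti-join: a point is empty if its (lat, lon) pair has no venues
occupied = set(with_venues)
pairwise_empty = [p for p in all_points if p not in occupied]

# Column-wise check: a point is kept only if neither its latitude nor its
# longitude appears anywhere among the occupied points
lats = {lat for lat, _ in with_venues}
lons = {lon for _, lon in with_venues}
columnwise_empty = [p for p in all_points if p[0] not in lats and p[1] not in lons]

print('pairwise:   ', pairwise_empty)
print('column-wise:', columnwise_empty)
```

In pandas, the pairwise version corresponds to a left merge on both columns with `indicator=True`, keeping the rows marked `left_only`.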


Plot neighbourhoods without venues

In [226]:
map_empty_n = folium.Map(location=centre, zoom_start=13)
folium.Marker(centre, popup='City Centre').add_to(map_empty_n)
for lat, lon in zip(empty_n.Latitude, empty_n.Longitude):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_empty_n) 
    folium.Circle([lat, lon], radius=200, color='blue', fill=False).add_to(map_empty_n)
    folium.Marker([lat, lon]).add_to(map_empty_n)
           
map_empty_n

Merge empty neighbourhoods with all neighbourhoods to find distances for empty neighbourhoods

In [232]:
common = empty_n.merge(neighbourhoods,on=['Latitude','Longitude'])
common

Unnamed: 0,Latitude,Longitude,Distances
0,55.848738,-4.245856,1400.0
1,55.849719,-4.239824,1400.0
2,55.850206,-4.256413,1311.487705
3,55.851187,-4.250381,1113.552873
4,55.852167,-4.244349,1039.230485
5,55.853147,-4.238317,1113.552873
6,55.854127,-4.232285,1311.487705
7,55.852653,-4.260939,1216.552506
8,55.853635,-4.254907,916.515139
9,55.854615,-4.248875,721.110255


Sort by distance in ascending order and keep only the three closest neighbourhoods

In [257]:
top_three=common.sort_values(by='Distances')[:3]
top_three

Unnamed: 0,Latitude,Longitude,Distances
25,55.864902,-4.244355,529.150262
15,55.859025,-4.241336,529.150262
20,55.862453,-4.239829,600.0


Load the G1 and G2 postcode boundary data

In [258]:
with open('G1G2.geojson') as f:
    world_geo = json.load(f)

Plot closest three locations to city centre with G1 and G2 postcodes

In [259]:
map_top_three = folium.Map(location=centre, zoom_start=13)
folium.Marker(centre, popup='City Centre').add_to(map_top_three)
for lat, lon in zip(top_three.Latitude, top_three.Longitude):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_top_three) 
    folium.Circle([lat, lon], radius=200, color='blue', fill=False).add_to(map_top_three)
    folium.Marker([lat, lon]).add_to(map_top_three)
    
folium.GeoJson(data=world_geo, name='geojson').add_to(map_top_three)

        
map_top_three

The map shows that only one of the three candidate locations, at (55.859025, -4.241336), lies within the G1 and G2 postcode areas. The next step is to find the postcodes that fall inside this neighbourhood.
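
The visual check could also be done programmatically with a point-in-polygon test. The following is a minimal ray-casting sketch in plain Python. The unit square used as the polygon is a made-up stand-in, not the real G1/G2 boundary, and `point_in_polygon` is a hypothetical helper name. Real input would be the coordinate rings stored in `G1G2.geojson`, assuming they are simple polygons.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count how many polygon edges a horizontal ray from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside   # each crossing flips inside/outside
    return inside

# Made-up polygon: the unit square, vertices listed as (x, y)
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

print(point_in_polygon(0.5, 0.5, square))   # point in the middle of the square
print(point_in_polygon(1.5, 0.5, square))   # point to the right of the square
```

For GeoJSON data the x value is the longitude and the y value is the latitude, since GeoJSON stores positions in that order.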

In [247]:
#find city centre postcodes within a 200 m radius of the candidate location
#only one candidate location contains city centre postcodes:
#55.859025 -4.241336
with open('centroidsG1G2.geojson') as f:
    centroids_geo = json.load(f)  # separate name so the boundary data in world_geo is not overwritten

neighborhoods_data = centroids_geo['features']
# define the dataframe columns
column_names = ['Postcode', 'Latitude', 'Longitude'] 

# collect one row per feature; GeoJSON coordinates are ordered [lon, lat]
# (DataFrame.append was removed in pandas 2.0, so rows are gathered in a list)
rows = []
for data in neighborhoods_data:
    postcode = data['properties']['masterpc'] 
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]
    rows.append({'Postcode': postcode,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

Unnamed: 0,Postcode,Latitude,Longitude
0,G2 5QD,55.861583,-4.260778
1,G1 1RE,55.860224,-4.246293
2,G2 4PQ,55.864392,-4.268865
3,G1 1HJ,55.859841,-4.249525
4,G1 4PD,55.857037,-4.261136
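
The structure that the loop above expects can be illustrated with a small in-memory GeoJSON document. This sketch is standard-library only. The two features reuse postcodes and coordinates from the table above, and the layout of the real `centroidsG1G2.geojson` file is assumed to follow the same pattern (a `masterpc` property and a point geometry).

```python
import json

# Two sample features, written in the layout the notebook's loop reads
sample = json.loads('''
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"masterpc": "G2 5QD"},
   "geometry": {"type": "Point", "coordinates": [-4.260778, 55.861583]}},
  {"type": "Feature", "properties": {"masterpc": "G1 1RE"},
   "geometry": {"type": "Point", "coordinates": [-4.246293, 55.860224]}}
]}
''')

rows = []
for feature in sample['features']:
    lon, lat = feature['geometry']['coordinates'][:2]  # GeoJSON order is [lon, lat]
    rows.append({'Postcode': feature['properties']['masterpc'],
                 'Latitude': lat,
                 'Longitude': lon})

for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed straight to `pd.DataFrame`, which builds the table in one step instead of growing it row by row.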


Apply lonlat_to_xy to find the X Y coordinates of the centroid of each postcode area

In [269]:
neighborhoods_t = neighborhoods.apply (lambda row : lonlat_to_xy(row.Longitude, row.Latitude), axis=1)
neighborhoods_t

0     (-696774.9968963135, 6359765.282329559)
1      (-695931.412473761, 6359361.164217636)
2     (-697181.3114617155, 6360214.048413507)
3     (-696141.2210566797, 6359376.591014969)
4      (-696939.9256205589, 6359277.00857053)
                       ...                   
90    (-696282.4265754838, 6358876.943777269)
91    (-695997.0822119878, 6359250.541056749)
92      (-696164.34175008, 6359431.656944119)
93    (-696541.1969862981, 6359478.724421603)
94    (-696534.0717766469, 6359008.105294452)
Length: 95, dtype: object

Place these X Y coordinates in a new dataframe

In [271]:
neighborhoods_xy=neighborhoods.copy()  # copy so the original dataframe is left unchanged
neighborhoods_xy["X"]=[i[0] for i in neighborhoods_t]
neighborhoods_xy["Y"]=[i[1] for i in neighborhoods_t]
neighborhoods_xy

Unnamed: 0,Postcode,Latitude,Longitude,X,Y
0,G2 5QD,55.861583,-4.260778,-696774.996896,6.359765e+06
1,G1 1RE,55.860224,-4.246293,-695931.412474,6.359361e+06
2,G2 4PQ,55.864392,-4.268865,-697181.311462,6.360214e+06
3,G1 1HJ,55.859841,-4.249525,-696141.221057,6.359377e+06
4,G1 4PD,55.857037,-4.261136,-696939.925621,6.359277e+06
...,...,...,...,...,...
90,G1 5HE,55.855256,-4.249476,-696282.426575,6.358877e+06
91,G1 1TF,55.859125,-4.246801,-695997.082212,6.359251e+06
92,G1 1JG,55.860251,-4.250113,-696164.341750,6.359432e+06
93,G1 3LN,55.859726,-4.256002,-696541.196986,6.359479e+06


Find XY for candidate location

In [275]:
#calculate xy for candidate location
x_t, y_t = lonlat_to_xy(-4.241336, 55.859025)

Find the distance from every postcode centroid to the candidate location

In [276]:
neighborhoods_xy_d = neighborhoods_xy
neighborhoods_xy_d["Distance"] = neighborhoods_xy.apply (lambda row : calc_xy_distance(x_t, y_t, row.X, row.Y), axis=1)
neighborhoods_xy_d

Unnamed: 0,Postcode,Latitude,Longitude,X,Y,Distance
0,G2 5QD,55.861583,-4.260778,-696774.996896,6.359765e+06,1271.806390
1,G1 1RE,55.860224,-4.246293,-695931.412474,6.359361e+06,343.689227
2,G2 4PQ,55.864392,-4.268865,-697181.311462,6.360214e+06,1855.750026
3,G1 1HJ,55.859841,-4.249525,-696141.221057,6.359377e+06,529.686394
4,G1 4PD,55.857037,-4.261136,-696939.925621,6.359277e+06,1281.154099
...,...,...,...,...,...,...
90,G1 5HE,55.855256,-4.249476,-696282.426575,6.358877e+06,671.607062
91,G1 1TF,55.859125,-4.246801,-695997.082212,6.359251e+06,348.287709
92,G1 1JG,55.860251,-4.250113,-696164.341750,6.359432e+06,576.053755
93,G1 3LN,55.859726,-4.256002,-696541.196986,6.359479e+06,937.550794


Sort by distance and keep the 11 closest postcodes, all of which lie within the 200 m radius.

In [284]:
top_n=neighborhoods_xy_d.sort_values(by='Distance')[:11]
top_n

Unnamed: 0,Postcode,Latitude,Longitude,X,Y,Distance
71,G1 1PZ,55.859112,-4.240988,-695641.76554,6359146.0,24.260685
72,G1 1QN,55.859262,-4.241451,-695665.389769,6359171.0,27.778503
74,G1 1QL,55.858528,-4.240978,-695659.488655,6359083.0,60.71055
73,G1 1QG,55.859024,-4.24007,-695588.323659,6359121.0,80.663246
34,G1 1QH,55.8597,-4.240883,-695616.861983,6359208.0,81.665165
16,G1 1HE,55.859047,-4.242796,-695754.417127,6359171.0,93.001539
20,G1 1HF,55.858717,-4.242813,-695765.883086,6359136.0,100.363915
6,G1 1BL,55.858413,-4.242626,-695763.959697,6359099.0,107.481341
66,G1 1EX,55.859996,-4.243223,-695750.727997,6359282.0,162.89865
33,G1 1PA,55.858304,-4.24393,-695847.177424,6359110.0,184.292956
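
Selecting a fixed number of rows relies on having counted how many postcodes fall inside the radius. An alternative is to filter on the distance itself. The sketch below uses plain Python and a handful of distances copied from the tables above. The variable names and the choice of sample rows are illustrative only.

```python
RADIUS_M = 200  # same radius as the neighbourhood circles

# Postcode -> distance to the candidate location in metres (sample of the rows above)
sample_distances = {
    'G1 1PZ': 24.260685,
    'G1 1QN': 27.778503,
    'G1 1EX': 162.89865,
    'G1 1PA': 184.292956,
    'G1 1RE': 343.689227,
    'G1 1HJ': 529.686394,
}

# Keep postcodes inside the radius, ordered from nearest to farthest
within_radius = sorted(
    (pc for pc, d in sample_distances.items() if d <= RADIUS_M),
    key=sample_distances.get,
)
print(within_radius)
```

With pandas the same filter would be a boolean mask on the `Distance` column, which keeps working if the radius or the candidate location changes.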


Plot these postcodes on map with boundary data

In [287]:
map_top_n = folium.Map(location=centre, zoom_start=13)
folium.Marker(centre, popup='City Centre').add_to(map_top_n)
for lat, lon in zip(top_n.Latitude, top_n.Longitude):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_top_n) 
    folium.Marker([lat, lon]).add_to(map_top_n)

folium.GeoJson(data=world_geo, name='geojson').add_to(map_top_n)

        
map_top_n

The file candidate.geojson was created in QGIS and contains only the features whose masterpc (postcode) appears in the previous dataframe.

In [289]:
with open('candidate.geojson') as f:
    candidate = json.load(f)

Plot candidate.geojson on Folium map

In [290]:
map_candidate = folium.Map(location=centre, zoom_start=13)
folium.Marker(centre, popup='City Centre').add_to(map_candidate)
folium.GeoJson(data=candidate, name='geojson').add_to(map_candidate)   
map_candidate

## Result <a name="result"></a>
This final map displays a heatmap of the current venues and the location of the proposed new donut shop.

In [291]:
from folium import plugins
from folium.plugins import HeatMap


map_heat = folium.Map(location=[latitude, longitude],
                    zoom_start = 14) 

# List comprehension to make out list of lists
heat_data = [[row['Venue_Latitude'],row['Venue_Longitude']] for index, row in venues.iterrows()]

folium.GeoJson(data=candidate, name='geojson').add_to(map_heat)
# Plot it on the map
HeatMap(heat_data).add_to(map_heat)
folium.Marker(centre, popup='City Centre').add_to(map_heat)
# Display the map
map_heat