## Coursera Capstone

This notebook will be mainly used for the capstone project!

In [2]:
import pandas as pd
import numpy as np

print ('Hello Capstone Project Course!')

Hello Capstone Project Course!


### Introduction/Business Problem

The manager of a chain of niche bakeries has been in touch because the company, having opened shops in 10 locations across England, would like to expand into Scotland. They have asked me to compare the postcodes of the Scottish capital, Edinburgh, to identify the location best suited to their business. Of particular interest are the existing cafés, bistros and bakeries that would naturally compete with the new branch for customers, as well as the general layout of the city. The client's key requirement is that the bakery be centrally located, to maximise foot traffic during busy periods such as festivals.

### Data

I will use the Foursquare API (https://foursquare.com/) for this analysis:
- The location of interest is Edinburgh.
- Using the latitude and longitude of each postcode in the city, I will explore the city centre, paying specific attention to eateries (cafés, bistros and bakeries). This will involve Foursquare's venue explore endpoint and a k-means clustering algorithm.
- Finally, I will use the Folium library to map out the city, showing the clusters of eateries in Edinburgh.

To obtain coordinates for each location, I will use the '2020-2 Scottish Postcode Directory Files' datasets provided by National Records of Scotland. This is a list of active and deleted postcodes in Scotland, freely available online (https://www.nrscotland.gov.uk/statistics-and-data/geography/nrs-postcode-extract).


# Analysis


To install and import the required libraries:


In [3]:
import numpy as np # to handle vectors

import pandas as pd # for analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # to convert an address into latitude and longitude

import requests # to handle requests
from pandas.io.json import json_normalize # to transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')


### 1. Explore Dataset

To import the dataset containing coordinates of different postcodes in Scotland:

In [4]:

import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

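def __iter__(self): return 0

# NOTE: the cell that creates `body` (the file-like object for the stored CSV)
# is not shown here, so this cell does not run on its own.
df_data_1 = pd.read_csv(body)

df_data_1.head(20)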




Unnamed: 0,Postcode,PostcodeDistrict,PostcodeSector,DateOfIntroduction,DateOfDeletion,GridReferenceEasting,GridReferenceNorthing,Latitude,Longitude,SplitIndicator,CouncilArea2019Code,UKParliamentaryConstituency2005Code,ScottishParliamentaryRegion2014Code,ScottishParliamentaryConstituency2014Code,ElectoralWard2019Code,HealthBoardArea2019Code,HealthBoardArea2006Code,HealthBoardArea1995Code,IntegrationAuthority2019Code,OutputArea2011Code,OutputArea2001Code,OutputArea1991Code,DataZone2011Code,DataZone2001Code,IntermediateZone2011Code,IntermediateZone2001Code,CensusHouseholdCount2011,CensusPopulationCount2011,CensusHouseholdCount2001,CensusPopulationCount2001,CensusHouseholdCount1991,CensusPopulationCount1991,ScottishIndexOfMultipleDeprivation2020Rank,LAU2019Level1Code,NUTS2018Level2Code,NUTS2018Level3Code,Locality2016Code,Locality2001Code,Locality1991Code,Settlement2016Code,Settlement2001Code,CivilParish1930Code,EnterpriseRegion2008Code,Islands2020Code,LocalGovernmentDistrict1995Code,LocalGovernmentDistrict1991Code,NationalPark2010Code,RegistrationDistrict2007Code,ROACommunityPlanningPartnership2006Code,ROALocal2006Code,StrategicDevelopmentPlanningArea2013Code,TravelToWorkArea2011Code,UrbanRural6Fold2016Code,UrbanRural8Fold2016Code,GridLinkIndicator,GridLinkPositionalAccuracy,NeverDigitised
0,AB1 0AA,AB1,AB1 0,1/1/1980 00:00:00,1/6/1996 00:00:00,385386,801193,57.101482,-2.242872,N,S12000033,S14000002,S17000014,S16000076,S13002843,S08000020,S08000006,2,S37000001,S00090303,S00001364,6015AK12A,S01006514,S01000011,S02001237,S02000007,,,,,1.0,2.0,6715,S30000026,UKM5,UKM50,S19001711,,2.0,S20001422,,S35000689,S09000001,0,15,15,,S12000033,,,S11000001,S22000047,3,3,N,,
1,AB1 0AB,AB1,AB1 0,1/8/1973 00:00:00,1/4/1996 00:00:00,384939,801420,57.103507,-2.250264,N,S12000033,S14000002,S17000014,S16000076,S13002843,S08000020,S08000006,2,S37000001,S00090303,S00001270,6015AK12A,S01006514,S01000011,S02001237,S02000007,,,,,1.0,3.0,6715,S30000026,UKM5,UKM50,S19001711,402001.0,2.0,S20001422,402.0,S35000689,S09000001,0,15,15,,S12000033,,,S11000001,S22000047,3,3,N,,
2,AB1 0AD,AB1,AB1 0,1/8/1973 00:00:00,1/4/1996 00:00:00,384939,800980,57.099555,-2.250237,N,S12000033,S14000002,S17000014,S16000076,S13002843,S08000020,S08000006,2,S37000001,S00090399,S00001364,6015AK11A,S01006514,S01000011,S02001237,S02000007,,,,,7.0,21.0,6715,S30000026,UKM5,UKM50,,,2.0,,,S35000689,S09000001,0,15,15,,S12000033,,,S11000001,S22000047,5,6,N,,
3,AB1 0AE,AB1,AB1 0,1/2/1994 00:00:00,1/4/1996 00:00:00,384599,799300,57.084452,-2.255745,N,S12000034,S14000058,S17000014,S16000076,S13002864,S08000020,S08000006,2,S37000002,S00091322,S00002142,6018AC03B,S01006853,S01000333,S02001296,S02000061,,,,,,,5069,S30000027,UKM5,UKM50,,,,,,S35000593,S09000001,0,18,18,,S12000034,,,S11000001,S22000047,5,6,N,,
4,AB1 0AJ,AB1,AB1 0,1/8/1973 00:00:00,1/4/1996 00:00:00,384739,801010,57.099817,-2.25354,N,S12000033,S14000002,S17000014,S16000076,S13002843,S08000020,S08000006,2,S37000001,S00090399,S00001364,6015AK11A,S01006514,S01000011,S02001237,S02000007,,,,,3.0,7.0,6715,S30000026,UKM5,UKM50,S19001778,,2.0,S20001422,,S35000689,S09000001,0,15,15,,S12000033,,,S11000001,S22000047,3,3,N,,
5,AB1 0AL,AB1,AB1 0,1/8/1973 00:00:00,1/4/1996 00:00:00,384739,801260,57.102063,-2.253555,N,S12000033,S14000002,S17000014,S16000076,S13002843,S08000020,S08000006,2,S37000001,S00090381,S00001364,6015AK11A,S01006511,S01000011,S02001236,S02000007,,,,,3.0,11.0,6253,S30000026,UKM5,UKM50,S19001778,,2.0,S20001422,,S35000689,S09000001,0,15,15,,S12000033,,,S11000001,S22000047,3,3,N,,
6,AB1 0AP,AB1,AB1 0,1/8/1973 00:00:00,1/4/1996 00:00:00,385019,800740,57.097401,-2.248902,N,S12000033,S14000002,S17000014,S16000076,S13002843,S08000020,S08000006,2,S37000001,S00090399,S00001364,6015AK11B,S01006514,S01000011,S02001237,S02000007,,,,,0.0,0.0,6715,S30000026,UKM5,UKM50,,,2.0,,,S35000689,S09000001,0,15,15,,S12000033,,,S11000001,S22000047,5,6,N,,
7,AB1 0AQ,AB1,AB1 0,1/8/1973 00:00:00,1/4/1996 00:00:00,384799,800870,57.098562,-2.252541,N,S12000033,S14000002,S17000014,S16000076,S13002843,S08000020,S08000006,2,S37000001,S00090399,S00001269,6015AK11B,S01006514,S01000007,S02001237,S02000003,,,,,1.0,1.0,6715,S30000026,UKM5,UKM50,S19001778,402001.0,2.0,S20001422,402.0,S35000689,S09000001,0,15,15,,S12000033,,,S11000001,S22000047,3,3,N,,
8,AB1 0AR,AB1,AB1 0,1/8/1973 00:00:00,1/4/1996 00:00:00,386539,800040,57.09116,-2.223778,N,S12000034,S14000058,S17000014,S16000076,S13002864,S08000020,S08000006,2,S37000002,S00092401,S00003113,6018AC03A,S01006853,S01000333,S02001296,S02000061,,,,,9.0,28.0,5069,S30000027,UKM5,UKM50,,,,,,S35000593,S09000001,0,18,18,,S12000034,,,S11000001,S22000047,5,6,N,,
9,AB1 0AS,AB1,AB1 0,1/8/1973 00:00:00,1/4/1996 00:00:00,386239,799140,57.083067,-2.228679,N,S12000034,S14000058,S17000014,S16000076,S13002864,S08000020,S08000006,2,S37000002,S00092401,S00003113,6018AC03A,S01006853,S01000333,S02001296,S02000061,,,,,17.0,49.0,5069,S30000027,UKM5,UKM50,,,,,,S35000593,S09000001,0,18,18,,S12000034,,,S11000001,S22000047,5,6,N,,


In [5]:
# eliminate 'dead' postcodes (those with a deletion date) from the dataset
live_df = df_data_1[df_data_1['DateOfDeletion'].isnull()]

# new dataframe containing only the relevant columns, with a fresh index
postcodes = live_df[['Postcode', 'Latitude', 'Longitude']].reset_index(drop=True)

postcodes.head()


Unnamed: 0,Postcode,Latitude,Longitude
0,AB42 0HJ,57.414884,-1.84619
1,AB42 0HL,57.426715,-1.913715
2,AB42 0HP,57.418375,-1.85147
3,AB42 0HW,57.4221,-1.84939
4,AB42 0HX,57.423733,-1.913722


The dataframe now contains only postcodes that are currently in use in Scotland.

### 2. Define function

To declare a function that will use Foursquare credentials to query the venues in postcodes within Edinburgh:

In [6]:
CLIENT_ID = '' # my Foursquare ID
CLIENT_SECRET = '' # my Foursquare Secret
VERSION = '' # Foursquare API version
LIMIT = 80 # limit of number of venues returned by Foursquare API

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId=4bf58dd8d48988d1d0941735,4bf58dd8d48988d1e0931735,4bf58dd8d48988d16a941735,52e81612bcbc57f1066b79f1,4bf58dd8d48988d16d941735'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Postcode', 
                  'Postcode Latitude', 
                  'Postcode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
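
The list comprehension in `getNearbyVenues` assumes that every response contains `response -> groups[0] -> items` and that every venue has at least one category; a single response that breaks this pattern would raise and abort the loop over all postcodes. The following is a minimal sketch of a more defensive parser, not the function used in this notebook. `parse_explore_items` is a hypothetical helper, and the sample payload is made up to mirror only the keys indexed above.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore-style response into row tuples.

    Hypothetical helper: returns an empty list instead of raising when the
    expected keys are missing, and skips venues with incomplete records.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip incomplete records rather than fail the whole run
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"], categories[0]["name"]))
    return rows


# Made-up payload mirroring the structure indexed in getNearbyVenues.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Bakery",
               "location": {"lat": 55.95, "lng": -3.19},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "No Category",
               "location": {"lat": 55.95, "lng": -3.19},
               "categories": []}},
]}]}}

rows = parse_explore_items(sample, "EH1 1BB", 55.952027, -3.189565)
empty = parse_explore_items({}, "EH1 1BB", 55.952027, -3.189565)
print(rows)
print(empty)
```

The returned tuples have the same seven fields, in the same order, as the column names assigned to `nearby_venues`, so a helper like this could replace the comprehension without changing the resulting dataframe.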

To get the geographical coordinates of Edinburgh:

In [10]:
# Edinburgh
address = 'Edinburgh, UK'

geolocator = Nominatim(user_agent="scot_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Edinburgh are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Edinburgh are 55.9533456, -3.1883749.


Given the client's requirement to site the bakery in the city centre, I limit the dataset to postcodes within 1 km of the Edinburgh coordinates obtained above:

In [28]:
from geopy.distance import geodesic

def cleanPostcodes(origin_latitude, origin_longitude, postcodes_original, radius=1000):
    center = (origin_latitude, origin_longitude)

    new_rows = []
    for i, x in postcodes_original.iterrows():
        p = (x['Latitude'], x['Longitude'])
        if geodesic(p, center).meters < radius:
            new_rows.append(x.values)
        
    postcodes_cleaned = pd.DataFrame(new_rows, columns=postcodes_original.columns)
    return postcodes_cleaned

In [29]:
postcodes_edb = cleanPostcodes(latitude, longitude, postcodes)

In [30]:
postcodes_edb.shape

(667, 3)

This leaves a total of 667 postcodes to be used for Foursquare calls.
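
`cleanPostcodes` loops over every live postcode in Scotland and computes one geodesic distance per row. As a hedged alternative, the same radius filter can be sketched with a vectorised haversine formula in NumPy. This swaps geopy's ellipsoidal geodesic for a spherical approximation, so distances differ slightly and postcodes very close to the 1 km boundary may be classified differently. The names `haversine_m` and `within_radius` are hypothetical.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; accepts scalars or NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def within_radius(df, origin_lat, origin_lon, radius=1000):
    """Return the rows of df whose Latitude/Longitude lie inside radius metres."""
    dist = haversine_m(df["Latitude"].to_numpy(), df["Longitude"].to_numpy(),
                       origin_lat, origin_lon)
    return df.loc[dist < radius].reset_index(drop=True)


# Two postcodes taken from the tables in this notebook:
# one in central Edinburgh and one in the AB42 district.
sample = pd.DataFrame({
    "Postcode": ["EH1 1BB", "AB42 0HJ"],
    "Latitude": [55.952027, 57.414884],
    "Longitude": [-3.189565, -1.84619],
})
near = within_radius(sample, 55.9533456, -3.1883749)
print(near)
```

Only the central postcode survives the filter. The whole distance column is computed in one array operation, which avoids the per-row Python loop.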

In [31]:
edinburgh_venues = getNearbyVenues(names = postcodes_edb['Postcode'],
                                   latitudes = postcodes_edb['Latitude'],
                                   longitudes = postcodes_edb['Longitude']
                                  )

In [32]:
edinburgh_venues.head()

Unnamed: 0,Postcode,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,EH1 1BL,55.950658,-3.191749,artisan gelato,55.950581,-3.19065,Gelato Shop
1,EH1 1BB,55.952027,-3.189565,Cornish Pasty Co,55.952224,-3.189511,Bakery
2,EH1 1BB,55.952027,-3.189565,Millie's Cookies,55.951883,-3.189436,Bakery
3,EH1 1BB,55.952027,-3.189565,Greggs,55.952774,-3.190335,Bakery
4,EH1 1DE,55.950954,-3.189996,artisan gelato,55.950581,-3.19065,Gelato Shop


In [33]:
edinburgh_venues.shape

(479, 7)

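Because the search radii of neighbouring postcodes overlap, the same venue is returned several times, so I drop duplicate venues. Note that deduplicating on the venue name alone also merges distinct branches that share a name: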

In [36]:
edinburgh_venues.drop_duplicates(subset = 'Venue', inplace = True)
edinburgh_venues.head()

Unnamed: 0,Postcode,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,EH1 1BL,55.950658,-3.191749,artisan gelato,55.950581,-3.19065,Gelato Shop
1,EH1 1BB,55.952027,-3.189565,Cornish Pasty Co,55.952224,-3.189511,Bakery
2,EH1 1BB,55.952027,-3.189565,Millie's Cookies,55.951883,-3.189436,Bakery
3,EH1 1BB,55.952027,-3.189565,Greggs,55.952774,-3.190335,Bakery
7,EH1 1AD,55.948907,-3.192608,Demijohn,55.94862,-3.193979,Gourmet Shop
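
Deduplicating on `Venue` alone treats two branches of the same chain as a single venue. A minimal sketch of a stricter key follows, assuming that name plus coordinates identifies a venue. The second Greggs row and the postcode `EH2 0XX` are invented for illustration and do not come from the Foursquare results.

```python
import pandas as pd

venues = pd.DataFrame({
    "Postcode": ["EH1 1BL", "EH1 1DE", "EH1 1BB", "EH2 0XX"],
    "Venue": ["artisan gelato", "artisan gelato", "Greggs", "Greggs"],
    "Venue Latitude": [55.950581, 55.950581, 55.952774, 55.9500],
    "Venue Longitude": [-3.19065, -3.19065, -3.190335, -3.2000],
})

# Name only: the invented second Greggs branch is discarded as a duplicate.
by_name = venues.drop_duplicates(subset="Venue")

# Name plus coordinates: only the true repeat (same gelato shop) is removed.
by_place = venues.drop_duplicates(
    subset=["Venue", "Venue Latitude", "Venue Longitude"])

print(len(by_name), len(by_place))  # prints "2 3"
```

Which key is appropriate depends on whether each branch should count as a separate competitor, which seems likely for this business question.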


To count the venues remaining for each postcode:

In [37]:
edinburgh_venues.groupby('Postcode').count()

Postcode,Postcode Latitude,Postcode Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
EH1 1AD,1,1,1,1,1,1
EH1 1BB,3,3,3,3,3,3
EH1 1BL,1,1,1,1,1,1
EH1 1BQ,1,1,1,1,1,1
EH1 1DR,3,3,3,3,3,3
EH1 1EZ,1,1,1,1,1,1
EH1 1HR,2,2,2,2,2,2
EH1 1JQ,1,1,1,1,1,1
EH1 1JX,1,1,1,1,1,1
EH1 1LS,1,1,1,1,1,1


In [39]:
print('There are {} unique categories.'.format(len(edinburgh_venues['Venue Category'].unique())))

There are 14 unique categories.


One-hot encoding to prepare for k-means clustering:

In [63]:
# one hot encoding
edinburgh_onehot = pd.get_dummies(edinburgh_venues[['Venue Category']], prefix="", prefix_sep="")

# add postcode column back to dataframe
edinburgh_onehot['Postcode'] = edinburgh_venues['Postcode'] 

# move postcode column to the first column
fixed_columns = [edinburgh_onehot.columns[-1]] + list(edinburgh_onehot.columns[:-1])
edinburgh_onehot = edinburgh_onehot[fixed_columns]

edinburgh_onehot.head()

Unnamed: 0,Postcode,Bakery,Café,Candy Store,Creperie,Cupcake Shop,Dessert Shop,Gelato Shop,Gift Shop,Gourmet Shop,Ice Cream Shop,Pastry Shop,Pie Shop,Tea Room,Thai Restaurant
0,EH1 1BL,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,EH1 1BB,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,EH1 1BB,1,0,0,0,0,0,0,0,0,0,0,0,0,0
3,EH1 1BB,1,0,0,0,0,0,0,0,0,0,0,0,0,0
7,EH1 1AD,0,0,0,0,0,0,0,0,1,0,0,0,0,0


In [64]:
edinburgh_grouped = edinburgh_onehot.groupby('Postcode').mean().reset_index()
edinburgh_grouped.shape

(33, 15)
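
Taking the mean of the one-hot columns per postcode turns each row into the share of that postcode's venues falling in each category, so every row sums to 1. A small sketch on illustrative data (the category mix below is invented) shows the effect:

```python
import pandas as pd

venues = pd.DataFrame({
    "Postcode": ["EH1 1BB", "EH1 1BB", "EH1 1BB", "EH1 1HR", "EH1 1HR"],
    "Venue Category": ["Bakery", "Bakery", "Café", "Pie Shop", "Dessert Shop"],
})

# One indicator column per category, stored as floats.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Postcode", venues["Postcode"])

# Mean of the indicators = relative frequency of each category per postcode.
grouped = onehot.groupby("Postcode").mean().reset_index()
print(grouped)
```

Here `EH1 1BB` gets a Bakery share of 2/3 and a Café share of 1/3. These per-postcode frequency vectors are what the clustering step operates on.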

To sort each postcode's venue categories by frequency, in descending order:

In [65]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

To display the top 5 venues for each postcode:

In [66]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postcode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
postcodes_venues_sorted = pd.DataFrame(columns=columns)
postcodes_venues_sorted['Postcode'] = edinburgh_grouped['Postcode']

for ind in np.arange(edinburgh_grouped.shape[0]):
    postcodes_venues_sorted.iloc[ind, 1:] = return_most_common_venues(edinburgh_grouped.iloc[ind, :], num_top_venues)

postcodes_venues_sorted.head()

Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,EH1 1AD,Gourmet Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
1,EH1 1BB,Bakery,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
2,EH1 1BL,Gelato Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
3,EH1 1BQ,Tea Room,Thai Restaurant,Pie Shop,Pastry Shop,Ice Cream Shop
4,EH1 1DR,Ice Cream Shop,Gift Shop,Bakery,Thai Restaurant,Tea Room


In [67]:
postcodes_venues_sorted.reset_index(drop = True, inplace = True)
postcodes_venues_sorted

Unnamed: 0,Postcode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,EH1 1AD,Gourmet Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
1,EH1 1BB,Bakery,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
2,EH1 1BL,Gelato Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
3,EH1 1BQ,Tea Room,Thai Restaurant,Pie Shop,Pastry Shop,Ice Cream Shop
4,EH1 1DR,Ice Cream Shop,Gift Shop,Bakery,Thai Restaurant,Tea Room
5,EH1 1EZ,Ice Cream Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
6,EH1 1HR,Pie Shop,Dessert Shop,Thai Restaurant,Tea Room,Pastry Shop
7,EH1 1JQ,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop,Ice Cream Shop
8,EH1 1JX,Café,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
9,EH1 1LS,Gourmet Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
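
Postcodes with a single venue still show five "most common" categories in the table above, because all the zero-frequency categories tie and the sort fills the remaining slots with them. A hedged sketch of a variant that leaves those slots empty follows; `top_categories` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def top_categories(row, num_top_venues):
    """Return the most frequent categories, padding with None once frequencies hit zero."""
    freqs = row.iloc[1:].astype(float)  # skip the leading 'Postcode' entry
    present = freqs[freqs > 0].sort_values(ascending=False)
    names = list(present.index[:num_top_venues])
    return names + [None] * (num_top_venues - len(names))


# Illustrative row: a postcode whose only venue is a gourmet shop.
row = pd.Series({"Postcode": "EH1 1AD", "Bakery": 0.0,
                 "Gourmet Shop": 1.0, "Tea Room": 0.0})
print(top_categories(row, 3))  # ['Gourmet Shop', None, None]
```

This only changes how the ranking is displayed. The clustering uses the frequency vectors themselves, not the ranked labels.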


To run the k-means clustering algorithm using k=5 (chosen arbitrarily rather than tuned):

In [68]:
# set number of clusters
kclusters = 5

edinburgh_grouped_clustering = edinburgh_grouped.drop(columns='Postcode')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(edinburgh_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 4, 3, 3, 1, 3, 3, 3, 0], dtype=int32)
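
Since k was not tuned, one common way to sanity-check the choice is to compare silhouette scores across several values of k. The sketch below runs on synthetic data with three well-separated groups, not on the Edinburgh features. To apply it here, `X` would be replaced by `edinburgh_grouped_clustering`, and with only 33 rows the scores should be read with caution.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs of 20 points each.
rng = np.random.RandomState(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centres])

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score gives the most clearly separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

A higher silhouette score indicates tighter, better-separated clusters, so the k that maximises it is a reasonable candidate, though not the only defensible one.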

In [69]:
# add clustering labels
postcodes_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge postcodes_edb with postcodes_venues_sorted to add latitude/longitude for each postcode
postcodes_edb_merged = postcodes_edb.join(postcodes_venues_sorted.set_index('Postcode'), on='Postcode')

# keep only postcodes that were assigned a cluster (those with at least one venue after deduplication)
postcodes_edb_merged = postcodes_edb_merged[postcodes_edb_merged['Cluster Labels'].notnull()]

postcodes_edb_merged.reset_index(drop=True, inplace=True)

In [86]:
postcodes_edb_merged

Unnamed: 0,Postcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,EH1 1BL,55.950658,-3.191749,4.0,Gelato Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
1,EH1 1BB,55.952027,-3.189565,2.0,Bakery,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
2,EH1 1AD,55.948907,-3.192608,0.0,Gourmet Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
3,EH1 1BQ,55.95263,-3.191329,3.0,Tea Room,Thai Restaurant,Pie Shop,Pastry Shop,Ice Cream Shop
4,EH1 1DR,55.950942,-3.184711,3.0,Ice Cream Shop,Gift Shop,Bakery,Thai Restaurant,Tea Room
5,EH1 1EZ,55.945896,-3.189857,1.0,Ice Cream Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
6,EH1 1HR,55.948178,-3.187141,3.0,Pie Shop,Dessert Shop,Thai Restaurant,Tea Room,Pastry Shop
7,EH1 1JQ,55.948616,-3.18733,3.0,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop,Ice Cream Shop
8,EH1 1JX,55.948143,-3.193529,3.0,Café,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop
9,EH1 1LS,55.947738,-3.18615,0.0,Gourmet Shop,Thai Restaurant,Tea Room,Pie Shop,Pastry Shop


To visualise the clusters on a map of Edinburgh:

In [84]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

rainbow

['#8000ff', '#00b5eb', '#80ffb4', '#ffb360', '#ff0000']

In [85]:
# add markers to the map
for lat, lon, poi, cluster in zip(postcodes_edb_merged['Latitude'], postcodes_edb_merged['Longitude'], postcodes_edb_merged['Postcode'], postcodes_edb_merged['Cluster Labels']):
    cluster_index = int(cluster)  # labels are floats after the join
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster_index), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster_index],  # labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[cluster_index],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters