## Similarity k-Means Clustering of Edinburgh and Glasgow Postcodes Based on Local Amenities

#### James Alibhai, PhD

Moving home can be both an exciting and a daunting prospect, especially when you are unfamiliar with the area you are moving to. People often want to move either to a location with similar local amenities to their current one, or to an area that resembles a place they already know and like. Estate agents can offer advice to help a client understand an unfamiliar area, but this advice can be biased, and clients may struggle to describe their ideal area. Furthermore, as property searching increasingly moves online, it can be highly advantageous for an estate agent to offer clients an unbiased model that matches known locations to areas within the search perimeter based on desired local amenities, such as nearby parks, shops and restaurants.

Here, we present a k-means clustering model to match locations in the two largest and most populous cities in Scotland: Edinburgh and Glasgow. The cities share a similar population distribution, with young people (ages 16-29) making up around one fifth of the total population. Both cities are also famous for their culture. Glasgow was named European City of Culture in 1990 and remains one of the most vibrant cultural hubs in the world, boasting the National Theatre of Scotland, the BBC Scottish Symphony Orchestra, the Gallery of Modern Art and the Scottish Hydro. Edinburgh, on the other hand, is home to the National Museum of Scotland, the Scottish National Gallery of Modern Art and the Edinburgh Festival Theatre, as well as a number of world-famous festivals, including the Edinburgh Fringe, the International Film Festival and the Royal Edinburgh Military Tattoo, amongst many others. Combined, these contribute to thriving cities that are enriched with a variety of local amenities in different parts of each city.


##### Data

UK postcode data is obtained from Doogal.co.uk (https://www.doogal.co.uk/ukpostcodes.php). This provides latitude and longitude coordinates for every postcode in the UK. Data is filtered to EH (Edinburgh) and G (Glasgow) postcodes, which correspond to geographical locations within or closely surrounding either city. Data is grouped by the sector part of the postcode (e.g. EH1 1 or G2 1), and the coordinates of all full postcodes within each sector are averaged to give a single latitude and longitude for that sector. Each sector has then been manually assigned an 'Area' description. Overall there are 203 areas across the combined Edinburgh and Glasgow data.
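
The sector grouping described above could be sketched as follows. This is a minimal illustration rather than the code used to build the dataset: the toy rows, the column names and the helper `sector_of` are assumptions made for the example.

```python
import pandas as pd

# Toy rows standing in for the Doogal export; values and column names are assumed.
postcodes = pd.DataFrame({
    "Postcode": ["EH1 1AA", "EH1 1BB", "EH1 2NG", "G2 1AL", "G2 1DY"],
    "Latitude": [55.9500, 55.9520, 55.9490, 55.8610, 55.8630],
    "Longitude": [-3.1880, -3.1900, -3.1980, -4.2520, -4.2500],
})

def sector_of(postcode: str) -> str:
    """Return the sector: the outward code plus the first digit of the inward code."""
    outward, inward = postcode.split()
    return f"{outward} {inward[0]}"

postcodes["Sector"] = postcodes["Postcode"].map(sector_of)

# One row per sector, located at the mean of its full postcodes.
sectors = postcodes.groupby("Sector", as_index=False)[["Latitude", "Longitude"]].mean()
print(sectors)
```

Averaging coordinates gives each sector a single representative point, which is what the venue search is later centred on.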

Local amenities data is provided by FourSquare (https://foursquare.com/). We request data from the FourSquare database as JSON containing 'Venue', 'Venue Category', 'Latitude' and 'Longitude'. This data is then linked to our Edinburgh and Glasgow dataframe. Every venue within 500m of the averaged latitude and longitude of an Edinburgh or Glasgow postcode sector is categorised as a venue within that sector.

##### Methodology

Edinburgh and Glasgow data has been constructed into a dataframe using the pandas library, and consists of the columns 'Postcode', 'Latitude', 'Longitude' and 'Area'. In total there are 203 postcode sectors across both cities represented in the dataframe.

The Nominatim geocoder (accessed through the geopy library) is used to validate both the Edinburgh and Glasgow coordinates. We selected Harthill, a location roughly midway between the two cities, to define the centre coordinates for generating maps with the Folium library.

We obtain local amenity data via the FourSquare API with a limit of 100 venues per request, which returns JSON containing the venue, venue category, venue latitude and venue longitude. Each venue is then assigned to an Edinburgh or Glasgow postcode sector if it lies within 500m of the latitude and longitude of that sector. The venue categories are then one-hot encoded, grouped by area and averaged, giving the relative frequency of each category in each area.
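
To make the 500m criterion concrete, the sketch below computes a great-circle (haversine) distance between a sector centre and a venue. In the notebook the radius filter is applied by the FourSquare API through the `radius` request parameter, so this is only an illustration of what the criterion means; the function names are hypothetical, and the coordinates are taken from the notebook output.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def within_radius(centre, venue, radius_m=500):
    """True if the venue lies within radius_m metres of the centre."""
    return haversine_m(*centre, *venue) <= radius_m

old_town = (55.9509, -3.1891)           # sector centre for EH1 1
the_milkman = (55.95065, -3.19101)      # a venue returned for Old Town
castle_sector = (55.9492, -3.19825)     # sector centre for EH1 2

print(within_radius(old_town, the_milkman))
print(within_radius(old_town, castle_sector))
```

Because neighbouring sector centres can be close together, the same venue may fall within 500m of more than one sector and so be counted for each of them.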

Edinburgh and Glasgow areas are clustered with the k-means algorithm based on the venue category frequencies in each area. Similarity between areas is measured using Euclidean distance, the model is fitted with 10 clusters, and each cluster is then plotted on a Folium map.
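
As a rough illustration of what k-means does with these frequency vectors, the sketch below performs one assignment step and one update step by hand with NumPy. The three toy areas and two starting centroids are invented for the example; the notebook itself relies on scikit-learn's `KMeans`.

```python
import numpy as np

# Rows are areas, columns are venue-category frequencies (toy values).
areas = np.array([
    [0.6, 0.4, 0.0],   # mostly bars and cafes
    [0.5, 0.5, 0.0],
    [0.0, 0.1, 0.9],   # mostly supermarkets
])
centroids = np.array([
    [0.55, 0.45, 0.0],
    [0.0, 0.0, 1.0],
])

# Assignment step: pairwise Euclidean distances, shape (n_areas, n_centroids),
# then each area takes the label of its nearest centroid.
distances = np.linalg.norm(areas[:, None, :] - centroids[None, :, :], axis=2)
labels = distances.argmin(axis=1)
print(labels)  # [0 0 1]

# Update step: each centroid moves to the mean of the areas assigned to it.
new_centroids = np.array(
    [areas[labels == k].mean(axis=0) for k in range(len(centroids))]
)
print(new_centroids)
```

The full algorithm repeats these two steps until the assignments stop changing.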

##### Results

Our results contain 203 separate postcode areas from both Edinburgh and Glasgow. We show each postcode linked to a unique area name held within a pandas dataframe. We confirm that the latitudes and longitudes of both Edinburgh and Glasgow match those returned by the Nominatim geocoder, and show an interactive map created with the Folium package containing each postcode from the Edinburgh and Glasgow areas (Figure 1).

Next, FourSquare API data is obtained containing venue and venue category data alongside each venue's latitude and longitude. These are combined into a dataframe with the area, latitude and longitude from our Edinburgh and Glasgow dataframe. We find 265 unique venue categories across all areas of Edinburgh and Glasgow, which are then one-hot encoded, grouped by area and averaged into a new dataframe. This dataframe contains 198 areas, five fewer than the 203 in the postcode dataframe. It is used to obtain the top three venue categories for each postcode area, and its category frequencies are fitted using the k-means clustering algorithm (k = 10), with the resulting cluster labels added to the dataframe. After the labels were joined onto the postcode dataframe they were stored as floats, most likely because areas without any returned venues received missing values in the join. Floats are not appropriate for our final output of an interactive map, so the missing labels were filled and the column was converted to an integer type and reintegrated into the dataframe.
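
The float labels can be reproduced in a few lines. The sketch below is an assumption about the cause rather than a confirmed diagnosis: it shows that joining integer labels onto a dataframe containing an unmatched area yields a float column, and it shows two ways of handling this without assigning the unmatched area to cluster 0. The area name 'No Venues Area' is hypothetical.

```python
import pandas as pd

areas = pd.DataFrame({"Area": ["Old Town", "Haymarket", "No Venues Area"]})
labels = pd.DataFrame({"Area": ["Old Town", "Haymarket"], "Cluster Labels": [2, 2]})

# The unmatched area receives NaN, which forces the whole column to float.
merged = areas.join(labels.set_index("Area"), on="Area")
dtype_after_join = merged["Cluster Labels"].dtype
print(dtype_after_join)

# Option A: keep only areas that were actually clustered, then cast to int.
clustered = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})

# Option B: keep every area and use pandas' nullable integer dtype.
merged["Cluster Labels"] = merged["Cluster Labels"].astype("Int64")
print(merged)
```

Either option keeps areas without venue data visibly distinct from genuine members of a cluster, whereas filling missing labels with 0 merges them into cluster 0.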

Our dataframe was used to build an interactive map which shows the clusters of different postcode areas in Edinburgh and Glasgow (Figure 2). Our data shows several areas in both cities that share similar local amenities. For example, George Street, a central location in Edinburgh with plenty of shops, restaurants and parks nearby, is assigned to Cluster 2, and Buchanan Street, a central location in Glasgow with a similar array of local amenities, is also assigned to Cluster 2. Indeed, Cluster 2 (dark blue) appears to represent areas in the centre of each city, along with some outlying regions. Most other clusters, such as Clusters 0 (red) and 5 (turquoise), represent suburban and therefore highly residential regions of either city.



##### Code

In [1]:
import pandas as pd #Import Pandas library

Load the Edinburgh and Glasgow postcode data

In [2]:
postcode_csv = 'C:/Users/James/Desktop/IBM Data Science Certificate/Course 9/G_E_postcodes.csv'

df = pd.read_csv(postcode_csv)

Visualise dataframe

In [3]:
df.head()

Unnamed: 0,Postcode,Latitude,Longitude,Area
0,EH1 1,55.9509,-3.1891,Old Town
1,EH1 2,55.9492,-3.19825,Edinburgh Castle
2,EH1 3,55.9568,-3.18752,York Place
3,EH1 9,55.9432,-3.23293,Haymarket
4,EH2 1,55.955,-3.19491,Queen Street


In [4]:
df.shape

(203, 4)

In [5]:
import json
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

Confirm Edinburgh coordinates from dataframe are correct. Coordinates in our dataframe are around 55.95, -3.188

In [6]:
address_edin = 'Edinburgh'

geolocator = Nominatim(user_agent="edinburgh_explorer")
location = geolocator.geocode(address_edin)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Edinburgh are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Edinburgh are 55.9533456, -3.1883749.


Confirm Glasgow coordinates from dataframe are correct. Coordinates in our dataframe are around 55.86, -4.248

In [7]:
address_glasgow = 'Glasgow'

geolocator = Nominatim(user_agent="glasgow_explorer")
location = geolocator.geocode(address_glasgow)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Glasgow are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Glasgow are 55.8609825, -4.2488787.


Use the coordinates of a geographical point roughly halfway between the two cities as the centre point for generating a map.

In [11]:
address_scotland = 'Harthill'

geolocator = Nominatim(user_agent="scottish_explorer")
location = geolocator.geocode(address_scotland)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Harthill are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Harthill are 55.8603773, -3.7490307.


##### Figure 1

A map to visualise the location of all Edinburgh and Glasgow postcodes in our dataframe

In [13]:
# create a map centred on Harthill using its latitude and longitude values
map_scotland = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one marker per postcode sector, labelled 'postcode, area'
for lat, lng, postcode, area in zip(df['Latitude'], df['Longitude'], df['Postcode'], df['Area']):
    label = '{}, {}'.format(postcode, area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_scotland)

map_scotland

Outline FourSquare credentials. Please note Client_ID and Client_Secret are changed for data security.

In [14]:
CLIENT_ID = 'FourSquare Client ID - Restricted Access'
CLIENT_SECRET = 'FourSquare Secret - Restricted Access'
VERSION = '20180605'
LIMIT = 100

A function to return the name of the first venue category listed for a venue returned by the FourSquare API

In [15]:
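def get_category_type(row):
    # the categories may sit under either key, depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # return the name of the first category, or None if the venue has none
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']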

A function to link the venues obtained from the FourSquare API with our Edinburgh and Glasgow postcode dataframe. Each venue is linked to a postcode area if it lies within 500m of the latitude and longitude defined for that postcode.

In [16]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items'] 
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
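
The response handling inside getNearbyVenues can be tested without calling the API by separating it into a pure function. The sketch below is one possible way to do that, not part of the original notebook: `flatten_explore_items` and `sample_payload` are hypothetical, and the payload only mirrors the keys that getNearbyVenues reads. Unlike the original list comprehension, it returns None for a venue with no categories instead of raising an IndexError.

```python
def flatten_explore_items(payload, area, lat, lng):
    """Turn one explore-style payload into rows for the venues dataframe."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Use the first listed category, or None when the venue has none.
        category = categories[0]["name"] if categories else None
        rows.append((
            area, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written payload containing only the keys read above (assumed shape).
sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "The Milkman",
               "location": {"lat": 55.95065, "lng": -3.19101},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorised Venue",
               "location": {"lat": 55.9501, "lng": -3.1890},
               "categories": []}},
]}]}}

rows = flatten_explore_items(sample_payload, "Old Town", 55.9509, -3.1891)
for row in rows:
    print(row)
```

An area for which the API returns no groups simply produces an empty list, so it contributes no rows to the venues dataframe.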

Generate a dataframe and populate it with the area, latitude and longitude of each postcode and the venue, venue category, venue latitude and venue longitude attributed to each postcode.

In [17]:
import requests
from pandas import json_normalize  # top-level import; pandas.io.json.json_normalize is deprecated

scottish_venues = getNearbyVenues(names=df['Area'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Old Town
Edinburgh Castle
York Place
Haymarket
Queen Street
George Street
Frederick Street
Charlotte Square
West Coates
Brandon Street
Russell Gardens
Canonmills
Northumberland Street
Melville Street
West Approach
Lady Lawson Street
Comely Bank
Western General Hospital
Queensferry Road
Pennywell Road
Silverknowes
Barton Avenue
Clermiston
Drum Brae North
Crewe Road South
Wardieburn
East Pilton Farm Crescent
Earl Haig Gardens
Bellevue Crescent
Summerside
Pilrig Park
Mill Lane Park
Leith Links Gardens
Easter Road
Sir Harry Lauder Road
Hopetoun
Edina Place
Craigentinny
Piershall Square
Willowbrae Road
Holyrood Abbey
Pleasance
Mansionhouse Road
Grange
Langton Road
Strathearn Road
Fountainpark
Church Hill
Moringside Cemetery
Braid Mount
Buckstone Terrace
Moringside
Angle Park Terrace
Tynecastle
Saughton Mains Bank
Sighthill Park
Turnhouse
Broomhall
Roseburn
Belmont
Corstorphine Dovecot
Drum Brae South
South Gyle
Spylaw
Colinton Mains Park
Allan Park
Longstone
Harvesters Park
Calder Grove
Mui

In [18]:
print(scottish_venues.shape)
scottish_venues.head()

(3798, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Old Town,55.9509,-3.1891,The Milkman,55.95065,-3.19101,Coffee Shop
1,Old Town,55.9509,-3.1891,The Royal Mile,55.950029,-3.188567,Road
2,Old Town,55.9509,-3.1891,The Devil's Advocate,55.950309,-3.191643,Cocktail Bar
3,Old Town,55.9509,-3.1891,Whiski Bar & Restaurant,55.950318,-3.186471,Bar
4,Old Town,55.9509,-3.1891,Civerino's,55.949738,-3.188043,Pizza Place


In [19]:
scottish_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Alderman Road,3,3,3,3,3,3
Alexandra Parade,7,7,7,7,7,7
Allan Park,14,14,14,14,14,14
Andersonian Library,16,16,16,16,16,16
Angle Park Terrace,19,19,19,19,19,19
...,...,...,...,...,...,...
Western General Hospital,16,16,16,16,16,16
Willowbrae Road,4,4,4,4,4,4
Wyndford,3,3,3,3,3,3
Wyndford Road,5,5,5,5,5,5


In [20]:
print('There are {} unique categories.'.format(len(scottish_venues['Venue Category'].unique())))

There are 265 unique categories.


One-hot encode each venue category

In [21]:
scottish_onehot = pd.get_dummies(scottish_venues[['Venue Category']], prefix="", prefix_sep="")

scottish_onehot

Unnamed: 0,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auto Garage,...,Veterinarian,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Well,Whisky Bar,Wine Bar,Wine Shop,Wings Joint
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3793,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3794,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3795,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3796,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
#Add area column back into the onehot dataframe
scottish_onehot['Area']=scottish_venues['Neighborhood']

scottish_onehot.head()

Unnamed: 0,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Auto Garage,...,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Well,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Area
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Old Town
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Old Town
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Old Town
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Old Town
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Old Town


Generate a new dataframe in which each one-hot venue category is averaged over each postcode area, giving the relative frequency of that category in the area.

In [28]:
scotland_grouped = scottish_onehot.groupby('Area').mean().reset_index()
scotland_grouped

Unnamed: 0,Area,Afghan Restaurant,African Restaurant,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,...,Veterinarian,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Well,Whisky Bar,Wine Bar,Wine Shop,Wings Joint
0,Alderman Road,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0000,0.0,0.0,0.000000,0.0,0.000000,0.0
1,Alexandra Parade,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0000,0.0,0.0,0.000000,0.0,0.000000,0.0
2,Allan Park,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.071429,0.0,...,0.0,0.0,0.0,0.0000,0.0,0.0,0.000000,0.0,0.000000,0.0
3,Andersonian Library,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.062500,0.0,...,0.0,0.0,0.0,0.0000,0.0,0.0,0.000000,0.0,0.000000,0.0
4,Angle Park Terrace,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0000,0.0,0.0,0.000000,0.0,0.052632,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193,Western General Hospital,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0625,0.0,0.0,0.000000,0.0,0.000000,0.0
194,Willowbrae Road,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0000,0.0,0.0,0.000000,0.0,0.000000,0.0
195,Wyndford,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0000,0.0,0.0,0.000000,0.0,0.000000,0.0
196,Wyndford Road,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,0.000000,0.0,...,0.0,0.0,0.0,0.0000,0.0,0.0,0.000000,0.0,0.000000,0.0


In [29]:
scotland_grouped.shape

(198, 266)

A function to define the most common venues found in each postcode Area

In [35]:
import numpy as np

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [66]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd' every ordinal takes the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
area_venues_sorted = pd.DataFrame(columns=columns)
area_venues_sorted['Area'] = scotland_grouped['Area']

for ind in np.arange(scotland_grouped.shape[0]):
    area_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scotland_grouped.iloc[ind, :], num_top_venues)

area_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Alderman Road,Print Shop,Storage Facility,Auto Workshop
1,Alexandra Parade,Fast Food Restaurant,Train Station,Convenience Store
2,Allan Park,Supermarket,Grocery Store,Tourist Information Center
3,Andersonian Library,Park,Hotel,Grocery Store
4,Angle Park Terrace,Pub,Chinese Restaurant,Pharmacy


Cluster analysis using the k-means clustering algorithm to group the postcode areas of both Edinburgh and Glasgow based on the local amenities (venues) located in each postcode area.

In [74]:
# set number of clusters
kclusters = 10

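# keep only the numeric category frequencies; the positional axis argument is not accepted by pandas 2.x
area_grouped_clustering = scotland_grouped.drop(columns='Area')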

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(area_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 5, 2, 1, 5, 1, 2, 2, 2])
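
The notebook does not say how k = 10 was chosen. One common check, sketched below on synthetic data, is to compare silhouette scores across a range of k values. The toy blobs, the range of k and the use of `silhouette_score` are all assumptions for illustration; with the real data the same loop could be run on `area_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs standing in for the area frequency vectors.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centres])

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=1).fit(points)
    scores[k] = silhouette_score(points, model.labels_)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print('best k:', best_k)
```

A higher silhouette score indicates tighter, better separated clusters, so a peak in the scores gives one data-driven suggestion for k.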

Add cluster labels to the dataframe

In [76]:
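# drop any 'Cluster Labels' column left over from a previous run; does nothing on the first run
area_venues_sorted = area_venues_sorted.drop(columns='Cluster Labels', errors='ignore')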
# add clustering labels
area_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Join together the cluster dataframe data and the postcode area data

In [77]:
# join area_venues_sorted onto df so each postcode keeps its latitude/longitude
# alongside its cluster label and most common venues
scotland_merged = df.join(area_venues_sorted.set_index('Area'), on='Area')

In [78]:
scotland_merged.head()

Unnamed: 0,Postcode,Latitude,Longitude,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,EH1 1,55.9509,-3.1891,Old Town,2.0,Hotel,Pub,Bar
1,EH1 2,55.9492,-3.19825,Edinburgh Castle,2.0,Café,Pub,Historic Site
2,EH1 3,55.9568,-3.18752,York Place,2.0,Hotel,Bar,Pub
3,EH1 9,55.9432,-3.23293,Haymarket,2.0,Hotel,Grocery Store,Trail
4,EH2 1,55.955,-3.19491,Queen Street,2.0,Bar,Café,Hotel


In [79]:
scotland_merged.shape

(203, 8)

The join leaves the 'Cluster Labels' column of the scotland_merged dataframe as a float, which cannot be used to index the colour list when constructing a map using folium. A new dataframe is therefore constructed with the 'Cluster Labels' column converted from float to integer.

In [123]:
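# rows of df with no match in area_venues_sorted have NaN labels after the join;
# filling with 0 allows the integer cast, but note it places those rows in cluster 0
scotland_merged[['Cluster Labels']] = scotland_merged[['Cluster Labels']].fillna(value=0)
data = scotland_merged['Cluster Labels'].astype(int)
columns = ['Cluster Labels']
df2 = pd.DataFrame(data, columns=columns, dtype=object)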

df2.head()

Unnamed: 0,Cluster Labels
0,2
1,2
2,2
3,2
4,2


Insert integer type 'Cluster Labels' column into scotland_merged dataframe and remove existing float type 'Cluster Labels' column

In [124]:
scotland_merged['Cluster Labels'] = df2['Cluster Labels']

In [125]:
scotland_merged.head()

Unnamed: 0,Postcode,Latitude,Longitude,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,EH1 1,55.9509,-3.1891,Old Town,2,Hotel,Pub,Bar
1,EH1 2,55.9492,-3.19825,Edinburgh Castle,2,Café,Pub,Historic Site
2,EH1 3,55.9568,-3.18752,York Place,2,Hotel,Bar,Pub
3,EH1 9,55.9432,-3.23293,Haymarket,2,Hotel,Grocery Store,Trail
4,EH2 1,55.955,-3.19491,Queen Street,2,Bar,Café,Hotel


##### Figure 2

Generate an interactive map to visualise postcode areas that are similar across both Edinburgh and Glasgow.

In [127]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(scotland_merged['Latitude'], scotland_merged['Longitude'], scotland_merged['Area'], scotland_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
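
The colours named in the text (for example red for Cluster 0) follow from the `rainbow[cluster-1]` indexing used in the map cell, where label 0 wraps around to the last colour in the list. The sketch below rebuilds that mapping as a small legend; the helper name `colour_for` is hypothetical.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 10

# Ten evenly spaced colours from the rainbow colormap, as hex strings.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

def colour_for(cluster):
    """Colour used for a cluster label, with the same indexing as the map cell."""
    # cluster - 1 means label 0 picks palette[-1], the last colour in the list.
    return palette[cluster - 1]

legend = {cluster: colour_for(cluster) for cluster in range(kclusters)}
for cluster, hex_colour in legend.items():
    print(cluster, hex_colour)
```

Printing the legend alongside the map makes it easier to relate marker colours to cluster labels.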

##### Discussion

Here we have presented a k-means clustering model of the two largest cities in Scotland: Edinburgh and Glasgow. Both cities are known as world cultural hubs, which has led to their convergent development. This has resulted in a fluid movement of people between the two cities for work, family and culture or leisure. Despite this, it can be difficult for an individual to make a permanent move from one city to the other if they are not familiar with the local areas. In fact, this problem is faced across the UK, and indeed the world. Individuals often build an idea of a desired location based on where they currently live or on another place they know well. This tool therefore provides an interactive map allowing individuals to examine areas within a known location and match them to areas in an unknown location based on local amenities, such as retail shops, restaurants, theatres and parks.

We found one particular cluster (Cluster 2; dark blue) which predominantly occupies the city centres of both Edinburgh and Glasgow as well as some suburban areas. Notably, many of these suburban areas, such as Morningside (Edinburgh) and Hillhead (Glasgow), contain a number of small retail shops and cafes and resemble a typical UK town high street. Individuals who wish to move to a suburban area of either city with plenty of local shops and cafes nearby can therefore search for areas with a dark blue marker. Alternatively, there are a number of clusters which relate to different suburbs of either city. These demonstrate the diversity of residential areas in both cities and likely complement the needs of the majority of people who would use this model. Interestingly, we found one notable cluster (Cluster 0; red) which was heavily associated with the eastern suburbs of Glasgow but was relatively sparsely distributed across Edinburgh.

Overall, this model appears to provide useful information about different areas of Edinburgh and Glasgow. There remains scope to scale up the model to the full UK postcode dataset outlined in this study, so that a user could input a known location and a desired moving location anywhere in the UK, as we demonstrate here for two cities. Furthermore, additional datasets covering other sought-after information, such as crime data or school performance tables, could be integrated alongside the venue information from the FourSquare API to develop the model into a more accurate tool for a user.
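
The lookup a user would perform, naming a known area and asking for matching areas in the other city, could be sketched as below. This is an illustration only: the function names are hypothetical, the postcodes in the toy dataframe are placeholders rather than values from the dataset, and 'Example Suburb' is an invented area. The Cluster 2 memberships follow the results described above.

```python
import pandas as pd

# Toy stand-in for scotland_merged; postcodes are illustrative placeholders.
merged = pd.DataFrame({
    "Postcode": ["EH2 2", "EH10 4", "G1 3", "G12 8", "G33 1"],
    "Area": ["George Street", "Morningside", "Buchanan Street", "Hillhead", "Example Suburb"],
    "Cluster Labels": [2, 2, 2, 2, 0],
})

def city_of(postcode):
    """Infer the city from the postcode prefix (EH for Edinburgh, otherwise Glasgow)."""
    return "Edinburgh" if postcode.startswith("EH") else "Glasgow"

def similar_areas(frame, known_area, target_city):
    """Return areas in target_city that share a cluster with known_area."""
    cluster = frame.loc[frame["Area"] == known_area, "Cluster Labels"].iloc[0]
    in_target = frame["Postcode"].map(city_of) == target_city
    matches = frame[in_target & (frame["Cluster Labels"] == cluster)]
    return matches["Area"].tolist()

print(similar_areas(merged, "George Street", "Glasgow"))
# ['Buchanan Street', 'Hillhead']
```

Scaling this to the whole UK would mainly require a more general way of mapping postcodes to cities than the two-prefix rule assumed here.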

##### Conclusion

The model presented here demonstrates how clustering can help reduce the burden of moving home to an unfamiliar location. To the best of my knowledge, this model is the first of its kind, and it has potential to scale. It would be an ideal showcase tool for online property search engines to feature on their websites to attract a greater share of users, and thus gain a competitive advantage over their rivals.