## Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods

### 1. INTRODUCTION

Soaring house prices in major cities are a hot topic. From 1980 to 2019, Stockholm's real estate price index rose by around 1060% [1]. Naturally, this has narrowed what people can afford if they want to live in the city. It has also affected the business landscape: large corporations such as Telia and SEB have moved out of the city center, and mom-and-pop stores have closed.

#### 1.2. PROBLEM
The rise in house prices has also affected restaurants. Unlike large corporations, they cannot simply move out of the city center. Yet they do not seem to have been ousted by large restaurant chains (except in fast food) or by e-commerce (for natural reasons), as the mom-and-pop stores have. Stockholmers' demand for unique restaurants still seems to be high. But what types of restaurants are actually surviving in Stockholm's cut-throat restaurant industry? That is what this report intends to find out.

#### 1.3. USE OF REPORT
This report may be used by entrepreneurs looking to set up new restaurants in Stockholm. What types of restaurants are common in what types of districts? Is there a gap in the market for another Pizza Place in Vasastan, Stockholm's highest-priced district?



### 2. DATA 

The Foursquare API was used to retrieve data on restaurants, restaurant types and their locations.

FOURSQUARE API DATA:
- Name of restaurant
- Restaurant type
- Latitude
- Longitude
- Address

Mäklarstatistik was used to retrieve housing prices for the different Stockholm districts. I then manually added the corresponding zip codes to the districts (Mäklarstatistik has its own way of dividing and combining the districts).

HOUSING PRICE DATA:
- District
- Price per square metre (last 12 months)
- Average price (last 12 months)

#### 2.1. SOURCES
Foursquare API: https://developer.foursquare.com/
Mäklarstatistik: https://www.maklarstatistik.se

#### 2.2. USE OF DATA
The data will be used to cluster the districts by restaurant type and compare the clusters with the corresponding level of house prices. This will help me see whether house price levels affect which restaurant types are found in each district.



### 3. METHODOLOGY

Install Packages

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.11.0 --yes
import folium # map rendering library

print('Libraries imported.')


In [4]:
CLIENT_ID = 'XXX' # your Foursquare ID
CLIENT_SECRET = 'XXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version





Load the manually prepared data. The price data also contains latitude and longitude for each district, so that several queries can be made around the city: Foursquare limits the number of results per query, so a single search for restaurants in Stockholm is not enough. I will also use these coordinates, instead of postal codes, to assign house price values.

In [5]:
!wget -q -O 'bostadspriser.csv' https://www.dropbox.com/s/occkfnntcsun9k0/Bostadspriser.csv
print('Data downloaded!')
df_prices = pd.read_csv('bostadspriser.csv')


Data downloaded!


In [6]:
df_prices

Unnamed: 0,District,kr/sqm (12 mnth),avg price 12 mnth,latitude,longitude
0,Bromma-Vasterled,56926,3495000,59.337961,17.945547
1,Centrala Stockholm,91627,5438000,59.327695,18.069292
2,Brannkyrka-Skarholmen,45503,2932000,59.283816,17.970571
3,Enskede-Skarpnack,58189,3306000,59.275398,18.105174
4,Essingen,76204,3971000,59.321324,17.992583
5,Farsta-Vantor,45123,2718000,59.25153,18.089711
6,Hagersten-Liljeholmen,66738,3830000,59.301059,17.983589
7,Hasselby-Vallingby,41113,2567000,59.365964,17.871634
8,Kungsholmen,88818,4981000,59.332419,18.036804
9,Spaanga-Kista,34779,2262000,59.380716,17.905998
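
Because every district is represented by a single coordinate pair, the search radius decides which venues get attributed to it. The sketch below is an illustration and not part of the original notebook: `haversine_m` is a hypothetical helper, and the Earth radius of 6,371 km is the usual spherical approximation. It measures the distance between the Bromma-Vasterled coordinates from the table above and one venue that Foursquare returns for that district later in this notebook (Sushibar Kirin).

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

district = (59.337961, 17.945547)  # Bromma-Vasterled, from the price table
venue = (59.339206, 17.937934)     # Sushibar Kirin, from the venue results
distance = haversine_m(*district, *venue)
print(round(distance), "m")
```

The distance comes out at roughly 450 m, so a venue like this is covered by a 500 m search radius but would be missed by a 100 m one.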


Create a function that loops over the coordinates of each district and fetches nearby restaurants using the Foursquare API. The category is set to Food venues. A new dataframe, sthlm_venues, is created with the results.

In [8]:
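category = '4d4b7105d754a06374d81259' # Foursquare category ID for Food
LIMIT = 100 # maximum number of venues per request
radius = 100 # not passed to getNearbyVenues below, which therefore uses its default of 500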

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT, category)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues


sthlm_venues = getNearbyVenues(names=df_prices['District'],
                                   latitudes=df_prices['latitude'],
                                   longitudes=df_prices['longitude']
                                  )

sthlm_venues

Bromma-Vasterled
Centrala Stockholm
Brannkyrka-Skarholmen
Enskede-Skarpnack
Essingen
Farsta-Vantor
Hagersten-Liljeholmen
Hasselby-Vallingby
Kungsholmen
Spaanga-Kista
Sodermalm
Vasastan-Norrmalm
ostermalm


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bromma-Vasterled,59.337961,17.945547,Sushibar Kirin,59.339206,17.937934,Sushi Restaurant
1,Bromma-Vasterled,59.337961,17.945547,Pizzeria La Bella,59.333929,17.947791,Pizza Place
2,Bromma-Vasterled,59.337961,17.945547,Brommagrillen,59.342037,17.949059,Fast Food Restaurant
3,Bromma-Vasterled,59.337961,17.945547,Bromma Restaurang Pizzeria,59.338735,17.940599,Pizza Place
4,Bromma-Vasterled,59.337961,17.945547,Finess Konditori,59.340032,17.940417,Bakery
5,Bromma-Vasterled,59.337961,17.945547,Daiichi Sushi,59.338991,17.939297,Sushi Restaurant
6,Bromma-Vasterled,59.337961,17.945547,Vivels Bageri Brommaplan,59.338566,17.938986,Café
7,Bromma-Vasterled,59.337961,17.945547,Rasmus Grill,59.338692,17.938915,Fast Food Restaurant
8,Bromma-Vasterled,59.337961,17.945547,Restaurant East Ocean,59.339246,17.938755,Chinese Restaurant
9,Bromma-Vasterled,59.337961,17.945547,Sofie's Grill,59.337958,17.952971,Burger Joint
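
The parsing step of `getNearbyVenues` assumes that every item carries at least one category. The sketch below shows one possible way to make that step tolerant of missing fields and testable without network access. `parse_explore_items` is a hypothetical helper, and the `sample` payload is made up to mirror the `response` / `groups` / `items` / `venue` nesting that the function above reads; it is not real API output.

```python
def parse_explore_items(district, lat, lng, payload):
    """Flatten an explore-style payload into one tuple per venue."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        if not categories:
            continue  # skip venues without a category instead of raising IndexError
        location = venue.get("location", {})
        rows.append((district, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"),
                     categories[0]["name"]))
    return rows

# Made-up payload, for illustration only.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Pizzeria",
               "location": {"lat": 59.33, "lng": 17.94},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Uncategorised Venue",
               "location": {"lat": 59.34, "lng": 17.95},
               "categories": []}},
]}]}}

rows = parse_explore_items("Bromma-Vasterled", 59.337961, 17.945547, sample)
print(rows)
```

Keeping the parsing separate from the HTTP request also means an empty or malformed response yields an empty list rather than an exception halfway through the loop over districts.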


Add the price data (kr/sqm) to each venue row.

In [9]:
addcol = df_prices.loc[:, ['District', 'kr/sqm (12 mnth)']]
addcol.rename(columns={'District':'Dist'}, inplace=True)
sthlm_venues=sthlm_venues.merge(addcol, left_on='District', right_on='Dist', how='inner')
sthlm_venues=sthlm_venues.drop(['Dist'], axis=1)
sthlm_venues


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,kr/sqm (12 mnth)
0,Bromma-Vasterled,59.337961,17.945547,Sushibar Kirin,59.339206,17.937934,Sushi Restaurant,56926
1,Bromma-Vasterled,59.337961,17.945547,Pizzeria La Bella,59.333929,17.947791,Pizza Place,56926
2,Bromma-Vasterled,59.337961,17.945547,Brommagrillen,59.342037,17.949059,Fast Food Restaurant,56926
3,Bromma-Vasterled,59.337961,17.945547,Bromma Restaurang Pizzeria,59.338735,17.940599,Pizza Place,56926
4,Bromma-Vasterled,59.337961,17.945547,Finess Konditori,59.340032,17.940417,Bakery,56926
5,Bromma-Vasterled,59.337961,17.945547,Daiichi Sushi,59.338991,17.939297,Sushi Restaurant,56926
6,Bromma-Vasterled,59.337961,17.945547,Vivels Bageri Brommaplan,59.338566,17.938986,Café,56926
7,Bromma-Vasterled,59.337961,17.945547,Rasmus Grill,59.338692,17.938915,Fast Food Restaurant,56926
8,Bromma-Vasterled,59.337961,17.945547,Restaurant East Ocean,59.339246,17.938755,Chinese Restaurant,56926
9,Bromma-Vasterled,59.337961,17.945547,Sofie's Grill,59.337958,17.952971,Burger Joint,56926
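
Since both frames already share a `District` column, the rename-and-drop steps are optional. A minimal sketch on toy frames follows; the frame names `venues` and `prices` are placeholders, the prices are copied from the price table above, and one venue name is made up.

```python
import pandas as pd

venues = pd.DataFrame({
    "District": ["Bromma-Vasterled", "Bromma-Vasterled", "Kungsholmen"],
    "Venue": ["Sushibar Kirin", "Pizzeria La Bella", "Example Venue"],  # last name is made up
})
prices = pd.DataFrame({
    "District": ["Bromma-Vasterled", "Kungsholmen", "Essingen"],
    "kr/sqm (12 mnth)": [56926, 88818, 76204],
})

# Merge on the shared key; validate="many_to_one" raises an error if a
# district appears more than once in the price table.
merged = venues.merge(prices, on="District", how="inner", validate="many_to_one")
print(merged)
```

With `how="inner"`, districts that have a price but no venues (Essingen in this toy example) drop out of the result.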



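Count the venues in each district, to check that no district has too few samples. Note that some districts return very few venues (Enskede-Skarpnack has one and Farsta-Vantor two), which limits what can be concluded about them.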

In [10]:
sthlm_venues.groupby('District').count()

District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,kr/sqm (12 mnth)
Brannkyrka-Skarholmen,11,11,11,11,11,11,11
Bromma-Vasterled,14,14,14,14,14,14,14
Centrala Stockholm,47,47,47,47,47,47,47
Enskede-Skarpnack,1,1,1,1,1,1,1
Essingen,6,6,6,6,6,6,6
Farsta-Vantor,2,2,2,2,2,2,2
Hagersten-Liljeholmen,6,6,6,6,6,6,6
Hasselby-Vallingby,16,16,16,16,16,16,16
Kungsholmen,84,84,84,84,84,84,84
Sodermalm,43,43,43,43,43,43,43
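
The counts above can be turned into an explicit check. The sketch below uses the venue counts shown in the table and flags districts below a threshold. The threshold of 10 (`MIN_VENUES`) is an arbitrary assumption for illustration, not a value used in the original analysis.

```python
import pandas as pd

# Venue counts per district, copied from the output above.
venue_counts = pd.Series({
    "Brannkyrka-Skarholmen": 11, "Bromma-Vasterled": 14, "Centrala Stockholm": 47,
    "Enskede-Skarpnack": 1, "Essingen": 6, "Farsta-Vantor": 2,
    "Hagersten-Liljeholmen": 6, "Hasselby-Vallingby": 16, "Kungsholmen": 84,
    "Sodermalm": 43,
})

MIN_VENUES = 10  # assumed cut-off, chosen only for illustration
too_few = venue_counts[venue_counts < MIN_VENUES].sort_values()
print(too_few)
```

On the real data, the same series can be obtained with `sthlm_venues['District'].value_counts()`.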


Create dummy (one-hot) variables that indicate which restaurant category each venue belongs to.

In [11]:
# one hot encoding
sthlm_onehot = pd.get_dummies(sthlm_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
sthlm_onehot['District'] = sthlm_venues['District'] 

# move district column to the first column
fixed_columns = [sthlm_onehot.columns[-1]] + list(sthlm_onehot.columns[:-1])
sthlm_onehot = sthlm_onehot[fixed_columns]

sthlm_onehot.head()

Unnamed: 0,District,American Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Breakfast Spot,Burger Joint,Café,...,Soup Place,Spanish Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Bromma-Vasterled,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,Bromma-Vasterled,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bromma-Vasterled,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bromma-Vasterled,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bromma-Vasterled,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Take the mean of the dummy variables per district to see how common each restaurant type is in each district.

In [12]:
sthlm_grouped = sthlm_onehot.groupby('District').mean().reset_index()
sthlm_grouped

Unnamed: 0,District,American Restaurant,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bistro,Breakfast Spot,Burger Joint,Café,...,Soup Place,Spanish Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Brannkyrka-Skarholmen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,...,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0
1,Bromma-Vasterled,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.071429,0.071429,...,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0
2,Centrala Stockholm,0.0,0.0,0.021277,0.0,0.06383,0.0,0.021277,0.042553,0.191489,...,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.021277,0.0,0.0
3,Enskede-Skarpnack,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Essingen,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Farsta-Vantor,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Hagersten-Liljeholmen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0
7,Hasselby-Vallingby,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0
8,Kungsholmen,0.011905,0.02381,0.0,0.011905,0.071429,0.02381,0.0,0.035714,0.083333,...,0.011905,0.011905,0.02381,0.095238,0.0,0.02381,0.047619,0.0,0.011905,0.02381
9,Sodermalm,0.0,0.023256,0.0,0.0,0.093023,0.023256,0.0,0.046512,0.116279,...,0.0,0.0,0.023256,0.069767,0.0,0.0,0.046512,0.0,0.023256,0.023256
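
To make the encoding and averaging steps concrete, here is a small made-up example (the districts `A` and `B` and their venues are invented). Averaging the 0/1 indicator columns per district gives the share of that district's venues that fall in each category.

```python
import pandas as pd

toy = pd.DataFrame({
    "District": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Café", "Bakery", "Café"],
})

# One indicator column per category, then the district column in front.
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "District", toy["District"])

# Mean of the indicators = share of venues per category within each district.
shares = onehot.groupby("District").mean()
print(shares)
```

District `B` has a single venue, so its only category gets a share of 1.0. This is the same pattern as Enskede-Skarpnack in the table above, where Café has the value 1.0.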


Build a table of the 10 most common restaurant types in each district.

In [13]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues; uses indicators for 1st, 2nd, 3rd, then falls back to 'th' (4th and onwards)
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
sthlm_venues_sorted = pd.DataFrame(columns=columns)
sthlm_venues_sorted['District'] = sthlm_grouped['District']

for ind in np.arange(sthlm_grouped.shape[0]):
    sthlm_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sthlm_grouped.iloc[ind, :], num_top_venues)

sthlm_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Brannkyrka-Skarholmen,Scandinavian Restaurant,Fast Food Restaurant,Pizza Place,Café,Italian Restaurant,Sushi Restaurant,Steakhouse,Vietnamese Restaurant,Ethiopian Restaurant,Falafel Restaurant
1,Bromma-Vasterled,Fast Food Restaurant,Pizza Place,Bakery,Sushi Restaurant,Indian Restaurant,Chinese Restaurant,Burger Joint,Café,French Restaurant,Food Court
2,Centrala Stockholm,Scandinavian Restaurant,Café,Bakery,Restaurant,Italian Restaurant,Burger Joint,Irish Pub,Salad Place,French Restaurant,Mexican Restaurant
3,Enskede-Skarpnack,Café,Vietnamese Restaurant,Eastern European Restaurant,Hawaiian Restaurant,Greek Restaurant,Gluten-free Restaurant,German Restaurant,Gastropub,French Restaurant,Food Truck
4,Essingen,Deli / Bodega,Asian Restaurant,Italian Restaurant,French Restaurant,Café,Pizza Place,Ethiopian Restaurant,Greek Restaurant,Gluten-free Restaurant,German Restaurant
5,Farsta-Vantor,Pizza Place,Vietnamese Restaurant,Eastern European Restaurant,Hawaiian Restaurant,Greek Restaurant,Gluten-free Restaurant,German Restaurant,Gastropub,French Restaurant,Food Truck
6,Hagersten-Liljeholmen,Greek Restaurant,Indian Restaurant,Thai Restaurant,Sushi Restaurant,Middle Eastern Restaurant,Pizza Place,Eastern European Restaurant,Gluten-free Restaurant,German Restaurant,Gastropub
7,Hasselby-Vallingby,Café,Fast Food Restaurant,American Restaurant,Food Court,Pizza Place,Hawaiian Restaurant,Sandwich Place,Restaurant,Sushi Restaurant,Taco Place
8,Kungsholmen,Indian Restaurant,Sushi Restaurant,Café,Bakery,Pizza Place,Scandinavian Restaurant,Thai Restaurant,Burger Joint,Japanese Restaurant,Italian Restaurant
9,Sodermalm,Café,Bakery,Greek Restaurant,Sushi Restaurant,Scandinavian Restaurant,Pizza Place,Indian Restaurant,Fast Food Restaurant,Italian Restaurant,Thai Restaurant
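
Note that districts with very few venues still get ten entries: Enskede-Skarpnack has a single venue (a Café), so its 2nd to 10th places are categories with a frequency of zero. One possible way to avoid this, sketched here on made-up shares, is to keep only categories with a non-zero frequency. `top_categories` is a hypothetical helper and not part of the original notebook.

```python
import pandas as pd

def top_categories(row, n=10):
    """Return up to n category names with non-zero frequency, most common first."""
    nonzero = row[row > 0]
    return nonzero.sort_values(ascending=False).head(n).index.tolist()

# Made-up shares for two districts (each row sums to 1).
shares = pd.DataFrame(
    {"Bakery": [0.2, 0.0], "Café": [0.3, 1.0], "Pizza Place": [0.5, 0.0]},
    index=["A", "B"],
)

top = {district: top_categories(shares.loc[district], n=2) for district in shares.index}
print(top)
```

The lists then have different lengths per district, so they fit a dictionary or a long-format table better than the fixed ten-column layout used above.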


Cluster the data using k-means. I use two clusters to see whether more expensive and less expensive districts differ. If the restaurant mix differed between expensive and less expensive districts, we would expect the less expensive districts to be clustered together.

In [14]:
# set number of clusters
kclusters = 2

sthlm_grouped_clustering = sthlm_grouped.drop('District', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sthlm_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=int32)
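
With `kclusters = 2`, k-means always returns two groups, even when the data contain only one natural group plus an outlier. The sketch below illustrates this on made-up frequency vectors: four similar rows and one row dominated by a single category (comparable to a district with one venue). The data and the explicit `n_init=10` are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up category shares: four similar "districts" and one single-venue outlier.
X = np.array([
    [0.30, 0.30, 0.40],
    [0.35, 0.30, 0.35],
    [0.25, 0.35, 0.40],
    [0.30, 0.35, 0.35],
    [1.00, 0.00, 0.00],  # all venues in one category
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

The outlier ends up alone in its own cluster. This mirrors the label array above, where only the fourth row (Enskede-Skarpnack, with a single venue) gets label 1.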

In [15]:

# add clustering labels
sthlm_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sthlm_merged = df_prices



### RESULTS 

Here are the 10 most common restaurant types in each District

In [16]:
# merge the price data with the top venues and cluster labels for each district
sthlm_merged = sthlm_merged.join(sthlm_venues_sorted.set_index('District'), on='District')


sthlm_merged # check the last columns!

Unnamed: 0,District,kr/sqm (12 mnth),avg price 12 mnth,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bromma-Vasterled,56926,3495000,59.337961,17.945547,0,Fast Food Restaurant,Pizza Place,Bakery,Sushi Restaurant,Indian Restaurant,Chinese Restaurant,Burger Joint,Café,French Restaurant,Food Court
1,Centrala Stockholm,91627,5438000,59.327695,18.069292,0,Scandinavian Restaurant,Café,Bakery,Restaurant,Italian Restaurant,Burger Joint,Irish Pub,Salad Place,French Restaurant,Mexican Restaurant
2,Brannkyrka-Skarholmen,45503,2932000,59.283816,17.970571,0,Scandinavian Restaurant,Fast Food Restaurant,Pizza Place,Café,Italian Restaurant,Sushi Restaurant,Steakhouse,Vietnamese Restaurant,Ethiopian Restaurant,Falafel Restaurant
3,Enskede-Skarpnack,58189,3306000,59.275398,18.105174,1,Café,Vietnamese Restaurant,Eastern European Restaurant,Hawaiian Restaurant,Greek Restaurant,Gluten-free Restaurant,German Restaurant,Gastropub,French Restaurant,Food Truck
4,Essingen,76204,3971000,59.321324,17.992583,0,Deli / Bodega,Asian Restaurant,Italian Restaurant,French Restaurant,Café,Pizza Place,Ethiopian Restaurant,Greek Restaurant,Gluten-free Restaurant,German Restaurant
5,Farsta-Vantor,45123,2718000,59.25153,18.089711,0,Pizza Place,Vietnamese Restaurant,Eastern European Restaurant,Hawaiian Restaurant,Greek Restaurant,Gluten-free Restaurant,German Restaurant,Gastropub,French Restaurant,Food Truck
6,Hagersten-Liljeholmen,66738,3830000,59.301059,17.983589,0,Greek Restaurant,Indian Restaurant,Thai Restaurant,Sushi Restaurant,Middle Eastern Restaurant,Pizza Place,Eastern European Restaurant,Gluten-free Restaurant,German Restaurant,Gastropub
7,Hasselby-Vallingby,41113,2567000,59.365964,17.871634,0,Café,Fast Food Restaurant,American Restaurant,Food Court,Pizza Place,Hawaiian Restaurant,Sandwich Place,Restaurant,Sushi Restaurant,Taco Place
8,Kungsholmen,88818,4981000,59.332419,18.036804,0,Indian Restaurant,Sushi Restaurant,Café,Bakery,Pizza Place,Scandinavian Restaurant,Thai Restaurant,Burger Joint,Japanese Restaurant,Italian Restaurant
9,Spaanga-Kista,34779,2262000,59.380716,17.905998,0,Pizza Place,Restaurant,Eastern European Restaurant,Thai Restaurant,Bakery,Gastropub,Food Truck,Chinese Restaurant,Greek Restaurant,Gluten-free Restaurant
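
One simple way to relate the clusters to house prices is to compare the price per square metre across cluster labels. The sketch below does this for the ten districts visible in the table above; districts not shown in this output are left out, so the numbers are only indicative.

```python
import pandas as pd

# District, kr/sqm and cluster label copied from the ten rows shown above.
merged = pd.DataFrame({
    "District": ["Bromma-Vasterled", "Centrala Stockholm", "Brannkyrka-Skarholmen",
                 "Enskede-Skarpnack", "Essingen", "Farsta-Vantor",
                 "Hagersten-Liljeholmen", "Hasselby-Vallingby", "Kungsholmen",
                 "Spaanga-Kista"],
    "kr/sqm (12 mnth)": [56926, 91627, 45503, 58189, 76204,
                         45123, 66738, 41113, 88818, 34779],
    "Cluster Labels": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
})

# Price summary per cluster label.
summary = merged.groupby("Cluster Labels")["kr/sqm (12 mnth)"].agg(["count", "min", "mean", "max"])
print(summary)
```

Cluster 1 contains a single district whose price lies inside the range spanned by cluster 0, so the clustering does not separate expensive from less expensive districts.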


Here is a visualization of the k-means result, where the districts are colored by cluster based on the types of restaurants found in each district.

In [17]:
# create map

address = 'Stockholm, Sweden'

geolocator = Nominatim(user_agent="sthlm_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Stockholm are {}, {}.'.format(latitude, longitude))

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(sthlm_merged['latitude'], sthlm_merged['longitude'], sthlm_merged['District'], sthlm_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

The geographical coordinates of Stockholm are 59.3251172, 18.0710935.


### DISCUSSION

The k-means result shows that only one district is distinct from the others. This likely indicates that all districts have similar restaurant types. The one district may be genuinely distinct, but more likely it only appears so because k-means was asked for two clusters and therefore has to produce two groups. That district (Enskede-Skarpnack) also has only a single venue in the data, which makes its restaurant profile unreliable.

Someone who wants to establish a new restaurant in Stockholm should look at the most common restaurant types in each district and stay away from those types if they want to avoid competition.

### CONCLUSION

In conclusion, this study gives no indication that house prices affect which restaurant types can afford to establish themselves in a given location in Stockholm. The kinds of restaurants seem to be similar across the districts of Stockholm.