# Capstone Project IBM: Copenhagen Neighborhood Analysis

## Introduction

Copenhagen is the capital and most populous city of Denmark, and it is reputed to be the happiest city in the world. I will spend an academic exchange semester in Copenhagen next semester and would like to better understand what the city has to offer. Choosing a neighborhood to live in can be complicated, but location data from Foursquare can make it a little easier.

## Problem

The objective of this capstone project is to analyze and select the best locations close to the city of Copenhagen using data science methodology and machine learning techniques such as clustering. The project aims to answer the question: in the city of Copenhagen, which neighborhood would be a better place to live during my exchange period?

## Target Audience

- People interested in moving to Copenhagen and looking for the neighborhood that best fits their needs
- People interested in Copenhagen's facilities for other purposes, such as opening a restaurant or starting a business
- Beginner data scientists who may use this research as an example

## Data

For this project, the following data is needed:
- List of neighborhoods in Copenhagen
- Latitude and longitude coordinates of neighborhoods to get the venue data
- Venue details

<u> 1) Location Data </u>

First, we need a full list of Copenhagen neighborhoods. The <b> Regionh.dk </b> list of postal codes in Copenhagen is a good place to start: https://www.regionh.dk/english/about-the-capital-region/facts-about-the-region/Pages/Postal-codes.aspx

<b> BeautifulSoup </b> is a Python library used for pulling data out of HTML. We will use it to parse the Regionh.dk page.

For geolocation data, we will use the Google Maps <b> Geocoding API</b>. See the Geocoding Developer Guide for more information.

<u> 2) Venues Data (Foursquare API) </u>

<b> Foursquare API </b> provides information about venues and geolocation. We will use it to get the venue data for Copenhagen neighborhoods. Foursquare has one of the largest venue databases, with 105+ million places, and is used by over 125,000 developers. The API exposes many venue attributes, such as name, location, hours, rating, and prices.

### Import Required Libraries

In [1]:
import pandas as pd # Library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np # Library to handle data in a vectorized manner

import requests # Library to handle requests
from bs4 import BeautifulSoup

import json # Library to handle JSON files
from pandas import json_normalize # Transform JSON file into a pandas dataframe (pandas >= 1.0)

from geopy.geocoders import Nominatim

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # Map rendering library

# Import k-means for the clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Libraries imported.


### Neighborhoods Data

In [2]:
# Get the data
result = requests.get("https://www.regionh.dk/english/about-the-capital-region/facts-about-the-region/Pages/Postal-codes.aspx")
data = result.content
soup = BeautifulSoup(data,'html5lib')

# Create the Dataframe with HTML table data
table=soup.find('table')
cph = pd.read_html(str(table), header=0, flavor='bs4')[0]
cph.columns = ["Neighborhood", "PostalCode"]

# Eliminate unsuitable postal codes
cph = cph[cph['PostalCode'].str.contains('N/A')==False]
cph = cph[cph['PostalCode'].str.startswith('3')==False]
cph = cph[cph['PostalCode'].str.startswith('4')==False]

# Eliminate distant neighborhoods
distant_neighborhoods = ['Hørsholm','Nivå','Kokkedal','Rungsted Kyst','Vedbæk',
                         'Skodsborg','Nærum','Holte','Virum','Måløv','Smørum','Hedehusene']
cph = cph[cph['Neighborhood'].isin(distant_neighborhoods)==False]

cph.reset_index(inplace=True)
cph

Unnamed: 0,index,Neighborhood,PostalCode
0,0,Albertslund,2620
1,3,Bagsværd,2880
2,4,Ballerup,2750
3,6,Brøndby,2605
4,7,Brøndby Strand,2660
5,8,Brønshøj,2700
6,9,Charlottenlund,2920
7,10,Dragør,2791
8,12,Dyssegård,2870
9,17,Frederiksberg,2000


### Google Maps Geocoding API

In [3]:
API_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'

# Initializing the latitude and longitude lists
latitudes = [] 
longitudes = []

for nbd in cph["Neighborhood"]:
    params = {
        'key': API_KEY,
        'address': "{}, Denmark".format(nbd)
    }
    
    # Loads the JSON file in the form of a python dictionary
    base_url = 'https://maps.googleapis.com/maps/api/geocode/json?'
    response = requests.get(base_url, params=params)
    data = response.json()
    
    # Extracts the results information out of the JSON file
    result = data['results'][0]
    location = result['geometry']['location']
    
    lat = location['lat']
    lng = location['lng']
    
    # Appending to the lists
    latitudes.append(lat) 
    longitudes.append(lng) 
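
The loop above indexes `data['results'][0]` directly, which raises an `IndexError` if the geocoder returns no match for an address. A more defensive variant could look like the sketch below. It assumes the response carries a `status` field next to `results`, as described in the Geocoding documentation; the helper name `extract_lat_lng` and the two hand-written payloads are hypothetical and only mimic the fields used in the loop.

```python
def extract_lat_lng(data):
    """Return (lat, lng) from a geocoding response dict, or None if absent."""
    if data.get("status") != "OK":
        return None
    results = data.get("results") or []
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]

# Hand-written payloads (hypothetical) with the same shape the loop relies on
ok_payload = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 55.677263, "lng": 12.343027}}}],
}
empty_payload = {"status": "ZERO_RESULTS", "results": []}

print(extract_lat_lng(ok_payload))
print(extract_lat_lng(empty_payload))
```

With a helper like this, the loop could append `None` values for unmatched neighborhoods and drop them afterwards instead of stopping halfway through.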

In [4]:
# Add Latitude and Longitude columns to the Dataframe
cph['Latitude'] = latitudes
cph['Longitude'] = longitudes
    
cph.head(5)

Unnamed: 0,index,Neighborhood,PostalCode,Latitude,Longitude
0,0,Albertslund,2620,55.677263,12.343027
1,3,Bagsværd,2880,55.765314,12.457244
2,4,Ballerup,2750,55.724508,12.354765
3,6,Brøndby,2605,55.647287,12.414867
4,7,Brøndby Strand,2660,55.618777,12.417375


### Visualize Neighborhoods

In [5]:
address = 'Copenhagen, Denmark'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Copenhagen are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Copenhagen are 55.6867243, 12.5700724.


In [6]:
# create map of Copenhagen using latitude and longitude values
map_cph = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(cph['Latitude'], cph['Longitude'], cph['Neighborhood']):
    label = '{}, Denmark'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cph)  
    
map_cph

### Foursquare API

In [7]:
CLIENT_ID = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # Foursquare ID
CLIENT_SECRET = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # Default Foursquare API limit value

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
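
The request URL in `getNearbyVenues` is assembled with string formatting. As an alternative, the query string can be built from a dictionary so that every value is escaped consistently. The sketch below uses only the standard library and the same parameter names as the URL above; the helper name `build_explore_url` and the placeholder credentials are assumptions made for illustration.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from a dict of query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes special characters, e.g. the comma in "ll"
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials, not real keys
url = build_explore_url("ID", "SECRET", "20180605", 55.677263, 12.343027, 500, 100)
print(url)
```

Passing the same dictionary as `params=` to `requests.get` has a similar effect, which is what the geocoding cell earlier in the notebook already does.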

In [9]:
# Get venues data
cph_venues = getNearbyVenues(cph['Neighborhood'], cph['Latitude'], cph['Longitude'], radius=500)
cph_venues.head()

Albertslund
Bagsværd
Ballerup
Brøndby
Brøndby Strand
Brønshøj
Charlottenlund
Dragør
Dyssegård
Frederiksberg
Gentofte
Glostrup
Hellerup
Herlev
Hvidovre
Ishøj
Kastrup
Klampenborg
Kongens Lyngby
København N
København NV
København S
København SV
København Ø
Rødovre
Skovlunde
Søborg
Taastrup
Valby
Vallensbæk
Vallensbæk Strand
Vanløse


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Albertslund,55.677263,12.343027,Galgebakken Torv,55.679553,12.347632,Plaza
1,Albertslund,55.677263,12.343027,Herstedly,55.677052,12.336611,Bed & Breakfast
2,Bagsværd,55.765314,12.457244,Aldershvile Planteskole,55.765009,12.457966,Flower Shop
3,Bagsværd,55.765314,12.457244,Juleaften,55.762165,12.461558,Diner
4,Bagsværd,55.765314,12.457244,Berent & Ko,55.761201,12.456102,Steakhouse


In [10]:
# Number of venues returned for each neighborhood
cph_venues.groupby('Neighborhood').count().head(5)

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Albertslund,2,2,2,2,2,2
Bagsværd,5,5,5,5,5,5
Ballerup,6,6,6,6,6,6
Brøndby,5,5,5,5,5,5
Brøndby Strand,9,9,9,9,9,9


### Analyze Each Neighborhood

#### Frequency of occurrence of each category

In [11]:
# one hot encoding
cph_onehot = pd.get_dummies(cph_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
cph_onehot['Neighborhood'] = cph_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [cph_onehot.columns[-1]] + list(cph_onehot.columns[:-1])
cph_onehot = cph_onehot[fixed_columns]

# taking the mean of the frequency of occurrence
cph_grouped = cph_onehot.groupby('Neighborhood').mean().reset_index()
cph_grouped.head(5)

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bar,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bistro,Bookstore,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Bus Station,Café,Camera Store,Candy Store,Castle,Cemetery,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Diner,Discount Store,Dive Bar,Electronics Store,Falafel Restaurant,Farm,Fast Food Restaurant,Flower Shop,Food Court,Food Truck,French Restaurant,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,Gift Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,History Museum,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Movie Theater,Multiplex,Museum,Music Venue,Nightclub,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pie Shop,Pizza Place,Playground,Plaza,Pub,Restaurant,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Spa,Sporting Goods Shop,Stables,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Tourist Information Center,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Albertslund,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bagsværd,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ballerup,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brøndby,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Brøndby Strand,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.444444,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0
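
Taking the mean of one-hot columns per neighborhood is the same as computing each category's share of that neighborhood's venues. The sketch below reproduces the idea without pandas. The two Albertslund rows match the venue table shown earlier; the Ballerup rows are a shortened, made-up sample used only to show a category that occurs twice.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs
venues = [
    ("Albertslund", "Plaza"),
    ("Albertslund", "Bed & Breakfast"),
    ("Ballerup", "Lounge"),      # made-up sample rows from here on
    ("Ballerup", "Lounge"),
    ("Ballerup", "Stadium"),
    ("Ballerup", "Nightclub"),
]

counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# count / neighborhood total equals the mean of the matching one-hot column
frequencies = {
    hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
    for hood, cats in counts.items()
}

for hood, cats in frequencies.items():
    print(hood, cats)
```

One consequence worth keeping in mind: a neighborhood with only two venues, such as Albertslund, gets a frequency of 0.5 per category, so sparse neighborhoods produce coarse profiles.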


#### Top 5 most common venues for each neighborhood

In [12]:
num_top_venues = 5

for hood in cph_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = cph_grouped[cph_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Albertslund----
               venue  freq
0              Plaza   0.5
1    Bed & Breakfast   0.5
2  Mobile Phone Shop   0.0
3         Playground   0.0
4        Pizza Place   0.0


----Bagsværd----
               venue  freq
0  Convenience Store   0.2
1        Flower Shop   0.2
2         Steakhouse   0.2
3             Garden   0.2
4              Diner   0.2


----Ballerup----
               venue  freq
0  Accessories Store  0.17
1             Lounge  0.17
2     Clothing Store  0.17
3          Nightclub  0.17
4            Stadium  0.17


----Brøndby----
               venue  freq
0  Convention Center   0.2
1         Food Court   0.2
2     Soccer Stadium   0.2
3       Soccer Field   0.2
4             Lounge   0.2


----Brøndby Strand----
                  venue  freq
0         Grocery Store  0.44
1  Gym / Fitness Center  0.11
2         Train Station  0.11
3           Bus Station  0.11
4                 Beach  0.11


----Brønshøj----
                    venue  freq
0          Cosmetics

#### DataFrame of the top 10 most common venues for each neighborhood

In [13]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = cph_grouped['Neighborhood']

for ind in np.arange(cph_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(cph_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Albertslund,Plaza,Bed & Breakfast,Falafel Restaurant,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Dance Studio
1,Bagsværd,Steakhouse,Convenience Store,Diner,Garden,Flower Shop,Yoga Studio,Deli / Bodega,Discount Store,Dessert Shop,Cupcake Shop
2,Ballerup,Accessories Store,Stadium,Lounge,Nightclub,Clothing Store,Playground,Asian Restaurant,Concert Hall,Convention Center,Art Gallery
3,Brøndby,Soccer Stadium,Lounge,Soccer Field,Convention Center,Food Court,Yoga Studio,Deli / Bodega,Diner,Dessert Shop,Dance Studio
4,Brøndby Strand,Grocery Store,Gym / Fitness Center,Train Station,Shopping Mall,Beach,Bus Station,Deli / Bodega,Discount Store,Diner,Dessert Shop
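
The column names above rely on a three-item suffix list plus a fallback to "th". That is enough for ten columns, but it would label a 21st column as "21th". A small helper, sketched here under the hypothetical name `ordinal`, handles the general case and builds the same ten headers.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```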


### Cluster Neighborhoods

In [14]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

cph_grouped_clustering = cph_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cph_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 3, 0, 3, 3, 0, 0], dtype=int32)
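
To make the clustering step less of a black box, here is a minimal sketch of the basic k-means iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the labels stop changing. This is not scikit-learn's implementation, which by default uses k-means++ initialization and further optimizations. The function names, the fixed starting centroids, and the toy frequency vectors are all assumptions chosen for illustration.

```python
def assign(points, centroids):
    """Return, for each point, the index of the nearest centroid."""
    labels = []
    for p in points:
        distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # leave an empty cluster's centroid in place
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids

# Toy frequency vectors (hypothetical), columns = (Grocery Store, Restaurant)
points = [(0.5, 0.0), (0.4, 0.1), (0.0, 0.5), (0.1, 0.4)]
labels, centroids = kmeans(points, centroids=[points[0], points[2]])
print(labels)
print(centroids)
```

In the notebook, each point is a neighborhood's full vector of category frequencies, so neighborhoods end up in the same cluster when their venue mixes are similar.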

In [15]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

cph_merged = cph

# join the sorted venue table onto cph so each neighborhood keeps its latitude/longitude
cph_merged = cph_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how='inner')

cph_merged.head() # check the last columns!

Unnamed: 0,index,Neighborhood,PostalCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Albertslund,2620,55.677263,12.343027,0,Plaza,Bed & Breakfast,Falafel Restaurant,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Dance Studio
1,3,Bagsværd,2880,55.765314,12.457244,0,Steakhouse,Convenience Store,Diner,Garden,Flower Shop,Yoga Studio,Deli / Bodega,Discount Store,Dessert Shop,Cupcake Shop
2,4,Ballerup,2750,55.724508,12.354765,0,Accessories Store,Stadium,Lounge,Nightclub,Clothing Store,Playground,Asian Restaurant,Concert Hall,Convention Center,Art Gallery
3,6,Brøndby,2605,55.647287,12.414867,0,Soccer Stadium,Lounge,Soccer Field,Convention Center,Food Court,Yoga Studio,Deli / Bodega,Diner,Dessert Shop,Dance Studio
4,7,Brøndby Strand,2660,55.618777,12.417375,3,Grocery Store,Gym / Fitness Center,Train Station,Shopping Mall,Beach,Bus Station,Deli / Bodega,Discount Store,Diner,Dessert Shop


### Visualize Clusters

In [16]:
from geopy.geocoders import Nominatim
address = 'Copenhagen, Denmark'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Copenhagen are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Copenhagen are 55.6867243, 12.5700724.


In [17]:
# create map of Copenhagen using latitude and longitude values
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cph_merged['Latitude'], cph_merged['Longitude'], cph_merged['Neighborhood'], cph_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color = rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
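
A note on the color lookup in the cell above: the markers use `rainbow[int(cluster-1)]`, and cluster labels start at 0. For label 0 the index is -1, which Python resolves to the last palette entry. Every cluster still gets its own color, only shifted by one position. The sketch below shows this with a stand-in list of color names; the strings are placeholders, not the hex values produced by matplotlib.

```python
kclusters = 5
# Stand-in palette (placeholder names, not real hex colors)
rainbow = ["color0", "color1", "color2", "color3", "color4"]

for cluster in range(kclusters):
    # cluster - 1 is -1 for label 0, so Python wraps around to the last entry
    shifted = rainbow[cluster - 1]
    direct = rainbow[cluster]
    print(cluster, shifted, direct)

shifted_colors = [rainbow[c - 1] for c in range(kclusters)]
```

Indexing with `rainbow[cluster]` would give the same one-color-per-cluster result without relying on negative indexing.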

### Examine Clusters

#### Cluster 1

In [18]:
cph_merged.loc[cph_merged['Cluster Labels'] == 0, cph_merged.columns[[0] + list(range(5, cph_merged.shape[1]))]]

Unnamed: 0,index,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,0,Plaza,Bed & Breakfast,Falafel Restaurant,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Dance Studio
1,3,0,Steakhouse,Convenience Store,Diner,Garden,Flower Shop,Yoga Studio,Deli / Bodega,Discount Store,Dessert Shop,Cupcake Shop
2,4,0,Accessories Store,Stadium,Lounge,Nightclub,Clothing Store,Playground,Asian Restaurant,Concert Hall,Convention Center,Art Gallery
3,6,0,Soccer Stadium,Lounge,Soccer Field,Convention Center,Food Court,Yoga Studio,Deli / Bodega,Diner,Dessert Shop,Dance Studio
5,8,0,Grocery Store,Sporting Goods Shop,Furniture / Home Store,Ice Cream Shop,Flower Shop,Falafel Restaurant,Discount Store,Cosmetics Shop,Coffee Shop,Park
8,12,0,Discount Store,Gym / Fitness Center,Restaurant,Fast Food Restaurant,Ice Cream Shop,Yoga Studio,Dance Studio,Diner,Dessert Shop,Deli / Bodega
9,17,0,Gastropub,Gym / Fitness Center,Park,Scandinavian Restaurant,Yoga Studio,Dive Bar,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop
10,20,0,Italian Restaurant,Convenience Store,Train Station,Gym / Fitness Center,Hotel,Yoga Studio,Dive Bar,Discount Store,Diner,Dessert Shop
11,22,0,Electronics Store,Toy / Game Store,Shop & Service,Café,Restaurant,Cosmetics Shop,Park,Convenience Store,Convention Center,Dive Bar
12,29,0,Grocery Store,Sushi Restaurant,Hotel,Ice Cream Shop,Food Truck,Park,Restaurant,Pizza Place,Asian Restaurant,Bakery


#### Cluster 2

In [19]:
cph_merged.loc[cph_merged['Cluster Labels'] == 1, cph_merged.columns[[0] + list(range(5, cph_merged.shape[1]))]]

Unnamed: 0,index,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,32,1,Snack Place,Yoga Studio,Electronics Store,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Dance Studio


#### Cluster 3

In [20]:
cph_merged.loc[cph_merged['Cluster Labels'] == 2, cph_merged.columns[[0] + list(range(5,cph_merged.shape[1]))]]

Unnamed: 0,index,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,79,2,Pharmacy,Bar,Yoga Studio,Falafel Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Dance Studio
30,80,2,Pharmacy,Bar,Yoga Studio,Falafel Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Dance Studio


#### Cluster 4

In [21]:
cph_merged.loc[cph_merged['Cluster Labels'] == 3, cph_merged.columns[[0] + list(range(5,cph_merged.shape[1]))]]

Unnamed: 0,index,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,7,3,Grocery Store,Gym / Fitness Center,Train Station,Shopping Mall,Beach,Bus Station,Deli / Bodega,Discount Store,Diner,Dessert Shop
6,9,3,Castle,Athletics & Sports,Yoga Studio,Farm,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega
7,10,3,Grocery Store,Pharmacy,Shopping Mall,Supermarket,Dive Bar,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop
14,38,3,Grocery Store,Gas Station,Café,Athletics & Sports,Fast Food Restaurant,Gym / Fitness Center,Gastropub,Garden Center,Convenience Store,Convention Center
15,40,3,Gas Station,Convenience Store,Garden,Supermarket,Yoga Studio,Dance Studio,Discount Store,Diner,Dessert Shop,Deli / Bodega
18,46,3,Grocery Store,Thai Restaurant,Gas Station,Park,Playground,Arts & Crafts Store,Comfort Food Restaurant,Art Gallery,Convenience Store,Convention Center
22,52,3,Grocery Store,Bakery,Pizza Place,Plaza,Pub,Café,Burger Joint,Pharmacy,Hotel,Deli / Bodega
24,63,3,Grocery Store,Gas Station,Brewery,Fast Food Restaurant,Electronics Store,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop
26,74,3,Grocery Store,Pizza Place,Bakery,Convenience Store,Pharmacy,Ice Cream Shop,Discount Store,Plaza,Toy / Game Store,Comfort Food Restaurant
27,75,3,Grocery Store,Café,Discount Store,Steakhouse,Falafel Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop


#### Cluster 5

In [22]:
cph_merged.loc[cph_merged['Cluster Labels'] == 4, cph_merged.columns[[0] + list(range(5,cph_merged.shape[1]))]]

Unnamed: 0,index,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,67,4,Basketball Court,Yoga Studio,Farm,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega


### Observations and Conclusion

- Three clusters (2, 3 and 5) contain only one or two neighborhoods each. These clusters appear to lie in the areas farthest from the city center. Such outlying neighborhoods are harder to characterize and are therefore not a good option for a stay close to Copenhagen.

- The remaining two clusters contain the most neighborhoods (10 and 18) and seem more relevant for a stay in the city. Both are well defined and reflect the variety of activities available in Copenhagen.

- In general, Cluster 4 covers the more remote areas of the city, while Cluster 1 covers the center.

- Cluster 4 seems suited to people who value convenience: "Grocery Store" and "Gas Station" are the most common categories in its neighborhoods.

- Cluster 1 seems suited to people who prefer more varied activities, from a range of restaurants to cultural and sports outings. Unlike Cluster 4, Cluster 1 has no single dominant category.

- Consequently, you can expect more essential facilities in the outlying southwest of the city. On the other hand, if you plan to stay longer in Copenhagen, the center/east should be more appropriate.
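
The observations about which clusters are "farthest from the city center" are based on the map. One way to check them numerically would be to compute the great-circle distance from each neighborhood to the Copenhagen coordinates returned by Nominatim. The sketch below uses the haversine formula with coordinates taken from the tables above; the function name `haversine_km` and the Earth-radius constant are assumptions of this example, and this check was not part of the original analysis.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

center = (55.6867243, 12.5700724)  # Copenhagen, from the Nominatim output
neighborhoods = {
    "Albertslund": (55.677263, 12.343027),
    "Ballerup": (55.724508, 12.354765),
}

distances = {
    name: haversine_km(*center, *coords) for name, coords in neighborhoods.items()
}
for name, d in sorted(distances.items(), key=lambda item: item[1]):
    print("{}: {:.1f} km".format(name, d))
```

Averaging these distances per cluster label would show whether the small clusters really sit farther out than Clusters 1 and 4.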
