# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone Course by IBM/Coursera

### Table of contents
1. [Introduction](#introduction)
2. [Data](#data)
3. [Methodology](#methodology)
4. [Results](#results)
5. [Discussion](#discussion)
6. [Conclusion](#conclusion)

## 1. Introduction <a name="introduction"></a>

Berlin is the capital and largest city of Germany by both area and population. It hosts an exciting art and nightlife scene and, best of all, it is very affordable to live in. When looking for an apartment, however, it is easy to get lost deciding which neighborhood suits you best.

Berlin is divided into 12 boroughs (*Bezirke* in German), each with its own vibe and feel. This project analyzes Berlin's neighborhoods by exploring the venues surrounding public transport stations and describes the resulting clusters of neighborhoods. Berlin's public transport system consists of the U-Bahn (underground railway), the S-Bahn (urban rapid transit railway), buses, and trams, which are run by the Berliner Verkehrsbetriebe (BVG) and Deutsche Bahn (DB). In the scope of this work, I focused on **exploring and clustering venues surrounding the U-Bahn (173) and S-Bahn (166) stations**, which make a total of 339 stations.

The results could be useful for someone planning to move to Berlin (to study or to work) who would like to find a place that fits their lifestyle and interests.

## 2. Data <a name="data"></a>

> The following data sources were used to extract and generate the required information:
* list of the Berlin U-Bahn stations (173 stations, total route length: 151.7 km) and their geographical coordinates, web scraped from 
[Wikipedia: Liste der Berliner U-Bahnhöfe](https://de.wikipedia.org/wiki/Liste_der_Berliner_U-Bahnh%C3%B6fe).
* list of the Berlin S-Bahn stations (166 stations, total route length: 327.4 km) and their geographical coordinates, web scraped from 
[Wikipedia: Liste der Stationen der S-Bahn Berlin](https://de.wikipedia.org/wiki/Liste_der_Stationen_der_S-Bahn_Berlin).
* Foursquare Venue Data, used to explore and cluster the neighborhoods surrounding each U-Bahn and S-Bahn station. The average distance between two neighboring U-Bahn stations is 0.87 km, and between two S-Bahn stations it is 1.97 km. To cover a reasonable walking distance around each station while limiting overlap, the radius of the query was limited to 500 meters.
* list of the venue categories on the main level and on lower levels from Foursquare Venue Categories.

#### Let's first install (if needed) and import necessary libraries:

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!pip3 install beautifulsoup4
#!pip3 install lxml
#!pip3 install requests
from bs4 import BeautifulSoup
import requests

# convert an address into latitude and longitude values
#!pip3 install geopy
from geopy.geocoders import Nominatim

# Matplotlib and associated plotting modules
#!pip3 install matplotlib
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

# import k-means from clustering stage
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from collections import Counter, defaultdict

# map rendering library
#!pip3 install folium
import folium

# library to handle JSON files
import json

# transform JSON file into a pandas dataframe (pandas.io.json.json_normalize was removed in pandas 2.0)
from pandas import json_normalize
import xml.etree.ElementTree as et 

print("All requested libraries were successfully imported!")

All requested libraries were successfully imported!


#### Let's scrape data from the Wikipedia page and get a list of U-Bahn stations in Berlin and their coordinate values:

First, get a local copy of the Wikipedia article.

In [2]:
import urllib.request

url = 'https://de.wikipedia.org/wiki/Liste_der_Berliner_U-Bahnh%C3%B6fe'
req = urllib.request.urlopen(url)
article = req.read().decode()

# write as UTF-8 explicitly so that umlauts survive on every platform
with open('Liste_der_Berliner_U-Bahnhoefe.html', 'w', encoding='utf-8') as fo:
    fo.write(article)

Load article, turn it into soup and get the table.

In [3]:
# read back the file saved in the previous cell (same file name, same encoding)
with open('Liste_der_Berliner_U-Bahnhoefe.html', encoding='utf-8') as fi:
    article = fi.read()
soup = BeautifulSoup(article, 'html.parser')
tables = soup.find_all('table', class_='sortable')

# Search through the tables for the one whose header cells contain the headings we want.
wanted_headings = {'Linie', 'Eröffnung', 'Lage', 'Ortsteil', 'Umstieg'}
for table in tables:
    headings = {th.text.strip() for th in table.find_all('th')}
    if wanted_headings <= headings:
        break
else:
    raise ValueError('No table with the expected headings was found.')

Transform the wanted data into a pandas dataframe.

In [4]:
table_rows = table.find_all('tr')
res = []
for tr in table_rows:
    # keep the non-empty text of every data cell in the row
    row = [td.text.strip() for td in tr.find_all('td') if td.text.strip()]
    if row:
        res.append(row)
df_ubahn = pd.DataFrame(res, columns=["Bahnhof", "Linie", "Eröffnung", "Lage", "Ortsteil", "Umstieg", "Denkmal"])

# Remove unnecessary columns, keep only the first column
df_ubahn = df_ubahn[["Bahnhof"]]
df_ubahn.head()

| | Bahnhof |
|---|---|
| 0 | Adenauerplatz (Ad)52° 29′ 59″ N, 13° 18′ 26″ O |
| 1 | Afrikanische Straße (Afr)52° 33′ 38″ N, 13° 20... |
| 2 | Alexanderplatz (A)52° 31′ 17″ N, 13° 24′ 48″ O |
| 3 | Alexanderplatz (Al)52° 31′ 17″ N, 13° 24′ 48″ O |
| 4 | Alexanderplatz (Ap)52° 31′ 17″ N, 13° 24′ 48″ O |


Clean the dataframe and filter only wanted data.

In [5]:
# Remove parentheses and all the data within them since they're not necessary
df_ubahn = df_ubahn["Bahnhof"].str.replace(r"\(.*\)", "", regex=True)  # regex=True is required since pandas 2.0

# Remove duplicate stations
df_ubahn.drop_duplicates(keep = 'first', inplace=True)

# Spliting the dataframe into 3 new columns: Station, Latitude, Longitude
df_ubahn = df_ubahn.to_frame().reset_index(drop=True) # Reset index
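df_ubahn = df_ubahn.Bahnhof.str.extract(r'(?P<Station>.*?)(?P<Latitude>\d+°[^,]+),(?P<Longitude>.*)', expand=True)
# Remove the whitespace left around the extracted parts (e.g. the space after the station name)
df_ubahn = df_ubahn.apply(lambda col: col.str.strip())
df_ubahn.head()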

| | Station | Latitude | Longitude |
|---|---|---|---|
| 0 | Adenauerplatz | 52° 29′ 59″ N | 13° 18′ 26″ O |
| 1 | Afrikanische Straße | 52° 33′ 38″ N | 13° 20′ 3″ O |
| 2 | Alexanderplatz | 52° 31′ 17″ N | 13° 24′ 48″ O |
| 3 | Altstadt Spandau | 52° 32′ 21″ N | 13° 12′ 20″ O |
| 4 | Alt-Mariendorf | 52° 26′ 23″ N | 13° 23′ 15″ O |
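As a quick check of the two cleaning steps above, the same two regular expressions can be applied to a single raw cell with Python's `re` module. This is a sketch: `parse_station_cell` is a hypothetical helper, and it assumes every cell follows the "Name (Abbreviation)DMS N, DMS O" layout seen in the first table.

```python
import re

# Same patterns as the notebook: drop the "(...)" abbreviation, then split name / latitude / longitude
PARENS = re.compile(r"\(.*\)")
PARTS = re.compile(r"(?P<Station>.*?)(?P<Latitude>\d+°[^,]+),(?P<Longitude>.*)")


def parse_station_cell(cell):
    """Split a raw Wikipedia cell into (station, latitude DMS, longitude DMS)."""
    cleaned = PARENS.sub("", cell)
    match = PARTS.match(cleaned)
    if match is None:
        raise ValueError("unexpected cell format: {!r}".format(cell))
    # strip() removes the space after the name and the space before the longitude
    return tuple(part.strip() for part in match.group("Station", "Latitude", "Longitude"))


print(parse_station_cell("Adenauerplatz (Ad)52° 29′ 59″ N, 13° 18′ 26″ O"))
```

Without the final `strip()`, the station name keeps a trailing space and the longitude a leading one.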


Check the size of the cleaned dataframe.

In [None]:
df_ubahn.shape

Convert the coordinates from degrees, minutes, seconds (DMS) to decimal degrees.

In [6]:
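import re

def dms2dd(s):
    """Convert a DMS string such as '52° 29′ 59″ N' to decimal degrees."""
    # split on the degree, minute and second symbols and trim the surrounding whitespace
    degrees, minutes, seconds, direction = [part.strip() for part in re.split(r'[°′″]+', s)]
    dd = float(degrees) + float(minutes)/60 + float(seconds)/(60*60)
    # German coordinates use N/S and O (Ost = east)/W; south and west are negative
    if direction in ('S', 'W'):
        dd *= -1
    return dd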

df_ubahn['Latitude'] = df_ubahn['Latitude'].apply(dms2dd)
df_ubahn['Longitude'] = df_ubahn['Longitude'].apply(dms2dd)
df_ubahn.head()

| | Station | Latitude | Longitude |
|---|---|---|---|
| 0 | Adenauerplatz | 52.499722 | 13.307222 |
| 1 | Afrikanische Straße | 52.560556 | 13.334167 |
| 2 | Alexanderplatz | 52.521389 | 13.413333 |
| 3 | Altstadt Spandau | 52.539167 | 13.205556 |
| 4 | Alt-Mariendorf | 52.439722 | 13.3875 |
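The conversion applied by `dms2dd` is the standard one, with the result negated for southern latitudes and western longitudes. Worked through for Adenauerplatz, it reproduces the first row of the table above:

$$
\text{DD} = d + \frac{m}{60} + \frac{s}{3600}
$$

$$
52 + \frac{29}{60} + \frac{59}{3600} \approx 52.499722, \qquad 13 + \frac{18}{60} + \frac{26}{3600} \approx 13.307222
$$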


#### Export the dataframe to a csv file:

In [None]:
df_ubahn.to_csv("ubahn_berlin.csv")

## 3. Methodology <a name="methodology"></a>

### 3.1. Exploratory Data Analysis

Now let's get the Foursquare venue data. The average distance between two neighboring U-Bahn stations is 0.87 km. To limit overlap between the search areas of neighboring stations, the queries below use a radius of 300 meters.
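Whether two neighboring search areas overlap follows from simple geometry: circles of radius r whose centers are d apart overlap when 2r > d, by 2r - d along the line joining the centers. A short sketch using the average spacings quoted in this report (averages only, so individual pairs of stations closer than 2r still overlap); `overlap_width` is a hypothetical helper:

```python
def overlap_width(spacing_m, radius_m):
    """Overlap (in meters) of two circular search areas along the line joining their centers.

    Two circles of radius r whose centers are d apart overlap when 2r > d.
    """
    return max(0.0, 2 * radius_m - spacing_m)


# Average spacings quoted in the text: 0.87 km (U-Bahn) and 1.97 km (S-Bahn)
for label, spacing in [("U-Bahn", 870), ("S-Bahn", 1970)]:
    for radius in (300, 500):
        print(label, radius, overlap_width(spacing, radius))
```

At the average U-Bahn spacing, a 300 m radius leaves a gap, while a 500 m radius overlaps by about 130 m; at the S-Bahn spacing neither radius overlaps.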

Let's start with finding coordinates for Berlin:

#### Use the geopy library to get the latitude and longitude values of Berlin and show the map

In [7]:
# get the coordinate values of Berlin
address = 'Berlin, Germany'

geolocator = Nominatim(user_agent='berlin_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Berlin are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Berlin are 52.5170365, 13.3888599.


In [8]:
# Create a map of Berlin using extracted coordinates
map_berlin_ubahn = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to map
for lat, lng, station in zip(
        df_ubahn['Latitude'],
        df_ubahn['Longitude'],
        df_ubahn['Station']):
    label = '{}'.format(station)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_berlin_ubahn)
    
# Show map
map_berlin_ubahn

#### Let's define Foursquare credentials and version:

In [37]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (replace with your own; keep it out of public repositories)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (replace with your own; keep it out of public repositories)
VERSION = '20190718' # Foursquare API version

#### Let's explore venues surrounding the first U-Bahn station in the dataframe

In [10]:
# get the first U-Bahn station's name
first_ubahn = df_ubahn.loc[0, 'Station']

# get its latitude and longitude values
first_ubahn_latitude = df_ubahn.loc[0, 'Latitude']
first_ubahn_longitude = df_ubahn.loc[0, 'Longitude']

print("The first neighborhood's name is {}. Its coordinates are {}, {}".format(first_ubahn,
                                                                              first_ubahn_latitude,
                                                                              first_ubahn_longitude))

The first neighborhood's name is Adenauerplatz . Its coordinates are 52.499722222222225, 13.307222222222222


#### Let's count the venues of each top-level Foursquare category around every station

In [11]:
categories_url = 'https://api.foursquare.com/v2/venues/categories?client_id={}&client_secret={}&v={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION)
            
# make the GET request
results = requests.get(categories_url).json()

In [12]:
len(results['response']['categories'])

10

In [13]:
categories_list = []
# Let's print only the top-level categories and their IDs and also add them to categories_list

def print_categories(categories, level=0, max_level=0):    
    if level>max_level: return
    out = ''
    out += '-'*level
    for category in categories:
        print(out + category['name'] + ' (' + category['id'] + ')')
        print_categories(category['categories'], level+1, max_level)
        categories_list.append((category['name'], category['id']))
        
print_categories(results['response']['categories'], 0, 0)

Arts & Entertainment (4d4b7104d754a06370d81259)
College & University (4d4b7105d754a06372d81259)
Event (4d4b7105d754a06373d81259)
Food (4d4b7105d754a06374d81259)
Nightlife Spot (4d4b7105d754a06376d81259)
Outdoors & Recreation (4d4b7105d754a06377d81259)
Professional & Other Places (4d4b7105d754a06375d81259)
Residence (4e67e38e036454776db1fb3a)
Shop & Service (4d4b7105d754a06378d81259)
Travel & Transport (4d4b7105d754a06379d81259)


In [22]:
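def get_venues_count(lat, lng, radius, categoryId):
    """Return the number of venues of one category within `radius` meters of (lat, lng)."""
    explore_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION,
                lat,
                lng,
                radius,
                categoryId)

    # make the GET request
    data = requests.get(explore_url).json()
    # error responses (e.g. an exhausted quota) carry no 'totalResults'; report the API's error instead
    meta = data.get('meta', {})
    if meta.get('code') != 200:
        raise RuntimeError('Foursquare error {}: {}'.format(meta.get('code'), meta.get('errorDetail')))
    return data['response']['totalResults']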
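The request URLs in this notebook are assembled with `str.format`. An alternative sketch builds the same `venues/explore` query from a dict with `urllib.parse.urlencode`, which also percent-escapes the values. `build_explore_url` is a hypothetical helper and the credentials in the example are placeholders:

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, category_id, limit=None):
    """Assemble a Foursquare v2 'venues/explore' URL from keyword parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "categoryId": category_id,
    }
    if limit is not None:
        params["limit"] = limit
    # urlencode escapes special characters, e.g. the comma in "ll" becomes %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20190718",
                        52.499722, 13.307222, 300, "4d4b7105d754a06374d81259")
print(url)
```

With `requests`, the same dict can instead be passed as `requests.get(base_url, params=params)`, which performs the encoding itself.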

In [23]:
#Create new dataframe to store venues data
stations_venues_df = df_ubahn.copy()
for c in categories_list:
    stations_venues_df[c[0]] = 0
stations_venues_df

| | Station | Latitude | Longitude | Arts & Entertainment | College & University | Event | Food | Nightlife Spot | Outdoors & Recreation | Professional & Other Places | Residence | Shop & Service | Travel & Transport |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Adenauerplatz | 52.499722 | 13.307222 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Afrikanische Straße | 52.560556 | 13.334167 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Alexanderplatz | 52.521389 | 13.413333 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Altstadt Spandau | 52.539167 | 13.205556 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Alt-Mariendorf | 52.439722 | 13.3875 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Alt-Tegel | 52.589444 | 13.283611 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | Alt-Tempelhof | 52.466111 | 13.385556 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | Amrumer Straße | 52.542222 | 13.348889 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 8 | Augsburger Straße | 52.500556 | 13.336389 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | Bayerischer Platz | 52.488611 | 13.34 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


In [44]:
# Request the number of venues per category for every station; the CSV is rewritten
# after each station so that progress is kept if a request fails
for i, row in stations_venues_df.iterrows():
    print(i)
    for name, category_id in categories_list:
        stations_venues_df.loc[i, name] = get_venues_count(row['Latitude'],
                                                           row['Longitude'],
                                                           radius=300,
                                                           categoryId=category_id)
    stations_venues_df.to_csv('stations_venues.csv')

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16


KeyError: 'totalResults'


#### Let's get the top 100 venues surrounding the U-Bahn station Adenauerplatz within a radius of 300 meters:

*Fun fact: The station was named after Konrad Adenauer, a German statesman who served as the first Chancellor of the Federal Republic of Germany (West Germany) from 1949 to 1963.*

In [38]:
LIMIT = 100
radius = 300
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    first_ubahn_latitude,
    first_ubahn_longitude,
    radius,
    LIMIT)

results = requests.get(url).json() # Get the result to a json file

# Function that extracts the category of the venue:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # before the columns are renamed, the field is still called 'venue.categories'
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [39]:
#Clean the json and structure it into a pandas dataframe:
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) #flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

#clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()



| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Louisa´s Place | Hotel | 52.4995 | 13.305004 |
| 1 | Bellucci | Italian Restaurant | 52.49943 | 13.3068 |
| 2 | Frau Behrens Torten | Café | 52.501653 | 13.307663 |
| 3 | SAVU | Modern European Restaurant | 52.499323 | 13.305033 |
| 4 | Block House | Steakhouse | 52.499846 | 13.306933 |
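`json_normalize` turns nested keys into dotted column names, which is why the columns are first selected as `venue.name`, `venue.location.lat`, and so on, and then shortened with `split(".")`. A rough pure-Python equivalent for a single record, run on a hypothetical, heavily trimmed `explore` item, shows the effect (`flatten` is a hypothetical helper, not the pandas implementation):

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into dotted keys, roughly as json_normalize does for one record."""
    flat = {}
    for key, value in record.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            # lists (such as 'categories') are kept as they are
            flat[new_key] = value
    return flat


# Hypothetical, heavily trimmed item from the 'explore' response
item = {"venue": {"name": "Bellucci",
                  "location": {"lat": 52.49943, "lng": 13.3068},
                  "categories": [{"name": "Italian Restaurant"}]}}
print(flatten(item))
```

The `venue.categories` value stays a list of dicts, which is why `get_category_type` still has to pick out the first category's name.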


#### Let's create a function to repeat the same process for all the U-Bahn stations in Berlin (restricted to venues in the Food category)

In [40]:
def getNearbyVenues(names, latitudes, longitudes, radius=300, LIMIT=100):
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # create the API request URL; categoryId restricts results to the top-level 'Food' category
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT,
            "4d4b7105d754a06374d81259")
        
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue (venues without a category are skipped)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results if v['venue']['categories']])
        
    nearby_values = pd.DataFrame([item for venue_list in venues_list for item in venue_list],
                                 columns=['Station',
                                          'U-Bahn Latitude',
                                          'U-Bahn Longitude',
                                          'Venue',
                                          'Venue Latitude',
                                          'Venue Longitude',
                                          'Venue Category'])
    
    return nearby_values
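A venue can come back with an empty `categories` list (the `get_category_type` function above already allows for this), in which case `v['venue']['categories'][0]` would raise an `IndexError`. The row-building step can be checked offline with a hypothetical helper and made-up items in the shape of the `explore` response:

```python
def parse_items(station, lat, lng, items):
    """Turn 'explore' items into row tuples, skipping venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        if not venue.get("categories"):
            continue  # would otherwise raise IndexError on categories[0]
        rows.append((station, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     venue["categories"][0]["name"]))
    return rows


# Made-up items; the second one has no category
items = [
    {"venue": {"name": "Bellucci", "location": {"lat": 52.49943, "lng": 13.3068},
               "categories": [{"name": "Italian Restaurant"}]}},
    {"venue": {"name": "Unnamed stall", "location": {"lat": 52.5, "lng": 13.307},
               "categories": []}},
]
rows = parse_items("Adenauerplatz", 52.499722, 13.307222, items)
print(rows)
```

The resulting list of tuples has the same seven fields as the columns of the dataframe built by `getNearbyVenues`.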

#### Now let's run the above function on each U-Bahn station and store the result in a new dataframe called ubahn_venues.

In [41]:
ubahn_venues = getNearbyVenues(names=df_ubahn['Station'],
                                latitudes=df_ubahn['Latitude'],
                                longitudes=df_ubahn['Longitude']
                                )

Adenauerplatz 
Afrikanische Straße 
Alexanderplatz 
Altstadt Spandau 
Alt-Mariendorf 
Alt-Tegel 
Alt-Tempelhof 
Amrumer Straße 
Augsburger Straße 
Bayerischer Platz 
Berliner Straße 
Bernauer Straße 
Biesdorf-Süd 
Birkenstraße 
Bismarckstraße 
Blaschkoallee 
Blissestraße 
Boddinstraße 
Borsigwerke 
Brandenburger Tor 
Breitenbachplatz 
Britz-Süd 
Bülowstraße 
Bundesplatz 
Bundestag 
Cottbusser Platz 
Dahlem-Dorf 
Deutsche Oper 
Eberswalder Straße 
Eisenacher Straße 
Elsterwerdaer Platz 
Ernst-Reuter-Platz 
Fehrbelliner Platz 
Frankfurter Allee 
Frankfurter Tor 
Franz-Neumann-Platz 
Französische Straße 
Freie Universität 
Friedrichsfelde 
Friedrichstraße 
Friedrich-Wilhelm-Platz 
Gesundbrunnen 
Gleisdreieck 
Gneisenaustraße 
Görlitzer Bahnhof 
Grenzallee 
Güntzelstraße 
Halemweg 
Hallesches Tor 
Hansaplatz 
Haselhorst 
Hauptbahnhof 
Hausvogteiplatz 
Heidelberger Platz 
Heinrich-Heine-Straße 
Hellersdorf 
Hermannplatz 
Hermannstraße 
Hohenzollernplatz 
Holzhauser Straße 
Hönow 
Innsbrucke

#### Let's check the size of the resulting dataframe:

In [42]:
print(ubahn_venues.shape)
ubahn_venues.head()

(2501, 7)


| | Station | U-Bahn Latitude | U-Bahn Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Adenauerplatz | 52.499722 | 13.307222 | Bellucci | 52.49943 | 13.3068 | Italian Restaurant |
| 1 | Adenauerplatz | 52.499722 | 13.307222 | Frau Behrens Torten | 52.501653 | 13.307663 | Café |
| 2 | Adenauerplatz | 52.499722 | 13.307222 | SAVU | 52.499323 | 13.305033 | Modern European Restaurant |
| 3 | Adenauerplatz | 52.499722 | 13.307222 | Block House | 52.499846 | 13.306933 | Steakhouse |
| 4 | Adenauerplatz | 52.499722 | 13.307222 | Kurpfalz Weinstuben | 52.500898 | 13.307718 | German Restaurant |


#### Let's check how many values were returned for each station:

In [None]:
ubahn_venues.groupby('Station').count().head()

#### Let's find out how many unique categories can be curated from all the returned venues:

In [None]:
print('There are {} unique categories.'.format(ubahn_venues['Venue Category'].nunique()))

#### Analyze each U-Bahn station:

In [None]:
# one hot encoding
ubahn_onehot = pd.get_dummies(ubahn_venues[['Venue Category']], prefix='', prefix_sep='')

# add station column back to dataframe
ubahn_onehot['Station'] = ubahn_venues['Station']

# move station column to the first column
fixed_columns = [ubahn_onehot.columns[-1]] + list(ubahn_onehot.columns[:-1])
ubahn_onehot = ubahn_onehot[fixed_columns]

ubahn_onehot.head()

#### As we can see, there are many categories that could be grouped under one main category (e.g. Restaurant). First, download the list of Foursquare's main and lower-level categories (originally available at https://developer.foursquare.com/docs/resources/categories).


In [None]:
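# Download the full category hierarchy (all levels) from the Foursquare API;
# requesting the documentation page would only return HTML, not the category data
categories_response = requests.get(categories_url).json()
all_categories = categories_response['response']['categories']
print(len(all_categories))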
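Grouping detailed categories such as "Vietnamese Restaurant" under their top-level parent ("Food") amounts to walking the nested `categories` lists of the category hierarchy. The sketch assumes each node carries `name` and `categories` keys, as in the top-level listing printed earlier; `map_to_top_level` and the trimmed tree are hypothetical:

```python
def map_to_top_level(categories):
    """Map every (sub-)category name to the name of its top-level ancestor."""
    lookup = {}

    def walk(nodes, top_name):
        # record every descendant under the name of its top-level ancestor
        for node in nodes:
            lookup[node["name"]] = top_name
            walk(node.get("categories", []), top_name)

    for top in categories:
        lookup[top["name"]] = top["name"]
        walk(top.get("categories", []), top["name"])
    return lookup


# Hypothetical, heavily trimmed tree in the shape of the categories response
tree = [
    {"name": "Food", "categories": [
        {"name": "Café", "categories": []},
        {"name": "Asian Restaurant", "categories": [
            {"name": "Vietnamese Restaurant", "categories": []}]}]},
    {"name": "Nightlife Spot", "categories": [
        {"name": "Bar", "categories": [{"name": "Cocktail Bar", "categories": []}]}]},
]
lookup = map_to_top_level(tree)
print(lookup["Vietnamese Restaurant"], lookup["Cocktail Bar"])  # Food Nightlife Spot
```

`ubahn_venues['Venue Category'].map(lookup)` would then give each venue its top-level category; names missing from the lookup map to NaN, and if a name occurred under two parents, the later one would overwrite the earlier.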

#### Let's group rows by station, taking the mean of the frequency of occurrence of each category:

In [None]:
ubahn_grouped = ubahn_onehot.groupby('Station').mean().reset_index()
ubahn_grouped
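The per-station mean of the one-hot columns is simply each category's share of the venues returned for that station: every venue contributes a 1 to exactly one column. A stdlib sketch using the `Counter` and `defaultdict` classes the notebook already imports makes this explicit; the station names and categories below are made up:

```python
from collections import Counter, defaultdict


def category_frequencies(rows):
    """Share of each venue category per station, from (station, category) pairs."""
    per_station = defaultdict(Counter)
    for station, category in rows:
        per_station[station][category] += 1
    result = {}
    for station, counts in per_station.items():
        total = sum(counts.values())
        result[station] = {cat: n / total for cat, n in counts.items()}
    return result


rows = [("Station A", "Café"), ("Station A", "Café"), ("Station A", "Bar"), ("Station B", "Bar")]
freqs = category_frequencies(rows)
print(freqs)
```

Unlike the one-hot table, categories that do not occur at a station are absent here rather than 0.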

#### Let's create a new dataframe and display the top 10 venues for each U-Bahn station

In [None]:
# first write a function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# define number of top venues and indicators
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Station']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
# create a new dataframe
ubahn_venues_sorted = pd.DataFrame(columns=columns)
ubahn_venues_sorted['Station'] = ubahn_grouped['Station']

for ind in np.arange(ubahn_grouped.shape[0]):
    ubahn_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ubahn_grouped.iloc[ind, :], num_top_venues)

ubahn_venues_sorted.head()
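`return_most_common_venues` relies on `sort_values`, whose default sorting algorithm is not stable according to the pandas documentation. Categories with equal frequency, which are common when a station has only a few venues, may therefore come out in arbitrary order, and zero-frequency categories can fill the remaining slots. A hypothetical alternative working on a plain `{category: frequency}` dict breaks ties alphabetically so the ranking is reproducible:

```python
def most_common_venues(freqs, num_top_venues):
    """Return up to num_top_venues category names, most frequent first.

    Ties are broken alphabetically so the order is reproducible.
    """
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:num_top_venues]]


print(most_common_venues({"Café": 0.4, "Bar": 0.2, "Bakery": 0.2, "Hotel": 0.2}, 3))
# ['Café', 'Bakery', 'Bar']
```

In pandas, passing `kind='stable'` to `sort_values` is another way to make the order of ties deterministic.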

### 3.2. Clustering the U-Bahn stations

#### Run k-means to cluster the stations into 5 clusters:

In [None]:
# set number of clusters
kclusters = 5

ubahn_grouped_clustering = ubahn_grouped.drop(columns='Station')

# run k-means clustering (n_init set explicitly; its default changed in recent scikit-learn versions)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(ubahn_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
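The notebook imports `silhouette_score` but fixes `kclusters = 5` without showing how that value was chosen. One common way to compare candidate values of k is the mean silhouette coefficient. The sketch below applies it to small synthetic data (three well-separated groups of points, not the station data), so its numbers say nothing about the U-Bahn clusters; `best_k_by_silhouette` is a hypothetical helper:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Fit k-means for each k and return (best k, {k: mean silhouette score})."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic data: three tight groups of five points each
offsets = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1]])
centres = np.array([[0, 0], [10, 0], [0, 10]])
X = np.vstack([c + offsets for c in centres])

best_k, scores = best_k_by_silhouette(X, range(2, 7))
print(best_k)
```

On the real data one would pass `ubahn_grouped_clustering` as `X`. The highest score is a heuristic for comparing values of k, not a guarantee that the clusters are meaningful.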

#### Let's create a new dataframe that includes the cluster as well as the top 10 venues for each station

In [None]:
# add clustering labels
ubahn_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_.astype(int))

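ubahn_merged = df_ubahn.copy()

# join the cluster labels and top venues onto df_ubahn to pair them with each station's coordinates
ubahn_merged = ubahn_merged.join(ubahn_venues_sorted.set_index('Station'), on='Station')

# stations without any returned venue have no cluster label; drop them and restore integer labels
ubahn_merged = ubahn_merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})

ubahn_merged.head()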

#### Finally, let's visualize the resulting clusters

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(
        ubahn_merged['Latitude'],
        ubahn_merged['Longitude'],
        ubahn_merged['Station'],
        ubahn_merged['Cluster Labels']):
    cluster = int(cluster)  # labels run from 0 to kclusters - 1 and index the color list directly
    label = folium.Popup(str(poi) + ' (Cluster ' + str(cluster) + ')', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Let's examine the clusters:

##### Cluster 1

In [None]:
ubahn_merged.loc[ubahn_merged['Cluster Labels'] == 0,
                 ubahn_merged.columns[[0] + list(range(4, ubahn_merged.shape[1]))]]

##### Cluster 2

In [None]:
ubahn_merged.loc[ubahn_merged['Cluster Labels'] == 1,
                 ubahn_merged.columns[[0] + list(range(4, ubahn_merged.shape[1]))]]

##### Cluster 3

In [None]:
ubahn_merged.loc[ubahn_merged['Cluster Labels'] == 2,
                 ubahn_merged.columns[[0] + list(range(4, ubahn_merged.shape[1]))]]

##### Cluster 4

In [None]:
ubahn_merged.loc[ubahn_merged['Cluster Labels'] == 3,
                 ubahn_merged.columns[[0] + list(range(4, ubahn_merged.shape[1]))]]

##### Cluster 5

In [None]:
ubahn_merged.loc[ubahn_merged['Cluster Labels'] == 4,
                 ubahn_merged.columns[[0] + list(range(4, ubahn_merged.shape[1]))]]

## 4. Results <a name="results"></a>

## 5. Discussion <a name="discussion"></a>

## 6. Conclusion <a name="conclusion"></a>