# Introduction

This notebook explores the similarities and differences between two cities: Graz in Austria and Stuttgart in Germany. I have lived in both of them and am curious to see how similar they are. Both are German-speaking cities, but language is not the focus here. Instead, the boroughs of each city are explored and analyzed using the Foursquare app venue search. I will cluster each city separately based on the top 10 venues of every borough, and then look for boroughs in Stuttgart and Graz that belong to the same cluster. The goal is a practical conclusion: if I move from Graz to Stuttgart, where in Stuttgart can I find a borough similar to the one I am leaving?

## Data Collection and Data Cleaning

### Finding data for Graz, Austria

#### Installing and importing Beautiful Soup for Web Scraping

In [1]:
!pip install beautifulsoup4



In [2]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

__Web scraping__

The page https://www.graz.at/cms/beitrag/10034856/7769112/Die_Bezirke.html lists the post code, area and population of every borough in the city of Graz, Austria.


In [3]:
#requesting a url

url = requests.get("https://www.graz.at/cms/beitrag/10034856/7769112/Die_Bezirke.html")
url = url.content

In [4]:
#web scraping with beautifulsoup
soup = BeautifulSoup(url, 'html.parser')

In [5]:
table = soup.find_all('div', {'class':"txtblock-content standard"})

In [6]:
table = table[0]
table = table.ol

In [7]:
gplz_list = []
lis = table.find_all("li")
a = lis[0].text.strip().split(":")
a[1].split(" ")

['', '1,16', 'Quadratkilometer,', '3.899', '(3.933)', 'EinwohnerInnen']

In [8]:
for li in lis:
    l = li.text.strip().split(":")
    gplz_list.append(l)


In [9]:
plz_city = []
area_num = []
for plz in gplz_list:
    plz_city.append(plz[0])
    area_num.append(plz[1])

[' 1,16 Quadratkilometer, 3.899 (3.933) EinwohnerInnen',
 ' 1,83 Quadratkilometer, 16.235 (16.123) EinwohnerInnen',
 ' 5,50 Quadratkilometer, 25.300 (24.990) EinwohnerInnen',
 ' 3,70 Quadratkilometer, 30.966 (30.891) EinwohnerInnen',
 ' 5,05 Quadratkilometer, 28.735 (27.732) EinwohnerInnen',
 ' 4,06 Quadratkilometer, 33.283 (33.082) EinwohnerInnen',
 ' 7,99 Quadratkilometer, 14.417 (14.170) EinwohnerInnen',
 ' 8,86 Quadratkilometer, 15.139 (14.937) EinwohnerInnen',
 ' 4,48 Quadratkilometer, 11.906 (11.869) EinwohnerInnen',
 ' 10,16 Quadratkilometer,\xa0 5.910 (5.886) EinwohnerInnen',
 ' 13,99 Quadratkilometer, 9.756 (9.647) EinwohnerInnen',
 ' 18,47 Quadratkilometer, 19.197 (19.022) EinwohnerInnen',
 ' 10,83 Quadratkilometer, 11.129 (10.900) EinwohnerInnen',
 ' 7,79 Quadratkilometer, 20.553 (20.075) EinwohnerInnen',
 ' 5,77 Quadratkilometer, 15.630 (15.215) EinwohnerInnen',
 ' 11,75 Quadratkilometer, 16.003 (15.590) EinwohnerInnen',
 ' 6,18 Quadratkilometer, 8.628 (8.417) EinwohnerInnen']

In [10]:
plz = [plz[0:4] for plz in plz_city]
borough = [bor[4:].strip() for bor in plz_city]

In [11]:
area_num = [are.split(" ") for are in area_num]
area = []
population = []

In [12]:
# get the area of each borough
for are in area_num:
    area.append(float(are[1].replace(",",".")))

[1.16,
 1.83,
 5.5,
 3.7,
 5.05,
 4.06,
 7.99,
 8.86,
 4.48,
 10.16,
 13.99,
 18.47,
 10.83,
 7.79,
 5.77,
 11.75,
 6.18]

In [13]:
# get the population of each borough
for pop in area_num:
    population.append(int(pop[3].replace(".","")))

[3899,
 16235,
 25300,
 30966,
 28735,
 33283,
 14417,
 15139,
 11906,
 5910,
 9756,
 19197,
 11129,
 20553,
 15630,
 16003,
 8628]
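The splitting steps above can also be folded into a single regular expression, which copes with the non-breaking space (`\xa0`) that appears in one of the scraped lines. This is only a sketch: the helper name `parse_borough_line` and the sample line are assumptions for illustration, and the line format is inferred from the slicing used in cells 10 to 13.

```python
import re

# Assumed line format, inferred from the scraped output above:
# "<4-digit post code> <borough>: <area> Quadratkilometer, <population> (<older figure>) EinwohnerInnen"
LINE_PATTERN = re.compile(
    r"^\s*(?P<plz>\d{4})\s+(?P<borough>.+?):\s*"
    r"(?P<area>[\d.,]+)\s+Quadratkilometer,\s*"
    r"(?P<population>[\d.]+)"
)


def parse_borough_line(line):
    """Return (post_code, borough, area_km2, population) for one list item."""
    match = LINE_PATTERN.search(line)
    if match is None:
        raise ValueError("Unexpected line format: {!r}".format(line))
    # German notation: comma is the decimal mark, dot is the thousands separator
    area_km2 = float(match.group("area").replace(".", "").replace(",", "."))
    population = int(match.group("population").replace(".", ""))
    return match.group("plz"), match.group("borough").strip(), area_km2, population


# Hypothetical sample line; the \xa0 mimics the entry with the non-breaking space
sample = "8010 Ries: 10,16 Quadratkilometer,\xa0 5.910 (5.886) EinwohnerInnen"
print(parse_borough_line(sample))
```

Raising an error on an unexpected line makes it obvious when the page layout changes, instead of silently producing shifted columns.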

### Making Pandas Dataframe from Scraped Data

In [14]:
graz = pd.DataFrame({'Post Code':plz, 'Borough':borough, 'Area [km2]':area, 'Population':population })

In [15]:
graz

Unnamed: 0,Post Code,Borough,Area [km2],Population
0,8010,Innere Stadt,1.16,3899
1,8010,St. Leonhard,1.83,16235
2,8010,Geidorf,5.5,25300
3,8020,Lend,3.7,30966
4,8020,Gries,5.05,28735
5,8010,Jakomini,4.06,33283
6,8041,Liebenau,7.99,14417
7,8042,St. Peter,8.86,15139
8,8010,Waltendorf,4.48,11906
9,8010,Ries,10.16,5910


### Getting latitudes and longitudes of boroughs in Graz

__Installing and importing geopy library__

In [16]:
!pip install geopy



__Using Nominatim from geopy to get latitude and longitude__

In [17]:
import geopy
from geopy.geocoders import Nominatim

In [18]:
locator = Nominatim(user_agent="myGeocoder")
#location = locator.geocode("Gries, Graz, Austria")

In [19]:
latitude =[]
longitude = []

for bor in borough:
    print(bor + ", Graz, Austria")
    location = locator.geocode(bor + ", Graz, Austria")
    if location is not None:
        latitude.append(location.latitude)
        longitude.append(location.longitude)
    else:
        latitude.append(None)
        longitude.append(None)

Innere Stadt, Graz, Austria
St. Leonhard, Graz, Austria
Geidorf, Graz, Austria
Lend, Graz, Austria
Gries, Graz, Austria
Jakomini, Graz, Austria
Liebenau, Graz, Austria
St. Peter, Graz, Austria
Waltendorf, Graz, Austria
Ries, Graz, Austria
Mariatrost, Graz, Austria
Andritz, Graz, Austria
Gösting, Graz, Austria
Eggenberg, Graz, Austria
Wetzelsdorf, Graz, Austria
Straßgang, Graz, Austria
Puntigam, Graz, Austria
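The public Nominatim service expects clients to keep their request rate low, so it can help to pause between lookups. The sketch below wraps the loop above in a function that takes the geocoder as a parameter, which also lets it run offline. The names `geocode_boroughs`, `fake_geocode` and the `Location` stand-in are hypothetical and not part of geopy.

```python
import time
from collections import namedtuple

# Stand-in for a geocoder result (only the two attributes used here)
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode_boroughs(boroughs, city, country, geocode, delay_seconds=1.0):
    """Look up each borough and return parallel latitude/longitude lists.

    `geocode` is any callable that takes a query string and returns an object
    with .latitude and .longitude, or None when nothing is found.
    """
    latitudes, longitudes = [], []
    for borough in boroughs:
        location = geocode("{}, {}, {}".format(borough, city, country))
        if location is not None:
            latitudes.append(location.latitude)
            longitudes.append(location.longitude)
        else:
            latitudes.append(None)
            longitudes.append(None)
        time.sleep(delay_seconds)  # pause between requests
    return latitudes, longitudes


# Hypothetical offline geocoder, used only to demonstrate the call
def fake_geocode(query):
    known = {"Gries, Graz, Austria": Location(47.0612224, 15.42737)}
    return known.get(query)


lats, lngs = geocode_boroughs(["Gries", "Nowhere"], "Graz", "Austria",
                              fake_geocode, delay_seconds=0)
print(lats, lngs)
```

In the notebook the call would presumably look like `geocode_boroughs(borough, "Graz", "Austria", locator.geocode)`. geopy also provides a `RateLimiter` helper in `geopy.extra.rate_limiter` that serves the same purpose.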


__Adding latitude and longitude to the pandas dataframe__

In [20]:
graz['Latitude'] = latitude
graz['Longitude'] = longitude

In [21]:
graz

Unnamed: 0,Post Code,Borough,Area [km2],Population,Latitude,Longitude
0,8010,Innere Stadt,1.16,3899,47.074261,15.438466
1,8010,St. Leonhard,1.83,16235,47.068287,15.456344
2,8010,Geidorf,5.5,25300,47.084668,15.442896
3,8020,Lend,3.7,30966,47.079675,15.420325
4,8020,Gries,5.05,28735,47.061222,15.42737
5,8010,Jakomini,4.06,33283,47.059623,15.444707
6,8041,Liebenau,7.99,14417,47.040169,15.449265
7,8042,St. Peter,8.86,15139,47.058701,15.469985
8,8010,Waltendorf,4.48,11906,47.067741,15.477172
9,8010,Ries,10.16,5910,47.088113,15.49718


## Finding data for Stuttgart, Germany

__Web scraping with beautifulsoup__

In [22]:
url = "https://www.suche-postleitzahl.org/stuttgart-plz-70173-70629.608e"
url = requests.get(url)
url = url.content

The pandas function read_html parses every table on a page into a list of dataframes; the table with the post codes is the second one in that list.

In [23]:
stuttgart_plz = pd.read_html(url)

In [24]:
stuttgart_plz = stuttgart_plz[1]

In [25]:
url = "https://de.wikipedia.org/wiki/Liste_der_Stadtbezirke_und_Stadtteile_von_Stuttgart"
url = requests.get(url).content

In [26]:
soup = BeautifulSoup(url,'html.parser')
table = soup.find_all('table',{'class':"wikitable sortable mw-datatable"})

In [27]:
trs = table[0].find_all('tr')
stutt = []
line = []
s_borough = []
s_pop = []
s_area_ha = []

for tr in trs:
    line = []
    tds = tr.find_all("td")
    for td in tds:
        line.append(td.text)
    stutt.append(line)

In [28]:
for st in stutt:
    if len(st) != 0:
        s_borough.append(st[1])
        s_pop.append(st[2].replace(".",""))
        s_area_ha.append(st[3].replace(",","."))

In [29]:
len(s_borough)

23

In [30]:
len(s_pop)

23

In [31]:
len(s_area_ha)

23

In [32]:
stuttgart = pd.DataFrame({'Borough':s_borough, 'Area [ha]':s_area_ha, 'Population': s_pop})

__Converting data from string to float and int__

In [33]:
stuttgart['Area [ha]'] = stuttgart['Area [ha]'].astype(float)
stuttgart['Population'] = stuttgart['Population'].astype(int)

In [34]:
stuttgart.dtypes

Borough        object
Area [ha]     float64
Population      int32
dtype: object

__Converting Area from [ha] to [km2]__

In [35]:
stuttgart['Area [km2]'] = stuttgart['Area [ha]']/100

In [36]:
stuttgart.drop('Area [ha]', axis = 1, inplace=True)

In [37]:
stuttgart

Unnamed: 0,Borough,Population,Area [km2]
0,Stuttgart-Mitte,23956,3.808
1,Stuttgart-Nord,27629,6.815
2,Stuttgart-Ost,48730,9.035
3,Stuttgart-Süd,44050,9.586
4,Stuttgart-West,52668,18.643
5,Bad Cannstatt,71285,15.713
6,Birkach,7149,3.089
7,Botnang,13165,2.135
8,Degerloch,16686,8.021
9,Feuerbach,30417,11.554


In [38]:
stuttgart_plz = stuttgart_plz.loc[stuttgart_plz['Stadtteil'].isin(stuttgart['Borough'])]

In [39]:
# rename both columns in one step and assign the result instead of modifying a slice in place
stuttgart_plz = stuttgart_plz.rename(columns={'Stadtteil':'Borough', 'Postleitzahl':'Post Code'})

In [40]:
stuttgart_plz

Unnamed: 0,Borough,Post Code
1,Bad Cannstatt,"70191, 70372, 70374, 70376, 70378"
3,Birkach,70599
4,Botnang,70195
8,Degerloch,70597
11,Feuerbach,"70192, 70469, 70499"
17,Hedelfingen,"70327, 70329"
25,Möhringen,"70565, 70567, 70597"
27,Mühlhausen,70378
28,Münster,70376
31,Obertürkheim,70329


__Joining two dataframes together__

In [42]:
stuttgart = stuttgart.join(stuttgart_plz.set_index('Borough'), on='Borough')

In [43]:
latitude =[]
longitude = []

for bor in stuttgart['Borough']:
    
    print(bor + ", Stuttgart, Germany")
    location = locator.geocode(bor + ", Stuttgart, Germany")
    if location is not None:
        latitude.append(location.latitude)
        longitude.append(location.longitude)
    else:
        latitude.append(None)
        longitude.append(None)

Stuttgart-Mitte, Stuttgart, Germany
Stuttgart-Nord, Stuttgart, Germany
Stuttgart-Ost, Stuttgart, Germany
Stuttgart-Süd, Stuttgart, Germany
Stuttgart-West, Stuttgart, Germany
Bad Cannstatt, Stuttgart, Germany
Birkach, Stuttgart, Germany
Botnang, Stuttgart, Germany
Degerloch, Stuttgart, Germany
Feuerbach, Stuttgart, Germany
Hedelfingen, Stuttgart, Germany
Möhringen, Stuttgart, Germany
Mühlhausen, Stuttgart, Germany
Münster, Stuttgart, Germany
Obertürkheim, Stuttgart, Germany
Plieningen, Stuttgart, Germany
Sillenbuch, Stuttgart, Germany
Stammheim, Stuttgart, Germany
Untertürkheim, Stuttgart, Germany
Vaihingen, Stuttgart, Germany
Wangen, Stuttgart, Germany
Weilimdorf, Stuttgart, Germany
Zuffenhausen, Stuttgart, Germany


In [44]:
stuttgart['Latitude'] = latitude
stuttgart['Longitude'] = longitude

In [45]:
stuttgart = stuttgart[['Post Code', 'Borough', 'Area [km2]', 'Population', 'Latitude', 'Longitude']]
stuttgart

Unnamed: 0,Post Code,Borough,Area [km2],Population,Latitude,Longitude
0,"70173, 70174, 70176, 70178, 70180, 70182, 7018...",Stuttgart-Mitte,3.808,23956,48.7759,9.1798
1,"70174, 70191, 70192, 70193",Stuttgart-Nord,6.815,27629,48.796661,9.176252
2,"70184, 70186, 70188, 70190, 70327",Stuttgart-Ost,9.035,48730,48.776972,9.207365
3,"70178, 70180, 70184, 70199",Stuttgart-Süd,9.586,44050,48.753021,9.132492
4,"70174, 70176, 70178, 70193, 70197",Stuttgart-West,18.643,52668,48.777659,9.151351
5,"70191, 70372, 70374, 70376, 70378",Bad Cannstatt,15.713,71285,48.804883,9.21468
6,70599,Birkach,3.089,7149,48.728574,9.203406
7,70195,Botnang,2.135,13165,48.778495,9.129532
8,70597,Degerloch,8.021,16686,48.749597,9.170345
9,"70192, 70469, 70499",Feuerbach,11.554,30417,48.812305,9.159031


## Let us compare these two cities

In [46]:
stuttgart.describe()

Unnamed: 0,Area [km2],Population,Latitude,Longitude
count,23.0,23.0,23.0,23.0
mean,9.015087,26762.73913,48.778547,9.186186
std,5.227646,17346.05574,0.038593,0.047979
min,2.135,6796.0,48.711395,9.088648
25%,4.898,12833.5,48.751309,9.154328
50%,8.021,24067.0,48.776972,9.1798
75%,12.274,35885.0,48.808594,9.221109
max,20.893,71285.0,48.849798,9.268515


In [47]:
graz.describe()

Unnamed: 0,Area [km2],Population,Latitude,Longitude
count,17.0,17.0,17.0,17.0
mean,7.504118,16863.882353,47.069096,15.439408
std,4.495399,8570.006395,0.023491,0.03025
min,1.16,3899.0,47.027102,15.394168
25%,4.48,11129.0,47.058701,15.420325
50%,6.18,15630.0,47.068287,15.438466
75%,10.16,20553.0,47.084668,15.456344
max,18.47,33283.0,47.114287,15.49718


__So what is the difference in area between these two cities?__

In [48]:
print("Area of Stuttgart is " + str(stuttgart['Area [km2]'].sum()) +  " km2")
print("Area of Graz is " + str(graz['Area [km2]'].sum()) +  " km2")

Area of Stuttgart is 207.347 km2
Area of Graz is 127.57 km2


In [49]:
area_diff = round(stuttgart['Area [km2]'].sum()/ graz['Area [km2]'].sum(),2)
print("The area of Stuttgart is " + str(area_diff) + " times the area of Graz, i.e. " + str(round((area_diff-1)*100)) + "% bigger")

The area of Stuttgart is 1.63 times the area of Graz, i.e. 63.0% bigger
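The cell above rounds the ratio to two decimals before deriving the percentage, so the percentage is slightly coarser than the figure obtained from the unrounded ratio. A small worked check, using the two totals printed by cell 48 (the variable names here are illustrative):

```python
# Totals taken from the output of cell 48
stuttgart_area = 207.347  # km2
graz_area = 127.57        # km2

ratio = stuttgart_area / graz_area
percent_larger = (ratio - 1) * 100

print("Ratio: {:.2f}".format(ratio))
print("Stuttgart is {:.1f}% larger than Graz".format(percent_larger))
```

Computing the percentage from the unrounded ratio gives roughly 62.5%, compared with the 63% shown above.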


__So what are the biggest and smallest boroughs in each city?__

In [50]:
print("Smallest area in Stuttgart is in " + stuttgart['Borough'].loc[stuttgart['Area [km2]'] == stuttgart['Area [km2]'].min()].values[0] + " with area of " + str(stuttgart['Area [km2]'].min()) + " km2")
print("Smallest area in Graz is in " + graz['Borough'].loc[graz['Area [km2]'] == graz['Area [km2]'].min()].values[0] + " with area of " + str(graz['Area [km2]'].min()) + " km2")

Smallest area in Stuttgart is in Botnang with area of 2.135 km2
Smallest area in Graz is in Innere Stadt with area of 1.16 km2


In [51]:
print("Biggest area in Stuttgart is in " + stuttgart['Borough'].loc[stuttgart['Area [km2]'] == stuttgart['Area [km2]'].max()].values[0] + " with area of " + str(stuttgart['Area [km2]'].max()) + " km2")
print("Biggest area in Graz is in " + graz['Borough'].loc[graz['Area [km2]'] == graz['Area [km2]'].max()].values[0] + " with area of " + str(graz['Area [km2]'].max()) + " km2")

Biggest area in Stuttgart is in Vaihingen with area of 20.893 km2
Biggest area in Graz is in Andritz with area of 18.47 km2


__How many people live in each city?__

In [52]:
print("Stuttgart has " + "{:,} inhabitants".format(stuttgart['Population'].sum()).replace(","," "))
print("Graz has " + "{:,} inhabitants".format(graz['Population'].sum()).replace(","," "))

Stuttgart has 615 543 inhabitants
Graz has 286 686 inhabitants


In [53]:
print("The most populated borough of Stuttgart is {} with a population of {:,}".format(stuttgart['Borough'].loc[stuttgart['Population'] == stuttgart['Population'].max()].values[0], stuttgart['Population'].max()).replace(","," "))
print("The most populated borough of Graz is {} with a population of {:,}".format(graz['Borough'].loc[graz['Population'] == graz['Population'].max()].values[0], graz['Population'].max()).replace(","," "))

The most populated borough of Stuttgart is Bad Cannstatt with a population of 71 285
The most populated borough of Graz is Jakomini with a population of 33 283


In [54]:
print("The least populated borough of Stuttgart is {} with a population of {:,}".format(stuttgart['Borough'].loc[stuttgart['Population'] == stuttgart['Population'].min()].values[0], stuttgart['Population'].min()).replace(","," "))
print("The least populated borough of Graz is {} with a population of {:,}".format(graz['Borough'].loc[graz['Population'] == graz['Population'].min()].values[0], graz['Population'].min()).replace(","," "))

The least populated borough of Stuttgart is Münster with a population of 6 796
The least populated borough of Graz is Innere Stadt with a population of 3 899


__Another interesting aspect is the population density of each borough__

In [55]:
# Density of Stuttgart
stuttgart['Population/km2'] = stuttgart['Population']/stuttgart['Area [km2]']

In [56]:
#Density of Graz
graz['Population/km2'] = graz['Population']/graz['Area [km2]']

In [57]:
graz

Unnamed: 0,Post Code,Borough,Area [km2],Population,Latitude,Longitude,Population/km2
0,8010,Innere Stadt,1.16,3899,47.074261,15.438466,3361.206897
1,8010,St. Leonhard,1.83,16235,47.068287,15.456344,8871.584699
2,8010,Geidorf,5.5,25300,47.084668,15.442896,4600.0
3,8020,Lend,3.7,30966,47.079675,15.420325,8369.189189
4,8020,Gries,5.05,28735,47.061222,15.42737,5690.09901
5,8010,Jakomini,4.06,33283,47.059623,15.444707,8197.783251
6,8041,Liebenau,7.99,14417,47.040169,15.449265,1804.380476
7,8042,St. Peter,8.86,15139,47.058701,15.469985,1708.690745
8,8010,Waltendorf,4.48,11906,47.067741,15.477172,2657.589286
9,8010,Ries,10.16,5910,47.088113,15.49718,581.692913


# Analyse and create the maps of Graz and Stuttgart

In [58]:
import folium #import Folium to create the map

__Creating the map of Graz__

In [59]:
# draw a folium map of a city with one circle marker per borough

def draw_a_map(city, city_name, country):
    # get the coordinates of the city centre
    loc = locator.geocode(city_name + ', ' + country)

    # create a map of the city using its latitude and longitude values
    city_map = folium.Map(location=[loc.latitude, loc.longitude], zoom_start=12)

    # add a marker for each borough
    for lat, lng, borough in zip(city['Latitude'], city['Longitude'], city['Borough']):
        borough = borough.replace("ä","a")
        borough = borough.replace("ü","u")
        borough = borough.replace("ö","o")
        borough = borough.replace("ß","ss")
        label = "{}, {}".format(borough, city_name)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=15,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(city_map)

    # mark the city centre, using loc rather than the global location left over from the geocoding loops
    folium.Marker(
            [loc.latitude, loc.longitude],
            popup=city_name).add_to(city_map)
    return city_map
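The chain of `replace` calls inside the marker loop can also be written as a single translation table. This is a minimal sketch; the helper name `to_ascii_label` is hypothetical, and the upper-case umlauts are an addition that the original loop does not handle.

```python
# Map each German special character to an ASCII replacement
UMLAUT_TABLE = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss",
                              "Ä": "A", "Ö": "O", "Ü": "U"})


def to_ascii_label(name):
    """Replace umlauts and ß so the popup label contains only ASCII letters."""
    return name.translate(UMLAUT_TABLE)


print(to_ascii_label("Straßgang"))
print(to_ascii_label("Gösting"))
```

Keeping the mapping in one table makes it easy to extend if other characters turn out to cause trouble in the popups.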

In [60]:
graz_map = draw_a_map(graz, 'Graz', 'Austria')

In [61]:
graz_map

In [62]:
stuttgart_map = draw_a_map(stuttgart, 'Stuttgart', 'Germany')

In [63]:
stuttgart_map

__Define Foursquare Credential and Version__

In [64]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare client ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare client secret, keep it out of shared notebooks
VERSION = '20200727'
print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

My credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


Let's explore the fifth borough of Graz in our dataframe.

In [65]:
graz.loc[4, 'Borough']

'Gries'

In [66]:
# data for Gries, Graz
borough_latitude = graz.loc[4, 'Latitude'] #Latitude
borough_longitude = graz.loc[4, 'Longitude'] #Longitude

borough_name = graz.loc[4, 'Borough'] #Borough Name

print('Latitude and longitude values of {} are {}, {}'.format(borough_name,
                                                              borough_latitude,
                                                              borough_longitude))

Latitude and longitude values of Gries are 47.0612224, 15.42737


##### Let's get the top 100 venues that are in Gries, Graz within a radius of 500 meters.

Import libraries

In [67]:
import json
from pandas import json_normalize # top-level location since pandas 1.0

In [68]:
LIMIT = 100
RADIUS = 500

#create url
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    borough_latitude,
    borough_longitude,
    RADIUS,
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20200727&ll=47.0612224,15.42737&radius=500&limit=100'
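Formatting the query string by hand makes it easy to slip in stray separators. As a sketch, the same URL can be assembled from a dictionary with the standard library, which also takes care of escaping. The function name `build_explore_url` and the placeholder credentials are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from a dict of query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode joins the pairs with '&' and percent-encodes the values
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, not real ones
example_url = build_explore_url("MY_ID", "MY_SECRET", "20200727",
                                47.0612224, 15.42737, 500, 100)
print(example_url)
```

Alternatively, the dictionary can be handed to `requests.get(EXPLORE_ENDPOINT, params=params)`, which builds the query string itself.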

In [69]:
result = requests.get(url).json()

Function that extracts the category of the venue

In [70]:
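def get_category_type(row):
    # the category list sits under 'categories' or, after json_normalize, under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']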

Clean the json and structure it into a *pandas* dataframe

In [71]:
venues = result['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

#clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Airea 55,Pizza Place,47.06053,15.427137
1,CITYPARK,Shopping Mall,47.060486,15.426947
2,Lidl,Discount Store,47.061523,15.430424
3,Tre Amici,Pizza Place,47.060451,15.426879
4,INTERSPAR,Supermarket,47.060176,15.426446
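The dotted column names such as `venue.location.lat` come from flattening the nested JSON. The following sketch imitates that step in plain Python to show where the names originate; it is not the pandas implementation, and the sample item is a hypothetical record shaped like one entry of `result['response']['groups'][0]['items']`.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys; lists are kept as values."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical item with the fields used in this notebook
item = {
    "venue": {
        "name": "Tre Amici",
        "categories": [{"name": "Pizza Place"}],
        "location": {"lat": 47.060451, "lng": 15.426879},
    }
}

row = flatten(item)
print(sorted(row))
```

Because the category list stays a list after flattening, a helper such as `get_category_type` is still needed to pull out the first category name.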


In [72]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

19 venues were returned by Foursquare.


## Explore all boroughs in Graz

#### Let's create a function to repeat the same process for all the boroughs

In [73]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        #create API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        
        #make GET request
        results = requests.get(url).json()
        results = results["response"]['groups'][0]['items']
        
        #return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
        
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)   
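The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` if a venue came back without any category. The sketch below shows one way to guard against that. The helper name `venue_rows`, the `"Unknown"` placeholder and the sample items are assumptions, not part of the Foursquare API.

```python
def venue_rows(borough, lat, lng, items, default_category="Unknown"):
    """Turn the 'items' list of one explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder instead of failing on an empty list
        category = categories[0]["name"] if categories else default_category
        rows.append((borough, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hypothetical items: the second venue has no category
items = [
    {"venue": {"name": "Lidl", "categories": [{"name": "Discount Store"}],
               "location": {"lat": 47.061523, "lng": 15.430424}}},
    {"venue": {"name": "Unnamed kiosk", "categories": [],
               "location": {"lat": 47.0610, "lng": 15.4280}}},
]
rows = venue_rows("Gries", 47.0612224, 15.42737, items)
print(rows[1][-1])
```

Whether to keep such venues under a placeholder category or drop them is a design choice; dropping them would avoid an artificial "Unknown" column in the later one-hot encoding.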

In [74]:
graz_venues = getNearbyVenues(names=graz['Borough'],
                                     latitudes=graz['Latitude'],
                                     longitudes=graz['Longitude'])

Innere Stadt
St. Leonhard
Geidorf
Lend
Gries
Jakomini
Liebenau
St. Peter
Waltendorf
Ries
Mariatrost
Andritz
Gösting
Eggenberg
Wetzelsdorf
Straßgang
Puntigam


In [75]:
graz_venues.head()

Unnamed: 0,Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Innere Stadt,47.074261,15.438466,Schloßberg,47.074057,15.437077,Mountain
1,Innere Stadt,47.074261,15.438466,Creperie Le Schnurrbart,47.074756,15.440271,French Restaurant
2,Innere Stadt,47.074261,15.438466,Uhrturm,47.07359,15.437703,Historic Site
3,Innere Stadt,47.074261,15.438466,Eis-Greissler,47.07212,15.438734,Ice Cream Shop
4,Innere Stadt,47.074261,15.438466,Oh! Matcha,47.073505,15.439936,Asian Restaurant


Let's check the size of the resulting dataframe

In [76]:
print(graz_venues.shape)

(203, 7)


Let's check how many venues were returned for each borough

In [77]:
graz_venues.groupby("Borough").count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Andritz,4,4,4,4,4,4
Eggenberg,11,11,11,11,11,11
Geidorf,6,6,6,6,6,6
Gries,19,19,19,19,19,19
Gösting,6,6,6,6,6,6
Innere Stadt,95,95,95,95,95,95
Jakomini,8,8,8,8,8,8
Lend,2,2,2,2,2,2
Liebenau,3,3,3,3,3,3
Mariatrost,4,4,4,4,4,4


#### Let's find out how many unique categories can be curated from all the returned venues

In [78]:
print('There are {} unique categories.'.format(len(graz_venues['Venue Category'].unique())))

There are 87 unique categories.


## Explore all boroughs in Stuttgart

In [79]:
stuttgart_venues = getNearbyVenues(names=stuttgart['Borough'],
                                     latitudes=stuttgart['Latitude'],
                                     longitudes=stuttgart['Longitude'])

Stuttgart-Mitte
Stuttgart-Nord
Stuttgart-Ost
Stuttgart-Süd
Stuttgart-West
Bad Cannstatt
Birkach
Botnang
Degerloch
Feuerbach
Hedelfingen
Möhringen
Mühlhausen
Münster
Obertürkheim
Plieningen
Sillenbuch
Stammheim
Untertürkheim
Vaihingen
Wangen
Weilimdorf
Zuffenhausen


In [80]:
stuttgart_venues.head()

Unnamed: 0,Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Stuttgart-Mitte,48.7759,9.1798,Old Bridge,48.774173,9.179274,Ice Cream Shop
1,Stuttgart-Mitte,48.7759,9.1798,Markthalle,48.776145,9.179335,Market
2,Stuttgart-Mitte,48.7759,9.1798,Altes Schloss,48.777168,9.179526,History Museum
3,Stuttgart-Mitte,48.7759,9.1798,Landesmuseum Württemberg,48.777143,9.179542,Museum
4,Stuttgart-Mitte,48.7759,9.1798,Eduard's,48.775537,9.179935,Cocktail Bar


Let's check the size of the resulting dataframe

In [81]:
print(stuttgart_venues.shape)

(350, 7)


Let's check how many venues were returned for each borough

In [82]:
stuttgart_venues.groupby("Borough").count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bad Cannstatt,50,50,50,50,50,50
Birkach,3,3,3,3,3,3
Botnang,9,9,9,9,9,9
Degerloch,19,19,19,19,19,19
Feuerbach,16,16,16,16,16,16
Hedelfingen,4,4,4,4,4,4
Möhringen,17,17,17,17,17,17
Mühlhausen,4,4,4,4,4,4
Münster,5,5,5,5,5,5
Obertürkheim,5,5,5,5,5,5


#### Let's find out how many unique categories can be curated from all the returned venues

In [83]:
print('There are {} unique categories.'.format(len(stuttgart_venues['Venue Category'].unique())))

There are 120 unique categories.


## Analyze each borough in Graz

In [84]:
#one hot encoder
graz_onehot = pd.get_dummies(graz_venues[['Venue Category']], prefix="", prefix_sep=" ")
print(graz_onehot.shape)

(203, 87)


In [85]:
graz_onehot.head()

Unnamed: 0,Art Gallery,Asian Restaurant,Austrian Restaurant,Bakery,Bar,Beach,Beer Bar,Beer Garden,Betting Shop,Board Shop,...,Steakhouse,Supermarket,Sushi Restaurant,Tech Startup,Thai Restaurant,Theater,Tour Provider,Tram Station,Video Store,Water Park
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [86]:
graz_venues['Borough'].head()

0    Innere Stadt
1    Innere Stadt
2    Innere Stadt
3    Innere Stadt
4    Innere Stadt
Name: Borough, dtype: object

In [87]:
# add borough column back to dataframe
graz_onehot['Borough'] = graz_venues['Borough']

In [88]:
fixed_columns = [graz_onehot.columns[-1]] + graz_onehot.columns[:-1].tolist()
graz_onehot = graz_onehot[fixed_columns]
graz_onehot.head()

Unnamed: 0,Borough,Art Gallery,Asian Restaurant,Austrian Restaurant,Bakery,Bar,Beach,Beer Bar,Beer Garden,Betting Shop,...,Steakhouse,Supermarket,Sushi Restaurant,Tech Startup,Thai Restaurant,Theater,Tour Provider,Tram Station,Video Store,Water Park
0,Innere Stadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Innere Stadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Innere Stadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Innere Stadt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Innere Stadt,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [89]:
graz_onehot.shape

(203, 88)

#### Next, let's group rows by borough and take the mean of the frequency of occurrence of each category

In [90]:
graz_grouped = graz_onehot.groupby('Borough').mean().reset_index()
graz_grouped.head()

Unnamed: 0,Borough,Art Gallery,Asian Restaurant,Austrian Restaurant,Bakery,Bar,Beach,Beer Bar,Beer Garden,Betting Shop,...,Steakhouse,Supermarket,Sushi Restaurant,Tech Startup,Thai Restaurant,Theater,Tour Provider,Tram Station,Video Store,Water Park
0,Andritz,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Eggenberg,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Geidorf,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667
3,Gries,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.052632,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0
4,Gösting,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
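Taking the mean of one-hot columns per borough is the same as computing, for each category, its share of the borough's venues. The sketch below reproduces that idea with plain counters. The function name `category_frequencies` is hypothetical; the Andritz rows use the four categories reported for that borough in this notebook, and the "Demo" rows are made up.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each borough to {category: share of that borough's venues}.

    `venues` is an iterable of (borough, category) pairs.
    """
    counts = defaultdict(Counter)
    for borough, category in venues:
        counts[borough][category] += 1
    frequencies = {}
    for borough, counter in counts.items():
        total = sum(counter.values())
        frequencies[borough] = {cat: n / total for cat, n in counter.items()}
    return frequencies


venues = [
    ("Andritz", "Construction & Landscaping"),
    ("Andritz", "Gastropub"),
    ("Andritz", "Supermarket"),
    ("Andritz", "Spa"),
    ("Demo", "Café"),  # hypothetical borough
    ("Demo", "Café"),
    ("Demo", "Bar"),
]
freq = category_frequencies(venues)
print(freq["Andritz"])
```

With only four venues, each Andritz category gets a share of 0.25, which illustrates how coarse the frequencies are for boroughs with few returned venues.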


#### Let's print each borough along with the top 5 most common venues

In [91]:
num_top_venues = 5
for borough in graz_grouped['Borough']:
    print("----"+borough+"----")
    temp = graz_grouped[graz_grouped['Borough'] == borough].T.reset_index()
    temp.columns = ['venues', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Andritz----
                        venues  freq
0   Construction & Landscaping  0.25
1                    Gastropub  0.25
2                  Supermarket  0.25
3                          Spa  0.25
4                  Art Gallery  0.00


----Eggenberg----
              venues  freq
0      Grocery Store  0.09
1             Lounge  0.09
2   Asian Restaurant  0.09
3         Restaurant  0.09
4   Football Stadium  0.09


----Geidorf----
                venues  freq
0           Water Park  0.17
1   Spanish Restaurant  0.17
2          Pizza Place  0.17
3   Seafood Restaurant  0.17
4   Italian Restaurant  0.17


----Gries----
                  venues  freq
0                   Café  0.11
1            Pizza Place  0.11
2            Art Gallery  0.05
3   Fast Food Restaurant  0.05
4              Gastropub  0.05


----Gösting----
              venues  freq
0   Asian Restaurant  0.17
1        Supermarket  0.17
2        Snack Place  0.17
3       Soccer Field  0.17
4      Grocery Store  0.17


----

#### Put that into a *pandas* dataframe

Function to sort the venues in descending order

In [92]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
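Many categories share exactly the same frequency (Andritz, for example, has four categories at 0.25), so the order among ties depends on the sorting algorithm. A sketch of a ranking with an explicit, deterministic tie-break is shown below; the function name `top_categories` is hypothetical and the alphabetical tie-break is just one possible choice.

```python
def top_categories(frequencies, n):
    """Return the n most frequent categories; ties are broken alphabetically."""
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]


# Frequencies for Andritz as printed in the top-5 listing
andritz = {"Spa": 0.25, "Gastropub": 0.25, "Supermarket": 0.25,
           "Construction & Landscaping": 0.25, "Art Gallery": 0.0}
print(top_categories(andritz, 3))
```

A fixed tie-break keeps the "1st", "2nd", "3rd" columns reproducible between runs, although the ranking among tied categories remains arbitrary in meaning.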

Create the new dataframe and display the top 10 venues for each borough.

In [93]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
        
#Create new dataframe
graz_borough_venues_sorted = pd.DataFrame(columns=columns)
graz_borough_venues_sorted['Borough'] = graz_grouped['Borough']

for ind in np.arange(graz_grouped.shape[0]):
    graz_borough_venues_sorted.iloc[ind, 1:] = return_most_common_venues(graz_grouped.iloc[ind, :], num_top_venues)

graz_borough_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Andritz,Gastropub,Supermarket,Spa,Construction & Landscaping,Historic Site,French Restaurant,Department Store,Dessert Shop,Discount Store,Electronics Store
1,Eggenberg,Restaurant,Football Stadium,Asian Restaurant,Pool,Grocery Store,Betting Shop,Lounge,Park,Paper / Office Supplies Store,Neighborhood
2,Geidorf,Water Park,Austrian Restaurant,Italian Restaurant,Pizza Place,Seafood Restaurant,Spanish Restaurant,French Restaurant,Dessert Shop,Discount Store,Electronics Store
3,Gries,Pizza Place,Café,Art Gallery,Shopping Mall,Gift Shop,Gastropub,Fast Food Restaurant,Electronics Store,Discount Store,Plaza
4,Gösting,Restaurant,Asian Restaurant,Grocery Store,Supermarket,Soccer Field,Snack Place,French Restaurant,Construction & Landscaping,Department Store,Dessert Shop
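The suffix lookup used for the column names works for the ten columns needed here, but it would label position 21 as "21th". A small helper that handles the general case might look like this (the name `ordinal` is an assumption for illustration):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Borough"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```

For `num_top_venues = 10` this yields the same column names as the loop in cell 93.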


In [94]:
graz_grouped.shape[0]

17

## Cluster Boroughs

Import KMeans

In [95]:
from sklearn.cluster import KMeans

Run *k*-means to cluster the boroughs into 4 clusters.

In [96]:
#set number of clusters
kclusters = 4

graz_grouped_clustering = graz_grouped.drop(columns='Borough')

#run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=3).fit(graz_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 1, 2, 0, 1])
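To make the clustering step less of a black box, here is a simplified sketch of the iteration behind *k*-means: assign every point to its nearest centroid, then move each centroid to the mean of its points. It is not scikit-learn's implementation, which by default uses k-means++ initialisation; this version starts from centroids passed in by hand, and the frequency vectors are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iterations=100):
    """Plain Lloyd iteration; stops when the labels no longer change."""
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iterations):
        # assignment step: index of the nearest centroid for every point
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical frequency vectors over (Café, Supermarket, Bus Stop)
points = [
    (0.30, 0.05, 0.00),  # café-heavy
    (0.25, 0.10, 0.00),  # café-heavy
    (0.00, 0.50, 0.00),  # supermarket-heavy
    (0.05, 0.45, 0.00),  # supermarket-heavy
    (0.00, 0.00, 1.00),  # a single dominant category
]
labels, centroids = kmeans(points, centroids=[points[0], points[2], points[4]])
print(labels)
```

The last point shows why a borough with very few venues can end up in a cluster of its own: one dominant category pushes its frequency vector far away from all the others.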

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each borough.

In [97]:
#add cluster labels
graz_borough_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

graz_merged = graz

# merge graz_grouped with graz to add latitude/longitude for each borough
graz_merged = graz_merged.join(graz_borough_venues_sorted.set_index('Borough'), on='Borough')
graz_merged['Cluster Labels'].unique()

array([0, 2, 1, 3], dtype=int64)
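
One thing to watch with this join: a borough that returned no venues has no row in the sorted table, so its `Cluster Labels` value becomes NaN and the whole column turns into floats. The `int64` output above indicates that did not happen for Graz, but the check is cheap. A minimal sketch on made-up data (the boroughs `A`, `B`, `C` and all values are hypothetical, not taken from the notebook):

```python
import pandas as pd

# Hypothetical borough table; 'C' stands in for a borough with no venues.
boroughs = pd.DataFrame({"Borough": ["A", "B", "C"],
                         "Latitude": [47.07, 47.08, 47.09]})

# Hypothetical clustering result that only covers 'A' and 'B'.
labels = pd.DataFrame({"Cluster Labels": [0, 1],
                       "Borough": ["A", "B"]})

# Same join pattern as in the cell above.
merged = boroughs.join(labels.set_index("Borough"), on="Borough")

# Boroughs that did not get a label show up as NaN.
missing = merged[merged["Cluster Labels"].isna()]["Borough"].tolist()

# Drop them and restore integer labels before using labels as list indices.
clean = merged.dropna(subset=["Cluster Labels"]).copy()
clean["Cluster Labels"] = clean["Cluster Labels"].astype(int)
```

If `missing` is not empty, the map cell would fail on those rows, because a NaN label cannot be used to index the color list.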

In [98]:
graz_merged.head()

Unnamed: 0,Post Code,Borough,Area [km2],Population,Latitude,Longitude,Population/km2,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,8010,Innere Stadt,1.16,3899,47.074261,15.438466,3361.206897,0,Café,Bar,Hotel,Plaza,Coffee Shop,Austrian Restaurant,Restaurant,Historic Site,Italian Restaurant,Cocktail Bar
1,8010,St. Leonhard,1.83,16235,47.068287,15.456344,8871.584699,0,Café,Tram Station,Coffee Shop,Bakery,Supermarket,Light Rail Station,Gastropub,Plaza,Park,Water Park
2,8010,Geidorf,5.5,25300,47.084668,15.442896,4600.0,0,Water Park,Austrian Restaurant,Italian Restaurant,Pizza Place,Seafood Restaurant,Spanish Restaurant,French Restaurant,Dessert Shop,Discount Store,Electronics Store
3,8020,Lend,3.7,30966,47.079675,15.420325,8369.189189,2,Bus Stop,Water Park,Construction & Landscaping,Dessert Shop,Discount Store,Electronics Store,Event Space,Fast Food Restaurant,Football Stadium,French Restaurant
4,8020,Gries,5.05,28735,47.061222,15.42737,5690.09901,0,Pizza Place,Café,Art Gallery,Shopping Mall,Gift Shop,Gastropub,Fast Food Restaurant,Electronics Store,Discount Store,Plaza


Finally, let's visualize the resulting clusters

In [99]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

In [100]:
# pass city and country as one query string; a second positional argument is not the country
graz_loc = locator.geocode('Graz, Austria')

In [101]:
graz_loc.latitude

47.0708678

In [102]:
# create map
graz_map_clusters = folium.Map(location=[graz_loc.latitude, graz_loc.longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
graz_markers_colors = []
for lat, lon, poi, cluster in zip(graz_merged['Latitude'], graz_merged['Longitude'], graz_merged['Borough'], graz_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(graz_map_clusters)
       
graz_map_clusters
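
A note on the color lookup `rainbow[cluster-1]` used in the map cell: labels start at 0, so label 0 is read from index -1, which is the last palette entry. Each cluster still gets its own color, only shifted by one position. A tiny sketch with placeholder color names (hypothetical, standing in for the hex strings):

```python
# Placeholder palette with one entry per cluster (stands in for the hex colors).
palette = ["color0", "color1", "color2", "color3"]
cluster_labels = [0, 1, 2, 3]

# Lookup as written in the map cell: label 0 wraps around to the last entry.
shifted = [palette[c - 1] for c in cluster_labels]

# Direct lookup: label n uses palette entry n.
direct = [palette[c] for c in cluster_labels]
```

Either form works as long as there are exactly as many palette entries as clusters; the direct form is just easier to read against a legend.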

##  Examine Clusters

Examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster.

#### Cluster 1

In [103]:
graz_merged.loc[graz_merged['Cluster Labels'] == 0, graz_merged.columns[[1] + [*range(7, graz_merged.shape[1])]]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Innere Stadt,0,Café,Bar,Hotel,Plaza,Coffee Shop,Austrian Restaurant,Restaurant,Historic Site,Italian Restaurant,Cocktail Bar
1,St. Leonhard,0,Café,Tram Station,Coffee Shop,Bakery,Supermarket,Light Rail Station,Gastropub,Plaza,Park,Water Park
2,Geidorf,0,Water Park,Austrian Restaurant,Italian Restaurant,Pizza Place,Seafood Restaurant,Spanish Restaurant,French Restaurant,Dessert Shop,Discount Store,Electronics Store
4,Gries,0,Pizza Place,Café,Art Gallery,Shopping Mall,Gift Shop,Gastropub,Fast Food Restaurant,Electronics Store,Discount Store,Plaza
6,Liebenau,0,Tour Provider,Supermarket,Skate Park,Water Park,Furniture / Home Store,Department Store,Dessert Shop,Discount Store,Electronics Store,Event Space
7,St. Peter,0,Print Shop,Chinese Restaurant,Supermarket,Café,Water Park,Furniture / Home Store,Discount Store,Electronics Store,Event Space,Fast Food Restaurant
11,Andritz,0,Gastropub,Supermarket,Spa,Construction & Landscaping,Historic Site,French Restaurant,Department Store,Dessert Shop,Discount Store,Electronics Store
12,Gösting,0,Restaurant,Asian Restaurant,Grocery Store,Supermarket,Soccer Field,Snack Place,French Restaurant,Construction & Landscaping,Department Store,Dessert Shop
13,Eggenberg,0,Restaurant,Football Stadium,Asian Restaurant,Pool,Grocery Store,Betting Shop,Lounge,Park,Paper / Office Supplies Store,Neighborhood
14,Wetzelsdorf,0,Supermarket,Asian Restaurant,Market,Restaurant,Gastropub,Water Park,Furniture / Home Store,Dessert Shop,Discount Store,Electronics Store


#### Cluster 2

In [104]:
graz_merged.loc[graz_merged['Cluster Labels'] == 1, graz_merged.columns[[1] + [*range(7, graz_merged.shape[1])]]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Jakomini,1,Hotel,Supermarket,Light Rail Station,Grocery Store,Event Space,Indie Movie Theater,Bar,Beach,Gourmet Shop,Bakery
8,Waltendorf,1,Construction & Landscaping,Hotel,Restaurant,Furniture / Home Store,Dessert Shop,Discount Store,Electronics Store,Event Space,Fast Food Restaurant,Football Stadium
9,Ries,1,Hotel,Park,Water Park,Furniture / Home Store,Dessert Shop,Discount Store,Electronics Store,Event Space,Fast Food Restaurant,Football Stadium
10,Mariatrost,1,Hotel,Light Rail Station,Supermarket,Pharmacy,Furniture / Home Store,Dessert Shop,Discount Store,Electronics Store,Event Space,Fast Food Restaurant


#### Cluster 3

In [105]:
graz_merged.loc[graz_merged['Cluster Labels'] == 2, graz_merged.columns[[1] + [*range(7, graz_merged.shape[1])]]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Lend,2,Bus Stop,Water Park,Construction & Landscaping,Dessert Shop,Discount Store,Electronics Store,Event Space,Fast Food Restaurant,Football Stadium,French Restaurant


#### Cluster 4

In [106]:
graz_merged.loc[graz_merged['Cluster Labels'] == 3, graz_merged.columns[[1] + [*range(7, graz_merged.shape[1])]]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,Straßgang,3,Furniture / Home Store,Water Park,Gastropub,Dessert Shop,Discount Store,Electronics Store,Event Space,Fast Food Restaurant,Football Stadium,French Restaurant


## Analyze each borough in Stuttgart

In [107]:
#one hot encoder
# empty prefix and separator so column names equal the category names (no leading space)
stuttgart_onehot = pd.get_dummies(stuttgart_venues[['Venue Category']], prefix="", prefix_sep="")
print(stuttgart_onehot.shape)

(350, 120)


In [108]:
stuttgart_onehot.head()

Unnamed: 0,African Restaurant,American Restaurant,Art Museum,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Bakery,Bank,Bar,...,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [109]:
stuttgart_venues['Borough'].head()

0    Stuttgart-Mitte
1    Stuttgart-Mitte
2    Stuttgart-Mitte
3    Stuttgart-Mitte
4    Stuttgart-Mitte
Name: Borough, dtype: object

In [110]:
# add borough column back to dataframe
stuttgart_onehot['Borough'] = stuttgart_venues['Borough']

In [111]:
fixed_columns = [stuttgart_onehot.columns[-1]] + stuttgart_onehot.columns[:-1].tolist()
stuttgart_onehot = stuttgart_onehot[fixed_columns]
stuttgart_onehot.head()

Unnamed: 0,Borough,African Restaurant,American Restaurant,Art Museum,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Bakery,Bank,...,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop
0,Stuttgart-Mitte,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Stuttgart-Mitte,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Stuttgart-Mitte,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Stuttgart-Mitte,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Stuttgart-Mitte,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [112]:
stuttgart_onehot.shape

(350, 121)

#### Next, let's group rows by borough and take the mean of the frequency of occurrence of each category

In [113]:
stuttgart_grouped = stuttgart_onehot.groupby('Borough').mean().reset_index()
stuttgart_grouped.head()

Unnamed: 0,Borough,African Restaurant,American Restaurant,Art Museum,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Bakery,Bank,...,Trail,Train Station,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Vietnamese Restaurant,Vineyard,Wine Bar,Wine Shop
0,Bad Cannstatt,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.06,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Birkach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Botnang,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111
3,Degerloch,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,...,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Feuerbach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
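
To make the grouping step concrete, here is a small sketch on made-up data (the boroughs `X` and `Y` and their venues are hypothetical). Taking the mean of the one-hot columns per borough gives the share of each category among that borough's venues, so every row sums to 1.

```python
import pandas as pd

# Hypothetical venue list: borough 'X' has 4 venues, 'Y' has 2.
venues = pd.DataFrame({
    "Borough": ["X", "X", "X", "X", "Y", "Y"],
    "Venue Category": ["Café", "Café", "Bakery", "Hotel", "Bakery", "Bakery"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"]).astype(float)
onehot.insert(0, "Borough", venues["Borough"])

# Mean of the 0/1 indicators = relative frequency of each category.
grouped = onehot.groupby("Borough").mean().reset_index()
```

This also explains values such as 0.333333 for Birkach above: with only three venues, each category makes up a third of the row.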


#### Let's print each borough along with the top 5 most common venues

In [114]:
num_top_venues = 5
for borough in stuttgart_grouped['Borough']:
    print("----"+borough+"----")
    temp = stuttgart_grouped[stuttgart_grouped['Borough'] == borough].T.reset_index()
    temp.columns = ['venues', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bad Cannstatt----
               venues  freq
0   German Restaurant  0.12
1                Café  0.10
2       Metro Station  0.06
3              Bakery  0.06
4           Bookstore  0.04


----Birkach----
                        venues  freq
0   Construction & Landscaping  0.33
1                         Farm  0.33
2                        Trail  0.33
3           African Restaurant  0.00
4             Pedestrian Plaza  0.00


----Botnang----
           venues  freq
0   Metro Station  0.22
1          Bakery  0.22
2     Supermarket  0.22
3       Wine Shop  0.11
4      Restaurant  0.11


----Degerloch----
                venues  freq
0          Supermarket  0.21
1   Italian Restaurant  0.11
2       Shop & Service  0.05
3      Thai Restaurant  0.05
4           Restaurant  0.05


----Feuerbach----
                 venues  freq
0    Italian Restaurant  0.12
1          Dessert Shop  0.06
2       Thai Restaurant  0.06
3   Japanese Restaurant  0.06
4                 Hotel  0.06


----Hedelfin

#### Put that into a *pandas* dataframe

Create the new dataframe and display the top 10 venues for each borough.

In [115]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
#Create new dataframe
stuttgart_borough_venues_sorted = pd.DataFrame(columns=columns)
stuttgart_borough_venues_sorted['Borough'] = stuttgart_grouped['Borough']

for ind in np.arange(stuttgart_grouped.shape[0]):
    stuttgart_borough_venues_sorted.iloc[ind, 1:] = return_most_common_venues(stuttgart_grouped.iloc[ind, :], num_top_venues)

stuttgart_borough_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bad Cannstatt,German Restaurant,Café,Bakery,Metro Station,Drugstore,Bus Stop,Theater,Bookstore,Asian Restaurant,Hotel
1,Birkach,Farm,Trail,Construction & Landscaping,Wine Shop,Gas Station,Department Store,Dessert Shop,Doner Restaurant,Drugstore,Event Space
2,Botnang,Bakery,Metro Station,Supermarket,Restaurant,Stadium,Wine Shop,Doner Restaurant,Drugstore,Event Space,Falafel Restaurant
3,Degerloch,Supermarket,Italian Restaurant,Thai Restaurant,Coffee Shop,Science Museum,Shop & Service,Drugstore,Paper / Office Supplies Store,French Restaurant,Restaurant
4,Feuerbach,Italian Restaurant,Japanese Restaurant,Thai Restaurant,Hotel,Restaurant,Pub,Drugstore,Pet Store,Burger Joint,Organic Grocery
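
The column labels can also be built without relying on an exception to pick the suffix. A minimal sketch (the helper name `ordinal` is hypothetical and not part of the notebook):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2, 3.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Borough"] + [f"{ordinal(i)} Most Common Venue"
                         for i in range(1, num_top_venues + 1)]
```

For `num_top_venues = 10` this produces the same eleven column names as the loop above.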


In [116]:
stuttgart_grouped.shape[0]

23

## Cluster Boroughs in Stuttgart

Run *k*-means to cluster the boroughs into 4 clusters.

In [117]:
#set number of clusters
kclusters = 4

stuttgart_grouped_clustering = stuttgart_grouped.drop('Borough', axis=1)

#run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=3).fit(stuttgart_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 1, 3, 2, 1])

Let's create a new dataframe that includes the cluster label as well as the top 10 venues for each borough.

In [118]:
#add cluster labels
stuttgart_borough_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

stuttgart_merged = stuttgart

# merge the sorted venues into stuttgart to add latitude/longitude for each borough
stuttgart_merged = stuttgart_merged.join(stuttgart_borough_venues_sorted.set_index('Borough'), on='Borough')
stuttgart_merged['Cluster Labels'].unique()

array([1, 0, 3, 2], dtype=int64)

In [119]:
stuttgart.head()

Unnamed: 0,Post Code,Borough,Area [km2],Population,Latitude,Longitude,Population/km2
0,"70173, 70174, 70176, 70178, 70180, 70182, 7018...",Stuttgart-Mitte,3.808,23956,48.7759,9.1798,6290.966387
1,"70174, 70191, 70192, 70193",Stuttgart-Nord,6.815,27629,48.796661,9.176252,4054.145268
2,"70184, 70186, 70188, 70190, 70327",Stuttgart-Ost,9.035,48730,48.776972,9.207365,5393.46984
3,"70178, 70180, 70184, 70199",Stuttgart-Süd,9.586,44050,48.753021,9.132492,4595.243063
4,"70174, 70176, 70178, 70193, 70197",Stuttgart-West,18.643,52668,48.777659,9.151351,2825.0818


In [120]:
stuttgart_merged.head()

Unnamed: 0,Post Code,Borough,Area [km2],Population,Latitude,Longitude,Population/km2,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"70173, 70174, 70176, 70178, 70180, 70182, 7018...",Stuttgart-Mitte,3.808,23956,48.7759,9.1798,6290.966387,1,German Restaurant,Bar,Café,Plaza,Coffee Shop,Sushi Restaurant,Clothing Store,Cocktail Bar,Wine Bar,Jazz Club
1,"70174, 70191, 70192, 70193",Stuttgart-Nord,6.815,27629,48.796661,9.176252,4054.145268,1,Metro Station,Scenic Lookout,Tennis Stadium,Museum,Neighborhood,Organic Grocery,Shopping Mall,Supermarket,Ice Cream Shop,Bakery
2,"70184, 70186, 70188, 70190, 70327",Stuttgart-Ost,9.035,48730,48.776972,9.207365,5393.46984,1,Bakery,Bar,Italian Restaurant,Pub,Trattoria/Osteria,Grocery Store,French Restaurant,Flower Shop,Fast Food Restaurant,Farmers Market
3,"70178, 70180, 70184, 70199",Stuttgart-Süd,9.586,44050,48.753021,9.132492,4595.243063,0,Furniture / Home Store,Tunnel,Metro Station,Wine Shop,Gas Station,Department Store,Dessert Shop,Doner Restaurant,Drugstore,Event Space
4,"70174, 70176, 70178, 70193, 70197",Stuttgart-West,18.643,52668,48.777659,9.151351,2825.0818,1,German Restaurant,Dessert Shop,Bakery,Café,Szechuan Restaurant,Spanish Restaurant,Organic Grocery,Sports Bar,Fast Food Restaurant,Burger Joint


Finally, let's visualize the resulting clusters

In [121]:
# pass city and country as one query string; a second positional argument is not the country
stuttgart_loc = locator.geocode('Stuttgart, Germany')

In [122]:
stuttgart_loc.latitude

48.7784485

In [123]:
# create map
stuttgart_map_clusters = folium.Map(location=[stuttgart_loc.latitude, stuttgart_loc.longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
stuttgart_markers_colors = []
for lat, lon, poi, cluster in zip(stuttgart_merged['Latitude'], stuttgart_merged['Longitude'], stuttgart_merged['Borough'], stuttgart_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(stuttgart_map_clusters)
       
stuttgart_map_clusters

##  Examine Clusters in Stuttgart

Examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster.

#### Cluster 1

In [124]:
stuttgart_merged.loc[stuttgart_merged['Cluster Labels'] == 0, stuttgart_merged.columns[[1] + [*range(7, stuttgart_merged.shape[1])]]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Stuttgart-Süd,0,Furniture / Home Store,Tunnel,Metro Station,Wine Shop,Gas Station,Department Store,Dessert Shop,Doner Restaurant,Drugstore,Event Space


#### Cluster 2

In [125]:
stuttgart_merged.loc[stuttgart_merged['Cluster Labels'] == 1, stuttgart_merged.columns[[1] + [*range(7, stuttgart_merged.shape[1])]]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Stuttgart-Mitte,1,German Restaurant,Bar,Café,Plaza,Coffee Shop,Sushi Restaurant,Clothing Store,Cocktail Bar,Wine Bar,Jazz Club
1,Stuttgart-Nord,1,Metro Station,Scenic Lookout,Tennis Stadium,Museum,Neighborhood,Organic Grocery,Shopping Mall,Supermarket,Ice Cream Shop,Bakery
2,Stuttgart-Ost,1,Bakery,Bar,Italian Restaurant,Pub,Trattoria/Osteria,Grocery Store,French Restaurant,Flower Shop,Fast Food Restaurant,Farmers Market
4,Stuttgart-West,1,German Restaurant,Dessert Shop,Bakery,Café,Szechuan Restaurant,Spanish Restaurant,Organic Grocery,Sports Bar,Fast Food Restaurant,Burger Joint
5,Bad Cannstatt,1,German Restaurant,Café,Bakery,Metro Station,Drugstore,Bus Stop,Theater,Bookstore,Asian Restaurant,Hotel
6,Birkach,1,Farm,Trail,Construction & Landscaping,Wine Shop,Gas Station,Department Store,Dessert Shop,Doner Restaurant,Drugstore,Event Space
7,Botnang,1,Bakery,Metro Station,Supermarket,Restaurant,Stadium,Wine Shop,Doner Restaurant,Drugstore,Event Space,Falafel Restaurant
8,Degerloch,1,Supermarket,Italian Restaurant,Thai Restaurant,Coffee Shop,Science Museum,Shop & Service,Drugstore,Paper / Office Supplies Store,French Restaurant,Restaurant
9,Feuerbach,1,Italian Restaurant,Japanese Restaurant,Thai Restaurant,Hotel,Restaurant,Pub,Drugstore,Pet Store,Burger Joint,Organic Grocery
10,Hedelfingen,1,Metro Station,German Restaurant,Home Service,Bar,Farm,French Restaurant,Flower Shop,Fast Food Restaurant,Farmers Market,Wine Shop


#### Cluster 3

In [126]:
stuttgart_merged.loc[stuttgart_merged['Cluster Labels'] == 2, stuttgart_merged.columns[[1] + [*range(7, stuttgart_merged.shape[1])]]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Münster,2,Metro Station,Train Station,Soccer Field,Botanical Garden,Doner Restaurant,Drugstore,Event Space,Dessert Shop,Furniture / Home Store,Falafel Restaurant
17,Stammheim,2,Metro Station,Supermarket,Hotel,Jewelry Store,Falafel Restaurant,French Restaurant,Flower Shop,Fast Food Restaurant,Farmers Market,Farm


#### Cluster 4

In [127]:
stuttgart_merged.loc[stuttgart_merged['Cluster Labels'] == 3, stuttgart_merged.columns[[1] + [*range(7, stuttgart_merged.shape[1])]]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Mühlhausen,3,Vineyard,Karaoke Bar,Bus Stop,Castle,Wine Shop,Farm,French Restaurant,Flower Shop,Fast Food Restaurant,Farmers Market


## Find similarities

I have lived in both cities, and I want to find out how similar the boroughs I lived in are. So I will put all boroughs from both cities together, cluster them jointly, and see which boroughs from the other city end up in the same cluster.

In [128]:
graz_venues.head()

Unnamed: 0,Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Innere Stadt,47.074261,15.438466,Schloßberg,47.074057,15.437077,Mountain
1,Innere Stadt,47.074261,15.438466,Creperie Le Schnurrbart,47.074756,15.440271,French Restaurant
2,Innere Stadt,47.074261,15.438466,Uhrturm,47.07359,15.437703,Historic Site
3,Innere Stadt,47.074261,15.438466,Eis-Greissler,47.07212,15.438734,Ice Cream Shop
4,Innere Stadt,47.074261,15.438466,Oh! Matcha,47.073505,15.439936,Asian Restaurant


In [129]:
stuttgart_venues.shape

(350, 7)

In [130]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same stacking
merged_venues = pd.concat([graz_venues, stuttgart_venues], ignore_index=True, sort=False)

In [131]:
merged_venues.shape

(553, 7)
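
Once the two venue tables are stacked, nothing in the rows says which city a borough belongs to. One option, shown here as a sketch on made-up rows (the `City` column and the sample data are assumptions, not part of the notebook), is to tag each table before concatenating:

```python
import pandas as pd

# Hypothetical stand-ins for graz_venues and stuttgart_venues.
graz_sample = pd.DataFrame({"Borough": ["Gries", "Lend"],
                            "Venue Category": ["Café", "Bus Stop"]})
stuttgart_sample = pd.DataFrame({"Borough": ["Botnang"],
                                 "Venue Category": ["Bakery"]})

# Tag each table with its city, then stack them with a fresh 0..n-1 index.
combined = pd.concat(
    [graz_sample.assign(City="Graz"),
     stuttgart_sample.assign(City="Stuttgart")],
    ignore_index=True,
)

# Borough -> city lookup that can be joined onto a cluster table later.
city_of = combined.drop_duplicates("Borough").set_index("Borough")["City"]
```

The `City` column should stay out of the one-hot encoding; it is only meant as a label for reading the final cluster table.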

In [132]:
#one hot encoder
# empty prefix and separator so column names equal the category names (no leading space)
merged_onehot = pd.get_dummies(merged_venues[['Venue Category']], prefix="", prefix_sep="")
print(merged_onehot.shape)

(553, 151)


In [133]:
merged_onehot.head()

Unnamed: 0,African Restaurant,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Auto Garage,Bakery,...,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Video Store,Vietnamese Restaurant,Vineyard,Water Park,Wine Bar,Wine Shop
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [134]:
# add borough column back to dataframe
merged_onehot['Borough'] = merged_venues['Borough']

In [135]:
fixed_columns = [merged_onehot.columns[-1]] + merged_onehot.columns[:-1].tolist()
merged_onehot = merged_onehot[fixed_columns]
merged_onehot.shape

(553, 152)

In [136]:
merged_grouped = merged_onehot.groupby('Borough').mean().reset_index()
merged_grouped.head()

Unnamed: 0,Borough,African Restaurant,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Austrian Restaurant,Auto Dealership,Auto Garage,...,Tram Station,Trattoria/Osteria,Tunnel,Turkish Restaurant,Video Store,Vietnamese Restaurant,Vineyard,Water Park,Wine Bar,Wine Shop
0,Andritz,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bad Cannstatt,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Birkach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Botnang,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111
4,Degerloch,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Create the new dataframe and display the top 10 venues for each borough.

In [137]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
        
#Create new dataframe
merged_borough_venues_sorted = pd.DataFrame(columns=columns)
merged_borough_venues_sorted['Borough'] = merged_grouped['Borough']

for ind in np.arange(merged_grouped.shape[0]):
    merged_borough_venues_sorted.iloc[ind, 1:] = return_most_common_venues(merged_grouped.iloc[ind, :], num_top_venues)

merged_borough_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Andritz,Gastropub,Spa,Supermarket,Construction & Landscaping,Discount Store,Doner Restaurant,Drugstore,Electronics Store,Event Space,Falafel Restaurant
1,Bad Cannstatt,German Restaurant,Café,Metro Station,Bakery,Drugstore,Theater,Bookstore,Hotel,Bus Stop,Asian Restaurant
2,Birkach,Construction & Landscaping,Trail,Farm,Discount Store,Doner Restaurant,Drugstore,Electronics Store,Event Space,Falafel Restaurant,German Pop-Up Restaurant
3,Botnang,Bakery,Metro Station,Supermarket,Wine Shop,Restaurant,Stadium,Furniture / Home Store,French Restaurant,Football Stadium,Flower Shop
4,Degerloch,Supermarket,Italian Restaurant,Shop & Service,Bakery,Paper / Office Supplies Store,Restaurant,German Restaurant,Thai Restaurant,Drugstore,French Restaurant


In [138]:
merged_grouped.shape

(40, 152)

Let's try to find more clusters, this time using the data from both cities.

In [139]:
#set number of clusters
kclusters = 15

merged_grouped_clustering = merged_grouped.drop('Borough', axis=1)

#run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=3).fit(merged_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([10,  3,  9,  1,  3,  3,  3,  3,  3, 14])
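
With 15 clusters for 40 boroughs, it helps to check how unevenly the labels are distributed. A quick sketch that uses only the ten labels printed above (the full array has 40 entries, so the counts here are partial):

```python
from collections import Counter

# First ten labels, copied from the output above.
first_labels = [10, 3, 9, 1, 3, 3, 3, 3, 3, 14]

# Count boroughs per cluster label, largest cluster first.
sizes = Counter(first_labels).most_common()
largest_label, largest_count = sizes[0]
```

Already in this sample, label 3 covers six of the ten boroughs while the other labels appear once each. On the full result, `Counter(kmeans.labels_.tolist())` would give the complete picture.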

In [140]:
#add cluster labels
merged_borough_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [141]:
merged_borough_venues_sorted

Unnamed: 0,Cluster Labels,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,10,Andritz,Gastropub,Spa,Supermarket,Construction & Landscaping,Discount Store,Doner Restaurant,Drugstore,Electronics Store,Event Space,Falafel Restaurant
1,3,Bad Cannstatt,German Restaurant,Café,Metro Station,Bakery,Drugstore,Theater,Bookstore,Hotel,Bus Stop,Asian Restaurant
2,9,Birkach,Construction & Landscaping,Trail,Farm,Discount Store,Doner Restaurant,Drugstore,Electronics Store,Event Space,Falafel Restaurant,German Pop-Up Restaurant
3,1,Botnang,Bakery,Metro Station,Supermarket,Wine Shop,Restaurant,Stadium,Furniture / Home Store,French Restaurant,Football Stadium,Flower Shop
4,3,Degerloch,Supermarket,Italian Restaurant,Shop & Service,Bakery,Paper / Office Supplies Store,Restaurant,German Restaurant,Thai Restaurant,Drugstore,French Restaurant
5,3,Eggenberg,Neighborhood,Grocery Store,Paper / Office Supplies Store,Restaurant,Park,Asian Restaurant,Pool,Betting Shop,Football Stadium,Lounge
6,3,Feuerbach,Italian Restaurant,Drugstore,Bakery,Greek Restaurant,Thai Restaurant,Dessert Shop,Burger Joint,Gas Station,Pet Store,Korean Restaurant
7,3,Geidorf,Italian Restaurant,Water Park,Seafood Restaurant,Spanish Restaurant,Austrian Restaurant,Pizza Place,Farm,French Restaurant,Football Stadium,Flower Shop
8,3,Gries,Pizza Place,Café,Supermarket,Thai Restaurant,Plaza,Discount Store,Clothing Store,Shoe Store,Shopping Mall,Electronics Store
9,14,Gösting,Snack Place,Asian Restaurant,Grocery Store,Soccer Field,Supermarket,Restaurant,Discount Store,Doner Restaurant,Drugstore,Electronics Store


In [142]:
merged_borough_venues_sorted[merged_borough_venues_sorted['Borough'] == 'Gries']

Unnamed: 0,Cluster Labels,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,3,Gries,Pizza Place,Café,Supermarket,Thai Restaurant,Plaza,Discount Store,Clothing Store,Shoe Store,Shopping Mall,Electronics Store


In [143]:
# take the single matching value explicitly instead of calling int() on a Series
clusterNr = int(merged_borough_venues_sorted.loc[merged_borough_venues_sorted['Borough'] == 'Gries', 'Cluster Labels'].iloc[0])

In [145]:
merged_cluster = merged_borough_venues_sorted[merged_borough_venues_sorted['Cluster Labels'] == clusterNr]

In [146]:
merged_cluster.shape

(19, 12)

In [147]:
merged_cluster

Unnamed: 0,Cluster Labels,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,3,Bad Cannstatt,German Restaurant,Café,Metro Station,Bakery,Drugstore,Theater,Bookstore,Hotel,Bus Stop,Asian Restaurant
4,3,Degerloch,Supermarket,Italian Restaurant,Shop & Service,Bakery,Paper / Office Supplies Store,Restaurant,German Restaurant,Thai Restaurant,Drugstore,French Restaurant
5,3,Eggenberg,Neighborhood,Grocery Store,Paper / Office Supplies Store,Restaurant,Park,Asian Restaurant,Pool,Betting Shop,Football Stadium,Lounge
6,3,Feuerbach,Italian Restaurant,Drugstore,Bakery,Greek Restaurant,Thai Restaurant,Dessert Shop,Burger Joint,Gas Station,Pet Store,Korean Restaurant
7,3,Geidorf,Italian Restaurant,Water Park,Seafood Restaurant,Spanish Restaurant,Austrian Restaurant,Pizza Place,Farm,French Restaurant,Football Stadium,Flower Shop
8,3,Gries,Pizza Place,Café,Supermarket,Thai Restaurant,Plaza,Discount Store,Clothing Store,Shoe Store,Shopping Mall,Electronics Store
10,3,Hedelfingen,Metro Station,Home Service,Bar,German Restaurant,Drugstore,Fast Food Restaurant,Gastropub,Gas Station,Furniture / Home Store,French Restaurant
11,3,Innere Stadt,Café,Bar,Hotel,Plaza,Restaurant,Austrian Restaurant,Coffee Shop,Historic Site,Italian Restaurant,Park
16,3,Möhringen,Middle Eastern Restaurant,Metro Station,Bakery,German Restaurant,Café,Farm,Big Box Store,Swabian Restaurant,Trattoria/Osteria,Department Store
21,3,Puntigam,Brewery,Rock Club,Climbing Gym,Supermarket,Grocery Store,Bar,Gym,Sandwich Place,Video Store,Ice Cream Shop
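
To answer the original question, the cluster members can be split by city. A sketch using only the ten rows shown above (the full cluster has 19 rows, so this list is incomplete); the set of Graz boroughs is typed in by hand here rather than taken from the `graz` dataframe:

```python
# Boroughs in Gries's cluster, as far as the table above shows them.
cluster_members = ["Bad Cannstatt", "Degerloch", "Eggenberg", "Feuerbach",
                   "Geidorf", "Gries", "Hedelfingen", "Innere Stadt",
                   "Möhringen", "Puntigam"]

# Graz boroughs among them; everything else belongs to Stuttgart.
graz_boroughs = {"Eggenberg", "Geidorf", "Gries", "Innere Stadt", "Puntigam"}

# Stuttgart boroughs that share a cluster with Gries.
stuttgart_candidates = [b for b in cluster_members if b not in graz_boroughs]
```

In the notebook itself, the same split could be done with `merged_cluster['Borough'].isin(graz['Borough'])`, assuming `graz` still holds one row per Graz borough.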


## Conclusion

Nearly half of the boroughs (19 of 40) fall into the same cluster as Gries. By venue profile, that suggests there is not much difference between the two cities, and it leaves me with a lot of boroughs to choose from.