# Capstone Project

## Introduction

In my capstone project, I want to find out which neighborhoods in Zurich, Switzerland, my wife and I could consider living in. We currently live in District 2, more specifically in Wollishofen, and enjoy the parks close to us. However, we lack restaurants and cafés here. Can we find a neighborhood in Zurich that has parks as well as cafés, where we could buy a home?

My project could also help other residents of Zurich find neighborhoods that are similar to their own. As the center of Zurich is very expensive, looking outside the very center can pay off. This project can therefore also help people save on rent or on the cost of buying a home.

## Methods

For this analysis I used data from Wikipedia: https://de.wikipedia.org/wiki/Stadtteile_der_Stadt_Z%C3%BCrich. I scraped this data using the BeautifulSoup package.
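
The scraping step boils down to locating a table and reading its cells row by row. As a minimal offline sketch of that idea, the snippet below parses a small inline sample table with the standard-library `html.parser`, which is swapped in for BeautifulSoup here only so that the example runs without network access or extra packages. The sample markup and the `TableParser` class are illustrative assumptions, not the structure of the actual Wikipedia page.

```python
from html.parser import HTMLParser

# Small inline sample; the real Wikipedia table has more columns and rows.
SAMPLE_HTML = """
<table>
  <tr><th>Stadtkreis</th><th>Quartiere</th></tr>
  <tr><td>Kreis 1</td><td>Rathaus, Hochschulen, Lindenhof, City</td></tr>
  <tr><td>Kreis 2</td><td>Wollishofen, Leimbach, Enge</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows that contained <td> cells
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # rows with only <th> cells stay empty and are skipped
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None


parser = TableParser()
parser.feed(SAMPLE_HTML)
for row in parser.rows:
    print(row)
```

Skipping rows without `<td>` cells mirrors the `if (col != [])` check in the notebook code further down.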

Additionally, after an extensive search, I found coordinate data for Swiss cities at: https://simplemaps.com/data/ch-cities.

Using these two sources, I was able to get the coordinates and match them with the neighborhoods. 

I used the k-means method to find clusters of neighborhoods that are similar to the one we currently live in, as this gives us insight into which neighborhoods we might like to move to.
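
To make the clustering idea concrete, here is a minimal sketch of the basic k-means loop (Lloyd's algorithm) in plain Python. It is not what scikit-learn's `KMeans` does internally in every detail: the sketch starts from fixed, hand-picked centroids instead of a smarter initialisation, and the four toy "neighborhoods" with their park/café/nightclub shares are made-up values used only for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on lists of floats; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical frequency vectors: [park, café, nightclub] share per neighborhood.
toy_points = [
    [0.5, 0.4, 0.1],
    [0.4, 0.5, 0.1],
    [0.0, 0.2, 0.8],
    [0.1, 0.1, 0.8],
]
toy_labels, toy_centroids = kmeans(toy_points, centroids=[toy_points[0], toy_points[2]])
print(toy_labels)
```

Neighborhoods whose venue mix looks alike end up with the same label, which is the property the analysis relies on.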


## Data

Examples of the data I was working with:

The data I used from Wikipedia looked as follows: Kreis 1 was the borough; Rathaus, Hochschulen, Lindenhof, and City were neighborhoods in this borough.

The latitudes and longitudes were, e.g., Wollishofen: 47.3478° N, 8.5335° E.

I then used Foursquare venue data to answer the question above.


## Analysis

In [1]:
from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd
import folium
import geocoder
from geopy.geocoders import Nominatim 
from pandas import json_normalize  # the pandas.io.json import path is deprecated

from sklearn.cluster import KMeans

import matplotlib.cm as cm
import matplotlib.colors as colors

Information on Zurich and its districts and neighborhoods

In [2]:
source = requests.get("https://de.wikipedia.org/wiki/Stadtteile_der_Stadt_Z%C3%BCrich").text
soup = BeautifulSoup(source,"html5lib")

tables = soup.find_all('table') #find all html tables in the web page

for index, table in enumerate(tables):
    if ("Stadtkreise" in str(table)):
        table_index = index
print(table_index)

0


In [3]:
rows = []  # collect one dict per table row, then build the DataFrame once

for row in tables[table_index].tbody.find_all("tr"):
    col = row.find_all("td")
    if col:  # skip rows without <td> cells (e.g. header rows)
        rows.append({
            "Stadtkreis": col[0].text[:7],
            "Fläche": col[1].text.strip('\n'),
            "Einwohner": col[2].text.strip(),
            "Dichte": col[3].text.strip(),
        })

# DataFrame.append was removed in pandas 2.0, so the frame is built from the list
zurich_data = pd.DataFrame(rows, columns=["Stadtkreis", "Fläche", "Einwohner", "Dichte"])

In [4]:
zurich_data.drop(zurich_data.tail(1).index, inplace=True)
zurich_data.at[9,'Stadtkreis']="Kreis 10"
zurich_data.at[10,'Stadtkreis']="Kreis 11"
zurich_data.at[11,'Stadtkreis']="Kreis 12"

As there is hardly any information available on neighborhoods, I created my own table:

- neighborhoods are called "Quartiere"
- boroughs are called "Stadtkreise"

In [5]:
nghb = {'Stadtkreis': ["Kreis 1", "Kreis 2", "Kreis 3", 
                       "Kreis 4", "Kreis 5", "Kreis 6", 
                       "Kreis 7", "Kreis 8", "Kreis 9", 
                       "Kreis 10", "Kreis 11", "Kreis 12"], 
        'Quartiere': ["Rathaus, Hochschulen, Lindenhof, City", "Wollishofen, Leimbach, Enge",
                      "Alt-Wiedikon, Friesenberg, Sihlfeld", "Werd, Langstrasse, Hard",
                      "Gewerbeschule, Escher Wyss", "Unterstrass, Oberstrass",
                      "Fluntern, Hottingen, Hirslanden, Witikon", "Seefeld, Mühlebach, Weinegg",
                      "Albisrieden, Altstetten", "Höngg, Wipkingen", "Affoltern, Oerlikon, Seebach",
                      "Saatlen, Schwamendingen-Mitte, Hirzenbach"
                     ]}
nghoods = pd.DataFrame(data=nghb)


zurich_data = zurich_data.join(nghoods, how='left', rsuffix = "Stadtkreis")
zurich_data = zurich_data.drop(['StadtkreisStadtkreis'], axis=1)

In [6]:
df_geo_coor = pd.read_csv("C:/Users/Michael/Desktop/PLZO_CSV_WGS84.csv")
df_geo_coor = df_geo_coor.dropna()
df_geo_coor = df_geo_coor.astype({"Unnamed: 1":int})
df_geo_coor = df_geo_coor.drop(columns=['Unnamed: 0', 'Unnamed: 2', 'Unnamed: 4', 'Unnamed: 8'])


df_geo_coor = df_geo_coor[df_geo_coor['Unnamed: 3'].str.contains("rich")]
# keep one postal code per borough (twelve in total)
keep_plz = [8001, 8038, 8055, 8004, 8005, 8006, 8032, 8008, 8048, 8037, 8050, 8051]
df_geo_coor = df_geo_coor[df_geo_coor['Unnamed: 1'].isin(keep_plz)]
df_geo_coor

Unnamed: 0,Unnamed: 1,Unnamed: 3,Unnamed: 5,Unnamed: 6,Unnamed: 7
486,8001,Zürich,ZH,8.541349,47.372049
492,8004,Zürich,ZH,8.52337,47.378002
494,8005,Zürich,ZH,8.520684,47.386766
496,8006,Zürich,ZH,8.543175,47.385766
498,8008,Zürich,ZH,8.561385,47.35364
500,8032,Zürich,ZH,8.564439,47.366689
502,8037,Zürich,ZH,8.524262,47.398495
504,8038,Zürich,ZH,8.536905,47.341574
516,8048,Zürich,ZH,8.483871,47.386747
520,8050,Zürich,ZH,8.550802,47.411938


In [7]:
df_geo_coor = df_geo_coor.drop(columns=['Unnamed: 1', 'Unnamed: 3', 'Unnamed: 5'])
df_geo_coor = df_geo_coor.rename(columns={'Unnamed: 6': 'longitude', 'Unnamed: 7': 'latitude'})

df_geo_coor = df_geo_coor.reset_index()
df_geo_coor = df_geo_coor.reindex([0,7,11,1,2,3,5,4,8,6,9,10])
index = pd.Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
df_geo_coor = df_geo_coor.set_index(index)
df_geo_coor

Unnamed: 0,index,longitude,latitude
0,486,8.541349,47.372049
1,504,8.536905,47.341574
2,528,8.490292,47.364189
3,492,8.52337,47.378002
4,494,8.520684,47.386766
5,496,8.543175,47.385766
6,500,8.564439,47.366689
7,498,8.561385,47.35364
8,516,8.483871,47.386747
9,502,8.524262,47.398495
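
The positional `reindex` above is hard to audit at a glance. A hedged alternative is to spell out which postal code stands for which borough. The mapping in this sketch is inferred from the reindex order and the rows displayed above, so it is an assumption worth checking against an official source; the three coordinate rows are copied from the output above.

```python
# Postal code -> Stadtkreis, as implied by the reindex order used above
# (an inferred mapping, not taken from an official list).
PLZ_TO_KREIS = {
    8001: "Kreis 1", 8038: "Kreis 2", 8055: "Kreis 3", 8004: "Kreis 4",
    8005: "Kreis 5", 8006: "Kreis 6", 8032: "Kreis 7", 8008: "Kreis 8",
    8048: "Kreis 9", 8037: "Kreis 10", 8050: "Kreis 11", 8051: "Kreis 12",
}

# A few (postal code, longitude, latitude) rows copied from the output above.
coordinates = [
    (8001, 8.541349, 47.372049),
    (8038, 8.536905, 47.341574),
    (8008, 8.561385, 47.35364),
]

# Key the coordinates by borough name instead of by row position.
by_kreis = {PLZ_TO_KREIS[plz]: (lon, lat) for plz, lon, lat in coordinates}
print(by_kreis["Kreis 2"])
```

In pandas, the same dictionary could be passed to `Series.map` on the postal-code column, after which the two tables can be joined on the borough name rather than on row order.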


In [8]:
zurich_data["longitude"] = df_geo_coor["longitude"]
zurich_data["latitude"] = df_geo_coor["latitude"]

zurich_data = zurich_data.drop(columns=['Fläche', 'Einwohner', 'Dichte'])
zurich_data

Unnamed: 0,Stadtkreis,Quartiere,longitude,latitude
0,Kreis 1,"Rathaus, Hochschulen, Lindenhof, City",8.541349,47.372049
1,Kreis 2,"Wollishofen, Leimbach, Enge",8.536905,47.341574
2,Kreis 3,"Alt-Wiedikon, Friesenberg, Sihlfeld",8.490292,47.364189
3,Kreis 4,"Werd, Langstrasse, Hard",8.52337,47.378002
4,Kreis 5,"Gewerbeschule, Escher Wyss",8.520684,47.386766
5,Kreis 6,"Unterstrass, Oberstrass",8.543175,47.385766
6,Kreis 7,"Fluntern, Hottingen, Hirslanden, Witikon",8.564439,47.366689
7,Kreis 8,"Seefeld, Mühlebach, Weinegg",8.561385,47.35364
8,Kreis 9,"Albisrieden, Altstetten",8.483871,47.386747
9,Kreis 10,"Höngg, Wipkingen",8.524262,47.398495


In [9]:
address = 'Wollishofen, Zürich'

geolocator = Nominatim(user_agent="zh_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinate of Wollishofen, Zürich is {}, {}.'.format(latitude, longitude))

The geographical coordinate of Wollishofen, Zürich is 47.3424271, 8.5307085.


## Explore and Cluster

In [10]:
zurich_data_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(zurich_data['latitude'], zurich_data['longitude'], zurich_data['Stadtkreis'], zurich_data['Quartiere']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(zurich_data_map)
    
zurich_data_map

In [11]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
ACCESS_TOKEN = '' # your FourSquare Access Token
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: <redacted>
CLIENT_SECRET:<redacted>
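
Printing API credentials in a notebook makes it easy to publish them by accident. One possible pattern, sketched here under the assumption that the keys are stored in environment variables, is to load them at runtime and only ever print a masked version. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical choices for this example.

```python
import os


def load_credentials(env=os.environ):
    """Read API credentials from environment variables (names are hypothetical)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID", "")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET", "")
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters so logs never contain the full value."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


client_id, client_secret = load_credentials()
print("CLIENT_ID:", mask(client_id))
print("CLIENT_SECRET:", mask(client_secret))
```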


#### Let's explore my neighborhood in the dataframe.

In [12]:
zurich_data.loc[1,'Quartiere']

'Wollishofen, Leimbach, Enge'

In [13]:
quartier_latitude = zurich_data.loc[1, 'latitude'] # neighborhood latitude value
quartier_longitude = zurich_data.loc[1, 'longitude'] # neighborhood longitude value

quartier_name = zurich_data.loc[1, 'Quartiere'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(quartier_name, 
                                                               quartier_latitude, 
                                                               quartier_longitude))

Latitude and longitude values of Wollishofen, Leimbach, Enge are 47.34157429220139, 8.53690475536683.


#### Now, let's get the top 20 venues that are in Wollishofen within a radius of 500 meters.

In [14]:
LIMIT = 20 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, quartier_latitude, quartier_longitude, radius, LIMIT)

results = requests.get(url).json()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # results from the explore endpoint nest the key under 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
venues = results['response']['venues']
    
nearby_venues = json_normalize(venues) # flatten JSON


# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)



Unnamed: 0,name,categories,lat,lng
0,Rote Fabrik,Music Venue,47.343607,8.536923
1,Cassiopeiasteg,Bridge,47.342183,8.538094
2,Badi Wollishofen,Pool,47.340901,8.537727
3,Ziegel oh Lac,Swiss Restaurant,47.343652,8.536766
4,Schulhaus Hans Asper,School,47.34128,8.534386
5,Hafen Wollishofen,Harbor / Marina,47.340456,8.539185
6,Ziegel Oh Lac,Music Venue,47.343117,8.53604
7,Wöschi,Swiss Restaurant,47.340526,8.538174
8,Trois Pommes Outlet,Boutique,47.340074,8.537858
9,Chelsey Schill,Music Venue,47.344231,8.531946


Interesting: it seems I don't know my neighborhood as well as I thought. I've never been to the restaurant "Ziegel oh Lac". We do enjoy spending time at the harbor, however.
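
As a sanity check on the 500 meter radius, the distance between the query point and a returned venue can be computed by hand. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km (an approximation); `haversine_m` is a helper written for this example, and the two coordinate pairs are copied from the outputs above.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6_371_000):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


center = (47.341574, 8.536905)       # Kreis 2 coordinates used in the query
rote_fabrik = (47.343607, 8.536923)  # first row of the venue table above
distance = haversine_m(*center, *rote_fabrik)
print(round(distance), distance <= 500)
```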

## Explore Neighborhoods in Zürich

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Quartiere', 
                  'Quartiere Latitude', 
                  'Quartiere Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

zurich_venues = getNearbyVenues(names=zurich_data['Quartiere'],
                                   latitudes=zurich_data['latitude'],
                                   longitudes=zurich_data['longitude']
                                  )

Rathaus, Hochschulen, Lindenhof, City
Wollishofen, Leimbach, Enge
Alt-Wiedikon, Friesenberg, Sihlfeld
Werd, Langstrasse, Hard
Gewerbeschule, Escher Wyss
Unterstrass, Oberstrass
Fluntern, Hottingen, Hirslanden, Witikon
Seefeld, Mühlebach, Weinegg
Albisrieden, Altstetten
Höngg, Wipkingen
Affoltern, Oerlikon, Seebach
Saatlen, Schwamendingen-Mitte, Hirzenbach
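
The request URLs in this notebook are assembled with long `str.format` calls, where a stray `&` or an unescaped character is easy to miss. A possible alternative, sketched here, is to keep the parameters in a dictionary and let `urllib.parse.urlencode` build the query string. The helper name `build_explore_url` and the placeholder credentials are assumptions for the example; the endpoint and parameter names are the ones used above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=20):
    """Assemble the explore URL; urlencode escapes every parameter value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; the coordinates are those of Kreis 2 from above.
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 47.341574, 8.536905)
query = parse_qs(urlsplit(url).query)  # decode again to check the round trip
print(query["ll"], query["radius"], query["limit"])
```

With `requests`, passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` achieves the same encoding without building the string manually.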


In [16]:
zurich_venues.groupby('Quartiere').count()

Quartiere,Quartiere Latitude,Quartiere Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Affoltern, Oerlikon, Seebach",20,20,20,20,20,20
"Albisrieden, Altstetten",15,15,15,15,15,15
"Alt-Wiedikon, Friesenberg, Sihlfeld",12,12,12,12,12,12
"Fluntern, Hottingen, Hirslanden, Witikon",18,18,18,18,18,18
"Gewerbeschule, Escher Wyss",20,20,20,20,20,20
"Höngg, Wipkingen",6,6,6,6,6,6
"Rathaus, Hochschulen, Lindenhof, City",20,20,20,20,20,20
"Saatlen, Schwamendingen-Mitte, Hirzenbach",5,5,5,5,5,5
"Seefeld, Mühlebach, Weinegg",20,20,20,20,20,20
"Unterstrass, Oberstrass",20,20,20,20,20,20


In [17]:
print('There are {} unique categories.'.format(len(zurich_venues['Venue Category'].unique())))

There are 82 unique categories.


In [18]:
# one hot encoding
zurich_onehot = pd.get_dummies(zurich_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
zurich_onehot['Quartiere'] = zurich_venues['Quartiere'] 

# move neighborhood column to the first column
fixed_columns = [zurich_onehot.columns[-1]] + list(zurich_onehot.columns[:-1])
zurich_onehot = zurich_onehot[fixed_columns]

zurich_grouped = zurich_onehot.groupby('Quartiere').mean().reset_index()
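
What the one-hot encoding followed by `groupby(...).mean()` computes is simply the share of each venue category within a neighborhood. The following sketch reproduces that arithmetic with the standard library on made-up data; the neighborhoods "A" and "B" and their venues are hypothetical stand-ins for `zurich_venues`.

```python
from collections import Counter, defaultdict

# Made-up (neighborhood, venue category) pairs standing in for zurich_venues.
venues = [
    ("A", "Café"), ("A", "Café"), ("A", "Park"), ("A", "Hotel"),
    ("B", "Park"), ("B", "Pub"),
]

# Count how often each category occurs per neighborhood.
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Share of each category among a neighborhood's venues; this matches the mean
# of the one-hot columns because every venue contributes exactly one 1.
frequencies = {
    hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for hood, counter in counts.items()
}
print(frequencies["A"])
```

The only difference from the DataFrame version is that categories absent from a neighborhood are omitted here, whereas the grouped frame stores them as 0.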


In [19]:
num_top_venues = 5

for hood in zurich_grouped['Quartiere']:
    print("----"+hood+"----")
    temp = zurich_grouped[zurich_grouped['Quartiere'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Affoltern, Oerlikon, Seebach----
                venue  freq
0                 Pub  0.10
1  Falafel Restaurant  0.05
2        Burger Joint  0.05
3                Pool  0.05
4         Salad Place  0.05


----Albisrieden, Altstetten----
              venue  freq
0       Bus Station  0.13
1       Supermarket  0.13
2  Swiss Restaurant  0.07
3             Plaza  0.07
4     Shopping Mall  0.07


----Alt-Wiedikon, Friesenberg, Sihlfeld----
                venue  freq
0               Hotel  0.33
1    Swiss Restaurant  0.17
2          Playground  0.08
3  Light Rail Station  0.08
4      Scenic Lookout  0.08


----Fluntern, Hottingen, Hirslanden, Witikon----
              venue  freq
0      Tram Station  0.17
1             Plaza  0.11
2  Swiss Restaurant  0.11
3             Hotel  0.11
4       Gas Station  0.06


----Gewerbeschule, Escher Wyss----
                             venue  freq
0                        Nightclub  0.15
1                              Bar  0.10
2                Accesso

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Quartiere']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Quartiere'] = zurich_grouped['Quartiere']

for ind in np.arange(zurich_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(zurich_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Quartiere,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Affoltern, Oerlikon, Seebach",Pub,Pizza Place,Steakhouse,Pool,Lounge,Salad Place,Burger Joint,Falafel Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant
1,"Albisrieden, Altstetten",Supermarket,Bus Station,Plaza,Fast Food Restaurant,Pool,Shopping Mall,Mexican Restaurant,Bakery,Mediterranean Restaurant,Swiss Restaurant
2,"Alt-Wiedikon, Friesenberg, Sihlfeld",Hotel,Swiss Restaurant,Playground,Light Rail Station,Tea Room,Grocery Store,Scenic Lookout,Molecular Gastronomy Restaurant,Food Truck,Design Studio
3,"Fluntern, Hottingen, Hirslanden, Witikon",Tram Station,Plaza,Swiss Restaurant,Hotel,Gas Station,Bakery,Park,Modern European Restaurant,Cable Car,Light Rail Station
4,"Gewerbeschule, Escher Wyss",Nightclub,Bar,Accessories Store,Café,Italian Restaurant,Jazz Club,Market,Mediterranean Restaurant,Gastropub,Food Truck
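
The column labels above work for up to ten venues, but the fallback would produce labels such as "21th" for larger values of `num_top_venues`. A small helper like the one below is one optional way to generate the labels; `ordinal` is a name introduced for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Quartiere"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```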


## Cluster Neighborhoods

In [21]:
# set number of clusters
kclusters = 5

zurich_grouped_clustering = zurich_grouped.drop(columns='Quartiere')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(zurich_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 3, 1, 2, 3, 4, 1, 3])

In [22]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

zurich_merged = zurich_data

# join the sorted venue table onto zurich_data so each neighborhood keeps its latitude/longitude
zurich_merged = zurich_merged.join(neighborhoods_venues_sorted.set_index('Quartiere'), on='Quartiere')

zurich_merged.head() # check the last columns!

Unnamed: 0,Stadtkreis,Quartiere,longitude,latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Kreis 1,"Rathaus, Hochschulen, Lindenhof, City",8.541349,47.372049,3,Café,Hotel,Arts & Crafts Store,Swiss Restaurant,Cocktail Bar,Plaza,Italian Restaurant,Lounge,Gourmet Shop,Pedestrian Plaza
1,Kreis 2,"Wollishofen, Leimbach, Enge",8.536905,47.341574,1,Harbor / Marina,Restaurant,Bus Station,Pier,Fast Food Restaurant,Tram Station,Community Center,Thai Restaurant,Swiss Restaurant,Pool
2,Kreis 3,"Alt-Wiedikon, Friesenberg, Sihlfeld",8.490292,47.364189,0,Hotel,Swiss Restaurant,Playground,Light Rail Station,Tea Room,Grocery Store,Scenic Lookout,Molecular Gastronomy Restaurant,Food Truck,Design Studio
3,Kreis 4,"Werd, Langstrasse, Hard",8.52337,47.378002,3,Café,Bakery,Bar,Swiss Restaurant,Wine Bar,Pizza Place,Burger Joint,Indian Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant
4,Kreis 5,"Gewerbeschule, Escher Wyss",8.520684,47.386766,1,Nightclub,Bar,Accessories Store,Café,Italian Restaurant,Jazz Club,Market,Mediterranean Restaurant,Gastropub,Food Truck


## Results

In [23]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(zurich_merged['latitude'], zurich_merged['longitude'], zurich_merged['Quartiere'], zurich_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

In [24]:
zurich_merged.loc[zurich_merged['Cluster Labels'] == 1, zurich_merged.columns[[1] + list(range(5, zurich_merged.shape[1]))]]

Unnamed: 0,Quartiere,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Wollishofen, Leimbach, Enge",Harbor / Marina,Restaurant,Bus Station,Pier,Fast Food Restaurant,Tram Station,Community Center,Thai Restaurant,Swiss Restaurant,Pool
4,"Gewerbeschule, Escher Wyss",Nightclub,Bar,Accessories Store,Café,Italian Restaurant,Jazz Club,Market,Mediterranean Restaurant,Gastropub,Food Truck
7,"Seefeld, Mühlebach, Weinegg",Museum,Swiss Restaurant,Restaurant,Café,Snack Place,Italian Restaurant,Food Court,Mexican Restaurant,Park,Performing Arts Venue
8,"Albisrieden, Altstetten",Supermarket,Bus Station,Plaza,Fast Food Restaurant,Pool,Shopping Mall,Mexican Restaurant,Bakery,Mediterranean Restaurant,Swiss Restaurant
10,"Affoltern, Oerlikon, Seebach",Pub,Pizza Place,Steakhouse,Pool,Lounge,Salad Place,Burger Joint,Falafel Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant


## Discussion

According to this k-means clustering, Wollishofen is similar to four other neighborhoods. Interestingly, this confirms our own impression, as we had mainly been looking for apartments in "Seefeld, Mühlebach, Weinegg" and "Gewerbeschule, Escher Wyss". Any of these neighborhoods could therefore be a good candidate. We especially like Seefeld, as it's close to the lake and the city but still has some parks. Looking at the most common venues, there is also a wide variety of restaurants there.
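
The count of four similar neighborhoods can be double-checked directly from the cluster labels. The dictionary in this sketch is read off the outputs above (the label array in alphabetical order plus the merged table), with each borough abbreviated to its first neighborhood for readability; treat it as a transcription rather than fresh output.

```python
# Cluster labels transcribed from the outputs above; keys are abbreviated to
# the first neighborhood of each borough.
cluster_labels = {
    "Affoltern": 1, "Albisrieden": 1, "Alt-Wiedikon": 0, "Fluntern": 3,
    "Gewerbeschule": 1, "Höngg": 2, "Rathaus": 3, "Saatlen": 4,
    "Seefeld": 1, "Unterstrass": 3, "Werd": 3, "Wollishofen": 1,
}

home = "Wollishofen"
# Every other entry that shares the home neighborhood's cluster label.
peers = sorted(
    name for name, label in cluster_labels.items()
    if label == cluster_labels[home] and name != home
)
print(peers)
```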


## Let's have a closer look at our candidate Seefeld

We can see that it's close to the lake and to the green areas on the map. What about cafés, though?
Let's see where they are and how many there are.

In [25]:
address = 'Seefeld, Zürich'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)


47.3577831 8.5510737


In [26]:
search_query = 'Café'
radius = 500



url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)

results = requests.get(url).json()

# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # results from the explore endpoint nest the key under 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]


venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around Seefeld

# add a red circle marker to represent Seefeld
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Seefeld',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)


# add the cafés as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

    
# display map
venues_map


## How does this compare with Wollishofen?

In [27]:
address = 'Wollishofen, Zürich'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

47.3424271 8.5307085


In [28]:
search_query = 'Café'
radius = 500



url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)

results = requests.get(url).json()

# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # results from the explore endpoint nest the key under 'venue.'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]


venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred around Wollishofen

# add a red circle marker to represent Wollishofen
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Wollishofen',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)


# add the cafés as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map



We see that both places have some cafés, with Seefeld having a few more, and that both neighborhoods are close to the lake.

## Conclusion

All in all, this analysis let me find neighborhoods in Zurich that are similar to the one we currently live in. Seefeld was among them, and we actually like that neighborhood a lot. As it has a few more cafés, close access to the lake, and some parks, it is a viable candidate for our search. On top of this, it has a wide variety of restaurants, which is a big plus.

I believe this analysis has found a neighborhood that matches ours, and we will now look more closely at possible places to live there.