# IBM Capstone project: Battle of Neighbourhoods in Zurich, Switzerland
In this notebook we cluster the neighbourhoods of Zurich based on venue data retrieved from the Foursquare API. We then draw the clusters on a map of Zurich with the Folium library and overlay the city's districts. Next, the rental prices of each neighbourhood are plotted on top. With the final map, a user can look up which type of cluster suits them best and get a first impression of which neighbourhoods may be cheaper to live in.

Start the notebook by importing all necessary libraries.

In [115]:
import numpy as np
import pandas as pd

import requests
from bs4 import BeautifulSoup

from geopy.geocoders import Nominatim
import folium

from sklearn.cluster import KMeans

import json

import matplotlib.cm as cm
import matplotlib.colors as colors

from urllib.parse import urlparse, urljoin

#### Define user credentials for the Foursquare API

In [116]:
# @hidden_cell
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them.
# The variable names below are an assumption; use whatever names you export in your shell.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value


## Retrieve Zurich districts and Neighbourhoods

The city of Zurich maintains an open data website. From there I downloaded the CSV file containing the city's neighbourhoods, districts and other information. The city is composed of 12 districts, known as Kreise (German for "circles"). Each district is composed of several neighbourhoods.

Here is the link to the data: https://data.stadt-zuerich.ch/dataset/geo_statistische_quartiere/resource/cf640846-bf40-4bc8-ab2a-a1b8052fc424

In the git repository the file can be found in the data folder. From the CSV file I keep only the district (a number ranging from 1 to 12, one per district/Kreis) and the neighbourhood.

In [143]:
df_neighbourhoods = pd.read_csv('./data/stzh.adm_statistische_quartiere_v.csv')
# drop some columns and rename columns
df_neighbourhoods.drop(columns=['objid', 'qnr', 'kname', 'geometry'], inplace=True)
df_neighbourhoods.rename(columns={'knr': 'District', 'qname': 'Neighbourhood'}, inplace=True)
df_neighbourhoods = df_neighbourhoods.reindex(columns=['District', 'Neighbourhood'])
df_neighbourhoods.head(5)

Unnamed: 0,objid,qnr,qname,knr,kname,geometry
0,4,92,Altstetten,9,Kreis 9,"POLYGON ((2677911.5 1247524.4,2677919.2 124752..."
1,28,101,Höngg,10,Kreis 10,"POLYGON ((2680537.8 1249894.5,2680532.2 124989..."


## Now that we have a list of all neighbourhoods we can retrieve the latitude/longitude values of each neighbourhood using geopy


Rental prices by district (Kreis): https://www.stadt-zuerich.ch/prd/de/index/statistik/themen/bauen-wohnen/mietpreise/mietpreise-strukturerhebung.html


Rental prices by neighbourhood (Quartier): https://realadvisor.ch/en/property-prices/city-zurich

In [120]:
address = 'Zurich City, Zurich'
geolocator = Nominatim(user_agent="zurich_city")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Zurich are lat: {}, lon: {}.\n'.format(latitude, longitude))

print('Retrieving coordinates for all Neighbourhoods in Zurich')
latitudes, longitudes = [], []
for hood in df_neighbourhoods['Neighbourhood']:
    
    # Make sure to add 'Kreis' in geolocator.geocode, otherwise wrong coordinates could be retrieved.
    location = geolocator.geocode('{}, Zurich, Kreis'.format(hood))
    latitudes.append(location.latitude)
    longitudes.append(location.longitude)

The geographical coordinates of Zurich are lat: 47.3744489, lon: 8.5410422.

Retrieving coordinates for all Neighbourhoods in Zurich


After retrieving the coordinates we write them to a pandas dataframe and append them to the neighbourhood dataframe

In [121]:
latlng = {
    "Latitude": latitudes,
    "Longitude": longitudes
}

df_latlng = pd.DataFrame(latlng)

df_neighbourhoods=pd.concat([df_neighbourhoods, df_latlng], axis=1)
df_neighbourhoods.head(5)

Unnamed: 0,District,Neighbourhood,Latitude,Longitude
0,9,Altstetten,47.387403,8.486061
1,10,Höngg,47.40166,8.497715
2,11,Affoltern,47.418762,8.507186
3,2,Wollishofen,47.342427,8.530708
4,3,Friesenberg,47.354922,8.500523


#### Now a map of Zurich with a marker for each neighbourhood can be created. Additionally we will map the exact boundaries of each neighbourhood and group them by district (Kreis)

To draw the boundaries of each neighbourhood I downloaded a GeoJSON file from: https://data.stadt-zuerich.ch/dataset/geo_statistische_quartiere/resource/3c384ced-12ac-4578-b3da-bc86feb690d4. The GeoJSON file is then passed to Folium's choropleth class.
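
Because the choropleth joins the dataframe to the GeoJSON by name (via `key_on`), it can help to confirm beforehand that every neighbourhood name has a matching feature. The following is a minimal sketch, assuming the features carry a `qname` property as in the CSV file; the helper `unmatched_names` and the inline GeoJSON are hypothetical stand-ins for illustration only.

```python
import json

def unmatched_names(geojson, names, prop="qname"):
    """Return the names that have no GeoJSON feature with a matching property."""
    feature_names = {f["properties"].get(prop) for f in geojson["features"]}
    return sorted(set(names) - feature_names)

# Tiny hypothetical GeoJSON, used only to demonstrate the check.
sample_geojson = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"qname": "Altstetten"}, "geometry": null},
  {"type": "Feature", "properties": {"qname": "Höngg"}, "geometry": null}
]}
""")

missing = unmatched_names(sample_geojson, ["Altstetten", "Höngg", "Affoltern"])
print(missing)
```

With the real file, one would load it with `json.load` and pass `df_neighbourhoods['Neighbourhood']` as `names`; an empty result means every neighbourhood can be coloured.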

In [129]:
state_geo = './data/stzh.adm_statistische_quartiere_v.json'

In [142]:
# create map of Zurich using latitude and longitude values
map_zurich = folium.Map(location=[latitude, longitude], zoom_start=11)

# colour each neighbourhood polygon by its district number
folium.Choropleth(
    geo_data=state_geo,
    name='choropleth',
    data=df_neighbourhoods,
    columns=['Neighbourhood', 'District'],
    key_on='feature.properties.qname',
    fill_color='Paired',
    fill_opacity=0.7,
    line_opacity=0.7,
    legend_name='District',
    bins=12
).add_to(map_zurich)
#folium.LayerControl().add_to(map_zurich)

# add markers to map
for lat, lng, district, neighbourhood in zip(
    df_neighbourhoods['Latitude'], df_neighbourhoods['Longitude'], 
    df_neighbourhoods['District'], df_neighbourhoods['Neighbourhood']):
    
    label = '{}, {}'.format(neighbourhood, district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False
    ).add_to(map_zurich)
    
map_zurich



## Retrieving venues in each Neighbourhood with the Foursquare API

#### Functions to retrieve venues in all Neighbourhoods

In [39]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

#### Note that we retrieve all venues within a radius of 500 m from the location of the Neighbourhood
While this is an easy way to retrieve venues for each neighbourhood, there are a couple of issues to keep in mind. Since the neighbourhoods have different shapes and sizes (as shown on the map above), some venues of a large neighbourhood will fall outside the search radius, and the results for a small neighbourhood might contain venues that are actually located in adjacent neighbourhoods. In addition, the number of venues retrieved per request is limited to 100 with a free Foursquare account.

To get the complete and correct set of venues for each neighbourhood, I would suggest creating a grid of longitude and latitude pairs covering the city of choice. Next, I'd retrieve the venues within a fixed radius of each grid point and save them in a dataframe. Then I'd remove all duplicate venues. Finally, each venue can be looked up by its address and assigned to the correct neighbourhood. Better ways of getting all venues per neighbourhood certainly exist, but this is my first time working with the Foursquare API and this approach is the easiest one I could think of in terms of implementation.
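
The grid idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the bounding box is a hypothetical rectangle around Zurich, the helpers `make_grid` and `deduplicate` do not exist elsewhere in this notebook, and duplicates are identified by name and coordinates because those are the venue fields kept above. A grid spacing of 700 m is chosen so that circles of radius 500 m around the grid points overlap and leave no gaps (the spacing must stay below 500 m × √2 ≈ 707 m).

```python
import math

def make_grid(lat_min, lat_max, lon_min, lon_max, spacing_m):
    """Return (lat, lon) points covering a bounding box, roughly spacing_m apart."""
    m_per_deg_lat = 111_320.0  # approximate metres per degree of latitude
    mid_lat = math.radians((lat_min + lat_max) / 2)
    m_per_deg_lon = m_per_deg_lat * math.cos(mid_lat)  # shrinks with latitude

    n_lat = int((lat_max - lat_min) * m_per_deg_lat // spacing_m) + 1
    n_lon = int((lon_max - lon_min) * m_per_deg_lon // spacing_m) + 1
    dlat = spacing_m / m_per_deg_lat
    dlon = spacing_m / m_per_deg_lon
    return [(lat_min + i * dlat, lon_min + j * dlon)
            for i in range(n_lat) for j in range(n_lon)]

def deduplicate(venues):
    """Keep the first occurrence of each venue, keyed by (name, lat, lng)."""
    seen, unique = set(), []
    for v in venues:
        key = (v["name"], v["lat"], v["lng"])
        if key not in seen:
            seen.add(key)
            unique.append(v)
    return unique

# Hypothetical bounding box around Zurich; adjust to the real city limits.
grid = make_grid(47.32, 47.44, 8.45, 8.63, spacing_m=700)
print(len(grid), "grid points")
```

Each grid point would then be passed to the Foursquare request in place of the neighbourhood coordinates, and the combined results run through `deduplicate` before assigning venues to neighbourhoods.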

In [40]:
zh_venues = getNearbyVenues(names=df_neighbourhoods['Neighbourhood'],
                           latitudes=df_neighbourhoods['Latitude'],
                           longitudes=df_neighbourhoods['Longitude'],
                           radius=500)

Check size of venues dataframe

In [41]:
print(zh_venues.shape)
zh_venues.head()

(1083, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Altstetten,47.387403,8.486061,La Taquería,47.38857,8.486356,Mexican Restaurant
1,Altstetten,47.387403,8.486061,Memo Bar,47.387783,8.486138,Fast Food Restaurant
2,Altstetten,47.387403,8.486061,Masa Restaurant,47.389121,8.488317,Mediterranean Restaurant
3,Altstetten,47.387403,8.486061,Asian Dining,47.38957,8.486498,Asian Restaurant
4,Altstetten,47.387403,8.486061,Samurai VII,47.388589,8.489764,Japanese Restaurant


In [42]:
zh_venues['Venue Category'].unique()

array(['Mexican Restaurant', 'Fast Food Restaurant',
       'Mediterranean Restaurant', 'Asian Restaurant',
       'Japanese Restaurant', 'Plaza', 'Supermarket', 'Gym',
       'Chinese Restaurant', 'Café', 'French Restaurant', 'Burger Joint',
       'Italian Restaurant', 'Swiss Restaurant',
       'Paper / Office Supplies Store', 'Hotel', 'Discount Store',
       'Doner Restaurant', 'Bakery', 'Food', 'Bus Station',
       'Sandwich Place', 'Grocery Store', 'Gym Pool', 'Steakhouse',
       'Gas Station', 'Tram Station', 'Health & Beauty Service',
       'Pizza Place', 'Beach', 'Train Station', 'Department Store',
       'Diner', 'Gym / Fitness Center', 'Athletics & Sports',
       'Bike Trail', 'Light Rail Station', 'Restaurant', 'Cheese Shop',
       'Music Venue', 'Irish Pub', 'Salon / Barbershop', 'Stables',
       'Tennis Court', 'Dessert Shop', 'Bistro', 'Theater',
       'Korean Restaurant', 'Coffee Shop', 'History Museum',
       'Vietnamese Restaurant', 'Indian Restaurant', 'Chu

In [43]:
n_venues = zh_venues.groupby('Neighbourhood').count().reset_index()
n_venues = n_venues[['Neighbourhood','Neighbourhood Latitude']].rename({'Neighbourhood':'Neighbourhood','Neighbourhood Latitude':'n_venues'}, axis=1)
n_venues

Unnamed: 0,Neighbourhood,n_venues
0,Affoltern,15
1,Albisrieden,7
2,Alt-Wiedikon,19
3,Altstetten,26
4,City,70
5,Enge,37
6,Escher Wyss,73
7,Fluntern,13
8,Friesenberg,2
9,Gewerbeschule,59


In [44]:
df_neighbourhoods = pd.merge(df_neighbourhoods, n_venues, on='Neighbourhood')

In [45]:
df_neighbourhoods

Unnamed: 0,District,Neighbourhood,Latitude,Longitude,n_venues
0,9,Altstetten,47.387403,8.486061,26
1,10,Höngg,47.40166,8.497715,16
2,11,Affoltern,47.418762,8.507186,15
3,2,Wollishofen,47.342427,8.530708,11
4,3,Friesenberg,47.354922,8.500523,2
5,7,Hottingen,47.36968,8.555082,22
6,7,Witikon,47.35831,8.590628,10
7,11,Seebach,47.420438,8.548377,16
8,9,Albisrieden,47.374857,8.484657,7
9,2,Leimbach,47.330511,8.512539,4


#### Mapping Neighbourhoods based on number of venues
From the table above we can see that some neighbourhoods have very few venues compared to others. This is common in cities, where most venues are located closer to the center. To visualize this we map the neighbourhoods with fewer than 12 venues in blue and the others in red.

We will also use this step to collect all neighbourhoods with fewer than 12 venues so that they can be discarded.

In [46]:


# create map of Zurich using latitude and longitude values
map_zurich = folium.Map(location=[latitude, longitude], zoom_start=12)

remove_neighbourhoods = []
# add markers to map
for lat, lng, neighborhood, n_venues in zip(
    df_neighbourhoods['Latitude'], df_neighbourhoods['Longitude'], 
    df_neighbourhoods['Neighbourhood'], df_neighbourhoods['n_venues']):
    
    if n_venues < 12:
        clr = 'blue'
        remove_neighbourhoods.append(neighborhood)
    else:
        clr = 'red'
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=clr,
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_zurich)  
    
map_zurich

#### Remove neighbourhoods with a small number of venues from the analysis

In [47]:
# Currently disabled: the neighbourhoods collected above are kept in the analysis below
#for i, neigh in enumerate(remove_neighbourhoods):
#    indexNames = zh_venues[zh_venues['Neighbourhood']==neigh].index
#    indexNames2 = df_neighbourhoods[df_neighbourhoods['Neighbourhood']==neigh].index
#    zh_venues.drop(indexNames, inplace=True)
#    df_neighbourhoods.drop(indexNames2, inplace=True)

## Analyse neighbourhoods

In [48]:
# one hot encoding
zurich_onehot = pd.get_dummies(zh_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
zurich_onehot['Neighbourhood'] = zh_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [zurich_onehot.columns[-1]] + list(zurich_onehot.columns[:-1])
zurich_onehot = zurich_onehot[fixed_columns]

zurich_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,...,Tram Station,Trattoria/Osteria,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop,Water Park,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Altstetten,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Altstetten,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Altstetten,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Altstetten,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Altstetten,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [49]:
zurich_grouped = zurich_onehot.groupby('Neighbourhood').mean().reset_index()
zurich_grouped

Unnamed: 0,Neighbourhood,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,...,Tram Station,Trattoria/Osteria,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop,Water Park,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Affoltern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Albisrieden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alt-Wiedikon,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,...,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0
3,Altstetten,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,City,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,...,0.0,0.0,0.042857,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0
5,Enge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027
6,Escher Wyss,0.0,0.0,0.0,0.0,0.027397,0.0,0.0,0.0,0.013699,...,0.027397,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Fluntern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Friesenberg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Gewerbeschule,0.016949,0.0,0.0,0.016949,0.0,0.0,0.050847,0.0,0.0,...,0.016949,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.016949


#### Print 5 most common venues for each neighbourhood

In [50]:
num_top_venues = 5

for hood in zurich_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = zurich_grouped[zurich_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Affoltern----
                  venue  freq
0           Bus Station  0.20
1           Supermarket  0.13
2      Department Store  0.07
3         Grocery Store  0.07
4  Gym / Fitness Center  0.07


----Albisrieden----
               venue  freq
0        Supermarket  0.14
1        Pizza Place  0.14
2   Swiss Restaurant  0.14
3  Trattoria/Osteria  0.14
4     Scenic Lookout  0.14


----Alt-Wiedikon----
                venue  freq
0          Restaurant  0.16
1  Italian Restaurant  0.11
2              Bakery  0.05
3     Thai Restaurant  0.05
4         Beer Garden  0.05


----Altstetten----
                venue  freq
0         Supermarket  0.15
1      Discount Store  0.08
2                 Gym  0.04
3  Mexican Restaurant  0.04
4    Doner Restaurant  0.04


----City----
                           venue  freq
0             Italian Restaurant  0.09
1                            Bar  0.09
2                   Cocktail Bar  0.06
3  Vegetarian / Vegan Restaurant  0.04
4                          H

#### Write the results into pandas dataframe
We keep the top 10 venues of each neighbourhood.

In [51]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [52]:
zurich_grouped

Unnamed: 0,Neighbourhood,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,...,Tram Station,Trattoria/Osteria,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop,Water Park,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Affoltern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Albisrieden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alt-Wiedikon,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,...,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0
3,Altstetten,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,City,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,...,0.0,0.0,0.042857,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0
5,Enge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027
6,Escher Wyss,0.0,0.0,0.0,0.0,0.027397,0.0,0.0,0.0,0.013699,...,0.027397,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Fluntern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Friesenberg,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Gewerbeschule,0.016949,0.0,0.0,0.016949,0.0,0.0,0.050847,0.0,0.0,...,0.016949,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.016949


In [53]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = zurich_grouped['Neighbourhood']

for ind in np.arange(zurich_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(zurich_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Affoltern,Bus Station,Supermarket,Department Store,Athletics & Sports,Bike Trail,Light Rail Station,Gym / Fitness Center,Train Station,Grocery Store,Diner
1,Albisrieden,Bakery,Scenic Lookout,Pizza Place,Supermarket,Swiss Restaurant,Trattoria/Osteria,Grocery Store,Flea Market,Fast Food Restaurant,Farmers Market
2,Alt-Wiedikon,Restaurant,Italian Restaurant,Light Rail Station,Burrito Place,Lounge,Farmers Market,Supermarket,Beer Garden,Tapas Restaurant,Bakery
3,Altstetten,Supermarket,Discount Store,Plaza,Swiss Restaurant,Paper / Office Supplies Store,Hotel,Doner Restaurant,Sandwich Place,Chinese Restaurant,Fast Food Restaurant
4,City,Bar,Italian Restaurant,Cocktail Bar,Vegetarian / Vegan Restaurant,Restaurant,Plaza,Pedestrian Plaza,Department Store,Hotel,Boutique


## Clustering neighbourhoods

Let's first determine the optimal number of clusters with the elbow method.
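
The elbow is usually read off the inertia plot by eye. As a rough numerical cross-check, one can look for the point where the curve bends most sharply, i.e. where the second difference of the inertia values is largest. The sketch below is only an illustrative heuristic and not what the cells in this notebook do; `elbow_index` is a hypothetical helper and the sample inertia values are made up.

```python
def elbow_index(values):
    """Return the index with the largest second difference of a decreasing curve."""
    if len(values) < 3:
        raise ValueError("need at least three points to locate an elbow")
    second_diffs = [values[i - 1] - 2 * values[i] + values[i + 1]
                    for i in range(1, len(values) - 1)]
    # second_diffs[0] belongs to values[1], hence the offset of 1
    return 1 + max(range(len(second_diffs)), key=second_diffs.__getitem__)

# Made-up inertia values for k = 1..6, for illustration only.
ks = [1, 2, 3, 4, 5, 6]
sample_inertia = [100.0, 60.0, 35.0, 30.0, 27.0, 25.0]
best_k = ks[elbow_index(sample_inertia)]
print(best_k)
```

On real data the curve is often smooth and has no clear elbow, in which case such a heuristic should be treated as a hint rather than a decision rule.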

In [59]:
# feature matrix used for clustering: venue frequencies without the name column
zurich_grouped_cluster = zurich_grouped.drop(columns='Neighbourhood')

inertia = []
n_clusters = list(range(1, 15))
for k in n_clusters:
    kmeans = KMeans(n_clusters=k, n_init=20)
    kmeans.fit(zurich_grouped_cluster)
    inertia.append(kmeans.inertia_)

In [60]:
%matplotlib notebook
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 10))
plt.plot(n_clusters, inertia, '-o')
plt.grid(True)

plt.xlabel('Number of clusters k')
plt.ylabel('Inertia')
plt.title('The Elbow Method showing the optimal k')

In [61]:

# set number of clusters
kclusters = 8

zurich_grouped_cluster = zurich_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, n_init=20).fit(zurich_grouped_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([5, 0, 1, 1, 1, 1, 1, 0, 4, 1], dtype=int32)

In [62]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

zurich_merged = df_neighbourhoods

# merge zurich_merged with neighborhoods_venues_sorted to add the cluster label and top venues for each neighbourhood
zurich_merged = zurich_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

zurich_merged.head() # check the last columns!

Unnamed: 0,District,Neighbourhood,Latitude,Longitude,n_venues,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,9,Altstetten,47.387403,8.486061,26,1,Supermarket,Discount Store,Plaza,Swiss Restaurant,Paper / Office Supplies Store,Hotel,Doner Restaurant,Sandwich Place,Chinese Restaurant,Fast Food Restaurant
1,10,Höngg,47.40166,8.497715,16,1,Grocery Store,Plaza,Discount Store,Fast Food Restaurant,Bus Station,Mexican Restaurant,Steakhouse,Supermarket,Gas Station,Beach
2,11,Affoltern,47.418762,8.507186,15,5,Bus Station,Supermarket,Department Store,Athletics & Sports,Bike Trail,Light Rail Station,Gym / Fitness Center,Train Station,Grocery Store,Diner
3,2,Wollishofen,47.342427,8.530708,11,5,Supermarket,Plaza,Cheese Shop,Bus Station,Fast Food Restaurant,Irish Pub,Swiss Restaurant,Restaurant,Salon / Barbershop,Music Venue
4,3,Friesenberg,47.354922,8.500523,2,4,Stables,Tennis Court,Event Space,Food Truck,Food & Drink Shop,Food,Flea Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant


In [77]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(zurich_merged['Latitude'], zurich_merged['Longitude'],
                                  zurich_merged['Neighbourhood'], zurich_merged['Cluster Labels']):
    
    cluster = int(cluster)
    # labels are 0-based; show them 1-based to match the tables below
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster + 1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [64]:
zurich_merged

Unnamed: 0,District,Neighbourhood,Latitude,Longitude,n_venues,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,9,Altstetten,47.387403,8.486061,26,1,Supermarket,Discount Store,Plaza,Swiss Restaurant,Paper / Office Supplies Store,Hotel,Doner Restaurant,Sandwich Place,Chinese Restaurant,Fast Food Restaurant
1,10,Höngg,47.40166,8.497715,16,1,Grocery Store,Plaza,Discount Store,Fast Food Restaurant,Bus Station,Mexican Restaurant,Steakhouse,Supermarket,Gas Station,Beach
2,11,Affoltern,47.418762,8.507186,15,5,Bus Station,Supermarket,Department Store,Athletics & Sports,Bike Trail,Light Rail Station,Gym / Fitness Center,Train Station,Grocery Store,Diner
3,2,Wollishofen,47.342427,8.530708,11,5,Supermarket,Plaza,Cheese Shop,Bus Station,Fast Food Restaurant,Irish Pub,Swiss Restaurant,Restaurant,Salon / Barbershop,Music Venue
4,3,Friesenberg,47.354922,8.500523,2,4,Stables,Tennis Court,Event Space,Food Truck,Food & Drink Shop,Food,Flea Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant
5,7,Hottingen,47.36968,8.555082,22,1,Hotel,Swiss Restaurant,Italian Restaurant,Tram Station,Plaza,Grocery Store,Dessert Shop,Coffee Shop,Supermarket,Korean Restaurant
6,7,Witikon,47.35831,8.590628,10,5,Bus Station,Department Store,Indian Restaurant,Supermarket,Discount Store,Tram Station,Church,Bakery,Grocery Store,Factory
7,11,Seebach,47.420438,8.548377,16,0,Bakery,Hookah Bar,Tram Station,Korean Restaurant,Laser Tag,Eastern European Restaurant,Pharmacy,Pizza Place,Pool,Supermarket
8,9,Albisrieden,47.374857,8.484657,7,0,Bakery,Scenic Lookout,Pizza Place,Supermarket,Swiss Restaurant,Trattoria/Osteria,Grocery Store,Flea Market,Fast Food Restaurant,Farmers Market
9,2,Leimbach,47.330511,8.512539,4,6,Grocery Store,Pharmacy,Gas Station,Trail,Yoga Studio,Food,Flea Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant


## Examine clusters 



#### Neighbourhoods in each cluster

In [65]:
for i in range(0,kclusters):
    print('\n Cluster {}'.format(i+1))
    display(zurich_merged.loc[zurich_merged['Cluster Labels'] == i, zurich_merged.columns[[1] + list(range(5, zurich_merged.shape[1]))]])


 Cluster 1


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Seebach,0,Bakery,Hookah Bar,Tram Station,Korean Restaurant,Laser Tag,Eastern European Restaurant,Pharmacy,Pizza Place,Pool,Supermarket
8,Albisrieden,0,Bakery,Scenic Lookout,Pizza Place,Supermarket,Swiss Restaurant,Trattoria/Osteria,Grocery Store,Flea Market,Fast Food Restaurant,Farmers Market
10,Fluntern,0,Tram Station,Bakery,Grocery Store,Supermarket,Café,Gastropub,Indie Movie Theater,Bus Station,Plaza,Falafel Restaurant



 Cluster 2


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Altstetten,1,Supermarket,Discount Store,Plaza,Swiss Restaurant,Paper / Office Supplies Store,Hotel,Doner Restaurant,Sandwich Place,Chinese Restaurant,Fast Food Restaurant
1,Höngg,1,Grocery Store,Plaza,Discount Store,Fast Food Restaurant,Bus Station,Mexican Restaurant,Steakhouse,Supermarket,Gas Station,Beach
5,Hottingen,1,Hotel,Swiss Restaurant,Italian Restaurant,Tram Station,Plaza,Grocery Store,Dessert Shop,Coffee Shop,Supermarket,Korean Restaurant
12,Oerlikon,1,Supermarket,Hotel,Italian Restaurant,Indian Restaurant,Restaurant,Pub,Kebab Restaurant,Tram Station,Indonesian Restaurant,Other Great Outdoors
13,Oberstrass,1,Hotel,Bakery,Tram Station,Bus Station,Swiss Restaurant,Italian Restaurant,Supermarket,Bistro,Restaurant,Cable Car
14,Unterstrass,1,Middle Eastern Restaurant,Café,Bakery,Tram Station,Italian Restaurant,Hotel,Grocery Store,Falafel Restaurant,Supermarket,Swiss Restaurant
15,Seefeld,1,Italian Restaurant,Café,Hotel,Art Museum,Restaurant,Salad Place,Bakery,Supermarket,Swiss Restaurant,Park
16,Enge,1,Italian Restaurant,Hotel,Bar,History Museum,Tram Station,Supermarket,Café,Swiss Restaurant,Plaza,Park
17,Hirslanden,1,Tram Station,Plaza,Park,Hotel,Mediterranean Restaurant,Steakhouse,Swiss Restaurant,Korean Restaurant,Grocery Store,Pizza Place
18,Wipkingen,1,Grocery Store,Italian Restaurant,Breakfast Spot,Train Station,Supermarket,Eastern European Restaurant,Business Service,Bakery,Restaurant,Bar



 Cluster 3


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,Hirzenbach,2,Tram Station,Steakhouse,Furniture / Home Store,Event Space,Food Truck,Food & Drink Shop,Food,Flea Market,Fast Food Restaurant,Farmers Market



 Cluster 4


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Saatlen,3,Bus Station,Supermarket,Arts & Crafts Store,Kebab Restaurant,Bagel Shop,Yoga Studio,Factory,Food Truck,Food & Drink Shop,Food



 Cluster 5


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Friesenberg,4,Stables,Tennis Court,Event Space,Food Truck,Food & Drink Shop,Food,Flea Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant



 Cluster 6


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Affoltern,5,Bus Station,Supermarket,Department Store,Athletics & Sports,Bike Trail,Light Rail Station,Gym / Fitness Center,Train Station,Grocery Store,Diner
3,Wollishofen,5,Supermarket,Plaza,Cheese Shop,Bus Station,Fast Food Restaurant,Irish Pub,Swiss Restaurant,Restaurant,Salon / Barbershop,Music Venue
6,Witikon,5,Bus Station,Department Store,Indian Restaurant,Supermarket,Discount Store,Tram Station,Church,Bakery,Grocery Store,Factory
11,Schwamendingen-Mitte,5,Bus Station,Tram Station,Fast Food Restaurant,Light Rail Station,Shopping Mall,Supermarket,Swiss Restaurant,Thai Restaurant,Restaurant,Plaza



 Cluster 7


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Leimbach,6,Grocery Store,Pharmacy,Gas Station,Trail,Yoga Studio,Food,Flea Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant



 Cluster 8


Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Weinegg,7,Bus Station,Restaurant,Skate Park,Modern European Restaurant,Grocery Store,Yoga Studio,Event Space,Food,Flea Market,Fast Food Restaurant


## Retrieve Zurich rent data

Retrieve data from https://realadvisor.ch/en/property-prices/city-zurich

In [67]:
url = 'https://realadvisor.ch/en/property-prices/city-zurich'
# Resolve a site-relative link against the base url
urljoin(url, "/en/property-prices/neighbourhood-escher-wyss")

'https://realadvisor.ch/en/property-prices/neighbourhood-escher-wyss'

In [68]:
url = 'https://realadvisor.ch/en/property-prices/city-zurich'
html_content = requests.get(url).text
# parse the html content with an HTML parser
soup = BeautifulSoup(html_content, "html.parser")

I scrape the realadvisor website to get the median price per sqm per year for apartments in each of Zurich's neighbourhoods.
Directly scraping the table "Price per sqm for each neighbourhood in Zurich" did not work: I could only retrieve complete rows up to the neighbourhood City; after that only the neighbourhood names were returned, without the prices. Therefore I collect the link to each neighbourhood's page from the table, fetch each page, parse it with BeautifulSoup again and read the median price per sqm from there. More elegant ways surely exist, but since this is my first time scraping a website I chose what first came to mind.
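
The link-following approach described above can be sketched offline as follows. This is only an illustration: the HTML fragment, the Wipkingen link and the class name `LinkCollector` are made up, and the standard library's `html.parser` stands in for BeautifulSoup so that the snippet runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag that sits inside a <tbody>."""
    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "a" and self.in_tbody:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False

# made-up fragment that mimics a table of neighbourhood links
sample_html = """
<a href="/en/about">About</a>
<table><tbody>
  <tr><td><a href="/en/property-prices/neighbourhood-escher-wyss">Escher Wyss</a></td></tr>
  <tr><td><a href="/en/property-prices/neighbourhood-wipkingen">Wipkingen</a></td></tr>
</tbody></table>
"""

base_url = "https://realadvisor.ch/en/property-prices/city-zurich"
collector = LinkCollector()
collector.feed(sample_html)
# site-relative links are resolved against the page they were found on
neighbourhood_urls = [urljoin(base_url, href) for href in collector.hrefs]
print(neighbourhood_urls)
```

Only links inside the table body are kept, so navigation links elsewhere on the page are ignored.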

In [69]:
# Url of the page with the table
url = 'https://realadvisor.ch/en/property-prices/city-zurich'
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "html")
# the second div with this class is used as the price table
price_sqm_table = soup.find_all("div", attrs={"class": "css-5oyp9l"})[1]

# Follow every link in the table and read the price from the neighbourhood page
price_sqms = []
hoods = []
for link in price_sqm_table.tbody.find_all("a"):
    href = link.get("href")
    html_temp = requests.get(urljoin(url, href)).text
    soup_temp = BeautifulSoup(html_temp, "html")

    # the neighbourhood name is the part of the page title before the colon
    name = soup_temp.find_all("h1", attrs={"class": "css-3ek332"})[0].text
    hoods.append(name.split(':')[0])

    # split the text on the non-breaking space and keep the number after it
    price_sqm_tmp = soup_temp.find_all("div", attrs={"class": "css-qpxiip-MedianPrice"})[4]
    price_sqms.append(int(price_sqm_tmp.text.split('\xa0')[1]))
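
The price extraction in the cell above relies on the text containing a non-breaking space directly before a plain integer. A slightly more tolerant parser could look like the sketch below. The sample strings are assumptions about how such a site might format prices, not values taken from it, and `parse_price` is a hypothetical helper.

```python
import re

def parse_price(text):
    """Return the first number in text as an int, ignoring thousands separators."""
    # normalise non-breaking spaces and drop apostrophes/commas used as separators
    cleaned = (text.replace("\xa0", " ")
                   .replace("'", "")
                   .replace("\u2019", "")
                   .replace(",", ""))
    match = re.search(r"\d+", cleaned)
    if match is None:
        raise ValueError("no number found in {!r}".format(text))
    return int(match.group())

print(parse_price("CHF\xa0393"))          # 393
print(parse_price("CHF\xa01'250 / m²"))   # 1250
```

Raising an error for text without a number makes a changed page layout visible immediately instead of producing a wrong price.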

In [475]:
name.split(':')[0]

'Wipkingen'

In [70]:
price_sqm_dict = {
    "Neighbourhood": hoods,
    "Price": price_sqms
}

df_price_sqm = pd.DataFrame(price_sqm_dict)
df_price_sqm['Neighbourhood'] = df_price_sqm['Neighbourhood'].str.title()

In [71]:
bla = zurich_merged.join(df_price_sqm.set_index('Neighbourhood'), on='Neighbourhood')
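
Because the join above matches on the neighbourhood name, any spelling difference between the two sources leaves the price of that neighbourhood empty without raising an error. A quick way to spot such mismatches is sketched below with plain Python sets; the name lists are illustrative and `normalise` is a hypothetical helper, not part of the notebook.

```python
def normalise(name):
    """Make names comparable: trim, lower-case, treat hyphens like spaces."""
    return " ".join(name.strip().casefold().replace("-", " ").split())

# illustrative names only, not the real scraped data
names_clusters = ["Altstetten", "Höngg", "Escher Wyss", "Seebach"]
names_prices = ["altstetten", "Höngg ", "Escher-Wyss", "Wipkingen"]

# map the normalised spelling back to the spelling used in the price data
lookup = {normalise(n): n for n in names_prices}
matched = {n: lookup[normalise(n)] for n in names_clusters if normalise(n) in lookup}
unmatched = [n for n in names_clusters if normalise(n) not in lookup]

print(matched)
print(unmatched)
```

Names that end up in `unmatched` would need to be corrected by hand before joining.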

In [72]:
bla.head(2)

Unnamed: 0,District,Neighbourhood,Latitude,Longitude,n_venues,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Price
0,9,Altstetten,47.387403,8.486061,26,1,Supermarket,Discount Store,Plaza,Swiss Restaurant,Paper / Office Supplies Store,Hotel,Doner Restaurant,Sandwich Place,Chinese Restaurant,Fast Food Restaurant,393
1,10,Höngg,47.40166,8.497715,16,1,Grocery Store,Plaza,Discount Store,Fast Food Restaurant,Bus Station,Mexican Restaurant,Steakhouse,Supermarket,Gas Station,Beach,342


In [532]:
df_neighbourhoods.head(2)

Unnamed: 0,objid,qnr,qname,knr,kname,geometry
0,4,92,Altstetten,9,Kreis 9,"POLYGON ((2677911.5 1247524.4,2677919.2 124752..."
1,28,101,Höngg,10,Kreis 10,"POLYGON ((2680537.8 1249894.5,2680532.2 124989..."


In [73]:
df_neighbourhoods = pd.read_csv('./data/stzh.adm_statistische_quartiere_v.csv')
# keep only the neighbourhood name and its geometry
df_neighbourhoods = df_neighbourhoods.drop(columns=['objid', 'qnr', 'knr', 'kname'])
df_neighbourhoods = df_neighbourhoods.rename(columns={'qname': 'Neighbourhood'})
# attach the geometry of each neighbourhood to the merged table
bla2 = bla.join(df_neighbourhoods.set_index('Neighbourhood'), on='Neighbourhood')
bla2


Unnamed: 0,District,Neighbourhood,Latitude,Longitude,n_venues,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Price,geometry
0,9,Altstetten,47.387403,8.486061,26,1,Supermarket,Discount Store,Plaza,Swiss Restaurant,Paper / Office Supplies Store,Hotel,Doner Restaurant,Sandwich Place,Chinese Restaurant,Fast Food Restaurant,393,"POLYGON ((2677911.5 1247524.4,2677919.2 124752..."
1,10,Höngg,47.40166,8.497715,16,1,Grocery Store,Plaza,Discount Store,Fast Food Restaurant,Bus Station,Mexican Restaurant,Steakhouse,Supermarket,Gas Station,Beach,342,"POLYGON ((2680537.8 1249894.5,2680532.2 124989..."
2,11,Affoltern,47.418762,8.507186,15,5,Bus Station,Supermarket,Department Store,Athletics & Sports,Bike Trail,Light Rail Station,Gym / Fitness Center,Train Station,Grocery Store,Diner,320,"POLYGON ((2682159.2 1254109.4,2682159.2 125410..."
3,2,Wollishofen,47.342427,8.530708,11,5,Supermarket,Plaza,Cheese Shop,Bus Station,Fast Food Restaurant,Irish Pub,Swiss Restaurant,Restaurant,Salon / Barbershop,Music Venue,384,"POLYGON ((2682061.5 1245325.9,2682083.8 124531..."
4,3,Friesenberg,47.354922,8.500523,2,4,Stables,Tennis Court,Event Space,Food Truck,Food & Drink Shop,Food,Flea Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant,412,"POLYGON ((2678931.5 1245766.8,2678964.2 124573..."
5,7,Hottingen,47.36968,8.555082,22,1,Hotel,Swiss Restaurant,Italian Restaurant,Tram Station,Plaza,Grocery Store,Dessert Shop,Coffee Shop,Supermarket,Korean Restaurant,480,"POLYGON ((2684457 1246514.8,2684466.2 1246519,..."
6,7,Witikon,47.35831,8.590628,10,5,Bus Station,Department Store,Indian Restaurant,Supermarket,Discount Store,Tram Station,Church,Bakery,Grocery Store,Factory,315,"POLYGON ((2686429 1245344.5,2686434.8 1245337...."
7,11,Seebach,47.420438,8.548377,16,0,Bakery,Hookah Bar,Tram Station,Korean Restaurant,Laser Tag,Eastern European Restaurant,Pharmacy,Pizza Place,Pool,Supermarket,383,"POLYGON ((2682159.2 1254109.4,2682157.8 125410..."
8,9,Albisrieden,47.374857,8.484657,7,0,Bakery,Scenic Lookout,Pizza Place,Supermarket,Swiss Restaurant,Trattoria/Osteria,Grocery Store,Flea Market,Fast Food Restaurant,Farmers Market,386,"POLYGON ((2677911.5 1247524.4,2677919.2 124752..."
9,2,Leimbach,47.330511,8.512539,4,6,Grocery Store,Pharmacy,Gas Station,Trail,Yoga Studio,Food,Flea Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant,297,"POLYGON ((2681741 1242023.1,2681709.8 1242041...."


In [507]:
import numpy as np

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add a colored circle and a price label for every neighbourhood
for lat, lon, poi, cluster, price in zip(bla2['Latitude'], bla2['Longitude'],
                                         bla2['Neighbourhood'], bla2['Cluster Labels'],
                                         bla2['Price']):
    cluster = int(cluster)
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

    # show the price per sqm as text at the marker position
    icon = folium.DivIcon(html='<div style="font-size: 12px">{}</div>'.format(price))
    folium.Marker(
        location=[lat, lon],
        icon=icon
    ).add_to(map_clusters)

map_clusters

In [4]:
df_neighbourhoods = pd.read_csv('./data/stzh.adm_statistische_quartiere_v.csv')
df_neighbourhoods

Unnamed: 0,objid,qnr,qname,knr,kname,geometry
0,4,92,Altstetten,9,Kreis 9,"POLYGON ((2677911.5 1247524.4,2677919.2 124752..."
1,28,101,Höngg,10,Kreis 10,"POLYGON ((2680537.8 1249894.5,2680532.2 124989..."
2,26,111,Affoltern,11,Kreis 11,"POLYGON ((2682159.2 1254109.4,2682159.2 125410..."
3,16,21,Wollishofen,2,Kreis 2,"POLYGON ((2682061.5 1245325.9,2682083.8 124531..."
4,6,33,Friesenberg,3,Kreis 3,"POLYGON ((2678931.5 1245766.8,2678964.2 124573..."
5,22,72,Hottingen,7,Kreis 7,"POLYGON ((2684457 1246514.8,2684466.2 1246519,..."
6,10,74,Witikon,7,Kreis 7,"POLYGON ((2686429 1245344.5,2686434.8 1245337...."
7,7,119,Seebach,11,Kreis 11,"POLYGON ((2682159.2 1254109.4,2682157.8 125410..."
8,23,91,Albisrieden,9,Kreis 9,"POLYGON ((2677911.5 1247524.4,2677919.2 124752..."
9,15,23,Leimbach,2,Kreis 2,"POLYGON ((2681741 1242023.1,2681709.8 1242041...."


In [6]:

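import re

def pandas_to_geojson(df, fn_geojson):
    # Build a GeoJSON FeatureCollection from the WKT polygons in df['geometry'].
    # The coordinates stay in the Swiss LV95 system of the source file and
    # must be converted to WGS84 (lon, lat) before folium can draw them.
    geojson = {"type": "FeatureCollection", "features": []}

    for _, row in df.iterrows():
        # "POLYGON ((x y,x y,...),(...))" -> [[[x, y], ...], ...]
        wkt = row['geometry']
        body = wkt[wkt.index('((') + 2:wkt.rindex('))')]
        rings = [[[float(v) for v in point.split()] for point in ring.split(',')]
                 for ring in re.split(r'\)\s*,\s*\(', body)]
        feature = {"type": "Feature", "id": row['qname'],
                   "properties": {"qname": row['qname']},
                   "geometry": {"type": "Polygon", "coordinates": rings}}
        geojson['features'].append(feature)

    with open(fn_geojson, 'w') as fp:
        json.dump(geojson, fp)

    return geojson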

geojson = pandas_to_geojson(df_neighbourhoods, 'zurich.json')
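
The polygons in the city's csv file are given in Swiss LV95 coordinates (easting around 2.68 million, northing around 1.25 million), while folium expects longitude and latitude in degrees. The sketch below converts single points with the approximate formulas published by swisstopo. The coefficients are reproduced from memory and should be checked against the official document, and a library such as pyproj would be the more robust choice. `lv95_to_wgs84` and `ring_to_lonlat` are hypothetical helper names.

```python
def lv95_to_wgs84(east, north):
    """Approximate conversion from LV95 (EPSG:2056) to WGS84 (lon, lat) in degrees."""
    # auxiliary values: offsets from the projection origin in Bern, in units of 1000 km
    y = (east - 2600000.0) / 1000000.0
    x = (north - 1200000.0) / 1000000.0

    # longitude and latitude in units of 10000 arc seconds
    lon = (2.6779094 + 4.728982 * y + 0.791484 * y * x
           + 0.1306 * y * x ** 2 - 0.0436 * y ** 3)
    lat = (16.9023892 + 3.238272 * x - 0.270978 * y ** 2
           - 0.002528 * x ** 2 - 0.0447 * y ** 2 * x - 0.0140 * x ** 3)

    # convert to degrees
    return lon * 100.0 / 36.0, lat * 100.0 / 36.0

def ring_to_lonlat(ring):
    """Convert one polygon ring [[east, north], ...] to GeoJSON order [[lon, lat], ...]."""
    return [list(lv95_to_wgs84(e, n)) for e, n in ring]

# the projection origin and the first point of the Altstetten polygon shown above
ring = [[2600000.0, 1200000.0], [2677911.5, 1247524.4]]
print(ring_to_lonlat(ring))
```

The second point comes out near longitude 8.47 and latitude 47.37, which lies on the western edge of Zurich as expected for Altstetten.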

In [114]:

zurich_geo = 'stzh_adm_statistische_quartiere_v.json'

# Initialize the map, centred on Zurich:
m = folium.Map(location=[47.37, 8.54], zoom_start=10)

# Color each neighbourhood by the district it belongs to:
folium.Choropleth(
    geo_data=zurich_geo,
    name='choropleth',
    data=bla2,
    columns=['Neighbourhood', 'District'],
    key_on='feature.properties.qname',
    fill_color='Paired',
    fill_opacity=0.7,
    line_opacity=0.7,
    legend_name='District',
    bins=12
).add_to(m)
folium.LayerControl().add_to(m)

# Display the map
m
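
The `key_on` argument is a dotted path into each GeoJSON feature, and a neighbourhood is only colored if the value found there equals a value in the first data column. The sketch below resolves such a path by hand and lists the names that would not be matched. The tiny GeoJSON and the data dictionary are made-up stand-ins for the real files, and `resolve_key` is a hypothetical helper that mirrors my understanding of folium's behaviour rather than its actual code.

```python
def resolve_key(feature, key_on):
    """Follow a path such as 'feature.properties.qname' inside one feature."""
    parts = key_on.split(".")
    if parts[0] != "feature":
        raise ValueError("key_on must start with 'feature'")
    value = feature
    for part in parts[1:]:
        value = value[part]
    return value

# made-up stand-ins for the GeoJSON file and the data columns
geo = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"qname": "Altstetten"}, "geometry": None},
    {"type": "Feature", "properties": {"qname": "Höngg"}, "geometry": None},
]}
district_by_hood = {"Altstetten": 9, "Hoengg": 10}

geo_names = {resolve_key(f, "feature.properties.qname") for f in geo["features"]}
# names in the data without a polygon, and polygons without data
missing_in_geo = sorted(set(district_by_hood) - geo_names)
missing_in_data = sorted(geo_names - set(district_by_hood))
print(missing_in_geo, missing_in_data)
```

Running such a check before drawing the map helps explain neighbourhoods that stay uncolored.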




In [None]:


# write the neighbourhood centres as GeoJSON points
geojson = {"type": "FeatureCollection", "features": []}

for _, row in bla2.iterrows():
    feature = {"type": "Feature",
               "geometry": {"type": "Point", "coordinates": [row['Longitude'], row['Latitude']]},
               "properties": {"Neighbourhood": row['Neighbourhood']}}
    geojson['features'].append(feature)

with open('result.geojson', 'w') as fp:
    json.dump(geojson, fp)
    
def json_to_geojson(data, districts):
    # create a geojson from a list of dictionaries, each holding the
    # coordinates ("lon", "lat") of one point and the name of its polygon;
    # in our case a polygon is a district
    assert isinstance(data, list), "The parameter data should be a list of coordinates with a name argument!"

    geojson = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "geometry": {
                    "type": "Polygon",
                    "coordinates": [[[d["lon"], d["lat"]] for d in data if d['name'] == district]],
                },
                "properties": {'name': district},
            } for district in districts]
    }

    return geojson