# Demo: Scraping Apartments

The tool consists of two parts: a Python script that retrieves the data and an HTML file that displays it on a map.<br>
Run this notebook to scrape apartments. In the HTML file, replace :::API-KEY::: with your MapTiler API key (https://cloud.maptiler.com/maps/).
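
The placeholder can also be replaced programmatically instead of by hand. The following is a minimal sketch, not part of the scrapeApartments module: the helper `insert_api_key` and the file name `map.html` in the commented usage are hypothetical, since the notebook does not name the HTML file.

```python
from pathlib import Path

PLACEHOLDER = ":::API-KEY:::"  # placeholder string named in the text above


def insert_api_key(html_text: str, api_key: str) -> str:
    """Return html_text with every occurrence of the placeholder replaced."""
    if PLACEHOLDER not in html_text:
        raise ValueError("Placeholder not found; was the key already inserted?")
    return html_text.replace(PLACEHOLDER, api_key)


# Usage with a hypothetical file name (adjust to the actual HTML file):
# html_file = Path(PATH) / "map.html"
# html_file.write_text(
#     insert_api_key(html_file.read_text(encoding="utf-8"), "YOUR_KEY"),
#     encoding="utf-8",
# )
```

Raising an error when the placeholder is missing guards against silently writing an unchanged file.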



In [1]:
# Import packages / functions, set path

import sys, os
import numpy as np
import pandas as pd 
import time

import scrapeApartments as sap
from scrapeApartments import Scraper, Geocoding, CommutingTimes
import geopandas as gpd
from shapely.geometry import Point

PATH = '' # path to folder with the HTML file

## Part 1: Scrape websites 

In [2]:
t0 = time.time()
# Define the scraping parameters, then run the scraper; entries containing any of the filter keywords are omitted

class ScrapingParameters(Scraper):
    PAGE = 'homegate_immoscout'
    ROOMS_MIN = 1.5
    SIZE_MIN = 40
    PRICE_MAX = 2000
    RADIUS = 0
    LOCATION = "Zürich"
    FILTER_KEYWORDS = [' befristet', 'Befristet', 'WG', 'Mitbewohner', 'BEFRISTET', 'untermiete', 'Untermiete']
    MAX_WORKERS = 8
    
myprmtrs = ScrapingParameters()
results = myprmtrs.scrape()
t1 = time.time()

print('\n\n===> scraped a total of {} relevant entries.'.format(len(results)))   
print('\n')
print('Execution took {} seconds'.format(np.round(t1-t0,3)))


Homegate accessed, no. of pages: 6
Homegate scraped.
Immoscout accessed, no. of pages: 3
Immoscout scraped.


===> scraped a total of 109 relevant entries.


Execution took 2.412 seconds
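
How the `Scraper` class applies `FILTER_KEYWORDS` internally is not shown in this notebook. As an illustration only, a case-sensitive substring filter over the description column could look like the sketch below; the function `drop_by_keywords` and the sample rows are hypothetical.

```python
import pandas as pd

FILTER_KEYWORDS = [' befristet', 'Befristet', 'WG', 'Mitbewohner',
                   'BEFRISTET', 'untermiete', 'Untermiete']


def drop_by_keywords(df, keywords, column="description"):
    """Drop rows whose text column contains any keyword (case-sensitive substring match)."""
    text = df[column].fillna("")  # treat missing descriptions as empty strings
    mask = text.apply(lambda s: any(k in s for k in keywords))
    return df[~mask].reset_index(drop=True)


# Hypothetical sample data, for illustration only
listings = pd.DataFrame({
    "description": ["Helle Wohnung mit Balkon", "Zimmer in WG",
                    "Wohnung befristet bis Juni", None],
    "rent": [1825.0, 900.0, 1500.0, 1700.0],
})
kept = drop_by_keywords(listings, FILTER_KEYWORDS)
print(kept)
```

A substring filter is only as good as its keyword list: a description such as "Zeitlich beschränkter Mietvertrag bis 30.09.2022" contains none of the keywords and would pass.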


In [3]:
# Inspect results
results[0:10]

Unnamed: 0,url,address,nRooms,size,rent,currency,description,published,source
0,https://www.homegate.ch/mieten/3001686204,"Lerchenrain 1, 8046 Zürich",2.5,60.0,1602.0,CHF,Ruhige 2.5 Zi Wohnung Stadt Zürich,,homegate
1,https://www.homegate.ch/mieten/3001683394,"Schauenbergstrasse 25, 8046 Zürich",3.0,75.0,1850.0,CHF,Sehr zentral gelegene 3 Zimmerwohnung,,homegate
2,https://www.homegate.ch/mieten/3001568545,"Fellenbergstrasse 319, 8047 Zürich",3.0,80.0,1900.0,CHF,3½ Zimmer-Wohnung in Zürich,,homegate
3,https://www.homegate.ch/mieten/3001681318,"nahe Giesshübel, 8045 Zürich",2.5,60.0,1830.0,CHF,schöne Wohnung mit Balkon gerne an Einzelperson,,homegate
4,https://www.homegate.ch/mieten/3001685191,"Brandschenkestrasse 166, 8002 Zürich",2.0,50.0,1640.0,CHF,2-Zimmerwohnung an zentraler Lage im Kreis 2,,homegate
5,https://www.homegate.ch/mieten/3001673767,"Nietengasse 11, 8004 Zürich",3.0,78.0,1780.0,CHF,Zeitlich beschränkter Mietvertrag bis 30.09.2022,,homegate
6,https://www.homegate.ch/mieten/3001560704,"Josefstrasse 200, 8005 Zürich",2.0,45.0,1890.0,CHF,2-Zimmerwohnung bei den Viaduktbögen,,homegate
7,https://www.homegate.ch/mieten/3001656016,"Asylstr. 11, 8032 Zürich",1.5,40.0,1975.0,CHF,"Designer-Studio AEGR im Patrizierhaus, 8032 Zü...",,homegate
8,https://www.homegate.ch/mieten/3001693872,"Farenweg 6, 8038 Zürich",2.0,56.0,1825.0,CHF,Helle Wohnung mit Gartensitzplatz im Grünen !,,homegate
9,https://www.homegate.ch/mieten/3001679290,"Nidelbadstrasse, 8038 Zürich",2.0,54.0,1950.0,CHF,Wohnung in Zürich,,homegate


## Part 2: Geocode scraped data
Strictly speaking, the geocoding module is decoupled from the scraping module; all it requires is a list of addresses.

This pays off when geocoding a large number of addresses against a local Nominatim instance.

For this demonstration, however, the geocoder queries a web-hosted Nominatim server. Such a server does not take kindly to multi-threaded queries with many addresses, so a few lines of code are needed to select the relevant data manually: only addresses that were not georeferenced previously (e.g. from Homegate) are processed, and the results are merged back afterwards.
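
The select-then-merge step can be sketched with plain pandas. This is an illustration under assumptions, not the module's code: `fake_geocode` is a stand-in for a real geocoder, and the sample frame pretends that the first address already carried coordinates.

```python
import pandas as pd

# Hypothetical input: one address already has coordinates, two do not
listings = pd.DataFrame({
    "address": ["Lerchenrain 1, 8046 Zürich", "Asylstr. 11, 8032 Zürich",
                "Farenweg 6, 8038 Zürich"],
    "lat": [47.413786, None, None],
    "lon": [8.508526, None, None],
})


def fake_geocode(addresses):
    """Stand-in for a real geocoder: looks coordinates up in a fixed table."""
    known = {
        "Asylstr. 11, 8032 Zürich": (47.369536, 8.557383),
        "Farenweg 6, 8038 Zürich": (47.340409, 8.526157),
    }
    rows = [(a, *known.get(a, (None, None))) for a in addresses]
    return pd.DataFrame(rows, columns=["address", "lat", "lon"])


# Select only the addresses that still lack coordinates
missing = listings["lat"].isna() | listings["lon"].isna()
new_coords = fake_geocode(listings.loc[missing, "address"])

# Merge back: keep existing coordinates, fill the gaps from the new lookup
merged = listings.merge(new_coords, on="address", how="left", suffixes=("", "_new"))
merged["lat"] = merged["lat"].fillna(merged["lat_new"])
merged["lon"] = merged["lon"].fillna(merged["lon_new"])
merged = merged.drop(columns=["lat_new", "lon_new"])
print(merged)
```

Sending only the missing addresses keeps the number of requests to the web server small. Note that duplicate addresses in the lookup result would duplicate rows in a left merge, so deduplicating first may be advisable.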

If an address is still not geocoded after the next step, inspect and correct the address string, then geocode it again (e.g. with the geocode() function).
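
Correcting address strings can be partly automated. The sketch below shows two hypothetical cleanup rules, derived from addresses that appear in this notebook ("nahe Giesshübel", "Asylstr."); it is not the cleanup logic of the Geocoding class, whose internals are not shown here.

```python
import re

# Hypothetical cleanup rules; extend as needed for other problem cases
_REPLACEMENTS = [
    (re.compile(r"^\s*nahe\s+", flags=re.IGNORECASE), ""),  # drop a leading "nahe" ("near")
    (re.compile(r"str\.(?=\s|,|$)"), "strasse"),             # expand the abbreviation "str."
]


def clean_address(address: str) -> str:
    """Apply the cleanup rules and collapse repeated whitespace."""
    for pattern, repl in _REPLACEMENTS:
        address = pattern.sub(repl, address)
    return re.sub(r"\s+", " ", address).strip()


print(clean_address("nahe Giesshübel, 8045 Zürich"))
print(clean_address("Asylstr. 11, 8032 Zürich"))
```

The cleaned strings can then be passed to the geocoder again. Addresses that still fail after cleanup need manual inspection.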

In [5]:
t0 = time.time()

# Initiate geocoding
mygeocoder = Geocoding()
mygeocoder.NOMINATIM = 'web'
mygeocoder.ADDRESSES = results.address

df_geocoded = mygeocoder.geocode()

#df_geocoded.drop(['address_located'], axis=1, inplace=True)
df_geocoded = df_geocoded.reset_index(drop=True)  # renumber rows from 0 and discard the old index

t1 = time.time()
print('\n')
print('Execution took {} seconds'.format(np.round(t1-t0,3)))

# Note: with a local Nominatim instance, the same step took only 0.965 seconds.



Execution took 54.603 seconds


In [6]:
# Inspect results and note the differences between the two address columns.
# GIGO (garbage in, garbage out): geocoding is only as good as the input addresses.

df_geocoded[0:10]

Unnamed: 0,address,address_located,lat,lon
0,"Lerchenrain 1, 8046 Zürich","Lerchenrain 1, 8046 Zürich",47.413786,8.508526
1,"Schauenbergstrasse 25, 8046 Zürich","Schauenbergstrasse 25, 8046 Zürich",47.417966,8.506605
2,"Fellenbergstrasse 319, 8047 Zürich","Fellenbergstrasse 319, 8047 Zürich",47.375135,8.487825
3,"nahe Giesshübel, 8045 Zürich","Giesshübel , 8045 Zürich",47.358162,8.517569
4,"Brandschenkestrasse 166, 8002 Zürich","Brandschenkestrasse 166, 8002 Zürich",47.363481,8.525825
5,"Nietengasse 11, 8004 Zürich","Nietengasse 11, 8004 Zürich",47.379616,8.525331
6,"Josefstrasse 200, 8005 Zürich","Josefstrasse 200, 8005 Zürich",47.386751,8.524199
7,"Asylstr. 11, 8032 Zürich","Asylstrasse 11, 8032 Zürich",47.369536,8.557383
8,"Farenweg 6, 8038 Zürich","Farenweg 6, 8038 Zürich",47.340409,8.526157
9,"Nidelbadstrasse, 8038 Zürich","Nidelbadstrasse , 8038 Zürich",47.335558,8.532923


In [8]:
t0 = time.time()

class Commute(CommutingTimes):
    DATAFRAME = df_geocoded
    DESTINATION = ["Hauptbahnhof, 8001 Zürich", (47.3808203,8.5256856)]  # each destination as an address string or a (lat, lon) tuple

cmt = Commute()
df_commutingTimes = cmt.getCommutingTimes() 
df_fin = results.merge(df_commutingTimes)

outpathData =  '/home/user/Proj/WebScraper/Webscraper_Wohnungen_v2/data.geojson'
sap.df2GeoJSON(df_fin,outpathData, varname='dataset', avgMinutesCol='mins_sbb_1')

outpathDestination =  '/home/user/Proj/WebScraper/Webscraper_Wohnungen_v2/destination.geojson'
sap.df2GeoJSON(cmt.DESTINATION_DF, outpathDestination, varname='destination', avgMinutesCol=None)

t1 = time.time()
print('\n')
print('Execution took {} seconds'.format(np.round(t1-t0,3)))



Execution took 10.109 seconds
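
The implementation of `sap.df2GeoJSON` is not shown in this notebook. To illustrate the general idea only, the sketch below turns rows with `lat`/`lon` values into a GeoJSON FeatureCollection using the standard library; the function `rows_to_geojson` is hypothetical and ignores the `varname` and `avgMinutesCol` arguments used above.

```python
import json


def rows_to_geojson(rows, lon_key="lon", lat_key="lat"):
    """Build a GeoJSON FeatureCollection of points from a list of dict rows."""
    features = []
    for row in rows:
        # All non-coordinate fields become feature properties
        props = {k: v for k, v in row.items() if k not in (lon_key, lat_key)}
        features.append({
            "type": "Feature",
            # GeoJSON stores coordinates in [longitude, latitude] order
            "geometry": {"type": "Point", "coordinates": [row[lon_key], row[lat_key]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


rows = [{"address": "Josefstrasse 200, 8005 Zürich",
         "lat": 47.386751, "lon": 8.524199, "mins_sbb_1": 16}]
geojson_text = json.dumps(rows_to_geojson(rows), ensure_ascii=False)
print(geojson_text)
```

With a DataFrame, the rows could be obtained via `df.to_dict("records")`. The longitude-first order matches the `POINT (lon lat)` values in the geometry column.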


In [9]:
df_fin[0:5]

Unnamed: 0,url,address,nRooms,size,rent,currency,description,published,source,address_located,lat,lon,mins_sbb_1,mins_sbb_2,geometry,avgMinutes
0,https://www.homegate.ch/mieten/3001686204,"Lerchenrain 1, 8046 Zürich",2.5,60.0,1602.0,CHF,Ruhige 2.5 Zi Wohnung Stadt Zürich,,homegate,"Lerchenrain 1, 8046 Zürich",47.413786,8.508526,33,31,POINT (8.50853 47.41379),33
1,https://www.homegate.ch/mieten/3001683394,"Schauenbergstrasse 25, 8046 Zürich",3.0,75.0,1850.0,CHF,Sehr zentral gelegene 3 Zimmerwohnung,,homegate,"Schauenbergstrasse 25, 8046 Zürich",47.417966,8.506605,29,28,POINT (8.50661 47.41797),29
2,https://www.homegate.ch/mieten/3001568545,"Fellenbergstrasse 319, 8047 Zürich",3.0,80.0,1900.0,CHF,3½ Zimmer-Wohnung in Zürich,,homegate,"Fellenbergstrasse 319, 8047 Zürich",47.375135,8.487825,26,25,POINT (8.48783 47.37514),26
3,https://www.homegate.ch/mieten/3001681318,"nahe Giesshübel, 8045 Zürich",2.5,60.0,1830.0,CHF,schöne Wohnung mit Balkon gerne an Einzelperson,,homegate,"Giesshübel , 8045 Zürich",47.358162,8.517569,21,26,POINT (8.51757 47.35816),21
4,https://www.homegate.ch/mieten/3001685191,"Brandschenkestrasse 166, 8002 Zürich",2.0,50.0,1640.0,CHF,2-Zimmerwohnung an zentraler Lage im Kreis 2,,homegate,"Brandschenkestrasse 166, 8002 Zürich",47.363481,8.525825,19,24,POINT (8.52582 47.36348),19


In [11]:
# Filter by commuting time, e.g. under 20 min to the train station and under 30 min to the second destination.

df_filtered = df_fin.loc[(df_fin['mins_sbb_1'] < 20) & (df_fin['mins_sbb_2'] < 30)]

print("{} entries before filtering, {} entries after.".format(len(df_fin), len(df_filtered)))

outpathData =  '/home/user/Proj/WebScraper/Webscraper_Wohnungen_v2/data.geojson'
sap.df2GeoJSON(df_filtered,outpathData, varname='dataset', avgMinutesCol='mins_sbb_1')

109 entries before filtering, 15 entries after.


In [12]:
# Save the centroid (mean lon/lat of the filtered listings) for the map display
lon = df_filtered.lon.tolist()
lat = df_filtered.lat.tolist()
centroid = [sum(lon) / len(lon), sum(lat) / len(lat)]  # [lon, lat]

# The JavaScript variable is written in [lat, lon] order
centroidString = "var centerP = ["+str(centroid[1])+", "+str(centroid[0])+"]"
with open('/home/user/Proj/WebScraper/Webscraper_Wohnungen_v2/centerpoint.txt', 'w') as file:
    file.write(centroidString)

In [13]:
df_filtered

Unnamed: 0,url,address,nRooms,size,rent,currency,description,published,source,address_located,lat,lon,mins_sbb_1,mins_sbb_2,geometry,avgMinutes
4,https://www.homegate.ch/mieten/3001685191,"Brandschenkestrasse 166, 8002 Zürich",2.0,50.0,1640.0,CHF,2-Zimmerwohnung an zentraler Lage im Kreis 2,,homegate,"Brandschenkestrasse 166, 8002 Zürich",47.363481,8.525825,19,24,POINT (8.52582 47.36348),19
5,https://www.homegate.ch/mieten/3001673767,"Nietengasse 11, 8004 Zürich",3.0,78.0,1780.0,CHF,Zeitlich beschränkter Mietvertrag bis 30.09.2022,,homegate,"Nietengasse 11, 8004 Zürich",47.379616,8.525331,17,3,POINT (8.52533 47.37962),17
6,https://www.homegate.ch/mieten/3001560704,"Josefstrasse 200, 8005 Zürich",2.0,45.0,1890.0,CHF,2-Zimmerwohnung bei den Viaduktbögen,,homegate,"Josefstrasse 200, 8005 Zürich",47.386751,8.524199,16,17,POINT (8.52420 47.38675),16
7,https://www.homegate.ch/mieten/3001656016,"Asylstr. 11, 8032 Zürich",1.5,40.0,1975.0,CHF,"Designer-Studio AEGR im Patrizierhaus, 8032 Zü...",,homegate,"Asylstrasse 11, 8032 Zürich",47.369536,8.557383,16,24,POINT (8.55738 47.36954),16
16,https://www.homegate.ch/mieten/3001465286,"Baslerstrasse 71, 8048 Zürich",1.5,42.0,1740.0,CHF,Ihr persönlicher Wohntraum vom Leben in der St...,,homegate,"Baslerstrasse 71, 8048 Zürich",47.389349,8.488015,19,21,POINT (8.48802 47.38935),19
44,https://www.homegate.ch/mieten/3001517199,"Bucheggstrasse 157, 8057 Zürich",2.5,61.0,1950.0,CHF,Erstvermietung nach Totalsanierung,,homegate,"Bucheggstrasse 157, 8057 Zürich",47.39903,8.540753,19,25,POINT (8.54075 47.39903),19
45,https://www.homegate.ch/mieten/3001641976,"Lintheschergasse 23, 8001 Zürich",4.5,200.0,,CHF,Le Bijou Luxury Apartments - an investment in ...,,homegate,"Lintheschergasse 23, 8001 Zürich",47.376826,8.53884,4,14,POINT (8.53884 47.37683),4
47,https://www.homegate.ch/mieten/3001682353,"Schöntalstrasse, 8004 Zürich",2.5,50.0,1900.0,CHF,"2.5 Zimmer Wohnung, Zwischenmiete im Kreis 4",,homegate,"Schöntalstrasse , 8004 Zürich",47.371005,8.526145,17,15,POINT (8.52614 47.37100),17
54,https://www.immoscout24.ch/de/d/wohnung-mieten...,"Anwandstr. 74, 8004 Zürich",3.0,67.0,1913.0,CHF,****Grosszügige 2.5 Zimmer-Wohnung im TREND QU...,2022-02-22T18:09:37+01:00,immoscout,"Anwandstrasse 74, 8004 Zürich",47.377705,8.520439,19,8,POINT (8.52044 47.37771),19
55,https://www.immoscout24.ch/de/d/wohnung-mieten...,"Wuhrstrasse 18, 8003 Zürich",2.0,45.0,1950.0,CHF,2-Zimmer-Wohnung in der Nähe des Manesseplatzes,2022-01-28T09:08:48+01:00,immoscout,"Wuhrstrasse 18, 8003 Zürich",47.36714,8.521584,18,22,POINT (8.52158 47.36714),18
