ImmoScoutScraper
Go to immobilienscout24.de, run your search with all search criteria set, then copy the URL and paste it into the input cell below.
Specify SaveFolder to have the result .csv saved there.

Further steps:
The Expose-ID will be read and the Immobilien-Scout Expose-API will be called.

In [23]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import math
from os import path

LinkList=[]
PageNo, ResultPageNo = 0, 1

In [24]:
#Inputs
#Paste Search URL (with filters encoded into it) here
URLwCriteria = "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen?centerofsearchaddress=Landshut%20(Kreis);;;1276002051;Bayern;&numberofrooms=1.0-&price=-950000.0&livingspace=40.0-&geocoordinates=48.54481;12.19322;10.0&enteredFrom=one_step_search"

#specify folder to save results in (relative to JupyterProjects folder)
SaveFolder = "DownloadedFiles/" #leave empty to store results in the same folder as the script, otherwise a path relative to the script
DownloadFileName = "ExposeURLs_WhngKaufen_LA" #date and .csv will be added automatically. recommendation: specify object type and location
DownloadMasterName = "Master_WhngKaufen_LA"

In [25]:
#preprocess input
##remove "enteredFrom=XYZ" from the URL and append "pagenumber=" (assumption: user does not go to other search pages before copying URL)
if "&enteredFrom=" in URLwCriteria:
    URLwCriteria = URLwCriteria[:URLwCriteria.find("&enteredFrom=",33)] #33 as start position for search (after "www. ... .de/" to speed up)
URLwCriteria += "&pagenumber="

DateToday = str(pd.Timestamp.now().date()) #pd.datetime is deprecated and no longer available in recent pandas versions
DateTimeToday = pd.to_datetime("today")

##construct paths from filenames and folder
DownloadFilePath = SaveFolder+DateToday+"_"+DownloadFileName
DownloadFilePath_newExp = SaveFolder+DateToday+"_"+DownloadFileName+"_newExp"
DownloadMasterPath = SaveFolder+DownloadMasterName
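
The URL preprocessing above relies on "enteredFrom" being the last parameter. A minimal sketch of a position-independent variant, assuming the standard-library urllib.parse module is acceptable here; the helper name with_page_number is hypothetical and not part of the notebook:

```python
from urllib.parse import urlsplit, urlunsplit

def with_page_number(search_url, page_no):
    """Return search_url without 'enteredFrom' and with 'pagenumber' set to page_no."""
    parts = urlsplit(search_url)
    # Split the raw query on "&" so the original percent-encoding stays untouched.
    kept = [p for p in parts.query.split("&")
            if p and not p.startswith(("enteredFrom=", "pagenumber="))]
    kept.append(f"pagenumber={page_no}")
    return urlunsplit(parts._replace(query="&".join(kept)))

example = with_page_number(
    "https://www.immobilienscout24.de/Suche/radius/wohnung-kaufen"
    "?price=-950000.0&enteredFrom=one_step_search",
    2,
)
```

Because the query string is filtered rather than truncated, parameters that follow "enteredFrom" would be kept, and calling the helper repeatedly does not stack several "pagenumber" entries.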

In [26]:
#Loop over result pages, starting with 1 up to the number of results divided by 20 (the number of results displayed per result page)

while PageNo < ResultPageNo:
    PageNo += 1

    URLwPageNo = URLwCriteria + str(PageNo)
    page = requests.get(URLwPageNo, timeout=3)
    soup = BeautifulSoup(page.content, 'html.parser')

    #find number of search results to calculate number of pages to check, if on first page
    if PageNo == 1:
        titletext = soup.find("div", class_="palm-hide margin-bottom-m")
        #print(titletext)
        resultnumberhtml = titletext.find("span")
        resultnumber=resultnumberhtml.text
        #print(resultnumber)
        #calculate number of pages to check
        ResultPageNo = math.ceil(int(resultnumber)/20)
        #print(ResultPageNo)

    #Extract Links from html and then extract only URL itself after "href"
    HtmlLinks = soup.find_all('a')
    #print(HtmlLinks)
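    for HtmlLink in HtmlLinks:
        LinkText = HtmlLink.get("href")
        #print(LinkText)
        if LinkText: #skip <a> tags without href (get returns None)
            LinkList.append(LinkText[:17]) #keep first 17 characters, i.e. "/expose/" + 9-digit ID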
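
The page count is derived from the result number shown on the first page. A minimal sketch of that calculation as a standalone function, under the assumption (not verified against the site) that larger counts may be displayed with thousands separators such as "1.234"; the name count_result_pages is hypothetical:

```python
import math
import re

RESULTS_PER_PAGE = 20  # page size stated in the notebook

def count_result_pages(result_text, per_page=RESULTS_PER_PAGE):
    """Turn the displayed result count (e.g. "45" or "1.234") into a page count."""
    digits = re.sub(r"\D", "", result_text)  # drop separators and whitespace
    if not digits:
        return 0  # no number found, so there is nothing to loop over
    return math.ceil(int(digits) / per_page)

pages_small = count_result_pages("45")
pages_large = count_result_pages("1.234")
```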

In [27]:
#print(LinkList)
#clean up
del soup

In [28]:
#Create list of URLs of (only) Exposes, extracted from list of all links after looping over result pages
ExposeURLList = list(set(["https://www.immobilienscout24.de"+ExposePath for ExposePath in LinkList if "/expose/" in ExposePath])) #use set to remove duplicates. order will be lost
#print(ExposeURLList)
#print(len(ExposeURLList))

#Create list of Expose IDs from ExposeURLList
ExposeIDList = [ExposeURL[40:49] for ExposeURL in ExposeURLList]
#print(ExposeIDList)
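
The fixed slices used above ([:17] and [40:49]) assume 9-digit Expose IDs. A hedged alternative sketch that reads the ID with a regular expression, so other ID lengths would also work, and that keeps the original order while removing duplicates; extract_expose_ids and EXPOSE_PATTERN are hypothetical names:

```python
import re

EXPOSE_PATTERN = re.compile(r"/expose/(\d+)")

def extract_expose_ids(hrefs):
    """Return unique Expose IDs (as strings) in order of first appearance."""
    ids = []
    seen = set()
    for href in hrefs:
        if not href:  # <a> tags without href yield None
            continue
        match = EXPOSE_PATTERN.search(href)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids

sample_ids = extract_expose_ids([
    "/expose/115484389",
    None,
    "/expose/115484389#/",
    "/Suche/radius/wohnung-kaufen",
    "https://www.immobilienscout24.de/expose/12345?x=1",
])
```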

In [29]:
#Create Pandas DataFrame with ExposeID, ExposeURL and Download date
ExposeDF = pd.DataFrame({"ExposeID":ExposeIDList, "ExposeURL":ExposeURLList})
ExposeDF["DownloadDate"] = DateTimeToday
ExposeDF["SearchURL"] = URLwCriteria
#keep ExposeID as a regular column: later cells access ExposeDF.ExposeID
ExposeDF.head()

,ExposeID,ExposeURL,DownloadDate,SearchURL
0,115484389,https://www.immobilienscout24.de/expose/115484389,2020-02-13 00:29:41.714637,https://www.immobilienscout24.de/Suche/radius/...
1,115595343,https://www.immobilienscout24.de/expose/115595343,2020-02-13 00:29:41.714637,https://www.immobilienscout24.de/Suche/radius/...
2,114892178,https://www.immobilienscout24.de/expose/114892178,2020-02-13 00:29:41.714637,https://www.immobilienscout24.de/Suche/radius/...
3,113536800,https://www.immobilienscout24.de/expose/113536800,2020-02-13 00:29:41.714637,https://www.immobilienscout24.de/Suche/radius/...
4,115633718,https://www.immobilienscout24.de/expose/115633718,2020-02-13 00:29:41.714637,https://www.immobilienscout24.de/Suche/radius/...


In [30]:
#save DataFrame as .csv with current date to specified folder
ExposeDF.to_csv(DownloadFilePath+".csv", index=False, header = True)

In [31]:
#append downloads to master file which collects all the downloaded URLs
##check if master file exists. if yes, do not add header to file, if no, add header
if path.exists(DownloadMasterPath+".csv"):
    ExposeDF.to_csv(DownloadMasterPath+".csv", mode = "a", header = False)
else:
    ExposeDF.to_csv(DownloadMasterPath+".csv", mode = "a", header = True)

In [32]:
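# open master file, write new entries only to "New"-file (i.e. if same Expose-ID was not in MasterDF before today's download)
MasterDF = pd.read_csv(DownloadMasterPath+".csv", index_col=0, dtype={"ExposeID": str}) #read IDs as text so they compare with ExposeDF
##today's rows were already appended to the master file above, so compare only against earlier downloads
MasterDates = pd.to_datetime(MasterDF.DownloadDate, errors="coerce")
EarlierIDs = MasterDF.loc[MasterDates < DateTimeToday, "ExposeID"]
NewDF = ExposeDF[~ExposeDF.ExposeID.isin(EarlierIDs)]
NewDF.to_csv(DownloadFilePath_newExp+".csv", index=False, header = True)
NewDF.to_excel(DownloadFilePath_newExp+".xlsx", index=False, header = True)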
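
One pitfall when comparing IDs against a file read back from .csv: pandas parses a column of digits as integers by default, while the IDs sliced from URLs are strings, and isin does not match "115484389" against 115484389. A minimal sketch that compares both sides as strings, using small made-up frames in place of the real files; select_new_exposes is a hypothetical helper:

```python
import io
import pandas as pd

def select_new_exposes(today_df, master_df):
    """Return rows of today_df whose ExposeID does not occur in master_df."""
    known_ids = set(master_df["ExposeID"].astype(str))
    is_new = ~today_df["ExposeID"].astype(str).isin(known_ids)
    return today_df[is_new]

# Stand-in for a master file read from disk: ExposeID is parsed as int64 here.
master = pd.read_csv(io.StringIO("ExposeID,ExposeURL\n115484389,u1\n115595343,u2\n"))
# Stand-in for today's scrape: ExposeID values are strings.
today = pd.DataFrame({"ExposeID": ["115484389", "114892178"], "ExposeURL": ["u1", "u3"]})

new_rows = select_new_exposes(today, master)
```

Casting on both sides keeps the helper independent of how the master file was loaded.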

In [11]:
#open the appropriate file (depending on use case) and either a) scrape using ExposeURLs or b) use Expose-API to add information from Exposes
#here try a)
##depending on which to check either read file to Dataframe or use one of the above DataFrames
###assumption for this file: fill in information to the ExpoIDs found today in Masterfile
ExposeResultDF = MasterDF

#loop over the ExposeIDs that were downloaded today
IsFromToday = pd.to_datetime(ExposeResultDF.DownloadDate, errors="coerce") == DateTimeToday
for ExpoID in ExposeResultDF.ExposeID[IsFromToday]:
    ExposeURL = "https://www.immobilienscout24.de/expose/"+str(ExpoID) #str() in case the IDs were read from .csv as integers
    ExposePage = requests.get(ExposeURL, timeout=3)
    ExposeSoup = BeautifulSoup(ExposePage.content, 'html.parser')
    
    #extract Info from Expose and add to DF

In [12]:
#save IDs + information in final file

Instructions for opening the result file (.csv) in Excel or similar (note: a direct Excel export is also available):
1) open the .csv file in Excel
2) select the first column ("A")
3) go to "Data" -> "Text to columns"
4) choose "Delimited", then "Comma" as separator. You can choose formats for certain columns, but this is not necessary
5) save a copy as an Excel file. Preferably do not save changes to the original file if you still want to process it