# Autoscout24.ch Price Monitor

The idea is to get an automated (eventually daily) price overview for a specific type of car.
As Autoscout24.ch does not offer an API, we use web scraping to get the data we are interested in.
The weak point of this approach is that the HTML structure of the site may change, which would require changes to the Jupyter notebook as well. But this lies in the nature of web scraping.

(c) Manuel Kohler, Basel, Switzerland

In [15]:
import sys
print(sys.version)

3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24) 
[Clang 6.0 (clang-600.0.57)]


In [85]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

baseurl = 'https://www.autoscout24.ch'
bmwi3_baseurl = baseurl + '/de/autos/bmw--i3'
# default is 20 cars per page; range(1, maxpages) below loads pages 1 to maxpages-1, i.e. at most 20*4 cars
maxpages = 5

link_list = [] 

for i in range(1, maxpages):
    # fuel:16 = electric only, no REX (range extender)
    # make:9 = BMW
    # model:1949 = i3
    
    payload = {'fuel':16, 'make': '9', 'model': '1949', 'page': i, 'st': 1, 'vehtyp': 10, 'sort': 'price_asc'}
    print('Loading page: ', i)    
    r = requests.get(bmwi3_baseurl, params=payload)
    soup = BeautifulSoup(r.text, 'html.parser')
#     print(soup.prettify(formatter=None))
    car_links = soup.find_all("a", class_="primary-link")

    # if a page returns no cars we can exit the for loop:
    # maxpages is then bigger than the real number of result pages
    if (len(car_links) == 0):
        print(f"No more cars found after {i-1} iterations")
        break
    # keep every link found on this page
    link_list.extend(car_links)
        
print('Found ' + str(len(link_list)) + ' cars')

Loading page:  1
Loading page:  2
Loading page:  3
Loading page:  4
Found 74 cars
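
The paging loop above can be separated from the HTTP request, which makes the stop condition easy to check without hitting the site. The following is only a minimal sketch: `collect_links` and `fetch_page` are hypothetical names that are not part of the notebook, and the fake site stands in for the real requests. Unlike `range(1, maxpages)` in the cell above, this version treats `max_pages` as inclusive.

```python
from typing import Callable, List


def collect_links(fetch_page: Callable[[int], List[str]], max_pages: int) -> List[str]:
    """Collect links from pages 1..max_pages (inclusive), stopping at the first empty page."""
    links: List[str] = []
    for page in range(1, max_pages + 1):
        page_links = fetch_page(page)
        if not page_links:
            # an empty page means max_pages is larger than the real number of pages
            break
        links.extend(page_links)
    return links


# Stand-in for the real request (assumption): three pages with results, then nothing.
fake_site = {1: ["/car/a", "/car/b"], 2: ["/car/c"], 3: ["/car/d"]}
links = collect_links(lambda page: fake_site.get(page, []), max_pages=5)
print(f"Found {len(links)} cars")
```

In the notebook, `fetch_page` would wrap the `requests.get` call and the `soup.find_all("a", class_="primary-link")` lookup.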


In [86]:
data = {}

for carIdx, carVal in enumerate(link_list):
    car = {}
    value_list = []
    property_list = []
        
    if (carIdx % 10 == 0):
        print(f"Downloaded {carIdx} cars")
    individual_car_request = requests.get(baseurl + carVal.get('href'))
    soup = BeautifulSoup(individual_car_request.text, 'html.parser')
    
    # extract the individual vehicle id given by Autoscout; we assume that those are unique
    # NOTE: this takes the 8th '&'-separated segment of the href and breaks if the link format changes
    vehid = carVal.get('href').split("&")[7].split("=")[1]
    car['vehid'] = vehid
    car_textlist_item = soup.find_all("li", class_="textlist-item")

    prop = soup.find_all("div", class_="prop")
    value = soup.find_all("div", class_="value")
      
    # I assume that a property always has a matching value!
    for idx, val in enumerate(prop):
#         print(val.get_text().strip())
        car[val.get_text().strip()] = value[idx].get_text().strip()
        property_list.append(val.get_text().strip())
        value_list.append(value[idx].get_text().strip())
        data[vehid] = value_list[0:16]

# carIdx is zero-based, so the number of cars is carIdx + 1
print(f"Downloaded {carIdx + 1} cars in total.")
#     print(len(value_list))
# print(data)
#     print(car['Preis'])    

Downloaded 0 cars
Downloaded 10 cars
Downloaded 20 cars
Downloaded 30 cars
Downloaded 40 cars
Downloaded 50 cars
Downloaded 60 cars
Downloaded 70 cars
Downloaded 74 cars in total.
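
Picking the vehicle id by position (`split("&")[7]`) depends on the exact order of the query parameters in the link. A more robust variant could look the parameter up by name with the standard library. This is only a sketch under assumptions: the example href and the parameter name `vehid` are made up for illustration, and the real links may use a different name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_query_param(href: str, name: str) -> Optional[str]:
    """Return the first value of query parameter `name`, or None if it is absent."""
    query = parse_qs(urlsplit(href).query)
    values = query.get(name)
    return values[0] if values else None


# Hypothetical link; the real href format on the site is not shown in this notebook.
example_href = "/de/d/bmw-i3?page=1&sort=price_asc&vehid=6137957"
print(extract_query_param(example_href, "vehid"))
```

Returning `None` for a missing parameter lets the caller skip or log a car instead of failing with an `IndexError`.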


In [83]:
# print(property_list[0:16])
car_df = pd.DataFrame.from_dict(data, orient='index', columns=property_list[0:16])
car_df

Unnamed: 0,Inverkehrsetzung,Fahrzeugart,Aussenfarbe,Kilometer,Getriebeart,Antriebsart,Treibstoff,Türen,Sitze,Innenfarbe,PS,Leergewicht,Stromverbrauch (kWh / 100 km),Benzinäquivalent in l/100km,CO2-Emissionen aus der Treibstoff- und/oder der Strombereitstellung,Durchschnitt der CO2-Emissionen aller verkauften Neuwagen
6137957,08.2014,Occasion,orange mét.,65'000 km,Automatisiertes Schaltgetriebe,Hinterradantrieb,Elektro,5,4,beige,170,1'270 kg,12.9,A,Ja\r\n\r\n \n\n\r\n ...,CHF 53'920.-
5541764,12.2014,Occasion,weiss,85'200 km,Automatisiertes Schaltgetriebe,Hinterradantrieb,Elektro,5,4,beige,170,1'270 kg,12.9,A,1BE809,22.12.2014
6023042,08.2014,Occasion,grau,61'000 km,Automatisiertes Schaltgetriebe,Hinterradantrieb,Elektro,5,4,schwarz,170,1'270 kg,12.9,A,1BE809,07.09.2018
6149716,03.2014,Occasion,gris mét.,72'400 km,Automatisiertes Schaltgetriebe,Hinterradantrieb,Elektro,5,4,170,1'270 kg,12.9,A,VZ60160VO54CL,1BE809,Ja\r\n\r\n \n\n\r\n ...
6115387,09.2014,Occasion,beige mét.,54'000 km,Automatisiertes Schaltgetriebe,Hinterradantrieb,Elektro,5,4,170,1'270 kg,12.9,A,1BE809,Ja\r\n\r\n \n\n\r\n ...,Ja\r\n\r\n \n\n\r\n ...
5976697,06.2014,Occasion,grau mét.,51'000 km,Automatisiertes Schaltgetriebe,Hinterradantrieb,Elektro,5,4,grau,170,1'270 kg,12.9,A,1BE809,Ja\r\n\r\n \n\n\r\n ...
6001246,05.2015,Occasion,weiss mét.,26'000 km,Automatisiertes Schaltgetriebe,Hinterradantrieb,Elektro,5,4,schwarz,170,1'270 kg,12.9,A,1BE809,Ja\r\n\r\n \n\n\r\n ...
6076529,07.2015,Occasion,grau,19'450 km,Automatisiertes Schaltgetriebe,Hinterradantrieb,Elektro,5,4,170,1'270 kg,12.9,A,1BE809,WBY1Z21040V308200,27.09.2018
5712317,03.2015,Occasion,silber mét.,12'300 km,Automat,Hinterradantrieb,Elektro,5,4,schwarz,170,1'390 kg,.0/.0/.6 (St/Land/Tot),11.5,13 g/km,A
5990506,01.2014,Occasion,schwarz mét.,41'000 km,Automatisiertes Schaltgetriebe,Hinterradantrieb,Elektro,5,4,schwarz,170,1'270 kg,12.9,A,1BE809,27.08.2018
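
Several rows in the table above are shifted: when a car page lacks a property (for example `Innenfarbe`), all following values move one column to the left, because the rows are filled by position. The loop already builds a `car` dict keyed by property name, so one possible fix is to build the DataFrame from those dicts instead. A minimal sketch with made-up ids and values (not scraped data):

```python
import pandas as pd

# One dict per car, keyed by property name (made-up ids and values for illustration).
cars = {
    "1001": {"Aussenfarbe": "weiss", "Innenfarbe": "beige", "PS": "170"},
    "1002": {"Aussenfarbe": "grau", "PS": "170"},  # this page has no Innenfarbe
}

# With a dict of dicts, pandas aligns values by key and fills gaps with NaN.
aligned_df = pd.DataFrame.from_dict(cars, orient="index")
aligned_df.index.name = "vehid"
print(aligned_df)
```

In the notebook this would mean storing `data[vehid] = car` and dropping the `columns=` argument, at the cost of getting every property found on any page as a column.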


In [77]:
#data = {'row_1': [3, 2, 1, 0], 'row_2': ['a', 'b', 'c', 'd']}
#pd.DataFrame.from_dict(data, orient='index')

In [84]:
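import datetime
# use a timestamp without spaces and colons so the file name is valid on every OS
now = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
car_df.to_csv(f"./cars_{now}.csv", encoding='utf-8')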
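
To compare prices between the daily CSV files, the scraped strings such as `65'000 km` or `CHF 53'920.-` first have to be turned into numbers. The helper below is a sketch, not part of the notebook: `parse_swiss_number` is a hypothetical name, and it only handles whole numbers with an ASCII apostrophe as thousands separator, as seen in the output above.

```python
import re
from typing import Optional


def parse_swiss_number(text: str) -> Optional[int]:
    """Return the first integer in `text`, ignoring ' as thousands separator."""
    match = re.search(r"\d[\d']*", text)
    if match is None:
        return None
    return int(match.group().replace("'", ""))


print(parse_swiss_number("65'000 km"))
print(parse_swiss_number("CHF 53'920.-"))
print(parse_swiss_number("Elektro"))
```

Decimal values such as `12.9` are cut off at the decimal point by this helper, so it should only be applied to columns like price, mileage and weight.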