For the course Online Data Collection & Management at Tilburg University, our team scrapes data from the website kayak.com. KAYAK is a service that compares airline tickets. On its explore page, KAYAK suggests a set of destinations based on the airport of departure. The purpose of this scraper is to build a dataset of the destinations KAYAK suggests for each of the 10 biggest airports in Europe, for five-day trips in April.
The 10 biggest airports in Europe are:
- London Heathrow Airport, United Kingdom (LHR)
- Aéroport de Paris-Charles de Gaulle, France (CDG)
- Amsterdam Airport Schiphol, the Netherlands (AMS)
- Flughafen Frankfurt am Main, Germany (FRA)
- Aeropuerto Adolfo Suárez Madrid-Barajas, Spain (MAD)
- Aeroport de Barcelona-el Prat, Spain (BCN)
- Istanbul Airport, Turkey (IST)
- Sheremetyevo International Airport, Russia (SVO)
- Flughafen München Franz Josef Strauß, Germany (MUC)
- London Gatwick Airport, United Kingdom (LGW)
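
As a minimal sketch, the scraper inputs described above could be written down in Python as follows. The names `DEPARTURE_AIRPORTS`, `TRIP_DAYS` and `april_trips` are hypothetical and do not come from the scraper itself. The sketch also assumes that a "five-day trip" means the return date is five days after the departure date and that both dates fall in April; the year is left as a parameter because the project description does not state one.

```python
from datetime import date, timedelta

# IATA codes of the ten departure airports listed above.
# Note: the IATA code for London Heathrow is LHR.
DEPARTURE_AIRPORTS = {
    "LHR": "London Heathrow Airport, United Kingdom",
    "CDG": "Aéroport de Paris-Charles de Gaulle, France",
    "AMS": "Amsterdam Airport Schiphol, the Netherlands",
    "FRA": "Flughafen Frankfurt am Main, Germany",
    "MAD": "Aeropuerto Adolfo Suárez Madrid-Barajas, Spain",
    "BCN": "Aeroport de Barcelona-el Prat, Spain",
    "IST": "Istanbul Airport, Turkey",
    "SVO": "Sheremetyevo International Airport, Russia",
    "MUC": "Flughafen München Franz Josef Strauß, Germany",
    "LGW": "London Gatwick Airport, United Kingdom",
}

# Assumption: trip duration = number of days between departure and return.
TRIP_DAYS = 5


def april_trips(year):
    """Return (departure, return) date pairs for trips that start and end in April."""
    last_day = date(year, 4, 30)
    trips = []
    departure = date(year, 4, 1)
    # Keep adding trips while the return date still falls in April.
    while departure + timedelta(days=TRIP_DAYS) <= last_day:
        trips.append((departure, departure + timedelta(days=TRIP_DAYS)))
        departure += timedelta(days=1)
    return trips
```

Looping over the keys of `DEPARTURE_AIRPORTS` and the pairs from `april_trips(year)` would then give every airport and date combination the scraper needs to visit.
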
This notebook covers the scraping in three chapters:
- The preparation before scraping
- The kayak.com/explore scraper
- Saving the scraped data in a CSV file