# Project: **Travel Destination Recommender**


1. **Project Overview**
   - Introduction
   - Goals and Objectives
   - Significance of the Project
2. **Implementation Details**
   - Data Collection
   - Data Preprocessing
   - Feature Extraction
   - Recommendation Engine Development
   - User Interface Design
3. **Analysis and Results**
   - Data Analysis Techniques Used
   - Presentation of Results
   - Evaluation Metrics
4. **Reflection and Conclusion**
   - Challenges Faced
   - Lessons Learned
   - Overall Significance of Findings
   - Future Directions
5. **References**




# 1. Project Overview

## 1.1 Introduction

Travel is often an eye-opening experience. It broadens our horizons, immerses us in different cultures, and takes us to magnificent landscapes where lasting memories are made. Yet planning a trip can feel like a Herculean task, given the sheer number of destinations, attractions, and activities to choose from. This is where the Travel Destination Recommender comes in: it offers an easy, personalized way for travelers to find their ideal vacation spot.

We live in a predominantly digital era. A vast amount of travel information is available online, but it is scattered across many websites and platforms. Handling this volume of data is not easy, yet it also creates an opportunity: web scraping techniques can collect a curated selection of information from leading travel websites such as TripAdvisor, Lonely Planet, and Yelp.

The Travel Destination Recommender does not work by magic. It uses web scraping to gather details on hotels, restaurants, attractions, and local experiences, and combines them with user inputs such as budget, interests, and travel dates. It then synthesizes and analyzes this data to give each user a tailored guide. In other words, users receive the information most relevant to them, which they would otherwise have spent hours searching for on their own. The aim of this personalized guide is to make planning a successful and enjoyable trip easier.

Our approach does more than save tourists time and energy: it helps them discover lesser-known places and hidden treasures, leading to more authentic and fulfilling experiences. By helping users find their ideal destination with ease and confidence, the Travel Destination Recommender aims to be a meaningful step forward in trip planning, so that travelers not only save time but also reach places worth the journey.

## 1.2 Goals and Objectives


The goals and objectives of the Travel Destination Recommender project are as follows:

**Goals:**

1. Create a travel recommendation engine that tailors suggestions for destinations, attractions, accommodations, and local events to personal preferences.

2. Use web scraping methods to collect data from different travel websites and consolidate it in one place, so that the recommendation system stays rich in current, travel-related data.

3. Inspire people to discover places off the beaten track and promote local gems, helping them have a more authentic and satisfying travel experience.

4. Continuously improve and refine the recommendation algorithms to provide increasingly accurate and relevant suggestions as the system accumulates more data and feedback.

**Objectives:**

1. Implement efficient web scraping methods to extract information from TripAdvisor, Lonely Planet, Yelp, and other useful travel websites.

2. Develop a robust data processing pipeline to clean, structure, and integrate the collected data into a centralized database.

3. Design and implement a user-friendly interface that allows users to input their preferences, such as budget, interests, travel dates, and any specific requirements.

4. Develop personalized recommendation algorithms that leverage machine learning techniques to analyze user preferences and match them with suitable destinations, attractions, accommodations, and local experiences.

These goals and objectives outline the key aspects of developing a robust and effective Travel Destination Recommender system, leveraging web scraping techniques and personalized recommendation algorithms to provide a seamless and enriching travel planning experience for users.

## 1.3 Significance of the Project


The Travel Destination Recommender project was developed to help travelers plan trips tailored to their interests, budgets, and desired experiences. By combining web scraping and machine learning techniques, and drawing on the vast amount of information available online from various travel sites, our system gives personalized recommendations for destinations, activities, accommodations, and more. Travelers are introduced not only to popular attractions but also to local hidden gems they might never find otherwise.

The recommendation system does more than help plan a good trip: by providing personalized information matched to desired activities, budget constraints, and interests, it encourages environmentally friendly and sustainable tourism that respects local cultures and the environment. Because the system draws information from various sources, key details are less likely to be missed, and travelers save the time they would otherwise spend gathering the data themselves. It is an innovative approach to travel planning that uncovers unique experiences at off-the-beaten-path locations and helps people learn about themselves through what they experience. Ultimately, the Travel Destination Recommender aims to transform how trips are planned by connecting travelers' preferences with destinations for a memorable and enriching journey.


# 2. Implementation Details

## 2.1 Data Collection


### **Step 1: Choosing Travel Websites**

- TripAdvisor (https://www.tripadvisor.com/)
- Lonely Planet (https://www.lonelyplanet.com/)
- Yelp (https://www.yelp.com/)
- Booking.com (https://www.booking.com/)

### TripAdvisor (https://www.tripadvisor.com/)
TripAdvisor is a strong choice because it provides extensive travel information. It hosts one of the largest collections of user reviews and ratings, offering valuable insight into traveler experiences and preferences. Its popularity-based attraction rankings also make it a go-to place for finding top destinations that users recommend highly. Its coverage spans destinations around the globe and a wide range of travel interests, so it can serve almost any individual preference.


### Lonely Planet (https://www.lonelyplanet.com/)
Lonely Planet is one of the top candidates to scrape for travel information. Given its reputation for expert recommendations and carefully crafted guides, it is a treasure trove of reliable insights for unique adventures. Its guides go deep into cultural context and historical narratives that might otherwise go unnoticed, enriching the traveler's knowledge of a destination. Moreover, its range of specialized guidebooks covers areas such as food, adventure, and cultural exploration, which broadens its appeal across the varying preferences of the traveler community.


### Yelp (https://www.yelp.com/)
Yelp is an excellent source of travel data, especially restaurant reviews and local business details. It presents extensive information about local businesses, including tourist spots and stores. Through its user-generated content, travelers can find genuine reviews and suggestions from other travelers as well as locals. In addition, Yelp's location-based search helps users find places of interest nearby, which makes it easier to plan activities and visits without traveling far for the services they need.


### Booking.com (https://www.booking.com/)
Booking.com is a reasonable source for accommodation data: it is widely known for its large range of hotel and accommodation listings. As one of the major hotel booking platforms, Booking.com offers a very large selection of properties worldwide, covering all sorts of budgets and preferences. User feedback on the site about lodging quality and guest experiences is informative and helps travelers choose their stays wisely. In addition, the variety of booking options available through Booking.com, including easy-to-cancel reservations and last-minute offers, adds to its appeal for those who value convenience and flexibility when making travel plans.



### **Step 2: Identifying Data to Scrape**



### Attractions (sights, landmarks, museums, parks)
Collecting data on attractions typically means gathering information on landmarks, museums, and parks in various locations. These attractions serve as prominent points of interest for travelers, providing cultural, historical, or recreational experiences. When the recommendation system has details about these attractions, such as opening hours, entrance fees, and other visitor information, it can give travelers insight into places of interest and the notable features that make them stand out.
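
As an illustration of what one scraped attraction might look like once structured, the sketch below defines a small record type. The class name `Attraction`, its field names, and the sample values are assumptions made for this example; they are not a schema used by the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attraction:
    """One scraped attraction; field names are illustrative, not a fixed schema."""
    name: str
    city: str
    category: str                          # e.g. "landmark", "museum", "park"
    opening_hours: Optional[str] = None    # None means the hours were not listed
    entrance_fee: Optional[float] = None   # None means the fee was not listed


def is_free(attraction: Attraction) -> bool:
    """True only when a fee of zero was explicitly scraped."""
    return attraction.entrance_fee == 0


# Made-up sample records, not real scraped data.
museum = Attraction(name="Example Museum", city="Berlin", category="museum",
                    opening_hours="10:00-18:00", entrance_fee=12.0)
park = Attraction(name="Example Park", city="Munich", category="park",
                  entrance_fee=0.0)

print(is_free(museum), is_free(park))
```

Keeping "not listed" (`None`) distinct from "free" (`0.0`) avoids recommending an attraction as free merely because its fee was missing from the scraped page.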


### Hotels (ratings, prices, amenities)
Hotels play a vital role in travel planning by providing travelers with a place to stay during their journeys. Scraping hotel data involves collecting ratings, prices, and amenities. Guest reviews and feedback reveal a great deal about the quality standards a hotel upholds, allowing potential guests to make an informed decision about whether it suits them. Price data lets travelers compare options within their budget, while amenities such as Wi-Fi, breakfast, and parking add to the quality of a stay.

### Restaurants (ratings, cuisines, prices)
Restaurants have a significant impact on how we enjoy our travels, as they introduce us to flavors distinctive to a place. Scraping restaurant data involves gathering ratings, cuisines, and prices. User reviews and critics offer insight into the quality of food, service, and ambiance, which helps travelers select an eatery to their liking. Together with price information, this data aims to give travelers a wide range of options in both taste and budget.

### Local Experiences (tours, activities, events)
Local experiences encompass a diverse array of tours, activities, and events that provide travelers with immersive and authentic cultural experiences. Using details about these experiences, including guided tours and outdoor activities, the recommendation system allows users to select activities that align with their interests and preferences.

### User Preferences (budget, interests, travel dates)
User preferences are the foundation of personalized travel suggestions, steering the choice of places and events that would appeal most to an individual. User preferences are not scraped; they are gathered directly from the user: budget limitations, travel interests (for example, a preference for hiking at a particular time of day, or a penchant for historical exploration or culinary quests), and travel dates. The system should then tailor its recommendations to the user's budget constraints, specific interest areas, and preferred travel dates, so that each recommended plan is customized to the individual.
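
To make the idea of preference-driven filtering concrete, here is a minimal sketch, assuming preferences are captured in a small container and candidate items carry a price and a type. The names `UserPreferences` and `matches`, and the candidate values, are hypothetical; the project's actual matching logic may differ.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class UserPreferences:
    """Hypothetical container for what a user tells the recommender."""
    budget: float                                   # maximum price per activity
    interests: Set[str] = field(default_factory=set)
    start_date: Optional[date] = None
    end_date: Optional[date] = None

    def __post_init__(self):
        if self.budget < 0:
            raise ValueError("budget must be non-negative")
        if self.start_date and self.end_date and self.end_date < self.start_date:
            raise ValueError("end_date must not precede start_date")


def matches(prefs: UserPreferences, item: dict) -> bool:
    """item is a dict with 'price' (float) and 'type' (str) keys."""
    within_budget = item["price"] <= prefs.budget
    # An empty interest set means "no restriction on type".
    interesting = not prefs.interests or item["type"] in prefs.interests
    return within_budget and interesting


prefs = UserPreferences(budget=50.0,
                        interests={"Historical Tours", "Food & Drink"},
                        start_date=date(2024, 6, 1), end_date=date(2024, 6, 7))

# Made-up candidates for illustration only.
candidates = [
    {"name": "Walking tour", "type": "Historical Tours", "price": 22.0},
    {"name": "Food tour", "type": "Food & Drink", "price": 105.0},
    {"name": "Harbor cruise", "type": "Day Cruises", "price": 22.0},
]
shortlist = [c["name"] for c in candidates if matches(prefs, c)]
print(shortlist)
```

This is a hard filter, the simplest possible form of matching; a scoring approach that ranks near-misses would be a natural refinement.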




### **Step 3: Setting Up Our Environment**


### Instant Data Scraper Web Extension
We chose a different path for this project: rather than the common web scraping libraries BeautifulSoup and requests, we used the Instant Data Scraper web extension. It is a browser extension, available for both Chrome and Firefox, whose purpose is to pull data off web pages with little effort. By installing the extension, users gain access to a user-friendly interface that lets them select and extract data directly from websites without writing code. This approach simplifies data extraction and makes it accessible to individuals without extensive programming knowledge. Additionally, the extension can export scraped data in various formats, such as CSV or JSON, for further analysis and data processing.


### Octoparse
Alongside the Instant Data Scraper web extension, we also used Octoparse as a web scraping tool for this project. Octoparse is a robust web scraping application that lets users pull data off websites through a visual point-and-click mechanism: users complete their scraping tasks by navigating through web pages and choosing what they want to extract. Its features include automatic IP rotation, export scheduling, and cloud-based extraction, which facilitate large-scale scraping projects. It can also handle dynamic, JavaScript-rendered web pages, which makes it compatible with a wide range of websites and data sources.

### **Step 4: Start Scraping**


With our environment set up using web scraping tools like Instant Data Scraper and Octoparse, we can start scraping data from each selected website. Instead of writing scripts, we use the user-friendly interfaces these tools offer to extract data without any hassle.


### Steps with Instant Data Scraper:
1. Open the Instant Data Scraper web extension in the browser.
2. Visit the website we want to scrape, for instance TripAdvisor or Lonely Planet.
3. Pick the data we need by clicking on it on the screen. This can be any information worth scraping, such as the major tourist attractions in a particular city.
4. Configure how the scraping will work, including how the tool should handle pagination and which export options to use.
5. Launch the scraper, which goes through the website and brings back only the specific data we are interested in.
6. Once the extraction is complete, export the scraped data in a format that suits our analysis needs, such as CSV or JSON.


### Using Octoparse:
The first step is to run the Octoparse software on our computer. Next, we create a new scraping task and provide the URL of the website we intend to scrape. Through Octoparse's visual workflow builder, we can define the extraction steps, such as navigating to particular pages and marking specific data elements. Depending on our requirements, we can also set advanced configurations such as automatic IP rotation and scheduled data export. We then start the scraping task to pull the information from the target site.

To finalize the extracted data:
1. Review the data directly in the Octoparse interface.
2. Save the scraped data in a format convenient for our use.

With these web scraping tools, we can collect information from various sites with ease, without writing intricate code. This simplifies the scraping process and lets us concentrate on refining and exploring the data we have gathered for our travel destination recommender.


### **Step 5: Refining Scripts**

- Refine scripts to extract specific data like hotel details, restaurant reviews, and activity information.
- Handle pagination for websites with multiple pages of results.
- Adapt scripts to website structure changes.
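
One inexpensive way to notice that a website's structure has changed is to check each exported CSV for the columns the rest of the pipeline expects. The sketch below assumes the tour exports use the column names that appear in the tour files of this project; the helper `missing_columns` and the two sample headers are hypothetical.

```python
import csv
import io

# Column names taken from the tour exports used in this project.
EXPECTED_TOUR_COLUMNS = {
    "website", "name", "score", "reviews", "type",
    "duration", "comment1", "price", "group", "city",
}


def missing_columns(csv_text, expected):
    """Return the expected column names absent from the CSV header, sorted."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])          # an empty file yields an empty header
    return sorted(set(expected) - {h.strip() for h in header})


# Made-up headers: one intact, one where two columns were renamed upstream.
good_export = "website,name,score,reviews,type,duration,comment1,price,group,city\n"
changed_export = "website,title,rating,reviews,type,duration,comment1,price,group,city\n"

print(missing_columns(good_export, EXPECTED_TOUR_COLUMNS))
print(missing_columns(changed_export, EXPECTED_TOUR_COLUMNS))
```

Running such a check right after each export makes a silent layout change fail early instead of surfacing later as missing values in the analysis.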


## 2.2 Data Preprocessing


In [1]:
#| echo: true
#| code-fold: true
#| output: false
import string

import matplotlib.pyplot as plt
import nltk
import pandas as pd
import seaborn as sns
from bertopic import BERTopic
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

%matplotlib inline

# Download the tokenizer models used by word_tokenize
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/emredeveci/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

### Must-See Attractions

#### Berlin

In [2]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false

# Load data from CSV file
berlin = pd.read_csv('/Users/emredeveci/Desktop/TDR/must-see-attractions/berlin_attractions - Sheet1.csv')
berlin.head()


Unnamed: 0,website1,website2,name,location,description,type,city
0,https://lp-cms-production.imgix.net/2019-06/d6...,https://www.lonelyplanet.com/germany/berline/m...,Museumsinsel,Museumsinsel & Alexanderplatz,"Walk through ancient Babylon, meet an Egyptian...",must_see,Berlin
1,https://lp-cms-production.imgix.net/2021-08/sh...,https://www.lonelyplanet.com/germany/berline/m...,Neues Museum,Museumsinsel & Alexanderplatz,"For over 60 years, not a soul was able to visi...",must_see,Berlin
2,https://lp-cms-production.imgix.net/2019-06/d5...,https://www.lonelyplanet.com/germany/berline/m...,Pergamonmuseum,Museumsinsel & Alexanderplatz,The Pergamonmuseum is one of Berlin’s most vis...,must_see,Berlin
3,https://lp-cms-production.imgix.net/2021-08/sh...,https://www.lonelyplanet.com/germany/berlin/fr...,East Side Gallery,Friedrichshain,The East Side Gallery is the embodiment of Ber...,must_see,Berlin
4,https://lp-cms-production.imgix.net/2024-02/Ge...,https://www.lonelyplanet.com/germany/berline/m...,Fernsehturm,Museumsinsel & Alexanderplatz,"Germany's tallest structure, the TV Tower is a...",must_see,Berlin


#### Bremen

In [3]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false

# Load data from CSV file
bremen = pd.read_csv('/Users/emredeveci/Desktop/TDR/must-see-attractions/bremen_attractions - Sheet1.csv')
bremen.head()


Unnamed: 0,website1,website2,name,location,description,type,city
0,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Herrenhäuser Gärten,Hanover,Proof that Hanover is not all buttoned-down bu...,must_see,Lower Saxony & Bremen
1,https://lp-cms-production.imgix.net/2019-06/2a...,https://www.lonelyplanet.com/germany/bergen-be...,Gedenkstätte Bergen-Belsen,Lower Saxony & Bremen,The Nazi-built camp at Bergen-Belsen began its...,must_see,Lower Saxony & Bremen
2,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Schloss Marienburg,Lower Saxony & Bremen,"Perched grandly above the Leine River, the neo...",must_see,Lower Saxony & Bremen
3,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/bremen-ci...,Denkort Bunker Valentin,Bremen City,"In 1943, the Nazis started construction of a m...",must_see,Lower Saxony & Bremen
4,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Autostadt,Lower Saxony & Bremen,"A hit with car buffs of all ages, Autostadt is...",must_see,Lower Saxony & Bremen


#### Hamburg

In [4]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false

# Load data from CSV file
hamburg = pd.read_csv('/Users/emredeveci/Desktop/TDR/must-see-attractions/hamburg_attractions - Sheet1.csv')
hamburg.head()


Unnamed: 0,website1,website2,name,location,description,type,city
0,https://lp-cms-production.imgix.net/2019-06/fd...,https://www.lonelyplanet.com/germany/hamburg/a...,Mahnmal St-Nikolai,Altstadt,St Nikolai church was the world’s tallest buil...,must_see,Hamburg
1,https://lp-cms-production.imgix.net/2019-06/b7...,https://www.lonelyplanet.com/germany/hamburg/s...,Fischmarkt,St Pauli & Reeperbahn,Here's the perfect excuse to stay up all Satur...,must_see,Hamburg
2,https://lp-cms-production.imgix.net/2019-06/eb...,https://www.lonelyplanet.com/germany/hamburg/s...,Elbphilharmonie,Hamburg,Welcome to one of the most Europe's most excit...,must_see,Hamburg
3,https://lp-cms-production.imgix.net/2019-06/d7...,https://www.lonelyplanet.com/germany/hamburg/a...,Hamburger Kunsthalle,Altstadt,A treasure trove of art from the Renaissance t...,must_see,Hamburg
4,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/hamburg/a...,Rathaus,Altstadt,"With its spectacular coffered ceiling, Hamburg...",must_see,Hamburg


#### Munich

In [5]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false

# Load data from CSV file
munich = pd.read_csv('/Users/emredeveci/Desktop/TDR/must-see-attractions/munich_attractions - Sheet1.csv')
munich.head()


Unnamed: 0,website1,website2,name,description,location,type,city
0,https://lp-cms-production.imgix.net/2019-06/c2...,https://www.lonelyplanet.com/germany/munich/ny...,Schloss Nymphenburg,This commanding palace and its lavish gardens ...,Munich,must_see,Munich
1,https://lp-cms-production.imgix.net/2019-06/ee...,https://www.lonelyplanet.com/germany/munich/al...,Residenzmuseum,Home to Bavaria's Wittelsbach rulers from 1508...,Munich,must_see,Munich
2,https://lp-cms-production.imgix.net/2019-06/3e...,https://www.lonelyplanet.com/germany/munich/ma...,Alte Pinakothek,Munich's main repository of Old European Maste...,Munich,must_see,Munich
3,https://lp-cms-production.imgix.net/2019-06/0d...,https://www.lonelyplanet.com/germany/munich/sc...,Englischer Garten,The sprawling English Garden is among Europe's...,Munich,must_see,Munich
4,https://lp-cms-production.imgix.net/2019-06/c9...,https://www.lonelyplanet.com/germany/munich/ma...,Pinakothek der Moderne,Germany's largest modern-art museum unites fou...,Munich,must_see,Munich
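
The four per-city loading cells above follow the same pattern, so they could be collapsed into one loop. The following is a sketch only: the helper `load_city_files` is hypothetical, and the demo writes throwaway files to a temporary directory so that it runs without the original data folder.

```python
import tempfile
from pathlib import Path

import pandas as pd

CITIES = ["berlin", "bremen", "hamburg", "munich"]


def load_city_files(data_dir, cities, suffix="_attractions - Sheet1.csv"):
    """Read one CSV per city from data_dir and return {city: DataFrame}."""
    return {city: pd.read_csv(Path(data_dir) / f"{city}{suffix}")
            for city in cities}


# Demo with made-up, one-row files standing in for the real exports.
with tempfile.TemporaryDirectory() as tmp:
    for city in CITIES:
        demo = pd.DataFrame({"name": [f"Sight in {city}"],
                             "city": [city.title()]})
        demo.to_csv(Path(tmp) / f"{city}_attractions - Sheet1.csv", index=False)
    frames = load_city_files(tmp, CITIES)

print({city: df.shape for city, df in frames.items()})
```

Passing the data directory as an argument also removes the dependence on an absolute, machine-specific path.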


#### Combined dataset

In [6]:
#| echo: true
#| code-fold: true
#| panel: input
# Concatenate the four city dataframes along the rows (axis=0)
germany_df = pd.concat([bremen, berlin, hamburg, munich], ignore_index=True)

# Save the combined dataframe to a new CSV file
germany_df.to_csv('combined_dataset.csv', index=False)

# Keep only attraction rows: drop accommodation ('to_stay') and restaurant ('to_eat') entries
filtered_df = germany_df[~germany_df['type'].isin(['to_stay', 'to_eat'])]
filtered_df.head()



Unnamed: 0,website1,website2,name,location,description,type,city
0,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Herrenhäuser Gärten,Hanover,Proof that Hanover is not all buttoned-down bu...,must_see,Lower Saxony & Bremen
1,https://lp-cms-production.imgix.net/2019-06/2a...,https://www.lonelyplanet.com/germany/bergen-be...,Gedenkstätte Bergen-Belsen,Lower Saxony & Bremen,The Nazi-built camp at Bergen-Belsen began its...,must_see,Lower Saxony & Bremen
2,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Schloss Marienburg,Lower Saxony & Bremen,"Perched grandly above the Leine River, the neo...",must_see,Lower Saxony & Bremen
3,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/bremen-ci...,Denkort Bunker Valentin,Bremen City,"In 1943, the Nazis started construction of a m...",must_see,Lower Saxony & Bremen
4,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Autostadt,Lower Saxony & Bremen,"A hit with car buffs of all ages, Autostadt is...",must_see,Lower Saxony & Bremen
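
The Munich export lists `description` before `location`, while the other cities list them the other way round. `pd.concat` aligns frames by column name rather than by position, so this difference is harmless. The toy frames below, with invented values, illustrate that behaviour together with the `isin` filter used above.

```python
import pandas as pd

# Toy frames; the second has 'description' and 'location' in swapped order,
# as in the Munich export.
a = pd.DataFrame({"name": ["A"], "location": ["Altstadt"],
                  "description": ["d1"], "type": ["must_see"]})
b = pd.DataFrame({"name": ["B"], "description": ["d2"],
                  "location": ["Munich"], "type": ["to_eat"]})

# Columns are matched by name; ignore_index renumbers the rows 0..n-1.
combined = pd.concat([a, b], ignore_index=True)

# '~' negates the boolean mask, keeping rows whose type is NOT in the list.
sights = combined[~combined["type"].isin(["to_stay", "to_eat"])]

print(combined)
print(sights)
```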


### Tours

#### Tours in Bremen

In [7]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
tour_bremen = pd.read_csv('/Users/emredeveci/Desktop/TDR/tours/bremen_tour - Sheet1.csv')
tour_bremen.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,e-Scavenger hunt Bremen: Explore the city at y...,4.0 of 5 bubbles,15.0,Audio Guides,2–4 hours,Get to know Bremen in a unique and affordable ...,$34,per group,bremen
1,https://www.tripadvisor.com/AttractionProductR...,Bremen : Private Walking Tour With A Tour Guid...,3.0 of 5 bubbles,2.0,Historical Tours,2–6 hours,Get to know the city through the eyes of a loc...,$52,per adult,bremen
2,https://www.tripadvisor.com/AttractionProductR...,Bremen Schnoor Area Tour,5.0 of 5 bubbles,5.0,Historical Tours,1 hour,Explore with us the Schnoor area – a neighbour...,$24,per adult,bremen
3,https://www.tripadvisor.com/AttractionProductR...,Bremen Private Walking Tour With A Professiona...,5.0 of 5 bubbles,1.0,Walking Tours,1–2 hours,Meetingpoint: In front of the Town Hall or Mee...,$224,per group,bremen
4,https://www.tripadvisor.com/AttractionProductR...,Bremen - Private Historic Walking Tour,5.0 of 5 bubbles,1.0,Historical Tours,1–2 hours,"Discover the city of Bremen, a major cultural ...",$321,per group,bremen


#### Tours in Berlin

In [8]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
tour_berlin = pd.read_csv('/Users/emredeveci/Desktop/TDR/tours/berlin_tour - Sheet1.csv')
tour_berlin.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,Discover Berlin Half-Day Walking Tour,5.0 of 5 bubbles,4191,Historical Tours,3–4 hours,See many of Berlin's most important landmarks ...,$22,per adult,berlin
1,https://www.tripadvisor.com/AttractionProductR...,River Cruise with Tour Guide in Berlin. Hadynski,4.5 of 5 bubbles,36,On the Water,1 hour,Enjoy our 1 hour river cruise through the old ...,$21,per adult,berlin
2,https://www.tripadvisor.com/AttractionProductR...,Berlin Third Reich and Cold War 2-Hour Walking...,5.0 of 5 bubbles,124,Historical Tours,2 hours,Learn the tumultuous contemporary history of B...,$22,per adult,berlin
3,https://www.tripadvisor.com/AttractionProductR...,Big Bus Berlin Hop-On Hop-Off Sightseeing Tour,4.0 of 5 bubbles,404,Audio Guides,2 hours,Enjoy this perfect introduction to Berlin on a...,$27,per adult,berlin
4,https://www.tripadvisor.com/AttractionProductR...,Berlin Food Walking Tour With Secret Food Tours,5.0 of 5 bubbles,477,Food & Drink,3 hours,"With so much great food in East Berlin, it can...",$105,per adult,berlin


#### Tours in Hamburg

In [9]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
tour_hamburg = pd.read_csv('/Users/emredeveci/Desktop/TDR/tours/hamburg_tour - Sheet1.csv')
tour_hamburg.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0 of 5 bubbles,784,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,$3,per adult,hamburg
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0 of 5 bubbles,178,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,$264,per group,hamburg
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0 of 5 bubbles,63,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,$22,per adult,hamburg
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0 of 5 bubbles,187,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,$31,per adult,hamburg
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5 of 5 bubbles,21,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,$38,per adult,hamburg


#### Tours in Munich

In [10]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
tour_munich = pd.read_csv('/Users/emredeveci/Desktop/TDR/tours/munich_tour - Sheet1.csv')
tour_munich.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof VIP All-In...,4.5 of 5 bubbles,445,Sightseeing Packages,6+ hours,Leave Munich for a full-day tour to two royal ...,$215,per adult,munich
1,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof Palace Day...,4.5 of 5 bubbles,242,Historical Tours,6+ hours,"Drive with an airconditioned, comfortable coac...",$76,per adult,munich
2,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof Palace Day...,4.5 of 5 bubbles,1381,Historical Tours,6+ hours,"Bavaria is famous for its fairy-tale castles, ...",$80,per adult,munich
3,,,,How to navigate Munich’s museums like a pro,"Soccer fans, art lovers, and design nerds have...",Read now,,,,munich
4,https://www.tripadvisor.com/AttractionProductR...,Dachau Concentration Camp Memorial Site Tour f...,5.0 of 5 bubbles,1313,Historical Tours,5 hours,Visiting Germany’s Dachau Concentration Camp M...,$52,per adult,munich


#### Combined Tours Dataset

In [11]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Concatenate the four tour dataframes along the rows (axis=0)
all_tours = pd.concat([tour_hamburg, tour_berlin, tour_bremen, tour_munich], ignore_index=True)

# Save the combined dataframe to a CSV file
# Note: this reuses the file name written for the attractions dataset above and overwrites that file
all_tours.to_csv('combined_dataset.csv', index=False)
all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0 of 5 bubbles,784,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,$3,per adult,hamburg
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0 of 5 bubbles,178,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,$264,per group,hamburg
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0 of 5 bubbles,63,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,$22,per adult,hamburg
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0 of 5 bubbles,187,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,$31,per adult,hamburg
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5 of 5 bubbles,21,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,$38,per adult,hamburg


In [12]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
missing_values = all_tours.isnull().sum()
missing_values

website      18
name         18
score       929
reviews     908
type          0
duration      2
comment1     28
price        22
group        18
city          0
dtype: int64
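
Raw counts like these are hard to judge without the total number of rows, so it can help to look at the share of missing values per column instead. A minimal sketch on a made-up frame (the values are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "website": ["https://example.com/a", None,
                "https://example.com/c", "https://example.com/d"],
    "score": [4.5, None, None, 5.0],
    "city": ["berlin", "munich", "bremen", "hamburg"],
})

# isnull() gives a boolean frame; the mean of booleans is the fraction of True.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to `all_tours`, the same expression would show at a glance which columns are mostly empty and which have only a handful of gaps.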

In [13]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Drop rows where 'website' is NaN
all_tours = all_tours.dropna(subset=['website'])


In [14]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
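# Convert 'score' strings such as '5.0 of 5 bubbles' to float
def extract_score(score_str):
    try:
        return float(score_str.split()[0])  # Take the first part of the string, convert to float
    except (AttributeError, IndexError, ValueError):
        # Missing values (NaN) have no .split(); empty or non-numeric strings fail to parse
        return None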

# Apply the function to 'score' column
all_tours['score'] = all_tours['score'].apply(extract_score)

In [15]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0,784,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,$3,per adult,hamburg
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0,178,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,$264,per group,hamburg
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0,63,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,$22,per adult,hamburg
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0,187,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,$31,per adult,hamburg
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5,21,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,$38,per adult,hamburg


In [16]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
def convert_to_float(value):
    try:
        return float(value)
    except (ValueError, TypeError):
        # ValueError: non-numeric string; TypeError: None or another non-numeric type
        return None

In [17]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
all_tours['reviews'] = all_tours['reviews'].apply(convert_to_float)



In [18]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false

all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0,784.0,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,$3,per adult,hamburg
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0,178.0,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,$264,per group,hamburg
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0,63.0,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,$22,per adult,hamburg
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0,187.0,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,$31,per adult,hamburg
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5,21.0,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,$38,per adult,hamburg


In [19]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
import re

# Convert 'price' strings such as '$264' to float, handling non-numeric values
def convert_price_to_float(price_str):
    try:
        # Drop currency symbols and thousands separators but keep the decimal point,
        # so that '$3.50' becomes 3.5 rather than 350.0
        return float(re.sub(r'[^0-9.]', '', str(price_str)))
    except ValueError:
        return None

In [20]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Apply the function to 'price' column
all_tours['price'] = all_tours['price'].apply(convert_price_to_float)

In [21]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Create a new column 'price_category' based on 'price' ranges.
# Note: a missing price (NaN) fails both comparisons and is labelled 'High'.
def categorize_price(price):
    if price <= 50:
        return 'Low'
    elif price <= 300:
        return 'Moderate'
    else:
        return 'High'


In [22]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Apply the function to create the 'price_category' column
all_tours['price_category'] = all_tours['price'].apply(categorize_price)

# Count how many tours fall into each price category
price_counts = all_tours['price_category'].value_counts()

In [23]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city,price_category
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0,784.0,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,3.0,per adult,hamburg,Low
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0,178.0,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,264.0,per group,hamburg,Moderate
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0,63.0,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,22.0,per adult,hamburg,Low
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0,187.0,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,31.0,per adult,hamburg,Low
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5,21.0,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,38.0,per adult,hamburg,Low


In [24]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true

print(price_counts)


price_category
Moderate    788
High        552
Low         342
Name: count, dtype: int64
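
One caveat with the counts above: a missing price compares as false against every threshold, so `categorize_price` sends it to the `else` branch and labels it 'High'. Below is a minimal sketch of a variant that keeps missing prices separate, assuming the same cut-offs of 50 and 300. The name `categorize_price_safe` and the 'Unknown' label are illustrative and not part of the original notebook.

```python
import math

def categorize_price_safe(price):
    """Map a price to 'Low', 'Moderate', or 'High'; missing values become 'Unknown'."""
    # None and NaN fail every comparison, so they must be caught first
    if price is None or (isinstance(price, float) and math.isnan(price)):
        return 'Unknown'
    if price <= 50:
        return 'Low'
    if price <= 300:
        return 'Moderate'
    return 'High'

print([categorize_price_safe(p) for p in [3.0, 264.0, 301.0, float('nan'), None]])
```

It could be applied in the same way as the original, for example with `all_tours['price'].apply(categorize_price_safe)`.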


### Restaurant Data

In [25]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
filtered_rest_df = germany_df[germany_df['type'].isin(['to_eat'])]
filtered_rest_df.head()

Unnamed: 0,website1,website2,name,location,description,type,city
115,https://www.lonelyplanet.com/germany/lower-sax...,https://www.lonelyplanet.com/germany/lower-sax...,Al-Dar,Hanover,This popular Syrian restaurant a stone's throw...,to_eat,Lower Saxony & Bremen
116,https://www.lonelyplanet.com/germany/lower-sax...,https://www.lonelyplanet.com/germany/lower-sax...,Vietal Kitchen,Hanover,Pale-green louvres and bamboo lanterns set a F...,to_eat,Lower Saxony & Bremen
117,https://www.lonelyplanet.com/germany/hildeshei...,https://www.lonelyplanet.com/germany/hildeshei...,Schlegels Weinstuben,Lower Saxony & Bremen,"The lopsided walls of this rose-covered, 500-y...",to_eat,Lower Saxony & Bremen
118,https://www.lonelyplanet.com/germany/norderney...,https://www.lonelyplanet.com/germany/norderney...,Seesteg,East Frisian Islands,Nordeney's Michelin-starred restaurant offers ...,to_eat,Lower Saxony & Bremen
119,https://www.lonelyplanet.com/germany/lower-sax...,https://www.lonelyplanet.com/germany/lower-sax...,Basil,Hanover,These former stables to the north of town now ...,to_eat,Lower Saxony & Bremen


### Hotel Dataset

In [26]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
hotel_berlin = pd.read_csv('/Users/emredeveci/Desktop/TDR/hotels/hotel_berlin - Sheet1.csv')
hotel_berlin['city'] = 'berlin'

hotel_berlin.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/the-social-hu...,The Social Hub Berlin Alexanderplatz,"Mitte, Berlin",Center: 2.8 km,8.2,"7,473 reviews",Executive Queen Room,1 double bed,1 night,€80,berlin
1,https://www.booking.com/hotel/de/innside-by-me...,INNSiDE by Meliá Berlin Mitte,"Mitte, Berlin",Center: 1.9 km,8.3,Sustainability certification,"6,031 reviews",The INNSiDE Room,1 night,€93,berlin
2,https://www.booking.com/hotel/de/motel-one-ber...,Motel One Berlin Spittelmarkt,"Mitte, Berlin",Center: 1.7 km,8.7,"4,667 reviews",Queen room,1 double bed,1 night,€93,berlin
3,https://www.booking.com/hotel/de/titanic-chaus...,TITANIC Chaussee Berlin,"Mitte, Berlin",Center: 1.8 km,8.2,"11,604 reviews",Classic room,Several types of beds,1 night,€96,berlin
4,https://www.booking.com/hotel/de/motel-one-ber...,Motel One Berlin-Alexanderplatz,"Mitte, Berlin",Center: 2.4 km,8.6,"12,307 reviews",Queen room,1 double bed,1 night,€104,berlin


In [27]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
hotel_bremen = pd.read_csv('/Users/emredeveci/Desktop/TDR/hotels/hotel_bremen - Sheet1.csv')
hotel_bremen['city'] = 'bremen'

hotel_bremen.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/townside-host...,Townside Hostel Bremen,"Mitte, Bremen",1.1 km from centre,7.7,469 reviews,Bed in 10-Bed Mixed Dormitory Room,1 bunk bed,"1 night, 1 adult",€ 23,bremen
1,https://www.booking.com/hotel/de/moxy-bremen.e...,Moxy Bremen,"Walle, Bremen",1.9 km from centre,8.3,"2,610 reviews",MOXY Sleeper Queen,1 large double bed,"1 night, 1 adult",€ 135,bremen
2,https://www.booking.com/hotel/de/am-werdersee....,Apartments Am Werdersee,"Neustadt, Bremen",1.8 km from centre,7.5,"1,324 reviews",Single Room with Shared Bathroom,"4 beds (2 singles, 1 double, 1 extra-large dou...","1 night, 1 adult",€ 47,bremen
3,https://www.booking.com/hotel/de/pension-isabe...,Pension Isabel I,"Neustadt, Bremen",0.8 km from centre,7.9,480 reviews,Single Room with Shared Bathroom,1 single bed,"1 night, 1 adult",€ 46,bremen
4,https://www.booking.com/hotel/de/hastedter-hee...,Rana's Zimmervermittlung,"Hemelingen, Bremen",4.3 km from centre,6.7,519 reviews,Apartment,2 single beds,"1 night, 1 adult",€ 30,bremen


In [28]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
hotel_hamburg = pd.read_csv('/Users/emredeveci/Desktop/TDR/hotels/hotel_hamburg - Sheet1.csv')
hotel_hamburg['city'] = 'hamburg'

hotel_hamburg.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/apartment040....,Apartment040,"Uhlenhorst, Hamburg",2.8 km from centre,8.7,"2,623 reviews",Superior Studio,1 double bed,"1 night, 1 adult",€ 292,hamburg
1,https://www.booking.com/hotel/de/cab20.en-gb.h...,CAB20,"St. Georg, Hamburg",1.1 km from centre,8.5,"15,506 reviews",Single Cabin with Shared Bathroom,1 single bed,"1 night, 1 adult",€ 178,hamburg
2,https://www.booking.com/hotel/de/hood-house.en...,hood house,"Winterhude, Hamburg",4 km from centre,8.6,"3,309 reviews",Cozyhood+,1 large double bed,"1 night, 1 adult",€ 379,hamburg
3,https://www.booking.com/hotel/de/apartmenthote...,Apartment-Hotel Hamburg Mitte,Hamburg,3.4 km from centre,8.2,"8,221 reviews",Junior Suite with Balcony,"2 beds (1 extra-large double, 1 sofa bed)","1 night, 1 adult",€ 454,hamburg
4,https://www.booking.com/hotel/de/hampton-by-hi...,Hampton By Hilton Hamburg City Centre,"Hammerbrook, Hamburg",1.1 km from centre,7.9,"9,997 reviews",Queen Room with Sofa Bed,"2 beds (1 sofa bed, 1 large double)","1 night, 1 adult",€ 222,hamburg


In [29]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
hotel_munich = pd.read_csv('/Users/emredeveci/Desktop/TDR/hotels/hotel_munich - Sheet1.csv')
# Add a new column with all rows containing "munich"
hotel_munich['city'] = 'munich'
hotel_munich.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/hapimag-resor...,Hapimag Ferienwohnungen München,"Ludwigsvorstadt, Munich",Show on map,8.3,317 reviews,2,"3 beds (2 sofa beds, 1 large double)","1 night, 1 adult",€ 265,munich
1,https://www.booking.com/hotel/de/ausbildungsho...,Ausbildungshotel St. Theresia,"Neuhausen - Nymphenburg, Munich",Show on map,8.3,"1,538 reviews",2,1 single bed,"1 night, 1 adult",€ 52,munich
2,https://www.booking.com/hotel/de/edenwolff.en-...,Eden Hotel Wolff,"Maxvorstadt, Munich",Show on map,8.2,"4,883 reviews",2,1 single bed,"1 night, 1 adult",€ 122,munich
3,https://www.booking.com/hotel/de/motel-one-mun...,Motel One München-Campus,"Obergiesing - Fasangarten, Munich",Show on map,8.4,"6,591 reviews",2,1 double bed,"1 night, 1 adult",€ 79,munich
4,https://www.booking.com/hotel/de/hotelpreysing...,JAMS Music Hotel Munich,"Au-Haidhausen, Munich",Show on map,8.3,"1,763 reviews",2,1 double bed,"1 night, 1 adult",€ 176,munich


In [30]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Concatenate the four city dataframes along the rows (axis=0)
combined_hotel = pd.concat([hotel_munich, hotel_hamburg, hotel_berlin, hotel_bremen], ignore_index=True)

# Save the combined dataframe to a new CSV file
combined_hotel.to_csv('combined_hotel.csv', index=False)
combined_hotel.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/hapimag-resor...,Hapimag Ferienwohnungen München,"Ludwigsvorstadt, Munich",Show on map,8.3,317 reviews,2,"3 beds (2 sofa beds, 1 large double)","1 night, 1 adult",€ 265,munich
1,https://www.booking.com/hotel/de/ausbildungsho...,Ausbildungshotel St. Theresia,"Neuhausen - Nymphenburg, Munich",Show on map,8.3,"1,538 reviews",2,1 single bed,"1 night, 1 adult",€ 52,munich
2,https://www.booking.com/hotel/de/edenwolff.en-...,Eden Hotel Wolff,"Maxvorstadt, Munich",Show on map,8.2,"4,883 reviews",2,1 single bed,"1 night, 1 adult",€ 122,munich
3,https://www.booking.com/hotel/de/motel-one-mun...,Motel One München-Campus,"Obergiesing - Fasangarten, Munich",Show on map,8.4,"6,591 reviews",2,1 double bed,"1 night, 1 adult",€ 79,munich
4,https://www.booking.com/hotel/de/hotelpreysing...,JAMS Music Hotel Munich,"Au-Haidhausen, Munich",Show on map,8.3,"1,763 reviews",2,1 double bed,"1 night, 1 adult",€ 176,munich


In [31]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
combined_hotel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1225 entries, 0 to 1224
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   website   1225 non-null   object
 1   name      1225 non-null   object
 2   location  1225 non-null   object
 3   distance  1225 non-null   object
 4   score     1211 non-null   object
 5   reviews   1211 non-null   object
 6   type      1225 non-null   object
 7   type2     1221 non-null   object
 8   day       1220 non-null   object
 9   price     1225 non-null   object
 10  city      1225 non-null   object
dtypes: object(11)
memory usage: 105.4+ KB


In [32]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
print(combined_hotel['score'].unique())


[8.3 8.2 8.4 8.1 6.3 8.9 7.8 7.9 7.5 7.2 8.7 7.7 7.3 8.5 8.0 8.6 7.1 7.6
 6.8 7.0 7.4 9.1 6.7 8.8 6.2 9.3 5.3 6.4 6.9 6.6 9.0 '8.7' '8.5' '8.6'
 '8.2' '7.9' '7.8' '6.9' '8.3' '9.3' '7.7' '6.8' '8.8' '8.1' '7.6' '8'
 '8.4' '7.5' '7.3' '6.7' '7.1' '9.1' '6.2' '7.4' '7.2' '9' '6.3' '5.7'
 '9.4' '6.4' '6.6' '6' '5.2' '5.4' '7' '9.6' '5' '3.2' '6.5' '5.3' '5.1'
 '6.1' '5.9' nan '5.8' '1.5' '3.7' '9.2' 'Exceptional 10' '9.7' 9.2 6.1
 6.5 '3' '4.9' '5.5' '2.9' '9.8' '10' '5.6' '4.4' '3.6' '4.1' '1'
 'Exceptional 10.0' '8.9']
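
The output above mixes floats, numeric strings, and labelled values such as 'Exceptional 10'. A plain `float()` conversion turns the labelled values into NaN, which would discard the best-rated hotels. Below is a minimal sketch of a parser that keeps them, assuming the rating is always the last number in the string. `parse_hotel_score` is a hypothetical helper, not part of the original notebook.

```python
import math
import re

# Matches integers and decimals such as '9', '8.7', or '10.0'
_NUMBER = re.compile(r'\d+(?:\.\d+)?')

def parse_hotel_score(value):
    """Return the rating as a float, or NaN when no number can be found."""
    if isinstance(value, (int, float)):
        return float(value)            # already numeric (NaN passes through)
    if not isinstance(value, str):
        return math.nan                # None or other unexpected types
    matches = _NUMBER.findall(value)
    return float(matches[-1]) if matches else math.nan

print([parse_hotel_score(v) for v in [8.3, '8.7', '9', 'Exceptional 10', 'Exceptional 10.0']])
```

It could be applied with `combined_hotel['score'].apply(parse_hotel_score)`.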


In [33]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
print(combined_hotel['reviews'].unique())


['317 reviews' '1,538 reviews' '4,883 reviews' '6,591 reviews'
 '1,763 reviews' '10,645 reviews' '1,794 reviews' '16,206 reviews'
 '619 reviews' '8,167 reviews' '3,879 reviews' '4,455 reviews'
 '1,246 reviews' '8,848 reviews' '1,014 reviews' '1,990 reviews'
 '3,349 reviews' '4,924 reviews' '3,545 reviews' '4,127 reviews'
 '5,244 reviews' '2,502 reviews' '4,773 reviews' '1,743 reviews'
 '2,942 reviews' '1,801 reviews' '1,316 reviews' '26,017 reviews'
 '1,241 reviews' '1,171 reviews' '703 reviews' '2,734 reviews'
 '3,418 reviews' '4,062 reviews' '2,598 reviews' '12,416 reviews'
 '2,770 reviews' '982 reviews' '3,937 reviews' '4,105 reviews'
 '1,018 reviews' '7,333 reviews' '1,541 reviews' '1,379 reviews'
 '3,663 reviews' '3,232 reviews' '13,768 reviews' '8,170 reviews'
 '3,056 reviews' '4,060 reviews' '2,995 reviews' '2,930 reviews'
 '3,399 reviews' '1,526 reviews' '1,322 reviews' '6,849 reviews'
 '4,049 reviews' '2,100 reviews' '4,428 reviews' '4,666 reviews'
 '628 reviews' '3,436 review

In [34]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
import re

# Define a function to extract numeric values from strings
def extract_numeric_reviews(reviews):
    try:
        # Use regular expression to extract numeric values
        numeric_reviews = re.sub(r'[^0-9]', '', reviews)
        return int(numeric_reviews)
    except (ValueError, TypeError):
        # Handle exceptions
        return None

# Apply the function to the 'reviews' column
combined_hotel['reviews'] = combined_hotel['reviews'].apply(extract_numeric_reviews)


In [35]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
import numpy as np

# Define a function to convert string representations of numbers to float
def convert_to_float(score):
    try:
        return float(score)
    except (ValueError, TypeError):
        # Handle exceptions such as 'Exceptional 10' and 'nan'
        return np.nan

# Apply the function to the 'score' column
combined_hotel['score'] = combined_hotel['score'].apply(convert_to_float)


In [36]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Remove '€' symbol and comma, then convert 'price' column to float
combined_hotel['price'] = combined_hotel['price'].str.replace('€', '').str.replace(',', '').astype(float)


In [37]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true

combined_hotel.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/hapimag-resor...,Hapimag Ferienwohnungen München,"Ludwigsvorstadt, Munich",Show on map,8.3,317.0,2,"3 beds (2 sofa beds, 1 large double)","1 night, 1 adult",265.0,munich
1,https://www.booking.com/hotel/de/ausbildungsho...,Ausbildungshotel St. Theresia,"Neuhausen - Nymphenburg, Munich",Show on map,8.3,1538.0,2,1 single bed,"1 night, 1 adult",52.0,munich
2,https://www.booking.com/hotel/de/edenwolff.en-...,Eden Hotel Wolff,"Maxvorstadt, Munich",Show on map,8.2,4883.0,2,1 single bed,"1 night, 1 adult",122.0,munich
3,https://www.booking.com/hotel/de/motel-one-mun...,Motel One München-Campus,"Obergiesing - Fasangarten, Munich",Show on map,8.4,6591.0,2,1 double bed,"1 night, 1 adult",79.0,munich
4,https://www.booking.com/hotel/de/hotelpreysing...,JAMS Music Hotel Munich,"Au-Haidhausen, Munich",Show on map,8.3,1763.0,2,1 double bed,"1 night, 1 adult",176.0,munich


In [38]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Count the number of prices in different ranges
under_100_count = (combined_hotel['price'] < 100).sum()
between_100_300_count = ((combined_hotel['price'] >= 100) & (combined_hotel['price'] <= 300)).sum()
over_500_count = (combined_hotel['price'] > 500).sum()

# Print the counts (hotels priced above 300 and up to 500 euro are not covered by these ranges)
print("Number of hotels:")
print(f"- Under 100 euro: {under_100_count}")
print(f"- Between 100 and 300 euro: {between_100_300_count}")
print(f"- Over 500 euro: {over_500_count}")

Number of hotels:
- Under 100 euro: 416
- Between 100 and 300 euro: 751
- Over 500 euro: 8
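
Hand-written range filters can drift out of step with their labels, and any price that falls between two filters is silently left out. Below is a minimal sketch of the same kind of count using `pd.cut`, which assigns every price to exactly one contiguous bin. The bin edges, labels, and sample prices are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative prices; in the notebook this would be combined_hotel['price']
prices = pd.Series([23.0, 99.0, 100.0, 292.0, 379.0, 454.0, 500.0, 650.0])

# right=False makes each bin include its lower edge: [0, 100), [100, 500), [500, inf)
bins = [0, 100, 500, np.inf]
labels = ['Under 100 euro', '100 to under 500 euro', '500 euro and over']
price_band = pd.cut(prices, bins=bins, labels=labels, right=False)

# Count hotels per band, keeping the bands in their natural order
band_counts = price_band.value_counts().reindex(labels)
print(band_counts)
```

Because the bins share their edges, the band counts always add up to the number of non-missing prices.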


In [39]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Group the DataFrame by the 'city' column and count the number of hotels under 100 euros in each city
under_100_count_per_city = combined_hotel[combined_hotel['price'] < 100].groupby('city').size()

# Print the counts for each city
print("Number of hotels under 100 euro in each city:")
print(under_100_count_per_city)


Number of hotels under 100 euro in each city:
city
berlin      97
bremen      60
hamburg     29
munich     230
dtype: int64


## Recommendation Engine Development


### City Recommendation Engine

In [40]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Preprocess the text data
# Requires NLTK tokenizer and stopword data, downloaded beforehand with nltk.download()
import string

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Build the lookup structures once instead of on every call
PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)
STOP_WORDS = set(stopwords.words('english'))

def preprocess_text(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Remove punctuation and lowercase the tokens
    tokens = [w.translate(PUNCTUATION_TABLE).lower() for w in tokens]
    # Remove stopwords
    tokens = [word for word in tokens if word not in STOP_WORDS]
    # Join the tokens back into a string
    return ' '.join(tokens)

# Apply preprocessing to the "description" column
filtered_df['preprocessed_description'] = filtered_df['description'].apply(preprocess_text)


In [41]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
from bertopic import BERTopic

# Initialize BERTopic and ask it to reduce the result to 24 topics
topic_model = BERTopic(nr_topics=24)

# Fit the model on preprocessed descriptions and store each row's topic ID
topics, _ = topic_model.fit_transform(filtered_df['preprocessed_description'])
filtered_df['topic'] = topics


# Print one example description (the first match) for each topic;
# topic -1 is BERTopic's label for outlier documents
print("Topics and Associated Descriptions:")
for topic_id in filtered_df['topic'].unique():
    topic_description = filtered_df[filtered_df['topic'] == topic_id]['description'].iloc[0]
    print(f"Topic {topic_id}: {topic_description}")

Topics and Associated Descriptions:
Topic -1: Proof that Hanover is not all buttoned-down business are the grandiose Baroque Royal Gardens of Herrenhausen, about 5km north of the city centre, which…
Topic 17: The Nazi-built camp at Bergen-Belsen began its existence in 1940 as a POW camp, but became a concentration camp after being taken over by the SS in 1943,…
Topic 20: For art lovers, the highlight of Bremen’s Kulturmeile (Cultural Mile) is the Kunsthalle, which presents a large permanent collection of paintings,…
Topic 11: The charming medieval coopers lane was transformed into a prime example of mostly expressionist architecture in the 1920s at the instigation of coffee…
Topic 16: An excellent way to get your bearings in Hanover is to visit the Neues Rathaus (built 1901–13) and ascend 98m in the curved elevator (the only one of its…
Topic 21: This shiny, space-age museum offers a journey around the world along the longitudinal meridian 8° east, through climate zones in Switzerland,

#### Choosing the Topic


In [42]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
topic_model.get_topic_info()

Unnamed: 0,Topic,Count,Name,Representation,Representative_Docs
0,-1,515,-1_berlin_one_the_museum,"[berlin, one, the, museum, building, st, germa...",[former print shop nonprofit museum one large...
1,0,13,0_sand_beach_island_sky,"[sand, beach, island, sky, overlooking, hotel,...",[hamburg original beach bar must primo beer b...
2,1,20,1_cinema_movies_arthouse_films,"[cinema, movies, arthouse, films, screens, ind...",[petite singlescreen flick palace presented mo...
3,2,20,2_team_football_sporting_games,"[team, football, sporting, games, events, bund...",[fans sv hamburg city successful football tea...
4,3,11,3_wwii_bombing_reconstructed_church,"[wwii, bombing, reconstructed, church, 1943, r...",[former gothic church dating 1347 never repa...
5,4,24,4_theatre_shows_stages_musical,"[theatre, shows, stages, musical, contemporary...",[tipi stages yearround program professional ca...
6,5,10,5_opera_bertolt_langhoff_brecht,"[opera, bertolt, langhoff, brecht, staatsoper,...",[berlin ’ opulent state opera commissioned roy...
7,6,47,6_djs_club_dance_hiphop,"[djs, club, dance, hiphop, parties, crowd, and...",[part livemusic venue part club glocksee eve...
8,7,24,7_jazz_club_blues_concerts,"[jazz, club, blues, concerts, music, venue, ho...",[one berlin oldest live music clubs quasimodo...
9,8,13,8_stalls_saturday_leafy_market,"[stalls, saturday, leafy, market, wednesday, s...",[wednesday saturday morning luck hohum winter...


In [43]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false
# Prompt the user to choose a topic
chosen_topic_id = int(input("Please choose a topic ID from the list above: "))

# Print the chosen topic and associated descriptions
print(f"\nChosen Topic {chosen_topic_id} and Associated Descriptions:")
chosen_topic_descriptions = filtered_df[filtered_df['topic'] == chosen_topic_id]['description']
for description in chosen_topic_descriptions:
    print(description)


Chosen Topic 4 and Associated Descriptions:
An old-school type of variety theatre with dancing, acrobatics, circus-style acts, magic, music and more, housed in the Georgspalast. It also boasts the…
Home to the Staatstheater Hannover and stages both new works and classics like Medea and Hamlet. It's sometimes used to host international performances…
Stages excellent performances of contemporary theatre and cabaret. Performances are in German.
Stages various comedies and theatre with musical accompaniment.
The highly acclaimed Bremer Shakespeare Company mixes the Bard (in German) with fairy tales and contemporary works.
This highly respected cultural centre showcases contemporary non-European art, music, dance, literature, films and theatre, and also serves as a…
Life’s still a cabaret at this intimate 1912 mirrored art nouveau tent theatre, one of Berlin's most beloved venues for sophisticated song-and-dance shows…
This musical and visual extravaganza, starring slightly nutty and energ

The cell above prompts the user for a topic ID, then retrieves and prints every description in `filtered_df` that was assigned to that topic.
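
Because `int(input(...))` raises an error on non-numeric text and accepts IDs that do not exist, a small validation step can make the prompt more forgiving. Below is a minimal sketch, assuming the valid IDs are taken from the `topic` column. `parse_topic_choice` is a hypothetical helper, not part of the original notebook.

```python
def parse_topic_choice(raw, valid_ids):
    """Return the chosen topic ID as an int, or None if the input is not a known ID."""
    try:
        topic_id = int(raw.strip())
    except ValueError:
        return None
    return topic_id if topic_id in valid_ids else None

# Illustrative IDs; in the notebook these could come from set(filtered_df['topic'])
valid_ids = {-1, 0, 1, 2, 3, 4}
print(parse_topic_choice(' 4 ', valid_ids))
print(parse_topic_choice('theatre', valid_ids))
print(parse_topic_choice('99', valid_ids))
```

A caller could keep prompting until the helper returns something other than `None`.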

In [44]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
topic_model.visualize_topics()


In [45]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
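# Attach to every row the full list of topics found at its location.
# Note: each location's list is repeated once per row at that location, so the
# flattened lists below (and any counts derived from them) are inflated by that factor.
merged_df = pd.merge(filtered_df, filtered_df.groupby('location')['topic'].apply(list).reset_index(name='topics'), on='location')

# Group by location and flatten the per-row topic lists into one list
city_topics = merged_df.groupby('location')['topics'].agg(lambda x: [item for sublist in x for item in sublist]).reset_index()

# Print locations (cities and districts) and their associated topics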
print("Cities and Associated Topics:")
for index, row in city_topics.iterrows():
    print(f"{row['location']}: {row['topics']}")

Cities and Associated Topics:
Altstadt: [-1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12, -1, -1, -1, 2, -1, -1, -1, -1, 16, -1, 10, -1, 20, 6, -1, 19, 11, 12

In [46]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Now let's visualize these cities and their associated topics on a scatterplot
import plotly.express as px

# Create a DataFrame for the scatterplot
scatter_df = pd.DataFrame({
    'location': city_topics['location'],
    'Topic': city_topics['topics']
})

# Reshape the DataFrame so that each topic gets its own row
scatter_df = scatter_df.explode('Topic').reset_index(drop=True)

# Create the scatterplot
fig = px.scatter(scatter_df, x='location', y='Topic', color='Topic',
                 title='Cities and Associated Topics',
                 labels={'location': 'Location', 'Topic': 'Topic'},
                 width=800, height=600)

# Update layout for better visualization
fig.update_traces(marker=dict(size=12, opacity=0.8),
                  selector=dict(mode='markers'))

# Show the plot
fig.show()


In [47]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Filter the DataFrame for the chosen topic
chosen_topic_cities = scatter_df[scatter_df['Topic'] == chosen_topic_id]

# Count the occurrences of each location for the chosen topic
city_counts = chosen_topic_cities['location'].value_counts()

# Get the city with the highest count
most_common_city = city_counts.idxmax()
count_of_most_common_city = city_counts.max()

print(f"The most relevant city for {chosen_topic_id} is {most_common_city} with a count of {count_of_most_common_city}.")

The most relevant city for 4 is Munich with a count of 860.


In [48]:
#| echo: true
#| code-fold: false
#| panel: input
#| output: true
most_common_city

'Munich'

In [49]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Use separate names inside the loop so it does not overwrite most_common_city,
# which holds the result for the chosen topic
for topic_num in range(-1, 24):
    # Filter the DataFrame for the current topic
    topic_cities = scatter_df[scatter_df['Topic'] == topic_num]

    # Check if there are any entries for this topic
    if not topic_cities.empty:
        # Count the occurrences of each location for the current topic
        topic_city_counts = topic_cities['location'].value_counts()

        # Get the location with the highest count
        top_city = topic_city_counts.idxmax()
        top_count = topic_city_counts.max()

        print(f"The most relevant city for Topic {topic_num} is {top_city} with a count of {top_count}.")
    else:
        print(f"No data for Topic {topic_num}.")

The most relevant city for Topic -1 is Munich with a count of 15136.
The most relevant city for Topic 0 is Kreuzberg with a count of 182.
The most relevant city for Topic 1 is Berlin with a count of 1312.
The most relevant city for Topic 2 is Munich with a count of 1376.
The most relevant city for Topic 3 is Lower Saxony & Bremen with a count of 212.
The most relevant city for Topic 4 is Munich with a count of 860.
The most relevant city for Topic 5 is Berlin with a count of 328.
The most relevant city for Topic 6 is Munich with a count of 1204.
The most relevant city for Topic 7 is Munich with a count of 688.
The most relevant city for Topic 8 is Hamburg with a count of 219.
The most relevant city for Topic 9 is Munich with a count of 344.
The most relevant city for Topic 10 is Berlin with a count of 328.
The most relevant city for Topic 11 is Berlin with a count of 656.
The most relevant city for Topic 12 is Berlin with a count of 2788.
The most relevant city for Topic 13 is Hamburg 

These statements list the most relevant location for each topic, chosen as the location with the highest count of associated entries.
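
The counts above grow with the number of rows at each location, because every row is merged with its location's full topic list before the lists are flattened. Below is a minimal sketch of counting each (topic, location) pair once, directly from the frame. The toy data and the name `top_location_per_topic` are illustrative, and the column names follow `filtered_df`.

```python
import pandas as pd

# Toy stand-in for filtered_df with one row per attraction
toy_df = pd.DataFrame({
    'location': ['Munich', 'Munich', 'Munich', 'Berlin', 'Berlin', 'Berlin', 'Hamburg'],
    'topic':    [4,        4,        7,        4,        7,        7,        7],
})

# Rows are topics, columns are locations, cells are plain occurrence counts
counts = pd.crosstab(toy_df['topic'], toy_df['location'])

# For each topic, pick the location (column) with the largest count
top_location_per_topic = counts.idxmax(axis=1)
print(counts)
print(top_location_per_topic)
```

With this approach a location with many attractions no longer gains an advantage on every topic, because each attraction is counted only once.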

In [50]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Map location names to the lowercase city keys used in the tour and hotel datasets
city_key_map = {
    "Lower Saxony & Bremen": "bremen",
    "Berlin": "berlin",
    "Munich": "munich",
    "Hamburg": "hamburg",
}
most_common_city = city_key_map.get(most_common_city, most_common_city)

In [51]:
most_common_city

'munich'

### Tour Recommender Engine

In [52]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Filter rows for the recommended city (here "munich")
city_tours = all_tours[all_tours['city'] == most_common_city]

# Display the first few rows of the new dataset
city_tours.head()


Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city,price_category
1045,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof VIP All-In...,4.5,445.0,Sightseeing Packages,6+ hours,Leave Munich for a full-day tour to two royal ...,215.0,per adult,munich,Moderate
1046,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof Palace Day...,4.5,242.0,Historical Tours,6+ hours,"Drive with an airconditioned, comfortable coac...",76.0,per adult,munich,Moderate
1047,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof Palace Day...,4.5,,Historical Tours,6+ hours,"Bavaria is famous for its fairy-tale castles, ...",80.0,per adult,munich,Moderate
1049,https://www.tripadvisor.com/AttractionProductR...,Dachau Concentration Camp Memorial Site Tour f...,5.0,,Historical Tours,5 hours,Visiting Germany’s Dachau Concentration Camp M...,52.0,per adult,munich,Moderate
1050,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle Tour from Munich,5.0,,Historical Tours,6+ hours,Visiting Bavaria’s magical Neuschwanstein Cast...,80.0,per adult,munich,Moderate


In [53]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
def recommend_tours(budget):
    """Return the 5 highest-scored tours in the price tier that matches the budget."""
    if budget <= 1000:
        # Low price tier: tours costing at most 50
        tier_tours = city_tours[city_tours['price'] <= 50]
    elif budget <= 10000:
        # Moderate price tier: tours costing more than 50 and at most 300
        tier_tours = city_tours[(city_tours['price'] > 50) & (city_tours['price'] <= 300)]
    else:
        # High price tier: tours costing more than 300
        tier_tours = city_tours[city_tours['price'] > 300]

    # Get the top 5 tours with the highest score in the selected tier
    return tier_tours.nlargest(5, 'score')


In [54]:
#| echo: true
#| code-fold: true
#| panel: input
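# Convert 'score' to float
def extract_score(score_str):
    """Return the leading number of a score string as a float, or None if parsing fails."""
    try:
        # Take the first whitespace-separated part of the string and convert it to float
        return float(score_str.split()[0])
    except (AttributeError, IndexError, ValueError):
        # Non-string values (for example NaN or already-numeric scores),
        # empty strings, and non-numeric text all end up here
        return None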

# Apply the function to 'score' column
all_tours['score'] = all_tours['score'].apply(extract_score)

In [55]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city,price_category
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,,784.0,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,3.0,per adult,hamburg,Low
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,,178.0,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,264.0,per group,hamburg,Moderate
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,,63.0,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,22.0,per adult,hamburg,Low
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,,187.0,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,31.0,per adult,hamburg,Low
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,,21.0,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,38.0,per adult,hamburg,Low


In [56]:
# Ask the user for their budget for a 1-week trip
budget = float(input("Please enter your budget for a 1-week trip: "))

print(f"\nYour budget for a 1-week trip is {budget}€.")


Your budget for a 1-week trip is 200.0€.


When this code runs, it prompts the user for their budget for a 1-week trip and then prints the amount entered, followed by the euro symbol.
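Because `float(input(...))` raises an error on anything that is not a plain number, the prompt could be paired with a small validation helper. The sketch below is an assumption about how that might look; `parse_budget` is a hypothetical name and is not part of the notebook.

```python
import math


def parse_budget(raw):
    """Convert user-entered text into a non-negative budget.

    Accepts a comma as the decimal separator and an optional trailing euro sign.
    Thousands separators are not supported: "1,000" is read as 1.0.
    Raises ValueError for anything that is not a finite, non-negative number.
    """
    cleaned = raw.strip().rstrip("€").strip().replace(",", ".")
    value = float(cleaned)  # raises ValueError on non-numeric input
    if not math.isfinite(value) or value < 0:
        raise ValueError("budget must be a finite, non-negative number")
    return value


print(parse_budget("200"))
print(parse_budget(" 1500,50 € "))
```

In the notebook, the helper would wrap the existing prompt, for example `budget = parse_budget(input(...))`, with a retry loop around it if repeated prompting is wanted.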

In [57]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
# Example usage
top_tours = recommend_tours(budget)

print(f"Top 5 Tours in {most_common_city.title()} based on Budget:")
top_tours.head()

Top 5 Tours in Munich based on Budget:


Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city,price_category
1051,https://www.tripadvisor.com/AttractionProductR...,Third Reich Walking Tour Munich,5.0,,Historical Tours,2–3 hours,As birthplace of the Nazi Party and home of it...,35.0,per adult,munich,Low
1061,https://www.tripadvisor.com/AttractionProductR...,Munich Sightseeing Guided Bike Tour,5.0,527.0,Historical Tours,3–4 hours,See more of Munich in a short amount of time t...,44.0,per adult,munich,Low
1074,https://www.tripadvisor.com/AttractionProductR...,Full-Day Dachau Concentration Camp Memorial Si...,5.0,285.0,Public Transportation Tours,6 hours,Dachau Concentration Camp Memorial Site is a p...,50.0,per adult,munich,Low
1078,https://www.tripadvisor.com/AttractionProductR...,Dachau Small-Group Half-Day Tour from Munich B...,5.0,224.0,Historical Tours,5 hours,Pay respects to the innocent victims of Hitler...,50.0,per adult,munich,Low
1083,https://www.tripadvisor.com/AttractionProductR...,Dachau Tour from Munich,5.0,442.0,Historical Tours,5 hours,All of our guides are officially authorized to...,50.0,per adult,munich,Low


### To-do Recommendation Engine

In [58]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false

# Convert the lowercase city name back to the capitalized form used in filtered_df['location']
location_name_map = {
    "bremen": "Bremen",
    "berlin": "Berlin",
    "munich": "Munich",
    "hamburg": "Hamburg",
}
most_common_city = location_name_map.get(most_common_city, most_common_city)

# Filter the DataFrame for the most relevant city
city_df = filtered_df[filtered_df['location'] == most_common_city]


In [59]:
most_common_city


'Munich'

In [60]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Keep the rows of the selected city that belong to the chosen topic
# (every remaining row has the same 'topic' value, so the sort does not rank them by relevance)
topic_3_sorted = city_df[city_df['topic'] == chosen_topic_id].sort_values(by='topic', ascending=False)

# Take the first 5 things to do in the city for the chosen topic
top_5_things_to_do = topic_3_sorted.head(5)[['website1', 'name', 'location', 'description', 'type', 'city']]


In [61]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
print(f"The best 5 things to do in {most_common_city} for Topic 3 are:")
top_5_things_to_do.head()

The best 5 things to do in Munich for Topic 3 are:


Unnamed: 0,website1,name,location,description,type,city
1654,https://www.lonelyplanet.com/germany/munich/al...,Bayerisches Staatsschauspiel,Munich,This leading ensemble has gone alternative in ...,to_entertain,Munich
1655,https://www.lonelyplanet.com/germany/munich/al...,Münchner Kammerspiele,Munich,"A venerable theatre with an edgy bent, the Kam...",to_entertain,Munich
1663,https://www.lonelyplanet.com/germany/munich/ma...,Münchner Theater für Kinder,Munich,At the Münchner Theater für Kinder budding the...,to_entertain,Munich
1664,https://www.lonelyplanet.com/germany/munich/al...,Staatstheater am Gärtnerplatz,Munich,Spruced up to southern German standards for it...,to_entertain,Munich
1667,https://www.lonelyplanet.com/germany/munich/ha...,GOP Varieté Theater,Munich,"Hosts a real jumble of acts and shows, from ma...",to_entertain,Munich


### Restaurant Recommendation

In [62]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
import string

from bertopic import BERTopic
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Build the stopword set and punctuation table once instead of on every call
# (the NLTK tokenizer and stopword resources must already be downloaded)
STOP_WORDS = set(stopwords.words('english'))
PUNCTUATION_TABLE = str.maketrans('', '', string.punctuation)

# Preprocess the text data
def preprocess_text(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Remove punctuation and lowercase the tokens
    tokens = [w.translate(PUNCTUATION_TABLE).lower() for w in tokens]
    # Remove stopwords and tokens left empty after punctuation removal
    tokens = [word for word in tokens if word and word not in STOP_WORDS]
    # Join the tokens back into a string
    return ' '.join(tokens)

# Apply preprocessing to the "description" column
filtered_rest_df['preprocessed_description'] = filtered_rest_df['description'].apply(preprocess_text)

# Initialize BERTopic model
topic_model = BERTopic()

# Fit the model on preprocessed descriptions (passed as a list of strings)
topics, _ = topic_model.fit_transform(filtered_rest_df['preprocessed_description'].tolist())
filtered_rest_df['topic'] = topics


In [63]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Print one example description per topic in filtered_rest_df
print("Topics and Associated Descriptions for filtered_df:")
for topic_id in filtered_rest_df['topic'].unique():
    topic_description = filtered_rest_df[filtered_rest_df['topic'] == topic_id]['description'].iloc[0]
    print(f"Topic {topic_id}: {topic_description}")

Topics and Associated Descriptions for filtered_df:
Topic -1: This popular Syrian restaurant a stone's throw from the Hauptbahnhof features hushed, candlelit dining and attentive service within a pleasant, minimalist…
Topic 2: Pale-green louvres and bamboo lanterns set a French-colonial vibe for this trendy new Oststadt restaurant offering a modern-fusion take on traditional…
Topic 0: The lopsided walls of this rose-covered, 500-year-old house add to its charm. Inside are historic rooms and, in one corner, a round, glass-topped table…
Topic 5: This wonderful seafood restaurant is spread out over several dining rooms with distinct nautical styling (including one that purports to be a copy of…
Topic 3: Feast on zebra steak (yes, really), thieboudienne (a traditional Senegalese fish dish) and other African specialities at this little restaurant in the…
Topic 1: This long-standing local favourite evokes a slight Mediterranean vibe, and serves up salads, fresh pastas and an inspired selecti

In [64]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Attach to every row the list of all topics found in the same location
merged_df = pd.merge(filtered_rest_df, filtered_rest_df.groupby('location')['topic'].apply(list).reset_index(name='topics'), on='location')

# Group by location and concatenate the per-row topic lists.
# Each of a location's n rows carries the full n-item list, so the result holds n*n entries.
city_topics = merged_df.groupby('location')['topics'].agg(lambda x: [item for sublist in x for item in sublist]).reset_index()

# Print cities and their associated topics
print("Cities and Associated Topics:")
for index, row in city_topics.iterrows():
    print(f"{row['location']}: {row['topics']}")

Cities and Associated Topics:
Altstadt: [-1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3, -1, -1, -1, 1, -1, -1, -1, 3, 0, 5, -1, 3]
Berlin: [4, -1, -1, 7, -1, -1, 6, 0, 0, 0, -1, -1, 1, 1, -1, 3, 2, 2, 0, 0, -1, -1, 1, 2, 0, 1, 6, -1, -1, 0, 4, -1, -1, 0, 4, 0, -1, -1, 4, 2, -1, -1, 1, 0, -1, 6, -1, -1, 0, -1, 3, -1, 2, -1, 1, 4, -1, 4, -1, -1, 7, -1, -1, 6, 0, 0, 0, -1, -1, 1, 1, -1, 3, 2, 2, 0, 0, -1, -1, 1, 2, 0, 1, 6, -1, -1, 0, 4, -1, -1, 0, 4, 0, -1, -1, 4, 2, -1, -1, 1, 0, -1, 6, -1, -1, 0, -1, 3, -1, 2, -1, 1, 4, -1, 4, -1, -1, 7, -1, -1, 6, 0, 0, 0, -1, -1, 1,

In [65]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Now let's visualize these cities and their associated topics on a scatterplot
import plotly.express as px

# Create a DataFrame for the scatterplot
scatter_df = pd.DataFrame({
    'location': city_topics['location'],
    'Topic': city_topics['topics']
})

# Reshape the DataFrame so that each topic gets its own row
scatter_df = scatter_df.explode('Topic').reset_index(drop=True)

# Create the scatterplot
fig = px.scatter(scatter_df, x='location', y='Topic', color='Topic',
                 title='Cities and Associated Topics',
                 labels={'location': 'location', 'Topic': 'Topic'},
                 width=800, height=600)

# Update layout for better visualization
fig.update_traces(marker=dict(size=12, opacity=0.8),
                  selector=dict(mode='markers'))

# Show the plot
fig.show()


In [66]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
# Select the topic to analyse; the same ID is used in the filter and in the message
selected_topic = 2

# Filter the DataFrame for the selected topic
topic_cities = scatter_df[scatter_df['Topic'] == selected_topic]

# Count the occurrences of each city for the selected topic
city_counts = topic_cities['location'].value_counts()

# Get the city with the highest count
most_common_city = city_counts.idxmax()
count_of_most_common_city = city_counts.max()

print(f"The most relevant city for Topic {selected_topic} is {most_common_city} with a count of {count_of_most_common_city}.")

The most relevant city for Topic 2 is Munich with a count of 450.


In [67]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
topic_model.visualize_topics()


In [68]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Manually set the city to use for the restaurant recommendations
most_common_city = "Berlin"

# Filter the DataFrame for the most relevant city
city_df = filtered_rest_df[filtered_rest_df['location'] == most_common_city]


In [69]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Keep the restaurants of the selected city that belong to Topic 3
# (every remaining row has the same 'topic' value, so the sort does not rank them by relevance)
topic_3_sorted = city_df[city_df['topic'] == 3].sort_values(by='topic', ascending=False)

# Take the first 5 restaurants in the city for Topic 3
top_5_things_to_do = topic_3_sorted.head(5)[['website1', 'name', 'location', 'description', 'type', 'city']]


In [70]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
print(f"The best restaurants to visit in {most_common_city} for Topic 3 are:")
top_5_things_to_do.head()

The best restaurants to visit in Berlin for Topic 3 are:


Unnamed: 0,website1,name,location,description,type,city
656,https://lonelyplanetstatic.imgix.net/marketing...,Valladares Feinkost,Berlin,Beautifully situated on the quiet Stephanplatz...,to_eat,Berlin
807,https://lonelyplanetstatic.imgix.net/marketing...,geh Veg,Berlin,"Showing off the simple side of vegan eating, g...",to_eat,Berlin


### User Interface Design


The Travel Destination Recommender we built is operational and combines city, tour, to-do, and restaurant suggestions derived from the user's preferences and our data analysis. However, some topics have no data for certain cities. Such cases could be handled gracefully, for example by suggesting alternative recommendations.
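One way to handle a city and topic combination without data is a fixed fallback order. The sketch below is only an illustration under assumptions: the function name, the record layout (`city` and `topic` keys), and the fallback order itself are hypothetical and not part of the notebook.

```python
def recommend_with_fallback(items, city, topic, limit=5):
    """Return up to `limit` items for (city, topic), falling back when none exist.

    Fallback order (an assumption of this sketch): the same topic in other
    cities first, then any topic in the requested city.
    Returns (recommendations, source), where `source` names the rule that matched.
    """
    exact = [item for item in items if item["city"] == city and item["topic"] == topic]
    if exact:
        return exact[:limit], "exact"
    same_topic = [item for item in items if item["topic"] == topic]
    if same_topic:
        return same_topic[:limit], "same_topic_other_city"
    same_city = [item for item in items if item["city"] == city]
    if same_city:
        return same_city[:limit], "same_city_any_topic"
    return [], "none"


# Hypothetical toy records
items = [
    {"name": "Harbor cruise", "city": "Hamburg", "topic": 3},
    {"name": "Theatre night", "city": "Munich", "topic": 3},
    {"name": "Beer garden", "city": "Berlin", "topic": 1},
]
recommendations, source = recommend_with_fallback(items, "Berlin", 3)
print(source, [item["name"] for item in recommendations])
```

Returning the matched rule alongside the results lets the interface tell the user that the suggestions are alternatives rather than exact matches.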

The user interface and user experience are a potential area for improvement: we aim to make the system more intuitive and user-friendly. In addition, more advanced recommendation algorithms or machine learning techniques could improve the accuracy and relevance of the recommendations.

# 3. Analysis and Results


### **Data Analysis Techniques Used**

**Topic Modeling:**
To uncover hidden topics and their relationships in the travel data, we used topic modeling based on BERT (Bidirectional Encoder Representations from Transformers), a language model developed by Google (Devlin et al., 2019). Because BERT is a transformer-based model trained bidirectionally, it captures the contextual meaning of a word from the text on both sides of it, which yields more accurate topic representations. Our process adapted a pre-trained BERT-based model to our travel dataset (the notebook fits a `BERTopic` model on the text descriptions), which includes descriptions, reviews, and other textual information about attractions, restaurants, hotels, and local experiences. This contextual understanding let us automatically identify and group similar items into topics such as 'historical landmarks', 'outdoor adventures', or 'culinary experiences'. Compared with LDA, which represents documents as bags of words, BERT-based topic modeling takes word context into account.

**Content-Based Filtering:**
To recommend attractions, restaurants, hotels, and other travel experiences that match a user's preferences and interests, we used the textual content of these entities (descriptions, reviews, and so on). From this text we extracted features and characteristics that could be matched against the user's interests. The techniques used for this purpose include text preprocessing, feature extraction (e.g., TF-IDF vectorization), and similarity calculations (e.g., cosine similarity).
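The TF-IDF and cosine similarity steps can be illustrated with a small, dependency-free sketch. It is not the project's implementation: the function names, the whitespace tokenizer, the smoothed IDF formula, and the toy descriptions are all assumptions chosen for illustration. In practice a library class such as scikit-learn's `TfidfVectorizer` would usually do this work.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using a smoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(query, descriptions):
    """Return description indices ordered from most to least similar to `query`."""
    vectors = tfidf_vectors(descriptions + [query])
    query_vec, item_vecs = vectors[-1], vectors[:-1]
    scores = [cosine_similarity(query_vec, vec) for vec in item_vecs]
    return sorted(range(len(descriptions)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy descriptions and user interest
descriptions = [
    "historic castle tour with guide",
    "vegan restaurant with seasonal menu",
    "harbor cruise at sunset",
]
query = "guided tour of a historic castle"
ranking = rank_by_similarity(query, descriptions)
print(descriptions[ranking[0]])
```

The user's stated interest is treated as one more document, so items and interests live in the same vector space and can be compared directly.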

**Collaborative Filtering:**
We also integrated user ratings and reviews of travel experiences to find like-minded users and suggest options that others with similar preferences have liked. The methods used include user-based and item-based collaborative filtering, matrix factorization, and neighborhood-based approaches.
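As an illustration of the item-based variant, the sketch below scores unseen items by their cosine similarity to the items a user has already rated. The rating data, the function names, and the choice to treat missing ratings as 0 are assumptions made for this example, not the project's actual implementation.

```python
import math


def item_vectors(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    vectors = {}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; missing entries count as 0."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_items(ratings, user, limit=3):
    """Rank items the user has not rated by similarity to the items they rated."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, candidate_vec in vectors.items():
        if candidate in seen:
            continue
        # Weight each similarity by how much the user liked the rated item
        scores[candidate] = sum(
            cosine(candidate_vec, vectors[item]) * rating for item, rating in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:limit]


# Hypothetical toy ratings on a 1-5 scale
ratings = {
    "anna": {"castle_tour": 5, "bike_tour": 4},
    "ben": {"castle_tour": 4, "bike_tour": 5, "beer_garden": 2},
    "cara": {"harbor_cruise": 5, "beer_garden": 4},
    "dan": {"castle_tour": 5},
}
suggestions = recommend_items(ratings, "dan")
print(suggestions)
```

Here the bike tour ranks first for "dan" because the users who rated the castle tour highly also rated the bike tour highly.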


**Data Cleaning and Preprocessing:**
Extensive data cleaning and preprocessing steps were undertaken to ensure the quality and consistency of our travel data. These steps included handling missing values, removing duplicates, formatting data, and applying techniques like text normalization and tokenization to prepare the data for analysis.
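The cleaning steps listed above can be sketched in plain Python as follows. The record schema (`name`, `city`, `price`), the function names, and the rule of dropping records without a name or city are hypothetical choices for this example. On a DataFrame, pandas methods such as `drop_duplicates` and `fillna` cover the same ground.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def clean_records(records):
    """Normalize text fields, skip unusable records, and drop duplicates.

    Each record is a dict with 'name', 'city', and 'price' keys (hypothetical schema).
    """
    cleaned, seen = [], set()
    for record in records:
        # Collapse repeated whitespace and trim; treat a missing value as empty
        name = re.sub(r"\s+", " ", record.get("name") or "").strip()
        city = (record.get("city") or "").strip().lower()
        if not name or not city:
            continue  # a record without a name or city cannot be recommended
        key = (name.lower(), city)
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        # A missing price is kept as None so later steps can decide how to treat it
        cleaned.append({"name": name, "city": city, "price": record.get("price")})
    return cleaned


# Hypothetical toy records
records = [
    {"name": "  Hamburg   Dungeon ", "city": "Hamburg", "price": 31.0},
    {"name": "hamburg dungeon", "city": "hamburg ", "price": 31.0},
    {"name": None, "city": "Berlin", "price": 10.0},
    {"name": "Bike Tour", "city": "Munich", "price": None},
]
cleaned = clean_records(records)
print(cleaned)
print(tokenize("Münchner Theater für Kinder!"))
```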

Throughout the development process, we continuously explored, tried, and failed, and we gained a great deal of experience on the way to building an engine that provides accurate, personalized, and insightful travel recommendations for our users.

### **Presentation of Results**
The Travel Destination Recommender is designed to give you more than a basic rundown of what to see when you travel. It provides tailored recommendations across the different parts of your journey, all based on your preferences and interests. We suggest cities that suit you in an easy-to-read format, possibly ranked or shown on an interactive map. For these recommended cities, you get more than a list of places to visit without context: expect descriptions and images for must-see attractions and local gems. Guided tours and organized activities within your budget and desired duration are also part of the deal. Finally, so that you are not limited to typical tourist hotspots, we recommend restaurants that match your taste, dietary needs, and wallet.


# 4. Reflection and Conclusion

### **Challenges Faced**

While building the recommendation engine, we ran into limitations in the APIs provided by some travel websites. These limitations prevented us from collecting well-organized, clean data, which in turn hindered our efforts to develop a reliable and accurate recommendation system.

**Data Quality Issues:** The information obtained from our travel website sources was often incomplete or inconsistent, so a lot of work went into cleaning and standardizing the data. Ensuring data quality, which mainly meant dealing with missing values and a lack of uniformity in how the information is presented, was one of the major challenges in developing the recommendation system.

**Topic Modeling Limitations:** We used the BERT language model for topic modeling, but this approach had its limitations. The main ones were the difficulty of fine-tuning and optimizing the model to classify user interests and preferences precisely, and the insufficient amount of labeled data available for this complex task.

**Large-Scale Data Processing:** The amount of data in the project made it challenging to handle and manage efficiently, especially when processing large datasets across a variety of resources and user preferences. The computational requirements of large-scale data processing were a significant obstacle. Consequently, we limited the project to 4 main cities in Germany instead of covering all cities in Germany or Europe. The goal remains to cover as many cities as possible while providing high-quality recommendations.

### **Lessons Learned**

The project taught us several lessons.

**Standardize data collection and processing:** This is the first step towards developing a recommendation engine. A unified approach makes it easier to integrate new data sources and avoids repeating the same tasks for each system.

**Build domain knowledge:** Knowing the travel domain helps capture user preferences accurately, and real-world feedback guides the continual, iterative refinement and optimization of the algorithms.

**Plan for scalability:** The growing volume and complexity of user data call for robust algorithms, while resource constraints make efficiency a necessary consideration.

**Combine diverse skills:** The work requires domain expertise (to understand user preferences), machine learning (for algorithm development), data processing (to support the large volume of user data), and optimization techniques at the different stages up to production deployment.

**Balance sophistication and efficiency:** Achieving coherent topics requires fine-tuning BERT-based language models, which demands both algorithmic sophistication and computational efficiency, together with applicable domain knowledge. These models also need continuous improvement as users' requirements change.


The experience gained from this project has provided valuable insights into the challenges associated with developing accurate and scalable recommendation engines. The lessons learned highlight the importance of **continuously improving** the underlying algorithms and models to meet the evolving needs of users effectively.

### **Overall Significance of Findings:**

As the results show, the travel destination recommender engine evaluates a wide range of topics and their most relevant cities, with Munich and Berlin standing out as major spots. Munich's consistent top ranking across various topics, including history tours, places of interest, and nightlife, underscores its rich offerings and makes it an attractive choice for people with diverse interests. Berlin, although mentioned less often, stands out for its historical importance and lively food culture, especially in areas such as WWII history and vegan dining, and thus appeals to specific interest groups. The engine's ability to find the best tours within a user's budget and to identify highly rated restaurants shows its practical value for designing personalized yet cost-effective trips. Its detailed approach gives users insight into what each city offers and makes travel planning easier by aligning recommendations with user preferences.


### **Future Directions:**

Several future directions would build on the success of this project and enhance the Travel Destination Recommender further. Expanding our data sources to include more travel websites, blogs, and social media platforms would make the recommendations more diverse. User feedback could be used to tune the algorithms for accuracy. Multimedia data such as images and videos could give users visual insights into destinations and improve their experience of the service. A chatbot-style conversational interface would make travel planning more intuitive. Hybrid recommendation approaches, which combine multiple techniques, should be considered because they can lead to better results. As the data and the number of user requests grow, scalability and performance must be maintained and improved. Lastly, we can add sustainability and ethical considerations so that our recommendations promote responsible tourism.



# 5. References

### Websites and Tools:

- Tripadvisor. (n.d.). Retrieved from [https://www.tripadvisor.com/](https://www.tripadvisor.com/)
- Lonely Planet. (n.d.). Retrieved from [https://www.lonelyplanet.com/](https://www.lonelyplanet.com/)
- Yelp. (n.d.). Retrieved from [https://www.yelp.com/](https://www.yelp.com/)
- Booking.com. (n.d.). Retrieved from [https://www.booking.com/](https://www.booking.com/)
- Instant Data Scraper. (n.d.). Retrieved from [https://chrome.google.com/webstore/detail/instant-data-scraper/nmpgaoofmjlimabncmnmnopjabbflegg?hl=en](https://chrome.google.com/webstore/detail/instant-data-scraper/nmpgaoofmjlimabncmnmnopjabbflegg?hl=en)
- Octoparse. (n.d.). Retrieved from [https://www.octoparse.com/](https://www.octoparse.com/)

### Articles and Papers on Web Scraping and Recommender Systems:

- Gedikli, F., Jannach, D., & Ge, M. (2014). How should I explain? A comparison of different explanation types for recommender systems. *International Journal of Human-Computer Studies, 72(4)*, 367-382. [https://doi.org/10.1016/j.ijhcs.2013.12.007](https://doi.org/10.1016/j.ijhcs.2013.12.007)
- Silva, N. B., & Ribeiro, R. P. (2020). Transfer learning for recommender systems: A systematic review. *Expert Systems with Applications, 129*, 254-273. [https://doi.org/10.1016/j.eswa.2019.03.061](https://doi.org/10.1016/j.eswa.2019.03.061)
- Nyamsuren, B., & Kurniawan, K. N. (2020). Review of sentiment analysis: Case study of TripAdvisor and Yelp. *Journal of Information Technology and Digital World, 2(1)*, 1-12. [https://doi.org/10.36548/jitdw.2020.1.003](https://doi.org/10.36548/jitdw.2020.1.003)
- Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. *Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)*, 4171–4186. [https://doi.org/10.18653/v1/N19-1423](https://doi.org/10.18653/v1/N19-1423)

### Books:

- Russell, M. A. (2019). *Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Instagram, GitHub, and More*. O'Reilly Media.
- Aggarwal, C. C. (2016). *Recommender Systems: The Textbook*. Springer.

### Datasets:

- Tripadvisor Data. (n.d.). Retrieved from [https://www.tripadvisor.com/Data](https://www.tripadvisor.com/Data)
- Yelp Dataset. (n.d.). Retrieved from [https://www.yelp.com/dataset](https://www.yelp.com/dataset)

