# Project: **Travel Destination Recommender**


1. [Project Overview](#1-project-overview)
   - Introduction
   - Goals and Objectives
   - Significance of the Project
2. [Implementation Details](#2-implementation-details)
   - Data Collection
   - Data Preprocessing
   - Feature Extraction
   - Recommendation Engine Development
   - User Interface Design
3. [Analysis and Results](#3-analysis-and-results)
   - Data Analysis Techniques Used
   - Presentation of Results
   - Evaluation Metrics
4. [Reflection and Conclusion](#4-reflection-and-conclusion)
   - Challenges Faced
   - Lessons Learned
   - Overall Significance of Findings
   - Future Directions
5. [References](#5-references)
6. [Appendix](#6-appendix)
   - Code snippets (if applicable)
   - Visualizations


# 1. Project Overview

## 1.1 Introduction

Travel is an enriching experience that broadens our horizons, allowing us to immerse ourselves in new cultures, explore breathtaking landscapes, and create lasting memories. However, planning a trip can be a daunting task, with countless destinations, attractions, and activities to consider. This is where the Travel Destination Recommender comes into play, offering a personalized and seamless solution to help travelers discover their ideal vacation spot.

In today's digital age, an abundance of travel-related information is readily available on the internet, scattered across various websites and platforms. While this wealth of data can be overwhelming, it also presents an opportunity to leverage web scraping techniques to gather and curate relevant information from popular travel sites like TripAdvisor, Lonely Planet, and Yelp.

The Travel Destination Recommender harnesses the power of web scraping to collect and analyze data on attractions, hotels, restaurants, and local experiences, tailoring recommendations to individual preferences such as budget, interests, and travel dates. By consolidating and processing this vast array of information, the recommender system provides users with a comprehensive and personalized guide, streamlining the travel planning process and ensuring a memorable and fulfilling journey.

This innovative approach not only saves time and effort for travelers but also encourages the exploration of lesser-known destinations and hidden gems, fostering a more authentic and enriching travel experience. The Travel Destination Recommender represents a significant step forward in the realm of travel planning, empowering users to discover their dream destinations with ease and confidence.

## 1.2 Goals and Objectives


The Travel Destination Recommender project has the following goals and objectives:

**Goals:**

1. Develop a travel recommendation system that provides personalized suggestions for destinations, attractions, accommodations, and local experiences based on individual preferences.

2. Leverage web scraping techniques to gather and consolidate relevant data from popular travel websites, ensuring the recommender system has access to a vast and up-to-date database of travel-related information.

3. Encourage the exploration of lesser-known destinations and hidden gems, fostering a more authentic and enriching travel experience for users.

4. Continuously improve and refine the recommendation algorithms to provide increasingly accurate and relevant suggestions as the system accumulates more data and feedback.

**Objectives:**

1. Implement efficient web scraping methods to extract data from TripAdvisor, Lonely Planet, Yelp, and other relevant travel websites.

2. Develop a robust data processing pipeline to clean, structure, and integrate the collected data into a centralized database.

3. Design and implement a user-friendly interface that allows users to input their preferences, such as budget, interests, travel dates, and any specific requirements.

4. Develop personalized recommendation algorithms that leverage machine learning techniques to analyze user preferences and match them with suitable destinations, attractions, accommodations, and local experiences.

Together, these goals and objectives cover the two core components of the Travel Destination Recommender: web scraping to collect travel data, and personalized recommendation algorithms to match that data to each user's preferences.

## 1.3 Significance of the Project


The **Travel Destination Recommender** project offers benefits for both travelers and the travel industry:

1. **Personalized Travel Experience:** The recommender system tailors suggestions to individual preferences, interests, and budgets, ensuring a highly personalized travel experience. By considering factors such as travel dates, desired activities, and accommodation preferences, the system can provide customized recommendations that cater to each user's unique needs and aspirations.

2. **Efficient Travel Planning:** Planning a trip can be a time-consuming and overwhelming process, requiring extensive research across multiple websites and sources. The Travel Destination Recommender streamlines this process by consolidating relevant information from various travel platforms, saving users valuable time and effort while minimizing the risk of overlooking important details.

3. **Discovery of Hidden Gems:** Popular tourist destinations and attractions are well-documented, but the recommender system has the potential to uncover lesser-known destinations, off-the-beaten-path experiences, and local gems. By analyzing data from diverse sources, the system can suggest unique and authentic travel experiences that might otherwise be missed, enhancing the overall travel experience.

4. **Data-Driven Insights:** By leveraging web scraping techniques and machine learning algorithms, the Travel Destination Recommender can uncover valuable insights and patterns from vast amounts of travel data. These insights can inform decision-making processes for travel agencies, tourism boards, and other stakeholders, enabling them to better understand consumer preferences and tailor their offerings accordingly.

5. **Promoting Sustainable Tourism:** The recommender system can be designed to prioritize sustainable and responsible travel options, highlighting destinations and activities that align with eco-friendly practices and support local communities. This can contribute to the growth of sustainable tourism and raise awareness about the importance of minimizing the environmental impact of travel.

6. **Enhancing User Engagement:** By providing a seamless and personalized travel planning experience, the Travel Destination Recommender can foster increased user engagement and loyalty. Users are more likely to return to a platform that consistently delivers relevant and valuable recommendations, creating opportunities for monetization and long-term growth.

7. **Industry Innovation:** The project represents an innovative approach to travel planning, leveraging cutting-edge technologies such as web scraping, machine learning, and personalized recommendation algorithms. Its success can inspire further advancements and innovations within the travel industry, driving progress and enhancing the overall travel experience for consumers.

The Travel Destination Recommender project has the potential to revolutionize the way people plan and experience their travels, offering a personalized, efficient, and sustainable approach to exploring the world. By harnessing the power of data and technology, this project can significantly contribute to the growth and evolution of the travel industry while enhancing the overall quality of travel experiences for users.


# 2. Implementation Details


## Data Collection


### **Step 1: Choosing Travel Websites**

- TripAdvisor (https://www.tripadvisor.com/)
- Lonely Planet (https://www.lonelyplanet.com/)
- Yelp (https://www.yelp.com/)
- Booking.com (https://www.booking.com/)

### TripAdvisor (https://www.tripadvisor.com/)
TripAdvisor stands out as an ideal choice due to its comprehensive coverage of travel information. Boasting one of the largest repositories of user-generated reviews and ratings, it offers valuable insights into travelers' experiences and preferences. Moreover, TripAdvisor's attraction rankings based on popularity and user ratings make it an invaluable resource for identifying top destinations. With its global coverage spanning diverse destinations worldwide, it caters to a broad spectrum of travel interests and preferences.

### Lonely Planet (https://www.lonelyplanet.com/)
Lonely Planet emerges as a prime candidate for scraping travel data owing to its reputation for expert recommendations and curated travel guides. Renowned for its commitment to promoting authentic and off-the-beaten-path experiences, Lonely Planet offers travelers reliable insights into unique adventures. Its guides are not only informative but also provide in-depth cultural insights and historical background, enriching travelers' understanding of destinations. Additionally, Lonely Planet's specialized guides cater to various interests, including food, adventure, and cultural exploration, ensuring relevance to diverse traveler preferences.

### Yelp (https://www.yelp.com/)
Yelp is an excellent choice for scraping travel data, particularly regarding restaurant reviews and local business information. With a focus on providing comprehensive coverage of local businesses, including attractions, shops, and services, Yelp offers valuable information for travelers. Its user-generated content allows travelers to access authentic reviews and recommendations from fellow travelers and locals alike. Furthermore, Yelp's location-based search feature facilitates the discovery of nearby attractions, restaurants, and activities, streamlining the trip planning process for travelers.

### Booking.com (https://www.booking.com/)
Booking.com emerges as a suitable option for scraping accommodation-related data, given its extensive coverage of hotel and accommodation listings. As a leading platform for hotel bookings, Booking.com offers a vast selection of properties worldwide, catering to various budgets and preferences. The platform's user reviews and ratings provide valuable insights into the quality and guest experiences of accommodations, enabling travelers to make informed decisions. Additionally, Booking.com's range of booking options, including flexible cancellation policies and last-minute deals, enhances its appeal to travelers seeking convenience and flexibility in their bookings.


### **Step 2: Identifying Data to Scrape**



### Attractions (sights, landmarks, museums, parks)
Scraping data on attractions entails gathering information about notable sights, landmarks, museums, and parks in various destinations. These attractions serve as key points of interest for travelers, offering cultural, historical, and recreational experiences. By scraping data on attractions, the recommender system can provide travelers with insights into must-visit locations, including details on opening hours, admission fees, visitor reviews, and notable features.

### Hotels (ratings, prices, amenities)
Hotels constitute a crucial aspect of travel planning, providing accommodations for travelers during their trips. Scraping data on hotels involves gathering information on their ratings, prices, and amenities. Ratings and reviews from previous guests offer valuable insights into the quality and service standards of hotels, helping travelers make informed decisions. Additionally, pricing information enables travelers to compare options based on their budget constraints, while details on amenities such as Wi-Fi, breakfast, and parking facilities enhance the overall user experience.

### Restaurants (ratings, cuisines, prices)
Restaurants play a vital role in enhancing the travel experience by offering diverse culinary experiences and flavors unique to each destination. Scraping data on restaurants involves gathering information on their ratings, cuisines, and prices. User-generated ratings and reviews provide valuable feedback on the quality of food, service, and ambiance, aiding travelers in selecting dining options that align with their preferences. Moreover, details on cuisines and price ranges allow travelers to explore a variety of dining options that suit their tastes and budget.

### Local Experiences (tours, activities, events)
Local experiences encompass a wide range of tours, activities, and events that offer immersive and authentic cultural experiences for travelers. Scraping data on local experiences involves gathering information on guided tours, outdoor activities, cultural events, and entertainment options available in each destination. By providing insights into these experiences, the recommender system enables travelers to discover and participate in activities that align with their interests and preferences, enhancing their overall travel experience.

### User Preferences (budget, interests, travel dates)
User preferences constitute the foundation of personalized travel recommendations, guiding the selection of destinations and experiences that best suit each traveler. Unlike the categories above, this information is not scraped from websites; it is collected directly from users through the interface and covers budget constraints, travel interests, and travel dates. By understanding travelers' budget limitations, interests such as adventure, history, or culinary exploration, and travel dates, the recommender system can tailor recommendations to match individual preferences, ensuring a customized and enjoyable travel experience for each user.
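
The preference fields named above (budget, interests, travel dates) could be held in a small structure like the one below. This is a minimal sketch under stated assumptions: the class name `UserPreferences`, its field names, the US-dollar unit, and the two helper methods are hypothetical and are not taken from the project code.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class UserPreferences:
    """Hypothetical container for the preferences a traveler enters."""

    budget_usd: float  # assumed: maximum spend per item, in US dollars
    interests: List[str] = field(default_factory=list)
    start_date: Optional[date] = None
    end_date: Optional[date] = None

    def trip_length_days(self) -> Optional[int]:
        """Return the trip length in days (inclusive), or None if a date is missing."""
        if self.start_date is None or self.end_date is None:
            return None
        return (self.end_date - self.start_date).days + 1

    def fits_budget(self, price_usd: float) -> bool:
        """Check whether a single priced item is within the budget."""
        return price_usd <= self.budget_usd


prefs = UserPreferences(
    budget_usd=50.0,
    interests=["history", "food"],
    start_date=date(2024, 6, 1),
    end_date=date(2024, 6, 5),
)
print(prefs.trip_length_days())
print(prefs.fits_budget(34.0))
```

Keeping the preferences in one object makes it straightforward to pass them to whichever filtering or ranking step consumes them later.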


### **Step 3: Set Up Your Environment**


### Instant Data Scraper Web Extension
For this project, the Instant Data Scraper web extension was utilized as an alternative to traditional web scraping libraries like BeautifulSoup and requests. Instant Data Scraper is a browser extension available for Chrome and Firefox, designed specifically for scraping data from web pages with ease. By installing the extension, users gain access to a user-friendly interface that allows them to select and extract data directly from websites without writing code. This approach simplifies the process of data extraction, making it accessible to individuals without extensive programming knowledge. Additionally, the extension provides features for exporting scraped data in various formats, such as CSV or JSON, for further analysis and processing.

### Octoparse
In addition to the Instant Data Scraper web extension, Octoparse was employed as a web scraping tool for this project. Octoparse is a powerful and versatile web scraping software that enables users to extract data from websites using a visual point-and-click interface. With Octoparse, users can create scraping tasks by simply navigating through web pages and selecting the data they want to extract. The software offers advanced features such as automatic IP rotation, data export scheduling, and cloud-based extraction for handling large-scale scraping projects efficiently. Furthermore, Octoparse provides support for scraping dynamic and JavaScript-rendered web pages, ensuring compatibility with a wide range of websites and data sources.


### **Step 4: Start Scraping**



Now that we've set up our environment using web scraping tools like Instant Data Scraper and Octoparse, we can begin scraping data from each selected website. Instead of writing scripts, we'll utilize the intuitive interfaces provided by these tools to extract data seamlessly.

### Using Instant Data Scraper:
1. Launch the Instant Data Scraper web extension in your browser.
2. Navigate to the desired website, such as TripAdvisor or Lonely Planet.
3. Use the extension's point-and-click interface to select the data you want to scrape, such as top attractions in a city.
4. Configure the scraping settings, including pagination handling and data export options.
5. Run the scraper to extract the selected data from the website.
6. Export the scraped data in a suitable format, such as CSV or JSON, for further analysis.

### Using Octoparse:
1. Launch the Octoparse software on your computer.
2. Create a new scraping task and enter the URL of the target website.
3. Use Octoparse's visual workflow builder to define the extraction steps, such as navigating to specific pages and selecting data elements.
4. Configure advanced settings, such as automatic IP rotation and data export scheduling, as needed.
5. Run the scraping task to extract data from the website.
6. Review and verify the extracted data within the Octoparse interface.
7. Export the scraped data to your preferred format for analysis and processing.

By leveraging these web scraping tools, we can efficiently gather data from multiple websites without the need for writing complex scripts. This approach streamlines the scraping process and allows us to focus on refining and analyzing the extracted data for our travel destination recommender.
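
Because the exports come from point-and-click tools, it can help to confirm that each exported CSV carries the expected columns before further processing. The sketch below illustrates one way to do that with the standard library. The helper `missing_columns` and the sample data are hypothetical; the column list mirrors the tour exports used in this project.

```python
import csv
import io

# Column names as they appear in the tour exports of this project.
REQUIRED_TOUR_COLUMNS = [
    "website", "name", "score", "reviews", "type",
    "duration", "comment1", "price", "group", "city",
]


def missing_columns(csv_file, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    return [col for col in required if col not in header]


# Stand-in for an exported file; a real check would open the CSV from disk.
sample_export = io.StringIO(
    "website,name,score,price,city\n"
    "https://example.com/tour-1,Sample Walking Tour,4.5 of 5 bubbles,$22,berlin\n"
)
absent = missing_columns(sample_export, REQUIRED_TOUR_COLUMNS)
print(absent)
```

An empty result means the export can be loaded as-is; any listed name points to a field that was not selected in the scraping tool.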



### **Step 5: Refine Your Scraping Tasks**

- Refine the scraping tasks to extract specific data such as hotel details, restaurant reviews, and activity information.
- Handle pagination for websites with multiple pages of results.
- Adapt the scraping tasks when a website's structure changes.
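
The pagination point can be illustrated with a short sketch. Many listing pages expose results through an offset in the URL, and the idea is to generate one URL per results page. The URL template below is purely hypothetical (each site has its own scheme, and the scraping tools used here handle pagination through their own settings); `paginated_urls` is an illustrative helper, not part of the project.

```python
def paginated_urls(template, total_results, page_size):
    """Build one URL per results page from an offset-based template."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [
        template.format(offset=offset)
        for offset in range(0, total_results, page_size)
    ]


# Hypothetical listing with 75 results shown 30 per page.
urls = paginated_urls(
    "https://example.com/tours?offset={offset}",
    total_results=75,
    page_size=30,
)
for url in urls:
    print(url)
```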


## Data Preprocessing


In [2]:
#| echo: true
#| code-fold: true
#| output: false
import string

import matplotlib.pyplot as plt
import nltk
import pandas as pd
import seaborn as sns
from bertopic import BERTopic
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

%matplotlib inline

# Download the tokenizer models required by word_tokenize
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/emredeveci/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

### Must-See Attractions

#### Berlin

In [6]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false

# Load data from CSV file
berlin = pd.read_csv('/Users/emredeveci/Desktop/TDR/must-see-attractions/berlin_attractions - Sheet1.csv')
berlin.head()


Unnamed: 0,website1,website2,name,location,description,type,city
0,https://lp-cms-production.imgix.net/2019-06/d6...,https://www.lonelyplanet.com/germany/berline/m...,Museumsinsel,Museumsinsel & Alexanderplatz,"Walk through ancient Babylon, meet an Egyptian...",must_see,Berlin
1,https://lp-cms-production.imgix.net/2021-08/sh...,https://www.lonelyplanet.com/germany/berline/m...,Neues Museum,Museumsinsel & Alexanderplatz,"For over 60 years, not a soul was able to visi...",must_see,Berlin
2,https://lp-cms-production.imgix.net/2019-06/d5...,https://www.lonelyplanet.com/germany/berline/m...,Pergamonmuseum,Museumsinsel & Alexanderplatz,The Pergamonmuseum is one of Berlin’s most vis...,must_see,Berlin
3,https://lp-cms-production.imgix.net/2021-08/sh...,https://www.lonelyplanet.com/germany/berlin/fr...,East Side Gallery,Friedrichshain,The East Side Gallery is the embodiment of Ber...,must_see,Berlin
4,https://lp-cms-production.imgix.net/2024-02/Ge...,https://www.lonelyplanet.com/germany/berline/m...,Fernsehturm,Museumsinsel & Alexanderplatz,"Germany's tallest structure, the TV Tower is a...",must_see,Berlin


#### Bremen

In [12]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false

# Load data from CSV file
bremen = pd.read_csv('/Users/emredeveci/Desktop/TDR/must-see-attractions/bremen_attractions - Sheet1.csv')
bremen.head()


Unnamed: 0,website1,website2,name,location,description,type,city
0,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Herrenhäuser Gärten,Hanover,Proof that Hanover is not all buttoned-down bu...,must_see,Lower Saxony & Bremen
1,https://lp-cms-production.imgix.net/2019-06/2a...,https://www.lonelyplanet.com/germany/bergen-be...,Gedenkstätte Bergen-Belsen,Lower Saxony & Bremen,The Nazi-built camp at Bergen-Belsen began its...,must_see,Lower Saxony & Bremen
2,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Schloss Marienburg,Lower Saxony & Bremen,"Perched grandly above the Leine River, the neo...",must_see,Lower Saxony & Bremen
3,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/bremen-ci...,Denkort Bunker Valentin,Bremen City,"In 1943, the Nazis started construction of a m...",must_see,Lower Saxony & Bremen
4,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Autostadt,Lower Saxony & Bremen,"A hit with car buffs of all ages, Autostadt is...",must_see,Lower Saxony & Bremen


#### Hamburg

In [13]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false

# Load data from CSV file
hamburg = pd.read_csv('/Users/emredeveci/Desktop/TDR/must-see-attractions/hamburg_attractions - Sheet1.csv')
hamburg.head()


Unnamed: 0,website1,website2,name,location,description,type,city
0,https://lp-cms-production.imgix.net/2019-06/fd...,https://www.lonelyplanet.com/germany/hamburg/a...,Mahnmal St-Nikolai,Altstadt,St Nikolai church was the world’s tallest buil...,must_see,Hamburg
1,https://lp-cms-production.imgix.net/2019-06/b7...,https://www.lonelyplanet.com/germany/hamburg/s...,Fischmarkt,St Pauli & Reeperbahn,Here's the perfect excuse to stay up all Satur...,must_see,Hamburg
2,https://lp-cms-production.imgix.net/2019-06/eb...,https://www.lonelyplanet.com/germany/hamburg/s...,Elbphilharmonie,Hamburg,Welcome to one of the most Europe's most excit...,must_see,Hamburg
3,https://lp-cms-production.imgix.net/2019-06/d7...,https://www.lonelyplanet.com/germany/hamburg/a...,Hamburger Kunsthalle,Altstadt,A treasure trove of art from the Renaissance t...,must_see,Hamburg
4,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/hamburg/a...,Rathaus,Altstadt,"With its spectacular coffered ceiling, Hamburg...",must_see,Hamburg


#### Munich

In [11]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false

# Load data from CSV file
munich = pd.read_csv('/Users/emredeveci/Desktop/TDR/must-see-attractions/munich_attractions - Sheet1.csv')
munich.head()


Unnamed: 0,website1,website2,name,description,location,type,city
0,https://lp-cms-production.imgix.net/2019-06/c2...,https://www.lonelyplanet.com/germany/munich/ny...,Schloss Nymphenburg,This commanding palace and its lavish gardens ...,Munich,must_see,Munich
1,https://lp-cms-production.imgix.net/2019-06/ee...,https://www.lonelyplanet.com/germany/munich/al...,Residenzmuseum,Home to Bavaria's Wittelsbach rulers from 1508...,Munich,must_see,Munich
2,https://lp-cms-production.imgix.net/2019-06/3e...,https://www.lonelyplanet.com/germany/munich/ma...,Alte Pinakothek,Munich's main repository of Old European Maste...,Munich,must_see,Munich
3,https://lp-cms-production.imgix.net/2019-06/0d...,https://www.lonelyplanet.com/germany/munich/sc...,Englischer Garten,The sprawling English Garden is among Europe's...,Munich,must_see,Munich
4,https://lp-cms-production.imgix.net/2019-06/c9...,https://www.lonelyplanet.com/germany/munich/ma...,Pinakothek der Moderne,Germany's largest modern-art museum unites fou...,Munich,must_see,Munich


#### Combined dataset

In [17]:
#| echo: true
#| code-fold: true
#| panel: input
# Concatenate the four city dataframes along the rows (axis=0)
germany_df = pd.concat([bremen, berlin, hamburg, munich], ignore_index=True)

# Save the combined dataframe to a new CSV file
germany_df.to_csv('combined_dataset.csv', index=False)
filtered_df = germany_df[~germany_df['type'].isin(['to_stay', 'to_eat'])]
filtered_df.head()



Unnamed: 0,website1,website2,name,location,description,type,city
0,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Herrenhäuser Gärten,Hanover,Proof that Hanover is not all buttoned-down bu...,must_see,Lower Saxony & Bremen
1,https://lp-cms-production.imgix.net/2019-06/2a...,https://www.lonelyplanet.com/germany/bergen-be...,Gedenkstätte Bergen-Belsen,Lower Saxony & Bremen,The Nazi-built camp at Bergen-Belsen began its...,must_see,Lower Saxony & Bremen
2,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Schloss Marienburg,Lower Saxony & Bremen,"Perched grandly above the Leine River, the neo...",must_see,Lower Saxony & Bremen
3,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/bremen-ci...,Denkort Bunker Valentin,Bremen City,"In 1943, the Nazis started construction of a m...",must_see,Lower Saxony & Bremen
4,https://lonelyplanetstatic.imgix.net/marketing...,https://www.lonelyplanet.com/germany/lower-sax...,Autostadt,Lower Saxony & Bremen,"A hit with car buffs of all ages, Autostadt is...",must_see,Lower Saxony & Bremen
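
The four loading cells above repeat the same steps, so they could be folded into one helper. The following is a minimal sketch, not the project's code: `load_city_csvs` is a hypothetical function, and the demo writes two tiny stand-in files to a temporary folder so that it runs without the original exports. It also drops the leftover `Unnamed: 0` index column that the exported files carry.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_city_csvs(folder, cities, suffix):
    """Read '<city><suffix>' for each city from folder and stack the rows."""
    frames = []
    for city in cities:
        frame = pd.read_csv(Path(folder) / f"{city}{suffix}")
        # Drop the leftover index column from the export, if present.
        frame = frame.drop(columns=["Unnamed: 0"], errors="ignore")
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Demo with stand-in files that follow the naming used in this project.
with tempfile.TemporaryDirectory() as tmp:
    suffix = "_attractions - Sheet1.csv"
    for city, sight in [("berlin", "Fernsehturm"), ("hamburg", "Rathaus")]:
        pd.DataFrame(
            {"name": [sight], "type": ["must_see"], "city": [city]}
        ).to_csv(Path(tmp) / f"{city}{suffix}")  # default index becomes 'Unnamed: 0'
    combined = load_city_csvs(tmp, ["berlin", "hamburg"], suffix)

print(combined)
```

Passing the folder as an argument also avoids hard-coding an absolute path in every cell.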


### Tours

#### Tours in Bremen

In [20]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
tour_bremen = pd.read_csv('/Users/emredeveci/Desktop/TDR/tours/bremen_tour - Sheet1.csv')
tour_bremen.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,e-Scavenger hunt Bremen: Explore the city at y...,4.0 of 5 bubbles,15.0,Audio Guides,2–4 hours,Get to know Bremen in a unique and affordable ...,$34,per group,bremen
1,https://www.tripadvisor.com/AttractionProductR...,Bremen : Private Walking Tour With A Tour Guid...,3.0 of 5 bubbles,2.0,Historical Tours,2–6 hours,Get to know the city through the eyes of a loc...,$52,per adult,bremen
2,https://www.tripadvisor.com/AttractionProductR...,Bremen Schnoor Area Tour,5.0 of 5 bubbles,5.0,Historical Tours,1 hour,Explore with us the Schnoor area – a neighbour...,$24,per adult,bremen
3,https://www.tripadvisor.com/AttractionProductR...,Bremen Private Walking Tour With A Professiona...,5.0 of 5 bubbles,1.0,Walking Tours,1–2 hours,Meetingpoint: In front of the Town Hall or Mee...,$224,per group,bremen
4,https://www.tripadvisor.com/AttractionProductR...,Bremen - Private Historic Walking Tour,5.0 of 5 bubbles,1.0,Historical Tours,1–2 hours,"Discover the city of Bremen, a major cultural ...",$321,per group,bremen


#### Tours in Berlin

In [21]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
tour_berlin = pd.read_csv('/Users/emredeveci/Desktop/TDR/tours/berlin_tour - Sheet1.csv')
tour_berlin.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,Discover Berlin Half-Day Walking Tour,5.0 of 5 bubbles,4191,Historical Tours,3–4 hours,See many of Berlin's most important landmarks ...,$22,per adult,berlin
1,https://www.tripadvisor.com/AttractionProductR...,River Cruise with Tour Guide in Berlin. Hadynski,4.5 of 5 bubbles,36,On the Water,1 hour,Enjoy our 1 hour river cruise through the old ...,$21,per adult,berlin
2,https://www.tripadvisor.com/AttractionProductR...,Berlin Third Reich and Cold War 2-Hour Walking...,5.0 of 5 bubbles,124,Historical Tours,2 hours,Learn the tumultuous contemporary history of B...,$22,per adult,berlin
3,https://www.tripadvisor.com/AttractionProductR...,Big Bus Berlin Hop-On Hop-Off Sightseeing Tour,4.0 of 5 bubbles,404,Audio Guides,2 hours,Enjoy this perfect introduction to Berlin on a...,$27,per adult,berlin
4,https://www.tripadvisor.com/AttractionProductR...,Berlin Food Walking Tour With Secret Food Tours,5.0 of 5 bubbles,477,Food & Drink,3 hours,"With so much great food in East Berlin, it can...",$105,per adult,berlin


#### Tours in Hamburg

In [22]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
tour_hamburg = pd.read_csv('/Users/emredeveci/Desktop/TDR/tours/hamburg_tour - Sheet1.csv')
tour_hamburg.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0 of 5 bubbles,784,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,$3,per adult,hamburg
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0 of 5 bubbles,178,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,$264,per group,hamburg
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0 of 5 bubbles,63,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,$22,per adult,hamburg
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0 of 5 bubbles,187,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,$31,per adult,hamburg
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5 of 5 bubbles,21,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,$38,per adult,hamburg


#### Tours in Munich

In [23]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
tour_munich = pd.read_csv('/Users/emredeveci/Desktop/TDR/tours/munich_tour - Sheet1.csv')
tour_munich.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof VIP All-In...,4.5 of 5 bubbles,445,Sightseeing Packages,6+ hours,Leave Munich for a full-day tour to two royal ...,$215,per adult,munich
1,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof Palace Day...,4.5 of 5 bubbles,242,Historical Tours,6+ hours,"Drive with an airconditioned, comfortable coac...",$76,per adult,munich
2,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof Palace Day...,4.5 of 5 bubbles,1381,Historical Tours,6+ hours,"Bavaria is famous for its fairy-tale castles, ...",$80,per adult,munich
3,,,,How to navigate Munich’s museums like a pro,"Soccer fans, art lovers, and design nerds have...",Read now,,,,munich
4,https://www.tripadvisor.com/AttractionProductR...,Dachau Concentration Camp Memorial Site Tour f...,5.0 of 5 bubbles,1313,Historical Tours,5 hours,Visiting Germany’s Dachau Concentration Camp M...,$52,per adult,munich


#### Combined Tours Dataset

In [32]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Concatenate the four tour dataframes along the rows (axis=0)
all_tours = pd.concat([tour_hamburg, tour_berlin, tour_bremen, tour_munich], ignore_index=True)

# Save the combined dataframe to a new CSV file
# (this reuses the file name of the attractions export above, so that file is overwritten)
all_tours.to_csv('combined_dataset.csv', index=False)
all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0 of 5 bubbles,784,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,$3,per adult,hamburg
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0 of 5 bubbles,178,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,$264,per group,hamburg
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0 of 5 bubbles,63,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,$22,per adult,hamburg
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0 of 5 bubbles,187,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,$31,per adult,hamburg
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5 of 5 bubbles,21,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,$38,per adult,hamburg


In [25]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
missing_values = all_tours.isnull().sum()
missing_values

website      18
name         18
score       929
reviews     908
type          0
duration      2
comment1     28
price        22
group        18
city          0
dtype: int64

In [26]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Drop rows where 'website' is NaN
all_tours = all_tours.dropna(subset=['website'])


In [27]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
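# Convert 'score' strings such as "5.0 of 5 bubbles" to float
def extract_score(score_str):
    try:
        return float(score_str.split()[0])  # The first token holds the numeric rating
    except (AttributeError, IndexError, ValueError):
        # Missing values (NaN) and non-numeric text become None
        return None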

# Apply the function to 'score' column
all_tours['score'] = all_tours['score'].apply(extract_score)
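
The conversion above assumes that every value is still a string. As a minimal sketch, a parser that also accepts values that are already numeric can be re-run safely and can be reused for mixed columns such as the hotel scores, which contain entries like 'Exceptional 10'. The name `parse_rating` is hypothetical and not part of the original notebook.

```python
import math
import re

# First integer or decimal number in a string, e.g. "5.0" in "5.0 of 5 bubbles"
_NUMBER = re.compile(r"\d+(?:\.\d+)?")

def parse_rating(value):
    """Return the first number found in value as a float, or None if there is none."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        # Already numeric: keep it, but map NaN to None
        return None if math.isnan(value) else float(value)
    match = _NUMBER.search(str(value))
    return float(match.group()) if match else None

print(parse_rating("5.0 of 5 bubbles"))
print(parse_rating(4.5))
print(parse_rating("Exceptional 10"))
```

A function like this could be passed to `.apply` in place of `extract_score`; running it a second time leaves converted values unchanged.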

In [28]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0,784,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,$3,per adult,hamburg
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0,178,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,$264,per group,hamburg
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0,63,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,$22,per adult,hamburg
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0,187,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,$31,per adult,hamburg
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5,21,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,$38,per adult,hamburg


In [29]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
def convert_to_float(value):
    # Convert a review count to float; missing or unparseable values become None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

In [30]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
all_tours['reviews'] = all_tours['reviews'].apply(convert_to_float)



In [31]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false

all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0,784.0,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,$3,per adult,hamburg
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0,178.0,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,$264,per group,hamburg
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0,63.0,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,$22,per adult,hamburg
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0,187.0,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,$31,per adult,hamburg
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5,21.0,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,$38,per adult,hamburg


In [33]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Convert 'price' strings such as "$52" to float, handling non-numeric values
import re

def convert_price_to_float(price_str):
    # Keep digits and the decimal point so that "$52.50" parses as 52.5, not 5250
    cleaned = re.sub(r'[^0-9.]', '', str(price_str))
    try:
        return float(cleaned)
    except ValueError:
        # Empty or malformed values (including NaN) become None
        return None

In [34]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Apply the function to 'price' column
all_tours['price'] = all_tours['price'].apply(convert_price_to_float)

In [35]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Map a price to a 'price_category' label
def categorize_price(price):
    # Note: a missing price (NaN) fails both comparisons and is labelled 'High'
    if price <= 50:
        return 'Low'
    elif price <= 300:
        return 'Moderate'
    else:
        return 'High'


In [36]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Apply the function to create the 'price_category' column
all_tours['price_category'] = all_tours['price'].apply(categorize_price)

# Now the "price_category" column will have these categories
# You can then analyze or plot based on these categories
price_counts = all_tours['price_category'].value_counts()
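
Because `categorize_price` compares numbers directly, a row with a missing price ends up in the 'High' category. A possible alternative, sketched here on made-up prices, is `pd.cut`, which leaves missing values uncategorized. The sample series is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical prices; the NaN stands for a row whose price could not be parsed
prices = pd.Series([3.0, 264.0, float("nan"), 301.0, 50.0])

# Right-closed bins reproduce the thresholds above: <= 50, (50, 300], > 300
categories = pd.cut(
    prices,
    bins=[float("-inf"), 50, 300, float("inf")],
    labels=["Low", "Moderate", "High"],
)

# value_counts() ignores the uncategorized (NaN) row
print(categories.value_counts())
```

With this approach the category counts only include tours that actually have a price.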

In [38]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city,price_category
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,5.0 of 5 bubbles,784,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,3.0,per adult,hamburg,Low
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,5.0 of 5 bubbles,178,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,264.0,per group,hamburg,Moderate
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,4.0 of 5 bubbles,63,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,22.0,per adult,hamburg,Low
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,4.0 of 5 bubbles,187,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,31.0,per adult,hamburg,Low
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,3.5 of 5 bubbles,21,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,38.0,per adult,hamburg,Low


In [39]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true

print(price_counts)


price_category
Moderate    788
High        570
Low         342
Name: count, dtype: int64


### Restaurant Data

In [41]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
filtered_rest_df = germany_df[germany_df['type'].isin(['to_eat'])]
filtered_rest_df.head()

Unnamed: 0,website1,website2,name,location,description,type,city
115,https://www.lonelyplanet.com/germany/lower-sax...,https://www.lonelyplanet.com/germany/lower-sax...,Al-Dar,Hanover,This popular Syrian restaurant a stone's throw...,to_eat,Lower Saxony & Bremen
116,https://www.lonelyplanet.com/germany/lower-sax...,https://www.lonelyplanet.com/germany/lower-sax...,Vietal Kitchen,Hanover,Pale-green louvres and bamboo lanterns set a F...,to_eat,Lower Saxony & Bremen
117,https://www.lonelyplanet.com/germany/hildeshei...,https://www.lonelyplanet.com/germany/hildeshei...,Schlegels Weinstuben,Lower Saxony & Bremen,"The lopsided walls of this rose-covered, 500-y...",to_eat,Lower Saxony & Bremen
118,https://www.lonelyplanet.com/germany/norderney...,https://www.lonelyplanet.com/germany/norderney...,Seesteg,East Frisian Islands,Nordeney's Michelin-starred restaurant offers ...,to_eat,Lower Saxony & Bremen
119,https://www.lonelyplanet.com/germany/lower-sax...,https://www.lonelyplanet.com/germany/lower-sax...,Basil,Hanover,These former stables to the north of town now ...,to_eat,Lower Saxony & Bremen


### Hotel Dataset

In [181]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
hotel_berlin = pd.read_csv('/Users/emredeveci/Desktop/TDR/hotels/hotel_berlin - Sheet1.csv')
hotel_berlin['city'] = 'berlin'

hotel_berlin.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/the-social-hu...,The Social Hub Berlin Alexanderplatz,"Mitte, Berlin",Center: 2.8 km,8.2,"7,473 reviews",Executive Queen Room,1 double bed,1 night,€80,berlin
1,https://www.booking.com/hotel/de/innside-by-me...,INNSiDE by Meliá Berlin Mitte,"Mitte, Berlin",Center: 1.9 km,8.3,Sustainability certification,"6,031 reviews",The INNSiDE Room,1 night,€93,berlin
2,https://www.booking.com/hotel/de/motel-one-ber...,Motel One Berlin Spittelmarkt,"Mitte, Berlin",Center: 1.7 km,8.7,"4,667 reviews",Queen room,1 double bed,1 night,€93,berlin
3,https://www.booking.com/hotel/de/titanic-chaus...,TITANIC Chaussee Berlin,"Mitte, Berlin",Center: 1.8 km,8.2,"11,604 reviews",Classic room,Several types of beds,1 night,€96,berlin
4,https://www.booking.com/hotel/de/motel-one-ber...,Motel One Berlin-Alexanderplatz,"Mitte, Berlin",Center: 2.4 km,8.6,"12,307 reviews",Queen room,1 double bed,1 night,€104,berlin


In [182]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
hotel_bremen = pd.read_csv('/Users/emredeveci/Desktop/TDR/hotels/hotel_bremen - Sheet1.csv')
hotel_bremen['city'] = 'bremen'

hotel_bremen.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/townside-host...,Townside Hostel Bremen,"Mitte, Bremen",1.1 km from centre,7.7,469 reviews,Bed in 10-Bed Mixed Dormitory Room,1 bunk bed,"1 night, 1 adult",€ 23,bremen
1,https://www.booking.com/hotel/de/moxy-bremen.e...,Moxy Bremen,"Walle, Bremen",1.9 km from centre,8.3,"2,610 reviews",MOXY Sleeper Queen,1 large double bed,"1 night, 1 adult",€ 135,bremen
2,https://www.booking.com/hotel/de/am-werdersee....,Apartments Am Werdersee,"Neustadt, Bremen",1.8 km from centre,7.5,"1,324 reviews",Single Room with Shared Bathroom,"4 beds (2 singles, 1 double, 1 extra-large dou...","1 night, 1 adult",€ 47,bremen
3,https://www.booking.com/hotel/de/pension-isabe...,Pension Isabel I,"Neustadt, Bremen",0.8 km from centre,7.9,480 reviews,Single Room with Shared Bathroom,1 single bed,"1 night, 1 adult",€ 46,bremen
4,https://www.booking.com/hotel/de/hastedter-hee...,Rana's Zimmervermittlung,"Hemelingen, Bremen",4.3 km from centre,6.7,519 reviews,Apartment,2 single beds,"1 night, 1 adult",€ 30,bremen


In [183]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
hotel_hamburg = pd.read_csv('/Users/emredeveci/Desktop/TDR/hotels/hotel_hamburg - Sheet1.csv')
hotel_hamburg['city'] = 'hamburg'

hotel_hamburg.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/apartment040....,Apartment040,"Uhlenhorst, Hamburg",2.8 km from centre,8.7,"2,623 reviews",Superior Studio,1 double bed,"1 night, 1 adult",€ 292,hamburg
1,https://www.booking.com/hotel/de/cab20.en-gb.h...,CAB20,"St. Georg, Hamburg",1.1 km from centre,8.5,"15,506 reviews",Single Cabin with Shared Bathroom,1 single bed,"1 night, 1 adult",€ 178,hamburg
2,https://www.booking.com/hotel/de/hood-house.en...,hood house,"Winterhude, Hamburg",4 km from centre,8.6,"3,309 reviews",Cozyhood+,1 large double bed,"1 night, 1 adult",€ 379,hamburg
3,https://www.booking.com/hotel/de/apartmenthote...,Apartment-Hotel Hamburg Mitte,Hamburg,3.4 km from centre,8.2,"8,221 reviews",Junior Suite with Balcony,"2 beds (1 extra-large double, 1 sofa bed)","1 night, 1 adult",€ 454,hamburg
4,https://www.booking.com/hotel/de/hampton-by-hi...,Hampton By Hilton Hamburg City Centre,"Hammerbrook, Hamburg",1.1 km from centre,7.9,"9,997 reviews",Queen Room with Sofa Bed,"2 beds (1 sofa bed, 1 large double)","1 night, 1 adult",€ 222,hamburg


In [184]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Load data from CSV file
hotel_munich = pd.read_csv('/Users/emredeveci/Desktop/TDR/hotels/hotel_munich - Sheet1.csv')
# Add a new column with all rows containing "munich"
hotel_munich['city'] = 'munich'
hotel_munich.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/hapimag-resor...,Hapimag Ferienwohnungen München,"Ludwigsvorstadt, Munich",Show on map,8.3,317 reviews,2,"3 beds (2 sofa beds, 1 large double)","1 night, 1 adult",€ 265,munich
1,https://www.booking.com/hotel/de/ausbildungsho...,Ausbildungshotel St. Theresia,"Neuhausen - Nymphenburg, Munich",Show on map,8.3,"1,538 reviews",2,1 single bed,"1 night, 1 adult",€ 52,munich
2,https://www.booking.com/hotel/de/edenwolff.en-...,Eden Hotel Wolff,"Maxvorstadt, Munich",Show on map,8.2,"4,883 reviews",2,1 single bed,"1 night, 1 adult",€ 122,munich
3,https://www.booking.com/hotel/de/motel-one-mun...,Motel One München-Campus,"Obergiesing - Fasangarten, Munich",Show on map,8.4,"6,591 reviews",2,1 double bed,"1 night, 1 adult",€ 79,munich
4,https://www.booking.com/hotel/de/hotelpreysing...,JAMS Music Hotel Munich,"Au-Haidhausen, Munich",Show on map,8.3,"1,763 reviews",2,1 double bed,"1 night, 1 adult",€ 176,munich


In [199]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Concatenate the four city dataframes along the rows (axis=0)
combined_hotel = pd.concat([hotel_munich, hotel_hamburg, hotel_berlin, hotel_bremen], ignore_index=True)

# Save the combined dataframe to a new CSV file
combined_hotel.to_csv('combined_hotel.csv', index=False)
combined_hotel.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/hapimag-resor...,Hapimag Ferienwohnungen München,"Ludwigsvorstadt, Munich",Show on map,8.3,317 reviews,2,"3 beds (2 sofa beds, 1 large double)","1 night, 1 adult",€ 265,munich
1,https://www.booking.com/hotel/de/ausbildungsho...,Ausbildungshotel St. Theresia,"Neuhausen - Nymphenburg, Munich",Show on map,8.3,"1,538 reviews",2,1 single bed,"1 night, 1 adult",€ 52,munich
2,https://www.booking.com/hotel/de/edenwolff.en-...,Eden Hotel Wolff,"Maxvorstadt, Munich",Show on map,8.2,"4,883 reviews",2,1 single bed,"1 night, 1 adult",€ 122,munich
3,https://www.booking.com/hotel/de/motel-one-mun...,Motel One München-Campus,"Obergiesing - Fasangarten, Munich",Show on map,8.4,"6,591 reviews",2,1 double bed,"1 night, 1 adult",€ 79,munich
4,https://www.booking.com/hotel/de/hotelpreysing...,JAMS Music Hotel Munich,"Au-Haidhausen, Munich",Show on map,8.3,"1,763 reviews",2,1 double bed,"1 night, 1 adult",€ 176,munich
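
The four loading cells and the concatenation follow the same pattern: read a file, tag it with its city, and stack the results. A compact sketch of that pattern is shown below. The helper name `load_city_frames` and the in-memory CSV contents are assumptions used so that the example runs without the original files.

```python
import io

import pandas as pd

def load_city_frames(sources):
    """Read one CSV per city, tag each row with its city, and stack the results."""
    frames = []
    for city, source in sources.items():
        frame = pd.read_csv(source)
        frame["city"] = city
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

# In-memory stand-ins for the per-city files (hypothetical contents)
sources = {
    "munich": io.StringIO("name,price\nHotel A,€ 265\n"),
    "berlin": io.StringIO("name,price\nHotel B,€80\nHotel C,€93\n"),
}
combined = load_city_frames(sources)
print(combined)
```

In the notebook, the dictionary values would be the paths of the per-city CSV files instead of `StringIO` objects.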


In [206]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
combined_hotel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1225 entries, 0 to 1224
Data columns (total 11 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   website   1225 non-null   object 
 1   name      1225 non-null   object 
 2   location  1225 non-null   object 
 3   distance  1225 non-null   object 
 4   score     1209 non-null   float64
 5   reviews   1205 non-null   float64
 6   type      1225 non-null   object 
 7   type2     1221 non-null   object 
 8   day       1220 non-null   object 
 9   price     1225 non-null   float64
 10  city      1225 non-null   object 
dtypes: float64(3), object(8)
memory usage: 105.4+ KB


In [201]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
print(combined_hotel['score'].unique())


[8.3 8.2 8.4 8.1 6.3 8.9 7.8 7.9 7.5 7.2 8.7 7.7 7.3 8.5 8.0 8.6 7.1 7.6
 6.8 7.0 7.4 9.1 6.7 8.8 6.2 9.3 5.3 6.4 6.9 6.6 9.0 '8.7' '8.5' '8.6'
 '8.2' '7.9' '7.8' '6.9' '8.3' '9.3' '7.7' '6.8' '8.8' '8.1' '7.6' '8'
 '8.4' '7.5' '7.3' '6.7' '7.1' '9.1' '6.2' '7.4' '7.2' '9' '6.3' '5.7'
 '9.4' '6.4' '6.6' '6' '5.2' '5.4' '7' '9.6' '5' '3.2' '6.5' '5.3' '5.1'
 '6.1' '5.9' nan '5.8' '1.5' '3.7' '9.2' 'Exceptional 10' '9.7' 9.2 6.1
 6.5 '3' '4.9' '5.5' '2.9' '9.8' '10' '5.6' '4.4' '3.6' '4.1' '1'
 'Exceptional 10.0' '8.9']


In [207]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
print(combined_hotel['reviews'].unique())


[3.1700e+02 1.5380e+03 4.8830e+03 6.5910e+03 1.7630e+03 1.0645e+04
 1.7940e+03 1.6206e+04 6.1900e+02 8.1670e+03 3.8790e+03 4.4550e+03
 1.2460e+03 8.8480e+03 1.0140e+03 1.9900e+03 3.3490e+03 4.9240e+03
 3.5450e+03 4.1270e+03 5.2440e+03 2.5020e+03 4.7730e+03 1.7430e+03
 2.9420e+03 1.8010e+03 1.3160e+03 2.6017e+04 1.2410e+03 1.1710e+03
 7.0300e+02 2.7340e+03 3.4180e+03 4.0620e+03 2.5980e+03 1.2416e+04
 2.7700e+03 9.8200e+02 3.9370e+03 4.1050e+03 1.0180e+03 7.3330e+03
 1.5410e+03 1.3790e+03 3.6630e+03 3.2320e+03 1.3768e+04 8.1700e+03
 3.0560e+03 4.0600e+03 2.9950e+03 2.9300e+03 3.3990e+03 1.5260e+03
 1.3220e+03 6.8490e+03 4.0490e+03 2.1000e+03 4.4280e+03 4.6660e+03
 6.2800e+02 3.4360e+03 1.3660e+03 1.3130e+03 4.2850e+03 2.3320e+03
 9.9100e+02 5.8470e+03 4.5980e+03 7.4000e+01 1.8790e+03 4.0660e+03
 2.2100e+03 5.5650e+03 4.1380e+03 5.6700e+03 7.8000e+02 5.1280e+03
 1.6490e+03 3.6950e+03 5.6780e+03 1.5920e+03 2.1210e+03 2.2140e+03
 2.4560e+03 4.0250e+03 3.4300e+02 3.6770e+03 2.7770e+03 5.6700

In [208]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
import re

# Define a function to extract numeric values from strings
def extract_numeric_reviews(reviews):
    try:
        # Use regular expression to extract numeric values
        numeric_reviews = re.sub(r'[^0-9]', '', reviews)
        return int(numeric_reviews)
    except (ValueError, TypeError):
        # Handle exceptions
        return None

# Apply the function to the 'reviews' column
combined_hotel['reviews'] = combined_hotel['reviews'].apply(extract_numeric_reviews)


In [202]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
import numpy as np

# Define a function to convert string representations of numbers to float
def convert_to_float(score):
    try:
        return float(score)
    except (ValueError, TypeError):
        # Handle exceptions such as 'Exceptional 10' and 'nan'
        return np.nan

# Apply the function to the 'score' column
combined_hotel['score'] = combined_hotel['score'].apply(convert_to_float)


In [203]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Remove '€' symbol and comma, then convert 'price' column to float
combined_hotel['price'] = combined_hotel['price'].str.replace('€', '').str.replace(',', '').astype(float)


In [None]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true

combined_hotel.head()

Unnamed: 0,website,name,location,distance,score,reviews,type,type2,day,price,city
0,https://www.booking.com/hotel/de/hapimag-resor...,Hapimag Ferienwohnungen München,"Ludwigsvorstadt, Munich",Show on map,8.3,317 reviews,2,"3 beds (2 sofa beds, 1 large double)","1 night, 1 adult",265.0,munich
1,https://www.booking.com/hotel/de/ausbildungsho...,Ausbildungshotel St. Theresia,"Neuhausen - Nymphenburg, Munich",Show on map,8.3,"1,538 reviews",2,1 single bed,"1 night, 1 adult",52.0,munich
2,https://www.booking.com/hotel/de/edenwolff.en-...,Eden Hotel Wolff,"Maxvorstadt, Munich",Show on map,8.2,"4,883 reviews",2,1 single bed,"1 night, 1 adult",122.0,munich
3,https://www.booking.com/hotel/de/motel-one-mun...,Motel One München-Campus,"Obergiesing - Fasangarten, Munich",Show on map,8.4,"6,591 reviews",2,1 double bed,"1 night, 1 adult",79.0,munich
4,https://www.booking.com/hotel/de/hotelpreysing...,JAMS Music Hotel Munich,"Au-Haidhausen, Munich",Show on map,8.3,"1,763 reviews",2,1 double bed,"1 night, 1 adult",176.0,munich


In [209]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Count the number of prices in different ranges
# Note: hotels priced above 300 and up to 500 euro fall outside all three ranges
under_100_count = (combined_hotel['price'] < 100).sum()
between_100_300_count = ((combined_hotel['price'] >= 100) & (combined_hotel['price'] <= 300)).sum()
over_500_count = (combined_hotel['price'] > 500).sum()

# Print the counts
print("Number of hotels:")
print(f"- Under 100 euro: {under_100_count}")
print(f"- Between 100 and 300 euro: {between_100_300_count}")
print(f"- Over 500 euro: {over_500_count}")

Number of hotels:
- Under 100 euro: 416
- Between 100 and 300 euro: 751
- Over 500 euro: 8
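
The three counts above add up to 1175, while `combined_hotel` has 1225 priced rows, so some hotels are not covered by any range. One way to get bands that cover every price, and a per-city breakdown at the same time, is `pd.cut` combined with `pd.crosstab`. This is a sketch on made-up data; the band edges and labels are assumptions, and the bands are left-closed, so a price of exactly 300 falls into the third band rather than the second.

```python
import pandas as pd

# Hypothetical sample standing in for combined_hotel (prices in euro)
hotels = pd.DataFrame({
    "city":  ["berlin", "berlin", "munich", "munich", "munich"],
    "price": [80.0, 120.0, 350.0, 600.0, 99.0],
})

# Left-closed bands: [0, 100), [100, 300), [300, 500), [500, inf)
edges = [0, 100, 300, 500, float("inf")]
labels = ["under 100", "100 to 300", "300 to 500", "500 and over"]
hotels["price_band"] = pd.cut(hotels["price"], bins=edges, labels=labels, right=False)

# One row per city, one column per band
band_table = pd.crosstab(hotels["city"], hotels["price_band"])
print(band_table)
```

Since every non-missing price belongs to exactly one band, the cells of the table sum to the number of priced hotels.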


In [213]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Group the DataFrame by the 'city' column and count the number of hotels under 100 euros in each city
under_100_count_per_city = combined_hotel[combined_hotel['price'] < 100].groupby('city').size()

# Print the counts for each city
print("Number of hotels under 100 euro in each city:")
print(under_100_count_per_city)


Number of hotels under 100 euro in each city:
city
berlin      97
bremen      60
hamburg     29
munich     230
dtype: int64


## Feature Extraction


## Recommendation Engine Development


### City Recommendation Engine

In [221]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
import string

from bertopic import BERTopic
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# The NLTK tokenizer models and 'stopwords' corpus must be downloaded beforehand
STOP_WORDS = set(stopwords.words('english'))
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

# Preprocess the text data
def preprocess_text(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Remove punctuation and lowercase the tokens
    tokens = [w.translate(PUNCT_TABLE).lower() for w in tokens]
    # Remove stopwords
    tokens = [word for word in tokens if word not in STOP_WORDS]
    # Join the tokens back into a string
    return ' '.join(tokens)

# Apply preprocessing to the "description" column
filtered_df['preprocessed_description'] = filtered_df['description'].apply(preprocess_text)

# Initialize BERTopic model
topic_model = BERTopic()

# Fit the model on preprocessed descriptions
topics, _ = topic_model.fit_transform(filtered_df['preprocessed_description'])
filtered_df['topic'] = topics

# Print the topics and associated descriptions
print("Topics and Associated Descriptions:")
for topic_id in filtered_df['topic'].unique():
    topic_description = filtered_df[filtered_df['topic'] == topic_id]['description'].iloc[0]
    print(f"Topic {topic_id}: {topic_description}")

Topics and Associated Descriptions:
Topic 8: Proof that Hanover is not all buttoned-down business are the grandiose Baroque Royal Gardens of Herrenhausen, about 5km north of the city centre, which…
Topic 7: The Nazi-built camp at Bergen-Belsen began its existence in 1940 as a POW camp, but became a concentration camp after being taken over by the SS in 1943,…
Topic -1: A hit with car buffs of all ages, Autostadt is a celebration of all things automobile, spread across 25 hectares. A visit to this theme park and museum…
Topic 2: The Sprengel Museum is held in extremely high esteem, both for the design of the building as well as for the art housed inside. Its huge interior spaces…
Topic 10: The charming medieval coopers lane was transformed into a prime example of mostly expressionist architecture in the 1920s at the instigation of coffee…
Topic 15: This shiny, space-age museum offers a journey around the world along the longitudinal meridian 8° east, through climate zones in Switzerland

In [215]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
topic_model.get_topic_info()

Unnamed: 0,Topic,Count,Name,Representation,Representative_Docs
0,-1,472,-1_berlin_one_the_museum,"[berlin, one, the, museum, bar, st, germany, m...",[make historical sense st pauli museum excell...
1,0,85,0_club_dance_djs_jazz,"[club, dance, djs, jazz, rock, music, hiphop, ...",[one berlin oldest live music clubs quasimodo...
2,1,73,1_bar_cocktail_wine_lit,"[bar, cocktail, wine, lit, drinks, beer, cafe,...",[bad boy bar cafe day breakfast 2pm homemad...
3,2,53,2_art_artist_artists_gallery,"[art, artist, artists, gallery, collection, wo...",[small elegant exhibition hall deutsche bank...
4,3,41,3_garden_beer_park_munich,"[garden, beer, park, munich, botanical, altsta...",[one hard ignore english garden location pedig...
5,4,40,4_museum_history_greek_lion,"[museum, history, greek, lion, collection, mid...",[museum chronicles area ’ history middle ages ...
6,5,37,5_women_label_fashion_men,"[women, label, fashion, men, store, designs, s...",[berlinbased uvr label designs urban fashions ...
7,6,34,6_memorial_nazi_victims_cemetery,"[memorial, nazi, victims, cemetery, nazis, jew...",[heart treptower park gargantuan soviet war m...
8,7,24,7_coffee_cafe_cakes_roasted,"[coffee, cafe, cakes, roasted, beans, java, co...",[pioneers thirdwave coffee berlin yumi kiduk ...
9,8,22,8_rulers_built_baroque_palace,"[rulers, built, baroque, palace, prussian, cou...",[18thcentury country estate frilly neorenaiss...


In [216]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
topic_model.visualize_topics()


In [217]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
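# Attach to every row the full list of topics found at its location
merged_df = pd.merge(filtered_df, filtered_df.groupby('location')['topic'].apply(list).reset_index(name='topics'), on='location')

# Group by location and flatten the per-row topic lists.
# Note: each location's list is repeated once per row of that location,
# so the flattened lists (and any counts derived from them) are inflated.
city_topics = merged_df.groupby('location')['topics'].agg(lambda x: [item for sublist in x for item in sublist]).reset_index()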

# Print cities and their associated topics
print("Cities and Associated Topics:")
for index, row in city_topics.iterrows():
    print(f"{row['location']}: {row['topics']}")

Cities and Associated Topics:
Altstadt: [-1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 23, -1, 11, -1, -1, -1, 16, -1, 7, 15, 2, 2, -1, -1, 3, 7, 1, 0, 
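
The repeated pattern in the output above comes from attaching the full per-location topic list to every row before flattening. If the goal is to count how often each topic occurs at each location, a direct `groupby` on both columns counts every attraction exactly once. The following is a sketch on a toy frame; `toy_df` is a hypothetical stand-in for `filtered_df` with its `location` and `topic` columns.

```python
import pandas as pd

# Hypothetical stand-in for filtered_df: one row per attraction with its assigned topic
toy_df = pd.DataFrame({
    "location": ["Munich", "Munich", "Munich", "Berlin", "Berlin", "Hamburg", "Hamburg"],
    "topic":    [3, 3, -1, 3, 1, 1, 1],
})

# Count each (topic, location) pair once per attraction
pair_counts = (
    toy_df.groupby(["topic", "location"])
    .size()
    .reset_index(name="count")
)

# For every topic, keep the location with the largest count
top_location = (
    pair_counts.sort_values(["topic", "count"], ascending=[True, False])
    .drop_duplicates("topic")
    .set_index("topic")
)
print(top_location)
```

Counts computed this way are bounded by the number of attractions, and no hard-coded range of topic ids is needed to find the leading location per topic.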

In [218]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Now let's visualize these cities and their associated topics on a scatterplot
import plotly.express as px

# Create a DataFrame for the scatterplot
scatter_df = pd.DataFrame({
    'location': city_topics['location'],
    'Topic': city_topics['topics']
})

# Reshape the DataFrame so that each topic gets its own row
scatter_df = scatter_df.explode('Topic').reset_index(drop=True)

# Create the scatterplot
fig = px.scatter(scatter_df, x='location', y='Topic', color='Topic',
                 title='Cities and Associated Topics',
                 labels={'location': 'Location', 'Topic': 'Topic'},
                 width=800, height=600)

# Update layout for better visualization
fig.update_traces(marker=dict(size=12, opacity=0.8),
                  selector=dict(mode='markers'))

# Show the plot
fig.show()


In [223]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Filter the DataFrame for Topic 3
topic_3_cities = scatter_df[scatter_df['Topic'] == 3]

# Count the occurrences of each city for Topic 3
city_counts = topic_3_cities['location'].value_counts()

# Get the city with the highest count
most_common_city = city_counts.idxmax()
count_of_most_common_city = city_counts.max()

print(f"The most relevant city for Topic 3 is {most_common_city} with a count of {count_of_most_common_city}.")

The most relevant city for Topic 3 is Munich with a count of 3096.


In [222]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
for topic_num in range(-1, 24):
    # Filter the DataFrame for the current topic
    topic_cities = scatter_df[scatter_df['Topic'] == topic_num]

    # Check if there are any entries for this topic
    if not topic_cities.empty:
        # Count the occurrences of each city for the current topic
        city_counts = topic_cities['location'].value_counts()

        # Get the city with the highest count
        most_common_city = city_counts.idxmax()
        count_of_most_common_city = city_counts.max()

        print(f"The most relevant city for Topic {topic_num} is {most_common_city} with a count of {count_of_most_common_city}.")
    else:
        print(f"No data for Topic {topic_num}.")

The most relevant city for Topic -1 is Munich with a count of 13588.
The most relevant city for Topic 0 is Munich with a count of 2408.
The most relevant city for Topic 1 is Berlin with a count of 2132.
The most relevant city for Topic 2 is Berlin with a count of 1312.
The most relevant city for Topic 3 is Munich with a count of 3096.
The most relevant city for Topic 4 is Munich with a count of 1720.
The most relevant city for Topic 5 is Berlin with a count of 1968.
The most relevant city for Topic 6 is Berlin with a count of 1804.
The most relevant city for Topic 7 is Munich with a count of 688.
The most relevant city for Topic 8 is Munich with a count of 1376.
The most relevant city for Topic 9 is Munich with a count of 860.
The most relevant city for Topic 10 is Berlin with a count of 820.
The most relevant city for Topic 11 is Munich with a count of 1376.
The most relevant city for Topic 12 is Berlin with a count of 1312.
The most relevant city for Topic 13 is Hamburg with a count 

### Tour Recommender Engine

In [233]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Filter rows where the city is "munich"
city_tours = all_tours[all_tours['city'] == 'munich']

# Display the first few rows of the new dataset
city_tours.head()


Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city,price_category
1045,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof VIP All-In...,4.5,445,Sightseeing Packages,6+ hours,Leave Munich for a full-day tour to two royal ...,215.0,per adult,munich,Moderate
1046,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof Palace Day...,4.5,242,Historical Tours,6+ hours,"Drive with an airconditioned, comfortable coac...",76.0,per adult,munich,Moderate
1047,https://www.tripadvisor.com/AttractionProductR...,Neuschwanstein Castle and Linderhof Palace Day...,4.5,1381,Historical Tours,6+ hours,"Bavaria is famous for its fairy-tale castles, ...",80.0,per adult,munich,Moderate
1048,,,,How to navigate Munich’s museums like a pro,"Soccer fans, art lovers, and design nerds have...",Read now,,,,munich,High
1049,https://www.tripadvisor.com/AttractionProductR...,Dachau Concentration Camp Memorial Site Tour f...,5.0,1313,Historical Tours,5 hours,Visiting Germany’s Dachau Concentration Camp M...,52.0,per adult,munich,Moderate


In [234]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
def recommend_tours(budget):
    if budget <= 1000:
        # Filter tours in the low price category
        low_price_tours = city_tours[city_tours['price'] <= 50]
        
        # Get top 5 tours with the highest score in the low price category
        top_low_price_tours = low_price_tours.nlargest(5, 'score')
        
        return top_low_price_tours
    
    elif 1000 < budget <= 10000:
        # Filter tours in the moderate price category
        moderate_price_tours = city_tours[(city_tours['price'] > 50) & (city_tours['price'] <= 300)]
        
        # Get top 5 tours with the highest score in the moderate price category
        top_moderate_price_tours = moderate_price_tours.nlargest(5, 'score')
        
        return top_moderate_price_tours
    
    else:
        # Filter tours in the high price category (above the moderate ceiling of 300)
        high_price_tours = city_tours[city_tours['price'] > 300]
        
        # Get top 5 tours with the highest score in the high price category
        top_high_price_tours = high_price_tours.nlargest(5, 'score')
        
        return top_high_price_tours
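
`recommend_tours` maps a budget to a fixed price band, so the returned tours are not compared against the budget itself. A simpler variant, sketched below, keeps only tours priced at or below the budget and ranks them by score, using the review count as a tie-breaker. The function name `recommend_within_budget`, the tie-breaking rule, and the sample rows are assumptions for illustration.

```python
import pandas as pd

def recommend_within_budget(tours, budget, n=5):
    """Return up to n tours priced at or below budget, best score first."""
    # Rows with a missing price fail the comparison and are excluded
    affordable = tours[tours["price"] <= budget]
    # Break score ties by review count so that well-reviewed tours rank first
    return affordable.sort_values(["score", "reviews"], ascending=False).head(n)

# Hypothetical sample with the same columns as city_tours
sample_tours = pd.DataFrame({
    "name":    ["A", "B", "C", "D", "E"],
    "score":   [5.0, 5.0, 4.5, float("nan"), 5.0],
    "reviews": [1516, 527, 445, 0, 1313],
    "price":   [35.0, 44.0, 215.0, float("nan"), 52.0],
})

print(recommend_within_budget(sample_tours, budget=60))
```

Calling it with `city_tours` and the user's budget would return the same kind of dataframe as `recommend_tours`.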


In [235]:
#| echo: true
#| code-fold: true
#| panel: input
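# Convert 'score' to float.
# Note: 'score' was already converted in an earlier cell. Re-applying the
# function to floats returns None for every row, which blanks the column
# (see the next output).
def extract_score(score_str):
    try:
        return float(score_str.split()[0])  # The first token holds the numeric rating
    except (AttributeError, IndexError, ValueError):
        return None

# Apply the function to 'score' column
all_tours['score'] = all_tours['score'].apply(extract_score)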

In [236]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
all_tours.head()

Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city,price_category
0,https://www.tripadvisor.com/AttractionProductR...,The Local Tour of Hamburg Historic Centre,,784,Historical Tours,2 hours,Pay-What-You-Want tour with booking fee!\n\nWi...,3.0,per adult,hamburg,Low
1,https://www.tripadvisor.com/AttractionProductR...,Private Small-Group Hamburg City Tour with a L...,,178,Historical Tours,3 hours,Welcome to our beloved Hamburg.\nI´m a former ...,264.0,per group,hamburg,Moderate
2,https://www.tripadvisor.com/AttractionProductR...,Hop-on hop-off on the water with the Maritime ...,,63,Day Cruises,1–2 hours,A sightseeing harbor cruise with the Maritime ...,22.0,per adult,hamburg,Low
3,https://www.tripadvisor.com/AttractionProductR...,Hamburg Dungeon Admission Ticket,,187,Historical Tours,1–2 hours,Experience 600 years of dark history - if you ...,31.0,per adult,hamburg,Low
4,https://www.tripadvisor.com/AttractionProductR...,Hamburg 2-hour harbor tour on the beautiful Elbe,,21,Historical Tours,2 hours,We offer an extensive Hamburg XXL harbor tour ...,38.0,per adult,hamburg,Low


In [237]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
# Example usage
budget = 200  # User's budget in dollars
top_tours = recommend_tours(budget)

print("Top 5 Tours based on Budget:")
top_tours.head()

Top 5 Tours based on Budget:


Unnamed: 0,website,name,score,reviews,type,duration,comment1,price,group,city,price_category
1051,https://www.tripadvisor.com/AttractionProductR...,Third Reich Walking Tour Munich,5.0,1516,Historical Tours,2–3 hours,As birthplace of the Nazi Party and home of it...,35.0,per adult,munich,Low
1061,https://www.tripadvisor.com/AttractionProductR...,Munich Sightseeing Guided Bike Tour,5.0,527,Historical Tours,3–4 hours,See more of Munich in a short amount of time t...,44.0,per adult,munich,Low
1074,https://www.tripadvisor.com/AttractionProductR...,Full-Day Dachau Concentration Camp Memorial Si...,5.0,285,Public Transportation Tours,6 hours,Dachau Concentration Camp Memorial Site is a p...,50.0,per adult,munich,Low
1078,https://www.tripadvisor.com/AttractionProductR...,Dachau Small-Group Half-Day Tour from Munich B...,5.0,224,Historical Tours,5 hours,Pay respects to the innocent victims of Hitler...,50.0,per adult,munich,Low
1083,https://www.tripadvisor.com/AttractionProductR...,Dachau Tour from Munich,5.0,442,Historical Tours,5 hours,All of our guides are officially authorized to...,50.0,per adult,munich,Low


### To-do Recommendation Engine

In [238]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Use Munich as the city of interest
most_common_city = "Munich"

# Filter the DataFrame for the most relevant city
city_df = filtered_df[filtered_df['location'] == most_common_city]


In [239]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Keep the rows assigned to topic id 4; every remaining row shares that topic,
# so the sort on 'topic' cannot rank them further
topic_3_sorted = city_df[city_df['topic'] == 4].sort_values(by='topic', ascending=False)

# Take the first 5 of those rows as the things to do in the city
top_5_things_to_do = topic_3_sorted.head(5)[['website1', 'name', 'location', 'description', 'type', 'city']]


In [240]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
print(f"The best 5 things to do in {most_common_city} for Topic 3 are:")
top_5_things_to_do.head()

The best 5 things to do in Munich for Topic 3 are:


Unnamed: 0,website1,name,location,description,type,city
1720,https://lp-cms-production.imgix.net/2019-06/a4...,Holareidulijö,Munich,This rare secondhand traditional-clothing stor...,to_shop,Munich
1721,https://lonelyplanetstatic.imgix.net/marketing...,Pick & Weight,Munich,"Part of a small national chain, Pick & Weight ...",to_shop,Munich
1724,https://lonelyplanetstatic.imgix.net/marketing...,Loden-Frey,Munich,The famous cloth producer stocks a wide range ...,to_shop,Munich
1725,https://lonelyplanetstatic.imgix.net/marketing...,7 Himmel,Munich,Couture cool-hunters will be in seventh heaven...,to_shop,Munich


### Restaurant Recommendation

In [241]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Preprocess the text data
def preprocess_text(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Remove punctuation and lowercase the tokens
    table = str.maketrans('', '', string.punctuation)
    tokens = [w.translate(table).lower() for w in tokens]
    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    tokens = [word for word in tokens if word not in stop_words]
    # Join the tokens back into a string
    preprocessed_text = ' '.join(tokens)
    return preprocessed_text

# Apply preprocessing to the "description" column
filtered_rest_df['preprocessed_description'] = filtered_rest_df['description'].apply(preprocess_text)

# Initialize BERTopic model
topic_model = BERTopic()

# Fit the model on preprocessed descriptions
topics, _ = topic_model.fit_transform(filtered_rest_df['preprocessed_description'])
filtered_rest_df['topic'] = topics


In [242]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Print each topic with one example description from filtered_rest_df
print("Topics and Associated Descriptions for filtered_df:")
for topic_id in filtered_rest_df['topic'].unique():
    topic_description = filtered_rest_df[filtered_rest_df['topic'] == topic_id]['description'].iloc[0]
    print(f"Topic {topic_id}: {topic_description}")

Topics and Associated Descriptions for filtered_df:
Topic -1: This popular Syrian restaurant a stone's throw from the Hauptbahnhof features hushed, candlelit dining and attentive service within a pleasant, minimalist…
Topic 1: Pale-green louvres and bamboo lanterns set a French-colonial vibe for this trendy new Oststadt restaurant offering a modern-fusion take on traditional…
Topic 0: The lopsided walls of this rose-covered, 500-year-old house add to its charm. Inside are historic rooms and, in one corner, a round, glass-topped table…
Topic 2: Romantic at sundown, glass-walled Pier 51 juts out over the Maschsee. Expect light pasta dishes and a small selection of fish, poultry and red meats on a…
Topic 4: This wonderful seafood restaurant is spread out over several dining rooms with distinct nautical styling (including one that purports to be a copy of…
Topic 3: This bright little cafe-restaurant with natural-edge wood tables serves up a menu of delicious, filling vegan dishes, namely s

In [243]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Attach to every row the list of all topics found at its location
merged_df = pd.merge(filtered_rest_df, filtered_rest_df.groupby('location')['topic'].apply(list).reset_index(name='topics'), on='location')

# Flatten the per-row lists for each location. Each row already carries the full
# list for its location, so the flattened list repeats it once per row at that location.
city_topics = merged_df.groupby('location')['topics'].agg(lambda x: [item for sublist in x for item in sublist]).reset_index()

# Print cities and their associated topics
print("Cities and Associated Topics:")
for index, row in city_topics.iterrows():
    print(f"{row['location']}: {row['topics']}")

Cities and Associated Topics:
Altstadt: [4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3, 4, -1, -1, 2, -1, 3, 4, 3, 0, 4, -1, 3]
Berlin: [5, -1, -1, 6, 1, -1, 1, 0, 0, 0, -1, -1, 2, 2, -1, 3, 1, 1, 0, 0, -1, 0, 2, 1, 0, 2, 1, -1, -1, 0, 5, 3, -1, 0, 5, 0, -1, -1, -1, 1, -1, 1, -1, 0, -1, 1, 3, -1, 0, -1, 3, -1, 1, -1, 2, 5, -1, 5, -1, -1, 6, 1, -1, 1, 0, 0, 0, -1, -1, 2, 2, -1, 3, 1, 1, 0, 0, -1, 0, 2, 1, 0, 2, 1, -1, -1, 0, 5, 3, -1, 0, 5, 0, -1, -1, -1, 1, -1, 1, -1, 0, -1, 1, 3, -1, 0, -1, 3, -1, 1, -1, 2, 5, -1, 5, -1, -1, 6, 1, -1, 1, 0, 0, 0, -1, -1, 2, 2, -1, 3, 1, 1, 0, 0, -1, 0, 2, 1, 0, 2, 1
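
Topic counts per location can also be collected in a single pass, so that each restaurant contributes exactly one count. This is a minimal sketch in plain Python with hypothetical helper names (`topic_counts_by_location`, `most_relevant_location`); it assumes the input is a sequence of `(location, topic)` pairs, such as the `location` and `topic` columns zipped together.

```python
from collections import Counter, defaultdict


def topic_counts_by_location(rows):
    """Count how often each topic occurs at each location.

    `rows` is an iterable of (location, topic) pairs, one per restaurant.
    """
    counts = defaultdict(Counter)
    for location, topic in rows:
        counts[location][topic] += 1
    return counts


def most_relevant_location(counts, topic):
    """Return the location with the most rows for `topic`, or None if there are none."""
    candidates = {loc: c[topic] for loc, c in counts.items() if c[topic] > 0}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

Counts produced this way are not comparable with the repeated lists printed above, since each row is counted once.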

In [244]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Now let's visualize these cities and their associated topics on a scatterplot
import plotly.express as px

# Create a DataFrame for the scatterplot
scatter_df = pd.DataFrame({
    'location': city_topics['location'],
    'Topic': city_topics['topics']
})

# Reshape the DataFrame so that each topic gets its own row
scatter_df = scatter_df.explode('Topic').reset_index(drop=True)

# Create the scatterplot
fig = px.scatter(scatter_df, x='location', y='Topic', color='Topic',
                 title='Cities and Associated Topics',
                 labels={'location': 'location', 'Topic': 'Topic'},
                 width=800, height=600)

# Update layout for better visualization
fig.update_traces(marker=dict(size=12, opacity=0.8),
                  selector=dict(mode='markers'))

# Show the plot
fig.show()


In [245]:
#| echo: true
#| code-fold: true
#| panel: input
#| output: true
# Filter the DataFrame for the topic of interest
# (the filter uses topic id 2; the messages label it Topic 3)
topic_3_cities = scatter_df[scatter_df['Topic'] == 2]

# Count the occurrences of each city for Topic 3
city_counts = topic_3_cities['location'].value_counts()

# Get the city with the highest count
most_common_city = city_counts.idxmax()
count_of_most_common_city = city_counts.max()

print(f"The most relevant city for Topic 3 is {most_common_city} with a count of {count_of_most_common_city}.")

The most relevant city for Topic 3 is Berlin with a count of 285.


In [251]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
topic_model.visualize_topics()


In [247]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
# Use Berlin, the most relevant city reported above
most_common_city = "Berlin"

# Filter the DataFrame for the most relevant city
city_df = filtered_rest_df[filtered_rest_df['location'] == most_common_city]


In [248]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: false
# Keep the Berlin rows assigned to topic id 3; every remaining row shares that
# topic, so the sort on 'topic' cannot rank them further
topic_3_sorted = city_df[city_df['topic'] == 3].sort_values(by='topic', ascending=False)

# Take the first 5 of those rows as the restaurant recommendations
top_5_things_to_do = topic_3_sorted.head(5)[['website1', 'name', 'location', 'description', 'type', 'city']]


In [249]:
#| echo: false
#| code-fold: true
#| panel: input
#| output: true
print(f"The best restaurants to visit in {most_common_city} for Topic 3 are:")
top_5_things_to_do.head()

The best restaurants to visit in Berlin for Topic 3 are:


Unnamed: 0,website1,name,location,description,type,city
656,https://lonelyplanetstatic.imgix.net/marketing...,Valladares Feinkost,Berlin,Beautifully situated on the quiet Stephanplatz...,to_eat,Berlin
736,https://lonelyplanetstatic.imgix.net/marketing...,Night Kitchen,Berlin,This smartly seductive courtyard bistro is oft...,to_eat,Berlin
794,https://lp-cms-production.imgix.net/2019-06/e7...,Weilands Wellfood,Berlin,"After a menu revamp, this easy-going day-time ...",to_eat,Berlin
807,https://lonelyplanetstatic.imgix.net/marketing...,geh Veg,Berlin,"Showing off the simple side of vegan eating, g...",to_eat,Berlin


## User Interface Design


# 3. Analysis and Results


## Data Analysis Techniques Used

## Presentation of Results




### City Recommendation
The system will recommend cities based on the user's preferences and interests. Results will be presented as a ranked list or an interactive map that highlights the cities best suited to the user's travel needs.
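
One way to produce such a ranked list is to compare a user's interest weights with a per-city profile of activity categories. The sketch below uses cosine similarity for that comparison. This is an assumption, not a method described in this report, and `cosine_similarity`, `rank_cities`, and the example weights are hypothetical. The category names `to_shop` and `to_eat` follow the `type` column shown earlier.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_cities(user_interests, city_profiles):
    """Return (city, similarity) pairs, best match first."""
    scored = [
        (city, cosine_similarity(user_interests, profile))
        for city, profile in city_profiles.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative values only; they are not taken from the scraped data.
profiles = {
    "Berlin": {"to_shop": 1.0, "to_eat": 3.0},
    "Munich": {"to_shop": 3.0, "to_eat": 1.0},
}
ranking = rank_cities({"to_shop": 1.0}, profiles)
```

With these made-up weights, a user interested only in shopping would see Munich ranked ahead of Berlin.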

### Must-See Attractions
For each recommended city, the system will suggest a curated list of must-see attractions, landmarks, and points of interest. Each suggestion will include a description, images, and location details to help users plan their itineraries.

### Tour Recommendations
The system will recommend guided tours and organized activities within the suggested cities, tailored to the user's budget, interests, and desired duration. Results will be presented in a table showing the tour name, price, duration, and a brief description.
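
Filtering by duration requires turning strings such as "2 hours", "1–2 hours", or "6+ hours" into numbers. The following is a minimal sketch under stated assumptions: `max_duration_hours` and `filter_tours` are hypothetical helpers, durations are assumed to be expressed in hours as in the tables above, and "6+ hours" is treated as 6 even though it is really a lower bound.

```python
import re

# Matches integers and decimals inside a duration string.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def max_duration_hours(duration):
    """Largest number of hours mentioned in `duration`, or None if unparseable."""
    if "hour" not in duration.lower():
        return None  # other units are outside the scope of this sketch
    values = [float(n) for n in _NUMBER.findall(duration)]
    return max(values) if values else None


def filter_tours(tours, max_price, max_hours, tour_type=None):
    """Keep tours within the price and duration limits, highest score first.

    `tours` is a list of dicts with 'price', 'duration', 'type' and 'score' keys.
    """
    selected = []
    for tour in tours:
        hours = max_duration_hours(tour["duration"])
        if hours is None or hours > max_hours:
            continue
        if tour["price"] > max_price:
            continue
        if tour_type is not None and tour["type"] != tour_type:
            continue
        selected.append(tour)
    return sorted(selected, key=lambda t: t["score"], reverse=True)
```

Rows whose duration cannot be parsed are dropped rather than guessed, which also filters out scraped entries that are not tours.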

### Things to Do
Beyond traditional attractions and tours, the system will recommend local experiences, hidden gems, and off-the-beaten-path activities that match the user's interests and the city's offerings. These will be presented as a curated list or an interactive map.

### Restaurant Recommendations
Because dining is a central part of travel, the system will suggest restaurants within the recommended cities, taking into account the user's preferred cuisines, budget, and dietary restrictions. Results may be presented as a list or map view with ratings, reviews, and sample menus.

### Hotel Recommendations
To complete the travel plan, the system will recommend hotels in the suggested cities based on the user's budget, desired amenities, and location preferences. Results may be presented as a table or an interactive map showing hotel ratings, pricing, and proximity to attractions.
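
Proximity to attractions can be estimated from coordinates with the haversine formula. The sketch below is illustrative only: `haversine_km` and `rank_hotels` are hypothetical helpers, and it assumes each hotel record carries `lat`, `lon`, and `price` fields, which this report does not confirm.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_hotels(hotels, attraction, max_price):
    """Return hotels within budget, nearest to `attraction` (lat, lon) first."""
    lat, lon = attraction
    affordable = [h for h in hotels if h["price"] <= max_price]
    return sorted(
        affordable,
        key=lambda h: haversine_km(h["lat"], h["lon"], lat, lon),
    )
```

Amenities and location preferences could be added as further filters before sorting.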

Throughout, the Travel Destination Recommender is expected to use images, maps, and interactive elements to make the recommendations more engaging and informative. The system may also offer personalized travel itineraries and let users save and share recommendations for later reference or collaboration.


## Evaluation Metrics


# 4. Reflection and Conclusion

## Challenges Faced


## Lessons Learned


## Overall Significance of Findings


## Future Directions


# 5. References

# 6. Appendix