# **Content-Based Recommendation System for Mindanao Tourist Trip Packages**

This recommender system suggests travel packages in Mindanao based on the user's preferred activity, duration, budget, and previous destination. It uses a content-based approach to match their preferences with the characteristics of the destinations.

The system represents the textual descriptions of destinations as TF-IDF vectors, which give more weight to the terms that distinguish one description from the others, and compares those vectors with cosine similarity to measure how similar two descriptions are.
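As a minimal illustration of these two ideas (not the notebook's own code), the snippet below vectorizes three made-up descriptions with scikit-learn's `TfidfVectorizer` and compares every pair with `cosine_similarity`. Descriptions that share distinctive words score higher, and descriptions with no words in common score 0.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy descriptions (not taken from the dataset)
descriptions = [
    "surfing lessons on a white sand beach",
    "island hopping and snorkeling near a white sand beach",
    "mountain hiking and camping under the stars",
]

vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(descriptions)  # one TF-IDF row per description

# Pairwise cosine similarity between all descriptions (a 3 x 3 matrix)
similarity = cosine_similarity(tfidf_matrix)
print(similarity.round(2))
```

The first two descriptions share "white sand beach", so they are more similar to each other than either is to the hiking description.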

If the activity of a Mindanao destination exactly matches the preferred activity, those destinations are recommended. Otherwise, the system finds Mindanao destinations whose textual descriptions are similar to those of the user's previous destination. It also considers the similarity between the preferred activity and the activity of each Mindanao destination, using spaCy's similarity measure.

To get personalized recommendations, enter your previous destination, preferred activity, duration, and budget; the system then suggests the best-matching tour packages in Mindanao.

## Imports and Loading of Data

In [1]:
import pandas as pd
import re
import spacy
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import warnings

warnings.filterwarnings("ignore")



In [2]:
# Load spaCy model
nlp = spacy.load("en_core_web_lg")

In [3]:
# Load the data from CSV
data = pd.read_csv('../data/travel_packages_ph.csv', encoding="ISO-8859-1")
data = data.drop(data.columns[0], axis=1)
data.head()

| | region | location | activity | title | description | price | rating | review_count | duration | url | description_clean | general |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Luzon | Palawan | Day Trips | Underground River Day Trips from Puerto Prince... | Discover Puerto Princesa Subterranean River Na... | 2820.81 | 4.5 | 104 | 6.0 | https://www.tripadvisor.com.ph/AttractionProdu... | Discover Puerto Princesa Subterranean River Na... | Tours |
| 1 | Visayas | Cebu | Snorkeling | Whale Shark Encounter and Sumilon Sandbar with... | An amazing experience. It was a very long day,... | 8922.98 | 4.5 | 83 | 6.0 | https://www.tripadvisor.com.ph/AttractionProdu... | amazing experience long day 330am pickup 2hr h... | Water Activities |
| 2 | Luzon | Metro Manila | Day Trips | Amazing Manila - Tagaytay Full Day Sightseeing... | Thanks again to the amazing staff Sean (Lead T... | 7483.79 | 5.0 | 47 | 7.0 | https://www.tripadvisor.com.ph/AttractionProdu... | Thanks amazing staff Sean Lead Tour Guide Dan ... | Tours |
| 3 | Luzon | Palawan | Ziplining | 3-in-1 Adventure: Underground River, Zipline r... | Get the most out of your trip to the world-fam... | 4029.73 | 4.5 | 9 | 8.0 | https://www.tripadvisor.com.ph/AttractionProdu... | Get trip worldfamous Underground River booking... | Outdoor Activities |
| 4 | Luzon | Palawan | Day Trips | El Nido Island Hopping Day Tour from Puerto Pr... | Enjoy an island hopping experience in El Nido ... | 11455.96 | 5.0 | 1 | 18.0 | https://www.tripadvisor.com.ph/AttractionProdu... | Enjoy island hopping experience El Nido even h... | Tours |


In [4]:
data.tail()

| | region | location | activity | title | description | price | rating | review_count | duration | url | description_clean | general |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 757 | Luzon | Bicol | Boat tours | Whaleshark Interaction with Firefly Expedition... | Donsol is best known for its whale watching wh... | 2640.0 | 3.0 | 0 | 6.0 | https://www.klook.com/en-PH/activity/87485-don... | Donsol is best known for its whale watching wh... | Tours |
| 758 | Luzon | Tanay | Wildlife | Moolk Farm Tour in Rizal | Experience the New Zealand of Tanay Rizal and ... | 499.0 | 3.0 | 0 | | https://www.klook.com/en-PH/activity/87773-moo... | Experience the New Zealand of Tanay Rizal and ... | Outdoor Activities |
| 759 | Luzon | Tanay | Hiking | Mount Batolusong Join In Day Hike from Manila | Start your morning by hiking to Mt. Batolusong... | 1600.0 | 3.0 | 0 | 24.0 | https://www.klook.com/en-PH/activity/87889-mou... | Start your morning by hiking to Mt. Batolusong... | Tours |
| 760 | Visayas | Carles | Multi-day tours | Gigantes Islands All In Package from Iloilo | Enjoy this Ultimate Gigantes Islands All In Pa... | 2799.0 | 3.0 | 0 | 48.0 | https://www.klook.com/en-PH/activity/88078-gig... | Enjoy this Ultimate Gigantes Islands All In Pa... | Tours |
| 761 | Luzon | Antipolo | Camping | Camping Experience at Gabriel's Sanctuary | Experience camping in the sky only an hour awa... | 500.0 | 3.0 | 0 | | https://www.klook.com/en-PH/activity/88115-cam... | Experience camping in the sky only an hour awa... | Outdoor Activities |


In [5]:
# Make sure every cleaned description is a string (missing values become the literal text 'nan')
data['description_clean'] = data['description_clean'].astype(str)

In [6]:
data["description_clean"][1]

'amazing experience long day 330am pickup 2hr hectic drive Oslob Long queues buy whale shark tickets got tickets time breakfast basic tasty WHALE shark experience absolutely world boat trip Sumilon Sandbar wasnt impressed way many people tiny white beach visited 1st 2 waterfalls Tumalog steep walk road beautiful took bike back hill steep easily fall Last Kawasan Falls totally unprepared bit amazing need prepare 5kms walk alongthrough riverpools rocks Make sure good water shoes easy walk Take possessions waterproof bag take water also money vendors along way shattered walk amazing experience Next long 2hr drive back would recommend actually stay night 2 area fully utilize activities area'

## Helper Functions

In [7]:
def preprocess_text(text):
    # Convert to lowercase
    text = text.lower()

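    # Remove every character that is not an ASCII letter, a digit, or whitespace
    # (this also drops accented letters such as "ñ")
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)

    # Collapse runs of whitespace into single spaces and trim both ends
    text = re.sub(r'\s+', ' ', text).strip()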
    
    return text
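To see what the helper does to a typical review fragment, the snippet below re-declares the same three steps so that it runs on its own; the sample strings are made up. Stripping punctuation glues contractions and times together ("wasn't" becomes "wasnt", "3:30am" becomes "330am"), which matches the style of the cleaned description shown earlier, and non-ASCII letters are dropped.

```python
import re

def preprocess_text(text):
    # Same steps as the helper above: lowercase, strip punctuation, collapse whitespace
    text = text.lower()
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text

print(preprocess_text("Wasn't impressed:  3:30am pickup, 2-hr drive!"))
# wasnt impressed 330am pickup 2hr drive
print(preprocess_text("Parañaque"))
# paraaque
```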

In [8]:
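def calculate_similarity(text1, text2):
    """Return spaCy's similarity (cosine of the averaged word vectors) between two texts.

    Note: recommend_destination below computes activity similarity inline instead of calling this helper.
    """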
    doc1 = nlp(text1)
    doc2 = nlp(text2)
    similarity = doc1.similarity(doc2)
    return similarity

## Recommender Engine

How does the recommender engine work?

1. User Inputs: The system takes input from the user, including the destination they have been to, their preferred activity, duration, and budget.

2. Filtering: The system filters the dataset to packages whose location matches the user's destination, whose duration is at most the user's preferred duration, and whose price is within the user's budget. If no package passes this filter, the system returns an empty result and makes no recommendation.

3. Exact Activity Match: The system checks whether any Mindanao packages within the budget and duration have exactly the user's preferred activity. If so, up to three of them are recommended (longest duration first) and the remaining steps are skipped.

4. TF-IDF Vectorization: If there is no exact activity match, the system preprocesses the descriptions, fits a TF-IDF vectorizer on the Mindanao descriptions, and then vectorizes the user's destination descriptions with that same vocabulary, so words that never appear in a Mindanao description are ignored. TF-IDF represents each description numerically by weighting terms according to how important they are to it.

5. Cosine Similarity: The system calculates the cosine similarity between the TF-IDF vector of the user's destination (the first package that passed the filter in step 2) and the vector of each Mindanao destination. This measures how similar their descriptions are.

6. Activity Similarity: The system also calculates the semantic similarity between the user's preferred activity and the activity of each Mindanao destination using spaCy's similarity measure.

7. Average Similarity: The system combines the cosine similarity and the activity similarity by taking their average for each destination.

8. Ranking and Recommendation: The system keeps only the Mindanao destinations within the user's budget and duration, ranks them by average similarity, and recommends the top three.
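The steps above can be sketched end to end on a toy table. This is a minimal illustration under stated assumptions, not the notebook's function: the four-row DataFrame is made up, and `token_overlap` (a Jaccard word overlap) is a hypothetical stand-in for spaCy's vector similarity so that the snippet runs without downloading `en_core_web_lg`. The sketch compares destination and activity names case-insensitively.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue; the notebook itself loads travel_packages_ph.csv
toy = pd.DataFrame({
    "region": ["Visayas", "Mindanao", "Mindanao", "Mindanao"],
    "location": ["Cebu", "Siargao", "Davao", "Camiguin"],
    "activity": ["Snorkeling", "Surfing", "Hiking", "Snorkel Tours"],
    "description_clean": [
        "whale shark snorkeling sandbar beach",
        "surfing lessons beach waves",
        "mountain hiking trail summit",
        "snorkeling reef beach island",
    ],
    "price": [2500.0, 700.0, 1500.0, 2000.0],
    "duration": [6.0, 1.0, 8.0, 5.0],
})

def token_overlap(a, b):
    """Assumed stand-in for spaCy's Doc.similarity: Jaccard overlap of lowercase words."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def toy_recommend(df, destination, activity, budget, duration, top_n=3):
    def within(frame):
        # Duration and price limits applied at every filtering step
        return (frame["duration"] <= duration) & (frame["price"] <= budget)

    # Step 2: the previous destination must have a package within budget and duration
    user_rows = df[(df["location"].str.lower() == destination.lower()) & within(df)]
    if user_rows.empty:
        return pd.DataFrame()

    mindanao = df[df["region"] == "Mindanao"].copy()
    # Step 3: an exact activity match short-circuits the ranking
    exact = mindanao[(mindanao["activity"].str.lower() == activity.lower()) & within(mindanao)]
    if not exact.empty:
        return exact.sort_values("duration", ascending=False).head(top_n)

    # Steps 4-5: fit TF-IDF on all Mindanao text, compare with the user's first package
    tfidf = TfidfVectorizer()
    mindanao_matrix = tfidf.fit_transform(mindanao["description_clean"])
    user_matrix = tfidf.transform(user_rows["description_clean"])
    mindanao["cosine_similarity"] = cosine_similarity(user_matrix, mindanao_matrix)[0]

    # Step 6: activity similarity (token overlap here instead of spaCy vectors)
    mindanao["activity_similarity"] = mindanao["activity"].apply(lambda a: token_overlap(activity, a))

    # Steps 7-8: equal-weight average, filter by budget and duration, keep the best top_n
    mindanao["average_similarity"] = (mindanao["cosine_similarity"] + mindanao["activity_similarity"]) / 2
    ranked = mindanao[within(mindanao)].sort_values("average_similarity", ascending=False)
    return ranked.head(top_n)

result = toy_recommend(toy, "cebu", "Snorkel", budget=3000, duration=8)
print(result[["location", "average_similarity"]])
```

Here "Snorkel" matches no Mindanao activity exactly, so the similarity branch runs and Camiguin ranks first: its description shares "snorkeling" and "beach" with the Cebu package, and its activity shares the word "snorkel".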

Why TF-IDF on descriptions?
> By applying TF-IDF, we can quantify the importance of words in a description relative to the other descriptions. Because the vectorizer is fitted on the Mindanao descriptions, "the other descriptions" here means the Mindanao corpus rather than the entire dataset. Words that occur frequently in one destination's description but rarely in the others receive higher weights, which captures the distinguishing features of each destination. This lets us compare the user's destination with the Mindanao destinations based on the words that matter most in their descriptions.

Why spaCy similarities on activities?
> spaCy's `en_core_web_lg` model includes pre-trained word vectors learned from the contexts in which words occur, so words with similar meanings tend to have similar vectors. For a phrase, spaCy averages the token vectors, and `similarity` returns the cosine similarity of the two resulting vectors. Comparing the vectors of two activities therefore indicates how closely related they are in meaning, and activities can score as related even when they share no words. This lets us identify destinations whose activities align with the user's preferred activity.
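The following NumPy sketch mimics that computation with made-up three-dimensional vectors (assumption: the real `en_core_web_lg` vectors are 300-dimensional and come from the model, not from this dictionary). It averages the word vectors of each phrase and takes the cosine of the two averages, which is what spaCy's `Doc.similarity` does by default.

```python
import numpy as np

# Made-up 3-dimensional "word vectors" for illustration only
vectors = {
    "island": np.array([0.9, 0.1, 0.0]),
    "hopping": np.array([0.7, 0.3, 0.1]),
    "boat": np.array([0.8, 0.2, 0.1]),
    "tours": np.array([0.5, 0.5, 0.2]),
    "hiking": np.array([0.0, 0.2, 0.9]),
}

def phrase_vector(phrase):
    # Average of the token vectors, mirroring spaCy's default Doc.vector
    return np.mean([vectors[w] for w in phrase.lower().split()], axis=0)

def phrase_similarity(a, b):
    # Cosine similarity of the averaged vectors, mirroring Doc.similarity
    va, vb = phrase_vector(a), phrase_vector(b)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(round(phrase_similarity("Island Hopping", "Boat Tours"), 3))
print(round(phrase_similarity("Island Hopping", "Hiking"), 3))
```

"Island Hopping" and "Boat Tours" share no words but score much higher than "Island Hopping" and "Hiking", because their vectors point in similar directions.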

By considering both the semantic similarity of the activity using spaCy and the textual similarity of descriptions using TF-IDF and cosine similarity, the system can capture different aspects of user preferences and match them with suitable destinations.

In [9]:
def recommend_destination(user_destination, user_activity, user_budget, user_duration):
    # Convert user inputs to title case
    user_destination = user_destination.title()
    user_activity = user_activity.title()

    # Filter data based on the user's destination, duration, and budget
    user_filtered_data = data[(data['location'] == user_destination) & (data['duration'] <= user_duration) & (data['price'] <= user_budget)]

    if user_filtered_data.empty: # No similar results found
        return pd.DataFrame()

    # Filter data based on the Mindanao region and an exact, case-insensitive activity match
    # (the dataset mixes casings such as "Surfing" and "surfing")
    mindanao_exact_data = data[(data['region'] == 'Mindanao') & (data['activity'].str.lower() == user_activity.lower()) & (data['duration'] <= user_duration) & (data['price'] <= user_budget)]

    if not mindanao_exact_data.empty:
        # Exact activity match found in Mindanao: return up to 3, longest duration first
        mindanao_exact_data_sorted = mindanao_exact_data.sort_values(by='duration', ascending=False)
        return mindanao_exact_data_sorted.head(3)

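    # Preprocess the user's destination descriptions (copy first to avoid pandas' SettingWithCopyWarning)
    user_filtered_data = user_filtered_data.copy()
    user_filtered_data['preprocessed_description'] = user_filtered_data['description_clean'].apply(preprocess_text)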

    # Preprocess the Mindanao destination descriptions (copy to avoid SettingWithCopyWarning)
    mindanao_data = data[data['region'] == 'Mindanao'].copy()
    mindanao_data['preprocessed_description'] = mindanao_data['description_clean'].apply(preprocess_text)

    # Fit TF-IDF on the Mindanao descriptions only: vocabulary and IDF weights come from Mindanao
    tfidf = TfidfVectorizer()
    mindanao_tfidf_matrix = tfidf.fit_transform(mindanao_data['preprocessed_description'])

    # Project the user's destination descriptions onto the same Mindanao vocabulary
    user_tfidf_matrix = tfidf.transform(user_filtered_data['preprocessed_description'])

    # Compute cosine similarity between user's destination and Mindanao destinations
    similarity_matrix = cosine_similarity(user_tfidf_matrix, mindanao_tfidf_matrix)

    if similarity_matrix.size == 0:
        return pd.DataFrame()  # No similarity found, return empty DataFrame

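    # Rank Mindanao destinations by similarity to the FIRST package of the user's destination
    # (only row 0 of similarity_matrix is used; other packages at that destination are ignored)
    destination_index = similarity_matrix[0].argsort()[::-1]

    # Cosine similarity values in the same (descending) order
    cosine_similarities = similarity_matrix[0, destination_index]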

    # Semantic similarity between the user's activity and each Mindanao activity (spaCy vectors);
    # missing (non-string) activities get a similarity of 0.0
    user_activity_doc = nlp(user_activity)
    mindanao_data['activity_similarity'] = mindanao_data['activity'].apply(
        lambda x: user_activity_doc.similarity(nlp(x)) if isinstance(x, str) else 0.0
    )

    # Reorder Mindanao rows by cosine similarity and combine both scores with equal weight
    recommended_destinations = mindanao_data.iloc[destination_index].copy()
    recommended_destinations['cosine_similarity'] = cosine_similarities
    recommended_destinations['average_similarity'] = (recommended_destinations['activity_similarity'] + recommended_destinations['cosine_similarity']) / 2.0

    # Filter recommendations based on duration and budget
    filtered_destinations = recommended_destinations[(recommended_destinations['duration'] <= user_duration) & (recommended_destinations['price'] <= user_budget)]

    # Sort recommendations based on average similarity
    sorted_destinations = filtered_destinations.sort_values(by='average_similarity', ascending=False)

    # Get the top 3 recommendations
    top_recommendations = sorted_destinations.head(3)

    # return top_recommendations[['region', 'location', 'activity', 'title', 'description', 'price', 'rating', 'duration', 'general', 'activity_similarity', 'cosine_similarity', 'average_similarity']]
    return top_recommendations
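`recommend_destination` ranks Mindanao packages using only `similarity_matrix[0]`, that is, the first package found at the user's previous destination; any other packages there do not affect the ranking. The sketch below, on a made-up 2 x 4 matrix, contrasts that with one possible alternative: averaging each Mindanao column over all of the user's packages. The averaging variant is an assumption for illustration, not part of the notebook.

```python
import numpy as np

# Hypothetical cosine-similarity matrix: 2 packages at the user's destination (rows)
# compared against 4 Mindanao packages (columns)
similarity_matrix = np.array([
    [0.10, 0.40, 0.05, 0.20],
    [0.50, 0.00, 0.05, 0.30],
])

# What recommend_destination does: rank by the first row only
first_row_order = similarity_matrix[0].argsort()[::-1]

# Possible alternative (assumption): average each column over all user packages
mean_scores = similarity_matrix.mean(axis=0)
mean_order = mean_scores.argsort()[::-1]

print(first_row_order)  # [1 3 0 2]
print(mean_order)       # [0 3 1 2]
```

The two orderings disagree on the top package, so which rows of the matrix feed the ranking can matter whenever the user's destination has more than one package in budget.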

In [10]:
# Usage
user_destination = input("Enter the destination you've been to: ")
user_activity = input("Enter your preferred activity: ")
user_budget = float(input("Enter your preferred budget: "))
user_duration = float(input("Enter your preferred duration: "))

recommendations = recommend_destination(user_destination, user_activity, user_budget, user_duration)
recommendations


| | region | location | activity | title | description | price | rating | review_count | duration | url | description_clean | general | preprocessed_description | activity_similarity | cosine_similarity | average_similarity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 711 | Mindanao | Siargao | Surfing | Siargao Surfing Lessons | SIARGAO is the top surfing spot in the Philipp... | 700.0 | 4.5 | 4 | 1.0 | https://www.klook.com/en-PH/activity/76411-sia... | S IA RG AO is the top surfing spot in the Phil... | Water Activities | s ia rg ao is the top surfing spot in the phil... | 0.533359 | 0.030704 | 0.282031 |
| 388 | Mindanao | Surigao del Norte | surfing | Quality Surfing Lessons in Siargao Island | Highly recommend Ulap Siyam for anybody that w... | 2590.54 | 5.0 | 7 | 1.0 | https://www.tripadvisor.com.ph/AttractionProdu... | Highly recommend Ulap Siyam anybody wants get ... | Water Activities | highly recommend ulap siyam anybody wants get ... | 0.522602 | 0.028123 | 0.275362 |
| 315 | Mindanao | Surigao del Norte | Boat Charters | Siargao Island Hopping | Overall, this was a well-organized Private tou... | 3338.92 | 4.5 | 2 | 4.0 | https://www.tripadvisor.com.ph/AttractionProdu... | Overall wellorganized Private tour bit disappo... | Water Activities | overall wellorganized private tour bit disappo... | 0.461378 | 0.067522 | 0.26445 |
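A quick arithmetic check on the output above, using the values as displayed: `average_similarity` is the plain mean of the two scores. Because the TF-IDF cosine similarities here (about 0.03 to 0.07) are much smaller than the activity similarities (about 0.46 to 0.53), the ranking of these three packages is driven mostly by activity similarity.

```python
# Scores copied from the three recommendations displayed above (rounded to 6 decimals)
scores = {
    711: {"activity_similarity": 0.533359, "cosine_similarity": 0.030704},
    388: {"activity_similarity": 0.522602, "cosine_similarity": 0.028123},
    315: {"activity_similarity": 0.461378, "cosine_similarity": 0.067522},
}

averages = {}
for idx, s in scores.items():
    # Equal-weight average, as in recommend_destination
    averages[idx] = (s["activity_similarity"] + s["cosine_similarity"]) / 2.0
    # Fraction of the combined score that comes from activity similarity
    share = s["activity_similarity"] / (2.0 * averages[idx])
    print(f"{idx}: average={averages[idx]:.6f}, activity share={share:.0%}")
```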
