# Automated Itinerary Generator for Sri Lanka

This notebook implements a smart travel assistant that recommends personalized and practical travel plans for Sri Lanka. The system uses a multi-layered scoring algorithm to rank potential locations based on user preferences and cross-references them with a comprehensive dataset of locations.

## Key Features:
- **Multi-layered Scoring Algorithm**: Ranks locations based on user preferences, ratings, and travel practicality
- **Personalized Recommendations**: Takes into account user's province preferences and interests
- **Practical Itineraries**: Considers travel times and creates logical travel sequences
- **Human-readable Output**: Generates comprehensive itineraries with detailed information

---

## 1. Import Required Libraries

Import all necessary libraries for data processing, calculations, and visualization.

In [4]:
# Core data processing libraries
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# For handling geographical calculations and mapping
from math import radians, cos, sin, asin, sqrt
import folium
from folium import plugins

# For user input and display formatting
from IPython.display import display, HTML, clear_output
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual

# System utilities
import os
import json
from collections import defaultdict

print("All libraries imported successfully!")
print("System ready for Automated Itinerary Generation")

All libraries imported successfully!
System ready for Automated Itinerary Generation


## 2. Load and Explore the Dataset

Load the Sri Lankan locations dataset and perform initial exploration to understand the data structure.

In [11]:
# Load the datasets
try:
    # Load location data
    locations_df = pd.read_csv('data/sri_lanka_locations.csv')
    
    # Load travel times data
    travel_times_df = pd.read_csv('data/travel_times.csv')
    
    # Load reviews data
    reviews_df = pd.read_csv('data/reviews.csv')
    
    print("✅ All datasets loaded successfully!")
    print(f"Locations dataset shape: {locations_df.shape}")
    print(f"Travel times dataset shape: {travel_times_df.shape}")
    print(f"Reviews dataset shape: {reviews_df.shape}")
    
except FileNotFoundError as e:
    print(f"❌ Error loading datasets: {e}")
    print("Please ensure the data files are in the 'data/' directory")
    
# Display basic information about the datasets
print("\n" + "="*50)
print("DATASET OVERVIEW")
print("="*50)

✅ All datasets loaded successfully!
Locations dataset shape: (50, 14)
Travel times dataset shape: (353, 3)
Reviews dataset shape: (300, 4)

DATASET OVERVIEW


In [12]:
# Explore locations dataset structure
print("LOCATIONS DATASET:")
print("-" * 20)
print(f"Columns: {list(locations_df.columns)}")
print(f"Data types:\n{locations_df.dtypes}")
print(f"\nFirst few rows:")
display(locations_df.head())

# Check for missing values
print(f"\nMissing values:")
print(locations_df.isnull().sum())

# Explore unique values in key columns
province_col = 'province' if 'province' in locations_df.columns else 'Province'
category_col = 'primary_category' if 'primary_category' in locations_df.columns else 'Category'

if province_col in locations_df.columns:
    print(f"\nUnique Provinces: {locations_df[province_col].unique()}")
if category_col in locations_df.columns:
    print(f"\nUnique Categories: {locations_df[category_col].unique()}")

LOCATIONS DATASET:
--------------------
Columns: ['location_id', 'name', 'province', 'city', 'primary_category', 'secondary_tags', 'description', 'avg_time_hours', 'best_season', 'difficulty', 'entry_fee_required', 'popularity_score', 'latitude', 'longitude']
Data types:
location_id             int64
name                   object
province               object
city                   object
primary_category       object
secondary_tags         object
description            object
avg_time_hours        float64
best_season            object
difficulty             object
entry_fee_required       bool
popularity_score        int64
latitude              float64
longitude             float64
dtype: object

First few rows:


Unnamed: 0,location_id,name,province,city,primary_category,secondary_tags,description,avg_time_hours,best_season,difficulty,entry_fee_required,popularity_score,latitude,longitude
0,1,Galle Fort,Southern Province,Galle,History,"culture,food,walk,scenic",Explore the Dutch colonial-era fort with bouti...,4.5,Year-round,Easy,False,9,6.026,80.217
1,2,Unawatuna Beach,Southern Province,Unawatuna,Beach,"relax,swim,food,wildlife",A popular crescent-shaped beach known for its ...,5.0,Nov-Apr,Easy,False,8,6.011,80.248
2,3,Yala National Park,Southern Province,Tissamaharama,Nature,"wildlife,safari,scenic",Famous for having one of the highest leopard d...,6.0,Feb-Jul,,True,9,6.283,81.516
3,4,Temple of the Tooth,Central Province,Kandy,Culture,"history,religion,scenic",A sacred Buddhist temple that houses the relic...,2.5,Year-round,Easy,True,10,7.293,80.641
4,5,Sigiriya Rock,Central Province,Dambulla,History,"hike,nature,culture,scenic",An ancient rock fortress with stunning frescoe...,4.0,"Jan-Apr, Jul-Sep",Challenging,True,10,7.957,80.76



Missing values:
location_id           0
name                  0
province              0
city                  0
primary_category      0
secondary_tags        0
description           0
avg_time_hours        0
best_season           0
difficulty            9
entry_fee_required    0
popularity_score      0
latitude              0
longitude             0
dtype: int64

Unique Provinces: ['Southern Province' 'Central Province' 'Eastern Province'
 'Northern Province' 'Uva Province' 'Sabaragamuwa Province'
 'Western Province' 'North Central Province' 'North Western Province']

Unique Categories: ['History' 'Beach' 'Nature' 'Culture' 'Scenic' 'Adventure' 'Relax']


## 3. Data Preprocessing and Cleaning

Clean and preprocess the location data, handle missing values, and standardize formats.

In [13]:
class DataPreprocessor:
    """
    Class for preprocessing and cleaning the Sri Lanka tourism dataset
    """
    
    def __init__(self, locations_df, reviews_df, travel_times_df):
        self.locations_df = locations_df.copy()
        self.reviews_df = reviews_df.copy()
        self.travel_times_df = travel_times_df.copy()
        
    def clean_locations_data(self):
        """Clean and standardize the locations dataset"""
        
        # Remove any duplicates
        initial_count = len(self.locations_df)
        self.locations_df = self.locations_df.drop_duplicates()
        print(f"Removed {initial_count - len(self.locations_df)} duplicate locations")
        
        # Create standardized column names mapping
        column_mapping = {}
        for col in self.locations_df.columns:
            if col.lower() in ['province', 'provinces']:
                column_mapping[col] = 'Province'
            elif col.lower() in ['category', 'primary_category', 'type']:
                column_mapping[col] = 'Category'
            elif col.lower() in ['name', 'location_name', 'title']:
                column_mapping[col] = 'Name'
            elif col.lower() in ['latitude', 'lat']:
                column_mapping[col] = 'Latitude'
            elif col.lower() in ['longitude', 'lon', 'lng']:
                column_mapping[col] = 'Longitude'
        
        # Apply column renaming
        self.locations_df = self.locations_df.rename(columns=column_mapping)
        
        # Handle missing values based on data type
        for column in self.locations_df.columns:
            if self.locations_df[column].dtype == 'object':
                # For string columns, fill with 'Unknown'
                self.locations_df[column] = self.locations_df[column].fillna('Unknown')
            else:
                # For numeric columns, fill with median
                self.locations_df[column] = self.locations_df[column].fillna(
                    self.locations_df[column].median()
                )
        
        # Standardize province names
        if 'Province' in self.locations_df.columns:
            self.locations_df['Province'] = self.locations_df['Province'].str.strip().str.title()
        
        # Standardize category names
        if 'Category' in self.locations_df.columns:
            self.locations_df['Category'] = self.locations_df['Category'].str.strip().str.title()
            
        return self.locations_df
    
    def process_reviews_data(self):
        """Process and aggregate review data"""
        
        if len(self.reviews_df) > 0:
            # Calculate average ratings and review counts per location
            # Use the correct column name from the actual data structure
            location_col = 'location_id' if 'location_id' in self.reviews_df.columns else 'Location'
            rating_col = 'rating' if 'rating' in self.reviews_df.columns else 'Rating'
            
            review_stats = self.reviews_df.groupby(location_col).agg({
                rating_col: ['mean', 'count']
            }).round(2)
            
            # Flatten column names
            review_stats.columns = ['Avg_Rating', 'Rating_Count']
            review_stats = review_stats.reset_index()
            
            return review_stats
        else:
            print("No review data available")
            return pd.DataFrame()
    
    def merge_datasets(self):
        """Merge all datasets into a comprehensive dataset"""
        
        # Start with locations as base
        comprehensive_df = self.locations_df.copy()
        
        # Merge with review statistics if available
        review_stats = self.process_reviews_data()
        if len(review_stats) > 0:
            # Use the correct location identifier column
            location_id_col = 'location_id' if 'location_id' in comprehensive_df.columns else comprehensive_df.columns[0]
            review_id_col = 'location_id' if 'location_id' in review_stats.columns else 'Location'
            
            comprehensive_df = pd.merge(
                comprehensive_df, 
                review_stats, 
                left_on=location_id_col,
                right_on=review_id_col, 
                how='left'
            )
        
        # Fill missing review data
        if 'Avg_Rating' in comprehensive_df.columns:
            comprehensive_df['Avg_Rating'] = comprehensive_df['Avg_Rating'].fillna(3.0)
            comprehensive_df['Rating_Count'] = comprehensive_df['Rating_Count'].fillna(0)
        
        return comprehensive_df

# Initialize preprocessor and clean data
preprocessor = DataPreprocessor(locations_df, reviews_df, travel_times_df)
cleaned_locations = preprocessor.clean_locations_data()
comprehensive_df = preprocessor.merge_datasets()

print("✅ Data preprocessing completed!")
print(f"Final dataset shape: {comprehensive_df.shape}")
print(f"Columns: {list(comprehensive_df.columns)}")

# Display sample of cleaned data
display(comprehensive_df.head())

Removed 0 duplicate locations
✅ Data preprocessing completed!
Final dataset shape: (50, 16)
Columns: ['location_id', 'Name', 'Province', 'city', 'Category', 'secondary_tags', 'description', 'avg_time_hours', 'best_season', 'difficulty', 'entry_fee_required', 'popularity_score', 'Latitude', 'Longitude', 'Avg_Rating', 'Rating_Count']


Unnamed: 0,location_id,Name,Province,city,Category,secondary_tags,description,avg_time_hours,best_season,difficulty,entry_fee_required,popularity_score,Latitude,Longitude,Avg_Rating,Rating_Count
0,1,Galle Fort,Southern Province,Galle,History,"culture,food,walk,scenic",Explore the Dutch colonial-era fort with bouti...,4.5,Year-round,Easy,False,9,6.026,80.217,4.22,9.0
1,2,Unawatuna Beach,Southern Province,Unawatuna,Beach,"relax,swim,food,wildlife",A popular crescent-shaped beach known for its ...,5.0,Nov-Apr,Easy,False,8,6.011,80.248,3.78,9.0
2,3,Yala National Park,Southern Province,Tissamaharama,Nature,"wildlife,safari,scenic",Famous for having one of the highest leopard d...,6.0,Feb-Jul,Unknown,True,9,6.283,81.516,4.0,9.0
3,4,Temple of the Tooth,Central Province,Kandy,Culture,"history,religion,scenic",A sacred Buddhist temple that houses the relic...,2.5,Year-round,Easy,True,10,7.293,80.641,4.22,9.0
4,5,Sigiriya Rock,Central Province,Dambulla,History,"hike,nature,culture,scenic",An ancient rock fortress with stunning frescoe...,4.0,"Jan-Apr, Jul-Sep",Challenging,True,10,7.957,80.76,4.18,11.0


## 4. User Input Collection

Create interactive interface to collect user preferences for itinerary generation.

In [15]:
class UserPreferenceCollector:
    """
    Interactive class to collect user preferences for itinerary generation
    """
    
    def __init__(self, comprehensive_df):
        self.df = comprehensive_df
        self.user_preferences = {}
    
    def get_available_options(self):
        """Extract available options from the dataset"""
        options = {}
        
        # Get available provinces
        if 'Province' in self.df.columns:
            options['provinces'] = sorted(self.df['Province'].unique())
        
        # Get available categories/interests  
        if 'Category' in self.df.columns:
            options['categories'] = sorted(self.df['Category'].unique())
        
        return options
    
    def collect_preferences_interactive(self):
        """Create interactive widgets for preference collection"""
        
        options = self.get_available_options()
        
        # Province selection
        if 'provinces' in options:
            province_widget = widgets.SelectMultiple(
                options=options['provinces'],
                description='Provinces:',
                disabled=False,
                layout=widgets.Layout(height='120px')
            )
        
        # Interest/Category selection
        if 'categories' in options:
            category_widget = widgets.SelectMultiple(
                options=options['categories'],
                description='Interests:',
                disabled=False,
                layout=widgets.Layout(height='120px')
            )
        
        # Duration selection
        duration_widget = widgets.IntSlider(
            value=3,
            min=1,
            max=14,
            step=1,
            description='Days:',
            disabled=False,
            continuous_update=False,
            orientation='horizontal',
            readout=True,
            readout_format='d'
        )
        
        # Budget selection (example categories)
        budget_widget = widgets.Dropdown(
            options=['Budget ($0-50/day)', 'Mid-range ($50-150/day)', 'Luxury ($150+/day)'],
            value='Mid-range ($50-150/day)',
            description='Budget:',
            disabled=False,
        )
        
        # Travel style
        style_widget = widgets.Dropdown(
            options=['Relaxed', 'Moderate', 'Intensive'],
            value='Moderate',
            description='Travel Style:',
            disabled=False,
        )
        
        # Display widgets
        print("🗺️ AUTOMATED ITINERARY GENERATOR - USER PREFERENCES")
        print("=" * 60)
        print("Please select your preferences below:")
        
        if 'provinces' in options:
            display(province_widget)
        if 'categories' in options:
            display(category_widget)
        display(duration_widget)
        display(budget_widget)
        display(style_widget)
        
        return {
            'provinces': province_widget if 'provinces' in options else None,
            'categories': category_widget if 'categories' in options else None,
            'duration': duration_widget,
            'budget': budget_widget,
            'style': style_widget
        }
    
    def collect_preferences_manual(self):
        """Manual preference collection method"""
        
        options = self.get_available_options()
        preferences = {}
        
        print("🗺️ AUTOMATED ITINERARY GENERATOR - USER PREFERENCES")
        print("=" * 60)
        
        # Province preferences
        if 'provinces' in options:
            print(f"Available Provinces: {', '.join(options['provinces'])}")
            prov_input = input("Enter preferred provinces (comma-separated, or 'all'): ").strip()
            if prov_input.lower() == 'all':
                preferences['provinces'] = options['provinces']
            else:
                preferences['provinces'] = [p.strip().title() for p in prov_input.split(',') if p.strip()]
        
        # Category preferences  
        if 'categories' in options:
            print(f"Available Categories: {', '.join(options['categories'])}")
            cat_input = input("Enter preferred interests/categories (comma-separated, or 'all'): ").strip()
            if cat_input.lower() == 'all':
                preferences['categories'] = options['categories']
            else:
                preferences['categories'] = [c.strip().title() for c in cat_input.split(',') if c.strip()]
        
        # Duration
        try:
            duration = int(input("Enter trip duration in days (1-14): ").strip())
            preferences['duration'] = max(1, min(14, duration))
        except ValueError:
            preferences['duration'] = 3
            print("Invalid input. Using default duration: 3 days")
        
        # Budget
        print("Budget categories: 1) Budget ($0-50/day), 2) Mid-range ($50-150/day), 3) Luxury ($150+/day)")
        try:
            budget_choice = int(input("Enter budget category (1-3): ").strip())
            budget_options = ['Budget ($0-50/day)', 'Mid-range ($50-150/day)', 'Luxury ($150+/day)']
            preferences['budget'] = budget_options[budget_choice - 1]
        except (ValueError, IndexError):
            preferences['budget'] = 'Mid-range ($50-150/day)'
            print("Invalid input. Using default budget: Mid-range")
        
        # Travel style
        print("Travel styles: 1) Relaxed, 2) Moderate, 3) Intensive")
        try:
            style_choice = int(input("Enter travel style (1-3): ").strip())
            style_options = ['Relaxed', 'Moderate', 'Intensive']
            preferences['style'] = style_options[style_choice - 1]
        except (ValueError, IndexError):
            preferences['style'] = 'Moderate'
            print("Invalid input. Using default style: Moderate")
        
        self.user_preferences = preferences
        return preferences
    
    def display_preferences_summary(self, preferences):
        """Display a summary of collected preferences"""
        
        print("\n" + "="*50)
        print("📋 YOUR TRAVEL PREFERENCES SUMMARY")
        print("="*50)
        
        for key, value in preferences.items():
            if isinstance(value, list):
                print(f"{key.title()}: {', '.join(value) if value else 'None selected'}")
            else:
                print(f"{key.title()}: {value}")
        
        print("="*50)

# Initialize preference collector
preference_collector = UserPreferenceCollector(comprehensive_df)

# For demonstration, let's show manual collection
print("Choose preference collection method:")
print("1. Manual input")
print("2. Interactive widgets (for Jupyter environments)")

# Manual collection example
user_prefs = preference_collector.collect_preferences_manual()
preference_collector.display_preferences_summary(user_prefs)

Choose preference collection method:
1. Manual input
2. Interactive widgets (for Jupyter environments)
🗺️ AUTOMATED ITINERARY GENERATOR - USER PREFERENCES
Available Provinces: Central Province, Eastern Province, North Central Province, North Western Province, Northern Province, Sabaragamuwa Province, Southern Province, Uva Province, Western Province
Available Categories: Adventure, Beach, Culture, History, Nature, Relax, Scenic
Available Categories: Adventure, Beach, Culture, History, Nature, Relax, Scenic
Budget categories: 1) Budget ($0-50/day), 2) Mid-range ($50-150/day), 3) Luxury ($150+/day)
Budget categories: 1) Budget ($0-50/day), 2) Mid-range ($50-150/day), 3) Luxury ($150+/day)
Travel styles: 1) Relaxed, 2) Moderate, 3) Intensive
Travel styles: 1) Relaxed, 2) Moderate, 3) Intensive

📋 YOUR TRAVEL PREFERENCES SUMMARY
Provinces: Central
Categories: Nature, Swim
Duration: 2
Budget: Budget ($0-50/day)
Style: Relaxed

📋 YOUR TRAVEL PREFERENCES SUMMARY
Provinces: Central
Categories:

## 5. Implement Multi-layered Scoring Algorithm

Develop the core scoring system that weights locations based on multiple factors.

In [22]:
class MultilayeredScoringAlgorithm:
    """
    Advanced scoring algorithm that evaluates locations based on multiple criteria
    """
    
    def __init__(self, comprehensive_df):
        self.df = comprehensive_df
        self.scoring_weights = {
            'province_match': 0.25,     # 25% weight for province preference
            'category_match': 0.30,     # 30% weight for interest/category match
            'rating_score': 0.25,       # 25% weight for user ratings
            'popularity_score': 0.10,   # 10% weight for popularity (review count)
            'accessibility_score': 0.10 # 10% weight for accessibility
        }
    
    def calculate_province_score(self, location_province, preferred_provinces):
        """Calculate score based on province preference match"""
        
        if not preferred_provinces or 'all' in [p.lower() for p in preferred_provinces]:
            return 1.0
        
        # Exact match gets full score
        if location_province in preferred_provinces:
            return 1.0
        
        # Partial match for similar provinces (can be enhanced with geographical proximity)
        for pref_prov in preferred_provinces:
            if pref_prov.lower() in location_province.lower() or location_province.lower() in pref_prov.lower():
                return 0.8
        
        return 0.2  # Minimum score for non-matching provinces
    
    def calculate_category_score(self, location_category, preferred_categories):
        """Calculate score based on interest/category match"""
        
        if not preferred_categories or 'all' in [c.lower() for c in preferred_categories]:
            return 1.0
        
        # Exact match gets full score
        if location_category in preferred_categories:
            return 1.0
        
        # Partial match for related categories
        category_synonyms = {
            'Historical': ['Culture', 'Heritage', 'Temple', 'Monument'],
            'Nature': ['Wildlife', 'Beach', 'Mountain', 'National Park'],
            'Adventure': ['Sports', 'Hiking', 'Water Sports', 'Climbing'],
            'Religious': ['Temple', 'Church', 'Monastery', 'Pilgrimage'],
            'Beach': ['Coastal', 'Water Sports', 'Resort'],
            'Cultural': ['Historical', 'Traditional', 'Museum', 'Art']
        }
        
        for pref_cat in preferred_categories:
            # Direct category match
            if pref_cat.lower() in location_category.lower() or location_category.lower() in pref_cat.lower():
                return 0.9
            
            # Synonym match
            if pref_cat in category_synonyms:
                for synonym in category_synonyms[pref_cat]:
                    if synonym.lower() in location_category.lower():
                        return 0.7
        
        return 0.3  # Minimum score for non-matching categories
    
    def calculate_rating_score(self, rating, max_rating=5.0):
        """Normalize rating to 0-1 scale"""
        
        if pd.isna(rating) or rating == 0:
            return 0.6  # Default score for locations without ratings
        
        return min(rating / max_rating, 1.0)
    
    def calculate_popularity_score(self, review_count, max_reviews=None):
        """Calculate popularity score based on review count"""
        
        if pd.isna(review_count) or review_count == 0:
            return 0.5  # Default score for locations without reviews
        
        # Use log scale to avoid extreme values dominating
        if max_reviews is None:
            max_reviews = self.df['Review_Count'].max() if 'Review_Count' in self.df.columns else 100
        
        # Log scale normalization
        log_reviews = np.log(review_count + 1)
        log_max = np.log(max_reviews + 1)
        
        return min(log_reviews / log_max, 1.0)
    
    def calculate_accessibility_score(self, location_data, travel_style):
        """Calculate accessibility score based on travel style and location characteristics"""
        
        # Base accessibility score
        base_score = 0.7
        
        # Adjust based on travel style
        style_adjustments = {
            'Relaxed': {'bonus_for': ['Beach', 'Resort', 'City'], 'penalty_for': ['Mountain', 'Remote']},
            'Moderate': {'bonus_for': ['Cultural', 'Historical'], 'penalty_for': []},
            'Intensive': {'bonus_for': ['Adventure', 'Hiking', 'Wildlife'], 'penalty_for': ['Beach']}
        }
        
        if travel_style in style_adjustments:
            adjustments = style_adjustments[travel_style]
            
            # Check location category for bonuses/penalties
            location_category = location_data.get('Category', '')
            
            for bonus_category in adjustments['bonus_for']:
                if bonus_category.lower() in location_category.lower():
                    base_score += 0.2
                    break
            
            for penalty_category in adjustments['penalty_for']:
                if penalty_category.lower() in location_category.lower():
                    base_score -= 0.1
                    break
        
        return min(max(base_score, 0.1), 1.0)  # Keep score between 0.1 and 1.0
    
    def calculate_composite_score(self, location_data, user_preferences):
        """Calculate the final composite score for a location"""
        
        scores = {}
        
        # Province score
        scores['province'] = self.calculate_province_score(
            location_data.get('Province', ''),
            user_preferences.get('provinces', [])
        )
        
        # Category score
        scores['category'] = self.calculate_category_score(
            location_data.get('Category', ''),
            user_preferences.get('categories', [])
        )
        
        # Rating score
        scores['rating'] = self.calculate_rating_score(
            location_data.get('Avg_Rating', 0)
        )
        
        # Popularity score
        scores['popularity'] = self.calculate_popularity_score(
            location_data.get('Review_Count', 0)
        )
        
        # Accessibility score
        scores['accessibility'] = self.calculate_accessibility_score(
            location_data,
            user_preferences.get('style', 'Moderate')
        )
        
        # Calculate weighted composite score
        composite_score = (
            scores['province'] * self.scoring_weights['province_match'] +
            scores['category'] * self.scoring_weights['category_match'] +
            scores['rating'] * self.scoring_weights['rating_score'] +
            scores['popularity'] * self.scoring_weights['popularity_score'] +
            scores['accessibility'] * self.scoring_weights['accessibility_score']
        )
        
        return composite_score, scores
    
    def score_all_locations(self, user_preferences):
        """Score all locations in the dataset"""
        
        scored_locations = []
        
        for idx, location in self.df.iterrows():
            location_dict = location.to_dict()
            composite_score, individual_scores = self.calculate_composite_score(
                location_dict, user_preferences
            )
            
            location_dict['Composite_Score'] = composite_score
            location_dict['Individual_Scores'] = individual_scores
            location_dict['Index'] = idx
            
            scored_locations.append(location_dict)
        
        # Sort by composite score in descending order
        scored_locations.sort(key=lambda x: x['Composite_Score'], reverse=True)
        
        return scored_locations
    
    def get_top_locations(self, user_preferences, top_n=20):
        """Get top N locations based on scoring"""
        
        scored_locations = self.score_all_locations(user_preferences)
        return scored_locations[:top_n]

# Initialize the scoring algorithm
scoring_algorithm = MultilayeredScoringAlgorithm(comprehensive_df)

# Test the scoring system with user preferences
if 'user_prefs' in locals():
    print("🎯 TESTING SCORING ALGORITHM")
    print("=" * 50)
    
    # Get top 10 locations for demonstration
    top_locations = scoring_algorithm.get_top_locations(user_prefs, top_n=10)
    
    print(f"Top 10 recommended locations based on your preferences:")
    print("-" * 50)
    
    for i, location in enumerate(top_locations, 1):
        name = location.get('Name', location.get('name', 'Unknown'))  # Use standardized Name column
        province = location.get('Province', 'Unknown')
        category = location.get('Category', 'Unknown')
        rating = location.get('Avg_Rating', 0)
        score = location.get('Composite_Score', 0)
        print(f"{i:2d}. {name}")
        print(f"    Province: {province} | Category: {category}")
        print(f"    Rating: {rating:.1f} | Composite Score: {score:.3f}")
        print()
    
    print("✅ Scoring algorithm working successfully!")
else:
    print("⚠️ Run the user preference collection first to test the scoring algorithm")

🎯 TESTING SCORING ALGORITHM
Top 10 recommended locations based on your preferences:
--------------------------------------------------
 1. Knuckles Mountain Range
    Province: Central Province | Category: Nature
    Rating: 4.2 | Composite Score: 0.833

 2. Horton Plains National Park
    Province: Central Province | Category: Nature
    Rating: 4.2 | Composite Score: 0.831

 3. Wasgamuwa National Park
    Province: Central Province | Category: Nature
    Rating: 4.0 | Composite Score: 0.820

 4. Kaudulla National Park
    Province: North Central Province | Category: Nature
    Rating: 4.0 | Composite Score: 0.820

 5. Bambarakanda Falls
    Province: Uva Province | Category: Nature
    Rating: 5.0 | Composite Score: 0.720

 6. Udawalawe National Park
    Province: Sabaragamuwa Province | Category: Nature
    Rating: 4.3 | Composite Score: 0.686

 7. Yala National Park
    Province: Southern Province | Category: Nature
    Rating: 4.0 | Composite Score: 0.670

 8. Diyaluma Falls
    P

## 6. Location Filtering and Ranking

Apply additional filters and ranking logic to optimize location selection.

In [23]:
class LocationFilterAndRanker:
    """
    Advanced filtering and ranking system for location selection
    """
    
    def __init__(self, scoring_algorithm):
        self.scoring_algorithm = scoring_algorithm
        
    def apply_budget_filter(self, locations, budget_category):
        """Filter locations based on budget constraints"""
        
        # Define budget-based filtering criteria
        budget_filters = {
            'Budget ($0-50/day)': {
                'min_rating': 3.0,  # Lower minimum rating for budget options
                'exclude_categories': ['Luxury Resort', 'High-End Restaurant'],
                'prefer_categories': ['Beach', 'Historical', 'Nature', 'Cultural']
            },
            'Mid-range ($50-150/day)': {
                'min_rating': 3.5,
                'exclude_categories': [],
                'prefer_categories': []
            },
            'Luxury ($150+/day)': {
                'min_rating': 4.0,
                'exclude_categories': ['Budget'],
                'prefer_categories': ['Resort', 'Luxury', 'Premium', 'Five Star']
            }
        }
        
        if budget_category not in budget_filters:
            return locations
        
        filter_criteria = budget_filters[budget_category]
        filtered_locations = []
        
        for location in locations:
            # Check minimum rating
            rating = location.get('Avg_Rating', 0)
            if rating < filter_criteria['min_rating'] and rating > 0:
                continue
            
            # Check excluded categories
            category = location.get('Category', '')
            if any(excluded.lower() in category.lower() for excluded in filter_criteria['exclude_categories']):
                continue
            
            # Boost score for preferred categories
            if any(preferred.lower() in category.lower() for preferred in filter_criteria['prefer_categories']):
                location['Composite_Score'] = min(location['Composite_Score'] + 0.1, 1.0)
            
            filtered_locations.append(location)
        
        return filtered_locations
    
    def apply_diversity_filter(self, locations, max_per_category=None, max_per_province=None):
        """Ensure diversity in location selection"""
        
        if max_per_category is None:
            max_per_category = max(2, len(locations) // 5)  # At most 20% from same category
        
        if max_per_province is None:
            max_per_province = max(3, len(locations) // 3)  # At most 33% from same province
        
        category_counts = defaultdict(int)
        province_counts = defaultdict(int)
        diverse_locations = []
        
        for location in locations:
            category = location.get('Category', 'Unknown')
            province = location.get('Province', 'Unknown')
            
            # Check if adding this location would exceed diversity limits
            if (category_counts[category] < max_per_category and 
                province_counts[province] < max_per_province):
                
                diverse_locations.append(location)
                category_counts[category] += 1
                province_counts[province] += 1
        
        return diverse_locations
    
    def apply_accessibility_filter(self, locations, travel_style, duration):
        """Filter based on accessibility and travel style"""
        
        accessibility_criteria = {
            'Relaxed': {
                'max_locations_per_day': 1,
                'prefer_accessible': True,
                'avoid_remote': True
            },
            'Moderate': {
                'max_locations_per_day': 2,
                'prefer_accessible': False,
                'avoid_remote': False
            },
            'Intensive': {
                'max_locations_per_day': 3,
                'prefer_accessible': False,
                'avoid_remote': False
            }
        }
        
        if travel_style not in accessibility_criteria:
            return locations
        
        criteria = accessibility_criteria[travel_style]
        max_total_locations = duration * criteria['max_locations_per_day']
        
        # Limit total locations based on travel style and duration
        filtered_locations = locations[:max_total_locations]
        
        # Apply additional style-based filtering
        if criteria['avoid_remote']:
            filtered_locations = [
                loc for loc in filtered_locations 
                if not any(remote_keyword in loc.get('Category', '').lower() 
                          for remote_keyword in ['remote', 'wilderness', 'trek'])
            ]
        
        return filtered_locations
    
    def rank_locations_final(self, locations, user_preferences):
        """Apply final ranking with additional considerations"""
        
        duration = user_preferences.get('duration', 3)
        travel_style = user_preferences.get('style', 'Moderate')
        
        # Adjust scores based on itinerary coherence
        for i, location in enumerate(locations):
            # Bonus for locations that fit well in a sequence
            base_score = location['Composite_Score']
            
            # Early locations get slight bonus (easier to plan around)
            if i < len(locations) // 3:
                location['Composite_Score'] = min(base_score + 0.05, 1.0)
            
            # Penalize locations that might be too similar to already selected ones
            for j, other_location in enumerate(locations[:i]):
                if (location.get('Category') == other_location.get('Category') and
                    location.get('Province') == other_location.get('Province')):
                    location['Composite_Score'] = max(base_score - 0.1, 0.0)
                    break
        
        # Re-sort after adjustments
        locations.sort(key=lambda x: x['Composite_Score'], reverse=True)
        return locations
    
    def filter_and_rank_locations(self, user_preferences, max_locations=None):
        """Complete filtering and ranking pipeline"""
        
        # Start with scored locations
        scored_locations = self.scoring_algorithm.score_all_locations(user_preferences)
        
        # Apply budget filter
        budget_filtered = self.apply_budget_filter(
            scored_locations, 
            user_preferences.get('budget', 'Mid-range ($50-150/day)')
        )
        
        # Apply accessibility filter
        accessibility_filtered = self.apply_accessibility_filter(
            budget_filtered,
            user_preferences.get('style', 'Moderate'),
            user_preferences.get('duration', 3)
        )
        
        # Apply diversity filter
        diverse_locations = self.apply_diversity_filter(accessibility_filtered)
        
        # Final ranking
        final_ranked = self.rank_locations_final(diverse_locations, user_preferences)
        
        # Limit to maximum locations if specified
        if max_locations:
            final_ranked = final_ranked[:max_locations]
        
        return final_ranked

# Initialize the filter and ranker
filter_ranker = LocationFilterAndRanker(scoring_algorithm)

# Test the filtering and ranking system
if 'user_prefs' in locals():
    print("🔍 TESTING FILTERING AND RANKING SYSTEM")
    print("=" * 60)
    
    # Get filtered and ranked locations
    final_locations = filter_ranker.filter_and_rank_locations(
        user_prefs, 
        max_locations=user_prefs.get('duration', 3) * 2  # 2 locations per day max
    )
    
    print(f"Final recommended locations after filtering and ranking:")
    print(f"(Limited to {len(final_locations)} locations based on {user_prefs.get('duration', 3)}-day itinerary)")
    print("-" * 60)
    
    for i, location in enumerate(final_locations, 1):
        name = location.get('Name', location.get('name', 'Unknown'))
        province = location.get('Province', 'Unknown')
        category = location.get('Category', 'Unknown')
        rating = location.get('Avg_Rating', 0)
        score = location.get('Composite_Score', 0)
        
        print(f"{i:2d}. {name}")
        print(f"    Province: {province} | Category: {category}")
        print(f"    Rating: {rating:.1f} | Final Score: {score:.3f}")
        
        # Show individual score breakdown
        individual_scores = location.get('Individual_Scores', {})
        if individual_scores:
            print(f"    Score breakdown: ", end="")
            for score_type, score_val in individual_scores.items():
                print(f"{score_type}: {score_val:.2f} ", end="")
        print("\n")
    
    print("✅ Filtering and ranking completed successfully!")
else:
    print("⚠️ Run the user preference collection first to test the filtering system")

🔍 TESTING FILTERING AND RANKING SYSTEM
Final recommended locations after filtering and ranking:
(Limited to 2 locations based on 2-day itinerary)
------------------------------------------------------------
 1. Knuckles Mountain Range
    Province: Central Province | Category: Nature
    Rating: 4.2 | Final Score: 0.932
    Score breakdown: province: 0.80 category: 1.00 rating: 0.85 popularity: 0.50 accessibility: 0.70 

 2. Horton Plains National Park
    Province: Central Province | Category: Nature
    Rating: 4.2 | Final Score: 0.831
    Score breakdown: province: 0.80 category: 1.00 rating: 0.84 popularity: 0.50 accessibility: 0.70 

✅ Filtering and ranking completed successfully!


## 7. Travel Time Calculation

Implement functions to calculate travel times and optimize visit sequences.

In [19]:
class TravelTimeCalculator:
    """
    Calculate travel times and optimize travel sequences
    """
    
    def __init__(self, travel_times_df=None):
        self.travel_times_df = travel_times_df
        
    def haversine_distance(self, lat1, lon1, lat2, lon2):
        """
        Calculate the great circle distance between two points 
        on Earth (specified in decimal degrees)
        """
        # Convert decimal degrees to radians
        lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
        
        # Haversine formula
        dlat = lat2 - lat1
        dlon = lon2 - lon1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        c = 2 * asin(sqrt(a))
        
        # Radius of Earth in kilometers
        r = 6371
        
        return c * r
    
    def estimate_travel_time(self, distance_km, transport_mode='car'):
        """
        Estimate travel time based on distance and transport mode
        """
        # Average speeds in km/h for different transport modes in Sri Lanka
        speeds = {
            'car': 45,      # Considering traffic and road conditions
            'bus': 35,      # Public transport
            'train': 40,    # Rail transport
            'walk': 5,      # Walking
            'tuk_tuk': 30   # Three-wheeler
        }
        
        speed = speeds.get(transport_mode, speeds['car'])
        travel_time_hours = distance_km / speed
        
        # Add buffer time for stops, traffic, etc.
        buffer_multiplier = 1.3
        
        return travel_time_hours * buffer_multiplier
    
    def get_travel_time_between_locations(self, loc1, loc2, transport_mode='car'):
        """
        Get travel time between two locations
        """
        # Try to get from travel times dataset first
        if self.travel_times_df is not None and len(self.travel_times_df) > 0:
            # Use correct column names from the actual data
            origin_col = 'origin_city' if 'origin_city' in self.travel_times_df.columns else 'Origin'
            dest_col = 'destination_city' if 'destination_city' in self.travel_times_df.columns else 'Destination'
            time_col = 'travel_time_hours' if 'travel_time_hours' in self.travel_times_df.columns else 'Travel_Time_Hours'
            
            # Get city names from locations (assuming we have a city column or use name)
            loc1_city = loc1.get('city', loc1.get('City', loc1.get('name', loc1.get('Name', ''))))
            loc2_city = loc2.get('city', loc2.get('City', loc2.get('name', loc2.get('Name', ''))))
            
            # Look for direct match in travel times data
            direct_match = self.travel_times_df[
                ((self.travel_times_df[origin_col] == loc1_city) & 
                 (self.travel_times_df[dest_col] == loc2_city)) |
                ((self.travel_times_df[origin_col] == loc2_city) & 
                 (self.travel_times_df[dest_col] == loc1_city))
            ]
            
            if not direct_match.empty:
                return direct_match.iloc[0][time_col]
        
        # Fall back to coordinate-based calculation
        if all(key in loc1 for key in ['Latitude', 'Longitude']) and \
           all(key in loc2 for key in ['Latitude', 'Longitude']):
            
            distance = self.haversine_distance(
                loc1['Latitude'], loc1['Longitude'],
                loc2['Latitude'], loc2['Longitude']
            )
            
            return self.estimate_travel_time(distance, transport_mode)
        
        # Default estimate if no coordinates available
        return 2.0  # 2 hours default
    
    def optimize_travel_sequence(self, locations, start_location=None):
        """
        Optimize the sequence of locations to minimize travel time
        Uses a greedy nearest-neighbor approach
        """
        if len(locations) <= 1:
            return locations
        
        # Convert locations to a format with names for easier handling
        location_data = []
        for i, loc in enumerate(locations):
            loc_data = loc.copy()
            loc_data['name'] = loc.get('Name', loc.get('name', f'Location_{i}'))
            loc_data['original_index'] = i
            location_data.append(loc_data)
        
        if start_location is None:
            # Start with the highest-scored location
            current = location_data[0]
            unvisited = location_data[1:]
        else:
            # Find the specified start location
            current = None
            for loc in location_data:
                if loc['name'] == start_location:
                    current = loc
                    break
            
            if current is None:
                current = location_data[0]
            
            unvisited = [loc for loc in location_data if loc != current]
        
        optimized_sequence = [current]
        total_travel_time = 0
        
        # Greedy nearest-neighbor algorithm
        while unvisited:
            nearest_location = None
            min_travel_time = float('inf')
            
            for candidate in unvisited:
                travel_time = self.get_travel_time_between_locations(current, candidate)
                
                if travel_time < min_travel_time:
                    min_travel_time = travel_time
                    nearest_location = candidate
            
            if nearest_location is not None:
                optimized_sequence.append(nearest_location)
                total_travel_time += min_travel_time
                current = nearest_location
                unvisited.remove(nearest_location)
            else:
                # If no nearest location found, add remaining locations
                optimized_sequence.extend(unvisited)
                break
        
        # Add travel time information to the sequence
        for i in range(len(optimized_sequence)):
            if i > 0:
                travel_time = self.get_travel_time_between_locations(
                    optimized_sequence[i-1], 
                    optimized_sequence[i]
                )
                optimized_sequence[i]['Travel_Time_From_Previous'] = travel_time
            else:
                optimized_sequence[i]['Travel_Time_From_Previous'] = 0
        
        return optimized_sequence, total_travel_time
    
    def calculate_daily_itinerary_times(self, locations, travel_style='Moderate'):
        """
        Calculate time allocations for each location in daily itinerary
        """
        # Time allocation based on travel style (hours per location)
        time_allocations = {
            'Relaxed': {'major': 4, 'minor': 2, 'travel_buffer': 1},
            'Moderate': {'major': 3, 'minor': 1.5, 'travel_buffer': 0.5},
            'Intensive': {'major': 2, 'minor': 1, 'travel_buffer': 0.3}
        }
        
        allocation = time_allocations.get(travel_style, time_allocations['Moderate'])
        
        for location in locations:
            # Determine if location is major or minor based on score
            score = location.get('Composite_Score', 0.5)
            
            if score > 0.7:
                location['Visit_Duration_Hours'] = allocation['major']
            else:
                location['Visit_Duration_Hours'] = allocation['minor']
            
            location['Travel_Buffer_Hours'] = allocation['travel_buffer']
        
        return locations

# Initialize travel time calculator
travel_calculator = TravelTimeCalculator(travel_times_df)

# Test travel time calculations
if 'final_locations' in locals() and len(final_locations) > 1:
    print("🚗 TESTING TRAVEL TIME CALCULATIONS")
    print("=" * 60)
    
    # Optimize travel sequence
    optimized_locations, total_time = travel_calculator.optimize_travel_sequence(final_locations)
    
    print(f"Optimized travel sequence (Total estimated travel time: {total_time:.1f} hours):")
    print("-" * 60)
    
    for i, location in enumerate(optimized_locations, 1):
        name = location.get('name', location.get('Name', 'Unknown'))
        province = location.get('Province', 'Unknown')
        travel_time = location.get('Travel_Time_From_Previous', 0)
        
        print(f"{i}. {name} ({province})")
        if travel_time > 0:
            print(f"   ⏱️ Travel time from previous: {travel_time:.1f} hours")
        print()
    
    # Calculate daily time allocations
    timed_locations = travel_calculator.calculate_daily_itinerary_times(
        optimized_locations, 
        user_prefs.get('style', 'Moderate')
    )
    
    print("\nTime allocations for each location:")
    print("-" * 40)
    
    for location in timed_locations:
        name = location.get('name', location.get('Name', 'Unknown'))
        visit_duration = location.get('Visit_Duration_Hours', 2)
        
        print(f"{name}: {visit_duration} hours visit time")
    
    print("\n✅ Travel time calculations completed successfully!")
    
    # Store optimized locations for itinerary generation
    optimized_itinerary_locations = optimized_locations
    
else:
    print("⚠️ Run the previous sections to generate locations for travel time testing")

🚗 TESTING TRAVEL TIME CALCULATIONS
Optimized travel sequence (Total estimated travel time: 2.1 hours):
------------------------------------------------------------
1. Knuckles Mountain Range (Central Province)

2. Horton Plains National Park (Central Province)
   ⏱️ Travel time from previous: 2.1 hours


Time allocations for each location:
----------------------------------------
Knuckles Mountain Range: 4 hours visit time
Horton Plains National Park: 4 hours visit time

✅ Travel time calculations completed successfully!


## 8. Itinerary Generation

Generate the complete itinerary by organizing locations into a logical daily schedule.

In [24]:
class ItineraryGenerator:
    """
    Generate complete day-by-day itineraries with detailed scheduling
    """
    
    def __init__(self, travel_calculator):
        self.travel_calculator = travel_calculator
        
    def organize_locations_by_day(self, locations, duration, travel_style='Moderate'):
        """
        Organize locations into daily schedules
        """
        # Define daily time constraints based on travel style
        daily_constraints = {
            'Relaxed': {
                'max_daily_hours': 6,    # 6 hours of activities per day
                'start_time': 9,         # Start at 9:00 AM
                'max_locations_per_day': 2
            },
            'Moderate': {
                'max_daily_hours': 8,    # 8 hours of activities per day
                'start_time': 8,         # Start at 8:00 AM
                'max_locations_per_day': 3
            },
            'Intensive': {
                'max_daily_hours': 10,   # 10 hours of activities per day
                'start_time': 7,         # Start at 7:00 AM
                'max_locations_per_day': 4
            }
        }
        
        constraints = daily_constraints.get(travel_style, daily_constraints['Moderate'])
        
        daily_itinerary = []
        current_day = 1
        current_day_locations = []
        current_day_hours = 0
        current_time = constraints['start_time']
        
        for i, location in enumerate(locations):
            visit_duration = location.get('Visit_Duration_Hours', 2)
            travel_time = location.get('Travel_Time_From_Previous', 0)
            buffer_time = location.get('Travel_Buffer_Hours', 0.5)
            
            total_time_needed = visit_duration + travel_time + buffer_time
            
            # Check if location fits in current day
            if (current_day_hours + total_time_needed <= constraints['max_daily_hours'] and
                len(current_day_locations) < constraints['max_locations_per_day'] and
                current_day <= duration):
                
                # Add to current day
                location_schedule = {
                    'location': location,
                    'start_time': current_time + travel_time,
                    'end_time': current_time + travel_time + visit_duration,
                    'travel_time_from_previous': travel_time,
                    'visit_duration': visit_duration
                }
                
                current_day_locations.append(location_schedule)
                current_day_hours += total_time_needed
                current_time += total_time_needed
                
            else:
                # Start new day
                if current_day_locations:
                    daily_itinerary.append({
                        'day': current_day,
                        'locations': current_day_locations,
                        'total_hours': current_day_hours,
                        'start_time': constraints['start_time'],
                        'estimated_end_time': current_time
                    })
                
                current_day += 1
                if current_day > duration:
                    break
                
                # Reset for new day
                current_day_locations = []
                current_day_hours = 0
                current_time = constraints['start_time']
                
                # Add current location to new day
                location_schedule = {
                    'location': location,
                    'start_time': current_time,
                    'end_time': current_time + visit_duration,
                    'travel_time_from_previous': 0,  # First location of the day
                    'visit_duration': visit_duration
                }
                
                current_day_locations.append(location_schedule)
                current_day_hours = visit_duration + buffer_time
                current_time += visit_duration + buffer_time
        
        # Add the last day if it has locations
        if current_day_locations and current_day <= duration:
            daily_itinerary.append({
                'day': current_day,
                'locations': current_day_locations,
                'total_hours': current_day_hours,
                'start_time': constraints['start_time'],
                'estimated_end_time': current_time
            })
        
        return daily_itinerary
    
    def add_practical_recommendations(self, daily_itinerary, user_preferences):
        """
        Add practical travel recommendations to the itinerary
        """
        budget_category = user_preferences.get('budget', 'Mid-range ($50-150/day)')
        travel_style = user_preferences.get('style', 'Moderate')
        
        recommendations = {
            'Budget ($0-50/day)': {
                'accommodation': 'Guesthouses, hostels, budget hotels',
                'transport': 'Local buses, trains, shared tuk-tuks',
                'dining': 'Local restaurants, street food, rice and curry shops',
                'tips': ['Book accommodation in advance', 'Use public transport', 'Eat at local places']
            },
            'Mid-range ($50-150/day)': {
                'accommodation': 'Mid-range hotels, boutique guesthouses',
                'transport': 'Private tuk-tuks, taxis, rental cars',
                'dining': 'Mix of local and tourist restaurants',
                'tips': ['Book popular attractions in advance', 'Negotiate transport prices', 'Try authentic Sri Lankan cuisine']
            },
            'Luxury ($150+/day)': {
                'accommodation': 'Luxury hotels, resorts, villa rentals',
                'transport': 'Private cars with drivers, domestic flights',
                'dining': 'Fine dining restaurants, hotel restaurants',
                'tips': ['Consider helicopter tours', 'Book spa treatments', 'Enjoy premium experiences']
            }
        }
        
        budget_recs = recommendations.get(budget_category, recommendations['Mid-range ($50-150/day)'])
        
        # Add recommendations to each day
        for day_info in daily_itinerary:
            day_info['recommendations'] = {
                'accommodation': budget_recs['accommodation'],
                'transport': budget_recs['transport'],
                'dining': budget_recs['dining'],
                'daily_tips': budget_recs['tips'][:2]  # Limit to 2 tips per day
            }
            
            # Add location-specific recommendations
            provinces_visited = set()
            categories_visited = set()
            
            for loc_schedule in day_info['locations']:
                location = loc_schedule['location']
                provinces_visited.add(location.get('Province', ''))
                categories_visited.add(location.get('Category', ''))
            
            # Add province-specific tips
            province_tips = {
                'Western': ['Visit Colombo markets', 'Try street food in Pettah'],
                'Central': ['Book train tickets early', 'Bring warm clothes for hill country'],
                'Southern': ['Check surf conditions', 'Visit spice gardens'],
                'Northern': ['Respect cultural sites', 'Try Jaffna cuisine'],
                'Eastern': ['Best visited Apr-Sep', 'Great for water sports'],
                'North Western': ['Visit during dry season', 'Explore ancient ruins'],
                'North Central': ['Early morning temple visits', 'Carry water and sun protection'],
                'Uva': ['Perfect for hiking', 'Visit tea plantations'],
                'Sabaragamuwa': ['Ideal for nature lovers', 'Explore gem mines']
            }
            
            day_info['location_specific_tips'] = []
            for province in provinces_visited:
                if province in province_tips:
                    day_info['location_specific_tips'].extend(province_tips[province][:1])
        
        return daily_itinerary
    
    def generate_complete_itinerary(self, locations, user_preferences):
        """
        Generate a complete, detailed itinerary
        """
        duration = user_preferences.get('duration', 3)
        travel_style = user_preferences.get('style', 'Moderate')
        
        # Organize locations by day
        daily_itinerary = self.organize_locations_by_day(locations, duration, travel_style)
        
        # Add practical recommendations
        complete_itinerary = self.add_practical_recommendations(daily_itinerary, user_preferences)
        
        # Add metadata
        itinerary_metadata = {
            'total_days': duration,
            'travel_style': travel_style,
            'budget_category': user_preferences.get('budget', 'Mid-range ($50-150/day)'),
            'preferred_provinces': user_preferences.get('provinces', []),
            'preferred_categories': user_preferences.get('categories', []),
            'total_locations': len(locations),
            'generation_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        }
        
        return {
            'metadata': itinerary_metadata,
            'daily_schedule': complete_itinerary,
            'locations_summary': locations
        }

# Initialize itinerary generator
itinerary_generator = ItineraryGenerator(travel_calculator)

# Generate complete itinerary
if 'optimized_itinerary_locations' in locals() and 'user_prefs' in locals():
    print("📅 GENERATING COMPLETE ITINERARY")
    print("=" * 60)
    
    # Generate the complete itinerary
    complete_itinerary = itinerary_generator.generate_complete_itinerary(
        optimized_itinerary_locations, 
        user_prefs
    )
    
    print("✅ Complete itinerary generated successfully!")
    print(f"📊 Itinerary includes {len(complete_itinerary['daily_schedule'])} days")
    print(f"📍 Total locations: {complete_itinerary['metadata']['total_locations']}")
    print(f"🎯 Travel style: {complete_itinerary['metadata']['travel_style']}")
    print(f"💰 Budget category: {complete_itinerary['metadata']['budget_category']}")
    
    # Quick preview of the itinerary structure
    print("\n📋 Daily Schedule Overview:")
    print("-" * 40)
    
    for day_info in complete_itinerary['daily_schedule']:
        day_num = day_info['day']
        num_locations = len(day_info['locations'])
        total_hours = day_info['total_hours']
        
        print(f"Day {day_num}: {num_locations} locations, {total_hours:.1f} hours planned")
        
        for i, loc_schedule in enumerate(day_info['locations'], 1):
            location = loc_schedule['location']
            name = location.get('name', location.get('Name', 'Unknown'))
            start_time = loc_schedule['start_time']
            
            # Convert decimal hours to HH:MM format
            start_hour = int(start_time)
            start_min = int((start_time - start_hour) * 60)
            
            print(f"  {i}. {start_hour:02d}:{start_min:02d} - {name}")
        print()
    
    print("✅ Itinerary generation pipeline completed successfully!")
    
else:
    print("⚠️ Run the previous sections to generate optimized locations for itinerary creation")

📅 GENERATING COMPLETE ITINERARY
✅ Complete itinerary generated successfully!
📊 Itinerary includes 2 days
📍 Total locations: 2
🎯 Travel style: Relaxed
💰 Budget category: Budget ($0-50/day)

📋 Daily Schedule Overview:
----------------------------------------
Day 1: 1 locations, 5.0 hours planned
  1. 09:00 - Knuckles Mountain Range

Day 2: 1 locations, 5.0 hours planned
  1. 09:00 - Horton Plains National Park

✅ Itinerary generation pipeline completed successfully!


## 9. Output Formatting and Display

Format and display the final itinerary in a human-readable format with all details.

In [25]:
class ItineraryFormatter:
    """
    Format and display itineraries in various human-readable formats
    """
    
    def __init__(self):
        self.emojis = {
            'day': '📅',
            'location': '📍',
            'time': '⏰',
            'travel': '🚗',
            'food': '🍽️',
            'accommodation': '🏨',
            'tip': '💡',
            'rating': '⭐',
            'category': '🏷️',
            'province': '🗺️',
            'budget': '💰',
            'duration': '⏱️'
        }
    
    def format_time(self, decimal_hours):
        """Convert decimal hours to HH:MM format"""
        hours = int(decimal_hours)
        minutes = int((decimal_hours - hours) * 60)
        return f"{hours:02d}:{minutes:02d}"
    
    def format_detailed_itinerary(self, complete_itinerary):
        """Generate a detailed, formatted itinerary"""
        
        output = []
        metadata = complete_itinerary['metadata']
        daily_schedule = complete_itinerary['daily_schedule']
        
        # Header
        output.append("=" * 80)
        output.append("🇱🇰 PERSONALIZED SRI LANKA TRAVEL ITINERARY 🇱🇰")
        output.append("=" * 80)
        output.append("")
        
        # Metadata summary
        output.append(f"{self.emojis['duration']} Duration: {metadata['total_days']} days")
        output.append(f"{self.emojis['budget']} Budget: {metadata['budget_category']}")
        output.append(f"🎯 Travel Style: {metadata['travel_style']}")
        output.append(f"{self.emojis['location']} Total Locations: {metadata['total_locations']}")
        
        if metadata['preferred_provinces']:
            output.append(f"{self.emojis['province']} Preferred Provinces: {', '.join(metadata['preferred_provinces'])}")
        
        if metadata['preferred_categories']:
            output.append(f"{self.emojis['category']} Interests: {', '.join(metadata['preferred_categories'])}")
        
        output.append(f"📆 Generated: {metadata['generation_date']}")
        output.append("")
        output.append("=" * 80)
        output.append("")
        
        # Daily schedules
        for day_info in daily_schedule:
            day_num = day_info['day']
            locations = day_info['locations']
            
            output.append(f"{self.emojis['day']} DAY {day_num}")
            output.append("-" * 50)
            
            if not locations:
                output.append("   No activities planned for this day.")
                output.append("")
                continue
            
            # Daily overview
            start_time = self.format_time(day_info['start_time'])
            end_time = self.format_time(day_info['estimated_end_time'])
            
            output.append(f"   {self.emojis['time']} Schedule: {start_time} - {end_time} ({day_info['total_hours']:.1f} hours)")
            output.append("")
            
            # Location details
            for i, loc_schedule in enumerate(locations, 1):
                location = loc_schedule['location']
                name = location.get('name', location.get('Name', 'Unknown'))
                province = location.get('Province', 'Unknown')
                category = location.get('Category', 'Unknown')
                rating = location.get('Avg_Rating', 0)
                
                start_time = self.format_time(loc_schedule['start_time'])
                end_time = self.format_time(loc_schedule['end_time'])
                visit_duration = loc_schedule['visit_duration']
                travel_time = loc_schedule['travel_time_from_previous']
                
                output.append(f"   {i}. {self.emojis['location']} {name}")
                output.append(f"      {self.emojis['time']} Time: {start_time} - {end_time} ({visit_duration:.1f}h visit)")
                output.append(f"      {self.emojis['province']} Province: {province}")
                output.append(f"      {self.emojis['category']} Category: {category}")
                
                if rating > 0:
                    stars = "⭐" * min(int(rating), 5)
                    output.append(f"      {self.emojis['rating']} Rating: {rating:.1f}/5 {stars}")
                
                if travel_time > 0:
                    output.append(f"      {self.emojis['travel']} Travel from previous: {travel_time:.1f} hours")
                
                # Location-specific score info
                if 'Individual_Scores' in location:
                    scores = location['Individual_Scores']
                    score_text = ", ".join([f"{k}: {v:.2f}" for k, v in scores.items()][:3])
                    output.append(f"      📊 Match Score: {location.get('Composite_Score', 0):.3f} ({score_text})")
                
                output.append("")
            
            # Daily recommendations
            if 'recommendations' in day_info:
                recs = day_info['recommendations']
                output.append("   💡 DAILY RECOMMENDATIONS:")
                output.append(f"   {self.emojis['accommodation']} Stay: {recs['accommodation']}")
                output.append(f"   {self.emojis['travel']} Transport: {recs['transport']}")
                output.append(f"   {self.emojis['food']} Dining: {recs['dining']}")
                
                if 'daily_tips' in recs and recs['daily_tips']:
                    output.append("   ✨ Tips:")
                    for tip in recs['daily_tips']:
                        output.append(f"     • {tip}")
                
                if 'location_specific_tips' in day_info and day_info['location_specific_tips']:
                    output.append("   🗺️ Local Tips:")
                    for tip in day_info['location_specific_tips']:
                        output.append(f"     • {tip}")
            
            output.append("")
            output.append("-" * 50)
            output.append("")
        
        # Footer with general tips
        output.append("🌟 GENERAL TRAVEL TIPS FOR SRI LANKA:")
        output.append("-" * 40)
        output.append("• Always carry a valid ID and make copies of important documents")
        output.append("• Respect local customs and dress modestly at religious sites")
        output.append("• Try authentic Sri Lankan cuisine - rice and curry is a must!")
        output.append("• Negotiate prices for transport and shopping")
        output.append("• Stay hydrated and use sunscreen")
        output.append("• Learn basic Sinhala/Tamil greetings")
        output.append("• Keep emergency contacts handy")
        output.append("")
        output.append("=" * 80)
        output.append("🎉 HAVE A WONDERFUL TRIP TO SRI LANKA! 🎉")
        output.append("=" * 80)
        
        return "\n".join(output)
    
    def format_summary_itinerary(self, complete_itinerary):
        """Generate a concise summary version"""
        
        output = []
        metadata = complete_itinerary['metadata']
        daily_schedule = complete_itinerary['daily_schedule']
        
        output.append(f"🇱🇰 {metadata['total_days']}-Day Sri Lanka Itinerary ({metadata['travel_style']} Style)")
        output.append("=" * 60)
        
        for day_info in daily_schedule:
            day_num = day_info['day']
            locations = day_info['locations']
            
            if locations:
                location_names = []
                for loc_schedule in locations:
                    location = loc_schedule['location']
                    name = location.get('name', location.get('Name', 'Unknown'))
                    location_names.append(name)
                
                output.append(f"Day {day_num}: {' → '.join(location_names)}")
        
        return "\n".join(output)
    
    def display_itinerary(self, complete_itinerary, format_type='detailed'):
        """Display the itinerary in the specified format"""
        
        if format_type == 'detailed':
            formatted_itinerary = self.format_detailed_itinerary(complete_itinerary)
        elif format_type == 'summary':
            formatted_itinerary = self.format_summary_itinerary(complete_itinerary)
        else:
            formatted_itinerary = self.format_detailed_itinerary(complete_itinerary)
        
        print(formatted_itinerary)
        return formatted_itinerary
    
    def save_itinerary_to_file(self, complete_itinerary, filename=None, format_type='detailed'):
        """Save the itinerary to a text file"""
        
        if filename is None:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            filename = f"sri_lanka_itinerary_{timestamp}.txt"
        
        formatted_itinerary = self.format_detailed_itinerary(complete_itinerary) if format_type == 'detailed' else self.format_summary_itinerary(complete_itinerary)
        
        try:
            with open(filename, 'w', encoding='utf-8') as f:
                f.write(formatted_itinerary)
            print(f"✅ Itinerary saved to: {filename}")
            return filename
        except Exception as e:
            print(f"❌ Error saving itinerary: {e}")
            return None

# Initialize formatter and display the complete itinerary
formatter = ItineraryFormatter()

if 'complete_itinerary' in locals():
    print("\n" + "🎨 FORMATTING COMPLETE ITINERARY" + "\n")
    
    # Display detailed itinerary
    formatted_output = formatter.display_itinerary(complete_itinerary, 'detailed')
    
    print("\n" + "="*60)
    print("📋 SUMMARY VERSION:")
    print("="*60)
    
    # Display summary version
    summary_output = formatter.display_itinerary(complete_itinerary, 'summary')
    
    # Option to save to file
    print(f"\n💾 To save this itinerary to a file, run:")
    print(f"filename = formatter.save_itinerary_to_file(complete_itinerary)")
    
    print("\n✅ Automated Itinerary Generator completed successfully!")
    print("🌟 Your personalized Sri Lanka travel itinerary is ready!")
    
else:
    print("⚠️ Run the complete pipeline first to generate an itinerary for formatting")


🎨 FORMATTING COMPLETE ITINERARY

🇱🇰 PERSONALIZED SRI LANKA TRAVEL ITINERARY 🇱🇰

⏱️ Duration: 2 days
💰 Budget: Budget ($0-50/day)
🎯 Travel Style: Relaxed
📍 Total Locations: 2
🗺️ Preferred Provinces: Central
🏷️ Interests: Nature, Swim
📆 Generated: 2025-10-03 17:46:59


📅 DAY 1
--------------------------------------------------
   ⏰ Schedule: 09:00 - 14:00 (5.0 hours)

   1. 📍 Knuckles Mountain Range
      ⏰ Time: 09:00 - 13:00 (4.0h visit)
      🗺️ Province: Central Province
      🏷️ Category: Nature
      ⭐ Rating: 4.2/5 ⭐⭐⭐⭐
      📊 Match Score: 0.932 (province: 0.80, category: 1.00, rating: 0.85)

   💡 DAILY RECOMMENDATIONS:
   🏨 Stay: Guesthouses, hostels, budget hotels
   🚗 Transport: Local buses, trains, shared tuk-tuks
   🍽️ Dining: Local restaurants, street food, rice and curry shops
   ✨ Tips:
     • Book accommodation in advance
     • Use public transport

--------------------------------------------------

📅 DAY 2
--------------------------------------------------
   ⏰ Sched

## 🎯 Complete Automated Itinerary Generator Demo

**Run this cell to see the complete system in action with sample data:**

In [None]:
def run_complete_itinerary_generator_demo():
    """
    Demonstrate the complete Automated Itinerary Generator system
    """
    
    print("🇱🇰 AUTOMATED ITINERARY GENERATOR FOR SRI LANKA")
    print("=" * 80)
    print("🚀 Running complete system demonstration...")
    print()
    
    # Sample user preferences for demonstration
    demo_preferences = {
        'provinces': ['Western', 'Central', 'Southern'],
        'categories': ['Historical', 'Nature', 'Beach', 'Cultural'],
        'duration': 5,
        'budget': 'Mid-range ($50-150/day)',
        'style': 'Moderate'
    }
    
    print("📋 Demo User Preferences:")
    for key, value in demo_preferences.items():
        if isinstance(value, list):
            print(f"   {key.title()}: {', '.join(value)}")
        else:
            print(f"   {key.title()}: {value}")
    print()
    
    try:
        # Step 1: Score and rank locations
        print("🎯 Step 1: Scoring and ranking locations...")
        top_locations = scoring_algorithm.get_top_locations(demo_preferences, top_n=15)
        print(f"   ✅ Identified {len(top_locations)} potential locations")
        
        # Step 2: Apply filters and final ranking
        print("🔍 Step 2: Applying filters and final ranking...")
        filtered_locations = filter_ranker.filter_and_rank_locations(
            demo_preferences, 
            max_locations=demo_preferences['duration'] * 2
        )
        print(f"   ✅ Filtered to {len(filtered_locations)} optimal locations")
        
        # Step 3: Optimize travel sequence
        print("🚗 Step 3: Optimizing travel sequence...")
        optimized_sequence, total_travel_time = travel_calculator.optimize_travel_sequence(filtered_locations)
        print(f"   ✅ Optimized sequence with {total_travel_time:.1f} hours total travel time")
        
        # Step 4: Generate complete itinerary
        print("📅 Step 4: Generating complete itinerary...")
        complete_itinerary = itinerary_generator.generate_complete_itinerary(
            optimized_sequence, 
            demo_preferences
        )
        print(f"   ✅ Generated {len(complete_itinerary['daily_schedule'])}-day itinerary")
        
        # Step 5: Format and display
        print("🎨 Step 5: Formatting final output...")
        print()
        print("=" * 80)
        print("📋 GENERATED ITINERARY:")
        print("=" * 80)
        
        # Display the complete formatted itinerary
        formatter.display_itinerary(complete_itinerary, 'detailed')
        
        return complete_itinerary
        
    except Exception as e:
        print(f"❌ Demo failed with error: {e}")
        print("💡 This might be due to missing or incompatible data files.")
        print("   Please ensure all CSV files are available and properly formatted.")
        return None

def create_sample_data_demo():
    """
    Create sample data for demonstration if real data is not available
    """
    
    print("📊 Creating sample data for demonstration...")
    
    # Create sample locations data
    sample_locations = pd.DataFrame({
        'Name': [
            'Sigiriya Rock Fortress', 'Temple of the Tooth', 'Galle Fort', 
            'Yala National Park', 'Adam\'s Peak', 'Dambulla Cave Temple',
            'Ella Rock', 'Mirissa Beach', 'Polonnaruwa', 'Anuradhapura',
            'Nuwara Eliya', 'Bentota Beach', 'Hikkaduwa', 'Pinnawala Elephant Orphanage',
            'Royal Botanic Gardens Peradeniya'
        ],
        'Province': [
            'Central', 'Central', 'Southern', 'Uva', 'Central', 'Central',
            'Uva', 'Southern', 'North Central', 'North Central',
            'Central', 'Western', 'Southern', 'Western', 'Central'
        ],
        'Category': [
            'Historical', 'Religious', 'Historical', 'Nature', 'Religious', 'Religious',
            'Nature', 'Beach', 'Historical', 'Historical',
            'Nature', 'Beach', 'Beach', 'Nature', 'Nature'
        ],
        'Latitude': [
            7.9569, 7.2906, 6.0329, 6.3725, 6.8096, 7.8567,
            6.8667, 5.9549, 7.9403, 8.3114,
            6.9497, 6.4253, 6.1408, 7.2996, 7.2707
        ],
        'Longitude': [
            80.7603, 80.6337, 80.2168, 81.5185, 80.4987, 80.6516,
            81.0462, 80.4567, 81.0188, 80.3964,
            80.7891, 79.9925, 80.0991, 80.3887, 80.5977
        ],
        'Avg_Rating': [
            4.6, 4.4, 4.3, 4.5, 4.2, 4.4,
            4.3, 4.1, 4.2, 4.0,
            4.1, 3.9, 4.0, 4.0, 4.2
        ],
        'Review_Count': [
            1250, 980, 1100, 750, 650, 820,
            580, 920, 420, 380,
            640, 720, 890, 560, 480
        ]
    })
    
    # Create sample travel times
    sample_travel_times = pd.DataFrame({
        'Origin': ['Sigiriya Rock Fortress', 'Temple of the Tooth', 'Galle Fort'],
        'Destination': ['Temple of the Tooth', 'Ella Rock', 'Mirissa Beach'],
        'Travel_Time_Hours': [2.5, 3.0, 0.5]
    })
    
    # Create sample reviews
    sample_reviews = pd.DataFrame({
        'Location': ['Sigiriya Rock Fortress', 'Temple of the Tooth', 'Galle Fort'],
        'Rating': [5, 4, 4],
        'Review': ['Amazing historical site!', 'Very peaceful and spiritual', 'Beautiful colonial architecture']
    })
    
    print("✅ Sample data created successfully!")
    
    return sample_locations, sample_travel_times, sample_reviews

# Check if we have real data, otherwise create sample data
if 'comprehensive_df' not in locals() or len(comprehensive_df) == 0:
    print("⚠️ Real data not available. Creating sample data for demonstration...")
    sample_locations, sample_travel_times, sample_reviews = create_sample_data_demo()
    
    # Initialize with sample data
    sample_preprocessor = DataPreprocessor(sample_locations, sample_reviews, sample_travel_times)
    comprehensive_df = sample_preprocessor.merge_datasets()
    
    # Reinitialize components with sample data
    scoring_algorithm = MultilayeredScoringAlgorithm(comprehensive_df)
    filter_ranker = LocationFilterAndRanker(scoring_algorithm)
    travel_calculator = TravelTimeCalculator(sample_travel_times)
    itinerary_generator = ItineraryGenerator(travel_calculator)
    formatter = ItineraryFormatter()
    
    print("✅ System reinitialized with sample data")
    print()

# Run the complete demonstration
demo_result = run_complete_itinerary_generator_demo()

if demo_result:
    print("\n" + "🎉 DEMONSTRATION COMPLETED SUCCESSFULLY! 🎉")
    print("\n💡 Key Features Demonstrated:")
    print("   ✅ Multi-layered scoring algorithm")
    print("   ✅ User preference integration") 
    print("   ✅ Location filtering and ranking")
    print("   ✅ Travel time optimization")
    print("   ✅ Complete itinerary generation")
    print("   ✅ Human-readable output formatting")
    print("\n🚀 The Automated Itinerary Generator is ready for use!")
else:
    print("\n❌ Demonstration encountered issues. Please check data availability.")