# 🌍 Complete City Itinerary Generator

## End-to-End Flow:
1. **Select City** - Choose from pre-configured cities
2. **Fetch Landmarks** - From file/Wikidata/LLM (scalable!)
3. **Fetch POIs** - From Overture Maps (BigQuery)
4. **Merge & Enrich** - Combine data + add persona scores
5. **Define Trip** - Persona, Budget, Days
6. **Generate Itinerary** - Where to stay + What to visit (6 personas!)

---

### Scalable Landmark System
```bash
# Pre-generate landmarks for all cities
python -m data.scripts.generate_landmarks

# Or for a specific city
python -m data.scripts.generate_landmarks rome
```

### Setup
```bash
pip install google-cloud-bigquery db-dtypes pandas plotly requests
gcloud auth application-default login
```

In [1]:
# ===========================
# IMPORTS & SETUP
# ===========================

import pandas as pd
import numpy as np
import json
from pathlib import Path
from dataclasses import dataclass, field, asdict
from typing import List, Dict, Optional, Any
from datetime import date, timedelta
import warnings
warnings.filterwarnings('ignore')

# Check if running on Colab
try:
    import google.colab
    IN_COLAB = True
    print("✅ Running on Google Colab")
except ImportError:
    IN_COLAB = False
    print("📍 Running locally")

# Colab Authentication (one-click!)
if IN_COLAB:
    from google.colab import auth
    auth.authenticate_user()
    print("✅ Authenticated with Google Cloud!")

# BigQuery
try:
    from google.cloud import bigquery
    BIGQUERY_AVAILABLE = True
    print("✅ BigQuery available")
except ImportError:
    BIGQUERY_AVAILABLE = False
    print("❌ Run: pip install google-cloud-bigquery db-dtypes")

# Visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.templates.default = "plotly_white"

print("✅ Libraries loaded!")

✅ Running on Google Colab
✅ Authenticated with Google Cloud!
✅ BigQuery available
✅ Libraries loaded!


---
## 📍 Step 1: Add New City

Define your city's bounding box. Get coordinates from:
- https://boundingbox.klokantech.com/
- Google Maps (right-click → coordinates)
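
As a quick sanity check on coordinates copied from either source, the sketch below validates a bounding box and derives its midpoint, the same way `add_new_city` computes `center`. The helper name `bbox_center` is hypothetical and not part of the notebook; the sample values are the Paris `bbox` from `CITY_DATABASE`.

```python
from typing import Dict


def bbox_center(min_lon: float, min_lat: float,
                max_lon: float, max_lat: float) -> Dict[str, float]:
    """Return the midpoint of a bounding box after basic range checks.

    Hypothetical helper for illustration; boxes that cross the
    antimeridian are not handled.
    """
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError("expected -180 <= min_lon < max_lon <= 180")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("expected -90 <= min_lat < max_lat <= 90")
    return {"lat": (min_lat + max_lat) / 2, "lon": (min_lon + max_lon) / 2}


# Paris bounding box as configured in this notebook.
paris_center = bbox_center(min_lon=2.25, min_lat=48.81, max_lon=2.42, max_lat=48.91)
print(paris_center)
```

The midpoint will usually differ slightly from a hand-picked city center (the pre-configured Paris entry uses 48.8566, 2.3522), which is fine for bounding-box queries but worth knowing when centering a map.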

In [2]:
# ===========================
# CITY CONFIGURATION
# ===========================

# Pre-configured cities
CITY_DATABASE = {
    "paris": {
        "name": "Paris",
        "country": "France",
        "currency": "EUR",
        "bbox": {"min_lon": 2.25, "min_lat": 48.81, "max_lon": 2.42, "max_lat": 48.91},
        "center": {"lat": 48.8566, "lon": 2.3522},
        "neighborhoods": [
            {"name": "Le Marais", "lat": 48.8566, "lon": 2.3622, "vibes": ["cultural", "shopping", "nightlife"], "best_for": ["couple", "solo", "friends"]},
            {"name": "Saint-Germain-des-Prés", "lat": 48.8539, "lon": 2.3338, "vibes": ["cultural", "romantic", "foodie"], "best_for": ["couple", "honeymoon"]},
            {"name": "Montmartre", "lat": 48.8867, "lon": 2.3431, "vibes": ["romantic", "cultural", "photography"], "best_for": ["couple", "honeymoon", "solo"]},
            {"name": "Latin Quarter", "lat": 48.8494, "lon": 2.3470, "vibes": ["cultural", "foodie", "nightlife"], "best_for": ["solo", "friends", "budget"]},
            {"name": "Champs-Élysées", "lat": 48.8698, "lon": 2.3076, "vibes": ["shopping", "cultural"], "best_for": ["family", "couple", "business"]},
            {"name": "Eiffel Tower / 7th", "lat": 48.8584, "lon": 2.2945, "vibes": ["romantic", "cultural", "photography"], "best_for": ["family", "couple", "honeymoon"]},
        ]
    },
    "rome": {
        "name": "Rome",
        "country": "Italy",
        "currency": "EUR",
        "bbox": {"min_lon": 12.40, "min_lat": 41.85, "max_lon": 12.55, "max_lat": 41.95},
        "center": {"lat": 41.9028, "lon": 12.4964},
        "neighborhoods": [
            {"name": "Centro Storico", "lat": 41.8986, "lon": 12.4769, "vibes": ["cultural", "foodie", "romantic"], "best_for": ["couple", "solo"]},
            {"name": "Trastevere", "lat": 41.8894, "lon": 12.4700, "vibes": ["foodie", "nightlife", "romantic"], "best_for": ["couple", "friends"]},
            {"name": "Vatican City", "lat": 41.9029, "lon": 12.4534, "vibes": ["cultural", "photography"], "best_for": ["family", "solo", "seniors"]},
            {"name": "Monti", "lat": 41.8956, "lon": 12.4939, "vibes": ["shopping", "foodie", "nightlife"], "best_for": ["solo", "friends"]},
        ]
    },
    "barcelona": {
        "name": "Barcelona",
        "country": "Spain",
        "currency": "EUR",
        "bbox": {"min_lon": 2.05, "min_lat": 41.32, "max_lon": 2.23, "max_lat": 41.47},
        "center": {"lat": 41.3851, "lon": 2.1734},
        "neighborhoods": [
            {"name": "Gothic Quarter", "lat": 41.3833, "lon": 2.1777, "vibes": ["cultural", "nightlife", "foodie"], "best_for": ["solo", "friends", "couple"]},
            {"name": "El Born", "lat": 41.3850, "lon": 2.1833, "vibes": ["shopping", "foodie", "cultural"], "best_for": ["couple", "friends"]},
            {"name": "Barceloneta", "lat": 41.3795, "lon": 2.1894, "vibes": ["beach", "relaxation", "foodie"], "best_for": ["family", "friends"]},
            {"name": "Gràcia", "lat": 41.4036, "lon": 2.1561, "vibes": ["local", "foodie", "relaxation"], "best_for": ["solo", "couple"]},
        ]
    },
    "tokyo": {
        "name": "Tokyo",
        "country": "Japan",
        "currency": "JPY",
        "bbox": {"min_lon": 139.55, "min_lat": 35.55, "max_lon": 139.85, "max_lat": 35.80},
        "center": {"lat": 35.6762, "lon": 139.6503},
        "neighborhoods": [
            {"name": "Shibuya", "lat": 35.6580, "lon": 139.7016, "vibes": ["shopping", "nightlife", "photography"], "best_for": ["friends", "solo"]},
            {"name": "Shinjuku", "lat": 35.6938, "lon": 139.7034, "vibes": ["nightlife", "shopping", "foodie"], "best_for": ["friends", "solo"]},
            {"name": "Asakusa", "lat": 35.7148, "lon": 139.7967, "vibes": ["cultural", "photography", "foodie"], "best_for": ["family", "couple", "seniors"]},
            {"name": "Ginza", "lat": 35.6717, "lon": 139.7649, "vibes": ["shopping", "foodie", "luxury"], "best_for": ["couple", "business"]},
        ]
    },
    "london": {
        "name": "London",
        "country": "UK",
        "currency": "GBP",
        "bbox": {"min_lon": -0.20, "min_lat": 51.45, "max_lon": 0.05, "max_lat": 51.55},
        "center": {"lat": 51.5074, "lon": -0.1278},
        "neighborhoods": [
            {"name": "Westminster", "lat": 51.4975, "lon": -0.1357, "vibes": ["cultural", "photography"], "best_for": ["family", "couple", "solo"]},
            {"name": "Soho", "lat": 51.5137, "lon": -0.1337, "vibes": ["nightlife", "foodie", "shopping"], "best_for": ["friends", "solo", "couple"]},
            {"name": "South Bank", "lat": 51.5055, "lon": -0.1146, "vibes": ["cultural", "photography", "foodie"], "best_for": ["family", "couple"]},
            {"name": "Shoreditch", "lat": 51.5246, "lon": -0.0794, "vibes": ["nightlife", "foodie", "art"], "best_for": ["friends", "solo"]},
        ]
    },
}

def add_new_city(name: str, country: str, currency: str,
                 min_lon: float, min_lat: float, max_lon: float, max_lat: float,
                 neighborhoods: Optional[List[Dict]] = None):
    """Add a new city to the database."""
    key = name.lower()
    CITY_DATABASE[key] = {
        "name": name,
        "country": country,
        "currency": currency,
        "bbox": {"min_lon": min_lon, "min_lat": min_lat, "max_lon": max_lon, "max_lat": max_lat},
        "center": {"lat": (min_lat + max_lat) / 2, "lon": (min_lon + max_lon) / 2},
        "neighborhoods": neighborhoods or []
    }
    print(f"✅ Added city: {name}")
    return CITY_DATABASE[key]

# Show available cities
print("\n📍 Available Cities:")
for key, city in CITY_DATABASE.items():
    print(f"   • {city['name']}, {city['country']}")


📍 Available Cities:
   • Paris, France
   • Rome, Italy
   • Barcelona, Spain
   • Tokyo, Japan
   • London, UK


---
## 🌍 Step 0: Country-Level Planning (Multi-City Trip)

Before selecting a single city, plan your multi-city trip:
1. **Select a country**
2. **Set total trip duration**
3. **Get city allocation options** (e.g., 3 days Rome + 2 days Florence + 1 day Milan)
4. **Select your preferred option**
5. **Generate itineraries for each city**
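
Before generating options, it can help to check whether a set of cities can fill a trip of a given length at all. The sketch below is a simplified illustration, not the notebook's allocator: `CITY_LIMITS` and `fits_trip` are hypothetical names, and the limits copy the `min_days` and `max_days` values of three Italian cities from `COUNTRY_DATABASE`.

```python
from typing import Dict, List

# Stand-in for the per-city limits; values mirror the Italy entries in this notebook.
CITY_LIMITS: Dict[str, Dict[str, int]] = {
    "rome": {"min_days": 2, "max_days": 5},
    "florence": {"min_days": 2, "max_days": 4},
    "venice": {"min_days": 1, "max_days": 3},
}


def fits_trip(city_ids: List[str], total_days: int) -> bool:
    """True if total_days lies between the summed minimum and maximum stays."""
    shortest = sum(CITY_LIMITS[c]["min_days"] for c in city_ids)
    longest = sum(CITY_LIMITS[c]["max_days"] for c in city_ids)
    return shortest <= total_days <= longest


print(fits_trip(["rome", "florence"], 7))            # minimum 4, maximum 9
print(fits_trip(["rome", "florence", "venice"], 4))  # minimum 5, so the trip is too short
```

This check is stricter than `allocate_days_to_cities`, which only rejects a combination when the minimum stays do not fit and otherwise leaves any surplus days unassigned.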

In [3]:
# ===========================
# COUNTRY DATABASE
# ===========================

COUNTRY_DATABASE = {
    "italy": {
        "name": "Italy",
        "currency": "EUR",
        "languages": ["Italian"],
        "cities": {
            "rome": {
                "name": "Rome",
                "min_days": 2,
                "max_days": 5,
                "ideal_days": 3,
                "priority": 1,  # Lower number = more important (scoring uses 6 - priority)
                "highlights": ["Colosseum", "Vatican", "Trevi Fountain", "Roman Forum"],
                "vibes": ["cultural", "historical", "foodie", "romantic"],
                "best_for": ["family", "couple", "solo", "seniors"],
            },
            "florence": {
                "name": "Florence",
                "min_days": 2,
                "max_days": 4,
                "ideal_days": 2,
                "priority": 2,
                "highlights": ["Uffizi Gallery", "Duomo", "Ponte Vecchio", "Accademia"],
                "vibes": ["cultural", "art", "foodie", "romantic"],
                "best_for": ["couple", "solo", "art lovers"],
            },
            "venice": {
                "name": "Venice",
                "min_days": 1,
                "max_days": 3,
                "ideal_days": 2,
                "priority": 3,
                "highlights": ["St. Mark's Square", "Grand Canal", "Rialto Bridge", "Murano"],
                "vibes": ["romantic", "photography", "unique", "cultural"],
                "best_for": ["couple", "honeymoon", "photography"],
            },
            "milan": {
                "name": "Milan",
                "min_days": 1,
                "max_days": 2,
                "ideal_days": 1,
                "priority": 4,
                "highlights": ["Duomo", "Last Supper", "Galleria", "Fashion District"],
                "vibes": ["shopping", "fashion", "cultural", "business"],
                "best_for": ["solo", "business", "shopping lovers"],
            },
            "amalfi": {
                "name": "Amalfi Coast",
                "min_days": 2,
                "max_days": 4,
                "ideal_days": 2,
                "priority": 3,
                "highlights": ["Positano", "Amalfi", "Ravello", "Capri"],
                "vibes": ["beach", "relaxation", "romantic", "scenic"],
                "best_for": ["couple", "honeymoon", "relaxation"],
            },
            "cinque_terre": {
                "name": "Cinque Terre",
                "min_days": 1,
                "max_days": 3,
                "ideal_days": 2,
                "priority": 4,
                "highlights": ["Five Villages", "Hiking Trails", "Beaches", "Seafood"],
                "vibes": ["adventure", "photography", "beach", "hiking"],
                "best_for": ["adventure", "couple", "friends"],
            },
        },
        # Travel time between cities (in minutes)
        "travel_times": {
            ("rome", "florence"): 95,      # High-speed train
            ("rome", "venice"): 225,       # High-speed train
            ("rome", "milan"): 180,        # High-speed train
            ("rome", "amalfi"): 180,       # Car/bus
            ("florence", "venice"): 120,   # High-speed train
            ("florence", "milan"): 100,    # High-speed train
            ("florence", "cinque_terre"): 150,  # Train
            ("venice", "milan"): 150,      # High-speed train
            ("milan", "cinque_terre"): 180, # Train
        },
        # Popular multi-city routes
        "popular_routes": [
            ["rome", "florence", "venice"],  # Classic Italy
            ["rome", "florence"],             # Art & History
            ["rome", "amalfi"],               # Rome & Beach
            ["milan", "cinque_terre", "florence"],  # North Italy
        ],
    },
    "france": {
        "name": "France",
        "currency": "EUR",
        "languages": ["French"],
        "cities": {
            "paris": {
                "name": "Paris",
                "min_days": 3,
                "max_days": 6,
                "ideal_days": 4,
                "priority": 1,
                "highlights": ["Eiffel Tower", "Louvre", "Notre-Dame", "Versailles"],
                "vibes": ["romantic", "cultural", "foodie", "art"],
                "best_for": ["couple", "honeymoon", "family", "solo"],
            },
            "nice": {
                "name": "Nice",
                "min_days": 2,
                "max_days": 4,
                "ideal_days": 2,
                "priority": 2,
                "highlights": ["Promenade des Anglais", "Old Town", "Beach", "Monaco day trip"],
                "vibes": ["beach", "relaxation", "scenic", "foodie"],
                "best_for": ["couple", "relaxation", "seniors"],
            },
            "lyon": {
                "name": "Lyon",
                "min_days": 1,
                "max_days": 3,
                "ideal_days": 2,
                "priority": 3,
                "highlights": ["Old Lyon", "Gastronomy", "Basilica", "Traboules"],
                "vibes": ["foodie", "cultural", "local"],
                "best_for": ["foodie", "couple", "solo"],
            },
            "bordeaux": {
                "name": "Bordeaux",
                "min_days": 2,
                "max_days": 3,
                "ideal_days": 2,
                "priority": 3,
                "highlights": ["Wine Regions", "Old Town", "La Cité du Vin"],
                "vibes": ["wine", "foodie", "relaxation", "cultural"],
                "best_for": ["couple", "foodie", "wine lovers"],
            },
        },
        "travel_times": {
            ("paris", "nice"): 330,        # TGV
            ("paris", "lyon"): 120,        # TGV
            ("paris", "bordeaux"): 140,    # TGV
            ("lyon", "nice"): 280,         # Train
        },
        "popular_routes": [
            ["paris", "nice"],              # City & Beach
            ["paris", "lyon", "nice"],      # Grand Tour
            ["paris", "bordeaux"],          # City & Wine
        ],
    },
    "spain": {
        "name": "Spain",
        "currency": "EUR",
        "languages": ["Spanish"],
        "cities": {
            "barcelona": {
                "name": "Barcelona",
                "min_days": 2,
                "max_days": 5,
                "ideal_days": 3,
                "priority": 1,
                "highlights": ["Sagrada Familia", "Park Güell", "Gothic Quarter", "La Rambla"],
                "vibes": ["cultural", "beach", "nightlife", "art"],
                "best_for": ["couple", "friends", "family", "solo"],
            },
            "madrid": {
                "name": "Madrid",
                "min_days": 2,
                "max_days": 4,
                "ideal_days": 2,
                "priority": 2,
                "highlights": ["Prado Museum", "Royal Palace", "Retiro Park", "Plaza Mayor"],
                "vibes": ["cultural", "nightlife", "foodie", "art"],
                "best_for": ["couple", "friends", "solo"],
            },
            "seville": {
                "name": "Seville",
                "min_days": 2,
                "max_days": 3,
                "ideal_days": 2,
                "priority": 3,
                "highlights": ["Alcázar", "Cathedral", "Flamenco", "Plaza de España"],
                "vibes": ["cultural", "romantic", "foodie", "flamenco"],
                "best_for": ["couple", "solo", "cultural"],
            },
            "granada": {
                "name": "Granada",
                "min_days": 1,
                "max_days": 2,
                "ideal_days": 2,
                "priority": 3,
                "highlights": ["Alhambra", "Albaicín", "Tapas Culture"],
                "vibes": ["cultural", "historical", "foodie"],
                "best_for": ["couple", "solo", "history lovers"],
            },
        },
        "travel_times": {
            ("barcelona", "madrid"): 150,   # AVE high-speed
            ("madrid", "seville"): 150,     # AVE
            ("seville", "granada"): 180,    # Bus/train
            ("barcelona", "seville"): 330,  # AVE via Madrid
        },
        "popular_routes": [
            ["barcelona", "madrid"],                    # Two Cities
            ["barcelona", "madrid", "seville"],         # Classic Spain
            ["madrid", "seville", "granada"],           # Andalusia
        ],
    },
    "japan": {
        "name": "Japan",
        "currency": "JPY",
        "languages": ["Japanese"],
        "cities": {
            "tokyo": {
                "name": "Tokyo",
                "min_days": 3,
                "max_days": 6,
                "ideal_days": 4,
                "priority": 1,
                "highlights": ["Shibuya", "Senso-ji", "Meiji Shrine", "Akihabara"],
                "vibes": ["modern", "cultural", "foodie", "shopping"],
                "best_for": ["family", "couple", "solo", "friends"],
            },
            "kyoto": {
                "name": "Kyoto",
                "min_days": 2,
                "max_days": 4,
                "ideal_days": 3,
                "priority": 2,
                "highlights": ["Fushimi Inari", "Kinkaku-ji", "Arashiyama", "Geisha District"],
                "vibes": ["traditional", "cultural", "peaceful", "photography"],
                "best_for": ["couple", "solo", "seniors", "cultural"],
            },
            "osaka": {
                "name": "Osaka",
                "min_days": 1,
                "max_days": 3,
                "ideal_days": 2,
                "priority": 3,
                "highlights": ["Dotonbori", "Osaka Castle", "Street Food", "Universal Studios"],
                "vibes": ["foodie", "nightlife", "fun", "local"],
                "best_for": ["friends", "family", "foodie"],
            },
            "hiroshima": {
                "name": "Hiroshima",
                "min_days": 1,
                "max_days": 2,
                "ideal_days": 1,
                "priority": 4,
                "highlights": ["Peace Memorial", "Miyajima Island", "Atomic Bomb Dome"],
                "vibes": ["historical", "peaceful", "cultural"],
                "best_for": ["solo", "couple", "history lovers"],
            },
        },
        "travel_times": {
            ("tokyo", "kyoto"): 135,        # Shinkansen
            ("tokyo", "osaka"): 150,        # Shinkansen
            ("kyoto", "osaka"): 15,         # Train
            ("osaka", "hiroshima"): 90,     # Shinkansen
            ("kyoto", "hiroshima"): 100,    # Shinkansen
        },
        "popular_routes": [
            ["tokyo", "kyoto"],                     # Classic Japan
            ["tokyo", "kyoto", "osaka"],            # Golden Route
            ["tokyo", "kyoto", "osaka", "hiroshima"],  # Extended
        ],
    },
}

def get_travel_time(country_id: str, city1: str, city2: str) -> int:
    """Get travel time between two cities in minutes."""
    country = COUNTRY_DATABASE.get(country_id, {})
    travel_times = country.get("travel_times", {})

    # Check both directions
    key1 = (city1, city2)
    key2 = (city2, city1)

    if key1 in travel_times:
        return travel_times[key1]
    elif key2 in travel_times:
        return travel_times[key2]
    else:
        return 999  # Unknown - very long

# Show available countries
print("🌍 Available Countries:")
for key, country in COUNTRY_DATABASE.items():
    cities = list(country["cities"].keys())
    print(f"   • {country['name']}: {', '.join([country['cities'][c]['name'] for c in cities])}")


🌍 Available Countries:
   • Italy: Rome, Florence, Venice, Milan, Amalfi Coast, Cinque Terre
   • France: Paris, Nice, Lyon, Bordeaux
   • Spain: Barcelona, Madrid, Seville, Granada
   • Japan: Tokyo, Kyoto, Osaka, Hiroshima
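
`get_travel_time` checks both key orders because each pair is stored once as an ordered tuple. One alternative, sketched here under the assumption that travel times are symmetric, is to key the table by `frozenset` so a single lookup covers both directions. `TRAVEL_MINUTES`, `leg_minutes`, and `route_minutes` are hypothetical names; the three durations are copied from the Italy entry above. Unlike the notebook's `999` sentinel, this version returns `None` for unknown legs.

```python
from typing import Dict, FrozenSet, List, Optional

# Subset of the Italy travel times from this notebook, in minutes.
TRAVEL_MINUTES: Dict[FrozenSet[str], int] = {
    frozenset({"rome", "florence"}): 95,
    frozenset({"florence", "venice"}): 120,
    frozenset({"rome", "venice"}): 225,
}


def leg_minutes(city_a: str, city_b: str) -> Optional[int]:
    """Look up one leg regardless of direction; None if the pair is unknown."""
    return TRAVEL_MINUTES.get(frozenset({city_a, city_b}))


def route_minutes(route: List[str]) -> Optional[int]:
    """Sum consecutive legs of a route; None if any leg is unknown."""
    total = 0
    for city_a, city_b in zip(route, route[1:]):
        leg = leg_minutes(city_a, city_b)
        if leg is None:
            return None
        total += leg
    return total


total = route_minutes(["rome", "florence", "venice"])
print(f"{total // 60}h {total % 60}m")  # 95 + 120 = 215 minutes -> 3h 35m
```

Returning `None` forces the caller to decide how to treat a missing connection, whereas a large sentinel quietly turns it into a scoring penalty.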


In [4]:
# ===========================
# CITY ALLOCATION ALGORITHM
# ===========================
from itertools import combinations, permutations
from typing import List, Dict, Tuple
import copy

def generate_city_allocations(
    country_id: str,
    total_days: int,
    group_type: str = "couple",
    vibes: List[str] = None,
    must_include: List[str] = None,
    exclude_cities: List[str] = None,
    prefer_fewer_cities: bool = False,
    num_options: int = 3
) -> List[Dict]:
    """
    Generate city allocation options for a multi-city trip.

    Returns multiple options like:
    - Option 1: 3 days Rome, 2 days Florence, 1 day Venice
    - Option 2: 4 days Rome, 2 days Florence
    - Option 3: 2 days Rome, 2 days Florence, 2 days Amalfi
    """
    country = COUNTRY_DATABASE.get(country_id)
    if not country:
        raise ValueError(f"Country {country_id} not found")

    vibes = vibes or []
    must_include = must_include or []
    exclude_cities = exclude_cities or []

    cities = country["cities"]
    available_cities = {k: v for k, v in cities.items() if k not in exclude_cities}

    # Score each city based on preferences
    city_scores = {}
    for city_id, city in available_cities.items():
        score = 0

        # Priority score
        score += (6 - city.get("priority", 5)) * 2

        # Vibe match
        city_vibes = set(city.get("vibes", []))
        vibe_match = len(set(vibes) & city_vibes)
        score += vibe_match * 3

        # Group type match
        best_for = set(city.get("best_for", []))
        if group_type in best_for:
            score += 5

        # Must include bonus
        if city_id in must_include:
            score += 100

        city_scores[city_id] = score

    # Sort cities by score
    sorted_cities = sorted(city_scores.items(), key=lambda x: x[1], reverse=True)

    # Generate allocation options
    options = []

    # Determine number of cities based on trip length
    if prefer_fewer_cities:
        min_cities = 1
        max_cities = min(2, len(available_cities))
    else:
        if total_days <= 3:
            min_cities, max_cities = 1, 2
        elif total_days <= 5:
            min_cities, max_cities = 2, 3
        elif total_days <= 7:
            min_cities, max_cities = 2, 4
        else:
            min_cities, max_cities = 3, min(5, len(available_cities))

    # Ensure must_include cities are counted, and keep the range non-empty
    min_cities = max(min_cities, len(must_include))
    max_cities = max(max_cities, min_cities)

    # Generate combinations
    top_cities = [c[0] for c in sorted_cities[:6]]  # Top 6 scored cities

    # Ensure must_include cities are in the list
    for city_id in must_include:
        if city_id not in top_cities:
            top_cities.append(city_id)

    all_combinations = []
    for num_cities in range(min_cities, max_cities + 1):
        for combo in combinations(top_cities, num_cities):
            # Skip if missing must_include cities
            if not all(c in combo for c in must_include):
                continue
            all_combinations.append(combo)

    # Score and allocate days for each combination
    scored_options = []

    for combo in all_combinations:
        # Calculate allocation
        allocation = allocate_days_to_cities(
            country_id, list(combo), total_days, available_cities
        )

        if allocation is None:
            continue

        # Calculate total travel time
        total_travel = 0
        for i in range(len(allocation) - 1):
            travel = get_travel_time(country_id, allocation[i]["city_id"], allocation[i+1]["city_id"])
            total_travel += travel
            allocation[i+1]["travel_time_from_previous"] = travel

        # Score this option
        option_score = 0
        for alloc in allocation:
            city_id = alloc["city_id"]
            days = alloc["days"]
            city = available_cities[city_id]

            # City score
            option_score += city_scores.get(city_id, 0) * days

            # Penalty for too few or too many days
            ideal = city.get("ideal_days", 2)
            min_d = city.get("min_days", 1)
            max_d = city.get("max_days", 5)

            if days < min_d:
                option_score -= 10
            elif days > max_d:
                option_score -= 5
            elif days == ideal:
                option_score += 5

        # Penalty for excessive travel
        option_score -= total_travel / 30  # -1 point per 30 min travel

        scored_options.append({
            "combo": combo,
            "allocation": allocation,
            "score": option_score,
            "total_travel_minutes": total_travel,
        })

    # Sort by score and take top N
    scored_options.sort(key=lambda x: x["score"], reverse=True)

    # Format final options
    for i, opt in enumerate(scored_options[:num_options]):
        cities_str = " + ".join([f"{a['days']} days {a['city_name']}" for a in opt["allocation"]])

        # Generate option name based on cities
        city_vibes = []
        for alloc in opt["allocation"]:
            city = available_cities[alloc["city_id"]]
            city_vibes.extend(city.get("vibes", []))

        if "romantic" in city_vibes and "beach" in city_vibes:
            name = "Romance & Relaxation"
        elif "cultural" in city_vibes and "art" in city_vibes:
            name = "Art & Culture Tour"
        elif "foodie" in city_vibes:
            name = "Culinary Journey"
        elif "adventure" in city_vibes:
            name = "Adventure Trail"
        else:
            name = f"Option {i+1}"

        # Pros and cons
        pros = []
        cons = []

        num_cities_in_opt = len(opt["allocation"])
        if num_cities_in_opt == 1:
            pros.append("Deep exploration of one city")
            cons.append("Less variety")
        elif num_cities_in_opt >= 3:
            pros.append("Great variety of experiences")
            cons.append(f"More travel time ({opt['total_travel_minutes']//60}h total)")

        if opt["total_travel_minutes"] < 180:
            pros.append("Minimal travel time")

        for alloc in opt["allocation"]:
            city = available_cities[alloc["city_id"]]
            if alloc["days"] >= city.get("ideal_days", 2):
                pros.append(f"Enough time in {alloc['city_name']}")

        options.append({
            "option_id": i + 1,
            "option_name": name,
            "description": cities_str,
            "cities": opt["allocation"],
            "total_days": total_days,
            "total_travel_minutes": opt["total_travel_minutes"],
            "match_score": max(0.0, min(opt["score"] / 50, 1.0)),  # Clamp to the 0-1 range
            "pros": pros[:3],
            "cons": cons[:2],
        })

    return options


def allocate_days_to_cities(
    country_id: str,
    city_ids: List[str],
    total_days: int,
    cities_data: Dict
) -> Optional[List[Dict]]:
    """Allocate days to cities based on their ideal duration."""

    # Start with minimum days
    allocation = []
    remaining_days = total_days

    # Sort by priority
    sorted_cities = sorted(
        city_ids,
        key=lambda x: cities_data[x].get("priority", 5)
    )

    # First pass: assign minimum days
    for city_id in sorted_cities:
        city = cities_data[city_id]
        min_days = city.get("min_days", 1)

        if remaining_days < min_days:
            return None  # Can't fit all cities

        allocation.append({
            "city_id": city_id,
            "city_name": city["name"],
            "days": min_days,
            "highlights": city.get("highlights", [])[:3],
        })
        remaining_days -= min_days

    # Second pass: distribute remaining days
    while remaining_days > 0:
        # Find city that can use more days and has highest priority
        best_city_idx = None
        best_score = -1

        for i, alloc in enumerate(allocation):
            city_id = alloc["city_id"]
            city = cities_data[city_id]
            current_days = alloc["days"]
            max_days = city.get("max_days", 5)
            ideal_days = city.get("ideal_days", 2)
            priority = city.get("priority", 5)

            if current_days >= max_days:
                continue

            # Score: prefer cities below ideal, then by priority
            score = (6 - priority) * 10
            if current_days < ideal_days:
                score += 20

            if score > best_score:
                best_score = score
                best_city_idx = i

        if best_city_idx is None:
            break  # Can't allocate more days

        allocation[best_city_idx]["days"] += 1
        remaining_days -= 1

    return allocation


# Display function
def display_allocation_options(options: List[Dict]):
    """Pretty print allocation options."""
    print("\n" + "="*60)
    print("🗺️  CITY ALLOCATION OPTIONS")
    print("="*60)

    for opt in options:
        print(f"\n📋 Option {opt['option_id']}: {opt['option_name']}")
        print(f"   {opt['description']}")
        print(f"   Total travel: {opt['total_travel_minutes']//60}h {opt['total_travel_minutes']%60}m")
        print(f"   Match score: {opt['match_score']:.0%}")

        print("   ✅ Pros:", ", ".join(opt['pros']))
        if opt['cons']:
            print("   ⚠️ Cons:", ", ".join(opt['cons']))

        print("   ---")
        for city in opt['cities']:
            travel = city.get('travel_time_from_previous', 0)
            travel_str = f" (🚆 {travel//60}h {travel%60}m travel)" if travel else ""
            print(f"   • {city['city_name']}: {city['days']} day(s){travel_str}")
            print(f"     Highlights: {', '.join(city['highlights'])}")

print("✅ City allocation algorithm loaded!")


✅ City allocation algorithm loaded!
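
To make the per-city scoring in `generate_city_allocations` concrete, here is a worked example that isolates the three terms: priority, vibe overlap, and group match. The function name `city_score` is hypothetical, and the `must_include` bonus of 100 is left out for brevity. City data is copied from the Rome and Milan entries in `COUNTRY_DATABASE`.

```python
from typing import Dict, List


def city_score(city: Dict, group_type: str, vibes: List[str]) -> int:
    """Re-statement of the notebook's per-city score, without the must_include bonus."""
    score = (6 - city.get("priority", 5)) * 2                   # lower priority number scores higher
    score += 3 * len(set(vibes) & set(city.get("vibes", [])))   # 3 points per shared vibe
    if group_type in city.get("best_for", []):                  # flat bonus for a group match
        score += 5
    return score


rome = {
    "priority": 1,
    "vibes": ["cultural", "historical", "foodie", "romantic"],
    "best_for": ["family", "couple", "solo", "seniors"],
}
milan = {
    "priority": 4,
    "vibes": ["shopping", "fashion", "cultural", "business"],
    "best_for": ["solo", "business", "shopping lovers"],
}
preferred = ["cultural", "foodie", "romantic"]

print(city_score(rome, "couple", preferred))   # 10 + 9 + 5 = 24
print(city_score(milan, "couple", preferred))  # 4 + 3 + 0 = 7
```

Because each option multiplies this score by the number of days spent in the city, a high-scoring city such as Rome tends to dominate the ranking when it receives most of the days.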


In [5]:
# ===========================
# 🎯 SELECT YOUR COUNTRY & TRIP DETAILS
# ===========================

# 🔧 CHANGE THESE VALUES FOR YOUR TRIP
SELECTED_COUNTRY = "italy"      # Options: italy, france, spain, japan
TOTAL_TRIP_DAYS = 7             # Total days for your trip
GROUP_TYPE = "couple"           # family, couple, solo, friends, honeymoon
PREFERRED_VIBES = ["cultural", "foodie", "romantic"]  # Your travel vibes
MUST_INCLUDE_CITIES = []        # Cities that must be in the trip, e.g., ["rome", "venice"]
EXCLUDE_CITIES = []             # Cities to avoid
PREFER_FEWER_CITIES = False     # True = deeper exploration, fewer cities

# Generate allocation options
allocation_options = generate_city_allocations(
    country_id=SELECTED_COUNTRY,
    total_days=TOTAL_TRIP_DAYS,
    group_type=GROUP_TYPE,
    vibes=PREFERRED_VIBES,
    must_include=MUST_INCLUDE_CITIES,
    exclude_cities=EXCLUDE_CITIES,
    prefer_fewer_cities=PREFER_FEWER_CITIES,
    num_options=3
)

# Display options
display_allocation_options(allocation_options)



🗺️  CITY ALLOCATION OPTIONS

📋 Option 1: Art & Culture Tour
   5 days Rome + 2 days Florence
   Total travel: 1h 35m
   Match score: 100%
   ✅ Pros: Minimal travel time, Enough time in Rome, Enough time in Florence
   ---
   • Rome: 5 day(s)
     Highlights: Colosseum, Vatican, Trevi Fountain
   • Florence: 2 day(s) (🚆 1h 35m travel)
     Highlights: Uffizi Gallery, Duomo, Ponte Vecchio

📋 Option 2: Art & Culture Tour
   4 days Rome + 2 days Florence + 1 days Venice
   Total travel: 3h 35m
   Match score: 100%
   ✅ Pros: Great variety of experiences, Enough time in Rome, Enough time in Florence
   ⚠️ Cons: More travel time (3h total)
   ---
   • Rome: 4 day(s)
     Highlights: Colosseum, Vatican, Trevi Fountain
   • Florence: 2 day(s) (🚆 1h 35m travel)
     Highlights: Uffizi Gallery, Duomo, Ponte Vecchio
   • Venice: 1 day(s) (🚆 2h 0m travel)
     Highlights: St. Mark's Square, Grand Canal, Rialto Bridge

📋 Option 3: Culinary Journey
   5 days

In [6]:
# ===========================
# 🎯 SELECT YOUR PREFERRED OPTION
# ===========================

# 🔧 CHOOSE YOUR OPTION (1, 2, or 3)
SELECTED_OPTION = 1

# Get the selected allocation
selected_allocation = allocation_options[SELECTED_OPTION - 1]

print(f"\n✅ Selected: {selected_allocation['option_name']}")
print(f"   {selected_allocation['description']}")
print(f"\n📅 Trip Breakdown:")
for city in selected_allocation['cities']:
    print(f"   • {city['city_name']}: {city['days']} day(s)")

# Store cities for itinerary generation
CITIES_TO_GENERATE = selected_allocation['cities']



✅ Selected: Art & Culture Tour
   5 days Rome + 2 days Florence

📅 Trip Breakdown:
   • Rome: 5 day(s)
   • Florence: 2 day(s)
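
One pitfall in the selection cell: `allocation_options[SELECTED_OPTION - 1]` silently accepts `SELECTED_OPTION = 0`, because index `-1` returns the last option. A minimal sketch of a guard follows; `pick_option` is a hypothetical helper name, not part of the notebook.

```python
from typing import Any, Dict, List


def pick_option(options: List[Dict[str, Any]], choice: int) -> Dict[str, Any]:
    """Return the 1-based `choice` from `options`, rejecting out-of-range values."""
    if not 1 <= choice <= len(options):
        raise ValueError(
            f"Choose an option between 1 and {len(options)}, got {choice}"
        )
    return options[choice - 1]


# Stand-in data; the real list comes from generate_city_allocations(...)
demo_options = [{"option_name": "A"}, {"option_name": "B"}]
print(pick_option(demo_options, 2)["option_name"])  # B
```

With this guard, a typo such as `0` or `4` raises a clear error instead of selecting an unintended option or failing with a bare `IndexError`.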


---
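## 🗓️ Generate Multi-City Itinerary

The next cell builds the trip skeleton for your selected option: the date range for each city and the travel segments between cities.

For each city, the full pipeline then needs to:
1. Fetch POIs from Overture Maps
2. Score POIs based on your persona
3. Generate an optimized daily itinerary

Cities already in `CITY_DATABASE` are marked `ready_for_generation` and get their daily plan from Step 2 onwards; other cities receive a basic plan built from their highlights.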
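
The date bookkeeping behind the per-city schedule can be sketched on its own, independent of the POI steps. This is an illustration only: `city_date_ranges` is a hypothetical name, and the allocation dicts are assumed to carry the `city_name` and `days` keys used in the selection cell.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def city_date_ranges(start: date, allocation: List[Dict]) -> List[Tuple[str, date, date]]:
    """Return (city, first_day, last_day) for consecutive city stays."""
    ranges = []
    current = start
    for city in allocation:
        # A stay of N days covers `current` through `current + (N - 1)`
        last_day = current + timedelta(days=city["days"] - 1)
        ranges.append((city["city_name"], current, last_day))
        # The next city starts the day after this one ends
        current = last_day + timedelta(days=1)
    return ranges


demo_allocation = [
    {"city_name": "Rome", "days": 5},
    {"city_name": "Florence", "days": 2},
]
for name, first, last in city_date_ranges(date(2024, 6, 15), demo_allocation):
    print(f"{name}: {first} to {last}")
# Rome: 2024-06-15 to 2024-06-19
# Florence: 2024-06-20 to 2024-06-21
```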

In [7]:
# ===========================
# 🗓️ GENERATE MULTI-CITY ITINERARY
# ===========================

from datetime import date, timedelta
import json as json_lib

def generate_multi_city_trip(
    country_id: str,
    cities_allocation: List[Dict],
    start_date: date,
    group_type: str,
    vibes: List[str],
    budget_level: int = 3,
    pacing: str = "moderate"
):
    """Generate a complete multi-city trip itinerary."""

    country = COUNTRY_DATABASE[country_id]
    full_itinerary = {
        "country": country["name"],
        "total_days": sum(c["days"] for c in cities_allocation),
        "start_date": str(start_date),
        "group_type": group_type,
        "vibes": vibes,
        "city_itineraries": [],
        "travel_segments": [],
    }

    current_date = start_date

    for i, city_alloc in enumerate(cities_allocation):
        city_id = city_alloc["city_id"]
        city_name = city_alloc["city_name"]
        num_days = city_alloc["days"]

        print(f"\n{'='*50}")
        print(f"üìç Generating itinerary for {city_name} ({num_days} days)")
        print(f"{'='*50}")

        # Check if city exists in CITY_DATABASE
        if city_id not in CITY_DATABASE:
            print(f"⚠️ City {city_id} not in CITY_DATABASE. Adding basic config...")
            # Add basic city config from country database
            country_city = country["cities"][city_id]
            # You would need to add proper bbox and neighborhoods
            print(f"   Please add {city_name} to CITY_DATABASE for full POI fetching")

            city_itinerary = {
                "city": city_name,
                "days": num_days,
                "start_date": str(current_date),
                "end_date": str(current_date + timedelta(days=num_days - 1)),
                "highlights": country_city.get("highlights", []),
                "daily_plan": [
                    {
                        "day": d + 1,
                        "date": str(current_date + timedelta(days=d)),
                        "activities": country_city.get("highlights", [])[:3]
                    }
                    for d in range(num_days)
                ],
                "status": "basic"  # Indicates needs full POI data
            }
        else:
            # Full city itinerary generation
            city_config = CITY_DATABASE[city_id]

            # Here you would call the existing POI fetching and scoring
            # For now, create a placeholder
            city_itinerary = {
                "city": city_name,
                "city_id": city_id,
                "days": num_days,
                "start_date": str(current_date),
                "end_date": str(current_date + timedelta(days=num_days - 1)),
                "neighborhoods": [n["name"] for n in city_config.get("neighborhoods", [])],
                "daily_plan": [],
                "status": "ready_for_generation"
            }

            print(f"   ✅ City config found! Run Step 2 onwards with SELECTED_CITY = '{city_id}'")

        full_itinerary["city_itineraries"].append(city_itinerary)

        # Add travel segment to next city
        if i < len(cities_allocation) - 1:
            next_city = cities_allocation[i + 1]
            travel_time = get_travel_time(country_id, city_id, next_city["city_id"])

            travel_segment = {
                "from": city_name,
                "to": next_city["city_name"],
                "date": str(current_date + timedelta(days=num_days - 1)),
                "travel_time_minutes": travel_time,
                "suggested_departure": "Morning" if travel_time > 180 else "Afternoon"
            }
            full_itinerary["travel_segments"].append(travel_segment)

            print(f"   üöÜ Travel to {next_city['city_name']}: {travel_time//60}h {travel_time%60}m")

        current_date += timedelta(days=num_days)

    return full_itinerary


# Generate the multi-city itinerary

# 🔧 SET YOUR START DATE
TRIP_START_DATE = date(2024, 6, 15)

multi_city_itinerary = generate_multi_city_trip(
    country_id=SELECTED_COUNTRY,
    cities_allocation=CITIES_TO_GENERATE,
    start_date=TRIP_START_DATE,
    group_type=GROUP_TYPE,
    vibes=PREFERRED_VIBES,
    budget_level=3,
    pacing="moderate"
)

# Display summary
print("\n" + "="*60)
print("üéâ MULTI-CITY TRIP SUMMARY")
print("="*60)
print(f"Country: {multi_city_itinerary['country']}")
print(f"Total Days: {multi_city_itinerary['total_days']}")
print(f"Start Date: {multi_city_itinerary['start_date']}")

print("\nüìÖ City Schedule:")
for city_itin in multi_city_itinerary['city_itineraries']:
    print(f"   ‚Ä¢ {city_itin['city']}: {city_itin['start_date']} to {city_itin['end_date']} ({city_itin['days']} days)")

if multi_city_itinerary['travel_segments']:
    print("\nüöÜ Travel Segments:")
    for segment in multi_city_itinerary['travel_segments']:
        print(f"   ‚Ä¢ {segment['from']} ‚Üí {segment['to']}: {segment['travel_time_minutes']//60}h {segment['travel_time_minutes']%60}m ({segment['suggested_departure']})")



📍 Generating itinerary for Rome (5 days)
   ✅ City config found! Run Step 2 onwards with SELECTED_CITY = 'rome'
   🚆 Travel to Florence: 1h 35m

📍 Generating itinerary for Florence (2 days)
⚠️ City florence not in CITY_DATABASE. Adding basic config...
   Please add Florence to CITY_DATABASE for full POI fetching

🎉 MULTI-CITY TRIP SUMMARY
Country: Italy
Total Days: 7
Start Date: 2024-06-15

📅 City Schedule:
   • Rome: 2024-06-15 to 2024-06-19 (5 days)
   • Florence: 2024-06-20 to 2024-06-21 (2 days)

🚆 Travel Segments:
   • Rome → Florence: 1h 35m (Afternoon)
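
As far as this cell shows, the returned dictionary holds only strings, numbers, and lists, so it can be saved with the standard `json` module. A minimal sketch follows, assuming a hypothetical output file name `trip.json` and a trimmed-down stand-in for `multi_city_itinerary`.

```python
import json
import tempfile
from pathlib import Path

# Trimmed stand-in for multi_city_itinerary (values taken from the summary above)
sample_itinerary = {
    "country": "Italy",
    "total_days": 7,
    "start_date": "2024-06-15",
    "city_itineraries": [
        {"city": "Rome", "days": 5, "start_date": "2024-06-15", "end_date": "2024-06-19"},
        {"city": "Florence", "days": 2, "start_date": "2024-06-20", "end_date": "2024-06-21"},
    ],
    "travel_segments": [
        {"from": "Rome", "to": "Florence", "travel_time_minutes": 95},
    ],
}


def save_itinerary(itinerary: dict, path: Path) -> None:
    """Write the itinerary as UTF-8 JSON, keeping accented names readable."""
    text = json.dumps(itinerary, indent=2, ensure_ascii=False)
    path.write_text(text, encoding="utf-8")


out_path = Path(tempfile.gettempdir()) / "trip.json"  # hypothetical location
save_itinerary(sample_itinerary, out_path)
reloaded = json.loads(out_path.read_text(encoding="utf-8"))
```

Dates are already stored as strings by `generate_multi_city_trip`, so no custom encoder is needed for this shape of data.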


In [8]:
# ===========================
# ADD A NEW CITY (Example)
# ===========================

# Uncomment and modify to add a new city:

# add_new_city(
#     name="Dubai",
#     country="UAE",
#     currency="AED",
#     min_lon=55.10,
#     min_lat=25.05,
#     max_lon=55.35,
#     max_lat=25.30,
#     neighborhoods=[
#         {"name": "Downtown Dubai", "lat": 25.1972, "lon": 55.2744, "vibes": ["luxury", "shopping", "photography"], "best_for": ["couple", "family", "business"]},
#         {"name": "Dubai Marina", "lat": 25.0805, "lon": 55.1403, "vibes": ["nightlife", "beach", "foodie"], "best_for": ["friends", "couple"]},
#     ]
# )
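
Before registering a city, it can help to confirm that every neighborhood centre falls inside the bounding box, since the bounding box drives the POI query. The sketch below uses a hypothetical helper, `neighborhoods_outside_bbox`, which is not part of `add_new_city`, together with the Dubai values from the commented example.

```python
from typing import Dict, List


def neighborhoods_outside_bbox(bbox: Dict[str, float], neighborhoods: List[Dict]) -> List[str]:
    """Return the names of neighborhoods whose centre lies outside the bbox."""
    outside = []
    for n in neighborhoods:
        in_lon = bbox["min_lon"] <= n["lon"] <= bbox["max_lon"]
        in_lat = bbox["min_lat"] <= n["lat"] <= bbox["max_lat"]
        if not (in_lon and in_lat):
            outside.append(n["name"])
    return outside


dubai_bbox = {"min_lon": 55.10, "min_lat": 25.05, "max_lon": 55.35, "max_lat": 25.30}
dubai_neighborhoods = [
    {"name": "Downtown Dubai", "lat": 25.1972, "lon": 55.2744},
    {"name": "Dubai Marina", "lat": 25.0805, "lon": 55.1403},
]
print(neighborhoods_outside_bbox(dubai_bbox, dubai_neighborhoods))  # []
```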

In [9]:
# ===========================
# SELECT YOUR CITY
# ===========================

# 🔧 CHANGE THIS TO YOUR CITY
SELECTED_CITY = "paris"  # Options: paris, rome, barcelona, tokyo, london

city_config = CITY_DATABASE[SELECTED_CITY]
print(f"\n🏙️ Selected City: {city_config['name']}, {city_config['country']}")
print(f"   Currency: {city_config['currency']}")
print(f"   Bounding Box: {city_config['bbox']}")
print(f"   Neighborhoods: {len(city_config['neighborhoods'])}")


🏙️ Selected City: Paris, France
   Currency: EUR
   Bounding Box: {'min_lon': 2.25, 'min_lat': 48.81, 'max_lon': 2.42, 'max_lat': 48.91}
   Neighborhoods: 6


<cell_type>markdown</cell_type>---
## 🏛️ Step 2a: Famous Landmarks (Scalable)

Before fetching POIs, we load famous landmarks for the city. This ensures that iconic places such as the Eiffel Tower and the Colosseum are always included.

**3 Methods (Auto-fallback):**
1. 📁 **File** - Load from `data/landmarks/{city}_landmarks.json` (fastest)
2. 🌐 **Wikidata** - Query free SPARQL endpoint (no API key)
3. 🤖 **LLM** - Generate with Gemini (requires API key)

To pre-generate landmarks for all cities:
```bash
python -m data.scripts.generate_landmarks
```
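
The auto-fallback order can be expressed as a loop over loader callables that stops at the first non-empty result. The sketch below uses stand-in loaders; `first_available` and the demo functions are illustrative names, not the notebook's own functions.

```python
from typing import Callable, Dict, List, Tuple

Loader = Callable[[], List[Dict]]


def first_available(loaders: List[Tuple[str, Loader]]) -> Tuple[str, List[Dict]]:
    """Try each (source, loader) in order; return the first non-empty result."""
    for source, load in loaders:
        try:
            landmarks = load()
        except Exception as exc:  # a failing source should not stop the cascade
            print(f"{source} failed: {exc}")
            continue
        if landmarks:
            return source, landmarks
    return "none", []


def demo_file_loader() -> List[Dict]:
    return []  # pretend no landmarks file exists for this city


def demo_wikidata_loader() -> List[Dict]:
    return [{"name": "Colosseum"}]


def demo_llm_loader() -> List[Dict]:
    raise RuntimeError("should not be reached in this demo")


source, landmarks = first_available([
    ("file", demo_file_loader),
    ("wikidata", demo_wikidata_loader),
    ("llm", demo_llm_loader),
])
print(source, len(landmarks))  # wikidata 1
```

Keeping the order in a plain list makes it easy to reorder or drop a source without touching the control flow.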

In [28]:
# ===========================
# GCP CONFIGURATION
# ===========================

# 🔧 SET YOUR GCP PROJECT ID
PROJECT_ID = "gen-lang-client-0518072406"  # <-- CHANGE THIS!

# 🔧 HOW MANY POIs TO FETCH (increased for better coverage)
POI_LIMIT = 5000

In [29]:
# ===========================
# TOURIST ATTRACTION CATEGORIES (Overture Maps)
# ===========================

# Only fetch these categories - real tourist attractions
TOURIST_CATEGORIES = [
    # Historical & Cultural
    'museum', 'art_gallery', 'art_museum', 'history_museum', 'science_museum',
    'church', 'cathedral', 'basilica', 'chapel', 'temple', 'mosque', 'synagogue',
    'monastery', 'abbey',
    'castle', 'palace', 'fort', 'fortress', 'citadel',
    'monument', 'memorial', 'statue', 'sculpture',
    'archaeological_site', 'ruins', 'historic_site', 'heritage_site',
    'tower', 'clock_tower', 'bell_tower',

    # Landmarks & Attractions
    'tourist_attraction', 'landmark', 'point_of_interest',
    'viewpoint', 'observation_deck', 'scenic_lookout',
    'bridge', 'famous_bridge',
    'square', 'plaza', 'piazza', 'place',
    'fountain', 'famous_fountain',
    'gate', 'arch', 'triumphal_arch',

    # Parks & Nature
    'park', 'garden', 'botanical_garden', 'public_garden',
    'zoo', 'aquarium', 'wildlife_park',
    'national_park', 'nature_reserve',

    # Entertainment & Culture
    'theater', 'theatre', 'opera_house', 'concert_hall',
    'amphitheater', 'amphitheatre', 'stadium', 'arena',
    'library', 'famous_library',
    'cemetery', 'famous_cemetery',  # Pere Lachaise, etc.

    # Markets (tourist ones)
    'market', 'flea_market', 'farmers_market', 'food_market',
]

def fetch_pois_from_overture(city_key: str, project_id: str, limit: int = 500) -> Optional[pd.DataFrame]:
    """
    Fetch ONLY tourist attractions from Overture Maps via BigQuery.

    Filters for museums, monuments, churches, parks, landmarks, etc.
    Intended to exclude hotels, restaurants, shops, offices, etc.

    Returns None if BigQuery is unavailable, the connection fails, or the query fails.
    """
    if not BIGQUERY_AVAILABLE:
        print("❌ BigQuery not available")
        return None

    city = CITY_DATABASE[city_key]
    bbox = city['bbox']

    try:
        client = bigquery.Client(project=project_id)
        print(f"✅ Connected to BigQuery (Project: {client.project})")
    except Exception as e:
        print(f"❌ Connection failed: {e}")
        print("\n💡 Make sure you ran the first cell to authenticate!")
        return None

    # Build category filter for SQL.
    # NOTE: LIKE '%cat%' is a substring match, so 'market' also matches
    # 'supermarket' and 'marketing_agency', and 'park' also matches 'parking'.
    category_conditions = " OR ".join([
        f"LOWER(categories.primary) LIKE '%{cat}%'" for cat in TOURIST_CATEGORIES
    ])

    # Query with category filter - ONLY tourist attractions
    query = f"""
    SELECT
        id,
        names.primary AS name,
        categories.primary AS category,
        categories.alternate AS subcategories,
        ST_Y(geometry) AS latitude,
        ST_X(geometry) AS longitude,
        confidence
    FROM
        `bigquery-public-data.overture_maps.place`
    WHERE
        ST_X(geometry) BETWEEN {bbox['min_lon']} AND {bbox['max_lon']}
        AND ST_Y(geometry) BETWEEN {bbox['min_lat']} AND {bbox['max_lat']}
        AND confidence > 0.7
        AND names.primary IS NOT NULL
        AND categories.primary IS NOT NULL
        AND (
            {category_conditions}
        )
    ORDER BY confidence DESC
    LIMIT {limit}
    """

    print(f"\nüîÑ Fetching TOURIST ATTRACTIONS for {city['name']}...")
    print(f"   üìç Categories: museums, monuments, churches, parks, landmarks...")

    try:
        df = client.query(query).to_dataframe()
        df['address'] = ''

        print(f"‚úÖ Fetched {len(df)} tourist attractions!")

        # Show category breakdown
        if len(df) > 0:
            print(f"\nüìä Category breakdown:")
            cat_counts = df['category'].value_counts().head(10)
            for cat, count in cat_counts.items():
                print(f"   ‚Ä¢ {cat}: {count}")

        return df
    except Exception as e:
        print(f"‚ùå Query failed: {e}")
        return None


# Fetch POIs
pois_df = fetch_pois_from_overture(SELECTED_CITY, PROJECT_ID, POI_LIMIT)

✅ Connected to BigQuery (Project: gen-lang-client-0518072406)

🔄 Fetching TOURIST ATTRACTIONS for Paris...
   📍 Categories: museums, monuments, churches, parks, landmarks...
✅ Fetched 5000 tourist attractions!

📊 Category breakdown:
   • art_gallery: 1126
   • supermarket: 865
   • marketing_agency: 425
   • architectural_designer: 402
   • park: 259
   • parking: 258
   • public_plaza: 241
   • landmark_and_historical_building: 196
   • theatre: 178
   • catholic_church: 125

In [30]:
# ===========================
# LANDMARK FUNCTIONS (FALLBACK ONLY)
# ===========================
# These functions are only used if Overture data is not available
# Primary approach: Overture → Mark Famous → Use directly

import requests

def load_landmarks_from_file(city: str) -> list:
    """Load landmarks from curated JSON file (backup method)."""
    possible_paths = [
        Path(f'../data/landmarks/{city.lower()}_landmarks.json'),
        Path(f'data/landmarks/{city.lower()}_landmarks.json'),
        Path(f'/content/data/landmarks/{city.lower()}_landmarks.json'),
    ]

    for path in possible_paths:
        if path.exists():
            with open(path, 'r', encoding='utf-8') as f:
                return json.load(f)
    return []


def fetch_landmarks_from_wikidata(city_name: str, country: str, limit: int = 20) -> list:
    """Fetch from Wikidata SPARQL (backup method)."""
    query = f"""
    SELECT DISTINCT ?place ?placeLabel ?placeDescription (SAMPLE(?coord) AS ?coordinate)
    WHERE {{
      VALUES ?type {{ wd:Q570116 wd:Q33506 wd:Q16970 wd:Q839954 wd:Q4989906 }}
      ?place wdt:P31 ?type.
      ?place wdt:P131* ?city.
      ?city rdfs:label "{city_name}"@en.
      ?place wdt:P625 ?coord.
      ?article schema:about ?place.
      ?article schema:isPartOf <https://en.wikipedia.org/>.
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    GROUP BY ?place ?placeLabel ?placeDescription
    LIMIT {limit}
    """

    try:
        response = requests.get(
            "https://query.wikidata.org/sparql",
            params={'query': query, 'format': 'json'},
            headers={'User-Agent': 'TravelItineraryBot/1.0'},
            timeout=30
        )
        response.raise_for_status()  # Surface HTTP errors before parsing JSON
        data = response.json()

        landmarks = []
        for item in data.get('results', {}).get('bindings', []):
            name = item.get('placeLabel', {}).get('value', '')
            coord = item.get('coordinate', {}).get('value', '')

            # Skip items whose label fell back to the raw Wikidata ID (e.g. "Q12345")
            if name.startswith('Q') and name[1:].isdigit():
                continue

            if coord and name:
                try:
                    # Wikidata returns WKT "Point(lon lat)": longitude comes first
                    coord = coord.replace('Point(', '').replace(')', '')
                    lon, lat = map(float, coord.split())
                    landmarks.append({
                        'name': name,
                        'category': 'attraction',
                        'latitude': lat,
                        'longitude': lon,
                        'is_famous': True,
                        'must_see': True,
                        'duration_override': 90,
                    })
                except ValueError:
                    # Malformed coordinate string; skip this landmark
                    continue
        return landmarks
    except Exception as e:
        print(f"⚠️ Wikidata fetch failed: {e}")
        return []


print("✅ Fallback landmark functions loaded (File, Wikidata)")
print("   📍 Primary method: Overture Maps → Mark Famous")

✅ Fallback landmark functions loaded (File, Wikidata)
   📍 Primary method: Overture Maps → Mark Famous
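
Wikidata returns coordinates as WKT literals of the form `Point(longitude latitude)`, which is why the parsing loop unpacks `lon` before `lat`. Below is a standalone sketch of that parsing step; `parse_wkt_point` is an illustrative helper name, and the regex deliberately accepts only plain decimal numbers.

```python
import re
from typing import Optional, Tuple

_POINT_RE = re.compile(
    r"^\s*Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt: str) -> Optional[Tuple[float, float]]:
    """Parse 'Point(lon lat)' and return (lat, lon), or None if malformed."""
    match = _POINT_RE.match(wkt)
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


print(parse_wkt_point("Point(12.4922 41.8902)"))  # (41.8902, 12.4922)
print(parse_wkt_point("not a point"))             # None
```

Returning `None` for malformed input lets the caller skip a bad row explicitly instead of relying on an exception handler.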


In [13]:
# ===========================
# 🗄️ LEGACY: OLD LANDMARK FETCHING APPROACH
# ===========================
# This cell contains the OLD approach for reference/backup
# USE ONLY if Overture data is not available or incomplete
#
# Old Flow: File → Hardcoded → Wikidata → Merge with Overture
# New Flow: Overture → Mark Famous → Use directly (simpler!)
#
# To use the old approach, remove the enclosing triple quotes and run this cell instead of cell-10

"""
# ============================================================
# LEGACY APPROACH - UNCOMMENT TO USE
# ============================================================

import requests

# Hardcoded fallback for major cities
HARDCODED_LANDMARKS = {
    'paris': [
        {'id': 'famous_eiffel', 'name': 'Eiffel Tower', 'category': 'monument', 'latitude': 48.8584, 'longitude': 2.2945, 'confidence': 1.0, 'address': 'Champ de Mars', 'is_famous': True, 'description': 'Iconic iron lattice tower, symbol of Paris.', 'duration_override': 120, 'must_see': True},
        {'id': 'famous_louvre', 'name': 'Louvre Museum', 'category': 'museum', 'latitude': 48.8606, 'longitude': 2.3376, 'confidence': 1.0, 'address': 'Rue de Rivoli', 'is_famous': True, 'description': "World's largest art museum. Home to Mona Lisa.", 'duration_override': 240, 'must_see': True},
        {'id': 'famous_notredame', 'name': 'Notre-Dame Cathedral', 'category': 'church', 'latitude': 48.8530, 'longitude': 2.3499, 'confidence': 1.0, 'address': 'Île de la Cité', 'is_famous': True, 'description': 'Medieval Gothic cathedral.', 'duration_override': 60, 'must_see': True},
        {'id': 'famous_sacrecoeur', 'name': 'Sacré-Cœur Basilica', 'category': 'church', 'latitude': 48.8867, 'longitude': 2.3431, 'confidence': 1.0, 'address': 'Montmartre', 'is_famous': True, 'description': 'White-domed basilica with panoramic views.', 'duration_override': 90, 'must_see': True},
        {'id': 'famous_arc', 'name': 'Arc de Triomphe', 'category': 'monument', 'latitude': 48.8738, 'longitude': 2.2950, 'confidence': 1.0, 'address': 'Place Charles de Gaulle', 'is_famous': True, 'description': 'Iconic triumphal arch.', 'duration_override': 60, 'must_see': True},
        {'id': 'famous_versailles', 'name': 'Palace of Versailles', 'category': 'palace', 'latitude': 48.8049, 'longitude': 2.1204, 'confidence': 1.0, 'address': 'Versailles', 'is_famous': True, 'description': 'Opulent royal palace with stunning gardens.', 'duration_override': 300, 'must_see': True},
        {'id': 'famous_orsay', 'name': "Musée d'Orsay", 'category': 'museum', 'latitude': 48.8600, 'longitude': 2.3266, 'confidence': 1.0, 'address': 'Left Bank', 'is_famous': True, 'description': 'Impressionist masterpieces in former railway station.', 'duration_override': 180, 'must_see': True},
    ],
    'rome': [
        {'id': 'rome_colosseum', 'name': 'Colosseum', 'category': 'monument', 'latitude': 41.8902, 'longitude': 12.4922, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Iconic ancient Roman amphitheater, symbol of Rome.', 'duration_override': 120, 'must_see': True},
        {'id': 'rome_vatican', 'name': 'Vatican Museums', 'category': 'museum', 'latitude': 41.9065, 'longitude': 12.4536, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': "World's greatest art collection including Sistine Chapel.", 'duration_override': 240, 'must_see': True},
        {'id': 'rome_stpeters', 'name': "St. Peter's Basilica", 'category': 'church', 'latitude': 41.9022, 'longitude': 12.4539, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Largest church in the world.', 'duration_override': 120, 'must_see': True},
        {'id': 'rome_pantheon', 'name': 'Pantheon', 'category': 'monument', 'latitude': 41.8986, 'longitude': 12.4769, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Best-preserved ancient Roman building.', 'duration_override': 60, 'must_see': True},
        {'id': 'rome_trevifountain', 'name': 'Trevi Fountain', 'category': 'monument', 'latitude': 41.9009, 'longitude': 12.4833, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Baroque masterpiece, throw a coin to return.', 'duration_override': 30, 'must_see': True},
        {'id': 'rome_romanforum', 'name': 'Roman Forum', 'category': 'monument', 'latitude': 41.8925, 'longitude': 12.4853, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': "Ancient ruins of Rome's political center.", 'duration_override': 120, 'must_see': True},
        {'id': 'rome_spanishsteps', 'name': 'Spanish Steps', 'category': 'monument', 'latitude': 41.9060, 'longitude': 12.4828, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Iconic 135-step stairway.', 'duration_override': 45, 'must_see': True},
        {'id': 'rome_piazzanavona', 'name': 'Piazza Navona', 'category': 'attraction', 'latitude': 41.8992, 'longitude': 12.4731, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Baroque square with Bernini fountains.', 'duration_override': 45, 'must_see': True},
    ],
    'tokyo': [
        {'id': 'tokyo_sensoji', 'name': 'Senso-ji Temple', 'category': 'church', 'latitude': 35.7148, 'longitude': 139.7967, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': "Tokyo's oldest Buddhist temple.", 'duration_override': 90, 'must_see': True},
        {'id': 'tokyo_meiji', 'name': 'Meiji Shrine', 'category': 'church', 'latitude': 35.6764, 'longitude': 139.6993, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Serene Shinto shrine in forested park.', 'duration_override': 90, 'must_see': True},
        {'id': 'tokyo_shibuya', 'name': 'Shibuya Crossing', 'category': 'attraction', 'latitude': 35.6595, 'longitude': 139.7004, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': "World's busiest pedestrian crossing.", 'duration_override': 30, 'must_see': True},
        {'id': 'tokyo_skytree', 'name': 'Tokyo Skytree', 'category': 'monument', 'latitude': 35.7101, 'longitude': 139.8107, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Tallest tower in Japan.', 'duration_override': 120, 'must_see': True},
        {'id': 'tokyo_imperialpalace', 'name': 'Imperial Palace', 'category': 'palace', 'latitude': 35.6852, 'longitude': 139.7528, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': "Home of Japan's Emperor.", 'duration_override': 120, 'must_see': True},
    ],
    'barcelona': [
        {'id': 'bcn_sagrada', 'name': 'Sagrada Familia', 'category': 'church', 'latitude': 41.4036, 'longitude': 2.1744, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': "Gaudi's unfinished masterpiece.", 'duration_override': 120, 'must_see': True},
        {'id': 'bcn_parkguell', 'name': 'Park Güell', 'category': 'park', 'latitude': 41.4145, 'longitude': 2.1527, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Whimsical Gaudi park with mosaics.', 'duration_override': 120, 'must_see': True},
        {'id': 'bcn_casabatllo', 'name': 'Casa Batlló', 'category': 'museum', 'latitude': 41.3916, 'longitude': 2.1650, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': "Gaudi's fantastical house.", 'duration_override': 90, 'must_see': True},
        {'id': 'bcn_ramblas', 'name': 'La Rambla', 'category': 'attraction', 'latitude': 41.3809, 'longitude': 2.1735, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Famous pedestrian boulevard.', 'duration_override': 60, 'must_see': True},
        {'id': 'bcn_gothic', 'name': 'Gothic Quarter', 'category': 'attraction', 'latitude': 41.3833, 'longitude': 2.1777, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Medieval labyrinth streets.', 'duration_override': 120, 'must_see': True},
    ],
    'london': [
        {'id': 'london_bigben', 'name': 'Big Ben & Parliament', 'category': 'monument', 'latitude': 51.5007, 'longitude': -0.1246, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Iconic clock tower.', 'duration_override': 60, 'must_see': True},
        {'id': 'london_tower', 'name': 'Tower of London', 'category': 'monument', 'latitude': 51.5081, 'longitude': -0.0759, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Historic castle with Crown Jewels.', 'duration_override': 180, 'must_see': True},
        {'id': 'london_buckingham', 'name': 'Buckingham Palace', 'category': 'palace', 'latitude': 51.5014, 'longitude': -0.1419, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Royal residence.', 'duration_override': 90, 'must_see': True},
        {'id': 'london_british', 'name': 'British Museum', 'category': 'museum', 'latitude': 51.5194, 'longitude': -0.1270, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'World-class museum. Free!', 'duration_override': 240, 'must_see': True},
        {'id': 'london_eye', 'name': 'London Eye', 'category': 'attraction', 'latitude': 51.5033, 'longitude': -0.1196, 'confidence': 1.0, 'address': '', 'is_famous': True, 'description': 'Giant observation wheel.', 'duration_override': 60, 'must_see': True},
    ],
}


def load_landmarks_from_file_legacy(city: str) -> list:
    '''Load landmarks from curated JSON file.'''
    possible_paths = [
        Path(f'../data/landmarks/{city.lower()}_landmarks.json'),
        Path(f'data/landmarks/{city.lower()}_landmarks.json'),
        Path(f'/content/data/landmarks/{city.lower()}_landmarks.json'),
    ]

    for path in possible_paths:
        if path.exists():
            print(f"   üìÅ Loading from: {path}")
            with open(path, 'r') as f:
                raw_landmarks = json.load(f)

            landmarks = []
            for lm in raw_landmarks:
                landmarks.append({
                    'id': lm.get('id', f'file_{len(landmarks)}'),
                    'name': lm['name'],
                    'category': lm.get('category', 'attraction'),
                    'latitude': lm['latitude'],
                    'longitude': lm['longitude'],
                    'confidence': 1.0,
                    'address': lm.get('address', ''),
                    'is_famous': True,
                    'description': lm.get('description', f"Famous landmark in {city}"),
                    'duration_override': lm.get('duration_minutes', 90),
                    'must_see': lm.get('must_see', True),
                    'family_only': lm.get('family_only', False),
                })
            return landmarks
    return []


def fetch_landmarks_from_wikidata_legacy(city_name: str, country: str, limit: int = 20) -> list:
    '''Fetch famous landmarks from Wikidata SPARQL.'''
    query = f'''
    SELECT DISTINCT ?place ?placeLabel ?placeDescription (SAMPLE(?coord) AS ?coordinate)
    WHERE {{
      VALUES ?type {{ wd:Q570116 wd:Q33506 wd:Q16970 wd:Q839954 wd:Q4989906 wd:Q811979 }}
      ?place wdt:P31 ?type.
      ?place wdt:P131* ?city.
      ?city rdfs:label "{city_name}"@en.
      ?place wdt:P625 ?coord.
      ?article schema:about ?place.
      ?article schema:isPartOf <https://en.wikipedia.org/>.
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    GROUP BY ?place ?placeLabel ?placeDescription
    LIMIT {limit}
    '''

    try:
        response = requests.get(
            "https://query.wikidata.org/sparql",
            params={'query': query, 'format': 'json'},
            headers={'User-Agent': 'TravelItineraryBot/1.0'},
            timeout=30
        )
        data = response.json()

        landmarks = []
        for item in data.get('results', {}).get('bindings', []):
            name = item.get('placeLabel', {}).get('value', '')
            coord = item.get('coordinate', {}).get('value', '')
            desc = item.get('placeDescription', {}).get('value', '')

            if name.startswith('Q') and name[1:].isdigit():
                continue

            if coord and name:
                try:
                    coord = coord.replace('Point(', '').replace(')', '')
                    lon, lat = map(float, coord.split())
                    landmarks.append({
                        'id': f'wikidata_{len(landmarks)}',
                        'name': name,
                        'category': 'attraction',
                        'latitude': lat,
                        'longitude': lon,
                        'confidence': 1.0,
                        'address': '',
                        'is_famous': True,
                        'description': desc[:200] if desc else f"Famous landmark in {city_name}",
                        'duration_override': 90,
                        'must_see': True
                    })
                except ValueError:
                    # Malformed coordinate string; skip this landmark
                    continue
        return landmarks
    except Exception as e:
        print(f"⚠️ Wikidata fetch failed: {e}")
        return []


def get_famous_landmarks_legacy(city: str, country: str, method: str = 'auto') -> list:
    '''
    LEGACY: Get famous landmarks using cascading fallback.
    Priority: File → Hardcoded → Wikidata
    '''
    city_lower = city.lower()

    # 1. Try file first
    if method in ['auto', 'file']:
        landmarks = load_landmarks_from_file_legacy(city)
        if landmarks:
            print(f"✅ Loaded {len(landmarks)} landmarks from file")
            return landmarks

    # 2. Try hardcoded
    if method == 'auto' and city_lower in HARDCODED_LANDMARKS:
        landmarks = HARDCODED_LANDMARKS[city_lower]
        print(f"✅ Using {len(landmarks)} hardcoded landmarks for {city}")
        return landmarks

    # 3. Try Wikidata
    if method in ['auto', 'wikidata']:
        print(f"🔄 Fetching from Wikidata...")
        landmarks = fetch_landmarks_from_wikidata_legacy(city, country)
        if landmarks and len(landmarks) >= 5:
            print(f"✅ Fetched {len(landmarks)} landmarks from Wikidata")
            return landmarks

    # 4. Final fallback
    if city_lower in HARDCODED_LANDMARKS:
        return HARDCODED_LANDMARKS[city_lower]

    print(f"⚠️ No landmarks found for {city}")
    return []


def merge_landmarks_with_overture_legacy(pois_df, famous_landmarks: list, city_config: dict):
    '''
    LEGACY: Merge hardcoded/file landmarks with Overture data.
    '''
    if pois_df is None or len(pois_df) == 0:
        return pd.DataFrame(famous_landmarks)

    existing_names = set(pois_df['name'].str.lower())
    new_landmarks = [lm for lm in famous_landmarks if lm['name'].lower() not in existing_names]

    if new_landmarks:
        landmarks_df = pd.DataFrame(new_landmarks)
        pois_df = pd.concat([pois_df, landmarks_df], ignore_index=True)
        print(f"   ⭐ Added {len(new_landmarks)} new landmarks")

    return pois_df


# ============================================================
# TO USE LEGACY APPROACH, UNCOMMENT AND RUN:
# ============================================================

# print(f"\\nüèõÔ∏è LEGACY: FETCHING FAMOUS LANDMARKS FOR {city_config['name'].upper()}")
# famous_landmarks = get_famous_landmarks_legacy(
#     city=city_config['name'],
#     country=city_config['country'],
#     method='auto'
# )
#
# print(f"\\n‚≠ê {len(famous_landmarks)} FAMOUS LANDMARKS:")
# for lm in famous_landmarks:
#     print(f"   ‚≠ê {lm['name']} ({lm['category']})")
#
# # Merge with Overture data
# pois_df = merge_landmarks_with_overture_legacy(pois_df, famous_landmarks, city_config)
# print(f"\\n‚úÖ Total POIs after merge: {len(pois_df)}")

"""

print("üóÑÔ∏è LEGACY CODE CELL - Uncomment to use old approach")
print("   Old: File ‚Üí Wikidata ‚Üí Hardcoded ‚Üí Merge")
print("   New: Overture ‚Üí Mark Famous (simpler!)")

üóÑÔ∏è LEGACY CODE CELL - Uncomment to use old approach
   Old: File ‚Üí Wikidata ‚Üí Hardcoded ‚Üí Merge
   New: Overture ‚Üí Mark Famous (simpler!)
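
The legacy merge deduplicates on exact lowercase names, so `Musée d'Orsay` and `Musee d'Orsay` would count as two different places. Below is a sketch of an accent-insensitive comparison key using only the standard library. `name_key` is a hypothetical helper, and ligatures such as `œ` are not decomposed by this approach.

```python
import unicodedata


def name_key(name: str) -> str:
    """Return a comparison key: accents stripped, case-folded, whitespace collapsed."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks (the accent part of decomposed characters)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


# Names already present in the POI table (stand-in values)
existing = {name_key(n) for n in ["Musee d'Orsay", "Park Guell"]}

candidates = ["Musée d'Orsay", "Park Güell", "Casa Batlló"]
new_names = [n for n in candidates if name_key(n) not in existing]
print(new_names)  # ['Casa Batlló']
```

The same key could replace `.str.lower()` when building `existing_names`, at the cost of treating names that differ only by accents as the same place.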


In [31]:
# ===========================
# 🏛️ FETCH LANDMARKS + MARK FAMOUS ONES
# ===========================

# Famous landmarks that tourists MUST see (used to mark Overture POIs)
FAMOUS_LANDMARKS_DB = {
    'rome': {
        'colosseum': {'duration': 120, 'description': 'Iconic ancient Roman amphitheater, symbol of Rome.'},
        'colosseo': {'duration': 120, 'description': 'Iconic ancient Roman amphitheater, symbol of Rome.'},
        'vatican museum': {'duration': 240, 'description': "World's greatest art collection including Sistine Chapel."},
        'musei vaticani': {'duration': 240, 'description': "World's greatest art collection including Sistine Chapel."},
        'st peter': {'duration': 120, 'description': 'Largest church in the world, masterpiece of Renaissance.'},
        'san pietro': {'duration': 120, 'description': 'Largest church in the world, masterpiece of Renaissance.'},
        'sistine': {'duration': 60, 'description': "Michelangelo's ceiling masterpiece."},
        'sistina': {'duration': 60, 'description': "Michelangelo's ceiling masterpiece."},
        'pantheon': {'duration': 60, 'description': 'Best-preserved ancient Roman building with iconic dome.'},
        'trevi': {'duration': 30, 'description': 'Baroque masterpiece, throw a coin to return to Rome.'},
        'roman forum': {'duration': 120, 'description': "Ancient ruins of Rome's political and commercial center."},
        'foro romano': {'duration': 120, 'description': "Ancient ruins of Rome's political and commercial center."},
        'spanish steps': {'duration': 45, 'description': 'Iconic 135-step stairway, perfect for people watching.'},
        'piazza di spagna': {'duration': 45, 'description': 'Iconic 135-step stairway, perfect for people watching.'},
        'piazza navona': {'duration': 45, 'description': "Baroque square with Bernini's Fountain of Four Rivers."},
        'navona': {'duration': 45, 'description': "Baroque square with Bernini's Fountain of Four Rivers."},
        'castel sant\'angelo': {'duration': 90, 'description': 'Former papal fortress with panoramic views of Rome.'},
        'borghese gallery': {'duration': 120, 'description': 'Bernini sculptures and Caravaggio paintings.'},
        'galleria borghese': {'duration': 120, 'description': 'Bernini sculptures and Caravaggio paintings.'},
        'palatine': {'duration': 90, 'description': 'Legendary birthplace of Rome with imperial palace ruins.'},
        'palatino': {'duration': 90, 'description': 'Legendary birthplace of Rome with imperial palace ruins.'},
        'villa borghese': {'duration': 90, 'description': "Rome's Central Park - beautiful gardens and lake."},
        'campo de\' fiori': {'duration': 60, 'description': 'Lively market square, great for food and atmosphere.'},
        'trastevere': {'duration': 120, 'description': 'Charming cobblestone streets, best for evening walks.'},
    },
    'paris': {
        'eiffel': {'duration': 120, 'description': 'Iconic iron lattice tower, symbol of Paris.'},
        'tour eiffel': {'duration': 120, 'description': 'Iconic iron lattice tower, symbol of Paris.'},
        'louvre': {'duration': 240, 'description': "World's largest art museum. Home to Mona Lisa."},
        'notre-dame': {'duration': 60, 'description': 'Medieval Gothic cathedral, masterpiece of French architecture.'},
        'notre dame': {'duration': 60, 'description': 'Medieval Gothic cathedral, masterpiece of French architecture.'},
        'arc de triomphe': {'duration': 60, 'description': 'Iconic triumphal arch honoring those who fought for France.'},
        'sacre-coeur': {'duration': 90, 'description': 'White-domed basilica with panoramic views of Paris.'},
        'sacré-cœur': {'duration': 90, 'description': 'White-domed basilica with panoramic views of Paris.'},
        'orsay': {'duration': 180, 'description': 'Impressionist masterpieces in former railway station.'},
        'versailles': {'duration': 300, 'description': 'Opulent royal palace with stunning gardens.'},
        'montmartre': {'duration': 120, 'description': 'Artistic hilltop neighborhood with stunning views.'},
        'luxembourg': {'duration': 90, 'description': 'Beautiful formal gardens in the heart of Paris.'},
        'champs': {'duration': 90, 'description': 'Famous avenue from Arc de Triomphe to Place de la Concorde.'},
    },
    'tokyo': {
        'sensoji': {'duration': 90, 'description': "Tokyo's oldest and most famous Buddhist temple."},
        'senso-ji': {'duration': 90, 'description': "Tokyo's oldest and most famous Buddhist temple."},
        'asakusa': {'duration': 90, 'description': 'Historic district with temples and traditional shops.'},
        'skytree': {'duration': 120, 'description': 'Tallest tower in Japan with observation decks.'},
        'meiji': {'duration': 90, 'description': 'Serene Shinto shrine in forested park.'},
        'shibuya': {'duration': 30, 'description': "World's busiest pedestrian crossing - organized chaos."},
        'tokyo tower': {'duration': 90, 'description': 'Iconic red-and-white tower inspired by Eiffel Tower.'},
        'imperial palace': {'duration': 120, 'description': "Home of Japan's Emperor, beautiful gardens."},
        'tsukiji': {'duration': 90, 'description': 'Fresh sushi and street food paradise.'},
        'harajuku': {'duration': 90, 'description': 'Youth fashion and kawaii culture central.'},
        'shinjuku': {'duration': 120, 'description': 'Neon-lit entertainment district.'},
        'akihabara': {'duration': 120, 'description': 'Anime, manga, and electronics paradise.'},
    },
    'barcelona': {
        'sagrada': {'duration': 120, 'description': "Gaudi's unfinished masterpiece, iconic Barcelona landmark."},
        'guell': {'duration': 120, 'description': 'Whimsical Gaudi park with colorful mosaics and city views.'},
        'güell': {'duration': 120, 'description': 'Whimsical Gaudi park with colorful mosaics and city views.'},
        'batllo': {'duration': 90, 'description': "Gaudi's fantastical house with dragon-scale roof."},
        'batlló': {'duration': 90, 'description': "Gaudi's fantastical house with dragon-scale roof."},
        'rambla': {'duration': 60, 'description': 'Famous pedestrian boulevard from Plaça Catalunya to sea.'},
        'gothic quarter': {'duration': 120, 'description': 'Medieval labyrinth of narrow streets and hidden plazas.'},
        'barri gotic': {'duration': 120, 'description': 'Medieval labyrinth of narrow streets and hidden plazas.'},
        'mila': {'duration': 90, 'description': "Gaudi's wave-like apartment building with rooftop warriors."},
        'pedrera': {'duration': 90, 'description': "Gaudi's wave-like apartment building with rooftop warriors."},
        'barceloneta': {'duration': 120, 'description': "City's most popular beach with seafood restaurants."},
        'boqueria': {'duration': 60, 'description': 'Vibrant food market, feast for all senses.'},
        'montjuic': {'duration': 90, 'description': 'Hilltop fortress with panoramic harbor views.'},
        'camp nou': {'duration': 120, 'description': "FC Barcelona's legendary home stadium."},
    },
    'london': {
        'big ben': {'duration': 60, 'description': 'Iconic clock tower and Houses of Parliament.'},
        'elizabeth tower': {'duration': 60, 'description': 'Iconic clock tower and Houses of Parliament.'},
        'tower of london': {'duration': 180, 'description': 'Historic castle with Crown Jewels and Beefeaters.'},
        'buckingham': {'duration': 90, 'description': 'Royal residence, famous for Changing of the Guard.'},
        'westminster abbey': {'duration': 120, 'description': 'Gothic abbey for royal coronations and weddings.'},
        'london eye': {'duration': 60, 'description': 'Giant observation wheel with panoramic city views.'},
        'tower bridge': {'duration': 60, 'description': 'Iconic Victorian bridge with glass walkway.'},
        'british museum': {'duration': 240, 'description': 'World-class museum with Rosetta Stone and mummies. Free!'},
        'st paul': {'duration': 120, 'description': "Wren's masterpiece with whispering gallery dome."},
        'trafalgar': {'duration': 45, 'description': "London's famous square with Nelson's Column."},
        'hyde park': {'duration': 90, 'description': "Royal park with Serpentine lake and Speaker's Corner."},
    },
}


def mark_famous_landmarks(pois_df, city: str) -> pd.DataFrame:
    """
    Mark POIs that are famous landmarks.
    Adds: is_famous, must_see, duration_override, description
    """
    city_lower = city.lower()
    famous_db = FAMOUS_LANDMARKS_DB.get(city_lower, {})

    if pois_df is None or len(pois_df) == 0:
        return pois_df

    # Add new columns
    pois_df = pois_df.copy()
    pois_df['is_famous'] = False
    pois_df['must_see'] = False
    pois_df['duration_override'] = None
    pois_df['description'] = ''

    famous_count = 0
    famous_names = []

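    for idx, row in pois_df.iterrows():
        # Skip rows whose name is missing (NaN/None); .lower() would raise on them
        if not isinstance(row['name'], str):
            continue
        name_lower = row['name'].lower()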

        # Check against famous landmarks database
        for keyword, info in famous_db.items():
            if keyword in name_lower:
                pois_df.at[idx, 'is_famous'] = True
                pois_df.at[idx, 'must_see'] = True
                pois_df.at[idx, 'duration_override'] = info['duration']
                pois_df.at[idx, 'description'] = info['description']
                famous_count += 1
                famous_names.append(row['name'])
                break  # Only match once per POI

    print(f"\n🏛️ FAMOUS LANDMARKS IDENTIFIED IN OVERTURE DATA")
    print(f"{'='*60}")
    print(f"✅ Marked {famous_count} POIs as famous landmarks:")
    for name in famous_names[:15]:
        print(f"   ⭐ {name}")
    if len(famous_names) > 15:
        print(f"   ... and {len(famous_names) - 15} more")

    return pois_df


# Mark famous landmarks in Overture data
pois_df = mark_famous_landmarks(pois_df, city_config['name'])

# Show stats (is_famous is absent if pois_df was empty, so check for the column)
if pois_df is not None and 'is_famous' in pois_df.columns:
    famous_df = pois_df[pois_df['is_famous']]
    regular_df = pois_df[~pois_df['is_famous']]

    print(f"\n📊 DATA SUMMARY:")
    print(f"   Total POIs from Overture: {len(pois_df)}")
    print(f"   ⭐ Famous landmarks: {len(famous_df)}")
    print(f"   📍 Regular attractions: {len(regular_df)}")


🏛️ FAMOUS LANDMARKS IDENTIFIED IN OVERTURE DATA
✅ Marked 66 POIs as famous landmarks:
   ⭐ Musée d'Orsay
   ⭐ Chapelle Notre Dame du Lys
   ⭐ Église Notre-Dame-des-Pauvres
   ⭐ Cathédrale Notre-Dame de Paris
   ⭐ Montmartre aux artistes
   ⭐ Musée de Montmartre
   ⭐ Galerie Montmartre
   ⭐ Parking Indigo Paris Pierre Charron Champs-Elysees
   ⭐ Église Notre-Dame de Compassion
   ⭐ Église Notre-Dame de l'Assomption
   ⭐ Église Notre-Dame d'Auteuil
   ⭐ Médiathèque Gustave Eiffel
   ⭐ Église Saint-Jean de Montmartre
   ⭐ Parc Gustave Eiffel
   ⭐ Église Notre-Dame de l'Assomption
   ... and 51 more

📊 DATA SUMMARY:
   Total POIs from Overture: 5000
   ⭐ Famous landmarks: 66
   📍 Regular attractions: 4934
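
The sample output above shows that plain substring matching flags POIs that are not the landmark itself (a parking garage, a media library). One partial mitigation is to require keywords to match as whole words. The sketch below is an illustration, not part of the notebook's pipeline; `build_keyword_pattern` and `match_keyword` are hypothetical helper names. Note that it only removes partial-word hits such as `mila` inside "Milano"; a hit like the Champs-Elysees parking garage contains the keyword as a whole word and would still need a category filter.

```python
import re

def build_keyword_pattern(keywords):
    """Compile one regex that matches any keyword as a whole word or phrase."""
    # Longest keywords first so multi-word phrases win over their substrings.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # The lookarounds require a non-word character (or string edge) on both sides.
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)

def match_keyword(name, pattern):
    """Return the matched keyword in lower case, or None if nothing matches."""
    if not isinstance(name, str):
        return None
    m = pattern.search(name)
    return m.group(0).lower() if m else None

# Hypothetical usage with a few of the Barcelona keywords
pattern = build_keyword_pattern(["mila", "pedrera", "rambla"])
print(match_keyword("Casa Mila", pattern))
print(match_keyword("Milano Pizza", pattern))
```

With this approach, `mark_famous_landmarks` could compile one pattern per city and call `match_keyword` on each name instead of looping over `keyword in name_lower`.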


In [15]:
# ===========================
# 🔍 OVERTURE vs FAMOUS LANDMARKS COMPARISON
# ===========================
# This cell shows what Overture Maps has vs what tourists actually need

# Famous landmarks that EVERY tourist expects, per city
EXPECTED_LANDMARKS = {
    'rome': [
        ('Colosseum', ['colosseum', 'coliseum', 'colosseo']),
        ('Vatican Museums', ['vatican museum', 'musei vaticani']),
        ('St. Peter\'s Basilica', ['st peter', 'san pietro', 'peter\'s basilica']),
        ('Sistine Chapel', ['sistine', 'sistina']),
        ('Pantheon', ['pantheon']),
        ('Trevi Fountain', ['trevi', 'fontana di trevi']),
        ('Roman Forum', ['roman forum', 'foro romano']),
        ('Spanish Steps', ['spanish steps', 'piazza di spagna', 'trinita dei monti']),
        ('Piazza Navona', ['piazza navona', 'navona']),
        ('Castel Sant\'Angelo', ['castel sant\'angelo', 'sant\'angelo']),
        ('Borghese Gallery', ['borghese', 'galleria borghese']),
        ('Palatine Hill', ['palatine', 'palatino']),
        ('Trastevere', ['trastevere']),
        ('Campo de\' Fiori', ['campo de\' fiori', 'campo dei fiori']),
        ('Villa Borghese', ['villa borghese']),
    ],
    'paris': [
        ('Eiffel Tower', ['eiffel', 'tour eiffel']),
        ('Louvre Museum', ['louvre']),
        ('Notre-Dame', ['notre-dame', 'notre dame']),
        ('Arc de Triomphe', ['arc de triomphe', 'triomphe']),
        ('Sacré-Cœur', ['sacre-coeur', 'sacré-cœur', 'sacre coeur']),
        ('Champs-Élysées', ['champs-élysées', 'champs elysees']),
        ('Musée d\'Orsay', ['orsay', 'd\'orsay']),
        ('Palace of Versailles', ['versailles']),
        ('Montmartre', ['montmartre']),
        ('Luxembourg Gardens', ['luxembourg']),
    ],
    'tokyo': [
        ('Senso-ji Temple', ['sensoji', 'senso-ji', 'asakusa']),
        ('Tokyo Skytree', ['skytree']),
        ('Meiji Shrine', ['meiji']),
        ('Shibuya Crossing', ['shibuya']),
        ('Tokyo Tower', ['tokyo tower']),
        ('Imperial Palace', ['imperial palace', 'kokyo']),
        ('Tsukiji Market', ['tsukiji']),
        ('Harajuku', ['harajuku', 'takeshita']),
        ('Shinjuku', ['shinjuku']),
        ('Akihabara', ['akihabara']),
    ],
    'barcelona': [
        ('Sagrada Familia', ['sagrada']),
        ('Park Güell', ['guell', 'güell']),
        ('Casa Batlló', ['batllo', 'batlló']),
        ('La Rambla', ['rambla']),
        ('Gothic Quarter', ['gothic quarter', 'barri gotic']),
        ('Casa Milà', ['mila', 'milà', 'pedrera']),
        ('Barceloneta Beach', ['barceloneta']),
        ('La Boqueria', ['boqueria']),
        ('Montjuïc', ['montjuic', 'montjuïc']),
        ('Camp Nou', ['camp nou']),
    ],
    'london': [
        ('Big Ben', ['big ben', 'elizabeth tower']),
        ('Tower of London', ['tower of london']),
        ('Buckingham Palace', ['buckingham']),
        ('Westminster Abbey', ['westminster abbey']),
        ('London Eye', ['london eye']),
        ('Tower Bridge', ['tower bridge']),
        ('British Museum', ['british museum']),
        ('St Paul\'s Cathedral', ['st paul', 'saint paul']),
        ('Trafalgar Square', ['trafalgar']),
        ('Hyde Park', ['hyde park']),
    ],
}

def check_overture_coverage(pois_df, city: str):
    """Check how many famous landmarks Overture Maps has."""

    city_lower = city.lower()
    if city_lower not in EXPECTED_LANDMARKS:
        print(f"⚠️ No expected landmarks defined for {city}")
        return

    expected = EXPECTED_LANDMARKS[city_lower]

    print(f"\n{'='*70}")
    print(f"🔍 OVERTURE MAPS COVERAGE CHECK: {city.upper()}")
    print(f"{'='*70}")
    print(f"Checking if Overture has the landmarks tourists actually want...\n")

    found = []
    missing = []

    for landmark_name, search_terms in expected:
        # Search for any of the terms in POI names
        match_found = False
        matched_pois = []

        if pois_df is not None and len(pois_df) > 0:
            for term in search_terms:
                # regex=False: search terms are literal text, not regular expressions
                matches = pois_df[pois_df['name'].str.lower().str.contains(term, na=False, regex=False)]
                if len(matches) > 0:
                    match_found = True
                    matched_pois.extend(matches.to_dict('records'))

        if match_found:
            found.append((landmark_name, matched_pois[:2]))  # Keep max 2 matches
        else:
            missing.append(landmark_name)

    # Display results
    print(f"✅ FOUND IN OVERTURE ({len(found)}/{len(expected)}):")
    print(f"{'─'*50}")
    for landmark, matches in found:
        print(f"   ✅ {landmark}")
        for m in matches[:1]:  # Show first match
            print(f"      └─ \"{m['name']}\" ({m['category']})")

    print(f"\n❌ MISSING FROM OVERTURE ({len(missing)}/{len(expected)}):")
    print(f"{'─'*50}")
    for landmark in missing:
        print(f"   ❌ {landmark}")

    # Coverage stats
    coverage = len(found) / len(expected) * 100
    print(f"\n{'='*70}")
    print(f"📊 COVERAGE SUMMARY")
    print(f"{'='*70}")
    print(f"   Overture Coverage: {coverage:.0f}% ({len(found)}/{len(expected)} landmarks)")

    if coverage < 50:
        print(f"\n   ⚠️  POOR COVERAGE - Hardcoded landmarks are ESSENTIAL!")
        print(f"   Without them, tourists would miss: {', '.join(missing[:5])}...")
    elif coverage < 80:
        print(f"\n   ⚡ MODERATE COVERAGE - Hardcoded landmarks fill the gaps")
    else:
        print(f"\n   🎉 GOOD COVERAGE - Overture has most landmarks!")

    print(f"\n💡 This is why we use: File → Hardcoded → Overture (as fallback)")

    return {'found': found, 'missing': missing, 'coverage': coverage}


# Run the check
coverage_result = check_overture_coverage(pois_df, city_config['name'])


🔍 OVERTURE MAPS COVERAGE CHECK: PARIS
Checking if Overture has the landmarks tourists actually want...

✅ FOUND IN OVERTURE (5/10):
──────────────────────────────────────────────────
   ✅ Eiffel Tower
      └─ "Médiathèque Gustave Eiffel" (library)
   ✅ Notre-Dame
      └─ "Église Notre-Dame d'Auteuil" (catholic_church)
   ✅ Arc de Triomphe
      └─ "Arc de Triomphe du Carrousel" (landmark_and_historical_building)
   ✅ Musée d'Orsay
      └─ "Musée d'Orsay" (art_museum)
   ✅ Montmartre
      └─ "Église Saint-Jean de Montmartre" (catholic_church)

❌ MISSING FROM OVERTURE (5/10):
──────────────────────────────────────────────────
   ❌ Louvre Museum
   ❌ Sacré-Cœur
   ❌ Champs-Élysées
   ❌ Palace of Versailles
   ❌ Luxembourg Gardens
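
The check above reports Champs-Élysées as missing even though the earlier famous-landmark output includes "Parking Indigo Paris Pierre Charron Champs-Elysees". The search terms `'champs-élysées'` and `'champs elysees'` differ from that name only in accents and the hyphen, so a literal substring test finds nothing. One way to make such comparisons more forgiving is to normalize both the POI name and the search term before matching. The sketch below is an assumption about how that could look; `normalize_name` is a hypothetical helper, not a function from the notebook.

```python
import unicodedata

def normalize_name(text):
    """Lower-case, strip accents, and treat hyphens and apostrophes as spaces."""
    if not isinstance(text, str):
        return ""
    # The oe ligature has no Unicode decomposition, so expand it by hand.
    text = text.replace("œ", "oe").replace("Œ", "OE")
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for ch in "-'’":
        stripped = stripped.replace(ch, " ")
    # Collapse repeated whitespace and lower-case the result.
    return " ".join(stripped.lower().split())

print(normalize_name("Champs-Élysées"))
print(normalize_name("Sacré-Cœur"))
```

Applying the same function to every search term and every POI name means one spelling per landmark is enough, at the cost of slightly looser matching.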



---
## 🔍 Step 2b: Fetch POIs from Overture Maps (BigQuery)

Now we fetch additional POIs from Overture Maps. These will be merged with the famous landmarks.

In [16]:
# Visualize POIs on map
city = CITY_DATABASE[SELECTED_CITY]

fig = px.scatter_mapbox(
    pois_df,
    lat='latitude',
    lon='longitude',
    color='category',
    hover_name='name',
    hover_data=['category', 'address'],
    title=f"<b>🗺️ {city['name']} POIs</b><br><sup>Colored by category</sup>",
    zoom=12,
    height=600,
    color_discrete_sequence=px.colors.qualitative.Set2
)

fig.update_layout(
    mapbox_style='carto-positron',
    margin={'r':0,'t':80,'l':0,'b':0}
)

fig.show()

# Category distribution
print("\n📊 Category Distribution:")
print(pois_df['category'].value_counts())


📊 Category Distribution:
category
supermarket                         522
art_gallery                         134
theatre                              42
park                                 38
parking                              37
catholic_church                      36
museum                               22
public_plaza                         20
library                              19
art_museum                           16
bridge                               14
landmark_and_historical_building     13
home_and_garden                      10
marketing_agency                      8
architectural_designer                7
stadium_arena                         7
church_cathedral                      7
monument                              5
farmers_market                        4
modern_art_museum                     3
medical_research_and_development      3
science_museum                        3
synagogue                             3
internet_marketing_service            2
co

---
## 🎯 Step 3: Generate Persona Scores

In [17]:
# ===========================
# PERSONA SCORING ENGINE
# ===========================

# Category to Persona mapping (expanded for Overture Maps categories)
CATEGORY_PERSONA_MAP = {
    # ATTRACTIONS
    'museum': {
        'main_category': 'attraction',
        'subcategory': 'museum',
        'duration': 120,
        'cost_level': 3,
        'scores': {
            'family': 0.70, 'kids': 0.50, 'couple': 0.85, 'honeymoon': 0.80,
            'solo': 0.95, 'friends': 0.80, 'seniors': 0.90, 'business': 0.65,
            'cultural': 0.98, 'foodie': 0.20, 'romantic': 0.75, 'adventure': 0.30,
            'relaxation': 0.50, 'nightlife': 0.10, 'shopping': 0.30, 'photography': 0.85,
            'nature': 0.10, 'wellness': 0.30
        },
        'attributes': {'is_indoor': True, 'is_outdoor': False, 'is_must_see': True}
    },
    'church': {
        'main_category': 'attraction',
        'subcategory': 'church',
        'duration': 45,
        'cost_level': 1,
        'scores': {
            'family': 0.75, 'kids': 0.50, 'couple': 0.85, 'honeymoon': 0.80,
            'solo': 0.85, 'friends': 0.70, 'seniors': 0.90, 'business': 0.50,
            'cultural': 0.95, 'foodie': 0.15, 'romantic': 0.80, 'adventure': 0.30,
            'relaxation': 0.60, 'nightlife': 0.05, 'shopping': 0.10, 'photography': 0.90,
            'nature': 0.20, 'wellness': 0.50
        },
        'attributes': {'is_indoor': True, 'is_outdoor': False, 'is_must_see': True}
    },
    'monument': {
        'main_category': 'attraction',
        'subcategory': 'monument',
        'duration': 45,
        'cost_level': 2,
        'scores': {
            'family': 0.80, 'kids': 0.65, 'couple': 0.90, 'honeymoon': 0.88,
            'solo': 0.85, 'friends': 0.85, 'seniors': 0.80, 'business': 0.60,
            'cultural': 0.90, 'foodie': 0.15, 'romantic': 0.85, 'adventure': 0.40,
            'relaxation': 0.40, 'nightlife': 0.30, 'shopping': 0.20, 'photography': 0.95,
            'nature': 0.30, 'wellness': 0.30
        },
        'attributes': {'is_indoor': False, 'is_outdoor': True, 'is_must_see': True}
    },
    'attraction': {
        'main_category': 'attraction',
        'subcategory': 'landmark',
        'duration': 90,
        'cost_level': 3,
        'scores': {
            'family': 0.85, 'kids': 0.75, 'couple': 0.85, 'honeymoon': 0.80,
            'solo': 0.80, 'friends': 0.85, 'seniors': 0.75, 'business': 0.60,
            'cultural': 0.80, 'foodie': 0.30, 'romantic': 0.75, 'adventure': 0.60,
            'relaxation': 0.50, 'nightlife': 0.30, 'shopping': 0.30, 'photography': 0.90,
            'nature': 0.40, 'wellness': 0.30
        },
        'attributes': {'is_indoor': False, 'is_outdoor': True, 'is_must_see': True}
    },
    'park': {
        'main_category': 'attraction',
        'subcategory': 'park',
        'duration': 90,
        'cost_level': 1,
        'scores': {
            'family': 0.95, 'kids': 0.95, 'couple': 0.85, 'honeymoon': 0.80,
            'solo': 0.80, 'friends': 0.80, 'seniors': 0.85, 'business': 0.40,
            'cultural': 0.50, 'foodie': 0.30, 'romantic': 0.85, 'adventure': 0.50,
            'relaxation': 0.95, 'nightlife': 0.10, 'shopping': 0.10, 'photography': 0.85,
            'nature': 0.95, 'wellness': 0.80
        },
        'attributes': {'is_indoor': False, 'is_outdoor': True, 'is_must_see': False}
    },
    # RESTAURANTS & FOOD
    'restaurant': {
        'main_category': 'restaurant',
        'subcategory': 'restaurant',
        'duration': 75,
        'cost_level': 3,
        'scores': {
            'family': 0.75, 'kids': 0.60, 'couple': 0.90, 'honeymoon': 0.88,
            'solo': 0.70, 'friends': 0.90, 'seniors': 0.80, 'business': 0.85,
            'cultural': 0.60, 'foodie': 0.95, 'romantic': 0.85, 'adventure': 0.40,
            'relaxation': 0.70, 'nightlife': 0.50, 'shopping': 0.10, 'photography': 0.60,
            'nature': 0.10, 'wellness': 0.40
        },
        'attributes': {'is_indoor': True, 'is_outdoor': False, 'is_must_see': False}
    },
    'cafe': {
        'main_category': 'restaurant',
        'subcategory': 'cafe',
        'duration': 45,
        'cost_level': 2,
        'scores': {
            'family': 0.65, 'kids': 0.60, 'couple': 0.85, 'honeymoon': 0.80,
            'solo': 0.95, 'friends': 0.80, 'seniors': 0.80, 'business': 0.70,
            'cultural': 0.60, 'foodie': 0.80, 'romantic': 0.75, 'adventure': 0.30,
            'relaxation': 0.90, 'nightlife': 0.20, 'shopping': 0.30, 'photography': 0.70,
            'nature': 0.20, 'wellness': 0.60
        },
        'attributes': {'is_indoor': True, 'is_outdoor': True, 'is_must_see': False}
    },
    'bar': {
        'main_category': 'restaurant',
        'subcategory': 'bar',
        'duration': 90,
        'cost_level': 3,
        'scores': {
            'family': 0.20, 'kids': 0.05, 'couple': 0.80, 'honeymoon': 0.70,
            'solo': 0.75, 'friends': 0.95, 'seniors': 0.40, 'business': 0.60,
            'cultural': 0.40, 'foodie': 0.60, 'romantic': 0.70, 'adventure': 0.50,
            'relaxation': 0.60, 'nightlife': 0.98, 'shopping': 0.10, 'photography': 0.50,
            'nature': 0.10, 'wellness': 0.20
        },
        'attributes': {'is_indoor': True, 'is_outdoor': False, 'is_must_see': False}
    },
    'shop': {
        'main_category': 'shopping',
        'subcategory': 'shop',
        'duration': 60,
        'cost_level': 2,
        'scores': {
            'family': 0.60, 'kids': 0.50, 'couple': 0.75, 'honeymoon': 0.65,
            'solo': 0.80, 'friends': 0.85, 'seniors': 0.60, 'business': 0.50,
            'cultural': 0.40, 'foodie': 0.30, 'romantic': 0.50, 'adventure': 0.30,
            'relaxation': 0.50, 'nightlife': 0.20, 'shopping': 0.98, 'photography': 0.50,
            'nature': 0.05, 'wellness': 0.30
        },
        'attributes': {'is_indoor': True, 'is_outdoor': False, 'is_must_see': False}
    },
    'hotel': {
        'main_category': 'hotel',
        'subcategory': 'hotel',
        'duration': 0,
        'cost_level': 3,
        'scores': {
            'family': 0.50, 'kids': 0.50, 'couple': 0.50, 'honeymoon': 0.50,
            'solo': 0.50, 'friends': 0.50, 'seniors': 0.50, 'business': 0.70,
            'cultural': 0.30, 'foodie': 0.30, 'romantic': 0.50, 'adventure': 0.20,
            'relaxation': 0.70, 'nightlife': 0.30, 'shopping': 0.20, 'photography': 0.40,
            'nature': 0.10, 'wellness': 0.60
        },
        'attributes': {'is_indoor': True, 'is_outdoor': False, 'is_must_see': False}
    },
}

# Default for unknown categories
DEFAULT_PERSONA = {
    'main_category': 'other',
    'subcategory': 'general',
    'duration': 45,
    'cost_level': 2,
    'scores': {k: 0.5 for k in ['family', 'kids', 'couple', 'honeymoon', 'solo', 'friends',
                                'seniors', 'business', 'cultural', 'foodie', 'romantic',
                                'adventure', 'relaxation', 'nightlife', 'shopping',
                                'photography', 'nature', 'wellness']},
    'attributes': {'is_indoor': True, 'is_outdoor': False, 'is_must_see': False}
}

def get_persona_mapping(category: str) -> dict:
    """
    Get persona mapping for an Overture Maps category.
    Enhanced to handle various Overture category formats.
    """
    if not category:
        return DEFAULT_PERSONA

    cat_lower = category.lower()

    # Direct match
    if cat_lower in CATEGORY_PERSONA_MAP:
        return CATEGORY_PERSONA_MAP[cat_lower]

    # ATTRACTIONS - Enhanced matching
    attraction_keywords = [
        'museum', 'gallery', 'art_gallery', 'exhibition',
        'church', 'cathedral', 'basilica', 'chapel', 'temple', 'mosque', 'synagogue',
        'monument', 'memorial', 'statue', 'landmark', 'historic', 'heritage',
        'castle', 'palace', 'tower', 'fort', 'ruin', 'archaeological',
        'park', 'garden', 'botanical', 'zoo', 'aquarium',
        'theater', 'theatre', 'opera', 'concert_hall', 'performing_arts',
        'tourist_attraction', 'point_of_interest', 'viewpoint', 'scenic',
        'bridge', 'square', 'plaza', 'fountain'
    ]

    for keyword in attraction_keywords:
        if keyword in cat_lower:
            # Determine subcategory
            if any(x in cat_lower for x in ['museum', 'gallery', 'exhibition']):
                return CATEGORY_PERSONA_MAP['museum']
            elif any(x in cat_lower for x in ['church', 'cathedral', 'basilica', 'chapel', 'temple', 'mosque']):
                return CATEGORY_PERSONA_MAP['church']
            elif any(x in cat_lower for x in ['park', 'garden', 'botanical', 'zoo']):
                return CATEGORY_PERSONA_MAP['park']
            elif any(x in cat_lower for x in ['monument', 'memorial', 'statue', 'tower', 'castle', 'palace']):
                return CATEGORY_PERSONA_MAP['monument']
            else:
                return CATEGORY_PERSONA_MAP['attraction']

    # RESTAURANTS - Enhanced matching
    restaurant_keywords = [
        'restaurant', 'dining', 'bistro', 'brasserie', 'trattoria', 'ristorante',
        'steakhouse', 'grill', 'pizzeria', 'sushi', 'ramen', 'noodle',
        'french_restaurant', 'italian_restaurant', 'chinese_restaurant', 'indian_restaurant',
        'japanese_restaurant', 'thai_restaurant', 'mexican_restaurant', 'american_restaurant',
        'seafood', 'vegetarian', 'vegan', 'fine_dining', 'fast_food', 'food_court'
    ]

    for keyword in restaurant_keywords:
        if keyword in cat_lower:
            return CATEGORY_PERSONA_MAP['restaurant']

    # CAFES - Enhanced matching
    cafe_keywords = [
        'cafe', 'coffee', 'tea_house', 'bakery', 'patisserie', 'dessert',
        'ice_cream', 'gelato', 'juice', 'smoothie', 'breakfast'
    ]

    for keyword in cafe_keywords:
        if keyword in cat_lower:
            return CATEGORY_PERSONA_MAP['cafe']

    # BARS - Enhanced matching
    bar_keywords = [
        'bar', 'pub', 'tavern', 'wine_bar', 'cocktail', 'lounge',
        'nightclub', 'club', 'brewery', 'beer', 'sports_bar'
    ]

    for keyword in bar_keywords:
        if keyword in cat_lower:
            return CATEGORY_PERSONA_MAP['bar']

    # SHOPPING - Enhanced matching
    shop_keywords = [
        'shop', 'store', 'boutique', 'market', 'mall', 'retail',
        'supermarket', 'grocery', 'fashion', 'clothing', 'jewelry',
        'bookstore', 'souvenir', 'gift_shop', 'department_store'
    ]

    for keyword in shop_keywords:
        if keyword in cat_lower:
            return CATEGORY_PERSONA_MAP['shop']

    # HOTELS - Enhanced matching
    hotel_keywords = ['hotel', 'hostel', 'motel', 'resort', 'inn', 'lodging', 'accommodation', 'b&b']

    for keyword in hotel_keywords:
        if keyword in cat_lower:
            return CATEGORY_PERSONA_MAP['hotel']

    return DEFAULT_PERSONA

print("✅ Persona scoring engine ready (with enhanced Overture Maps category mapping)")

✅ Persona scoring engine ready (with enhanced Overture Maps category mapping)
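
One caveat with the substring tests in `get_persona_mapping`: the keyword `park` is contained in the category `parking` (37 rows in the category distribution above), so parking facilities receive the park persona scores. A possible refinement, sketched here under the assumption that Overture categories are underscore-separated tokens such as `art_gallery`, is to compare keywords against whole tokens. `category_tokens` and `has_keyword` are hypothetical helpers for illustration only.

```python
def category_tokens(category):
    """Split a category such as 'art_gallery' into its lower-case tokens."""
    if not isinstance(category, str):
        return set()
    return set(category.lower().replace("-", "_").split("_"))

def has_keyword(category, keywords):
    """True if any keyword equals one of the category's tokens."""
    return not category_tokens(category).isdisjoint(keywords)

# Hypothetical single-token keyword set for the park persona
PARK_KEYWORDS = {"park", "garden", "botanical", "zoo"}

print(has_keyword("parking", PARK_KEYWORDS))
print(has_keyword("national_park", PARK_KEYWORDS))
```

Multi-token keywords such as `concert_hall` or `tourist_attraction` would not work with this token test as written and would still need an exact or substring comparison.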


In [18]:
# ===========================
# ENRICH POIs WITH SCORES
# ===========================

# Garbage POI names to filter out (not real tourist attractions)
GARBAGE_KEYWORDS = [
    'garage', 'parking', 'parcheggio', 'autorimessa',
    'supermarket', 'supermercato', 'grocery', 'alimentari',
    'pharmacy', 'farmacia', 'hospital', 'ospedale', 'clinic',
    'bank', 'banca', 'atm', 'bancomat',
    'gas station', 'benzina', 'petrol',
    'laundry', 'lavanderia', 'dry clean',
    'dentist', 'doctor', 'medico',
    'hotel', 'hostel', 'b&b', 'airbnb',
    'gym', 'palestra', 'fitness',
    'office', 'ufficio', 'coworking',
    'school', 'scuola', 'university',
    'apartment', 'appartamento', 'residence',
    'welcome to',  # Generic welcome centers
    'services', 'servizi',
]

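def is_garbage_poi(name: str, category: str) -> bool:
    """Check if POI is garbage (not a real tourist attraction).

    Only the name is inspected; `category` is accepted but currently unused.
    """
    # A missing or non-string name cannot be shown to users, so treat it as garbage
    if not isinstance(name, str):
        return True
    name_lower = name.lower()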

    # Check against garbage keywords
    for keyword in GARBAGE_KEYWORDS:
        if keyword in name_lower:
            return True

    # Very short names are suspicious
    if len(name) < 4:
        return True

    # Names that are just numbers or codes
    if name.replace(' ', '').replace('-', '').isdigit():
        return True

    return False


def enrich_poi(row, city_config):
    """Enrich a POI with persona scores and attributes."""
    mapping = get_persona_mapping(row['category'])

    # Check if this is a famous landmark with custom data
    is_famous = row.get('is_famous', False)
    custom_duration = row.get('duration_override')
    custom_description = row.get('description', '')
    is_must_see = row.get('must_see', False)
    is_family_only = row.get('family_only', False)

    # Find nearest neighborhood
    neighborhood = "Unknown"
    min_dist = float('inf')
    for nb in city_config['neighborhoods']:
        dist = ((row['latitude'] - nb['lat'])**2 + (row['longitude'] - nb['lon'])**2)**0.5
        if dist < min_dist:
            min_dist = dist
            neighborhood = nb['name']

    # Use custom duration for famous landmarks, otherwise use mapping default
    # IMPORTANT: Ensure duration is never NaN
    if custom_duration and not pd.isna(custom_duration):
        duration = int(custom_duration)
    else:
        duration = mapping.get('duration', 60)  # Default 60 min

    # Generate description
    if custom_description and isinstance(custom_description, str):
        description = custom_description
    else:
        description = f"{row['name']} - {row['category']} in {neighborhood}"

    # Adjust scores for famous landmarks (they should score higher)
    scores = mapping['scores'].copy()
    if is_famous:
        # Boost all scores for famous landmarks
        for key in scores:
            scores[key] = min(scores[key] * 1.2, 1.0)
        # Photography score high for famous places
        scores['photography'] = max(scores.get('photography', 0.8), 0.95)

    # Family-only attractions (like Disneyland)
    if is_family_only:
        scores['family'] = 0.98
        scores['kids'] = 0.98
        scores['honeymoon'] = 0.3  # Not ideal for honeymoon
        scores['solo'] = 0.4
        scores['seniors'] = 0.5

    return {
        'name': row['name'],
        'description': description,
        'latitude': row['latitude'],
        'longitude': row['longitude'],
        'address': row.get('address', ''),
        'neighborhood': neighborhood,
        'city': city_config['name'],
        'country': city_config['country'],
        'category': mapping['main_category'],
        'subcategory': mapping['subcategory'],
        'original_category': row['category'],
        'typical_duration_minutes': duration,  # Guaranteed to be int, not NaN
        'cost_level': mapping['cost_level'],
        'avg_cost_per_person': mapping['cost_level'] * 12.0,
        'cost_currency': city_config['currency'],
        'source': 'famous_landmark' if is_famous else 'overture_maps',
        'source_id': row.get('id', ''),
        'is_famous': is_famous,
        'persona_scores': {f"score_{k}": v for k, v in scores.items()},
        'attributes': {
            'is_kid_friendly': scores.get('kids', 0.5) > 0.5,
            'is_wheelchair_accessible': True,
            'requires_reservation': mapping['cost_level'] >= 3 or is_famous,
            **mapping['attributes'],
            'is_must_see': is_must_see or mapping['attributes'].get('is_must_see', False),
            'physical_intensity': 2,
            'typical_crowd_level': 4 if is_famous else 3,
            'is_hidden_gem': False,
            'instagram_worthy': scores.get('photography', 0.5) > 0.7 or is_famous,
            'is_family_only': is_family_only,
        }
    }


# Filter out garbage POIs before enriching
print("📊 Filtering POIs...")
original_count = len(pois_df)

# Filter out garbage
valid_pois = []
garbage_count = 0
for _, row in pois_df.iterrows():
    if is_garbage_poi(row['name'], row['category']):
        garbage_count += 1
    else:
        valid_pois.append(row)

print(f"   🗑️ Filtered out {garbage_count} garbage POIs (garages, parking, etc.)")

# Enrich valid POIs only
enriched_pois = [enrich_poi(row, city_config) for row in valid_pois]

# Separate famous landmarks and regular POIs for priority
famous_pois = [p for p in enriched_pois if p.get('is_famous', False)]
regular_pois = [p for p in enriched_pois if not p.get('is_famous', False)]

print(f"✅ Enriched {len(enriched_pois)} POIs with persona scores")
print(f"   ⭐ Famous landmarks: {len(famous_pois)}")
print(f"   📍 Regular POIs: {len(regular_pois)}")

# Show famous landmarks
if famous_pois:
    print(f"\n🌟 Famous Landmarks:")
    for p in famous_pois[:10]:
        print(f"   • {p['name']} ({p['subcategory']}) - {p['typical_duration_minutes']} min")

📊 Filtering POIs...
   🗑️ Filtered out 27 garbage POIs (garages, parking, etc.)
✅ Enriched 973 POIs with persona scores
   ⭐ Famous landmarks: 18
   📍 Regular POIs: 955

🌟 Famous Landmarks:
   • Musée d'Orsay (museum) - 180 min
   • Église Saint-Jean de Montmartre (church) - 120 min
   • Église Notre-Dame d'Auteuil (church) - 60 min
   • Église Notre-Dame de l'Assomption (church) - 60 min
   • Musée de Montmartre (museum) - 120 min
   • Basilique Notre-Dame-du-Perpetuel-Secours (church) - 60 min
   • Médiathèque Gustave Eiffel (general) - 120 min
   • Église Notre-Dame de Compassion (church) - 60 min
   • Église Notre-Dame-des-Pauvres (church) - 60 min
   • Parc Gustave Eiffel (park) - 120 min
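
`enrich_poi` assigns each POI to the nearest neighborhood using straight-line distance in degrees. Away from the equator a degree of longitude is shorter than a degree of latitude (roughly two thirds as long at the latitude of Paris), so the two metrics can disagree. The sketch below shows one such case with made-up coordinates; the neighborhood names and the helper functions are illustrative assumptions, not data from this notebook.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def degree_dist(lat1, lon1, lat2, lon2):
    """Planar distance in raw degrees, the same metric enrich_poi uses."""
    return math.hypot(lat1 - lat2, lon1 - lon2)

def nearest(point, neighborhoods, dist):
    """Name of the neighborhood closest to `point` under the given metric."""
    lat, lon = point
    return min(neighborhoods, key=lambda nb: dist(lat, lon, nb['lat'], nb['lon']))['name']

# Hypothetical neighborhoods: one due west, one due north of the POI
neighborhoods = [
    {'name': 'West', 'lat': 48.860, 'lon': 2.300},
    {'name': 'North', 'lat': 48.905, 'lon': 2.350},
]
poi = (48.860, 2.350)

by_degrees = nearest(poi, neighborhoods, degree_dist)
by_km = nearest(poi, neighborhoods, haversine_km)
print(by_degrees, by_km)
```

Here the degree metric picks `North` (0.045° away) while the great-circle metric picks `West` (about 3.7 km versus about 5.0 km). For neighborhoods only a few kilometers apart the difference rarely matters, but it can flip assignments for POIs that sit between two centers.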


---
## 💾 Step 4: Save POIs to Seed File

In [19]:
# ===========================
# SAVE TO SEED FILE
# ===========================

def save_seed_file(pois: list, city_config: dict, output_dir: Optional[Path] = None) -> Path:
    """Save POIs to a seed JSON file."""
    if output_dir is None:
        output_dir = Path('../data/seed')
    output_dir.mkdir(parents=True, exist_ok=True)

    seed_data = {
        "city": city_config['name'],
        "country": city_config['country'],
        "currency": city_config['currency'],
        "source": "Overture Maps via BigQuery + EDA Processing",
        "total_pois": len(pois),
        "neighborhoods": city_config['neighborhoods'],
        "pois": pois,
        "persona_templates": []  # Can add later
    }

    filename = f"{city_config['name'].lower()}_pois.json"
    output_file = output_dir / filename

    with open(output_file, 'w', encoding='utf-8') as f:
        json.dump(seed_data, f, indent=2, ensure_ascii=False)

    print(f"✅ Saved to {output_file}")
    return output_file

# Save the enriched POIs
saved_file = save_seed_file(enriched_pois, city_config)

print(f"\n📁 Seed file saved!")
print(f"   File: {saved_file}")
print(f"   POIs: {len(enriched_pois)}")
print(f"\n💡 To load into database:")
print(f"   python -m data.scripts.seed_data {city_config['name'].lower()}")

✅ Saved to ../data/seed/paris_pois.json

📁 Seed file saved!
   File: ../data/seed/paris_pois.json
   POIs: 973

💡 To load into database:
   python -m data.scripts.seed_data paris
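
Before loading a seed file into a database, it can help to read it back and confirm that `total_pois` agrees with the length of the `pois` list. The sketch below does this on a throwaway file with a made-up record. `load_seed_file` is a hypothetical helper, and only the `total_pois` and `pois` keys written by `save_seed_file` are assumed.

```python
import json
import tempfile
from pathlib import Path

def load_seed_file(path: Path) -> dict:
    """Read a seed file and check that total_pois matches the POI list."""
    with open(path, encoding='utf-8') as f:
        seed = json.load(f)
    if seed['total_pois'] != len(seed['pois']):
        raise ValueError(
            f"total_pois={seed['total_pois']} but found {len(seed['pois'])} POIs"
        )
    return seed

# Round-trip demo on a temporary file; the record is invented for illustration
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / 'demo_pois.json'
    demo = {'city': 'Demo', 'total_pois': 1, 'pois': [{'name': 'Café Example'}]}
    demo_path.write_text(json.dumps(demo, ensure_ascii=False), encoding='utf-8')
    loaded = load_seed_file(demo_path)

print(loaded['pois'][0]['name'])
```

Reading with `encoding='utf-8'` mirrors the way the file is written, so accented names survive the round trip unchanged.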


---
## 👤 Step 5: Define Trip Parameters

Now let's define your trip!

In [20]:
# ===========================
# TRIP CONFIGURATION
# ===========================

import math

def haversine_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """
    Calculate the distance between two points on Earth using Haversine formula.
    Returns distance in kilometers.
    """
    R = 6371  # Earth's radius in km

    lat1_rad = math.radians(lat1)
    lat2_rad = math.radians(lat2)
    delta_lat = math.radians(lat2 - lat1)
    delta_lon = math.radians(lon2 - lon1)

    a = math.sin(delta_lat/2)**2 + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(delta_lon/2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

    return R * c


def estimate_travel_time(lat1: float, lon1: float, lat2: float, lon2: float) -> dict:
    """
    Estimate travel time between two locations.
    Returns dict with distance, walking_time, transit_time.

    Speeds:
    - Walking: 5 km/h (average tourist pace)
    - Transit/Metro: 25 km/h (including waiting + walking to station)
    """
    distance_km = haversine_distance(lat1, lon1, lat2, lon2)

    # Walking: 5 km/h average
    walk_hours = distance_km / 5.0
    walk_minutes = round(walk_hours * 60)

    # Transit: 25 km/h (accounts for waiting, walking to station)
    # But minimum 10 min for any transit (getting to station, waiting)
    transit_hours = distance_km / 25.0
    transit_minutes = max(10, round(transit_hours * 60))

    # Recommend mode based on distance:
    # walk below 2.5 km (still walkable for tourists), transit otherwise
    if distance_km < 2.5:
        recommended = 'walk'
        recommended_time = walk_minutes
    else:
        recommended = 'transit'
        recommended_time = transit_minutes

    return {
        'distance_km': round(distance_km, 2),
        'walk_minutes': walk_minutes,
        'transit_minutes': transit_minutes,
        'recommended_mode': recommended,
        'recommended_time': recommended_time
    }


@dataclass
class TripConfig:
    """Trip configuration."""
    # Basic
    city: str
    num_days: int
    start_date: date

    # Persona
    group_type: str  # family, couple, solo, friends, honeymoon, seniors, business
    group_size: int = 2
    has_kids: bool = False
    has_seniors: bool = False

    # Preferences
    vibes: List[str] = field(default_factory=lambda: ['cultural'])
    budget_level: int = 3  # 1-5
    pacing: str = 'moderate'  # slow, moderate, fast

    @property
    def end_date(self) -> date:
        return self.start_date + timedelta(days=self.num_days - 1)

    @property
    def pois_per_day(self) -> int:
        # Realistic pacing (considering travel time between locations):
        # - slow: 2-3 attractions (focus on quality, long visits)
        # - moderate: 3-4 attractions (balanced)
        # - fast: 4-5 attractions (max reasonable with travel time)
        return {'slow': 2, 'moderate': 4, 'fast': 5}.get(self.pacing, 3)


# ===========================
# 🔧 CONFIGURE YOUR TRIP
# ===========================

trip = TripConfig(
    city=city_config['name'],
    num_days=4,
    start_date=date(2025, 6, 15),

    # Persona
    group_type='honeymoon',  # Options: family, couple, solo, friends, honeymoon, seniors, business
    group_size=2,
    has_kids=False,
    has_seniors=False,

    # Preferences
    vibes=['romantic', 'foodie', 'cultural'],  # Options: cultural, foodie, romantic, adventure, relaxation, nightlife, shopping, photography, nature, wellness
    budget_level=4,  # 1=budget, 5=luxury
    pacing='slow',  # Options: slow, moderate, fast
)

print(f"\n🎒 TRIP CONFIGURATION")
print(f"{'='*50}")
print(f"📍 Destination: {trip.city}")
print(f"📅 Dates: {trip.start_date} to {trip.end_date} ({trip.num_days} days)")
print(f"👥 Group: {trip.group_type.title()} ({trip.group_size} people)")
print(f"🎯 Vibes: {', '.join(trip.vibes)}")
print(f"💰 Budget: {'💵' * trip.budget_level} (Level {trip.budget_level}/5)")
print(f"⏱️ Pacing: {trip.pacing.title()} (~{trip.pois_per_day} attractions/day)")

# Test travel time function
print(f"\n🚶 Travel Time Calculator Ready!")
print(f"   Example: Eiffel Tower → Louvre")
test_travel = estimate_travel_time(48.8584, 2.2945, 48.8606, 2.3376)
print(f"   Distance: {test_travel['distance_km']} km")
print(f"   Walk: {test_travel['walk_minutes']} min | Transit: {test_travel['transit_minutes']} min")
print(f"   Recommended: {test_travel['recommended_mode'].upper()} ({test_travel['recommended_time']} min)")


🎒 TRIP CONFIGURATION
📍 Destination: Paris
📅 Dates: 2025-06-15 to 2025-06-18 (4 days)
👥 Group: Honeymoon (2 people)
🎯 Vibes: romantic, foodie, cultural
💰 Budget: 💵💵💵💵 (Level 4/5)
⏱️ Pacing: Slow (~2 attractions/day)

🚶 Travel Time Calculator Ready!
   Example: Eiffel Tower → Louvre
   Distance: 3.16 km
   Walk: 38 min | Transit: 10 min
   Recommended: TRANSIT (10 min)
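
The travel-time rule switches from walking to transit at 2.5 km, and transit never takes less than 10 minutes. The sketch below restates that arithmetic as a function of distance alone, so the jump at the threshold is easy to see. `travel_minutes` is a simplified stand-in written for this example, not the notebook's `estimate_travel_time`.

```python
def travel_minutes(distance_km: float) -> dict:
    """Walk/transit minutes for a given distance, using the speeds from Step 5."""
    walk = round(distance_km / 5.0 * 60)                 # 5 km/h walking pace
    transit = max(10, round(distance_km / 25.0 * 60))    # 25 km/h, 10 min floor
    mode = 'walk' if distance_km < 2.5 else 'transit'
    return {'walk': walk, 'transit': transit, 'mode': mode}

for km in (0.5, 2.4, 2.5, 6.0):
    print(km, travel_minutes(km))
```

Just below the threshold the recommendation is a 29-minute walk; at 2.5 km it becomes a 10-minute transit leg. Day totals are therefore sensitive to whether consecutive stops fall slightly under or over 2.5 km apart.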


---
## 🏠 Step 6: Generate Itinerary - Where to Stay

In [21]:
# ===========================
# NEIGHBORHOOD RECOMMENDATIONS
# ===========================

def score_neighborhood(nb: dict, trip: TripConfig) -> dict:
    """Score a neighborhood for the trip."""
    score = 0.0
    reasons = []

    # Group type match
    best_for = nb.get('best_for', [])
    if trip.group_type in best_for:
        score += 0.4
        reasons.append(f"Great for {trip.group_type} travelers")
    elif any(g in best_for for g in ['all', trip.group_type[:4]]):
        score += 0.2

    # Vibe overlap
    nb_vibes = set(nb.get('vibes', []))
    trip_vibes = set(trip.vibes)
    matching_vibes = nb_vibes & trip_vibes

    if matching_vibes:
        vibe_score = len(matching_vibes) / len(trip_vibes) * 0.4
        score += vibe_score
        reasons.append(f"Matches {', '.join(matching_vibes)} vibes")

    # Base score for being in the city
    score += 0.2

    return {
        'name': nb['name'],
        'lat': nb['lat'],
        'lon': nb['lon'],
        'vibes': nb.get('vibes', []),
        'best_for': best_for,
        'score': min(score, 1.0),
        'reasoning': '; '.join(reasons) if reasons else 'Central location'
    }

# Score all neighborhoods
neighborhood_scores = [
    score_neighborhood(nb, trip)
    for nb in city_config['neighborhoods']
]
neighborhood_scores.sort(key=lambda x: x['score'], reverse=True)

print(f"\n🏠 WHERE TO STAY - Recommendations for {trip.group_type.title()}")
print(f"{'='*60}")

medals = ['🥇', '🥈', '🥉', '4️⃣', '5️⃣']
for i, nb in enumerate(neighborhood_scores[:5]):
    medal = medals[i] if i < len(medals) else '  '
    bar = '█' * int(nb['score'] * 10) + '░' * (10 - int(nb['score'] * 10))
    print(f"\n{medal} {nb['name']}")
    print(f"   Score: {bar} {nb['score']:.2f}")
    print(f"   Vibes: {', '.join(nb['vibes'])}")
    print(f"   Why: {nb['reasoning']}")


🏠 WHERE TO STAY - Recommendations for Honeymoon

🥇 Saint-Germain-des-Prés
   Score: ██████████ 1.00
   Vibes: cultural, romantic, foodie
   Why: Great for honeymoon travelers; Matches cultural, romantic, foodie vibes

🥈 Montmartre
   Score: ████████░░ 0.87
   Vibes: romantic, cultural, photography
   Why: Great for honeymoon travelers; Matches cultural, romantic vibes

🥉 Eiffel Tower / 7th
   Score: ████████░░ 0.87
   Vibes: romantic, cultural, photography
   Why: Great for honeymoon travelers; Matches cultural, romantic vibes

4️⃣ Latin Quarter
   Score: ████░░░░░░ 0.47
   Vibes: cultural, foodie, nightlife
   Why: Matches cultural vibes, foodie vibes
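
The scores in this output follow directly from the three terms in `score_neighborhood`: 0.2 base, 0.4 for a group-type match, and up to 0.4 scaled by the fraction of trip vibes the neighborhood shares. The sketch below reproduces two of the printed values. `neighborhood_score` is a condensed restatement for illustration and omits the 0.2 partial-credit branch.

```python
def neighborhood_score(group_match: bool, nb_vibes: set, trip_vibes: set) -> float:
    """Base 0.2 + 0.4 for a group match + 0.4 * (shared vibes / trip vibes)."""
    score = 0.2
    if group_match:
        score += 0.4
    matching = nb_vibes & trip_vibes
    if matching:
        score += len(matching) / len(trip_vibes) * 0.4
    return min(score, 1.0)

trip_vibes = {'romantic', 'foodie', 'cultural'}

# Montmartre: group match, 2 of 3 vibes shared -> 0.2 + 0.4 + (2/3) * 0.4
montmartre = neighborhood_score(True, {'romantic', 'cultural', 'photography'}, trip_vibes)
# Latin Quarter: no group match, 2 of 3 vibes shared -> 0.2 + (2/3) * 0.4
latin_quarter = neighborhood_score(False, {'cultural', 'foodie', 'nightlife'}, trip_vibes)

print(round(montmartre, 2), round(latin_quarter, 2))
```

Because the group-type term is worth as much as a perfect vibe match, a neighborhood tagged for the traveler's group type outranks one that only shares vibes, which explains the gap between the top three and the rest.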


In [22]:
# Visualize neighborhoods on map
nb_df = pd.DataFrame(neighborhood_scores)

fig = px.scatter_mapbox(
    nb_df,
    lat='lat',
    lon='lon',
    size='score',
    color='score',
    hover_name='name',
    hover_data=['vibes', 'reasoning'],
    title=f"<b>🏠 Recommended Neighborhoods for {trip.group_type.title()}</b>",
    zoom=12,
    height=500,
    size_max=30,
    color_continuous_scale='RdYlGn'
)

fig.update_layout(
    mapbox_style='carto-positron',
    margin={'r':0,'t':80,'l':0,'b':0}
)

fig.show()

---
## 🗺️ Step 7: Generate Itinerary - What to Visit

In [23]:
# ===========================
# POI SCORING FOR TRIP
# ===========================

def score_poi_for_trip(poi: dict, trip: TripConfig) -> Optional[dict]:
    """Score a POI for the trip; return None if a filter excludes it."""
    scores = poi.get('persona_scores', {})
    attrs = poi.get('attributes', {})

    # Skip hotels
    if poi.get('category') == 'hotel':
        return None

    # Family-only attractions (like Disneyland) - skip for non-family trips
    if attrs.get('is_family_only', False) and trip.group_type != 'family':
        return None

    # Hard filters
    if trip.has_kids and not attrs.get('is_kid_friendly', True):
        return None
    if poi.get('cost_level', 3) > trip.budget_level + 1:
        return None

    # Group score
    group_score = scores.get(f'score_{trip.group_type}', 0.5)

    # Vibe score (average of selected vibes)
    vibe_scores = [scores.get(f'score_{v}', 0.5) for v in trip.vibes]
    vibe_score = sum(vibe_scores) / len(vibe_scores) if vibe_scores else 0.5

    # Famous landmark bonus
    is_famous = poi.get('is_famous', False)
    famous_bonus = 0.15 if is_famous else 0

    # Final score
    final_score = (group_score * 0.4) + (vibe_score * 0.4) + 0.2 + famous_bonus

    # Determine priority
    priority = 'recommended'
    if attrs.get('is_must_see', False) or is_famous:
        priority = 'must_see'
        final_score += 0.1
    elif final_score >= 0.75:
        priority = 'highly_recommended'

    # Generate reasoning
    reasons = []

    # Famous landmark reasoning
    if is_famous:
        reasons.append(f"Iconic {poi['category']} - must visit!")

    if group_score >= 0.8:
        reasons.append(f"Perfect for {trip.group_type}")

    matching_vibes = [v for v in trip.vibes if scores.get(f'score_{v}', 0) >= 0.7]
    if matching_vibes:
        reasons.append(f"Strong {', '.join(matching_vibes)} vibes")

    if attrs.get('is_must_see') and not is_famous:
        reasons.append("Must-see attraction")

    if attrs.get('instagram_worthy'):
        reasons.append("Great for photos")

    # Add description for famous landmarks (ensure it's a string, not NaN/float)
    description = poi.get('description', '')
    if description and is_famous and isinstance(description, str):
        reasons.append(description)

    return {
        'name': poi['name'],
        'category': poi['category'],
        'subcategory': poi['subcategory'],
        'neighborhood': poi['neighborhood'],
        'duration': poi['typical_duration_minutes'],
        'cost_level': poi['cost_level'],
        'cost': poi.get('avg_cost_per_person', 0),
        'latitude': poi['latitude'],
        'longitude': poi['longitude'],
        'group_score': group_score,
        'vibe_score': vibe_score,
        'final_score': min(final_score, 1.0),
        'priority': priority,
        'is_famous': is_famous,
        'reasoning': '; '.join(reasons) if reasons else f"Good for {trip.group_type}"
    }

# Score all POIs
scored_pois = []
for poi in enriched_pois:
    scored = score_poi_for_trip(poi, trip)
    if scored:
        scored_pois.append(scored)

# Sort by: 1) Famous landmarks first, 2) Must-see priority, 3) Final score
scored_pois.sort(key=lambda x: (x['is_famous'], x['priority'] == 'must_see', x['final_score']), reverse=True)

# Count categories
attractions = [p for p in scored_pois if p['category'] == 'attraction']
famous = [p for p in scored_pois if p['is_famous']]

print(f"\n🗺️ POI RECOMMENDATIONS for {trip.group_type.title()} trip")
print(f"{'='*60}")
print(f"Total POIs matching criteria: {len(scored_pois)}")
print(f"⭐ Famous landmarks: {len(famous)}")
print(f"🏛️ Attractions: {len(attractions)}")

# Show top famous landmarks
print(f"\n🌟 TOP FAMOUS LANDMARKS IN YOUR ITINERARY:")
for poi in famous[:5]:
    print(f"   • {poi['name']} ({poi['subcategory']}) - Score: {poi['final_score']:.0%}")


🗺️ POI RECOMMENDATIONS for Honeymoon trip
Total POIs matching criteria: 973
⭐ Famous landmarks: 18
🏛️ Attractions: 402

🌟 TOP FAMOUS LANDMARKS IN YOUR ITINERARY:
   • Musée d'Orsay (museum) - Score: 100%
   • Église Saint-Jean de Montmartre (church) - Score: 100%
   • Église Notre-Dame d'Auteuil (church) - Score: 100%
   • Église Notre-Dame de l'Assomption (church) - Score: 100%
   • Musée de Montmartre (museum) - Score: 100%
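
Every famous landmark above shows a score of 100%. This is a consequence of the formula: a famous POI receives a 0.15 bonus plus a 0.1 must-see bonus on top of the 0.2 base, and the result is capped at 1.0. The sketch below works through two cases with made-up persona scores. `final_score` is a condensed restatement for illustration that returns both the raw and the capped value.

```python
def final_score(group_score: float, vibe_score: float, is_famous: bool):
    """Return (raw, capped) score using the weights from score_poi_for_trip."""
    raw = group_score * 0.4 + vibe_score * 0.4 + 0.2
    if is_famous:
        raw += 0.15 + 0.1  # famous bonus + must-see bonus
    return raw, min(raw, 1.0)

# Two famous landmarks with different (invented) persona scores
raw_a, shown_a = final_score(0.9, 0.9, True)
raw_b, shown_b = final_score(0.7, 0.7, True)

print(f"raw: {raw_a:.2f} vs {raw_b:.2f} | shown: {shown_a:.0%} vs {shown_b:.0%}")
```

Any famous landmark whose group and vibe scores average about 0.69 or more reaches the cap, so the capped score no longer separates them and their relative order falls back to input order. Sorting on the uncapped value would preserve the distinction, if that ordering matters for the itinerary.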


In [24]:
# ===========================
# BUILD ATTRACTION-ONLY ITINERARY
# ===========================

def cluster_pois_by_neighborhood(pois: list) -> dict:
    """Group POIs by neighborhood for efficient routing."""
    clusters = {}
    for poi in pois:
        nb = poi.get('neighborhood', 'Unknown')
        if nb not in clusters:
            clusters[nb] = []
        clusters[nb].append(poi)
    return clusters


def sort_by_proximity(pois: list, start_lat: Optional[float] = None, start_lon: Optional[float] = None) -> list:
    """Sort POIs by proximity using a greedy nearest-neighbor heuristic."""
    if not pois:
        return []

    sorted_pois = []
    remaining = pois.copy()

    # Start from the first POI unless a complete start point (lat and lon) is given
    if start_lat is None or start_lon is None:
        current = remaining.pop(0)
    else:
        # Find nearest to start point
        min_dist = float('inf')
        nearest_idx = 0
        for i, p in enumerate(remaining):
            dist = haversine_distance(start_lat, start_lon, p['latitude'], p['longitude'])
            if dist < min_dist:
                min_dist = dist
                nearest_idx = i
        current = remaining.pop(nearest_idx)

    sorted_pois.append(current)

    # Greedy nearest neighbor
    while remaining:
        min_dist = float('inf')
        nearest_idx = 0
        for i, p in enumerate(remaining):
            dist = haversine_distance(current['latitude'], current['longitude'],
                                     p['latitude'], p['longitude'])
            if dist < min_dist:
                min_dist = dist
                nearest_idx = i
        current = remaining.pop(nearest_idx)
        sorted_pois.append(current)

    return sorted_pois


def build_attraction_itinerary(scored_pois: list, trip: TripConfig) -> dict:
    """
    Build a day-by-day itinerary with ONLY attractions.

    Key improvements:
    1. Clusters POIs by neighborhood (no back-and-forth)
    2. Famous landmarks get priority
    3. Each day focuses on 1-2 nearby neighborhoods
    4. Travel time calculated between activities
    """

    # Filter to attractions only (no restaurants, hotels, shops)
    attractions = [p for p in scored_pois if p['category'] == 'attraction']

    print(f"   🏛️ Found {len(attractions)} attractions for {trip.group_type} trip")

    # Separate famous landmarks (must include) from regular attractions
    famous = [p for p in attractions if p.get('is_famous', False)]
    regular = [p for p in attractions if not p.get('is_famous', False)]

    print(f"   ⭐ Famous landmarks: {len(famous)}")
    print(f"   📍 Regular attractions: {len(regular)}")

    # Sort by score within each group
    famous.sort(key=lambda x: x['final_score'], reverse=True)
    regular.sort(key=lambda x: x['final_score'], reverse=True)

    # Cluster by neighborhood
    famous_by_nb = cluster_pois_by_neighborhood(famous)
    regular_by_nb = cluster_pois_by_neighborhood(regular)

    # Plan which neighborhoods to visit each day
    all_neighborhoods = list(set(list(famous_by_nb.keys()) + list(regular_by_nb.keys())))

    days = []
    used_pois = set()
    total_travel_time = 0

    for day_num in range(1, trip.num_days + 1):
        day = {
            'day': day_num,
            'date': str(trip.start_date + timedelta(days=day_num - 1)),
            'activities': [],
            'total_duration': 0,
            'total_travel_time': 0,
            'neighborhoods_visited': set(),
        }

        activities_for_day = trip.pois_per_day

        # Strategy: Pick neighborhoods with most unused famous landmarks
        nb_scores = {}
        for nb in all_neighborhoods:
            unused_famous = [p for p in famous_by_nb.get(nb, []) if p['name'] not in used_pois]
            unused_regular = [p for p in regular_by_nb.get(nb, []) if p['name'] not in used_pois]
            # Score = famous count * 10 + regular count
            nb_scores[nb] = len(unused_famous) * 10 + len(unused_regular)

        # Sort neighborhoods by score
        sorted_nbs = sorted(nb_scores.keys(), key=lambda x: nb_scores[x], reverse=True)

        # Collect POIs for this day, neighborhood by neighborhood
        day_pois = []
        for nb in sorted_nbs:
            if len(day_pois) >= activities_for_day:
                break

            # Get unused POIs from this neighborhood (famous first)
            nb_famous = [p for p in famous_by_nb.get(nb, []) if p['name'] not in used_pois]
            nb_regular = [p for p in regular_by_nb.get(nb, []) if p['name'] not in used_pois]

            # Add famous first, then regular
            for poi in nb_famous:
                if len(day_pois) < activities_for_day:
                    day_pois.append(poi)
                    used_pois.add(poi['name'])
                    day['neighborhoods_visited'].add(nb)

            for poi in nb_regular[:2]:  # Max 2 regular per neighborhood
                if len(day_pois) < activities_for_day:
                    day_pois.append(poi)
                    used_pois.add(poi['name'])
                    day['neighborhoods_visited'].add(nb)

        # Sort day's POIs by proximity (minimize walking)
        if day_pois:
            day_pois = sort_by_proximity(day_pois)

        # Add activities with travel time
        for i, poi in enumerate(day_pois):
            # Determine time slot
            if i == 0:
                slot = 'morning'
            elif i < len(day_pois) // 2:
                slot = 'late_morning'
            elif i < len(day_pois) - 1:
                slot = 'afternoon'
            else:
                slot = 'evening'

            # Calculate travel time from previous activity
            travel_info = None
            if day['activities']:
                prev = day['activities'][-1]
                travel_info = estimate_travel_time(
                    prev['latitude'], prev['longitude'],
                    poi['latitude'], poi['longitude']
                )
                day['total_travel_time'] += travel_info['recommended_time']

            day['activities'].append({
                **poi,
                'slot': slot,
                'travel_from_previous': travel_info
            })
            day['total_duration'] += poi['duration']

        total_travel_time += day['total_travel_time']

        # Generate day theme
        if day['neighborhoods_visited']:
            main_nb = list(day['neighborhoods_visited'])[0]
            subcats = [a['subcategory'] for a in day['activities']]

            if subcats.count('museum') >= 2:
                day['theme'] = f"Museums of {main_nb}"
            elif subcats.count('church') >= 2:
                day['theme'] = f"Sacred {main_nb}"
            elif subcats.count('monument') >= 2:
                day['theme'] = f"Historic {main_nb}"
            elif subcats.count('park') >= 2:
                day['theme'] = f"Gardens of {main_nb}"
            else:
                nbs = list(day['neighborhoods_visited'])
                if len(nbs) == 1:
                    day['theme'] = f"Exploring {nbs[0]}"
                else:
                    day['theme'] = f"{nbs[0]} & {nbs[1]}"
        else:
            day['theme'] = f"Day {day_num}"

        day['neighborhoods_visited'] = list(day['neighborhoods_visited'])
        days.append(day)

    # Calculate summary
    total_activities = sum(len(d['activities']) for d in days)
    total_duration = sum(d['total_duration'] for d in days)

    return {
        'trip': {
            'city': trip.city,
            'start_date': str(trip.start_date),
            'end_date': str(trip.end_date),
            'num_days': trip.num_days,
            'group_type': trip.group_type,
            'vibes': trip.vibes,
            'budget_level': trip.budget_level,
            'pacing': trip.pacing
        },
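        # Score neighborhoods for this trip instead of reusing the global
        # neighborhood_scores, which was computed for the trip defined in Step 5
        'stay_recommendation': max(
            (score_neighborhood(nb, trip) for nb in city_config['neighborhoods']),
            key=lambda nb: nb['score'],
            default=None,
        ),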
        'days': days,
        'summary': {
            'total_attractions': total_activities,
            'total_duration_hours': total_duration / 60,
            'total_travel_hours': total_travel_time / 60,
            'total_time_hours': (total_duration + total_travel_time) / 60,
            'must_see_count': sum(1 for d in days for a in d['activities'] if a['priority'] == 'must_see'),
            'avg_per_day': total_activities / trip.num_days if trip.num_days > 0 else 0
        }
    }


def display_itinerary(itinerary: dict, persona_name: str = ""):
    """Display itinerary with travel time between POIs."""
    trip = itinerary['trip']

    print(f"\n{'='*80}")
    print(f"🗓️ {trip['num_days']}-DAY {trip['city'].upper()} ITINERARY")
    if persona_name:
        print(f"   👤 Persona: {persona_name}")
    print(f"   🎯 Group: {trip['group_type'].title()} | Vibes: {', '.join(trip['vibes'])}")
    print(f"   ⏱️ Pacing: {trip['pacing'].title()}")
    print(f"{'='*80}")

    # Stay recommendation
    if itinerary['stay_recommendation']:
        stay = itinerary['stay_recommendation']
        print(f"\n🏠 WHERE TO STAY: {stay['name']}")
        print(f"   💡 {stay['reasoning']}")

    # Day by day
    for day in itinerary['days']:
        print(f"\n{'─'*80}")
        print(f"📅 DAY {day['day']}: {day['theme']}")
        print(f"   📆 {day['date']}")
        nbs = day.get('neighborhoods_visited', [])
        if nbs:
            print(f"   📍 Areas: {', '.join(nbs)}")
        print(f"{'─'*80}")

        for i, activity in enumerate(day['activities'], 1):
            # Show travel time from previous activity
            if activity.get('travel_from_previous'):
                travel = activity['travel_from_previous']
                mode_emoji = '🚶' if travel['recommended_mode'] == 'walk' else '🚇'
                print(f"\n   {mode_emoji} {travel['recommended_time']} min {travel['recommended_mode']} ({travel['distance_km']} km)")
                print(f"   │")

            # Emoji based on subcategory
            emoji_map = {
                'museum': '🏛️',
                'church': '⛪',
                'monument': '🗼',
                'park': '🌳',
                'landmark': '📍',
            }
            emoji = emoji_map.get(activity['subcategory'], '🏛️')
            priority_badge = '⭐ MUST-SEE' if activity['priority'] == 'must_see' else ''
            famous_badge = '🌟' if activity.get('is_famous', False) else ''

            print(f"\n   {i}. {emoji} {activity['name']} {famous_badge}{priority_badge}")
            print(f"      📍 {activity['neighborhood']}")
            print(f"      ⏱️ {activity['duration']} min | 🏷️ {activity['subcategory'].title()}")
            print(f"      💡 {activity['reasoning']}")

        # Day totals including travel
        day_total_with_travel = day['total_duration'] + day['total_travel_time']
        print(f"\n   {'─'*40}")
        print(f"   📊 Day: {day['total_duration']} min sightseeing + {day['total_travel_time']} min travel = {day_total_with_travel} min ({day_total_with_travel/60:.1f} hrs)")

    # Summary
    summary = itinerary['summary']
    print(f"\n{'='*80}")
    print(f"📊 TRIP SUMMARY")
    print(f"{'='*80}")
    print(f"   Attractions: {summary['total_attractions']} | Must-See: {summary['must_see_count']}")
    print(f"   Sightseeing: {summary['total_duration_hours']:.1f}h | Travel: {summary['total_travel_hours']:.1f}h | Total: {summary['total_time_hours']:.1f}h")


print("✅ Itinerary builder ready (with neighborhood clustering)!")

✅ Itinerary builder ready (with neighborhood clustering)!
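
`sort_by_proximity` orders each day's stops with a greedy nearest-neighbor pass: start somewhere, then repeatedly move to the closest unvisited stop. The sketch below shows the same idea on four made-up points on a line. It uses planar distance via `math.dist` (Python 3.8+) rather than the notebook's `haversine_distance`, and `order_nearest_neighbor` is a hypothetical name.

```python
import math

def order_nearest_neighbor(points: list) -> list:
    """Greedy nearest-neighbor ordering, starting from the first point.

    Each point is a (name, x, y) tuple; distance is planar for simplicity.
    """
    remaining = list(points)
    route = [remaining.pop(0)]
    while remaining:
        last = route[-1]
        nxt = min(remaining, key=lambda p: math.dist(last[1:], p[1:]))
        remaining.remove(nxt)
        route.append(nxt)
    return route

# Invented stops: B is far away, C and D are close to A
stops = [('A', 0.0, 0.0), ('B', 5.0, 0.0), ('C', 1.0, 0.0), ('D', 2.0, 0.0)]
route = [name for name, _, _ in order_nearest_neighbor(stops)]
print(route)
```

The heuristic is fast and usually avoids back-and-forth travel, but it does not guarantee the shortest route, and the result depends on the starting stop. In the itinerary builder no start point is passed, so each day begins at the first POI selected for that day.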


In [25]:
# ===========================
# GENERATE MULTIPLE PERSONA ITINERARIES
# ===========================

# Define 6 different persona configurations
PERSONA_CONFIGS = [
    {
        'name': '🎒 Solo Explorer - Cultural Deep Dive',
        'group_type': 'solo',
        'vibes': ['cultural', 'photography', 'adventure'],
        'pacing': 'fast',
        'budget_level': 3,
        'description': 'Independent traveler who wants to see everything, loves museums and hidden gems'
    },
    {
        'name': '💑 Romantic Couple - Honeymoon Vibes',
        'group_type': 'honeymoon',
        'vibes': ['romantic', 'cultural', 'photography'],
        'pacing': 'slow',
        'budget_level': 4,
        'description': 'Newlyweds seeking romantic spots, scenic views, and memorable experiences'
    },
    {
        'name': '👴👵 Senior Travelers - Relaxed Culture',
        'group_type': 'seniors',
        'vibes': ['cultural', 'relaxation', 'nature'],
        'pacing': 'slow',
        'budget_level': 3,
        'description': 'Experienced travelers who prefer accessible sites, parks, and a gentle pace'
    },
    {
        'name': '👨‍👩‍👧‍👦 Family Adventure',
        'group_type': 'family',
        'vibes': ['cultural', 'nature', 'adventure'],
        'pacing': 'moderate',
        'budget_level': 3,
        'description': 'Family with kids looking for engaging, kid-friendly attractions and parks',
        'has_kids': True
    },
    {
        'name': '👯 Friends Group - Active Exploration',
        'group_type': 'friends',
        'vibes': ['adventure', 'photography', 'cultural'],
        'pacing': 'fast',
        'budget_level': 2,
        'description': 'Group of friends who want to cover maximum ground and capture great photos'
    },
    {
        'name': '💼 Business + Leisure',
        'group_type': 'business',
        'vibes': ['cultural', 'relaxation'],
        'pacing': 'moderate',
        'budget_level': 4,
        'description': 'Business traveler with limited free time, wants key highlights efficiently'
    },
]

# Store all generated itineraries
all_itineraries = {}

print("="*80)
print("🌍 GENERATING 6 PERSONA-BASED ITINERARIES FOR", city_config['name'].upper())
print("="*80)

for config in PERSONA_CONFIGS:
    print(f"\n\n{'#'*80}")
    print(f"# {config['name']}")
    print(f"# {config['description']}")
    print(f"{'#'*80}")

    # Create trip config for this persona
    persona_trip = TripConfig(
        city=city_config['name'],
        num_days=4,
        start_date=date(2025, 6, 15),
        group_type=config['group_type'],
        group_size=2,
        has_kids=config.get('has_kids', False),
        has_seniors=(config['group_type'] == 'seniors'),
        vibes=config['vibes'],
        budget_level=config['budget_level'],
        pacing=config['pacing'],
    )

    # Score POIs for this persona (using fresh scoring)
    persona_scored_pois = []
    for poi in enriched_pois:
        scored = score_poi_for_trip(poi, persona_trip)
        if scored:
            persona_scored_pois.append(scored)

    # Sort by: Famous first, then must-see, then score
    persona_scored_pois.sort(key=lambda x: (x['is_famous'], x['priority'] == 'must_see', x['final_score']), reverse=True)

    # Count famous landmarks for this persona
    famous_count = sum(1 for p in persona_scored_pois if p['is_famous'])
    print(f"   ⭐ Famous landmarks available: {famous_count}")

    # Update neighborhood scores for this persona
    persona_neighborhood_scores = [
        score_neighborhood(nb, persona_trip)
        for nb in city_config['neighborhoods']
    ]
    persona_neighborhood_scores.sort(key=lambda x: x['score'], reverse=True)

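    # Point the module-level `neighborhood_scores` used by the build function
    # at this persona's ranking. No `global` statement is needed because this
    # cell already runs at module scope.
    neighborhood_scores = persona_neighborhood_scores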

    # Build itinerary
    itinerary = build_attraction_itinerary(persona_scored_pois, persona_trip)

    # Display
    display_itinerary(itinerary, config['name'])

    # Store
    all_itineraries[config['group_type']] = {
        'config': config,
        'itinerary': itinerary
    }

print("\n\n" + "="*80)
print("✅ ALL 6 ITINERARIES GENERATED!")
print("="*80)

# Quick summary
print("\n📊 QUICK SUMMARY:")
for group_type, data in all_itineraries.items():
    config = data['config']
    itinerary = data['itinerary']
    famous_in_trip = sum(1 for d in itinerary['days'] for a in d['activities'] if a.get('is_famous', False))
    print(f"   {config['name'].split('-')[0].strip()}: {itinerary['summary']['total_attractions']} attractions, {famous_in_trip} famous landmarks")

🌍 GENERATING 6 PERSONA-BASED ITINERARIES FOR PARIS


################################################################################
# 🎒 Solo Explorer - Cultural Deep Dive
# Independent traveler who wants to see everything, loves museums and hidden gems
################################################################################
   ⭐ Famous landmarks available: 18
   🏛️ Found 402 attractions for solo trip
   ⭐ Famous landmarks: 17
   📍 Regular attractions: 385

🗓️ 4-DAY PARIS ITINERARY
   👤 Persona: 🎒 Solo Explorer - Cultural Deep Dive
   🎯 Group: Solo | Vibes: cultural, photography, adventure
   ⏱️ Pacing: Fast

🏠 WHERE TO STAY: Montmartre
   💡 Great for solo travelers; Matches cultural, photography vibes

────────────────────────────────────────────────────────────────────────────────
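
The ranking used in the loop above (famous first, then must-see, then score) relies on Python comparing tuples element by element, with `True` sorting above `False`. A minimal sketch with made-up POIs: the names, scores, and the `'optional'`/`'recommended'` priority values are illustrative assumptions, not taken from the Paris data.

```python
# Hypothetical POIs; the field names mirror those used in the sort key above.
sample_pois = [
    {'name': 'Local Park', 'is_famous': False, 'priority': 'optional', 'final_score': 0.95},
    {'name': 'Famous Tower', 'is_famous': True, 'priority': 'must_see', 'final_score': 0.70},
    {'name': 'Famous Museum', 'is_famous': True, 'priority': 'recommended', 'final_score': 0.90},
    {'name': 'Old Church', 'is_famous': False, 'priority': 'must_see', 'final_score': 0.60},
]


def rank_key(poi):
    # Tuples compare left to right, so fame outranks must-see status,
    # and the score only breaks ties between otherwise equal POIs.
    return (poi['is_famous'], poi['priority'] == 'must_see', poi['final_score'])


ranked = sorted(sample_pois, key=rank_key, reverse=True)
ranked_names = [p['name'] for p in ranked]
print(ranked_names)
```

Note that a high-scoring but non-famous POI ('Local Park' here) still ranks last, which is the intended effect of putting `is_famous` first in the key.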

In [26]:
# ===========================
# COMPARISON: HOW PERSONAS DIFFER
# ===========================

print("\n" + "="*100)
print("📊 PERSONA COMPARISON - WHY EACH ITINERARY IS DIFFERENT")
print("="*100)

comparison_data = []

for group_type, data in all_itineraries.items():
    config = data['config']
    itinerary = data['itinerary']

    # Get all attractions
    all_attractions = []
    for day in itinerary['days']:
        all_attractions.extend([a['name'] for a in day['activities']])

    # Get subcategories
    subcategories = []
    for day in itinerary['days']:
        subcategories.extend([a['subcategory'] for a in day['activities']])

    comparison_data.append({
        'Persona': config['name'].split(' - ')[0],
        'Group': config['group_type'],
        'Vibes': ', '.join(config['vibes']),
        'Pacing': config['pacing'],
        'Attractions': itinerary['summary']['total_attractions'],
        'Sightseeing': f"{itinerary['summary']['total_duration_hours']:.1f}h",
        'Travel': f"{itinerary['summary']['total_travel_hours']:.1f}h",
        'Total': f"{itinerary['summary']['total_time_hours']:.1f}h",
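        # Three most frequent subcategories (a plain set() slice would be arbitrary)
        'Top Types': ', '.join(pd.Series(subcategories, dtype='object').value_counts().index[:3])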
    })

# Display as table
comparison_df = pd.DataFrame(comparison_data)
print("\n📋 OVERVIEW TABLE:")
print(comparison_df.to_string(index=False))

# Show unique attractions per persona
print("\n\n" + "="*100)
print("🔍 UNIQUE ATTRACTIONS BY PERSONA")
print("="*100)

# Collect all attractions per persona
persona_attractions = {}
for group_type, data in all_itineraries.items():
    config = data['config']
    itinerary = data['itinerary']

    attractions = set()
    for day in itinerary['days']:
        for a in day['activities']:
            attractions.add(a['name'])

    persona_attractions[config['name'].split(' - ')[0]] = attractions

# Find overlaps and unique
all_common = set.intersection(*persona_attractions.values()) if persona_attractions else set()

print(f"\n🌟 ATTRACTIONS COMMON TO ALL PERSONAS ({len(all_common)}):")
for a in sorted(all_common)[:5]:  # sorted so the sample is deterministic
    print(f"   • {a}")

print("\n📍 UNIQUE/PRIORITY ATTRACTIONS BY PERSONA:")
for persona, attractions in persona_attractions.items():
    # "Unique" here means not shared by every persona; two personas may still overlap
    unique = attractions - all_common
    print(f"\n   {persona}:")
    for a in sorted(unique)[:3]:
        print(f"      • {a}")


# WHY analysis - Updated with realistic pacing
print("\n\n" + "="*100)
print("💡 WHY EACH PERSONA GETS DIFFERENT RECOMMENDATIONS")
print("="*100)

persona_reasoning = {
    'solo': """
    🎒 SOLO EXPLORER gets:
    - MORE attractions (fast pacing = 5/day)
    - Museums & cultural sites (high solo score)
    - Hidden gems and off-the-beaten-path spots
    - Photography hotspots
    WHY: Solo travelers can move quickly, spend more time at museums,
    and don't need to coordinate with others. Travel time is minimized
    since one person can navigate efficiently.
    """,

    'honeymoon': """
    💑 HONEYMOON COUPLE gets:
    - FEWER attractions (slow pacing = 2/day)
    - Romantic viewpoints and scenic spots
    - Iconic landmarks for memorable photos
    - Gardens and peaceful areas
    WHY: Honeymooners prioritize quality over quantity, want intimate
    experiences, and need time for each other - plus leisurely meals
    and spontaneous moments. Walking between 2 spots is perfect!
    """,

    'seniors': """
    👴👵 SENIORS get:
    - FEWER attractions (slow pacing = 2/day)
    - Accessible, well-maintained sites
    - Parks and gardens for rest
    - Cultural sites without extensive walking
    WHY: Senior travelers appreciate comfort, accessibility, and a
    relaxed pace. With only 2 attractions per day, there's time to
    rest and avoid physical strain from excessive travel.
    """,

    'family': """
    👨‍👩‍👧‍👦 FAMILY gets:
    - MODERATE attractions (4/day)
    - Kid-friendly sites (parks, interactive museums)
    - Mix of education and entertainment
    - Outdoor spaces for kids to run
    WHY: Families need variety to keep kids engaged, plus breaks
    and open spaces for children's energy. Travel time between
    spots allows for snack/bathroom breaks.
    """,

    'friends': """
    👯 FRIENDS GROUP gets:
    - MORE attractions (fast pacing = 5/day)
    - Photography spots (Instagram-worthy)
    - Adventure activities
    - Active exploration routes
    WHY: Friend groups are energetic, want maximum experiences,
    love group photos, and can split up/regroup easily. 5 attractions
    per day is achievable with efficient metro use.
    """,

    'business': """
    💼 BUSINESS TRAVELER gets:
    - MODERATE attractions (4/day)
    - Key highlights only (must-sees)
    - Efficient routes with minimal travel time
    - Cultural essentials
    WHY: Limited time requires focusing on the absolute must-see
    attractions. 4 per day ensures major sites without exhaustion.
    """
}

for group_type, reasoning in persona_reasoning.items():
    if group_type in [d['group_type'] for d in PERSONA_CONFIGS]:
        print(reasoning)

print("\n" + "="*100)
print("✅ ANALYSIS COMPLETE!")
print("="*100)


📊 PERSONA COMPARISON - WHY EACH ITINERARY IS DIFFERENT

📋 OVERVIEW TABLE:
                 Persona     Group                            Vibes   Pacing  Attractions Sightseeing Travel Total                Top Types
         🎒 Solo Explorer      solo cultural, photography, adventure     fast           20       36.0h   2.4h 38.4h   landmark, park, church
       💑 Romantic Couple honeymoon  romantic, cultural, photography     slow            8        6.5h   1.0h  7.5h         monument, church
     👴👵 Senior Travelers   seniors     cultural, relaxation, nature     slow            8        6.5h   1.0h  7.5h                   church
👨‍👩‍👧‍👦 Family Adventure    family      cultural, nature, adventure moderate           16       23.0h   2.4h 25.4h   landmark, park, church
         👯 Friends Group   friends adventure, photography, cultural     fast           20       31.5h   3.0h 34.5h     park, church, museum
    💼 Business + Leisure  business         
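
The per-day figures quoted in the persona explanations (fast = 5, moderate = 4, slow = 2) line up with the attraction counts in the overview table for a 4-day trip. A minimal sketch of that arithmetic, assuming a fixed per-day cap: `ATTRACTIONS_PER_DAY` and `expected_attractions` are hypothetical names, and the real `build_attraction_itinerary` may apply further limits (opening hours, travel time) that this ignores.

```python
# Per-day caps as stated in the persona explanations (assumed to be fixed).
ATTRACTIONS_PER_DAY = {'fast': 5, 'moderate': 4, 'slow': 2}


def expected_attractions(pacing: str, num_days: int) -> int:
    """Upper bound on attractions for a trip, given its pacing."""
    return ATTRACTIONS_PER_DAY[pacing] * num_days


for pacing in ('fast', 'moderate', 'slow'):
    print(f"{pacing:>8}: {expected_attractions(pacing, 4)} attractions over 4 days")
```

A quick check like this is a cheap way to spot an itinerary that came back shorter than its pacing allows, for example when too few POIs pass the scoring filter.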

In [27]:
# ===========================
# SAVE ITINERARY
# ===========================

output_dir = Path('../data/itineraries')
output_dir.mkdir(parents=True, exist_ok=True)

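# The persona loops above rebind `itinerary`, so it no longer belongs to `trip`.
# Fetch the itinerary generated for the same group type instead; this assumes
# that persona's configuration is the one `trip` describes. Fall back to the
# current `itinerary` if that group type was not generated.
itinerary_to_save = all_itineraries.get(trip.group_type, {}).get('itinerary', itinerary)

filename = f"{trip.city.lower()}_{trip.group_type}_{trip.num_days}days.json"
output_file = output_dir / filename

with open(output_file, 'w', encoding='utf-8') as f:
    json.dump(itinerary_to_save, f, indent=2, ensure_ascii=False)

print(f"\n✅ Itinerary saved to: {output_file}")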


✅ Itinerary saved to: ../data/itineraries/paris_honeymoon_4days.json
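
The cell above writes a single file. To keep every persona's itinerary, a loop over `all_itineraries` could write one file per group type using the same naming scheme. A minimal sketch: `save_all_itineraries` is a hypothetical helper, the demo data is a stand-in with the same shape as `all_itineraries`, and `default=str` is a precaution for values such as dates rather than something the notebook is known to need.

```python
import json
import tempfile
from pathlib import Path


def save_all_itineraries(all_itineraries, city, num_days, output_dir):
    """Write one JSON file per persona and return the paths written."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for group_type, data in all_itineraries.items():
        path = output_dir / f"{city.lower()}_{group_type}_{num_days}days.json"
        with open(path, 'w', encoding='utf-8') as f:
            # default=str converts non-JSON types (e.g. dates) to strings
            json.dump(data['itinerary'], f, indent=2, ensure_ascii=False, default=str)
        written.append(path)
    return written


# Stand-in data shaped like `all_itineraries` (keys are group types).
demo = {
    'solo': {'config': {}, 'itinerary': {'days': [], 'summary': {'total_attractions': 0}}},
    'honeymoon': {'config': {}, 'itinerary': {'days': [], 'summary': {'total_attractions': 0}}},
}

with tempfile.TemporaryDirectory() as tmp:
    paths = save_all_itineraries(demo, 'Paris', 4, tmp)
    saved_names = sorted(p.name for p in paths)
    reloaded = json.loads(paths[0].read_text(encoding='utf-8'))

print(saved_names)
```

In the notebook this would be called as `save_all_itineraries(all_itineraries, city_config['name'], 4, output_dir)`, where 4 matches the `num_days` used in the persona loop.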


---
## 🎉 Summary

### What We Did:
1. ✅ **Added City** - Configured bounding box and neighborhoods
2. ✅ **Fetched POIs** - From Overture Maps via BigQuery
3. ✅ **Scored POIs** - Generated persona scores
4. ✅ **Saved Data** - To seed file for API use
5. ✅ **Defined Trip** - Persona, budget, days
6. ✅ **Generated Itinerary** - Where to stay + What to visit

### Next Steps:
```bash
# Load POIs into database
python -m data.scripts.seed_data {city}

# Start API server
uvicorn app.main:app --reload

# Use the API (HTTP endpoints, not shell commands)
# POST /api/v1/recommendations/first-level
# POST /api/v1/itinerary/generate
```