# 🏛️ Rome POI Clustering Analysis
## Using Overture Maps Data (BigQuery) for Persona-Driven Itinerary Planning

---

### Objectives

1. **Fetch Real POI Data** from Overture Maps via Google BigQuery
2. **Cluster POIs** into walkable day-zones
3. **Profile Each Cluster** - Which personas/groups fit best?
4. **Plan Routes** - How pacing affects itinerary within each cluster

### Data Source
```
BigQuery Public Dataset: bigquery-public-data.overture_maps
Tables: place, address, building, segment
```

---

In [None]:
# Install required packages
# !pip install google-cloud-bigquery pandas numpy plotly scikit-learn db-dtypes pandas-gbq

In [None]:
import pandas as pd
import numpy as np
import json
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# BigQuery
try:
    from google.cloud import bigquery
    from google.oauth2 import service_account
    BIGQUERY_AVAILABLE = True
except ImportError:
    BIGQUERY_AVAILABLE = False
    print("⚠️ google-cloud-bigquery not installed. Run: pip install google-cloud-bigquery db-dtypes")

# Clustering
from sklearn.cluster import DBSCAN, KMeans, AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

# Visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Distance calculations
from math import radians, cos, sin, asin, sqrt, atan2

# Set style
import plotly.io as pio
pio.templates.default = "plotly_white"

# Colors
COLORS = px.colors.qualitative.Set2
PERSONA_COLORS = {
    'family': '#4CAF50',
    'couple': '#E91E63',
    'honeymoon': '#FF4081',
    'solo': '#2196F3',
    'friends': '#FF9800',
    'seniors': '#9C27B0',
    'business': '#607D8B'
}

print("✅ Libraries loaded!")

---
## 1. Fetch Rome Data from Overture Maps (BigQuery)

### BigQuery Public Dataset
```
Project: bigquery-public-data
Dataset: overture_maps
Tables:
  - place (POIs - restaurants, attractions, etc.)
  - address
  - building
  - segment (roads)
```

In [None]:
# Rome bounding box coordinates
ROME_BBOX = {
    'min_lon': 12.40,
    'min_lat': 41.85,
    'max_lon': 12.55,
    'max_lat': 41.95
}

# Rome neighborhoods for reference
ROME_NEIGHBORHOODS = {
    'Centro Storico': {'lat': 41.8986, 'lon': 12.4769, 'radius': 0.015},
    'Trastevere': {'lat': 41.8894, 'lon': 12.4700, 'radius': 0.012},
    'Vatican City': {'lat': 41.9029, 'lon': 12.4534, 'radius': 0.010},
    'Testaccio': {'lat': 41.8767, 'lon': 12.4744, 'radius': 0.008},
    'Monti': {'lat': 41.8956, 'lon': 12.4939, 'radius': 0.008},
    'Aventine': {'lat': 41.8826, 'lon': 12.4791, 'radius': 0.008},
    'Prati': {'lat': 41.9071, 'lon': 12.4600, 'radius': 0.010},
    'Villa Borghese': {'lat': 41.9125, 'lon': 12.4850, 'radius': 0.012},
    'Colosseo': {'lat': 41.8902, 'lon': 12.4922, 'radius': 0.010},
    'San Lorenzo': {'lat': 41.8970, 'lon': 12.5150, 'radius': 0.008}
}

print(f"📍 Rome Bounding Box:")
print(f"   Lat: {ROME_BBOX['min_lat']} to {ROME_BBOX['max_lat']}")
print(f"   Lon: {ROME_BBOX['min_lon']} to {ROME_BBOX['max_lon']}")
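
For a sense of scale, the bounding box can be converted from degrees to kilometres. The sketch below is self-contained and assumes a spherical Earth with a 6371 km radius; `haversine_km` and `bbox` are illustrative names that repeat the values of `ROME_BBOX`.

```python
from math import radians, sin, cos, asin, sqrt

# Same corner values as ROME_BBOX in the cell above.
bbox = {'min_lon': 12.40, 'min_lat': 41.85, 'max_lon': 12.55, 'max_lat': 41.95}

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

# Height is measured along a meridian, width along the box's middle latitude.
mid_lat = (bbox['min_lat'] + bbox['max_lat']) / 2
height_km = haversine_km(bbox['min_lon'], bbox['min_lat'], bbox['min_lon'], bbox['max_lat'])
width_km = haversine_km(bbox['min_lon'], mid_lat, bbox['max_lon'], mid_lat)

print(f"Bounding box is roughly {width_km:.1f} km wide and {height_km:.1f} km tall")
```

A degree of longitude is shorter than a degree of latitude at Rome's latitude, which is why the width is measured at the middle latitude rather than read off the raw degree difference.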

In [None]:
def fetch_overture_from_bigquery(project_id=None):
    """
    Fetch Rome POI data from Overture Maps via BigQuery.
    
    BigQuery Dataset: bigquery-public-data.overture_maps
    Note: geometry is GEOGRAPHY type, use ST_X/ST_Y to extract coordinates
    """
    
    if not BIGQUERY_AVAILABLE:
        print("❌ BigQuery not available")
        return None
    
    # Initialize BigQuery client
    try:
        if project_id:
            client = bigquery.Client(project=project_id)
        else:
            client = bigquery.Client()
        print(f"✅ Connected to BigQuery (Project: {client.project})")
    except Exception as e:
        print(f"⚠️ BigQuery connection failed: {e}")
        print("   Tip: Run 'gcloud auth application-default login' or set up service account")
        return None
    
    # Query Overture Maps place data for Rome
    # Using simplified query - core fields only (avoiding complex nested arrays)
    query = f"""
    SELECT 
        id,
        names.primary AS name,
        categories.primary AS category,
        categories.alternate AS subcategories,
        ST_Y(geometry) AS latitude,
        ST_X(geometry) AS longitude,
        confidence
    FROM 
        `bigquery-public-data.overture_maps.place`
    WHERE 
        ST_X(geometry) BETWEEN {ROME_BBOX['min_lon']} AND {ROME_BBOX['max_lon']}
        AND ST_Y(geometry) BETWEEN {ROME_BBOX['min_lat']} AND {ROME_BBOX['max_lat']}
        AND confidence > 0.6
        AND names.primary IS NOT NULL
    LIMIT 5000
    """
    
    print("\n🔄 Fetching data from BigQuery (Overture Maps)...")
    print(f"   Query: Rome POIs within bounding box")
    print(f"   Using ST_X/ST_Y for GEOGRAPHY extraction")
    
    try:
        df = client.query(query).to_dataframe()
        print(f"✅ Fetched {len(df)} POIs from Overture Maps!")
        return df
    except Exception as e:
        print(f"❌ Query failed: {e}")
        return None

# Alternative: Use pandas-gbq (simpler authentication)
def fetch_overture_pandas_gbq(project_id):
    """
    Alternative method using pandas-gbq.
    Requires: pip install pandas-gbq
    """
    try:
        import pandas_gbq
    except ImportError:
        print("⚠️ pandas-gbq not installed. Run: pip install pandas-gbq")
        return None
    
    query = f"""
    SELECT 
        id,
        names.primary AS name,
        categories.primary AS category,
        categories.alternate AS subcategories,
        ST_Y(geometry) AS latitude,
        ST_X(geometry) AS longitude,
        confidence
    FROM 
        `bigquery-public-data.overture_maps.place`
    WHERE 
        ST_X(geometry) BETWEEN {ROME_BBOX['min_lon']} AND {ROME_BBOX['max_lon']}
        AND ST_Y(geometry) BETWEEN {ROME_BBOX['min_lat']} AND {ROME_BBOX['max_lat']}
        AND confidence > 0.6
        AND names.primary IS NOT NULL
    LIMIT 5000
    """
    
    print("\n🔄 Fetching via pandas-gbq...")
    print(f"   Using ST_X/ST_Y for GEOGRAPHY extraction")
    try:
        df = pandas_gbq.read_gbq(query, project_id=project_id)
        print(f"✅ Fetched {len(df)} POIs!")
        return df
    except Exception as e:
        print(f"❌ Failed: {e}")
        return None
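
Both fetch functions embed the same SQL text. One way to keep the two copies from drifting apart is to build the query in a single place. The helper below is a hypothetical sketch (`build_rome_places_query` and its parameters are not part of the notebook); it reproduces the query shown above and coerces its inputs to numbers before formatting them into the string.

```python
def build_rome_places_query(bbox, min_confidence=0.6, limit=5000):
    """Return the Overture places query for a bounding box.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Coerce to float/int so that only numeric literals reach the SQL text.
    min_lon, max_lon = float(bbox['min_lon']), float(bbox['max_lon'])
    min_lat, max_lat = float(bbox['min_lat']), float(bbox['max_lat'])
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("bbox min values must be smaller than max values")

    return f"""
    SELECT
        id,
        names.primary AS name,
        categories.primary AS category,
        categories.alternate AS subcategories,
        ST_Y(geometry) AS latitude,
        ST_X(geometry) AS longitude,
        confidence
    FROM
        `bigquery-public-data.overture_maps.place`
    WHERE
        ST_X(geometry) BETWEEN {min_lon} AND {max_lon}
        AND ST_Y(geometry) BETWEEN {min_lat} AND {max_lat}
        AND confidence > {float(min_confidence)}
        AND names.primary IS NOT NULL
    LIMIT {int(limit)}
    """

query = build_rome_places_query(
    {'min_lon': 12.40, 'min_lat': 41.85, 'max_lon': 12.55, 'max_lat': 41.95}
)
print(query)
```

Either fetch function could then pass the returned string to `client.query(...)` or `pandas_gbq.read_gbq(...)` exactly as it does now.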

In [None]:
# ============================================
# 🔧 CONFIGURE YOUR PROJECT ID HERE
# ============================================
# Replace with your GCP project ID
PROJECT_ID = "gen-lang-client-0518072406"  # Your project ID

# Try to fetch from BigQuery
overture_df = fetch_overture_from_bigquery(project_id=PROJECT_ID)

# If that fails, try pandas-gbq
if overture_df is None:
    print("\n🔄 Trying pandas-gbq method...")
    overture_df = fetch_overture_pandas_gbq(PROJECT_ID)

In [None]:
# Fallback: Generate rich sample data if BigQuery fails
def generate_rome_sample_data(n_pois=500):
    """Generate realistic Rome POI data (fallback if BigQuery unavailable)."""
    np.random.seed(42)
    
    # POI types matching Overture categories
    categories = {
        'restaurant': 0.30,
        'cafe': 0.12,
        'bar': 0.08,
        'attraction': 0.15,
        'museum': 0.05,
        'hotel': 0.08,
        'shop': 0.12,
        'church': 0.05,
        'park': 0.03,
        'entertainment': 0.02
    }
    
    # Famous Rome POIs
    famous_pois = [
        {'name': 'Colosseum', 'lat': 41.8902, 'lon': 12.4922, 'category': 'attraction'},
        {'name': 'Vatican Museums', 'lat': 41.9065, 'lon': 12.4536, 'category': 'museum'},
        {'name': 'St. Peters Basilica', 'lat': 41.9022, 'lon': 12.4539, 'category': 'church'},
        {'name': 'Trevi Fountain', 'lat': 41.9009, 'lon': 12.4833, 'category': 'attraction'},
        {'name': 'Pantheon', 'lat': 41.8986, 'lon': 12.4769, 'category': 'attraction'},
        {'name': 'Roman Forum', 'lat': 41.8925, 'lon': 12.4853, 'category': 'attraction'},
        {'name': 'Spanish Steps', 'lat': 41.9060, 'lon': 12.4828, 'category': 'attraction'},
        {'name': 'Piazza Navona', 'lat': 41.8992, 'lon': 12.4731, 'category': 'attraction'},
        {'name': 'Borghese Gallery', 'lat': 41.9142, 'lon': 12.4921, 'category': 'museum'},
        {'name': 'Castel Sant Angelo', 'lat': 41.9031, 'lon': 12.4663, 'category': 'attraction'},
        {'name': 'Campo de Fiori', 'lat': 41.8956, 'lon': 12.4720, 'category': 'attraction'},
        {'name': 'Villa Borghese', 'lat': 41.9125, 'lon': 12.4850, 'category': 'park'},
        {'name': 'Trastevere', 'lat': 41.8867, 'lon': 12.4688, 'category': 'attraction'},
        {'name': 'Testaccio Market', 'lat': 41.8767, 'lon': 12.4744, 'category': 'shop'},
        {'name': 'Aventine Keyhole', 'lat': 41.8826, 'lon': 12.4791, 'category': 'attraction'},
    ]
    
    pois = []
    
    # Add famous POIs
    for fp in famous_pois:
        pois.append({
            'id': f"famous_{len(pois)}",
            'name': fp['name'],
            'latitude': fp['lat'],
            'longitude': fp['lon'],
            'category': fp['category'],
            'confidence': 0.99,
            'is_famous': True
        })
    
    # Generate remaining POIs around neighborhoods
    remaining = n_pois - len(pois)
    
    for i in range(remaining):
        # Pick neighborhood
        nb_name = np.random.choice(list(ROME_NEIGHBORHOODS.keys()))
        nb = ROME_NEIGHBORHOODS[nb_name]
        
        # Generate location
        lat = nb['lat'] + np.random.normal(0, nb['radius'])
        lon = nb['lon'] + np.random.normal(0, nb['radius'] * 1.3)
        
        # Pick category
        category = np.random.choice(list(categories.keys()), p=list(categories.values()))
        
        pois.append({
            'id': f"poi_{len(pois)}",
            'name': f"{category.title()} {len(pois)}",
            'latitude': lat,
            'longitude': lon,
            'category': category,
            'confidence': np.random.uniform(0.7, 0.99),
            'is_famous': False
        })
    
    return pd.DataFrame(pois)

# Use BigQuery data or fallback to sample
if overture_df is not None and len(overture_df) > 100:
    df = overture_df.copy()
    print(f"\n✅ Using Overture Maps BigQuery data: {len(df)} POIs")
else:
    print("\n⚠️ Using generated sample data (BigQuery unavailable)")
    df = generate_rome_sample_data(500)
    print(f"✅ Generated {len(df)} sample POIs for Rome")

In [None]:
# Data preview
print("\n" + "="*70)
print("📊 DATA PREVIEW")
print("="*70)
print(f"\nColumns: {list(df.columns)}")
print(f"\nShape: {df.shape}")
df.head(10)

In [None]:
# Clean and enrich data
def assign_neighborhood(lat, lon):
    """Assign neighborhood based on proximity."""
    min_dist = float('inf')
    nearest = 'Other'
    
    for name, coords in ROME_NEIGHBORHOODS.items():
        dist = ((lat - coords['lat'])**2 + (lon - coords['lon'])**2)**0.5
        if dist < min_dist and dist < coords['radius'] * 1.5:
            min_dist = dist
            nearest = name
    
    return nearest

def map_to_itinerary_category(overture_category):
    """Map Overture categories to our itinerary categories."""
    if overture_category is None:
        return 'other'
    
    cat = str(overture_category).lower()
    
    # Attractions
    if any(x in cat for x in ['museum', 'gallery', 'art']):
        return 'museum'
    if any(x in cat for x in ['church', 'basilica', 'cathedral', 'chapel']):
        return 'church'
    if any(x in cat for x in ['monument', 'memorial', 'statue', 'fountain']):
        return 'monument'
    if any(x in cat for x in ['ruin', 'archaeological', 'ancient', 'historic']):
        return 'historical_site'
    if any(x in cat for x in ['park', 'garden', 'green']):
        return 'park'
    if any(x in cat for x in ['viewpoint', 'scenic', 'lookout']):
        return 'viewpoint'
    if any(x in cat for x in ['attraction', 'landmark', 'sight', 'tourist']):
        return 'attraction'
    
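    # Food & Drink
    # Order matters because matching is by substring: specific checks run
    # first so 'pizza_restaurant' is not caught by 'restaurant', and
    # 'wine_bar' / 'cocktail_bar' are not caught by the generic 'bar' token.
    if any(x in cat for x in ['pizza', 'pizzeria']):
        return 'pizzeria'
    if any(x in cat for x in ['restaurant', 'ristorante', 'trattoria', 'osteria']):
        return 'restaurant'
    if any(x in cat for x in ['gelat', 'ice cream', 'ice_cream']):
        return 'gelateria'
    if any(x in cat for x in ['wine', 'enoteca']):
        return 'wine_bar'
    if any(x in cat for x in ['pub', 'beer', 'cocktail', 'nightclub', 'club']):
        return 'bar'
    # A plain 'bar' still maps to 'cafe', as before.
    if any(x in cat for x in ['cafe', 'coffee', 'caffè', 'espresso', 'bar']):
        return 'cafe'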
    
    # Shopping
    if any(x in cat for x in ['shop', 'store', 'boutique', 'market', 'retail']):
        return 'shop'
    
    # Accommodation
    if any(x in cat for x in ['hotel', 'hostel', 'accommodation', 'b&b']):
        return 'hotel'
    
    return 'other'

def get_itinerary_supercategory(category):
    """Group into main itinerary categories."""
    attractions = ['museum', 'church', 'monument', 'historical_site', 'park', 'viewpoint', 'attraction']
    restaurants = ['restaurant', 'pizzeria', 'trattoria']
    cafes = ['cafe', 'gelateria']
    nightlife = ['bar', 'wine_bar', 'club']
    shopping = ['shop', 'market']
    
    if category in attractions:
        return 'attraction'
    elif category in restaurants:
        return 'restaurant'
    elif category in cafes:
        return 'cafe'
    elif category in nightlife:
        return 'nightlife'
    elif category in shopping:
        return 'shopping'
    else:
        return 'other'

# Apply transformations
df['neighborhood'] = df.apply(lambda x: assign_neighborhood(x['latitude'], x['longitude']), axis=1)
df['subcategory'] = df['category'].apply(map_to_itinerary_category)
df['main_category'] = df['subcategory'].apply(get_itinerary_supercategory)

# Add estimated duration based on category
duration_map = {
    'museum': 120, 'church': 45, 'monument': 30, 'historical_site': 90,
    'park': 60, 'viewpoint': 20, 'attraction': 60,
    'restaurant': 75, 'pizzeria': 60, 'trattoria': 75,
    'cafe': 30, 'gelateria': 20,
    'bar': 60, 'wine_bar': 75,
    'shop': 30, 'market': 60,
    'other': 30
}
df['typical_duration'] = df['subcategory'].map(duration_map).fillna(30)

# Add cost level estimate
cost_map = {
    'museum': 3, 'church': 1, 'monument': 1, 'historical_site': 3,
    'park': 1, 'viewpoint': 1, 'attraction': 2,
    'restaurant': 3, 'pizzeria': 2, 'trattoria': 2,
    'cafe': 1, 'gelateria': 1,
    'bar': 2, 'wine_bar': 3,
    'shop': 2, 'market': 2,
    'other': 2
}
df['cost_level'] = df['subcategory'].map(cost_map).fillna(2)

print(f"\n✅ Data enriched!")
print(f"\nCategory distribution:")
print(df['main_category'].value_counts())
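
`map_to_itinerary_category` matches keywords as substrings, so a short keyword can fire inside an unrelated word. The sketch below contrasts that with whole-token matching. `category_tokens` and `has_any_token` are hypothetical helpers, and the category strings are illustrative inputs rather than values taken from the dataset.

```python
import re

def category_tokens(overture_category):
    """Split a category such as 'art_gallery' into lowercase word tokens."""
    if overture_category is None:
        return set()
    # Split on underscores and any non-word character (spaces, '&', ...).
    return {t for t in re.split(r'[\W_]+', str(overture_category).lower()) if t}

def has_any_token(overture_category, keywords):
    """True when any keyword equals a whole token of the category."""
    return bool(category_tokens(overture_category) & set(keywords))

# Substring matching: 'art' is found inside 'department'.
print('art' in 'department_store')

# Token matching: only whole words count.
print(has_any_token('department_store', ['art']))
print(has_any_token('art_gallery', ['art', 'gallery']))
```

Token matching trades recall for precision: prefix keywords such as `'gelat'` would no longer match `'gelateria'` and would need to be listed in full.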

In [None]:
# Overview statistics
print("\n" + "="*70)
print("📊 ROME DATA OVERVIEW (Overture Maps)")
print("="*70)

print(f"\n📍 Total POIs: {len(df)}")
print(f"\n🏛️ By Main Category:")
for cat, count in df['main_category'].value_counts().items():
    pct = count / len(df) * 100
    print(f"   {cat.title():15} {count:5} ({pct:5.1f}%)")

print(f"\n📍 By Neighborhood:")
for nb, count in df['neighborhood'].value_counts().head(10).items():
    pct = count / len(df) * 100
    print(f"   {nb:20} {count:5} ({pct:5.1f}%)")

In [None]:
# Visualize all POIs on map
fig = px.scatter_mapbox(
    df,
    lat='latitude',
    lon='longitude',
    color='main_category',
    hover_name='name',
    hover_data=['subcategory', 'neighborhood'],
    title='<b>Rome POIs from Overture Maps</b><br><sup>Colored by category</sup>',
    zoom=12,
    height=600,
    color_discrete_sequence=px.colors.qualitative.Set2
)

fig.update_layout(
    mapbox_style='carto-positron',
    margin={'r':0,'t':80,'l':0,'b':0}
)

fig.show()

---
## 2. Geographic Clustering - Day Zones

Cluster POIs into **walkable day-zones** at three scales. Each distance is the DBSCAN neighbor radius (`eps`), so a chained zone can span more than this:
- **Micro zones** (300m) - Quick walkable cluster
- **Half-day zones** (500m) - Morning or afternoon
- **Full-day zones** (800m) - Entire day exploration
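
scikit-learn's `haversine` metric works on `[lat, lon]` pairs in radians and measures distance as an angle on the unit sphere, so a radius in kilometres has to be divided by the Earth's radius before it is passed as `eps`. A minimal sketch of that conversion, using the 0.3, 0.5 and 0.8 km radii from the clustering cell in this section and assuming a mean Earth radius of 6371 km (`km_to_radians` is an illustrative name):

```python
EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def km_to_radians(distance_km):
    """Convert a surface distance in km to an angle in radians."""
    return distance_km / EARTH_RADIUS_KM

for label, km in [('micro', 0.3), ('half-day', 0.5), ('full-day', 0.8)]:
    print(f"{label:9} eps = {km_to_radians(km):.6f} rad")
```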

In [None]:
def haversine(lon1, lat1, lon2, lat2):
    """Calculate distance in km."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    return 6371 * 2 * asin(sqrt(a))

# Prepare coordinates
coords = df[['latitude', 'longitude']].values
coords_rad = np.radians(coords)

# DBSCAN clustering at different scales
zone_configs = {
    'micro_zone': {'eps_km': 0.3, 'min_samples': 5, 'description': 'Quick stops (5-10 min walk)'},
    'half_day_zone': {'eps_km': 0.5, 'min_samples': 8, 'description': 'Half-day area (15 min walk)'},
    'day_zone': {'eps_km': 0.8, 'min_samples': 15, 'description': 'Full day zone (20-25 min walk)'}
}

print("\n" + "="*70)
print("🗺️ CLUSTERING RESULTS")
print("="*70)

for zone_name, config in zone_configs.items():
    eps_rad = config['eps_km'] / 6371.0
    dbscan = DBSCAN(eps=eps_rad, min_samples=config['min_samples'], metric='haversine')
    df[zone_name] = dbscan.fit_predict(coords_rad)
    
    n_clusters = len(set(df[zone_name])) - (1 if -1 in df[zone_name].values else 0)
    n_noise = (df[zone_name] == -1).sum()
    n_clustered = len(df) - n_noise
    
    print(f"\n{zone_name.upper()} ({config['eps_km']}km / {config['description']}):")
    print(f"   📍 Clusters found: {n_clusters}")
    print(f"   ✅ Clustered POIs: {n_clustered} ({n_clustered/len(df)*100:.1f}%)")
    print(f"   ❌ Isolated POIs: {n_noise}")
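
Because DBSCAN links any two points that lie within `eps` of each other, a chain of nearby POIs can produce a zone much wider than `eps` itself. A quick way to check whether a zone is still walkable is to measure its diameter, meaning the largest pairwise distance. The sketch below uses hypothetical points and helper names, and a brute-force O(n²) scan that is only suitable for zones of modest size.

```python
from itertools import combinations
from math import radians, sin, cos, asin, sqrt

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

def cluster_diameter_km(points):
    """Largest pairwise distance among (lat, lon) points, in km."""
    return max(
        (haversine_km(lon1, lat1, lon2, lat2)
         for (lat1, lon1), (lat2, lon2) in combinations(points, 2)),
        default=0.0,
    )

# Three hypothetical points spaced about 0.5 km apart along a meridian.
# One degree of latitude is about 111.195 km on a 6371 km sphere.
step_deg = 0.5 / 111.195
chain = [(41.90 + i * step_deg, 12.48) for i in range(3)]

print(f"Diameter of the chain: {cluster_diameter_km(chain):.2f} km")
```

With `eps_km = 0.5` each neighbouring pair in this chain is linkable, yet the end points are about twice that distance apart.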

In [None]:
# Visualize Day Zones
df_day_zones = df[df['day_zone'] >= 0].copy()

fig = px.scatter_mapbox(
    df_day_zones,
    lat='latitude',
    lon='longitude',
    color='day_zone',
    hover_name='name',
    hover_data=['main_category', 'neighborhood', 'typical_duration'],
    title='<b>Rome Day Zones</b><br><sup>Each color = one full-day exploration area</sup>',
    zoom=12,
    height=700,
    color_continuous_scale='Turbo'
)

fig.update_layout(
    mapbox_style='carto-positron',
    margin={'r':0,'t':80,'l':0,'b':0}
)

fig.show()

n_day_zones = df_day_zones['day_zone'].nunique()
print(f"\n📍 Identified {n_day_zones} distinct day zones in Rome")

---
## 3. Cluster Profiling - Persona & Group Tagging

For each day zone, we determine:
- **Dominant vibes**: Cultural, Foodie, Romantic, etc.
- **Best group types**: Families, Couples, Solo travelers, etc.
- **Zone characteristics**: What makes this zone special?

In [None]:
def calculate_persona_scores(cluster_df):
    """
    Calculate persona fit scores based on POI composition.
    Returns vibe_scores and group_scores dictionaries.
    """
    
    # Category -> Persona mapping (which personas like which POI types)
    category_vibes = {
        'museum': {'cultural': 0.95, 'photography': 0.7, 'relaxation': 0.5},
        'church': {'cultural': 0.9, 'photography': 0.8, 'romantic': 0.6},
        'monument': {'cultural': 0.8, 'photography': 0.9, 'romantic': 0.5},
        'historical_site': {'cultural': 0.95, 'adventure': 0.6, 'photography': 0.8},
        'park': {'nature': 0.95, 'relaxation': 0.9, 'romantic': 0.7, 'family': 0.8},
        'viewpoint': {'photography': 0.95, 'romantic': 0.9},
        'attraction': {'cultural': 0.7, 'photography': 0.7},
        'restaurant': {'foodie': 0.9, 'romantic': 0.6, 'relaxation': 0.5},
        'pizzeria': {'foodie': 0.8, 'family': 0.7, 'budget': 0.8},
        'trattoria': {'foodie': 0.95, 'cultural': 0.6, 'romantic': 0.7},
        'cafe': {'relaxation': 0.8, 'romantic': 0.6, 'foodie': 0.5},
        'gelateria': {'foodie': 0.7, 'family': 0.8, 'romantic': 0.6},
        'bar': {'nightlife': 0.9, 'social': 0.8},
        'wine_bar': {'foodie': 0.8, 'romantic': 0.85, 'nightlife': 0.6},
        'shop': {'shopping': 0.9},
        'market': {'foodie': 0.7, 'cultural': 0.6, 'shopping': 0.8, 'photography': 0.6}
    }
    
    category_groups = {
        'museum': {'couple': 0.8, 'solo': 0.9, 'seniors': 0.8, 'family': 0.6},
        'church': {'couple': 0.7, 'seniors': 0.85, 'solo': 0.8},
        'monument': {'couple': 0.8, 'friends': 0.7, 'solo': 0.8, 'family': 0.7},
        'historical_site': {'couple': 0.8, 'solo': 0.85, 'friends': 0.7, 'family': 0.6},
        'park': {'family': 0.95, 'couple': 0.85, 'friends': 0.8, 'seniors': 0.7},
        'viewpoint': {'couple': 0.95, 'honeymoon': 0.95, 'solo': 0.8, 'friends': 0.8},
        'attraction': {'family': 0.7, 'couple': 0.7, 'friends': 0.7, 'solo': 0.7},
        'restaurant': {'couple': 0.85, 'honeymoon': 0.8, 'friends': 0.8, 'business': 0.7, 'family': 0.7},
        'pizzeria': {'family': 0.9, 'friends': 0.85, 'solo': 0.7, 'budget': 0.9},
        'trattoria': {'couple': 0.9, 'honeymoon': 0.85, 'friends': 0.8, 'seniors': 0.8},
        'cafe': {'solo': 0.9, 'couple': 0.8, 'friends': 0.7, 'business': 0.8},
        'gelateria': {'family': 0.95, 'couple': 0.8, 'friends': 0.8},
        'bar': {'friends': 0.95, 'solo': 0.7, 'couple': 0.7},
        'wine_bar': {'couple': 0.9, 'honeymoon': 0.9, 'friends': 0.8},
        'shop': {'solo': 0.8, 'couple': 0.7, 'friends': 0.7},
        'market': {'foodie': 0.8, 'couple': 0.75, 'solo': 0.85, 'family': 0.6}
    }
    
    # Initialize scores
    vibes = ['cultural', 'foodie', 'romantic', 'adventure', 'nightlife', 
             'shopping', 'relaxation', 'nature', 'photography']
    groups = ['family', 'couple', 'honeymoon', 'solo', 'friends', 'seniors', 'business']
    
    vibe_scores = {v: 0 for v in vibes}
    group_scores = {g: 0 for g in groups}
    
    total_weight = 0
    
    for _, poi in cluster_df.iterrows():
        subcat = poi.get('subcategory', 'other')
        weight = 1.0
        
        # Add vibe scores
        if subcat in category_vibes:
            for vibe, score in category_vibes[subcat].items():
                if vibe in vibe_scores:
                    vibe_scores[vibe] += score * weight
        
        # Add group scores
        if subcat in category_groups:
            for group, score in category_groups[subcat].items():
                if group in group_scores:
                    group_scores[group] += score * weight
        
        total_weight += weight
    
    # Normalize
    if total_weight > 0:
        vibe_scores = {k: min(v/total_weight * 2, 1.0) for k, v in vibe_scores.items()}
        group_scores = {k: min(v/total_weight * 2, 1.0) for k, v in group_scores.items()}
    
    return vibe_scores, group_scores


def profile_day_zone(zone_id, zone_df):
    """Create comprehensive profile for a day zone."""
    
    profile = {
        'zone_id': zone_id,
        'total_pois': len(zone_df),
        'center_lat': zone_df['latitude'].mean(),
        'center_lon': zone_df['longitude'].mean(),
    }
    
    # Category mix
    profile['category_mix'] = zone_df['main_category'].value_counts().to_dict()
    profile['subcategory_mix'] = zone_df['subcategory'].value_counts().head(5).to_dict()
    
    # Primary neighborhood
    profile['primary_neighborhood'] = zone_df['neighborhood'].mode().iloc[0] if len(zone_df) > 0 else 'Unknown'
    profile['neighborhoods'] = zone_df['neighborhood'].unique().tolist()
    
    # Time & cost
    profile['total_duration_hours'] = zone_df['typical_duration'].sum() / 60
    profile['avg_cost_level'] = zone_df['cost_level'].mean()
    
    # Persona scores
    vibe_scores, group_scores = calculate_persona_scores(zone_df)
    profile['vibe_scores'] = vibe_scores
    profile['group_scores'] = group_scores
    
    # Top matches
    profile['top_vibes'] = sorted(vibe_scores.items(), key=lambda x: x[1], reverse=True)[:3]
    profile['top_groups'] = sorted(group_scores.items(), key=lambda x: x[1], reverse=True)[:3]
    
    # Generate zone name
    top_vibe = profile['top_vibes'][0][0] if profile['top_vibes'] else 'mixed'
    nb = profile['primary_neighborhood']
    
    name_templates = {
        'cultural': f"Historic {nb}",
        'foodie': f"{nb} Food Quarter",
        'romantic': f"Romantic {nb}",
        'nightlife': f"{nb} After Dark",
        'shopping': f"{nb} Shopping District",
        'relaxation': f"Peaceful {nb}",
        'nature': f"{nb} Gardens",
        'photography': f"Scenic {nb}"
    }
    profile['zone_name'] = name_templates.get(top_vibe, f"Exploring {nb}")
    
    # Notable POIs (if we have names)
    if 'is_famous' in zone_df.columns:
        famous = zone_df[zone_df['is_famous'] == True]['name'].tolist()
        profile['famous_pois'] = famous[:5]
    else:
        profile['famous_pois'] = []
    
    return profile
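
The normalization in `calculate_persona_scores` is the mean weight per POI, doubled and capped at 1.0. A small worked example for a single vibe makes the effect visible. The weights are the `cultural` entries copied from `category_vibes`; the toy zones and the `cultural_score` helper are hypothetical.

```python
# Cultural weights copied from category_vibes; other categories contribute 0.
cultural_weight = {
    'museum': 0.95, 'church': 0.9, 'monument': 0.8,
    'historical_site': 0.95, 'attraction': 0.7,
    'trattoria': 0.6, 'market': 0.6,
}

def cultural_score(subcategories):
    """Mean cultural weight per POI, doubled and capped at 1.0."""
    if not subcategories:
        return 0.0
    total = sum(cultural_weight.get(s, 0.0) for s in subcategories)
    return min(total / len(subcategories) * 2, 1.0)

mixed_zone = ['museum', 'museum', 'restaurant', 'shop']   # half cultural POIs
heritage_zone = ['museum', 'church', 'monument']          # all cultural POIs

print(f"mixed:    {cultural_score(mixed_zone):.2f}")
print(f"heritage: {cultural_score(heritage_zone):.2f}")
```

The doubling means a zone reaches the cap once its mean weight passes 0.5, so heavily cultural zones all score 1.0 and cannot be ranked against each other on this vibe.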

In [None]:
# Profile all day zones
zone_profiles = []

for zone_id in sorted(df_day_zones['day_zone'].unique()):
    zone_df = df_day_zones[df_day_zones['day_zone'] == zone_id]
    profile = profile_day_zone(zone_id, zone_df)
    zone_profiles.append(profile)

print(f"\n✅ Profiled {len(zone_profiles)} day zones")

In [None]:
# Display detailed zone profiles
print("\n" + "="*80)
print("🗺️ ROME DAY ZONE PROFILES")
print("="*80)

for profile in zone_profiles:
    print(f"\n{'━'*80}")
    print(f"📍 ZONE {profile['zone_id']}: {profile['zone_name']}")
    print(f"{'━'*80}")
    
    print(f"\n📊 Overview:")
    print(f"   📍 Total POIs: {profile['total_pois']}")
    print(f"   🏘️ Area: {profile['primary_neighborhood']}")
    print(f"   ⏱️ Content: {profile['total_duration_hours']:.1f} hours worth of activities")
    # cost_map levels run from 1 to 3; round instead of truncating the mean
    cost_symbol = '$' * int(round(profile['avg_cost_level']))
    print(f"   💰 Avg Cost: {cost_symbol} ({profile['avg_cost_level']:.1f}/3)")
    
    print(f"\n🏛️ What's Here:")
    for cat, count in profile['category_mix'].items():
        print(f"   • {cat.title()}: {count}")
    
    if profile['famous_pois']:
        print(f"\n⭐ Notable Spots:")
        for poi in profile['famous_pois']:
            print(f"   • {poi}")
    
    print(f"\n🎯 BEST FOR (Vibes):")
    for vibe, score in profile['top_vibes']:
        bar = '█' * int(score * 15) + '░' * (15 - int(score * 15))
        print(f"   {vibe.title():15} {bar} {score:.2f}")
    
    print(f"\n👥 BEST FOR (Groups):")
    for group, score in profile['top_groups']:
        bar = '█' * int(score * 15) + '░' * (15 - int(score * 15))
        print(f"   {group.title():15} {bar} {score:.2f}")

In [None]:
# Create heatmap: Groups vs Zones
groups = ['family', 'couple', 'honeymoon', 'solo', 'friends', 'seniors', 'business']

group_matrix = []
zone_labels = []

for profile in zone_profiles:
    row = [profile['group_scores'].get(g, 0) for g in groups]
    group_matrix.append(row)
    zone_labels.append(f"Zone {profile['zone_id']}: {profile['zone_name'][:25]}")

fig = px.imshow(
    group_matrix,
    x=[g.title() for g in groups],
    y=zone_labels,
    title='<b>Which Traveler Groups Fit Each Zone?</b><br><sup>Darker = Better Match</sup>',
    color_continuous_scale='YlOrRd',
    aspect='auto'
)

fig.update_layout(height=max(400, len(zone_profiles) * 40))
fig.show()

In [None]:
# Create heatmap: Vibes vs Zones
vibes = ['cultural', 'foodie', 'romantic', 'adventure', 'nightlife', 'shopping', 'relaxation', 'nature', 'photography']

vibe_matrix = []
for profile in zone_profiles:
    row = [profile['vibe_scores'].get(v, 0) for v in vibes]
    vibe_matrix.append(row)

fig = px.imshow(
    vibe_matrix,
    x=[v.title() for v in vibes],
    y=zone_labels,
    title='<b>Which Vibes Does Each Zone Offer?</b><br><sup>Darker = Stronger Vibe</sup>',
    color_continuous_scale='Blues',
    aspect='auto'
)

fig.update_layout(height=max(400, len(zone_profiles) * 40))
fig.show()

In [None]:
# Map zones colored by dominant vibe
profiles_df = pd.DataFrame(zone_profiles)
profiles_df['top_vibe'] = profiles_df['top_vibes'].apply(lambda x: x[0][0] if x else 'mixed')
profiles_df['top_group'] = profiles_df['top_groups'].apply(lambda x: x[0][0] if x else 'all')

fig = px.scatter_mapbox(
    profiles_df,
    lat='center_lat',
    lon='center_lon',
    size='total_pois',
    color='top_vibe',
    hover_name='zone_name',
    hover_data=['total_pois', 'primary_neighborhood', 'top_group', 'total_duration_hours'],
    title='<b>Day Zone Map - Colored by Dominant Vibe</b><br><sup>Size = number of POIs in zone</sup>',
    zoom=12,
    height=600,
    size_max=50,
    color_discrete_sequence=px.colors.qualitative.Bold
)

fig.update_layout(
    mapbox_style='carto-positron',
    margin={'r':0,'t':80,'l':0,'b':0}
)

fig.show()

---
## 4. Pacing-Based Route Planning

Different travelers want different pacing:

| Pacing | Activities/Day | Style | Best For |
|--------|---------------|-------|----------|
| **Slow** | 2-3 | Long meals, leisurely | Honeymoon, Seniors |
| **Moderate** | 4-5 | Balanced | Couples, Families |
| **Fast** | 6-7 | Efficient | Solo, Friends |

In [None]:
# Pacing configurations
PACING_CONFIG = {
    'slow': {
        'activities_per_day': 3,
        'meal_duration': 90,
        'buffer_minutes': 60,
        'max_walking_km': 3,
        'description': '🐢 Relaxed pace with long meals & breaks',
        'best_for': ['honeymoon', 'seniors', 'relaxation seekers']
    },
    'moderate': {
        'activities_per_day': 5,
        'meal_duration': 60,
        'buffer_minutes': 30,
        'max_walking_km': 6,
        'description': '🚶 Balanced sightseeing with time to enjoy',
        'best_for': ['couples', 'families', 'first-timers']
    },
    'fast': {
        'activities_per_day': 7,
        'meal_duration': 45,
        'buffer_minutes': 15,
        'max_walking_km': 10,
        'description': '🏃 Maximum coverage, efficient schedule',
        'best_for': ['solo travelers', 'friends groups', 'short trips']
    }
}

def create_day_itinerary(zone_df, pacing='moderate', start_time='09:00'):
    """Create a day itinerary within a zone by randomly sampling POIs (stops are not route-optimized)."""
    
    config = PACING_CONFIG[pacing]
    zone_df = zone_df.copy()
    
    # Separate by category
    attractions = zone_df[zone_df['main_category'] == 'attraction'].copy()
    restaurants = zone_df[zone_df['main_category'] == 'restaurant'].copy()
    cafes = zone_df[zone_df['main_category'] == 'cafe'].copy()
    
    # Build schedule
    schedule = []
    current_time = pd.to_datetime(start_time, format='%H:%M')
    
    # Morning coffee
    if len(cafes) > 0:
        cafe = cafes.sample(1).iloc[0]
        schedule.append({
            'time': current_time.strftime('%H:%M'),
            'name': cafe['name'],
            'type': 'breakfast',
            'category': cafe['subcategory'],
            'duration': 30,
            'lat': cafe['latitude'],
            'lon': cafe['longitude']
        })
        current_time += pd.Timedelta(minutes=30 + config['buffer_minutes']//2)
    
    # Morning attractions
    morning_count = config['activities_per_day'] // 2
    morning_attractions = attractions.sample(min(morning_count, len(attractions)))
    
    for _, poi in morning_attractions.iterrows():
        duration = int(poi['typical_duration'])
        schedule.append({
            'time': current_time.strftime('%H:%M'),
            'name': poi['name'],
            'type': 'attraction',
            'category': poi['subcategory'],
            'duration': duration,
            'lat': poi['latitude'],
            'lon': poi['longitude']
        })
        current_time += pd.Timedelta(minutes=duration + config['buffer_minutes'])
    
    # Lunch
    if current_time.hour < 12:
        current_time = current_time.replace(hour=12, minute=30)
    
    if len(restaurants) > 0:
        restaurant = restaurants.sample(1).iloc[0]
        schedule.append({
            'time': current_time.strftime('%H:%M'),
            'name': restaurant['name'],
            'type': 'lunch',
            'category': restaurant['subcategory'],
            'duration': config['meal_duration'],
            'lat': restaurant['latitude'],
            'lon': restaurant['longitude']
        })
        current_time += pd.Timedelta(minutes=config['meal_duration'] + config['buffer_minutes'])
    
    # Afternoon attractions
    afternoon_count = config['activities_per_day'] - morning_count
    remaining_attractions = attractions[~attractions['name'].isin(morning_attractions['name'])]
    afternoon_attractions = remaining_attractions.sample(min(afternoon_count, len(remaining_attractions)))
    
    for _, poi in afternoon_attractions.iterrows():
        duration = int(poi['typical_duration'])
        schedule.append({
            'time': current_time.strftime('%H:%M'),
            'name': poi['name'],
            'type': 'attraction',
            'category': poi['subcategory'],
            'duration': duration,
            'lat': poi['latitude'],
            'lon': poi['longitude']
        })
        current_time += pd.Timedelta(minutes=duration + config['buffer_minutes'])
    
    # Dinner
    if current_time.hour < 19:
        current_time = current_time.replace(hour=19, minute=30)
    
    if len(restaurants) > 1:
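        # Exclude the lunch restaurant (the last schedule entry is an attraction, not lunch)
        lunch_names = [s['name'] for s in schedule if s['type'] == 'lunch']
        remaining_restaurants = restaurants[~restaurants['name'].isin(lunch_names)]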
        if len(remaining_restaurants) > 0:
            dinner = remaining_restaurants.sample(1).iloc[0]
            schedule.append({
                'time': current_time.strftime('%H:%M'),
                'name': dinner['name'],
                'type': 'dinner',
                'category': dinner['subcategory'],
                'duration': config['meal_duration'],
                'lat': dinner['latitude'],
                'lon': dinner['longitude']
            })
    
    return schedule

print("✅ Route planning functions ready")

In [None]:
# Pick largest zone for demonstration
largest_zone_id = profiles_df.loc[profiles_df['total_pois'].idxmax(), 'zone_id']
demo_zone = df_day_zones[df_day_zones['day_zone'] == largest_zone_id]
demo_profile = [p for p in zone_profiles if p['zone_id'] == largest_zone_id][0]

print(f"\n{'='*80}")
print(f"🗓️ SAMPLE DAY ITINERARIES: {demo_profile['zone_name']}")
print(f"{'='*80}")
print(f"\nZone Info: {demo_profile['total_pois']} POIs in {demo_profile['primary_neighborhood']}")

# Generate itineraries for all pacing options
for pacing in ['slow', 'moderate', 'fast']:
    config = PACING_CONFIG[pacing]
    itinerary = create_day_itinerary(demo_zone, pacing=pacing)
    
    print(f"\n{'─'*70}")
    print(f"⏱️ {pacing.upper()} PACE")
    print(f"   {config['description']}")
    print(f"   Best for: {', '.join(config['best_for'])}")
    print(f"{'─'*70}")
    
    total_duration = 0
    for item in itinerary:
        emoji_map = {
            'breakfast': '☕',
            'lunch': '🍝',
            'dinner': '🍷',
            'attraction': '🏛️'
        }
        emoji = emoji_map.get(item['type'], '📍')
        
        print(f"   {item['time']} {emoji} {item['name'][:40]}")
        print(f"           ({item['category']}, {item['duration']} min)")
        total_duration += item['duration']
    
    print(f"\n   📊 Summary: {len(itinerary)} stops, {total_duration//60}h {total_duration%60}m total")

In [None]:
# Visualize routes on map
def plot_itinerary_route(zone_df, pacing='moderate'):
    """Plot day itinerary on map."""
    
    itinerary = create_day_itinerary(zone_df, pacing=pacing)
    config = PACING_CONFIG[pacing]
    
    # Convert to dataframe
    route_df = pd.DataFrame(itinerary)
    route_df['order'] = range(1, len(route_df) + 1)
    
    # Create figure
    fig = go.Figure()
    
    # Background: all POIs in zone
    fig.add_trace(go.Scattermapbox(
        lat=zone_df['latitude'],
        lon=zone_df['longitude'],
        mode='markers',
        marker=dict(size=6, color='lightgray', opacity=0.5),
        name='All POIs',
        hoverinfo='skip'
    ))
    
    # Route line
    fig.add_trace(go.Scattermapbox(
        lat=route_df['lat'],
        lon=route_df['lon'],
        mode='lines',
        line=dict(width=3, color='#2196F3'),
        name='Route'
    ))
    
    # Route stops
    colors = {
        'breakfast': '#795548',
        'lunch': '#FF9800',
        'dinner': '#9C27B0',
        'attraction': '#F44336'
    }
    
    for _, stop in route_df.iterrows():
        fig.add_trace(go.Scattermapbox(
            lat=[stop['lat']],
            lon=[stop['lon']],
            mode='markers+text',
            marker=dict(size=20, color=colors.get(stop['type'], '#2196F3')),
            text=str(stop['order']),
            textposition='middle center',
            textfont=dict(color='white', size=12),
            name=f"{stop['time']} {stop['name'][:30]}",
            hovertemplate=f"<b>{stop['time']}</b><br>{stop['name']}<br>{stop['duration']} min<extra></extra>"
        ))
    
    fig.update_layout(
        title=f"<b>{pacing.title()} Pace Day Route</b><br><sup>{config['description']}</sup>",
        mapbox=dict(
            style='carto-positron',
            center=dict(lat=route_df['lat'].mean(), lon=route_df['lon'].mean()),
            zoom=14
        ),
        height=500,
        showlegend=False,
        margin={'r':0,'t':80,'l':0,'b':0}
    )
    
    return fig

# Show all three pacing options
for pacing in ['slow', 'moderate', 'fast']:
    fig = plot_itinerary_route(demo_zone, pacing)
    fig.show()

In [None]:
# Pacing comparison table
comparison = []

for pacing, config in PACING_CONFIG.items():
    itinerary = create_day_itinerary(demo_zone, pacing=pacing)
    
    attractions = sum(1 for i in itinerary if i['type'] == 'attraction')
    meals = sum(1 for i in itinerary if i['type'] in ['breakfast', 'lunch', 'dinner'])
    total_duration = sum(i['duration'] for i in itinerary)
    
    comparison.append({
        'Pacing': pacing.title(),
        'Total Stops': len(itinerary),
        'Attractions': attractions,
        'Meals': meals,
        'Active Time (hrs)': round(total_duration / 60, 1),
        'Buffer Time': f"{config['buffer_minutes']} min",
        'Best For': ', '.join(config['best_for'][:2])
    })

comparison_df = pd.DataFrame(comparison)

print("\n" + "="*80)
print("📊 PACING COMPARISON")
print("="*80)
print(comparison_df.to_string(index=False))
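
The comparison above counts only time spent at stops. As a rough complement, the walking distance between consecutive stops can be estimated with the haversine formula. The sketch below is illustrative, not part of the pipeline above: `haversine_m`, `route_distance_m`, the two-stop itinerary, and the 80 m/min walking speed are all assumptions, and straight-line distance will understate real street routes.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6_371_000  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

def route_distance_m(itinerary):
    """Sum straight-line distances between consecutive stops."""
    return sum(
        haversine_m(a['lat'], a['lon'], b['lat'], b['lon'])
        for a, b in zip(itinerary, itinerary[1:])
    )

# Hypothetical two-stop itinerary (approximate coordinates, for illustration only)
toy_itinerary = [
    {'name': 'Stop A', 'lat': 41.8902, 'lon': 12.4922},
    {'name': 'Stop B', 'lat': 41.8986, 'lon': 12.4769},
]

distance_m = route_distance_m(toy_itinerary)
walk_minutes = distance_m / 80  # assumes roughly 80 m/min (about 4.8 km/h)
print(f"{distance_m:.0f} m, about {walk_minutes:.0f} min on foot")
```

The same function accepts the list returned by `create_day_itinerary`, since each stop there carries `lat` and `lon` keys.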

---
## 5. Persona Recommendations

Which zones should each traveler type visit?
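
The recommendation score used in this section is a plain average: a zone's vibe scores and group scores are summed over the persona's preferences and divided by the number of preferences. A minimal sketch of that calculation on made-up data follows; `score_zone`, `toy_profile`, and `toy_persona` are hypothetical names and values chosen for illustration.

```python
def score_zone(profile, persona):
    """Average the zone's scores over the persona's vibes and groups."""
    n_keys = len(persona['vibes']) + len(persona['groups'])
    total = sum(profile['vibe_scores'].get(v, 0) for v in persona['vibes'])
    total += sum(profile['group_scores'].get(g, 0) for g in persona['groups'])
    return total / n_keys if n_keys else 0.0

# Hypothetical zone profile and persona (values are made up)
toy_profile = {
    'vibe_scores': {'romantic': 0.8, 'foodie': 0.6},
    'group_scores': {'couple': 0.5},
}
toy_persona = {
    'vibes': ['romantic', 'foodie', 'photography'],
    'groups': ['couple'],
}

toy_score = score_zone(toy_profile, toy_persona)
print(f"{toy_score:.3f}")
```

Note that a vibe missing from the zone profile (here `photography`) counts as 0, so personas with many preferences are penalized in zones that cover only some of them.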

In [None]:
# Recommendation engine
traveler_personas = [
    {
        'name': '💑 Honeymooners',
        'vibes': ['romantic', 'foodie', 'photography'],
        'groups': ['honeymoon', 'couple'],
        'pacing': 'slow'
    },
    {
        'name': '👨‍👩‍👧‍👦 Families',
        'vibes': ['relaxation', 'cultural'],
        'groups': ['family'],
        'pacing': 'moderate'
    },
    {
        'name': '🎒 Solo Explorers',
        'vibes': ['cultural', 'adventure', 'foodie'],
        'groups': ['solo'],
        'pacing': 'fast'
    },
    {
        'name': '👯 Friend Groups',
        'vibes': ['nightlife', 'foodie', 'photography'],
        'groups': ['friends'],
        'pacing': 'fast'
    },
    {
        'name': '👴👵 Senior Travelers',
        'vibes': ['cultural', 'relaxation'],
        'groups': ['seniors'],
        'pacing': 'slow'
    },
    {
        'name': '🍝 Foodies',
        'vibes': ['foodie', 'cultural'],
        'groups': ['couple', 'friends', 'solo'],
        'pacing': 'moderate'
    }
]

print("\n" + "="*80)
print("🎯 PERSONALIZED ZONE RECOMMENDATIONS")
print("="*80)

for persona in traveler_personas:
    print(f"\n{'━'*80}")
    print(f"{persona['name']}")
    print(f"   Looking for: {', '.join(persona['vibes'])}")
    print(f"   Recommended pacing: {persona['pacing'].upper()}")
    print(f"{'━'*80}")
    
    # Score zones
    zone_scores = []
    for profile in zone_profiles:
        score = 0
        for vibe in persona['vibes']:
            score += profile['vibe_scores'].get(vibe, 0)
        for group in persona['groups']:
            score += profile['group_scores'].get(group, 0)
        
        zone_scores.append({
            'zone_id': profile['zone_id'],
            'name': profile['zone_name'],
            'neighborhood': profile['primary_neighborhood'],
            'score': score / (len(persona['vibes']) + len(persona['groups'])),
            'pois': profile['total_pois']
        })
    
    zone_scores.sort(key=lambda x: x['score'], reverse=True)
    
    print(f"\n   🏆 TOP 3 RECOMMENDED ZONES:")
    medals = ['🥇', '🥈', '🥉']
    
    for i, zone in enumerate(zone_scores[:3]):
        bar = '█' * int(zone['score'] * 10) + '░' * (10 - int(zone['score'] * 10))
        print(f"\n   {medals[i]} Zone {zone['zone_id']}: {zone['name']}")
        print(f"      {bar} Score: {zone['score']:.2f}")
        print(f"      📍 {zone['neighborhood']} | {zone['pois']} POIs")

---
## 6. Export for Production Use

In [None]:
# Save processed data
output_path = Path('../data/processed')
output_path.mkdir(parents=True, exist_ok=True)  # also create ../data if it is missing

# Save POIs with cluster assignments
df.to_csv(output_path / 'rome_pois_clustered.csv', index=False)

# Save zone profiles
profiles_export = []
for p in zone_profiles:
    export = p.copy()
    export['top_vibes'] = [{'vibe': v, 'score': s} for v, s in p['top_vibes']]
    export['top_groups'] = [{'group': g, 'score': s} for g, s in p['top_groups']]
    profiles_export.append(export)

with open(output_path / 'rome_zone_profiles.json', 'w') as f:
    json.dump(profiles_export, f, indent=2)

# Save summary
summary = {
    'city': 'Rome',
    'data_source': 'Overture Maps via BigQuery',
    'total_pois': len(df),
    'day_zones': len(zone_profiles),
    'categories': df['main_category'].nunique(),
    'neighborhoods': df['neighborhood'].nunique(),
    'analysis_date': pd.Timestamp.now().isoformat()
}

with open(output_path / 'rome_summary.json', 'w') as f:
    json.dump(summary, f, indent=2)

print(f"\n✅ Data exported to {output_path}/")
print(f"   📄 rome_pois_clustered.csv ({len(df)} POIs)")
print(f"   📄 rome_zone_profiles.json ({len(zone_profiles)} zones)")
print(f"   📄 rome_summary.json")

---
## Summary

### What We Built

| Component | Output |
|-----------|--------|
| **Data Source** | Overture Maps via BigQuery (free) |
| **Clustering** | Day zones by walkable distance |
| **Persona Tagging** | Vibe & group scores per zone |
| **Route Planning** | Slow/moderate/fast pacing |

### Key Insights

| Zone | Best For | Vibe |
|------|----------|------|
| Centro Storico | Couples, Cultural seekers | Historic |
| Trastevere | Foodies, Romantics | Foodie |
| Vatican | All groups | Cultural |
| Testaccio | Adventurous foodies | Local |

### Scaling to New Cities

The same BigQuery approach works for any city; replace the bracketed placeholders with the city's bounding box:
```sql
SELECT 
    id,
    names.primary AS name,
    categories.primary AS category,
    ST_Y(geometry) AS latitude,  -- Use ST_Y for lat (GEOGRAPHY type)
    ST_X(geometry) AS longitude  -- Use ST_X for lon (GEOGRAPHY type)
FROM `bigquery-public-data.overture_maps.place`
WHERE 
    ST_X(geometry) BETWEEN [min_lon] AND [max_lon]
    AND ST_Y(geometry) BETWEEN [min_lat] AND [max_lat]
```
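
To reuse the query from Python, the bounding box can be filled in by a small helper. The sketch below only builds the SQL string; `build_place_query` is a hypothetical helper and the Florence coordinates are rough illustrative values, not a validated city boundary.

```python
def build_place_query(min_lon, min_lat, max_lon, max_lat):
    """Return the place query with the bounding box filled in."""
    if not (min_lon < max_lon and min_lat < max_lat):
        raise ValueError("bounding box min values must be below max values")
    # float() guards against non-numeric input being pasted into the SQL
    return f"""
SELECT
    id,
    names.primary AS name,
    categories.primary AS category,
    ST_Y(geometry) AS latitude,
    ST_X(geometry) AS longitude
FROM `bigquery-public-data.overture_maps.place`
WHERE
    ST_X(geometry) BETWEEN {float(min_lon)} AND {float(max_lon)}
    AND ST_Y(geometry) BETWEEN {float(min_lat)} AND {float(max_lat)}
""".strip()

# Hypothetical bounding box, roughly around central Florence
florence_sql = build_place_query(11.20, 43.75, 11.30, 43.80)
print(florence_sql)
```

With credentials configured, the resulting string could be passed to `bigquery.Client().query(florence_sql).to_dataframe()`, assuming `google-cloud-bigquery` and `db-dtypes` are installed as noted at the top of the notebook.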

**Time per city: 8-12 days** (data + persona scoring + validation)