# iSamples Interactive Explorer

An interactive interface for exploring iSamples data across all sources.

**Features:**
- Map view with ~6M georeferenced samples (lonboard WebGL)
- Interactive table with filtering (ipydatagrid)
- Sample cards on selection
- Source filtering
- **Text search**: Case-insensitive substring search over label, description, and place name, with ranked results
- **Bidirectional selection sync**: Click map → highlights table row; click table → recenters map
- **Viewport Mode**: Dynamic loading based on pan/zoom (with loading indicator)
- **Adaptive sampling**: More points when zoomed in, fewer when zoomed out

**Data:** Zenodo wide parquet (~282 MB, 20M rows)

In [1]:
# Imports
import os
import math
import threading
import duckdb
import pandas as pd
import geopandas as gpd
import numpy as np
from functools import partial
from shapely.geometry import Point

# Visualization
from lonboard import Map, ScatterplotLayer
from lonboard.colormap import apply_continuous_cmap
from ipydatagrid import DataGrid
import ipywidgets as widgets
from IPython.display import display, HTML

In [2]:
# Data paths
LOCAL_WIDE = os.path.expanduser("~/Data/iSample/pqg_refining/zenodo_wide_2026-01-09.parquet")
REMOTE_WIDE = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet"

# Use local if available
PARQUET_PATH = LOCAL_WIDE if os.path.exists(LOCAL_WIDE) else REMOTE_WIDE
print(f"Using: {PARQUET_PATH}")

# Connect to DuckDB
con = duckdb.connect()

Using: /Users/raymondyee/Data/iSample/pqg_refining/zenodo_wide_2026-01-09.parquet


In [3]:
# Source color scheme (consistent across iSamples)
SOURCE_COLORS = {
    'SESAR': [51, 102, 204, 200],       # Blue
    'OPENCONTEXT': [220, 57, 18, 200],  # Red
    'GEOME': [16, 150, 24, 200],        # Green
    'SMITHSONIAN': [255, 153, 0, 200],  # Orange
}

DEFAULT_COLOR = [128, 128, 128, 200]  # Gray for unknown

## Load Sample Data

We start with a balanced sample of 50K records (up to 12,500 from each of the four sources) to keep interaction responsive.
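
The 50K figure comes from capping each source at 12,500 rows. The sketch below illustrates that balanced-sampling idea in plain Python. The `balanced_sample` helper and the toy records are hypothetical; they only approximate what the `ROW_NUMBER() OVER (PARTITION BY source ORDER BY RANDOM())` query in `load_samples` does inside DuckDB.

```python
import random
from collections import defaultdict

def balanced_sample(records, max_per_source, seed=0):
    """Keep at most max_per_source randomly chosen records per source."""
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec["source"]].append(rec)
    sampled = []
    for recs in by_source.values():
        rng.shuffle(recs)                      # random order within each source
        sampled.extend(recs[:max_per_source])  # cap each source independently
    return sampled

# Toy data (hypothetical counts): one large source and one small one
records = [{"source": "SESAR", "id": i} for i in range(1000)]
records += [{"source": "GEOME", "id": i} for i in range(30)]

sample = balanced_sample(records, max_per_source=50)
counts = defaultdict(int)
for rec in sample:
    counts[rec["source"]] += 1
print(dict(counts))  # {'SESAR': 50, 'GEOME': 30}
```

A source with fewer rows than the cap contributes everything it has, so small sources are not drowned out by large ones.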

In [4]:
def load_samples(max_per_source=12500, source_filter=None, bbox=None, search_term=None):
    """
    Load samples with coordinates from the wide parquet.
    
    Args:
        max_per_source: Maximum samples per source (for balanced representation)
        source_filter: Optional source name to filter (e.g., 'OPENCONTEXT')
        bbox: Optional bounding box dict with min_lat, max_lat, min_lon, max_lon
        search_term: Optional search string to filter and rank results
    
    Returns:
        GeoDataFrame with sample data (includes search_score if search_term provided)
    """
    where_clause = "WHERE otype = 'MaterialSampleRecord' AND latitude IS NOT NULL"
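    if source_filter:
        # Escape single quotes before interpolating into SQL (as done for search_term)
        safe_source = source_filter.replace("'", "''")
        where_clause += f" AND n = '{safe_source}'"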
    if bbox:
        where_clause += f" AND latitude BETWEEN {bbox['min_lat']} AND {bbox['max_lat']}"
        where_clause += f" AND longitude BETWEEN {bbox['min_lon']} AND {bbox['max_lon']}"
    
    # Search filtering and scoring
    search_score_expr = "0 AS search_score"
    search_filter = ""
    order_by = "ORDER BY RANDOM()"
    
    if search_term and search_term.strip():
        # Escape single quotes in search term
        term = search_term.strip().replace("'", "''")
        
        # Weighted scoring: label (10) > description (5) > place_name (3)
        search_score_expr = f"""
            (CASE WHEN label ILIKE '%{term}%' THEN 10 ELSE 0 END +
             CASE WHEN description ILIKE '%{term}%' THEN 5 ELSE 0 END +
             CASE WHEN CAST(place_name AS VARCHAR) ILIKE '%{term}%' THEN 3 ELSE 0 END) AS search_score
        """
        
        # Filter to only matching records
        search_filter = f"""
            AND (label ILIKE '%{term}%' 
                 OR description ILIKE '%{term}%' 
                 OR CAST(place_name AS VARCHAR) ILIKE '%{term}%')
        """
        
        # Sort by score (highest first), then random within same score
        order_by = "ORDER BY search_score DESC, RANDOM()"
    
    # Query with balanced sampling across sources
    query = f"""
        WITH scored AS (
            SELECT 
                row_id, pid, label, description, latitude, longitude, n as source,
                place_name, result_time,
                {search_score_expr}
            FROM read_parquet('{PARQUET_PATH}')
            {where_clause}
            {search_filter}
        ),
        ranked AS (
            SELECT *,
                ROW_NUMBER() OVER (PARTITION BY source {order_by}) as rn
            FROM scored
        )
        SELECT row_id, pid, label, description, latitude, longitude, source, place_name, result_time, search_score
        FROM ranked
        WHERE rn <= {max_per_source}
        {order_by}
    """
    
    df = con.sql(query).df()
    
    # Convert to GeoDataFrame
    geometry = [Point(lon, lat) for lon, lat in zip(df['longitude'], df['latitude'])]
    gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")
    
    return gdf


def view_state_to_bbox(view_state, buffer_factor=1.5, aspect_ratio=1.5):
    """
    Calculate bounding box from lonboard view_state.

    The view_state contains latitude, longitude, and zoom level.
    We calculate the visible extent using Web Mercator projection math.

    Args:
        view_state: lonboard MapViewState with latitude, longitude, zoom
        buffer_factor: Multiply bbox by this to load slightly more data (default 1.5)
        aspect_ratio: Width/height ratio of map container (default 1.5 for wider maps)

    Returns:
        dict with min_lat, max_lat, min_lon, max_lon
    """
    lat = view_state.latitude
    lon = view_state.longitude
    zoom = view_state.zoom

    # At zoom 0, the entire world is visible (~360 degrees of longitude)
    # Each zoom level halves the visible extent in each dimension
    # Approximate degrees visible at this zoom level
    degrees_visible = 360 / (2 ** zoom)

    # Latitude visible area - apply buffer
    lat_degrees = degrees_visible * buffer_factor / 2

    # Longitude visible area - wider due to aspect ratio and Mercator at higher latitudes
    # Mercator stretches longitude at higher latitudes, so we need more buffer
    lat_rad = math.radians(abs(lat))
    mercator_stretch = 1 / max(math.cos(lat_rad), 0.1)  # Avoid division by zero near poles
    lon_degrees = degrees_visible * buffer_factor * aspect_ratio * mercator_stretch / 2

    # Clamp latitude and longitude to their valid ranges
    min_lat = max(-90, lat - lat_degrees)
    max_lat = min(90, lat + lat_degrees)
    min_lon = max(-180, lon - lon_degrees)
    max_lon = min(180, lon + lon_degrees)

    return {
        'min_lat': min_lat,
        'max_lat': max_lat,
        'min_lon': min_lon,
        'max_lon': max_lon
    }


def adaptive_sample_size(zoom, base_size=50000):
    """
    Calculate sample size based on zoom level.
    
    At low zoom (world view), sample aggressively to avoid overwhelming the map.
    At high zoom (local view), allow up to base_size points per source.
    
    Args:
        zoom: Current zoom level (0-20)
        base_size: Base sample size per source
    
    Returns:
        Sample size to use per source
    """
    if zoom < 2:
        return min(base_size, 10000)  # World view: max 10K per source
    elif zoom < 5:
        return min(base_size, 25000)  # Continent view: max 25K
    elif zoom < 8:
        return min(base_size, 50000)  # Country view: max 50K
    elif zoom < 12:
        return min(base_size, 100000)  # Region view: max 100K
    else:
        return base_size  # Local view: use full base_size


# Load initial data
print("Loading samples...")
samples_gdf = load_samples(max_per_source=12500)
print(f"Loaded {len(samples_gdf):,} samples")
print("\nBy source:")
print(samples_gdf['source'].value_counts())

Loading samples...
Loaded 50,000 samples

By source:
source
GEOME          12500
SESAR          12500
OPENCONTEXT    12500
SMITHSONIAN    12500
Name: count, dtype: int64
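
As a worked check of the bounding-box approximation in `view_state_to_bbox`, the sketch below repeats the same arithmetic with plain numbers instead of a lonboard view state. The `approx_bbox` helper is hypothetical and exists only for this illustration; it mirrors the heuristic above rather than an exact Web Mercator viewport calculation.

```python
import math

def approx_bbox(lat, lon, zoom, buffer_factor=1.5, aspect_ratio=1.5):
    """Same heuristic as view_state_to_bbox, but taking plain numbers."""
    degrees_visible = 360 / (2 ** zoom)           # extent halves per zoom level
    lat_half = degrees_visible * buffer_factor / 2
    stretch = 1 / max(math.cos(math.radians(abs(lat))), 0.1)
    lon_half = degrees_visible * buffer_factor * aspect_ratio * stretch / 2
    return {
        "min_lat": max(-90, lat - lat_half),
        "max_lat": min(90, lat + lat_half),
        "min_lon": max(-180, lon - lon_half),
        "max_lon": min(180, lon + lon_half),
    }

# Zoom 4 at the equator: 360 / 16 = 22.5 degrees before buffering
bbox = approx_bbox(lat=0.0, lon=0.0, zoom=4)
print(bbox)
# {'min_lat': -16.875, 'max_lat': 16.875, 'min_lon': -25.3125, 'max_lon': 25.3125}

# Zoom 0: the buffered box exceeds the globe, so it is clamped
world = approx_bbox(lat=0.0, lon=0.0, zoom=0)
print(world)
# {'min_lat': -90, 'max_lat': 90, 'min_lon': -180, 'max_lon': 180}
```

Note that clamping longitude to [-180, 180] means a view that straddles the antimeridian is cut off on one side rather than wrapped.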


## Sample Card Renderer

In [5]:
def render_sample_card(row):
    """
    Render a sample as an HTML card.
    
    Args:
        row: DataFrame row or Series with sample data
    
    Returns:
        HTML string
    """
    if row is None:
        return "<div style='padding: 10px; color: #666;'>Click a point to see details</div>"
    
    source = row.get('source', 'Unknown')
    source_color = {
        'SESAR': '#3366CC',
        'OPENCONTEXT': '#DC3912',
        'GEOME': '#109618',
        'SMITHSONIAN': '#FF9900'
    }.get(source, '#808080')
    
    label = row.get('label', 'No label')
    if pd.isna(label):
        label = 'No label'
    
    description = row.get('description', '')
    if pd.isna(description):
        description = ''
    elif len(str(description)) > 200:
        description = str(description)[:200] + '...'
    
    lat = row.get('latitude', 0)
    lon = row.get('longitude', 0)
    if pd.isna(lat):
        lat = 0
    if pd.isna(lon):
        lon = 0
        
    pid = row.get('pid', '')
    if pd.isna(pid):
        pid = ''
    
    place = row.get('place_name', '')
    if pd.isna(place):
        place = ''
    elif isinstance(place, list):
        place = ' > '.join(str(p) for p in place if p and not pd.isna(p))
    else:
        place = str(place)
    
    # Build place HTML only if place has content
    place_html = ''
    if place and len(place) > 0:
        place_html = f'<div style="margin-bottom: 4px;"><strong>Place:</strong> {place[:100]}</div>'
    
    html = f"""
    <div style="font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
                border: 1px solid #ddd; border-radius: 8px; padding: 16px; 
                max-width: 400px; background: white; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
        <div style="display: flex; align-items: center; margin-bottom: 12px;">
            <span style="background: {source_color}; color: white; padding: 4px 8px; 
                        border-radius: 4px; font-size: 12px; font-weight: 600;">{source}</span>
        </div>
        <h3 style="margin: 0 0 8px 0; font-size: 16px; color: #333;">{label}</h3>
        <p style="margin: 0 0 12px 0; font-size: 13px; color: #666; line-height: 1.4;">
            {description if description else '<em>No description</em>'}
        </p>
        <div style="font-size: 12px; color: #888;">
            <div style="margin-bottom: 4px;"><strong>Location:</strong> {lat:.4f}, {lon:.4f}</div>
            {place_html}
            <div><strong>ID:</strong> <code style="background: #f5f5f5; padding: 2px 4px; border-radius: 3px;">{str(pid)[:50]}{'...' if len(str(pid)) > 50 else ''}</code></div>
        </div>
    </div>
    """
    return html

# Test the card
display(HTML(render_sample_card(samples_gdf.iloc[0])))

## Map Component

In [6]:
def get_colors_for_sources(sources):
    """
    Get color array for a list of sources.
    
    Args:
        sources: pandas Series or list of source names
    
    Returns:
        numpy array of RGBA colors
    """
    colors = np.array([
        SOURCE_COLORS.get(s, DEFAULT_COLOR) for s in sources
    ], dtype=np.uint8)
    return colors

def create_map_layer(gdf):
    """
    Create a lonboard ScatterplotLayer from a GeoDataFrame.
    """
    colors = get_colors_for_sources(gdf['source'])
    
    layer = ScatterplotLayer.from_geopandas(
        gdf,
        get_fill_color=colors,
        get_radius=1000,
        radius_units='meters',
        radius_min_pixels=2,
        radius_max_pixels=10,
        pickable=True,
        auto_highlight=True,
    )
    return layer

# Create initial map
layer = create_map_layer(samples_gdf)
sample_map = Map(layers=[layer])
print("Map created with", len(samples_gdf), "points")

Map created with 50000 points


## Table Component

In [7]:
def create_table(gdf):
    """
    Create an ipydatagrid table from sample data.
    """
    # Select columns for display
    display_cols = ['source', 'label', 'latitude', 'longitude']
    df_display = gdf[display_cols].copy()
    df_display['latitude'] = df_display['latitude'].round(4)
    df_display['longitude'] = df_display['longitude'].round(4)
    
    grid = DataGrid(
        df_display,
        base_row_size=32,
        base_column_size=120,
        selection_mode='row',
        editable=False,
        layout=widgets.Layout(height='300px', width='100%')
    )
    return grid

# Create table
sample_table = create_table(samples_gdf)

## Interactive Controls

In [None]:
# State management
class ExplorerState:
    def __init__(self):
        self.selected_index = None
        self.selected_row = None
        self.current_gdf = None
        self.viewport_mode = False
        self.debounce_timer = None
        self.loading = False
        self.syncing_selection = False  # Prevent infinite loops
        self.current_search = ""  # Current search term

state = ExplorerState()
state.current_gdf = samples_gdf

# Widgets
source_filter = widgets.Dropdown(
    options=['All Sources', 'SESAR', 'OPENCONTEXT', 'GEOME', 'SMITHSONIAN'],
    value='All Sources',
    description='Source:',
    style={'description_width': '60px'}
)

search_input = widgets.Text(
    value='',
    placeholder='Search label, description, place...',
    description='Search:',
    style={'description_width': '60px'},
    layout=widgets.Layout(width='280px')
)

search_btn = widgets.Button(
    description='',
    button_style='',
    icon='search',
    tooltip='Search (or press Enter)',
    layout=widgets.Layout(width='40px')
)

clear_search_btn = widgets.Button(
    description='',
    button_style='',
    icon='times',
    tooltip='Clear search',
    layout=widgets.Layout(width='40px')
)

sample_count = widgets.IntSlider(
    value=12500,
    min=1000,
    max=500000,  # 500K per source - plenty for 128GB RAM
    step=5000,
    description='Per source:',
    style={'description_width': '80px'}
)

viewport_toggle = widgets.ToggleButton(
    value=False,
    description='Viewport Mode',
    tooltip='When enabled, automatically loads data for current map view',
    icon='map',
    button_style=''  # 'success' when active
)

refresh_btn = widgets.Button(
    description='Refresh Data',
    button_style='primary',
    icon='refresh'
)

# Loading indicator with spinner
loading_indicator = widgets.HTML(value="")

status_label = widgets.HTML(value=f"<b>Loaded:</b> {len(samples_gdf):,} samples")

card_output = widgets.HTML(value=render_sample_card(None))


def show_loading(message="Loading..."):
    """Show loading indicator."""
    state.loading = True
    loading_indicator.value = f"""
    <div style="display: inline-flex; align-items: center; color: #666;">
        <svg width="20" height="20" viewBox="0 0 50 50" style="animation: spin 1s linear infinite; margin-right: 8px;">
            <circle cx="25" cy="25" r="20" fill="none" stroke="#3366CC" stroke-width="4" stroke-dasharray="80,40"/>
        </svg>
        <style>@keyframes spin {{ from {{ transform: rotate(0deg); }} to {{ transform: rotate(360deg); }} }}</style>
        <span>{message}</span>
    </div>
    """


def hide_loading():
    """Hide loading indicator."""
    state.loading = False
    loading_indicator.value = ""


def select_sample(idx, source='map'):
    """
    Select a sample by index and sync map/table/card.
    
    Args:
        idx: Row index in current_gdf
        source: 'map' or 'table' - which triggered the selection
    """
    if idx is None or idx >= len(state.current_gdf):
        return
        
    state.selected_index = idx
    state.selected_row = state.current_gdf.iloc[idx]
    
    # Update sample card
    card_output.value = render_sample_card(state.selected_row)
    
    if source == 'map':
        # Map click -> highlight table row
        # selections is a list of rectangle dicts; c2 is the index of the last column:
        # 4 when the score column is shown (5 columns), otherwise 3 (4 columns)
        last_col = 4 if state.current_search else 3
        sample_table.selections = [{'r1': idx, 'c1': 0, 'r2': idx, 'c2': last_col}]
    
    elif source == 'table':
        # Table click -> recenter map (keep current zoom)
        lat = state.selected_row['latitude']
        lon = state.selected_row['longitude']
        if not pd.isna(lat) and not pd.isna(lon):
            sample_map.set_view_state(latitude=float(lat), longitude=float(lon))


def on_map_point_click(change):
    """Handle click on a map point - highlight corresponding table row."""
    if state.syncing_selection:
        return
    
    idx = change.get('new')
    if idx is None:
        return
    
    state.syncing_selection = True
    try:
        select_sample(idx, source='map')
    finally:
        state.syncing_selection = False


def setup_layer_observer(layer):
    """Setup the selected_index observer on a layer."""
    layer.observe(on_map_point_click, names=['selected_index'])


def update_map_and_table(new_gdf, search_active=False):
    """Update map and table with new data."""
    state.current_gdf = new_gdf
    state.current_search = search_input.value.strip() if search_active else ""
    
    # Update map with new layer
    new_layer = create_map_layer(new_gdf)
    
    # Setup observer on new layer BEFORE adding to map
    setup_layer_observer(new_layer)
    
    sample_map.layers = [new_layer]
    
    # Update table - include score column if searching
    if search_active and 'search_score' in new_gdf.columns:
        display_cols = ['search_score', 'source', 'label', 'latitude', 'longitude']
        df_display = new_gdf[display_cols].copy()
        df_display = df_display.rename(columns={'search_score': 'score'})
    else:
        display_cols = ['source', 'label', 'latitude', 'longitude']
        df_display = new_gdf[display_cols].copy()
    
    df_display['latitude'] = df_display['latitude'].round(4)
    df_display['longitude'] = df_display['longitude'].round(4)
    sample_table.data = df_display
    
    # Update status
    if search_active:
        status_label.value = f"<b>Found:</b> {len(new_gdf):,} matches for '{state.current_search}'"
    else:
        status_label.value = f"<b>Loaded:</b> {len(new_gdf):,} samples"


def do_search():
    """Execute search with current parameters."""
    show_loading("Searching...")
    
    try:
        source = None if source_filter.value == 'All Sources' else source_filter.value
        search_term = search_input.value.strip()
        
        if state.viewport_mode:
            # Search within current viewport
            view_state = sample_map.view_state
            zoom = view_state.zoom if hasattr(view_state, 'zoom') else 1
            bbox = view_state_to_bbox(view_state)
            
            # When searching, use slider value directly (no adaptive reduction)
            # When browsing, use adaptive sampling based on zoom
            if search_term:
                max_samples = sample_count.value
            else:
                max_samples = adaptive_sample_size(zoom, base_size=sample_count.value)
            
            new_gdf = load_samples(
                max_per_source=max_samples,
                source_filter=source,
                bbox=bbox,
                search_term=search_term if search_term else None
            )
            
            zoom_info = f" (zoom {zoom:.1f})"
        else:
            # Search globally
            new_gdf = load_samples(
                max_per_source=sample_count.value,
                source_filter=source,
                search_term=search_term if search_term else None
            )
            zoom_info = ""
        
        update_map_and_table(new_gdf, search_active=bool(search_term))
        
        if search_term:
            status_label.value = f"<b>Found:</b> {len(new_gdf):,} matches for '{search_term}'{zoom_info}"
        else:
            status_label.value = f"<b>Loaded:</b> {len(new_gdf):,} samples{zoom_info}"
        
    except Exception as e:
        status_label.value = f"<b>Error:</b> {str(e)[:50]}"
    finally:
        hide_loading()


def on_search_click(b):
    """Handle search button click."""
    do_search()


def on_search_submit(change):
    """Handle Enter key in search box."""
    do_search()


def on_clear_search(b):
    """Clear search and reload data."""
    search_input.value = ''
    do_search()


search_btn.on_click(on_search_click)
# Note: Text.on_submit is deprecated in recent ipywidgets releases and may emit a warning
search_input.on_submit(on_search_submit)
clear_search_btn.on_click(on_clear_search)


def load_viewport_data():
    """Load data for current viewport with adaptive sampling."""
    if state.loading:
        return
        
    show_loading("Loading viewport data...")
    
    try:
        # Get current view state
        view_state = sample_map.view_state
        zoom = view_state.zoom if hasattr(view_state, 'zoom') else 1
        
        # Calculate bounding box
        bbox = view_state_to_bbox(view_state)
        
        # Get source filter and search term
        source = None if source_filter.value == 'All Sources' else source_filter.value
        search_term = search_input.value.strip() or None
        
        # When searching, use slider value directly (no adaptive reduction)
        # When browsing, use adaptive sampling based on zoom
        if search_term:
            max_samples = sample_count.value
        else:
            max_samples = adaptive_sample_size(zoom, base_size=sample_count.value)
        
        # Load data
        new_gdf = load_samples(
            max_per_source=max_samples,
            source_filter=source,
            bbox=bbox,
            search_term=search_term
        )
        
        update_map_and_table(new_gdf, search_active=bool(search_term))
        
        # Show zoom info in status
        if search_term:
            status_label.value = f"<b>Found:</b> {len(new_gdf):,} matches for '{search_term}' (zoom {zoom:.1f})"
        else:
            status_label.value = f"<b>Loaded:</b> {len(new_gdf):,} samples (zoom {zoom:.1f}, {max_samples:,}/source max)"
        
    except Exception as e:
        status_label.value = f"<b>Error:</b> {str(e)[:50]}"
    finally:
        hide_loading()


def debounced_viewport_load():
    """Debounced viewport loading - waits for user to stop panning/zooming."""
    # Cancel any existing timer
    if state.debounce_timer is not None:
        state.debounce_timer.cancel()
    
    # Set new timer (500ms delay)
    state.debounce_timer = threading.Timer(0.5, load_viewport_data)
    state.debounce_timer.start()


def on_view_state_change(change):
    """Handle map pan/zoom changes."""
    if state.viewport_mode and not state.loading:
        debounced_viewport_load()


def on_viewport_toggle(change):
    """Handle viewport mode toggle."""
    state.viewport_mode = change['new']
    if change['new']:
        viewport_toggle.button_style = 'success'
        viewport_toggle.description = 'Viewport Mode ON'
        # Immediately load viewport data
        load_viewport_data()
    else:
        viewport_toggle.button_style = ''
        viewport_toggle.description = 'Viewport Mode'


viewport_toggle.observe(on_viewport_toggle, names=['value'])


# Event handlers
def on_refresh_click(b):
    do_search()  # Refresh now uses same logic as search

refresh_btn.on_click(on_refresh_click)


def on_table_selection(change):
    """Handle table row selection - recenter map on selected point."""
    if state.syncing_selection:
        return
    
    # selections is a LIST of selection dicts
    selections = change.get('new', [])
    if selections:
        # Get the first selection
        sel = selections[0]
        row_idx = sel.get('r1')
        if row_idx is not None and row_idx < len(state.current_gdf):
            state.syncing_selection = True
            try:
                select_sample(row_idx, source='table')
            finally:
                state.syncing_selection = False

sample_table.observe(on_table_selection, names=['selections'])

# Register view_state observer on the map
sample_map.observe(on_view_state_change, names=['view_state'])

# Setup observer on initial layer
setup_layer_observer(sample_map.layers[0])

## Explorer Interface

Run this cell to launch the interactive explorer.

In [9]:
# Layout the interface

# Search box with buttons
search_box = widgets.HBox([
    search_input,
    search_btn,
    clear_search_btn
], layout=widgets.Layout(margin='0 15px 0 0'))

# Row 1: Source filter and search
controls_row1 = widgets.HBox([
    source_filter,
    search_box,
], layout=widgets.Layout(margin='5px 0'))

# Row 2: Sample count, viewport mode, refresh, status
controls_row2 = widgets.HBox([
    sample_count,
    viewport_toggle,
    refresh_btn,
    loading_indicator,
    status_label
], layout=widgets.Layout(margin='5px 0', flex_wrap='wrap'))

controls = widgets.VBox([controls_row1, controls_row2])

# Legend
legend_html = """
<div style="display: flex; gap: 15px; padding: 8px; background: #f9f9f9; border-radius: 4px; font-size: 12px;">
    <span><span style="display: inline-block; width: 12px; height: 12px; background: #3366CC; border-radius: 50%; margin-right: 4px;"></span>SESAR</span>
    <span><span style="display: inline-block; width: 12px; height: 12px; background: #DC3912; border-radius: 50%; margin-right: 4px;"></span>OpenContext</span>
    <span><span style="display: inline-block; width: 12px; height: 12px; background: #109618; border-radius: 50%; margin-right: 4px;"></span>GEOME</span>
    <span><span style="display: inline-block; width: 12px; height: 12px; background: #FF9900; border-radius: 50%; margin-right: 4px;"></span>Smithsonian</span>
</div>
"""
legend = widgets.HTML(value=legend_html)

# Main layout
left_panel = widgets.VBox([
    widgets.HTML("<h4 style='margin: 0 0 8px 0;'>Map</h4>"),
    legend,
    sample_map
], layout=widgets.Layout(flex='2', margin='0 10px 0 0'))

right_panel = widgets.VBox([
    widgets.HTML("<h4 style='margin: 0 0 8px 0;'>Selected Sample</h4>"),
    card_output,
    widgets.HTML("<h4 style='margin: 16px 0 8px 0;'>Sample List</h4>"),
    sample_table
], layout=widgets.Layout(flex='1', min_width='420px'))

main_layout = widgets.HBox([left_panel, right_panel])

# Display
display(widgets.VBox([
    widgets.HTML("<h2 style='margin-bottom: 5px;'>iSamples Explorer</h2>"),
    widgets.HTML("<p style='color: #666; margin-top: 0;'>Interactive exploration of physical samples across scientific domains</p>"),
    controls,
    main_layout
]))


## Usage

1. **Filter by source**: Use the dropdown to show only one data source
2. **Search**: Type in the search box and press Enter or click the search icon
3. **Adjust sample size**: Increase/decrease points per source for performance vs. coverage
4. **Click Refresh**: Load new data after changing filters
5. **Pan/zoom map**: Explore geographic distribution

### Search

Search filters samples by matching text in **label**, **description**, and **place name** fields:

- **Enter a term**: Type "pottery", "basalt", "Cyprus", etc. and press Enter
- **Results are ranked**: Label matches (10 pts) > Description (5 pts) > Place name (3 pts)
- **Score column**: When searching, a "score" column appears in the table showing match quality
- **Clear search**: Click the X button to clear and reload all samples
- **Viewport aware**: With Viewport Mode ON, search is limited to the current map view
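
The ranking rule above can be sketched in plain Python. This is an illustration under assumptions, not the notebook's implementation: `search_score` and the sample records are hypothetical, and Python's `in` test only approximates SQL `ILIKE '%term%'` (for example, `%` and `_` in a term act as wildcards in SQL but not here).

```python
# Weights taken from the ranking rule described above
WEIGHTS = {"label": 10, "description": 5, "place_name": 3}

def search_score(record, term):
    """Sum the weights of every field that contains the term (case-insensitive)."""
    needle = term.strip().lower()
    if not needle:
        return 0
    score = 0
    for field, weight in WEIGHTS.items():
        value = record.get(field)
        # str() roughly mirrors CAST(place_name AS VARCHAR) for list values
        if value is not None and needle in str(value).lower():
            score += weight
    return score

# Hypothetical records for illustration only
records = [
    {"label": "Sherd 12", "description": "Pottery sherd", "place_name": ["Basalt Ridge"]},
    {"label": "Basalt flake", "description": "Basalt tool fragment", "place_name": ["Cyprus"]},
    {"label": "Coral core", "description": None, "place_name": None},
]

scored = [(search_score(r, "basalt"), r["label"]) for r in records]
matches = sorted((s for s in scored if s[0] > 0), reverse=True)
print(matches)  # [(15, 'Basalt flake'), (3, 'Sherd 12')]
```

Because the weights add up, a record that matches in both label and description (15) outranks one that matches only in its place name (3).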

### Selection Sync (Bidirectional)

Map and table selections are synchronized:

- **Click a dot on the map** → The corresponding row is highlighted in the table, and the sample card updates
- **Click a row in the table** → The map recenters on that point (zoom level is preserved), and the sample card updates

This makes it easy to explore samples visually on the map and then find them in the table, or vice versa.
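
Two-way sync needs a guard so that an update made by one handler does not trigger the other handler in an endless loop. The sketch below shows that pattern in isolation. The `SyncedSelection` class is hypothetical and simplified: in the notebook the table handler recenters the map rather than selecting a point, and the guard is the `syncing_selection` flag on `ExplorerState`.

```python
class SyncedSelection:
    """Two views that mirror each other's selection without looping."""

    def __init__(self):
        self.map_selected = None
        self.table_selected = None
        self.syncing = False   # re-entrancy guard
        self.events = []       # log of handled user events, for illustration

    def set_map(self, idx):
        self.map_selected = idx
        self._on_map_change(idx)

    def set_table(self, idx):
        self.table_selected = idx
        self._on_table_change(idx)

    def _on_map_change(self, idx):
        if self.syncing:
            return             # change came from our own sync, ignore it
        self.syncing = True
        try:
            self.events.append(("map", idx))
            self.set_table(idx)   # fires the table handler, which is skipped
        finally:
            self.syncing = False

    def _on_table_change(self, idx):
        if self.syncing:
            return
        self.syncing = True
        try:
            self.events.append(("table", idx))
            self.set_map(idx)
        finally:
            self.syncing = False

sel = SyncedSelection()
sel.set_map(7)
print(sel.table_selected, sel.events)  # 7 [('map', 7)]
```

The `try`/`finally` ensures the flag is reset even if a handler raises, so one failed update does not disable syncing for the rest of the session.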

### Viewport Mode (Dynamic Loading)

Enable **Viewport Mode** to automatically reload data as you pan and zoom:

- **Toggle ON**: Click the "Viewport Mode" button (turns green when active)
- **Pan/zoom**: Data reloads automatically after you stop moving (500ms debounce)
- **Loading indicator**: Spinner shows while data is being fetched
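- **Adaptive sampling** (each cap is further limited by the "Per source" slider value):
  - World view (zoom < 2): max 10K samples per source
  - Continent (2 ≤ zoom < 5): max 25K per source
  - Country (5 ≤ zoom < 8): max 50K per source
  - Region (8 ≤ zoom < 12): max 100K per source
  - Local (zoom ≥ 12): uses your slider value
  - With an active search term, the slider value is used directly at every zoom level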
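
The debounce behavior described above can be sketched with `threading.Timer`, which is also what the notebook uses. The `Debouncer` class here is a hypothetical wrapper written for this illustration, with a shorter delay than the notebook's 500 ms so the demo finishes quickly.

```python
import threading
import time

class Debouncer:
    """Run func only after calls have stopped for `delay` seconds."""

    def __init__(self, delay, func):
        self.delay = delay
        self.func = func
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()       # drop the previously scheduled call
            self._timer = threading.Timer(self.delay, self.func, args=args)
            self._timer.daemon = True      # do not block interpreter exit
            self._timer.start()

calls = []
debounced = Debouncer(0.2, calls.append)

# Simulate a burst of zoom changes arriving 10 ms apart
for zoom in (3.0, 3.5, 4.0):
    debounced.trigger(zoom)
    time.sleep(0.01)

time.sleep(0.6)   # wait past the delay so the last call can run
print(calls)
```

Only the last zoom value in the burst should reach the callback, which is why a pan or zoom gesture results in a single reload instead of one per intermediate view state.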

### Color Legend
- **Blue**: SESAR (geological samples, IGSNs)
- **Red**: OpenContext (archaeological samples)
- **Green**: GEOME (genomic/biological samples)
- **Orange**: Smithsonian (museum collections)
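
The notebook defines these colors twice: as RGBA lists for the map layer (`SOURCE_COLORS`) and as hex strings for the sample card and legend. A small check like the following, with a hypothetical `rgba_to_hex` helper, can confirm that the two stay in agreement.

```python
# RGBA values copied from SOURCE_COLORS in the notebook
SOURCE_COLORS = {
    "SESAR": [51, 102, 204, 200],
    "OPENCONTEXT": [220, 57, 18, 200],
    "GEOME": [16, 150, 24, 200],
    "SMITHSONIAN": [255, 153, 0, 200],
}

def rgba_to_hex(rgba):
    """Convert an [R, G, B, A] list to '#RRGGBB', ignoring alpha."""
    r, g, b = rgba[:3]
    return f"#{r:02X}{g:02X}{b:02X}"

hex_colors = {name: rgba_to_hex(rgba) for name, rgba in SOURCE_COLORS.items()}
print(hex_colors)
# {'SESAR': '#3366CC', 'OPENCONTEXT': '#DC3912', 'GEOME': '#109618', 'SMITHSONIAN': '#FF9900'}
```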

## Debug: Raw Data Access

Use these cells to explore the underlying data.

In [10]:
# Current selection
if state.selected_row is not None:
    print("Selected sample:")
    print(state.selected_row)
else:
    print("No sample selected")

No sample selected


In [11]:
# Query the full dataset
con.sql(f"""
    SELECT n as source, COUNT(*) as total_samples
    FROM read_parquet('{PARQUET_PATH}')
    WHERE otype = 'MaterialSampleRecord' AND latitude IS NOT NULL
    GROUP BY n
    ORDER BY total_samples DESC
""").df()

        source  total_samples
0        SESAR        4389231
1  OPENCONTEXT        1059025
2        GEOME         291210
3  SMITHSONIAN         240816
