
explorer: facet look-ahead counts — wiring plan + open design questions #229

@rdhyee

Goal

Show look-ahead counts next to each facet value in the legend (Source / Material /
Sampled Feature / Specimen Type). When a user toggles a facet, every other facet
value displays "if you added me to the current selection, you'd have N samples" —
the classic guided-navigation UX.

This issue is the design-review gate before the Layer 2+ work below. Layer 1 is
trivial wiring and will likely ship first; this issue exists to settle the harder
questions before they block.

Foundation already in place

R2 parquets (live)

| URL | Rows | Purpose |
|---|---|---|
| https://data.isamples.org/isamples_202601_sample_facets_v2.parquet | 5.98M | Per-sample normalized facet values for live queries |
| https://data.isamples.org/isamples_202601_facet_summaries.parquet | 56 | Baseline counts (the legend numbers we show today) |
| https://data.isamples.org/isamples_202601_facet_cross_filter.parquet | 526 | Pre-computed cross-filter cube |

The cube — what it covers

Schema: (filter_source, filter_material, filter_context, filter_object_type, facet_type, facet_value, count).
NULL filter columns mean "axis open".

| Filter pattern | Cells | Covers |
|---|---|---|
| ···· no filter | 56 | baseline |
| S··· source only | 68 | given a source, counts for material/context/object_type |
| ·M·· material only | 144 | given a material, counts for the other three axes |
| ··C· context only | 116 | given a context, counts for the other three axes |
| ···O object_type only | 142 | given an object_type, counts for the other three axes |

Not in the cube: 2+ active filters, multi-select within an axis, bbox, text search.

The cube correctly encodes the faceted-search rule: when the filter is S=SESAR, only
material/context/object_type counts are stored. The Source axis is "open" relative
to itself, so with no other filter active its counts are simply the baseline
(no-filter) counts.
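
As a rough illustration of how the single-filter rows could be read, here is a minimal sketch that assumes the cube has already been loaded into an array of plain row objects keyed by the schema columns above. The helper name `cubeCounts` and the assumption that `facet_type` holds axis names such as `material` are hypothetical, not taken from explorer.qmd.

```javascript
// Axis names as they appear in the cube's filter_* columns.
const AXES = ["source", "material", "context", "object_type"];

// Hypothetical helper: collect look-ahead counts for one active filter.
// Pass activeAxis = null to read the baseline (no-filter) rows.
function cubeCounts(cubeRows, activeAxis, activeValue) {
  const counts = {};
  for (const row of cubeRows) {
    // A row matches when the active axis column equals the selected value
    // and every other filter column is NULL ("axis open").
    const matches = AXES.every((axis) => {
      const cell = row[`filter_${axis}`] ?? null;
      return axis === activeAxis ? cell === activeValue : cell === null;
    });
    if (!matches) continue;
    // Counts may arrive as BigInt from DuckDB-WASM; normalize to Number.
    (counts[row.facet_type] ??= {})[row.facet_value] = Number(row.count);
  }
  return counts;
}
```

The result is shaped as `{ facet_type: { facet_value: count } }`, so each inner object could be handed to something like `applyFacetCounts(facetKey, countsMap)` if that function accepts a plain object or a `Map` built from one.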

Frontend hooks (already half-built in explorer.qmd)

  • ~L672-675: parquet URL declarations
  • ~L877: facetFilterSQL() — predicate builder, supports OR-within/AND-across
  • ~L976-996: applyFacetCounts(facetKey, countsMap) — currently only called once on baseline
  • ~L593-596, L514-538: .facet-count spans in the DOM, ready for population
  • CSS classes .facet-row.zero (dim) and .facet-count.recomputing (loading) already defined

Faceted-search semantics (the rule)

OR within an axis, AND across axes. The axis being calculated is "open" for its
own count.

If the user has selected material = {bone} and we want the count for "Material: pottery":

  • ❌ Wrong: ... AND material = 'bone' AND material = 'pottery' → returns 0
  • ✅ Right: ... AND material = 'pottery' (drop the current Material constraint)

The cube bakes this in. Live queries must do it explicitly.
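
One way a live query could apply the rule explicitly is sketched below. It assumes the per-sample table exposes one column per axis named `source`, `material`, `context` and `object_type`, and that the selection is an object mapping each axis to an array of chosen values; both are assumptions, as are the helper names `openAxisWhere` and `countQuery` and the table name `facets`. This is not the existing `facetFilterSQL()`.

```javascript
const AXES = ["source", "material", "context", "object_type"];

// Quote a value as a SQL string literal by doubling single quotes.
const sqlString = (value) => `'${String(value).replace(/'/g, "''")}'`;

// WHERE clause for counting values on `openAxis`: OR within an axis (IN),
// AND across axes, with the open axis's own constraint dropped.
function openAxisWhere(selection, openAxis) {
  const clauses = AXES
    .filter((axis) => axis !== openAxis && (selection[axis] ?? []).length > 0)
    .map((axis) => `${axis} IN (${selection[axis].map(sqlString).join(", ")})`);
  return clauses.length ? `WHERE ${clauses.join(" AND ")}` : "";
}

// Full GROUP BY query for one axis. `table` is a hypothetical view name.
function countQuery(selection, openAxis, table = "facets") {
  if (!AXES.includes(openAxis)) throw new Error(`unknown axis: ${openAxis}`);
  return [
    `SELECT ${openAxis} AS facet_value, COUNT(*) AS count`,
    `FROM ${table}`,
    openAxisWhere(selection, openAxis),
    `GROUP BY ${openAxis}`,
  ].filter(Boolean).join("\n");
}
```

With `material = {bone}` selected, `openAxisWhere({ material: ["bone"] }, "material")` yields an empty clause, so "Material: pottery" is counted without the bone constraint, while the same selection still constrains the counts for the other three axes.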

Implementation layers

Layer 1 — surface what already exists (~half day)

Hook a cube lookup into the filter-toggle handler. Single-active-filter case only.
This is pure wiring and will land first as a separate small PR; tracking here for
context but not for design review.

Layer 2 — multi-axis active filters (~1–2 days) ← needs decision

When both source=SESAR AND context=earthmaterial are active, the cube can't
help. Two paths:

  • (a) Live DuckDB-WASM GROUP BY against sample_facets_v2.parquet (5.98M rows).
    Budget: 50–200 ms per filter change. Needs a freshSelectionToken-style
    cancellation primitive (one already exists for sample-card detail loads).
  • (b) Expand the cube to pre-compute pairwise (SM, SC, SO, MC, MO, CO — 6
    patterns × ~50 × ~50 ≈ 15k extra cells, still tiny). Triples don't fit.

Recommendation: try (a) first; fall back to (b) if live queries cannot stay within
the 50–200 ms budget.
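
For path (a), a latest-wins guard along the following lines might be enough. This is a sketch in the spirit of freshSelectionToken, not its actual implementation; `makeLatestGuard`, `refreshCounts`, `runQuery` and `render` are hypothetical names. Note that it only discards stale results; it does not abort a query that is already running.

```javascript
// Hypothetical latest-wins guard: each filter change takes a new token,
// and only the most recently issued token is considered fresh.
function makeLatestGuard() {
  let current = 0;
  return {
    next() { return ++current; },
    isFresh(token) { return token === current; },
  };
}

// Run a count query and render its result only if no newer filter change
// has started in the meantime. Resolves to true when the result was rendered.
async function refreshCounts(guard, runQuery, render) {
  const token = guard.next();
  const counts = await runQuery();
  if (!guard.isFresh(token)) return false; // superseded: drop the stale result
  render(counts);
  return true;
}
```

The same guard would also fit a stale-while-revalidate flow: keep the previous counts on screen, mark them with `.facet-count.recomputing`, and clear the class inside `render`.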

Layer 3 — spatial + text + multi-select within axis (~2–3 days)

  • Multi-select within axis (e.g. source IN (SESAR, OpenContext)) — live query
    handles this naturally.
  • Bbox constraint — needs JOIN wide_url for lat/lng, or pre-bake H3 cell into
    facets_url. Adds 5–10× query cost.
  • Text search: ILIKE on label/description/place_name. The existing search
    path already does this; it needs to feed into the count query.
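
To show how a text constraint might be folded into the count query, here is a minimal sketch that builds an ILIKE clause over the three columns named above. The helper names are hypothetical, which table carries those columns is an assumption, and the existing search path may build its predicate differently.

```javascript
// Escape LIKE wildcards (% and _) and the escape character itself,
// then wrap the term in %...% and quote it as a SQL string literal.
function likePattern(term) {
  const escaped = String(term).replace(/[\\%_]/g, (ch) => `\\${ch}`);
  return `'%${escaped.replace(/'/g, "''")}%'`;
}

// Hypothetical: OR the same pattern across the searchable columns.
function textSearchClause(term, columns = ["label", "description", "place_name"]) {
  const pattern = likePattern(term);
  const parts = columns.map((col) => `${col} ILIKE ${pattern} ESCAPE '\\'`);
  return `(${parts.join(" OR ")})`;
}
```

The returned clause is parenthesized so it can be joined to the axis predicates with AND.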

Layer 4 — UX polish (ongoing)

Threshold for "..." spinner on expensive queries, hooking up .facet-row.zero
dimming, mobile collapse, "Reset" behavior on huge categories.

Open design questions

  1. Stale-while-revalidate? On filter toggle, show baseline counts immediately
    and update to filtered counts when the live query returns? Or block on the
    query and show "…" in the meantime? (Lean: stale-while-revalidate, with
    .facet-count.recomputing italic styling already in place.)

  2. Bbox constraint scope. Should facet counts reflect "in current viewport" or
    "globally"? In-map H3 cells already show in-viewport counts. (Lean: globally
    for the legend, viewport stays an H3-only thing.)

  3. Hierarchical concept rollup. context=earthinterior is a sub-concept of
    anysampledfeature. Should parent counts include children? Does the cube
    already do this, or are counts strictly leaf-level? (Needs verification against
    the cube.)

  4. Non-canonical axes. Cube is limited to source/material/context/object_type.
    If the explorer ever surfaces project / site / curation-location facets, the
    cube schema needs migration. Is this on the roadmap?

Out of scope

Acceptance

  • Decisions recorded on the four questions above.
  • Layer 2 path (a vs. b) chosen.
  • Layers 1, 2, 3 broken into trackable sub-issues once the design lands.

Cross-refs: #163, #164, #226.

Labels

  • enhancement: New feature or request
  • explorer: Interactive Explorer features
  • needs-discussion: Requires team input before implementing
