explorer: dynamic cross-filter facet counts #155

Merged
rdhyee merged 4 commits into isamplesorg:main from rdhyee:explorer/cross-filter-counts
May 1, 2026

rdhyee commented May 1, 2026

Closes #154.

Implements the path-A plan from #154 (Codex-resolved): when the user toggles a filter, the count next to every other facet value recalculates against the new filter combination. Matches the OpenContext UX Eric Kansa pointed at. Source + material + sampled feature + specimen type all participate. Search and camera viewport stay out of scope per the issue.

Live preview: https://rdhyee.github.io/isamplesorg.github.io/tutorials/isamples_explorer.html?sources=OPENCONTEXT

What changed

Data

  • New constant cross_filter_url = .../isamples_202601_facet_cross_filter.parquet (the existing ~6 KB single-filter pre-aggregation cache).

DOM scaffolding

  • Source legend and material/context/object_type rows now share a uniform shape:
    • <label class="facet-row" data-facet="K" data-value="V">
    • <span class="facet-count" data-facet="K" data-value="V">
  • This lets applyFacetCounts(facetKey, countsMap) update text in place without rebuilding checkbox HTML — important because mid-interaction rebuilds would lose user selections.
  • New CSS: .facet-row.zero { opacity: 0.4 } for dim-on-zero, and .facet-count.recomputing { opacity: 0.55; font-style: italic } while a refresh is in flight.

Helpers (helpers cell)

  • applyFacetCounts(facetKey, countsMap | null) — writes count text and toggles .zero on each row. Null = restore baseline (from viewer._baselineCounts).
  • markFacetCountsRecomputing() — italicizes counts during a query.
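
The in-place update described above can be sketched roughly as follows. This is not the code from the PR: rows are modelled as plain objects (`facet`, `value`, `text`, `zero`, `checked`) instead of DOM nodes so the logic runs outside a browser, `baselineCounts` stands in for viewer._baselineCounts, and the extra `rows` and `baselineCounts` parameters are hypothetical. Treating a value that is missing from the counts map as 0 is also an assumption.

```javascript
// Hypothetical stand-in for the DOM: each row mirrors one
// <label class="facet-row" data-facet data-value> and its count span.
function applyFacetCounts(rows, facetKey, countsMap, baselineCounts) {
  // A null countsMap means "restore the baseline counts for this facet".
  const source =
    countsMap === null ? baselineCounts[facetKey] || {} : countsMap;
  for (const row of rows) {
    if (row.facet !== facetKey) continue;
    // Assumption: a value absent from the map has zero matching samples.
    const n = source[row.value] || 0;
    row.text = String(n);    // update the count text in place
    row.zero = n === 0;      // would toggle the .zero dimming class
    row.recomputing = false; // would clear the .recomputing class
  }
  return rows;
}
```

Because only `text`, `zero` and `recomputing` are written, a row's `checked` state is never touched, which is the property the PR relies on to keep user selections intact.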

refreshFacetCounts() (inside zoomWatcher)

Debounced 250 ms with a generation guard (facetCountsReqId). Three cases:

  1. No filters active → restore baseline. No query.
  2. Exactly one facet value selected → cache hit. SQL:
    SELECT facet_type, facet_value, count
    FROM read_parquet(cross_filter_url)
    WHERE filter_<key> = '<value>'
      AND <other filter cols> IS NULL
    Verified schema: (filter_source, filter_material, filter_context, filter_object_type, facet_type, facet_value, count).
  3. Anything else → four concurrent on-the-fly GROUP BY queries on sample_facets_v2, each excluding the column being recomputed:
    SELECT <col> AS value, COUNT(*) FROM facets_v2
    WHERE <other dim filters>
    GROUP BY <col>

Selected dim's own values keep baseline counts (so the user can switch their selection within that group without the counts collapsing to "now I only have one option").
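
A minimal sketch of the three-way decision and the single-filter cache query described above. The function names, the `selections` shape (one array of selected values per dimension, empty meaning nothing selected) and the quoting helper are assumptions made for illustration; only the filter_* column names and the facet_type / facet_value / count columns come from the verified schema.

```javascript
const DIMS = ["source", "material", "context", "object_type"];

// Hypothetical helper: pick one of the three cases from the selections.
function chooseCountStrategy(selections) {
  const total = DIMS.reduce((n, d) => n + (selections[d] || []).length, 0);
  if (total === 0) return "baseline"; // case 1: no query at all
  if (total === 1) return "cache";    // case 2: pre-aggregated parquet
  return "on-the-fly";                // case 3: GROUP BY fallback
}

// Double any single quote so a value cannot break out of the SQL literal.
const sqlString = (v) => "'" + String(v).replace(/'/g, "''") + "'";

// Builds the case-2 lookup; assumes exactly one value is selected overall.
function buildCacheQuery(selections, crossFilterUrl) {
  const active = DIMS.find((d) => (selections[d] || []).length === 1);
  const where = DIMS.map((d) =>
    d === active
      ? `filter_${d} = ${sqlString(selections[d][0])}`
      : `filter_${d} IS NULL`
  ).join("\n  AND ");
  return (
    "SELECT facet_type, facet_value, count\n" +
    `FROM read_parquet(${sqlString(crossFilterUrl)})\n` +
    `WHERE ${where}`
  );
}
```

Note that two values selected inside the same dimension count as "anything else" here and take the on-the-fly path, since the cache only covers a single filter value.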

Wiring

Existing source / material / context / object_type change handlers each gain a refreshFacetCounts() call alongside the existing refreshResultsTable(). One initial paint at startup so any ?sources= URL param's narrowing is reflected immediately.

What's deferred (per #154)

  • Search composition. Search is currently an imperative panel lookup; folding it into count math would require promoting it to a real persistent filter with URL sync. Not in v1.
  • Camera-aware counts. Mixes "what filters are possible in the dataset" with "what is visible in viewport." Defer.
  • Path C (denser pre-aggregated cache covering all 2-filter combinations). Touches the data-build pipeline. Revisit only if path B's perf turns out unacceptable in practice.

Test plan

  • quarto render tutorials/isamples_explorer.qmd succeeds
  • Playwright smoke test: 0 JS exceptions, 0 console errors, 0 network failures
  • Cross-filter parquet schema verified via DuckDB (4 filter cols + facet_type/facet_value/count)
  • Deployed to rdhyee fork preview
  • Reviewer eyeballs preview URL above:
    • check OpenContext in the source legend → material / sampled feature / specimen type counts should all drop to OpenContext-only counts; values with 0 OpenContext samples should dim. Source counts stay at baseline.
    • within the dimmed material list, check Pottery → should now have ≥2 filters active, hitting the on-the-fly fallback path. Other facets (source, sampled feature, specimen type) should recompute against OpenContext+Pottery; material list itself stays at baseline.
    • Clear Filters → all counts return to baseline, no rows dimmed.
  • Codex pass

References

🤖 Generated with Claude Code

Implements path A from issue isamplesorg#154 (Codex-resolved plan): facet counts
recalculate when filters are toggled, matching the OpenContext UX Eric
Kansa flagged. Counts reflect the full source + material + context +
object_type filter combination; search and camera viewport remain out
of v1 scope per the issue.

How it works:
- All four filter groups (source legend + material + sampled feature +
  specimen type) carry uniform `data-facet` / `data-value` count spans
  + `.facet-row` parents so refreshFacetCounts() updates text in place
  without rebuilding checkbox HTML (which would lose mid-interaction
  selections).
- One debounced refreshFacetCounts() (~250 ms) with a generation/stale-
  result guard, mirroring the existing loadViewportSamples and
  refreshResultsTable patterns.
- Strategy:
  * No filters active → restore baseline counts from facet_summaries
    (stashed on viewer._baselineCounts after initial load).
  * Exactly one facet value selected → cache lookup against
    isamples_202601_facet_cross_filter.parquet (~6 KB, single-filter
    pre-aggregation; one matching row per target facet value).
  * Anything else → on-the-fly: four concurrent GROUP BY queries on
    sample_facets_v2, each excluding the column being recomputed.
- The selected facet's own values keep baseline counts (so the user
  can see what other values WOULD yield if they switched their
  selection within that group).
- Zero-count rows dim (opacity 0.4) rather than hide; less layout
  churn and clearer signal that the selection eliminated those values.
- During recompute the counts get a subtle italic .recomputing class
  so users see something is updating.

Source counts are sample-level (matching facet_summaries / facets_url)
and not H3 cluster dominant_source counts; the cluster-zoom caveat in
"How It Works" already covers the dominant_source semantics for the
globe view.

Smoke-test: 0 JS exceptions, 0 console errors, 0 network failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee commented May 1, 2026

Semantics discussion (Raymond ↔ Claude, 2026-04-30)

Anchoring the within-facet semantics + "count next to each value" before continuing the patch.

Raymond

Let's talk about the semantics of the UI. I think that for any given facet (including collection), if nothing is selected, it means all. If you pick one filter value, then it's just that one item, and if you pick multiple values within a given facet, you do an "or" among all the choices. Ideally, we'd calculate the number of items you'd get if you added that filter item.

Does that make sense? I think this is my intuition for how it should behave. Let's think about it.

Claude

Yes — and your intuition is the standard faceted-search behavior, which is good news. But there's a subtle ambiguity in the "count if you add this" part that's worth pinning down before I touch the code.

Within-facet semantics you described (empty = all; 1 = just that; multi = OR) — this is what the SQL filter logic already does in PR #155. WHERE material IN ('Pottery', 'Glass') is OR; material clause absent is "all." That part is settled.

"Count if you add this" — two readings:

(A) Per-value count under cross-dim filters. Next to "Glass" we display the count of samples whose material = Glass AND <other dim filters>, regardless of what else is selected within the same Material facet.

  • Pros: simple, one query per dim, this is what OpenContext shows.
  • Reads as: "how many Glass samples exist under my other filters."

(B) Hypothetical-result-after-add count. Next to "Glass" we display the new total result set size if Glass were added to whatever's already selected in Material.

  • Pros: literally "what happens if I click this."
  • Cons: every value's number is current_result + per_value_count_of_that_value (since same-dim values are disjoint per sample) — they all shift by the same amount, so they convey less relative information than (A). And clicking selected values (which would remove them) needs a separate computation. Not what OpenContext does.

I'd go with (A). It's what every major faceted UI does (OpenContext included), it answers "how big is each option?" which is what users actually want for picking next, and (B)'s numbers move in lockstep so they don't help comparison.
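
To make the two readings concrete, here is a small in-memory sketch. The sample rows, the function names and the `selections` shape are invented for illustration; the real explorer computes these counts in SQL.

```javascript
// Toy sample rows; the values are made up for illustration.
const samples = [
  { source: "OPENCONTEXT", material: "Pottery" },
  { source: "OPENCONTEXT", material: "Pottery" },
  { source: "OPENCONTEXT", material: "Glass" },
  { source: "SESAR", material: "Glass" },
  { source: "SESAR", material: "Rock" },
];

// Within a dim: empty selection = all, several values = OR.
const matches = (row, selections, skipDim) =>
  Object.entries(selections).every(
    ([dim, values]) =>
      dim === skipDim || values.length === 0 || values.includes(row[dim])
  );

// Reading (A): count each value of `dim` under the OTHER dims' filters,
// ignoring whatever is currently selected inside `dim` itself.
function countsA(rows, selections, dim) {
  const out = {};
  for (const row of rows) {
    if (!matches(row, selections, dim)) continue;
    out[row[dim]] = (out[row[dim]] || 0) + 1;
  }
  return out;
}

// Reading (B): size of the result set if `value` were added to `dim`.
function countB(rows, selections, dim, value) {
  const next = { ...selections, [dim]: [...(selections[dim] || []), value] };
  return rows.filter((row) => matches(row, next, null)).length;
}
```

With Source=OPENCONTEXT and Material=Pottery selected, `countsA(samples, selections, "material")` gives Pottery 2 and Glass 1 (Rock is absent, so it would dim as zero). Under (B), adding Glass gives 3 and adding Rock gives 2: each is the current result size of 2 plus that value's (A) count, which is the lockstep shift described above.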

Bug in my current PR #155 to fix regardless: when ANY value in a dim is selected, I currently fall back to baseline counts for that dim's own values (the unfiltered facet_summaries). I should be applying cross-dim filters to that dim too, just excluding the dim's own current selection. So if you've selected Source=OpenContext, the Material list values should each show "count of that material under OpenContext" — not the global counts. That's plain (A).

Decision

Going with A and patching PR #155 to fix the dim-with-selections bug.

Per semantics (A) confirmed in the PR isamplesorg#155 thread: every dim's value
counts should reflect cross-dim filters, not just dims without active
selections. Previously refreshFacetCountsNow's on-the-fly path skipped
any dim that had selections (paint baseline), which broke multi-filter
compositions: with Source=OpenContext + Material=Pottery selected,
Material values still showed unfiltered baseline counts instead of
"count of each material under Source=OpenContext."

Fix: remove the skip in the on-the-fly fallback. The existing
buildExcludeWhere() already excludes the dim being recomputed from the
WHERE, so every dim's GROUP BY yields the correct (A)-semantic count.
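
A rough sketch of the exclusion idea, for readers without the diff at hand. This is a hypothetical reimplementation, not the PR's buildExcludeWhere(): the signature, the `selections` shape, and the assumption that the columns in sample_facets_v2 are named after the dimension keys are all illustrative.

```javascript
const DIMS = ["source", "material", "context", "object_type"];

// Double any single quote so a value cannot break out of the SQL literal.
const quote = (v) => "'" + String(v).replace(/'/g, "''") + "'";

// WHERE clause over every dim EXCEPT the one being recomputed.
// Several values within one dim become IN (...), i.e. OR.
function buildExcludeWhere(selections, excludeDim) {
  const clauses = DIMS.filter(
    (d) => d !== excludeDim && (selections[d] || []).length > 0
  ).map((d) => `${d} IN (${selections[d].map(quote).join(", ")})`);
  return clauses.length ? "WHERE " + clauses.join(" AND ") : "";
}

// One GROUP BY per dim; with the skip removed, every dim gets a query,
// including dims that currently have selections.
function buildGroupByQuery(selections, dim) {
  return [
    `SELECT ${dim} AS value, COUNT(*) AS count`,
    "FROM sample_facets_v2",
    buildExcludeWhere(selections, dim),
    `GROUP BY ${dim}`,
  ]
    .filter(Boolean)
    .join("\n");
}
```

With Source=OPENCONTEXT and Material=Pottery selected, the material query filters only on source, so each material shows its count under OpenContext rather than the global baseline.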

Single-active cache path is unchanged: under one active dim with no
other filters, recomputing that dim's values = baseline by definition,
and the cache is empirically confirmed to not carry rows for the
filter-dim's own facet_type. Updated the comment to explain why.

Smoke test still PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee commented May 1, 2026

Review findings from the semantics pass anchored at #155 (comment):

  1. tutorials/isamples_explorer.qmd:1265 treats “zero source checkboxes selected” as no active source filter, so refreshFacetCountsNow() restores baseline counts. The existing result SQL treats zero selected sources as AND 1=0 at tutorials/isamples_explorer.qmd:308, so the UI can show nonzero facet counts while the actual result set is empty. The source dimension needs a distinct representation for “all selected” versus “none selected.”

  2. tutorials/isamples_explorer.qmd:1377 increments facetCountsReqId only after the 250 ms debounce fires. If an older count query is in flight and the user changes filters, the old query can still pass the generation check and repaint stale counts during the debounce window. Increment the generation when scheduling/invalidating a refresh, then pass that captured id into the delayed call.

I did not run the Quarto render or browser smoke test; this was a code-review pass focused on the linked semantics comment and PR diff.

…+ debounce race

Two fixes from the Codex review of isamplesorg#155:

1. Source filter "0 checked" now means "all" (no filter), matching
   Raymond's stated intuition for facet UX (PR isamplesorg#155 thread) and the
   semantics already used by material/context/object_type. Previously
   sourceFilterSQL returned `AND 1=0` when zero source checkboxes were
   checked, while refreshFacetCountsNow treated zero-checked as "no
   active source filter" and restored baseline counts — meaning the UI
   could show non-zero facet counts while the actual result set was
   empty. The describeActiveFilters helper already mapped both 0-checked
   and 4-checked to "source not filtering," so this just unifies
   sourceFilterSQL with that view.

2. Debounce race fix: refreshFacetCounts() now increments
   facetCountsReqId synchronously when scheduling, not after the 250 ms
   debounce fires. Previously, an in-flight count query could resolve
   during the debounce window of a newer schedule and still pass its
   generation check (because the gen was unchanged), repainting stale
   counts. Bumping the gen on schedule invalidates all in-flight queries
   immediately. Also added an early-stale guard at the top of
   refreshFacetCountsNow.
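
The debounce fix in item 2 can be sketched as below. This is an illustrative reconstruction, not the PR's code: the factory function, the callback-style `runQuery(filters, done)` and the injectable `setTimer` / `clearTimer` are assumptions, chosen so the race can be replayed deterministically without real timers.

```javascript
// Hypothetical factory; the real code keeps facetCountsReqId in a closure
// inside zoomWatcher and awaits DuckDB queries instead of using callbacks.
function createFacetCountRefresher({
  runQuery,                 // (filters, done) => void
  paint,                    // (counts) => void
  delayMs = 250,
  setTimer = setTimeout,
  clearTimer = clearTimeout,
}) {
  let reqId = 0;
  let timer = null;

  function refreshNow(myId, filters) {
    if (myId !== reqId) return; // early-stale guard
    runQuery(filters, (counts) => {
      if (myId !== reqId) return; // result from a superseded request
      paint(counts);
    });
  }

  function refresh(filters) {
    // Bump the generation synchronously, at schedule time, so any query
    // still in flight is invalidated before the debounce delay elapses.
    const myId = ++reqId;
    if (timer !== null) clearTimer(timer);
    timer = setTimer(() => {
      timer = null;
      refreshNow(myId, filters);
    }, delayMs);
    return myId;
  }

  return { refresh };
}
```

Had `++reqId` been placed inside the timer callback instead, a query started for an earlier filter state could finish during the next debounce window, still match the unchanged generation, and repaint stale counts.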

Smoke test still PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rdhyee commented May 1, 2026

Codex re-review finding after commit 1bcf1ad:

  • P3: tutorials/isamples_explorer.qmd:1403 still applies .disabled to every unchecked source checkbox. After the new fix, zero checked sources means “all sources,” but the UI will show every source label dimmed as if every source is excluded, while results/counts include all sources. Either don’t apply the disabled styling when active.length === 0, or normalize the zero-source state back to all checked.

The two prior review findings are addressed in current head 1bcf1ad: result filtering and facet-count semantics now agree for zero checked sources, and facetCountsReqId is invalidated before the debounce delay.

I did not run Quarto/browser tests for this re-review.

After the prior fix unifying source semantics so 0 checked == 4 checked
== "all sources / no filter," the .disabled styling on the legend rows
was still based on the per-checkbox checked state. Result: unchecking
all 4 source boxes painted every label dimmed (looked like every
source was excluded) while results and counts correctly included all
sources. Cosmetically inconsistent.

Fix: dim a row only when the source filter is actually narrowing the
result set (1–3 checked). 0 or 4 checked → no dimming, matching the
"no source filter" semantics. Also reuses the resulting filterIsActive
flag for the URL persistence branch a few lines down (was duplicating
the `> 0 && < 4` test).
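
A small sketch of the unified source semantics after these fixes, where one `filterIsActive` flag drives the SQL clause and the legend dimming alike. The function name, the return shape, the `source` column name and the four source identifiers are assumptions for illustration; only "0 or 4 checked means no filter" and "1 to 3 checked narrows" come from the commit messages.

```javascript
// Assumed identifiers for the four sources; only OPENCONTEXT appears in
// this thread (in the ?sources= URL parameter).
const ALL_SOURCES = ["SESAR", "OPENCONTEXT", "GEOME", "SMITHSONIAN"];

// Double any single quote so a value cannot break out of the SQL literal.
const quote = (v) => "'" + String(v).replace(/'/g, "''") + "'";

function sourceFilterState(checked, allSources = ALL_SOURCES) {
  // 0 checked and all checked both mean "all sources / no filter".
  const filterIsActive =
    checked.length > 0 && checked.length < allSources.length;
  return {
    filterIsActive,
    // Clause fragment appended to the result query; empty when not filtering.
    sql: filterIsActive
      ? `AND source IN (${checked.map(quote).join(", ")})`
      : "",
    // Legend rows to dim: only unchecked sources, and only while narrowing.
    dimmed: filterIsActive
      ? allSources.filter((s) => !checked.includes(s))
      : [],
  };
}
```

Deriving the SQL, the dimming and the URL persistence from the same flag is what keeps the three from drifting apart again.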

Smoke test still PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rdhyee rdhyee merged commit e9a8288 into isamplesorg:main May 1, 2026
1 check passed