Search-within-a-facet (facet_query typeahead) for high-cardinality facets beyond the maxFacetValues cap #533

## Context

The keyed `DatasetFacets` GraphQL surface returns a bounded set of facet buckets per field, capped by the engine’s `maxFacetValues` (`CompileOptions.maxFacetValues` → Typesense `max_facet_values`). The Dataset Register consumer sets this to a fixed ceiling (recently raised from 250 to 2000). Beyond the ceiling, Typesense returns only the top-N buckets by count and silently drops the rest — a facet value that exists in the data but ranks below the cutoff becomes unreachable through the UI, with no signal that anything is missing.

The value is not client-parameterizable: `maxFacetValues` is set once at engine construction and is not part of the `SearchQuery` IR or the GraphQL schema, so every request gets the deployment’s fixed ceiling. The consumer’s facet search box is currently **client-side only** — it filters the already-fetched top-N buckets, so it cannot reach anything the cap dropped.
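The gap described above can be illustrated with a small simulation. This is only a sketch: the `FacetBucket` shape, the helper names, and the sample data are hypothetical and not taken from `@lde/search` or the consumer.

```typescript
// Hypothetical bucket shape; the real `DatasetFacets` types may differ.
interface FacetBucket {
  value: string;
  count: number;
}

// Simulates the engine keeping only the top-N buckets by count.
function topBuckets(all: FacetBucket[], maxFacetValues: number): FacetBucket[] {
  return [...all].sort((a, b) => b.count - a.count).slice(0, maxFacetValues);
}

// Simulates the current client-side search box: it can only filter
// buckets that were actually fetched.
function filterClientSide(fetched: FacetBucket[], typed: string): FacetBucket[] {
  const needle = typed.toLowerCase();
  return fetched.filter((b) => b.value.toLowerCase().includes(needle));
}

// Made-up sample data with a cap of 2 instead of 2000.
const allBuckets: FacetBucket[] = [
  { value: "climate", count: 40 },
  { value: "archives", count: 25 },
  { value: "cartography", count: 3 },
];

const fetched = topBuckets(allBuckets, 2);
// "cartography" exists in the data but ranked below the cutoff,
// so the client-side filter finds nothing.
const hits = filterClientSide(fetched, "carto");
```

With the cap applied, `hits` is empty even though a matching value exists in the full value space, and nothing in `fetched` tells the client that a bucket was dropped.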

Measured production cardinality (Dataset Register, ~2.6k datasets) shows the previous cap of 250 was genuinely hit: `keyword` ≈ 838 distinct values, `class` ≈ 786, `publisher` ≈ 303. Raising the cap to 2000 covers these today with headroom, so this is **not urgent**. It does not scale, though: a literal facet like `keyword` growing into the tens of thousands would make “return every bucket on every page load” too heavy for the per-facet fan-out (one engine search per facet, per page).

## Proposal

Add a per-facet **value-query** (search-within-a-facet / typeahead) so a client can search a facet’s full value space without prefetching all buckets:

- **`@lde/search`**: extend the query IR so a facet request can carry an optional value-query string, and optionally return a truncation signal or total distinct count (Typesense already exposes the latter as `facet_counts[].stats.total_values`).
- **`@lde/search-typesense`**: compile that into Typesense `facet_query`, composed with the existing skip-own-filter per-facet search so typeahead counts still respect the other active filters.
- **`@lde/search-api-graphql`**: expose the value-query in the schema, with any client-supplied bound capped server-side; this is a public endpoint, so clients must not be able to request an unbounded result.
- **Consumer (Dataset Register)**: promote the client-side facet filter to a server-backed lookup and add the UI wiring.
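A rough sketch of how the IR extension and the Typesense compile step could fit together. Everything except the Typesense parameter names (`facet_by`, `facet_query`, `max_facet_values`, `filter_by`) is an assumption made for illustration: the `FacetValueQuery` and `ActiveFilter` shapes, the function names, and the ceiling values are not part of the existing packages.

```typescript
// Hypothetical IR addition: a facet request that carries a value-query.
interface FacetValueQuery {
  field: string; // facet field to search within, e.g. "keyword"
  text: string; // what the user has typed so far
  limit?: number; // client-requested number of values (untrusted)
}

// Hypothetical view of an already-compiled filter clause per field.
interface ActiveFilter {
  field: string;
  clause: string; // a ready-made Typesense filter_by fragment
}

// Facet-related Typesense search parameters only; the rest of the
// search (q, query_by, ...) is assumed to come from the existing compile.
interface TypesenseFacetParams {
  facet_by: string;
  facet_query: string;
  max_facet_values: number;
  filter_by?: string;
}

// Assumed values; the real defaults would be a deployment decision.
const DEFAULT_TYPEAHEAD_VALUES = 10;
const MAX_TYPEAHEAD_VALUES = 50;

// Clamp the client-supplied limit so a public endpoint never passes
// an unbounded value through to the engine.
function clampLimit(requested: number | undefined): number {
  if (requested === undefined || !Number.isFinite(requested)) {
    return DEFAULT_TYPEAHEAD_VALUES;
  }
  return Math.min(Math.max(1, Math.floor(requested)), MAX_TYPEAHEAD_VALUES);
}

function compileFacetValueQuery(
  query: FacetValueQuery,
  filters: ActiveFilter[],
): TypesenseFacetParams {
  // Skip-own-filter: drop the filter on the facet being searched so its
  // own selection does not hide sibling values; keep all other filters.
  const filterBy = filters
    .filter((f) => f.field !== query.field)
    .map((f) => f.clause)
    .join(" && ");

  const params: TypesenseFacetParams = {
    facet_by: query.field,
    facet_query: `${query.field}:${query.text}`,
    max_facet_values: clampLimit(query.limit),
  };
  if (filterBy !== "") {
    params.filter_by = filterBy;
  }
  return params;
}

const compiled = compileFacetValueQuery(
  { field: "keyword", text: "clim", limit: 100000 },
  [
    { field: "keyword", clause: "keyword:=climate" },
    { field: "publisher", clause: "publisher:=acme" },
  ],
);
```

Here the oversized `limit` is clamped to the assumed ceiling, and the `keyword` filter is left out of `filter_by` while the `publisher` filter still constrains the counts.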

## Caveat: stored value vs. translated label

Typesense `facet_query` matches the **stored** facet value. This works directly for literal-valued facets (e.g. `keyword`), but not for IRI-valued facets whose labels are resolved/translated on the client (e.g. `publisher`, `class`) — a `facet_query` on the raw IRI will not match what the user types. Server-backed typeahead for those facets needs a searchable indexed label field first, so the natural first slice is the literal high-cardinality facets only.
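The mismatch can be shown with a deliberately simplified stand-in for the matching step. The prefix-on-whitespace-token rule below is an assumption used for illustration, not a description of Typesense's exact matching behaviour, and the IRI and label are made-up examples.

```typescript
// Simplified stand-in for value matching: true when any
// whitespace-separated token of the stored value starts with the
// typed text. This is an approximation for illustration only.
function matchesStoredValue(stored: string, typed: string): boolean {
  const needle = typed.toLowerCase();
  return stored
    .toLowerCase()
    .split(/\s+/)
    .some((token) => token.startsWith(needle));
}

// Literal-valued facet: the stored value is what the user sees.
const keywordStored = "climate change";

// IRI-valued facet: the stored value is an IRI, the label is resolved
// on the client (both values here are hypothetical).
const publisherStored = "https://example.org/organisation/123";
const publisherLabel = "National Library";

const keywordMatch = matchesStoredValue(keywordStored, "clim");
const publisherMatch = matchesStoredValue(publisherStored, "nat");
const labelMatch = matchesStoredValue(publisherLabel, "nat");
```

The typed text matches the keyword and would match the publisher label, but not the stored IRI, which is why the IRI-valued facets need a searchable indexed label field before server-backed typeahead can work for them.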

## Current mitigation

`maxFacetValues` raised to 2000 in the consumer, which covers present cardinality (max ≈ 838) with headroom. Optionally surface `stats.total_values` as a “showing X of Y” indicator so any future truncation is never silent.
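A minimal sketch of the “showing X of Y” indicator, assuming the consumer has access to a facet entry shaped like Typesense's `facet_counts[]` response (`field_name`, `counts`, `stats.total_values`). The `truncationLabel` helper and the label wording are hypothetical.

```typescript
// Subset of a Typesense facet_counts[] entry that this sketch relies on.
interface FacetCountsEntry {
  field_name: string;
  counts: { value: string; count: number }[];
  stats: { total_values?: number };
}

// Returns a label only when the engine reports more distinct values
// than it returned buckets, i.e. when truncation actually happened.
function truncationLabel(entry: FacetCountsEntry): string | null {
  const total = entry.stats.total_values;
  const shown = entry.counts.length;
  if (total === undefined || total <= shown) {
    return null;
  }
  return `Showing ${shown} of ${total}`;
}

// Made-up example: two buckets returned, five distinct values in total.
const truncated = truncationLabel({
  field_name: "keyword",
  counts: [
    { value: "climate", count: 40 },
    { value: "archives", count: 25 },
  ],
  stats: { total_values: 5 },
});

// Made-up example: everything fits under the cap, so no label.
const complete = truncationLabel({
  field_name: "publisher",
  counts: [{ value: "https://example.org/organisation/123", count: 7 }],
  stats: { total_values: 1 },
});
```

Returning `null` when nothing was dropped keeps the indicator out of the UI in the common case, so it only appears once a facet outgrows the cap.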

## When to pick this up

When a literal (non-IRI) facet’s distinct cardinality grows large enough that returning its full bucket list on every page load becomes a payload/latency problem, or when product wants genuine search-within-a-facet UX rather than filtering a prefetched slice.

## Related

- Sibling facet-engine follow-up: #532 (batch per-facet searches into one `multi_search`) — a value-query typeahead would ride the same per-facet fan-out.

