feat(_mapping): timestamp pushdown + column-hints fast path #6436

Open
congx4 wants to merge 1 commit into main from cong/quickwit-mapping-fast-path

congx4 commented May 15, 2026

Summary

Two small additions to the ES-compat _mapping(s) endpoint that together let downstream callers (e.g. Trino's ES connector) skip the expensive list_fields scan in the common case.

Today GET /_elastic/{index}/_mapping(s) calls list_fields over every published split. On indexes with hundreds of thousands of dynamic fields this can take several seconds, and once a leaf hits QW_FIELD_LIST_SIZE_LIMIT (100k by default) the request fails. This PR addresses both problems with no proto change:

  1. Timestamp pushdown — new ?start_timestamp=…&end_timestamp=… URL params on _mapping(s), forwarded into ListFieldsRequest verbatim. The metastore prunes the candidate split set by time window before any leaf fan-out. Unit is epoch seconds, half-open interval — matching the existing ListFieldsRequest proto contract.

  2. Column-hints fast path — new ?fields=… URL param (comma-separated names). When every requested name is a flat literal (no *, no ?, no .) declared in the union of the indexes' doc_mapping, the handler builds the response straight from the declared mapping, filtered to those names. No list_fields call, no split I/O.

    Anything else (wildcards, dotted paths, names not in doc_mapping) falls through to the full-mapping path: list_fields over the splits in the time range, full unfiltered mapping returned — same shape as today, just with the timestamp-pushdown optimization applied.
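
The two items above can be sketched roughly as follows. This is a standalone illustration, not the code in this PR: `TimeWindow`, `overlaps`, `is_flat_literal` and `fast_path_eligible` are hypothetical names, the half-open interval is assumed to be `[start_timestamp, end_timestamp)`, and an empty hint list is assumed to take the full-mapping path.

```rust
use std::collections::BTreeSet;

/// Request window in epoch seconds. Assumed half-open: `[start, end)`.
/// `None` means the bound is not set.
#[derive(Clone, Copy, Debug, Default)]
struct TimeWindow {
    start: Option<i64>,
    end: Option<i64>,
}

impl TimeWindow {
    /// True when a split whose documents span the inclusive range
    /// `[split_min, split_max]` may contain documents inside the window.
    fn overlaps(&self, split_min: i64, split_max: i64) -> bool {
        let after_start = self.start.map_or(true, |start| split_max >= start);
        let before_end = self.end.map_or(true, |end| split_min < end);
        after_start && before_end
    }
}

/// A hint is a flat literal when it has no wildcard (`*`, `?`) and no dot.
fn is_flat_literal(name: &str) -> bool {
    !name.is_empty() && !name.contains(|c: char| matches!(c, '*' | '?' | '.'))
}

/// The fast path applies only when every hint is a flat literal that is
/// declared in the doc mapping. Assumption: no hints means no fast path.
fn fast_path_eligible(hints: &[&str], declared: &BTreeSet<&str>) -> bool {
    !hints.is_empty()
        && hints
            .iter()
            .all(|name| is_flat_literal(name) && declared.contains(name))
}
```

With declared fields `host` and `status`, the hints `host,status` would qualify for the fast path, while `resource.name`, `sta*` or an undeclared name would fall through to `list_fields` over the splits that overlap the window.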

Test plan

  • cargo build -p quickwit-serve — clean
  • cargo clippy -p quickwit-serve --tests — clean
  • cargo +nightly fmt --all -- --check — clean
  • cargo test -p quickwit-serve --lib elasticsearch_api:: — 75 / 75 pass

Test coverage added

  • IndexMappingQueryParams parser: start_timestamp / end_timestamp / fields accepted in isolation and together; empty fields=; unknown params silently ignored.
  • ElasticsearchMappingsResponse::from_doc_mapping_filtered: keeps only requested names; object subtree preserved on top-level match; empty filter behaves identically to from_doc_mapping.
  • collect_declared_top_level_names: unions field names across multiple indexes.
  • parse_field_hints: empty / whitespace / commas-only / well-formed lists.
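
For reviewers, the behaviour these tests describe can be sketched as below. The function name `parse_field_hints` comes from the list above, but its body here is a guess at the contract (trim, drop empties). `FieldMapping` and `filter_top_level` are hypothetical stand-ins for the real mapping types and for `from_doc_mapping_filtered`.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a declared field mapping (hypothetical type).
#[derive(Clone, Debug, PartialEq)]
enum FieldMapping {
    /// A leaf field with its ES type name, e.g. "keyword" or "long".
    Leaf(&'static str),
    /// An object field with nested properties.
    Object(BTreeMap<String, FieldMapping>),
}

/// Splits a comma-separated `fields` value into trimmed, non-empty names.
/// Empty, whitespace-only and commas-only inputs all yield an empty list.
fn parse_field_hints(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Keeps only the top-level fields named in `hints`. A matching object field
/// keeps its whole subtree. An empty hint list returns the mapping unchanged.
fn filter_top_level(
    mapping: &BTreeMap<String, FieldMapping>,
    hints: &[String],
) -> BTreeMap<String, FieldMapping> {
    if hints.is_empty() {
        return mapping.clone();
    }
    mapping
        .iter()
        .filter(|(name, _)| hints.iter().any(|hint| hint == *name))
        .map(|(name, field)| (name.clone(), field.clone()))
        .collect()
}
```

Filtering happens on top-level names only, which is why dotted hints cannot be served from this path.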

🤖 Generated with Claude Code

Notes:
- Unknown query params are silently ignored (no `deny_unknown_fields`)
  to stay compatible with standard ES clients that pass `pretty`,
  `ignore_unavailable`, `allow_no_indices`, etc.
- No proto change. Stays on existing `ListFieldsRequest`.
- `IndexMappingQueryParams` parser and the new
  `ElasticsearchMappingsResponse::from_doc_mapping_filtered` are
  unit-tested in their respective modules.
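
The lenient parsing described in the first note can be illustrated with a hand-rolled, std-only sketch. The PR relies on a deserializer without `deny_unknown_fields`; the `MappingQueryParams` struct and `parse_query` function below are hypothetical stand-ins that mimic that behaviour, and they skip percent-decoding for brevity.

```rust
/// Hypothetical stand-in for `IndexMappingQueryParams`.
#[derive(Debug, Default, PartialEq)]
struct MappingQueryParams {
    start_timestamp: Option<i64>,
    end_timestamp: Option<i64>,
    fields: Option<String>,
}

/// Parses an epoch-seconds value, naming the offending key on failure.
fn parse_secs(key: &str, value: &str) -> Result<i64, String> {
    value
        .parse::<i64>()
        .map_err(|err| format!("invalid {key}: {err}"))
}

/// Parses a raw query string. Known keys are extracted; unknown keys such as
/// `pretty` or `ignore_unavailable` are skipped instead of rejected.
fn parse_query(query: &str) -> Result<MappingQueryParams, String> {
    let mut params = MappingQueryParams::default();
    for pair in query.split('&').filter(|pair| !pair.is_empty()) {
        // A key without `=` is treated as having an empty value.
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        match key {
            "start_timestamp" => params.start_timestamp = Some(parse_secs(key, value)?),
            "end_timestamp" => params.end_timestamp = Some(parse_secs(key, value)?),
            "fields" => params.fields = Some(value.to_string()),
            _ => {} // unknown parameter: ignored on purpose
        }
    }
    Ok(params)
}
```

In this sketch a malformed timestamp is still an error, while an unrecognized key never is, so a standard ES client that sends `pretty=true` alongside the new params would keep working.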
congx4 requested review from a team as code owners May 15, 2026 14:53