feat(_mapping): timestamp pushdown + column-hints fast path #6436
Open
congx4 wants to merge 1 commit into
Two small additions to the ES-compat `_mapping(s)` endpoint that
together let downstream callers (e.g. Trino's ES connector) skip the
expensive `list_fields` scan in the common case.
Today `GET /_elastic/{index}/_mapping(s)` calls `list_fields` over every
published split. On indexes with hundreds of thousands of dynamic
fields this can take several seconds, and past a certain size the leaf
hits `QW_FIELD_LIST_SIZE_LIMIT` (100k by default) and the request fails.
This PR addresses both problems with no proto change:
1. Timestamp pushdown
New `?start_timestamp=…&end_timestamp=…` URL params on `_mapping(s)`,
forwarded into `ListFieldsRequest` verbatim. The metastore prunes
the candidate split set by time window before any leaf fan-out.
Unit is epoch seconds, half-open interval — matching the existing
`ListFieldsRequest` proto contract.
2. Column-hints fast path
New `?fields=…` URL param (comma-separated names). When every
requested name is a flat literal (no `*`, no `?`, no `.`) declared
in the union of the indexes' `doc_mapping`, the handler builds the
response straight from the declared mapping, filtered to those
names. No `list_fields` call, no split I/O.
Anything else (wildcards, dotted paths, names not in `doc_mapping`)
falls through to the full-mapping path: `list_fields` over the
splits in the time range, full unfiltered mapping returned — same
shape as today, just with the timestamp-pushdown optimization
applied.
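
The two paths above can be sketched roughly as follows. This is an illustrative, std-only sketch and not the code in this PR: `SplitTimeRange`, `split_overlaps`, `is_flat_literal`, and `can_use_fast_path` are hypothetical names, `parse_field_hints` borrows its name from the helper listed under test coverage but its body here is a guess, and treating an empty hint list as "no hints" is an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a split's time range
/// (inclusive bounds, epoch seconds).
struct SplitTimeRange {
    min_ts: i64,
    max_ts: i64,
}

/// True if the split may hold documents inside the half-open window
/// [start, end). A missing bound leaves that side unbounded.
fn split_overlaps(split: &SplitTimeRange, start: Option<i64>, end: Option<i64>) -> bool {
    let after_start = start.map_or(true, |s| split.max_ts >= s);
    let before_end = end.map_or(true, |e| split.min_ts < e);
    after_start && before_end
}

/// Splits a comma-separated `fields` value into trimmed, non-empty names.
fn parse_field_hints(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(str::to_string)
        .collect()
}

/// A flat literal has no wildcard (`*`, `?`) and no path separator (`.`).
fn is_flat_literal(name: &str) -> bool {
    !name.is_empty() && !name.contains(|c: char| matches!(c, '*' | '?' | '.'))
}

/// The fast path applies only when every hint is a flat literal that is
/// declared in the (unioned) doc mapping. Assumption: no hints means no
/// fast path.
fn can_use_fast_path(hints: &[String], declared: &BTreeSet<String>) -> bool {
    !hints.is_empty()
        && hints
            .iter()
            .all(|hint| is_flat_literal(hint) && declared.contains(hint))
}
```

With `host` and `status` declared, `fields=host,status` would qualify for the fast path, while `fields=host,attrs.*` or `fields=host,missing` would fall through to `list_fields` over the splits that overlap the requested window.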
Notes:
- Unknown query params are silently ignored (no `deny_unknown_fields`)
to stay compatible with standard ES clients that pass `pretty`,
`ignore_unavailable`, `allow_no_indices`, etc.
- No proto change. Stays on existing `ListFieldsRequest`.
- `IndexMappingQueryParams` parser and the new
`ElasticsearchMappingsResponse::from_doc_mapping_filtered` are
unit-tested in their respective modules.
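
To illustrate the lenient parsing described in the first note, here is a minimal std-only sketch. The real handler presumably deserializes `IndexMappingQueryParams` with serde (hence the mention of `deny_unknown_fields`); the `MappingQueryParams` struct, `parse_mapping_query`, and `parse_ts` below are hypothetical stand-ins, and percent-decoding is omitted for brevity.

```rust
/// Hypothetical mirror of the parameters this PR adds; the real
/// `IndexMappingQueryParams` may be shaped differently.
#[derive(Debug, Default, PartialEq)]
struct MappingQueryParams {
    start_timestamp: Option<i64>,
    end_timestamp: Option<i64>,
    fields: Option<String>,
}

/// Parses a raw query string. Unknown keys are skipped rather than
/// rejected, so standard ES client parameters do not cause errors.
fn parse_mapping_query(query: &str) -> Result<MappingQueryParams, String> {
    let mut params = MappingQueryParams::default();
    for pair in query.split('&').filter(|pair| !pair.is_empty()) {
        // A key without `=` is treated as having an empty value.
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        match key {
            "start_timestamp" => params.start_timestamp = Some(parse_ts(key, value)?),
            "end_timestamp" => params.end_timestamp = Some(parse_ts(key, value)?),
            "fields" => params.fields = Some(value.to_string()),
            // e.g. `pretty`, `ignore_unavailable`, `allow_no_indices`
            _ => {}
        }
    }
    Ok(params)
}

/// Parses an epoch-seconds value, naming the offending key on failure.
fn parse_ts(key: &str, value: &str) -> Result<i64, String> {
    value
        .parse::<i64>()
        .map_err(|err| format!("invalid `{key}`: {err}"))
}
```

In this sketch a query such as `pretty=true&start_timestamp=10&end_timestamp=20&fields=a,b` yields both timestamps and the raw `fields` string, while `pretty` is dropped without an error.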
Summary

Two small additions to the ES-compat `_mapping(s)` endpoint that together let downstream callers (e.g. Trino's ES connector) skip the expensive `list_fields` scan in the common case.

Today `GET /_elastic/{index}/_mapping(s)` calls `list_fields` over every published split. On indexes with hundreds of thousands of dynamic fields this can take several seconds, and over a certain threshold the leaf hits `QW_FIELD_LIST_SIZE_LIMIT` (100k by default) and the request fails. This PR addresses both pieces with no proto change:

1. Timestamp pushdown: new `?start_timestamp=…&end_timestamp=…` URL params on `_mapping(s)`, forwarded into `ListFieldsRequest` verbatim. The metastore prunes the candidate split set by time window before any leaf fan-out. Unit is epoch seconds, half-open interval, matching the existing `ListFieldsRequest` proto contract.
2. Column-hints fast path: new `?fields=…` URL param (comma-separated names). When every requested name is a flat literal (no `*`, no `?`, no `.`) declared in the union of the indexes' `doc_mapping`, the handler builds the response straight from the declared mapping, filtered to those names. No `list_fields` call, no split I/O.

Anything else (wildcards, dotted paths, names not in `doc_mapping`) falls through to the full-mapping path: `list_fields` over the splits in the time range, full unfiltered mapping returned. The response has the same shape as today, just with the timestamp-pushdown optimization applied.

Test plan

- `cargo build -p quickwit-serve`: clean
- `cargo clippy -p quickwit-serve --tests`: clean
- `cargo +nightly fmt --all -- --check`: clean
- `cargo test -p quickwit-serve --lib elasticsearch_api::`: 75 / 75 pass

Test coverage added

- `IndexMappingQueryParams` parser: `start_timestamp` / `end_timestamp` / `fields` accepted in isolation and together; empty `fields=`; unknown params silently ignored.
- `ElasticsearchMappingsResponse::from_doc_mapping_filtered`: keeps only requested names; object subtree preserved on top-level match; empty filter behaves identically to `from_doc_mapping`.
- `collect_declared_top_level_names`: unions field names across multiple indexes.
- `parse_field_hints`: empty / whitespace / commas-only / well-formed lists.

🤖 Generated with Claude Code