
feat(_mapping): timestamp pushdown + column-hints fast path (on top of #6439)#6443

Draft
congx4 wants to merge 3 commits into main from cong/mapping-fast-path-on-list-fields


congx4 commented May 18, 2026

Summary

This is PR #6436 ported on top of #6439 so the two can land together cleanly. Identical behavior to #6436; the only changes are the conflict resolutions required by #6439's renames.

If you're reviewing this PR after #6439 merges, the diff against main will be exactly the PR #6436 diff. While #6439 is open, this PR's base is guilload/list-fields so GitHub only shows the column-hints + timestamp-pushdown changes, not #6439's churn.

What's in this PR

Two small additions to the ES-compat `_mapping(s)` endpoint that together let downstream callers (e.g. Trino's ES connector) skip the expensive `list_fields` scan in the common case.

Today `GET /_elastic/{index}/_mapping(s)` calls `list_fields` over every published split. On indexes with hundreds of thousands of dynamic fields this can take several seconds, and once a leaf exceeds `QW_FIELD_LIST_SIZE_LIMIT` (100k by default) the request fails.

  1. Timestamp pushdown: new `?start_timestamp=…&end_timestamp=…` URL params on `_mapping(s)`, forwarded into `ListFieldsRequest` verbatim. The metastore prunes the candidate split set by time window before any leaf fan-out. The unit is epoch seconds and the interval is half-open, matching the existing `ListFieldsRequest` proto contract.

  2. Column-hints fast path: new `?fields=…` URL param (comma-separated names). When every requested name is a flat literal (no `*`, no `?`, no `.`) declared in the union of the indexes' `doc_mapping`, the handler builds the response straight from the declared mapping, filtered to those names. No `list_fields` call, no split I/O.

    Anything else (wildcards, dotted paths, names not in `doc_mapping`) falls through to the full-mapping path: `list_fields` runs over the splits in the time range and the full unfiltered mapping is returned, in the same shape as today but with the timestamp-pushdown optimization applied.
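The fast-path eligibility rule described above can be sketched as follows. This is a minimal illustration, not the PR's code: `is_flat_literal` and `can_use_fast_path` are hypothetical names, and treating an empty request as "not eligible" is an assumption.

```rust
use std::collections::HashSet;

/// A name is a "flat literal" when it has no wildcard (`*`, `?`)
/// and no path separator (`.`).
fn is_flat_literal(name: &str) -> bool {
    !name.is_empty() && !name.contains(|c: char| matches!(c, '*' | '?' | '.'))
}

/// Hypothetical helper: the fast path applies only when every requested
/// name is a flat literal that is declared in the doc mapping.
/// Assumption: an empty request never takes the fast path.
fn can_use_fast_path(requested: &[String], declared: &HashSet<String>) -> bool {
    !requested.is_empty()
        && requested
            .iter()
            .all(|name| is_flat_literal(name) && declared.contains(name))
}
```

A single wildcard, dotted path, or undeclared name in the list is enough to send the whole request down the full-mapping path.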

Conflict resolutions vs #6436

Three conflicts were resolved to align with #6439's renames:

Test plan

  • cargo build -p quickwit-serve — clean
  • cargo clippy -p quickwit-serve --tests — clean
  • cargo +nightly fmt --all -- --check — clean
  • cargo nextest run -p quickwit-serve --lib elasticsearch_api:: — 75 / 75 pass

Test coverage (same as #6436)

  • IndexMappingQueryParams parser: start_timestamp / end_timestamp / fields accepted in isolation and together; empty fields=; unknown params silently ignored.
  • ElasticsearchMappingsResponse::from_doc_mapping_filtered: keeps only requested names; object subtree preserved on top-level match; empty filter behaves identically to from_doc_mapping.
  • collect_declared_top_level_names: unions field names across multiple indexes.
  • parse_field_hints: empty / whitespace / commas-only / well-formed lists.

🤖 Generated with Claude Code

congx4 force-pushed the cong/mapping-fast-path-on-list-fields branch 2 times, most recently from fef38b8 to 5bab546 on May 18, 2026 23:55
guilload force-pushed the guilload/list-fields branch 2 times, most recently from e8a723e to 0cd3d6c on May 19, 2026 19:43
Base automatically changed from guilload/list-fields to main May 19, 2026 19:55
#[serde(default)]
pub end_timestamp: Option<i64>,
#[serde(default)]
pub fields: Option<String>,
Member

Are those field patterns?

Contributor Author

Yes, they are. I will rename it to `field_patterns` to align with your code.


#[test]
fn empty_query_string_yields_none() {
    let params: IndexMappingQueryParams = serde_urlencoded::from_str("").unwrap();
Member

Can we use serde_qs? I believe it's already a dependency.

.map(|index_metadata| {
    let field_mappings = &index_metadata.index_config.doc_mapping.field_mappings;
    let mut properties = build_properties(field_mappings);
    if !filter.is_empty() {
Member

Can we push down the filter to the leaves and single split list fields requests so we don't need to do this here?

let raw = params.fields.as_deref()?;
let tokens: Vec<String> = raw
    .split(',')
    .map(|s| s.trim().to_string())
Member

nit: in general, it's better to allocate after filtering, but it does not matter here.
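A minimal sketch of the "filter before allocating" ordering suggested here. The signature is a simplified stand-in (it takes the raw parameter value rather than the query-params struct), and returning `None` for a list with no usable tokens is an assumption, not confirmed behavior of the PR.

```rust
/// Hypothetical stand-in for the PR's parser: splits a comma-separated
/// list, trims each token, drops empty tokens, and only then allocates
/// an owned `String` for each surviving token.
fn parse_field_hints(raw: Option<&str>) -> Option<Vec<String>> {
    let tokens: Vec<String> = raw?
        .split(',')
        .map(str::trim)
        .filter(|token| !token.is_empty())
        .map(str::to_string) // allocation happens after filtering
        .collect();
    // Assumption: an empty or commas-only value is treated like a missing one.
    if tokens.is_empty() {
        None
    } else {
        Some(tokens)
    }
}
```

Trimming and filtering operate on borrowed `&str` slices, so discarded tokens never cost an allocation.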

}

fn collect_declared_top_level_names(indexes_metadata: &[IndexMetadata]) -> HashSet<String> {
    let mut names = HashSet::new();
Member

nit: functional way might be nicer... depending on who you ask :)
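For illustration, a functional version might look like the sketch below. `FieldMappingEntry` and `IndexMetadata` here are simplified stand-ins for the real Quickwit types (the real path goes through `index_config.doc_mapping.field_mappings`), so the field layout is an assumption.

```rust
use std::collections::HashSet;

// Simplified stand-ins for the real Quickwit types (assumption).
struct FieldMappingEntry {
    name: String,
}

struct IndexMetadata {
    field_mappings: Vec<FieldMappingEntry>,
}

/// Unions the declared top-level field names across all indexes.
fn collect_declared_top_level_names(indexes_metadata: &[IndexMetadata]) -> HashSet<String> {
    indexes_metadata
        .iter()
        .flat_map(|index_metadata| index_metadata.field_mappings.iter())
        .map(|entry| entry.name.clone())
        .collect()
}
```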

}

fn collect_declared_top_level_names(indexes_metadata: &[IndexMetadata]) -> HashSet<String> {
    let mut names = HashSet::new();
Member

Don't forget to use `with_capacity` if you know the final size in advance. It also signals to the reader that there will be no filtering or "explosion".
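A preallocated variant could be sketched as below. The structs are simplified stand-ins for the real Quickwit types (an assumption), and the capacity is an upper bound rather than an exact size, since names shared between indexes collapse in the set.

```rust
use std::collections::HashSet;

// Simplified stand-ins for the real Quickwit types (assumption).
struct FieldMappingEntry {
    name: String,
}

struct IndexMetadata {
    field_mappings: Vec<FieldMappingEntry>,
}

/// Unions the declared top-level field names, preallocating for the
/// worst case where no name is shared between indexes.
fn collect_declared_top_level_names(indexes_metadata: &[IndexMetadata]) -> HashSet<String> {
    let upper_bound: usize = indexes_metadata
        .iter()
        .map(|index_metadata| index_metadata.field_mappings.len())
        .sum();
    let mut names = HashSet::with_capacity(upper_bound);
    for index_metadata in indexes_metadata {
        for entry in &index_metadata.field_mappings {
            names.insert(entry.name.clone());
        }
    }
    names
}
```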

congx4 added 2 commits May 20, 2026 09:37
Two small additions to the ES-compat `_mapping(s)` endpoint that
together let downstream callers (e.g. Trino's ES connector) skip the
expensive `list_fields` scan in the common case.

Today `GET /_elastic/{index}/_mapping(s)` calls `list_fields` over every
published split. On indexes with hundreds of thousands of dynamic
fields this can take several seconds and runs into
`QW_FIELD_LIST_SIZE_LIMIT` (100k by default). This PR addresses both
pieces with no proto change:

1. Timestamp pushdown
   New `?start_timestamp=…&end_timestamp=…` URL params on `_mapping(s)`,
   forwarded into `ListFieldsRequest` verbatim. The metastore prunes
   the candidate split set by time window before any leaf fan-out.
   Unit is epoch seconds, half-open interval — matching the existing
   `ListFieldsRequest` proto contract.

2. Column-hints fast path
   New `?fields=…` URL param (comma-separated names). When every
   requested name is a flat literal (no `*`, no `?`, no `.`) declared
   in the union of the indexes' `doc_mapping`, the handler builds the
   response straight from the declared mapping, filtered to those
   names. No `list_fields` call, no split I/O.

   Anything else (wildcards, dotted paths, names not in `doc_mapping`)
   falls through to the full-mapping path: `list_fields` over the
   splits in the time range, full unfiltered mapping returned — same
   shape as today, just with the timestamp-pushdown optimization
   applied.
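To make the half-open interval concrete, here is a minimal sketch of the kind of overlap check a time-window prune implies. `SplitTimeRange` and `overlaps` are hypothetical names, and representing a split by inclusive min/max timestamps is an assumption; the actual pruning happens in the metastore and is not shown in this PR.

```rust
/// Hypothetical split summary: inclusive min/max timestamps in epoch seconds.
struct SplitTimeRange {
    min_ts: i64,
    max_ts: i64,
}

/// Returns true when the split may contain documents in the half-open
/// window [start, end). A missing bound leaves that side unbounded.
fn overlaps(split: &SplitTimeRange, start: Option<i64>, end: Option<i64>) -> bool {
    // start is inclusive: a split ending exactly at `start` still matches.
    let after_start = start.map_or(true, |s| split.max_ts >= s);
    // end is exclusive: a split beginning exactly at `end` does not match.
    let before_end = end.map_or(true, |e| split.min_ts < e);
    after_start && before_end
}
```

Under these assumptions, a split covering 100 to 200 is kept for `start_timestamp=200` but dropped for `end_timestamp=100`.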

Notes:
- Unknown query params are silently ignored (no `deny_unknown_fields`)
  to stay compatible with standard ES clients that pass `pretty`,
  `ignore_unavailable`, `allow_no_indices`, etc.
- No proto change. Stays on existing `ListFieldsRequest`.
- `IndexMappingQueryParams` parser and the new
  `ElasticsearchMappingsResponse::from_doc_mapping_filtered` are
  unit-tested in their respective modules.
- rename `fields` query param to `field_patterns` to mirror
  `ListFieldsRequest.field_patterns`
- switch tests to `serde_qs` (already a workspace dep, matches the
  bulk-query-params test style)
- move the declared-field filter out of `mappings.rs` and into the
  rest_handler fast path: trim `field_mappings` in place before calling
  `from_doc_mapping`, dropping `from_doc_mapping_filtered` entirely
  (dynamic fields were already filtered at the leaves via
  `ListFieldsRequest.field_patterns`)
- nits in `parse_field_patterns`: trim/filter before allocating the
  owned String per token
- nits in `collect_declared_top_level_names`: functional flat_map style
congx4 force-pushed the cong/mapping-fast-path-on-list-fields branch from 5bab546 to 2f719f6 on May 20, 2026 14:08
- drop the fast-path declared-field `retain(...)` in rest_handler.
  `field_patterns` is now hint-only: it triggers the fast path (skip
  `list_fields`) when every pattern matches a flat declared field, and
  is pushed down to the leaves for dynamic-field filtering. Both fast
  and slow paths now return the full declared schema, matching slow-
  path semantics that existed before.
- remove unused `serde_urlencoded` dev-dep from `quickwit-serve` and
  the workspace `Cargo.toml` (was already unused after switching tests
  to `serde_qs`).
- `collect_declared_top_level_names`: switch back to a procedural form
  preallocated with `HashSet::with_capacity(sum)` to signal the upper
  bound — no filtering, no explosion.