-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Use docvalue skippers on dimension fields #137029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use docvalue skippers on dimension fields #137029
Conversation
|
|
||
| this.dimension = TimeSeriesParams.dimensionParam(m -> toType(m).fieldType().isDimension()).addValidator(v -> { | ||
| if (v && (indexed.getValue() == false || hasDocValues.getValue() == false)) { | ||
| if (v && (hasDocValues.getValue() == false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this change will need to be applied to all field mappers that can be dimensions:
BooleanFieldMapper
FlattenedFieldMapper
GeoPointFieldMapper
IpFieldMapper
NumberFieldMapper
UnsignedLongFieldMapper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and it will need gating behind a FeatureFlag and IndexVersion too. This is just for experimenting, really.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we at least also add IpFieldMapper? I think that will be important for the experimentation.
|
I ran this against the metricsgen track. The highlights:
Click to expand full rally comparison |
|
I've extracted flame graphs for two queries that have regressed the most: avg_avgot_cpu_load_1m_filtered_by_hosts_5m-main.html avg_avgot_cpu_load_1m_prefix_by_hosts_5m-main.html What both have in common that they're using a two phase iterator in the skippers version. The |
|
Overview of the 50th percentile latency of this PR before and after #137536 has been merged:
After investigating Another thing to discuss is how we want users to control whether to use skippers for dimensions. The easiest thing to do is to just use the existing setting |
|
I extracted flame graphs for the @romseygeek do you see any quick wins here? avg_avgot_memory_by_host_1h-main.html
avg_avgot_memory_by_host_1h-skippers.html
|
|
Posting the full Rally comparison for completeness: Click to expand ... |
BASE=4e68eccd9f663c2d4879f20c409444f3d7aa558b HEAD=d059b5f2d8e7790014d97a80fae570f96201c526 Branch=main
BASE=4e68eccd9f663c2d4879f20c409444f3d7aa558b HEAD=d059b5f2d8e7790014d97a80fae570f96201c526 Branch=main
In preparation for doc-value skippers being enabled in some keyword field types, TermBasedFieldType should take IndexType directly in its constructor rather than booleans for indexed and docvalues.
…nto dimensions/sparse-indexes
|
@felixbarny this is ready for a proper review now. |
|
I just realised that I haven't included |
felixbarny
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
+1 to handle flattened fields separately (if at all)
…nto dimensions/sparse-indexes
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
martijnvg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
…-json * upstream/main: (158 commits) Cleanup files from repo root folder (elastic#138030) Implement OpenShift AI integration for chat completion, embeddings, and reranking (elastic#136624) Optimize AsyncSearchErrorTraceIT to avoid failures (elastic#137716) Removes support for null TransportService in RemoteClusterService (elastic#137939) Mute org.elasticsearch.index.mapper.DateFieldMapperTests testSortShortcuts elastic#138018 rest-api-spec: fix type of enums (elastic#137521) Update Gradle wrapper to 9.2.0 (elastic#136155) Add RCS Strong Verification Documentation (elastic#137822) Use docvalue skippers on dimension fields (elastic#137029) Introduce INDEX_SHARD_COUNT_FORMAT (elastic#137210) Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesChatCompletion_AndThenCreatesTextEmbedding elastic#138012 Fix ES|QL search context creation to use correct results type (elastic#137994) Improve Snapshot Logging (elastic#137470) Support extra output field in TOP function (elastic#135434) Remove NumericDoubleValues class (elastic#137884) [ML] Fix ML calendar event update scalability issues (elastic#136886) Task may be unregistered outside of the trace context in exceptional cases. (elastic#137865) Refine workaround for S3 repo analysis known issue (elastic#138000) Additional DEBUG logging on authc failures (elastic#137941) Cleanup index resolution (elastic#137867) ...


Disable points and terms indexes on dimension fields, and use doc_values
skippers instead. Dimension fields are part of the
_tsidand so alignwith the index sorts, meaning that doc_values skippers will work well for
querying and filtering, and have a much smaller disk footprint than full
indexes.