Make sure shard size stays stable for low number of sizes in TSVB, Lens and Agg based #139791
Conversation
Pinging @elastic/kibana-vis-editors @elastic/kibana-vis-editors-external (Team:VisEditors)
Pinging @elastic/kibana-app-services (Team:AppServicesSv)
VisEditors changes LGTM. I tested it locally and I can see the enhancement. Definitely much better now :)
aggs lgtm. change makes sense 👍
I didn't understand why `25`, but then realized it is derived from the formula: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-shard-size
I think a comment everywhere the magic `25` is used would be helpful.
That's absolutely correct @Dosant. Added a comment to these places.
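For reference, the "magic 25" falls out of the Elasticsearch default linked above, `shard_size = size * 1.5 + 10`, which yields exactly 25 for a `size` of 10. A minimal sketch (the function name is illustrative, not Kibana code):

```typescript
// Default shard_size Elasticsearch applies to a terms aggregation when
// none is set explicitly (per the linked docs): size * 1.5 + 10.
// Exact integer rounding is an Elasticsearch implementation detail;
// for even sizes the formula is exact.
function defaultShardSize(size: number): number {
  return size * 1.5 + 10;
}

// For the common size of 10, the implicit shard_size is exactly 25,
// which is where the constant used throughout this PR comes from.
console.log(defaultShardSize(10)); // 25
```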
…ns and Agg based (elastic#139791)

* make sure shard size stays stable for low number of sizes
* add terms test
* adjust test
* fix tests
* add explanatory comment
Fixes #137056
By making sure the terms aggs built for field summaries in the unified field list, agg configs and TSVB all set a `shard_size` parameter of 25 if the `size` is below or equal to 10 and the `shard_size` isn't set by some other means.

This is done to make sure the top values stay stable when increasing the size. As terms is not always 100% accurate, in the case of many shards and high-cardinality fields it's otherwise easy to run into situations where increasing the number of top values from 3 to 7 changes the top 3 values, because the shard size is implicitly increased.
To test this, you can load data using makelogs like this:
Doing top values by `geo.dest` and ordering by the median of the bytes field is pretty unstable, so changing the top values size will also re-order/change the terms on main. With this PR this behavior should stop.

This is what this is about on the Elasticsearch level:
For the data generated like above, a terms aggregation with `"size": 3` returns the terms JM, TL, KW.
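The request body elided from the page was presumably a terms aggregation along these lines — a sketch reconstructed from the description; the index, aggregation names, and exact parameters are assumptions:

```typescript
// Sketch of the kind of request body described above: top terms on
// geo.dest ordered by the median (50th percentile) of the bytes field.
// Field names come from the PR description; everything else is assumed.
const requestBody = {
  size: 0,
  aggs: {
    top_dest: {
      terms: {
        field: "geo.dest",
        size: 3, // without a stable shard_size, raising this reshuffles the top 3
        order: { "median_bytes.50": "desc" },
      },
      aggs: {
        median_bytes: {
          percentiles: { field: "bytes", percents: [50] },
        },
      },
    },
  },
};

console.log(JSON.stringify(requestBody, null, 2));
```

Adding `"shard_size": 25` next to `"size"` inside the `terms` block is the change this PR applies implicitly for small sizes.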
Changing the "size" to 10 results in JM, KW, IE, LS, GN, TT, TM, KI, ME, TG
Setting the shard size to 25 doesn't change the result for size 10, but "fixes" the size 3 result by returning JM, KW and IE. Note that an even larger `shard_size` changes the results again - the results for shard size 25 are not 100% correct; this change is about keeping them stable for the most common `size` settings to avoid confusion, as well as providing a sensible lower bound.