Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure shard size stays stable for low number of sizes in TSVB, Lens and Agg based #139791

Merged
merged 7 commits into from Sep 5, 2022

Conversation

flash1293
Copy link
Contributor

Fixes #137056

By making sure the terms aggs built for field summaries in the unified field list, agg configs and TSVB all set a shard_size parameter of 25 if the size is below or equal to 10 and the shard_size isn't set by some other means.

This is done to make sure the top values stay stable when increasing the size. As terms is not always 100% accurate, in the case of many shards and high cardinality fields it's easy to run into situations otherwise where increasing the number of top values for 3 to 7 changes the top 3 values because the shard size is implicitly increased.

To test this, you can load data using makelogs like this:

yarn makelogs -s 10 -i 99999999 -c 25k --auth elastic:changeme

Doing top values by geo.dest and ordering by the median of the bytes field is pretty unstable, so changing the top values size will also re-order/change the terms on main. With this PR this behavior should stop.

This is what this is about on the Elasticsearch level:

For the data generated like above,

GET logstash*/_search
{
  "size": 0,
  "aggs": {
    "terms": {
      "terms": {
        "field": "geo.dest",
        "size": 3,
        "order": {
          "median.50": "desc"
        }
      },
      "aggs": {
        "median": {
          "percentiles": {
            "field": "bytes",
            "percents": [
              50
            ]
          }
        }
      }
    }
  }
}

returns the terms JM, TL, KW.

Changing the "size" to 10 results in JM, KW, IE, LS, GN, TT, TM, KI, ME, TG

Setting the shard size to 25 doesn't change the result for size 10, but "fixes" the size 3 result by returning JM, KW and IE . Note that an even larger shard_size changes the results again - the results for shard size 25 are not 100 %correct, this change is about keeping them stable for the most common "size" settings to avoid confusion as well as providing a sensible lower bound.

@flash1293 flash1293 added release_note:enhancement Feature:Aggregations Aggregation infrastructure (AggConfig, esaggs, ...) Feature:TSVB TSVB (Time Series Visual Builder) Team:Visualizations Visualization editors, elastic-charts and infrastructure Team:AppServicesSv Feature:Lens backport:skip This commit does not require backporting labels Aug 31, 2022
@flash1293 flash1293 marked this pull request as ready for review August 31, 2022 14:45
@flash1293 flash1293 requested review from a team as code owners August 31, 2022 14:45
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-vis-editors @elastic/kibana-vis-editors-external (Team:VisEditors)

@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesSv)

Copy link
Contributor

@stratoula stratoula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VisEditors changes LGTM. I tested it locally and I can see the enhancement. Def much better now :)

Copy link
Contributor

@Dosant Dosant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggs lgtm. change makes sense 👍

I didn't understand why 25, but then realized it is derived from the formula: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-shard-size

I think a comment everywhere where magic 25 is used would be helpful

@flash1293
Copy link
Contributor Author

That's absolutely correct @Dosant . Added a comment to these places.

@flash1293 flash1293 enabled auto-merge (squash) September 5, 2022 15:38
@flash1293 flash1293 merged commit 15f8329 into elastic:main Sep 5, 2022
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
unifiedFieldList 47.5KB 47.6KB +54.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
data 430.5KB 430.6KB +63.0B

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Mpdreamz pushed a commit to Mpdreamz/kibana that referenced this pull request Sep 6, 2022
…ns and Agg based (elastic#139791)

* make sure shard size stays stable for low number of sizes

* add terms test

* adjust test

* fix tests

* add explanatory comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Aggregations Aggregation infrastructure (AggConfig, esaggs, ...) Feature:Lens Feature:TSVB TSVB (Time Series Visual Builder) release_note:enhancement Team:Visualizations Visualization editors, elastic-charts and infrastructure v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Lens] Set min bound for shard_size on top values
6 participants