
Elasticsearch: Terms group by 'size=No limit' is misleading as it sets a 500 limit #15870

Open
jochenonline opened this issue Mar 8, 2019 · 18 comments


@jochenonline

jochenonline commented Mar 8, 2019

Summary by @gabor: when the terms aggregation's size is set to "No limit", Grafana actually sets the limit to 500, which is confusing. Sometimes people need more than 500 terms, so they have to enter a very large value in the field and hope it covers everything. The underlying problem is that Elasticsearch has no "unlimited size" option for the terms aggregation ( https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-size ).


What happened:
The percentages of items in a very long list are calculated incorrectly if the list size is set to "no limit". If I manually set the list size to a value far larger than the list itself (99999 in my case), the percentages are calculated correctly.

What you expected to happen:
I expect to get correctly calculated percentages if the list size is set to "no limit".

How to reproduce it (as minimally and precisely as possible):
see above

Anything else we need to know?:

Environment:

  • Grafana version: 5.3.4
  • Data source type & version: Elastic search
  • OS Grafana is installed on: Linux
  • User OS & Browser: Ubuntu 18.04
  • Grafana plugins: -
  • Others: -
@torkelo
Member

torkelo commented Mar 8, 2019

Elasticsearch percentiles are calculated by Elasticsearch, not Grafana. Is there anything wrong with the query Grafana sends to ES?

@bergquist bergquist added the needs more info Issue needs more information, like query results, dashboard or panel json, grafana version etc label Mar 8, 2019
@jochenonline
Author

Yes. The query is wrong! If I choose no limit, the size property is set to 500. For long lists that is definitely not "without limit"...

@torkelo
Member

torkelo commented Mar 11, 2019

list size? where do you set list size?

This is the query I get when querying for elasticsearch percentiles:

{"search_type":"query_then_fetch","ignore_unavailable":true,"index":["metrics-2019.03.11"]}
{"size":0,"query":{"bool":{"filter":[{"range":{"@timestamp":{"gte":"1552278478005","lte":"1552300078005","format":"epoch_millis"}}},{"query_string":{"analyze_wildcard":true,"query":"*"}}]}},"aggs":{"2":{"date_histogram":{"interval":"30s","field":"@timestamp","min_doc_count":0,"extended_bounds":{"min":"1552278478005","max":"1552300078005"},"format":"epoch_millis"},"aggs":{"1":{"percentiles":{"field":"@value","percents":[25,50,75,95,99]}}}}}}

[screenshot]

@torkelo torkelo closed this as completed Mar 11, 2019
@torkelo torkelo removed the type/bug label Mar 11, 2019
@jochenonline
Author

I never wrote about "percentiles"; I talked about percentages. To be precise (and maybe that's what was missing in my description), it is the Pie Chart that has this problem. Here you can see the percentage option that I selected:

[screenshot: the selected "percentage" option]

And this is the part where I set "size" to "no limit" and which has the effect that the "size" property in the elastic query is set to 500:

[screenshot: the terms group by with "size" set to "No limit"]

@jochenonline jochenonline changed the title Percentages are calculated wrong if list size is set to "no limit" Percentages are calculated wrong if list size is set to "no limit" in Pie Chart Mar 11, 2019
@jochenonline
Author

[screenshot]

@torkelo
Member

torkelo commented Mar 11, 2019

Sorry I missed the issue.

The size controls how many series are returned for the terms group by. It's not a good idea to return more than 500 series.

@jochenonline
Author

Anyway, the name "no limit" is still misleading if it actually sets the size to 500. And the percentages are calculated wrong if you assume you are getting "all" values (aka "no limit").

@torkelo torkelo changed the title Percentages are calculated wrong if list size is set to "no limit" in Pie Chart Elasticsearch terms group by 'No limit' is misleading as it sets a 500 limit Mar 11, 2019
@torkelo torkelo added datasource/Elasticsearch and removed needs more info Issue needs more information, like query results, dashboard or panel json, grafana version etc labels Mar 11, 2019
@torkelo torkelo reopened this Mar 11, 2019
@torkelo
Member

torkelo commented Mar 11, 2019

yea, agree it's a bit misleading.

@marefr
Member

marefr commented Mar 24, 2019

Needs fixes here

queryNode.terms.size = parseInt(aggDef.settings.size, 10) === 0 ? 500 : parseInt(aggDef.settings.size, 10);

and here

func addTermsAgg(aggBuilder es.AggBuilder, bucketAgg *BucketAgg, metrics []*MetricAgg) es.AggBuilder {
	aggBuilder.Terms(bucketAgg.ID, bucketAgg.Field, func(a *es.TermsAggregation, b es.AggBuilder) {
		if size, err := bucketAgg.Settings.Get("size").Int(); err == nil {
			a.Size = size
		} else if size, err := bucketAgg.Settings.Get("size").String(); err == nil {
			a.Size, err = strconv.Atoi(size)
			if err != nil {
				a.Size = 500
			}
		} else {
			a.Size = 500
		}
		if a.Size == 0 {
			a.Size = 500
		}
		// ...
	})
	// ...
}

Plus updating the tests, if any. I suggest not setting a size in the query to Elasticsearch if no limit is set in the UI. Anyone interested in contributing a fix?
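The quoted frontend line can be read as a small pure function. This is a hypothetical sketch, not Grafana's actual code (the function name and the NaN fallback are my assumptions); it shows why "No limit" ends up as 500:

```typescript
// Hypothetical sketch of the mapping the quoted line performs.
// 'No limit' is stored as the string '0', but Elasticsearch has no
// unlimited terms size, so the code substitutes 500.
function resolveTermsSize(stored: string): number {
  const parsed = parseInt(stored, 10);
  // The NaN fallback is an assumption added here for robustness.
  return parsed === 0 || Number.isNaN(parsed) ? 500 : parsed;
}

console.log(resolveTermsSize('0'));  // 500: the misleading 'No limit' case
console.log(resolveTermsSize('20')); // 20: explicit sizes pass through
```

Any test update would need to cover at least the '0' case and an explicit-size case.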

@conet

conet commented Apr 9, 2019

Having the possibility to use variables there would be a great feature. Simply stating that returning more than 500 groups is not a good idea is false: if the data is rendered in a table, there is nothing wrong with returning more groups.

@srid99
Contributor

srid99 commented Aug 5, 2019

The suggestion from @conet makes sense, since we have a similar requirement where we'd like to display the values in a table and there can be more than 500 rows. With "Raw document" it is possible to enter larger values and that works, but we also have some aggregation queries that don't fit there, and those are now limited to displaying up to 500 rows.

@karln-star

I agree with conet; it would be very nice if this were handled with a variable, or alternatively with a much larger limit than 500. We do a table grouped by terms, and the number of terms in our case may well be over 500, meaning we potentially miss a lot of data (and the relevant bucketing is on the term value).

@anapsix

anapsix commented Apr 1, 2020

alas, this remains broken. the only workaround for the moment is to set the limit to a high enough value and hope it includes all results

@aocenas
Member

aocenas commented Aug 26, 2020

It seems a PR was opened; we can probably just revisit that one and fix the tests there.

@ifrost ifrost added the prio/low It's a good idea, but not scheduled for any release label Jul 29, 2021
@gabor gabor removed onboarding help wanted prio/low It's a good idea, but not scheduled for any release labels Oct 19, 2022
@gabor
Contributor

gabor commented Nov 8, 2022

i did a quick check, and it will not be so easy to improve the situation ... there is no such thing as "unlimited size" in the elasticsearch database.

from the elasticsearch documentation at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-size :

"
By default, the terms aggregation returns the top ten terms with the most documents. Use the size parameter to return more terms, up to the search.max_buckets limit.

If your data contains 100 or 1000 unique terms, you can increase the size of the terms aggregation to return them all. If you have more unique terms and you need them all, use the composite aggregation instead.

Larger values of size use more memory to compute and push the whole aggregation close to the max_buckets limit. You’ll know you’ve gone too large if the request fails with a message about max_buckets.
"

if we just remove the size=500 part, then elastic returns the top 10. in other words, if there is no size=500, it is the same as size=10.
(also, size=0 is not a valid value, and you get an error message from the elasticsearch database)
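The three cases above can be sketched as request bodies. The `terms`/`size` keys follow the standard Elasticsearch terms-aggregation API; the field name is a made-up example:

```typescript
// Case 1: no size at all -- Elasticsearch returns only the top 10 terms.
const topTenByDefault: Record<string, any> = {
  aggs: { groups: { terms: { field: 'host.keyword' } } },
};

// Case 2: what Grafana's 'No limit' actually sends -- an explicit size of 500.
const grafanaNoLimit: Record<string, any> = {
  aggs: { groups: { terms: { field: 'host.keyword', size: 500 } } },
};

// Case 3: size: 0 is rejected by Elasticsearch with an error,
// so it cannot be used to mean "unlimited".
```

So simply omitting the size would silently shrink existing panels from 500 buckets to 10.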

@gabor gabor removed their assignment Nov 8, 2022
@gabor gabor changed the title Elasticsearch terms group by 'No limit' is misleading as it sets a 500 limit Elasticsearch: Terms group by 'size=No limit' is misleading as it sets a 500 limit Nov 8, 2022
@gabor
Contributor

gabor commented Nov 8, 2022

considering there's no easy way to improve this (there is no real no-size-limit option in the elasticsearch database), maybe we could make a small improvement and just change that option's label text from No limit to No limit (500)? it's not the nicest thing, but it would probably help with the confusion.. WDYT @grafana/observability-logs ?

like this diff:

 export const sizeOptions = [
-  { label: 'No limit', value: '0' },
+  { label: 'No limit (500)', value: '0' }, // this will get set to 500
   { label: '1', value: '1' },
   { label: '2', value: '2' },
   { label: '3', value: '3' },

@ivanahuckova
Member

@gabor is it always 500? Or is there a setting somewhere that can be changed? If it is always 500 and it is not possible to change it, I'd change No limit to 500, as that's what it is. If it is possible to change it, I'd figure out which field affects it and do something like No limit (defaults to set limit value).

@gabor
Copy link
Contributor

gabor commented Nov 9, 2022

@ivanahuckova the way it works is: the value stored in the grafana database is 0, and then when we are building the "elastic format" query we take this value, and if it is 0, we set it to 500.

( https://github.com/grafana/grafana/blob/main/pkg/tsdb/elasticsearch/time_series_query.go#L293-L307 ).
