Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make bucketsPath syntax for lower and upper standard deviation bounds in the extended_stats aggregation more obvious #19040

Closed
Tracked by #82808
colings86 opened this issue Jun 23, 2016 · 10 comments
Labels
:Analytics/Aggregations Aggregations >breaking >enhancement help wanted adoptme Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@colings86
Copy link
Contributor

This issue was raised on the forum here: https://discuss.elastic.co/t/accessing-lower-bound-of-extended-stats-bucket/53763

The output of the extended_stats aggregation looks like this:

    "my_stats": {
      "count": 20523877,
      "min": 1,
      "max": 98250000,
      "avg": 165931.3618741722,
      "sum": 3405554861548,
      "sum_of_squares": 2540304986111513000,
      "variance": 96239937025.42874,
      "std_deviation": 310225.6227738591,
      "std_deviation_bounds": {
        "upper": 786382.6074218905,
        "lower": -454519.883673546
      }
    }

but the buckets paths to get the upper and lower standard deviation bounds are my_stats.std_lower and my_stats.std_upper which is not very intuitive given the output above. I think we should change it so the buckets_path (and terms sort path) is my_stats.std_deviation_bounds.lower and my_stats.std_deviation_bounds.upper

@tarunsapra
Copy link

+1

@liketic
Copy link
Contributor

liketic commented Oct 22, 2017

@colings86 Can I continue to work on this? Thanks in advance. 😄

@colings86
Copy link
Contributor Author

@liketic if you are able to raise a PR for this then that would be great, thanks

@liketic
Copy link
Contributor

liketic commented Oct 26, 2017

@colings86 Thanks for your reply! 👍

If I'm not wrong, we expect to access upper and lower bound through my_stats. std_deviation_bounds.lower and my_stats. std_deviation_bounds.upper. However when we parse the path as

We didn't support access value more than one levels which means foo.bar is parse to
name=foo, key=bar and foo.bar1.bar2 will be parse to name=foo.bar1, key=bar2. So if this is not the expected behaviour, I think I can convert the key of PathElement to an array, for example:
name=foo, keys=[bar1, bar2].

Is that the correct direction? Any comments are appreciated. 😄

@colings86
Copy link
Contributor Author

@liketic this is actually a bit tricky because the pattern for a valid aggregation name is: https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/search/aggregations/AggregatorFactories.java#L53

So the reason why the line you linked to calls element.lastIndexOf('.') is so we support aggregation names containing dots. There is an argument that we should not support aggregation names containing dots but this is a bigger and potentially controversial change.

Because of this its going to be difficult to change the path to my_stats.std_deviation_bounds.lower and my_stats.std_deviation_bounds.upper since it wouldn't be easy to support both dots in aggregation names and dots in metric names.

I would if we should make a change to stop dots in the aggregation names so we can make this change or if we should instead keep the current limitation of metrics not containing dots and not worry about this issue? @jpountz @clintongormley what do you think?

@clintongormley
Copy link

@colings86 I can't think of a reason for supporting dot in aggregation names, esp as we don't support dots in field names (dots are treated as object paths). I'd be in favour of removing support (deprecating in 6.x and removing in 7.0)

@markharwood
Copy link
Contributor

cc @elastic/es-search-aggs

polyfractal added a commit to polyfractal/elasticsearch that referenced this issue Jun 20, 2018
Dots in aggregation names can lead to tricky parsing situations, like
being unable to sort by agg names with dots.

This adds a deprecation logger and a note in the docs, letting us
remove them in 8.0.

Related to elastic#17600 and elastic#19040
@rjernst rjernst added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@imotov
Copy link
Contributor

imotov commented Jun 23, 2021

We kind of made it worse recently by adding population and sampling bounds in #49782. Maybe it is time to clean this up

@wchaparro
Copy link
Contributor

wchaparro commented Jan 24, 2022

Added to the refactor issue for bucket / pipeline aggs
#82808

@wchaparro
Copy link
Contributor

Closing as not planned, focus is on ES|QL development

@wchaparro wchaparro closed this as not planned Won't fix, can't repro, duplicate, stale Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >breaking >enhancement help wanted adoptme Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

8 participants