Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that sum(nothing) is 0 #71978

Closed
wants to merge 2 commits into from
Closed

Conversation

nik9000
Copy link
Member

@nik9000 nik9000 commented Apr 20, 2021

It'd be super reasonable if the sum aggregation returned null when
run on an unmapped field. Or if the query filtered out all results. But
it doesn't. It returns 0. Which is also reasonable! Its just different
from what other reasonable systems like Postgresql do.

This adds a note to with an anchor we can link to. Folks ask about this
a fair bit.

Closes #71582

@elasticmachine elasticmachine added Team:Docs Meta label for docs team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Apr 20, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

It'd be super reasonable if the `sum` aggregation returned `null` when
run on an unmapped field. Or if the query filtered out all results. But
it doesn't. It returns `0`. Which is also reasonable! Its just different
from what other reasonable systems like Postgresql do.

This adds a note to with an anchor we can link to. Folks ask about this
a fair bit.

Closes elastic#71582
@mbarretta
Copy link
Contributor

@nik9000 I think this PR should expand a bit to modify the language under the missing value heading

Current:

The missing parameter defines how documents that are missing a value should be treated. By default documents missing the value will be ignored but it is also possible to treat them as if they had a value.

Suggested:
The missing parameter defines what value should be assigned to documents that are missing a value. By default, documents missing the value will be given a value of 0

@nik9000
Copy link
Member Author

nik9000 commented Apr 20, 2021

I can certainly leave a note in the missing section too. We really don't treat missing values as 0 - its more like we initialize the accumulator to 0 instead of null.

Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also an issue in stats and extended stats and we have a very similar fragments there. But honestly, I think we should fix it. We make everything null if we don't have available data, but sum is 0 for some reason.

@nik9000
Copy link
Member Author

nik9000 commented Apr 21, 2021

But honestly, I think we should fix it. We make everything null if we don't have available data, but sum is 0 for some reason.

Like, introduce an option for it the defaults to false and flip it to true in 8.0? I'm generally on board with anything that makes us postgresql-like, but this has been like this for so so so long. I remember looking and seeing that it isn't "simple" to fix. Like, not a huge thing, but a thing.

@imotov
Copy link
Contributor

imotov commented Apr 21, 2021

On the second thought, maybe 0 is not such a bad idea as long as it is documented. After all, it seems to be consistent with the math definition.

@mbarretta
Copy link
Contributor

@imotov Think the ask is around getting the same behavior given the same data.

While the default behavior for sum and min/max might be different, you can make min/max behave like sum via the missing parameter. But, it doesn't look like you can do the opposite so that they all give null.

@imotov
Copy link
Contributor

imotov commented Apr 22, 2021

My point was that min/max of an empty set are theoretically speaking undefined, but sum of the empty set is well defined and it is 0. When you say "make them behave like sum" you are actually not making them to behave like sum, you are just replacing an empty set with a non-empty set where all these operations are defined. What you are asking is to change the definition of sum of an empty set.

@mbarretta
Copy link
Contributor

mbarretta commented Apr 22, 2021

For the data that drove this, the set of documents wasn't empty. The set of fields being summed on those documents was empty. I don't know if that makes a big difference in your thinking, but I do appreciate your point.

From the user side (outside->in), sum is receiving data even if the relevant fields are empty. We clearly had the opinion that users should get to specify how to handle missing fields beyond what would happen given a strictly "correct" application of pure math or we'd not have added the missing parameter. Given those two points, I don't think it's unreasonable to extend the allowed values of missing to include null. Again, from the user side, it allows similar behavior on similar data: when min/max is null, sum is null (or at least can be coerced as such). When charts with min/max give no bar or dot, sum gives no bar or dot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >docs General docs changes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Docs Meta label for docs team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sum aggregation not ignoring missing values
6 participants