Document that sum(nothing) is 0 #71978
Pinging @elastic/es-docs (Team:Docs) |
Pinging @elastic/es-analytics-geo (Team:Analytics) |
It'd be super reasonable if the `sum` aggregation returned `null` when run on an unmapped field. Or if the query filtered out all results. But it doesn't. It returns `0`. Which is also reasonable! It's just different from what other reasonable systems like PostgreSQL do. This adds a note with an anchor we can link to. Folks ask about this a fair bit. Closes elastic#71582
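For anyone skimming, here is a minimal sketch of the two conventions being compared. It is plain Python, not Elasticsearch's actual aggregator code, and the function names `sum_zero_init` and `sum_null_on_empty` are made up for illustration. The first starts the accumulator at zero, so an empty input yields `0`. The second mimics the SQL convention, where `SUM` over no rows yields `NULL`.

```python
from typing import Iterable, Optional


def sum_zero_init(values: Iterable[float]) -> float:
    """Hypothetical helper: the accumulator starts at 0, so empty input gives 0."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_null_on_empty(values: Iterable[float]) -> Optional[float]:
    """Hypothetical helper: SQL-style sum, None (NULL) when nothing was summed."""
    total: Optional[float] = None
    for v in values:
        # The first value seeds the accumulator; later values are added to it.
        total = v if total is None else total + v
    return total


if __name__ == "__main__":
    no_values: list[float] = []  # e.g. unmapped field, or the query matched nothing
    print(sum_zero_init(no_values))      # zero-initialized convention
    print(sum_null_on_empty(no_values))  # null-on-empty convention
```

Both helpers agree whenever there is at least one value to add; they only differ on empty input, which is exactly the case the new note documents.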
@nik9000 I think this PR should expand a bit to modify the language under the missing value heading Current:
Suggested: |
I can certainly leave a note in the missing section too. We really don't treat missing values as 0 - it's more like we initialize the accumulator to |
It's also an issue in stats and extended stats, and we have very similar fragments there. But honestly, I think we should fix it. We make everything null if we don't have available data, but sum is 0 for some reason.
Like, introduce an option for it that defaults to false and flip it to true in 8.0? I'm generally on board with anything that makes us PostgreSQL-like, but this has been like this for so so so long. I remember looking and seeing that it isn't "simple" to fix. Like, not a huge thing, but a thing.
On second thought, maybe 0 is not such a bad idea as long as it is documented. After all, it seems to be consistent with the mathematical definition.
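For reference, the convention being alluded to is the usual empty-sum one: the sum over an empty set is defined as the additive identity, which keeps sums over disjoint sets additive. A short sketch, where $A$ and $B$ are just illustrative disjoint sets of values and not anything from the Elasticsearch code:

```latex
\sum_{x \in \emptyset} x = 0,
\qquad\text{so that}\qquad
\sum_{x \in A \cup B} x = \sum_{x \in A} x + \sum_{x \in B} x
\quad\text{holds for disjoint } A, B \text{, including } B = \emptyset .
```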
@imotov Think the ask is around getting the same behavior given the same data. While the default behavior for |
My point was that |
For the data that drove this, the set of documents wasn't empty. The set of fields being summed on those documents was empty. I don't know if that makes a big difference in your thinking, but I do appreciate your point. From the user side (outside->in), sum is receiving data even if the relevant fields are empty. We clearly had the opinion that users should get to specify how to handle missing fields beyond what would happen given a strictly "correct" application of pure math or we'd not have added the |