-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
min_by/max_by returns wrong result in Window operation #8138
Comments
@kagamiori : Are you interested in fixing this ? If not, I'll take a look. |
Hi @aditi-pandit, please feel free to go ahead. Thanks! |
@kagamiori : I'm on vacation this month with limited availability. It might be better for you to take a look if this is blocking you. |
Hi, I'm a first time contributor. If @kagamiori has other priorities, I'd be more than happy to take a look at this! |
Hi @kevinmingtarja, thank you for reaching out! This issue is a bit time-sensitive. For first time contributors, we'd suggest starting from the list of good first issues. Also, here is a document about code contribution that would be helpful. |
Got it. I'll take a look at other issues with less severity then! Thanks. |
There are two bugs with min_by/max_by in Window operation, as explained below, one in MinMaxByNAggregate::extractValues() and the other in AggregateWindowFunction::incrementalAggregation().
Both bugs can be reproduced via the following unit test.
|
Great analysis @kagamiori. Thanks ! |
Summary: min_by/max_by(x, y, n) functions produce different results from Presto when invoked in Window operator. There are two bugs: 1. MinMaxByNAggregate::extractValues() should return NULL when the accumulator is empty to follow Presto's behavior. 2. To be consistent with Presto, AggregateWindowFunction::incrementalAggregation() should call aggregate_->extractValues() again even if the current frame is already evaluated at the previous row. This is necessary for some function such as min_by/max_by that change the accumulator by the previous extractValues(). This diff fixes facebookincubator#8138. Differential Revision: D52575661
Another bug in min/max(x, n) is found during the testing of the fix #8296.
The root cause is that for min/max(x, n), when aggregation result is extracted from accumulator, the accumulator should retain its content (https://github.com/prestodb/presto/blob/bed052afeef85c728d0c99237be1047f60e24839/presto-main/src/main/java/com/facebook/presto/operator/aggregation/AbstractMinMaxNAggregationFunction.java#L180). In the current Velox implementation, accumulator content is cleared after the content is extracted (
One interesting thing is that this behavior of min/max(x, n) is exactly the opposite to the behavior of min_by/max_by(x, y, n) in Presto that clears the content in accumulator after extraction (https://github.com/prestodb/presto/blob/cb582bce0fd18f51d8862a8b2f53a134780f41aa/presto-main/src/main/java/com/facebook/presto/operator/aggregation/minmaxby/AbstractMinMaxByNAggregationFunction.java#L153). I wonder whether this difference is intended or a bug in Presto. cc @mbasmanova |
Summary: In Presto, accumulators of min/max(x, n) do not clear the heap when values are extracted from accumulator. But in Velox they do. Fix this bug to make Velox behavior align with Presto. This diff fixes facebookincubator#8138. Differential Revision: D52638334
Summary: In Presto, accumulators of min/max(x, n) do not clear the heap when values are extracted from accumulator. But in Velox they do. Fix this bug to make Velox behavior align with Presto. This diff fixes facebookincubator#8138. Differential Revision: D52638334
Summary: min_by/max_by(x, y, n) functions produce different results from Presto when invoked in Window operator. There are two bugs: 1. MinMaxByNAggregate::extractValues() should return NULL when the accumulator is empty to follow Presto's behavior. 2. To be consistent with Presto, AggregateWindowFunction::incrementalAggregation() should call aggregate_->extractValues() again even if the current frame is already evaluated at the previous row. This is necessary for some function such as min_by/max_by that change the accumulator by the previous extractValues(). This diff fixes facebookincubator#8138. Differential Revision: D52575661
Summary: In Presto, accumulators of min/max(x, n) do not clear the heap when values are extracted from accumulator. But in Velox they do. Fix this bug to make Velox behavior align with Presto. This diff fixes facebookincubator#8138. Differential Revision: D52638334
Summary: min_by/max_by(x, y, n) functions produce different results from Presto when invoked in Window operator. There are two bugs: 1. MinMaxByNAggregate::extractValues() should return NULL when the accumulator is empty to follow Presto's behavior. 2. To be consistent with Presto, AggregateWindowFunction::incrementalAggregation() should call aggregate_->extractValues() again even if the current frame is already evaluated at the previous row. This is necessary for some function such as min_by/max_by that change the accumulator by the previous extractValues(). This diff fixes facebookincubator#8138. Differential Revision: D52575661
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Reviewed By: mbasmanova Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Reviewed By: mbasmanova Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Reviewed By: mbasmanova Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Reviewed By: mbasmanova Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Reviewed By: mbasmanova Differential Revision: D52638334
Summary: Velox has an optimization for Window operation with incremental aggregation when there are peer rows with the same frame. In this situation, the aggregation result is only computed at the first row of the peer and the rest rows simply copy this result. This optimization assumes that results of incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) in Velox breaks this assumption because their extractValues() method causes the accumulator to be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) to make the extraction method not clear the accumulator. The behavior after the fix also align with Presto's. This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) doesn't pass testIncrementalAggregation. This diff fixes facebookincubator#8138. Reviewed By: mbasmanova Differential Revision: D52638334
Summary: Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the assumption of incremental window aggregation because their extractValues() methods has a side effect of clearing the accumulator. This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n) not clear the accumulators. Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix will make Velox's min_by/max_by behave differently from Presto when used in Window operation, until prestodb/presto#21653 is fixed. This diff fixes facebookincubator#8138. Differential Revision: D53139892
Summary: Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the assumption of incremental window aggregation because their extractValues() methods has a side effect of clearing the accumulator. This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n) not clear the accumulators. Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix will make Velox's min_by/max_by behave differently from Presto when used in Window operation, until prestodb/presto#21653 is fixed. This diff fixes facebookincubator#8138. Differential Revision: D53139892
Summary: Pull Request resolved: facebookincubator#8566 Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the assumption of incremental window aggregation because their extractValues() methods has a side effect of clearing the accumulator. This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n) not clear the accumulators. Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix will make Velox's min_by/max_by behave differently from Presto when used in Window operation, until prestodb/presto#21653 is fixed. This diff fixes facebookincubator#8138. Reviewed By: bikramSingh91 Differential Revision: D53139892 fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
Summary: Pull Request resolved: facebookincubator#8566 Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the assumption of incremental window aggregation because their extractValues() methods has a side effect of clearing the accumulator. This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n) not clear the accumulators. Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix will make Velox's min_by/max_by behave differently from Presto when used in Window operation, until prestodb/presto#21653 is fixed. This diff fixes facebookincubator#8138. Reviewed By: bikramSingh91 Differential Revision: D53139892 fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
Summary: Pull Request resolved: facebookincubator#8566 Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the assumption of incremental window aggregation because their extractValues() methods has a side effect of clearing the accumulator. This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n) not clear the accumulators. Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix will make Velox's min_by/max_by behave differently from Presto when used in Window operation, until prestodb/presto#21653 is fixed. This diff fixes facebookincubator#8138. Reviewed By: bikramSingh91 Differential Revision: D53139892 fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
Description
Window fuzzer found a bug of min_by(x, y, n) and max_by(x, y, n) when evaluated in the Window operator. The error is that for a given input row where
y
is NULL, Velox returns an empty array[]
while Presto returns a NULL.This error happens when there are at least two rows in a partition where the first row has non-null
y
and the second row has a NULLy
, and the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. What happens is that Window operator performs incremental aggregation for this partition. After the result for the first row is produced that is a non-null array, the is-null flag of the accumulator (i.e.,isNull(group)
) isn't reset to true before the second row is processed. Sincey
at the second row is NULL, MinMaxByNAggregate ignores this input row, so the accumulator remains empty (elements added by the previous row were already removed when the previous row's result was extracted). When the result of the second row is extracted, since isNull(group) is false and the accumulator has no element, an empty array is generated.Error Reproduction
Relevant logs
No response
The text was updated successfully, but these errors were encountered: