Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect result of min_by/max_by(x, y, n) in Window operation #21653

Open
kagamiori opened this issue Jan 9, 2024 · 3 comments
Open

Incorrect result of min_by/max_by(x, y, n) in Window operation #21653

kagamiori opened this issue Jan 9, 2024 · 3 comments
Assignees
Labels

Comments

@kagamiori
Copy link
Contributor

kagamiori commented Jan 9, 2024

Hi community, I noticed min_by/max_by(x, y, n) produces incorrect results when it is used in Window operation. As an example, for the query below with min_by(x, y, n). Since the window frame is UNBOUNDED PRECEDING to CURRENT ROW, when the second row is processed, the function should aggregate both the first and second input rows and hence there should be two values in the result array, i.e., [1, 2]. However, the current result of the second row is [2] only.

SELECT
    MIN_BY(c0, c0, c1) OVER (
        PARTITION BY
            c2
        ORDER BY
            c3 ASC
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )
FROM (
    VALUES
        (1, 10, FALSE, 0),
        (2, 10, FALSE, 1)
) AS t(c0, c1, c2, c3)

This error is caused by AbstractMinMaxByNAggregationFunction::output() breaking the assumption for "expanding frame" in AggregateWindowFunction::processRow(). AggregateWindowFunction::processRow() has an optimization branch for "same or expanding frame" of the previously computed frame, where it only add additional input rows of the new frame to accumulator.

else if ((frameStart == currentStart) && (frameEnd >= currentEnd)) {
// same or expanding frame
accumulate(currentEnd + 1, frameEnd);
currentEnd = frameEnd;
}

This draws an implicit assumption that the content of the accumulator of the previous frame remains in the accumulator when processing the current frame. This, however, is not true with min_by because AbstractMinMaxByNAggregationFunction::output() clears the accumulator.

Your Environment

  • Presto version used: 0.286

Expected Behavior

Expected result is

[1]
[1,2]

Current Behavior

Current result is

[1]
[2]

Possible Solution

Add heap.addAll(reversedBlockBuilder); to the end of AbstractMinMaxByNAggregationFunction::output() before out.closeEntry().

Steps to Reproduce

Screenshots (if appropriate)

Context

@kagamiori kagamiori added the bug label Jan 9, 2024
@kagamiori
Copy link
Contributor Author

cc @mbasmanova

@mbasmanova
Copy link
Contributor

CC: @aditi-pandit @tdcmeehan @kaikalur @amitkdutta

@mbasmanova
Copy link
Contributor

CC: @arhimondr

@kagamiori kagamiori changed the title Inconsistent behavior between min/max(x, n) and min_by/max_by(x, y, n) in Window operation Incorrect result of min_by/max_by(x, y, n) in Window operation Jan 12, 2024
@feilong-liu feilong-liu self-assigned this Jan 26, 2024
kagamiori added a commit to kagamiori/velox that referenced this issue Jan 26, 2024
Summary:
Same as bug in min/max(x, n) fixed in facebookincubator#8311, 
min_by/max_by(x, y, n) also breaks the assumption of incremental window aggregation 
because their extractValues() methods has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n) not 
clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). 
This fix will make Velox's min_by/max_by behave differently from Presto when used in Window 
operation, until prestodb/presto#21653 is fixed.

Differential Revision: D53139892
kagamiori added a commit to kagamiori/velox that referenced this issue Jan 30, 2024
Summary:

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Differential Revision: D53139892
kagamiori added a commit to kagamiori/velox that referenced this issue Feb 8, 2024
Summary:

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Differential Revision: D53139892
facebook-github-bot pushed a commit to facebookincubator/velox that referenced this issue Feb 12, 2024
Summary:
Pull Request resolved: #8566

Same as bug in min/max(x, n) fixed in #8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes #8138.

Reviewed By: bikramSingh91

Differential Revision: D53139892

fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
FelixYBW pushed a commit to FelixYBW/velox that referenced this issue Feb 12, 2024
Summary:
Pull Request resolved: facebookincubator#8566

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Reviewed By: bikramSingh91

Differential Revision: D53139892

fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
FelixYBW pushed a commit to FelixYBW/velox that referenced this issue Feb 12, 2024
Summary:
Pull Request resolved: facebookincubator#8566

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Reviewed By: bikramSingh91

Differential Revision: D53139892

fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
FelixYBW pushed a commit to FelixYBW/velox that referenced this issue Feb 12, 2024
Summary:
Pull Request resolved: facebookincubator#8566

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Reviewed By: bikramSingh91

Differential Revision: D53139892

fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 🆕 Unprioritized
Development

No branches or pull requests

3 participants