min_by/max_by returns wrong result in Window operation #8138

Closed
kagamiori opened this issue Dec 21, 2023 · 10 comments
Labels
aggregates bug Something isn't working fuzzer-found window Issues related to window operation

Comments

@kagamiori
Contributor

Description

Window fuzzer found a bug in min_by(x, y, n) and max_by(x, y, n) when they are evaluated in the Window operator. For a given input row where y is NULL, Velox returns an empty array [] while Presto returns NULL.

This error occurs when a partition has at least two rows, where the first row has a non-null y and the second row has a NULL y, and the frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. In this case the Window operator performs incremental aggregation for the partition. After the non-null array result for the first row is produced, the is-null flag of the accumulator (i.e., isNull(group)) isn't reset to true before the second row is processed. Since y in the second row is NULL, MinMaxByNAggregate ignores this input row, so the accumulator remains empty (the elements added by the previous row were already removed when that row's result was extracted). When the result for the second row is extracted, isNull(group) is false and the accumulator has no elements, so an empty array is generated.
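The sequence described above can be sketched outside of Velox. The snippet below is only a toy model under stated assumptions: ToyAccumulator, ToyResult, runTwoRows and the nullOnEmpty switch are hypothetical names, not Velox APIs, and the top-n ordering is left out because only the is-null flag and the emptied storage matter here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using ToyResult = std::optional<std::vector<int32_t>>;

// Hypothetical stand-in for the accumulator of min_by(x, y, n).
// It skips the top-n ordering and only tracks the retained values
// and the is-null flag.
struct ToyAccumulator {
  bool isNull = true; // Plays the role of isNull(group).
  std::vector<int32_t> values;

  // Rows with a NULL y are ignored.
  void add(int32_t x, std::optional<int64_t> y) {
    if (!y.has_value()) {
      return;
    }
    values.push_back(x);
    isNull = false;
  }

  // Moves the values out of the accumulator. With nullOnEmpty == false,
  // the stale is-null flag alone decides between NULL and an array.
  ToyResult extract(bool nullOnEmpty) {
    if (isNull || (nullOnEmpty && values.empty())) {
      return std::nullopt;
    }
    std::vector<int32_t> out = std::move(values);
    values.clear(); // Leave the moved-from vector in a known empty state.
    return out;
  }
};

// Feeds the two rows of the reproduction, (x=3, y=2) and (x=3, y=NULL),
// and extracts a result after each row.
inline std::vector<ToyResult> runTwoRows(bool nullOnEmpty) {
  ToyAccumulator acc;
  std::vector<ToyResult> results;
  acc.add(3, 2);
  results.push_back(acc.extract(nullOnEmpty));
  acc.add(3, std::nullopt);
  results.push_back(acc.extract(nullOnEmpty));
  return results;
}
```

In this model, runTwoRows(false) yields an empty array for the second row because the flag is still false, while runTwoRows(true) yields NULL for it.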

Error Reproduction

TEST_F(MinMaxByNTest, aaa) {
  auto data = makeRowVector({
      makeFlatVector<int32_t>({3, 3}),
      makeNullableFlatVector<int64_t>({2, std::nullopt}),
      makeFlatVector<int64_t>({1, 1}),
      makeFlatVector<bool>({false, false}),
      makeFlatVector<int64_t>({0, 1}),
  });

  auto plan = PlanBuilder()
                  .values({data})
                  .window({"min_by(c0, c1, c2) over (partition by c3 order by c4 asc)"})
                  .planNode();

  // Result should be two rows: {3, 2, 1, false, 0, [3]}, {3, null, 1, false, 1, null}.
  // Current wrong result is: {3, 2, 1, false, 0, [3]}, {3, null, 1, false, 1, []}.
  auto result = AssertQueryBuilder(plan).copyResults(pool());
}

Relevant logs

No response

@kagamiori kagamiori added bug Something isn't working fuzzer-found labels Dec 21, 2023
@kagamiori
Contributor Author

cc @mbasmanova @aditi-pandit

@aditi-pandit aditi-pandit self-assigned this Dec 21, 2023
@aditi-pandit
Collaborator

@kagamiori : Are you interested in fixing this ? If not, I'll take a look.

@kagamiori
Contributor Author

> @kagamiori : Are you interested in fixing this ? If not, I'll take a look.

Hi @aditi-pandit, please feel free to go ahead. Thanks!

@aditi-pandit
Collaborator

@kagamiori : I'm on vacation this month with limited availability. It might be better for you to take a look if this is blocking you.

@kevinmingtarja

Hi, I'm a first time contributor. If @kagamiori has other priorities, I'd be more than happy to take a look at this!

@kagamiori
Contributor Author

> Hi, I'm a first time contributor. If @kagamiori has other priorities, I'd be more than happy to take a look at this!

Hi @kevinmingtarja, thank you for reaching out! This issue is a bit time-sensitive. For first time contributors, we'd suggest starting from the list of good first issues. Also, here is a document about code contribution that would be helpful.

@kevinmingtarja

Got it. I'll take a look at other issues with less severity then! Thanks.

@kagamiori
Contributor Author

kagamiori commented Jan 5, 2024

There are two bugs with min_by/max_by in Window operation, as explained below, one in MinMaxByNAggregate::extractValues() and the other in AggregateWindowFunction::incrementalAggregation().

  1. MinMaxByNAggregate::extractValues() should return NULL when the accumulator is empty to follow Presto's behavior: https://github.com/prestodb/presto/blob/cb582bce0fd18f51d8862a8b2f53a134780f41aa/presto-main/src/main/java/com/facebook/presto/operator/aggregation/minmaxby/AbstractMinMaxByNAggregationFunction.java#L137-L140. This would solve the original error described above.
  2. A new bug discovered during the investigation is that when there are two peer rows in the same frame, Presto still calls Aggregate::extractValues() on the second row (https://github.com/prestodb/presto/blob/cb582bce0fd18f51d8862a8b2f53a134780f41aa/presto-main/src/main/java/com/facebook/presto/operator/window/AggregateWindowFunction.java#L80), while Velox only copies the result of the previous row (result->copy(aggregateResultVector_.get(), resultOffset + i, 0, 1);). This distinction matters for min_by and max_by in particular because the accumulator of these two functions clears its existing content when Aggregate::extractValues() is called, so the second row in the same frame should get a different result from the first row.
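To make the second point concrete, here is a rough standalone model of incremental aggregation over peer rows. Everything in it is an assumption made for illustration: ToyMinByOne, Row, runWindow and the reuseForPeers flag are hypothetical and are not the Velox or Presto code. It models min_by(x, y, 1) with an extraction that clears the accumulator and returns NULL when it is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

struct Row {
  int32_t x;
  std::optional<int64_t> y;
  int64_t orderKey; // Rows with equal orderKey are peers.
};

using Result = std::optional<std::vector<int32_t>>;

// Hypothetical model of min_by(x, y, 1) whose extraction clears the state.
class ToyMinByOne {
 public:
  void add(int32_t x, std::optional<int64_t> y) {
    if (!y.has_value()) {
      return; // Rows with a NULL y are ignored.
    }
    if (!best_.has_value() || *y < best_->first) {
      best_ = std::make_pair(*y, x);
    }
  }

  // Returns NULL when empty, otherwise [x] and clears the accumulator.
  Result extract() {
    if (!best_.has_value()) {
      return std::nullopt;
    }
    Result out = std::vector<int32_t>{best_->second};
    best_.reset();
    return out;
  }

 private:
  std::optional<std::pair<int64_t, int32_t>> best_;
};

// Frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW over rows that
// are already sorted by orderKey. reuseForPeers == true copies the first
// peer's result; false extracts again for every peer row.
inline std::vector<Result> runWindow(
    const std::vector<Row>& rows,
    bool reuseForPeers) {
  ToyMinByOne acc;
  std::vector<Result> results;
  std::size_t i = 0;
  while (i < rows.size()) {
    std::size_t end = i;
    while (end < rows.size() && rows[end].orderKey == rows[i].orderKey) {
      acc.add(rows[end].x, rows[end].y);
      ++end;
    }
    const Result first = acc.extract();
    results.push_back(first);
    for (std::size_t j = i + 1; j < end; ++j) {
      results.push_back(reuseForPeers ? first : acc.extract());
    }
    i = end;
  }
  return results;
}
```

With the four rows used in the unit test, copying the peer result gives [3], [3], [2], NULL in this model, while extracting again for each peer gives [3], NULL, [2], NULL.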

Both bugs can be reproduced via the following unit test.

TEST_F(MinMaxByNTest, aaa) {
  // SELECT
  //  c0, c1, c2, c3, c4, 
  //  min_by(c0, c1, c2) over (partition by c3 order by c4 asc)
  // FROM (
  //  VALUES
  //      (4, 2, 1, false, 0),
  //      (3, 1, 1, false, 0),
  //      (2, 0, 1, false, 1),
  //      (1, null, 1, false, 2)
  // ) AS t(c0, c1, c2, c3, c4);
  auto data = makeRowVector({
      makeFlatVector<int32_t>({4, 3, 2, 1}),
      makeNullableFlatVector<int64_t>({2, 1, 0, std::nullopt}),
      makeFlatVector<int64_t>({1, 1, 1, 1}),
      makeFlatVector<bool>({false, false, false, false}),
      makeFlatVector<int64_t>({0, 0, 1, 2}),
  });

  auto plan =
      PlanBuilder()
          .values({data})
          .window({"min_by(c0, c1, c2) over (partition by c3 order by c4 asc)"})
          .planNode();

  // Result should be {4, 2, 1, false, 0, [3]}, {3, 1, 1, false, 0,
  // null}, {2, 0, 1, false, 1, [2]}, {1, null, 1, false, 2, null}.
  auto result = AssertQueryBuilder(plan).copyResults(pool());
  auto expected = makeRowVector({
      makeFlatVector<int32_t>({4, 3, 2, 1}),
      makeNullableFlatVector<int64_t>({2, 1, 0, std::nullopt}),
      makeFlatVector<int64_t>({1, 1, 1, 1}),
      makeFlatVector<bool>({false, false, false, false}),
      makeFlatVector<int64_t>({0, 0, 1, 2}),
      makeNullableArrayVector<int32_t>(
          {{std::vector<std::optional<int32_t>>(1, 3)},
           std::nullopt,
           {std::vector<std::optional<int32_t>>(1, 2)},
           std::nullopt}),
  });
  velox::test::assertEqualVectors(expected, result);
}

@aditi-pandit
Collaborator

aditi-pandit commented Jan 5, 2024

Great analysis @kagamiori. Thanks !

kagamiori added a commit to kagamiori/velox that referenced this issue Jan 8, 2024
Summary:
min_by/max_by(x, y, n) functions produce different results from Presto when invoked in Window operator. There are two bugs:
1. MinMaxByNAggregate::extractValues() should return NULL when the accumulator is empty to follow Presto's behavior.
2. To be consistent with Presto, AggregateWindowFunction::incrementalAggregation() should call aggregate_->extractValues() again even if the current frame was already evaluated at the previous row. This is necessary for functions such as min_by/max_by whose accumulators are changed by the previous extractValues() call.

This diff fixes facebookincubator#8138.

Differential Revision: D52575661
@kagamiori
Contributor Author

kagamiori commented Jan 8, 2024

Another bug, this time in min/max(x, n), was found while testing the fix #8296.

TEST_F(MinMaxNTest, aaa) {
  // SELECT
  //  c0, c1, c2, c3,
  //  max(c0, c1) over (partition by c2 order by c3 asc)
  // FROM (
  //  VALUES
  //      (1, 10, false, 0),
  //      (2, 10, false, 1)
  // ) AS t(c0, c1, c2, c3)
  auto data = makeRowVector({
      makeFlatVector<int64_t>({1, 2}),
      makeFlatVector<int64_t>({10, 10}),
      makeFlatVector<bool>({false, false}),
      makeFlatVector<int64_t>({0, 1}),
  });

  auto plan =
      PlanBuilder()
          .values({data})
          .window({"max(c0, c1) over (partition by c2 order by c3 asc)"})
          .planNode();

  auto result = AssertQueryBuilder(plan).copyResults(pool());
  auto expected = makeRowVector({
      makeFlatVector<int64_t>({1, 2}),
      makeFlatVector<int64_t>({10, 10}),
      makeFlatVector<bool>({false, false}),
      makeFlatVector<int64_t>({0, 1}),
      makeArrayVector<int64_t>({{1}, {2, 1}}),
  });
  // Expected result: {1, 10, false, 0, [1]}, {2, 10, false, 1, [2, 1]}.
  // Current wrong result: {1, 10, false, 0, [1]}, {2, 10, false, 1, [2]}.
  facebook::velox::test::assertEqualVectors(expected, result);
}

The root cause is that for min/max(x, n), when the aggregation result is extracted from the accumulator, the accumulator should retain its content (https://github.com/prestodb/presto/blob/bed052afeef85c728d0c99237be1047f60e24839/presto-main/src/main/java/com/facebook/presto/operator/aggregation/AbstractMinMaxNAggregationFunction.java#L180). In the current Velox implementation, the accumulator content is cleared after it is extracted.

One interesting thing is that this behavior of min/max(x, n) is the exact opposite of the behavior of min_by/max_by(x, y, n) in Presto, which clears the content of the accumulator after extraction (https://github.com/prestodb/presto/blob/cb582bce0fd18f51d8862a8b2f53a134780f41aa/presto-main/src/main/java/com/facebook/presto/operator/aggregation/minmaxby/AbstractMinMaxByNAggregationFunction.java#L153). I wonder whether this difference is intended or a bug in Presto.
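The difference between the two extraction styles can be sketched with a small heap-based model of max(x, n). This is only an illustration under assumptions: ToyMaxN, extractKeeping and extractClearing are hypothetical names, and the real accumulators in Velox and Presto are implemented differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical model of a max(x, n) accumulator: a min-heap that holds the
// n largest values seen so far.
class ToyMaxN {
 public:
  explicit ToyMaxN(std::size_t n) : n_(n) {
    assert(n > 0);
  }

  void add(int64_t value) {
    if (heap_.size() < n_) {
      heap_.push(value);
    } else if (value > heap_.top()) {
      heap_.pop();
      heap_.push(value);
    }
  }

  // Non-destructive extraction: drains a copy of the heap, so later rows
  // still see every value added so far.
  std::vector<int64_t> extractKeeping() const {
    auto copy = heap_;
    return drain(copy);
  }

  // Destructive extraction: drains the heap itself.
  std::vector<int64_t> extractClearing() {
    return drain(heap_);
  }

 private:
  using MinHeap = std::
      priority_queue<int64_t, std::vector<int64_t>, std::greater<int64_t>>;

  // Pops values in ascending order and returns them in descending order.
  static std::vector<int64_t> drain(MinHeap& heap) {
    std::vector<int64_t> out;
    while (!heap.empty()) {
      out.push_back(heap.top());
      heap.pop();
    }
    std::reverse(out.begin(), out.end());
    return out;
  }

  std::size_t n_;
  MinHeap heap_;
};
```

Adding 1, extracting, adding 2 and extracting again gives [1] then [2, 1] with extractKeeping(), and [1] then [2] with extractClearing(), which mirrors the expected and wrong results in the test above.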

cc @mbasmanova

kagamiori added a commit to kagamiori/velox that referenced this issue Jan 9, 2024
Summary:
In Presto, accumulators of min/max(x, n) do not clear the heap when values are extracted from the
accumulator. But in Velox they do. Fix this bug to make Velox behavior align with Presto.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
kagamiori added a commit to kagamiori/velox that referenced this issue Jan 11, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer group and the remaining rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows are the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
so that the extraction method does not clear the accumulator. The behavior after the fix also aligns with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change the accumulator for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
don't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
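
The kind of check described in the commit message above can be illustrated roughly as follows. This is not the actual testIncrementalAggregation code: KeepingAccumulator, ClearingAccumulator and extractionIsRepeatable are hypothetical names, and the sketch only assumes that the property under test is "extracting twice in a row gives the same result".

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical accumulator whose extraction leaves the state untouched.
struct KeepingAccumulator {
  std::vector<int64_t> values;

  void add(int64_t value) {
    values.push_back(value);
  }

  std::vector<int64_t> extract() {
    return values; // Returns a copy.
  }
};

// Hypothetical accumulator whose extraction has the side effect of
// clearing the state.
struct ClearingAccumulator {
  std::vector<int64_t> values;

  void add(int64_t value) {
    values.push_back(value);
  }

  std::vector<int64_t> extract() {
    std::vector<int64_t> out = std::move(values);
    values.clear();
    return out;
  }
};

// Returns true if two consecutive extractions produce the same result,
// i.e. extraction did not observably change the accumulator.
template <typename Accumulator>
bool extractionIsRepeatable(Accumulator& acc) {
  const auto first = acc.extract();
  const auto second = acc.extract();
  return first == second;
}
```

An accumulator that fails this property cannot safely share one extracted result across peer rows or keep accumulating after an extraction.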
@kagamiori kagamiori added the window Issues related to window operation label Jan 18, 2024
kagamiori added a commit to kagamiori/velox that referenced this issue Jan 19, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer group and the remaining rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows are the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
so that the extraction method does not clear the accumulator. The behavior after the fix also aligns with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change the accumulator for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
don't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Reviewed By: mbasmanova

Differential Revision: D52638334
@kagamiori kagamiori reopened this Jan 26, 2024
kagamiori added a commit to kagamiori/velox that referenced this issue Jan 30, 2024
Summary:

Like the bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also break the
assumption of incremental window aggregation because their extractValues() methods
have a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Presto's min_by/max_by have the same bug (prestodb/presto#21653), so this fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Differential Revision: D53139892
FelixYBW pushed a commit to FelixYBW/velox that referenced this issue Feb 12, 2024
Summary:
Pull Request resolved: facebookincubator#8566

Like the bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also break the
assumption of incremental window aggregation because their extractValues() methods
have a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Presto's min_by/max_by have the same bug (prestodb/presto#21653), so this fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Reviewed By: bikramSingh91

Differential Revision: D53139892

fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
3 participants