Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix min/max(x, n) #8311

Closed
wants to merge 2 commits into from
Closed

Conversation

kagamiori
Copy link
Contributor

@kagamiori kagamiori commented Jan 9, 2024

Summary:
Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff fixes #8138.

Differential Revision: D52638334

Copy link

netlify bot commented Jan 9, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 3e560d6
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/65aea4a0b7780a000865116a

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 9, 2024
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 9, 2024
Summary:

In Presto, accumulators of min/max(x, n) do not clear the heap when values are extracted from 
accumulator. But in Velox they do. Fix this bug to make Velox behavior align with Presto.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 9, 2024
Summary:

In Presto, accumulators of min/max(x, n) do not clear the heap when values are extracted from 
accumulator. But in Velox they do. Fix this bug to make Velox behavior align with Presto.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 11, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer 
rows with the same frame. In this situation, the aggregation result is only computed at the first row 
of the peer and the rest rows simply copy this result. This optimization assumes that results of 
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n) 
in Velox breaks this assumption because their extractValues() method causes the accumulator to 
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n) 
to make the extraction method not clear the accumulator. The behavior after the fix also align with 
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues() 
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n) 
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

.window({"max(c0, c1) over (partition by c2 order by c3 asc)"})
.planNode();

auto result = AssertQueryBuilder(plan).copyResults(pool());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copyResults -> assertResults(expected)

if (auto field =
dynamic_cast<const core::FieldAccessTypedExpr*>(arg.get())) {
auto channel = type.getChildIdx(field->name());
columns.push_back(input->childAt(channel));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input->childAt(field->name());

  /// Returns child vector for the specified field name. Throws if field with
  /// specified name doesn't exist.
  VectorPtr& childAt(const std::string& name) {
    return children_[type_->asRow().getChildIdx(name)];
  }

const core::CallTypedExprPtr& aggregateExpr,
const RowVectorPtr& input,
memory::MemoryPool* pool) {
auto type = input->type()->as<TypeKind::ROW>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asRow()

std::vector<VectorPtr> columns;
for (const auto& arg : aggregateExpr->inputs()) {
if (auto field =
dynamic_cast<const core::FieldAccessTypedExpr*>(arg.get())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypedExprs::asFieldAccess(arg)

auto channel = type.getChildIdx(field->name());
columns.push_back(input->childAt(channel));
}
if (dynamic_cast<const core::ConstantTypedExpr*>(arg.get())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isConstant(arg)

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 13, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 17, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 17, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 19, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 19, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Differential Revision: D52638334
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

1 similar comment
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 19, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Reviewed By: mbasmanova

Differential Revision: D52638334
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 19, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Reviewed By: mbasmanova

Differential Revision: D52638334
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 19, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Reviewed By: mbasmanova

Differential Revision: D52638334
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

Summary:

Add a test in testAggregations to check that Aggregate::extractAccumulators and
Aggregate::extractValues don't have side effects of changing the accumulators.

min, max, min_by, and max_by currently have side effects. avg and sum for decimal
inputs have undefined values in intermediate results. So they are excluded from the
added testing.

Reviewed By: mbasmanova

Differential Revision: D52755566
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Reviewed By: mbasmanova

Differential Revision: D52638334
kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 22, 2024
Summary:

Velox has an optimization for Window operation with incremental aggregation when there are peer
rows with the same frame. In this situation, the aggregation result is only computed at the first row
of the peer and the rest rows simply copy this result. This optimization assumes that results of
incremental aggregation in Window operation on peer rows should be the same. However, min/max(x, n)
in Velox breaks this assumption because their extractValues() method causes the accumulator to
be cleared, making the peer rows after the first row expect a different result. This diff fixes min/max(x, n)
to make the extraction method not clear the accumulator. The behavior after the fix also align with
Presto's.

This diff also adds a method testIncrementalAggregation in testAggregations to check that extractValues()
doesn't change accumulator afterwards for all aggregation functions. After this fix, only min_by/max_by(x, y, n)
doesn't pass testIncrementalAggregation.

This diff fixes facebookincubator#8138.

Reviewed By: mbasmanova

Differential Revision: D52638334
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D52638334

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in d747ac3.

Copy link

Conbench analyzed the 1 benchmark run on commit d747ac33.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 26, 2024
Summary:
Same as bug in min/max(x, n) fixed in facebookincubator#8311, 
min_by/max_by(x, y, n) also breaks the assumption of incremental window aggregation 
because their extractValues() methods has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n) not 
clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). 
This fix will make Velox's min_by/max_by behave differently from Presto when used in Window 
operation, until prestodb/presto#21653 is fixed.

Differential Revision: D53139892
kagamiori added a commit to kagamiori/velox that referenced this pull request Jan 30, 2024
Summary:

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Differential Revision: D53139892
kagamiori added a commit to kagamiori/velox that referenced this pull request Feb 8, 2024
Summary:

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Differential Revision: D53139892
facebook-github-bot pushed a commit that referenced this pull request Feb 12, 2024
Summary:
Pull Request resolved: #8566

Same as bug in min/max(x, n) fixed in #8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes #8138.

Reviewed By: bikramSingh91

Differential Revision: D53139892

fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
FelixYBW pushed a commit to FelixYBW/velox that referenced this pull request Feb 12, 2024
Summary:
Pull Request resolved: facebookincubator#8566

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Reviewed By: bikramSingh91

Differential Revision: D53139892

fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
FelixYBW pushed a commit to FelixYBW/velox that referenced this pull request Feb 12, 2024
Summary:
Pull Request resolved: facebookincubator#8566

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Reviewed By: bikramSingh91

Differential Revision: D53139892

fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
FelixYBW pushed a commit to FelixYBW/velox that referenced this pull request Feb 12, 2024
Summary:
Pull Request resolved: facebookincubator#8566

Same as bug in min/max(x, n) fixed in facebookincubator#8311, min_by/max_by(x, y, n) also breaks the
assumption of incremental window aggregation because their extractValues() methods
has a side effect of clearing the accumulator.

This diff fixes this issue by making the extractValues() methods of min_by/max_by(x, y, n)
not clear the accumulators.

Since Presto's min_by/max_by have the same bug (prestodb/presto#21653). This fix
will make Velox's min_by/max_by behave differently from Presto when used in Window
operation, until prestodb/presto#21653 is fixed.

This diff fixes facebookincubator#8138.

Reviewed By: bikramSingh91

Differential Revision: D53139892

fbshipit-source-id: 1323f22196e22554c0d880d20584a4ee4059b64c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

min_by/max_by returns wrong result in Window operation
3 participants