Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom result verifiers in AggregationFuzzer #7674

Conversation

mbasmanova
Copy link
Contributor

Introduce ResultVerifier interface to customize result comparison and
verification for individual aggregate functions. This can be used to (1) apply
custom transformation to results before comparing, e.g. canonicalize an array
for array_agg; (2) verify results of approx_xxx functions by comparing these
with "perfect" results.

Add TransformResultVerifier implementation to apply transformation to results
before comparing. Use it for array_agg, set_agg, set_union and other
functions.

Before this change it was already possible to apply custom transformations
before comparing results. This change generalizes this logic to allow for more
sophisticated custom verifiers. For example, verifier for approx_distinct will
compare results with results of count(distinct).

Follow-up PRs will introduce custom verifiers for approx_distinct and
approx_percentile.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 21, 2023
Copy link

netlify bot commented Nov 21, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 7717216
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/655d081655e5140007eaa01f

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

/// Returns true if 'compare' API is supported. The verifier must support
/// either 'compare' or 'verify' API. If both are supported, 'compare' API is
/// used and 'verify' API is ignored.
virtual bool supportsCompare() = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference here between compare and verify ?
Is verify testing for equality ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some functions can be verified by comparing Velox with reference DB. These functions may require results to be transformed first. For example, array_agg can be verified this way by applying array_sort to the result. This is what 'compare' API is for.

Other functions cannot be verified like this. For example, approx_xxx functions. These require more sophisticated verification. Here, we use 'verify' API. This API takes only on result as having another result is not helpful.

///
/// 'initialize' must be called first. 'compare' may be called multiple times
/// after single 'initialize' call.
virtual bool compare(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : if we are comparing result and altResult, maybe rename altResult to otherResult ?


bool compare(const RowVectorPtr& result, const RowVectorPtr& altResult)
override {
return assertEqualResults({transform(result)}, {transform(altResult)});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we check that projections_ is set or initialize was called before this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added VELOX_CHECK(!projections_.empty()); to transform helper method.

///
/// 'initialize' must be called first. 'verify' may be called multiple times
/// after single 'initialize' call.
virtual bool verify(const RowVectorPtr& result) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for verification, might it also make sense to pass in the other result ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, verify doesn't take another result. See my earlier explanation.

return std::make_shared<TransformResultVerifier>(transform);
}

bool supportsCompare() override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wonder if we can pass in comparison or verification as a lambda so we might not need the supports functions ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 API are different and we need a way to know which API is supported.

@mbasmanova
Copy link
Contributor Author

@kgpai Krishna, thank you for review. I replied / addressed your comments. Would you take another look?

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbasmanova LGTM. Thanks!

@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

…#7674)

Summary:
Introduce ResultVerifier interface to customize result comparison and
verification for individual aggregate functions. This can be used to (1) apply
custom transformation to results before comparing, e.g. canonicalize an array
for array_agg; (2) verify results of approx_xxx functions by comparing these
with "perfect" results.

Add TransformResultVerifier implementation to apply transformation to results
before comparing. Use it for array_agg, set_agg, set_union and other
functions.

Before this change it was already possible to apply custom transformations
before comparing results. This change generalizes this logic to allow for more
sophisticated custom verifiers. For example, verifier for approx_distinct will
compare results with results of count(distinct).

Follow-up PRs will introduce custom verifiers for approx_distinct and
approx_percentile.


Reviewed By: xiaoxmeng

Differential Revision: D51495384

Pulled By: mbasmanova
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D51495384

@facebook-github-bot
Copy link
Contributor

@mbasmanova merged this pull request in 8e186cf.

Copy link

Conbench analyzed the 1 benchmark run on commit 8e186cfc.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

@liujiayi771
Copy link
Contributor

liujiayi771 commented Nov 22, 2023

Hi @mbasmanova. Since the skewness algorithm in Presto and DuckDB is not consistent, the Velox skewness result needs to be multiplied by a coefficient in order to compare it with DuckDB's result. This has been discussed in #4845.

Additionally, in some special cases, there may also be differences in the calculation results between NaN and NULL. You can refer to issue comment.

Can TransformResultVerifier support these situations? If it can, I will adapt agg functions such as skewness and kurtosis in fuzzer test.

@mbasmanova
Copy link
Contributor Author

@liujiayi771 Joey, thank you for sharing this context and offering help. We are actively working towards replacing DuckQueryRunner with PrestoQueryRunner so that we can compare Velox results with Presto, not DuckDB: #7628 I assume we'll be able to verify skewness and kurtosis functions as is then. Let me know if that's not the case.

CC: @kgpai @duanmeng

@liujiayi771
Copy link
Contributor

@mbasmanova OK. I will test skewness and kurtosis with the new PrestoQueryRunner to check if we can remove them from customVerificationFunctions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants