-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow custom result verifiers in AggregationFuzzer #7674
Allow custom result verifiers in AggregationFuzzer #7674
Conversation
✅ Deploy Preview for meta-velox canceled.
|
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
ce0aff1
to
2c5bf35
Compare
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
2c5bf35
to
f2ca0f7
Compare
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
/// Returns true if 'compare' API is supported. The verifier must support | ||
/// either 'compare' or 'verify' API. If both are supported, 'compare' API is | ||
/// used and 'verify' API is ignored. | ||
virtual bool supportsCompare() = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference here between compare and verify ?
Is verify testing for equality ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some functions can be verified by comparing Velox with reference DB. These functions may require results to be transformed first. For example, array_agg can be verified this way by applying array_sort to the result. This is what 'compare' API is for.
Other functions cannot be verified like this. For example, approx_xxx functions. These require more sophisticated verification. Here, we use 'verify' API. This API takes only on result as having another result is not helpful.
/// | ||
/// 'initialize' must be called first. 'compare' may be called multiple times | ||
/// after single 'initialize' call. | ||
virtual bool compare( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit : if we are comparing result and altResult, maybe rename altResult to otherResult ?
|
||
bool compare(const RowVectorPtr& result, const RowVectorPtr& altResult) | ||
override { | ||
return assertEqualResults({transform(result)}, {transform(altResult)}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we check that projections_ is set or initialize was called before this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added VELOX_CHECK(!projections_.empty()); to transform helper method.
/// | ||
/// 'initialize' must be called first. 'verify' may be called multiple times | ||
/// after single 'initialize' call. | ||
virtual bool verify(const RowVectorPtr& result) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for verification, might it also make sense to pass in the other result ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, verify doesn't take another result. See my earlier explanation.
return std::make_shared<TransformResultVerifier>(transform); | ||
} | ||
|
||
bool supportsCompare() override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I wonder if we can pass in comparison or verification as a lambda so we might not need the supports functions ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 2 API are different and we need a way to know which API is supported.
f2ca0f7
to
dd1c4ed
Compare
@kgpai Krishna, thank you for review. I replied / addressed your comments. Would you take another look? |
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova LGTM. Thanks!
dd1c4ed
to
310f7eb
Compare
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…#7674) Summary: Introduce ResultVerifier interface to customize result comparison and verification for individual aggregate functions. This can be used to (1) apply custom transformation to results before comparing, e.g. canonicalize an array for array_agg; (2) verify results of approx_xxx functions by comparing these with "perfect" results. Add TransformResultVerifier implementation to apply transformation to results before comparing. Use it for array_agg, set_agg, set_union and other functions. Before this change it was already possible to apply custom transformations before comparing results. This change generalizes this logic to allow for more sophisticated custom verifiers. For example, verifier for approx_distinct will compare results with results of count(distinct). Follow-up PRs will introduce custom verifiers for approx_distinct and approx_percentile. Reviewed By: xiaoxmeng Differential Revision: D51495384 Pulled By: mbasmanova
310f7eb
to
7717216
Compare
This pull request was exported from Phabricator. Differential Revision: D51495384 |
@mbasmanova merged this pull request in 8e186cf. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Hi @mbasmanova. Since the skewness algorithm in Presto and DuckDB is not consistent, the Velox skewness result needs to be multiplied by a coefficient in order to compare it with DuckDB's result. This has been discussed in #4845. Additionally, in some special cases, there may also be differences in the calculation results between Can |
@liujiayi771 Joey, thank you for sharing this context and offering help. We are actively working towards replacing DuckQueryRunner with PrestoQueryRunner so that we can compare Velox results with Presto, not DuckDB: #7628 I assume we'll be able to verify skewness and kurtosis functions as is then. Let me know if that's not the case. |
@mbasmanova OK. I will test |
Introduce ResultVerifier interface to customize result comparison and
verification for individual aggregate functions. This can be used to (1) apply
custom transformation to results before comparing, e.g. canonicalize an array
for array_agg; (2) verify results of approx_xxx functions by comparing these
with "perfect" results.
Add TransformResultVerifier implementation to apply transformation to results
before comparing. Use it for array_agg, set_agg, set_union and other
functions.
Before this change it was already possible to apply custom transformations
before comparing results. This change generalizes this logic to allow for more
sophisticated custom verifiers. For example, verifier for approx_distinct will
compare results with results of count(distinct).
Follow-up PRs will introduce custom verifiers for approx_distinct and
approx_percentile.