-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Spark cast expr based on cast hooks #7377
Conversation
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rui-mo Rui, overall I think it makes sense to Velox to allow applications to provide custom implementations for special forms and register new special forms as well. We need to figure out how to add proper APIs to do that.
Overriding CAST requires also customizing this code in ExprCompiler. CC: @bikramSingh91
} else if (auto cast = dynamic_cast<const core::CastTypedExpr*>(expr.get())) {
VELOX_CHECK(!compiledInputs.empty());
if (FOLLY_UNLIKELY(*resultType == *compiledInputs[0]->type())) {
result = compiledInputs[0];
} else {
result = std::make_shared<CastExpr>(
resultType,
std::move(compiledInputs[0]),
trackCpuUsage,
cast->nullOnFailure());
}
@mbasmanova Thank you for the review. Can I prototype with |
@rui-mo Rui, please, go ahead. |
206aa66
to
b537923
Compare
@mbasmanova Masha, I made the prototype with
Seems it is not necessary if we don't want to parse cast from string. In this PR, SparkCastExpr can to be created with In this PR, parsing from string to SparkCastExpr is not included. But if it is needed, I can continue to work on that. Seems it is also a UT conveniency issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rui-mo Thank you for prototyping Spark-specific CAST. Overall looks good. A few things that need to be taken care of.
- Move SparkCastExpr out of velox/expression. Perhaps, into functions/sparksql/specialforms folder I saw you adding in some other PR.
- Make sure that CastTypedExpr is equivalent to CallTypedExpr("cast"). Currently, these are not the same because ExprCompiler explicitly converts CastTypedExpr to CastExpr. Perhaps, this logic can be replaced with getSpecialForm("cast").
- Figure out all the differences between Spark and Presto CAST impementations to see whether it is feasible to continue using CastExpr as a base class.
c428aa7
to
d8b917d
Compare
2743676
to
6caec00
Compare
6caec00
to
fb900f9
Compare
@mbasmanova Thank you for the suggestions. Fixed them, and apology for the delay.
Summarized them in this table: #4876 (comment). |
@rui-mo Thank you, Rui. What's your assessment of these differences? Is it feasible to continue using CastExpr as a base class? |
@mbasmanova These cases can be divided into below categories.
As discussed with @PHILO-HE, we tend to use CastExpr as a base class, which can help reuse existing code and avoid duplicates. Take below case as an example.
we can fix some of them by overriding a velox/velox/expression/CastExpr-inl.h Lines 232 to 233 in 261d0b0
As for #5307 (comment), we don't need to implement duplicate cast functions in CastExpr and SparkCastExpr. By overriding a small part of code in SparkCastExpr to remove the white-spaces before cast should be enough. Glad to hear your opinions, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rui-mo Rui, overall looks good. I have some question about the design: separate hook class vs. virtual methods in CastExpr. I also wonder how many hooks we would need to fully implement Spark semantics. Would it make sense to prototype this to see what it looks like and whether the set of hooks is managable.
Summary: Moves `testCast` function from CastExprTest.cpp to CastBaseTest.h for other test files to use. Separates `testCast` function into `testCast` (for valid output test), `testTryCast` (for try_cast test), and `testInvalidCast` (for exception test). #7377 (comment) #7377 (comment) Pull Request resolved: #7912 Reviewed By: xiaoxmeng Differential Revision: D51968059 Pulled By: mbasmanova fbshipit-source-id: 26fe9896766ce5b1ed175d3b85c930e7a0b2a1c8
79a821d
to
ef5b1ea
Compare
ef5b1ea
to
2ef89f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rui-mo Rui, thank you for iterating on this. I'm having a few questions about the hooks API.
2ef89f8
to
71db27c
Compare
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
FYI, I replace
in the
to resolve some internal test failures. Also made some format changes. |
@kagamiori merged this pull request in 51dc97b. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
@rui-mo, this change turns out to break some string-related unit test in spark, and hence the linux build job in CI (https://app.circleci.com/pipelines/github/facebookincubator/velox/42248/workflows/18d750f7-eed8-466f-94d8-63958efad94a/jobs/291869/tests), so we're going to revert it. Could you take a look at the failed unit tests? |
Summary: The PR facebookincubator#7377 broke string-related unit tests in velox/functions/sparksql/tests/StringTest.cpp Reviewed By: kgpai Differential Revision: D52681222
@kagamiori Thank you for helping land this PR. I know the cause of this failure, and please check #8121. Could you tell me more details about the internal failures as #7377 (comment) mentioned? |
Hi @rui-mo, we're seeing the CI signal "ci/circleci: linux-build" started failing on the main branch since #7377 was landed (https://github.com/facebookincubator/velox/commits/main/). cc @kgpai |
Hi @kagamiori, I checked the log and found |
OK, after reading the context in #8121, I think I know the reason to the failure. It was likely the replacement of SetUpTestCase() with SparkCastExprTest(). This change however didn't break the internal test of velox_functions_spark_test. Let me try making a fix. |
SetUpTestCase() was replaced because |
@kagamiori Thank you. I understand. In this PR |
Here is a fix: #8346. |
Summary: Previously emptyOutput was not called in trimUnicodeWhiteSpace after the refactoring in facebookincubator#7377. This diff fixes it. Differential Revision: D52737293
Summary: Previously emptyOutput was not called in trimUnicodeWhiteSpace after the refactoring in facebookincubator#7377. This diff fixes it. Reviewed By: amitkdutta Differential Revision: D52737293
Summary: There are several corner cases of semantic differences on cast between Presto and Spark. According to the implementation of `registerFunctionCallToSpecialForm`, a previously registered special form can be overriden by one registered after it and of the same name.This PR implements SparkCastExpr, which is registered as special form, and can be customized with the help of cast hooks compatible with Spark's semantics. Below hooks are added to solve several semantic differences. - castStringToTimestamp - castStringToDate - castTimestampToString - legacy - removeWhiteSpaces - truncate Two configurations `kCastToIntByTruncate` and `kCastStringToDateIsIso8601` are replaced by cast hooks. These configurations are no longer used and will be removed by subsequent PRs. facebookincubator#4876 Fixes facebookincubator#8121. Pull Request resolved: facebookincubator#7377 Reviewed By: kgpai Differential Revision: D52566119 Pulled By: kagamiori fbshipit-source-id: 34577133550e112eddb7f8080b9d897c45ee1fec
Summary: Pull Request resolved: facebookincubator#8368 Previously emptyOutput was not called in trimUnicodeWhiteSpace after the refactoring in facebookincubator#7377. This diff fixes it. Reviewed By: amitkdutta Differential Revision: D52737293 fbshipit-source-id: ecaece86172646b1e3fd910555e859de91636b53
Summary: Pull Request resolved: facebookincubator#8368 Previously emptyOutput was not called in trimUnicodeWhiteSpace after the refactoring in facebookincubator#7377. This diff fixes it. Reviewed By: amitkdutta Differential Revision: D52737293 fbshipit-source-id: ecaece86172646b1e3fd910555e859de91636b53
@mbasmanova @kagamiori Thanks for your continuous support on this pull request. I'm working on below tasks discovered when implementing Spark cast. Could you spare some time to take a review? Thanks!
|
…teIsIso8601` (facebookincubator#8352) Summary: After facebookincubator#7377, kCastToIntByTruncate and kCastStringToDateIsIso8601 are no longer in use. Pull Request resolved: facebookincubator#8352 Reviewed By: mbasmanova Differential Revision: D52737612 Pulled By: kagamiori fbshipit-source-id: c82d55a592b85a16bce5c29b20d22f7a19c6c9a3
There are several corner cases of semantic differences on cast between Presto
and Spark. According to the implementation of
registerFunctionCallToSpecialForm
, a previously registered special form can beoverriden by one registered after it and of the same name.This PR implements
SparkCastExpr, which is registered as special form, and can be customized with
the help of cast hooks compatible with Spark's semantics. Below hooks are added
to solve several semantic differences.
Two configurations
kCastToIntByTruncate
andkCastStringToDateIsIso8601
arereplaced by cast hooks. These configurations are no longer used and will be removed
by subsequent PRs.
#4876
Fixes #8121.