-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-3154][FOLLOWUP] Add keyGroupedPartitioning in BatchScanExecTransformer for spark-3.3 #3184
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on Github Issues? https://github.com/oap-project/gluten/issues Then could you also rename commit message and pull request title in the following format?
See also: |
Run Gluten Clickhouse CI |
@PHILO-HE Please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for digging out the underneath reason!
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
|
class BatchScanExecTransformer( | ||
output: Seq[AttributeReference], | ||
@transient scan: Scan, | ||
runtimeFilters: Seq[Expression]) | ||
runtimeFilters: Seq[Expression], | ||
keyGroupedPartitioning: Option[Seq[Expression]] = None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liujiayi771 Does this change affect Spark3.2? Because spark 3.2 doesn't contain the keyGroupedPartitioning
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JkSelf It will not affect spark3.2, it is an optional parameter. In Java class, this class will have a constructor without keyGroupedPartitioning parameter. Did you encounter any problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liujiayi771 We are currently working on upgrading to Spark 3.4 in Gluten. One of the changes in Spark 3.4 is the modification of BatchScanExec. It introduces a new table parameter. Currently, I have placed all the parameters in the constructor of BatchScanExecTransformer here. However the table is not an optional parameter, I am unsure whether this change will affect Spark 3.2 and 3.3 and cause them to encounter this issue. Do you have any suggestions regarding this? Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JkSelf According to the description of this PR, I think that such modification in 3.4 will cause the same problems that I encountered before. The reason is as explained in the description. Adding it directly to member variables will cause the lower version spark to not be able to find the corresponding constructors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JkSelf It may be necessary to re-invent BatchScanExecTransformer so that it does not inherit Spark's BatchScanExec. Inheriting case class is not a good practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liujiayi771 Yes, I will override the otherCopyArgs
method in the shim layer to bypass this issue when upgrading to Spark 3.4 here. We can further optimize the BatchScanExecTransformer in another pull request because it appears that the gluten override the case class for most XXTransformer.
What changes were proposed in this pull request?
Fix Spark-3.3 BatchScanExecTransformer makeCopy error after removing pushdownFilters parameter in BatchScanExecTransformer.
In Spark-3.3, when Spark is in TreeNode.makeCopy, it will call the productArity method of Product to obtain the number of parameters of the case class. The current number of parameters of BatchScanExecTransformer is 3, but productArity gets 4. After my test, productArity gets the number of parameters of the first specific case class, which is the number of parameters of BatchScanExec, which does not match the number of BatchScanExecTransformer constructors.
I think the standard approach here is that if we need to inherit Spark's case class plan, we need to ensure that the number of parameters is consistent, otherwise it will cause problems when plan.transform calls makeCopy. Or implement the otherCopyArgs method to ensure that not transformed args can be copied.