
[GLUTEN-3154][FOLLOWUP] Add keyGroupedPartitioning in BatchScanExecTransformer for spark-3.3 #3184

Merged
1 commit merged into apache:main on Sep 18, 2023

Conversation

liujiayi771
Contributor

What changes were proposed in this pull request?

Fix the Spark 3.3 BatchScanExecTransformer makeCopy error introduced by removing the pushdownFilters parameter from BatchScanExecTransformer.

Exception in thread "main" java.lang.IllegalStateException:
Failed to copy node.
Is otherCopyArgs specified correctly for BatchScanExecTransformer.
Exception message: wrong number of arguments
ctor: public io.glutenproject.execution.BatchScanExecTransformer(scala.collection.Seq,org.apache.spark.sql.connector.read.Scan,scala.collection.Seq)?
types: class scala.collection.immutable.$colon$colon, class org.apache.iceberg.spark.source.SparkBatchQueryScan, class scala.collection.immutable.$colon$colon, class scala.None$
args: List(xxxxx#10037, xxxxx#10046, xxxxx#10047L, xxxxx#10048L, xxxxx#10079, xxxxx#10080, xxxxx#10081), IcebergScan(table=default_iceberg.xxxxx.xxxxx, type=struct<1: xxxxx: optional string, 10: xxxxx: optional string, 11: xxxxx: optional long, 12: xxxxx: optional long, 43: xxxxx: optional string, 44: xxxxx: optional string, 45: xxxxx: optional string>, filters=[not_null(ref(name="xxxxx")), not_null(ref(name="xxxxx")), ref(name="xxxxx") == "2023-08-01", not(ref(name="xxxxx") == "xxxxx"), not_null(ref(name="xxxxx")), not_null(ref(name="xxxxx"))], runtimeFilters=[], caseSensitive=false), List(dynamicpruningexpression(true), dynamicpruningexpression(xxxxx#10080 IN dynamicpruning#10622)), None

	at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:854)
	at org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy(TreeNode.scala:797)
	at org.apache.spark.sql.execution.SparkPlan.super$makeCopy(SparkPlan.scala:100)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$makeCopy$1(SparkPlan.scala:100)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
	at org.apache.spark.sql.execution.SparkPlan.makeCopy(SparkPlan.scala:100)
	at org.apache.spark.sql.execution.SparkPlan.makeCopy(SparkPlan.scala:60)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:223)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:188)
	at org.apache.spark.sql.execution.reuse.ReuseExchangeAndSubquery$$anonfun$org$apache$spark$sql$execution$reuse$ReuseExchangeAndSubquery$$reuse$1$1.applyOrElse(ReuseExchangeAndSubquery.scala:54)
	at org.apache.spark.sql.execution.reuse.ReuseExchangeAndSubquery$$anonfun$org$apache$spark$sql$execution$reuse$ReuseExchangeAndSubquery$$reuse$1$1.applyOrElse(ReuseExchangeAndSubquery.scala:44)

In Spark 3.3, TreeNode.makeCopy calls Product's productArity method to obtain the number of constructor parameters of the case class. BatchScanExecTransformer currently has 3 parameters, but productArity returns 4. From my testing, productArity returns the parameter count of the first concrete case class in the hierarchy, i.e. that of BatchScanExec, which does not match BatchScanExecTransformer's constructor.
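To illustrate, here is a minimal standalone sketch (hypothetical class names, not Gluten code): a plain class that extends a case class inherits the parent's Product implementation, so productArity reports the parent's arity rather than the subclass's constructor arity.

```scala
// Hypothetical minimal reproduction of the arity mismatch.
// Parent stands in for Spark 3.3's BatchScanExec (4 params),
// Child for the 3-parameter BatchScanExecTransformer.
case class Parent(a: Int, b: Int, c: Int, d: Option[Int] = None)

class Child(a: Int, b: Int, c: Int) extends Parent(a, b, c)

object ArityDemo extends App {
  val child = new Child(1, 2, 3)
  println(child.productArity)                                    // 4, inherited from Parent
  println(child.getClass.getConstructors.head.getParameterCount) // 3
  // makeCopy collects 4 product elements, but Child only declares a
  // 3-argument constructor, so the reflective invocation fails with
  // "wrong number of arguments".
}
```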

I think the standard approach here is: if we need to inherit one of Spark's case class plan nodes, we must keep the number of constructor parameters consistent, otherwise plan.transform will fail when it calls makeCopy. Alternatively, implement the otherCopyArgs method so that the arguments that are not transformed can still be copied.
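To make both options concrete, here is a simplified paraphrase of the argument matching inside Spark's TreeNode.makeCopy (a sketch, not the literal Spark source; the real method also checks parameter-type assignability before falling back). The transformed product elements plus otherCopyArgs must line up with the parameter count of some declared constructor, which is why either aligning the parameter lists or overriding otherCopyArgs resolves the failure.

```scala
import java.lang.reflect.Constructor

object MakeCopySketch {
  // Simplified paraphrase of how makeCopy assembles constructor arguments.
  def makeCopySketch(
      node: Product,
      transformedArgs: Array[AnyRef],
      otherCopyArgs: Seq[AnyRef]): AnyRef = {
    val allArgs: Array[AnyRef] = transformedArgs ++ otherCopyArgs
    val ctors: Array[Constructor[_]] = node.getClass.getConstructors
    // Prefer a constructor whose parameter count matches; otherwise fall
    // back to the widest one, which is where the mismatch surfaces as
    // "wrong number of arguments".
    val ctor = ctors
      .find(_.getParameterCount == allArgs.length)
      .getOrElse(ctors.maxBy(_.getParameterCount))
    ctor.newInstance(allArgs: _*).asInstanceOf[AnyRef]
  }
}
```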

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


@liujiayi771 liujiayi771 changed the title Add keyGroupedPartitioning arg for spark-3.3 [GLUTEN-3154][FOLLOWUP] Add keyGroupedPartitioning arg for spark-3.3 Sep 18, 2023
@github-actions

#3154

@github-actions

Run Gluten Clickhouse CI

@liujiayi771 liujiayi771 changed the title [GLUTEN-3154][FOLLOWUP] Add keyGroupedPartitioning arg for spark-3.3 [GLUTEN-3154][FOLLOWUP] Add keyGroupedPartitioning in BatchScanExecTransformer for spark-3.3 Sep 18, 2023
@liujiayi771
Contributor Author

@PHILO-HE Please take a look.

Contributor

@PHILO-HE PHILO-HE left a comment


Thanks for digging out the underlying reason!

@PHILO-HE PHILO-HE merged commit bb48a99 into apache:main Sep 18, 2023
16 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_3184_time.csv log/native_master_09_17_2023_8cd51b20d_time.csv difference percentage
q1 44.41 43.28 -1.129 97.46%
q2 24.48 24.40 -0.080 99.67%
q3 37.26 37.21 -0.044 99.88%
q4 41.02 41.84 0.813 101.98%
q5 69.81 70.08 0.271 100.39%
q6 6.59 6.51 -0.073 98.88%
q7 84.81 85.79 0.974 101.15%
q8 81.05 78.95 -2.098 97.41%
q9 115.80 115.22 -0.586 99.49%
q10 50.11 47.02 -3.097 93.82%
q11 19.28 18.81 -0.467 97.58%
q12 22.43 24.49 2.065 109.21%
q13 49.66 50.28 0.625 101.26%
q14 16.14 13.73 -2.403 85.11%
q15 31.64 26.40 -5.238 83.45%
q16 15.78 15.90 0.123 100.78%
q17 121.87 119.56 -2.312 98.10%
q18 161.96 160.35 -1.612 99.00%
q19 12.11 12.01 -0.091 99.25%
q20 26.64 26.92 0.286 101.07%
q21 233.98 232.60 -1.378 99.41%
q22 15.60 15.58 -0.014 99.91%
total 1282.43 1266.96 -15.465 98.79%

@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_master_09_18_2023_time.csv log/native_master_09_17_2023_8cd51b20d_time.csv difference percentage
q1 43.04 43.28 0.242 100.56%
q2 24.97 24.40 -0.566 97.73%
q3 36.87 37.21 0.349 100.95%
q4 41.75 41.84 0.083 100.20%
q5 69.76 70.08 0.320 100.46%
q6 5.05 6.51 1.465 129.02%
q7 86.37 85.79 -0.580 99.33%
q8 80.53 78.95 -1.576 98.04%
q9 117.29 115.22 -2.066 98.24%
q10 47.13 47.02 -0.111 99.76%
q11 19.27 18.81 -0.453 97.65%
q12 25.24 24.49 -0.751 97.02%
q13 50.78 50.28 -0.502 99.01%
q14 18.15 13.73 -4.415 75.68%
q15 28.92 26.40 -2.515 91.30%
q16 15.70 15.90 0.201 101.28%
q17 120.19 119.56 -0.632 99.47%
q18 161.67 160.35 -1.317 99.19%
q19 12.35 12.01 -0.334 97.29%
q20 28.67 26.92 -1.744 93.92%
q21 243.12 232.60 -10.519 95.67%
q22 15.16 15.58 0.423 102.79%
total 1291.96 1266.96 -25.002 98.06%

class BatchScanExecTransformer(
    output: Seq[AttributeReference],
    @transient scan: Scan,
-   runtimeFilters: Seq[Expression])
+   runtimeFilters: Seq[Expression],
+   keyGroupedPartitioning: Option[Seq[Expression]] = None)
Contributor


@liujiayi771 Does this change affect Spark 3.2? Spark 3.2 doesn't have the keyGroupedPartitioning parameter.

Contributor Author

@liujiayi771 liujiayi771 Oct 8, 2023


@JkSelf It will not affect Spark 3.2, because it is an optional parameter. In the compiled Java class, this class will still have a constructor without the keyGroupedPartitioning parameter. Did you encounter any problems?
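As a minimal sketch of why the default keeps existing call sites working (a hypothetical Node class, not the real transformer): the Scala compiler fills in the default value at three-argument call sites, so code written against the old signature still compiles.

```scala
// Hypothetical stand-in for BatchScanExecTransformer's signature change.
class Node(
    output: Seq[String],
    scan: AnyRef,
    runtimeFilters: Seq[String],
    keyGroupedPartitioning: Option[Seq[String]] = None)

object DefaultArgDemo extends App {
  // Old-style call site: the compiler supplies keyGroupedPartitioning = None.
  val legacy = new Node(Seq("c1"), new Object, Seq("f1"))
  // New-style call site passes the new parameter explicitly.
  val keyed = new Node(Seq("c1"), new Object, Seq("f1"), Some(Seq("c1")))
  println(legacy != null && keyed != null)
}
```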

Contributor

@JkSelf JkSelf Oct 10, 2023


@liujiayi771 We are currently working on upgrading Gluten to Spark 3.4. One of the changes in Spark 3.4 is a modification to BatchScanExec: it introduces a new table parameter. For now, I have placed all the parameters in the constructor of BatchScanExecTransformer. However, table is not an optional parameter, so I am unsure whether this change will affect Spark 3.2 and 3.3 and cause them to hit this same issue. Do you have any suggestions? Thank you.

Contributor Author


@JkSelf Based on the description of this PR, I think such a modification in 3.4 will cause the same problems I encountered before, for the reason explained in the description. Adding it directly to the member variables will leave lower Spark versions unable to find the corresponding constructor.

Contributor Author

@liujiayi771 liujiayi771 Oct 11, 2023


@JkSelf It may be necessary to redesign BatchScanExecTransformer so that it does not inherit from Spark's BatchScanExec. Inheriting from a case class is not good practice.

Contributor


@liujiayi771 Yes, I will override the otherCopyArgs method in the shim layer to bypass this issue when upgrading to Spark 3.4. We can further optimize BatchScanExecTransformer in another pull request, since it appears that Gluten extends Spark's case classes for most XXTransformer nodes.
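For reference, a self-contained toy model of the otherCopyArgs idea (a hypothetical ToyNode, not the actual Gluten shim or Spark code): arguments kept in a second parameter list stay out of the case-class product, and otherCopyArgs re-supplies them when the node is reflectively copied.

```scala
// Toy model: `extra` sits in a second parameter list, so it is not part
// of the case-class product; the copy logic appends it so reflection
// can find the full two-argument constructor.
case class ToyNode(child: String)(val extra: String) {
  def otherCopyArgs: Seq[AnyRef] = extra :: Nil

  def makeCopy(newArgs: Array[AnyRef]): ToyNode = {
    val allArgs = newArgs ++ otherCopyArgs
    val ctor = getClass.getConstructors.maxBy(_.getParameterCount)
    ctor.newInstance(allArgs: _*).asInstanceOf[ToyNode]
  }
}

object OtherCopyArgsDemo extends App {
  val node = ToyNode("child0")("session-like arg")
  println(node.productArity) // 1: the product only covers the first list
  val copied = node.makeCopy(Array[AnyRef]("child1"))
  println(s"${copied.child}, ${copied.extra}") // child1, session-like arg
}
```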
