feat(spark): build and publish multi-variant targets (#707)

andrew-coleman merged 3 commits into substrait-io:main
Conversation
Commits:
1. WIP. Signed-off-by: Andrew Coleman <andrew_coleman@uk.ibm.com>
2. Adds support for building/publishing multiple targets for different versions of Scala (2.12, 2.13) and Spark (3.4, 3.5, 4.0). Other combinations can be added in the future.
3. There are a number of breaking Spark API changes between 3.x and 4.x. This commit refactors the code to abstract the affected API calls into a compatibility interface which can be implemented by each version.
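The compatibility-interface refactoring described in the third commit can be sketched as follows. The names `SparkCompat`, `instance` and `supportsWindowGroupLimit` appear in the diff below; everything else (the Java rendering, the version-selection logic, and which versions return `true`) is a hypothetical illustration of the pattern, not the project's actual Scala implementation:

```java
// Sketch of a version-compatibility layer: one interface, one implementation
// per supported Spark line. Hypothetical illustration only.
interface SparkCompat {
    boolean supportsWindowGroupLimit();

    // Pick the implementation matching the Spark version in use.
    // (Illustrative selection logic; the real project builds a separate
    // artifact per Spark version instead of branching at runtime.)
    static SparkCompat instance(String sparkVersion) {
        return sparkVersion.startsWith("4.") ? new Spark4Compat() : new Spark3Compat();
    }
}

class Spark3Compat implements SparkCompat {
    public boolean supportsWindowGroupLimit() { return false; }
}

class Spark4Compat implements SparkCompat {
    public boolean supportsWindowGroupLimit() { return true; }
}

public class CompatDemo {
    public static void main(String[] args) {
        System.out.println(SparkCompat.instance("4.0.2").supportsWindowGroupLimit());
        System.out.println(SparkCompat.instance("3.5.4").supportsWindowGroupLimit());
    }
}
```

Calling code then depends only on the interface, so the same translation logic compiles against every Spark variant.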
bestbeforetoday left a comment:
A few minor things can be tidied up, either in this PR or in a later one (see comments). Generally it looks OK to me, though, so it's approved for you to merge when you are ready.
```kotlin
// Scala dependencies
implementation("org.scala-lang:scala-library:$scalaVersion")
testImplementation("org.scalatest:scalatest_$scalaBinary:3.2.19")
testRuntimeOnly("org.scalatestplus:junit-5-12_$scalaBinary:3.2.19.0")

// Spark dependencies
api("org.apache.spark:spark-core_$scalaBinary:$sparkVersion")
api("org.apache.spark:spark-sql_$scalaBinary:$sparkVersion")
implementation("org.apache.spark:spark-hive_$scalaBinary:$sparkVersion")
implementation("org.apache.spark:spark-catalyst_$scalaBinary:$sparkVersion")
testImplementation("org.apache.spark:spark-core_$scalaBinary:$sparkVersion:tests")
testImplementation("org.apache.spark:spark-sql_$scalaBinary:$sparkVersion:tests")
testImplementation("org.apache.spark:spark-catalyst_$scalaBinary:$sparkVersion:tests")
```
Probably should be using values from libs here but that can be tidied up later.
```kotlin
val sparkVersion = "3.4.4"
val scalaVersion = "2.12.20"
val sparkMajorMinor = "3.4"
val scalaBinary = "2.12"
```
It would be nice to pull these values out of libs to keep everything consistent and the versions defined in a single place. This can be tidied up later though.
```kotlin
"spark34_2.12" to Triple(":spark:spark-3.4_2.12", "3.4.4", "2.12"),
"spark35_2.12" to Triple(":spark:spark-3.5_2.12", "3.5.4", "2.12"),
"spark40_2.13" to Triple(":spark:spark-4.0_2.13", "4.0.2", "2.13"),
```
It would be nice for these versions to be centralised in libs. That can be tidied up later.
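The block under review maps each variant key to a Kotlin `Triple` of (Gradle project path, Spark version, Scala binary version). A hedged sketch of the same lookup table, rendered in Java with a named record instead of a positional triple (the `Variant` record and `VARIANTS` map names are illustrative; the keys and values are taken from the diff):

```java
import java.util.Map;

// Named-record rendering of the variant table shown in the build script.
// Illustrative only; the real build uses Kotlin's Triple in build.gradle.kts.
record Variant(String projectPath, String sparkVersion, String scalaBinary) {}

public class Variants {
    static final Map<String, Variant> VARIANTS = Map.of(
        "spark34_2.12", new Variant(":spark:spark-3.4_2.12", "3.4.4", "2.12"),
        "spark35_2.12", new Variant(":spark:spark-3.5_2.12", "3.5.4", "2.12"),
        "spark40_2.13", new Variant(":spark:spark-4.0_2.13", "4.0.2", "2.13"));

    public static void main(String[] args) {
        Variant v = VARIANTS.get("spark40_2.13");
        System.out.println(v.projectPath() + " -> Spark " + v.sparkVersion()
            + ", Scala " + v.scalaBinary());
    }
}
```

A named record makes call sites self-documenting, which is one way the follow-up tidy-up could also address the reviewer's point about keeping versions in a single place.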
```scala
// case plan if SparkCompat.instance.supportsWindowGroupLimit =>
//   SparkCompat.instance.handleWindowGroupLimit(plan, visit)
```
Probably should remove commented out code.
### Publish to Maven Central Portal

Publish all variants to Maven Central:

```bash
./gradlew :spark:publishAllVariantsToCentralPortal
```

Or publish a specific variant:

```bash
./gradlew :spark:spark-4.0_2.13:publishMaven-publishPublicationToNmcpRepository
```
I am not sure this section is correct. It can be reviewed and corrected (or removed) later.
Thanks Mark. In the interest of getting this delivered, I'll merge now and address your comments in a follow-up PR.
This PR adds the framework to build and publish multiple Substrait Spark packages targeted at different versions of Scala and Spark.
This is a large PR; however, it makes no functional changes. It is split into three commits (listed above). Notes:

- `lpad` and `rpad` had to be added because the Spark 4.0.2 query planner/optimizer inserted them for many of the TPCDS queries. The `dialect.yaml` was regenerated accordingly.
- Supported Scala binary versions: `2.12` and `2.13`.
- Supported Spark versions: `3.4`, `3.5` and `4.0`.
- Published variants: `spark34_2.12`, `spark35_2.12` and `spark40_2.13`.
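The variant keys above follow a visible naming pattern: the Spark major.minor version with the dot dropped, then an underscore and the Scala binary version (e.g. `3.4` + `2.12` gives `spark34_2.12`). A small sketch of that derivation; the helper name `variantKey` is hypothetical, not something from the build scripts:

```java
// Hypothetical helper deriving the variant key naming scheme used in this PR
// from a Spark major.minor version and a Scala binary version.
public class VariantName {
    static String variantKey(String sparkMajorMinor, String scalaBinary) {
        // "3.4" -> "34", then append "_" + Scala binary version
        return "spark" + sparkMajorMinor.replace(".", "") + "_" + scalaBinary;
    }

    public static void main(String[] args) {
        System.out.println(variantKey("3.4", "2.12")); // spark34_2.12
        System.out.println(variantKey("4.0", "2.13")); // spark40_2.13
    }
}
```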