Add support for Spark 3-based builds #9524
Signed-off-by: Karen Feng <karen.feng@databricks.com>
Actually, it looks like I'm having some trouble with block matrices in Python.
What errors? I'll run CI on this, but I don't expect the Spark 2 stuff to have any problems.
I don't believe there were any glaring problems on the Spark 2 side, but some of the Python tests on Spark 3 are failing, notably in
Passing CI tests for Spark 2.4. Do you have a stack trace for a failure? Sriram saw issues related to Breeze in #9199, but I think it was the bug you noted above.
@@ -176,13 +176,14 @@ class Method private[lir] (
   def findBlocks(): Blocks = {
     val blocksb = new ArrayBuilder[Block]()

-    val s = new mutable.Stack[Block]()
+    var s = List[Block]()
Can we just make this a new ArrayStack[Block]()? I saw this a few weeks ago and meant to fix it, so glad this came up again.
ArrayStack is one of ours: is.hail.utils.ArrayStack
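For context on the diff above, here is a minimal sketch of the worklist pattern it adopts: an immutable `List` held in a `var` and used as a stack, which compiles without deprecation warnings on both Scala 2.11 and 2.12. This is not Hail's `findBlocks`; the `WorklistSketch` object, the `reachable` method, and the integer-keyed successor map are hypothetical stand-ins for illustration only.

```scala
import scala.collection.mutable

// Hypothetical illustration; not Hail's lir.Method.findBlocks.
object WorklistSketch {
  // Returns the node ids reachable from `entry`, in the order they are visited.
  def reachable(entry: Int, successors: Map[Int, Seq[Int]]): Vector[Int] = {
    var s = List[Int](entry)                    // immutable List used as a stack
    val visited = mutable.LinkedHashSet[Int]()  // remembers insertion order
    while (s.nonEmpty) {
      val n = s.head                            // peek
      s = s.tail                                // pop
      if (visited.add(n)) {                     // true only on first visit
        for (m <- successors.getOrElse(n, Seq.empty))
          s = m :: s                            // push
      }
    }
    visited.toVector
  }
}
```

The same loop works with a mutable array-backed stack such as the `ArrayStack` suggested in the review; the `List` version simply avoids depending on any stack class at all.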
Yep, I bumped into the Breeze bug a while back while trying to upgrade Hail to Spark 3 internally, and realized it'd be a blocker downstream. I've only seen issues on the Python side; for example:
Turns out I needed to pull in the transitive Breeze dependencies; without these, the Python tests failed while the Scala tests passed. I've re-run the tests on Spark 3 now; let me know if there are any other changes you'd like to see. Unfortunately, there's a small bug I saw on the Spark 3 MLlib side; I'll make sure this gets addressed ASAP: https://issues.apache.org/jira/browse/SPARK-33043
Ran CI tests, looks like everything is passing now. Will leave approval to @tpoterba though.
Thanks for doing this! Should make things easier when Dataproc actually releases a Spark 3 version.
OK, looks good. When this goes in, I'll add a CI target to build against Spark 3 so we don't accidentally break it again.
Although Dataproc does not have a public Spark 3-based GA release schedule yet, it'd probably be helpful to start supporting a Spark 3 build; tagging @tpoterba for context.
I'm not familiar with the release process internally, so let me know what other changes need to be made to accommodate this. In particular, this PR likely needs to change the PySpark requirements specified in https://github.com/hail-is/hail/blob/main/hail/python/requirements.txt.
This PR builds on changes from #9199.
The code changes are due to Scala 2.12 and Spark 3 changes:
- `y` in `x << y` must be an int
- `mutable.Stack` is deprecated
- `JavaConversions` is deprecated
- `addTaskCompletionListener` is overloaded
- `Row.merge()` is deprecated

The build changes are as follows:
- `scalatest 3.0.5` for Scala 2.12 compatibility
- `pyspark` version in `python/requirements.txt` to match `SCALA_VERSION` during `make install-deps`
The following testing commands pass (at least to the degree that `main` does):
make -j8 test SCALA_VERSION=2.11.12 SPARK_VERSION=2.4.5
make -j8 test SCALA_VERSION=2.12.8 SPARK_VERSION=3.0.0
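To make two of the code changes listed in this description concrete, here is a rough sketch of what such fixes tend to look like in plain Scala: passing an `Int` shift count to `<<`, and replacing the implicit `JavaConversions` with the explicit `.asScala` from `JavaConverters`. The `MigrationSketch` object and its method names are hypothetical and are not taken from this PR's diff.

```scala
import scala.collection.JavaConverters._

// Hypothetical illustration; these helpers do not appear in the PR.
object MigrationSketch {
  // The shift count is converted to Int explicitly before shifting.
  def shiftLeft(x: Long, y: Long): Long = x << y.toInt

  // Explicit .asScala conversion instead of an implicit JavaConversions one.
  def toScalaList(xs: java.util.List[String]): List[String] =
    xs.asScala.toList
}
```

`JavaConverters` is available on both Scala 2.11 and 2.12, so the same source should compile for either `SCALA_VERSION` used in the test commands.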