Added support for Spark 3 TaskContext to fetch GPU resources per task #1584
Conversation
nice wiring
@@ -0,0 +1,567 @@
# Copyright 2017 onwards, fast.ai, Inc.
This 567-line script deviates from examples/keras_spark_rosspann.py in only 11 lines. The differences should only be Spark 3 specific, but I suspect the two files will diverge quickly as other changes land. I suggest putting a GitHub Actions check in place that compares both scripts and flags unexpected deviations, as was done for README.rst and docs/summary.rst. I am happy to create a PR for this once this is merged into master.
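A minimal sketch of such a drift check (the file contents below are stand-ins created for demonstration; in CI the real paths would be the two example scripts, and the threshold would be the number of expected Spark 3-specific differences — the actual GitHub Actions wiring is not shown):

```shell
# Stand-in files simulating two near-identical example scripts.
printf 'load data\ntrain model\nsave model\n' > /tmp/script_a.py
printf 'load data\ntrain model with gpu\nsave model\n' > /tmp/script_b.py

THRESHOLD=1  # expected number of modified lines between the scripts
# diff marks removed lines with '<' and added lines with '>',
# so one modified line counts as 2 here; compare against 2*THRESHOLD.
changed=$(diff /tmp/script_a.py /tmp/script_b.py | grep -c '^[<>]')
if [ "$changed" -gt $((2 * THRESHOLD)) ]; then
  echo "unexpected deviation: $changed differing lines"
  exit 1
fi
echo "OK: $changed differing lines (within threshold)"
```

In a workflow, a nonzero exit status from this step would fail the build and flag the divergence for review.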
Good idea. At some point we may want to further consolidate things so that we don't have so many nearly identical scripts. Once Spark 3 is out, we may just replace the old examples or add special handling for older versions of Spark.
After @EnricoMi's comments, LGTM.
The changes look good for local-cluster and standalone deployments. If you want to support YARN or Kubernetes, the configs for GPU scheduling will be slightly different. Note that your default, local-cluster[2,1,1024], uses 2 workers and thus relies on you also having 2 GPUs on the host. If you try to run on a host without at least 2 GPUs, it will fail with an error saying not enough GPU addresses are available.
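For reference, a standalone/local-cluster launch with GPU scheduling might look roughly like the sketch below. The flag names follow Spark 3's resource-scheduling configs; the script path, discovery-script path, and GPU amounts are placeholders, and YARN or Kubernetes deployments additionally need cluster-side GPU configuration:

```shell
# Sketch (not a tested command line): request 1 GPU per worker/executor/task
# on a 2-worker local-cluster master, so the host must expose 2 GPUs.
# /opt/spark/getGpus.sh is a placeholder discovery script that prints the
# GPU addresses available on the worker.
spark-submit \
  --master "local-cluster[2,1,1024]" \
  --conf spark.worker.resource.gpu.amount=1 \
  --conf spark.worker.resource.gpu.discoveryScript=/opt/spark/getGpus.sh \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=1 \
  train.py
```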
Signed-off-by: Travis Addair <taddair@uber.com>
@tgravescs