[openmpi] Add custom resources support #772
/test kubeflow-presubmit |
/lgtm
Review status: 0 of 2 files reviewed at latest revision, all discussions resolved.
kubeflow/openmpi/workloads.libsonnet, line 178 at r1 (raw file):
Can you move { to its previous line? i.e. BTW, we have a script to autoformat:
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion.
kubeflow/openmpi/workloads.libsonnet, line 102 at r1 (raw file):
Is it the same as the + operator? That seems more straightforward. See object composition in https://jsonnet.org/learning/tutorial.html
@everpeace I'm okay with supporting custom resources in this package. |
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions.
kubeflow/openmpi/workloads.libsonnet, line 102 at r1 (raw file):
Previously, jiezhang (Jie Zhang) wrote…
Actually, no.
kubeflow/openmpi/workloads.libsonnet, line 178 at r1 (raw file):
Previously, jiezhang (Jie Zhang) wrote…
Sure. Oh, I noticed the script and remembered that I ran it. Let me rerun it.
3099852 to 664e5a3
@jiezhang I changed the format and squashed it. Could you review again? I hope you'll approve this. Thanks 🙇
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jiezhang The full list of commands accepted by this bot can be found here. The pull request process is described here
/lgtm |
Motivation
Machine learning tasks sometimes require special hardware. We currently operate several custom resources on our on-premise Kubernetes cluster to run distributed machine learning tasks (with the openmpi package) that require special hardware other than GPUs.
How
Adds a customResources parameter. This specifies custom resources to assign to openmpi-job containers in worker pods.
The value takes the form custom-resource-name=amount.
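As a rough illustration of the custom-resource-name=amount format, the following Python sketch parses such a string into a Kubernetes-style resource limits mapping. It is not the Jsonnet code from this PR. The function names, the comma separator between multiple entries, the placement under limits, and the example.com/... resource names are all assumptions made for this example.

```python
from typing import Dict


def parse_custom_resources(spec: str) -> Dict[str, str]:
    """Parse 'name=amount,name=amount' into a {name: amount} mapping.

    Assumption: multiple entries are separated by commas.
    """
    resources: Dict[str, str] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        name, sep, amount = item.partition("=")
        name, amount = name.strip(), amount.strip()
        if not sep or not name or not amount:
            raise ValueError(
                f"expected custom-resource-name=amount, got {item!r}"
            )
        resources[name] = amount
    return resources


def container_resources(spec: str) -> dict:
    """Build a container `resources` stanza.

    Assumption: the custom resources are placed under `limits`.
    """
    return {"limits": parse_custom_resources(spec)}


if __name__ == "__main__":
    # Hypothetical resource names, for illustration only.
    print(container_resources("example.com/fpga=1,example.com/nic=2"))
```

Amounts are kept as strings here because Kubernetes resource quantities are usually serialized as strings; a real implementation might validate them further.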
Note
This feature doesn't break backward compatibility, so I would be very happy if it could be merged into the official repo. However, I'm not sure whether such a general custom resources feature should be supported in this package. I would appreciate any feedback.
Development of device plugins in the Kubernetes repository seems to be very active now. Official support for special hardware will likely be introduced in the near future, for example FPGAs, Solarflare NICs, InfiniBand, etc.