Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[openmpi] Add custom resources support #772

Merged
merged 1 commit into from May 9, 2018

Conversation

everpeace
Copy link
Contributor

@everpeace everpeace commented May 8, 2018

Motivation

Machine learning tasks sometimes require special hardware. Currently we operate several custom resources on our on-premise Kubernetes cluster to run distributed machine learning tasks (with openmpi package) which requires special hardware other than GPUs.

How

  • introduce customResources parameter. This specifies custom resources to assign openmpi-job containers in worker pods.
  • The value should be comma separated list of custom-resource-name=amount.

Note

This feature doesn't break any backward compatibility. So I would be very happy if it could be merged to the official repo. But I'm not sure that such general custom resources feature should be supported in this package. I am happy if I could have feedbacks.

Development of device plugin in kubernetes repository seems to be very active now. They would introduce special hardware support officially in the near future. For example, FPGA, Solarflare NICs, Infiniband, etc.


This change is Reviewable

@everpeace
Copy link
Contributor Author

/assign @jiezhang @inc0
/cc @alsrgv @jlewi

@everpeace
Copy link
Contributor Author

/test kubeflow-presubmit

1 similar comment
@everpeace
Copy link
Contributor Author

/test kubeflow-presubmit

Copy link

@inc0 inc0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@jiezhang
Copy link

jiezhang commented May 8, 2018

Review status: 0 of 2 files reviewed at latest revision, all discussions resolved.


kubeflow/openmpi/workloads.libsonnet, line 178 at r1 (raw file):


  customResources(params, role)::
    if role == ROLE_WORKER then

Can you move { to its previous line? i.e. if role == ROLE_WORKER then {

BTW, we have a script to autoformat: scripts/autoformat_jsonnet.sh


Comments from Reviewable

@jiezhang
Copy link

jiezhang commented May 8, 2018

Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion.


kubeflow/openmpi/workloads.libsonnet, line 102 at r1 (raw file):

      image: params.image,
      imagePullPolicy: params.imagePullPolicy,
      resources: std.mergePatch($.resources(params, role), $.customResources(params, role)),

Is it same as + operator? That seems more straightforward.

See object composition in https://jsonnet.org/learning/tutorial.html


Comments from Reviewable

@jiezhang
Copy link

jiezhang commented May 8, 2018

@everpeace I'm okay with supporting custom resources in this package.

@everpeace
Copy link
Contributor Author

Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions.


kubeflow/openmpi/workloads.libsonnet, line 102 at r1 (raw file):

Previously, jiezhang (Jie Zhang) wrote…

Is it same as + operator? That seems more straightforward.

See object composition in https://jsonnet.org/learning/tutorial.html

actually no. $.resources and $.customResources have the same key limits. So, if we did $.resources + $.customResources then, contents of $.resources will be vanished.


kubeflow/openmpi/workloads.libsonnet, line 178 at r1 (raw file):

Previously, jiezhang (Jie Zhang) wrote…

Can you move { to its previous line? i.e. if role == ROLE_WORKER then {

BTW, we have a script to autoformat: scripts/autoformat_jsonnet.sh

Sure. Oh, I noticed the script and remembered I ran. Let me rerun.


Comments from Reviewable

@everpeace
Copy link
Contributor Author

everpeace commented May 9, 2018

@jiezhang I changed the format and squash it. Could you review again? I hope you'll approve this, Thanks 🙇

@jiezhang
Copy link

jiezhang commented May 9, 2018

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jiezhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jiezhang
Copy link

jiezhang commented May 9, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit 37f27df into kubeflow:master May 9, 2018
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021
yanniszark pushed a commit to arrikto/kubeflow that referenced this pull request Feb 15, 2021
* Removing suggestions from manager interface

* Removing long running services

* Increasing timeout to 60 sec
surajkota pushed a commit to surajkota/kubeflow that referenced this pull request Jun 13, 2022
…ubeflow#772)

* image gcr.io/kubeflow-images-public/jupyter-web-app:vmaster-g56c9025a
* Image built from kubeflow/kubeflow@56c9025a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants