openmpi: make 'schedulerName' configurable to use custom schedulers. #683
Conversation
@jiezhang would you mind reviewing this, too? Thank you in advance.
/ok-to-test
Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable
/lgtm
@everpeace Thanks for contributing to the package.
Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, some commit checks failed. kubeflow/openmpi/workloads.libsonnet, line 41 at r1 (raw file):
Actually I modified the workload to use Pod instead of StatefulSet in #671 (to support graceful shutdown). IIUC we'll be scheduling all the pods at the same time now. Do we still need this change?
/lgtm cancel
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. kubeflow/openmpi/workloads.libsonnet, line 41 at r1 (raw file): Previously, jiezhang (Jie Zhang) wrote…
I don't think so. Even if we use pods instead of statefulsets, the situation is unchanged in terms of scheduling: the default scheduler still schedules pending pods (created by the openmpi prototype) in a one-by-one manner. This will lead to deadlock.
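The one-by-one scheduling problem described above can be illustrated with a small simulation. This is a hypothetical sketch with made-up numbers, not code from this PR: two MPI jobs each need all 100 of their workers running before any work starts, on a cluster with only 100 free slots.

```python
def schedule_one_by_one(capacity, jobs):
    """Bind pending pods one at a time, interleaving pods from all jobs,
    roughly as a default scheduler does when jobs arrive simultaneously."""
    placed = {name: 0 for name in jobs}
    pending = [(name, i) for name in jobs for i in range(jobs[name])]
    pending.sort(key=lambda p: p[1])  # interleave: a0, b0, a1, b1, ...
    for name, _ in pending:
        if sum(placed.values()) == capacity:
            break  # cluster is full; remaining pods stay Pending
        placed[name] += 1
    return placed

def schedule_gang(capacity, jobs):
    """Gang-scheduling: admit a job only if ALL of its pods fit at once."""
    placed = {name: 0 for name in jobs}
    used = 0
    for name, size in jobs.items():
        if used + size <= capacity:
            placed[name] = size
            used += size
    return placed

jobs = {"mpi-a": 100, "mpi-b": 100}
# One-by-one: each job gets 50 of its 100 workers, neither can start -> deadlock.
print(schedule_one_by_one(100, jobs))
# Gang-scheduling: mpi-a runs with all 100 workers; mpi-b simply waits its turn.
print(schedule_gang(100, jobs))
```

With one-by-one placement both jobs end up half-placed and block each other forever; gang-scheduling avoids this by admitting whole jobs atomically, which is what a custom scheduler selected via `schedulerName` can provide.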
/lgtm
Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, some commit checks failed.
Reviewed 2 of 2 files at r1.
/cc @jlewi |
/assign @jlewi |
With the Kubernetes default scheduler, scheduling multiple openmpi packages can sometimes lead to deadlocks, as discussed in kubeflow/training-operator#165. In that case, users would want to perform gang-scheduling (scheduling a group of pods all together). Currently, kube-arbitrator supports this. To achieve that, we need to make 'schedulerName' customizable.
Force-pushed from 8305dcd to 4c4e4a2
I rebased my branch onto the latest master because it had conflicts.
/approve |
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jiezhang, pdmack. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
unittests should compare result of kustomize build to golden set of YAML resources. (kubeflow#1016)
* Per kubeflow/manifests#306, to allow reviewers to verify that the expected output is correct, we should check in the result of "kustomize build -o" so that reviewers can review the diff and verify that it is correct.
* This also simplifies the test generation code; the python script generate_tests.py just recurses over the directory tree, runs "kustomize build -o", and checks the output into the test_data directory.
* This is different from what the tests are currently doing.
* Currently the generation scripts generate "kustomization.yaml" files and then generate the expected output from them when the test is run.
* This makes it very difficult to validate the expected output and to debug whether it is correct.
* Going forward, per kubeflow#1014, I think what we want to do is check in test cases corresponding to kustomization.yaml files for the various kustomizations that we want to validate are working correctly.
* Our generate scripts would then run "kustomize build" to generate the expected output and check that in, so that we can validate that the expected output is correct.
* Also change the test data structure so that it mirrors the kustomize directory tree rather than flattening the tests into the "tests" directory.
* Fix kubeflow#683.
* Right now running the unittests takes a long time.
* The problem is that we generate unittests for every "kustomization.yaml" file.
* Per kubeflow#1014 this is kind of pointless/redundant because most of these tests aren't actually testing kustomizations.
* We will address this in follow-on PRs, which will add more appropriate tests and remove some of these unnecessary/redundant tests.
* Cherry-pick AWS fixes.
* Regenerate the tests.
* Fix the unittests; need to update the generate logic to remove tests that aren't part of this PR.
* Address comments.
* Rebase on master and regenerate the tests.
What the PR changes
Introducing a schedulerName parameter in the openmpi prototype.
Why we need this?
With the Kubernetes default scheduler, scheduling multiple openmpi prototypes can sometimes lead to deadlocks, as discussed in kubeflow/training-operator#165. Imagine the case where two openmpi prototypes, each with 100 workers, are scheduled simultaneously.
In that case, users would want to do gang-scheduling (scheduling a group of pods all together). Currently, kube-arbitrator can achieve such scheduling.
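For illustration only, here is a minimal sketch of what a worker pod spec with this parameter set might look like. The scheduler name `kube-batch` (kube-arbitrator's scheduler) and the pod/image names are assumptions for the example, not taken from this PR:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: openmpi-worker-0   # hypothetical pod name
spec:
  # When omitted, Kubernetes falls back to "default-scheduler".
  # Setting schedulerName hands the pod to a custom scheduler,
  # e.g. kube-arbitrator's kube-batch, which can gang-schedule.
  schedulerName: kube-batch
  containers:
  - name: openmpi-worker
    image: example/openmpi:latest   # hypothetical image
```

Making `schedulerName` a prototype parameter lets users opt into such a custom scheduler per deployment while keeping the default scheduler as the fallback.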
This change is