openmpi: make 'schedulerName' configurable to use custom schedulers. #683
Conversation
@jiezhang would you mind reviewing this, too? Thank you in advance.
/ok-to-test
Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, some commit checks failed. Comments from Reviewable
/lgtm
@everpeace Thanks for contributing to the package.
Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, some commit checks failed. kubeflow/openmpi/workloads.libsonnet, line 41 at r1 (raw file):
Actually I modified the workload to use Pod instead of StatefulSet in #671 (to support graceful shutdown). IIUC we'll be scheduling all the pods at the same time now. Do we still need this change?
/lgtm cancel
Review status: 0 of 2 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed. kubeflow/openmpi/workloads.libsonnet, line 41 at r1 (raw file): Previously, jiezhang (Jie Zhang) wrote…
I don't think so. Even if we use pods instead of statefulsets, the situation is unchanged in terms of scheduling: the default scheduler still schedules pending pods (created by the openmpi prototype) in a one-by-one manner. This will lead to deadlock.
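The one-by-one scheduling problem described above can be illustrated with a small simulation. This is a hypothetical sketch with made-up numbers, not code from this PR: two MPI jobs each need all 100 of their workers running before any work starts, on a cluster with only 100 free slots.

```python
def schedule_one_by_one(capacity, jobs):
    """Bind pending pods one at a time, interleaving pods from all jobs,
    roughly as a default scheduler does when jobs arrive simultaneously."""
    placed = {name: 0 for name in jobs}
    pending = [(name, i) for name in jobs for i in range(jobs[name])]
    pending.sort(key=lambda p: p[1])  # interleave: a0, b0, a1, b1, ...
    for name, _ in pending:
        if sum(placed.values()) == capacity:
            break  # cluster is full; remaining pods stay Pending
        placed[name] += 1
    return placed

def schedule_gang(capacity, jobs):
    """Gang-scheduling: admit a job only if ALL of its pods fit at once."""
    placed = {name: 0 for name in jobs}
    used = 0
    for name, size in jobs.items():
        if used + size <= capacity:
            placed[name] = size
            used += size
    return placed

jobs = {"mpi-a": 100, "mpi-b": 100}
# One-by-one: each job gets 50 of its 100 workers, neither can start -> deadlock.
print(schedule_one_by_one(100, jobs))
# Gang-scheduling: mpi-a runs with all 100 workers; mpi-b simply waits its turn.
print(schedule_gang(100, jobs))
```

With one-by-one placement both jobs end up half-placed and block each other forever; gang-scheduling avoids this by admitting whole jobs atomically, which is what a custom scheduler selected via `schedulerName` can provide.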
/lgtm
Review status: 0 of 2 files reviewed at latest revision, all discussions resolved, some commit checks failed.
Reviewed 2 of 2 files at r1.
/cc @jlewi |
/assign @jlewi |
With the Kubernetes default scheduler, scheduling multiple openmpi packages can sometimes lead to deadlocks, as discussed in kubeflow/training-operator#165. In that case, users would want to perform gang-scheduling (scheduling a group of pods all together). Currently, kube-arbitrator supports this. To achieve that, we need to make 'schedulerName' customizable.
Force-pushed from 8305dcd to 4c4e4a2
I rebased my branch onto the latest master because it had conflicts.
/approve |
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: jiezhang, pdmack. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
unittests should compare result of kustomize build to golden set of YAML resources. (kubeflow#1016)
* Per kubeflow/manifests#306, to allow reviewers to verify that the expected output is correct, we should check in the result of "kustomize build -o" so that reviewers can review the diff and verify that it is correct.
* This also simplifies the test generation code; the python script generate_tests.py just recurses over the directory tree, runs "kustomize build -o", and checks the output into the test_data directory.
* This is different from what the tests are currently doing.
* Currently the generation scripts generate "kustomization.yaml" files and then generate the expected output from them when the test is run.
* This makes it very difficult to validate the expected output and to debug whether it is correct.
* Going forward, per kubeflow#1014, I think what we want to do is check in test cases corresponding to kustomization.yaml files for the various kustomizations that we want to validate are working correctly.
* Our generate scripts would then run "kustomize build" to generate the expected output and check that in, so that we can validate that the expected output is correct.
* Also change the test data structure so that it mirrors the kustomize directory tree rather than flattening the tests into the "tests" directory.
* Fix kubeflow#683.
* Right now running the unittests takes a long time.
* The problem is that we generate unittests for every "kustomization.yaml" file.
* Per kubeflow#1014 this is kind of pointless/redundant because most of these tests aren't actually testing kustomizations.
* We will address this in follow-on PRs, which will add more appropriate tests and remove some of these unnecessary/redundant tests.
* Cherry-pick AWS fixes.
* Regenerate the tests.
* Fix the unittests; need to update the generate logic to remove tests that aren't part of this PR.
* Address comments.
* Rebase on master and regenerate the tests.
What the PR changes
Introducing a schedulerName parameter in the openmpi prototype.
Why we need this?
With the Kubernetes default scheduler, scheduling multiple openmpi prototypes can sometimes lead to deadlocks, as discussed in kubeflow/training-operator#165. Imagine the case where two openmpi prototypes, each with 100 workers, are scheduled simultaneously.
In that case, users would want to do gang-scheduling (scheduling a group of pods all together). Currently, kube-arbitrator can achieve such scheduling.
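For illustration only, here is a minimal sketch of what a worker pod spec with this parameter set might look like. The scheduler name `kube-batch` (kube-arbitrator's scheduler) and the pod/image names are assumptions for the example, not taken from this PR:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: openmpi-worker-0   # hypothetical pod name
spec:
  # When omitted, Kubernetes falls back to "default-scheduler".
  # Setting schedulerName hands the pod to a custom scheduler,
  # e.g. kube-arbitrator's kube-batch, which can gang-schedule.
  schedulerName: kube-batch
  containers:
  - name: openmpi-worker
    image: example/openmpi:latest   # hypothetical image
```

Making `schedulerName` a prototype parameter lets users opt into such a custom scheduler per deployment while keeping the default scheduler as the fallback.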
This change is