TorchX-MCAD support (#679)
* TorchX-MCAD scheduler support

* Quality assurance updates

* Added list support, updated testing, additional quality assurance

* Update documentation, remove extra comments and debugging

* Restructure get_role_information, format constants

* Update MCAD and Volcano list function, list tests, and prefix change

* Restore kubernetes_mcad_scheduler_test.py

* Update prefix in test cases

* Update port for Kubernetes service

* Update list, list tests, and quality assurance

* remove debugging option

* pyre fix

* pyre fixes

* Update list_failure_tests

* Update test_list_failure

* Minor formatting changes

* Additional CI tests

---------

Co-authored-by: Sara Kokkila Schumacher <Sara.Ilane.Ladd.Kokkila.Schumacher@ibm.com>
Sara-KS and Sara-KS committed Jan 28, 2023
1 parent 0e85593 commit ef01789
Showing 6 changed files with 3,143 additions and 43 deletions.
19 changes: 18 additions & 1 deletion scripts/component_integration_tests.py
@@ -51,7 +51,14 @@ def main() -> None:
     torchx_image = "dummy_image"
     dryrun = False
 
-    if scheduler in ("kubernetes", "local_docker", "aws_batch", "lsf", "gcp_batch"):
+    if scheduler in (
+        "kubernetes",
+        "kubernetes_mcad",
+        "local_docker",
+        "aws_batch",
+        "lsf",
+        "gcp_batch",
+    ):
         try:
             build = build_and_push_image()
             torchx_image = build.torchx_image
@@ -71,6 +78,16 @@ def main() -> None:
                 "queue": "default",
             },
         },
+        "kubernetes_mcad": {
+            "providers": [
+                component_provider,
+                examples_app_defs_providers,
+            ],
+            "image": torchx_image,
+            "cfg": {
+                "namespace": "torchx-dev",
+            },
+        },
         "local_cwd": {
             "providers": [
                 component_provider,
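
The two hunks in this file add `kubernetes_mcad` to the schedulers that need a built image and give it its own run parameters. A minimal sketch of that shape follows. It assumes hypothetical helper names (`IMAGE_SCHEDULERS`, `needs_image_build`, `mcad_run_params`) that do not appear in the script, and it leaves the provider list empty because the provider objects are not shown in this diff.

```python
from typing import Any, Dict, FrozenSet

# Scheduler names copied from the new tuple in the hunk above.
IMAGE_SCHEDULERS: FrozenSet[str] = frozenset(
    {"kubernetes", "kubernetes_mcad", "local_docker", "aws_batch", "lsf", "gcp_batch"}
)


def needs_image_build(scheduler: str) -> bool:
    """Return True if the scheduler is in the set that builds and pushes an image."""
    return scheduler in IMAGE_SCHEDULERS


def mcad_run_params(torchx_image: str) -> Dict[str, Any]:
    """Build the kubernetes_mcad entry with the keys used in the diff (hypothetical helper)."""
    return {
        "providers": [],  # placeholder: the real script lists provider objects here
        "image": torchx_image,
        "cfg": {"namespace": "torchx-dev"},
    }


print(needs_image_build("kubernetes_mcad"))   # True
print(needs_image_build("local_cwd"))         # False
print(mcad_run_params("dummy_image")["cfg"])  # {'namespace': 'torchx-dev'}
```

Keeping the name check and the per-scheduler parameters in sync matters here: a scheduler that runs containers but is missing from the name check would keep the `"dummy_image"` default.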
1 change: 1 addition & 0 deletions torchx/schedulers/__init__.py
@@ -17,6 +17,7 @@
     "local_cwd": "torchx.schedulers.local_scheduler",
     "slurm": "torchx.schedulers.slurm_scheduler",
     "kubernetes": "torchx.schedulers.kubernetes_scheduler",
+    "kubernetes_mcad": "torchx.schedulers.kubernetes_mcad_scheduler",
     "aws_batch": "torchx.schedulers.aws_batch_scheduler",
     "gcp_batch": "torchx.schedulers.gcp_batch_scheduler",
     "ray": "torchx.schedulers.ray_scheduler",
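
The mapping above pairs each scheduler name with a module path, so registering a new scheduler takes one line. One common way to consume such a name-to-module table is a lazy import on lookup, sketched below. This is an illustration of the pattern only, not TorchX's actual loading code: `REGISTRY` and `load_backend` are hypothetical names, and standard-library modules stand in for the scheduler modules so the sketch runs without TorchX installed.

```python
import importlib
from types import ModuleType
from typing import Dict

# Hypothetical registry: name -> dotted module path.
# Standard-library modules stand in for scheduler modules.
REGISTRY: Dict[str, str] = {
    "json_backend": "json",
    "csv_backend": "csv",
}


def load_backend(name: str) -> ModuleType:
    """Import and return the module registered under `name`."""
    try:
        module_path = REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(REGISTRY))
        raise ValueError(f"unknown backend {name!r}; expected one of: {known}") from None
    # The import happens only when the backend is requested.
    return importlib.import_module(module_path)


print(load_backend("json_backend").__name__)  # json
```

Deferring the import means an optional dependency of one backend is loaded only when that backend is selected.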