
HPA support for PyTorch Elastic #1701

Merged
9 commits merged on Dec 21, 2022


johnugeorge
Member

This PR adds HPA support to PyTorchJob Elastic, which was introduced in #1453.

It includes the following fixes:

  1. Fixed the LabelSelector field of the Scale subresource so that it holds the correct labels, serialized as a string (ref: "Changing label selector type for HPA", common#197).
  2. Fixed the Scale subresource spec path and status path.
  3. Upgraded the Elastic ImageNet example to the latest PyTorch base image.
  4. Upgraded the autoscaling API version to v2.

Fixes: #1645 #1626
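With the Scale subresource wired up, an `autoscaling/v2` HorizontalPodAutoscaler can drive the worker count. As a rough illustration, the Kubernetes documentation describes the core scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default), with the result bounded by the HPA's minReplicas and maxReplicas. The sketch below assumes those bounds mirror the PyTorchJob's elastic policy; the function name `desiredReplicas` is hypothetical, and the sketch ignores the readiness handling, missing-metric handling, and stabilization windows that the real controller applies.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a hypothetical, simplified version of the HPA rule
// desired = ceil(current * currentMetric / targetMetric). Scaling is skipped
// when the ratio is within tolerance of 1.0, and the result is clamped to
// [minReplicas, maxReplicas]. targetMetric must be > 0.
func desiredReplicas(current int32, currentMetric, targetMetric, tolerance float64, minReplicas, maxReplicas int32) int32 {
	ratio := currentMetric / targetMetric
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int32(math.Ceil(float64(current) * ratio))
	}
	// Assumption: the HPA bounds mirror the PyTorchJob's elastic policy.
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// Hypothetical scenario: 2 workers, a 50% average CPU target, and
	// bounds minReplicas=1, maxReplicas=4.
	fmt.Println(desiredReplicas(2, 90, 50, 0.1, 1, 4)) // 4: ceil(2 * 1.8)
	fmt.Println(desiredReplicas(2, 52, 50, 0.1, 1, 4)) // 2: ratio 1.04 is within tolerance
	fmt.Println(desiredReplicas(2, 10, 50, 0.1, 1, 4)) // 1: ceil(2 * 0.2)
	fmt.Println(desiredReplicas(2, 90, 30, 0.1, 1, 4)) // 4: ceil(2 * 3) = 6, clamped to max
}
```

The clamp step is where an elastic job's limits would take effect: even a large metric ratio cannot push the worker count past maxReplicas.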

/assign @gaocegege @zw0610

/cc @kubeflow/wg-training-leads

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge


@johnugeorge
Member Author

/hold

I will remove the hold once the review is complete.

@coveralls

Pull Request Test Coverage Report for Build 3740580459

  • 6 of 28 (21.43%) changed or added relevant lines in 6 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 39.614%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/apis/kubeflow.org/v1/zz_generated.deepcopy.go | 0 | 2 | 0.0% |
| pkg/controller.v1/paddlepaddle/paddlepaddle_controller.go | 3 | 5 | 60.0% |
| pkg/controller.v1/pytorch/pytorchjob_controller.go | 3 | 5 | 60.0% |
| pkg/controller.v1/xgboost/xgboostjob_controller.go | 0 | 2 | 0.0% |
| pkg/apis/kubeflow.org/v1/openapi_generated.go | 0 | 6 | 0.0% |
| pkg/controller.v1/pytorch/hpa.go | 0 | 8 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| pkg/controller.v1/mpi/mpijob_controller.go | 2 | 77.67% |

| Totals | |
|---|---|
| Change from base Build 3730090510 | 0.2% |
| Covered Lines | 2691 |
| Relevant Lines | 6793 |

💛 - Coveralls

@gaocegege
Member

gaocegege left a comment


LGTM

Thanks for your contribution! 🎉 👍

@johnugeorge
Member Author

/hold cancel

@gaocegege
Member

/lgtm

Development

Successfully merging this pull request may close these issues.

Kubernetes HPA doesn't work with elastic PytorchJob
4 participants