koord-scheduler: support Reservation reserve devices #1141
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #1141 +/- ##
==========================================
- Coverage 66.29% 66.23% -0.06%
==========================================
Files 280 280
Lines 30317 29733 -584
==========================================
- Hits 20099 19695 -404
+ Misses 8783 8607 -176
+ Partials 1435 1431 -4
Signed-off-by: Joseph <joseph.t.lee@outlook.com>
/lgtm
/lgtm
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: eahydra, FillZpp, jasonliu747, ZiMengSheng
Ⅰ. Describe what this PR does
Enhanced DeviceShare scheduling plugin:
Reserving device resources differs from reserving other resources. Resources such as CPU/Memory or CPU Core are continuous, single-dimensional resources, so during Pod scheduling the resources reserved by a Reservation and the remaining resources on the node can be allocated together. Devices such as GPUs, however, are device instances. Their resources are continuous only within a single instance, so the Device resources reserved by a Reservation cannot be combined with the remaining Device resources of the same type on the node. This may lead to fragmentation, but for the time being it is the only behavior that the current device resource abstraction and resource protocol can support.
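The difference between the two resource kinds can be sketched as follows. This is a simplified model rather than the DeviceShare plugin's code: `fitsScalar` and `fitsDevice` are hypothetical helpers, and the quantities are illustrative.

```go
package main

import "fmt"

// fitsScalar models a continuous resource such as CPU or memory:
// capacity reserved by a Reservation and free capacity on the node
// can be pooled to satisfy one request. (Hypothetical helper.)
func fitsScalar(reservedFree, nodeFree, request int64) bool {
	return reservedFree+nodeFree >= request
}

// fitsDevice models a per-instance resource such as gpu-core:
// the request must fit entirely within one device instance, so free
// capacity on different instances cannot be added up. (Hypothetical helper.)
func fitsDevice(instanceFree []int64, request int64) bool {
	for _, free := range instanceFree {
		if free >= request {
			return true
		}
	}
	return false
}

func main() {
	// 20 units left in the reservation and 30 free on the node can serve
	// a request of 50 when the resource is continuous.
	fmt.Println(fitsScalar(20, 30, 50))

	// The same amounts spread over two device instances cannot.
	fmt.Println(fitsDevice([]int64{20, 30}, 50))
}
```

In this model the scalar check succeeds while the device check fails for the same total amount, which is the fragmentation described above.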
For example, Reservation A reserves `koordinator.sh/gpu-core:100`, and the scheduler allocates a GPU device instance for it. If owner Pod A first allocates `koordinator.sh/gpu-core:80` and owner Pod B then requests `koordinator.sh/gpu-core:50`, Pod B cannot reuse this Reservation: the 20 units left in the reserved instance cannot be combined with resources allocated from the node.

Additionally, the device resources that a Reservation reserves must correspond to the Requests of the owner Pod. For example, if Reservation A reserves `koordinator.sh/gpu-core:100`, the Pod must also request `koordinator.sh/gpu-core:100` instead of `nvidia.com/gpu:1`; otherwise, the Reservation cannot be reused.

I am exploring ways to optimize this in future upgrades, including upgrading to Kubernetes 1.24 and introducing the informer transformer.
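The request-matching constraint can be illustrated with a minimal sketch. It assumes a simplified `ResourceList` map in place of the Kubernetes resource types, and `canReuseReservation` is a hypothetical helper, not a function from koord-scheduler.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for a Kubernetes resource list,
// mapping a resource name to an integer quantity. (Assumption for this sketch.)
type ResourceList map[string]int64

// canReuseReservation reports whether every device resource the Pod requests
// is reserved under the same resource name with enough quantity.
// No translation between resource names is attempted. (Hypothetical helper.)
func canReuseReservation(reserved, podRequests ResourceList) bool {
	for name, want := range podRequests {
		have, ok := reserved[name]
		if !ok || have < want {
			return false
		}
	}
	return true
}

func main() {
	reserved := ResourceList{"koordinator.sh/gpu-core": 100}

	// The Pod uses the same resource name as the Reservation.
	matching := ResourceList{"koordinator.sh/gpu-core": 100}
	fmt.Println(canReuseReservation(reserved, matching))

	// The Pod asks for a whole GPU under a different resource name.
	mismatched := ResourceList{"nvidia.com/gpu": 1}
	fmt.Println(canReuseReservation(reserved, mismatched))
}
```

The sketch compares names literally, which mirrors the limitation in the description: a request for `nvidia.com/gpu:1` is not mapped onto a reservation of `koordinator.sh/gpu-core:100`.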
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
V. Checklist
make test