koord-scheduler: support Reservation reserve devices #1141

Merged

Conversation

eahydra
Member

@eahydra eahydra commented Mar 26, 2023

Ⅰ. Describe what this PR does

Enhanced DeviceShare scheduling plugin:

  • Support reserving devices via Reservation.

Reserving device resources differs from reserving other resources. Resources such as CPU, memory, or CPU cores are continuous and single-dimensional, so during Pod scheduling the resources reserved by a Reservation and the remaining resources on the node can be allocated together. Devices such as GPUs are discrete device instances: resources are continuous only within a single instance, so device resources reserved by a Reservation cannot be combined with the remaining device resources of the same type on the node. This may lead to fragmentation, but it is the best the current device resource abstraction and resource protocol can do for the time being.

For example, Reservation A reserves koordinator.sh/gpu-core:100, and the scheduler allocates a GPU device instance for it. If owner Pod A allocates koordinator.sh/gpu-core:80 first and owner Pod B then requests koordinator.sh/gpu-core:50, only 20 of the reserved gpu-core remain. Pod B cannot reuse that remainder, because it cannot be topped up with unreserved gpu-core allocated from the node.
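
The arithmetic in this example can be sketched in Go. This is a minimal illustration under stated assumptions, not the DeviceShare plugin's code: the `reservedDevice` type, its fields, and `tryAllocate` are hypothetical names introduced only to show why the second request fails.

```go
package main

import "fmt"

// reservedDevice is a hypothetical model of the share of one GPU instance
// held by a Reservation. It is not a type from the koord-scheduler code base.
type reservedDevice struct {
	minor     int   // index of the GPU instance the Reservation is bound to
	reserved  int64 // koordinator.sh/gpu-core reserved on that instance
	allocated int64 // gpu-core already handed out to owner Pods
}

// remaining returns the reserved gpu-core that no owner Pod has taken yet.
func (d *reservedDevice) remaining() int64 {
	return d.reserved - d.allocated
}

// tryAllocate serves a request from the reserved share only. It never tops
// the request up with unreserved capacity on the node, which is the
// limitation described in the PR text.
func (d *reservedDevice) tryAllocate(request int64) bool {
	if request > d.remaining() {
		return false
	}
	d.allocated += request
	return true
}

func main() {
	r := &reservedDevice{minor: 0, reserved: 100}
	fmt.Println("Pod A requests 80:", r.tryAllocate(80))
	fmt.Println("Pod B requests 50:", r.tryAllocate(50))
	fmt.Println("left in reservation:", r.remaining())
}
```

In this sketch Pod A fits, Pod B is rejected, and 20 gpu-core stay reserved but unused, which is the fragmentation the description warns about.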

Additionally, the device resources specified when Reservation reserves resources need to correspond to the Requests of the owner Pod. For example, if Reservation A reserves koordinator.sh/gpu-core:100, the Pod must also write koordinator.sh/gpu-core:100 instead of nvidia.com/gpu:1; otherwise, the Reservation cannot be reused.
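
A rough Go sketch of this naming constraint follows. The `resourceList` type and the `canReuse` function are hypothetical and only illustrate the rule stated above (same resource names, request within the reserved amount); the actual matching logic in the plugin may differ.

```go
package main

import "fmt"

// resourceList maps a resource name to a quantity. It is a simplified,
// hypothetical stand-in for a Kubernetes resource list.
type resourceList map[string]int64

// canReuse reports whether every resource the Pod requests is reserved under
// the same name and in sufficient quantity. No translation between resource
// names (for example nvidia.com/gpu to koordinator.sh/gpu-core) is attempted.
func canReuse(reserved, requested resourceList) bool {
	for name, want := range requested {
		have, ok := reserved[name]
		if !ok || want > have {
			return false
		}
	}
	return true
}

func main() {
	reservation := resourceList{"koordinator.sh/gpu-core": 100}

	sameName := resourceList{"koordinator.sh/gpu-core": 100}
	otherName := resourceList{"nvidia.com/gpu": 1}

	fmt.Println("gpu-core request reusable:", canReuse(reservation, sameName))
	fmt.Println("nvidia.com/gpu request reusable:", canReuse(reservation, otherName))
}
```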

I am exploring ways to address this limitation in future upgrades, including upgrading to Kubernetes 1.24 and introducing the informer transformer.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@codecov

codecov bot commented Mar 26, 2023

Codecov Report

Patch coverage: 68.67% and project coverage change: -0.06 ⚠️

Comparison is base (dc360a6) 66.29% compared to head (6d1422b) 66.23%.

❗ Current head 6d1422b differs from pull request most recent head a399701. Consider uploading reports for the commit a399701 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1141      +/-   ##
==========================================
- Coverage   66.29%   66.23%   -0.06%     
==========================================
  Files         280      280              
  Lines       30317    29733     -584     
==========================================
- Hits        20099    19695     -404     
+ Misses       8783     8607     -176     
+ Partials     1435     1431       -4     
Flag Coverage Δ
unittests 66.23% <68.67%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...lugins/reservation/controller/node_eventhandler.go 0.00% <0.00%> (ø)
pkg/util/reservation/reservation.go 46.28% <29.85%> (-28.04%) ⬇️
.../scheduler/plugins/reservation/pod_eventhandler.go 40.62% <40.62%> (ø)
...plugins/reservation/controller/pod_eventhandler.go 40.84% <40.84%> (ø)
pkg/scheduler/plugins/deviceshare/utils.go 90.73% <50.00%> (-0.58%) ⬇️
pkg/scheduler/plugins/reservation/eventhandler.go 60.86% <60.86%> (ø)
pkg/scheduler/plugins/reservation/nominator.go 61.90% <61.90%> (ø)
pkg/scheduler/plugins/deviceshare/plugin.go 73.07% <68.98%> (+1.64%) ⬆️
...reservation/controller/reservation_eventhandler.go 69.23% <69.23%> (ø)
...ugins/reservation/controller/garbage_collection.go 69.44% <69.44%> (ø)
... and 13 more

... and 39 files with indirect coverage changes


@jasonliu747 jasonliu747 force-pushed the deviceshare_support_reservation branch from 52381ea to 9880967 Compare March 27, 2023 03:12
@eahydra eahydra force-pushed the deviceshare_support_reservation branch 2 times, most recently from 19477e0 to afd9138 Compare March 27, 2023 03:24
@eahydra eahydra changed the title from "koord-scheduler: supports preempting devices and reserving devices via Reservation" to "koord-scheduler: support Reservation reserve devices" Mar 27, 2023
@eahydra eahydra added this to the v1.2 milestone Mar 27, 2023
@eahydra eahydra added the enhancement New feature or request label Mar 27, 2023
@eahydra eahydra force-pushed the deviceshare_support_reservation branch 4 times, most recently from 2e3f43e to 1b889dc Compare April 2, 2023 15:11
@eahydra eahydra force-pushed the deviceshare_support_reservation branch from 1b889dc to 18f697e Compare April 6, 2023 06:24
@koordinator-bot koordinator-bot bot added size/XL and removed size/XXL labels Apr 6, 2023
Signed-off-by: Joseph <joseph.t.lee@outlook.com>
@eahydra eahydra force-pushed the deviceshare_support_reservation branch from 18f697e to a399701 Compare April 6, 2023 06:31
@ZiMengSheng
Contributor

/lgtm
/approve

Member

@jasonliu747 jasonliu747 left a comment

/lgtm

Member

@FillZpp FillZpp left a comment

/lgtm

@eahydra
Member Author

eahydra commented Apr 6, 2023

/approve

@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eahydra, FillZpp, jasonliu747, ZiMengSheng

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 10209f4 into koordinator-sh:main Apr 6, 2023
15 checks passed