koord-scheduler: add device scheduling debug api #637

Merged

buptcozy
Contributor

@buptcozy buptcozy commented Sep 19, 2022

Signed-off-by: yangzhang bupt_cozy@126.com

Ⅰ. Describe what this PR does

Add a RESTful API to the device scheduling plugin to help diagnose problems.
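For readers unfamiliar with this kind of debug endpoint, here is a minimal sketch of the general idea using only the Go standard library. It is not the code from this PR: the `deviceSummary` type, the `/debug/devices` path, the `node` query parameter, and the `query` helper are all hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// deviceSummary is a hypothetical per-node view of device state.
// It is NOT the type used by the deviceshare plugin.
type deviceSummary struct {
	Node    string         `json:"node"`
	Devices map[string]int `json:"devices"` // device type -> number of minors
}

// nodeDeviceHandler returns a handler that looks up the node named in the
// "node" query parameter and writes its summary as JSON, or 404 if unknown.
func nodeDeviceHandler(cache map[string]deviceSummary) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		node := r.URL.Query().Get("node")
		summary, ok := cache[node]
		if !ok {
			http.Error(w, "node not found", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(summary)
	}
}

// query exercises a handler in-process and returns the status code and body.
func query(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	// Hypothetical sample data standing in for a real device cache.
	cache := map[string]deviceSummary{
		"node-1": {Node: "node-1", Devices: map[string]int{"gpu": 2}},
	}
	h := nodeDeviceHandler(cache)

	code, body := query(h, "/debug/devices?node=node-1")
	fmt.Println(code, body)
}
```

Serving a read-only JSON snapshot keeps the diagnostic path separate from the scheduling path, so inspecting the cache cannot change allocation decisions.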

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@codecov
codecov bot commented Sep 19, 2022

Codecov Report

Base: 68.90% // Head: 68.97% // Increases project coverage by +0.06% 🎉

Coverage data is based on head (ed33ebe) compared to base (576b076).
Patch coverage: 72.91% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #637      +/-   ##
==========================================
+ Coverage   68.90%   68.97%   +0.06%     
==========================================
  Files         194      196       +2     
  Lines       22180    22304     +124     
==========================================
+ Hits        15284    15384     +100     
- Misses       5832     5856      +24     
  Partials     1064     1064              
Flag Coverage Δ
unittests 68.97% <72.91%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...kg/scheduler/plugins/deviceshare/plugin_service.go 38.46% <38.46%> (ø)
...kg/scheduler/plugins/deviceshare/device_handler.go 92.06% <50.00%> (-7.94%) ⬇️
pkg/scheduler/plugins/deviceshare/plugin.go 90.05% <50.00%> (-0.91%) ⬇️
pkg/scheduler/plugins/deviceshare/device_cache.go 86.57% <77.57%> (-1.27%) ⬇️
...kg/scheduler/plugins/deviceshare/device_summary.go 100.00% <100.00%> (ø)
pkg/util/httputil/reverseproxy.go 84.84% <0.00%> (+0.53%) ⬆️
...eduler/plugins/coscheduling/controller/podgroup.go 72.41% <0.00%> (+1.47%) ⬆️
pkg/koordlet/statesinformer/states_device_linux.go 43.08% <0.00%> (+8.90%) ⬆️


@jasonliu747 jasonliu747 added this to the v0.7 milestone Sep 19, 2022
@buptcozy buptcozy changed the title koord-scheduler: add device share debug api koord-scheduler: 1.add device share debug api 2.modify nodeDeviceInfo's allocateSet with real resources on each minor 3.support assign fail if card unhealthy Sep 19, 2022
@buptcozy buptcozy changed the title koord-scheduler: 1.add device share debug api 2.modify nodeDeviceInfo's allocateSet with real resources on each minor 3.support assign fail if card unhealthy koord-scheduler: add device share debug api | modify allocateSet | reset resource when minor unheathy Sep 19, 2022
@buptcozy buptcozy changed the title koord-scheduler: add device share debug api | modify allocateSet | reset resource when minor unheathy koord-scheduler: add device share debug api | modify allocateSet | reset resource when minor unhealthy Sep 19, 2022
@buptcozy buptcozy changed the title koord-scheduler: add device share debug api | modify allocateSet | reset resource when minor unhealthy koord-scheduler: add device share debug api | reset resource when minor unhealthy Sep 19, 2022
@buptcozy buptcozy changed the title koord-scheduler: add device share debug api | reset resource when minor unhealthy koord-scheduler: add device share debug api | handle minor unhealthy Sep 19, 2022
@@ -89,7 +88,15 @@ func (n *nodeDeviceCache) update(nodeName string, device *schedulingv1alpha1.Dev
if nodeDeviceResource[deviceInfo.Type] == nil {
nodeDeviceResource[deviceInfo.Type] = make(deviceResources)
}
nodeDeviceResource[deviceInfo.Type][int(deviceInfo.Minor)] = deviceInfo.Resources
if !deviceInfo.Health {
Member

This logic is unrelated to the current PR. I suggest submitting it as a separate PR.
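The hunk above is cut off at the health check, so what the PR actually does inside that branch is not visible here. As a rough sketch of one plausible reading (going by the earlier PR title "reset resource when minor unhealthy"), an unhealthy minor could be recorded with empty resources so that nothing can be assigned to it. The types below are simplified, hypothetical stand-ins rather than the real `schedulingv1alpha1` types, and `gpu-core` is a made-up resource name.

```go
package main

import "fmt"

// deviceInfo is a simplified, hypothetical stand-in for the device entries
// iterated over in the hunk above.
type deviceInfo struct {
	Type      string
	Minor     int
	Health    bool
	Resources map[string]int64
}

// deviceResources maps a device minor number to its resources.
type deviceResources map[int]map[string]int64

// buildNodeDeviceResource groups devices by type and minor.
// Assumption: an unhealthy minor stays visible in the map but exposes no
// resources, so a later allocation attempt against it would fail.
func buildNodeDeviceResource(infos []deviceInfo) map[string]deviceResources {
	out := make(map[string]deviceResources)
	for _, d := range infos {
		if out[d.Type] == nil {
			out[d.Type] = make(deviceResources)
		}
		if !d.Health {
			out[d.Type][d.Minor] = map[string]int64{}
			continue
		}
		out[d.Type][d.Minor] = d.Resources
	}
	return out
}

func main() {
	res := buildNodeDeviceResource([]deviceInfo{
		{Type: "gpu", Minor: 0, Health: true, Resources: map[string]int64{"gpu-core": 100}},
		{Type: "gpu", Minor: 1, Health: false, Resources: map[string]int64{"gpu-core": 100}},
	})
	// The healthy minor keeps its resources; the unhealthy one is emptied.
	fmt.Println(res["gpu"][0]["gpu-core"], len(res["gpu"][1]))
}
```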

@buptcozy buptcozy force-pushed the device-share-debug-api branch 2 times, most recently from 0117d69 to 1b83aeb Compare September 19, 2022 11:28
@eahydra eahydra changed the title koord-scheduler: add device share debug api | handle minor unhealthy koord-scheduler: add device scheduling debug api Sep 19, 2022
@buptcozy buptcozy force-pushed the device-share-debug-api branch 2 times, most recently from 9840c53 to efaeb0f Compare September 19, 2022 11:37
- add device share debug api
- support assign fail if card unhealthy

Signed-off-by: yangzhang <bupt_cozy@126.com>
Member

@eahydra eahydra left a comment

/lgtm

@hormes
Member

hormes commented Sep 19, 2022

/lgtm
/approve

@koordinator-bot
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eahydra, hormes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit ebd35a9 into koordinator-sh:main Sep 19, 2022
4 participants