koord-scheduler: fix Coscheduling resource deadlock with groups #873

Merged


KunWuLuan
Contributor

Ⅰ. Describe what this PR does

fix: deadlock when scheduling multiple gang groups

Ⅱ. Does this pull request fix one issue?

fixes #872

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@eahydra eahydra changed the title from "fix: deadlock when schedule multi-gang-group" to "koord-scheduler: fix Coscheduling resource deadlock with groups" Dec 9, 2022
@eahydra
Member

eahydra commented Dec 12, 2022

Thanks for your contribution. Please fix the unit tests and add a Signed-off-by line to the git commit.

@KunWuLuan KunWuLuan force-pushed the fix-deadlock-in-multi-ganggroup branch 2 times, most recently from fd2424f to ce7f554 Compare December 16, 2022 07:35
@koordinator-bot koordinator-bot bot added size/L and removed size/M labels Dec 19, 2022
@KunWuLuan KunWuLuan force-pushed the fix-deadlock-in-multi-ganggroup branch 3 times, most recently from 8d311e7 to 3079b7d Compare December 19, 2022 12:06
@codecov

codecov bot commented Dec 19, 2022

Codecov Report

Base: 67.52% // Head: 66.85% // Decreases project coverage by -0.66% ⚠️

Coverage data is based on head (e3db841) compared to base (f321a92).
Patch coverage: 51.95% of modified lines in pull request are covered.

❗ Current head e3db841 differs from pull request most recent head 5596bd6. Consider uploading reports for the commit 5596bd6 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #873      +/-   ##
==========================================
- Coverage   67.52%   66.85%   -0.67%     
==========================================
  Files         225      234       +9     
  Lines       25631    26919    +1288     
==========================================
+ Hits        17307    17998     +691     
- Misses       7121     7644     +523     
- Partials     1203     1277      +74     
Flag Coverage Δ
unittests 66.85% <51.95%> (-0.67%) ⬇️


Impacted Files Coverage Δ
pkg/koordlet/executor/resctrl_executor.go 0.00% <ø> (ø)
pkg/koordlet/executor/resource_update_executor.go 29.57% <ø> (ø)
pkg/koordlet/executor/types.go 31.48% <ø> (ø)
pkg/koordlet/metrics/psi.go 93.61% <ø> (ø)
pkg/koordlet/metricsadvisor/collector_gpu_linux.go 36.84% <ø> (ø)
...dlet/metricsadvisor/performance_collector_linux.go 69.56% <ø> (ø)
pkg/koordlet/resmanager/resmanager.go 54.33% <ø> (ø)
pkg/koordlet/runtimehooks/config.go 100.00% <ø> (ø)
...runtimehooks/hooks/batchresource/batch_resource.go 69.91% <ø> (ø)
pkg/koordlet/runtimehooks/hooks/cpuset/cpuset.go 90.58% <ø> (ø)
... and 100 more


@FillZpp
Member

FillZpp commented Dec 20, 2022

/milestone v1.2

@koordinator-bot koordinator-bot bot added this to the v1.2 milestone Dec 20, 2022
@KunWuLuan KunWuLuan force-pushed the fix-deadlock-in-multi-ganggroup branch 6 times, most recently from 1fdf885 to 13636a9 Compare December 26, 2022 07:12
@KunWuLuan KunWuLuan force-pushed the fix-deadlock-in-multi-ganggroup branch from 13636a9 to 4a3588e Compare December 27, 2022 12:37
Member

@eahydra eahydra left a comment


/lgtm

Member

@jasonliu747 jasonliu747 left a comment


@KunWuLuan Hi, thanks for your hard work. Please fix the CI, then we're all set.

@KunWuLuan KunWuLuan force-pushed the fix-deadlock-in-multi-ganggroup branch from 4a3588e to 416b523 Compare December 28, 2022 12:00
@koordinator-bot koordinator-bot bot removed the lgtm label Dec 28, 2022
@KunWuLuan KunWuLuan requested review from eahydra and removed request for buptcozy December 28, 2022 12:06
Member

@eahydra eahydra left a comment


/lgtm

@KunWuLuan KunWuLuan force-pushed the fix-deadlock-in-multi-ganggroup branch from 416b523 to 6b70fea Compare January 9, 2023 06:14
@koordinator-bot koordinator-bot bot removed the lgtm label Jan 9, 2023
@KunWuLuan KunWuLuan force-pushed the fix-deadlock-in-multi-ganggroup branch from 6b70fea to cc44354 Compare January 9, 2023 06:35
@KunWuLuan KunWuLuan requested review from eahydra and removed request for buptcozy January 9, 2023 07:09
@KunWuLuan KunWuLuan force-pushed the fix-deadlock-in-multi-ganggroup branch 2 times, most recently from 5ec7d98 to ca54f8c Compare January 11, 2023 03:40
Signed-off-by: KunWuLuan <kunwuluan@gmail.com>
@KunWuLuan KunWuLuan force-pushed the fix-deadlock-in-multi-ganggroup branch from ca54f8c to 5596bd6 Compare January 11, 2023 03:56
Member

@eahydra eahydra left a comment


/lgtm
/approve

@koordinator-bot koordinator-bot bot added the lgtm label Jan 11, 2023
@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eahydra


@koordinator-bot koordinator-bot bot merged commit 395d093 into koordinator-sh:main Jan 11, 2023
@eahydra eahydra modified the milestones: v1.2, someday Jan 11, 2023
FillZpp pushed a commit that referenced this pull request Jan 16, 2023
lucming pushed a commit to lucming/koordinator that referenced this pull request Feb 8, 2023

Successfully merging this pull request may close these issues.

[BUG]deadlock when schedule multi-gang-group
5 participants