proposal: Eviction Arbitration Mechanism in Descheduler #1454

Merged

baowj-678
Member

@baowj-678 baowj-678 commented Jul 7, 2023

Ⅰ. Describe what this PR does

The arbitration mechanism is an important capability for Pod Migration, which in turn is relied on by many components (such as deschedulers). Pod Migration is a complex process: it involves steps such as auditing, resource allocation, and application startup, and it overlaps with application upgrades, scaling, and resource operation and maintenance work by cluster administrators.

When a large number of Pods are migrated simultaneously, system stability may suffer. If many Pods of the same workload are migrated simultaneously, the stability of that application is affected as well. Moreover, if migrating a job's Pod takes too long, it can delay the job's completion.

An arbitration mechanism is therefore needed. It selects suitable PodMigrationJobs to execute and controls the rate at which PodMigrationJobs run (to avoid a large number of jobs executing simultaneously), thereby protecting the stability of both the system and the applications.
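
As a rough illustration of the idea above (not the design in the proposal itself), the following Go sketch selects jobs under two assumed limits: a cap on the number of jobs started per arbitration round and a cap per workload. The `migrationJob` type, the `arbitrate` function, and both limits are hypothetical names chosen for this example; they are not Koordinator APIs.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// migrationJob is a hypothetical, simplified stand-in for a PodMigrationJob.
type migrationJob struct {
	Name     string
	Workload string    // workload that owns the Pod to migrate
	Created  time.Time // used only to order jobs in this sketch
}

// arbitrate decides which pending jobs may run in this round.
// maxTotal caps the jobs selected per round and maxPerWorkload caps the
// jobs selected for any single workload. Both limits are assumptions.
func arbitrate(pending []migrationJob, maxTotal, maxPerWorkload int) (selected, deferred []migrationJob) {
	// Work on a copy so the caller's slice is not reordered.
	jobs := append([]migrationJob(nil), pending...)
	// Oldest jobs first; the stable sort keeps input order for ties.
	sort.SliceStable(jobs, func(i, j int) bool {
		return jobs[i].Created.Before(jobs[j].Created)
	})

	perWorkload := map[string]int{}
	for _, job := range jobs {
		if len(selected) >= maxTotal || perWorkload[job.Workload] >= maxPerWorkload {
			// Over a limit: leave the job for a later round.
			deferred = append(deferred, job)
			continue
		}
		perWorkload[job.Workload]++
		selected = append(selected, job)
	}
	return selected, deferred
}

func main() {
	base := time.Date(2023, 7, 7, 0, 0, 0, 0, time.UTC)
	pending := []migrationJob{
		{Name: "job-a", Workload: "web", Created: base},
		{Name: "job-b", Workload: "web", Created: base.Add(time.Second)},
		{Name: "job-c", Workload: "db", Created: base.Add(2 * time.Second)},
		{Name: "job-d", Workload: "cache", Created: base.Add(3 * time.Second)},
	}
	selected, deferred := arbitrate(pending, 2, 1)
	for _, job := range selected {
		fmt.Println("run now:", job.Name)
	}
	for _, job := range deferred {
		fmt.Println("deferred:", job.Name)
	}
}
```

Deferred jobs are not rejected in this sketch; they simply wait for the next round, which is one way to limit how many migrations run at the same time.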

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Design docs for #430

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@codecov

codecov bot commented Jul 7, 2023

Codecov Report

Patch coverage has no change and project coverage change: +1.02% 🎉

Comparison is base (06b2f5b) 64.14% compared to head (24ddf31) 65.16%.
Report is 62 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1454      +/-   ##
==========================================
+ Coverage   64.14%   65.16%   +1.02%     
==========================================
  Files         341      352      +11     
  Lines       34939    36326    +1387     
==========================================
+ Hits        22411    23673    +1262     
- Misses      10881    10909      +28     
- Partials     1647     1744      +97     
Flag Coverage Δ
unittests 65.16% <ø> (+1.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 136 files with indirect coverage changes


Signed-off-by: baowj-678 <bwj_678@qq.com>
@baowj-678 baowj-678 force-pushed the proposal_arbitration_mechanism branch from 81aa852 to 6a2261a Compare July 7, 2023 15:56
@eahydra eahydra changed the title [proposal] Eviction Arbitration Mechanism in Descheduler proposal: Eviction Arbitration Mechanism in Descheduler Jul 9, 2023
1.update user stories
2.add diagrams
3.others

Signed-off-by: baowj-678 <bwj_678@qq.com>
1.update diagrams.
2.move the arbitrate mechanism process before reconcile process.
3.delete the modifications to the PodMigrationJob CRD.
4.others.

Signed-off-by: baowj-678 <bwj_678@qq.com>
Member

@eahydra eahydra left a comment

Overall this looks good to me. However, I recommend adding a sequence diagram that shows a PMJ entering the migration controller, entering the arbitration queue, going through arbitration, and entering the workqueue. That would make the flow easier to understand.
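
The flow named in this comment (arbitration queue, arbitration, workqueue) can be sketched roughly in Go as follows. This is only an illustration under stated assumptions: the `pmj` and `arbitrator` types, the `round` method, and the two example checks are hypothetical, and the semantics assumed here (a job that fails a non-retryable filter is marked failed, while a job that fails a retryable filter waits for the next round) are a guess based on the filter names used later in this PR, not the proposal's actual code.

```go
package main

import "fmt"

// pmj is a hypothetical, minimal stand-in for a PodMigrationJob (PMJ).
type pmj struct {
	Name      string
	PodExists bool // hypothetical input for the non-retryable check
	NodeBusy  bool // hypothetical input for the retryable check
}

// filterFunc reports whether a job passes one check.
type filterFunc func(pmj) bool

// arbitrator holds the jobs waiting for arbitration and both filter kinds.
type arbitrator struct {
	waiting      []pmj
	nonRetryable []filterFunc
	retryable    []filterFunc
}

// passesAll returns true only if the job passes every filter.
func passesAll(job pmj, filters []filterFunc) bool {
	for _, f := range filters {
		if !f(job) {
			return false
		}
	}
	return true
}

// round runs one arbitration round over the waiting jobs.
// Assumed semantics: non-retryable failure -> job is failed and dropped;
// retryable failure -> job stays queued; otherwise -> job goes to the workqueue.
func (a *arbitrator) round() (toWorkqueue, failed []pmj) {
	var stillWaiting []pmj
	for _, job := range a.waiting {
		switch {
		case !passesAll(job, a.nonRetryable):
			failed = append(failed, job)
		case !passesAll(job, a.retryable):
			stillWaiting = append(stillWaiting, job)
		default:
			toWorkqueue = append(toWorkqueue, job)
		}
	}
	a.waiting = stillWaiting
	return toWorkqueue, failed
}

func main() {
	a := &arbitrator{
		waiting: []pmj{
			{Name: "ok", PodExists: true},
			{Name: "gone", PodExists: false},
			{Name: "busy", PodExists: true, NodeBusy: true},
		},
		nonRetryable: []filterFunc{func(j pmj) bool { return j.PodExists }},
		retryable:    []filterFunc{func(j pmj) bool { return !j.NodeBusy }},
	}
	toWorkqueue, failed := a.round()
	fmt.Println("to workqueue:", len(toWorkqueue))
	fmt.Println("failed:", len(failed))
	fmt.Println("still waiting:", len(a.waiting))
}
```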

1.update diagrams.
2.others.

Signed-off-by: baowj-678 <bwj_678@qq.com>
@eahydra
Member

eahydra commented Aug 15, 2023

/lgtm
/approve

Member

@saintube saintube left a comment

/lgtm

@ZiMengSheng
Contributor

/lgtm

Member

@jasonliu747 jasonliu747 left a comment

/lgtm

1.replace GroupFilter and Select process with Filter process.
2.update some diagrams.
3.others.

Signed-off-by: baowj-678 <bwj_678@qq.com>
@koordinator-bot koordinator-bot bot removed the lgtm label Aug 21, 2023
1.add some introduction to the NonRetryableFilter and RetryableFilter.

Signed-off-by: baowj-678 <bwj_678@qq.com>
Member

@eahydra eahydra left a comment

/lgtm

@koordinator-bot koordinator-bot bot added the lgtm label Aug 22, 2023
@eahydra
Member

eahydra commented Aug 22, 2023

@hormes PTAL

@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eahydra, hormes, jasonliu747

@koordinator-bot koordinator-bot bot merged commit e06ce56 into koordinator-sh:main Aug 24, 2023
16 checks passed
@baowj-678 baowj-678 mentioned this pull request Aug 26, 2023
3 tasks
ls-2018 pushed a commit to ls-2018/koordinator that referenced this pull request Mar 25, 2024
6 participants