Log scheduling queue movement events #111878

Merged: 1 commit merged into kubernetes:master on Aug 25, 2022

Member

@yuanchen8911 commented Aug 17, 2022

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Log additional events whenever a pod is moved into one of the scheduler's internal scheduling queues, to make scheduling problems easier to troubleshoot. The logged movements are:

  • Added to active queue
  • Moved from active queue to unschedulable queue
  • Moved from active queue to backoff queue
  • Moved from unschedulable queue to active queue
  • Moved from unschedulable queue to backoff queue
  • Moved from backoff queue to active queue

Some events are mutually exclusive and won't appear in the same scheduling cycle.
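
For illustration only, the movements listed above can be modelled as a small transition table. This is a minimal sketch and not the scheduler's code: the Queue type, its constants, loggedMoves, and IsLoggedMove are hypothetical names, and the queue strings are borrowed from the example log output in this description.

```go
package main

import "fmt"

// Queue is a hypothetical name for one of the internal scheduling queues.
type Queue string

const (
	None          Queue = "" // pod is not yet in any internal queue
	Active        Queue = "ActiveQueue"
	Backoff       Queue = "BackoffQueue"
	Unschedulable Queue = "UnschedulablePods"
)

// loggedMoves enumerates the six movements listed in the description.
var loggedMoves = map[[2]Queue]bool{
	{None, Active}:           true, // added to active queue
	{Active, Unschedulable}:  true,
	{Active, Backoff}:        true,
	{Unschedulable, Active}:  true,
	{Unschedulable, Backoff}: true,
	{Backoff, Active}:        true,
}

// IsLoggedMove reports whether from -> to is one of the listed movements.
func IsLoggedMove(from, to Queue) bool {
	return loggedMoves[[2]Queue{from, to}]
}

func main() {
	fmt.Println(IsLoggedMove(Unschedulable, Backoff)) // true
	fmt.Println(IsLoggedMove(Backoff, Unschedulable)) // false
}
```

A table like this is only a reading aid for the list; the actual logging is done inline at each point where the queue code moves a pod.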

Below is an example of a failed scheduling attempt, which produces three new queue events. A successful pod scheduling produces a single event, the add to the active queue.

I0817 00:53:01.018280 1495316 scheduling_queue.go:274] "Pod moved to an internal scheduling queue" pod="yuan/pod-143-c95674667-h6t7b" event="PodAdd" queue="ActiveQueue"
I0817 00:54:30.877590 1495316 scheduling_queue.go:388] "Pod moved to an internal scheduling queue" pod="yuan/pod-143-c95674667-h6t7b" event="ScheduleAttemptFailure" queue="UnschedulablePods"
I0817 00:54:33.456604 1495316 scheduling_queue.go:597] "Pod moved to an internal scheduling queue" pod="yuan/pod-143-c95674667-h6t7b" event="NodeLabelChange" queue="BackoffQueue"
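
When troubleshooting, it can help to pull the pod, event, and queue fields out of lines like the ones above and follow a single pod over time. The following is a rough sketch under assumptions, not part of this PR: QueueEvent, fieldRe, and ParseQueueEvent are hypothetical names, and the parser only assumes the key="value" layout visible in the example output. Note that the calls under review log at klog.V(5), so these lines presumably only appear when the scheduler runs with verbosity 5 or higher.

```go
package main

import (
	"fmt"
	"regexp"
)

// QueueEvent holds the key/value fields of one queue movement log line.
type QueueEvent struct {
	Pod, Event, Queue string
}

// fieldRe matches key="value" pairs and tolerates spaces around "=".
var fieldRe = regexp.MustCompile(`(\w+)\s*=\s*"([^"]*)"`)

// ParseQueueEvent extracts pod, event, and queue from a log line.
// It returns false if any of the three fields is missing.
func ParseQueueEvent(line string) (QueueEvent, bool) {
	fields := map[string]string{}
	for _, m := range fieldRe.FindAllStringSubmatch(line, -1) {
		fields[m[1]] = m[2]
	}
	ev := QueueEvent{Pod: fields["pod"], Event: fields["event"], Queue: fields["queue"]}
	return ev, ev.Pod != "" && ev.Event != "" && ev.Queue != ""
}

func main() {
	line := `I0817 00:54:33.456604 1495316 scheduling_queue.go:597] "Pod moved to an internal scheduling queue" pod="yuan/pod-143-c95674667-h6t7b" event="NodeLabelChange" queue="BackoffQueue"`
	if ev, ok := ParseQueueEvent(line); ok {
		fmt.Printf("%s: %s -> %s\n", ev.Pod, ev.Event, ev.Queue)
	}
}
```

Filtering the parsed events by Pod gives the sequence of queues a pod passed through, which is the kind of trace the example above shows by hand.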

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the release-note-none, size/S, and kind/cleanup labels Aug 17, 2022
@k8s-ci-robot
Contributor

Please note that we're already in Test Freeze for the release-1.25 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.25.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Tue Aug 16 19:37:32 UTC 2022.

@k8s-ci-robot added the cncf-cla: yes, do-not-merge/needs-sig, and needs-triage labels Aug 17, 2022
@k8s-ci-robot
Contributor

@yuanchen8911: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-priority label Aug 17, 2022
@yuanchen8911
Member Author

/sig scheduling

@k8s-ci-robot added the sig/scheduling label and removed the do-not-merge/needs-sig label Aug 17, 2022
@yuanchen8911
Member Author

/assign @Huang-Wei

@yuanchen8911
Member Author

/retest

@yuanchen8911
Member Author

/retest

pkg/scheduler/internal/queue/scheduling_queue.go (outdated)
@@ -313,6 +318,7 @@ func (p *PriorityQueue) Add(pod *v1.Pod) error {
if err := p.podBackoffQ.Delete(pInfo); err == nil {
	klog.ErrorS(nil, "Error: pod is already in the podBackoff queue", "pod", klog.KObj(pod))
}
klog.V(5).InfoS("Pod moved", "pod", klog.KObj(pod), "event", PodAdd, "from", "", "to", activeQName)
Member

nit: maybe specify it's "from" New (or Incoming)

Member Author

Changed it to New.

@yuanchen8911
Member Author

/retest

@@ -558,6 +567,7 @@ func (p *PriorityQueue) Update(oldPod, newPod *v1.Pod) error {
	return err
}
p.unschedulablePods.delete(usPodInfo.Pod)
klog.V(5).InfoS("Pod moved", "pod", klog.KObj(pInfo.Pod), "event", "PodUpdated", "from", unschedulablePods, "to", activeQName)
Member

(not related to this line)

Add logging below L564? unschedulable -> backoff

Member Author

Added additional logging for the move from unschedulablePods to backoff.

@Huang-Wei
Member

/lgtm
/approve

This PR reveals more details about pods moving among the internal queues, which helps debug cases where pods stay in unschedulablePods for a long time.

/hold
for a while in case other folks have concerns.
cc @ahg-g @alculquicondor

@k8s-ci-robot added the do-not-merge/hold label Aug 24, 2022
@yuanchen8911 requested review from alculquicondor and removed request for Huang-Wei August 24, 2022 21:42
@@ -313,6 +318,7 @@ func (p *PriorityQueue) Add(pod *v1.Pod) error {
if err := p.podBackoffQ.Delete(pInfo); err == nil {
	klog.ErrorS(nil, "Error: pod is already in the podBackoff queue", "pod", klog.KObj(pod))
}
klog.V(5).InfoS("Pod moved between internal queues", "pod", klog.KObj(pod), "event", PodAdd, "to", activeQName)
Member

how about:

Suggested change
- klog.V(5).InfoS("Pod moved between internal queues", "pod", klog.KObj(pod), "event", PodAdd, "to", activeQName)
+ klog.V(5).InfoS("Pod moved to an internal queue", "pod", klog.KObj(pod), "event", PodAdd, "queue", activeQName)

Member Author

Changed the message to "Pod moved to an internal scheduling queue".

Member Author

Changed the "to" key to "queue".

@k8s-ci-robot added the size/XXL label and removed the size/S label Aug 24, 2022
@yuanchen8911 requested review from Huang-Wei and removed request for alculquicondor August 24, 2022 23:03
Fix a typo

Address comments

Log one more queue event

Update pkg/scheduler/internal/queue/scheduling_queue.go

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>

Update pkg/scheduler/internal/queue/scheduling_queue.go

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>

Address comments

Remove 'source' from scheduling queue events

Update scheduling queue event msg.

Update scheduling queue events
@yuanchen8911 requested review from Huang-Wei and alculquicondor and removed request for Huang-Wei August 24, 2022 23:47
@k8s-ci-robot added the size/S label and removed the size/XXL label Aug 24, 2022
@yuanchen8911
Member Author

Updated the PR description based on the discussion and changes.

@Huang-Wei
Member

/lgtm

@k8s-ci-robot added the lgtm label Aug 25, 2022
Member

@alculquicondor left a comment

/lgtm

@alculquicondor
Member

/approve
/hold cancel

@k8s-ci-robot removed the do-not-merge/hold label Aug 25, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, Huang-Wei, yuanchen8911

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot merged commit 9ec888e into kubernetes:master Aug 25, 2022
@k8s-ci-robot added this to the v1.26 milestone Aug 25, 2022
@yuanchen8911
Member Author

Thank you, folks!

@yuanchen8911 changed the title Log scheduling queue events Log scheduling queue movement events Aug 25, 2022
Labels

  • approved
  • cncf-cla: yes
  • kind/cleanup
  • lgtm
  • needs-priority
  • needs-triage
  • release-note-none
  • sig/scheduling
  • size/S

4 participants