Fix the inaccurate status when a plugin internal status is found #105727

Merged
merged 1 commit into kubernetes:master from chendave:wrong_status on Oct 28, 2021

Conversation

Member

@chendave chendave commented Oct 18, 2021

Currently, the status code returned is Unschedulable when an internal plugin error is found. That Unschedulable status is built from a FitError, which should mean that no fit node was found, not that an internal plugin error occurred.

Instead of building an Unschedulable status from the FitError, returning the Error status directly brings us two things:

  1. It reveals the failure point directly with klog.ErrorS, instead of requiring an increased log level and a check of the klog.InfoS output.

    if status.Code() == framework.Error {
        klog.ErrorS(nil, "Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
    } else {
        klog.V(5).InfoS("Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
    }

  2. It avoids constructing a FitError and building an Unschedulable status from it; returning the Error status directly aligns with the main scheduler path:

    feasibleNodes, diagnosis, err := g.findNodesThatFitPod(ctx, extenders, fwk, state, pod)
    if err != nil {
        return result, err
    }
    trace.Step("Computing predicates done")
    if len(feasibleNodes) == 0 {
        return result, &framework.FitError{
            Pod:         pod,
            NumAllNodes: g.nodeInfoSnapshot.NumNodes(),
            Diagnosis:   diagnosis,
        }
    }
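
To make the intent concrete, here is a minimal sketch of the PostFilter-side idea; the variable `dryRunStatuses` and the exact placement are illustrative assumptions, not the merged diff:

    // If any per-node status carries an internal plugin error, surface it as an
    // Error status instead of folding everything into a FitError.
    for _, s := range dryRunStatuses {
        if s.Code() == framework.Error {
            return nil, framework.AsStatus(s.AsError())
        }
    }
    // Otherwise keep the existing behavior: no candidates means Unschedulable.
    if len(candidates) == 0 {
        return nil, framework.NewStatus(framework.Unschedulable, "no preemption candidates found")
    }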

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 18, 2021
@k8s-ci-robot
Contributor

@chendave: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Oct 18, 2021
@chendave
Member Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 18, 2021
@chendave
Member Author

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 18, 2021
@chendave chendave changed the title Fix the return status when a plugin internal status is found Fix the inaccurate status when a plugin internal status is found Oct 18, 2021
if err != nil {
    return nil, nil, err
}
return append(nonViolatingCandidates.get(), violatingCandidates.get()...), nodeStatuses, nil
Member

The changes in this function seem unnecessary.

Would it be possible to check whether any of the nodeStatuses is an Error somewhere else, before they are converted into a FitError?

Member Author

Okay, I'll leave this unchanged; maybe we can check it right after the call to DryRunPreemption.

@@ -565,10 +569,17 @@ func (ev *Evaluator) DryRunPreemption(ctx context.Context, pod *v1.Pod, potentia
if status.IsSuccess() && len(pods) == 0 {
    status = framework.AsStatus(fmt.Errorf("expected at least one victim pod on node %q", nodeInfoCopy.Node().Name))
}
if status.Code() == framework.Error {
    err = status.AsError()
Member

Note that this is not thread safe, but before fixing it, consider my other comment.

Member Author

Just out of curiosity: if we don't mind which Error status is eventually returned, and sending back any one of them is acceptable, then code like this should be fine, right?

Member

We might want to report the unschedulable statuses to the logs.

Member

Do this inside the lock below

Member Author

Done
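
For reference, a minimal sketch of recording the error inside the existing lock in DryRunPreemption; the lock name `statusesLock` is assumed from the surrounding code and may differ:

// Update shared state only while holding the lock, so the goroutines run by
// the parallelizer don't race on err and nodeStatuses.
statusesLock.Lock()
if status.Code() == framework.Error {
    err = status.AsError()
}
nodeStatuses[nodeInfoCopy.Node().Name] = status
statusesLock.Unlock()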

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 19, 2021
@chendave
Member Author

@alculquicondor updated, please take another look, thanks!

for node, status := range unschedulableNodeStatus {
    nodeStatuses[node] = status
}
var errMsg []string
for _, nodeStatus := range nodeStatuses {
Member

What is it that you want to achieve? Fail the entire scheduling cycle if we find any error? Or is logging with ErrorS enough?

If it's the latter, you could log right here. If the former, let's discuss, because that might be a user-facing change of behavior.

Member Author

I want to keep the original logic: not fail the entire scheduling cycle, but make sure the Error status is not hidden and not converted to a FitError.

Member Author

@chendave chendave Oct 20, 2021

Please let me know: is it good to squash as is? :)

Member Author

> If it's the latter, you could log right here.

The log will eventually be printed here if it's an Error, so I think we needn't log it twice.

if status.Code() == framework.Error {
    klog.ErrorS(nil, "Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
} else {
    klog.V(5).InfoS("Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
}

Member

But at that point the scheduling cycle is considered failed.

IIUC, the "errors" are converted into a FitError if no node passed the preemption logic. However, with your change, any error will fail the entire preemption even if there is a node that passes.

Member

candidates, nodeToStatusMap, status := ev.findCandidates(ctx, pod, m)
if !status.IsSuccess() {
    return nil, status
}

Member Author

Oh... yes, I missed that. I updated the logic to make sure we can still get back a nominated node even if there is an Error on other nodes.

And the test cases are updated to reflect the change as well.
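
A minimal sketch of that behavior, reusing the call shape quoted above (not necessarily the exact merged code): only give up on preemption when the status carries an internal error and no candidate survived.

candidates, nodeToStatusMap, status := ev.findCandidates(ctx, pod, m)
// Fail the attempt only if there is an error AND nothing to preempt;
// otherwise continue with the candidates that did pass the dry run.
if !status.IsSuccess() && len(candidates) == 0 {
    return nil, status
}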

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 21, 2021
-// Return a FitError only when there are no candidates that fit the pod.
-if len(candidates) == 0 {
+// Return a FitError only when there are no candidates that fit the pod and status is success
+if status.IsSuccess() && len(candidates) == 0 {
Member

Do we need this change?

Member Author

right, this is not needed.

@@ -215,10 +216,19 @@ func (ev *Evaluator) findCandidates(ctx context.Context, pod *v1.Pod, m framewor
    klog.Infof("from a pool of %d nodes (offset: %d, sample %d nodes: %v), ~%d candidates will be chosen", len(potentialNodes), offset, len(sample), sample, numCandidates)
}
candidates, nodeStatuses := ev.DryRunPreemption(ctx, pod, potentialNodes, pdbs, offset, numCandidates)
for node, status := range unschedulableNodeStatus {
    nodeStatuses[node] = status
}
var errMsg []string
Member

Make it `var errs []error`.

Then you can use `status.AsError` in the loop.

Member Author

Done
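
A minimal sketch of that suggestion in findCandidates (assuming `utilerrors` is k8s.io/apimachinery/pkg/util/errors; the collection later moved into DryRunPreemption, so this is not the merged code):

// Collect internal errors from the per-node statuses and surface them as a
// single aggregated Error status instead of joining string messages.
var errs []error
for _, nodeStatus := range nodeStatuses {
    if nodeStatus.Code() == framework.Error {
        errs = append(errs, nodeStatus.AsError())
    }
}
if len(errs) > 0 {
    return nil, nil, framework.AsStatus(utilerrors.NewAggregate(errs))
}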

    return &TestPlugin{name: "test-plugin"}, nil
}

type TestPlugin struct {
Member

Give it a more informative name and a comment for how it fails

Member Author

Comments added, but I am struggling to make the name more informative. :(

@@ -299,8 +373,15 @@ func TestPostFilter(t *testing.T) {
}

gotResult, gotStatus := p.PostFilter(context.TODO(), state, tt.pod, tt.filteredNodesStatuses)
if diff := cmp.Diff(tt.wantStatus, gotStatus); diff != "" {
    t.Errorf("Unexpected status (-want, +got):\n%s", diff)
// As we cannot compare two errors directly (there is no equality method for errors), we just compare the reasons.
Member

I think you can make it work with errors.Aggregate

Member Author

I have tried this but it still doesn't work:
cmp.Diff(utilerrors.NewAggregate([]error{tt.wantStatus.AsError()}), utilerrors.NewAggregate([]error{gotStatus.AsError()}), cmpopts.EquateErrors())

I think matching both the code and the reasons should be enough.
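
A minimal sketch of that comparison in the test, assuming the Status Code() and Reasons() accessors; not the exact merged assertion:

// Compare the status code and the reasons separately, since two errors
// cannot be compared for equality directly.
if gotStatus.Code() != tt.wantStatus.Code() {
    t.Errorf("unexpected status code: want %v, got %v", tt.wantStatus.Code(), gotStatus.Code())
}
if diff := cmp.Diff(tt.wantStatus.Reasons(), gotStatus.Reasons()); diff != "" {
    t.Errorf("unexpected reasons (-want, +got):\n%s", diff)
}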

@chendave
Member Author

/retest

@ahg-g
Member

ahg-g commented Oct 25, 2021

Just to make sure I understand the purpose of this change: the idea is to make sure that the ErrorS branch gets executed rather than the InfoS one, right?

Is it really worth iterating over all the node statuses to do that? Why not just log the error in the place where it happened, in DryRunPreemption?

Also, I wouldn't consider this a bug, just a cleanup.

@chendave
Member Author

> Just to make sure I understand the purpose of this change: the idea is to make sure that the ErrorS branch gets executed rather than the InfoS one, right?

Thanks for the comments! That is one of the purposes; the other is that I am trying to align with the main scheduler path, where an internal error is different from a FitError. The current code just ignores the Error status and returns a FitError after this:

return candidates, nodeStatuses, nil

If you think it's fine to do that, then there is no point in this change and we can close this one.

@alculquicondor
Member

/remove-kind bug
/kind cleanup
/approve

This is fine with me. Leaving LGTM to @ahg-g

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed kind/bug Categorizes issue or PR as related to a bug. labels Oct 25, 2021
@ahg-g
Member

ahg-g commented Oct 25, 2021

/hold

> If you think it's fine to do that, then there is no point in this change and we can close this one.

I don't think we should do this because we will be iterating over all the candidates every time, and in reality we will rarely face errors here. Simply log the error in DryRunPreemption.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 25, 2021
@chendave
Member Author

chendave commented Oct 26, 2021

> I don't think we should do this because we will be iterating over all the candidates every time, and in reality we will rarely face errors here.

Makes sense. I took some changes from a664b8a, so the error there just acts like a flag, and we only loop over the candidates to collect all the errors when the flag is set.

> Simply log the error in DryRunPreemption.

I don't think logging is really needed, as that is already done there; the problem is that the returned status is not an Error when an error happened in the current code base, which is another reason for the change.

if status.Code() == framework.Error {
    klog.ErrorS(nil, "Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
} else {
    klog.V(5).InfoS("Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
}

If this change still doesn't look like the right approach, let's just close it instead.

@alculquicondor
Member

I think the previous approach was better.

> I don't think logging is really needed, as that is already done there; the problem is that the returned status is not an Error when an error happened in the current code base, which is another reason for the change.

I think this is a valid thing to fix.
I don't think iterating over the statuses is a big deal. Calculating the candidates is way more expensive.

Comment on lines 222 to 226
for _, nodeStatus := range nodeStatuses {
    if nodeStatus.Code() == framework.Error {
        errs = append(errs, nodeStatus.AsError())
    }
}
Member

Do this in DryRunPreemption instead of iterating again here.

Member Author

Done

@ahg-g
Member

ahg-g commented Oct 26, 2021

Calculating the candidates is more expensive, but those extra iterations do add up at scale; we should avoid them, especially when they usually end up being a no-op. We could create the error while iterating in DryRunPreemption.

@chendave
Member Author

> We could create the error while iterating in DryRunPreemption.

Done.
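
A minimal sketch of that shape inside DryRunPreemption, assuming the function now also returns an error and that `utilerrors` is k8s.io/apimachinery/pkg/util/errors; the exact merged signature may differ. The caller then wraps the returned error into a status, as in the diff excerpt below.

// Collect every internal error while iterating over the nodes (under the
// same lock that already protects nodeStatuses), then return one aggregate.
var errs []error
// ... inside the per-node check, while holding statusesLock:
if status.Code() == framework.Error {
    errs = append(errs, status.AsError())
}
// ... after all nodes have been processed:
return candidates, nodeStatuses, utilerrors.NewAggregate(errs)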

nodeStatuses[node] = status
candidates, nodeStatuses, err := ev.DryRunPreemption(ctx, pod, potentialNodes, pdbs, offset, numCandidates)
if err != nil {
    status = framework.AsStatus(fmt.Errorf("found error: %v", err.Error()))
Member

framework.AsStatus(err)

Member

Although I am not quite sure why findCandidates returns a status in the first place; it should simply return an error. But let's leave that for another day.

Member Author

findCandidates is not an exported method and no test case depends on it, so no big change is needed; I added one more commit to address this.
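
A sketch of what that follow-up amounts to; the parameter and result types are assumed from the snippets quoted earlier in this review:

// findCandidates is unexported, so it can return a plain error; the caller
// converts it into a *framework.Status only where needed.
func (ev *Evaluator) findCandidates(ctx context.Context, pod *v1.Pod, m framework.NodeToStatusMap) ([]Candidate, framework.NodeToStatusMap, error)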

@alculquicondor
Member

/approve
you can squash

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AaronWashington, alculquicondor, chendave

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Currently, the status code returned is `Unschedulable` when an internal error
is found; the `Unschedulable` status is built from a `FitError`, which means
no fit node was found, not that an internal error occurred.

Instead of building an Unschedulable status from the `FitError`, return the
Error status directly.

Signed-off-by: Dave Chen <dave.chen@arm.com>
@chendave
Member Author

Squash done.

@ahg-g
Member

ahg-g commented Oct 28, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2021
@chendave
Member Author

/unhold
/retest

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2021
@k8s-ci-robot k8s-ci-robot merged commit 87b0412 into kubernetes:master Oct 28, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 28, 2021
@chendave chendave deleted the wrong_status branch October 28, 2021 03:05