Fix the inaccurate status when a plugin internal status is found #105727

Merged
merged 1 commit into kubernetes:master from chendave:wrong_status on Oct 28, 2021

Conversation

Member

@chendave chendave commented Oct 18, 2021

Currently, the status code returned is Unschedulable when an internal plugin error is found. That Unschedulable status is built from a FitError, which should mean that no fit node was found, not that an internal plugin error occurred.

Instead of building an Unschedulable status from the FitError, returning the Error status directly brings us two things:

  1. It reveals the failure point directly with klog.ErrorS, instead of requiring an increased log level and a check of the klog.InfoS output.

    if status.Code() == framework.Error {
        klog.ErrorS(nil, "Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
    } else {
        klog.V(5).InfoS("Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
    }

  2. It avoids constructing a FitError and building an Unschedulable status from it; returning the Error status directly aligns with the main scheduler path:

    feasibleNodes, diagnosis, err := g.findNodesThatFitPod(ctx, extenders, fwk, state, pod)
    if err != nil {
        return result, err
    }
    trace.Step("Computing predicates done")
    if len(feasibleNodes) == 0 {
        return result, &framework.FitError{
            Pod:         pod,
            NumAllNodes: g.nodeInfoSnapshot.NumNodes(),
            Diagnosis:   diagnosis,
        }
    }
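
To make the intent concrete, here is a minimal sketch of the PostFilter-side idea; the variable `dryRunStatuses` and the exact placement are illustrative assumptions, not the merged diff:

    // If any per-node status carries an internal plugin error, surface it as an
    // Error status instead of folding everything into a FitError.
    for _, s := range dryRunStatuses {
        if s.Code() == framework.Error {
            return nil, framework.AsStatus(s.AsError())
        }
    }
    // Otherwise keep the existing behavior: no candidates means Unschedulable.
    if len(candidates) == 0 {
        return nil, framework.NewStatus(framework.Unschedulable, "no preemption candidates found")
    }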

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 18, 2021
@k8s-ci-robot
Contributor

@chendave: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Oct 18, 2021
@chendave
Member Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 18, 2021
@chendave
Member Author

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 18, 2021
@chendave chendave changed the title Fix the return status when a plugin internal status is found Fix the inaccurate status when a plugin internal status is found Oct 18, 2021
if err != nil {
    return nil, nil, err
}
return append(nonViolatingCandidates.get(), violatingCandidates.get()...), nodeStatuses, nil
Member

The changes in this function seem unnecessary.

Would it be possible to check whether any of the nodeStatuses is an Error somewhere else, before they are converted into a FitError?

Member Author

Okay, I'll leave this unchanged; maybe we can check it right after the call to DryRunPreemption.

@@ -565,10 +569,17 @@ func (ev *Evaluator) DryRunPreemption(ctx context.Context, pod *v1.Pod, potentia
if status.IsSuccess() && len(pods) == 0 {
    status = framework.AsStatus(fmt.Errorf("expected at least one victim pod on node %q", nodeInfoCopy.Node().Name))
}
if status.Code() == framework.Error {
    err = status.AsError()
Member

Note that this is not thread safe, but before fixing it, consider my other comment.

Member Author

Just out of curiosity: if we don't mind which Error status is eventually returned, and sending back any one of them is acceptable, then code like this should be fine, right?

Member

We might want to report the unschedulable statuses to the logs.

Member

Do this inside the lock below

Member Author

Done
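
For reference, a minimal sketch of recording the error inside the existing lock in DryRunPreemption; the lock name `statusesLock` is assumed from the surrounding code and may differ:

// Update shared state only while holding the lock, so the goroutines run by
// the parallelizer don't race on err and nodeStatuses.
statusesLock.Lock()
if status.Code() == framework.Error {
    err = status.AsError()
}
nodeStatuses[nodeInfoCopy.Node().Name] = status
statusesLock.Unlock()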

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 19, 2021
@chendave
Member Author

@alculquicondor updated, please take another look, thanks!

for node, status := range unschedulableNodeStatus {
    nodeStatuses[node] = status
}
var errMsg []string
for _, nodeStatus := range nodeStatuses {
Member

What is it that you want to achieve? Fail the entire scheduling cycle if we find any error? Or is logging with ErrorS enough?

If it's the latter, you could log right here. If the former, let's discuss, because that might be a user-facing change of behavior.

Member Author

I want to keep the original logic: not fail the entire scheduling cycle, but make sure the Error status is not hidden and not converted to a FitError.

Member Author

@chendave chendave Oct 20, 2021

Please let me know: is it good to squash as is? :)

Member Author

> If it's the latter, you could log right here.

The log will eventually be printed here if it's an Error, so I think we needn't log it twice.

if status.Code() == framework.Error {
    klog.ErrorS(nil, "Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
} else {
    klog.V(5).InfoS("Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
}

Member

But at that point the scheduling cycle is considered failed.

IIUC, the "errors" are converted into a FitError if no node passed the preemption logic. However, with your change, any error will fail the entire preemption even if there is a node that passes.

Member

candidates, nodeToStatusMap, status := ev.findCandidates(ctx, pod, m)
if !status.IsSuccess() {
    return nil, status
}

Member Author

Oh... yes, I missed that. I updated the logic to make sure we can still get back a nominated node even if there is an Error on other nodes.

And the test cases are updated to reflect the change as well.
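
A minimal sketch of that behavior, reusing the call shape quoted above (not necessarily the exact merged code): only give up on preemption when the status carries an internal error and no candidate survived.

candidates, nodeToStatusMap, status := ev.findCandidates(ctx, pod, m)
// Fail the attempt only if there is an error AND nothing to preempt;
// otherwise continue with the candidates that did pass the dry run.
if !status.IsSuccess() && len(candidates) == 0 {
    return nil, status
}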

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 21, 2021
-// Return a FitError only when there are no candidates that fit the pod.
-if len(candidates) == 0 {
+// Return a FitError only when there are no candidates that fit the pod and status is success
+if status.IsSuccess() && len(candidates) == 0 {
Member

Do we need this change?

Member Author

right, this is not needed.

@@ -215,10 +216,19 @@ func (ev *Evaluator) findCandidates(ctx context.Context, pod *v1.Pod, m framewor
    klog.Infof("from a pool of %d nodes (offset: %d, sample %d nodes: %v), ~%d candidates will be chosen", len(potentialNodes), offset, len(sample), sample, numCandidates)
}
candidates, nodeStatuses := ev.DryRunPreemption(ctx, pod, potentialNodes, pdbs, offset, numCandidates)
for node, status := range unschedulableNodeStatus {
    nodeStatuses[node] = status
}
var errMsg []string
Member

Make it `var errs []error`.

Then you can use `status.AsError` in the loop.

Member Author

Done
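
A minimal sketch of that suggestion in findCandidates (assuming `utilerrors` is k8s.io/apimachinery/pkg/util/errors; the collection later moved into DryRunPreemption, so this is not the merged code):

// Collect internal errors from the per-node statuses and surface them as a
// single aggregated Error status instead of joining string messages.
var errs []error
for _, nodeStatus := range nodeStatuses {
    if nodeStatus.Code() == framework.Error {
        errs = append(errs, nodeStatus.AsError())
    }
}
if len(errs) > 0 {
    return nil, nil, framework.AsStatus(utilerrors.NewAggregate(errs))
}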

    return &TestPlugin{name: "test-plugin"}, nil
}

type TestPlugin struct {
Member

Give it a more informative name and a comment for how it fails

Member Author

Comments added, but I am struggling to make the name more informative. :(

@@ -299,8 +373,15 @@ func TestPostFilter(t *testing.T) {
}

gotResult, gotStatus := p.PostFilter(context.TODO(), state, tt.pod, tt.filteredNodesStatuses)
if diff := cmp.Diff(tt.wantStatus, gotStatus); diff != "" {
    t.Errorf("Unexpected status (-want, +got):\n%s", diff)
// As we cannot compare two errors directly (there is no equality method for errors), we just compare the reasons.
Member

I think you can make it work with errors.Aggregate

Member Author

I have tried this but it still doesn't work:
cmp.Diff(utilerrors.NewAggregate([]error{tt.wantStatus.AsError()}), utilerrors.NewAggregate([]error{gotStatus.AsError()}), cmpopts.EquateErrors())

I think matching both the code and the reasons should be enough.
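
A minimal sketch of that comparison in the test, assuming the Status Code() and Reasons() accessors; not the exact merged assertion:

// Compare the status code and the reasons separately, since two errors
// cannot be compared for equality directly.
if gotStatus.Code() != tt.wantStatus.Code() {
    t.Errorf("unexpected status code: want %v, got %v", tt.wantStatus.Code(), gotStatus.Code())
}
if diff := cmp.Diff(tt.wantStatus.Reasons(), gotStatus.Reasons()); diff != "" {
    t.Errorf("unexpected reasons (-want, +got):\n%s", diff)
}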

@chendave
Member Author

/retest

@ahg-g
Member

ahg-g commented Oct 25, 2021

Just to make sure I understand the purpose of this change: the idea is to make sure that the ErrorS branch gets executed rather than the InfoS one, right?

Is it really worth iterating over all the node statuses to do that? Why not just log the error in the place where it happened, in DryRunPreemption?

Also, I wouldn't consider this a bug, just a cleanup.

@chendave
Member Author

> Just to make sure I understand the purpose of this change: the idea is to make sure that the ErrorS branch gets executed rather than the InfoS one, right?

Thanks for the comments! That is one of the purposes; the other is that I am trying to align with the main scheduler path, where an internal error is different from a FitError. The current code just ignores the Error status and returns a FitError after this:

return candidates, nodeStatuses, nil

If you think it's fine to do that, then there is no point in this change and we can close this one.

@alculquicondor
Member

/remove-kind bug
/kind cleanup
/approve

This is fine with me. Leaving LGTM to @ahg-g

@k8s-ci-robot k8s-ci-robot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed kind/bug Categorizes issue or PR as related to a bug. labels Oct 25, 2021
@ahg-g
Member

ahg-g commented Oct 25, 2021

/hold

> If you think it's fine to do that, then there is no point in this change and we can close this one.

I don't think we should do this because we will be iterating over all the candidates every time, and in reality we will rarely face errors here. Simply log the error in DryRunPreemption.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 25, 2021
@chendave
Member Author

chendave commented Oct 26, 2021

> I don't think we should do this because we will be iterating over all the candidates every time, and in reality we will rarely face errors here.

Makes sense. I took some changes from a664b8a, so the error there just acts like a flag, and we only loop over the candidates to collect all the errors when the flag is set.

> Simply log the error in DryRunPreemption.

I don't think logging is really needed, as that is already done there; the problem is that the returned status is not an Error when an error happened in the current code base, which is another reason for the change.

if status.Code() == framework.Error {
    klog.ErrorS(nil, "Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
} else {
    klog.V(5).InfoS("Status after running PostFilter plugins for pod", "pod", klog.KObj(pod), "status", status)
}

If this change still doesn't look like the right approach, let's just close it instead.

@alculquicondor
Member

I think the previous approach was better.

> I don't think logging is really needed, as that is already done there; the problem is that the returned status is not an Error when an error happened in the current code base, which is another reason for the change.

I think this is a valid thing to fix.
I don't think iterating over the statuses is a big deal. Calculating the candidates is way more expensive.

Comment on lines 222 to 226
for _, nodeStatus := range nodeStatuses {
    if nodeStatus.Code() == framework.Error {
        errs = append(errs, nodeStatus.AsError())
    }
}
Member

Do this in DryRunPreemption instead of iterating again here.

Member Author

Done

@ahg-g
Member

ahg-g commented Oct 26, 2021

Calculating the candidates is more expensive, but those extra iterations do add up at scale; we should avoid them, especially when they usually end up being a no-op. We could create the error while iterating in DryRunPreemption.

@chendave
Member Author

> We could create the error while iterating in DryRunPreemption.

Done.
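
A minimal sketch of that shape inside DryRunPreemption, assuming the function now also returns an error and that `utilerrors` is k8s.io/apimachinery/pkg/util/errors; the exact merged signature may differ. The caller then wraps the returned error into a status, as in the diff excerpt below.

// Collect every internal error while iterating over the nodes (under the
// same lock that already protects nodeStatuses), then return one aggregate.
var errs []error
// ... inside the per-node check, while holding statusesLock:
if status.Code() == framework.Error {
    errs = append(errs, status.AsError())
}
// ... after all nodes have been processed:
return candidates, nodeStatuses, utilerrors.NewAggregate(errs)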

nodeStatuses[node] = status
candidates, nodeStatuses, err := ev.DryRunPreemption(ctx, pod, potentialNodes, pdbs, offset, numCandidates)
if err != nil {
    status = framework.AsStatus(fmt.Errorf("found error: %v", err.Error()))
Member

framework.AsStatus(err)

Member

Although I am not quite sure why findCandidates returns a status in the first place; it should simply return an error. But let's leave that for another day.

Member Author

findCandidates is not an exported method and no test case depends on it, so no big change is needed; I added one more commit to address this.
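
A sketch of what that follow-up amounts to; the parameter and result types are assumed from the snippets quoted earlier in this review:

// findCandidates is unexported, so it can return a plain error; the caller
// converts it into a *framework.Status only where needed.
func (ev *Evaluator) findCandidates(ctx context.Context, pod *v1.Pod, m framework.NodeToStatusMap) ([]Candidate, framework.NodeToStatusMap, error)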

@alculquicondor
Member

/approve
you can squash

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AaronWashington, alculquicondor, chendave

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Currently, the status code returned is `Unschedulable` when an internal error
is found; the `Unschedulable` status is built from a `FitError`, which means
no fit node was found, not that an internal error occurred.

Instead of building an Unschedulable status from the `FitError`, return the
Error status directly.

Signed-off-by: Dave Chen <dave.chen@arm.com>
@chendave
Member Author

Squash done.

@ahg-g
Member

ahg-g commented Oct 28, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 28, 2021
@chendave
Member Author

/unhold
/retest

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 28, 2021
@k8s-ci-robot k8s-ci-robot merged commit 87b0412 into kubernetes:master Oct 28, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Oct 28, 2021
@chendave chendave deleted the wrong_status branch October 28, 2021 03:05