Fix filter plugins are not been called during preemption #81876

Conversation

@wgliang (Contributor) commented Aug 24, 2019

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #81866

Special notes for your reviewer:
@alculquicondor

Does this PR introduce a user-facing change?:

Fix filter plugins are not been called during preemption

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 24, 2019
@wgliang (Contributor Author) commented Aug 24, 2019

/sig scheduling
/priority important-soon

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 24, 2019
@wgliang wgliang force-pushed the bugfix/filter-plugins-are-not-been-called-during-preemption branch 2 times, most recently from 5a62ce6 to 8cc86ce Compare August 24, 2019 02:26
@draveness (Contributor) left a comment

@@ -1121,7 +1136,7 @@ func selectVictimsOnNode(
// inter-pod affinity to one or more victims, but we have decided not to
// support this case for performance reasons. Having affinity to lower
// priority pods is not a recommended configuration anyway.
if fits, _, err := podFitsOnNode(pod, meta, nodeInfoCopy, fitPredicates, queue, false); !fits {
if fits, _, err, _ := podFitsOnNode(pluginContext, pod, meta, nodeInfoCopy, fitPredicates, queue, false, filterPluginsFunc); !fits {
Contributor:

I think the following is correct since the err is the last return value

Suggested change
if fits, _, err, _ := podFitsOnNode(pluginContext, pod, meta, nodeInfoCopy, fitPredicates, queue, false, filterPluginsFunc); !fits {
if fits, _, _, err := podFitsOnNode(pluginContext, pod, meta, nodeInfoCopy, fitPredicates, queue, false, filterPluginsFunc); !fits {

Contributor Author:

done

pod *v1.Pod,
meta predicates.PredicateMetadata,
info *schedulernodeinfo.NodeInfo,
predicateFuncs map[string]predicates.FitPredicate,
queue internalqueue.SchedulingQueue,
alwaysCheckAllPredicates bool,
) (bool, []predicates.PredicateFailureReason, error) {
filterPluginsFunc func(pc *framework.PluginContext, pod *v1.Pod, nodeName string) *framework.Status,
Contributor:

why are you using the function as a parameter? what about the framework?

Contributor Author:

Thanks @draveness
IMO, for now it is sufficient to pass filterPluginsFunc, which seems better from the perspective of keeping the function implementation and its tests minimal.

Member:

On the other hand, this approach has the following drawbacks:

  • Harder to read; IDEs can't find all the references to Framework.RunFilterPlugins
  • We have repeated code for the function prototype. Refactoring takes longer.

If we have a proper fake Framework, testing shouldn't be a problem. Also, we should prefer to test exported interfaces.

Contributor:

If we have a proper fake Framework, testing shouldn't be a problem. Also, we should prefer to test exported interfaces.

We do have a fakeFramework in scheduling_queue_test.go

Member:

I don't see a value in passing the function as a parameter, it is not easy to follow. Make podFitsOnNode, selectVictimsOnNode and selectNodesForPreemption private methods of genericScheduler, and that will give you access to the framework.

Contributor Author:

@ahg-g Agree with you, one more parameter is really unnecessary, whether it is framework or filterPluginsFunc.
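
For illustration, a minimal sketch (not the merged code) of the method-receiver approach suggested above, assuming the existing genericScheduler type and the four-value return discussed later in this PR; the predicate loop is elided:

func (g *genericScheduler) podFitsOnNode(
	pluginContext *framework.PluginContext,
	pod *v1.Pod,
	meta predicates.PredicateMetadata,
	info *schedulernodeinfo.NodeInfo,
	predicateFuncs map[string]predicates.FitPredicate,
	queue internalqueue.SchedulingQueue,
	alwaysCheckAllPredicates bool,
) (bool, []predicates.PredicateFailureReason, *framework.Status, error) {
	var failedPredicates []predicates.PredicateFailureReason
	// (fit predicate evaluation elided)

	// The receiver gives direct access to the framework, so no
	// filterPluginsFunc parameter is needed.
	status := g.framework.RunFilterPlugins(pluginContext, pod, info.Node().Name)

	fit := len(failedPredicates) == 0 && status.IsSuccess()
	return fit, failedPredicates, status, nil
}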

nodeNameToInfo map[string]*schedulernodeinfo.NodeInfo,
potentialNodes []*v1.Node,
fitPredicates map[string]predicates.FitPredicate,
metadataProducer predicates.PredicateMetadataProducer,
queue internalqueue.SchedulingQueue,
pdbs []*policy.PodDisruptionBudget,
filterPluginsFunc func(pc *framework.PluginContext, pod *v1.Pod, nodeName string) *framework.Status,
Contributor:

same here

@@ -525,7 +515,15 @@ func (g *genericScheduler) findNodesThatFit(pluginContext *framework.PluginConte
}
} else {
predicateResultLock.Lock()
failedPredicateMap[nodeName] = failedPredicates
if status != nil && !status.IsSuccess() {
Contributor:

nit: !status.IsSuccess() won't panic even if the status is nil, please take a look at the implementation of IsSuccess function.

Contributor Author:

Sorry, I'm not sure I understand what you mean. Do you mean that when the status is nil, calling status.IsSuccess() won't panic?
nil is untyped; my understanding is that this would get an error similar to a "use of untyped nil" panic.

If my understanding is wrong, please let me know. Thanks. :)

Contributor:

Sorry, I'm not sure I understand what you mean. Do you mean that when the status is nil, calling status.IsSuccess() won't panic?

Please take a look at https://github.com/kubernetes/kubernetes/pull/81876/files#diff-c237cdd9e4cb201118ca380732d7f361L509

@draveness (Contributor) commented Aug 25, 2019:

It is a typed nil. I created a snippet here which shows it won't cause a panic.

package main

import (
	"fmt"
)

type Status struct{}

func (s *Status) IsSuccess() bool {
	if s == nil {
		return true
	}
	return false
}

func main() {
	st := returnStatus()
	fmt.Println(st.IsSuccess())
}

func returnStatus() *Status {
	return nil
}

$ go run ...
true

Contributor Author:

cool! I got it.

@hex108 (Contributor) commented Aug 24, 2019

As mentioned in comment, how about converting podFitsOnNode to default filter plugin, then genericScheduler.Preempt calls framework.RunFilterPlugins instead of podFitsOnNode?

@draveness (Contributor) commented Aug 24, 2019

As mentioned in comment, how about converting podFitsOnNode to default filter plugin, then genericScheduler.Preempt calls framework.RunFilterPlugins instead of podFitsOnNode?

There could be a lot of compatibility work to do since they use two different error systems. I tried it once, but it caused a lot of trouble and looked quite like a hack; this PR is a fine approach to solving the issue, IMO.

@draveness (Contributor):
cc/ @alculquicondor

@wgliang wgliang force-pushed the bugfix/filter-plugins-are-not-been-called-during-preemption branch 2 times, most recently from c296483 to 7fb9e23 Compare August 25, 2019 23:47
@@ -137,7 +137,7 @@ type ScheduleAlgorithm interface {
// the pod by preempting lower priority pods if possible.
// It returns the node where preemption happened, a list of preempted pods, a
// list of pods whose nominated node name should be removed, and error if any.
Preempt(*v1.Pod, error) (selectedNode *v1.Node, preemptedPods []*v1.Pod, cleanupNominatedPods []*v1.Pod, err error)
Preempt(*framework.PluginContext, *v1.Pod, error) (selectedNode *v1.Node, preemptedPods []*v1.Pod, cleanupNominatedPods []*v1.Pod, err error)
Member:

pod and then pluginContext? I prefer we are consistent with Schedule

Contributor:

I prefer to keep the context as the first argument; there is a chance it will conform to context.Context.

Contributor Author:

@alculquicondor
FYI:https://golang.org/pkg/context/
The Context should be the first parameter

Member:

This is not currently an implementation of context.Context, so it doesn't apply. That said, we should consider rearranging the parameters in Schedule along with Preempt. But that shouldn't be in this PR.

Member:

+1 to Aldo's suggestion regarding rearranging the Schedule to be consistent with Preempt in a separate PR.

Contributor Author:

Ok. I will do that. :)

Contributor:

IMO we could accept this since we have already reached a consensus on conforming the Context in #81433 and #82072 is making the change.

// Iterate each plugin to verify current node
status := filterPluginsFunc(pluginContext, pod, info.Node().Name)
if !status.IsSuccess() {
Member:

This doesn't seem necessary. We can directly return, right?

Contributor:

It would be clearer to separate the good and bad cases IMO

Member:

the return value is the same, I don't see the point.

Member:

I don't see the need for the returned boolean, whether or not the pod fits on the node is easily determined by the other returned values, so I suggest we just return failedPredicates, status, nil here.

Contributor Author:

Thanks for the review @ahg-g. IMO, we need to return a bool value to indicate whether the current node fits.

The result cannot be directly determined from failedPredicates or err alone (because of L663 and L681). Indeed, we could check both len(failedPredicates) and err, but that undoubtedly adds complexity to the check; returning the bool keeps the logic more readable and easier to understand.

@@ -1121,7 +1136,7 @@ func selectVictimsOnNode(
// inter-pod affinity to one or more victims, but we have decided not to
// support this case for performance reasons. Having affinity to lower
// priority pods is not a recommended configuration anyway.
if fits, _, err := podFitsOnNode(pod, meta, nodeInfoCopy, fitPredicates, queue, false); !fits {
if fits, _, _, err := podFitsOnNode(pluginContext, pod, meta, nodeInfoCopy, fitPredicates, queue, false, filterPluginsFunc); !fits {
if err != nil {
Member:

what about status? That could be an error as well.

Contributor Author:

DONE

@@ -1330,7 +1330,8 @@ func TestSelectNodesForPreemption(t *testing.T) {
newnode := makeNode("newnode", 1000*5, priorityutil.DefaultMemoryRequest*5)
newnode.ObjectMeta.Labels = map[string]string{"hostname": "newnode"}
nodes = append(nodes, newnode)
nodeToPods, err := selectNodesForPreemption(test.pod, nodeNameToInfo, nodes, test.predicates, PredicateMetadata, nil, nil)
pluginContext := framework.NewPluginContext()
nodeToPods, err := selectNodesForPreemption(pluginContext, test.pod, nodeNameToInfo, nodes, test.predicates, PredicateMetadata, nil, nil, func(pc *framework.PluginContext, pod *v1.Pod, nodeName string) *framework.Status { return nil })
Member:

Can we add a test for what we are adding? preemption when the filter plugins keep filtering the nodes.

Member:

Did you work on this?

Contributor Author:

Sorry, I missed it. If I am not mistaken, the test case you are talking about is meant to confirm that the filter plugins are also executed when preemption occurs.

That seems less like a new test case and more like checking the number of times the filter plugins are executed.

I will check filterPlugin.numFilterCalled against the expected value in TestSelectNodesForPreemption. What do you think of this?

Member:

that sounds good. But you could also check that a plugin responds Unschedulable in the preemption phase and then the node is not returned in the list.

Contributor Author:

DONE
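
For illustration, a rough sketch of the kind of test-only filter plugin being discussed, assuming the v1.16-era framework.FilterPlugin interface; the field names (numFilterCalled, failedNodeName) follow this conversation and the actual test file may differ:

// FilterPlugin counts Filter calls and can reject a chosen node, so a test can
// assert both that filters run during preemption and that an Unschedulable node
// is excluded from the preemption candidates.
type FilterPlugin struct {
	numFilterCalled int
	failedNodeName  string
}

// Name returns the name used to register the plugin in a framework.Registry.
func (fp *FilterPlugin) Name() string { return "filter-plugin" }

// Filter records the call and marks the configured node Unschedulable.
func (fp *FilterPlugin) Filter(pc *framework.PluginContext, pod *v1.Pod, nodeName string) *framework.Status {
	fp.numFilterCalled++
	if nodeName == fp.failedNodeName {
		return framework.NewStatus(framework.Unschedulable, "rejected by test filter plugin")
	}
	return nil // a nil *Status is treated as Success
}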

@@ -349,14 +349,14 @@ func (sched *Scheduler) schedule(pod *v1.Pod, pluginContext *framework.PluginCon
// preempt tries to create room for a pod that has failed to schedule, by preempting lower priority pods if possible.
// If it succeeds, it adds the name of the node where preemption has happened to the pod spec.
// It returns the node name and an error if any.
func (sched *Scheduler) preempt(fwk framework.Framework, preemptor *v1.Pod, scheduleErr error) (string, error) {
func (sched *Scheduler) preempt(pluginContext *framework.PluginContext, fwk framework.Framework, preemptor *v1.Pod, scheduleErr error) (string, error) {
Member:

the framework should be available in sched.Framework. Pass pluginContext as the second argument for consistency with schedule

Contributor Author:

As in my reply above, I would actually prefer to pass pluginContext as the first argument and update schedule to match.

Member:

I don't disagree, but please do it in a separate PR.

Contributor Author:

Sure :)

Member:

FYI, this is being handled in #82072

Member:

We should do this in a separate small PR because we don't know if #82072 will move forward or not, and if yes in what form, so it may take a little while to merge.

@ahg-g (Member) left a comment

Thanks for sending this PR, I took a quick look, I will take another look once the comments are addressed.

failedPredicateMap[nodeName] = failedPredicates
if !status.IsSuccess() {
filteredNodesStatuses[nodeName] = status
if status.Code() != framework.Unschedulable {
Member:

podFitsOnNode should return an error if this happens, so we shouldn't check here again.

Member:

this is not addressed yet.

Contributor Author:

DONE,Thanks!

}

// Iterate each plugin to verify current node
status := filterPluginsFunc(pluginContext, pod, info.Node().Name)
Member:

we should run the filter plugins inside the for i := 0; i < 2; i++ { loop (at the end) so that they get examined with and without nominated pods.

Also, return error in case status is not one of the expected values (not Success or Unschedulable).

Member:

Another thing we need to think about is whether or not we should pass the alwaysCheckAllPredicates flag to RunFilterPlugins.

Contributor Author:

I suggest a separate PR to complete this.

Member:

We don't need a separate PR for the first comment, this is part of addressing the bug.

Contributor Author:

Yeah!
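
For context, a rough sketch of the shape being discussed (not the merged code verbatim): the framework filter plugins run inside podFitsOnNode's existing two-pass loop, so each node is checked both with and without the higher-priority nominated pods, and any status other than Success or Unschedulable is surfaced as an error. The names below mirror the diff in this PR; the elided pieces are the existing predicate logic.

for i := 0; i < 2; i++ {
	if i == 0 {
		// First pass: evaluate as if higher/equal-priority nominated pods
		// were already running on the node (existing addNominatedPods logic, elided).
	}
	// (fit predicate evaluation elided; it appends to failedPredicates)

	// Run the framework filter plugins on both passes as well.
	status = g.framework.RunFilterPlugins(pluginContext, pod, info.Node().Name)
	if !status.IsSuccess() && !status.IsUnschedulable() {
		// Filters are only expected to return Success or Unschedulable;
		// anything else is an internal error.
		return false, failedPredicates, status, status.AsError()
	}
}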


@wgliang wgliang force-pushed the bugfix/filter-plugins-are-not-been-called-during-preemption branch 2 times, most recently from cd2333d to f078b59 Compare August 29, 2019 09:08

// Iterate each plugin to verify current node
status = g.framework.RunFilterPlugins(pluginContext, pod, info.Node().Name)
if !status.IsSuccess() && status.Code() != framework.Unschedulable {
Member:

use status.IsUnschedulable(), or even simpler, what about just if status.Code() == framework.Error?

Member:

we should use !status.IsUnschedulable() not status.Code() == framework.Error because Success and Unschedulable* are the only two valid statuses that we expect from filters; all other statuses should be considered errors.

if err != nil {
klog.Warningf("Encountered error while selecting victims on node %v: %v", nodeInfo.Node().Name, err)
}

if !status.IsSuccess() {
Member:

It looks like you already handle it in podFitsOnNode, when you return status.AsError()

Contributor Author:

The status may be Unschedulable here.

Member:

My understanding is that these warnings are for actual errors. Also, if the filters failed, this conditional and the one above would both produce the same output, which is bad for troubleshooting when looking at logs.

Member:

I agree, we shouldn't have this warning.

@@ -653,7 +653,7 @@ func TestGenericScheduler(t *testing.T) {
schedulerapi.DefaultPercentageOfNodesToScore,
false)
result, err := scheduler.Schedule(test.pod, framework.NewPluginContext())
if !reflect.DeepEqual(err, test.wErr) {
if err != nil && !reflect.DeepEqual(err, test.wErr) {
Member:

why is this change necessary?

Contributor Author:

Because the returned err may be nil, otherwise reflect.DeepEqual(err, test.wErr) in test will panic.

Contributor:

reflect.DeepEqual(err, test.wErr) in test will panic

No, it won't panic.
https://play.golang.org/p/nLXk5MDyc7k

Contributor Author:

In this line of code, I did get a nil pointer error in my development environment, which is why I originally made this change. But I can no longer reproduce it, which is strange.

I have reverted this change; thanks to all of you for reminding me.

@@ -1307,6 +1307,9 @@ func TestSelectNodesForPreemption(t *testing.T) {
labelKeys := []string{"hostname", "zone", "region"}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
g := &genericScheduler{
framework: emptyFramework,
Member:

Can we add some tests with a non-empty framework? Basically, to have a regression test.

Contributor Author:

TestSelectNodesForPreemption mainly tests selectNodesForPreemption, so a non-empty framework did not seem necessary there. However, I have changed it to a non-empty framework. If I have misunderstood your comment, please let me know, thanks. :)


}

return len(failedPredicates) == 0, failedPredicates, nil
return true, failedPredicates, status, nil
Member:

lines 682-686 should be replaced with:

fit := len(failedPredicates) == 0 && status.IsSuccess()
return fit, failedPredicates, status, nil

Contributor Author:

SGTM, DONE thanks!

@@ -676,9 +671,19 @@ func podFitsOnNode(
}
}
}

// Iterate each plugin to verify current node
Member:

please remove this comment, it does not read well.

Contributor Author:

DONE

filteredNodesStatuses[nodeName] = status
} else {
failedPredicateMap[nodeName] = failedPredicates
}
Member:

we should add both:

if !status.IsSuccess() {
  filteredNodesStatuses[nodeName] = status
} 
if len(failedPredicates) > 0 {
  failedPredicateMap[nodeName] = failedPredicates
}

Contributor Author:

Fixed

@k8s-ci-robot (Contributor):
@ahg-g: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your SIG or subproject leads and have them propose you as an additional delegate for this responsibility.

In response to this:

/milestone v1.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ahg-g (Member) commented Aug 30, 2019

@k82cn can you please add milestone 1.16 for this one as well? this is a bug fix.

@alculquicondor (Member):
You can rebase now

@wgliang wgliang force-pushed the bugfix/filter-plugins-are-not-been-called-during-preemption branch from f340772 to 6da5f9e Compare August 31, 2019 00:15
@wgliang (Contributor Author) commented Aug 31, 2019

You can rebase now

Ok, done

@k82cn (Member) commented Sep 2, 2019

/milestone v1.16

@k8s-ci-robot k8s-ci-robot added this to the v1.16 milestone Sep 2, 2019
@alculquicondor (Member):
Implementation LGTM. Let's just add the regression test.

@wgliang wgliang force-pushed the bugfix/filter-plugins-are-not-been-called-during-preemption branch from 6da5f9e to 892ddf5 Compare September 5, 2019 12:15
@alculquicondor (Member):
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 5, 2019
@@ -1119,14 +1119,31 @@ var startTime20190107 = metav1.Date(2019, 1, 7, 1, 1, 1, 0, time.UTC)
// that podsFitsOnNode works correctly and is tested separately.
func TestSelectNodesForPreemption(t *testing.T) {
defer algorithmpredicates.SetPredicatesOrderingDuringTest(order)()

filterPluginRegistry := framework.Registry{filterPlugin.Name(): NewFilterPlugin}
Member:

we faced problems in the past when using shared variables for the plugin instances, it would be great if we can change the way we instantiate plugins to be similar to what we did in integration tests:

define a generic function for instantiation:

// newPlugin returns a plugin factory with specified Plugin.
func newPlugin(plugin framework.Plugin) framework.PluginFactory {
	return func(_ *runtime.Unknown, fh framework.FrameworkHandle) (framework.Plugin, error) {
		return plugin, nil
	}
}

and here, you can just do:

filterPlugin := &FilterPlugin{}
registry := framework.Registry{filterPluginName: newPlugin(filterPlugin)}

The above should allow you to remove the global instance filterPlugin

Contributor Author:

DONE

@guineveresaenger (Contributor):
@ahg-g @alculquicondor @wgliang

I'm the Release Lead Shadow for 1.16 release.

This is an outstanding 1.16 PR and needs to be merged before EOD Monday 9/9 in time for our rc cut.

It looks like according to @ahg-g 's review that this isn't actually marked lgtm? Can someone update with an expected time of completion? Thank you! ❤️

@alculquicondor (Member):
It looks like according to @ahg-g 's review that this isn't actually marked lgtm? Can someone update with an expected time of completion? Thank you!

Exactly, @wgliang is going to handle the latest minor suggestion

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 5, 2019
@guineveresaenger (Contributor):
do you have a reasonable notion of when this will be done, @alculquicondor? one thing that's missing is an Approver, it might be helpful to find someone so we can 🚢 it!

@ahg-g (Member) commented Sep 5, 2019

I can approve it.

@wgliang (Contributor Author) commented Sep 6, 2019

/retest

@wgliang wgliang force-pushed the bugfix/filter-plugins-are-not-been-called-during-preemption branch from 892ddf5 to d84a75c Compare September 6, 2019 01:59
@wgliang (Contributor Author) commented Sep 6, 2019

@ahg-g @alculquicondor
Already updated all comments. Thank you for your review. :)

@ahg-g (Member) commented Sep 6, 2019

/lgtm
/approve
Thanks @wgliang

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2019
@k8s-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, wgliang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 6, 2019
@k8s-ci-robot k8s-ci-robot merged commit c58767b into kubernetes:master Sep 6, 2019
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Successfully merging this pull request may close these issues.

[Framework] Filter plugins are not been called during preemption
9 participants