
fix the issue that the pod anti-filtering rules are not taking effect #1395

Merged

Conversation

fanhaouu
Contributor

Currently, the "podMatchesInterPodAntiAffinity" method has an issue: it is unable to check whether the pod matches the anti-affinity rule of another pod that is already on the given node.

The root cause is that the "CheckPodsWithAntiAffinityExist" method in "predicates.go" can only check whether pods with anti-affinity conflicts exist on the same node.

This PR resolves that.
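
To illustrate, here is a rough sketch of the pre-fix flow, reconstructed for this description (not the verbatim upstream code): the node lookup is keyed on the candidate pod's own node, so a conflicting pod sitting on a different node is never evaluated against that other node's topology domain.

for namespace := range namespaces {
	for _, existingPod := range pods[namespace] {
		if existingPod.Name != pod.Name && podMatchesTermsNamespaceAndSelector(existingPod, namespaces, selector) {
			// Looks up the candidate pod's own node, not the node being evaluated for fit.
			node, ok := nodeMap[pod.Spec.NodeName]
			if !ok {
				continue
			}
			nodeHavingExistingPod, ok := nodeMap[existingPod.Spec.NodeName]
			if !ok {
				continue
			}
			// Only flags a violation when both nodes share the topology label value,
			// which misses conflicts that exist purely on the other node.
			if hasSameLabelValue(node, nodeHavingExistingPod, term.TopologyKey) {
				return true
			}
		}
	}
}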

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 11, 2024
@k8s-ci-robot
Contributor

Hi @fanhaouu. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 11, 2024
@ingvagabund
Contributor

ingvagabund commented May 13, 2024

@fanhaouu would you please share an example of this case? With nodeFit=true the pod anti-affinity check is completely skipped.

Currently, the "podMatchesInterPodAntiAffinity" method has an issue. It is unable to check if the pod matches the anti-affinity rule of another pod that is already on the given node.
The root cause is that the "CheckPodsWithAntiAffinityExist" method in "predicate.go" can only check if pods with AntiAffinity issues exist on the same node

Can you rephrase the description in terms of "a pod P1 on a node N1", "a pod P1 wrt. P2", etc.? The description is still ambiguous. It would help to start the description with "There are two nodes N1 and N2. Both pods P1 and P2 are scheduled. P1 to node N1, P2 to node N2. There's another pod P3 scheduled to node N? that violates anti-affinity of P?. The current algorithm does THIS for pod P? instead of THIS. etc.".

@fanhaouu
Contributor Author

fanhaouu commented May 13, 2024

@fanhaouu would you please share an example of this case? With nodeFit=true the pod anti-affinity check is completely skipped.

Currently, the "podMatchesInterPodAntiAffinity" method has an issue. It is unable to check if the pod matches the anti-affinity rule of another pod that is already on the given node.
The root cause is that the "CheckPodsWithAntiAffinityExist" method in "predicate.go" can only check if pods with AntiAffinity issues exist on the same node

Can you rephrase the description in terms of "a pod P1 on a node N1", "a pod P1 wrt. P2", etc.? The description is still ambiguous. It would help to start the description with "There are two nodes N1 and N2. Both pods P1 and P2 are scheduled. P1 to node N1, P2 to node N2. There's another pod P3 scheduled to node N? that violates anti-affinity of P?. The current algorithm does THIS for pod P? instead of THIS. etc.".

Sure, let me give you an example.

According to the current logic of podMatchesInterPodAntiAffinity, when determining whether pod1 can be scheduled to node2, this method returns match as false and error as nil. In reality, the match value should be true, meaning that pod1 cannot be scheduled to node2.

Pod1:

apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  nodeName: node1
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone

Node1:

apiVersion: v1
kind: Node
metadata:
  labels:
    topology.kubernetes.io/zone: xx-1
  name: node1

Pod2:

apiVersion: v1
kind: Pod
metadata:
  labels:
    security: S2
  name: pod2
spec:
  nodeName: node2
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone

Node2:

apiVersion: v1
kind: Node
metadata:
  labels:
    topology.kubernetes.io/zone: xx-1
  name: node2

@fanhaouu
Contributor Author

@ingvagabund is the example I provided clear enough?

}

if nodeFit {
if _, ok := node.Labels[term.TopologyKey]; ok {
Contributor

Here we still do not know whether pod and existingPod are violating the anti-affinity rule. Just that both pods are in anti-affinity relation. Checking whether pod is in a given topology domain is not sufficient here.

Contributor Author

Hi, master, I think it is sufficient here. If the program reaches this point, it means the podMatchesTermsNamespaceAndSelector method returned true. If the pod's topologyKey also exists in the labels of the node hosting the existingPod, it indicates that the pod and the existingPod are violating the anti-affinity rule.

Contributor

@ingvagabund ingvagabund May 22, 2024

In that case use nodeHavingExistingPod.Labels instead of node.Labels, as nodeHavingExistingPod is the target node, not node.
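
In code, the suggestion is roughly the following (a sketch, reusing the variable names discussed further down this thread):

nodeHavingExistingPod, ok := nodeMap[existingPod.Spec.NodeName]
if !ok {
	continue
}
// Check the topology label on the node that actually hosts existingPod,
// not on the candidate pod's own node.
if _, ok := nodeHavingExistingPod.Labels[term.TopologyKey]; ok {
	return true
}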

@ingvagabund
Contributor

ingvagabund commented May 22, 2024

If according to the current logic of podMatchesInterPodAntiAffinity, to determine if pod1 can be scheduled to node2, this method returns match as false and error as nil, but in reality, the match value should be true, meaning that pod1 cannot be scheduled to node2

So it's the nodefit part here that's in question? I.e. when pod1 is to be evicted you'd expect the descheduler to check whether there's a node where pod1 (resp. its clone) can be scheduled to?

but in reality, the match value should be true, meaning that pod1 cannot be scheduled to node2

If the match is true, the strategy is expected to evict the pod, independent of whether the pod can be scheduled to another node or not. If you want to implement the nodefit part for the strategy you can utilize sigs.k8s.io/descheduler/pkg/descheduler/node.PodFitsAnyOtherNode, as e.g. in:

if topologyBalanceNodeFit && !node.PodFitsAnyOtherNode(getPodsAssignedToNode, aboveToEvict[k], nodesBelowIdealAvg) {
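
As a sketch only (PodFitsAnyOtherNode and getPodsAssignedToNode are taken from the call above; the other names are made up for illustration):

// Skip eviction when the candidate pod would not fit on any other node.
if nodeFit && !node.PodFitsAnyOtherNode(getPodsAssignedToNode, podToEvict, availableNodes) {
	continue
}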

@ingvagabund
Contributor

Is this PR expected to resolve #1370?

@ingvagabund
Contributor

ingvagabund commented May 22, 2024

I see now what's going on:

Here, we evaluate whether both Pods are present on the Node. It only makes sense to check this for the plugin where this function originated from, RemovePodsViolatingInterPodAntiAffinity. That plugin checks if both Pods are on the same Node and whether to evict one of them, if they have matching antiaffinities. It does not make sense to do this when we’re evaluating if a Pod could fit on another Node as part of NodeFit.

You are addressing this part: "It does not make sense to do this when we’re evaluating if a Pod could fit on another Node as part of NodeFit."

@fanhaouu
Contributor Author

Is this PR expected to resolve #1370?

Yes, master, it can fix #1370.

@ingvagabund
Contributor

ingvagabund commented May 22, 2024

It should be assessing whether a Pod fits on another Node based on whether another Pod on this other Node has matching antiaffinities, but it returns early, due to the following issue.

Let's have P1 scheduled to N1 and P2 scheduled to N2. Pods P1 and P2 have matching anti-affinities. The check here is NodeFit(P1,N2) and the claim here is the check "returns early". Because of https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/utils/predicates.go#L358:

func hasSameLabelValue(node1, node2 *v1.Node, key string) bool {
	if node1.Name == node2.Name { // returns early
		return true
	}
...
}

So the goal here is to ignore node N1 in the check. N1 is retrieved due to accessing pod.Spec.NodeName in https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/utils/predicates.go#L326. Instead, just check whether P1 does/does not violate the anti-affinity rule (produced by P2) on node N2 by checking node N2's labels (through TopologyKey key).

@fanhaouu is this the current issue in hand? Just rephrasing so it is clear what this issue and fix is all about.

@nikimanoledaki for awareness.

@fanhaouu
Contributor Author

It should be assessing whether a Pod fits on another Node based on whether another Pod on this other Node has matching antiaffinities, but it returns early, due to the following issue.

Let's have P1 scheduled to N1 and P2 scheduled to N2. Pods P1 and P2 have matching anti-affinities. The check here is NodeFit(P1,N2) and the claim here is the check "returns early". Because of https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/utils/predicates.go#L358:

func hasSameLabelValue(node1, node2 *v1.Node, key string) bool {
	if node1.Name == node2.Name { // returns early
		return true
	}
...
}

So the goal here is to ignore node N1 in the check. N1 is retrieved due to accessing pod.Spec.NodeName in https://github.com/kubernetes-sigs/descheduler/blob/master/pkg/utils/predicates.go#L326. Instead, just check whether P1 does/does not violate the anti-affinity rule (produced by P2) on node N2 by checking node N2's labels (through TopologyKey key).

@fanhaouu is this the current issue in hand? Just rephrasing so it is clear what this issue and fix is all about.

@nikimanoledaki for awareness.

Yes, master, you are right.

@ingvagabund ingvagabund self-assigned this May 22, 2024
@ingvagabund
Contributor

ingvagabund commented May 22, 2024

It should be assessing whether a Pod fits on another Node based on whether another Pod on this other Node has matching antiaffinities, but it returns early, due to the following issue.

More on this. Again, let's have P1 scheduled to N1 and P2 scheduled to N2. Pods P1 and P2 have matching anti-affinities. The check here is NodeFit(P1,N2) and the claim here is that the check "returns early".

Given N1 != N2, hasSameLabelValue is true only when both N1 and N2 have the same Labels[TopologyKey] value, meaning hasSameLabelValue returns at the end. @nikimanoledaki would you please elaborate more on what "returns early" means in the context of #1370? I have spent too much time looking at the code; I am probably missing the obvious.

@ingvagabund
Contributor

ingvagabund commented May 23, 2024

Wrt. "returns early": In case CheckPodsWithAntiAffinityExist wants to detect that P1 is in violation with P2 on N2 while P1 and P2 are in "pod anti-affinity" relation and N2 has Labels[TopologyKey] (the actual value is irrelevant), CheckPodsWithAntiAffinityExist is expected to return true. Which can happen only when hasSameLabelValue returns true. Which is not the case when N1 (which is different from N2) leads to hasSameLabelValue returning false. Causing CheckPodsWithAntiAffinityExist returns false, which turns into NodeFit not erroring on podMatchesInterPodAntiAffinity.

So the following case leads to NodeFit(P1, N2) behaving incorrectly:

  1. P1 on N1, P2 on N2, P1 in pod anti-affinity with P2, N1 != N2,
  2. N2 has Labels[TopologyKey], N1 does not have Labels[TopologyKey]
  3. NodeFit(P1, N2) returns no error instead of "pod matches inter-pod anti-affinity rule of other pod on node", since N1 is missing Labels[TopologyKey], causing hasSameLabelValue to return false (sketched below)
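
A sketch of the rest of hasSameLabelValue, reconstructed from the behaviour described above (not the verbatim code), to show how a missing label on N1 short-circuits the comparison:

// Continuation after the node1.Name == node2.Name check (assumed shape).
value1, ok := node1.Labels[key]
if !ok {
	return false // N1 has no Labels[TopologyKey] -> caller reports "no violation"
}
value2, ok := node2.Labels[key]
if !ok {
	return false
}
return value1 == value2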

@ingvagabund
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 23, 2024
@likakuli

/cc

@k8s-ci-robot
Contributor

@likakuli: GitHub didn't allow me to request PR reviews from the following users: likakuli.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@nikimanoledaki
Contributor

nikimanoledaki commented May 23, 2024

Given N1 != N2 then hasSameLabelValue is true only when both N1 and N2 have the same Labels[TopologyKey] value.

This is correct - I came to the same conclusion: when there are multiple pods and multiple nodes (rather than P1 and P2 both on N1), and both pods have an anti-affinity, a valid check is whether they share the same TopologyKey.

I wrote a draft PR a while ago that I didn't have time to finish; it has more logging that I was using to debug in our environment and approaches this logic in a slightly different way: https://github.com/kubernetes-sigs/descheduler/pull/1415/files

@nikimanoledaki
Contributor

nikimanoledaki commented May 23, 2024

would you please elaborate more on what "returns early" means in context of #1370?

It returned early here:

// assuming that we have two pods, P1 and P2,
// and assuming that we have two nodes, N1 and N2, 
// where P1 is on N1 and P2 is on N2,
// the following will return early because it evaluates if P1 is on N2, but it isn't.
node, ok := nodeMap[pod.Spec.NodeName] 

As you can see in the PR I used to debug, here, I added the following log:

node, ok := nodeMap[pod.Spec.NodeName]
if !ok {
	klog.V(4).InfoS("antiaffinity check: pod being evaluated not in node", "pod", klog.KObj(pod), "node", klog.KObjs(node))
	continue
}

@ingvagabund
Contributor

node, ok := nodeMap[pod.Spec.NodeName]

Normally, the descheduler is expected to check only a pod that has been scheduled, which implies nodeMap[pod.Spec.NodeName] will always exist. Though I can see how CheckPodsWithAntiAffinityExist can be functionally tested over a pod that has not been scheduled yet.

@nikimanoledaki
Contributor

nikimanoledaki commented May 23, 2024

the descheduler is expected to check only a pod that has been scheduled. Which implies nodeMap[pod.Spec.NodeName] will always exist.

Hm not sure we are seeing the same thing. Taking a step back, we know that NodeFit should be able to check whether P1, which is on N1, can be scheduled on another node, N2. With this anti-affinity check, it should check if there is an existing Pod on N2, P2, which has anti-affinity with P1.

Currently, the code will check whether P1 and P2 are on N2. Of course, we know that P1 is not yet on N2, so node, ok := nodeMap[pod.Spec.NodeName] will always return !ok and then return early (continue). Therefore, we don't want to check this if we are coming from NodeFit where P1 is not on N2 (yet).

I think the issue is that this was a leftover/oversight on my part from #1356, where I tried to refactor this anti-affinity check out of RemovePodsViolatingInterPodAntiAffinity. That plugin compares P1 and P2 when they are already both on the same Node. On the other hand, when coming from NodeFit, it should be able to handle P1 being on N1 and P2 being on N2. I hope this makes some sense 🙈

@ingvagabund
Contributor

Currently, the code will check whether P1 and P2 are on N2. Of course, we know that P1 is not yet on N2, so node, ok := nodeMap[pod.Spec.NodeName] will always return !ok and then return early (continue). Therefore, we don't want to check this if we are coming from NodeFit where P1 is not on N2 (yet).

P1 is scheduled on N1 so I wonder why you get !ok when accessing nodeMap[P1.Spec.NodeName]

@nikimanoledaki
Contributor

nikimanoledaki commented May 23, 2024

P1 is scheduled on N1 so I wonder why you get !ok when accessing nodeMap[P1.Spec.NodeName]

Because the Node that we are checking for in nodeMap is in fact N2! It's the destination Node that we are evaluating against, not the origin Node...

It's checking if P1 is on N2 (which is not expected to be there yet, therefore it returns early) and then if P2 is on N2 (which we expect it to be).

At least, this is my understanding - I could be wrong.

@nikimanoledaki
Contributor

nikimanoledaki commented May 23, 2024

This is my understanding, let me know if this is different from yours

[Screenshot: 2024-05-23 at 15 55 50]

continue
}

node, ok := nodeMap[pod.Spec.NodeName]
Contributor

There is only one Node being passed when coming from NodeFit and that is the destination Node (N2). P1 is not on N2, so this will be !ok. We should only evaluate this if nodeFit == false.
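
A minimal sketch of that suggestion (names as in the surrounding diff):

// Only resolve the candidate pod's own node when not evaluating a NodeFit destination.
if !nodeFit {
	node, ok := nodeMap[pod.Spec.NodeName]
	if !ok {
		continue
	}
	// ... then compare node and nodeHavingExistingPod via hasSameLabelValue, as before
}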

Contributor Author

Thank you, we should use existingPod here; I have fixed it.

Contributor Author

Perhaps my PR misled @ingvagabund 😂

Contributor

@nikimanoledaki nikimanoledaki May 24, 2024

should use existingPod here

Yes! I agree about reversing to make sure that existingPod is on N2, even when assessing from NodeFit 👍

@fanhaouu fanhaouu force-pushed the fix-check-pod-anti-affinity branch 2 times, most recently from 65916fa to af9eab9 Compare May 24, 2024 03:50
@fanhaouu
Contributor Author

/retest

Contributor

@nikimanoledaki nikimanoledaki left a comment

LGTM! This PR should solve #1370 👍 Thank you @fanhaouu!

pkg/utils/predicates.go (outdated, resolved)
@@ -1067,6 +1067,7 @@ func TestCheckPodsWithAntiAffinityExist(t *testing.T) {
}
}),
},
nodeFit: false,
Contributor

You can omit nodeFit: false lines. false is the default value.

{
name: "found pod matching pod anti-affinity when node fitting",
pod: test.PodWithPodAntiAffinity(test.BuildTestPod("p1", 1000, 1000, "node1", nil), "foo", "bar"),
podsInNamespace: map[string][]*v1.Pod{
Contributor

Let's construct a complete map here and in the other test cases, i.e.:

podsInNamespace: map[string][]*v1.Pod{
	"default": {
		test.PodWithPodAntiAffinity(test.BuildTestPod("p1", 1000, 1000, "node1", nil), "foo", "bar"),
		test.PodWithPodAntiAffinity(test.BuildTestPod("p2", 1000, 1000, "node2", nil), "foo", "bar"),
	},
},

Missing a pod in podsInNamespace makes the pod non-existent.

Contributor Author

podsInNamespace refers to pods on nodes other than pod1's node, so it is complete.

test.PodWithPodAntiAffinity(test.BuildTestPod("p2", 1000, 1000, "node2", nil), "foo", "bar"),
},
},
nodeMap: map[string]*v1.Node{
Contributor

The same here:

nodeMap: map[string]*v1.Node{
	"node1": ...
	"node2": test.BuildTestNode("node2", 64000, 128*1000*1000*1000, 2, func(node *v1.Node) {
		node.ObjectMeta.Labels = map[string]string{
			"region": "main-region",
		}
	}),
},

Contributor Author

This is also not necessary: pod1 is on node1, and nodeMap contains the node where pod2 is located. Here we are writing test cases for when nodeFit is true.

expMatch: false,
},
{
name: "found pod matching pod anti-affinity when node fitting",
Contributor

Can you rephrase it in a more specific way? E.g. "Pod violates inter-pod anti-affinity, ignoring source pod domain" or similar.

The same for other test names. Better to be more expressive than generic.

@@ -310,7 +310,7 @@ func CreateNodeMap(nodes []*v1.Node) map[string]*v1.Node {
}

// CheckPodsWithAntiAffinityExist checks if there are other pods on the node that the current pod cannot tolerate.
func CheckPodsWithAntiAffinityExist(pod *v1.Pod, pods map[string][]*v1.Pod, nodeMap map[string]*v1.Node) bool {
func CheckPodsWithAntiAffinityExist(pod *v1.Pod, pods map[string][]*v1.Pod, nodeMap map[string]*v1.Node, nodeFit bool) bool {
Contributor

s/pod/candidatePod (or similar) to stress that the pod variable is the pod being tested for the inter-pod anti-affinity violation.

Worth renaming pods into e.g. placedPods, allocatedPods, assignedPods or similar to stress that these pods are already assigned to nodes.
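
For example, the signature could then read (illustrative only, keeping the new nodeFit parameter from this diff):

func CheckPodsWithAntiAffinityExist(candidatePod *v1.Pod, assignedPods map[string][]*v1.Pod, nodeMap map[string]*v1.Node, nodeFit bool) bool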

Contributor Author

no problem

continue
}

if nodeFit {
Contributor

The nodeFit name can be misleading here. NodeFit itself takes a pod and a node, and the predicates check whether the pod can be scheduled to the node. Here, there's no node passed through the function arguments. The nodeHavingExistingPod nodes are the nodes to which the already assigned pods are scheduled.

In case of NodeFit(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, node *v1.Node) the underlying podMatchesInterPodAntiAffinity function selects only pods that are assigned to node. Nevertheless, given CheckPodsWithAntiAffinityExist is a public function, for a caller setting nodeFit=true it's not easy to guess the original intention. For that, I'd prefer to separate the former CheckPodsWithAntiAffinityExist implementation from its invocation in NodeFit.

Given podMatchesInterPodAntiAffinity already checks for pod.Spec.Affinity == nil || pod.Spec.Affinity.PodAntiAffinity == nil you can replace return utils.CheckPodsWithAntiAffinityExist(pod, podsInANamespace, nodeMap), nil with what's left in CheckPodsWithAntiAffinityExist when nodeFit is true. I.e.:

func podMatchesInterPodAntiAffinity(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, node *v1.Node) (bool, error) {
	if pod.Spec.Affinity == nil || pod.Spec.Affinity.PodAntiAffinity == nil {
		return false, nil
	}

	podsOnNode, err := podutil.ListPodsOnANode(node.Name, nodeIndexer, nil)
	if err != nil {
		return false, fmt.Errorf("error listing all pods: %v", err)
	}

	podsInANamespace := podutil.GroupByNamespace(podsOnNode)
	nodeMap := utils.CreateNodeMap([]*v1.Node{node})

	/// This is new
	for _, term := range getPodAntiAffinityTerms(pod.Spec.Affinity.PodAntiAffinity) {
		namespaces := getNamespacesFromPodAffinityTerm(pod, &term)
		selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
		if err != nil {
			return false, fmt.Errorf("Unable to convert LabelSelector into Selector: %v", err)
		}
		for namespace := range namespaces {
			for _, existingPod := range podsInANamespace[namespace] {
				if existingPod.Name != pod.Name && podMatchesTermsNamespaceAndSelector(existingPod, namespaces, selector) {
					nodeHavingExistingPod, ok := nodeMap[existingPod.Spec.NodeName]
					if !ok {
						continue
					}
					if _, ok = nodeHavingExistingPod.Labels[term.TopologyKey]; ok {
						klog.V(1).InfoS("Found Pods matching PodAntiAffinity when node fitting", "pod with anti-affinity", klog.KObj(pod))
						return true, nil
					}
				}
			}
		}
	}
	return false, nil
}

You could turn

	for _, term := range getPodAntiAffinityTerms(pod.Spec.Affinity.PodAntiAffinity) {
		namespaces := getNamespacesFromPodAffinityTerm(pod, &term)
		selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
		if err != nil {
			return false, fmt.Errorf("Unable to convert LabelSelector into Selector: %v", err)
		}
		for namespace := range namespaces {
			for _, existingPod := range podsInANamespace[namespace] {

into a traversing function InterPodAntiAffinityTraverseFnc(pod, func(existingPod *v1.Pod, namespaces sets.Set[string], selector labels.Selector) bool) bool:

func InterPodAntiAffinityTraverseFnc(pod, func(existingPod *v1.Pod, namespaces sets.Set[string], selector labels.Selector) bool {
	if existingPod.Name != pod.Name && podMatchesTermsNamespaceAndSelector(existingPod, namespaces, selector) {
		nodeHavingExistingPod, ok := nodeMap[existingPod.Spec.NodeName]
		if !ok {
			return false
		}
		if _, ok = nodeHavingExistingPod.Labels[term.TopologyKey]; ok {
			klog.V(1).InfoS("Found Pods matching PodAntiAffinity when node fitting", "pod with anti-affinity", klog.KObj(pod))
			return true
		}
	}
	return false
});

Then

func podMatchesInterPodAntiAffinity(nodeIndexer podutil.GetPodsAssignedToNodeFunc, pod *v1.Pod, node *v1.Node) (bool, error) {
	if pod.Spec.Affinity == nil || pod.Spec.Affinity.PodAntiAffinity == nil {
		return false, nil
	}

	podsOnNode, err := podutil.ListPodsOnANode(node.Name, nodeIndexer, nil)
	if err != nil {
		return false, fmt.Errorf("error listing all pods: %v", err)
	}

	podsInANamespace := podutil.GroupByNamespace(podsOnNode)
	nodeMap := utils.CreateNodeMap([]*v1.Node{node})

	/// This is new
	return InterPodAntiAffinityTraverseFnc(pod, func(existingPod *v1.Pod, namespaces sets.Set[string], selector labels.Selector) bool {
		if existingPod.Name != pod.Name && podMatchesTermsNamespaceAndSelector(existingPod, namespaces, selector) {
			nodeHavingExistingPod, ok := nodeMap[existingPod.Spec.NodeName]
			if !ok {
				return false
			}
			if _, ok = nodeHavingExistingPod.Labels[term.TopologyKey]; ok {
				klog.V(1).InfoS("Found Pods matching PodAntiAffinity when node fitting", "pod with anti-affinity", klog.KObj(pod))
				return true
			}
		}
		return false
	});
}

and

// CheckPodsWithAntiAffinityExist checks if there are other pods on the node that the current pod cannot tolerate.
func CheckPodsWithAntiAffinityExist(pod *v1.Pod, pods map[string][]*v1.Pod, nodeMap map[string]*v1.Node) bool {
	if pod.Spec.Affinity == nil || pod.Spec.Affinity.PodAntiAffinity == nil {
		return false
	}

	match, err := InterPodAntiAffinityIterator(pod, func(existingPod *v1.Pod, namespaces sets.Set[string], selector labels.Selector) bool {
		if existingPod.Name != pod.Name && podMatchesTermsNamespaceAndSelector(existingPod, namespaces, selector) {
			node, ok := nodeMap[pod.Spec.NodeName]
			if !ok {
				return false
			}
			nodeHavingExistingPod, ok := nodeMap[existingPod.Spec.NodeName]
			if !ok {
				return false
			}
			if hasSameLabelValue(node, nodeHavingExistingPod, term.TopologyKey) {
				klog.V(1).InfoS("Found Pods matching PodAntiAffinity", "pod with anti-affinity", klog.KObj(pod))
				return true
			}
		}
		return false
	})
	if err != nil {
		klog.ErrorS(err, "Unable to convert LabelSelector into Selector")
		return false
	}
	return match
}

Or, even go further and make the if existingPod.Name != pod.Name && podMatchesTermsNamespaceAndSelector(existingPod, namespaces, selector) check part of the traversing function.
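
For example, a sketch of that variant (the helper name is made up; the loop body mirrors the code above):

// Traversal helper that already applies the common matching check; callers only
// decide what a "violation" means for the matched existingPod and term.
func forEachAntiAffinityMatch(pod *v1.Pod, podsInANamespace map[string][]*v1.Pod, fn func(existingPod *v1.Pod, term v1.PodAffinityTerm) bool) (bool, error) {
	for _, term := range getPodAntiAffinityTerms(pod.Spec.Affinity.PodAntiAffinity) {
		namespaces := getNamespacesFromPodAffinityTerm(pod, &term)
		selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
		if err != nil {
			return false, fmt.Errorf("unable to convert LabelSelector into Selector: %v", err)
		}
		for namespace := range namespaces {
			for _, existingPod := range podsInANamespace[namespace] {
				if existingPod.Name != pod.Name && podMatchesTermsNamespaceAndSelector(existingPod, namespaces, selector) {
					if fn(existingPod, term) {
						return true, nil
					}
				}
			}
		}
	}
	return false, nil
}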

Note: I will need to double check this tomorrow with a fresh pair of eyes.

Contributor Author

@fanhaouu fanhaouu May 29, 2024

If CheckPodsWithAntiAffinityExist remains a common method, it will still be prone to misuse and can trigger the same bug again; that doesn't resolve anything, it just glosses over the problem.

@ingvagabund
Contributor

@fanhaouu apologies for pushing the other commit. I mis-clicked.

@fanhaouu
Contributor Author

@fanhaouu apologies for pushing the other commit. I mis-clicked.

haha, let me resubmit it.

@fanhaouu fanhaouu force-pushed the fix-check-pod-anti-affinity branch from 12ac004 to c80556f Compare June 2, 2024 11:02
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 2, 2024
@fanhaouu
Contributor Author

fanhaouu commented Jun 2, 2024

/retest

@fanhaouu
Contributor Author

fanhaouu commented Jun 2, 2024

I made adjustments: instead of trying to reuse the CheckPodsWithAntiAffinityExist method, I implemented the podMatchesInterPodAntiAffinity method on its own, and also adjusted some parameter names and internal logic. @ingvagabund @nikimanoledaki
Looking forward to a re-review and a merge as soon as possible to avoid impacting users.

@ingvagabund
Contributor

Very nice!!! @fanhaouu thank you for your patience and improvements.
/approve
Leaving the final lgtm for @nikimanoledaki

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ingvagabund

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 3, 2024
Contributor

@nikimanoledaki nikimanoledaki left a comment

LGTM too! Thank you @fanhaouu for implementing this change and @ingvagabund for your suggestions and review! 🚀

@ingvagabund
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 3, 2024
@k8s-ci-robot k8s-ci-robot merged commit 748495a into kubernetes-sigs:master Jun 3, 2024
8 checks passed
k8s-ci-robot added a commit that referenced this pull request Jun 5, 2024
…#1412-#1413-#1416-#1395-upstream-release-1.30

Automated cherry pick of #1378: Fix the replicas type for the helm-chart
#1390: allow 'falsey' value in cmdOption
#1412: fix helm's default deschedulerPolicy
#1413: fix TOC location in Readme
#1416: use cmd context instead of using context.Background()
#1395: fix the issue that the pod anti-filtering rules are not