
NodePort service for hostNetwork pod fails when both Ready and Terminating pods are present #114440

Closed
jason-i-vv opened this issue Dec 13, 2022 · 14 comments
Assignees
Labels
kind/bug · needs-triage · sig/network · wg/multitenancy

Comments

@jason-i-vv

What happened?

We have a NodePort Service on port 30002 that forwards to a hostNetwork pod listening on port 8002. It has stopped working: we cannot reach nodeIP:30002, but nodeIP:8002 still responds.

Why use a NodePort at all when the pod already uses hostNetwork? Because the workload can run multiple replicas, and the NodePort keeps it reachable when one pod goes down.

  1. We checked the IPVS rules:
ipvsadm -L -n 
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.17.0.1:30002 rr
TCP  172.17.0.1:30009 rr
  -> 10.16.203.62:8002

As we can see, the virtual server for port 30002 has no destination (real server) at all.

  2. We checked the Endpoints status:

[screenshot of the Endpoints object, showing the same IP address listed in both Addresses and NotReadyAddresses]

As we can see, the same IP address appears in both Addresses and NotReadyAddresses.

There are several invalid (evicted) pods and one valid pod, which is why the Endpoints object looks like this.

What did you expect to happen?

If an IP address appears in both the ready and not-ready address sets, it should be removed from the not-ready set, so that valid IP addresses are not masked by invalid ones. Then the URL http://nodeIP:30002 would stay reachable whether or not the Deployment has invalid pods.
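
For illustration only, a minimal sketch of that rule over the core/v1 Endpoints types (the package and function name are hypothetical, not the actual controller code):

```go
package endpointsutil

import (
	v1 "k8s.io/api/core/v1"
)

// dropMaskedNotReady removes from NotReadyAddresses any IP that also appears
// in Addresses, so a ready address is never shadowed by a stale not-ready
// duplicate. Hypothetical helper, not the real Endpoints controller logic.
func dropMaskedNotReady(subset v1.EndpointSubset) v1.EndpointSubset {
	ready := make(map[string]bool, len(subset.Addresses))
	for _, a := range subset.Addresses {
		ready[a.IP] = true
	}
	var kept []v1.EndpointAddress
	for _, a := range subset.NotReadyAddresses {
		if !ready[a.IP] {
			kept = append(kept, a)
		}
	}
	subset.NotReadyAddresses = kept
	return subset
}
```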

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a hostNetwork pod pinned to a fixed node. The pod should have no resource requests or limits, so it is BestEffort and will be evicted first.
  2. Create a NodePort Service on port 30002 that selects the pod (a sketch of the equivalent objects follows this list).
  3. Manually trigger DiskPressure on the node, which is easily done by writing a large file with dd.
  4. The pod gets evicted.
  5. Remove the DiskPressure; a replacement pod is created.
  6. At this point you will reproduce it: the Endpoints object has the node IP in both the ready and not-ready address sets.
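
For reference, the objects from steps 1–2 expressed with the Go API types (names, image, and node name are placeholders; applying the equivalent YAML with kubectl works the same way):

```go
package reproducer

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// reproObjects builds the hostNetwork pod and the NodePort service from
// steps 1-2. All names are placeholders.
func reproObjects() (*corev1.Pod, *corev1.Service) {
	labels := map[string]string{"app": "hostnet-app"}
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "hostnet-app", Labels: labels},
		Spec: corev1.PodSpec{
			HostNetwork: true,
			NodeName:    "fixed-node", // pin the pod to one node
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "registry.example/app:latest", // placeholder; listens on 8002
				Ports: []corev1.ContainerPort{{ContainerPort: 8002}},
				// no resources -> BestEffort, so it is evicted first under DiskPressure
			}},
		},
	}
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "hostnet-app"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeNodePort,
			Selector: labels,
			Ports: []corev1.ServicePort{{
				Port:       8002,
				TargetPort: intstr.FromInt(8002),
				NodePort:   30002,
			}},
		},
	}
	return pod, svc
}
```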

Anything else we need to know?

No response

Kubernetes version

$ kubectl version
# v1.22.4

Cloud provider

OS version

# On Linux:
$ cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Tercel)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Tercel)"
ANSI_COLOR="0;31"
$ uname -a
Linux i-24AFE987 4.19.90-21.2.ky10.aarch64 #1 SMP Tue Oct 27 16:46:15 CST 2020 aarch64 aarch64 aarch64 GNU/Linux



Install tools


Container runtime (CRI) and version (if applicable)

containerd -v
containerd github.com/containerd/containerd v1.6.2 de8046a5501db9e0e478e1c10cbcfb21af4c6b2d


Related plugins (CNI, CSI, ...) and versions (if applicable)

@jason-i-vv added the kind/bug label Dec 13, 2022
@k8s-ci-robot added the needs-sig label Dec 13, 2022
@k8s-ci-robot
Contributor

@jason-i-vv: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage label Dec 13, 2022
@jason-i-vv
Author

/sig network

@k8s-ci-robot added the sig/network label and removed the needs-sig label Dec 13, 2022
@jason-i-vv
Author

/wg multitenancy

@k8s-ci-robot added the wg/multitenancy label Dec 13, 2022
@danwinship
Contributor

/retitle NodePort service for hostNetwork pod fails when both Ready and Terminating pods are present

The API docs say "No address will appear in both Addresses and NotReadyAddresses in the same subset" but I think that means "no completely identical EndpointAddress including TargetRef".

Also, kube-proxy doesn't actually operate on the Endpoints any more anyway, it operates on EndpointSlices, which are slightly different, and the distinction is whether the Terminating field is set or not.

I think probably we do want to indicate that there are simultaneous Ready and Terminating endpoints for the service on the same node, but the proxy code needs to be more careful to interpret this situation correctly...
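
For illustration, one way a proxy could interpret that situation over the discovery/v1 types (a sketch only, not the actual kube-proxy code): use ready endpoints whenever any exist, and fall back to serving-but-terminating ones only when nothing is ready.

```go
package proxysketch

import discoveryv1 "k8s.io/api/discovery/v1"

// usableEndpoints picks the endpoints a proxy should program for a service:
// ready endpoints win; terminating-but-still-serving endpoints are only a
// fallback when no endpoint is ready. Sketch only.
func usableEndpoints(eps []discoveryv1.Endpoint) []discoveryv1.Endpoint {
	var ready, terminatingServing []discoveryv1.Endpoint
	for _, ep := range eps {
		c := ep.Conditions
		isReady := c.Ready != nil && *c.Ready
		isServing := c.Serving != nil && *c.Serving
		isTerminating := c.Terminating != nil && *c.Terminating
		switch {
		case isReady && !isTerminating:
			ready = append(ready, ep)
		case isTerminating && isServing:
			terminatingServing = append(terminatingServing, ep)
		}
	}
	if len(ready) > 0 {
		return ready
	}
	return terminatingServing
}
```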

/cc @robscott

@k8s-ci-robot changed the title from "NodePort Service Lost Ipvs Target" to "NodePort service for hostNetwork pod fails when both Ready and Terminating pods are present" Dec 16, 2022
@robscott
Member

I'm not sure I'm completely understanding this, but it sounds like we may need to update the kube-proxy code that handles the case where more than one endpoint exists for the same IP:

// This logic ensures we're deduplicating potential overlapping endpoints
// isLocal should not vary between matching endpoints, but if it does, we
// favor a true value here if it exists.
if _, exists := endpointSet[endpointInfo.String()]; !exists || isLocal {
	endpointSet[endpointInfo.String()] = cache.makeEndpointInfo(endpointInfo, svcPortName)
}
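
One possible direction, sketched with simplified, hypothetical types (this is not the actual kube-proxy change): when two endpoints collapse to the same IP:port key, keep the ready, non-terminating one rather than whichever happened to be processed first.

```go
package proxysketch

// endpointEntry is a simplified, hypothetical stand-in for kube-proxy's
// per-endpoint info: just the IP:port key plus the conditions that matter here.
type endpointEntry struct {
	key         string // "IP:port"
	ready       bool
	terminating bool
}

// dedup keeps one entry per IP:port. On a collision, a ready, non-terminating
// entry replaces any other duplicate, so a terminating pod on the same host IP
// cannot shadow the ready one. (The real code also favors isLocal; omitted
// here for brevity.) Sketch only.
func dedup(entries []endpointEntry) map[string]endpointEntry {
	out := make(map[string]endpointEntry, len(entries))
	for _, e := range entries {
		existing, exists := out[e.key]
		if !exists || (e.ready && !e.terminating && !(existing.ready && !existing.terminating)) {
			out[e.key] = e
		}
	}
	return out
}
```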

@aojea
Member

aojea commented Dec 17, 2022

How can we reproduce it (as minimally and precisely as possible)?

  1. Create a hostNetwork pod pinned to a fixed node; the pod has no resource requests or limits, so it is BestEffort and will be evicted first.
  2. Create a NodePort Service on port 30002 that selects the pod.
  3. Manually trigger DiskPressure on the node, e.g. by writing a large file with dd.
  4. The pod gets evicted.
  5. Remove the DiskPressure; a replacement pod is created.
  6. At this point the Endpoints object has the node IP in both the ready and not-ready address sets.

A reproducer would be very useful. It seems like an issue with the endpoints controller, no?

There is an e2e test that forces a disruption and can serve as a basis for the reproducer:

// regression test for https://issues.k8s.io/109414 and https://issues.k8s.io/109718
ginkgo.It("should be rejected for evicted pods (no endpoints exist)", func(ctx context.Context) {
	namespace := f.Namespace.Name
	serviceName := "evicted-pods"
	jig := e2eservice.NewTestJig(cs, namespace, serviceName)
	nodes, err := e2enode.GetBoundedReadySchedulableNodes(cs, e2eservice.MaxNodesForEndpointsTests)
	framework.ExpectNoError(err)
	nodeName := nodes.Items[0].Name

@jason-i-vv
Author

As far as I know, kube-proxy checks the EndpointSlice conditions, so this issue could be solved by updating either the Endpoints controller or the EndpointSlice controller.

In the end, both approaches would result in the EndpointSlice condition being Ready when both Ready and Terminating pods are present.

My first approach is to update the Endpoints controller in my PR. Do you suggest another way? The current EndpointSlice looks like this:

Name:         bingokube-console-thdj9
Namespace:    kube-system
Labels:       app.kubernetes.io/managed-by=Helm
              endpointslice.kubernetes.io/managed-by=endpointslice-controller.k8s.io
              kubernetes.io/cluster-service=true
              kubernetes.io/service-name=bingokube-console
              name=bingokube-console
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2022-12-16T02:02:20Z
AddressType:  IPv4
Ports:
  Name               Port  Protocol
  ----               ----  --------
  bingokube-console  8002  TCP
Endpoints:
  - Addresses:  10.202.40.245
    Conditions:
      Ready:    true
    Hostname:   <unset>
    TargetRef:  Pod/bingokube-console-56f4684f76-6dmdq
    NodeName:   i-24afe987
    Zone:       <unset>
Events:         <none>

@jason-i-vv
Author

From TestPackSubsets:
{
			name: "two sets, two mixed ips, two dup ip with uid, dup port, wrong order, ends up with split addresses",
			given: []v1.EndpointSubset{{
				Addresses: []v1.EndpointAddress{{IP: "5.6.7.8"}},
				Ports:     []v1.EndpointPort{{Port: 111}},
			}, {
				NotReadyAddresses: []v1.EndpointAddress{{IP: "5.6.7.8", TargetRef: podRef("uid-1")}},
				Ports:             []v1.EndpointPort{{Port: 111}},
			}, {
				NotReadyAddresses: []v1.EndpointAddress{{IP: "1.2.3.4", TargetRef: podRef("uid-1")}},
				Ports:             []v1.EndpointPort{{Port: 111}},
			}, {
				NotReadyAddresses: []v1.EndpointAddress{{IP: "1.2.3.4"}},
				Ports:             []v1.EndpointPort{{Port: 111}},
			}},
			expect: []v1.EndpointSubset{{
				Addresses: []v1.EndpointAddress{
					{IP: "5.6.7.8"},
				},
				NotReadyAddresses: []v1.EndpointAddress{
					{IP: "1.2.3.4"},
					{IP: "1.2.3.4", TargetRef: podRef("uid-1")},
					{IP: "5.6.7.8", TargetRef: podRef("uid-1")},
				},
				Ports: []v1.EndpointPort{{Port: 111}},
			}},
		}

This test was written to expect the duplicate IP to end up in both the ready and not-ready sets, so I am not sure whether my idea of removing the duplicate IP from the not-ready set is correct.
@danwinship

@aojea
Member

aojea commented Dec 20, 2022

  • Create a hostNetwork pod pinned to a fixed node; the pod has no resource requests or limits, so it is BestEffort and will be evicted first.
  • Create a NodePort Service on port 30002 that selects the pod.
  • Manually trigger DiskPressure, e.g. by writing a large file with dd.
  • The pod gets evicted.
  • Remove the DiskPressure; a replacement pod is created.
  • The Endpoints object then has the node IP in both the ready and not-ready address sets.

Which version are you using?

I think that the problem here is that the endpoint is retaining the PodIP of the evicted pod and that is wrong.

This was fixed by #110255. Please check that you are running a version with that patch included; you seem to be running 1.22.4 and this was fixed in v1.22.10, so you should update to the latest stable version.

@jason-i-vv
Author

Got it, I will check it.

@thockin
Member

thockin commented Jan 5, 2023

Any update here?

@jason-i-vv
Author

Still checking against release-1.22.

@thockin thockin closed this as completed Jan 19, 2023
@jason-i-vv
Author

@thockin I have checked the stable release-1.22 version, and this problem was indeed solved. Sorry for taking so long.

@thockin
Member

thockin commented Jan 21, 2023 via email
