
Fix destination returning expired IPs when endpoints are in namespace kube-system #4055 #4133

Closed
humboldt-xie wants to merge 1 commit

Conversation

humboldt-xie

destination returns expired IPs when endpoints are in namespace kube-system #4055

Signed-off-by: humboldt <humboldt_xie@163.com>
Member

@adleong left a comment


👍 Thanks, @humboldt-xie!

@alpeb
Member

alpeb commented Mar 5, 2020

@humboldt-xie I couldn't replicate the issue you describe in #4055.
What linkerd version are you on? Looking at the source code for edge-20.2.3, I can see the destination server's Get method follows a different code path that queries for the endpoints even if endpoints events for kube-system haven't been caught by addEndpoints.

The code path is:
destination.Get()->EndpointsWatcher.Subscribe()->servicePublisher.subscribe()->portPublisher.subscribe()->endpointTranslator.Add()

That being said, while testing your branch, I can see there's a continuous stream of endpoint events for kube-controller-manager and kube-scheduler:

time="2020-03-05T01:37:54Z" level=debug msg="Updating endpoints for kube-system/kube-scheduler" addr=":8086" component=service-publisher ns=kube-system svc=kube-scheduler
time="2020-03-05T01:37:56Z" level=debug msg="Updating endpoints for kube-system/kube-controller-manager" addr=":8086" component=service-publisher ns=kube-system svc=kube-controller-manager
time="2020-03-05T01:37:56Z" level=debug msg="Updating endpoints for kube-system/kube-scheduler" addr=":8086" component=service-publisher ns=kube-system svc=kube-scheduler
time="2020-03-05T01:37:58Z" level=debug msg="Updating endpoints for kube-system/kube-controller-manager" addr=":8086" component=service-publisher ns=kube-system svc=kube-controller-manager
time="2020-03-05T01:37:58Z" level=debug msg="Updating endpoints for kube-system/kube-scheduler" addr=":8086" component=service-publisher ns=kube-system svc=kube-scheduler
time="2020-03-05T01:38:00Z" level=debug msg="Updating endpoints for kube-system/kube-controller-manager" addr=":8086" component=service-publisher ns=kube-system svc=kube-controller-manager                                          
time="2020-03-05T01:38:00Z" level=debug msg="Updating endpoints for kube-system/kube-scheduler" addr=":8086" component=service-publisher ns=kube-system svc=kube-scheduler                                                             
time="2020-03-05T01:38:02Z" level=debug msg="Updating endpoints for kube-system/kube-controller-manager" addr=":8086" component=service-publisher ns=kube-system svc=kube-controller-manager

This appears to be a known issue: kubernetes/kubernetes#86286 and kubernetes/kubernetes#23812 -- That's probably why we had this filter in place.

If we still can't reproduce your issue, my recommendation would be to leave these filters in place (or better yet, remove them all like you did, and see if there's a way to use a selector to exclude them in the informer 😉 )
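
For example, just a sketch of the informer-selector idea (the function name and resync period are made up, this isn't wired into the linkerd2 codebase, and it assumes the API server accepts a metadata.namespace field selector for Endpoints): client-go lets you attach a tweak to the shared informer factory so kube-system objects are filtered out at the list/watch level and never reach our handlers at all.

package watcher

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// newFilteredFactory builds a factory whose list/watch calls carry a field
// selector excluding kube-system, so the noisy Endpoints are dropped by the
// API server rather than by our event handlers.
func newFilteredFactory(client kubernetes.Interface) informers.SharedInformerFactory {
	return informers.NewSharedInformerFactoryWithOptions(
		client,
		10*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = "metadata.namespace!=kube-system"
		}),
	)
}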

@humboldt-xie
Author

(quoting @alpeb's comment above)

Did you run

$ kubectl exec -it demo-client-7bd774d7d8-2xvp2 bash
[root@client-7bd774d7d8-2xvp2 client]# while true; do curl example.kube-system.svc.cluster.local; sleep 1; done

first?

@humboldt-xie
Author

@alpeb See the code on master.

If we create a new portPublisher, it lists fresh endpoints from the Kubernetes API:

func (sp *servicePublisher) newPortPublisher(srcPort Port, hostname string) *portPublisher {
	...
	endpoints, err := sp.k8sAPI.Endpoint().Lister().Endpoints(sp.id.Namespace).Get(sp.id.Name)
	...
	port.updateEndpoints(endpoints)
	...
}

If we subscribe to an existing portPublisher, it serves the endpoints it already has in memory, but they are never updated:

func (pp *portPublisher) subscribe(listener EndpointUpdateListener) {
        if pp.exists {
                if len(pp.pods.Pods) > 0 {
                        listener.Add(pp.pods)
                } else {
                        listener.NoEndpoints(true)
                }
        } else {
                listener.NoEndpoints(false)
        }
}

@humboldt-xie
Author

Those events are harmless even if they are not filtered.

Because nobody subscribes to those services, sp.ports is empty and the update is a no-op:

func (sp *servicePublisher) updateEndpoints(newEndpoints *corev1.Endpoints) {
	sp.Lock()
	defer sp.Unlock()
	sp.log.Debugf("Updating endpoints for %s", sp.id)

	for _, port := range sp.ports { // no one subscribes to kube-scheduler or kube-controller-manager
		port.updateEndpoints(newEndpoints)
	}
}

The only effect is that two extra servicePublisher structures sit in memory, and the debug messages keep being printed.

@adleong
Member

adleong commented Mar 5, 2020

I can reproduce this issue on stable-2.7.0.

After installing Linkerd, I deploy nginx to the kube-system namespace:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - name: http
              containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
  selector:
    app: nginx

Then I run a curl pod:

kubectl run curl --rm -it --image=appropriate/curl --restart=Never --command /bin/sh
# in that pod:
while true; do curl http://nginx.kube-system.svc.cluster.local; sleep 1; done

Then I roll the nginx deployment:

kubectl -n kube-system rollout restart deploy/nginx

The curl loop then hangs and never recovers.

Interestingly, the linkerd endpoints command yields the up-to-date endpoints. This is because each invocation of this command initiates a new subscription which fetches the current data, whereas the curl pod holds onto a persistent subscription and never receives updates from the kube-system namespace.
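
For reference, the check is just the CLI against the authority from this repro (command shape from memory of the stable-2.7 CLI, so treat it as approximate):

linkerd endpoints nginx.kube-system.svc.cluster.local:80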

@adleong
Member

adleong commented Mar 5, 2020

I agree with @alpeb that these constant debug log messages clog up the output and make the debug logs basically useless.

It's not the cleanest solution, but maybe we should simply skip updates from kube-scheduler and kube-controller-manager rather than for the entire kube-system namespace. WDYT?
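
Something along these lines, purely as a sketch (the helper name is made up and it isn't wired into the watcher here):

package watcher

import corev1 "k8s.io/api/core/v1"

// isLeaderElectionEndpoints reports whether an Endpoints object is one of the
// leader-election objects that kube-scheduler and kube-controller-manager
// update every couple of seconds; addEndpoints/updateEndpoints could return
// early for these instead of ignoring all of kube-system.
func isLeaderElectionEndpoints(ep *corev1.Endpoints) bool {
	if ep.Namespace != "kube-system" {
		return false
	}
	return ep.Name == "kube-scheduler" || ep.Name == "kube-controller-manager"
}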

@humboldt-xie
Author

(quoting @adleong's reproduction steps above)

Add:

  annotations:
    linkerd.io/inject: enabled

to your nginx deployment and the curl pod.

@humboldt-xie
Author

Maybe we can change the code from:

func (ew *EndpointsWatcher) addEndpoints(obj interface{}) {//and addService
	endpoints := obj.(*corev1.Endpoints)
	id := ServiceID{
		Namespace: endpoints.Namespace,
		Name:      endpoints.Name,
	}

	sp := ew.getOrNewServicePublisher(id)

	sp.updateEndpoints(endpoints)
}

to

func (ew *EndpointsWatcher) addEndpoints(obj interface{}) { //and addService
	endpoints := obj.(*corev1.Endpoints)
	id := ServiceID{
		Namespace: endpoints.Namespace,
		Name:      endpoints.Name,
	}

	sp, ok := ew.getServicePublisher(id)
	if ok {
		sp.updateEndpoints(endpoints)
	}
}

i.e. create the servicePublisher and fetch its endpoints only when someone subscribes.
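
The lookup-only helper assumed above could look roughly like this (hypothetical; the actual EndpointsWatcher field and lock names may differ):

// getServicePublisher returns the existing publisher for id, if any,
// without creating one as getOrNewServicePublisher does.
func (ew *EndpointsWatcher) getServicePublisher(id ServiceID) (*servicePublisher, bool) {
	ew.RLock()
	defer ew.RUnlock()
	sp, ok := ew.publishers[id]
	return sp, ok
}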

@humboldt-xie
Author

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: kube-system
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
        - name: nginx
          image: nginx
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
  selector:
    app: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: curl
  namespace: default
  annotations:
    linkerd.io/inject: enabled
  labels:
    app: curl
spec:
  containers:
  - name: curl
    imagePullPolicy: IfNotPresent
    image: appropriate/curl
    command: ['/bin/sh',"-c","sleep 10000;"]
---

Apply this YAML, then run:

kubectl exec -it curl sh 
# in that pod:
while true; do curl http://nginx.kube-system.svc.cluster.local; sleep 1; done

and:

kubectl scale --replicas=1 -n kube-system deploy/nginx
I tried it on minikube, where the pod IPs don't change on restart, so I scale the replicas from 3 down to 1; you can see that 3 IPs are still being returned.

@adleong
Member

adleong commented Mar 9, 2020

@humboldt-xie I'd rather not change the way that publishers and subscribers work in this PR. A more targeted change would be to simply ignore the events that we don't care about.

@grampelberg
Contributor

@humboldt-xie are you still working on this or should we close it out?

@adleong closed this Jun 15, 2020
JacobHenner added a commit to JacobHenner/linkerd2 that referenced this pull request Jun 21, 2022
Watch events for objects in the kube-system namespace were previously
ignored. In certain situations, this would cause the destination service
to return invalid (outdated) endpoints for services in kube-system -
including unmeshed services.

It was suggested [1] that kube-system events were ignored to avoid
handling frequent Endpoint updates - specifically from controllers using
Endpoints for leader elections [2]. As of Kubernetes 1.20, these
controllers default to using Leases instead of Endpoints for their
leader elections [3], obviating the need to exclude (or filter) updates
from kube-system. The exclusions have been removed accordingly.

[1]: linkerd#4133 (comment)
[2]: kubernetes/kubernetes#86286
[3]: kubernetes/kubernetes#94603

Signed-off-by: Jacob Henner <code@ventricle.us>
kleimkuhler pushed a commit that referenced this pull request Jul 11, 2022
Watch events for objects in the kube-system namespace were previously ignored.
In certain situations, this would cause the destination service to return
invalid (outdated) endpoints for services in kube-system - including unmeshed
services.

It [was suggested][1] that kube-system events were ignored to avoid handling
frequent Endpoint updates - specifically from [controllers using Endpoints for
leader elections][2]. As of Kubernetes 1.20, these controllers [default to using
Leases instead of Endpoints for their leader elections][3], obviating the need
to exclude (or filter) updates from kube-system. The exclusions have been
removed accordingly.

[1]: #4133 (comment)
[2]: kubernetes/kubernetes#86286
[3]: kubernetes/kubernetes#94603

Signed-off-by: Jacob Henner <code@ventricle.us>