
Fix destination returning expired IPs when endpoints are in namespace kube-system #4055 #4133

Closed
humboldt-xie wants to merge 1 commit

Conversation

humboldt-xie

destination returns expired IPs when endpoints are in namespace kube-system #4055

Signed-off-by: humboldt <humboldt_xie@163.com>
Member

@adleong left a comment


👍 Thanks, @humboldt-xie!

@alpeb
Member

alpeb commented Mar 5, 2020

@humboldt-xie I couldn't replicate the issue you describe in #4055.
What linkerd version are you on? Looking at the source code for edge-20.2.3, I can see the destination server's Get method follows a different code path that queries for the endpoints even if endpoints events for kube-system haven't been caught by addEndpoints.

The code path is:
destination.Get()->EndpointsWatcher.Subscribe()->servicePublisher.subscribe()->portPublisher.subscribe()->endpointTranslator.Add()

That being said, while testing your branch, I can see there's a continuous stream of endpoint events for kube-controller-manager and kube-scheduler:

time="2020-03-05T01:37:54Z" level=debug msg="Updating endpoints for kube-system/kube-scheduler" addr=":8086" component=service-publisher ns=kube-system svc=kube-scheduler
time="2020-03-05T01:37:56Z" level=debug msg="Updating endpoints for kube-system/kube-controller-manager" addr=":8086" component=service-publisher ns=kube-system svc=kube-controller-manager
time="2020-03-05T01:37:56Z" level=debug msg="Updating endpoints for kube-system/kube-scheduler" addr=":8086" component=service-publisher ns=kube-system svc=kube-scheduler
time="2020-03-05T01:37:58Z" level=debug msg="Updating endpoints for kube-system/kube-controller-manager" addr=":8086" component=service-publisher ns=kube-system svc=kube-controller-manager
time="2020-03-05T01:37:58Z" level=debug msg="Updating endpoints for kube-system/kube-scheduler" addr=":8086" component=service-publisher ns=kube-system svc=kube-scheduler
time="2020-03-05T01:38:00Z" level=debug msg="Updating endpoints for kube-system/kube-controller-manager" addr=":8086" component=service-publisher ns=kube-system svc=kube-controller-manager                                          
time="2020-03-05T01:38:00Z" level=debug msg="Updating endpoints for kube-system/kube-scheduler" addr=":8086" component=service-publisher ns=kube-system svc=kube-scheduler                                                             
time="2020-03-05T01:38:02Z" level=debug msg="Updating endpoints for kube-system/kube-controller-manager" addr=":8086" component=service-publisher ns=kube-system svc=kube-controller-manager

This appears to be a known issue: kubernetes/kubernetes#86286 and kubernetes/kubernetes#23812 -- That's probably why we had this filter in place.

If we still can't reproduce your issue, my recommendation would be to leave these filters in place (or better yet, remove them all like you did, and see if there's a way to use a selector to exclude them in the informer 😉 )
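
For example, just a sketch of the informer-selector idea (the function name and resync period are made up, this isn't wired into the linkerd2 codebase, and it assumes the API server accepts a metadata.namespace field selector for Endpoints): client-go lets you attach a tweak to the shared informer factory so kube-system objects are filtered out at the list/watch level and never reach our handlers at all.

package watcher

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// newFilteredFactory builds a factory whose list/watch calls carry a field
// selector excluding kube-system, so the noisy Endpoints are dropped by the
// API server rather than by our event handlers.
func newFilteredFactory(client kubernetes.Interface) informers.SharedInformerFactory {
	return informers.NewSharedInformerFactoryWithOptions(
		client,
		10*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = "metadata.namespace!=kube-system"
		}),
	)
}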

@humboldt-xie
Author

(quoting @alpeb's comment above)

Did you run

$ kubectl exec -it demo-client-7bd774d7d8-2xvp2 bash
[root@client-7bd774d7d8-2xvp2 client]# while true; do curl example.kube-system.svc.cluster.local; sleep 1; done

first?

@humboldt-xie
Author

@alpeb See the code on master.

If we create a new portPublisher, it lists fresh endpoints from the Kubernetes API:

func (sp *servicePublisher) newPortPublisher(srcPort Port, hostname string) *portPublisher {
	...
	endpoints, err := sp.k8sAPI.Endpoint().Lister().Endpoints(sp.id.Namespace).Get(sp.id.Name)
	...
	port.updateEndpoints(endpoints)
	...
}

If we subscribe to an existing portPublisher, it serves the endpoints it already has in memory, but they are never updated:

func (pp *portPublisher) subscribe(listener EndpointUpdateListener) {
        if pp.exists {
                if len(pp.pods.Pods) > 0 {
                        listener.Add(pp.pods)
                } else {
                        listener.NoEndpoints(true)
                }
        } else {
                listener.NoEndpoints(false)
        }
}

@humboldt-xie
Author

Those events are harmless even if they are not filtered.

Because nobody subscribes to those services, sp.ports is empty and the update is a no-op:

func (sp *servicePublisher) updateEndpoints(newEndpoints *corev1.Endpoints) {
	sp.Lock()
	defer sp.Unlock()
	sp.log.Debugf("Updating endpoints for %s", sp.id)

	for _, port := range sp.ports { // no one subscribes to kube-scheduler or kube-controller-manager
		port.updateEndpoints(newEndpoints)
	}
}

The only effect is that two extra servicePublisher structures sit in memory, and the debug messages keep being printed.

@adleong
Member

adleong commented Mar 5, 2020

I can reproduce this issue on stable-2.7.0.

After installing Linkerd, I deploy nginx to the kube-system namespace:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - name: http
              containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
  selector:
    app: nginx

Then I run a curl pod:

kubectl run curl --rm -it --image=appropriate/curl --restart=Never --command /bin/sh
# in that pod:
while true; do curl http://nginx.kube-system.svc.cluster.local; sleep 1; done

Then I roll the nginx deployment:

kubectl -n kube-system rollout restart deploy/nginx

The curl loop then hangs and never recovers.

Interestingly, the linkerd endpoints command yields the up-to-date endpoints. This is because each invocation of this command initiates a new subscription which fetches the current data, whereas the curl pod holds onto a persistent subscription and never receives updates from the kube-system namespace.
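
For reference, the check is just the CLI against the authority from this repro (command shape from memory of the stable-2.7 CLI, so treat it as approximate):

linkerd endpoints nginx.kube-system.svc.cluster.local:80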

@adleong
Member

adleong commented Mar 5, 2020

I agree with @alpeb that these constant debug log messages clog up the output and make the debug logs basically useless.

It's not the cleanest solution, but maybe we should simply skip updates from kube-scheduler and kube-controller-manager rather than for the entire kube-system namespace. WDYT?
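
Something along these lines, purely as a sketch (the helper name is made up and it isn't wired into the watcher here):

package watcher

import corev1 "k8s.io/api/core/v1"

// isLeaderElectionEndpoints reports whether an Endpoints object is one of the
// leader-election objects that kube-scheduler and kube-controller-manager
// update every couple of seconds; addEndpoints/updateEndpoints could return
// early for these instead of ignoring all of kube-system.
func isLeaderElectionEndpoints(ep *corev1.Endpoints) bool {
	if ep.Namespace != "kube-system" {
		return false
	}
	return ep.Name == "kube-scheduler" || ep.Name == "kube-controller-manager"
}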

@humboldt-xie
Author

(quoting @adleong's reproduction steps above)

Add:

  annotations:
    linkerd.io/inject: enabled

to your nginx deployment and the curl pod.

@humboldt-xie
Author

Maybe we can change the code from:

func (ew *EndpointsWatcher) addEndpoints(obj interface{}) {//and addService
	endpoints := obj.(*corev1.Endpoints)
	id := ServiceID{
		Namespace: endpoints.Namespace,
		Name:      endpoints.Name,
	}

	sp := ew.getOrNewServicePublisher(id)

	sp.updateEndpoints(endpoints)
}

to

func (ew *EndpointsWatcher) addEndpoints(obj interface{}) { //and addService
	endpoints := obj.(*corev1.Endpoints)
	id := ServiceID{
		Namespace: endpoints.Namespace,
		Name:      endpoints.Name,
	}

	sp, ok := ew.getServicePublisher(id)
	if ok {
		sp.updateEndpoints(endpoints)
	}
}

i.e. create the servicePublisher and fetch its endpoints only when someone subscribes.
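
The lookup-only helper assumed above could look roughly like this (hypothetical; the actual EndpointsWatcher field and lock names may differ):

// getServicePublisher returns the existing publisher for id, if any,
// without creating one as getOrNewServicePublisher does.
func (ew *EndpointsWatcher) getServicePublisher(id ServiceID) (*servicePublisher, bool) {
	ew.RLock()
	defer ew.RUnlock()
	sp, ok := ew.publishers[id]
	return sp, ok
}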

@humboldt-xie
Author

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: kube-system
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
        - name: nginx
          image: nginx
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  namespace: kube-system
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
  selector:
    app: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: curl
  namespace: default
  annotations:
    linkerd.io/inject: enabled
  labels:
    app: curl
spec:
  containers:
  - name: curl
    imagePullPolicy: IfNotPresent
    image: appropriate/curl
    command: ['/bin/sh',"-c","sleep 10000;"]
---

Apply this YAML, then run:

kubectl exec -it curl sh 
# in that pod:
while true; do curl http://nginx.kube-system.svc.cluster.local; sleep 1; done

and:

kubectl scale --replicas=1 -n kube-system deploy/nginx
I tried it on minikube, where the pod IPs don't change on restart, so I scale the replicas from 3 down to 1; you can see that 3 IPs are still being returned.

@adleong
Member

adleong commented Mar 9, 2020

@humboldt-xie I'd rather not change the way that publishers and subscribers work in this PR. A more targeted change would be to simply ignore the events that we don't care about.

@grampelberg
Contributor

@humboldt-xie are you still working on this or should we close it out?

@adleong closed this Jun 15, 2020
JacobHenner added a commit to JacobHenner/linkerd2 that referenced this pull request Jun 21, 2022
Watch events for objects in the kube-system namespace were previously
ignored. In certain situations, this would cause the destination service
to return invalid (outdated) endpoints for services in kube-system -
including unmeshed services.

It was suggested [1] that kube-system events were ignored to avoid
handling frequent Endpoint updates - specifically from controllers using
Endpoints for leader elections [2]. As of Kubernetes 1.20, these
controllers default to using Leases instead of Endpoints for their
leader elections [3], obviating the need to exclude (or filter) updates
from kube-system. The exclusions have been removed accordingly.

[1]: linkerd#4133 (comment)
[2]: kubernetes/kubernetes#86286
[3]: kubernetes/kubernetes#94603

Signed-off-by: Jacob Henner <code@ventricle.us>
kleimkuhler pushed a commit that referenced this pull request Jul 11, 2022
Watch events for objects in the kube-system namespace were previously ignored.
In certain situations, this would cause the destination service to return
invalid (outdated) endpoints for services in kube-system - including unmeshed
services.

It [was suggested][1] that kube-system events were ignored to avoid handling
frequent Endpoint updates - specifically from [controllers using Endpoints for
leader elections][2]. As of Kubernetes 1.20, these controllers [default to using
Leases instead of Endpoints for their leader elections][3], obviating the need
to exclude (or filter) updates from kube-system. The exclusions have been
removed accordingly.

[1]: #4133 (comment)
[2]: kubernetes/kubernetes#86286
[3]: kubernetes/kubernetes#94603

Signed-off-by: Jacob Henner <code@ventricle.us>