ServiceEntry has stale endpoints #35404

Closed
yingzhuivy opened this issue Sep 28, 2021 · 5 comments
Labels
area/networking, feature/Multi-cluster (issues related with multi-cluster support)

Comments

@yingzhuivy
Member

yingzhuivy commented Sep 28, 2021

Bug Description

We have a ServiceEntry in an external Istiod cluster that selects Kubernetes pods in a remote cluster. When the pods are restarted, there are sometimes stale endpoints left under the cluster corresponding to the ServiceEntry. Looking at Istiod's debug/endpointz, all of the istiod pods show the same stale state; restarting the istiod pods fixes the problem.

To reproduce more easily, I scale the deployment to 100 replicas and then back down to 1; sometimes some endpoints are not deleted correctly and more than one endpoint remains under the cluster corresponding to the ServiceEntry. It does appear to be easier to reproduce in more heavily loaded Kubernetes clusters.
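
(For reference, the stale state can be inspected by dumping istiod's endpoint registry, e.g. kubectl exec -n istio-system deploy/istiod -- curl -s localhost:8080/debug/endpointz; I'm assuming the default istio-system install and the default debug port 8080 here, adjust for the external-istiod setup as needed.)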

Here is the code for reproduction:
In the external Istiod cluster:

apiVersion: v1
kind: Namespace
metadata:
  name: airmesh-test-ying
---
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: fortioserver
  namespace: airmesh-test-ying
spec:
  addresses:
  - 100.124.99.206
  hosts:
  - fortioserver.airmesh-test-ying.svc.ea1.us.a
  location: MESH_INTERNAL
  ports:
    - name: http-port
      number: 8080
      protocol: HTTP
  resolution: STATIC
  workloadSelector:
    labels:
      app: fortioserver

In the remote cluster:

---
apiVersion: v1
kind: Namespace
metadata:
  name: airmesh-test-ying
  labels:
    istio-injection: enabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fortioclient
  name: fortioclient
  namespace: airmesh-test-ying
spec:
  selector:
    matchLabels:
      app: fortioclient
  template:
    metadata:
      labels:
        app: fortioclient
    spec:
      containers:
        - name: fortioclient
          image: fortio/fortio:latest_release
          args:
            - report
          volumeMounts:
            - name: shared-data
              mountPath: /var/lib/fortio
      volumes:
        - name: shared-data
          emptyDir: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: fortioserver
  name: fortioserver
  namespace: airmesh-test-ying
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fortioserver
  template:
    metadata:
      labels:
        app: fortioserver
    spec:
      containers:
        - name: fortioserver
          image: fortio/fortio:latest_release
          volumeMounts:
            - name: shared-data
              mountPath: /var/lib/fortio
      volumes:
        - name: shared-data
          emptyDir: {}
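
With these applied, the flap can be driven against the remote cluster with plain kubectl, for example kubectl -n airmesh-test-ying scale deployment fortioserver --replicas=100 followed by kubectl -n airmesh-test-ying scale deployment fortioserver --replicas=1, repeated until debug/endpointz shows more than one endpoint for the ServiceEntry (the namespace and deployment name are the ones from the manifests above).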

Version

istio version: 1.10.3
k8s version: 1.17.14

Additional Information

If I turn on debug logging, I don't see the following line for the stale endpoints:

2021-09-28T18:02:19.160451Z debug Handle event delete for service instance (from 100.117.83.50) in namespace airmesh-test-ying

The log comes from here: https://github.com/istio/istio/blob/master/pilot/pkg/serviceregistry/serviceentry/servicediscovery.go#L444

So it looks like either the delete event is missing, or redundantEventForPod is true for it (which probably is not the case)?

@istio-policy-bot added the area/networking and feature/Multi-cluster labels on Sep 28, 2021
@yingzhuivy
Member Author

I added some debug logs and found that the issue is due to the informer delivering events out of order.

  1. The old pod shuts down and releases its IP.
  2. A new pod, from another service, starts up and gets the same IP that was just released.

However, istiod gets the events in this order:

  1. The new pod's event comes first; because the pod is not ready yet, it is treated as a delete event, which deletes the entry in workloadInstancesByIP.
  2. When the old pod's delete event then arrives, it is marked as a redundant event, so the delete never happens, leaving stale endpoints.

From my testing, the new pod's event comes at 2021-09-28T23:52:10.266077Z and the old pod's delete event comes at around 2021-09-28T23:52:25.650738Z, so there is roughly a 15s delay. I think we cannot rely on the pod informer to deliver events in strict order. One possible fix is to store workloadInstances by name instead of by IP, so that we delete based on namespace/name; a rough sketch is below. Thoughts? @hzxuzhonghu @howardjohn @iandyh
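
Roughly the shape I have in mind (a simplified sketch, not the actual ServiceEntryStore code; type and field names are approximate):

package main

import "fmt"

// workloadInstance is a simplified stand-in for the registry's WorkloadInstance:
// just enough fields to show the keying problem.
type workloadInstance struct {
    Namespace string
    Name      string
    IP        string
}

// store keys workload instances by namespace/name instead of by IP, so a late
// delete for an old pod cannot be swallowed just because a new pod has already
// reused its IP.
type store struct {
    byName map[string]workloadInstance
}

func newStore() *store {
    return &store{byName: map[string]workloadInstance{}}
}

func key(wi workloadInstance) string {
    return wi.Namespace + "/" + wi.Name
}

// upsert records or refreshes a workload instance under its own name.
func (s *store) upsert(wi workloadInstance) {
    s.byName[key(wi)] = wi
}

// remove deletes the instance for this pod only; another pod holding the same
// IP is untouched.
func (s *store) remove(wi workloadInstance) {
    delete(s.byName, key(wi))
}

// endpoints lists the IPs currently selected, e.g. to build the ServiceEntry's
// endpoints.
func (s *store) endpoints() []string {
    var ips []string
    for _, wi := range s.byName {
        ips = append(ips, wi.IP)
    }
    return ips
}

func main() {
    s := newStore()
    oldPod := workloadInstance{Namespace: "airmesh-test-ying", Name: "fortioserver-old", IP: "100.117.83.50"}
    // A pod from a different deployment reuses the IP before the old pod's
    // delete event is processed.
    newPod := workloadInstance{Namespace: "airmesh-test-ying", Name: "other-service-new", IP: "100.117.83.50"}

    s.upsert(oldPod)
    s.remove(newPod)           // not-ready new pod handled as a delete: touches only its own key
    s.remove(oldPod)           // the late delete for the old pod still removes its entry
    fmt.Println(s.endpoints()) // prints [] -- no stale endpoint left behind
}

With the IP-keyed map, the delete for the not-ready new pod removes the old pod's entry, and the old pod's later delete then looks redundant; keying by namespace/name keeps the two pods' lifecycles independent.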

@hzxuzhonghu
Member

I think we can not rely on the pod informer to have strict order.

In theory, the informer events are in order. Thanks for your investigation. I need to have a closer look.

@howardjohn
Member

In theory, the informer events are in order

I suspect they are in order. But the new pod legitimately starts with the new IP before the old pod is fully deleted.

Just a guess though

@yingzhuivy
Member Author

Actually, as soon as the pod starts terminating, it is marked as a delete event here and should be deleted from the service registry. But that event comes later than the new pod's event. I think the IP is released after the pod starts terminating, so the terminating event should come before the new pod's start-up event, which indicates to me that the informer events are out of order.

(By the way, to reproduce in a lightly loaded cluster, I ran another deployment with 300 pods that was constantly restarting to mimic load.)
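
For context on the "marked as a delete event" part above, this is roughly the check I mean (a simplified sketch, not the actual kube registry code):

package registry

import corev1 "k8s.io/api/core/v1"

// treatAsDelete mirrors the idea that a pod which has started terminating
// (DeletionTimestamp set) or is not Ready is handled as a delete by the
// workload-instance registry, even though the informer delivers it as an
// update event.
func treatAsDelete(pod *corev1.Pod) bool {
    if pod.DeletionTimestamp != nil {
        return true // terminating: its IP is about to be released
    }
    for _, c := range pod.Status.Conditions {
        if c.Type == corev1.PodReady {
            return c.Status != corev1.ConditionTrue
        }
    }
    return true // no Ready condition reported yet
}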

airbnb-gerrit pushed a commit to airbnb/istio that referenced this issue Oct 11, 2021
This CL fixes the problem described in this issue:
istio#35404

workloadInstancesByIP is replaced by workloadInstancesByName, i.e.
we don't assume one-to-one correspondence between wi and ip.
workloadInstancesIPsByName is removed since it's covered by the
new map.

There is a refactor going on for the ServiceEntryStore which should
also fix the problem: istio#35369. But
that probably won't get backported to 1.11, so I created this smaller
fix to be patched onto 1.11.

Change-Id: I62992082979cb12ff5fecef4d85f20e3c9fd435a
Reviewed-on: https://gerrit.musta.ch/c/public/istio/+/1690
Reviewed-by: Weibo He <weibo.he@airbnb.com>
yingzhuivy pushed a commit to airbnb/istio that referenced this issue Oct 12, 2021
istio-testing pushed a commit that referenced this issue Oct 12, 2021
istio: fix service entry stale endpoints
istio-testing pushed a commit to istio-testing/istio that referenced this issue Oct 12, 2021
@howardjohn
Member

Fix is merged

istio-testing added a commit that referenced this issue Oct 12, 2021
istio: fix service entry stale endpoints