Stale endpoints on scale to zero #14336

Closed
howardjohn opened this issue May 23, 2019 · 9 comments · Fixed by #16967

howardjohn (Member) commented May 23, 2019

Scale the deployment to 5 replicas and all 5 endpoints show up in /clusters:

outbound|80||identity.envoylite.svc.cluster.local::10.60.22.51:8080::region::us-central1
outbound|80||identity.envoylite.svc.cluster.local::10.60.22.52:8080::region::us-central1
outbound|80||identity.envoylite.svc.cluster.local::10.60.25.75:8080::region::us-central1
outbound|80||identity.envoylite.svc.cluster.local::10.60.25.76:8080::region::us-central1
outbound|80||identity.envoylite.svc.cluster.local::10.60.25.77:8080::region::us-central1

Scale the deployment to 0 replicas and the endpoints still remain in /clusters:

outbound|80||identity.envoylite.svc.cluster.local::10.60.22.51:8080::region::us-central1
outbound|80||identity.envoylite.svc.cluster.local::10.60.22.52:8080::region::us-central1
outbound|80||identity.envoylite.svc.cluster.local::10.60.25.75:8080::region::us-central1
outbound|80||identity.envoylite.svc.cluster.local::10.60.25.76:8080::region::us-central1
outbound|80||identity.envoylite.svc.cluster.local::10.60.25.77:8080::region::us-central1
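
To compare the /clusters output before and after scaling, the listing above can be reduced to a set of endpoint addresses per cluster. The following is a minimal Go sketch, assuming the `cluster::address::key::value` line layout shown above. The function name `endpointsByCluster` is made up for illustration, and IPv6 addresses (which themselves contain `::`) are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// endpointsByCluster scans text in the layout shown in this issue
// (cluster::address::key::value) and returns the distinct endpoint
// addresses seen for each cluster name, in first-seen order.
// Hypothetical helper for illustration only.
func endpointsByCluster(clustersOutput string) map[string][]string {
	result := make(map[string][]string)
	seen := make(map[string]bool)
	scanner := bufio.NewScanner(strings.NewReader(clustersOutput))
	for scanner.Scan() {
		parts := strings.Split(strings.TrimSpace(scanner.Text()), "::")
		if len(parts) < 2 {
			continue
		}
		cluster, addr := parts[0], parts[1]
		// Keep only lines whose second field looks like host:port;
		// anything else is not an endpoint line.
		if _, _, err := net.SplitHostPort(addr); err != nil {
			continue
		}
		key := cluster + "::" + addr
		if seen[key] {
			continue
		}
		seen[key] = true
		result[cluster] = append(result[cluster], addr)
	}
	return result
}

func main() {
	// Two of the lines reported in this issue.
	sample := `outbound|80||identity.envoylite.svc.cluster.local::10.60.22.51:8080::region::us-central1
outbound|80||identity.envoylite.svc.cluster.local::10.60.22.52:8080::region::us-central1`
	for cluster, addrs := range endpointsByCluster(sample) {
		fmt.Printf("%s: %d endpoint(s) %v\n", cluster, len(addrs), addrs)
	}
}
```

Running this against the output captured after scaling to 0 gives a quick check: any address still listed for the cluster is a stale endpoint.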

Running the 1.2 daily builds from 5/22.

Adding s.ConfigUpdate(true) to https://github.com/istio/istio/blob/master/pilot/pkg/proxy/envoy/v2/eds.go#L503 fixes this, but I am not sure it is the best option.

Scaling back up triggers a push, which resolves this.

This is not very critical, because with 0 endpoints no requests are going to succeed anyway.
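
The behavior described above can be illustrated with a toy model in which a sidecar only learns about endpoint changes when the control plane pushes. This is a sketch under stated assumptions, not Pilot's actual code: `controlPlane`, `updateEndpoints`, `push`, and the `pushOnEmpty` switch are hypothetical names, and `pushOnEmpty` only stands in for the effect of the extra `s.ConfigUpdate(true)` call.

```go
package main

import "fmt"

// controlPlane is a hypothetical stand-in for the component that tracks
// endpoints and pushes them to sidecars. It is not Istio's real type.
type controlPlane struct {
	endpoints   map[string][]string // service -> current endpoint addresses
	sidecarView map[string][]string // what the sidecar last received
	pushOnEmpty bool                // assumed switch: push when a service drops to zero endpoints
}

func newControlPlane(pushOnEmpty bool) *controlPlane {
	return &controlPlane{
		endpoints:   make(map[string][]string),
		sidecarView: make(map[string][]string),
		pushOnEmpty: pushOnEmpty,
	}
}

// push copies the current endpoint set for a service to the sidecar.
func (c *controlPlane) push(service string) {
	c.sidecarView[service] = append([]string(nil), c.endpoints[service]...)
}

// updateEndpoints records a new endpoint set. In this model, an update to
// zero endpoints drops the service from the cache and pushes only when
// pushOnEmpty is set, which is the gap the issue describes.
func (c *controlPlane) updateEndpoints(service string, addrs []string) {
	if len(addrs) == 0 {
		delete(c.endpoints, service)
		if c.pushOnEmpty {
			c.push(service)
		}
		return
	}
	c.endpoints[service] = addrs
	c.push(service)
}

func main() {
	for _, pushOnEmpty := range []bool{false, true} {
		cp := newControlPlane(pushOnEmpty)
		cp.updateEndpoints("identity", []string{"10.60.22.51:8080", "10.60.22.52:8080"})
		cp.updateEndpoints("identity", nil) // scale to zero
		fmt.Printf("pushOnEmpty=%v: sidecar still sees %d endpoint(s)\n",
			pushOnEmpty, len(cp.sidecarView["identity"]))
	}
}
```

In this model the sidecar keeps both addresses after the scale-down unless `pushOnEmpty` is set, and a later scale-up refreshes the view either way, which matches the observation that scaling back up resolves the stale state.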

howardjohn self-assigned this May 23, 2019

mbanikazemi (Contributor) commented May 28, 2019

Hmmm, I see them removed in my cluster.

howardjohn (Member, Author) commented May 28, 2019

@mbanikazemi if there are any other config changes, etc., Pilot will trigger a push, which I think will remove them. Could that be what happened?

mbanikazemi (Contributor) commented May 29, 2019

@howardjohn Tested it on a small cluster with no load, but one cannot be sure there were no other triggers. Will check again tomorrow.

XanderStrike (Member) commented Aug 30, 2019

Reproduced on GKE with 1.2.5. I'll see if I can get a fix going.

XanderStrike added a commit to XanderStrike/istio that referenced this issue Sep 3, 2019
otherwise sidecars will never find out about it, and they'll keep these
endpoints until something else happens with the service

resolves istio#14336

hzxuzhonghu (Member) commented Sep 11, 2019

/assign @XanderStrike

p4ali commented Oct 16, 2019

@howardjohn
We have a different case (Istio 1.2.5; not sure if it is related).
Assume we have a service with 3 pods: a, b, c.

If a and b become unhealthy due to readiness-probe failures, the cluster will have only 1 endpoint, c. So far so good.

But if c also becomes unhealthy, we expect the cluster to become empty; however, the unhealthy endpoint c seems to always remain.

howardjohn (Member, Author) commented Oct 16, 2019

@p4ali that is the same case as above, which is fixed since 1.2.5

robertpanzer (Contributor) commented Oct 16, 2019

I just tested that with 1.2.7: I still see the last endpoint in /clusters and requests are still routed.
Only current master, which has #16967, seems to show the updated behavior.

howardjohn (Member, Author) commented Oct 16, 2019

> @p4ali that is the same case as above, which is fixed since 1.2.5

To clarify, by "since 1.2.5" I mean in 1.4, not in 1.2.6
