Traffic got routed to an endpoint that does not belong to the target service #11763
Comments
This sounds a bit similar to an issue we had where the destination controller could become locked and stop processing service discovery updates. However, this bug was fixed in stable-2.14.2 and should not affect you in stable-2.14.3. In order to rule out that possibility, you could take a look at the
You should see this counter incremented when the endpoints of a service change. If, instead, this counter remains at the same value, it means that the destination controller is not processing updates for some reason. In stable-2.14.4 we added
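The check described above can be sketched as a comparison of two metrics scrapes taken before and after an endpoints change. This is only a minimal sketch: it assumes the counter is exposed in Prometheus text format under the name endpoints_updates (the name used later in this thread), and the label set and values in the sample strings are made up for illustration.

```python
import re

# Matches one Prometheus text-format sample line: name, optional {labels}, value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)'
)

def counter_total(metrics_text, name):
    """Sum every sample of `name` found in a Prometheus text-format dump."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match and match.group("name") == name:
            total += float(match.group("value"))
    return total

def counter_advanced(before_text, after_text, name="endpoints_updates"):
    """True if the counter grew between the two scrapes."""
    return counter_total(after_text, name) > counter_total(before_text, name)

# Hypothetical scrapes; the label and the values are assumptions.
before = 'endpoints_updates{service="service-a"} 41\n'
after = '# TYPE endpoints_updates counter\nendpoints_updates{service="service-a"} 43\n'

print(counter_advanced(before, after))
```

If the function returns False across a deployment that is known to have changed the endpoints, that would match the "not processing updates" case described in the comment.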
Hey @adleong, thanks for the reply. Yes, I do see the endpoints_updates counter increment after the deployment of the target service, service A, so I guess the destination controller was processing updates. A couple of things worth mentioning:
Does it change anything? As an action item, I think we will try to update to Meanwhile, if we find anything new, we will report in the thread again. Thanks!
@Wenliang-CHEN Any joy trying with
Hey @kflynn, not yet... it's around the Christmas holiday. I will let you know 😄 There has not been another instance since I reported the issue, but to be safe, we are still observing...
@Wenliang-CHEN Keeping fingers crossed for you -- enjoy the holiday! 🙂
@Wenliang-CHEN Happy new year!! Just wanted to make sure this was still on your radar. 🙂
Hey @kflynn, happy new year! Yes, we have not forgotten this. We just upgraded to v2.14.9, and so far we have not had any report of the same issue. Hopefully the upgrade fixes it. We will monitor it throughout Feb. If there is no further report, I think we can close it for now. Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
What is the issue?
Hey all, here is what we saw so far: our alerting system detected some failing traffic.
From the linkerd-proxy logs of the outbound pod, it seems there is a problem with target resolution.
A pod that does not belong to the target service was resolved as a target endpoint.
Meanwhile, from the linkerd-proxy logs, we could see that the proxy was at the same time trying a lot of different endpoints.
We are not sure where to look yet, and it doesn't happen often. We have an alert for this situation as well, so I will update here if it happens again.
Meanwhile, please let me know what you think. Thanks!
How can it be reproduced?
We are not completely sure. But it seems this issue happens after a bigger reshuffling of internal IPs, e.g. after the deployment of a workload that has 400 pods.
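One way to see whether such a reshuffling handed an IP from one service to another is to diff two snapshots of the endpoint IPs per service. The sketch below is only an illustration: the snapshot dictionaries, service names, and IPs are hypothetical, and it assumes each IP belongs to at most one service within a snapshot.

```python
def reassigned_ips(before, after):
    """Map each IP that changed owner to its (old_service, new_service) pair."""
    # Invert the first snapshot: IP -> owning service.
    old_owner = {ip: svc for svc, ips in before.items() for ip in ips}
    moved = {}
    for svc, ips in after.items():
        for ip in ips:
            prev = old_owner.get(ip)
            if prev is not None and prev != svc:
                moved[ip] = (prev, svc)
    return moved

# Hypothetical snapshots taken before and after a large deployment.
before = {
    "service-a": {"10.0.1.5", "10.0.1.6"},
    "service-b": {"10.0.2.7"},
}
after = {
    "service-a": {"10.0.1.6", "10.0.3.9"},
    "service-b": {"10.0.2.7", "10.0.1.5"},
}

print(reassigned_ips(before, after))
```

Any IP reported here is one that a client holding the older snapshot would still associate with the previous service.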
Logs, error output, etc
The "proxy"
here is the IP of the target service, service A
And the "endpoint"
here is a pod that belongs to another service, service B
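The mismatch described here (an endpoint resolved for service A that actually belongs to service B) can be checked mechanically once the endpoint IPs of each service are known. This is a minimal sketch with hypothetical service names and IPs; how the endpoint sets are collected is left out and is an assumption of the example.

```python
def owners_of(endpoint_ip, service_endpoints):
    """Return the sorted names of services whose endpoint set contains the IP."""
    return sorted(
        name for name, ips in service_endpoints.items() if endpoint_ip in ips
    )

def belongs_to(target_service, endpoint_ip, service_endpoints):
    """True if the IP is one of the target service's current endpoints."""
    return endpoint_ip in service_endpoints.get(target_service, set())

# Hypothetical endpoint sets and a hypothetical endpoint taken from a proxy log.
service_endpoints = {
    "service-a": {"10.0.1.6", "10.0.3.9"},
    "service-b": {"10.0.2.7"},
}
resolved_endpoint = "10.0.2.7"

print(belongs_to("service-a", resolved_endpoint, service_endpoints))
print(owners_of(resolved_endpoint, service_endpoints))
```

A False result from belongs_to together with a different owner from owners_of corresponds to the situation reported in this issue.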
When this happened, the endpoints_available metrics showed no change for either service A or service B.
output of linkerd check -o short
Environment
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
maybe