Destination & identity service unable to reach k8s API due to static port 443 override #8591
Comments
That doesn't seem likely to me. This change added 443 to the proxy's list of default opaque ports. This should only really impact application pods, since, as you mention, the control plane is configured with:

```yaml
- --outbound-ports-to-ignore
- "443"
```

That is, iptables configures all outbound traffic to :443 to skip the proxy entirely. I suspect the issue is this: in non-Cilium clusters, application connections to 443 are seen by iptables as connections to 443, so the skip applies as intended. But when Cilium is in the mix, it rewrites the connection metadata before the iptables rules apply, so the traffic does not skip the proxy. You could try manually editing the manifests to change …
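For intuition, the skip described above amounts to something like the following (a simplified sketch, not the literal rules linkerd-init programs; chain names and ordering vary by version). If Cilium rewrites the destination before these rules run, the RETURN on 443 never matches and the traffic lands in the proxy after all:

```sh
# Simplified sketch of the outbound NAT rules the init container sets up.
iptables -t nat -N PROXY_INIT_OUTPUT
# Ports listed in --outbound-ports-to-ignore bypass the proxy entirely:
iptables -t nat -A PROXY_INIT_OUTPUT -p tcp --dport 443 -j RETURN
# Everything else gets redirected to the proxy's outbound listener:
iptables -t nat -A PROXY_INIT_OUTPUT -p tcp -j REDIRECT --to-port 4140
iptables -t nat -A OUTPUT -p tcp -j PROXY_INIT_OUTPUT
```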
But that would apply to Linkerd <= stable-2.10 only, no? As far as I understood, the controller was made redundant and all components talk directly to the K8s API, which would also explain why this worked for us until the upgrade to 2.11. I just checked one of our clusters that is still running:

```yaml
- --outbound-ports-to-ignore
- 25,443,587,3306,11211,5432,6443
```
Yes, our current manual fix for this is to patch the generated manifest via …
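For readers hitting the same thing, a minimal sketch of what such a patch could look like (the deployment name and the args index are assumptions about how the manifest was rendered; inspect yours first):

```sh
# Illustrative only: append 6443 to the init container's ignored outbound
# ports on the rendered deployment. The args index (9 here) depends on the
# rendered manifest -- verify it before patching.
kubectl -n linkerd patch deploy linkerd-destination --type=json -p '[
  {"op": "replace",
   "path": "/spec/template/spec/initContainers/0/args/9",
   "value": "443,6443"}
]'
```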
No, Linkerd proxies continue to use the controller components for discovery & identity.
Okay, understood. I could be mixing up cause & effect then; it's just that what worked for 2.10 no longer works for 2.11 – the unblocking of …
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Please don't close this one, it is still a problem.
I had a look at this and came up with some steps to reproduce the problem. This issue stems from using Cilium in its kube-proxy replacement mode; as soon as I set it to …

Steps to reproduce locally in k3d
Control plane pods for me were not starting up (only the identity pod); errors in the identity pod: …

Cilium config: …

Looking at the Cilium logs, I haven't yet been able to find any details about packets being rewritten from 443 to 6443, though: …

Quick fixes

Two ways that this can be quickly patched for people who need immediate support for this:
```yaml
initContainers:
  - args:
      - --incoming-proxy-port
      - "4143"
      - --outgoing-proxy-port
      - "4140"
      - --proxy-uid
      - "2102"
      - --inbound-ports-to-ignore
      - 4190,4191,4567,4568
      - --outbound-ports-to-ignore
      - 443,6443
```
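As a related sketch (not necessarily one of the two routes meant above): the same ports can be supplied at install or upgrade time, assuming your chart version exposes the `proxyInit.ignoreOutboundPorts` value:

```sh
# Sketch: set the ignored outbound ports via Helm instead of hand-patching
# the manifest. Repo alias and value name assumed from the stable-2.11 chart;
# note that commas inside --set values must be escaped.
helm upgrade linkerd linkerd/linkerd2 \
  --set proxyInit.ignoreOutboundPorts="443\,6443"
```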
@bascht I have managed to reproduce this with …
@bascht Have you had a chance to try the Cilium upgrade @mateiidavid suggested above? |
What is the issue?
It looks like #6887 introduced TLS detection on port 443 under the static assumption that the Kubernetes API is always reachable on port `:443` when using Cilium as the CNI. Both destination and identity render their templates with a static initContainer, which will break in a cluster where the API endpoints are listening on a different port.
How can it be reproduced?
K8s v1.20.4 cluster running the API server on a different port: …

This will lead to the destination service not being able to reach the API: …
(logs from the `linkerd-proxy` container)

As soon as I manually patch the deployments for the identity & destination service to ignore outbound port 6443, all services come up and are healthy.
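For anyone reproducing this, a quick way to confirm which port the API endpoint actually uses (the output shown is a made-up example for a cluster serving on 6443):

```sh
# The default kubernetes Service's endpoints reveal the real API server port.
kubectl get endpoints kubernetes
# NAME         ENDPOINTS          AGE
# kubernetes   192.0.2.10:6443    42d
```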
Logs, error output, etc.

See section *How can it be reproduced?* above.

Output of `linkerd check -o short` (that is, after setting the outgoing ignore ports): …
Environment

- Kubernetes: v1.20.4, self-hosted on Debian 10
- Kernel: 4.19.0-20-amd64
- Cilium: v1.9.5
- Linkerd: stable-2.11.2

Possible solution
The outgoing port ignores should not be patched statically from an included template. This worked for us with Linkerd stable-2.10, so the breaking change was likely #6887.
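A hypothetical sketch of that direction, deriving the port from the cluster itself at install time rather than baking in 443 (the jsonpath lookup and the value wiring are assumptions, not existing Linkerd tooling):

```sh
# Hypothetical: look up the real API port and feed it to the chart instead
# of relying on a hardcoded 443.
API_PORT=$(kubectl get endpoints kubernetes \
  -o jsonpath='{.subsets[0].ports[0].port}')
helm upgrade linkerd linkerd/linkerd2 \
  --set proxyInit.ignoreOutboundPorts="443\,${API_PORT}"
```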
Additional context
This could be duplicating #7460 but I think I can add a bit more context or help with debugging this.
Would you like to work on fixing this bug?
maybe