Description
What is the issue?
We are seeing sporadic HTTP 403 responses across all of our clusters running the Linkerd proxy. The attached logs capture one such occurrence, following the conversation from gojira-pod's linkerd-proxy sidecar <-> gojira-pod's linkerd-debug sidecar <-> mosura-pod's linkerd-debug sidecar <-> mosura-pod's linkerd-proxy sidecar. Approximately 5 per mille (0.5%) of all requests fail because the TLS handshake does not succeed, and those requests receive HTTP 403 responses.
How can it be reproduced?
To reproduce the issue, send approximately 1,000 requests through a Linkerd-meshed setup. A small fraction of these requests fail, with the log message "Peer does not support TLS" appearing during the TLS handshake. In our setup the client and server are in different namespaces: the client is gojira-pod and the server is mosura-pod, which serves kani-api. The issue can be replicated with any client calling any server that exposes a REST API.
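A minimal Python sketch of the reproduction loop, assuming it is run from inside a meshed client pod (for example gojira-pod) and that the target URL is passed on the command line; the example URL in the comment is hypothetical. It sends the requests sequentially, tallies the status codes, and reports the 403 count in per mille so it can be compared with the roughly 5 per mille observed.

```python
import sys
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """GET url and return the HTTP status code, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is carried on the exception.
        return err.code


def per_mille(count: int, total: int) -> float:
    """Express count/total in per mille (parts per thousand)."""
    return 1000.0 * count / total if total else 0.0


def run(url: str, n: int = 1000) -> Counter:
    """Send n sequential GET requests and count each status code."""
    return Counter(fetch_status(url) for _ in range(n))


if __name__ == "__main__":
    # Hypothetical target, e.g. http://kani-api.mosura-land.svc.cluster.local:8080/
    target = sys.argv[1]
    total = int(sys.argv[2]) if len(sys.argv) > 2 else 1000
    codes = run(target, total)
    print(dict(codes))
    print(f"403s: {codes[403]} / {total} = {per_mille(codes[403], total):.1f} per mille")
```

Requests are sent one at a time so each one opens a fresh exchange through the proxy; connection errors (as opposed to HTTP error responses) are not caught and will stop the loop.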
The relevant configuration (authorizationpolicies.v1alpha1.policy.linkerd.io, servers.v1beta3.policy.linkerd.io, and meshtlsauthentications.v1alpha1.policy.linkerd.io) is listed in the "Additional context" section.
Logs, error output, etc
gojira-debug.log.csv
gojira-proxy.log.csv
mosura-debug.log.csv
mosura-proxy.log.csv
output of linkerd check -o short
kurisu@linkerd-worries$ linkerd check -n linkerd --linkerd-namespace linkerd --cni-namespace linkerd -o short
linkerd-identity
----------------
‼ trust anchors are valid for at least 60 days
    Anchors expiring soon:
    * 4116 BlueOysterCA will expire on 2025-05-23T16:53:17Z
    see https://linkerd.io/2/checks/#l5d-identity-trustAnchors-not-expiring-soon for hints
‼ issuer cert is valid for at least 60 days
    issuer certificate will expire on 2025-05-01T08:56:18Z
    see https://linkerd.io/2/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints

linkerd-webhooks-and-apisvc-tls
-------------------------------
‼ proxy-injector cert is valid for at least 60 days
    Anchors expiring soon:
    * 4116 BlueOysterCA will expire on 2025-05-23T16:53:17Z
    * 558509666953049899959298119593666570293338784147 root.linkerd.cluster.local will expire on 2025-04-13T15:29:32Z
    see https://linkerd.io/2/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints
‼ sp-validator cert is valid for at least 60 days
    Anchors expiring soon:
    * 4116 BlueOysterCA will expire on 2025-05-23T16:53:17Z
    * 558509666953049899959298119593666570293338784147 root.linkerd.cluster.local will expire on 2025-04-13T15:29:32Z
    see https://linkerd.io/2/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints
‼ policy-validator cert is valid for at least 60 days
    Anchors expiring soon:
    * 4116 BlueOysterCA will expire on 2025-05-23T16:53:17Z
    * 558509666953049899959298119593666570293338784147 root.linkerd.cluster.local will expire on 2025-04-13T15:29:32Z
    see https://linkerd.io/2/checks/#l5d-policy-validator-webhook-cert-not-expiring-soon for hints

linkerd-version
---------------
‼ cli is up-to-date
    is running version 25.2.3 but the latest edge version is 25.3.3
    see https://linkerd.io/2/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 25.1.2 but the latest edge version is 25.3.3
    see https://linkerd.io/2/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running edge-25.1.2 but cli running edge-25.2.3
    see https://linkerd.io/2/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
    * linkerd-destination-5c7f864fcc-2zqp6 (edge-25.1.2)
    * linkerd-destination-5c7f864fcc-k2k7t (edge-25.1.2)
    * linkerd-identity-7554d6fdd-pjhvd (edge-25.1.2)
    * linkerd-identity-7554d6fdd-qlwph (edge-25.1.2)
    * linkerd-proxy-injector-6d47f9cc8-jht52 (edge-25.1.2)
    * linkerd-proxy-injector-6d47f9cc8-splcs (edge-25.1.2)
    see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-destination-5c7f864fcc-2zqp6 running edge-25.1.2 but cli running edge-25.2.3
    see https://linkerd.io/2/checks/#l5d-cp-proxy-cli-version for hints

Status check results are √
Environment
kurisu@linkerd-worries$ kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.30.7
Possible solution
No response
Additional context
kurisu@linkerd-worries$ kubectl get -n mosura-land authorizationpolicies.v1alpha1.policy.linkerd.io
NAME                    AGE
kani-auth-gojira-land   16d
kani-auth-mosura-land   16d
kurisu@linkerd-worries$ kubectl get -n gojira-land authorizationpolicies.v1alpha1.policy.linkerd.io
No resources found in gojira-land namespace.
kurisu@linkerd-worries$ linkerd authz -n mosura-land deploy/kani-api
ROUTE  SERVER           AUTHORIZATION_POLICY   SERVER_AUTHORIZATION
*      kani-api-server  kani-auth-gojira-land
*      kani-api-server  kani-auth-mosura-land
kurisu@linkerd-worries$ kubectl describe servers.v1beta3.policy.linkerd.io -n mosura-land kani-api-server
Name:         kani-api-server
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1beta3
Kind:         Server
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Access Policy:  deny
  Pod Selector:
    Match Labels:
      App:  kani-api
  Port:            8080
  Proxy Protocol:  unknown
Events:            <none>
kurisu@linkerd-worries$ kubectl describe authorizationpolicies.v1alpha1.policy.linkerd.io -n mosura-land kani-auth-gojira-land
Name:         kani-auth-gojira-land
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1alpha1
Kind:         AuthorizationPolicy
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Required Authentication Refs:
    Group:  policy.linkerd.io
    Kind:   MeshTLSAuthentication
    Name:   kani-gojira-meshauth
  Target Ref:
    Group:  policy.linkerd.io
    Kind:   Server
    Name:   kani-api-server
Events:     <none>
kurisu@linkerd-worries$ kubectl describe authorizationpolicies.v1alpha1.policy.linkerd.io -n mosura-land kani-auth-mosura-land
Name:         kani-auth-mosura-land
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1alpha1
Kind:         AuthorizationPolicy
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Required Authentication Refs:
    Group:  policy.linkerd.io
    Kind:   MeshTLSAuthentication
    Name:   kani-mosura-meshauth
  Target Ref:
    Group:  policy.linkerd.io
    Kind:   Server
    Name:   kani-api-server
Events:     <none>
kurisu@linkerd-worries$ kubectl describe -n mosura-land meshtlsauthentications.v1alpha1.policy.linkerd.io kani-gojira-meshauth
Name:         kani-gojira-meshauth
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1alpha1
Kind:         MeshTLSAuthentication
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Identity Refs:
    Kind:       ServiceAccount
    Name:       gojira-proxy
    Namespace:  gojira-land
Events:         <none>
kurisu@linkerd-worries$ kubectl describe -n mosura-land meshtlsauthentications.v1alpha1.policy.linkerd.io kani-mosura-meshauth
Name:         kani-mosura-meshauth
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1alpha1
Kind:         MeshTLSAuthentication
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Identity Refs:
    Kind:       ServiceAccount
    Name:       default
    Namespace:  mosura-land
Events:         <none>
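To make the suspected failure path concrete, here is a deliberately simplified Python model of how the configuration above decides a request. It is an illustration under assumptions, not Linkerd's policy implementation: kani-api-server has Access Policy deny, so a request is admitted only when the client's mTLS identity matches a MeshTLSAuthentication referenced by an AuthorizationPolicy targeting that Server. A connection whose TLS handshake failed carries no client identity, matches no policy, and falls through to deny, which is consistent with the 403 responses described in this issue. The identity string format (with control-plane namespace `linkerd` and trust domain `cluster.local`) and all function and class names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def sa_identity(name: str, namespace: str,
                control_plane_ns: str = "linkerd",
                trust_domain: str = "cluster.local") -> str:
    """Assumed Linkerd-style mTLS identity for a ServiceAccount."""
    return f"{name}.{namespace}.serviceaccount.identity.{control_plane_ns}.{trust_domain}"


@dataclass(frozen=True)
class Server:
    name: str
    access_policy: str  # only "deny" is modelled here


@dataclass(frozen=True)
class AuthorizationPolicy:
    target_server: str
    allowed_identities: frozenset  # identities from the referenced MeshTLSAuthentication


def authorize(server: Server, policies: Iterable[AuthorizationPolicy],
              client_identity: Optional[str]) -> int:
    """Hypothetical decision: 200 if some policy admits the client, else 403."""
    for policy in policies:
        if policy.target_server != server.name:
            continue
        # MeshTLSAuthentication can only match a client with a verified mTLS identity.
        if client_identity is not None and client_identity in policy.allowed_identities:
            return 200
    if server.access_policy != "deny":
        raise NotImplementedError("only the 'deny' default is modelled")
    return 403


# Mirrors the resources shown above.
kani_server = Server(name="kani-api-server", access_policy="deny")
policies = [
    # kani-auth-gojira-land -> kani-gojira-meshauth
    AuthorizationPolicy("kani-api-server", frozenset({sa_identity("gojira-proxy", "gojira-land")})),
    # kani-auth-mosura-land -> kani-mosura-meshauth
    AuthorizationPolicy("kani-api-server", frozenset({sa_identity("default", "mosura-land")})),
]

if __name__ == "__main__":
    print(authorize(kani_server, policies, sa_identity("gojira-proxy", "gojira-land")))  # 200
    print(authorize(kani_server, policies, None))  # 403: no identity after a failed handshake
```

In this model the only way a correctly configured gojira-proxy client can be refused is by arriving without an identity, which is why the sporadic "Peer does not support TLS" handshakes are the interesting part of the logs.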
Would you like to work on fixing this bug?
None