Sporadic HTTP 403 Response Codes Due to TLS Handshake Failures #13865

@soma-kurisu

What is the issue?

We are seeing sporadic HTTP 403 response codes from linkerd-proxy across all our clusters. The attached logs capture one such occurrence along the path gojira-pod's linkerd-proxy sidecar <-> gojira-pod's linkerd-debug sidecar <-> mosura-pod's linkerd-debug sidecar <-> mosura-pod's linkerd-proxy sidecar. Approximately 5 per mille (0.5%) of all requests fail due to unsuccessful TLS handshakes, resulting in HTTP 403 response codes.
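
To put the reported rate in perspective, here is a rough back-of-the-envelope sketch. It assumes failures are independent and occur at about 5 per mille; the independence assumption and the function names are illustrative and were not measured.

```python
def expected_failures(n_requests: int, failure_rate: float) -> float:
    """Expected number of failed requests at a fixed failure rate."""
    return n_requests * failure_rate


def prob_at_least_one_failure(n_requests: int, failure_rate: float) -> float:
    """Probability of at least one failure, assuming independent requests."""
    return 1.0 - (1.0 - failure_rate) ** n_requests


if __name__ == "__main__":
    rate = 0.005  # about 5 per mille, as reported above
    for n in (100, 1000, 5000):
        print(n, expected_failures(n, rate), prob_at_least_one_failure(n, rate))
```

Under these assumptions, a run of about 1000 requests should contain roughly five failures and is very likely to show at least one, while a run of 100 requests will often show none.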

How can it be reproduced?

To reproduce the issue, send approximately 1000 requests through a linkerd-proxy setup. A small percentage of them will fail, and the log message "Peer does not support TLS" appears during the TLS handshake. The client and server are in different namespaces: the client is gojira-pod and the server is mosura-pod, which serves the kani-api. The issue can be reproduced with any client talking to any server that provides a REST API.
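
A minimal sketch of such a request loop, using only the Python standard library, might look like the following. The helper names `fetch_status` and `count_statuses` are hypothetical, and the target URL has to be supplied by whoever runs it; nothing here is taken from the original test setup.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Send one GET request and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we want to count.
        return err.code
    # Connection-level errors (urllib.error.URLError) are left to propagate.


def count_statuses(url: str, n_requests: int,
                   fetch: Callable[[str], int] = fetch_status) -> Counter:
    """Send n_requests sequential requests and tally the status codes."""
    counts: Counter = Counter()
    for _ in range(n_requests):
        counts[fetch(url)] += 1
    return counts
```

Run from inside the client pod, `count_statuses(url, 1000)` with `url` pointing at the kani-api service returns a tally in which a handful of 403 entries would match the behaviour described above. The `fetch` parameter exists only so the loop can be exercised without a network.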

For detailed configuration, please refer to the "Additional context" section, where the configurations of authorizationpolicies.v1alpha1.policy.linkerd.io, servers.v1beta3.policy.linkerd.io, and meshtlsauthentications.v1alpha1.policy.linkerd.io can be reviewed.

Logs, error output, etc

gojira-debug.log.csv
gojira-proxy.log.csv
mosura-debug.log.csv
mosura-proxy.log.csv
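
For a quick first pass over the attached files, a plain line scan for the handshake message can show which side logs it and how often. This is only a sketch: it treats each file as text and does not assume anything about the CSV columns, and the helper name `count_matching_lines` is hypothetical.

```python
from typing import Dict, Iterable

MESSAGE = "Peer does not support TLS"


def count_matching_lines(paths: Iterable[str], needle: str = MESSAGE) -> Dict[str, int]:
    """Count, per file, the lines that contain the given message."""
    result: Dict[str, int] = {}
    for path in paths:
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, encoding="utf-8", errors="replace") as handle:
            result[str(path)] = sum(1 for line in handle if needle in line)
    return result
```

Calling it with the four file names listed above returns one count per file.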

Output of linkerd check -o short

kurisu@linkerd-worries$ linkerd check -n linkerd --linkerd-namespace linkerd --cni-namespace linkerd  -o short
linkerd-identity
----------------
‼ trust anchors are valid for at least 60 days
    Anchors expiring soon:
        * 4116 BlueOysterCA will expire on 2025-05-23T16:53:17Z
    see https://linkerd.io/2/checks/#l5d-identity-trustAnchors-not-expiring-soon for hints
‼ issuer cert is valid for at least 60 days
    issuer certificate will expire on 2025-05-01T08:56:18Z
    see https://linkerd.io/2/checks/#l5d-identity-issuer-cert-not-expiring-soon for hints

linkerd-webhooks-and-apisvc-tls
-------------------------------
‼ proxy-injector cert is valid for at least 60 days
    Anchors expiring soon:
        * 4116 BlueOysterCA will expire on 2025-05-23T16:53:17Z
        * 558509666953049899959298119593666570293338784147 root.linkerd.cluster.local will expire on 2025-04-13T15:29:32Z
    see https://linkerd.io/2/checks/#l5d-proxy-injector-webhook-cert-not-expiring-soon for hints
‼ sp-validator cert is valid for at least 60 days
    Anchors expiring soon:
        * 4116 BlueOysterCA will expire on 2025-05-23T16:53:17Z
        * 558509666953049899959298119593666570293338784147 root.linkerd.cluster.local will expire on 2025-04-13T15:29:32Z
    see https://linkerd.io/2/checks/#l5d-sp-validator-webhook-cert-not-expiring-soon for hints
‼ policy-validator cert is valid for at least 60 days
    Anchors expiring soon:
        * 4116 BlueOysterCA will expire on 2025-05-23T16:53:17Z
        * 558509666953049899959298119593666570293338784147 root.linkerd.cluster.local will expire on 2025-04-13T15:29:32Z
    see https://linkerd.io/2/checks/#l5d-policy-validator-webhook-cert-not-expiring-soon for hints

linkerd-version
---------------
‼ cli is up-to-date
    is running version 25.2.3 but the latest edge version is 25.3.3
    see https://linkerd.io/2/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ control plane is up-to-date
    is running version 25.1.2 but the latest edge version is 25.3.3
    see https://linkerd.io/2/checks/#l5d-version-control for hints
‼ control plane and cli versions match
    control plane running edge-25.1.2 but cli running edge-25.2.3
    see https://linkerd.io/2/checks/#l5d-version-control for hints

linkerd-control-plane-proxy
---------------------------
‼ control plane proxies are up-to-date
    some proxies are not running the current version:
        * linkerd-destination-5c7f864fcc-2zqp6 (edge-25.1.2)
        * linkerd-destination-5c7f864fcc-k2k7t (edge-25.1.2)
        * linkerd-identity-7554d6fdd-pjhvd (edge-25.1.2)
        * linkerd-identity-7554d6fdd-qlwph (edge-25.1.2)
        * linkerd-proxy-injector-6d47f9cc8-jht52 (edge-25.1.2)
        * linkerd-proxy-injector-6d47f9cc8-splcs (edge-25.1.2)
    see https://linkerd.io/2/checks/#l5d-cp-proxy-version for hints
‼ control plane proxies and cli versions match
    linkerd-destination-5c7f864fcc-2zqp6 running edge-25.1.2 but cli running edge-25.2.3
    see https://linkerd.io/2/checks/#l5d-cp-proxy-cli-version for hints

Status check results are √

Environment

kurisu@linkerd-worries$ kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.30.7

Possible solution

No response

Additional context

kurisu@linkerd-worries$ kubectl get -n mosura-land authorizationpolicies.v1alpha1.policy.linkerd.io
NAME                            AGE
kani-auth-gojira-land           16d
kani-auth-mosura-land           16d
kurisu@linkerd-worries$ kubectl get -n gojira-land authorizationpolicies.v1alpha1.policy.linkerd.io
No resources found in gojira-land namespace.

kurisu@linkerd-worries$ linkerd authz -n mosura-land deploy/kani-api
ROUTE   SERVER              AUTHORIZATION_POLICY     SERVER_AUTHORIZATION
*       kani-api-server     kani-auth-gojira-land
*       kani-api-server     kani-auth-mosura-land

kurisu@linkerd-worries$ kubectl describe servers.v1beta3.policy.linkerd.io -n mosura-land kani-api-server
Name:         kani-api-server
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1beta3
Kind:         Server
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Access Policy:  deny
  Pod Selector:
    Match Labels:
      App:         kani-api
  Port:            8080
  Proxy Protocol:  unknown
Events:            <none>

kurisu@linkerd-worries$ kubectl describe authorizationpolicies.v1alpha1.policy.linkerd.io -n mosura-land kani-auth-gojira-land
Name:         kani-auth-gojira-land
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1alpha1
Kind:         AuthorizationPolicy
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Required Authentication Refs:
    Group:  policy.linkerd.io
    Kind:   MeshTLSAuthentication
    Name:   kani-gojira-meshauth
  Target Ref:
    Group:  policy.linkerd.io
    Kind:   Server
    Name:   kani-api-server
Events:     <none>

kurisu@linkerd-worries$ kubectl describe authorizationpolicies.v1alpha1.policy.linkerd.io -n mosura-land kani-auth-mosura-land
Name:         kani-auth-mosura-land
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1alpha1
Kind:         AuthorizationPolicy
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Required Authentication Refs:
    Group:  policy.linkerd.io
    Kind:   MeshTLSAuthentication
    Name:   kani-mosura-meshauth
  Target Ref:
    Group:  policy.linkerd.io
    Kind:   Server
    Name:   kani-api-server
Events:     <none>

kurisu@linkerd-worries$ kubectl describe -n mosura-land meshtlsauthentications.v1alpha1.policy.linkerd.io kani-gojira-meshauth
Name:         kani-gojira-meshauth
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1alpha1
Kind:         MeshTLSAuthentication
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Identity Refs:
    Kind:       ServiceAccount
    Name:       gojira-proxy
    Namespace:  gojira-land
Events:         <none>

kurisu@linkerd-worries$ kubectl describe -n mosura-land meshtlsauthentications.v1alpha1.policy.linkerd.io kani-mosura-meshauth
Name:         kani-mosura-meshauth
Namespace:    mosura-land
Labels:       app.kubernetes.io/managed-by=Helm
Annotations:  meta.helm.sh/release-name: kani-api
              meta.helm.sh/release-namespace: mosura-land
API Version:  policy.linkerd.io/v1alpha1
Kind:         MeshTLSAuthentication
Metadata:
  Creation Timestamp:  <redacted>
  Generation:          <redacted>
  Resource Version:    <redacted>
  UID:                 <redacted>
Spec:
  Identity Refs:
    Kind:       ServiceAccount
    Name:       default
    Namespace:  mosura-land
Events:         <none>
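
Read together, the resources above say: the Server denies by default, and each AuthorizationPolicy admits clients that present one of the listed ServiceAccount identities over mesh TLS. The following is a deliberately simplified model of that decision, written to illustrate why a connection without a TLS identity would be rejected. It is not Linkerd's implementation, and the type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ServiceAccountRef:
    name: str
    namespace: str


# Identities taken from the two MeshTLSAuthentication resources above.
ALLOWED: FrozenSet[ServiceAccountRef] = frozenset({
    ServiceAccountRef("gojira-proxy", "gojira-land"),
    ServiceAccountRef("default", "mosura-land"),
})


def is_authorized(client: Optional[ServiceAccountRef],
                  allowed: FrozenSet[ServiceAccountRef] = ALLOWED) -> bool:
    """Model of a default-deny server with mesh-TLS identity allow rules."""
    if client is None:
        # No mesh TLS identity was established, so no policy can match.
        return False
    return client in allowed
```

In this model, a client from gojira-land whose handshake fails arrives with `client=None` and is denied even though its ServiceAccount is on the allow list, which is consistent with the sporadic 403 responses reported here.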

Would you like to work on fixing this bug?

None
