
[BUG] K3S fails to authenticate completely valid service account tokens #4820

Closed
1 task
IceeMC opened this issue Dec 21, 2021 · 10 comments

Comments

@IceeMC

IceeMC commented Dec 21, 2021

Environmental Info:
K3s Version:

k3s version v1.22.5+k3s1 (405bf79)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:

Linux us-dedi1 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux us-dedi2 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux us-dedi3 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

3 nodes running in high availability.

Installed using: curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest sh -s - server --datastore-endpoint="etcd nodes" --kube-apiserver-arg=enable-admission-plugins=DefaultTolerationSeconds,PodSecurityPolicy --kube-apiserver-arg=feature-gates=ServiceInternalTrafficPolicy=true --docker --kube-apiserver-arg=default-not-ready-toleration-seconds=15 --kube-apiserver-arg=default-unreachable-toleration-seconds=15 --disable=traefik

Audit logs are enabled.

Describe the bug:

I have noticed that most of my API server requests fail on two of my nodes after updating to version 1.22.5. I read something in the documentation about the TokenRequest API; that doesn't work either.
The default install of Traefik fails with "Failed to watch... Unauthorized"; I reproduced this on my local testing cluster as well.

Steps To Reproduce:

  • Install the K3s version above.
  • Traefik deploys, but its requests return 401 Unauthorized.
  • A 401 is also returned for service account tokens over a year old; Rancher fails too, as does anything else using a service account token.
  • Try to delete the service account token secret; you will see that does nothing either.

Expected behavior:

All of my servers handle API server requests properly, instead of two of them returning Unauthorized despite valid service account tokens.

Actual behavior:

Two of my kube-apiservers are rejecting requests with 401 Unauthorized, which is odd because the third node authenticates the same tokens properly.

Additional context / logs:

My production logs show (this being the node which reports unauthenticated, one node authenticates properly):

E1221 23:49:56.958198       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.IngressClass: failed to list *v1.IngressClass: Unauthorized
E1221 23:49:57.869531       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
E1221 23:49:58.008336       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Unauthorized
E1221 23:49:58.174947       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.TLSStore: failed to list *v1alpha1.TLSStore: Unauthorized
E1221 23:49:58.357891       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: Unauthorized
E1221 23:49:58.428274       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.Middleware: failed to list *v1alpha1.Middleware: Unauthorized
E1221 23:49:58.488262       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.IngressClass: failed to list *v1.IngressClass: Unauthorized
E1221 23:49:59.727860       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
E1221 23:50:00.299258       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: Unauthorized
E1221 23:50:00.358712       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Unauthorized
E1221 23:50:00.757727       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.IngressClass: failed to list *v1.IngressClass: Unauthorized
E1221 23:50:01.212120       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.TLSStore: failed to list *v1alpha1.TLSStore: Unauthorized
E1221 23:50:01.320630       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.Middleware: failed to list *v1alpha1.Middleware: Unauthorized

Backporting

  • Needs backporting to older releases
@ChristianCiach

You didn't specify a token when installing the k3s servers. According to https://rancher.com/docs/k3s/latest/en/installation/ha/:

The same example command in Step 2 can be used to join additional server nodes, where the token from the first node needs to be used.

If the first server node was started without the --token CLI flag or K3S_TOKEN variable, the token value can be retrieved from any server already joined to the cluster:
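A minimal sketch of that comparison. The real token file on a k3s server is typically `/var/lib/rancher/k3s/server/node-token`; the `/tmp` files and hostnames below are hypothetical stand-ins so the snippet runs anywhere:

```shell
# On each k3s server the join token normally lives at
# /var/lib/rancher/k3s/server/node-token. Copy it from every node
# (e.g. via scp) and compare hashes. The /tmp files here are
# hypothetical stand-ins for tokens fetched from three nodes.
printf 'K10abc::server:secret1\n' > /tmp/token-us-dedi1
printf 'K10abc::server:secret1\n' > /tmp/token-us-dedi2
printf 'K10xyz::server:secret2\n' > /tmp/token-us-dedi3

# Count the distinct token hashes; more than 1 means at least one
# server was started with a different token.
sha256sum /tmp/token-us-dedi* | awk '{print $1}' | sort -u | wc -l
```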

@brandond
Contributor

^^ this is correct, and the most likely cause of your issue. Ensure that you've got the same token on all three nodes. Right now you've probably got different bootstrap data (cluster root certificates, etc.) on all three servers, so tokens issued by one are not valid on the others.

@IceeMC
Author

IceeMC commented Dec 22, 2021

All three of my nodes have the exact same bootstrap token; this is weird.

@brandond
Contributor

What version did you upgrade from? If you upgraded from a previous minor version, you might see if just draining/cordoning and then uncordoning your nodes (to restart the pods) fixes things.

@IceeMC
Author

IceeMC commented Dec 23, 2021

I believe I upgraded from 1.21.5.

@IceeMC
Author

IceeMC commented Dec 23, 2021

Is there a tool to validate that all my certificates are valid? I'm also noticing that when using curl over HTTPS to access the API, half of my requests succeed and the others fail with 401. The cluster contents look identical on the surface; I just don't understand why this is happening. Node draining did not help either.
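For reference, there's no single built-in validator, but openssl can report each certificate's validity window. A sketch, generating a throwaway self-signed certificate so it runs anywhere; on a k3s server the certificates to inspect typically live under `/var/lib/rancher/k3s/server/tls/` (e.g. `client-ca.crt`, `server-ca.crt`):

```shell
# Generate a throwaway self-signed certificate so this snippet is
# self-contained; on a real k3s server, point -in at the files under
# /var/lib/rancher/k3s/server/tls/ instead.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 365 -subj "/CN=demo" 2>/dev/null

# Print the certificate's validity window (notBefore / notAfter).
openssl x509 -in /tmp/demo.crt -noout -dates

# Exit status 0 if the cert will still be valid in 24h (86400 seconds).
openssl x509 -in /tmp/demo.crt -noout -checkend 86400 && echo "certificate OK"
```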

@IceeMC
Author

IceeMC commented Dec 23, 2021

After some more debugging with curl, it also appears that after requesting a few times in a row (~4-5), I get 401 Unauthorized (I assume this is also normal behavior?), but the audit logs show me this:

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "2fe95f15-5f46-43dd-93c4-4b7201f6b0e1",
  "stage": "ResponseComplete",
  "requestURI": "/",
  "verb": "get",
  "user": {
    "username": "system:serviceaccount:<redacted>",
    "uid": "8393b1a9-0d54-4d22-8d2c-b44408b6029e",
    "groups": [
      "system:serviceaccounts",
      "system:serviceaccounts:<redacted>",
      "system:authenticated"
    ],
    "extra": {
      "authentication.kubernetes.io/pod-name": [
        "<redacted>"
      ],
      "authentication.kubernetes.io/pod-uid": [
        "0f492408-829a-46d1-9080-6d60cf8b45fd"
      ]
    }
  },
  "sourceIPs": [
    "10.42.2.197"
  ],
  "userAgent": "curl/7.79.1",
  "responseStatus": {
    "metadata": {},
    "status": "Failure",
    "reason": "Forbidden",
    "code": 403
  },
  "requestReceivedTimestamp": "2021-12-23T00:28:49.967345Z",
  "stageTimestamp": "2021-12-23T00:28:49.967706Z",
  "annotations": {
    "authorization.k8s.io/decision": "forbid",
    "authorization.k8s.io/reason": ""
  }
}

later...

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "1cdc6a36-4416-4352-8197-68ff4e4fefff",
  "stage": "ResponseStarted",
  "requestURI": "/",
  "verb": "get",
  "user": {},
  "sourceIPs": [
    "10.42.2.197"
  ],
  "userAgent": "curl/7.79.1",
  "responseStatus": {
    "metadata": {},
    "status": "Failure",
    "reason": "Unauthorized",
    "code": 401
  },
  "requestReceivedTimestamp": "2021-12-23T00:31:53.684574Z",
  "stageTimestamp": "2021-12-23T00:31:53.684850Z"
}

It appears the apiserver isn't detecting the user (even though valid authentication was provided?). Something is definitely broken here, and I don't believe I messed something up on my side; this started after upgrading.
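One way to dig further is to decode the token's JWT payload and compare its `iat`/`exp` claims against the server clock. A self-contained sketch using a synthetic token (on a real cluster, a pod's token is mounted at `/var/run/secrets/kubernetes.io/serviceaccount/token`; the claims below are made up):

```shell
# A service account token is a JWT: header.payload.signature, each part
# base64url-encoded. This synthetic token stands in for a real one.
claims='{"iss":"kubernetes/serviceaccount","iat":1640217600,"exp":1671753600}'
b64url() { base64 | tr '+/' '-_' | tr -d '=\n'; }
token="$(printf '{"alg":"RS256"}' | b64url).$(printf '%s' "$claims" | b64url).sig"

# Decode the payload (second dot-separated field): undo the url-safe
# alphabet, re-pad to a multiple of 4, then base64-decode.
p=$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#p} % 4 )) -ne 0 ]; do p="${p}="; done
printf '%s' "$p" | base64 -d; echo

# If iat lies in the future relative to the apiserver's clock (e.g. an
# 8-hour skew between nodes), that apiserver rejects the token with 401.
```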

@IceeMC
Author

IceeMC commented Dec 23, 2021

I am closing this issue now; the problem was that the time was not synced across my nodes, sorry.

@IceeMC IceeMC closed this as completed Dec 23, 2021
@brandond
Contributor

Just for posterity, how far off were they?

@IceeMC
Author

IceeMC commented Dec 23, 2021

They were around 8 hours off, and not synced via NTP.
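A quick way to spot this kind of skew is to collect each node's clock as an epoch timestamp and diff them. The hostnames and timestamp values below are hypothetical stand-ins (in a real cluster you'd gather them over ssh):

```shell
# In a real cluster you'd collect each node's clock, e.g.:
#   for h in us-dedi1 us-dedi2 us-dedi3; do ssh "$h" date -u +%s; done
# The values below are made-up stand-ins showing an 8-hour skew on one node.
t1=1640303340   # us-dedi1
t2=1640303342   # us-dedi2
t3=1640274540   # us-dedi3, 8 hours behind

skew=$(( t1 - t3 ))
echo "max skew: ${skew}s (~$(( skew / 3600 ))h)"

# Anything beyond a few seconds can break JWT validation; on systemd
# distros, NTP sync can be enabled with: sudo timedatectl set-ntp true
```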
