
[BUG] K3S fails to authenticate completely valid service account tokens #4820

Closed
1 task
IceeMC opened this issue Dec 21, 2021 · 10 comments

Comments

@IceeMC

IceeMC commented Dec 21, 2021

Environmental Info:
K3s Version:

k3s version v1.22.5+k3s1 (405bf79)
go version go1.16.10

Node(s) CPU architecture, OS, and Version:

Linux us-dedi1 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux us-dedi2 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux us-dedi3 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

3 nodes running in high availability.

Installed using: curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest sh -s - server --datastore-endpoint="etcd nodes" --kube-apiserver-arg=enable-admission-plugins=DefaultTolerationSeconds,PodSecurityPolicy --kube-apiserver-arg=feature-gates=ServiceInternalTrafficPolicy=true --docker --kube-apiserver-arg=default-not-ready-toleration-seconds=15 --kube-apiserver-arg=default-unreachable-toleration-seconds=15 --disable=traefik

Audit logs are enabled.

Describe the bug:

I have noticed that most of my API server requests fail on two of my nodes after updating to version 1.22.5. I read something in the documentation about the TokenRequest API; that doesn't work either.
The default install of Traefik fails with "Failed to watch... Unauthorized"; I reproduced this on my local testing cluster as well.

Steps To Reproduce:

  • Install the K3s version above.
  • Traefik deploys, but its requests return 401 Unauthorized.
  • A 401 is also returned for service account tokens over a year old; Rancher fails too, as does anything else using a service account token.
  • Try to delete the service account token secret; you will see that does nothing either.

Expected behavior:

All of my servers handle API server requests properly, instead of two of them returning Unauthorized despite valid service account tokens.

Actual behavior:

Two of my kube-apiservers are rejecting requests with 401 Unauthorized, which is odd because the third node authenticates the same tokens properly.

Additional context / logs:

My production logs show (this being the node which reports unauthenticated, one node authenticates properly):

E1221 23:49:56.958198       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.IngressClass: failed to list *v1.IngressClass: Unauthorized
E1221 23:49:57.869531       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
E1221 23:49:58.008336       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Unauthorized
E1221 23:49:58.174947       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.TLSStore: failed to list *v1alpha1.TLSStore: Unauthorized
E1221 23:49:58.357891       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: Unauthorized
E1221 23:49:58.428274       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.Middleware: failed to list *v1alpha1.Middleware: Unauthorized
E1221 23:49:58.488262       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.IngressClass: failed to list *v1.IngressClass: Unauthorized
E1221 23:49:59.727860       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Unauthorized
E1221 23:50:00.299258       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Secret: failed to list *v1.Secret: Unauthorized
E1221 23:50:00.358712       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Unauthorized
E1221 23:50:00.757727       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1.IngressClass: failed to list *v1.IngressClass: Unauthorized
E1221 23:50:01.212120       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.TLSStore: failed to list *v1alpha1.TLSStore: Unauthorized
E1221 23:50:01.320630       1 reflector.go:138] pkg/mod/k8s.io/client-go@v0.21.0/tools/cache/reflector.go:167: Failed to watch *v1alpha1.Middleware: failed to list *v1alpha1.Middleware: Unauthorized

Backporting

  • Needs backporting to older releases
@ChristianCiach

You didn't specify a token when installing the k3s servers. According to https://rancher.com/docs/k3s/latest/en/installation/ha/:

The same example command in Step 2 can be used to join additional server nodes, where the token from the first node needs to be used.

If the first server node was started without the --token CLI flag or K3S_TOKEN variable, the token value can be retrieved from any server already joined to the cluster:
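A minimal sketch of that comparison. The real token file on a k3s server is typically `/var/lib/rancher/k3s/server/node-token`; the `/tmp` files and hostnames below are hypothetical stand-ins so the snippet runs anywhere:

```shell
# On each k3s server the join token normally lives at
# /var/lib/rancher/k3s/server/node-token. Copy it from every node
# (e.g. via scp) and compare hashes. The /tmp files here are
# hypothetical stand-ins for tokens fetched from three nodes.
printf 'K10abc::server:secret1\n' > /tmp/token-us-dedi1
printf 'K10abc::server:secret1\n' > /tmp/token-us-dedi2
printf 'K10xyz::server:secret2\n' > /tmp/token-us-dedi3

# Count the distinct token hashes; more than 1 means at least one
# server was started with a different token.
sha256sum /tmp/token-us-dedi* | awk '{print $1}' | sort -u | wc -l
```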

@brandond
Contributor

^^ this is correct, and the most likely cause of your issue. Ensure that you've got the same token on all three nodes. Right now you've probably got different bootstrap data (cluster root certificates, etc.) on all three servers, so tokens issued by one are not valid on the others.

@IceeMC
Author

IceeMC commented Dec 22, 2021

All three of my nodes have the exact same bootstrap token; this is weird.

@brandond
Contributor

What version did you upgrade from? If you upgraded from a previous minor version, you might see if just draining/cordoning and then uncordoning your nodes (to restart the pods) fixes things.

@IceeMC
Author

IceeMC commented Dec 23, 2021

I believe I upgraded from 1.21.5.

@IceeMC
Author

IceeMC commented Dec 23, 2021

Is there a tool to validate that all my certificates are valid? I'm also noticing that when using curl over HTTPS to access the API, half of my requests succeed and the others fail with 401. The cluster contents look identical on the surface; I just don't understand why this is happening. Node draining did not help either.
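For reference, there's no single built-in validator, but openssl can report each certificate's validity window. A sketch, generating a throwaway self-signed certificate so it runs anywhere; on a k3s server the certificates to inspect typically live under `/var/lib/rancher/k3s/server/tls/` (e.g. `client-ca.crt`, `server-ca.crt`):

```shell
# Generate a throwaway self-signed certificate so this snippet is
# self-contained; on a real k3s server, point -in at the files under
# /var/lib/rancher/k3s/server/tls/ instead.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 365 -subj "/CN=demo" 2>/dev/null

# Print the certificate's validity window (notBefore / notAfter).
openssl x509 -in /tmp/demo.crt -noout -dates

# Exit status 0 if the cert will still be valid in 24h (86400 seconds).
openssl x509 -in /tmp/demo.crt -noout -checkend 86400 && echo "certificate OK"
```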

@IceeMC
Author

IceeMC commented Dec 23, 2021

After some more debugging with curl, it also appears that after requesting a few times in a row (~4-5), I get 401 Unauthorized (I assume this is also normal behavior?), but the audit logs show me this:

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "2fe95f15-5f46-43dd-93c4-4b7201f6b0e1",
  "stage": "ResponseComplete",
  "requestURI": "/",
  "verb": "get",
  "user": {
    "username": "system:serviceaccount:<redacted>",
    "uid": "8393b1a9-0d54-4d22-8d2c-b44408b6029e",
    "groups": [
      "system:serviceaccounts",
      "system:serviceaccounts:<redacted>",
      "system:authenticated"
    ],
    "extra": {
      "authentication.kubernetes.io/pod-name": [
        "<redacted>"
      ],
      "authentication.kubernetes.io/pod-uid": [
        "0f492408-829a-46d1-9080-6d60cf8b45fd"
      ]
    }
  },
  "sourceIPs": [
    "10.42.2.197"
  ],
  "userAgent": "curl/7.79.1",
  "responseStatus": {
    "metadata": {},
    "status": "Failure",
    "reason": "Forbidden",
    "code": 403
  },
  "requestReceivedTimestamp": "2021-12-23T00:28:49.967345Z",
  "stageTimestamp": "2021-12-23T00:28:49.967706Z",
  "annotations": {
    "authorization.k8s.io/decision": "forbid",
    "authorization.k8s.io/reason": ""
  }
}

later...

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "1cdc6a36-4416-4352-8197-68ff4e4fefff",
  "stage": "ResponseStarted",
  "requestURI": "/",
  "verb": "get",
  "user": {},
  "sourceIPs": [
    "10.42.2.197"
  ],
  "userAgent": "curl/7.79.1",
  "responseStatus": {
    "metadata": {},
    "status": "Failure",
    "reason": "Unauthorized",
    "code": 401
  },
  "requestReceivedTimestamp": "2021-12-23T00:31:53.684574Z",
  "stageTimestamp": "2021-12-23T00:31:53.684850Z"
}

It appears the apiserver isn't detecting the user (even though valid authentication was provided?). Something is definitely broken here, and I don't believe I messed something up on my side; this started after upgrading.
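One way to dig further is to decode the token's JWT payload and compare its `iat`/`exp` claims against the server clock. A self-contained sketch using a synthetic token (on a real cluster, a pod's token is mounted at `/var/run/secrets/kubernetes.io/serviceaccount/token`; the claims below are made up):

```shell
# A service account token is a JWT: header.payload.signature, each part
# base64url-encoded. This synthetic token stands in for a real one.
claims='{"iss":"kubernetes/serviceaccount","iat":1640217600,"exp":1671753600}'
b64url() { base64 | tr '+/' '-_' | tr -d '=\n'; }
token="$(printf '{"alg":"RS256"}' | b64url).$(printf '%s' "$claims" | b64url).sig"

# Decode the payload (second dot-separated field): undo the url-safe
# alphabet, re-pad to a multiple of 4, then base64-decode.
p=$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')
while [ $(( ${#p} % 4 )) -ne 0 ]; do p="${p}="; done
printf '%s' "$p" | base64 -d; echo

# If iat lies in the future relative to the apiserver's clock (e.g. an
# 8-hour skew between nodes), that apiserver rejects the token with 401.
```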

@IceeMC
Author

IceeMC commented Dec 23, 2021

I am closing this issue now; the problem was that the time was not synced across my nodes, sorry.

@IceeMC IceeMC closed this as completed Dec 23, 2021
@brandond
Contributor

Just for posterity, how far off were they?

@IceeMC
Author

IceeMC commented Dec 23, 2021

They were around 8 hours off, and not synced via NTP.
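A quick way to spot this kind of skew is to collect each node's clock as an epoch timestamp and diff them. The hostnames and timestamp values below are hypothetical stand-ins (in a real cluster you'd gather them over ssh):

```shell
# In a real cluster you'd collect each node's clock, e.g.:
#   for h in us-dedi1 us-dedi2 us-dedi3; do ssh "$h" date -u +%s; done
# The values below are made-up stand-ins showing an 8-hour skew on one node.
t1=1640303340   # us-dedi1
t2=1640303342   # us-dedi2
t3=1640274540   # us-dedi3, 8 hours behind

skew=$(( t1 - t3 ))
echo "max skew: ${skew}s (~$(( skew / 3600 ))h)"

# Anything beyond a few seconds can break JWT validation; on systemd
# distros, NTP sync can be enabled with: sudo timedatectl set-ntp true
```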
