[BUG] K3S fails to authenticate completely valid service account tokens #4820
Comments
You didn't specify a
|
^^ this is correct, and the most likely cause of your issue. Ensure that you've got the same token on all three nodes. Right now you've probably got different bootstrap data (cluster root certificates, etc) on all three servers, so that tokens issued by one are not valid on others. |
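For reference, on a default k3s install the bootstrap data referred to above lives under /var/lib/rancher/k3s/server, so one way to confirm all three servers really agree is to compare the token and key material on each node. A sketch, assuming the default data directory:

    # run on every server node; all outputs should be identical across the three servers
    sudo cat /var/lib/rancher/k3s/server/token
    # the cluster CAs and the service-account signing key are part of the bootstrap data;
    # tokens signed by one server only validate on another if these match
    sudo sha256sum /var/lib/rancher/k3s/server/tls/server-ca.crt \
                   /var/lib/rancher/k3s/server/tls/client-ca.crt \
                   /var/lib/rancher/k3s/server/tls/service.key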
All three of my nodes have the exact same bootstrap token; this is weird. |
What version did you upgrade from? If you upgraded from a previous minor version, you might see if just draining/cordoning and then uncordoning your nodes (to restart the pods) fixes things. |
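For reference, the drain/uncordon cycle suggested here looks roughly like the following with kubectl (node name taken from this cluster; flags may vary slightly by kubectl version):

    # evict the pods so they are recreated (and pick up freshly issued tokens)
    kubectl drain us-dedi1 --ignore-daemonsets --delete-emptydir-data
    # once workloads have rescheduled, allow scheduling on the node again
    kubectl uncordon us-dedi1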
I believe I upgraded from 1.21.5. |
Is there a tool to validate that all my certificates are valid? I'm also noticing that when using curl over HTTPS to access the API, half my requests succeed and the others fail with a 401. The cluster contents look identical on the surface; I just don't understand why this is going on. Node draining did not help either. |
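For reference, the 401-vs-403 distinction matters when testing like this: a request that authenticates but lacks RBAC for the path gets a 403, while a token the apiserver does not recognize at all gets a 401. A minimal in-pod test with the mounted service account token looks something like this (in-cluster defaults assumed):

    # run from inside a pod; kubelet mounts the service account token and CA here by default
    TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
    CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    curl --cacert "$CACERT" -H "Authorization: Bearer $TOKEN" \
      https://kubernetes.default.svc/api/v1/namespaces

If the same token alternates between 403 and 401 across repeated requests, that points at the servers behind the endpoint disagreeing on whether the token is valid, rather than at an RBAC problem.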
After some more debugging with cURL, it would also appear that after ~4-5 requests in a row I get 401 Unauthorized (I assume this is also normal behavior?), but the audit logs show me this:
"kind": "Event",
"apiVersion": "audit.k8s.io/v1",
"level": "Metadata",
"auditID": "2fe95f15-5f46-43dd-93c4-4b7201f6b0e1",
"stage": "ResponseComplete",
"requestURI": "/",
"verb": "get",
"user": {
"username": "system:serviceaccount:<redacted>",
"uid": "8393b1a9-0d54-4d22-8d2c-b44408b6029e",
"groups": [
"system:serviceaccounts",
"system:serviceaccounts:<redacted>",
"system:authenticated"
],
"extra": {
"authentication.kubernetes.io/pod-name": [
"<redacted>"
],
"authentication.kubernetes.io/pod-uid": [
"0f492408-829a-46d1-9080-6d60cf8b45fd"
]
}
},
"sourceIPs": [
"10.42.2.197"
],
"userAgent": "curl/7.79.1",
"responseStatus": {
"metadata": {},
"status": "Failure",
"reason": "Forbidden",
"code": 403
},
"requestReceivedTimestamp": "2021-12-23T00:28:49.967345Z",
"stageTimestamp": "2021-12-23T00:28:49.967706Z",
"annotations": {
"authorization.k8s.io/decision": "forbid",
"authorization.k8s.io/reason": ""
}
}
later...
    {
      "kind": "Event",
      "apiVersion": "audit.k8s.io/v1",
      "level": "Metadata",
      "auditID": "1cdc6a36-4416-4352-8197-68ff4e4fefff",
      "stage": "ResponseStarted",
      "requestURI": "/",
      "verb": "get",
      "user": {},
      "sourceIPs": [
        "10.42.2.197"
      ],
      "userAgent": "curl/7.79.1",
      "responseStatus": {
        "metadata": {},
        "status": "Failure",
        "reason": "Unauthorized",
        "code": 401
      },
      "requestReceivedTimestamp": "2021-12-23T00:31:53.684574Z",
      "stageTimestamp": "2021-12-23T00:31:53.684850Z"
    }
It would appear it isn't detecting the user (even though valid authentication was provided). Something is definitely broken here, and I do not believe I messed something up on my side; this started after upgrading. |
I am closing this issue now, as it turned out the time on my nodes was not synced, sorry.
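Worth spelling out why clock sync matters here: on recent Kubernetes versions the mounted service account token is a bound JWT issued via the TokenRequest API and carries iat/nbf/exp claims, so a server whose clock is hours off can reject a token that is perfectly valid elsewhere. The time claims can be inspected with something like this (a rough sketch; python3 is only used to handle the base64url padding, and legacy secret-based tokens have no time claims):

    # decode the second (claims) segment of the mounted token and look at iat/exp
    cat /var/run/secrets/kubernetes.io/serviceaccount/token \
      | cut -d. -f2 \
      | python3 -c 'import base64,json,sys; s=sys.stdin.read().strip(); print(json.dumps(json.loads(base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))), indent=2))'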
Just for posterity, how off were they? |
They were around 8 hours off, and not synced with NTP.
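On Ubuntu hosts like these, the clock state and NTP synchronization can be checked and re-enabled with timedatectl (systemd-timesyncd by default; chrony works as well):

    # show current time, time zone, and whether the clock is NTP-synchronized
    timedatectl status
    # turn NTP synchronization back on
    sudo timedatectl set-ntp true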
Environmental Info:
K3s Version:
k3s version v1.22.5+k3s1 (405bf79)
go version go1.16.10
Node(s) CPU architecture, OS, and Version:
Linux us-dedi1 5.4.0-90-generic #101-Ubuntu SMP Fri Oct 15 20:00:55 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux us-dedi2 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux us-dedi3 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Cluster Configuration:
3 nodes running in high availability.
Installed using:
    curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL=latest sh -s - server \
      --datastore-endpoint="etcd nodes" \
      --kube-apiserver-arg=enable-admission-plugins=DefaultTolerationSeconds,PodSecurityPolicy \
      --kube-apiserver-arg=feature-gates=ServiceInternalTrafficPolicy=true \
      --docker \
      --kube-apiserver-arg=default-not-ready-toleration-seconds=15 \
      --kube-apiserver-arg=default-unreachable-toleration-seconds=15 \
      --disable=traefik
Audit logs are enabled.
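For reference, "audit logs are enabled" corresponds to the usual kube-apiserver audit flags passed through k3s; a minimal version looks roughly like the following (the file paths and policy are illustrative, not the exact configuration used on this cluster):

    # extra arguments appended to the k3s server invocation above
    --kube-apiserver-arg=audit-policy-file=/etc/rancher/k3s/audit-policy.yaml \
    --kube-apiserver-arg=audit-log-path=/var/lib/rancher/k3s/server/logs/audit.log

    # /etc/rancher/k3s/audit-policy.yaml -- matches the Metadata-level events quoted above
    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
      - level: Metadata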
Describe the bug:
I have noticed that most of my API server requests are failing on two of my nodes after updating to version 1.22.5. I read something in the documentation about the TokenRequest API; that doesn't work either.
The default install of traefik will fail with "Failed to watch... Unauthorized"; I reproduced this on my local testing cluster as well.
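Regarding the TokenRequest API mentioned above: it can be exercised directly by POSTing to a service account's token subresource, which is a useful way to check whether freshly issued tokens also fail. A rough sketch via kubectl proxy (namespace and service account name are placeholders):

    # open an authenticated local proxy to the apiserver
    kubectl proxy --port=8001 &
    # ask for a one-hour bound token for the "default" service account in namespace "default"
    curl -s -X POST -H "Content-Type: application/json" \
      -d '{"apiVersion":"authentication.k8s.io/v1","kind":"TokenRequest","spec":{"expirationSeconds":3600}}' \
      http://127.0.0.1:8001/api/v1/namespaces/default/serviceaccounts/default/token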
Steps To Reproduce:
Expected behavior:
All of my servers handle API server requests properly, without two of them returning Unauthorized despite a valid service account token.
Actual behavior:
Two of my kube-apiservers are failing, responding with 401 Unauthorized, which is weird because one node authenticates the same tokens properly.
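One way to make this per-node split visible is to send the same authenticated request to each server's apiserver directly (port 6443) instead of through a load-balanced endpoint; something along these lines, reusing a known-good service account token in $TOKEN (hostnames are this cluster's node names):

    # identical request against each server; in the situation described above,
    # one server answers normally while two return 401
    for host in us-dedi1 us-dedi2 us-dedi3; do
      printf '%s: ' "$host"
      curl -sk -o /dev/null -w '%{http_code}\n' \
        -H "Authorization: Bearer $TOKEN" \
        "https://$host:6443/api/v1/namespaces"
    done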
Additional context / logs:
My production logs show the following (this being one of the nodes that reports Unauthorized; one node authenticates properly):