Workers fails to to register - Unable to register node "worker-0" with API server: nodes is forbidden: User "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope #597

Closed
gklein opened this issue Jun 5, 2019 · 6 comments

Comments

@gklein

gklein commented Jun 5, 2019

Describe the bug
After a cluster deployment, all workers fail to register with:

Unable to register node "worker-0" with API server: nodes is forbidden: User "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope

To Reproduce
Deploy a 3 masters/3 workers cluster (virt) with OCP 4.2.0-0.ci-2019-06-03-105723-kni.0

$ oc get baremetalhosts -A
NAMESPACE               NAME                 STATUS   PROVISIONING STATUS   MACHINE                 BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-machine-api   openshift-master-0   OK       provisioned           ostest-master-0         ipmi://192.168.111.1:6230   unknown            true
openshift-machine-api   openshift-master-1   OK       provisioned           ostest-master-1         ipmi://192.168.111.1:6231   unknown            true
openshift-machine-api   openshift-master-2   OK       provisioned           ostest-master-2         ipmi://192.168.111.1:6232   unknown            true
openshift-machine-api   openshift-worker-0   OK       inspecting            ostest-worker-0-7spfq   ipmi://192.168.111.1:6233                      true
openshift-machine-api   openshift-worker-1   OK       inspecting            ostest-worker-0-zpcw4   ipmi://192.168.111.1:6234                      true
openshift-machine-api   openshift-worker-2   OK       inspecting            ostest-worker-0-wkgrb   ipmi://192.168.111.1:6235                      true

$ oc get nodes -A
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   12m   v1.13.4+cb455d664
master-1   Ready    master   12m   v1.13.4+cb455d664
master-2   Ready    master   12m   v1.13.4+cb455d664

From the worker-0 kubelet log:

Jun 05 06:50:35 worker-0 hyperkube[1579]: I0605 06:50:35.687551    1579 kubelet_node_status.go:72] Attempting to register node worker-0
Jun 05 06:50:35 worker-0 hyperkube[1579]: I0605 06:50:35.687950    1579 event.go:221] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-0", UID:"worker-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeHasSufficientMemory' Node worker-0 status is now: NodeHasSufficientMemory
Jun 05 06:50:35 worker-0 hyperkube[1579]: I0605 06:50:35.688082    1579 event.go:221] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-0", UID:"worker-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeHasNoDiskPressure' Node worker-0 status is now: NodeHasNoDiskPressure
Jun 05 06:50:35 worker-0 hyperkube[1579]: I0605 06:50:35.688196    1579 event.go:221] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-0", UID:"worker-0", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'NodeHasSufficientPID' Node worker-0 status is now: NodeHasSufficientPID
Jun 05 06:50:35 worker-0 hyperkube[1579]: E0605 06:50:35.689491    1579 event.go:203] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-0.15a53abe75c70b03", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-0", UID:"worker-0", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasSufficientMemory", Message:"Node worker-0 status is now: NodeHasSufficientMemory", Source:v1.EventSource{Component:"kubelet", Host:"worker-0"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbf35f76ee69e7d03, ext:444761842, loc:(*time.Location)(0xcedee40)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbf35f76ee8f848e1, ext:484201177, loc:(*time.Location)(0xcedee40)}}, Count:2, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-0.15a53abe75c70b03" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)
Jun 05 06:50:35 worker-0 hyperkube[1579]: E0605 06:50:35.689556    1579 kubelet_node_status.go:94] Unable to register node "worker-0" with API server: nodes is forbidden: User "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope
Jun 05 06:50:35 worker-0 hyperkube[1579]: E0605 06:50:35.692224    1579 event.go:203] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"worker-0.15a53abe75c744ca", GenerateName:"", Namespace:"default", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Node", Namespace:"", Name:"worker-0", UID:"worker-0", APIVersion:"", ResourceVersion:"", FieldPath:""}, Reason:"NodeHasNoDiskPressure", Message:"Node worker-0 status is now: NodeHasNoDiskPressure", Source:v1.EventSource{Component:"kubelet", Host:"worker-0"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbf35f76ee69eb6ca, ext:444776633, loc:(*time.Location)(0xcedee40)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbf35f76ee8f9c3d8, ext:484298192, loc:(*time.Location)(0xcedee40)}}, Count:2, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "worker-0.15a53abe75c744ca" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"' (will not retry!)

Expected/observed behavior
All workers should register with the cluster successfully.

@gklein gklein changed the title from 'Workers fails to to register - Unable to register node "worker-0" with API server: nodes is forbidden: U ser "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope' to 'Workers fails to to register - Unable to register node "worker-0" with API server: nodes is forbidden: User "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope' on Jun 5, 2019
@hardys

hardys commented Jun 5, 2019

I suspect this is the same issue reported in #570. I see these errors too, but they are temporary: eventually (after about 30 minutes in my environment) the workers do register and the errors stop.

I'm not yet sure why this is happening though.

@hardys

hardys commented Jun 5, 2019

I wonder if this is another CSR approval issue, although it's not clear to me how the nodes show as Ready while all of their CSRs are still pending. I guess the ignition-provided cert will expire and then they'll become NotReady?

[shardy@dell-r630-007 dev-scripts]$ oc get nodes
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   63m   v1.13.4+cb455d664
master-1   Ready    master   63m   v1.13.4+cb455d664
master-2   Ready    master   64m   v1.13.4+cb455d664
worker-0   Ready    worker   28m   v1.13.4+cb455d664
worker-1   Ready    worker   28m   v1.13.4+cb455d664
worker-2   Ready    worker   28m   v1.13.4+cb455d664
[shardy@dell-r630-007 dev-scripts]$ oc get csrs
error: the server doesn't have a resource type "csrs"
[shardy@dell-r630-007 dev-scripts]$ oc get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-2z8k2   64m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-6hcsm   63m     system:node:master-1                                                        Approved,Issued
csr-8m64v   64m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-8zqmc   64m     system:node:master-2                                                        Approved,Issued
csr-968jk   39m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-bwkrt   28m     system:node:worker-2                                                        Pending
csr-chtbr   63m     system:node:master-0                                                        Approved,Issued
csr-dm64z   3m23s   system:node:worker-0                                                        Pending
csr-hwfw5   64m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-jgrfs   40m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-q86pd   28m     system:node:worker-1                                                        Pending
csr-sdtgt   3m23s   system:node:worker-2                                                        Pending
csr-vkxk2   15m     system:node:worker-0                                                        Pending
csr-vxm9s   39m     system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
csr-wjnr4   28m     system:node:worker-0                                                        Pending
csr-xbwvk   15m     system:node:worker-2                                                        Pending
csr-zgsxt   15m     system:node:worker-1                                                        Pending
csr-ztb5f   3m23s   system:node:worker-1                                                        Pending
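
With this many rows it is easy to lose track of which requestors still have Pending requests. A minimal sketch of grouping them, assuming the plain-text table format shown above (four whitespace-separated columns); the function name `pending_csrs` and the shortened `SAMPLE` are illustrative and not part of dev-scripts:

```python
from __future__ import annotations

from collections import defaultdict


def pending_csrs(table: str) -> dict[str, list[str]]:
    """Group the names of Pending CSRs by requestor.

    `table` is the text printed by `oc get csr`, header row included.
    """
    pending = defaultdict(list)
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore anything that is not a NAME/AGE/REQUESTOR/CONDITION row
        name, _age, requestor, condition = fields
        if condition == "Pending":
            pending[requestor].append(name)
    return dict(pending)


# A few rows copied from the output above.
SAMPLE = """\
NAME        AGE     REQUESTOR                                                                   CONDITION
csr-bwkrt   28m     system:node:worker-2                                                        Pending
csr-chtbr   63m     system:node:master-0                                                        Approved,Issued
csr-dm64z   3m23s   system:node:worker-0                                                        Pending
csr-sdtgt   3m23s   system:node:worker-2                                                        Pending
"""

if __name__ == "__main__":
    for requestor, names in sorted(pending_csrs(SAMPLE).items()):
        print(requestor, " ".join(names))
```

Run against the full output, this would show that every Pending request comes from a `system:node:worker-*` requestor, while the masters and the node-bootstrapper service account have all been approved.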

@hardys

hardys commented Jun 5, 2019

OK, so the pending CSR approval is due to #421, which means we have to run link-machine-and-node.sh manually for each worker. Perhaps @russellb can give a status update on making that association automatic now that we have introspection data; at the moment I can't find the upstream PRs.

@gklein
Author

gklein commented Jun 5, 2019

> I suspect this is the same issue reported in #570 - I see these errors, but they are temporary, eventually (after about 30mins in my environment) the workers do register, and the errors stop.
>
> I'm not yet sure why this is happening though.

You are right about that. I opened this issue because the workers had not registered after 30 minutes, but they did register eventually.

I never saw this with OCP 4.1, and this is my first try with OCP 4.2, so I wonder if it is related.

In addition, if this is CSR related, should we consider running fix_certs.sh more aggressively for the time being (at least until #260 is fixed)?

@hardys

hardys commented Jun 5, 2019

Ah, thanks for pointing to #260; I was missing that earlier.

I was wondering if we can come up with a better interim hack now we have introspection data, but I can confirm this works fine for me with a single worker:

$ oc get csr
NAME        AGE   REQUESTOR                                                                   CONDITION
csr-h2dcq   26m   system:node:worker-0                                                        Approved,Issued
csr-pbc2m   38m   system:node:worker-0                                                        Approved,Issued
csr-sngmx   49m   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Approved,Issued
$ oc get nodes
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   73m   v1.13.4+cb455d664
master-1   Ready    master   73m   v1.13.4+cb455d664
master-2   Ready    master   73m   v1.13.4+cb455d664
worker-0   Ready    worker   38m   v1.13.4+cb455d664

However, this may all be tangential to your original report, because I see the same 'cannot create resource "nodes"' errors in the logs before worker-0 becomes Ready.
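
To check whether these errors really stop once a worker registers, it can help to tally them by user, verb, and resource. The following is a rough sketch that assumes the kubelet log has been saved as plain text with unwrapped lines; the name `tally_forbidden` is made up for illustration, and the sample messages are copied from the log earlier in this issue:

```python
import re
from collections import Counter

# Matches the authorization failure text the API server returns, e.g.
#   User "system:anonymous" cannot create resource "nodes"
FORBIDDEN = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)"'
)


def tally_forbidden(log_text):
    """Count forbidden errors in log_text, keyed by (user, verb, resource)."""
    counts = Counter()
    for match in FORBIDDEN.finditer(log_text):
        counts[(match["user"], match["verb"], match["resource"])] += 1
    return counts


SAMPLE = '''\
Unable to register node "worker-0" with API server: nodes is forbidden: User "system:anonymous" cannot create resource "nodes" in API group "" at the cluster scope
events "worker-0.15a53abe75c70b03" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"
events "worker-0.15a53abe75c744ca" is forbidden: User "system:anonymous" cannot patch resource "events" in API group "" in the namespace "default"
'''

if __name__ == "__main__":
    for (user, verb, resource), count in tally_forbidden(SAMPLE).items():
        print(f"{count}x {user} cannot {verb} {resource}")
```

Every hit in the sample is attributed to `system:anonymous`, which is the identity the API server assigns to requests that carry no accepted credentials. That points at the kubelet's client certificate rather than at missing RBAC rules for the node.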

@hardys

hardys commented Jun 7, 2019

We've landed the short-term workaround, which is to run fix_certs.sh more regularly, and #260 tracks the long-term fix. Let's close this and track the final solution via #260.
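
The contents of fix_certs.sh are not shown in this thread, so the following is not that script. It is only a hedged sketch of what a single approval pass over pending CSRs could look like, using `oc get csr -o json` and `oc adm certificate approve`. The helper names and the injectable `run` parameter are assumptions made so the logic can be exercised without a cluster. Approving every pending CSR without inspecting it is only reasonable in a throwaway development environment.

```python
import json
import subprocess


def run_oc(args):
    """Run `oc` with the given argument list and return its stdout."""
    result = subprocess.run(
        ["oc", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def pending_names(csr_list):
    """Return names of CSRs that have no status conditions yet.

    A CSR that has been neither approved nor denied carries no conditions.
    """
    return [
        item["metadata"]["name"]
        for item in csr_list.get("items", [])
        if not item.get("status", {}).get("conditions")
    ]


def approve_pending(run=run_oc):
    """Approve every pending CSR once and return the names that were approved."""
    names = pending_names(json.loads(run(["get", "csr", "-o", "json"])))
    for name in names:
        run(["adm", "certificate", "approve", name])
    return names
```

Calling `approve_pending()` from a timer or a loop with a sleep between passes would approximate "running it more regularly". The number of passes matters because, as the listings above show, each worker issues more than one request over time.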

@hardys hardys closed this as completed Jun 7, 2019