[OKD FCOS 4.15] OKD master unable to start up, boot logs location to see errors with oc #1995
As port 6443 is up and you're able to SSH into the control-planes, I would use that route to find out some information about the cluster. You can use the node-kubeconfig files in /etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/. For example, when connected to one of the control-plane nodes:

export KUBECONFIG="/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost-recovery.kubeconfig"
oc get clusterversion,co,nodes,csr

Based on the results you can find out what part of the cluster needs your attention.
This cluster likely has pending CSRs that need to be approved. On a control-plane node:

export KUBECONFIG="/etc/kubernetes/static-pod-resources/kube-apiserver-certs/secrets/node-kubeconfigs/localhost-recovery.kubeconfig"
oc get csr | grep Pending | awk '{print $1}' | xargs oc adm certificate approve
oc get csr

When new Pending CSRs appear, approve those as well.
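The filter in that one-liner can be sanity-checked locally against sample output shaped like what `oc get csr` prints (the CSR names below are made up for illustration):

```shell
# Sample rows in the columns `oc get csr` emits:
# NAME, AGE, SIGNERNAME, REQUESTOR, CONDITION
sample='NAME        AGE   SIGNERNAME                                    REQUESTOR              CONDITION
csr-abc12   10m   kubernetes.io/kube-apiserver-client-kubelet   system:node:master-0   Pending
csr-def34   12m   kubernetes.io/kube-apiserver-client-kubelet   system:node:master-1   Approved,Issued'

# Same filter as the one-liner: keep only Pending rows, print the CSR name column.
# These are the names that would be fed to `oc adm certificate approve`.
echo "$sample" | grep Pending | awk '{print $1}'
# → csr-abc12
```

Note that kubelet serving certificates often come in pairs: approving the first batch of client CSRs can cause a second batch of serving CSRs to appear, which is why re-running `oc get csr` afterwards matters.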
Hi Melle, my cluster revived and it works! Thanks so much for your instantaneous support on my case!
Describe the bug
I have a 3-node OKD FCOS 4.15 airgapped cluster with an on-prem Quay registry. On cluster restart, port 6443 shows green on HAProxy, but machine-config port 22623 stays red and the console doesn't come up. I can't query anything with oc from my bastion server, but I can SSH into each master. How can I check the boot-up Ignition errors on each individual master node to see why it's failing? I've queried coreos-ignition-write-issues.service, but it doesn't show any significant errors.
On this note, besides communicating with the Quay repo, I'd also like to check: are the master.ign and worker.ign files served by the Apache httpd server important for cluster startup? Although my master.ign is accessible, I thought the bootstrap certificates have a 24-hour expiry, so the Ignition files would no longer be needed after cluster initialization. Is my theory correct?
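One way to look at per-node Ignition boot logs, a sketch run over SSH on each master (guarded so it degrades gracefully on hosts without systemd tooling; the exact journal tags are an assumption based on how Ignition logs on FCOS):

```shell
# Hypothetical debugging commands for a Fedora CoreOS master node.
# Run these after SSHing into the node as core/root.
if command -v journalctl >/dev/null 2>&1; then
  # Ignition messages from the current boot (Ignition logs under the "ignition" syslog tag)
  journalctl -b -t ignition --no-pager | tail -n 50

  # The FCOS unit that surfaces Ignition write failures, already checked in the report
  systemctl status coreos-ignition-write-issues.service --no-pager || true
else
  echo "journalctl not available on this host; inspect /var/log on the node instead"
fi
```

Note that Ignition only runs on the very first boot of a node; on a restart of an already-provisioned cluster, failures are more likely to show up in kubelet, crio, or machine-config-daemon logs than in Ignition's.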
Cluster environment
OKD Cluster Version: 4.15.0-0.okd-2024-03-10-010116
Kernel version: v1.28.2-3598+6e2789bbd58938-dirty
Installation method: Bare-metal UPI (Airgapped, self hosted quay)