Not catching when nodes are Ready,SchedulingDisabled #203

Open
paigerube14 opened this issue Jul 12, 2023 · 1 comment
@paigerube14
Collaborator

paigerube14 commented Jul 12, 2023

The Cerberus run passes even though 2 nodes are Ready,SchedulingDisabled. Cerberus should catch this as a failure, or offer an option to do so.

Each affected node has the taint: node.kubernetes.io/unschedulable:NoSchedule

07-12 15:50:03.255  
07-12 15:50:03.255                 _                         
07-12 15:50:03.255    ___ ___ _ __| |__   ___ _ __ _   _ ___ 
07-12 15:50:03.255   / __/ _ \ '__| '_ \ / _ \ '__| | | / __|
07-12 15:50:03.255  | (_|  __/ |  | |_) |  __/ |  | |_| \__ \
07-12 15:50:03.255   \___\___|_|  |_.__/ \___|_|   \__,_|___/
07-12 15:50:03.255                                           
07-12 15:50:03.255  
07-12 15:50:03.839  Error: unknown flag: --duration
07-12 15:50:03.839  See 'oc create --help' for usage.
07-12 15:50:04.397  2023-07-12 19:50:04,235 [WARNING] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2633)'))': /api/v1/namespaces/openshift-kube-apiserver-operator/pods?pretty=True&limit=100
07-12 15:50:04.397  2023-07-12 19:50:04,237 [WARNING] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))': /api/v1/namespaces/openshift-user-workload-monitoring/pods?pretty=True&limit=100
07-12 15:50:04.397  2023-07-12 19:50:04,237 [WARNING] Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2633)'))': /api/v1/namespaces/openshift-machine-api/pods?pretty=True&limit=100
07-12 15:50:11.055  2023-07-12 19:50:09,944 [INFO] Iteration 1: No Terminating Namespaces status: True
07-12 15:50:11.055  2023-07-12 19:50:09,966 [INFO] Iteration 1: Node status: True
07-12 15:50:11.055  2023-07-12 19:50:10,174 [INFO] Iteration 1: Cluster Operator status: True
07-12 15:50:11.055  2023-07-12 19:50:10,206 [INFO] Iteration 1: openshift-user-workload-monitoring: True
07-12 15:50:11.055  2023-07-12 19:50:10,207 [INFO] Iteration 1: openshift-ovirt-infra: True
07-12 15:50:11.055  2023-07-12 19:50:10,228 [INFO] Iteration 1: openshift-host-network: True
07-12 15:50:11.055  2023-07-12 19:50:10,230 [INFO] Iteration 1: openshift-kube-apiserver-operator: True
07-12 15:50:11.055  2023-07-12 19:50:10,253 [INFO] Iteration 1: openshift-insights: True
07-12 15:50:11.055  2023-07-12 19:50:10,263 [INFO] Iteration 1: openshift-machine-api: True
07-12 15:50:11.055  2023-07-12 19:50:10,271 [INFO] Iteration 1: openshift-cluster-machine-approver: True
07-12 15:50:11.055  2023-07-12 19:50:10,280 [INFO] Iteration 1: openshift-cluster-version: True
07-12 15:50:11.055  2023-07-12 19:50:10,293 [INFO] Iteration 1: openshift-service-ca-operator: True
07-12 15:50:11.055  2023-07-12 19:50:10,296 [INFO] Iteration 1: openshift-kni-infra: True
07-12 15:50:11.055  2023-07-12 19:50:10,311 [INFO] Iteration 1: openshift-apiserver-operator: True
07-12 15:50:11.055  2023-07-12 19:50:10,318 [INFO] Iteration 1: openshift-infra: True
07-12 15:50:11.055  2023-07-12 19:50:10,343 [INFO] Iteration 1: openshift-kube-storage-version-migrator: True
07-12 15:50:11.055  2023-07-12 19:50:10,347 [INFO] Iteration 1: openshift-cloud-credential-operator: True
07-12 15:50:11.055  2023-07-12 19:50:10,354 [INFO] Iteration 1: openshift-ingress: True
07-12 15:50:11.055  2023-07-12 19:50:10,383 [INFO] Iteration 1: openshift-cluster-samples-operator: True
07-12 15:50:11.055  2023-07-12 19:50:10,388 [INFO] Iteration 1: openshift-console: True
07-12 15:50:11.055  2023-07-12 19:50:10,413 [INFO] Iteration 1: openshift-node: True
07-12 15:50:11.055  2023-07-12 19:50:10,445 [INFO] Iteration 1: openshift-kube-controller-manager-operator: True
07-12 15:50:11.055  2023-07-12 19:50:10,457 [INFO] Iteration 1: openshift-oauth-apiserver: True
07-12 15:50:11.055  2023-07-12 19:50:10,480 [INFO] Iteration 1: openshift-openstack-infra: True
07-12 15:50:11.055  2023-07-12 19:50:10,506 [INFO] Iteration 1: openshift-config-managed: True
07-12 15:50:11.055  2023-07-12 19:50:10,649 [INFO] Iteration 1: openshift-operator-lifecycle-manager: True
07-12 15:50:11.055  2023-07-12 19:50:10,742 [INFO] Iteration 1: openshift-dns-operator: True
07-12 15:50:11.055  2023-07-12 19:50:10,766 [INFO] Iteration 1: kube-node-lease: True
07-12 15:50:11.055  2023-07-12 19:50:10,865 [INFO] Iteration 1: default: True
07-12 15:50:11.643  2023-07-12 19:50:11,353 [INFO] Iteration 1: openshift-machine-config-operator: True
07-12 15:50:11.643  2023-07-12 19:50:11,549 [INFO] Iteration 1: openshift-kube-apiserver: True
07-12 15:50:11.940  2023-07-12 19:50:11,643 [INFO] Iteration 1: openshift-console-operator: True
07-12 15:50:11.940  2023-07-12 19:50:11,646 [INFO] Iteration 1: openshift-image-registry: True
07-12 15:50:11.940  2023-07-12 19:50:11,679 [INFO] Iteration 1: openshift-vsphere-infra: True
07-12 15:50:11.940  2023-07-12 19:50:11,742 [INFO] Iteration 1: openshift-kube-scheduler-operator: True
07-12 15:50:11.940  2023-07-12 19:50:11,767 [INFO] Iteration 1: kube-public: True
07-12 15:50:11.940  2023-07-12 19:50:11,857 [INFO] Iteration 1: openshift-marketplace: True
07-12 15:50:12.195  2023-07-12 19:50:11,962 [INFO] Iteration 1: openshift-apiserver: True
07-12 15:50:12.195  2023-07-12 19:50:12,064 [INFO] Iteration 1: openshift: True
07-12 15:50:12.195  2023-07-12 19:50:12,069 [INFO] Iteration 1: openshift-monitoring: True
07-12 15:50:12.195  2023-07-12 19:50:12,142 [INFO] Iteration 1: openshift-ingress-operator: True
07-12 15:50:12.195  2023-07-12 19:50:12,171 [INFO] Iteration 1: openshift-config: True
07-12 15:50:12.491  2023-07-12 19:50:12,249 [INFO] Iteration 1: openshift-authentication: True
07-12 15:50:12.491  2023-07-12 19:50:12,251 [INFO] Iteration 1: openshift-controller-manager: True
07-12 15:50:12.491  2023-07-12 19:50:12,273 [INFO] Iteration 1: kube-system: True
07-12 15:50:12.491  2023-07-12 19:50:12,342 [INFO] Iteration 1: openshift-etcd-operator: True
07-12 15:50:12.839  2023-07-12 19:50:12,543 [INFO] Iteration 1: openshift-kube-controller-manager: True
07-12 15:50:12.839  2023-07-12 19:50:12,638 [INFO] Iteration 1: openshift-operators: True
07-12 15:50:12.839  2023-07-12 19:50:12,646 [INFO] Iteration 1: openshift-etcd: True
07-12 15:50:12.839  2023-07-12 19:50:12,741 [INFO] Iteration 1: openshift-kube-storage-version-migrator-operator: True
07-12 15:50:12.839  2023-07-12 19:50:12,742 [INFO] Iteration 1: openshift-network-operator: True
07-12 15:50:12.839  2023-07-12 19:50:12,775 [INFO] Iteration 1: openshift-authentication-operator: True
07-12 15:50:13.094  2023-07-12 19:50:12,841 [INFO] Iteration 1: openshift-service-ca: True
07-12 15:50:13.094  2023-07-12 19:50:12,865 [INFO] Iteration 1: openshift-controller-manager-operator: True
07-12 15:50:13.094  2023-07-12 19:50:12,875 [INFO] Iteration 1: openshift-cluster-storage-operator: True
07-12 15:50:13.094  2023-07-12 19:50:12,943 [INFO] Iteration 1: openshift-cluster-node-tuning-operator: True
07-12 15:50:13.094  2023-07-12 19:50:12,950 [INFO] Iteration 1: openshift-config-operator: True
07-12 15:50:13.094  2023-07-12 19:50:12,984 [INFO] Iteration 1: openshift-multus: True
07-12 15:50:14.040  2023-07-12 19:50:13,858 [INFO] Iteration 1: openshift-cluster-csi-drivers: True
07-12 15:50:14.294  2023-07-12 19:50:14,046 [INFO] Iteration 1: openshift-dns: True
07-12 15:50:14.853  2023-07-12 19:50:14,639 [INFO] Iteration 1: openshift-ovn-kubernetes: True
07-12 15:50:14.853  2023-07-12 19:50:14,735 [INFO] Iteration 1: openshift-kube-scheduler: True
07-12 15:50:14.853  2023-07-12 19:50:14,737 [INFO] HTTP requests served: 0 
07-12 15:50:14.853  
07-12 15:50:15.411  2023-07-12 19:50:15,193 [INFO] []
07-12 15:50:15.411  
07-12 15:50:15.411  2023-07-12 19:50:15,193 [INFO] Sleeping for the specified duration: 3
% oc get nodes             
NAME                                         STATUS                     ROLES    AGE     VERSION
........
ip-10-0-214-151.us-east-2.compute.internal   Ready                      worker   4h41m   v1.19.16+8203b20
ip-10-0-215-198.us-east-2.compute.internal   Ready                      worker   4h41m   v1.19.16+8203b20
ip-10-0-215-4.us-east-2.compute.internal     Ready,SchedulingDisabled   worker   4h41m   v1.19.16+8203b20
ip-10-0-218-208.us-east-2.compute.internal   Ready                      worker   4h41m   v1.19.16+8203b20
ip-10-0-220-10.us-east-2.compute.internal    Ready                      worker   4h41m   v1.19.16+8203b20
ip-10-0-221-107.us-east-2.compute.internal   Ready,SchedulingDisabled   worker   5h1m    v1.19.16+8203b20
ip-10-0-221-75.us-east-2.compute.internal    Ready                      master   5h12m   v1.19.16+8203b20
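
To illustrate what catching this could look like, here is a minimal sketch, not Cerberus code. It assumes node objects shaped like the items of `oc get nodes -o json` and flags a node as cordoned when `spec.unschedulable` is true or the taint above is present. The function name `find_cordoned_nodes` and the sample data are hypothetical.

```python
# Taint key that appears on cordoned nodes (same key as reported in this issue).
UNSCHEDULABLE_TAINT = "node.kubernetes.io/unschedulable"


def find_cordoned_nodes(nodes):
    """Return the names of nodes that are cordoned (SchedulingDisabled).

    `nodes` is a list of dicts shaped like items from `oc get nodes -o json`.
    This helper is hypothetical and only for illustration.
    """
    cordoned = []
    for node in nodes:
        spec = node.get("spec", {})
        taints = spec.get("taints") or []
        has_taint = any(t.get("key") == UNSCHEDULABLE_TAINT for t in taints)
        if spec.get("unschedulable") or has_taint:
            cordoned.append(node["metadata"]["name"])
    return cordoned


# Hand-written sample data modeled on the output above (not real API output).
sample_nodes = [
    {"metadata": {"name": "ip-10-0-214-151.us-east-2.compute.internal"}, "spec": {}},
    {
        "metadata": {"name": "ip-10-0-215-4.us-east-2.compute.internal"},
        "spec": {
            "unschedulable": True,
            "taints": [{"key": UNSCHEDULABLE_TAINT, "effect": "NoSchedule"}],
        },
    },
]

print(find_cordoned_nodes(sample_nodes))
```

Checking both the field and the taint is deliberate redundancy in this sketch: either signal alone marks the node as cordoned.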
@chaitanyaenr
Collaborator

If I understand correctly, a node enters the Ready,SchedulingDisabled state when a user intentionally cordons it, which puts it in maintenance mode until the user uncordons it. In that case, we can skip reporting it, since it is intentional from the user's perspective and Cerberus tracks whether the node is Ready or not. Thoughts?
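
One way to reconcile both views would be an opt-in setting that stays off by default, so cordoned nodes pass unless the user asks otherwise. The sketch below is only an assumption about how that could look: `fail_on_unschedulable` is a hypothetical option name, not an existing Cerberus setting, and the node dicts are assumed to follow the shape of `oc get nodes -o json` items.

```python
def is_ready(node):
    """True when the node reports a Ready condition with status "True"."""
    conditions = node.get("status", {}).get("conditions") or []
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def node_check_passes(nodes, fail_on_unschedulable=False):
    """Evaluate overall node status.

    By default only readiness is considered, so a cordoned node still passes.
    With the hypothetical `fail_on_unschedulable=True`, a Ready node whose
    `spec.unschedulable` is set also fails the check.
    """
    for node in nodes:
        if not is_ready(node):
            return False
        if fail_on_unschedulable and node.get("spec", {}).get("unschedulable"):
            return False
    return True


# Hand-written example: one Ready node that has been cordoned.
cordoned_but_ready = {
    "metadata": {"name": "ip-10-0-221-107.us-east-2.compute.internal"},
    "spec": {"unschedulable": True},
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}

print(node_check_passes([cordoned_but_ready]))
print(node_check_passes([cordoned_but_ready], fail_on_unschedulable=True))
```

Keeping the default at `False` preserves the behavior seen in the log above, while users who treat a cordon as a problem can opt in.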
