When `preflight` results in FAILED, it exits with code 0 #1131

xavpaice · 2023-04-20T06:53:09Z

Bug Description

When I run a preflight test (via kubectl preflight), the output might be, for example:

$ kubectl preflight --interactive=false ./preflight.yaml
name: cluster-resources    status: running         completed: 0    total: 2
name: cluster-resources    status: completed       completed: 1    total: 2
name: cluster-info         status: running         completed: 1    total: 2
name: cluster-info         status: completed       completed: 2    total: 2

   --- PASS Required Kubernetes Version
      --- Your cluster meets the recommended and required versions of Kubernetes.
   --- FAIL: Must have at least 3 nodes in the cluster
      --- This application requires at least 3 nodes
--- FAIL   preflight-sample
FAILED

$ echo $?
0

Expected Behavior

If I use this in a shell script or any other wrapper that uses preflights as a test to determine whether the cluster is up to snuff, I need to grep the output rather than simply use the exit code, as most folks would by default.

I would expect the exit code for failed tests to be non-zero, though possibly different if there's an error actually running the preflights as opposed to preflights failing.
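
Until the exit code reflects the outcome, a wrapper has to inspect the text. A minimal sketch of that workaround, assuming the output format shown above (`--- FAIL` result lines or a final `FAILED` line) and assuming the results are written to standard output. The function name `preflight_output_failed` is hypothetical and not part of the tool:

```shell
#!/bin/sh
# Hypothetical helper: reads preflight output on stdin and returns 0
# if that output reports a failure, based on the format shown in this report.
preflight_output_failed() {
    # Match a result line such as "   --- FAIL: ..." or "--- FAIL   name",
    # or a bare "FAILED" summary line. "--- PASS" lines do not match.
    grep -Eq '^[[:space:]]*--- FAIL|^FAILED$'
}
```

A script could then pipe the command into it, for example `kubectl preflight --interactive=false ./preflight.yaml | preflight_output_failed && exit 1`. This is fragile because it depends on the exact wording of the output, which is the reason a non-zero exit code is preferable.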

Steps To Reproduce

Preflight 0.62.1, Linux/Intel, k3d cluster with one node.
Very simple preflight check:

apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample
spec:
  analyzers:
    - clusterVersion:
        outcomes:
          - fail:
              when: "< 1.19.0"
              message: The application requires at least Kubernetes 1.19.0, and recommends 1.24.0.
              uri: https://kubernetes.io
          - warn:
              when: "< 1.23.0"
              message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.24.0 or later.
              uri: https://kubernetes.io
          - pass:
              message: Your cluster meets the recommended and required versions of Kubernetes.
    - nodeResources:
        checkName: Must have at least 3 nodes in the cluster
        outcomes:
          - fail:
              when: "count() < 3"
              message: This application requires at least 3 nodes
          - pass:
              message: This cluster has enough nodes.

From what I can see, the preflight package doesn't check the outcome and return a boolean for the result; it returns only the text of the result itself.


banjoh · 2023-04-20T07:29:05Z

I think we would need a few exit codes, for at least the following:

3 - Failed checks
2 - Invalid input (cli options, invalid spec)
1 - catch all

The exit code values above are arbitrary, except 1, which is inspired by bash. We can choose other values if we wish, as long as they remain consistent across all troubleshoot binaries.

I suggest we also introduce this change in supportbundle, because configured analysers get executed there and can fail.

CpuID · 2023-04-23T13:41:02Z

Adding an extra code here: 4, meaning no failures but at least 1 warning


CpuID · 2023-04-23T14:54:42Z

This is functionally complete, but I need to refactor some changes I made to cli.RootCmd to still propagate the exit code up the stack, while ensuring the tests pass. This relates to how spf13/cobra's Command only allows an error to be propagated up...

CpuID · 2023-04-27T21:09:50Z

#1135 works now, and tests pass

ready for review/merge

same with docs replicatedhq/troubleshoot.sh#489

0 = all passed, 3 = at least one failure, 4 = no failures but at least 1 warn, 1 as a catch all (generic errors), 2 for invalid input/specs etc. ref #1131, docs replicatedhq/troubleshoot.sh#489

adding docs for multiple exit code support in preflight, ref replicatedhq/troubleshoot#1131 (Co-authored-by: Paige Calvert <paige@replicated.com>)

CpuID · 2023-05-10T00:27:07Z

Released in Troubleshoot v0.63.0
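
With the codes described in this thread (0 = all passed, 3 = at least one failure, 4 = no failures but at least one warning, 1 = generic error, 2 = invalid input or spec), a wrapper script could branch on the exit status directly instead of parsing output. A rough sketch; the function name `describe_preflight_exit` and its messages are illustrative and not part of Troubleshoot:

```shell
#!/bin/sh
# Hypothetical helper: maps a preflight exit code to a short description,
# using the code values listed in this thread.
describe_preflight_exit() {
    case "$1" in
        0) echo "all checks passed" ;;
        1) echo "generic error" ;;
        2) echo "invalid input or spec" ;;
        3) echo "at least one check failed" ;;
        4) echo "no failures, but at least one warning" ;;
        *) echo "unexpected exit code: $1" ;;
    esac
}
```

For example, right after running `kubectl preflight --interactive=false ./preflight.yaml`, a script could call `describe_preflight_exit "$?"` for logging and then decide for itself whether code 4 (warnings only) should block a deployment.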

CpuID self-assigned this Apr 21, 2023

CpuID pushed a commit that referenced this issue Apr 23, 2023

support multiple exit codes based on what went wrong. 0 = all passed, 3 = at least one failure, 4 = no failures but at least 1 warn, 1 as a catch all, 2 for invalid input etc. ref #1131 (cd6962f)

CpuID mentioned this issue Apr 23, 2023

support multiple exit codes based on what went wrong/right #1135

Merged

6 tasks

CpuID pushed a commit to replicatedhq/troubleshoot.sh that referenced this issue Apr 23, 2023

adding docs for multiple exit code support in preflight, ref replicatedhq/troubleshoot#1131 (ae4cf19)

CpuID mentioned this issue Apr 23, 2023

adding docs for multiple exit code support in preflight replicatedhq/troubleshoot.sh#489

Merged

CpuID closed this as completed in #1135 May 9, 2023
