Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When preflight results in FAILED, it exits with code 0 #1131

Closed
xavpaice opened this issue Apr 20, 2023 · 5 comments · Fixed by #1135
Closed

When preflight results in FAILED, it exits with code 0 #1131

xavpaice opened this issue Apr 20, 2023 · 5 comments · Fixed by #1135
Assignees

Comments

@xavpaice
Copy link
Member

Bug Description

When I run a preflight test (via kubectl preflight ), the output might be, for example:

$ kubectl preflight --interactive=false ./preflight.yaml
name: cluster-resources    status: running         completed: 0    total: 2
name: cluster-resources    status: completed       completed: 1    total: 2
name: cluster-info         status: running         completed: 1    total: 2
name: cluster-info         status: completed       completed: 2    total: 2

   --- PASS Required Kubernetes Version
      --- Your cluster meets the recommended and required versions of Kubernetes.
   --- FAIL: Must have at least 3 nodes in the cluster
      --- This application requires at least 3 nodes
--- FAIL   preflight-sample
FAILED

$ echo $?
0

Expected Behavior

If I use this in a shell script or any other wrapper that uses preflights as a test to determine if the cluster is up to snuff or not, I need to grep for the output rather than simply use the exit code like most folks would be default.

I would expect the exit code for failed tests to be non-zero, though possibly different if there's an error actually running the preflights as opposed to preflights failing.

Steps To Reproduce

Preflight 0.62.1, Linux/Intel, k3d cluster with one node.
Very simple preflight check:

apiVersion: troubleshoot.sh/v1beta2
kind: Preflight
metadata:
  name: preflight-sample
spec:
  analyzers:
    - clusterVersion:
        outcomes:
          - fail:
              when: "< 1.19.0"
              message: The application requires at least Kubernetes 1.19.0, and recommends 1.24.0.
              uri: https://kubernetes.io
          - warn:
              when: "< 1.23.0"
              message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.24.0 or later.
              uri: https://kubernetes.io
          - pass:
              message: Your cluster meets the recommended and required versions of Kubernetes.
    - nodeResources:
        checkName: Must have at least 3 nodes in the cluster
        outcomes:
          - fail:
              when: "count() < 3"
              message: This application requires at least 3 nodes
          - pass:
              message: This cluster has enough nodes.

From what I can see, the preflight package return doesn't do any check to actually return a boolean for the result, just the text of the result itself.

@banjoh
Copy link
Member

banjoh commented Apr 20, 2023

I think we would need a few exit codes for at least the following

  • 3 - Failed checks
  • 2 - Invalid input (cli options, invalid spec)
  • 1 - catch all

The exit codes values above are arbitrary except 1 which is inspired by bash. We can choose other exit code values if we so wish as long as they remain consistent for all troubleshoot binaries.

I suggest w also introduce this change in the supportbundle cause configured analysers get executed and can fail.

@CpuID CpuID self-assigned this Apr 21, 2023
@CpuID
Copy link
Contributor

CpuID commented Apr 23, 2023

Adding an extra code here: 4 to be no failures, but at least 1 warning

CpuID pushed a commit that referenced this issue Apr 23, 2023
= at least one failure, 4 = no failures but at least 1 warn

1 as a catch all, 2 for invalid input etc

ref #1131
CpuID pushed a commit to replicatedhq/troubleshoot.sh that referenced this issue Apr 23, 2023
@CpuID
Copy link
Contributor

CpuID commented Apr 23, 2023

This is functionally complete, but I need to refactor some changes I made to cli.RootCmd to still propagate the exit code up the stack, while ensuring the tests pass. Relates to how spf13/cobra's Command only allows an error to be propagated up...

@CpuID
Copy link
Contributor

CpuID commented Apr 27, 2023

#1135 works now, and tests pass

ready for review/merge

same with docs replicatedhq/troubleshoot.sh#489

CpuID pushed a commit that referenced this issue May 9, 2023
0 = all passed, 3 = at least one failure, 4 = no failures but at least 1 warn

1 as a catch all (generic errors), 2 for invalid input/specs etc

ref #1131

docs replicatedhq/troubleshoot.sh#489
CpuID pushed a commit to replicatedhq/troubleshoot.sh that referenced this issue May 9, 2023
* adding docs for multiple exit code support in preflight

ref replicatedhq/troubleshoot#1131

---------

Co-authored-by: Paige Calvert <paige@replicated.com>
@CpuID
Copy link
Contributor

CpuID commented May 10, 2023

Released in Troubleshoot v0.63.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants