Improve retcode reporting in salt/salt-call CLI commands #48361
This is great. Let me respond to a few of your comments as best I can:
Frankly, I think we can be somewhat arbitrary about this. We could, for example, simply agree to return the highest integer. The more important notion is that it's non-zero and there's not a ton that we can do beyond that.
I have never liked this flag. I think it's non-intuitive and I've always felt like salt-call should just "do the right thing" by default which would be to pass through the return code of the called function. That said, I'm pretty skittish about this change to behavior. We'd be talking about something that might bite a lot of people. I'd want to discuss how we'd communicate this impending change. That said, I do think that as much parity as we can achieve between |
salt/minion.py
We've got a lot of things setting retcodes of one. Worth doing some debug logging here so at least the reason could be inspected?
Yes, that can be arranged. The 1s I added are kind of just placeholders right now, though. I would prefer to eventually be using constants defined in salt.defaults.exitcodes.
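A rough sketch of how the debug logging and a named constant might fit together. `EX_GENERIC` below is a local stand-in for the constant of the same name in salt.defaults.exitcodes, and `set_failure_retcode` is a hypothetical helper for illustration, not a function that exists in Salt.

```python
import logging

log = logging.getLogger(__name__)

# Local stand-in for salt.defaults.exitcodes.EX_GENERIC (1, per this PR).
EX_GENERIC = 1


def set_failure_retcode(ret, reason, retcode=EX_GENERIC):
    """Hypothetical helper: record a retcode and log why it was set."""
    # The debug line lets someone inspect which code path set the retcode.
    log.debug("Setting retcode %d: %s", retcode, reason)
    ret["retcode"] = retcode
    return ret
```

Calling `set_failure_retcode(ret, "function raised an exception")` would leave a debug line naming the reason next to the otherwise opaque 1.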
salt/modules/test.py
This is not the most intuitive error message, IMHO. As a user, I wouldn't know what to do if I saw this.
It would be accompanied by a line number in the log file. This is a function designed to be used for testing, so I didn't think we needed something that was all that verbose. The idea is that if you are running this function, you're doing it for a reason. I don't think it's something that 99.99% of non-developer users are ever going to see.
What did you have in mind?
Ah, I failed to recognize that this was in the test module. Yes, your counter-point is sound.
salt/modules/test.py
Same as above.
Issue resolved via above comment.
@terminalmage Right direction! Since exit codes are always very important to any Unix scripting, I would add my few cents here:
- salt-call should not return 1 and 0 for the same thing. If it is a success, then it is 0. But when it is not, it should be something else (TBD).
- sys.exit(11) is still very odd and we should change that. In POSIX it means "Try Again" (?), in Salt exit codes it means "Thin redeployment failure" (??). I would propose to return an actual code, or at least stick to a generic failure and success.
- We probably need to either change exit code normalisation for every command, or at least segregate one set for Salt SSH and another for the rest.
- I would concentrate on what is returned, not how (though that is also very important). Because, again, if one gets 0 for both False and True, then the whole thing does not make much sense.
- Separate "CLI error codes" from "Salt call error codes". That is, whatever mechanism is chosen to determine what kind of "False", with whatever internal error code, the Minion just returned in the actual result inside JSON or so, to the CLI it should be the POSIX way.
I am still puzzled why we are overriding POSIX codes instead of just allocating new ones. I.e. why are we overriding 11 with "Thin wrong deployment" instead of allocating 211 for it?
My vote would be for a generic [...]
@isbm what are your thoughts for signaling an exception raised on a remote minion? I personally like the idea of having an optional key in the return event payload rather than setting a [...] Perhaps [...]
Since the whole thing is quite complex here, I'd break it into steps and then summarise the following:
❓ Open question: How do we handle exit codes between Unix and Windows? The problem is that Windows has up to 16000 ways to crash 😃 and POSIX is no longer applicable here. I would think of a logic inside the [...]:
EX_ACCESS_DENIED = 5 if is_windows() else 13
@twangboy this is where you are going to shine. 😉
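A minimal sketch of the platform-dependent constant suggested above. It assumes a check on `sys.platform` in place of the `is_windows()` helper named in the comment, and `access_denied_code` is a hypothetical function used only to make the choice testable. The value 5 is the Windows system error code ERROR_ACCESS_DENIED; `errno.EACCES` is the POSIX counterpart.

```python
import errno
import sys

# Windows system error code ERROR_ACCESS_DENIED.
WINDOWS_ERROR_ACCESS_DENIED = 5


def access_denied_code(platform=sys.platform):
    """Hypothetical helper: pick the access-denied code for a platform."""
    if platform.startswith("win"):
        return WINDOWS_ERROR_ACCESS_DENIED
    # POSIX "Permission denied".
    return errno.EACCES


# Resolved once at import time, like the one-liner in the comment.
EX_ACCESS_DENIED = access_denied_code()
```

Taking the platform as a parameter is only a convenience for testing both branches on one machine.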
The other option is to assign each type of error to a bit. From 0 to 255 there are 8 bits, and therefore 8 different types of errors. If two different errors occur, xor them together. So 6 would mean some minions did not respond and some states failed. A person could then care about states failing, and not care about minions not responding, in the shell script which decides whether to alarm based on the exit code.
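The bit-flag proposal can be sketched as follows. The flag names and bit positions are hypothetical, chosen only so that 6 matches the example in the comment. The sketch combines flags with bitwise OR: for two distinct flags OR and XOR give the same value, but XOR would clear a bit again if the same error were reported twice (for example by two minions).

```python
# Hypothetical flag names and bit assignments, chosen so that 6 matches
# the example in the comment above.
MINION_NO_RESPONSE = 0b010  # 2
STATE_FAILED = 0b100  # 4


def combine(flags):
    """Fold flags with bitwise OR so a repeated flag stays set."""
    code = 0
    for flag in flags:
        code |= flag
    return code


def has_flag(code, flag):
    """Return True if the given flag bit is set in the exit code."""
    return bool(code & flag)
```

A calling script could then test `has_flag(code, STATE_FAILED)` and ignore the other bits.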
This is not how error codes are supposed to work.
Not sure if 20-plus error codes are the way to go. We need to know for
Anything else is a bit more detail along the same lines. Also, people like to use cmd.run to get real error code results as well, which might be an exception to the rule, and which only makes sense for an action against a single server. Maybe a new command called
I don't think we need a large number of error codes. At most we would need 2, in my opinion. As I see it, here are two good options:
I am suggesting an extra command, e.g. [...], which will give people the scripted remote execution they want, without impacting [...]
Criteria include:
1. __context__['retcode'] is nonzero
2. An exception is caught
3. The return data is a dict, and has either a 'result' or 'success' key with a False value.
This uses a generator expression to stop iteration as soon as a nonzero retcode is found.
This tests that we set return codes properly both for salt and salt-call
Also add tests for checking result/success keys
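A minimal sketch of the failure criteria and the short-circuiting check described in the commit messages above. Both function names are hypothetical, and the shape of `returns` (a dict mapping minion IDs to dicts with a 'retcode' key) is an assumption for illustration rather than Salt's actual return structure.

```python
def is_failure(retcode, data, exception_caught=False):
    """Hypothetical helper applying the three criteria listed above."""
    if retcode != 0 or exception_caught:
        return True
    if isinstance(data, dict):
        # Only an explicit False counts; a missing key is not a failure.
        return any(data.get(key) is False for key in ("result", "success"))
    return False


def any_minion_failed(returns):
    """Stop at the first minion whose return carries a nonzero retcode.

    `returns` is assumed to map minion IDs to dicts with a 'retcode' key.
    """
    # any() consumes the generator lazily, so iteration ends at the
    # first nonzero retcode.
    return any(ret.get("retcode", 0) != 0 for ret in returns.values())
```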
After talking with @thatch45 earlier today on this, this is what we've decided:
My name is @thatch45 and I endorse this conclusion.
db41ded to
828331f
OK, adjustments have been made to error handling, including adjusting some of the integration tests I added and adding a couple new ones. The updated behavior should be as follows:
IMHO we absolutely need a page in the docs dedicated to explaining this added before we merge it.
@cachedout How does this look to you now?
What this PR does
Improves how the salt and salt-call CLI commands set exit codes.
Refs: #47732
For salt:
The following cases will result in a nonzero exit status for salt: [...] result or success key, and its value is False
For salt-call:
The following cases will result in a nonzero exit status for salt-call (assuming it was invoked using --retcode-passthrough): [...] result or success key, and its value is False
Further points of discussion needed before merge
There is some asymmetry to how salt and salt-call handle exit codes:
For salt, when the master sees that one of its minions has set a nonzero retcode in the return data, it does a sys.exit(11). When salt-call is invoked with --retcode-passthrough, it will return whatever retcode was set in __context__['retcode'], otherwise it just returns a 0 exit status (assuming no exception was raised).
For salt, when a minion catches an exception, it sets a retcode in the return event sent back to the master. As noted above, any one minion that generated a nonzero retcode will result in the master exiting with a return code of 11. However, for salt-call, any exception raised just results in a sys.exit(salt.defaults.exitcodes.EX_GENERIC) (with that retcode being 1).
To represent this all in a table:
[Table with columns "salt exit code", "salt-call exit code", and "salt-call --retcode-passthrough exit code"; row labels referenced __context__, __context__['retcode'], and result or success is False. Cell values were not preserved.]
I feel like we should normalize this somehow. Here are my ideas:
- --retcode-passthrough should be deprecated and salt-call's default behavior should be to return a nonzero retcode when there is a failure case
- [...] salt and salt-call when an exception is raised by a minion function
This is somewhat complicated by the fact that salt has to set a CLI retcode based on potentially multiple minions having failed, while salt-call only has to set a CLI retcode based on one host's result. So perhaps some asymmetry is called for. I think it would be worth creating a new exit code in salt.defaults.exitcodes which represents a caught exception, and then exiting with that code in both salt and salt-call. This gets around the fact that when running salt from the master, the master can't necessarily tell whether the minion raised an exception or whether that retcode was set in __context__['retcode']. Alternatively, rather than trying to manage this simply via the retcode, it may be a good idea to set an optional key in the return event payload the minion sends back to the master, so that the master knows unequivocally that an exception was caught. After all, if we just managed this using the retcode key in the return event payload, then one could just set __context__['retcode'] to that number and it would fool the salt CLI command into thinking that an exception had been raised.
I'm less sure what to do when __context__['retcode'] is set to a nonzero value. What if there are multiple distinct retcodes that come back to the master for different minions? Which does the salt CLI command use for its exit code?
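Two of the candidate answers to that question can be sketched side by side: the "highest integer" suggestion raised in the discussion, and exiting with one fixed code whenever any minion reports a nonzero retcode (11 in the behavior described above). Both function names are hypothetical, and `retcodes` is assumed to be a plain iterable of integers collected from the minions.

```python
def highest_retcode(retcodes):
    """Return the largest nonzero retcode, or 0 when there is none."""
    return max((rc for rc in retcodes if rc != 0), default=0)


def fixed_code_on_failure(retcodes, failure_code=11):
    """Return one fixed code if any minion reported a nonzero retcode."""
    return failure_code if any(rc != 0 for rc in retcodes) else 0
```

The first keeps some information about what failed at the cost of an arbitrary tie-break; the second tells the caller only that something failed.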