Improve retcode reporting in salt/salt-call CLI commands #48361
What this PR does
Improves how the
|Nonzero retcode set in
||11||0||Whatever was set in
I feel like we should normalize this somehow. Here are my ideas:
--retcode-passthrough should be deprecated and salt-call's default behavior should be to return a nonzero retcode when there is a failure case
- The same retcode should be returned in both salt and salt-call when an exception is raised by a minion function
This is somewhat complicated by the fact that salt has to set a CLI retcode based on potentially multiple minions having failed, while salt-call only has to set a CLI retcode based on one host's result. So perhaps some asymmetry is called for. I think it would be worth creating a new exit code in salt.defaults.exitcodes which represents a caught exception, and then exiting with that code in both salt and salt-call. This gets around the fact that when running salt from the master, the master can't necessarily tell whether the minion raised an exception or whether that retcode was set in __context__['retcode']. Alternatively, rather than trying to manage this simply via the retcode, it may be a good idea to set an optional key in the return event payload the minion sends back to the master, so that the master knows unequivocally that an exception was caught. After all, if we just managed this using the retcode key in the return event payload, then one could just set __context__['retcode'] to that number and it would fool the salt CLI command into thinking that an exception had been raised.
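The optional-key idea can be sketched roughly as follows. This is only an illustration: the key name `exception_caught` and the constant `EX_MINION_EXCEPTION` (with its value) are hypothetical and are not taken from Salt's actual return payload or from salt.defaults.exitcodes.

```python
# Hypothetical sketch: the key name and exit code value below are
# illustrative assumptions, not Salt's real payload format.
EX_MINION_EXCEPTION = 100  # arbitrary placeholder value


def exception_was_raised(ret):
    """Return True only if the minion explicitly flagged a caught exception."""
    # The decision rests on a dedicated key, never on the retcode itself.
    return bool(ret.get("exception_caught", False))


# A function that merely sets its retcode to the "exception" number
# does not carry the flag, so it cannot fool the master.
spoofed = {"id": "minion1", "retcode": EX_MINION_EXCEPTION}

# A minion that really caught an exception sets the flag explicitly.
raised = {"id": "minion2", "retcode": 1, "exception_caught": True}

print(exception_was_raised(spoofed))
print(exception_was_raised(raised))
```

With a separate key, the retcode stays free to carry whatever the function set, and the master's view of "an exception happened" no longer depends on a magic number.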
I'm less sure what to do when __context__['retcode'] is set to a nonzero value. What if there are multiple distinct retcodes that come back to the master for different minions? Which does the salt CLI command use for its exit code?
This is great. Let me respond to a few of your comments as best I can:
Frankly, I think we can be somewhat arbitrary about this. We could, for example, simply agree to return the highest integer. The more important notion is that it's non-zero and there's not a ton that we can do beyond that.
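As a rough sketch of the "return the highest integer" rule, assuming the per-minion retcodes have already been collected into a dict keyed by minion ID (the function name `aggregate_retcode` and the input shape are hypothetical, not part of Salt):

```python
def aggregate_retcode(retcodes):
    """Collapse per-minion retcodes into a single CLI exit code.

    `retcodes` maps a minion ID to the integer retcode it returned.
    """
    # Any nonzero retcode makes the result nonzero; which nonzero value
    # wins is arbitrary, so simply take the largest one.
    return max(retcodes.values(), default=0)


returns = {"web1": 0, "web2": 2, "db1": 1}
print(aggregate_retcode(returns))
```

The important property is only that a single failing minion makes the overall exit code nonzero; a caller that needs per-minion detail would still have to inspect the individual returns.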
I have never liked this flag. I think it's non-intuitive and I've always felt like salt-call should just "do the right thing" by default which would be to pass through the return code of the called function. That said, I'm pretty skittish about this change to behavior. We'd be talking about something that might bite a lot of people. I'd want to discuss how we'd communicate this impending change. That said, I do think that as much parity as we can achieve between
My vote would be for a generic
@isbm what are your thoughts for signaling an exception raised on a remote minion? I personally like the idea of having an optional key in the return event payload rather than setting a
Since the whole thing is quite complex here, I'd break it into steps and then summarise the following:
Problem is that Windows has up to 16000 ways to crash
EX_ACCESS_DENIED = 5 if is_windows() else 13
@twangboy this is where you are going to shine.
The other option is to assign each error type to a bit. An exit code from 0 to 255 has 8 bits, so there can be 8 different types of errors. If two different errors occur, xor them together.
So 6 would mean some minions did not respond and some states failed. A shell script that decides whether to alarm based on the exit code could then care about states failing and ignore minions not responding.
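A minimal sketch of the bit-per-error scheme follows. The flag names and the specific bit assignments are assumptions, chosen only so that the combined value 6 matches the example above. The sketch combines flags with bitwise OR rather than XOR: for two different errors the result is identical, but OR does not cancel out when the same error is reported twice.

```python
from enum import IntFlag


class ExitFlag(IntFlag):
    # Hypothetical bit assignments; only the combined value 6 comes
    # from the discussion, not which bit means which error.
    MINION_NO_RESPONSE = 2
    STATE_FAILED = 4


def combine(errors):
    """Fold a sequence of error flags into one exit code (0-255)."""
    code = ExitFlag(0)
    for err in errors:
        code |= err  # OR keeps a bit set even if an error repeats
    return int(code)


code = combine([ExitFlag.MINION_NO_RESPONSE, ExitFlag.STATE_FAILED])

# A caller can test for one kind of error and ignore the others.
states_failed = bool(code & ExitFlag.STATE_FAILED)
print(code, states_failed)
```

A wrapper script could apply the same mask to the exit status it receives and alarm only on the bits it cares about.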
Not sure if 20-plus error codes are the way to go.
We need to know for
Anything else is just more detail along the same lines.
Also people like to use cmd.run to get real error code results as well, which might be an exception to the rule which only makes sense for an action against a single server. Maybe a new command called
I don't think we need a large number of error codes. At most we would need 2, in my opinion. As I see it, here are two good options:
I am suggesting an extra command e.g.
Which will give people the scripted remote execution they want, without impacting
After talking with @thatch45 earlier today on this, this is what we've decided:
OK, adjustments have been made to error handling, including adjusting some of the integration tests I added and adding a couple new ones. The updated behavior should be as follows: