New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exit codes #564

Closed
clausa opened this Issue Mar 11, 2015 · 30 comments

Comments

Projects
None yet
4 participants
@clausa

clausa commented Mar 11, 2015

We are running "/usr/sbin/rear checklayout || /usr/sbin/rear mkrescue" from cron.

If EXIT_CODE gets set to a non-zero value anywhere in the code, /usr/sbin/rear writes "WARNING rc=$EXIT_CODE" to syslog. That triggers a logcheck script to notify our monitoring system - so someone can look into this.

However, if there is a change in the layout, EXIT_CODE is set to 1, in order to trigger mkrescue to run.

Unfortunately that also result in a WARNING in syslog, which creates a false alarm.

Any suggestions, on how to "fix" this?

@gdha gdha added the enhancement label Mar 19, 2015

@gdha gdha added this to the Rear v1.18 milestone Mar 19, 2015

@gdha gdha self-assigned this Mar 19, 2015

@gdha gdha added the discuss/RFC label Mar 19, 2015

@gdha

This comment has been minimized.

Show comment
Hide comment
@gdha

gdha Mar 19, 2015

Member

EXIT_CODE=1 must be kept for the cron entry so that mkrescue is getting triggered.
We could have different exit codes. e.g. EXIT_CODE=2 for serious errors...
Proposal:

  • EXIT_CODE=0 : no issue found
  • EXIT_CODE=1 : layout changed (needed to trigger mkrescue). I would like to see this logged as well in syslog, but how to formulate this? NOTICE: rear layout changed - trigger mkrescue or something like that?
  • EXIT_CODE>1 : serious error that should get logged in syslog
Member

gdha commented Mar 19, 2015

EXIT_CODE=1 must be kept for the cron entry so that mkrescue is getting triggered.
We could have different exit codes. e.g. EXIT_CODE=2 for serious errors...
Proposal:

  • EXIT_CODE=0 : no issue found
  • EXIT_CODE=1 : layout changed (needed to trigger mkrescue). I would like to see this logged as well in syslog, but how to formulate this? NOTICE: rear layout changed - trigger mkrescue or something like that?
  • EXIT_CODE>1 : serious error that should get logged in syslog
@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Mar 19, 2015

Contributor

@gdha
why must it be "EXIT_CODE=1" for cron?

I think in general the exit code value '1' is used for "anything wrong"
and not to indicate a specific outcome.

In rear-1.17.0 in usr/sbin/rear I don't see a reason why the EXIT_CODE value for cron cannot be something else.

In particular in usr/sbin/rear it seems "EXIT_CODE=1" is already used for "ERROR: The specified command '$WORKFLOW' does not exist !"

If my understanding is right I would like to suggest something like:

EXIT_CODE=0 : no issue found

EXIT_CODE=1 : default value for "anything wrong"

EXIT_CODE=2 ... EXIT_CODE=9 : reserved for specific fatal errors.

EXIT_CODE>9 : reserved to indicate a specific outcome

in particular currently used:

EXIT_CODE=11 : layout changed

For background info what I have in mind see the exit codes of CUPS backends and see CUPS filters how messages are prefixed there, see
"Exit Codes" at
http://cups.org/documentation.php/doc-1.7/man-backend.html
and "Log Messages" at
http://cups.org/documentation.php/doc-1.7/man-filter.html

Contributor

jsmeix commented Mar 19, 2015

@gdha
why must it be "EXIT_CODE=1" for cron?

I think in general the exit code value '1' is used for "anything wrong"
and not to indicate a specific outcome.

In rear-1.17.0 in usr/sbin/rear I don't see a reason why the EXIT_CODE value for cron cannot be something else.

In particular in usr/sbin/rear it seems "EXIT_CODE=1" is already used for "ERROR: The specified command '$WORKFLOW' does not exist !"

If my understanding is right I would like to suggest something like:

EXIT_CODE=0 : no issue found

EXIT_CODE=1 : default value for "anything wrong"

EXIT_CODE=2 ... EXIT_CODE=9 : reserved for specific fatal errors.

EXIT_CODE>9 : reserved to indicate a specific outcome

in particular currently used:

EXIT_CODE=11 : layout changed

For background info what I have in mind see the exit codes of CUPS backends and see CUPS filters how messages are prefixed there, see
"Exit Codes" at
http://cups.org/documentation.php/doc-1.7/man-backend.html
and "Log Messages" at
http://cups.org/documentation.php/doc-1.7/man-filter.html

@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo Mar 19, 2015

Member

I am actually against using exit codes as an official interface. My favourite bad example are the rsync exit codes Where 24 means that all is fine but some source files disappeared between scanning and transferring. The result is that whereever I use rsync in a script I must check the exit code and add code to handle 24 like 0.

Therefore I suggest to use exit codes > 0 only as a signal to tell the outside world that we could not do our job as expected. More advanced stuff should be done via an explicit interface so that you can write something like this

res=$(rear something) || deal_with_error
if [[ "$res" == *need-new-rescue-image* ]] ; then
    do_something_else
fi

IMHO that is much more legible than for example

rear something
res=$?
if (( res == 11 )) ; then
    do_something
elif (( res > 0 )) ; then
    deal_with_error
fi

For the case of creating a cron job to run ReaR if the layout changed I would always opt to add stuff to rear to keep the conditionals out of the cron job. Within ReaR we have much more detail and context infos available to make decisions.

In this specific example I would rather extend ReaR to support something like

rear mkrescue --if-layout-changed

so that you can rely on ReaR to do the right thing. And if the exit code is >0 then you know for sure that there was a problem.

Member

schlomo commented Mar 19, 2015

I am actually against using exit codes as an official interface. My favourite bad example are the rsync exit codes Where 24 means that all is fine but some source files disappeared between scanning and transferring. The result is that whereever I use rsync in a script I must check the exit code and add code to handle 24 like 0.

Therefore I suggest to use exit codes > 0 only as a signal to tell the outside world that we could not do our job as expected. More advanced stuff should be done via an explicit interface so that you can write something like this

res=$(rear something) || deal_with_error
if [[ "$res" == *need-new-rescue-image* ]] ; then
    do_something_else
fi

IMHO that is much more legible than for example

rear something
res=$?
if (( res == 11 )) ; then
    do_something
elif (( res > 0 )) ; then
    deal_with_error
fi

For the case of creating a cron job to run ReaR if the layout changed I would always opt to add stuff to rear to keep the conditionals out of the cron job. Within ReaR we have much more detail and context infos available to make decisions.

In this specific example I would rather extend ReaR to support something like

rear mkrescue --if-layout-changed

so that you can rely on ReaR to do the right thing. And if the exit code is >0 then you know for sure that there was a problem.

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Mar 19, 2015

Contributor

@schlomo

when exit codes are used as an interface I agree that the ranges of error exit codes and non-error exit codes must be strictly separated.

If I understand you correctly you propose some special stdout (or perhaps also stderr) messages as an official interface.

Currently I don't have an opinion if this is good or bad.

But couldn't that conflict with other output when e.g. "rear -v" is used?
For example accidentally "rear -v" might output something like "No layout change -> skipping need-new-rescue-image". What I mean is: If some special messages are an official interface every contributor to rear must carefully avoid to output something that matches the special messages by accident.

Contributor

jsmeix commented Mar 19, 2015

@schlomo

when exit codes are used as an interface I agree that the ranges of error exit codes and non-error exit codes must be strictly separated.

If I understand you correctly you propose some special stdout (or perhaps also stderr) messages as an official interface.

Currently I don't have an opinion if this is good or bad.

But couldn't that conflict with other output when e.g. "rear -v" is used?
For example accidentally "rear -v" might output something like "No layout change -> skipping need-new-rescue-image". What I mean is: If some special messages are an official interface every contributor to rear must carefully avoid to output something that matches the special messages by accident.

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Mar 19, 2015

Contributor

Only a quick idea:

What about using specific prefixed lines on stdout as official interface like:

$ rear --very-verbose something
Hello,
my name is rear!
How are you?
INFO: I do now 'something' for you:
DEBUG: Init 'something'
Hello, my name is something!
DEBUG: Cheking whether or not need-new-rescue-image
INFO: Found new device /dev/sdX mounted as /dev/sdX9 at /foo
disk layout has changed => need-new-rescue-image
INTERFACE: need-new-rescue-image
I am fine!
ERROR: failed to read /etc/fstab
Oops!
I am no longer fine!
I abort now... 
$ echo $?
1
Contributor

jsmeix commented Mar 19, 2015

Only a quick idea:

What about using specific prefixed lines on stdout as official interface like:

$ rear --very-verbose something
Hello,
my name is rear!
How are you?
INFO: I do now 'something' for you:
DEBUG: Init 'something'
Hello, my name is something!
DEBUG: Cheking whether or not need-new-rescue-image
INFO: Found new device /dev/sdX mounted as /dev/sdX9 at /foo
disk layout has changed => need-new-rescue-image
INTERFACE: need-new-rescue-image
I am fine!
ERROR: failed to read /etc/fstab
Oops!
I am no longer fine!
I abort now... 
$ echo $?
1
@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo Mar 20, 2015

Member

I would rather think about defining call backs. That means that you have to specify a program that will be called if ReaR has to tell something. Could be something as simple es echo >special_file or something that talks to a central ReaR management service.

Member

schlomo commented Mar 20, 2015

I would rather think about defining call backs. That means that you have to specify a program that will be called if ReaR has to tell something. Could be something as simple es echo >special_file or something that talks to a central ReaR management service.

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Mar 20, 2015

Contributor

@schlomo

Do I understand your "defining call backs" correctly that this is another different interface how rear could communicate with "the outer world"
because I think "defining call backs" does not match what you wrote in your example above "res=$(rear something)".

If I understand it correctly there are now three proposals for an interface how rear could communicate with "the outer world":

  • Using exit codes
  • Using stdout/stderr messages
  • Using call backs

I think this particular issue has revealed a general issue in rear that currently it is not yet well defined how rear would communicate with "the outer world".

Perhaps a new separated issue should be created for the general issue how rear would communicate with "the outer world" and this issue here is only about how rear could tell when the disk layout has changed (or when in general a new rescue image is needed, e.g. also when kernel drivers have changed).

Contributor

jsmeix commented Mar 20, 2015

@schlomo

Do I understand your "defining call backs" correctly that this is another different interface how rear could communicate with "the outer world"
because I think "defining call backs" does not match what you wrote in your example above "res=$(rear something)".

If I understand it correctly there are now three proposals for an interface how rear could communicate with "the outer world":

  • Using exit codes
  • Using stdout/stderr messages
  • Using call backs

I think this particular issue has revealed a general issue in rear that currently it is not yet well defined how rear would communicate with "the outer world".

Perhaps a new separated issue should be created for the general issue how rear would communicate with "the outer world" and this issue here is only about how rear could tell when the disk layout has changed (or when in general a new rescue image is needed, e.g. also when kernel drivers have changed).

@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo Mar 20, 2015

Member

You are right. This is a topic which should be discussed in greater detail. What I meant is that if to go beyond a simple single reply on stdout then I would rather opt for a call back system.

So far we use echo and read to communicate with the user. I think that this should be moved to a library that could then be customized to use a call back.

To go back to the original issue as opened by @clausa, I think that it would be enough to simply add a rear mkrescue --only-if-changed feature or something similar. @clausa, could you please comment what would be the most simple solution to your problem?

Member

schlomo commented Mar 20, 2015

You are right. This is a topic which should be discussed in greater detail. What I meant is that if to go beyond a simple single reply on stdout then I would rather opt for a call back system.

So far we use echo and read to communicate with the user. I think that this should be moved to a library that could then be customized to use a call back.

To go back to the original issue as opened by @clausa, I think that it would be enough to simply add a rear mkrescue --only-if-changed feature or something similar. @clausa, could you please comment what would be the most simple solution to your problem?

@clausa

This comment has been minimized.

Show comment
Hide comment
@clausa

clausa Mar 20, 2015

@schlomo being able to just call something like rear mkrescue --only-if-changed would definitely solve our 'problem'. So +1 from here.

clausa commented Mar 20, 2015

@schlomo being able to just call something like rear mkrescue --only-if-changed would definitely solve our 'problem'. So +1 from here.

@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo Mar 21, 2015

Member

I just had a look. May way to implement this would be to take a checksum of the layout and gracefully exit if the checksum did not change after saving the layout.

Member

schlomo commented Mar 21, 2015

I just had a look. May way to implement this would be to take a checksum of the layout and gracefully exit if the checksum did not change after saving the layout.

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Mar 25, 2015

Contributor

@schlomo
regarding a "rear mkrescue --only-if-changed" feature:

Perhaps I am overanxious and I do not have sufficient overview knowledge of the whole rear but I fear such a feature could become a nightmare to implement and to keep it working correctly.

I think "rear mkrescue --only-if-changed" promises the user that he will get a new rear recovery system image when anything has changed that requires a new recovery system.

This means when any such change is by accident not recognized and later recovery fails because the recovery system is outdated, then the user rightfully complains that rear is guilty.

Without such a feature it is left to the user to decide if he needs a new recovery system (i.e. in case of doubt he would just make it anew).

In contrast a specific feature like "rear mkrescue --if-disklayout-changed" is perfectly o.k. from my point of view because it can be documented that this means any change in the disklayout.conf file (nothing more, nothing less).

When several such specific "rear mkrescue --if-this-changed", "rear mkrescue --if-that-changed" features are implemented, a general "rear mkrescue --if-changed" feature could be just a simpler way to call "rear mkrescue --if-this-changed --if-that-changed" but this would not mean the user gets a new recovery system image when anything has changed that requires a new recovery system.

Contributor

jsmeix commented Mar 25, 2015

@schlomo
regarding a "rear mkrescue --only-if-changed" feature:

Perhaps I am overanxious and I do not have sufficient overview knowledge of the whole rear but I fear such a feature could become a nightmare to implement and to keep it working correctly.

I think "rear mkrescue --only-if-changed" promises the user that he will get a new rear recovery system image when anything has changed that requires a new recovery system.

This means when any such change is by accident not recognized and later recovery fails because the recovery system is outdated, then the user rightfully complains that rear is guilty.

Without such a feature it is left to the user to decide if he needs a new recovery system (i.e. in case of doubt he would just make it anew).

In contrast a specific feature like "rear mkrescue --if-disklayout-changed" is perfectly o.k. from my point of view because it can be documented that this means any change in the disklayout.conf file (nothing more, nothing less).

When several such specific "rear mkrescue --if-this-changed", "rear mkrescue --if-that-changed" features are implemented, a general "rear mkrescue --if-changed" feature could be just a simpler way to call "rear mkrescue --if-this-changed --if-that-changed" but this would not mean the user gets a new recovery system image when anything has changed that requires a new recovery system.

@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo Mar 25, 2015

Member

So your concern is mostly about the wording? No problem.

In any case, if I would implement this then I would not change any existing behaviour but extend mkrescue (and possibly mkbackup) to drop out quickly if nothing relevant changed.

In any case there would be some remaining risk involved. Imagine a server beeing set up and getting OS updated over 5 years. In many cases the disk layout would not change in those 5 years. But the ReaR version and the kernel would change.

To successfully recover this system on new hardware you would really need the recent ReaR version and OS kernel from all those updates. But since you never created a new Rescue Image you will try to recover your system with a Linux and ReaR that is 5 years old.

I actually believe that this example covers a majority of use cases in a data center.

Coming back to the original issue of @clausa I can think of the following:

  • Why don't you create a new rescue image every day, week or month? Why try to optimize this point? How much money or time does this optimization actually save you?
  • Provide a Pull request to optionally disable logging to syslog
  • Provide a Pull request that makes rear not log a warning if there was no Error (Maybe it is indeed a bug that we log a warning there)

Looking at 7820c99 where that behaviour was added I see a reference to OVO which was apparently used somewhere to collect infos about ReaR. @gdha, could you please try to remember what was the background there? Is it really necessary to log any exit code != 0 to syslog as a warning? Or maybe it would be enough to log the actual Errors?

Bottom line, after looking at everything I start to believe that it would be more "logical" that an exit code != 0 should be logged to syslog as info and not as warning. After all, it can be part of regular operations.

Member

schlomo commented Mar 25, 2015

So your concern is mostly about the wording? No problem.

In any case, if I would implement this then I would not change any existing behaviour but extend mkrescue (and possibly mkbackup) to drop out quickly if nothing relevant changed.

In any case there would be some remaining risk involved. Imagine a server beeing set up and getting OS updated over 5 years. In many cases the disk layout would not change in those 5 years. But the ReaR version and the kernel would change.

To successfully recover this system on new hardware you would really need the recent ReaR version and OS kernel from all those updates. But since you never created a new Rescue Image you will try to recover your system with a Linux and ReaR that is 5 years old.

I actually believe that this example covers a majority of use cases in a data center.

Coming back to the original issue of @clausa I can think of the following:

  • Why don't you create a new rescue image every day, week or month? Why try to optimize this point? How much money or time does this optimization actually save you?
  • Provide a Pull request to optionally disable logging to syslog
  • Provide a Pull request that makes rear not log a warning if there was no Error (Maybe it is indeed a bug that we log a warning there)

Looking at 7820c99 where that behaviour was added I see a reference to OVO which was apparently used somewhere to collect infos about ReaR. @gdha, could you please try to remember what was the background there? Is it really necessary to log any exit code != 0 to syslog as a warning? Or maybe it would be enough to log the actual Errors?

Bottom line, after looking at everything I start to believe that it would be more "logical" that an exit code != 0 should be logged to syslog as info and not as warning. After all, it can be part of regular operations.

@gdha

This comment has been minimized.

Show comment
Hide comment
@gdha

gdha Mar 25, 2015

Member

@schlomo Indeed it was added on request of an important customer (who actually pay for writing code) for picking it up via OVO as there are thousands of Linux systems and nobody seemed to know how many failures there where on a weekly basis. So, what @clausa does not want to see, is not what my customer wanted... To be honest I believe it is good practice if you see a warning to always do a quick check of the log file. Better one check too many then a failed recovery afterwards...

Member

gdha commented Mar 25, 2015

@schlomo Indeed it was added on request of an important customer (who actually pay for writing code) for picking it up via OVO as there are thousands of Linux systems and nobody seemed to know how many failures there where on a weekly basis. So, what @clausa does not want to see, is not what my customer wanted... To be honest I believe it is good practice if you see a warning to always do a quick check of the log file. Better one check too many then a failed recovery afterwards...

@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo Mar 25, 2015

Member

@gdha, are you 100% sure that your OVO customer actually needs the WARNING line in syslog for all cases where the exit code was != 0? Or do they more care about catching the ERROR lines which are generated by our Error function?

Could you talk to the OVO customer about the cases where a non-0 exit code is actually the desired behaviour? Maybe now they have some filters in their monitoring to remove those "false positives" and maybe they could also benefit from improving this topic?

Member

schlomo commented Mar 25, 2015

@gdha, are you 100% sure that your OVO customer actually needs the WARNING line in syslog for all cases where the exit code was != 0? Or do they more care about catching the ERROR lines which are generated by our Error function?

Could you talk to the OVO customer about the cases where a non-0 exit code is actually the desired behaviour? Maybe now they have some filters in their monitoring to remove those "false positives" and maybe they could also benefit from improving this topic?

@gdha

This comment has been minimized.

Show comment
Hide comment
@gdha

gdha Mar 25, 2015

Member

@schlomo I can ask them next week when I'm there again on what they scan. I cannot tell for the moment - only that there problem has reduced significantly.

Member

gdha commented Mar 25, 2015

@schlomo I can ask them next week when I'm there again on what they scan. I cannot tell for the moment - only that there problem has reduced significantly.

@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo Mar 25, 2015

Member

@gdha That sounds great! IMHO we should drop the WARNING line altogether. A WARNING is in my experience totally useless. Either I treat it as an error and live with a lot of false positives or I ignore it and only care about errors.

I would rather like to improve ReaR to throw an Error if we think that a human should check and not bother humans in all other cases.

Member

schlomo commented Mar 25, 2015

@gdha That sounds great! IMHO we should drop the WARNING line altogether. A WARNING is in my experience totally useless. Either I treat it as an error and live with a lot of false positives or I ignore it and only care about errors.

I would rather like to improve ReaR to throw an Error if we think that a human should check and not bother humans in all other cases.

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Mar 26, 2015

Contributor

@schlomo
interesting information! On the one hand I fully agree with your reasoning why warning messages are useless but on the other hand I would like to be able to show warnings as a programmer when there are issues where I can continue the program in a reasonable way but with an uncertain feeling if this is what the user actually wanted.

For example:

# /bin/showsomething --output-mode=color
WARNING: output-mode=color unsupported on this terminal, using fallback output-mode=ASCII

in contrast to using the fallback silently or abort.

Perhaps for rear it could be a special case that there is nothing in between success and failure for a disaster recovery tool?
If yes, would this mean that rear must abort for any kind of issue?

Contributor

jsmeix commented Mar 26, 2015

@schlomo
interesting information! On the one hand I fully agree with your reasoning why warning messages are useless but on the other hand I would like to be able to show warnings as a programmer when there are issues where I can continue the program in a reasonable way but with an uncertain feeling if this is what the user actually wanted.

For example:

# /bin/showsomething --output-mode=color
WARNING: output-mode=color unsupported on this terminal, using fallback output-mode=ASCII

in contrast to using the fallback silently or abort.

Perhaps for rear it could be a special case that there is nothing in between success and failure for a disaster recovery tool?
If yes, would this mean that rear must abort for any kind of issue?

@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo Mar 26, 2015

Member

In my experience the wish for "warnings" is actually an expression of not trusting the software. As a software program, emitting a "warning" means that you want to shed the responsibility to actually decide if this is an error or not.

But the only thing you achieve as a program is to make your human waste time on false positives.

That is why I am against warnings. A program should call my attention to something only if I must take action as a result.

Your example with the color mode is perfectly showing my point: There is actually no action the user can take right now, the user can only accept it as it is. The only real "action" the user could possibly take is to use the program in a different way (from a color terminal). But since the user happens to use the program from an ASCII terminal, he probably has a reason for doing so and the warning does not help at all.

Therefore in your example the warning is a false positive because there is nothing one can do about it. I would really prefer your showsomething program to either issue an info message or - even better - to just shut up about this. If I miss the color output I will start myself to look into it. For that purpose a verbose or debug message would be much better suited.

Or put different: Warnings are only useful for people who want to run their environment manually. For those who build automation all kind of warnings are a curse.

Member

schlomo commented Mar 26, 2015

In my experience the wish for "warnings" is actually an expression of not trusting the software. As a software program, emitting a "warning" means that you want to shed the responsibility to actually decide if this is an error or not.

But the only thing you achieve as a program is to make your human waste time on false positives.

That is why I am against warnings. A program should call my attention to something only if I must take action as a result.

Your example with the color mode is perfectly showing my point: There is actually no action the user can take right now, the user can only accept it as it is. The only real "action" the user could possibly take is to use the program in a different way (from a color terminal). But since the user happens to use the program from an ASCII terminal, he probably has a reason for doing so and the warning does not help at all.

Therefore in your example the warning is a false positive because there is nothing one can do about it. I would really prefer your showsomething program to either issue an info message or - even better - to just shut up about this. If I miss the color output I will start myself to look into it. For that purpose a verbose or debug message would be much better suited.

Or put different: Warnings are only useful for people who want to run their environment manually. For those who build automation all kind of warnings are a curse.

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Mar 26, 2015

Contributor

Many thanks for your great explanation!

In particular your "emitting a 'warning' means that you want to shed the responsibility to actually decide if this is an error or not" describes exactly what at least my motivation as programmer is in such cases - as I wrote "uncertain feeling".

Your "issue an info message" is also an interesting point that makes things very clear.

The issue is not that a program must run either silently successfully or it aborts with an ERROR message, the issue is only that WARNING messages are useless.

If I understand it correctly the following would be o.k.:

# /bin/showsomething --output-mode=color
INFO: output-mode=color unsupported on this terminal, using fallback output-mode=ASCII
# echo $?
0
# /bin/showsomething --output-mode=color 2>/dev/null
# echo $?
0
# /bin/showsomething --debug --output-mode=color
DEBUG: TERM=ibmmono
INFO: output-mode=color unsupported on this terminal, using fallback output-mode=ASCII
# echo $?
0
Contributor

jsmeix commented Mar 26, 2015

Many thanks for your great explanation!

In particular your "emitting a 'warning' means that you want to shed the responsibility to actually decide if this is an error or not" describes exactly what at least my motivation as programmer is in such cases - as I wrote "uncertain feeling".

Your "issue an info message" is also an interesting point that makes things very clear.

The issue is not that a program must run either silently successfully or it aborts with an ERROR message, the issue is only that WARNING messages are useless.

If I understand it correctly the following would be o.k.:

# /bin/showsomething --output-mode=color
INFO: output-mode=color unsupported on this terminal, using fallback output-mode=ASCII
# echo $?
0
# /bin/showsomething --output-mode=color 2>/dev/null
# echo $?
0
# /bin/showsomething --debug --output-mode=color
DEBUG: TERM=ibmmono
INFO: output-mode=color unsupported on this terminal, using fallback output-mode=ASCII
# echo $?
0
@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo Mar 26, 2015

Member

@jsmeix exactly that is what I mean. Only bother humans if they are really supposed to do something about that single run of the program.

Member

schlomo commented Mar 26, 2015

@jsmeix exactly that is what I mean. Only bother humans if they are really supposed to do something about that single run of the program.

@gdha gdha modified the milestones: Rear v1.17.1, Rear v1.18 Mar 30, 2015

gdha added a commit that referenced this issue Apr 7, 2015

rear main script: removed the WARNING message at the end (as error
are logged to syslog anyway via the Error function).
See issue #564
@gdha

This comment has been minimized.

Show comment
Hide comment
@gdha

gdha May 31, 2015

Member

@rear/contributors Are we fine with this issue? Can we close it?

Member

gdha commented May 31, 2015

@rear/contributors Are we fine with this issue? Can we close it?

@schlomo

This comment has been minimized.

Show comment
Hide comment
@schlomo

schlomo May 31, 2015

Member

@gdha I am fine with this, thanks a lot!

Member

schlomo commented May 31, 2015

@gdha I am fine with this, thanks a lot!

@gdha gdha closed this May 31, 2015

@clausa

This comment has been minimized.

Show comment
Hide comment
@clausa

clausa May 31, 2015

Thanks all :-)

clausa commented May 31, 2015

Thanks all :-)

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Dec 16, 2016

Contributor

@clausa
for ReaR 2.0 there is again a change what rear
reports to syslog when it finishes, see
#1130 and
#1089

Contributor

jsmeix commented Dec 16, 2016

@clausa
for ReaR 2.0 there is again a change what rear
reports to syslog when it finishes, see
#1130 and
#1089

@clausa

This comment has been minimized.

Show comment
Hide comment
@clausa

clausa Dec 19, 2016

@jsmeix

Thanks for the heads-up. I'm pretty sure this change will bring the issue raised in this ticket. (as "failed" in logfiles will also raise an alarm in our systems).

I might be wrong, but IIRC the issues is merely related to how non-zero exit codes are being used. A change in the disklayout is IMHO not a failure, but it will be logged as "rear checklayout failed with exit code 1" as RC has been set to 1 in order to notify/trigger "mkrescue"

clausa commented Dec 19, 2016

@jsmeix

Thanks for the heads-up. I'm pretty sure this change will bring the issue raised in this ticket. (as "failed" in logfiles will also raise an alarm in our systems).

I might be wrong, but IIRC the issues is merely related to how non-zero exit codes are being used. A change in the disklayout is IMHO not a failure, but it will be logged as "rear checklayout failed with exit code 1" as RC has been set to 1 in order to notify/trigger "mkrescue"

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Dec 19, 2016

Contributor

As far as I see the root cause that
usr/share/rear/layout/compare/default/500_compare_layout.sh
and
usr/share/rear/layout/compare/default/510_compare_files.sh
set EXIT_CODE=1
if disk layout or config files have changed
is still not solved.

Therefore one cannot distinguish between a real failure
where a non zero exit code is right and the misuse of
a non zero exit code to signal something.

With the new syslog messages you see at least the workflow
so that you can implement special "failure" handling when
the checklayout workflow ends with non zero exit code.

Contributor

jsmeix commented Dec 19, 2016

As far as I see the root cause that
usr/share/rear/layout/compare/default/500_compare_layout.sh
and
usr/share/rear/layout/compare/default/510_compare_files.sh
set EXIT_CODE=1
if disk layout or config files have changed
is still not solved.

Therefore one cannot distinguish between a real failure
where a non zero exit code is right and the misuse of
a non zero exit code to signal something.

With the new syslog messages you see at least the workflow
so that you can implement special "failure" handling when
the checklayout workflow ends with non zero exit code.

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Dec 19, 2016

Contributor

I think I cannot solve it in a backward compatible way.
Therefore I think I will "just add another special case hack"
to usr/sbin/rear when the checklayout workflow ends.

Contributor

jsmeix commented Dec 19, 2016

I think I cannot solve it in a backward compatible way.
Therefore I think I will "just add another special case hack"
to usr/sbin/rear when the checklayout workflow ends.

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Dec 20, 2016

Contributor

In
#1133
I implemented special case syslog message
when the checklayout workflow exits with exit code 1.
Now in /var/log/messages it look like

rear[13332]: 12759: rear checklayout finished with zero exit code

versus (when the checklayout workflow exits with exit code 1)

rear[11460]: 10888: rear checklayout finished with exit code 1 (layout or config changed)

versus (when checklayout exits with another non-zero exit code)

rear[12046]: 11473: rear checklayout failed with exit code 2
Contributor

jsmeix commented Dec 20, 2016

In
#1133
I implemented special case syslog message
when the checklayout workflow exits with exit code 1.
Now in /var/log/messages it look like

rear[13332]: 12759: rear checklayout finished with zero exit code

versus (when the checklayout workflow exits with exit code 1)

rear[11460]: 10888: rear checklayout finished with exit code 1 (layout or config changed)

versus (when checklayout exits with another non-zero exit code)

rear[12046]: 11473: rear checklayout failed with exit code 2

jsmeix added a commit that referenced this issue Dec 20, 2016

Merge pull request #1133 from jsmeix/special_case_syslog_message_for_…
…checklayout_workflow_issue564_and_issue1089

Special case syslog message for checklayout workflow.
The checklayout workflow is special because
it sets EXIT_CODE=1 when the disk layout has changed
or when configuration files have changed
but that exit code 1 is no error but meant as a signal
that things have changed which require a new ISO image
so that users can run "rear checklayout || rear mkrescue"
see #564
@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Dec 20, 2016

Contributor

With #1133 merged
I consider this issue to be again fixed.

The root cause "misuse of non-zero exit codes"
is not yet solved and will not be solved for ReaR 2.0

Contributor

jsmeix commented Dec 20, 2016

With #1133 merged
I consider this issue to be again fixed.

The root cause "misuse of non-zero exit codes"
is not yet solved and will not be solved for ReaR 2.0

@jsmeix

This comment has been minimized.

Show comment
Hide comment
@jsmeix

jsmeix Dec 20, 2016

Contributor

To deal with the root cause I submited
#1134

Contributor

jsmeix commented Dec 20, 2016

To deal with the root cause I submited
#1134

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment