Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker run exit code consisency #6734

Closed
rhatdan opened this issue Jun 27, 2014 · 27 comments · Fixed by #14012
Closed

docker run exit code consisency #6734

rhatdan opened this issue Jun 27, 2014 · 27 comments · Fixed by #14012
Labels
exp/beginner kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.

Comments

@rhatdan
Copy link
Contributor

rhatdan commented Jun 27, 2014

I found other issues that talked about this but I did not find one that addressed it exactly.

The problem is differentiating between an exit code from

docker run blowing up versus the command being run by docker run blowing up.

We could follow the chroot standard.
Exit status:

 125 if 'chroot' itself fails
 126 if COMMAND is found but cannot be invoked
 127 if COMMAND cannot be found
 the exit status of COMMAND otherwise
@cevich
Copy link

cevich commented Jun 27, 2014

This is especially important for automation, where it often key to know if a problem happened w/in the container or within docker. Scraping output/logs is frequently not good enough, since not all error messages have the word "error" in them 😖 There really are not many better ways to do this than reserve some range of exit codes for docker, unlikely to clash with the majority of contained-processes. For the minority that may clash, they can be wrapped/mapped onto other numbers. Using the cheroot-convention, or something similar strikes me as a really helpful change.

@bgrant0607
Copy link

@rhatdan's suggestion is a good one, consistent with reserved shell exit code guidelines:
http://tldp.org/LDP/abs/html/exitcodes.html

However, we needn't try to cram all termination/failure reasons into the user exit code space. The API (and docker inspect) could provide another field or fields with more detail regarding system errors and/or even client-defined termination reasons, for clients that care.

Numerous things could go wrong when creating and setting up a container or starting the application. Or the container could exceed its memory limit. Or the host could run out of memory. Or looking beyond factors under Docker's control, Docker could crash, the host could crash or reboot, containers could fail liveness probes (e.g., systemd watchdog), or the user or higher-level system could stop the container for a variety of reasons.

See also:
kubernetes/kubernetes#137

@cyphar
Copy link
Contributor

cyphar commented Jul 7, 2014

@bgrant0607 Why don't we use the API to provide very detailed error data (what caused it and error strings, etc) but get the docker command to return an error code (per @rhatdan's suggestion) and the error string from the API?

@cevich
Copy link

cevich commented Jul 7, 2014

Am I understanding right, it would look something like (exact numeric codes not important to me):

125 Docker CLI itself fails, no other data available except printed error message.
126 Docker "something" failed, query (some) API and/or system log for more details.

  • The real exit status of COMMAND

@rhatdan
Copy link
Contributor Author

rhatdan commented Jul 7, 2014

Unless we can get more specific. Like the

125 Docker CLI itself fails, no other data available except printed error message.
126 Docker "init command" failed, query (some) API and/or system log for more details.
127 init command does not exist

  • The real exit status of COMMAND

@cyphar
Copy link
Contributor

cyphar commented Jul 8, 2014

@cevich Basically, yes. Exit codes provide "classes" of error types and the actual output to stderr tells the user what the error was.

@bgrant0607
Copy link

@cyphar Telling the user what the failure cause was is necessary but not sufficient. Systems will make decisions based on the cause of the failure, such as whether to restart the container on the same or on a different host, or whether to increase its memory limit. Querying the API for more info in the case of a failure is acceptable.

@shykes shykes changed the title We have a bugzilla about docker run exit code consisency docker run exit code consisency Jul 22, 2014
@duglin
Copy link
Contributor

duglin commented Oct 13, 2014

@crosbymichael @tianon
any thoughts on this one? I wouldn't mind taking a stab at a PR if the approach is acceptable.

@rhatdan
Copy link
Contributor Author

rhatdan commented Oct 14, 2014

@duglin If you are going to do my suggested solution, I am all for it.

@duglin
Copy link
Contributor

duglin commented Oct 14, 2014

That's my plan - just want to make sure one of the core maintainers is ok with the direction first.

@pikeas
Copy link

pikeas commented Jan 6, 2015

+1, this would be extremely useful

@rhatdan
Copy link
Contributor Author

rhatdan commented Jan 6, 2015

@duglin Have you made any progress, or I can get one of my engineers to work on this.

@duglin
Copy link
Contributor

duglin commented Jan 6, 2015

I was waiting for the official "+1" from a maintainer but if one of your guys wants to jump on it, go for it.

@crosbymichael
Copy link
Contributor

@duglin was returning error messages in the headers part of the work to support a change like this?

@duglin
Copy link
Contributor

duglin commented Jan 7, 2015

@crosbymichael Nope that's not related to this at all.

@jessfraz
Copy link
Contributor

status on this anyone?

@jessfraz jessfraz added white-belt kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. and removed Orchestration labels Feb 27, 2015
@rhatdan
Copy link
Contributor Author

rhatdan commented Feb 27, 2015

We have not worked on it. But I can raise it's priority.

@duglin
Copy link
Contributor

duglin commented Feb 27, 2015

@rhatdan let me know when you do start it - its still on my todo list too....

@spf13 spf13 added exp/beginner roadmap kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. and removed kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. roadmap white-belt labels Mar 21, 2015
@gdm85
Copy link
Contributor

gdm85 commented Apr 16, 2015

+1 this would be a nice improvement for integration

@cyphar
Copy link
Contributor

cyphar commented Apr 16, 2015

Upon reviewing this again, I think the best method would be to have the exit code for the docker run command be Docker-related errors (i.e. errors that occurred during the Docker-related operations of running the process) and that you should poll the API (or run docker exit-code <container> or some such) to get the exit code of the container. The only issue is that this isn't really backwards compatible (unless we make it so that docker run returns 0 iff the process returned 0 and there were no Docker errors and docker run returns 1 otherwise (unless there were Docker errors)).

@rhatdan
Copy link
Contributor Author

rhatdan commented Apr 16, 2015

We have started working on this again, basically looking at returning 127 most of the time when a

docker run/exec/start fails as opposed to a command failing within the container. Once we have this, we could look at trying to see if we can figure out 126 errors where we could not find the PID 1 command.

@cyphar
Copy link
Contributor

cyphar commented Apr 16, 2015

@rhatdan What happens if the PID 1 command returns 127, or is that non-standard behaviour we are ignoring?

@gdm85
Copy link
Contributor

gdm85 commented Apr 17, 2015

Just for reference/ideas, there's plenty of existing implementations that one could draw inspiration from.
In general normalizing any non-zero exit status to a Docker-defined value seems reasonable.

Quoting the man page EXIT STATUS sections of some Linux/BSD commands:

xargs

xargs exits with the following status:
       0 if it succeeds
       123 if any invocation of the command exited with status 1-125
       124 if the command exited with status 255
       125 if the command is killed by a signal
       126 if the command cannot be run
       127 if the command is not found
       1 if some other error occurred.

parallel

If --halt-on-error 0 or not specified:

       0     All jobs ran without error.

       1-253 Some of the jobs failed. The exit status gives the number of failed jobs

       254   More than 253 jobs failed.

       255   Other error.

       If --halt-on-error 1 or 2: Exit status of the failing job.

sudo

Upon successful execution of a program, the exit status from sudo will simply be the exit status of the program that was executed.

     Otherwise, sudo exits with a value of 1 if there is a configuration/permission problem or if sudo cannot execute the given command.  In the latter case the error string is printed to the standard
     error.  If sudo cannot stat(2) one or more entries in the user's PATH, an error is printed on stderr.  (If the directory does not exist or if it is not really a directory, the entry is ignored and no
     error is printed.)  This should not happen under normal circumstances.  The most common reason for stat(2) to return “permission denied” is if you are running an automounter and one of the directories
     in your PATH is on a machine that is currently unreachable.

sh

     Errors that are detected by the shell, such as a syntax error, will cause the shell to exit with a non-zero exit status.  If the shell is not an interactive shell, the execution of the shell file will
     be aborted.  Otherwise the shell will return the exit status of the last command executed, or if the exit builtin is used with a numeric argument, it will return the argument.

@rhatdan
Copy link
Contributor Author

rhatdan commented Apr 17, 2015

@cyphar Yes that is a problem, but as shown above multiple tools that exec other tools have followed this standard and they have the same problem.

The main goal is to scripting tools/testing tools to easily tell the difference between the tool failing and the command failing.

@icecrime icecrime removed the roadmap label Apr 22, 2015
@runcom
Copy link
Member

runcom commented Jul 19, 2015

there's a PR for this now #14012

@shahmalhar
Copy link

I am still getting "exit status 126" issue, did anybody figured it out?

@ghafran
Copy link

ghafran commented Oct 27, 2016

I got this "exit status 126" because I was mounting a volume and overriding the script that was the entrypoint. I moved the script outside of the directory and it worked. Hope that helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
exp/beginner kind/enhancement Enhancements are not bugs or new features but can improve usability or performance.
Projects
None yet
Development

Successfully merging a pull request may close this issue.