-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve error message when bytecode program runs out of fds #2195
Conversation
Before, any errors when loading bytecode backtrace information were fatal and the error message was potentially misleading.
Load the debug information during runtime startup if OCAMLRUNPARAM=b=2. This guards against the specific case of running out of fds, since the debug information can't then be loaded.
See also ocaml/dune#1633 (comment) for a log showing how weird this error message was before! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read read_main_debug_info
only so far, not the rest of the changes.
I wondered about whether reading the debug information at startup could be an issue in any way. The only potential issue I can see (besides making the startup logic more copmlex) is a slowdown at startup. Do we have reasonably big executables on which we could measure the debuginfo-loading time, to ensure that it remains unnoticeable in scripting scenarios?
runtime/backtrace_byt.c
Outdated
di->events = process_debug_events(caml_start_code, events, &di->num_events); | ||
di->events = | ||
process_debug_events(caml_start_code, events, &di->num_events); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would find the code more readable if the fd < 0
case was handled first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the else
branch for if (caml_seek_optional_section
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(though your comment can still stand - this is actually a leaking fd bug in the code which was there before)
runtime/backtrace_byt.c
Outdated
} else { | ||
close(fd); | ||
} | ||
fd = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this write, isn't it the case that fd
is always 0
at the CAMLreturnT
point below? I don't understand when it is that we may have a non-zero return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - if fd
was negative after the call to caml_attempt_open
then this code is never reached (see line 345)
runtime/backtrace.c
Outdated
print_location(&li, i); | ||
result = caml_debuginfo_location(dbg, &li); | ||
if (result == 0) | ||
print_location(&li, i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking, but I would find err
or errno
clearer than result
, and I think the control flow would be clearer with a break
right after the caml_debuginfo_location
case --- than a test in the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two loops - break
would only exit the inner loop. The structuring relies on a "trick" which is namely that result
will always be 0
in native code and in bytecode the inner loop only executes once (because the inner loop is to do with inlining).
Happy to rename the variable, though!
Thanks @dra27, I really like the idea of this being fixed!
Do you intend to include a test program in this GRP?
For instance, couldn't your little example program be made part of the
testsuite?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased and addressed @shindere's suggestion to incorporate the test (a few other notes inline)
testsuite/tests/backtrace/pr2195.ml
Outdated
@@ -0,0 +1,18 @@ | |||
(* TEST | |||
flags += "-g" | |||
ocamlrunparam = "b" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shindere, this idiom is used in other places apparently with the intention to cause OCAMLRUNPARAM
to be set, but it doesn't seem to. Is it supposed to?
testsuite/tests/backtrace/pr2195.run
Outdated
@@ -0,0 +1,4 @@ | |||
#!/bin/sh | |||
export OCAMLRUNPARAM=b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above - this hook is only used to ensure that the test is run with OCAMLRUNPARAM=b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a matter of general policy, I would still prefer to load debug information lazily, only when needed to produce a backtrace, rather than at program start-up. (You could imagine short-running programs with humonguous debugging information that takes forever, relatively speaking, to load at start-up time.)
Could we just improve the error message ("cannot display backtrace information because I'm out of file descriptors") instead? This problem happens excessively rarely, so that might be good enough.
@dra27: what makes you think OCAMLRUNPARAM is not defined / exported I just checked and, yes, something like
in a test block should add b to the current value of OCAMLRUNPARAM I think this is working properly e.g. in |
You could also open the file descriptor at the start, and only read from it when you need the info. This would avoid the problem with little overhead. The drawback is that you'd leak one fd every time the program |
991f018
to
c16c64a
Compare
@damiendoligez - the drawback feels too expensive. In the original case with Dune, the problem was that there are not enough FDs - i.e. it wasn't bug per se in Dune, but that it needed to be careful to exhaust a finite resource. @shindere - this is working this time, thanks. I'm not sure quite sure what was going wrong when I tried to set @xavierleroy - this is now done. See longer next comment! |
Rebased and revised version pushed:
|
f3d4589
to
4746ea3
Compare
A small round of updates, in the light of a somewhat red precheck...
|
4746ea3
to
b7ed1ed
Compare
OK, so this PR now seems to fall into two parts, which I'm happy to separate:
Prior to splitting, though, I have 3 possible implementations of the second part:
I personally lean towards the third one - at startup, it adds only the overhead of reading data from a file which is already open and the memory overhead of keeping that around. On the plus-side, backtraces will always work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, modulo the wording in the documentation (see below for suggestions).
I would have been happy with the first commit only (the one that explains the issue with running out of FDs), but the second commit (the one that introduces "b=2") is small enough.
manual/manual/cmds/runtime.etex
Outdated
when an uncaught exception aborts the program. This option does not | ||
require an argument. If a bytecode program does not have any available | ||
FDs when the runtime attempts to load debugging information, then | ||
location information will not be displayed. You can force the runtime | ||
to load debugging information at program startup by setting "b=2". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation is not very clear. Not everyone knows what an "FD" is. Plus, the backtrace will tell when "b=2" is required. An example of an alternate wording:
when an uncaught exception aborts the program. This option does not | |
require an argument. If a bytecode program does not have any available | |
FDs when the runtime attempts to load debugging information, then | |
location information will not be displayed. You can force the runtime | |
to load debugging information at program startup by setting "b=2". | |
when an uncaught exception aborts the program. An optional argument | |
can be provided: "b=0" turns backtrace printing off; "b=1" is equivalent to "b" | |
and turns backtrace printing on; "b=2" turns backtrace printing on and forces | |
the runtime system to load debugging information at program startup time | |
instead of at backtrace printing time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dra27 could you please agree to reword the documentation, or state that you don't want it changed? We need to make progress on PRs.
Concerning the other two alternatives:
That's quite a refactoring. Not sure I want to take that risk for such a rarely-used feature.
The stack backtrace information produced by (For the record: the little bit of DWARF debugging info that ocamlopt produces, and the much bigger chunk of DWARF debugging info that @mshinwell 's gdb support work produced, go into separate sections of the ELF executable file, sections which are never loaded in memory when the program starts.) |
Sorry this one got dropped, too - your wording is clearly better than mine! Rebased and updated to include it, all I added was a sentence explicitly saying when you might want to use |
( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent! Merging right now...
Is this still OK for 4.12, @Octachron? |
Improve error message when bytecode program runs out of fds (cherry picked from commit 8575a4b)
The following program:
would terminate with
Fatal error: executable program file not found
if backtraces were enabled which is wrong for two reasons:Furthermore, this error would happen if user code attempted to get the backtrace information (e.g. to display an error message). In this case, the information about the exception is completely lost, since the runtime terminates.
The first commit detects
EMFILE
incaml_attempt_open
and returns a new flag to differentiate the error.caml_debuginfo_location
is altered to return0
on success but to pass on the return value ofcaml_attempt_open
if this fails (native runtime is unaffected by this change).caml_print_exception_backtrace
is then altered so that it displays a more helpful error message, and the error is no longer fatal. Note thatloc_is_raise
remains valid in this case, which means that the effect on user code would be to act as though debugging information were not available (where before the runtime had terminated)The second commit optionally extends this further by causing the debug information to be loaded during startup if backtraces are enabled. The assumption is that the system is not likely to have run out of fds at this point, though in this case the runtime would in fact fail to start. Again, native code is not affected by this change.
The background for this seemingly mundane change is a non-trivial amount of time lost debugging Dune...
cc @jonludlam