Fix memory leaks in intern.c when OOM is raised #283
Conversation
One common issue with error-handling code is that, because it is exercised extremely rarely, it is likely to go stale. Do you have a reproducible way to exercise this particular failure path? Is there a reproduction case that could be included in the test suite? (It seems tricky to play with OOM behavior without making the tester's machine unresponsive, but maybe we can do it using some form of process-level quotas?)
To test the presence of the leak, you can compile and execute the program below. To observe the leak, you have to trigger the `Out_of_memory` exception during demarshaling (you could search by bisection for the smallest input that raises OOM). If you are on Linux, you can use `ulimit` to impose a memory limit on your process. As you can see, 38MB remain unfreed at the end of the program.

```ocaml
let input_from_size size =
  size * 1024 / (Sys.word_size / 8)

let n = try input_from_size (int_of_string Sys.argv.(1)) with _ -> input_from_size 10

let ignore_exception f x =
  try ignore (f x)
  with e -> print_endline (Printexc.to_string e)

let heap_words msg =
  Printf.printf "%s: %dMb\n%!" msg
    Gc.((stat ()).heap_words / 1024 / 1024 * (Sys.word_size / 8))

let () =
  let file_name, oc = Filename.open_temp_file ~mode:[Open_binary] "test" ".tmp" in
  heap_words "before allocation";
  let l = Array.init n (fun i -> Array.init 1022 (fun j -> i * j)) in
  heap_words "after allocation";
  ignore_exception (Marshal.to_channel oc l) [];
  close_out oc;
  Printf.printf "filename = %s\n%!" file_name;
  heap_words "after marshaling";
  let ic = open_in_bin file_name in
  ignore_exception Marshal.from_channel ic;
  heap_words "after demarshaling";
  close_in ic;
  Gc.compact ();
  heap_words "after collection";
  if not Sys.win32 then
    Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ()))
    |> ignore;
  print_endline "Press enter to continue ...";
  Unix.unlink file_name;
  ignore (read_line ())
```
The memory limitation is quite system-dependent (as is checking the memory used); are you sure it would be a good thing to add this kind of test to the testsuite?
I added a commit to fix a memory leak in Array.concat (when OOM is raised). It is much less critical than the one in marshaling (you really need to concatenate a lot of arrays to notice it), but it is easy to fix. You can observe it with the program below. On my computer:

```ocaml
let input_from_size size =
  size * 1024 / (Sys.word_size / 8)

let heap_words msg =
  Printf.printf "%s: %dMb\n%!" msg
    Gc.((stat ()).heap_words / 1024 / 1024 * (Sys.word_size / 8))

let n = try input_from_size (int_of_string Sys.argv.(1)) with _ -> input_from_size 10

let main () =
  heap_words "before allocation";
  Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ())) |> ignore;
  let stuff = Array.init n (fun i -> Array.init 1022 (fun j -> i * j)) in
  heap_words "after allocation";
  Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ())) |> ignore;
  let input : int array list =
    let rec aux acc = function
      | 0 -> acc
      | i -> aux ([||] :: acc) (i - 1)
    in
    aux [] Sys.max_array_length
  in
  heap_words "input built";
  Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ())) |> ignore;
  let ignore_exception f x =
    try ignore (f x) with e -> print_endline (Printexc.to_string e)
  in
  ignore_exception Array.concat input;
  stuff

let () =
  ignore (main ());
  Gc.compact ();
  heap_words "after collection";
  Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ())) |> ignore
```
Do you think it would be worth making a "no_raise" version of caml_alloc_shr (one that returns 0 instead of calling caml_raise_out_of_memory) in order to completely solve the initial problem?
I haven't looked at the details yet (nor would I trust myself to have a correct intuition here), but I find the idea of a version of After looking at
To implement caml_alloc_shr_no_raise, I was literally proposing to replace the two occurrences of caml_raise_out_of_memory by a return NULL. The caller of this new function should then check whether or not the returned value is 0: if it is 0, it frees its memory and raises Out_of_memory; if not, it proceeds as before. If I'm right, there are only two places which could raise an exception (and the first one cannot really happen since we check "wosize > Max_wosize" before calling the function); all the functions called from caml_alloc_shr are exception-free. I have to concede that this does feel like a hack and that I may be acting like a "sorcerer's apprentice".
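The caller-side protocol being proposed can be sketched outside the runtime. Everything below (`shr_alloc_no_raise`, `demarshal_step`, the `oom` flag) is a hypothetical illustration of the pattern, not code from intern.c:

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the "_no_raise" protocol: the allocator returns NULL on
   failure instead of raising, so the caller can free its own temporary
   memory first, and only then report the out-of-memory condition. */
static void *shr_alloc_no_raise(size_t sz)
{
  return malloc(sz);         /* NULL on failure instead of raising */
}

/* `temp` models intern.c's temporary buffers (the stack, intern_input).
   Returns the new block, or NULL with *oom set if allocation failed. */
static void *demarshal_step(void *temp, size_t sz, int *oom)
{
  void *block = shr_alloc_no_raise(sz);
  if (block == NULL) {
    free(temp);              /* cleanup first: this is the leak being fixed */
    *oom = 1;                /* then raise Out_of_memory, in the real code */
    return NULL;
  }
  *oom = 0;
  return block;
}
```

The point of the exercise is that the cleanup happens before control leaves the function, which a raising allocator makes impossible without longjmp gymnastics.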
Actually, the problematic leak is not just a block: it is the whole unmarshalled value in intern_input.

There's no guarantee that your program will come back to intern.c soon (or ever), so cleaning up at the next marshaling does not sound that great. Another solution could be to maintain a static list of statically allocated pointers and make the function caml_raise_out_of_memory free them.
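That alternative can be sketched as a tiny pointer registry; the names here (`register_temp`, `free_all_temps`, `MAX_TEMPS`) are made up for illustration and are not runtime APIs:

```c
#include <stdlib.h>

/* Sketch of the "static list of pointers" idea: the demarshaler registers
   its temporary allocations, and the OOM path frees whatever is still
   registered before raising. */
#define MAX_TEMPS 16
static void *temps[MAX_TEMPS];

static void register_temp(void *p)
{
  for (int i = 0; i < MAX_TEMPS; i++)
    if (temps[i] == NULL) { temps[i] = p; return; }
}

static void unregister_temp(void *p)
{
  for (int i = 0; i < MAX_TEMPS; i++)
    if (temps[i] == p) { temps[i] = NULL; return; }
}

/* What caml_raise_out_of_memory would do before actually raising. */
static void free_all_temps(void)
{
  for (int i = 0; i < MAX_TEMPS; i++) {
    free(temps[i]);          /* free(NULL) is a no-op */
    temps[i] = NULL;
  }
}
```

The drawback, as with any global registry, is that the normal success path must remember to unregister, which shifts the bug surface rather than removing it.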
You are right, the
The last commit implements the "no_raise" solution discussed above for
As discussed with Marc offline, making sure that intern_cleanup is idempotent would make the code simpler (no need to set fields to NULL before calling it) and also more robust (one could call intern_cleanup on the next call to the demarshaler). It still seems to be a net improvement -- worth the extra effort -- to release the temporary memory as soon as the OOM condition is detected. Alternatively, since it would be difficult to catch the OOM exception in the C code, one could expose intern_cleanup as a primitive and arrange for the OCaml wrappers calling the demarshaler to always call this function on exit (including after an arbitrary exception). This would again simplify the code in intern.c (currently, it has to be careful to call intern_cleanup before raising an exception), and caml_deserialize_error would become useless (custom methods could raise directly).
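The idempotency requirement is essentially the usual free-then-NULL discipline. A minimal sketch, with simplified stand-ins for intern.c's globals (not the actual declarations):

```c
#include <stdlib.h>

/* Idempotent cleanup sketch: every free goes through cleanup(), which
   resets each pointer to NULL after freeing it. Calling it twice -- or
   defensively at the start of the next demarshaling -- is then harmless. */
static void *input_buf = NULL;   /* stand-in for intern_input */
static void *obj_table = NULL;   /* stand-in for intern_obj_table */

static void cleanup(void)
{
  free(input_buf);  input_buf = NULL;   /* free(NULL) is a no-op */
  free(obj_table);  obj_table = NULL;
}
```

With this shape, the error paths no longer need to null fields individually before raising; they just call `cleanup()` and raise.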
Thanks for looking into these out-of-memory conditions, they can use some work indeed. The more I read this PR and others from @mlasson, the more I think we should rewrite the runtime system in C++ :-) At least it gets the interaction of exceptions and destructors right... I'm on the fence concerning caml_stat_alloc_no_raise: so far we've been using malloc() directly in this kind of situation, and I'm not sure the "fill the block with garbage if we're in debug mode" behavior is very useful. caml_alloc_shr_no_raise looks like a good idea to me, pending the blessing of @damiendoligez. I am concerned, however, about the extra run-time cost on caml_alloc_shr: the latter is a time-critical function of the runtime system, as it is called over and over again during minor collections.
Various possibilities to avoid the overhead of having caml_alloc_shr as a wrapper around the no_raise variant:
```diff
@@ -12,7 +12,7 @@
 /***********************************************************************/

 /* Operations on arrays */

+#include <stdio.h>
```
Why is this required?
It is not, I forgot to remove it after using printf for debugging purposes (I tried to see whether I was able to raise Out_of_memory between each call). Sorry :-/.
After giving it some thought, I think #283 is a better starting point than #287. My preference would be to remove the duplication of caml_alloc_shr, probably by implementing the raising version as a wrapper around the "no_raise" one (and checking for NULL explicitly in hot spots instead of going through the wrapper). One could also arrange for the calls to the cleanup functions in the demarshaler not to need to reset some global variables to NULL (by adding an initializer on these variables and resetting their values in the cleanup function; perhaps adding some assertions at the beginning of the demarshaler to check these values, hence failing nicely if some exceptions -- for instance those raised from a custom demarshaler -- still escape).
We discussed this during yesterday's developer meeting. This bug needs to be fixed before the release (so having a final bugfix ready before mid-December would be important); @xavierleroy wants it as simple as possible (and seemed willing to help along the way), and @damiendoligez as fast as possible. I'm sure @mlasson will find a nice way to please everyone. P.S.: closing one of the two pull requests would be convenient from an issue-tracking point of view. (I do think exploring both parts of the design space was a very nice idea.)
@gasche: I've just closed the other one. |
I think we are converging to a solution. |
In the function intern_alloc, a call to caml_alloc_for_heap is very likely to return NULL when reading a big marshaled value. If that happens, before raising Out_of_memory, it should call the intern_cleanup function to free the stack as well as intern_input, which may have been malloced by caml_input_val. Similarly, intern_cleanup should also be called when we are not able to allocate intern_obj_table. To do that, I added a function caml_stat_alloc_no_raise which, like its brother caml_stat_alloc, wraps some debugging information around a call to malloc. I could have used malloc directly instead of adding a new function to memory.c, as is done in other places of the code (with the drawback of not adding the debug tag). Note that this fix is not perfect: the function intern_alloc could also raise Out_of_memory through its call to caml_alloc_shr. This is less likely to happen since caml_alloc_shr is only called when the input is smaller than Max_wosize, but it could happen; in that case, there will be a leak (but a smaller one).
There is a memory leak when the second or the third call to caml_stat_alloc in caml_array_concat raises Out_of_memory. This commit fixes it.
Prevents the function caml_alloc_shr from raising an OOM exception before intern_cleanup can be called (this completes commit 1e62f1b). It defines a new caml_alloc_shr_no_raise function.
I replaced all calls to stat_alloc_no_raise with plain mallocs. Also, I reimplemented alloc_shr_no_raise by duplicating the code of alloc_shr, to avoid any overhead induced by an extra function call.
It is explicitly unfolded in minor_gc.c for performance.
Assert that intern is in a clean state at the beginning of demarshaling primitives.
I find the current version very nice, with clean init/cleanup "parentheses". I'm mildly convinced by the implementation of caml_alloc_shr_no_raise, which follows @damiendoligez's suggestion ( mlasson@a9d1226#commitcomment-14442754 ):

I'm pretty sure that implementing caml_alloc_shr as a wrapper around caml_alloc_shr_no_raise, and perhaps inlining this wrapper at its three call sites in minor_gc.c (or just define the wrapper as

@damiendoligez Would you be OK with the suggestion above, or do you insist on keeping the current version? (Like @mlasson, I have no idea how to produce a convincing benchmark.)
I discussed this with Damien (I find his suggestion horrible). His point is that testing for
What about passing an extra Boolean argument to the function instead of reading a global variable? (Plus one wrapper implemented as a macro for the common raising version.)
```c
value res;
caml_alloc_shr_return = 1;
res = caml_alloc_shr (wosize, tag);
caml_alloc_shr_return = 0;
```
Do we know for sure that caml_alloc_shr_return is always 0 when this function is called? Otherwise it would make sense to save the pre-call value and restore it, instead of always setting it to 0.
What would be a useful benchmark? I'm pretty sure that in a full build (e.g.
@alainfrisch I personally think that would be fine (and it would make it easier for our threadsafe-runtime friends). I think it would be interesting to try to benchmark all three choices on a dumb benchmark (I understand you want to avoid the hard work of finding a good benchmark), the idea being that if it's very hot it is likely to show up on the profile. If we happen to observe that the null-return check makes a noticeable difference while the extra-parameter and global-variable versions are indistinguishable (that would be my guess), then you're good to go!
(The safe approach would be to duplicate the code of caml_alloc_shr. It's not like it's a huge piece of code.) Would e.g. tests/misc-kb be considered a good benchmark for the minor GC?
Indeed I was going to suggest misc-kb, aka "@xavierleroy's pet benchmark", which seems to allocate quite a bit. Otherwise a silly allocate-in-a-loop microbenchmark, as you know well how to write, would probably also be reasonable (it would observe the maximal overhead, which is what we are looking for here). @chambart is there any good choice of GC-stressing benchmark in operf-micro or operf-macro?
misc-kb oscillates between 0.057s and 0.063s on my machine. Does not seem to be a good micro-benchmark... Will create a synthetic one.
@alainfrisch: misc-kb used to run for longer, but the test suite took too long to run... More generally, everything in testsuite/ is tests, not benchmarks.
You could run short benchmarks several times in a row. Regardless of the time taken, I now compute "best out of N" times and have observed that this can make a large difference in the robustness of the results. I use N=5 for quick iteration testing and N=20 for solid results. This makes it possible to reliably observe timing differences that seem to be "within the noise" when looking at the time distribution of any fixed benchmark. (Benchmarks are hard and usually wrong; I'm confident I'll eventually regret my advice above.)
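As a generic illustration of the "best out of N" advice (this sketch is not tied to the OCaml benchmarks discussed here; the workload is a placeholder loop):

```c
#include <time.h>

/* Run the placeholder workload once and return its CPU time in seconds. */
static double run_once(void)
{
  clock_t t0 = clock();
  volatile long acc = 0;          /* volatile keeps the loop from being elided */
  for (long i = 0; i < 1000000L; i++) acc += i;
  (void)acc;
  return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Best-of-N: keep the minimum over n runs. The minimum is a more robust
   estimator of the true cost than the mean, since noise only adds time. */
static double best_of(int n)
{
  double best = run_once();
  for (int i = 1; i < n; i++) {
    double t = run_once();
    if (t < best) best = t;
  }
  return best;
}
```

Comparing `best_of(20)` for two builds of the runtime gives a much steadier signal than comparing single runs.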
Here is a simple proposal: a static inline function caml_alloc_shr_aux that takes a third parameter indicating what to do in case of OOM, and two instantiations as caml_alloc_shr and caml_alloc_shr_no_raise. No code duplication, and GCC produces perfect code.
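The shape of that proposal can be sketched as follows. All names here are illustrative stand-ins (the real worker allocates on the OCaml heap and raises rather than exiting):

```c
#include <stdio.h>
#include <stdlib.h>

/* Static-inline trick: one worker parameterized by what to do on OOM,
   and two thin instantiations. Since the flag is a compile-time constant
   at each call site, the compiler drops the dead branch, so the raising
   version pays no extra cost over a hand-written copy. */
static inline void *alloc_aux(size_t sz, int raise_oom)
{
  void *p = malloc(sz);      /* stands in for the real heap allocation */
  if (p == NULL) {
    if (raise_oom) {
      fprintf(stderr, "out of memory\n");
      exit(2);               /* stands in for caml_raise_out_of_memory */
    }
    return NULL;             /* no_raise variant: caller checks for NULL */
  }
  return p;
}

void *alloc_shr_sketch(size_t sz)          { return alloc_aux(sz, 1); }
void *alloc_shr_no_raise_sketch(size_t sz) { return alloc_aux(sz, 0); }
```

This keeps a single copy of the allocation logic while giving hot callers the exact entry point they need.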
Thanks @xavierleroy, we will use this version.
If you do not plan to use
You mean, something like:
?
Note: GitHub distinguishes general comments made on the PR from specific comments made against a specific line in one of the PR commits; it shows the specific comments in the general stream as well (to make it easier for others to follow them), but it is more convenient to reply directly on the commit. When the comment header says "$(person) commented at

Yes, I mean something like that, but of course the choice of either a non-return