
Fix memory leaks in intern.c when OOM is raised #283

Merged
merged 16 commits into ocaml:trunk on Nov 23, 2015

Conversation

@mlasson (Contributor, author) commented Nov 6, 2015

In the function intern_alloc, a call to caml_alloc_for_heap is very likely to
return NULL when reading a big marshaled value. If that happens, before raising
Out_of_memory, it should call the intern_cleanup function to free the stack
as well as intern_input, which may have been malloced by caml_input_val.
Similarly, intern_cleanup should also be called when we are not able to
allocate intern_obj_table. To do that, I added a function
caml_stat_alloc_no_raise which, like its sibling caml_stat_alloc, wraps
some debugging information around a call to malloc. I could have used
malloc directly instead of adding a new function to memory.c, as is done in
other places of the code (with the drawback of not adding the debug tag).

Note that this fix is not perfect. The function intern_alloc could also raise
Out_of_memory through its call to caml_alloc_shr. This is less likely to happen,
since caml_alloc_shr is only called when the input is smaller than Max_wosize,
but it could happen. In that case, there will be a leak (but a smaller one).
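The cleanup-before-raise discipline described above can be sketched as a toy model. All names (intern_input, intern_stack, intern_cleanup, intern_alloc_sketch, heap_alloc_fails) are invented for illustration and are not the actual intern.c code:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of the failure path: intern_input and intern_stack stand in
   for the temporaries that are allocated before the big heap allocation. */
static void *intern_input = NULL;
static void *intern_stack = NULL;

static void intern_cleanup(void)
{
  free(intern_input); intern_input = NULL;
  free(intern_stack); intern_stack = NULL;
}

/* Returns 0 on success and -1 where the real code would raise
   Out_of_memory; heap_alloc_fails simulates caml_alloc_for_heap
   returning NULL. */
static int intern_alloc_sketch(int heap_alloc_fails)
{
  intern_input = malloc(64);
  intern_stack = malloc(64);
  if (intern_input == NULL || intern_stack == NULL || heap_alloc_fails) {
    intern_cleanup();  /* free the temporaries BEFORE signaling OOM */
    return -1;         /* real code: caml_raise_out_of_memory() */
  }
  intern_cleanup();    /* normal exit also releases the temporaries */
  return 0;
}
```

The point of the sketch is that both exits, success and simulated OOM, leave no temporary buffer allocated.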

@gasche (Member) commented Nov 6, 2015

One common issue with error-handling code is that, because it is extremely rarely exercised, it is likely to go stale. Do you have a reproducible way to exercise this particular failure code? Is there a reproduction case that could be included in the test suite? (It seems tricky to play with OOM behavior without putting the tester's machine reactivity in danger, but maybe we can do it using some form of process-level quotas?)

@mlasson (Contributor, author) commented Nov 6, 2015

To test the presence of the leak, you can compile and execute the program below.
It will create a temporary file of about `Sys.argv.(1)` Mb using marshaling, and will demarshal it immediately afterwards. At the end, it will look in /proc/PID/status for the memory in use (if your system does not have this, you can still use a method of your choice, e.g. Resource Monitor on Windows).

To observe the leak, you have to trigger the Out_of_memory exception during demarshaling (you could binary-search for the smallest input that raises OOM). If you are on Linux, you can use ulimit to impose a memory limit on your process.
$ ulimit -Sv 100000
$ ./main 40
before allocation: 0Mb
after allocation: 40Mb
filename = /tmp/test85443e.tmp
after marshaling: 40Mb
Out of memory
after demarshaling: 40Mb
after collection: 0Mb
VmSize: 38808 kB

As you can see, about 38 Mb remain unfreed at the end of the program.

let input_from_size size = 
  size * 1024 / (Sys.word_size / 8)

let n = try input_from_size (int_of_string Sys.argv.(1)) with _ -> input_from_size 10

let ignore_exception f x = 
  try
   ignore (f x)
  with e -> print_endline (Printexc.to_string e)


let heap_words msg = 
  Printf.printf "%s: %dMb\n%!" msg Gc.((stat ()).heap_words /  1024 / 1024 * (Sys.word_size / 8))

let () = begin
  let file_name, oc = Filename.open_temp_file ~mode:[Open_binary] "test" ".tmp" in
  begin
    heap_words "before allocation";
    let l = Array.init n (fun i -> Array.init 1022 (fun j -> i*j) ) in
    heap_words "after allocation";
    ignore_exception (Marshal.to_channel oc l) [];
  end;
  close_out oc;
  Printf.printf "filename = %s\n%!" file_name;
  heap_words "after marshaling";
  let ic = open_in_bin file_name in
  ignore_exception Marshal.from_channel ic;
  heap_words "after demarshaling";
  close_in ic;
  Gc.compact ();
  heap_words "after collection";
  if not Sys.win32 then
    ignore (Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ())));
  print_endline "Press enter to continue ...";
  Unix.unlink file_name;
  ignore (read_line ());
end

@mlasson (Contributor, author) commented Nov 6, 2015

The memory limitation is quite system-dependent (as is checking the memory used); are you sure it would be a good thing to add this kind of test to the testsuite?

@mlasson (Contributor, author) commented Nov 6, 2015

I added a commit to fix a memory leak in Array.concat (when raising OOM). It is a lot less critical than the one in marshaling (you really need to concat a lot of arrays to notice it), but it is easy to fix.

You can observe it with the program below. On my computer:
$ ulimit -Sv 200000
$ concat 90
before allocation: 0Mb
VmSize: 5256 kB
after allocation: 92Mb
VmSize: 102104 kB
input built: 144Mb
VmSize: 152860 kB
Out of memory
after collection: 0Mb
VmSize: 41160 kB

let input_from_size size = 
  size * 1024 / (Sys.word_size / 8)
let heap_words msg = 
  Printf.printf "%s: %dMb\n%!" msg Gc.((stat ()).heap_words / 1024 / 1024 * (Sys.word_size / 8))
let n = try input_from_size (int_of_string Sys.argv.(1)) with _ -> input_from_size 10

let main () = 
  begin
    heap_words "before allocation";
    Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ())) |> ignore;
    let stuff = Array.init n (fun i -> Array.init 1022 (fun j -> i*j)) in
    heap_words "after allocation";
    Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ())) |> ignore;

    let input : int array list =
      let rec aux acc = function 
      | 0 -> acc
      | i -> aux ([||]::acc) (i - 1)
      in aux [] Sys.max_array_length
    in
    heap_words "input built";
    Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ())) |> ignore;

    let ignore_exception f x = 
      try ignore (f x) with e -> print_endline (Printexc.to_string e) 
    in
    ignore_exception Array.concat input;
    stuff
  end

let () = 
  begin
    ignore (main ());
    Gc.compact ();
    heap_words "after collection";
    Sys.command (Printf.sprintf "grep VmSize /proc/%d/status" (Unix.getpid ())) |> ignore
  end
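The Array.concat fix amounts to the same discipline: release the earlier temporary allocation before propagating OOM. A minimal C sketch of that pattern (names, sizes and the second_alloc_fails flag are invented for illustration, not the actual caml_array_concat code):

```c
#include <stdlib.h>

/* Toy model of the Array.concat fix: when a later temporary allocation
   fails, the earlier one must be freed before the error is propagated. */
static int concat_sketch(int second_alloc_fails)
{
  char *offsets = malloc(128);                    /* first temporary */
  char *result;
  if (offsets == NULL) return -1;
  result = second_alloc_fails ? NULL : malloc(128);
  if (result == NULL) {
    free(offsets);   /* the fix: no leak on the OOM path */
    return -1;       /* real code: caml_raise_out_of_memory() */
  }
  free(result);
  free(offsets);
  return 0;
}
```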

@mlasson (Contributor, author) commented Nov 6, 2015

Do you think it would be worthwhile to make a "no_raise" version of caml_alloc_shr (that returns 0 instead of raising Out_of_memory) in order to completely solve the initial problem?

@gasche (Member) commented Nov 6, 2015

I haven't looked at the details yet (nor would I trust myself to have a correct intuition here), but I find the idea of a version of caml_alloc_shr that does not raise rather suspect: it's a complex beast, there is no easy way to know beforehand whether it will fail or not, and it would be easy to introduce a bug while modifying this function.

After looking at intern_cleanup: I wonder if a different approach wouldn't be possible and safer. Leaking one caml_alloc_shr is not a strong issue (although "zero leak" is a much more comfortable place to be in for reasoning), the problem is with code that catches the Out_of_memory exception and keeps running into this leak repeatedly. So instead of trying to be very careful to cleanup before leaving intern_alloc with an out_of_memory exception, couldn't you try to detect, on the next entry in intern, that a cleanup is missing and should be performed first? This could possibly lead to a less tricky patch (than your proposed caml_alloc_shr_noraise), as you would just set some cleanup_if_I'm_still_set global flag before the dangerous operation (and make sure that the global state is cleanable at each failure point).

@mlasson (Contributor, author) commented Nov 9, 2015

I find the idea of a version of caml_alloc_shr that does not raise rather suspect: it's a complex beast, there is no easy way to know before hand whether it will fail or not, and it would be easy to introduce a bug while modifying this function.

To implement caml_alloc_shr_no_raise, I was literally proposing to replace the two occurrences of "caml_raise_out_of_memory" by a return NULL. The caller of this new function should then check whether or not the returned value is 0. If it is 0, it frees its memory and raises Out_of_memory; if not, it proceeds as before. If I'm right, there are only two places which could raise an exception (and the first one cannot really happen, since we check "wosize > Max_wosize" before calling the function); all the functions called from caml_alloc_shr are exception-free. I have to concede that this does feel like a hack and that I may be acting like a "sorcerer's apprentice".
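The contract being proposed can be illustrated with a toy model (alloc_shr_no_raise, caller_sketch and the simulate_oom flag are invented names, not the actual runtime code):

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model of the contract: the allocator returns NULL on OOM instead
   of raising, and the caller releases its temporaries before raising. */
static void *alloc_shr_no_raise(size_t wosize, int simulate_oom)
{
  if (simulate_oom) return NULL;   /* instead of caml_raise_out_of_memory() */
  return malloc(wosize * sizeof(void *));
}

static int caller_sketch(int simulate_oom)
{
  void *tmp = malloc(32);          /* temporary state to release on failure */
  void *v = alloc_shr_no_raise(4, simulate_oom);
  if (v == NULL) {
    free(tmp);                     /* intern_cleanup() in the real code */
    return -1;                     /* then raise Out_of_memory */
  }
  free(v);
  free(tmp);
  return 0;
}
```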

Leaking one caml_alloc_shr is not a strong issue (although "zero leak" is a much more comfortable place to be in for reasoning), the problem is with code that catches the Out_of_memory exception and keeps running into this leak repeatedly.

Actually, the problematic leak is not just a block, it is the whole unmarshalled value in "intern_input".

So instead of trying to be very careful to cleanup before leaving intern_alloc with an out_of_memory exception, couldn't you try to detect, on the next entry in intern, that a cleanup is missing and should be performed first?

There's no guarantee that your program will come back into "intern.c" soon (or ever), so cleaning up at the next marshaling does not sound that great. Another solution could be to maintain a static list of statically allocated pointers and make the function caml_raise_out_of_memory free them.

@gasche (Member) commented Nov 9, 2015

You are right, the NULL approach seems workable. I think you would need feedback from someone actually knowledgeable about the GC. @damiendoligez ?

@mlasson (Contributor, author) commented Nov 9, 2015

The last commit implements the "no_raise" solution discussed above for caml_alloc_shr.

@alainfrisch (Contributor) commented:

As discussed with Marc offline, making sure that intern_cleanup is idempotent would make the code simpler (no need to set fields to NULL before calling it) and also more robust (one could call intern_cleanup on the next call to the demarshaler). It still seems to be a net improvement -- worth the extra effort -- to release the temporary memory as soon as the OOM condition is detected.

Alternatively, since it would be difficult to catch the OOM exception in the C code, one could expose intern_cleanup as a primitive and arrange so that OCaml wrappers calling the demarshaler always call this function on exit (including after an arbitrary exception). This would again simplify the code in intern.c (currently, it has to be careful to call intern_cleanup before raising an exception). caml_deserialize_error would become useless (custom methods could raise directly).
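An idempotent cleanup function of the kind suggested above can be sketched as follows (buf_a, buf_b and demo are invented names; this is a toy model, not the intern.c code):

```c
#include <stdlib.h>

/* Toy model of an idempotent cleanup: every free is paired with a reset
   to NULL, so a second call (for instance on the next entry to the
   demarshaler) is a harmless no-op rather than a double free. */
static void *buf_a = NULL, *buf_b = NULL;

static void cleanup_idempotent(void)
{
  free(buf_a); buf_a = NULL;   /* free(NULL) is defined to do nothing */
  free(buf_b); buf_b = NULL;
}

static int demo(void)
{
  buf_a = malloc(16);
  buf_b = malloc(16);
  cleanup_idempotent();
  cleanup_idempotent();        /* safe: pointers were reset to NULL */
  return buf_a == NULL && buf_b == NULL;
}
```

Because each field is reset as it is freed, callers do not need to NULL anything themselves before calling the cleanup.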

@xavierleroy (Contributor) commented:

Thanks for looking into these out-of-memory conditions, they can use some work indeed.

The more I read this PR and others from @mlasson, the more I think we should rewrite the runtime system in C++ :-) At least it gets the interaction of exceptions and destructors right...

I'm on the fence concerning caml_stat_alloc_no_raise. So far we've been using malloc() directly in this kind of situation. I'm not sure the "fill the block with garbage if we're in debug mode" is very useful.

caml_alloc_shr_no_raise looks like a good idea to me, pending the blessing of @damiendoligez. I am concerned, however, about the extra run-time cost on caml_alloc_shr. The latter is a time-critical function of the runtime system, as it is called over and over again during minor collections.

@alainfrisch (Contributor) commented:

caml_alloc_shr_no_raise .. I am concerned, however, about the extra run-time cost on caml_alloc_shr

Various possibilities to avoid the overhead of having caml_alloc_shr as a wrapper around the no_raise variant:

  • Duplicate the code.
  • Duplicate the code, but in the less critical no_raise variant, do not inline caml_allocation_color.
  • Use a big macro to avoid code duplication.
  • Add an extra argument to caml_alloc_shr telling it whether it should raise or return NULL in case of OOM (and provide macros to avoid changing existing code).
  • Find a way to force inlining the wrapper in a portable way (is this possible?).
  • Use the no_raise variant during minor collection and check NULL at each call site (there are three of them in minor_gc.c).

@mlasson (Contributor, author) commented Nov 12, 2015

  • I duplicated the code because it was first in the list.
  • I do not like the big macro, because the implementation contains a #ifdef DEBUG that would force us to play some CPP tricks (it also decreases readability by turning off syntax highlighting :)).
  • We could use the C inline keyword to define the wrapper in the headers (it is merely a hint, and the compiler is allowed not to inline, but in practice, I guess, most will). But inlining is not much used in the codebase; maybe we should not start now.
  • Personally, I like the last solution (checking NULL at each call site).

@@ -12,7 +12,7 @@
/***********************************************************************/

/* Operations on arrays */

#include <stdio.h>
A reviewer (Contributor) asked:

Why is this required?

@mlasson (Contributor, author) replied:

It is not, I forgot to remove it after using printf for debugging purposes (I tried to see if I was able to raise Out_of_memory between each call). Sorry :-/.

@alainfrisch (Contributor) commented:

After giving it some thought, I think #283 is a better starting point than #287. My preference would be to remove the duplication of caml_alloc_shr, probably by implementing the raising version as a wrapper around the "noraise" one (and checking for NULL explicitly in hot spots instead of going through the wrapper). One could also arrange for calls to the cleanup functions in the demarshaler not to need to reset some global variables to NULL (by adding an initializer on these variables and resetting their values in the cleanup function; perhaps also adding some assertions at the beginning of the demarshaler to check these values, hence failing nicely if some exceptions, for instance raised from a custom demarshaler, still escape).

@gasche (Member) commented Nov 19, 2015

We discussed this during yesterday's developer meeting. This bug needs to be fixed before the release (so having a final bugfix ready before mid-December would be important); @xavierleroy wants it as simple as possible (and seemed willing to help along the way), and @damiendoligez as fast as possible. I'm sure @mlasson will find a nice way to please everyone.

P.S.: closing one of the pull requests would be convenient from an issue-tracking point of view. (I do think exploring both parts of the design space was a very nice idea).

@mlasson (Contributor, author) commented Nov 19, 2015

@gasche: I've just closed the other one.
@damiendoligez: I don't think I know enough to craft a trustworthy benchmark for proving you wrong. So, I'll implement your solution with a global variable.

@mlasson (Contributor, author) commented Nov 19, 2015

I think we are converging to a solution.

There is a memory leak when the second or the third call to `caml_stat_alloc`
in `caml_array_concat` raises Out_of_memory. This fixes it.

Prevent the function `caml_alloc_shr` from raising an OOM exception
before intern_cleanup can be called (this completes commit 1e62f1b).
It defines a new caml_alloc_shr_no_raise function.

I replaced all calls to stat_alloc_no_raise by plain mallocs.
Also, I reimplemented alloc_shr_no_raise by duplicating the code of alloc_shr to avoid any overhead induced by an extra function call.
It is explicitly unfolded in minor_gc.c for performance.

Assert that intern is in a clean state at the beginning of the demarshaling
primitives.
@alainfrisch (Contributor) commented:

I find the current version very nice, with clean init/cleanup "parentheses".

I'm mildly convinced by the implementation of caml_alloc_shr_no_raise, which follows @damiendoligez 's suggestion ( mlasson@a9d1226#commitcomment-14442754 ):

I'd rather write caml_alloc_shr_no_raise as a wrapper around caml_alloc_shr : use a boolean global
variable, test it before raising or returning NULL, have caml_alloc_shr_no_raise change the variable
before and after the call to caml_alloc_shr. Ugly but probably necessary.

I'm pretty sure that implementing caml_alloc_shr as a wrapper around caml_alloc_shr_no_raise, and perhaps inlining this wrapper at its three call sites in minor_gc.c (or just defining the wrapper as inline), would be ok.

@damiendoligez Would you be ok with the suggestion above, or do you insist on keeping the current version? (Like @mlasson, I have no idea how to produce a convincing benchmark.)

@gasche (Member) commented Nov 23, 2015

I discussed this with Damien (I find his suggestion horrible). His point is that testing for a NULL return adds work to the fast path of this function, while doing horrible global-variable stuff at the raise site only adds work to the slow (and infrequent) path. I cannot argue with that, but I would of course encourage you to do real benchmarks to see whether your approach is acceptable in terms of performance (I think realistic benchmarks are the only way to get reliable information to make a choice here, and without them, being conservative is best).

@alainfrisch (Contributor) commented:

What about passing an extra Boolean argument to the function instead of reading a global variable? (Plus one wrapper implemented as a macro for the common raising version.)

value res;
caml_alloc_shr_return = 1;
res = caml_alloc_shr (wosize, tag);
caml_alloc_shr_return = 0;
A reviewer (Member) commented:

Do we know for sure that caml_alloc_shr_return is always 0 when this function is called? Otherwise it would make sense to save the pre-call value and restore it, instead of always setting it to 0.
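The save/restore variant suggested in that review comment can be sketched as a toy model (alloc_shr, alloc_shr_no_raise, alloc_shr_no_oom_raise and simulate_oom are invented names; the real runtime code differs):

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model: a global flag tells the allocator to return NULL instead
   of raising, and the wrapper saves and restores the pre-call value of
   the flag rather than forcing it back to 0. */
static int alloc_shr_no_oom_raise = 0;

static void *alloc_shr(size_t sz, int simulate_oom)
{
  if (simulate_oom) {
    if (alloc_shr_no_oom_raise) return NULL;
    abort();                           /* stands in for caml_raise_out_of_memory() */
  }
  return malloc(sz);
}

static void *alloc_shr_no_raise(size_t sz, int simulate_oom)
{
  int saved = alloc_shr_no_oom_raise;  /* save the pre-call value... */
  void *res;
  alloc_shr_no_oom_raise = 1;
  res = alloc_shr(sz, simulate_oom);
  alloc_shr_no_oom_raise = saved;      /* ...and restore it, not just 0 */
  return res;
}
```

Restoring the saved value, rather than writing 0, keeps the wrapper safe even if it is ever entered while the flag is already set.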

@alainfrisch (Contributor) commented:

What would be a useful benchmark? I'm pretty sure that in a full build (e.g. make world), the difference would be in the noise. Perhaps one specific test in the testsuite can serve as good benchmark for the GC?

@gasche (Member) commented Nov 23, 2015

@alainfrisch I personally think that would be fine (and it would make it easier for our threadsafe-runtime friends). I think it would be interesting to benchmark all three choices on a dumb benchmark (I understand you want to avoid the substantial work of finding a good benchmark), the idea being that if it's very hot it is likely to show in the profile. If we happen to observe that the null-return check makes a noticeable difference while the extra-parameter and global-variable versions are indistinguishable (that would be my guess), then you're good to go!

@alainfrisch (Contributor) commented:

(The safe approach would be to duplicate the code for caml_alloc_shr. It's not like it's a huge piece of code.)

Would e.g. tests/misc-kb be considered as a good benchmark for the minor GC?

@gasche (Member) commented Nov 23, 2015

Indeed I was going to suggest misc-kb, aka "@xavierleroy 's pet benchmark", which seems to allocate quite a bit. Otherwise a silly allocate-in-a-loop microbenchmark as you know well how to do would probably also be reasonable (it would observe maximal overhead, which is what we are looking for here).

@chambart is there any good choice of gc-stressing benchmark in operf-micro or operf-macro?

@alainfrisch (Contributor) commented:

misc-kb oscillates between 0.057s and 0.063s on my machine. Does not seem to be a good micro-benchmark... Will create a synthetic one.

@xavierleroy (Contributor) commented:

@alainfrisch : misc-kb used to run for longer, but the test suite took too long to run... More generally, everything in testsuite/ is tests, not benchmarks.

@gasche (Member) commented Nov 23, 2015

You could run short benchmarks several times in a row. Regardless of the time taken, I now compute "best out of N" times and have observed that this can make a large difference in the robustness of the results. I use N=5 for quick iteration testing and N=20 for solid results. This makes it possible to reliably observe timing differences that seem to be "within the noise" when looking at the time distribution of any single benchmark.

(Benchmarks are hard and usually wrong, I'm confident I'll eventually regret my advice above.)

@xavierleroy (Contributor) commented:

Here is a simple proposal: a static inline function caml_alloc_shr_aux that takes a third parameter indicating what to do in case of OOM; two instantiations as caml_alloc_shr and caml_alloc_shr_no_raise. No code duplication, and GCC produces perfect code.
memory.c-diff.txt
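The attached diff is authoritative; as a rough illustration of the shape of the proposal, here is a toy model with invented names and a simulated allocator instead of the real heap code:

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model: one static inline worker takes a flag saying what to do on
   OOM, and the two public entry points instantiate it with a constant
   flag, which the compiler folds away. */
static inline void *alloc_shr_aux(size_t sz, int raise_oom, int simulate_oom)
{
  void *res = simulate_oom ? NULL : malloc(sz);
  if (res == NULL) {
    if (!raise_oom) return NULL;
    abort();             /* stands in for caml_raise_out_of_memory() */
  }
  return res;
}

static void *alloc_shr(size_t sz, int simulate_oom)
{ return alloc_shr_aux(sz, 1, simulate_oom); }          /* raising version */

static void *alloc_shr_no_raise(size_t sz, int simulate_oom)
{ return alloc_shr_aux(sz, 0, simulate_oom); }          /* returns NULL on OOM */
```

With a constant raise_oom at each call site, the branch on the flag disappears after inlining, so neither entry point pays for the other's behavior.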

@alainfrisch (Contributor) commented:

Thanks @xavierleroy , we will use this version.

@mlasson (Contributor, author) commented Nov 23, 2015

If you do not plan to use caml_alloc_shr_no_raise anywhere other than the one call in intern.c, then it is pointless to guard this raise of Out_of_memory, because the size is tested before calling caml_alloc_shr_no_raise.

alainfrisch added a commit that referenced this pull request Nov 23, 2015
Fix memory leaks in intern.c when OOM is raised
@alainfrisch alainfrisch merged commit 4788ab3 into ocaml:trunk Nov 23, 2015
@alainfrisch (Contributor) commented:

You mean, something like:

-      if (raise_oom) {
-        if (caml_in_minor_collection)
-          caml_fatal_error ("Fatal error: out of memory.\n");
-        else
-          caml_raise_out_of_memory ();
-      } else {
-        return 0;
-      }
+      if (!raise_oom) return 0;
+      if (caml_in_minor_collection)
+        caml_fatal_error ("Fatal error: out of memory.\n");
+      else
+        caml_raise_out_of_memory ();

?

@gasche (Member) commented Nov 23, 2015

Note: github distinguishes general comments done on the PR, and specific comments made against a specific line in one of the PR commits; it shows the specific comments in the general stream as well (to make it easier for others to follow them), but it is more convenient to reply directly on the commit. When the comment header says "$(person) commented at $(date)", it is a general comment, when it says "$(person) commented on $(file) in $(commit) at $(date)", it is a specific comment, and clicking on the $(file) (from the comment header in the general PR page) will bring you to the specific commit to reply to the comment.

Yes, I mean something like that, but of course the choice of either an early-return if or a full if .. else .. construct is to your preference (code authors do get to make life-defining choices; in that case I would personally favor a full if .. else ..). It's just that in a binary conditional, the shortest case should always come first for readability.

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Feb 20, 2020
Fix for mark stack being too small
lthls added a commit to lthls/ocaml that referenced this pull request Nov 25, 2020
…ml#283)

Co-authored-by: Nicolás Ojeda Bär <n.oje.bar@gmail.com>
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024