Bigarray's caml_ba_alloc doesn't try GC if malloc fails #7100

Closed
vicuna opened this Issue Dec 18, 2015 · 10 comments

vicuna commented Dec 18, 2015

Original bug ID: 7100
Reporter: talex
Assigned to: @damiendoligez
Status: resolved (set by @alainfrisch on 2018-11-09T13:31:52Z)
Resolution: fixed
Priority: normal
Severity: major
OS: Linux and Mirage
Version: 4.02.3
Fixed in version: 4.08.0+dev/beta1/beta2
Category: otherlibs
Has duplicate: #7670
Related to: #7158 #7180 #7198 #7671
Monitored by: @gasche @rixed @yallop @hcarty @dbuenzli @alainfrisch

Bug description

If no memory is available when allocating a bigarray only because a GC is due, the allocation raises Out_of_memory, even though memory would be available after a GC.

Steps to reproduce

This program crashes with "Fatal error: exception Out_of_memory" if run in an environment with limited memory (so that malloc may return null; tested with "ulimit -Sv 52000"):

open Bigarray

let () =
  let rec loop () =
    let x = Array1.create Char c_layout 102400 in
    ignore x;
    loop () in
  loop ()

However, it works with an explicit call to the GC:

let () =
  let rec loop () =
    let x =
      try Array1.create Char c_layout 102400
      with Out_of_memory ->
        print_endline "GC!";
        Gc.full_major ();
        Array1.create Char c_layout 102400 in
    ignore x;
    loop () in
  loop ()
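
The workaround above can be factored into a small helper. This is only a sketch of the same retry-once pattern; the name with_gc_retry is hypothetical and is not part of the standard library or of Mirage.

```ocaml
(* Hypothetical helper: run [f], and if it raises Out_of_memory,
   force a full major collection and try exactly once more.
   A second Out_of_memory is allowed to propagate. *)
let with_gc_retry f =
  try f ()
  with Out_of_memory ->
    Gc.full_major ();
    f ()
```

It would be used as with_gc_retry (fun () -> Array1.create Char c_layout 102400). Retrying only once keeps the failure mode simple: if the collection did not free enough memory, the program still fails rather than looping.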

Additional information

MirageOS uses bigarrays extensively (via Cstruct), and this causes MirageOS unikernels to crash from time to time.

vicuna commented Jan 22, 2016

Comment author: @damiendoligez

The problem with "triggering a GC" is that you can easily get into a state where every allocation triggers a GC and the program gets bogged down to the speed of a snail, which is worse than crashing.

vicuna commented Jan 23, 2016

Comment author: talex

Isn't that just how GC works? You run out of memory and then run a GC. Not running a GC because it might not free memory makes no sense to me (crashing afterwards if it fails might be OK though).

If OCaml doesn't run the GC when it runs out of memory, then applications have to do it instead. For example, we currently have:

https://github.com/talex5/qubes-mirage-firewall/blob/26adeee1da5aa6f7d468f0ada7341b1756575a4c/memory_pressure.ml#L39

Each time we get a network packet, we check the memory situation. If less than 10% is free, we call Gc.full_major. Compared to having OCaml do it, this means:

  1. We become slow at close to 90% used, rather than close to 100%.
  2. Sometimes we still crash (more margin => less chance of crash, but more RAM wasted).
  3. Every input event (incoming packet, user commands, etc) needs to run the check.
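
The check described above might look roughly like the following. This is a sketch, not the code at the linked URL: free_fraction is a hypothetical callback standing in for whatever the platform offers to report the share of memory still free, and the 10% threshold is the figure quoted in this comment.

```ocaml
(* Fraction of free memory below which a collection is forced
   (the 10% figure mentioned above). *)
let threshold = 0.10

(* [free_fraction] is a hypothetical callback returning the share of
   memory still free, between 0.0 and 1.0.
   [collect] defaults to Gc.full_major; it is a parameter only so the
   behaviour can be observed in tests.
   Returns true if a collection was forced. *)
let check_memory ?(collect = Gc.full_major) ~free_fraction () =
  if free_fraction () < threshold then begin
    collect ();
    true
  end else
    false
```

Each input handler would have to call check_memory before doing its work, which is the third drawback listed above.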

vicuna commented Nov 30, 2016

Comment author: @rixed

I think the problem with MirageOS's use of bigarrays is rather the value of CAML_BA_MAX_MEMORY (1 GB), which is very far from the amount of memory one typically wants to spend on bigarrays in a microkernel (which often runs with 256 MB of RAM or even less).
This is made worse by the terrible page allocator of minios, which can allocate only a power-of-two number of pages for large allocations.
Therefore, MirageOS will malloc, say, 32 KiB for a 20 KiB cstruct, and tell the GC that "unless I have malloc'ed 50000 such blocks there is no need to run garbage collection".
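
The arithmetic behind these figures can be checked with a few lines. This is a rough model only: the 4 KiB page size and the helper names are assumptions, a 64-bit OCaml is assumed for the integer sizes, and the model says nothing about how the runtime actually accounts for memory.

```ocaml
let kib = 1024
let page_size = 4 * kib                   (* assumed page size *)
let ba_max_memory = 1024 * 1024 * 1024    (* the 1 GB value cited above *)

(* Smallest power of two that is >= n, for n >= 1. *)
let next_pow2 n =
  let rec go p = if p >= n then p else go (2 * p) in
  go 1

(* Bytes really taken by an allocator that hands out a
   power-of-two number of pages. *)
let allocated_bytes requested =
  let pages = (requested + page_size - 1) / page_size in
  next_pow2 pages * page_size

(* Number of blocks of [requested] bytes that fit in the budget. *)
let blocks_before_gc requested = ba_max_memory / requested

let () =
  let requested = 20 * kib in
  Printf.printf "allocated per block: %d bytes\n" (allocated_bytes requested);
  Printf.printf "blocks before a GC is considered necessary: %d\n"
    (blocks_before_gc requested);
  (* About 1.6 GiB of real memory under these assumptions. *)
  Printf.printf "real memory used by then: %d bytes\n"
    (blocks_before_gc requested * allocated_bytes requested)
```

Under these assumptions a 20 KiB request takes 5 pages, rounded up to 8, hence 32 KiB, and roughly 52000 such blocks fit in the 1 GB budget, far more than a 256 MB machine can hold.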

vicuna commented Feb 16, 2017

Comment author: @xavierleroy

I think we or the Mirage people need to do something to address this issue; it's just unclear to me what needs to be done. @doligez could you please restart the discussion?

vicuna commented Oct 5, 2017

Comment author: @damiendoligez

@talex, OCaml's GC doesn't work like that because it's incremental: it tries to do enough work, as the program is running, to make sure it won't ever run out of memory. When the program does run out of memory, we assume it means the program is allocating faster than it is dropping objects, which means its memory needs are increasing, so we increase the heap size.

For the CAML_BA_MAX_MEMORY problem, I have a possible solution: instead of using a constant, use a proportion of the heap size, set by the user or by the program. For example, if you set it at 100%, it means you are allocating half your memory to the heap, and the other half to bigarrays (along with other external data, if you use other libraries with custom objects).

Would that be a workable solution?
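
To make the proposal concrete, here is a small sketch of the arithmetic. The helper names are hypothetical and this is not how the runtime computes anything; it only illustrates why 100% corresponds to an even split between the heap and external data.

```ocaml
(* Current major heap size in bytes, from the standard Gc module. *)
let heap_bytes () =
  (Gc.quick_stat ()).Gc.heap_words * (Sys.word_size / 8)

(* Hypothetical helper: budget for external (out-of-heap) memory,
   expressed as a percentage of the heap size. *)
let external_budget ~heap_bytes ~percent = heap_bytes * percent / 100

(* Share of total memory (heap + external) taken by the heap once
   the external budget is fully used. *)
let heap_share ~percent = 100. /. (100. +. float_of_int percent)

let () =
  Printf.printf "budget at 100%%: %d bytes\n"
    (external_budget ~heap_bytes:(heap_bytes ()) ~percent:100)
```

With percent set to 100 the budget equals the heap size, so heap_share is 0.5. Because the budget follows the heap size, a unikernel with a small heap would get a correspondingly small budget instead of a fixed 1 GB.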

vicuna commented Oct 6, 2017

Comment author: talex

doligez: yes, I was confused when I wrote this. I was imagining that OCaml's GC worked like Java's.

rixed pointed out in mirage/io-page#38 that Mirage's io-page does not tell the GC how much memory could be freed by a collection, so that could be a big part of the problem. We should probably fix that and reopen this issue if that doesn't fix it.

vicuna commented Nov 13, 2017

Comment author: @alainfrisch

I think there is a real problem here: caml_alloc_custom does not have any specific logic to trigger a minor GC when too much "external" memory is used by custom blocks in the minor heap. One can thus easily allocate a lot of memory with e.g. bigarrays, and reach an OOM, before the GC even triggers. This does not even depend on the value of CAML_BA_MAX_MEMORY.

It seems one would need some logic to keep track of the "size" of external memory used by custom blocks in the minor heap (i.e. the mem/max arguments to caml_alloc_custom) and force a minor GC when a given threshold is reached.
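
The bookkeeping idea can be illustrated at the application level. The sketch below is not runtime code and is not how caml_alloc_custom is implemented; create_tracked and the 8 MiB threshold are hypothetical. It assumes an OCaml version where Bigarray is part of the standard library (older versions need the bigarray library linked).

```ocaml
open Bigarray

(* Hypothetical threshold: bytes of bigarray data allocated through
   [create_tracked] since the last forced minor collection. *)
let threshold = 8 * 1024 * 1024

let allocated_since_gc = ref 0

(* Allocate a char bigarray of [n] bytes, forcing a minor collection
   first if this allocation would push the running total over the
   threshold. *)
let create_tracked n =
  if !allocated_since_gc + n > threshold then begin
    Gc.minor ();
    allocated_since_gc := 0
  end;
  allocated_since_gc := !allocated_since_gc + n;
  Array1.create Char c_layout n
```

Doing this in the runtime would have the advantage of covering every custom block, not only those allocated through a wrapper.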

vicuna commented Nov 13, 2017

Comment author: @alainfrisch

Alternatively, one could put a limit on the "external size" of custom blocks allocated in the minor heap. For instance, it makes sense to allocate "small float bigarrays" in the minor heap, but for large ones, the benefit is less clear.

vicuna commented Nov 13, 2017

Comment author: @alainfrisch

#1476

vicuna commented Nov 5, 2018

Comment author: @damiendoligez

#1738 seems to fix the problem.
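
For readers on OCaml 4.08 or later, the Gc.control record has three fields concerning out-of-heap memory held by custom values: custom_major_ratio, custom_minor_ratio and custom_minor_max_size. Assuming these are the parameters introduced by the fix, a program with little memory could adjust them as sketched below. The values are purely illustrative, not recommendations, and tune_custom_blocks is a hypothetical name.

```ocaml
(* Assumes OCaml 4.08 or later, where Gc.control has these fields.
   The numbers are illustrative only. *)
let tune_custom_blocks () =
  Gc.set
    { (Gc.get ()) with
      Gc.custom_major_ratio = 20;      (* percent, for custom values in the major heap *)
      custom_minor_ratio = 50;         (* percent, for custom values in the minor heap *)
      custom_minor_max_size = 4096 }   (* bytes; larger custom values go to the major heap *)
```

The same parameters can usually be set without recompiling through the OCAMLRUNPARAM environment variable; the Gc module documentation gives the exact meaning of each field.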
