
GC fails to collect large array #5389

Closed
vicuna opened this Issue Oct 28, 2011 · 20 comments


vicuna commented Oct 28, 2011

Original bug ID: 5389
Reporter: @mjambon
Assigned to: @damiendoligez
Status: closed (set by @xavierleroy on 2016-12-07T10:36:56Z)
Resolution: fixed
Priority: normal
Severity: major
Version: 3.12.1
Target version: 4.00.0+dev
Fixed in version: 4.00.0+dev
Category: ~DO NOT USE (was: OCaml general)
Related to: #5757
Monitored by: varobert jmeister @protz @lefessan mehdi @ygrek @glondu @hcarty @mjambon "Christoph Bauer"

Bug description

A large array initialized with zeros fails to be collected after a call to Gc.compact. However, a string or a bigarray of the same physical size is collected. This behavior is observed on amd64 Linux with OCaml 3.11.2 and 3.12.1.

See attached code.

In practice we are experiencing this problem with a large read-only hash table loaded from a file using Marshal and copied using Ancient in order to increase performance. The old copy produced by Marshal is not reclaimed, which doubles the expected memory usage.
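A minimal reproduction along these lines might look as follows (a sketch only; the actual attached code is not reproduced here). On an affected runtime (3.11/3.12 per the report), the heap word count stays high after the array becomes unreachable and Gc.compact runs:

```ocaml
(* Sketch of a reproduction: allocate a large zero-initialized array,
   drop the only reference, compact, and compare heap sizes. *)
let heap_words () = (Gc.stat ()).Gc.heap_words

let () =
  let before = heap_words () in
  let r = ref (Some (Array.make (10 * 1024 * 1024) 0)) in
  r := None;
  Gc.compact ();
  let after = heap_words () in
  Printf.printf "heap words before: %d, after compaction: %d\n" before after
```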

File attachments


vicuna commented Oct 28, 2011

Comment author: @lefessan

Actually, the GC collects the block but fails to free the chunk during compaction. The reason is simple: it is the first chunk in the list of chunks (because Linux gives it the smallest address in memory), and the compaction algorithm never frees the first chunk. Clearly, the compaction algorithm should use a different strategy for freeing chunks: it should sort chunks by increasing probability of being freed during compaction, so that a chunk that is 99% free is not used as the target for copying the other chunks.
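As an illustration only, the suggested ordering could be modeled like this (a toy sketch, not runtime code; the `chunk` type and function name are invented for the example):

```ocaml
(* Toy model of the suggested strategy: order chunks so that the fullest
   chunks come first (best targets for copying) and mostly-free chunks
   come last (best candidates for being freed). *)
type chunk = { total_words : int; live_words : int }

let copy_target_order chunks =
  let fullness c = float_of_int c.live_words /. float_of_int c.total_words in
  (* Sort by decreasing fullness, i.e. increasing probability of being freed. *)
  List.sort (fun a b -> compare (fullness b) (fullness a)) chunks
```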


vicuna commented Oct 28, 2011

Comment author: @lefessan

Another (simpler) solution would be to truncate the chunk when it is clearly much bigger than necessary.


vicuna commented Oct 28, 2011

Comment author: @ygrek

I see similar behaviour quite often: after the heap grows and shrinks, the GC reports a small overall heap size while the OS reports large chunks allocated to the process. I never got around to investigating it closely, so thanks a lot for your explanation. Looking forward to a patch.


vicuna commented Oct 28, 2011

Comment author: @mjambon

I found a workaround, which may or may not work in other situations. It consists of allocating a string of about the same length in bytes (array length multiplied by 8 on 64-bit systems, or 4 on 32-bit) after the array has been allocated and before the garbage collector performs any compaction.

See alloc_array_fix and without_gc_compaction in the newly uploaded program test_gc_workaround.ml.
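The workaround might be sketched as follows (a hypothetical reconstruction; `alloc_array_with_padding` is an invented name, and the real alloc_array_fix from the attachment is not shown):

```ocaml
(* Sketch of the workaround described above: allocate a string/bytes buffer
   of roughly the same byte size right after the array, before any
   compaction happens. *)
let alloc_array_with_padding n =
  let a = Array.make n 0 in
  (* 8 bytes per word on 64-bit systems; use 4 on 32-bit systems. *)
  let padding = Bytes.create (n * 8) in
  (* Keep the padding from being optimized away before this point. *)
  ignore (Sys.opaque_identity padding);
  a
```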


vicuna commented Oct 29, 2011

Comment author: @lefessan

Your "workaround" will probably lead to a memory leak sooner or later. The difference between arrays and strings is not that the string is freed and the array is not; the real difference is that strings are not initialized, and thus they are not completely allocated by the kernel (pages in the middle are mapped on demand). What you see in your workaround is that the chunk containing the array is reclaimed by the compaction, but the chunk containing the string (almost the same size) is not. Since the string is not initialized, it looks like everything has been collected, but that is not the case: as the chunk is progressively reused for other data, pages will be mapped by the kernel, and the memory footprint of your program will grow up to the size of the chunk (1GB). However, this might happen slowly enough that it won't bother your application.


vicuna commented Oct 30, 2011

Comment author: @mjambon

Thanks for the explanation.

Is there a document that describes how OCaml's GC works and that would help me understand the problem better and find a workaround?

Alternatively, are chunks created as large as needed, rounded up to the nearest power of two (here 1GB)? What is the smallest chunk size? Can I artificially create a small first chunk?

I am thinking that creating a list that uses at least as many bytes as my array would do the trick (i.e. allocating a large array, then allocating a long list, then compacting), because the runtime would have to create a new chunk, but not a huge one. It seems to work in my naive test. Is it supposed to?
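A sketch of this list-based idea, under the same caveats as before (the naive test itself is not shown; `array_then_list` is an invented name):

```ocaml
(* Sketch: allocate the array, then a list occupying at least as many
   words, then compact. *)
let array_then_list n =
  let a = Array.make n 0 in
  (* Each int list cell takes 3 words, so ~n/3 cells cover n words. *)
  let l = List.init ((n / 3) + 1) (fun i -> i) in
  ignore (Sys.opaque_identity l);
  Gc.compact ();
  a
```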


vicuna commented Oct 30, 2011

Comment author: @lefessan

Unfortunately, there is no public description of OCaml's garbage collection algorithm, probably because it has been tuned so much that a complete description would not be easier to read than the sources... ;-)

If your goal is to use Ancient to move the array outside the heap, you might consider allocating it outside the heap in the first place.


vicuna commented Dec 21, 2011

Comment author: @protz

Fabrice, can I close this as "not a problem", or do you still think something ought to be done about this? #5389#c6188 implies that the GC could be smarter, but #5389#c6195 implies that Martin shouldn't do this in the first place :)


vicuna commented Dec 21, 2011

Comment author: @ygrek

I for one would suggest that the first chunk should be truncated. Otherwise, to the user it looks like the program uses much more memory than it really does.


vicuna commented Dec 21, 2011

Comment author: @lefessan

Yes, this PR should be kept at least as a feature wish: "need to truncate the first chunk after compaction".


vicuna commented Mar 9, 2012

Comment author: varobert

I stumbled upon this bug today while fuzz-testing my application. Basically, I would take a length from the fuzzed input and allocate a string of that size. This could go up to 2^31-ish bytes in my runs.

This string is just used to do some comparison up to a few characters, then completely discarded (yes, what a waste to allocate it in the first place!).

http://caml.inria.fr/mantis/file_download.php?file_id=615&type=bug shows exactly what I'm doing. The output makes it pretty clear what is happening:

268435456
Before: 60525056, 33555710
After: 60398080, 1299

536870912
Before: 181194240, 67110142
After: 120796160, 1299

1073741824
Before: 362388480, 134219006
After: 241592320, 1299

[...]

You can see that the difference in the major heap's size between "before" and "after" compaction is equal to its previous size:
181 194 240 - 120 796 160 = 60 398 080 (exactly the size after compaction of the previous iteration)

Since I run several of these fuzz-tests in parallel, my OS is painfully DOS-ed by those OCaml programs claiming to need ~4GB of the RAM-cake each!
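A reconstruction of the kind of measurement loop that produces output in this shape (a sketch only: the linked file is not reproduced here, the two columns are assumed to be heap size in bytes and live words, and the test sizes are scaled down from the original run):

```ocaml
(* Sketch: allocate a string of the given size, report heap stats, drop
   the reference, compact, and report again. Column meanings are an
   assumption based on the output above. *)
let words_to_bytes w = w * (Sys.word_size / 8)

let measure len =
  let s = ref (Some (Bytes.create len)) in
  let before = Gc.stat () in
  Printf.printf "%d\nBefore: %d, %d\n" len
    (words_to_bytes before.Gc.heap_words) before.Gc.live_words;
  s := None;
  Gc.compact ();
  let after = Gc.stat () in
  Printf.printf "After: %d, %d\n\n"
    (words_to_bytes after.Gc.heap_words) after.Gc.live_words

let () = List.iter measure [ 1 lsl 22; 1 lsl 23; 1 lsl 24 ]
```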


vicuna commented Mar 23, 2012

Comment author: gerd

varobert: a possible workaround exists (not really nice, but it should solve your problem). The idea is to allocate a large buffer as a bigarray (which is malloc'ed memory and can even be reused in each iteration), and then initialize the bigarray so that it looks like a string.

Ocamlnet contains the required C functions for this (feel free to copy them):

let buffer = Netsys_mem.alloc_memory_pages (n+64)
let () = Netsys_mem.value_area buffer
let (voffset,bytelen) = Netsys_mem.init_string buffer 0 n
let s = (Netsys_mem.as_value buffer voffset : string)

Now s is a string outside OCaml's heap, and you have detailed control over memory management. The only "danger" here is that the bigarray is deallocated when buffer becomes unreachable while s is still in use. You can avoid that by always passing (buffer, s) to any function.

For questions about using that, please contact me directly (gerd (at) gerd-stolpmann.de)


vicuna commented Mar 23, 2012

Comment author: @alainfrisch

You can avoid that by always passing (buffer,s) to any function.

Are you sure this is enough? In native code, the garbage collector knows, at each point where it can be triggered, which values are live (i.e. still used in the rest of the function). So you need to make sure that buffer is explicitly used "at the end" of the function to be sure it won't be collected.

It seems very easy to forget this and end up with a very rare bug that is difficult to track down.
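One defensive pattern along the lines of this concern might look as follows (a sketch; `with_backing_buffer` is an invented helper, the buffer/s pair is assumed to come from the Netsys_mem snippet above, and Sys.opaque_identity requires OCaml >= 4.03):

```ocaml
(* Keep the backing buffer reachable until after the last use of s, so the
   GC cannot free the backing memory while s is still in use. *)
let with_backing_buffer buffer s f =
  let result = f s in
  (* This use of buffer forces it to stay live past f's use of s. *)
  ignore (Sys.opaque_identity buffer);
  result
```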


vicuna commented Mar 26, 2012

Comment author: @damiendoligez

lefessan: Truncating the chunk is not an option, because it is malloced memory, and realloc may move the block when resizing it.

ygrek: If the overall heap size is small, then the OCaml GC has already freed the memory and it's your libc that fails to give it back to the OS. There's nothing much we can do in that case, short of reimplementing our own malloc.

Martin Jambon: Fabrice is right, your workaround is likely to fail in long-running programs. You cannot create a small first chunk because we don't control the placement of the chunks in memory.

protz and lefessan: don't close this PR. I consider this a performance bug, not a mere feature wish.

I think I have a solution, but it involves changes to the compaction algorithm that will break one undocumented invariant: as it is now, the compaction doesn't change the relative order of blocks in memory; after the change, it will. I know at least one library out there relies on the relative order of pointers. However foolish that is, I want to give the authors advance warning about this change.

The only problem is, I don't remember who is doing that...


vicuna commented Apr 16, 2012

Comment author: @damiendoligez

I couldn't reproduce the problem with either test_gc.ml or Gcfail.ml, but I managed to trigger it with a small synthetic example (compact_problem.ml).

I have a patch that seems to solve the problem. This is how it works:
When the heap is still too large after a compaction, we allocate one chunk of the desired heap size and re-do a compaction that will move all data into this chunk and free the over-large chunk.

This patch is only lightly tested for the moment. If you guys could try it out, that would be great.
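The control flow of the fix might be modeled like this (a toy model only; the actual patch is C code in the OCaml runtime and is not shown, and the `heap` type is invented for the example):

```ocaml
(* Toy model of the described fix: if the heap is still too large after a
   compaction, allocate one chunk of the desired size, move everything
   into it, and free the old chunks (including the over-large one). *)
type heap = { mutable chunks : int list }  (* chunk sizes, in words *)

let heap_size h = List.fold_left ( + ) 0 h.chunks

let shrink_after_compaction h ~desired =
  if heap_size h > desired then h.chunks <- [ desired ]
```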


vicuna commented Apr 17, 2012

Comment author: varobert

268435456
Before: 60525056, 33554532
After: 126976, 98

536870912
Before: 120923136, 67108964
After: 126976, 98

1073741824
Before: 241719296, 134217828
After: 126976, 98

It indeed solves my issue with Gcfail.ml (in both 3.12.1 and 4.01.0+dev1_2012-03-31).


vicuna commented Apr 17, 2012

Comment author: @damiendoligez

Patch applied to 4.00 (commit 12364) and trunk (12365).
Still waiting for more testing before closing this PR.


vicuna commented Apr 17, 2012

Comment author: @ygrek

Is it expected to work on 3.11? I applied this patch to 3.11.2 and I observe a crash in do_compaction (on its second invocation) in a large application (which has some potentially buggy C bindings, so I am not absolutely sure that this patch is the cause; it may be just a trigger).

PS: the patch doesn't update caml_stat_heap_chunks; is that OK?


vicuna commented Apr 17, 2012

Comment author: @hcarty

I'm able to build trunk and the problem is fixed for me there, but version/4.00 does not build for me.

make[2]: Entering directory `/home/hcarty/ocamlbrew/ocaml-4.00.0/build/build/otherlibs/graph'
make[2]: *** No rule to make target `/opt/local/include/X11/Xlib.h', needed by `open.o'.  Stop.


vicuna commented May 31, 2012

Comment author: @damiendoligez

About version/4.00: this is an independent problem that was fixed (#5582).

About caml_stat_heap_chunks: well spotted. I've fixed this in 4.00 (commit 12524) and trunk (commit 12525).

About 3.11: I have no idea whether it might work with this patch.

@vicuna vicuna closed this Dec 7, 2016

@vicuna vicuna added this to the 4.00.0 milestone Mar 14, 2019

@vicuna vicuna added the bug label Mar 20, 2019
