I'm seeing what I think is pathological performance in compact_allocate.
The following program allocates ~1GB of memory and does compaction:
let () = Gc.set { (Gc.get ()) with minor_heap_size = 1048576 }
let v = List.init (1_000_000_000 / (8 * 3)) (fun i -> i)
let () = Gc.compact ()
let (_ : _) = Sys.opaque_identity v
The compaction takes ~5s, with almost all of the time spent in compact_allocate.
Unfortunately, this benchmark is fragile: in particular, I can't reproduce the issue without the [Gc.set] call, even though it doesn't matter what value I set the minor heap size to (it only matters that it differs from the initial value).
Here's a perf report (~200 samples in total here):
│ 00000000005a3f30 <compact_allocate>:
│ compact_allocate():
│ mov compact_fl,%rax
│ mov -0x30(%rax),%rcx
│ mov -0x28(%rax),%rdx
│ sub %rcx,%rdx
│ cmp $0xf,%rdx
│ ↓ ja 3c
│ nop
│20: mov -0x20(%rax),%rax
│ mov -0x30(%rax),%rcx
│ mov -0x28(%rax),%rdx
│ sub %rcx,%rdx
│ cmp $0xf,%rdx
│ ↑ jbe 20
│ mov %rax,compact_fl
│3c: cmp %rdx,%rdi
│ ↓ jbe 5c
│ nop
│48:┌─→mov -0x20(%rax),%rax
│ │ mov -0x30(%rax),%rcx
91.70 │ │ mov -0x28(%rax),%rdx
│ │ sub %rcx,%rdx
│ │ cmp %rdi,%rdx
8.30 │ └──jb 48
│5c: add %rcx,%rdi
│ mov %rdi,-0x30(%rax)
│ add %rcx,%rax
│ ← retq
Leo (@lpw25) and I looked at this, and our conjecture is that the allocation algorithm suffers from the known first-fit performance problem: for every element of the list, the allocation algorithm scans the blocks on the free list from the beginning. In our case we have ~45 blocks, so allocation is slower than it should be by roughly that factor.
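For intuition, here is a minimal sketch of what first-fit allocation over a free list looks like. This is an illustrative OCaml model with made-up names, not the actual runtime code (compact_allocate lives in C in the runtime); the point is just that every single allocation re-scans the list from its head:

```ocaml
(* Illustrative model only: a free block with a remaining size. *)
type block = { mutable size : int }

(* The compaction free list, modelled as a plain OCaml list. *)
let free_list : block list ref = ref []

(* First fit: scan from the head on every request and take the first block
   that is large enough.  With ~45 blocks on the list and one request per
   list cell being compacted, this inner scan is what shows up in the
   profile above. *)
let first_fit_allocate request =
  let rec scan = function
    | [] -> None                      (* no block is large enough *)
    | b :: rest ->
        if b.size >= request then begin
          b.size <- b.size - request; (* carve the request off this block *)
          Some b
        end else
          scan rest                   (* O(length of the free list) per allocation *)
  in
  scan !free_list
```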
One improvement idea, suggested by Leo, is to implement next-fit instead of first-fit.
If we're worried about sacrificing even the smallest bit of compactness, we can tweak the next-fit algorithm: maintain a pointer into the free list that separates blocks that have already been used from blocks that are still completely free (and therefore candidates for being dropped), and make sure we only touch a completely free block once we've scanned through all the blocks that have already been used.
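Here is a rough sketch of that tweak, under my own (made-up) representation of the free list as an array of remaining sizes; the real structure in the runtime is different, so this only illustrates the cursor-plus-boundary idea:

```ocaml
(* Illustrative only.  Blocks with index < used have already been carved
   into; blocks with index >= used are still completely free and are only
   touched once nothing in the used region fits, so they remain candidates
   for being dropped. *)
type state = {
  blocks : int array;   (* remaining size of each free block *)
  mutable cursor : int; (* next-fit roving pointer, always < used *)
  mutable used : int;   (* boundary between touched and untouched blocks *)
}

let take st i request =
  st.blocks.(i) <- st.blocks.(i) - request;
  st.cursor <- i;
  Some i

let next_fit_allocate st request =
  (* Resume scanning at the cursor and wrap around, but only within the
     region of blocks that have already been used. *)
  let rec scan_used k =
    if k = st.used then None
    else
      let i = (st.cursor + k) mod st.used in
      if st.blocks.(i) >= request then take st i request
      else scan_used (k + 1)
  in
  match scan_used 0 with
  | Some _ as found -> found
  | None ->
      (* Nothing already-used fits: promote untouched blocks one at a time
         until one is big enough (or the free list is exhausted). *)
      let rec promote () =
        if st.used >= Array.length st.blocks then None
        else begin
          let i = st.used in
          st.used <- st.used + 1;
          if st.blocks.(i) >= request then take st i request
          else promote ()
        end
      in
      promote ()
```

Compared to plain next-fit, the only difference is that the cursor never wanders into the untouched suffix of the list until it has to, which preserves the chance to drop those blocks entirely.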
I can also think of a more efficient (in the worst case) implementation of first-fit semantics: instead of scanning the free list from the beginning every time, we scan backwards from the point where the previous scan ended. This needs extra bookkeeping: the monotonically increasing subsequence of block sizes in the already-scanned prefix (the first block able to satisfy a given request is always one of these prefix maxima). However, next-fit just seems simpler.