New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mostly precise GC #726
Mostly precise GC #726
Conversation
🎉Woohooo! 🎉 If anyone is interested in how it works, it's a clean-room implementation based on the "Immix: a mark-region garbage collector with space efficiency, fast collection, and mutator performance" paper and there is also talk from the authors available online. We are going to talk more about this in detail at ScalaDays CPH. Note: this is an intended fix for #128. |
@LukasKellenberger, please reformat using new |
What pause times should be expected with this new GC? From the paper it looks like they focussed mainly on throughput. Although I also found another paper that suggests that there are low-latency GC designs based on Immix. I assume that JVM Scala is likely to stay the better option if maximum throughput is desired, so it might be a smart play to ensure Scala Native is suitable for realtime applications. (Go also seems to have made great strides towards this with their GC improvements.) |
2fcb418
to
e0fe5d0
Compare
@shado23 As you correctly noticed, immix itself doesn't aim at addressing the issue of GC pause times. Primary motivation for it right now it to improve throughput performance on GC-heavy workloads. Current prototype fully achieves this goal, providing a major improvement on our benchmarks. Solving GC pause times is the next step after this, but it fundamentally depends on having a good tracing collector to start with. Typical solution for this problem is to make your existing collector to run concurrently with the application utilizing a separate thread for the garbage collection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass.
|
||
void scalanative_collect() { Heap_collect(heap, stack); } | ||
|
||
void scalanative_safepoint() {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is not used for now in any way, lets move the definition of the safepoint implementation in the safepoint.c
, not to duplicate it in every GC implementation.
void **alloc = (void **)Heap_allocLarge(heap, size); | ||
*alloc = info; | ||
return (void *)alloc; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like our allocation entry points are slowly getting out of hand here. We really need a doc page in the contributors guide section that explains a bit more about compiler/gc interface. This should list all of the supported allocation entry points, their signatures, and expected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More specifically I'd only leave three allocation entry points:
scalanative_alloc_small
scalanative_alloc_large
scalanative_alloc_atomic
Leaving out the raw version altogether (it's an artifact of Boehm GC.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scalanative_alloc_raw
is used for arrays. This would mean that we need to change the allocation of array to choose the correct function depending on the size.
|
||
void *scalanative_alloc_raw_atomic(size_t size) { | ||
return scalanative_alloc_raw(size); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is used in the java library quite a bit without any RTTI in the start of the object. I'm pretty sure that this is unsound at the moment.
An obvious solution would be to use an RTTI of an Object class and return a pointer 1 word past the RTTI. But this would fall apart if the pointer gets stored somewhere as we only support inner pointers for stack to heap references, but not for heap to heap references, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's correct, everywhere apart from the stack, pointers are assumed to point on rtti.
return (void *)alloc; | ||
} | ||
|
||
void *scalanative_alloc_large(void *info, size_t size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We need to have a precise definition of large in the docs.
- We need an assert that object is large here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should we document this ? Is there documentation about the GC interface ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to create a new page for this. E.g. docs/contrib/gc.rst
.
return scalanative_alloc_raw(size); | ||
} | ||
|
||
void *scalanative_alloc(void *info, size_t size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given it calls to allocSmall
in the implementation, the more appropriate name for this one is scalanative_alloc_small
. Also needs an assert that size is indeed small.
#include "../Log.h" | ||
|
||
typedef enum { | ||
object_forward = 0x0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used at the moment, I assume that's the state you set for objects that are being evacuated to a different location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this flag will be used for evacuation during the marking phase.
#include "Heap.h" | ||
#include "datastructures/Stack.h" | ||
|
||
void Mark_roots(Heap *heap, Stack *stack); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Marker_markRoots
.
markModules(heap, stack); | ||
|
||
mark(heap, stack); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file has quite a few naming convention violations.
static inline bool heap_isWordInHeap(Heap *heap, word_t *word) { | ||
return heap_isWordInSmallHeap(heap, word) || | ||
heap_isWordInLargeHeap(heap, word); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make sure that large and small objects heaps are adjacent in memory? This way we'll need just two bounds checks here instead of 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. We can do it but it will not help. The problem being that we can map memory only once for both heaps and grow up and down. But we can have a value from the stack pointing to "unallocated heap" meaning part of memory that has been mapped but where the heaps have not grown to yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, with growing heap this isn't going to work at all.
word_t *end = (word_t *)((uint8_t *)start + size); | ||
|
||
if (end > allocator->largeLimit) { | ||
// DIFFERENT FROM IMMIX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on this comment? It doesn't seem to be very clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be removed, it's different from the mmtk code because of multithreading, the access to the global allocator needs to be synchronised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, lets remove it.
let(n, Op.Call(allocSig, alloc, Seq(cls.rtti.const, size), Next.None)) | ||
val size = MemoryLayout.sizeOf(cls.layout.struct) | ||
val allocMethod = | ||
if (size < LARGE_OBJECT_MIN_SIZE) alloc else largeAlloc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@densh Here we need the check to take into account the header of the object in the GC. The total allocated size needs to be less than 8K. But the header size can change depending on the GC. Should we add GC specific properties in compiler ?
If the compiler would know about the GC, we could also adapt the injection passes.
a576fe6
to
ccae343
Compare
#define PRINT_STACK_OVERFLOW | ||
|
||
#define INITIAL_STACK_SIZE (8) | ||
#define INITIAL_STACK_SIZE (128 * 1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be really small, is this sufficient not to overflow on our benchmarks?
Needs another rebase due to a bug in CI that was fixed in #744 |
…nd inner pointers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming convention changes.
Immix is a mostly precise mark-region collector based on the paper: | ||
`Immix: A Mark-Region Garbage Collector with Space Efficiency, | ||
Fast Collection, and Mutator Performance | ||
<http://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't really expect our end-users to read papers to understand what we did. Contributors maybe, but not the end-users.
* Updates the the cursor and the limit of the Allocator to point to the first | ||
* free line of the new block. | ||
*/ | ||
void firstLineNewBlock(Allocator *allocator, BlockHeader *block) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocator_firstLineNewBlock
.
} | ||
} | ||
|
||
bool getNextLine(Allocator *allocator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocator_getNextLine
* Updates the cursor and the limit of the Allocator to point the next line of | ||
* the recycled block | ||
*/ | ||
bool nextLineRecycled(Allocator *allocator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocator_nextLineRecycled
* the fast allocator is too small to fit | ||
* the block to alloc. | ||
*/ | ||
word_t *overflowAllocation(Allocator *allocator, size_t size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocator_overflowAllocation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to add the module name to all functions ? I did it only for "public" (defined in .h) functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I think it's a good idea to add it. It helps to easily navigate the codebase without IDE.
} | ||
} | ||
|
||
bool smallHeapOverflowHeapScan(Heap *heap, Stack *stack) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StackOverflowHandler_smallHeapOverflowHeapScan
return false; | ||
} | ||
|
||
bool overflowMark(Heap *heap, Stack *stack, Object *object) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StackOverflowHandler_overflowMark
* Scans through the large heap to find marked blocks with unmarked children. | ||
* Updates `currentOverflowAddress` while doing so. | ||
*/ | ||
void largeHeapOverflowHeapScan(Heap *heap, Stack *stack) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StackOverflowHandler_largeHeapOverflowHeapScan
} | ||
} | ||
|
||
bool overflowScanLine(Heap *heap, Stack *stack, BlockHeader *block, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StackOverflowHandler_overflowScanLine
* object is found it returns `false`. | ||
* | ||
*/ | ||
bool overflowBlockScan(BlockHeader *block, Heap *heap, Stack *stack, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StackOverflowHandler_overflowBlockScan
2a45487
to
1e1dec9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another pass.
collections pauses are not acceptable. | ||
collections pauses are not acceptable.` | ||
|
||
2. **immix.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(experimental, introduced in 0.3)
.
@@ -98,7 +98,14 @@ Garbage collectors | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(experimental, introduced 0.2)
.
#include "State.h" | ||
#include "utils/MathUtils.h" | ||
|
||
#define MAX_SIZE 64 * 1024 * 1024 * 1024L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to do something more sensible here. For example limiting it to the amount of physical memory would be a start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing.
Heap_grow(heap, increment); | ||
} | ||
Allocator_initCursors(heap->allocator); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to gracefully fail if we reach the limit of available heap memory, e.g. throw OutOfMemoryError.
As soon as CI passes, this seems to be good to go! 👏 We'll definitely need to spend a bit more time polishing this, but the current prototype should be more than good enough as an experimental feature for 0.3. 👍 |
Hey I saw this fly past on Reddit. While I don't have any experience with Scala or whatsoever, I did write a full implementation of Immix (including evacuation, parallel stack scanning, finalisation, etc) in Rust: https://github.com/YorickPeterse/inko/tree/master/vm/src/gc I'm not sure how helpful it is, but you may be able to learn bits from the code and apply it to Scala. It may not be entirely clear what is what since the code isn't that well documented, so I'd love to answer any questions you might have. Worth mentioning is that for the above VM the GC assumes every process (or thread for that matter) has its own isolated heap. For a language with a shared heap you might have to adjust bits and pieces (e.g. the code used for suspending processes/threads). |
@yorickpeterse Thanks for sharing your work! We do have a problem of having to support a shared mutable heap due to the JVM heritage of the Scala language, but there are standard well-known solutions to make this all work together nicely. 😄 |
Prototype of a Mark-Region Garbage Collector.
Todo: