Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mostly precise GC #726

Merged
merged 27 commits into from May 26, 2017
Merged

Mostly precise GC #726

merged 27 commits into from May 26, 2017

Conversation

LukasKellenberger
Copy link
Contributor

@LukasKellenberger LukasKellenberger commented May 20, 2017

Prototype of a Mark-Region Garbage Collector.

Todo:

  • Handle stack overflow
  • Automatically resize heap

@densh
Copy link
Member

densh commented May 20, 2017

🎉Woohooo! 🎉

If anyone is interested in how it works, it's a clean-room implementation based on the "Immix: a mark-region garbage collector with space efficiency, fast collection, and mutator performance" paper and there is also talk from the authors available online.

We are going to talk more about this in detail at ScalaDays CPH.

Note: this is an intended fix for #128.

@densh densh mentioned this pull request May 20, 2017
@densh
Copy link
Member

densh commented May 21, 2017

@LukasKellenberger, please reformat using new bin/clangfmt script from #632.

@fgoepel
Copy link

fgoepel commented May 21, 2017

What pause times should be expected with this new GC? From the paper it looks like they focussed mainly on throughput. Although I also found another paper that suggests that there are low-latency GC designs based on Immix.

I assume that JVM Scala is likely to stay the better option if maximum throughput is desired, so it might be a smart play to ensure Scala Native is suitable for realtime applications. (Go also seems to have made great strides towards this with their GC improvements.)

@densh
Copy link
Member

densh commented May 21, 2017

@shado23 As you correctly noticed, immix itself doesn't aim at addressing the issue of GC pause times. Primary motivation for it right now it to improve throughput performance on GC-heavy workloads. Current prototype fully achieves this goal, providing a major improvement on our benchmarks.

Solving GC pause times is the next step after this, but it fundamentally depends on having a good tracing collector to start with. Typical solution for this problem is to make your existing collector to run concurrently with the application utilizing a separate thread for the garbage collection.

Copy link
Member

@densh densh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass.


void scalanative_collect() { Heap_collect(heap, stack); }

void scalanative_safepoint() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is not used for now in any way, lets move the definition of the safepoint implementation in the safepoint.c, not to duplicate it in every GC implementation.

void **alloc = (void **)Heap_allocLarge(heap, size);
*alloc = info;
return (void *)alloc;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like our allocation entry points are slowly getting out of hand here. We really need a doc page in the contributors guide section that explains a bit more about compiler/gc interface. This should list all of the supported allocation entry points, their signatures, and expected behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More specifically I'd only leave three allocation entry points:

  1. scalanative_alloc_small
  2. scalanative_alloc_large
  3. scalanative_alloc_atomic

Leaving out the raw version altogether (it's an artifact of Boehm GC.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scalanative_alloc_raw is used for arrays. This would mean that we need to change the allocation of array to choose the correct function depending on the size.


void *scalanative_alloc_raw_atomic(size_t size) {
return scalanative_alloc_raw(size);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is used in the java library quite a bit without any RTTI in the start of the object. I'm pretty sure that this is unsound at the moment.

An obvious solution would be to use an RTTI of an Object class and return a pointer 1 word past the RTTI. But this would fall apart if the pointer gets stored somewhere as we only support inner pointers for stack to heap references, but not for heap to heap references, don't we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's correct, everywhere apart from the stack, pointers are assumed to point on rtti.

return (void *)alloc;
}

void *scalanative_alloc_large(void *info, size_t size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We need to have a precise definition of large in the docs.
  2. We need an assert that object is large here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should we document this ? Is there documentation about the GC interface ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to create a new page for this. E.g. docs/contrib/gc.rst.

return scalanative_alloc_raw(size);
}

void *scalanative_alloc(void *info, size_t size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given it calls to allocSmall in the implementation, the more appropriate name for this one is scalanative_alloc_small. Also needs an assert that size is indeed small.

#include "../Log.h"

typedef enum {
object_forward = 0x0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used at the moment, I assume that's the state you set for objects that are being evacuated to a different location?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this flag will be used for evacuation during the marking phase.

#include "Heap.h"
#include "datastructures/Stack.h"

void Mark_roots(Heap *heap, Stack *stack);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marker_markRoots.

markModules(heap, stack);

mark(heap, stack);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has quite a few naming convention violations.

static inline bool heap_isWordInHeap(Heap *heap, word_t *word) {
return heap_isWordInSmallHeap(heap, word) ||
heap_isWordInLargeHeap(heap, word);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make sure that large and small objects heaps are adjacent in memory? This way we'll need just two bounds checks here instead of 4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and no. We can do it but it will not help. The problem being that we can map memory only once for both heaps and grow up and down. But we can have a value from the stack pointing to "unallocated heap" meaning part of memory that has been mapped but where the heaps have not grown to yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, with growing heap this isn't going to work at all.

word_t *end = (word_t *)((uint8_t *)start + size);

if (end > allocator->largeLimit) {
// DIFFERENT FROM IMMIX
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this comment? It doesn't seem to be very clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be removed, it's different from the mmtk code because of multithreading, the access to the global allocator needs to be synchronised.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, lets remove it.

let(n, Op.Call(allocSig, alloc, Seq(cls.rtti.const, size), Next.None))
val size = MemoryLayout.sizeOf(cls.layout.struct)
val allocMethod =
if (size < LARGE_OBJECT_MIN_SIZE) alloc else largeAlloc
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@densh Here we need the check to take into account the header of the object in the GC. The total allocated size needs to be less than 8K. But the header size can change depending on the GC. Should we add GC specific properties in compiler ?

If the compiler would know about the GC, we could also adapt the injection passes.

@densh
Copy link
Member

densh commented May 25, 2017

Needs a rebase on top of master due to changes in #730 #733 #734 #737

#define PRINT_STACK_OVERFLOW

#define INITIAL_STACK_SIZE (8)
#define INITIAL_STACK_SIZE (128 * 1024)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be really small, is this sufficient not to overflow on our benchmarks?

@densh
Copy link
Member

densh commented May 26, 2017

Needs another rebase due to a bug in CI that was fixed in #744

Copy link
Member

@densh densh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming convention changes.

Immix is a mostly precise mark-region collector based on the paper:
`Immix: A Mark-Region Garbage Collector with Space Efficiency,
Fast Collection, and Mutator Performance
<http://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't really expect our end-users to read papers to understand what we did. Contributors maybe, but not the end-users.

* Updates the the cursor and the limit of the Allocator to point to the first
* free line of the new block.
*/
void firstLineNewBlock(Allocator *allocator, BlockHeader *block) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocator_firstLineNewBlock.

}
}

bool getNextLine(Allocator *allocator) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocator_getNextLine

* Updates the cursor and the limit of the Allocator to point the next line of
* the recycled block
*/
bool nextLineRecycled(Allocator *allocator) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocator_nextLineRecycled

* the fast allocator is too small to fit
* the block to alloc.
*/
word_t *overflowAllocation(Allocator *allocator, size_t size) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allocator_overflowAllocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to add the module name to all functions ? I did it only for "public" (defined in .h) functions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think it's a good idea to add it. It helps to easily navigate the codebase without IDE.

}
}

bool smallHeapOverflowHeapScan(Heap *heap, Stack *stack) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StackOverflowHandler_smallHeapOverflowHeapScan

return false;
}

bool overflowMark(Heap *heap, Stack *stack, Object *object) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StackOverflowHandler_overflowMark

* Scans through the large heap to find marked blocks with unmarked children.
* Updates `currentOverflowAddress` while doing so.
*/
void largeHeapOverflowHeapScan(Heap *heap, Stack *stack) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StackOverflowHandler_largeHeapOverflowHeapScan

}
}

bool overflowScanLine(Heap *heap, Stack *stack, BlockHeader *block,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StackOverflowHandler_overflowScanLine

* object is found it returns `false`.
*
*/
bool overflowBlockScan(BlockHeader *block, Heap *heap, Stack *stack,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StackOverflowHandler_overflowBlockScan

Copy link
Member

@densh densh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another pass.

collections pauses are not acceptable.
collections pauses are not acceptable.`

2. **immix.**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(experimental, introduced in 0.3).

@@ -98,7 +98,14 @@ Garbage collectors

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(experimental, introduced 0.2).

#include "State.h"
#include "utils/MathUtils.h"

#define MAX_SIZE 64 * 1024 * 1024 * 1024L
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do something more sensible here. For example limiting it to the amount of physical memory would be a start.

Copy link
Member

@densh densh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing.

Heap_grow(heap, increment);
}
Allocator_initCursors(heap->allocator);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to gracefully fail if we reach the limit of available heap memory, e.g. throw OutOfMemoryError.

@densh
Copy link
Member

densh commented May 26, 2017

As soon as CI passes, this seems to be good to go! 👏

We'll definitely need to spend a bit more time polishing this, but the current prototype should be more than good enough as an experimental feature for 0.3. 👍

@densh densh changed the title [WIP] Mostly precise GC Mostly precise GC May 26, 2017
@densh densh merged commit 65004ea into scala-native:master May 26, 2017
@yorickpeterse
Copy link

yorickpeterse commented Jun 15, 2017

Hey I saw this fly past on Reddit. While I don't have any experience with Scala or whatsoever, I did write a full implementation of Immix (including evacuation, parallel stack scanning, finalisation, etc) in Rust: https://github.com/YorickPeterse/inko/tree/master/vm/src/gc

I'm not sure how helpful it is, but you may be able to learn bits from the code and apply it to Scala. It may not be entirely clear what is what since the code isn't that well documented, so I'd love to answer any questions you might have.

Worth mentioning is that for the above VM the GC assumes every process (or thread for that matter) has its own isolated heap. For a language with a shared heap you might have to adjust bits and pieces (e.g. the code used for suspending processes/threads).

@densh
Copy link
Member

densh commented Jun 20, 2017

@yorickpeterse Thanks for sharing your work! We do have a problem of having to support a shared mutable heap due to the JVM heritage of the Scala language, but there are standard well-known solutions to make this all work together nicely. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants