Description
This is proposed as an alternative to #71, which advocated guarding the code paths we introduced in string.c and array.c with compile-time macros and letting CRuby support allocating objects larger than 640 bytes. This proposal does not require CRuby's default GC to be able to allocate larger objects.
Proposal
We modify CRuby so that objects that need malloc for buffers (which include almost all types in CRuby, including T_OBJECT, T_STRING, T_ARRAY, T_MATCH, etc.) shall have three states:
- (old) Embedded: In this state, the payload is embedded in the object itself.
- (old) DisjointMalloc: In this state, the payload is held in a buffer allocated by malloc.
- (new) DisjointGC: In this state, the payload is held in another object in the GC heap.
The GC will tell the runtime whether it can allocate a char[] or VALUE[] of a given capacity in the GC heap. CRuby's default GC will use 640 bytes as its threshold and refuse to allocate larger buffers. MMTk will report that it can allocate any size into the GC heap.
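A minimal sketch of how such a capability query could look. The names gc_impl_t, can_alloc_buffer, default_gc and mmtk_gc are hypothetical and do not exist in CRuby or MMTk; only the 640-byte threshold comes from the proposal, and treating exactly 640 bytes as allowed is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Threshold taken from the proposal; the macro name is hypothetical. */
#define DEFAULT_GC_MAX_BUFFER_SIZE 640

/* Hypothetical GC interface: each GC implementation supplies one predicate
 * that says whether a buffer of the given size can live in the GC heap. */
typedef struct {
    const char *name;
    bool (*can_alloc_buffer)(size_t size_in_bytes);
} gc_impl_t;

/* Default GC: refuses buffers larger than the threshold. */
static bool default_gc_can_alloc_buffer(size_t size_in_bytes)
{
    return size_in_bytes <= DEFAULT_GC_MAX_BUFFER_SIZE;
}

/* MMTk: reports that any size can go into the GC heap. */
static bool mmtk_can_alloc_buffer(size_t size_in_bytes)
{
    (void)size_in_bytes;
    return true;
}

static const gc_impl_t default_gc = { "default", default_gc_can_alloc_buffer };
static const gc_impl_t mmtk_gc    = { "mmtk",    mmtk_can_alloc_buffer };
```

With this shape, the runtime only ever asks the active GC one question before deciding where a buffer goes, so the type implementations do not need to know which GC is in use.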
The users of buffers (T_OBJECT, T_STRING, T_ARRAY, T_MATCH, etc.) need to be aware of all three states (Embedded, DisjointMalloc and DisjointGC) in all of their code paths (constructors, append, dup, trim, insert, etc.), and be able to transition objects from one state to another. When making an object disjoint (non-embedded), they shall prefer allocating the buffer in the GC heap, and only use malloc if the GC implementation reports that it cannot allocate that buffer in the GC heap. The users are responsible for freeing a malloc buffer themselves only when the object transitions away from the DisjointMalloc state.
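The transition rule above can be sketched with a toy string type. Everything here is hypothetical and simplified: toy_string_t, EMBED_CAPACITY, and the gc_* functions are illustrative names, and the "GC heap" is a static bump arena standing in for a real collector. It is not how string.c is actually written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define EMBED_CAPACITY 24 /* hypothetical embedded payload size */

typedef enum { BUF_EMBEDDED, BUF_DISJOINT_MALLOC, BUF_DISJOINT_GC } buf_state_t;

typedef struct {
    buf_state_t state;
    size_t len, capa;
    union {
        char embedded[EMBED_CAPACITY]; /* BUF_EMBEDDED */
        char *ptr;                     /* both disjoint states */
    } as;
} toy_string_t;

/* Stand-in for the GC: a bump arena plus a size threshold. */
static size_t gc_max_buffer_size = 640;
static char gc_arena[1 << 16];
static size_t gc_arena_used = 0;

static bool gc_can_alloc_buffer(size_t size) { return size <= gc_max_buffer_size; }

static char *gc_alloc_buffer(size_t size)
{
    if (size > sizeof(gc_arena) - gc_arena_used) return NULL;
    char *p = gc_arena + gc_arena_used;
    gc_arena_used += size;
    return p; /* never freed explicitly: a real GC would reclaim it */
}

static char *toy_string_ptr(toy_string_t *s)
{
    return s->state == BUF_EMBEDDED ? s->as.embedded : s->as.ptr;
}

static void toy_string_init(toy_string_t *s)
{
    s->state = BUF_EMBEDDED;
    s->len = 0;
    s->capa = EMBED_CAPACITY;
}

/* Grow the buffer to at least new_capa bytes. Prefers the GC heap and
 * falls back to malloc only when the GC refuses the size. */
static bool toy_string_reserve(toy_string_t *s, size_t new_capa)
{
    if (new_capa <= s->capa) return true;

    bool in_gc = gc_can_alloc_buffer(new_capa);
    char *new_buf = in_gc ? gc_alloc_buffer(new_capa) : malloc(new_capa);
    if (new_buf == NULL) return false;

    memcpy(new_buf, toy_string_ptr(s), s->len);
    /* The user frees the old buffer only when leaving DisjointMalloc. */
    if (s->state == BUF_DISJOINT_MALLOC) free(s->as.ptr);

    s->state = in_gc ? BUF_DISJOINT_GC : BUF_DISJOINT_MALLOC;
    s->as.ptr = new_buf;
    s->capa = new_capa;
    return true;
}

static bool toy_string_append(toy_string_t *s, const char *data, size_t n)
{
    if (!toy_string_reserve(s, s->len + n)) return false;
    memcpy(toy_string_ptr(s) + s->len, data, n);
    s->len += n;
    return true;
}
```

Note that the only explicit free() in the sketch sits on the path that leaves the DisjointMalloc state; old buffers in the GC heap are simply abandoned for the collector.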
Remember that the main purpose of allocating the buffers in the GC heap is to eliminate the need to use obj_free() to free their buffers. An object does not need obj_free when allocated, but only needs obj_free when it enters the DisjointMalloc state. The goal is that if the GC reports it can allocate arbitrary object sizes, no objects will ever reach the DisjointMalloc state. Hopefully CRuby's default GC will be the only GC that needs to handle the DisjointMalloc state.
- When using CRuby's default GC, we can keep using obj_free() to free the malloc buffers because the default GC needs to sweep every cell anyway, live or dead.
- When using other GCs such as MMTk, we don't register those objects (T_STRING, T_ARRAY, T_MATCH, etc.) as candidates for finalization, as they will never enter the DisjointMalloc state.
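As a rough illustration of the cleanup rule, the sketch below frees a buffer only for cells in the DisjointMalloc state during a sweep over every cell, which is what the default GC would do. The types and functions (toy_obj_t, toy_obj_needs_free, toy_obj_free, toy_sweep) are hypothetical stand-ins, not CRuby's actual obj_free() or sweep code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { BUF_EMBEDDED, BUF_DISJOINT_MALLOC, BUF_DISJOINT_GC } buf_state_t;

typedef struct {
    buf_state_t state;
    char *ptr; /* only meaningful in the two disjoint states */
} toy_obj_t;

/* Only a malloc-backed buffer needs explicit cleanup. */
static bool toy_obj_needs_free(const toy_obj_t *obj)
{
    return obj->state == BUF_DISJOINT_MALLOC;
}

/* Stand-in for obj_free(): returns true if a malloc buffer was released. */
static bool toy_obj_free(toy_obj_t *obj)
{
    if (!toy_obj_needs_free(obj)) return false;
    free(obj->ptr);
    obj->ptr = NULL;
    obj->state = BUF_EMBEDDED;
    return true;
}

/* Default-GC-style sweep: visits every cell, live or dead, and frees the
 * malloc buffers of dead ones. Returns the number of buffers freed. */
static size_t toy_sweep(toy_obj_t *cells, const bool *marked, size_t n)
{
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!marked[i] && toy_obj_free(&cells[i])) freed++;
    }
    return freed;
}
```

Under a GC that accepts every buffer size, toy_obj_needs_free() would never return true, which is why such a GC would not need to register these objects for finalization at all.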
Upstreaming
We can start with a few types, such as T_STRING, T_ARRAY and T_MATCH, and move them to the three-state strategy. We then gradually convert other types, such as T_OBJECT, T_HASH, etc. Eventually, all types will use the three-state strategy.