SMAL is a simple mark-sweep allocator for small objects. It is designed to be as minimal, performant and modular as possible.
- It is not a drop-in replacement for `malloc()` and `free()`.
- It uses only `malloc()`, `free()`, `mmap()`, and `munmap()`.
. - It has no default collection scheduling policy; users must explicitly start a collection.
- It has no default root marking policy.
- It is neither conservative nor type-safe; users can configure it for either scenario.
- It should work in cooperation with any `malloc()`/`free()` implementation.
- Most advanced options are supported on Linux and OS X.
- Threads.
- Explicit, exact root declarations of globals and stack variables.
- Implicit, conservative root scanning of stack variables and register sets.
- Weak references.
- Reference queues.
- Finalizers.
- Remembered Sets for Mostly Unchanging Objects (see below).
- `smal_type_for_desc(smal_type_descriptor *desc)` locates or creates a `smal_type` object for objects of a particular size and particular mark and free functions.
- `smal_alloc(smal_type *type)` allocates a new object of the given `smal_type`.
- `smal_mark_ptr(void * ptr)` marks a potential pointer. This is called from within a `smal_type`'s `mark_func`.
- `smal_collect()` starts a collection.
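A hypothetical end-to-end sketch of this API (the `smal_type_descriptor` field names here are assumptions based on the description above; `my_cons` and `my_cons_mark` come from the mark-function example later in this document):

```
smal_type_descriptor desc = { 0 };
desc.object_size = sizeof(my_cons);   /* assumed field: fixed object size */
desc.mark_func   = my_cons_mark;      /* assumed field: mark callback */
desc.free_func   = 0;                 /* assumed field: no free callback */

smal_type *cons_type = smal_type_for_desc(&desc);
my_cons *c = smal_alloc(cons_type);   /* allocated by type, not size */

/* ... at a point of the user's choosing ... */
smal_collect();
```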
`smal_type` represents a user-data type of a fixed size, with mark and free callbacks.

`smal_page` represents an `mmap()` region of `smal_page_size` bytes, aligned to `smal_page_size`.

`smal_buffer` represents the state of objects in a `smal_page`. It handles parceling and bookkeeping for all objects allocated from its `smal_page`.

A `smal_type` defines the size of objects, and other characteristics, that will be allocated from one or more `smal_buffer`s. `mark_func` and `free_func` callbacks can be registered for each `smal_type`.
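A rough sketch of how these structures might relate, using only the fields named in this document (the real definitions live in the SMAL headers and carry more bookkeeping):

```
typedef struct smal_type {
  size_t object_size;                /* fixed size of every object of this type */
  void *(*mark_func)(void *ptr);     /* marks an object's referents; see below */
  void  (*free_func)(void *ptr);     /* called for each object when swept */
} smal_type;

typedef struct smal_buffer {
  smal_type *type;                   /* the single type parceled from this page */
  size_t page_id;                    /* this page's page_id (see addressing below) */
  void *begin_ptr;                   /* start of the parcelable region */
  void *end_ptr;                     /* end of the parcelable region */
  void *alloc_ptr;                   /* next unparceled address; bumped by object_size */
  void *free_list;                   /* objects recovered by sweep or smal_free() */
  /* ... mark bitmap, free bitmap, other bookkeeping ... */
} smal_buffer;
```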
Each `smal_page` is allocated from the OS via `mmap()` and is guaranteed to be aligned to `smal_page_size`. The reasons for this aggressive alignment are discussed below. Alignment is achieved by first attempting to allocate `smal_page_size` bytes with a `MAP_ANON` `mmap()`. If this buffer is not aligned, SMAL `munmap()`s the attempted buffer, `mmap()`s `smal_page_size * 2` bytes, and subsequently `munmap()`s the unaligned portion(s).
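A simplified sketch of that alignment dance, assuming `smal_page_size` is a power of two (the real code may differ in detail):

```
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Allocate smal_page_size bytes aligned to smal_page_size (sketch). */
static void *smal_page_alloc(size_t smal_page_size) {
  void *p = mmap(0, smal_page_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANON, -1, 0);
  if (p == MAP_FAILED)
    return 0;
  if (((uintptr_t) p & (smal_page_size - 1)) == 0)
    return p;                                  /* already aligned */

  /* Not aligned: give it back, over-allocate by 2x, trim the excess. */
  munmap(p, smal_page_size);
  p = mmap(0, smal_page_size * 2, PROT_READ | PROT_WRITE,
           MAP_PRIVATE | MAP_ANON, -1, 0);
  if (p == MAP_FAILED)
    return 0;

  uintptr_t base    = (uintptr_t) p;
  uintptr_t aligned = (base + smal_page_size - 1) & ~ (uintptr_t) (smal_page_size - 1);
  if (aligned > base)                          /* unaligned head */
    munmap(p, aligned - base);
  size_t tail = (size_t) ((base + smal_page_size * 2) - (aligned + smal_page_size));
  if (tail)                                    /* unaligned tail */
    munmap((void *) (aligned + smal_page_size), tail);
  return (void *) aligned;
}
```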
SMAL supports two `smal_buffer` allocation options:

- Use a portion at the head of each page for the page's `smal_buffer` structure.
- Allocate a `smal_buffer` structure using `malloc()` and store a pointer to it in the first word of the page.

With the former, a `smal_page*` and its `smal_buffer*` are the same pointer, but bookkeeping operations will dirty pages during allocation and collection; this can cause unnecessary copy-on-write effort in forked processes. The latter avoids page mutations during allocation and collection but incurs an additional pointer dereference when mapping `smal_page*` pointers to `smal_buffer*` pointers.
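The difference looks roughly like this in code (sketch):

```
void *page = /* a smal_page_size-aligned page from mmap() */ 0;

/* Option 1: the smal_buffer lives at the head of its page, so the
   smal_page* and the smal_buffer* are the same address. */
smal_buffer *buf1 = (smal_buffer *) page;

/* Option 2: the smal_buffer is malloc()ed elsewhere and the page's
   first word points to it -- one extra dereference, but the page
   itself stays clean during allocation and collection. */
smal_buffer *buf2 = *(smal_buffer **) page;
```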
Objects are allocated by `smal_alloc(smal_type*)`. Note: objects are allocated by type, not size. Objects are parceled on demand from a `smal_buffer` by incrementing `alloc_ptr` by the `smal_type`'s fixed object size, `object_size`, starting at `begin_ptr` and terminating at `end_ptr`.
Each `smal_buffer` keeps its own `free_list`. A free bitmap is also maintained to avoid double-frees during sweep. `smal_buffer`s without active objects are `munmap()`ed and returned to the OS after `smal_collect()`. Objects can be explicitly freed by `smal_free(void *ptr)`.
Since SMAL does not compact or relocate objects during collection, it attempts to allocate from the `smal_buffer` with the fewest available objects (either unparceled or on the free list) associated with the requested `smal_type`.
SMAL has no default collection scheduling algorithm or policy. Users must decide when to call `smal_collect()`.
The alignment restrictions on `smal_buffer`s ensure very minimal `mmap()`/OS-level fragmentation. They also allow for efficient mapping from arbitrary pointers to `smal_buffer*`, making SMAL suitable for both conservative and non-conservative collection.

Every potential pointer maps to a unique `page_id`, computed by dividing the pointer address by `smal_page_size`. Since every `mmap()`ed page is always aligned to `smal_page_size`, it is trivial to map a potential object pointer to a potential `page_id` and its `smal_buffer*`.
A non-colliding hash table mapping `page_id`s to `smal_buffer*`s is maintained. The lack of a table entry for a given `page_id` means that no potential pointer within the aligned `smal_page_size` region associated with that `page_id` points to an object allocated from a `smal_buffer`.

Because the table is guaranteed to be non-colliding, mapping an arbitrary pointer to a `smal_buffer` can be done in O(1) time, using a divide (or shift), a modulo, and a vector index.
This operation takes 6 x86 instructions.
If a `smal_buffer*` can be found for a potential pointer, the pointer must also lie within the `smal_buffer`'s region between `begin_ptr` and `alloc_ptr`.
This operation takes 8 x86 instructions.
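A sketch of that mapping; `buffer_table`, `buffer_table_size`, and the `page_id` field are named here as assumptions consistent with the description above:

```
#include <stdint.h>

extern size_t smal_page_size;        /* assumed globals for this sketch */
extern smal_buffer **buffer_table;   /* non-colliding page_id -> buffer table */
extern size_t buffer_table_size;

/* Map an arbitrary potential pointer to its smal_buffer, or 0 (sketch). */
static smal_buffer *ptr_to_buffer(void *ptr) {
  size_t page_id = (size_t) ((uintptr_t) ptr / smal_page_size);  /* divide (or shift) */
  smal_buffer *buf = buffer_table[page_id % buffer_table_size];  /* modulo + index */
  if (! buf || buf->page_id != page_id)
    return 0;                        /* no smal_page covers this address */
  /* The pointer must also fall within the parceled region. */
  if ((char *) ptr <  (char *) buf->begin_ptr ||
      (char *) ptr >= (char *) buf->alloc_ptr)
    return 0;
  return buf;
}
```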
Preliminary size overhead is approximately 4.5% for 10,000,000 16-byte objects.
- SMAL cannot service allocations larger than `smal_page_size - sizeof(smal_buffer) - object_alignment` or `smal_page_size - sizeof(void*) - object_alignment`, depending on the configuration.
- The `page_id` hash table may become large and sparse when `mmap()` tends not to allocate `smal_buffer`s in a mostly contiguous fashion -- other active allocators (e.g. `malloc()`) may cause holes in the address space `mmap()`ed by SMAL.
Each `smal_buffer` maintains a mark bitmap and a free bitmap, indexable by every object parceled from the `smal_buffer`. Placing mark bits in a separate bitmap reduces memory overhead, especially for `fork()`ed processes -- it protects copy-on-write pages from full mutation due to mark-bit manipulation.
An object's mark bitmap index is computed by dividing the difference between its pointer and its `smal_buffer`'s `begin_ptr` by the size of the objects (`object_size`) allocated from the `smal_buffer`.
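In code (assuming, as in the sketches above, that the buffer can reach its type's `object_size`):

```
/* Mark bitmap index of an object ptr parceled from buf (sketch). */
static size_t mark_index(smal_buffer *buf, void *ptr) {
  return ((char *) ptr - (char *) buf->begin_ptr) / buf->type->object_size;
}
```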
If an object pointer is known to be allocated and aligned, `smal_mark_ptr()` can:

- test the mark bit in 14 x86 instructions,
- set the mark bit in 5 x86 instructions, and
- recurse into the `mark_func` in 3 x86 instructions.
Each `smal_buffer` maintains a free list. Sweeping unused objects into the free list will cause page mutations. There will be an option to use only the free bitmap to find free objects for allocation, at the expense of allocation speed.
SMAL supports an optional mark queue to avoid C stack recursion in `mark_func`. This is slightly slower than C recursion but will reduce stack overflows for threads with small C stacks. The mark queue uses `malloc()`/`free()`.
The mark function returns a `void*`. A non-zero return value will be continued for marking; this allows the mark function to avoid tail recursion into the mark engine. For example:

```
typedef void *my_oop;
typedef struct my_cons { my_oop car, cdr; } my_cons;

static void *my_cons_mark (void *ptr) {
  smal_mark_ptr(ptr, ((my_cons *) ptr)->car);
  return ((my_cons *) ptr)->cdr;  /* returned pointer is marked next, avoiding tail recursion */
}
```
The algorithms/data structures are as follows (a sketch combining the marking steps follows this list):

- Map a potential object pointer to a potential `page_id`.
- Map a potential `page_id` to a potential `smal_buffer*` using `buffer_table`.
- Determine if the object pointer is within the `smal_buffer`'s `smal_page` region where object allocations took place.
- Determine the mark bitmap offset.
- If the object is not already marked:
  - mark the object in the bitmap, and
  - call the `smal_type`'s `mark_func()` to continue marking other objects.
- For each `smal_buffer`:
  - MORE HERE.
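A sketch of the marking steps; `ptr_to_buffer` and `mark_index` are the sketches above, and `bitmap_test`/`bitmap_set` and the `mark_bitmap` field are hypothetical helpers:

```
/* Mark ptr, then keep marking through mark_func's return values (sketch). */
static void mark_step(void *ptr) {
  while (ptr) {
    smal_buffer *buf = ptr_to_buffer(ptr);   /* page_id, buffer_table, range check */
    if (! buf)
      return;                                /* not a pointer into any smal_buffer */
    size_t i = mark_index(buf, ptr);         /* mark bitmap offset */
    if (bitmap_test(buf->mark_bitmap, i))
      return;                                /* already marked; stop */
    bitmap_set(buf->mark_bitmap, i);
    ptr = buf->type->mark_func(ptr);         /* non-zero return: continue marking */
  }
}
```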
SMAL supports designating a `smal_type` as containing mostly unchanging objects. The `smal_buffer`s for these objects are tracked for mutations using a per-buffer write barrier, and can be scanned for pointers less frequently by keeping a remembered set of the references that point outside the buffer. This is functional on Linux and OS X.
Types can be created that will sweep objects only every N collections. This should only be used for object types that are unlikely to ever become unreferenced during normal collection life-cycles. This can be used in conjunction with the "Mostly Unchanging Objects" feature.
SMAL implements a write barrier to determine if a `smal_buffer` has been mutated since the last collection. This is used to invalidate remembered sets within a buffer. The write barrier uses `mprotect()`. Write-barrier faults are trapped with POSIX `sigaction` on Linux and Mach exception ports on OS X.
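A minimal Linux-only sketch of such a barrier; `mark_buffer_dirty` is hypothetical, and a production handler must also cope with faults unrelated to the barrier:

```
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

static size_t smal_page_size = 4096;       /* assumed for this sketch */

/* Hypothetical: record that the buffer's remembered set is invalid. */
static void mark_buffer_dirty(void *page);

static void barrier_handler(int sig, siginfo_t *si, void *ctx) {
  /* Round the faulting address down to its smal_page_size-aligned page. */
  void *page = (void *) ((uintptr_t) si->si_addr & ~ (uintptr_t) (smal_page_size - 1));
  mark_buffer_dirty(page);
  /* Re-enable writes so the faulting store proceeds when the handler returns. */
  mprotect(page, smal_page_size, PROT_READ | PROT_WRITE);
}

static void barrier_install(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof sa);
  sa.sa_sigaction = barrier_handler;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, 0);
}

/* Arm the barrier: the next write to the page faults exactly once. */
static void barrier_protect(void *page) {
  mprotect(page, smal_page_size, PROT_READ);
}
```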
SMAL relies on `MAP_ANON` `mmap()`/`munmap()` and `malloc()`/`free()`, and thus can co-exist with any other `mmap()`-based `malloc()`/`free()` library. SMAL probably cannot co-exist with allocators that use `sbrk()`.
The user must provide the following functions:

```
void smal_collect_inner_before(void *top_of_stack);
void smal_collect_before_mark();
void smal_collect_after_mark();
void smal_collect_before_sweep();
void smal_collect_after_sweep();
void smal_collect_mark_roots();
```

In particular, the user must provide a `smal_collect_mark_roots()` function.
This allows the user to configure and optimize for different environments:
- single-threaded vs. multi-threaded
- conservative vs. type-safe
SMAL provides a `smal_roots` package to explicitly declare root pointers on the C stack or in global variables.
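For a single-threaded, exact-roots configuration, these callbacks might be stubbed out roughly as follows (`my_mark_global_roots` is a hypothetical user function; a conservative configuration would instead scan the stack from `top_of_stack`):

```
void smal_collect_inner_before(void *top_of_stack) { /* no stack scanning needed */ }
void smal_collect_before_mark()  { }
void smal_collect_after_mark()   { }
void smal_collect_before_sweep() { }
void smal_collect_after_sweep()  { }

void smal_collect_mark_roots() {
  /* Mark explicitly declared globals and any smal_roots-declared
     stack variables (hypothetical helper). */
  my_mark_global_roots();
}
```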
SMAL supports a simple `smal_thread` abstraction around `pthreads`. The `smal_roots`, finalization, and reference packages are thread-aware and thread-safe. See include/smal/thread.h for more info.
The rest of the SMAL allocator is not completely thread-safe yet. SMAL does not yet have portable mechanisms to:
- discover all active threads,
- pause all threads,
- collect their stacks, roots and registers,
- resume all threads.
When built with threading support, SMAL attempts to use many small mutex and read/write-lock regions to avoid long or global locks. There is a single allocation read/write lock that is held to allow disabling of allocation from each buffer before collection starts. The multi-threaded version is approximately 2 to 3 times slower than the single-threaded version when running only one thread. Obviously there is room for improvement.
SMAL supports global object enumeration (e.g. Ruby's `ObjectSpace.each_object`).
SMAL contains an optional, thread-safe weak reference and reference queue implementation. See include/smal/reference.h.
SMAL contains an optional, thread-safe, lazy finalizer implementation. See include/smal/finalizer.h.
MIT License
- Thread Support:
  - Complete thread-safety.
  - Thread-safe stop-world.
  - Thread-safe stack and register discovery.
  - Active thread discovery.
- Concurrent Sweeping.