Skip to content

Power of two size-pools#2

Closed
eightbitraptor wants to merge 21 commits intomasterfrom
mvh-power-two-sizepools
Closed

Power of two size-pools#2
eightbitraptor wants to merge 21 commits intomasterfrom
mvh-power-two-sizepools

Conversation

@eightbitraptor
Copy link
Copy Markdown
Owner

@eightbitraptor eightbitraptor commented Feb 19, 2026

This does not contain the 48 byte size pool, that's on another branch on top of this one.

This is very WIP

@eightbitraptor eightbitraptor force-pushed the mvh-power-two-sizepools branch 6 times, most recently from 3c68653 to 5322c41 Compare February 26, 2026 09:56
Change slot sizes from {40,80,160,320,640} to {64,128,256,512,1024}.
BASE_SLOT_SIZE is now 64 (2^6) and all pool sizes are powers of 2

This enables bit-shift slot indexing instead of magic number division.
because BASE_SLOT_SIZE is now 32 bytes, it's no longer suitable for use
in tests that use it to assume the size of most RVALUE objects, like
strings
When RVALUE_OVERHEAD is large (debug builds with RACTOR_CHECK_MODE +
GC_DEBUG), the smallest size pool's usable size can be less than
sizeof(struct RBasic). The capacity calculation underflows:

  (8 - 16) / 8 → 0xFFFF (via size_t wraparound, truncated to uint16_t)

Since shape_grow_capa iterates capacities from index 0, the garbage
65535 at capacities[0] poisons all ivar capacity growth, causing a
buffer overflow in the RUBY_DEBUG assertion that fills unused capacity
with Qundef.
When RVALUE_OVERHEAD > 0 (GC_DEBUG, RACTOR_CHECK_MODE), heap[0]'s
usable space can equal sizeof(struct RBasic), leaving zero bytes for
instance variables. The capacity was incorrectly set to 1, allowing
the shape system to embed an IV that overflows into the overhead area.

Change the fallback capacity to 0 and switch shape_grow_capa to
count-based iteration so that a zero capacity is not confused with
the array sentinel.
rb_obj_embedded_size(0) returned sizeof(struct RBasic), which is
too small on builds with RVALUE_OVERHEAD (GC_DEBUG) where heap[0]
has no usable space beyond RBasic. The as.heap variant needs at
least one VALUE of space for the external IV pointer.

Clamp the minimum fields_count to 1 so T_OBJECT allocations always
request enough space for the as union.
This assumes that the typical allocation pattern is going to build way
more smaller objects than large (>512 byte) objects for most cases.
I don't know if this is the correct Fix!
Add the new 32-byte heap 0 to the stat_heap example, shift
all heap indices by 1, and update the keys example to show
[0, 1, 2, 3, 4, 5].
The assertion used 4 * RVALUE_SIZE (160) which was correct for the
old heaps[2] = 160B, but no longer matches any pool boundary.
Use the literal heaps[3] size (256B) since that is the target pool
for class objects on 64-bit.
BASE_SLOT_SIZE is now 32 on 64-bit (16 on 32-bit), matching the
smallest slot in heap_sizes[]. RVALUE_SIZE stays at 64/32 as the
minimum slot for a standard RVALUE. Update stale comment about
pool count from 5 to 6.
BASE_SLOT_SIZE is now 32 bytes, so this exclusion (which was
explicitly marked for removal when that happened) can be dropped.
@eightbitraptor eightbitraptor force-pushed the mvh-power-two-sizepools branch from 5322c41 to 34acfad Compare February 27, 2026 18:43
Slot sizes are not necessarily powers of BASE_SLOT_SIZE (e.g. the
upcoming 48-byte pool). Objects at odd multiples of non-power-of-two
slot sizes are VALUE-aligned but not BASE_SLOT_SIZE-aligned. Use
sizeof(VALUE) as the minimum alignment for quick-reject in
is_pointer_to_heap and related debug assertions.
Replace the formula-based slot size computation (BASE_SLOT_SIZE << i)
with an explicit lookup table that includes a 48-byte pool at index 1.
Pool layout: 32, 48, 64, 128, 256, 512, 1024 bytes (7 pools).

The 48-byte pool targets the dominant 40-byte RVALUE allocation,
reducing internal fragmentation from 37.5% to 16.7%.

Init slots are now table-driven: pool 1 (48B) gets the full budget
as the busiest pool; pool 0 (32B) gets fewer since little fits there.
The RVALUE pool is now 48 bytes (capacity 4 embedded IVs), down from
64 bytes (capacity 6). Adjust the test to use 5 ivars to trigger
external storage, then remove 2 to test re-embedding with 3.
heap_idx_for_size used ceil(log2(size)) to map allocation sizes to
pool indices, which assumed every pool was a power-of-two multiple of
BASE_SLOT_SIZE. With the 48-byte pool at index 1, objects needing
49-64 bytes were placed in 48-byte slots, causing heap buffer overflows
(e.g. embedded TypedData with embed_size=56).

Handle the first two pools with direct comparisons, then use the
log2 formula with +1 offset for the remaining power-of-two pools.

slot_index_for_offset used offset >> slot_size_log2 for bitmap index
calculation. Replace with precomputed reciprocal multiplication
(offset * reciprocal >> 48) which handles arbitrary slot sizes in
two instructions without branching.

heap_add_page aligned page start with & ~(slot_size - 1), a bitmask
trick that only works for powers of two. Replace with modulo-based
round-up on this cold path.

Reorder heap_page fields to keep freelist, start, and
slot_size_reciprocal in the first cache line for allocation and
SLOT_INDEX hot paths.

Fix RVALUE_SIZE computation and tests that assumed power-of-two pool
progression.
@eightbitraptor eightbitraptor force-pushed the mvh-power-two-sizepools branch from a676ebd to 9cf3de6 Compare February 27, 2026 23:19
Align MMTk's GC::INTERNAL_CONSTANTS key with the default GC. Update
all test references from SIZE_POOL_COUNT to HEAP_COUNT.
@eightbitraptor eightbitraptor force-pushed the mvh-power-two-sizepools branch from 9cf3de6 to 7cb2c66 Compare February 28, 2026 09:34
In debug builds RVALUE_OVERHEAD is 16 bytes (file + line tracking),
pushing the minimum RVALUE size to 56 bytes which exceeds the 48-byte
pool. Use size_allocatable_p instead of checking against a specific
pool index so the assertion holds regardless of overhead size.
The 48B pool has capacity for 4 embedded ivars (down from 6 in the
old 64B RVALUE pool). Setting @d (the 4th ivar on main) now triggers
a shape extension, so ZJIT falls back to SetIvar instead of
specializing the store.
The 24-byte non-power-of-two pool on 32-bit triggers a heap corruption
in test_marshal on i686. Revert to the original {16, 32, 64, 128, 256,
512} layout on 32-bit until the root cause is identified. The 48-byte
pool on 64-bit is unaffected.
@eightbitraptor eightbitraptor force-pushed the mvh-power-two-sizepools branch from c28ad6d to 98223e5 Compare March 2, 2026 14:26
MMTk did not account for RVALUE_OVERHEAD. rb_gc_impl_obj_slot_size
returned the full slot size, but the GC interface contract requires
the usable size (full minus overhead). RACTOR_BELONGING_ID writes at
obj + obj_slot_size expecting to land in the overhead region — without
the subtraction, the write corrupted adjacent objects.

This is a pre-existing bug exposed by allocation pattern changes.

Add RVALUE_OVERHEAD awareness: define the overhead struct matching the
default GC, subtract it in obj_slot_size, account for it in allocation
sizing and heap_sizes, and call rb_ractor_setup_belonging after
allocation to initialize the ractor ID.
eightbitraptor pushed a commit that referenced this pull request Apr 1, 2026
Move compilation steps from the heaviest jobs to the lightest to reduce
the critical path of the Compilations workflow.

Before: jobs ranged from 13-41 min (compile#12 had 4 steps, compile#3
had 10 clang versions).

After: jobs range from 7-9 steps each (excluding compile#1 which has
the LTO build), bringing the estimated critical path from ~41 min to
~30 min.

Moves:
- clang 23, 22, 21 from #3 to ruby#12 and ruby#10
- GCC 8, 7 from #2 to ruby#12
- `OPT_THREADED_CODE=1`, `OPT_THREADED_CODE=2` from ruby#7 to ruby#10
eightbitraptor pushed a commit that referenced this pull request Apr 2, 2026
pm_parse_process initializes the index_lookup_table but nothing seems to
use it after it has been allocated. However, pm_compile_scope_node will
overwrite the index_lookup_table and cause it to leak memory. This can
be seen during bootup with the following memory leaks reported by ASAN:

    #0 0x60dba31b7af3 in malloc
    #1 0x60dba32e0718 in rb_gc_impl_malloc gc/default/default.c:8287:5
    #2 0x60dba32c7aa7 in ruby_xmalloc_body gc.c:5373:12
    #3 0x60dba32c4a54 in ruby_xmalloc gc.c:5355:34
    ruby#4 0x60dba3260314 in pm_index_lookup_table_init_heap prism_compile.h:89:29
    ruby#5 0x60dba3209388 in pm_parse_process prism_compile.c:11366:5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant