Power of two size-pools by eightbitraptor · Pull Request #2 · eightbitraptor/ruby

eightbitraptor · 2026-02-19T14:52:07Z

This does ~~not~~ contain the 48 byte size pool, that's on another branch on top of this one.

This is very WIP

Change slot sizes from {40,80,160,320,640} to {64,128,256,512,1024}. BASE_SLOT_SIZE is now 64 (2^6) and all pool sizes are powers of 2 This enables bit-shift slot indexing instead of magic number division.

RFLOAT mostly

because BASE_SLOT_SIZE is now 32 bytes, it's no longer suitable for use in tests that use it to assume the size of most RVALUE objects, like strings

When RVALUE_OVERHEAD is large (debug builds with RACTOR_CHECK_MODE + GC_DEBUG), the smallest size pool's usable size can be less than sizeof(struct RBasic). The capacity calculation underflows: (8 - 16) / 8 → 0xFFFF (via size_t wraparound, truncated to uint16_t) Since shape_grow_capa iterates capacities from index 0, the garbage 65535 at capacities[0] poisons all ivar capacity growth, causing a buffer overflow in the RUBY_DEBUG assertion that fills unused capacity with Qundef.

When RVALUE_OVERHEAD > 0 (GC_DEBUG, RACTOR_CHECK_MODE), heap[0]'s usable space can equal sizeof(struct RBasic), leaving zero bytes for instance variables. The capacity was incorrectly set to 1, allowing the shape system to embed an IV that overflows into the overhead area. Change the fallback capacity to 0 and switch shape_grow_capa to count-based iteration so that a zero capacity is not confused with the array sentinel.

rb_obj_embedded_size(0) returned sizeof(struct RBasic), which is too small on builds with RVALUE_OVERHEAD (GC_DEBUG) where heap[0] has no usable space beyond RBasic. The as.heap variant needs at least one VALUE of space for the external IV pointer. Clamp the minimum fields_count to 1 so T_OBJECT allocations always request enough space for the as union.

This assumes that the typical allocation pattern is going to build way more smaller objects than large (>512 byte) objects for most cases.

I don't know if this is the correct Fix!

Add the new 32-byte heap 0 to the stat_heap example, shift all heap indices by 1, and update the keys example to show [0, 1, 2, 3, 4, 5].

The assertion used 4 * RVALUE_SIZE (160) which was correct for the old heaps[2] = 160B, but no longer matches any pool boundary. Use the literal heaps[3] size (256B) since that is the target pool for class objects on 64-bit.

BASE_SLOT_SIZE is now 32 on 64-bit (16 on 32-bit), matching the smallest slot in heap_sizes[]. RVALUE_SIZE stays at 64/32 as the minimum slot for a standard RVALUE. Update stale comment about pool count from 5 to 6.

BASE_SLOT_SIZE is now 32 bytes, so this exclusion (which was explicitly marked for removal when that happened) can be dropped.

Slot sizes are not necessarily powers of BASE_SLOT_SIZE (e.g. the upcoming 48-byte pool). Objects at odd multiples of non-power-of-two slot sizes are VALUE-aligned but not BASE_SLOT_SIZE-aligned. Use sizeof(VALUE) as the minimum alignment for quick-reject in is_pointer_to_heap and related debug assertions.

Replace the formula-based slot size computation (BASE_SLOT_SIZE << i) with an explicit lookup table that includes a 48-byte pool at index 1. Pool layout: 32, 48, 64, 128, 256, 512, 1024 bytes (7 pools). The 48-byte pool targets the dominant 40-byte RVALUE allocation, reducing internal fragmentation from 37.5% to 16.7%. Init slots are now table-driven: pool 1 (48B) gets the full budget as the busiest pool; pool 0 (32B) gets fewer since little fits there.

The RVALUE pool is now 48 bytes (capacity 4 embedded IVs), down from 64 bytes (capacity 6). Adjust the test to use 5 ivars to trigger external storage, then remove 2 to test re-embedding with 3.

heap_idx_for_size used ceil(log2(size)) to map allocation sizes to pool indices, which assumed every pool was a power-of-two multiple of BASE_SLOT_SIZE. With the 48-byte pool at index 1, objects needing 49-64 bytes were placed in 48-byte slots, causing heap buffer overflows (e.g. embedded TypedData with embed_size=56). Handle the first two pools with direct comparisons, then use the log2 formula with +1 offset for the remaining power-of-two pools. slot_index_for_offset used offset >> slot_size_log2 for bitmap index calculation. Replace with precomputed reciprocal multiplication (offset * reciprocal >> 48) which handles arbitrary slot sizes in two instructions without branching. heap_add_page aligned page start with & ~(slot_size - 1), a bitmask trick that only works for powers of two. Replace with modulo-based round-up on this cold path. Reorder heap_page fields to keep freelist, start, and slot_size_reciprocal in the first cache line for allocation and SLOT_INDEX hot paths. Fix RVALUE_SIZE computation and tests that assumed power-of-two pool progression.

Align MMTk's GC::INTERNAL_CONSTANTS key with the default GC. Update all test references from SIZE_POOL_COUNT to HEAP_COUNT.

In debug builds RVALUE_OVERHEAD is 16 bytes (file + line tracking), pushing the minimum RVALUE size to 56 bytes which exceeds the 48-byte pool. Use size_allocatable_p instead of checking against a specific pool index so the assertion holds regardless of overhead size.

@d

The 48B pool has capacity for 4 embedded ivars (down from 6 in the old 64B RVALUE pool). Setting @d (the 4th ivar on main) now triggers a shape extension, so ZJIT falls back to SetIvar instead of specializing the store.

The 24-byte non-power-of-two pool on 32-bit triggers a heap corruption in test_marshal on i686. Revert to the original {16, 32, 64, 128, 256, 512} layout on 32-bit until the root cause is identified. The 48-byte pool on 64-bit is unaffected.

MMTk did not account for RVALUE_OVERHEAD. rb_gc_impl_obj_slot_size returned the full slot size, but the GC interface contract requires the usable size (full minus overhead). RACTOR_BELONGING_ID writes at obj + obj_slot_size expecting to land in the overhead region — without the subtraction, the write corrupted adjacent objects. This is a pre-existing bug exposed by allocation pattern changes. Add RVALUE_OVERHEAD awareness: define the overhead struct matching the default GC, subtract it in obj_slot_size, account for it in allocation sizing and heap_sizes, and call rb_ractor_setup_belonging after allocation to initialize the ractor ID.

Move compilation steps from the heaviest jobs to the lightest to reduce the critical path of the Compilations workflow. Before: jobs ranged from 13-41 min (compile#12 had 4 steps, compile#3 had 10 clang versions). After: jobs range from 7-9 steps each (excluding compile#1 which has the LTO build), bringing the estimated critical path from ~41 min to ~30 min. Moves: - clang 23, 22, 21 from #3 to ruby#12 and ruby#10 - GCC 8, 7 from #2 to ruby#12 - `OPT_THREADED_CODE=1`, `OPT_THREADED_CODE=2` from ruby#7 to ruby#10

pm_parse_process initializes the index_lookup_table but nothing seems to use it after it has been allocated. However, pm_compile_scope_node will overwrite the index_lookup_table and cause it to leak memory. This can be seen during bootup with the following memory leaks reported by ASAN: #0 0x60dba31b7af3 in malloc #1 0x60dba32e0718 in rb_gc_impl_malloc gc/default/default.c:8287:5 #2 0x60dba32c7aa7 in ruby_xmalloc_body gc.c:5373:12 #3 0x60dba32c4a54 in ruby_xmalloc gc.c:5355:34 ruby#4 0x60dba3260314 in pm_index_lookup_table_init_heap prism_compile.h:89:29 ruby#5 0x60dba3209388 in pm_parse_process prism_compile.c:11366:5

eightbitraptor force-pushed the mvh-power-two-sizepools branch 6 times, most recently from 3c68653 to 5322c41 Compare February 26, 2026 09:56

eightbitraptor added 12 commits February 27, 2026 18:43

Use power-of-2 size pool slot sizes

12cbd04

Change slot sizes from {40,80,160,320,640} to {64,128,256,512,1024}. BASE_SLOT_SIZE is now 64 (2^6) and all pool sizes are powers of 2 This enables bit-shift slot indexing instead of magic number division.

Add a 32 byte size pool for small objects

0e6ade3

RFLOAT mostly

Introduce RVALUE_SIZE to capture the size of most RVALUES

aca2778

because BASE_SLOT_SIZE is now 32 bytes, it's no longer suitable for use in tests that use it to assume the size of most RVALUE objects, like strings

Scale initial heap size by bytes rather than slot count

1a69884

This assumes that the typical allocation pattern is going to build way more smaller objects than large (>512 byte) objects for most cases.

Fix the zjit tests.

415720b

I don't know if this is the correct Fix!

Update gc.rb docs for 6 size pools

1d62f38

Add the new 32-byte heap 0 to the stat_heap example, shift all heap indices by 1, and update the keys example to show [0, 1, 2, 3, 4, 5].

Fix STATIC_ASSERT for class embedding with power-of-2 pools

a6d4b97

The assertion used 4 * RVALUE_SIZE (160) which was correct for the old heaps[2] = 160B, but no longer matches any pool boundary. Use the literal heaps[3] size (256B) since that is the target pool for class objects on 64-bit.

Fix MMTk BASE_SLOT_SIZE to match default GC

b08f648

BASE_SLOT_SIZE is now 32 on 64-bit (16 on 32-bit), matching the smallest slot in heap_sizes[]. RVALUE_SIZE stays at 64/32 as the minimum slot for a standard RVALUE. Update stale comment about pool count from 5 to 6.

Remove stale MMTk test exclusion for test_dump_includes_slot_size

34acfad

BASE_SLOT_SIZE is now 32 bytes, so this exclusion (which was explicitly marked for removal when that happened) can be dropped.

eightbitraptor force-pushed the mvh-power-two-sizepools branch from 5322c41 to 34acfad Compare February 27, 2026 18:43

eightbitraptor added 4 commits February 27, 2026 18:57

Fix re-embed test for 48-byte RVALUE pool

d70af8c

The RVALUE pool is now 48 bytes (capacity 4 embedded IVs), down from 64 bytes (capacity 6). Adjust the test to use 5 ivars to trigger external storage, then remove 2 to test re-embedding with 3.

eightbitraptor force-pushed the mvh-power-two-sizepools branch from a676ebd to 9cf3de6 Compare February 27, 2026 23:19

Rename MMTk SIZE_POOL_COUNT to HEAP_COUNT

7cb2c66

Align MMTk's GC::INTERNAL_CONSTANTS key with the default GC. Update all test references from SIZE_POOL_COUNT to HEAP_COUNT.

eightbitraptor force-pushed the mvh-power-two-sizepools branch from 9cf3de6 to 7cb2c66 Compare February 28, 2026 09:34

eightbitraptor added 3 commits February 28, 2026 09:51

Update ZJIT snapshot for 48-byte RVALUE pool

7ec8f54

The 48B pool has capacity for 4 embedded ivars (down from 6 in the old 64B RVALUE pool). Setting @d (the 4th ivar on main) now triggers a shape extension, so ZJIT falls back to SetIvar instead of specializing the store.

Use power-of-two pools on 32-bit for now

98223e5

The 24-byte non-power-of-two pool on 32-bit triggers a heap corruption in test_marshal on i686. Revert to the original {16, 32, 64, 128, 256, 512} layout on 32-bit until the root cause is identified. The 48-byte pool on 64-bit is unaffected.

eightbitraptor force-pushed the mvh-power-two-sizepools branch from c28ad6d to 98223e5 Compare March 2, 2026 14:26

eightbitraptor closed this Mar 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Power of two size-pools#2

Power of two size-pools#2
eightbitraptor wants to merge 21 commits intomasterfrom
mvh-power-two-sizepools

eightbitraptor commented Feb 19, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

eightbitraptor commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

eightbitraptor commented Feb 19, 2026 •

edited

Loading