
Reserve 2 bits for expressing object layout #17139

Merged
tenderlove merged 8 commits into ruby:master from tenderlove:messing-around
May 29, 2026

Conversation

@tenderlove
Member

@tenderlove tenderlove commented May 28, 2026

We would like to make instance variable reads in the JIT compiler faster (as well as simplify the JIT implementation). Currently, in order to read an instance variable, we have to:

  1. Test for heap object
  2. Load object flags to a 64 bit register
  3. Mask the object header
  4. Bit test against the masked header
  5. JNE
  6. Load field

We would like to:

  1. Test for heap object
  2. Load object shape to a 32 bit register
  3. Bit test against the shape
  4. JNE
  5. Load field

The way we fetch instance variables is not consistent across objects. In order to realize our goal, we need to encode object layout inside the shape. If we encode object layout inside the shape, then the shape itself will guarantee that the access pattern generated by the JIT compiler is correct.

We should encode the following load patterns into the shape tag bits. This way we can share shapes on transitions, but be able to differentiate the access patterns for the JIT compiler. In other words, two objects can have an @a -> @b -> @c transition and share the same shape, but the tag bits can differentiate the access pattern so that the JIT compiler can be confident that the machine code is correct.

Here are the patterns:

  1. Embedded/Extended T_OBJECT Instance Variables

Objects that reference their instance variables directly or via a malloc buffer.

  2. Objects with fields_objects fields

These are Data and TypedData objects. They have an associated auxiliary imemo/fields object that stores the instance variables. The access pattern is object[2] + 2. The fields object is the 3rd field, and the instance variables start at +2 inside the fields object. The fields object itself is a Ruby object, so it contains the usual header bits + class headers.

  3. Non Boxable Classes / Modules

This is similar to Objects with fields_objects, but the fields object is stored at a different offset. We’re differentiating this from boxable classes and modules because those are harder to support.

  4. Other

The "Other" pattern is for objects that are rare, or have difficult-to-implement access patterns. This includes:

  • Boxable classes and modules
  • Structs (for now)
  • Objects that use the geniv table
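
As a rough sketch of how the four patterns translate into address computations, objects can be modelled as arrays of machine words. Only the `object[2] + 2` pattern comes from the description above. The enum, the function name, the `embedded` parameter, the slot used for embedded and extended T_OBJECT fields, and `TOY_CLASS_FIELDS_SLOT` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word;

/* Hypothetical layout tags; the real constants live in shape.h. */
enum toy_layout {
    TOY_LAYOUT_ROBJECT,     /* embedded or malloc-backed T_OBJECT */
    TOY_LAYOUT_FIELDS_OBJ,  /* Data/TypedData with an imemo/fields object */
    TOY_LAYOUT_CLASS,       /* non-boxable class or module */
    TOY_LAYOUT_OTHER        /* everything else: no fast path */
};

/* Assumed slot of the fields object inside a class; the text only says it
 * is "a different offset" from the Data/TypedData case. */
#define TOY_CLASS_FIELDS_SLOT 4

/* Returns a pointer to the first instance variable, or NULL when the caller
 * should fall back to a generic (slow path) lookup. */
static inline word *
toy_ivar_buffer(word *obj, enum toy_layout layout, int embedded)
{
    assert(obj != NULL);
    switch (layout) {
      case TOY_LAYOUT_ROBJECT:
        /* Assumption: fields start right after the two header words, either
         * inline or behind a pointer to a malloc buffer. */
        return embedded ? &obj[2] : (word *)obj[2];
      case TOY_LAYOUT_FIELDS_OBJ:
        /* object[2] + 2: skip the fields object's own two header words. */
        return (word *)obj[2] + 2;
      case TOY_LAYOUT_CLASS:
        return (word *)obj[TOY_CLASS_FIELDS_SLOT] + 2;
      default:
        return NULL;
    }
}
```

The point of the sketch is that the switch depends only on the layout tag, never on the object's type, which is what lets a JIT select one load sequence from the shape alone.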

Proposed shape bit layout:

  Current shape_id_t is 32 bits:
  31        28 27 26 25 24 23 22        19 18                         0
  +-----------+--+--+--+--+--+------------+----------------------------+
  | unused    |L1|L0|OI|FR|CX| heap index | shape tree offset          |
  +-----------+--+--+--+--+--+------------+----------------------------+
               |  |  |  |  |  |            |
               |  |  |  |  |  |            +-- bits 0-18: SHAPE_ID_OFFSET_MASK
               |  |  |  |  |  +--------------- bits 19-22: SHAPE_ID_HEAP_INDEX_MASK
               |  |  |  |  +------------------ bit 23: SHAPE_ID_FL_COMPLEX
               |  |  |  +--------------------- bit 24: SHAPE_ID_FL_FROZEN
               |  |  +------------------------ bit 25: SHAPE_ID_FL_HAS_OBJECT_ID
               +--+--------------------------- bits 26-27: SHAPE_ID_LAYOUT_MASK
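
The diagram can be restated as masks and accessors. In this sketch the mask names are taken from the diagram, but their values are derived from the bit positions shown there rather than copied from shape.h, and the shift constants and `toy_` helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t shape_id_t;

/* Values derived from the diagram above (assumption: not copied from shape.h). */
#define SHAPE_ID_OFFSET_MASK       ((shape_id_t)((1u << 19) - 1))  /* bits 0-18  */
#define SHAPE_ID_HEAP_INDEX_MASK   ((shape_id_t)(0xfu << 19))      /* bits 19-22 */
#define SHAPE_ID_FL_COMPLEX        ((shape_id_t)(1u << 23))        /* bit 23     */
#define SHAPE_ID_FL_FROZEN         ((shape_id_t)(1u << 24))        /* bit 24     */
#define SHAPE_ID_FL_HAS_OBJECT_ID  ((shape_id_t)(1u << 25))        /* bit 25     */
#define SHAPE_ID_LAYOUT_MASK       ((shape_id_t)(0x3u << 26))      /* bits 26-27 */

/* Hypothetical shift names used only by this sketch. */
#define TOY_HEAP_INDEX_SHIFT 19
#define TOY_LAYOUT_SHIFT     26

static inline uint32_t
toy_shape_tree_offset(shape_id_t id)
{
    return id & SHAPE_ID_OFFSET_MASK;
}

static inline uint32_t
toy_shape_heap_index(shape_id_t id)
{
    return (id & SHAPE_ID_HEAP_INDEX_MASK) >> TOY_HEAP_INDEX_SHIFT;
}

static inline uint32_t
toy_shape_layout_bits(shape_id_t id)
{
    return (id & SHAPE_ID_LAYOUT_MASK) >> TOY_LAYOUT_SHIFT;
}

/* Replace only the two layout bits, leaving the tree offset and flags alone. */
static inline shape_id_t
toy_shape_id_with_layout(shape_id_t id, uint32_t layout)
{
    assert(layout <= 3);  /* must fit in two bits */
    return (id & ~SHAPE_ID_LAYOUT_MASK) | ((shape_id_t)layout << TOY_LAYOUT_SHIFT);
}
```

Because the tree offset is untouched by `toy_shape_id_with_layout`, two objects with the same `@a -> @b -> @c` transitions keep the same offset and differ only in the tag bits.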

The important part about these layout patterns is that they do not reflect the type of object, only how the object is laid out in memory. For example, we currently treat structs as "other", but we can refactor them to have the same layout as "Objects with fields_objects", and when we do that they should get a different bit in the shape header.

This commit only reserves the two bits; it doesn't use them in the JIT compiler yet.

@tenderlove tenderlove requested a review from byroot May 28, 2026 22:47
@tenderworks tenderworks force-pushed the messing-around branch 2 times, most recently from ae3bfd0 to c2c8d8c Compare May 28, 2026 22:54
@XrXr
Member

XrXr commented May 28, 2026

What's a non-boxable class? I thought every class can end up inside a box

@tenderlove
Member Author

tenderlove commented May 29, 2026

What's a non-boxable class? I thought every class can end up inside a box

Ya, every class can end up inside a box, but only "boxable" classes can be "seen" differently between boxes. For example, the String class can have different ivars depending on the box that it's accessed from, but a user class that's created and passed between boxes will only ever have one set of IVs.

> RUBY_BOX=1 ./miniruby -e'p [Ruby::Box.new.eval("String.instance_variable_set(:@a, 1)"), String.instance_variable_get(:@a)]'
./miniruby: warning: Ruby::Box is experimental, and the behavior may change in the future!
See https://docs.ruby-lang.org/en/master/Ruby/Box.html for known issues, etc.
[1, nil]

The flag is here

The "boxable" flag depends on the boolean passed here, and BOX_MASTER_P(rb_current_box()) is only true during Ruby's boot process. User classes will all be defined as "not boxable". It's kind of a weird name for a flag but I don't know a better name off the top of my head.

Anyway if you inspect the current box after boot, you'll see it's a user box:

> RUBY_BOX=1 ./miniruby -e'p Ruby::Box.current'
./miniruby: warning: Ruby::Box is experimental, and the behavior may change in the future!
See https://docs.ruby-lang.org/en/master/Ruby/Box.html for known issues, etc.
#<Ruby::Box:3,user,main>

So BOX_MASTER_P is false, and user classes will be "not boxable".

@tenderlove
Member Author

tenderlove commented May 29, 2026

If we make the other case fall back to just a function call, then I think we could also eliminate the "main box" guards in the IV read path:

> RUBY_BOX=1 ./miniruby -e'p RubyVM::Shape.of(String).layout'
./miniruby: warning: Ruby::Box is experimental, and the behavior may change in the future!
See https://docs.ruby-lang.org/en/master/Ruby/Box.html for known issues, etc.
:other
> ./miniruby -e'p RubyVM::Shape.of(String).layout'
:rclass

@tenderworks tenderworks force-pushed the messing-around branch 2 times, most recently from 005c652 to d584473 Compare May 29, 2026 00:35
@tekknolagi
Contributor

They are patchpoints and therefore don't cost much, if anything

Comment thread gc.c
@byroot
Member

byroot commented May 29, 2026

I'll have a lot of nitpicks, but this is very much where I was going with the recent shape refactors and the paused work on #16843, so 👍 on the idea itself.

Member

@byroot byroot left a comment

Lots of little nitpicks, but overall big 👍 on this. It's essentially what I was trying to do, and I'm happy to base my work on these changes.

Comment thread array.c
struct RArray fake_ary = {0};
fake_ary.basic.flags = T_ARRAY;
VALUE ary = (VALUE)&fake_ary;
RBASIC_SET_SHAPE_ID(ary, ROOT_SHAPE_ID | SHAPE_ID_LAYOUT_OTHER);
Member

You would probably save yourself lots of trouble if SHAPE_ID_LAYOUT_OTHER was the default (aka 0).

Member Author

Maybe. We could change it. TBH I think we should be explicitly setting shapes on objects at allocation time. This fake array worked because ROOT_SHAPE_ID happens to be 0. IOW I think this allocation point should have been setting a shape in the first place and I'm glad we caught it.

Member

TBH I think we should be explicitly setting shapes on objects at allocation time.

Agreed. I started moving in that direction a while ago, e.g. introduction of rb_newobj_of_with_shape etc.

Comment thread shape.h
}

static inline shape_id_t
rb_shape_id_with_robject_layout(shape_id_t shape_id)
Member

Suggested change
- rb_shape_id_with_robject_layout(shape_id_t shape_id)
+ rb_shape_transition_robject_layout(shape_id_t shape_id)

Not the end of the world, but the naming convention for functions that transition to another shape (even if just flipping bits) is rb_shape_transition_xxx.

Contributor

It's not used as a transition though. The semantics are the same but the intent is a little different

Comment thread shape.h Outdated
Comment thread shape.h

// Means IVs are found at an offset from the object's addr, or in a
// malloc allocated side table
SHAPE_ID_LAYOUT_ROBJECT = 0,
Member

So I know it's much harder (since I tried it: #16843), but I think ideally you do want two distinct layouts for ROBJECT. One for embedded, one for "extended".

Can be a followup though.

Member Author

I think ideally you do want two distinct layouts for ROBJECT. One for embedded, one for "extended".

Why? Since the size class is embedded in the shape, the JIT compiler can know whether the object is embedded or extended based on the shape alone. Not against it, just wondering about the reasons.

Another thing we were discussing is that when an object goes extended maybe we use an imemo/fields object and then the object becomes "rdata" layout.

Member

Since the size class is embedded in the shape

Right, I didn't properly explain.

So first, right now this is a bit annoying because it means that when removing an ivar, you have to know if you are dealing with an "extended RObject", in which case you may have to re-embed it so as not to confuse the JITs.

That is why I do want to properly materialize IS_ROBJECT_EMBED as a bit in the shape_id (#17145) and not solely rely on the heap_id + shape->depth to tell us if we're embedded or not.

The more secondary reason I would like this is that I would want all types to have the heap_id in their shape (right now only ROBJECT does) so that you could more efficiently query the object slot size. Right now rb_gc_obj_slot_size() is used a bit across array.c/string.c but it's quite slow for what it is.

But to be clear I'm not asking you to implement any of this, I'm just sharing my own goals so that we hopefully don't step on each other's toes too much (that being said, most of this work has been paused for a couple of weeks now, and it's unclear when I'll have the time+energy to resume it, so priority to you).

Comment thread shape.h
Comment on lines +55 to +59
SHAPE_ID_LAYOUT_RCLASS = RBIMPL_SHAPE_ID_FL(3),

// Means this object is an RData or RTypedData and IVs are found in the
// fields_obj found on the RData/RTypedData struct
SHAPE_ID_LAYOUT_RDATA = RBIMPL_SHAPE_ID_FL(4),
Member

So my plan was to move RClass.fields_obj to the same offset as RTypedData.fields_obj and RObject.as.extended (which right now is a raw buffer, but that I would like to make an IMEMO/fields #17145).

If you do this, SHAPE_ID_LAYOUT_RCLASS and SHAPE_ID_LAYOUT_RDATA can become the same (unless you need to discriminate classes for another reason? Ractors?).

This way you don't really care what the type is, it's really just at which offset you find the fields object.

Member Author

@tenderlove tenderlove May 29, 2026

So my plan was to move RClass.fields_obj to the same offset as RTypedData.fields_obj and RObject.as.extended (which right now is a raw buffer, but that I would like to make an IMEMO/fields #17145).

🤦 yes, I should have read all comments before responding.

So my plan was to move RClass.fields_obj

Were you only going to do this for non-boxable classes?

If you do this, SHAPE_ID_LAYOUT_RCLASS and SHAPE_ID_LAYOUT_RDATA can become the same (unless you need to discriminate classes for another reason? Ractors?).

This way you don't really care what the type is, it's really just at which offset you find the fields object.

I don't think there is any reason to discriminate. The only reason we have this layout type is to know that a class isn't boxable, and where its fields pointer is via the shape. If we can make it have the same layout as RDATA that would be great.

Member

Yep, boxable classes can't have that "layout", but other than that, you can share the same flag for:

  • non-boxable classes
  • RTypedData
  • "extended" RObject

As they all generate the exact same native code (search for IMEMO/fields reference at obj[2]).

Comment thread gc.c Outdated
Comment thread ractor.c
VALUE type = RB_BUILTIN_TYPE(obj);
size_t slot_size = rb_gc_obj_slot_size(obj);
- VALUE moved = rb_newobj(GET_EC(), 0, type, 0, wb_protected_types[type], slot_size);
+ VALUE moved = rb_newobj(GET_EC(), 0, type, RBASIC_SHAPE_ID(obj), wb_protected_types[type], slot_size);
Member

Is it OK to copy the whole shape on an otherwise empty object? I'm thinking that if GC triggers before that object is initialized it might cause some bug.

You may want to only copy the "structural bits", so LAYOUT and HEAP parts.
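
A minimal sketch of "copy only the structural bits", assuming the bit positions from the layout diagram in the PR description. `TOY_SHAPE_ID_STRUCTURAL_MASK` and `toy_shape_id_for_empty_copy` are hypothetical names, not part of shape.h, and the mask values are derived from the diagram rather than taken from the source.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t shape_id_t;

/* Values derived from the diagram in the PR description (assumption). */
#define SHAPE_ID_OFFSET_MASK      ((shape_id_t)((1u << 19) - 1))  /* bits 0-18  */
#define SHAPE_ID_HEAP_INDEX_MASK  ((shape_id_t)(0xfu << 19))      /* bits 19-22 */
#define SHAPE_ID_LAYOUT_MASK      ((shape_id_t)(0x3u << 26))      /* bits 26-27 */

/* Hypothetical: the bits that describe where an object lives and how it is
 * laid out, as opposed to which fields it currently holds. */
#define TOY_SHAPE_ID_STRUCTURAL_MASK \
    (SHAPE_ID_HEAP_INDEX_MASK | SHAPE_ID_LAYOUT_MASK)

/* Shape id for a freshly allocated, still empty copy of an object: keep the
 * LAYOUT and HEAP parts, drop the tree offset and the per-object flags. */
static inline shape_id_t
toy_shape_id_for_empty_copy(shape_id_t source_id)
{
    shape_id_t fresh = source_id & TOY_SHAPE_ID_STRUCTURAL_MASK;
    assert((fresh & SHAPE_ID_OFFSET_MASK) == 0);  /* claims no fields yet */
    return fresh;
}
```

With a shape id like this, anything that inspects the new object before it is initialized sees a shape that claims no fields, rather than one that promises fields the object does not have yet.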

Comment thread shape.c
Comment thread variable.c Outdated
// In most case we can replicate the single `fields_obj` shape
// but in namespaced case? Perhaps INVALID_SHAPE_ID?
- RBASIC_SET_SHAPE_ID(obj, RBASIC_SHAPE_ID(new_fields_obj));
+ RBASIC_SET_SHAPE_ID(obj, rb_shape_id_layout(RBASIC_SHAPE_ID(obj)) | RBASIC_SHAPE_ID(new_fields_obj));
Member

So on my experimental branch, I struggled quite a bit with keeping the "owner" and "field obj" shapes in sync, as it resulted in a lot of brittle code like this.

What I was thinking was to introduce a split between the bits that represent the object type (slot size, location of fields, etc.) and the parts that are shared with the fields obj, like the offset, the frozen bit, etc.

Note this isn't a blocker; if your PR is green I'm happy to refactor this in a followup. It's much easier to simplify working code.
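
The split described in this comment could look roughly like the following. This is a sketch under assumptions: the mask values are derived from the diagram in the PR description, which bits count as "structural" is a guess based on the comment (heap index and layout), and `toy_shape_id_sync_owner` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t shape_id_t;

/* Values derived from the diagram in the PR description (assumption). */
#define SHAPE_ID_HEAP_INDEX_MASK  ((shape_id_t)(0xfu << 19))  /* bits 19-22 */
#define SHAPE_ID_LAYOUT_MASK      ((shape_id_t)(0x3u << 26))  /* bits 26-27 */

/* Hypothetical split: bits that describe the owner itself ... */
#define TOY_SHAPE_ID_STRUCTURAL_MASK \
    (SHAPE_ID_HEAP_INDEX_MASK | SHAPE_ID_LAYOUT_MASK)
/* ... versus bits that are shared with its fields object. */
#define TOY_SHAPE_ID_SHARED_MASK ((shape_id_t)~TOY_SHAPE_ID_STRUCTURAL_MASK)

/* Recompute the owner's shape id after its fields object changed: the
 * owner keeps its own structural bits and adopts everything else (tree
 * offset, frozen bit, ...) from the fields object. */
static inline shape_id_t
toy_shape_id_sync_owner(shape_id_t owner_id, shape_id_t fields_obj_id)
{
    shape_id_t synced = (owner_id & TOY_SHAPE_ID_STRUCTURAL_MASK)
                      | (fields_obj_id & TOY_SHAPE_ID_SHARED_MASK);
    assert((synced & TOY_SHAPE_ID_STRUCTURAL_MASK) ==
           (owner_id & TOY_SHAPE_ID_STRUCTURAL_MASK));
    return synced;
}
```

Centralizing the masking in one helper is the design point: call sites no longer need to remember which bits belong to the owner and which to the fields object.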

Comment thread shape.c Outdated
tenderlove and others added 8 commits May 29, 2026 10:05
Co-Authored-By: John Hawthorn <john@hawthorn.email>
Co-Authored-By: Max Bernstein <tekknolagi@gmail.com>
Co-authored-by: Nobuyoshi Nakada <nobu.nakada@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
This reverts commit 900711d.
Member

@XrXr XrXr left a comment

Before implementing this in the JITs, you should be able to use this to simplify some class of cached ivar reading in the interpreter, right? That's a good way to verify the shapes infra is working properly.

@tenderlove
Member Author

tenderlove commented May 29, 2026

you should be able to use this to simplify some class of cached ivar reading in the interpreter, right?

I don't know that this change would simplify any of the existing code. These bits teach us where to find the IV buffer and the code for locating the IV buffer is not part of the current inline cache hit code.

If we refactored the IV caches to do CC fast paths like methods, then we could eliminate this branch but that seems like a much bigger change than this PR.

@tenderlove
Member Author

@XrXr I made a commit that dogfoods these bits. I don't think it's any simpler at the moment, so I'd like to follow up with a PR after.

@tenderlove tenderlove merged commit 63d9f09 into ruby:master May 29, 2026
119 checks passed
@tenderworks tenderworks deleted the messing-around branch May 29, 2026 22:26
matzbot pushed a commit that referenced this pull request May 29, 2026
tenderlove added a commit to tenderlove/ruby that referenced this pull request May 30, 2026