Reserve 2 bits for expressing object layout#17139
Conversation
ae3bfd0 to
c2c8d8c
Compare
|
What's a non-boxable class? I thought every class can end up inside a box |
Ya, every class can end up inside a box, but only "box able" classes can be "seen" differently between boxes. For example the The flag is here The "boxable" flag depends on the boolean passed here, and Anyway if you inspect the current box after boot, you'll see it's a user box: So |
|
If we make the |
005c652 to
d584473
Compare
|
They are patchpoints and therefore don't cost much, if anything |
|
I'll have a lot of nitpicks, but this is very much where I was going with the recent shape refactors and the paused work on #16843, so 👍 on the idea itself. |
byroot
left a comment
There was a problem hiding this comment.
Lots of little nitpicks, but overall big 👍 on this, it's essentially what I was trying to do, and I'm happy to base myself of these changes.
| struct RArray fake_ary = {0}; | ||
| fake_ary.basic.flags = T_ARRAY; | ||
| VALUE ary = (VALUE)&fake_ary; | ||
| RBASIC_SET_SHAPE_ID(ary, ROOT_SHAPE_ID | SHAPE_ID_LAYOUT_OTHER); |
There was a problem hiding this comment.
You would probably save yourself lots of trouble if SHAPE_ID_LAYOUT_OTHER was the default (aka 0).
There was a problem hiding this comment.
Maybe. We could change it. TBH I think we should be explicitly setting shapes on objects at allocation time. This fake array worked because ROOT_SHAPE_ID happens to be 0. IOW I think this allocation point should have been setting a shape in the first place and I'm glad we caught it.
There was a problem hiding this comment.
TBH I think we should be explicitly setting shapes on objects at allocation time.
Agreed. I started moving in that direction a while ago, e.g. introduction of rb_newobj_of_with_shape etc.
| } | ||
|
|
||
| static inline shape_id_t | ||
| rb_shape_id_with_robject_layout(shape_id_t shape_id) |
There was a problem hiding this comment.
| rb_shape_id_with_robject_layout(shape_id_t shape_id) | |
| rb_shape_transition_robject_layout(shape_id_t shape_id) |
Not the end of the world, but the naming convention for function that transition to another shape (even if just flipping bits) is rb_shape_transition_xxx
There was a problem hiding this comment.
It's not used as a transition though. The semantics are the same but the intent is a little different
|
|
||
| // Means IVs are found at an offset from the object's addr, or in a | ||
| // malloc allocated side table | ||
| SHAPE_ID_LAYOUT_ROBJECT = 0, |
There was a problem hiding this comment.
So I know it's much harder (since I tried it: #16843), but I think ideally you do want two distinct layouts for ROBJECT. One for embedded, one for "extended".
Can be a followup though.
There was a problem hiding this comment.
I think ideally you do want two distinct layouts for ROBJECT. One for embedded, one for "extended".
Why? Since the size class is embedded in the shape, the JIT compiler can know whether the object is embedded or extended based on the shape alone. Not against it, just wondering the reasons.
Another thing we were discussing is that when an object goes extended maybe we use an imemo/fields object and then the object becomes "rdata" layout.
There was a problem hiding this comment.
Since the size class is embedded in the shape
Right, I didn't properly explain.
So first right now this is a bit annoying because it means that when removing an ivar, you have to know if you are dealing with an "extended RObject" in which case you may have to re-embed it as to not confuse the JITs.
That is why I do want to properly materialize IS_ROBJECT_EMBED as a bit in the shape_id (#17145) and not solely rely on the heap_id + shape->depth to tell us if we're embedded or not.
The more secondary reason I would like this is that I would want all types to have the heap_id in their shape (right now only ROBJECT does) so that you could more efficiently query the object slot size. Right now rb_gc_obj_slot_size() is used a bit across array.c/string.c but it's quite slow for what is it.
But to be clear I'm not asking you do implement any of this, I'm just sharing my own goals so that we hopefully don't step on each others toes too much (that being said most of this work has been paused for a couple weeks now, and unclear why I'll have the time+energy to resume it, so priority to you).
| SHAPE_ID_LAYOUT_RCLASS = RBIMPL_SHAPE_ID_FL(3), | ||
|
|
||
| // Means this object is an RData or RTypedData and IVs are found in the | ||
| // fields_obj found on the RData/RTypedData struct | ||
| SHAPE_ID_LAYOUT_RDATA = RBIMPL_SHAPE_ID_FL(4), |
There was a problem hiding this comment.
So my plan was to move RClass.fields_obj at the same offset than RTypedData.fields_obj and RObject.as.extended (which right now is a raw buffer, but that I would like to make an IMEMO/fields #17145).
If you do this, SHAPE_ID_LAYOUT_RCLASS and SHAPE_ID_LAYOUT_RDATA can become the same (unless you need to discriminate classes for other reason? Ractors?).
This way you don't really care what the type is, it's really just at which offset you find the fields object.
There was a problem hiding this comment.
So my plan was to move
RClass.fields_objat the same offset thanRTypedData.fields_objandRObject.as.extended(which right now is a raw buffer, but that I would like to make an IMEMO/fields #17145).
🤦 yes, I should have read all comments before responding.
So my plan was to move RClass.fields_obj
Were you only going to do this for non-boxable classes?
If you do this,
SHAPE_ID_LAYOUT_RCLASSandSHAPE_ID_LAYOUT_RDATAcan become the same (unless you need to discriminate classes for other reason? Ractors?).This way you don't really care what the type is, it's really just at which offset you find the fields object.
I don't think there is any reason to discriminate. The only reason we have this layout type is to know that a class isn't boxable, and where its fields pointer is via the shape. If we can make it have the same layout as RDATA that would be great.
There was a problem hiding this comment.
Yep, boxable classes can't have that "layout", but other than that, you can share the same flag for:
- non-boxables classes
- RTypedData
- "extended" RObject
As they're all generate the exact same native code (search for IMEMO/fields reference at obj[2]).
| VALUE type = RB_BUILTIN_TYPE(obj); | ||
| size_t slot_size = rb_gc_obj_slot_size(obj); | ||
| VALUE moved = rb_newobj(GET_EC(), 0, type, 0, wb_protected_types[type], slot_size); | ||
| VALUE moved = rb_newobj(GET_EC(), 0, type, RBASIC_SHAPE_ID(obj), wb_protected_types[type], slot_size); |
There was a problem hiding this comment.
Is it OK to copy the whole shape on an otherwise empty object? I'm thinking that if GC triggers before that object is initialized it might cause some bug.
You may want to only copy the "structural bits", so LAYOUT and HEAP parts.
| // In most case we can replicate the single `fields_obj` shape | ||
| // but in namespaced case? Perhaps INVALID_SHAPE_ID? | ||
| RBASIC_SET_SHAPE_ID(obj, RBASIC_SHAPE_ID(new_fields_obj)); | ||
| RBASIC_SET_SHAPE_ID(obj, rb_shape_id_layout(RBASIC_SHAPE_ID(obj)) | RBASIC_SHAPE_ID(new_fields_obj)); |
There was a problem hiding this comment.
So on my experimental branch, I struggle quite a bit with keeping the "owner" and "field obj" shapes in sync, as it resulted in lot of brittle code like this.
What I was thinking was to introduce a split between the bits that represent the object type (slot size, location of fields, etc) and the parts that are shared with the fields obj like the offset frozen bit etc etc.
Note this isn't a blocker, if you PR is green I'm happy to refactor this in a followup, it's much easier to simplify working code.
We would like to make instance variable reads in the JIT compiler faster (as well as simplify the JIT implementation). Currently, in order to read an instance variable, we have to: 1. Test for heap object 2. Load object to a 64 bit register 3. Mask the object header 4. Bit test against the masked header 5. JNE 6. Load field We would like to: 1. Test for heap object 2. Load object shape to a 32 bit register 3. Bit test against the shape 4. JNE 5. Load field The way we fetch instance variables is not consistent across objects. In order to realize our goal, we need to encode object layout inside the shape. If we encode object layout inside the shape, then the shape itself will guarantee that the access pattern generated by the JIT compiler is correct. We should encode the following load patterns into the shape tag bits. This way we can share shapes on transitions, but be able to differentiate the access patterns for the JIT compiler. In other words, two objects can have an `@a -> @b -> @c` transition and share the same shape, but the tag bits can differentiate the access pattern so that the JIT compiler can be confident that the machine code is correct. Here are the patterns: 1. Embedded/Extended T_OBJECT Instance Variables Objects with direct references to instance variables or via malloc buffer 2. Objects with fields_objects fields These are Data and TypedData objects. They have an associated axillary imemo/fields object that stores the instance variables. The access pattern is `object[2] + 2`. The fields object is the 3rd field, and the instance variables start at +2 inside the fields object. The fields object itself is a Ruby object, so it contains the usual header bits + class headers. 3. Non Boxable Classes / Modules This is similar to Objects with fields_objects, but the fields object is stored at a different offset. We’re differentiating this from boxable classes and modules because those are harder to support. 4. Other "Other" pattern is for objects that are rare, or have difficult-to-implement access patterns. This includes: * Boxable classes and modules * Structs (for now) * Objects that use the geniv table Proposed shape bit layout: ``` Current shape_id_t is 32 bits: 31 28 27 26 25 24 23 22 19 18 0 +-----------+--+--+--+--+--+------------+----------------------------+ | unused |L1|L0|OI|FR|CX| heap index | shape tree offset | +-----------+--+--+--+--+--+------------+----------------------------+ | | | | | | | | | | | | | +-- bits 0-18: SHAPE_ID_OFFSET_MASK | | | | | +--------------- bits 19-22: SHAPE_ID_HEAP_INDEX_MASK | | | | +------------------ bit 23: SHAPE_ID_FL_COMPLEX | | | +--------------------- bit 24: SHAPE_ID_FL_FROZEN | | +------------------------ bit 25: SHAPE_ID_FL_HAS_OBJECT_ID +--+--------------------------- bits 26-27: SHAPE_ID_LAYOUT_MASK ``` The important part about these layout patterns is that they do not reflect the _type_ of object, only how the object is laid out in memory. For example, we currently treat structs as "other", but we can refactor them to have the same layout as "Objects with fields_objects", and when we do that they should get a different bit in the shape header. This commit only reserves the two bits, it doesn't use them in the JIT compiler yet. Co-Authored-By: John Hawthorn <john@hawthorn.email> Co-Authored-By: Max Bernstein <tekknolagi@gmail.com>
Co-authored-by: Nobuyoshi Nakada <nobu.nakada@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
Co-authored-by: Jean Boussier <jean.boussier@gmail.com>
This reverts commit 900711d.
3bb14ff to
7df5509
Compare
XrXr
left a comment
There was a problem hiding this comment.
Before implementing this in the JITs, you should be able to use this in the to simplify some class of cached ivar reading in the interpreter right? That's a good way to verify the shapes infra is working properly.
I don't know that this change would simplify any of the existing code. These bits teach us where to find the IV buffer and the code for locating the IV buffer is not part of the current inline cache hit code. If we refactored the IV caches to do CC fast paths like methods, then we could eliminate this branch but that seems like a much bigger change than this PR. |
|
@XrXr I made a commit that dogfoods these bits. I don't think it's any simpler at the moment, so I'd like to follow up with a PR after. |
This reverts commit ddb5055.
We would like to make instance variable reads in the JIT compiler faster (as well as simplify the JIT implementation). Currently, in order to read an instance variable, we have to:
We would like to:
The way we fetch instance variables is not consistent across objects. In order to realize our goal, we need to encode object layout inside the shape. If we encode object layout inside the shape, then the shape itself will guarantee that the access pattern generated by the JIT compiler is correct.
We should encode the following load patterns into the shape tag bits. This way we can share shapes on transitions, but be able to differentiate the access patterns for the JIT compiler. In other words, two objects can have an
@a -> @b -> @ctransition and share the same shape, but the tag bits can differentiate the access pattern so that the JIT compiler can be confident that the machine code is correct.Here are the patterns:
Objects with direct references to instance variables or via malloc buffer
These are Data and TypedData objects. They have an associated axillary imemo/fields object that stores the instance variables. The access pattern is
object[2] + 2. The fields object is the 3rd field, and the instance variables start at +2 inside the fields object. The fields object itself is a Ruby object, so it contains the usual header bits + class headers.This is similar to Objects with fields_objects, but the fields object is stored at a different offset. We’re differentiating this from boxable classes and modules because those are harder to support.
"Other" pattern is for objects that are rare, or have difficult-to-implement access patterns. This includes:
Proposed shape bit layout:
The important part about these layout patterns is that they do not reflect the type of object, only how the object is laid out in memory. For example, we currently treat structs as "other", but we can refactor them to have the same layout as "Objects with fields_objects", and when we do that they should get a different bit in the shape header.
This commit only reserves the two bits, it doesn't use them in the JIT compiler yet.