Remove the reserved member from rb_data_type_t as the addition of the compactor callback pushed it over a single cache line #2396
I noticed that since the introduction of the `GC.compact` API, struct `rb_data_type_t` spans multiple cache lines because of the new `dcompact` function pointer / callback.

I'm wondering what the `reserved` member was originally intended for, given that introducing the `dcompact` member while preserving `reserved` basically already broke binary compatibility by changing the struct size from 64 -> 72 bytes.

This struct is defined in `include/ruby.h` and is used extensively in MRI as well as in extensions, and is thus "public API". If there's an off chance that the `reserved` member isn't needed moving forward (maybe it could have been for compacting or a similar GC feature?), could we remove it and prefer aligning on cache line boundaries instead?

Packed with the `reserved` member removed, the struct fits in a single cache line.

Usage in MRI
Examples of internal APIs that use it. With the declaration style used in MRI, the typed data type declarations do not touch the tail of the `function` struct (I realize this may not be true for all extensions):
AST
Fiber
Enumerator
And related generator etc. types.
Encoding
Proc, Binding and methods
Threads
And many others, both internal and in `ext/`. Looking at the definitions in MRI at least, I don't see any that reference the `reserved` member.

Benchmarks
A focused subset of the standard bench suite, covering the typed data objects mentioned above.
Prelude:
Left side `compare-ruby` (master), right side `current` (this branch):

Further cache utilization info
Used `perf stat` on a Rails console, using the integration session helper to load the Redmine homepage 100 times (this removes network round trips and other variance, and is easier for reviewers to reproduce with fewer tools).

Master
This branch:
Conclusions:
`L1-dcache-loads`: 1057,725 M/sec -> 1064,138 M/sec (higher rate of L1 cache loads)
`L1-dcache-load-misses`: 4,69% -> 4,40% (reduced L1 cache miss rate)

Thoughts? @ko1 @tenderlove @ioquatix (seems to impact the Fiber benches most)