New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up IV writes #6767
Speed up IV writes #6767
Conversation
I wanted to measure the impact of this patch when writing non-immediate instance variables, so I made a benchmark to measure it. We get a nice speed up there, though not as good as the immediate case:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with a few comments
@tenderlove some test failures on arm, in the backend. |
Yep, @jemmaissroff and I are looking in to it. I'll re-request a review when we get it green. |
I worked with @k0kubun this week to fix upstream. I don't have permissions to rebase this PR off master. But I made #6839, a draft PR of this branch rebased off master to show that CI will be green once we rebase off the latest master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rebased off-of master and removed the double store of the shape id now that the issue with the backend is fixed.
Just one open comment to resolve wrt defining a constant for the shape id offset if possible.
This PR takes advantage of shapes in the set instance variable instruction.
It handles a few different cases:
For the time being, we still fire a write barrier in many cases. We never need to fire a write barrier when the object we reference is an immediate. If we know from the virtual stack that the object we're writing is an immediate, we will not generate WB code. If we don't know what type it is, then we do a runtime check and only fire the WB if it's a heap object.
Ideally we could fire the WB even less frequently. The generational algorithm only wants the WB so that we can know old to young references. Unfortunately, the GC also implements an incremental marking algorithm. If the GC is doing incremental marking, then it relies on the WB for executing any black / white / grey transitions.
In other words, object age is not the only factor for deciding whether or not the WB should execute, we also have to take in to account whether or not the GC is currently doing incremental marking.
There must be something clever we can do to handle the above situation, I'm just not sure about it right now.
Benchmarks look like this:
I really really need to study the
respond_to
benchmark. I don't see any instance variables in that benchmark and yet this patch makes it faster.