create ifuncs for compiled blocks once at module load time #4905
Conversation
We have a policy of testing changes to Sorbet against Stripe's codebase before Stripe employees can see the build results here: https://go/builds/bui_KeKHb3IPQ6znT1
I think this all seems reasonable! Just one cosmetic suggestion about something that would have made it easier to understand what was going on as I was reviewing.
If you're able, it would be neat to see some benchmarks before and after this change.
compiler/IREmitter/Payload.cc
Outdated
```diff
@@ -1165,4 +1161,39 @@ llvm::Value *Payload::getCFPForBlock(CompilerState &cs, llvm::IRBuilderBase &bui
 llvm::Value *Payload::buildLocalsOffset(CompilerState &cs) {
     return llvm::ConstantInt::get(cs, llvm::APInt(64, 0, true));
 }
+
+llvm::Value *Payload::buildBlockIfunc(CompilerState &cs, llvm::IRBuilderBase &builder, const IREmitterContext &irctx,
```
I might suggest renaming to `getOrBuildBlockIfunc`. Working backwards from the call sites I found myself a bit confused that we were "building" an ifunc at certain places, but now I see that this actually uses `getOrInsertGlobal`.
We have a policy of testing changes to Sorbet against Stripe's codebase before Stripe employees can see the build results here: https://go/builds/bui_KeN1EjP7xr7Cpa
It's not as effective as I hoped: it does indeed reduce the allocations to interpreter levels on the benchmark that motivated this, but the speedup is in the 1–5% range. Still probably worth it to avoid increasing memory pressure in compiled code?
Motivation
When a Ruby function `rubyfunc` receives a block implemented in C, a `struct vm_ifunc` is created for that block...on every call to `rubyfunc`. The compiler, of course, does this all the time, and it's suboptimal when compared to the interpreter: the interpreter does not require an extra allocation for passing an (interpreted) block to functions.

This PR intends to change that state of affairs by allocating the `struct vm_ifunc` up front at module load time. We can then re-use the structure on each call to `rubyfunc`, saving an allocation at runtime. I expect this to be beneficial for block-heavy benchmarks, but haven't yet gotten the chance to benchmark this.

Test plan
See included automated tests.