-
-
Notifications
You must be signed in to change notification settings - Fork 922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prophile #5384
Conversation
opaque as this is the only consumer which does not extend CallBase for compileCallCommon. EQQInstr is either an actual call or we fast path but from IR analysis perspective it is reasonable for it to extend a call since it can perform one. The rational for this change is that the JIT breaks all the elements out of eqqinstr and callbase to compile calls. I need hasliteralclosure where the JIT lazily constructs callsites so by making all things which invoke a callbase then we can pass that down there without having to change n potential signatures within the JIT.
…Slowly pushing callbase as opaque unit so I can setup callsite based on extra info not passed into depths of JIT without having to add new parameters in several signatures. Technically, I am doing that with these changes but it will be more insulation from future additions to callbase.
…allers IR scope and can keep track of IR destined methods.
…tains compiled method and the scope it will be accessing. With both ends we should now be able to try rewriting the mixed method with an inlined version.
…ll run inliner and not crash. The code generated is not executed at all yet and will definitely need some debugging once we do try and execute (from past experience -- some snippets would work and some wouldn't for the arities that were attempted in the old inline branch).
… lucky or the new optimized method is not really replacing the previous JITd version.
…params. Changed some names to be where my head is at in inliner. Dead line of code in JVMVisitor.
…reter mode. Added it. IRScope is pushed through all call constructors since callsite needs it. This is not as bad in retrospect as I thought since we have already started doing it for other instruction constructors. In my test script I am working now both full and JIT will crash on same exact classcastexception. Now at a point to debug that problem. At this point -X-C and JIT (only MixedIRMethod) will inline methods. There are no heuristics in place (although by nature of where I put the inline we will inline after 10 mono calls) and this is conservative inline vs the deopt version.
…der. Makes seeing possible next BB much simpler than lookup up above.
…t compiled form are considered the same. This is definitely important from full to JIT.
…. This maybe still it needed if we have single arg where that arg happens to be an array? Thus far I am opting for less fields.
…e time of inlining for their receiver.
…e out proper scope in searches for things like constants without us having to maintain a depth. Another path this could have taken would be to create a Scope operand which would just pre-calculate and store the proper StaticScope instance but that would add a lot of memory to IR. Scope as an operand will come into being but it will only be used for inlining or moving instructions out of their current scope (not to be done in this commit).
inline operation they will be able to record their actual scope (as it will not be current in their new location).
…cope. Make the temporary variables. Note: Later for deopt this substition is where we record temp var values so we can deopt.
… removed via CFG optimization which assumed next BB would be the next regular or exit BB. Add final postprocessing to re-add a jump if they are missing. Adding these jumps may not be 100% needed vs rearranging linearized CFG but in the end it should not matter since the JVM will rearrange all this and optimize it anyways. We could care about non-optimal layout for full interp but it is probably a pretty small penalty.
…losures being cloned during inline can belong to one of the two method scopes being considered. host scope closures just stay self and inlined method scopes need to get their new receiver variable.
…mingly, we should be able to do something less arduous than two copy instrs but in the case I fixed we were dealing with a block pass argument. We do not seem to actually use the block pass itself explicitly in our instr set so when we cloned we would assume we can pick up the closure implicitly. For literal closures we no doubt have some extra copies we don't need now but JVM will eliminate them. At some later point we need to reexamine this. I think for block pass we explicitly reify or consumer from that variable directly.
…h and try and setup up something incomplete.
Ultimately, this should convert to full build and then perform inline (one benefit of design to keep logical callsites share the same callsiteid).
… is not too important to get working right now.
…pes. This was a broad stroke for now. The primary issue is that if those scopes capture lvars of the method to be inlined then those lvars are gone...so how can a closure still capture it? I say broad stroke here because there are many nested closures which do not capture variables. This is just a little scope reduction for now.
- Add ir.inliner.threshold property (defaults to 20) - Add logging of profiling examination in profilingcachingcallsite - Hook up scopes decision to be host to an inlining via inliningAllowed(). - Remove type change logic as it is not needed yet - cleanup naming
…ining makes simple Integer#times 60% faster. Wowie Zowie. This PoC blows my mind at how little effort it took to work and that it works. We have actually wanted a path towards technique like this for a long time. This might not be our end goal but it surprising how easily it fit into the inliner. So when profiled callsite discovers a particular receiver (Fixnum) and a particular method (times). It will request a new ruby implementation of said method and kick out of the inliner. The next time the callsite decides whether to inline we will have Ruby host method and a Ruby target method so our inliner can inline! This does have a small issue that it invalidates all our fast fixnum ops (which all things considered is not a massive hit for non-indy targets), but other than that it has a good improvement in perf: ```ruby def foo s = 0 10_000_000.times do |i| s += i end s end total = 0 300.times do |i| t = Time.now total += foo p Time.now - t end p total ``` jruby -Xir.inliner.threshold=5 -Xir.inliner.verbose -Xir.inliner=true -Xjit.threshold=0 -r ../snippets/block_inline13 0.539815 0.420346 0.408688 0.41530599999999995 0.41341999999999995 uri:classloader:/jruby/ruby_implementations/Fixnum/times.rb:1: warning: constant ::Fixnum is deprecated 0.389792 0.529822 0.5269240000000001 0.498106 0.50862 0.5806739999999999 0.489052 2018-10-24T15:00:47.100-05:00 [main] INFO ProfilingCachingCallSite : PROFILE: INSTANCE_METHOD foo[/home/enebo/work/snippets/block_inline13.rb:0] -> Integer#times - 6 ---------------------------------- PROLOGUE (start) -------- Looking for: 3581: > %v_3 = call(Fixnum:10000000, %self:CLOSURE foo_CLOSURE_1[/home/enebo/work/snippets/block_inline13.rb:2] ;n:times, t:NO, cl:true) host of inline cfg: BB [1:LBL_0:-1]:>[2,3] BB [2:LBL_1:-1]:>[3,4], <[1] BB [3:LBL_2:-1]:>[4], <[1,2,4] BB [4:_GLOBAL_ENSURE_BLOCK__0:-1]:>[3], <[2,3] host of inline instrs: BB [1:LBL_0:-1] -EXIT->3 -FALL_THROUGH->2 push_method_frame() push_method_binding() BB [2:LBL_1:-1] -EXIT->3 -EXCEPTION->4 check_arity(;req: 0, opt: 0, *r: false, kw: false) line_num(;n: 1) s(0:0) = copy(Fixnum:0) line_num(;n: 2) %v_3 = call(Fixnum:10000000, %self:CLOSURE foo_CLOSURE_1[/home/enebo/work/snippets/block_inline13.rb:2] ;n:times, t:NO, cl:true) line_num(;n: 5) %v_5 = copy(s(0:0)) pop_binding() pop_method_frame() return(%v_5) BB [3:LBL_2:-1] -EXCEPTION->4 BB [4:_GLOBAL_ENSURE_BLOCK__0:-1] -EXIT->3 %v_4 = recv_jruby_exc() pop_binding() pop_method_frame() throw(%v_4) ------ Rescue block map ------ BB 2 --> BB 4 BB 3 --> BB 4 method to inline cfg: BB [1:LBL_1:-1]:>[2,8] BB [2:LBL_2:-1]:>[3], <[1] BB [3:_LOOP_BEGIN_0:9]:>[4,6], <[2,4] BB [4:_ITER_BEGIN_0:12]:>[3], <[3] BB [6:LBL_0:20]:>[8], <[3] BB [8:LBL_3:-1]:<[1,6] method to inline instrs: BB [1:LBL_1:-1] -EXIT->8 -FALL_THROUGH->2 BB [2:LBL_2:-1] -FALL_THROUGH->3 %v_0 = load_implicit_closure() check_arity(;req: 0, opt: 0, *r: false, kw: false) line_num(;n: 2) %t_i_6 = copy(Fixnum:0) line_num(;n: 3) BB [3:_LOOP_BEGIN_0:9] -FALL_THROUGH->4 -REGULAR->6 %v_4 = call_1o(%t_i_6, %self ;n:<, t:NO, cl:false) b_false(LBL_0:20, %v_4) BB [4:_ITER_BEGIN_0:12] -REGULAR->3 thread_poll() line_num(;n: 4) %v_5 = yield(%v_0, %t_i_6 ;unwrap: false) line_num(;n: 5) %t_i_6 = call_1f(%t_i_6, Fixnum:1 ;n:+, t:NO, cl:false) jump(_LOOP_BEGIN_0:9) BB [6:LBL_0:20] -EXIT->8 return(nil) BB [8:LBL_3:-1] ------ Rescue block map ------ ---------------------------------- PROLOGUE (end) ----------- ---------------------------------- SPLIT BB (start) -------- Before:LBL_1:-1 BB [2:LBL_1:-1] check_arity(;req: 0, opt: 0, *r: false, kw: false) line_num(;n: 1) s(0:0) = copy(Fixnum:0) line_num(;n: 2) After:LBL_3:-1 BB [5:LBL_3:-1] -EXIT->3 -EXCEPTION->4 line_num(;n: 5) %v_5 = copy(s(0:0)) pop_binding() pop_method_frame() return(%v_5) cfg: BB [1:LBL_0:-1]:>[2,3] BB [2:LBL_1:-1]:<[1] BB [3:LBL_2:-1]:>[4], <[1,4,5] BB [4:_GLOBAL_ENSURE_BLOCK__0:-1]:>[3], <[3,5] BB [5:LBL_3:-1]:>[3,4] instrs: BB [1:LBL_0:-1] -EXIT->3 -FALL_THROUGH->2 push_method_frame() push_method_binding() BB [2:LBL_1:-1] check_arity(;req: 0, opt: 0, *r: false, kw: false) line_num(;n: 1) s(0:0) = copy(Fixnum:0) line_num(;n: 2) BB [3:LBL_2:-1] -EXCEPTION->4 BB [4:_GLOBAL_ENSURE_BLOCK__0:-1] -EXIT->3 %v_4 = recv_jruby_exc() pop_binding() pop_method_frame() throw(%v_4) BB [5:LBL_3:-1] -EXIT->3 -EXCEPTION->4 line_num(;n: 5) %v_5 = copy(s(0:0)) pop_binding() pop_method_frame() return(%v_5) ------ Rescue block map ------ BB 2 --> BB 4 BB 3 --> BB 4 ---------------------------------- SPLIT BB (end) ----------- ---------------------------------- SPLIT BB (start) -------- Before:LBL_7:13 BB [8:LBL_7:13] thread_poll() line_num(;n: 4) After:LBL_9:-1 BB [11:LBL_9:-1] -REGULAR->7 line_num(;n: 5) %v_8 = call_1f(%v_8, Fixnum:1 ;n:+, t:NO, cl:false) jump(LBL_5:11) cfg: BB [1:LBL_0:0]:>[2,3] BB [2:LBL_1:2]:>[6,10], <[1] BB [3:LBL_2:30]:>[4], <[1,4,5] BB [4:_GLOBAL_ENSURE_BLOCK__0:25]:>[3], <[3,5] BB [5:LBL_3:20]:>[3,4], <[9,10] BB [6:LBL_4:6]:>[7], <[2] BB [7:LBL_5:11]:>[8,9], <[6,11] BB [8:LBL_7:13]:<[7] BB [9:LBL_6:19]:>[5], <[7] BB [10:LBL_8:-1]:>[5], <[2] BB [11:LBL_9:-1]:>[7] instrs: BB [1:LBL_0:0] -EXIT->3 -FALL_THROUGH->2 push_method_frame() push_method_binding() BB [2:LBL_1:2] -FALL_THROUGH->6 -REGULAR->10 check_arity(;req: 0, opt: 0, *r: false, kw: false) line_num(;n: 1) s(0:0) = copy(Fixnum:0) line_num(;n: 2) module_guard(Fixnum:10000000, LBL_8:-1 ;name: Object, expected_version: 22361) BB [3:LBL_2:30] -EXCEPTION->4 BB [4:_GLOBAL_ENSURE_BLOCK__0:25] -EXIT->3 %v_4 = recv_jruby_exc() pop_binding() pop_method_frame() throw(%v_4) BB [5:LBL_3:20] -EXIT->3 -EXCEPTION->4 line_num(;n: 5) %v_5 = copy(s(0:0)) pop_binding() pop_method_frame() return(%v_5) BB [6:LBL_4:6] -FALL_THROUGH->7 %v_6 = copy(Fixnum:10000000) %v_7 = copy(%self:CLOSURE foo_CLOSURE_1[/home/enebo/work/snippets/block_inline13.rb:2]) line_num(;n: 2) %v_8 = copy(Fixnum:0) line_num(;n: 3) BB [7:LBL_5:11] -FALL_THROUGH->8 -REGULAR->9 %v_9 = call_1o(%v_8, %v_6 ;n:<, t:NO, cl:false) b_false(LBL_6:19, %v_9) BB [8:LBL_7:13] thread_poll() line_num(;n: 4) BB [9:LBL_6:19] -EXIT->5 %v_3 = copy(nil) BB [10:LBL_8:-1] -REGULAR->5 %v_3 = call(Fixnum:10000000, %self:CLOSURE foo_CLOSURE_1[/home/enebo/work/snippets/block_inline13.rb:2] ;n:times, t:NO, cl:true) jump(LBL_3:20) BB [11:LBL_9:-1] -REGULAR->7 line_num(;n: 5) %v_8 = call_1f(%v_8, Fixnum:1 ;n:+, t:NO, cl:false) jump(LBL_5:11) ------ Rescue block map ------ BB 2 --> BB 4 BB 3 --> BB 4 BB 5 --> BB 4 BB 6 --> BB 4 BB 7 --> BB 4 BB 8 --> BB 4 BB 9 --> BB 4 ---------------------------------- SPLIT BB (end) ----------- ---------------------------------- EPILOGUE (start) -------- cfg: BB [1:LBL_0:0]:>[2,3] BB [2:LBL_1:2]:>[6,10], <[1] BB [3:LBL_2:30]:>[4], <[1,4,5] BB [4:_GLOBAL_ENSURE_BLOCK__0:25]:>[3], <[3,5] BB [5:LBL_3:20]:>[3,4], <[9,10] BB [6:LBL_4:6]:>[7], <[2] BB [7:LBL_5:11]:>[8,9], <[6,11] BB [8:LBL_7:13]:>[12], <[7] BB [9:LBL_6:19]:>[5], <[7] BB [10:LBL_8:-1]:>[5], <[2] BB [11:LBL_9:-1]:>[7], <[12] BB [12:LBL_10:-1]:>[11], <[8] instrs: BB [1:LBL_0:0] -EXIT->3 -FALL_THROUGH->2 push_method_frame() push_method_binding() BB [2:LBL_1:2] -FALL_THROUGH->6 -REGULAR->10 check_arity(;req: 0, opt: 0, *r: false, kw: false) line_num(;n: 1) s(0:0) = copy(Fixnum:0) line_num(;n: 2) module_guard(Fixnum:10000000, LBL_8:-1 ;name: Object, expected_version: 22361) BB [3:LBL_2:30] -EXCEPTION->4 BB [4:_GLOBAL_ENSURE_BLOCK__0:25] -EXIT->3 %v_4 = recv_jruby_exc() pop_binding() pop_method_frame() throw(%v_4) BB [5:LBL_3:20] -EXIT->3 -EXCEPTION->4 line_num(;n: 5) %v_5 = copy(s(0:0)) pop_binding() pop_method_frame() return(%v_5) BB [6:LBL_4:6] -FALL_THROUGH->7 %v_6 = copy(Fixnum:10000000) %v_7 = copy(%self:CLOSURE foo_CLOSURE_1[/home/enebo/work/snippets/block_inline13.rb:2]) line_num(;n: 2) %v_8 = copy(Fixnum:0) line_num(;n: 3) BB [7:LBL_5:11] -FALL_THROUGH->8 -REGULAR->9 %v_9 = call_1o(%v_8, %v_6 ;n:<, t:NO, cl:false) b_false(LBL_6:19, %v_9) BB [8:LBL_7:13] -FALL_THROUGH->12 thread_poll() line_num(;n: 4) [DEAD]nop() BB [9:LBL_6:19] -EXIT->5 %v_3 = copy(nil) jump(LBL_3:20) BB [10:LBL_8:-1] -REGULAR->5 %v_3 = call(Fixnum:10000000, %self:CLOSURE foo_CLOSURE_1[/home/enebo/work/snippets/block_inline13.rb:2] ;n:times, t:NO, cl:true) jump(LBL_3:20) BB [11:LBL_9:-1] -REGULAR->7 line_num(;n: 5) %v_8 = call_1f(%v_8, Fixnum:1 ;n:+, t:NO, cl:false) jump(LBL_5:11) BB [12:LBL_10:-1] -EXIT->11 %v_11 = copy(%v_8) line_num(;n: 3) s(0:0) = call_1o(s(0:0), %v_11 ;n:+, t:NO, cl:false) %v_12 = copy(s(0:0)) [DEAD]nop() [DEAD]nop() [DEAD]nop() %v_10 = copy(%v_12) jump(LBL_9:-1) ------ Rescue block map ------ BB 2 --> BB 4 BB 3 --> BB 4 BB 5 --> BB 4 BB 6 --> BB 4 BB 7 --> BB 4 BB 8 --> BB 4 BB 9 --> BB 4 BB 11 --> BB 4 BB 12 --> BB 4 ---------------------------------- EPILOGUE (end) ----------- 2018-10-24T15:00:47.115-05:00 [main] INFO IRScope : Inline of INSTANCE_METHOD times[uri:classloader:/jruby/ruby_implementations/Fixnum/times.rb:1] into INSTANCE_METHOD foo[/home/enebo/work/snippets/block_inline13.rb:0] succeeded. 0.54959 0.256487 0.257141 0.265739 0.248881 0.256676 0.24279599999999998 0.24123999999999998 0.260453 0.246416 0.24501699999999998 0.26812400000000003 0.25128 0.245881 0.247723 0.247009 0.248825 0.241915 0.243483 0.244178 0.252438 0.240706 0.243314 ``` With fastops forced on this drops to about 0.21s. Extra Note: Current inliner works with full interpreter and JIT but does not work with invokedynamic currently. Caveat: This is a PoC. I think we need a much faster comparison mechanism gainst all builtin methods we can replace with ruby implementations (possibly use DynamicMethods as keys with location of ruby implementation as value). A second probable direction is to make an internal loader and a special method for replacement without actually causing any invalidation. This would not cause the math fastops invalidation mentioned above.
…cope and result in wrong order...but why does it persist now?
difference in this version is that it will not invalidate the class in which it is substituting the method. So in the case of Integer#times we still reaplce the native call with an IR version and still guard against integer being re-opened but since we never register the ruby version of times we never invalidate fast math ops.
… IRScope. (also getLineNumber). Base problem was that IRScriptScope did not override getFile but getFileName and this created an endless recursive call.
@@ -1852,6 +1852,9 @@ public boolean shouldPrecompileAll() { | |||
public static boolean IR_COMPILER_DEBUG = Options.IR_COMPILER_DEBUG.load(); | |||
public static boolean IR_WRITING = Options.IR_WRITING.load(); | |||
public static boolean IR_READING = Options.IR_READING.load(); | |||
public static boolean IR_INLINER = Options.IR_INLINER.load(); | |||
public static int IR_INLINER_THRESHOLD = Options.IR_INLINER_THRESHOLD.load(); | |||
public static boolean IR_INLINER_VERBOSE = Options.IR_INLINER_VERBOSE.load(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just move these closer to where you use them...I think we want to get away from the pattern of static fields on RubyInstanceConfig for every little thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@headius ok I will change this quick...although not entirely sure where I will put them maybe in IRManager
RubyModule type = self.getMetaClass(); | ||
String fileName = "classpath:/jruby/ruby_implementations/" + type + "/" + method + ".rb"; | ||
FileResource file = JRubyFile.createResourceAsFile(context.runtime, fileName); | ||
InputStream stream = file.openInputStream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to close this...ideally try-with-resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@headius will do post merge. This can only happen via Integer#times and still needs cleanup. but I will fix that fairly soon.
|
||
return newMethod; | ||
} catch (IOException e) { | ||
e.printStackTrace(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, there's already a regarding comment from @headius
|
||
public YieldInstr(Variable result, Operand block, Operand arg, boolean unwrapArray) { | ||
super(Operation.YIELD, result, block, arg == null ? UndefinedValue.UNDEFINED : arg); | ||
|
||
assert result != null: "YieldInstr result is null"; | ||
|
||
this.unwrapArray = unwrapArray; | ||
this.callSiteId = CallBase.callSiteCounter++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't matter if some end up with the same id (under concurrent execution), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For today, this is not an issue because the conflict needs to be within the same scope we are deciding to inline on. So it is more a tuple of {IRScope, callsiteId} as identity of the callsite we are interested in. We cannot parallel build the same scope and we also have a limitation on multiple inlinings happening. When we do support multiple inlinings we should revisit this (this scenario I suspect would be rare as well as both scopes would need to inline together then inline again -- 3 inlines where 2 methods were built at same moment). I did not add this originally so this is largely just a continuation of what we do in CallBase. You do make me realize when we get to recursive inlining we need to be aware we will have potentially n identical calls. Even with atomic updates we need to cope with uniqueness.
*/ | ||
public Site siteOf(long callsiteId) { | ||
for (Instr instr: instrs) { | ||
if (instr instanceof Site && ((Site) instr).getCallSiteId() == callsiteId) return (Site) instr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, seems (above) might matter - better of using an AtomicLong instead of long++
@@ -15,12 +16,14 @@ | |||
import java.util.List; | |||
|
|||
public class CFGInliner { | |||
private static final boolean debug = false; | |||
private static final boolean debug = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's flip this back to false
before 🚢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I still am happy to see this output but I should use Log and hide behind the pref. Just forgot to change this over. I will do that before merging.
@@ -0,0 +1,9 @@ | |||
class Integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe at some point the ruby_implementations name should make it clear that its only used when inlining?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we can change this name at any point. I am not sure if this will eventually only be for inlining though. Another thought about the ruby implementations is they can be specialized implementations if we know exactly how they will be used. In that case, this could be a much more general location?
This is beginnings of a callsite-based profiler. It is enabled with --ir.profiler. You can try to enable it but it stlll is a work in progress. The main changes to notice:
Additional high level design goals. Be as incremental as possible. Original profiling work has a global profiler but this is just simple counter on special callsite type. We can later have callsites coordinate with a global profiler. Only normal calls are supported but in same spirit we can easily add FCALLS. Not all method targets are allowed. Some of these will be able to be incrementally supported later....
An experiment on the experiment is that native implementations can be replaced by pure-ruby versions so we can still inline in common core APIs which accept blocks.