-
-
Notifications
You must be signed in to change notification settings - Fork 922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jruby > 9.1.13.0 crashing on bundler when certain jvm flags are enabled #5138
Comments
stacktrace with |
Can you get us a full backtrace with -Xjit.threshold=0? It should show us where this "first" call is happening and we can sort out why it's nil. |
Oh also...Bundler 1.9.10 is several years old. Perhaps it simply doesn't work on newer Rubies? Or perhaps this should not be a priority to "fix" in JRuby if a newer Bundler works fine? |
backtrace here
I understand it may not work with an old version, I just found it strange that jvm flags trigger a no method exception within a very obscure backtrace (there is no |
Yes, it could indicate a problem that should be fixed. |
Sorry, the trace you gave me probably isn't showing what we need...but I have managed to reproduce locally! |
Ok, I've figured out the line that's actually causing this. It's in the same method, further down (like line 48 or so): if spec = lookup['bundler'].first
specs << spec
end I'm not sure why the line number has not been updated for the trace; that would be a separate bug in IR, or possibly only in the full interpreter. So it gets to this point, but lookup['bundler'] is returning a nil result for whatever reason (and presumably it shouldn't since we're obviously inside bundler). |
Ok, I think I can track the line number issue like this...
That will need to be fixed as a separate issue. By moving the |
Yeah this seems like some obscure IR bug. I am having trouble locating it, because whatever logging code I add causes the error to move around. @enebo Maybe you have an idea here? It seems like a variable is not getting assigned the right value, but I can't sort out why. |
@jsvd For what it's worth, this combination of flags probably isn't very useful to you. Is there a reason you used them? |
@headius we're seeing this in the logstash plugin manager where we optimize for faster startup vs performance because it performs short tasks like install/list/remove plugins. The list of flags (which is more than those just two) was compiled from sources like https://github.com/jruby/jruby/wiki/Improving-startup-time |
@headius yeah the line number bug is probably a little juggling in IR builder. It could even be improper line number in ifnode in AST but as you said this is not that issue. next handling I guess I need to run it as well to get more insight. Nothing pops out. It is interesting that JIT works and full interp fails. The thing is in a simple loop a next gets made into just a jump instr....but why would it get triggered if it is not supposed to be? Another wonderment...does this work in startup interpreter? That is pre-CFG and so boundaries are done differently but it may be a data point. We are talking about a jump or something weird with until vs loop (newer code uses loop + break vs until). |
@enebo I'm not sure the bogus value has anything directly to do with the next or line number stuff. It seems to move around as I change the code; a value I'll confirm isn't nil from a called method will end up nil a line later in the variable. It's very strange. |
Example: in the In ...
l = lookup
p l.class
var = l['bundler']
first = var.first
if spec = first
specs << spec
end
... And in def lookup
@lookup ||= begin
lookup = Hash.new { |h,k| h[k] = [] }
specs = @specs.sort_by do |s|
s.platform.to_s == 'ruby' ? "\0" : s.platform.to_s
end
specs.reverse_each do |s|
lookup[s.name] << s
end
lookup
end
p "returning from lookup: " + @lookup.class.inspect
@lookup
end |
Oh, and the output...
|
Debug IR output for the relevant section of
|
Ok, after pairing with @enebo for a while we're working up a theory. He bisected the regression to a CFG optimization in ffa0151. We're still sorting out the details, but my current theory is that the loop eventually ends up with an empty basic block that just dumps off into another block which itself only holds a jump to the loop condition. Because the source block is empty, CFG.optimize crushes the target block into it and kills the target block...without checking that some other block might branch to it. That is the case here, because the Here's my working patch to CFG.optimize that avoids removing a target block if there's more than one place branching to it. This works, but it may be defeating the optimization @enebo intended. This is a very good find, and we'll have it fixed for 9.1.17 one way or another. Here's my patch: diff --git a/core/src/main/java/org/jruby/ir/representations/CFG.java b/core/src/main/java/org/jruby/ir/representations/CFG.java
index 7340837e62..7bb8a9a517 100644
--- a/core/src/main/java/org/jruby/ir/representations/CFG.java
+++ b/core/src/main/java/org/jruby/ir/representations/CFG.java
@@ -476,15 +476,18 @@ public class CFG {
addEdge(a, e.getDestination().getData(), e.getType());
}
- // Delete bb
- removeBB(b);
-
// Update rescue map
if (aR == null && bR != null) {
setRescuerBB(a, bR);
}
- return true;
+ // Delete bb
+ if (inDegree(b) == 1) {
+ removeBB(b);
+ return true;
+ } else {
+ return false; // target only swallowed because someone else needs it
+ }
} else {
return false;
}
diff --git a/rakelib/commands.rake b/rakelib/commands.rake
index 7b6ee3f439..03fe5fa00d 100644
--- a/rakelib/commands.rake
+++ b/rakelib/commands.rake
@@ -21,6 +21,25 @@ def jvm_model
end
end
+class JRubyLauncher
+ def initialize
+ @classpath = [
+ File.join(BUILD_LIB_DIR, 'junit.jar'),
+ File.join(BUILD_LIB_DIR, 'bsf.jar'),
+ TEST_CLASSES_DIR,
+ File.join(TEST_DIR, 'jruby', 'requireTest.jar'),
+ TEST_DIR
+ ]
+ @java_options = {}
+ end
+
+ def run(options, &code)
+ @java_options.merge(options)
+ instance_eval(&code) if code
+
+ end
+end
+
def initialize_paths
self.class.const_set(:JVM_MODEL, jvm_model)
|
…s are enabled. Better title would be CFG merges and deletes a BB other BBs still think they are pointing to. This seems reasonable to me at the moment and I want to land so if someone is enterprising they can try and find more problems. I will have people look at this.
Ok a little dinner and my code just works! @subbuss can you review this change? The base problem was that there was an optimization which would merge an empty BB that it falls through to into a second BB. The issue was that our mergeBB merges the second BB into the first BB. The second BB has incoming edges from jumps or branches but now the label of those are pointing to a non-existent BB (merge deletes it). Also CFG has outgoing edges point to this old deleted BB. A bad time for all. The actual execution was even weirder since it would pick an ipc for the jump based on where the old Label's BB had been. So we would just randomly jump into an instr without executing Solution (A -fallthrough-> B with mergeBBs(A, B) will merge B into A):
So an obvious question to all this is why don't we just merge A into B. In how we use mergeBBs today that might be a simpler solution but this code is more robust since it can handle the relocation. This means when we find other mergeable conditions it will handle fixing up the labels for us. @jsvd can you try and kick the tires A LOT with your configuration. This issue made me realize that our Full Interpreter only coverage is lacking. PS- Another mystery @headius and I wondered about for a little bit was why JIT worked. The short answer is that it calculates where to jump and does not rely on ipc offsets. It is still a little unclear how a jump to a nonexistant label works but perhaps the CFG is still maintaining enough info after the merge? |
I was trying to figure out how I botched the mergeBBs so badly. Looking at git history, I realized I named it pretty badly and didn't document it properly either. It was only being used to merge straight line basic blocks where this scenario wouldn't have happened. I should have named it mergeStraightLineBBs since it was being used in collapseStraightLineBBs. In ffa0151, you used it without realizing that mergeBBs isn't set up to handle general BB merges. Your fix is correct. It generalizes merging for non straight line BBs by fixing up incoming edges of 'b', not just outgoing edges of 'b', as well as the jumps that jump along those incoming edges. So, mergeBBs now lives up to its name. Nice sleuthing and fixing! |
@subbuss thanks for checking. Yeah, I really should have realized we did nothing with incoming edge to 'b' in that method. I even had another regression earlier and missed that fact the second time I looked at that code. |
@enebo In the JIT I use the location in a hash of bytecode labels, so I presume that the target label still exists at the right logical location in the instructions. If it didn't, I'd get to the end of the code with a jump to a label that was never again encountered, and ASM would refuse to emit the bytecode. @jsvd Note that this fix explains all the odd behavior we saw: line number was off (because it skipped over a line number instruction), adding code changed behavior (because it shifted instructions forward beyond the jump location), and of course the original issue where it appeared to be getting nil when it should not. A very good discovery...thank you for helping us reduce it! |
The following crashes on jruby 9.1.16.0 but not on 9.1.13.0:
JRuby 9.1.16.0 raises a NoMethodError exception on a nil object while 9.1.13.0 works correctly (i.e. expected
Bundler::GemNotFound
exception).This happens only when both
-X-C
and-J-Djruby.jit.threshold=0
are enabled. Dropping either one will trigger the normal GemNotFound exception.Also, a newer bundler (like 1.16.1) doesn't trigger the NoMethodError, just the old 1.9.10.
With jruby-9.1.13.0:
But with jruby-9.1.16.0:
This is the minimal reproduction scenario I could find, it is surfacing in Logstash in a more complex scenario: https://logstash-ci.elastic.co/job/elastic+logstash+pull-request+multijob-integration-2/294/console
The text was updated successfully, but these errors were encountered: