Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Running the Optcarrot benchmark iteratively in the same VM results in a core dump #919

Closed
Ichaelus opened this issue Jan 3, 2018 · 16 comments
Assignees

Comments

@Ichaelus
Copy link

Ichaelus commented Jan 3, 2018

Outline

  • Ubuntu Mate 16.04
  • Intel i7 870 - 4x 2.93GHz - 3.60 GHz
  • 16GB RAM

Reproduction

Error description

After ~ the 7th run, the JVM tries to dump its core. Exact message:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fea720e3ea6, pid=20200, tid=0x00007fea8b568700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_151-b12) (build 1.8.0_151-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.71-b01-internal-jvmci-0.38 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 15644 JVMCI org.graalvm.compiler.truffle.OptimizedCallTarget.callRoot([Ljava/lang/Object;)Ljava/lang/Object; (53 bytes) @ 0x00007fea720e3ea6 [0x00007fea720e0c20+0x3286] (#run /home/michael/Projects/RubyCompare/RubySuite/optcarrot/lib/optcarrot/nes.rb:74)
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/michael/Projects/RubyCompare/RubySuite/optcarrot/hs_err_pid20200.log
Loaded disassembler from /home/michael/.rbenv/graalvm-0.30.2/jre/lib/amd64/hsdis-amd64.so
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Aborted (core dumped)

The referenced log file can be found here. Version 0.29 does not seem to be affected of this issue.

@eregon
Copy link
Member

eregon commented Jan 4, 2018

  • Run ruby -S truffle-benchmark.rb with (an rbenv version of) TruffleRuby 0.30.2.

The -S should not be needed here (but probably doesn't matter much).

Unfortunately I cannot reproduce this on Fedora 24.
Does it reproduce every time at you?

All 50 runs work successfully and it ends with:

truffle-benchmark.rb:5:in `method_missing': undefined method `start' for 50:Fixnum (NoMethodError)

Which is just that extra .start call.

I am running with the full path to bin/ruby: $ ../graalvm-0.30.2/bin/ruby -S truffle-benchmark.rb.

@eregon eregon self-assigned this Jan 4, 2018
@Ichaelus
Copy link
Author

Ichaelus commented Jan 4, 2018

I see, the start call is just a fragment of the original extracted code. On my machine, the behavior has shown repeatable for ~20 times. Experiments with/without -S flag or full binpath are showing no effects.

@Ichaelus
Copy link
Author

Ichaelus commented Jan 4, 2018

Update: My Notebook does not run in any problems either (though using TR 0.30.1).. strange. Its runs the same Ubuntu with a slightly different Kernel and CPU. Don't know how to solve this on the desktop, though

@gilles-duboscq
Copy link
Member

I tried reproducing on a laptop and a server but couldn't on either.

Could you try with:

ruby --jvm.Dgraal.Dump=:2 --jvm.Dgraal.MethodFilter='*nes_rb:74*' -S truffle-benchmark.rb

It should dump some compiler graphs. You will see a message somewhere is the stdout. e.g.,:

...
Running for the 6th time
Dumping debug output in /home/gmdubosc/optcarrot/dumps/1515082214696
fps: 144.896338612011
...

If it doesn't reproduce with that command line, try removing the :2 after Dump=.

If you have a run that dumps graphs and reproduces, could you provide the hs_err file as well as the contents of the dump directory?

@Ichaelus
Copy link
Author

Ichaelus commented Jan 4, 2018

Thanks for investigating, @gilles-duboscq!

This is the cli-output when running the command you provided. Excluding :2 had no influence on the result.

...
Running for the 5th time
fps: 111.191726488053
checksum: 59662
Running for the 6th time
Dumping debug output in /home/michael/Projects/RubyCompare/Benches/optcarrot/dumps/1515083065092
fps: 51.849690467061
checksum: 59662
Running for the 7th time
fps: 123.352515793125
checksum: 59662
Running for the 8th time
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ff0ca407066, pid=5138, tid=0x00007ff0e3ca6700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_151-b12) (build 1.8.0_151-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.71-b01-internal-jvmci-0.38 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 15669 JVMCI org.graalvm.compiler.truffle.OptimizedCallTarget.callRoot([Ljava/lang/Object;)Ljava/lang/Object; (53 bytes) @ 0x00007ff0ca407066 [0x00007ff0ca403de0+0x3286] (#run /home/michael/Projects/RubyCompare/Benches/optcarrot/lib/optcarrot/nes.rb:74)
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/michael/Projects/RubyCompare/Benches/optcarrot/hs_err_pid5138.log
Loaded disassembler from /home/michael/.rbenv/graalvm-0.30.2/jre/lib/amd64/hsdis-amd64.so
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Aborted (core dumped)

The files are attached to my comment:
1515083065092_dump.tar.gz
hs_err_pid5138.log

@gilles-duboscq
Copy link
Member

Thank you @Ichaelus ! The dump helped narrow down the part of the compiler that's doing the wrong transformation. I'm still looking into it.

@gilles-duboscq
Copy link
Member

Hi @Ichaelus, I have a potential fix which i could test against a unit test which reproduces something similar to your issue but it would help if you could confirm that it really fixes your issue.

To do so, could you replace graalvm-0.30.2/jre/lib/jvmci/graal.jar with the version from this archive:
graal.zip and run your benchmark?

@Ichaelus
Copy link
Author

Ichaelus commented Jan 5, 2018

I'm totally amazed! Can't stop grinning, since I tried to circumvent this error for a while now ;-)
Long story short - it works like a charm for me, thanks!

@gilles-duboscq
Copy link
Member

Thanks a lot for your help with diagnosing and testing this!

I'll start the review process and hopefully the fix should be in the next GraalVM version.

@eregon, i'm not sure what we should do with this issue here. Should it be closed since it's a Graal compiler issue and not a truffleruby issue? Or do you prefer to wait until there is a Graal version with the fix?

@eregon
Copy link
Member

eregon commented Jan 5, 2018

@gilles-duboscq and @davleopo Thanks a lot for the help!

I think it's best to wait until the fix is in Graal and then we can link it here, so we know the revision of the fix.

@gilles-duboscq
Copy link
Member

Should be fixed in oracle/graal@6cfa502

@Ichaelus
Copy link
Author

Ichaelus commented Jan 9, 2018

Does this fix include additional code as the graal.jar you posted earlier on?
Using the JAR, a repeated call of

Optcarrot::NES.new(["--benchmark", "examples/Lan_Master.nes"]).run

keeps delivering results, but the VM slows down enormously at ~ 220 calls.

optcarrot_runtime_over_total_time

(x-axis: Seconds, y-axis: FPS)

I know this is low-priority bug, but I felt like you could want to know about it.

Michael

@gilles-duboscq
Copy link
Member

The JAR i provided earlier was built by applying the same fix locally on the 0.30.2 branch.
Do you know if this performance issue after ~220 calls also happens on earlier GraalVM versions?
I'm guessing this is an other problem we need to look into.

@Ichaelus
Copy link
Author

Ichaelus commented Jan 9, 2018

Ok. I'll try generate a 0.28 vs. 0.30.2 report in the next couple of days.

@eregon
Copy link
Member

eregon commented Jan 9, 2018

@Ichaelus Could you open a separate bug for the slowdown?
BTW, I think this is a fairly uncommon way to run OptCarrot, notably it runs the NES startup sequence every time. But nevertheless this slowdown is worth investigating.

@Ichaelus
Copy link
Author

Ok we're at #959 now.
Yeah I'm aware of that, but I found it an interesting alternative :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants