New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prime? very slow #3649

Closed
giuan opened this Issue Feb 6, 2016 · 11 comments

Comments

Projects
None yet
3 participants
@giuan

giuan commented Feb 6, 2016

The method prime? is 20x slower then MRI.
But a method implemented by myself give the same time as MRI prime? in MRI and jruby.
I think the problem is in the library prime.
Try

require 'prime'
require 'benchmark'
n = 1090109110921093
Benchmark.bm do |r|
  4.times do
    r.report {puts n.prime?}
  end
end

Tiziano Merzi

P.S.
jruby 9.0.5.0 on MAC

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 6, 2016

Member

Wow. Ok, I'll look into this. Definitely something wrong.

Member

headius commented Feb 6, 2016

Wow. Ok, I'll look into this. Definitely something wrong.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 6, 2016

Member

It looks like this is never getting out of the (simple) interpreter:

     Compiled + native   Method                        
 28.9%  1649  +     8    org.jruby.ir.interpreter.StartupInterpreterEngine.interpret
 25.5%  1460  +     0    org.jruby.ir.interpreter.StartupInterpreterEngine.processOtherOp
 10.8%   620  +     0    org.jruby.ir.IRScope.getFileName
  6.8%   390  +     0    org.jruby.runtime.BlockBody.yield
  6.1%   351  +     0    org.jruby.runtime.callsite.CachingCallSite.call
  6.1%   348  +     0    org.jruby.ir.interpreter.InterpreterEngine.processCall
  4.4%   251  +     0    org.jruby.ir.instructions.BFalseInstr.interpretAndGetNewIPC
  3.9%   224  +     0    java.lang.invoke.LambdaForm$DMH.invokeStatic_L6_L
  1.4%    77  +     1    org.jruby.ir.operands.LocalVariable.retrieve
  1.2%    67  +     1    org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath
  0.6%    36  +     0    org.jruby.RubyKernel.loop
Member

headius commented Feb 6, 2016

It looks like this is never getting out of the (simple) interpreter:

     Compiled + native   Method                        
 28.9%  1649  +     8    org.jruby.ir.interpreter.StartupInterpreterEngine.interpret
 25.5%  1460  +     0    org.jruby.ir.interpreter.StartupInterpreterEngine.processOtherOp
 10.8%   620  +     0    org.jruby.ir.IRScope.getFileName
  6.8%   390  +     0    org.jruby.runtime.BlockBody.yield
  6.1%   351  +     0    org.jruby.runtime.callsite.CachingCallSite.call
  6.1%   348  +     0    org.jruby.ir.interpreter.InterpreterEngine.processCall
  4.4%   251  +     0    org.jruby.ir.instructions.BFalseInstr.interpretAndGetNewIPC
  3.9%   224  +     0    java.lang.invoke.LambdaForm$DMH.invokeStatic_L6_L
  1.4%    77  +     1    org.jruby.ir.operands.LocalVariable.retrieve
  1.2%    67  +     1    org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath
  0.6%    36  +     0    org.jruby.RubyKernel.loop
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 6, 2016

Member

It appears that at least part of this bad benchmark result is due to the fact that most of the hot call path is not compiling before the benchmark completes. If I force JIT to happen sooner, the numbers improve (but still don't beat MRI).

It's not clear to me yet which methods are getting called.

Member

headius commented Feb 6, 2016

It appears that at least part of this bad benchmark result is due to the fact that most of the hot call path is not compiling before the benchmark completes. If I force JIT to happen sooner, the numbers improve (but still don't beat MRI).

It's not clear to me yet which methods are getting called.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 6, 2016

Member

Ok, it looks like Ruby 2.3 improved the performance of prime.rb. JRuby 9.0.5.0 roughly matches MRI 2.2 performance, and JRuby with 2.3 stdlib roughly matches MRI 2.3 performance:

Ruby 2.2 stdlib:

[] ~/projects/jruby $ jruby -Xjit.threshold=0 -Xcompile.invokedynamic bench_prime.rb 
       user     system      total        real
 true
  3.170000   0.110000   3.280000 (  2.220189)
 true
  2.520000   0.030000   2.550000 (  2.033816)
 true
  2.100000   0.010000   2.110000 (  1.926965)
 true
  1.930000   0.010000   1.940000 (  1.879152)

[] ~/projects/jruby $ ruby22 bench_prime.rb 
       user     system      total        real
 true
  3.530000   0.010000   3.540000 (  3.533079)
 true
  3.530000   0.000000   3.530000 (  3.530615)
 true
  3.590000   0.010000   3.600000 (  3.596020)
 true
  3.430000   0.000000   3.430000 (  3.436481)

Ruby 2.3 stdlib

[] ~/projects/jruby $ jruby -I ../ruby/lib -Xjit.threshold=0 -Xcompile.invokedynamic bench_prime.rb 
       user     system      total        real
 true
  1.910000   0.100000   2.010000 (  0.934381)
 true
  0.700000   0.010000   0.710000 (  0.631105)
 true
  1.030000   0.010000   1.040000 (  0.748262)
 true
  0.630000   0.010000   0.640000 (  0.610905)

[] ~/projects/jruby $ ruby23 bench_prime.rb 
       user     system      total        real
 true
  0.640000   0.000000   0.640000 (  0.637603)
 true
  0.630000   0.000000   0.630000 (  0.629570)
 true
  0.640000   0.000000   0.640000 (  0.637175)
 true
  0.630000   0.000000   0.630000 (  0.636287)
Member

headius commented Feb 6, 2016

Ok, it looks like Ruby 2.3 improved the performance of prime.rb. JRuby 9.0.5.0 roughly matches MRI 2.2 performance, and JRuby with 2.3 stdlib roughly matches MRI 2.3 performance:

Ruby 2.2 stdlib:

[] ~/projects/jruby $ jruby -Xjit.threshold=0 -Xcompile.invokedynamic bench_prime.rb 
       user     system      total        real
 true
  3.170000   0.110000   3.280000 (  2.220189)
 true
  2.520000   0.030000   2.550000 (  2.033816)
 true
  2.100000   0.010000   2.110000 (  1.926965)
 true
  1.930000   0.010000   1.940000 (  1.879152)

[] ~/projects/jruby $ ruby22 bench_prime.rb 
       user     system      total        real
 true
  3.530000   0.010000   3.540000 (  3.533079)
 true
  3.530000   0.000000   3.530000 (  3.530615)
 true
  3.590000   0.010000   3.600000 (  3.596020)
 true
  3.430000   0.000000   3.430000 (  3.436481)

Ruby 2.3 stdlib

[] ~/projects/jruby $ jruby -I ../ruby/lib -Xjit.threshold=0 -Xcompile.invokedynamic bench_prime.rb 
       user     system      total        real
 true
  1.910000   0.100000   2.010000 (  0.934381)
 true
  0.700000   0.010000   0.710000 (  0.631105)
 true
  1.030000   0.010000   1.040000 (  0.748262)
 true
  0.630000   0.010000   0.640000 (  0.610905)

[] ~/projects/jruby $ ruby23 bench_prime.rb 
       user     system      total        real
 true
  0.640000   0.000000   0.640000 (  0.637603)
 true
  0.630000   0.000000   0.630000 (  0.629570)
 true
  0.640000   0.000000   0.640000 (  0.637175)
 true
  0.630000   0.000000   0.630000 (  0.636287)
@giuan

This comment has been minimized.

Show comment
Hide comment
@giuan

giuan Feb 6, 2016

This is the link
https://gist.github.com/giuan/f2176226e253d8620e0a
In jruby prime? is 20x slower then my method

giuan commented Feb 6, 2016

This is the link
https://gist.github.com/giuan/f2176226e253d8620e0a
In jruby prime? is 20x slower then my method

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 6, 2016

Member

Ok, so other than the 2.2/2.3 stdlib issue, the remaining problem here is that a single long-running method does not trigger JIT.

We don't have a good way to do on-stack replacement, so the best we could do would be to trigger JIT of a method that ran longer than some expectation.

Below is a patch for this, but it's not really accurate because it doesn't exclude child calls.

diff --git a/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java b/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java
index 2cdeaf0..7aeee01 100644
--- a/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java
+++ b/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java
@@ -97,7 +97,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, args, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, args, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, args, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -132,7 +139,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -167,7 +181,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, arg0, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -202,7 +223,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, arg0, arg1, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -237,7 +265,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, arg0, arg1, arg2, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, arg2, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, arg2, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

I'm not sure what priority to give this.

Member

headius commented Feb 6, 2016

Ok, so other than the 2.2/2.3 stdlib issue, the remaining problem here is that a single long-running method does not trigger JIT.

We don't have a good way to do on-stack replacement, so the best we could do would be to trigger JIT of a method that ran longer than some expectation.

Below is a patch for this, but it's not really accurate because it doesn't exclude child calls.

diff --git a/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java b/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java
index 2cdeaf0..7aeee01 100644
--- a/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java
+++ b/core/src/main/java/org/jruby/internal/runtime/methods/MixedModeIRMethod.java
@@ -97,7 +97,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, args, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, args, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, args, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -132,7 +139,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -167,7 +181,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, arg0, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -202,7 +223,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, arg0, arg1, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

@@ -237,7 +265,14 @@ public class MixedModeIRMethod extends AbstractIRMethod implements Compilable<Dy
         if (jittedMethod != null) {
             return jittedMethod.call(context, self, clazz, name, arg0, arg1, arg2, block);
         } else {
-            return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, arg2, block);
+            long start = System.currentTimeMillis();
+            try {
+                return INTERPRET_METHOD(context, ensureInstrsReady(), getImplementationClass().getMethodLocation(), self, name, arg0, arg1, arg2, block);
+            } finally {
+                if ((System.currentTimeMillis() - start) > 100) {
+                    context.runtime.getJITCompiler().buildThresholdReached(context, this);
+                }
+            }
         }
     }

I'm not sure what priority to give this.

@giuan

This comment has been minimized.

Show comment
Hide comment
@giuan

giuan Feb 6, 2016

But if try https://gist.github.com/giuan/f2176226e253d8620e0a
my method is compiled. I have 0.9 sec. for MRI 2.3 and 1.1 for jruby

giuan commented Feb 6, 2016

But if try https://gist.github.com/giuan/f2176226e253d8620e0a
my method is compiled. I have 0.9 sec. for MRI 2.3 and 1.1 for jruby

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 7, 2016

Member

0.9s versus 1.1s is not bad. And for me, the difference is greater. JRuby wins for both your prime? implementation and for the stdlib version (I set JIT threshold to 0 here to force it to compile right away).

[] ~/projects/jruby $ jruby -Xcompile.invokedynamic -Xjit.threshold=0 -I ../ruby/lib bench_prime2.rb 
       user     system      total        real
 true
  1.560000   0.090000   1.650000 (  0.861703)
 true
  1.110000   0.020000   1.130000 (  0.754386)
 true
  0.700000   0.010000   0.710000 (  0.660860)
 true
  0.960000   0.010000   0.970000 (  0.738980)
 true
  0.780000   0.010000   0.790000 (  0.650650)
 true
  0.530000   0.010000   0.540000 (  0.426186)
 true
  0.650000   0.000000   0.650000 (  0.618836)
 true
  0.420000   0.010000   0.430000 (  0.395621)
 true
  0.640000   0.010000   0.650000 (  0.608281)
 true
  0.420000   0.000000   0.420000 (  0.392981)
 true
  0.650000   0.010000   0.660000 (  0.617567)
 true
  0.430000   0.010000   0.440000 (  0.401770)
 true
  0.660000   0.000000   0.660000 (  0.640085)
 true
  0.430000   0.010000   0.440000 (  0.397853)
 true
  0.750000   0.010000   0.760000 (  0.607031)
 true
  0.450000   0.010000   0.460000 (  0.432919)
 true
  0.670000   0.000000   0.670000 (  0.637849)
 true
  0.410000   0.010000   0.420000 (  0.381010)
 true
  0.620000   0.000000   0.620000 (  0.596011)
 true
  0.500000   0.010000   0.510000 (  0.407787)

[] ~/projects/jruby $ ruby23 -I ../ruby/lib bench_prime2.rb 
       user     system      total        real
 true
  0.730000   0.000000   0.730000 (  0.733017)
 true
  0.710000   0.000000   0.710000 (  0.706568)
 true
  0.690000   0.000000   0.690000 (  0.691615)
 true
  0.680000   0.010000   0.690000 (  0.682329)
 true
  0.620000   0.000000   0.620000 (  0.619963)
 true
  0.670000   0.000000   0.670000 (  0.666560)
 true
  0.650000   0.000000   0.650000 (  0.652076)
 true
  0.680000   0.000000   0.680000 (  0.682871)
 true
  0.640000   0.000000   0.640000 (  0.640788)
 true
  0.700000   0.000000   0.700000 (  0.702633)
 true
  0.740000   0.000000   0.740000 (  0.741795)
 true
  0.700000   0.000000   0.700000 (  0.700159)
 true
  0.680000   0.000000   0.680000 (  0.684462)
 true
  0.730000   0.010000   0.740000 (  0.727324)
 true
  0.670000   0.000000   0.670000 (  0.676201)
 true
  0.710000   0.000000   0.710000 (  0.709593)
 true
  0.680000   0.000000   0.680000 (  0.681053)
 true
  0.720000   0.000000   0.720000 (  0.719188)
 true
  0.670000   0.000000   0.670000 (  0.677244)
 true
  0.720000   0.000000   0.720000 (  0.719593)
Member

headius commented Feb 7, 2016

0.9s versus 1.1s is not bad. And for me, the difference is greater. JRuby wins for both your prime? implementation and for the stdlib version (I set JIT threshold to 0 here to force it to compile right away).

[] ~/projects/jruby $ jruby -Xcompile.invokedynamic -Xjit.threshold=0 -I ../ruby/lib bench_prime2.rb 
       user     system      total        real
 true
  1.560000   0.090000   1.650000 (  0.861703)
 true
  1.110000   0.020000   1.130000 (  0.754386)
 true
  0.700000   0.010000   0.710000 (  0.660860)
 true
  0.960000   0.010000   0.970000 (  0.738980)
 true
  0.780000   0.010000   0.790000 (  0.650650)
 true
  0.530000   0.010000   0.540000 (  0.426186)
 true
  0.650000   0.000000   0.650000 (  0.618836)
 true
  0.420000   0.010000   0.430000 (  0.395621)
 true
  0.640000   0.010000   0.650000 (  0.608281)
 true
  0.420000   0.000000   0.420000 (  0.392981)
 true
  0.650000   0.010000   0.660000 (  0.617567)
 true
  0.430000   0.010000   0.440000 (  0.401770)
 true
  0.660000   0.000000   0.660000 (  0.640085)
 true
  0.430000   0.010000   0.440000 (  0.397853)
 true
  0.750000   0.010000   0.760000 (  0.607031)
 true
  0.450000   0.010000   0.460000 (  0.432919)
 true
  0.670000   0.000000   0.670000 (  0.637849)
 true
  0.410000   0.010000   0.420000 (  0.381010)
 true
  0.620000   0.000000   0.620000 (  0.596011)
 true
  0.500000   0.010000   0.510000 (  0.407787)

[] ~/projects/jruby $ ruby23 -I ../ruby/lib bench_prime2.rb 
       user     system      total        real
 true
  0.730000   0.000000   0.730000 (  0.733017)
 true
  0.710000   0.000000   0.710000 (  0.706568)
 true
  0.690000   0.000000   0.690000 (  0.691615)
 true
  0.680000   0.010000   0.690000 (  0.682329)
 true
  0.620000   0.000000   0.620000 (  0.619963)
 true
  0.670000   0.000000   0.670000 (  0.666560)
 true
  0.650000   0.000000   0.650000 (  0.652076)
 true
  0.680000   0.000000   0.680000 (  0.682871)
 true
  0.640000   0.000000   0.640000 (  0.640788)
 true
  0.700000   0.000000   0.700000 (  0.702633)
 true
  0.740000   0.000000   0.740000 (  0.741795)
 true
  0.700000   0.000000   0.700000 (  0.700159)
 true
  0.680000   0.000000   0.680000 (  0.684462)
 true
  0.730000   0.010000   0.740000 (  0.727324)
 true
  0.670000   0.000000   0.670000 (  0.676201)
 true
  0.710000   0.000000   0.710000 (  0.709593)
 true
  0.680000   0.000000   0.680000 (  0.681053)
 true
  0.720000   0.000000   0.720000 (  0.719188)
 true
  0.670000   0.000000   0.670000 (  0.677244)
 true
  0.720000   0.000000   0.720000 (  0.719593)
@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Feb 8, 2016

Member

@enebo @subbuss Any thoughts on triggering JIT sooner than 50 calls when we have a hot loop? I know this falls into the "let's add OSR because benchmarks look better" world, but I wonder if we can at least accelerate the JIT promotion in these cases.

Member

headius commented Feb 8, 2016

@enebo @subbuss Any thoughts on triggering JIT sooner than 50 calls when we have a hot loop? I know this falls into the "let's add OSR because benchmarks look better" world, but I wonder if we can at least accelerate the JIT promotion in these cases.

@enebo

This comment has been minimized.

Show comment
Hide comment
@enebo

enebo Feb 8, 2016

Member

Well nothing in inlining branch changes when/how we JIT methods (read till end :) ), but we do keep track of checking for hot callsites via thread_polling stats. This could also be used for methods which trigger the most thread_polls. Will that be faster than 50 calls? I don't know, but it seems reasonable to me. fwiw, when inlining+profiling is enabled if we detect a site in that loopy method that needs to be inlined we actually compile the surrounding method. In that sense, how profiling works now might end up compiling earlier...assuming there is a hot ruby-only method in the loop.

Member

enebo commented Feb 8, 2016

Well nothing in inlining branch changes when/how we JIT methods (read till end :) ), but we do keep track of checking for hot callsites via thread_polling stats. This could also be used for methods which trigger the most thread_polls. Will that be faster than 50 calls? I don't know, but it seems reasonable to me. fwiw, when inlining+profiling is enabled if we detect a site in that loopy method that needs to be inlined we actually compile the surrounding method. In that sense, how profiling works now might end up compiling earlier...assuming there is a hot ruby-only method in the loop.

@headius

This comment has been minimized.

Show comment
Hide comment
@headius

headius Mar 8, 2016

Member

Ok, I did find a regression for #3626 that prevented blocks from jitting. This was the source of the performance drop in #3626 going from 9.0.4.0 to 9.0.5.0 and it is now fixed. I believe the benchmarks here are back where they should be.

Member

headius commented Mar 8, 2016

Ok, I did find a regression for #3626 that prevented blocks from jitting. This was the source of the performance drop in #3626 going from 9.0.4.0 to 9.0.5.0 and it is now fixed. I believe the benchmarks here are back where they should be.

@headius headius closed this Mar 8, 2016

@headius headius added this to the JRuby 9.1.0.0 milestone Mar 8, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment