virtualize unsafe compare and swap calls #636

Merged
merged 1 commit into oracle:master on Sep 26, 2018

Conversation

4 participants
@fwbrasil
Contributor

fwbrasil commented Aug 24, 2018

Problem

Compare and swap calls using Unsafe are currently always lowered to an atomic operation. That's unnecessary when the CAS mutates a field of an instance that is virtual. It introduces some performance overhead and prevents other optimizations.

Let's take this JMH benchmark class as an example:

public class VirtualCASBench {

  public static final long valueOffset = UnsafeUtil.fieldOffset(TestClass.class, "value");

  private static class TestClass {
    public volatile int value;

    public TestClass(int value) {
      this.value = value;
    }
  }

  @Benchmark
  public boolean testUnsafe() {
    TestClass t = new TestClass(0);
    return UnsafeUtil.unsafe.compareAndSwapInt(t, valueOffset, 0, 1);
  }

  @Benchmark
  public boolean testIfElse() {
    TestClass t = new TestClass(0);
    synchronized (t) {
      if (t.value != 0)
        return false;
      else {
        t.value = 1;
        return true;
      }
    }
  }
}

The if/else version has much better performance than the version using the unsafe compare and swap:

image

It is optimized before lowering to a graph that returns a constant:

image

That's expected since the instance doesn't escape and constant folding can determine that the CAS will always return true. The same doesn't happen with the unsafe version since the compare and swap node is opaque to constant folding:

image

Solution

When an object is virtualized, the compare and swap can be replaced by a guard, and the new value can be set using the VirtualizerTool. I've implemented this optimization and the benchmark results are promising:

image

With the CAS virtualization, the unsafe compare and swap benchmark is also optimized to a constant:

image
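
To make the reasoning above concrete, here is a minimal sketch in plain Java. It is not Graal code and not part of the PR; the class and method names are made up for illustration. It shows that, for an object that never leaves the current thread, a compare and swap behaves like a plain compare followed by a plain store, which is a form that escape analysis and constant folding can see through.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Graal.
final class CasEquivalence {

    // Plain holder standing in for a virtualized (non-escaping) object.
    static final class Box {
        int value;

        Box(int value) {
            this.value = value;
        }
    }

    // CAS through the atomic API on a freshly allocated, unshared object.
    static boolean casAtomic(int initial, int expected, int update) {
        AtomicInteger a = new AtomicInteger(initial);
        return a.compareAndSet(expected, update);
    }

    // Same result with a compare and a plain store. This is only valid
    // because `b` is never shared with another thread.
    static boolean casPlain(int initial, int expected, int update) {
        Box b = new Box(initial);
        if (b.value != expected) {
            return false;
        }
        b.value = update;
        return true;
    }

    public static void main(String[] args) {
        // Matching expected value: both variants succeed.
        System.out.println(casAtomic(0, 0, 1) == casPlain(0, 0, 1)); // prints true
        // Mismatching expected value: both variants fail.
        System.out.println(casAtomic(5, 0, 1) == casPlain(5, 0, 1)); // prints true
    }
}
```

The equivalence breaks as soon as the object is published to another thread, which is why the optimization is limited to virtual objects.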

Notes and questions

  1. I've used a guard that will trigger deoptimization in case the current value doesn't match the expected value. It assumes that users won't write code that makes a CAS operation fail, since the object doesn't escape and can't be used concurrently.

  2. I couldn't link the guard node to the predecessor using the VirtualizerTool, so I set it in UnsafeCompareAndSwapNode.virtualize. I'm not sure if that could be problematic since it seems that all effects should be done through the tool during virtualization.

  3. The optimization is applied only when the field can be resolved. If the user uses a dynamic or invalid offset, it won't be applied.

  4. I've added an option to enable the optimization. Should it be enabled by default?

graalvmbot Aug 24, 2018

Collaborator
  • Hello Flavio Brasil, thanks for contributing a PR to our project!

We use the Oracle Contributor Agreement to make the copyright of contributions clear. We don't have a record of you having signed this yet, based on your email address fbrasil -(at)- twitter -(dot)- com. You can sign it at that link.

If you think you've already signed it, please comment below and we'll check.

fwbrasil Aug 24, 2018

Contributor

Twitter has already signed the OCA, but I've also submitted the form by email.

graalvmbot Aug 27, 2018

Collaborator
  • Flavio Brasil has signed the Oracle Contributor Agreement (based on email address fbrasil -(at)- twitter -(dot)- com) so can contribute to this repository.

gilles-duboscq Aug 27, 2018

Member

Thank you for the PR, that's a good idea.

I've answered some of your questions in comments.

Regarding the predecessor issue, the tool's addNode method weaves fixed nodes in as needed: when addNode is applied, it inserts the fixed node before the node that is being virtualized, using StructuredGraph#addBeforeFixed.

fwbrasil Aug 27, 2018

Contributor

@gilles-duboscq thank you for the review! I've changed the impl to use ConditionalNodes as suggested. I had to add a condition for when the logic node becomes a constant since the graph becomes invalid if I create the other nodes based on the logic constant.

gilles-duboscq Aug 28, 2018

Member

> I had to add a condition for when the logic node becomes a constant since the graph becomes invalid if I create the other nodes based on the logic constant.

Right. I guess this is due to tool.addNode choking on nodes that are already in the graph. That's a bit inconvenient but we can probably work around that:

  • equalsNode is always a new node (a constant or a comparison)
  • fieldValue might be a new node, newValue, or currentValue
  • result is always a new node

Based on this it might be enough to just gate the tool.addNode(fieldValue) on if (!fieldValue.isAlive()). If that works it would simplify this code a bit.

In any case, I think we can remove the option and we should be good to go.
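
The gating idea can be sketched with a toy model. ToyNode and ToyGraph below are hypothetical stand-ins written for this example, not Graal's Node or VirtualizerTool, and the assumption that adding a live node is an error is taken only from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical, not Graal classes.
final class AliveGateSketch {

    static final class ToyNode {
        final String name;
        private boolean alive;

        ToyNode(String name) {
            this.name = name;
        }

        boolean isAlive() {
            return alive;
        }
    }

    static final class ToyGraph {
        final List<ToyNode> nodes = new ArrayList<>();

        // Mirrors the behavior described above: adding a node that is
        // already in the graph is treated as an error.
        void addNode(ToyNode n) {
            if (n.alive) {
                throw new IllegalStateException(n.name + " is already in the graph");
            }
            n.alive = true;
            nodes.add(n);
        }

        // The suggested gate: only add nodes that are not in the graph yet.
        void addIfNew(ToyNode n) {
            if (!n.isAlive()) {
                addNode(n);
            }
        }
    }

    public static void main(String[] args) {
        ToyGraph g = new ToyGraph();
        ToyNode newValue = new ToyNode("newValue");
        g.addNode(newValue);                    // already part of the graph

        ToyNode fieldValue = newValue;          // fieldValue may alias an existing node
        g.addIfNew(fieldValue);                 // gated: no exception, no duplicate
        g.addIfNew(new ToyNode("conditional")); // a fresh node is added normally

        System.out.println(g.nodes.size());     // prints 2
    }
}
```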

fwbrasil Aug 28, 2018

Contributor

@gilles-duboscq it seems more complicated than that. For some reason the partial escape analysis tries to remove the constant node representing true twice.

I'm testing this new version with a service and it seems that there's a bug. The constant folding somehow infers the comparison as false in some cases where it should be true. I'm investigating whether it could be related to the stamps of the values.

gilles-duboscq Aug 28, 2018

Member

I tried to write a few unit tests on top of your changes but I couldn't reproduce either error.
Do you have some example code that could reproduce?

fwbrasil Aug 28, 2018

Contributor

@gilles-duboscq thank you for looking into it. I haven't been able to reproduce the issue with the logic constant being false in isolation yet. For the invalid graph issue, this always fails if I remove the if condition for logic constants:

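import java.util.concurrent.atomic.AtomicInteger;

// Class name taken from the stack trace below (Bugs.main).
public class Bugs {

  public static void main(String[] args) {
    // Loop forever so the method gets hot and is JIT-compiled.
    while (true) {
      test();
    }
  }

  private static boolean test() {
    // The AtomicInteger never escapes, so its CAS is a candidate for virtualization.
    AtomicInteger a = new AtomicInteger(0);
    return a.compareAndSet(0, 1);
  }
}

[thread:5] scope: JVMCI CompilerThread0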
    [thread:5] scope: JVMCI CompilerThread0.Compiling.GraalCompiler
    Context: StructuredGraph:139{HotSpotMethod<Bugs.main(String[])>}
                [thread:5] scope: JVMCI CompilerThread0.Compiling.GraalCompiler.FrontEnd.HighTier.PartialEscapePhase.iteration 0.EffectsPhaseWithSchedule.DeadCodeEliminationPhase
                Exception raised in scope JVMCI CompilerThread0.Compiling.GraalCompiler.FrontEnd.HighTier.PartialEscapePhase.iteration 0.EffectsPhaseWithSchedule.DeadCodeEliminationPhase: java.lang.ArrayIndexOutOfBoundsException: -15625002
                	at org.graalvm.compiler.graph.NodeBitMap.isMarked(NodeBitMap.java:85)
                	at org.graalvm.compiler.graph.NodeBitMap.isMarked(NodeBitMap.java:71)
                	at org.graalvm.compiler.graph.NodeFlood.isMarked(NodeFlood.java:65)
                	at org.graalvm.compiler.phases.common.DeadCodeEliminationPhase.deleteNodes(DeadCodeEliminationPhase.java:140)
                	at org.graalvm.compiler.phases.common.DeadCodeEliminationPhase.run(DeadCodeEliminationPhase.java:103)
                	at org.graalvm.compiler.phases.Phase.run(Phase.java:49)
                	at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:197)
                	at org.graalvm.compiler.phases.Phase.apply(Phase.java:42)
                	at org.graalvm.compiler.phases.Phase.apply(Phase.java:38)
                	at org.graalvm.compiler.virtual.phases.ea.EffectsPhase.runAnalysis(EffectsPhase.java:108)
                	at org.graalvm.compiler.virtual.phases.ea.PartialEscapePhase.run(PartialEscapePhase.java:82)
                	at org.graalvm.compiler.virtual.phases.ea.EffectsPhase.run(EffectsPhase.java:1)
                	at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:197)
                	at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:139)
                	at org.graalvm.compiler.phases.PhaseSuite.run(PhaseSuite.java:212)
                	at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:197)
                	at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:139)
                	at org.graalvm.compiler.core.GraalCompiler.emitFrontEnd(GraalCompiler.java:256)
                	at org.graalvm.compiler.core.GraalCompiler.compile(GraalCompiler.java:180)
                	at org.graalvm.compiler.core.GraalCompiler.compileGraph(GraalCompiler.java:165)
                	at org.graalvm.compiler.hotspot.HotSpotGraalCompiler.compileHelper(HotSpotGraalCompiler.java:191)
                	at org.graalvm.compiler.hotspot.HotSpotGraalCompiler.compile(HotSpotGraalCompiler.java:204)
                	at org.graalvm.compiler.hotspot.CompilationTask$HotSpotCompilationWrapper.performCompilation(CompilationTask.java:181)
                	at org.graalvm.compiler.hotspot.CompilationTask$HotSpotCompilationWrapper.performCompilation(CompilationTask.java:1)
                	at org.graalvm.compiler.core.CompilationWrapper.run(CompilationWrapper.java:171)
                	at org.graalvm.compiler.hotspot.CompilationTask.runCompilation(CompilationTask.java:330)
                	at org.graalvm.compiler.hotspot.HotSpotGraalCompiler.compileMethod(HotSpotGraalCompiler.java:144)
                	at org.graalvm.compiler.hotspot.HotSpotGraalCompiler.compileMethod(HotSpotGraalCompiler.java:111)
                	at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:525)

fwbrasil Aug 28, 2018

Contributor

It's a concurrency bug, not related to the stamps. I'm analyzing it.

fwbrasil Aug 28, 2018

Contributor

@gilles-duboscq should we add a memory barrier somewhere? It seems that other threads see class instances set through a CAS operation with some of their fields null. I haven't been able to isolate the issue, but you can reproduce it using:

curl -L https://github.com/fwbrasil/scala-graal/blob/master/scala-graal-assembly-1.0.0-SNAPSHOT.jar?raw=true > bench.jar

mx  vm -XX:+UseJVMCICompiler -cp bench.jar -Dgraal.Dump=:5 -Dgraal.MethodFilter=Promise.transform -Dgraal.VirtualCAS=true org.openjdk.jmh.Main -f 0 -wi 15 -i 15 -r 1 -w 999999 FinagleBench

If you use -Dgraal.VirtualCAS=false the NPEs don't happen

gilles-duboscq Aug 29, 2018

Member

I had a more thorough look at this.
Regarding the NPEs: the issue is that when creating the Compare and Conditional nodes, care must be taken not to mix virtual and non-virtual nodes. You cannot have one side virtual and the other one not.

You should use tool.getAlias to get the current node for expected and newValue. Then you have to check the status of the inputs before creating binary nodes. In general, you can only create them if the inputs are not virtual (i.e., not instances of VirtualObjectNode). If both are virtual, you can sometimes still reach a conclusion: for example, if two VirtualObjectNodes are distinct (!=) in the IR and have identity (VirtualObjectNode#hasIdentity), they cannot represent the same (==) object at runtime.

Regarding the other issue, I cannot reproduce it when using

if (!fieldValue.isAlive()) {
    tool.addNode(fieldValue);
}

Also note that this kind of problem is usually detected earlier if you use -esa.
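
The rule about virtual and non-virtual inputs can be sketched as a small decision function. This is a toy model under stated assumptions: Input and Decision are hypothetical types invented for the example, not Graal's ValueNode or VirtualObjectNode, and treating two references to the same virtual object as equal is an assumption of the sketch that goes beyond what the comment states.

```java
// Toy model only: these types are hypothetical, not Graal's node classes.
final class CompareFoldSketch {

    // An input is either a virtual object tracked by escape analysis or a
    // regular, materialized value.
    static final class Input {
        final String id;
        final boolean virtual;
        final boolean hasIdentity;

        Input(String id, boolean virtual, boolean hasIdentity) {
            this.id = id;
            this.virtual = virtual;
            this.hasIdentity = hasIdentity;
        }
    }

    enum Decision { FOLD_TRUE, FOLD_FALSE, BUILD_COMPARE, GIVE_UP }

    static Decision decide(Input current, Input expected) {
        if (!current.virtual && !expected.virtual) {
            // Neither side is virtual: a comparison node can be created.
            return Decision.BUILD_COMPARE;
        }
        if (current.virtual && expected.virtual
                && current.hasIdentity && expected.hasIdentity) {
            // Distinct virtual objects with identity can never be == at runtime.
            // Same-object equality is an assumption of this sketch.
            return current == expected ? Decision.FOLD_TRUE : Decision.FOLD_FALSE;
        }
        // One side virtual and the other not, or identity is missing:
        // do not build a binary node.
        return Decision.GIVE_UP;
    }

    public static void main(String[] args) {
        Input v1 = new Input("v1", true, true);
        Input v2 = new Input("v2", true, true);
        Input pi = new Input("pi", false, true);

        System.out.println(decide(pi, pi)); // prints BUILD_COMPARE
        System.out.println(decide(v1, v2)); // prints FOLD_FALSE
        System.out.println(decide(v1, pi)); // prints GIVE_UP
    }
}
```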

fwbrasil Aug 29, 2018

Contributor

@gilles-duboscq thank you for the pointers! I've implemented the logic you suggested, but I couldn't find any CAS operations in our service where both values are virtual, or both non-virtual. They're usually virtual for the new value but a PiNode for the expected value.

I've done some more testing, and the virtualize method is called twice by the virtualization process. During the second pass, if I create the equalsNode it becomes a logic constant because of constant folding and I can apply the CAS statically, so I added the condition back. If I leave that condition out, no CAS operations are virtualized in our codebase.

EDIT: I've tested another codebase and found one instance where both are non-virtual

fwbrasil Sep 4, 2018

Contributor

@gilles-duboscq I've added a test, removed the option, and removed the virtual object comparison since it's covered by the constant folding applied when the equals node is created. It should be good to merge.

@fwbrasil fwbrasil changed the title from [wip] virtualize unsafe compare and swap calls to virtualize unsafe compare and swap calls Sep 4, 2018

fwbrasil Sep 6, 2018

Contributor

The build failure doesn't seem related to the change: https://travis-ci.org/oracle/graal/jobs/424908424#L1693

gilles-duboscq Sep 19, 2018

Member

Thank you @fwbrasil for bearing with all my comments and making the adjustments!
I'll integrate this.

@dougxc dougxc merged commit e5cab6b into oracle:master Sep 26, 2018

1 of 2 checks passed

continuous-integration/travis-ci/pr: The Travis CI build could not complete due to an error
oca-check: Authors have signed the OCA, or are Oracle employees.