Analyze constants to remove unnecessary branches #2214

Merged
merged 2 commits into from Mar 15, 2013

Projects

None yet

5 participants

@JamesIry

This commit adds analysis and optimization of constants to remove
unnecessary branches. It uses abstract interpretation to determine
what constant(s) a particular stack slot or variable might or might not
hold at a given spot and uses that knowledge to eliminate branches that
cannot be taken. Its primary goal is null check removal, but it also
works for other constants.

Several tests are modified to include the new optimization phase.

Two new tests are added. One verifies that branching still works as
expected. The other verifies that branches are removed.

@JamesIry JamesIry Analyze constants to remove unnecessary branches
This commit adds analysis and optimization of constants to remove
unnecessary branches. It uses abstract interpretation to determine
what constant(s) a particular stack slot or variable might or might not
hold at a given spot and uses that knowledge to eliminate branches that
cannot be taken. Its primary goal is null check removal, but it also
works for other constants.

Several tests are modified to include the new optimization phase.

Two new tests are added. One verifies that branching still works as
expected. The other verifies that branches are removed.
69109c0
@retronym retronym was assigned Mar 8, 2013
@retronym
The Scala Programming Language member

Neat, this will help me bring SI-7172 home, too. I'll take a closer look at this patch in tomorrow.

@magarciaEPFL

@JamesIry , I haven't looked in detail at your implementation , just at the description of its goals (constant folding, including discarding branches not taken after null checks).

It appears that performing Abstract Interpretation sequentially on ICode CFGs will invariably be slower than at the ASM level (GenASM prepares ASM trees anyway).

It's clear the optimization should be done, to get rid of redundant code. However, there are advantages to performing this (very same) optimization at the ASM level, for example via:

https://github.com/magarciaEPFL/scala/blob/GenRefactored2/src/asm/scala/tools/asm/optimiz/ConstantFolder.java
https://github.com/magarciaEPFL/scala/blob/GenRefactored2/src/asm/scala/tools/asm/optimiz/NullnessPropagator.java

The main advantage being parallelism.

Is there any phase ordering advantage to having an ICode phase for the above?

@JamesIry

@magarciaEPFL There is an advantage right now: namely this optimization only replaces conditional branches with jumps, it doesn't remove any blocks nor analyze what code it renders useless. Later phases do dead code elimination, unreachable block elimination, peephole optimization, jump collapsing, etc. As you know there's nothing preventing all those things from being done on ASM, but we don't have that stuff just yet.

@JamesIry

@retronym This version doesn't yet understand that on the true side of x.isInstanceOf[Something] x can't be null (until mutated). That's in its list of TODOs.

@magarciaEPFL

@JamesIry

As you know there's nothing preventing all those things from being done on ASM, but we don't have that stuff just yet.

The optimizations you mention are available at https://github.com/magarciaEPFL/scala/tree/GenRefactored2/src/asm/scala/tools/asm/optimiz

They are not tied to the new optimizer, just to ASM. The contract of each optimization is documented on each file header.

To use those optimizations from GenASM, basically instead of subclassing asm.ClassWriter as is done now, asm.tree.ClassNode should be used (both ClassNode and ClassWriter are behavioral subtypes of ClassVisitor). GenASM will not notice, and then (a chosen subset of) asm.optimiz optimizations can be applied.

@magarciaEPFL

Let's see what NullnessPropagator does:

    package scala.tools.asm.optimiz;

    /**
     *  Infers null resp. non-null reaching certain program points, using that information to:
     *
     *    (a) simplify control-flow when a conditional jump will always be taken or avoided,
     *        for three classes of instructions:
     *          (a.1) IF_ACMPEQ, IF_ACMPNE
     *          (a.2) IFNULL,    IFNONNULL
     *          (a.3) INSTANCEOF
     *
     *    (b) simplify an ALOAD for a local-var known to contain null (ACONST_NULL replaces it).
     *        This might enable further reductions (e.g., dead-store elimination).
     *
     *    (c) Scala unbox of null is replaced by a 0 load of the appropriate sort (I, L, F, or D).
     *
     *  @author  Miguel Garcia, http://lamp.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/
     *  @version 1.0
     */
    public class NullnessPropagator {

An instanceof bytecode instruction whose outcome is known is rewritten as follows:

case Opcodes.INSTANCEOF:
    if(frame.getStackTop().isNull()) {
        // drop ref, ICONST_0
        TypeInsnNode iof = (TypeInsnNode)insns[i];
        mnode.instructions.insert(iof, new InsnNode(Opcodes.ICONST_0));
        mnode.instructions.insert(iof, Util.getDrop(1));
        mnode.instructions.remove(iof);
        changed = true;
    }
    break;
@JamesIry

@magarciaEPFL pull requests welcome

@magarciaEPFL

@JamesIry What's that supposed to mean?

@JamesIry

@magarciaEPFL Just that: please don't leave that stuff lying around in a private fork. Start shoveling in! We want it! It's much easier for you to generate bite sized, self consistent pull requests from your code than it is for me to troll through a large, unknown code base trying to find and package up nuggets. I may well do that at some point, but not today.

@retronym retronym commented on an outdated diff Mar 10, 2013
...cala/tools/nsc/backend/opt/ConstantOptimization.scala
+
+ case CJUMP(success, failure, cond, kind) =>
+ val in1 = in peek 0
+ val in2 = in peek 1
+ interpretConditional(kind, in, 2, in1, in2, success, failure, cond)
+
+ case CZJUMP(success, failure, cond, kind) =>
+ val in1 = in peek 0
+ val in2 = getZeroOf(kind)
+ interpretConditional(kind, in, 1, in1, in2, success, failure, cond)
+
+ case SWITCH(tags, labels) =>
+ val in1 = in peek 0
+ val newStuff = tags zip labels filter { case (tagSet, _) => canSwitch(in1, tagSet) }
+ val (reachableTags, reachableNormalLabels) = (tags zip labels filter { case (tagSet, _) => canSwitch(in1, tagSet) }).unzip
+ val reachableLabels = if (labels.size > tags.size) {
@retronym
retronym Mar 10, 2013

Could use lengthCompare.

@retronym retronym commented on an outdated diff Mar 10, 2013
...cala/tools/nsc/backend/opt/ConstantOptimization.scala
+ case LE | GE => !guaranteedEqual // if the two are guaranteed to be equal then they must be LE/GE
+ }
+
+ val out = in drop toDrop
+
+ var result = Map[BasicBlock, State]()
+ if (succPossible) {
+ result += ((success, out))
+ }
+
+ if (failPossible) {
+ result += ((failure, out))
+ }
+
+ if (result.size == 1) (result, List.fill(toDrop)(DROP(kind)) :+ JUMP(result.keySet.head))
+ else (result, inst :: Nil)
@retronym
retronym Mar 10, 2013

Push the if inside (result, ... ).

@retronym retronym commented on an outdated diff Mar 10, 2013
...cala/tools/nsc/backend/opt/ConstantOptimization.scala
+ in drop 3
+
+ case STORE_LOCAL(local) =>
+ in store local
+
+ case STORE_THIS(_) =>
+ // if a local is already known to have a constant and we're replacing with the same constant then we can
+ // replace this with a drop
+ in store THIS_LOCAL
+
+ case STORE_FIELD(_, isStatic) =>
+ val drops = if (isStatic) 1 else 2
+ in drop drops
+
+ case CALL_PRIMITIVE(_) =>
+ in drop inst.consumed push UNKNOWN
@retronym
retronym Mar 10, 2013

Perhaps use in drop inst.consumed all over, rather than drop <literal>.

@retronym retronym and 1 other commented on an outdated diff Mar 10, 2013
...cala/tools/nsc/backend/opt/ConstantOptimization.scala
+ case NEW(_) =>
+ in push NOT_NULL
+
+ case CREATE_ARRAY(_, dims) =>
+ in drop dims push NOT_NULL
+
+ case IS_INSTANCE(_) =>
+ // TODO IS_INSTANCE is going to be followed by a C(Z)JUMP
+ // and if IS_INSTANCE/C(Z)JUMP the branch for "true" can
+ // know that whatever was checked was not a null
+ // see the TODO on CJUMP for more information about propagating null
+ // information
+ // TODO if the top of stack is guaranteed null then we can eliminate this IS_INSTANCE check and
+ // replace with a constant false, but how often is a knowable null checked for instanceof?
+ // TODO we could track type information and statically know to eliminate IS_INSTANCE
+ // but that's probably not a huge win
@retronym
retronym Mar 10, 2013

Such cases might be interesting under specialization; that's the subject of SI-7172. We could also handle this earlier by emitting if (x eq null) false else [[x.tpe <:< checkTp]] rather than x.isInstanceOf[checkTp] in Erasure, as we have all the type information at hand there. But I guess that doesn't profit from refined type information that might be gleaned on subsequent iterations of this optimization process.

@JamesIry
JamesIry Mar 11, 2013

That layout would also make it harder to eliminate the null check. With the current x.isInstanceOf[checkTp] it's easy enough to eliminate the null, I just need to do it.

@retronym retronym and 1 other commented on an outdated diff Mar 11, 2013
...cala/tools/nsc/backend/opt/ConstantOptimization.scala
+ private sealed abstract class Datum
+ /**
+ * A constant datum
+ */
+ private case class Const(c: Constant) extends Datum {
+ def isIntAssignable = c.tag >= BooleanTag && c.tag <= IntTag
+ def toInt = c.tag match {
+ case BooleanTag => if (c.booleanValue) 1 else 0
+ case _ => c.intValue
+ }
+
+ /**
+ * True if this constant has the same representation (and therefore would compare true under eq) as another constant
+ */
+ override def equals(other: Any) = (other match {
+ case oc @ Const(o) => (this eq oc) || (if (this.isIntAssignable && oc.isIntAssignable) this.toInt == oc.toInt else c.value == o.value)
@retronym
retronym Mar 11, 2013

I think you'll need to treat Double.NaN here. Reuse Constant#equalHashValue.

@JamesIry
JamesIry Mar 11, 2013

Can't do that because NaN != NaN and I don't want to eliminate the branch in 'if (NaN != NaN) println("Blame IEEE")'

But UGH. NaN means I have to eliminate the shortcircuit check on pointer equality in that line for the same reason I can't use equalsHashValue.

I think I have to eliminate a couple of other pointer equality checks as well.

@retronym
retronym Mar 11, 2013

Whichever way you slice it, it would be good to see unit tests for equals/hashCode consistency of this class, in addition to the functional tests like blame-aye-triple-eee.scala.

@paulp

Can someone summarize the state of the pull request?

@JamesIry

Waiting on second round of feedback from @retronym

@JamesIry

I should rephrase because that sounds like @retronym is being slow. He's not. My updates based on his first round review went in this morning. The PR is now waiting for a second round or a thumbs up.

@paulp

So Slowy McSlowerson is slowing everything down? Gotcha. Guess we have no choice but to hunker down and wait him out.

@retronym
The Scala Programming Language member

Two more comments.

  • in drop inst.consumed should be factored out to a method.
  • Can you comment on the apparent duplication in effort between these optimizations (against icode), and Miguel's work (against bytecode.)? My read is that: "icode is still the reality on this branch, we need these optimizations in place ASAP, we're happy if they are subsumed (even shortly) down the track, but in any case the test cases will live on.". Which would Sound Good To Me.

Which is reasonable enough to me. I also think that this exercise will help us to review and test Miguel's optimizations, which is also a win.

BTW, GitHub doesn't notify me that new commits have been pushed to a PR; it's best to add a comment to expediate things :)

@JamesIry

This optimization works best if dead code elimination, peephole optimization, unreachable branch elimination, dead variable elimination, and block collapsing all come after it. It will take awhile to incorporate all that from Miguel's branch. We can envision a process where ASM based optimizations are brought in from back to front, one at a time, subsuming icode based ones. Eventually that would mean this one is superfluous - or if this one does something that the new back end doesn't then the logic would have to be ported.

@retronym
The Scala Programming Language member

Thanks for the explanation. LGTM, a pleasure to read. But this area of the compiler isn't really in my wheelhouse so I'd love if @magarciaEPFL could also review this for any correctness problems.

@JamesIry

If Miguel doesn't have time maybe it would cheer up @gkossakowski to take a look or @dragos can put on his old optimization hat.

@magarciaEPFL

I'm a bit under pressure with time, yes. But I promise I'll keep watching this space :)

@gkossakowski
The Scala Programming Language member
@JamesIry JamesIry Cleanup of constant optimization
This commit cleans up constant optimization from the review of
scala#2214 .

* drops are done using the instruction's consumed count rather than a
  numeric literal
* drops are moved into one common method in the main instruction
  interpreter
* One instance of x.length > y.length is replaced with
  x.lengthCompare(y.length) > 0
* NaN is dealt with by treating it as an UNKNOWN
* A test is added to make sure NaN semantics aren't broken.
* The constant-optmization test is improved with tests for switch
  statements
3a17ff0
@JamesIry

I've got more work to do against this. Based on @retronym's LGTM and a conversation with @adriaanm I'm merging. If any of the afore mentioned cast of characters wants to do an after-the-fact review I'm willing to incorporate changes then.

@JamesIry JamesIry merged commit 25aabc8 into scala:master Mar 15, 2013

1 check passed

Details default pr-checkin-per-commit Took 80 min.
@retronym retronym added a commit to retronym/scala that referenced this pull request Jul 29, 2015
@retronym retronym SI-9422 Fix incorrect constant propagation
The ConstantOptimization phase uses abstract interpretation
to track what is known about values, and then to use this information
to optimize away tests with a statically known result.

Constant propagation was added under -optimize in Scala 2.11.0-M3, in
PR #2214.

For example, we might know that a variable must hold one of a set
of values (`Possible`). Or, we could track that it must *not*
be of of a set of value (`Impossible`).

The test case in the bug report was enough to create comparison:

    v1 == v2 // where V1 = Possible(Set(true, false))
             //       V2 = Possible(Set(true, false))

This test was considered to be always true, due to a bug in
`Possible#mightNotEqual`. The only time we can be sure that
`Possible(p1) mightNotEquals Possible(p2)` is if
`p1` and `p2` are the same singleton set. This commit changes
this method to implement this logic.

The starting assumption for all values is currently
`Impossible(Set())`, although it would also be reasonable to represent
an unknown boolean variable as `Possible(Set(true, false))`, given
the finite and small domain.

I tried to change the starting assumption for boolean locals in
exactly this manner, and that brings the bug into sharp focus.

Under this patch:

    e564fe5

This code:

   def test(a: Boolean, b: Boolean) = a == b

Compiles to:

  public boolean test(boolean, boolean);
    Code:
       0: iconst_1
       1: ireturn

Note: the enclosed test case does not list `-optimize` in a `.flags`
file, I'm relying on that being passed in by the validation build.
I've tested locally with that option, though.
ec95e53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment