New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate non-escaping boxes, tuples and refs #4858

Merged
merged 29 commits into from Jan 25, 2016

Conversation

Projects
None yet
4 participants
@lrytz
Member

lrytz commented Nov 23, 2015

This PR includes the commits of #4846.

Eliminate boxes, tuples and refs that are created and used within a
single method without escaping. For details on the implementation see
the doc comment in class BoxUnbox.

lrytz added some commits Oct 29, 2015

Track aliases of primitive values in AliasingFrame
This allows using an AliasingAnalyzer for copy propagation
(subsequent commit).
Fix VarInstruction extractor to include IINC
Previously the VarInstruction extractor did not include IINCs.
Also give rename the isVarInstruction predicate to isLoadStoreOrRet,
as it doesn't include IINC.

Also fixes a bug in removeJumpAndAdjustStack, it checked the wrong
opcode range before.
Clean up DCE: remove eliminated callsites from call graph earlier
When running DCE, directly remove eliminated callsites from the call
graph (instead delegating this task to the callees).
Fix merging of AliasingFrames
By convention (for performance), `aliasingFrame.aliases(slot) == null`
means that the value at `slot` has no aliases, so this is isomorphic
to having a singleton `AliasSet(slot)`.

When merging an AliasingFrame `other` into `this`, at every value slot
we have to keep the intersection of the two alias sets. In other words
we have to remove from `this.aliases(slot)` those values that are
not in `other.aliases(slot)`. This was wrong for the cases where
`other.aliases == null` - we did not treat this case as a singleton
set.
Copy propagation, remove unused values (closures!) and local variables
Copy propagation uses an AliasingAnalyzer: it replaces a `LOAD n`
instruction by `LOAD m` where m is the smallest alias of n. This
leads to stale STORE instructions.

Stale STOREs are identified using a ProdCons analyzer and replaced by
POPs.

Values that are pushed on the stack by a side-effect free instruction
and consumed by a POP are then removed by `eliminatePushPop`. This
includes elimination of unused closure allocations and unused boxes
and  tuple allocations (*).

A final cleanup eliminates `STORE x; LOADx` pairs where the stored
value is not otherwise used.

Fixes
  - scala/scala-dev#25
  - scala/scala-dev#7
  - scala/scala-dev#14
  - scala/scala-dev#12

(*) We don't yet rewrite reads of boxes and tuples yet. For example,
`val x = (1, 2); x._1` remains a method invocation and the tuple
cannot be eliminated (scala/scala-dev#11).

Inspired in many ways by Miguel's work!
@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Nov 11, 2015

What about instances created through the static factory methods, *Ref.{create,zero}?

What about instances created through the static factory methods, *Ref.{create,zero}?

This comment has been minimized.

Show comment
Hide comment
@lrytz
Owner

lrytz replied Jan 20, 2016

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Nov 11, 2015

These constructors perform a volatile write. But I guess you're only considering removing the call if there are no other references to the field (everything has collapsed back to local variables being accessed in a single thread) so there is nothing that needs the "happens before" memory barrier.

These constructors perform a volatile write. But I guess you're only considering removing the call if there are no other references to the field (everything has collapsed back to local variables being accessed in a single thread) so there is nothing that needs the "happens before" memory barrier.

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Jan 20, 2016

Owner

yes, this is the case

Owner

lrytz replied Jan 20, 2016

yes, this is the case

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Nov 11, 2015

This is a bit fragile, I'd prefer to call something in Definitions to give us a list of symbols. I remember I needed something similar in Delambdafy, see the use of specializedFunctionName.

This is a bit fragile, I'd prefer to call something in Definitions to give us a list of symbols. I remember I needed something similar in Delambdafy, see the use of specializedFunctionName.

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Jan 20, 2016

Owner

ok, i'll look into cleaning this up - the difficulty is that we're outside global, so the code has to move somewhere else.

Owner

lrytz replied Jan 20, 2016

ok, i'll look into cleaning this up - the difficulty is that we're outside global, so the code has to move somewhere else.

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Nov 11, 2015

nitpick I'd prefer: val storesRemoved = compilerSettings.YoptCopyPropagation.value && eliminateStaleStores(...), etc.

nitpick I'd prefer: val storesRemoved = compilerSettings.YoptCopyPropagation.value && eliminateStaleStores(...), etc.

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Jan 20, 2016

Owner

this is cleaned up in a subsequent commit, 1265e19#diff-70147974eed4d5f7faa546e42504ce64L198

Owner

lrytz replied Jan 20, 2016

this is cleaned up in a subsequent commit, 1265e19#diff-70147974eed4d5f7faa546e42504ce64L198

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Nov 11, 2015

Would be interesting to keep track of a histogram of the number of rounds required. I'd love to see the bytecode progression for the method that takes the most rounds. Might even be worth adding a diagnostic mode to log that out for a nominated method during compilation, might really help people report bugs or understand the progression of the optimizer.

Would be interesting to keep track of a histogram of the number of rounds required. I'd love to see the bytecode progression for the method that takes the most rounds. Might even be worth adding a diagnostic mode to log that out for a nominated method during compilation, might really help people report bugs or understand the progression of the optimizer.

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Nov 11, 2015

Great overview, much appreciated.

Great overview, much appreciated.

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Nov 11, 2015

What if the constant load we're eliminating isn't a String, but is rather a Class?

ldc

Otherwise, if the runtime constant pool entry is a symbolic reference to a class (§5.1), then the named class is resolved (§5.4.3.1) and a reference to the Class object representing that class, value, is pushed onto the operand stack.

5.4.3.1 Class and Interface Resolution
Any exception that can be thrown as a result of failure of class or interface creation can thus be thrown as a result of failure of class and interface resolution.

Same for method type / method handle constants.

What if the constant load we're eliminating isn't a String, but is rather a Class?

ldc

Otherwise, if the runtime constant pool entry is a symbolic reference to a class (§5.1), then the named class is resolved (§5.4.3.1) and a reference to the Class object representing that class, value, is pushed onto the operand stack.

5.4.3.1 Class and Interface Resolution
Any exception that can be thrown as a result of failure of class or interface creation can thus be thrown as a result of failure of class and interface resolution.

Same for method type / method handle constants.

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Nov 11, 2015

Owner

29d772d#diff-70147974eed4d5f7faa546e42504ce64R683 - maybe I have to add method handles?

Owner

lrytz replied Nov 11, 2015

29d772d#diff-70147974eed4d5f7faa546e42504ce64R683 - maybe I have to add method handles?

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Jan 20, 2016

Owner

55faa0d cleans it up

Owner

lrytz replied Jan 20, 2016

55faa0d cleans it up

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Nov 11, 2015

I think this is the case, but just checking: does optimizer's handling of tuples account for the fact that TupleN is not (yet) final?

I think this is the case, but just checking: does optimizer's handling of tuples account for the fact that TupleN is not (yet) final?

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Nov 11, 2015

Owner

yes, it only eliminates new TupleN

Owner

lrytz replied Nov 11, 2015

yes, it only eliminates new TupleN

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

lrytz added some commits Nov 13, 2015

Support specialized functions in closure optimizer
Fixes scala/scala-dev#52.

An IndyLambda may create a specialized function type, where the SAM
is the corresponding specialized variant of apply. If this closure
is invoked through the generic apply method, the closure optimizer
would previously not re-write the invocation to the $anonfun method.
This is now done, including the necessary box / unbox operations.
Fix in "consumersOfOutputsFrom" query in prod-cons analysis
Fix the "consumersOfOutputsFrom" in the prod-cons analysis to support
dummy instructions:

  - UninitializedLocalProducer
  - ParameterProducer
  - ExceptionProducer

@scala-jenkins scala-jenkins added this to the 2.12.0-M4 milestone Nov 23, 2015

@lrytz lrytz changed the title from Box-unbox elimination to Eliminate non-escaping boxes, tuples and refs Dec 12, 2015

lrytz added some commits Dec 11, 2015

Fix push-pop elimination for values pushed by DUP
If a DUP is consumed by two POPs, ensure that the DUP and its producer
are eliminated. Before, only the DUP was eliminated, leaving an unused
value on the stack.
Don't run closure optimizer on methods too large for prod-cons
For methods that are too large to run a producers-consumers analysis,
don't run the closure optimizer.

This is an oversight, all data flow analyses in the backend are
guarded by a method size check.
Eliminate non-escaping boxes, tuples and refs
Eliminate boxes, tuples and refs that are created and used within a
single method without escaping. For details on the implementation see
the doc comment in class BoxUnbox.

This commit also cleans up the logic of inter-dependent method-level
optimizations that run until reaching a fixpoint.
Eliminate unnecessary casts
Eliminate casts that are statically known to succeed. This enables
boxes to be eliminated and simplifies the implementation of closure
allocation elimination.
@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Jan 25, 2016

Member

I think the -YoptTrace option ought to accept an argument that lets one limit the output to a particular class / method pair. As it stands, it is only really useful after you've extracted the method of interest to a standalone test case. Feel free to do this in a followup PR, though.

Member

retronym commented Jan 25, 2016

I think the -YoptTrace option ought to accept an argument that lets one limit the output to a particular class / method pair. As it stands, it is only really useful after you've extracted the method of interest to a standalone test case. Feel free to do this in a followup PR, though.

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Jan 25, 2016

Member

Here's a script to post-process the optimizer trace into a series of diffs (via a temp git repo). I've used your test case from 1a9e649 as the example.

https://gist.github.com/retronym/0efa5cea463aced52fae

Here's another example from the central commit in this PR: https://gist.github.com/retronym/99a026883916abf2d309

Member

retronym commented Jan 25, 2016

Here's a script to post-process the optimizer trace into a series of diffs (via a temp git repo). I've used your test case from 1a9e649 as the example.

https://gist.github.com/retronym/0efa5cea463aced52fae

Here's another example from the central commit in this PR: https://gist.github.com/retronym/99a026883916abf2d309

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Jan 25, 2016

Member

After looking at those diffs, I notice that eliminateStaleStores doesn't reduce maxLocals. This happens subsequently in dce, but I was wondering whether this is something that could/should be done more eagerly.

Member

retronym commented Jan 25, 2016

After looking at those diffs, I notice that eliminateStaleStores doesn't reduce maxLocals. This happens subsequently in dce, but I was wondering whether this is something that could/should be done more eagerly.

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Jan 25, 2016

Member

the -YoptTrace option ought to accept an argument that lets one limit the output to a particular class / method pair

Added in 3dccf0e

notice that eliminateStaleStores doesn't reduce maxLocals

This is intentional, see this comment: lrytz@29d772d#diff-70147974eed4d5f7faa546e42504ce64R94

Member

lrytz commented Jan 25, 2016

the -YoptTrace option ought to accept an argument that lets one limit the output to a particular class / method pair

Added in 3dccf0e

notice that eliminateStaleStores doesn't reduce maxLocals

This is intentional, see this comment: lrytz@29d772d#diff-70147974eed4d5f7faa546e42504ce64R94

@retronym

This comment has been minimized.

Show comment
Hide comment
@retronym

retronym Jan 25, 2016

Member

LGTM

Member

retronym commented Jan 25, 2016

LGTM

lrytz added a commit that referenced this pull request Jan 25, 2016

Merge pull request #4858 from lrytz/opt/elimBoxes
Eliminate non-escaping boxes, tuples and refs

@lrytz lrytz merged commit 90ee1b8 into scala:2.12.x Jan 25, 2016

6 checks passed

cla @lrytz signed the Scala CLA. Thanks!
Details
combined All previous commits successful.
integrate-ide [840] SUCCESS. Took 7 s.
Details
validate-main [1034] SUCCESS. Took 73 min.
Details
validate-publish-core [1020] SUCCESS. Took 11 min.
Details
validate-test [842] SUCCESS. Took 61 min.
Details

@lrytz lrytz deleted the lrytz:opt/elimBoxes branch Jan 25, 2016

lrytz added a commit to lrytz/scala that referenced this pull request Jan 27, 2016

Improve simplifyJumps
Improve simplifyJumps to rewrite

  IFEQ L4
  L5
  GOTO L6

to

  IFNE L6
  L5

This rewrite is only correct if L5 is not the target of any jump
instruction (otherwise, removing the GOTO would change semantics).
Previously we did not do the rewrite if there was any label between
the conditional jump and the goto (like L5). Now we track which labels
are jump targets.

Also adds a test for SI-9160 which tests the above. The unnecessary
boxing mentioned in the ticket is optimzied since push-pop elimination
(scala#4858).

lrytz added a commit to lrytz/scala that referenced this pull request Jan 27, 2016

Improve simplifyJumps
Improve simplifyJumps to rewrite

  IFEQ L4
  L5
  GOTO L6

to

  IFNE L6
  L5

This rewrite is only correct if L5 is not the target of any jump
instruction (otherwise, removing the GOTO would change semantics).
Previously we did not do the rewrite if there was any label between
the conditional jump and the goto (like L5). Now we track which labels
are jump targets.

Also adds a test for SI-9160 which tests the above. The unnecessary
boxing mentioned in the ticket is optimzied since push-pop elimination
(scala#4858).

lrytz added a commit to lrytz/scala that referenced this pull request Feb 3, 2016

Re-write and Re-enable optimizer tests
Rewrite tests for new optimizer
  - SI-6941
  - SI-2171
  - t3430
  - t3252
  - t4840
  - t2171
  - t3430
  - t3252
  - t6157
  - t6547
  - t8062
  - t8306
  - t8359
  - t9123
  - trait-force-info
  - private-inline

test cases for bugs fixed in the new optimizer
  - SI-9160, the unnecessary boxing mentioned in the ticket is optimzied
    since push-pop elimination (scala#4858).
  - SI-8796
  - SI-8524
  - SI-7807

fix flags file for t3420

remove an empty flags file

remove unnecessary partest filters

explicit inliner warnings in test t7582

Restore the lisp test. Removing the flags file - our build runs with the
(new) optimizer enabled anyway.

The test spent the past few years as an optimizer test in pos/
see https://issues.scala-lang.org/browse/SI-4512. The attempt may fail,
but why not give it a try.

$ git lg -S"lisp"

...
| * | | | f785785 - SI-4579 Yoke the power of lisp.scala as a stress for the optimizer. (3 years, 8 months ago) <Jason Zaugg>
...
* | | | | | | 622cc99 - Revert the lisp test. (3 years, 10 months ago) <Paul Phillips>
...
* | | | | | | 97f0324 - Revived the lisp test. (3 years, 10 months ago) <Paul Phillips>
...
* | 1e0f7dc - Imprison the lisp test, no review. (4 years, 4 months ago) <Paul Phillips>
...
* | 6b09630 - "Freed the lisp test." Tweaked partest defaults... (4 years, 6 months ago) <Paul Phillips>
...
* | fec42c1 - Lisp test wins again, no review. (4 years, 8 months ago) <Paul Phillips>
...
* | 1c2d44d - Restored the lisp.scala test. (4 years, 8 months ago) <Paul Phillips>
...
* | 15ed892 - Temporarily sending lisp.scala to be interprete... (4 years, 8 months ago) <Paul Phillips>
...

lrytz added a commit to lrytz/scala that referenced this pull request Feb 3, 2016

Re-write and Re-enable optimizer tests
Rewrite tests for new optimizer
  - SI-6941
  - SI-2171
  - t3430
  - t3252
  - t4840
  - t2171
  - t3430
  - t3252
  - t6157
  - t6547
  - t8062
  - t8306
  - t8359
  - t9123
  - trait-force-info
  - private-inline

test cases for bugs fixed in the new optimizer
  - SI-9160, the unnecessary boxing mentioned in the ticket is optimzied
    since push-pop elimination (scala#4858).
  - SI-8796
  - SI-8524
  - SI-7807

fix flags file for t3420

remove an empty flags file

remove unnecessary partest filters

explicit inliner warnings in test t7582

Restore the lisp test. Removing the flags file - our build runs with the
(new) optimizer enabled anyway.

The test spent the past few years as an optimizer test in pos/
see https://issues.scala-lang.org/browse/SI-4512. The attempt may fail,
but why not give it a try.

$ git lg -S"lisp"

...
| * | | | f785785 - SI-4579 Yoke the power of lisp.scala as a stress for the optimizer. (3 years, 8 months ago) <Jason Zaugg>
...
* | | | | | | 622cc99 - Revert the lisp test. (3 years, 10 months ago) <Paul Phillips>
...
* | | | | | | 97f0324 - Revived the lisp test. (3 years, 10 months ago) <Paul Phillips>
...
* | 1e0f7dc - Imprison the lisp test, no review. (4 years, 4 months ago) <Paul Phillips>
...
* | 6b09630 - "Freed the lisp test." Tweaked partest defaults... (4 years, 6 months ago) <Paul Phillips>
...
* | fec42c1 - Lisp test wins again, no review. (4 years, 8 months ago) <Paul Phillips>
...
* | 1c2d44d - Restored the lisp.scala test. (4 years, 8 months ago) <Paul Phillips>
...
* | 15ed892 - Temporarily sending lisp.scala to be interprete... (4 years, 8 months ago) <Paul Phillips>
...

lrytz added a commit to lrytz/scala that referenced this pull request Feb 13, 2016

SI-8790 test case
Tuples created for pattern matching are eliminated since
scala#4858

@lrytz lrytz referenced this pull request Feb 13, 2016

Merged

SI-8790 test case #4965

@adriaanm adriaanm added 2.12.0 and removed 2.12 labels Oct 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment