Skip to content

Commit

Permalink
cmd/compile: move argument stack construction to SSA generation
Browse files Browse the repository at this point in the history
The goal of this change is to move work from walk to SSA,
and simplify things along the way.

This is hard to accomplish cleanly with small incremental changes,
so this large commit message aims to provide a roadmap to the diff.

High level description:

Prior to this change, walk was responsible for constructing (most of) the stack for function calls.
ascompatte gathered variadic arguments into a slice.
It also rewrote n.List from a list of arguments to a list of assignments to stack slots.
ascompatte was called multiple times to handle the receiver in a method call.
reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack.
adjustargs then made extra stack space for go/defer args as needed.

Node to SSA construction evaluated all the statements in n.List,
and issued the function call, assuming that the stack was correctly constructed.
Intrinsic calls had to dig around inside n.List to extract the arguments,
since intrinsics don't use the stack to make function calls.

This change moves stack construction to the SSA construction phase.
ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did.
It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries.
It does not, however, make any assignments to stack slots.
Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List.
(It would be better to use Ninit instead of List; future work.)
During SSA construction, after doing all the temporary assignments in n.List,
the function arguments are assigned to stack slots by
constructing the appropriate SSA Value, using (*state).storeArg.
SSA construction also now handles adjustments for go/defer args.
This change also simplifies intrinsic calls, since we no longer need to undo walk's work.

Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely.

Generated code differences:

There were a few optimizations applied along the way, the old way.
f(g()) was rewritten to do a block copy of function results to function arguments.
And reorder1 avoided introducing the final "save the stack" temporary in n.List.

The f(g()) block copy optimization never actually triggered; the order pass rewrote away g(), so that has been removed.

SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary.
The exception was when the temporary's type was not SSA-able;
in that case, we got a Move into an autotmp and then an immediate Move onto the stack,
with the autotmp never read or used again.
This change introduces a new rewrite rule to detect such pointless double Moves
and collapse them into a single Move.
This is actually more powerful than the original optimization,
since the original optimization relied on the imprecise Node.HasCall calculation.

The other significant difference in the generated code is that the stack is now constructed
completely in SP-offset order. Prior to this change, the stack was constructed somewhat
haphazardly: first the final argument that Node.HasCall deemed to require a temporary,
then other arguments, then the method receiver, then the defer/go args.
SP-offset is probably a good default order. See future work.

There are a few minor object file size changes as a result of this change.
I investigated some regressions in early versions of this change.

One regression (in archive/tar) was the addition of a single CMPQ instruction,
which would be eliminated were this TODO from flagalloc to be done:
	// TODO: Remove original instructions if they are never used.

One regression (in text/template) was an ADDQconstmodify that is now
a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change
in the order in which arguments are written. The argument change
order can also now be luckier, so this appears to be a wash.

All in all, though there will be minor winners and losers,
this change appears to be performance neutral.

Future work:

Move loading the result of function calls to SSA construction; eliminate OINDREGSP.

Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass.
Among other benefits, this would make it easier to transition to a new calling convention.
This would require rethinking the handling of stack conflicts and is non-trivial.

Figure out some clean way to indicate that stack construction Stores/Moves
do not alias each other, so that subsequent passes may do things like
CSE+tighten shared stack setup, do DSE using non-first Stores, etc.
This would allow us to eliminate the minor text/template regression.

Possibly make assignments to stack slots not treated as statements by DWARF.

Compiler benchmarks:

name        old time/op       new time/op       delta
Template          182ms ± 2%        179ms ± 2%  -1.69%  (p=0.000 n=47+48)
Unicode          86.3ms ± 5%       85.1ms ± 4%  -1.36%  (p=0.001 n=50+50)
GoTypes           646ms ± 1%        642ms ± 1%  -0.63%  (p=0.000 n=49+48)
Compiler          2.89s ± 1%        2.86s ± 2%  -1.36%  (p=0.000 n=48+50)
SSA               8.47s ± 1%        8.37s ± 2%  -1.22%  (p=0.000 n=47+50)
Flate             122ms ± 2%        121ms ± 2%  -0.66%  (p=0.000 n=47+45)
GoParser          147ms ± 2%        146ms ± 2%  -0.53%  (p=0.006 n=46+49)
Reflect           406ms ± 2%        403ms ± 2%  -0.76%  (p=0.000 n=48+43)
Tar               162ms ± 3%        162ms ± 4%    ~     (p=0.191 n=46+50)
XML               223ms ± 2%        222ms ± 2%  -0.37%  (p=0.031 n=45+49)
[Geo mean]        382ms             378ms       -0.89%

name        old user-time/op  new user-time/op  delta
Template          219ms ± 3%        216ms ± 3%  -1.56%  (p=0.000 n=50+48)
Unicode           109ms ± 6%        109ms ± 5%    ~     (p=0.190 n=50+49)
GoTypes           836ms ± 2%        828ms ± 2%  -0.96%  (p=0.000 n=49+48)
Compiler          3.87s ± 2%        3.80s ± 1%  -1.81%  (p=0.000 n=49+46)
SSA               12.0s ± 1%        11.8s ± 1%  -2.01%  (p=0.000 n=48+50)
Flate             142ms ± 3%        141ms ± 3%  -0.85%  (p=0.003 n=50+48)
GoParser          178ms ± 4%        175ms ± 4%  -1.66%  (p=0.000 n=48+46)
Reflect           520ms ± 2%        512ms ± 2%  -1.44%  (p=0.000 n=45+48)
Tar               200ms ± 3%        198ms ± 4%  -0.61%  (p=0.037 n=47+50)
XML               277ms ± 3%        275ms ± 3%  -0.85%  (p=0.000 n=49+48)
[Geo mean]        482ms             476ms       -1.23%

name        old alloc/op      new alloc/op      delta
Template         36.1MB ± 0%       35.3MB ± 0%  -2.18%  (p=0.008 n=5+5)
Unicode          29.8MB ± 0%       29.3MB ± 0%  -1.58%  (p=0.008 n=5+5)
GoTypes           125MB ± 0%        123MB ± 0%  -2.13%  (p=0.008 n=5+5)
Compiler          531MB ± 0%        513MB ± 0%  -3.40%  (p=0.008 n=5+5)
SSA              2.00GB ± 0%       1.93GB ± 0%  -3.34%  (p=0.008 n=5+5)
Flate            24.5MB ± 0%       24.3MB ± 0%  -1.18%  (p=0.008 n=5+5)
GoParser         29.4MB ± 0%       28.7MB ± 0%  -2.34%  (p=0.008 n=5+5)
Reflect          87.1MB ± 0%       86.0MB ± 0%  -1.33%  (p=0.008 n=5+5)
Tar              35.3MB ± 0%       34.8MB ± 0%  -1.44%  (p=0.008 n=5+5)
XML              47.9MB ± 0%       47.1MB ± 0%  -1.86%  (p=0.008 n=5+5)
[Geo mean]       82.8MB            81.1MB       -2.08%

name        old allocs/op     new allocs/op     delta
Template           352k ± 0%         347k ± 0%  -1.32%  (p=0.008 n=5+5)
Unicode            342k ± 0%         339k ± 0%  -0.66%  (p=0.008 n=5+5)
GoTypes           1.29M ± 0%        1.27M ± 0%  -1.30%  (p=0.008 n=5+5)
Compiler          4.98M ± 0%        4.87M ± 0%  -2.14%  (p=0.008 n=5+5)
SSA               15.7M ± 0%        15.2M ± 0%  -2.86%  (p=0.008 n=5+5)
Flate              233k ± 0%         231k ± 0%  -0.83%  (p=0.008 n=5+5)
GoParser           296k ± 0%         291k ± 0%  -1.54%  (p=0.016 n=5+4)
Reflect           1.05M ± 0%        1.04M ± 0%  -0.65%  (p=0.008 n=5+5)
Tar                343k ± 0%         339k ± 0%  -0.97%  (p=0.008 n=5+5)
XML                432k ± 0%         426k ± 0%  -1.19%  (p=0.008 n=5+5)
[Geo mean]         815k              804k       -1.35%

name        old object-bytes  new object-bytes  delta
Template          505kB ± 0%        505kB ± 0%  -0.01%  (p=0.008 n=5+5)
Unicode           224kB ± 0%        224kB ± 0%    ~     (all equal)
GoTypes          1.82MB ± 0%       1.83MB ± 0%  +0.06%  (p=0.008 n=5+5)
Flate             324kB ± 0%        324kB ± 0%  +0.00%  (p=0.008 n=5+5)
GoParser          402kB ± 0%        402kB ± 0%  +0.04%  (p=0.008 n=5+5)
Reflect          1.39MB ± 0%       1.39MB ± 0%  -0.01%  (p=0.008 n=5+5)
Tar               449kB ± 0%        449kB ± 0%  -0.02%  (p=0.008 n=5+5)
XML               598kB ± 0%        597kB ± 0%  -0.05%  (p=0.008 n=5+5)

Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143
Reviewed-on: https://go-review.googlesource.com/c/114797
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
  • Loading branch information
josharian committed Oct 19, 2018
1 parent 6c631ae commit 2578ac5
Show file tree
Hide file tree
Showing 7 changed files with 236 additions and 344 deletions.
122 changes: 61 additions & 61 deletions src/cmd/compile/internal/gc/ssa.go
Original file line number Diff line number Diff line change
Expand Up @@ -3558,59 +3558,34 @@ func (s *state) intrinsicCall(n *Node) *ssa.Value {
return v
}

type callArg struct {
offset int64
v *ssa.Value
}
type byOffset []callArg

func (x byOffset) Len() int { return len(x) }
func (x byOffset) Swap(i, j int) { x[i], x[j] = x[j], x[i] }
func (x byOffset) Less(i, j int) bool {
return x[i].offset < x[j].offset
}

// intrinsicArgs extracts args from n, evaluates them to SSA values, and returns them.
func (s *state) intrinsicArgs(n *Node) []*ssa.Value {
// This code is complicated because of how walk transforms calls. For a call node,
// each entry in n.List is either an assignment to OINDREGSP which actually
// stores an arg, or an assignment to a temporary which computes an arg
// which is later assigned.
// The args can also be out of order.
// TODO: when walk goes away someday, this code can go away also.
var args []callArg
// Construct map of temps; see comments in s.call about the structure of n.
temps := map[*Node]*ssa.Value{}
for _, a := range n.List.Slice() {
if a.Op != OAS {
s.Fatalf("non-assignment as a function argument %v", a.Op)
s.Fatalf("non-assignment as a temp function argument %v", a.Op)
}
l, r := a.Left, a.Right
switch l.Op {
case ONAME:
// Evaluate and store to "temporary".
// Walk ensures these temporaries are dead outside of n.
temps[l] = s.expr(r)
case OINDREGSP:
// Store a value to an argument slot.
var v *ssa.Value
if x, ok := temps[r]; ok {
// This is a previously computed temporary.
v = x
} else {
// This is an explicit value; evaluate it.
v = s.expr(r)
}
args = append(args, callArg{l.Xoffset, v})
default:
s.Fatalf("function argument assignment target not allowed: %v", l.Op)
if l.Op != ONAME {
s.Fatalf("non-ONAME temp function argument %v", a.Op)
}
// Evaluate and store to "temporary".
// Walk ensures these temporaries are dead outside of n.
temps[l] = s.expr(r)
}
args := make([]*ssa.Value, n.Rlist.Len())
for i, n := range n.Rlist.Slice() {
// Store a value to an argument slot.
if x, ok := temps[n]; ok {
// This is a previously computed temporary.
args[i] = x
continue
}
// This is an explicit value; evaluate it.
args[i] = s.expr(n)
}
sort.Sort(byOffset(args))
res := make([]*ssa.Value, len(args))
for i, a := range args {
res[i] = a.v
}
return res
return args
}

// Calls the function n using the specified call type.
Expand Down Expand Up @@ -3651,7 +3626,7 @@ func (s *state) call(n *Node, k callKind) *ssa.Value {
n2.Pos = fn.Pos
n2.Type = types.Types[TUINT8] // dummy type for a static closure. Could use runtime.funcval if we had it.
closure = s.expr(n2)
// Note: receiver is already assigned in n.List, so we don't
// Note: receiver is already present in n.Rlist, so we don't
// want to set it here.
case OCALLINTER:
if fn.Op != ODOTINTER {
Expand All @@ -3672,32 +3647,43 @@ func (s *state) call(n *Node, k callKind) *ssa.Value {
dowidth(fn.Type)
stksize := fn.Type.ArgWidth() // includes receiver

// Run all argument assignments. The arg slots have already
// been offset by the appropriate amount (+2*widthptr for go/defer,
// +widthptr for interface calls).
// For OCALLMETH, the receiver is set in these statements.
// Run all assignments of temps.
// The temps are introduced to avoid overwriting argument
// slots when arguments themselves require function calls.
s.stmtList(n.List)

// Set receiver (for interface calls)
if rcvr != nil {
argStart := Ctxt.FixedFrameSize()
if k != callNormal {
argStart += int64(2 * Widthptr)
}
addr := s.constOffPtrSP(s.f.Config.Types.UintptrPtr, argStart)
s.store(types.Types[TUINTPTR], addr, rcvr)
}

// Defer/go args
// Store arguments to stack, including defer/go arguments and receiver for method calls.
// These are written in SP-offset order.
argStart := Ctxt.FixedFrameSize()
// Defer/go args.
if k != callNormal {
// Write argsize and closure (args to newproc/deferproc).
argStart := Ctxt.FixedFrameSize()
argsize := s.constInt32(types.Types[TUINT32], int32(stksize))
addr := s.constOffPtrSP(s.f.Config.Types.UInt32Ptr, argStart)
s.store(types.Types[TUINT32], addr, argsize)
addr = s.constOffPtrSP(s.f.Config.Types.UintptrPtr, argStart+int64(Widthptr))
s.store(types.Types[TUINTPTR], addr, closure)
stksize += 2 * int64(Widthptr)
argStart += 2 * int64(Widthptr)
}

// Set receiver (for interface calls).
if rcvr != nil {
addr := s.constOffPtrSP(s.f.Config.Types.UintptrPtr, argStart)
s.store(types.Types[TUINTPTR], addr, rcvr)
}

// Write args.
t := n.Left.Type
args := n.Rlist.Slice()
if n.Op == OCALLMETH {
f := t.Recv()
s.storeArg(args[0], f.Type, argStart+f.Offset)
args = args[1:]
}
for i, n := range args {
f := t.Params().Field(i)
s.storeArg(n, f.Type, argStart+f.Offset)
}

// call target
Expand Down Expand Up @@ -4182,6 +4168,20 @@ func (s *state) storeTypePtrs(t *types.Type, left, right *ssa.Value) {
}
}

func (s *state) storeArg(n *Node, t *types.Type, off int64) {
pt := types.NewPtr(t)
sp := s.constOffPtrSP(pt, off)

if !canSSAType(t) {
a := s.addr(n, false)
s.move(t, sp, a)
return
}

a := s.expr(n)
s.storeType(t, sp, a, 0, false)
}

// slice computes the slice v[i:j:k] and returns ptr, len, and cap of result.
// i,j,k may be nil, in which case they are set to their default value.
// t is a slice, ptr to array, or string type.
Expand Down
44 changes: 26 additions & 18 deletions src/cmd/compile/internal/gc/syntax.go
Original file line number Diff line number Diff line change
Expand Up @@ -603,24 +603,32 @@ const (
OAS2DOTTYPE // List = Rlist (x, ok = I.(int))
OASOP // Left Etype= Right (x += y)
OCALL // Left(List) (function call, method call or type conversion)
OCALLFUNC // Left(List) (function call f(args))
OCALLMETH // Left(List) (direct method call x.Method(args))
OCALLINTER // Left(List) (interface method call x.Method(args))
OCALLPART // Left.Right (method expression x.Method, not called)
OCAP // cap(Left)
OCLOSE // close(Left)
OCLOSURE // func Type { Body } (func literal)
OCOMPLIT // Right{List} (composite literal, not yet lowered to specific form)
OMAPLIT // Type{List} (composite literal, Type is map)
OSTRUCTLIT // Type{List} (composite literal, Type is struct)
OARRAYLIT // Type{List} (composite literal, Type is array)
OSLICELIT // Type{List} (composite literal, Type is slice)
OPTRLIT // &Left (left is composite literal)
OCONV // Type(Left) (type conversion)
OCONVIFACE // Type(Left) (type conversion, to interface)
OCONVNOP // Type(Left) (type conversion, no effect)
OCOPY // copy(Left, Right)
ODCL // var Left (declares Left of type Left.Type)

// OCALLFUNC, OCALLMETH, and OCALLINTER have the same structure.
// Prior to walk, they are: Left(List), where List is all regular arguments.
// If present, Right is an ODDDARG that holds the
// generated slice used in a call to a variadic function.
// After walk, List is a series of assignments to temporaries,
// and Rlist is an updated set of arguments, including any ODDDARG slice.
// TODO(josharian/khr): Use Ninit instead of List for the assignments to temporaries. See CL 114797.
OCALLFUNC // Left(List/Rlist) (function call f(args))
OCALLMETH // Left(List/Rlist) (direct method call x.Method(args))
OCALLINTER // Left(List/Rlist) (interface method call x.Method(args))
OCALLPART // Left.Right (method expression x.Method, not called)
OCAP // cap(Left)
OCLOSE // close(Left)
OCLOSURE // func Type { Body } (func literal)
OCOMPLIT // Right{List} (composite literal, not yet lowered to specific form)
OMAPLIT // Type{List} (composite literal, Type is map)
OSTRUCTLIT // Type{List} (composite literal, Type is struct)
OARRAYLIT // Type{List} (composite literal, Type is array)
OSLICELIT // Type{List} (composite literal, Type is slice)
OPTRLIT // &Left (left is composite literal)
OCONV // Type(Left) (type conversion)
OCONVIFACE // Type(Left) (type conversion, to interface)
OCONVNOP // Type(Left) (type conversion, no effect)
OCOPY // copy(Left, Right)
ODCL // var Left (declares Left of type Left.Type)

// Used during parsing but don't last.
ODCLFUNC // func f() or func (r) f()
Expand Down
Loading

0 comments on commit 2578ac5

Please sign in to comment.