Commits on Oct 19, 2018
  1. cmd/compile: move argument stack construction to SSA generation

    josharian committed May 6, 2018
    The goal of this change is to move work from walk to SSA,
    and simplify things along the way.
    
    This is hard to accomplish cleanly with small incremental changes,
    so this large commit message aims to provide a roadmap to the diff.
    
    High level description:
    
    Prior to this change, walk was responsible for constructing (most of) the stack for function calls.
    ascompatte gathered variadic arguments into a slice.
    It also rewrote n.List from a list of arguments to a list of assignments to stack slots.
    ascompatte was called multiple times to handle the receiver in a method call.
    reorder1 then introduced temporaries into n.List as needed to avoid smashing the stack.
    adjustargs then made extra stack space for go/defer args as needed.
    
    Node to SSA construction evaluated all the statements in n.List,
    and issued the function call, assuming that the stack was correctly constructed.
    Intrinsic calls had to dig around inside n.List to extract the arguments,
    since intrinsics don't use the stack to make function calls.
    
    This change moves stack construction to the SSA construction phase.
    ascompatte, now called walkParams, does all the work that ascompatte and reorder1 did.
    It handles variadic arguments, inserts the method receiver if needed, and allocates temporaries.
    It does not, however, make any assignments to stack slots.
    Instead, it moves the function arguments to n.Rlist, leaving assignments to temporaries in n.List.
    (It would be better to use Ninit instead of List; future work.)
    During SSA construction, after doing all the temporary assignments in n.List,
    the function arguments are assigned to stack slots by
    constructing the appropriate SSA Value, using (*state).storeArg.
    SSA construction also now handles adjustments for go/defer args.
    This change also simplifies intrinsic calls, since we no longer need to undo walk's work.
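    Why temporaries are needed: the toy model below treats the outgoing-argument area as a fixed array that every call from the frame writes into. This is a simplification of a stack-based calling convention, not compiler code; the names `frame`, `callG`, `naive` and `withTemps` are hypothetical. Storing arguments directly in argument order lets a nested call smash an earlier slot. Evaluating call-containing arguments into temporaries first, and then storing every argument in SP-offset order, avoids that.

```go
package main

import "fmt"

// frame models a caller's outgoing-argument area: every call made from
// this frame writes its arguments into the same fixed slots.
type frame struct {
	outArgs [2]int
}

// callG simulates a nested call g(x): it stores its own argument in
// slot 0 of the shared area, clobbering whatever was there, and returns x*10.
func (fr *frame) callG(x int) int {
	fr.outArgs[0] = x
	return fr.outArgs[0] * 10
}

// naive builds the arguments for f(a, g(5)) by storing each one directly
// into its slot, in argument order. The nested call smashes slot 0.
func naive(a int) [2]int {
	var fr frame
	fr.outArgs[0] = a
	fr.outArgs[1] = fr.callG(5)
	return fr.outArgs
}

// withTemps first evaluates the argument containing a call into a
// temporary, then copies all arguments into their slots in SP-offset order.
func withTemps(a int) [2]int {
	var fr frame
	t := fr.callG(5)
	fr.outArgs[0] = a
	fr.outArgs[1] = t
	return fr.outArgs
}

func main() {
	fmt.Println(naive(1))     // [5 50]: argument a was overwritten
	fmt.Println(withTemps(1)) // [1 50]
}
```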
    
    Along the way, we simplify nodarg by pushing the fp==1 case to its callers, where it fits nicely.
    
    Generated code differences:
    
    The old approach applied a few optimizations along the way.
    f(g()) was rewritten to do a block copy of function results to function arguments.
    And reorder1 avoided introducing the final "save the stack" temporary in n.List.
    
    The f(g()) block copy optimization never actually triggered, because the order pass had already rewritten g() away, so it has been removed.
    
    SSA optimizations mostly obviated the need for reorder1's optimization of avoiding the final temporary.
    The exception was when the temporary's type was not SSA-able;
    in that case, we got a Move into an autotmp and then an immediate Move onto the stack,
    with the autotmp never read or used again.
    This change introduces a new rewrite rule to detect such pointless double Moves
    and collapse them into a single Move.
    This is actually more powerful than the original optimization,
    since the original optimization relied on the imprecise Node.HasCall calculation.
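    The double-Move collapse can be illustrated as a peephole pass over a flat list of copies. This is a toy sketch, not the SSA rewrite rule itself: `Move`, the `autotmp` name prefix used to identify temporaries, and `collapseDoubleMoves` are hypothetical, and the sketch assumes source and destination never overlap.

```go
package main

import (
	"fmt"
	"strings"
)

// Move is a hypothetical, simplified memory-to-memory copy: Dst = Src.
type Move struct{ Dst, Src string }

// collapseDoubleMoves rewrites the adjacent pair
//
//	Move autotmp <- src
//	Move dst <- autotmp
//
// into a single Move dst <- src, provided the autotmp is read nowhere else.
// It assumes dst and src do not overlap.
func collapseDoubleMoves(ops []Move) []Move {
	// Count how many times each location is read.
	reads := make(map[string]int)
	for _, op := range ops {
		reads[op.Src]++
	}
	var out []Move
	for i := 0; i < len(ops); i++ {
		if i+1 < len(ops) {
			a, b := ops[i], ops[i+1]
			if strings.HasPrefix(a.Dst, "autotmp") && b.Src == a.Dst && reads[a.Dst] == 1 {
				out = append(out, Move{Dst: b.Dst, Src: a.Src})
				i++ // skip the second Move; it has been folded in
				continue
			}
		}
		out = append(out, ops[i])
	}
	return out
}

func main() {
	ops := []Move{{"autotmp_1", "x"}, {"SP+8", "autotmp_1"}}
	fmt.Println(collapseDoubleMoves(ops)) // [{SP+8 x}]
}
```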
    
    The other significant difference in the generated code is that the stack is now constructed
    completely in SP-offset order. Prior to this change, the stack was constructed somewhat
    haphazardly: first the final argument that Node.HasCall deemed to require a temporary,
    then other arguments, then the method receiver, then the defer/go args.
    SP-offset is probably a good default order. See future work.
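    A minimal sketch of what "SP-offset order" means, assuming a simplified layout in which arguments are placed one after another from SP+0, each at the next offset satisfying its alignment. This ignores any architecture-specific fixed frame space, result slots, and the extra go/defer arguments; `arg` and `spOffsets` are hypothetical names.

```go
package main

import "fmt"

// arg describes one outgoing argument: its size and required alignment
// in bytes. Alignment must be a power of two.
type arg struct {
	name  string
	size  int64
	align int64
}

// spOffsets assigns each argument, in order, the next offset that
// satisfies its alignment. Emitting stores by iterating over the result
// writes the stack in increasing SP-offset order.
func spOffsets(args []arg) []int64 {
	offs := make([]int64, len(args))
	var off int64
	for i, a := range args {
		off = (off + a.align - 1) &^ (a.align - 1) // round up to alignment
		offs[i] = off
		off += a.size
	}
	return offs
}

func main() {
	args := []arg{
		{"recv", 8, 8}, // pointer receiver on a 64-bit target
		{"n", 4, 4},    // int32
		{"x", 8, 8},    // int64, padded up to offset 16
	}
	for i, off := range spOffsets(args) {
		fmt.Printf("%s at SP+%d\n", args[i].name, off)
	}
}
```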
    
    There are a few minor object file size changes as a result of this change.
    I investigated some regressions in early versions of this change.
    
    One regression (in archive/tar) was the addition of a single CMPQ instruction,
    which would be eliminated were this TODO from flagalloc to be done:
    	// TODO: Remove original instructions if they are never used.
    
    One regression (in text/template) was an ADDQconstmodify that is now
    a regular MOVQLoad+ADDQconst+MOVQStore, due to an unlucky change
    in the order in which arguments are written. The new argument order
    can also be luckier elsewhere, so this appears to be a wash.
    
    All in all, though there will be minor winners and losers,
    this change appears to be performance neutral.
    
    Future work:
    
    Move loading the result of function calls to SSA construction; eliminate OINDREGSP.
    
    Consider pushing stack construction deeper into SSA world, perhaps in an arch-specific pass.
    Among other benefits, this would make it easier to transition to a new calling convention.
    This would require rethinking the handling of stack conflicts and is non-trivial.
    
    Figure out some clean way to indicate that stack construction Stores/Moves
    do not alias each other, so that subsequent passes may do things like
    CSE+tighten shared stack setup, do DSE using non-first Stores, etc.
    This would allow us to eliminate the minor text/template regression.
    
    Possibly make assignments to stack slots not treated as statements by DWARF.
    
    Compiler benchmarks:
    
    name        old time/op       new time/op       delta
    Template          182ms ± 2%        179ms ± 2%  -1.69%  (p=0.000 n=47+48)
    Unicode          86.3ms ± 5%       85.1ms ± 4%  -1.36%  (p=0.001 n=50+50)
    GoTypes           646ms ± 1%        642ms ± 1%  -0.63%  (p=0.000 n=49+48)
    Compiler          2.89s ± 1%        2.86s ± 2%  -1.36%  (p=0.000 n=48+50)
    SSA               8.47s ± 1%        8.37s ± 2%  -1.22%  (p=0.000 n=47+50)
    Flate             122ms ± 2%        121ms ± 2%  -0.66%  (p=0.000 n=47+45)
    GoParser          147ms ± 2%        146ms ± 2%  -0.53%  (p=0.006 n=46+49)
    Reflect           406ms ± 2%        403ms ± 2%  -0.76%  (p=0.000 n=48+43)
    Tar               162ms ± 3%        162ms ± 4%    ~     (p=0.191 n=46+50)
    XML               223ms ± 2%        222ms ± 2%  -0.37%  (p=0.031 n=45+49)
    [Geo mean]        382ms             378ms       -0.89%
    
    name        old user-time/op  new user-time/op  delta
    Template          219ms ± 3%        216ms ± 3%  -1.56%  (p=0.000 n=50+48)
    Unicode           109ms ± 6%        109ms ± 5%    ~     (p=0.190 n=50+49)
    GoTypes           836ms ± 2%        828ms ± 2%  -0.96%  (p=0.000 n=49+48)
    Compiler          3.87s ± 2%        3.80s ± 1%  -1.81%  (p=0.000 n=49+46)
    SSA               12.0s ± 1%        11.8s ± 1%  -2.01%  (p=0.000 n=48+50)
    Flate             142ms ± 3%        141ms ± 3%  -0.85%  (p=0.003 n=50+48)
    GoParser          178ms ± 4%        175ms ± 4%  -1.66%  (p=0.000 n=48+46)
    Reflect           520ms ± 2%        512ms ± 2%  -1.44%  (p=0.000 n=45+48)
    Tar               200ms ± 3%        198ms ± 4%  -0.61%  (p=0.037 n=47+50)
    XML               277ms ± 3%        275ms ± 3%  -0.85%  (p=0.000 n=49+48)
    [Geo mean]        482ms             476ms       -1.23%
    
    name        old alloc/op      new alloc/op      delta
    Template         36.1MB ± 0%       35.3MB ± 0%  -2.18%  (p=0.008 n=5+5)
    Unicode          29.8MB ± 0%       29.3MB ± 0%  -1.58%  (p=0.008 n=5+5)
    GoTypes           125MB ± 0%        123MB ± 0%  -2.13%  (p=0.008 n=5+5)
    Compiler          531MB ± 0%        513MB ± 0%  -3.40%  (p=0.008 n=5+5)
    SSA              2.00GB ± 0%       1.93GB ± 0%  -3.34%  (p=0.008 n=5+5)
    Flate            24.5MB ± 0%       24.3MB ± 0%  -1.18%  (p=0.008 n=5+5)
    GoParser         29.4MB ± 0%       28.7MB ± 0%  -2.34%  (p=0.008 n=5+5)
    Reflect          87.1MB ± 0%       86.0MB ± 0%  -1.33%  (p=0.008 n=5+5)
    Tar              35.3MB ± 0%       34.8MB ± 0%  -1.44%  (p=0.008 n=5+5)
    XML              47.9MB ± 0%       47.1MB ± 0%  -1.86%  (p=0.008 n=5+5)
    [Geo mean]       82.8MB            81.1MB       -2.08%
    
    name        old allocs/op     new allocs/op     delta
    Template           352k ± 0%         347k ± 0%  -1.32%  (p=0.008 n=5+5)
    Unicode            342k ± 0%         339k ± 0%  -0.66%  (p=0.008 n=5+5)
    GoTypes           1.29M ± 0%        1.27M ± 0%  -1.30%  (p=0.008 n=5+5)
    Compiler          4.98M ± 0%        4.87M ± 0%  -2.14%  (p=0.008 n=5+5)
    SSA               15.7M ± 0%        15.2M ± 0%  -2.86%  (p=0.008 n=5+5)
    Flate              233k ± 0%         231k ± 0%  -0.83%  (p=0.008 n=5+5)
    GoParser           296k ± 0%         291k ± 0%  -1.54%  (p=0.016 n=5+4)
    Reflect           1.05M ± 0%        1.04M ± 0%  -0.65%  (p=0.008 n=5+5)
    Tar                343k ± 0%         339k ± 0%  -0.97%  (p=0.008 n=5+5)
    XML                432k ± 0%         426k ± 0%  -1.19%  (p=0.008 n=5+5)
    [Geo mean]         815k              804k       -1.35%
    
    name        old object-bytes  new object-bytes  delta
    Template          505kB ± 0%        505kB ± 0%  -0.01%  (p=0.008 n=5+5)
    Unicode           224kB ± 0%        224kB ± 0%    ~     (all equal)
    GoTypes          1.82MB ± 0%       1.83MB ± 0%  +0.06%  (p=0.008 n=5+5)
    Flate             324kB ± 0%        324kB ± 0%  +0.00%  (p=0.008 n=5+5)
    GoParser          402kB ± 0%        402kB ± 0%  +0.04%  (p=0.008 n=5+5)
    Reflect          1.39MB ± 0%       1.39MB ± 0%  -0.01%  (p=0.008 n=5+5)
    Tar               449kB ± 0%        449kB ± 0%  -0.02%  (p=0.008 n=5+5)
    XML               598kB ± 0%        597kB ± 0%  -0.05%  (p=0.008 n=5+5)
    
    Change-Id: Ifc9d5c1bd01f90171414b8fb18ffe2290d271143
    Reviewed-on: https://go-review.googlesource.com/c/114797
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: David Chase <drchase@google.com>
    Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Commits on Oct 17, 2018
  1. test: limit runoutput concurrency with -v

    josharian committed Oct 17, 2018
    This appears to have simply been an oversight.
    
    Change-Id: Ia5d1309b3ebc99c9abbf0282397693272d8178aa
    Reviewed-on: https://go-review.googlesource.com/c/142885
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
Commits on Oct 15, 2018
  1. cmd/compile: fuse before branchelim

    josharian committed May 27, 2018
    The branchelim pass works better after fuse.
    Running fuse before branchelim also increases
    the stability of generated code amidst other compiler changes,
    which was the original motivation behind this change.
    
    The fuse pass is not cheap enough to run in its entirety
    before branchelim, but the most important half of it is.
    This change makes it possible to run "plain fuse" independently
    and does so before branchelim.
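    The idea behind "plain fuse" can be sketched on a toy control-flow graph: a block that ends in an unconditional jump is merged with its successor when that successor has no other predecessor. The `Block` type, its `Kind` strings, and `fusePlain` are hypothetical simplifications; the real pass works on the compiler's SSA blocks and also handles control values and other details omitted here.

```go
package main

import "fmt"

// Block is a hypothetical, stripped-down basic block.
type Block struct {
	Kind   string // "Plain" (unconditional jump), "If", "Ret", ...
	Values []string
	Succs  []*Block
	Preds  []*Block
}

// fusePlain merges b with its successor c when b is a Plain block and
// c has no other predecessor. It reports whether a merge happened.
func fusePlain(b *Block) bool {
	if b.Kind != "Plain" || len(b.Succs) != 1 {
		return false
	}
	c := b.Succs[0]
	if c == b || len(c.Preds) != 1 {
		return false
	}
	b.Values = append(b.Values, c.Values...)
	b.Kind = c.Kind // b now ends the way c ended
	b.Succs = c.Succs
	// Redirect c's successors so they point back at b.
	for _, s := range c.Succs {
		for i, p := range s.Preds {
			if p == c {
				s.Preds[i] = b
			}
		}
	}
	c.Values, c.Succs, c.Preds = nil, nil, nil
	return true
}

// link adds an edge from -> to.
func link(from, to *Block) {
	from.Succs = append(from.Succs, to)
	to.Preds = append(to.Preds, from)
}

func main() {
	a := &Block{Kind: "Plain", Values: []string{"x = 1"}}
	b := &Block{Kind: "Plain", Values: []string{"y = x + 1"}}
	d := &Block{Kind: "Ret", Values: []string{"return y"}}
	link(a, b)
	link(b, d)
	for fusePlain(a) {
	}
	// a now holds all three values and ends in a return.
	fmt.Println(a.Kind, a.Values)
}
```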
    
    During make.bash, elimIf occurrences increase from 4244 to 4288 (1%),
    and elimIfElse occurrences increase from 989 to 1079 (9%).
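    For orientation, the two counters appear to correspond to two source shapes: an `if` with no `else` (a "triangle" in the CFG) and an `if`/`else` (a "diamond"). The functions below are illustrative only; whether a particular version of the compiler actually turns them into a conditional select (for example CMOV on amd64) depends on the target and its cost heuristics, which can be checked with `go build -gcflags=-S`.

```go
package main

import "fmt"

// maxIf has an if without an else: the single assignment in the
// taken branch is a candidate for a conditional select.
func maxIf(a, b int) int {
	x := a
	if b > a {
		x = b
	}
	return x
}

// pickIfElse has both branches, forming a diamond in the CFG.
func pickIfElse(c bool, a, b int) int {
	var x int
	if c {
		x = a
	} else {
		x = b
	}
	return x
}

func main() {
	fmt.Println(maxIf(3, 7), pickIfElse(false, 1, 2))
}
```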
    
    Toolspeed impact is marginal; plain fuse pays for itself.
    
    name        old time/op       new time/op       delta
    Template          189ms ± 2%        189ms ± 2%    ~     (p=0.890 n=45+46)
    Unicode          93.2ms ± 5%       93.4ms ± 7%    ~     (p=0.790 n=48+48)
    GoTypes           662ms ± 4%        660ms ± 4%    ~     (p=0.186 n=48+49)
    Compiler          2.89s ± 4%        2.91s ± 3%  +0.89%  (p=0.050 n=49+44)
    SSA               8.23s ± 2%        8.21s ± 1%    ~     (p=0.165 n=46+44)
    Flate             123ms ± 4%        123ms ± 3%  +0.58%  (p=0.031 n=47+49)
    GoParser          154ms ± 4%        154ms ± 4%    ~     (p=0.492 n=49+48)
    Reflect           430ms ± 4%        429ms ± 4%    ~     (p=1.000 n=48+48)
    Tar               171ms ± 3%        170ms ± 4%    ~     (p=0.122 n=48+48)
    XML               232ms ± 3%        232ms ± 2%    ~     (p=0.850 n=46+49)
    [Geo mean]        394ms             394ms       +0.02%
    
    name        old user-time/op  new user-time/op  delta
    Template          236ms ± 5%        236ms ± 4%    ~     (p=0.934 n=50+50)
    Unicode           132ms ± 7%        130ms ± 9%    ~     (p=0.087 n=50+50)
    GoTypes           861ms ± 3%        867ms ± 4%    ~     (p=0.124 n=48+50)
    Compiler          3.93s ± 4%        3.94s ± 3%    ~     (p=0.584 n=49+44)
    SSA               12.2s ± 2%        12.3s ± 1%    ~     (p=0.610 n=46+45)
    Flate             149ms ± 4%        150ms ± 4%    ~     (p=0.194 n=48+49)
    GoParser          193ms ± 5%        191ms ± 6%    ~     (p=0.239 n=49+50)
    Reflect           553ms ± 5%        556ms ± 5%    ~     (p=0.091 n=49+49)
    Tar               218ms ± 5%        218ms ± 5%    ~     (p=0.359 n=49+50)
    XML               299ms ± 5%        298ms ± 4%    ~     (p=0.482 n=50+49)
    [Geo mean]        516ms             516ms       -0.01%
    
    name        old alloc/op      new alloc/op      delta
    Template         36.3MB ± 0%       36.3MB ± 0%  -0.02%  (p=0.000 n=49+49)
    Unicode          29.7MB ± 0%       29.7MB ± 0%    ~     (p=0.270 n=50+50)
    GoTypes           126MB ± 0%        126MB ± 0%  -0.34%  (p=0.000 n=50+49)
    Compiler          534MB ± 0%        531MB ± 0%  -0.50%  (p=0.000 n=50+50)
    SSA              1.98GB ± 0%       1.98GB ± 0%  -0.06%  (p=0.000 n=49+49)
    Flate            24.6MB ± 0%       24.6MB ± 0%  -0.29%  (p=0.000 n=50+50)
    GoParser         29.5MB ± 0%       29.4MB ± 0%  -0.15%  (p=0.000 n=49+50)
    Reflect          87.3MB ± 0%       87.2MB ± 0%  -0.13%  (p=0.000 n=49+50)
    Tar              35.6MB ± 0%       35.5MB ± 0%  -0.17%  (p=0.000 n=50+50)
    XML              48.2MB ± 0%       48.0MB ± 0%  -0.30%  (p=0.000 n=48+50)
    [Geo mean]       83.1MB            82.9MB       -0.20%
    
    name        old allocs/op     new allocs/op     delta
    Template           352k ± 0%         352k ± 0%  -0.01%  (p=0.004 n=49+49)
    Unicode            341k ± 0%         341k ± 0%    ~     (p=0.341 n=48+50)
    GoTypes           1.28M ± 0%        1.28M ± 0%  -0.03%  (p=0.000 n=50+49)
    Compiler          4.96M ± 0%        4.96M ± 0%  -0.05%  (p=0.000 n=50+49)
    SSA               15.5M ± 0%        15.5M ± 0%  -0.01%  (p=0.000 n=50+49)
    Flate              233k ± 0%         233k ± 0%  +0.01%  (p=0.032 n=49+49)
    GoParser           294k ± 0%         294k ± 0%    ~     (p=0.052 n=46+48)
    Reflect           1.04M ± 0%        1.04M ± 0%    ~     (p=0.171 n=50+47)
    Tar                343k ± 0%         343k ± 0%  -0.03%  (p=0.000 n=50+50)
    XML                429k ± 0%         429k ± 0%  -0.04%  (p=0.000 n=50+50)
    [Geo mean]         812k              812k       -0.02%
    
    Object files grow slightly; branchelim often increases binary size, at least on amd64.
    
    name        old object-bytes  new object-bytes  delta
    Template          509kB ± 0%        509kB ± 0%  -0.01%  (p=0.008 n=5+5)
    Unicode           224kB ± 0%        224kB ± 0%    ~     (all equal)
    GoTypes          1.84MB ± 0%       1.84MB ± 0%  +0.00%  (p=0.008 n=5+5)
    Compiler         6.71MB ± 0%       6.71MB ± 0%  +0.01%  (p=0.008 n=5+5)
    SSA              21.2MB ± 0%       21.2MB ± 0%  +0.01%  (p=0.008 n=5+5)
    Flate             324kB ± 0%        324kB ± 0%  -0.00%  (p=0.008 n=5+5)
    GoParser          404kB ± 0%        404kB ± 0%  -0.02%  (p=0.008 n=5+5)
    Reflect          1.40MB ± 0%       1.40MB ± 0%  +0.09%  (p=0.008 n=5+5)
    Tar               452kB ± 0%        452kB ± 0%  +0.06%  (p=0.008 n=5+5)
    XML               596kB ± 0%        596kB ± 0%  +0.00%  (p=0.008 n=5+5)
    [Geo mean]       1.04MB            1.04MB       +0.01%
    
    Change-Id: I535c711b85380ff657fc0f022bebd9cb14ddd07f
    Reviewed-on: https://go-review.googlesource.com/c/129378
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Keith Randall <khr@golang.org>
Commits on Oct 10, 2018
  1. cmd/compile: remove some inl budget hacks

    josharian committed Oct 9, 2018
    Prior to stack tracing, inlining could cause
    dead pointers to be kept alive in some loops.
    See #18336 and CL 31674.
    
    The adjustment removed by this change preserved the inlining status quo
    in the face of Node structure changes, to avoid creating new problems.
    Now that stack tracing provides precision, these hacks can be removed.
    
    Of course, our inlining code model is already hacky (#17566),
    but at least now there will be fewer epicyclical hacks.
    
    Newly inline-able functions in std cmd as a result of this change:
    
    hash/adler32/adler32.go:65:6: can inline (*digest).UnmarshalBinary
    hash/fnv/fnv.go:281:6: can inline (*sum32).UnmarshalBinary
    hash/fnv/fnv.go:292:6: can inline (*sum32a).UnmarshalBinary
    reflect/value.go:1298:6: can inline Value.OverflowComplex
    compress/bzip2/bit_reader.go:25:6: can inline newBitReader
    encoding/xml/xml.go:365:6: can inline (*Decoder).switchToReader
    vendor/golang_org/x/crypto/cryptobyte/builder.go:77:6: can inline (*Builder).AddUint16
    crypto/x509/x509.go:1851:58: can inline buildExtensions.func2.1.1
    crypto/x509/x509.go:1871:58: can inline buildExtensions.func2.3.1
    crypto/x509/x509.go:1883:58: can inline buildExtensions.func2.4.1
    cmd/vet/internal/cfg/builder.go:463:6: can inline (*builder).labeledBlock
    crypto/tls/handshake_messages.go:1450:6: can inline (*newSessionTicketMsg).marshal
    crypto/tls/handshake_server.go:769:6: can inline (*serverHandshakeState).clientHelloInfo
    crypto/tls/handshake_messages.go:1171:6: can inline (*nextProtoMsg).unmarshal
    cmd/link/internal/amd64/obj.go:40:6: can inline Init
    cmd/link/internal/ppc64/obj.go:40:6: can inline Init
    net/http/httputil/persist.go:54:6: can inline NewServerConn
    net/http/fcgi/child.go:83:6: can inline newResponse
    cmd/compile/internal/ssa/poset.go:245:6: can inline (*poset).newnode
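    The list above uses the `file:line:col: can inline F` format that the compiler prints for `go build -gcflags=-m`. As a hypothetical illustration (the `digest` type here is a stand-in, not the hash/adler32 one), a tiny leaf method like the one below is typically well under the inlining budget, so building it with `-gcflags=-m` would be expected to report it as inlinable.

```go
package main

import "fmt"

// digest is a hypothetical stand-in for a small hash state.
type digest struct{ sum uint32 }

// Reset is a tiny leaf method with no calls or loops; methods like this
// are typically reported as "can inline (*digest).Reset" under -gcflags=-m.
func (d *digest) Reset() { d.sum = 1 }

func main() {
	var d digest
	d.Reset()
	fmt.Println(d.sum)
}
```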
    
    Change-Id: I19e8e383a6273849673d35189a9358870665f82f
    Reviewed-on: https://go-review.googlesource.com/c/141117
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
    Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Commits on Sep 8, 2018
  1. cmd/compile: move v.Pos.line check to warnRule

    josharian committed May 28, 2018
    This simplifies the rewrite rules.
    
    Change-Id: Iff062297d42a23cb31ad55e8c733842ecbc07da2
    Reviewed-on: https://go-review.googlesource.com/129377
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Cherry Zhang <cherryyz@google.com>
Commits on Sep 4, 2018
  1. cmd/compile: prefer rematerializeable arg0 for HMUL

    josharian committed Jun 16, 2018
    This prevents accidental regalloc regressions
    that otherwise can occur from unrelated changes.
    
    Change-Id: Iea356fb1a24766361fce13748dc1b46e57b21cea
    Reviewed-on: https://go-review.googlesource.com/129375
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Cherry Zhang <cherryyz@google.com>
  2. encoding/binary: simplify Read and Write

    josharian committed Sep 1, 2018
    There's no need to manually manage the backing slice for bs.
    Removing it simplifies the code, removes some allocations,
    and speeds it up slightly.
    
    Fixes #27403
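    For context, this is the public API whose internals the change simplifies. The sketch below only exercises `binary.Write` and `binary.Read` on a slice of fixed-size values; it does not show the internal buffer handling. `roundTrip` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// roundTrip encodes in as little-endian int32s, then decodes it back.
// It also returns the number of encoded bytes.
func roundTrip(in []int32) ([]int32, int, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, in); err != nil {
		return nil, 0, err
	}
	n := buf.Len() // 4 bytes per int32
	// Read fills a pre-sized slice of the same fixed-size element type.
	out := make([]int32, len(in))
	if err := binary.Read(&buf, binary.LittleEndian, out); err != nil {
		return nil, 0, err
	}
	return out, n, nil
}

func main() {
	out, n, err := roundTrip([]int32{1, -2, 300})
	if err != nil {
		panic(err)
	}
	fmt.Println(n, out)
}
```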
    
    name                     old time/op    new time/op    delta
    ReadSlice1000Int32s-8      6.39µs ± 1%    6.31µs ± 1%   -1.37%  (p=0.000 n=27+27)
    ReadStruct-8               1.25µs ± 2%    1.23µs ± 2%   -1.06%  (p=0.003 n=30+29)
    ReadInts-8                  301ns ± 0%     297ns ± 1%   -1.21%  (p=0.000 n=27+30)
    WriteInts-8                 325ns ± 1%     320ns ± 1%   -1.59%  (p=0.000 n=26+29)
    WriteSlice1000Int32s-8     6.60µs ± 0%    6.52µs ± 0%   -1.23%  (p=0.000 n=28+27)
    PutUint16-8                0.72ns ± 2%    0.71ns ± 2%     ~     (p=0.286 n=30+30)
    PutUint32-8                0.71ns ± 1%    0.71ns ± 0%   -0.42%  (p=0.003 n=30+25)
    PutUint64-8                0.78ns ± 2%    0.78ns ± 0%   -0.55%  (p=0.001 n=30+27)
    LittleEndianPutUint16-8    0.57ns ± 0%    0.57ns ± 0%     ~     (all equal)
    LittleEndianPutUint32-8    0.57ns ± 0%    0.57ns ± 0%     ~     (all equal)
    LittleEndianPutUint64-8    0.57ns ± 0%    0.57ns ± 0%     ~     (all equal)
    PutUvarint32-8             23.1ns ± 1%    23.1ns ± 1%     ~     (p=0.925 n=26+29)
    PutUvarint64-8             57.5ns ± 2%    57.3ns ± 1%     ~     (p=0.338 n=30+26)
    [Geo mean]                 23.0ns         22.9ns        -0.61%
    
    name                     old speed      new speed      delta
    ReadSlice1000Int32s-8     626MB/s ± 1%   634MB/s ± 1%   +1.38%  (p=0.000 n=27+27)
    ReadStruct-8             60.2MB/s ± 2%  60.8MB/s ± 2%   +1.08%  (p=0.002 n=30+29)
    ReadInts-8                100MB/s ± 1%   101MB/s ± 1%   +1.24%  (p=0.000 n=27+30)
    WriteInts-8              92.2MB/s ± 1%  93.6MB/s ± 1%   +1.56%  (p=0.000 n=26+29)
    WriteSlice1000Int32s-8    606MB/s ± 0%   614MB/s ± 0%   +1.24%  (p=0.000 n=28+27)
    PutUint16-8              2.80GB/s ± 1%  2.80GB/s ± 1%     ~     (p=0.095 n=28+29)
    PutUint32-8              5.61GB/s ± 1%  5.62GB/s ± 1%     ~     (p=0.069 n=27+28)
    PutUint64-8              10.2GB/s ± 1%  10.2GB/s ± 0%   +0.15%  (p=0.039 n=27+27)
    LittleEndianPutUint16-8  3.50GB/s ± 1%  3.50GB/s ± 1%     ~     (p=0.552 n=30+29)
    LittleEndianPutUint32-8  7.01GB/s ± 1%  7.02GB/s ± 1%     ~     (p=0.160 n=29+27)
    LittleEndianPutUint64-8  14.0GB/s ± 1%  14.0GB/s ± 1%     ~     (p=0.413 n=29+29)
    PutUvarint32-8            174MB/s ± 1%   173MB/s ± 1%     ~     (p=0.648 n=25+30)
    PutUvarint64-8            139MB/s ± 2%   140MB/s ± 1%     ~     (p=0.271 n=30+26)
    [Geo mean]                906MB/s        911MB/s        +0.55%
    
    name                     old alloc/op   new alloc/op   delta
    ReadSlice1000Int32s-8      4.14kB ± 0%    4.13kB ± 0%   -0.19%  (p=0.000 n=30+30)
    ReadStruct-8                 200B ± 0%      200B ± 0%     ~     (all equal)
    ReadInts-8                  64.0B ± 0%     32.0B ± 0%  -50.00%  (p=0.000 n=30+30)
    WriteInts-8                  112B ± 0%       64B ± 0%  -42.86%  (p=0.000 n=30+30)
    WriteSlice1000Int32s-8     4.14kB ± 0%    4.13kB ± 0%   -0.19%  (p=0.000 n=30+30)
    PutUint16-8                 0.00B          0.00B          ~     (all equal)
    PutUint32-8                 0.00B          0.00B          ~     (all equal)
    PutUint64-8                 0.00B          0.00B          ~     (all equal)
    LittleEndianPutUint16-8     0.00B          0.00B          ~     (all equal)
    LittleEndianPutUint32-8     0.00B          0.00B          ~     (all equal)
    LittleEndianPutUint64-8     0.00B          0.00B          ~     (all equal)
    PutUvarint32-8              0.00B          0.00B          ~     (all equal)
    PutUvarint64-8              0.00B          0.00B          ~     (all equal)
    [Geo mean]                   476B           370B       -22.22%
    
    name                     old allocs/op  new allocs/op  delta
    ReadSlice1000Int32s-8        3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=30+30)
    ReadStruct-8                 16.0 ± 0%      16.0 ± 0%     ~     (all equal)
    ReadInts-8                   8.00 ± 0%      8.00 ± 0%     ~     (all equal)
    WriteInts-8                  14.0 ± 0%      14.0 ± 0%     ~     (all equal)
    WriteSlice1000Int32s-8       3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=30+30)
    PutUint16-8                  0.00           0.00          ~     (all equal)
    PutUint32-8                  0.00           0.00          ~     (all equal)
    PutUint64-8                  0.00           0.00          ~     (all equal)
    LittleEndianPutUint16-8      0.00           0.00          ~     (all equal)
    LittleEndianPutUint32-8      0.00           0.00          ~     (all equal)
    LittleEndianPutUint64-8      0.00           0.00          ~     (all equal)
    PutUvarint32-8               0.00           0.00          ~     (all equal)
    PutUvarint64-8               0.00           0.00          ~     (all equal)
    [Geo mean]                   6.94           5.90       -14.97%
    
    Change-Id: I3790b93e4190d98621d5f2c47e42929a18f56c2e
    Reviewed-on: https://go-review.googlesource.com/133135
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Commits on Jul 9, 2018
  1. cmd/link/internal/sym: add sizeof tests

    josharian committed Jul 9, 2018
    CL 121916 showed that sym.Symbol matters for linker performance.
    Prevent accidental regression.
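    A sketch of a table-driven size check of this kind, assuming expected sizes are recorded separately for 32-bit and 64-bit targets. The `symbol` struct is a hypothetical stand-in, not the real sym.Symbol layout, and in the Go tree such a check would live in a `_test.go` file and report through `t.Errorf` rather than returning strings.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// symbol is a hypothetical stand-in for a performance-sensitive struct.
type symbol struct {
	Name  string
	Value int64
	Size  int64
}

// is64bit reports whether pointers are 8 bytes on the target.
const is64bit = unsafe.Sizeof(uintptr(0)) == 8

// sizeMismatches checks each type's size against the expected size for
// the current word size and describes any mismatch.
func sizeMismatches() []string {
	tests := []struct {
		val    interface{} // value of the type to check
		_32bit uintptr     // expected size on 32-bit targets
		_64bit uintptr     // expected size on 64-bit targets
	}{
		{symbol{}, 24, 32},
	}
	var bad []string
	for _, tt := range tests {
		want := tt._32bit
		if is64bit {
			want = tt._64bit
		}
		if got := reflect.TypeOf(tt.val).Size(); got != want {
			bad = append(bad, fmt.Sprintf("%T: got %d bytes, want %d", tt.val, got, want))
		}
	}
	return bad
}

func main() {
	for _, msg := range sizeMismatches() {
		fmt.Println(msg)
	}
}
```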
    
    Change-Id: I5fd998c91fdeef9e721bc3f6e30f775b81103e95
    Reviewed-on: https://go-review.googlesource.com/122716
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Commits on Jun 4, 2018
  1. text/tabwriter: fix BenchmarkPyramid and BenchmarkRagged again

    josharian authored and griesemer committed Jun 3, 2018
    These were added in CL 106979. I got them wrong.
    They were fixed in CL 111643. They were still wrong.
    Hopefully this change will be the last fix.
    
    With this fix, CL 106979 is allocation-neutral for BenchmarkRagged.
    The performance results for BenchmarkPyramid reported in CL 111643 stand.
    
    Change-Id: Id6a522e6602e5df31f504adf5a3bec9969c18649
    Reviewed-on: https://go-review.googlesource.com/116015
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Robert Griesemer <gri@golang.org>
Commits on May 31, 2018
  1. encoding/hex: improve Decode and DecodeString docs

    josharian committed May 31, 2018
    Simplify the wording of both.
    
    Make the DecodeString docs more accurate:
    DecodeString returns a slice, not a string.
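    A short usage example of the two functions whose docs were changed: `hex.DecodeString` returns a `[]byte`, while `hex.Decode` writes into a caller-provided destination and returns the number of bytes written. `decodeTwice` is a hypothetical helper.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// decodeTwice decodes s with both DecodeString and Decode.
func decodeTwice(s string) ([]byte, []byte, error) {
	// DecodeString returns the decoded bytes as a []byte, not a string.
	a, err := hex.DecodeString(s)
	if err != nil {
		return nil, nil, err
	}
	// Decode writes into dst, which must hold DecodedLen(len(src)) bytes.
	dst := make([]byte, hex.DecodedLen(len(s)))
	n, err := hex.Decode(dst, []byte(s))
	if err != nil {
		return nil, nil, err
	}
	return a, dst[:n], nil
}

func main() {
	a, b, err := decodeTwice("48656c6c6f")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(a), string(b)) // Hello Hello
}
```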
    
    Change-Id: Iba7003f55fb0a37aafcbeee59a30492c0f68aa4e
    Reviewed-on: https://go-review.googlesource.com/115615
    Reviewed-by: Ian Lance Taylor <iant@golang.org>
Commits on May 29, 2018
  1. cmd/compile: fix trivial typos in comments

    josharian committed May 28, 2018
    Change-Id: I04880d87e317a1140ec12da6ec5e788991719760
    Reviewed-on: https://go-review.googlesource.com/114936
    Reviewed-by: Ian Lance Taylor <iant@golang.org>
  2. test: gofmt bounds.go

    josharian committed May 28, 2018
    Change-Id: I8b462e20064658120afc8eb1cbac926254d1e24e
    Reviewed-on: https://go-review.googlesource.com/114937
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Ian Lance Taylor <iant@golang.org>
Commits on May 25, 2018
  1. cmd/compile: make -W and -w headers and footers clearer

    josharian committed May 24, 2018
    -W and -w turn on printing of Nodes for both order and walk.
    I have found their output mildly incomprehensible for years.
    Improve it, at long last.
    
    Change-Id: Ia05d77e59aa741c2dfc9fcca07f45019420b655e
    Reviewed-on: https://go-review.googlesource.com/114520
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    Reviewed-by: Ian Lance Taylor <iant@golang.org>
  2. cmd/compile: improve fncall docs

    josharian committed May 24, 2018
    Comment changes only.
    
    Change-Id: I3f9c1c38ae6b4989f02b62fff09265e4bcb934f7
    Reviewed-on: https://go-review.googlesource.com/114519
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Ian Lance Taylor <iant@golang.org>
Commits on May 7, 2018
  1. cmd/compile: add some LEAL{1,2,4,8} rewrite rules for AMD64

    josharian committed Feb 26, 2018
    This should improve some 32-bit arithmetic operations.
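    As background, LEAL{1,2,4,8} computes a 32-bit `base + index*scale` in one instruction, so 32-bit expressions of the shape below are natural candidates for such rules. Which exact patterns this change's rules match is not spelled out here; the functions are illustrative only, and the generated code for a given Go version can be inspected with `go build -gcflags=-S`.

```go
package main

import "fmt"

// idx4 and idx8 have the base + index*scale shape that LEAL4 and LEAL8
// express directly on amd64.
func idx4(x, y int32) int32 { return x + y*4 }
func idx8(x, y int32) int32 { return x + y*8 }

// times5 rewrites x*5 as x + x*4, the same base + index*scale shape.
func times5(x int32) int32 { return x + x*4 }

func main() {
	fmt.Println(idx4(1, 2), idx8(1, 2), times5(3)) // 9 17 15
}
```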
    
    During make.bash, this increases the number of
    rules firing by 15518:
    
    $ wc -l rulelog-*
     13490514 rulelog-head
     13474996 rulelog-master
    
    compress/flate benchmarks:
    
    name                             old time/op    new time/op    delta
    Decode/Digits/Huffman/1e4-8         103µs ± 4%     102µs ± 0%  -0.95%  (p=0.000 n=30+27)
    Decode/Digits/Huffman/1e5-8         962µs ± 2%     954µs ± 1%  -0.80%  (p=0.000 n=25+25)
    Decode/Digits/Huffman/1e6-8        9.55ms ± 1%    9.50ms ± 1%  -0.57%  (p=0.000 n=29+29)
    Decode/Digits/Speed/1e4-8           110µs ± 2%     110µs ± 2%  -0.41%  (p=0.003 n=28+30)
    Decode/Digits/Speed/1e5-8          1.15ms ± 1%    1.14ms ± 1%  -0.85%  (p=0.000 n=29+28)
    Decode/Digits/Speed/1e6-8          11.5ms ± 2%    11.4ms ± 1%  -1.26%  (p=0.000 n=28+27)
    Decode/Digits/Default/1e4-8         113µs ± 1%     112µs ± 1%  -0.49%  (p=0.001 n=27+30)
    Decode/Digits/Default/1e5-8        1.13ms ± 0%    1.12ms ± 1%  -0.75%  (p=0.000 n=26+24)
    Decode/Digits/Default/1e6-8        11.1ms ± 1%    11.1ms ± 1%  -0.47%  (p=0.000 n=28+27)
    Decode/Digits/Compression/1e4-8     113µs ± 1%     112µs ± 1%  -0.70%  (p=0.000 n=28+29)
    Decode/Digits/Compression/1e5-8    1.13ms ± 2%    1.12ms ± 1%  -1.41%  (p=0.000 n=28+26)
    Decode/Digits/Compression/1e6-8    11.1ms ± 1%    11.1ms ± 1%  -0.33%  (p=0.002 n=29+27)
    Decode/Twain/Huffman/1e4-8          115µs ± 1%     115µs ± 1%  -0.40%  (p=0.000 n=28+26)
    Decode/Twain/Huffman/1e5-8         1.05ms ± 1%    1.04ms ± 0%  -0.41%  (p=0.000 n=27+25)
    Decode/Twain/Huffman/1e6-8         10.4ms ± 1%    10.4ms ± 1%    ~     (p=0.993 n=28+24)
    Decode/Twain/Speed/1e4-8            118µs ± 2%     116µs ± 1%  -1.08%  (p=0.000 n=27+29)
    Decode/Twain/Speed/1e5-8           1.07ms ± 1%    1.07ms ± 1%  -0.23%  (p=0.041 n=26+27)
    Decode/Twain/Speed/1e6-8           10.6ms ± 1%    10.5ms ± 0%  -0.68%  (p=0.000 n=29+27)
    Decode/Twain/Default/1e4-8          110µs ± 1%     109µs ± 0%  -0.49%  (p=0.000 n=29+26)
    Decode/Twain/Default/1e5-8          906µs ± 1%     902µs ± 1%  -0.48%  (p=0.000 n=27+28)
    Decode/Twain/Default/1e6-8         8.75ms ± 1%    8.68ms ± 2%  -0.73%  (p=0.000 n=28+28)
    Decode/Twain/Compression/1e4-8      110µs ± 1%     109µs ± 1%  -0.80%  (p=0.000 n=27+28)
    Decode/Twain/Compression/1e5-8      905µs ± 1%     906µs ± 5%    ~     (p=0.065 n=27+29)
    Decode/Twain/Compression/1e6-8     8.75ms ± 2%    8.68ms ± 1%  -0.76%  (p=0.000 n=26+26)
    Encode/Digits/Huffman/1e4-8        31.8µs ± 1%    32.3µs ± 2%  +1.43%  (p=0.000 n=28+27)
    Encode/Digits/Huffman/1e5-8         299µs ± 2%     296µs ± 1%  -1.05%  (p=0.000 n=29+29)
    Encode/Digits/Huffman/1e6-8        2.99ms ± 3%    2.96ms ± 1%  -1.00%  (p=0.000 n=29+28)
    Encode/Digits/Speed/1e4-8           149µs ± 1%     152µs ± 4%  +2.18%  (p=0.000 n=30+30)
    Encode/Digits/Speed/1e5-8          1.39ms ± 1%    1.40ms ± 2%  +1.02%  (p=0.000 n=27+27)
    Encode/Digits/Speed/1e6-8          13.7ms ± 0%    13.8ms ± 1%  +0.81%  (p=0.000 n=27+27)
    Encode/Digits/Default/1e4-8         297µs ± 7%     297µs ± 7%    ~     (p=1.000 n=30+30)
    Encode/Digits/Default/1e5-8        4.51ms ± 1%    4.42ms ± 1%  -2.06%  (p=0.000 n=29+29)
    Encode/Digits/Default/1e6-8        47.5ms ± 1%    46.6ms ± 1%  -1.90%  (p=0.000 n=27+25)
    Encode/Digits/Compression/1e4-8     302µs ± 7%     303µs ± 9%    ~     (p=0.854 n=30+30)
    Encode/Digits/Compression/1e5-8    4.52ms ± 1%    4.43ms ± 2%  -1.91%  (p=0.000 n=26+25)
    Encode/Digits/Compression/1e6-8    47.5ms ± 1%    46.7ms ± 1%  -1.70%  (p=0.000 n=26+27)
    Encode/Twain/Huffman/1e4-8         46.6µs ± 2%    46.8µs ± 2%    ~     (p=0.114 n=30+30)
    Encode/Twain/Huffman/1e5-8          357µs ± 3%     352µs ± 2%  -1.13%  (p=0.000 n=29+28)
    Encode/Twain/Huffman/1e6-8         3.58ms ± 4%    3.52ms ± 1%  -1.43%  (p=0.003 n=30+28)
    Encode/Twain/Speed/1e4-8            173µs ± 1%     174µs ± 1%  +0.65%  (p=0.000 n=27+28)
    Encode/Twain/Speed/1e5-8           1.39ms ± 1%    1.40ms ± 1%  +0.92%  (p=0.000 n=28+27)
    Encode/Twain/Speed/1e6-8           13.6ms ± 1%    13.7ms ± 1%  +0.51%  (p=0.000 n=25+26)
    Encode/Twain/Default/1e4-8          364µs ± 5%     361µs ± 5%    ~     (p=0.219 n=30+30)
    Encode/Twain/Default/1e5-8         5.41ms ± 1%    5.43ms ± 5%    ~     (p=0.655 n=27+27)
    Encode/Twain/Default/1e6-8         57.2ms ± 1%    58.4ms ± 4%  +2.15%  (p=0.000 n=22+28)
    Encode/Twain/Compression/1e4-8      371µs ± 9%     373µs ± 6%    ~     (p=0.503 n=30+29)
    Encode/Twain/Compression/1e5-8     5.97ms ± 2%    5.92ms ± 1%  -0.75%  (p=0.000 n=28+26)
    Encode/Twain/Compression/1e6-8     64.0ms ± 1%    63.8ms ± 1%  -0.36%  (p=0.036 n=27+25)
    [Geo mean]                         1.37ms         1.36ms       -0.38%
    
    
    Change-Id: I3df4de63f06eaf121c38821bd889453a8de1b199
    Reviewed-on: https://go-review.googlesource.com/101276
    Reviewed-by: Keith Randall <khr@golang.org>
  2. text/tabwriter: don't mimic previous lines on flush

    josharian authored and griesemer committed May 6, 2018
    \f triggers a flush.
    
    This is used (by gofmt, among others) to indicate that
    the current aligned segment has ended.
    
    After a flush, the previous line is unlikely to be
    a good predictor of the upcoming line,
    so stop treating it as such.
    
    No performance impact on the existing benchmarks,
    which do not perform any flushes.
    
    Change-Id: Ifdf3e6d4600713c90db7b51a10e429d9260dc08c
    Reviewed-on: https://go-review.googlesource.com/111644
    Reviewed-by: Robert Griesemer <gri@golang.org>
Commits on May 6, 2018
  1. cmd/compile: use slice extension idiom in LSym.Grow

    josharian committed Apr 27, 2018
    name        old alloc/op      new alloc/op      delta
    Template         35.0MB ± 0%       35.0MB ± 0%  -0.05%  (p=0.008 n=5+5)
    Unicode          29.3MB ± 0%       29.3MB ± 0%    ~     (p=0.310 n=5+5)
    GoTypes           115MB ± 0%        115MB ± 0%  -0.08%  (p=0.008 n=5+5)
    Compiler          519MB ± 0%        519MB ± 0%  -0.08%  (p=0.008 n=5+5)
    SSA              1.59GB ± 0%       1.59GB ± 0%  -0.05%  (p=0.008 n=5+5)
    Flate            24.2MB ± 0%       24.2MB ± 0%  -0.06%  (p=0.008 n=5+5)
    GoParser         28.2MB ± 0%       28.1MB ± 0%  -0.04%  (p=0.016 n=5+5)
    Reflect          78.8MB ± 0%       78.7MB ± 0%  -0.10%  (p=0.008 n=5+5)
    Tar              34.5MB ± 0%       34.4MB ± 0%  -0.07%  (p=0.008 n=5+5)
    XML              43.3MB ± 0%       43.2MB ± 0%  -0.09%  (p=0.008 n=5+5)
    [Geo mean]       77.5MB            77.4MB       -0.06%
    
    name        old allocs/op     new allocs/op     delta
    Template           330k ± 0%         329k ± 0%  -0.32%  (p=0.008 n=5+5)
    Unicode            337k ± 0%         336k ± 0%  -0.10%  (p=0.008 n=5+5)
    GoTypes           1.15M ± 0%        1.14M ± 0%  -0.34%  (p=0.008 n=5+5)
    Compiler          4.78M ± 0%        4.77M ± 0%  -0.25%  (p=0.008 n=5+5)
    SSA               12.9M ± 0%        12.9M ± 0%  -0.12%  (p=0.008 n=5+5)
    Flate              221k ± 0%         220k ± 0%  -0.32%  (p=0.008 n=5+5)
    GoParser           275k ± 0%         274k ± 0%  -0.34%  (p=0.008 n=5+5)
    Reflect            944k ± 0%         940k ± 0%  -0.42%  (p=0.008 n=5+5)
    Tar                323k ± 0%         322k ± 0%  -0.31%  (p=0.008 n=5+5)
    XML                384k ± 0%         383k ± 0%  -0.26%  (p=0.008 n=5+5)
    [Geo mean]         749k              747k       -0.28%
    
    
    Updates #21266
    
    Change-Id: I926ee3ba009c068239db70cdee8fdf85b5ee6bb4
    Reviewed-on: https://go-review.googlesource.com/109816
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
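The "slice extension idiom" most likely refers to `append(s, make([]T, n)...)`, a pattern the compiler can recognize so that the temporary slice from `make` is not allocated separately. A minimal sketch of growing a byte slice that way, using a hypothetical helper `grow` (this is not the actual `LSym.Grow` code):

```go
package main

import "fmt"

// grow returns p extended with zero bytes so that len(p) >= n.
// It uses the append(s, make([]T, k)...) idiom; the compiler can
// recognize this pattern and avoid allocating the temporary slice.
// The name grow is hypothetical, for illustration only.
func grow(p []byte, n int) []byte {
	if len(p) >= n {
		return p // already long enough
	}
	return append(p, make([]byte, n-len(p))...)
}

func main() {
	p := grow([]byte{1, 2}, 5)
	fmt.Println(len(p), p) // 5 [1 2 0 0 0]
}
```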
  2. text/tabwriter: fix BenchmarkPyramid and BenchmarkRagged

    josharian committed May 6, 2018
    These were added in CL 106979. They were wrong.
    
    The correct impact of CL 106979 on these benchmarks is:
    
    name            old time/op    new time/op    delta
    Pyramid/10-8      6.22µs ± 1%    5.68µs ± 0%    -8.78%  (p=0.000 n=15+13)
    Pyramid/100-8      275µs ± 1%     255µs ± 1%    -7.30%  (p=0.000 n=15+13)
    Pyramid/1000-8    25.6ms ± 1%    24.8ms ± 1%    -2.88%  (p=0.000 n=15+14)
    Ragged/10-8       8.98µs ± 1%    6.74µs ± 0%   -24.98%  (p=0.000 n=15+14)
    Ragged/100-8      85.3µs ± 0%    57.5µs ± 1%   -32.51%  (p=0.000 n=13+15)
    Ragged/1000-8      847µs ± 1%     561µs ± 1%   -33.85%  (p=0.000 n=14+15)
    
    name            old alloc/op   new alloc/op   delta
    Pyramid/10-8      4.74kB ± 0%    4.88kB ± 0%    +3.04%  (p=0.000 n=15+15)
    Pyramid/100-8      379kB ± 0%     411kB ± 0%    +8.50%  (p=0.000 n=15+12)
    Pyramid/1000-8    35.3MB ± 0%    41.6MB ± 0%   +17.68%  (p=0.000 n=15+15)
    Ragged/10-8       4.82kB ± 0%    1.82kB ± 0%   -62.13%  (p=0.000 n=15+15)
    Ragged/100-8      45.4kB ± 0%     1.8kB ± 0%   -95.98%  (p=0.000 n=15+15)
    Ragged/1000-8      449kB ± 0%       2kB ± 0%   -99.59%  (p=0.000 n=15+15)
    
    name            old allocs/op  new allocs/op  delta
    Pyramid/10-8        50.0 ± 0%      35.0 ± 0%   -30.00%  (p=0.000 n=15+15)
    Pyramid/100-8        704 ± 0%       231 ± 0%   -67.19%  (p=0.000 n=15+15)
    Pyramid/1000-8     10.0k ± 0%      2.1k ± 0%   -79.52%  (p=0.000 n=15+15)
    Ragged/10-8         60.0 ± 0%      19.0 ± 0%   -68.33%  (p=0.000 n=15+15)
    Ragged/100-8         511 ± 0%        19 ± 0%   -96.28%  (p=0.000 n=15+15)
    Ragged/1000-8      5.01k ± 0%     0.02k ± 0%   -99.62%  (p=0.000 n=15+15)
    
    
    This is an improvement over what was originally reported,
    except the increase in alloc/op for the Pyramid benchmarks.
    
    Change-Id: Ib2617c1288ce35f2c78e0172533d231b86e48bc2
    Reviewed-on: https://go-review.googlesource.com/111643
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Commits on May 3, 2018
  1. cmd/compile: regenerate ssa ops

    josharian committed May 3, 2018
    Must have been missed in a previous CL.
    
    Change-Id: I303736e82585be8d58b330235c76ed4b24a92952
    Reviewed-on: https://go-review.googlesource.com/111259
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
  2. cmd/compile: optimize a - b == 0 into a == b

    josharian committed Apr 27, 2018
    These rules trigger 1141 times during make.bash.
    
    Shrinks a few object files a tiny bit:
    
    name        old object-bytes  new object-bytes  delta
    Template          476kB ± 0%        476kB ± 0%  +0.00%  (p=0.008 n=5+5)
    Unicode           218kB ± 0%        218kB ± 0%    ~     (all equal)
    GoTypes          1.58MB ± 0%       1.58MB ± 0%    ~     (all equal)
    Compiler         6.25MB ± 0%       6.25MB ± 0%  -0.00%  (p=0.008 n=5+5)
    Flate             304kB ± 0%        304kB ± 0%  -0.01%  (p=0.008 n=5+5)
    GoParser          370kB ± 0%        370kB ± 0%    ~     (all equal)
    Reflect          1.27MB ± 0%       1.27MB ± 0%    ~     (all equal)
    Tar               421kB ± 0%        421kB ± 0%  -0.05%  (p=0.008 n=5+5)
    XML               518kB ± 0%        518kB ± 0%    ~     (all equal)
    
    archive/tar benchmarks:
    
    name             old time/op    new time/op    delta
    /Writer/USTAR-8    3.97µs ± 1%    3.88µs ± 0%  -2.26%  (p=0.000 n=26+26)
    /Writer/GNU-8      4.67µs ± 0%    4.54µs ± 1%  -2.72%  (p=0.000 n=28+27)
    /Writer/PAX-8      8.20µs ± 0%    8.01µs ± 0%  -2.32%  (p=0.000 n=29+29)
    /Reader/USTAR-8    3.61µs ± 0%    3.54µs ± 1%  -2.04%  (p=0.000 n=25+28)
    /Reader/GNU-8      2.27µs ± 2%    2.17µs ± 0%  -4.08%  (p=0.000 n=30+28)
    /Reader/PAX-8      7.75µs ± 0%    7.63µs ± 0%  -1.60%  (p=0.000 n=28+28)
    [Geo mean]         4.61µs         4.50µs       -2.51%
    
    Change-Id: Ib4dfade5069a7463ccaba073ea91c8213e9714a0
    Reviewed-on: https://go-review.googlesource.com/110235
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
    Reviewed-by: Keith Randall <khr@golang.org>
  3. cmd/compile: shrink liveness maps

    josharian committed Apr 2, 2018
    The GC maps don't care about trailing non-pointers in args.
    Work harder to eliminate them.
    
    This should provide a slight speedup to everything that reads these
    maps, mainly GC and stack copying.
    
    The non-ptr-y runtime benchmarks happen to go from having a non-empty
    args map to an empty args map, so they have a significant speedup.
    
    name                old time/op  new time/op  delta
    StackCopyPtr-8      80.2ms ± 4%  79.7ms ± 2%  -0.63%  (p=0.001 n=94+91)
    StackCopy-8         63.3ms ± 3%  59.2ms ± 3%  -6.45%  (p=0.000 n=98+97)
    StackCopyNoCache-8   107ms ± 3%    98ms ± 3%  -8.00%  (p=0.000 n=95+88)
    
    It also shrinks object files a tiny bit:
    
    name        old object-bytes  new object-bytes  delta
    Template          476kB ± 0%        476kB ± 0%  -0.03%  (p=0.008 n=5+5)
    Unicode           218kB ± 0%        218kB ± 0%  -0.09%  (p=0.008 n=5+5)
    GoTypes          1.58MB ± 0%       1.58MB ± 0%  -0.03%  (p=0.008 n=5+5)
    Compiler         6.25MB ± 0%       6.24MB ± 0%  -0.06%  (p=0.008 n=5+5)
    SSA              15.9MB ± 0%       15.9MB ± 0%  -0.06%  (p=0.008 n=5+5)
    Flate             304kB ± 0%        303kB ± 0%  -0.29%  (p=0.008 n=5+5)
    GoParser          370kB ± 0%        370kB ± 0%  +0.02%  (p=0.008 n=5+5)
    Reflect          1.27MB ± 0%       1.27MB ± 0%  -0.07%  (p=0.008 n=5+5)
    Tar               421kB ± 0%        421kB ± 0%  -0.05%  (p=0.008 n=5+5)
    XML               518kB ± 0%        517kB ± 0%  -0.06%  (p=0.008 n=5+5)
    [Geo mean]        934kB             933kB       -0.07%
    
    Note that some object files do grow;
    this can happen because some maps that were
    duplicates of each other must be stored separately.
    
    Change-Id: Ie076891bd8e9d269ff2ff5435d5d25c721e0e31d
    Reviewed-on: https://go-review.googlesource.com/104175
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Austin Clements <austin@google.com>
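Dropping trailing non-pointer words amounts to truncating the argument bitmap just after its last set bit. A sketch, assuming a hypothetical layout of one bit per pointer-sized word packed LSB-first into a `[]uint8` (this is not the compiler's actual `bvec` code):

```go
package main

import "fmt"

// trimmedLen returns how many leading bits of an n-bit bitmap must be
// kept so that every set (pointer) bit is retained. Bits after the last
// set bit describe non-pointer words and can be dropped; an all-zero
// bitmap trims to length 0.
func trimmedLen(bitmap []uint8, n int) int {
	for i := n - 1; i >= 0; i-- {
		if bitmap[i/8]&(1<<uint(i%8)) != 0 {
			return i + 1
		}
	}
	return 0
}

func main() {
	// Words 0 and 2 hold pointers; words 3..7 do not.
	fmt.Println(trimmedLen([]uint8{0x05}, 8)) // 3
}
```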
  4. runtime: convert g.waitreason from string to uint8

    josharian committed Mar 7, 2018
    Every time I poke at #14921, the g.waitreason string
    pointer writes show up.
    
    They're not particularly important performance-wise,
    but it'd be nice to clear the noise away.
    
    And it does open up a few extra bytes in the g struct
    for some future use.
    
    This is a re-roll of CL 99078, which was rolled
    back because of failures on s390x.
    Those failures were apparently due to an old version of gdb.
    
    Change-Id: Icc2c12f449b2934063fd61e272e06237625ed589
    Reviewed-on: https://go-review.googlesource.com/111256
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Michael Munday <mike.munday@ibm.com>
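Storing a one-byte code instead of a string avoids writing a pointer (which may need a write barrier) each time the field is set, and shrinks the field from a two-word string header to a single byte. A sketch of the pattern, with hypothetical reason names rather than the runtime's actual list:

```go
package main

import "fmt"

// waitReason is a one-byte code describing why a goroutine is blocked.
// The constants below are illustrative, not the runtime's real list.
type waitReason uint8

const (
	waitReasonZero        waitReason = iota // ""
	waitReasonChanReceive                   // "chan receive"
	waitReasonSelect                        // "select"
	waitReasonSleep                         // "sleep"
)

// waitReasonStrings maps each code to its human-readable text.
var waitReasonStrings = [...]string{
	waitReasonZero:        "",
	waitReasonChanReceive: "chan receive",
	waitReasonSelect:      "select",
	waitReasonSleep:       "sleep",
}

// String converts the code back to text only when it is needed,
// for example when printing a traceback.
func (w waitReason) String() string {
	if int(w) >= len(waitReasonStrings) {
		return "bad wait reason"
	}
	return waitReasonStrings[w]
}

func main() {
	r := waitReasonSelect
	fmt.Println(r) // select
}
```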
Commits on May 1, 2018
  1. cmd/compile: recognize some OpRsh64Ux64 Values as non-negative

    josharian committed Apr 25, 2018
    Proves IsSliceInBounds one additional time building std+cmd,
    at encoding/hex/hex.go:187:8.
    
    The code is:
    
    	if numAvail := len(d.in) / 2; len(p) > numAvail {
    		p = p[:numAvail]
    	}
    
    Previously we were unable to prove that numAvail >= 0.
    
    Change-Id: Ie74e0aef809f9194c45e129ee3dae60bc3eae02f
    Reviewed-on: https://go-review.googlesource.com/109415
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Giovanni Bajo <rasky@develer.com>
  2. runtime: allow inlining of stackmapdata

    josharian committed Apr 1, 2018
    Also do very minor code cleanup.
    
    name                old time/op  new time/op  delta
    StackCopyPtr-8      84.8ms ± 6%  82.9ms ± 5%  -2.19%  (p=0.000 n=95+94)
    StackCopy-8         68.4ms ± 5%  65.3ms ± 4%  -4.54%  (p=0.000 n=99+99)
    StackCopyNoCache-8   107ms ± 2%   105ms ± 2%  -2.13%  (p=0.000 n=91+95)
    
    Change-Id: I2d85ede48bffada9584d437a08a82212c0da6d00
    Reviewed-on: https://go-review.googlesource.com/109001
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Austin Clements <austin@google.com>
  3. runtime: use staticbytes in intstring for small v

    josharian committed May 1, 2018
    Triggers 21 times during make.bash.
    
    Change-Id: I7efb34200439256151304bb66cd309913f7c9c9e
    Reviewed-on: https://go-review.googlesource.com/110557
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Martin Möhrmann <moehrmann@google.com>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
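The idea is to avoid building a new string when converting a small code point to a one-character string, by reusing preallocated storage. A user-level sketch of the same idea, assuming a hypothetical precomputed table of one-byte strings (the runtime uses its `staticbytes` array instead; this is not that code):

```go
package main

import "fmt"

// oneByteStrings holds a preallocated one-character string for each
// ASCII code point, so small conversions can reuse them.
var oneByteStrings [128]string

func init() {
	for i := range oneByteStrings {
		oneByteStrings[i] = string(rune(i))
	}
}

// runeString is like string(r), but returns a shared string for
// ASCII code points instead of building a new one.
func runeString(r rune) string {
	if r >= 0 && r < 128 {
		return oneByteStrings[r]
	}
	return string(r)
}

func main() {
	fmt.Println(runeString('G'), runeString('é')) // G é
}
```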
  4. cmd/compile: optimize bvec routines

    josharian committed Apr 30, 2018
    The recent improvements to the prove pass
    make it possible to provide bounds
    hints to the compiler in some bvec routines.
    
    This speeds up the compilation of the code in
    
    name  old time/op       new time/op       delta
    Pkg         7.93s ± 4%        7.69s ± 3%  -2.98%  (p=0.000 n=29+26)
    
    While we're here, clean up some C-isms.
    
    Updates #13554
    Updates #20393
    
    Change-Id: I47a0ec68543a9fc95c5359c3f37813fb529cb4f0
    Reviewed-on: https://go-review.googlesource.com/110560
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Matthew Dempsky <mdempsky@google.com>
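A common way to provide such a hint is to index, once before a loop, the largest element the loop will touch, so that the prove pass can drop the per-iteration bounds checks. A sketch on a hypothetical bit vector type (not the compiler's actual `bvec`):

```go
package main

import "fmt"

// bitVec is a hypothetical bit vector stored as 32-bit words.
type bitVec struct {
	b []uint32
}

// and sets dst = src1 & src2, word by word. dst and src2 are assumed
// to be at least as long as src1.
func (dst bitVec) and(src1, src2 bitVec) {
	if len(src1.b) == 0 {
		return
	}
	// Bounds hint: one up-front check that dst and src2 are long enough.
	// After it, the compiler can prove dst.b[i] and src2.b[i] are in
	// range for every i in the loop and drop those checks.
	_, _ = dst.b[len(src1.b)-1], src2.b[len(src1.b)-1]
	for i, x := range src1.b {
		dst.b[i] = x & src2.b[i]
	}
}

func main() {
	a := bitVec{b: []uint32{0xF0, 0x0F}}
	b := bitVec{b: []uint32{0xFF, 0x01}}
	dst := bitVec{b: make([]uint32, 2)}
	dst.and(a, b)
	fmt.Println(dst.b) // [240 1]
}
```

If `dst` or `src2` is too short, the up-front index panics before the loop starts rather than partway through it.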
  5. runtime: avoid unnecessary scanblock calls

    josharian committed Apr 7, 2018
    This is the scanstack analog of CL 104737,
    which made a similar change for copystack.
    
    name         old time/op  new time/op  delta
    ScanStack-8  41.1ms ± 6%  38.9ms ± 5%  -5.52%  (p=0.000 n=50+48)
    
    Change-Id: I7427151dea2895ed3934f8a0f61d96b568019217
    Reviewed-on: https://go-review.googlesource.com/105536
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Austin Clements <austin@google.com>
  6. runtime: add BenchmarkScanStack

    josharian committed Apr 7, 2018
    There are many possible stack scanning benchmarks,
    but this one is at least a start.
    
    CPU profiling shows about 75% of CPU time in func scanstack.
    
    Change-Id: I906b0493966f2165c1920636c4e057d16d6447e0
    Reviewed-on: https://go-review.googlesource.com/105535
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Austin Clements <austin@google.com>
Commits on Apr 30, 2018
  1. cmd/compile: use AuxInt to store shift boundedness

    josharian authored and aclements committed Apr 30, 2018
    Fixes ssacheck build.
    
    Change-Id: Idf1d2ea9a971a1f17f2fca568099e870bb5d913f
    Reviewed-on: https://go-review.googlesource.com/110122
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Austin Clements <austin@google.com>
Commits on Apr 29, 2018
  1. cmd/compile: simplify shifts using bounds from prove pass

    josharian committed Apr 27, 2018
    The prove pass sometimes has bounds information
    that later rewrite passes do not.
    
    Use this information to mark shifts as bounded,
    and then use that information to generate better code on amd64.
    It may prove to be helpful on other architectures, too.
    
    While here, coalesce the existing shift lowering rules.
    
    This triggers 35 times building std+cmd. The full list is below.
    
    Here's an example from runtime.heapBitsSetType:
    
    			if nb < 8 {
    				b |= uintptr(*p) << nb
    				p = add1(p)
    			} else {
    				nb -= 8
    			}
    
    We now generate better code on amd64 for that left shift.
    
    Updates #25087
    
    vendor/golang_org/x/crypto/curve25519/mont25519_amd64.go:48:20: Proved Rsh8Ux64 bounded
    runtime/mbitmap.go:1252:22: Proved Lsh64x64 bounded
    runtime/mbitmap.go:1265:16: Proved Lsh64x64 bounded
    runtime/mbitmap.go:1275:28: Proved Lsh64x64 bounded
    runtime/mbitmap.go:1645:25: Proved Lsh64x64 bounded
    runtime/mbitmap.go:1663:25: Proved Lsh64x64 bounded
    runtime/mbitmap.go:1808:41: Proved Lsh64x64 bounded
    runtime/mbitmap.go:1831:49: Proved Lsh64x64 bounded
    syscall/route_bsd.go:227:23: Proved Lsh32x64 bounded
    syscall/route_bsd.go:295:23: Proved Lsh32x64 bounded
    syscall/route_darwin.go:40:23: Proved Lsh32x64 bounded
    compress/bzip2/bzip2.go:384:26: Proved Lsh64x16 bounded
    vendor/golang_org/x/net/route/address.go:370:14: Proved Lsh64x64 bounded
    compress/flate/inflate.go:201:54: Proved Lsh64x64 bounded
    math/big/prime.go:50:25: Proved Lsh64x64 bounded
    vendor/golang_org/x/crypto/cryptobyte/asn1.go:464:43: Proved Lsh8x8 bounded
    net/ip.go:87:21: Proved Rsh8Ux64 bounded
    cmd/internal/goobj/read.go:267:23: Proved Lsh64x64 bounded
    cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:534:27: Proved Lsh32x32 bounded
    cmd/vendor/golang.org/x/arch/arm64/arm64asm/decode.go:544:27: Proved Lsh32x32 bounded
    cmd/internal/obj/arm/asm5.go:1044:16: Proved Lsh32x64 bounded
    cmd/internal/obj/arm/asm5.go:1065:10: Proved Lsh32x32 bounded
    cmd/internal/obj/mips/obj0.go:1311:21: Proved Lsh32x64 bounded
    cmd/compile/internal/syntax/scanner.go:352:23: Proved Lsh64x64 bounded
    go/types/expr.go:222:36: Proved Lsh64x64 bounded
    crypto/x509/x509.go:1626:9: Proved Rsh8Ux64 bounded
    cmd/link/internal/loadelf/ldelf.go:823:22: Proved Lsh8x64 bounded
    net/http/h2_bundle.go:1470:17: Proved Lsh8x8 bounded
    net/http/h2_bundle.go:1477:46: Proved Lsh8x8 bounded
    net/http/h2_bundle.go:1481:31: Proved Lsh64x8 bounded
    cmd/compile/internal/ssa/rewriteARM64.go:18759:17: Proved Lsh64x64 bounded
    cmd/compile/internal/ssa/sparsemap.go:70:23: Proved Lsh32x64 bounded
    cmd/compile/internal/ssa/sparsemap.go:73:45: Proved Lsh32x64 bounded
    
    Change-Id: I58bb72f3e6f12f6ac69be633ea7222c245438142
    Reviewed-on: https://go-review.googlesource.com/109776
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Giovanni Bajo <rasky@develer.com>
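Why boundedness matters: in Go, shifting by a count greater than or equal to the operand's width is well defined (left shifts and unsigned right shifts give 0), whereas the amd64 shift instructions mask the count, so the compiler has to emit extra instructions for large counts unless it can prove the count is small. The sketch below shows the language semantics the unbounded code must preserve; the function names are illustrative.

```go
package main

import "fmt"

// shl shifts x left by s with Go semantics: the result is 0 when s >= 64.
func shl(x uint64, s uint) uint64 {
	return x << s
}

// shlSmall mirrors the heapBitsSetType pattern: inside the nb < 8 branch
// the shift count is provably below 64, so the out-of-range case cannot
// occur and no extra handling is needed.
func shlSmall(x uint64, nb uint) uint64 {
	if nb < 8 {
		return x << nb
	}
	return x
}

func main() {
	fmt.Println(shl(1, 3))  // 8
	fmt.Println(shl(1, 64)) // 0, not 1 as a masked hardware shift would give
}
```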
  2. runtime: iterate over set bits in adjustpointers

    josharian committed Apr 1, 2018
    There are several things combined in this change.
    
    First, eliminate the gobitvector type in favor
    of adding a ptrbit method to bitvector.
    In non-performance-critical code, use that method.
    In performance-critical code, though, load the bitvector data
    one byte at a time and iterate only over set bits.
    To support that, add and use sys.Ctz8.
    
    name                old time/op  new time/op  delta
    StackCopyPtr-8      81.8ms ± 5%  78.9ms ± 3%   -3.58%  (p=0.000 n=97+96)
    StackCopy-8         65.9ms ± 3%  62.8ms ± 3%   -4.67%  (p=0.000 n=96+92)
    StackCopyNoCache-8   105ms ± 3%   102ms ± 3%   -3.38%  (p=0.000 n=96+95)
    
    Change-Id: I00b80f45612708bd440b1a411a57fa6dfa24aa74
    Reviewed-on: https://go-review.googlesource.com/109716
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Austin Clements <austin@google.com>
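The set-bit iteration can be sketched with the standard library's `math/bits.TrailingZeros8`, which plays the same role here as the runtime-internal `sys.Ctz8`: find the lowest set bit, process it, clear it, repeat. The bitmap layout (one bit per word, LSB first) and the function name are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math/bits"
)

// setBits returns the indices of all set bits in bitmap, where bit j of
// byte i describes word 8*i+j. Zero bytes are skipped entirely, and
// within a byte only the set bits are visited.
func setBits(bitmap []uint8) []int {
	var idx []int
	for i, b := range bitmap {
		for b != 0 {
			j := bits.TrailingZeros8(b) // position of lowest set bit
			idx = append(idx, 8*i+j)
			b &= b - 1 // clear lowest set bit
		}
	}
	return idx
}

func main() {
	fmt.Println(setBits([]uint8{0x05, 0x00, 0x80})) // [0 2 23]
}
```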
  3. runtime: add fast version of getArgInfo

    josharian committed Mar 31, 2018
    getArgInfo is called a lot during stack copying.
    In the common case it doesn't do much work,
    but it cannot be inlined.
    
    This change works around that.
    
    name                old time/op  new time/op  delta
    StackCopyPtr-8       108ms ± 5%    96ms ± 4%  -10.40%  (p=0.000 n=20+20)
    StackCopy-8         82.6ms ± 3%  78.4ms ± 6%   -5.15%  (p=0.000 n=19+20)
    StackCopyNoCache-8   130ms ± 3%   122ms ± 3%   -6.44%  (p=0.000 n=20+20)
    
    Change-Id: If7d8a08c50a4e2e76e4331b399396c5dbe88c2ce
    Reviewed-on: https://go-review.googlesource.com/108945
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Austin Clements <austin@google.com>
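A common way to work around a non-inlinable function is a fast-path/slow-path split: a small function that the compiler can inline handles the common case and calls a larger, out-of-line function only when needed. A sketch with hypothetical names and a made-up "common case" (this is not the runtime's code):

```go
package main

import "fmt"

// frameInfo is a hypothetical stand-in for per-frame metadata.
type frameInfo struct {
	argSize int
	special bool // true for frames that need the expensive path
}

// argSizeFast is small enough to be inlined at call sites. It handles
// the common case directly and only calls the slow path otherwise.
func argSizeFast(f *frameInfo) int {
	if !f.special {
		return f.argSize
	}
	return argSizeSlow(f)
}

// argSizeSlow holds the rare, more expensive logic. The go:noinline
// directive keeps it out of line so argSizeFast stays cheap to inline.
//
//go:noinline
func argSizeSlow(f *frameInfo) int {
	// Placeholder for a costly computation.
	return f.argSize * 2
}

func main() {
	common := &frameInfo{argSize: 16}
	rare := &frameInfo{argSize: 16, special: true}
	fmt.Println(argSizeFast(common), argSizeFast(rare)) // 16 32
}
```

Building with `go build -gcflags=-m` reports which functions the compiler can inline, which is one way to confirm that the fast path qualifies.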
Commits on Apr 27, 2018
  1. cmd/internal/obj: convert unicode C to ASCII C

    josharian committed Apr 27, 2018
    Hex before: d0 a1
    Hex after: 43
    
    Not sure where that came from.
    
    Change-Id: I189e7e21f8faf480ba72846b956a149976f720f8
    Reviewed-on: https://go-review.googlesource.com/109777
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
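The bytes `d0 a1` are the UTF-8 encoding of U+0421 CYRILLIC CAPITAL LETTER ES, which renders identically to the Latin `C` (U+0043, byte `43`). A quick way to see the difference; `hexBytes` is a hypothetical helper name:

```go
package main

import "fmt"

// hexBytes formats the bytes of s as space-separated hex.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	cyrillic := "\u0421" // CYRILLIC CAPITAL LETTER ES, renders like "C"
	latin := "C"         // LATIN CAPITAL LETTER C
	fmt.Println(hexBytes(cyrillic)) // d0 a1
	fmt.Println(hexBytes(latin))    // 43
	fmt.Println(cyrillic == latin)  // false
}
```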
  2. cmd/compile: increase initial allocation of LSym.R

    josharian committed Apr 26, 2018
    Not a big win, but cheap.
    
    name        old alloc/op      new alloc/op      delta
    Template         34.4MB ± 0%       34.4MB ± 0%  -0.20%  (p=0.000 n=15+15)
    Unicode          29.2MB ± 0%       29.3MB ± 0%  +0.17%  (p=0.000 n=15+15)
    GoTypes           113MB ± 0%        113MB ± 0%  -0.22%  (p=0.000 n=15+15)
    Compiler          509MB ± 0%        508MB ± 0%  -0.11%  (p=0.000 n=15+14)
    SSA              1.46GB ± 0%       1.46GB ± 0%  -0.08%  (p=0.000 n=14+15)
    Flate            23.8MB ± 0%       23.7MB ± 0%  -0.22%  (p=0.000 n=15+15)
    GoParser         27.9MB ± 0%       27.8MB ± 0%  -0.21%  (p=0.000 n=14+15)
    Reflect          77.2MB ± 0%       77.0MB ± 0%  -0.27%  (p=0.000 n=14+15)
    Tar              34.0MB ± 0%       33.9MB ± 0%  -0.21%  (p=0.000 n=13+15)
    XML              42.6MB ± 0%       42.5MB ± 0%  -0.15%  (p=0.000 n=15+15)
    [Geo mean]       75.8MB            75.7MB       -0.15%
    
    name        old allocs/op     new allocs/op     delta
    Template           322k ± 0%         320k ± 0%  -0.60%  (p=0.000 n=15+15)
    Unicode            337k ± 0%         336k ± 0%  -0.23%  (p=0.000 n=12+15)
    GoTypes           1.13M ± 0%        1.12M ± 0%  -0.58%  (p=0.000 n=15+14)
    Compiler          4.67M ± 0%        4.65M ± 0%  -0.38%  (p=0.000 n=14+15)
    SSA               11.7M ± 0%        11.6M ± 0%  -0.25%  (p=0.000 n=15+15)
    Flate              216k ± 0%         214k ± 0%  -0.67%  (p=0.000 n=15+15)
    GoParser           271k ± 0%         270k ± 0%  -0.57%  (p=0.000 n=15+15)
    Reflect            927k ± 0%         920k ± 0%  -0.72%  (p=0.000 n=13+14)
    Tar                318k ± 0%         316k ± 0%  -0.57%  (p=0.000 n=15+15)
    XML                376k ± 0%         375k ± 0%  -0.46%  (p=0.000 n=14+14)
    [Geo mean]         731k              727k       -0.50%
    
    Change-Id: I1417c5881e866fb3efe62a3d0fbe1134275da31a
    Reviewed-on: https://go-review.googlesource.com/109755
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
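Giving a slice a larger initial capacity avoids the intermediate backing arrays that `append` would otherwise allocate as the slice grows, at the cost of some unused capacity when fewer elements arrive. A small sketch that counts reallocations with and without preallocation; the element type `reloc` and the sizes are arbitrary, not the compiler's actual values:

```go
package main

import "fmt"

// reloc is an arbitrary element type standing in for a relocation record.
type reloc struct {
	off, siz int32
	add      int64
}

// countGrowths appends n elements to s and reports how many times
// append had to move the data to a new, larger backing array.
func countGrowths(s []reloc, n int) int {
	growths := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, reloc{off: int32(i)})
		if cap(s) != before {
			growths++
		}
	}
	return growths
}

func main() {
	fmt.Println(countGrowths(nil, 4))
	fmt.Println(countGrowths(make([]reloc, 0, 4), 4)) // 0
}
```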
  3. cmd/compile: log Ctz non-zero proofs

    josharian committed Apr 26, 2018
    I forgot this in CL 109358.
    
    Change-Id: Ia5e8bd9cf43393f098b101a0d6a0c526e3e4f101
    Reviewed-on: https://go-review.googlesource.com/109775
    Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>