New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: looprotate picking wrong jump target #20355

Closed
josharian opened this Issue May 13, 2017 · 9 comments

Comments

Projects
None yet
3 participants
@josharian
Contributor

josharian commented May 13, 2017

This is easier to explain with some visuals. They're a bit complicated; bear with me. Ask if you have questions about how to read them.

The screenshots below are of ssa.html with a CFG of the fannkuch benchmark, including CL 43293. If you want to look around yourself, here's the ssa.html. Be patient--this is hard on your browser. To click to highlight a node you must click on the text in the node, not anywhere in the ellipse, because I hate javascript.

First, here's looprotate doing a good job.

The nodes are blocks. The node labels contain block id and block kind; the number in parens indicates its index in f.Blocks (i.e. where is has been laid out). Edges are labeled with successor numbers. Dashed edges are unlikely. Green edges indicate adjacency in f.Blocks. Dotted green edges aren't part of the CFG; they are there only to show layout adjacency. Node coloring was done manually, by clicking on them.

You can see that b3 is the header for the loop b4 b5 b6, which can panic (via b9) or terminate normally (via b7). Looprotate changes the layout so that after b3 comes b5, not b4. Observe the new dotted green edge and updated parenthesized indices.

looprotate_good

Now here's looprotate doing something untoward. Same graph, different highlighted nodes. The second screenshot shows the bottom of the graph.

b7 is a loop header for a very big loop starting at b11 (observe the edge coming in from b12 offscreen below). But b11 is not the node that decides whether the loop should continue. That's b61, which (if still looping) continues to b12, which returns to b11.

However, looprotate bypasses b11 in favor of b14, which doesn't really make much sense. The third screenshot shows what this looks like after the trim pass. Definitely doesn't seem like ideal code layout.

looprotate_bad

end_of_loop

after_trim

cc @randall77

@josharian josharian added this to the Go1.9 milestone May 13, 2017

@josharian josharian changed the title from cmd/compile: looprotate disrupting non-loops to cmd/compile: looprotate picking wrong jump target May 13, 2017

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian
Contributor

josharian commented May 13, 2017

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot commented May 14, 2017

CL https://golang.org/cl/43464 mentions this issue.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian May 14, 2017

Contributor

Started looking at this. Have some ideas.

Contributor

josharian commented May 14, 2017

Started looking at this. Have some ideas.

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot commented May 14, 2017

CL https://golang.org/cl/43491 mentions this issue.

@randall77

This comment has been minimized.

Show comment
Hide comment
@randall77

randall77 May 15, 2017

Contributor

(You don't mean to say that b7 is the loop header, do you? b11 is the header. b7 isn't even in the loop.)

I agree that this second reordering is not desirable. From my perspective, that's just because the branch at the end of b11 isn't an exit-loop branch. Maybe that should be part of the condition for triggering the rotate? Then we'd only move single blocks.

Maybe the condition is that if we rotate the loop, the non-local edge out of the header is now a fallthrough edge (the non-loop successor is the immediately next block).

I think your CL 43491 is correct, but it may be overkill.

Contributor

randall77 commented May 15, 2017

(You don't mean to say that b7 is the loop header, do you? b11 is the header. b7 isn't even in the loop.)

I agree that this second reordering is not desirable. From my perspective, that's just because the branch at the end of b11 isn't an exit-loop branch. Maybe that should be part of the condition for triggering the rotate? Then we'd only move single blocks.

Maybe the condition is that if we rotate the loop, the non-local edge out of the header is now a fallthrough edge (the non-loop successor is the immediately next block).

I think your CL 43491 is correct, but it may be overkill.

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot commented May 15, 2017

CL https://golang.org/cl/43502 mentions this issue.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian May 16, 2017

Contributor

I ran some benchmarks with CL 43491 ("josh") and CL 43502 ("khr") on top of 5548f7d ("tip"). For each, I'll show all three together, tip -> khr, tip -> josh, and khr -> josh. It's lots of data, but it's hard to make sense of without all the pairwise stats.

Looking through these, I'm weakly inclined to go for the only slightly more complicated josh CL, mostly on the basis of the macro (compilebench) benchmark and the big append regression. I also think it is interesting that the khr CL has a big +/- impact on lots of go1 benchmarks, but the josh CL does not. It makes me suspect that there are better heuristics available to decide whether to not unroll (khr) or move many blocks (josh). I think we should pull in either khr or josh for 1.9 and then revisit in 1.10 whether they can be fruitfully combined. Other changes to the layout pass may also affect this.

@randall77 now that we have some data, please make an executive decision about how to proceed.

Fannkuch running alone:

all three
name \ time/op  fk-tip       fk-khr       fk-josh
Fannkuch11-8    2.73s ± 2%  2.81s ± 2%  2.82s ± 1%

tip -> khr
Fannkuch11-8   2.73s ± 2%   2.81s ± 2%  +2.78%  (p=0.000 n=29+30)

tip -> josh
Fannkuch11-8   2.73s ± 2%   2.82s ± 1%  +3.16%  (p=0.000 n=29+30)

khr -> josh
Fannkuch11-8   2.81s ± 2%   2.82s ± 1%   ~     (p=0.059 n=30+30)

All go1 benchmarks:

all three
name \ time/op           go1-tip         go1-khr         go1-josh
BinaryTree17-8              2.29s ± 1%     2.22s ± 2%     2.25s ± 2%
Fannkuch11-8                2.73s ± 1%     2.80s ± 2%     2.82s ± 1%
FmtFprintfEmpty-8          33.8ns ± 3%    33.9ns ± 2%    34.4ns ± 3%
FmtFprintfString-8         56.4ns ± 2%    58.8ns ± 3%    57.3ns ± 2%
FmtFprintfInt-8            61.0ns ± 2%    61.5ns ± 2%    61.3ns ± 3%
FmtFprintfIntInt-8         92.3ns ± 1%    88.5ns ± 2%    92.0ns ± 1%
FmtFprintfPrefixedInt-8     109ns ± 2%     114ns ± 1%     107ns ± 2%
FmtFprintfFloat-8           188ns ± 2%     185ns ± 2%     185ns ± 1%
FmtManyArgs-8               406ns ± 2%     387ns ± 2%     397ns ± 2%
GobDecode-8                5.18ms ± 1%    5.20ms ± 2%    5.18ms ± 1%
GobEncode-8                4.05ms ± 1%    4.15ms ± 1%    4.05ms ± 2%
Gzip-8                      194ms ± 2%     192ms ± 2%     194ms ± 1%
Gunzip-8                   31.0ms ± 2%    29.5ms ± 2%    30.6ms ± 1%
HTTPClientServer-8         64.4µs ± 1%    63.2µs ± 0%    63.8µs ± 1%
JSONEncode-8               10.2ms ± 3%    10.5ms ± 2%    10.5ms ± 2%
JSONDecode-8               42.9ms ± 2%    42.2ms ± 2%    41.2ms ± 2%
Mandelbrot200-8            3.67ms ± 1%    3.69ms ± 2%    3.68ms ± 1%
GoParse-8                  2.67ms ± 2%    2.67ms ± 3%    2.67ms ± 1%
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.2ns ± 2%    67.7ns ± 2%
RegexpMatchEasy0_1K-8       160ns ± 1%     158ns ± 1%     159ns ± 2%
RegexpMatchEasy1_32-8      63.1ns ± 2%    63.4ns ± 1%    63.4ns ± 1%
RegexpMatchEasy1_1K-8       281ns ± 3%     275ns ± 2%     282ns ± 3%
RegexpMatchMedium_32-8      100ns ± 1%      97ns ± 1%     100ns ± 2%
RegexpMatchMedium_1K-8     31.9µs ± 4%    30.7µs ± 2%    31.8µs ± 2%
RegexpMatchHard_32-8       1.49µs ± 2%    1.49µs ± 2%    1.52µs ± 5%
RegexpMatchHard_1K-8       44.5µs ± 2%    44.4µs ± 1%    44.4µs ± 2%
Revcomp-8                   358ms ± 2%     392ms ± 3%     359ms ± 3%
Template-8                 47.0ms ± 3%    47.1ms ± 2%    46.7ms ± 2%
TimeParse-8                 262ns ± 2%     245ns ± 2%     248ns ± 2%
TimeFormat-8                262ns ± 2%     254ns ± 2%     260ns ± 2%

tip -> khr
BinaryTree17-8              2.29s ± 1%     2.22s ± 2%  -3.02%  (p=0.000 n=19+19)
Fannkuch11-8                2.73s ± 1%     2.80s ± 2%  +2.91%  (p=0.000 n=20+20)
FmtFprintfEmpty-8          33.8ns ± 3%    33.9ns ± 2%    ~     (p=0.788 n=18+20)
FmtFprintfString-8         56.4ns ± 2%    58.8ns ± 3%  +4.27%  (p=0.000 n=20+20)
FmtFprintfInt-8            61.0ns ± 2%    61.5ns ± 2%  +0.85%  (p=0.022 n=20+18)
FmtFprintfIntInt-8         92.3ns ± 1%    88.5ns ± 2%  -4.18%  (p=0.000 n=18+20)
FmtFprintfPrefixedInt-8     109ns ± 2%     114ns ± 1%  +5.26%  (p=0.000 n=20+19)
FmtFprintfFloat-8           188ns ± 2%     185ns ± 2%  -1.76%  (p=0.000 n=20+20)
FmtManyArgs-8               406ns ± 2%     387ns ± 2%  -4.76%  (p=0.000 n=20+19)
GobDecode-8                5.18ms ± 1%    5.20ms ± 2%    ~     (p=0.231 n=20+20)
GobEncode-8                4.05ms ± 1%    4.15ms ± 1%  +2.36%  (p=0.000 n=17+19)
Gzip-8                      194ms ± 2%     192ms ± 2%  -0.91%  (p=0.004 n=20+20)
Gunzip-8                   31.0ms ± 2%    29.5ms ± 2%  -4.61%  (p=0.000 n=20+20)
HTTPClientServer-8         64.4µs ± 1%    63.2µs ± 0%  -1.84%  (p=0.000 n=20+15)
JSONEncode-8               10.2ms ± 3%    10.5ms ± 2%  +2.16%  (p=0.000 n=20+19)
JSONDecode-8               42.9ms ± 2%    42.2ms ± 2%  -1.51%  (p=0.000 n=20+19)
Mandelbrot200-8            3.67ms ± 1%    3.69ms ± 2%  +0.55%  (p=0.040 n=20+20)
GoParse-8                  2.67ms ± 2%    2.67ms ± 3%    ~     (p=0.478 n=19+20)
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.2ns ± 2%    ~     (p=0.783 n=20+20)
RegexpMatchEasy0_1K-8       160ns ± 1%     158ns ± 1%  -1.02%  (p=0.000 n=18+20)
RegexpMatchEasy1_32-8      63.1ns ± 2%    63.4ns ± 1%    ~     (p=0.113 n=20+18)
RegexpMatchEasy1_1K-8       281ns ± 3%     275ns ± 2%  -2.17%  (p=0.000 n=20+20)
RegexpMatchMedium_32-8      100ns ± 1%      97ns ± 1%  -2.98%  (p=0.000 n=18+20)
RegexpMatchMedium_1K-8     31.9µs ± 4%    30.7µs ± 2%  -3.78%  (p=0.000 n=20+20)
RegexpMatchHard_32-8       1.49µs ± 2%    1.49µs ± 2%    ~     (p=0.344 n=20+20)
RegexpMatchHard_1K-8       44.5µs ± 2%    44.4µs ± 1%    ~     (p=0.604 n=19+18)
Revcomp-8                   358ms ± 2%     392ms ± 3%  +9.64%  (p=0.000 n=19+20)
Template-8                 47.0ms ± 3%    47.1ms ± 2%    ~     (p=0.444 n=20+19)
TimeParse-8                 262ns ± 2%     245ns ± 2%  -6.21%  (p=0.000 n=20+19)
TimeFormat-8                262ns ± 2%     254ns ± 2%  -3.07%  (p=0.000 n=20+20)
[Geo mean]                 39.7µs         39.5µs       -0.50%

tip -> josh
name                     old time/op    new time/op    delta
BinaryTree17-8              2.29s ± 1%     2.25s ± 2%  -1.82%  (p=0.000 n=19+19)
Fannkuch11-8                2.73s ± 1%     2.82s ± 1%  +3.49%  (p=0.000 n=20+20)
FmtFprintfEmpty-8          33.8ns ± 3%    34.4ns ± 3%  +1.75%  (p=0.015 n=18+20)
FmtFprintfString-8         56.4ns ± 2%    57.3ns ± 2%  +1.61%  (p=0.000 n=20+20)
FmtFprintfInt-8            61.0ns ± 2%    61.3ns ± 3%    ~     (p=0.154 n=20+20)
FmtFprintfIntInt-8         92.3ns ± 1%    92.0ns ± 1%    ~     (p=0.119 n=18+18)
FmtFprintfPrefixedInt-8     109ns ± 2%     107ns ± 2%  -1.24%  (p=0.007 n=20+20)
FmtFprintfFloat-8           188ns ± 2%     185ns ± 1%  -1.34%  (p=0.000 n=20+18)
FmtManyArgs-8               406ns ± 2%     397ns ± 2%  -2.31%  (p=0.000 n=20+19)
GobDecode-8                5.18ms ± 1%    5.18ms ± 1%    ~     (p=0.968 n=20+20)
GobEncode-8                4.05ms ± 1%    4.05ms ± 2%    ~     (p=0.684 n=17+20)
Gzip-8                      194ms ± 2%     194ms ± 1%    ~     (p=0.659 n=20+20)
Gunzip-8                   31.0ms ± 2%    30.6ms ± 1%  -1.09%  (p=0.001 n=20+20)
HTTPClientServer-8         64.4µs ± 1%    63.8µs ± 1%  -0.99%  (p=0.000 n=20+18)
JSONEncode-8               10.2ms ± 3%    10.5ms ± 2%  +2.16%  (p=0.000 n=20+20)
JSONDecode-8               42.9ms ± 2%    41.2ms ± 2%  -3.86%  (p=0.000 n=20+20)
Mandelbrot200-8            3.67ms ± 1%    3.68ms ± 1%  +0.47%  (p=0.019 n=20+19)
GoParse-8                  2.67ms ± 2%    2.67ms ± 1%    ~     (p=0.845 n=19+18)
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.7ns ± 2%  +0.66%  (p=0.028 n=20+18)
RegexpMatchEasy0_1K-8       160ns ± 1%     159ns ± 2%  -0.64%  (p=0.049 n=18+20)
RegexpMatchEasy1_32-8      63.1ns ± 2%    63.4ns ± 1%    ~     (p=0.091 n=20+18)
RegexpMatchEasy1_1K-8       281ns ± 3%     282ns ± 3%    ~     (p=0.615 n=20+20)
RegexpMatchMedium_32-8      100ns ± 1%     100ns ± 2%    ~     (p=0.636 n=18+20)
RegexpMatchMedium_1K-8     31.9µs ± 4%    31.8µs ± 2%    ~     (p=0.857 n=20+19)
RegexpMatchHard_32-8       1.49µs ± 2%    1.52µs ± 5%    ~     (p=0.073 n=20+20)
RegexpMatchHard_1K-8       44.5µs ± 2%    44.4µs ± 2%    ~     (p=0.361 n=19+20)
Revcomp-8                   358ms ± 2%     359ms ± 3%    ~     (p=0.550 n=19+20)
Template-8                 47.0ms ± 3%    46.7ms ± 2%    ~     (p=0.192 n=20+20)
TimeParse-8                 262ns ± 2%     248ns ± 2%  -5.32%  (p=0.000 n=20+20)
TimeFormat-8                262ns ± 2%     260ns ± 2%    ~     (p=0.114 n=20+20)
[Geo mean]                 39.7µs         39.6µs       -0.25%

khr -> josh
name                     old time/op    new time/op    delta
BinaryTree17-8              2.22s ± 2%     2.25s ± 2%  +1.23%  (p=0.000 n=19+19)
Fannkuch11-8                2.80s ± 2%     2.82s ± 1%  +0.56%  (p=0.040 n=20+20)
FmtFprintfEmpty-8          33.9ns ± 2%    34.4ns ± 3%  +1.68%  (p=0.011 n=20+20)
FmtFprintfString-8         58.8ns ± 3%    57.3ns ± 2%  -2.55%  (p=0.000 n=20+20)
FmtFprintfInt-8            61.5ns ± 2%    61.3ns ± 3%    ~     (p=0.627 n=18+20)
FmtFprintfIntInt-8         88.5ns ± 2%    92.0ns ± 1%  +4.03%  (p=0.000 n=20+18)
FmtFprintfPrefixedInt-8     114ns ± 1%     107ns ± 2%  -6.18%  (p=0.000 n=19+20)
FmtFprintfFloat-8           185ns ± 2%     185ns ± 1%    ~     (p=0.061 n=20+18)
FmtManyArgs-8               387ns ± 2%     397ns ± 2%  +2.57%  (p=0.000 n=19+19)
GobDecode-8                5.20ms ± 2%    5.18ms ± 1%    ~     (p=0.149 n=20+20)
GobEncode-8                4.15ms ± 1%    4.05ms ± 2%  -2.23%  (p=0.000 n=19+20)
Gzip-8                      192ms ± 2%     194ms ± 1%  +1.01%  (p=0.001 n=20+20)
Gunzip-8                   29.5ms ± 2%    30.6ms ± 1%  +3.69%  (p=0.000 n=20+20)
HTTPClientServer-8         63.2µs ± 0%    63.8µs ± 1%  +0.87%  (p=0.000 n=15+18)
JSONEncode-8               10.5ms ± 2%    10.5ms ± 2%    ~     (p=0.901 n=19+20)
JSONDecode-8               42.2ms ± 2%    41.2ms ± 2%  -2.38%  (p=0.000 n=19+20)
Mandelbrot200-8            3.69ms ± 2%    3.68ms ± 1%    ~     (p=0.813 n=20+19)
GoParse-8                  2.67ms ± 3%    2.67ms ± 1%    ~     (p=0.496 n=20+18)
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.7ns ± 2%  +0.72%  (p=0.007 n=20+18)
RegexpMatchEasy0_1K-8       158ns ± 1%     159ns ± 2%    ~     (p=0.207 n=20+20)
RegexpMatchEasy1_32-8      63.4ns ± 1%    63.4ns ± 1%    ~     (p=0.808 n=18+18)
RegexpMatchEasy1_1K-8       275ns ± 2%     282ns ± 3%  +2.65%  (p=0.000 n=20+20)
RegexpMatchMedium_32-8     96.7ns ± 1%    99.6ns ± 2%  +3.05%  (p=0.000 n=20+20)
RegexpMatchMedium_1K-8     30.7µs ± 2%    31.8µs ± 2%  +3.50%  (p=0.000 n=20+19)
RegexpMatchHard_32-8       1.49µs ± 2%    1.52µs ± 5%  +2.01%  (p=0.020 n=20+20)
RegexpMatchHard_1K-8       44.4µs ± 1%    44.4µs ± 2%    ~     (p=0.691 n=18+20)
Revcomp-8                   392ms ± 3%     359ms ± 3%  -8.54%  (p=0.000 n=20+20)
Template-8                 47.1ms ± 2%    46.7ms ± 2%  -0.69%  (p=0.028 n=19+20)
TimeParse-8                 245ns ± 2%     248ns ± 2%  +0.95%  (p=0.008 n=19+20)
TimeFormat-8                254ns ± 2%     260ns ± 2%  +2.58%  (p=0.000 n=20+20)
[Geo mean]                 39.5µs         39.6µs       +0.25%

append benchmarks from #14758:

all three
name \ time/op  append-tip      append-khr      append-josh
Foo-8              324ns ± 2%     486ns ± 2%     294ns ± 2%
Bar-8              435ns ± 2%     579ns ± 1%     434ns ± 2%

tip -> khr
Foo-8          324ns ± 2%     486ns ± 2%  +50.12%  (p=0.000 n=27+29)
Bar-8          435ns ± 2%     579ns ± 1%  +33.03%  (p=0.000 n=30+29)
[Geo mean]     375ns          530ns       +41.32%

tip -> josh
name        old time/op    new time/op    delta
Foo-8          324ns ± 2%     294ns ± 2%   -9.32%  (p=0.000 n=27+30)
Bar-8          435ns ± 2%     434ns ± 2%     ~     (p=0.231 n=30+30)
[Geo mean]     375ns          357ns        -4.93%

khr -> josh
name        old time/op    new time/op    delta
Foo-8          486ns ± 2%     294ns ± 2%  -39.60%  (p=0.000 n=29+30)
Bar-8          579ns ± 1%     434ns ± 2%  -25.07%  (p=0.000 n=29+30)
[Geo mean]     530ns          357ns       -32.72%

compilebench:

all three
name \ time/op       cb-tip       cb-khr       cb-josh
Template              205ms ± 5%   204ms ± 5%   205ms ± 6%
Unicode              85.0ms ± 5%  84.3ms ± 5%  83.7ms ± 4%
GoTypes               530ms ± 2%   526ms ± 3%   526ms ± 4%
Compiler              2.44s ± 3%   2.44s ± 3%   2.43s ± 3%
SSA                   4.89s ± 2%   4.86s ± 2%   4.85s ± 2%
Flate                 118ms ± 4%   117ms ± 3%   116ms ± 2%
GoParser              139ms ± 4%   138ms ± 4%   138ms ± 3%
Reflect               334ms ± 3%   330ms ± 4%   331ms ± 3%
Tar                   102ms ± 5%   102ms ± 5%   100ms ± 4%
XML                   192ms ± 8%   196ms ±10%   197ms ±10%
[Geo mean]            322ms        320ms        320ms     

tip -> khr
name        old time/op       new time/op       delta
Template          205ms ± 5%        204ms ± 5%    ~     (p=0.105 n=96+95)
Unicode          85.0ms ± 5%       84.3ms ± 5%  -0.75%  (p=0.003 n=99+98)
GoTypes           530ms ± 2%        526ms ± 3%  -0.65%  (p=0.000 n=99+96)
Compiler          2.44s ± 3%        2.44s ± 3%    ~     (p=0.492 n=97+99)
SSA               4.89s ± 2%        4.86s ± 2%  -0.57%  (p=0.000 n=97+98)
Flate             118ms ± 4%        117ms ± 3%  -0.65%  (p=0.001 n=99+96)
GoParser          139ms ± 4%        138ms ± 4%  -0.73%  (p=0.001 n=99+100)
Reflect           334ms ± 3%        330ms ± 4%  -1.12%  (p=0.000 n=98+96)
Tar               102ms ± 5%        102ms ± 5%  -0.47%  (p=0.032 n=97+98)
XML               192ms ± 8%        196ms ±10%    ~     (p=0.091 n=86+100)
[Geo mean]        322ms             320ms       -0.35%

tip -> josh
Template          205ms ± 5%        205ms ± 6%    ~     (p=0.672 n=96+97)
Unicode          85.0ms ± 5%       83.7ms ± 4%  -1.48%  (p=0.000 n=99+95)
GoTypes           530ms ± 2%        526ms ± 4%  -0.69%  (p=0.000 n=99+98)
Compiler          2.44s ± 3%        2.43s ± 3%    ~     (p=0.119 n=97+100)
SSA               4.89s ± 2%        4.85s ± 2%  -0.67%  (p=0.000 n=97+99)
Flate             118ms ± 4%        116ms ± 2%  -1.51%  (p=0.000 n=99+90)
GoParser          139ms ± 4%        138ms ± 3%  -1.32%  (p=0.000 n=99+99)
Reflect           334ms ± 3%        331ms ± 3%  -0.88%  (p=0.000 n=98+97)
Tar               102ms ± 5%        100ms ± 4%  -1.70%  (p=0.000 n=97+97)
XML               192ms ± 8%        197ms ±10%  +2.60%  (p=0.040 n=86+100)
[Geo mean]        322ms             320ms       -0.60%

khr -> josh
name        old time/op       new time/op       delta
Template          204ms ± 5%        205ms ± 6%    ~     (p=0.061 n=95+97)
Unicode          84.3ms ± 5%       83.7ms ± 4%  -0.74%  (p=0.005 n=98+95)
GoTypes           526ms ± 3%        526ms ± 4%    ~     (p=0.962 n=96+98)
Compiler          2.44s ± 3%        2.43s ± 3%    ~     (p=0.398 n=99+100)
SSA               4.86s ± 2%        4.85s ± 2%    ~     (p=0.327 n=98+99)
Flate             117ms ± 3%        116ms ± 2%  -0.86%  (p=0.000 n=96+90)
GoParser          138ms ± 4%        138ms ± 3%  -0.60%  (p=0.004 n=100+99)
Reflect           330ms ± 4%        331ms ± 3%    ~     (p=0.187 n=96+97)
Tar               102ms ± 5%        100ms ± 4%  -1.24%  (p=0.000 n=98+97)
XML               196ms ±10%        197ms ±10%    ~     (p=0.919 n=100+100)
[Geo mean]        320ms             320ms       -0.25%
Contributor

josharian commented May 16, 2017

I ran some benchmarks with CL 43491 ("josh") and CL 43502 ("khr") on top of 5548f7d ("tip"). For each, I'll show all three together, tip -> khr, tip -> josh, and khr -> josh. It's lots of data, but it's hard to make sense of without all the pairwise stats.

Looking through these, I'm weakly inclined to go for the only slightly more complicated josh CL, mostly on the basis of the macro (compilebench) benchmark and the big append regression. I also think it is interesting that the khr CL has a big +/- impact on lots of go1 benchmarks, but the josh CL does not. It makes me suspect that there are better heuristics available to decide whether to not unroll (khr) or move many blocks (josh). I think we should pull in either khr or josh for 1.9 and then revisit in 1.10 whether they can be fruitfully combined. Other changes to the layout pass may also affect this.

@randall77 now that we have some data, please make an executive decision about how to proceed.

Fannkuch running alone:

all three
name \ time/op  fk-tip       fk-khr       fk-josh
Fannkuch11-8    2.73s ± 2%  2.81s ± 2%  2.82s ± 1%

tip -> khr
Fannkuch11-8   2.73s ± 2%   2.81s ± 2%  +2.78%  (p=0.000 n=29+30)

tip -> josh
Fannkuch11-8   2.73s ± 2%   2.82s ± 1%  +3.16%  (p=0.000 n=29+30)

khr -> josh
Fannkuch11-8   2.81s ± 2%   2.82s ± 1%   ~     (p=0.059 n=30+30)

All go1 benchmarks:

all three
name \ time/op           go1-tip         go1-khr         go1-josh
BinaryTree17-8              2.29s ± 1%     2.22s ± 2%     2.25s ± 2%
Fannkuch11-8                2.73s ± 1%     2.80s ± 2%     2.82s ± 1%
FmtFprintfEmpty-8          33.8ns ± 3%    33.9ns ± 2%    34.4ns ± 3%
FmtFprintfString-8         56.4ns ± 2%    58.8ns ± 3%    57.3ns ± 2%
FmtFprintfInt-8            61.0ns ± 2%    61.5ns ± 2%    61.3ns ± 3%
FmtFprintfIntInt-8         92.3ns ± 1%    88.5ns ± 2%    92.0ns ± 1%
FmtFprintfPrefixedInt-8     109ns ± 2%     114ns ± 1%     107ns ± 2%
FmtFprintfFloat-8           188ns ± 2%     185ns ± 2%     185ns ± 1%
FmtManyArgs-8               406ns ± 2%     387ns ± 2%     397ns ± 2%
GobDecode-8                5.18ms ± 1%    5.20ms ± 2%    5.18ms ± 1%
GobEncode-8                4.05ms ± 1%    4.15ms ± 1%    4.05ms ± 2%
Gzip-8                      194ms ± 2%     192ms ± 2%     194ms ± 1%
Gunzip-8                   31.0ms ± 2%    29.5ms ± 2%    30.6ms ± 1%
HTTPClientServer-8         64.4µs ± 1%    63.2µs ± 0%    63.8µs ± 1%
JSONEncode-8               10.2ms ± 3%    10.5ms ± 2%    10.5ms ± 2%
JSONDecode-8               42.9ms ± 2%    42.2ms ± 2%    41.2ms ± 2%
Mandelbrot200-8            3.67ms ± 1%    3.69ms ± 2%    3.68ms ± 1%
GoParse-8                  2.67ms ± 2%    2.67ms ± 3%    2.67ms ± 1%
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.2ns ± 2%    67.7ns ± 2%
RegexpMatchEasy0_1K-8       160ns ± 1%     158ns ± 1%     159ns ± 2%
RegexpMatchEasy1_32-8      63.1ns ± 2%    63.4ns ± 1%    63.4ns ± 1%
RegexpMatchEasy1_1K-8       281ns ± 3%     275ns ± 2%     282ns ± 3%
RegexpMatchMedium_32-8      100ns ± 1%      97ns ± 1%     100ns ± 2%
RegexpMatchMedium_1K-8     31.9µs ± 4%    30.7µs ± 2%    31.8µs ± 2%
RegexpMatchHard_32-8       1.49µs ± 2%    1.49µs ± 2%    1.52µs ± 5%
RegexpMatchHard_1K-8       44.5µs ± 2%    44.4µs ± 1%    44.4µs ± 2%
Revcomp-8                   358ms ± 2%     392ms ± 3%     359ms ± 3%
Template-8                 47.0ms ± 3%    47.1ms ± 2%    46.7ms ± 2%
TimeParse-8                 262ns ± 2%     245ns ± 2%     248ns ± 2%
TimeFormat-8                262ns ± 2%     254ns ± 2%     260ns ± 2%

tip -> khr
BinaryTree17-8              2.29s ± 1%     2.22s ± 2%  -3.02%  (p=0.000 n=19+19)
Fannkuch11-8                2.73s ± 1%     2.80s ± 2%  +2.91%  (p=0.000 n=20+20)
FmtFprintfEmpty-8          33.8ns ± 3%    33.9ns ± 2%    ~     (p=0.788 n=18+20)
FmtFprintfString-8         56.4ns ± 2%    58.8ns ± 3%  +4.27%  (p=0.000 n=20+20)
FmtFprintfInt-8            61.0ns ± 2%    61.5ns ± 2%  +0.85%  (p=0.022 n=20+18)
FmtFprintfIntInt-8         92.3ns ± 1%    88.5ns ± 2%  -4.18%  (p=0.000 n=18+20)
FmtFprintfPrefixedInt-8     109ns ± 2%     114ns ± 1%  +5.26%  (p=0.000 n=20+19)
FmtFprintfFloat-8           188ns ± 2%     185ns ± 2%  -1.76%  (p=0.000 n=20+20)
FmtManyArgs-8               406ns ± 2%     387ns ± 2%  -4.76%  (p=0.000 n=20+19)
GobDecode-8                5.18ms ± 1%    5.20ms ± 2%    ~     (p=0.231 n=20+20)
GobEncode-8                4.05ms ± 1%    4.15ms ± 1%  +2.36%  (p=0.000 n=17+19)
Gzip-8                      194ms ± 2%     192ms ± 2%  -0.91%  (p=0.004 n=20+20)
Gunzip-8                   31.0ms ± 2%    29.5ms ± 2%  -4.61%  (p=0.000 n=20+20)
HTTPClientServer-8         64.4µs ± 1%    63.2µs ± 0%  -1.84%  (p=0.000 n=20+15)
JSONEncode-8               10.2ms ± 3%    10.5ms ± 2%  +2.16%  (p=0.000 n=20+19)
JSONDecode-8               42.9ms ± 2%    42.2ms ± 2%  -1.51%  (p=0.000 n=20+19)
Mandelbrot200-8            3.67ms ± 1%    3.69ms ± 2%  +0.55%  (p=0.040 n=20+20)
GoParse-8                  2.67ms ± 2%    2.67ms ± 3%    ~     (p=0.478 n=19+20)
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.2ns ± 2%    ~     (p=0.783 n=20+20)
RegexpMatchEasy0_1K-8       160ns ± 1%     158ns ± 1%  -1.02%  (p=0.000 n=18+20)
RegexpMatchEasy1_32-8      63.1ns ± 2%    63.4ns ± 1%    ~     (p=0.113 n=20+18)
RegexpMatchEasy1_1K-8       281ns ± 3%     275ns ± 2%  -2.17%  (p=0.000 n=20+20)
RegexpMatchMedium_32-8      100ns ± 1%      97ns ± 1%  -2.98%  (p=0.000 n=18+20)
RegexpMatchMedium_1K-8     31.9µs ± 4%    30.7µs ± 2%  -3.78%  (p=0.000 n=20+20)
RegexpMatchHard_32-8       1.49µs ± 2%    1.49µs ± 2%    ~     (p=0.344 n=20+20)
RegexpMatchHard_1K-8       44.5µs ± 2%    44.4µs ± 1%    ~     (p=0.604 n=19+18)
Revcomp-8                   358ms ± 2%     392ms ± 3%  +9.64%  (p=0.000 n=19+20)
Template-8                 47.0ms ± 3%    47.1ms ± 2%    ~     (p=0.444 n=20+19)
TimeParse-8                 262ns ± 2%     245ns ± 2%  -6.21%  (p=0.000 n=20+19)
TimeFormat-8                262ns ± 2%     254ns ± 2%  -3.07%  (p=0.000 n=20+20)
[Geo mean]                 39.7µs         39.5µs       -0.50%

tip -> josh
name                     old time/op    new time/op    delta
BinaryTree17-8              2.29s ± 1%     2.25s ± 2%  -1.82%  (p=0.000 n=19+19)
Fannkuch11-8                2.73s ± 1%     2.82s ± 1%  +3.49%  (p=0.000 n=20+20)
FmtFprintfEmpty-8          33.8ns ± 3%    34.4ns ± 3%  +1.75%  (p=0.015 n=18+20)
FmtFprintfString-8         56.4ns ± 2%    57.3ns ± 2%  +1.61%  (p=0.000 n=20+20)
FmtFprintfInt-8            61.0ns ± 2%    61.3ns ± 3%    ~     (p=0.154 n=20+20)
FmtFprintfIntInt-8         92.3ns ± 1%    92.0ns ± 1%    ~     (p=0.119 n=18+18)
FmtFprintfPrefixedInt-8     109ns ± 2%     107ns ± 2%  -1.24%  (p=0.007 n=20+20)
FmtFprintfFloat-8           188ns ± 2%     185ns ± 1%  -1.34%  (p=0.000 n=20+18)
FmtManyArgs-8               406ns ± 2%     397ns ± 2%  -2.31%  (p=0.000 n=20+19)
GobDecode-8                5.18ms ± 1%    5.18ms ± 1%    ~     (p=0.968 n=20+20)
GobEncode-8                4.05ms ± 1%    4.05ms ± 2%    ~     (p=0.684 n=17+20)
Gzip-8                      194ms ± 2%     194ms ± 1%    ~     (p=0.659 n=20+20)
Gunzip-8                   31.0ms ± 2%    30.6ms ± 1%  -1.09%  (p=0.001 n=20+20)
HTTPClientServer-8         64.4µs ± 1%    63.8µs ± 1%  -0.99%  (p=0.000 n=20+18)
JSONEncode-8               10.2ms ± 3%    10.5ms ± 2%  +2.16%  (p=0.000 n=20+20)
JSONDecode-8               42.9ms ± 2%    41.2ms ± 2%  -3.86%  (p=0.000 n=20+20)
Mandelbrot200-8            3.67ms ± 1%    3.68ms ± 1%  +0.47%  (p=0.019 n=20+19)
GoParse-8                  2.67ms ± 2%    2.67ms ± 1%    ~     (p=0.845 n=19+18)
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.7ns ± 2%  +0.66%  (p=0.028 n=20+18)
RegexpMatchEasy0_1K-8       160ns ± 1%     159ns ± 2%  -0.64%  (p=0.049 n=18+20)
RegexpMatchEasy1_32-8      63.1ns ± 2%    63.4ns ± 1%    ~     (p=0.091 n=20+18)
RegexpMatchEasy1_1K-8       281ns ± 3%     282ns ± 3%    ~     (p=0.615 n=20+20)
RegexpMatchMedium_32-8      100ns ± 1%     100ns ± 2%    ~     (p=0.636 n=18+20)
RegexpMatchMedium_1K-8     31.9µs ± 4%    31.8µs ± 2%    ~     (p=0.857 n=20+19)
RegexpMatchHard_32-8       1.49µs ± 2%    1.52µs ± 5%    ~     (p=0.073 n=20+20)
RegexpMatchHard_1K-8       44.5µs ± 2%    44.4µs ± 2%    ~     (p=0.361 n=19+20)
Revcomp-8                   358ms ± 2%     359ms ± 3%    ~     (p=0.550 n=19+20)
Template-8                 47.0ms ± 3%    46.7ms ± 2%    ~     (p=0.192 n=20+20)
TimeParse-8                 262ns ± 2%     248ns ± 2%  -5.32%  (p=0.000 n=20+20)
TimeFormat-8                262ns ± 2%     260ns ± 2%    ~     (p=0.114 n=20+20)
[Geo mean]                 39.7µs         39.6µs       -0.25%

khr -> josh
name                     old time/op    new time/op    delta
BinaryTree17-8              2.22s ± 2%     2.25s ± 2%  +1.23%  (p=0.000 n=19+19)
Fannkuch11-8                2.80s ± 2%     2.82s ± 1%  +0.56%  (p=0.040 n=20+20)
FmtFprintfEmpty-8          33.9ns ± 2%    34.4ns ± 3%  +1.68%  (p=0.011 n=20+20)
FmtFprintfString-8         58.8ns ± 3%    57.3ns ± 2%  -2.55%  (p=0.000 n=20+20)
FmtFprintfInt-8            61.5ns ± 2%    61.3ns ± 3%    ~     (p=0.627 n=18+20)
FmtFprintfIntInt-8         88.5ns ± 2%    92.0ns ± 1%  +4.03%  (p=0.000 n=20+18)
FmtFprintfPrefixedInt-8     114ns ± 1%     107ns ± 2%  -6.18%  (p=0.000 n=19+20)
FmtFprintfFloat-8           185ns ± 2%     185ns ± 1%    ~     (p=0.061 n=20+18)
FmtManyArgs-8               387ns ± 2%     397ns ± 2%  +2.57%  (p=0.000 n=19+19)
GobDecode-8                5.20ms ± 2%    5.18ms ± 1%    ~     (p=0.149 n=20+20)
GobEncode-8                4.15ms ± 1%    4.05ms ± 2%  -2.23%  (p=0.000 n=19+20)
Gzip-8                      192ms ± 2%     194ms ± 1%  +1.01%  (p=0.001 n=20+20)
Gunzip-8                   29.5ms ± 2%    30.6ms ± 1%  +3.69%  (p=0.000 n=20+20)
HTTPClientServer-8         63.2µs ± 0%    63.8µs ± 1%  +0.87%  (p=0.000 n=15+18)
JSONEncode-8               10.5ms ± 2%    10.5ms ± 2%    ~     (p=0.901 n=19+20)
JSONDecode-8               42.2ms ± 2%    41.2ms ± 2%  -2.38%  (p=0.000 n=19+20)
Mandelbrot200-8            3.69ms ± 2%    3.68ms ± 1%    ~     (p=0.813 n=20+19)
GoParse-8                  2.67ms ± 3%    2.67ms ± 1%    ~     (p=0.496 n=20+18)
RegexpMatchEasy0_32-8      67.2ns ± 2%    67.7ns ± 2%  +0.72%  (p=0.007 n=20+18)
RegexpMatchEasy0_1K-8       158ns ± 1%     159ns ± 2%    ~     (p=0.207 n=20+20)
RegexpMatchEasy1_32-8      63.4ns ± 1%    63.4ns ± 1%    ~     (p=0.808 n=18+18)
RegexpMatchEasy1_1K-8       275ns ± 2%     282ns ± 3%  +2.65%  (p=0.000 n=20+20)
RegexpMatchMedium_32-8     96.7ns ± 1%    99.6ns ± 2%  +3.05%  (p=0.000 n=20+20)
RegexpMatchMedium_1K-8     30.7µs ± 2%    31.8µs ± 2%  +3.50%  (p=0.000 n=20+19)
RegexpMatchHard_32-8       1.49µs ± 2%    1.52µs ± 5%  +2.01%  (p=0.020 n=20+20)
RegexpMatchHard_1K-8       44.4µs ± 1%    44.4µs ± 2%    ~     (p=0.691 n=18+20)
Revcomp-8                   392ms ± 3%     359ms ± 3%  -8.54%  (p=0.000 n=20+20)
Template-8                 47.1ms ± 2%    46.7ms ± 2%  -0.69%  (p=0.028 n=19+20)
TimeParse-8                 245ns ± 2%     248ns ± 2%  +0.95%  (p=0.008 n=19+20)
TimeFormat-8                254ns ± 2%     260ns ± 2%  +2.58%  (p=0.000 n=20+20)
[Geo mean]                 39.5µs         39.6µs       +0.25%

append benchmarks from #14758:

all three
name \ time/op  append-tip      append-khr      append-josh
Foo-8              324ns ± 2%     486ns ± 2%     294ns ± 2%
Bar-8              435ns ± 2%     579ns ± 1%     434ns ± 2%

tip -> khr
Foo-8          324ns ± 2%     486ns ± 2%  +50.12%  (p=0.000 n=27+29)
Bar-8          435ns ± 2%     579ns ± 1%  +33.03%  (p=0.000 n=30+29)
[Geo mean]     375ns          530ns       +41.32%

tip -> josh
name        old time/op    new time/op    delta
Foo-8          324ns ± 2%     294ns ± 2%   -9.32%  (p=0.000 n=27+30)
Bar-8          435ns ± 2%     434ns ± 2%     ~     (p=0.231 n=30+30)
[Geo mean]     375ns          357ns        -4.93%

khr -> josh
name        old time/op    new time/op    delta
Foo-8          486ns ± 2%     294ns ± 2%  -39.60%  (p=0.000 n=29+30)
Bar-8          579ns ± 1%     434ns ± 2%  -25.07%  (p=0.000 n=29+30)
[Geo mean]     530ns          357ns       -32.72%

compilebench:

all three
name \ time/op       cb-tip       cb-khr       cb-josh
Template              205ms ± 5%   204ms ± 5%   205ms ± 6%
Unicode              85.0ms ± 5%  84.3ms ± 5%  83.7ms ± 4%
GoTypes               530ms ± 2%   526ms ± 3%   526ms ± 4%
Compiler              2.44s ± 3%   2.44s ± 3%   2.43s ± 3%
SSA                   4.89s ± 2%   4.86s ± 2%   4.85s ± 2%
Flate                 118ms ± 4%   117ms ± 3%   116ms ± 2%
GoParser              139ms ± 4%   138ms ± 4%   138ms ± 3%
Reflect               334ms ± 3%   330ms ± 4%   331ms ± 3%
Tar                   102ms ± 5%   102ms ± 5%   100ms ± 4%
XML                   192ms ± 8%   196ms ±10%   197ms ±10%
[Geo mean]            322ms        320ms        320ms     

tip -> khr
name        old time/op       new time/op       delta
Template          205ms ± 5%        204ms ± 5%    ~     (p=0.105 n=96+95)
Unicode          85.0ms ± 5%       84.3ms ± 5%  -0.75%  (p=0.003 n=99+98)
GoTypes           530ms ± 2%        526ms ± 3%  -0.65%  (p=0.000 n=99+96)
Compiler          2.44s ± 3%        2.44s ± 3%    ~     (p=0.492 n=97+99)
SSA               4.89s ± 2%        4.86s ± 2%  -0.57%  (p=0.000 n=97+98)
Flate             118ms ± 4%        117ms ± 3%  -0.65%  (p=0.001 n=99+96)
GoParser          139ms ± 4%        138ms ± 4%  -0.73%  (p=0.001 n=99+100)
Reflect           334ms ± 3%        330ms ± 4%  -1.12%  (p=0.000 n=98+96)
Tar               102ms ± 5%        102ms ± 5%  -0.47%  (p=0.032 n=97+98)
XML               192ms ± 8%        196ms ±10%    ~     (p=0.091 n=86+100)
[Geo mean]        322ms             320ms       -0.35%

tip -> josh
Template          205ms ± 5%        205ms ± 6%    ~     (p=0.672 n=96+97)
Unicode          85.0ms ± 5%       83.7ms ± 4%  -1.48%  (p=0.000 n=99+95)
GoTypes           530ms ± 2%        526ms ± 4%  -0.69%  (p=0.000 n=99+98)
Compiler          2.44s ± 3%        2.43s ± 3%    ~     (p=0.119 n=97+100)
SSA               4.89s ± 2%        4.85s ± 2%  -0.67%  (p=0.000 n=97+99)
Flate             118ms ± 4%        116ms ± 2%  -1.51%  (p=0.000 n=99+90)
GoParser          139ms ± 4%        138ms ± 3%  -1.32%  (p=0.000 n=99+99)
Reflect           334ms ± 3%        331ms ± 3%  -0.88%  (p=0.000 n=98+97)
Tar               102ms ± 5%        100ms ± 4%  -1.70%  (p=0.000 n=97+97)
XML               192ms ± 8%        197ms ±10%  +2.60%  (p=0.040 n=86+100)
[Geo mean]        322ms             320ms       -0.60%

khr -> josh
name        old time/op       new time/op       delta
Template          204ms ± 5%        205ms ± 6%    ~     (p=0.061 n=95+97)
Unicode          84.3ms ± 5%       83.7ms ± 4%  -0.74%  (p=0.005 n=98+95)
GoTypes           526ms ± 3%        526ms ± 4%    ~     (p=0.962 n=96+98)
Compiler          2.44s ± 3%        2.43s ± 3%    ~     (p=0.398 n=99+100)
SSA               4.86s ± 2%        4.85s ± 2%    ~     (p=0.327 n=98+99)
Flate             117ms ± 3%        116ms ± 2%  -0.86%  (p=0.000 n=96+90)
GoParser          138ms ± 4%        138ms ± 3%  -0.60%  (p=0.004 n=100+99)
Reflect           330ms ± 4%        331ms ± 3%    ~     (p=0.187 n=96+97)
Tar               102ms ± 5%        100ms ± 4%  -1.24%  (p=0.000 n=98+97)
XML               196ms ±10%        197ms ±10%    ~     (p=0.919 n=100+100)
[Geo mean]        320ms             320ms       -0.25%
@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian May 16, 2017

Contributor

Another data point. For the code in #16122:

all three
name \ time/op  tip         khr         josh
Speed/encode-8  438µs ± 2%  430µs ± 1%  374µs ± 0%
Speed/decode-8  356µs ± 1%  314µs ± 1%  320µs ± 1%

tip -> khr
name            old time/op  new time/op  delta
Speed/encode-8   438µs ± 2%   430µs ± 1%   -1.81%  (p=0.000 n=10+10)
Speed/decode-8   356µs ± 1%   314µs ± 1%  -11.76%  (p=0.000 n=10+9)

tip -> josh
name            old time/op  new time/op  delta
Speed/encode-8   438µs ± 2%   374µs ± 0%  -14.71%  (p=0.000 n=10+9)
Speed/decode-8   356µs ± 1%   320µs ± 1%   -9.93%  (p=0.000 n=10+10)

khr -> josh
name            old time/op  new time/op  delta
Speed/encode-8   430µs ± 1%   374µs ± 0%  -13.13%  (p=0.000 n=10+9)
Speed/decode-8   314µs ± 1%   320µs ± 1%   +2.07%  (p=0.000 n=9+10)

Again, seems like josh is a bit better than khr, but an appropriate combo might be better yet.

Contributor

josharian commented May 16, 2017

Another data point. For the code in #16122:

all three
name \ time/op  tip         khr         josh
Speed/encode-8  438µs ± 2%  430µs ± 1%  374µs ± 0%
Speed/decode-8  356µs ± 1%  314µs ± 1%  320µs ± 1%

tip -> khr
name            old time/op  new time/op  delta
Speed/encode-8   438µs ± 2%   430µs ± 1%   -1.81%  (p=0.000 n=10+10)
Speed/decode-8   356µs ± 1%   314µs ± 1%  -11.76%  (p=0.000 n=10+9)

tip -> josh
name            old time/op  new time/op  delta
Speed/encode-8   438µs ± 2%   374µs ± 0%  -14.71%  (p=0.000 n=10+9)
Speed/decode-8   356µs ± 1%   320µs ± 1%   -9.93%  (p=0.000 n=10+10)

khr -> josh
name            old time/op  new time/op  delta
Speed/encode-8   430µs ± 1%   374µs ± 0%  -13.13%  (p=0.000 n=10+9)
Speed/decode-8   314µs ± 1%   320µs ± 1%   +2.07%  (p=0.000 n=9+10)

Again, seems like josh is a bit better than khr, but an appropriate combo might be better yet.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian May 16, 2017

Contributor

(foolish comment deleted)

Contributor

josharian commented May 16, 2017

(foolish comment deleted)

@gopherbot gopherbot closed this in 638ebb0 May 18, 2017

josharian added a commit to josharian/go that referenced this issue May 23, 2017

cmd/compile: add CFG graphs to ssa.html
DO NOT SUBMIT

This CL adds CFG graphs to ssa.html.
It execs dot to generate SVG,
which then gets inlined into the html.
Some standard naming and javascript hacks
enable integration with the rest of ssa.html:
Clicking on blocks highlights the relevant
part of the CFG graph, and vice versa.

Sample output and screenshots can be seen
in issues #20355 and #20356.

There are serious problems with this CL, though.

Performance:

* Calling dot after every pass is noticeably slow.
* The generated output is giant.
* The browser is very slow to render the resulting page.
* Clicking on blocks is even slower than before.
* Some things I want to do, like allow the user to change
  the table column widths, lock up the browser.

Appearance:

* The CFGs can easily be large and overwhelming.
  Toggling them on/off might be workable, if the
  performance concerns above were addressed.
* I can't figure out the right size to render the CFGs;
  simple ones are currently oversized and cartoonish,
  while larger ones are unreadable.
* They requires an unsatisfying amount of explanation (see #20356).
  Block layout information is particularly inferior/confusing.
* Dead blocks float awkwardly in the sky, with no indication
  that they are dead.
* It'd be nice to somehow add visual information about loops,
  which we can calculate, and which is non-obvious in large graphs,
  but I don't know how.
* It'd be nice to add more information per block,
  like the number of values it contains,
  or even the values themselves,
  but adding info to a node makes the graph even less readable.
  Just adding the f.Blocks index in parens was not good.

Bugs, incompleteness:

* I seem to have broken highlighting around the entire
  block in the text.
* Need to hook up some way to communicate dot-related errors
  without bringing down the process.
* Might need some way to enable/disable dot entirely.

Change-Id: I19abc3007f396bdb710ba7563668d343c0924feb

@golang golang locked and limited conversation to collaborators May 18, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.