cmd/compile: implement more optimizations on loong64 #59120

Open
3 of 12 tasks
xen0n opened this issue Mar 19, 2023 · 2 comments
Labels
arch-loong64 Issues solely affecting the loongson architecture. compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done. Performance
xen0n commented Mar 19, 2023

This issue mainly tracks the implementation progress of various low-hanging-fruit optimizations for loong64.

There are many missed optimization opportunities on loong64. A quick survey of the SSA intrinsics uncovers:

  • runtime.publicationBarrier
    • dmb st on arm64
    • dbar 0 on LA64 v1.00
    • dbar <TBD> on next revision of LA64 (finer-grained barriers are to be supported)
  • runtime.Bswap{32,64}
    • revb.{2w,d} on LA64 v1.00
    • CL: TBD
  • runtime/internal/sys.Prefetch{,Streamed}
    • preld on LA64 v1.00
  • runtime/internal/atomic.{And,Or}
    • am{and,or}.d on LA64 v1.00
  • math.{Trunc,Ceil,Floor,RoundToEven} not possible with LA64 v1.00
    • LA64 v1.00 frint.[sd] is not orthogonal: no fixed rounding mode variants (unlike e.g. ftintr{m,p,z,ne}).
  • math.Round
    • frint.[sd] on LA64 v1.00 -- have to check if the rounding mode behavior is tolerable
  • math.Abs
    • fabs.[sd] on LA64 v1.00
  • math.Copysign
    • fcopysign.[sd] on LA64 v1.00
  • math.FMA
    • f{,n}m{add,sub}.[sd] on LA64 v1.00
    • CL: TBD (small overall improvement, but weird performance regression for select cases)
  • math/bits.TrailingZeros{64,32} (ssa.OpCtz{64,32})
  • math/bits.Len{64,32,} (ssa.OpBitLen{64,32})
    • clz.[wd] on LA64 v1.00
    • significant performance regression across the board, needs investigation (micro-architecture quirk?)
  • math/bits.Reverse{64,32,8} (ssa.OpBitRev{64,32,8})
    • bitrev.{d,w,4b} on LA64 v1.00
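
For reference, here is a minimal sketch of the Go-level semantics that any intrinsic lowering for the functions listed above would have to preserve. It only calls the public standard library; `bitFacts` and `roundings` are hypothetical helper names, and `bits.ReverseBytes32` stands in as the public counterpart of the byte-swap intrinsic. Note the difference between `math.Round` and `math.RoundToEven` on halfway cases, which is why the rounding-mode behavior of `frint.[sd]` needs checking.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// bitFacts returns the trailing-zero count and the bit length of x,
// i.e. the operations behind ssa.OpCtz64 and ssa.OpBitLen64.
func bitFacts(x uint64) (tz, length int) {
	return bits.TrailingZeros64(x), bits.Len64(x)
}

// roundings returns math.Round(x) (halves away from zero) and
// math.RoundToEven(x) (halves to the even neighbour).
func roundings(x float64) (away, even float64) {
	return math.Round(x), math.RoundToEven(x)
}

func main() {
	fmt.Println(bitFacts(0x00f0)) // 4 8
	// Zero inputs have defined results at the Go level.
	fmt.Println(bitFacts(0)) // 64 0

	// Bit reversal within a byte, and byte order reversal within a word.
	fmt.Printf("%#x %#x\n", bits.Reverse8(0x01), bits.ReverseBytes32(0x11223344)) // 0x80 0x44332211

	// Halfway cases distinguish the two rounding functions.
	fmt.Println(roundings(2.5)) // 3 2

	fmt.Println(math.Abs(-1.5), math.Copysign(3, -1), math.FMA(2, 3, 4)) // 1.5 -3 10
}
```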

We may want to implement (and preferably benchmark) all of the above.

cc @golang/loong64

@xen0n xen0n added arch-loong64 Issues solely affecting the loongson architecture. compiler/runtime Issues related to the Go compiler and/or runtime. labels Mar 19, 2023
xen0n added a commit to xen0n/go that referenced this issue Mar 20, 2023
…ns with EXTW{B,H}

Updates golang#59120

Change-Id: Ia7dd0dfe20c0ea3e64889e2b38c6b2118b50d56e
xen0n added a commit to xen0n/go that referenced this issue Mar 20, 2023
Updates golang#59120

Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54
xen0n added a commit to xen0n/go that referenced this issue Mar 20, 2023
Updates golang#59120

Change-Id: Icc8f7d8e79c6168aae634f5c36f044f3fd034d89
xen0n added a commit to xen0n/go that referenced this issue Mar 20, 2023
Updates golang#59120

Change-Id: I39c1edbd7363f454ad1e848a25abeced722b16ac
@heschi heschi added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 20, 2023
xen0n added a commit to xen0n/go that referenced this issue Mar 20, 2023
…n loong64

tests TODO

Updates golang#59120

Change-Id: Icde85d717999600954244c1105b7c55759d3469f
@cherrymui cherrymui added this to the Unplanned milestone Mar 20, 2023
xen0n added a commit to xen0n/go that referenced this issue Mar 21, 2023
…ns with EXTW{B,H}

Updates golang#59120

Change-Id: Ia7dd0dfe20c0ea3e64889e2b38c6b2118b50d56e
xen0n added a commit to xen0n/go that referenced this issue Mar 21, 2023
Updates golang#59120

Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54
xen0n added a commit to xen0n/go that referenced this issue Mar 21, 2023
Updates golang#59120

Change-Id: Icc8f7d8e79c6168aae634f5c36f044f3fd034d89
xen0n added a commit to xen0n/go that referenced this issue Mar 21, 2023
…n loong64

tests TODO

Updates golang#59120

Change-Id: Icde85d717999600954244c1105b7c55759d3469f
xen0n added a commit to xen0n/go that referenced this issue Mar 21, 2023
Updates golang#59120

Change-Id: I39c1edbd7363f454ad1e848a25abeced722b16ac
xen0n added a commit to xen0n/go that referenced this issue Mar 25, 2023
…xtensions

Updates golang#59120

Change-Id: Ia7dd0dfe20c0ea3e64889e2b38c6b2118b50d56e
xen0n added a commit to xen0n/go that referenced this issue Mar 25, 2023
Updates golang#59120

Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54
xen0n added a commit to xen0n/go that referenced this issue Mar 25, 2023
Updates golang#59120

Change-Id: Icc8f7d8e79c6168aae634f5c36f044f3fd034d89
xen0n added a commit to xen0n/go that referenced this issue Mar 25, 2023
…n loong64

tests TODO

Updates golang#59120

Change-Id: Icde85d717999600954244c1105b7c55759d3469f
xen0n added a commit to xen0n/go that referenced this issue Mar 25, 2023
Updates golang#59120

Change-Id: I39c1edbd7363f454ad1e848a25abeced722b16ac
xen0n added a commit to xen0n/go that referenced this issue Mar 27, 2023
Updates golang#59120

Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54
xen0n added a commit to xen0n/go that referenced this issue Mar 27, 2023
For the SubFromLen64 test case to work, we need to fold c-(-(x-d))
into x+(c-d) as well.

benchmark TODO

Updates golang#59120

Change-Id: Icc8f7d8e79c6168aae634f5c36f044f3fd034d89
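
The fold mentioned in the commit message above rests on the identity c-(-(x-d)) = x+(c-d), which also holds under two's-complement wraparound. The sketch below checks it in plain Go. The function names are hypothetical, and the shape of `subFromLen64` (64 minus `bits.Len64`) is an assumption about what the SubFromLen64 test case exercises; with c = d = 64 the folded form reduces to a bare leading-zero count.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// unfolded evaluates c-(-(x-d)) literally, as the expression appears before folding.
func unfolded(c, x, d int64) int64 { return c - (-(x - d)) }

// folded evaluates the simplified form x+(c-d).
func folded(c, x, d int64) int64 { return x + (c - d) }

// subFromLen64 is the assumed source-level pattern: 64 minus the bit length,
// which equals the number of leading zeros.
func subFromLen64(n uint64) int { return 64 - bits.Len64(n) }

func main() {
	// The last case overflows in the intermediate steps; both forms wrap identically.
	cases := [][3]int64{{64, 7, 64}, {0, -5, 12}, {math.MaxInt64, math.MinInt64, 1}}
	for _, t := range cases {
		fmt.Println(unfolded(t[0], t[1], t[2]) == folded(t[0], t[1], t[2])) // true
	}
	fmt.Println(subFromLen64(1), bits.LeadingZeros64(1)) // 63 63
}
```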
xen0n added a commit to xen0n/go that referenced this issue Mar 27, 2023
Updates golang#59120

Change-Id: I39c1edbd7363f454ad1e848a25abeced722b16ac
xen0n added a commit to xen0n/go that referenced this issue Mar 27, 2023
…n loong64

tests TODO

Updates golang#59120

Change-Id: Icde85d717999600954244c1105b7c55759d3469f
@gopherbot

Change https://go.dev/cl/479496 mentions this issue: cmd/asm: use single-instruction forms for all loong64 sign and zero extensions

@gopherbot

Change https://go.dev/cl/479498 mentions this issue: cmd/compile: wire up math/bits.TrailingZeros intrinsics for loong64

xen0n added a commit to xen0n/go that referenced this issue Mar 27, 2023
…xtensions

8- and 16-bit sign extensions and 32-bit zero extensions were realized
with left and right shifts before this change. We now support assembling
EXTWB, EXTWH and BSTRPICKV so all three can be done with a single insn.
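
As an illustration of the two-shift sequences being replaced, the sketch below expresses each extension with a left shift followed by a right shift in Go and checks it against the plain conversion that a single instruction would implement. The function names are hypothetical, and the code models the semantics only, not the actual assembler output.

```go
package main

import "fmt"

// signExtend8ViaShifts moves the low byte to the top of a 64-bit value,
// then shifts it back arithmetically so the sign bit is replicated.
func signExtend8ViaShifts(x uint64) int64 { return int64(x<<56) >> 56 }

// signExtend16ViaShifts does the same for the low 16 bits.
func signExtend16ViaShifts(x uint64) int64 { return int64(x<<48) >> 48 }

// zeroExtend32ViaShifts clears the upper 32 bits with a logical shift pair.
func zeroExtend32ViaShifts(x uint64) uint64 { return x << 32 >> 32 }

func main() {
	for _, x := range []uint64{0x7f, 0x80, 0xffff_ffff_ffff_ff80, 0x1_2345_8000} {
		// Each comparison prints true: the shift pair matches the conversion.
		fmt.Println(
			signExtend8ViaShifts(x) == int64(int8(x)),
			signExtend16ViaShifts(x) == int64(int16(x)),
			zeroExtend32ViaShifts(x) == uint64(uint32(x)),
		)
	}
}
```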

Benchmark results on Loongson 3A5000:

goos: linux
goarch: loong64
pkg: test/bench/go1
                      │    before    │                after                │
                      │    sec/op    │   sec/op     vs base                │
BinaryTree17             14.55 ±  1%    14.56 ± 1%        ~ (p=0.912 n=10)
Fannkuch11               3.574 ±  0%    3.711 ± 0%   +3.83% (p=0.000 n=10)
FmtFprintfEmpty         92.89n ±  0%   93.44n ± 0%   +0.60% (p=0.000 n=10)
FmtFprintfString        148.9n ±  0%   157.1n ± 0%   +5.51% (p=0.000 n=10)
FmtFprintfInt           153.5n ±  0%   158.9n ± 0%   +3.52% (p=0.000 n=10)
FmtFprintfIntInt        234.6n ±  0%   241.0n ± 0%   +2.73% (p=0.000 n=10)
FmtFprintfPrefixedInt   317.3n ±  0%   345.2n ± 0%   +8.79% (p=0.000 n=10)
FmtFprintfFloat         404.9n ±  0%   402.2n ± 0%   -0.67% (p=0.000 n=10)
FmtManyArgs             929.1n ±  0%   963.2n ± 0%   +3.66% (p=0.000 n=10)
GobDecode               15.47m ± 17%   12.82m ± 2%  -17.08% (p=0.000 n=10)
GobEncode               17.91m ±  1%   17.85m ± 3%        ~ (p=0.247 n=10)
Gzip                    416.0m ±  0%   410.9m ± 0%   -1.22% (p=0.000 n=10)
Gunzip                  85.34m ±  0%   87.44m ± 0%   +2.46% (p=0.000 n=10)
HTTPClientServer        86.64µ ±  0%   86.43µ ± 0%        ~ (p=0.052 n=10)
JSONEncode              18.58m ±  0%   18.55m ± 0%        ~ (p=0.089 n=10)
JSONDecode              77.58m ±  0%   78.25m ± 0%   +0.88% (p=0.000 n=10)
Mandelbrot200           7.221m ±  0%   7.214m ± 0%   -0.10% (p=0.000 n=10)
GoParse                 7.653m ±  1%   7.921m ± 2%   +3.51% (p=0.000 n=10)
RegexpMatchEasy0_32     140.2n ±  0%   133.0n ± 0%   -5.14% (p=0.000 n=10)
RegexpMatchEasy0_1K     1.539µ ±  0%   1.361µ ± 0%  -11.57% (p=0.000 n=10)
RegexpMatchEasy1_32     161.8n ±  0%   162.9n ± 0%   +0.68% (p=0.000 n=10)
RegexpMatchEasy1_1K     1.633µ ±  0%   1.491µ ± 0%   -8.70% (p=0.000 n=10)
RegexpMatchMedium_32    1.369µ ±  0%   1.411µ ± 0%   +3.07% (p=0.000 n=10)
RegexpMatchMedium_1K    39.98µ ±  0%   41.49µ ± 0%   +3.78% (p=0.000 n=10)
RegexpMatchHard_32      2.100µ ±  0%   2.057µ ± 0%   -2.05% (p=0.000 n=10)
RegexpMatchHard_1K      62.54µ ±  0%   60.80µ ± 0%   -2.78% (p=0.000 n=10)
Revcomp                  1.351 ±  0%    1.196 ± 0%  -11.46% (p=0.000 n=10)
Template                117.9m ±  1%   115.4m ± 2%   -2.07% (p=0.002 n=10)
TimeParse               408.1n ±  0%   397.3n ± 0%   -2.65% (p=0.000 n=10)
TimeFormat              508.0n ±  0%   505.1n ± 0%   -0.58% (p=0.000 n=10)
geomean                 103.5µ         102.6µ        -0.94%

                     │    before     │                after                 │
                     │      B/s      │     B/s       vs base                │
GobDecode              47.33Mi ± 20%   57.08Mi ± 2%  +20.59% (p=0.000 n=10)
GobEncode              40.87Mi ±  1%   41.01Mi ± 3%        ~ (p=0.288 n=10)
Gzip                   44.49Mi ±  0%   45.04Mi ± 0%   +1.23% (p=0.000 n=10)
Gunzip                 216.9Mi ±  0%   211.6Mi ± 0%   -2.40% (p=0.000 n=10)
JSONEncode             99.61Mi ±  0%   99.77Mi ± 0%        ~ (p=0.078 n=10)
JSONDecode             23.86Mi ±  0%   23.65Mi ± 0%   -0.86% (p=0.000 n=10)
GoParse                7.215Mi ±  1%   6.976Mi ± 2%   -3.30% (p=0.000 n=10)
RegexpMatchEasy0_32    217.7Mi ±  0%   229.5Mi ± 0%   +5.42% (p=0.000 n=10)
RegexpMatchEasy0_1K    634.7Mi ±  0%   717.6Mi ± 0%  +13.07% (p=0.000 n=10)
RegexpMatchEasy1_32    188.6Mi ±  0%   187.3Mi ± 0%   -0.68% (p=0.000 n=10)
RegexpMatchEasy1_1K    598.2Mi ±  0%   655.0Mi ± 0%   +9.50% (p=0.000 n=10)
RegexpMatchMedium_32   22.29Mi ±  0%   21.63Mi ± 0%   -2.95% (p=0.000 n=10)
RegexpMatchMedium_1K   24.42Mi ±  0%   23.54Mi ± 0%   -3.63% (p=0.000 n=10)
RegexpMatchHard_32     14.53Mi ±  0%   14.83Mi ± 0%   +2.03% (p=0.000 n=10)
RegexpMatchHard_1K     15.62Mi ±  0%   16.06Mi ± 0%   +2.84% (p=0.000 n=10)
Revcomp                179.5Mi ±  0%   202.7Mi ± 0%  +12.94% (p=0.000 n=10)
Template               15.70Mi ±  1%   16.04Mi ± 2%   +2.16% (p=0.001 n=10)
geomean                60.09Mi         61.96Mi        +3.12%

The test binaries were pre-compiled with `go test -c`, and the test runs
were wrapped by `perf stat record` for recording dynamic instruction
counts. A 1.27% reduction (3217920720927 -> 3176953433562) was recorded.
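
The quoted percentage can be reproduced from the two raw counts. The sketch below uses a hypothetical helper `relChange`; the second call applies the same formula to the Fannkuch11 row of the sec/op table.

```go
package main

import "fmt"

// relChange returns the relative change from before to after, in percent.
func relChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Dynamic instruction counts across the whole benchmark run.
	fmt.Printf("%+.2f%%\n", relChange(3217920720927, 3176953433562)) // -1.27%
	// Fannkuch11 sec/op, before and after.
	fmt.Printf("%+.2f%%\n", relChange(3.574, 3.711)) // +3.83%
}
```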

Of the few individual cases that regressed, Fannkuch11 saw a tiny 0.06%
increase in dynamic instruction count, but its branch misprediction
rate also rose from 3.88% to 4.30%. FmtFprintfPrefixedInt's dynamic
instruction count decreased by 6.62% and its branch misprediction rate
dropped from 0.30% to 0.21%, yet its cycle count increased by 1.41%.
(This data was also obtained by wrapping individual test binary runs
with `perf stat`.) Hence the individual regressions are likely caused
by LA464 micro-architecture quirks rather than accidental
pessimization, and the optimization should be a net win on future
quirk-free micro-architectures.

Updates golang#59120

Change-Id: Ia7dd0dfe20c0ea3e64889e2b38c6b2118b50d56e
xen0n added a commit to xen0n/go that referenced this issue Mar 27, 2023
The runtime malloc implementation makes use of these, among others.

Benchmark results on Loongson 3A5000:

goos: linux
goarch: loong64
pkg: test/bench/go1
                      │  CL 479496  │               this CL               │
                      │   sec/op    │    sec/op     vs base               │
BinaryTree17             14.56 ± 1%    14.10 ±  1%  -3.15% (p=0.000 n=10)
Fannkuch11               3.711 ± 0%    3.705 ±  0%  -0.16% (p=0.000 n=10)
FmtFprintfEmpty         93.44n ± 0%   93.11n ±  0%  -0.36% (p=0.000 n=10)
FmtFprintfString        157.1n ± 0%   150.8n ±  0%  -4.01% (p=0.000 n=10)
FmtFprintfInt           158.9n ± 0%   156.2n ±  0%  -1.70% (p=0.000 n=10)
FmtFprintfIntInt        241.0n ± 0%   243.5n ±  0%  +1.04% (p=0.000 n=10)
FmtFprintfPrefixedInt   345.2n ± 0%   360.8n ±  1%  +4.52% (p=0.000 n=10)
FmtFprintfFloat         402.2n ± 0%   395.2n ±  0%  -1.74% (p=0.000 n=10)
FmtManyArgs             963.2n ± 0%   957.9n ±  0%  -0.55% (p=0.000 n=10)
GobDecode               12.82m ± 2%   13.60m ± 15%  +6.03% (p=0.004 n=10)
GobEncode               17.85m ± 3%   17.37m ±  0%  -2.67% (p=0.000 n=10)
Gzip                    410.9m ± 0%   404.6m ±  0%  -1.54% (p=0.000 n=10)
Gunzip                  87.44m ± 0%   86.58m ±  0%  -0.98% (p=0.000 n=10)
HTTPClientServer        86.43µ ± 0%   86.99µ ±  1%  +0.65% (p=0.002 n=10)
JSONEncode              18.55m ± 0%   19.51m ±  0%  +5.19% (p=0.000 n=10)
JSONDecode              78.25m ± 0%   77.00m ±  1%  -1.60% (p=0.000 n=10)
Mandelbrot200           7.214m ± 0%   7.237m ±  0%  +0.32% (p=0.000 n=10)
GoParse                 7.921m ± 2%   7.406m ±  2%  -6.51% (p=0.000 n=10)
RegexpMatchEasy0_32     133.0n ± 0%   132.7n ±  0%  -0.23% (p=0.000 n=10)
RegexpMatchEasy0_1K     1.361µ ± 0%   1.360µ ±  0%  -0.07% (p=0.000 n=10)
RegexpMatchEasy1_32     162.9n ± 0%   161.1n ±  0%  -1.10% (p=0.000 n=10)
RegexpMatchEasy1_1K     1.491µ ± 0%   1.523µ ±  0%  +2.11% (p=0.000 n=10)
RegexpMatchMedium_32    1.411µ ± 0%   1.381µ ±  0%  -2.13% (p=0.000 n=10)
RegexpMatchMedium_1K    41.49µ ± 0%   40.27µ ±  0%  -2.95% (p=0.000 n=10)
RegexpMatchHard_32      2.057µ ± 0%   2.055µ ±  0%  -0.10% (p=0.000 n=10)
RegexpMatchHard_1K      60.80µ ± 0%   60.80µ ±  0%       ~ (p=0.744 n=10)
Revcomp                  1.196 ± 0%    1.210 ±  0%  +1.22% (p=0.000 n=10)
Template                115.4m ± 2%   113.3m ±  4%  -1.80% (p=0.005 n=10)
TimeParse               397.3n ± 0%   397.5n ±  0%  +0.05% (p=0.000 n=10)
TimeFormat              505.1n ± 0%   494.5n ±  0%  -2.09% (p=0.000 n=10)
geomean                 102.6µ        102.0µ        -0.51%

                     │  CL 479496   │               this CL                │
                     │     B/s      │      B/s       vs base               │
GobDecode              57.08Mi ± 2%   53.97Mi ± 14%  -5.44% (p=0.004 n=10)
GobEncode              41.01Mi ± 3%   42.13Mi ±  0%  +2.74% (p=0.000 n=10)
Gzip                   45.04Mi ± 0%   45.74Mi ±  0%  +1.57% (p=0.000 n=10)
Gunzip                 211.6Mi ± 0%   213.7Mi ±  0%  +0.98% (p=0.000 n=10)
JSONEncode             99.77Mi ± 0%   94.84Mi ±  0%  -4.94% (p=0.000 n=10)
JSONDecode             23.65Mi ± 0%   24.03Mi ±  1%  +1.61% (p=0.000 n=10)
GoParse                6.976Mi ± 2%   7.458Mi ±  2%  +6.90% (p=0.000 n=10)
RegexpMatchEasy0_32    229.5Mi ± 0%   230.0Mi ±  0%  +0.25% (p=0.000 n=10)
RegexpMatchEasy0_1K    717.6Mi ± 0%   718.1Mi ±  0%  +0.07% (p=0.000 n=10)
RegexpMatchEasy1_32    187.3Mi ± 0%   189.4Mi ±  0%  +1.08% (p=0.000 n=10)
RegexpMatchEasy1_1K    655.0Mi ± 0%   641.4Mi ±  0%  -2.08% (p=0.000 n=10)
RegexpMatchMedium_32   21.63Mi ± 0%   22.10Mi ±  0%  +2.16% (p=0.000 n=10)
RegexpMatchMedium_1K   23.54Mi ± 0%   24.25Mi ±  0%  +3.04% (p=0.000 n=10)
RegexpMatchHard_32     14.83Mi ± 0%   14.85Mi ±  0%  +0.13% (p=0.000 n=10)
RegexpMatchHard_1K     16.06Mi ± 0%   16.06Mi ±  0%       ~ (p=1.000 n=10)
Revcomp                202.7Mi ± 0%   200.3Mi ±  0%  -1.20% (p=0.000 n=10)
Template               16.04Mi ± 2%   16.33Mi ±  4%  +1.81% (p=0.004 n=10)
geomean                61.96Mi        62.25Mi        +0.47%

The test binaries were pre-compiled with `go test -c` and test runs were
wrapped with `perf stat record` for tracking dynamic instruction counts,
of which a 0.36% reduction was recorded.

The change should be a net win, as all it does is pattern-match Ctz ops
and replace them with the respective native instructions, so the
performance regressions are likely also micro-architecture related, as
observed in CL 479496's results.
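
To make the kind of code that benefits more concrete, here is a hedged sketch of a bitmap scan built on `bits.TrailingZeros64`, next to a portable loop of the sort a single count-trailing-zeros instruction replaces. `firstFree` and `ctzPortable` are hypothetical names; this is not the runtime's allocator code, only an illustration of the pattern.

```go
package main

import (
	"fmt"
	"math/bits"
)

// firstFree returns the index of the lowest set bit in a bitmap where a set
// bit marks a free slot, or -1 if no slot is free.
func firstFree(free uint64) int {
	if free == 0 {
		return -1
	}
	return bits.TrailingZeros64(free)
}

// ctzPortable counts trailing zeros one bit at a time, as a stand-in for
// what has to happen without a native instruction.
func ctzPortable(x uint64) int {
	if x == 0 {
		return 64
	}
	n := 0
	for x&1 == 0 {
		x >>= 1
		n++
	}
	return n
}

func main() {
	fmt.Println(firstFree(0b1011_0000)) // 4
	fmt.Println(firstFree(0))           // -1
	// The portable loop agrees with the library function.
	fmt.Println(ctzPortable(1<<40) == bits.TrailingZeros64(1<<40)) // true
}
```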

Updates golang#59120

Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54