cmd/compile: implement more optimizations on loong64 #59120
Labels
arch-loong64
Issues solely affecting the loongson architecture.
compiler/runtime
Issues related to the Go compiler and/or runtime.
NeedsFix
The path to resolution is known, but the work has not been done.
Performance
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 20, 2023
…ns with EXTW{B,H} Updates golang#59120 Change-Id: Ia7dd0dfe20c0ea3e64889e2b38c6b2118b50d56e
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 20, 2023
Updates golang#59120 Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 20, 2023
Updates golang#59120 Change-Id: Icc8f7d8e79c6168aae634f5c36f044f3fd034d89
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 20, 2023
Updates golang#59120 Change-Id: I39c1edbd7363f454ad1e848a25abeced722b16ac
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 20, 2023
…n loong64 tests TODO Updates golang#59120 Change-Id: Icde85d717999600954244c1105b7c55759d3469f
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 21, 2023
…ns with EXTW{B,H} Updates golang#59120 Change-Id: Ia7dd0dfe20c0ea3e64889e2b38c6b2118b50d56e
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 21, 2023
Updates golang#59120 Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 21, 2023
Updates golang#59120 Change-Id: Icc8f7d8e79c6168aae634f5c36f044f3fd034d89
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 21, 2023
…n loong64 tests TODO Updates golang#59120 Change-Id: Icde85d717999600954244c1105b7c55759d3469f
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 21, 2023
Updates golang#59120 Change-Id: I39c1edbd7363f454ad1e848a25abeced722b16ac
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 25, 2023
…xtensions Updates golang#59120 Change-Id: Ia7dd0dfe20c0ea3e64889e2b38c6b2118b50d56e
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 25, 2023
Updates golang#59120 Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 25, 2023
Updates golang#59120 Change-Id: Icc8f7d8e79c6168aae634f5c36f044f3fd034d89
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 25, 2023
…n loong64 tests TODO Updates golang#59120 Change-Id: Icde85d717999600954244c1105b7c55759d3469f
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 25, 2023
Updates golang#59120 Change-Id: I39c1edbd7363f454ad1e848a25abeced722b16ac
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 27, 2023
Updates golang#59120 Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 27, 2023
For the SubFromLen64 test case to work, we need to fold c-(-(x-d)) into x+(c-d) as well. benchmark TODO Updates golang#59120 Change-Id: Icc8f7d8e79c6168aae634f5c36f044f3fd034d89
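The fold named in this commit message rests on a plain arithmetic identity. The following is a minimal sketch that only checks the identity in ordinary Go; it is not the compiler's rewrite rule, and the function names `unfolded` and `folded` are hypothetical.

```go
package main

import "fmt"

// unfolded computes c - (-(x - d)) literally, the shape named in the commit message.
func unfolded(x, c, d int64) int64 {
	return c - (-(x - d))
}

// folded computes the simplified form x + (c - d).
func folded(x, c, d int64) int64 {
	return x + (c - d)
}

func main() {
	// Go's signed integer arithmetic wraps on overflow, so the identity
	// holds for every input, including the extreme values below.
	inputs := []int64{0, 1, -1, 64, 1 << 62, -1 << 63, 1<<63 - 1}
	for _, x := range inputs {
		for _, d := range inputs {
			if unfolded(x, 64, d) != folded(x, 64, d) {
				panic("identity violated")
			}
		}
	}
	fmt.Println(unfolded(10, 64, 3), folded(10, 64, 3)) // prints "71 71"
}
```

Because both sides agree under wraparound, a compiler can apply the fold unconditionally when `c` and `d` are constants, leaving a single addition with the constant `c-d`.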
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 27, 2023
Updates golang#59120 Change-Id: I39c1edbd7363f454ad1e848a25abeced722b16ac
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 27, 2023
…n loong64 tests TODO Updates golang#59120 Change-Id: Icde85d717999600954244c1105b7c55759d3469f
Change https://go.dev/cl/479496 mentions this issue.
Change https://go.dev/cl/479498 mentions this issue.
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 27, 2023
…xtensions

8- and 16-bit sign extensions and 32-bit zero extensions were realized with left and right shifts before this change. We now support assembling EXTWB, EXTWH and BSTRPICKV so all three can be done with a single insn.

Benchmark results on Loongson 3A5000:

    goos: linux
    goarch: loong64
    pkg: test/bench/go1
                          │    before    │               after                │
                          │    sec/op    │    sec/op     vs base              │
    BinaryTree17             14.55 ±  1%    14.56 ± 1%        ~ (p=0.912 n=10)
    Fannkuch11               3.574 ±  0%    3.711 ± 0%   +3.83% (p=0.000 n=10)
    FmtFprintfEmpty         92.89n ±  0%   93.44n ± 0%   +0.60% (p=0.000 n=10)
    FmtFprintfString        148.9n ±  0%   157.1n ± 0%   +5.51% (p=0.000 n=10)
    FmtFprintfInt           153.5n ±  0%   158.9n ± 0%   +3.52% (p=0.000 n=10)
    FmtFprintfIntInt        234.6n ±  0%   241.0n ± 0%   +2.73% (p=0.000 n=10)
    FmtFprintfPrefixedInt   317.3n ±  0%   345.2n ± 0%   +8.79% (p=0.000 n=10)
    FmtFprintfFloat         404.9n ±  0%   402.2n ± 0%   -0.67% (p=0.000 n=10)
    FmtManyArgs             929.1n ±  0%   963.2n ± 0%   +3.66% (p=0.000 n=10)
    GobDecode               15.47m ± 17%   12.82m ± 2%  -17.08% (p=0.000 n=10)
    GobEncode               17.91m ±  1%   17.85m ± 3%        ~ (p=0.247 n=10)
    Gzip                    416.0m ±  0%   410.9m ± 0%   -1.22% (p=0.000 n=10)
    Gunzip                  85.34m ±  0%   87.44m ± 0%   +2.46% (p=0.000 n=10)
    HTTPClientServer        86.64µ ±  0%   86.43µ ± 0%        ~ (p=0.052 n=10)
    JSONEncode              18.58m ±  0%   18.55m ± 0%        ~ (p=0.089 n=10)
    JSONDecode              77.58m ±  0%   78.25m ± 0%   +0.88% (p=0.000 n=10)
    Mandelbrot200           7.221m ±  0%   7.214m ± 0%   -0.10% (p=0.000 n=10)
    GoParse                 7.653m ±  1%   7.921m ± 2%   +3.51% (p=0.000 n=10)
    RegexpMatchEasy0_32     140.2n ±  0%   133.0n ± 0%   -5.14% (p=0.000 n=10)
    RegexpMatchEasy0_1K     1.539µ ±  0%   1.361µ ± 0%  -11.57% (p=0.000 n=10)
    RegexpMatchEasy1_32     161.8n ±  0%   162.9n ± 0%   +0.68% (p=0.000 n=10)
    RegexpMatchEasy1_1K     1.633µ ±  0%   1.491µ ± 0%   -8.70% (p=0.000 n=10)
    RegexpMatchMedium_32    1.369µ ±  0%   1.411µ ± 0%   +3.07% (p=0.000 n=10)
    RegexpMatchMedium_1K    39.98µ ±  0%   41.49µ ± 0%   +3.78% (p=0.000 n=10)
    RegexpMatchHard_32      2.100µ ±  0%   2.057µ ± 0%   -2.05% (p=0.000 n=10)
    RegexpMatchHard_1K      62.54µ ±  0%   60.80µ ± 0%   -2.78% (p=0.000 n=10)
    Revcomp                  1.351 ±  0%    1.196 ± 0%  -11.46% (p=0.000 n=10)
    Template                117.9m ±  1%   115.4m ± 2%   -2.07% (p=0.002 n=10)
    TimeParse               408.1n ±  0%   397.3n ± 0%   -2.65% (p=0.000 n=10)
    TimeFormat              508.0n ±  0%   505.1n ± 0%   -0.58% (p=0.000 n=10)
    geomean                 103.5µ         102.6µ        -0.94%

                          │    before     │               after                 │
                          │      B/s      │     B/s       vs base               │
    GobDecode               47.33Mi ± 20%   57.08Mi ± 2%  +20.59% (p=0.000 n=10)
    GobEncode               40.87Mi ±  1%   41.01Mi ± 3%        ~ (p=0.288 n=10)
    Gzip                    44.49Mi ±  0%   45.04Mi ± 0%   +1.23% (p=0.000 n=10)
    Gunzip                  216.9Mi ±  0%   211.6Mi ± 0%   -2.40% (p=0.000 n=10)
    JSONEncode              99.61Mi ±  0%   99.77Mi ± 0%        ~ (p=0.078 n=10)
    JSONDecode              23.86Mi ±  0%   23.65Mi ± 0%   -0.86% (p=0.000 n=10)
    GoParse                 7.215Mi ±  1%   6.976Mi ± 2%   -3.30% (p=0.000 n=10)
    RegexpMatchEasy0_32     217.7Mi ±  0%   229.5Mi ± 0%   +5.42% (p=0.000 n=10)
    RegexpMatchEasy0_1K     634.7Mi ±  0%   717.6Mi ± 0%  +13.07% (p=0.000 n=10)
    RegexpMatchEasy1_32     188.6Mi ±  0%   187.3Mi ± 0%   -0.68% (p=0.000 n=10)
    RegexpMatchEasy1_1K     598.2Mi ±  0%   655.0Mi ± 0%   +9.50% (p=0.000 n=10)
    RegexpMatchMedium_32    22.29Mi ±  0%   21.63Mi ± 0%   -2.95% (p=0.000 n=10)
    RegexpMatchMedium_1K    24.42Mi ±  0%   23.54Mi ± 0%   -3.63% (p=0.000 n=10)
    RegexpMatchHard_32      14.53Mi ±  0%   14.83Mi ± 0%   +2.03% (p=0.000 n=10)
    RegexpMatchHard_1K      15.62Mi ±  0%   16.06Mi ± 0%   +2.84% (p=0.000 n=10)
    Revcomp                 179.5Mi ±  0%   202.7Mi ± 0%  +12.94% (p=0.000 n=10)
    Template                15.70Mi ±  1%   16.04Mi ± 2%   +2.16% (p=0.001 n=10)
    geomean                 60.09Mi         61.96Mi        +3.12%

The test binaries were pre-compiled with `go test -c`, and the test runs were wrapped by `perf stat record` for recording dynamic instruction counts. A 1.27% reduction (3217920720927 -> 3176953433562) was recorded.

Of the few regressed individual cases, Fannkuch11 saw a tiny increase of 0.06% in dynamic instruction count, but also a rise of mispredicted branches from 3.88% to 4.30%; FmtFprintfPrefixedInt's dynamic instruction count decreased by 6.62% and mispredicted branches ratio dropped from 0.30% to 0.21%, but cycle count increased by 1.41%. (Data was obtained also by wrapping individual test binary runs with `perf stat`).

Hence, it is likely the individual regressions are caused by LA464 micro-architecture quirks, and not because of accidental pessimization; and that the optimization should be a net win on future quirk-free micro-architectures.

Updates golang#59120
Change-Id: Ia7dd0dfe20c0ea3e64889e2b38c6b2118b50d56e
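To make the "left and right shifts" wording concrete, here is a minimal Go sketch of the shift-based extensions next to the plain conversions that express the same operations. The helper names are hypothetical, and which instruction sequence the compiler actually emits for each form is not shown here; the sketch only demonstrates that the two spellings compute the same values.

```go
package main

import "fmt"

// signExtend8Shifts sign-extends the low 8 bits of x with a left shift
// followed by an arithmetic right shift.
func signExtend8Shifts(x int64) int64 { return (x << 56) >> 56 }

// signExtend16Shifts does the same for the low 16 bits.
func signExtend16Shifts(x int64) int64 { return (x << 48) >> 48 }

// zeroExtend32Shifts zero-extends the low 32 bits of x with a left shift
// followed by a logical right shift.
func zeroExtend32Shifts(x uint64) uint64 { return (x << 32) >> 32 }

func main() {
	x := int64(0x1234_5678_9abc_def0)

	// Each comparison pairs the shift form with the equivalent conversion.
	fmt.Println(signExtend8Shifts(x) == int64(int8(x)))
	fmt.Println(signExtend16Shifts(x) == int64(int16(x)))
	fmt.Println(zeroExtend32Shifts(uint64(x)) == uint64(uint32(x)))
}
```

Each shift pair costs two instructions when lowered literally, which is the saving the commit message describes for the single-instruction forms.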
xen0n
added a commit
to xen0n/go
that referenced
this issue
Mar 27, 2023
The runtime malloc implementation makes use of these, among others.

Benchmark results on Loongson 3A5000:

    goos: linux
    goarch: loong64
    pkg: test/bench/go1
                          │  CL 479496  │               this CL               │
                          │   sec/op    │    sec/op      vs base              │
    BinaryTree17             14.56 ± 1%     14.10 ±  1%   -3.15% (p=0.000 n=10)
    Fannkuch11               3.711 ± 0%     3.705 ±  0%   -0.16% (p=0.000 n=10)
    FmtFprintfEmpty         93.44n ± 0%    93.11n ±  0%   -0.36% (p=0.000 n=10)
    FmtFprintfString        157.1n ± 0%    150.8n ±  0%   -4.01% (p=0.000 n=10)
    FmtFprintfInt           158.9n ± 0%    156.2n ±  0%   -1.70% (p=0.000 n=10)
    FmtFprintfIntInt        241.0n ± 0%    243.5n ±  0%   +1.04% (p=0.000 n=10)
    FmtFprintfPrefixedInt   345.2n ± 0%    360.8n ±  1%   +4.52% (p=0.000 n=10)
    FmtFprintfFloat         402.2n ± 0%    395.2n ±  0%   -1.74% (p=0.000 n=10)
    FmtManyArgs             963.2n ± 0%    957.9n ±  0%   -0.55% (p=0.000 n=10)
    GobDecode               12.82m ± 2%    13.60m ± 15%   +6.03% (p=0.004 n=10)
    GobEncode               17.85m ± 3%    17.37m ±  0%   -2.67% (p=0.000 n=10)
    Gzip                    410.9m ± 0%    404.6m ±  0%   -1.54% (p=0.000 n=10)
    Gunzip                  87.44m ± 0%    86.58m ±  0%   -0.98% (p=0.000 n=10)
    HTTPClientServer        86.43µ ± 0%    86.99µ ±  1%   +0.65% (p=0.002 n=10)
    JSONEncode              18.55m ± 0%    19.51m ±  0%   +5.19% (p=0.000 n=10)
    JSONDecode              78.25m ± 0%    77.00m ±  1%   -1.60% (p=0.000 n=10)
    Mandelbrot200           7.214m ± 0%    7.237m ±  0%   +0.32% (p=0.000 n=10)
    GoParse                 7.921m ± 2%    7.406m ±  2%   -6.51% (p=0.000 n=10)
    RegexpMatchEasy0_32     133.0n ± 0%    132.7n ±  0%   -0.23% (p=0.000 n=10)
    RegexpMatchEasy0_1K     1.361µ ± 0%    1.360µ ±  0%   -0.07% (p=0.000 n=10)
    RegexpMatchEasy1_32     162.9n ± 0%    161.1n ±  0%   -1.10% (p=0.000 n=10)
    RegexpMatchEasy1_1K     1.491µ ± 0%    1.523µ ±  0%   +2.11% (p=0.000 n=10)
    RegexpMatchMedium_32    1.411µ ± 0%    1.381µ ±  0%   -2.13% (p=0.000 n=10)
    RegexpMatchMedium_1K    41.49µ ± 0%    40.27µ ±  0%   -2.95% (p=0.000 n=10)
    RegexpMatchHard_32      2.057µ ± 0%    2.055µ ±  0%   -0.10% (p=0.000 n=10)
    RegexpMatchHard_1K      60.80µ ± 0%    60.80µ ±  0%        ~ (p=0.744 n=10)
    Revcomp                  1.196 ± 0%     1.210 ±  0%   +1.22% (p=0.000 n=10)
    Template                115.4m ± 2%    113.3m ±  4%   -1.80% (p=0.005 n=10)
    TimeParse               397.3n ± 0%    397.5n ±  0%   +0.05% (p=0.000 n=10)
    TimeFormat              505.1n ± 0%    494.5n ±  0%   -2.09% (p=0.000 n=10)
    geomean                 102.6µ         102.0µ         -0.51%

                          │  CL 479496   │               this CL               │
                          │     B/s      │      B/s       vs base              │
    GobDecode               57.08Mi ± 2%   53.97Mi ± 14%   -5.44% (p=0.004 n=10)
    GobEncode               41.01Mi ± 3%   42.13Mi ±  0%   +2.74% (p=0.000 n=10)
    Gzip                    45.04Mi ± 0%   45.74Mi ±  0%   +1.57% (p=0.000 n=10)
    Gunzip                  211.6Mi ± 0%   213.7Mi ±  0%   +0.98% (p=0.000 n=10)
    JSONEncode              99.77Mi ± 0%   94.84Mi ±  0%   -4.94% (p=0.000 n=10)
    JSONDecode              23.65Mi ± 0%   24.03Mi ±  1%   +1.61% (p=0.000 n=10)
    GoParse                 6.976Mi ± 2%   7.458Mi ±  2%   +6.90% (p=0.000 n=10)
    RegexpMatchEasy0_32     229.5Mi ± 0%   230.0Mi ±  0%   +0.25% (p=0.000 n=10)
    RegexpMatchEasy0_1K     717.6Mi ± 0%   718.1Mi ±  0%   +0.07% (p=0.000 n=10)
    RegexpMatchEasy1_32     187.3Mi ± 0%   189.4Mi ±  0%   +1.08% (p=0.000 n=10)
    RegexpMatchEasy1_1K     655.0Mi ± 0%   641.4Mi ±  0%   -2.08% (p=0.000 n=10)
    RegexpMatchMedium_32    21.63Mi ± 0%   22.10Mi ±  0%   +2.16% (p=0.000 n=10)
    RegexpMatchMedium_1K    23.54Mi ± 0%   24.25Mi ±  0%   +3.04% (p=0.000 n=10)
    RegexpMatchHard_32      14.83Mi ± 0%   14.85Mi ±  0%   +0.13% (p=0.000 n=10)
    RegexpMatchHard_1K      16.06Mi ± 0%   16.06Mi ±  0%        ~ (p=1.000 n=10)
    Revcomp                 202.7Mi ± 0%   200.3Mi ±  0%   -1.20% (p=0.000 n=10)
    Template                16.04Mi ± 2%   16.33Mi ±  4%   +1.81% (p=0.004 n=10)
    geomean                 61.96Mi        62.25Mi         +0.47%

The test binaries were pre-compiled with `go test -c` and test runs were wrapped with `perf stat record` for tracking dynamic instruction counts, of which a 0.36% reduction was recorded.

The change should be a net win, as all it does is to pattern-match and replace Ctz ops into respective native instructions, so performance regressions are likely also micro-architecture related, like observed in CL 479496's results.

Updates golang#59120
Change-Id: I6c90f727eb00e0add2a5f8575ac045b9e288af54
This issue is mainly for tracking the implementation progress of various low-hanging fruits regarding `loong64` optimizations.

There are many missed optimization chances on `loong64`. A quick survey on SSA intrinsics uncovers:

- `runtime.publicationBarrier`
  - `dmb st` on `arm64`
  - `dbar 0` on LA64 v1.00
  - `dbar <TBD>` on next revision of LA64 (finer-grained barriers are to be supported)
- `runtime.Bswap{32,64}`
  - `revb.{2w,d}` on LA64 v1.00
- `runtime/internal/sys.Prefetch{,Streamed}`
  - `preld` on LA64 v1.00
- `runtime/internal/atomic.{And,Or}`
  - `am{and,or}.d` on LA64 v1.00
- not possible with LA64 v1.00: `math.{Trunc,Ceil,Floor,RoundToEven}`
  - `frint.[sd]` is not orthogonal: no fixed rounding mode variants (unlike e.g. `ftintr{m,p,z,ne}`).
- `math.Round`
  - `frint.[sd]` on LA64 v1.00 -- have to check if the rounding mode behavior is tolerable
- `math.Abs`
  - `fabs.[sd]` on LA64 v1.00
- `math.Copysign`
  - `fcopysign.[sd]` on LA64 v1.00
- `math.FMA`
  - `f{,n}m{add,sub}.[sd]` on LA64 v1.00
- `math/bits.TrailingZeros{64,32}` (`ssa.OpCtz{64,32}`)
  - `ctz.[wd]` on LA64 v1.00
- `math/bits.Len{64,32,}` (`ssa.OpBitLen{64,32}`)
  - `clz.[wd]` on LA64 v1.00
- `math/bits.Reverse{64,32,8}` (`ssa.OpBitRev{64,32,8}`)
  - `bitrev.{d,w,4b}` on LA64 v1.00

We may want to implement (and preferably benchmark) all of the above.
cc @golang/loong64