-
Notifications
You must be signed in to change notification settings - Fork 17.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bytes,strings: tests appear to take ~100x as long on riscv #50615
Comments
For the strings package essentially all the extra time is taken by The inner loop of the function on riscv64 is
That code is not optimal, but I have no idea why it would be so very slow. |
I can submit a CL for this if you want :) |
@mengzhuo I don't know what the fix is, do you? If you do know, by all means send a CL. Thanks. |
Change https://golang.org/cl/379074 mentions this issue: |
Change https://golang.org/cl/380075 mentions this issue: |
Change https://golang.org/cl/380076 mentions this issue: |
Implement memequal using loops that process 32 bytes, 16 bytes, 4 bytes or 1 byte depending on size and alignment. For comparisons that are less than 32 bytes the overhead of checking and adjusting alignment usually exceeds the overhead of reading and processing 4 bytes at a time. Updates #50615 name old time/op new time/op delta Equal/0-4 38.3ns _ 0% 43.1ns _ 0% +12.54% (p=0.000 n=3+3) Equal/1-4 77.7ns _ 0% 90.3ns _ 0% +16.27% (p=0.000 n=3+3) Equal/6-4 116ns _ 0% 121ns _ 0% +3.85% (p=0.002 n=3+3) Equal/9-4 137ns _ 0% 126ns _ 0% -7.98% (p=0.000 n=3+3) Equal/15-4 179ns _ 0% 170ns _ 0% -4.77% (p=0.001 n=3+3) Equal/16-4 186ns _ 0% 159ns _ 0% -14.65% (p=0.000 n=3+3) Equal/20-4 215ns _ 0% 178ns _ 0% -17.18% (p=0.000 n=3+3) Equal/32-4 298ns _ 0% 101ns _ 0% -66.22% (p=0.000 n=3+3) Equal/4K-4 28.9_s _ 0% 2.2_s _ 0% -92.56% (p=0.000 n=3+3) Equal/4M-4 29.6ms _ 0% 2.2ms _ 0% -92.72% (p=0.000 n=3+3) Equal/64M-4 758ms _75% 35ms _ 0% ~ (p=0.127 n=3+3) CompareBytesEqual-4 226ns _ 0% 131ns _ 0% -41.76% (p=0.000 n=3+3) name old speed new speed delta Equal/1-4 12.9MB/s _ 0% 11.1MB/s _ 0% -13.98% (p=0.000 n=3+3) Equal/6-4 51.7MB/s _ 0% 49.8MB/s _ 0% -3.72% (p=0.002 n=3+3) Equal/9-4 65.7MB/s _ 0% 71.4MB/s _ 0% +8.67% (p=0.000 n=3+3) Equal/15-4 83.8MB/s _ 0% 88.0MB/s _ 0% +5.02% (p=0.001 n=3+3) Equal/16-4 85.9MB/s _ 0% 100.6MB/s _ 0% +17.19% (p=0.000 n=3+3) Equal/20-4 93.2MB/s _ 0% 112.6MB/s _ 0% +20.74% (p=0.000 n=3+3) Equal/32-4 107MB/s _ 0% 317MB/s _ 0% +195.97% (p=0.000 n=3+3) Equal/4K-4 142MB/s _ 0% 1902MB/s _ 0% +1243.76% (p=0.000 n=3+3) Equal/4M-4 142MB/s _ 0% 1946MB/s _ 0% +1274.22% (p=0.000 n=3+3) Equal/64M-4 111MB/s _55% 1941MB/s _ 0% +1641.21% (p=0.000 n=3+3) Change-Id: I9af7e82de3c4c5af8813772ed139230900c03b92 Reviewed-on: https://go-review.googlesource.com/c/go/+/380075 Trust: Joel Sing <joel@sing.id.au> Trust: mzh <mzh@golangcn.org> Reviewed-by: mzh <mzh@golangcn.org> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
Implement compare using loops that process 32 bytes, 16 bytes, 4 bytes or 1 byte depending on size and alignment. For comparisons that are less than 32 bytes the overhead of checking and adjusting alignment usually exceeds the overhead of reading and processing 4 bytes at a time. Updates #50615 name old time/op new time/op delta BytesCompare/1-4 68.4ns _ 1% 61.0ns _ 0% -10.78% (p=0.001 n=3+3) BytesCompare/2-4 82.9ns _ 0% 71.0ns _ 1% -14.31% (p=0.000 n=3+3) BytesCompare/4-4 107ns _ 0% 70ns _ 0% -34.96% (p=0.000 n=3+3) BytesCompare/8-4 156ns _ 0% 90ns _ 0% -42.36% (p=0.000 n=3+3) BytesCompare/16-4 267ns _11% 130ns _ 0% -51.10% (p=0.011 n=3+3) BytesCompare/32-4 446ns _ 0% 74ns _ 0% -83.31% (p=0.000 n=3+3) BytesCompare/64-4 840ns _ 2% 91ns _ 0% -89.17% (p=0.000 n=3+3) BytesCompare/128-4 1.60_s _ 0% 0.13_s _ 0% -92.18% (p=0.000 n=3+3) BytesCompare/256-4 3.15_s _ 0% 0.19_s _ 0% -93.91% (p=0.000 n=3+3) BytesCompare/512-4 6.25_s _ 0% 0.33_s _ 0% -94.80% (p=0.000 n=3+3) BytesCompare/1024-4 12.5_s _ 0% 0.6_s _ 0% -95.23% (p=0.000 n=3+3) BytesCompare/2048-4 24.8_s _ 0% 1.1_s _ 0% -95.46% (p=0.000 n=3+3) CompareBytesEqual-4 225ns _ 0% 131ns _ 0% -41.69% (p=0.000 n=3+3) CompareBytesToNil-4 45.3ns _ 7% 46.7ns _ 0% ~ (p=0.452 n=3+3) CompareBytesEmpty-4 41.0ns _ 1% 40.6ns _ 0% ~ (p=0.071 n=3+3) CompareBytesIdentical-4 48.9ns _ 0% 41.3ns _ 1% -15.58% (p=0.000 n=3+3) CompareBytesSameLength-4 127ns _ 0% 77ns _ 0% -39.48% (p=0.000 n=3+3) CompareBytesDifferentLength-4 136ns _12% 78ns _ 0% -42.65% (p=0.018 n=3+3) CompareBytesBigUnaligned-4 14.9ms _ 1% 7.3ms _ 1% -50.95% (p=0.000 n=3+3) CompareBytesBig-4 14.9ms _ 1% 2.7ms _ 8% -82.10% (p=0.000 n=3+3) CompareBytesBigIdentical-4 52.5ns _ 0% 44.9ns _ 0% -14.53% (p=0.000 n=3+3) name old speed new speed delta CompareBytesBigUnaligned-4 70.5MB/s _ 1% 143.8MB/s _ 1% +103.87% (p=0.000 n=3+3) CompareBytesBig-4 70.3MB/s _ 1% 393.8MB/s _ 8% +460.43% (p=0.003 n=3+3) CompareBytesBigIdentical-4 20.0TB/s _ 0% 23.4TB/s _ 0% +17.00% (p=0.000 n=3+3) Change-Id: Ie18712a9009d425c75e1ab49d5a673d84e73a1eb Reviewed-on: https://go-review.googlesource.com/c/go/+/380076 Trust: Joel Sing <joel@sing.id.au> Trust: mzh <mzh@golangcn.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
Updates: after Joel's CLs had merged:
|
That seems within the 10x slow-down range of some other std packages being tested, so I think we could close this issue if the purpose was just to avoid the testing slowness. Happy to leave it open if anyone thinks there's still improvements to be made for riscv in that package. |
I actually found another similar case. |
Some other pacakges that slow crypto/ed25519/internal/edwards25519 39.730s https://build.golang.org/log/185c9831ce42fda738be8ab272cb157aa050e3e4 |
Change https://go.dev/cl/431099 mentions this issue: |
Change https://go.dev/cl/431100 mentions this issue: |
…scv64 On riscv64 we need 8 byte alignment for 8 byte loads - the existing check was only ensuring 4 byte alignment, which potentially results in unaligned loads being performed. Unaligned loads incur a significant performance penality due to the resulting kernel traps and fix ups. Adjust BenchmarkCompareBytesBigUnaligned so that this issue would have been more readily visible. Updates #50615 name old time/op new time/op delta CompareBytesBigUnaligned/offset=1-4 6.98ms _ 5% 6.84ms _ 3% ~ (p=0.319 n=5+5) CompareBytesBigUnaligned/offset=2-4 6.75ms _ 1% 6.99ms _ 4% ~ (p=0.063 n=5+5) CompareBytesBigUnaligned/offset=3-4 6.84ms _ 1% 6.74ms _ 1% -1.48% (p=0.003 n=5+5) CompareBytesBigUnaligned/offset=4-4 146ms _ 1% 7ms _ 6% -95.08% (p=0.000 n=5+5) CompareBytesBigUnaligned/offset=5-4 7.05ms _ 5% 6.75ms _ 1% ~ (p=0.079 n=5+5) CompareBytesBigUnaligned/offset=6-4 7.11ms _ 5% 6.89ms _ 5% ~ (p=0.177 n=5+5) CompareBytesBigUnaligned/offset=7-4 7.14ms _ 5% 6.91ms _ 6% ~ (p=0.165 n=5+5) name old speed new speed delta CompareBytesBigUnaligned/offset=1-4 150MB/s _ 5% 153MB/s _ 3% ~ (p=0.336 n=5+5) CompareBytesBigUnaligned/offset=2-4 155MB/s _ 1% 150MB/s _ 4% ~ (p=0.058 n=5+5) CompareBytesBigUnaligned/offset=3-4 153MB/s _ 1% 156MB/s _ 1% +1.51% (p=0.004 n=5+5) CompareBytesBigUnaligned/offset=4-4 7.16MB/s _ 1% 145.79MB/s _ 6% +1936.23% (p=0.000 n=5+5) CompareBytesBigUnaligned/offset=5-4 149MB/s _ 5% 155MB/s _ 1% ~ (p=0.078 n=5+5) CompareBytesBigUnaligned/offset=6-4 148MB/s _ 5% 152MB/s _ 5% ~ (p=0.175 n=5+5) CompareBytesBigUnaligned/offset=7-4 147MB/s _ 5% 152MB/s _ 6% ~ (p=0.160 n=5+5) Change-Id: I2c859e061919db482318ce63b85b808aa973a9ba Reviewed-on: https://go-review.googlesource.com/c/go/+/431099 Reviewed-by: Meng Zhuo <mzh@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>
Thanks - this was hitting a bug in the compare function, which resulted in unaligned accesses during test set up. The last run took 6.585s. |
The |
Remove some unnecessary loops and pull the comparison code out from the compare/loop code. Add an unaligned 8 byte comparison, which reads 8 bytes from each input before comparing them. This gives a reasonable gain in performance for the large unaligned case. Updates #50615 name old time/op new time/op delta CompareBytesEqual-4 116ns _ 0% 111ns _ 0% -4.10% (p=0.000 n=5+5) CompareBytesToNil-4 34.9ns _ 0% 35.0ns _ 0% +0.45% (p=0.002 n=5+5) CompareBytesEmpty-4 29.6ns _ 1% 29.8ns _ 0% +0.71% (p=0.016 n=5+5) CompareBytesIdentical-4 29.8ns _ 0% 29.9ns _ 1% +0.50% (p=0.036 n=5+5) CompareBytesSameLength-4 66.1ns _ 0% 60.4ns _ 0% -8.59% (p=0.000 n=5+5) CompareBytesDifferentLength-4 63.1ns _ 0% 60.5ns _ 0% -4.20% (p=0.000 n=5+5) CompareBytesBigUnaligned/offset=1-4 6.84ms _ 3% 6.04ms _ 5% -11.70% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=2-4 6.99ms _ 4% 5.93ms _ 6% -15.22% (p=0.000 n=5+5) CompareBytesBigUnaligned/offset=3-4 6.74ms _ 1% 6.00ms _ 5% -10.94% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=4-4 7.20ms _ 6% 5.97ms _ 6% -17.05% (p=0.000 n=5+5) CompareBytesBigUnaligned/offset=5-4 6.75ms _ 1% 5.81ms _ 8% -13.93% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=6-4 6.89ms _ 5% 5.75ms _ 2% -16.58% (p=0.000 n=5+4) CompareBytesBigUnaligned/offset=7-4 6.91ms _ 6% 6.13ms _ 6% -11.27% (p=0.001 n=5+5) CompareBytesBig-4 2.75ms _ 5% 2.71ms _ 8% ~ (p=0.651 n=5+5) CompareBytesBigIdentical-4 29.9ns _ 1% 29.8ns _ 0% ~ (p=0.751 n=5+5) name old speed new speed delta CompareBytesBigUnaligned/offset=1-4 153MB/s _ 3% 174MB/s _ 6% +13.40% (p=0.003 n=5+5) CompareBytesBigUnaligned/offset=2-4 150MB/s _ 4% 177MB/s _ 6% +18.06% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=3-4 156MB/s _ 1% 175MB/s _ 5% +12.39% (p=0.002 n=5+5) CompareBytesBigUnaligned/offset=4-4 146MB/s _ 6% 176MB/s _ 6% +20.67% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=5-4 155MB/s _ 1% 181MB/s _ 7% +16.35% (p=0.002 n=5+5) CompareBytesBigUnaligned/offset=6-4 152MB/s _ 5% 182MB/s _ 2% +19.74% (p=0.000 n=5+4) CompareBytesBigUnaligned/offset=7-4 152MB/s _ 6% 171MB/s _ 6% +12.70% (p=0.001 n=5+5) CompareBytesBig-4 382MB/s _ 5% 388MB/s _ 9% ~ (p=0.616 n=5+5) CompareBytesBigIdentical-4 35.1TB/s _ 1% 35.1TB/s _ 0% ~ (p=0.800 n=5+5) Change-Id: I127edc376e62a2c529719a4ab172f481e0a81357 Reviewed-on: https://go-review.googlesource.com/c/go/+/431100 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joedian Reid <joedian@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au>
Remove some unnecessary loops and pull the comparison code out from the compare/loop code. Add an unaligned 8 byte comparison, which reads 8 bytes from each input before comparing them. This gives a reasonable gain in performance for the large unaligned case. Updates golang#50615 name old time/op new time/op delta CompareBytesEqual-4 116ns _ 0% 111ns _ 0% -4.10% (p=0.000 n=5+5) CompareBytesToNil-4 34.9ns _ 0% 35.0ns _ 0% +0.45% (p=0.002 n=5+5) CompareBytesEmpty-4 29.6ns _ 1% 29.8ns _ 0% +0.71% (p=0.016 n=5+5) CompareBytesIdentical-4 29.8ns _ 0% 29.9ns _ 1% +0.50% (p=0.036 n=5+5) CompareBytesSameLength-4 66.1ns _ 0% 60.4ns _ 0% -8.59% (p=0.000 n=5+5) CompareBytesDifferentLength-4 63.1ns _ 0% 60.5ns _ 0% -4.20% (p=0.000 n=5+5) CompareBytesBigUnaligned/offset=1-4 6.84ms _ 3% 6.04ms _ 5% -11.70% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=2-4 6.99ms _ 4% 5.93ms _ 6% -15.22% (p=0.000 n=5+5) CompareBytesBigUnaligned/offset=3-4 6.74ms _ 1% 6.00ms _ 5% -10.94% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=4-4 7.20ms _ 6% 5.97ms _ 6% -17.05% (p=0.000 n=5+5) CompareBytesBigUnaligned/offset=5-4 6.75ms _ 1% 5.81ms _ 8% -13.93% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=6-4 6.89ms _ 5% 5.75ms _ 2% -16.58% (p=0.000 n=5+4) CompareBytesBigUnaligned/offset=7-4 6.91ms _ 6% 6.13ms _ 6% -11.27% (p=0.001 n=5+5) CompareBytesBig-4 2.75ms _ 5% 2.71ms _ 8% ~ (p=0.651 n=5+5) CompareBytesBigIdentical-4 29.9ns _ 1% 29.8ns _ 0% ~ (p=0.751 n=5+5) name old speed new speed delta CompareBytesBigUnaligned/offset=1-4 153MB/s _ 3% 174MB/s _ 6% +13.40% (p=0.003 n=5+5) CompareBytesBigUnaligned/offset=2-4 150MB/s _ 4% 177MB/s _ 6% +18.06% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=3-4 156MB/s _ 1% 175MB/s _ 5% +12.39% (p=0.002 n=5+5) CompareBytesBigUnaligned/offset=4-4 146MB/s _ 6% 176MB/s _ 6% +20.67% (p=0.001 n=5+5) CompareBytesBigUnaligned/offset=5-4 155MB/s _ 1% 181MB/s _ 7% +16.35% (p=0.002 n=5+5) CompareBytesBigUnaligned/offset=6-4 152MB/s _ 5% 182MB/s _ 2% +19.74% (p=0.000 n=5+4) CompareBytesBigUnaligned/offset=7-4 152MB/s _ 6% 171MB/s _ 6% +12.70% (p=0.001 n=5+5) CompareBytesBig-4 382MB/s _ 5% 388MB/s _ 9% ~ (p=0.616 n=5+5) CompareBytesBigIdentical-4 35.1TB/s _ 1% 35.1TB/s _ 0% ~ (p=0.800 n=5+5) Change-Id: I127edc376e62a2c529719a4ab172f481e0a81357 Reviewed-on: https://go-review.googlesource.com/c/go/+/431100 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joedian Reid <joedian@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au>
People interested in riscv unaligned accesses might want to take a look at this golang-dev thread: https://groups.google.com/g/golang-dev/c/5Om3lJcYxzA/m/KDXhKafPBgAJ |
I was poking at https://build.golang.org/, and I wondered why the builds for the riscv machines were lagging behind. I understand they may be slower machines, but still, they shouldn't be that slow.
I took a peek at some of the recent logs, such as:
https://build.golang.org/log/c0751a2f215266bf2e29d8987f7a274a4774894d
https://build.golang.org/log/f7b5e9a48eba8ab84a1ac5f77e87dec51f28ef3b
https://build.golang.org/log/ba7516c48279b409253d9742a2efb1fa2e2fa59d
What jumped out to me is how the
bytes
andstrings
packages take 100-300s to test, which sounds like something is very wrong. Perhaps they are running huge inputs through unoptimized functions likeIndexByte
? On most other builders, these packages take 1-2s.For the rest of the packages, testing on riscv is somewhere in between about as fast and 10x slower. For example, cmd/go includes lots of process execution, filesystem I/O, and Go builds, and it goes from ~30s to ~300s, which is still reasonable. I haven't found a single other package that gets close to being 100x as slow.
Also, at an intuitive level, it does not make any sense that testing
cmd/go
takes about as long as testingbytes
; the former has significantly more tests, even in short mode, and they tend to be integration tests too.It's unclear to me if we can truly blame the two packages, or if this problem is only occurring on riscv; I haven't done further research.
cc @4a6f656c @rhysh
The text was updated successfully, but these errors were encountered: