cmd/go: add GOMIPS32, GOMIPS64 ISA levels (iii, r1, r2, r5, r6) #60072

Open
HeliC829 opened this issue May 9, 2023 · 27 comments

@HeliC829
Contributor

HeliC829 commented May 9, 2023

Currently GOMIPS64 accepts hardfloat (the default) and softfloat.

Go currently supports MIPS III or higher. I have submitted two CLs, CL 485635 and CL 485595, that bring a small performance improvement on MIPS64, but the instructions they use are only available from R1 onward, not in MIPS III.

So to get further performance improvements on mips64, we should support more ISA levels.

We would like GOMIPS to also accept r2/r5.

I tried introducing some instructions from MIPS R2. The following data shows the test results and the performance improvement we could get by supporting newer ISA levels on mips64x.

goos: linux
goarch: mips64le
pkg: crypto/tls
                                                 │    oldtls     │               newtls                │
                                                 │    sec/op     │    sec/op     vs base               │
CertCache/0-4                                       5.839m ±  6%   6.417m ±  8%   +9.91% (p=0.001 n=8)
CertCache/1-4                                       6.277m ±  6%   6.246m ±  7%        ~ (p=0.721 n=8)
CertCache/2-4                                       6.119m ± 14%   6.305m ±  7%   +3.04% (p=0.050 n=8)
CertCache/3-4                                       6.115m ± 10%   6.542m ± 11%   +6.98% (p=0.038 n=8)
HandshakeServer/RSA-4                               6.293m ±  1%   6.214m ±  0%   -1.26% (p=0.002 n=8)
HandshakeServer/ECDHE-P256-RSA/TLSv13-4             11.57m ±  0%   11.34m ±  1%   -1.98% (p=0.010 n=8)
HandshakeServer/ECDHE-P256-RSA/TLSv12-4             10.89m ±  0%   10.79m ±  0%   -0.88% (p=0.000 n=8)
HandshakeServer/ECDHE-P256-ECDSA-P256/TLSv13-4      7.247m ±  1%   7.008m ±  1%   -3.29% (p=0.007 n=8)
HandshakeServer/ECDHE-P256-ECDSA-P256/TLSv12-4      6.592m ±  0%   6.496m ±  0%   -1.46% (p=0.000 n=8)
HandshakeServer/ECDHE-X25519-ECDSA-P256/TLSv13-4    5.356m ±  3%   5.172m ±  3%   -3.45% (p=0.015 n=8)
HandshakeServer/ECDHE-X25519-ECDSA-P256/TLSv12-4    4.686m ±  1%   4.566m ±  0%   -2.56% (p=0.000 n=8)
HandshakeServer/ECDHE-P521-ECDSA-P521/TLSv13-4      220.2m ±  0%   217.4m ±  0%   -1.26% (p=0.000 n=8)
HandshakeServer/ECDHE-P521-ECDSA-P521/TLSv12-4      219.6m ±  0%   216.9m ±  0%   -1.25% (p=0.000 n=8)
Throughput/MaxPacket/1MB/TLSv12-4                   519.1m ±  1%   148.1m ±  2%  -71.47% (p=0.000 n=8)
Throughput/MaxPacket/1MB/TLSv13-4                   537.9m ±  0%   164.5m ±  1%  -69.42% (p=0.000 n=8)
Throughput/MaxPacket/2MB/TLSv12-4                  1028.5m ±  0%   279.3m ±  1%  -72.85% (p=0.000 n=8)
Throughput/MaxPacket/2MB/TLSv13-4                  1063.3m ±  0%   313.1m ±  1%  -70.56% (p=0.000 n=8)
Throughput/MaxPacket/4MB/TLSv12-4                  2036.4m ±  0%   552.1m ±  1%  -72.89% (p=0.000 n=8)
Throughput/MaxPacket/4MB/TLSv13-4                  2106.4m ±  0%   614.5m ±  1%  -70.83% (p=0.000 n=8)
Throughput/MaxPacket/8MB/TLSv12-4                    4.064 ±  0%    1.080 ±  4%  -73.43% (p=0.000 n=8)
Throughput/MaxPacket/8MB/TLSv13-4                    4.198 ±  0%    1.212 ±  7%  -71.12% (p=0.000 n=8)
Throughput/MaxPacket/16MB/TLSv12-4                   8.115 ±  1%    2.202 ±  7%  -72.87% (p=0.000 n=8)
Throughput/MaxPacket/16MB/TLSv13-4                   8.383 ±  0%    2.403 ±  1%  -71.33% (p=0.000 n=8)
Throughput/MaxPacket/32MB/TLSv12-4                  16.198 ±  0%    4.283 ±  0%  -73.56% (p=0.000 n=8)
Throughput/MaxPacket/32MB/TLSv13-4                  16.763 ±  0%    4.792 ±  1%  -71.42% (p=0.000 n=8)
Throughput/MaxPacket/64MB/TLSv12-4                  32.388 ±  0%    8.603 ±  2%  -73.44% (p=0.000 n=8)
Throughput/MaxPacket/64MB/TLSv13-4                  33.502 ±  0%    9.636 ±  1%  -71.24% (p=0.000 n=8)
Throughput/DynamicPacket/1MB/TLSv12-4               514.2m ±  1%   146.3m ±  1%  -71.55% (p=0.000 n=8)
Throughput/DynamicPacket/1MB/TLSv13-4               531.9m ±  1%   162.4m ±  2%  -69.47% (p=0.000 n=8)
Throughput/DynamicPacket/2MB/TLSv12-4              1019.8m ±  3%   279.2m ±  3%  -72.62% (p=0.000 n=8)
Throughput/DynamicPacket/2MB/TLSv13-4              1056.9m ±  0%   311.2m ±  1%  -70.56% (p=0.000 n=8)
Throughput/DynamicPacket/4MB/TLSv12-4              2031.2m ±  1%   547.3m ±  1%  -73.06% (p=0.000 n=8)
Throughput/DynamicPacket/4MB/TLSv13-4              2102.5m ±  0%   608.2m ±  1%  -71.07% (p=0.000 n=8)
Throughput/DynamicPacket/8MB/TLSv12-4                4.053 ±  0%    1.082 ±  1%  -73.31% (p=0.000 n=8)
Throughput/DynamicPacket/8MB/TLSv13-4                4.193 ±  0%    1.216 ±  1%  -70.99% (p=0.000 n=8)
Throughput/DynamicPacket/16MB/TLSv12-4               8.104 ±  1%    2.151 ±  2%  -73.46% (p=0.000 n=8)
Throughput/DynamicPacket/16MB/TLSv13-4               8.388 ±  0%    2.406 ±  1%  -71.32% (p=0.000 n=8)
Throughput/DynamicPacket/32MB/TLSv12-4              16.202 ±  0%    4.287 ±  1%  -73.54% (p=0.000 n=8)
Throughput/DynamicPacket/32MB/TLSv13-4              16.761 ±  0%    4.869 ±  2%  -70.95% (p=0.000 n=8)
Throughput/DynamicPacket/64MB/TLSv12-4              32.394 ±  0%    8.589 ±  2%  -73.49% (p=0.000 n=8)
Throughput/DynamicPacket/64MB/TLSv13-4              33.500 ±  0%    9.610 ±  3%  -71.31% (p=0.000 n=8)
Latency/MaxPacket/200kbps/TLSv12-4                  719.9m ±  0%   712.3m ±  0%   -1.06% (p=0.000 n=8)
Latency/MaxPacket/200kbps/TLSv13-4                  722.7m ±  0%   714.5m ±  0%   -1.13% (p=0.000 n=8)
Latency/MaxPacket/500kbps/TLSv12-4                  303.9m ±  0%   296.1m ±  0%   -2.57% (p=0.000 n=8)
Latency/MaxPacket/500kbps/TLSv13-4                  304.5m ±  0%   296.3m ±  0%   -2.68% (p=0.000 n=8)
Latency/MaxPacket/1000kbps/TLSv12-4                 165.5m ±  0%   157.5m ±  0%   -4.85% (p=0.000 n=8)
Latency/MaxPacket/1000kbps/TLSv13-4                 165.0m ±  0%   156.6m ±  0%   -5.07% (p=0.000 n=8)
Latency/MaxPacket/2000kbps/TLSv12-4                 96.05m ±  0%   88.17m ±  0%   -8.21% (p=0.000 n=8)
Latency/MaxPacket/2000kbps/TLSv13-4                 95.48m ±  0%   87.23m ±  0%   -8.65% (p=0.000 n=8)
Latency/MaxPacket/5000kbps/TLSv12-4                 54.42m ±  1%   46.43m ±  0%  -14.68% (p=0.000 n=8)
Latency/MaxPacket/5000kbps/TLSv13-4                 54.75m ±  0%   46.36m ±  0%  -15.33% (p=0.000 n=8)
Latency/DynamicPacket/200kbps/TLSv12-4              152.4m ±  0%   149.2m ±  0%   -2.13% (p=0.000 n=8)
Latency/DynamicPacket/200kbps/TLSv13-4              153.8m ±  0%   151.6m ±  0%   -1.48% (p=0.000 n=8)
Latency/DynamicPacket/500kbps/TLSv12-4              73.47m ±  0%   69.92m ±  1%   -4.84% (p=0.000 n=8)
Latency/DynamicPacket/500kbps/TLSv13-4              72.63m ±  1%   70.06m ±  0%   -3.54% (p=0.000 n=8)
Latency/DynamicPacket/1000kbps/TLSv12-4             47.15m ±  0%   43.59m ±  0%   -7.55% (p=0.000 n=8)
Latency/DynamicPacket/1000kbps/TLSv13-4             45.26m ±  1%   42.60m ±  1%   -5.88% (p=0.000 n=8)
Latency/DynamicPacket/2000kbps/TLSv12-4             33.88m ±  0%   30.25m ±  0%  -10.70% (p=0.000 n=8)
Latency/DynamicPacket/2000kbps/TLSv13-4             31.90m ±  1%   29.36m ±  0%   -7.96% (p=0.000 n=8)
Latency/DynamicPacket/5000kbps/TLSv12-4             25.60m ±  0%   21.99m ±  1%  -14.12% (p=0.000 n=8)
Latency/DynamicPacket/5000kbps/TLSv13-4             24.41m ±  0%   21.93m ±  1%  -10.19% (p=0.000 n=8)
geomean                                             346.1m         188.9m        -45.43%

                                       │    oldtls     │                newtls                 │
                                       │      B/s      │      B/s       vs base                │
Throughput/MaxPacket/1MB/TLSv12-4        1.926Mi ±  1%   6.752Mi ± 13%  +250.50% (p=0.000 n=8)
Throughput/MaxPacket/1MB/TLSv13-4        1.860Mi ±  1%   6.080Mi ±  4%  +226.92% (p=0.000 n=8)
Throughput/MaxPacket/2MB/TLSv12-4        1.945Mi ±  0%   7.162Mi ±  3%  +268.14% (p=0.000 n=8)
Throughput/MaxPacket/2MB/TLSv13-4        1.884Mi ±  4%   6.390Mi ± 21%  +239.24% (p=0.000 n=8)
Throughput/MaxPacket/4MB/TLSv12-4        1.965Mi ±  2%   7.243Mi ±  4%  +268.69% (p=0.000 n=8)
Throughput/MaxPacket/4MB/TLSv13-4        1.898Mi ±  1%   6.509Mi ±  5%  +242.96% (p=0.000 n=8)
Throughput/MaxPacket/8MB/TLSv12-4        1.969Mi ±  0%   7.405Mi ±  6%  +276.03% (p=0.000 n=8)
Throughput/MaxPacket/8MB/TLSv13-4        1.907Mi ±  0%   6.599Mi ± 18%  +246.00% (p=0.000 n=8)
Throughput/MaxPacket/16MB/TLSv12-4       1.974Mi ±  1%   7.262Mi ±  8%  +267.87% (p=0.000 n=8)
Throughput/MaxPacket/16MB/TLSv13-4       1.907Mi ±  4%   6.657Mi ±  2%  +249.00% (p=0.000 n=8)
Throughput/MaxPacket/32MB/TLSv12-4       1.974Mi ±  2%   7.467Mi ±  3%  +278.26% (p=0.000 n=8)
Throughput/MaxPacket/32MB/TLSv13-4       1.907Mi ±  1%   6.680Mi ±  1%  +250.25% (p=0.000 n=8)
Throughput/MaxPacket/64MB/TLSv12-4       1.974Mi ±  2%   7.439Mi ±  3%  +276.81% (p=0.000 n=8)
Throughput/MaxPacket/64MB/TLSv13-4       1.912Mi ±  1%   6.642Mi ±  2%  +247.38% (p=0.000 n=8)
Throughput/DynamicPacket/1MB/TLSv12-4    1.945Mi ± 12%   6.838Mi ±  8%  +251.47% (p=0.000 n=8)
Throughput/DynamicPacket/1MB/TLSv13-4    1.879Mi ±  1%   6.156Mi ±  3%  +227.66% (p=0.000 n=8)
Throughput/DynamicPacket/2MB/TLSv12-4    1.965Mi ± 11%   7.167Mi ± 16%  +264.81% (p=0.000 n=8)
Throughput/DynamicPacket/2MB/TLSv13-4    1.893Mi ±  1%   6.428Mi ±  2%  +239.55% (p=0.000 n=8)
Throughput/DynamicPacket/4MB/TLSv12-4    1.969Mi ±  1%   7.310Mi ±  6%  +271.19% (p=0.000 n=8)
Throughput/DynamicPacket/4MB/TLSv13-4    1.903Mi ±  0%   6.576Mi ±  2%  +245.61% (p=0.000 n=8)
Throughput/DynamicPacket/8MB/TLSv12-4    1.974Mi ±  1%   7.396Mi ± 10%  +274.64% (p=0.000 n=8)
Throughput/DynamicPacket/8MB/TLSv13-4    1.907Mi ±  1%   6.576Mi ±  3%  +244.75% (p=0.000 n=8)
Throughput/DynamicPacket/16MB/TLSv12-4   1.974Mi ±  3%   7.439Mi ±  3%  +276.81% (p=0.000 n=8)
Throughput/DynamicPacket/16MB/TLSv13-4   1.907Mi ±  1%   6.647Mi ±  4%  +248.50% (p=0.000 n=8)
Throughput/DynamicPacket/32MB/TLSv12-4   1.974Mi ±  0%   7.463Mi ± 10%  +278.02% (p=0.000 n=8)
Throughput/DynamicPacket/32MB/TLSv13-4   1.907Mi ±  1%   6.576Mi ±  2%  +244.75% (p=0.000 n=8)
Throughput/DynamicPacket/64MB/TLSv12-4   1.974Mi ±  0%   7.448Mi ±  3%  +277.29% (p=0.000 n=8)
Throughput/DynamicPacket/64MB/TLSv13-4   1.912Mi ±  1%   6.661Mi ±  4%  +248.38% (p=0.000 n=8)
geomean                                  1.931Mi         6.878Mi        +256.13%
goos: linux
goarch: mips64le
pkg: crypto/md5
                      │    oldmd5    │               newmd5               │
                      │    sec/op    │   sec/op     vs base               │
Hash8Bytes-4             2.712µ ± 0%   2.514µ ± 0%   -7.28% (p=0.000 n=8)
Hash64-4                 3.387µ ± 0%   2.999µ ± 0%  -11.46% (p=0.000 n=8)
Hash128-4                4.115µ ± 0%   3.527µ ± 0%  -14.30% (p=0.000 n=8)
Hash256-4                5.569µ ± 0%   4.583µ ± 0%  -17.71% (p=0.000 n=8)
Hash512-4                8.492µ ± 0%   6.709µ ± 0%  -21.00% (p=0.000 n=8)
Hash1K-4                 14.31µ ± 0%   10.94µ ± 0%  -23.57% (p=0.000 n=8)
Hash8K-4                 95.82µ ± 0%   70.18µ ± 0%  -26.76% (p=0.000 n=8)
Hash1M-4                11.933m ± 0%   8.674m ± 0%  -27.31% (p=0.000 n=8)
Hash8M-4                 95.45m ± 0%   69.40m ± 0%  -27.29% (p=0.000 n=8)
Hash8BytesUnaligned-4    2.784µ ± 0%   2.588µ ± 0%   -7.04% (p=0.000 n=8)
Hash1KUnaligned-4        14.31µ ± 0%   10.95µ ± 0%  -23.48% (p=0.000 n=8)
Hash8KUnaligned-4        95.76µ ± 0%   70.23µ ± 0%  -26.66% (p=0.000 n=8)
geomean                  38.51µ        30.88µ       -19.82%

                      │    oldmd5     │                newmd5                │
                      │      B/s      │      B/s       vs base               │
Hash8Bytes-4            2.813Mi ±  0%    3.033Mi ± 0%   +7.80% (p=0.000 n=8)
Hash64-4                18.02Mi ±  0%    20.35Mi ± 0%  +12.91% (p=0.000 n=8)
Hash128-4               29.66Mi ±  0%    34.61Mi ± 0%  +16.69% (p=0.000 n=8)
Hash256-4               43.85Mi ±  0%    53.27Mi ± 0%  +21.50% (p=0.000 n=8)
Hash512-4               57.50Mi ±  0%    72.78Mi ± 0%  +26.59% (p=0.000 n=8)
Hash1K-4                68.25Mi ±  0%    89.30Mi ± 0%  +30.84% (p=0.000 n=8)
Hash8K-4                81.53Mi ±  0%   111.33Mi ± 0%  +36.54% (p=0.000 n=8)
Hash1M-4                83.80Mi ± 28%   115.29Mi ± 0%  +37.58% (p=0.000 n=8)
Hash8M-4                83.82Mi ±  0%   115.27Mi ± 0%  +37.52% (p=0.000 n=8)
Hash8BytesUnaligned-4   2.737Mi ±  0%    2.947Mi ± 0%   +7.67% (p=0.000 n=8)
Hash1KUnaligned-4       68.24Mi ±  0%    89.19Mi ± 0%  +30.69% (p=0.000 n=8)
Hash8KUnaligned-4       81.59Mi ±  0%   111.24Mi ± 0%  +36.34% (p=0.000 n=8)
geomean                 33.84Mi          42.21Mi       +24.72%
goos: linux
goarch: mips64le
pkg: crypto/sha1
                   │   oldsha1   │              newsha1               │
                   │   sec/op    │   sec/op     vs base               │
Hash8Bytes/New-4     5.341µ ± 0%   4.863µ ± 0%   -8.95% (p=0.000 n=8)
Hash8Bytes/Sum-4     5.456µ ± 0%   4.983µ ± 0%   -8.68% (p=0.000 n=8)
Hash320Bytes/New-4   16.69µ ± 0%   13.85µ ± 0%  -17.00% (p=0.000 n=8)
Hash320Bytes/Sum-4   16.81µ ± 0%   13.97µ ± 0%  -16.92% (p=0.000 n=8)
Hash1K/New-4         42.90µ ± 0%   34.81µ ± 0%  -18.87% (p=0.000 n=8)
Hash1K/Sum-4         43.02µ ± 0%   34.94µ ± 0%  -18.80% (p=0.000 n=8)
Hash8K/New-4         309.6µ ± 0%   248.3µ ± 0%  -19.78% (p=0.000 n=8)
Hash8K/Sum-4         309.5µ ± 0%   248.5µ ± 0%  -19.71% (p=0.000 n=8)
geomean              33.11µ        27.75µ       -16.20%

                   │   oldsha1    │               newsha1               │
                   │     B/s      │     B/s       vs base               │
Hash8Bytes/New-4     1.431Mi ± 1%   1.574Mi ± 1%  +10.00% (p=0.000 n=8)
Hash8Bytes/Sum-4     1.402Mi ± 1%   1.535Mi ± 1%   +9.52% (p=0.000 n=8)
Hash320Bytes/New-4   18.29Mi ± 0%   22.04Mi ± 0%  +20.49% (p=0.000 n=8)
Hash320Bytes/Sum-4   18.15Mi ± 0%   21.85Mi ± 0%  +20.39% (p=0.000 n=8)
Hash1K/New-4         22.76Mi ± 1%   28.06Mi ± 0%  +23.25% (p=0.000 n=8)
Hash1K/Sum-4         22.70Mi ± 0%   27.95Mi ± 0%  +23.13% (p=0.000 n=8)
Hash8K/New-4         25.24Mi ± 0%   31.46Mi ± 0%  +24.64% (p=0.000 n=8)
Hash8K/Sum-4         25.24Mi ± 0%   31.44Mi ± 0%  +24.54% (p=0.000 n=8)
geomean              11.03Mi        13.16Mi       +19.35%
goos: linux
goarch: mips64le
pkg: math/bits
                  │   oldbits    │              newbits               │
                  │    sec/op    │   sec/op     vs base               │
LeadingZeros-4      20.505n ± 1%   6.780n ± 0%  -66.93% (p=0.000 n=8)
LeadingZeros8-4     10.040n ± 0%   9.039n ± 0%   -9.98% (p=0.000 n=8)
LeadingZeros16-4    19.085n ± 0%   9.038n ± 0%  -52.64% (p=0.000 n=8)
LeadingZeros32-4     24.13n ± 0%   10.55n ± 0%  -56.28% (p=0.000 n=8)
LeadingZeros64-4    19.660n ± 0%   6.776n ± 0%  -65.54% (p=0.000 n=8)
TrailingZeros-4     13.055n ± 0%   9.037n ± 0%  -30.77% (p=0.000 n=8)
TrailingZeros8-4     7.364n ± 0%   7.364n ± 0%        ~ (p=0.449 n=8)
TrailingZeros16-4    17.07n ± 0%   10.05n ± 0%  -41.14% (p=0.000 n=8)
TrailingZeros32-4   17.405n ± 0%   8.534n ± 0%  -50.97% (p=0.000 n=8)
TrailingZeros64-4   13.050n ± 0%   9.037n ± 0%  -30.75% (p=0.000 n=8)
OnesCount-4          21.09n ± 0%   21.10n ± 0%        ~ (p=0.054 n=8)
OnesCount8-4         6.024n ± 0%   6.024n ± 0%        ~ (p=0.533 n=8)
OnesCount16-4        13.05n ± 0%   13.05n ± 0%        ~ (p=1.000 n=8)
OnesCount32-4        20.08n ± 0%   20.08n ± 0%        ~ (p=0.367 n=8)
OnesCount64-4        23.10n ± 0%   23.11n ± 0%        ~ (p=0.407 n=8)
RotateLeft-4         9.037n ± 0%   4.418n ± 0%  -51.11% (p=0.000 n=8)
RotateLeft8-4        9.537n ± 0%   9.208n ± 0%   -3.45% (p=0.000 n=8)
RotateLeft16-4       9.208n ± 0%   9.375n ± 0%   +1.82% (p=0.000 n=8)
RotateLeft32-4      10.380n ± 0%   4.021n ± 0%  -61.26% (p=0.000 n=8)
RotateLeft64-4       8.034n ± 0%   4.016n ± 0%  -50.01% (p=0.000 n=8)
Reverse-4            62.26n ± 0%   18.08n ± 0%  -70.96% (p=0.000 n=8)
Reverse8-4           5.020n ± 0%   5.021n ± 0%        ~ (p=1.000 n=8)
Reverse16-4          9.036n ± 0%   9.039n ± 0%        ~ (p=0.098 n=8)
Reverse32-4          29.13n ± 0%   23.11n ± 0%  -20.68% (p=0.000 n=8)
Reverse64-4          27.50n ± 0%   21.10n ± 0%  -23.27% (p=0.000 n=8)
ReverseBytes-4      13.970n ± 1%   3.044n ± 1%  -78.21% (p=0.000 n=8)
ReverseBytes16-4     4.297n ± 1%   4.329n ± 1%   +0.74% (p=0.050 n=8)
ReverseBytes32-4    12.050n ± 0%   5.021n ± 0%  -58.34% (p=0.000 n=8)
ReverseBytes64-4    17.220n ± 2%   3.030n ± 0%  -82.40% (p=0.000 n=8)
Add-4                8.178n ± 0%   8.188n ± 0%        ~ (p=0.661 n=8)
Add32-4              8.284n ± 0%   8.285n ± 0%        ~ (p=0.292 n=8)
Add64-4              7.890n ± 1%   7.876n ± 0%        ~ (p=0.522 n=8)
Add64multiple-4      17.08n ± 0%   17.08n ± 0%        ~ (p=0.297 n=8)
Sub-4                9.543n ± 0%   9.540n ± 0%        ~ (p=0.312 n=8)
Sub32-4              13.07n ± 0%   13.05n ± 0%   -0.08% (p=0.011 n=8)
Sub64-4              10.30n ± 0%   10.29n ± 0%        ~ (p=0.080 n=8)
Sub64multiple-4      19.09n ± 0%   19.08n ± 0%   -0.05% (p=0.008 n=8)
Mul-4                5.100n ± 0%   5.097n ± 0%        ~ (p=0.338 n=8)
Mul32-4              7.371n ± 0%   7.363n ± 0%   -0.11% (p=0.000 n=8)
Mul64-4              5.242n ± 0%   5.020n ± 0%   -4.24% (p=0.000 n=8)
Div-4                133.6n ± 0%   118.4n ± 0%  -11.38% (p=0.000 n=8)
Div32-4              15.65n ± 1%   15.41n ± 0%   -1.53% (p=0.000 n=8)
Div64-4              132.7n ± 0%   117.3n ± 1%  -11.53% (p=0.000 n=8)
geomean              13.85n        9.917n       -28.41%
goos: linux
goarch: mips64le
pkg: crypto/sha256
                    │  oldsha256  │             newsha256              │
                    │   sec/op    │   sec/op     vs base               │
Hash8Bytes/New-4      6.689µ ± 0%   6.094µ ± 0%   -8.89% (p=0.000 n=8)
Hash8Bytes/Sum224-4   7.106µ ± 0%   6.507µ ± 0%   -8.43% (p=0.000 n=8)
Hash8Bytes/Sum256-4   7.217µ ± 0%   6.623µ ± 0%   -8.24% (p=0.000 n=8)
Hash1K/New-4          62.66µ ± 0%   52.35µ ± 0%  -16.45% (p=0.000 n=8)
Hash1K/Sum224-4       62.91µ ± 0%   52.75µ ± 0%  -16.16% (p=0.000 n=8)
Hash1K/Sum256-4       63.03µ ± 0%   52.86µ ± 0%  -16.14% (p=0.000 n=8)
Hash8K/New-4          450.8µ ± 0%   373.5µ ± 0%  -17.15% (p=0.000 n=8)
Hash8K/Sum224-4       451.0µ ± 0%   373.9µ ± 0%  -17.10% (p=0.000 n=8)
Hash8K/Sum256-4       451.5µ ± 0%   374.0µ ± 0%  -17.16% (p=0.000 n=8)
geomean               58.34µ        50.14µ       -14.05%

                    │  oldsha256   │              newsha256               │
                    │     B/s      │      B/s       vs base               │
Hash8Bytes/New-4      1.144Mi ± 1%   1.249Mi ±  0%   +9.17% (p=0.000 n=8)
Hash8Bytes/Sum224-4   1.078Mi ± 1%   1.173Mi ±  0%   +8.85% (p=0.000 n=8)
Hash8Bytes/Sum256-4   1.059Mi ± 1%   1.154Mi ±  0%   +9.01% (p=0.000 n=8)
Hash1K/New-4          15.58Mi ± 0%   18.65Mi ± 12%  +19.71% (p=0.000 n=8)
Hash1K/Sum224-4       15.53Mi ± 1%   18.51Mi ±  0%  +19.23% (p=0.000 n=8)
Hash1K/Sum256-4       15.49Mi ± 0%   18.47Mi ±  0%  +19.24% (p=0.000 n=8)
Hash8K/New-4          17.33Mi ± 0%   20.91Mi ±  0%  +20.69% (p=0.000 n=8)
Hash8K/Sum224-4       17.32Mi ± 0%   20.90Mi ±  0%  +20.65% (p=0.000 n=8)
Hash8K/Sum256-4       17.30Mi ± 0%   20.89Mi ±  0%  +20.73% (p=0.000 n=8)
geomean               6.649Mi        7.729Mi        +16.24%
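
The sec/op and B/s tables above describe the same change from two angles: if time per operation changes by a fraction d, throughput changes by 1/(1+d) - 1. A minimal Go sketch of that conversion follows; the helper name `throughputGain` is made up for this example and is not part of benchstat.

```go
package main

import "fmt"

// throughputGain converts a relative change in time per operation
// (for example -0.7147 for -71.47%) into the matching relative
// change in throughput. The helper name is illustrative only.
func throughputGain(timeDelta float64) float64 {
	return 1/(1+timeDelta) - 1
}

func main() {
	// Throughput/MaxPacket/1MB/TLSv12: sec/op changed by -71.47%.
	fmt.Printf("%+.1f%%\n", throughputGain(-0.7147)*100)
}
```

For the Throughput/MaxPacket/1MB/TLSv12 row this gives roughly +250.5%, which lines up with the +250.50% reported in the B/s table.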
@gopherbot gopherbot added this to the Proposal milestone May 9, 2023
@HeliC829
Contributor Author

HeliC829 commented May 9, 2023

cc @cherrymui

@randall77
Contributor

See my comment over at #59415 (comment)

@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals May 9, 2023
@ianlancetaylor
Contributor

@randall77 I think that your comment has been addressed: the proposal here is permitting setting GOMIPS64 to direct the compiler to generate a few special purpose instructions.

@HeliC829 The GOMIPS64 variable already exists, of course. I think that you are suggesting that we permit a comma-separated list of options in GOMIPS64. The options can be

  • either hardfloat (default) or softfloat
  • one of r1 (default), r2, r3, r5, r6

I added r1 because there has to be a way to specify the default. I added the others because compilers support them. I don't know what happened to r4.

Do you have any reference to what the different ISA levels mean? I couldn't find one.

@HeliC829
Contributor Author

Do you have any reference to what the different ISA levels mean? I couldn't find one.

Here is a reference for the MIPS ISA levels, on page 24 of 148:
https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00083-2B-MIPS64INT-AFP-06.01.pdf

Go currently supports MIPS III on MIPS64. Note that MIPS III is different from MIPS R1, so we could consider 3 as the default level.

To deal with ISA level names that contain a letter, I think we can use the mips_isa level enum defined in GCC in the rules.

@ianlancetaylor
Contributor

I know that the situation is very confusing, but it doesn't seem ideal to treat 3 as the default level while also permitting r1. Can we come up with a list of strings that makes sense today and also for the future?

@HeliC829
Contributor Author

OK, so shall we use the Roman numeral iii to mean the default level, MIPS III? The values and the ISA levels they map to are as follows:

iii: MIPS III (default, also the current MIPS64 ISA level)
r1: MIPS R1
r2: MIPS R2
r5: MIPS R5
r6: MIPS R6

@rsc
Contributor

rsc commented Jun 7, 2023

From the doc linked above:

Screenshot 2023-06-07 at 1 56 15 PM

It sounds like GOMIPS64 is a comma-separated list of choices: hardfloat, softfloat, iii, r1, r2, r5, r6.
Probably we should define them all: iii, iv, v, r1, r2, r3, r5, r6. We may not use them today but they'll be defined.

Do I have that right?
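
As a rough illustration of the comma-separated form described above, here is a minimal Go sketch of how such a value could be parsed. The names `mipsConfig` and `parseGOMIPS64` are made up for this example and are not the actual internal/buildcfg code; the defaults (hardfloat, iii) are assumptions taken from the discussion above.

```go
package main

import (
	"fmt"
	"strings"
)

// mipsConfig is a hypothetical result type for this sketch.
type mipsConfig struct {
	Float string // "hardfloat" or "softfloat"
	ISA   string // one of the ISA level names below
}

// isaLevels lists the ISA level names discussed in this thread.
var isaLevels = map[string]bool{
	"iii": true, "iv": true, "v": true,
	"r1": true, "r2": true, "r3": true, "r5": true, "r6": true,
}

// parseGOMIPS64 parses a comma-separated option list such as
// "softfloat,r2". It is illustrative only: the real toolchain may
// validate and store these options differently.
func parseGOMIPS64(v string) (mipsConfig, error) {
	cfg := mipsConfig{Float: "hardfloat", ISA: "iii"}
	if v == "" {
		return cfg, nil
	}
	for _, opt := range strings.Split(v, ",") {
		switch {
		case opt == "hardfloat" || opt == "softfloat":
			cfg.Float = opt
		case isaLevels[opt]:
			cfg.ISA = opt
		default:
			return mipsConfig{}, fmt.Errorf("invalid GOMIPS64 option %q", opt)
		}
	}
	return cfg, nil
}

func main() {
	for _, v := range []string{"", "r2", "softfloat,r5", "r4"} {
		cfg, err := parseGOMIPS64(v)
		fmt.Printf("%q -> %+v, err=%v\n", v, cfg, err)
	}
}
```

The sketch rejects r4 because no such level appears in the list; a real implementation might also want to reject conflicting options such as two ISA levels in one value.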

@rsc rsc moved this from Incoming to Active in Proposals Jun 7, 2023
@rsc
Contributor

rsc commented Jun 7, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc changed the title from "proposal: MIPS64: pass ISA level with GOMIPS64" to "cmd/go: add GOMIPS32, GOMIPS64 ISA levels (i, ii, iii, iv, v, r1, r2, r3, r5, r6)" Jun 7, 2023
@Rongronggg9
Member

It sounds like GOMIPS64 is a comma-separated list of choices: hardfloat, softfloat, iii, r1, r2, r5, r6.

Right.

Probably we should define them all: iii, iv, v, r1, r2, r3, r5, r6. We may not use them today but they'll be defined.

Just FYI: in practice there is no MIPS IV hardware running a Linux distribution, and there is no MIPS V hardware implementation at all. Besides, in user space, the differences between III, IV and V are tiny. R3 is a significant release, but it only adds privileged instructions and makes no visible user-space changes compared to R2. Thus, as a minimum requirement, we think that defining only iii, r1, r2, r5 and r6 should be enough. It is okay to define the other ISA levels as reserved, of course, if there is such a demand.

@rsc
Contributor

rsc commented Jun 14, 2023

In practice since we don't emit code that cares about the difference, GOMIPS32=iii and GOMIPS32=iv and GOMIPS32=v will all mean the same thing, but they exist(ed) and it's easy to include them, so we might as well recognize the full set.

@rsc rsc moved this from Active to Likely Accept in Proposals Jun 14, 2023
@rsc
Contributor

rsc commented Jun 14, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@Rongronggg9
Member

Rongronggg9 commented Jun 14, 2023

Based on the discussion above, this proposal seems like a likely accept.

Exciting news! Thanks for your review.

GOMIPS32=iii and GOMIPS32=iv and GOMIPS32=v

Did you mean GOMIPS64?

they exist(ed) and it's easy to include them, so we might as well recognize the full set.

Let me summarize:

ISA level  GOMIPS32                               GOMIPS64
i          defined, ?                             N/A
ii         defined, ?                             N/A
iii        N/A                                    valid, implemented (current default)
iv         N/A                                    valid, equivalent to iii
v          N/A                                    valid, equivalent to iii
r1         valid, implemented (current default)   valid, ? [1]
r2         valid, to be implemented               valid, to be implemented
r3         valid, equivalent to r2                valid, equivalent to r2
r4 [2]     N/A                                    N/A
r5         valid, to be implemented               valid, to be implemented
r6         valid, to be implemented               valid, to be implemented

Footnotes

  1. I think we can make GOMIPS64=r1 equivalent to GOMIPS64=iii for the time being, or separate the r1-compatible optimizations from the GOMIPS64=r2 patchset if that is not too complex.

  2. Does not exist.
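
The "equivalent to" entries in the summary above could be expressed as a small lookup that folds each accepted GOMIPS64 name onto the level whose code generation it would share. This is only a sketch: `canonical64` and `canonicalLevel64` are hypothetical names, and mapping r1 to iii follows the first option in footnote 1 rather than any settled decision.

```go
package main

import "fmt"

// canonical64 maps each GOMIPS64 ISA level name from the summary
// to the level it would behave like. Treating r1 as iii is an
// assumption based on footnote 1 of the summary.
var canonical64 = map[string]string{
	"iii": "iii",
	"iv":  "iii",
	"v":   "iii",
	"r1":  "iii",
	"r2":  "r2",
	"r3":  "r2",
	"r5":  "r5",
	"r6":  "r6",
}

// canonicalLevel64 returns the canonical level for name and whether
// name is a valid GOMIPS64 level at all in this sketch.
func canonicalLevel64(name string) (string, bool) {
	c, ok := canonical64[name]
	return c, ok
}

func main() {
	for _, name := range []string{"iii", "v", "r3", "r4", "i"} {
		c, ok := canonicalLevel64(name)
		fmt.Printf("%-3s -> %q valid=%v\n", name, c, ok)
	}
}
```

Names that are N/A for GOMIPS64 in the summary (i, ii, r4) are simply absent from the map, so they come back as invalid.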

@gopherbot
Contributor

Change https://go.dev/cl/493816 mentions this issue: cmd/internal/obj/mips: add REBH/REBHV/REHVV instructions

@gopherbot
Contributor

Change https://go.dev/cl/485595 mentions this issue: math/bits: optimize BitLens64/32 on mips64x

@HeliC829
Contributor Author

Exciting news! Thanks for your review.

GOMIPS32=iii and GOMIPS32=iv and GOMIPS32=v

Did you mean GOMIPS64?

they exist(ed) and it's easy to include them, so we might as well recognize the full set.

Let me summarize:

That's a good summary. In addition, each newer ISA level is a superset of the previous one, except for R6 (R6 removed and adjusted some outdated instructions due to changes in microarchitecture design).
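
The superset relationship, with R6 as the exception, might be modeled roughly as below. This is a sketch under assumptions: `levelRank` and `available` are hypothetical names, and the `needsAudit` result only flags that an older instruction has to be checked by hand for an r6 target; it does not encode which instructions R6 actually removed.

```go
package main

import "fmt"

// levelRank orders the ISA level names used in this thread from
// oldest to newest. The numbers are arbitrary and only used for
// comparison in this sketch.
var levelRank = map[string]int{"iii": 0, "r1": 1, "r2": 2, "r5": 3, "r6": 4}

// available reports whether instructions introduced at level
// introduced can be assumed present when targeting level target.
// needsAudit is set when the target is r6 and the instruction comes
// from an older level, because r6 is not a strict superset.
func available(target, introduced string) (ok, needsAudit bool) {
	t, okT := levelRank[target]
	i, okI := levelRank[introduced]
	if !okT || !okI || t < i {
		return false, false
	}
	return true, target == "r6" && introduced != "r6"
}

func main() {
	for _, target := range []string{"iii", "r2", "r5", "r6"} {
		ok, audit := available(target, "r2")
		fmt.Printf("target=%-3s r2 instructions: ok=%v needsAudit=%v\n", target, ok, audit)
	}
}
```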

@rsc
Contributor

rsc commented Jun 21, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Accepted in Proposals Jun 21, 2023
@rsc rsc modified the milestones: Proposal, Backlog Jun 21, 2023
@gopherbot
Contributor

Change https://go.dev/cl/508095 mentions this issue: internal/buildcfg: add support for accepting different MIPS ISA level on mips64

@HeliC829 HeliC829 changed the title from "cmd/go: add GOMIPS32, GOMIPS64 ISA levels (i, ii, iii, iv, v, r1, r2, r3, r5, r6)" to "cmd/go: add GOMIPS32, GOMIPS64 ISA levels (iii, r1, r2, r5, r6)" Jul 12, 2023
@HeliC829
Contributor Author

Can someone take a look at CL 508095, so that I can rework CL 485635 and CL 485595?

gopherbot pushed a commit that referenced this issue Aug 3, 2023
Add support for WSBH/DSBH/DSHD instructions, which are introduced in mips{32,64}r2.

WSBH reverses bytes within the halfwords of a 32-bit word, DSBH reverses bytes within the halfwords of a 64-bit doubleword, and DSHD reverses the halfwords within a doubleword. These instructions can be used to optimize byte swaps.

Ref: The MIPS64 Instruction Set, Revision 5.04: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00087-2B-MIPS64BIS-AFP-05.04.pdf

Updates #60072

Change-Id: I31c043150fe8ac03027f413ef4cb2f3e435775e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/493816
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
@gopherbot
Contributor

Change https://go.dev/cl/515475 mentions this issue: cmd/internal/obj/mips: add SEB/SEH instructions

gopherbot pushed a commit that referenced this issue Aug 8, 2023
Add support for SEB/SEH instructions, which are introduced in mips32r2.

SEB/SEH can be used to sign-extend byte/halfword in registers directly without passing through memory.

Ref: The MIPS32 Instruction Set, Revision 5.04: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-05.04.pdf

Updates #60072

Change-Id: I33175ae9d943ead5983ac004bd2a158039046d65
Reviewed-on: https://go-review.googlesource.com/c/go/+/515475
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
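
The effect of SEB and SEH described in the commit message above can be modeled in Go with narrowing conversions. The sketch also shows a shift-pair equivalent, which is a common register-only substitute on ISA levels without these instructions; that is an assumption about typical code, not a claim about what the Go toolchain emitted before this change. All function names are hypothetical.

```go
package main

import "fmt"

// seb models SEB: sign-extend the low byte of a 32-bit register.
func seb(x uint32) int32 {
	return int32(int8(x))
}

// seh models SEH: sign-extend the low halfword of a 32-bit register.
func seh(x uint32) int32 {
	return int32(int16(x))
}

// sebShifts produces the same result as seb using a left shift
// followed by an arithmetic right shift.
func sebShifts(x uint32) int32 {
	return int32(x<<24) >> 24
}

func main() {
	fmt.Println(seb(0x80), seh(0x8000), sebShifts(0x80))
}
```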
clktmr added a commit to clktmr/go that referenced this issue Sep 30, 2023
For GOARCH=mips the Go compiler will use the newer MIPS32-r1 ISA,
whereas for GOARCH=mips64 it will use the MIPS-III ISA, which is the
highest N64 supports.

See golang#60072
@HeliC829
Copy link
Contributor Author

@cherrymui Hi, PTAL on CL 508095, thanks.

@gopherbot
Contributor

Change https://go.dev/cl/578175 mentions this issue: cmd/go: add GOMIPS32, GOMIPS64 ISA levels

HeliC829 added a commit to HeliC829/go that referenced this issue May 7, 2024
This CL does not introduce any instructions beyond MIPS III for now.
It should still behave identically to MIPS III.

Updates golang#60072
@stffabi

stffabi commented May 14, 2024

Is there any progress on the implementation of this proposal? I'm in the process of contributing ChaCha20 and Poly1305 assembly implementations for MIPSLE to x/crypto. For ChaCha20 it would be great to use ROTR for Mips32r2 and newer.

HeliC829 added a commit to HeliC829/go that referenced this issue May 17, 2024
This CL does not introduce any instructions beyond MIPS III for now.
It should still behave identically to MIPS III.

Updates golang#60072
@HeliC829
Contributor Author

Is there any progress on the implementation of this proposal? I'm in the process of contributing ChaCha20 and Poly1305 assembly implementations for MIPSLE to x/crypto. For ChaCha20 it would be great to use ROTR for Mips32r2 and newer.

Unfortunately, I'm still waiting for Go to accept my CL that adds the ISA level to the GOMIPS{,64} environment variables. After that, I will submit more CLs implementing the new instructions in mips{32,64}r1 and mips{32,64}r2. In the meantime, I have created a fork branch based on the Go release branch, with support for newer MIPS instructions (an implementation for mips32 will be added soon).

@stffabi

stffabi commented May 21, 2024

Thanks for your status update. I'm going to link your CL from mine, that is going to add ChaCha20 and Poly1305 assembly implementations.

HeliC829 added a commit to HeliC829/go that referenced this issue Jun 12, 2024
This CL does not introduce any instructions beyond MIPS III for now.
It should still behave identically to MIPS III.

Updates golang#60072
@clktmr

clktmr commented Jun 20, 2024

I'm working on a mips64 target with ISA level MIPS-III. The current compiler doesn't emit valid MIPS-III code anymore. At least the following commits must be reverted to get back to MIPS-III level (there are probably more):

24f83ed
83c4e53
68fea52
918d4d4

It would be greatly appreciated if MIPS-III could still be supported, or if a CL adding that were accepted, once the ISA level can be selected via GOMIPS.

@HeliC829
Contributor Author

I'm working on a mips64 target with ISA level MIPS-III. The current compiler doesn't emit valid MIPS-III code anymore. At least the following commits must be reverted to get back to MIPS-III level (there are probably more):

For mips64, the current compiler won't emit code higher than MIPS-III. The ssa front end won't emit code higher than MIPS-III for now.

24f83ed
83c4e53
68fea52

The commits mentioned above are for the SSA backend (aka the assembler). They don't emit code by themselves.

918d4d4

Even though this commit uses MIPS MSA instructions, the runtime detects whether MSA is available and only executes them if it is.

I don't know what happened on your mips64 target, can you provide more information or logs?
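
The kind of run-time dispatch described above might look roughly like the following Go sketch. Everything here is hypothetical: `hasMSA` stands in for a feature flag that startup code would fill in after probing the CPU, and `sumMSA` is a placeholder for a routine that would really use MSA instructions.

```go
package main

import "fmt"

// hasMSA is a stand-in for a CPU feature flag that startup code
// would set after probing the hardware. It is hypothetical; the
// real flags live in internal/cpu, which user code cannot import.
var hasMSA = false

// sumGeneric is the portable path that is valid on every ISA level.
func sumGeneric(b []byte) uint32 {
	var s uint32
	for _, v := range b {
		s += uint32(v)
	}
	return s
}

// sumMSA is a placeholder for a routine that would use MSA vector
// instructions. In this sketch it simply reuses the portable code.
func sumMSA(b []byte) uint32 {
	return sumGeneric(b)
}

// sum picks an implementation at run time based on the feature flag.
func sum(b []byte) uint32 {
	if hasMSA {
		return sumMSA(b)
	}
	return sumGeneric(b)
}

func main() {
	fmt.Println(sum([]byte{1, 2, 3}))
}
```

With this pattern the optional instructions are only reached when the flag is set, so a flag that is initialized incorrectly can send a CPU down a path it does not support.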

@clktmr

clktmr commented Jun 20, 2024

It was my fault. I'm running on a custom OS and didn't initialize CPU correctly in src/internal/cpu/. Tested again and everything is fine.

Sorry for the noise!

aykevl added a commit to tinygo-org/tinygo that referenced this issue Aug 22, 2024
This should widen compatibility a bit, so that older CPUs can also
execute programs built by TinyGo. The performance may be lower, if
that's an issue we can look into implementing the proposal here:
golang/go#60072

This still wouldn't make programs usable on MIPS II CPUs. I suppose we
can lower compatibility down to that CPU if needed.

I tried setting the -cpu flag in the QEMU command line to be able to
test this, but it looks like there are no QEMU CPU models that are
mips32r1 and have a FPU. So it's difficult to test this.
deadprogram pushed a commit to tinygo-org/tinygo that referenced this issue Sep 5, 2024
This should widen compatibility a bit, so that older CPUs can also
execute programs built by TinyGo. The performance may be lower, if
that's an issue we can look into implementing the proposal here:
golang/go#60072

This still wouldn't make programs usable on MIPS II CPUs. I suppose we
can lower compatibility down to that CPU if needed.

I tried setting the -cpu flag in the QEMU command line to be able to
test this, but it looks like there are no QEMU CPU models that are
mips32r1 and have a FPU. So it's difficult to test this.
Projects
Status: Accepted