Since #28, which fixed #27, another strange regression has appeared, dividing perf by 5:
From a March 23 build
$ ./build/gemm_f32_omp
A matrix shape: (M: 1920, N: 1920)
B matrix shape: (M: 1920, N: 1920)
Output shape: (M: 1920, N: 1920)
Required number of operations: 14155.776 millions
Required bytes: 29.491 MB
Arithmetic intensity: 480.000 FLOP/byte
Theoretical peak single-core: 224.000 GFLOP/s
Theoretical peak multi: 4032.000 GFLOP/s
Make sure to not bench Apple Accelerate or the default Linux BLAS.
Reference loop
Collected 10 samples in 10.421 seconds
Average time: 1041.539 ms
Stddev time: 3.983 ms
Min time: 1035.329 ms
Max time: 1047.674 ms
Perf: 13.591 GFLOP/s
OpenBLAS benchmark
Collected 10 samples in 0.091 seconds
Average time: 8.438 ms
Stddev time: 6.319 ms
Min time: 6.240 ms
Max time: 26.393 ms
Perf: 1677.596 GFLOP/s
Laser production implementation
Collected 10 samples in 0.087 seconds
Average time: 8.035 ms
Stddev time: 4.186 ms
Min time: 6.517 ms
Max time: 19.913 ms
Perf: 1761.855 GFLOP/s
PyTorch Glow: libjit matmul implementation (with AVX+FMA)
Collected 10 samples in 1.900 seconds
Average time: 189.987 ms
Stddev time: 2.893 ms
Min time: 188.794 ms
Max time: 198.044 ms
Perf: 74.509 GFLOP/s
MKL-DNN reference GEMM benchmark
Collected 10 samples in 0.368 seconds
Average time: 36.043 ms
Stddev time: 5.048 ms
Min time: 34.275 ms
Max time: 50.364 ms
Perf: 392.748 GFLOP/s
MKL-DNN JIT AVX benchmark
Collected 10 samples in 0.105 seconds
Average time: 9.758 ms
Stddev time: 5.933 ms
Min time: 7.715 ms
Max time: 26.624 ms
Perf: 1450.731 GFLOP/s
MKL-DNN JIT AVX512 benchmark
Collected 10 samples in 0.088 seconds
Average time: 8.154 ms
Stddev time: 10.128 ms
Min time: 4.733 ms
Max time: 36.938 ms
Perf: 1736.020 GFLOP/s
Mean Relative Error compared to vendor BLAS: 3.045843413929106e-06
From a recent rebuild
$ ./build/gemm_omp_f32
A matrix shape: (M: 1920, N: 1920)
B matrix shape: (M: 1920, N: 1920)
Output shape: (M: 1920, N: 1920)
Required number of operations: 14155.776 millions
Required bytes: 29.491 MB
Arithmetic intensity: 480.000 FLOP/byte
Theoretical peak single-core: 224.000 GFLOP/s
Theoretical peak multi: 4032.000 GFLOP/s
Make sure to not bench Apple Accelerate or the default Linux BLAS.
Laser production implementation
Collected 10 samples in 0.555 seconds
Average time: 54.917 ms
Stddev time: 5.027 ms
Min time: 53.250 ms
Max time: 69.218 ms
Perf: 257.765 GFLOP/s
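For anyone wanting to double-check the figures in the two logs, here is a minimal Python sketch that recomputes them from the matrix shape and the reported average times. The byte-count formula (A and B inputs only, 4 bytes per f32) is an assumption inferred from the reported 29.491 MB, not taken from the benchmark source, and the variable and function names are hypothetical. Computed from the average times, the slowdown of the Laser implementation comes out at roughly 6.8x.

```python
# Sanity check of the numbers printed in the two benchmark logs.
M = N = K = 1920      # square matrices, from the "matrix shape" lines
BYTES_PER_F32 = 4

# One multiply and one add per inner-loop step.
flops = 2 * M * N * K

# Assumption: only the two input matrices are counted,
# which matches the reported "Required bytes: 29.491 MB".
bytes_moved = (M * K + K * N) * BYTES_PER_F32

intensity = flops / bytes_moved  # FLOP/byte


def gflops(avg_ms):
    """Throughput in GFLOP/s for a given average time in milliseconds."""
    return flops / (avg_ms * 1e-3) / 1e9


march = gflops(8.035)     # Laser, March 23 build
recent = gflops(54.917)   # Laser, recent rebuild
slowdown = 54.917 / 8.035

print(f"operations: {flops / 1e6:.3f} millions")
print(f"bytes:      {bytes_moved / 1e6:.3f} MB")
print(f"intensity:  {intensity:.3f} FLOP/byte")
print(f"March 23:   {march:.1f} GFLOP/s")
print(f"recent:     {recent:.1f} GFLOP/s")
print(f"slowdown:   {slowdown:.2f}x")
```

The recomputed throughput differs slightly from the logged "Perf" values because the logged average times are rounded to three decimals.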