I am running the COSMA miniapp on a 72-core Xeon machine with the following parameters:

$ parallel_cosma -m 8688 -n 8688 -k 8688 -r 3

The last line of the stdout reads:

COSMA TIMES [ms] = 458 460 771
I am curious about the large spread between the fastest and slowest multiplication. The fastest time corresponds to about 40 GFLOP/s per core, which is a good number for this machine, while the slowest implies only about 23 GFLOP/s per core.
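As a sanity check on those figures, here is a minimal sketch that recomputes the per-core rates from the reported times, assuming the usual 2·m·n·k flop count for a general matrix multiplication and the 72 cores mentioned above:

```cpp
// Recompute per-core GFLOP/s from the COSMA TIMES reported above,
// assuming 2*m*n*k floating-point operations for the multiplication.
#include <cstdio>

int main() {
    const double m = 8688, n = 8688, k = 8688;
    const double flops = 2.0 * m * n * k;       // ~1.31e12 FLOPs
    const int cores = 72;
    const double times_ms[] = {458, 460, 771};  // reported run times
    for (double t : times_ms) {
        double gflops_per_core = flops / (t * 1e-3) / cores / 1e9;
        std::printf("%4.0f ms -> %.1f GFLOP/s per core\n", t, gflops_per_core);
    }
    return 0;
}
```

This prints roughly 39.8, 39.6, and 23.6 GFLOP/s per core, consistent with the 40 vs. 23 figures quoted.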
Am I right that there is a ~300 ms overhead for finding the optimal "parallelization strategy"? Which of the two numbers would be fair to compare against other libraries such as ScaLAPACK?
I am aware that this is an extreme example, but a spread of 10-20% between the fastest and slowest runs is very typical.
> Am I right that there is a 300 ms overhead finding the optimal "parallelization strategy"?
No, the overhead is very likely due to library initialization during the first run of the miniapp.
Multithreaded MKL is usually the library that introduces the most overhead, since it has to initialize the OpenMP environment and allocate some memory during the first library calls. On some systems, MPI also adds overhead during the first communications.
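For a fair comparison against other libraries, the usual approach is to perform one untimed warm-up call so that these one-time initialization costs are excluded. A minimal sketch of that pattern is below; `dummy_multiply()` is a hypothetical placeholder for the multiplication under test, not the COSMA API:

```cpp
// Sketch of the usual warm-up pattern for benchmarking: the first call pays
// one-time costs (OpenMP thread setup in MKL, first MPI communications,
// buffer allocation), so it is run once and excluded from the timed repetitions.
// dummy_multiply() is a placeholder workload, not the COSMA API.
#include <chrono>
#include <cstdio>

static double sink = 0.0;  // prevents the compiler from removing the dummy work

void dummy_multiply() {
    // Stand-in for the multiplication under test.
    double acc = 0.0;
    for (int i = 0; i < 10000000; ++i) acc += static_cast<double>(i) * 1e-9;
    sink += acc;
}

int main() {
    dummy_multiply();  // warm-up run: absorbs first-call initialization overhead

    for (int rep = 0; rep < 3; ++rep) {
        auto start = std::chrono::steady_clock::now();
        dummy_multiply();
        auto stop = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        std::printf("rep %d: %.1f ms\n", rep, ms);
    }
    return 0;
}
```

With a warm-up in place, the remaining spread between repetitions reflects run-to-run variation rather than initialization cost.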