Re-posting from https://forums.developer.nvidia.com/t/openmp-excessive-power-consumption-for-waiting-threads/279272
"The OpenMP power consumption test is with the -p argument to primes1 or primes3, which involves ordered output or one thread writing output at a time. Other threads wait their turn, orderly. I expect the waiting threads to be idle or to consume low CPU utilization. That is not the case: I am seeing full 6400% CPU utilization (AMD Threadripper 3970X, 64 logical CPU threads) for printing prime numbers to /dev/null. Nothing like GNU GCC, which consumes just 173% for the same test."
I also see near 6400% CPU utilization with clang in the power consumption test, during ordered output.
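For readers unfamiliar with the pattern, here is a minimal sketch of ordered output in OpenMP. It is not the actual primes1.c or primes3.c code: the trial-division helper is_prime and the function ordered_primes are hypothetical names used only for illustration, and the results go into a buffer where the real programs would print. The point is the ordered block, where each thread has to wait until all earlier iterations have passed through.

```c
/* Hypothetical helper: plain trial division, not the sieve used by primes1/primes3. */
static int is_prime(int n)
{
    if (n < 2) return 0;
    for (int d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

/* Hypothetical illustration: stores the primes below `limit` into `out`
 * in ascending order and returns how many were stored. */
static int ordered_primes(int limit, int *out)
{
    int count = 0;

    #pragma omp parallel for ordered schedule(static, 1)
    for (int n = 0; n < limit; n++) {
        int p = is_prime(n);        /* this part runs in parallel */

        #pragma omp ordered
        {
            /* One thread at a time, in iteration order. Every other thread
             * that reaches this point waits here for its turn. */
            if (p) out[count++] = n;
        }
    }
    return count;
}
```

How a thread waits at the ordered block (spinning versus sleeping) is decided by the OpenMP runtime, not by this source code, which is why the same program can show very different CPU utilization depending on the compiler.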
gcc -o primes1.gcc -O3 -fopenmp -I../src primes1.c -lm
clang -o primes1.clang -O3 -fopenmp -I../src primes1.c -lm
nvc -o primes1.nvc -O3 -mp=multicore -I../src primes1.c -lm
gcc -o primes3.gcc -O3 -fopenmp -I../src primes3.c -L/usr/local/lib64 -lprimesieve -lm
clang -o primes3.clang -O3 -fopenmp -I../src primes3.c -L/usr/local/lib64 -lprimesieve -lm
nvc -o primes3.nvc -O3 -mp=multicore -I../src primes3.c -L/usr/local/lib64 -lprimesieve -lm
OpenMP Ordered Power Consumption Test
Threadripper 3970X idle (browser NV forums page) 120 watts
./primes1.gcc 1e10 -p >/dev/null 10.173 secs, 201 watts
./primes1.clang 1e10 -p >/dev/null 12.729 secs, 288 watts
./primes1.nvc 1e10 -p >/dev/null 21.346 secs, 322 watts
./primes3.gcc 1e10 -p >/dev/null 7.092 secs, 181 watts
./primes3.clang 1e10 -p >/dev/null 8.876 secs, 274 watts
./primes3.nvc 1e10 -p >/dev/null 11.080 secs, 361 watts
OpenMP Performance Test
Threadripper 3970X idle (browser NV forums page) 120 watts
./primes1.gcc 1e12 16.168 secs, 399 watts
./primes1.clang 1e12 16.274 secs, 395 watts
./primes1.nvc 1e12 14.780 secs, 393 watts
./primes3.gcc 1e12 5.762 secs, 437 watts
./primes3.clang 1e12 6.277 secs, 434 watts
./primes3.nvc 1e12 5.755 secs, 442 watts
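To compare runs that differ in both duration and power draw, the figures can be turned into energy per run. This is a small sketch under one assumption that the measurements above do not state: that each watt reading is an average over the whole run. The function names are hypothetical.

```c
/* Energy drawn over a run: average power (W) times elapsed time (s), in joules.
 * Assumes the watt figure is an average over the whole run. */
static double energy_joules(double secs, double watts)
{
    return secs * watts;
}

/* Same, after subtracting the idle baseline of the machine. */
static double energy_above_idle(double secs, double watts, double idle_watts)
{
    return secs * (watts - idle_watts);
}
```

Under that assumption, the ordered test for primes1.gcc (10.173 secs, 201 watts) comes to roughly 2.04 kJ, while primes1.nvc (21.346 secs, 322 watts) comes to roughly 6.87 kJ for the same output.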
I first witnessed the power consumption issue using Codon.
Is it acceptable for waiting threads to spin on the CPU during ordered or exclusive blocks? I wonder whether cloud customers may be paying for extra power consumption simply because threads are waiting their turn. The Intel oneAPI compilers are also affected.