[OpenMP] excessive power consumption for waiting threads #78485

@marioroy

Description

Re-posting from https://forums.developer.nvidia.com/t/openmp-excessive-power-consumption-for-waiting-threads/279272

"The OpenMP power consumption test uses the -p argument to primes1 or primes3, which produces ordered output: one thread writes output at a time while the other threads wait their turn, in order. I expect the waiting threads to be idle or to use little CPU. That is not the case: I see the full 6400% CPU utilization (AMD Threadripper 3970X, 64 logical CPU threads) just for printing prime numbers to /dev/null. That is nothing like GNU GCC, which uses just 173% for the same test."
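The ordered-output pattern described above can be sketched as follows. This is a hypothetical illustration, not the actual primes1.c code: `is_prime` uses plain trial division and `count_primes_ordered` is a made-up name. Each iteration tests its number in parallel, but the `#pragma omp ordered` block runs one iteration at a time in loop order, so the other threads sit waiting for their turn at that point.

```c
#include <stdio.h>

/* Hypothetical helper: simple trial division, not the sieve used by primes1/primes3. */
static int is_prime(unsigned long n)
{
    if (n < 2) return 0;
    if (n % 2 == 0) return n == 2;
    for (unsigned long d = 3; d * d <= n; d += 2)
        if (n % d == 0) return 0;
    return 1;
}

/* Count primes in [2, limit]; if out != NULL, also print them in ascending order.
   The primality test runs in parallel, while the ordered block lets exactly one
   thread at a time (in iteration order) write its result; the rest wait.
   schedule(static, 1) deals iterations round-robin so threads take turns. */
unsigned long count_primes_ordered(unsigned long limit, FILE *out)
{
    unsigned long count = 0;

    #pragma omp parallel for ordered schedule(static, 1) reduction(+:count)
    for (long i = 2; i <= (long)limit; i++) {
        int p = is_prime((unsigned long)i);
        count += (unsigned long)p;

        #pragma omp ordered
        {
            if (p && out != NULL)
                fprintf(out, "%ld\n", i);
        }
    }
    return count;
}
```

Build it with `-fopenmp` (or `-mp=multicore` for nvc) to get the parallel behavior; without those flags the pragmas are ignored and the loop runs serially with the same output.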

I also see near-6400% CPU utilization with clang during the ordered-output power consumption test.
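The utilization figures here are summed across threads, so 64 fully busy logical CPUs show up as roughly 6400%. A minimal sketch of taking the same measurement from inside a program, assuming a POSIX system where `clock_gettime` with `CLOCK_PROCESS_CPUTIME_ID` reports the CPU time of all threads in the process; the helper names `cpu_utilization_percent` and `measure_utilization` are hypothetical:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX clocks are available */
#endif
#include <time.h>

/* CPU utilization as shown by tools like top: total CPU seconds used by all
   threads divided by elapsed wall-clock seconds, times 100. */
double cpu_utilization_percent(double cpu_seconds, double wall_seconds)
{
    return wall_seconds > 0.0 ? 100.0 * cpu_seconds / wall_seconds : 0.0;
}

/* Read a clock as floating-point seconds. */
static double seconds_on(clockid_t clock)
{
    struct timespec ts;
    clock_gettime(clock, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run fn(arg) and return the process-wide CPU utilization during the call.
   Spinning threads count as busy, so a loop whose waiting threads spin
   reports close to 100% per thread even when little useful work is done. */
double measure_utilization(void (*fn)(void *), void *arg)
{
    double cpu0 = seconds_on(CLOCK_PROCESS_CPUTIME_ID);
    double wall0 = seconds_on(CLOCK_MONOTONIC);
    fn(arg);
    double cpu1 = seconds_on(CLOCK_PROCESS_CPUTIME_ID);
    double wall1 = seconds_on(CLOCK_MONOTONIC);
    return cpu_utilization_percent(cpu1 - cpu0, wall1 - wall0);
}
```

Wrapping an ordered-output loop in `measure_utilization` and comparing runs with the standard `OMP_WAIT_POLICY` environment variable set to `active` or `passive` is one way to check whether waiting threads spin; how each runtime applies that hint to ordered waits is not established here.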

Prime Demos

gcc -o primes1.gcc -O3 -fopenmp -I../src primes1.c -lm
clang -o primes1.clang -O3 -fopenmp -I../src primes1.c -lm
nvc -o primes1.nvc -O3 -mp=multicore -I../src primes1.c -lm

gcc -o primes3.gcc -O3 -fopenmp -I../src primes3.c -L/usr/local/lib64 -lprimesieve -lm
clang -o primes3.clang -O3 -fopenmp -I../src primes3.c -L/usr/local/lib64 -lprimesieve -lm
nvc -o primes3.nvc -O3 -mp=multicore -I../src primes3.c -L/usr/local/lib64 -lprimesieve -lm

OpenMP Ordered Power Consumption Test

Threadripper 3970X idle (browser NV forums page)  120 watts

./primes1.gcc   1e10 -p >/dev/null   10.173 secs, 201 watts
./primes1.clang 1e10 -p >/dev/null   12.729 secs, 288 watts
./primes1.nvc   1e10 -p >/dev/null   21.346 secs, 322 watts

./primes3.gcc   1e10 -p >/dev/null    7.092 secs, 181 watts
./primes3.clang 1e10 -p >/dev/null    8.876 secs, 274 watts
./primes3.nvc   1e10 -p >/dev/null   11.080 secs, 361 watts
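Because the slower builds also run longer, comparing watts alone leaves out part of the difference. A rough way to compare the runs is the energy drawn above the idle baseline, (watts − idle watts) × seconds. The sketch below assumes the quoted watts are the average whole-system draw over each run and that the 120 W idle reading is the right baseline; neither assumption is stated with the measurements, and `struct run` / `energy_above_idle` are hypothetical names.

```c
#include <stdio.h>

/* One row of the measurement table above. */
struct run {
    const char *binary;
    double seconds;  /* wall-clock time of the run */
    double watts;    /* assumed: average system draw during the run */
};

/* Energy in joules drawn above the idle baseline during one run. */
double energy_above_idle(const struct run *r, double idle_watts)
{
    return (r->watts - idle_watts) * r->seconds;
}

/* Print the estimate for each run. */
void print_energy(const struct run *runs, size_t n, double idle_watts)
{
    for (size_t i = 0; i < n; i++)
        printf("%-16s %8.1f J\n", runs[i].binary,
               energy_above_idle(&runs[i], idle_watts));
}
```

With the primes1 figures above and a 120 W baseline, this arithmetic gives roughly 824 J for gcc, 2138 J for clang and 4312 J for nvc.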

OpenMP Performance Test

Threadripper 3970X idle (browser NV forums page)  120 watts

./primes1.gcc   1e12                 16.168 secs, 399 watts
./primes1.clang 1e12                 16.274 secs, 395 watts
./primes1.nvc   1e12                 14.780 secs, 393 watts

./primes3.gcc   1e12                  5.762 secs, 437 watts
./primes3.clang 1e12                  6.277 secs, 434 watts
./primes3.nvc   1e12                  5.755 secs, 442 watts

I first noticed the power consumption issue while using Codon.

exaloop/codon#456

Is it acceptable for waiting threads to spin on the CPU during ordered or exclusive blocks? I am concerned that cloud customers may be paying for extra power consumption simply because threads are waiting their turn. The Intel oneAPI compilers are also affected.
