unexpected performance when using COSMA with GPU (single node) #94

rohany · 2021-06-21T18:09:46Z

I'm testing out COSMA with GPU support on a single node with GPUs, and I'm not seeing the performance I would expect.

1 GPU:
COSMA TIMES [ms] = 1562 1657 2133 2390 6865
2 GPU:
COSMA TIMES [ms] = 1544 2710 3374 3626 6060
4 GPU:
COSMA TIMES [ms] = 805 832 1456 3142 6419

I expect to:

See some difference in runtime going from 1 -> 2 GPUs
See somewhat stable performance; the difference between the min and max is quite high.
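Both expectations can be checked directly against the numbers reported above. The following minimal sketch copies the per-repetition timings from this report. For each GPU count it computes the min, median, max, the max/min spread, and a median-based speedup relative to 1 GPU. Using the median as the summary statistic is an assumption made here to damp the large outliers.

```python
from statistics import median

# Per-repetition times [ms] copied from the report above
# (run with COSMA_OVERLAP_COMM_AND_COMP=ON).
times_ms = {
    1: [1562, 1657, 2133, 2390, 6865],
    2: [1544, 2710, 3374, 3626, 6060],
    4: [805, 832, 1456, 3142, 6419],
}

def summarize(times):
    """Return (min, median, max, max/min spread) for a list of timings."""
    lo, hi = min(times), max(times)
    return lo, median(times), hi, hi / lo

baseline = median(times_ms[1])
for gpus, ts in sorted(times_ms.items()):
    lo, med, hi, spread = summarize(ts)
    print(f"{gpus} GPU(s): min={lo} median={med} max={hi} "
          f"spread={spread:.1f}x speedup(median)={baseline / med:.2f}x")
```

With these numbers, 2 GPUs are slower than 1 GPU on the median, and the spread on 4 GPUs is roughly 8x. This matches both concerns raised above.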

I'm on the current master and running the miniapp as follows (the jsrun flags -n and -r set how many ranks run on the node):

OMP_NUM_THREADS=6 COSMA_OVERLAP_COMM_AND_COMP=ON jsrun -n 4 -c 6 -g 1 -b none -r 4 ./miniapp/cosma_miniapp -m 16384 -n 16384 -k 16384 -r 5
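The command uses `-r` twice with different meanings. For `jsrun`, it is the number of ranks (resource sets) per host. For `cosma_miniapp`, it is the number of repetitions, which is why five timings are printed. When repeating this scaling study, generating the command is less error-prone than hand-editing it. The sketch below is hypothetical (the helper `miniapp_command` is not part of COSMA). It reuses only the flags from the command above. It assumes that "without overlap" means leaving `COSMA_OVERLAP_COMM_AND_COMP` unset.

```python
def miniapp_command(ranks, size, overlap=True, threads=6, reps=5):
    """Hypothetical helper: rebuild the jsrun command from this issue for
    `ranks` ranks (one GPU each) and a square problem of dimension `size`.
    Every flag is copied from the issue; only the values vary."""
    env = [f"OMP_NUM_THREADS={threads}"]
    if overlap:
        # Assumption: "no overlap" = environment variable left unset.
        env.append("COSMA_OVERLAP_COMM_AND_COMP=ON")
    # jsrun: -r = ranks per host (all ranks on one node here).
    launcher = ["jsrun", "-n", str(ranks), "-c", str(threads), "-g", "1",
                "-b", "none", "-r", str(ranks)]
    # miniapp: -r = number of repetitions (one timing per repetition).
    app = ["./miniapp/cosma_miniapp", "-m", str(size), "-n", str(size),
           "-k", str(size), "-r", str(reps)]
    return " ".join(env + launcher + app)

for ranks in (1, 2, 4):
    print(miniapp_command(ranks, 16384))
```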

I build COSMA with:

cmake -DCOSMA_BLAS=CUDA -DCMAKE_INSTALL_PREFIX=../ ..


kabicm · 2021-06-21T18:53:16Z

Great that it works now!

Can you also check without overlapping communication and computation? And can you try some larger matrix sizes, e.g. 32k or so? Basically, the 16k case can already run on a single rank with a single GPU.
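The claim that a 16k problem fits on one GPU can be checked with back-of-the-envelope arithmetic. The sketch below assumes double precision (8 bytes per element). It counts only the three n x n operands A, B and C and ignores any extra buffers COSMA may allocate. Whether the result fits depends on the GPU's memory size, which the thread does not state.

```python
# Rough memory footprint of C = A * B for square n x n matrices.
# Assumptions: double precision, and only A, B and C are counted.
BYTES_PER_ELEMENT = 8  # assumption: double precision

def gemm_footprint_gib(n, bytes_per_element=BYTES_PER_ELEMENT):
    """GiB needed to hold A, B and C (each n x n) at the same time."""
    return 3 * n * n * bytes_per_element / 2**30

for n in (16384, 32768):
    print(f"n={n}: {gemm_footprint_gib(n):.1f} GiB for A, B and C")
```

At 16384, the operands total 6 GiB. Doubling the dimension to 32768 quadruples this to 24 GiB, which is more likely to require splitting the problem across GPUs.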

I will check this test case on our system, and then we will see.

rohany · 2021-06-21T19:10:25Z

I see slightly better performance without overlap:
1 GPU:
COSMA TIMES [ms] = 1996 2453 3166 4126 4613
2 GPU:
COSMA TIMES [ms] = 1934 2348 2505 3602 5486
4 GPU:
COSMA TIMES [ms] = 1041 1370 1530 1584 1905

At matrix size 30000 without overlap, I see:
1 GPU:
COSMA TIMES [ms] = 8987 9811 9840 13445 13833
2 GPU:
COSMA TIMES [ms] = 7181 7182 7227 7839 9772
4 GPU:
COSMA TIMES [ms] = 4282 4327 6609 8345 23970
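For the 30000 case, the strong-scaling speedup and parallel efficiency relative to 1 GPU can be computed as follows. This sketch uses the minimum time per configuration, copied from the comment above. Choosing the minimum is an assumption here; it is the statistic least affected by the large run-to-run variance (e.g. the 23970 ms outlier on 4 GPUs).

```python
# Minimum times [ms] at size 30000 without overlap, copied from the comment above.
best_ms = {1: 8987, 2: 7181, 4: 4282}

def scaling(best):
    """Strong-scaling speedup t1/tp and efficiency t1/(p*tp) for each GPU count p."""
    t1 = best[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in sorted(best.items())}

for p, (speedup, eff) in scaling(best_ms).items():
    print(f"{p} GPU(s): speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

These numbers give roughly 1.25x on 2 GPUs and 2.10x on 4 GPUs. The corresponding efficiencies are about 63% and 52%. This is consistent with a fixed-size problem that is too small to keep more GPUs busy.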

rohany · 2021-09-28T16:30:36Z

I'm going to close this since things are working as expected for me now.

kabicm · 2021-09-28T16:35:36Z

Hi Rohan,

What was the main problem? Was it also related to the limited memory that we discussed yesterday?

Also, did you try adding just -s "sm2" or -s "sm2,sn2" instead of splitting all three dimensions up front?

Thanks for your feedback!

rohany · 2021-09-28T16:41:43Z

If I recall correctly, the main problem was that I was strong scaling instead of weak scaling, and on relatively small problem sizes.
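In weak scaling, the problem grows with the GPU count so that each GPU keeps the same load. The thread does not specify which quantity to hold constant. This sketch assumes square matrices and offers two choices. The first holds the 2n^3 GEMM work per GPU constant, so n grows as P^(1/3). The second holds the 3n^2 operand storage per GPU constant, so n grows as P^(1/2). The helper `weak_scaling_size` is hypothetical.

```python
def weak_scaling_size(n1, gpus, keep="flops"):
    """Hypothetical helper: square dimension for `gpus` GPUs, given size n1 on 1 GPU.

    keep="flops":  hold 2*n**3 work per GPU constant  -> n grows as gpus**(1/3)
    keep="memory": hold 3*n**2 storage per GPU constant -> n grows as gpus**(1/2)
    """
    exponent = {"flops": 1 / 3, "memory": 1 / 2}[keep]
    return round(n1 * gpus ** exponent)

for gpus in (1, 2, 4):
    print(gpus, weak_scaling_size(16384, gpus), weak_scaling_size(16384, gpus, "memory"))
```

Starting from 16384 on one GPU, constant memory per GPU gives 32768 on 4 GPUs. Constant work per GPU grows the size more slowly, to about 26000.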

rohany closed this as completed Sep 28, 2021
