unexpected performance when using COSMA with GPU (single node) #94
Great that it works now! Can you also check without overlapping communication and computation? And can you try some larger matrix sizes, e.g. 32k or so? The 16k case can basically be run on a single rank with a single GPU. I will check this test case on our system and then we will see.
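A back-of-the-envelope memory estimate shows why 16k fits comfortably on one GPU and why larger sizes put much more pressure on a single device. This is a sketch under stated assumptions: square double-precision matrices, "16k" and "32k" taken to mean 16384 and 32768, and only A, B and C counted. COSMA's own communication and GPU buffers are ignored, so real usage is higher. The helper `matmul_footprint_gib` is hypothetical.

```python
# Rough memory estimate for C = A @ B with square N x N matrices.
# Assumptions (illustrative only): double precision (8 bytes per element),
# only A, B and C are counted; COSMA's internal buffers are ignored.

BYTES_PER_DOUBLE = 8


def matmul_footprint_gib(n: int, bytes_per_elem: int = BYTES_PER_DOUBLE) -> float:
    """Memory needed for A, B and C (each n x n), in GiB."""
    return 3 * n * n * bytes_per_elem / 2**30


if __name__ == "__main__":
    for n in (16_384, 32_768):
        print(f"N = {n}: {matmul_footprint_gib(n):.1f} GiB")
```

Doubling N quadruples the footprint: about 6 GiB for 16k versus about 24 GiB for 32k.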
I see slightly better performance without overlap. At matrix size 30000 without overlap I see:
I'm going to close this since things are working as expected for me now.
Hi Rohan, what was the main problem? Was it also related to the limited memory we discussed yesterday? Also, did you check adding just … Thanks for your feedback!
The main problem at the time, IIRC, was that I was strong scaling instead of weak scaling, and on relatively small problem sizes.
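The difference between the two modes can be made concrete. In strong scaling the matrix size stays fixed as GPUs are added; in weak scaling it grows so each GPU keeps the same amount of work. A minimal sketch, assuming square N×N matrices and a hypothetical helper `weak_scaled_size`: a matrix multiplication performs about 2N³ flops and stores 3N² elements, so holding flops per GPU constant means N grows like p^(1/3), while holding memory per GPU constant means N grows like √p.

```python
import math


def weak_scaled_size(n1: int, p: int, keep: str = "flops") -> int:
    """Hypothetical helper: square matrix size for p GPUs that keeps
    per-GPU work constant, starting from size n1 on 1 GPU."""
    if keep == "flops":
        # ~2*N^3 flops in total -> N scales with p^(1/3)
        return round(n1 * p ** (1 / 3))
    if keep == "memory":
        # 3*N^2 elements for A, B, C -> N scales with sqrt(p)
        return round(n1 * math.sqrt(p))
    raise ValueError(f"unknown quantity to keep constant: {keep!r}")


if __name__ == "__main__":
    for p in (1, 2, 4):
        print(p, weak_scaled_size(16_384, p, "flops"),
              weak_scaled_size(16_384, p, "memory"))
```

Holding memory per GPU constant grows N faster, so in that mode each GPU's flop count still rises (as √p) when GPUs are added.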
I'm testing COSMA with GPU support on a single node with multiple GPUs, and I'm not seeing the performance I would expect.
| GPUs | COSMA TIMES [ms] |
|------|------------------|
| 1 | 1562 1657 2133 2390 6865 |
| 2 | 1544 2710 3374 3626 6060 |
| 4 | 805 832 1456 3142 6419 |
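One way to read these timings is to compute strong-scaling speedup and parallel efficiency relative to 1 GPU. The sketch below summarizes each configuration by its fastest of the five reported runs; that choice is an assumption, and the median would be another reasonable summary given the large run-to-run spread. The function name `strong_scaling` is hypothetical.

```python
# COSMA TIMES [ms] as reported for 1, 2 and 4 GPUs.
times_ms = {
    1: [1562, 1657, 2133, 2390, 6865],
    2: [1544, 2710, 3374, 3626, 6060],
    4: [805, 832, 1456, 3142, 6419],
}


def strong_scaling(times: dict[int, list[int]]) -> dict[int, tuple[float, float]]:
    """Return {gpus: (speedup, efficiency)} relative to the 1-GPU best run."""
    base = min(times[1])
    result = {}
    for gpus, runs in times.items():
        speedup = base / min(runs)  # fastest run used as the summary (assumption)
        result[gpus] = (speedup, speedup / gpus)
    return result


if __name__ == "__main__":
    for gpus, (s, e) in strong_scaling(times_ms).items():
        print(f"{gpus} GPU(s): speedup {s:.2f}, efficiency {e:.0%}")
```

With the best-run numbers, 2 GPUs give a speedup of about 1.01 (about 51% efficiency) and 4 GPUs about 1.94 (about 49% efficiency) at this fixed problem size.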
I expect to:
I'm on the current master and am running the miniapp with the following (`-n` and `-r` control how many ranks run on a node):
I build COSMA with: