Multi-GPU staggered Dslash hanging on 8/16 GPUs #100
Comments
When going from 4 to 8 GPUs, was there anything else that changed? I am curious if the 4 to 8 GPU transition is when another dimension is partitioned.
I had partitioned the z and t directions.
Ok, it looks like the partitioning isn't the issue then. I'll take a look later today.
Had the same problem for multi-node execution. Single node (i.e., 2 GPUs in my case) seemed to be ok.
This problem is not to do with the number of nodes; rather, it only seems to occur when the grid size is 4 or greater.
I just noticed this problem on Blue Waters yesterday when I was testing the MPI build. staggered_dslash_test and staggered_invert_test run fine on 4 GPUs, but hang in tests involving 8 and 16 GPUs. The bug was introduced in one of the commits of November 27 and 28. The code in the master branch worked fine before that. Blue Waters is down for maintenance today, but I will check whether the same problem occurs in the QMP build once it's back up.
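The partitioning hypothesis discussed above can be illustrated with a small standalone sketch (this is not QUDA code): for a given 4-D process grid, any dimension whose grid extent is greater than 1 is partitioned, so going from 4 GPUs (e.g., a 1x1x2x2 grid, matching the z/t partitioning mentioned above) to 8 or 16 GPUs necessarily partitions additional dimensions. The specific 8- and 16-GPU layouts below are assumptions chosen for illustration; the actual runs may have used different grid shapes.

```c
/* Minimal illustration (not QUDA code): given a 4-D process-grid layout,
 * report which lattice dimensions are partitioned (grid extent > 1).
 * The example grids for 4, 8, and 16 GPUs are assumptions chosen to
 * match the discussion above; actual runs may use a different layout. */
#include <stdio.h>

static void report_partitioning(const int grid[4], const char *label)
{
    const char dim_name[4] = {'x', 'y', 'z', 't'};
    int ngpus = grid[0] * grid[1] * grid[2] * grid[3];

    printf("%s (%d GPUs): partitioned dims:", label, ngpus);
    for (int d = 0; d < 4; d++)
        if (grid[d] > 1) printf(" %c", dim_name[d]);
    printf("\n");
}

int main(void)
{
    int grid4[4]  = {1, 1, 2, 2};  /* 4 GPUs: z and t partitioned    */
    int grid8[4]  = {1, 2, 2, 2};  /* 8 GPUs: y partitioned as well  */
    int grid16[4] = {2, 2, 2, 2};  /* 16 GPUs: all four partitioned  */

    report_partitioning(grid4,  "4-GPU grid");
    report_partitioning(grid8,  "8-GPU grid");
    report_partitioning(grid16, "16-GPU grid");
    return 0;
}
```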