TEST/MPI: Added MPI+CUDA example. #10601
Conversation
@rakhmets Overall the tests stress multi-GPU support in a good way.
I see the following not being tested:
- The case where one thread has a device context bound to it and allocates device memory, while another thread later issues the MPI operations with the allocated memory.
- Using the cudaSetDevice/cudaDeviceReset runtime API instead of explicitly using ctxRetain/Release. This is the API more commonly exercised by high-level applications, so it would be good to ensure that no cases break by testing only the driver API.
The above two cases are supported by multi-GPU support, right? If so, will they be tested in separate tests?
I will add a separate test using the CUDA Runtime API (probably in another PR), and I will include the test case described in the first bullet in that new test.
contrib/test_jenkins.sh (Outdated)

```shell
export LD_LIBRARY_PATH=${ucx_inst}/lib:${MPI_HOME}/lib:${prev_LD_LIBRARY_PATH}

build release --disable-gtest --with-mpi
build release-mt --with-mpi
```
I'd include asserts.
Added --enable-assertions
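For reference, an assertion-enabled build could look like the fragment below. This is a hypothetical invocation, not the PR's actual CI change: the `contrib/configure-release` wrapper and `--prefix` value are assumptions, while `--enable-assertions` and `--with-mpi` come from this thread.

```shell
# Hypothetical: configure UCX with assertions and MPI support enabled.
./contrib/configure-release --prefix=$PWD/install \
    --enable-assertions --with-mpi
make -j4 && make install
```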
test/mpi/test_mpi_cuda.c (Outdated)

```c
    if (__err != CUDA_SUCCESS) { \
        const char *__err_str; \
        cuGetErrorString(__err, &__err_str); \
        fprintf(stderr, "test_mpi_cuda.c:%-3u %s failed: %d (%s)\n", \
```
can you use __FILE__ instead?
It will print the full path, which is not needed here. And I don't want to add code to fetch the filename from __FILE__, since it overcomplicates simple code.
Will it be just something like below?

```c
const char *filename = strrchr(__FILE__, '/');
filename = filename ? filename + 1 : __FILE__;
```

IMO it is better, since the code would not need to change if the file is renamed for some reason. Another alternative is to not print a file name at all, because it is a single-file test anyway.
Removed.
test/mpi/test_mpi_cuda.c (Outdated)

```c
    CUdeviceptr d_send;
    CUdeviceptr d_recv;
```
minor: maybe use send_ptr and recv_ptr or use some other consistent names for CUdeviceptr variables
Renamed the fields.
test/mpi/test_mpi_cuda.c (Outdated)

```c
    alloc_mem_send = allocator_send->alloc(size);
    alloc_mem_recv = allocator_recv->alloc(size);

    cuda_memcpy((void*)alloc_mem_send.ptr, gold_data, size);
```
Maybe it is worth memsetting the recv buffer to zeros or something.
Done.
test/mpi/test_mpi_cuda.c
Outdated
|
|
||
| cuda_memcpy((void*)alloc_mem_send.ptr, gold_data, size); | ||
|
|
||
| CUDA_CALL(cuCtxPopCurrent(&primary_ctx)); |
can we add a check that there is no other context set currently?
Added.
test/mpi/test_mpi_cuda.c
Outdated
| CUDA_CALL(cuDeviceGetCount(&dev_count)); | ||
|
|
||
| for (i = 0; i < dev_count; ++i) { | ||
| CUDA_CALL(cuDeviceGet(&cu_dev_alloc, (i + rank) % dev_count)); |
maybe iterate over all possible combinations instead?
Left only one combination, since the test allocates on one GPU and sends using another. Iterating through all GPUs doesn't improve coverage.
test/mpi/test_mpi_cuda.c
Outdated
| if (__err != CUDA_SUCCESS) { \ | ||
| const char *__err_str; \ | ||
| cuGetErrorString(__err, &__err_str); \ | ||
| fprintf(stderr, "test_mpi_cuda.c:%-3u %s failed: %d (%s)\n", \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will it be just something like below?
const char *filename = strrchr(__FILE__, '/');
filename = filename ? filename + 1 : __FILE__;
imo, it is better, so no need to change the code if file is renamed for some reason. Another alternative is just do not print a file name, because it is anyway a single file test
@rakhmets, can you please squash?
What?
Added MPI+CUDA example.