merge updates #3
Conversation
@robertmaynard can we get some review on the cmake logic here, please?
Co-authored-by: Thejaswi. N. S <rao.thejaswi@gmail.com>
CMakeLists.txt
Outdated
@@ -7,41 +20,53 @@ include(rapids-cuda)
include(rapids-export)
include(rapids-find)

rapids_cuda_init_architectures(WHOLEGRAPH)

set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86)
We should go with rapids_cuda_init_architectures(WHOLEGRAPH) instead of the explicit value set. This will allow users to compile for a subset, and make it easier to support new architectures.
We now use some new CUDA features, like the memory consistency model and nanosleep, that are only supported on architecture 70 and newer, so we would like to require CUDA architectures >= 70. It seems that rapids-cmake supports ALL and NATIVE; can it restrict the values to >= 70?
As far as I am aware, all of RAPIDS needs to support sm_60, as that is still a major deployment target. The rapids-cmake ALL keyword maps to 60-real, 70-real, 75-real, ....
Either way, what you should do is call rapids_cuda_init_architectures(WHOLEGRAPH). This will allow the user to specify a value for CMAKE_CUDA_ARCHITECTURES. After the project call you can always iterate that value and produce an error if it contains an sm value that is too low.
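For reference, a validation along those lines might look roughly like this (a sketch, not the actual PR code; the >= 70 threshold follows the discussion above, and skipping keyword entries such as NATIVE is an assumption about values rapids-cmake expands later):
rapids_cuda_init_architectures(WHOLEGRAPH)
project(wholegraph CXX CUDA)
# Reject any explicitly listed architecture below sm_70.
foreach(arch IN LISTS CMAKE_CUDA_ARCHITECTURES)
  # Strip the -real / -virtual suffix so "70-real" compares as 70.
  string(REGEX REPLACE "-(real|virtual)$" "" arch_value "${arch}")
  # Skip keyword entries (e.g. NATIVE) that are not plain numbers.
  if(arch_value MATCHES "^[0-9]+$" AND arch_value LESS 70)
    message(FATAL_ERROR "wholegraph requires CUDA architectures >= 70, got ${arch}")
  endif()
endforeach()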
Do you mean I can use set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86) after rapids_cuda_init_architectures(WHOLEGRAPH) and the project call? Maybe like this?
rapids_cuda_init_architectures(WHOLEGRAPH)
project(wholegraph CXX CUDA)
set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86)
What I am saying is that you shouldn't overwrite what the user has specified. If a user wants to build only for the GPU on the local machine, they should be able to without changing any CMake code. They should be able to specify -DCMAKE_CUDA_ARCHITECTURES=86-real or -DCMAKE_CUDA_ARCHITECTURES=NATIVE.
Therefore what you should do is:
rapids_cuda_init_architectures(WHOLEGRAPH)
project(wholegraph CXX CUDA)
and have a C++ side check like:
#if __CUDA_ARCH__ < 700
#error "wholegraph doesn't support architectures ....."
#endif
Updated the cmake logic: set a default CMAKE_CUDA_ARCHITECTURES if the user doesn't specify it, and call rapids_cuda_init_architectures to support ALL and NATIVE:
if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
  set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86)
endif ()
rapids_cuda_init_architectures(WHOLEGRAPH)
project(wholegraph CXX CUDA)
Also updated the CUDA C++ code as suggested.
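For context, a guard like the one suggested only takes effect during device compilation, since __CUDA_ARCH__ is undefined in host passes; a minimal sketch of how such a check is commonly written (illustrative, not necessarily the exact code added in this PR):
// __CUDA_ARCH__ is only defined when compiling device code, so check it first
// to avoid tripping the error during host compilation passes.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 700)
#error "wholegraph requires compute capability 7.0 (sm_70) or newer"
#endif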
# Configure path to modules (for find_package)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${PROJECT_SOURCE_DIR}/cmake/modules/")
# enable assert in RelWithDebInfo build type
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g")
Why do you need to overwrite the default flag values for RELWITHDEBINFO?
I would like to remove -DNDEBUG in RELWITHDEBINFO to enable assert. Is there a better way to do this?
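One common alternative (a sketch, assuming the default flags already contain -DNDEBUG) is to strip just that definition instead of replacing the whole flag set:
# Remove only -DNDEBUG so asserts stay enabled while the default
# RelWithDebInfo optimization and debug-info flags are preserved.
string(REPLACE "-DNDEBUG" "" CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO}")
string(REPLACE "-DNDEBUG" "" CMAKE_CUDA_FLAGS_RELWITHDEBINFO "${CMAKE_CUDA_FLAGS_RELWITHDEBINFO}")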
test/CMakeLists.txt
Outdated
add_executable(whole_graph_sp_test whole_graph_sp_test.cu)
target_link_libraries(whole_graph_sp_test whole_graph)
add_executable(whole_memory_mp_test whole_memory_mp_test.cu)
target_link_libraries(whole_memory_mp_test whole_graph)
All the target_link_libraries calls should be updated to target_link_libraries(<target> PRIVATE whole_graph).
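Applied to the snippet above, the change would look something like this (sketch only):
add_executable(whole_graph_sp_test whole_graph_sp_test.cu)
target_link_libraries(whole_graph_sp_test PRIVATE whole_graph)
add_executable(whole_memory_mp_test whole_memory_mp_test.cu)
target_link_libraries(whole_memory_mp_test PRIVATE whole_graph)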
Updated.
target_compile_definitions(whole_graph PUBLIC -D_FILE_OFFSET_BITS=64)
if (${USE_CXX11_ABI})
  message(STATUS "Using CXX ABI = 1")
  target_compile_definitions(whole_graph PUBLIC -D_GLIBCXX_USE_CXX11_ABI=1)
Do these need to be public? Does whole_graph have a public API that includes C++ types?
Yes, whole_graph provides an API with C++ types.
Co-authored-by: Robert Maynard <robertjmaynard@gmail.com>
A couple of very minor nitpicks.
thread_local std::mt19937 gen(rd());
thread_local std::uniform_int_distribution<unsigned long long> distrib;
unsigned long long random_seed = distrib(gen);
WM_CUDA_CHECK(cudaStreamSynchronize(stream));
Do we need this stream sync?
It is not needed, removed, thanks!
char *ptr_to = (char *) to;
const char *ptr_from = (const char *) from;
for (int i = 0; i < DataSize; i++) {
  ptr_to[i] = ptr_from[i];
}
It's simpler to use the memcpy function instead.
Thanks for your suggestion! Yes, it would be simpler. However, DataSize here is a template parameter and should not be large in normal cases, so we would prefer to let the compiler optimize the loop rather than emit a device function call.
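To illustrate the trade-off being described: with DataSize known at compile time, the loop can be fully unrolled into a handful of loads and stores. The sketch below uses a hypothetical function name and signature, not the PR's actual code:
// Hypothetical sketch: DataSize is a compile-time constant, so the compiler
// can unroll this copy instead of emitting a call to memcpy.
template <int DataSize>
__device__ __forceinline__ void copy_bytes(void *to, const void *from) {
  char *ptr_to = (char *) to;
  const char *ptr_from = (const char *) from;
#pragma unroll
  for (int i = 0; i < DataSize; i++) {
    ptr_to[i] = ptr_from[i];
  }
}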
Pre-approving. Overall LGTM.
Thanks @dongxuy04. Appreciate your patience during the PR review process. @BradReesWork we are now ready to merge this one!
Thanks @teju85 @robertmaynard @BradReesWork for many good suggestions and great help during the PR review process! @BradReesWork shall we get this PR merged?
Merged updates from WholeMemory and updated docs.