merge updates #3

Merged: 15 commits into rapidsai:main, Aug 29, 2022
Conversation

dongxuy04 (Contributor):

Merged updates from WholeMemory and updated the docs.

dongxuy04 requested a review from teju85 on August 16, 2022.
teju85 (Member) commented Aug 16, 2022:

@robertmaynard can we get some review on the cmake logic here, please?

dongxuy04 and others added 5 commits on August 16, 2022 (Co-authored-by: Thejaswi. N. S <rao.thejaswi@gmail.com>).

Review thread on CMakeLists.txt (outdated diff):
@@ -7,41 +20,53 @@ include(rapids-cuda)
include(rapids-export)
include(rapids-find)

rapids_cuda_init_architectures(WHOLEGRAPH)

set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86)

Contributor:
We should go with rapids_cuda_init_architectures(WHOLEGRAPH) instead of the explicit value set. This will allow users to compile for a subset, and make it easier to support new architectures

Contributor Author (dongxuy04):
We use some newer CUDA features, such as the memory consistency model and nanosleep, that are only supported on architectures 70 and newer, so we would like to restrict CUDA architectures to >= 70. It seems to me that rapids-cmake supports ALL and NATIVE; can it be set to values >= 70 only?

Contributor:
As far as I am aware, all of RAPIDS needs to support sm_60, as that is still a major deployment target.
The rapids-cmake ALL keyword maps to 60-real, 70-real, 75-real, ....

Contributor:
Either way, what you should do is call rapids_cuda_init_architectures(WHOLEGRAPH).

This will allow the user to specify a value for CMAKE_CUDA_ARCHITECTURES. After the project call you can always iterate that value and produce an error if it contains an sm value that is too low.
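
A minimal sketch of that post-project check (the regex, variable names, and error message here are illustrative, not code from this PR):

rapids_cuda_init_architectures(WHOLEGRAPH)
project(wholegraph CXX CUDA)

# After project(), CMAKE_CUDA_ARCHITECTURES holds the user-provided or
# rapids-cmake-initialized value; reject any numeric entry below 70.
foreach(arch IN LISTS CMAKE_CUDA_ARCHITECTURES)
  string(REGEX REPLACE "-real|-virtual" "" arch_num "${arch}")  # "70-real" -> "70"
  if(arch_num MATCHES "^[0-9]+$" AND arch_num LESS 70)
    message(FATAL_ERROR "wholegraph requires CUDA architectures >= 70, got ${arch}")
  endif()
endforeach()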

Contributor Author (dongxuy04):
Do you mean I can use set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86) after rapids_cuda_init_architectures(WHOLEGRAPH) and the project() call? Maybe like this:
rapids_cuda_init_architectures(WHOLEGRAPH)
project(wholegraph CXX CUDA)
set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86)

Contributor:
What I am saying is that you shouldn't overwrite what the user has specified. If a user wants to build for just the GPU on the local machine, they should be able to do so without changing any CMake code. They should be able to specify -DCMAKE_CUDA_ARCHITECTURES=86-real or -DCMAKE_CUDA_ARCHITECTURES=NATIVE.

Therefore what you should do is:

rapids_cuda_init_architectures(WHOLEGRAPH)
project(wholegraph CXX CUDA)

and have a C++ side check like:

#if __CUDA_ARCH__ < 700
#error "wholegraph doesn't support architectures .....
#endif 
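
One caveat worth noting about the sketch above: __CUDA_ARCH__ is only defined during device compilation passes, so guarding it with defined() keeps the error from firing in host passes (the wording of the message below is illustrative):

// Only evaluate the architecture check in device compilation passes, where
// __CUDA_ARCH__ is defined; in host passes it would expand to 0 and trip #error.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 700)
#error "wholegraph requires compute capability 7.0 (sm_70) or newer"
#endif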

Contributor Author (dongxuy04):
Updated the CMake logic: set a default CMAKE_CUDA_ARCHITECTURES if the user doesn't specify one, and call rapids_cuda_init_architectures to support ALL and NATIVE:

if (NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
    set(CMAKE_CUDA_ARCHITECTURES 70-real 75-real 80-real 86)
endif ()
rapids_cuda_init_architectures(WHOLEGRAPH)
project(wholegraph CXX CUDA)

Also updated the CUDA C++ code as suggested.

Review thread (outdated diff):
# Configure path to modules (for find_package)
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${PROJECT_SOURCE_DIR}/cmake/modules/")
# enable assert in RelWithDebInfo build type
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g")

Contributor:
Why do you need to overwrite the default flag values for RELWITHDEBINFO?

Contributor Author (dongxuy04):
I would like to remove -DNDEBUG in RELWITHDEBINFO to enable assert. Is there a better way to do this?
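
For reference, one way to drop only the NDEBUG define while keeping the default RelWithDebInfo flags (a sketch, not code from this PR):

# Strip -DNDEBUG from the default RelWithDebInfo flags so assert() stays active,
# without hard-coding the remaining optimization and debug flags.
string(REPLACE "-DNDEBUG" "" CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO}")
string(REPLACE "-DNDEBUG" "" CMAKE_CUDA_FLAGS_RELWITHDEBINFO "${CMAKE_CUDA_FLAGS_RELWITHDEBINFO}")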

Review thread (outdated diff):
add_executable(whole_graph_sp_test whole_graph_sp_test.cu)
target_link_libraries(whole_graph_sp_test whole_graph)
add_executable(whole_memory_mp_test whole_memory_mp_test.cu)
target_link_libraries(whole_memory_mp_test whole_graph)

Contributor:
All the target_link_libraries calls should be updated to target_link_libraries(<target> PRIVATE whole_graph).

Contributor Author (dongxuy04):
Updated.
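
Applied to the snippet above, the updated calls would look like this (a sketch of the suggested change):

add_executable(whole_graph_sp_test whole_graph_sp_test.cu)
target_link_libraries(whole_graph_sp_test PRIVATE whole_graph)
add_executable(whole_memory_mp_test whole_memory_mp_test.cu)
target_link_libraries(whole_memory_mp_test PRIVATE whole_graph)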

Review thread (outdated diff):
target_compile_definitions(whole_graph PUBLIC -D_FILE_OFFSET_BITS=64)
if (${USE_CXX11_ABI})
message(STATUS "Using CXX ABI = 1")
target_compile_definitions(whole_graph PUBLIC -D_GLIBCXX_USE_CXX11_ABI=1)

Contributor:
Do these need to be PUBLIC? Does whole_graph have a public API that includes C++ types?

Contributor Author (dongxuy04):
Yes, whole_graph provides an API with C++ types.
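
For context, PUBLIC compile definitions become usage requirements that propagate to anything linking the target; a minimal sketch with a hypothetical consumer target (my_app is not part of this PR):

# Linking against whole_graph inherits its PUBLIC compile definitions, so the
# consumer builds with the same _GLIBCXX_USE_CXX11_ABI and _FILE_OFFSET_BITS
# values and the C++ types in the public API stay ABI-compatible.
add_executable(my_app my_app.cc)
target_link_libraries(my_app PRIVATE whole_graph)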

dongxuy04 and others added 2 commits on August 17, 2022 (Co-authored-by: Robert Maynard <robertjmaynard@gmail.com>).

teju85 (Member) left a comment:

A couple of very minor nitpicks.

thread_local std::mt19937 gen(rd());
thread_local std::uniform_int_distribution<unsigned long long> distrib;
unsigned long long random_seed = distrib(gen);
WM_CUDA_CHECK(cudaStreamSynchronize(stream));

Member:
do we need this stream-sync?

Contributor Author (dongxuy04):
It is not needed, removed, thanks!

Comment on lines +305 to +309
char *ptr_to = (char *) to;
const char *ptr_from = (const char *) from;
for (int i = 0; i < DataSize; i++) {
ptr_to[i] = ptr_from[i];
}

Member:
It's simpler to use the memcpy function instead.

Contributor Author (dongxuy04):
Thanks for your suggestion! Yes, it would be simpler. However, DataSize here is a template parameter and should not be large in normal cases, so we would prefer to let the compiler optimize the loop rather than emit a device function call.
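
A sketch of the trade-off being described (the helper name and signature are illustrative, not the PR's actual code):

// DataSize is a compile-time constant, so the byte loop can be fully unrolled;
// for small sizes this typically compiles to the same code a memcpy call would.
template <int DataSize>
__device__ __forceinline__ void copy_fixed_bytes(void* to, const void* from) {
  char* ptr_to = static_cast<char*>(to);
  const char* ptr_from = static_cast<const char*>(from);
#pragma unroll
  for (int i = 0; i < DataSize; i++) {
    ptr_to[i] = ptr_from[i];
  }
}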

teju85 (Member) left a comment:

Pre-approving. Overall LGTM.

teju85 (Member) commented Aug 22, 2022:

Thanks @dongxuy04. Appreciate your patience during the PR review process.

@BradReesWork we are now ready to merge this one!

dongxuy04 (Contributor Author):
Thanks @teju85 @robertmaynard @BradReesWork for the many good suggestions and great help during the PR review process! @BradReesWork, shall we get this PR merged?

BradReesWork merged commit bef14e0 into rapidsai:main on Aug 29, 2022.