-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
view(uvm): don't fence during view's allocation (#6005) #6008
Conversation
Can one of the admins verify this patch? |
1f3c50f
to
cfc6ce4
Compare
cfc6ce4
to
2d8069e
Compare
d26e6cc
to
e6d2ee2
Compare
ad29115
to
33bed43
Compare
The developer meeting asked for more clarification of what we are discussing: Consider Kokkos::Cuda exec_space;
Kokkos::View<int*, Kokkos::CudaUVMSpace> view(Kokkos::view_alloc(exec_space, "view"), 10);
view[0] = 2; should this code work or do we require the user to fence? Kokkos::Experimental::HPX exec_space;
Kokkos::View<int*, Kokkos::HostSpace> view(Kokkos::view_alloc(exec_space, "view"), 10);
view[0] = 2; ? Currently, we are fencing the default execution space instance(!) before allocation and after (potential initialization) only for |
In my opinion, yes both need fences. |
Hi @dalg24 @masterleinad ! We tried to think about this a bit more. Here is a table that show what we have in mind. Note the introduction of the opt-in In our opinion, the behavior for a shared memory space should be similar to any other one.
|
33bed43
to
391ccd3
Compare
There is no need for this flag. If you give an execution space , you agree that the operation may be asynchronous. If you don't provide an execution space. we should fence for you. If you provide an execution space, then it's your responsibility. |
Let's just do a simple poll that might decide pretty much already how we want to proceed:
|
Hi @masterleinad ! 👋 I guess your poll is unanimous:
If you agree, I'll make the changes. |
Yes, just delete the UVM fences. |
391ccd3
to
47f4b6b
Compare
47f4b6b
to
9b5e667
Compare
@masterleinad I guess the reason for failure of |
This doesn't really touch |
46e4030
to
ef07355
Compare
@masterleinad The last action showed only |
I already approved it. So you need to bug other people (possibly someone with the power to merge it). 🙂 |
Indeed, sorry @masterleinad ! I guess I need to bug @dalg24 and @PhilMiller 📨 |
Hi @dalg24 and @PhilMiller, would you have a moment to look at this PR, and, if you agree with the changes, approve? Many thanks in advance! |
Make sure you update the title line and the description of this PR to reflect the latest version. |
//! @name Some tests are skipped for @c CudaUVM memory space. | ||
///@{ | ||
#ifdef KOKKOS_ENABLE_CUDA | ||
#define GTEST_SKIP_IF_CUDAUVM_MEMORY_SPACE \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other discussion aside, standard practice when defining a macro like this is to wrap the contents in do { ... } while (false)
so that the use sites have to treat it like a statement with a trailing ;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now
#ifdef KOKKOS_ENABLE_CUDA
#define GTEST_SKIP_IF_CUDAUVM_MEMORY_SPACE \
if constexpr (std::is_same_v<typename TEST_EXECSPACE::memory_space, \
Kokkos::CudaUVMSpace>) \
GTEST_SKIP() << "skipping since CudaUVMSpace requires additional fences"
#else
#define GTEST_SKIP_IF_CUDAUVM_MEMORY_SPACE do {} while(false)
#endif
such that the caller must add the trailing ;
. Is that what you meant? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with Phil's suggestion. This only apply to macro that take arguments.
I would prefer to revert but I am not blocking on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also preferred how it was before. I'll revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dalg24 I think this is the last thread. If you agree with all the changes, this PR is good to go 🚀
ef07355
to
e395d5f
Compare
cb17816
to
0a47643
Compare
Impl::cuda_device_synchronize( | ||
"Kokkos::Cuda: backend fence after async malloc"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@masterleinad This is just a comment about the clang-format
. Do you think it would make sense (not in this PR of course) to lower the penalty of excess characters? (See https://clang.llvm.org/docs/ClangFormatStyleOptions.html#penaltyexcesscharacter)
I mean, sometimes a line with 87 characters might be nicer than the same line split weirdly across 2 lines 😄 This is just a question, not suggesting to do anything but gathering your opinion on this 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the clang-format version we are using to something more recent in the near future.
This will trigger changes throughout the code base. When we do so would be a time where we would consider changing the clang-format configuration. If that's something you are interested in you are welcome to explore options but this needs to be done in an issue elsewhere.
A word of warning, this is a time sink and at times can get controversial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If I'm not mistaken, the PenaltyExcessCharacter is available from
clang-format
3.7, and you're requiring version8.0
inscripts/apply-clang-format
. So I guess the option could be readily used. - Instead of modifying the whole code base when the clang format changes (either version or options), I think you should use
git-clang-format
to check for style only in modified files. This would avoid committing purely stylistic changes. That would avoid messing up the git history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We allow disabling clang-format when appropriate. You will find all sort of examples throughout the code base such as
kokkos/core/src/impl/Kokkos_QuadPrecisionMath.hpp
Lines 46 to 67 in 4b6d971
// clang-format off | |
// Numeric distinguished value traits | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(infinity, __float128, __float128, HUGE_VALQ) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(finite_min, __float128, __float128, -FLT128_MAX) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(finite_max, __float128, __float128, FLT128_MAX) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(epsilon, __float128, __float128, FLT128_EPSILON) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(round_error, __float128, __float128, static_cast<__float128>(0.5)) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(norm_min, __float128, __float128, FLT128_MIN) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(denorm_min, __float128, __float128, FLT128_DENORM_MIN) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(quiet_NaN, __float128, __float128, __builtin_nanq("")) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(signaling_NaN, __float128, __float128, __builtin_nansq("")) | |
// Numeric characteristics traits | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(digits, __float128, int, FLT128_MANT_DIG) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(digits10, __float128, int, FLT128_DIG) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(max_digits10, __float128, int, 36) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(radix, __float128, int, 2) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(min_exponent, __float128, int, FLT128_MIN_EXP) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(max_exponent, __float128, int, FLT128_MAX_EXP) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(min_exponent10, __float128, int, FLT128_MIN_10_EXP) | |
KOKKOS_IMPL_SPECIALIZE_NUMERIC_TRAIT(max_exponent10, __float128, int, FLT128_MAX_10_EXP) | |
// clang-format on |
kokkos/core/src/OpenACC/Kokkos_OpenACC_ParallelFor_Range.hpp
Lines 35 to 37 in 4b6d971
// clang-format off | |
#pragma acc parallel loop gang(static:chunk_size) vector copyin(functor) async(async_arg) | |
// clang-format on |
In principle you are right but in practice I think it would mean that the entire file would be reformatted as soon as someone touches it and this "spreads" the noise across the history of the repo instead of having one clean "break".
(event.descriptor().find(fence_event_message_for_zeromemset( | ||
TEST_EXECSPACE{})) != std::string::npos)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is unnecessary. Please undo.
/** | ||
* @name Some tests are skipped for @c CudaUVM memory space. | ||
* | ||
* @todo To be revised according to the future of @c KOKKOS_ENABLE_CUDA_UVM . | ||
*/ | ||
///@{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not requiring changes but these are 5 lines of comment (not even counting the closing comment) when one single line would do and we do not really use Doxygen so chances are it would not even generate correct documentation after other edit the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used ///
instead of /** ... */
. Tell me if it's good for you ;)
(almost there) |
0a47643
to
5c2d948
Compare
This PR is a fix for
Kokkos::View
given aspace
instance andCuda UVM
#6005Basically, when
CudaUVM
is enabled, there is a fence before and after the view's allocation (whether thespace
instance was given viaarg_prop
or not).However, the fence was not performed on the space instance the user provided (if any).
According to the "poll" below, it was decided to
Therefore, any additional fence that was added for
CudaUVM
in view's allocation has been removed. Note that fencing still occurs in the backend implementations.