-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DualView sync's on ExecutionSpaces #3822
DualView sync's on ExecutionSpaces #3822
Conversation
Note: not doing this for resize, as that allocates, which is a sync, too. So no point in optimizing that (unless we get async allocations) |
Don't merge this one, I did something very dumb |
Christian and I discussed over Slack. There's subtlety here, and I was worried about other problems we might encounter. Specifically, assuming a UVM View, and somebody does
Conceptually, this could be broken. On the other hand, it's the moral equivalent of Kokkos::deep_copy(some_host_space, uvm_view_1, uvm_view_2); So I'm going to make an incremental change here, but otherwise this will be good to go |
Assuming this passes CI, it's ready for a review |
f127f47
to
08dc893
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me.
Retest this please |
Retest this please. |
You probably have to rebase to make this pass... |
template <class Device> | ||
void sync(const typename std::enable_if< | ||
(!std::is_same<typename traits::data_type, | ||
typename traits::non_const_data_type>::value) || | ||
(std::is_same<Device, int>::value), | ||
int>::type& = 0) { | ||
sync_impl<Device>(std::false_type{}); | ||
} | ||
template <class Device, class ExecutionSpace> | ||
void sync(const ExecutionSpace& exec, | ||
const typename std::enable_if< | ||
(!std::is_same<typename traits::data_type, | ||
typename traits::non_const_data_type>::value) || | ||
(std::is_same<Device, int>::value), | ||
int>::type& = 0) { | ||
sync_impl<Device>(std::false_type{}, exec); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely you meant
template <class Device> | |
void sync(const typename std::enable_if< | |
(!std::is_same<typename traits::data_type, | |
typename traits::non_const_data_type>::value) || | |
(std::is_same<Device, int>::value), | |
int>::type& = 0) { | |
sync_impl<Device>(std::false_type{}); | |
} | |
template <class Device, class ExecutionSpace> | |
void sync(const ExecutionSpace& exec, | |
const typename std::enable_if< | |
(!std::is_same<typename traits::data_type, | |
typename traits::non_const_data_type>::value) || | |
(std::is_same<Device, int>::value), | |
int>::type& = 0) { | |
sync_impl<Device>(std::false_type{}, exec); | |
} | |
template <class Device> | |
void sync() { | |
sync_impl<Device>(std::intergal_constant<bool, | |
std::is_same<typename traits::data_type, typename traits::non_const_data_type>::value || | |
std::is_same<Device, int>::value>); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why you mixed SFINAE and tag dispatching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, I mean that this should be completely rewritten, with this type trait made a constexpr static bool
of the class. If you'd like to make this change I can look at it, though I know that as is it wouldn't compile. Also, I don't know what it means for the Device to be an int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, wait, your example doesn't pass the execution space, if we're given one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template <class Device, class.. ExecSpaceOrNothing>
void sync(ExecSpaceOrNothing const& exec_space_or_nothing...) {
using Tag = std::intergal_constant<bool,
std::is_same<typename traits::data_type, typename traits::non_const_data_type>::value ||
std::is_same<Device, int>::value>;
sync_impl<Device>(Tag{}, exec_space_or_nothing...);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and you can try a
using Tuple = std::conditional_t<sizeof...(ExecSpaceOrNothing)==1, std::tuple< ExecSpaceOrNothing...>, std::tuple<void>>;
static_assert(sizeof...(ExecSpaceOrNothing)==0 || sizeof...(ExecSpaceOrNothing)==1 && Kokkos::is_execution_space<std::tuple_element_t<0,Tuple>>::value, "")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or better if you want make Daisy proud
#ifdef KOKKOS_ENABLE_CXX17
static_assert((... && Kokkos::is_execution_space<ExecSpaceOrNothing>::value));
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what I worry about here is that in the current version, we have functions with the same signatures I have provided, that is
template<typename Device> void sync(int)
So I am positive that any program using DualView won't see changes. I can argue to myself that your change works: the only thing I can see breaking is if somebody incorrectly actually passed an int
argument:
Kokkos::DualView<float*, Space> my_dv;
my_dv.sync<Space>(0);
This would be incorrect, but parses and works in the current version. That's just off the top of my head, and this is C++, there's always another corner case. If you think your edit needs to happen, I'll do it, but I worry about doing the subtle and clever thing to save ~30 lines, rather than guaranteeing ourselves the same interface. My recommendation is to have an action item for 4.0 to make this change, but emphasize stability here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thought of another: if you've taken a (member) function pointer to this function, this would be a breaking change. Of course I hope nobody is taking function pointers to these things, but I've hoped for a lot of things...
containers/src/Kokkos_DualView.hpp
Outdated
return Kokkos::Cuda(Kokkos::Impl::cuda_get_deep_copy_stream()); | ||
} | ||
|
||
template <typename Arg> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> NonCudaExecSpace
and
const
This is something EMPIRE is going to need eventually, probably Trilinos too. In the spirit of removing these globally fencing 2-arg deep copies, DualViews should allow for a user to call sync on an execution space, and use that execution space for any deep copies it invokes. There's some complications, that lead to the design being a bit screwy.
First, the sync code is huge, I want to avoid code duplication. In my head, the way to do this, conceptually looks like
If somebody passes an ExecutionSpace, we use it, otherwise we don't, just like the default.
Problem 1: the existing sync method does tricks with a default argument to specialize based on whether the DualView is of const or nonconst type.
Solution 1: Okay, use a delegating method
Problem 2: @crtrott making work for me. We have this gnarly block to handle UVM goodness in DualView's sync
This should use the execution space if and only if it's a CUDA instance. So I added some plumbing for that, Impl::get_cuda_space, which returns a Cuda Space if one is passed to it, if anything else is passed to it, it just construct a default one (exactly what we currently do).
I really don't know enough about C++ design patterns to say whether this thing is implemented well or terribly, to be honest.
edit: Also, made our default deep copy instance for these be the deep_copy stream. Possible small asynchrony benefits there (see: #3822 (comment) )