Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DualView sync's on ExecutionSpaces #3822

Merged
merged 4 commits into from
Mar 16, 2021

Conversation

DavidPoliakoff
Copy link
Contributor

@DavidPoliakoff DavidPoliakoff commented Mar 3, 2021

This is something EMPIRE is going to need eventually, probably Trilinos too. In the spirit of removing these globally fencing 2-arg deep copies, DualViews should allow for a user to call sync on an execution space, and use that execution space for any deep copies it invokes. There's some complications, that lead to the design being a bit screwy.

First, the sync code is huge, I want to avoid code duplication. In my head, the way to do this, conceptually looks like

template<class... Args>
void sync(Args... args){
  if(i_should){ 
    Kokkos::deep_copy(args..., d_view,h_view); 
  }
}

If somebody passes an ExecutionSpace, we use it, otherwise we don't, just like the default.

Problem 1: the existing sync method does tricks with a default argument to specialize based on whether the DualView is of const or nonconst type.

Solution 1: Okay, use a delegating method

template<class... Args>
void sync_impl(const std::true_type&, Args... args){
  if(i_should){ 
    Kokkos::deep_copy(args..., d_view,h_view); 
  }
}

template<typename ExecutionSpace>
void sync(const ExecutionSpace& exec, /** chthonic metaprogramming incantation, true branch */) {
  sync_impl(true_type{}, exec);
}

void sync(/** chthonic metaprogramming incantation, true branch */) {
  sync_impl(true_type{});
}

//false left off here, but handled the same way

Problem 2: @crtrott making work for me. We have this gnarly block to handle UVM goodness in DualView's sync

Kokkos::Impl::cuda_prefetch_pointer(
  Kokkos::Cuda(), d_view.data(),
 sizeof(typename t_dev::value_type) * d_view.span(), false);

This should use the execution space if and only if it's a CUDA instance. So I added some plumbing for that, Impl::get_cuda_space, which returns a Cuda Space if one is passed to it, if anything else is passed to it, it just construct a default one (exactly what we currently do).

I really don't know enough about C++ design patterns to say whether this thing is implemented well or terribly, to be honest.

edit: Also, made our default deep copy instance for these be the deep_copy stream. Possible small asynchrony benefits there (see: #3822 (comment) )

@DavidPoliakoff DavidPoliakoff added the Blocks Promotion Overview issue for release-blocking bugs label Mar 3, 2021
@DavidPoliakoff
Copy link
Contributor Author

Note: not doing this for resize, as that allocates, which is a sync, too. So no point in optimizing that (unless we get async allocations)

containers/src/Kokkos_DualView.hpp Outdated Show resolved Hide resolved
nliber
nliber previously requested changes Mar 4, 2021
containers/src/Kokkos_DualView.hpp Show resolved Hide resolved
@DavidPoliakoff
Copy link
Contributor Author

Don't merge this one, I did something very dumb

@DavidPoliakoff
Copy link
Contributor Author

Christian and I discussed over Slack. There's subtlety here, and I was worried about other problems we might encounter. Specifically, assuming a UVM View, and somebody does

my_uvm_dual_view.sync_host(my_host_space_instance);

Conceptually, this could be broken. On the other hand, it's the moral equivalent of

Kokkos::deep_copy(some_host_space, uvm_view_1, uvm_view_2);

So I'm going to make an incremental change here, but otherwise this will be good to go

@DavidPoliakoff
Copy link
Contributor Author

Assuming this passes CI, it's ready for a review

containers/src/Kokkos_DualView.hpp Show resolved Hide resolved
containers/src/Kokkos_DualView.hpp Show resolved Hide resolved
containers/src/Kokkos_DualView.hpp Outdated Show resolved Hide resolved
Copy link
Contributor

@masterleinad masterleinad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

@DavidPoliakoff
Copy link
Contributor Author

Retest this please

@DavidPoliakoff
Copy link
Contributor Author

Retest this please.

@masterleinad
Copy link
Contributor

You probably have to rebase to make this pass...

@masterleinad masterleinad added this to the Tentative 3.4 Release milestone Mar 11, 2021
core/src/Kokkos_CudaSpace.hpp Show resolved Hide resolved
Comment on lines +593 to +609
template <class Device>
void sync(const typename std::enable_if<
(!std::is_same<typename traits::data_type,
typename traits::non_const_data_type>::value) ||
(std::is_same<Device, int>::value),
int>::type& = 0) {
sync_impl<Device>(std::false_type{});
}
template <class Device, class ExecutionSpace>
void sync(const ExecutionSpace& exec,
const typename std::enable_if<
(!std::is_same<typename traits::data_type,
typename traits::non_const_data_type>::value) ||
(std::is_same<Device, int>::value),
int>::type& = 0) {
sync_impl<Device>(std::false_type{}, exec);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely you meant

Suggested change
template <class Device>
void sync(const typename std::enable_if<
(!std::is_same<typename traits::data_type,
typename traits::non_const_data_type>::value) ||
(std::is_same<Device, int>::value),
int>::type& = 0) {
sync_impl<Device>(std::false_type{});
}
template <class Device, class ExecutionSpace>
void sync(const ExecutionSpace& exec,
const typename std::enable_if<
(!std::is_same<typename traits::data_type,
typename traits::non_const_data_type>::value) ||
(std::is_same<Device, int>::value),
int>::type& = 0) {
sync_impl<Device>(std::false_type{}, exec);
}
template <class Device>
void sync() {
sync_impl<Device>(std::intergal_constant<bool,
std::is_same<typename traits::data_type, typename traits::non_const_data_type>::value ||
std::is_same<Device, int>::value>);
}

Copy link
Member

@dalg24 dalg24 Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you mixed SFINAE and tag dispatching

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I mean that this should be completely rewritten, with this type trait made a constexpr static bool of the class. If you'd like to make this change I can look at it, though I know that as is it wouldn't compile. Also, I don't know what it means for the Device to be an int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, wait, your example doesn't pass the execution space, if we're given one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  template <class Device, class.. ExecSpaceOrNothing>
  void sync(ExecSpaceOrNothing const& exec_space_or_nothing...) {
    using Tag = std::intergal_constant<bool,
        std::is_same<typename traits::data_type, typename traits::non_const_data_type>::value ||
        std::is_same<Device, int>::value>;
    sync_impl<Device>(Tag{}, exec_space_or_nothing...);
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and you can try a

using Tuple = std::conditional_t<sizeof...(ExecSpaceOrNothing)==1, std::tuple< ExecSpaceOrNothing...>, std::tuple<void>>;
static_assert(sizeof...(ExecSpaceOrNothing)==0 || sizeof...(ExecSpaceOrNothing)==1 && Kokkos::is_execution_space<std::tuple_element_t<0,Tuple>>::value, "")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or better if you want make Daisy proud

#ifdef KOKKOS_ENABLE_CXX17
static_assert((... && Kokkos::is_execution_space<ExecSpaceOrNothing>::value));
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what I worry about here is that in the current version, we have functions with the same signatures I have provided, that is

template<typename Device> void sync(int)

So I am positive that any program using DualView won't see changes. I can argue to myself that your change works: the only thing I can see breaking is if somebody incorrectly actually passed an int argument:

Kokkos::DualView<float*, Space> my_dv;
my_dv.sync<Space>(0);

This would be incorrect, but parses and works in the current version. That's just off the top of my head, and this is C++, there's always another corner case. If you think your edit needs to happen, I'll do it, but I worry about doing the subtle and clever thing to save ~30 lines, rather than guaranteeing ourselves the same interface. My recommendation is to have an action item for 4.0 to make this change, but emphasize stability here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thought of another: if you've taken a (member) function pointer to this function, this would be a breaking change. Of course I hope nobody is taking function pointers to these things, but I've hoped for a lot of things...

return Kokkos::Cuda(Kokkos::Impl::cuda_get_deep_copy_stream());
}

template <typename Arg>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-> NonCudaExecSpace
and
const

@dalg24 dalg24 dismissed nliber’s stale review March 16, 2021 21:15

Changes requested have been implemented

@dalg24 dalg24 merged commit 08ba110 into kokkos:develop Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants