Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: deep_copy within parallel regions #689

Closed
vbrunini opened this issue Mar 17, 2017 · 13 comments
Closed

Request: deep_copy within parallel regions #689

vbrunini opened this issue Mar 17, 2017 · 13 comments
Assignees
Labels
Feature Request Create new capability; will potentially require voting

Comments

@vbrunini
Copy link
Contributor

I'd like to have versions of deep_copy (both the copy from another view and the fill with a constant value) that can be called from within parallel regions and are as performant as the equivalent using memcpy/memset on the underlying data.

@crtrott

@gmackey
Copy link
Contributor

gmackey commented Mar 17, 2017

This functionality is something that could be supported, but I'm not sure using the function deep_copy() is the way we want to go. What you want is something that copies between views only in the same memory space. deep_copy() implies that it works copying between memory spaces.

@crtrott We've talked a couple of times about adding some of the stl algorithm functionality to Kokkos algorithms. One way to approach this problem is to use a Kokkos algorithms analogue to stl::copy(). This would avoid confusing the intent of the name deep_copy(). Although it would bring up the issue of should our copy() work for both in parallel regions and outside parallel regions? Should we have separate functions that do each (copy() and serial_copy()), or should a single function detect if it is in a parallel region and do the right thing?

@crtrott crtrott added the Feature Request Create new capability; will potentially require voting label Mar 17, 2017
@mhoemmen
Copy link
Contributor

deep_copy is the "easiest Kokkos kernel" ;-) Would it make sense to have team-level, thread-level, etc. versions of it?

@hcedwar
Copy link
Contributor

hcedwar commented Apr 5, 2017

@vbrunini : what is the scope you are requesting? As per @mhoemmen comment? Restricted to same memory space?

@hcedwar hcedwar added this to the Backlog milestone Apr 5, 2017
@vbrunini
Copy link
Contributor Author

vbrunini commented Apr 5, 2017

Restricted to the same memory space and just single thread level is fine for my needs. Basically I have a bunch of code like:

#ifndef KOKKOS_HAVE_CUDA
auto subview =
Kokkos::subview(view, Kokkos::pair<int, int>(*range.begin(), *range.end()), Kokkos::ALL);
auto size = subview.span() * DoubleTypeLength;
auto begin = reinterpret_cast<double *>(subview.ptr_on_device());
std::fill(begin, begin + size, 0.);
#else
const int n1 = view.extent(1);
for (auto elem : range)
for (int i1 = 0; i1 < n1; ++i1)
view(elem, i1) = 0.;
#endif

that I would like to replace with just the subview construction and a call to a deep_copy (or an alternatively named function) that does the fill.

@ibaned
Copy link
Contributor

ibaned commented Mar 14, 2018

Is this still needed, and if so is it okay if its called something different (like local_deep_copy)?

@ibaned ibaned self-assigned this Mar 14, 2018
@ibaned ibaned modified the milestones: Backlog, 2018 June Mar 14, 2018
@vbrunini
Copy link
Contributor Author

Yes this would still be useful. I'm fine with a different name, maybe fill to be consistent with std::fill?

@ibaned ibaned modified the milestones: 2018 June, 2018 April Mar 15, 2018
@crtrott crtrott modified the milestones: 2018 April, 2018 June Apr 18, 2018
@crtrott crtrott assigned vqd8a and unassigned ibaned May 2, 2018
@Char-Aznable
Copy link
Contributor

Char-Aznable commented Jun 7, 2018

I have a similar request for this feature. Here's a usage example, which only works on CPU:

#include <cstdio>
#include <iostream>
#include <Kokkos_Core.hpp>
#include <Kokkos_DualView.hpp>

using namespace std;

using ExecutionSpace = Kokkos::DefaultExecutionSpace;

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc,argv); {
  Kokkos::DualView<int***[5][6],Kokkos::LayoutRight,ExecutionSpace,Kokkos::MemoryTraits<Kokkos::Atomic>> a("a",2,3,4);
  using MemorySpace = decltype(a)::memory_space;
  using HostMirrorSpace = decltype(a)::host_mirror_space;
  auto asub = Kokkos::subview(a,0,1,2,Kokkos::ALL(),Kokkos::ALL());
  cout << "Before: \n";
  for(int i = 0; i < 5; ++i) {
    for(int j = 0; j < 6; ++j) {
      asub.h_view(i,j) = i * j;
      cout.width(5);
      cout << asub.h_view(i,j) << " ";
    }
    cout << endl;
  }
  a.modify<HostMirrorSpace>();
  a.sync<MemorySpace>();
  Kokkos::parallel_for(Kokkos::RangePolicy<ExecutionSpace>(0,5), KOKKOS_LAMBDA (const int i){
    auto subSrc = Kokkos::subview(a.d_view,0,1,2,Kokkos::ALL(),i+1);
    auto subDst = Kokkos::subview(a.d_view,0,1,2,Kokkos::ALL(),i);
    Kokkos::deep_copy(subDst,subSrc);
    }); 
  Kokkos::fence();
  a.modify<MemorySpace>();
  a.sync<HostMirrorSpace>();

  cout << "After: \n";
  for(int i = 0; i < 5; ++i) {
    for(int j = 0; j < 6; ++j) {
      cout.width(5);
      cout << asub.h_view(i,j) << " ";
    }
    cout << endl;
  }

  } Kokkos::finalize();
}

This example is a bit abusive because it basically copy the i+1th row to the ith row of a subview matrix but the actually use case would be i being somewhat randomly associated with each thread.

@ibaned ibaned modified the milestones: 2018 July, 2018 September Aug 1, 2018
@mhoemmen
Copy link
Contributor

@vbrunini I was thinking about how to do this. The main issue is that Kokkos::TeamPolicy doesn't expose member_type, so we can't overload Kokkos::deep_copy on it. TeamPolicy's base class does expose member_type, but the base class lives in the Kokkos::Impl namespace, and is thus not part of the public interface.

@vbrunini
Copy link
Contributor Author

I don't think a solution involving Kokkos::TeamPolicy or its member_type makes sense because we may also want to call this from loops that use a RangePolicy.

@mhoemmen
Copy link
Contributor

@vbrunini Why couldn't we just call Kokkos::deep_copy (Kokkos::Serial, dst, src); in that case?

@vbrunini
Copy link
Contributor Author

I hadn't thought about that option, no objections to that.

@ibaned
Copy link
Contributor

ibaned commented Sep 18, 2018

@mhoemmen I think we've seen issues before when trying to run a Serial parallel_for inside some other parallel_for. There is an interpretation in which that should work, however in practice I think Serial is an outer-level execution space only.

@mhoemmen
Copy link
Contributor

@ibaned The member_type of a RangePolicy is an integer; we could just overload Kokkos::deep_copy so if the first argument is an integer, it does a sequential copy. That may look weird, though.

@ibaned ibaned modified the milestones: 2018 September, 2018 December Nov 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Request Create new capability; will potentially require voting
Projects
None yet
Development

No branches or pull requests

9 participants