Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

create_mirror_view_and_copy #1161

Closed
ibaned opened this issue Oct 11, 2017 · 5 comments
Closed

create_mirror_view_and_copy #1161

ibaned opened this issue Oct 11, 2017 · 5 comments
Labels
Enhancement Improve existing capability; will potentially require voting
Milestone

Comments

@ibaned
Copy link
Contributor

ibaned commented Oct 11, 2017

Lets say that users are porting a code using unmanaged views:

auto h_data = Kokkos::View<T*, Kokkos::HostSpace, Kokkos::MemoryUnmanaged>(my_ptr, 1000);
auto d_data = Kokkos::create_mirror_view(Kokkos::DefaultExecutionSpace(), h_data);
Kokkos::deep_copy(d_data, h_data);

This will do three things when compiled with CUDA:

  1. Allocate d_data. This is fine.
  2. Initialize d_data with zeros! This is what I want to avoid. The data is about to be overwritten.
  3. Deep copy from host to device. This is fine.

By combining these three actions under one API, we can optimize out initialization in the case when data is about to be overwritten. More generally, we should offer a create_uninitialized_mirror_view.

Second case: The above code doesn't even compile when T=const double. The reason being that deep_copy doesn't accept a const left hand side. This is the fundamental complaint in #728 I think. By combining these three lines into one API, we could specialize it for the const case such that in CUDA it creates a mirror, does a deep copy, and then returns a const View, while in other cases it simply returns the input View.

So, there are at least two reasons why the proposed single API would improve performance. Also, I think this is basically the first and most common thing that users do when they start porting.

@ibaned ibaned added the Enhancement Improve existing capability; will potentially require voting label Oct 11, 2017
@ibaned ibaned added this to the 2017 December milestone Oct 11, 2017
@dholladay00
Copy link

I too have to do this. I receive inputs from the host code and they have to be transferred to the default execution space's memory space. It is nice that this can be done in a way that avoids copies when possible, but making them const is a little clunky. Something like this would be very nice.

// const input data
View<input*, MemoryUnmanaged, HostSpace> h_a(ptr, N);
View<input*> d_a_nc = create_mirror_view(DefaultExecutionSpace(), h_a);
deep_copy(d_a_nc, h_a);
View<const input*, RandomAccess> d_a(d_a_nc);
// output data
View<output*, MemoryUnmanaged, HostSpace> h_b(other_ptr, N);
View<output*> d_b = create_mirror_view(DefaultExecutionSpace(), h_b);
deep_copy(d_b, h_b);

@ibaned
Copy link
Contributor Author

ibaned commented Oct 11, 2017

Yea, thats basically what I meant. There should be one API that chooses the right option from your example automatically.

@ibaned
Copy link
Contributor Author

ibaned commented Oct 11, 2017

Just for book-keeping, the VPIC team would benefit from this.

@stanmoore1
Copy link
Contributor

Would be nice to have something similar for Kokkos::DualView as well.

ibaned added a commit that referenced this issue Oct 11, 2017
ibaned added a commit that referenced this issue Oct 11, 2017
@ibaned ibaned changed the title create_mirror_view_and_deep_copy create_mirror_copy Oct 11, 2017
@ibaned
Copy link
Contributor Author

ibaned commented Oct 11, 2017

First draft implementation in PR #1164

@ibaned ibaned changed the title create_mirror_copy create_mirror_view_and_copy Oct 16, 2017
@hcedwar hcedwar modified the milestones: 2017 December, 2017 October Oct 18, 2017
@crtrott crtrott closed this as completed Oct 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Improve existing capability; will potentially require voting
Projects
None yet
Development

No branches or pull requests

5 participants