create_mirror_view_and_copy #1161
Comments
I too have to do this. I receive inputs from the host code and they have to be transferred to the default execution space's memory space. It is nice that this can be done in a way that avoids copies when possible, but making them
// const input data
View<input*, HostSpace, MemoryUnmanaged> h_a(ptr, N);
View<input*> d_a_nc = create_mirror_view(DefaultExecutionSpace(), h_a);
deep_copy(d_a_nc, h_a);
View<const input*, MemoryRandomAccess> d_a(d_a_nc);
// output data
View<output*, HostSpace, MemoryUnmanaged> h_b(other_ptr, N);
View<output*> d_b = create_mirror_view(DefaultExecutionSpace(), h_b);
deep_copy(d_b, h_b);
Yeah, that's basically what I meant. There should be one API that chooses the right option from your example automatically.
Just for book-keeping, the VPIC team would benefit from this.
Would be nice to have something similar for
First draft implementation in PR #1164
Let's say that users are porting a code using unmanaged views:
This will do three things when compiled with CUDA:
1. Allocate d_data. This is fine.
2. Fill d_data with zeros! This is what I want to avoid. The data is about to be overwritten.
3. Copy the host data into d_data.

By combining these three actions under one API, we can optimize out initialization in the case when data is about to be overwritten. More generally, we should offer a create_uninitialized_mirror_view.

Second case: The above code doesn't even compile when T = const double. The reason is that deep_copy doesn't accept a const left-hand side. This is the fundamental complaint in #728, I think. By combining these three lines into one API, we could specialize it for the const case such that in CUDA it creates a mirror, does a deep copy, and then returns a const View, while in other cases it simply returns the input View.

So, there are at least two reasons why the proposed single API would improve performance. Also, I think this is basically the first and most common thing that users do when they start porting.
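
The two cases above can be illustrated with a small stand-alone C++17 sketch. It is a toy model, not Kokkos code: ToyView, HostTag, DeviceTag, and mirror_and_copy are hypothetical names invented for this example, and ordinary heap memory stands in for device memory. It assumes only the behavior described in this issue: return the input when the spaces already match, otherwise allocate without zero-filling, copy, and hand back a view whose element type may be const.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical stand-ins for memory spaces (not Kokkos types).
struct HostTag {};
struct DeviceTag {};

// Toy 1-D view: a pointer, a length, and an optional owner.
// An empty owner models an unmanaged view wrapping user memory.
template <class T, class Space>
struct ToyView {
  T* data = nullptr;
  std::size_t size = 0;
  std::shared_ptr<void> owner;
};

// Hypothetical combined operation: mirror + copy in one call.
template <class DstSpace, class T, class SrcSpace>
ToyView<T, DstSpace> mirror_and_copy(DstSpace, const ToyView<T, SrcSpace>& src) {
  if constexpr (std::is_same_v<DstSpace, SrcSpace>) {
    // Same space: no allocation and no copy, just return the input view.
    return src;
  } else {
    assert(src.size == 0 || src.data != nullptr);
    using Mutable = std::remove_const_t<T>;
    // new Mutable[n] default-initializes, so arithmetic types are NOT
    // zero-filled here; every element is overwritten by the copy below.
    std::shared_ptr<Mutable> buf(new Mutable[src.size],
                                 std::default_delete<Mutable[]>());
    // The copy targets the mutable buffer, so a const T never appears
    // on the left-hand side.
    std::copy_n(src.data, src.size, buf.get());
    ToyView<T, DstSpace> dst;
    dst.data = buf.get();  // Mutable* converts to T* (adds const if T is const)
    dst.size = src.size;
    dst.owner = buf;       // keeps the allocation alive
    return dst;
  }
}
```

With a ToyView<const double, HostTag> wrapping a user array, mirror_and_copy(HostTag{}, h) returns a view that aliases the same pointer, while mirror_and_copy(DeviceTag{}, h) returns a freshly allocated copy that is still typed as const double. The design point is that the mutable buffer exists only inside the function, which is what lets the const case work without exposing a non-const view to the caller.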