
[SYCL][CUDA] Unnecessary memcpy for write buffers #1992

@Ryefalk

Description


When using `write` and `discard_write` accessors with the CUDA backend, a host-to-device copy is made. These accessors are write-only, so such a copy should not be needed, and performing it wastes resources.

```cpp
queue.submit([&](cl::sycl::handler& cgh) {
    auto input_acc = input.get_access<sycl::access::mode::read>(cgh);
    // discard_write: the kernel only writes to output, so no host-to-device copy should be needed
    auto output_acc = output.get_access<sycl::access::mode::discard_write>(cgh);
    auto maxRange = sycl::nd_range<2>(sycl::range<2>{height, width / 4}, sycl::range<2>(1, 128));
    cgh.parallel_for<class test>(maxRange, [=](sycl::nd_item<2> item) {
        output_acc[item.get_global_id()] = 0;
    });
});
```

[Screenshot: nvvp profiler output]

This is the first issue of this kind I have written, so if you need additional information, just ask.

Metadata

Labels

bug (Something isn't working), cuda (CUDA back-end), performance (Performance related issues)
