
successive custom operations on the same SYCL queue sometimes execute in different HIP streams #614

Closed
mabraham opened this issue Aug 11, 2021 · 9 comments
Labels: bug (Something isn't working)

Comments

@mabraham
Contributor

Bug summary

hipSYCL's custom operations sometimes execute on a different native stream, even when submitted to the same SYCL queue.

To Reproduce
Add the fragment below to e.g. tests/sycl/accessor.cpp, build targeting an AMD device, and repeat running ./sycl_tests --run_test=accessor_tests/custom_kernel until the assertion on stream1 != stream2 fails. Around 1 execution in 4 fails for me. See sample failing output below.

#if defined(HIPSYCL_PLATFORM_HIP)
__global__
void plus_one_kernel(const float* __restrict__ d_input, float* __restrict__ d_result, const int size)
{
  int i = hipBlockDim_x * hipBlockIdx_x + hipThreadIdx_x;
  if (i < size) {
    d_result[i] = d_input[i] + 1.0;
  }
}
#endif

BOOST_AUTO_TEST_CASE(custom_kernel) {
  namespace s = cl::sycl;
  const int size = 32;
  std::vector<float> host_input(size, 1.0);
  s::buffer<float, 1> input(size);
  // Copy input to device
  s::queue queue;
  queue.submit([&](s::handler& cgh) {
    auto acc_input = s::accessor { input, cgh, s::write_only, s::property::no_init{}};
    cgh.copy(host_input.data(), acc_input);
  });
  // Initialize result buffer on the host, so that we can see that the
  // subsequent transfer from the device occurs, but do not transfer to
  // device now.
  std::vector<float> host_result(size, 0.0);
  s::buffer<float, 1> result(size), result2(size);
  // Copy input to result, adding 1 as we go
#if defined(HIPSYCL_PLATFORM_HIP)
  hipStream_t stream1, stream2;
  queue.submit([&](s::handler& cgh) {
    auto acc_input = s::accessor { input, cgh, s::read_only, s::property::no_init{}};
    auto acc_result = s::accessor { result, cgh, s::write_only, s::property::no_init{}};
    cgh.hipSYCL_enqueue_custom_operation([=](cl::sycl::interop_handle& h) {
      const float* d_input = h.get_native_mem<s::backend::hip>(acc_input);
      float* d_result = h.get_native_mem<s::backend::hip>(acc_result);
      hipStream_t stream1 = h.get_native_queue<cl::sycl::backend::hip>();
      const int threads_per_block = 8;
      hipLaunchKernelGGL(plus_one_kernel,
                         dim3(size / threads_per_block),
                         dim3(threads_per_block),
                         0, stream1, d_input, d_result, size);
    });
  });
  queue.wait();
  // Submit a second kernel with the same function to see if that
  // works in the same stream.
  queue.submit([&](s::handler& cgh) {
    auto acc_result = s::accessor { result, cgh, s::read_only, s::property::no_init{}};
    auto acc_result2 = s::accessor { result2, cgh, s::write_only, s::property::no_init{}};
    cgh.hipSYCL_enqueue_custom_operation([=](cl::sycl::interop_handle& h) {
      const float* d_result = h.get_native_mem<s::backend::hip>(acc_result);
      float* d_result2 = h.get_native_mem<s::backend::hip>(acc_result2);
      hipStream_t stream2 = h.get_native_queue<cl::sycl::backend::hip>();
      const int threads_per_block = 8;
      hipLaunchKernelGGL(plus_one_kernel,
                         dim3(size / threads_per_block),
                         dim3(threads_per_block),
                         0, stream2, d_result, d_result2, size);
    });
  });
  BOOST_CHECK(stream1 == stream2);
  queue.wait();
#else
  // Same logical operation as the above for HIP
  queue.submit([&](s::handler& cgh) {
    auto acc_input = s::accessor { input, cgh, s::read_only, s::property::no_init{}};
    auto acc_result = s::accessor { result, cgh, s::write_only, s::property::no_init{}};
    auto kernel = [=](cl::sycl::id<1> i) {
      acc_result[i] = acc_input[i] + 1.0;
    };
    cgh.parallel_for(s::range(size), kernel);
  });
  queue.submit([&](s::handler& cgh) {
    auto acc_result = s::accessor { result, cgh, s::read_only, s::property::no_init{}};
    auto acc_result2 = s::accessor { result2, cgh, s::write_only, s::property::no_init{}};
    auto kernel = [=](cl::sycl::id<1> i) {
      acc_result2[i] = acc_result[i] + 1.0;
    };
    cgh.parallel_for(s::range(size), kernel);
  });
#endif
  queue.submit([&](s::handler& cgh) {
    auto acc_result2 = s::accessor { result2, cgh, s::read_only, s::property::no_init{}};
    cgh.copy(acc_result2, host_result.data());
  });
  queue.wait();
  for (auto &value : host_result)
  {
    fprintf(stderr, "%g\n", value);
  }
  BOOST_CHECK(host_result[0] == 3.0);
}

Expected behavior
I expected that all operations on a SYCL queue take place on the same underlying hipStream_t, and that this stream is also the one used for a hipSYCL custom operation. If this is not the case, then arranging for correct dependencies may be difficult.

Describe your setup
hipSYCL built from today's develop commit 3e05c38 on ROCm 4.2, using the clang compiler shipped with ROCm and targeting hip:gfx906. The node has two such devices.

Optional additional diagnostic information

Sample failing output follows. Note that HIPSYCL_DEBUG_LEVEL=3 is enabled.

$ ./sycl_tests --run_test=accessor_tests/custom_kernel
Running 1 test case...
[hipSYCL Info] data_region: constructed with page table dimensions 1 1 1
[hipSYCL Info] backend_loader: Searching path for backend libs: '"/nethome/mabraham/git/hipSYCL/install-3e05c388-debug/lib/hipSYCL"'
[hipSYCL Info] backend_loader: Successfully opened plugin: "/nethome/mabraham/git/hipSYCL/install-3e05c388-debug/lib/hipSYCL/librt-backend-omp.so" for backend 'omp'
[hipSYCL Info] backend_loader: Successfully opened plugin: "/nethome/mabraham/git/hipSYCL/install-3e05c388-debug/lib/hipSYCL/librt-backend-hip.so" for backend 'hip'
[hipSYCL Info] Registering backend: 'omp'...
[hipSYCL Info] multi_queue_executor: Spawned for backend OpenMP with configuration: 
[hipSYCL Info]   device 0: 
[hipSYCL Info]     memcpy lane: 0
[hipSYCL Info]     kernel lane: 1
[hipSYCL Info] Registering backend: 'hip'...
[hipSYCL Info] hip_device_manager: Switchting to device 1
[hipSYCL Info] multi_queue_executor: Spawned for backend HIP with configuration: 
[hipSYCL Info]   device 0: 
[hipSYCL Info]     memcpy lane: 0
[hipSYCL Info]     memcpy lane: 1
[hipSYCL Info]     kernel lane: 2
[hipSYCL Info]     kernel lane: 3
[hipSYCL Info]   device 1: 
[hipSYCL Info]     memcpy lane: 0
[hipSYCL Info]     memcpy lane: 1
[hipSYCL Info]     kernel lane: 2
[hipSYCL Info]     kernel lane: 3
[hipSYCL Info] Discovered devices from backend 'OpenMP': 
[hipSYCL Info]   device 0: 
[hipSYCL Info]     vendor: the hipSYCL project
[hipSYCL Info]     name: hipSYCL OpenMP host device
[hipSYCL Info] Discovered devices from backend 'HIP': 
[hipSYCL Info]   device 0: 
[hipSYCL Info]     vendor: AMD
[hipSYCL Info]     name: Vega 20
[hipSYCL Info]   device 1: 
[hipSYCL Info]     vendor: AMD
[hipSYCL Info]     name: Vega 20
[hipSYCL Info] dag_manager: DAG manager is alive!
[hipSYCL Info] runtime: ******* rt launch initiated ********
[hipSYCL Info] queue: Constructed queue with node group id 1
[hipSYCL Info] dag_manager: Checking DAG flush opportunity...
[hipSYCL Info] data_region: constructed with page table dimensions 1 1 1
[hipSYCL Info] data_region: constructed with page table dimensions 1 1 1
[hipSYCL Info] dag_manager: Checking DAG flush opportunity...
[hipSYCL Info] dag_manager: Submitting asynchronous flush...
[hipSYCL Info] dag_manager: waiting for async worker...
[hipSYCL Info] dag_manager [async]: Flushing!
[hipSYCL Info] dag_builder: DAG contains operation: Memcpy: CPU-Device0 #4 {0, 0, 0}+{1, 1, 32}-->ROCm-Device0 #4 {0, 0, 0}+{1, 1, 32}{1, 1, 32} @node 0x1dcdcd0
[hipSYCL Info]     --> requires: MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dcda90
[hipSYCL Info] dag_builder: DAG contains operation: kernel: ZZN14accessor_tests13custom_kernel11test_methodEvENKUlRN7hipsycl4sycl7handlerEE0_clES4_EUlRNS2_14interop_handleEE_
   MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4
   MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dce800
[hipSYCL Info]     --> requires: MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dce2f0
[hipSYCL Info]     --> requires: MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dce410
[hipSYCL Info] dag_builder: DAG contains operation: MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dcda90
[hipSYCL Info] dag_builder: DAG contains operation: MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dce2f0
[hipSYCL Info]     --> requires: Memcpy: CPU-Device0 #4 {0, 0, 0}+{1, 1, 32}-->ROCm-Device0 #4 {0, 0, 0}+{1, 1, 32}{1, 1, 32} @node 0x1dcdcd0
[hipSYCL Info] dag_builder: DAG contains operation: MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dce410
[hipSYCL Info] dag_manager [async]: Releasing dead users of data region 0x1ce7400
[hipSYCL Info] dag_manager [async]: Releasing dead users of data region 0x1ce7400
[hipSYCL Info] dag_manager [async]: Releasing dead users of data region 0x1cdf2c0
[hipSYCL Info] dag_manager [async]: Submitting node to scheduler!
[hipSYCL Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 to 0x7f7058e09000
[hipSYCL Info] multi_queue_executor: Processing node 0x1dcdcd0 with 0 non-virtual requirement(s) and 1 direct requirement(s).
[hipSYCL Info] multi_queue_executor: Dispatching to lane 0: Memcpy: CPU-Device0 #4 {0, 0, 0}+{1, 1, 32}-->ROCm-Device0 #4 {0, 0, 0}+{1, 1, 32}{1, 1, 32}
[hipSYCL Info] dag_manager [async]: Submitting node to scheduler!
[hipSYCL Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4 to 0x7f7058e09000
[hipSYCL Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 to 0x7f7058e0a000
[hipSYCL Info] multi_queue_executor: Processing node 0x1dce800 with 1 non-virtual requirement(s) and 2 direct requirement(s).
[hipSYCL Info]  --> Synchronizes with other queue for node: 0x1dcdcd0 lane = 0
[hipSYCL Info] multi_queue_executor: Dispatching to lane 2: kernel: ZZN14accessor_tests13custom_kernel11test_methodEvENKUlRN7hipsycl4sycl7handlerEE0_clES4_EUlRNS2_14interop_handleEE_
   MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4
   MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4
[hipSYCL Info] buffer_memory_requirement: Attempting to initialize embedded pointers for requirement 0x1d060f0
[hipSYCL Info] Identified embedded pointer with uid 16076125835937896726-2744383884891742132 in kernel blob, setting to 0x7f7058e09000
[hipSYCL Info] buffer_memory_requirement: Attempting to initialize embedded pointers for requirement 0x1d06000
[hipSYCL Info] Identified embedded pointer with uid 8005467495190924626-13408990265625531316 in kernel blob, setting to 0x7f7058e0a000
[hipSYCL Info] dag_manager [async]: DAG flush complete.
[hipSYCL Info] dag_submitted_ops: Waiting for node group 1
[hipSYCL Info] dag_submitted_ops: Waiting for node group; current node: 0x1dce800
[hipSYCL Info] dag_manager: Checking DAG flush opportunity...
/nethome/mabraham/git/hipSYCL/tests/sycl/accessor.cpp(207): error: in "accessor_tests/custom_kernel": check stream1 == stream2 has failed
[hipSYCL Info] dag_manager: Submitting asynchronous flush...
[hipSYCL Info] dag_manager: waiting for async worker...
[hipSYCL Info] dag_manager [async]: Flushing!
[hipSYCL Info] dag_builder: DAG contains operation: kernel: ZZN14accessor_tests13custom_kernel11test_methodEvENKUlRN7hipsycl4sycl7handlerEE1_clES4_EUlRNS2_14interop_handleEE_
   MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4
   MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dcf190
[hipSYCL Info]     --> requires: MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dcec90
[hipSYCL Info]     --> requires: MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dcee80
[hipSYCL Info] dag_builder: DAG contains operation: MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dcec90
[hipSYCL Info] dag_builder: DAG contains operation: MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dcee80
[hipSYCL Info] dag_manager [async]: Releasing dead users of data region 0x1cdf2c0
[hipSYCL Info] dag_manager [async]: Releasing dead users of data region 0x1dce050
[hipSYCL Info] dag_manager [async]: Submitting node to scheduler!
[hipSYCL Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4 to 0x7f7058e0a000
[hipSYCL Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4 to 0x7f7058e1f000
[hipSYCL Info] multi_queue_executor: Processing node 0x1dcf190 with 0 non-virtual requirement(s) and 2 direct requirement(s).
[hipSYCL Info] multi_queue_executor: Dispatching to lane 3: kernel: ZZN14accessor_tests13custom_kernel11test_methodEvENKUlRN7hipsycl4sycl7handlerEE1_clES4_EUlRNS2_14interop_handleEE_
   MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4
   MEM_REQ: Discard W device {0, 0, 0}+{1, 1, 32} #4
[hipSYCL Info] buffer_memory_requirement: Attempting to initialize embedded pointers for requirement 0x1dceb50
[hipSYCL Info] Identified embedded pointer with uid 11104133657822316222-3969468537625869492 in kernel blob, setting to 0x7f7058e0a000
[hipSYCL Info] buffer_memory_requirement: Attempting to initialize embedded pointers for requirement 0x1dced40
[hipSYCL Info] Identified embedded pointer with uid 10959842551414061094-10958964997170337972 in kernel blob, setting to 0x7f7058e1f000
[hipSYCL Info] dag_manager [async]: DAG flush complete.
[hipSYCL Info] dag_submitted_ops: Waiting for node group 1
[hipSYCL Info] dag_submitted_ops: Waiting for node group; current node: 0x1dcf190
[hipSYCL Info] dag_manager: Checking DAG flush opportunity...
[hipSYCL Info] dag_manager: Submitting asynchronous flush...
[hipSYCL Info] dag_manager: waiting for async worker...
[hipSYCL Info] dag_manager [async]: Flushing!
[hipSYCL Info] dag_builder: DAG contains operation: Memcpy: ROCm-Device0 #4 {0, 0, 0}+{1, 1, 32}-->CPU-Device0 #4 {0, 0, 0}+{1, 1, 32}{1, 1, 32} @node 0x1dcf680
[hipSYCL Info]     --> requires: MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dce410
[hipSYCL Info] dag_builder: DAG contains operation: MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4 @node 0x1dce410
[hipSYCL Info] dag_manager [async]: Releasing dead users of data region 0x1dce050
[hipSYCL Info] dag_manager [async]: Submitting node to scheduler!
[hipSYCL Info] dag_direct_scheduler: Setting device data pointer of requirement node MEM_REQ: R device {0, 0, 0}+{1, 1, 32} #4 to 0x7f7058e1f000
[hipSYCL Info] multi_queue_executor: Processing node 0x1dcf680 with 0 non-virtual requirement(s) and 1 direct requirement(s).
[hipSYCL Info] multi_queue_executor: Dispatching to lane 1: Memcpy: ROCm-Device0 #4 {0, 0, 0}+{1, 1, 32}-->CPU-Device0 #4 {0, 0, 0}+{1, 1, 32}{1, 1, 32}
[hipSYCL Info] dag_manager [async]: DAG flush complete.
[hipSYCL Info] dag_submitted_ops: Waiting for node group 1
[hipSYCL Info] dag_submitted_ops: Waiting for node group; current node: 0x1dcf680
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
[hipSYCL Info] buffer_impl::~buffer_impl: Waiting for operations to complete...
[hipSYCL Info] buffer_impl::~buffer_impl: Waiting for operations to complete...
[hipSYCL Info] data_region::~data_region: Freeing allocation 0x1dcdec0
[hipSYCL Info] data_region::~data_region: Freeing allocation 0x7f7058e0a000
[hipSYCL Info] buffer_impl::~buffer_impl: Waiting for operations to complete...
[hipSYCL Info] data_region::~data_region: Freeing allocation 0x1dc3ee0
[hipSYCL Info] data_region::~data_region: Freeing allocation 0x7f7058e09000
[hipSYCL Info] rt_manager: Restarting runtime...
[hipSYCL Info] dag_manager: Submitting asynchronous flush...
[hipSYCL Info] dag_manager: waiting for async worker...
[hipSYCL Info] dag_manager [async]: Nothing to do
[hipSYCL Info] backend_loader: Searching path for backend libs: '"/nethome/mabraham/git/hipSYCL/install-3e05c388-debug/lib/hipSYCL"'
[hipSYCL Info] backend_loader: Successfully opened plugin: "/nethome/mabraham/git/hipSYCL/install-3e05c388-debug/lib/hipSYCL/librt-backend-omp.so" for backend 'omp'
[hipSYCL Info] backend_loader: Successfully opened plugin: "/nethome/mabraham/git/hipSYCL/install-3e05c388-debug/lib/hipSYCL/librt-backend-hip.so" for backend 'hip'
[hipSYCL Info] Registering backend: 'omp'...
[hipSYCL Info] multi_queue_executor: Spawned for backend OpenMP with configuration: 
[hipSYCL Info]   device 0: 
[hipSYCL Info]     memcpy lane: 0
[hipSYCL Info]     kernel lane: 1
[hipSYCL Info] Registering backend: 'hip'...
[hipSYCL Info] hip_device_manager: Switchting to device 0
[hipSYCL Info] hip_device_manager: Switchting to device 1
[hipSYCL Info] multi_queue_executor: Spawned for backend HIP with configuration: 
[hipSYCL Info]   device 0: 
[hipSYCL Info]     memcpy lane: 0
[hipSYCL Info]     memcpy lane: 1
[hipSYCL Info]     kernel lane: 2
[hipSYCL Info]     kernel lane: 3
[hipSYCL Info]   device 1: 
[hipSYCL Info]     memcpy lane: 0
[hipSYCL Info]     memcpy lane: 1
[hipSYCL Info]     kernel lane: 2
[hipSYCL Info]     kernel lane: 3
[hipSYCL Info] Discovered devices from backend 'OpenMP': 
[hipSYCL Info]   device 0: 
[hipSYCL Info]     vendor: the hipSYCL project
[hipSYCL Info]     name: hipSYCL OpenMP host device
[hipSYCL Info] Discovered devices from backend 'HIP': 
[hipSYCL Info]   device 0: 
[hipSYCL Info]     vendor: AMD
[hipSYCL Info]     name: Vega 20
[hipSYCL Info]   device 1: 
[hipSYCL Info]     vendor: AMD
[hipSYCL Info]     name: Vega 20
[hipSYCL Info] dag_manager: DAG manager is alive!
[hipSYCL Info] runtime: ******* rt launch initiated ********
[hipSYCL Info] runtime: ******* rt shutdown ********
[hipSYCL Info] dag_manager: Waiting for async worker...
[hipSYCL Info] dag_manager: Shutdown.
[hipSYCL Info] data_region::~data_region: Freeing allocation 0x1dce130
[hipSYCL Info] data_region::~data_region: Freeing allocation 0x7f7058e1f000

*** 1 failure is detected in the test module "hipsycl unit tests"

Additional context
GROMACS would like to set the hipStream_t for subsequent rocFFT calls once during setup, then repeatedly call rocfft_execute, relying on getting the same stream each time.

@mabraham added the bug label on Aug 11, 2021
@illuhad
Collaborator

illuhad commented Aug 11, 2021

This is expected behavior. The hipSYCL scheduler automatically distributes work across multiple backend streams so that the hardware can execute independent operations in parallel.

There is no direct mapping between a sycl::queue and a backend queue/stream, and there cannot be, because it is not a 1:1 relation. This is a feature of hipSYCL: you get the same overlap of, e.g., data transfers and compute no matter how many SYCL queues you use.

If you think you know better than the hipSYCL scheduler, you can attach the hipSYCL_prefer_execution_lane property to the submit() call, in which case hipSYCL will execute the operation on the specified backend queue (if applicable to the backend).

https://github.com/illuhad/hipSYCL/blob/develop/doc/extensions.md#hipsycl_ext_cg_property_prefer_execution_lane

EDIT: It might even work to pass this property to the queue, in which case it should be used by default for all operations, but I haven't tested this yet.
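
For reference, a minimal sketch of the submit-level usage, following the extension document linked above (the native work launched on the stream is elided here):

namespace s = cl::sycl;
// Sketch: request execution lane 0 for this submission so that it maps
// onto a fixed backend queue/stream (the interpretation of the property
// is backend-specific).
queue.submit({s::property::command_group::hipSYCL_prefer_execution_lane{0}},
             [&](s::handler& cgh) {
  cgh.hipSYCL_enqueue_custom_operation([=](s::interop_handle& h) {
    hipStream_t stream = h.get_native_queue<s::backend::hip>();
    // ... launch native HIP work on `stream` ...
  });
});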

@illuhad closed this as completed on Aug 11, 2021
@mabraham
Contributor Author

Thanks, I will consider further!

@mabraham
Contributor Author

Replacing the submissions for kernels with

  queue.submit({s::property::command_group::hipSYCL_prefer_execution_lane{0}},
               [&](s::handler& cgh) {
...

I see that the hipStream can differ even when the property is passed. If that's expected behavior, then GROMACS will need to set the stream before every call to rocfft_execute (or perhaps omit stream handling altogether, since correctness is assured by the handling of the generated SYCL event).
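
A hypothetical sketch of that per-call pattern, assuming rocFFT's rocfft_execution_info_set_stream API; the plan, execution info, and buffer arrays are assumed to be created elsewhere, and the helper name is made up for illustration:

#include <rocfft.h>

// Hypothetical helper: re-attach whatever native stream the custom
// operation currently runs on before every execute, instead of setting
// the stream once during setup.
void execute_fft_on_current_stream(rocfft_plan plan,
                                   rocfft_execution_info info,
                                   void* in[], void* out[],
                                   cl::sycl::interop_handle& h)
{
  hipStream_t stream = h.get_native_queue<cl::sycl::backend::hip>();
  rocfft_execution_info_set_stream(info, stream);
  rocfft_execute(plan, in, out, info);
}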

@fodinabor
Collaborator

From a quick glance over the submission code, I don't see any issue.
Are you still verifying this with your code from above? If so, that might be the problem: the hipStream_t stream1, stream2; values used for the BOOST_CHECK are never assigned.

@mabraham
Contributor Author

I was also hiding them with extra declarations of stream1 and stream2 in the custom operations and then discarding their values. If I try to save the values in a mutable lambda, things crash, but I'm not sure why yet.

@fodinabor
Collaborator

Exactly :)
Using mutable lambdas as kernels is definitely UB in SYCL (https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:interfaces.kernels.as.lambdas).

So either check manually by writing to the console, or use a buffer to store the streams and compare them later...
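
For instance, a minimal sketch of the console variant (storing the hipStream_t values and comparing after queue.wait() works analogously):

queue.submit([&](cl::sycl::handler& cgh) {
  cgh.hipSYCL_enqueue_custom_operation([=](cl::sycl::interop_handle& h) {
    hipStream_t stream = h.get_native_queue<cl::sycl::backend::hip>();
    // Custom operations execute on the host, so plain stderr output works;
    // compare the printed pointers across submissions.
    fprintf(stderr, "native stream: %p\n", static_cast<void*>(stream));
  });
});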

@mabraham
Contributor Author

Ahh, that's great to know, thanks. I was using mutable lambdas only when probing for why things were wrong... and creating more problems in the process.

@illuhad
Collaborator

illuhad commented Aug 12, 2021

Mutable kernel lambdas (i.e. lambdas passed to parallel_for or single_task) are not allowed, but I don't think there is an issue with mutable lambdas for hipSYCL_enqueue_custom_operation per se, as they just execute on the host like any other lambda.
Without having seen the code, I suspect your issue is one of lifetime: since everything submitted to a queue (including custom operations) is executed asynchronously, it can happen that the lambda executes after everything from the original scope has already been destroyed.

One pattern to return values from asynchronous SYCL operations that should work well is to wrap the value in a shared_ptr and capture the shared_ptr by value in the lambda.
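
A minimal sketch of that pattern, assuming the queue from the reproducer above:

#include <memory>

// The by-value shared_ptr capture keeps the storage alive until the
// asynchronous custom operation has actually run, regardless of when the
// submitting scope exits.
auto stream_out = std::make_shared<hipStream_t>(nullptr);
queue.submit([&](cl::sycl::handler& cgh) {
  cgh.hipSYCL_enqueue_custom_operation([=](cl::sycl::interop_handle& h) {
    *stream_out = h.get_native_queue<cl::sycl::backend::hip>();
  });
});
queue.wait(); // only read *stream_out after the operation has completed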

@illuhad
Collaborator

illuhad commented Aug 12, 2021

> I see that the hipStream can differ even when the property is passed. If that's expected behavior

This is not expected behavior. The interpretation of the property is in general backend-specific, but with all current hipSYCL backends it should guarantee that all submitted operations are scheduled to the same backend queue/stream.
