Optimisation bug using sycl::select with sycl::short3 #19935

@emtKazys

Describe the bug

I'm getting zero values when using sycl::select() with sycl::short3 and converting the result to a sycl::id<3>. The following function unexpectedly returns zero values:

sycl::id<3> wrap_relative_index(sycl::short3 idx, const sycl::short3& dimensions) noexcept
{
  // Wrap underflow indices
  idx = sycl::select(idx, idx + dimensions, idx < 0);
  // Wrap overflow indices
  idx = sycl::select(idx, idx - dimensions, idx >= dimensions);
  return {static_cast<size_t>(idx[0]), static_cast<size_t>(idx[1]), static_cast<size_t>(idx[2])};
}

I can't distil this down to a simple explanation, but I have attached a single-source reproducer along with a Dockerfile that sets up a repeatable environment and builds the source.

The issue can be worked around either by using sycl::int3 instead of sycl::short3 or by replacing the use of sycl::select() in the example source file (build with -DUSE_INT3 or -DNO_SELECT respectively).
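For illustration, here is a minimal host-side sketch of what a select-free version of the wrap could look like. The attached source is not reproduced here, so this is not necessarily what -DNO_SELECT compiles. The names wrap_component and wrap_relative_index_scalar are hypothetical, and std::array stands in for sycl::short3 and sycl::id<3> so the snippet builds without a SYCL toolchain.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not from the attached source): wraps one component
// without sycl::select. Assumes -dim <= idx < 2 * dim, so one correction
// in each direction is enough, as in the original function.
constexpr int wrap_component(int idx, int dim) noexcept
{
  if (idx < 0) idx += dim;     // wrap underflow indices
  if (idx >= dim) idx -= dim;  // wrap overflow indices
  return idx;
}

// Host-side stand-in for wrap_relative_index(): std::array<short, 3> plays
// the role of sycl::short3 and std::array<std::size_t, 3> that of sycl::id<3>.
constexpr std::array<std::size_t, 3> wrap_relative_index_scalar(
    const std::array<short, 3>& idx,
    const std::array<short, 3>& dimensions) noexcept
{
  return {static_cast<std::size_t>(wrap_component(idx[0], dimensions[0])),
          static_cast<std::size_t>(wrap_component(idx[1], dimensions[1])),
          static_cast<std::size_t>(wrap_component(idx[2], dimensions[2]))};
}
```

For example, wrap_relative_index_scalar({-1, 5, 2}, {4, 4, 4}) yields (3, 1, 2). In a kernel, the same per-component logic could be applied to idx[0], idx[1] and idx[2] of a sycl::short3.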

To reproduce

Reproduce with the Dockerfile:

  1. Install docker and nvidia-container-toolkit
  2. Download the source file and Dockerfile (attached)
  3. Build the container: docker build -f Dockerfile.txt -t short3_select_bug .
  4. Run the container: docker run --rm -it --gpus all short3_select_bug bash
  5. Run the example: /root/bin/select_short3_bug
  6. Run the workarounds:
  • /root/bin/select_short3_bug_o0
  • /root/bin/select_short3_bug_int3
  • /root/bin/select_short3_bug_no_select

select_short3_bug.cpp
Dockerfile.txt

Manual build:

  1. Download sample source (above)
  2. Open shell and navigate to the source location.
  3. source /opt/intel/oneapi/setvars.sh
  4. Create output directory: mkdir bin
  5. Build: icpx select_short3_bug.cpp -fsycl -sycl-std=2020 -std=c++20 -fsycl-targets=nvptx64-nvidia-cuda -O3 -o bin/select_short3_bug (see also the Dockerfile for workaround variations)
  6. Run with bin/select_short3_bug

Expected output:

(0,0,0) => (2,2,2)      expect: (2,2,2) ok? true
(-1,-1,-1) => (1,1,1)   expect: (1,1,1) ok? true
(4,4,4) => (2,2,2)      expect: (2,2,2) ok? true
(5,5,5) => (3,3,3)      expect: (3,3,3) ok? true
(4,4,-2) => (2,2,0)     expect: (2,2,0) ok? true
(-3,-3,-4) => (3,3,2)   expect: (3,3,2) ok? true
(-3,3,1) => (3,1,3)     expect: (3,1,3) ok? true
(0,1,4) => (2,3,2)      expect: (2,3,2) ok? true
(4,0,1) => (2,2,3)      expect: (2,2,3) ok? true
(4,2,-1) => (2,0,1)     expect: (2,0,1) ok? true
(1,-3,2) => (3,3,0)     expect: (3,3,0) ok? true
(4,4,-4) => (2,2,2)     expect: (2,2,2) ok? true
(-4,-2,-4) => (2,0,2)   expect: (2,0,2) ok? true
(-1,-2,3) => (1,0,1)    expect: (1,0,1) ok? true
(1,1,3) => (3,3,1)      expect: (3,3,1) ok? true
(4,3,1) => (2,1,3)      expect: (2,1,3) ok? true
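The expected values in this table are consistent with each relative index being added to a base index of 2 and wrapped into a dimension of 4 on every axis. That is an inference from the table only, since the attached source is not shown here, so treat kBase and kDim below as assumptions. Under those assumptions, a small host-side reference (hypothetical names, plain C++, no SYCL) would be:

```cpp
#include <array>

// Assumptions inferred from the expected-output table, not taken from the
// attached source: base index 2 and dimension 4 on every axis.
constexpr int kBase = 2;
constexpr int kDim = 4;

// Adds the relative offset to the base and wraps once in each direction,
// mirroring the two select() steps of wrap_relative_index().
constexpr int expected_component(int relative) noexcept
{
  int idx = kBase + relative;
  if (idx < 0) idx += kDim;     // wrap underflow indices
  if (idx >= kDim) idx -= kDim; // wrap overflow indices
  return idx;
}

// Computes the expected wrapped index for one table row.
constexpr std::array<int, 3> expected_index(
    const std::array<int, 3>& relative) noexcept
{
  return {expected_component(relative[0]),
          expected_component(relative[1]),
          expected_component(relative[2])};
}
```

For instance, expected_index({-3, 3, 1}) gives (3, 1, 3) and expected_index({4, 2, -1}) gives (2, 0, 1), matching the corresponding rows.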

Actual output:

(0,0,0) => (0,0,0)      expect: (2,2,2) ok? false
(-1,-1,-1) => (0,0,0)   expect: (1,1,1) ok? false
(4,4,4) => (0,0,0)      expect: (2,2,2) ok? false
(5,5,5) => (0,0,0)      expect: (3,3,3) ok? false
(4,4,-2) => (0,0,0)     expect: (2,2,0) ok? false
(-3,-3,-4) => (0,0,0)   expect: (3,3,2) ok? false
(-3,3,1) => (0,0,0)     expect: (3,1,3) ok? false
(0,1,4) => (0,0,0)      expect: (2,3,2) ok? false
(4,0,1) => (0,0,0)      expect: (2,2,3) ok? false
(4,2,-1) => (0,0,0)     expect: (2,0,1) ok? false
(1,-3,2) => (0,0,0)     expect: (3,3,0) ok? false
(4,4,-4) => (0,0,0)     expect: (2,2,2) ok? false
(-4,-2,-4) => (0,0,0)   expect: (2,0,2) ok? false
(-1,-2,3) => (0,0,0)    expect: (1,0,1) ok? false
(1,1,3) => (0,0,0)      expect: (3,3,1) ok? false
(4,3,1) => (0,0,0)      expect: (2,1,3) ok? false

Environment

  • OS: Linux 24.04
  • Target device and vendor: Nvidia GPU
  • DPC++ version: 2025.2; also tested at tag v6.1.0
  • Dependencies version:
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA RTX A3000 Laptop GPU 8.6 [CUDA 12.9]

Platforms: 1
Platform [#1]:
    Version  : CUDA 12.9
    Name     : NVIDIA CUDA BACKEND
    Vendor   : NVIDIA Corporation
    Devices  : 1
        Device [#0]:
        Type              : gpu
        Version           : 8.6
        Name              : NVIDIA RTX A3000 Laptop GPU
        Vendor            : NVIDIA Corporation
        Driver            : CUDA 12.9
        UUID              : 599917723921314109239219176501114419841218
        DeviceID          : 0
        Num SubDevices    : 0
        Num SubSubDevices : 0
        Aspects           : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_pci_address usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_external_memory_import ext_oneapi_external_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering ext_oneapi_bindless_sampled_image_fetch_1d_usm ext_oneapi_bindless_sampled_image_fetch_2d_usm ext_oneapi_bindless_sampled_image_fetch_2d ext_oneapi_bindless_sampled_image_fetch_3d ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_image_array ext_oneapi_unique_addressing_per_dim ext_oneapi_bindless_images_sample_1d_usm ext_oneapi_bindless_images_sample_2d_usm
Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
        info::device::sub_group_sizes: 32
        Architecture: nvidia_gpu_sm_86
default_selector()      : gpu, NVIDIA CUDA BACKEND, NVIDIA RTX A3000 Laptop GPU 8.6 [CUDA 12.9]
accelerator_selector()  : No device of requested type available.
cpu_selector()          : No device of requested type available.
gpu_selector()          : gpu, NVIDIA CUDA BACKEND, NVIDIA RTX A3000 Laptop GPU 8.6 [CUDA 12.9]
custom_selector(gpu)    : gpu, NVIDIA CUDA BACKEND, NVIDIA RTX A3000 Laptop GPU 8.6 [CUDA 12.9]
custom_selector(cpu)    : No device of requested type available.
custom_selector(acc)    : No device of requested type available.

Additional context

No response

Labels

bug (Something isn't working), cuda (CUDA back-end)
