Describe the bug
I'm getting zero values when using sycl::select() with sycl::short3 and converting the result to a sycl::id<3>. The following function unexpectedly returns zero values:
sycl::id<3> wrap_relative_index(sycl::short3 idx, const sycl::short3& dimensions) noexcept
{
    // Wrap underflow indices
    idx = sycl::select(idx, idx + dimensions, idx < 0);
    // Wrap overflow indices
    idx = sycl::select(idx, idx - dimensions, idx >= dimensions);
    return {static_cast<size_t>(idx[0]), static_cast<size_t>(idx[1]), static_cast<size_t>(idx[2])};
}
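For reference, here is a minimal host-only sketch of what the two select calls are expected to compute. It assumes the SYCL 2020 semantics for vector types: a comparison yields -1 (all bits set) in each lane where it holds and 0 elsewhere, and select(a, b, c) takes b in each lane where the most significant bit of c is set. The short3 alias and the helper names (mask_less, mask_greater_equal, select_model, wrap_relative_index_model) are hypothetical stand-ins, not part of SYCL or of the attached source.

```cpp
#include <array>
#include <cstddef>

// Host-only stand-in for sycl::short3 (assumption: plain std::array, no SYCL needed).
using short3 = std::array<short, 3>;

// Models a vector comparison a < b: a true lane becomes -1 (all bits set), a false lane 0.
short3 mask_less(const short3& a, const short3& b) {
    short3 m{};
    for (std::size_t i = 0; i < 3; ++i) m[i] = static_cast<short>(a[i] < b[i] ? -1 : 0);
    return m;
}

// Models a vector comparison a >= b with the same lane encoding.
short3 mask_greater_equal(const short3& a, const short3& b) {
    short3 m{};
    for (std::size_t i = 0; i < 3; ++i) m[i] = static_cast<short>(a[i] >= b[i] ? -1 : 0);
    return m;
}

// Models select(a, b, c) for vectors: lane i takes b[i] when the most
// significant bit of c[i] is set (c[i] is negative), otherwise a[i].
short3 select_model(const short3& a, const short3& b, const short3& c) {
    short3 r{};
    for (std::size_t i = 0; i < 3; ++i) r[i] = (c[i] < 0) ? b[i] : a[i];
    return r;
}

// Lane-wise a + sign * b, narrowed back to short as the vector type would.
short3 add_scaled(const short3& a, const short3& b, int sign) {
    short3 r{};
    for (std::size_t i = 0; i < 3; ++i) r[i] = static_cast<short>(a[i] + sign * b[i]);
    return r;
}

// Same two-step wrap as the function above, expressed with the models.
std::array<std::size_t, 3> wrap_relative_index_model(short3 idx, const short3& dimensions) {
    const short3 zero{0, 0, 0};
    // Wrap underflow indices
    idx = select_model(idx, add_scaled(idx, dimensions, +1), mask_less(idx, zero));
    // Wrap overflow indices
    idx = select_model(idx, add_scaled(idx, dimensions, -1), mask_greater_equal(idx, dimensions));
    return {static_cast<std::size_t>(idx[0]), static_cast<std::size_t>(idx[1]),
            static_cast<std::size_t>(idx[2])};
}
```

Under this model, an index of (-1, 5, 2) with dimensions (4, 4, 4) wraps to (3, 1, 2), so an all-zero result is not explained by the select semantics themselves.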
I can't distil this down to a simple explanation, but I have attached a single-source reproducer along with a Dockerfile that sets up a repeatable environment and builds the source.
The issue can be worked around either by using sycl::int3 instead of sycl::short3 or by replacing the use of sycl::select() in the example source file (build with -DUSE_INT3 or -DNO_SELECT respectively).
To reproduce
Reproduce with docker file:
- Install docker and nvidia-container-toolkit
- Download source file and docker file (attached)
- Build container
docker build -f Dockerfile.txt -t short3_select_bug .
- Run the container
docker run --rm -it --gpus all short3_select_bug bash
- Run the example:
/root/bin/select_short3_bug
- Run the workarounds:
/root/bin/select_short3_bug_o0
/root/bin/select_short3_bug_int3
/root/bin/select_short3_bug_no_select
select_short3_bug.cpp
Dockerfile.txt
Manual build:
- Download sample source (above)
- Open a shell and navigate to the source location.
- Set up the oneAPI environment:
source /opt/intel/oneapi/setvars.sh
- Create output directory:
mkdir bin
- Build:
icpx select_short3_bug.cpp -fsycl -sycl-std=2020 -std=c++20 -fsycl-targets=nvptx64-nvidia-cuda -O3 -o bin/select_short3_bug
(see also the Dockerfile for workaround variations)
- Run with
bin/select_short3_bug
Expected output:
(0,0,0) => (2,2,2) expect: (2,2,2) ok? true
(-1,-1,-1) => (1,1,1) expect: (1,1,1) ok? true
(4,4,4) => (2,2,2) expect: (2,2,2) ok? true
(5,5,5) => (3,3,3) expect: (3,3,3) ok? true
(4,4,-2) => (2,2,0) expect: (2,2,0) ok? true
(-3,-3,-4) => (3,3,2) expect: (3,3,2) ok? true
(-3,3,1) => (3,1,3) expect: (3,1,3) ok? true
(0,1,4) => (2,3,2) expect: (2,3,2) ok? true
(4,0,1) => (2,2,3) expect: (2,2,3) ok? true
(4,2,-1) => (2,0,1) expect: (2,0,1) ok? true
(1,-3,2) => (3,3,0) expect: (3,3,0) ok? true
(4,4,-4) => (2,2,2) expect: (2,2,2) ok? true
(-4,-2,-4) => (2,0,2) expect: (2,0,2) ok? true
(-1,-2,3) => (1,0,1) expect: (1,0,1) ok? true
(1,1,3) => (3,3,1) expect: (3,3,1) ok? true
(4,3,1) => (2,1,3) expect: (2,1,3) ok? true
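The expected rows above are consistent with each input being an offset added to a base index of (2,2,2) in a 4x4x4 grid before wrapping. That reading is inferred from the table only, not taken from the attached source, so the constants and names below (kAssumedBase, kAssumedDimensions, expected_wrapped_index) are assumptions. A host-only sketch of that check, written with plain conditionals rather than select:

```cpp
#include <array>
#include <cstddef>

using short3 = std::array<short, 3>;
using index3 = std::array<std::size_t, 3>;

// Assumptions inferred from the expected-output table, not from the attached source.
const short3 kAssumedBase{2, 2, 2};
const short3 kAssumedDimensions{4, 4, 4};

// Adds the offset to the assumed base, then wraps each component once in
// each direction, mirroring the two sequential steps of wrap_relative_index.
index3 expected_wrapped_index(const short3& offset) {
    index3 out{};
    for (std::size_t i = 0; i < 3; ++i) {
        int v = kAssumedBase[i] + offset[i];
        if (v < 0) v += kAssumedDimensions[i];                       // wrap underflow
        if (v >= kAssumedDimensions[i]) v -= kAssumedDimensions[i];  // wrap overflow
        out[i] = static_cast<std::size_t>(v);
    }
    return out;
}
```

With these assumed constants, expected_wrapped_index({-3, 3, 1}) gives (3,1,3) and expected_wrapped_index({4, 2, -1}) gives (2,0,1), matching the corresponding rows of the table.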
Actual output:
(0,0,0) => (0,0,0) expect: (2,2,2) ok? false
(-1,-1,-1) => (0,0,0) expect: (1,1,1) ok? false
(4,4,4) => (0,0,0) expect: (2,2,2) ok? false
(5,5,5) => (0,0,0) expect: (3,3,3) ok? false
(4,4,-2) => (0,0,0) expect: (2,2,0) ok? false
(-3,-3,-4) => (0,0,0) expect: (3,3,2) ok? false
(-3,3,1) => (0,0,0) expect: (3,1,3) ok? false
(0,1,4) => (0,0,0) expect: (2,3,2) ok? false
(4,0,1) => (0,0,0) expect: (2,2,3) ok? false
(4,2,-1) => (0,0,0) expect: (2,0,1) ok? false
(1,-3,2) => (0,0,0) expect: (3,3,0) ok? false
(4,4,-4) => (0,0,0) expect: (2,2,2) ok? false
(-4,-2,-4) => (0,0,0) expect: (2,0,2) ok? false
(-1,-2,3) => (0,0,0) expect: (1,0,1) ok? false
(1,1,3) => (0,0,0) expect: (3,3,1) ok? false
(4,3,1) => (0,0,0) expect: (2,1,3) ok? false
Environment
- OS: Linux 24.04
- Target device and vendor: Nvidia GPU
- DPC++ version: 2025.2; also tested at tag v6.1.0
- Dependencies version:
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA RTX A3000 Laptop GPU 8.6 [CUDA 12.9]
Platforms: 1
Platform [#1]:
Version : CUDA 12.9
Name : NVIDIA CUDA BACKEND
Vendor : NVIDIA Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 8.6
Name : NVIDIA RTX A3000 Laptop GPU
Vendor : NVIDIA Corporation
Driver : CUDA 12.9
UUID : 599917723921314109239219176501114419841218
DeviceID : 0
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations ext_intel_pci_address usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_native_assert ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_external_memory_import ext_oneapi_external_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering ext_oneapi_bindless_sampled_image_fetch_1d_usm ext_oneapi_bindless_sampled_image_fetch_2d_usm ext_oneapi_bindless_sampled_image_fetch_2d ext_oneapi_bindless_sampled_image_fetch_3d ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_image_array ext_oneapi_unique_addressing_per_dim ext_oneapi_bindless_images_sample_1d_usm ext_oneapi_bindless_images_sample_2d_usm
Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
info::device::sub_group_sizes: 32
Architecture: nvidia_gpu_sm_86
default_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA RTX A3000 Laptop GPU 8.6 [CUDA 12.9]
accelerator_selector() : No device of requested type available.
cpu_selector() : No device of requested type available.
gpu_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA RTX A3000 Laptop GPU 8.6 [CUDA 12.9]
custom_selector(gpu) : gpu, NVIDIA CUDA BACKEND, NVIDIA RTX A3000 Laptop GPU 8.6 [CUDA 12.9]
custom_selector(cpu) : No device of requested type available.
custom_selector(acc) : No device of requested type available.
Additional context
No response