Description
Describe the bug
Executing a trivial kernel on the second of the two Level-Zero devices (Arc A770) in my machine fails with a PI_ERROR_DEVICE_NOT_AVAILABLE error since #10794 was merged.
To Reproduce
The following program
#include <cstdio>
#include <sycl/sycl.hpp>

int main() {
    for(size_t i = 0; i < sycl::device::get_devices().size(); ++i) {
        auto device = sycl::device::get_devices()[i];
        printf("Using device %zu: %s (id: %u, platform: %s)\n", i,
            device.get_info<sycl::info::device::name>().c_str(),
            device.get_info<sycl::info::device::vendor_id>(),
            device.get_platform().get_info<sycl::info::platform::name>().c_str());
        sycl::queue q{device};
        q.parallel_for(sycl::range<1>(10), [](sycl::id<1>) {
            // no-op
        });
        q.wait_and_throw();
        printf("Done\n");
    }
}
runs on every device except the last one, which is the second Arc A770 exposed through Level-Zero. On that device it appears to hang for a couple of seconds and then crashes:
$ clang++ -fsycl test.cpp -o test && ./test
Using device 0: Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz (id: 32902, platform: Intel(R) OpenCL)
Done
Using device 1: Intel(R) Arc(TM) A770 Graphics (id: 32902, platform: Intel(R) OpenCL Graphics)
Done
Using device 2: Intel(R) Arc(TM) A770 Graphics (id: 32902, platform: Intel(R) OpenCL Graphics)
Done
Using device 3: Intel(R) Arc(TM) A770 Graphics (id: 32902, platform: Intel(R) Level-Zero)
Done
Using device 4: Intel(R) Arc(TM) A770 Graphics (id: 32902, platform: Intel(R) Level-Zero)
terminate called after throwing an instance of 'sycl::_V1::runtime_error'
what(): Native API failed. Native API returns: -2 (PI_ERROR_DEVICE_NOT_AVAILABLE) -2 (PI_ERROR_DEVICE_NOT_AVAILABLE)
[1] 439610 IOT instruction ./test
It appears that this is not an issue with the second device per se, but rather with the runtime's handling of it. Case in point: if I limit the visible devices through ONEAPI_DEVICE_SELECTOR="level_zero:1" so that only the second device is shown, everything works:
$ ONEAPI_DEVICE_SELECTOR="level_zero:1" ./test
Using device 0: Intel(R) Arc(TM) A770 Graphics (id: 32902, platform: Intel(R) Level-Zero)
Done
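To repeat this isolation check for each Level-Zero device in turn, the per-device command lines can be generated rather than typed by hand. The following is only a minimal sketch: `make_selector` and `isolation_commands` are hypothetical helper names, not part of DPC++, and the device count and the `./test` binary name are assumptions the caller must supply.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: builds a selector term such as "level_zero:1",
// matching the ONEAPI_DEVICE_SELECTOR value used above.
std::string make_selector(std::string_view backend, unsigned index) {
    return std::string(backend) + ":" + std::to_string(index);
}

// Hypothetical helper: returns one shell command per device index, each
// restricting the runtime to a single device before running the binary.
// device_count is an assumption supplied by the caller (2 on this machine).
std::vector<std::string> isolation_commands(std::string_view backend,
                                            unsigned device_count,
                                            std::string_view binary) {
    std::vector<std::string> commands;
    for (unsigned i = 0; i < device_count; ++i) {
        commands.push_back("ONEAPI_DEVICE_SELECTOR=\"" +
                           make_selector(backend, i) + "\" " +
                           std::string(binary));
    }
    return commands;
}
```

Each returned string can be pasted into a shell or passed to `std::system`. If every isolated run prints "Done", that would be consistent with the failure depending on both devices being visible at once rather than on one specific device.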
I'm unfortunately unable to test whether this has been fixed since, as the current HEAD build of DPC++ (dbd9b67) segfaults during compilation. Edit: As noted below, current builds still have this problem.
Environment (please complete the following information):
- OS: Ubuntu 22.04
- Motherboard: Supermicro X12DPU-6
- Target device and vendor: 2x Intel Arc A770
- DPC++ version: clang version 17.0.0 (https://github.com/intel/llvm 0e49948)
- Dependencies version:
$ apt list --installed | grep "intel\|level"
intel-fw-gpu/jammy,jammy,now 2023.25.6-231~22.04 all [installed]
intel-gsc/jammy,now 0.8.9+51~u22.04 amd64 [installed,automatic]
intel-i915-dkms/jammy,jammy,now 1.23.5.19.230406.21.5.17.0.1034+i38-1 all [installed]
intel-igc-cm/jammy,now 1.0.176+i600~22.04 amd64 [installed]
intel-igc-core/now 1.0.14062.11 amd64 [installed,local]
intel-igc-opencl/now 1.0.14062.11 amd64 [installed,local]
intel-level-zero-gpu/now 1.3.26516.18 amd64 [installed,local]
intel-media-va-driver-non-free/jammy,now 23.2.1-647~22.04 amd64 [installed]
intel-metrics-discovery/jammy,now 1.12.164-647~22.04 amd64 [installed,automatic]
intel-metrics-library/jammy,now 1.0.133-647~22.04 amd64 [installed,automatic]
intel-microcode/jammy-updates,jammy-security,now 3.20230808.0ubuntu0.22.04.1 amd64 [installed,automatic]
intel-opencl-icd/now 23.22.26516.18 amd64 [installed,local]
intel-platform-cse-dkms/jammy,now 2023.11.1-36 amd64 [installed]
intel-platform-vsec-dkms/jammy,now 2023.20.0-21 amd64 [installed]
level-zero-devel/now 1.11.0 amd64 [installed,local]
level-zero/now 1.11.0 amd64 [installed,upgradable to: 1.11.0-647~22.04]
libdrm-intel1/jammy-updates,now 2.4.113-2~ubuntu0.22.04.1 amd64 [installed,automatic]