Describe the bug
A minimal SYCL program using Intel GPU offload under WSL2 hangs during cleanup/exit on an Intel Lunar Lake integrated GPU.
The system can see the GPU via sycl-ls, and a simple SYCL kernel runs correctly. However, the process does not terminate cleanly. Depending on the backend selected:
- With ONEAPI_DEVICE_SELECTOR=opencl:gpu, the program prints normal end, but the process remains stuck in uninterruptible sleep with WCHAN=vmbus_teardown_gpadl.
- With ONEAPI_DEVICE_SELECTOR=level_zero:gpu, the program prints the device name and kernel result, but does not reach normal end; it appears to hang during or immediately around sycl::free() / cleanup.
The same failure mode also affects downstream users such as PyTorch XPU: a minimal PyTorch XPU allocation succeeds, but the Python process hangs on exit in vmbus_teardown_gpadl. The SYCL reproducer below shows that the problem lies below PyTorch in the software stack.
To reproduce
Minimal reproducer:
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    sycl::queue q{sycl::gpu_selector_v};
    std::cout << "device: "
              << q.get_device().get_info<sycl::info::device::name>()
              << std::endl;
    int *x = sycl::malloc_shared<int>(1, q);
    x[0] = 0;
    q.submit([&](sycl::handler& h) {
        h.single_task([=]() { x[0] = 1; });
    }).wait();
    std::cout << "x: " << x[0] << std::endl;
    sycl::free(x, q);
    std::cout << "normal end" << std::endl;
    return 0;
}
Save as sycl_min.cpp
Compile with
source /opt/intel/oneapi/setvars.sh
icpx -fsycl sycl_min.cpp -o sycl_min
Launch with OpenCL GPU backend
ONEAPI_DEVICE_SELECTOR=opencl:gpu ./sycl_min
Observed output:
device: Intel(R) Graphics [0x64a0]
x: 1
normal end
However, the process does not exit. In another terminal:
ps -eo pid,ppid,stat,wchan:40,cmd | egrep 'sycl_min|PID'
shows:
PID  PPID  STAT  WCHAN                 CMD
700  292   Dl+   vmbus_teardown_gpadl  ./sycl_min
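The same check can be scripted. The following is a minimal sketch, assuming a Linux guest with procfs mounted at /proc; the function names parse_stat and find_d_state are illustrative and not part of any tool mentioned in this report. It lists every process whose main thread is in state D together with its wait channel, which is the information the ps command above shows.

```python
import os


def parse_stat(stat_text):
    """Extract (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = stat_text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def find_d_state(proc_root="/proc"):
    """Return (pid, comm, wchan) for each process in uninterruptible sleep."""
    stuck = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return stuck  # no procfs available (assumption: Linux only)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(proc_root, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"  # wchan may be unreadable on some kernels
        stuck.append((pid, comm, wchan))
    return stuck
```

Calling find_d_state() while the reproducer is hung should include an entry for sycl_min if the main thread is the one blocked; a hang confined to a worker thread would need /proc/<pid>/task/ to be inspected instead.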
Launch with Level Zero GPU backend
ONEAPI_DEVICE_SELECTOR=level_zero:gpu ./sycl_min
Observed output:
device: Intel(R) Graphics [0x64a0]
x: 1
The program does not reach normal end, so it appears to hang before or during cleanup, likely around sycl::free(x, q) or related runtime cleanup.
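To compare how far each backend gets without tying up a terminal, the reproducer can be launched under a small watchdog. This is a sketch only: run_with_watchdog is a hypothetical helper name, and the 30-second default timeout is an arbitrary assumption. Output goes to a log file rather than a pipe, and the helper does not kill or wait for the child after a timeout, because a process stuck in uninterruptible sleep would make that wait block as well.

```python
import os
import subprocess
import sys
import tempfile


def run_with_watchdog(cmd, env=None, timeout_s=30.0):
    """Run cmd and return (exit_code, output).

    exit_code is None if the child had not exited after timeout_s seconds.
    """
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "out.log")
        # The child gets its own copy of the log handle, so the parent can
        # close its copy right after the launch.
        with open(log_path, "w") as log:
            proc = subprocess.Popen(
                cmd, env=env, stdout=log, stderr=subprocess.STDOUT
            )
        try:
            code = proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # Deliberately no proc.kill() followed by proc.wait() here:
            # a task in uninterruptible sleep does not act on SIGKILL,
            # so a second wait could block indefinitely.
            code = None
        # Read through a separate handle so the child's file offset is
        # left untouched.
        with open(log_path) as f:
            return code, f.read()
```

A possible use is to call it once per backend, for example with cmd=["./sycl_min"] and env=dict(os.environ, ONEAPI_DEVICE_SELECTOR="level_zero:gpu"), and check whether the returned output contains normal end. Note that the child may buffer stdout differently when it is redirected to a file; the reproducer's std::endl flushes each line, so its markers should still appear.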
What was expected
The program should print:
device: Intel(R) Graphics [0x64a0]
x: 1
normal end
and then exit normally with status 0.
What is wrong
The SYCL kernel runs correctly, but the process does not terminate cleanly. It becomes stuck in uninterruptible sleep with WCHAN=vmbus_teardown_gpadl and cannot be killed with kill -9. Recovery requires restarting WSL from Windows PowerShell.
Environment
Host:
Guest:
Intel GPU
Intel Core Ultra 7 268V
Intel Arc 140V GPU
Lunar Lake
PCI device ID: 8086:64A0
Get-CimInstance Win32_Processor | Select-Object Name
Get-CimInstance Win32_VideoController | Select-Object Name,PNPDeviceID,DriverVersion,DriverDate
Output:
Name
----
Intel(R) Core(TM) Ultra 7 268V
Name                             PNPDeviceID                                                  DriverVersion DriverDate
----                             -----------                                                  ------------- ----------
Intel(R) Arc(TM) 140V GPU (16GB) PCI\VEN_8086&DEV_64A0&SUBSYS_0CE31028&REV_04\3&11583659&0&10 32.0.101.8737 29/04/2026 9:30:00 AM
source /opt/intel/oneapi/setvars.sh
icpx --version
On this system:
Intel(R) oneAPI DPC++/C++ Compiler 2026.0.0 (2026.0.0.20260331)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2026.0/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2026.0/bin/compiler/../icpx.cfg
- Dependencies / device discovery
source /opt/intel/oneapi/setvars.sh
which icpx
which sycl-ls
sycl-ls
Output:
/opt/intel/oneapi/compiler/2026.0/bin/icpx
/opt/intel/oneapi/compiler/2026.0/bin/sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero V2, Intel(R) Graphics [0x64a0] 20.4.4 [1.15.37833+4]
[opencl:cpu][opencl:0] Intel(R) OpenCL, Intel(R) Core(TM) Ultra 7 268V OpenCL 3.0 (Build 0) [2026.21.3.0.31_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Graphics [0x64a0] OpenCL 3.0 NEO [26.14.37833.4]
- Linux GPU runtime packages:
apt-cache policy intel-opencl-icd libze1 libze-intel-gpu1 xpu-smi libxpum1 intel-gsc intel-metrics-discovery
Relevant installed versions:
intel-opencl-icd: 26.09.37435.12-1~24.04~ppa1
libze1: 1.28.0-1~24.04~ppa1
libze-intel-gpu1: 26.09.37435.12-1~24.04~ppa1
xpu-smi: 1.3.6-1~24.04~ppa1
libxpum1: 1.3.6-1~24.04~ppa1
intel-gsc: 0.9.5-1~24.04~ppa2
intel-metrics-discovery: 1.14.183-1~24.04~ppa1
Additional context
This issue was first observed while trying to use PyTorch XPU and a downstream bioinformatics tool that uses a Transformer model on XPU. The same hang occurs with a minimal PyTorch reproducer, but the SYCL example above shows that the problem is not specific to PyTorch.
Minimal PyTorch reproducer:
import torch, gc
print("torch", torch.__version__, flush=True)
print("xpu available", torch.xpu.is_available(), flush=True)
print("device", torch.xpu.get_device_name(0), flush=True)
a = torch.ones((10, 10), device="xpu")
torch.xpu.synchronize()
print(a[0, 0].cpu(), flush=True)
del a
gc.collect()
torch.xpu.empty_cache()
torch.xpu.synchronize()
print("normal end", flush=True)
Observed with both:
torch 2.8.0+xpu
torch 2.11.0+xpu
The PyTorch script prints normal end, but the Python process remains stuck:
STAT=Dl+
WCHAN=vmbus_teardown_gpadl
CMD=python xpu_normal_exit_test.py
The process cannot be killed with kill -9; recovery requires restarting WSL.
xpu-smi discovery reports no device on this system, but both sycl-ls and PyTorch detect the GPU correctly:
Intel(R) Graphics [0x64a0]
This may be related to WSL2 / DXG / VMBus teardown of GPU resources after SYCL USM allocation or cleanup on Lunar Lake / Arc 140V.