Skip to content

HIP error: invalid device function #164346

@mihongyu

Description

@mihongyu

🐛 Describe the bug

I have installed ROCM7 and used the pytorch version with the following URL:
https://download.pytorch.org/whl/nightly/rocm7.0/

In COMfyUI, using any workflow would result in the following error:

torch.AcceleratorError: HIP error: invalid device function
Search for hipErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__HIPRT__TYPES.html for more information. HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing AMD_SERIALIZE_KERNEL=3 Compile with TORCH_USE_HIP_DSA` to enable device-side assertions.

Traceback (most recent call last):
File "/home/mewmhy/AI/ComfyUI/execution.py", line 496, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/execution.py", line 315, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/execution.py", line 289, in async_map_node_over_list
await process_inputs(input_dict, i)
File "/home/mewmhy/AI/ComfyUI/execution.py", line 277, in process_inputs
result = f(inputs)
^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/nodes.py", line 952, in load_clip
clip = comfy.sd.load_clip(ckpt_paths=[clip_path], embedding_directory=folder_paths.get_folder_paths("embeddings"), clip_type=clip_type, model_options=model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/comfy/sd.py", line 845, in load_clip
return load_text_encoder_state_dicts(clip_data, embedding_directory=embedding_directory, clip_type=clip_type, model_options=model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/comfy/sd.py", line 1062, in load_text_encoder_state_dicts
clip = CLIP(clip_target, embedding_directory=embedding_directory, parameters=parameters, tokenizer_data=tokenizer_data, model_options=model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/comfy/sd.py", line 113, in init
self.cond_stage_model = clip(
(params))
^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/comfy/text_encoders/qwen_image.py", line 94, in init
super().init(device=device, dtype=dtype, model_options=model_options)
File "/home/mewmhy/AI/ComfyUI/comfy/text_encoders/qwen_image.py", line 57, in init
super().init(device=device, dtype=dtype, name="qwen25_7b", clip_model=Qwen25_7BVLIModel, model_options=model_options)
File "/home/mewmhy/AI/ComfyUI/comfy/sd1_clip.py", line 675, in init
setattr(self, self.clip, clip_model(device=device, dtype=dtype, model_options=model_options, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/comfy/text_encoders/qwen_image.py", line 52, in init
super().init(device=device, layer=layer, layer_idx=layer_idx, textmodel_json_config={}, dtype=dtype, special_tokens={"pad": 151643}, layer_norm_hidden_state=False, model_class=comfy.text_encoders.llama.Qwen25_7BVLI, enable_attention_masks=attention_mask, return_attention_masks=attention_mask, model_options=model_options)
File "/home/mewmhy/AI/ComfyUI/comfy/sd1_clip.py", line 121, in init
self.transformer = model_class(config, dtype, device, self.operations)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/comfy/text_encoders/llama.py", line 391, in init
self.model = Llama2
(config, device=device, dtype=dtype, ops=operations)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/comfy/text_encoders/llama.py", line 287, in init
transformer(config, device=device, dtype=dtype, ops=ops)
File "/home/mewmhy/AI/ComfyUI/comfy/text_encoders/llama.py", line 197, in init
self.self_attn = Attention(config, device=device, dtype=dtype, ops=ops)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/comfy/text_encoders/llama.py", line 150, in init
self.q_proj = ops.Linear(config.hidden_size, self.inner_size, bias=config.qkv_bias, device=device, dtype=dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/mewmhy/AI/ComfyUI/comfy/ops.py", line 386, in init
super().init(*args, **kwargs)
File "/home/mewmhy/miniconda3/envs/rocm-torch7/lib/python3.12/site-packages/torch/nn/modules/linear.py", line 115, in init
self.reset_parameters()
File "/home/mewmhy/AI/ComfyUI/comfy/ops.py", line 390, in reset_parameters
self.scale_weight = torch.nn.parameter.Parameter(data=torch.ones((), device=self.weight.device, dtype=torch.float32), requires_grad=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.AcceleratorError: HIP error: invalid device function
Search for hipErrorInvalidDeviceFunction' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__HIPRT__TYPES.html for more information. HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing AMD_SERIALIZE_KERNEL=3 Compile with TORCH_USE_HIP_DSA` to enable device-side assertions.

Versions

Collecting environment information...
PyTorch version: 2.10.0.dev20250929+rocm7.0
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 7.0.51831-a3e329ad8

OS: Ubuntu 24.04.3 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.11 | packaged by Anaconda, Inc. | (main, Jun 5 2025, 13:09:17) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.16.0-061600-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration: AMD Radeon Graphics (gfx1151)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: 7.0.51831
MIOpen runtime version: 3.5.0
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S
CPU family: 26
Model: 112
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 43%
CPU max MHz: 5187.5000
CPU min MHz: 625.0000
BogoMIPS: 5988.79
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx_vnni avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bus_lock_detect movdiri movdir64b overflow_recov succor smca fsrm avx512_vp2intersect flush_l1d amd_lbr_pmc_freeze
Virtualization: AMD-V
L1d cache: 768 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 16 MiB (16 instances)
L3 cache: 64 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Not affected
Vulnerability Ghostwrite: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Old microcode: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; IBPB on VMEXIT only
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] nvidia-cuda-runtime-cu13==0.0.0a0
[pip3] onnx==1.19.0
[pip3] onnxruntime==1.23.0
[pip3] onnxruntime-gpu==1.23.0
[pip3] pytorch-triton-rocm==3.5.0+gitbbb06c03
[pip3] torch==2.10.0.dev20250929+rocm7.0
[pip3] torchaudio==2.8.0.dev20250930+rocm7.0
[pip3] torchsde==0.2.6
[pip3] torchvision==0.25.0.dev20250930+rocm7.0
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu13 0.0.0a0 pypi_0 pypi
[conda] pytorch-triton-rocm 3.5.0+gitbbb06c03 pypi_0 pypi
[conda] torch 2.10.0.dev20250929+rocm7.0 pypi_0 pypi
[conda] torchaudio 2.8.0.dev20250930+rocm7.0 pypi_0 pypi
[conda] torchsde 0.2.6 pypi_0 pypi
[conda] torchvision 0.25.0.dev20250930+rocm7.0 pypi_0 pypi

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

Metadata

Metadata

Assignees

No one assigned

    Labels

    module: rocmAMD GPU support for PytorchtriagedThis issue has been looked at a team member, and triaged and prioritized into an appropriate module

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions