[Bug]: Optimization of Unet fails 6950 XT #517

captroper · 2023-08-26T15:32:06Z

What happened?

This appeared to me to be the same issue as #510 and #301, though I know nothing. I ran the following commands:

conda create --name olive python=3.9
conda activate olive
pip install olive-ai[directml]==0.3.1
git clone https://github.com/microsoft/olive --branch v0.3.1
cd (to relevant directory)
pip install -r requirements.txt
python stable_diffusion_xl.py --optimize

I've attached the log and a DxDiag report. The run errors out while optimizing the UNet, saying "failed to run olive on gpu-dml".... "887a0006 the gpu will not respond to more commands".

DxDiag.txt
ErrorLog.txt

Version?

0.3.1

guotuofeng · 2023-08-28T00:58:03Z

The following error message seems to be related to the DirectML EP.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : D:\a_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\ExecutionProvider.cpp(896)\onnxruntime_pybind11_state.pyd!00007FFE31C80201: (caller: 00007FFE31C80C2F) Exception(2) tid(3c14) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

guotuofeng · 2023-09-17T03:05:27Z

@jstoecker, do you have any insight?

guotuofeng · 2023-09-18T01:55:51Z

This seems similar to #510.

jstoecker · 2023-09-19T00:02:36Z

This is DXGI_ERROR_DEVICE_HUNG during inference/evaluation, which typically happens when some GPU work is taking excessively long. The recent AMD driver optimizations for stable diffusion / multi-head attention target the RDNA 3 architecture (e.g., the 7000 series, like the Radeon RX 7900 XTX) but not the RDNA 2 (6000 series). Still, we can try to repro this on an RDNA card to see if anything jumps out.
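To make the mapping from the log text to the error name concrete, here is a minimal Python sketch that pulls the HRESULT out of an ONNX Runtime failure message and looks it up in a small table of DXGI device-loss codes. The helper name `classify_dml_failure` and the table are illustrative assumptions, not part of Olive or ONNX Runtime; the table covers only a handful of codes from the DXGI error list.

```python
import re

# A few device-loss HRESULTs from the DXGI error code list (not exhaustive).
DXGI_ERRORS = {
    0x887A0005: "DXGI_ERROR_DEVICE_REMOVED",
    0x887A0006: "DXGI_ERROR_DEVICE_HUNG",
    0x887A0007: "DXGI_ERROR_DEVICE_RESET",
    0x887A0020: "DXGI_ERROR_DRIVER_INTERNAL_ERROR",
}

# DXGI HRESULTs share the 0x887A prefix; match eight hex digits starting with it.
_HRESULT_RE = re.compile(r"\b(887A[0-9A-F]{4})\b", re.IGNORECASE)


def classify_dml_failure(message: str):
    """Hypothetical helper: return (hresult, name) for the first DXGI
    HRESULT found in the message, or None if there is none."""
    match = _HRESULT_RE.search(message)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return code, DXGI_ERRORS.get(code, "unknown DXGI error")


# Excerpt of the error text quoted earlier in this thread.
log_line = "Exception(2) tid(3c14) 887A0006 The GPU will not respond to more commands"
result = classify_dml_failure(log_line)
if result is not None:
    code, name = result
    print(f"0x{code:08X} {name}")
```

A lookup like this only names the failure. It does not say which operator or kernel took too long, so it is a triage aid rather than a diagnosis.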

CellerX · 2023-09-26T04:33:42Z

The 6800 XT has the same error.

vibbix · 2023-11-22T21:49:17Z

Same error on my 6900 XT as well, on 0.4.0.

Jerry-zirui · 2024-06-21T07:37:30Z

The same error occurred on an AMD Ryzen 7 7840U w/ Radeon 780M Graphics.
I increased the dedicated GPU memory as #510 mentioned, but the error still occurs.

captroper added the "bug" label on Aug 26, 2023

guotuofeng added the "waiting for response" label on Sep 17, 2023

guotuofeng removed the "waiting for response" label on Sep 19, 2023

woonyee28 mentioned this issue May 22, 2024

[Bug]: Optimization of Unet fails - AMD RDNA3.5 Strix Point Processor #1170

Open

devang-ml added the "DirectML" label on Jun 3, 2024
