Drivers from 4885 and newer break IPEX for native windows. #442

Open
Mindset-Official opened this issue Oct 10, 2023 · 14 comments

@Mindset-Official

Mindset-Official commented Oct 10, 2023

Describe the issue

I have tried running it in both SD.Next and ComfyUI, and both fail when trying to generate an image. There is no error message; it just seems to crash the WebUI completely. Drivers 4676 and older worked perfectly fine. Since there is no error message, I can't really tell you what is broken. I believe the driver team has been notified, but I'm not sure what they can do since it's not officially supported, so I figured I would post here as well.

WSL2 seems to still work fine.

A750
Windows 11
AOT-compiled IPEX for Windows
Ryzen 5600
32 GB of DDR4 at 3200

@jingxu10
Contributor

@min-jean-cho

@Nuullll

Nuullll commented Oct 11, 2023

+1.

there is no error message

Under some circumstances, I can see "Abort was called at 198 line in file:" -- I believe this is raised from compute runtime.

I'm trying to isolate the issue.

@Mindset-Official
Author

Under some circumstances, I can see "Abort was called at 198 line in file:" -- I believe this is raised from compute runtime.

Just to confirm, I also got this a few times.

@Vipitis

Vipitis commented Oct 12, 2023

Running accelerate with --use_xpu, or with IPEX enabled in the config, also exits with status 3221225477 on an A750 with Windows 10 and driver 4887.
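For reference, the exit status reported above is easier to recognize in hexadecimal. The following is a minimal sketch; the helper name `describe_exit_status` and the lookup table are hypothetical, and the reading of 0xC0000005 as STATUS_ACCESS_VIOLATION is the standard Windows NTSTATUS meaning rather than something stated in this thread.

```python
# Hypothetical helper: render a Windows process exit status as an
# NTSTATUS-style hex code. The table only lists the one code seen here.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def describe_exit_status(status: int) -> str:
    # Mask to 32 bits so signed representations map to the same code.
    code = status & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


print(describe_exit_status(3221225477))
```

An access violation would be consistent with a crash inside native code and with no Python traceback being printed, which matches the "no error message" symptom, although the thread does not confirm the exact cause.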

@Nuullll

Nuullll commented Oct 15, 2023

It seems that driver 4885 broke backward compatibility with previous drivers.

The officially released IPEX Windows JIT wheels work fine with the following reproducer (the image was generated as expected):

import torch
import intel_extension_for_pytorch as ipex  # imported so that the "xpu" device is available
from diffusers import StableDiffusionPipeline

# Load the fp16 weights and move the whole pipeline to the Intel GPU.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16).to("xpu")
prompt = "a photograph of an astronaut riding a horse"
# Run inference and save the first generated image.
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")

However, if I use IPEX AOT wheels built from source with driver 4676 (or earlier) (for example, https://github.com/Nuullll/intel-extension-for-pytorch/releases/tag/v2.0.110%2Bxpu-master%2Bdll-bundle), the program crashes.

pip install https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.0.110%2Bxpu-master%2Bdll-bundle/torch-2.0.0a0+gite9ebda2-cp310-cp310-win_amd64.whl
pip install https://github.com/Nuullll/intel-extension-for-pytorch/releases/download/v2.0.110%2Bxpu-master%2Bdll-bundle/intel_extension_for_pytorch-2.0.110+gitc6ea20b-cp310-cp310-win_amd64.whl
pip install diffusers transformers
set SYCL_PI_TRACE=2
python reproducer.py 1> trace.log 2>&1

trace.log

---> piKernelCreate(
	<unknown> : 000001CA0D158570
	<const char *>: _ZTSZZN2at15AtenIpexTypeXPUL20launch_legacy_kernelIZNS0_18dpcpp_loops_kernelIZZZNS_4impl21copy_device_to_deviceERNS_14TensorIteratorEbENKUlvE3_clEvENKUlvE9_clEvEUlN3c104HalfEE_Lb0ELb1EEEvRNS_18TensorIteratorBaseET_EUliE_EEvxRKSD_ENKUlRN4sycl3_V17handlerEE_clESK_EUlNSI_7nd_itemILi1EEEE_
	<unknown> : 0000000DD9BE8C78
PI ---> (*RetKernel)->initialize()
PI ---> piProgramRetain(Program)
) ---> 	pi_result : PI_SUCCESS
	[out]<unknown> ** : 0000000DD9BE8C78[ 000001C9B5B0E8D0 ... ]

...

---> piEnqueueKernelLaunch(
	<unknown> : 000001C8A3717830
	<unknown> : 000001C9B5B0E8D0
	<unknown> : 1
	<unknown> : 0000000DD9BEA658
	<unknown> : 0000000DD9BEA628
	<unknown> : 0000000DD9BEA640
	<unknown> : 0
	pi_event * : 0000000000000000[ nullptr ]
	pi_event * : 000001C8A7A7D1D8[ 0000000000000000 ... ]
PI ---> Queue->insertStartBarrierIfDiscardEventsMode(CommandList)
PI ---> EventCreate(Queue->Context, Queue, ForceHostVisible, Event)
PI ---> piEventRetain(*Event)
PI ---> piKernelRetain(Kernel)

Crashed while executing piEnqueueKernelLaunch for kernel _ZTSZZN2at15AtenIpexTypeXPUL20launch_legacy_kernelIZNS0_18dpcpp_loops_kernelIZZZNS_4impl21copy_device_to_deviceERNS_14TensorIteratorEbENKUlvE3_clEvENKUlvE9_clEvEUlN3c104HalfEE_Lb0ELb1EEEvRNS_18TensorIteratorBaseET_EUliE_EEvxRKSD_ENKUlRN4sycl3_V17handlerEE_clESK_EUlNSI_7nd_itemILi1EEEE_
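To locate the crashing call in a long trace.log without reading it by hand, one could scan for the last PI call that was entered but never returned. This is a minimal sketch that assumes the log layout shown above (a call opens with a line starting `---> piName(` and closes with a line starting `) --->`); the function name `last_unfinished_pi_call` is hypothetical.

```python
import re

# Assumed layout, taken from the excerpt above:
#   "---> piName("   opens a top-level PI call
#   ") ---> ..."     closes it and carries the pi_result
# Lines starting with "PI --->" are nested plugin messages and are ignored.
CALL_OPEN = re.compile(r"^---> (pi\w+)\(")
CALL_CLOSE = re.compile(r"^\) --->")


def last_unfinished_pi_call(lines):
    """Return the name of the last PI call that never returned, or None."""
    pending = None
    for line in lines:
        opened = CALL_OPEN.match(line)
        if opened:
            pending = opened.group(1)
        elif CALL_CLOSE.match(line):
            pending = None
    return pending


sample = """\
---> piKernelCreate(
PI ---> piProgramRetain(Program)
) ---> \tpi_result : PI_SUCCESS
---> piEnqueueKernelLaunch(
PI ---> piKernelRetain(Kernel)
""".splitlines()

print(last_unfinished_pi_call(sample))
```

On a real log, the same function can be fed an open file, for example `with open("trace.log", errors="replace") as f: print(last_unfinished_pi_call(f))`.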

Probably I should compile IPEX with driver 4885?

@Mindset-Official
Author

Mindset-Official commented Oct 15, 2023

Probably I should compile IPEX with driver 4885?

You could try and see, but the official wheels haven't been updated (afaik) so I don't think they were compiled on the latest drivers. Maybe the new drivers break something in AOT?

@Nuullll

Nuullll commented Oct 16, 2023

I tried compiling IPEX AOT for Arc with driver 4887. The reproducer still crashes with the same SYCL PI TRACE log.

@Mindset-Official
Author

Are there any updates on what's going on with the newest drivers? I personally haven't tried the very latest, but I have heard from others that it is also not working. (I may give it a shot if someone says otherwise.) Any progress on figuring out what's happening?

@Nuullll

Nuullll commented Oct 26, 2023

I can confirm that drivers 4885, 4887 and 4900 all fail to work with IPEX AOT, simply because they ship the same Level Zero Compute Runtime "1.3.27193".

@Mindset-Official
Author

I take it this is entirely at the driver level, and there is no way to override it and install the older runtime version?

@Nuullll

Nuullll commented Oct 26, 2023

I take it this is entirely at the driver level, and there is no way to override it and install the older runtime version?

I tried to replace the driver storage files ze_intel_gpu64.dll, ze_loader.dll, ze_tracing_layer.dll and ze_validation_layer.dll under C:\Windows\System32 with the older DLLs. But apparently I missed something: it failed to load the compute runtime library.

@Mindset-Official
Author

Mindset-Official commented Oct 26, 2023

That's way above my level. However, in my folder I do not see a ze_intel_gpu64.dll in the main folder, only in one of the driver state repository folders. This is driver 4676.

@Nuullll

Nuullll commented Oct 26, 2023

That's way above my level. However, in my folder I do not see a ze_intel_gpu64.dll in the main folder, only in one of the driver state repository folders. This is driver 4676.

Yes, correct. There are 4 ze_*.dll files in the driver storage folder and 3 ze_*.dll files in System32. I replaced them all but still had no luck :-(
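To see where the Level Zero DLLs discussed above actually live on a given machine, a read-only listing can help before attempting any replacement. This is a minimal sketch: the function name `find_ze_dlls` is hypothetical, and searching under C:\Windows\System32 (which also covers its DriverStore subfolder) is an assumption based on the paths mentioned in this thread.

```python
from pathlib import Path


def find_ze_dlls(root):
    """Recursively list files named ze_*.dll under root (read-only)."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("ze_*.dll") if p.is_file())


if __name__ == "__main__":
    # Assumed search root; the recursive walk can take a while on a real system.
    for dll in find_ze_dlls(r"C:\Windows\System32"):
        print(dll.parent, dll.name, dll.stat().st_size, sep="\t")
```

The sketch only reports locations and sizes. It does not modify anything, and it does not explain why swapping the files failed to load the compute runtime.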

@Nuullll

Nuullll commented Nov 2, 2023

The issue is gone with Driver 4952
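The driver builds reported in this thread can be summarized as a small lookup, which may be handy for a startup check in a launcher script. This is a sketch only: the function name `reported_status` is hypothetical, accepting a dotted version string and using its last component as the build number is an assumption, and builds not mentioned in the thread are deliberately left unclassified.

```python
# Builds reported in this thread as crashing with IPEX AOT wheels.
REPORTED_BROKEN = {4885, 4887, 4900}
# Newest build reported as working before the regression, and the build
# reported as fixing it.
LAST_GOOD_BEFORE_REGRESSION = 4676
REPORTED_FIXED = 4952


def reported_status(version) -> str:
    """Classify a driver build using only what this thread reports."""
    # Assumption: for a dotted version string, the last component is the build.
    build = int(str(version).rsplit(".", 1)[-1])
    if build in REPORTED_BROKEN:
        return "reported broken"
    if build <= LAST_GOOD_BEFORE_REGRESSION or build == REPORTED_FIXED:
        return "reported working"
    return "not reported in this thread"


for build in (4676, 4885, 4900, 4952):
    print(build, reported_status(build))
```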
