Segfaults on RDNA1 / gfx1010 (RX 5700XT) with ROCm on built nightly torch 2.1.0 #106728
Comments
@cl0ck-byte Can you run the following commands and send the output:
Output from
@cl0ck-byte Thanks for the output. For your reference, this link shows the latest list of GPUs officially supported by AMD: https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html#linux-supported-gpus
From what I understand, nothing will be done about this issue since RDNA1 is not officially supported?

Edit: if someone ever wants to pick this up, here's GEF (a gdb fork) output with debug symbols (torch was compiled without the debug parameter), although without syntax coloring:

It seems to be something related to the drivers, ROCm, or something else, and that's way beyond my capabilities. If someone really wants to get Torch working on RDNA1, downgrade it to the latest pre-2.0.0 version and use the same workaround as before, which is forcing the environment variable.

Another quick edit: maybe DirectML will work instead? Be sure to check it out.
Interesting, should I try building that snapshot by myself with
I can confirm that PyTorch 2 does work on gfx1010 if compiled against ROCm 5.2, using "export HSA_OVERRIDE_GFX_VERSION=10.3.0".

Actually, the web archive isn't really needed here; the official PyTorch repo still has it. So, for torch, torchvision and torchaudio the links are:

Anyway, I think the problem is in ROCm 5.3 and newer.
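For anyone who wants to test the override without risking their main Python session, here is a minimal sketch. It sets HSA_OVERRIDE_GFX_VERSION for a child process only and reports the child's exit status, so a segfault in the child does not take down the caller. The helper names (env_with_gfx_override, run_with_override) are hypothetical and not part of PyTorch or ROCm, and the probe assumes a ROCm build of torch is installed; without torch it will simply report an import error.

```python
import os
import subprocess
import sys

OVERRIDE_VAR = "HSA_OVERRIDE_GFX_VERSION"


def env_with_gfx_override(version="10.3.0", base=None):
    """Return a copy of the environment with the GFX override set.

    `base` defaults to the current process environment; it is never mutated.
    """
    env = dict(os.environ if base is None else base)
    env[OVERRIDE_VAR] = version
    return env


def run_with_override(argv, version="10.3.0"):
    """Run `argv` in a child process that sees the override variable."""
    return subprocess.run(
        argv,
        env=env_with_gfx_override(version),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Assumption: a ROCm build of torch is installed. ROCm builds expose
    # the GPU through the torch.cuda namespace.
    probe = "import torch; print(torch.__version__, torch.cuda.is_available())"
    result = run_with_override([sys.executable, "-c", probe])
    # On POSIX, a negative return code means the child died from a signal
    # (for example -11 for SIGSEGV).
    print("return code:", result.returncode)
    lines = (result.stdout or result.stderr).strip().splitlines()
    print(lines[-1] if lines else "")
```

Running the probe in a subprocess is a deliberate choice here: the variable has to be in place before the ROCm runtime initializes, and a crash is reported as a return code rather than killing the shell or notebook you launched it from.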
Someone compiled wheels for torch 2.1.0 on ROCm 5.2, if anybody wants to use them.
Running "grep flags /sys/class/kfd/kfd/topology/nodes/*/io_links/0/properties" returns "flags 13" on RDNA1 / gfx1010 (RX 5700XT)! Can this version be supported?
That's a different issue and has nothing to do with the RX 5700XT itself. It's about your configuration, which doesn't support PCI atomics, and those are needed by newer ROCm versions. There can be multiple causes: maybe the motherboard or the CPU is too old. There was also someone who wrote somewhere here on GitHub that he had the video card mounted in the lower PCI slot, but his motherboard supported PCI atomics only on the first one.
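To make the "flags 13" reading less opaque, here is a small sketch that parses the flags value from the KFD io_links properties files and decodes it bit by bit. The bit names are an assumption based on the CRAT io-link flag definitions in the Linux amdkfd driver as I recall them (bit 0 enabled, bit 2 no 32-bit atomics, bit 3 no 64-bit atomics); please check them against your kernel sources before relying on them. The function names are hypothetical.

```python
import glob
import re

# Assumed bit meanings (amdkfd CRAT io-link flags); verify for your kernel.
IOLINK_FLAG_BITS = {
    0: "enabled",
    1: "non_coherent",
    2: "no_atomics_32_bit",
    3: "no_atomics_64_bit",
    4: "no_peer_to_peer_dma",
}


def parse_flags(properties_text):
    """Extract the integer after 'flags' from a properties file, or None."""
    match = re.search(r"^flags\s+(\d+)\s*$", properties_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def decode_flags(value):
    """List the names of the bits set in `value`, lowest bit first."""
    return [name for bit, name in IOLINK_FLAG_BITS.items() if value & (1 << bit)]


def atomics_supported(value):
    """True when neither of the assumed 'no atomics' bits (2 and 3) is set."""
    return not (value & 0b1100)


if __name__ == "__main__":
    pattern = "/sys/class/kfd/kfd/topology/nodes/*/io_links/0/properties"
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            value = parse_flags(fh.read())
        if value is not None:
            print(path, value, decode_flags(value), atomics_supported(value))
```

Under these assumed bit meanings, 13 is binary 1101: the link is enabled, but both the 32-bit and 64-bit atomics bits are flagged as unavailable, which is consistent with the explanation above that the platform rather than the GPU is the limiting factor.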
Thank you for the guidance. Yes, unfortunately both the CPU and the motherboard are very old!
Does this include torchaudio and torchvision? I only see torch 2.1.0.
🐛 Describe the bug
Since the workaround of forcing the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0 (which corresponds to an RDNA2 card) no longer works with torch>=2.0.0 on RDNA1 (it results in segfaults), I tried my luck compiling the torch wheel myself with the gfx1010 target and numpy support.

Running example code under such a torch build:
results in segmentation fault:
Output from running "coredumpctl debug" on the latest dump: https://gist.github.com/cl0ck-byte/5a4e24f1a67fd588fde06d28da2e5765
Output from running "dmesg -kuT":

Cannot upload the torch wheel build due to its size.
Versions
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang