ROCm breaks the system after recent update. #3235

Closed
loafylemon opened this issue Feb 28, 2024 · 12 comments

@loafylemon

loafylemon commented Feb 28, 2024

Distribution (run cat /etc/os-release):

NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os

Hardware information:

AMD Ryzen 7 7800X3D 8-Core Processor
AMD Radeon 7900 XTX

Related Application and/or Package Version (run apt policy $PACKAGE NAME):
N/A

Issue/Bug Description:
After the recent update, I've encountered an issue where the system won't go past the 'Something went wrong :(' GNOME screen. Since then, I have reinstalled Pop!_OS in order to find the point of failure, and after installing ROCm and rebooting, the issue reoccurred. I must emphasise that the problem did not occur before the recent update.

Logs:
dump.log

Steps to reproduce (if you know):

  1. Download and install amdgpu-install.
sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
  2. Add Pop!_OS to the supported list.
sudo nano /usr/bin/amdgpu-install
case "$ID" in
ubuntu|linuxmint|debian|pop)
                         ^
  3. Install ROCm.
sudo amdgpu-install --usecase=rocm --no-dkms
  4. Reboot.
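
Step 2 works because the installer's distribution check matches on the ID field of /etc/os-release, which is "pop" here. A minimal sketch of that kind of check follows. The function names is_supported_id and os_release_id and the list of IDs are illustrative assumptions modelled on the case line quoted above, not the actual contents of amdgpu-install.

```shell
#!/bin/sh
# Illustrative only: mimics the kind of ID check quoted in step 2.
# is_supported_id is a hypothetical helper, not part of amdgpu-install.
is_supported_id() {
    case "$1" in
        ubuntu|linuxmint|debian|pop) return 0 ;;
        *) return 1 ;;
    esac
}

# Read ID from an os-release style file (defaults to the system one).
os_release_id() {
    # Subshell so the sourced variables do not leak into the caller.
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$ID" )
}

id_value=$(os_release_id 2>/dev/null) || id_value=unknown
if is_supported_id "$id_value"; then
    echo "ID '$id_value' would match the patched case pattern"
else
    echo "ID '$id_value' would not match the patched case pattern"
fi
```

Running this before and after a reinstall is a quick way to confirm which ID the installer would see, without opening the script in an editor.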

Expected behavior:

Other Notes:
I am able to log in through a TTY by pressing CTRL+ALT+F4.

Things I have tried to make it work:

  • Installing another DE/WM.
  • Reconfiguring dpkg.
  • Installing an older version of ROCm (5.7).
  • Disabling hybrid graphics/iGPU.
  • Rolling back to the previous version of Pop!_OS <- This allows ROCm to work, but is not ideal.

EDIT: (3 MAR 2024)
Still broken on the new kernel.
Linux pop-os 6.6.10-76060610-generic #202401051437~1709085277~22.04~31d73d8 SMP PREEMPT_DYNAMIC Wed F x86_64 x86_64 x86_64 GNU/Linux

@pablovesnine

I have exactly the same issue.

Thank goodness I made a backup with Timeshift before experimenting with installing ROCm 6.0.2. The only way to make it work right now on Pop!_OS is by installing AMD's DKMS graphics drivers using kernel 6.5.4:

amdgpu-install --usecase=graphics,rocm

I haven't found a way to get ROCm 6.0.2 working with the open-source drivers or on a more recent kernel.

@Hunterrules0-0

Hunterrules0-0 commented Apr 8, 2024

Can confirm that this is still a problem and has gotten worse. Dual monitors no longer work, going to the settings crashes the session and sends you back to the menu, and worst of all, it seems the drivers aren't even loaded. Uninstalling amdgpu does not appear to do anything, and I'm having to move over 1 TB of files and videos because I have to reinstall the entire OS.
Whatever amdgpu installs seems to have broken parts of the OS after this update.

If you're lucky, you can try spamming Tab during your PC's startup, then wait for the bootloader to show you the option to boot from the old kernel, current kernel, or recovery. Select the old kernel and see if that fixes it. If it does, DO NOT update until pop os figures out there shit.

@leviport
Member

leviport commented Apr 8, 2024

until pop os figures out there shit.

Please keep in mind that we have a code of conduct.

It's been mentioned multiple times that this isn't Pop's fault. AMD's DKMS modules are often incompatible with the kernels we ship, and there's nothing we can do about it. Since System76 hardware is very bleeding-edge, we often need to ship newer kernels for hardware enablement.

At this time, we're recommending ROCm/AMDGPU-PRO users implement a containerized workflow, rather than trying to install these drivers directly on your OS.
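
As a rough illustration of what a containerized workflow can look like, the sketch below only prints a docker run command rather than executing it. The device flags follow the pattern AMD documents for ROCm containers. The image name rocm/pytorch:latest and the helper name rocm_container_cmd are assumptions for illustration, so substitute whatever image and container runtime you actually use.

```shell
#!/bin/sh
# rocm_container_cmd [IMAGE]: print a docker command that passes the GPU
# device nodes through to a container (hypothetical helper).
rocm_container_cmd() {
    image=${1:-rocm/pytorch:latest}   # assumed image name
    printf '%s\n' "docker run -it --rm --device=/dev/kfd --device=/dev/dri --group-add video $image"
}

# Print the command so it can be reviewed before running it by hand.
rocm_container_cmd
```

With this approach the ROCm userspace lives inside the container, so the host should only need a working amdgpu kernel driver. Depending on the setup, the render group may also need to be added.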

@Hunterrules0-0

Yeah, I am sorry about that comment. I'm just mad that I have to reinstall everything. I have a 2 TB NVMe drive that acts as a secondary storage device for my PC, and I'm moving all my important files there temporarily until I reinstall from the built-in recovery mode. After that, I'll move the files back to their respective places and delete them from the secondary drive. It just makes me really mad that I'm having to format and reinstall everything. But I am sorry for that comment.

@pablovesnine

I managed to get ROCm 6.0.2 working on Pop!_OS 22.04 LTS with the latest Linux kernel (6.8.0) and an RX 7900 GRE.

  1. First, make some kind of backup of your system (there's no guarantee these steps will work for you).
  2. Install the AMD driver using:
amdgpu-install --no-dkms
  3. Restart the system and ensure everything is working properly.
  4. Install ROCm:
amdgpu-install --usecase=rocm --no-dkms
  5. Add yourself for rendering and video permissions:
sudo usermod -aG video $USER
sudo usermod -aG render $USER
  6. Restart again and verify that ROCm is functioning properly:
rocminfo
rocm-smi

Keep in mind that this might entail a regression in some aspects. For instance, Mesa drops from version 24 to 23.3 according to the command glxinfo -B.

Good luck!
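
Before relying on rocminfo, it can help to confirm that the group changes from step 5 took effect and that the GPU device nodes exist. A rough sketch, assuming the usual /dev/kfd and /dev/dri nodes that ROCm uses; has_group is a hypothetical helper written for this example, not a ROCm tool:

```shell
#!/bin/sh
# has_group GROUP "LIST": succeed if GROUP is a whole word in the
# space-separated LIST (hypothetical helper for this sketch).
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Groups of the current session; usermod changes only show up after
# logging in again (or rebooting, as in step 6).
groups_now=$(id -nG)
for g in video render; do
    if has_group "$g" "$groups_now"; then
        echo "ok: current session is in group $g"
    else
        echo "missing: $g (log in again after running usermod)"
    fi
done

# /dev/kfd is the compute node ROCm uses; /dev/dri holds the render nodes.
for node in /dev/kfd /dev/dri; do
    if [ -e "$node" ]; then
        echo "ok: $node exists"
    else
        echo "missing: $node (amdgpu kernel driver may not be loaded)"
    fi
done
```

If both groups and both nodes are reported as ok but rocminfo still fails, the problem is more likely in the ROCm userspace packages than in permissions.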

@Hunterrules0-0

Hunterrules0-0 commented Apr 8, 2024

Can you show a picture of CPU-X to show that the GPU drivers are loaded, along with your kernel version? I don't think I'm going to install ROCm again after it destroyed my installation. Also, who knows what's going to happen with the next update; it could break things more.

@pablovesnine

It's in Spanish and I'm not familiar with CPU-X, but I believe this is what you're looking for:

[image]

I understand that you don't want to try again. I've nuked my system a couple of times after discovering that with Timeshift, you can 'snapshot' your system (excluding personal files) and try things carelessly.

@Hunterrules0-0

Thank you. It seems the drivers have loaded. Does ROCm/AMDGPU-PRO offer better performance than the standard drivers included in Pop!_OS?

@pablovesnine

I don't believe they offer any additional benefits beyond the open-source drivers, aside from the ability to utilize ROCm. While I do occasionally game, and the performance is excellent, it was also stellar with the open-source drivers.

@Hunterrules0-0

thanks

@jacobgkau
Member

We have a WIP article that will explain how to install ROCm in an easier way, now that AMD provides an Ubuntu repository for it and offers a ROCm package that doesn't do anything DKMS-related: https://github.com/system76/docs/pull/1231/files

Can anyone confirm if following the instructions in that draft article gets you what you need?

@loafylemon
Author

I did a fresh reinstall of Pop!_OS and followed the steps outlined in the aforementioned article, and can confirm that ROCm seems to work just fine. Thank you!
