Text & Icon corruption after resume from suspend on Nvidia #133

Open
Rick1029 opened this issue Feb 5, 2022 · 24 comments

Comments

@Rick1029

Rick1029 commented Feb 5, 2022

After resuming from suspend on Nvidia graphics, text and icons are badly corrupted about 60% of the time, and only a shell restart fixes it. This seems to happen independent of any other factor, and it has only ever occurred for me on Nvidia graphics. Only GNOME and GNOME extensions are affected; all the apps I've seen work normally. See the example photos below.

Pop 21.10
Gnome 40.4.0 X11 (have not tried Wayland)
Cosmic version 1
nvidia-driver-495:
Installed: 495.46-0ubuntu0.21.10.1
Candidate: 495.46-0ubuntu0.21.10.1

Login screen after resume.
image

Desktop in workspace overview.
image

Photo of arcmenu. The Cosmic launchers have the same corruption but I can't screenshot them as they disappear when I click to take the screenshot.
image

Portion of the status bar.
image

Some elements in the calendar are fine.
image

@wagnerck

wagnerck commented Mar 17, 2022

I've been documenting this problem after encountering it via tickets, customer calls, etc. It's also happening on my personal work laptop (an oryp6). It can occur both after suspend+resume and when switching users, and it appears to occur with both the 470 and 510 Nvidia binary drivers. It seems to happen only in full Nvidia graphics mode, and not in Hybrid.

The workaround when it happens is to press Alt + F2 to bring up the GNOME command prompt, press "r", and then press Enter. This restarts the GNOME shell but leaves running applications intact.

I've been tagging related tickets with the nvidia-gnome-suspend tag, and this includes the following tickets: 50264, 56831, 58336 (all oryp8), 56571 (galp5), 58582 (kudu6), as well as one person running Pop!_OS on third-party hardware who reported the issue via the chat system (https://chat.pop-os.org/pop-os/pl/3hzx5iufsbg9pmi77o1c1iaqyw). There are definitely more, but those are the ones I've got saved.

This is not a problem limited to Pop!_OS either. The same problem is documented in other tracking systems and forums, and here's a small number of the many threads and reports:

The system76-power daemon does appear to be doing the right thing, at least according to Nvidia's power management documents: http://us.download.nvidia.com/XFree86/Linux-x86_64/510.54/README/powermanagement.html

Per the source (https://github.com/pop-os/system76-power/blob/master/src/graphics.rs), we're using NVreg_PreserveVideoMemoryAllocations=1 on the nvidia kernel module on S3 systems (like my oryp6), and we're using NVreg_EnableS0ixPowerManagement=1 instead on S0iX systems (like the oryp8 and kudu6, I believe). We're also using NVreg_DynamicPowerManagement=0x02 in Hybrid and Compute modes, which allows us to power down the discrete GPU when it's not in use, but this is not enabled in the Nvidia graphics mode where the corruption problem occurs, so it shouldn't affect this problem.

While testing in Nvidia graphics mode, I commented out the options nvidia NVreg_PreserveVideoMemoryAllocations=1 line in /etc/modprobe.d/system76-power.conf, rebuilt the boot image, and restarted, and for several days now I have not experienced the corruption problem after suspending or switching users. This seems to contradict the documentation from Nvidia, where that module option should prevent the problem instead.

I do not have a system with S0iX support to test with, so I don't know if disabling NVreg_EnableS0ixPowerManagement would do something similar on that equipment. The problem may actually have a different underlying cause in those cases.

Some additional details:

  • This problem also occurs under Wayland (which Pop!_OS does not use on Nvidia systems) but the symptoms and fixes appear to be different. I don't know if we need to be concerned about that right now.
  • Some systems are not resuming properly at all but I think that's a different problem altogether.
  • The nvidia kernel module doesn't expose its current parameters via /sys or /proc in the usual places; you can run cat /proc/driver/nvidia/params to see what settings were used for the current boot instead.
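A rough sketch of pulling a single setting out of that file, in case it saves anyone some scrolling. It assumes the params file uses "Name: value" lines with the NVreg_ prefix dropped (e.g. "PreserveVideoMemoryAllocations: 1"); the helper name nvidia_param is made up for illustration.

```shell
# Print the value of one nvidia module parameter.
# $1: parameter name, with or without the NVreg_ prefix
# $2: optional file to read (defaults to /proc/driver/nvidia/params)
# Assumption: the file contains "Name: value" lines without the NVreg_ prefix.
nvidia_param() {
    awk -F': *' -v key="${1#NVreg_}" \
        '$1 == key { print $2; found = 1 } END { exit !found }' \
        "${2:-/proc/driver/nvidia/params}"
}

# Example, only meaningful while the nvidia module is loaded:
#   nvidia_param NVreg_PreserveVideoMemoryAllocations
```

The function returns a non-zero status when the parameter is not listed, so it can be used in an if statement.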

[edit] The kudu6 may be an S3 suspend laptop, but I don't have one to test with. The command cat /sys/power/mem_sleep will show if that's the case; if it says [s2idle] shallow it's an S0iX system, and if it says s2idle [deep] then it's an S3 system.
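For anyone who wants to script that check, here is a minimal sketch. It only looks at which mode is in brackets and follows the S0iX/S3 reading described above; the function name classify_mem_sleep is hypothetical, and an active s2idle mode does not by itself prove the hardware reaches S0iX.

```shell
# Classify a mem_sleep string by its bracketed (currently selected) mode.
# $1: contents of /sys/power/mem_sleep
classify_mem_sleep() {
    case "$1" in
        *"[deep]"*)   echo "S3" ;;
        *"[s2idle]"*) echo "S0ix" ;;
        *)            echo "unknown" ;;
    esac
}

# Example:
#   classify_mem_sleep "$(cat /sys/power/mem_sleep)"
```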

If the kudu6 is an S3 system then the fix that works on my oryp6 may work on the Kudu as well. Note that changing graphics modes will rewrite the /etc/modprobe.d/system76-power.conf file, so that line will have to be commented out again and the boot image rebuilt with update-initramfs.

[more edit] Per customer testing, the kudu6 is an S0iX system like the oryp8 is.

@peterHoburg

I own a Kudu6, and I am having this issue. 58582 is my ticket, and I thought I could chime in.

When running cat /sys/power/mem_sleep I get [s2idle]. Yes, that is the entire output.

I ended up following the Intel guide found here to confirm that the kudu6 IS S0ix.

I would be happy to disable the NVreg_EnableS0ixPowerManagement flag and rebuild the boot image. Unfortunately, I don't know how to rebuild the image, so some instruction would be really helpful, or just point me to the makefile/docs.

@wagnerck

@peterHoburg I've responded in the ticket conversation.

For anyone else reading this issue, rebuilding the boot image is just a matter of running sudo update-initramfs -c -k all after making the manual changes to the config files, and then rebooting.
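If it helps, the manual edit can be sketched as a small helper. This is only an illustration under a few assumptions: the function name comment_out_nvidia_option is made up, it expects each option on its own "options nvidia ..." line, and it comments out the whole line if the option appears anywhere on it. It leaves a .bak copy of the original file next to it.

```shell
# Comment out every uncommented "options nvidia ..." line that sets an option.
# $1: file to edit (a FILE.bak backup is written alongside it)
# $2: option name, e.g. NVreg_PreserveVideoMemoryAllocations
comment_out_nvidia_option() {
    sed -i.bak -E \
        's/^([[:space:]]*options[[:space:]]+nvidia[[:space:]]+.*'"$2"'=.*)$/# \1/' \
        "$1"
}

# Hypothetical usage, run as root on an S3 system:
#   comment_out_nvidia_option /etc/modprobe.d/system76-power.conf \
#       NVreg_PreserveVideoMemoryAllocations
#   update-initramfs -c -k all    # then reboot
```

Lines that are already commented out, and lines for other modules such as nvidia-drm, are left alone, so running it twice should be harmless.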

@peterHoburg

This proposed fix does not work on the kudu6.

  • I commented out options nvidia NVreg_EnableS0ixPowerManagement=1
  • Ran sudo update-initramfs -c -k all
  • Rebooted
  • Used it for a little while, then closed the lid.
  • The laptop failed to resume from sleep. It would not go back into the UI.

It was easy to reboot and undo these changes.

@wagnerck

@peterHoburg Thank you for sharing that information. We'll continue to work on this via the service ticket as well.

@wagnerck

New Nvidia 510 series driver is out: https://www.nvidia.com/download/driverResults.aspx/187162/en-us

From the v510.60.02 release notes:

Fixed a regression that could cause OpenGL applications to hang or render incorrectly after suspend/resume cycles or VT-switches

This is not yet in the packaging system so any immediate testing will require using the Nvidia installer, which we don't recommend to end users.

@wagnerck

Initial testing on an oryp6 with the unpackaged Nvidia drivers has v510.60.02 performing identically to the previous revision v510.54. The same fix to system76-power.conf works as well. These new drivers may have a fix for S0iX systems but we'll need to wait until we have proper packaging to test safely.

@wagnerck

Additional testing with packaged v510.60.02 drivers from a staging repo (https://github.com/pop-os/nvidia-graphics-drivers/tree/nvidia-510.60.02) shows exactly the same problem on an oryp6. Going to look into having someone test on an S0iX system using the same staging repo.

@wagnerck

wagnerck commented Apr 5, 2022

Adding both Pop!_OS testing repositories, linux-5.17.1 and nvidia-510.60.02, appears to have resolved the issue on an oryp6 with S3 suspend. With options nvidia NVreg_PreserveVideoMemoryAllocations=1 enabled, the system suspends and resumes without the GNOME corruption. I'm doing additional testing on this system, and will be looking into additional testing on an S0iX suspend system.

@wagnerck

wagnerck commented Apr 5, 2022

Additional testing shows that the GNOME corruption still occurs, but only after a significantly longer suspend time (maybe 30m, possibly less). Back to the drawing board...

@wagnerck

Initial testing shows that the corruption after suspend+resume still occurs with the Pop!_OS v22.04 beta and driver 510.60.02. Continuing testing.

@n3m0-22
Contributor

n3m0-22 commented Apr 21, 2022

On the gaze15 with 22.04 this only affects encrypted installs; it works fine on a non-encrypted install.

@wagnerck

wagnerck commented Apr 21, 2022

@n3m0-22 That is extremely interesting to hear. That's going to make troubleshooting it a little harder because we can't really un-encrypt a drive to try to replicate it, but it gives us some additional things to look at.

@SUPERCILEX

FYI I'm on an XPS unencrypted and the issue still occurs.

@thor314

thor314 commented Jun 3, 2022

Running a System76 Oryx Pro w/ Pop!_OS in Nvidia mode also produces this issue on resume from suspend.

@wagnerck

wagnerck commented Jun 4, 2022

Initial testing suggests that the v515 Nvidia drivers may resolve this issue, at least on an S3 suspend mode laptop like an oryp6.

We do not recommend downloading the drivers directly from the Nvidia site, as they are not packaged specifically for the OS, will be harder to uninstall if something goes wrong, and may have other unintended effects. We have a testing repository with pre-release packaging, and folks who want to test this for themselves can do so with the following instructions. Please only do so at your own risk on non-critical systems, as these drivers have not been extensively tested.

Run these commands in the terminal, one at a time:

cd ~/Downloads
git clone https://github.com/pop-os/pop
sudo ./pop/scripts/apt add nvidia-515.48.07
sudo apt update
sudo apt purge ~nnvidia
sudo apt install nvidia-driver-515

If the system is a hybrid graphics laptop, run the following as well:

system76-power graphics nvidia

Either way, then reboot the system.

Eventually, the testing repo will be deleted, at which point apt and/or the Pop!_Shop will complain that it can't find it, and this may prevent system updates from going through. The easiest way to remove the repository is to open the Pop!_Shop, click the gear in the upper right to open the "Repoman" tool, and then remove the "Pop Development Branch nvidia-515.48.07" repository. There are additional details about "Repoman" here, along with screenshots.

@shkm

shkm commented Jun 14, 2022

Thanks for this, @cwsystem76. Just wanted to report that there's no change for me on my (custom) desktop and Wayland. Seems to work for some, as reported on the Nvidia forums.

@wagnerck

The nvidia-515.48.07 staging repo was removed over the weekend, possibly because it's going to be going live this week. If you added it and are now getting errors, you can remove the repo via the Pop!_Shop as described previously.

@wagnerck

The package nvidia-driver-515 should be available in the main repos now.

That said, it appears this problem is only partially resolved: switching users will still cause the problem to come back, although it's a bit different. Some GNOME apps (notably the Settings app and GNOME Terminal) will be all black until GNOME is reset with the Alt + F2 shortcut. Switching TTYs does not appear to trigger the problem.

@rstanuwijaya

Is there a simple way to update from the nvidia-510 driver to the nvidia-515 driver, now that it is officially available?

@leviport
Member

Pop!_Shop > Installed. You'll see an install button for it.

@Apacelus

Apacelus commented Mar 5, 2023

So I had the same issue as in the initial comment, on a non-system76 system:
image
Driver version: 525.85.05

I added options nvidia NVreg_PreserveVideoMemoryAllocations=1 to /etc/modprobe.d/system76-power.conf, rebuilt my initramfs, rebooted and now the system wakes up from sleep without any issues.

System is clean, I did a refresh about 2 months ago.

@tavinus

tavinus commented Mar 8, 2023

The fix above from @Apacelus worked for me, even though I am on X11 (instead of Wayland).
Driver version: 525.85.05

image

Just had to

sudo nano /etc/modprobe.d/system76-power.conf

Add this line to the end of the file and save/exit (Ctrl + X)

options nvidia NVreg_PreserveVideoMemoryAllocations=1

Then rebuild initramfs with

sudo update-initramfs -c -k all

Then reboot and it should be fixed.
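The append step could also be scripted so that it is safe to repeat. This is just a sketch, not anything system76-power provides; ensure_line is a made-up helper name. As noted earlier in the thread, changing graphics modes rewrites /etc/modprobe.d/system76-power.conf, so the line may need to be re-added after a mode switch.

```shell
# Append a line to a file only if that exact line is not already present.
# $1: file, $2: exact line that should be present
ensure_line() {
    if grep -qxF -- "$2" "$1" 2>/dev/null; then
        return 0
    fi
    # Add a newline first if the file is non-empty and does not end with one.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
    printf '%s\n' "$2" >> "$1"
}

# Hypothetical usage, run as root:
#   ensure_line /etc/modprobe.d/system76-power.conf \
#       'options nvidia NVreg_PreserveVideoMemoryAllocations=1'
#   update-initramfs -c -k all    # then reboot
```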

The only problem is that this mod will probably disappear in future updates (or will it not?).
Edit: Just updated to 525.89.02 and the configuration remained.

No more image corruption after waking up.
Thanks heaps!
This was driving me crazy.

@serkanerip

I'm having the same issue; see the screenshots. I'm going to try @Apacelus's fix, but I wanted to share that the issue still remains.

OS: Pop!_OS 22.04 LTS
GNOME: 42.9
Windowing System: X11
Nvidia Driver: 550.67

Screenshot from 2024-05-28 23-05-37
Screenshot from 2024-05-28 23-06-13
