@mmstick
Member

@mmstick mmstick commented Jan 25, 2024

https://www.nvidia.com/download/driverResults.aspx/218153/en-us/

For review purposes. We can choose to release this separately in a different repository if we want to release it.

@gabriele2000

gabriele2000 commented Jan 25, 2024

DKMS doesn't build the module

Terminal log

Loading new nvidia-550.40.07 DKMS files...
Building for 6.6.10-76060610-generic
Building for architecture x86_64
Building initial module for 6.6.10-76060610-generic
ERROR (dkms apport): kernel package linux-headers-6.6.10-76060610-generic is not supported
Error! Bad return status for module build on kernel: 6.6.10-76060610-generic (x86_64)
Consult /var/lib/dkms/nvidia/550.40.07/build/make.log for more information.
dpkg: error processing package nvidia-dkms-550 (--configure):
 installed nvidia-dkms-550 package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of nvidia-driver-550:
 nvidia-driver-550 depends on nvidia-dkms-550 (>= 550.40.07); however:
  Package nvidia-dkms-550 is not configured yet.

dpkg: error processing package nvidia-driver-550 (--configure):
 dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
Processing triggers for gnome-menus (3.36.0-1ubuntu3) ...
Processing triggers for libc-bin (2.35-0ubuntu3.6) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for dbus (1.12.20-2ubuntu4.1) ...
Processing triggers for dbus-broker (29-4build1) ...
Processing triggers for mailcap (3.70+nmu1ubuntu1) ...
Processing triggers for desktop-file-utils (0.26-1ubuntu3) ...
Processing triggers for initramfs-tools (0.140ubuntu13.4) ...
update-initramfs: Generating /boot/initrd.img-6.6.10-76060610-generic
kernelstub.Config    : INFO     Looking for configuration...
kernelstub           : INFO     System information:

    OS:..................Pop!_OS 22.04
    Root partition:....../dev/nvme0n1p3
    Root FS UUID:........012303dc-82fb-4ce0-8f08-62160f8e3263
    ESP Path:............/boot/efi
    ESP Partition:......./dev/nvme0n1p1
    ESP Partition #:.....1
    NVRAM entry #:.......-1
    Boot Variable #:.....0000
    Kernel Boot Options:.i915.mitigations=off intel_pstate=disable mitigations=off systemd.show_status=false loglevel=0 quiet
    Kernel Image Path:.../boot/vmlinuz-6.6.10-76060610-generic
    Initrd Image Path:.../boot/initrd.img-6.6.10-76060610-generic
    Force-overwrite:.....False

kernelstub.Installer : INFO     Copying Kernel into ESP
kernelstub.Installer : INFO     Copying initrd.img into ESP
kernelstub.Installer : INFO     Setting up loader.conf configuration
kernelstub.Installer : INFO     Making entry file for Pop!_OS
kernelstub.Installer : INFO     Backing up old kernel
kernelstub.Installer : INFO     Making entry file for Pop!_OS
Errors were encountered while processing:
 nvidia-dkms-550
 nvidia-driver-550
E: Sub-process /usr/bin/dpkg returned an error code (1)
gabriele@msi-gp72m:~$

make.log

gabriele@msi-gp72m:~$ cat /var/lib/dkms/nvidia/550.40.07/build/make.log
DKMS make.log for nvidia-550.40.07 for kernel 6.6.10-76060610-generic (x86_64)
gio 25 gen 2024, 22:10:48, CET
make[1]: Entering directory '/usr/src/linux-headers-6.6.10-76060610-generic'
make --no-print-directory -C /usr/src/linux-headers-6.6.10-76060610-generic \
-f /usr/src/linux-headers-6.6.10-76060610-generic/Makefile modules
make -f ./scripts/Makefile.build obj=/var/lib/dkms/nvidia/550.40.07/build need-builtin=1 need-modorder=1
/var/lib/dkms/nvidia/550.40.07/build/Kbuild:233: /var/lib/dkms/nvidia/550.40.07/build/header-presence-tests.mk: No such file or directory
make[3]: *** No rule to make target '/var/lib/dkms/nvidia/550.40.07/build/header-presence-tests.mk'.  Stop.
make[2]: *** [/usr/src/linux-headers-6.6.10-76060610-generic/Makefile:1919: /var/lib/dkms/nvidia/550.40.07/build] Error 2
make[1]: *** [Makefile:234: __sub-make] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-6.6.10-76060610-generic'
make: *** [Makefile:85: modules] Error 2
gabriele@msi-gp72m:~$
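
When a DKMS build fails like this, the useful lines in make.log are usually the missing-file message and make's own "***" markers. A minimal sketch for pulling those out is below; the function name dkms_log_errors and the two patterns are illustrative assumptions, not part of DKMS.

```shell
#!/bin/sh
# Print, with line numbers, the lines of a DKMS make.log that usually
# explain a failed build: missing files and make's "***" failure markers.
# The function name and the patterns are assumptions for illustration.
dkms_log_errors() {
    log_file=$1
    grep -n -E 'No such file or directory|\*\*\*' "$log_file"
}

# Example, using the path from the log above:
# dkms_log_errors /var/lib/dkms/nvidia/550.40.07/build/make.log
```

On the log above, this would surface the missing header-presence-tests.mk file first, which is the line the later "Error 2" messages follow from.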

@mmstick mmstick force-pushed the nvidia-550.40.08 branch 2 times, most recently from f5e3f87 to f8720ae Compare January 25, 2024 21:49
@XV-02

XV-02 commented Feb 5, 2024

I'm seeing issues with native Cosmic applications and some others, such as the Lapce flatpak, being entirely unresponsive in Pop with this PR on desktop.

This is true both in Cosmic DE and Gnome in either Wayland or X modes. Without the Nvidia driver installed, on the same hardware, I see no such issues. I found this on a Spark with a 1080 graphics card. Previously, we were looking to the beta driver to address an existing Nvidia bug in Wayland sessions in which response times were being measured in seconds per frame on 10xx series hardware. That general responsiveness problem appears resolved despite the issues with specific applications.

On a Serw13 with a 4070, native Cosmic applications seemed to function without issues, but Lapce was equally unusable in integrated, hybrid, or dedicated graphics modes in a different way. Lapce would jump almost entirely off the display (to the left) whenever it received a mouse input, and horizontal window size would collapse to about twice the width of that application's close button. However, I could move and resize Lapce successfully using the Cosmic extensions for Gnome.

It looks like the driver may need a little longer to mature before we can push it. Alternatively, it may suggest broader issues.

@gabriele2000

gabriele2000 commented Feb 5, 2024

response times were being measured in seconds per frame on 10xx series hardware. That general responsiveness appears resolved despite the issues with specific applications.

Heh, I remember that a week after that problem appeared, maybe two, there was a patch for every Cosmic application that fixed the issue, although it introduced a rendering issue for a lot of elements.
That rendering problem was fixed too, two days ago, when wgpu got patched.

Basically, Nvidia's claims were false: at some point, thanks to @mmstick's fix (later that day), I was able to see that not only was the problem not corrected even after the driver update, but I also had the classic standby issues (and other issues, such as flatpaks being unresponsive, like you said) that you often get after a new Nvidia driver release.

It's a beta, sure, but beta doesn't mean "broken"; it means "it works with some minor issues".

@mmstick
Member Author

mmstick commented Feb 5, 2024

It's possible that a required fix is in pop-os/egl-wayland#2

@XV-02

XV-02 commented Feb 5, 2024

The egl-wayland Nvidia commits are resolving most of my issues. I don't think that other behaviours I'm seeing are specific to this Nvidia update. I'll continue seeing if I notice any other differences/regressions between 550 and 545.

@RayJW

RayJW commented Feb 7, 2024

This driver is currently unusable on my system. I've attached a log extract of what's happening on boot. Basically, I just get a grey screen, and when I switch to a TTY session, the following line is spammed a few times after logging in before the TTY becomes usable:

Feb 07 23:28:39 device kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000900] Flip event timeout on head 1

The same happens when running sudo systemctl restart gdm3, and the graphical session is still unusable.

All updates are applied, including the newly released libnvidia-egl-wayland1 version: 1:1.1.13-2pop1~1707162632~22.04~c5241b5.
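
For anyone comparing boots, one way to quantify the flip timeouts is to count them in the kernel log. This is a minimal sketch; the function name count_flip_timeouts is an assumption, and the match string is taken from the error line quoted in this comment.

```shell
#!/bin/sh
# Count "Flip event timeout" lines read from standard input.
# grep -c prints the number of matching lines but exits non-zero when
# there are none, so "|| true" keeps the function from reporting failure
# when the count is simply 0.
count_flip_timeouts() {
    grep -c 'Flip event timeout' || true
}

# Example: kernel messages (-k) from the current boot (-b).
# journalctl -k -b | count_flip_timeouts
```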

@XV-02

XV-02 commented Feb 8, 2024

@RayJW can you attach system specifications for us? Particularly what GPU you have installed?

@RayJW

RayJW commented Feb 8, 2024

@RayJW can you attach system specifications for us? Particularly what GPU you have installed?

Sure! I have a Ryzen 9 3900X with a GTX 1080 Ti, anything else that could be of help?

@XV-02

XV-02 commented Feb 12, 2024

@RayJW We don't have a GTX 1080 Ti in our lab. However, our standard GTX 1080 performed without issues. If you haven't already, please try the following steps:

  1. Remove the Nvidia driver, and reboot to see if the card functions under the open-source Nouveau driver. First, from a TTY, run sudo apt purge ~nnvidia (that "~nnvidia" with two "n"s is not a typo 😄 ). This will remove the Nvidia driver and associated configuration files. Then run sudo apt autoremove, which will remove any packages that the Nvidia driver pulled in as dependencies but that are no longer needed.

  2. Reboot the system, and see if you can successfully reach a graphical user session. If you can successfully reach a graphical user session:

  3. Try reinstalling the driver. Run sudo apt install nvidia-driver-550, which will install the Nvidia driver from this pull request, and try rebooting again.

The Nvidia driver is complicated, and I have seen issues where purging and reinstalling the driver has been the solution, so those are good first steps.
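
The steps above can be sketched as a small script. This is only an illustration: the helper names run, purge_nvidia, and install_nvidia_550 and the DRY_RUN variable are assumptions, and by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the purge-and-reinstall steps. Commands are only printed
# unless DRY_RUN=0 is set, in which case they are really executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

purge_nvidia() {
    # '~nnvidia' is an apt pattern: ~n selects packages whose name matches
    # the regular expression that follows, here "nvidia". It is quoted so
    # the shell never attempts tilde expansion on it.
    run sudo apt purge '~nnvidia'
    # Remove dependencies that are no longer needed.
    run sudo apt autoremove
}

install_nvidia_550() {
    run sudo apt install nvidia-driver-550
}
```

Run purge_nvidia from a TTY, reboot, confirm that a graphical session comes up, then run install_nvidia_550 and reboot again.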

@XV-02

XV-02 commented Feb 12, 2024

Broadly, on our lab hardware, this driver is behaving as expected. There is one obvious issue, though: nvidia-smi reports the driver version as 550.40.07, not 550.40.08.
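
A quick way to repeat this version check is sketched below. The function names installed_driver_version and check_driver_version are assumptions for illustration; the nvidia-smi query options are the standard ones for printing only the driver version.

```shell
#!/bin/sh
# Print the driver version that nvidia-smi reports (first GPU only).
installed_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Compare an expected version string with the reported one.
# Returns non-zero on a mismatch.
check_driver_version() {
    expected=$1
    actual=$2
    if [ "$actual" = "$expected" ]; then
        echo "ok: driver is $actual"
    else
        echo "mismatch: expected $expected, got $actual"
        return 1
    fi
}

# Example:
# check_driver_version 550.40.07 "$(installed_driver_version)"
```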

It looks to be behaving without issue on 10xx series, 20xx series, and 40xx series GPUs in my desktop testing. On the laptop front, I've only really looked at a 16xx series system, so that might leave a 40xx series laptop to test unless someone else on QA has already covered that base.

Finally, something I have noticed recently, which is a nuisance and not an actual driver problem, is that the output of nvidia-smi is now wider than our default width for a freshly spawned gnome-terminal window in floating mode. I don't know if that's something we had configured ourselves or not, but I thought I'd mention it as a quality-of-life thing in case it was and it had slipped through the cracks at some point.
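
To put a number on the width mismatch, the widest output line can be compared with the terminal's column count. A minimal sketch, where max_line_width is a hypothetical helper name:

```shell
#!/bin/sh
# Print the length of the longest line read from standard input
# (0 when the input is empty).
max_line_width() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# Example: compare the widest nvidia-smi line with the terminal width.
# nvidia-smi | max_line_width
# tput cols
```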

@mmstick
Member Author

mmstick commented Feb 12, 2024

550.40.07 is correct. I must have typo'd the .08

@RayJW

RayJW commented Feb 13, 2024

@XV-02 Sorry, it seems like that fixed it, although I was so sure I had already tried that. I did, however, only purge ".*nvidia.*", so maybe the ~nnvidia did the trick. I can confirm the system and the session seem to be working without issues so far!

@leviport leviport requested review from a team February 13, 2024 22:46
leviport previously approved these changes Feb 13, 2024
@mmstick mmstick mentioned this pull request Feb 23, 2024
@jackpot51 jackpot51 closed this Feb 28, 2024
@jackpot51 jackpot51 deleted the nvidia-550.40.08 branch February 28, 2024 20:19
7 participants