Browser WebGL Creation Fails on NVIDIA #84

Closed
WatchMkr opened this issue Mar 3, 2023 · 12 comments

@WatchMkr

WatchMkr commented Mar 3, 2023

NVIDIA RTX A2000 12GB
Driver 525.85.05

From about:support

WebGL 1 Driver Renderer
WebGL creation failed:

  • WebglAllowWindowsNativeGl:false restricts context creation on this system. ()
  • Exhausted GL driver options. (FEATURE_FAILURE_WEBGL_EXHAUSTED_DRIVERS)

WebGL 2 Driver Renderer
WebGL creation failed:

  • AllowWebgl2:false restricts context creation on this system. ()

Window Protocol
wayland

Desktop Environment
pop:cosmic

Quite a few errors related to blocklisting by glxinfo
#BLOCKLIST_FEATURE_FAILURE_GLXTEST_FAILED
Blocklisted by gfxInfo

Failure Log
No GPUs detected via PCI
glxtest: process failed (received signal 11)

@ids1024
Member

ids1024 commented Mar 3, 2023

I see the same behavior if I run this on my gaze15 with cosmic-comp set to use the Nvidia GPU for rendering, with MOZ_ENABLE_WAYLAND=1.

With MOZ_ENABLE_WAYLAND=0 it seems to be falling back to LLVMPipe for some reason? Which "works", but not well.

glxtest: process failed (received signal 11)

Signal 11 is SIGSEGV. So it could be a segfault within the driver? But it also shouldn't use GLX on Wayland (it shouldn't be possible to use GLX). So unless that's inaccurately named, something's going wrong if it's using GLX rather than EGL.

@ids1024
Member

ids1024 commented Mar 3, 2023

Or looking at the Firefox source, maybe it's expected to be prone to segfaults. But anyway, it shouldn't be using glx.

  // bug 639842 - it's very important to fire this process BEFORE we set up
  // error handling. indeed, this process is expected to be crashy, and we
  // don't want the user to see its crashes. That's the whole reason for
  // doing this in a separate process.
  //  
  // This call will cause a fork and the fork will terminate itself separately
  // from the usual shutdown sequence
  fire_glxtest_process();   
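
To illustrate the pattern that comment describes, here is a minimal sketch of running a crash-prone probe in a forked child so that the parent only sees a wait status. It is not Firefox's implementation: `run_isolated` and `crashing_probe` are hypothetical names, and the "driver crash" is simulated by having the child send itself SIGSEGV. It assumes a POSIX system.

```python
import os
import signal


def run_isolated(probe):
    """Run probe() in a forked child and describe how the child ended."""
    pid = os.fork()
    if pid == 0:
        # Child: run the risky probe, then exit without normal interpreter shutdown.
        try:
            probe()
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "signal " + signal.Signals(os.WTERMSIG(status)).name
    return "exit %d" % os.WEXITSTATUS(status)


def crashing_probe():
    # Hypothetical stand-in for a driver bug: deliver SIGSEGV to ourselves.
    os.kill(os.getpid(), signal.SIGSEGV)


if __name__ == "__main__":
    # The parent survives both runs and only reports how each child ended.
    print(run_isolated(crashing_probe))
    print(run_isolated(lambda: None))
```

On Linux, signal number 11 decodes to SIGSEGV, which is what "received signal 11" in the failure log refers to.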

@WatchMkr
Author

WatchMkr commented Mar 6, 2023

Chrome is also dropping to LLVMPipe, though more gracefully. This appears to be related to NVIDIA in general. Issue title updated.

chrome://gpu/
WebGL: Software only, hardware acceleration unavailable
WebGL2: Software only, hardware acceleration unavailable
WebGPU: Disabled

@WatchMkr WatchMkr changed the title from "Firefox WebGL Creation Failes on NVIDIA" to "Browser WebGL Creation Fails on NVIDIA" Mar 6, 2023

@ids1024
Member

ids1024 commented Mar 7, 2023

Hm, these errors logged to journalctl may be relevant, or are at least worth checking.

I see the compositor is still advertising a bunch of formats with zwp_linux_dmabuf_v1, but unlike amdgpu on the Dev One, no formats are available with linear modifiers, only modifiers that I presume are vendor-specific. Maybe that's expected on this GPU though, and it isn't obviously something that should cause an issue for WebGL...

Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: In eglQueryDmaBufModifiersEXT: Invalid format
Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: In eglQueryDmaBufModifiersEXT: Invalid format
Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: In eglQueryDmaBufModifiersEXT: Invalid format
Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: In eglQueryDmaBufModifiersEXT: Invalid format

Edit: Actually, I see there's a comment in Smithay about this call failing being a known issue with Nvidia's driver. And it hopefully shouldn't prevent things from working...
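
When checking logs like the ones above, it can help to tally which EGL call fails and with which code. The following is a small sketch that assumes the journal lines keep the `[EGL] <code> (<name>) <function>:` shape shown in this comment; `count_egl_errors` and the regular expression are illustrative, not part of cosmic-comp or Smithay.

```python
import re
from collections import Counter

# Matches the "[EGL] 0x300c (BAD_PARAMETER) eglQueryDmaBufModifiersEXT:" part of a line.
EGL_ERROR = re.compile(r"\[EGL\] (0x[0-9a-fA-F]+) \((\w+)\) (\w+):")


def count_egl_errors(lines):
    """Count (code, name, function) triples found in journal lines."""
    counts = Counter()
    for line in lines:
        match = EGL_ERROR.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# The four identical lines quoted from journalctl in this comment.
sample = [
    "Mar 07 09:41:58 pop-os cosmic-comp[248483]: [EGL] 0x300c (BAD_PARAMETER) "
    "eglQueryDmaBufModifiersEXT: EGL_BAD_PARAMETER error: "
    "In eglQueryDmaBufModifiersEXT: Invalid format",
] * 4

if __name__ == "__main__":
    for (code, name, function), n in count_egl_errors(sample).items():
        print(f"{function}: {name} ({code}) x{n}")
```

A single function failing repeatedly with the same code, as here, is consistent with a per-format query failing rather than with several unrelated problems.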

@ids1024
Member

ids1024 commented Mar 7, 2023

Okay, so looking at the Firefox code and the bug tracker:

  • Firefox forks a process called glxtest to test what is supported. It uses a separate process so it can handle crashes.
  • Despite the name, glxtest uses EGL on Wayland, and now attempts to use EGL before GLX on X.
    • So the segfault is indeed the problem and not some distraction, but does not actually indicate GLX is involved.
  • The process is segfaulting. Presumably somewhere in Nvidia's EGL implementation, but I'm having trouble debugging the subprocess.
    • For some reason Firefox hangs and never gets to the point in gdb with detach-on-fork disabled.
  • https://bugzilla.mozilla.org/show_bug.cgi?id=1787597 seems similar. https://bugzilla.mozilla.org/show_bug.cgi?id=1768260 mentions a crash that sounds like it could occur with Nvidia/wayland in egltest, but it looks like that shouldn't be a segfault?

@ids1024
Member

ids1024 commented Mar 7, 2023

Okay, I can break on fork, then set follow-fork-mode child after the first fork to attach the debugger to the glxtest process, and get debug symbols for libraries through debuginfod.

#0  0x00007ffff59a1ee8 in queue_event (len=4096, display=0x7ffff779c430) at ../src/wayland-client.c:1499
#1  read_events (display=0x7fffffffc1a0) at ../src/wayland-client.c:1622
#2  wl_display_read_events (display=display@entry=0x7ffff779c430) at ../src/wayland-client.c:1705
#3  0x00007ffff59a2d59 in wl_display_dispatch_queue (queue=<optimized out>, display=<optimized out>)
    at ../src/wayland-client.c:1944
#4  wl_display_dispatch_queue (display=display@entry=0x7ffff779c430, queue=queue@entry=0x7ffff779c500)
    at ../src/wayland-client.c:1912
#5  0x00007ffff59a3d7f in wl_display_roundtrip_queue (display=0x7ffff779c430, queue=0x7ffff779c500)
    at ../src/wayland-client.c:1358
#6  0x00007fffeed0dcbb in  () at /usr/lib/firefox/libxul.so
#7  0x00007fffeed0e406 in  () at /usr/lib/firefox/libxul.so
#8  0x00007ffff145e81a in  () at /usr/lib/firefox/libxul.so
#9  0x00007fffeed0b544 in  () at /usr/lib/firefox/libxul.so
#10 0x00007fffeed0b9c3 in  () at /usr/lib/firefox/libxul.so
#11 0x00005555555c2c10 in _start ()

Kind of a strange place to segfault, but consistent with a comment in Firefox's glxtest.cpp:

  // This is enough to crash some broken NVIDIA prime + Wayland setups, see
  // https://github.com/NVIDIA/egl-wayland/issues/41 and bug 1768260.
  wl_display_roundtrip(dpy);
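
For anyone trying to reproduce the backtrace, the gdb steps described at the start of this comment could be scripted roughly as follows. This is a sketch under assumptions: `gdb_follow_fork_argv` is a hypothetical helper, `firefox` is assumed to resolve to something gdb can run directly (on some systems it is a wrapper script), and it relies on the first fork being the glxtest one, as observed above. The gdb commands themselves are standard ones.

```python
import shlex


def gdb_follow_fork_argv(program, *program_args):
    """Build a gdb command line that follows the child of the first fork."""
    commands = [
        "set breakpoint pending on",   # fork lives in libc, which is not loaded yet
        "set debuginfod enabled on",   # fetch library debug symbols on demand
        "break fork",
        "run",                         # stops at the first fork call
        "set follow-fork-mode child",  # from here on, stay with the child
        "continue",                    # run until the child stops or crashes
        "bt",
    ]
    argv = ["gdb"]
    for command in commands:
        argv += ["-ex", command]
    return argv + ["--args", program, *program_args]


if __name__ == "__main__":
    # Print the command line instead of running it, so it can be inspected first.
    print(shlex.join(gdb_follow_fork_argv("firefox")))
```

Running the printed command with MOZ_ENABLE_WAYLAND=1 set in the environment should match the configuration discussed earlier in the thread.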

@ids1024
Member

ids1024 commented Mar 7, 2023

In particular libwayland-client is failing at the line if (opcode >= proxy->object.interface->event_count) {, since proxy->object.interface is not a valid pointer.

@ids1024
Member

ids1024 commented Mar 8, 2023

glxtest seems to segfault on sway and GNOME Wayland too with the Nvidia GPU, though for some reason the whole Firefox process also fails when I try it there.

So it doesn't seem to be an issue on our end. I guess something is wrong with the Nvidia Wayland EGL implementation.

@ids1024
Member

ids1024 commented Mar 8, 2023

NVIDIA/egl-wayland#64 describes a segfault in Firefox in glxtest dereferencing the same thing in queue_event.

@ids1024
Member

ids1024 commented Mar 8, 2023

This appears to be fixed when I build and install https://github.com/NVIDIA/egl-wayland from git. So we should probably be able to package a newer release.

@ids1024
Member

ids1024 commented Mar 8, 2023

Actually it seems NVIDIA/egl-wayland@7af7082 is the commit that fixes it, so it's not in the latest release.

@ezkha

ezkha commented Apr 22, 2023

> Actually it seems NVIDIA/egl-wayland@7af7082 is the commit that fixes it, so it's not in the latest release.

I actually think it's NVIDIA/egl-wayland@c63bf73 that fixes it.

Regardless, latest release of egl-wayland still has the issue; a new release should be made.
I will test soon with git build of egl-wayland; stuck on 1.1.11 because of a third-party driver installer.

@WatchMkr WatchMkr added this to the alpha 1 milestone May 3, 2024
@WatchMkr WatchMkr closed this as completed May 9, 2024