Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call to glDeleteFramebuffers happens from wrong OS thread ⇒ crash on exit #879

Closed
kmcallister opened this issue Sep 6, 2013 · 8 comments
Closed

Comments

@kmcallister
Copy link
Contributor

@kmcallister kmcallister commented Sep 6, 2013

Sometimes (about half the time) the last reference to the Skia GL context is dropped from an OS thread that doesn't have the OpenGL thread-local state, causing a crash on exit. Here's a GDB transcript showing how this happens.

$ RUST_THREADS=2 gdb --args ./servo ../src/test/html/about-mozilla.html 
GNU gdb (GDB) 7.4.1-debian

(gdb) break SkNativeSharedGLContext::createGLContext
Breakpoint 1 (SkNativeSharedGLContext::createGLContext) pending.
(gdb) run

Breakpoint 1, SkNativeSharedGLContext::createGLContext (this=0x7fffeb613300)
    at servo/src/support/skia/skia/src/gpu/gl/unix/SkNativeSharedGLContext_unix.cpp:57
(gdb) info threads
  Id   Target Id         Frame 
  3    Thread 0x7fffec7e3700 (LWP 8422) "servo" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39
* 2    Thread 0x7ffff7fc6700 (LWP 8421) "servo" SkNativeSharedGLContext::createGLContext (this=0x7fffeb87a800)
    at servo/src/support/skia/skia/src/gpu/gl/unix/SkNativeSharedGLContext_unix.cpp:57
  1    Thread 0x7ffff7fca780 (LWP 8418) "servo" syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39

We are making OpenGL calls from the thread with Id = 2. The "Target Id" is equal to that thread's %fs register value, used for thread-local storage.

Let's examine a particular bit of TLS (offset justified below):

(gdb) x/xg 0x7ffff7fc6700-0xd0
0x7ffff7fc6630: 0x00007fffef362280
(gdb) x/xg 0x7fffec7e3700-0xd0
0x7fffec7e3630: 0x0000000000000000

The current thread (Id = 2) has a non-null value here, pointing to data within libnvidia-glcore.so.304.88, while thread 3 has NULL.

(gdb) disable 1
(gdb) continue

Servo displays the page. Now we close the window.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffec7e3700 (LWP 8434)]
0x00007ffff1258769 in glDeleteFramebuffers () from /usr/lib/x86_64-linux-gnu/libGL.so.1
(gdb) x/2i $pc-9
   0x7ffff1258760 <glDeleteFramebuffers>:   mov    %fs:0xffffffffffffff30,%rax
=> 0x7ffff1258769 <glDeleteFramebuffers+9>: jmpq   *0x13f8(%rax)
(gdb) p/x $rax
$2 = 0x0

We are on thread 3, trying to call glDeleteFramebuffers through that NULL pointer from TLS. The offset from %fs is the one we examined earlier.

(gdb) backtrace
#0  0x00007ffff1258769 in glDeleteFramebuffers () from /usr/lib/x86_64-linux-gnu/libGL.so.1
#1  0x00007ffff5ee79ca in SkNativeSharedGLContext::~SkNativeSharedGLContext (this=0x7fffeb613300, __in_chrg=<optimized out>)
    at servo/src/support/skia/skia/src/gpu/gl/unix/SkNativeSharedGLContext_unix.cpp:37
#2  0x00007ffff5ee7c12 in SkNativeSharedGLContext::~SkNativeSharedGLContext (this=0x7fffeb613300, __in_chrg=<optimized out>)
    at servo/src/support/skia/skia/src/gpu/gl/unix/SkNativeSharedGLContext_unix.cpp:48
#3  0x00007ffff5de43bc in SkRefCnt::internal_dispose (this=0x7fffeb613300) at servo/src/support/skia/skia/include/core/SkRefCnt.h:104
#4  0x00007ffff5db8d96 in SkRefCnt::unref (this=0x7fffeb613300)
    at servo/src/support/azure/rust-azure/../../skia/skia/include/core/SkRefCnt.h:66
#5  0x00007ffff5db8dca in SkRefCnt::Release (this=0x7fffeb613300)
    at servo/src/support/azure/rust-azure/../../skia/skia/include/core/SkRefCnt.h:82
#6  0x00007ffff5db7fbb in AzReleaseSkiaSharedGLContext (aGLContext=0x7fffeb613300)
    at servo/src/support/azure/rust-azure/azure-c.cpp:175
#7  0x00007ffff5d9bd45 in azure_hl::__extensions__::meth_3573::drop::_7696d3b5ca3f6f5::_0$x2e1 ()
   from servo/build/x86_64-unknown-linux-gnu/src/support/azure/rust-azure/libazure-1bdc5ce4bb2c9e2a-0.1.so
#8  0x0000000000439475 in azure..azure_hl..DrawTarget::_94513dc4b5937bf1::glue_drop_10842 ()
#9  0x0000000000439418 in msg..compositor_msg..LayerBuffer::_22fad6777e74b18f::glue_drop_10836 ()
#10 0x0000000000469be9 in msg..compositor_msg..LayerBufferSet::_a358786b25d81f9::glue_drop_13495 ()
#11 0x0000000000469cda in std..option..Option$LT$$UP$msg..compositor_msg..LayerBufferSet$GT$::_77a336369c3b8b3c::glue_drop_13507 ()
#12 0x00000000005270ac in gfx..render_task..RenderTask$LT$compositing..CompositorChan$C$script..dom..node..AbstractNode$LT$$LP$$RP$$GT$$GT$::_a576ad71b9a38ba6::glue_drop_23345 ()
#13 0x000000000050bf54 in render_task::__extensions__::create_21744::anon::expr_fn_21765 ()
...

Note that I had to apply this patch first:

diff --git a/src/components/main/platform/common/glfw_windowing.rs b/src/components/main/platform/common/glfw_windowing.rs
index 9383dc7..afcc6e6 100644
--- a/src/components/main/platform/common/glfw_windowing.rs
+++ b/src/components/main/platform/common/glfw_windowing.rs
@@ -33,7 +33,7 @@ impl ApplicationMethods for Application {

 impl Drop for Application {
     fn drop(&self) {
-        glfw::terminate();
+        //glfw::terminate();
     }
 }

to avoid another crash on exit (#822).

@metajack
Copy link
Contributor

@metajack metajack commented Sep 6, 2013

Sounds like we need to add MakeCurrent() to the drop logic.

@metajack
Copy link
Contributor

@metajack metajack commented Sep 6, 2013

Or better yet, the Skia destructor should call MakeCurrent so we can't get this wrong.

I'm not super satisfied with how MakeCurrent() works in general. It's too easy to get wrong right now.

@kmcallister
Copy link
Contributor Author

@kmcallister kmcallister commented Sep 9, 2013

When I patch Skia like so:

diff --git a/src/gpu/gl/unix/SkNativeSharedGLContext_unix.cpp b/src/gpu/gl/unix/SkNativeSharedGLContext_unix.cpp
index d7dd097..2ce48f5 100644
--- a/src/gpu/gl/unix/SkNativeSharedGLContext_unix.cpp
+++ b/src/gpu/gl/unix/SkNativeSharedGLContext_unix.cpp
@@ -33,6 +33,7 @@ SkNativeSharedGLContext::SkNativeSharedGLContext(GrGLSharedContext sharedContext

 SkNativeSharedGLContext::~SkNativeSharedGLContext() {
     if (fGL) {
+        glXMakeCurrent(fDisplay, fGlxPixmap, fContext);
         SK_GL_NOERRCHECK(*this, DeleteFramebuffers(1, &fFBO));
         SK_GL_NOERRCHECK(*this, DeleteTextures(1, &fTextureID));
         SK_GL_NOERRCHECK(*this, DeleteRenderbuffers(1, &fDepthStencilBufferID));

the segfaults are replaced by

X Error of failed request:  BadDrawable (invalid Pixmap or Window parameter)
  Major opcode of failed request:  138 (NV-GLX)
  Minor opcode of failed request:  4 ()
  Resource id in failed request:  0x1a00008
  Serial number of failed request:  479
  Current serial number in output stream:  479

So we'll have to do something more.

@kmcallister
Copy link
Contributor Author

@kmcallister kmcallister commented Sep 9, 2013

That goes away if I preserve the GLFW window (in addition to the library global state) through program exit.

@metajack
Copy link
Contributor

@metajack metajack commented Sep 9, 2013

I think it might be right to call glXMakeCurrent here anyway, even if it doesn't solve this problem.

@kmcallister
Copy link
Contributor Author

@kmcallister kmcallister commented Sep 9, 2013

Yeah, I think it is the right solution, we just need to keep the window alive too (which is an extension of the fix for #822)

@jdm
Copy link
Member

@jdm jdm commented Apr 18, 2014

Did we fix this?

@larsbergstrom
Copy link
Contributor

@larsbergstrom larsbergstrom commented Apr 20, 2014

Yes, this is fixed by the sweeping changes in #1487

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.