eglGetDisplay heap corruption #99

Closed
johang opened this Issue Oct 6, 2012 · 16 comments

4 participants

@johang

I have an issue where a call to eglGetDisplay somehow writes in Glib's thread-local storage, which corrupts Glib's slice allocator. The pointer magazine1 here (0x3000) is corrupted:

Program received signal SIGSEGV, Segmentation fault.
g_slice_alloc (mem_size=12)
    at /build/glib2.0-GEFbTq/glib2.0-2.32.3/./glib/gslice.c:988
988     /build/glib2.0-GEFbTq/glib2.0-2.32.3/./glib/gslice.c: No such file or directory.
(gdb) bt
#0  g_slice_alloc (mem_size=12)
    at /build/glib2.0-GEFbTq/glib2.0-2.32.3/./glib/gslice.c:988
[...]
(gdb) p tmem
$2 = (ThreadMemory *) 0x12698
(gdb) p *tmem
$3 = {magazine1 = 0x3000, magazine2 = 0x12a98}

It should contain approximately the same address as magazine2. I have tracked this down to two instructions in eglGetDisplay. See line +36 and +40:

(gdb) disassemble eglGetDisplay
Dump of assembler code for function eglGetDisplay:
=> 0x40148690 <+0>: push    {r4, lr}
   0x40148694 <+4>: ldr r3, [pc, #44]   ; 0x401486c8 <eglGetDisplay+56>
   0x40148698 <+8>: ldr r2, [pc, #44]   ; 0x401486cc <eglGetDisplay+60>
   0x4014869c <+12>:    add r3, pc, r3
   0x401486a0 <+16>:    ldr r2, [r3, r2]
   0x401486a4 <+20>:    mov r4, r0
   0x401486a8 <+24>:    ldr r0, [r2]
   0x401486ac <+28>:    bl  0x4014650c
   0x401486b0 <+32>:    cmp r0, #0
   0x401486b4 <+36>:    movne   r3, #12288  ; 0x3000
   0x401486b8 <+40>:    strne   r3, [r0]
   0x401486bc <+44>:    mov r0, r4
   0x401486c0 <+48>:    pop {r4, lr}
   0x401486c4 <+52>:    b   0x40146104
   0x401486c8 <+56>:    muleq   r2, r8, r10
   0x401486cc <+60>:            ; <UNDEFINED> instruction: 0x000004b0
End of assembler dump.

Sadly, the only way to reproduce the issue is by building Cogl[1] and running any of the example Cogl programs. I have not found a way to reproduce the crash without Cogl. My question is, why is eglGetDisplay writing there, and what is it writing?

It's easy to workaround this issue though. Thing is, eglGetDisplay doesn't seem to do anything important and always returns 1. I just replace the call with 1.

[1] http://git.gnome.org/browse/cogl

@popcornmix
static INLINE CLIENT_THREAD_STATE_T *CLIENT_GET_CHECK_THREAD_STATE(void)
{
   return (CLIENT_THREAD_STATE_T *)platform_tls_get_check(client_tls);
}

EGLAPI EGLDisplay EGLAPIENTRY eglGetDisplay(EGLNativeDisplayType display_id)
{
   CLIENT_THREAD_STATE_T *thread = CLIENT_GET_CHECK_THREAD_STATE();
   if (thread)
      thread->error = EGL_SUCCESS;

   return khrn_platform_set_display_id(display_id);
}

I'm going to guess there is memory corruption elsewhere, or you are not calling this from a valid thread (i.e. a thread that has called eglMakeCurrent)

@johang

EGL_SUCCESS is 0x3000 so that sort of explains it. Thanks.

@popcornmix popcornmix closed this Oct 28, 2012
@julianscheel

I'm struggling with this issue as well. Should eglMakeCurrent be called before eglGetDisplay? Does this make sense? eglMakeCurrent awaits the display as parameter, so this sounds odd to me?
If it should be called, what would the parameters look like?

@popcornmix

Calling eglGetDisplay before eglMakeCurrent should be fine (hello_triangle does it).
My understanding is that CLIENT_GET_CHECK_THREAD_STATE should return NULL if eglMakeCurrent hasn't been called, and a valid pointer to CLIENT_THREAD_STATE_T if it has.

The code is here if you want to see what happens:
https://github.com/raspberrypi/userland/blob/master/interface/khronos/egl/egl_client_cr.c#L172

@sdroege

Please re-open this issue.

I can confirm that eglGetDisplay() causes memory corruption when it's not preceded with eglMakeCurrent() with a bogus display value (because we don't have a display yet of course). This happens with GStreamer's eglglessink plugin for example.

hello_triangle however works fine for me, probably because it doesn't use enough memory and nothing critical is corrupted.

valgrind doesn't show anything suspicious in both cases other than usage of unitialized memory in many places.

@popcornmix popcornmix reopened this Feb 13, 2013
@popcornmix

Reopening. I'll try to step through the code and protect against an invalid thread pointer when I get a chance.

@popcornmix

I've had a closer look. You need to have called bcmhost_init (which calls vcos_platform_init) before calling eglGetDisplay (or any function that communicates with GPU)

Are you doing that?

@sdroege

Yes

@popcornmix

Could you provide sample code that provokes this?

@sdroege

I don't have standalone code that reproduces it but you can easily reproduce it with
http://cgit.freedesktop.org/gstreamer/gst-plugins-bad/tree/ext/eglgles

You only need to compile gstreamer and gst-plugins-base for that. (Remove gsteglglessink.c:1550 , it has the eglMakeCurrent() workaround)

@popcornmix

@sdroege
Sorry, building gstreamer is not something I'm likely to have time for.
Can you post the binary (plus libs or list of apt-get installs required) to run it?

@sdroege

This should be the same as #112 , which has a patch

@sdroege

@popcornmix Does the analysis in #112 and the patch make sense to you?

@popcornmix

@sdroege
I've not understood the sequence the function calls occur in, that results in the failure case described.
Is this something you understand well enough to examplian?

@sdroege

@popcornmix Not really but I think I understood the real problem now. eglGetDisplay() is called as first EGL function in a new thread (eglInitialize() and other stuff was called from other threads already) and it is the first function calling platform_tls_get_check(). Nothing in this thread has called platform_tls_get() yet, so the client state for this thread is not initialized yet.

The patch from pull request 15 is fixing that the same way as it's done for platform_tls_get(). And it's fixing the heap correction for me.

@popcornmix popcornmix closed this Jun 8, 2013
@ajlennon ajlennon added a commit to DynamicDevices/meta-raspberrypi that referenced this issue Jun 8, 2014
@ajlennon ajlennon gstreamer1.0-plugins-bad: Build eglglessink for Raspberry Pi
The eglglessink needs to be built to target the Raspberry Pi or it will seg-fault in use.

Autoconf attempts to detect whether to build for RPi but there are some include files needed, the search path to which must be specified.

ref: raspberrypi/firmware#34

ref: raspberrypi/firmware#99

This patch adds the needed include paths and forces the plugin to be built for Raspberry Pi.

Change-Id: Iffa79dfa9d9e852848ab1ab70a58c124567b7298
Signed-off-by: Alex J Lennon <ajlennon@dynamicdevices.co.uk>
4d7dc4a
@agherzan agherzan pushed a commit to djwillis/meta-raspberrypi that referenced this issue Jun 12, 2014
@ajlennon ajlennon gstreamer1.0-plugins-bad: Build eglglessink for Raspberry Pi
The eglglessink needs to be built to target the Raspberry Pi or it will
seg-fault in use.

Autoconf attempts to detect whether to build for RPi but there are some
include files needed, the search path to which must be specified.

ref: raspberrypi/firmware#34

ref: raspberrypi/firmware#99

This patch adds the needed include paths and forces the plugin to be
built for Raspberry Pi.

With this patch the following pipeline works on RPi,

  modprobe bcm2835-v4l2 gst_v4l2src_is_broken=1
  gst-launch-1.0 --gst-debug-no-color v4l2src \
  ! 'video/x-raw,format=RGB,width=1280,height=720,framerate=(fraction)30/1' \
  ! eglglessink max-lateness=-1

Change-Id: Iabd93c0601c2ab898de3352a7d30423ef9cfce43
Signed-off-by: Alex J Lennon <ajlennon@dynamicdevices.co.uk>
c67cee2
@agherzan agherzan added a commit to djwillis/meta-raspberrypi that referenced this issue Aug 14, 2014
@ajlennon ajlennon gstreamer1.0-plugins-bad: Build eglglessink for Raspberry Pi
The eglglessink needs to be built to target the Raspberry Pi or it will
seg-fault in use.

Autoconf attempts to detect whether to build for RPi but there are some
include files needed, the search path to which must be specified.

ref: raspberrypi/firmware#34

ref: raspberrypi/firmware#99

This patch adds the needed include paths and forces the plugin to be
built for Raspberry Pi.

With this patch the following pipeline works on RPi,

  modprobe bcm2835-v4l2 gst_v4l2src_is_broken=1
  gst-launch-1.0 --gst-debug-no-color v4l2src \
  ! 'video/x-raw,format=RGB,width=1280,height=720,framerate=(fraction)30/1' \
  ! eglglessink max-lateness=-1

Change-Id: Iabd93c0601c2ab898de3352a7d30423ef9cfce43
Signed-off-by: Alex J Lennon <ajlennon@dynamicdevices.co.uk>
946b692
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment