This repository has been archived by the owner on Apr 27, 2024. It is now read-only.

Crash in native code #52

Open
baldur opened this issue Mar 14, 2014 · 35 comments

Comments

@baldur

baldur commented Mar 14, 2014

Hi, I'm just diving in to troubleshoot a crash that started happening after we updated to Android 4.4. The affected devices are Samsung Galaxy S4s. We also have a Nexus running 4.4 which doesn't seem to be affected, or at least we have not experienced the problem there, nor did we with our S4s prior to the 4.4 update.

I figured I would raise the issue here in case someone already knows about it or has some insights. I will follow up as I progress in my search.

********** Crash dump: **********
Build fingerprint: 'samsung/jflteuc/jflteatt:4.4.2/KOT49H/I337UCUFNB1:user/release-keys'
pid: 19670, tid: 19693, name: Thread-8277  >>> com.mapzen <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 80e6c000 
Stack frame #00  pc 0002225c  /system/lib/libc.so (__memcpy_base+227)
Stack frame #01  pc 00071107  /system/vendor/lib/egl/libGLESv2_adreno.so (rb_memcpy+282)
Stack frame #02  pc 0007d4a1  /system/vendor/lib/egl/libGLESv2_adreno.so (rb_vbo_cache_buffer+320)
Stack frame #03  pc 000465a7  /system/vendor/lib/egl/libGLESv2_adreno.so (cache_vbo_attrib+298)
Stack frame #04  pc 0004962d  /system/vendor/lib/egl/libGLESv2_adreno.so
Stack frame #05  pc 00049da5  /system/vendor/lib/egl/libGLESv2_adreno.so (core_glDrawElementsInstancedXXX+140)
Stack frame #06  pc 00049fd7  /system/vendor/lib/egl/libGLESv2_adreno.so (core_glDrawElements+10)
Stack frame #07  pc 00039767  /system/vendor/lib/egl/libGLESv2_adreno.so (glDrawElements+28)
Stack frame #08  pc 00020bcc  /system/lib/libdvm.so (dvmPlatformInvoke+112)
Stack frame #09  pc 00051927  /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+398)
Stack frame #10  pc 0002a060  /system/lib/libdvm.so
Stack frame #11  pc 00031510  /system/lib/libdvm.so (dvmMterpStd(Thread*)+76)
Stack frame #12  pc 0002eba8  /system/lib/libdvm.so (dvmInterpret(Thread*, Method const*, JValue*)+184)
Stack frame #13  pc 00063e75  /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+336)
Stack frame #14  pc 00063e99  /system/lib/libdvm.so (dvmCallMethod(Thread*, Method const*, Object*, JValue*, ...)+20)
Stack frame #15  pc 00058b6b  /system/lib/libdvm.so
Stack frame #16  pc 0000d278  /system/lib/libc.so (__thread_entry+72)
Stack frame #17  pc 0000d410  /system/lib/libc.so (pthread_create+240)
@hjanetzek
Member

The same crash with an Adreno chipset and KitKat was reported earlier today (via mail). The user found that GL.glDrawElements(GL20.GL_LINES ... ) in ExtrusionRenderer triggers the problem. I haven't looked further into it yet, though a Google search for 'adreno kitkat rb_memcpy' shows the issue also happens elsewhere.

@baldur
Author

baldur commented Mar 19, 2014

Just dumping info here for what it's worth: https://gist.github.com/baldur/9652381

It's a bit strange: the map loads fine, but once you start interacting with it, it eventually crashes, almost always shortly after a GC run. I have tried fiddling with some of the code, including the line you mentioned, and I am not convinced it's the same issue. I've been unable to attach a gdb debugger due to various issues, http://developer.samsung.com/forum/thread/ndk-debugging-with-gdb/77/178834 being one of them.

I am sort of running out of ideas and if you have any advice on how I can help with debugging this further please let me know ... thankfully we do have devices still with 4.3 so we are not pressed for time but I can definitely spend a bit more time if you have ideas for what would be good to experiment with in order to identify the culprit.

@hjanetzek
Member

There were some reports for Unity providing similar traces. So I'm pretty sure it's a bug in the driver - or in Android memory management. One way to test whether it is caused by buffer data being garbage collected before it is moved to GL memory would be to comment out 'mUsedBuffers = releaseAll(mUsedBuffers);' in MapRenderer. If that's the case, one could keep references to the Buffer objects and not reuse them until the corresponding VBOs have been drawn once.
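
A minimal sketch of the "keep references until drawn" idea, assuming a hypothetical pool class (`DeferredBufferPool`, `markUploaded` and `onFrameDrawn` are illustrative names, not vtm API). A buffer handed to GL stays strongly referenced and only becomes reusable after the frame that draws from it has completed:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pool: buffers are not reused until the frame using them was drawn. */
final class DeferredBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final List<ByteBuffer> inFlight = new ArrayList<>();

    /** Returns a cleared direct buffer with at least the requested capacity. */
    ByteBuffer obtain(int capacity) {
        ByteBuffer b = free.poll();
        if (b == null || b.capacity() < capacity) {
            // Nothing suitable pooled: allocate a fresh native-order buffer.
            b = ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
        }
        b.clear();
        return b;
    }

    /** Call after the buffer was passed to GL; keeps it referenced and unavailable. */
    void markUploaded(ByteBuffer b) {
        inFlight.add(b);
    }

    /** Call once the frame has been drawn; only now may the buffers be reused. */
    void onFrameDrawn() {
        free.addAll(inFlight);
        inFlight.clear();
    }

    int inFlightCount() {
        return inFlight.size();
    }
}
```

If the crash disappeared with something like this in place, that would point at buffer lifetime rather than at the driver.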

@hjanetzek
Member

Could you try changing GL_DYNAMIC_DRAW in BufferObject to GL_STATIC_DRAW? It might use a different path in the driver and circumvent the problem.

@baldur
Author

baldur commented Mar 20, 2014

Tried both of those to no avail ... everything points to a problem with the driver, as you mentioned. The next step I am planning is to root the device and reset some of the settings, as suggested here: https://developer.qualcomm.com/forum/qdevnet-forums/mobile-technologies/mobile-gaming-graphics-optimization-adreno/26936. This mentions OpenGL 3, so I am not sure if it applies to our situation, but it's worth a shot.

The post does mention another workaround which I didn't quite understand but perhaps you understand what he means by:

I have found another thing to do that helps decrease the chance of it crashing, 
this is another terrible workaround but it "works." Every time I draw something with 
glDrawRangeElements, I insert a eglSwapBuffers. This has the downfall of absolutely 
murdering performance and introducing flickering, but again, it helps lessen the 
chance of crashing.

@stleusc
Contributor

stleusc commented Apr 13, 2014

I had just reported the same via email and then I found it here ;-)
One user of my app reported the same issue, same device, etc.
Any idea here?

@stleusc
Contributor

stleusc commented Apr 13, 2014

Not sure if what they discuss here can be applied (it was about buffers...)
http://www.tasharen.com/forum/index.php?topic=8415.msg42698#msg42698

@hjanetzek
Member

If you could check the ant traces file, one might see whether the crash is triggered by a rendering call from the same renderer - One reporter told me that it happens in ExtrusionRenderer, but he didn't reply to confirm that it's the only place. In that case one could disable 3D buildings for blacklisted drivers...

@baldur
Author

baldur commented Apr 14, 2014

Just for the record I can crash without the building layer added ... so I am not sure if that approach will suffice.

@bcamper

bcamper commented Apr 14, 2014

Also blacklisting two of the most popular GPUs doesn't feel like a great
permanent solution (though maybe a short-term band-aid).


@hjanetzek
Member

Could you send the crash details to qualcomm? - It seems one can get direct feedback on their forum with such issues:
https://developer.qualcomm.com/forum/qdevnet-forums/mobile-technologies/mobile-gaming-graphics-optimization-adreno/27030

@hjanetzek hjanetzek reopened this Apr 14, 2014
@stleusc
Contributor

stleusc commented Apr 14, 2014

Is there any way to use OpenGL 3 on these devices? I read that this fixed it in other apps.

@hjanetzek
Member

@stleusc where did you find that?

@stleusc
Contributor

stleusc commented Apr 15, 2014

I don't remember :-(
Would it be hard to implement the change?

@hjanetzek
Member

I guess it wouldn't - if it is possible to enable gles3. For the driver it should make no difference as gles2 is a strict subset of the gles3 api - but from what I've read about adreno drivers[1] I wouldn't count on 'should' :)

[1] https://dolphin-emu.org/blog/2013/09/26/dolphin-emulator-and-opengl-drivers-hall-fameshame/

@stleusc
Contributor

stleusc commented Apr 15, 2014

well according to this: http://developer.android.com/guide/topics/graphics/opengl.html
you can check if gles3 is supported and if so, use it!

@hjanetzek
Member

might be worth a try, maybe it really switches the complete driver .so... In org.oscim.android.gl.GLView() add:

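        // Needs these imports: android.opengl.GLSurfaceView, android.util.Log,
        // javax.microedition.khronos.egl.EGL10, javax.microedition.khronos.egl.EGLConfig,
        // javax.microedition.khronos.egl.EGLContext, javax.microedition.khronos.egl.EGLDisplay
        setEGLContextFactory(new GLSurfaceView.EGLContextFactory() {
            // EGL attribute key for the requested client API version
            private final int EGL_CONTEXT_CLIENT_VERSION = 0x3098;

            @Override
            public EGLContext createContext(EGL10 egl, EGLDisplay display, EGLConfig eglConfig) {
                // Try to get a GLES3 context first
                Log.w("", "creating OpenGL ES3 context");
                int[] attrib_list = { EGL_CONTEXT_CLIENT_VERSION, 3, EGL10.EGL_NONE };
                EGLContext context = egl.eglCreateContext(display, eglConfig,
                                                          EGL10.EGL_NO_CONTEXT, attrib_list);
                if (context != null && context != EGL10.EGL_NO_CONTEXT)
                    return context;

                // No GLES3 context available: fall back to GLES2
                Log.w("", "creating OpenGL ES2 context");
                attrib_list[1] = 2;
                context = egl.eglCreateContext(display, eglConfig, EGL10.EGL_NO_CONTEXT,
                                               attrib_list);
                return context;
            }

            @Override
            public void destroyContext(EGL10 egl, EGLDisplay display, EGLContext context) {
                egl.eglDestroyContext(display, context);
            }
        });

        setEGLConfigChooser(new GlConfigChooser());
        // Replaced by the context factory above:
        //setEGLContextClientVersion(2);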

@baldur
Author

baldur commented Apr 15, 2014

I had problems compiling your code sample, but I set the clientVersion directly to 3 and also changed all the constants in AndroidGL to use GLES30 in lieu of GLES20, and I still get crashes.

libGLESv2 seems to suggest that the driver for v2 is still being used so I wonder if this is not enough to get it to use gles3. Do you know what I can call in the running app to verify that I have successfully set it to use gles3?

********** Crash dump: **********
Build fingerprint: 'samsung/jflteuc/jflteatt:4.4.2/KOT49H/I337UCUFNB1:user/release-keys'
pid: 1143, tid: 1254, name: Thread-10097  >>> com.mapzen <<<
signal 7 (SIGBUS), code 2 (BUS_ADRERR), fault addr 7efd4940
Stack frame #00  pc 0002225c  /system/lib/libc.so (__memcpy_base+227)
Stack frame #01  pc 00071107  /system/vendor/lib/egl/libGLESv2_adreno.so (rb_memcpy+282)
Stack frame #02  pc 0007d4a1  /system/vendor/lib/egl/libGLESv2_adreno.so (rb_vbo_cache_buffer+320)
Stack frame #03  pc 000465a7  /system/vendor/lib/egl/libGLESv2_adreno.so (cache_vbo_attrib+298)
Stack frame #04  pc 0004962d  /system/vendor/lib/egl/libGLESv2_adreno.so
Stack frame #05  pc 00049da5  /system/vendor/lib/egl/libGLESv2_adreno.so (core_glDrawElementsInstancedXXX+140)
Stack frame #06  pc 00049fd7  /system/vendor/lib/egl/libGLESv2_adreno.so (core_glDrawElements+10)
Stack frame #07  pc 00039767  /system/vendor/lib/egl/libGLESv2_adreno.so (glDrawElements+28)
Stack frame #08  pc 00020bcc  /system/lib/libdvm.so (dvmPlatformInvoke+112)
Stack frame #09  pc 00051927  /system/lib/libdvm.so (dvmCallJNIMethod(unsigned int const*, JValue*, Method const*, Thread*)+398)
Stack frame #10  pc 00000214  /dev/ashmem/dalvik-jit-code-cache (deleted)

@hjanetzek
Member

If there is no /system/vendor/lib/egl/libGLESv3_adreno.so it is probably the correct library. What was the problem with the code above? It's the recommended way to query the gl version at http://developer.android.com/guide/topics/graphics/opengl.html - when the first call to eglCreateContext does return a context then you have a gles3 context.
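
Another way to check from the running app is to look at the version string of the current context. A small sketch, assuming the string comes from `glGetString(GL_VERSION)` called on the render thread; the OpenGL ES specification formats it as `OpenGL ES <major>.<minor>` followed by vendor-specific text. The class and method names are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the major version from a GL_VERSION string. */
final class GlesVersion {
    private static final Pattern VERSION = Pattern.compile("OpenGL ES (\\d+)\\.(\\d+)");

    /** Returns the major GLES version, or -1 if the string is not recognized. */
    static int major(String glVersion) {
        if (glVersion == null) {
            return -1; // glGetString returns null when no context is current
        }
        Matcher m = VERSION.matcher(glVersion);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }
}
```

On a device this would be called as `GlesVersion.major(GLES20.glGetString(GLES20.GL_VERSION))` from within the renderer callbacks; a result of 3 would indicate that the GLES3 context was created.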

Maybe we can figure out if one specific vtm renderer triggers the crash - There are not many uses of glDrawElements. Have you tried to turn off LabelLayer and BuildingLayer? Thinking about it I suspect LineTexLayer.Renderer.draw() is the one - just comment out the body to check. I could write a simpler version if that one makes trouble :)

@hjanetzek
Member

while (curLayer != null && curLayer.type == TEXLINE) 
   curLayer = curLayer.next;
return;

must remain in draw() though ...

@baldur
Author

baldur commented Apr 15, 2014

I have previously tried pulling out both the building and label layers ... and now I tried emptying out the body of the draw method, and I am still seeing failures.

@hjanetzek
Member

So if there is no call to glDrawElements made by vtm anymore (all the other renderers use glDrawArrays), then glDrawElements may only be called by the Android UI or compositor, i.e. after a gl context switch. Still, I would like to find out which vtm renderer is involved: could you disable draw() in LineLayer and PolygonLayer the same way?

@baldur
Author

baldur commented May 6, 2014

@hjanetzek we have made some progress here and have identified the culprit:
https://github.com/opensciencemap/vtm/blob/master/vtm/src/org/oscim/renderer/elements/TextureLayer.java#L192

By commenting out that GL.glDrawElements call we have a running app that doesn't crash ... we found this by looking at which shaders were affected, and the app also runs if we make the main methods in this shader blank:

https://github.com/opensciencemap/vtm/blob/master/vtm/resources/assets/shaders/texture_layer.glsl

We don't know yet how to fix it, but we wanted to give you an update to see if you had thoughts in light of this discovery.

@hjanetzek
Member

When an attribute is not used in the shader it will be optimized out and GL.glGetAttribLocation will return an invalid handle (< 0). So glDrawElements will probably fail before even fetching data from the vbo (fail in the usual GL way - just show nothing).
If you are sure that only the texture renderer is involved, i.e. the crash happens when the texture renderer is the only active one, one could try to use dynamic vertex arrays instead of the vbo:
b729a52

@baldur
Author

baldur commented May 7, 2014

Awesome thanks so much, this patch appears to be working. Looks like the icons for pois are missing though.

@hjanetzek
Member

Good to hear that this works. I've added no-vbo option to SymbolLayer now: bdc63d8

Actually, I squashed the SymbolLayer change again and added a useVBO option to ElementLayers for putting vertex data into a separate buffer. This only works when ElementLayers contains only TextureLayers though.

@baldur
Author

baldur commented May 7, 2014

Awesome ... poi's are back and the map appears to be running smoothly on affected devices. Thanks again for fixing this.

@stleusc
Contributor

stleusc commented May 8, 2014

I also gave the fix to my affected users.
They report back that the issue is gone!

Great work!
Thanks....

@hjanetzek
Member

Merged with a check in MapView to enable the workaround for Samsung devices running Kitkat - If you have the exact models for the affected devices this test could be made more specific, but I guess devices running Kitkat are fast enough to have no measurable performance difference using no VBO in this case.
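
Roughly, such a check could look like the sketch below. This is not the merged code: the class and method names are hypothetical, and it assumes the caller passes in `Build.MANUFACTURER` and `Build.VERSION.SDK_INT` (KitKat is API level 19).

```java
import java.util.Locale;

/** Hypothetical helper deciding whether to enable the no-VBO workaround. */
final class VboWorkaround {
    /** API level of Android 4.4 (KitKat). */
    static final int KITKAT = 19;

    /** True for Samsung devices running KitKat, false for everything else. */
    static boolean needed(String manufacturer, int sdkInt) {
        if (manufacturer == null) {
            return false;
        }
        return sdkInt == KITKAT
                && manufacturer.toLowerCase(Locale.ROOT).contains("samsung");
    }
}
```

Keeping the decision in a plain function of two values makes it easy to narrow later, for example to specific models, without touching the rendering code.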

@bcamper

bcamper commented May 12, 2014

Thanks! We know the S4 (Adreno 320) and S5 (Adreno 330) devices are
affected - those are two of the most popular (maybe most?) Samsung devices
in the US.


@hjanetzek
Member

I could reproduce the crash with an S5 now. It seems the problem is actually the use of glBufferSubData (which seems to reliably have issues with adreno). Just disabling glBufferSubData makes it work for me. The crash probably shows up in the text renderer because its vertex data is replaced most frequently. So I guess a more appropriate fix would be 9c1ae88
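
To illustrate the idea (this is a sketch, not the content of that commit): when a flag is set, every upload re-specifies the whole buffer store with glBufferData instead of updating it in place with glBufferSubData. The `Gl` interface, `RecordingGl` and `avoidSubData` are hypothetical stand-ins; the two methods mirror the parameter lists of `GLES20.glBufferData` and `GLES20.glBufferSubData`.

```java
import java.nio.Buffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical uploader that can avoid glBufferSubData entirely. */
final class BufferUploader {
    /** Stand-in for the two GL calls involved. */
    interface Gl {
        void bufferData(int target, int size, Buffer data, int usage);
        void bufferSubData(int target, int offset, int size, Buffer data);
    }

    /** Fake GL that only records which call was made, for trying the logic off-device. */
    static final class RecordingGl implements Gl {
        final List<String> calls = new ArrayList<>();
        public void bufferData(int target, int size, Buffer data, int usage) {
            calls.add("bufferData");
        }
        public void bufferSubData(int target, int offset, int size, Buffer data) {
            calls.add("bufferSubData");
        }
    }

    static final int GL_ARRAY_BUFFER = 0x8892;
    static final int GL_DYNAMIC_DRAW = 0x88E8;

    private final Gl gl;
    private final boolean avoidSubData;
    private int allocated; // size in bytes of the current buffer store

    BufferUploader(Gl gl, boolean avoidSubData) {
        this.gl = gl;
        this.avoidSubData = avoidSubData;
    }

    /** Uploads size bytes; assumes the target buffer object is already bound. */
    void upload(Buffer data, int size) {
        if (avoidSubData || size > allocated) {
            // Re-specify the whole store (always taken when the workaround is on).
            gl.bufferData(GL_ARRAY_BUFFER, size, data, GL_DYNAMIC_DRAW);
            allocated = size;
        } else {
            // Update in place; the path that appears to crash on affected drivers.
            gl.bufferSubData(GL_ARRAY_BUFFER, 0, size, data);
        }
    }
}
```

With `avoidSubData` set to true the in-place path is never taken, at the cost of the driver reallocating the store on every upload.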

@baldur
Author

baldur commented Jun 24, 2014

@hjanetzek we found another device which has issues:
HTC One (M8) 4.4.2
HTC Sense version 6.0

Here is a gist of the logcat, if that's useful: https://gist.github.com/baldur/9dd383bfba1b83bb9593

As before, we tried commenting out the GL.glDrawElements call in:
https://github.com/opensciencemap/vtm/blob/master/vtm/src/org/oscim/renderer/elements/TextureLayer.java
which stops the crash from happening. We are happy to help troubleshoot this problem, so feel free to ask us for more details or to try things to sort this out.

@hjanetzek
Member

It seems to be the same problem. I wasn't pleased with the test for Samsung with Kitkat anyway - now I just found that one can get the vendor/renderer info via glGetString to disable the use of glBufferSubData for these chips. Could you try https://github.com/opensciencemap/vtm/tree/testing-adreno
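
The detection could be sketched as below. This is an illustration and not necessarily what the branch does: the class and method names are hypothetical, and it assumes the argument is the string returned by `glGetString(GL_RENDERER)` on the GL thread, which on these chips is expected to contain "Adreno".

```java
import java.util.Locale;

/** Hypothetical helper mapping the GL renderer string to a driver workaround. */
final class GpuQuirks {
    /** True when the renderer string names an Adreno GPU. */
    static boolean avoidBufferSubData(String renderer) {
        if (renderer == null) {
            return false; // no current context, nothing to decide
        }
        return renderer.toLowerCase(Locale.ROOT).contains("adreno");
    }
}
```

Compared with testing the manufacturer and OS version, this follows the GPU rather than the phone vendor, so devices such as the HTC One (M8) would be covered without a separate rule.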

@baldur
Author

baldur commented Jun 24, 2014

awesome that did the trick ... thanks

@Bezzu

Bezzu commented Mar 25, 2015

Hi,
I have had the same issue on a Samsung Galaxy S4 mini with Android 4.4.2, and I solved it by setting the variable "NO_BUFFER_SUB_DATA" in the file vtm/org/oscim/backend/GLAdapter.java to true.

hjanetzek pushed a commit to hjanetzek/vtm that referenced this issue Nov 18, 2018
5 participants