Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Engine crash in MessageLoopTaskQueues::NotifyObservers #38778

Closed
ds84182 opened this issue Aug 19, 2019 · 7 comments
Closed

Engine crash in MessageLoopTaskQueues::NotifyObservers #38778

ds84182 opened this issue Aug 19, 2019 · 7 comments
Assignees
Labels
c: crash Stack traces logged to the console c: regression It was better in the past than it is now engine flutter/engine repository. See also e: labels.

Comments

@ds84182
Copy link
Contributor

ds84182 commented Aug 19, 2019

Very inconsistent crash, may be concurrency related. The stack trace the tombstone gives back is incorrect (lack of frame pointers?). Here's a symbolicated stack dump instead:

engine commit: 8d11d1c815737939216ab3aa98a567c93e2d7e69

signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x1
Cause: null pointer dereference
    r0  00000001  r1  cf56d3e0  r2  cf56d3e0  r3  00000009
    r4  cf372e30  r5  e5a16250  r6  d0c85710  r7  00000018
    r8  ebd74960  r9  ebc8c728  r10 ebd23880  r11 ebd238cc
    ip  ef304cfc  sp  d0c856e0  lr  d0dc85a7  pc  d0dc5afa

stack:
         d0c856a0  00000000
         d0c856a4  e5a2b020  [anon:libc_malloc]
         d0c856a8  eb0dc810  [anon:libc_malloc]
         d0c856ac  eb0dc800  [anon:libc_malloc]
         d0c856b0  cf4098d0  [anon:libc_malloc]
         d0c856b4  cf4098d0  [anon:libc_malloc]
         d0c856b8  cf4098dc  [anon:libc_malloc]
         d0c856bc  cf56d3e0  [anon:libc_malloc]
         d0c856c0  e5a16250  [anon:libc_malloc]
         d0c856c4  d0c85710  <anonymous:d0b89000>
         d0c856c8  00000018
         d0c856cc  ebd74960  [anon:libc_malloc]
         d0c856d0  ebc8c728  [anon:libc_malloc]
         d0c856d4  ebd23880  [anon:libc_malloc]
         d0c856d8  ebd238cc  [anon:libc_malloc]
         d0c856dc  d0dc85a7  /data/app/<redacted>/lib/arm/libflutter.so
    #00  d0c856e0  d0c85710  <anonymous:d0b89000>
         ........  ........
    #01  d0c856e8  00000018
         d0c856ec  d0dc74ad  /data/app/<redacted>/lib/arm/libflutter.so
         d0c856f0  d0c85728  <anonymous:d0b89000>
         d0c856f4  00000001
         d0c856f8  00000018
         d0c856fc  d0dc7413  /data/app/<redacted>/lib/arm/libflutter.so
         d0c85700  ebd23650  [anon:libc_malloc]
         d0c85704  d0c85728  <anonymous:d0b89000>
         d0c85708  00000018
         d0c8570c  d0dc8547  /data/app/<redacted>/lib/arm/libflutter.so
         d0c85710  d1240a64  /data/app/<redacted>/lib/arm/libflutter.so
         d0c85714  ebd23650  [anon:libc_malloc]
         d0c85718  ed09b371  /system/lib/libutils.so (_ZN7android20SimpleLooperCallback11handleEventEiiPv)
         d0c8571c  d0db3be0  /data/app/<redacted>/lib/arm/libflutter.so
         d0c85720  d0c85710  <anonymous:d0b89000>
         d0c85724  ebc8c728  [anon:libc_malloc]

Manual stack trace:

(gdb) info line *((0xd0dc5afa - 0xd0da3000) + 0x11d000)
Line 1860 of "../../third_party/libcxx/include/functional"
   starts at address 0x13fafa <std::__1::__function::__value_func<void ()>::operator()() const+2>
   and ends at 0x13fb02 <std::__1::__function::__value_func<void ()>::operator()() const+10>.
(gdb) info line *((0xd0dc85a7 - 0xd0da3000) + 0x11d000)
Line 845 of "../../third_party/libcxx/include/__tree"
   starts at address 0x1425a6 <std::__1::__function::__func<fml::MessageLoopTaskQueues::NotifyObservers(fml::TaskQueueId)::$_2, std::__1::allocator<fml::MessageLoopTaskQueues::NotifyObservers(fml::TaskQueueId)::$_2>, void (fml::TaskQueueId)>::operator()(fml::TaskQueueId&&)+26>
   and ends at 0x1425ae <std::__1::__function::__func<fml::MessageLoopTaskQueues::NotifyObservers(fml::TaskQueueId)::$_2, std::__1::allocator<fml::MessageLoopTaskQueues::NotifyObservers(fml::TaskQueueId)::$_2>, void (fml::TaskQueueId)>::operator()(fml::TaskQueueId&&)+34>.
(gdb) info line *((0xd0dc74ad - 0xd0da3000) + 0x11d000)
Line 2419 of "../../third_party/libcxx/include/functional"
   starts at address 0x1414ac <std::__1::function<void (fml::TaskQueueId)>::operator()(fml::TaskQueueId) const+16>
   and ends at 0x1414ae <std::__1::function<void (fml::TaskQueueId)>::operator()(fml::TaskQueueId) const+18>.
(gdb) info line *((0xd0dc7413 - 0xd0da3000) + 0x11d000)
Line 27 of "../../flutter/fml/message_loop_task_queues.h"
   starts at address 0x141412 <fml::MessageLoopTaskQueues::MergedQueuesRunner::InvokeMerged(std::__1::function<void (fml::TaskQueueId)>)+14>
   and ends at 0x141414 <fml::MessageLoopTaskQueues::MergedQueuesRunner::InvokeMerged(std::__1::function<void (fml::TaskQueueId)>)+16>.
(gdb) info line *((0xd0dc8547 - 0xd0da3000) + 0x11d000)
Line 2406 of "../../third_party/libcxx/include/functional"
   starts at address 0x142546 <fml::MessageLoopTaskQueues::NotifyObservers(fml::TaskQueueId)+42>
   and ends at 0x14254c <fml::MessageLoopTaskQueues::NotifyObservers(fml::TaskQueueId)+48>.

Typically this happens during image loading in my project. I use a background view to manage thumbnail downloading (since I can use the http2 package, which works quite nicely). It is somewhat hard to reproduce, but it has happened on multiple devices.

cc @iskakaushik @chinmaygarde

@chinmaygarde
Copy link
Member

This seems related to dynamic thread merging but the engine SHA does not include the changes to the same made over the weekend. @iskakaushik: Does this look related? Probably needs more investigation but it seems like a junk observer is getting invoked.

@chinmaygarde chinmaygarde added engine flutter/engine repository. See also e: labels. c: crash Stack traces logged to the console c: regression It was better in the past than it is now labels Aug 19, 2019
@iskakaushik iskakaushik self-assigned this Aug 19, 2019
iskakaushik pushed a commit to iskakaushik/engine that referenced this issue Aug 20, 2019
@iskakaushik
Copy link
Contributor

@ds84182 I'm still trying to reproduce this issue. One theory I have is that there might be null observers getting added. I will continue trying to reproduce this.

I've made some changes so we warn when we do this going forward: flutter/engine#11315

Looks like you are running this on android, on commit 8d11d1 can you tell me which flutter_mode this app was running with? Is it release or debug?

Also is the backtrace you have provided for a 64 bit device?

Thanks!

iskakaushik pushed a commit to iskakaushik/engine that referenced this issue Aug 20, 2019
iskakaushik pushed a commit to iskakaushik/engine that referenced this issue Aug 20, 2019
iskakaushik added a commit to flutter/engine that referenced this issue Aug 21, 2019
@ds84182
Copy link
Contributor Author

ds84182 commented Aug 21, 2019

@iskakaushik It is a 32-bit Android device. The backtrace above was from release mode, but I've also hit it in debug mode.

@tvolkert
Copy link
Contributor

/cc @adazh

@jason-simmons
Copy link
Member

I was able to reproduce this by running an internal customer test case, and I think I know why this is happening.

The crash occurs in MessageLoopTaskQueues::NotifyObservers when observer.second is an invalid callback. The invalid callback is obtained by iterating over task_observers_[queue], which is a map containing closures.

MessageLoopTaskQueues::NotifyObservers holds a lock guarding task_observers_[queue]. But NotifyObservers does not hold the queue_meta_mutex_ that guards task_observers_. task_observers_ is a vector of maps, and if some other thread simultaneously calls MessageLoopTaskQueues::CreateTaskQueue, then task_observers_ may be reallocated. This will destroy the task_observers_[queue] instance that NotifyObservers is iterating over.

One option would be replacing the task_observers_ vector with a map of queue IDs to objects containing the queue's observers/locks/etc. MessageLoopTaskQueues methods can then grab the meta mutex, look up the queue's object in the map, release the meta mutex, and continue using the object.

The map will also make it easier to clean up a queue after the engine that created the queue is deleted.
Currently, each engine instance will create new queues for its task runners in the global singleton MessageLoopTaskQueues. Each new queue extends task_observers_ and other vectors. These vectors are never shrunk, resulting in a leak.

iskakaushik added a commit to flutter/engine that referenced this issue Aug 23, 2019
The core underlying issue is that vector push_back could re-allocate and cause us to segfault. I have switched the backing queues to a map per @jason-simmons suggestion in flutter/flutter#38778.

I've also added a test to capture the aforementioned bug. I've run internal tests several times to validate that this is fixed.

General threading note for this class is that only the following operations take a write lock on the meta mutex:

1. Create
2. Dispose

The rest of the operations take read lock on the meta mutex and acquire finer grained locks for the duration of the operation. We can not grab read lock for the entire duration of NotifyObservers for example because observer can in-turn create other queues -- Which we should not block.

Additional changes:

1. Make as many methods as possible const. Unlocked methods are all const.
2. Migrate all the queue members to a struct, and have a map.
3. Get rid of the un-used Swap functionality.
@iskakaushik
Copy link
Contributor

@ds84182 looks like this is fixed per our internal testing. Please let me know if you still run into issue on master. Thanks!

@github-actions
Copy link

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
c: crash Stack traces logged to the console c: regression It was better in the past than it is now engine flutter/engine repository. See also e: labels.
Projects
None yet
Development

No branches or pull requests

5 participants