Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Firestore failing to SetAsync() or log the call indefinitely after unknown triggering event #918

Closed
jonahgoldsaito opened this issue Jan 3, 2021 · 32 comments

Comments

@jonahgoldsaito
Copy link

jonahgoldsaito commented Jan 3, 2021

[REQUIRED] Please fill in the following fields:

  • Unity editor version: 2019.3.8, 2019.4, 2020.2
  • Firebase Unity SDK version: 6.15.2 and 7.0.2
  • Source you installed the SDK: Unity Package Manager
  • Problematic Firebase Component: Firestore
  • Other Firebase Components in use: Core, Auth, Storage, FCM, Crashlytics, Analytics
  • Additional SDKs you are using: NA
  • Platform you are using the Unity editor on: Mac
  • Platform you are targeting: iOS (various from iPhone 6 Plus to 12 Pro and iPad Pro)
  • Scripting Runtime: IL2CPP

[REQUIRED] Please describe the issue here:

(Please list the full steps to reproduce the issue. Include device logs, Unity logs, and stack traces if available.)

I am doing a simple SetAsync() call:

await docRef.SetAsync( dictionaryObj, SetOptions.MergeAll )
  .ContinueWithOnMainThread(task =>
    {
      Debug.Log(”We’ve arrived!”);
    });

In the following log, the first few SetAsync() calls work as expected, and eventually one fails absolutely silently, meaning:

  • The data is not created/updated in Firestore.
  • The ContinueWithOnMainThread() is never called at all, so there’s no isFaulted message to help debug.
  • Despite Firebase logging set to Firebase.LogLevel.Verbose there are no logs about any interactions in Firebase locally or remotely. As if the call never happened. The code is definitely reaching a Debug.Log() on the line directly preceding the SetAsync() call.

Other things to note:

  • After hitting this error, subsequent SetAsync() or GetSnapShotAsync() calls always fail silently whether targeting the same or a different docRef. Not sure if there's any way to "reboot" firestore once it's been initialized.
  • The app doesn’t crash or freeze, or throw any errors / warnings and the rest of the app continues to operate normally.
  • I don’t believe it has to do with the network. This call is always made immediately after an awaited Storage PutFileAsync() and GetDownloadUrlAsync() because the SetAsync() is essentially storing that retrieved URL. Those Storage calls never fail. It’s always the SetAsync the does.
  • This is happening when building to any iOS device. I can’t get it to occur running in the Editor (Mac). Haven’t tried Android.
  • It occurs only after the app has been running a bit. It never happens if the app is launched and the Upload / SetAsync sequence is run right away.

Given that verbose Firebase logging doesn’t show that a Firebase call is being attempted, I’m not sure how to debug where the failure is happening. Super unfortunate :(

Steps to reproduce:

Have you been able to reproduce this issue with just the Firebase Unity quickstarts (this GitHub project)?
Nope

What's the issue repro rate? (eg 100%, 1/5 etc)
It is quite regular… it happens 0% when the app is first launched, but quite consistently if the app has been doing its thing (this is an ARFoundation app that uses URP’s post processing and utilizes the VideoPlayer at times. I'm also using the NAudio.LAME lib to convert recorded audio to MP3 (the file being uploaded to Storage). There’s decent processing happening, certainly.

What happened? How can we make the problem occur?
This could be a description, log/console output, etc.

I’m honestly not sure how to create a simplified, reliable repro environment of this. Is there a way to turn on even more granular logging for Firestore to see at least at what point it fails?

Relevant Code:

@jonahgoldsaito jonahgoldsaito added the new New issue. label Jan 3, 2021
@thebrianchen thebrianchen self-assigned this Jan 4, 2021
@thebrianchen
Copy link

@jonahgoldsaito Thanks for the detailed bug report! Could you please make your Firebase logs link viewable to those with the link? It's currently restricted. Firebase.LogLevel.Verbose is the most granular log level we have. Hopefully, we can dig a little deeper with the logs you have!

@thebrianchen thebrianchen added the needs-info Need information for the developer label Jan 5, 2021
@thebrianchen thebrianchen assigned dconeybe and unassigned thebrianchen Jan 5, 2021
@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 5, 2021

Sorry, I just re-shared up the Google doc logs and added more complete logs with the initial successful reads/writes to Firestore, and then the sequence at the end where the SetAsync() fails silently (zero logging despite being set to VERBOSE), and then a subsequent GetSnapShotAsync() which also fails silently. Any Firestore call beyond this point appears to always fail in the same way.

I'm leaving the log pretty much as-is with some comments for context... sorry if that's overwhelming. Let me know if you'd rather I trim it down.

@google-oss-bot google-oss-bot added needs-attention Need Googler's attention and removed needs-info Need information for the developer labels Jan 5, 2021
@dconeybe
Copy link

dconeybe commented Jan 7, 2021

Thanks for the logs. I don't see anything obvious that jumps out to me. I'm writing an app that attempts to reproduce this issue and I'll reply back with the outcome of that experiment. If you have any success in writing a minimal app to reproduce then please share it as it will help with debugging immensely.

@jonahgoldsaito
Copy link
Author

Thanks @dconeybe! Working on trying to simplify the project without getting rid of the error.

@dconeybe
Copy link

Hi @jonahgoldsaito. I have been unable to reproduce this issue. I tried writing an iOS app that emulates the steps carried out in your description but I never saw the task returned from SetAsync() not completing.

You also mentioned that "Any Firestore call beyond this point appears to always fail in the same way" which makes me wonder if there is some sort of deadlock going on in Firestore that is blocking all future asynchronous operations. Could you capture the stack traces of all threads after reproducing the issue? The stack traces may end up pointing to somewhere in the code that is deadlocked, to give me something to investigate.

To capture the stack trace, do the following:

  1. Cause the situation where SetAsync() is failing.
  2. Attach Xcode.
  3. Pause the debugger.
  4. Run the following lldb command: thread backtrace all
  5. Past the output of that command into a gist and post the link here.

@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 14, 2021

Thanks for the direction, @dconeybe - Here's the gist.

BTW, I have a component that waits to execute if the SetAsync() hasn't returned in 10 seconds. It calls a simple GetSnapShotAsync() to just confirm that it also doesn't return or fail in any way.

Along those lines, is there a way to do a reset / reinitialization of Firestore if I suspect it's failed?

Still whittling down my app so I can isolate and/or share in a repo, and the only remaining Firebase pkgs are Core, Auth, and Firestore. I've removed the audio recording and all the file IO, and uploading to Storage. I seem to still be hitting the bug, though perhaps less often.

I was also seeing that some of the Firestore calls were being done too close together at times, so I added a layer that made sure there was at least a second between calls, but that didn't solve anything. One last note: I failed to mention that I'm using Zenject for dependency injection in the original post.

@dconeybe
Copy link

This stack trace looks suspicious:

  thread #83, queue = 'com.google.firebase.firestore'
    frame #0: 0x00000001c6eb81ac libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x00000001e4915458 libsystem_pthread.dylib`_pthread_cond_wait + 1192
    frame #2: 0x00000001ae151328 libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
    frame #3: 0x00000001061ff248 UnityFramework`firebase::firestore::util::BackgroundQueue::AwaitAll() + 64
    frame #4: 0x000000010627854c UnityFramework`firebase::firestore::local::LevelDbRemoteDocumentCache::GetAll(firebase::firestore::immutable::SortedSet<firebase::firestore::model::DocumentKey, firebase::firestore::util::Comparator<firebase::firestore::model::DocumentKey> > const&) + 844
    frame #5: 0x000000010627f860 UnityFramework`firebase::firestore::local::LocalDocumentsView::GetDocuments(firebase::firestore::immutable::SortedSet<firebase::firestore::model::DocumentKey, firebase::firestore::util::Comparator<firebase::firestore::model::DocumentKey> > const&) + 60
    frame #6: 0x0000000106288cd4 UnityFramework`std::__1::__function::__func<std::__1::enable_if<!(std::is_same<void, decltype(fp0())>::value), decltype(fp0())>::type firebase::firestore::local::Persistence::Run<firebase::firestore::local::LocalStore::AcknowledgeBatch(firebase::firestore::model::MutationBatchResult const&)::$_4>(absl::lts_2020_02_25::string_view, firebase::firestore::local::LocalStore::AcknowledgeBatch(firebase::firestore::model::MutationBatchResult const&)::$_4)::'lambda'(), std::__1::allocator<std::__1::enable_if<!(std::is_same<void, decltype(fp0())>::value), decltype(fp0())>::type firebase::firestore::local::Persistence::Run<firebase::firestore::local::LocalStore::AcknowledgeBatch(firebase::firestore::model::MutationBatchResult const&)::$_4>(absl::lts_2020_02_25::string_view, firebase::firestore::local::LocalStore::AcknowledgeBatch(firebase::firestore::model::MutationBatchResult const&)::$_4)::'lambda'()>, void ()>::operator()() + 136
    frame #7: 0x0000000106277328 UnityFramework`firebase::firestore::local::LevelDbPersistence::RunInternal(absl::lts_2020_02_25::string_view, std::__1::function<void ()>) + 136
    frame #8: 0x0000000106285890 UnityFramework`firebase::firestore::local::LocalStore::AcknowledgeBatch(firebase::firestore::model::MutationBatchResult const&) + 152
    frame #9: 0x00000001062dc254 UnityFramework`firebase::firestore::core::SyncEngine::HandleSuccessfulWrite(firebase::firestore::model::MutationBatchResult const&) + 132
    frame #10: 0x00000001062c4298 UnityFramework`firebase::firestore::remote::RemoteStore::OnWriteStreamMutationResult(firebase::firestore::model::SnapshotVersion, std::__1::vector<firebase::firestore::model::MutationResult, std::__1::allocator<firebase::firestore::model::MutationResult> >) + 492
    frame #11: 0x0000000106303e54 UnityFramework`firebase::firestore::remote::WriteStream::NotifyStreamResponse(grpc::ByteBuffer const&) + 436
    frame #12: 0x00000001062d27dc UnityFramework`firebase::firestore::remote::Stream::OnStreamRead(grpc::ByteBuffer const&) + 240
    frame #13: 0x000000010625bf70 UnityFramework`std::__1::__function::__func<firebase::firestore::remote::GrpcCompletion::Complete(bool)::$_0, std::__1::allocator<firebase::firestore::remote::GrpcCompletion::Complete(bool)::$_0>, void ()>::operator()() + 56
    frame #14: 0x00000001061fde6c UnityFramework`firebase::firestore::util::AsyncQueue::ExecuteBlocking(std::__1::function<void ()> const&) + 68
    frame #15: 0x00000001062e6ff4 UnityFramework`firebase::firestore::util::Task::ExecuteAndRelease() + 184
    frame #16: 0x0000000198d35db0 libdispatch.dylib`_dispatch_client_callout + 20
    frame #17: 0x0000000198d3d10c libdispatch.dylib`_dispatch_lane_serial_drain + 580
    frame #18: 0x0000000198d3dc5c libdispatch.dylib`_dispatch_lane_invoke + 408
    frame #19: 0x0000000198d47d78 libdispatch.dylib`_dispatch_workloop_worker_thread + 708
    frame #20: 0x00000001e4913804 libsystem_pthread.dylib`_pthread_wqthread + 276

It's blocked on this line:

https://github.com/firebase/firebase-ios-sdk/blob/b1766c2fd60510d42f63266dd23f0716904b3104/Firestore/core/src/local/leveldb_remote_document_cache.cc#L167

And the AsyncQueue is in the stack trace, which is a single-threaded "event dispatcher" of sorts. If it's blocked then no asynchronous operations will proceed, which matches perfectly with your description of "Any Firestore call beyond this point appears to always fail in the same way".

This is very useful information. I don't have an explanation at the moment but I'll dig into it further.

As for how to "reset" the Firestore object... the best I can think of is to call FirebaseApp.Dispose() (https://firebase.google.com/docs/reference/unity/class/firebase/firebase-app#dispose) and then get a new Firestore instance; however, Dispose() may also block indefinitely waiting for the AsyncQueue to shut down. You'd have to try it out.

I'm also curious... if you remove the preceding upload to storage, does the problem go away? If you're able to test this out that may be useful info, but not completely required.

Thanks for continuing to work on a minimal reproduction app. As it stands at the moment, it smells like a bug in Firestore; however, I'll do more investigation to confirm.

@dconeybe
Copy link

Another idea to "reset" Firestore is to call FirebaseFirestore.TerminateAsync() (https://firebase.google.com/docs/reference/unity/class/firebase/firestore/firebase-firestore#terminateasync) and then get a new FirebaseFirestore instance. It may also block indefinitely but I'd try this before my previous suggestion of disposing the FirebaseApp.

@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 14, 2021

This is very useful information.

GREAT :)

if you remove the preceding upload to storage, does the problem go away?

That stack trace is one where I've already removed the Storage PutFileAsync() call (and the entire Storage package), as well as the preceding WAV/MP3 writing, and even the audio recording that kicks off the sequence. The only thing remaining from that is the SetAsync() itself.

But the issue persists. I've also found that the error can sometimes occur at other SetAsync() calls. In one instance it occurs just after a VideoPlayer is destroyed, and that one doesn't seem to happen if the video that's loaded (via url) is a tiny file size. Really hard to tell what's a red herring :)

TerminateAsync()

I'll try the TerminateAsync() call ASAP. I hadn't seen it.

Thanks for continuing to work on a minimal reproduction app.

Thank YOU! Without your help I'd already be writing my own half-assed REST API :)

@dconeybe
Copy link

Since I can't reproduce, I'm going to ask you to do something for me and capture the resulting logs. With those logs it may help me reproduce, which will enable proper investigation on my end.

Steps:

  1. Find the Pods directory for the iOS project that was exported from Unity.
  2. Make a backup of the file FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.cc under the Pods directory.
  3. Replace the file FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.cc under the Pods directory with the version here: https://gist.github.com/dconeybe/b3415ee47af4fdb703b4264026b9d2cf
  4. Reproduce the bug where the Task returned from SetAsync() never completes.
  5. Capture all log lines containing the text "zzyzx" and paste them in a gist.
  6. (Optional) Restore the backed up leveldb_remote_document_cache.cc.

The modified leveldb_remote_document_cache.cc just adds a bunch of logging using LOG_WARN, where each log line contains "zzyzx".

@dconeybe dconeybe added needs-info Need information for the developer and removed needs-attention Need Googler's attention new New issue. labels Jan 15, 2021
@jonahgoldsaito
Copy link
Author

Done. Here's the gist. I left it intact for fear of removing any other clues that might be in there.

@google-oss-bot google-oss-bot added needs-attention Need Googler's attention and removed needs-info Need information for the developer labels Jan 19, 2021
@dconeybe dconeybe removed the needs-attention Need Googler's attention label Jan 19, 2021
@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 20, 2021

@dconeybe, I think that this 2nd gist that may be better.

It's showing where the SetAsync() clearly fails, and also that the following two SetAsync() functions fail without any logging from Firebase. The same would have been true if there had been a GrabSnapshotAsync() following the fail.

Anything elucidating about that error? What seems clear is that the silent fail isn't the source of the problem, but a symptom of the previously failing call, which is logged by Firebase at least. Good, right?

Also, is there somewhere crucial I should add more logs into the Firestore source to find out how far it gets after the SetAsync() call?

Again, this build has Storage, Analytics, Crashlytics, and FCM uninstalled, so all that's left is Firestore and Auth.

@dconeybe
Copy link

Hi @jonahgoldsaito. Thank you for providing those logs. Unfortunately, I am even more perplexed! The logs show that for some reason the task that is enqueued with the executor here in leveldb_remote_document_cache.cc is never executed, resulting in the subsequent call to tasks.AwaitAll() to block indefinitely, awaiting its completion. This causes the AsyncQueue to no longer proceed, causing the behavior you are noting where SetAsync() and GetSnapshotAsync() never complete.

This can only happen in one of two scenarios: (1) The executor's threads are all busy and/or deadlocked or (2) the executor silently rejects the task and never executes it. I ruled out (1) because the thread dump that you provided in an earlier comment showed no executor threads doing work. So that leaves (2), which could only happen if the executor's Dispose() method is invoked, which I think would only happen if Firestore was shutting down (but I could be wrong).

If you don't mind, I'd like to test out scenario 2. To do this, you need to patch the following 4 files from the gist https://gist.github.com/dconeybe/06c88b1a708d61875d1e3a4d847ee1b2, reproduce the issue, and capture the logs. Note that the leveldb_remote_document_cache.cc from this gist is slightly different than the gist that I posted earlier, so please be sure to patch it as well.

  • Pods/FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.cc
  • Pods/FirebaseFirestore/Firestore/core/src/util/executor_libdispatch.mm
  • Pods/FirebaseFirestore/Firestore/core/src/util/executor_libdispatch.h
  • Pods/FirebaseFirestore/Firestore/core/src/util/executor.h

Thanks for your help and patience with this investigation!

@jonahgoldsaito
Copy link
Author

Thanks @dconeybe. Here's the gist using the 4 patched files. It took a few more runs to hit the error, but you'll see it at the end of the file.

I'm taking the same steps every time I test btw, and the fail seems to spring up at different SetAsync() calls within the flow. This one didn't drop until 21 minutes into usage. Haven't been able to find any obvious patterns if it had to do with overlapping calls or a certain number of calls.

@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 21, 2021

Holy crap.

I just stumbled upon a bug thread in the Unity forums that seems like it might be at the root of this issue. All these folks reproducing seem to be using ARFoundation as well and have updated their testing devices to iOS 14.2+

Many folks report not seeing the bug after replacing all UnityWebRequest calls with ClaytonIndustries' HttpClient. Clearly a lot of work, and refactoring Firestore doesn't sound fun. Someone also mentioned that the error only occurred when using UWR PUTs and not POSTs. Not sure if that was validated.

The responding Unity team member also suggested this NSOperationQueue fix (just this past Friday). Made that patch to UnityWebRequest.mm and I haven't yet hit another failure. Running more tests.

If this is actually it, how wild it's at the intersection of ARFoundation + UWR + iOS 14.2/3 and that it essentially would be breaking every launched, previously working AR Unity based app in the App Store.

@dconeybe
Copy link

Wow that's very interesting. My iPhone has iOS 14.1, so perhaps is not plagued by this issue and would explain why I had no success reproducing. Moreover, my test app does not use ARFoundation or UnityWebRequest.

Something that stood out to me from that Unity bug thread was a speculation that there is an OS bug at the root. As I've been pouring over the Firestore code in https://github.com/firebase/firebase-ios-sdk based on the "zzyzx" logs that you provided me, the only explanation that I can come up with is that the call to dispatch_async_f() in ExecutorLibdispatch::Execute() works... until it doesn't; namely, at some point the task specified to dispatch_async_f() is not called back, causing the call to BackgroundQueue::AwaitAll() to block forever, causing SetAsync() and all future Firestore operations to never complete.

Although that Unity bug report doesn't mention dispatch_async_f(), it has a similar symptom with the result of a web request never being sent. Perhaps UnityWebRequest is using dispatch_async_f() (or some other related function) under the hood?

In any case, to confirm whether or not there is a bug in dispatch_async_f(), can you patch the files one more time and capture the logs for me? This time I've attached UnityIssue918PatchFilesV3.zip that you can just unzip in the "Pods" directory.

@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 21, 2021

Hey @dconeybe, I just got a report from a tester that SetAsync() is still hanging in a new build with the UWR patch.

While I'm not getting a failure on my test device, I am seeing that my 5 second timeout is triggering the Firestore reset you suggested:

await _db.TerminateAsync();
_db = FirebaseFirestore.GetInstance(_app);

The reset is what's allowing me to do subsequent Firestore read and write operations. And it looks like in my last run through, the reset was actually called 3 times.

Unfortunately I have one tester, the one who originally identified all this on his brand new iPhone 12 Pro, for whom the reset does not seem to fix things. It's a super regular fail for him. Ugh, I was so optimistic about that UWR patch :)

@jonahgoldsaito
Copy link
Author

In any case, to confirm whether or not there is a bug in dispatch_async_f(), can you patch the files one more time and capture the logs for me? This time I've attached UnityIssue918PatchFilesV3.zip that you can just unzip in the "Pods" directory.

Should I add in the UnityWebRequest.mm patch from the thread or leave it untouched?

@dconeybe
Copy link

Thanks for the update. I'd say to leave out the UnityWebRequest.mm patch just for consistency with other logs that you've provided. That being said, as long as you can reproduce the issue I don't think it matters.

@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 21, 2021

In any case, to confirm whether or not there is a bug in dispatch_async_f(), can you patch the files one more time and capture the logs for me? This time I've attached UnityIssue918PatchFilesV3.zip that you can just unzip in the "Pods" directory.

Hey @dconeybe, here's the newest gist. without the UnityWebRequest.mm patch.

@dconeybe
Copy link

Yep, that confirms that the task being specified to dispatch_async_f() is not being invoked, which causes SetAsync() (and all future Firestore operations) to stop proceeding. This is truly bizarre, and my only explanation is a bug in dispatch_async_f()in the OS. I'm going to do some research to see if it's a known issue. Perhaps it's something that can be reported to Apple?

If interested, I came to this conclusion based on line 3477 from the logs you posted:

zzyzx LevelDbRemoteDocumentCache invocation_id=4 tasks.Execute() returned task_id=1070

This line should be (eventually) followed by one like this:

zzyzx ExecutorLibdispatch::InvokeAsync() task_id=1070 Calling ExecuteAndRelease()

The absence of this log line indicates that InvokeAsync, which is the callback specified to dispatch_async_f(), is not invoked. It should, instead, be invoked shortly after the log line 3477 (e.g. lines 3375 and 3377).

@dconeybe
Copy link

I found another bug report of a user experiencing the same issue when they use ARKit in iOS 14.2: https://developer.apple.com/forums/thread/671645.

@dconeybe
Copy link

And another similar bug report: https://stackoverflow.com/questions/63939557/ios-14-dispatchqueue-main-async-not-working

@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 21, 2021

Thanks so much for the diligence, @dconeybe.

So at this point is the only path forward an OS update? Eeesh.

It's really interesting that the UnityWebRequest issue may come from the same root problem and that there's at least a temporary fix by fiddling with the priority (or qualityOfService) of the NSOperationQueue:

webOperationQueue = [[NSOperationQueue alloc] init];
webOperationQueue.qualityOfService = NSQualityOfServiceUserInteractive;

I assume there's no equivalent at the lower level calls you're making in the Firebase Pod, right?

As an aside, can you imagine why terminating and getting a new Firestore instance allows me to make new calls on my iPhone 11 but not on my tester's 12 Pro, which continues to fail? Both on iOS 14.3. That would at least allow forward motion while waiting for an OS update.

i.e.

await _db.TerminateAsync();
_db = FirebaseFirestore.GetInstance(_app);

@dconeybe
Copy link

@jonahgoldsaito Yes, it appears that the "fix" will need to be made in the OS. Yes, there is no equivalent to setting webOperationQueue.qualityOfService in the Firestore Pod. And nothing comes to mind as to why the terminate workaround does not work on your tester's 12 Pro.

I'm going to use our Google-internal process to log a bug against Apple and I'll update back here once I have new information. This is going to require that I reproduce the issue so this whole process will take some unknown amount of time. And even if Apple acknowledges the bug, it would require an iOS update to fix. So there is no good short-term solution.

The best idea that I have is to disable the use of libdispatch in the Firestore Pod code that you've been patching. This specific asynchronous dispatching via dispatch_async_f() was supposed to be a performance optimization to allow parsing data received from the server in parallel. So there will be some performance penalty. Actually... looking at the logs that you sent me all usages of this specific dispatch queue are starting at most 1 asynchronous task; therefore, this "performance optimization" using libdispatch is actually a bit of a performance hit due to the overheard of multithreading. If this is how your real application works then you may not realize any performance hit at all by removing the usage of libdispatch.

To test out this workaround, please patch the following 2 files with those from this gist: https://gist.github.com/dconeybe/e1801caabef562f1eb7ee6937f2b83df

  • Pods/FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.h
  • Pods/FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.cc

Hopefully, this will unblock you. Note, however, that since the true nature of this bug is not understood, it could creep up in other dispatch queues as well.

@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 22, 2021

OMG.
I don't want to get too excited, but the first test of your patch worked! 🎉🎉🎉🍾🥂🎉🎉🎉

I'm going to relish this feeling for a moment, even if it's fleeting.
Even the air smells sweeter right now.

So awesome if you can work on filing the bug, since it has way more potential of getting fixed :)

As for a basic repro project for an iOS bug report, I wonder if the UnityWebRequest issue (which is perhaps built on the same underlying stuff) would be helpful. It's tracked here and was submitted with a repro project to which you could substitute Firestore for UWR calls. I just came upon it, so no idea if it works (or doesn't).

Have a great weekend! In my book you've DEFINITELY earned it (and I'd throw you a peer bonus @dconeybe if I hadn't turned in my badge a few years back :) )

@dconeybe
Copy link

I'm glad to hear that the workaround worked for you!

I'll take a look at that project for guidance on reproducing the issue. I'll most likely need to remove the Unity-specific parts, but something that can consistently reproduce it will be incredibly helpful.

I hope you have a great weekend also :)

@dconeybe
Copy link

Hi @jonahgoldsaito. After discussing internally, we have a hypothesis that the bug you observed may be due to starvation of the dispatch queue due to ARKit's processing dominating the CPU. If you're able to, would you mind testing out this hypothesis for me?

Here is what you'd need to do:

  1. Undo the workaround patch that I posted a few comments ago.
  2. Reproduce the bug (i.e. cause a call to SetAsync() to fail to complete).
  3. Do something in your app to stop ARKit from doing its processing (e.g. change to a different screen, stop the camera capturing).
  4. Observe if the SetAsync() call that was previously stuck now runs to completion.

Specifically, I'm wondering if the dispatch queue that's not progressing and causing SetAsync() to not complete will "resume" once ARKit stops doing its processing. Thanks in advance if you're able to test this out!

@jonahgoldsaito
Copy link
Author

Hey @dconeybe, I can do that.

What is interesting is about the resource constraint theory:

  • I've been getting more reports of the bug on a shiny new iPhone 12 Pro than on 6+ test device
  • It's only started happening after the last iOS updates (14.3 is when I noticed it but it may have been 14.2).

@morganchen12
Copy link

Some other thoughts that come to mind after reading this thread:

  • If resource starvation is causing the Firestore executor queue to not make progress, upping the priority of the Unity web queue would likely not fix it, but applying a higher qos to the Firestore queue might. Something like:
    std::unique_ptr<Executor> Executor::CreateSerial(const char* label) {
      dispatch_queue_attr_t qosAttribute = dispatch_queue_attr_make_with_qos_class(
        DISPATCH_QUEUE_SERIAL, 
        QOS_CLASS_USER_INITIATED,
        0
      );
      dispatch_queue_t queue = dispatch_queue_create(label, qosAttribute);
      return absl::make_unique<ExecutorLibdispatch>(queue);
    }
    in executor_libdispatch.cc.
    Apple's documentation implies we should be specifying a non-default qos in this case, though in the long term it may be more reasonable to specify qos on a per-block basis to better support things like Firestore downloading data while the app is backgrounded.
  • I can't think of a reason why this bug would be more readily reproducible on an iPhone 12 than a 6+.

@jonahgoldsaito
Copy link
Author

jonahgoldsaito commented Jan 28, 2021

Nice @morganchen12! - that QoS specification seems promising and right in line with the fix that Unity has suggested for their UnityWebRequest problems. Also, if a vital request is being made via direct user action, and one that the user needs to know is finished before they can progress, it makes sense to specify QOS_CLASS_USER_INITIATED (especially if there's a bug that a low quality request might even never execute 😅).

@chkuang-g
Copy link
Contributor

Seems like this has been fixed from the Unity side.

I will close this issue for now. If this is still happening to you, please create a new issue and reference this one.

Thank you

@firebase firebase locked and limited conversation to collaborators Aug 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants