Firestore failing to SetAsync() or log the call indefinitely after unknown triggering event #918
@jonahgoldsaito Thanks for the detailed bug report! Could you please make your Firebase logs link viewable to those with the link? It's currently restricted.
Sorry, I just re-shared the Google Doc logs and added more complete logs with the initial successful reads/writes to Firestore, and then the sequence at the end where the … I'm leaving the log pretty much as-is, with some comments for context... sorry if that's overwhelming. Let me know if you'd rather I trim it down.
Thanks for the logs. Nothing obvious jumps out at me. I'm writing an app that attempts to reproduce this issue, and I'll reply back with the outcome of that experiment. If you have any success writing a minimal app that reproduces it, please share it, as it will help with debugging immensely.
Thanks @dconeybe! I'm working on simplifying the project without getting rid of the error.
Hi @jonahgoldsaito. I have been unable to reproduce this issue. I tried writing an iOS app that emulates the steps carried out in your description, but I never saw the task returned from SetAsync() fail to complete. You also mentioned that "Any Firestore call beyond this point appears to always fail in the same way", which makes me wonder if there is some sort of deadlock in Firestore that is blocking all future asynchronous operations. Could you capture the stack traces of all threads after reproducing the issue? The stack traces may point to the code that is deadlocked, which would give me something to investigate. To capture the stack traces, do the following:
Thanks for the direction, @dconeybe. Here's the gist. BTW, I have a component that waits to execute if the … Along those lines, is there a way to reset / reinitialize Firestore if I suspect it's failed? I'm still whittling down my app so I can isolate the issue and/or share it in a repo, and the only remaining Firebase packages are Core, Auth, and Firestore. I've removed the audio recording, all the file IO, and the uploading to Storage. I still seem to be hitting the bug, though perhaps less often. I was also seeing that some of the Firestore calls were at times made too close together, so I added a layer that made sure there was at least a second between calls, but that didn't solve anything. One last note: I failed to mention in the original post that I'm using Zenject for dependency injection.
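The spacing layer described above was written in C# for Unity and isn't shown in the thread. As a rough illustration of the idea only, here is a C++ sketch; `MinIntervalThrottle` and `Reserve()` are hypothetical names, and the class merely computes how long to wait rather than scheduling anything itself. As noted, this kind of spacing did not fix the hang.

```cpp
#include <chrono>

// Hypothetical helper (not part of any Firebase SDK): computes how long to wait
// so that consecutive calls start at least `min_interval` apart. The caller does
// the actual waiting (a timer, coroutine, etc.).
class MinIntervalThrottle {
 public:
  using Clock = std::chrono::steady_clock;

  explicit MinIntervalThrottle(Clock::duration min_interval)
      : min_interval_(min_interval) {}

  // Reserves the next call slot and returns the delay from `now` until it.
  Clock::duration Reserve(Clock::time_point now) {
    if (!has_last_) {
      has_last_ = true;
      last_ = now;
      return Clock::duration::zero();
    }
    const Clock::time_point earliest = last_ + min_interval_;
    const Clock::time_point start = now > earliest ? now : earliest;
    last_ = start;  // later calls are spaced relative to this reserved slot
    return start - now;
  }

 private:
  Clock::duration min_interval_;
  Clock::time_point last_{};
  bool has_last_ = false;
};
```

For example, with a one-second interval, a call requested 200 ms after the previous one is told to wait 800 ms, and a third call requested right after that is pushed to the next free one-second slot.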
This stack trace looks suspicious: … It's blocked on this line: … And the AsyncQueue is in the stack trace, which is a single-threaded "event dispatcher" of sorts. If it's blocked, then no asynchronous operations will proceed, which perfectly matches your description of "Any Firestore call beyond this point appears to always fail in the same way". This is very useful information. I don't have an explanation at the moment, but I'll dig into it further. As for how to "reset" the Firestore object... the best I can think of is to call … I'm also curious: if you remove the preceding upload to Storage, does the problem go away? If you're able to test this out, that may be useful info, but it's not strictly required. Thanks for continuing to work on a minimal reproduction app. As it stands at the moment, it smells like a bug in Firestore; however, I'll do more investigation to confirm.
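To make the "single-threaded event dispatcher" point concrete, here is a small C++ sketch. It is not Firestore's AsyncQueue; `RunSerially` and `DemonstrateHeadOfLineBlocking` are made-up names. It shows how one stuck task on a serial worker keeps every later task, such as a completion callback, from ever starting.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Runs `tasks` strictly one after another on a single worker thread, the way a
// serial dispatcher does. If one task never returns, no later task starts.
std::thread RunSerially(std::vector<std::function<void()>> tasks) {
  return std::thread([tasks = std::move(tasks)] {
    for (const auto& task : tasks) task();
  });
}

// Demo: task 1 waits on a gate; task 2 stands in for a later callback.
// Returns {task 2 ran while task 1 was stuck, task 2 ran after the release}.
std::pair<bool, bool> DemonstrateHeadOfLineBlocking() {
  std::promise<void> gate;
  std::shared_future<void> gate_open = gate.get_future().share();
  std::atomic<bool> second_ran{false};

  std::thread worker = RunSerially({
      [gate_open] { gate_open.wait(); },     // the "stuck" task
      [&second_ran] { second_ran = true; },  // e.g. a later completion callback
  });

  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  const bool ran_while_blocked = second_ran.load();
  gate.set_value();  // release task 1; task 2 can now run
  worker.join();
  return {ran_while_blocked, second_ran.load()};
}
```

While the first task is stuck, the second never runs; it only runs once the first one is released, which is the same all-or-nothing behavior described for the blocked queue.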
Another idea to "reset" Firestore is to call …
GREAT :)
That stack trace is one where I've already removed the Storage … But the issue persists. I've also found that the error can sometimes occur at other …
I'll try the …
Thank YOU! Without your help I'd already be writing my own half-assed REST API :)
Since I can't reproduce the issue, I'm going to ask you to do something for me and capture the resulting logs. Those logs may help me reproduce it, which would enable a proper investigation on my end. Steps:
The modified …
Done. Here's the gist. I left it intact for fear of removing any other clues that might be in there.
@dconeybe, I think this second gist may be better. It shows where the … Anything elucidating about that error? What seems clear is that the silent failure isn't the source of the problem but a symptom of the previously failing call, which Firebase does at least log. Good, right? Also, is there a crucial place in the Firestore source where I should add more logging to find out how far it gets after the SetAsync() call? Again, this build has Storage, Analytics, Crashlytics, and FCM uninstalled, so all that's left is Firestore and Auth.
Hi @jonahgoldsaito. Thank you for providing those logs. Unfortunately, I am now even more perplexed! The logs show that, for some reason, the task that is enqueued with the executor here in leveldb_remote_document_cache.cc is never executed, resulting in the subsequent call to … This can only happen in one of two scenarios: (1) the executor's threads are all busy and/or deadlocked, or (2) the executor silently rejects the task and never executes it. I ruled out (1) because the thread dump that you provided in an earlier comment showed no executor threads doing work. That leaves (2), which could only happen if the executor's … If you don't mind, I'd like to test out scenario (2). To do this, you need to patch the following 4 files from the gist https://gist.github.com/dconeybe/06c88b1a708d61875d1e3a4d847ee1b2, reproduce the issue, and capture the logs. Note that the …
Thanks for your help and patience with this investigation!
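A minimal C++ illustration of scenario (2) follows. It is not Firestore's executor: `DroppingExecutor` and its `accepting_` flag are hypothetical stand-ins, since the actual condition is cut off above. It shows why a silently rejected task looks exactly like the reported symptom. The caller gets no error and the continuation never runs, and only logging in the rejection branch would reveal what happened.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical executor illustrating scenario (2): after StopAccepting(),
// Execute() drops tasks without any error or callback, so whoever waits on such
// a task's result waits forever. This is NOT Firestore's executor; `accepting_`
// is a made-up stand-in for whatever condition would apply.
class DroppingExecutor {
 public:
  void Execute(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mu_);
    if (!accepting_) {
      ++dropped_;  // a diagnostic patch would log here; otherwise this is silent
      return;
    }
    pending_.push(std::move(task));
  }

  void StopAccepting() {
    std::lock_guard<std::mutex> lock(mu_);
    accepting_ = false;
  }

  // Runs every accepted task on the calling thread; returns how many ran.
  int Drain() {
    std::queue<std::function<void()>> batch;
    {
      std::lock_guard<std::mutex> lock(mu_);
      std::swap(batch, pending_);
    }
    int ran = 0;
    while (!batch.empty()) {
      batch.front()();
      batch.pop();
      ++ran;
    }
    return ran;
  }

  int dropped() const {
    std::lock_guard<std::mutex> lock(mu_);
    return dropped_;
  }

 private:
  mutable std::mutex mu_;
  std::queue<std::function<void()>> pending_;
  bool accepting_ = true;
  int dropped_ = 0;
};
```

A task submitted after `StopAccepting()` is simply gone. Nothing fails; it just never runs, which is why the patched files needed extra logging to distinguish this case from a busy or deadlocked executor.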
Thanks @dconeybe. Here's the gist using the 4 patched files. It took a few more runs to hit the error, but you'll see it at the end of the file. By the way, I'm taking the same steps every time I test, and the failure seems to crop up at different …
Holy crap. I just stumbled upon a bug thread in the Unity forums that seems like it might be at the root of this issue. All the folks reproducing it seem to be using ARFoundation as well and have updated their test devices to iOS 14.2+. Many of them report not seeing the bug after replacing all … The responding Unity team member also suggested this … If this is actually it, it's wild that it sits at the intersection of ARFoundation + UWR (UnityWebRequest) + iOS 14.2/14.3, and that it would essentially break every previously working, already launched Unity-based AR app in the App Store.
Wow, that's very interesting. My iPhone is on iOS 14.1, so perhaps it isn't affected by this issue, which would explain why I had no success reproducing it. Moreover, my test app does not use ARFoundation or UnityWebRequest. Something that stood out to me in that Unity bug thread was the speculation that an OS bug is at the root. As I've been poring over the Firestore code in https://github.com/firebase/firebase-ios-sdk based on the "zzyzx" logs that you provided, the only explanation I can come up with is that the call to dispatch_async_f() in ExecutorLibdispatch::Execute() works... until it doesn't; namely, at some point the task specified to … Although that Unity bug report doesn't mention … In any case, to confirm whether or not there is a bug in …
Hey @dconeybe, I just got a report from a tester that … While I'm not getting a failure on my test device, I am seeing that my 5-second timeout is triggering the Firestore reset you suggested: … The reset is what allows me to do subsequent Firestore read and write operations, and it looks like the reset was actually called 3 times in my last run-through. Unfortunately, I have one tester, the one who originally identified all this on his brand-new iPhone 12 Pro, for whom the reset does not seem to fix things. It fails very regularly for him. Ugh, I was so optimistic about that UWR patch :)
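The 5-second timeout-then-reset pattern described above can be sketched in C++ with `std::future::wait_for`. `WaitOrRecover` and its `on_timeout` callback are hypothetical names. In the Unity app the recovery step was terminating Firestore and obtaining a new instance, which this sketch does not attempt to reproduce. Note that it only unblocks the caller: the stuck task itself is never cancelled, and as reported above, the reset did not help on one tester's iPhone 12 Pro.

```cpp
#include <chrono>
#include <functional>
#include <future>

// Waits up to `timeout` for `pending` (e.g. the result of a write) to complete.
// If it doesn't, invokes `on_timeout` (hypothetical recovery hook, such as
// tearing down and recreating a client) and returns false. The pending
// operation is not cancelled; it may still complete, or never complete.
template <typename T>
bool WaitOrRecover(std::future<T>& pending, std::chrono::milliseconds timeout,
                   const std::function<void()>& on_timeout) {
  if (pending.wait_for(timeout) == std::future_status::ready) {
    return true;
  }
  on_timeout();
  return false;
}
```

In the app's terms, `timeout` would be 5 seconds and the hook would perform the Firestore reset. Whether that reset actually restores service depends on the underlying bug, which is exactly what differed between the two devices.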
Should I add in the …
Thanks for the update. I'd say to leave out the UnityWebRequest.mm patch just for consistency with the other logs that you've provided. That being said, as long as you can reproduce the issue, I don't think it matters.
Hey @dconeybe, here's the newest gist, without the …
Yep, that confirms that the task being specified to … If you're interested, I came to this conclusion based on line 3477 of the logs you posted: … This line should (eventually) be followed by one like this: … The absence of this log line indicates that …
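The reasoning here, that an "enqueued" log line with no matching "started" line means the task never ran, can be automated when scanning long logs. The C++ sketch below assumes a made-up format (`zzyzx enqueue <id>` / `zzyzx run <id>`); the actual log lines from the patched files are not shown in this thread, and `FindTasksThatNeverRan` is a hypothetical helper.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Assumed (hypothetical) log format, one event per line:
//   "zzyzx enqueue <id>"  when a task is handed to the executor
//   "zzyzx run <id>"      when that task starts executing
// Returns the ids that were enqueued but never started, in enqueue order.
std::vector<std::string> FindTasksThatNeverRan(
    const std::vector<std::string>& lines) {
  std::vector<std::string> order;  // first-enqueue order, for stable output
  std::set<std::string> seen;      // ids already recorded in `order`
  std::set<std::string> pending;   // enqueued but not (yet) started
  for (const std::string& line : lines) {
    std::istringstream in(line);
    std::string tag, event, id;
    if (!(in >> tag >> event >> id) || tag != "zzyzx") continue;  // unrelated line
    if (event == "enqueue") {
      pending.insert(id);
      if (seen.insert(id).second) order.push_back(id);
    } else if (event == "run") {
      pending.erase(id);
    }
  }
  std::vector<std::string> missing;
  for (const std::string& id : order) {
    if (pending.count(id) != 0) missing.push_back(id);
  }
  return missing;
}
```

Only the last unmatched enqueue matters for a hang like this one. Earlier unmatched ids may simply be tasks that were still in flight when the log was captured.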
I found another bug report of a user experiencing the same issue when they use ARKit in iOS 14.2: https://developer.apple.com/forums/thread/671645.
And another similar bug report: https://stackoverflow.com/questions/63939557/ios-14-dispatchqueue-main-async-not-working
Thanks so much for the diligence, @dconeybe. So at this point, is the only path forward an OS update? Eeesh. It's really interesting that the … I assume there's no equivalent for the lower-level calls you're making in the Firebase Pod, right? As an aside, any idea why terminating and getting a new Firestore instance lets me make new calls on my iPhone 11 but not on my tester's iPhone 12 Pro, which continues to fail? Both are on iOS 14.3. Understanding that would at least allow forward motion while waiting for an OS update, i.e. …
@jonahgoldsaito Yes, it appears that the "fix" will need to be made in the OS. And yes, there is no equivalent to setting … I'm going to use our Google-internal process to file a bug with Apple, and I'll update this thread once I have new information. This will require that I reproduce the issue, so the whole process will take an unknown amount of time. And even if Apple acknowledges the bug, fixing it would require an iOS update, so there is no good short-term solution. The best idea I have is to disable the use of libdispatch in the Firestore Pod code that you've been patching. This specific asynchronous dispatching via … To test out this workaround, please replace the following 2 files with those from this gist: https://gist.github.com/dconeybe/e1801caabef562f1eb7ee6937f2b83df
Hopefully, this will unblock you. Note, however, that since the true nature of this bug is not understood, it could crop up in other dispatch queues as well.
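The idea behind this workaround, running tasks on a thread the executor owns rather than handing them to `dispatch_async_f()`, can be sketched as a minimal serial executor in C++. This is an illustration under that assumption only: `ThreadExecutor` is a hypothetical class, and the contents of the gist's patched files are not reproduced here.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Minimal serial executor backed by its own std::thread instead of a
// libdispatch queue. Tasks run one at a time, in submission order.
class ThreadExecutor {
 public:
  ThreadExecutor() : worker_([this] { Loop(); }) {}

  ~ThreadExecutor() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // Loop() finishes any queued tasks before exiting
  }

  ThreadExecutor(const ThreadExecutor&) = delete;
  ThreadExecutor& operator=(const ThreadExecutor&) = delete;

  void Execute(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // run outside the lock so tasks can enqueue more work
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // declared last: the members above exist before Loop() starts
};
```

The trade-off is that the executor no longer benefits from libdispatch's thread management or QoS handling; in exchange, whether a submitted task eventually runs depends only on this thread being scheduled.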
OMG. I'm going to relish this feeling for a moment, even if it's fleeting. It would be awesome if you could file the bug, since it has a much better chance of getting fixed that way :) As for a basic repro project for an iOS bug report, I wonder if the UnityWebRequest issue (which is perhaps built on the same underlying stuff) would be helpful. It's tracked here and was submitted with a repro project in which you could substitute Firestore calls for the UWR calls. I just came upon it, so I have no idea whether it works (or doesn't). Have a great weekend! In my book you've DEFINITELY earned it (and I'd throw you a peer bonus, @dconeybe, if I hadn't turned in my badge a few years back :) )
I'm glad to hear that the workaround worked for you! I'll take a look at that project for guidance on reproducing the issue. I'll most likely need to remove the Unity-specific parts, but something that can consistently reproduce it will be incredibly helpful. I hope you have a great weekend also :)
Hi @jonahgoldsaito. After discussing internally, we have a hypothesis that the bug you observed may be caused by starvation of the dispatch queue, with ARKit's processing dominating the CPU. If you're able to, would you mind testing out this hypothesis for me? Here is what you'd need to do:
Specifically, I'm wondering whether the dispatch queue that isn't progressing, and that is causing SetAsync() to never complete, will "resume" once ARKit stops its processing. Thanks in advance if you're able to test this out!
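One way to gather evidence for the starvation hypothesis is to record, for each task, how long it waited between being enqueued and starting, and to watch whether those delays return to normal once ARKit stops processing. The C++ sketch below wraps a task so it reports that delay; `WithQueueDelayReport` is a hypothetical helper, not a Firestore API. A task that never starts will never report, so this would still need to be paired with a timeout or with enqueue/start logging.

```cpp
#include <chrono>
#include <functional>
#include <utility>

// Wraps `task` so that, when it finally runs, `report` receives how long it
// waited between being wrapped (i.e. enqueued) and starting. Delays that grow
// under heavy load and shrink afterwards would be consistent with starvation.
std::function<void()> WithQueueDelayReport(
    std::function<void()> task,
    std::function<void(std::chrono::milliseconds)> report) {
  const auto enqueued_at = std::chrono::steady_clock::now();
  return [task = std::move(task), report = std::move(report), enqueued_at] {
    const auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - enqueued_at);
    report(waited);  // e.g. log it, or flag it if it exceeds a threshold
    task();
  };
}
```

The wrapper would be applied at the point where a task is submitted to the queue under suspicion, so the measured delay covers exactly the time spent waiting in that queue.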
Hey @dconeybe, I can do that. What's interesting about the resource-constraint theory is:
Some other thoughts that come to mind after reading this thread:
Nice, @morganchen12! That QoS specification seems promising and right in line with the fix that Unity has suggested for their …
It seems like this has been fixed on the Unity side, so I will close this issue for now. If this is still happening to you, please create a new issue and reference this one. Thank you!
[REQUIRED] Please fill in the following fields:
[REQUIRED] Please describe the issue here:
(Please list the full steps to reproduce the issue. Include device logs, Unity logs, and stack traces if available.)
I am doing a simple SetAsync() call:
In the following log, the first few SetAsync() calls work as expected, and eventually one fails absolutely silently, meaning:
- ContinueWithOnMainThread() is never called at all, so there's no IsFaulted message to help debug.
- With Firebase.LogLevel.Verbose, there are no logs about any interactions in Firebase, locally or remotely. It's as if the call never happened.
- The code is definitely reaching a Debug.Log() on the line directly preceding the SetAsync() call.
Other things to note:
- … SetAsync() or GetSnapshotAsync() calls always fail silently, whether targeting the same or a different docRef. I'm not sure if there's any way to "reboot" Firestore once it's been initialized.
- … PutFileAsync() and GetDownloadUrlAsync(), because the SetAsync() is essentially storing that retrieved URL. Those Storage calls never fail. It's always the SetAsync() that does.
Given that verbose Firebase logging doesn't show that a Firebase call is even being attempted, I'm not sure how to debug where the failure is happening. Super unfortunate :(
Steps to reproduce:
Have you been able to reproduce this issue with just the Firebase Unity quickstarts (this GitHub project)?
Nope
What's the issue repro rate? (eg 100%, 1/5 etc)
It is quite regular: it happens 0% of the time when the app is first launched, but quite consistently once the app has been doing its thing. This is an ARFoundation app that uses URP's post-processing and utilizes the VideoPlayer at times. I'm also using the NAudio.LAME library to convert recorded audio to MP3 (the file being uploaded to Storage). There's certainly a decent amount of processing happening.
What happened? How can we make the problem occur?
This could be a description, log/console output, etc.
I’m honestly not sure how to create a simplified, reliable repro environment of this. Is there a way to turn on even more granular logging for Firestore to see at least at what point it fails?
Relevant Code: