Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIRListenerRegistration will failed while connected Router disconnected from Internet and then connect back. #2987

Closed
TIEmerald opened this issue May 12, 2019 · 22 comments

Comments

Projects
None yet
6 participants
@TIEmerald
Copy link

commented May 12, 2019

[READ] Step 1: Are you in the right place?

  • For issues or feature requests related to the code in this repository
    file a Github issue.
    • If this is a feature request make sure the issue title starts with "FR:".
  • For general technical questions, post a question on StackOverflow
    with the firebase tag.
  • For general Firebase discussion, use the firebase-talk
    google group.
  • For help troubleshooting your application that does not fall under one
    of the above categories, reach out to the personalized
    Firebase support channel.

[REQUIRED] Step 2: Describe your environment

  • Xcode version: 10.1 (10B61)
  • Firebase SDK version: 6.0.0
  • Firebase Component: Firestore (Auth, Core, Database, Firestore, Messaging, Storage, etc)
  • Component version: 1.3.0

[REQUIRED] Step 3: Describe the problem

As Described in the title, the created Listener FIRListenerRegistration will failed if the device-connected router disconnected from Internet and then connect back. (I mean the device connected with router all the time, but the router is disconnected from Internet.)

Steps to reproduce:

Un-plug the Internet cable from Router and then connect it back....

Relevant Code:

Here is a very simple Project I used to testing... I found out Firebase Listener doesn't have this issue but Firestore's one has.
Testing Procject For Firestore.zip

@rsgowman

This comment has been minimized.

Copy link
Collaborator

commented May 14, 2019

Hi @TIEmerald; could you elaborate on what you mean by "failed"? I tested out your project, and I think it's working as expected. (I tested on an emulator, and faked "pulling the router cable" by turning off the network connection on my mac.)

When the network is in a bad state, Firestore does indicate in the logs that there is a network error, but once the network state is restored, it should reconnect and continue working properly.

@rsgowman rsgowman added the needs-info label May 14, 2019

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 14, 2019

Nope... You cannot fake faked "pulling the router cable" by turning off the network connection on my Mac. you have to literally pulling the router cable.
Those two are different.
If you turn off the network connection on you Mac, Max's Reachability will notice your device has disconnected with any further network connection devices.
but if you "pulling the router cable" Reachability won't notice it... just your Mac disconnected from Internet.

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 14, 2019

And Failed means if I change anything from the listening collection or document, our project won't notice that...

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 14, 2019

And also, "Pulling the router cable" is trying to simulate the unstable Internet connection from Router.

@mikelehen

This comment has been minimized.

Copy link
Member

commented May 14, 2019

@TIEmerald Thanks for the added information. The SDK should initiate reconnect attempts (subject to exponential backoff) once it detects the connection has been closed. If you pull the network cable, it may very well take longer for the client to be notified that the connection has been closed, and so recovery could take longer. But it should still reconnect and get appropriate events. Can you try waiting longer and see if the changes eventually arrive?

Else, can you enable logging with FirebaseFirestore.setLoggingEnabled(true) and capture logs for the scenario, including ~5minutes of logs after the point that you pull the network cable? That should make it apparent whether reconnects are happening.

THanks!

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 14, 2019

Hi @mikelehen , I have test it one more time, here is the records: ( ̄∇ ̄) ~~
I compile my project at around 9:33-34 am, and here is the Screenshot from my simulator:
Screenshot Before Changing

Then, I un-plug and re-plugin the cable from connected router arount 9:35-36 am. After that, while my computer reconnected with Internet, I changed the value from firebase/firestore database:
Changed Firebase
Changed Firestore

The firebase value changed immediately, but not the firestore's one.... I have waited until 9:41 am... still not changing... even now it's still not changing...
Screenshot after Changing

And here is the logs I copied from console:
Logs For Firestore Testing.txt

@mikelehen

This comment has been minimized.

Copy link
Member

commented May 15, 2019

@TIEmerald Thanks very much for the screenshots and logs. This makes it very clear what you're observing and it's definitely not what I'd expect. From the logs, the Firestore SDK hasn't been notified that the connection has failed and so we haven't initiated any retry attempts. I still suspect that if you wait long enough, it will eventually reconnect. So if you're willing, I'd appreciate it if you'd try waiting ~15 minutes and see if it ever recovers. Obviously this would be unacceptable performance but it would at least verify that we're dealing with a timeout issue rather than a full bug.

In the meantime, I'll see if I can reproduce, maybe using the Network Link Conditioner in OSX. Unfortunately this is likely something that would need to be addressed in the gRPC layer (the underlying transport that we use to communicate with the backend), so it may not be something we can fix super easily (at least in the short-term)...

@mikelehen mikelehen self-assigned this May 15, 2019

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 15, 2019

Thanks, @mikelehen , I am appreciated your efforts.

Also... I am pretty sure waiting ~15 minutes won't help in here... As while I generating my last comments.. I believe I have noticed that even through the time is 9:51 am, which is around 15 mins from my testing, the Firestore Listener is still not updated..

To be honest... I am not pretty sure could we reproduce it with Network Link Conditioner..... But my work around is whenever I detected the Internet connection is off, I will recreate the same Listener when the Internet is back.

@mikelehen

This comment has been minimized.

Copy link
Member

commented May 15, 2019

@TIEmerald Interesting. Could you do the following and collect the logs for me?

  1. Start your listener, unplug /re-plugin your router cable as you did before.
  2. Make a change via the console and wait ~5min to make sure the listener doesn't see it.
  3. Add a new listener, for a different query.

I am wondering if when you get data for the new listener, you also get the missing event for the first listener. It really shouldn't be possible for you to "miss" events and then have a new listener work correctly. So I'm wondering if the reason recreating the listener works for you is because it triggers us to notice the connection has died and perform reconnect logic.

Meanwhile I will try to reproduce this myself today. Thanks!

@mikelehen

This comment has been minimized.

Copy link
Member

commented May 16, 2019

@TIEmerald Good news. I was able to reproduce this using Network Link Conditioner. I will follow up with the gRPC team to see if I can get to the bottom of it. Thank you for the report. I’ll update here as I have news to share.

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 16, 2019

Cheers @mikelehen , looking forward your updates~~

I didn't get a chance to collect the logs today... let me check if I could find a time later today or not..

About the "because it triggers us to notice the connection has died and perform reconnect logic." part. Actually, there is one thing I found very interesting with our production application....

Besides "recreate the listener", I could even bring the listener back while I called a cloud function from the same device which updated the parent document of my listening collection.....

Which means... If I have a document D, which has a collection C like this:
D / C

My application A has a listener L to the collection C.

And then if I disconnect Router and connected it back, the listener L will failed, which is the issue I was describing...

After that if I called one Cloud function on Firebase server from the same application A, and this cloud function updated one property on D. I could notice that the failed listener L will back to work.....

Very strange... might be helpful thus I leave it in here... could ignore it if it's too complicated...

Thanks anyway.

@mikelehen

This comment has been minimized.

Copy link
Member

commented May 16, 2019

Thanks for the added info. Don’t worry about collecting the logs. I think I have everything that I need since I can reproduce it now.

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 16, 2019

Quick Question in here, hi @mikelehen, would you mind tell me how do you reproduce it with Network Link Conditioner? with Profile "100% Loss"? seems like I failed to reproduce it with Network Link Conditioner

@mikelehen

This comment has been minimized.

Copy link
Member

commented May 16, 2019

@TIEmerald What I've been doing is starting the listener, enabling 100% packet loss in Network Link Conditioner, waiting 2.5 minutes, and then disabling Network Link Conditioner to restore full network connectivity. When I do that, about 50% of the time the listener doesn't recover. I don't know how important it is to wait 2.5 minutes. But I tried 60, 90, 120, 150 seconds and 150 was the first time it reproduced for me. 🤷‍♂

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 17, 2019

en... strange things... might to do something with the base level internet communication... 🤷‍♂️ ... I might still testing it with my re-plugin way......

Kind of annoying that my temporary solution didn't fully resolve the issue our customers are facing... could because our internet connection detection logic is not perfect, or something else, I don't know....... I think I might try to depend on firebase listener to temporarily resolve this ...

Still, thanks a lot mate~

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 17, 2019

Hi @mikelehen we encounter a similar issue today while we are testing our application with Xcode.
We have captured Firestore logs hope it will be helpful:

Scenario :

Our device connected to router all the time, and we didn't notice our device disconnected from Internet at all. We found out Firestore Listener failed in the middle of our testing, and then we tried to reset the listener manually.

Logs :

Captured Logs for Firestore Listener failed.txt

This image will help you understand the scenario: ( I grep all lines with WatchStream)

Screen Shot 2019-05-17 at 5 01 16 pm
We have updated the listened collection on 14:56 and 14:57, but nothing happened... therefore, we manually reset the listener on 14:58 and it has shown the WatchStream try to "unwatch" -> "watch" -> "Stream error" -> "backoff" and then continued.....

Further Experience:

Then I did further experience with my simple project. And I use my re-plugin method to simulate the issue.
I found out if I reset the listener within several minute, there won't be any "Stream error".
But if I wait longer enough then I could capture the similar logs:
NewTestingLogs.txt
Screen Shot 2019-05-17 at 5 22 05 pm

@mikelehen

This comment has been minimized.

Copy link
Member

commented May 17, 2019

@TIEmerald Thanks for the added info. I think that's probably expected. Removing the network cable results in packet loss and TCP streams can tolerate some amount of packet loss and still recover. So depending on how long and various timing aspects, the TCP stream may just recover (no stream error) or it may fail and close (resulting in a stream error).

I've forwarded the info on this case to the gRPC team. I'm also looking into whether we can avoid it ourselves by enabling a keepalive in gRPC... which I believe will help detect failures much more quickly / predictably. I'll keep you updated.

@mikelehen

This comment has been minimized.

Copy link
Member

commented May 17, 2019

*EDIT: * (thanks @paulb777) I should have said "will go out in a future iOS release" rather than "next". :-)

I think this should be addressed by #3029 which will go out in the next iOS release. You can pick it up earlier by following the instructions at https://github.com/Firebase/firebase-ios-sdk#accessing-firebase-source-snapshots

With that change, my listener recovered within a few seconds of turning off packet loss. It could take longer, depending on various timing of retry attempts, etc. But it should never take more than ~1 minute.

Thanks again for reporting this and all the help to track it down.

@mikelehen mikelehen closed this May 17, 2019

@paulb777 paulb777 added this to the M49 milestone May 17, 2019

@paulb777

This comment has been minimized.

Copy link
Member

commented May 17, 2019

Actually, we've already branched and are stabilizing the next release. This bug fix will be on the M49 release which will be the one after that.

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 19, 2019

Thanks very much @mikelehen, you are a legend. I will give it a try today.

@TIEmerald

This comment has been minimized.

Copy link
Author

commented May 20, 2019

Hi, @mikelehen 😂 I cannot pick it up via following the instructions at https://github.com/Firebase/firebase-ios-sdk#accessing-firebase-source-snapshots, would you mind help me and tell me what's wrong in here? Thanks

target 'Testing Procject For Firestore' do
  # Uncomment the next line if you're using Swift or would like to use dynamic frameworks
  # use_frameworks!
pod 'Firebase/Core', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'master'
pod 'Firebase/Firestore', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'master'
pod 'Firebase/Analytics', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'master'
pod 'Firebase/Database', :git => 'https://github.com/firebase/firebase-ios-sdk.git', :branch => 'master'

  target 'Testing Procject For FirestoreTests' do
    inherit! :search_paths
    # Pods for testing
  end

  target 'Testing Procject For FirestoreUITests' do
    inherit! :search_paths
    # Pods for testing
  end

end
Testing Procject For Firestore > pod update
Update all pods
Updating local specs repositories

CocoaPods 1.7.0.rc.2 is available.
To update use: `sudo gem install cocoapods --pre`
[!] This is a test version we'd love you to try.

For more information, see https://blog.cocoapods.org and the CHANGELOG for this version at https://github.com/CocoaPods/CocoaPods/releases/tag/1.7.0.rc.2

Analyzing dependencies
Pre-downloading: `Firebase` from `https://github.com/firebase/firebase-ios-sdk.git`, branch `master`
[!] Unable to find a specification for 'Firebase'.
@wilhuff

This comment has been minimized.

Copy link
Member

commented May 20, 2019

Unfortunately, there's a difference between subspecs of the Firebase pod and the actual product pods. To target a product CocoaPod from source you want the one without the slash in it.

That is, you want pod 'FirebaseCore' not pod 'Firebase/Core'. That applies to all the pods you want to get from master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.