Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wi-Fi reconnection not detected on Android 7.0 / 7.1.1 #7453

Closed
mseroczynski opened this issue Jun 1, 2020 · 11 comments
Closed

Wi-Fi reconnection not detected on Android 7.0 / 7.1.1 #7453

mseroczynski opened this issue Jun 1, 2020 · 11 comments
Assignees
Labels

Comments

@mseroczynski
Copy link
Contributor

DownloadManager has Requirements.NETWORK_UNMETERED, have for example 10 elements queued, and then wi-fi range is lost. Then, when wi-fi range is back and phone is able to use connection ExoPlayer still thinks there is no network and keeps all the queue elements in state QUEUED with DownloadManager.notMetRequirements = 3.

I've done some microwave kitchen testing today (2,4ghz Faraday cage) and been able to reproduce this issue repeatedly.

@andrewlewis
Copy link
Collaborator

Which ExoPlayer version are you using?

@mseroczynski
Copy link
Contributor Author

2.11.4

@andrewlewis
Copy link
Collaborator

@ojw28 Please take a look.

@ojw28
Copy link
Contributor

ojw28 commented Jun 1, 2020

@mseroczynski - What is your DownloadService implementation returning from getScheduler? Do you have any idea whether your process is being killed between connectivity being lost and becoming available again, when you reproduce this issue? Thanks!

@mseroczynski
Copy link
Contributor Author

mseroczynski commented Jun 1, 2020

I've followed example code in this matter:

  override fun getScheduler(): Scheduler? {
    return if (Util.SDK_INT >= 21) PlatformScheduler(this, JOB_ID) else null
  }

I assume process wasn't killed but will try to determine it sometime later (currently I'm busy working on workaround fix to recover from stuck download by modifying current requirements by adding 'or DEVICE_IDLE' flag, just to set it back like it was a line later - it allows me to reinstantiate RequirementsWatcher, hoping for it to work, isn't confirmed yet). It's not super easy to reproduce (happens 1 out of ~50 tries when microwaving Xiaomi Mi5 7.0 and 2 times when walking away from Wi-Fi on Motorola Nexus 6 7.1.1).

Below is state of DownloadManager.currentDownloads after onIdle call when it happened:
Zrzut ekranu 2020-06-01 14 42 53

@ojw28
Copy link
Contributor

ojw28 commented Jun 1, 2020

It might be worth trying to use WorkManagerScheduler instead of PlatformScheduler, just in case the underlying implementation has some extra fixes that didn't make it into the platform for Android N.

@mseroczynski
Copy link
Contributor Author

mseroczynski commented Jun 2, 2020

FYI reinstating RequirementsWatcher did not help (my code thats using https://github.com/pwittchen/ReactiveNetwork/ could correctly detect that there is wrong state in ExoPlayer not met requirements yet still workaround fix described in last post failed to work), I'm trying out WorkSchedulerManager now.

@mseroczynski
Copy link
Contributor Author

mseroczynski commented Jun 2, 2020

There is surely something wrong elsewhere, process is not killed and issue is reproducible on other versions of Android (happened on 10 now). (Scheduler implementation doesn't seem to play a role). I think it's a critical bug and I'm having trouble fixing it outside of ExoPlayer. Try to take 10+ devices for a 20 minute walk like me and you'll supposedly reproduce it. Also, FYI I'm triggering resumeAll on queue on every app-in-foreground event, it does nothing because ExoPlayer keeps thinking there is no NETWORK_UNMETERED.

Please let me know, if this issue will be ignored in 2.11.5 as I'll have to fork ExoPlayer to base RequirementsWatcher on verified working implementations like these: https://github.com/pwittchen/ReactiveNetwork/tree/RxJava2.x/library/src/main/java/com/github/pwittchen/reactivenetwork/library/rx2/network/observing/strategy

@ojw28 ojw28 added bug and removed needs triage labels Jun 3, 2020
@ojw28
Copy link
Contributor

ojw28 commented Jun 3, 2020

I've been able to reproduce this by walking repeatedly outside and back into WiFi connectivity (I'm not able to reproduce by simulating the same thing with a microwave; perhaps the way the signal strength drops off and comes back differently in the two cases makes a difference).

When the issue reproduces, RequirementsWatcher.NetworkCallback receives a capabilities changed event indicating that the network is validated (NET_CAPABILITY_VALIDATED). But immediately after this when we evaluate the network in Requirements.getNotMetNetworkRequirements, the platform tells us that the network is not connected, or connecting, or validated, and that it is metered.

It's unclear whether we should be listening to a different capabilities change in RequirementsWatcher.NetworkCallback, or whether there's an issue in the underlying platform where it doesn't make the change in the network status visible quickly enough. In the latter case we may need to work around the problem by adding a small delay before we evaluate the network. Further investigation is needed into these points.

As an aside, there's really no need to ping an issue three times in under an hour with no useful new information. All it does is spam a lot of people with email. We obviously can't commit to fixing it in a given release prior to understanding what the problem is and what the correct fix is, and at this point we're still figuring that out.

ojw28 added a commit that referenced this issue Jun 4, 2020
Issue: #7453
PiperOrigin-RevId: 314710328
ojw28 added a commit that referenced this issue Jun 4, 2020
Issue: #7453
PiperOrigin-RevId: 314710328
@ojw28
Copy link
Contributor

ojw28 commented Jun 4, 2020

@mseroczynski - The commits referenced above should largely mitigate this issue (one is into dev-v2 and the other is the same change cherry-picked into dev-v2-r2.11.5).

There's still something weird going on in the problematic case where the events that the platform is firing seem to be inconsistent with what we get when we then query the platform through its getters. This is described in a little more detail in my post above. Interestingly, when I tried to create a minimal example that listens to the same events and queries the same APIs, I could not make the issue occur. It's possible that the network requests the Downloader instances are making are important for triggering the problem, perhaps because if they fail at around the time the WiFi network is lost, it causes the network to be declassified as the "active" one by the underlying platform. This is just a guess, but would explain why my minimal reproduction attempt didn't reproduce the same problem. It may also imply that the working alternatives you found may also not work properly once fully integrated into DownloadService/DownloadManager workflow.

The way the commits referenced above mitigate this issue is by re-evaluating connectivity when we receive additional events that get fired every now and again and imply continued network connectivity. By doing this, even if we do hit the problematic case, it should resolve itself when one of these subsequent events gets fired and the network is classified as being active again by the underlying platform. It appears that turning the display on is sufficient for this to happen, at least on a Pixel test device. Most likely any other network activity that the device does (e.g. push notifications) will also cause things to return to their correct state, so in practice this mitigation is probably going to be good enough to resolve the issue pretty quickly in nearly all cases, and in the worst case will resolve the issue as soon as the user looks at their device. The events we're listening to and the way in which we query connectivity looks to be broadly inline with how WorkManager does things.

Marking as fixed because, in practice, the issue is probably fixed well enough. We're having some discussions with the Android core networking team to try and work out exactly what's happening, and we may come up with a "better" fix in due course, if we manage to figure it out.

@ojw28
Copy link
Contributor

ojw28 commented Jun 5, 2020

Subsequent fix in 226583f / 46d29b2.

@google google locked and limited conversation to collaborators Aug 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants