HTTP outgoing queue gets stuck after connection error #656
You are right. The event is not lost. It remains in the queue ✔ thanks, michael |
Today it did not recover from the DNS error at all. The problem remained, even after opening and closing the host settings, until I changed the mode to Significant location change / Move mode. |
Hard to say what happened without further information. Could you enable debug logging and see if the issue occurs again? |
Logging Enabled. I'll come back with a logfile |
What if I say that the issue hasn't occurred since I enabled logging? Without logging it was reproducible. |
I wouldn't believe you if you said that ;) |
Great stuff: solution: leave logging enabled. Shall I close this ticket now? (Just kidding :-) |
I've encountered what seems like a similar situation a couple of times now. I'll find that my location is not being reported (via HTTPS), and looking at the status window I see: Endpoint state: Error. Both times I've presumably gone through an area without connectivity. The problem is that it seems stuck in that same state after connectivity is restored, instead of the usual return to normal and uploading of the queue. Despite being able to look up and reach the host with a browser, OwnTracks continues to increment the Endpoint queue count while showing the same error. I've had this persist for days with thousands of queued messages. Restarting OwnTracks clears the error and it starts reporting again right away; unfortunately, all the queued messages seem to be thrown away on restart. I turned on debug logging but haven't been able to artificially reproduce the error so far (by toggling airplane mode etc.). App build number: 2.1.2 (21201) |
I've noticed the issue several times in the past but I've been unable to reproduce it reliably. At some point it seems that it can no longer resolve addresses and hence fails to send new messages. |
The app is whitelisted for protection against battery optimizations (battery optimization whitelisted: true), however it's not whitelisted for "unrestricted mobile data access", which allows background data usage when Data Saver mode is enabled and using cellular connectivity. In these two cases the Data Saver feature was indeed enabled at some points. As mentioned it continues to affect the app even when on wifi, which prevents the expected uploading of the queue. I wasn't able to reproduce it even with Data Saver enabled, but I also didn't wait for extended periods so perhaps Android hadn't fully restricted it during my tests. Pending a fix or workaround (other than "don't use Data Saver", if that's indeed the cause), it'd be nice to be able to preserve the queue when restarting the app, so that no data is lost, just delayed. Is there a way to do that? |
not at the moment |
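For illustration only, here is a minimal sketch of what "preserve the queue across restarts" could look like. It is written in Python rather than the app's own language, it is not how OwnTracks stores its queue, and the class name `PersistentQueue` and the JSON-lines file layout are assumptions made up for this example.

```python
import json
import os


class PersistentQueue:
    """Hypothetical on-disk FIFO: one JSON document per line."""

    def __init__(self, path):
        self._path = path
        self._items = []
        # Reload whatever an earlier run left behind.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as handle:
                self._items = [json.loads(line) for line in handle if line.strip()]

    def append(self, message):
        # Write to disk immediately so a restart cannot drop the message.
        self._items.append(message)
        with open(self._path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(message) + "\n")

    def pop_sent(self):
        # Call only after the oldest message was delivered successfully.
        item = self._items.pop(0)
        tmp_path = self._path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            for remaining in self._items:
                handle.write(json.dumps(remaining) + "\n")
        os.replace(tmp_path, self._path)  # swap in the rewritten file
        return item

    def __len__(self):
        return len(self._items)
```

With something along these lines, a restart would delay queued messages instead of discarding them, at the cost of a file rewrite per delivered message.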
I found today that the issue still occurs even if OwnTracks is whitelisted to allow background mobile data usage while in Data Saver mode (with Data Saver enabled). I restarted the app, confirmed it was able to post updates, then disconnected from wifi and traveled a bit (in Move mode). The first java.net.UnknownHostException happens within about a minute and the updates begin to queue, with repeated occurrences of the same exception. About 21 minutes later there are 104 messages in the queue. Back on the wifi (not that it should matter, since the app was whitelisted for background data access), I'm still getting "unable to resolve host" and have to restart the app to get updates working again.
I notice the next exception error says the endpoint state lists HTTP code 400, which doesn't really jibe with being unable to resolve the host: Here's the full last set of messages from the app before restarting to fix it:
|
The error code 400 is indeed strange but so is the 500 that sometimes appears. |
Since turning off Data Saver mode, I've seen it get stuck on again
(3000+ backlogged messages makes for very large debug log files very fast 😄 ) |
Ah yes, I'll remove logging of queue content again, now that we ruled out that queue processing is the issue ;) I've disabled caching for the HTTP client explicitly in the next build. Maybe it caches the error :s |
Just a quick question, but do you get an IPv6 address from your mobile provider, or is there only an IPv4/IPv6 record for your target address? Maybe the route selector defaults to IPv6 but there's only an IPv4 connection. |
My HTTP target address has only an IPv4 DNS record. I believe my mobile provider does provide IPv6 connectivity by default (T-Mobile), with IPv4 NAT. On wifi I have only IPv4 connectivity. OwnTracks works on both networks normally (and always after being restarted). |
Can you try to reset the HTTP client when it gets stuck the next time? Afterwards, send a message and see if it still is stuck. |
After performing the above steps, it's no longer stuck. |
I think I'm slowly starting to understand what's happening. The HTTP library maintains a connection pool. When there's a DNS or connection error, the response is not closed correctly, and shortly afterwards all workers get stuck. Further queue processing attempts reuse the stuck HTTP worker threads. |
Could you install the debug APK from https://github.com/owntracks/android/releases/tag/v.2.1.3 to see if it improves anything? You can install it side by side because it uses the app identifier org.owntracks.android.debug. I'm unfortunately unable to reproduce the behavior on my device or the emulator. When I enable airplane mode again, everything resumes fine. I might already have fixed the issue accidentally by rewriting parts of the HTTP code before, though. |
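To make the suspected failure mode concrete, here is a toy simulation in Python. It is not the app's code and not the HTTP library's real pool; `ConnectionPool`, `send_leaky`, `send_safe`, and `free_after_failures` are hypothetical names used only to show how an error path that never releases its connection can exhaust a bounded pool, and how releasing in a `finally` block avoids that.

```python
import queue


class ConnectionPool:
    """Toy fixed-size pool, for illustration only."""

    def __init__(self, size):
        self._free = queue.Queue()
        for index in range(size):
            self._free.put(f"conn-{index}")

    def acquire(self):
        # Raises queue.Empty once every connection is checked out.
        return self._free.get_nowait()

    def release(self, conn):
        self._free.put(conn)

    def available(self):
        return self._free.qsize()


def send_leaky(pool, fail):
    conn = pool.acquire()
    if fail:
        # Bug: the error path exits without releasing the connection.
        raise OSError("unable to resolve host")
    pool.release(conn)


def send_safe(pool, fail):
    conn = pool.acquire()
    try:
        if fail:
            raise OSError("unable to resolve host")
    finally:
        # Always hand the connection back, even on DNS or I/O errors.
        pool.release(conn)


def free_after_failures(send, pool_size=2, attempts=5):
    """Run failing sends and report how many connections remain free."""
    pool = ConnectionPool(pool_size)
    for _ in range(attempts):
        try:
            send(pool, fail=True)
        except OSError:
            pass  # the send failed; the message would stay queued
        except queue.Empty:
            break  # pool exhausted: every later send is stuck
    return pool.available()
```

In this sketch, `free_after_failures(send_leaky)` ends with no free connections after two failed sends, while `free_after_failures(send_safe)` keeps the whole pool available. Whether this matches what actually happens inside the app is exactly the open question in this thread.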
I can give it a go, but it has a conflict trying to install side-by-side:
I can install it in place of the existing one, unless it'd be preferable to resolve that so that their behavior under the same conditions can be compared. |
Ah that one's picky. I've updated the release with a new hardcoded application id. |
More debugs! |
Next try https://alr.st/files/app-debug.apk |
I can just install the debug version alone, if having a side-by-side comparison (both at the same time) isn't a priority. Otherwise send in the next try! |
I'm out of ideas unfortunately. If possible, just install the debug version alone |
No problem, debug version installed, will follow up in a few days (hopefully). Thanks! |
So while out and about using the debug version, I found the app stuck with 148 queued messages and the Endpoint state message Once back on the home wifi, now with 241 messages queued, the Endpoint state message had actually changed to |
Reading through the HTTP library issue tracker on GitHub, I found several other reports of the same issue on Samsung devices: square/okhttp#3919. Unfortunately, there is no workaround mentioned. I could add a custom setting that allows circumventing the system DNS settings, but I'm not sure if that is wise. Meanwhile, I've created a new version that sets up a new HTTP client instance on each request. Maybe the new instances use a fresh DNS instance. Could you check if the fresh version at https://alr.st/files/app-debug.apk improves anything? @jpmens see, it's always a DNS issue 😁 |
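As a rough illustration of the "new client instance on each request" idea, here is a Python sketch using the standard library's `urllib.request`. It is not the app's implementation; the function name `post_with_fresh_client` and the `opener_factory` parameter are assumptions for this example, and whether a fresh client really avoids a stale DNS or connection state is only a guess here, as it is in the comment above.

```python
import urllib.request


def post_with_fresh_client(url, body, opener_factory=urllib.request.build_opener, timeout=10):
    """POST `body` (bytes) with a client object that is never reused."""
    # A new opener per call means no state is shared between requests.
    opener = opener_factory()
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The with-block closes the response even if reading it fails.
    with opener.open(request, timeout=timeout) as response:
        return response.status
```

The trade-off is that nothing is reused between requests, so each one pays for its own connection setup.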
I've installed the latest debug.apk and will give it a go. There was the one stuck error on the previous build which seemed to not be a DNS issue -- the SSL handshake IO Error / Connection reset by peer one. |
Well, I haven't gotten stuck on the "Unable to resolve host" on the latest build, but I have gotten stuck twice with Also again: While I have trouble reproducing it by toggling airplane mode, turning on/off wifi, data, etc, I'd say it happens about 90% of the time when actually traveling from one place to another (if there are network type changes involved). |
Went on a trip and managed to go a couple days, switching occasionally back and forth between native cell data and a cellular hotspot's wifi, without getting hung up. Checked not long after arriving home and saw it stuck on Maybe the existence of a region, or related to the hostname I use having split-dns (resolves to a different IPv4 address on home wifi vs cellular - same SSL cert). I've deleted the home region for now as an experiment. |
For better or worse, I haven't had the problem recur in any form in the last two weeks, despite a number of ins-and-outs from home. I haven't changed anything since loading the last debug build. |
Version 2.1.2: I also have |
The message |
I am having the same issue: after not being able to connect to the receiving host over HTTP, the queue seems to be stuck. I was reading the code and noticed that there is a switch in getHttpClient() to either re-use the client or set up a new one for each request. However, I could not find this setting in the client. Maybe this happens because the HTTP client itself gets stuck? If I could find a way to switch it to always create a new HTTP client, I could test it. |
Thank you for these details. It would be nice if this could be fixed. |
I can quite easily reproduce the unknown host exception followed by a pending queue by just disabling my data and WiFi connection and trying to push my location. After enabling data again, I would expect the queue to be emptied within a few minutes. Unfortunately this is not the case; it mostly takes around 20 minutes, which is too long for my "unlock the door when I'm home" scenario ;) I will try to implement some sort of retry mechanism so that clearing the queue does not depend on some (location) event arriving. |
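A retry mechanism of the kind described above could be sketched roughly as follows, in Python for brevity. This is not the app's code; `RetryingQueue`, `drain`, and the backoff values are hypothetical. The idea is that `drain()` returns how long to wait before the next attempt, so a timer can retry the queue without waiting for a new location event. How that timer is scheduled is platform-specific and not shown.

```python
from collections import deque


class RetryingQueue:
    """Hypothetical in-order send queue with exponential backoff."""

    def __init__(self, send, base_delay=5.0, max_delay=300.0):
        self._send = send  # callable that raises OSError on network failure
        self._pending = deque()
        self._base_delay = base_delay
        self._max_delay = max_delay
        self._failures = 0

    def enqueue(self, message):
        self._pending.append(message)

    def pending(self):
        return len(self._pending)

    def drain(self):
        """Send queued messages in order.

        Returns the number of seconds to wait before calling drain()
        again, or None when the queue is empty.
        """
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                # Keep the message at the head and back off: 5 s, 10 s, 20 s, ...
                self._failures += 1
                delay = self._base_delay * 2 ** (self._failures - 1)
                return min(delay, self._max_delay)
            self._pending.popleft()
            self._failures = 0
        return None
```

A caller would invoke `drain()` after each `enqueue()` and, whenever it returns a delay, schedule another `drain()` after that many seconds.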
Try setting |
I am using that setting now for a few days, did not see the error yet, will monitor it the coming weeks. Just for my information, why would the queue be emptied if there are no new events? When I look at the code, it's all reactive, there's no process which runs every X minutes so the queue will remain as is without events. |
I was using the Move Monitoring Mode for the last few days, but because of battery drain I changed it back to Significant Monitoring Mode. Today I was in an error state again, with a SocketTimeoutException. It took about 35 minutes for it to be retried. Would it be possible to change the Monitoring Mode temporarily to Significant whenever there are HTTP requests stuck, and change it back afterwards? |
Changing modes does not "unstick" the queue, hence no. If in significant location mode a location report fails, it takes until the next report to send it. This might be well over 35 minutes depending on location availability etc. As background tasks are limited to run in certain intervals, we cannot continuously monitor for stuck messages. Your best bet is configuring Also your assumption that the queue should be emptied after enabling data connection again is not correct. We're not monitoring any data connection hence it might again take some time until the next publish (unless there's a bug with Socket DNS resolution like there is on many Samsung devices, in which case |
Yes, I get that.
That's exactly my point, the ping interval of 15m won't solve my door unlock issue, but if monitoring mode sends messages every 1 minute, this most likely will. |
15m is the lowest interval for OS background tasks. Ultimately you might however be using the wrong tool for the job you're trying to achieve. You might have better results by using Region detection if Wifi Location is reliable in your area or some other kind of presence detection besides Owntracks. |
Thanks, I'm gonna try that! WiFi is not an option for me. Unlocking the doors with Bluetooth can sometimes take up to a minute; if I need to be connected to my home network first before it starts unlocking doors, I'm probably gonna have to wait a lot ;) Edit: changing locatorPriority doesn't do anything for me, it's stuck at 2 |
There's been a number of changes since 2.1.3 and 2.2.0, including the fact that even bringing the app to the foreground will init a high-frequency location request (similar to how gmaps works). I don't believe there's a bug on the HTTP and queue side in 2.2.0, so will close. If you have a specific issue that's still present in the latest version, please re-open and we'll take a look. |
Hi,
I’m using Regions in OwnTracks for GeoFencing in OpenHAB via an HTTPS server. OwnTracks is switched to “Manual Mode”, meaning Region Updates only (Enter/Exit).
Usually I’m leaving my Home-Region via car. The car takes the SIM of the mobile via Bluetooth (rSAP) and - after setting up the internet connection - provides a data connection back to the mobile via WiFi. During this transition, there is a short phase (~1 Minute), where the WiFi is already available and connected, but without internet access! Exactly during this time OwnTracks wants to publish the Exit Event. The DNS name of my Server cannot be resolved. An error shows up in the OwnTracks status and the event is lost - OwnTracks doesn’t retry it.
Is it possible to implement a kind of retry mechanism, especially in the case that GeoFencing (Manual Mode with Regions) is used? In this case a lost message is much more critical than in the normal tracking mode.
Do you know any workaround apart from doing the Region calculation in OpenHAB instead of OwnTracks?
Thanks and kind regards
michael