New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force terminate libtorrent session after saving resume data #18772
Conversation
The protocol is important to follow. It allows trackers to remove the peer from the database when the user sends the |
It would seem far more preferable to have the client ignore trackers that have a status of "not working", rather than blindly ignore the spec for the benefit of users using broken trackers. |
Who said that we are talking about "not working" trackers? It is unlikely that In fact, I could just make it optional so that everyone can choose what is more important to him - to comply with all the formalities of the protocol at all costs or turn off the computer without unacceptable delay. Of course, it would be preferable to get some kind of support from |
It turns out that qBittorrent/libtorrent even has an associated setting "Stop tracker timeout". I wonder what kind of influence it has in this or that case? For example, if there are a lot (N) of "stopped" requests to a "bad" tracker, how much time will it take in the end? N*T? Then this is hardly an effective solution. EDIT:
If it really works in such a way that this PR does not make sense at all. |
Indeed, I didn't read far enough in the issue. Apologies. |
It is better to add an option so that the user himself chooses whether he wants to close the client faster without sending statistics to trackers. |
|
I'm only saying that this feature can be improved, because in its current form, many people did not like it. In principle, I also agree, you should not make a forced session termination by default. But this feature is definitely needed. |
From the other point this patch has advantage over zero |
the ways libtorrent tries to mitigate this issue are:
Another aspect to slow shut-downs are sessions with hundreds of active torrents, libtorrent will queue the tracker requests and only keep 50 (by default) announces outstanding at any given time. This can be controlled by (max_concurrent_http_announces). The issue people would run into before this logic was running out of memory because there was so many SSL contexts allocated for all the tracker requests. Some torrents may also have multiple trackers that need to be stopped. libtorrent already tracks which trackers it successfully sent an Also note that DNS lookups aren't cancelable. If a DNS lookup is stuck, it can hold up the queue for minutes. Killing the thread is really the only way to avoid that unfortunately. |
Why? it seems libtorrent is using boost.asio for DNS lookup so it should support async lookups. I mean why not ditch an lookup query after it has exceed some timeout limit (5secs or configurable) and presume the host isn't reachable instead of waiting/locking up to minutes? It would be user's own problem if their DNS resolver/provider isn't performant and causes lookups to drop in this scenario of shutting down. |
because boost.asio uses underlying operating system calls to perform DNS lookups, and they are blocking. Specifically, getaddrinfo(). It does run this in separate threads, to emulate asyncronous operations, but that doesn't solve the cancel operation. In theory it would be possible for asio to implement the DNS protocol itself, and make lookups async and cancelable, but last time I looked it doesn't. It would still need to query the system for the DNS configuration. |
Sorry just spotted this right now. |
|
Do all previously successfully announced trackers have a cached DNS lookups, or is cache limited in size/timeout, so some trackers still execute the DNS query while sending the |
Unfortunately, we may still find ourselves in a situation where exactly one of the first requests will be stuck and will hold the rest of the queue before thread is interrupted. Since private torrents need sending |
There's a (hard coded) cap on the DNS cache size. It's currently 700 entries. As long as you have no more than 700 different hostnames in trackers, every successful lookup will be cached. To avoid shutdown stalls because of DNS, trackers are only announced to if they have an entry in this cache. There's a flag called The rationale for doing it this was was to really prioritize avoiding shutdown stalls. I'm open to suggestions on how to improve it. |
I tried to exit qBittorent over 12 hours ago. It is still running and using network/disk. |
I attached MSVC 2019 to it, and this is what the threads are doing. Not Flagged 30696 0 Worker Thread qbittorrent.exe!thread_start<unsigned int (__cdecl*)(void ),1> qbittorrent.exe!QWindowsFileSystemWatcherEngineThread::run
|
The mechanisms listed above to deal with shutdown freezes due to "bad" trackers should cover most possible cases. But here's what's weird. The user reports that deleting non-working trackers still affects the delay, although it shouldn't. So my question is, is all of the above inherent in both v1.2 and v2.0 libtorrent branches? |
a tracker can be bad in other ways. In fact, if the DNS lookup is taking a long time I would think it is mostly a property of the DNS server being used. The tracker can be slow in responding for instance. @bbeuning that stall of yours is a separate issue. In your case libtorrent is failing to shut down entirely, it's not just taking a long time. |
The case I mentioned above is:
So I suppose it should fall under:
...since sending an |
Right, I would expect those host names to not have entries in the cache, and just not be announced to when shutting down. But perhaps they were stuck trying to announce start by the time you shut down. |
Well, maybe. It seems to be impossible to be protected from all problems. I still believe that the following could bring some improvement:
|
Do you mean to prioritize private torrents, to announce stop to those first, before trying others? |
Yes. But in fact we are talking about prioritizing private torrents to be paused first once session pause is invoked. |
The answer to this question is important to understand whether a single "bad" tracker referred by many torrents can ruin everything. |
I mean, if it announces 50 torrents in a row on the same tracker, will it execute 50 DNS queries? |
If the DNS lookup fails, it will be re-tried 50 times, yes. It would probably make sense to cache failures as well. Now, I would kind of expect operating systems to do things like this, but it's probably not a safe assumption. If the lookup succeeds, then the remaining 49 announces will use the cache. Since lookups are serialized, there's no overlap where the cache may not be available yet. |
👍 |
The following is from the doc of dnsmasq:
So it really depends on upstream nameserver to whether reply with an TTL value. Still it would be a good idea for libtorrent to cache it for a short period of time, like a few tens of minutes. |
That is, when a lot of announcements are initiated, libtorrent first resolves the names, and in the process of receiving responses from the DNS servers, it executes the announcement requests themselves, did I understand correctly or not? I mean, if in the example above 51th announcement is to another tracker, and the name of this tracker resolves faster than the first one, then it will not have to wait for the previous announcements? |
The improvements proposed in this thread sound good (libtorrent caching lookup failures as well, etc). Including @glassez's suggestion to prioritize stop event of private torrents first (at least on shutdown). As does this PR itself. Because this is an old and major issue, IMO they should all be implemented ASAP (my ideal wishful thinking would even want backporting into v4.3.9 and v4.4.5)... even though my (limited and very possibly wrong) understanding of this and #18697 suggests that they won't solve this problem and a little more is needed for that. I will address this in my next comment here, separated from this long one. There is no reason for this PR to be contested at all. It is perhaps presented poorly in that vein, however, phrased a little badly. The original post here:
The program inexplicably (for the end user) taking many minutes (an arbitrary, technically limitless amount of minutes, by the way) to quit - possibly even blocking system shutdown/logoff, or not - is an unacceptable bug. The concept of adhering to protocol standards zealously at all possible times is nothing in the face of this, though fixing this bug and adhering to protocol most of the time (at worst) aren't mutually exclusive at any case. This should be in easy consensus. How long would waiting to quit the now headless "phantom" qBT process (also sometimes delaying a subsequent qBT restart or system log off) for the sake of always fully following the protocol here be acceptable? 1 minute? 3 minutes? 5 minutes? 10? 20? 1 hour? Well, currently, in practice, it is unlimited. Considering the above quote "I added a small timeout that should be respected before the session is interrupted", this PR should be renamed from "Force terminate libtorrent session after saving resume data" to something like "Impose a maximum timeout on lingering libtorrent session on program shutdown after saving data" or "Prevent libtorrent stalling program shutdown for extended periods of time", which are more palatable descriptions and should be easy to agree upon. Then we can decide what this timeout will be, talk about it and make it a configurable setting as well. 5 minutes? 3? Even 10 minutes is technically better than the current state of things. By the way, it could make a lot of sense to use a different, shorter and expedited timeout when qBT can detect the OS is shutting down/logging off. So, this PR itself isn't the true fix (of #18697) but it is a stopgap and protection in the case of all possible future stalls from libtorrent side on shutdown, for reasons related to this thread or possibly completely unrelated reasons (possibly including this, whatever this is) In the case that the user isn't experiencing a problem like #18697, it doesn't affect anything (protocol adherence or anything else), it is simply a timeout. It should be logged when this timeout is hit (hopefully verbosely in a manner helpful to possible debugging). Remember many users have hundreds or thousands of non-paused torrents loaded in qBT (those then may announce to multiple trackers each). And apparently the "bad trackers" currently (and historically?) causing libtorrent to stall for long times on shutdown for whatever exact reason are commonly used (per #18697 thread). By the way, in my case, with 2000-some seeding torrents (vast majority are actually private torrents, say all save 100 or so), on both qBT v4.5.2 and v4.3.9 (both with lt1.2.x), full shutdown takes above 8 minutes (I will have to retest v4.5.2 because I do not remember exactly, but I seem to remember it took much longer than that there), and - even worse, related issue - consistently during this time, my network internet connection is killed: nothing can use it for anything, the PC or even other devices. It comes back online approx. 2 extra minutes after qBT process end. Not sure what is going on with that (I happen to be currently unconnectable, but I don't see what this would have to do with anything here) . Presumably, during this time qBT is choking my modem/router for some reason, and reducing |
arvidn wrote:
It would seem that to fully solve this current very-long-session-ending bug, libtorrent needs to cover handling these "bad in other ways" scenarios. Don't just cache DNS lookup failures, "cache"/remember all 'cases of bad tracker' during current quitting event. If current quitting scenario is we need to announce stop to 1000 torrents, including 100 torrents with tracker A, and during this quitting scenario tracker A has already reached stop timeout 10 times, then don't wait for it anymore for the next 90 times (wait, actually, why do we need to wait for and timeout reply at all when quitting? or possibly ever?), or possibly even skip further stop announces to it for the duration of this quitting, but maybe only if it really seems dead (including bad pre-quit last known status, but this should already be implemented? or:), say if it didn't reply for 10 seconds (rather than 5 seconds) 10 times, or something. Seems it would be great for the BitTorrent protocol if there could be a single message event for "I am quitting this client named XXX version YYY on address and port ZZZ, and am now stopping all torrents corresponding to this that you know of" (I understand that this would work fine even for say more esoteric setups of people running multiple qBT instances of same version because of different port each, otherwise you would need to assign each client session some session ID and use that). Or otherwise at least if you could mass-announce multiple torrents stop events to one tracker at once. It may be worthwhile to discuss behavior or an option to exempt private trackers/torrents from the above or to be more lenient on them (and/or possibly even provide a separate |
guys any update on when the fix is rolling out? |
Some users are still reporting the fact that |
Oh wow, that was quite a read. Is |
The problem is that at the end of the session,
libtorrent
pedantically follows the protocol and tries to correctly de-announce torrents on all connected trackers. For various reasons, this may take a very long time, which causes dissatisfaction of users. qBittorrent is usually constantly running and exits when logging out or shutting down the computer, so if such problems occur, it prevents the system from shutting down, which is extremely unacceptable. Given that such graceful session shutdown is unimportant to the user, this patch modifies qBittorrent in such a way as to terminate thelibtorrent
session after resume data saving is completed. Since the network interaction takes place in parallel with the resume data saving, I added a small timeout that should be respected before the session is interrupted if the resume data is saved too quickly.Closes #18697.