qbt with lt 1.1 has worse torrent rechecking/creating speed on Windows #9061
Comments
Thanks for the benchmarks! Although I have nothing to comment, I'd like to say that you may want to benchmark libtorrent master, which (I think) can be used with this qBt repo: https://github.com/zeule/qbit |
There may have been a change to make force rechecking less "hard" on drives and CPUs. |
@arvidn |
@airium what is the percent figure in the table? My first guess at the factor dominating this throughput is the Perhaps the default should be 8 MiB. @airium would you mind doubling or quadrupling this setting and see how it affects the checking throughput? |
@arvidn Sorry for the delay. Here are the results, which I put on Google Sheets. |
Also, sorry that I don't have time to fill in every entry in the table, but if you want some specific entries just tell me. |
I'm primarily interested in whether |
@arvidn Oh sorry I misunderstood your reply. Now I will change |
@arvidn So this time I alter However, the 3 builds with 32, 1024 and 256000 show no observable difference in torrent rechecking/creating speed compared with the default build with 256. Here are my builds: https://drive.google.com/open?id=16cerOCIJOAW2ksQhu8PgslDJxWLZvUx_ |
I am actually seeing the same thing here. It is not limited to checking speed: actual torrent speeds are also much faster and more stable when libtorrent 1.0.11 is used rather than 1.1.x. Arvidn, I have been emailing you about this, as I can reproduce it not only on Windows but also on Linux (using Deluge); 1.0.11 outperforms the newer versions in all of my tests. I have tried nearly everything I can think of, including nearly all options in "ltconfig" on my Linux installs, and I have simply not been able to get 1.1.x anywhere near the speeds of 1.0.11, both in terms of network throughput and in terms of creating and checking torrents. |
@khnielsen would you be able to enable session stats logging (in the alert mask) and post the logs here? |
I would if you could provide some documentation on how to do that. I am a bit lost on how to proceed: my findings on Linux are not supported by anything but testing and messing around with it, and I have very little knowledge of how to get the proper logs for you. |
@airium |
I ran some tests to see what I can find on macOS with a spinning disk. Creating and checking torrents are slightly different. For checking, there currently is logic to concentrate the checking jobs to a single thread (as long as there are 4 or more disk threads). There's an effect where spreading out the checking over more threads slows it down, presumably because the disk I/O requests are not perfectly sequential. It's possible that having However, when creating a torrent, the hashing happens in a single thread. In my testing, the CPU time used for hashing is insignificant compared to the time of the disk read operations. This is on a spinning disk on macOS. People that experience this problem may have a system with different characteristics, like an SSD for instance. |
Looking closer at the code, I think the main reason for the poor performance is probably that libtorrent reads 16 KiB at a time from the disk. I think the original motivation for this was to keep the code simple and to avoid using excessive amounts of memory when the piece size is large. I will try to improve this. |
Does anyone want to give this patch a try? arvidn/libtorrent#3112 |
IIRC, Windows especially is very sensitive to small read calls. This patch reads entire pieces at a time rather than 16 KiB, but it still falls back to the generic (16 KiB at a time) logic if it fails to allocate enough space in the cache. |
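To make the difference between the two read patterns concrete, here is a minimal Python sketch. It is not libtorrent's code: the function `hash_pieces` and its parameters are made up for illustration, and only the 16 KiB figure comes from the comments above.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # 16 KiB, the read size mentioned in the discussion


def hash_pieces(path, piece_size, read_size):
    """Return (SHA-1 digests of all pieces, number of read() calls issued)."""
    digests = []
    read_calls = 0
    # buffering=0 gives a raw file object, so each read() maps to one request.
    with open(path, "rb", buffering=0) as f:
        while True:
            sha1 = hashlib.sha1()
            remaining = piece_size
            while remaining > 0:
                chunk = f.read(min(read_size, remaining))
                read_calls += 1
                if not chunk:  # end of file
                    break
                sha1.update(chunk)
                remaining -= len(chunk)
            if remaining == piece_size:  # nothing was read for this piece
                break
            digests.append(sha1.digest())
            if remaining > 0:  # short last piece, the file is exhausted
                break
    return digests, read_calls
```

Calling `hash_pieces(path, piece_size, BLOCK_SIZE)` and `hash_pieces(path, piece_size, piece_size)` yields the same digests; only the number of I/O requests differs. With 4 MiB pieces, 16 KiB reads mean 256 read calls per piece instead of one.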
Sorry for my delay in response.
@Chocobo1 Thank you for your advice, but actually this setting does not matter. You could find my test packages in the link below, in which the ones with "aio_thread_1" and "aio_thread_64" are.
@arvidn I tried this one and it works! I noticed the creating/rechecking speed on SSD is almost 2x of that of current |
thanks for testing and making these builds! This will be part of the 1.1.8 release |
@arvidn btw, before I close this issue, do you have any immediate ideas for improving this performance further? As we can see, uTorrent still outperforms libtorrent in some cases (e.g. 500MB/s vs 200-300MB/s on a SATA3 SSD). I am still interested in beating uT and plan to look into how libtorrent implements this part; maybe you could suggest a starting point. |
the only other thing I can think of would be to bump the |
What kind of things changed in the past, arvid? I can confirm that the patch improves performance, but in my tests it is still ~250MB/s for 1.1.7 with the patch vs 420MB/s for 1.0.11, which is a massive difference. I would assume that newer versions bring better performance, not the other way around. In my tests, 1.0.11 generally outperforms the 1.1.x branch in basically everything, from connecting to peers and network throughput on 10G interfaces to hashing speed. |
@khnielsen a lot of things changed, perhaps the most important one was to support more than one disk I/O thread. would you like to build and run performance regression tests, to avoid changes that negatively impact performance in the future? |
@arvidn I have some new findings by chance. Now on the top of branch Screenshots first: However, a single increase on one of the two has no effect, i.e. only increase Windows binaries for test: the same google drive link I also noticed that a few more I/O-related patches were introduced in the past hours, but I have no time to test them or other changes in the next few days. Sorry in advance. And just a note here: I also found some other problems during these tests. One is that libtorrent 1.1 executes all queued torrent rechecking jobs simultaneously, unlike 1.0. This might not be expected, especially on HDDs, as HDDs are slower in that case. Another is that, when creating a torrent, clicking "Cancel" does not actually stop the job immediately: the hard disk is still being read until the job finishes (but no torrent is produced). These 2 problems should be discussed as new issues. |
It's not supposed to do this by default. It suggests the |
@ssiloti I can't recall the motivation for concentrating hash jobs into specific threads. I imagine it would have benefits of making disk I/O more sequential when checking a whole torrent (by serialising them all in a single thread). But I can also imagine a benefit being that it prevents hash jobs from starving out other disk jobs, just because there are so many of them. I get the feeling though that in this case, with an SSD, the bottleneck is not the disk I/O, but the CPU of SHA-1. Does that sound reasonable? I can't think of a good architecture that would satisfy both the I/O bound and CPU bound case. As for the |
Just want to add that qbt doesn't touch that parameter, it should be at the default |
I think the splitting of hash jobs into their own threads was before my time, but preventing them from starving other jobs seems like the most plausible justification. A 6700K CPU is capable of computing SHA1 at well over 2GB/s with four hash threads, so it shouldn't be a bottleneck even with the NVMe drive as long as I suspect the biggest drag on performance here is insufficient pipelining of hash jobs combined with insufficient parallelism. SSDs, especially high speed NVMe drives, really need to have multiple requests outstanding to hit their peak performance, so it makes sense that @airium needed both an increased number of active jobs and multiple hash threads to get the most out of those drives. I think increasing |
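As an illustration of the point about keeping several requests outstanding, the following Python sketch hashes pieces with multiple jobs in flight at once. It is a toy model under stated assumptions, not libtorrent's disk thread architecture: `check_pieces`, `_hash_piece` and the `workers` parameter are hypothetical names chosen for this example.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def _hash_piece(path, index, piece_size):
    # Each job opens its own handle so that several reads can be outstanding.
    with open(path, "rb") as f:
        f.seek(index * piece_size)
        return hashlib.sha1(f.read(piece_size)).digest()


def check_pieces(path, piece_size, workers=4):
    """Hash every piece with up to `workers` jobs in flight; results keep piece order."""
    size = os.path.getsize(path)
    num_pieces = (size + piece_size - 1) // piece_size  # round up for a short last piece
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, i.e. in piece order.
        return list(pool.map(lambda i: _hash_piece(path, i, piece_size),
                             range(num_pieces)))
```

With `workers=1` this degenerates to one read-then-hash job at a time. Larger values let the drive see several requests at once and spread the SHA-1 work over threads (CPython's hashlib releases the GIL while hashing large buffers). Whether that helps depends on the drive, as discussed above for HDDs versus SSDs.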
Actually nevermind about increasing the minimum active jobs and removing |
could someone make a new test build with latest
|
Sorry, I have been busy. Now here is my compilation of qb 4.1.1 + lt 1.1.8 both at release: Google Drive This time I remind of using the And this time I still encounter the concurrent rechecking problem. To be specific, when queued multiple torrents to recheck, normally it should run one by one while the others are stalled at the At last, some of my friends reported crashes with 1.1.8 with either Deluge or qBittorrent. I may also have hit one yesterday on one of my seedboxes, but we didn't collect any trace info. |
I just want to report that I tested airium's latest build and it's night and day. Before, I was constantly having many error/checking issues, mainly with big files (>30GB) on a NAS. Now I have almost no issues. Also, when I exit qBittorrent, it really stops now. Before, it used to keep running for 5 min in the background, and launching qBittorrent again led to "checking" on all files. So thanks for fixing all this, I was about to drop qBittorrent. Nice one! |
Since disk performance has been much better so far, I am closing this issue. |
This issue extends #8181 (the only relevant one I found). It identifies a performance degradation, probably on the libtorrent side, in torrent rechecking on Windows. Maybe this should be forwarded to libtorrent, but I think it is better forwarded by the qbt main devs; otherwise I will do it later.
qBittorrent version and Operating System
Windows 10 1709/1803, but I think it is general to at least Windows 7+
qBittorrent 3.3.16, 4.0.3, 4.0.4, 4.1.0, 4.1.1, but I think it's general to qbt 3.3+ and 4+
libtorrent: 1.0.11 (at 62c9679) and 1.1.7 but I think it's general to 1.0.6+ and 1.1+
What is the problem
I benchmarked torrent rechecking/creating with qbt 4.1.1 + lt 1.0.11/1.1.7, plus uT 2.2.1 for comparison. From the results below, qBittorrent 4.1.1 has worse torrent rechecking/creating performance when using libtorrent 1.1.7. This slower speed is in fact general to any qBittorrent 3.3+ and 4+ built against libtorrent 1.1.x. If built against libtorrent 1.0.x, e.g. 1.0.11, qBittorrent is much faster at this. The potential disk read and hash capability of libtorrent 1.0.11 should be much higher than uTorrent's, as seen from the 1000MB/s creating speed on the NVMe SSD, but there is a degradation on rechecking, and a further drive-dependent degradation on the SATA3 SSD that is not seen with uTorrent.
| speed / active time | SM951 | X400 | ST8000DM | ST2000LM |
|---|---|---|---|---|
| | 100% | ~60% | ~80% | ~60% |
| | 100% | ~85% | ~97% | ~96% |
| | 100% | ~90% | 90-95% | ~95% |
| speed / active time | SM951 | X400 | ST8000DM | ST2000LM |
|---|---|---|---|---|
| | 100% | ~60% | ~80% | ~60% |
| | ~30% | ~65% | ~76% | ~85% |
| | 100% | ~90% | 90-95% | ~95% |
*The performance here is strange, but I tested it multiple times and the value is correct. I have no idea why qbt4.1.1+lt1.1.7 should be slower on ST8000 than on ST2000.
4.1.1+1.1.7 = the qbt official x64 build of qbt 4.1.1, lt 1.1.7, boost 1.67.0, Qt 5.10.1
4.1.1+1.0.11 = my x64 build of qbt 4.1.1, lt 1.0.11 (at 62c9679), boost 1.65.1, Qt 5.10.1
2.2.1 = uTorrent 2.2.1 build 25302
all using default configuration
Sequential read capability (the outermost cylinder if HDD):
SM951 = Samsung SM951 512G MLC M.2 NVMe SSD, ~1500MB/s, internal PCIe 3.0 4X
X400 = SanDisk X400 1TB TLC M.2 SATA3 SSD, ~500MB/s, internal SATA3
ST8000 = Seagate ST8000DM004, 3.5" 8T 5425R 256M, ~200MB/s, mounted via USB3.0
ST2000 = Seagate ST2000LM003, 2.5" 2T 5400R 32M, ~135MB/s, internal SATA3
CPU is 6700K, RAM 48GB
Test is based on 30GB large files (typ. 2GB each file), in 4MiB piece size
Speed and active time are the values shown in Task Manager
Both HDDs were reformatted before the test
What is the expected behaviour
Libtorrent 1.1.x should have at least the same disk performance as 1.0.x when creating/rechecking torrents.
Maybe qBittorrent should reconsider which libtorrent version to use until libtorrent 1.1 improves.
Steps to reproduce
You need at least an SSD (assuming you are not using some "flash disk" level SSD of 300MB/s or lower). For comparison, the lt 1.1.x group can include any official qbt 4.x build (all of which are built against lt 1.1.x), and the lt 1.0.x group can include the official qbt 3.3.16 build (which is on lt 1.0.11) and my builds of qbt 4.1.1 against lt 1.0.11. You can also build other qbt+lt combinations. Note that lt 1.0.11 at 4e90eb1 (the one on the releases page) or lower does not seem to compile with boost 1.65.0 or above, but 1.64.0 is fine; lt 1.0.11 at 62c9679 does not compile with boost 1.66.0 or above, but 1.65.1 is fine.
It should also be noted that, because of the Windows system read cache, you should do enough other read/write work before creating/rechecking to evict the torrent content from memory; otherwise the operation will be served directly from the in-memory cache at a high speed.
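One rough way to tell whether a file is still cached is to time a plain sequential read of it before running the real test. The Python sketch below is only an illustration: `read_throughput_mb_s` and its `read_size` default are hypothetical and not part of qBittorrent or libtorrent.

```python
import time


def read_throughput_mb_s(path, read_size=4 * 1024 * 1024):
    """Read `path` sequentially and return the observed throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(read_size)
            if not chunk:  # end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")
```

If the figure is far above the sequential read capability of the drive listed earlier, the data is most likely still being served from memory and the cache has not been evicted yet.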
Extra info(if any)
I can upload all screenshots later (maybe in one or two days) if necessary.
As for Linux, I did not perform any similar test as I always compile libtorrent 1.0.11 on it, but if necessary maybe I can test it on my seedboxes, but this might need 1 week or more.
Previously I noticed the discussion about the Windows I/O and cache issue, so I also investigated the related libtorrent session settings (typically part of settings_pack in the 1.1 branch). I altered settings including enable_os_cache, low_priority_disk and coalesce_reads (through the qBittorrent GUI or by modifying the qbt source code), but the results did not show any change. This needs further investigation.
I have been trying to fix this issue, but I don't think I can do it faster or better than the current main qbt and lt devs. If the main devs are willing to fix this issue, I can help run benchmarks.