Test multithreaded file hashing #6693
3ce5b08 to 8bb390b
I used the pyinstrument task profiler when uploading a 145MB RPM. Before this patch was applied the task took 1.17s. After the patch the run time was 0.99s. I am attaching an archive that contains the output of pyinstrument.
with ThreadPoolExecutor(max_workers=6) as executor:
    while data := file.read(1048576):
        for hasher in models.Artifact.DIGEST_FIELDS:
            executor.submit(instance.hashers[hasher].update, data)
Theoretically speaking I suppose it's possible to build up data in memory if we were submitting tasks faster than they could be completed, as the refcounts wouldn't go to zero on the data immediately. I'm not sure that's too much of a problem in practice? But I'm calling it out.
I could do here what I've done below, and add a concurrent.futures.wait() to wrap everything up before the loop resets. That probably costs some throughput, but it's no worse than what we're doing currently, so 🤷
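A minimal sketch of that per-chunk wait, assuming a plain dict of hashlib objects in place of the `instance.hashers` mapping from the patch. `hash_stream` and its parameters are hypothetical names used only for illustration. Blocking on each batch before the next read means at most one chunk is referenced by pending tasks at a time, and each hasher sees the chunks in order.

```python
import hashlib
import io
from concurrent.futures import ThreadPoolExecutor, wait


def hash_stream(fileobj, algorithms=("sha1", "sha256", "sha512"), chunk_size=1048576):
    """Hash a file-like object with several algorithms, one chunk at a time.

    Hypothetical helper: the names and defaults here are assumptions,
    not taken from the patch under review.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with ThreadPoolExecutor(max_workers=len(hashers)) as executor:
        while data := fileobj.read(chunk_size):
            # One update task per hasher for the current chunk.
            futures = [executor.submit(h.update, data) for h in hashers.values()]
            # Block until every hasher has consumed this chunk, so the next
            # read cannot pile up more data behind unfinished tasks.
            done, _ = wait(futures)
            for future in done:
                future.result()  # re-raise any exception from a worker
    return {name: h.hexdigest() for name, h in hashers.items()}
```

The cost is that the fastest hasher idles until the slowest one finishes each chunk, which is the throughput tradeoff mentioned above.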
8bb390b to 58e22f6
Since the hashing is done by C code inside of hashlib which releases the GIL, threading actually does work here despite the normal GIL rules. This adds multithreading to uploads and downloads of files.
58e22f6 to 2ad8e00
values are header content. None when not using the HttpDownloader or subclass.
"""

THREADPOOL = ThreadPoolExecutor()
Shared between all downloaders. I figured that was better than trying to set up a separate threadpool for every downloader.
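As a rough illustration of what "shared between all downloaders" means, here is a sketch with a class-level executor. `SketchDownloader`, its synchronous `handle_data`, and `digests` are hypothetical stand-ins and not the real downloader API. The only point is that a class attribute is created once and reused by every instance.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class SketchDownloader:
    """Hypothetical stand-in for a downloader; not the real class."""

    # Class attribute: created once when the class is defined and
    # shared by every instance, instead of one pool per downloader.
    THREADPOOL = ThreadPoolExecutor()

    def __init__(self, algorithms=("sha256", "sha512")):
        self._hashers = {name: hashlib.new(name) for name in algorithms}

    def handle_data(self, data):
        # Submit one update per hasher to the shared pool, then wait so
        # that chunks are applied to each hasher in arrival order.
        futures = [
            self.THREADPOOL.submit(h.update, data) for h in self._hashers.values()
        ]
        for future in futures:
            future.result()

    def digests(self):
        return {name: h.hexdigest() for name, h in self._hashers.items()}
```

With a shared pool, the default worker count caps the total number of hashing threads across all concurrent downloads, not per download.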
Added: multithreading in downloader digest calculation, which should improve download performance, and in particular on-demand downloads (since that hashing happens on the fly), not just uploads.
Another 25% of the runtime could probably be shaved off by
Just writing that for posterity. May or may not be worth it.
Note that since the hashing is done by C code inside of hashlib which releases the GIL, threading actually does work here despite the normal GIL rules.
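One way to check this claim locally is to time the same hashing work serially and in threads. The sketch below is an assumption-laden micro-benchmark, not part of the patch: `compare_serial_and_threaded` is a hypothetical helper, and the buffers should be large, since the hashlib documentation describes the GIL as being released only for data larger than 2047 bytes.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor


def compare_serial_and_threaded(buffers, algorithm="sha256"):
    """Hash each buffer serially, then in threads, and time both runs.

    Hypothetical helper for experimentation only.
    """

    def digest(buf):
        hasher = hashlib.new(algorithm)
        hasher.update(buf)  # large updates run with the GIL released
        return hasher.hexdigest()

    start = time.perf_counter()
    serial = [digest(buf) for buf in buffers]
    serial_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(buffers)) as executor:
        # executor.map preserves input order in its results.
        threaded = list(executor.map(digest, buffers))
    threaded_seconds = time.perf_counter() - start

    return serial, threaded, serial_seconds, threaded_seconds
```

Any speedup depends on the number of cores, the buffer sizes, and the hash algorithm, so the timings are only indicative.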