Increase hub download chunk size #116536
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/116536
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit afe2cdd with merge base 71ec3ed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
Reading in larger chunks should be more efficient.
Signed-off-by: Aarni Koskela <akx@iki.fi>
(force-pushed from 2ab241d to 7654fed)
            unit='B', unit_scale=True, unit_divisor=1024) as pbar:
        while True:
-            buffer = u.read(8192)
+            buffer = u.read(READ_DATA_CHUNK)
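For context, the hunk above is from the read loop in torch.hub.download_url_to_file. A minimal usage sketch of that function, for readers unfamiliar with it (the URL and destination path here are illustrative, not taken from the PR):

```python
import torch

# Streams the response to the destination file in chunk-sized reads,
# updating a progress bar and (optionally) verifying a SHA256 hash prefix.
torch.hub.download_url_to_file(
    "https://download.pytorch.org/models/resnet18-f37072fd.pth",  # example URL
    "/tmp/resnet18.pth",        # illustrative destination path
    hash_prefix="f37072fd",     # optional SHA256 prefix check
    progress=True,
)
```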
Thanks for the PR @akx. I'm not an expert in download performance, so before moving forward I'd have a few questions:

> reading in larger chunks should be more efficient

On a stable and fast network, does it actually lead to a measurable difference in download speed?

What could be potential issues with making the READ_DATA_CHUNK bigger? Is it just more RAM usage, or is there something else?
@NicolasHug Thanks for getting back to me!

> On a stable and fast network, does it actually lead to a measurable difference in download speed?

Glad you asked! The bottleneck here is not network stability or speed; it's how many times you switch from Python to C (the underlying SSL/HTTP socket read() call) and back. The same goes for feeding lots of small chunks to the hasher (also C code) when you could be feeding larger ones.

I wrote a small benchmark to prove this – downloading from a local HTTP server straight to /dev/null, so we can assume there is no network or disk overhead. The data is random, so we can assume there's no compression hijinks going on either.
import hashlib
import os
from urllib.request import Request, urlopen

import tqdm

url = "http://192.168.68.102:8000/foo"
chunk = int(os.environ["CHUNK"])  # read size under test, in bytes

req = Request(url, headers={"User-Agent": "torch.hub"})
u = urlopen(req)
sha256 = hashlib.sha256()
with open("/dev/null", "wb") as f, tqdm.tqdm(unit='B', unit_scale=True, unit_divisor=1024) as pbar:
    while True:
        buffer = u.read(chunk)
        if len(buffer) == 0:
            break
        f.write(buffer)
        sha256.update(buffer)
        pbar.update(len(buffer))
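For reproducibility, here is one way the served test file could be generated (a sketch under my own assumptions: the original comment only says the data is random, and the file name "foo" comes from the URL in the script above):

```python
import os

# Write ~1 GiB of random bytes to "foo", then serve the directory with
# `python -m http.server 8000` so the benchmark script above can fetch it.
with open("foo", "wb") as f:
    for _ in range(1024):
        f.write(os.urandom(1024 * 1024))  # 1 MiB of random data per iteration
```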
$ hyperfine 'env CHUNK=8192 python download_speed_test.py' 'env CHUNK=131072 python download_speed_test.py' --show-output
Benchmark 1: env CHUNK=8192 python download_speed_test.py
0.98GB [00:00, 1.37GB/s]
[... snip ...]
Time (mean ± σ): 966.1 ms ± 32.1 ms [User: 626.7 ms, System: 202.0 ms]
Range (min … max): 911.6 ms … 1005.2 ms 10 runs
Benchmark 2: env CHUNK=131072 python download_speed_test.py
0.98GB [00:00, 2.09GB/s]
[... snip ...]
Time (mean ± σ): 690.1 ms ± 12.6 ms [User: 458.7 ms, System: 95.0 ms]
Range (min … max): 668.2 ms … 711.0 ms 10 runs
Summary
env CHUNK=131072 python download_speed_test.py ran
1.40 ± 0.05 times faster than env CHUNK=8192 python download_speed_test.py
IOW, this single-number change made downloading from a perfect network 1.4x faster. You can also see that there's 10 percentage points less time spent in syscalls in the faster version. (In this scenario, updating the chunk size to 524288 gives diminishing returns, more so on non-perfect networks.)
> What could be potential issues with making the READ_DATA_CHUNK bigger?

Slightly increased memory usage during downloads, as you noted. Considering where torch is supposed to be used, 128 KiB instead of 8 KiB should not make a difference for any real use case 😁

The other consideration is that the progress bar, if enabled, won't update quite as often (since it is only updated between chunk reads), but with this chunk size and modern networks, that shouldn't matter either.
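To put the progress-bar point in perspective, a quick back-of-the-envelope check (my own illustrative arithmetic, not part of the discussion above):

```python
# Number of progress-bar updates for a 100 MiB download at the old and new
# chunk sizes -- even 128 KiB chunks still yield hundreds of updates.
size = 100 * 1024 * 1024
for chunk in (8192, 128 * 1024):
    print(f"{chunk:>6} B chunks -> {size // chunk:>5} updates")
#   8192 B chunks -> 12800 updates
# 131072 B chunks ->   800 updates
```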
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Thanks a lot for the detailed answer and for the benchmarks @akx. It looks like the speed-up should be fairly minimal for any reasonable ML workload, but since this PR adds zero complexity/maintenance cost, I'm happy to merge it. I just made one nit suggestion in https://github.com/pytorch/pytorch/pull/116536/files#r1440361098. LMK if it's OK with you and I'll merge. Thank you!
@NicolasHug Thank you, committed the suggestion! You're right that this will probably not make a large difference for a general ML workload (where you'd be expected to have downloaded models and other data already), but e.g. notebooks and CI runs should see some wallclock speedup and fewer CPU cycles spent (good for the planet, hey!) 😄
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
This PR increases the read size for the hub.download_url_to_file function from 8,192 bytes to 131,072 bytes (128 * 1,024), as reading in larger chunks should be more efficient. The size could probably be larger still, at the expense of the progress bar not getting updated as often.

It re-introduces use of the READ_DATA_CHUNK constant that was originally used for this purpose in 4a3baec and has since been forgotten.

cc @nairbv @NicolasHug @vmoens @jdsgomes @penguinwu
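Concretely, the change amounts to something like the following self-contained sketch (based on the diff in this PR rather than the verbatim torch/hub.py source; the real function additionally handles progress bars, hash-prefix verification, and temporary files):

```python
import hashlib
from urllib.request import Request, urlopen

READ_DATA_CHUNK = 128 * 1024  # 131,072 bytes, up from the previous hard-coded 8,192

def download(url: str, dst: str) -> str:
    """Download url to dst in READ_DATA_CHUNK-sized reads; return the SHA256 hex digest."""
    sha256 = hashlib.sha256()
    u = urlopen(Request(url, headers={"User-Agent": "torch.hub"}))
    with open(dst, "wb") as f:
        while True:
            buffer = u.read(READ_DATA_CHUNK)  # fewer Python<->C round trips than 8 KiB reads
            if len(buffer) == 0:
                break
            f.write(buffer)
            sha256.update(buffer)
    return sha256.hexdigest()
```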