support `max_concurrency` in `upload_blob` and `download_blob` operations #420
Conversation
```diff
-    account_name=storage.account_name, connection_string=CONN_STR
+    account_name=storage.account_name,
+    connection_string=CONN_STR,
+    max_concurrency=1,
```
Concurrency cannot be used for this test; otherwise `storage.insert_time` would be the timestamp of the first completed chunk, while `creation_time` would be the timestamp of the finished operation (after all chunks are uploaded).

@hayesgb @TomAugspurger Hey folks, could you take a look, please? Just want to make sure you are fine with this.
Looks good. Just one question, but feel free to merge.

Btw, for the record, here are some comparisons showing how much this patch speeds things up for us: iterative/dvc-azure#54 (comment)
This PR adds:

- a `max_concurrency=None` kwarg in the async fs methods that use the azure SDK `upload_blob` and `download_blob` operations (which azure then uses to parallelize async chunk uploads/downloads)
- an `AzureBlobFileSystem.max_concurrency` attribute, which is used whenever the method-level `max_concurrency` is not set

As with `fsspec.asyn._get_batch_size()`, `max_concurrency` defaults to 1 for the individual file upload/download. `batch_size=...` is used to parallelize uploads/downloads at the file level (in something like `fs._get()` or `fs._put()`), and no additional parallelization is done for chunks within each file.

`batch_size` and `max_concurrency` can be combined, i.e. `fs.get(path, batch_size=4, max_concurrency=2)` would download up to 4 files at a time, and up to 2 chunks at a time within each file, giving an overall concurrency of up to 8 async download coroutines being run in the loop at a time.

Closes #268 (and supersedes the changes in the `concurrent_io` branch).

This PR is incompatible with #329 (but there was discussion in that PR regarding changing the name of the parameter to something other than `max_concurrency`, since it conflicts with the azure SDK parameter).
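For intuition, the two-level concurrency described above can be simulated with plain `asyncio` semaphores. This is a minimal sketch, not the adlfs implementation: `get`, `download_file`, and `download_chunk` are hypothetical stand-ins, with the outer semaphore playing the role of fsspec's `batch_size` and the inner one playing the role of the azure SDK's per-blob `max_concurrency`.

```python
import asyncio

async def download_chunk(file_id, chunk_id, counter):
    # Track how many chunk "downloads" are in flight at once.
    counter["now"] += 1
    counter["peak"] = max(counter["peak"], counter["now"])
    await asyncio.sleep(0.01)  # stand-in for the actual network I/O
    counter["now"] -= 1

async def download_file(file_id, max_concurrency, counter):
    # The azure SDK parallelizes chunks within one blob internally;
    # here an inner semaphore stands in for that behaviour.
    sem = asyncio.Semaphore(max_concurrency)
    async def limited(chunk_id):
        async with sem:
            await download_chunk(file_id, chunk_id, counter)
    await asyncio.gather(*(limited(c) for c in range(6)))

async def get(paths, batch_size, max_concurrency):
    # The outer semaphore limits file-level parallelism (fsspec batch_size).
    counter = {"now": 0, "peak": 0}
    sem = asyncio.Semaphore(batch_size)
    async def limited(p):
        async with sem:
            await download_file(p, max_concurrency, counter)
    await asyncio.gather(*(limited(p) for p in paths))
    return counter["peak"]

# 8 files, up to 4 files at a time, up to 2 chunks per file:
# peak concurrency is bounded by batch_size * max_concurrency = 8.
peak = asyncio.run(get(range(8), batch_size=4, max_concurrency=2))
print(peak)
```

The multiplication of the two limits is why the PR description's `fs.get(path, batch_size=4, max_concurrency=2)` example yields up to 8 download coroutines running in the loop at once.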