rsync threadpool for batched file uploads #19

Merged
bhperry merged 8 commits into main from bhperry/rsync-threadpooling on Mar 5, 2024


bhperry (Contributor) commented Mar 4, 2024

rsync workloads are more likely to involve many small files that need to be uploaded or downloaded. The current implementation handles these slowly, because it is geared towards uploading large files quickly in parallel chunks.

Because SaturnFS is not built on fsspec's async implementation, fsspec's batching blocks on each file transfer until it completes, so the transfers effectively run one at a time rather than in batches. To get around this, we can dispatch each file copy operation to a thread pool.
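
For illustration, here is a minimal sketch of the thread pool idea. It is not the code from this PR: `copy_many`, `copy_file`, and `max_workers` are hypothetical names, and `copy_file` stands in for whatever blocking single-file transfer the filesystem provides.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple


def copy_many(
    copy_file: Callable[[str, str], None],  # hypothetical blocking single-file copy
    pairs: Iterable[Tuple[str, str]],       # (source, destination) paths
    max_workers: int = 8,                   # assumed default, tune for the workload
) -> List[Tuple[str, str]]:
    """Run copy_file(src, dst) for every pair on a thread pool.

    Returns the pairs in completion order and re-raises the first
    worker exception that is observed.
    """
    done: List[Tuple[str, str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every copy up front so that small files overlap
        # instead of each one blocking the next.
        futures = {
            pool.submit(copy_file, src, dst): (src, dst) for src, dst in pairs
        }
        for future in as_completed(futures):
            future.result()  # propagate any exception raised in the worker
            done.append(futures[future])
    return done
```

Threads suit this case because each copy spends most of its time waiting on network I/O, so many small transfers can be in flight at once even though each individual call still blocks.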

bhperry force-pushed the bhperry/rsync-threadpooling branch from e01cd21 to 04a4d4b on March 4, 2024 23:28
bhperry changed the title from "Optimize sfs rsync for many small files rather than few large files" to "rsync threadpool for batched file uploads" on Mar 5, 2024
bhperry force-pushed the bhperry/rsync-threadpooling branch from 7c52f17 to 2ca5127 on March 5, 2024 17:26
bhperry merged commit 5ce019c into main on Mar 5, 2024
1 check passed
bhperry deleted the bhperry/rsync-threadpooling branch on March 5, 2024 17:40