
Very low throughput when uploading single file compared to using awscli #667

Open

mboutet opened this issue Sep 22, 2023 · 3 comments

mboutet commented Sep 22, 2023

Uploading a single file of around 152M is significantly slower with s5cmd than with awscli: awscli achieves ~55 MiB/s, whereas s5cmd only reaches ~4.4 MiB/s. I tested with various concurrency settings (1, 5, 10, 25, 50), always with 1 worker (since it's a single file), and it made close to no difference. I also tested with various file sizes (36M, 152M, 545M, 2.6G, 6.9G) and observed the same low throughput.

Here's a screenshot of a network capture I made comparing awscli (left) and s5cmd (right) using a concurrency setting of 5:

[Screenshot: network capture, awscli (left) vs. s5cmd (right)]

It seems like s5cmd is transferring the file in many small chunks instead of the fewer, larger chunks that awscli uses.
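
For comparison, awscli's chunking behavior is controlled by its standard S3 transfer configuration; a sketch of how those settings are adjusted (values illustrative, not necessarily what my installation used):

aws configure set default.s3.multipart_chunksize 16MB
aws configure set default.s3.max_concurrent_requests 10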

The command I'm using is:

s5cmd \
    --profile my_profile --numworkers=1 \
    --endpoint-url=https://mycephs3endpoint \
    cp --concurrency=5 --show-progress \
    "${temp_dir}/archive.tar.lz4" \
    "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"

Versions:

❯ aws --version
aws-cli/2.11.5 Python/3.11.2 Linux/5.4.0-163-generic exe/x86_64.ubuntu.20 prompt/off

❯ s5cmd version
v2.2.2-48f7e59

I'm using Ceph S3 and I'm able to reproduce the issue when running the same upload command on other servers.

denizsurmeli (Contributor) commented

Hi, there is a --part-size flag for the cp command. You can adjust the chunk size as you wish.
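
For example, based on the command in the issue (the value 50 is illustrative; --part-size is given in megabytes):

s5cmd \
    --profile my_profile --numworkers=1 \
    --endpoint-url=https://mycephs3endpoint \
    cp --concurrency=5 --part-size=50 --show-progress \
    "${temp_dir}/archive.tar.lz4" \
    "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"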

mboutet commented Sep 25, 2023

@denizsurmeli, unfortunately --part-size didn't help.

I tested with all the combinations of the following parameters:

  • Object size to upload: 36M, 152M, 545M
  • Concurrency: 1, 5, 10, 25, 50, 100
  • Part size (MB): 5, 10, 25, 50

concurrency = 25, part_size = 10 gave the best throughput (around 20 MB/s), while most of the other combinations yielded 2-5 MB/s. Even 20 MB/s is still well below what awscli achieves. For small objects (less than around 20 MB), s5cmd wins, but only because it has no startup overhead, whereas awscli spends around 6-7 s before it actually starts doing anything.
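
For reference, a sweep like this can be scripted roughly as follows (endpoint, file, and destination reuse the placeholders from the original command):

for c in 1 5 10 25 50 100; do
    for p in 5 10 25 50; do
        # --part-size is in MB; time each single-file upload
        time s5cmd --numworkers=1 --endpoint-url=https://mycephs3endpoint \
            cp --concurrency="$c" --part-size="$p" \
            "${temp_dir}/archive.tar.lz4" \
            "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"
    done
done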

kucukaslan (Contributor) commented

Just for reference:

The problem seems to be related to #418

At the time I tried to tackle it, but I couldn't :(

I made a few attempts to optimize the write requests to increase throughput without using storage-optimized instances, but I couldn't find a viable solution.

#418 (comment)

see also #418 (comment)
