Parallel workers #2

cyno opened this Issue · 24 comments

Starting point could be this nice fork by pcorliss, who added parallel workers:
In my case, this speeds up my uploads by 50x.
My use case is uploading 20,000,000 small files across several S3 locations (assets).

I also forked pcorliss modifications to add parallel workers to cp and mv commands:

As you are currently working on the 1.1 release, it would be very nice to resolve the conflicts and merge this great fork, even though the code has diverged since it was made.

@cyno cyno closed this
@cyno cyno reopened this

I'd like to see this in s3cmd master. Hosting a static website with lots of small files (.css, .js) on S3 is a common use case. So +1.


+1, would love to have this when uploading lots of files to s3


This would be really helpful in my work as a system administrator.


Is this feature on the roadmap? Our uploads are very slow due to the large number of files, and I can't get Pearltrees' fork to work reliably. What about a bounty to finish this?


Also add my vote for some version of this. It'd be great to have parallelization. In the meantime, we're going to have to use Pearltrees's fork.




I took a quick look at this. We would need to extend the (new) connection reuse (ConnMan) that's in 1.5.0-alpha3 to be able to have multiple connections per endpoint, one per worker thread. I don't see that in ConnMan right now. So this isn't a trivial port of the existing work. For sync, the _upload() code path should be easily shardable across multiple parallel threads.
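The sharding idea above can be sketched with Python's standard thread pool. This is only an illustration, not s3cmd's actual code: `upload_one()` is a hypothetical stand-in for the `_upload()` path, and in a real port each worker thread would need its own S3 connection.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_one(path):
    # Stand-in for s3cmd's _upload() path: a real worker would PUT the
    # object over its own thread's S3 connection and return the result.
    return (path, "stored")

def parallel_upload(paths, workers=4):
    # Shard the sync upload list across a fixed pool of worker threads.
    # pool.map preserves the input order of the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_one, paths))
```

Because small-file uploads are latency-bound rather than CPU-bound, even Python's GIL-constrained threads should give a large speedup here.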

@mdomsch mdomsch closed this
@mdomsch mdomsch reopened this

ConnMan was designed to support that exact scenario: multiple threads sharing a pool of connections to S3. So no problem there. But rewriting the core to support threads is a big undertaking to do right (indeed, a quick hack to parallelise this or that code path may be easier, but not quite what we want). I will see after the 1.5.0 release if I can revive some old work done in this space.
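For readers unfamiliar with the pattern being described, a multiple-connections-per-endpoint pool can be as simple as a thread-safe queue. This is a toy sketch, not ConnMan's real API; the class and method names here are invented for illustration.

```python
import queue

class ConnPool:
    """Toy per-endpoint connection pool: N worker threads share N
    connections, blocking when none is free. Illustrative only --
    this is not s3cmd's actual ConnMan interface."""

    def __init__(self, endpoint, size):
        self._q = queue.Queue()
        for i in range(size):
            # A real pool would hold open sockets; strings stand in here.
            self._q.put(f"{endpoint}#conn{i}")

    def get(self, timeout=None):
        # Blocks until a connection is available.
        return self._q.get(timeout=timeout)

    def put_back(self, conn):
        # Return the connection for reuse by another worker.
        self._q.put(conn)
```

Since `queue.Queue` handles the locking, workers can check connections out and in without any extra synchronization.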


It would be great to see this implemented in a future release. It would really speed up file transfers to S3 in my scenario.


+1. I've been using pcorliss/s3cmd-modification for many years, but it hasn't been merged with master here since about 2010. We sync incrementals to S3 in nightly batches, and being able to use --parallel --workers=n makes that possible without huge waits. I'd love to have that functionality on top of all the recent improvements.


+1 for this :)


What I have been doing is something like:

cd /target_dir
for d in *; do
 # If there were errors on a previous run, make some noise
 egrep 'ERROR' /tmp/sync-$d.log
 # Test the lock before truncating the log file, then sync in the background
 /usr/bin/flock -n /tmp/sync-$d true && /usr/bin/flock -n /tmp/sync-$d s3cmd -v ..args.. sync /target_dir/$d/ s3://target_bucket/$d/ > /tmp/sync-$d.log 2>&1 &
 sleep 60
done

Something like that can run in a fairly tight cron loop, and it will at least parallelize the top-level directories. The flock calls ensure that only one sync runs per directory at a time, and the grep notifies you of errors encountered on a previous run. Crude, but hopefully effective.

Of course, if I could do without a shell loop .. :)
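The same per-directory fan-out can be done without a shell loop by driving s3cmd from Python. This is a hedged sketch: `/target_dir` and `s3://target_bucket` are the placeholder paths from the shell example above, and `sync_dir`/`sync_all` are invented names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def sync_dir(d, bucket="s3://target_bucket", s3cmd="s3cmd"):
    # One s3cmd sync per top-level directory, as in the shell loop above.
    cmd = [s3cmd, "-v", "sync", f"/target_dir/{d}/", f"{bucket}/{d}/"]
    proc = subprocess.run(cmd, capture_output=True)
    return d, proc.returncode

def sync_all(dirs, jobs=4, s3cmd="s3cmd"):
    # Run up to `jobs` syncs at once instead of backgrounding with `&`.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(lambda d: sync_dir(d, s3cmd=s3cmd), dirs))
```

The thread pool caps concurrency (the shell version fires everything at once, throttled only by `sleep 60`), and the returned exit codes replace the grep-the-log error check.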


Do we still not have this feature? Downloading 20,000 files takes a very long time without parallel capability.
