A starting point could be this nice fork by pcorliss, who added parallel workers:
In my case, this speeds up my uploads by about 50x.
My use case is uploading 20,000,000 small files across several S3 locations (Pearltrees.com assets).
I also forked pcorliss's modifications to add parallel workers to the cp and mv commands:
As you are currently working on the 1.1 release, and even though the code has diverged from the fork, it would be very nice to resolve the conflicts and merge this great fork.
Just moved my fork to:
I'd like to see this in s3cmd master. Hosting a static website with lots of small files (.css, .js) on S3 is a common use case. So +1.
+1, would love to have this when uploading lots of files to s3
This would be really helpful in my work as a system administrator.
Is this feature on the roadmap? Our uploads are very slow due to a large number of files. I can't get Pearltrees' fork to work reliably. What about a bounty to finish this?
Also add my vote for some version of this. It'd be great to have parallelization. In the meantime, we're going to have to use Pearltrees' fork.
I took a quick look at this. We would need to extend the (new) connection reuse (ConnMan) that's in 1.5.0-alpha3 to be able to have multiple connections per endpoint, one per worker thread. I don't see that in ConnMan right now. So this isn't a trivial port of the existing work. For sync, the _upload() code path should be easily shardable across multiple parallel threads.
ConnMan was designed to support that exact scenario - multiple threads sharing a pool of connections to S3. So no problem there. But the rewrite of the core to support threads is a big undertaking to make it right (indeed a quick hack to parallelise this or that code path may be easier but not quite what we want). Will see after 1.5.0 release if I can revive some old work done in this space.
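To make the pool-sharing idea above concrete, here is a minimal Python sketch of worker threads checking connections out of a bounded per-endpoint pool. The names (ConnPool, upload_worker) and the connection interface are illustrative assumptions, not s3cmd's actual ConnMan API:

```python
# Illustrative sketch only: a bounded pool of reusable connections for one
# endpoint, shared by several upload worker threads. Not s3cmd code.
import queue
import threading

class ConnPool:
    """Holds up to `size` connections to one endpoint; blocks when exhausted."""
    def __init__(self, endpoint, size, connect):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect(endpoint))

    def acquire(self):
        return self._pool.get()      # blocks if all connections are in use

    def release(self, conn):
        self._pool.put(conn)

def upload_worker(pool, files, results, lock):
    """Drain the shared work queue, borrowing a connection per upload."""
    while True:
        try:
            f = files.get_nowait()
        except queue.Empty:
            return
        conn = pool.acquire()
        try:
            conn.put(f)              # stand-in for the real _upload() call
            with lock:
                results.append(f)
        finally:
            pool.release(conn)
```

With a pool smaller than the worker count, threads naturally throttle on acquire(), which is roughly the behaviour you'd want from a shared ConnMan.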
It would be great to see this implemented in a future release. It would really speed up file transfers to S3 in my scenario.
+1, I've been using pcorliss/s3cmd-modification for many years, but it hasn't been merged into master here since about 2010. We sync incrementals to S3 in nightly batches, and being able to use --parallel --workers=n makes that possible without huge waits. I'd love to have that functionality on top of all the recent improvements.
+1 for this :)
What I have been doing is something like:
for d in *; do
    # If there were errors on a previous run, make some noise
    egrep 'ERROR' /tmp/sync-$d.log
    # Test the lock before truncating the log file
    /usr/bin/flock -n /tmp/sync-$d true && /usr/bin/flock -n /tmp/sync-$d s3cmd -v ..args.. sync /target_dir/$d/ s3://target_bucket/$d/ > /tmp/sync-$d.log 2>&1 &
done
Something like that can run in a fairly tight cron loop, and it will at least parallelize across the top-level directories. The flock calls ensure a given directory is only being synced by one process at a time, and the grep notifies you of errors encountered on a previous run. Crude, but hopefully effective.
Of course, if I could do without a shell loop .. :)
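For what it's worth, the same idea can be done without a shell loop. Here is a rough Python sketch: one s3cmd process per top-level directory, a few running in parallel. The sync_tree/sync_dir helpers, paths, and bucket name are mine, not anything s3cmd provides:

```python
# Hypothetical sketch: run one `s3cmd sync` per top-level directory,
# a few at a time, instead of backgrounding them from a shell loop.
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def sync_dir(d, bucket, cmd="s3cmd"):
    """Sync one directory to the bucket; returns (name, exit code)."""
    rc = subprocess.call([cmd, "sync", f"{d}/", f"s3://{bucket}/{d.name}/"])
    return d.name, rc

def sync_tree(root, bucket, workers=4, cmd="s3cmd"):
    """Sync every subdirectory of root, up to `workers` s3cmd runs at once."""
    dirs = [d for d in Path(root).iterdir() if d.is_dir()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda d: sync_dir(d, bucket, cmd), dirs))

# Usage: report directories whose sync exited non-zero, e.g.
#   failed = {k: rc for k, rc in sync_tree("/target_dir", "target_bucket").items() if rc}
```

Threads are fine here despite the GIL, since each one just waits on a child s3cmd process.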
Do we still not have this feature? Downloading 20,000 files takes a very long time because there is no parallel capability.