
s3cmd du can gobble gigs of RAM on a bucket with millions of objects.

Reworked 'du' to traverse the bucket structure level by level and store only the running sum at each level, allowing Python to free the listing for each folder. Went from 4GB of consumption (and being killed) on a test bucket with ~50M objects in thousands of directories to a maximum of about 80MB.
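
A minimal sketch of the traversal idea described above, independent of the s3cmd internals. list_one_level is a hypothetical stand-in for a non-recursive, delimiter-style S3 listing call; all names here are illustrative, not part of the actual patch:

    # Keep a stack of prefixes ("folders"), list one level at a time, add object
    # sizes to a running total, and queue sub-prefixes for later. Only one
    # listing response is held at any point, so memory no longer grows with the
    # total number of objects in the bucket.
    def du_iterative(list_one_level, root_prefix=""):
        total_bytes = 0
        pending = [root_prefix]
        while pending:
            prefix = pending.pop()
            sizes, sub_prefixes = list_one_level(prefix)  # one "folder" of results
            total_bytes += sum(int(s) for s in sizes)     # sum objects at this level
            pending.extend(sub_prefixes)                  # descend into sub-folders later
        return total_bytes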
commit 444ee63c960ab5fcfe849d6b55b16c269e4864e3 · 1 parent 19a529a · @manos committed Jul 11, 2012
Showing with 23 additions and 11 deletions.
+23 −11 s3cmd
@@ -65,18 +65,30 @@ def subcmd_bucket_usage(s3, uri):
     if object.endswith('*'):
         object = object[:-1]
-    try:
-        response = s3.bucket_list(bucket, prefix = object, recursive = True)
-    except S3Error, e:
-        if S3.codes.has_key(e.info["Code"]):
-            error(S3.codes[e.info["Code"]] % bucket)
-            return
-        else:
-            raise
+
     bucket_size = 0
-    for object in response["list"]:
-        size, size_coeff = formatSize(object["Size"], False)
-        bucket_size += size
+    # iterate and store directories to traverse, while summing objects:
+    dirs = [object]
+    while dirs:
+        try:
+            response = s3.bucket_list(bucket, prefix=dirs.pop())
+        except S3Error, e:
+            if S3.codes.has_key(e.info["Code"]):
+                error(S3.codes[e.info["Code"]] % bucket)
+                return
+            else:
+                raise
+
+        # objects in the current scope:
+        for obj in response["list"]:
+            if len(response['list']) < 1:
+                break
+            bucket_size += int(obj["Size"])
+
+        # directories found in current scope:
+        for obj in response["common_prefixes"]:
+            dirs.append(obj["Prefix"])
+
     total_size, size_coeff = formatSize(bucket_size, Config().human_readable_sizes)
     total_size_str = str(total_size) + size_coeff
     output(u"%s %s" % (total_size_str.ljust(8), uri))
