Like `du` but for S3

s3_disk_util

  • What -- A tool that allows a user to visualize which buckets (and parts of buckets) are using the most data storage.
  • Why -- Because I'm trying to pare down my S3 bill, and the S3 control panels (even CloudWatch) do not really provide anything similar.
  • Inspiration -- This script is meant to be like the du tool for Linux, except that it inspects the disk usage of S3 buckets.
  • How -- It traverses S3 buckets and prints high-level disk usage information to stdout.
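The traversal idea can be sketched in a few lines. This is a minimal illustration, not the code in du.py: the function name `usage_by_prefix` and the sample keys are hypothetical, and the input is assumed to be a list of (key, size) pairs. With boto3, such pairs could be gathered from the `Key` and `Size` fields returned by the `list_objects_v2` paginator.

```python
from collections import defaultdict


def usage_by_prefix(objects, depth):
    """Sum object sizes under every '/'-delimited prefix, up to `depth` levels.

    `objects` is an iterable of (key, size_in_bytes) pairs (hypothetical input).
    """
    totals = defaultdict(int)
    for key, size in objects:
        totals["/"] += size  # every object counts toward the bucket root
        parts = key.split("/")[:-1]  # directory components only, object name dropped
        for level in range(1, min(depth, len(parts)) + 1):
            totals["/".join(parts[:level]) + "/"] += size
    return dict(totals)


# Made-up keys and sizes, for illustration only.
sample = [
    ("logs/2019/a.log", 100),
    ("logs/2020/b.log", 50),
    ("img/c.png", 7),
    ("top.txt", 3),
]
print(usage_by_prefix(sample, depth=1))
```

Because S3 has no real directories, every object has to be listed to compute these totals, which is why large buckets take a while.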

Usage

kevinowocki@local /Users/kevinowocki/Desktop/s3_disk_util~ % python3 du.py --help
usage: du.py [-h] [-b BUCKET] [-p PROFILE] [-d DEPTH] [-di DIR]

This script is meant to be like the `du` tool for linux, except for inspecting
the disk usage of s3 buckets. It will traverse s3 buckets and provide high
level disk usage information to stdout.

optional arguments:
  -h, --help            show this help message and exit
  -b BUCKET, --bucket BUCKET
                        Bucket to examine (ex: 'com.owocki.assets')
  -p PROFILE, --profile PROFILE
                        AWS credentials profile name (default: 'default')
  -d DEPTH, --depth DEPTH
                        Depth to examine bucket (ex: 4)
  -di DIR, --dir DIR    Directory to examine (ex: 'logs/')
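
The help text above has the shape of argparse output. A parser along the following lines would accept the same flags; it is a reconstruction from the help text, not the actual contents of du.py. The `type=int` on `--depth` and the shortened description are assumptions.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags listed in the help text above."""
    parser = argparse.ArgumentParser(
        prog="du.py",
        description="Like `du`, but for S3 buckets.",  # shortened; assumption
    )
    parser.add_argument("-b", "--bucket",
                        help="Bucket to examine (ex: 'com.owocki.assets')")
    parser.add_argument("-p", "--profile", default="default",
                        help="AWS credentials profile name (default: 'default')")
    parser.add_argument("-d", "--depth", type=int,  # int conversion is an assumption
                        help="Depth to examine bucket (ex: 4)")
    parser.add_argument("-di", "--dir",
                        help="Directory to examine (ex: 'logs/')")
    return parser


args = build_parser().parse_args(
    ["--depth=1", "--bucket=BUCKETNAME", "--profile=mytestaccount"]
)
print(args.bucket, args.profile, args.depth, args.dir)
```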

Example

kevinowocki@local /Users/kevinowocki/Desktop/s3_disk_utils~ % python3 du.py --depth=1 --bucket=BUCKETNAME --profile=mytestaccount
BUCKETNAME
(Cloudwatch bucket size estimate: 22.7GiB)
/ : 22.7GiB
- DIR1/ : 22.6GiB
- DIR2/ : 452.6KiB
- DIR3/ : 1.6MiB
- DIR4/ : 119.0MiB
- DIR5/ : 0.0B

kevinowocki@local /Users/kevinowocki/Desktop/s3_disk_util~ % python3 du.py --depth=2 --bucket=BUCKETNAME
BUCKETNAME
(Cloudwatch bucket size estimate: 22.7GiB)
/ : 22.7GiB
- DIR1/ : 22.6GiB
-- DIR1/SUBDIR1/ : 31.1MiB
-- DIR1/SUBDIR2/ : 12.7GiB
-- DIR1/SUBDIR3/ : 0.0B
-- DIR1/SUBDIR4/ : 9.9GiB
- DIR2/ : 452.6KiB
-- DIR2/SUBDIR1/ : 429.5KiB
- DIR3/ : 1.6MiB
-- DIR3/SUBDIR1/ : 254.4KiB
- DIR4/ : 119.0MiB
- DIR5/ : 0.0B
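
The sizes in these listings use binary units (KiB, MiB, GiB) with one decimal place. A formatter that produces strings in the same style might look like this; `human_size` is a hypothetical name, and the helper in helpers.py may differ.

```python
def human_size(num_bytes):
    """Format a byte count with 1024-based units and one decimal place."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024.0:
            return f"{size:.1f}{unit}"
        size /= 1024.0
    return f"{size:.1f}PiB"


print(human_size(0))                  # 0.0B
print(human_size(119 * 1024 * 1024))  # 119.0MiB
```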

Setup

  1. Create an AWS IAM user at https://console.aws.amazon.com/iam/home.
    • Make sure your user has AmazonS3FullAccess and CloudWatchReadOnlyAccess policies.
  2. Use your existing ~/.aws/credentials file and profile names, or create a credentials file that looks like this:
kevinowocki@local /Users/kevinowocki/Desktop/s3_disk_utils~ % cat ~/.aws/credentials
[default]
aws_access_key_id = ACCESS_KEY_GOES_HERE
aws_secret_access_key = SECRET_KEY_GOES_HERE
region=REGION
  3. Clone this repo.
  4. Install python3 (if needed) and boto3 (if needed).
    • To install python3, follow the instructions for your OS; they differ by platform. Here are some instructions for Mac OS X
    • To install boto3, run pip install -r requirements.txt
  5. Run du.py with the usage described above.
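
Before running du.py with --profile, it can help to confirm that the profile name actually exists in the credentials file. The file is INI-style, so the standard library can read it. This is a small sketch: `list_profiles` is a hypothetical helper, and the key values in the second profile are placeholders invented for the example.

```python
import configparser


def list_profiles(credentials_text):
    """Return the profile names defined in INI-style AWS credentials text."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    return parser.sections()


# Example file contents; the second profile's values are made-up placeholders.
example = """\
[default]
aws_access_key_id = ACCESS_KEY_GOES_HERE
aws_secret_access_key = SECRET_KEY_GOES_HERE
region=REGION

[mytestaccount]
aws_access_key_id = ANOTHER_ACCESS_KEY
aws_secret_access_key = ANOTHER_SECRET_KEY
"""
print(list_profiles(example))
```

To check the real file, pass in the text of ~/.aws/credentials, for example `(pathlib.Path.home() / ".aws" / "credentials").read_text()`.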

What else

This script can run a little slowly on larger buckets. That's okay; it is a limitation inherent to the way this information is provided via the AWS APIs. Pipe du.py's output to a file (perhaps inside of a screen or tmux session) and come back later.
