Loads AWS detailed billing reports (DBRs) into a Redshift cluster.

Still a manual process for now:

- Create a Redshift cluster.
- Once the cluster is alive, connect with your favorite Postgres client and create the `line_items` and `month_to_date` tables. The SQL for creating each is in the `sql/` subdirectory.
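The exact table definitions live in `sql/`. Purely as an illustration (this is not the real schema; the columns here are hypothetical stand-ins for a subset of the standard DBR CSV fields), a `line_items` table might look something like:

```sql
-- Illustrative sketch only: see the sql/ subdirectory for the real definitions.
-- Column names are stand-ins for a subset of the standard DBR CSV fields.
CREATE TABLE line_items (
  invoice_id         TEXT,
  payer_account_id   TEXT,
  linked_account_id  TEXT,
  product_name       TEXT,
  usage_type         TEXT,
  usage_start_date   TIMESTAMP,
  usage_end_date     TIMESTAMP,
  usage_quantity     FLOAT8,
  blended_rate       TEXT,
  blended_cost       FLOAT8,
  statement_month    DATE  -- populated during finalized imports (see below)
);
```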
Set these environment variables. Some of them may be overridden at runtime with command-line switches; run the relevant script with `--help` for more details.
- `SOURCE_BUCKET`: the S3 bucket where DBRs are deposited by Amazon.
- `STAGING_BUCKET`: the S3 bucket into which pre-processed DBRs are staged before importing to Redshift.
- `AWS_KEY` or `SOURCE_AWS_KEY` and `STAGING_AWS_KEY`: the AWS access key ID credential for accessing S3. If the same credentials are used for both the source and staging buckets, you can just set `AWS_KEY`. If separate credentials are necessary, you can specify `SOURCE_AWS_KEY` and `STAGING_AWS_KEY` instead.
- `AWS_SECRET` or `SOURCE_AWS_SECRET` and `STAGING_AWS_SECRET`: same as `AWS_KEY`, but for your AWS access key secret.
- `REDSHIFT_URI`: a connection URI for Redshift. It should include credentials, in the form `postgres://myUser:s0mep4ssword@hostname:port/dbname`.
- `ROLLBAR_TOKEN`: a token for error reporting to Rollbar.
- `ROLLBAR_ENVIRONMENT`: an environment name for error reporting to Rollbar.
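For local runs, you might set these in your shell before invoking a script (the values below are placeholders, not real credentials); on Heroku, the equivalent is `heroku config:set`:

```sh
# Placeholder values only; substitute your own buckets and credentials.
export SOURCE_BUCKET=my-dbr-bucket
export STAGING_BUCKET=my-staging-bucket
export AWS_KEY=AKIAXXXXXXXXXXXXXXXX
export AWS_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
export REDSHIFT_URI=postgres://myUser:s0mep4ssword@hostname:5439/dbname
export ROLLBAR_TOKEN=xxxxxxxxxxxxxxxx
export ROLLBAR_ENVIRONMENT=production
```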
There are two scripts: `import_finalized.js` and `import_month_to_date.js`. Both are intended to be run on a daily schedule, preferably at night. Run duration depends largely on the size of your DBRs; for large DBRs, runs of a few hours are common.

Invoke either with `--help` for usage instructions.
`import_finalized.js` imports "finalized" DBRs: specifically, the DBR for the previous month, according to UTC.

The script first checks whether there is a finalized DBR that hasn't been imported yet. If there is no new finalized DBR, the script terminates immediately. Once a month, when a new finalized DBR appears, the script downloads, unzips, gzips, stages, and imports it into a temporary table named `staging_YYYY_MM`. Once that process is complete, it adds a `statement_month` column containing the relevant month, copies the entire staging table into `line_items`, drops the staging table, and `VACUUM`s the `line_items` table.
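In Redshift terms, the post-staging steps amount to something like this (a sketch, not the script's actual SQL; the month, bucket, key, and credential values are placeholders):

```sql
-- Sketch of a finalized import for March 2015 (all names assumed).
-- Creating the staging table with the DBR's columns is omitted here.
COPY staging_2015_03 FROM 's3://my-staging-bucket/finalized-2015-03.csv.gz'
  CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
  GZIP CSV IGNOREHEADER 1;

-- Tag every staged row with its statement month.
ALTER TABLE staging_2015_03 ADD COLUMN statement_month DATE DEFAULT '2015-03-01';

-- Fold the staged month into the main table, then clean up.
INSERT INTO line_items SELECT * FROM staging_2015_03;
DROP TABLE staging_2015_03;
VACUUM line_items;
```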
`import_month_to_date.js` imports "month-to-date" DBRs, which contain estimated billing data that is not 100% accurate. On every import, the current month's DBR is downloaded, unzipped, gzipped, and staged. The `month_to_date` table is emptied via `TRUNCATE` (eliminating the need for an interim `VACUUM`), and the staged DBR is imported, followed by a `VACUUM`.
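In the same sketch form (table, bucket, and key names are placeholders, not taken from the script):

```sql
-- Sketch of the month-to-date refresh (names assumed).
-- TRUNCATE both empties the table and reclaims its space immediately,
-- which is why no VACUUM is needed before the load.
TRUNCATE month_to_date;

COPY month_to_date FROM 's3://my-staging-bucket/month-to-date.csv.gz'
  CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
  GZIP CSV IGNOREHEADER 1;

VACUUM month_to_date;
```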
You can run these scripts on your local machine, but unless you're very near the AWS datacenters that house your source and staging S3 buckets, you'll get better performance running them on Heroku. Use PX dynos to invoke either script; smaller dyno types lack the memory and storage to get the job done.
Here's a sample invocation:

```sh
heroku run -s PX "iojs import_finalized.js"
```

If you want to run it without fear of laptop disconnections, you can run the process in detached mode:

```sh
heroku run:detached -s PX "iojs import_finalized.js"
```

You can track progress by running `heroku logs -t`.
Still to do:

- One-off month imports
- Heroku button!
License: MIT. See LICENSE.txt.
Questions? Comments? Hit up tools@heroku.com.