Refactor with restore capabilities

This update refactors the frontend CLI and breaks the program into four
commands:

* dump - Dumps an entire DynamoDB table to file or S3
* load - Loads a previous dump from file or S3 to a DynamoDB table
* info - Displays metadata of a dump stored in S3
* delete - Deletes a dump from S3

The dump format has been changed to the canonical JSON format for
DynamoDB data used by the DynamoDB API.

Dumps stored in S3 are gzipped and broken into individually numbered 50 MB
parts sharing a common object name prefix. The load command reassembles
all parts.
Commit 2e59ae52175527027d729f8a05047e64b394bab6 (1 parent: 48d22c7), committed by @gwatts on May 14, 2016
@@ -1,76 +1,153 @@
-# AWS DynamoDB Table Dump
+# AWS DynamoDB Table Dump and Restore
[![Build Status](https://travis-ci.org/gwatts/dyndump.svg?branch=master)](https://travis-ci.org/gwatts/dyndump)
This utility performs a full scan of an AWS DynamoDB table and outputs each
-row as a JSON object.
+row as a JSON object, storing the dump either on disk or in an S3 bucket.
-It supports rate-limiting to a specified average read capacity and parallel
-requests to achieve high throughput.
+It supports rate-limiting to a specified average read or write capacity and
+parallel requests to achieve high throughput.
The underlying Go library can also be imported into other projects to provide
-scan facilities.
+dump/load facilities.
## Download
Binaries are available for Linux, Solaris, Windows and Mac for the
[latest release](https://github.com/gwatts/dyndump/releases).
+## Compile
+
+[Install Go](https://golang.org/doc/install) and run
+`go get github.com/gwatts/dyndump`.
+
## Utility Usage
+AWS credentials required to connect to DynamoDB can be passed in using
+environment variables; otherwise they are loaded from ~/.aws/credentials or EC2 metadata.
+
+* `AWS_REGION`
+* `AWS_ACCESS_KEY_ID`
+* `AWS_SECRET_ACCESS_KEY`
+
+The dyndump program supports four commands:
+
+### Dump
+
+Dumps an entire DynamoDB table to file or an S3 bucket.
```
-Usage of dyndump:
- -consistent-read
- Enable consistent reads (at 2x capacity use)
- -maxitems int
- Maximum number of items to read. Set to 0 to read all items
- -parallel int
- Number of concurrent channels to open to DynamoDB (default 4)
- -read-capacity int
- Average aggregate read capacity to use for scan (set to 0 for unlimited) (default 5)
- -region string
- AWS Region (default "us-west-2")
- -silent
- Don't print progress to stderr
- -string-nums
- Output numbers as exact value strings instead of converting
- -tablename string
- DynamoDB table name to dump
- -target string
- Filename to write data to. Defaults to stdout (default "-")
- -typed
- Include type names in JSON output (default true)
+Usage: dyndump dump [--silent] [--no-progress] [-cmpr] [--filename | --stdout] [(--s3-bucket --s3-prefix)] TABLENAME
+
+Dump a table to file or S3
+
+Arguments:
+ TABLENAME="" Table name to dump from Dynamo
+
+Options:
+  -c, --consistent-read=false   Enable consistent reads (at 2x capacity use)
+  -f, --filename=""             Filename to write data to.
+      --stdout=false            If true then send the output to stdout
+  -m, --maxitems=0              Maximum number of items to dump. Set to 0 to process all items
+  -p, --parallel=5              Number of concurrent channels to open to DynamoDB
+  -r, --read-capacity=5         Average aggregate read capacity to use for scan (set to 0 for unlimited)
+      --s3-bucket=""            S3 bucket name to upload to
+      --s3-prefix=""            Path prefix to use to store data in S3 (eg. "backups/2016-04-01-12:25-")
+      --silent=false            Set to true to disable all non-error output
+      --no-progress=false       Set to true to disable the progress bar
```
-By default the JSON output provides an array of objects, with keys for each
-column in the row of the table and values mapping to the values in the row.
-
-Numeric values are returned by DynamoDB as strings, but are converted to
-floats unless the -string-nums option is specified.
-
-Specifying -typed changes the values in each object to become an object
-with keys of "type" and "value" where "type" may be one of DynamoDB's
-supported types:
-* binary
-* binary-set
-* bool
-* list
-* map
-* number
-* number-set
-* null
-* string
-* string-set
+### Load
+
+Loads a previous dump from file or S3 into an existing DynamoDB table.
+
+```
+
+Usage: dyndump load [--silent] [--no-progress] [-mpw] (--filename | --stdin | (--s3-bucket --s3-prefix)) TABLENAME
+
+Load a table dump from S3 or file to a DynamoDB table
+
+Arguments:
+ TABLENAME="" Table name to load into
+
+Options:
+      --allow-overwrite=false   Set to true to overwrite any existing rows
+  -f, --filename=""             Filename to read data from. Set to "-" for stdin
+      --stdin=false             If true then read the dump data from stdin
+  -m, --maxitems=0              Maximum number of items to load. Set to 0 to process all items
+  -p, --parallel=4              Number of concurrent channels to open to DynamoDB
+  -w, --write-capacity=5        Average aggregate write capacity to use for load (set to 0 for unlimited)
+      --s3-bucket=""            S3 bucket name to read from
+      --s3-prefix=""            Path prefix to use to read data from S3 (eg. "backups/2016-04-01-12:25-")
+      --silent=false            Set to true to disable all non-error output
+      --no-progress=false       Set to true to disable the progress bar
+```
+
+### Info
+
+Retrieves and displays metadata about a dump stored in S3.
+
+```
+Usage: dyndump info --s3-bucket --s3-prefix
+
+Display backup metadata from an S3 backup
+
+Options:
+      --s3-bucket=""            S3 bucket name to read from
+      --s3-prefix=""            Path prefix to use to read data from S3 (eg. "backups/2016-04-01-12:25-")
+```
+
+### Delete
+
+Deletes an entire dump from S3 matching a specified prefix.
+
+```
+
+Usage: dyndump delete [--silent] [--no-progress] --s3-bucket --s3-prefix [--force]
+
+Delete a backup from S3
+
+Options:
+      --s3-bucket=""            S3 bucket name to delete from
+      --s3-prefix=""            Path prefix to use to delete from S3 (eg. "backups/2016-04-01-12:25-")
+      --force=false             Set to true to disable the delete prompt
+      --silent=false            Set to true to disable all non-error output
+      --no-progress=false       Set to true to disable the progress bar
+```
+
+
+## Output Format
+
+JSON is emitted as a stream of objects, one per item, in the canonical format
+used by the DynamoDB API. Each object has a key for each field name with a
+value object holding the type and field value. For example:
+
+```
+ {
+ "string-field": {"S": "string value"},
+ "number-field": {"N": "123"}
+ }
+```
+
+The following types are defined by the DynamoDB API:
+
+```
+ * S - String
+ * N - Number (encoded in JSON as a string)
+ * B - Binary (a base64 encoded string)
+ * BOOL - Boolean
+ * NULL - Null
+ * SS - String set
+ * NS - Number set
+ * BS - Binary set
+ * L - List
+ * M - Map
+```
-AWS credentials required to connect to DynamoDB must be passed in using
-environment variables:
-* AWS_ACCESS_KEY_ID
-* AWS_SECRET_KEY
-## dyndump library
+## Library
See the [godoc documentation](https://godoc.org/github.com/gwatts/dyndump/dyndump)
-for the github.com/gwatts/dyndump/dyndump library to integrate scanning into
+for the github.com/gwatts/dyndump/dyndump library to integrate dump/load facilities into
your own projects.
@@ -0,0 +1,78 @@
+// Copyright 2016 Gareth Watts
+// Licensed under an MIT license
+// See the LICENSE file for details
+
+package main
+
+import (
+	"errors"
+	"fmt"
+	"io"
+
+	"github.com/Bowery/prompt"
+	"github.com/aws/aws-sdk-go/aws/session"
+	"github.com/aws/aws-sdk-go/service/s3"
+	"github.com/cheggaaa/pb"
+	"github.com/gwatts/dyndump/dyndump"
+)
+
+type deleter struct {
+	del *dyndump.S3Deleter
+
+	// options
+	force        *bool
+	s3BucketName *string
+	s3Prefix     *string
+}
+
+func (d *deleter) init() error {
+	del, err := dyndump.NewS3Deleter(s3.New(session.New()), *d.s3BucketName, *d.s3Prefix)
+	if err != nil {
+		return err
+	}
+	if !*d.force {
+		fmt.Printf("Delete backup of table %s from s3://%s/%s\n\n", del.Metadata().TableName, *d.s3BucketName, *d.s3Prefix)
+		ok, err := prompt.Ask("Are you sure you wish to delete the above backup")
+
+		if err != nil {
+			return fmt.Errorf("Could not prompt for confirmation (use --force to override): %v", err)
+		}
+		if !ok {
+			return errors.New("User rejected delete")
+		}
+	}
+
+	d.del = del
+	return nil
+}
+
+func (d *deleter) start(infoWriter io.Writer) (done chan error, err error) {
+	fmt.Fprintf(infoWriter, "Beginning s3 delete prefix=s3://%s/%s parts=%d\n",
+		*d.s3BucketName, *d.s3Prefix, d.del.Metadata().PartCount)
+
+	done = make(chan error)
+
+	go func() {
+		done <- d.del.Delete()
+	}()
+
+	return done, nil
+}
+
+func (d *deleter) newProgressBar() *pb.ProgressBar {
+	bar := pb.New64(d.del.Metadata().PartCount)
+	return bar
+}
+
+func (d *deleter) updateProgress(bar *pb.ProgressBar) {
+	bar.Set64(d.del.Completed())
+}
+
+func (d *deleter) abort() {
+	d.del.Abort()
+}
+
+func (d *deleter) printFinalStats(w io.Writer) {
+	fmt.Fprintf(w, "Deleted %d parts from s3://%s/%s\n",
+		d.del.Completed(), *d.s3BucketName, *d.s3Prefix)
+}