Cain is a backup and restore tool for Cassandra on Kubernetes. It is named after the DC Comics superhero Cassandra Cain.
Cain supports the following cloud storage services:
- AWS S3
- Minio S3
- Azure Blob Storage
- Google Cloud Storage
Cain is now an official part of the Helm incubator/cassandra chart!
Prerequisites:
- git
- dep
Download the latest release from the Releases page, or use it with a Docker image.
To build from source:
mkdir -p $GOPATH/src/github.com/maorfr && cd $_
git clone https://github.com/maorfr/cain.git && cd cain
make
Cain performs a backup in the following way:
- Back up the keyspace schema (using cqlsh).
- Get backup data using nodetool snapshot - it creates a snapshot of the keyspace in all Cassandra pods in the given namespace (according to selector).
- Copy the files in parallel to cloud storage using Skbn - it copies the files to the specified dst, under namespace/<cassandraClusterName>/keyspace/<keyspaceSchemaHash>/tag/.
- Clear all snapshots.
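The destination layout described above can be sketched as a small shell helper. This only illustrates the documented path scheme and is not Cain's actual code: backup_path is a hypothetical name, and the cluster name, schema hash and tag values are made up.

```shell
# Hypothetical helper, not part of Cain: compose the backup location
# from its documented parts.
# Layout: <dst>/<namespace>/<cluster name>/<keyspace>/<schema hash>/<tag>/
backup_path() (
  dst=${1%/}   # strip a trailing slash from dst, if any
  printf '%s/%s/%s/%s/%s/%s/\n' "$dst" "$2" "$3" "$4" "$5" "$6"
)

# Example values (made up for illustration)
backup_path s3://db-backup/cassandra default ring01 keyspace abc123 20180903091624
# prints s3://db-backup/cassandra/default/ring01/keyspace/abc123/20180903091624/
```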
$ cain backup --help
backup cassandra cluster to cloud storage
Usage:
cain backup [flags]
Flags:
-b, --buffer-size float in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
--cassandra-data-dir string cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
-c, --container string container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
--dst string destination to backup to. Example: s3://bucket/cassandra. Overrides $CAIN_DST
-k, --keyspace string keyspace to act on. Overrides $CAIN_KEYSPACE
-n, --namespace string namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
-p, --parallel int number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
-l, --selector string selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
Backup to AWS S3
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst s3://db-backup/cassandra
Backup to Azure Blob Storage
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst abs://my-account/db-backup-container/cassandra
Backup to Google Cloud Storage
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst gcs://db-backup/cassandra
Cain performs a restore in the following way:
- Restore schema if schema is specified.
- Truncate all tables in keyspace.
- Copy files from the specified src (under keyspace/<keyspaceSchemaHash>/tag/) - restore is only possible for the same keyspace schema.
- Load new data using nodetool refresh.
$ cain restore --help
restore cassandra cluster from cloud storage
Usage:
cain restore [flags]
Flags:
-b, --buffer-size float in memory buffer size (MB) to use for files copy (buffer per file). Overrides $CAIN_BUFFER_SIZE (default 6.75)
--cassandra-data-dir string cassandra data directory. Overrides $CAIN_CASSANDRA_DATA_DIR (default "/var/lib/cassandra/data")
-c, --container string container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
-k, --keyspace string keyspace to act on. Overrides $CAIN_KEYSPACE
-n, --namespace string namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
-p, --parallel int number of files to copy in parallel. set this flag to 0 for full parallelism. Overrides $CAIN_PARALLEL (default 1)
-s, --schema string schema version to restore (optional). Overrides $CAIN_SCHEMA
-l, --selector string selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
--src string source to restore from. Example: s3://bucket/cassandra/namespace/cluster-name. Overrides $CAIN_SRC
-t, --tag string tag to restore. Overrides $CAIN_TAG
--user-group string user and group who should own restored files. Overrides $CAIN_USER_GROUP (default "cassandra:cassandra")
Restore from AWS S3
cain restore \
--src s3://db-backup/cassandra/default/ring01 \
-n default \
-k keyspace \
-l release=cassandra \
-t 20180903091624
Restore from Azure Blob Storage
cain restore \
--src abs://my-account/db-backup-container/cassandra/default/ring01 \
-n default \
-k keyspace \
-l release=cassandra \
-t 20180903091624
Restore from Google Cloud Storage
cain restore \
--src gcs://db-backup/cassandra/default/ring01 \
-n default \
-k keyspace \
-l release=cassandra \
-t 20180903091624
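The --src values in these examples follow from the backup layout: the --dst used for the backup, then the namespace, then the cluster name. A minimal sketch of that relationship, assuming the path scheme described for backup. restore_src and restore_prefix are hypothetical helper names, and abc123 is a made-up schema hash.

```shell
# Hypothetical helpers, not part of Cain: show how a restore --src relates
# to the --dst that was used for the backup.
restore_src() (
  # <dst>/<namespace>/<cluster name>
  printf '%s/%s/%s\n' "${1%/}" "$2" "$3"
)

restore_prefix() (
  # Files are looked up under <src>/<keyspace>/<schema hash>/<tag>/
  printf '%s/%s/%s/%s/\n' "${1%/}" "$2" "$3" "$4"
)

src=$(restore_src s3://db-backup/cassandra default ring01)
echo "$src"   # prints s3://db-backup/cassandra/default/ring01
restore_prefix "$src" keyspace abc123 20180903091624
```

Because the schema hash is part of the path, a backup taken with a different keyspace schema sits under a different prefix and will not be found.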
Cain describes the keyspace schema using cqlsh. It can return the schema itself, or a checksum of the schema file (used by backup and restore).
$ cain schema --help
get schema of cassandra cluster
Usage:
cain schema [flags]
Flags:
-c, --container string container name to act on. Overrides $CAIN_CONTAINER (default "cassandra")
-k, --keyspace string keyspace to act on. Overrides $CAIN_KEYSPACE
-n, --namespace string namespace to find cassandra cluster. Overrides $CAIN_NAMESPACE (default "default")
-l, --selector string selector to filter on. Overrides $CAIN_SELECTOR (default "app=cassandra")
--sum print only checksum. Overrides $CAIN_SUM
cain schema \
-n default \
-l release=cassandra \
-k keyspace
cain schema \
-n default \
-l release=cassandra \
-k keyspace \
--sum
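To see why the checksum matters for backup and restore, here is a rough sketch of comparing two schema dumps by hash. It assumes SHA-256 via sha256sum (GNU coreutils); the algorithm Cain actually uses is not stated here and may differ. schema_sum and both schema strings are made up for illustration.

```shell
# Hypothetical stand-in for the checksum step, not Cain's implementation.
# Assumption: SHA-256 via sha256sum; Cain's real algorithm may differ.
schema_sum() (
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
)

# Made-up schema text for illustration
old="CREATE TABLE ks.users (id uuid PRIMARY KEY, name text);"
new="CREATE TABLE ks.users (id uuid PRIMARY KEY, name text, email text);"

if [ "$(schema_sum "$old")" = "$(schema_sum "$new")" ]; then
  echo "schemas match: the backup can be restored"
else
  echo "schemas differ: the backup belongs to a different schema hash"
fi
```

Any change to the schema text produces a different checksum, which is why a restore only finds backups taken with the same keyspace schema.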
Cain commands support the usage of environment variables instead of flags. For example, the backup command can be executed with flags, as shown earlier:
cain backup \
-n default \
-l release=cassandra \
-k keyspace \
--dst s3://db-backup/cassandra
You can also set the appropriate environment variables instead (CAIN_FLAG, with _ instead of -):
export CAIN_NAMESPACE=default
export CAIN_SELECTOR=release=cassandra
export CAIN_KEYSPACE=keyspace
export CAIN_DST=s3://db-backup/cassandra
cain backup
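The naming rule can be sketched as a tiny helper that turns a long flag into its environment variable name. flag_to_env is a hypothetical name used only for illustration, and the sketch handles long flags only (not short forms such as -n).

```shell
# Hypothetical helper, not part of Cain: derive the environment variable
# name for a long flag (CAIN_ prefix, upper case, "-" replaced by "_").
flag_to_env() (
  printf '%s\n' "${1#--}" \
    | tr '[:lower:]' '[:upper:]' \
    | sed 's/-/_/g; s/^/CAIN_/'
)

flag_to_env --buffer-size          # prints CAIN_BUFFER_SIZE
flag_to_env --cassandra-data-dir   # prints CAIN_CASSANDRA_DATA_DIR
```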
Since Cain uses Skbn, adding support for additional storage services is simple. Read this post for more information.
Cain version | Skbn version |
---|---|
0.6.0 | 0.5.0 |
0.5.1 | 0.4.2 |
0.5.0 | 0.4.1 |
0.4.2 | 0.4.1 |
0.4.1 | 0.4.1 |
0.4.0 | 0.4.0 |
0.3.0 | 0.3.0 |
0.2.0 | 0.2.0 |
0.1.0 | 0.1.1 |
Cain tries to get Kubernetes credentials in the following order:
- If the KUBECONFIG environment variable is set, Cain will use the current context from that config file.
- If ~/.kube/config exists, Cain will use the current context from that config file with an out-of-cluster client configuration.
- If ~/.kube/config does not exist, Cain will assume it is working from inside a pod and will use an in-cluster client configuration.
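The lookup order above can be mirrored in a few lines of shell, which may help when checking which configuration a given environment would end up with. This is only a sketch of the documented order, not Cain's code. kube_client_mode and the strings it prints are hypothetical.

```shell
# Hypothetical helper, not part of Cain: report which client configuration
# the documented lookup order would select in the current environment.
kube_client_mode() (
  if [ -n "${KUBECONFIG:-}" ]; then
    echo "kubeconfig-env"     # current context from the file in $KUBECONFIG
  elif [ -f "$HOME/.kube/config" ]; then
    echo "out-of-cluster"     # current context from ~/.kube/config
  else
    echo "in-cluster"         # assumed to be running inside a pod
  fi
)

kube_client_mode
```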
Cain uses the default AWS credentials chain.
Cain uses the AZURE_STORAGE_ACCOUNT and AZURE_STORAGE_ACCESS_KEY environment variables for authentication.
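A quick pre-flight check for these two variables can catch a missing setting before a backup or restore starts. The sketch below is an illustration only: check_azure_env is a hypothetical helper, not something Cain provides.

```shell
# Hypothetical helper, not part of Cain: verify that both Azure variables
# are set and non-empty. Runs in a subshell, so "exit" only ends the helper.
check_azure_env() (
  missing=0
  for var in AZURE_STORAGE_ACCOUNT AZURE_STORAGE_ACCESS_KEY; do
    eval "val=\${$var:-}"   # read the variable whose name is in $var
    if [ -z "$val" ]; then
      echo "missing environment variable: $var" >&2
      missing=1
    fi
  done
  exit "$missing"
)

check_azure_env || echo "set the Azure variables before using an abs:// path"
```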
Cain uses Google Application Default Credentials. It first looks for the GOOGLE_APPLICATION_CREDENTIALS environment variable. If it is not defined, it looks for the default service account, or throws an error if none is configured.