Backup tool based on a simple Content Addressed Datastore

Dumb Content-Addressed-Datastore

Dumbcas is mainly a backup tool.

The idea of this yet-another-backup-tool is that you can rsync the data around and merge multiple backups together with rsync without problems. In addition, a single-bit corruption only affects a single file. It is meant to be a backup solution that is faster than raw rsync (it handles file renames and moves efficiently because it is content-addressed) while still permitting deletion of old backups, unlike bup.

Dumbcas defines an on-disk CAS (Content-Addressed-Storage) that is somewhat inspired by git objects. It could use a remote CAS backend like camlistore, but that is not implemented.

The tool itself is really simple and is an exercise in design. For example, all the unit tests run in parallel, with test-case-local logs that are printed only when a test case fails. This was certainly challenging to implement for the subcommand support.

Installation

First install Go, then:

go get -u github.com/maruel/dumbcas
dumbcas help

Backup and serve over the web

# List all files or directories to archive.
# - One entry per line.
# - Environment variables are supported.
# - Can be absolute paths or relative to the toArchive file.
echo "${HOME}" > toArchive.txt
# Use >> to append; a second > would overwrite the first entry.
echo /random/path >> toArchive.txt

# Archive the files to /path/to/storage.
dumbcas archive -root=/path/to/storage -comment="My first backup" toArchive.txt

# Verify the archive. Checks that all the SHA-1 hashes are valid.
dumbcas fsck -root=/path/to/storage

# Serve over http://localhost:8010/
dumbcas web -root=/path/to/storage

You can set the $DUMBCAS_ROOT environment variable to provide a default value for -root.

Delete a backup set

rm /path/to/storage/nodes/<month>/<name>
dumbcas gc -root=/path/to/storage

As simple as that.
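
Conceptually, removing a node and then running gc is a mark-and-sweep: anything no remaining backup set refers to can go. The sketch below shows that idea on in-memory data; `sweep`, the `nodes` map, and the digest strings are hypothetical names for illustration, not dumbcas's internal structures.

```go
package main

import "sort"

// sweep returns the digests in stored that no remaining backup set references.
// nodes maps a hypothetical backup-set name to the digests it refers to.
func sweep(stored []string, nodes map[string][]string) []string {
	// Mark: collect every digest still reachable from a backup set.
	live := make(map[string]bool)
	for _, digests := range nodes {
		for _, d := range digests {
			live[d] = true
		}
	}
	// Sweep: anything stored but not marked can be deleted.
	var garbage []string
	for _, d := range stored {
		if !live[d] {
			garbage = append(garbage, d)
		}
	}
	sort.Strings(garbage)
	return garbage
}
```

Objects shared between backup sets survive as long as at least one set still references them, which is why deleting one set never damages another.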

Background

The tool is built on the premise that you set it up and forget about it. So it doesn't do inter-file compression or anything else that would make rsync or salvaging files from a broken drive harder.

The main use case is archiving non-compressible media (think family videos and images, music, etc) that is rarely changed.

Other properties include:

  • Different backups can be merged by rsyncing them onto each other.
  • Works on 32-bit platforms (like older Atom processors), so the code must not load too much into memory.
  • Doesn't use any C module to keep it simple and usable on Windows.
  • Incremental backups must be fast. It keeps a cache. No-op backups are <3s.
  • Native path-selective backup. I don't want to back up /usr/bin.
  • Must be able to delete old backups.

Non goals

  • Compression, especially inter-file compression. With compression, a corruption loses more data than necessary.
  • Special indexing support (like rolling checksums). It causes issues such as large file handling on 32-bit platforms.
  • Access control.
  • Storing metadata like the executable bit. You should back up the source code, not the executables!
  • Anything complex.