Chunkyard is a fast backup application for Windows and Linux which stores files in a content addressable storage with support for dynamic chunking and encryption.
The FastCDC chunking algorithm is a C# port of the Rust crate fastcdc-rs.
I am building Chunkyard for myself. You might want to consider more sophisticated tools. Here’s a list of options.
- Cross platform support. Chunkyard is shipped as two binaries
chunkyard
(Linux) andchunkyard.exe
(Windows) and they work without having to install .NET on your computer - Favor simplicity and readability over features and elaborate performance tricks
- Strong symmetric encryption (AES Galois/Counter Mode using a 256 bit key)
- Ability to copy from/to other repositories
- Verifiable backups
- No third-party dependencies
- Key management
- Asymmetric encryption
- Compression
- Extended file system features such as OS specific flags or links
- Extended “version control” features such as branching or tagging
- Hiding chunk sizes to prevent CDC fingerprint attacks
- Concurrent operations on a single repository using more than one Chunkyard process (e.g. creating a new backup while garbage collecting unused data)
- Install the .NET SDK
- Run
dotnet build src
anddotnet test src
to build and test Chunkyard
- Run
./publish
to create Linux and Windows binaries in theartifacts
directory - Run
git push --follow-tags
if you have created a new tag in the previous step
Type chunkyard help
to see a list of all available commands. You can add
--help
to any command to get more information about what parameters it
expects.
Example:
# List all available commands
chunkyard help
# Learn more about the store command
chunkyard store --help
# See which files chunkyard would backup
chunkyard store --repository 'MyBackup' --paths 'Music' 'Pictures' 'Videos' --preview
# Store a backup
chunkyard store --repository 'MyBackup' --paths 'Music' 'Pictures' 'Videos' --include '!Desktop\.ini' '!thumbs\.db'
# Check if the latest backup is valid
chunkyard check --repository 'MyBackup'
# Restore parts of the latest backup
chunkyard restore --repository 'MyBackup' --destination '.' --include 'mp3$'
# Keep the latest four backups
chunkyard keep --repository 'MyBackup' --latest '4'
- Blob: Binary data (e.g. the content of a file) with some meta data
- Snapshot: A set of BlobReferences. It describes the current state of a set of Blobs at a specific point in time
- Repository: A store which Chunkyard uses to persist data
- Chunk: An encrypted piece of a Blob or a Snapshot
- Chunk ID: A hash address which can be used to retrieve Chunks
- BlobReference: Contains Chunk IDs and meta data which can be used to restore a Blob
- SnapshotReference: Contains Chunk IDs and meta data which can be used to restore a Snapshot
These classes contain the most important logic:
- IRepository.cs: Defines the underlying backup storage
- IBlobSystem.cs: Provides an abstraction to read and write Blobs
- SnapshotStore.cs: Chunks, encrypts, deduplicates and stores Blobs in an IRepository
- Take a set of files
- Split files into encrypted chunks, store them in a repository and return a list of BlobReferences
- Bundle all BlobReferences into a Snapshot, store this Snapshot as encrypted chunks and return a SnapshotReference
- Retrieve a Snapshot using a SnapshotReference
- Retrieve, decrypt and reassemble all files using their BlobReferences of the given Snapshot