Merge branch 'release/v0.3.0'
proofrock committed Nov 3, 2021
2 parents e07ed31 + 9deb282 commit 177785b
Showing 42 changed files with 2,102 additions and 998 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
snapkup
bin/
test/
.idea/

# Created by https://www.toptal.com/developers/gitignore/api/go,visualstudiocode,macos,linux,windows
# Edit at https://www.toptal.com/developers/gitignore?templates=go,visualstudiocode,macos,linux,windows
134 changes: 90 additions & 44 deletions README.md
@@ -1,94 +1,140 @@
# 🗃️ snapkup v0.2.0
# 🗃️ snapkup v0.3.0

Snapkup is a simple backup tool that takes snapshots of your filesystem (or the parts that you'll decide), storing them efficiently and conveniently.
Snapkup is a simple backup tool that takes snapshots of your filesystem (or the parts that you'll decide), storing them
efficiently and conveniently.

## Basic workflow

Snapkup's goal is to store efficiently one or more filesystem's situation at given points in time, in a manner that is friendly to e.g. Dropbox sync or removable storage.
The basic flow is:

- You initialize an empty directory that will store the backups
- You register one or more backup roots, directory or files that will be snapshotted
- You take one or more snapshots. Snapkup lists all the tree for those roots, taking a snapshot of the contents
- All the files in the roots are deduplicated, and only the files that are different are stored
- It's possible to compress the files, using `zstd -9`
- Files are stored in an efficient manner, with a shallow directory structure.
- You can restore the situation of the roots at a given snapshot, later on
- Files' and dirs' mode and modification time are preserved
- If you choose to delete any snapshot, dangling backup files are removed.
- Of course, it's possible to list roots and snapshots and delete any of them.
- You take one or more snapshots. Snapkup lists all the tree for those roots, taking a copy of the contents
- You can restore the situation of the roots at any given snapshot
- Of course, it's possible to list roots and snapshots and delete any of them, and perform all the other admin ops

All paths are converted to absolute paths, for consistency.
Notable points:

- Files are deduplicated: only one copy of a file is stored, across the filesystem and all the snapshots
- Everything stored on-disk is encrypted, using `XChaCha20Poly1305`
- Checksums, using authenticated 128-bit `SipHash`, are used for deduplication and integrity checks
- By default, everything is compressed using `zstd -19`. Incompressible files are stored uncompressed.
- Small files can be merged in "agglos", to reduce the number of files and make it more sync-friendly (e.g. for Dropbox)
- Snapkup favors features and code readability over speed. It's not slow, though!
- All paths are converted to absolute paths, for consistency.
- Cross-platform portability of backup archives is not a priority, though it should reasonably work.

Plans for the future:

- Ability to produce all outputs as JSON, for better script-ability
- Ability to retrieve files from external filesystems, via SSH
- Ability to back up data that come from the execution of a command (e.g. `crontab -l`)
- FUSE-mount a snapshot

## Mini-tutorial

We will backup the contents of the `C:\MyImportantDir`, using the `C:\MySnapkupDir` folder as the container of the backup structures. This example is styled after windows, but it's completely similar under UNIXes.
We will back up the contents of `C:\MyImportantDir`, using the `C:\MySnapkupDir` folder as the container of the
backup structures. This example is styled after Windows, but it works the same way under UNIX-like systems.

**N.B.**: all the commands have shortcuts; e.g. `root add` can be `r a`. Read the help (`snapkup --help`)

**N.B.**: the UNIX-style command name is used (`snapkup`); of course, under Windows you can use `snapkup.exe`

### Set the encryption password

For now, it's read from an environment variable, `SNAPKUP_PASSWORD`, so you can use:

```
[UNIX/BASH] export SNAPKUP_PASSWORD=MyCoolPwd
[WIN/CMD] set "SNAPKUP_PASSWORD=MyCoolPwd"
[WIN/POWERSHELL] $env:SNAPKUP_PASSWORD = 'MyCoolPwd'
```

### Initialize the backup directory

`snapkup.exe -d C:\MySnapkupDir init`
You need to initialize a directory to store the backups to. It's specified with the `--backup-dir` or `-d` flags, and
this flag will need to be repeated for every command.

### Register the directory to backup as a root
`snapkup -d C:\MySnapkupDir init`

`snapkup.exe -d C:\MySnapkupDir root add C:\MyImportantDir`
Requires an empty directory. Creates a shallow dir structure to divide the files, and a `snapkup.dat` file (encrypted)
that will store all the data. Also, generates the encryption and checksum keys.

### Take your first snapshot
### Register the directory to back up as a root

`snapkup -d C:\MySnapkupDir root add C:\MyImportantDir`

Adds a directory as one of the paths to back up. All its contents will be recursively scanned when performing a snapshot
(see below).

`snapkup.exe -d C:\MySnapkupDir snap take`
It can also be a single file. The absolute path will be stored, to avoid ambiguities.

As many roots as you want can be stored; `root list` and `root del` are available to manage the list.

### Take your first snapshot

*Add `-z` if you want to compress the files being backed up. Add `-l` to specify a label.*
`snapkup -d C:\MySnapkupDir snap do`

`snapkup.exe -d C:\MySnapkupDir snap take -z -l "My first label"`
It walks the roots' filesystem trees, and hashes every file. It then compares the hashes with the files already stored,
and stores only those files that are not already seen.
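
The hash-then-compare step described above can be sketched as follows. This is a minimal illustration, not Snapkup's code: `blobStore`, `hashContent` and `store` are hypothetical names, the store is an in-memory map, and SHA-256 is swapped in only because it ships with the Go standard library (the README states that Snapkup itself uses SipHash).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// blobStore is a hypothetical in-memory stand-in for the backup directory:
// it maps a content hash to the stored bytes.
type blobStore map[string][]byte

// hashContent returns a hex digest of data. SHA-256 is an assumption of this
// sketch; the README says Snapkup uses SipHash.
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// store saves data only if no blob with the same hash is already present,
// and reports whether a new blob was written.
func (s blobStore) store(data []byte) (hash string, added bool) {
	hash = hashContent(data)
	if _, seen := s[hash]; seen {
		return hash, false
	}
	s[hash] = append([]byte(nil), data...)
	return hash, true
}

func main() {
	s := blobStore{}
	files := []struct{ path, content string }{
		{"/a/one.txt", "hello"},
		{"/a/two.txt", "world"},
		{"/b/copy-of-one.txt", "hello"}, // same content as /a/one.txt
	}
	for _, f := range files {
		hash, added := s.store([]byte(f.content))
		fmt.Printf("%s -> %s (new blob: %v)\n", f.path, hash[:8], added)
	}
	fmt.Printf("%d files, %d blobs stored\n", len(files), len(s))
}
```

Two files with identical content map to the same key, so the second one costs no extra storage; only its path and metadata would need to be recorded for the snapshot.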

*Alias: `snap do`*
`snapkup -d C:\MySnapkupDir snap do -l "My first label"`

### Change the label of a snap
All (unique) files are stored as data "blobs", that are compressed (unless `--no-compress` is specified), encrypted and
protected with a checksum.

`snapkup.exe -d C:\MySnapkupDir snap label 0 "My First Label"`
Metadata (path, mod time, access mode) of files and dirs is preserved for each snapshot.

*Alias: `snap lbl`*
A snap ID is returned, that can be used for a variety of operations: `snap label`, `snap list`, `snap filelist`,
`snap info` and of course `snap del`.

### Get info on a snapshot
Removing a snapshot with `snap del` removes all the orphaned blobs, freeing disk space.

`snapkup.exe -d C:\MySnapkupDir snap info 0`
### Merge small files

*gives info like: number of files, number of dirs, size, and how much space on backup filesystem will be freed if this snap is deleted.*
When having a multitude of small files is not desirable, e.g. in a remote sync scenario, it's possible to merge files
in an "agglo". You can specify the threshold size of the files to merge and the target size of the agglo, in megabytes.

### Get the file list on a snapshot
`agglo calc` allows you to evaluate the number of files that will be merged, and the result.

`snapkup.exe -d C:\MySnapkupDir snap filelist 0`
`snapkup -d C:\MySnapkupDir agglo calc 1 5`

*prints a list of the directories and files for a snap.*
This evaluates merging all the files up to 1 MB into agglos of (about) 5 MB. Use `agglo do` with the same parameters to
actually perform the merge.
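
One way to picture the threshold/target parameters is a simple greedy grouping, sketched below. This is an assumption for illustration only: the diff does not show how `agglos.Plan` actually chooses its groups, and `blob` and `plan` are hypothetical names.

```go
package main

import "fmt"

const mb = 1 << 20

// blob is a hypothetical record: an ID and a stored size in bytes.
type blob struct {
	id   string
	size int64
}

// plan groups blobs no larger than threshold into agglos of roughly target
// bytes. The greedy strategy is an assumption of this sketch.
func plan(blobs []blob, threshold, target int64) [][]blob {
	var agglos [][]blob
	var current []blob
	var currentSize int64
	for _, b := range blobs {
		if b.size > threshold {
			continue // too large to be merged, stays a standalone file
		}
		current = append(current, b)
		currentSize += b.size
		if currentSize >= target {
			agglos = append(agglos, current)
			current, currentSize = nil, 0
		}
	}
	if len(current) > 0 {
		agglos = append(agglos, current) // leftover, smaller agglo
	}
	return agglos
}

func main() {
	var blobs []blob
	for i := 0; i < 7; i++ {
		blobs = append(blobs, blob{id: fmt.Sprintf("small-%d", i), size: 900 * 1024})
	}
	blobs = append(blobs, blob{id: "large", size: 3 * mb})

	agglos := plan(blobs, 1*mb, 5*mb)
	merged := 0
	for _, a := range agglos {
		merged += len(a)
	}
	fmt.Printf("%d files will be merged, resulting in %d agglo files.\n", merged, len(agglos))
}
```

With a 1 MB threshold and a 5 MB target, the 3 MB blob is left alone and the seven 900 KB blobs end up in two agglos.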

*Alias: `snap fl`*
`snapkup -d C:\MySnapkupDir agglo unpack`

### Delete it, because... just because.
Does the opposite, unmerging the files and removing the agglos.

`snapkup.exe -d C:\MySnapkupDir snap del 0`
**N.B.** when deleting a snapshot that references a blob inside an agglo, the agglo is not modified even if it was the
last reference, to avoid triggering a sync. To reclaim the space, run `agglo unpack` (the dangling files will not
be restored), then run `agglo do` again.

*Alias: `snap rm`*
### Restore it!

### Or restore it!
To restore all the roots for snapshot `0`:

`snapkup.exe -d C:\MySnapkupDir snap restore 0 C:\MyRestoreDir`
`snapkup -d C:\MySnapkupDir snap restore 0 C:\MyRestoreDir`

*the destination directory must be empty. It is also possible to specify a prefix path to select only a part of the file list:*
The destination directory must be empty.

`snapkup.exe -d C:\MySnapkupDir snap restore 0 C:\MyRestoreDir --prefix-path /foo/bar`
It is also possible to specify a prefix path to select only a part of the file list:

*Alias: `snap res`*
`snapkup -d C:\MySnapkupDir snap restore 0 C:\MyRestoreDir --prefix-path /foo/bar`
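
Selecting part of a file list by prefix could look like the sketch below. It is an illustration under stated assumptions: `underPrefix` and `selectPaths` are hypothetical helpers, and matching on a path boundary (so `/foo/bar` does not select `/foo/barbaz`) is a choice made here, not something the README confirms about `--prefix-path`.

```go
package main

import (
	"fmt"
	"strings"
)

// underPrefix reports whether path equals prefix or lies below it.
// Matching on a "/" boundary is an assumption of this sketch.
func underPrefix(path, prefix string) bool {
	prefix = strings.TrimSuffix(prefix, "/")
	return path == prefix || strings.HasPrefix(path, prefix+"/")
}

// selectPaths keeps only the entries of paths that fall under prefix.
func selectPaths(paths []string, prefix string) []string {
	var out []string
	for _, p := range paths {
		if underPrefix(p, prefix) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	snapshot := []string{
		"/foo/bar",
		"/foo/bar/a.txt",
		"/foo/barbaz/b.txt",
		"/other/c.txt",
	}
	for _, p := range selectPaths(snapshot, "/foo/bar") {
		fmt.Println(p)
	}
}
```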

## Status

Everything described above should work. **It's still at an early stage of development, so don't trust it with any critical data, yet**.
Everything described above should work. **It's still at an early stage of development, so don't trust it with any
critical data, yet**.

Next steps:

- Proper testing framework, for reliability
- Improved documentation
- Mounting a snapshot as a FUSE filesystem
- Proper cross-compiling
- Further unit testing
- Improve documentation
- Document the on-disk layout of files, for external review
- Better error handling
- Better recovery of the data structures from errors
- Better/more convenient handling of passwords

## Build

29 changes: 0 additions & 29 deletions docs/db.sql

This file was deleted.

49 changes: 0 additions & 49 deletions src/commands/add_root/add_root.go

This file was deleted.

20 changes: 20 additions & 0 deletions src/commands/agglo/calc.go
@@ -0,0 +1,20 @@
package agglo

import (
	"fmt"
	"github.com/proofrock/snapkup/model"
	"github.com/proofrock/snapkup/util/agglos"
)

func Calc(threshold, target int) func(modl *model.Model) error {
	return func(modl *model.Model) error {
		agglos, blobs, errPlanning := agglos.Plan(modl, int64(threshold), int64(target))
		if errPlanning != nil {
			return errPlanning
		}

		fmt.Printf("%d files will be merged, resulting in %d agglo files.\n", len(blobs), len(agglos))

		return nil
	}
}
28 changes: 28 additions & 0 deletions src/commands/agglo/do.go
@@ -0,0 +1,28 @@
package agglo

import (
	"fmt"
	"github.com/proofrock/snapkup/model"
	"github.com/proofrock/snapkup/util/agglos"
)

func Do(bkpDir string, threshold, target int) func(modl *model.Model) error {
	return func(modl *model.Model) error {
		aggloss, blobs, errPlanning := agglos.Plan(modl, int64(threshold), int64(target))
		if errPlanning != nil {
			return errPlanning
		}

		fmt.Printf("%d files will be merged, resulting in %d agglo files.\n", len(blobs), len(aggloss))
		println("Performing merge...")

		errApplying := agglos.Apply(modl, bkpDir, aggloss, blobs)
		if errApplying != nil {
			return errApplying
		}

		println("All done.")

		return nil
	}
}
19 changes: 19 additions & 0 deletions src/commands/agglo/unpack.go
@@ -0,0 +1,19 @@
package agglo

import (
	"github.com/proofrock/snapkup/model"
	"github.com/proofrock/snapkup/util/agglos"
)

func Unpack(bkpDir string) func(modl *model.Model) error {
	return func(modl *model.Model) error {
		errSplitting := agglos.SplitAll(modl, bkpDir)
		if errSplitting != nil {
			return errSplitting
		}

		println("All ok.")

		return nil
	}
}
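
`Calc`, `Do` and `Unpack` all return a `func(modl *model.Model) error`: the command-line arguments are captured first, and the model is supplied later. The sketch below shows how such closures might be driven. It is not taken from the commit: `Model`, `command`, `addRoot` and `run` are hypothetical stand-ins, and the diff does not show how snapkup's real dispatcher loads or saves the model.

```go
package main

import (
	"errors"
	"fmt"
)

// Model is a hypothetical stand-in for snapkup's model.Model.
type Model struct {
	Roots []string
}

// command has the same shape as the functions returned by Calc, Do and Unpack.
type command func(modl *Model) error

// addRoot is a hypothetical command factory: its argument is captured now,
// and the model is provided when the command runs.
func addRoot(path string) command {
	return func(modl *Model) error {
		for _, r := range modl.Roots {
			if r == path {
				return errors.New("root already present: " + path)
			}
		}
		modl.Roots = append(modl.Roots, path)
		return nil
	}
}

// run is a hypothetical dispatcher. A real tool would presumably load the
// model before calling cmd and persist it only when cmd succeeds.
func run(modl *Model, cmd command) error {
	if err := cmd(modl); err != nil {
		return fmt.Errorf("command failed: %w", err)
	}
	return nil
}

func main() {
	modl := &Model{}
	fmt.Println(run(modl, addRoot("/home/me/docs")))
	fmt.Println(run(modl, addRoot("/home/me/docs"))) // second add is rejected
	fmt.Println(modl.Roots)
}
```

A benefit of this shape is that loading, saving and error reporting can live in one place, while each command only describes what it does to the model.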
33 changes: 0 additions & 33 deletions src/commands/del_root/del_root.go

This file was deleted.
