
Windows 32/64 binary, 64-bit HW accelerated

Pre-release
@fcorbelli fcorbelli released this 08 Dec 17:39
e7d6f9e

Faster overall

About 5% faster in the average case, thanks to a "smarter" computation of SHA-1 on fragments

New switch -dataset (ZFS, Unix only)

This uses the ZFS filesystem to automagically update the archive without scanning the filesystem

zpaqfranz a /tmp/test.zpaq * -dataset "tank/d"

Taking point-in-time copies (e.g., once every hour) normally requires scanning the entire filesystem.
zpaqfranz has long supported ZFS backups, but at the block level, not at the single-file level.
In other words: you can back up "everything" very quickly, but to restore "something" (a single file) you have to restore everything, then pick out the file you want.

On large file servers, or on magnetic disks where the filesystem scan is slow, the issue becomes "painful", whatever software you use (tar, 7z, srep or anything else).

TRANSLATION

Suppose you have a mid-sized file server with 1M files.
Suppose your system can scan folders at 500 files/sec (real-world performance for spinning drives): you need AT LEAST ~33 minutes (1M/(500*60)) just to enumerate everything.
Only THEN can "you" (whatever software you use) start to "do things" (deduplicate, compress, whatever).

With SSDs the real-world speed is ~5K files/sec, with NVMe drives ~30K files/sec.

=>
In this example, you cannot update the backup every 10 minutes.
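
The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `scan_seconds` is made up for illustration, and the scan rates are the rough figures quoted in the text, not measurements.

```python
# Rough estimate of how long a full filesystem enumeration takes.
# Rates are the approximate figures quoted in the text; real systems vary.

def scan_seconds(file_count: int, files_per_second: float) -> float:
    """Seconds needed just to enumerate file_count files at a given rate."""
    return file_count / files_per_second

FILES = 1_000_000
RATES = {"spinning disk": 500, "SSD": 5_000, "NVMe": 30_000}

for name, rate in RATES.items():
    seconds = scan_seconds(FILES, rate)
    print(f"{name:14s} {seconds:8.1f} s  ({seconds / 60:5.1f} min)")
```

On spinning disks the enumeration alone takes longer than a 10-minute backup interval, before any deduplication or compression has started.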

But with zpaqfranz on ZFS, now you can.

The -dataset switch automagically takes a temporary snapshot.
On the next run it gets the changed files from ZFS, instead of scanning again from scratch.
On the first run, nothing is done.
In this example the dataset is tank/d.
Datasets are (very crudely) parts of a "disk" (I'm glossing over the whole ZFS hierarchy): basically, a folder where you write the data (https://www.illumos.org/books/zfs-ad...r-1.html#ftyue)

root@aserver:/tmp/zp # zpaqfranz a prova2.zpaq * -dataset "tank/d" -verbose
zpaqfranz v58.12m-JIT-L(2023-12-02)
franz:-dataset              <<tank/d>>
franz:-verbose
59901: zfs dataset    tank/d
59839: dataset path   |/tank/d/|
59840: topath         |/tank/d/.zfs/snapshot/franco_diff/|
59856: Base snapshot  tank/d@franco_base
59856: Temp snapshot  tank/d@franco_diff
37720: running        Destroy diff snapshot (if any)
38162: x_one          zfs destroy tank/d@franco_diff
37720: running        Taking  diff snapshot
38162: x_one          zfs snapshot tank/d@franco_diff
39147: running        Getting diff
39149: x_one          zfs diff -F tank/d@franco_base tank/d@franco_diff >/tmp/tempdiff.txt
59877: Load a zfsdiff 0 bytes long file <</tmp/tempdiff.txt>>
63108: zfsdiff lines  0
63119: +              0 -              0
59883: zfsdiff to add 0
59896: Nothing to do (from zfsdiff)

0.032 seconds (000:00:00) (with warnings)
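
To make the sequence in the log easier to follow, here is a small Python sketch that only rebuilds the three `zfs` command lines shown above for a given dataset. The function name `dataset_diff_commands` is hypothetical, and this is not zpaqfranz's actual code. How the base snapshot is created or rotated between runs is not visible in the log, so it is not modelled here.

```python
# Rebuild the zfs command lines that appear in the -verbose log above.
# Snapshot names and the temp file path are the ones printed in the log.

def dataset_diff_commands(dataset: str,
                          base: str = "franco_base",
                          diff: str = "franco_diff",
                          out: str = "/tmp/tempdiff.txt") -> list:
    base_snap = f"{dataset}@{base}"
    diff_snap = f"{dataset}@{diff}"
    return [
        f"zfs destroy {diff_snap}",                      # drop a stale diff snapshot, if any
        f"zfs snapshot {diff_snap}",                     # take the new temporary snapshot
        f"zfs diff -F {base_snap} {diff_snap} >{out}",   # ask ZFS what changed
    ]

for command in dataset_diff_commands("tank/d"):
    print(command)
```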

Now create a new file somewhere in the dataset, and run again.
With a conventional "something" you have to enumerate all files, find the "touched" one, then "do something".

zpaqfranz will NOT enumerate all files; it takes just the changed one(s), relying on the changes reported by ZFS.

In effect, it copies the data from the snapshot, so consistency is guaranteed, while automagically rewriting the file's path (as if it were in the dataset, and not inside the snapshot). In short, it is transparent to the user.
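
The idea can be sketched in Python as follows. This is an illustration under assumptions, not zpaqfranz's implementation: the function names and the sample diff text are made up, while the two path prefixes are the "dataset path" and "topath" values printed in the verbose log. The sketch assumes the tab-separated output of `zfs diff -F` (change type, file type, path), keeps only added or modified regular files, and ignores renames, deletions and escaped characters in paths.

```python
# Pick changed regular files out of `zfs diff -F` output and map each one to
# the path inside the snapshot from which the data would be read.

def changed_files(diff_text: str) -> list:
    """Dataset paths of regular files that were added (+) or modified (M)."""
    paths = []
    for line in diff_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # not a well-formed diff line
        change, file_type, path = fields[0], fields[1], fields[2]
        if file_type == "F" and change in ("+", "M"):
            paths.append(path)
    return paths

def to_snapshot_path(path: str,
                     dataset_path: str = "/tank/d/",
                     snapshot_path: str = "/tank/d/.zfs/snapshot/franco_diff/") -> str:
    """Where to read the file from; the archive would still store `path`."""
    if not path.startswith(dataset_path):
        raise ValueError(f"{path} is not inside {dataset_path}")
    return snapshot_path + path[len(dataset_path):]

# Hypothetical diff text for the example in this section.
sample = "M\t/\t/tank/d/spaz/\n+\tF\t/tank/d/spaz/newfile\n"

for stored_name in changed_files(sample):
    print(stored_name, "<-", to_snapshot_path(stored_name))
```

Reading from the snapshot path while storing the dataset path is what makes the switch transparent: the listing shows /tank/d/spaz/newfile, not the .zfs/snapshot location.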

root@aserver:/tmp/zp # echo "test" >/tank/d/spaz/newfile
root@aserver:/tmp/zp # zpaqfranz a prova2.zpaq * -dataset "tank/d"
zpaqfranz v58.12m-JIT-L(2023-12-02)
franz:-dataset              <<tank/d>>
59901: zfs dataset    tank/d
59883: zfsdiff to add 1

Creating prova2.zpaq at offset 0 + 0
Add 2023-12-02 18:17:12         1                  5 (   5.00  B) 16T (0 dirs)
1 +added, 0 -removed.

0 + (5 -> 5 -> 840) = 840 @ 94.00  B/s

0.099 seconds (000:00:00) (all OK)

Now change something again, and run

root@aserver:/tmp/zp # echo "changed" >/tank/d/spaz/newfile
root@aserver:/tmp/zp # zpaqfranz a prova2.zpaq * -dataset "tank/d"
zpaqfranz v58.12m-JIT-L(2023-12-02)
franz:-dataset              <<tank/d>>
59901: zfs dataset    tank/d
could not find any snapshots to destroy; check snapshot names.
59883: zfsdiff to add 1

prova2.zpaq:
1 versions, 1 files, 840 bytes (840.00  B)
Updating prova2.zpaq at offset 840 + 0
Add 2023-12-02 18:17:55         1                  8 (   8.00  B) 16T (0 dirs)
1 +added, 0 -removed.

840 + (8 -> 8 -> 843) = 1.683 @ 195.00  B/s

0.086 seconds (000:00:00) (all OK)

In the archive, the various versions of the file(s) are ready for a point-in-time, file-level rollback

root@aserver:/tmp/zp # zpaqfranz l prova2.zpaq -all
zpaqfranz v58.12m-JIT-L(2023-12-02)
franz:-all                                      4
prova2.zpaq:
2 versions, 2 files, 1.683 bytes (1.64 KB)


- 2023-12-02 18:17:12                   0       0001| +1 -0 -> 840
- 2023-12-02 18:17:08                   5  0644 0001|/tank/d/spaz/newfile
- 2023-12-02 18:17:55                   0       0002| +1 -0 -> 843
- 2023-12-02 18:17:48                   8  0644 0002|/tank/d/spaz/newfile

48650:                    13 (13.00  B) of 13 (13.00  B) in 4 files shown
48651:                 1.683 compressed  Ratio 129.462 <<prova2.zpaq>>

0.001 seconds (000:00:00) (all OK)

Obviously, the archiving time itself remains the same (if the changed files are very large, it will take as long as it takes).
However, on file servers used for e-mail, Word documents, etc., written by a few dozen users, the files are relatively small and can be updated in a matter of seconds.
The real problem is to quickly locate the new file "foo.docx" written somewhere.
Sure, it's not a suitable method for giant virtual machine disks, but that is not its goal.

Default buffersize is now 1MB (was 4KB)

Time to update read-from-file for the solid-state world
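
For readers unfamiliar with what a read buffer is, here is a generic Python sketch (zpaqfranz itself is not written in Python, and this is not its code). The hypothetical function `sha1_of_file` reads a file in chunks of `buffer_size` bytes: a larger buffer means fewer read calls for the same data, and the result is identical either way.

```python
import hashlib

def sha1_of_file(path: str, buffer_size: int = 1024 * 1024) -> str:
    """Hash a file by reading it in chunks of buffer_size bytes."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(buffer_size)
            if not chunk:
                break  # end of file
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha1_of_file(name, 4096)` and `sha1_of_file(name)` on the same file returns the same digest; only the number of reads differs.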

New command redu

A rather complex command; new "smarter" methods are being developed under the hood

zpaqfranz redu z:\*.exe

Fixed some (minor) issues on PowerPC (BIG endian)

Refactoring, removed unused code, a bit of trash stripped, smaller exe (on unix)

Minor bug fixed

This release has not been tested much; be careful with valuable data
