Java Ma_Sys.ma Block Backup JMBB
Copyright (c) 2013, 2014, 2015, 2017, 2019, 2020 Ma_Sys.ma. For further info send an e-mail to Ma_Sys.firstname.lastname@example.org. This program's encryption functions are modifications of Java AESCrypt, Copyright 2008 Vócali Sistemas Inteligentes. For further information refer to http://www.aescrypt.com/java_aes_crypt.html.
JMBB allows you to create encrypted incremental backups from source directories to a destination directory. Unlike most encrypting backups JMBB does not use encryption on per-file basis. Instead, it adds multiple files to a compressed and encrypted archive file called “block”. This securely encryptes the source directory structure and all file metadata. Whenever files have changed, they are added to new blocks at the next incremental backup. When all files contained in a block have been replaced by files from newer blocks (i.e. when all files in an old block were changed over time) the old block is deleted. This ensures that with common usecases the incremental backup will not be bigger than the source data.
An automatically generated usage information which also conatins “long”
arguments can be obtained by invoking JMBB with
JMBB is usually run through either a
jmbb command if it was installed on UNIX
or Linux or with
java -jar JARFILE with JARFILE being the
containing the program.
Example: Creating a backup with JMBB from
$ java -Xmx5G -jar jmbb.jar -o /media/backup -i /data
If you are getting an out of memory error add suitable
-XmxNG values with N
being enough RAM for the program to work. On Linux you can enter about as much
as your amount of physical RAM + size of available swap space.
On Windows and all other systems where they are not installed by default you need the “Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files” from http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html or JMBB will not encrypt or decrypt and therefore not work at all. This is related to the fact, that JMBB uses AESCrypt's strong cryptography which you need to enable separately for your Java installation. To resolve this, JMBB would either need to package the file with it (licensing issues prevent me from doing this) or implement the whole AES stuff itself which would lead to more bugs in the code and is therefore also not an option.
JMBB was created with the following scenario in mind: The user wants to create an online “cloud” backup but wants to encrypt his/her files. However, transferring a whole, encrypted 7z archive for every backup is very time-consuming. Instead, the user wants to create an incremental backup. The best solutions widely known are either “encfs” or “Box Cryptor” both of which act on per file basis and therefore not encrypt access times, file sizes and only obfuscate filenames, not encrypt directory structures. JMBB can not directly synchronize with an online storage but write to any directory which may then be synced using the client program provided by the “cloud service”. JMBB is an incremental backup program, that means neither a program to synchronize files, nor a program to create disk images or backup system installations. The technical limitations (cf. tables below) make JMBB unuseable for handling large files of 8 GiB or larger. Creating a backup of 8 GiB or more data however, is not an issue.
JMBB has an internal editor for database maintenace tasks. It can be started as follows:
$ jmbb -e BACKUP
The editor can be used interactively to set new passwords, display existing passwords, display database statistics and clear blocks which mainly consist of obsolete files. Also, it allows the user to view contents of a specific block as they are registered in the database. If a password has been cracked or leaked, the editor can also “deprecate” passwords, i. e. force all blocks using the obsolete password to be re-created on the next JMBB invocation.
Transition to 126.96.36.199
Before 188.8.131.52, sometimes an incremental backup would create sparsely filled blocks without an apparent reason. This was a result of how JMBB decided which files were to be added to blocks – files which matched their previous version were not added at the stage of block creation when it as already defined which files would be contained in a block. Using programs which rewrite a lot of data which has not chagned upon program exit, this caused empty blocks. Although empty blocks were no “danger”, they had a major issue: JMBB did not ever delete them because their contained files never changed to obsolete (because there were no files contained).
With Version 184.108.40.206, this issue has been fixed by applying the following:
Checksums are now calculated at the stage of block file list creation in a
single threaded manner if the checksum could influence if the file is added to
a block or not. Therefore, empty blocks will no longer be created. To be able
to clean your old backup tree, a new command
empty has been added to list and
emtpy rm delete all empty blocks. It has been tested that empty blocks
are not relevant for restoration.
Transition to 220.127.116.11
Before 18.104.22.168, JMBB did not store any information whenever a file was found to have a newer timestamp but already existed in the database with the same checksum. The main idea about this was to avoid backing up a known file again and instead just keep the existing (copy) where the modification time is older but all other metadata (and the content) are equal.
Upon upgrading to Debian Stretch with Java 8, a strong disadvantage of this approach was found: Due to the upgrade, the modification times of existing files could be read more exactly, i. e. instead of always being 000, the last three digits of the modification time in milliseconds suddenly got a value for a lot of files. By this, JMBB found a lot of files with different timestamps but only few of them had changed, making it update only the copies of said few files. The other (unchanged) files' new times were not stored leading to very long scanning times due to checksuming all files in the filesystem where new times were available upon each invocation.
While one could have started the backups from scratch or reduce the accuracy of
metdata, another approach was chosen (cf.
DBNewTimes.java in the source code
for reasoning about the design decision): A new place in the “database” XML
file is now used to track files which have only changed in modification time
and thus not been backed up again. For these files, whenever the file is
scanned again with the same time, it is treated as already present in the
backup (which it is except for an older timestamp) making it unnecessary to
re-checksum the affected files upon every invocation.
Compatibility with previous versions : Currently, the additional database entries are not used for restoration which means the new JMBB can restore (and update) backups created with previous versions. Upon updating an old backup, JMBB will automatically create the new structure in the XML file.
Short Summary : If you experience unusually long backup update times after a Java and/or OS upgrade (with a lot of HDD activity), this update is likely to solve the issue after the 2nd (!) backup update done with it.
With this version update, no manual database editing is necessary.
Updates in 1.0.3
No changes should be necessary. This release fixes an issue with backup
transactionality: In case a backup were updated and some files that existed
already previously were changed but no longer readable (e. g. due to
intermittent deletion or
chmod 000), then this could cause a backup
inconsistency (blocks being deleted but not marked in the database as such).
This version attempts to fix the issue by only deleting blocks after the DB
with the changes could be saved successfully.
Updates in 1.0.4 to 1.0.6
No changes should be necessary. This release adds the integrity check feature described in section Performing Integrity Checks.
JMBB Features and their respective implementations
Archiving GNU CPIO "Portable ASCII Format" (filesize limit: 8 GiB) Compression XZ For Java http://tukaani.org/xz/java.html Encryption AESCrypt AES256, see copyright http://aescrypt.com/ Database GZipped XML with DTD included in JMBB JAR File traversal Java NIO for backup source directory traversal
Envorinoment variables affecting backup creation. Use with care
JMBB_THREADS Number of XZ threads to create. Default: number of cores
JMBB_XZLEVEL Changes the XZ compression level from 1 to 9, default: 8
true means warnings about failed
stat()s are hidden
JMBB performance comparison table
Command User/s Sys/s Real/s MaxRes/k Comp/O
jmbb -o BAK -i SRC 1231 71 298 3959444 0.3122
tar -c SRC | 7z* BAK 822 7 239 3785532 0.3065
jmbb -r DST -s BAK 162 15 175 478300 n/a
7z x -so BAK p* | tar -C DST -x 38 4 53 68300 n/a
Command aliases for the JMBB performance comparison table
p* := -ptestwort
7z* := 7z a -t7z -m0=lzma2 -mx=9 -mfb=64 -md=64m -ms=2g -l p* -si -bd
Machine and OS Requirements and Recommendations
- Linux or UNIX with GNU CPIO
- Java 7 or higher Runtime Environment (only Java 7 and 8 known to work). For more than three (virtual) cores a 64 Bit JVM is required.
- Enough HDD space to store the newly created backup.
- RAM requirements (see below)
- Multicore (4+ recommended) system with much RAM (6 GiB+ recommended).
JMBB is a “heavyweight” Java program which requires a lot of RAM depending on your system because it creates distinct XZ compressors for every processor core and because it reads it's whole database into your RAM.
Approximately you will need
300 MiB + (virtual) cores * 600 MiB free RAM. => old singlecore: 1 GiB of free RAM recommended. => new quadcore: 5 GiB of free RAM recommended (rem: 8 virtual cores).
You might need more than 300 MiB of additional RAM depending on the size of the directory tree you want to backup. Also, the amount of RAM required for the data base slowly grows with each incremental backup. This is currently an unfortunate design error/requirement.
If you do not fulfil the RAM requirements but still want to use JMBB you might be able to tune the amount of memory required by lowering compression settings or reducing the number of threads to be created. These values can be affected by environment variables as listed in the table above.
Also, the increasing RAM usage of the database is considered a “known issue” which should be resolved in the future. Be aware, however, that a solution to that issue will at least require you to convert your database if not to switch to anoter program.
Performing Integrity Checks
Since version 1.0.6, a new mode of invoking JMBB has been added: The integrity
check. The idea behind the integrity check is to supply a database file and
a directory that contains
.cxe files. JMBB will then attempt to decrypt all
of the blocks and compare their checksum against the value stored in the
database. Additionally, JMBB checks if all blocks necessary to restore the
backup state from the database are present on disk.
Despite being a computationally and I/O intensive operation, the integrity check can achieve good performance: Below invocation ran on about 250 GiB of block data and finished in about half an hour, i.e. averaged 140 MiB/s on an Intel Core i7-4770 with 24 GiB of RAM and a software RAID 1 of SATA HDDs.
$ jmbb -I /fs/backuphist/metadata/db.xml.gz -R /fs/backuphist/blocks Details ======= 0000000000000001 [ ok ] obsolete, absent 0000000000000002 [ ok ] obsolete, absent 0000000000000003 [ ok ] obsolete, absent [...] 00000000000013b2 [FAIL] active, absent 00000000000013b3 [FAIL] active, absent 00000000000013b4 [FAIL] active, absent [...] 00000000000033a8 [ ok ] active, verified 00000000000033a9 [ ok ] active, verified Statistics ========== [ ok ] active, verified 3098 [ ok ] obsolete, verified 8942 [ ok ] obsolete, absent 915 -- SUM 12955 [FAIL] active, absent 269 Summary ======= BACKUP IS INCONSISTENT!
The report is structured as follows:
All blocks are listed in a sorted and tabular fashion. The table columns are as follows:
- Block ID: The number of the block
[ ok ]or
[FAIL]depending on whether this block is good or not. A block is good if either (a) it exists on disk and its checksum matches OR (b) it does not exist on disk and is no longer needed (obsolete).
- Whether the block under consideration is needed (
active) or obsolete.
- Whether the block is present on disk and verified (
verified), not present on disk (
absent) or present but not the same as recorded in the database (
CHECKSUM MISMATCH). In case multiple block files exist with the same name, their individual file paths and whether they match the database will also be reported in separate lines following the current entry. In case you are interested in the complete set of messages that can appear here, check file
This section counts each of the result messages' occurrences. In case of
a backup archive there might be a constant number of blocks in the
obsolete absent category that should not increase (unless the archive
is losing blocks!). Hence, these statistics allow insights into the completeness
of such an archive beyond a simple “is able to restore”.
This section will only ever display one of two messages:
BACKUP IS INCONSISTENT!
: At least one of the active blocks is absent or
at least one block file was found to have a mismatching checksum.
Backup is consistent.
: All blocks on disk match the database's contents and it should be possible
to restore the state of the database from the blocks. Note: JMBB does not
extract all of the blocks' contents, thus a minor uncertainity remains that
the blocks' contents may have been invalid in the first place. Such
anomalies can currently only be detected by performing a proper restore and
then comparing against the original data.
As JMBB is a Java-Application it is normally unnecessary to recompile it. But
if you extract the source from the jarfile, you can compile JMBB with
to generate all .class files or
$ ant jar to generate the jmbb.jar
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
: You can view the GPL by either browsing the JMBB jarfile or running JMBB
-$ parameter. The full sourcecode is also part of the jarfile.
JMBB was designed to allow the user to restore his/her data even if many things
are lost. Depending on the situation, most of the restoration can be done by
JMBB itself. Should JMBB not be able to run or should a compatible
implementation not be available, it is possible to restore the data from the
blocks. To restore all data (including possibly deleted files) from a block, it
can be extracted with common utilities all of which may be substituted if
.cxe are CPIO XZ Encrypted files which means that they can be
extracted in reverse order of creation, e. g.:
$ aescrypt -d -p PASSWORD -o result.cpio.xz block.cxe $ xzcat result.cpio.xz | cpio -i
Apart from user data, every block file contains a file
checksums and information about the contained files. The “global” database file
applying to the whole backup contains everything including all
files. It can be used to find out about which files should be restored and
which files were deleted or replaced by another file. Usually, this is done by
the JMBB restoration function … it is listed here for emergency cases only.
To reduce the possiblity of such an emergency where JMBB might not be available, it is advisable to store JMBB and a “reference” CPIO implementation next to the backup, ideally in the same folder the backup resides in.
Relying on backup software used by few people, which JMBB is for example, is always a security risk: No experts have reviewed the sourcecode and chances are bad data can be recovered in case of failure. To mitigate the potential negative aspects, JMBB relies on standard formats for all of its features. Still, some risks remain. To give an overview about the alternatives which are used by more users, there used to be a list of alternatives here.
Since 2021/04/10, a new and more exhaustive analysis of the alternatives exists under backup_tests_borg_bupstash_kopia(37).
Creating and updating a backup with JMBB is simple enough. However, JMBB was also designed to be used for complex backup strategies. As an example, this section describes how to setup a backup strategy similar to JMBB's author's.
Integration into a Backup Strategy
My backup consists of multiple layers: A quick incremental backup to another internal HDD is created on every shutdown and an encrypted copy is sent to a separate PC for synchronizing the data to a cloud service. Its main purpose is to record changes and allowing to fetch older versions of files if a file was accidentally deleted or changed. About once per week, a backup of important data and programs is copied to an external SSD. This backup strategy is implemented using JMBB and standard Linux utilities.
The backup upon shutdown
$ jmbb -o $HOME/backup -i /data/main | copydelta.sh
copydelta.shcollects new blocks into a separate directory.
The weekly backup to an external SSD
$ jmbb -d $HOME/backup -c /mnt/backup
- Mirroring is sufficient.
The system and data backup
$ jmbb -d $HOME/backup -c /mnt/backup_system
$ rsync -av /data/programs /mnt/backup_system
The online backup.
$ jmbb -d $HOME/backup -c $HOME/backupThis shows a special JMBB feature: A mirror can be stored in a database. By storing all blocks in a directory called "cnt" and an encrypted copy of the database with it, it is easily possible to sync "cnt" to a (possibly public) online storage without disclosing any information about your files. To extract the files, a copy of the DB (
cnt/../db.xml.gz) or the password are required.
$ onlinesync.sh $HOME/backup/cnt
onlinesync.shis a hypothethical name of a script to synchronize a normal directory with the online storage.
The archive backup
On an irregular schedule, new blocks that were collected upon shutdown are archived to separate machine. After arriving there, the integrity check feature is used to check the existent and newly created blocks' consistency. This archive acts in a pull-based fashion and does not ever overwrite existent data to achive some resistance against file corruption caused by malware.
As you can see, JMBB is best used in conjunction with other scripts and utilities. Also, it is recommended not to invoke JMBB directly but create a script to create an interface for your personal backup strategy in order to make backups more convenient.
Wrapper scripts are also helpful to add additional utility invocations, provide required environment variables (cf. table above) and to enter source directories automatically.
Immediately report bugs informally with important information (OS, architecture, CPIO, Java versions, etc.) and exact error message including stack trace (if available) to the Ma_Sys.ma e-mail address listed at the very beginning of this file. If you are able to reproduce the bug, add the minimum amount of steps which result in the bug to your mail.
You can currently not create restorable backups on Windows systems. This is a bug which results from the treatment of file names: The fixed separator
/ is assumed for CPIO's patterns but obviously CPIO's Windows version also
expects (and writes to the CPIO archives) real Windows filenames. Backups with
Unix filenames however, are restored correctly even on Windows systems if you
have a Windows version of CPIO.
- OutOfMemoryError does not cause nonzero exit status (but also does not update DB and therefore leaves it in a consistent state) → TEST
- File Name Handling defect: File names which are invalid UTF-8 are not processed properly.
- Studying AES-CBC in detail it seems that it might be good to make IV generation not only depend on the key but proably on the block number as well!
Plans for enhanced robustness
- For a JMBB SQL variant breaking the program at arbitrary points should be fully supported by creating blocks transactionally (once block on HDD: commit metadata, before block to HDD: commit as “failed” which will be deleted upon reentering the program with the same block ID not being used again and the metadata not being discarded but marked as “failed”
- It would be nice if JMBB handled FS changes sensibly: Upon doing a STAT for the file, the size should be noted. If it later changes → add to a notification list (incl. delta) and if it later disappears → repeat STAT and if it does not appear by then add it to a notification list. At the end of a JMBB invocation, a summary of size-changed and vanished files should be printed. (Does it make sense to do a 2-nd stat phase to detect added files, too?)
cpiowith higher limits.