lightpatch is a simple library for creating and apply patch files, similar to the functions of diff
and patch
. It is available primarily as a Go library, though a simple CLI tool is bundled as well.
The main goals of the project are:
- Efficient patching of files. The patch files themselves are not intended to be human-readable diffs.
- Byte-level diffs. This means it will work well when line-oriented tools like
diff
won't. An example of this being important would be changes to a single-line minified Javascript file. Furthermore, there is no presumuption that the input is in UTF-8 or any other encoding. It's just a stream of bytes. - The patchfile format is simple and well-defined. A basic decoder can be written in a few dozen lines. (Contrast to VCDIFF and its RFC that is nearly 30 pages long.)
- Patches may contain an integrity check (CRC-32) which provides strong validation that the output file matches the original input. Note: the CRC is to detect accidental corruption or a patch being applied to the wrong file. It offers little protection against malicious tampering.
- Simple API for developers. The go-diff library has over 30 functions, whereas lightpatch has three.
The core diff algorithm is general purpose and is geared towards text. While it will handle binary files fine (e.g. you could efficiently patch some changed EXIF data in a JPG image), it does not have the advanced handling of binary diffs you'll find in specialized tools such as bsdiff, xdelta, etc. On the other hand it is much simpler than those tools, especially for decoding.
You can easily build from source by installing Go and running:
go get -u github.com/kalafut/lightpatch/cmd/lightpatch
Making and applying patch files are the two supported commands:
lightpatch make file1 file2 > patch
lightpatch apply file1 patch > output # should match file2
lightpatch make --t 30s file1 file2 > patch # allow 30s to make the patch
lightpatch is very fast in the general case, but if you give it two very different files, it will try hard to find a diff even when there isn't one. By default it will "give up" after 5 seconds (usually plenty of time even for large files), but this is adjustable with the --t
option.
Note: the command still succeeds even if the timeout is reached, but the output might be a naïve diff that is just the new file in its entirety.
The API is described in the docs. The source for the CLI tool is also a good example.
The lightpatch file format is a simple TLV style. The patch file provide edit instruction to be applied to a source file. The command format is:
command [len] [data]
Command | Code | Notes |
---|---|---|
Copy | C (0x43) | Copy len bytes from source to dest. data is not used. |
Insert | I (0x49) | Insert the next len bytes from data into dest. |
Delete | D (0x44) | "Delete" the next len source bytes by advancing the source input and output nothing to dest. data is not used. |
Checksum | K (0x4B) | (Optional) The next 4 bytes are the CRC-32 of dest. If present, this must be the final command of the patch file. |
The len
parameter is varint encoded. Libraries are readily available to handle this encoding (and even a hand-rolled decoder is only a few lines).
CRC handling is optional on both ends. An encoder doesn't have to include it, and decoder don't have to verify them. It's better if they do, but in very simple cases the complexity may not be desired. Regardless if a decoder is verifying it or not, it should still return an error if there is data following the CRC, as that is an invalid patch.
The CRC uses the common CRC-32-IEEE polynomial.
Before:
The quick brown fox jumped over the lazy dog
After:
The quick brown fox leaped over the lazy dog.
Patch (hex):
43 14 copy 20 bytes ("The quick brown fox ")
44 03 delete/skip 3 bytes ("jum")
49 03 6c 65 61 insert 3 bytes ("lea")
43 15 copy 21 bytes ("ped over the lazy dog")
49 01 2e insert 1 byte (".")
4b 96 f6 b7 6c CRC-32 is 0x96f6b76c
lightpatch borrows heavily from:
- go-diff, which provided the inspiration for this project and is still the source of most of the diff generation code (albeit translated from character to byte handling).
- Google's Diff-Map-Patch library provided the fundamental diff generation algorithms upon which lightpatch, go-diff, and many variants are built upon.