univaruints: compact serialization of variable-length unsigned integers
This C library provides an API to encode and decode such integers, and it can also be used to serialize more complex structures.
The file-based API can be used like this:
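#include <stdio.h>
#include "univaruints.h" /* header name assumed */
FILE *f;
int i = 123456, error;
f = fopen("out.tmp", "wb+");
univaruints_write_single(f, i);
fseek(f, 0, SEEK_SET);
i = univaruints_read_single(f, &error); /* reads 123456 back */
fclose(f);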
We also provide another set of API calls for in-memory operations
Based on ideas from Google's varint in Protocol Buffers (http://code.google.com/apis/protocolbuffers/docs/encoding.html) and from the UTF-8 variable-length encoding of Unicode, I made this more compact format, which is also meant to be faster and has some extra useful properties.
Encoded integers in univaruints look like this (x marks a payload bit):
0xxx-xxxx
10xx-xxxx xxxx-xxxx
110x-xxxx xxxx-xxxx xxxx-xxxx
The number of leading 1 bits in the first byte (counting from the most significant bit up to the first 0 bit) is the number of extra bytes needed to encode the integer.
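Because the length is fixed by the first byte, a reader can find where each integer ends without decoding its payload. The following is a minimal C sketch of that rule; `extra_bytes` and `count_encoded` are hypothetical names, not part of the library, and a 0xFF first byte is rejected because its meaning is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the 1 bits from the most significant bit down to the first 0 bit;
 * this is the number of extra bytes that follow the first byte. */
static int extra_bytes(uint8_t first)
{
    /* Sketch limitation: 0xFF has no terminating 0 bit and is not
     * covered by the format description. */
    assert(first != 0xFF);
    int n = 0;
    while (first & (0x80u >> n))
        n++;
    return n;
}

/* Walk a buffer of back-to-back encoded integers using only the first
 * byte of each one and return how many integers it holds.
 * Assumes the buffer contains only complete encodings. */
static size_t count_encoded(const uint8_t *buf, size_t size)
{
    size_t pos = 0, count = 0;
    while (pos < size) {
        pos += 1 + (size_t)extra_bytes(buf[pos]);
        count++;
    }
    return count;
}
```

This is what makes skipping over fields cheap: only one byte per integer has to be inspected.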
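0xxx-xxxx encodes 0-127
10xx-xxxx xxxx-xxxx encodes 128-16511 (0b1000-0000-0000-0000 => 128 and 0b1011-1111-1111-1111 => 16511)
110x-xxxx xxxx-xxxx xxxx-xxxx encodes 16512-2113663
Each longer form starts right after the largest value of the previous one: the payload is added to an offset (0 for one byte, 128 for two, 16512 for three), so no value has two encodings.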
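The ranges above can be reproduced with a small in-memory encoder and decoder. This is a sketch under stated assumptions: the payload is taken to be stored big-endian (most significant bits in the first byte), which matches the ranges above and the order-preserving property listed below; `univar_offset`, `univar_encode` and `univar_decode` are hypothetical names, not the library's actual in-memory API; and the sketch stops at 7 extra bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest value that needs n extra bytes: 0, 128, 16512, 2113664, ...
 * A form with n extra bytes carries 7 * (n + 1) payload bits. */
static uint64_t univar_offset(int n)
{
    uint64_t off = 0;
    for (int k = 1; k <= n; k++)
        off += (uint64_t)1 << (7 * k);
    return off;
}

/* Encode v into buf and return the number of bytes written.
 * Sketch limit: at most 7 extra bytes, i.e. v < univar_offset(8). */
static size_t univar_encode(uint64_t v, uint8_t buf[8])
{
    assert(v < univar_offset(8));
    int n = 0;
    while (n < 7 && v >= univar_offset(n + 1))
        n++;
    uint64_t payload = v - univar_offset(n);
    /* n leading 1 bits, a 0 bit, then the highest payload bits */
    buf[0] = (uint8_t)(((0xFF00u >> n) & 0xFFu) | (payload >> (8 * n)));
    /* remaining payload bytes, most significant first (assumed big-endian) */
    for (int i = 1; i <= n; i++)
        buf[i] = (uint8_t)(payload >> (8 * (n - i)));
    return (size_t)n + 1;
}

/* Decode one integer from buf; its encoded length is stored in *len. */
static uint64_t univar_decode(const uint8_t *buf, size_t *len)
{
    assert(buf[0] != 0xFF); /* 8 or more extra bytes: outside this sketch */
    int n = 0;
    while (buf[0] & (0x80u >> n))
        n++;
    uint64_t payload = buf[0] & (0x7Fu >> n); /* payload bits of byte 0 */
    for (int i = 1; i <= n; i++)
        payload = (payload << 8) | buf[i];
    *len = (size_t)n + 1;
    return univar_offset(n) + payload;
}
```

For example, 123456 falls in the three-byte range, 123456 - 16512 = 106944 = 0x1A1C0, so under these assumptions it encodes as C1 A1 C0: the top payload bit shares the first byte with the 110 prefix.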
Features:
- simple, fast and unit-tested; the encoded length is known from the first byte
- it preserves order (unlike Protocol Buffers varints): comparing encodings byte by byte gives the same order as comparing the integers, so it can be used e.g. to sort nested thread comments
- several convenient interfaces (e.g. encode_single/decode_single)
- increasing lists can be serialized even more compactly
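The order-preserving property and the increasing-list idea can both be illustrated with a short sketch. It assumes big-endian payload bytes after the length prefix; `univar_encode`, `bytes_cmp`, `encode_path` and `encode_increasing` are hypothetical names, not the library's API. Thread comments are keyed by their path of ids, so a plain byte comparison of the concatenated encodings yields depth-first thread order. Storing an increasing list as its first value followed by the differences (delta encoding) is one plausible way to get a more compact form; it is an assumption here, not necessarily what univaruints does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Smallest value that needs n extra bytes: 0, 128, 16512, ... */
static uint64_t univar_offset(int n)
{
    uint64_t off = 0;
    for (int k = 1; k <= n; k++)
        off += (uint64_t)1 << (7 * k);
    return off;
}

/* Encode v (big-endian payload assumed); returns bytes written (max 8). */
static size_t univar_encode(uint64_t v, uint8_t *buf)
{
    assert(v < univar_offset(8)); /* sketch limit: at most 7 extra bytes */
    int n = 0;
    while (n < 7 && v >= univar_offset(n + 1))
        n++;
    uint64_t payload = v - univar_offset(n);
    buf[0] = (uint8_t)(((0xFF00u >> n) & 0xFFu) | (payload >> (8 * n)));
    for (int i = 1; i <= n; i++)
        buf[i] = (uint8_t)(payload >> (8 * (n - i)));
    return (size_t)n + 1;
}

/* Lexicographic byte-string comparison; a shorter prefix sorts first. */
static int bytes_cmp(const uint8_t *a, size_t la, const uint8_t *b, size_t lb)
{
    int c = memcmp(a, b, la < lb ? la : lb);
    if (c != 0)
        return c;
    return (la > lb) - (la < lb);
}

/* Sort key for a comment: the ids on its path, encoded back to back. */
static size_t encode_path(const uint64_t *path, size_t depth, uint8_t *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < depth; i++)
        pos += univar_encode(path[i], out + pos);
    return pos;
}

/* Hypothetical delta scheme for a non-decreasing list: the first value,
 * then each difference from the previous value. Returns bytes written. */
static size_t encode_increasing(const uint64_t *v, size_t count, uint8_t *out)
{
    size_t pos = 0;
    uint64_t prev = 0;
    for (size_t i = 0; i < count; i++) {
        assert(v[i] >= prev);
        pos += univar_encode(v[i] - prev, out + pos);
        prev = v[i];
    }
    return pos;
}
```

By contrast, Protocol Buffers varints put the low-order bits first: 255 is FF 01 and 256 is 80 02, so a byte comparison would sort 256 before 255. With the delta sketch, the list 1000000, 1000003, 1000010, 1000100 takes 6 bytes (3 for the first value, 1 for each small gap) instead of 12.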
The format is also implemented in other programming languages:
- univaruints: library implemented in C
- python-univaruints: pure Python implementation
- js-univaruints: implementation in JavaScript