escapeless

Efficient binary encoding for large alphabets.

Features

Low fixed-size overhead.
Compression-friendly output.
Arbitrary alphabets.
Fast and simple algorithm.
Does not involve heavy-weight arithmetic.

Comparison chart

Encoding	Alphabet Size	Overhead
escapeless255	255	0.4%
escapeless254	254	0.8%
escapeless253	253	1.2%
yEnc	252	1.6%*, 0-100%
escapeless252	252	1.6%
escapeless251	251	2.0%
escapeless250	250	2.4%
B-News	224	2.5%
escapeless240	240	6.7%
escapeless230	230	11.4%
escapeless225	225	13.8%
Base122	122	14.3%
basE91	91	22%*, 14-23%
Base94	94	22.2%
Ascii85	85	25.0%
Z85	85	25.0%
Base64	64	33.3%
uuencode	64	33.3%
Base58	58	36.6%
Base36 / 64-bit	36	59.2%*, 0-62.5%
Base32	32	60.0%
Base36 / 32-bit	36	62.0%*, 0-75%
Base16	16	100.0%

(*) Assuming a uniform distribution of input octets.

Building and testing

$ git clone git@github.com:kosarev/escapeless.git
$ cd escapeless/c
$ make
$ make test

Basic idea

Given a source alphabet of size S and a target alphabet of size N < S, break the sequence of input characters into blocks so that the number of characters in each block does not exceed N − 1.

Since a block can contain at most N − 1 distinct characters and the target alphabet contains N characters, every character used in the block can be mapped to its own character of the target alphabet, and at least one character of the target alphabet will remain unmapped. For example:

 A B C D E F G H I J K L    12  Characters of the source alphabet (S)
 A   C D E     H I   K L     8  Characters of the target alphabet (N)
   x       x x     x         4  Characters missing in the target alphabet (takeouts)
   | | | |     | | |         7  Characters used in the block
 .         . .       . .     5  Characters not used in the block

Here, one possible mapping is:

 B -> A
 J -> K

with L left unmapped and the remaining characters of the block (C, D, E, H and I), which already belong to the target alphabet, mapped to themselves.

The unmapped character makes it possible to map unused takeouts, like F and G in the example, to a character of the target alphabet that does not represent any character of the source alphabet in that block. Taking that into account, here is how the complete mapping looks:

 B -> A
 F -> L
 G -> L
 J -> K

Once the mapping is determined, we can output the encoded block with its takeout characters replaced by members of the target alphabet. To let the decoder know the mapping, we also prepend each encoded block with the series of characters the takeouts are mapped to. This assumes the decoder is given the same set of takeout characters, specified in the same order.
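The example can be reproduced with a short Python sketch. The function name `build_takeout_map` is made up for illustration and is not taken from the project's sources. Assigning free target characters in alphabet order is one arbitrary choice among several valid ones; it happens to yield the same mapping as above.

```python
def build_takeout_map(source, target, block):
    """Return {takeout: target character} for a single block."""
    takeouts = [c for c in source if c not in target]
    used = set(block)
    # Target characters absent from the block are free to stand in for takeouts.
    free = [c for c in target if c not in used]
    used_takeouts = [t for t in takeouts if t in used]
    # Every takeout that occurs in the block gets its own free character.
    mapping = dict(zip(used_takeouts, free))
    # At least one free character is left over (the block has at most N - 1
    # characters); all takeouts absent from the block share it.
    spare = free[len(used_takeouts)]
    for t in takeouts:
        mapping.setdefault(t, spare)
    return mapping


source = "ABCDEFGHIJKL"   # S = 12
target = "ACDEHIKL"       # N = 8
block = "BCDEHIJ"         # N - 1 = 7 characters
mapping = build_takeout_map(source, target, block)
for takeout in sorted(mapping):
    print(takeout, "->", mapping[takeout])
```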

Overhead formula

For a source alphabet of size S, a target alphabet of size N and a block of N − 1 characters, the size of the encoded block is:

 encoded_block_size = takeouts_map_size + block_size =
                      (S - N) + (N - 1) =
                      S - 1

The overhead is thus:

 overhead = (encoded_block_size - block_size) / block_size =
            ((S - 1) - (N - 1)) / (N - 1) =
            (S - 1 - N + 1) / (N - 1) =
            (S - N) / (N - 1)
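As a quick check, the formula can be evaluated for a few target alphabet sizes. This is a minimal sketch that assumes the source alphabet is all 256 octet values and that every block is full; the function name `escapeless_overhead` is illustrative only.

```python
def escapeless_overhead(target_size, source_size=256):
    """Relative overhead (S - N) / (N - 1) for full blocks of N - 1 characters."""
    return (source_size - target_size) / (target_size - 1)


for n in (255, 252, 240, 225):
    print(f"escapeless{n}: {escapeless_overhead(n):.1%}")
```

The printed values should agree with the corresponding rows of the comparison chart. A trailing block shorter than N − 1 characters still carries a full map, so very short messages see a higher relative overhead.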

Encoding algorithm

Break the input message into blocks so that no block contains more than N - 1 characters, where N is the size of the target alphabet. Process every block separately as specified below.
Map every takeout character to a character of the target alphabet that is not used in the block. Takeouts used in the block shall each map to a different character. All takeouts not used in the block shall map to the same character, distinct from those assigned to the used takeouts.
Replace takeout characters of the block using that map.
Output the map followed by the rewritten block.
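The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: the source alphabet is taken to be all 256 octet values, takeouts are ordered by value, free target octets are assigned in ascending order, and at least two target octets remain. The name `encode` and its signature are hypothetical, and the output is not claimed to be byte-compatible with the implementations in this repository.

```python
def encode(data, takeouts):
    """Encode bytes so that no takeout octet appears in the output."""
    takeouts = sorted(set(takeouts))          # agreed order: ascending value
    takeout_set = set(takeouts)
    target = [b for b in range(256) if b not in takeout_set]
    block_size = len(target) - 1              # N - 1
    out = bytearray()
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        used = set(block)
        # Target octets not occurring in the block can stand in for takeouts.
        free = [b for b in target if b not in used]
        used_takeouts = [t for t in takeouts if t in used]
        mapping = dict(zip(used_takeouts, free))
        spare = free[len(used_takeouts)]      # shared by all unused takeouts
        # The map: one octet per takeout, in the agreed order.
        out += bytes(mapping.get(t, spare) for t in takeouts)
        # The block with its takeouts replaced.
        out += bytes(mapping.get(b, b) for b in block)
    return bytes(out)


message = b"hello\x00world\n"
encoded = encode(message, takeouts=[0x00, 0x0A])
print(encoded)
```

Here the NUL and newline octets never appear in `encoded`, and the single block grows by exactly two octets, one per takeout.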

Decoding algorithm

Read the takeouts map and the encoded block.
Using the map, restore the takeouts in the block.
Output the decoded block.
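A matching decoder sketch, under the same assumptions as the encoding steps: octet input, takeouts ordered by value, one map octet per takeout in front of each block, and at least two target octets. The name `decode` is hypothetical, and the sample input was encoded by hand following the encoding algorithm with NUL and newline as takeouts.

```python
def decode(encoded, takeouts):
    """Restore the original bytes from maps and rewritten blocks."""
    takeouts = sorted(set(takeouts))          # same order the encoder used
    map_size = len(takeouts)                  # S - N
    block_size = 256 - map_size - 1           # N - 1
    out = bytearray()
    pos = 0
    while pos < len(encoded):
        takeouts_map = encoded[pos:pos + map_size]
        pos += map_size
        block = encoded[pos:pos + block_size]
        pos += block_size
        # Invert the map. An octet shared by several unused takeouts never
        # occurs in the block, so it does not matter which one it resolves to.
        restore = {stand_in: t for t, stand_in in zip(takeouts, takeouts_map)}
        out += bytes(restore.get(b, b) for b in block)
    return bytes(out)


# Map: NUL -> 0x01, newline -> 0x02; then the rewritten block.
sample = b"\x01\x02hello\x01world\x02"
decoded = decode(sample, takeouts=[0x00, 0x0A])
print(decoded)
```

Because every block carries its own map, decoding can start at any block boundary without knowledge of the preceding blocks.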

The idea explained in greater detail

Escapeless, Restartable, Binary Encoding

Thanks, Ian!
