Binary format

This file describes how data is encoded to binary and explains some of the rationale behind the design.

Endianness

Everything uses big-endian byte order.

Basic types

UInt

An unsigned integer, stored in a variable number of bytes depending on its value. This lets small values (like 17) fit in a single byte while still supporting (almost) 64-bit integers. Another advantage is that the user doesn't need to choose a fixed field size.

The downsides of this design are:

  • a more complex encoding/decoding process;
  • loss of compatibility with most tools, due to this rather uncommon encoding.

The first matching rule from the list below should be used. This means, for example, that encoding 0 with 16 bits is invalid.

  • Integers greater than or equal to 0 and less than 2^7 = 128 are encoded as uint8, with the first bit unset:
    0xxx xxxx (each char is a bit, x is either 0 or 1)
  • Integers less than 2^14 = 16384 are encoded as uint16, with the first bit set and the second unset:
    10xx xxxx xxxx xxxx
  • Integers less than 2^29 = 536870912 are encoded as uint32, with the first 2 bits set and the third unset:
    110x xxxx xxxx xxxx xxxx xxxx xxxx xxxx
  • Integers less than 2^61 = 2305843009213693952 are encoded as uint64, with the first 3 bits set:
    111x xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx
  • Any other value should be treated as an error.
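A minimal TypeScript sketch of the UInt rules above, assuming an ES2020+ runtime (for BigInt and DataView.setBigUint64). The names encodeUInt and decodeUInt are illustrative, not necessarily the library's API. Because plain JS numbers are only exact up to 2^53 - 1, the sketch rejects anything that is not a safe integer, so the 2^61 limit is never actually reached.

```typescript
// Sketch of the UInt rules. Assumes ES2020+ (BigInt, DataView.setBigUint64).
// DataView writes big-endian unless told otherwise, matching the format's byte order.

function encodeUInt(value: number): Uint8Array {
  // Plain numbers are exact only up to 2^53 - 1, so larger values are rejected here
  // (the format itself would allow up to 2^61 - 1).
  if (!Number.isSafeInteger(value) || value < 0) throw new RangeError(`invalid UInt: ${value}`);
  if (value < 0x80) return Uint8Array.of(value); // 0xxx xxxx
  const size = value < 0x4000 ? 2 : value < 0x20000000 ? 4 : 8;
  const view = new DataView(new ArrayBuffer(size));
  if (size === 2) view.setUint16(0, value | 0x8000); // 10xx xxxx ...
  else if (size === 4) view.setUint32(0, (value | 0xc0000000) >>> 0); // 110x xxxx ...
  else view.setBigUint64(0, BigInt(value) | (0b111n << 61n)); // 111x xxxx ...
  return new Uint8Array(view.buffer);
}

// Reads one UInt starting at `offset`; returns the value and how many bytes it used.
// This sketch does not reject non-canonical forms (e.g. 0 stored in 16 bits).
function decodeUInt(bytes: Uint8Array, offset = 0): { value: number; length: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const first = view.getUint8(offset);
  if ((first & 0x80) === 0) return { value: first, length: 1 };
  if ((first & 0x40) === 0) return { value: view.getUint16(offset) & 0x3fff, length: 2 };
  if ((first & 0x20) === 0) return { value: view.getUint32(offset) & 0x1fffffff, length: 4 };
  // 61-bit payload; Number() would lose precision above 2^53.
  const payload = view.getBigUint64(offset) & ((1n << 61n) - 1n);
  return { value: Number(payload), length: 8 };
}
```

For example, encodeUInt(17) yields the single byte 0x11, while encodeUInt(300) yields 0x81 0x2C. A stricter decoder could additionally check that each decoded value is at least the lower bound of its size class, to enforce the "first matching rule" requirement.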

Int

A signed integer, stored in a variable number of bytes.

The first matching rule from the list below should be used. This means, for example, that encoding 0 with 16 bits is invalid.

  • Integers greater than or equal to -2^6 = -64 and less than 2^6 are encoded as int8, with the first bit unset:
    0xxx xxxx (each char is a bit, x is either 0 or 1)
  • Integers greater than or equal to -2^13 = -8192 and less than 2^13 are encoded as int16, with the first bit set and the second unset:
    10xx xxxx xxxx xxxx
  • Integers greater than or equal to -2^28 = -268435456 and less than 2^28 are encoded as int32, with the first 2 bits set and the third unset:
    110x xxxx xxxx xxxx xxxx xxxx xxxx xxxx
  • Integers greater than or equal to -2^60 = -1152921504606846976 and less than 2^60 are encoded as int64, with the first 3 bits set:
    111x xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx
  • Any other value should be treated as an error.
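The Int rules leave one detail implicit: how a negative value fills the payload bits. The sketch below assumes two's complement within the payload (7, 14, 29 or 61 bits), which matches the ranges listed above but is an assumption, not something stated here. It also assumes ES2020+ for BigInt, and the function names are illustrative.

```typescript
// Sketch of the Int rules, ASSUMING a two's-complement payload in the x bits.

function encodeInt(value: number): Uint8Array {
  if (!Number.isSafeInteger(value)) throw new RangeError(`invalid Int: ${value}`);
  if (value >= -0x40 && value < 0x40) return Uint8Array.of(value & 0x7f); // 0xxx xxxx
  if (value >= -0x2000 && value < 0x2000) {
    const view = new DataView(new ArrayBuffer(2));
    view.setUint16(0, (value & 0x3fff) | 0x8000); // 10xx xxxx ... (14-bit payload)
    return new Uint8Array(view.buffer);
  }
  if (value >= -0x10000000 && value < 0x10000000) {
    const view = new DataView(new ArrayBuffer(4));
    view.setUint32(0, ((value & 0x1fffffff) | 0xc0000000) >>> 0); // 110x ... (29-bit payload)
    return new Uint8Array(view.buffer);
  }
  // 111x ... (61-bit payload). Every safe integer fits in 61-bit two's complement.
  const view = new DataView(new ArrayBuffer(8));
  view.setBigUint64(0, (BigInt(value) & ((1n << 61n) - 1n)) | (0b111n << 61n));
  return new Uint8Array(view.buffer);
}

// Reads one Int at `offset`, sign-extending the payload back to a number.
function decodeInt(bytes: Uint8Array, offset = 0): { value: number; length: number } {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const first = view.getUint8(offset);
  // Shift the payload's top bit into bit 31, then arithmetic-shift back to sign-extend.
  if ((first & 0x80) === 0) return { value: (first << 25) >> 25, length: 1 };
  if ((first & 0x40) === 0) return { value: (view.getUint16(offset) << 18) >> 18, length: 2 };
  if ((first & 0x20) === 0) return { value: (view.getUint32(offset) << 3) >> 3, length: 4 };
  const payload = view.getBigUint64(offset) & ((1n << 61n) - 1n);
  return { value: Number(BigInt.asIntN(61, payload)), length: 8 };
}
```

Under this assumption, -1 encodes as the single byte 0x7F, and 64 (one past the int8 range) as 0x80 0x40.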

Half

A 16-bit floating-point number (half precision), as specified in IEEE 754.

Float

A 32-bit floating-point number (single precision), as specified in IEEE 754.

Double

A 64-bit floating-point number (double precision), as specified in IEEE 754.
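Since everything is big-endian, Float and Double map directly onto DataView, whose setters default to big-endian. This is a sketch with illustrative names; Half is left out, because DataView only gained setFloat16 in recent JavaScript versions.

```typescript
// Float: IEEE 754 single precision, big-endian (DataView's default byte order).
function encodeFloat(value: number): Uint8Array {
  const view = new DataView(new ArrayBuffer(4));
  view.setFloat32(0, value); // value is rounded to the nearest single-precision float
  return new Uint8Array(view.buffer);
}

// Double: IEEE 754 double precision, big-endian; JS numbers are already doubles.
function encodeDouble(value: number): Uint8Array {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value);
  return new Uint8Array(view.buffer);
}
```

For example, 1.0 becomes 3F 80 00 00 as a Float and 3F F0 00 00 00 00 00 00 as a Double.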

String

A UTF-8 string.

Binary

Any ArrayBufferLike sequence of octets (bytes). First, the buffer length in bytes, len, is encoded as a UInt (see above) and appended to the result. Then the len bytes of the buffer content follow:

    <uint_length> <buffer_data>
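The length-prefixed layout can be sketched as follows. The names are illustrative, and the sketch bundles its own copy of a UInt encoder (following the UInt rules, limited to safe integers, ES2020+ for BigInt) so that it stands alone.

```typescript
// UInt encoder following the UInt rules (safe integers only).
function encodeUInt(value: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) throw new RangeError(`invalid UInt: ${value}`);
  if (value < 0x80) return Uint8Array.of(value);
  const size = value < 0x4000 ? 2 : value < 0x20000000 ? 4 : 8;
  const view = new DataView(new ArrayBuffer(size));
  if (size === 2) view.setUint16(0, value | 0x8000);
  else if (size === 4) view.setUint32(0, (value | 0xc0000000) >>> 0);
  else view.setBigUint64(0, BigInt(value) | (0b111n << 61n));
  return new Uint8Array(view.buffer);
}

// Binary: <uint_length> <buffer_data>
function encodeBinary(data: ArrayBufferLike | Uint8Array): Uint8Array {
  const bytes = data instanceof Uint8Array ? data : new Uint8Array(data);
  const prefix = encodeUInt(bytes.byteLength); // length in bytes, as a UInt
  const out = new Uint8Array(prefix.length + bytes.byteLength);
  out.set(prefix, 0);
  out.set(bytes, prefix.length); // raw content follows the prefix
  return out;
}
```

A 3-byte buffer therefore costs one extra byte (03, then the content), while a 200-byte buffer gets the two-byte prefix 80 C8.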

Boolean

Either true, encoded as the byte 0x01, or false, encoded as 0x00.

JSON

Any JSON-compatible data. First, the value is converted to a string by a JSON serialization algorithm (such as JSON.stringify). The resulting string is then encoded as a String (see above).

RegExp

A JS-compatible regular expression, composed of:

  • source: the regex source as a string (as returned by the source property of a RegExp instance);
  • flags: a subset of {g, i, m}; that is, each of these 3 flags is either set or not.

First, the source is encoded as a String. Then the flag byte is appended. The flag byte is a bit mask, 0000 0mig, so g is the least significant bit (0x01), i is 0x02 and m is 0x04.
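The flag byte can be computed from a RegExp instance as sketched below (illustrative names). Rejecting flags outside {g, i, m}, such as u or s, is a choice this sketch makes, since the format has no bits for them.

```typescript
// Bit positions follow the "0000 0mig" mask: g is the least significant bit.
const FLAG_G = 0b001;
const FLAG_I = 0b010;
const FLAG_M = 0b100;

function regExpFlagByte(re: RegExp): number {
  // Assumption: other flags cannot be represented, so fail loudly instead of dropping them.
  if (/[^gim]/.test(re.flags)) throw new Error(`unsupported RegExp flags: ${re.flags}`);
  return (re.global ? FLAG_G : 0) | (re.ignoreCase ? FLAG_I : 0) | (re.multiline ? FLAG_M : 0);
}

// Inverse: rebuild a flags string suitable for `new RegExp(source, flags)`.
function flagsFromByte(byte: number): string {
  return (byte & FLAG_G ? "g" : "") + (byte & FLAG_I ? "i" : "") + (byte & FLAG_M ? "m" : "");
}
```

For /a/gi the flag byte is 0000 0011; decoding it gives back "gi".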

Date

A date value, represented by a UNIX timestamp in milliseconds, encoded as an Int.
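Because a Date is stored as an Int, its size follows the Int table. The sketch below (illustrative names; rejecting an invalid Date is an assumption, since the format doesn't say) makes one consequence visible: 2^28 ms is only about 3.1 days, so any date more than about three days away from 1970-01-01 takes the full 8 bytes.

```typescript
// Number of bytes the Int rules use for a given integer.
function intByteLength(value: number): number {
  if (value >= -(2 ** 6) && value < 2 ** 6) return 1;
  if (value >= -(2 ** 13) && value < 2 ** 13) return 2;
  if (value >= -(2 ** 28) && value < 2 ** 28) return 4;
  if (value >= -(2 ** 60) && value < 2 ** 60) return 8;
  throw new RangeError(`out of Int range: ${value}`);
}

// The Int stored for a Date: milliseconds since the UNIX epoch.
function dateToTimestamp(date: Date): number {
  const ms = date.getTime();
  if (Number.isNaN(ms)) throw new RangeError("invalid Date"); // assumption: invalid dates are errors
  return ms;
}
```

The epoch itself (timestamp 0) fits in one byte, whereas 2020-01-01T00:00:00Z (1577836800000 ms) needs 8.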

Compound type

A compound type is an ordered sequence of fields. Each field has three properties:

  • its type
  • whether it's optional or not
  • whether it's an array or a single value

For each field, in order:

  1. If it's optional:
     1. if the value is empty (see below), append the boolean false and continue to the next field;
     2. otherwise, append the boolean true.
  2. If it's a single value, append the value encoded as defined by the field's type, then continue to the next field.
  3. Otherwise (it's an array), get the array length, len.
  4. Append len encoded as a UInt.
  5. Append each value in the array, encoded as defined by the field's type.
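The compound-type steps above can be sketched as follows. The Field descriptor, looking values up by a name property, and the function names are all hypothetical; the steps only require that fields are visited in order. isEmpty follows the definition of empty given below. What happens when a non-optional field is empty isn't specified, so this sketch just hands the value to the field's encoder.

```typescript
// Hypothetical field descriptor (illustrative, not the library's API).
interface Field {
  name: string;
  optional: boolean;
  array: boolean;
  encode: (value: unknown) => Uint8Array; // encoder for the field's type
}

const isEmpty = (value: unknown): boolean => value === undefined || value === null;

// UInt encoder following the UInt rules (safe integers only, ES2020+ for BigInt).
function encodeUInt(value: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) throw new RangeError(`invalid UInt: ${value}`);
  if (value < 0x80) return Uint8Array.of(value);
  const size = value < 0x4000 ? 2 : value < 0x20000000 ? 4 : 8;
  const view = new DataView(new ArrayBuffer(size));
  if (size === 2) view.setUint16(0, value | 0x8000);
  else if (size === 4) view.setUint32(0, (value | 0xc0000000) >>> 0);
  else view.setBigUint64(0, BigInt(value) | (0b111n << 61n));
  return new Uint8Array(view.buffer);
}

function concat(chunks: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0));
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

function encodeCompound(fields: Field[], obj: Record<string, unknown>): Uint8Array {
  const chunks: Uint8Array[] = [];
  for (const field of fields) {
    const value = obj[field.name];
    if (field.optional) {
      if (isEmpty(value)) {
        chunks.push(Uint8Array.of(0x00)); // boolean false: field absent
        continue;
      }
      chunks.push(Uint8Array.of(0x01)); // boolean true: field present
    }
    if (!field.array) {
      chunks.push(field.encode(value));
      continue;
    }
    if (!Array.isArray(value)) throw new TypeError(`field ${field.name} expects an array`);
    chunks.push(encodeUInt(value.length)); // array length as a UInt
    for (const item of value) chunks.push(field.encode(item));
  }
  return concat(chunks);
}
```

Note that an empty array in an optional array field is not empty: it is written as true followed by a length of 0.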

empty

A value is said to be empty if it's undefined, null or an equivalent. Empty strings, empty arrays, empty Buffers, empty objects, zero, NaN, Infinity, etc. are NOT considered empty.